<a href="https://colab.research.google.com/github/avinashnair02/Banglore_Houseprice_prediction/blob/main/LCEL(Langchain_Expression_Language).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Transitioning from the Old Chain Classes to the New Pipe-Based Operator

## 1. Understanding `Runnables`
- `Runnables` are self-contained units of work.
- They can be executed in isolation or combined for complex operations.
- They provide flexibility in execution (sync, async, batch, streaming, parallel).

## 2. `RunnableParallel`
- Executes tasks concurrently.
- Useful for performance enhancement in scenarios where tasks can run independently.
- Syntax example:
    ```python
    from langchain_core.runnables import RunnableParallel
    ```

## 3. `RunnablePassthrough`
- A simple `Runnable` that passes inputs directly to outputs without modification.
- Helpful for debugging or chaining in pipelines.
- Example use case:
    ```python
    from langchain_core.runnables import RunnablePassthrough
    passthrough = RunnablePassthrough()
    result = passthrough.invoke("hello")  # returns "hello" unchanged
    ```

## 4. `RunnableLambda`
- Allows quick, inline definitions of small, custom functions.
- Example:
    ```python
    from langchain_core.runnables import RunnableLambda
    lambda_op = RunnableLambda(lambda x: x * 2)
    result = lambda_op.invoke(5)  # Output: 10
    ```

## 5. Assign Functions
- `.assign()` adds new keys to the dictionary flowing through the chain while keeping the existing ones.
- Useful in data pipelines to compute or update intermediate values.

## 6. Performance Improvement (Inference Speed)
- Focus on optimizing the inference speed by leveraging parallel execution.
- Use `RunnableParallel` or batching techniques.
- Consider optimizing data pipelines by removing unnecessary steps.

## 7. Async Invoke
- Executes operations asynchronously, improving the overall throughput of the system.
- Syntax example:
    ```python
    async def async_operation(chain, question):
        # Every Runnable exposes `ainvoke`, the awaitable counterpart of `invoke`.
        return await chain.ainvoke(question)
    ```

## 8. Batch Support
- Handles multiple inputs at once to improve performance.
- Can be combined with `RunnableParallel` for parallel batch execution.

## 9. Async Batch Execution
- Combines asynchronous execution with batch processing for high-performance tasks.
- Reduces overall execution time for larger datasets.

## 10. Using `itemgetter` with `LCEL`
- `itemgetter` (from Python's `operator` module) extracts specific items from collections, for example one key from a dict.
- When combined with `LCEL` (LangChain Expression Language), it routes individual fields of the input to different steps of a chain.

## 11. Bind Tools
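- `bind` attaches fixed arguments, such as tool definitions, to a `Runnable` so that they are passed along on every call.
- This lets a model inside a pipeline receive tool schemas without changing the input that flows between `Runnable` components.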

## 12. Stream Support
- Keep your pipelines more responsive by incorporating stream support for data.
- This allows continuous data processing and near real-time outputs.
  


In [1]:
!pip install langchain_google_genai
!pip install langchain_community
!pip install langchain
!pip install langchain_huggingface
!pip install langchain_groq


In [None]:
from google.colab import userdata
GROQ_API_KEY=userdata.get('GROQ_API_KEY')
import os
os.environ["GROQ_API_KEY"]=GROQ_API_KEY

In [2]:
from google.colab import userdata
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
import os
os.environ["GOOGLE_API_KEY"]=GOOGLE_API_KEY

In [3]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
from langchain_google_genai import ChatGoogleGenerativeAI
llm = ChatGoogleGenerativeAI(model="gemini-1.0-pro")

In [4]:
'''from langchain_huggingface import HuggingFaceEmbeddings
embeddings=HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
from langchain_groq import ChatGroq
import os
llm=ChatGroq(model_name="Gemma2-9b-It")'''


# A simple chain (the old chaining approach)

In [5]:
template= 'Hi! I am learning {skill}. Can you suggest me top 5 things to learn?\n'

In [6]:
from langchain_core.prompts import PromptTemplate

In [7]:
prompt = PromptTemplate(template=template,input_variables=["skill"])

In [8]:
print(prompt)

input_variables=['skill'] input_types={} partial_variables={} template='Hi! I am learning {skill}. Can you suggest me top 5 things to learn?\n'


In [9]:
from langchain.chains import LLMChain

In [10]:
llm_chain = LLMChain(prompt=prompt,llm=llm)


In [11]:
print(llm_chain.run('Data Science'))


**Top 5 Things to Learn for Aspiring Data Scientists:**

1. **Python or R Programming:** These programming languages are essential for data manipulation, analysis, and visualization.

2. **Statistics and Probability:** Understand foundational concepts in statistics, such as probability distributions, hypothesis testing, and regression analysis.

3. **Machine Learning Algorithms:** Explore supervised and unsupervised learning algorithms, including linear regression, decision trees, and clustering.

4. **Data Visualization and Communication:** Effectively present data insights and findings using visualization tools like Tableau, Power BI, or matplotlib.

5. **Cloud Computing (e.g., AWS, Azure):** Gain experience with cloud platforms for data storage, processing, and analysis. This is becoming increasingly important for handling large datasets.


In [12]:
print(llm_chain.run({'skill':'Data Science'}))

1. **Programming Languages:** Python and R are the most popular programming languages for data science. Focus on mastering at least one of these languages.

2. **Data Structures and Algorithms:** Understand the fundamental concepts of data structures (e.g., lists, arrays, dictionaries) and algorithms (e.g., sorting, searching). This will help you manipulate and analyze data efficiently.

3. **Statistics and Probability:** Develop a strong foundation in statistics, including descriptive statistics, probability distributions, hypothesis testing, and regression analysis. This knowledge is essential for understanding and interpreting data.

4. **Machine Learning:** Learn about different machine learning algorithms (e.g., supervised, unsupervised), their applications, and how to evaluate their performance. This will enable you to build models that can predict outcomes or make decisions based on data.

5. **Data Visualization:** Master the art of visualizing data effectively. Learn how to us

# The same chain implemented using LCEL

In [13]:
llm

ChatGoogleGenerativeAI(model='models/gemini-1.0-pro', google_api_key=SecretStr('**********'), client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object at 0x7f796c57f100>, default_metadata=())

In [14]:
prompt

PromptTemplate(input_variables=['skill'], input_types={}, partial_variables={}, template='Hi! I am learning {skill}. Can you suggest me top 5 things to learn?\n')

In [15]:
chain = prompt | llm

In [16]:
print(chain.invoke({'skill':'Big Data'}))

content='**Top 5 Essential Concepts for Big Data Learning:**\n\n1. **Data Management and Processing:**\n   - Hadoop Distributed File System (HDFS) and MapReduce\n   - Apache Spark and Apache Flink\n   - Data warehousing and data lakes\n\n2. **Data Analytics and Machine Learning:**\n   - Supervised and unsupervised learning algorithms\n   - Big data analytics tools (e.g., Apache Pig, Hive)\n   - Machine learning libraries (e.g., TensorFlow, Keras)\n\n3. **Data Visualization and Exploration:**\n   - Tools for data visualization (e.g., Tableau, Power BI)\n   - Exploratory data analysis techniques\n   - Data storytelling and presentation\n\n4. **Data Security and Governance:**\n   - Data encryption and access control\n   - Data privacy regulations (e.g., GDPR, CCPA)\n   - Data governance frameworks and best practices\n\n5. **Big Data Architectures and Technologies:**\n   - Cloud computing platforms (e.g., AWS, Azure, GCP)\n   - Data pipelines and data ingestion\n   - NoSQL databases and da

In [17]:
from langchain_core.output_parsers import StrOutputParser

In [18]:
parser = StrOutputParser()

In [19]:
chain = prompt | llm | parser

In [20]:
print(chain.invoke({'skill':'Machine Learning'}))

**Top 5 Things to Learn for Machine Learning Beginners:**

1. **Fundamentals of Machine Learning:**
   - Types of machine learning algorithms (supervised, unsupervised, reinforcement)
   - Basic concepts like features, labels, loss functions, and metrics
   - Common machine learning models (linear regression, decision trees, support vector machines)

2. **Data Preprocessing and Feature Engineering:**
   - Data cleaning, transformation, and normalization
   - Feature selection and dimensionality reduction
   - Handling missing values and outliers

3. **Model Training and Evaluation:**
   - Training machine learning models using different algorithms
   - Hyperparameter tuning and model selection
   - Evaluating model performance using metrics like accuracy, precision, recall, and F1-score

4. **Model Deployment and Monitoring:**
   - Deploying trained models into production environments
   - Monitoring model performance and detecting drift
   - Maintaining and updating models as new data

# Let's discuss the runnables

In [21]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough , RunnableLambda

In [22]:
chain = RunnablePassthrough()

In [23]:
chain.invoke('Welcome to this youtube channel')

'Welcome to this youtube channel'

In [24]:
chain = RunnablePassthrough() | RunnablePassthrough() | RunnablePassthrough()

In [25]:
chain.invoke('Welcome to my sunny"s youtube channel')

'Welcome to my sunny"s youtube channel'

In [26]:
def string_upper(input):
  return input.upper()

In [27]:
chain = RunnablePassthrough() | RunnableLambda(string_upper)

In [28]:
chain.invoke('Welcome to my sunny"s youtube channel')

'WELCOME TO MY SUNNY"S YOUTUBE CHANNEL'

In [29]:
# A plain function is not a Runnable, so calling .invoke on it fails
string_upper.invoke('Welcome to my sunny"s youtube channel')

AttributeError: 'function' object has no attribute 'invoke'

In [30]:
chain = RunnableLambda(string_upper)

In [31]:
chain.invoke('Welcome to my sunny"s youtube channel')

'WELCOME TO MY SUNNY"S YOUTUBE CHANNEL'

In [32]:
chain = RunnableParallel({'x':RunnablePassthrough(),'y':RunnablePassthrough()})

In [33]:
chain.invoke("Sunny")

{'x': 'Sunny', 'y': 'Sunny'}

In [34]:
chain.invoke({'Youtube': '@sunnysavita10','Blog': "Sunny's blog"})

{'x': {'Youtube': '@sunnysavita10', 'Blog': "Sunny's blog"},
 'y': {'Youtube': '@sunnysavita10', 'Blog': "Sunny's blog"}}

In [35]:
lambda x: x['Blog']

<function __main__.<lambda>(x)>

In [36]:
chain = RunnableParallel({'x':RunnablePassthrough(),'Blog':lambda x: x['Blog']})

In [37]:
chain.invoke({'Youtube': '@sunnysavita10','Blog': "Sunny's blog"})

{'x': {'Youtube': '@sunnysavita10', 'Blog': "Sunny's blog"},
 'Blog': "Sunny's blog"}

In [38]:
def fetch_website(input: dict):
    output = input.get('Website','Not found')
    return output

In [39]:
mydict={'Youtube': '@sunnysavita10','Blog': "Sunny's blog"}

In [40]:
mydict.get("Website","Not found")

'Not found'

In [41]:
chain = RunnableParallel({'Website':RunnablePassthrough() | RunnableLambda(fetch_website),
                          'Blog':lambda z: z['Blog']})

In [42]:
chain.invoke({'Youtube': '@sunnysavita10','Blog': "Sunny's blog"})

{'Website': 'Not found', 'Blog': "Sunny's blog"}

In [43]:
chain.invoke({'Youtube': '@sunnysavita10','Blog': "Sunny's blog" , 'Website' : 'sunnysavita.com'})

{'Website': 'sunnysavita.com', 'Blog': "Sunny's blog"}

In [44]:
def extra_func(input):
    return 'Happy Learning'

In [45]:
chain = RunnableParallel({'x' : RunnablePassthrough()}).assign(extra=RunnableLambda(extra_func))

In [46]:
chain = RunnableParallel({'x' : RunnablePassthrough()}).assign(y=RunnableLambda(extra_func))

In [47]:
chain.invoke('Hello')

{'x': 'Hello', 'y': 'Happy Learning'}

In [48]:
!pip install chromadb


In [50]:
from langchain_community.document_loaders import TextLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

### Reading the txt files from source directory

loader = DirectoryLoader('./source', glob="./*.txt", loader_cls=TextLoader)
docs = loader.load()

### Creating Chunks using RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=50,
    chunk_overlap=10,
    length_function=len
)
new_docs = text_splitter.split_documents(documents=docs)
doc_strings = [doc.page_content for doc in new_docs]

### BGE Embeddings

'''from langchain.embeddings import HuggingFaceBgeEmbeddings

model_name = "BAAI/bge-base-en-v1.5"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': True} # set True to compute cosine similarity
embeddings = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
)
'''

### Creating Retriever using Vector DB

db = Chroma.from_documents(new_docs, embeddings)
retriever = db.as_retriever(search_kwargs={"k": 4})

ValueError: Expected Embedings to be non-empty list or numpy array, got [] in upsert.

In [None]:
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = PromptTemplate.from_template(template)


In [None]:
retrieval_chain = (
    RunnableParallel({"context": retriever, "question": RunnablePassthrough()})
    | prompt
    | llm
    | StrOutputParser()
    )

In [None]:
question ="what is llama3? can you highlight 3 important points?"

In [None]:
retrieval_chain.invoke(question)

'- Llama 3 is a large language model developed by Meta AI.\n- It was released in April 2024.\n- It is used in various applications, including language translation, question answering, and text generation.'

In [None]:
import time

start_time = time.time()

result = retrieval_chain.invoke(question)

print('Time taken:',time.time() - start_time)

Time taken: 1.6216003894805908



In [None]:
start_time = time.time()

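# Without `await`, `ainvoke` only creates a coroutine object; the chain does not run,
# so the time printed below measures coroutine creation, not inference.
# Use `result = await retrieval_chain.ainvoke(question)` to actually get the answer.
result = retrieval_chain.ainvoke(question)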

print('Time taken:',time.time() - start_time)

Time taken: 0.00014090538024902344


In [None]:
start_time = time.time()

batch_output = retrieval_chain.batch([
                        "what is llama3?",
                        "can you highlight 3 main properties?"
                       ])

print('Time taken:',time.time() - start_time)

Time taken: 0.9684896469116211


In [None]:
batch_output

['Llama 3 is a family of large language models developed by Meta AI.',
 'The provided context does not mention anything about the main properties, so I cannot extract the requested data from the provided context.']

In [None]:
start_time = time.time()

batch_output = await retrieval_chain.abatch([
                        "what is llama3?",
                        "can you highlight 3 main properties?"
                       ])

print('Time taken:',time.time() - start_time)

Time taken: 0.9242796897888184


In [None]:
batch_output

In [None]:
my_dict = {'Youtube': '@sunnysavita10','Blog': "sunny's blog" , 'Website' : 'sunnysavita.com'}
my_dict

{'Youtube': '@sunnysavita10',
 'Blog': "sunny's blog",
 'Website': 'sunnysavita.com'}

In [None]:
from operator import itemgetter

website = itemgetter('Website')

In [None]:
website(my_dict)

'sunnysavita.com'

In [None]:
template = """Answer the question based only on the following context:
{context}

Question: {question}

Answer in the following language: {language}
"""
prompt = PromptTemplate.from_template(template)


In [None]:
retrieval_chain = (
    RunnableParallel({"context": itemgetter('question') | retriever,
                       "question": itemgetter('question'),
                       "language": itemgetter('language')
                       })
    | prompt
    | llm
    | StrOutputParser()
    )

In [None]:
### itemgetter('question') looks up a key by name, so the chain input has to be a dict

response = retrieval_chain.invoke({'question': "what is llama3?",
                        'language': "Spanish"})

print(response)

Llama (Large Language Model Meta AI) es una familia


In [None]:
template = 'Hi! I am learning {skill}. Can you suggest me top 5 things to learn?\n'

prompt = PromptTemplate.from_template(template=template)

chain = prompt | llm

In [None]:
for s in chain.stream({'skill':'Big Data'}):
    print(s.content,end='')

**Top 5 Must-Learn Concepts for Big Data:**

1. **Data Management and Storage:** Understand various data storage technologies like Hadoop Distributed File System (HDFS), NoSQL databases (e.g., MongoDB, Cassandra), and data warehousing concepts.

2. **Data Processing and Analysis:** Learn about frameworks like Apache Spark, MapReduce, and Hadoop for processing and analyzing large datasets. Explore data mining, machine learning, and statistical techniques for extracting insights.

3. **Cloud Computing:** Familiarize yourself with cloud platforms like AWS, Azure, and GCP that offer scalable and cost-effective solutions for storing, processing, and analyzing big data.

4. **Data Visualization and Communication:** Gain expertise in tools like Tableau, Power BI, and QlikView to effectively communicate data insights to stakeholders. Learn about data visualization best practices and storytelling techniques.

5. **Data Governance and Security:** Understand data governance principles, data quali

In [None]:
import json
from langchain_core.messages import ToolMessage
from langchain_core.tools import tool
from langchain_core.utils.function_calling import convert_to_openai_tool

@tool
def multiply(first_number: int, second_number: int):
    """Multiplies two numbers together."""
    return first_number * second_number

model_with_tools = llm.bind(tools=[convert_to_openai_tool(multiply)])

In [None]:
response = model_with_tools.invoke('What is 35 * 46?')

In [None]:
response