## Transitioning from the Old Chain Class to the New Pipe-Based Operator

## 1. Understanding `Runnables`
- A `Runnable` is a self-contained unit of work with a standard interface (`invoke`, `batch`, `stream`, and their async variants).
- Runnables can be executed in isolation or combined for complex operations.
- They provide flexibility in execution (sync, async, parallel).

## 2. `RunnableParallel`
- Executes its branches concurrently on the same input and returns a dict with one result per branch.
- Useful for improving performance in scenarios where tasks can run independently.
- Import example:
    ```python
    from langchain_core.runnables import RunnableParallel
    ```

## 3. `RunnablePassthrough`
- A simple `Runnable` that passes inputs directly to outputs without modification.
- Helpful for debugging or chaining in pipelines.
- Example use case:
    ```python
    from langchain_core.runnables import RunnablePassthrough
    passthrough = RunnablePassthrough()
    input_data = {"question": "What is LCEL?"}
    result = passthrough.invoke(input_data)  # result equals input_data, unchanged
    ```

## 4. `RunnableLambda`
- Allows quick, inline definitions of small, custom functions.
- Example:
    ```python
    from langchain_core.runnables import RunnableLambda
    lambda_op = RunnableLambda(lambda x: x * 2)
    result = lambda_op.invoke(5)  # Output: 10
    ```

## 5. Assign Functions
- `RunnablePassthrough.assign(...)` adds new keys to the input dict during execution while keeping the existing keys.
- Useful in data pipelines to attach intermediate values for later steps.

## 6. Performance Improvement (Inference Speed)
- Focus on optimizing the inference speed by leveraging parallel execution.
- Use `RunnableParallel` or batching techniques.
- Consider optimizing data pipelines by removing unnecessary steps.

## 7. Async Invoke
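- `ainvoke` executes a `Runnable` asynchronously, which improves overall throughput when the work is mostly waiting on I/O.
- Syntax example:
    ```python
    async def async_operation():
        # `chain` is any Runnable, e.g. prompt | llm
        result = await chain.ainvoke({'skill': 'Big Data'})
        return result
    ```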

## 8. Batch Support
- `batch` handles a list of inputs in one call to improve performance; by default the inputs are processed concurrently.
- Can be combined with `RunnableParallel` for parallel batch execution.

## 9. Async Batch Execution
- `abatch` combines asynchronous execution with batch processing, so many inputs are awaited concurrently.
- Reduces overall execution time for larger datasets when the work is I/O-bound (for example, LLM API calls).

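## 10. Using `itemgetter` with `LCEL`
- `itemgetter` (from Python's `operator` module) extracts specific items from collections, such as one key from a dict.
- When combined with `LCEL` (LangChain Expression Language), it can streamline complex operations by routing each input key to the step that needs it.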

## 11. Bind Tools
- `bind` attaches fixed arguments, such as tool definitions, to a `Runnable` so they are sent with every call.
- This lets a model with tools attached be used like any other `Runnable` component in a pipeline.

## 12. Stream Support
- Keep your pipelines responsive by using `stream`, which yields output chunks as they are generated instead of waiting for the full response.
- This allows continuous data processing and near real-time outputs.
  


In [1]:
!pip install langchain_google_genai
!pip install langchain_community
!pip install langchain
!pip install langchain_huggingface
!pip install langchain_groq


In [2]:
from google.colab import userdata
GOOGLE_API_KEY=userdata.get('GOOGLE_API_KEY')
import os
os.environ["GOOGLE_API_KEY"]=GOOGLE_API_KEY

In [3]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
from langchain_google_genai import ChatGoogleGenerativeAI
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash-lite-001")

In [4]:
'''from langchain_huggingface import HuggingFaceEmbeddings
embeddings=HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
from langchain_groq import ChatGroq
import os
llm=ChatGroq(model_name="Gemma2-9b-It")'''


# This is my simple chain (old chaining concept)

In [5]:
template= 'Hi! I am learning {skill}. Can you suggest me top 5 things to learn?\n'

In [6]:
from langchain_core.prompts import PromptTemplate

In [7]:
prompt = PromptTemplate(template=template,input_variables=["skill"])

In [8]:
print(prompt)

input_variables=['skill'] input_types={} partial_variables={} template='Hi! I am learning {skill}. Can you suggest me top 5 things to learn?\n'


In [9]:
from langchain.chains import LLMChain  # legacy class, kept here to show the old approach
llm_chain = LLMChain(prompt=prompt,llm=llm)
print(llm_chain.run('Data Science'))



That's great! Data Science is a fascinating and rapidly growing field. Here are the top 5 things I recommend you learn, along with some context and resources:

1.  **Programming Fundamentals (Python or R):** This is the absolute bedrock. You need a programming language to manipulate data, build models, and automate tasks.

    *   **Why:** Data science heavily relies on code. Python and R are the dominant languages.
    *   **What to Learn:**
        *   **Variables and Data Types:** Integers, floats, strings, booleans.
        *   **Data Structures:** Lists, dictionaries (Python) or lists, data frames (R).
        *   **Control Flow:** `if/else` statements, `for` and `while` loops.
        *   **Functions:** Defining and using functions to modularize your code.
        *   **Libraries:**  You'll need to learn specific libraries, but start with the basics.
    *   **Resources:**
        *   **Python:**
            *   **Codecademy:**  [https://www.codecademy.com/learn/learn-python-3](h

In [10]:
print(llm_chain.run({'skill':'Data Science'}))

That's fantastic! Data Science is a rewarding field. Here are the top 5 things I recommend you focus on when starting out, along with explanations and how to approach learning them:

1.  **Python Programming:**

    *   **Why it's crucial:** Python is the dominant language in data science. It has a vast ecosystem of libraries specifically designed for data manipulation, analysis, and machine learning.  It's relatively easy to learn and has excellent community support.
    *   **What to learn:**
        *   **Fundamentals:** Variables, data types (integers, floats, strings, booleans, lists, dictionaries, tuples), operators, control flow (if/else statements, loops), functions, and object-oriented programming (classes and objects - start with the basics).
        *   **Key libraries:**
            *   **NumPy:** For numerical computing, array operations (essential for handling large datasets).
            *   **Pandas:** For data manipulation and analysis (data cleaning, transformation, a

# This is an implementation using LCEL

In [11]:
llm

ChatGoogleGenerativeAI(model='models/gemini-2.0-flash-lite-001', google_api_key=SecretStr('**********'), client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object at 0x7a503f96b2d0>, default_metadata=(), model_kwargs={})

In [12]:
prompt

PromptTemplate(input_variables=['skill'], input_types={}, partial_variables={}, template='Hi! I am learning {skill}. Can you suggest me top 5 things to learn?\n')

In [13]:
chain = prompt | llm

In [14]:
print(chain.invoke({'skill':'Big Data'}))

content='Okay, diving into the world of Big Data can feel overwhelming, but here\'s a recommended top 5 list of things to learn, focusing on practical skills and a solid foundation:\n\n1.  **Data Storage and Processing Fundamentals (Hadoop Ecosystem):**\n\n    *   **Why it\'s crucial:**  You need to understand how to store and process massive datasets. The Hadoop ecosystem (or its modern alternatives) is the cornerstone of this.\n    *   **What to learn:**\n        *   **Hadoop Distributed File System (HDFS):** Learn how data is stored across multiple machines (nodes) for redundancy and scalability. Understand concepts like block sizes, replication, and data locality.\n        *   **MapReduce (or its modern replacement):**  Grasp the basic idea of breaking down large computations into parallel tasks. While MapReduce itself is often considered "legacy," understanding the concept is vital to understanding how distributed processing works.  The newer frameworks like Spark are built on sim

In [15]:
from langchain_core.output_parsers import StrOutputParser

In [16]:
parser = StrOutputParser()

In [17]:
chain = prompt | llm | parser

In [18]:
print(chain.invoke({'skill':'Machine Learning'}))

That's fantastic! Machine learning is a fascinating field. Here are the top 5 things I recommend you focus on when starting out, in a suggested order of learning:

1.  **Python Programming Fundamentals:**
    *   **Why it's essential:** Python is the dominant language in machine learning.  You'll need it to write your code, manipulate data, and build models.
    *   **What to learn:**
        *   **Basic syntax:** Variables, data types (integers, floats, strings, booleans, lists, dictionaries), operators.
        *   **Control flow:** `if/else` statements, `for` and `while` loops.
        *   **Functions:**  Defining and calling functions, understanding arguments and return values.
        *   **Data structures:** Lists, dictionaries, tuples, sets (how to use, access elements, and manipulate them).
        *   **Object-Oriented Programming (OOP) Basics (optional, but helpful):** Classes, objects, methods, inheritance.
        *   **Libraries:**  Get familiar with how to install and imp

# Let's discuss the runnables

In [19]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough , RunnableLambda

In [20]:
chain = RunnablePassthrough()

In [21]:
chain.invoke('Welcome to this youtube channel')

'Welcome to this youtube channel'

In [22]:
chain = RunnablePassthrough() | RunnablePassthrough() | RunnablePassthrough()

In [24]:
chain.invoke('Welcome to my Atif"s youtube channel')

'Welcome to my Atif"s youtube channel'

In [25]:
def string_upper(text):
  # Renamed from `input` to avoid shadowing the Python builtin
  return text.upper()

In [26]:
chain = RunnablePassthrough() | RunnableLambda(string_upper)

In [28]:
chain.invoke('Welcome to my Atif"s youtube channel')

'WELCOME TO MY ATIF"S YOUTUBE CHANNEL'

In [29]:
!pip install chromadb


In [34]:
import os

# Create the source directory if it doesn't exist
os.makedirs('./source', exist_ok=True)

# Create a dummy text file in the source directory
with open('./source/dummy_doc.txt', 'w') as f:
    f.write('This is a dummy document for testing.')

In [35]:
from langchain_community.document_loaders import TextLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma

### Reading the txt files from source directory

loader = DirectoryLoader('./source', glob="./*.txt", loader_cls=TextLoader)
docs = loader.load()

### Creating Chunks using RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=50,
    chunk_overlap=10,
    length_function=len
)
new_docs = text_splitter.split_documents(documents=docs)
doc_strings = [doc.page_content for doc in new_docs]

### BGE Embeddings

'''from langchain.embeddings import HuggingFaceBgeEmbeddings

model_name = "BAAI/bge-base-en-v1.5"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': True} # set True to compute cosine similarity
embeddings = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
)
'''

### Creating Retriever using Vector DB

db = Chroma.from_documents(new_docs, embeddings)
retriever = db.as_retriever(search_kwargs={"k": 4})

In [36]:
template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = PromptTemplate.from_template(template)


In [37]:
retrieval_chain = (
    RunnableParallel({"context": retriever, "question": RunnablePassthrough()})
    | prompt
    | llm
    | StrOutputParser()
    )

In [48]:
question ="Pakistan?"

In [49]:
retrieval_chain.invoke(question)

'The provided context does not contain any information about Pakistan. Therefore, I cannot answer the question.'

In [42]:
import time

start_time = time.time()

result = retrieval_chain.invoke(question)

print('Time taken:',time.time() - start_time)

Time taken: 1.0051040649414062


In [43]:
start_time = time.time()

batch_output = retrieval_chain.batch([
                        "what is llama3?",
                        "can you highlight 3 main properties?"
                       ])

print('Time taken:',time.time() - start_time)

Time taken: 2.474843740463257


In [50]:
batch_output

['Based on the provided context, I cannot answer the question "what is llama3?". The document only contains the text "This is a dummy document for testing." and does not mention llama3.',
 'Based on the provided context, I cannot highlight 3 main properties. The document only contains a single sentence: "This is a dummy document for testing." There are no properties mentioned.']

In [51]:
template = """Answer the question based only on the following context:
{context}

Question: {question}

Answer in the following language: {language}
"""
prompt = PromptTemplate.from_template(template)


In [54]:
from operator import itemgetter

retrieval_chain = (
    RunnableParallel({"context": itemgetter('question') | retriever,
                       "question": itemgetter('question'),
                       "language": itemgetter('language')
                       })
    | prompt
    | llm
    | StrOutputParser()
    )

In [55]:
### itemgetter('question') looks up a key, so the input has to be a dict containing that key

response = retrieval_chain.invoke({'question': "what is llama3?",
                        'language': "Spanish"})

print(response)

No se puede determinar qué es llama3 basándome en la información proporcionada. El documento solo dice que es un documento de prueba.


In [56]:
template = 'Hi! I am learning {skill}. Can you suggest me top 5 things to learn?\n'

prompt = PromptTemplate.from_template(template=template)

chain = prompt | llm

In [57]:
for s in chain.stream({'skill':'Big Data'}):
    print(s.content,end='')

Okay, here are my top 5 suggestions for what to learn when starting with Big Data, designed to give you a solid foundation:

1.  **Fundamentals of Data and Databases (SQL & NoSQL):**

    *   **Why it's important:** You need to understand how data is structured, stored, and retrieved. This is the bedrock of any big data project.  Even with advanced tools, you'll often need to interact with data through databases.
    *   **What to learn:**
        *   **SQL (Structured Query Language):** Learn the basics of querying relational databases (e.g., MySQL, PostgreSQL, SQL Server). Focus on `SELECT`, `FROM`, `WHERE`, `JOIN`, `GROUP BY`, and `ORDER BY` clauses.  Understand data types and normalization.
        *   **NoSQL Databases:** Get a conceptual understanding of NoSQL databases like MongoDB (document-oriented), Cassandra (column-family), and Redis (key-value).  Understand when and why you'd choose a NoSQL database over SQL.  Learn basic CRUD (Create, Read, Update, Delete) operations for 

In [58]:
import json
from langchain_core.messages import ToolMessage
from langchain_core.tools import tool
from langchain_core.utils.function_calling import convert_to_openai_tool

@tool
def multiply(first_number: int, second_number: int):
    """Multiplies two numbers together."""
    return first_number * second_number

model_with_tools = llm.bind(tools=[convert_to_openai_tool(multiply)])

In [59]:
response = model_with_tools.invoke('What is 35 * 46?')

In [60]:
response

AIMessage(content='', additional_kwargs={'function_call': {'name': 'multiply', 'arguments': '{"second_number": 46.0, "first_number": 35.0}'}}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.0-flash-lite-001', 'safety_ratings': []}, id='run--4cafb001-a64a-4e38-9605-d10db84eb1f9-0', tool_calls=[{'name': 'multiply', 'args': {'second_number': 46.0, 'first_number': 35.0}, 'id': 'f0a500f3-24a4-467e-92b6-7e4700d2c2f1', 'type': 'tool_call'}], usage_metadata={'input_tokens': 32, 'output_tokens': 9, 'total_tokens': 41, 'input_token_details': {'cache_read': 0}})

# About the Author

<div style="background-color: #f8f9fa; border-left: 5px solid #28a745; padding: 20px; margin-bottom: 20px; border-radius: 5px;">
  <h2 style="color: #28a745; margin-top: 0; font-family: 'Poppins', sans-serif;">Muhammad Atif Latif</h2>
  <p style="font-size: 16px; color: #495057;">Data Scientist & Machine Learning Engineer</p>
  
  <p style="font-size: 15px; color: #6c757d; margin-top: 15px;">
    Passionate about building AI solutions that solve real-world problems. Specialized in machine learning,
    deep learning, and data analytics with experience implementing production-ready models.
  </p>
</div>

## Connect With Me

<div style="display: flex; flex-wrap: wrap; gap: 10px; margin-top: 15px;">
  <a href="https://github.com/m-Atif-Latif" target="_blank">
    <img src="https://img.shields.io/badge/GitHub-Follow-212121?style=for-the-badge&logo=github" alt="GitHub">
  </a>
  <a href="https://www.kaggle.com/matiflatif" target="_blank">
    <img src="https://img.shields.io/badge/Kaggle-Profile-20BEFF?style=for-the-badge&logo=kaggle" alt="Kaggle">
  </a>
  <a href="https://www.linkedin.com/in/muhammad-atif-latif-13a171318" target="_blank">
    <img src="https://img.shields.io/badge/LinkedIn-Connect-0077B5?style=for-the-badge&logo=linkedin" alt="LinkedIn">
  </a>
  <a href="https://x.com/mianatif5867" target="_blank">
    <img src="https://img.shields.io/badge/Twitter-Follow-1DA1F2?style=for-the-badge&logo=twitter" alt="Twitter">
  </a>
  <a href="https://www.instagram.com/its_atif_ai/" target="_blank">
    <img src="https://img.shields.io/badge/Instagram-Follow-E4405F?style=for-the-badge&logo=instagram" alt="Instagram">
  </a>
  <a href="mailto:muhammadatiflatif67@gmail.com">
    <img src="https://img.shields.io/badge/Email-Contact-D14836?style=for-the-badge&logo=gmail" alt="Email">
  </a>
</div>

---