<a href="https://colab.research.google.com/github/Abdullah-shamito/LLMs-projects/blob/main/NL_Command_Based_Car_Dataset_Filtering_System.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Introduction

In this notebook, I demonstrate how LangChain can be combined with a language model to build a system that interprets natural language commands and uses them to filter and retrieve records from a CSV file.


The system consists of four components:
1.	Data Parsing and Storage: I load the dataset, which describes each car's brand, model, color, and year of manufacture, into a pandas DataFrame that is easy to query.
2.	Command Interpretation: Using LangChain with an OpenAI model, I build an agent that parses natural language commands. The agent identifies the intent of each command and extracts the filtering criteria, such as car brand, model, and color.
3.	Filtering Logic: The agent turns the extracted criteria into a filter expression and applies it to the dataset to retrieve the matching records.
4.	User Interaction and Output: The results are formatted and presented to the user in a readable form.
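The four stages above can be sketched end to end without any LLM. The snippet below is a minimal illustration, not the notebook's implementation: the function names are hypothetical, the inline rows reuse the first rows of the cars dataset, and the criteria dictionary is written by hand as a stand-in for what the agent would extract from a command.

```python
import csv
import io

# Inline sample using the dataset's column names (rows reused from the cars dataset).
SAMPLE_CSV = """id,Car Brand,Car Model,Car Color,Year of Manufacture
1,Ford,Club Wagon,Teal,1993
2,Cadillac,Escalade ESV,Fuscia,2007
3,Mazda,Miata MX-5,Orange,2009
4,Audi,A4,Orange,2005
5,Nissan,Maxima,Pink,1994
"""


def load_rows(csv_text):
    """Stage 1: parse CSV text into a list of dictionaries, one per car."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def filter_rows(rows, criteria):
    """Stage 3: keep rows that match every column/value pair, ignoring case."""
    return [
        row for row in rows
        if all(row[column].lower() == str(value).lower()
               for column, value in criteria.items())
    ]


def format_rows(rows):
    """Stage 4: render each matching car as one readable line."""
    if not rows:
        return "No matching cars found."
    return "\n".join(
        f"{r['Year of Manufacture']} {r['Car Brand']} {r['Car Model']} ({r['Car Color']})"
        for r in rows
    )


rows = load_rows(SAMPLE_CSV)
# Stage 2 stand-in: criteria an agent might extract from "get all orange cars".
criteria = {"Car Color": "orange"}
print(format_rows(filter_rows(rows, criteria)))
```

In the real system, stage 2 is where the language model does its work; everything else is ordinary data handling.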


Note: LangChain offers a streamlined way to build this system, but other NLP tools can achieve similar results. Frameworks such as spaCy, TensorFlow, and PyTorch can be used to build custom NLP solutions. Later sections of this notebook include sample cells showing how TensorFlow and PyTorch could serve as a starting point for a similar system.



In [None]:
# Install necessary libraries
#!pip install pandas tensorflow torch torchvision
#!pip install transformers

!pip -q install langchain openai
!pip install langchain_experimental

Collecting langchain_experimental
  Downloading langchain_experimental-0.0.59-py3-none-any.whl (199 kB)
Collecting langchain-community<0.3,>=0.2 (from langchain_experimental)
  Downloading langchain_community-0.2.1-py3-none-any.whl (2.1 MB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community<0.3,>=0.2->langchain_experimental)
  Downloading dataclasses_json-0.6.6-py3-none-any.whl (28 kB)
Collecting marshmallow<4.0.0,>=3.18.0 (from dataclasses-json<0.7,>=0.5.7->langchain-community<0.3,>=0.2->langchain_experimental)
  Downloading marshmallow-3.21.2-py3-none-any.whl (49 kB)
Collecting typing-inspect<1,>=0.4.0 

In [None]:
import os
import getpass

API_KEY = getpass.getpass("Enter your API key: ")
os.environ["OPENAI_API_KEY"] = API_KEY

Enter your API key: ··········
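When the notebook is re-run in the same session, the key is usually already in the environment. The helper below is a small sketch of one way to prompt only when the variable is missing; `ensure_api_key` is a hypothetical name and is not part of LangChain or the OpenAI client.

```python
import getpass
import os


def ensure_api_key(env_var="OPENAI_API_KEY", prompt=getpass.getpass):
    """Return the key from the environment, prompting only when it is missing.

    Hypothetical helper for illustration; `prompt` can be swapped out in tests.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        key = prompt("Enter your API key: ").strip()
        if not key:
            raise ValueError(f"No value provided for {env_var}")
        os.environ[env_var] = key
    return key
```

Calling `ensure_api_key()` once at the top of the notebook would then replace the two assignment lines in the cell above.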


In [None]:
!pip show langchain

Name: langchain
Version: 0.2.1
Summary: Building applications with LLMs through composability
Home-page: https://github.com/langchain-ai/langchain
Author: 
Author-email: 
License: MIT
Location: /usr/local/lib/python3.10/dist-packages
Requires: aiohttp, async-timeout, langchain-core, langchain-text-splitters, langsmith, numpy, pydantic, PyYAML, requests, SQLAlchemy, tenacity
Required-by: langchain-community


#Data Parsing and Storage

In [None]:
import pandas as pd

# Load the dataset from the local Colab path
csv_path = '/content/sample_data/cars.csv'
df = pd.read_csv(csv_path)

# Display the first few rows of the dataset
df.head()


    id  Car Brand     Car Model  Car Color  Year of Manufacture
0  1.0       Ford    Club Wagon       Teal               1993.0
1  2.0   Cadillac  Escalade ESV     Fuscia               2007.0
2  3.0      Mazda    Miata MX-5     Orange               2009.0
3  4.0       Audi            A4     Orange               2005.0
4  5.0     Nissan        Maxima       Pink               1994.0
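The `id` and `Year of Manufacture` columns appear as floats (`1.0`, `1993.0`) in the preview above. One common cause is missing values, since pandas stores an integer column that contains NaN as floats. The sketch below assumes that is what happens here, which the preview alone does not confirm, and shows one way to drop fully empty rows and restore integer types. The inline CSV and the `clean_cars` helper are illustrative.

```python
import io

import pandas as pd

# Illustrative CSV: the last line is an empty record, which forces the
# numeric columns to float64 when the file is loaded.
RAW = """id,Car Brand,Car Model,Car Color,Year of Manufacture
1,Ford,Club Wagon,Teal,1993
2,Cadillac,Escalade ESV,Fuscia,2007
,,,,
"""


def clean_cars(frame):
    """Drop rows with no data and store whole-number columns as nullable integers."""
    cleaned = frame.dropna(how="all")
    return cleaned.astype({"id": "Int64", "Year of Manufacture": "Int64"})


raw_df = pd.read_csv(io.StringIO(RAW))
clean_df = clean_cars(raw_df)
print(raw_df.dtypes["id"], "->", clean_df.dtypes["id"])
```

The nullable `Int64` dtype keeps the columns integer-typed even if some rows with partial data remain.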


#Initialization of LangChain CSV Agent for Command Interpretation and Data Filtering

In [None]:
from langchain_experimental.agents import create_csv_agent
from langchain.llms import OpenAI

In [None]:
agent = create_csv_agent(OpenAI(temperature=0),
                         '/content/sample_data/cars.csv',
                         verbose=True)

In [None]:
agent

AgentExecutor(verbose=True, agent=RunnableAgent(runnable=RunnableAssign(mapper={
  agent_scratchpad: RunnableLambda(lambda x: format_log_to_str(x['intermediate_steps']))
})
| PromptTemplate(input_variables=['agent_scratchpad', 'input'], partial_variables={'df_head': '|    |   id | Car Brand   | Car Model    | Car Color   |   Year of Manufacture |\n|---:|-----:|:------------|:-------------|:------------|----------------------:|\n|  0 |    1 | Ford        | Club Wagon   | Teal        |                  1993 |\n|  1 |    2 | Cadillac    | Escalade ESV | Fuscia      |                  2007 |\n|  2 |    3 | Mazda       | Miata MX-5   | Orange      |                  2009 |\n|  3 |    4 | Audi        | A4           | Orange      |                  2005 |\n|  4 |    5 | Nissan      | Maxima       | Pink        |                  1994 |', 'tools': 'python_repl_ast - A Python shell. Use this to execute python commands. Input should be a valid python command. When using this tool, sometimes outp

#Testing

In [None]:
agent.run("Get me all Audi cars with color red")

> Entering new AgentExecutor chain...
Thought: I need to filter the dataframe by car brand and car color
Action: python_repl_ast
Action Input: df[(df['Car Brand'] == 'Audi') & (df['Car Color'] == 'Red')]
          id Car Brand Car Model Car Color  Year of Manufacture
2277  2278.0      Audi    riolet       Red               1995.0
2683  2684.0      Audi        A5       Red               2012.0
3236  3237.0      Audi        S4       Red               2001.0
4203  4204.0      Audi        TT       Red               2008.0
5453  5454.0      Audi    riolet       Red               1997.0
5733  5734.0      Audi        S6       Red               1995.0
6003  6004.0      Audi        A6       Red               1996.0
6550  6551.0      Audi        Q7       Red               2011.0
8338  8339.0      Audi        TT       Red               2003.0
9486  9487.0      Audi        TT       Red               2000.0
I now know the final answer
Final An

"df[(df['Car Brand'] == 'Audi') & (df['Car Color'] == 'Red')]"
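Here the agent's final answer is the filter expression as a string rather than the matching rows. Running the same filter directly in pandas returns the rows themselves. The sketch below is a hedged illustration: `filter_equals` is a hypothetical helper, the sample rows are taken from outputs shown in this notebook, and the comparison ignores case so that `red` in a command matches `Red` in the data.

```python
import pandas as pd

# A few rows taken from outputs shown in this notebook.
sample = pd.DataFrame(
    {
        "id": [4, 37, 2684, 3237],
        "Car Brand": ["Audi", "Nissan", "Audi", "Audi"],
        "Car Model": ["A4", "Altima", "A5", "S4"],
        "Car Color": ["Orange", "Orange", "Red", "Red"],
        "Year of Manufacture": [2005, 2002, 2012, 2001],
    }
)


def filter_equals(frame, criteria):
    """Return rows where each text column equals the requested value, ignoring case."""
    mask = pd.Series(True, index=frame.index)
    for column, value in criteria.items():
        mask &= frame[column].str.lower() == value.lower()
    return frame[mask]


red_audis = filter_equals(sample, {"Car Brand": "audi", "Car Color": "red"})
print(red_audis)
```

Comparing the rows returned this way with the agent's intermediate observation is a quick check that the generated expression did what the command asked.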

In [None]:
agent.run("Get all Nissan Altima")

> Entering new AgentExecutor chain...
Thought: I need to filter the dataframe for rows where the Car Brand is Nissan and the Car Model is Altima
Action: python_repl_ast
Action Input: df[(df['Car Brand'] == 'Nissan') & (df['Car Model'] == 'Altima')]
          id Car Brand Car Model   Car Color  Year of Manufacture
36      37.0    Nissan    Altima      Orange               2002.0
475    476.0    Nissan    Altima      Orange               2000.0
512    513.0    Nissan    Altima  Aquamarine               2011.0
864    865.0    Nissan    Altima      Indigo               2009.0
889    890.0    Nissan    Altima      Orange               1995.0
1397  1398.0    Nissan    Altima      Fuscia               2006.0
1689  1690.0    Nissan    Altima     Crimson               1997.0
2047  2048.0    Nissan    Altima      Indigo               2012.0
2757  2758.0    Nissan    Altima      Violet               2013.0
2968  2969.0    Nissan    Altima      Yellow        

"df[(df['Car Brand'] == 'Nissan') & (df['Car Model'] == 'Altima')]"

In [None]:
agent.run("get all cars that were manufactured between 2000 and 2006")

> Entering new AgentExecutor chain...
Thought: I need to filter the dataframe based on the Year of Manufacture column
Action: python_repl_ast
Action Input: df[(df['Year of Manufacture'] >= 2000) & (df['Year of Manufacture'] <= 2006)]
          id    Car Brand                Car Model   Car Color  \
3        4.0         Audi                       A4      Orange   
5        6.0      Bentley  Continental Flying Spur      Fuscia   
6        7.0  Lamborghini                 Gallardo      Violet   
12      13.0      Hyundai                  Elantra         Red   
30      31.0   Volkswagen               New Beetle      Purple   
...      ...          ...                      ...         ...   
9978  9979.0        Dodge                  Caravan  Aquamarine   
9991  9992.0        Honda                     CR-V        Teal   
9993  9994.0          GMC                    Yukon      Purple   
9994  9995.0         Ford                  Contour     Crimson   
9

"df[(df['Year of Manufacture'] >= 2000) & (df['Year of Manufacture'] <= 2006)]"
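The range filter the agent generated combines two comparisons. pandas also provides `Series.between`, which expresses the same inclusive range in one call. The sketch below uses a few rows from outputs shown in this notebook; `year_range` is a hypothetical helper name.

```python
import pandas as pd

# A few rows taken from outputs shown in this notebook.
years = pd.DataFrame(
    {
        "Car Model": ["Club Wagon", "A4", "Escalade ESV", "Altima"],
        "Year of Manufacture": [1993, 2005, 2007, 2000],
    }
)


def year_range(frame, start, end):
    """Return cars manufactured from `start` to `end`, both bounds included."""
    return frame[frame["Year of Manufacture"].between(start, end)]


in_range = year_range(years, 2000, 2006)

# The two-comparison form used by the agent selects the same rows.
explicit = years[
    (years["Year of Manufacture"] >= 2000) & (years["Year of Manufacture"] <= 2006)
]
print(in_range.equals(explicit))
print(in_range)
```

Both forms include the boundary years, so a car from 2000 or 2006 is kept.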

#Example of Command Interpretation using TensorFlow and PyTorch

In [None]:
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Tokenize the sample commands and pad them to a common length
tokenizer = Tokenizer()
sample_commands = [
    "Get me all Toyota cars with color red",
    "Get all Honda Civic",
    "Get all cars that were manufactured between 2000 and 2006"
]
tokenizer.fit_on_texts(sample_commands)
sequences = tokenizer.texts_to_sequences(sample_commands)
padded_sequences = pad_sequences(sequences, padding='post')

# Define the TensorFlow model: embedding -> LSTM -> per-token dense layer
class CommandParserModel(tf.keras.Model):
    def __init__(self, vocab_size, embedding_dim, rnn_units):
        super().__init__()
        self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
        self.rnn = tf.keras.layers.LSTM(rnn_units, return_sequences=True, return_state=True)
        self.dense = tf.keras.layers.Dense(vocab_size)

    def call(self, x, states=None, return_state=False):
        x = self.embedding(x)
        # When states is None, the LSTM starts from its default zero state
        x, state_h, state_c = self.rnn(x, initial_state=states)
        x = self.dense(x)
        if return_state:
            return x, (state_h, state_c)
        return x

# Create the model (index 0 is reserved for padding, hence the +1)
vocab_size = len(tokenizer.word_index) + 1
embedding_dim = 64
rnn_units = 128
tf_model = CommandParserModel(vocab_size, embedding_dim, rnn_units)

# Example inference. The model is untrained, so the output is a tensor of
# arbitrary logits with shape (batch, sequence_length, vocab_size).
example_input = tf.constant(padded_sequences)
predictions = tf_model(example_input)
print(predictions)
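To make the tokenization step concrete, the following plain-Python sketch approximates what `Tokenizer` and `pad_sequences(..., padding='post')` produce for the sample commands: words are lowercased, indexed from 1 in order of decreasing frequency, and shorter sequences are right-padded with 0. It is a simplified stand-in (for example, it does not strip punctuation as the Keras tokenizer does), and `build_word_index`, `encode`, and `pad_post` are hypothetical names.

```python
from collections import Counter

sample_commands = [
    "Get me all Toyota cars with color red",
    "Get all Honda Civic",
    "Get all cars that were manufactured between 2000 and 2006",
]


def build_word_index(texts):
    """Map each word to an integer, most frequent first, starting at 1 (0 is padding)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # sorted() is stable, so equally frequent words keep their first-seen order.
    ordered = sorted(counts, key=lambda word: -counts[word])
    return {word: i for i, word in enumerate(ordered, start=1)}


def encode(text, word_index):
    """Convert a command to a list of word ids, skipping unknown words."""
    return [word_index[w] for w in text.lower().split() if w in word_index]


def pad_post(sequences, pad_value=0):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_value] * (max_len - len(seq)) for seq in sequences]


word_index = build_word_index(sample_commands)
padded = pad_post([encode(c, word_index) for c in sample_commands])
for row in padded:
    print(row)
```

Each printed row has the same length, which is what allows the commands to be stacked into a single tensor and passed to the model as one batch.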


In [None]:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset

# This cell reuses sample_commands, tokenizer, and pad_sequences from the TensorFlow cell

class CommandDataset(Dataset):
    def __init__(self, commands, tokenizer):
        self.commands = commands
        self.tokenizer = tokenizer
        self.sequences = tokenizer.texts_to_sequences(commands)
        self.padded_sequences = pad_sequences(self.sequences, padding='post')

    def __len__(self):
        return len(self.commands)

    def __getitem__(self, idx):
        # Embedding layers expect integer (long) token ids
        return torch.tensor(self.padded_sequences[idx], dtype=torch.long)

# Define the PyTorch model: embedding -> LSTM -> per-token linear layer
class CommandParser(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.rnn = nn.LSTM(embedding_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = self.embedding(x)
        x, _ = self.rnn(x)
        x = self.fc(x)
        return x

# Create the dataset and DataLoader
command_dataset = CommandDataset(sample_commands, tokenizer)
command_loader = DataLoader(command_dataset, batch_size=2, shuffle=True)

# Create the model (index 0 is reserved for padding, hence the +1)
vocab_size = len(tokenizer.word_index) + 1
embedding_dim = 64
hidden_dim = 128
output_dim = vocab_size
torch_model = CommandParser(vocab_size, embedding_dim, hidden_dim, output_dim)

# Example inference on one batch. The model is untrained, so the logits are arbitrary.
with torch.no_grad():
    for batch in command_loader:
        output = torch_model(batch)
        print(output)
        break
