## **"Filling Missing Pincodes Based on Address Using LangChain and OpenAI."**

**Objective:**
* This script automatically fills in missing pincodes from address data using an OpenAI model through LangChain. 
It asks the model for a pincode wherever one is missing and updates the dataset accordingly.

**Process:**
1. Load the dataset containing missing pincodes.
2. Use LangChain with OpenAI's model to generate pincodes from address details.
3. Replace missing pincodes with the extracted ones.
4. Save the updated dataset.

**Step-1 : Load the necessary libraries**

* The libraries below let us call OpenAI models through LangChain and use its PromptTemplate class alongside the LangChain schema message classes.
* We also import the re library to extract the pincode from the model's response with a regular expression.

In [38]:
import pandas as pd 
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
import os 
import re

**Step-2 : Setup Open AI Key**

* Provide your OpenAI API key below.

In [39]:
os.environ["OPENAI_API_KEY"] = "Add your API key here"
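
A hard-coded key is easy to leak when a notebook is shared. As a minimal sketch, the key could instead be read from the environment and checked before any model is created. `require_api_key` is a hypothetical helper, not part of LangChain or the OpenAI client.

```python
import os

def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key stored under `name`, or raise if it is missing or empty."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running the notebook.")
    return key

# Example with an explicit mapping, so the sketch runs without a real key
print(require_api_key({"OPENAI_API_KEY": "sk-example"}))
```

Calling `require_api_key()` with no arguments reads from `os.environ`, so the key can be set once in the shell instead of in the notebook.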

**Step-3 : Load the Dataset**

* Load the dataset. The dataset provided here contains general addresses of companies, some of which are missing pincodes.

In [40]:
file_path="E:\\Projects\\Datasets\\Pincode_missing.csv"
df=pd.read_csv(file_path)

**Step-4 : Data Analysis and formatting**

In [41]:
df.head()

Unnamed: 0,Company Name,Street Name,Place,Location,Pincode
0,Wipro,Dodda Kannelli,Sarjapur Road,Bangalore,
1,Wipro,No. 72,Keonics Electronic City,Bangalore,560100.0
2,Wipro,Survey No. 203/1,Manikonda Village,Hyderabad,
3,Wipro,Plot No. 2,MIDC,"Rajiv Gandhi Infotech Park ,Hinjewadi,pune",411057.0
4,TCS,185,Lloyds Road,"Gopalapuram,Chennai",


In [42]:
df.isnull().sum()

Company Name    0
Street Name     0
Place           0
Location        0
Pincode         8
dtype: int64

**Step-5 : Merge the data columns**

In [43]:
df["Address"] = df["Street Name"] + ", " + df["Place"] + ", " + df["Location"]
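
String concatenation with `+` in pandas yields NaN for the whole address if any one part is missing. The dataset here has no missing address parts, but as a hedged sketch, a small helper can join only the non-empty parts and tidy stray spaces. `build_address` is a hypothetical name introduced for illustration.

```python
import math

def build_address(*parts):
    """Join non-empty address parts with ', ', skipping None/NaN and extra spaces."""
    cleaned = []
    for part in parts:
        if part is None or (isinstance(part, float) and math.isnan(part)):
            continue
        text = " ".join(str(part).split())  # collapse repeated whitespace
        if text:
            cleaned.append(text)
    return ", ".join(cleaned)

print(build_address("185", "Lloyds Road", "Gopalapuram,Chennai"))
```

With the DataFrame, this could be applied row by row, for example `df.apply(lambda r: build_address(r["Street Name"], r["Place"], r["Location"]), axis=1)`.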


**Step-6 : Setup the model**

* The code below initializes a chat model (ChatOpenAI) from LangChain.

In [44]:
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.3,openai_api_key="Add your API key here")

**Step-7 : Setup the message**

* This function uses OpenAI to predict the missing pincode for a given address. It creates a conversation with the AI by providing a system message (AI's role) and a human message (the query). The AI's response is then searched for a 6-digit pincode, which is extracted and returned.

In [45]:
from langchain.schema import SystemMessage, HumanMessage

def get_pincode(address):
    """Use OpenAI to predict the missing pincode."""
    messages = [
        SystemMessage(content="You are an AI assistant that provides accurate pincodes based on address."),
        HumanMessage(content=f"What is the pincode for {address}?")  
    ]
    response = llm(messages)
    match = re.search(r'\b\d{6}\b', response.content)
    return match.group(0) if match else ""
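
The extraction step can be tried on its own, without calling the model. The sketch below isolates the regular expression and runs it on two made-up responses. `extract_pincode` and `PIN_RE` are hypothetical names, and the pattern is slightly stricter than the one above: it assumes a pincode never starts with 0.

```python
import re

PIN_RE = re.compile(r"\b[1-9]\d{5}\b")  # exactly six digits, not starting with 0

def extract_pincode(text):
    """Return the first six-digit pincode found in `text`, or "" if there is none."""
    match = PIN_RE.search(text)
    return match.group(0) if match else ""

# Made-up responses for illustration only
print(extract_pincode("The pincode for Manikonda Village, Hyderabad is 500089."))
print(repr(extract_pincode("Please provide more details about the address.")))
```

The word boundaries (`\b`) keep the pattern from matching six digits inside a longer number such as a phone number.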


**Step-8 : Fill Missing Pincodes**


* The code below calls get_pincode for every row whose pincode is missing and keeps the existing value otherwise.

In [46]:
df["Pincode"] = df.apply(lambda row: get_pincode(row["Address"]) if pd.isna(row["Pincode"]) else row["Pincode"], axis=1)
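
The `apply` call above queries the model once per missing row, including rows that repeat an address. A minimal sketch of the same fill step with a per-address cache is shown here on plain dictionaries so it runs without an API key. `fill_missing`, `is_missing`, and `fake_predict` are hypothetical names, and `fake_predict` only stands in for `get_pincode`.

```python
import math

def is_missing(value):
    """True for None, an empty string, or a float NaN."""
    return value is None or value == "" or (isinstance(value, float) and math.isnan(value))

def fill_missing(rows, predict):
    """Fill missing 'Pincode' values using predict(address), one call per distinct address."""
    cache = {}
    filled = []
    for row in rows:
        row = dict(row)  # copy so the input rows are not modified
        if is_missing(row.get("Pincode")):
            address = row["Address"]
            if address not in cache:
                cache[address] = predict(address)
            row["Pincode"] = cache[address]
        filled.append(row)
    return filled

calls = []

def fake_predict(address):
    """Stand-in for the model call; records which addresses were requested."""
    calls.append(address)
    return "560035"

rows = [
    {"Address": "Dodda Kannelli, Sarjapur Road, Bangalore", "Pincode": float("nan")},
    {"Address": "No. 72, Keonics Electronic City, Bangalore", "Pincode": 560100.0},
    {"Address": "Dodda Kannelli, Sarjapur Road, Bangalore", "Pincode": None},
]
result = fill_missing(rows, fake_predict)
print([r["Pincode"] for r in result], len(calls))
```

With the real data, the rows could come from `df.to_dict("records")` and `predict` could be `get_pincode`.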

In [47]:
df['Pincode'] = df['Pincode'].astype(int)
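
`astype(int)` raises a `ValueError` if any pincode is still an empty string, which happens whenever `get_pincode` finds no six-digit number in the response. As a hedged sketch, a tolerant converter could map each value to an integer or `None`. `to_pincode_int` is a hypothetical name.

```python
import math

def to_pincode_int(value):
    """Convert 560100.0, "560035" or "560035.0" to an int; return None if empty or invalid."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    try:
        return int(float(str(value).strip()))
    except (ValueError, OverflowError):
        return None

print([to_pincode_int(v) for v in [560100.0, "560035", "", float("nan")]])
```

One way to use it, assuming pandas' nullable integer dtype, would be `df["Pincode"].map(to_pincode_int).astype("Int64")`, which keeps unresolved rows as missing values instead of failing.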

**Step-9 : Save the updated file**

* The code below saves the dataset with the filled pincodes. Check the folder where the notebook is saved for the output file.

In [61]:
updated_file = "updated_pincodes1.csv"
df.to_csv(updated_file, index=False)

**Conclusion** 

* This script successfully automates the process of filling missing pincodes by leveraging OpenAI's language model. 
By analyzing address data, it ensures accurate and efficient completion of missing values, reducing manual effort. 
The updated dataset is saved for further use, improving data integrity and usability. 

* If the code raises errors or the updated CSV contains invalid values, use the code below.

In [49]:
# Load the dataset
"""file_path = "/mnt/data/updated_file2.csv"
df = pd.read_csv(file_path)

# Set up OpenAI API Key
os.environ["OPENAI_API_KEY"] = "your_openai_api_key_here"
llm = OpenAI()"""

In [50]:
# Define prompt template for pincode extraction
"""
prompt = PromptTemplate(
    input_variables=["address"],
    template="Extract the pincode from the following address: {address}"
)
"""

In [51]:
'''
def get_pincode(address):
    """Fetch pincode using LangChain's OpenAI model."""
    try:
        response = llm(prompt.format(address=address))
        return response.strip()
    except Exception as e:
        print(f"Error fetching pincode for {address}: {e}")
    return None
'''

In [52]:
# Fill missing pincodes
'''
df.loc[df['Pincode'].str.contains('Please provide', na=False), 'Pincode'] = df.apply(
    lambda row: get_pincode(row['Address']) if pd.isna(row['Pincode']) or 'Please provide' in str(row['Pincode']) else row['Pincode'],
    axis=1
)
'''

In [53]:
# Save updated CSV
'''
df.to_csv("/mnt/data/updated_pincodes.csv", index=False)
print("Updated pincodes saved successfully.")
'''

* We can also read the updated CSV and use it to build a chatbot that answers questions based on the information in the CSV file.

## **"Building a CSV-Based AI Question Answering System Using LangChain and OpenAI"**

**Objective:**
* This project aims to develop an AI-powered system that reads data from a CSV file and answers questions based on its contents. Using LangChain and OpenAI, the system processes structured text, converts it into searchable vector embeddings, and retrieves relevant information in response to user queries. This approach enhances data accessibility and automation in real-world applications like customer support, data analysis, and business intelligence.

**Step-1 : Load the necessary libraries**

In [54]:
import os 
import pandas as pd
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.document_loaders import DataFrameLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA

**Step-2 : Add OpenAI Key**

In [55]:
os.environ["OPENAI_API_KEY"] = "Add API Key here"

**Step-3 : Load the csv**
* Loads the CSV into a Pandas DataFrame or as LangChain Documents.

In [69]:
df = pd.read_csv("E:\\Projects\\updated_pincodes1.csv")

In [70]:
df.head()

Unnamed: 0,Company Name,Street Name,Place,Location,Pincode,Address,text
0,Wipro,Dodda Kannelli,Sarjapur Road,Bangalore,560035,"Dodda Kannelli, Sarjapur Road, Bangalore","Dodda Kannelli, Sarjapur Road, Bangalore"
1,Wipro,No. 72,Keonics Electronic City,Bangalore,560100,"No. 72, Keonics Electronic City, Bangalore","No. 72, Keonics Electronic City, Bangalore"
2,Wipro,Survey No. 203/1,Manikonda Village,Hyderabad,500089,"Survey No. 203/1, Manikonda Village, Hyderabad","Survey No. 203/1, Manikonda Village, Hyderabad"
3,Wipro,Plot No. 2,MIDC,"Rajiv Gandhi Infotech Park ,Hinjewadi,pune",411057,"Plot No. 2, MIDC, Rajiv Gandhi Infotech Park...","Plot No. 2, MIDC, Rajiv Gandhi Infotech Park..."
4,TCS,185,Lloyds Road,"Gopalapuram,Chennai",600086,"185, Lloyds Road, Gopalapuram,Chennai","185, Lloyds Road, Gopalapuram,Chennai"


**Step-4 : Load the csv as text**

In [71]:
# Ensure the DataFrame has a 'text' column for the DataFrameLoader
df['text'] = df['Address']

loader = DataFrameLoader(df, page_content_column='text')
docs = loader.load()
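
`DataFrameLoader` uses only the `page_content_column` as the document text and keeps the remaining columns as metadata. With `text` set to the address alone, the pincode is therefore not part of the text that gets embedded and searched. A possible variation, sketched under that assumption, builds a richer `text` value per row. `row_to_text` is a hypothetical helper.

```python
def row_to_text(row):
    """Build one searchable sentence from a row (a dict or a pandas Series)."""
    return (
        f"Company: {row['Company Name']}. "
        f"Address: {row['Address']}. "
        f"Pincode: {row['Pincode']}."
    )

example = {
    "Company Name": "Wipro",
    "Address": "Survey No. 203/1, Manikonda Village, Hyderabad",
    "Pincode": 500089,
}
print(row_to_text(example))
```

With the DataFrame, this could replace the assignment above, for example `df["text"] = df.apply(row_to_text, axis=1)`.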

**Step-5 : Split the text into chunks**

* The code below splits the text into smaller parts for efficient search.


In [72]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size = 500, chunk_overlap = 100)
split_docs = text_splitter.split_documents(docs)
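
By default `chunk_size` and `chunk_overlap` are counted in characters. Every address in this dataset is far shorter than 500 characters, so each document should stay a single chunk. The sketch below is a plain fixed-window chunker, not the recursive separator-based algorithm that `RecursiveCharacterTextSplitter` implements. It only illustrates how the two parameters interact, and `chunk_text` is a hypothetical name.

```python
def chunk_text(text, chunk_size=500, chunk_overlap=100):
    """Split `text` into windows of `chunk_size` characters that overlap by `chunk_overlap`."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

# A short address stays in one piece with the notebook's settings
print(chunk_text("Dodda Kannelli, Sarjapur Road, Bangalore"))
# Small numbers make the overlap visible
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```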

**Step-6 : Create a vector store**

* Create a vector database (FAISS) to store and retrieve similar text.


In [73]:
from langchain_community.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(openai_api_key=os.getenv("OPENAI_API_KEY"))


In [81]:
embeddings = OpenAIEmbeddings(openai_api_key="Add your API key here")
vectorstore = FAISS.from_documents(split_docs, embeddings)
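
Conceptually, the vector store maps each chunk to a vector and returns the chunks whose vectors are closest to the query vector. The toy sketch below imitates that idea with word-count vectors and cosine similarity from the standard library. It is not FAISS and does not use OpenAI embeddings, and `embed`, `cosine`, and `top_k` are hypothetical names.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector (real embeddings are dense floats)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_k(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

sample_docs = [
    "Dodda Kannelli, Sarjapur Road, Bangalore",
    "Survey No. 203/1, Manikonda Village, Hyderabad",
    "185, Lloyds Road, Gopalapuram,Chennai",
]
print(top_k("Wipro office in Hyderabad", sample_docs, k=1))
```

Only the Hyderabad address shares a word with the query, so it ranks first. A real embedding model can also match text that shares meaning rather than exact words.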

**Step-7 : Create a retrieval-based QA system**


In [82]:
qa = RetrievalQA.from_chain_type(llm = ChatOpenAI(model_name = 'gpt-3.5-turbo'), retriever = vectorstore.as_retriever())

**Step-8 : Ask questions**


In [84]:
query = "What is the pincode for XYZ address?"
response = qa.run(query)
print(response)

I don't have the information about the pincode for the addresses mentioned. You may need to look up the specific pincode for each address using an official postal service website or contact the respective local post office for accurate information.


In [88]:
query1 = "what is the pincode for hyderabad Location of Wipro according to the file"
response1= qa.run(query1)
print(response1)

Based on the locations provided in the file, the pincode for Hyderabad location of Wipro would be 500081.
