## **"Automated State & Country Identification and Address Completion Using LangChain and OpenAI"**

**Objective:**
* This project uses LangChain and OpenAI's GPT model (`gpt-3.5-turbo`) to fill in missing State and Country fields in a dataset. The model infers the state from each row's Pincode, Location, and Place values, and the country from the state, producing a more complete and structured dataset for analysis.

**Step-1 : Load the necessary libraries**

In [2]:
import os 
import pandas as pd 
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.schema.messages import SystemMessage, HumanMessage

**Step-2 : Enter your API key**

In [3]:
os.environ["OPENAI_API_KEY"] = "Add_your_API_key"  # replace with your own key
llm = ChatOpenAI(model_name = "gpt-3.5-turbo", temperature = 0.3)
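
Hard-coding the key in a cell makes it easy to leak when the notebook is shared. One alternative, shown here as a minimal sketch, is to read the key from the environment and fall back to a hidden prompt. `resolve_api_key` is a hypothetical helper written for this illustration; it is not part of LangChain or the OpenAI client.

```python
import os
from getpass import getpass


def resolve_api_key(env=os.environ, prompt=getpass):
    """Return the OpenAI key from the environment, prompting only if it is absent.

    Hypothetical helper for this notebook, not a library function.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        # Fall back to a hidden prompt so the key never appears in the notebook.
        key = prompt("OpenAI API key: ").strip()
    if not key:
        raise ValueError("No OpenAI API key provided.")
    return key


# Usage in the notebook (not executed here):
# os.environ["OPENAI_API_KEY"] = resolve_api_key()
```

The `env` and `prompt` parameters exist only so the function can be exercised without a real key or an interactive session.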


**Step-3 : Load the updated csv file**

In [4]:
file_path = "E:\\Projects\\updated_pincodes1.csv"
df = pd.read_csv(file_path)

In [5]:
df.head()

Unnamed: 0,Company Name,Street Name,Place,Location,Pincode,Address,text
0,Wipro,Dodda Kannelli,Sarjapur Road,Bangalore,560035,"Dodda Kannelli, Sarjapur Road, Bangalore","Dodda Kannelli, Sarjapur Road, Bangalore"
1,Wipro,No. 72,Keonics Electronic City,Bangalore,560100,"No. 72, Keonics Electronic City, Bangalore","No. 72, Keonics Electronic City, Bangalore"
2,Wipro,Survey No. 203/1,Manikonda Village,Hyderabad,500089,"Survey No. 203/1, Manikonda Village, Hyderabad","Survey No. 203/1, Manikonda Village, Hyderabad"
3,Wipro,Plot No. 2,MIDC,"Rajiv Gandhi Infotech Park ,Hinjewadi,pune",411057,"Plot No. 2, MIDC, Rajiv Gandhi Infotech Park...","Plot No. 2, MIDC, Rajiv Gandhi Infotech Park..."
4,TCS,185,Lloyds Road,"Gopalapuram,Chennai",600086,"185, Lloyds Road, Gopalapuram,Chennai","185, Lloyds Road, Gopalapuram,Chennai"
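
The `Unnamed: 0` column in the preview is what pandas shows when a CSV's first column has no header, which typically happens when a row index was written out with the file. A minimal sketch, assuming that first column really is a saved index: read it back as the index and keep `Pincode` as text, since pincodes are identifiers rather than quantities. The inline sample below stands in for the author's file and reuses two rows from the preview.

```python
import io

import pandas as pd

# Two rows copied from the preview; the unnamed first column is the saved index.
sample_csv = io.StringIO(
    ",Company Name,Street Name,Place,Location,Pincode\n"
    "0,Wipro,Dodda Kannelli,Sarjapur Road,Bangalore,560035\n"
    "1,Wipro,No. 72,Keonics Electronic City,Bangalore,560100\n"
)

sample_df = pd.read_csv(
    sample_csv,
    index_col=0,                  # treat the first column as the index
    dtype={"Pincode": "string"},  # keep pincodes as text, not integers
)
```

With the real file, the same two arguments would be passed to `pd.read_csv(file_path, ...)`.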


**Step-4 : Create an empty column in the dataset**

In [6]:
if "State"  not in df.columns:
    df["State"] = None

In [19]:
df = df.drop(columns = ["text"])  # "text" duplicates "Address"; axis is not needed with columns=

In [7]:
def get_state(pincode, location, place):
    """Use OpenAI to predict the Indian state from a pincode, location and place."""
    messages = [
        SystemMessage(content = "You are an AI assistant that provides the correct Indian State Name based on the Pincode, Location and Place."),
        HumanMessage(content = f"Which state does the following belong to?\nPincode : {pincode}\nLocation : {location}\nPlace : {place}\nReturn only the state name.")
    ]
    # invoke() replaces the deprecated direct call llm(messages)
    response = llm.invoke(messages)
    return response.content.strip()
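
The model can ignore the "return only the state name" instruction and answer with a full sentence, as happens for row 3 in the final output of this notebook. A minimal sketch of one way to guard against that: accept an answer only if it names exactly one known state or union territory. `extract_state` and `INDIAN_STATES_AND_UTS` are hypothetical names introduced for this illustration.

```python
import re

# The 28 states and 8 union territories of India.
INDIAN_STATES_AND_UTS = [
    "Andhra Pradesh", "Arunachal Pradesh", "Assam", "Bihar", "Chhattisgarh",
    "Goa", "Gujarat", "Haryana", "Himachal Pradesh", "Jharkhand", "Karnataka",
    "Kerala", "Madhya Pradesh", "Maharashtra", "Manipur", "Meghalaya",
    "Mizoram", "Nagaland", "Odisha", "Punjab", "Rajasthan", "Sikkim",
    "Tamil Nadu", "Telangana", "Tripura", "Uttar Pradesh", "Uttarakhand",
    "West Bengal",
    "Andaman and Nicobar Islands", "Chandigarh",
    "Dadra and Nagar Haveli and Daman and Diu", "Delhi", "Jammu and Kashmir",
    "Ladakh", "Lakshadweep", "Puducherry",
]


def extract_state(raw_answer):
    """Return the single state/UT named in the model's answer, or None if unclear.

    Hypothetical helper, not part of LangChain.
    """
    text = raw_answer.strip()
    matches = [
        name
        for name in INDIAN_STATES_AND_UTS
        if re.search(rf"\b{re.escape(name)}\b", text, flags=re.IGNORECASE)
    ]
    # Accept the answer only when exactly one known name appears in it.
    return matches[0] if len(matches) == 1 else None
```

Rows for which this returns `None` could be left empty and reviewed by hand rather than stored as free text in the `State` column.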

**Step-5 : Fill missing values in the "State" column**

In [8]:
df["State"] = df.apply(
    lambda row: get_state(row["Pincode"], row["Location"], row["Place"])
    if pd.isna(row["State"])
    else row["State"],
    axis = 1,
)
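
`df.apply` sends one request per row, even when several rows share the same pincode, location and place. A minimal sketch of avoiding the repeats with `functools.lru_cache`; `fake_get_state` is a hypothetical offline stand-in for `get_state` so the example runs without an API key.

```python
from functools import lru_cache

call_count = 0


def fake_get_state(pincode, location, place):
    """Stand-in for get_state so the sketch runs without an API key (hypothetical)."""
    global call_count
    call_count += 1
    return {"560100": "Karnataka", "500081": "Telangana"}.get(str(pincode), "Unknown")


@lru_cache(maxsize=None)
def cached_state(pincode, location, place):
    # Arguments must be hashable; the strings and integers in a row are.
    return fake_get_state(pincode, location, place)


rows = [
    (560100, "Bangalore", "Keonics Electronic City"),
    (500081, "Madhapur,Hyderabad", "Software Units Layout"),
    (560100, "Bangalore", "Keonics Electronic City"),  # repeat of the first row
]
states = [cached_state(*row) for row in rows]
```

Here three rows trigger only two underlying calls. In the notebook, the same decorator could wrap `get_state` itself.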


In [11]:
if "Country" not in df.columns:
    df["Country"] = None

In [13]:
def get_country(state):
    """Use OpenAI to predict the country based on the state."""
    messages = [
        SystemMessage(content = "You are an AI assistant that provides the correct Country Name based on the State."),
        HumanMessage(content = f"Which country does the following belong to?\nState : {state}\nReturn only the country name.")
    ]
    # invoke() replaces the deprecated direct call llm(messages)
    response = llm.invoke(messages)
    return response.content.strip()

In [14]:
df["Country"] = df.apply(lambda row : get_country(row["State"]) if pd.isna(row["Country"]) else row["Country"], axis = 1)
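
Because the country depends only on the state, it is enough to ask once per distinct state and map the answers back onto the rows. A minimal sketch under that assumption; `fake_get_country` is a hypothetical offline stand-in for `get_country`, and `demo` is a toy frame rather than the author's data.

```python
import pandas as pd


def fake_get_country(state):
    """Stand-in for get_country so the sketch runs offline (hypothetical)."""
    return "India"


demo = pd.DataFrame(
    {"State": ["Karnataka", "Karnataka", "Telangana", "Tamil Nadu", "Telangana"]}
)

# Query once per distinct state, then broadcast the answers back to every row.
unique_states = demo["State"].dropna().unique()
country_by_state = {state: fake_get_country(state) for state in unique_states}
demo["Country"] = demo["State"].map(country_by_state)
```

Five rows need only three lookups here; the saving grows with the number of rows per state.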

In [20]:
df.head(21)

Unnamed: 0,Company Name,Street Name,Place,Location,Pincode,Address,State,Country
0,Wipro,Dodda Kannelli,Sarjapur Road,Bangalore,560035,"Dodda Kannelli, Sarjapur Road, Bangalore",Karnataka,India
1,Wipro,No. 72,Keonics Electronic City,Bangalore,560100,"No. 72, Keonics Electronic City, Bangalore",Karnataka,India
2,Wipro,Survey No. 203/1,Manikonda Village,Hyderabad,500089,"Survey No. 203/1, Manikonda Village, Hyderabad",Telangana,India
3,Wipro,Plot No. 2,MIDC,"Rajiv Gandhi Infotech Park ,Hinjewadi,pune",411057,"Plot No. 2, MIDC, Rajiv Gandhi Infotech Park...","The Pincode 411057, Location Rajiv Gandhi Info...",India
4,TCS,185,Lloyds Road,"Gopalapuram,Chennai",600086,"185, Lloyds Road, Gopalapuram,Chennai",Tamil Nadu,India
5,TCS,No. 1,Software Units Layout,"Madhapur,Hyderabad",500081,"No. 1, Software Units Layout, Madhapur,Hyderabad",Telangana,India
6,TCS,IT/ITES SEZ,Rajarhat,"New Town,Kolkata",700160,"IT/ITES SEZ, Rajarhat, New Town,Kolkata",West Bengal,India
7,TCS,No. 769,Anna Salai,Chennai,600002,"No. 769, Anna Salai, Chennai",Tamil Nadu,India
8,TCS,Deccan Park,Plot No. 1,"Software units layout,Madhapur,Hyderabad",500081,"Deccan Park, Plot No. 1, Software units layo...",Telangana,India
9,TCS,Think Campus,Electronic City Phase II,Bangalore,560100,"Think Campus, Electronic City Phase II, Banga...",Karnataka,India


**Step-6 : Download the updated file**

* To save the updated dataset as a CSV file, uncomment and run the code below.

In [18]:
# df.to_csv("updated_states.csv", index = False)

**Conclusion:**
* This project automates filling in missing state and country names using LangChain and OpenAI's GPT model, based on the existing Pincode, Location, and Place information. In the output shown, nine of the ten rows received a clean state name, while row 3 received a full sentence instead of a state name, so the model's answers should be validated before the data is used. With such checks in place, the approach reduces manual effort and can be applied to larger datasets that need geographical fields filled in.