## **Title: "Extracting and Visualizing Chinese Address Information Using OpenAI and Folium"**

**Objective:**

* This project uses OpenAI's language models to infer the province and country for a set of Chinese addresses, translates the address fields into English, and plots the results on a map with Folium. It involves the following steps:

1. Import necessary libraries.
2. Set the OpenAI API Key.
3. Load the dataset containing Chinese addresses.
4. Display and check the dataframe information.
5. Add new columns for provinces and country if they don't exist.
6. Define functions to get provinces and country information using OpenAI.
7. Apply these functions to the dataframe to populate the new columns.
8. Save the updated dataframe to an Excel file.
9. Translate the address columns into English.
10. Visualize the data on a map using Folium.

1. **Import necessary libraries**:


In [1]:
import os

import pandas as pd
# Chat model wrapper and message types used to query OpenAI
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage

2. **Set OpenAI API Key**:

In [None]:
os.environ["OPENAI_API_KEY"] = "Replace your API key here"
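
Pasting the key into the notebook makes it easy to leak when the file is shared. A minimal sketch of an alternative, assuming the key was exported in the shell as `OPENAI_API_KEY` before the notebook was started; the helper name `load_api_key` is hypothetical:

```python
import os


def load_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key stored under `name`, or raise if it is missing."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before starting the notebook")
    return key
```

Failing early with a clear message is usually easier to debug than an authentication error raised in the middle of a `df.apply` call.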

3. **Load the dataset**:

In [3]:
df = pd.read_excel("E:\\Projects\\Datasets\\Chinese_addresses.xlsx")

4. **Display the first few rows of the dataframe**:


In [4]:
df.head()

|   | Company Name | Street Name | Place | City | Pincode |
|---|---|---|---|---|---|
| 0 | Wipro Chengdu Limited | 天府大道南段599号天府软件园D2栋 | 天府软件园 | 成都 | 610041 |
| 1 | Wipro Dalian Limited | 甘井子区亿达春田BEST城D7栋 | 亿达春田BEST城 | 大连 | 116033 |
| 2 | Cognizant Technology Solutions | 汇贤园9号楼6-9层，腾飞软件园 | 高新技术产业园区 | 大连 | - |


5. **Check the dataframe information**:

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Company Name  3 non-null      object
 1   Street Name   3 non-null      object
 2   Place         3 non-null      object
 3   City          3 non-null      object
 4   Pincode       3 non-null      object
dtypes: object(5)
memory usage: 252.0+ bytes


6. **Add a new column for provinces if it doesn't exist**:

In [6]:
if 'Provinces' not in df.columns:
    df["Provinces"] = None

7. **Define a function to get provinces using OpenAI**:

In [7]:
def get_provinces(pincode, street, place, city):
    """Use OpenAI to predict the province from the pincode, street, place, and city."""
    # Include every field in the prompt; `place` was previously accepted but never used
    address = f"{pincode}, {street}, {place}, {city}"
    messages = [
        SystemMessage(content="You are an AI assistant that extracts province information from Chinese addresses."),
        HumanMessage(content=f"Which province is the following address in? {address}\nReturn only the province name in Chinese.")
    ]

    # `llm` is the ChatOpenAI instance created in the next step
    response = llm(messages)
    return response.content.strip() if response.content else ""
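
Rows that share the same address fields trigger identical API calls. One way to avoid paying for them twice is to memoize the lookup. The sketch below wraps a stand-in function with `functools.lru_cache`; `fake_lookup` is a hypothetical placeholder for the real model call and is not part of the notebook.

```python
from functools import lru_cache

calls = []  # records every time the underlying lookup actually runs


def fake_lookup(pincode, street, place, city):
    """Hypothetical stand-in for get_provinces; the real version would call the model."""
    calls.append((pincode, street, place, city))
    return {"成都": "四川", "大连": "辽宁"}.get(city, "")


@lru_cache(maxsize=None)
def cached_lookup(pincode, street, place, city):
    """Memoized wrapper: identical arguments are answered from the cache."""
    return fake_lookup(pincode, street, place, city)


first = cached_lookup("116033", "甘井子区亿达春田BEST城D7栋", "亿达春田BEST城", "大连")
second = cached_lookup("116033", "甘井子区亿达春田BEST城D7栋", "亿达春田BEST城", "大连")
print(first, second, len(calls))
```

`lru_cache` requires hashable arguments, which holds for the strings and integers in this dataset. The second call never reaches `fake_lookup`, so `calls` records a single entry.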

8. **Initialize the language model and apply the function to the dataframe**:

In [None]:
llm = ChatOpenAI(model_name='gpt-4', temperature=0.5, openai_api_key="Replace your API key here")

In [11]:
df['Provinces'] = df.apply(lambda row: get_provinces(row['Pincode'], row['Street Name'], row['Place'], row['City']) if pd.isna(row['Provinces']) else row['Provinces'], axis=1)

9. **Display the updated dataframe**:

In [12]:
df["Provinces"].head()

0     四川
1    辽宁省
2    辽宁省
Name: Provinces, dtype: object
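
The output mixes two spellings: `四川` without the `省` suffix and `辽宁省` with it. If a consistent form matters downstream, the answers can be normalized against a list of known short names. A minimal sketch, assuming the short names of the 31 mainland province-level divisions as the reference list (extend it if other regions appear in the data); `normalize_province` is a hypothetical helper:

```python
# Assumed reference list: short names of mainland province-level divisions
PROVINCE_SHORT_NAMES = [
    "北京", "天津", "上海", "重庆",
    "河北", "山西", "辽宁", "吉林", "黑龙江", "江苏", "浙江", "安徽",
    "福建", "江西", "山东", "河南", "湖北", "湖南", "广东", "海南",
    "四川", "贵州", "云南", "陕西", "甘肃", "青海",
    "内蒙古", "广西", "西藏", "宁夏", "新疆",
]


def normalize_province(raw):
    """Return the short province name that `raw` starts with, or None if unrecognized."""
    text = raw.strip()
    for name in PROVINCE_SHORT_NAMES:
        if text.startswith(name):
            return name
    return None


print(normalize_province("辽宁省"))  # 辽宁
```

Applied with `df["Provinces"].map(normalize_province)`, a `None` result flags an answer that needs manual review. Note that normalizing before translation can change the translated strings, so any lookup keyed on them would need to match.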

10. **Add a new column for country if it doesn't exist**:

In [13]:
if "Country" not in df.columns:
    df["Country"] = None

11. **Define a function to get country using OpenAI**:

In [14]:
def get_country(Provinces, City, Place):
    """Use OpenAI to predict the country from the province, city, and place."""
    address1 = f"{Provinces},{City},{Place}"
    messages = [
        SystemMessage(content="You are an AI assistant that extracts country information from Chinese addresses."),
        HumanMessage(content=f"Which country is the following address in? {address1}\nReturn only the country name in Chinese.")
    ]
    response = llm(messages)
    return response.content.strip() if response.content else ""

12. **Apply the function to the dataframe**:

In [15]:
df["Country"] = df.apply(lambda row: get_country(row["Provinces"], row["City"], row["Place"]) if pd.isna(row["Country"]) else row["Country"], axis=1)
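
Each row makes a network call, so a single timeout or rate-limit error aborts the whole `apply`. A generic retry helper with exponential backoff is one way to soften that. This is a sketch, not part of the notebook: `call_with_retry` is a hypothetical name, and it catches every `Exception` for brevity, where real code would narrow that to the client's transient error types.

```python
import time


def call_with_retry(func, *args, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(*args); on an exception, wait and retry with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func(*args)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt
            sleep(base_delay * 2 ** attempt)
```

Inside the lambda, the call would then read `call_with_retry(get_country, row["Provinces"], row["City"], row["Place"])`. The `sleep` parameter exists only so the waiting can be replaced in tests.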

In [16]:
df.head()

|   | Company Name | Street Name | Place | City | Pincode | Provinces | Country |
|---|---|---|---|---|---|---|---|
| 0 | Wipro Chengdu Limited | 天府大道南段599号天府软件园D2栋 | 天府软件园 | 成都 | 610041 | 四川 | 中国 |
| 1 | Wipro Dalian Limited | 甘井子区亿达春田BEST城D7栋 | 亿达春田BEST城 | 大连 | 116033 | 辽宁省 | 中国 |
| 2 | Cognizant Technology Solutions | 汇贤园9号楼6-9层，腾飞软件园 | 高新技术产业园区 | 大连 | - | 辽宁省 | 中国 |


13. **Save the updated dataframe to an Excel file**:

In [None]:
# Writing .xlsx files requires the openpyxl package
df.to_excel("E:\\Projects\\updated_chinese_address.xlsx", index=False)

14. **Translate the dataset into English**:

In [None]:
llm = ChatOpenAI(model = "gpt-4", temperature = 0, openai_api_key = "Replace your API key here")

In [19]:
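def translate_text(text):
    """Translate Chinese text into English; return non-string or blank values unchanged."""
    # NaN is a float, so the isinstance check also covers missing values
    if not isinstance(text, str) or text.strip() == "":
        return text

    prompt = (
        "Translate the following Chinese text into English while preserving its meaning:\n"
        f"Text: {text}\n"
        "Translation:"
    )
    response = llm([HumanMessage(content=prompt)])
    return response.content.strip()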

In [20]:
columns_to_translate = ["Street Name", "Place", "City", "Provinces", "Country"]

for col in columns_to_translate:
    df[col] = df[col].apply(translate_text)
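
Re-running the loop above sends the already translated English text back to the model. A cheap guard is to translate only strings that still contain Chinese characters. The sketch below checks for code points in the CJK Unified Ideographs block (U+4E00 to U+9FFF), which covers common Chinese characters but not every rare one; `contains_chinese` is a hypothetical helper.

```python
import re

# Matches any character in the CJK Unified Ideographs block
CJK_PATTERN = re.compile(r"[\u4e00-\u9fff]")


def contains_chinese(text):
    """Return True if `text` is a string with at least one CJK ideograph."""
    return isinstance(text, str) and CJK_PATTERN.search(text) is not None
```

The loop body could then become `df[col].apply(lambda v: translate_text(v) if contains_chinese(v) else v)`, which leaves English text, placeholders such as `-`, and missing values untouched.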

In [21]:
df.head()

|   | Company Name | Street Name | Place | City | Pincode | Provinces | Country |
|---|---|---|---|---|---|---|---|
| 0 | Wipro Chengdu Limited | No. 599, South Section of Tianfu Avenue, Build... | Tianfu Software Park | Chengdu | 610041 | Sichuan | China |
| 1 | Wipro Dalian Limited | D7 Building, Yida Chuntian BEST City, Ganjingz... | Yida Chuntian BEST City | Dalian | 116033 | Liaoning Province | China |
| 2 | Cognizant Technology Solutions | Building 9, Floors 6-9, Huixian Garden, Tengfe... | High-tech Industrial Park | Dalian | - | Liaoning Province | China |


15. **Visualize the data using Folium**:


In [23]:
import folium

# Create a map centered around China
china_map = folium.Map(location=[35.8617, 104.1954], zoom_start=4)

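# Map translated province names to the coordinates of their capitals (Chengdu, Shenyang).
# The keys must match the strings in df['Provinces'] exactly, including any "Province" suffix.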
province_coordinates = {
    'Sichuan': [30.5728, 104.0668],
    'Liaoning Province': [41.8057, 123.4315]
}

# Add markers to the map for each province
for index, row in df.iterrows():
    province = row['Provinces']
    if province in province_coordinates:
        folium.Marker(
            location=province_coordinates[province],
            popup=row['Company Name'],
            tooltip=province
        ).add_to(china_map)

# Display the map
china_map
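
Two of the three rows share the `Liaoning Province` coordinates, so their markers are drawn at the same point and overlap. One option is to group company names by province first and place a single marker per province. A sketch of the grouping step in plain Python; the `rows` list mirrors the translated dataframe, and `group_companies` is a hypothetical helper:

```python
from collections import defaultdict


def group_companies(rows):
    """Collect company names under their province, preserving input order."""
    grouped = defaultdict(list)
    for province, company in rows:
        grouped[province].append(company)
    return dict(grouped)


rows = [
    ("Sichuan", "Wipro Chengdu Limited"),
    ("Liaoning Province", "Wipro Dalian Limited"),
    ("Liaoning Province", "Cognizant Technology Solutions"),
]

# One popup string per province, with company names separated by HTML line breaks
popups = {
    province: "<br>".join(names)
    for province, names in group_companies(rows).items()
}
```

Each `popups[province]` string can then be passed as the `popup` of a single `folium.Marker`. Popup strings are expected to be rendered as HTML, so the `<br>` tags should put each company on its own line.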