## MultiDataFrame Agents in LangChain

Like many other libraries, LangChain integrates with common Python libraries such as `pandas`. For additional documentation, see [LangChain-Pandas DataFrames](https://python.langchain.com/docs/integrations/tools/pandas/).

In [None]:
%pip install langchain langchain_experimental langchain_openai python-dotenv watermark openai tabulate -q

Note: you may need to restart the kernel to use updated packages.


In [5]:
import os
import warnings

import pandas as pd
from dotenv import load_dotenv
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent

# Load environment variables from .env file
load_dotenv()

openai_key = os.getenv("OPENAI_API_KEY")
hf_token = os.getenv("HUGGINGFACEHUB_API_TOKEN")

warnings.filterwarnings("ignore")
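
`load_dotenv()` does not complain when a variable is absent, and `os.getenv` simply returns `None`, so a missing key only surfaces further down the notebook. A minimal sketch of an earlier check, assuming the key is stored under `OPENAI_API_KEY` as above; `require_env` is a hypothetical helper written for illustration, not part of LangChain or python-dotenv:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message.

    Hypothetical helper for illustration; not part of LangChain or python-dotenv.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value


# Example usage (uncomment once the variable is defined):
# openai_key = require_env("OPENAI_API_KEY")
```

Failing at this point keeps the error next to its cause instead of inside the agent setup.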

## Single DataFrame Example

An agent can interact with a single DataFrame:

In [25]:
# Load the Titanic dataset from a URL

url = "https://raw.githubusercontent.com/adamerose/datasets/master/titanic.csv"
df = pd.read_csv(url)
print(df.shape)
df.head()

(891, 15)


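|  | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | True | NaN | Southampton | no | True |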


In [12]:
from langchain.agents.agent_types import AgentType
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain_openai import ChatOpenAI

In [18]:
agent = create_pandas_dataframe_agent(
    ChatOpenAI(temperature=0, model="gpt-3.5-turbo"),
    df,
    verbose=True,
    agent_type=AgentType.OPENAI_FUNCTIONS,
    allow_dangerous_code=True
)

In [19]:
agent.invoke("how many rows are there?")



> Entering new AgentExecutor chain...

Invoking: `python_repl_ast` with `{'query': 'df.shape[0]'}`

891
There are 891 rows in the dataframe.

> Finished chain.


{'input': 'how many rows are there?',
 'output': 'There are 891 rows in the dataframe.'}

## Multiple DataFrames

In [21]:
# Create a second DataFrame: copy df and fill missing ages with the mean age

df1 = df.copy()
df1["age"] = df1["age"].fillna(df1["age"].mean())
df1

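|  | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.000000 | 1 | 0 | 7.2500 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.000000 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.000000 | 0 | 0 | 7.9250 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.000000 | 1 | 0 | 53.1000 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.000000 | 0 | 0 | 8.0500 | S | Third | man | True | NaN | Southampton | no | True |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 886 | 0 | 2 | male | 27.000000 | 0 | 0 | 13.0000 | S | Second | man | True | NaN | Southampton | no | True |
| 887 | 1 | 1 | female | 19.000000 | 0 | 0 | 30.0000 | S | First | woman | False | B | Southampton | yes | True |
| 888 | 0 | 3 | female | 29.699118 | 1 | 2 | 23.4500 | S | Third | woman | False | NaN | Southampton | no | False |
| 889 | 1 | 1 | male | 26.000000 | 0 | 0 | 30.0000 | C | First | man | True | C | Cherbourg | yes | True |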


In [28]:
# Third DataFrame: same rows as df1 plus a derived column
df2 = df1.copy()
df2["Age_Multiplied"] = df1["age"] * 2
df2.head()

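|  | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone | Age_Multiplied |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | Southampton | no | False | 44.0 |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False | 76.0 |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | NaN | Southampton | yes | True | 52.0 |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | False | C | Southampton | yes | False | 70.0 |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | True | NaN | Southampton | no | True | 70.0 |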


In [29]:
agent = create_pandas_dataframe_agent(
    ChatOpenAI(temperature=0, model="gpt-3.5-turbo"),
    [df1,df2],
    verbose=True,
    agent_type=AgentType.OPENAI_FUNCTIONS,
    allow_dangerous_code=True
)

In [30]:
# Chain.run is deprecated; invoke returns a dict, so read the "output" key
agent.invoke("How many rows in the age column are different")["output"]



> Entering new AgentExecutor chain...

Invoking: `python_repl_ast` with `{'query': "df1['age'].nunique()"}`

89
There are 89 different values in the age column of the first dataframe.

> Finished chain.


'There are 89 different values in the age column of the first dataframe.'
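
The query the agent ran, `df1['age'].nunique()`, counts distinct ages in `df1`; it does not compare `df1` with `df2`. One way to cross-check an answer like this is to run the comparison directly in pandas. The sketch below uses a small hypothetical stand-in for the Titanic data (the `raw` frame and its values are made up for illustration) and rebuilds `df1` and `df2` the same way as above:

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data: two of the four ages are missing.
raw = pd.DataFrame({"age": [22.0, None, 26.0, None]})

# Same construction as above: fill missing ages with the mean, then add a derived column.
df1 = raw.copy()
df1["age"] = df1["age"].fillna(df1["age"].mean())
df2 = df1.copy()
df2["Age_Multiplied"] = df1["age"] * 2

# Row-wise comparison of the shared column (both frames have the same index).
rows_differing = int((df1["age"] != df2["age"]).sum())

# Distinct values within one frame, which is what nunique() reports.
distinct_ages = int(df1["age"].nunique())

# Columns present in df2 but not in df1.
extra_columns = sorted(set(df2.columns) - set(df1.columns))

print(rows_differing)  # 0: df2 copies df1's age column unchanged
print(distinct_ages)   # 3: 22.0, 24.0 (the mean) and 26.0
print(extra_columns)   # ['Age_Multiplied']
```

Because `df2` is built as a copy of `df1`, the same row-wise comparison on the notebook's frames should also give 0, so the agent's "89" answers a different question than a row-by-row diff would. Checking the query shown in the verbose log against the question asked is a cheap way to catch this kind of mismatch.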

These are deliberately simple examples, but the same DataFrame agents can be applied to deeper analysis and dataset management.