In [10]:
from langchain.agents.agent_types import AgentType
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
from langchain_experimental.agents.agent_toolkits import create_csv_agent

In [11]:
from getpass import getpass
import os

# Read the key interactively so it is never stored in the notebook.
OPENAI_API_KEY = getpass()
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

In [14]:
agent = create_csv_agent(
    ChatOpenAI(temperature=0, model="gpt-3.5-turbo-1106"),
    "titanic.csv",
    verbose=True,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
)

In [8]:
print(agent.agent.llm_chain.prompt.template)


You are working with a pandas dataframe in Python. The name of the dataframe is `df`.
You should use the tools below to answer the question posed of you:

python_repl_ast: A Python shell. Use this to execute python commands. Input should be a valid python command. When using this tool, sometimes output is abbreviated - make sure it does not look abbreviated before using it in your answer.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [python_repl_ast]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question


This is the result of `print(df.head())`:
{df_head}

Begin!
Question: {input}
{agent_scratchpad}
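
The template printed above has three placeholders: `{df_head}`, `{input}`, and `{agent_scratchpad}`. As a rough illustration of how they are filled before each model call, the sketch below applies plain `str.format` to a shortened copy of the template. The sample `df_head` text and the scratchpad contents are made up for illustration; LangChain's own prompt classes perform the real substitution.

```python
# Shortened copy of the prompt suffix printed above (not the full template).
TEMPLATE = (
    "This is the result of `print(df.head())`:\n"
    "{df_head}\n"
    "\n"
    "Begin!\n"
    "Question: {input}\n"
    "{agent_scratchpad}"
)

# Hypothetical values, for illustration only.
df_head = "   PassengerId  Survived  Pclass\n0            1         0       3"
question = "how many people have more than 3 siblings?"
scratchpad = "Thought:"  # grows as Action/Observation steps accumulate

prompt = TEMPLATE.format(
    df_head=df_head,
    input=question,
    agent_scratchpad=scratchpad,
)
print(prompt)
```

The scratchpad is the part that changes between steps: each completed Thought/Action/Observation round is appended to it, so the model sees its own earlier work on the next call.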


In [15]:
agent.run("how many people have more than 3 siblings?")

> Entering new AgentExecutor chain...
Thought: I can use the pandas dataframe to filter for people with more than 3 siblings and then count the number of rows.
Action: python_repl_ast
Action Input: df[df['SibSp'] > 3]['PassengerId'].count()
Observation: 30
Thought: There are 30 people with more than 3 siblings.
Final Answer: 30

> Finished chain.


'30'
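
Each step in the trace above follows the `Action:` / `Action Input:` format requested by the prompt. Below is a minimal sketch of pulling those two fields out of the model's text with a regular expression. The helper name `parse_step` is hypothetical, and this is not LangChain's own output parser.

```python
import re

# Hypothetical helper; LangChain ships its own output parser.
STEP_RE = re.compile(
    r"Action\s*:\s*(.*?)\s*\n\s*Action Input\s*:\s*(.*)",
    re.DOTALL,
)

def parse_step(text):
    """Return (tool, tool_input), or None when the text contains no action."""
    match = STEP_RE.search(text)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()

llm_output = (
    "Thought: I can use the pandas dataframe to filter for people with more "
    "than 3 siblings and then count the number of rows.\n"
    "Action: python_repl_ast\n"
    "Action Input: df[df['SibSp'] > 3]['PassengerId'].count()"
)

tool, tool_input = parse_step(llm_output)
print(tool)
print(tool_input)
```

If `titanic.csv` is the standard Titanic file, `SibSp` counts siblings and spouses aboard, so the 30 is strictly the number of passengers with more than 3 siblings or spouses.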

In [16]:
agent.run("what is the square root of the average age?")

> Entering new AgentExecutor chain...
Thought: I need to calculate the average age and then take the square root.
Action: python_repl_ast
Action Input: df['Age'].mean() ** 0.5
Observation: 5.449689683556195
Thought: The square root of the average age is 5.45
Final Answer: 5.45

> Finished chain.


'5.45'
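
`df['Age'].mean()` skips missing values by default (`skipna=True`), so the figure above is the square root of the mean over passengers whose age is recorded. Here is a stdlib-only sketch of the same computation on a tiny made-up sample, with `None` standing in for a missing age:

```python
import math
from statistics import mean

# Made-up ages for illustration; None marks a missing value.
ages = [22.0, 38.0, None, 35.0, None, 54.0]

# Drop missing entries first, mirroring pandas' default skipna=True.
known_ages = [age for age in ages if age is not None]
root_mean_age = math.sqrt(mean(known_ages))

print(round(root_mean_age, 2))
```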

In [18]:
agent.run("how many different values are there in the Age column?")

> Entering new AgentExecutor chain...
Thought: I can use the nunique() function to count the number of unique values in the Age column.
Action: python_repl_ast
Action Input: df['Age'].nunique()
Observation: 88
Thought: I now know the final answer
Final Answer: There are 88 different rows in the Age column.

> Finished chain.


'There are 88 different rows in the Age column.'

In [19]:
agent.run("which class of people have higher chances of survival?")

> Entering new AgentExecutor chain...
Thought: I need to calculate the survival rate for each passenger class.
Action: python_repl_ast
Action Input: df.groupby('Pclass')['Survived'].mean()
Observation: Pclass
1    0.629630
2    0.472826
3    0.242363
Name: Survived, dtype: float64
Thought: The first class passengers have the highest survival rate, followed by second class and then third class.
Final Answer: First class passengers have the highest chances of survival.

> Finished chain.


'First class passengers have the highest chances of survival.'
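
The mean of a 0/1 column is the share of ones, which is why `groupby('Pclass')['Survived'].mean()` yields a survival rate per class. The same idea, sketched without pandas on made-up `(Pclass, Survived)` pairs:

```python
from collections import defaultdict

# Made-up (Pclass, Survived) pairs for illustration.
rows = [(1, 1), (1, 0), (2, 1), (2, 0), (3, 0), (3, 0), (3, 1), (1, 1)]

# Pclass -> [survivors, passengers]
totals = defaultdict(lambda: [0, 0])
for pclass, survived in rows:
    totals[pclass][0] += survived
    totals[pclass][1] += 1

# Survival rate per class = survivors / passengers.
survival_rate = {
    pclass: survivors / passengers
    for pclass, (survivors, passengers) in sorted(totals.items())
}
best_class = max(survival_rate, key=survival_rate.get)

print(survival_rate)
print(best_class)
```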

In [20]:
agent.run("from which place did people mostly board from?")

> Entering new AgentExecutor chain...
Thought: I can use the value_counts() method on the 'Embarked' column to find out which place people mostly boarded from.
Action: python_repl_ast
Action Input: df['Embarked'].value_counts()
Observation: Embarked
S    644
C    168
Q     77
Name: count, dtype: int64
Thought: The people mostly boarded from 'S' (Southampton).
Final Answer: Southampton

> Finished chain.


'Southampton'
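
The counts above sum to 889. `value_counts()` drops missing values by default (`dropna=True`), so any rows without an `Embarked` value are not reflected in them. The observation also contains only the codes `S`, `C`, and `Q`; the name "Southampton" was supplied by the model. The sketch below reproduces the count on a made-up sample and maps codes to port names, assuming the standard Titanic data dictionary applies to this file.

```python
from collections import Counter

# Port codes from the standard Titanic data dictionary (assumed to apply here).
PORT_NAMES = {"S": "Southampton", "C": "Cherbourg", "Q": "Queenstown"}

# Made-up sample; None marks a missing value.
embarked = ["S", "C", "S", None, "Q", "S", "C"]

# Count only known codes, mirroring value_counts() with its default dropna=True.
counts = Counter(code for code in embarked if code is not None)
top_code, top_count = counts.most_common(1)[0]

print(PORT_NAMES[top_code], top_count)
```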