# PandasAI

PandasAI is a Python library that adds Generative AI capabilities to pandas, the popular data analysis and manipulation tool. It is designed to be used in conjunction with pandas, and is not a replacement for it.

PandasAI makes pandas (and the most widely used data analysis libraries) conversational, allowing you to ask questions of your data in natural language. For example, you can ask PandasAI to find all the rows in a DataFrame where the value of a column is greater than 5, and it will return a DataFrame containing only those rows.
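
For reference, the request in that example corresponds to an ordinary boolean-mask filter in plain pandas. The sketch below uses invented data and a made-up column named `value`; it shows the kind of result you would expect back, not the code PandasAI actually generates.

```python
import pandas as pd

# Hypothetical data; the column name "value" is an assumption for illustration.
df = pd.DataFrame({"value": [3, 8, 5, 12]})

# Plain-pandas equivalent of "find all rows where value is greater than 5".
filtered = df[df["value"] > 5]
print(filtered)
```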

You can also ask PandasAI to draw graphs, clean data, impute missing values, and generate features.

```
pip install pandasai
```

In [3]:
from pandasai import SmartDataframe
import pandas as pd 


df = pd.DataFrame({
    "country": [
        "United States",
        "United Kingdom",
        "France",
        "Germany",
        "Italy",
        "Spain",
        "Canada",
        "Australia",
        "Japan",
        "China",
    ],
    "gdp": [
        19294482071552,
        2891615567872,
        2411255037952,
        3435817336832,
        1745433788416,
        1181205135360,
        1607402389504,
        1490967855104,
        4380756541440,
        14631844184064,
    ],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})

In [None]:
from pandasai.llm import OpenAI

llm = OpenAI(api_token="YOUR TOKEN")

In [None]:
sdf = SmartDataframe(df, config={"llm": llm})

In [None]:
sdf[sdf['country'] == 'United States']

In [None]:
sdf.chat("Return the top 5 countries by GDP")

In [None]:
sdf.chat("What's the sum of the gdp of the 2 unhappiest countries?")

In [None]:
print(sdf.last_code_generated)
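
`last_code_generated` holds whatever the LLM produced, so its exact contents vary between runs and models. As a rough idea of what a correct answer to the previous question involves, here is a hand-written pandas sketch over the same data. It is not PandasAI output.

```python
import pandas as pd

# Same data as the DataFrame defined at the top of the notebook.
df = pd.DataFrame({
    "country": [
        "United States", "United Kingdom", "France", "Germany", "Italy",
        "Spain", "Canada", "Australia", "Japan", "China",
    ],
    "gdp": [
        19294482071552, 2891615567872, 2411255037952, 3435817336832,
        1745433788416, 1181205135360, 1607402389504, 1490967855104,
        4380756541440, 14631844184064,
    ],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
})

# Pick the two rows with the lowest happiness index, then add up their GDP.
unhappiest = df.nsmallest(2, "happiness_index")
total_gdp = int(unhappiest["gdp"].sum())

print(unhappiest["country"].tolist())
print(total_gdp)
```

Comparing a manual computation like this with the chat response is a simple way to sanity-check generated code.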

In [None]:
sdf.chat("Plot a chart of the gdp by country")

In [None]:
sdf.chat("Plot a histogram of the gdp by country, using a different color for each bar")

In [None]:
sdf.plot_bar_chart(x="country", y="gdp")

In [None]:
sdf.plot_pie_chart(labels="country", values="gdp")

## SmartDatalake

Sometimes you want to work with multiple dataframes at once and let the LLM decide which one(s) to use to answer each query. In such cases, use a SmartDatalake instead of a SmartDataframe.

The concept is very similar to the SmartDataframe, but instead of accepting a single dataframe as input, it accepts a list of them.
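
To see why more than one dataframe may be needed, consider a question such as "Who gets paid the most?" asked over separate employee and salary tables. Answering it by hand requires a join on the shared key, which is the kind of step the LLM has to work out for itself. A plain-pandas sketch of that join, written by hand rather than generated by PandasAI:

```python
import pandas as pd

employees = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Name": ["John", "Emma", "Liam", "Olivia", "William"],
})
salaries = pd.DataFrame({
    "EmployeeID": [1, 2, 3, 4, 5],
    "Salary": [5000, 6000, 4500, 7000, 5500],
})

# Join the two tables on their shared key, then pick the row with the top salary.
merged = employees.merge(salaries, on="EmployeeID")
top = merged.loc[merged["Salary"].idxmax()]

print(top["Name"], top["Salary"])
```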

In [5]:
from pandasai import SmartDatalake

In [6]:
employees_df = pd.DataFrame(
    {
        "EmployeeID": [1, 2, 3, 4, 5],
        "Name": ["John", "Emma", "Liam", "Olivia", "William"],
        "Department": ["HR", "Sales", "IT", "Marketing", "Finance"],
    }
)

salaries_df = pd.DataFrame(
    {
        "EmployeeID": [1, 2, 3, 4, 5],
        "Salary": [5000, 6000, 4500, 7000, 5500],
    }
)

lake = SmartDatalake(
    [employees_df, salaries_df],
    config={"llm": llm}
)
lake.chat("Who gets paid the most?")


In [None]:
print(lake.last_code_executed)

In [None]:
users_df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4, 5],
        "name": ["John", "Emma", "Liam", "Olivia", "William"]
    }
)
users = SmartDataframe(users_df, name="users")

photos_df = pd.DataFrame(
    {
        "id": [31, 32, 33, 34, 35],
        "user_id": [1, 1, 2, 4, 5]
    }
)
photos = SmartDataframe(photos_df, name="photos")

lake = SmartDatalake([users, photos], config={"llm": llm})
lake.chat("How many photos have been uploaded by John?")

In [None]:
print(lake.last_code_executed)

## Different LLMs

Although OpenAI GPT-3.5 and GPT-4 are currently the recommended models, we also support other models, such as Starcoder and Falcon.

You can use them as follows:

In [7]:
from pandasai import SmartDataframe
from pandasai.llm import Starcoder, Falcon

starcoder_llm = Starcoder(api_token="YOUR TOKEN")
falcon_llm = Falcon(api_token="YOUR TOKEN")

df1 = SmartDataframe(df, config={"llm": starcoder_llm})
df2 = SmartDataframe(df, config={"llm": falcon_llm})

print(df1.chat("Which country has the highest GDP?"))
print(df2.chat("Which one is the unhappiest country?"))

Please use langchain.llms.HuggingFaceHub instead, although please be
aware that it may perform poorly.

Please use langchain.llms.HuggingFaceHub instead, although please be
aware that it may perform poorly.

Unfortunately, I was not able to answer your question, because of the following error:

The remote server has responded with an error HTTP code: 400; Authorization header is correct, but the token seems invalid

Unfortunately, I was not able to answer your question, because of the following error:

The remote server has responded with an error HTTP code: 400; Authorization header is correct, but the token seems invalid

## LangChain LLMs

In [8]:
!pip install pandasai[langchain]


In [9]:
from pandasai import SmartDataframe
from langchain.llms import OpenAI
# from langchain.llms import Anthropic
# from langchain.llms import LlamaCpp

langchain_llm = OpenAI(openai_api_key="YOUR TOKEN", max_tokens=1000)
langchain_sdf = SmartDataframe(df, config={"llm": langchain_llm})
langchain_sdf.chat("Which are the top 5 countries by GDP?")

'Unfortunately, I was not able to answer your question, because of the following error:\n\nIncorrect API key provided: YOUR TOKEN. You can find your API key at https://platform.openai.com/account/api-keys.\n'
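
The error above is expected, since `"YOUR TOKEN"` is only a placeholder. One common way to avoid hard-coding a real key in a notebook is to read it from an environment variable. A minimal sketch, assuming the key is stored in a variable named `OPENAI_API_KEY`; the helper `load_api_token` is hypothetical and not part of PandasAI or LangChain:

```python
import os


def load_api_token(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the token stored in the given environment variable.

    Raises a descriptive error instead of passing an empty or missing
    value on to the LLM client.
    """
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running this notebook."
        )
    return token
```

The returned string can then be passed wherever the examples use `"YOUR TOKEN"`, for instance `OpenAI(openai_api_key=load_api_token(), max_tokens=1000)`.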

## Connectors

PandasAI provides a number of connectors that allow you to connect to different data sources. These connectors are designed to be easy to use, even if you are not familiar with the data source or with PandasAI.

To use a connector, you first need to install the required dependencies. You can do this by running the following command:

In [None]:
!pip install pandasai[connectors]

In [None]:
from pandasai.connectors import MySQLConnector, PostgreSQLConnector

# With a MySQL database
loan_connector = MySQLConnector(
    config={
        "host": "localhost",
        "port": 3306,
        "database": "mydb",
        "username": "root",
        "password": "root",
        "table": "loans",
        "where": [
            # this is optional and filters the data to
            # reduce the size of the dataframe
            ["loan_status", "=", "PAIDOFF"],
        ],
    }
)

# With a PostgreSQL database
payment_connector = PostgreSQLConnector(
    config={
        "host": "localhost",
        "port": 5432,
        "database": "mydb",
        "username": "root",
        "password": "root",
        "table": "payments",
        "where": [
            # this is optional and filters the data to
            # reduce the size of the dataframe
            ["payment_status", "=", "PAIDOFF"],
        ],
    }
)

df_connector = SmartDatalake([loan_connector, payment_connector], config={"llm": llm})
response = df_connector.chat("How many loans are from the United States?")
print(response)
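
The `where` entries in the connector configs are `[column, operator, value]` triples. How PandasAI turns them into SQL is not shown here, but conceptually they behave like a parameterized `WHERE` clause applied before the data reaches the dataframe. The sketch below illustrates that idea with the standard-library `sqlite3` module. `build_where` is a hypothetical helper, not a PandasAI function, and the table contents are invented.

```python
import sqlite3

# Operators accepted by this sketch (an assumption, not PandasAI's list).
ALLOWED_OPERATORS = {"=", "!=", "<", "<=", ">", ">="}


def build_where(filters):
    """Turn [column, operator, value] triples into a WHERE clause and parameters."""
    clauses, params = [], []
    for column, operator, value in filters:
        # Column and operator are interpolated into SQL, so validate them;
        # the value is always passed as a bound parameter.
        if operator not in ALLOWED_OPERATORS or not column.isidentifier():
            raise ValueError(f"Unsupported filter: {[column, operator, value]}")
        clauses.append(f"{column} {operator} ?")
        params.append(value)
    return " AND ".join(clauses), params


# In-memory table standing in for the "loans" table from the example above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (id INTEGER, loan_status TEXT)")
conn.executemany(
    "INSERT INTO loans VALUES (?, ?)",
    [(1, "PAIDOFF"), (2, "COLLECTION"), (3, "PAIDOFF")],
)

where_sql, params = build_where([["loan_status", "=", "PAIDOFF"]])
rows = conn.execute(
    f"SELECT id FROM loans WHERE {where_sql} ORDER BY id", params
).fetchall()
print(where_sql, params, rows)
```

Filtering at the source this way keeps the loaded dataframe small, which is the purpose the comments in the connector config describe.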

In [None]:
from pandasai.connectors.yahoo_finance import YahooFinanceConnector

yahoo_connector = YahooFinanceConnector("MSFT")
df = SmartDataframe(yahoo_connector, config={"llm": llm})

response = df.chat("What is the closing price for yesterday?")
print(response)

In [None]:
yahoo_connector = YahooFinanceConnector("TSLA")

df_connector = SmartDataframe(yahoo_connector, config={"llm": llm})
response = df_connector.chat("Plot the chart of Tesla over time")