feat: multiple dataframes (#181)
* chore: avoid traversing the code twice for removing df overwrites and unsafe imports

* chore: update tests

* feat: multiple dataframes supported

* chore: update readme

* test: generate python code prompt

* test: multiple dataframes

* test: correct multiple dataframes

* docs: add multiple dataframes to the doc

* docs: update pai options to include Google Palm

---------

Co-authored-by: Gabriele Venturi <lele.venturi@gmail.com>
victor-hugo-dc and gventuri committed May 28, 2023
1 parent 2c0a6f7 commit be4c717
Showing 9 changed files with 438 additions and 80 deletions.
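
The first item in the commit message mentions removing dataframe overwrites and unsafe imports without traversing the code twice. A minimal sketch of that idea with Python's standard `ast` module follows. It is not the project's actual implementation: `UNSAFE_MODULES`, `CodeCleaner`, and `clean_code` are hypothetical names chosen for illustration, and the blocklist contents are an assumption. `ast.unparse` requires Python 3.9 or later.

```python
import ast

# Hypothetical blocklist; the real project may use a different policy.
UNSAFE_MODULES = {"os", "subprocess", "sys"}


class CodeCleaner(ast.NodeTransformer):
    """Drop unsafe imports and assignments to `df` in a single traversal."""

    def visit_Import(self, node):
        # Keep only aliases whose top-level module is not blocklisted.
        node.names = [
            alias for alias in node.names
            if alias.name.split(".")[0] not in UNSAFE_MODULES
        ]
        # Returning None removes the statement entirely.
        return node if node.names else None

    def visit_ImportFrom(self, node):
        module = (node.module or "").split(".")[0]
        return None if module in UNSAFE_MODULES else node

    def visit_Assign(self, node):
        # Remove statements such as `df = ...` that would overwrite
        # the dataframe supplied by the caller.
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id == "df":
                return None
        return node


def clean_code(source: str) -> str:
    """Parse once, apply both clean-ups in one pass, and return source text."""
    tree = CodeCleaner().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


generated = "import os\nimport pandas as pd\ndf = pd.DataFrame()\nprint(df.head())\n"
print(clean_code(generated))
```

Because one `NodeTransformer` handles both node types, the tree is walked once instead of once per clean-up rule.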
34 changes: 33 additions & 1 deletion README.md
@@ -89,6 +89,38 @@ pandas_ai(

![Chart](images/histogram-chart.png?raw=true)

You can also pass multiple dataframes to PandasAI and ask questions that relate them.

```python
import pandas as pd
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

employees_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William'],
    'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance']
}

salaries_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Salary': [5000, 6000, 4500, 7000, 5500]
}

employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)


llm = OpenAI()
pandas_ai = PandasAI(llm)
pandas_ai([employees_df, salaries_df], "Who gets paid the most?")
```

The above code will return the following:

```
Oh, Olivia gets paid the most.
```

You can find more examples in the [examples](examples) directory.

## Command-Line Tool
@@ -103,7 +135,7 @@ Options:

- **-d, --dataset**: The file path to the dataset.
- **-t, --token**: Your HuggingFace or OpenAI API token; if no token is provided, pai will pull it from the `.env` file.
- **-m, --model**: Choice of LLM: `openai`, `open-assistant`, or `starcoder`.
- **-m, --model**: Choice of LLM: `openai`, `open-assistant`, `starcoder`, or Google `palm`.
- **-p, --prompt**: Prompt that PandasAI will run.

To view a full list of available options and their descriptions, run the following command:
73 changes: 55 additions & 18 deletions docs/getting-started.md
@@ -1,23 +1,22 @@
Usage
=====
`pandasai` is built on top of the `pandas` API. The objective is to make dataframes conversational
# Usage

`pandasai` is built on top of the `pandas` API. The objective is to make dataframes conversational
using Large Language Models (LLMs).

Installation
------------
## Installation

To use pandasai, first install it using pip from the [PyPI](https://pypi.org/project/pandasai/) package distribution
To use pandasai, first install it using pip from the [PyPI](https://pypi.org/project/pandasai/) package distribution
framework. It is actively developed, so watch for version updates.

```console
pip install pandasai
```

>It is recommended to create a virtual environment using your preferred environment manager, e.g. conda,
>Poetry, etc.
> It is recommended to create a virtual environment using your preferred environment manager, e.g. conda,
> Poetry, etc.
## Getting Started

Getting Started
---------------
Below is a simple example to get started with `pandasai`.

```python
@@ -41,24 +40,24 @@ pandas_ai.run(df, prompt='Which are the 5 happiest countries?')

## Generate openai API Token

Users are required to generate `YOUR_API_TOKEN`. Follow the simple steps below to generate your API_TOKEN with
Users are required to generate `YOUR_API_TOKEN`. Follow the simple steps below to generate your API_TOKEN with
[openai](https://platform.openai.com/overview).

1. Go to https://openai.com/api/ and sign up with your email address or connect your Google Account.
1. Go to https://openai.com/api/ and sign up with your email address or connect your Google Account.
2. Go to View API Keys on the left side of your Personal Account Settings.
3. Select Create new Secret key.

> The API access to openai is a paid service. You have to set up billing.
>Read the [Pricing](https://platform.openai.com/docs/quickstart/pricing) information before experimenting.
> The API access to openai is a paid service. You have to set up billing.
> Read the [Pricing](https://platform.openai.com/docs/quickstart/pricing) information before experimenting.
## Demo in Google Colab

Try out PandasAI in your browser:

[![Open in Colab](https://camo.githubusercontent.com/84f0493939e0c4de4e6dbe113251b4bfb5353e57134ffd9fcab6b8714514d4d1/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/drive/1rKz7TudOeCeKGHekw7JFNL4sagN9hon-?usp=sharing)

### Examples
### Examples

Other [examples](../examples) are included in the repository along with samples of data.

#### Working with CSV
@@ -81,7 +80,9 @@ print(response)
```

#### Working with a Pandas DataFrame

Example of using PandasAI with a Pandas DataFrame

```python
import pandas as pd
from data.sample_dataframe import dataframe
@@ -99,8 +100,10 @@ print(response)

```

#### Plotting
#### Plotting

Example of using PandasAI to generate a chart from a Pandas DataFrame

```python
import pandas as pd
from data.sample_dataframe import dataframe
@@ -118,4 +121,38 @@ response = pandas_ai.run(
)
print(response)
# Output: check out images/histogram-chart.png
```
```

### Working with multiple dataframes

Example of using PandasAI with multiple Pandas DataFrames

```python
import pandas as pd

from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

employees_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William'],
    'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance']
}

salaries_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Salary': [5000, 6000, 4500, 7000, 5500]
}

employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)

llm = OpenAI()
pandas_ai = PandasAI(llm, verbose=True, conversational=False)
response = pandas_ai.run(
    [employees_df, salaries_df],
    "Who gets paid the most?",
)
print(response)
# Output: Olivia gets paid the most.
```
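
For comparison, the same question can be answered in plain pandas by joining the two frames on their shared key. This is only a sketch of the kind of code such a prompt might translate into, not the code PandasAI actually generates; the variable names `merged` and `top_earner` are illustrative.

```python
import pandas as pd

employees_df = pd.DataFrame({
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William'],
    'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance']
})

salaries_df = pd.DataFrame({
    'EmployeeID': [1, 2, 3, 4, 5],
    'Salary': [5000, 6000, 4500, 7000, 5500]
})

# Join on the shared key so each employee row carries its salary.
merged = employees_df.merge(salaries_df, on='EmployeeID')

# Select the row with the highest salary.
top_earner = merged.loc[merged['Salary'].idxmax()]
print(top_earner['Name'])
# Output: Olivia
```

Running this by hand is a quick way to sanity-check the answer the LLM returns for the multi-dataframe prompt.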
26 changes: 26 additions & 0 deletions examples/with_multiple_dataframes.py
@@ -0,0 +1,26 @@
"""Example of using PandasAI on multiple Pandas DataFrames"""

import pandas as pd
from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

employees_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Name': ['John', 'Emma', 'Liam', 'Olivia', 'William'],
    'Department': ['HR', 'Sales', 'IT', 'Marketing', 'Finance']
}

salaries_data = {
    'EmployeeID': [1, 2, 3, 4, 5],
    'Salary': [5000, 6000, 4500, 7000, 5500]
}

employees_df = pd.DataFrame(employees_data)
salaries_df = pd.DataFrame(salaries_data)


llm = OpenAI()
pandas_ai = PandasAI(llm, verbose=True)
response = pandas_ai([employees_df, salaries_df], "Who gets paid the most?")
print(response)
# Output: Olivia
