Include groupby-related operations #127

gventuri · 2023-05-15T22:40:02Z

🚀 The feature

Include groupby-related operations

Motivation, pitch

As @mzy2240 pointed out, the current version does not understand groupby-related operations, which might also be good to have.

Alternatives

No response

Additional context

For example, if you add a continent column to the example and ask which countries are happiest in each continent, it gives a random answer.

sandiemann · 2023-05-17T08:50:09Z

@gventuri I believe the LLM will generate random code and answers. I can take a look at this.

sandiemann · 2023-05-17T09:01:02Z

I have reproduced it:

dataframe = {
    "country": [
        "United States",
        "United Kingdom",
        "France",
        "Germany",
        "Italy",
        "Spain",
        "Canada",
        "Australia",
        "Japan",
        "China",
    ],
    "gdp": [
        19294482071552,
        2891615567872,
        2411255037952,
        3435817336832,
        1745433788416,
        1181205135360,
        1607402389504,
        1490967855104,
        4380756541440,
        14631844184064,
    ],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
    "continents": [
        "North America",
        "Europe",
        "Europe",
        "Europe",
        "Europe",
        "Europe",
        "North America",
        "Oceania",
        "Asia",
        "Asia",
    ],
}

import pandas as pd
from examples.data.sample_dataframe import dataframe

from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

df = pd.DataFrame(dataframe)

llm = OpenAI()
pandas_ai = PandasAI(llm, verbose=True, conversational=False)
response = pandas_ai.run(df, "which countries are happiest in each continent?")
print(response)

Running PandasAI with openai LLM...
Code generated:

import pandas as pd
# create the dataframe
data = {'country': ['Italy', 'United States', 'United Kingdom', 'Germany', 'France', 'Canada', 'Australia', 'Japan', 'China', 'Brazil'],
        'gdp': [2891615567872, 3855524628, 9138963667, 4749942652, 3191757996, 1731052879, 1425977883, 5082466594, 14608167700, 2143320000],
        'happiness_index': [7.16, 6.38, 6.38, 6.66, 7.16, 7.31, 7.23, 5.91, 5.14, 6.95],
        'continents': ['Europe', 'Europe', 'North America', 'Europe', 'Europe', 'North America', 'Australia', 'Asia', 'Asia', 'South America']}
df = pd.DataFrame(data)
# group by continent and get the country with the highest happiness index
result = df.groupby('continents')['country', 'happiness_index'].apply(lambda x: x[x.happiness_index == x.happiness_index.max()])
# print the result
print(result)

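The generated code rebuilds the dataframe from scratch instead of using the one passed in, so it returns countries that were not in the original dataframe (for example, Brazil). The issue is with the prompt, which was addressed here and fixed in the new release, v0.2.13.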
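
For comparison, here is a minimal sketch of the groupby operation the question calls for, written in plain pandas against the original sample data. It is not the code PandasAI generates after the fix; the variable names idx and happiest are illustrative, and the gdp column is left out because the question does not need it.

```python
import pandas as pd

# Subset of the sample dataframe from the reproduction above (gdp omitted).
dataframe = {
    "country": [
        "United States", "United Kingdom", "France", "Germany", "Italy",
        "Spain", "Canada", "Australia", "Japan", "China",
    ],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
    "continents": [
        "North America", "Europe", "Europe", "Europe", "Europe",
        "Europe", "North America", "Oceania", "Asia", "Asia",
    ],
}
df = pd.DataFrame(dataframe)

# For each continent, get the row label of the highest happiness_index,
# then select those rows from the original dataframe.
idx = df.groupby("continents")["happiness_index"].idxmax()
happiest = df.loc[idx, ["continents", "country", "happiness_index"]].reset_index(drop=True)

print(happiest)
```

Because the rows are selected from df itself, every country in the result is guaranteed to come from the original dataframe. Note that idxmax keeps only the first row when several countries tie for the maximum within a continent.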

gventuri · 2023-05-28T22:33:28Z

Feature added with be4c717

Closing

gventuri added the enhancement (New feature or request) label May 15, 2023

gventuri mentioned this issue May 15, 2023

Support operation on multiple data frames, for example concat, merge, join, append, compare, etc #86

Closed

gventuri closed this as completed May 28, 2023
