Include groupby-related operations #127

Closed
gventuri opened this issue May 15, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@gventuri
Collaborator

🚀 The feature

Include groupby-related operations

Motivation, pitch

As @mzy2240 pointed out, the current version does not handle groupby-related operations, which would be good to support.

Alternatives

No response

Additional context

For example, if you add a continent column to the example and ask which countries are the happiest in each continent, it gives a random answer.

@sandiemann
Contributor

@gventuri I believe the LLM will generate random code and answers. I can take a look at this.

@sandiemann
Contributor

I have reproduced it:

dataframe = {
    "country": [
        "United States",
        "United Kingdom",
        "France",
        "Germany",
        "Italy",
        "Spain",
        "Canada",
        "Australia",
        "Japan",
        "China",
    ],
    "gdp": [
        19294482071552,
        2891615567872,
        2411255037952,
        3435817336832,
        1745433788416,
        1181205135360,
        1607402389504,
        1490967855104,
        4380756541440,
        14631844184064,
    ],
    "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
    "continents": [
        "North America",
        "Europe",
        "Europe",
        "Europe",
        "Europe",
        "Europe",
        "North America",
        "Oceania",
        "Asia",
        "Asia",
    ],
}
import pandas as pd
from examples.data.sample_dataframe import dataframe

from pandasai import PandasAI
from pandasai.llm.openai import OpenAI

df = pd.DataFrame(dataframe)

llm = OpenAI()
pandas_ai = PandasAI(llm, verbose=True, conversational=False)
response = pandas_ai.run(df, "which countries are happiest in each continent?")
print(response)

Running PandasAI with openai LLM...
Code generated:

import pandas as pd
# create the dataframe
data = {'country': ['Italy', 'United States', 'United Kingdom', 'Germany', 'France', 'Canada', 'Australia', 'Japan', 'China', 'Brazil'],
        'gdp': [2891615567872, 3855524628, 9138963667, 4749942652, 3191757996, 1731052879, 1425977883, 5082466594, 14608167700, 2143320000],
        'happiness_index': [7.16, 6.38, 6.38, 6.66, 7.16, 7.31, 7.23, 5.91, 5.14, 6.95],
        'continents': ['Europe', 'Europe', 'North America', 'Europe', 'Europe', 'North America', 'Australia', 'Asia', 'Asia', 'South America']}
df = pd.DataFrame(data)
# group by continent and get the country with the highest happiness index
result = df.groupby('continents')['country', 'happiness_index'].apply(lambda x: x[x.happiness_index == x.happiness_index.max()])
# print the result
print(result)

The generated code returns countries that were not in the dataframe: instead of using the existing df, it builds a new dataframe with made-up values. The issue is with the prompt, which was addressed here and fixed in the new release v0.2.13.
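
For comparison, here is a minimal plain-pandas sketch of what a correct answer to the question could look like when run against the actual sample data. It is not the code PandasAI generates; the variable names idx and happiest are only illustrative, and the gdp column is left out because the question does not need it.

```python
import pandas as pd

# Same countries, happiness scores and continents as the sample dataframe above
df = pd.DataFrame(
    {
        "country": [
            "United States", "United Kingdom", "France", "Germany", "Italy",
            "Spain", "Canada", "Australia", "Japan", "China",
        ],
        "happiness_index": [6.94, 7.16, 6.66, 7.07, 6.38, 6.4, 7.23, 7.22, 5.87, 5.12],
        "continents": [
            "North America", "Europe", "Europe", "Europe", "Europe",
            "Europe", "North America", "Oceania", "Asia", "Asia",
        ],
    }
)

# Row label of the highest happiness_index within each continent
idx = df.groupby("continents")["happiness_index"].idxmax()

# Select those rows from the original dataframe, so only real countries appear
happiest = df.loc[idx, ["continents", "country", "happiness_index"]].reset_index(drop=True)
print(happiest)
```

Note that idxmax keeps only the first row when several countries tie for the top score in a continent; a filter comparing each row against groupby(...).transform("max") would keep all of them.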

@gventuri
Collaborator Author

Feature added with be4c717

Closing
