## Introduction to Pandas AI

Pandas AI is a Python library that adds a natural language interface to standard pandas DataFrames. It integrates large language models (LLMs) with pandas so that a question written in plain English is translated into data operations on the DataFrame, making data analysis more intuitive and accessible.

**Core Capabilities:**

*   **Natural Language Querying:** Users can interact with their DataFrames using plain English questions, eliminating the need to write complex code for common data operations.
*   **Data Manipulation:** It allows for data cleaning, transformation, and manipulation through natural language commands.
*   **Data Visualization:** Pandas AI can generate various types of visualizations based on natural language requests, helping users to quickly understand patterns and insights.
*   **Insight Extraction:** It can assist in extracting meaningful insights and summaries from data, often providing explanations for its findings.

By leveraging LLMs, Pandas AI lets users, from beginners to experienced data scientists, analyze data, generate reports, and extract insights by describing what they need in natural language, which lowers the barrier to entry for data science.

In [None]:
!pip install pandasai
!pip install --upgrade numpy pandas scipy

## Loading Sample Data

### Subtask:
Create a small sample pandas DataFrame (a stand-in for a dataset that would normally be loaded from a file such as a CSV) to use for demonstrations throughout the notebook.


In [None]:
import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4, 5],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
    'Age': [25, 34, 29, 42, 38],
    'AnnualIncome_k$': [50, 70, 40, 90, 65],
    'SpendingScore_1_100': [40, 80, 20, 95, 70]
})

print("Sample DataFrame created successfully.")
display(df.head())

Sample DataFrame created successfully.


|   | CustomerID | Gender | Age | AnnualIncome_k$ | SpendingScore_1_100 |
|---|---|---|---|---|---|
| 0 | 1 | Male | 25 | 50 | 40 |
| 1 | 2 | Female | 34 | 70 | 80 |
| 2 | 3 | Female | 29 | 40 | 20 |
| 3 | 4 | Male | 42 | 90 | 95 |
| 4 | 5 | Female | 38 | 65 | 70 |
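The subtask mentions a CSV file, while the cell above builds the DataFrame inline. As a minimal sketch of the CSV route, the snippet below reads the same five rows with `pd.read_csv`. The name `csv_text` and the use of an in-memory `io.StringIO` buffer are assumptions made so the example runs without a file on disk; with a real dataset, a file path would be passed to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Same sample rows as the inline DataFrame, written out as CSV text (illustrative only).
csv_text = """CustomerID,Gender,Age,AnnualIncome_k$,SpendingScore_1_100
1,Male,25,50,40
2,Female,34,70,80
3,Female,29,40,20
4,Male,42,90,95
5,Female,38,65,70
"""

# read_csv accepts a file path or any file-like object; StringIO stands in for a file here.
df_from_csv = pd.read_csv(io.StringIO(csv_text))

print(df_from_csv.shape)
```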


## Basic Data Querying

### Subtask:
Demonstrate basic data querying using natural language prompts. Show examples of asking questions about the DataFrame, such as 'What are the average values of X?' or 'Show me the top 5 rows where Y is Z'.


In [None]:
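import os

from pandasai import SmartDataframe
from pandasai.llm import OpenAI

# Placeholder only: substitute your own key, and avoid committing a real key to a shared notebook.
os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY"

# OpenAI() reads the key from the OPENAI_API_KEY environment variable set above.
llm = OpenAI()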

# Create a SmartDataframe instance
sdf = SmartDataframe(df, config={
    "llm": llm,
    "enable_cache": False,
    "verbose": True
})

print("SmartDataframe initialized successfully.")

SmartDataframe initialized successfully.
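The cell above writes a placeholder key directly into the notebook. One possible alternative, sketched here under assumptions, is to read the key from the environment and fail early when it is missing or still the placeholder. The helper `resolve_api_key` is hypothetical and not part of pandasai.

```python
import os


def resolve_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the key stored under `name`; raise if it is missing or still the placeholder."""
    key = env.get(name, "").strip()
    if not key or key == "YOUR_API_KEY":
        raise RuntimeError(f"Set the {name} environment variable before creating the LLM.")
    return key


# Demonstration with an explicit mapping so the snippet runs without a real key.
print(resolve_api_key({"OPENAI_API_KEY": "example-key"}))
```

In the notebook, the returned value could then be handed to the LLM wrapper, for example through its `api_token` argument, instead of assigning a literal string to `os.environ`.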


**Reasoning**:
Now that the SmartDataframe is initialized, the following cells use natural language queries to answer four questions in the spirit of the subtask: 'What is the average Age?', 'What is the most common Gender?', 'Show me the top 3 customers with the highest SpendingScore_1_100', and 'Average age of all the categories along with the category name'.



In [None]:
display(df)

|   | CustomerID | Gender | Age | AnnualIncome_k$ | SpendingScore_1_100 |
|---|---|---|---|---|---|
| 0 | 1 | Male | 25 | 50 | 40 |
| 1 | 2 | Female | 34 | 70 | 80 |
| 2 | 3 | Female | 29 | 40 | 20 |
| 3 | 4 | Male | 42 | 90 | 95 |
| 4 | 5 | Female | 38 | 65 | 70 |


### 1. Average Age

In [None]:
average_age = sdf.chat("What is the average Age?")

print(average_age)

33.6
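Because the answer comes from LLM-generated code, it can be worth cross-checking it against plain pandas. The sketch below recomputes the mean directly; `check_df` is an illustrative name, and the ages are repeated from the sample data so the snippet runs on its own.

```python
import pandas as pd

# Ages copied from the sample DataFrame.
check_df = pd.DataFrame({"Age": [25, 34, 29, 42, 38]})

# Direct pandas equivalent of the question "What is the average Age?"
expected_average_age = check_df["Age"].mean()

print(expected_average_age)  # 33.6
```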


### 2. Most common gender

In [None]:
most_common_gender = sdf.chat("What is the most common Gender?")
print(most_common_gender)

The most common gender is Female.
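The same kind of cross-check applies here. As a minimal sketch, `value_counts` counts each gender and `idxmax` returns the label with the highest count; `check_df` is again an illustrative name holding values repeated from the sample data.

```python
import pandas as pd

# Gender column copied from the sample DataFrame.
check_df = pd.DataFrame({"Gender": ["Male", "Female", "Female", "Male", "Female"]})

# Count occurrences per gender, then take the label with the largest count.
gender_counts = check_df["Gender"].value_counts()
expected_most_common = gender_counts.idxmax()

print(expected_most_common)  # Female
```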


### 3. Top 3 customers with highest Spending Score

In [None]:
top_spending_customers = sdf.chat("Show me the top 3 customers with the highest SpendingScore_1_100")

display(top_spending_customers)

Top 3 customers with highest SpendingScore_1_100:



|   | CustomerID | Gender | Age | AnnualIncome_k$ | SpendingScore_1_100 |
|---|---|---|---|---|---|
| 3 | 4 | Male | 42 | 90 | 95 |
| 1 | 2 | Female | 34 | 70 | 80 |
| 4 | 5 | Female | 38 | 65 | 70 |
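A top-N query like this one maps onto `DataFrame.nlargest` in plain pandas. The sketch below is a hand-written equivalent for comparison, not the code Pandas AI generated; `check_df` repeats only the two columns needed.

```python
import pandas as pd

# Relevant columns copied from the sample DataFrame.
check_df = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5],
    "SpendingScore_1_100": [40, 80, 20, 95, 70],
})

# Keep the three rows with the largest spending score, highest first.
top3 = check_df.nlargest(3, "SpendingScore_1_100")

print(top3)
```

The row order (customers 4, 2, 5) matches the output above, with the original index labels preserved.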


### 4. Average age for each category (Gender)

In [None]:
average_age_by_gender = sdf.chat("Average age of all the categories along with the category name")

display(average_age_by_gender)

{'type': 'dataframe', 'value':    Gender        Age
0  Female  33.666667
1    Male  33.500000}


|   | Gender | Age |
|---|---|---|
| 0 | Female | 33.666667 |
| 1 | Male | 33.5 |
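The prompt does not name the grouping column, and the result shows that "categories" was interpreted as `Gender`. A plain pandas equivalent of that interpretation is a `groupby` followed by `mean`, sketched below with `check_df` repeating the relevant sample columns.

```python
import pandas as pd

# Relevant columns copied from the sample DataFrame.
check_df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female"],
    "Age": [25, 34, 29, 42, 38],
})

# Mean age per gender; as_index=False keeps Gender as a regular column.
age_by_gender = check_df.groupby("Gender", as_index=False)["Age"].mean()

print(age_by_gender)
```

Female averages (34 + 29 + 38) / 3 ≈ 33.67 and Male averages (25 + 42) / 2 = 33.5, which agrees with the output above.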
