<img src="https://drive.google.com/uc?export=view&id=1wYSMgJtARFdvTt5g7E20mE4NmwUFUuog" width="200">

[![Build Fast with AI](https://img.shields.io/badge/BuildFastWithAI-GenAI%20Bootcamp-blue?style=for-the-badge&logo=artificial-intelligence)](https://www.buildfastwithai.com/genai-course)
[![EduChain GitHub](https://img.shields.io/github/stars/satvik314/educhain?style=for-the-badge&logo=github&color=gold)](https://github.com/satvik314/educhain)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1m64PXi6DUHjHoJHArxPmhe99hx9K7hiR#scrollTo=eDVleQA53eMW)

# Data Analysis with PandasAI

This notebook demonstrates how to use PandasAI for data analysis by querying pandas DataFrames in natural language through an LLM (BambooLLM in this notebook). The examples cover basic usage with a small sales table and the Titanic dataset, and the same approach extends to more complex analyses.

Table of Contents:

1. Installation and Setup
2. Creating a Smart DataFrame
3. Basic Data Analysis with Natural Language
4. Advanced Queries and Visualizations


## Setup
To get started, install the latest version of PandasAI.

In [55]:
!pip install -qU pandasai pandas


## Setting Up PandasAI With BambooLLM 🐼
#### Useful Documentation Links

- [PandasAI Documentation](https://github.com/gventuri/pandas-ai)  

Since PandasAI is powered by an LLM, you need to import the LLM you'd like to use for your use case.

By default, if no LLM is provided, it uses BambooLLM.

You can get a free API key by signing up at https://pandabi.ai (you can also configure it in your .env file).
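
As a minimal sketch of the environment-based setup, assuming the key is stored under `PANDASAI_API_KEY` (the variable name used in this notebook), a small check can fail early with a readable message when the key is missing. The helper `require_api_key` is hypothetical and not part of PandasAI.

```python
import os


def require_api_key(name="PANDASAI_API_KEY", env=None):
    """Return the API key stored under `name`, or raise a clear error.

    Hypothetical helper for illustration; it is not a PandasAI function.
    """
    # Fall back to the real process environment when no mapping is passed in
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; add it to your environment or .env file"
        )
    return key


# Example with an explicit mapping instead of the real environment
print(require_api_key(env={"PANDASAI_API_KEY": "demo-key"}))
```

Passing a mapping through `env` keeps the check easy to try without touching real credentials.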

In [54]:
import os
from google.colab import userdata
from pandasai import SmartDataframe
from pandasai.llm import BambooLLM

os.environ['PANDASAI_API_KEY'] = userdata.get('PANDASAI_API_KEY')

llm = BambooLLM()

### Creating a Smart Dataframe

In [36]:
import pandas as pd
from pandasai import SmartDataframe

# Create sample data
sales_data = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany"],
    "sales": [5000, 3200, 2900, 4100]
})

# Convert to SmartDataframe
sdf = SmartDataframe(sales_data, config={"llm": llm})

In [None]:
sales_data

In [38]:
sales_data.loc[sales_data['sales'].idxmax(), 'country']

'United States'

In [39]:
# Query your data
response = sdf.chat('Which country has the highest sales?')
print(response)

The country with the highest sales is United States.


In [40]:
response = sdf.chat('What are the total sales?')
print(response)

15200
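
Because the answers come from an LLM, it can help to cross-check them against plain pandas. The sketch below recomputes both results from the same `sales_data` table defined earlier; it uses only standard pandas calls and does not involve PandasAI.

```python
import pandas as pd

# Same sample data as in the SmartDataframe example
sales_data = pd.DataFrame({
    "country": ["United States", "United Kingdom", "France", "Germany"],
    "sales": [5000, 3200, 2900, 4100],
})

# Total sales across all countries
total_sales = int(sales_data["sales"].sum())

# Country in the row with the largest sales value
top_country = sales_data.loc[sales_data["sales"].idxmax(), "country"]

print(total_sales)   # 15200
print(top_country)   # United States
```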


### Analyzing Titanic Dataset

In [41]:
# Load the Titanic dataset (assumes train.csv has been uploaded to /content in Colab)
titanic_df = pd.read_csv('/content/train.csv')
titanic_smart = SmartDataframe(titanic_df, config={"llm": llm})

In [None]:
titanic_df

In [None]:
titanic_smart

### Trying Different Queries

In [None]:
print("Query 1: Basic passenger count")
response = titanic_smart.chat("How many total passengers were on the Titanic?")
print(response)

In [None]:
print("\nQuery 2.1: Age distribution")
response = titanic_smart.chat("What was the age distribution of passengers? Show me the average, minimum and maximum ages.")
print(response)
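
For comparison, the same summary can be computed directly in pandas. This is a sketch on a tiny made-up stand-in table, assuming the dataset has an `Age` column with missing values (as the commonly used Titanic `train.csv` does); swap `toy` for `titanic_df` to run it on the real data.

```python
import pandas as pd

# Illustrative toy rows, not real Titanic data; None stands for a missing age
toy = pd.DataFrame({"Age": [22.0, 38.0, None, 4.0]})

# mean, min and max skip missing values by default
age_summary = toy["Age"].agg(["mean", "min", "max"])
print(age_summary)
```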

In [None]:
print("\nQuery 2.2: Age distribution Chart")
response = titanic_smart.chat("What was the age distribution of passengers? Show me a bar chart.")
print(response)


In [None]:
print("\nQuery 3: Class analysis - Multi-Query")
response = titanic_smart.chat("How many passengers were in each passenger class (1st, 2nd, 3rd) and what was the survival rate for each class?")
print(response)
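
A multi-part question like this maps onto a single `groupby` in pandas, which gives a reference to compare the LLM's answer against. The sketch assumes columns named `Pclass` and `Survived` (0 or 1), which the notebook does not confirm, and uses made-up toy rows in place of `titanic_df`.

```python
import pandas as pd

# Illustrative toy rows, not real Titanic data
toy = pd.DataFrame({
    "Pclass":   [1, 1, 2, 3, 3, 3],
    "Survived": [1, 0, 1, 0, 0, 1],
})

# Passenger count and survival rate per class in one pass
by_class = toy.groupby("Pclass")["Survived"].agg(
    passengers="size",
    survival_rate="mean",
)
print(by_class)
```

Since `Survived` is coded as 0 or 1, its mean within each group is the survival rate.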

In [None]:
print("\nQuery 4: Complex Fare Analysis")
response = titanic_smart.chat("What was the relationship between ticket fare prices and survival rates? Break it down by passenger class and show any notable patterns. Give me a multi-line graph with different y-axes.")
print(response)

In [None]:
print("\nQuery 5: Complex demographic analysis")
response = titanic_smart.chat("Compare the survival rates between different age groups, gender and passenger classes combined. Focus on identifying which demographic groups had the highest and lowest survival rates. Give an intuitive bar chart.")

print(response)
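
The combined demographic breakdown can also be sketched in pandas to sanity-check the generated chart. The column names (`Age`, `Sex`, `Pclass`, `Survived`), the age bins, and the toy rows below are all assumptions for illustration; replace `toy` with `titanic_df` and adjust the bins as needed.

```python
import pandas as pd

# Illustrative toy rows, not real Titanic data
toy = pd.DataFrame({
    "Age":      [8, 30, 45, 70, 25, 12],
    "Sex":      ["female", "male", "female", "male", "male", "male"],
    "Pclass":   [1, 3, 1, 3, 3, 3],
    "Survived": [1, 0, 1, 0, 1, 0],
})

# Bucket ages into coarse groups (bin edges are an arbitrary choice)
toy["AgeGroup"] = pd.cut(
    toy["Age"], bins=[0, 18, 60, 120], labels=["child", "adult", "senior"]
)

# Survival rate per (age group, sex, class), highest first
rates = (
    toy.groupby(["AgeGroup", "Sex", "Pclass"], observed=True)["Survived"]
    .mean()
    .sort_values(ascending=False)
)
print(rates)
```

Calling `rates.plot(kind="bar")` would draw the corresponding bar chart if matplotlib is installed.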
