
# Pandas for Data Analysis

This beginner-friendly notebook walks you through the basics of using Pandas for data analysis. Using a small hand-built sales dataset, we will cover creating a DataFrame, inspecting it, adding a column, filtering, grouping, sorting, and handling missing values.

In [1]:
import pandas as pd

## Step 1: Load Sample Data
We will create a small sample sales dataset manually to keep things simple.

In [2]:

data = {
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04'],
    'Product': ['Apples', 'Bananas', 'Apples', 'Bananas'],
    'Quantity': [10, 15, 20, 5],
    'Price': [0.5, 0.3, 0.5, 0.3]
}
df = pd.DataFrame(data)
df


         Date  Product  Quantity  Price
0  2023-01-01   Apples        10    0.5
1  2023-01-02  Bananas        15    0.3
2  2023-01-03   Apples        20    0.5
3  2023-01-04  Bananas         5    0.3


## Step 2: Understand the Data
`df.info()` lists each column's non-null count and dtype, along with the index range and memory usage.

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Date      4 non-null      object 
 1   Product   4 non-null      object 
 2   Quantity  4 non-null      int64  
 3   Price     4 non-null      float64
dtypes: float64(1), int64(1), object(2)
memory usage: 260.0+ bytes
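
The summary above reports `Date` as `object`, meaning the dates are stored as plain strings. A minimal sketch of converting them to real timestamps follows; it rebuilds the same sample data so it runs on its own, and the explicit `format` argument is an assumption that every value is ISO-formatted like the ones shown here.

```python
import pandas as pd

# Same sample data as in Step 1, rebuilt so this cell is self-contained.
df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04'],
    'Product': ['Apples', 'Bananas', 'Apples', 'Bananas'],
    'Quantity': [10, 15, 20, 5],
    'Price': [0.5, 0.3, 0.5, 0.3],
})

# Parse the date strings into timestamps (assumes YYYY-MM-DD throughout).
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')

# The column is now datetime-typed instead of holding strings.
print(df.dtypes)

# Datetime columns expose the .dt accessor, e.g. for weekday names.
print(df['Date'].dt.day_name())
```

Once the column is datetime-typed, date-based filtering and sorting compare actual dates rather than text.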


## Step 3: Descriptive Statistics
Get a quick summary of numeric columns.

In [4]:
df.describe()

        Quantity    Price
count        4.0      4.0
mean        12.5      0.4
std     6.454972  0.11547
min          5.0      0.3
25%         8.75      0.3
50%         12.5      0.4
75%        16.25      0.5
max         20.0      0.5
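
To see where the `std` and `25%` figures come from, the sketch below recomputes them by hand for the `Quantity` values. It relies on the pandas defaults: `std` is the sample standard deviation (dividing by n - 1) and quantiles use linear interpolation. The variable names are only illustrative.

```python
import pandas as pd

quantity = pd.Series([10, 15, 20, 5], name='Quantity')

# Sample standard deviation: divide the squared deviations by n - 1.
mean = quantity.mean()
squared_deviations = (quantity - mean) ** 2
sample_std = (squared_deviations.sum() / (len(quantity) - 1)) ** 0.5

print(sample_std)                # manual calculation
print(quantity.std())            # pandas default (ddof=1), same value

# 25th percentile with the default linear interpolation.
print(quantity.quantile(0.25))
```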


## Step 4: Create a New Column
We can calculate `Revenue = Quantity × Price`.

In [5]:
df['Revenue'] = df['Quantity'] * df['Price']
df

         Date  Product  Quantity  Price  Revenue
0  2023-01-01   Apples        10    0.5      5.0
1  2023-01-02  Bananas        15    0.3      4.5
2  2023-01-03   Apples        20    0.5     10.0
3  2023-01-04  Bananas         5    0.3      1.5
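
Assigning to `df['Revenue']` modifies `df` in place. As an alternative sketch, `DataFrame.assign` returns a new DataFrame with the extra column and leaves the original untouched; the name `df_with_revenue` is just illustrative.

```python
import pandas as pd

# Same sample data as in Step 1, rebuilt so this cell is self-contained.
df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04'],
    'Product': ['Apples', 'Bananas', 'Apples', 'Bananas'],
    'Quantity': [10, 15, 20, 5],
    'Price': [0.5, 0.3, 0.5, 0.3],
})

# assign() builds a copy with the new column; df itself is not changed.
df_with_revenue = df.assign(Revenue=lambda d: d['Quantity'] * d['Price'])

print(df.columns.tolist())
print(df_with_revenue)
```

This style can be handy when you want to keep the raw data as it was and chain several transformations together.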


## Step 5: Filter Rows
Let's filter rows where Revenue is greater than 5.

In [6]:
df[df['Revenue'] > 5]

         Date Product  Quantity  Price  Revenue
2  2023-01-03  Apples        20    0.5     10.0
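
Only one row is returned because `>` is strict: the first row has a revenue of exactly 5.0 and is excluded. The sketch below shows the inclusive comparison, the equivalent `query` form, and one way to combine two conditions. The variable names and the threshold of 4 are chosen only for illustration.

```python
import pandas as pd

# Same sample data as in Step 1, with Revenue added as in Step 4.
df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04'],
    'Product': ['Apples', 'Bananas', 'Apples', 'Bananas'],
    'Quantity': [10, 15, 20, 5],
    'Price': [0.5, 0.3, 0.5, 0.3],
})
df['Revenue'] = df['Quantity'] * df['Price']

# Inclusive comparison also keeps the row whose revenue is exactly 5.
at_least_5 = df[df['Revenue'] >= 5]

# The same filter written as a query string.
at_least_5_query = df.query('Revenue >= 5')

# Combine conditions with & (and) or | (or); each condition needs parentheses.
banana_sales_over_4 = df[(df['Revenue'] > 4) & (df['Product'] == 'Bananas')]

print(at_least_5)
print(banana_sales_over_4)
```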


## Step 6: Group By Product
Compute the total revenue per product.

In [7]:
df.groupby('Product')['Revenue'].sum()

Product
Apples     15.0
Bananas     6.0
Name: Revenue, dtype: float64
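
A single `sum()` returns one number per product. If you want several summaries at once, named aggregation with `agg` is one option; the output column names below (`total_revenue`, `total_quantity`, `n_sales`) are made up for this sketch.

```python
import pandas as pd

# Same sample data as in Step 1, with Revenue added as in Step 4.
df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04'],
    'Product': ['Apples', 'Bananas', 'Apples', 'Bananas'],
    'Quantity': [10, 15, 20, 5],
    'Price': [0.5, 0.3, 0.5, 0.3],
})
df['Revenue'] = df['Quantity'] * df['Price']

# Each keyword is an output column: (source column, aggregation function).
summary = df.groupby('Product').agg(
    total_revenue=('Revenue', 'sum'),
    total_quantity=('Quantity', 'sum'),
    n_sales=('Revenue', 'count'),
)

print(summary)
```

The result is a DataFrame indexed by `Product`, with one column per named aggregation.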


## Step 7: Sorting the DataFrame
Sort by revenue in descending order.

In [8]:
df.sort_values(by='Revenue', ascending=False)

         Date  Product  Quantity  Price  Revenue
2  2023-01-03   Apples        20    0.5     10.0
0  2023-01-01   Apples        10    0.5      5.0
1  2023-01-02  Bananas        15    0.3      4.5
3  2023-01-04  Bananas         5    0.3      1.5
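
Note that `sort_values` returns a new DataFrame and keeps the original index labels, which is why the rows above are numbered 2, 0, 1, 3. The sketch below shows a few related variations: sorting by more than one column, taking only the top rows, and renumbering the index after sorting. The variable names are illustrative.

```python
import pandas as pd

# Same sample data as in Step 1, with Revenue added as in Step 4.
df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04'],
    'Product': ['Apples', 'Bananas', 'Apples', 'Bananas'],
    'Quantity': [10, 15, 20, 5],
    'Price': [0.5, 0.3, 0.5, 0.3],
})
df['Revenue'] = df['Quantity'] * df['Price']

# Sort by product name (A to Z), then by revenue (highest first) within each product.
by_product = df.sort_values(by=['Product', 'Revenue'], ascending=[True, False])

# Keep only the two rows with the highest revenue.
top_two = df.nlargest(2, 'Revenue')

# Sort and renumber the rows 0..n-1, discarding the old index labels.
renumbered = df.sort_values(by='Revenue', ascending=False).reset_index(drop=True)

print(by_product)
print(top_two)
print(renumbered)
```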


## Step 8: Handling Missing Values
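Let's simulate a missing value in `Quantity` and fill it with 0. Note that `Revenue` is not recalculated automatically, so row 2 still shows the revenue computed from the original quantity.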

In [9]:

df.loc[2, 'Quantity'] = None
print("With missing value:")
print(df)

# Fill missing value with 0
df['Quantity'] = df['Quantity'].fillna(0)
print("After filling missing value:")
print(df)


With missing value:
         Date  Product  Quantity  Price  Revenue
0  2023-01-01   Apples      10.0    0.5      5.0
1  2023-01-02  Bananas      15.0    0.3      4.5
2  2023-01-03   Apples       NaN    0.5     10.0
3  2023-01-04  Bananas       5.0    0.3      1.5
After filling missing value:
         Date  Product  Quantity  Price  Revenue
0  2023-01-01   Apples      10.0    0.5      5.0
1  2023-01-02  Bananas      15.0    0.3      4.5
2  2023-01-03   Apples       0.0    0.5     10.0
3  2023-01-04  Bananas       5.0    0.3      1.5
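
Filling with 0 is only one choice, and it leaves the stored `Revenue` out of step with `Quantity`. The sketch below is a self-contained variation under its own assumptions: it builds the data with the gap already present (using `float('nan')`), counts the missing values, and then shows two alternatives, dropping the row or filling with the column mean, before recomputing `Revenue`. Which strategy is appropriate depends on the data; none of this is prescribed by the steps above.

```python
import pandas as pd

# Sample data with the third quantity already missing.
df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04'],
    'Product': ['Apples', 'Bananas', 'Apples', 'Bananas'],
    'Quantity': [10, 15, float('nan'), 5],
    'Price': [0.5, 0.3, 0.5, 0.3],
})

# Count missing values per column before deciding how to handle them.
print(df.isna().sum())

# Option 1: drop rows where Quantity is missing.
dropped = df.dropna(subset=['Quantity'])

# Option 2: fill with the mean of the known quantities (NaN is skipped by mean()).
filled = df.assign(Quantity=df['Quantity'].fillna(df['Quantity'].mean()))

# Recompute Revenue so it matches the filled quantities.
filled['Revenue'] = filled['Quantity'] * filled['Price']

print(dropped)
print(filled)
```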
