
# Pandas for Data Analysis
This notebook is a beginner-friendly walkthrough of core pandas features, using small examples. Topics include:
- Reading data from CSV, Excel, and JSON
- Selecting and filtering rows and columns
- Merging, joining, and concatenating datasets
- Grouping and aggregating data
- Applying functions with `.apply()`, `.map()`, and `lambda` expressions
- Transforming data with `.pivot()`, `.melt()`, `.stack()`, `.unstack()`

In [1]:
# Importing required libraries
import pandas as pd
import numpy as np
import json

## Reading Data
We'll create small example files and read them using pandas.

In [2]:
# Create example CSV file
csv_data = '''Name,Age,Salary
Alice,25,50000
Bob,30,60000
Charlie,35,70000'''
with open('example.csv', 'w') as f:
    f.write(csv_data)
# Read CSV
df_csv = pd.read_csv('example.csv')
df_csv

,Name,Age,Salary
0,Alice,25,50000
1,Bob,30,60000
2,Charlie,35,70000
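
`pd.read_csv` accepts a few parameters that are often useful when loading real files. The sketch below reuses the CSV text from the cell above but reads it from an in-memory buffer (`io.StringIO`) instead of a file, purely so that it runs on its own; the variable names `subset` and `by_name` are illustrative.

```python
import io
import pandas as pd

csv_data = '''Name,Age,Salary
Alice,25,50000
Bob,30,60000
Charlie,35,70000'''

# io.StringIO lets read_csv parse a string as if it were a file
subset = pd.read_csv(
    io.StringIO(csv_data),
    usecols=['Name', 'Salary'],   # load only these columns
    dtype={'Salary': 'float64'},  # override the inferred integer dtype
)

# index_col uses a column as the row index instead of 0, 1, 2, ...
by_name = pd.read_csv(io.StringIO(csv_data), index_col='Name')

print(subset)
print(by_name)
```

With `index_col='Name'`, rows can be looked up by label, for example `by_name.loc['Bob']`.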


In [3]:
# Create example Excel file (writing .xlsx requires the openpyxl package)
df_csv.to_excel('example.xlsx', index=False)
# Read Excel
df_excel = pd.read_excel('example.xlsx')
df_excel

,Name,Age,Salary
0,Alice,25,50000
1,Bob,30,60000
2,Charlie,35,70000


In [4]:
# Create example JSON file
json_data = df_csv.to_dict(orient='records')
with open('example.json', 'w') as f:
    json.dump(json_data, f)
# Read JSON
df_json = pd.read_json('example.json')
df_json

,Name,Age,Salary
0,Alice,25,50000
1,Bob,30,60000
2,Charlie,35,70000


## Selecting and Filtering Rows and Columns

In [5]:
# Selecting a single column (returns a Series)
df_csv['Name']

,Name
0,Alice
1,Bob
2,Charlie


In [6]:
# Filtering rows where Age > 28
df_csv[df_csv['Age'] > 28]

,Name,Age,Salary
1,Bob,30,60000
2,Charlie,35,70000
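
The two cells above select one column and filter on one condition. A minimal sketch of the next steps, selecting several columns, combining conditions, and using `.loc`, might look like this. The DataFrame is rebuilt inline so the example runs on its own, and the result names are illustrative.

```python
import pandas as pd

# Rebuild the example data so this cell runs on its own
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
})

# Select several columns at once by passing a list of names
name_and_salary = df[['Name', 'Salary']]

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
over_28_under_65k = df[(df['Age'] > 28) & (df['Salary'] < 65000)]

# .loc filters rows and picks columns in a single step
older_names = df.loc[df['Age'] > 28, 'Name']

print(over_28_under_65k)
print(older_names)
```

The parentheses matter because `&` binds more tightly than `>` and `<` in Python.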


## Merging, Joining, and Concatenating

In [7]:
# Create another DataFrame
dept = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Dept': ['HR', 'IT', 'Finance']})
# Merge on 'Name'
pd.merge(df_csv, dept, on='Name')

,Name,Age,Salary,Dept
0,Alice,25,50000,HR
1,Bob,30,60000,IT
2,Charlie,35,70000,Finance
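
In the merge above every name appears in both tables, so the default inner join keeps all rows. The sketch below assumes a hypothetical department table that is missing Charlie, to show how `how='left'` and the index-based `.join()` handle unmatched rows.

```python
import pandas as pd

employees = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# Hypothetical department table with no entry for Charlie
dept_partial = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Dept': ['HR', 'IT']})

# how='left' keeps every row of the left frame; unmatched rows get NaN
left_merged = pd.merge(employees, dept_partial, on='Name', how='left')

# .join() aligns on the index, so make 'Name' the index on both sides first
joined = employees.set_index('Name').join(dept_partial.set_index('Name'))

print(left_merged)
print(joined)
```

With the default `how='inner'`, Charlie's row would be dropped instead of being filled with `NaN`.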


In [8]:
# Concatenate two DataFrames vertically (the original index labels 0-2 repeat)
pd.concat([df_csv, df_csv])

,Name,Age,Salary
0,Alice,25,50000
1,Bob,30,60000
2,Charlie,35,70000
0,Alice,25,50000
1,Bob,30,60000
2,Charlie,35,70000
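
The output above repeats the index labels 0, 1, 2, which can cause surprises when selecting rows by label later. One common way to avoid that is `ignore_index=True`, sketched here with the example data rebuilt inline.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
})

# ignore_index=True discards the old labels and numbers the rows 0..n-1
combined = pd.concat([df, df], ignore_index=True)

print(combined)
```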


## Grouping and Aggregating

In [9]:
# Group by Age and calculate the average salary (each age occurs once here, so every group has one row)
df_csv.groupby('Age').agg({'Salary': 'mean'})

Age,Salary
25,50000.0
30,60000.0
35,70000.0
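
Because every age in the example is unique, each group above contains a single row and the mean equals the original salary. Aggregation is easier to see when groups hold several rows. The sketch below uses a hypothetical `staff` table with a `Dept` column and an extra employee, neither of which is part of the example file.

```python
import pandas as pd

# Hypothetical data: two employees per department
staff = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Dept': ['HR', 'IT', 'IT', 'HR'],
    'Salary': [50000, 60000, 70000, 54000],
})

# Compute several statistics per department in one call
summary = staff.groupby('Dept')['Salary'].agg(['mean', 'max', 'count'])

print(summary)
```

Passing a list to `.agg()` yields one output column per statistic, with the group key (`Dept`) as the index.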


## Applying Functions: `.apply()`, `.map()`, and `lambda`

In [10]:
# Use .apply() with a lambda to compute a 10% tax on each salary
df_csv['Tax'] = df_csv['Salary'].apply(lambda x: x * 0.1)
df_csv

,Name,Age,Salary,Tax
0,Alice,25,50000,5000.0
1,Bob,30,60000,6000.0
2,Charlie,35,70000,7000.0
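
The heading of this section also names `.map()`, which the cell above does not use. A minimal sketch of `Series.map()` with a dictionary and with a function, plus a row-wise `DataFrame.apply()`, could look like this. The `dept_lookup` dictionary and the new column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Salary': [50000, 60000, 70000],
})

# Series.map() with a dict replaces each value by its lookup result
dept_lookup = {'Alice': 'HR', 'Bob': 'IT', 'Charlie': 'Finance'}
df['Dept'] = df['Name'].map(dept_lookup)

# Series.map() also accepts a function, here a lambda
df['Initial'] = df['Name'].map(lambda name: name[0])

# DataFrame.apply() with axis=1 calls the function once per row
df['Label'] = df.apply(lambda row: f"{row['Name']} ({row['Dept']})", axis=1)

# Simple arithmetic needs no lambda: operating on the column is vectorized
df['Tax'] = df['Salary'] * 0.1

print(df)
```

For plain arithmetic such as the tax calculation, the vectorized form is usually shorter and faster than `.apply()` with a lambda.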


## Transforming Data: `.pivot()`, `.melt()`, `.stack()`, `.unstack()`

In [11]:
# Melt: reshape from wide to long format (one row per Name/variable pair)
melted = pd.melt(df_csv, id_vars=['Name'], value_vars=['Age', 'Salary'])
melted

,Name,variable,value
0,Alice,Age,25
1,Bob,Age,30
2,Charlie,Age,35
3,Alice,Salary,50000
4,Bob,Salary,60000
5,Charlie,Salary,70000


In [12]:
# Pivot the melted data back to wide format (one column per variable)
df_pivot = melted.pivot(index='Name', columns='variable', values='value')
df_pivot

Name,Age,Salary
Alice,25,50000
Bob,30,60000
Charlie,35,70000
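
The heading of this section also lists `.stack()` and `.unstack()`, which the cells above do not demonstrate. The sketch below rebuilds a wide table like the pivot result and moves its column labels into the index and back; the names `wide`, `stacked`, and `restored` are illustrative.

```python
import pandas as pd

# A wide table indexed by Name, similar to the pivot result
wide = pd.DataFrame(
    {'Age': [25, 30, 35], 'Salary': [50000, 60000, 70000]},
    index=pd.Index(['Alice', 'Bob', 'Charlie'], name='Name'),
)

# stack() moves the column labels into an inner index level,
# producing a Series indexed by (Name, column label)
stacked = wide.stack()

# unstack() moves that inner index level back out to columns
restored = stacked.unstack()

print(stacked)
print(restored)
```

Individual values in the stacked Series can be read with a tuple key, for example `stacked.loc[('Alice', 'Salary')]`.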
