# Pandas – Data Analysis Library

## 1. Why Learn Pandas?
### What is it?

Pandas is a Python library for data manipulation and analysis. Its two core data structures are the Series (a one-dimensional labeled array) and the DataFrame (a two-dimensional labeled table).

### Why do we need it?
1. Simplifies data cleaning, exploration, and manipulation.

2. Handles large in-memory datasets efficiently.

3. Built on top of NumPy, so column-wise computations are fast.
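
The NumPy foundation can be seen in a short sketch. The `prices` values are made up for illustration; the point is that arithmetic applies to a whole Series at once, and that the underlying data can be retrieved as a NumPy array.

```python
import numpy as np
import pandas as pd

# Hypothetical values, used only for illustration
prices = pd.Series([100.0, 250.0, 80.0])

# Vectorized arithmetic: applied to every element without a Python loop
doubled = prices * 2

# The underlying data is available as a NumPy array
arr = prices.to_numpy()

print(type(arr))
print(doubled.tolist())
```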

### Where is it used?
1. Data Analysis & Reporting

2. Machine Learning (preprocessing datasets)

3. Finance, Healthcare, Marketing data analysis

4. CSV/Excel/SQL data handling

## 2. Importing Pandas


In [1]:
import pandas as pd

## 3. Series & DataFrames

In [5]:
# Series 
s = pd.Series([10, 20, 30, 40]) 
print("Series:\n", s) 
# DataFrame from dict 
data = {'Name': ['Alice','Bob','Charlie'], 'Age':[25,30,35]} 
df = pd.DataFrame(data) 
print("\nDataFrame:\n", df)

Series:
 0    10
1    20
2    30
3    40
dtype: int64

DataFrame:
       Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
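
The Series above uses the default index 0, 1, 2, ... As a small sketch, the index can also hold custom labels; the labels `'a'`, `'b'`, `'c'` below are assumptions chosen for illustration. The example also shows that a single DataFrame column is itself a Series.

```python
import pandas as pd

# Hypothetical labels 'a'..'c' replace the default 0..n-1 index
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(s['b'])       # look up by label
print(s.iloc[1])    # look up by position

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
print(type(df['Age']))  # a single column is a Series
print(df.dtypes)        # each column has its own dtype
```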


## 4. Reading & Exploring Data

In [14]:

# Read CSV (replace with your file path)
# df = pd.read_csv("titanic.csv")
 
# Example DataFrame
data = {'Name':['Alice','Bob','Charlie'],'Age':[25,30,35],'Sex':['F','M','M']}
df = pd.DataFrame(data)
 
# Explore
print("head:\n",df.head())
print("\ntail:\n",df.tail())
print("\ninfo:")
df.info()               # info() prints its report directly and returns None
print("\ndescribe:\n",df.describe())
print("\nshape:\n",df.shape)
 

head:
       Name  Age Sex
0    Alice   25   F
1      Bob   30   M
2  Charlie   35   M

tail:
       Name  Age Sex
0    Alice   25   F
1      Bob   30   M
2  Charlie   35   M

info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    3 non-null      object
 1   Age     3 non-null      int64 
 2   Sex     3 non-null      object
dtypes: int64(1), object(2)
memory usage: 200.0+ bytes

describe:
         Age
count   3.0
mean   30.0
std     5.0
min    25.0
25%    27.5
50%    30.0
75%    32.5
max    35.0

shape:
 (3, 3)
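
Because `titanic.csv` is not included here, the following sketch reads CSV text from an in-memory buffer instead of a file. The `csv_text` contents are made up to mirror the example DataFrame; `pd.read_csv` accepts a path or a file-like object in the same way. Note that `head()` and `tail()` return five rows by default, which is why both show the whole table for a three-row DataFrame.

```python
import io
import pandas as pd

# Stand-in for a file on disk; the contents are made up for illustration
csv_text = """Name,Age,Sex
Alice,25,F
Bob,30,M
Charlie,35,M
"""

# read_csv accepts a file path or any file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.shape)
print(df_csv.columns.tolist())
print(df_csv.head(2))   # head(n) limits the preview to the first n rows
```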


## 5. Selecting Columns & Rows

In [15]:
# Select column
print(df['Name'])
 
# Select multiple columns
print(df[['Name','Age']])
 
# Select a single row
print(df.iloc[0])       # first row, by integer position
print(df.loc[0])        # row whose index label is 0
 
# Select subset of rows and columns (loc slices include the end label)
print(df.loc[0:1, ['Name','Sex']])

0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
Name    Alice
Age        25
Sex         F
Name: 0, dtype: object
Name    Alice
Age        25
Sex         F
Name: 0, dtype: object
    Name Sex
0  Alice   F
1    Bob   M
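
The difference between `loc` and `iloc` is easy to miss when the index is the default 0, 1, 2. A minimal sketch, using a copy of the example data under the hypothetical name `df_sel`, shows that label slices include the end point while position slices do not, and that the two diverge once the index holds names.

```python
import pandas as pd

df_sel = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                       'Age': [25, 30, 35],
                       'Sex': ['F', 'M', 'M']})

by_label = df_sel.loc[0:1]      # labels 0 and 1: end label is included
by_position = df_sel.iloc[0:1]  # position 0 only: end position is excluded
print(len(by_label), len(by_position))

# With names as the index, loc takes labels and iloc still takes positions
df_named = df_sel.set_index('Name')
print(df_named.loc['Bob', 'Age'])   # label-based lookup
print(df_named.iloc[1, 0])          # same cell, by position
```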


## 6. Filtering & Sorting

In [16]:
# Filter rows
print(df[df['Age'] > 28])
 
# Sort by Age
print(df.sort_values('Age', ascending=False))

      Name  Age Sex
1      Bob   30   M
2  Charlie   35   M
      Name  Age Sex
2  Charlie   35   M
1      Bob   30   M
0    Alice   25   F
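
Filters can be combined, and sorting can use more than one column. The sketch below rebuilds the example data as `df_f` (a hypothetical name) so it runs on its own. Conditions are joined with `&` or `|`, and each one needs its own parentheses.

```python
import pandas as pd

df_f = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                     'Age': [25, 30, 35],
                     'Sex': ['F', 'M', 'M']})

# Combine conditions with & (and) or | (or); parenthesize each condition
older_men = df_f[(df_f['Age'] > 28) & (df_f['Sex'] == 'M')]
print(older_men)

# isin() tests membership in a list of values
picked = df_f[df_f['Name'].isin(['Alice', 'Charlie'])]
print(picked)

# Sort by several columns: Sex ascending, then Age descending
ordered = df_f.sort_values(['Sex', 'Age'], ascending=[True, False])
print(ordered)
```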


## 7. Adding & Removing Columns

In [17]:
# Add new column
df['Age_in_5yrs'] = df['Age'] + 5
print(df)
 
# Drop column
df = df.drop('Age_in_5yrs', axis=1)
print(df)

      Name  Age Sex  Age_in_5yrs
0    Alice   25   F           30
1      Bob   30   M           35
2  Charlie   35   M           40
      Name  Age Sex
0    Alice   25   F
1      Bob   30   M
2  Charlie   35   M
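
As an alternative sketch, `assign()` adds a column without modifying the original DataFrame, and `drop(columns=...)` is an equivalent spelling of `drop(..., axis=1)`. The names `df_c` and `Is_adult` are assumptions made for this example.

```python
import pandas as pd

df_c = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# assign() returns a new DataFrame and leaves df_c unchanged
with_flag = df_c.assign(Is_adult=df_c['Age'] >= 18)
print(with_flag)

# columns= is an alternative to passing axis=1
without_age = with_flag.drop(columns=['Age'])
print(without_age)
```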


## 8. GroupBy & Aggregation

In [18]:
data = {'Name':['Alice','Bob','Charlie','Alice','Bob'],
        'Score':[85,90,95,80,70]}
df = pd.DataFrame(data)
 
# Group by Name and calculate mean score
# (selecting ['Score'] keeps any non-numeric columns out of the mean)
grouped = df.groupby('Name')[['Score']].mean()
print(grouped)

         Score
Name          
Alice     82.5
Bob       80.0
Charlie   95.0
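
Several aggregations can be computed in one pass with `agg()`. This is a minimal sketch on the same scores, stored under the hypothetical name `df_g`. Selecting the `Score` column before aggregating keeps non-numeric columns out of the calculation, which recent pandas versions would otherwise reject for `mean`.

```python
import pandas as pd

df_g = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Alice', 'Bob'],
                     'Score': [85, 90, 95, 80, 70]})

# Select the column first, then apply several aggregations at once
summary = df_g.groupby('Name')['Score'].agg(['mean', 'max', 'count'])
print(summary)
```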


## 9. Handling Missing Data

In [19]:

data = {'Name':['Alice','Bob','Charlie','David'],
        'Age':[25, None, 35, 40]}
df = pd.DataFrame(data)
 
# Fill the missing value with the column mean
df['Age'] = df['Age'].fillna(df['Age'].mean())
print(df)
 
# Alternative: drop rows with missing values instead of filling them
# df = df.dropna()
 

      Name        Age
0    Alice  25.000000
1      Bob  33.333333
2  Charlie  35.000000
3    David  40.000000
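
Before choosing between filling and dropping, it usually helps to count the missing values. The sketch below uses the same data under the hypothetical name `df_m`; filling with the median instead of the mean is shown only as one possible alternative.

```python
import pandas as pd

df_m = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                     'Age': [25, None, 35, 40]})

# Count missing values per column
missing = df_m.isna().sum()
print(missing)

# dropna() returns a new DataFrame without the incomplete rows
complete = df_m.dropna()
print(complete)

# The median is less sensitive to outliers than the mean
filled = df_m['Age'].fillna(df_m['Age'].median())
print(filled)
```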
