# Pandas – Data Analysis Library

## 1. Why Learn Pandas?



### What is it?

Pandas is a Python library for data manipulation and analysis. It provides two core data structures: the Series (a one-dimensional labeled array) and the DataFrame (a two-dimensional labeled table).

### Why do we need it?
1. Simplifies data cleaning, exploration, and manipulation.

2. Handles large in-memory datasets efficiently.

3. Built on top of NumPy for fast computations.
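
As a rough illustration of the NumPy point above, the sketch below applies arithmetic to a whole Series at once instead of looping in Python. The `prices` data and variable names are made up for this example.

```python
import numpy as np
import pandas as pd

# Illustrative data, not taken from the notebook
prices = pd.Series([10.0, 20.0, 30.0])

# Vectorized arithmetic: the multiplication is applied to every element
# without an explicit Python loop.
discounted = prices * 0.5

# to_numpy() exposes the values as a NumPy array.
values = prices.to_numpy()

print(discounted.tolist())
print(type(values).__name__)
```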

### Applications of Pandas
1. Data analysis and reporting

2. Machine learning (preprocessing datasets)

3. Finance, healthcare, and marketing data analysis

4. CSV/Excel/SQL data handling

## 2. Importing Pandas

In [2]:
import pandas as pd

## 3. Series & DataFrames


In [3]:
# Series: a one-dimensional labeled array
s = pd.Series([10, 20, 30, 40])
print("Series:\n", s)

# DataFrame from a dict: keys become column names, lists become column values
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
print("\nDataFrame:\n", df)

Series:
 0    10
1    20
2    30
3    40
dtype: int64

DataFrame:
       Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
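
To connect the two structures, here is a small sketch showing a Series with custom index labels and a DataFrame column retrieved as a Series. The string labels and the names `ages` and `age_column` are chosen only for this example.

```python
import pandas as pd

# A Series pairs values with index labels (labels here are illustrative).
ages = pd.Series([25, 30, 35], index=['Alice', 'Bob', 'Charlie'])
print(ages['Bob'])  # look up a value by its label

# Selecting one column of a DataFrame returns a Series.
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})
age_column = df['Age']
print(type(age_column).__name__)
print(df.columns.tolist())
```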


## 4. Reading and Exploring the Data

In [4]:

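# Read CSV (replace with your file path); commented out so the cell runs without the file
# df = pd.read_csv("titanic.csv")

# Example DataFrame used in the cells below
data = {'Name':['Alice','Bob','Charlie'],'Age':[25,30,35],'Sex':['F','M','M']}
df = pd.DataFrame(data)
 
# Explore: head(n) returns the first n rows (all 3 here, since the frame has fewer than 4)
print(df.head(4))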


      Name  Age Sex
0    Alice   25   F
1      Bob   30   M
2  Charlie   35   M
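
If no CSV file is at hand, `read_csv` can also read from a file-like object. The sketch below uses `io.StringIO` with made-up CSV text (`csv_text`) that mirrors the example DataFrame above. It is a stand-in for a real file, not the Titanic data.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = "Name,Age,Sex\nAlice,25,F\nBob,30,M\nCharlie,35,M\n"

# read_csv accepts a path or any file-like object
people = pd.read_csv(io.StringIO(csv_text))

print(people.head(2))
print(people.shape)
```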


In [36]:
print(df.tail(5))



      Name  Age Sex
0    Alice   25   F
1      Bob   30   M
2  Charlie   35   M


In [61]:
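df.info()  # info() prints its summary and returns None, so wrapping it in print() is unnecessary

Note: this output appears to have been captured after the cells in Section 9 had run (the notebook was executed out of order), so it describes the 4-row Name/Age DataFrame built there rather than the 3-row example above.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Name    4 non-null      object 
 1   Age     4 non-null      float64
dtypes: float64(1), object(1)
memory usage: 196.0+ bytes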


In [38]:
print(df.describe())


        Age
count   3.0
mean   30.0
std     5.0
min    25.0
25%    27.5
50%    30.0
75%    32.5
max    35.0


In [39]:
print(df.shape)

(3, 3)


## 5. Selecting Rows and Columns (using loc and iloc)


In [48]:
# Select column
print(df['Name'],"\n")

0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object 



In [49]:
# Select multiple columns
print(df[['Name','Age']],"\n")

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35 



In [50]:
# Select a single row
print(df.iloc[0],"\n")       # first row by integer position
print(df.loc[0],"\n")        # row with index label 0 (the same row here)

Name    Alice
Age        25
Sex         F
Name: 0, dtype: object 

Name    Alice
Age        25
Sex         F
Name: 0, dtype: object 



In [None]:
# Select a subset of rows and columns (loc slices include the end label, so 0:1 returns rows 0 and 1)
print(df.loc[0:1, ['Name','Sex']],"\n")

    Name Sex
0  Alice   F
1    Bob   M 
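
With the default integer index, `loc` and `iloc` can look interchangeable. The sketch below uses string index labels (an assumption made for this example) so the difference shows: `loc` slices by label and includes the end label, while `iloc` slices by position and excludes the end position.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=['a', 'b', 'c'],  # string labels chosen for illustration
)

by_label = df.loc['a':'b', 'Name']   # label slice: end label 'b' is included
by_position = df.iloc[0:1, 0]        # position slice: end position 1 is excluded

print(by_label.tolist())
print(by_position.tolist())
```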



## 6. Filtering & Sorting

In [53]:

# Filter rows
print(df[df['Age'] > 28])
 
# Sort by Age
print(df.sort_values('Age', ascending=False))

      Name  Age Sex
1      Bob   30   M
2  Charlie   35   M
      Name  Age Sex
2  Charlie   35   M
1      Bob   30   M
0    Alice   25   F
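
Filters can be combined, and sorting can use more than one column. The following is a minimal sketch that rebuilds the example DataFrame. The variable names `older_men`, `selected`, and `ordered` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'Sex': ['F', 'M', 'M']})

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
older_men = df[(df['Age'] > 28) & (df['Sex'] == 'M')]

# isin() keeps rows whose value appears in the given list.
selected = df[df['Name'].isin(['Alice', 'Charlie'])]

# Sort by several columns: Sex ascending, then Age descending.
ordered = df.sort_values(['Sex', 'Age'], ascending=[True, False])

print(older_men)
print(selected)
print(ordered)
```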


## 7. Adding and Removing Columns

In [60]:
# Add new column
df['Age_in_5yrs'] = df['Age'] + 5
print(df,"\n")
 
# Drop column
df = df.drop('Age_in_5yrs', axis=1)
print(df)

Note: the output below appears to have been captured with the 4-row Name/Age DataFrame from Section 9 (the notebook was executed out of order), so it differs from the 3-row example built in Section 4. The added and dropped column behaves the same way on either frame.
 

      Name        Age  Age_in_5yrs
0    Alice  25.000000    30.000000
1      Bob  33.333333    38.333333
2  Charlie  35.000000    40.000000
3    David  40.000000    45.000000 

      Name        Age
0    Alice  25.000000
1      Bob  33.333333
2  Charlie  35.000000
3    David  40.000000
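
An alternative that leaves the original DataFrame untouched is `assign`, which returns a new frame, paired with `drop(columns=...)`, which avoids the `axis` argument. This is a sketch on a small made-up frame. `with_future` and `trimmed` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# assign() returns a new DataFrame with the extra column; df itself is unchanged.
with_future = df.assign(Age_in_5yrs=df['Age'] + 5)

# drop(columns=...) is equivalent to drop(..., axis=1).
trimmed = with_future.drop(columns=['Age_in_5yrs'])

print(with_future)
print(trimmed)
```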


## 8. GroupBy & Aggregation

In [5]:
# Sample scores; Alice and Bob each appear twice
data = {'Name':['Alice','Bob','Charlie','Alice','Bob'],
        'Score':[85,90,95,80,70]}
df = pd.DataFrame(data)
print(df,"\n")
 
# Group by Name and calculate mean score
grouped = df.groupby('Name').mean()
print(grouped)

      Name  Score
0    Alice     85
1      Bob     90
2  Charlie     95
3    Alice     80
4      Bob     70 

         Score
Name          
Alice     82.5
Bob       80.0
Charlie   95.0
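
Several aggregations can be computed in one pass by selecting the column and calling `agg` with a list of function names. The sketch below reuses the scores from the cell above. `summary` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'Alice', 'Bob'],
                   'Score': [85, 90, 95, 80, 70]})

# Select the column explicitly, then compute several statistics per group.
summary = df.groupby('Name')['Score'].agg(['mean', 'max', 'count'])

print(summary)
```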


## 9. Handling Missing Data

In [None]:

data = {'Name':['Alice','Bob','Charlie','David'],
        'Age':[25, None, 35, 40]}
df = pd.DataFrame(data)
print(df,"\n")

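# Option 1: fill missing values with the column mean
df['Age'] = df['Age'].fillna(df['Age'].mean())
print(df)
 
# Option 2: drop rows that still contain missing values
# (a no-op here, because the only NaN was filled above)
df = df.dropna()
df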
 

      Name   Age
0    Alice  25.0
1      Bob   NaN
2  Charlie  35.0
3    David  40.0 

      Name        Age
0    Alice  25.000000
1      Bob  33.333333
2  Charlie  35.000000
3    David  40.000000


      Name        Age
0    Alice  25.000000
1      Bob  33.333333
2  Charlie  35.000000
3    David  40.000000
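
Before filling or dropping, it often helps to count the missing values per column. The sketch below does that on the same data and then drops only the rows where `Age` is missing. `missing_per_column` and `only_complete` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, None, 35, 40]})

# isna() marks missing cells; sum() counts them per column.
missing_per_column = df.isna().sum()

# dropna(subset=...) only considers the listed columns when deciding which rows to drop.
only_complete = df.dropna(subset=['Age'])

print(missing_per_column)
print(only_complete)
```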

