# Pandas – Data Analysis Library

### 1. Why Learn Pandas?

### What is it?

Pandas is a Python library for data manipulation and analysis. It provides two core data structures: the Series (a one-dimensional labeled array) and the DataFrame (a two-dimensional labeled table).

### Why do we need it?

1. Simplifies data cleaning, exploration, and manipulation.

2. Handles large in-memory datasets efficiently.

3. Built on top of NumPy, so column-wise operations run as fast vectorized computations.
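
The NumPy point above can be illustrated with a minimal sketch. The Series `s` and the names `doubled` and `values` are illustrative only; the idea is that arithmetic applies to every element at once, and that the underlying values can be retrieved as a NumPy array.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, 30, 40])

# Arithmetic is applied to every element at once; no explicit Python loop.
doubled = s * 2

# The values of a Series can be retrieved as a NumPy array.
values = s.to_numpy()

print(doubled.tolist())
print(type(values))
```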

### Where is it used?

1. Data analysis and reporting

2. Machine learning (preprocessing datasets)

3. Finance, healthcare, and marketing data analysis

4. CSV/Excel/SQL data handling
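
As a rough sketch of the CSV case, the snippet below reads CSV text from an in-memory buffer and writes it back out. The sample text and the names `csv_text`, `df_csv`, and `out` are made up for illustration; with a real file you would pass a path to `pd.read_csv` instead. Excel and SQL sources are read with `pd.read_excel` and `pd.read_sql`, which need an extra engine or a database connection and are not shown here.

```python
import io
import pandas as pd

# Hypothetical CSV content; io.StringIO makes it behave like a file.
csv_text = "name,age\nAlice,25\nBob,30\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

# Write the DataFrame back to CSV text, leaving out the index column.
out = df_csv.to_csv(index=False)

print(df_csv)
print(out)
```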

### 2. Importing Pandas

In [2]:
import pandas as pd

### 3. Series & DataFrames


In [6]:
# Series: a one-dimensional labeled array
s = pd.Series([10, 20, 30, 40])
print("Series:\n", s)
# DataFrame: a two-dimensional table built from a dict of columns
data = {'Name': ['Alice','Bob','Charlie'], 'Age':[25,30,35]}
df = pd.DataFrame(data)
print("\nDataFrame:\n", df)

Series:
 0    10
1    20
2    30
3    40
dtype: int64

DataFrame:
       Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
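
To make the relationship between the two structures a little more concrete, here is a small sketch. The labels `"a"`, `"b"`, `"c"` and the names `scores`, `people`, and `ages` are illustrative assumptions. It shows a Series with an explicit index, and that selecting one column of a DataFrame gives back a Series.

```python
import pandas as pd

# A Series with explicit labels instead of the default 0..n-1 index.
scores = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(scores["b"])  # label-based lookup

people = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

# Each column of a DataFrame is itself a Series.
ages = people["Age"]
print(type(ages))
print(people.dtypes)
```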


### 4. Reading and Exploring Data

In [15]:
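# Read a CSV file (replace with your own file path), e.g.:
# df = pd.read_csv("titanic.csv")
 
# Dataset used for the outputs below
df = pd.read_csv("heart.csv")
# Alternative: build a small DataFrame by hand if no CSV is available
# data = {'Name':['Alice','Bob','Charlie'],'Age':[25,30,35],'Sex':['F','M','M']}
# df = pd.DataFrame(data)
 
# Explore
print(df.head(5))      # first 5 rows
print(df.tail(5))      # last 5 rows
df.info()              # prints dtypes and non-null counts itself; returns None
print(df.describe())   # summary statistics for numeric columns
print(df.shape)        # (rows, columns)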

   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  \
0   52    1   0       125   212    0        1      168      0      1.0      2   
1   53    1   0       140   203    1        0      155      1      3.1      0   
2   70    1   0       145   174    0        1      125      1      2.6      0   
3   61    1   0       148   203    0        1      161      0      0.0      2   
4   62    0   0       138   294    1        1      106      0      1.9      1   

   ca  thal  target  
0   2     3       0  
1   0     3       0  
2   0     3       0  
3   1     3       0  
4   3     2       0  
      age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  \
1020   59    1   1       140   221    0        1      164      1      0.0   
1021   60    1   0       125   258    0        0      141      1      2.8   
1022   47    1   0       110   275    0        0      118      1      1.0   
1023   50    0   0       110   254    0        0      159      0      0.0
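
If `heart.csv` is not at hand, the same exploration calls can be tried on an in-memory sample. The three rows below are a made-up stand-in (only the column names are borrowed from the output above), and the names `sample` and `small` are illustrative.

```python
import io
import pandas as pd

# Hypothetical three-row sample; the last row has an empty chol field.
sample = io.StringIO("age,sex,chol\n52,1,212\n53,1,203\n70,1,\n")
small = pd.read_csv(sample)

print(small.shape)          # (rows, columns)
print(small.dtypes)         # data type of each column
print(small.isna().sum())   # number of missing values per column
small.info()                # prints its report directly
```

An empty field in the CSV is read as a missing value (`NaN`), which is why `isna().sum()` reports one missing entry for `chol` in this sample.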

### 5. Selecting Columns & Rows

In [16]:
# Select a single column (returns a Series)
print(df['age'].head(5))
 
# Select multiple columns (returns a DataFrame)
print(df[['sex','age']].head(5))
 
# Select a single row
print(df.iloc[0])       # first row by integer position
print(df.loc[0])        # row whose index label is 0 (the same row here)
 
# Select a subset of rows and columns; loc slices include the end label
print(df.loc[0:1, ['age','sex']])

0    52
1    53
2    70
3    61
4    62
Name: age, dtype: int64
   sex  age
0    1   52
1    1   53
2    1   70
3    1   61
4    0   62
age          52.0
sex           1.0
cp            0.0
trestbps    125.0
chol        212.0
fbs           0.0
restecg       1.0
thalach     168.0
exang         0.0
oldpeak       1.0
slope         2.0
ca            2.0
thal          3.0
target        0.0
Name: 0, dtype: float64
age          52.0
sex           1.0
cp            0.0
trestbps    125.0
chol        212.0
fbs           0.0
restecg       1.0
thalach     168.0
exang         0.0
oldpeak       1.0
slope         2.0
ca            2.0
thal          3.0
target        0.0
Name: 0, dtype: float64
   age  sex
0   52    1
1   53    1
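
With the default 0, 1, 2, ... index, `iloc` and `loc` return the same rows, which hides the difference between them. The sketch below uses a made-up frame whose index labels (10, 20, 30) deliberately differ from the row positions. The names `frame`, `by_position`, `by_label`, `loc_slice`, and `iloc_slice` are illustrative.

```python
import pandas as pd

frame = pd.DataFrame(
    {"age": [52, 53, 70], "sex": [1, 1, 1]},
    index=[10, 20, 30],  # labels deliberately differ from positions
)

by_position = frame.iloc[0]   # first row, whatever its label is
by_label = frame.loc[10]      # row whose index label is 10

# loc slices include the end label; iloc slices exclude the end position.
loc_slice = frame.loc[10:20, ["age"]]   # rows labeled 10 and 20
iloc_slice = frame.iloc[0:1, [0]]       # only the row at position 0

print(loc_slice)
print(iloc_slice)
```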


### 6. Filtering & Sorting


In [14]:
# Filter rows where age is greater than 28
print(df[df['age'] > 28].head(5))
 
# Sort by age, oldest first
print(df.sort_values('age', ascending=False).head(5))

   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  \
0   52    1   0       125   212    0        1      168      0      1.0      2   
1   53    1   0       140   203    1        0      155      1      3.1      0   
2   70    1   0       145   174    0        1      125      1      2.6      0   
3   61    1   0       148   203    0        1      161      0      0.0      2   
4   62    0   0       138   294    1        1      106      0      1.9      1   

   ca  thal  target  
0   2     3       0  
1   0     3       0  
2   0     3       0  
3   1     3       0  
4   3     2       0  
     age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  \
387   77    1   0       125   304    0        0      162      1      0.0   
162   77    1   0       125   304    0        0      162      1      0.0   
160   77    1   0       125   304    0        0      162      1      0.0   
965   76    0   2       140   197    0        2      116      0      1.1   
9
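
Filters can combine several conditions, and sorting can use more than one column. The following is a minimal sketch on made-up data. The frame `patients` and its values are assumptions for illustration, not rows from `heart.csv`.

```python
import pandas as pd

patients = pd.DataFrame({
    "age": [52, 70, 61, 45],
    "sex": [1, 1, 0, 0],
    "chol": [212, 174, 294, 250],
})

# Combine conditions with & (and) or | (or); wrap each condition in parentheses.
older_high_chol = patients[(patients["age"] > 50) & (patients["chol"] > 200)]

# Sort by sex ascending, then by age descending within each sex.
ordered = patients.sort_values(["sex", "age"], ascending=[True, False])

print(older_high_chol)
print(ordered)
```

The parentheses matter because `&` and `|` bind more tightly than comparison operators in Python.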

### 7. Adding & Removing Columns


In [18]:
# Add new column
df['Age_in_5yrs'] = df['age'] + 5
print(df.head(2))


   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  \
0   52    1   0       125   212    0        1      168      0      1.0      2   
1   53    1   0       140   203    1        0      155      1      3.1      0   

   ca  thal  target  Age_in_5yrs  
0   2     3       0           57  
1   0     3       0           58  


In [19]:

# Drop column
df = df.drop('Age_in_5yrs', axis=1)
print(df.head(2))

   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  \
0   52    1   0       125   212    0        1      168      0      1.0      2   
1   53    1   0       140   203    1        0      155      1      3.1      0   

   ca  thal  target  
0   2     3       0  
1   0     3       0  
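
An alternative that avoids modifying the original DataFrame is `assign`, which returns a new frame, together with `drop(columns=...)`. This is only a sketch on a small made-up frame. The names `base`, `extended`, and `trimmed` are illustrative.

```python
import pandas as pd

base = pd.DataFrame({"age": [52, 53], "chol": [212, 203]})

# assign returns a new DataFrame and leaves `base` unchanged.
extended = base.assign(age_in_5yrs=base["age"] + 5)

# columns=[...] is equivalent to passing the labels with axis=1.
trimmed = extended.drop(columns=["age_in_5yrs"])

print(extended)
print(trimmed)
```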


### 8. GroupBy & Aggregation


In [23]:
data = {'Name':['Alice','Bob','Charlie','Alice','Bob'],
        'Score':[85,90,95,80,70]}
df = pd.DataFrame(data)
 
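# Group by Name and calculate the mean score per person.
# Selecting the column explicitly keeps this working if non-numeric columns are added.
grouped = df.groupby('Name')[['Score']].mean()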
print(grouped)

         Score
Name          
Alice     82.5
Bob       80.0
Charlie   95.0
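
Several aggregations can be computed in one pass with `agg`. The sketch below reuses the same names and scores as the cell above. The variable names `scores` and `summary` are illustrative.

```python
import pandas as pd

scores = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "Alice", "Bob"],
    "Score": [85, 90, 95, 80, 70],
})

# One row per name, one column per aggregation.
summary = scores.groupby("Name")["Score"].agg(["mean", "max", "count"])
print(summary)
```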


### 9. Handling Missing Data


In [25]:
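# Small example that actually contains a missing value:
# data = {'Name':['Alice','Bob','Charlie','David'],
#         'Age':[25, None, 35, 40]}
# df = pd.DataFrame(data)
df = pd.read_csv("heart.csv")
# Fill missing values in 'age' with the column mean
# (this has no effect if the column has no missing values)
df['age'] = df['age'].fillna(df['age'].mean())
print(df.head(5))
 
# Alternatively, drop rows that contain any missing value
# df = df.dropna()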
 

   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  \
0   52    1   0       125   212    0        1      168      0      1.0      2   
1   53    1   0       140   203    1        0      155      1      3.1      0   
2   70    1   0       145   174    0        1      125      1      2.6      0   
3   61    1   0       148   203    0        1      161      0      0.0      2   
4   62    0   0       138   294    1        1      106      0      1.9      1   

   ca  thal  target  
0   2     3       0  
1   0     3       0  
2   0     3       0  
3   1     3       0  
4   3     2       0  
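
To see `fillna` and `dropna` change something, the sketch below uses the small commented-out example from the cell above, where Bob's age is missing. The variable names `people`, `missing_per_column`, `filled`, and `dropped` are illustrative.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [25, None, 35, 40],
})

# Count missing values per column before deciding how to handle them.
missing_per_column = people.isna().sum()

# Option 1: replace the missing age with the mean of the known ages.
filled = people.assign(Age=people["Age"].fillna(people["Age"].mean()))

# Option 2: drop every row that contains a missing value.
dropped = people.dropna()

print(missing_per_column)
print(filled)
print(dropped)
```

Filling keeps all four rows but inserts an estimated value, while dropping keeps only the rows that were complete. Which one is appropriate depends on the analysis.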
