<a href="https://colab.research.google.com/github/Chowdhurynaseeh/ML_Batch-03/blob/main/pandas.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Step 1: Getting Started with Pandas**

In [None]:
!pip install pandas   # Colab already includes pandas; this is only needed in a fresh environment



In [None]:
import pandas as pd
import numpy as np   # often used together

## **Step 2: Series (1D Data)**

In [None]:
# A Series is like a labeled 1D array.
# From a list (pandas assigns a default integer index 0, 1, 2, ...)
s = pd.Series([10, 20, 30, 40])
print(s)

# With a custom index
s2 = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s2)

# Unlike a plain NumPy array, a Series pairs every value with an index label.

0    10
1    20
2    30
3    40
dtype: int64
a    10
b    20
c    30
dtype: int64

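Labels matter most when two Series are combined: pandas matches values by index label, not by position. A minimal sketch (the `other` Series is a made-up example):

```python
import pandas as pd

s2 = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Access by label and by position
print(s2["b"])        # label-based access
print(s2.iloc[1])     # position-based access

# Arithmetic aligns on index labels, not on positions
other = pd.Series([1, 2, 3], index=["b", "c", "d"])  # hypothetical second Series
total = s2 + other
print(total)          # labels a and d have no partner, so they become NaN
```

Because `NaN` is a float, the aligned result is `float64` even though both inputs were integers.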

## **Step 3: DataFrame (2D Table)**

In [None]:
# A DataFrame is like an Excel sheet or SQL table.
# From a dictionary: each key becomes a column name, each list becomes that column
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["Dhaka", "Chittagong", "Sylhet"]
}

df = pd.DataFrame(data)
print(df)

      Name  Age        City
0    Alice   25       Dhaka
1      Bob   30  Chittagong
2  Charlie   35      Sylhet

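A DataFrame can also be built from a list of row dictionaries, one per row. The sketch below rebuilds the same table that way (`rows` and `df_rows` are illustrative names) and shows that every column is itself a Series:

```python
import pandas as pd

# Same table, built row by row (one dict per row)
rows = [
    {"Name": "Alice", "Age": 25, "City": "Dhaka"},
    {"Name": "Bob", "Age": 30, "City": "Chittagong"},
    {"Name": "Charlie", "Age": 35, "City": "Sylhet"},
]
df_rows = pd.DataFrame(rows)

print(df_rows.columns.tolist())   # column names, in first-seen order
print(df_rows.dtypes)             # one dtype per column
print(type(df_rows["Age"]))       # a single column is a Series
```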

## **Step 4: Inspecting Data**

In [None]:
# print(df.head())      # First 5 rows (default n=5)
# print(df.tail())      # Last 5 rows
# print(df.shape)       # (rows, cols)
# df.info()             # Column dtypes, non-null counts & memory; prints directly (returns None)
print(df.describe())  # Summary stats for numeric columns

        Age
count   3.0
mean   30.0
std     5.0
min    25.0
25%    27.5
50%    30.0
75%    32.5
max    35.0

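The commented-out calls are easier to read when run one at a time. This sketch rebuilds the same `df`, shows that `df.info()` prints its report itself, and checks where the `std` of 5.0 in `describe()` comes from: pandas uses the sample standard deviation (`ddof=1`), while NumPy's `np.std` defaults to `ddof=0`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["Dhaka", "Chittagong", "Sylhet"],
})

print(df.shape)   # (3, 3): rows, columns
df.info()         # prints its own report; wrapping it in print() adds a stray "None"

# Sample std: sqrt(((25-30)**2 + 0 + (35-30)**2) / (3 - 1)) = sqrt(25) = 5.0
sample_std = df["Age"].std()
# Population std (NumPy default ddof=0): sqrt(50 / 3), about 4.08
population_std = np.std(df["Age"].to_numpy())
print(sample_std, population_std)
```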

## **Step 5: Selecting Data**

In [None]:
# print(df["Name"])        # Select single column (Series)
# print(df[["Name", "City"]])  # Select multiple columns

# print(df.loc[0])         # Row by index label (label 0 here)
# print(df.iloc[1])        # Row by integer position (2nd row)
print(df.loc[0, "Name"]) # Single value by row label and column name


Alice

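With the default 0, 1, 2 index, `loc` and `iloc` happen to agree, which hides the difference between them. A sketch with a hypothetical non-default index (10, 20, 30) makes it visible; note that `df.loc[0]` would raise a `KeyError` on this table:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]},
    index=[10, 20, 30],   # hypothetical labels, chosen to differ from positions
)

print(df.loc[20, "Name"])   # by label 20 -> Bob
print(df.iloc[0, 0])        # first row, first column -> Alice

# Label slices include the end label; position slices exclude the end position
print(len(df.loc[10:30]))   # 3 rows: labels 10, 20 and 30
print(len(df.iloc[0:2]))    # 2 rows: positions 0 and 1

# at / iat: fast access to a single value
print(df.at[30, "Age"], df.iat[2, 1])
```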

## **Step 6: Filtering Data**

In [None]:
# print(df[df["Age"] > 28])     # Filter rows where Age > 28
print(df[df["City"] == "Dhaka"])

    Name  Age   City
0  Alice   25  Dhaka

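Conditions can be combined with `&` (and), `|` (or) and `~` (not). Each condition needs its own parentheses, and Python's `and`/`or` do not work here because they try to reduce a whole Series to a single True/False. A sketch on the same three-row table:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["Dhaka", "Chittagong", "Sylhet"],
})

# Both conditions must hold; each one is wrapped in parentheses
older_outside_dhaka = df[(df["Age"] > 28) & (df["City"] != "Dhaka")]

# isin() keeps rows whose value appears in the given list
in_two_cities = df[df["City"].isin(["Dhaka", "Sylhet"])]

# query() expresses the first filter as a string
via_query = df.query("Age > 28 and City != 'Dhaka'")

print(older_outside_dhaka)
print(in_two_cities)
```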

## **Step 7: Adding / Modifying Columns**

In [None]:
df["Age+5"] = df["Age"] + 5   # Add new column
df["Senior"] = df["Age"] > 30 # Boolean column
print(df)

      Name  Age        City  Age+5  Senior
0    Alice   25       Dhaka     30   False
1      Bob   30  Chittagong     35   False
2  Charlie   35      Sylhet     40    True

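The cell above only adds columns. A sketch of modifying, renaming and dropping columns on a smaller copy of the table; the `Group` and `AgeNextYear` names are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
})

# Modify an existing column by assigning to it
df["Age"] = df["Age"] + 1

# assign() returns a new DataFrame with the extra column; df itself is unchanged
labeled = df.assign(Group=np.where(df["Age"] >= 30, "30+", "under 30"))

# rename() and drop() also return new DataFrames by default
renamed = labeled.rename(columns={"Age": "AgeNextYear"})
trimmed = renamed.drop(columns=["Group"])

print(labeled)
print(trimmed)
```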

## **Step 8: Handling Missing Data**

In [None]:
df2 = pd.DataFrame({
    "Name": ["Ali", "Sara", "John"],
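    "Age": [25, None, 40],            # None is stored as NaN in a numeric column
    "City": ["Dhaka", "Khulna", None]
})

# print(df2.isnull())    # True where a value is missing (None/NaN)
# print(df2.dropna())    # Drop rows containing any NaN
print(df2.fillna("Unknown"))  # Replace every NaN; Age becomes object dtype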

   Name      Age     City
0   Ali     25.0    Dhaka
1  Sara  Unknown   Khulna
2  John     40.0  Unknown

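Filling every column with the same string works, but as the output shows it turns `Age` into a mix of numbers and text. A sketch that fills each column separately instead; using the column mean for `Age` is one common choice, assumed here, not the only sensible one:

```python
import pandas as pd

df2 = pd.DataFrame({
    "Name": ["Ali", "Sara", "John"],
    "Age": [25, None, 40],
    "City": ["Dhaka", "Khulna", None],
})

print(df2.isna().sum())   # number of missing values per column

# A dict maps each column to its own fill value
filled = df2.fillna({"Age": df2["Age"].mean(), "City": "Unknown"})
print(filled)
print(filled.dtypes)      # Age stays numeric (float64)
```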

## **Step 9: Grouping & Aggregation**

In [None]:
print(df.groupby("City")["Age"].mean())  # Avg age per city

City
Chittagong    30.0
Dhaka         25.0
Sylhet        35.0
Name: Age, dtype: float64

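In the table above each city has only one person, so each group's mean is just that person's age. A sketch with hypothetical extra rows (`Rina`, `Karim`) shows several aggregations at once using named aggregation in `agg()`:

```python
import pandas as pd

# Hypothetical data with more than one person in some cities
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Rina", "Karim", "Charlie"],
    "City": ["Dhaka", "Chittagong", "Dhaka", "Sylhet", "Sylhet"],
    "Age": [25, 30, 31, 40, 35],
})

# Named aggregation: new_column=(input_column, function)
summary = people.groupby("City").agg(
    count=("Name", "count"),
    mean_age=("Age", "mean"),
    max_age=("Age", "max"),
)
print(summary)   # group keys are sorted: Chittagong, Dhaka, Sylhet
```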

## **Step 10: Importing & Exporting Data**

In [None]:
# Save (index=False leaves the row index out of the file)
df.to_csv("people.csv", index=False)

# Load
df_loaded = pd.read_csv("people.csv")
print(df_loaded)

      Name  Age        City  Age+5  Senior
0    Alice   25       Dhaka     30   False
1      Bob   30  Chittagong     35   False
2  Charlie   35      Sylhet     40    True

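`index=False` matters when the file is read back. This sketch round-trips a small frame through an in-memory `io.StringIO` buffer instead of `people.csv` (an assumption made so no file is written) to show the extra column that appears when the index is saved:

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# to_csv() with no path returns the CSV text as a string
with_index = pd.read_csv(io.StringIO(df.to_csv()))
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(with_index.columns.tolist())     # saved index comes back as an "Unnamed: 0" column
print(without_index.columns.tolist())  # only Name and Age

# Alternatively, keep the index in the file and restore it on load
restored = pd.read_csv(io.StringIO(df.to_csv()), index_col=0)
print(restored)
```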