<a href="https://colab.research.google.com/github/Ramandeep-Singh17/PandasCompleteNoteswithProject./blob/main/PandasCompleteNotes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🐼 Pandas Complete Notes – Google Colab Style (By Ramandeep Singh)

---

## 📌 Table of Contents
1. What is Pandas?  
2. Series & DataFrame  
3. Loading Data  
4. Basic Operations  
5. Data Select & Filter  
6. Missing Data  
7. Data Clean & Type Change  
8. Useful Functions  
9. GroupBy & Aggregation  
10. Merge & Join  
11. Export to File  
12. Feature Engineering  
13. Categorical Encoding  
14. DateTime Columns  
15. Project Learnings  
16. Advanced Pandas (Optional)

---

## 🧠 1. What is Pandas?

```python
import pandas as pd
import numpy as np
```

### ✅ What?
Pandas is a Python library used for data manipulation and analysis. It offers two main data structures:
- Series: One-dimensional labeled array
- DataFrame: Two-dimensional labeled table (like Excel)

### ✅ Why?
- Clean and analyze structured data easily
- Supports multiple file formats (CSV, Excel, JSON)
- Fast, flexible, and intuitive syntax

### ✅ When to Use?
- When working with table-like datasets (rows/columns)
- During data preprocessing, transformation, or EDA

### ✅ Real-life Example
- You want to clean a customer sales Excel file and analyze patterns.

---

## 📦 2. Series & DataFrame

```python
# Series – 1D labeled array
s = pd.Series([10, 20, 30])
print(s)

# DataFrame – 2D table
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30]}
df = pd.DataFrame(data)
print(df)
```
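
To make the "labeled" part concrete, here is a small sketch using a custom index. The labels `a`, `b`, `c` and the variable names `scores` and `people` are made up for illustration.

```python
import pandas as pd

# Series with explicit labels instead of the default 0, 1, 2
scores = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(scores['b'])               # access a value by its label

# Each DataFrame column is itself a Series
people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
age_col = people['Age']
print(type(age_col).__name__)    # class name of a single column
print(people.shape)              # (rows, columns)
```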

---

## 📥 3. Loading Data (File Read)

```python
pd.read_csv("file.csv")       # CSV file
pd.read_excel("file.xlsx")    # Excel file
pd.read_json("file.json")     # JSON file
```

### 💡 Tip:
- Always check `df.head()` after loading to verify content.
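
A minimal sketch of the load-then-check habit. It reads from an in-memory string so it runs without a file on disk; the CSV contents and the name `loaded` are assumptions for this example.

```python
import io
import pandas as pd

# In-memory stand-in for a real file such as "file.csv" (contents are made up)
csv_text = "Name,Age\nAlice,25\nBob,30\n"

loaded = pd.read_csv(io.StringIO(csv_text))
print(loaded.head())     # quick look at the first rows
print(loaded.dtypes)     # confirm each column was parsed with a sensible type
```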

---

## 🛠️ 4. Basic Operations

```python
df.head()       # First 5 rows
df.tail()       # Last 5 rows
df.shape        # (rows, columns)
df.info()       # Data types + nulls
df.describe()   # Summary stats
```

---

## 🎯 5. Data Select & Filter

```python
df['Age']                    # Single column
df[['Name', 'Age']]          # Multiple columns
df.loc[2]                    # Row by index label
df.iloc[2]                   # Row by position
df[df['Age'] > 25]           # Filter rows
```
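
`loc` and `iloc` return the same row only when the index is the default 0, 1, 2, ... The sketch below uses a hypothetical DataFrame `staff` whose index labels are 10, 20, 30 so the difference is visible, and also shows one way to combine two filter conditions.

```python
import pandas as pd

# Index labels 10, 20, 30 are chosen so that label and position differ
staff = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Cara'], 'Age': [25, 30, 41]},
    index=[10, 20, 30],
)

by_label = staff.loc[20]        # row whose index label is 20
by_position = staff.iloc[2]     # third row (position 2)

# Combine conditions with & (or |) and wrap each one in parentheses
older_not_bob = staff[(staff['Age'] > 25) & (staff['Name'] != 'Bob')]
print(older_not_bob)
```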

---

## 🚫 6. Missing Data

```python
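df.isnull().sum()                      # Null count per column
df.dropna()                            # Drop rows with missing values (returns a new DataFrame)
df.fillna(0)                           # Fill nulls with 0 (returns a new DataFrame)
df['col'].replace('Unknown', np.nan)   # Replace 'Unknown' with NaN (returns a new Series)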
```
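
These methods return new objects rather than changing `df`, so the result has to be assigned back to take effect. A small sketch, assuming a made-up DataFrame `raw` with `city` and `sales` columns:

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame({'city': ['Delhi', 'Unknown', 'Pune'],
                    'sales': [100.0, np.nan, 300.0]})

# Assign the result back; without the assignment raw stays unchanged
raw['city'] = raw['city'].replace('Unknown', np.nan)

null_counts = raw.isnull().sum()     # missing values per column
print(null_counts)

# Fill the numeric column with its mean instead of a blanket 0
raw['sales'] = raw['sales'].fillna(raw['sales'].mean())
print(raw)
```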

---

## 🧽 7. Data Clean & Type Change

```python
df['Age'] = df['Age'].astype(int)     # Convert to integer (raises an error if NaN is present)
df['Name'] = df['Name'].str.strip()   # Remove leading/trailing whitespace
df['Name'] = df['Name'].str.lower()   # To lowercase
```
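
When a column holds unparseable or missing values, `astype(int)` fails. One possible workaround is `pd.to_numeric` with `errors='coerce'` followed by the nullable `Int64` dtype. The DataFrame `messy` below is invented for this sketch.

```python
import pandas as pd

messy = pd.DataFrame({'Age': ['25', '30', 'n/a'],
                      'Name': ['  Alice ', 'BOB', ' Cara']})

# errors='coerce' turns unparseable values into NaN instead of raising;
# 'Int64' (capital I) is pandas' nullable integer dtype, which can hold missing values
messy['Age'] = pd.to_numeric(messy['Age'], errors='coerce').astype('Int64')

# String methods can be chained
messy['Name'] = messy['Name'].str.strip().str.lower()
print(messy)
```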

---

## 🧰 8. Useful Functions

```python
df['col'].value_counts()    # Frequency count
df['col'].unique()          # Unique values
df['col'].nunique()         # Count of unique values
df.sample(5)                # Random rows
df.duplicated().sum()       # Count duplicate rows
```

---

## 🧠 9. GroupBy & Aggregation

```python
df.groupby('department').mean(numeric_only=True)  # Avg of numeric columns by dept
df.groupby('gender')['salary'].sum()     # Sum by gender
```
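
To get several statistics per group in one call, `agg` accepts a list of function names. A short sketch with a made-up `emp` DataFrame:

```python
import pandas as pd

emp = pd.DataFrame({
    'department': ['IT', 'IT', 'HR'],
    'gender': ['F', 'M', 'F'],
    'salary': [50000, 60000, 40000],
})

# One row per department, one column per statistic
summary = emp.groupby('department')['salary'].agg(['mean', 'max', 'count'])
print(summary)
```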

---

## 🔗 10. Merge & Join

```python
pd.merge(df1, df2, on='id', how='inner')  # Merge on id
pd.concat([df1, df2])                     # Stack vertically
```
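
The `how` argument decides which rows survive a merge. The sketch below compares `inner` and `left` on two small invented tables; the data in `df1` and `df2` is an assumption for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3], 'name': ['Alice', 'Bob', 'Cara']})
df2 = pd.DataFrame({'id': [2, 3, 4], 'salary': [50000, 60000, 70000]})

inner = pd.merge(df1, df2, on='id', how='inner')   # only ids present in both
left = pd.merge(df1, df2, on='id', how='left')     # every id from df1, NaN where df2 has no match

# ignore_index=True renumbers the rows after stacking
stacked = pd.concat([df1, df1], ignore_index=True)

print(inner)
print(left)
```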

---

## 📤 11. Export to File

```python
df.to_csv("output.csv", index=False)     # Save to CSV
df.to_excel("output.xlsx", index=False)  # Save to Excel (needs openpyxl installed)
```

---

## 🧠 12. Feature Engineering
   
   *Feature Engineering ka matlab hota hai naye features banana ya existing ko improve karna*

     Salary aur bonus ko add karke ek naya column bana rahe hain

```python
df['BonusSalary'] = df['salary'] + df['bonus']
df['Initial'] = df['Name'].apply(lambda x: x[0])
```
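
A runnable version of the same idea on a made-up `pay` DataFrame. `.str[0]` is used here as a vectorized alternative to `apply`, and `BonusPct` is a hypothetical extra feature added only for illustration.

```python
import pandas as pd

pay = pd.DataFrame({'Name': ['Alice', 'Bob'],
                    'salary': [50000, 60000],
                    'bonus': [5000, 3000]})

pay['BonusSalary'] = pay['salary'] + pay['bonus']        # total pay
pay['Initial'] = pay['Name'].str[0]                      # first letter of each name
pay['BonusPct'] = pay['bonus'] / pay['salary'] * 100     # bonus as a percentage of salary
print(pay)
```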

---

## 🧾 13. Categorical Encoding

Categorical encoding means converting text categories into numerical form.

`get_dummies()` performs one-hot encoding: it creates one column for each unique value.


```python
pd.get_dummies(df, columns=['Gender'])
```
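
A small sketch of what the output looks like. The `people` DataFrame is invented, and `dtype=int` is passed so the new columns hold 0/1 rather than booleans.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Cara'],
                       'Gender': ['Female', 'Male', 'Female']})

# The Gender column is replaced by one 0/1 column per unique value
encoded = pd.get_dummies(people, columns=['Gender'], dtype=int)
print(encoded)
```

Passing `drop_first=True` drops the first category, which can help when a model is sensitive to redundant columns.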

---

## 📆 14. DateTime Columns

Convert the date column to a proper datetime object so that attributes such as year, month, and day can be extracted.

```python
df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
```
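
Real date columns often contain bad entries. One possible way to handle them is `errors='coerce'`, which turns unparseable values into `NaT`. The `orders` DataFrame and its values are assumptions for this sketch.

```python
import pandas as pd

orders = pd.DataFrame({'Date': ['2024-01-15', '2024-03-02', 'not a date']})

# Unparseable strings become NaT instead of raising an error
orders['Date'] = pd.to_datetime(orders['Date'], errors='coerce')

orders['Year'] = orders['Date'].dt.year
orders['DayName'] = orders['Date'].dt.day_name()   # weekday name of each date
print(orders)
```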

---

## 🔍 15. Project Learnings

```python
df.loc[2, 'Title']  # Specific row-column access in a single lookup

df[df['episodes'] == 'Unknown']  # Filter unknown episodes

df['episodes'] = df['episodes'].replace('Unknown', np.nan)  # Clean (assign back instead of inplace on a column)
```

---

## 🧪 16. Advanced Pandas (Optional)

```python
df.query("Age > 30 and Gender == 'Male'")

df.eval("Total = salary + bonus", inplace=True)

df.explode('skills')

pd.cut(df['Age'], bins=[0,18,35,60], labels=['Teen','Adult','Senior'])

df.corr(numeric_only=True)  # Correlation matrix (numeric columns only)
```