# 🐼 Learning Pandas in Python for EDA

<div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); padding: 25px; border-radius: 12px; margin: 20px 0;">
    <h2 style="color: #ffffff; margin: 0; font-weight: 600;">What is Pandas?</h2>
</div>

**Pandas** is a powerful, open-source data manipulation and analysis library for Python. Built on top of NumPy, it provides high-performance, easy-to-use data structures and tools specifically designed for working with structured data.

---

## 🎯 Core Functionality

<div style="background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%); padding: 20px; border-radius: 10px; margin: 15px 0;">

Pandas excels at:
- **Data Cleaning** → Handle missing values, duplicates, and inconsistencies
- **Data Transformation** → Reshape, pivot, merge, and aggregate datasets
- **Data Analysis** → Statistical operations and time series analysis
- **Data Visualization** → Quick plotting capabilities integrated with Matplotlib

</div>
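
As a minimal sketch of the cleaning, transformation, and analysis steps listed above, the snippet below works on a small hypothetical `raw` table. The duplicated row and the missing mark are invented for illustration and are not part of this notebook's dataset.

```python
import pandas as pd

# Hypothetical messy data: one duplicated row and one missing mark
raw = pd.DataFrame({
    "name": ["Anish", "Manish", "Manish", "Rohan"],
    "city": ["Delhi", "Mumbai", "Mumbai", "Delhi"],
    "marks": [92.0, 82.0, 82.0, None],
})

# Data cleaning: drop exact duplicate rows, then fill the missing mark
# with the mean of the remaining marks
clean = raw.drop_duplicates().copy()
clean["marks"] = clean["marks"].fillna(clean["marks"].mean())

# Data transformation and analysis: average marks per city
avg_by_city = clean.groupby("city")["marks"].mean()
print(avg_by_city)
```

Filling with the column mean is only one possible choice; dropping the row with `dropna()` would be equally valid depending on the analysis.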

---

## ✨ Key Benefits

<table style="width: 100%; border-collapse: collapse; margin: 20px 0;">
    <tr style="background: linear-gradient(135deg, #3494e6 0%, #ec6ead 100%);">
        <th style="padding: 15px; color: #ffffff; text-align: left; border-radius: 8px 0 0 0;">Benefit</th>
        <th style="padding: 15px; color: #ffffff; text-align: left; border-radius: 0 8px 0 0;">Description</th>
    </tr>
    <tr style="background: #1a1a2e;">
        <td style="padding: 12px; color: #e0e0e0; border-bottom: 1px solid #333;"><strong>⚡ Performance</strong></td>
        <td style="padding: 12px; color: #e0e0e0; border-bottom: 1px solid #333;">Vectorized operations backed by compiled C and Cython code handle large datasets efficiently</td>
    </tr>
    <tr style="background: #16213e;">
        <td style="padding: 12px; color: #e0e0e0; border-bottom: 1px solid #333;"><strong>🔄 Flexibility</strong></td>
        <td style="padding: 12px; color: #e0e0e0; border-bottom: 1px solid #333;">Works seamlessly with CSV, Excel, SQL, JSON, and more</td>
    </tr>
    <tr style="background: #1a1a2e;">
        <td style="padding: 12px; color: #e0e0e0; border-bottom: 1px solid #333;"><strong>🎨 Intuitive</strong></td>
        <td style="padding: 12px; color: #e0e0e0; border-bottom: 1px solid #333;">Pythonic syntax with DataFrame and Series structures</td>
    </tr>
    <tr style="background: #16213e;">
        <td style="padding: 12px; color: #e0e0e0;"><strong>🌐 Integration</strong></td>
        <td style="padding: 12px; color: #e0e0e0;">Compatible with NumPy, Matplotlib, Scikit-learn ecosystem</td>
    </tr>
</table>
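
The performance and integration rows can be illustrated with a short sketch. It assumes a tiny `marks` Series (the same two values used later in this notebook) and shows a whole-column operation plus a hand-off to NumPy.

```python
import numpy as np
import pandas as pd

marks = pd.Series([92, 82], name="marks")

# Vectorized arithmetic: applied to the whole column at once, no Python loop
percent_of_max = marks / marks.max() * 100

# NumPy interoperability: expose the values as an ndarray and use a NumPy function
arr = marks.to_numpy()
print(type(arr), np.sqrt(arr))
```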

---

## 📦 Installation

<div style="background: linear-gradient(135deg, #fa709a 0%, #fee140 100%); padding: 3px; border-radius: 10px; margin: 20px 0;">
    <div style="background: #1e1e1e; padding: 20px; border-radius: 8px;">
        <h3 style="color: #ffd700; margin-top: 0;">💻 Installing on Jupyter Notebook</h3>
        <p style="color: #b0b0b0; margin-bottom: 15px;">Execute the following command in a code cell:</p>
    </div>
</div>

```python
# Install or upgrade pandas in the environment of the running Jupyter kernel
%pip install --upgrade pandas
```

<div style="background: linear-gradient(135deg, #a8edea 0%, #fed6e3 100%); padding: 3px; border-radius: 10px; margin: 20px 0;">
    <div style="background: #1e1e1e; padding: 20px; border-radius: 8px;">
        <h3 style="color: #4ecdc4; margin-top: 0;">🪟 Installing on Windows (Command Prompt/PowerShell)</h3>
        <p style="color: #b0b0b0; margin-bottom: 15px;">Open your terminal and run:</p>
    </div>
</div>

```bash
# Standard installation
pip install pandas

# Or with specific version
pip install pandas==2.1.0

# Verify installation
python -c "import pandas as pd; print(pd.__version__)"
```

<div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); padding: 20px; border-radius: 10px; margin: 25px 0; border-left: 5px solid #ffd700;">
    <p style="color: #ffffff; margin: 0; font-size: 14px;">
        💡 <strong>Pro Tip:</strong> If pandas was already imported when you installed or upgraded it, restart your Jupyter kernel so that the new version is the one that gets loaded.
    </p>
</div>

---

<div style="text-align: center; margin: 30px 0; padding: 20px; background: linear-gradient(135deg, #f5af19 0%, #f12711 100%); border-radius: 10px;">
    <h3 style="color: #ffffff; margin: 0;">🚀 Ready to Start Your EDA Journey!</h3>
</div>

In [1]:
import pandas as pd
import numpy as np

---

## 📊 Creating Your First DataFrame

<div style="background: linear-gradient(135deg, #11998e 0%, #38ef7d 100%); padding: 3px; border-radius: 10px; margin: 20px 0;">
    <div style="background: #1e1e1e; padding: 20px; border-radius: 8px;">
        <h3 style="color: #38ef7d; margin-top: 0;">🎯 DataFrame from Dictionary</h3>
        <p style="color: #b0b0b0; margin-bottom: 10px;">A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. Let's create one from a Python dictionary:</p>
    </div>
</div>

```python
import pandas as pd

# Create a dictionary with student data
student_data = {
    "name": ['Anish', 'Manish'],
    "marks": [92, 82],
    "city": ['Delhi', 'Mumbai']
}

# Convert dictionary to DataFrame
df = pd.DataFrame(student_data)

# Display the DataFrame
df
```

<div style="background: linear-gradient(135deg, #fa709a 0%, #fee140 100%); padding: 20px; border-radius: 10px; margin: 20px 0; border-left: 5px solid #38ef7d;">
    <p style="color: #1e1e1e; margin: 0; font-size: 14px;">
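        📝 <strong>Note:</strong> Each dictionary key becomes a column name, and each list supplies that column's values; the items at the same position across the lists form one row, so all lists must have the same length. This is one of the most common ways to create DataFrames in pandas!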
    </p>
</div>
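
To see how the dictionary maps onto the table, the sketch below rebuilds the same `student_data` frame and inspects its shape, columns, and dtypes. The list-of-dictionaries variant (`rows`) is an added illustration of an equivalent row-wise construction, not something used elsewhere in this notebook.

```python
import pandas as pd

student_data = {
    "name": ["Anish", "Manish"],
    "marks": [92, 82],
    "city": ["Delhi", "Mumbai"],
}
df = pd.DataFrame(student_data)

print(df.shape)          # (number of rows, number of columns)
print(list(df.columns))  # column labels come from the dictionary keys
print(df.dtypes)         # one dtype per column

# The same table built row by row from a list of dictionaries
rows = [
    {"name": "Anish", "marks": 92, "city": "Delhi"},
    {"name": "Manish", "marks": 82, "city": "Mumbai"},
]
df_from_rows = pd.DataFrame(rows)
```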

---

In [None]:
dict1 = {
    "name": ['Anish', 'Manish'],
    "marks": [92, 82],
    "city": ['Delhi', 'Mumbai']
}

df = pd.DataFrame(dict1)
df

,name,marks,city
0,Anish,92,Delhi
1,Manish,82,Mumbai


---

## 💾 Exporting DataFrame to CSV

<div style="background: linear-gradient(135deg, #7f00ff 0%, #e100ff 100%); padding: 3px; border-radius: 10px; margin: 20px 0;">
    <div style="background: #1e1e1e; padding: 20px; border-radius: 8px;">
        <h3 style="color: #e100ff; margin-top: 0;">📁 Saving Data to CSV Files</h3>
        <p style="color: #b0b0b0; margin-bottom: 10px;">Learn how to export your DataFrame to CSV format with and without the index column:</p>
    </div>
</div>

```python
# Export DataFrame with default index column
df.to_csv("data.csv")

# Export DataFrame without index column
df.to_csv("data_without_index.csv", index=False)
```

<div style="background: linear-gradient(135deg, #ff6b6b 0%, #feca57 100%); padding: 3px; border-radius: 10px; margin: 20px 0;">
    <div style="background: #1e1e1e; padding: 15px; border-radius: 8px;">
        <h4 style="color: #feca57; margin-top: 0;">📄 Output: data.csv (with index)</h4>
    </div>
</div>

```csv
,name,marks,city
0,Anish,92,Delhi
1,Manish,82,Mumbai
```

<div style="background: linear-gradient(135deg, #4facfe 0%, #00f2fe 100%); padding: 3px; border-radius: 10px; margin: 20px 0;">
    <div style="background: #1e1e1e; padding: 15px; border-radius: 8px;">
        <h4 style="color: #00f2fe; margin-top: 0;">📄 Output: data_without_index.csv (without index)</h4>
    </div>
</div>

```csv
name,marks,city
Anish,92,Delhi
Manish,82,Mumbai
```

<div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); padding: 20px; border-radius: 10px; margin: 20px 0; border-left: 5px solid #00f2fe;">
    <p style="color: #ffffff; margin: 0; font-size: 14px;">
        💡 <strong>Key Difference:</strong> Using <code style="background: #1e1e1e; padding: 3px 8px; border-radius: 4px; color: #feca57;">index=False</code> removes the default numeric index column from the CSV file, creating a cleaner output for sharing or importing into other tools.
    </p>
</div>
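
The difference can also be checked without touching the disk. This sketch relies on the fact that `to_csv()` returns the CSV text as a string when no path is given; the in-memory round trip through `io.StringIO` is an illustration, not part of the original notebook.

```python
import io
import pandas as pd

df = pd.DataFrame({
    "name": ["Anish", "Manish"],
    "marks": [92, 82],
    "city": ["Delhi", "Mumbai"],
})

# With no path argument, to_csv() returns the CSV text instead of writing a file
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index.splitlines()[0])     # header starts with an empty index field
print(without_index.splitlines()[0])  # header holds only the column names

# Reading the index-free text back restores the original three columns
round_trip = pd.read_csv(io.StringIO(without_index))
print(list(round_trip.columns))
```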

---

In [None]:
df.to_csv("data.csv")

df.to_csv("data_without_index.csv", index=False)

---

## 🔍 Exploring DataFrame Methods

<div style="background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%); padding: 3px; border-radius: 10px; margin: 20px 0;">
    <div style="background: #1e1e1e; padding: 20px; border-radius: 8px;">
        <h3 style="color: #f5576c; margin-top: 0;">👀 Quick Data Preview & Statistical Summary</h3>
        <p style="color: #b0b0b0; margin-bottom: 10px;">Essential methods to inspect and understand your dataset at a glance:</p>
    </div>
</div>

### 📌 View Top Rows with `head()`

```python
# Display the first 5 rows by default (this DataFrame has only 2 rows, so both are shown)
df.head()
```

<div style="background: #1e1e1e; padding: 15px; border-radius: 8px; margin: 15px 0; border: 2px solid #f093fb;">
    <table style="width: 100%; border-collapse: collapse;">
        <tr style="background: #2d2d2d;">
            <th style="padding: 10px; color: #f093fb; text-align: left; border-bottom: 2px solid #f5576c;"></th>
            <th style="padding: 10px; color: #f093fb; text-align: left; border-bottom: 2px solid #f5576c;">name</th>
            <th style="padding: 10px; color: #f093fb; text-align: left; border-bottom: 2px solid #f5576c;">marks</th>
            <th style="padding: 10px; color: #f093fb; text-align: left; border-bottom: 2px solid #f5576c;">city</th>
        </tr>
        <tr style="background: #1a1a1a;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">0</td>
            <td style="padding: 10px; color: #e0e0e0;">Anish</td>
            <td style="padding: 10px; color: #e0e0e0;">92</td>
            <td style="padding: 10px; color: #e0e0e0;">Delhi</td>
        </tr>
        <tr style="background: #252525;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">1</td>
            <td style="padding: 10px; color: #e0e0e0;">Manish</td>
            <td style="padding: 10px; color: #e0e0e0;">82</td>
            <td style="padding: 10px; color: #e0e0e0;">Mumbai</td>
        </tr>
    </table>
</div>

```python
# Display only first row
df.head(1)
```

<div style="background: #1e1e1e; padding: 15px; border-radius: 8px; margin: 15px 0; border: 2px solid #f093fb;">
    <table style="width: 100%; border-collapse: collapse;">
        <tr style="background: #2d2d2d;">
            <th style="padding: 10px; color: #f093fb; text-align: left; border-bottom: 2px solid #f5576c;"></th>
            <th style="padding: 10px; color: #f093fb; text-align: left; border-bottom: 2px solid #f5576c;">name</th>
            <th style="padding: 10px; color: #f093fb; text-align: left; border-bottom: 2px solid #f5576c;">marks</th>
            <th style="padding: 10px; color: #f093fb; text-align: left; border-bottom: 2px solid #f5576c;">city</th>
        </tr>
        <tr style="background: #1a1a1a;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">0</td>
            <td style="padding: 10px; color: #e0e0e0;">Anish</td>
            <td style="padding: 10px; color: #e0e0e0;">92</td>
            <td style="padding: 10px; color: #e0e0e0;">Delhi</td>
        </tr>
    </table>
</div>

---

### 📌 View Bottom Rows with `tail()`

```python
# Display the last 5 rows by default (this DataFrame has only 2 rows, so both are shown)
df.tail()
```

<div style="background: #1e1e1e; padding: 15px; border-radius: 8px; margin: 15px 0; border: 2px solid #38ef7d;">
    <table style="width: 100%; border-collapse: collapse;">
        <tr style="background: #2d2d2d;">
            <th style="padding: 10px; color: #38ef7d; text-align: left; border-bottom: 2px solid #11998e;"></th>
            <th style="padding: 10px; color: #38ef7d; text-align: left; border-bottom: 2px solid #11998e;">name</th>
            <th style="padding: 10px; color: #38ef7d; text-align: left; border-bottom: 2px solid #11998e;">marks</th>
            <th style="padding: 10px; color: #38ef7d; text-align: left; border-bottom: 2px solid #11998e;">city</th>
        </tr>
        <tr style="background: #1a1a1a;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">0</td>
            <td style="padding: 10px; color: #e0e0e0;">Anish</td>
            <td style="padding: 10px; color: #e0e0e0;">92</td>
            <td style="padding: 10px; color: #e0e0e0;">Delhi</td>
        </tr>
        <tr style="background: #252525;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">1</td>
            <td style="padding: 10px; color: #e0e0e0;">Manish</td>
            <td style="padding: 10px; color: #e0e0e0;">82</td>
            <td style="padding: 10px; color: #e0e0e0;">Mumbai</td>
        </tr>
    </table>
</div>

```python
# Display only last row
df.tail(1)
```

<div style="background: #1e1e1e; padding: 15px; border-radius: 8px; margin: 15px 0; border: 2px solid #38ef7d;">
    <table style="width: 100%; border-collapse: collapse;">
        <tr style="background: #2d2d2d;">
            <th style="padding: 10px; color: #38ef7d; text-align: left; border-bottom: 2px solid #11998e;"></th>
            <th style="padding: 10px; color: #38ef7d; text-align: left; border-bottom: 2px solid #11998e;">name</th>
            <th style="padding: 10px; color: #38ef7d; text-align: left; border-bottom: 2px solid #11998e;">marks</th>
            <th style="padding: 10px; color: #38ef7d; text-align: left; border-bottom: 2px solid #11998e;">city</th>
        </tr>
        <tr style="background: #1a1a1a;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">1</td>
            <td style="padding: 10px; color: #e0e0e0;">Manish</td>
            <td style="padding: 10px; color: #e0e0e0;">82</td>
            <td style="padding: 10px; color: #e0e0e0;">Mumbai</td>
        </tr>
    </table>
</div>
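
With only two rows, `head()` and `tail()` return the same data. The sketch below uses a hypothetical 10-row frame called `numbers` (not part of this notebook's dataset) so that the two methods visibly differ.

```python
import pandas as pd

# Hypothetical 10-row frame, large enough for head() and tail() to differ
numbers = pd.DataFrame({"n": range(10), "square": [i * i for i in range(10)]})

first_five = numbers.head()   # default n=5: rows 0 to 4
last_three = numbers.tail(3)  # explicit n=3: rows 7 to 9

print(first_five)
print(last_three)
```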

---

### 📊 Statistical Summary with `describe()`

```python
# Generate descriptive statistics for numerical columns
df.describe()
```

<div style="background: #1e1e1e; padding: 15px; border-radius: 8px; margin: 15px 0; border: 2px solid #e100ff;">
    <table style="width: 100%; border-collapse: collapse;">
        <tr style="background: #2d2d2d;">
            <th style="padding: 10px; color: #e100ff; text-align: left; border-bottom: 2px solid #7f00ff;"></th>
            <th style="padding: 10px; color: #e100ff; text-align: center; border-bottom: 2px solid #7f00ff;">marks</th>
        </tr>
        <tr style="background: #1a1a1a;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">count</td>
            <td style="padding: 10px; color: #e0e0e0; text-align: center;">2.0</td>
        </tr>
        <tr style="background: #252525;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">mean</td>
            <td style="padding: 10px; color: #e0e0e0; text-align: center;">87.0</td>
        </tr>
        <tr style="background: #1a1a1a;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">std</td>
            <td style="padding: 10px; color: #e0e0e0; text-align: center;">7.071068</td>
        </tr>
        <tr style="background: #252525;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">min</td>
            <td style="padding: 10px; color: #e0e0e0; text-align: center;">82.0</td>
        </tr>
        <tr style="background: #1a1a1a;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">25%</td>
            <td style="padding: 10px; color: #e0e0e0; text-align: center;">84.5</td>
        </tr>
        <tr style="background: #252525;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">50%</td>
            <td style="padding: 10px; color: #e0e0e0; text-align: center;">87.0</td>
        </tr>
        <tr style="background: #1a1a1a;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">75%</td>
            <td style="padding: 10px; color: #e0e0e0; text-align: center;">89.5</td>
        </tr>
        <tr style="background: #252525;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">max</td>
            <td style="padding: 10px; color: #e0e0e0; text-align: center;">92.0</td>
        </tr>
    </table>
</div>

<div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); padding: 20px; border-radius: 10px; margin: 20px 0; border-left: 5px solid #e100ff;">
    <p style="color: #ffffff; margin: 0; font-size: 14px;">
        💡 <strong>Pro Tip:</strong> <code style="background: #1e1e1e; padding: 3px 8px; border-radius: 4px; color: #feca57;">describe()</code> calculates the count, mean, standard deviation, min/max, and quartiles for every numerical column by default, which makes it a quick first statistical check of a new dataset.
    </p>
</div>
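
By default the text columns (`name`, `city`) are left out of the summary. As a sketch of one way to include them, the snippet below passes `include="all"`, which adds `unique`, `top`, and `freq` rows for non-numeric columns.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Anish", "Manish"],
    "marks": [92, 82],
    "city": ["Delhi", "Mumbai"],
})

# Default: numeric columns only
numeric_summary = df.describe()

# include="all": numeric and non-numeric columns in one table;
# statistics that do not apply to a column are shown as NaN
full_summary = df.describe(include="all")
print(full_summary)

# Individual statistics are also available directly on a column
print(df["marks"].mean(), df["marks"].median())
```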

---


## 📥 Reading & Accessing DataFrame Data

<div style="background: linear-gradient(135deg, #4facfe 0%, #00f2fe 100%); padding: 3px; border-radius: 10px; margin: 20px 0;">
    <div style="background: #1e1e1e; padding: 20px; border-radius: 8px;">
        <h3 style="color: #00f2fe; margin-top: 0;">📂 Loading CSV Files & Accessing Elements</h3>
        <p style="color: #b0b0b0; margin-bottom: 10px;">Learn how to read CSV files and access specific columns and values from your DataFrame:</p>
    </div>
</div>

### 📖 Reading CSV File

```python
# Load data from CSV file
data = pd.read_csv('data.csv')
data
```

<div style="background: #1e1e1e; padding: 15px; border-radius: 8px; margin: 15px 0; border: 2px solid #4facfe;">
    <table style="width: 100%; border-collapse: collapse;">
        <tr style="background: #2d2d2d;">
            <th style="padding: 10px; color: #4facfe; text-align: left; border-bottom: 2px solid #00f2fe;"></th>
            <th style="padding: 10px; color: #4facfe; text-align: left; border-bottom: 2px solid #00f2fe;">Unnamed: 0</th>
            <th style="padding: 10px; color: #4facfe; text-align: left; border-bottom: 2px solid #00f2fe;">name</th>
            <th style="padding: 10px; color: #4facfe; text-align: left; border-bottom: 2px solid #00f2fe;">marks</th>
            <th style="padding: 10px; color: #4facfe; text-align: left; border-bottom: 2px solid #00f2fe;">city</th>
        </tr>
        <tr style="background: #1a1a1a;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">0</td>
            <td style="padding: 10px; color: #e0e0e0;">0</td>
            <td style="padding: 10px; color: #e0e0e0;">Anish</td>
            <td style="padding: 10px; color: #e0e0e0;">92</td>
            <td style="padding: 10px; color: #e0e0e0;">Delhi</td>
        </tr>
        <tr style="background: #252525;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">1</td>
            <td style="padding: 10px; color: #e0e0e0;">1</td>
            <td style="padding: 10px; color: #e0e0e0;">Manish</td>
            <td style="padding: 10px; color: #e0e0e0;">82</td>
            <td style="padding: 10px; color: #e0e0e0;">Mumbai</td>
        </tr>
    </table>
</div>

---

### 🎯 Accessing a Single Column

```python
# Access the 'name' column (returns a Series)
data['name']
```

<div style="background: #1e1e1e; padding: 15px; border-radius: 8px; margin: 15px 0; border: 2px solid #f093fb;">
    <pre style="color: #e0e0e0; margin: 0; font-family: 'Courier New', monospace;">
<span style="color: #feca57;">0</span>      Anish
<span style="color: #feca57;">1</span>     Manish
<span style="color: #38ef7d;">Name:</span> name, <span style="color: #38ef7d;">dtype:</span> object
    </pre>
</div>

---

### 🔢 Accessing Specific Value by Index

```python
# Access the value with index label 1 in the 'name' column
data['name'][1]
```

<div style="background: #1e1e1e; padding: 15px; border-radius: 8px; margin: 15px 0; border: 2px solid #38ef7d;">
    <pre style="color: #e0e0e0; margin: 0; font-family: 'Courier New', monospace; font-size: 16px;">
<span style="color: #feca57;">'Manish'</span>
    </pre>
</div>

<div style="background: linear-gradient(135deg, #fa709a 0%, #fee140 100%); padding: 20px; border-radius: 10px; margin: 20px 0; border-left: 5px solid #38ef7d;">
    <p style="color: #1e1e1e; margin: 0; font-size: 14px;">
        ⚠️ <strong>Note:</strong> The CSV was saved with index, creating an extra <code style="background: #1e1e1e; padding: 3px 8px; border-radius: 4px; color: #feca57;">'Unnamed: 0'</code> column. Use <code style="background: #1e1e1e; padding: 3px 8px; border-radius: 4px; color: #feca57;">index_col=0</code> parameter to avoid this: <code style="background: #1e1e1e; padding: 3px 8px; border-radius: 4px; color: #feca57;">pd.read_csv('data.csv', index_col=0)</code>
    </p>
</div>

<div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); padding: 20px; border-radius: 10px; margin: 20px 0; border-left: 5px solid #4facfe;">
    <p style="color: #ffffff; margin: 0; font-size: 14px;">
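        💡 <strong>Pro Tip:</strong> Use <code style="background: #1e1e1e; padding: 3px 8px; border-radius: 4px; color: #feca57;">.loc[]</code> or <code style="background: #1e1e1e; padding: 3px 8px; border-radius: 4px; color: #feca57;">.iloc[]</code> for more robust indexing: <code style="background: #1e1e1e; padding: 3px 8px; border-radius: 4px; color: #feca57;">data.loc[1, 'name']</code> (by label) or <code style="background: #1e1e1e; padding: 3px 8px; border-radius: 4px; color: #feca57;">data.iloc[1, 1]</code> (by position; <code style="background: #1e1e1e; padding: 3px 8px; border-radius: 4px; color: #feca57;">'name'</code> sits at column position 1 here because <code style="background: #1e1e1e; padding: 3px 8px; border-radius: 4px; color: #feca57;">'Unnamed: 0'</code> occupies position 0)
    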
    </p>
</div>
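
The effect of `index_col=0` and the two indexers can be sketched in memory. The `csv_text` string below is assumed to match what `df.to_csv("data.csv")` writes for this dataset; it is held in an `io.StringIO` buffer so that no file is needed.

```python
import io
import pandas as pd

# Assumed contents of data.csv (DataFrame saved together with its index)
csv_text = ",name,marks,city\n0,Anish,92,Delhi\n1,Manish,82,Mumbai\n"

# Default read: the saved index comes back as an 'Unnamed: 0' column
data = pd.read_csv(io.StringIO(csv_text))
print(list(data.columns))

# index_col=0: the first CSV column becomes the row index instead
data_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(data_indexed.columns))

# Label-based vs. position-based access to the same cell;
# in data_indexed, 'name' is the first column (position 0)
by_label = data_indexed.loc[1, "name"]
by_position = data_indexed.iloc[1, 0]
print(by_label, by_position)
```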

---


## ✏️ Modifying DataFrame Values

<div style="background: linear-gradient(135deg, #ff6b6b 0%, #feca57 100%); padding: 3px; border-radius: 10px; margin: 20px 0;">
    <div style="background: #1e1e1e; padding: 20px; border-radius: 8px;">
        <h3 style="color: #feca57; margin-top: 0;">🔧 Updating Values in DataFrame</h3>
        <p style="color: #b0b0b0; margin-bottom: 10px;">Learn the correct way to modify values in your DataFrame to avoid warnings:</p>
    </div>
</div>

### 📋 Original DataFrame

```python
df
```

<div style="background: #1e1e1e; padding: 15px; border-radius: 8px; margin: 15px 0; border: 2px solid #4facfe;">
    <table style="width: 100%; border-collapse: collapse;">
        <tr style="background: #2d2d2d;">
            <th style="padding: 10px; color: #4facfe; text-align: left; border-bottom: 2px solid #00f2fe;"></th>
            <th style="padding: 10px; color: #4facfe; text-align: left; border-bottom: 2px solid #00f2fe;">name</th>
            <th style="padding: 10px; color: #4facfe; text-align: left; border-bottom: 2px solid #00f2fe;">marks</th>
            <th style="padding: 10px; color: #4facfe; text-align: left; border-bottom: 2px solid #00f2fe;">city</th>
        </tr>
        <tr style="background: #1a1a1a;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">0</td>
            <td style="padding: 10px; color: #e0e0e0;">Anish</td>
            <td style="padding: 10px; color: #e0e0e0;">92</td>
            <td style="padding: 10px; color: #e0e0e0;">Delhi</td>
        </tr>
        <tr style="background: #252525;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">1</td>
            <td style="padding: 10px; color: #e0e0e0;">Manish</td>
            <td style="padding: 10px; color: #e0e0e0;">82</td>
            <td style="padding: 10px; color: #e0e0e0;">Mumbai</td>
        </tr>
    </table>
</div>

---

### ❌ Incorrect Method (Chained Assignment)

```python
# Works in pandas 2.x but emits a warning; under Copy-on-Write (the default in pandas 3.0) it no longer updates df
df['name'][1] = 'Rohan'
df
```

<div style="background: #2d1b1b; padding: 15px; border-radius: 8px; margin: 15px 0; border-left: 5px solid #ff6b6b;">
    <p style="color: #ff6b6b; margin: 0; font-weight: bold; font-size: 14px;">⚠️ FutureWarning & SettingWithCopyWarning</p>
    <pre style="color: #ffb3b3; margin: 10px 0 0 0; font-size: 12px; overflow-x: auto;">
ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment.

Use `df.loc[row_indexer, "col"] = values` instead.
    </pre>
</div>

<div style="background: #1e1e1e; padding: 15px; border-radius: 8px; margin: 15px 0; border: 2px solid #38ef7d;">
    <table style="width: 100%; border-collapse: collapse;">
        <tr style="background: #2d2d2d;">
            <th style="padding: 10px; color: #38ef7d; text-align: left; border-bottom: 2px solid #11998e;"></th>
            <th style="padding: 10px; color: #38ef7d; text-align: left; border-bottom: 2px solid #11998e;">name</th>
            <th style="padding: 10px; color: #38ef7d; text-align: left; border-bottom: 2px solid #11998e;">marks</th>
            <th style="padding: 10px; color: #38ef7d; text-align: left; border-bottom: 2px solid #11998e;">city</th>
        </tr>
        <tr style="background: #1a1a1a;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">0</td>
            <td style="padding: 10px; color: #e0e0e0;">Anish</td>
            <td style="padding: 10px; color: #e0e0e0;">92</td>
            <td style="padding: 10px; color: #e0e0e0;">Delhi</td>
        </tr>
        <tr style="background: #252525;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">1</td>
            <td style="padding: 10px; color: #38ef7d;">Rohan</td>
            <td style="padding: 10px; color: #e0e0e0;">82</td>
            <td style="padding: 10px; color: #e0e0e0;">Mumbai</td>
        </tr>
    </table>
</div>

---

### ✅ Correct Method (Using `.loc[]`)

```python
# Recommended approach - use .loc[] for safe assignment
df.loc[1, 'name'] = 'Rohan'
df
```

<div style="background: #1e1e1e; padding: 15px; border-radius: 8px; margin: 15px 0; border: 2px solid #38ef7d;">
    <table style="width: 100%; border-collapse: collapse;">
        <tr style="background: #2d2d2d;">
            <th style="padding: 10px; color: #38ef7d; text-align: left; border-bottom: 2px solid #11998e;"></th>
            <th style="padding: 10px; color: #38ef7d; text-align: left; border-bottom: 2px solid #11998e;">name</th>
            <th style="padding: 10px; color: #38ef7d; text-align: left; border-bottom: 2px solid #11998e;">marks</th>
            <th style="padding: 10px; color: #38ef7d; text-align: left; border-bottom: 2px solid #11998e;">city</th>
        </tr>
        <tr style="background: #1a1a1a;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">0</td>
            <td style="padding: 10px; color: #e0e0e0;">Anish</td>
            <td style="padding: 10px; color: #e0e0e0;">92</td>
            <td style="padding: 10px; color: #e0e0e0;">Delhi</td>
        </tr>
        <tr style="background: #252525;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">1</td>
            <td style="padding: 10px; color: #38ef7d;">Rohan</td>
            <td style="padding: 10px; color: #e0e0e0;">82</td>
            <td style="padding: 10px; color: #e0e0e0;">Mumbai</td>
        </tr>
    </table>
</div>

<div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); padding: 20px; border-radius: 10px; margin: 20px 0; border-left: 5px solid #38ef7d;">
    <h4 style="color: #ffffff; margin-top: 0;">📚 Why Use .loc[]?</h4>
    <ul style="color: #e0e0e0; margin: 10px 0 0 20px; line-height: 1.8;">
        <li><strong>Single Operation:</strong> Updates happen in one step, not chained</li>
        <li><strong>No Warnings:</strong> Avoids FutureWarning and SettingWithCopyWarning</li>
        <li><strong>Guaranteed Update:</strong> Works correctly with Copy-on-Write in pandas 3.0+</li>
        <li><strong>Clear Intent:</strong> Explicitly specifies row and column for modification</li>
    </ul>
</div>

<div style="background: linear-gradient(135deg, #fa709a 0%, #fee140 100%); padding: 20px; border-radius: 10px; margin: 20px 0; border-left: 5px solid #e100ff;">
    <p style="color: #1e1e1e; margin: 0; font-size: 14px;">
        💡 <strong>Pro Tip:</strong> Always use <code style="background: #1e1e1e; padding: 3px 8px; border-radius: 4px; color: #feca57;">.loc[row, column]</code> for label-based indexing or <code style="background: #1e1e1e; padding: 3px 8px; border-radius: 4px; color: #feca57;">.iloc[row, column]</code> for position-based indexing when modifying DataFrame values!
    </p>
</div>
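
The sketch below puts the two indexers side by side and adds a conditional update through a boolean mask, which is where `.loc[]` is most useful. The replacement city `"Pune"` and the 5-mark bonus are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Anish", "Manish"],
    "marks": [92, 82],
    "city": ["Delhi", "Mumbai"],
})

# Label-based: set one cell by row label and column name
df.loc[1, "name"] = "Rohan"

# Position-based: second row, third column (city); "Pune" is a made-up value
df.iloc[1, 2] = "Pune"

# Conditional update: add 5 marks to every row scoring below 90
df.loc[df["marks"] < 90, "marks"] += 5

print(df)
```

Each assignment addresses the row and the column in a single indexing call, so the update is applied to `df` itself rather than to an intermediate object.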

---


## 🏷️ Customizing DataFrame Index

<div style="background: linear-gradient(135deg, #a8edea 0%, #fed6e3 100%); padding: 3px; border-radius: 10px; margin: 20px 0;">
    <div style="background: #1e1e1e; padding: 20px; border-radius: 8px;">
        <h3 style="color: #4ecdc4; margin-top: 0;">🔖 Setting Custom Row Labels</h3>
        <p style="color: #b0b0b0; margin-bottom: 10px;">Replace default numeric indices with meaningful custom labels:</p>
    </div>
</div>

### 📋 Original DataFrame (Default Index)
```python
data
```

<div style="background: #1e1e1e; padding: 15px; border-radius: 8px; margin: 15px 0; border: 2px solid #4facfe;">
    <table style="width: 100%; border-collapse: collapse;">
        <tr style="background: #2d2d2d;">
            <th style="padding: 10px; color: #4facfe; text-align: left; border-bottom: 2px solid #00f2fe;"></th>
            <th style="padding: 10px; color: #4facfe; text-align: left; border-bottom: 2px solid #00f2fe;">Unnamed: 0</th>
            <th style="padding: 10px; color: #4facfe; text-align: left; border-bottom: 2px solid #00f2fe;">name</th>
            <th style="padding: 10px; color: #4facfe; text-align: left; border-bottom: 2px solid #00f2fe;">marks</th>
            <th style="padding: 10px; color: #4facfe; text-align: left; border-bottom: 2px solid #00f2fe;">city</th>
        </tr>
        <tr style="background: #1a1a1a;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">0</td>
            <td style="padding: 10px; color: #e0e0e0;">0</td>
            <td style="padding: 10px; color: #e0e0e0;">Anish</td>
            <td style="padding: 10px; color: #e0e0e0;">92</td>
            <td style="padding: 10px; color: #e0e0e0;">Delhi</td>
        </tr>
        <tr style="background: #252525;">
            <td style="padding: 10px; color: #feca57; font-weight: bold;">1</td>
            <td style="padding: 10px; color: #e0e0e0;">1</td>
            <td style="padding: 10px; color: #e0e0e0;">Manish</td>
            <td style="padding: 10px; color: #e0e0e0;">82</td>
            <td style="padding: 10px; color: #e0e0e0;">Mumbai</td>
        </tr>
    </table>
</div>

---

### 🎯 Setting Custom Index Labels
```python
# Assign custom labels to DataFrame rows
data.index = ['first', 'second']
data
```

<div style="background: #1e1e1e; padding: 15px; border-radius: 8px; margin: 15px 0; border: 2px solid #38ef7d;">
    <table style="width: 100%; border-collapse: collapse;">
        <tr style="background: #2d2d2d;">
            <th style="padding: 10px; color: #38ef7d; text-align: left; border-bottom: 2px solid #11998e;"></th>
            <th style="padding: 10px; color: #38ef7d; text-align: left; border-bottom: 2px solid #11998e;">Unnamed: 0</th>
            <th style="padding: 10px; color: #38ef7d; text-align: left; border-bottom: 2px solid #11998e;">name</th>
            <th style="padding: 10px; color: #38ef7d; text-align: left; border-bottom: 2px solid #11998e;">marks</th>
            <th style="padding: 10px; color: #38ef7d; text-align: left; border-bottom: 2px solid #11998e;">city</th>
        </tr>
        <tr style="background: #1a1a1a;">
            <td style="padding: 10px; color: #f093fb; font-weight: bold;">first</td>
            <td style="padding: 10px; color: #e0e0e0;">0</td>
            <td style="padding: 10px; color: #e0e0e0;">Anish</td>
            <td style="padding: 10px; color: #e0e0e0;">92</td>
            <td style="padding: 10px; color: #e0e0e0;">Delhi</td>
        </tr>
        <tr style="background: #252525;">
            <td style="padding: 10px; color: #f093fb; font-weight: bold;">second</td>
            <td style="padding: 10px; color: #e0e0e0;">1</td>
            <td style="padding: 10px; color: #e0e0e0;">Manish</td>
            <td style="padding: 10px; color: #e0e0e0;">82</td>
            <td style="padding: 10px; color: #e0e0e0;">Mumbai</td>
        </tr>
    </table>
</div>

<div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); padding: 20px; border-radius: 10px; margin: 20px 0; border-left: 5px solid #f093fb;">
    <h4 style="color: #ffffff; margin-top: 0;">✨ Benefits of Custom Index</h4>
    <ul style="color: #e0e0e0; margin: 10px 0 0 20px; line-height: 1.8;">
        <li><strong>Meaningful Labels:</strong> Use descriptive names instead of numeric indices</li>
        <li><strong>Easier Access:</strong> Reference rows by name: <code style="background: #1e1e1e; padding: 3px 8px; border-radius: 4px; color: #feca57;">data.loc['first']</code></li>
        <li><strong>Better Readability:</strong> Makes data exploration more intuitive</li>
        <li><strong>Date/Time Support:</strong> Can use datetime objects as index for time-series data</li>
    </ul>
</div>

<div style="background: linear-gradient(135deg, #fa709a 0%, #fee140 100%); padding: 20px; border-radius: 10px; margin: 20px 0; border-left: 5px solid #38ef7d;">
    <p style="color: #1e1e1e; margin: 0; font-size: 14px;">
        💡 <strong>Pro Tip:</strong> You can also set index during DataFrame creation: <code style="background: #1e1e1e; padding: 3px 8px; border-radius: 4px; color: #feca57;">pd.DataFrame(data, index=['first', 'second'])</code> or when reading CSV: <code style="background: #1e1e1e; padding: 3px 8px; border-radius: 4px; color: #feca57;">pd.read_csv('file.csv', index_col='column_name')</code>
    </p>
</div>
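
As a minimal sketch of the points above, the snippet below rebuilds the two-row table with a custom index and looks up a row by label. The variable names `records` and `students` are illustrative and do not come from the original notebook.

```python
import pandas as pd

# Hypothetical names; the values mirror the two-row table shown above.
records = {
    "name": ["Anish", "Manish"],
    "marks": [92, 82],
    "city": ["Delhi", "Mumbai"],
}

# Set the index at creation time instead of relying on the default 0, 1, ...
students = pd.DataFrame(records, index=["first", "second"])

# Label-based lookup returns the row as a Series keyed by column name.
first_row = students.loc["first"]
print(first_row["name"], first_row["marks"], first_row["city"])
```

The same labels could instead come from a column of a CSV file via `index_col`, as the tip above mentions.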

---

# 🐼 Understanding Pandas: The Ultimate Data Analysis Tool

<div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); padding: 30px; border-radius: 15px; margin: 25px 0; box-shadow: 0 8px 20px rgba(0,0,0,0.3);">
    <h2 style="color: #ffffff; margin: 0; font-weight: 700; font-size: 28px;">What is Pandas?</h2>
    <p style="color: #e0e0e0; margin: 15px 0 0 0; font-size: 16px; line-height: 1.6;">
        Pandas is a high-performance, open-source Python library built on top of <strong>NumPy</strong> that provides powerful data structures and analysis tools. It's designed specifically for handling structured (tabular) data with ease and efficiency.
    </p>
</div>

---

## 🆚 Pandas vs Excel

<div style="display: grid; grid-template-columns: 1fr 1fr; gap: 20px; margin: 25px 0;">
    <div style="background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%); padding: 3px; border-radius: 12px;">
        <div style="background: #1a1a2e; padding: 20px; border-radius: 10px;">
            <h3 style="color: #f5576c; margin-top: 0;">📊 Excel Limitations</h3>
            <ul style="color: #b0b0b0; line-height: 1.8;">
                <li>Limited to 1,048,576 rows per worksheet</li>
                <li>Manual, click-based operations</li>
                <li>Slow with large datasets</li>
                <li>Hard to reproduce analysis</li>
                <li>Version control issues</li>
                <li>Limited automation</li>
            </ul>
        </div>
    </div>
    <div style="background: linear-gradient(135deg, #38ef7d 0%, #11998e 100%); padding: 3px; border-radius: 12px;">
        <div style="background: #1a1a2e; padding: 20px; border-radius: 10px;">
            <h3 style="color: #38ef7d; margin-top: 0;">🚀 Pandas Advantages</h3>
            <ul style="color: #b0b0b0; line-height: 1.8;">
                <li>Handles millions of rows, limited mainly by available memory</li>
                <li>Code-based, repeatable workflows</li>
                <li>Fast, vectorized operations</li>
                <li>Complete reproducibility</li>
                <li>Git-friendly scripts</li>
                <li>Full automation capability</li>
            </ul>
        </div>
    </div>
</div>

---

## 🆚 Pandas vs Python Native Structures

<table style="width: 100%; border-collapse: collapse; margin: 25px 0;">
    <tr style="background: linear-gradient(135deg, #3494e6 0%, #ec6ead 100%);">
        <th style="padding: 15px; color: #ffffff; text-align: left; border-radius: 8px 0 0 0;">Feature</th>
        <th style="padding: 15px; color: #ffffff; text-align: center;">Lists/Tuples/Dicts</th>
        <th style="padding: 15px; color: #ffffff; text-align: center; border-radius: 0 8px 0 0;">Pandas</th>
    </tr>
    <tr style="background: #1a1a2e;">
        <td style="padding: 12px; color: #feca57; font-weight: bold; border-bottom: 1px solid #333;">Speed</td>
        <td style="padding: 12px; color: #ff6b6b; text-align: center; border-bottom: 1px solid #333;">❌ Slow (Pure Python)</td>
        <td style="padding: 12px; color: #38ef7d; text-align: center; border-bottom: 1px solid #333;">✅ Fast (NumPy-backed)</td>
    </tr>
    <tr style="background: #16213e;">
        <td style="padding: 12px; color: #feca57; font-weight: bold; border-bottom: 1px solid #333;">Data Operations</td>
        <td style="padding: 12px; color: #ff6b6b; text-align: center; border-bottom: 1px solid #333;">❌ Manual loops required</td>
        <td style="padding: 12px; color: #38ef7d; text-align: center; border-bottom: 1px solid #333;">✅ Vectorized operations</td>
    </tr>
    <tr style="background: #1a1a2e;">
        <td style="padding: 12px; color: #feca57; font-weight: bold; border-bottom: 1px solid #333;">Missing Data</td>
        <td style="padding: 12px; color: #ff6b6b; text-align: center; border-bottom: 1px solid #333;">❌ Manual handling</td>
        <td style="padding: 12px; color: #38ef7d; text-align: center; border-bottom: 1px solid #333;">✅ Built-in NaN support</td>
    </tr>
    <tr style="background: #16213e;">
        <td style="padding: 12px; color: #feca57; font-weight: bold; border-bottom: 1px solid #333;">Alignment</td>
        <td style="padding: 12px; color: #ff6b6b; text-align: center; border-bottom: 1px solid #333;">❌ No automatic alignment</td>
        <td style="padding: 12px; color: #38ef7d; text-align: center; border-bottom: 1px solid #333;">✅ Automatic label alignment</td>
    </tr>
    <tr style="background: #1a1a2e;">
        <td style="padding: 12px; color: #feca57; font-weight: bold;">Memory Efficiency</td>
        <td style="padding: 12px; color: #ff6b6b; text-align: center;">❌ Higher overhead</td>
        <td style="padding: 12px; color: #38ef7d; text-align: center;">✅ Optimized storage</td>
    </tr>
</table>
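
The "Missing Data" and "Alignment" rows can be illustrated with a small sketch. The two Series below (`q1`, `q2`) are made-up data chosen so that their labels only partly overlap.

```python
import pandas as pd

# Two Series with partially overlapping labels (illustrative data).
q1 = pd.Series({"Anish": 10, "Manish": 20})
q2 = pd.Series({"Manish": 5, "Rohan": 7})

# Addition aligns on labels; a label missing on either side yields NaN.
total = q1 + q2
print(total)

# Missing values can then be handled explicitly, e.g. treated as 0.
total_filled = q1.add(q2, fill_value=0)
print(total_filled)
```

With plain lists or dicts, the matching of keys and the handling of absent entries would have to be written by hand.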

---

## ⚡ The Power of NumPy Speed

<div style="background: linear-gradient(135deg, #fa709a 0%, #fee140 100%); padding: 3px; border-radius: 12px; margin: 25px 0;">
    <div style="background: #1e1e1e; padding: 25px; border-radius: 10px;">
        <h3 style="color: #fee140; margin-top: 0;">🔥 Why Pandas is Lightning Fast</h3>
        <p style="color: #e0e0e0; line-height: 1.8; margin: 15px 0;">
            Pandas is built on top of <strong style="color: #feca57;">NumPy</strong>, which uses <strong style="color: #feca57;">C-based optimized operations</strong>. This means:
        </p>
        <ul style="color: #b0b0b0; line-height: 1.8;">
            <li><strong style="color: #38ef7d;">Vectorization:</strong> Operations on entire arrays without Python loops (often 10-100x faster, depending on the operation and data size)</li>
            <li><strong style="color: #38ef7d;">Contiguous Memory:</strong> Data stored efficiently in memory blocks for rapid access</li>
            <li><strong style="color: #38ef7d;">Compiled Code:</strong> Core operations run at C speed, not Python interpreter speed</li>
            <li><strong style="color: #38ef7d;">Broadcasting:</strong> Intelligent element-wise operations without explicit iteration</li>
        </ul>
        <div style="background: #2d2d2d; padding: 15px; border-radius: 8px; margin-top: 15px;">
            <p style="color: #feca57; margin: 0; font-family: monospace;">
                <strong>Example (illustrative, not a benchmark):</strong> Adding 1 million numbers<br>
                <span style="color: #ff6b6b;">Python List:</span> ~100ms<br>
                <span style="color: #38ef7d;">Pandas/NumPy:</span> ~1ms <strong style="color: #fee140;">(roughly 100x faster; actual timings vary by machine)</strong>
            </p>
        </div>
    </div>
</div>
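
To check the speed claim on your own machine, a rough timing sketch like the one below can help. It interprets "adding" as adding 1 to each of a million elements, which is an assumption; the measured numbers will differ from the illustrative figures above.

```python
import timeit

import numpy as np

n = 1_000_000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# Time a pure-Python element-wise loop against the vectorized NumPy equivalent.
runs = 5
loop_time = timeit.timeit(lambda: [x + 1 for x in py_list], number=runs)
vec_time = timeit.timeit(lambda: np_array + 1, number=runs)

print(f"Python loop: {loop_time / runs * 1000:.1f} ms per run")
print(f"NumPy:       {vec_time / runs * 1000:.1f} ms per run")
```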

---

## 🎯 Data Analytics & Preprocessing Power

<div style="background: linear-gradient(135deg, #4facfe 0%, #00f2fe 100%); padding: 3px; border-radius: 12px; margin: 25px 0;">
    <div style="background: #1a1a2e; padding: 25px; border-radius: 10px;">
        <h3 style="color: #00f2fe; margin-top: 0;">🛠️ Essential EDA Capabilities</h3>
        <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 20px; margin-top: 15px;">
            <div>
                <h4 style="color: #feca57; margin: 10px 0;">Data Cleaning</h4>
                <ul style="color: #b0b0b0; line-height: 1.6; font-size: 14px;">
                    <li>Handle missing values (dropna, fillna)</li>
                    <li>Remove duplicates automatically</li>
                    <li>Data type conversions</li>
                    <li>String operations at scale</li>
                </ul>
            </div>
            <div>
                <h4 style="color: #feca57; margin: 10px 0;">Transformations</h4>
                <ul style="color: #b0b0b0; line-height: 1.6; font-size: 14px;">
                    <li>Group by aggregations</li>
                    <li>Pivot tables & reshaping</li>
                    <li>Merge & join operations</li>
                    <li>Apply custom functions</li>
                </ul>
            </div>
            <div>
                <h4 style="color: #feca57; margin: 10px 0;">Analysis</h4>
                <ul style="color: #b0b0b0; line-height: 1.6; font-size: 14px;">
                    <li>Statistical summaries (mean, std, etc.)</li>
                    <li>Correlation analysis</li>
                    <li>Time series operations</li>
                    <li>Window functions (rolling, expanding)</li>
                </ul>
            </div>
            <div>
                <h4 style="color: #feca57; margin: 10px 0;">Visualization</h4>
                <ul style="color: #b0b0b0; line-height: 1.6; font-size: 14px;">
                    <li>Built-in plotting methods</li>
                    <li>Histogram, box plots, scatter</li>
                    <li>Integration with matplotlib/seaborn</li>
                    <li>Quick exploratory visuals</li>
                </ul>
            </div>
        </div>
    </div>
</div>
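
A few of the capabilities listed above can be sketched on a tiny, made-up dataset. The column names and values here are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Small illustrative dataset with one missing value and one duplicated row.
df = pd.DataFrame({
    "student": ["Anish", "Manish", "Manish", "Rohan"],
    "city": ["Delhi", "Mumbai", "Mumbai", "Delhi"],
    "marks": [92.0, 82.0, 82.0, np.nan],
})

# Cleaning: drop exact duplicate rows, then fill the missing mark with the mean.
clean = df.drop_duplicates().copy()
clean["marks"] = clean["marks"].fillna(clean["marks"].mean())

# Transformation: average marks per city.
per_city = clean.groupby("city")["marks"].mean()

# Analysis: summary statistics for the numeric column.
summary = clean["marks"].describe()
print(per_city)
print(summary)
```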

---

## 📂 Data Loading: Pandas vs Excel

<div style="background: linear-gradient(135deg, #a8edea 0%, #fed6e3 100%); padding: 3px; border-radius: 12px; margin: 25px 0;">
    <div style="background: #1e1e1e; padding: 25px; border-radius: 10px;">
        <h3 style="color: #4ecdc4; margin-top: 0;">🌐 Pandas: Universal Data Reader</h3>
        <p style="color: #e0e0e0; margin: 15px 0;">Pandas can read from virtually any data source with simple commands:</p>
    </div>
</div>

```python
import pandas as pd

# CSV files
df = pd.read_csv('data.csv')

# Excel files (use sheet_name to pick a sheet)
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')

# SQL databases (`connection` is an already-open database connection)
df = pd.read_sql('SELECT * FROM table', connection)

# JSON files
df = pd.read_json('data.json')

# HTML tables: read_html returns a list of DataFrames, one per table found
df = pd.read_html('https://example.com')[0]

# Clipboard (copy a table anywhere, then load it into pandas)
df = pd.read_clipboard()

# Parquet (columnar format); read_hdf and read_feather cover HDF5 and Feather
df = pd.read_parquet('data.parquet')
```

<div style="background: #1a1a2e; padding: 20px; border-radius: 10px; margin: 20px 0; border-left: 5px solid #38ef7d;">
    <p style="color: #e0e0e0; margin: 0; line-height: 1.8;">
        <strong style="color: #feca57;">Excel:</strong> Primarily limited to .xlsx, .xls, .csv files with manual import wizards<br>
        <strong style="color: #38ef7d;">Pandas:</strong> 20+ file formats, APIs, databases - all programmatically accessible
    </p>
</div>

---

## 🔧 Robust Data Operations

<table style="width: 100%; border-collapse: collapse; margin: 25px 0;">
    <tr style="background: linear-gradient(135deg, #f5af19 0%, #f12711 100%);">
        <th style="padding: 15px; color: #ffffff; text-align: left; border-radius: 8px 0 0 0;">Operation</th>
        <th style="padding: 15px; color: #ffffff; text-align: left;">Excel Approach</th>
        <th style="padding: 15px; color: #ffffff; text-align: left; border-radius: 0 8px 0 0;">Pandas Approach</th>
    </tr>
    <tr style="background: #1a1a2e;">
        <td style="padding: 12px; color: #feca57; font-weight: bold; border-bottom: 1px solid #333;">Adding Rows</td>
        <td style="padding: 12px; color: #b0b0b0; border-bottom: 1px solid #333;">Manual copy-paste or insert</td>
        <td style="padding: 12px; color: #38ef7d; border-bottom: 1px solid #333;"><code style="background: #0d0d0d; padding: 2px 6px; border-radius: 4px;">pd.concat([df, new_rows])</code></td>
    </tr>
    <tr style="background: #16213e;">
        <td style="padding: 12px; color: #feca57; font-weight: bold; border-bottom: 1px solid #333;">Deleting Rows</td>
        <td style="padding: 12px; color: #b0b0b0; border-bottom: 1px solid #333;">Select & delete manually</td>
        <td style="padding: 12px; color: #38ef7d; border-bottom: 1px solid #333;"><code style="background: #0d0d0d; padding: 2px 6px; border-radius: 4px;">df.drop([0, 1, 2])</code></td>
    </tr>
    <tr style="background: #1a1a2e;">
        <td style="padding: 12px; color: #feca57; font-weight: bold; border-bottom: 1px solid #333;">Filtering</td>
        <td style="padding: 12px; color: #b0b0b0; border-bottom: 1px solid #333;">AutoFilter, manual selection</td>
        <td style="padding: 12px; color: #38ef7d; border-bottom: 1px solid #333;"><code style="background: #0d0d0d; padding: 2px 6px; border-radius: 4px;">df[df['age'] > 25]</code></td>
    </tr>
    <tr style="background: #16213e;">
        <td style="padding: 12px; color: #feca57; font-weight: bold; border-bottom: 1px solid #333;">Merging Tables</td>
        <td style="padding: 12px; color: #b0b0b0; border-bottom: 1px solid #333;">VLOOKUP/INDEX-MATCH formulas</td>
        <td style="padding: 12px; color: #38ef7d; border-bottom: 1px solid #333;"><code style="background: #0d0d0d; padding: 2px 6px; border-radius: 4px;">pd.merge(df1, df2, on='key')</code></td>
    </tr>
    <tr style="background: #1a1a2e;">
        <td style="padding: 12px; color: #feca57; font-weight: bold; border-bottom: 1px solid #333;">Sorting</td>
        <td style="padding: 12px; color: #b0b0b0; border-bottom: 1px solid #333;">Sort button, limited options</td>
        <td style="padding: 12px; color: #38ef7d; border-bottom: 1px solid #333;"><code style="background: #0d0d0d; padding: 2px 6px; border-radius: 4px;">df.sort_values(['col1', 'col2'])</code></td>
    </tr>
    <tr style="background: #16213e;">
        <td style="padding: 12px; color: #feca57; font-weight: bold;">Grouping</td>
        <td style="padding: 12px; color: #b0b0b0;">Pivot tables (manual setup)</td>
        <td style="padding: 12px; color: #38ef7d;"><code style="background: #0d0d0d; padding: 2px 6px; border-radius: 4px;">df.groupby('category').sum()</code></td>
    </tr>
</table>
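
The pandas column of the table can be tried end to end with the sketch below. The data and the `labels` lookup table are invented for illustration; note that `concat` is a top-level pandas function rather than a DataFrame method.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Anish", "Manish", "Rohan"],
    "age": [30, 20, 27],
    "category": ["A", "B", "A"],
})
new_rows = pd.DataFrame({"name": ["Priya"], "age": [35], "category": ["B"]})

# Adding rows: pd.concat stacks the frames; ignore_index renumbers the rows.
df = pd.concat([df, new_rows], ignore_index=True)

# Deleting rows by index label (returns a new DataFrame).
without_first = df.drop([0])

# Filtering with a boolean mask.
over_25 = df[df["age"] > 25]

# Merging on a shared key column (hypothetical lookup table).
labels = pd.DataFrame({"category": ["A", "B"], "label": ["alpha", "beta"]})
merged = pd.merge(df, labels, on="category")

# Sorting by multiple columns, then grouping and aggregating.
ordered = df.sort_values(["category", "age"])
age_by_category = df.groupby("category")["age"].sum()
print(age_by_category)
```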

---

# 📊 Pandas Data Structures

<div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); padding: 30px; border-radius: 15px; margin: 25px 0; text-align: center;">
    <h2 style="color: #ffffff; margin: 0; font-size: 26px;">Two Core Data Structures</h2>
    <p style="color: #e0e0e0; margin: 15px 0 0 0; font-size: 16px;">Series (1D) & DataFrame (2D)</p>
</div>

---

## 📏 Understanding Arrays with Labels

<div style="background: #1a1a2e; padding: 25px; border-radius: 12px; margin: 25px 0;">
    <h3 style="color: #feca57; margin-top: 0;">🔢 What are 1D and 2D Arrays?</h3>
    <div style="display: grid; grid-template-columns: 1fr 1fr; gap: 25px; margin-top: 20px;">
        <div style="background: linear-gradient(135deg, #f093fb 0%, #f5576c 100%); padding: 3px; border-radius: 10px;">
            <div style="background: #16213e; padding: 20px; border-radius: 8px;">
                <h4 style="color: #f5576c; margin-top: 0; text-align: center;">1D Array (One Dimension)</h4>
                <p style="color: #b0b0b0; text-align: center; margin: 10px 0;">A single row or column of data</p>
                <div style="background: #0d0d0d; padding: 15px; border-radius: 8px; margin-top: 15px;">
                    <pre style="color: #38ef7d; margin: 0; text-align: center; font-size: 14px;">
[10, 20, 30, 40, 50]
    ↓   ↓   ↓   ↓   ↓
  Index: 0, 1, 2, 3, 4
                    </pre>
                </div>
            </div>
        </div>
        <div style="background: linear-gradient(135deg, #4facfe 0%, #00f2fe 100%); padding: 3px; border-radius: 10px;">
            <div style="background: #16213e; padding: 20px; border-radius: 8px;">
                <h4 style="color: #00f2fe; margin-top: 0; text-align: center;">2D Array (Two Dimensions)</h4>
                <p style="color: #b0b0b0; text-align: center; margin: 10px 0;">Rows AND columns (like a table)</p>
                <div style="background: #0d0d0d; padding: 15px; border-radius: 8px; margin-top: 15px;">
                    <pre style="color: #38ef7d; margin: 0; text-align: center; font-size: 14px;">
    Col0  Col1  Col2
Row0 [10,   20,   30]
Row1 [40,   50,   60]
Row2 [70,   80,   90]
                    </pre>
                </div>
            </div>
        </div>
    </div>
</div>
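
The two diagrams above correspond to plain NumPy arrays, which is what pandas builds on. A small sketch using the same numbers:

```python
import numpy as np

one_d = np.array([10, 20, 30, 40, 50])
two_d = np.array([[10, 20, 30], [40, 50, 60], [70, 80, 90]])

print(one_d.ndim, one_d.shape)  # one axis
print(two_d.ndim, two_d.shape)  # two axes: rows, columns

# Positional access: a single index for 1D, (row, column) for 2D.
print(one_d[2])
print(two_d[1, 2])
```

Series and DataFrame add labels on top of this positional layout.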

---

## 📌 Series: 1D Labeled Array

<div style="background: linear-gradient(135deg, #fa709a 0%, #fee140 100%); padding: 3px; border-radius: 12px; margin: 25px 0;">
    <div style="background: #1e1e1e; padding: 25px; border-radius: 10px;">
        <h3 style="color: #fee140; margin-top: 0;">🎯 What is a Series?</h3>
        <p style="color: #e0e0e0; line-height: 1.8; margin: 15px 0;">
            A <strong>Series</strong> is a one-dimensional labeled array that can hold any data type (integers, strings, floats, objects). Think of it as a single column from an Excel spreadsheet with row labels.
        </p>
        <div style="background: #2d2d2d; padding: 20px; border-radius: 8px; margin-top: 20px;">
            <h4 style="color: #38ef7d; margin-top: 0;">Key Characteristics:</h4>
            <ul style="color: #b0b0b0; line-height: 1.8;">
                <li><strong style="color: #feca57;">One Data Type:</strong> A Series has a single dtype; if the values are mixed, pandas stores them under the general object dtype</li>
                <li><strong style="color: #feca57;">Labeled Index:</strong> Each value has a label (can be custom or default 0, 1, 2...)</li>
                <li><strong style="color: #feca57;">Single Column:</strong> Represents one feature/variable from your dataset</li>
            </ul>
        </div>
    </div>
</div>

### 🎨 Visual Representation of Series

<div style="display: grid; grid-template-columns: 1fr 1fr; gap: 20px; margin: 25px 0;">
    <div style="background: #1a1a2e; padding: 20px; border-radius: 10px; border: 3px solid #f093fb;">
        <h4 style="color: #f093fb; margin-top: 0; text-align: center;">Default Integer Index</h4>
        <div style="background: #0d0d0d; padding: 20px; border-radius: 8px;">
            <table style="width: 100%; border-collapse: collapse;">
                <tr style="background: #2d2d2d;">
                    <th style="padding: 10px; color: #feca57; text-align: center; border-bottom: 2px solid #f093fb;">Index</th>
                    <th style="padding: 10px; color: #feca57; text-align: center; border-bottom: 2px solid #f093fb;">Values</th>
                </tr>
                <tr style="background: #1a1a1a;">
                    <td style="padding: 10px; color: #f093fb; text-align: center; font-weight: bold;">0</td>
                    <td style="padding: 10px; color: #e0e0e0; text-align: center;">92</td>
                </tr>
                <tr style="background: #252525;">
                    <td style="padding: 10px; color: #f093fb; text-align: center; font-weight: bold;">1</td>
                    <td style="padding: 10px; color: #e0e0e0; text-align: center;">85</td>
                </tr>
                <tr style="background: #1a1a1a;">
                    <td style="padding: 10px; color: #f093fb; text-align: center; font-weight: bold;">2</td>
                    <td style="padding: 10px; color: #e0e0e0; text-align: center;">78</td>
                </tr>
                <tr style="background: #252525;">
                    <td style="padding: 10px; color: #f093fb; text-align: center; font-weight: bold;">3</td>
                    <td style="padding: 10px; color: #e0e0e0; text-align: center;">95</td>
                </tr>
            </table>
            <p style="color: #38ef7d; text-align: center; margin: 15px 0 0 0; font-family: monospace; font-size: 12px;">dtype: int64</p>
        </div>
    </div>
    <div style="background: #1a1a2e; padding: 20px; border-radius: 10px; border: 3px solid #38ef7d;">
        <h4 style="color: #38ef7d; margin-top: 0; text-align: center;">Custom Label Index</h4>
        <div style="background: #0d0d0d; padding: 20px; border-radius: 8px;">
            <table style="width: 100%; border-collapse: collapse;">
                <tr style="background: #2d2d2d;">
                    <th style="padding: 10px; color: #feca57; text-align: center; border-bottom: 2px solid #38ef7d;">Index</th>
                    <th style="padding: 10px; color: #feca57; text-align: center; border-bottom: 2px solid #38ef7d;">Values</th>
                </tr>
                <tr style="background: #1a1a1a;">
                    <td style="padding: 10px; color: #38ef7d; text-align: center; font-weight: bold;">Anish</td>
                    <td style="padding: 10px; color: #e0e0e0; text-align: center;">92</td>
                </tr>
                <tr style="background: #252525;">
                    <td style="padding: 10px; color: #38ef7d; text-align: center; font-weight: bold;">Manish</td>
                    <td style="padding: 10px; color: #e0e0e0; text-align: center;">85</td>
                </tr>
                <tr style="background: #1a1a1a;">
                    <td style="padding: 10px; color: #38ef7d; text-align: center; font-weight: bold;">Rohan</td>
                    <td style="padding: 10px; color: #e0e0e0; text-align: center;">78</td>
                </tr>
                <tr style="background: #252525;">
                    <td style="padding: 10px; color: #38ef7d; text-align: center; font-weight: bold;">Priya</td>
                    <td style="padding: 10px; color: #e0e0e0; text-align: center;">95</td>
                </tr>
            </table>
            <p style="color: #38ef7d; text-align: center; margin: 15px 0 0 0; font-family: monospace; font-size: 12px;">dtype: int64</p>
        </div>
    </div>
</div>

```python
# Creating a Series
import pandas as pd

# Method 1: From a list (default index)
scores = pd.Series([92, 85, 78, 95])

# Method 2: From a list with custom index
scores = pd.Series([92, 85, 78, 95], index=['Anish', 'Manish', 'Rohan', 'Priya'])

# Method 3: From a dictionary (keys become index)
scores = pd.Series({'Anish': 92, 'Manish': 85, 'Rohan': 78, 'Priya': 95})
```
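
Once a Series exists, its labels can be used for lookups and filtering. The sketch below reuses the dictionary-based `scores` Series; the threshold of 90 is an arbitrary choice for illustration.

```python
import pandas as pd

scores = pd.Series({"Anish": 92, "Manish": 85, "Rohan": 78, "Priya": 95})

# Label-based and position-based access.
by_label = scores.loc["Rohan"]
by_position = scores.iloc[0]

# A vectorized comparison yields a boolean Series usable as a filter.
top_scorers = scores[scores > 90]
print(by_label, by_position)
print(top_scorers)
```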

---

## 📊 DataFrame: 2D Labeled Array

<div style="background: linear-gradient(135deg, #4facfe 0%, #00f2fe 100%); padding: 3px; border-radius: 12px; margin: 25px 0;">
    <div style="background: #1e1e1e; padding: 25px; border-radius: 10px;">
        <h3 style="color: #00f2fe; margin-top: 0;">📋 What is a DataFrame?</h3>
        <p style="color: #e0e0e0; line-height: 1.8; margin: 15px 0;">
            A <strong>DataFrame</strong> is a two-dimensional labeled data structure with columns of potentially different types. Think of it as a spreadsheet or SQL table in Python—it's the primary data structure you'll work with in pandas.
        </p>
        <div style="background: #2d2d2d; padding: 20px; border-radius: 8px; margin-top: 20px;">
            <h4 style="color: #38ef7d; margin-top: 0;">Key Characteristics:</h4>
            <ul style="color: #b0b0b0; line-height: 1.8;">
                <li><strong style="color: #feca57;">Multiple Columns:</strong> Each column is a Series (can have different data types)</li>
                <li><strong style="color: #feca57;">Labeled Rows & Columns:</strong> Both axes have labels for easy access</li>
                <li><strong style="color: #feca57;">Tabular Structure:</strong> Rows represent observations, columns represent features</li>
                <li><strong style="color: #feca57;">Heterogeneous:</strong> Different columns can have different data types</li>
            </ul>
        </div>
    </div>
</div>
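
The characteristics above can be seen in a small sketch. The `passed` column is a hypothetical addition to show a third data type next to text and integers.

```python
import pandas as pd

students = pd.DataFrame(
    {
        "name": ["Anish", "Manish"],
        "marks": [92, 82],
        "passed": [True, True],  # hypothetical column for illustration
    },
    index=["first", "second"],
)

# Each column is a Series with its own dtype.
print(students.dtypes)
marks_column = students["marks"]
print(type(marks_column).__name__)

# Both axes are labeled.
print(list(students.index), list(students.columns))
print(students.shape)
```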

### 🎨 Visual Representation of DataFrame

<div style="background: #1a1a2e; padding: 25px; border-radius: 10px; border: 3px solid #00f2fe; margin: 25px 0;">
    <h4 style="color: #00f2fe; margin-top: 0; text-align: center; font-size: 20px;">Complete Student Records Table</h4>
</div>

---

## 🔬 Creating Series: Three Methods Explained

<div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); padding: 3px; border-radius: 10px; margin: 20px 0;">
    <div style="background: #1e1e1e; padding: 20px; border-radius: 8px;">
        <h3 style="color: #e100ff; margin-top: 0;">Why All Three Work? Understanding Homogeneous Data</h3>
    </div>
</div>

### 📝 Method 1: Function Reference (No Call)

```python
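import numpy as np
import pandas as pd

# Passing the function itself (no parentheses) stores the function object
ser = pd.Series(np.random.rand)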
print(ser)
```

<div style="background: #1e1e1e; padding: 15px; border-radius: 8px; margin: 15px 0; border: 2px solid #f093fb;">
    <pre style="color: #e0e0e0; margin: 0; font-family: 'Courier New', monospace; font-size: 14px;">
<span style="color: #feca57;">0</span>    &lt;bound method RandomState.rand of RandomState(...
<span style="color: #38ef7d;">dtype:</span> object
    </pre>
</div>

<div style="background: #1a1a2e; padding: 15px; border-radius: 8px; margin: 15px 0; border-left: 4px solid #f093fb;">
    <p style="color: #b0b0b0; margin: 0; font-size: 14px;">
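        <strong style="color: #f093fb;">✅ Runs, but rarely what you want:</strong> Because <code style="background: #0d0d0d; padding: 2px 6px; border-radius: 4px; color: #feca57;">np.random.rand</code> was not called, the Series contains ONE element (the function object itself). Its dtype is <code style="background: #0d0d0d; padding: 2px 6px; border-radius: 4px; color: #feca57;">object</code>, so the Series still has a single dtype.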
    </p>
</div>

---

### 📝 Method 2: Array of Random Numbers

```python
ser = pd.Series(np.random.rand(10))
print(ser)
```

<div style="background: #1e1e1e; padding: 15px; border-radius: 8px; margin: 15px 0; border: 2px solid #38ef7d;">
    <pre style="color: #e0e0e0; margin: 0; font-family: 'Courier New', monospace; font-size: 14px;">
<span style="color: #feca57;">0</span>    0.548814
<span style="color: #feca57;">1</span>    0.715189
<span style="color: #feca57;">2</span>    0.602763
<span style="color: #feca57;">3</span>    0.544883
<span style="color: #feca57;">4</span>    0.423655
<span style="color: #feca57;">5</span>    0.645894
<span style="color: #feca57;">6</span>    0.437587
<span style="color: #feca57;">7</span>    0.891773
<span style="color: #feca57;">8</span>    0.963663
<span style="color: #feca57;">9</span>    0.383442
<span style="color: #38ef7d;">dtype:</span> float64
    </pre>
</div>

<div style="background: #1a1a2e; padding: 15px; border-radius: 8px; margin: 15px 0; border-left: 4px solid #38ef7d;">
    <p style="color: #b0b0b0; margin: 0; font-size: 14px;">
        <strong style="color: #38ef7d;">✅ Works:</strong> All 10 elements are <code style="background: #0d0d0d; padding: 2px 6px; border-radius: 4px; color: #feca57;">float64</code> (decimal numbers). Same data type throughout — homogeneous!
    </p>
</div>

---

### 📝 Method 3: Dictionary with Integers

```python
ser = pd.Series({'Anish': 30, 'Manish': 20})
print(ser)
```

<div style="background: #1e1e1e; padding: 15px; border-radius: 8px; margin: 15px 0; border: 2px solid #00f2fe;">
    <pre style="color: #e0e0e0; margin: 0; font-family: 'Courier New', monospace; font-size: 14px;">
<span style="color: #feca57;">Anish</span>     30
<span style="color: #feca57;">Manish</span>    20
<span style="color: #38ef7d;">dtype:</span> int64
    </pre>
</div>

<div style="background: #1a1a2e; padding: 15px; border-radius: 8px; margin: 15px 0; border-left: 4px solid #00f2fe;">
    <p style="color: #b0b0b0; margin: 0; font-size: 14px;">
        <strong style="color: #00f2fe;">✅ Works:</strong> Both values (30, 20) are <code style="background: #0d0d0d; padding: 2px 6px; border-radius: 4px; color: #feca57;">int64</code> (integers). Dictionary keys become index labels — homogeneous!
    </p>
</div>

---

## 🎯 The Homogeneous Rule Explained

<div style="background: linear-gradient(135deg, #fa709a 0%, #fee140 100%); padding: 20px; border-radius: 10px; margin: 20px 0; border-left: 5px solid #38ef7d;">
    <p style="color: #1e1e1e; margin: 0; font-size: 14px; line-height: 1.8;">
        <strong>💡 Key Insight:</strong> "One data type" means the Series as a whole has a single <strong>dtype</strong> that applies to every value. Each example above has exactly one:<br><br>
        <span style="color: #0d0d0d;">• Method 1 → dtype <code style="background: #1e1e1e; padding: 2px 6px; border-radius: 4px; color: #feca57;">object</code> (1 function object; object can hold any Python value)</span><br>
        <span style="color: #0d0d0d;">• Method 2 → dtype <code style="background: #1e1e1e; padding: 2px 6px; border-radius: 4px; color: #feca57;">float64</code> (10 floats)</span><br>
        <span style="color: #0d0d0d;">• Method 3 → dtype <code style="background: #1e1e1e; padding: 2px 6px; border-radius: 4px; color: #feca57;">int64</code> (2 integers)</span>
    </p>
</div>
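
How pandas picks that single dtype can be sketched as follows. The three small Series are illustrative; the point is that mixing numeric types upcasts, while mixing unrelated types falls back to `object`.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])
mixed_numeric = pd.Series([1, 2.5, 3])       # ints and a float
mixed_objects = pd.Series([1, "two", 3.0])   # unrelated Python types

print(ints.dtype)
print(mixed_numeric.dtype)
print(mixed_objects.dtype)

# Under object dtype, each element keeps its own Python type.
element_types = [type(v).__name__ for v in mixed_objects]
print(element_types)
```

An `object` Series works, but it loses most of the speed benefits of a numeric dtype, so it is usually worth checking `dtype` after loading data.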

---

In [None]:
ser = pd.Series(np.random.rand)
print(ser)

ser = pd.Series(np.random.rand(10))
print(ser)

ser = pd.Series({'Anish': 30, 'Manish': 20})
print(ser)

0    <bound method RandomState.rand of RandomState(...
dtype: object
0    0.075121
1    0.653702
2    0.475667
3    0.495001
4    0.896271
5    0.362971
6    0.169614
7    0.500373
8    0.438406
9    0.512061
dtype: float64
Anish     30
Manish    20
dtype: int64


In [3]:
newDf = pd.DataFrame(np.random.rand(15,5), index = np.arange(15))

print(newDf)
print(type(newDf))
print(newDf.describe())

           0         1         2         3         4
0   0.487075  0.951729  0.713426  0.408018  0.883319
1   0.025682  0.721661  0.680489  0.656143  0.136115
2   0.208127  0.050762  0.627465  0.508233  0.525950
3   0.659830  0.237752  0.676131  0.921983  0.531979
4   0.717693  0.341146  0.644210  0.015014  0.980058
5   0.796539  0.438237  0.748725  0.763206  0.004339
6   0.992464  0.059365  0.232841  0.497218  0.347772
7   0.601058  0.944080  0.940456  0.679780  0.225532
8   0.196227  0.995020  0.370868  0.107475  0.058957
9   0.635300  0.823346  0.931363  0.573825  0.640488
10  0.876006  0.828089  0.975743  0.738102  0.731814
11  0.258264  0.533453  0.674037  0.418709  0.155063
12  0.704313  0.487586  0.075521  0.256547  0.850324
13  0.080062  0.738969  0.405849  0.298420  0.299678
14  0.357040  0.655655  0.970961  0.425377  0.065928
<class 'pandas.core.frame.DataFrame'>
               0          1          2          3          4
count  15.000000  15.000000  15.000000  15.000000  15

In [4]:
newDf.loc[1, 3] = 'Anish' # row = 1, col = 3
newDf


          0         1         2         3         4
0  0.487075  0.951729  0.713426  0.408018  0.883319
1  0.025682  0.721661  0.680489     Anish  0.136115
2  0.208127  0.050762  0.627465  0.508233  0.525950
3  0.659830  0.237752  0.676131  0.921983  0.531979
4  0.717693  0.341146  0.644210  0.015014  0.980058
5  0.796539  0.438237  0.748725  0.763206  0.004339
6  0.992464  0.059365  0.232841  0.497218  0.347772
7  0.601058  0.944080  0.940456  0.679780  0.225532
8  0.196227  0.995020  0.370868  0.107475  0.058957
9  0.635300  0.823346  0.931363  0.573825  0.640488


In [5]:
print(newDf.index)
print(newDf.columns)

Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], dtype='int64')
RangeIndex(start=0, stop=5, step=1)
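
The cells above store a string in an otherwise numeric column. A minimal sketch of what that does to the column's dtype is shown below, on a smaller frame with hypothetical names (`frame`, `rng`). The column is cast to `object` first, since recent pandas versions may warn when a float column is silently upcast by such an assignment.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.random((4, 3)))

print(frame.dtypes.tolist())  # all float64 before the assignment

# Cast the target column to object first so storing a string is explicit.
frame[1] = frame[1].astype(object)
frame.loc[2, 1] = "Anish"     # row label 2, column label 1

print(frame.dtypes.tolist())  # only column 1 changed dtype
print(frame.index)
print(frame.columns)
```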


---

## 🔄 Converting DataFrame to NumPy Array

<div style="background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); padding: 3px; border-radius: 10px; margin: 20px 0;">
    <div style="background: #1e1e1e; padding: 20px; border-radius: 8px;">
        <h3 style="color: #e100ff; margin-top: 0;">📦 Extract Underlying NumPy Array</h3>
        <p style="color: #b0b0b0; margin-bottom: 10px;">Convert your DataFrame to a raw NumPy array for numerical computations:</p>
    </div>
</div>

```python
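# Convert a DataFrame to a NumPy array.
# The sample output below is for the two-row student DataFrame
# (name, marks, city) shown earlier, not for the 15x5 newDf.
newDf.to_numpy()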
```

<div style="background: #1e1e1e; padding: 15px; border-radius: 8px; margin: 15px 0; border: 2px solid #38ef7d;">
    <pre style="color: #e0e0e0; margin: 0; font-family: 'Courier New', monospace; font-size: 14px;">
array([['Anish', 92, 'Delhi'],
       ['Manish', 82, 'Mumbai']], dtype=object)
    </pre>
</div>

<div style="background: linear-gradient(135deg, #fa709a 0%, #fee140 100%); padding: 20px; border-radius: 10px; margin: 20px 0; border-left: 5px solid #38ef7d;">
    <h4 style="color: #1e1e1e; margin-top: 0;">🎯 What Happens?</h4>
    <ul style="color: #1e1e1e; margin: 10px 0 0 20px; line-height: 1.8;">
        <li><strong>Removes Labels:</strong> Index and column names are stripped away</li>
        <li><strong>Pure Values:</strong> Returns only the raw data as a 2D array</li>
        <li><strong>Type Conversion:</strong> If mixed types exist, converts to common dtype (usually <code style="background: #1e1e1e; padding: 2px 6px; border-radius: 4px; color: #feca57;">object</code>)</li>
        <li><strong>Shape Preserved:</strong> Maintains rows × columns structure</li>
    </ul>
</div>

<div style="background: #1a1a2e; padding: 15px; border-radius: 8px; margin: 15px 0; border-left: 4px solid #4facfe;">
    <p style="color: #b0b0b0; margin: 0; font-size: 14px;">
        💡 <strong style="color: #4facfe;">Use Case:</strong> Useful when you need to pass data to machine learning libraries (scikit-learn, TensorFlow) or perform matrix operations that require NumPy arrays.
    </p>
</div>
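The dtype rule described above can be checked with a small sketch. The two frames (`numeric_df`, `mixed_df`) are hypothetical examples, not the notebook's `newDf`.

```python
import pandas as pd

# Hypothetical all-numeric frame: every column is float64
numeric_df = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})
numeric_arr = numeric_df.to_numpy()

# Hypothetical mixed frame: text plus integers has no common numeric dtype
mixed_df = pd.DataFrame({"name": ["Anish", "Manish"], "marks": [92, 82]})
mixed_arr = mixed_df.to_numpy()

print(numeric_arr.dtype, numeric_arr.shape)  # float64 (2, 2)
print(mixed_arr.dtype, mixed_arr.shape)      # object (2, 2)
```

An `object` array works for inspection, but most numerical libraries expect a numeric dtype, so it is usually worth selecting the numeric columns before converting.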

---

In [6]:
newDf.to_numpy()
newDf.head()

Unnamed: 0,0,1,2,3,4
0,0.487075,0.951729,0.713426,0.408018,0.883319
1,0.025682,0.721661,0.680489,Anish,0.136115
2,0.208127,0.050762,0.627465,0.508233,0.52595
3,0.65983,0.237752,0.676131,0.921983,0.531979
4,0.717693,0.341146,0.64421,0.015014,0.980058


# 📊 DataFrame Operations Guide

## Dataset Overview
Working with a student performance dataset containing academic scores, study habits, and demographic information across multiple subjects and metrics.

---

## 🔄 Transpose Operation (`.T`)

**Purpose:** Flips rows and columns of a DataFrame - rows become columns and columns become rows.

### Syntax
```python
df.T
```

### Example Transformation

**Before (Original DataFrame)**
```
     Math  Science  English
Alice   85      90       88
Bob     78      82       85
```

**After (Transposed)**
```
         Alice  Bob
Math        85   78
Science     90   82
English     88   85
```
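
As a runnable sketch of the transformation above, the snippet below rebuilds the same small table (the variable names `scores` and `transposed` are illustrative) and transposes it.

```python
import pandas as pd

# Same values as the "Before" table; students are the row labels
scores = pd.DataFrame(
    {"Math": [85, 78], "Science": [90, 82], "English": [88, 85]},
    index=["Alice", "Bob"],
)

transposed = scores.T  # subjects become the row labels, students the columns
print(transposed)
print(transposed.shape)  # (3, 2)
```

`.T` returns a new DataFrame; `scores` keeps its original orientation.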

---

## 🔢 Sorting by Index (`.sort_index()`)

**Purpose:** Arranges DataFrame rows or columns based on their index labels in ascending or descending order.

### Syntax

```python
df.sort_index(axis=0, ascending=False)
```

### Parameters Explained

| Parameter | Values | Effect |
|-----------|--------|--------|
| `axis=0` | Default | Sorts **rows** by index (vertical sorting) |
| `axis=1` | Optional | Sorts **columns** by names (horizontal sorting) |
| `ascending=False` | Optional (default is `True`) | Reverse order (Z→A, 9→0) |

### Example Transformation

**Before Sorting (`axis=0, ascending=False`)**
```
       Score
Alice     85
Bob       78
Charlie   92
```

**After Sorting**
```
       Score
Charlie   92
Bob       78
Alice     85
```

**With `axis=1` Example**

**Before**
```
     Math  Science  Art  English
Alice  85      90   88       92
```

**After (`axis=1, ascending=True`)**
```
     Art  English  Math  Science
Alice 88       92    85       90
```

---

💡 **Key Difference:** `axis=0` works on rows (students), `axis=1` works on columns (subjects)
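
A minimal sketch of both axes, using a hypothetical `marks` table rather than the notebook's `newDf`:

```python
import pandas as pd

marks = pd.DataFrame(
    {"Math": [85, 78, 92], "Science": [90, 82, 95], "Art": [88, 80, 91]},
    index=["Alice", "Bob", "Charlie"],
)

# axis=0: reorder rows by their index labels, here in descending order
rows_desc = marks.sort_index(axis=0, ascending=False)

# axis=1: reorder columns by their names (ascending is the default)
cols_asc = marks.sort_index(axis=1)

print(rows_desc.index.tolist())   # ['Charlie', 'Bob', 'Alice']
print(cols_asc.columns.tolist())  # ['Art', 'Math', 'Science']
```

Both calls return new DataFrames, which is why `newDf` prints unchanged in the cell below.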

In [9]:
newDf.T

newDf.T
newDf.sort_index(axis = 0, ascending=False)

newDf


Unnamed: 0,0,1,2,3,4
0,0.487075,0.951729,0.713426,0.408018,0.883319
1,0.025682,0.721661,0.680489,Anish,0.136115
2,0.208127,0.050762,0.627465,0.508233,0.52595
3,0.65983,0.237752,0.676131,0.921983,0.531979
4,0.717693,0.341146,0.64421,0.015014,0.980058
5,0.796539,0.438237,0.748725,0.763206,0.004339
6,0.992464,0.059365,0.232841,0.497218,0.347772
7,0.601058,0.94408,0.940456,0.67978,0.225532
8,0.196227,0.99502,0.370868,0.107475,0.058957
9,0.6353,0.823346,0.931363,0.573825,0.640488


# 🔍 Accessing DataFrame Columns

## Understanding DataFrame Structure

A **DataFrame is a collection of Series objects** - each column is essentially a Series, and all columns share the same index.

---

## 📌 Column Selection

**Purpose:** Extract a single column from a DataFrame; the result is returned as a Series object.

### Syntax
```python
df[column_name]
type(df[column_name])  # Returns: pandas.core.series.Series
```

### Example Transformation

**Original DataFrame**
```
     Name   Age  Score
0    Alice   20    85
1    Bob     22    78
2    Charlie 21    92
```

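**Selecting Column `Name` (First Column)**
```python
df['Name']
```

**Output (Series)**
```
0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object
```

**Note:** `df[...]` looks columns up by **label**, not by position. `df[0]` works only when a column is actually labelled `0`, as in `newDf`, whose columns are the integers 0 to 4.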

**Data Type:** `<class 'pandas.core.series.Series'>`

---

## 🏗️ DataFrame Architecture
```
DataFrame = Multiple Series Combined
│
├── Column 0 (Series) → ['Alice', 'Bob', 'Charlie']
├── Column 1 (Series) → [20, 22, 21]
└── Column 2 (Series) → [85, 78, 92]
```

**Key Insight:** Each column operates independently as a Series but shares the same row index structure.
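
The relationship can be verified with a short sketch; `students` is a hypothetical frame mirroring the example table.

```python
import pandas as pd

students = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [20, 22, 21], "Score": [85, 78, 92]}
)

name_col = students["Name"]    # a single column comes back as a Series
score_col = students["Score"]

print(type(name_col))                           # <class 'pandas.core.series.Series'>
print(name_col.name)                            # Name
print(name_col.index.equals(score_col.index))   # True: columns share one row index
```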

---

## ✏️ Modifying Values

**Purpose:** Update specific cell values or entire columns in DataFrame.

### Syntax for Cell Modification
```python
df.loc[row_index, column_name] = new_value
```

### Syntax for Column Modification
```python
df[column_name] = new_values
```

### Example Transformations

**Before Modification**
```
     Name   Score
0    Alice    85
1    Bob      78
```

**Single Cell Update**
```python
df.loc[0, 'Score'] = 95
```

**After**
```
     Name   Score
0    Alice    95  ← Changed
1    Bob      78
```

**Entire Column Update**
```python
df['Score'] = [90, 88]
```

**After**
```
     Name   Score
0    Alice    90  ← Changed
1    Bob      88  ← Changed
```

---

💡 **Pro Tip:** Use `.loc[]` for label-based indexing, `.iloc[]` for position-based indexing when modifying values.
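
The three kinds of update can be sketched as follows (the frame and the intermediate variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Score": [85, 78]})

df.loc[0, "Score"] = 95   # label-based: row labelled 0, column "Score"
df.iloc[1, 1] = 80        # position-based: second row, second column
after_cells = df["Score"].tolist()

df["Score"] = [90, 88]    # replace the whole column; length must match the rows
after_column = df["Score"].tolist()

print(after_cells)   # [95, 80]
print(after_column)  # [90, 88]
```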

In [12]:
print(newDf[0])
print(type(newDf[0]))

0     0.487075
1     0.025682
2     0.208127
3     0.659830
4     0.717693
5     0.796539
6     0.992464
7     0.601058
8     0.196227
9     0.635300
10    0.876006
11    0.258264
12    0.704313
13    0.080062
14    0.357040
Name: 0, dtype: float64
<class 'pandas.core.series.Series'>


# 📋 DataFrame Copying Methods

## Understanding References vs Copies

**Critical Concept:** Plain assignment (`df_new = df`) binds a second name to the same DataFrame object; it does not create a copy of any kind.

---

## ⚠️ Reference Assignment (No Copy)

**Purpose:** Creates a pointer to the same DataFrame object in memory.

### Syntax

```python
df_reference = df
```

### Example Transformation

**Original DataFrame**
```
     Name   Score
0    Alice    85
1    Bob      78
```

**Creating Reference**

```python
df_new = df
df_new.loc[0, 'Score'] = 95
```

**After Modification**

**`df_new` (Modified)**
```
     Name   Score
0    Alice    95  ← Changed
1    Bob      78
```

**`df` (Original Also Changed!)**
```
     Name   Score
0    Alice    95  ← Also changed!
1    Bob      78
```

**Why?** Both variables point to the **same memory location** - modifying one affects both.

---

## ✅ Deep Copy (Independent)

**Purpose:** Creates a completely separate DataFrame - changes don't affect the original.

### Syntax Method 1 (Recommended)

```python
df_copy = df.copy()
```

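### Syntax Method 2 (Slice, Not a Guaranteed Copy)

```python
df_copy = df[:]
```

**Caution:** `df[:]` returns a new DataFrame object, but that object can share its underlying data with `df`. Whether a write to `df_copy` reaches `df` depends on the pandas version and on whether Copy-on-Write is enabled, so use `.copy()` whenever independence matters.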

### Example Transformation

**Original DataFrame**
```
     Name   Score
0    Alice    85
1    Bob      78
```

**Creating Independent Copy**

```python
df_copy = df.copy()
df_copy.loc[0, 'Score'] = 95
```

**After Modification**

**`df_copy` (Modified)**
```
     Name   Score
0    Alice    95  ← Changed
1    Bob      78
```

**`df` (Original Unchanged)**
```
     Name   Score
0    Alice    85  ← Remains same
1    Bob      78
```

---

## 🔑 Key Differences

| Method | Type | Memory | Changes Affect Original? |
|--------|------|--------|--------------------------|
| `df2 = df` | Same object (second name) | Shared | ✅ Yes |
| `df2 = df.copy()` | Deep Copy | Separate | ❌ No |
| `df2 = df[:]` | New object, data may be shared | May be shared | ⚠️ Depends on pandas version and Copy-on-Write |

---

💡 **Best Practice:** Always use `.copy()` when you need independent DataFrames to avoid unintended modifications.
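
A small sketch of the difference between a second name and an explicit copy (the names `alias` and `independent` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Score": [85, 78]})

alias = df               # second name for the same object
independent = df.copy()  # separate object with its own data

alias.loc[0, "Score"] = 95        # visible through df as well
independent.loc[1, "Score"] = 0   # stays local to the copy

print(alias is df)            # True
print(df["Score"].tolist())   # [95, 78]
print(independent["Score"].tolist())
```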

# 🏷️ Renaming Columns & Data Modification

## Column Renaming

**Purpose:** Replace all column names with new labels for better readability or standardization.

### Syntax
```python
df.columns = new_column_list
```

### Example Transformation

**Before Renaming**
```
     student_name  math_score  science_score  english_score  attendance
0    Alice         85          90             88             95
1    Bob           78          82             85             92
```

**Applying New Column Names**
```python
df_new = df[:]
df_new.columns = list("ABCDE")
```

**After Renaming**
```
     A      B   C   D   E
0    Alice  85  90  88  95
1    Bob    78  82  85  92
```

**Explanation:** `list("ABCDE")` creates `['A', 'B', 'C', 'D', 'E']` - each letter becomes a new column name sequentially.
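
A runnable sketch of the same idea on a hypothetical three-column frame. It works on an explicit `.copy()` and also shows `rename`, which changes selected labels only; `rename` is not used in the original notebook and is included here as an alternative.

```python
import pandas as pd

df = pd.DataFrame(
    {"student_name": ["Alice", "Bob"], "math_score": [85, 78], "science_score": [90, 82]}
)

relabelled = df.copy()
relabelled.columns = list("ABC")   # the list length must equal the column count

# rename returns a new DataFrame and touches only the labels in the mapping
partly_renamed = df.rename(columns={"math_score": "math"})

print(relabelled.columns.tolist())      # ['A', 'B', 'C']
print(partly_renamed.columns.tolist())  # ['student_name', 'math', 'science_score']
```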

---

## 🔒 Copy Independence Test

**Purpose:** Check whether modifications made to a sliced DataFrame affect the original.

### Syntax
```python
df_copy = df[:]
df_copy.loc[row_index, column_name] = new_value
```

### Example Transformation

**Original DataFrame (`newDf`)**
```
     student_name  math_score  science_score  english_score  attendance
0    Alice         85          90             88             95
1    Bob           78          82             85             92
```

**Creating Copy & Modifying**
```python
df_new = newDf[:]
df_new.columns = list("ABCDE")
df_new.loc[0, 'A'] = 'ANISH'
```

**After Modification**

**`df_new` (Modified Copy)**
```
     A       B   C   D   E
0    ANISH   85  90  88  95  ← Changed
1    Bob     78  82  85  92
```

**`newDf` (Original Unchanged)**
```
     student_name  math_score  science_score  english_score  attendance
0    Alice         85          90             88             95          ← Remains same
1    Bob           78          82             85             92
```

---

## 📊 Comparison Output

**`print(newDf.head())`**
```
     student_name  math_score  science_score  english_score  attendance
0    Alice         85          90             88             95
1    Bob           78          82             85             92
```

**`print(df_new.head())`**
```
     A       B   C   D   E
0    ANISH   85  90  88  95
1    Bob     78  82  85  92
```

---

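💡 **Key Takeaway:** In this example, renaming the columns and overwriting a cell in `df_new` left `newDf` untouched. Do not rely on `[:]` for that in general: the slice is a new DataFrame object that can share data with the original, and whether writes propagate depends on the pandas version and its Copy-on-Write setting. Use `.copy()` when independence must be guaranteed.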

# 🗑️ Deleting Columns & Rows

## Accidental Column Creation

**Warning:** Assigning through `.loc[]` with an integer that is not an existing column label creates a new column instead of raising an error.

### Example Transformation

**Before Modification**
```
     A      B   C   D   E
0    ANISH  85  90  88  95
1    Bob    78  82  85  92
```

**Incorrect Assignment**
```python
df_new.loc[0, 0] = 654
```

**After (New Column Created!)**
```
     A      B   C   D   E   0
0    ANISH  85  90  88  95  654.0  ← New column `0` appended
1    Bob    78  82  85  92  NaN
```

**Why?** `.loc[]` expects column **names**, not positions. No column was named `0`, so pandas appended one and filled the other rows with `NaN`. For positional access use `.iloc[0, 0]`.
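
The pitfall can be reproduced on a hypothetical two-column frame:

```python
import pandas as pd

df = pd.DataFrame({"A": ["ANISH", "Bob"], "B": [85, 78]})

# No column is labelled 0, so this assignment enlarges the frame
df.loc[0, 0] = 654
print(df.columns.tolist())   # ['A', 'B', 0]

# Position-based access reaches the first column without creating anything
df.iloc[0, 0] = "Anish"
print(df.iloc[0, 0])         # Anish
```

The new column is appended on the right, and rows that received no value hold `NaN`.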

---

## 🔧 Dropping Columns & Rows (`.drop()`)

**Purpose:** Remove specified columns or rows from DataFrame based on labels.

### Syntax
```python
df.drop(label, axis=1)  # Drop column
df.drop(label, axis=0)  # Drop row
```

### Parameters Explained

| Parameter | Value | Target | Effect |
|-----------|-------|--------|--------|
| `axis=1` | Column axis | Columns | Removes column with specified name |
| `axis=0` | Row axis (default) | Rows | Removes row with specified index |

---

## 📌 Column Deletion Example

**Before Dropping**
```
     A      B   C   0
0    ANISH  85  90  654.0
1    Bob    78  82  NaN
```

**Drop Column Named '0'**
```python
df_new = df_new.drop(0, axis=1)
```

**After Dropping**
```
     A      B   C
0    ANISH  85  90  ← Column '0' removed
1    Bob    78  82
```

---

## 📌 Row Deletion Example (`axis=0`)

**Before Dropping**
```
     A      B   C
0    ANISH  85  90
1    Bob    78  82
2    Carol  92  88
```

**Drop Row with Index 0**
```python
df_new = df_new.drop(0, axis=0)
```

**After Dropping**
```
     A      B   C
1    Bob    78  82  ← Row index 0 removed
2    Carol  92  88
```

---

## 🔑 Key Differences Summary

| Operation | Axis Value | Removes | Example |
|-----------|------------|---------|---------|
| `drop(0, axis=1)` | Column axis | Column named `0` | Vertical deletion |
| `drop(0, axis=0)` | Row axis | Row at index `0` | Horizontal deletion |

---

💡 **Pro Tip:** Add `inplace=True` to modify the original DataFrame: `df.drop(0, axis=1, inplace=True)` - otherwise, it returns a new DataFrame.

# 🎯 Advanced Selection with `.loc[]`

## Subset Selection (Rows & Columns)

**Purpose:** Extract specific rows and columns simultaneously using label-based indexing.

---

## 📦 Selecting Specific Rows & Columns

### Syntax
```python
df.loc[row_labels, column_labels]
```

### Example Transformation

**Original DataFrame**
```
     A      B   C   D   E
0    ANISH  85  90  88  95
1    Bob    78  82  85  92
2    Carol  92  88  91  89
3    David  80  79  84  90
```

**Selecting Rows 1, 2 and Columns C, D**
```python
df_new.loc[[1, 2], ['C', 'D']]
```

**Result (Copy Returned)**
```
     C   D
1    82  85
2    88  91
```

**Returns:** A **copy** of the subset - modifying this won't affect the original DataFrame.

---

## 📊 All Rows, Specific Columns

### Syntax
```python
df.loc[:, column_labels]
```

### Example Transformation

**Original DataFrame**
```
     A      B   C   D   E
0    ANISH  85  90  88  95
1    Bob    78  82  85  92
2    Carol  92  88  91  89
3    David  80  79  84  90
```

**Selecting All Rows for Columns C, D**
```python
df_new.loc[:, ['C', 'D']]
```

**Result**
```
     C   D
0    90  88
1    82  85
2    88  91
3    79  84
```

**Explanation:** `:` selects **all rows**, `['C', 'D']` filters only columns C and D.

---

## 📋 Specific Rows, All Columns

### Syntax
```python
df.loc[row_labels, :]
```

### Example Transformation

**Original DataFrame**
```
     A      B   C   D   E
0    ANISH  85  90  88  95
1    Bob    78  82  85  92
2    Carol  92  88  91  89
3    David  80  79  84  90
```

**Selecting Rows 1, 2 with All Columns**
```python
df_new.loc[[1, 2], :]
```

**Result**
```
     A      B   C   D   E
1    Bob    78  82  85  92
2    Carol  92  88  91  89
```

**Explanation:** `[1, 2]` selects specific rows, `:` includes **all columns**.

---

## 🔑 Selection Pattern Summary

| Syntax | Rows | Columns | Result |
|--------|------|---------|--------|
| `.loc[[1,2], ['C','D']]` | Rows 1, 2 | Columns C, D | 2×2 subset |
| `.loc[:, ['C','D']]` | All rows | Columns C, D | Full column slice |
| `.loc[[1,2], :]` | Rows 1, 2 | All columns | Full row slice |

---

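💡 **Memory Note:** Selections made with lists of labels, as in the examples above, return **copies**, so changes to the subset do not reach the original. To change the original, assign through the indexer itself: `df.loc[rows, cols] = value`.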
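
The three selection patterns, sketched on a frame with the same values as the example tables (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["ANISH", "Bob", "Carol", "David"],
        "B": [85, 78, 92, 80],
        "C": [90, 82, 88, 79],
        "D": [88, 85, 91, 84],
        "E": [95, 92, 89, 90],
    }
)

block = df.loc[[1, 2], ["C", "D"]]   # chosen rows and chosen columns
all_rows = df.loc[:, ["C", "D"]]     # every row, chosen columns
some_rows = df.loc[[1, 2], :]        # chosen rows, every column

print(block.values.tolist())   # [[82, 85], [88, 91]]
print(all_rows.shape)          # (4, 2)
print(some_rows.shape)         # (2, 5)
```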

# 🔍 Boolean Indexing & Advanced Filtering

## Conditional Row Selection

**Purpose:** Filter rows based on logical conditions - only rows meeting the criteria are returned.

---

## 🎯 Basic Conditional Filtering

### Syntax
```python
df.loc[condition]
df.loc[df[column] operator value]
```

### Creating Boolean Conditions

**Step 1: Understanding Boolean Series**
```python
# Create a boolean condition
condition = df_new['B'] < 0.3
print(condition)
```

**Output (Boolean Series)**
```
0    False
1     True
2     True
3    False
Name: B, dtype: bool
```

**Step 2: Use Condition with `.loc[]`**
```python
# Apply the condition to filter rows
df_new.loc[condition]
# OR directly
df_new.loc[df_new['B'] < 0.3]
```

---

### Example Transformation

**Original DataFrame**
```
     A      B     C   D   E
0    ANISH  0.85  90  88  95
1    Bob    0.28  82  85  92
2    Carol  0.15  88  91  89
3    David  0.42  79  84  90
```

**Method 1: Using Variable**
```python
condition = df_new['B'] < 0.3
df_new.loc[condition]
```

**Method 2: Direct Filtering**
```python
df_new.loc[df_new['B'] < 0.3]
```

**Result**
```
     A      B     C   D   E
1    Bob    0.28  82  85  92  ← B = 0.28 < 0.3
2    Carol  0.15  88  91  89  ← B = 0.15 < 0.3
```

**How It Works:**
1. `df_new['B'] < 0.3` creates a boolean Series: `[False, True, True, False]`
2. `.loc[]` uses this mask to filter - only `True` rows are returned

---

## 📋 Common Filtering Use Cases

### 1️⃣ **Numeric Comparisons**
```python
# Students with scores above 85
df.loc[df['C'] > 85]
```

**Before**
```
     Name   C
0    Alice  90
1    Bob    82
2    Carol  88
```

**After**
```
     Name   C
0    Alice  90  ← 90 > 85
2    Carol  88  ← 88 > 85
```

---

### 2️⃣ **String Matching**
```python
# Students named 'Bob'
df.loc[df['A'] == 'Bob']
```

**Before**
```
     A      B
0    Alice  85
1    Bob    78
2    Bob    92
```

**After**
```
     A    B
1    Bob  78
2    Bob  92
```

---

### 3️⃣ **Multiple Conditions (AND)**
```python
# Scores > 80 AND attendance > 90
df.loc[(df['C'] > 80) & (df['E'] > 90)]
```

**Before**
```
     Name   C   E
0    Alice  90  95
1    Bob    82  88
2    Carol  75  92
```

**After**
```
     Name   C   E
0    Alice  90  95  ← C=90>80 AND E=95>90
```

**Note:** Use `&` for element-wise AND, and wrap each condition in its own parentheses `()`.

---

### 4️⃣ **Multiple Conditions (OR)**
```python
# Scores < 70 OR attendance < 85
df.loc[(df['C'] < 70) | (df['E'] < 85)]
```

**Before**
```
     Name   C   E
0    Alice  90  95
1    Bob    65  80
2    Carol  85  82
```

**After**
```
     Name   C   E
1    Bob    65  80  ← C=65<70 OR E=80<85
2    Carol  85  82  ← E=82<85
```

**Note:** Use `|` for element-wise OR; each condition still needs its own parentheses.

---

### 5️⃣ **String Contains**
```python
# Names containing 'ar'
df.loc[df['A'].str.contains('ar')]
```

**Before**
```
     A
0    Alice
1    Carol
2    Barry
```

**After**
```
     A
1    Carol  ← Contains 'ar'
2    Barry  ← Contains 'ar'
```

---

### 6️⃣ **Null Value Filtering**
```python
# Rows with missing values in column C
df.loc[df['C'].isna()]

# Rows without missing values
df.loc[df['C'].notna()]
```

**Before**
```
     Name   C
0    Alice  90
1    Bob    NaN
2    Carol  88
```

**After (`.isna()`)**
```
     Name   C
1    Bob    NaN  ← Has missing value
```

---

### 7️⃣ **Value in List**
```python
# Students named Alice or Carol
df.loc[df['A'].isin(['Alice', 'Carol'])]
```

**Before**
```
     A
0    Alice
1    Bob
2    Carol
3    David
```

**After**
```
     A
0    Alice  ← In list
2    Carol  ← In list
```

---

### 8️⃣ **NOT Condition (Negation)**
```python
# Students NOT named 'Bob'
condition = df['A'] == 'Bob'
df.loc[~condition]
```

**Before**
```
     A      B
0    Alice  85
1    Bob    78
2    Carol  92
```

**After**
```
     A      B
0    Alice  85  ← NOT Bob
2    Carol  92  ← NOT Bob
```

**Note:** `~` inverts the boolean Series

---

## 🔑 Operator Reference

| Operator | Meaning | Example |
|----------|---------|---------|
| `==` | Equal to | `df['A'] == 'Bob'` |
| `!=` | Not equal | `df['C'] != 90` |
| `>` | Greater than | `df['B'] > 0.5` |
| `<` | Less than | `df['B'] < 0.3` |
| `>=` | Greater/equal | `df['C'] >= 80` |
| `<=` | Less/equal | `df['E'] <= 95` |
| `&` | AND | `(condition1) & (condition2)` |
| `\|` | OR | `(condition1) \| (condition2)` |
| `~` | NOT | `~(df['A'] == 'Bob')` |

---

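💡 **Pro Tip:** Always wrap each condition in parentheses `()` when combining with `&` or `|`. These operators bind more tightly than comparisons such as `>` and `==`, so `df['C'] > 80 & df['E'] > 90` is parsed as `df['C'] > (80 & df['E']) > 90` and typically raises an error instead of filtering.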
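
A compact sketch that exercises several of the operators on one hypothetical frame; the index lists in the comments follow from the values shown.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["ANISH", "Bob", "Carol", "David"],
        "B": [0.85, 0.28, 0.15, 0.42],
        "C": [90, 82, 88, 79],
        "E": [95, 92, 89, 90],
    }
)

low_b = df.loc[df["B"] < 0.3]                       # rows 1, 2
both = df.loc[(df["C"] > 80) & (df["E"] > 90)]      # rows 0, 1
either = df.loc[(df["C"] < 80) | (df["E"] < 90)]    # rows 2, 3
not_bob = df.loc[~(df["A"] == "Bob")]               # rows 0, 2, 3
in_list = df.loc[df["A"].isin(["ANISH", "Carol"])]  # rows 0, 2

print(low_b.index.tolist(), both.index.tolist(), either.index.tolist())
```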

# 🎯 `.iloc[]` vs `.loc[]` - Position vs Label Based Indexing

## Understanding the Difference

| Feature | `.loc[]` | `.iloc[]` |
|---------|----------|-----------|
| **Selection Method** | Label-based (names) | Position-based (integers) |
| **Row Selection** | Uses index labels | Uses integer positions (0, 1, 2...) |
| **Column Selection** | Uses column names | Uses integer positions (0, 1, 2...) |
| **Use Case** | Known column/index names | Position-based slicing |

---

## 📍 Position-Based Selection with `.iloc[]`

### Syntax
```python
df.iloc[row_position, column_position]
df.iloc[row_positions, column_positions]
```

### Example: Single Cell Access

**DataFrame**
```
     A      B     C   D   E
0    ANISH  0.85  90  88  95
1    Bob    0.28  82  85  92
2    Carol  0.15  88  91  89
3    David  0.42  79  84  90
```

**Accessing First Cell (Row 0, Column 0)**
```python
df_new.iloc[0, 0]
```

**Result**
```
'ANISH'
```

**Explanation:** Returns the value at position `[0, 0]` - first row, first column.

---

### Example: Multiple Rows, All Columns

**Selecting Rows at Position 0 and 3**
```python
df_new.iloc[[0, 3], :]
```

**Result**
```
     A      B     C   D   E
0    ANISH  0.85  90  88  95  ← Position 0
3    David  0.42  79  84  90  ← Position 3
```

**Explanation:** `[0, 3]` selects rows at positions 0 and 3, `:` includes all columns.

---

## 🔑 When to Use `.loc[]` vs `.iloc[]`

### Use `.loc[]` When:
```python
# You know column/index names
df.loc[df['score'] > 80, ['name', 'grade']]
df.loc['student_123', 'math_score']
```

### Use `.iloc[]` When:
```python
# You need position-based slicing
df.iloc[0:5, 2:4]  # First 5 rows, columns 2-3
df.iloc[-1, :]      # Last row
df.iloc[:, 0]       # First column
```
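
The distinction is easiest to see when the index labels are not `0, 1, 2...`. The sketch below uses a hypothetical frame whose row labels are `10, 20, 30`.

```python
import pandas as pd

df = pd.DataFrame({"score": [85, 78, 92]}, index=[10, 20, 30])

by_label = df.loc[10, "score"]   # row whose label is 10
by_position = df.iloc[0, 0]      # first row, first column

label_slice = df.loc[10:20]      # label slices include both endpoints
position_slice = df.iloc[0:1]    # position slices exclude the stop

print(by_label, by_position)                  # 85 85
print(len(label_slice), len(position_slice))  # 2 1
```

With this index, `df.loc[0]` would raise a `KeyError` because no row is labelled `0`, while `df.iloc[0]` still returns the first row.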

---

## 🗑️ Dropping Rows & Columns

### Important: `.drop()` Returns a Copy!

**Purpose:** Remove rows or columns - original DataFrame remains unchanged unless reassigned.

---

### Example 1: Dropping Single Row

**Before**
```
     A      B     C   D   E
0    ANISH  0.85  90  88  95
1    Bob    0.28  82  85  92
2    Carol  0.15  88  91  89
3    David  0.42  79  84  90
```

**Dropping Row at Index 3**
```python
df_new.drop(3, axis=0)
```

**Returned Copy**
```
     A      B     C   D   E
0    ANISH  0.85  90  88  95
1    Bob    0.28  82  85  92
2    Carol  0.15  88  91  89  ← Row 3 removed in copy
```

**Original `df_new` (Unchanged)**
```
     A      B     C   D   E
0    ANISH  0.85  90  88  95
1    Bob    0.28  82  85  92
2    Carol  0.15  88  91  89
3    David  0.42  79  84  90  ← Still exists!
```

---

### Example 2: Dropping Single Column

**Before**
```
     A      B     C   D   E
0    ANISH  0.85  90  88  95
1    Bob    0.28  82  85  92
```

**Dropping Column 'E'**
```python
df_new.drop('E', axis=1)
```

**Returned Copy**
```
     A      B     C   D
0    ANISH  0.85  90  88  ← Column E removed in copy
1    Bob    0.28  82  85
```

**Original `df_new` (Unchanged)**
```
     A      B     C   D   E
0    ANISH  0.85  90  88  95  ← Still has column E!
1    Bob    0.28  82  85  92
```

---

### Example 3: Dropping Multiple Columns

**Before**
```
     A      B     C   D   E
0    ANISH  0.85  90  88  95
1    Bob    0.28  82  85  92
```

**Dropping Columns 'A' and 'B'**
```python
df_new.drop(['A', 'B'], axis=1)
```

**Returned Copy**
```
     C   D   E
0    90  88  95  ← Columns A, B removed in copy
1    82  85  92
```

**Original `df_new` (Unchanged)**
```
     A      B     C   D   E
0    ANISH  0.85  90  88  95  ← Still has A, B!
1    Bob    0.28  82  85  92
```

---

## ✅ Permanent Deletion (Two Methods)

### Method 1: Reassignment
```python
df_new = df_new.drop('E', axis=1)
```

**Before**
```
     A      B     C   D   E
0    ANISH  0.85  90  88  95
1    Bob    0.28  82  85  92
```

**After Reassignment**
```
     A      B     C   D
0    ANISH  0.85  90  88  ← Column E permanently removed
1    Bob    0.28  82  85
```

---

### Method 2: Using `inplace=True`
```python
df_new.drop('E', axis=1, inplace=True)
```

**Before**
```
     A      B     C   D   E
0    ANISH  0.85  90  88  95
1    Bob    0.28  82  85  92
```

**After `inplace=True`**
```
     A      B     C   D
0    ANISH  0.85  90  88  ← Column E permanently removed
1    Bob    0.28  82  85
```

---

## 🔑 Key Takeaways

| Operation | Effect on Original | How to Make Permanent |
|-----------|-------------------|----------------------|
| `df.drop()` | No change | `df = df.drop()` or `inplace=True` |
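| `df.loc[]` selection | No change (selecting never modifies) | Assign through the indexer: `df.loc[r, c] = value` |
| `df.iloc[]` selection | No change (selecting never modifies) | Assign through the indexer: `df.iloc[i, j] = value` |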

---

💡 **Best Practice:** Use `inplace=True` sparingly - reassignment (`df = df.drop()`) is more explicit and easier to debug.
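
A short sketch of the "returns a new DataFrame" behaviour and of the reassignment pattern (frame contents and variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({"A": ["ANISH", "Bob"], "D": [88, 85], "E": [95, 92]})

returned = df.drop("E", axis=1)   # new DataFrame without column E
still_has_e = "E" in df.columns   # df itself was not modified

df = df.drop("E", axis=1)         # reassignment makes the change stick

print(still_has_e)            # True
print(df.columns.tolist())    # ['A', 'D']
```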

# 🔄 Index Management After Deletion

## Dropping Multiple Rows with `inplace=True`

### Syntax
```python
df.drop(row_labels, axis=0, inplace=True)
```

### Example Transformation

**Before Deletion**
```
     A      B     C   D
0    ANISH  0.85  90  88
1    Bob    0.28  82  85
2    Carol  0.15  88  91
3    David  0.42  79  84
```

**Deleting Rows 1 and 3 Permanently**
```python
df_new.drop([1, 3], axis=0, inplace=True)
```

**After Deletion**
```
     A      B     C   D
0    ANISH  0.85  90  88  ← Row 0 remains
2    Carol  0.15  88  91  ← Row 2 remains (gap in index!)
```

**Issue:** Index now has gaps `[0, 2]` instead of continuous `[0, 1]`.

---

## 🔢 Resetting Index (`.reset_index()`)

**Purpose:** Recreate a continuous integer index starting from 0 after deletions.

---

### Method 1: Default Reset (Creates Extra Column)

**Syntax**
```python
df.reset_index()
```

**Before Reset**
```
     A      B     C   D
0    ANISH  0.85  90  88
2    Carol  0.15  88  91  ← Gap in index
```

**After `reset_index()`**
```
   index  A      B     C   D
0      0  ANISH  0.85  90  88
1      2  Carol  0.15  88  91  ← New 'index' column added!
```

**Problem:** Old index preserved as a new column named `index`.

---

### Method 2: Clean Reset with `drop=True` ✅

**Syntax**
```python
df.reset_index(drop=True)
```

**Before Reset**
```
     A      B     C   D
0    ANISH  0.85  90  88
2    Carol  0.15  88  91  ← Gap in index
```

**After `reset_index(drop=True)`**
```
     A      B     C   D
0    ANISH  0.85  90  88  ← New index 0
1    Carol  0.15  88  91  ← New index 1 (continuous!)
```

**Result:** Clean, continuous index `[0, 1]` without extra columns.

---

## 🔑 Parameter Comparison

| Method | Creates Index Column? | Result | Use Case |
|--------|----------------------|--------|----------|
| `reset_index()` | ✅ Yes | Preserves old index as column | When old index is meaningful |
| `reset_index(drop=True)` | ❌ No | Discards old index | Clean sequential numbering |

---

### Making Reset Permanent

**Option 1: Reassignment**
```python
df_new = df_new.reset_index(drop=True)
```

**Option 2: Using `inplace=True`**
```python
df_new.reset_index(drop=True, inplace=True)
```

**Before**
```
     A      B
0    ANISH  85
5    Carol  88
```

**After (with `inplace=True`)**
```
     A      B
0    ANISH  85  ← Reset permanently
1    Carol  88
```

---

## 📊 Complete Workflow Example

**Step 1: Original DataFrame**
```
     Name   Score
0    Alice    85
1    Bob      78
2    Carol    92
3    David    80
4    Emma     88
```

**Step 2: Delete Rows**
```python
df.drop([1, 3], axis=0, inplace=True)
```
```
     Name   Score
0    Alice    85
2    Carol    92  ← Index gap
4    Emma     88  ← Index gap
```

**Step 3: Reset Index**
```python
df.reset_index(drop=True, inplace=True)
```
```
     Name   Score
0    Alice    85  ← Continuous
1    Carol    92  ← Continuous
2    Emma     88  ← Continuous
```

---

💡 **Best Practice:** Always use `drop=True` when resetting index after deletions unless you specifically need to preserve the old index values.
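
The workflow above as a runnable sketch. It uses reassignment instead of `inplace=True` so each intermediate result stays available for inspection; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Carol", "David", "Emma"], "Score": [85, 78, 92, 80, 88]}
)

trimmed = df.drop([1, 3], axis=0)          # index is now 0, 2, 4
with_old = trimmed.reset_index()           # old labels kept in an "index" column
clean = trimmed.reset_index(drop=True)     # old labels discarded

print(trimmed.index.tolist())     # [0, 2, 4]
print(with_old.columns.tolist())  # ['index', 'Name', 'Score']
print(clean.index.tolist())       # [0, 1, 2]
```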

# 📚 Essential Pandas Functions - Learning Path

## Dataset Overview
Working with student performance data containing scores, attendance, and demographic information with potential missing values and duplicates.

---

## 1️⃣ Inspecting DataFrame Structure

### `df.shape` - Dimensions

**Purpose:** Returns the number of rows and columns as a tuple.

**Syntax**
```python
df.shape
```

**Example**
```python
df.shape   # e.g. (100, 5): 100 rows, 5 columns
```

---

### `df.info()` - Complete Overview

**Purpose:** Displays column names, data types, non-null counts, and memory usage.

**Syntax**
```python
df.info()
```

**Output Example**
```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   name        98 non-null     object
 1   math        95 non-null     float64
 2   science     97 non-null     float64
 3   attendance  100 non-null    int64  
 4   grade       90 non-null     object
dtypes: float64(2), int64(1), object(2)
memory usage: 4.0+ KB
```

---

### `df.describe()` - Statistical Summary

**Purpose:** Generates descriptive statistics for numeric columns (count, mean, std, min, max, quartiles).

**Syntax**
```python
df.describe()
```

**Output Example**
```
          math    science  attendance
count    95.00      97.00      100.00
mean     78.50      82.30       85.20
std      12.40      10.80        8.50
min      45.00      50.00       60.00
25%      70.00      75.00       80.00
50%      80.00      83.00       87.00
75%      88.00      90.00       92.00
max      98.00      99.00       98.00
```
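
The three inspection tools can be tried together on a tiny hypothetical frame (three rows instead of the 100 in the sample output):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Carol"],
        "math": [85.0, np.nan, 88.0],
        "attendance": [95, 92, 90],
    }
)

print(df.shape)   # (3, 3)
df.info()         # prints dtypes and non-null counts per column

summary = df.describe()   # numeric columns only by default
print(summary.loc["count", "math"])   # 2.0: the NaN is not counted
```

Comparing `count` in `describe()` with the row count from `shape` is a quick way to spot columns with missing values.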

---

## 2️⃣ Detecting Missing Values

### `df.isnull()` - Boolean Matrix

**Purpose:** Returns a DataFrame of the same shape with `True` for missing values and `False` otherwise.

**Syntax**
```python
df.isnull()
```

**Example Transformation**

**Original DataFrame**
```
     name   math  science
0    Alice  85.0     90.0
1    Bob    NaN     82.0
2    Carol  88.0     NaN
```

**After `isnull()`**
```
     name   math   science
0    False  False  False
1    False  True   False  ← math is missing
2    False  False  True   ← science is missing
```

---

### `df.isnull().sum()` - Count Missing Values

**Purpose:** Counts total missing values per column.

**Syntax**
```python
df.isnull().sum()
```

**Output Example**
```
name        2
math        5
science     3
attendance  0
grade      10
dtype: int64
```

---

### `df.notnull()` - Inverse of `isnull()`

**Purpose:** Returns `True` for non-missing values, `False` for NaN.

**Syntax**
```python
df.notnull()
```

**Example**
```
     name   math   science
0    True   True   True
1    True   False  True   ← math is missing
2    True   True   False  ← science is missing
```
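
A sketch tying the three detection calls together, using the same values as the example table (variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Carol"],
        "math": [85.0, np.nan, 88.0],
        "science": [90.0, 82.0, np.nan],
    }
)

missing_mask = df.isnull()             # True where a value is missing
missing_per_column = df.isnull().sum() # True counts as 1 when summed
present_mask = df.notnull()            # exact inverse of isnull()

print(missing_per_column.to_dict())    # {'name': 0, 'math': 1, 'science': 1}
```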

---

## 3️⃣ Creating Missing Values

### Assigning `None` to Column

**Purpose:** Manually introduce missing values for testing or data manipulation.

**Syntax**
```python
df['column'] = None
```

**Example Transformation**

**Before**
```
     name   math  science
0    Alice  85    90
1    Bob    78    82
```

**Setting Entire Column to None**
```python
df['science'] = None
```

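**After**
```
     name   math  science
0    Alice  85    None
1    Bob    78    None
```

**Note:** pandas stores the `None` values in an `object` column and treats them as missing, so `isnull()` returns `True` for them. Assign `np.nan` instead if you want a float column of `NaN`.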

---

## 4️⃣ Filtering with `isnull()` Using `.loc[]`

### Why Use `.loc[]` with Conditions?

**Purpose:** Filters rows with a boolean mask and, when needed, selects columns in the same call.

**Syntax**
```python
df.loc[df['column'].isnull()]
```

**Example Transformation**

**Original DataFrame**
```
     name   math  science
0    Alice  85.0  90.0
1    Bob    NaN   82.0
2    Carol  88.0  NaN
3    David  92.0  95.0
```

**Filter Rows Where Math is Missing**
```python
df.loc[df['math'].isnull()]
```

**Result**
```
     name   math  science
1    Bob    NaN   82.0    ← Only rows with missing math
```

---

### Alternative: Separate Mask Variable

**Separate Operations**
```python
# Step 1: Create boolean mask
mask = df['math'].isnull()

# Step 2: Filter DataFrame
filtered = df[mask]
```

**When `.loc[]` Helps:** ✅
- `df[mask]` and `df.loc[mask]` return the same rows; neither is meaningfully faster
- `.loc[]` can filter rows and select columns in one call: `df.loc[condition, ['name', 'math']]`
- `.loc[]` is the reliable way to assign to the filtered rows: `df.loc[condition, 'math'] = value`
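
A short sketch comparing plain bracket filtering with `.loc[]` on the example data (variable names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Carol", "David"],
        "math": [85.0, np.nan, 88.0, 92.0],
        "science": [90.0, 82.0, np.nan, 95.0],
    }
)

mask = df["math"].isnull()

via_brackets = df[mask]                   # rows only
via_loc = df.loc[mask]                    # same rows
names_only = df.loc[mask, ["name"]]       # rows and columns in one call

print(via_brackets.equals(via_loc))       # True
print(names_only["name"].tolist())        # ['Bob']
```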

---

## 5️⃣ Removing Missing Values

### `df.dropna()` - Drop Rows with Missing Values

**Purpose:** Removes rows (or columns, with `axis=1`) that contain missing values; the `how` parameter controls whether any or all values must be missing.

---

### Basic Usage (Drop Any NaN)

**Syntax**
```python
df.dropna()
```

**Before**
```
     name   math  science
0    Alice  85.0  90.0
1    Bob    NaN   82.0
2    Carol  88.0  NaN
3    David  92.0  95.0
```

**After `dropna()`**
```
     name   math  science
0    Alice  85.0  90.0    ← Complete rows kept
3    David  92.0  95.0
```

**Effect:** Rows 1 and 2 removed (contained NaN).

---

### `how='all'` - Drop Only Fully Empty Rows

**Syntax**
```python
df.dropna(how='all')
```

**Before**
```
     name   math  science
0    Alice  85.0  90.0
1    NaN    NaN   NaN     ← All values missing
2    Carol  NaN   88.0
```

**After `dropna(how='all')`**
```
     name   math  science
0    Alice  85.0  90.0
2    Carol  NaN   88.0    ← Kept (has some values)
```

**Effect:** Only row 1 dropped (completely empty).

---

### `axis=1` - Drop Columns with Missing Values

**Syntax**
```python
df.dropna(axis=1)
```

**Before**
```
     name   math  science  attendance
0    Alice  85.0  NaN      95
1    Bob    78.0  NaN      92
2    Carol  88.0  NaN      90
```

**After `dropna(axis=1)`**
```
     name   math  attendance
0    Alice  85.0  95          ← science column removed
1    Bob    78.0  92
2    Carol  88.0  90
```

**Effect:** Science column dropped (contained NaN).

---

### `axis=1, how='all'` - Drop Only Fully Empty Columns

**Syntax**
```python
df.dropna(axis=1, how='all')
```

**Before**
```
     name   math  empty_col  science
0    Alice  85.0  NaN        90.0
1    Bob    78.0  NaN        NaN
2    Carol  88.0  NaN        88.0
```

**After `dropna(axis=1, how='all')`**
```
     name   math  science
0    Alice  85.0  90.0      ← empty_col removed
1    Bob    78.0  NaN
2    Carol  88.0  88.0
```

**Effect:** Only `empty_col` dropped (100% NaN).
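
To see how `how` changes the result for columns, the sketch below runs both variants on the same data as the table above (rebuilt by hand here).

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "math": [85.0, 78.0, 88.0],
    "empty_col": [np.nan, np.nan, np.nan],
    "science": [90.0, np.nan, 88.0],
})

# how='any' (default): a column goes as soon as it holds one NaN
any_nan = df.dropna(axis=1)

# how='all': a column goes only if every value in it is NaN
all_nan = df.dropna(axis=1, how="all")

print(list(any_nan.columns))
print(list(all_nan.columns))
```

With the default, `science` is lost along with `empty_col` because of Bob's single missing score, which is often more aggressive than intended.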

---

### `subset` - Drop Based on Specific Columns

**Syntax**
```python
df.dropna(subset=['column1', 'column2'])
```

**Before**
```
     name   math  science  grade
0    Alice  85.0  90.0     A
1    Bob    NaN   82.0     B
2    Carol  88.0  NaN      NaN
```

**Drop Rows with Missing Math or Science**
```python
df.dropna(subset=['math', 'science'])
```

**After**
```
     name   math  science  grade
0    Alice  85.0  90.0     A     ← Only complete math & science
```

**Effect:** Rows 1 and 2 dropped (missing math or science), grade NaN ignored.
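
A runnable sketch of the `subset` behavior, using the same three rows as above. The second call, which checks only `grade`, is an added illustration to show that the choice of columns changes which rows survive.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "math": [85.0, np.nan, 88.0],
    "science": [90.0, 82.0, np.nan],
    "grade": ["A", "B", np.nan],
})

# Only math and science are checked for NaN; grade is ignored
by_scores = df.dropna(subset=["math", "science"])

# Only grade is checked; Bob's missing math score no longer matters
by_grade = df.dropna(subset=["grade"])

print(by_scores)
print(by_grade)
```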

---

## 6️⃣ Removing Duplicate Rows

### `df.drop_duplicates()` - Remove Duplicates

**Purpose:** Eliminates duplicate rows based on all or specific columns.

---

### Basic Usage (All Columns)

**Syntax**
```python
df.drop_duplicates()
```

**Before**
```
     name   score
0    Alice  85
1    Bob    78
2    Alice  85    ← Exact duplicate
3    Carol  92
```

**After `drop_duplicates()`**
```
     name   score
0    Alice  85    ← First occurrence kept
1    Bob    78
3    Carol  92
```

---

### `subset` - Check Specific Columns Only

**Syntax**
```python
df.drop_duplicates(subset=['column'])
```

**Before**
```
     name   math  science
0    Alice  85    90
1    Bob    78    82
2    Alice  92    88    ← Duplicate name, different scores
3    Carol  88    91
```

**Drop Duplicates Based on Name Only**
```python
df.drop_duplicates(subset=['name'])
```

**After**
```
     name   math  science
0    Alice  85    90      ← First Alice kept
1    Bob    78    82
3    Carol  88    91
```

**Effect:** Row 2 dropped (duplicate name), even though scores differ.
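
The difference between checking all columns and checking a subset can be sketched on the same data (rebuilt by hand here):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Alice", "Carol"],
    "math": [85, 78, 92, 88],
    "science": [90, 82, 88, 91],
})

# All columns compared: the two Alice rows differ in their scores,
# so nothing counts as a duplicate
all_cols = df.drop_duplicates()

# Only 'name' compared: the second Alice row is treated as a duplicate
by_name = df.drop_duplicates(subset=["name"])

print(len(all_cols))
print(by_name)
```

Because the dropped row carries different scores, it is worth confirming that the first occurrence really is the one to keep before deduplicating on a subset.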

---

### `keep` Parameter - Control Which Duplicate to Keep

**Syntax**
```python
df.drop_duplicates(keep='first')   # Default
df.drop_duplicates(keep='last')    # Keep last occurrence
df.drop_duplicates(keep=False)     # Remove all duplicates
```

---

#### `keep='first'` (Default)

**Before**
```
     name   score
0    Alice  85
1    Bob    78
2    Alice  85    ← Duplicate
3    Alice  85    ← Duplicate
```

**After `keep='first'`**
```
     name   score
0    Alice  85    ← First kept
1    Bob    78
```

---

#### `keep='last'`

**Before**
```
     name   score
0    Alice  85    ← Duplicate
1    Bob    78
2    Alice  85    ← Duplicate
3    Alice  85    ← Last occurrence
```

**After `keep='last'`**
```
     name   score
1    Bob    78
3    Alice  85    ← Last kept
```

---

#### `keep=False` - Remove All Duplicates

**Before**
```
     name   score
0    Alice  85    ← Duplicate
1    Bob    78    ← Unique
2    Alice  85    ← Duplicate
3    Carol  92    ← Unique
```

**After `keep=False`**
```
     name   score
1    Bob    78    ← Only unique rows kept
3    Carol  92
```

**Effect:** All duplicate rows removed entirely (both occurrences).
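
The three `keep` options can be compared side by side. This sketch reuses the data from the `keep=False` table and prints which index labels survive in each case.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Alice", "Carol"],
    "score": [85, 78, 85, 92],
})

first = df.drop_duplicates(keep="first")   # keeps row 0, drops row 2
last = df.drop_duplicates(keep="last")     # keeps row 2, drops row 0
neither = df.drop_duplicates(keep=False)   # drops both row 0 and row 2

for label, frame in [("first", first), ("last", last), ("False", neither)]:
    print(label, frame.index.tolist())
```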

---

## 7️⃣ Basic Statistical Functions

### Aggregation Functions

**Purpose:** Calculate summary statistics for numeric columns.

**Syntax**
```python
df['column'].min()
df['column'].max()
df['column'].mean()
df['column'].median()
df['column'].sum()
df['column'].std()
```

**Example DataFrame**
```
     name   math
0    Alice  85
1    Bob    78
2    Carol  92
3    David  88
```

**Calculations**
```python
df['math'].min()     # 78
df['math'].max()     # 92
df['math'].mean()    # 85.75
df['math'].median()  # 86.5
df['math'].sum()     # 343
df['math'].std()     # ≈ 5.91 (sample standard deviation, ddof=1)
```
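
The calculations above can be checked with a short script. It also shows two details that are easy to miss: `std()` in pandas uses the sample formula (`ddof=1`) by default, and `agg()` can collect several statistics in one call. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "David"],
    "math": [85, 78, 92, 88],
})
scores = df["math"]

print(scores.min(), scores.max())    # smallest and largest score
print(scores.mean())                 # arithmetic mean
print(scores.median())               # middle value (average of 85 and 88)
print(scores.sum())                  # total

sample_std = scores.std()            # divides by n - 1 (pandas default)
population_std = scores.std(ddof=0)  # divides by n
print(round(sample_std, 2), round(population_std, 2))

# Several statistics at once, returned as a Series
summary = scores.agg(["min", "max", "mean", "median", "sum", "std"])
print(summary)
```

The `ddof` choice matters when comparing against tools that default to the population formula, such as `numpy.std`.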

---

### `df['column'].unique()` - Unique Values

**Purpose:** Returns array of unique values in a column.

**Syntax**
```python
df['column'].unique()
```

**Example**
```
     grade
0    A
1    B
2    A
3    C
4    B
```

**Output**
```python
df['grade'].unique()
# Returns: array(['A', 'B', 'C'])
```

---

### `df['column'].value_counts()` - Frequency Distribution

**Purpose:** Counts occurrences of each unique value, sorted by frequency.

**Syntax**
```python
df['column'].value_counts()
df['column'].value_counts(dropna=False)  # Include NaN in count
```

**Example**
```
     grade
0    A
1    B
2    A
3    NaN
4    B
5    A
```

**Without NaN Count**
```python
df['grade'].value_counts()
```
```
A    3
B    2
Name: grade, dtype: int64
```

**With NaN Count (`dropna=False`)**
```python
df['grade'].value_counts(dropna=False)
```
```
A      3
B      2
NaN    1
Name: grade, dtype: int64
```
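
A runnable sketch of the counts above. The `normalize=True` call is an optional extra that returns shares instead of counts. The header and footer lines of the printed Series differ between pandas versions (pandas 2.x labels the result `count`), so no printed output is shown here.

```python
import pandas as pd
import numpy as np

grades = pd.Series(["A", "B", "A", np.nan, "B", "A"], name="grade")

counts = grades.value_counts()                      # NaN excluded
counts_with_nan = grades.value_counts(dropna=False) # NaN counted as its own entry
shares = grades.value_counts(normalize=True)        # fractions of non-missing values

print(counts)
print(counts_with_nan)
print(shares)
```

Comparing `counts.sum()` with `len(grades)` is a quick way to spot how many values are missing.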

---

## 8️⃣ Using `inplace` Parameter

**Purpose:** Modify the original DataFrame directly instead of returning a modified copy. With `inplace=True`, the method returns `None`.

**Syntax**
```python
df.dropna(inplace=True)
df.drop_duplicates(inplace=True)
df.reset_index(drop=True, inplace=True)
```

**Without `inplace`**
```python
df_clean = df.dropna()  # Original df unchanged
```

**With `inplace`**
```python
df.dropna(inplace=True)  # Original df modified directly
```
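
The sketch below contrasts the two styles on a small made-up DataFrame and shows what each call returns.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "math": [85.0, np.nan, 88.0],
})

# Without inplace: a new DataFrame comes back and df keeps all 3 rows
df_clean = df.dropna()
rows_before = len(df)

# With inplace: df itself is changed and the call returns None
result = df.dropna(inplace=True)
rows_after = len(df)

print(rows_before, rows_after, result)

# Pitfall: writing `df = df.dropna(inplace=True)` would rebind df to None
```

Assigning the result (`df = df.dropna()`) achieves the same effect without `inplace` and makes chaining further methods possible.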

---

## 📋 Learning Sequence Summary

1. **Inspect** → `shape`, `info()`, `describe()`
2. **Detect Missing** → `isnull()`, `notnull()`, `.sum()`
3. **Filter Missing** → `df.loc[df['col'].isnull()]`
4. **Remove Missing** → `dropna()`, `how`, `axis`, `subset`
5. **Remove Duplicates** → `drop_duplicates()`, `subset`, `keep`
6. **Basic Stats** → `min()`, `max()`, `mean()`, `median()`
7. **Unique Values** → `unique()`, `value_counts()`
8. **Permanent Changes** → `inplace=True`

---

💡 **Best Practice:** Always inspect data with `info()` and `isnull().sum()` before cleaning to understand missing patterns.

# 📊 Working with Excel Files in Pandas

## Required Libraries Overview

### `xlrd` - Excel Reader (Legacy)

**Purpose:** Reads older Excel files (`.xls` format, Excel 2003 and earlier).

**Installation**
```bash
pip install xlrd
```

**Key Points:**
- Supports `.xls` files only
- Read-only operations
- Required for `pd.read_excel()` with old Excel formats
- Legacy library: since xlrd 2.0 it no longer reads `.xlsx` files

---

### `openpyxl` - Modern Excel Handler

**Purpose:** Reads and writes modern Excel files (`.xlsx` format, Excel 2007+).

**Installation**
```bash
pip install openpyxl
```

**Key Points:**
- Supports `.xlsx` files (most common format)
- Both read and write operations
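- Can read and write cell formatting and formulas when used directly, though not every workbook feature survives a load-and-save round trip
- When pandas replaces a sheet, that sheet is rewritten from the DataFrame, so its previous formatting and formulas are not carried over
- Widely used for Excel operations in Python
- Default pandas engine for reading `.xlsx`; `df.to_excel()` can use it or XlsxWriter, and append mode (`mode='a'`) requires it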

---

## 📖 Reading Excel Files

### Basic Read (Default Sheet)

**Syntax**
```python
df = pd.read_excel('filename.xlsx')
```

**Example**
```python
data = pd.read_excel('students.xlsx')
```

**Effect:** Reads the **first sheet** by default.

**Excel File Structure**
```
students.xlsx
├── Sheet1 (default)
│   ├── name   | math | science
│   ├── Alice  | 85   | 90
│   └── Bob    | 78   | 82
└── Sheet2
    └── ...
```

**Resulting DataFrame**
```
     name   math  science
0    Alice  85    90
1    Bob    78    82
```

---

### Reading Specific Sheet

**Syntax**
```python
df = pd.read_excel('filename.xlsx', sheet_name='SheetName')
```

**Example**
```python
data = pd.read_excel('students.xlsx', sheet_name='Sheet2')
```

**Effect:** Reads only the specified sheet (`Sheet2`).

---

### Reading Multiple Sheets

**Syntax**
```python
all_sheets = pd.read_excel('filename.xlsx', sheet_name=None)
```

**Example**
```python
all_sheets = pd.read_excel('students.xlsx', sheet_name=None)
```

**Result:** Returns a **dictionary** that maps each sheet name to its DataFrame.
```python
# Shape of the result:
# {
#     'Sheet1': DataFrame1,
#     'Sheet2': DataFrame2
# }

# Access individual sheets
df1 = all_sheets['Sheet1']
df2 = all_sheets['Sheet2']
```
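
Once all sheets are in a dictionary, they can be looped over or stacked into one DataFrame. The sketch below uses a hand-built dictionary as a stand-in for the `read_excel(..., sheet_name=None)` result, so it runs without a file on disk. The sheet contents are hypothetical.

```python
import pandas as pd

# Stand-in for: all_sheets = pd.read_excel("students.xlsx", sheet_name=None)
# (hypothetical sheet contents)
all_sheets = {
    "Sheet1": pd.DataFrame({"name": ["Alice", "Bob"], "math": [85, 78]}),
    "Sheet2": pd.DataFrame({"name": ["Carol"], "math": [92]}),
}

# Loop over sheet names and DataFrames like any other dictionary
for sheet_name, sheet_df in all_sheets.items():
    print(sheet_name, sheet_df.shape)

# Stack every sheet into one DataFrame, recording the source sheet in a column
combined = (
    pd.concat(all_sheets, names=["sheet", "row"])
    .reset_index(level="sheet")
    .reset_index(drop=True)
)
print(combined)
```

Stacking like this assumes the sheets share the same columns; sheets with different layouts are better kept as separate DataFrames.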

---

## ✏️ Writing Excel Files

### ⚠️ Basic Write (Overwrites Entire File!)

**Syntax**
```python
df.to_excel('filename.xlsx')
```

**Example - Problem Scenario**

**Original Excel File (`data.xlsx`)**
```
data.xlsx
├── Sheet1: [Student data with 100 rows]
└── Sheet2: [Grade data with 50 rows]
```

**Modifying and Saving**
```python
data = pd.read_excel('data.xlsx', sheet_name='Sheet2')
data.iloc[0, 0] = 54
data.to_excel('data.xlsx')
```

**After Save - File Structure**
```
data.xlsx
└── Sheet1: [Only modified Sheet2 data]  ← Original Sheet1 DELETED!
```

**Problem:** `to_excel()` **replaces the entire file** - all other sheets are lost!

---

## 🔧 Preserving Multiple Sheets (Correct Method)

### Using `ExcelWriter` with `mode='a'`

**Purpose:** Append or update specific sheets without deleting existing ones.

**Syntax**
```python
with pd.ExcelWriter('filename.xlsx', mode='a', engine='openpyxl', if_sheet_exists='replace') as writer:
    df.to_excel(writer, sheet_name='SheetName', index=False)
```

---

### Example: Update One Sheet, Keep Others

**Original File Structure**
```
data.xlsx
├── Sheet1: [Student names and IDs]
├── Sheet2: [Math scores]
└── Sheet3: [Science scores]
```

**Reading and Modifying Sheet2**
```python
import pandas as pd

# Read specific sheet
data = pd.read_excel('data.xlsx', sheet_name='Sheet2')

# Modify data
data.iloc[0, 0] = 54
```

**Before Modification (Sheet2)**
```
     student_id  math_score
0    101         85
1    102         78
2    103         92
```

**After Modification (Sheet2)**
```
     student_id  math_score
0    54          85          ← Changed
1    102         78
2    103         92
```

**Writing Back Without Losing Other Sheets**
```python
with pd.ExcelWriter('data.xlsx', mode='a', engine='openpyxl', if_sheet_exists='replace') as writer:
    data.to_excel(writer, sheet_name='Sheet2', index=False)
```

**After Save - File Structure**
```
data.xlsx
├── Sheet1: [Original student data - PRESERVED]
├── Sheet2: [Updated math scores]
└── Sheet3: [Original science data - PRESERVED]
```

---

### Parameter Explanation

| Parameter | Value | Effect |
|-----------|-------|--------|
| `mode='a'` | Append mode | Opens existing file without overwriting |
| `engine='openpyxl'` | Excel engine | Needed for append mode on `.xlsx` files |
| `if_sheet_exists='replace'` | Replace | Overwrites only the specified sheet |
| `if_sheet_exists='overlay'` | Overlay | Updates cells without clearing sheet |
| `index=False` | No index column | Prevents adding row numbers |
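
The parameters above can be wrapped in a small helper so the append-mode details live in one place. This is a sketch, not a pandas API: `replace_sheet` and `list_sheets` are hypothetical helper names, and the code assumes `openpyxl` is installed and pandas is version 1.3 or newer (for `if_sheet_exists`).

```python
import pandas as pd


def replace_sheet(path, sheet_name, df):
    """Write df to one sheet of an existing .xlsx workbook, keeping the other sheets."""
    with pd.ExcelWriter(
        path,
        mode="a",                   # open the existing workbook instead of recreating it
        engine="openpyxl",          # append mode relies on the openpyxl engine
        if_sheet_exists="replace",  # overwrite only the named sheet
    ) as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)


def list_sheets(path):
    """Return the sheet names in a workbook, to check that nothing was lost."""
    with pd.ExcelFile(path) as workbook:
        return workbook.sheet_names
```

A call such as `replace_sheet('data.xlsx', 'Sheet2', data)` followed by `list_sheets('data.xlsx')` gives a quick confirmation that the other sheets are still present.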

---

## 📝 Writing Multiple Sheets at Once

**Purpose:** Create a new Excel file with multiple sheets.

**Syntax**
```python
with pd.ExcelWriter('filename.xlsx', engine='openpyxl') as writer:
    df1.to_excel(writer, sheet_name='Sheet1', index=False)
    df2.to_excel(writer, sheet_name='Sheet2', index=False)
    df3.to_excel(writer, sheet_name='Sheet3', index=False)
```

**Example**
```python
# Three separate DataFrames
students = pd.DataFrame({'name': ['Alice', 'Bob'], 'id': [101, 102]})
math = pd.DataFrame({'id': [101, 102], 'score': [85, 78]})
science = pd.DataFrame({'id': [101, 102], 'score': [90, 82]})

# Write all to one file
with pd.ExcelWriter('grades.xlsx', engine='openpyxl') as writer:
    students.to_excel(writer, sheet_name='Students', index=False)
    math.to_excel(writer, sheet_name='Math', index=False)
    science.to_excel(writer, sheet_name='Science', index=False)
```

**Resulting File Structure**
```
grades.xlsx
├── Students: [name, id]
├── Math: [id, score]
└── Science: [id, score]
```

---

## 🔑 Common Parameters for `to_excel()`

| Parameter | Default | Purpose | Example |
|-----------|---------|---------|---------|
| `sheet_name` | 'Sheet1' | Name of the sheet | `sheet_name='Grades'` |
| `index` | True | Include row index | `index=False` (cleaner) |
| `header` | True | Include column names | `header=True` |
| `startrow` | 0 | Starting row position | `startrow=5` (skip rows) |
| `startcol` | 0 | Starting column position | `startcol=2` (offset) |

---

## 🔄 Complete Workflow Example

**Step 1: Read Existing File**
```python
data = pd.read_excel('students.xlsx', sheet_name='Grades')
```

**Original Data**
```
     name   math  science
0    Alice  85    90
1    Bob    78    82
2    Carol  92    88
```

**Step 2: Modify Data**
```python
data.loc[data['name'] == 'Bob', 'math'] = 95
```

**After Modification**
```
     name   math  science
0    Alice  85    90
1    Bob    95    82      ← Updated
2    Carol  92    88
```

**Step 3: Save Without Losing Other Sheets**
```python
with pd.ExcelWriter('students.xlsx', mode='a', engine='openpyxl', if_sheet_exists='replace') as writer:
    data.to_excel(writer, sheet_name='Grades', index=False)
```

**Result:** Only 'Grades' sheet updated, all other sheets preserved.

---

## 📋 Quick Reference

| Task | Command |
|------|---------|
| Read first sheet | `pd.read_excel('file.xlsx')` |
| Read specific sheet | `pd.read_excel('file.xlsx', sheet_name='Sheet2')` |
| Read all sheets | `pd.read_excel('file.xlsx', sheet_name=None)` |
| Write new file | `df.to_excel('file.xlsx', index=False)` |
| Update one sheet | Use `ExcelWriter` with `mode='a'` |
| Write multiple sheets | Use `ExcelWriter` context manager |

---

💡 **Critical Warning:** Never use `df.to_excel()` directly on existing multi-sheet files - always use `ExcelWriter` with `mode='a'` to preserve other sheets!