# Python DataFrames (pandas)

This notebook introduces the basics of the pandas `DataFrame` and some of its more advanced features. DataFrames are central to data manipulation in Python.

## Table of Contents
1. [Basic Concepts](#basic)
2. [Advanced Concepts](#advanced)
3. [Exercises](#exercises)
4. [Real-World Applications](#applications)

## 1. Basic Concepts <a name="basic"></a>

### 1.1 Creating a DataFrame

In [None]:
import pandas as pd

# From a dictionary
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"]
}

df = pd.DataFrame(data)
df
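Besides a dictionary of columns, `pd.DataFrame` also accepts row-oriented input. Below is a minimal sketch of two common alternatives. The variable names `records`, `rows`, `df_records`, and `df_rows` are illustrative only.

```python
import pandas as pd

# From a list of dicts: each dict becomes one row; keys become column names
records = [
    {"Name": "Alice", "Age": 25, "City": "New York"},
    {"Name": "Bob", "Age": 30, "City": "Los Angeles"},
    {"Name": "Charlie", "Age": 35},  # missing "City" key -> NaN in that cell
]
df_records = pd.DataFrame(records)

# From a list of lists: column names (and optionally an index) are given explicitly
rows = [["Alice", 25], ["Bob", 30]]
df_rows = pd.DataFrame(rows, columns=["Name", "Age"], index=["a", "b"])

print(df_records)
print(df_rows)
```

A key that is absent from one record does not raise an error. pandas fills that cell with `NaN`.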

### 1.2 Reading Data from Files
pandas is commonly used to read data from CSV, Excel, or JSON files with functions such as `read_csv()`, `read_excel()`, and `read_json()`.

In [None]:
# Example (uncomment if you have a file)
# df_csv = pd.read_csv('data.csv')
# df_excel = pd.read_excel('data.xlsx')  # .xlsx files need an engine such as openpyxl installed
# df_json = pd.read_json('data.json')
pass
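The cell above needs real files. As a runnable sketch, you can instead pass `read_csv()` an in-memory CSV through `io.StringIO`. The CSV contents and the `Joined` column are made up for illustration. The empty `Age` field is read as `NaN`, and `parse_dates` converts `Joined` to a datetime column.

```python
import io
import pandas as pd

# An in-memory CSV stands in for a real file (hypothetical contents)
csv_text = """Name,Age,City,Joined
Alice,25,New York,2021-03-01
Bob,,Los Angeles,2022-07-15
"""

# read_csv accepts any file-like object, not just a path
df_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=["Joined"])

print(df_csv.dtypes)  # Age becomes float64 because of the missing value
print(df_csv)
```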

### 1.3 Basic Inspection
Methods for quickly assessing your DataFrame’s shape and contents.

In [None]:
print(df.head())     # First 5 rows by default
print(df.tail())     # Last 5 rows by default
df.info()            # Column dtypes and non-null counts (prints directly and returns None)
print(df.describe()) # Summary statistics for numeric columns
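A few more inspection tools complement the ones above. `shape` and `dtypes` are attributes rather than methods. `value_counts()` tallies the distinct values in a column. This sketch recreates the same `df` so that the cell runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

print(df.shape)                   # (rows, columns)
print(df.dtypes)                  # one dtype per column
print(df.columns.tolist())        # column names as a plain list
print(df["Age"].mean())           # mean of a single column
print(df["City"].value_counts())  # count of each distinct value
```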

## 2. Advanced Concepts <a name="advanced"></a>

### 2.1 Indexing and Selection
pandas provides two main indexers: `.loc` selects by label and `.iloc` selects by integer position.

In [None]:
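# Label-based indexing: row label 0 (default RangeIndex), column "Name"
print("Label-based indexing:")
print(df.loc[0, "Name"])   # Alice
print()

# Integer-based indexing: row position 1, column position 2
print("Integer-based indexing:")
print(df.iloc[1, 2])       # Los Angeles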
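Slices behave differently in the two indexers. `.loc` includes the end label, while `.iloc` follows Python's usual half-open convention and excludes the end position. Boolean masks are another common way to select rows. The sketch below recreates `df` so that it runs standalone. The `Age > 28` threshold is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# .loc slices by label and INCLUDES the end label: rows 0, 1 and 2
by_label = df.loc[0:2, ["Name", "Age"]]

# .iloc slices by position and EXCLUDES the end position: rows 0 and 1
by_position = df.iloc[0:2, 0:2]

# A boolean mask keeps only the rows where the condition is True
older = df.loc[df["Age"] > 28, "Name"]

print(by_label)
print(by_position)
print(older.tolist())
```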

### 2.2 Merging and Joining
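You can combine DataFrames in several ways. `merge()` combines them on key columns (like a SQL join), `join()` combines them on their index, and `concat()` stacks them along rows or columns.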

In [None]:
data_extra = {
    "Name": ["Alice", "Bob"],
    "Salary": [70000, 80000]
}
df_extra = pd.DataFrame(data_extra)

# how="left" keeps every row of df; Charlie has no match, so his Salary is NaN
merged_df = pd.merge(df, df_extra, on="Name", how="left")
merged_df
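The cell above uses `merge()`. The following sketch applies `concat()` and `join()` to the same data. It also shows `merge(..., indicator=True)`, which helps you check which rows found a match. The extra row for `Dana` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})
df_extra = pd.DataFrame({"Name": ["Alice", "Bob"], "Salary": [70000, 80000]})

# concat stacks DataFrames; ignore_index=True renumbers the rows 0..n-1
df_new = pd.DataFrame({"Name": ["Dana"], "Age": [28], "City": ["Boston"]})
stacked = pd.concat([df, df_new], ignore_index=True)

# join aligns on the index, so make "Name" the index on both sides first
joined = df.set_index("Name").join(df_extra.set_index("Name"), how="left")

# indicator=True adds a "_merge" column: "both", "left_only" or "right_only"
checked = pd.merge(df, df_extra, on="Name", how="left", indicator=True)

print(stacked)
print(joined)
print(checked[["Name", "_merge"]])
```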

### 2.3 GroupBy and Aggregation
`groupby()` splits rows into groups based on one or more key columns. It then applies an aggregate function such as `sum`, `mean`, or `count` to each group.

In [None]:
# Example data
df_sales = pd.DataFrame({
    "Product": ["A", "A", "B", "B", "B"],
    "Sales": [100, 150, 200, 120, 180],
    "Region": ["North", "South", "North", "South", "North"]
})

grouped = df_sales.groupby("Product").agg({"Sales": "sum"})
grouped
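When you need several statistics at once, `agg()` also supports named aggregation. Each keyword gives an output column name and an `(input_column, function)` pair. Here is a short sketch on the same `df_sales` data that also groups by two keys. The output names `total_sales`, `mean_sales`, and `n_rows` are arbitrary.

```python
import pandas as pd

df_sales = pd.DataFrame({
    "Product": ["A", "A", "B", "B", "B"],
    "Sales": [100, 150, 200, 120, 180],
    "Region": ["North", "South", "North", "South", "North"],
})

# Named aggregation: output_column=(input_column, function)
summary = df_sales.groupby("Product").agg(
    total_sales=("Sales", "sum"),
    mean_sales=("Sales", "mean"),
    n_rows=("Sales", "count"),
)
print(summary)

# Grouping by two keys gives a MultiIndex; reset_index turns it back into columns
by_both = df_sales.groupby(["Product", "Region"])["Sales"].sum().reset_index()
print(by_both)
```

Flattening with `reset_index()` is optional, but plain columns are often easier to filter and merge than a MultiIndex.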

### 2.4 Handling Missing Data
Missing data is common in real datasets. pandas provides methods such as `dropna()` to remove missing values and `fillna()` to replace them.

In [None]:
df_missing = pd.DataFrame({
    "Col1": [1, None, 3],
    "Col2": [None, 5, 6]
})
print(df_missing)

df_dropped = df_missing.dropna()
print("\nAfter dropna:\n", df_dropped)

df_filled = df_missing.fillna(0)
print("\nAfter fillna(0):\n", df_filled)
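Before you choose between dropping and filling, it helps to count what is missing. The sketch below shows `isna().sum()`, a per-column `fillna()` using a dict, and `dropna(subset=...)`. Filling `Col2` with its median is just one illustrative choice.

```python
import pandas as pd

df_missing = pd.DataFrame({
    "Col1": [1, None, 3],
    "Col2": [None, 5, 6],
})

# Number of missing values in each column
print(df_missing.isna().sum())

# fillna also accepts a dict mapping column name -> fill value
df_custom = df_missing.fillna({"Col1": 0, "Col2": df_missing["Col2"].median()})
print(df_custom)

# dropna(subset=...) only looks at the listed columns when deciding what to drop
df_subset = df_missing.dropna(subset=["Col1"])
print(df_subset)
```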

## 3. Exercises <a name="exercises"></a>

### Exercise 1: Data Cleaning
1. Create a DataFrame with columns `Name`, `Age`, `City`, and some missing values.
2. Drop rows with missing values.
3. Starting again from the original DataFrame, fill missing values in `Age` with the mean age.


In [None]:
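# Your code here
import numpy as np
import pandas as pd

# 1) Create the DataFrame (starter data). Use np.nan or None for missing values:
#    the strings "NaN" and "" are NOT treated as missing by dropna()/fillna()
df_ex = pd.DataFrame({
    "Name": ["Tom", "Jane", "Steve", np.nan],
    "Age": [25, np.nan, 30, 22],
    "City": ["Boston", np.nan, "Seattle", None]
})

# 2) Drop rows with missing values

# 3) Fill missing Age with the mean age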

### Exercise 2: GroupBy and Aggregation
Using the `df_sales` DataFrame shown earlier (or create your own):
1. Group by `Region`.
2. Calculate the average sales per region.
3. Print the results.

In [None]:
# Your code here


### Exercise 3: Merging DataFrames
1. Create two DataFrames `df1` and `df2` with a common column (e.g., `id`).
2. Perform a left merge on `id`.
3. Perform an inner merge on `id`.

In [None]:
# Your code here


## 4. Real-World Applications <a name="applications"></a>

### ETL (Extract, Transform, Load)
- Data scientists use pandas to extract data from various sources (databases, APIs, files), transform it (cleaning, feature engineering), and load it into analytics tools.
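The following sketch shows a toy end-to-end version of this flow. An in-memory CSV stands in for the source. The transform step drops incomplete rows and normalizes a text column. An in-memory SQLite database (via the standard-library `sqlite3` module) stands in for the destination. The table name `orders` and its columns are hypothetical.

```python
import io
import sqlite3
import pandas as pd

# Extract: an in-memory CSV stands in for a real source (hypothetical data)
raw = io.StringIO("order_id,amount,country\n1,10.5,us\n2,,DE\n3,7.25,us\n")
orders = pd.read_csv(raw)

# Transform: drop rows without an amount, then upper-case the country codes
orders = (
    orders.dropna(subset=["amount"])
          .assign(country=lambda d: d["country"].str.upper())
)

# Load: write the cleaned rows into an in-memory SQLite table
conn = sqlite3.connect(":memory:")
orders.to_sql("orders", conn, index=False)

# Read back an aggregate to confirm the load
loaded = pd.read_sql(
    "SELECT country, SUM(amount) AS total FROM orders GROUP BY country", conn
)
print(loaded)
conn.close()
```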

### Exploratory Data Analysis (EDA)
- pandas is well suited to quick EDA: summarizing datasets, checking distributions, and detecting outliers.
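As an illustration of outlier detection, here is a sketch of the common interquartile range (IQR) rule applied to a made-up series. The 1.5 * IQR cutoff is a convention; pandas does not enforce it.

```python
import pandas as pd

# Hypothetical measurements with one obvious outlier
values = pd.Series([10, 12, 11, 13, 12, 95, 11, 10], name="value")

print(values.describe())

# IQR rule: flag points more than 1.5 * IQR below Q1 or above Q3
q1, q3 = values.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = values[(values < lower) | (values > upper)]
print(outliers)
```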

### Time-Series Analysis
- pandas has dedicated time-series support (`DatetimeIndex`, resampling, rolling windows, time-zone handling), which makes it popular in finance and IoT data processing.
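Here is a minimal sketch of two everyday time-series operations, applied to hypothetical hourly readings:

- `resample()` needs a `DatetimeIndex` and aggregates the data into calendar periods.
- `rolling(window=3)` here counts rows, not time.

The `"60min"` frequency string is used because both older and newer pandas versions accept it.

```python
import pandas as pd

# Hypothetical hourly sensor readings over two days
idx = pd.date_range("2024-01-01", periods=48, freq="60min")
readings = pd.Series(range(48), index=idx, name="reading")

# Resample to daily means (one value per calendar day)
daily = readings.resample("D").mean()
print(daily)

# 3-row moving average; the first two values are NaN (not enough history yet)
smoothed = readings.rolling(window=3).mean()
print(smoothed.head())
```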

These are just a few examples; pandas appears in a wide range of data-related work in Python.