
### **Book 1: 01_pandas_intro.ipynb**
#### **1. Introduction to Pandas and its Ecosystem**
- Importance in data analysis and manipulation  
- Relationship with NumPy, matplotlib, and scikit-learn  
- When and why to use Pandas  

#### **2. Pandas Core Data Structures**
- `Series`: 1D labeled array  
  - Creating Series from lists, dictionaries  
  - Indexing and slicing  
- `DataFrame`: 2D labeled data structure  
  - Creating DataFrames from lists of dicts, dicts of lists, NumPy arrays, etc.  
  - Basic structure: `head()`, `tail()`, `shape`, `columns`, `index`  
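
As a minimal sketch of the constructors listed above (the names, ages, and column labels are made-up illustration data, not from any course dataset):

```python
import numpy as np
import pandas as pd

# Series from a list (default integer index) and from a dict (keys become labels)
s_list = pd.Series([10, 20, 30])
s_dict = pd.Series({"a": 1, "b": 2, "c": 3})

# The same DataFrame built from a dict of lists and from a list of dicts
df_cols = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 27]})
df_rows = pd.DataFrame([{"name": "Ann", "age": 31}, {"name": "Bob", "age": 27}])

# DataFrame from a 2D NumPy array; column names must be supplied explicitly
df_arr = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["x", "y"])

# Basic structure inspection
first_row = df_cols.head(1)
last_row = df_cols.tail(1)
n_rows, n_cols = df_arr.shape
```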

#### **3. Data Selection and Indexing**
- Column access: dot notation vs bracket notation  
- Row access: `loc[]` vs `iloc[]`  
- Slicing, selecting with conditions  
- Boolean indexing  
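
The selection styles above could be contrasted roughly as follows; the frame and its `r1`..`r3` row labels are hypothetical:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "age": [31, 27, 45]},
    index=["r1", "r2", "r3"],
)

# Column access: bracket notation always works; dot notation only works when
# the column name is a valid identifier that does not clash with an attribute
ages = df["age"]
same_ages = df.age

# Row access: loc is label-based, iloc is position-based
by_label = df.loc["r2", "age"]
by_position = df.iloc[1, 1]

# Label slices include the end label; positional slices exclude the end
label_slice = df.loc["r1":"r2"]
pos_slice = df.iloc[0:2]

# Boolean indexing; combine conditions with & and |, each wrapped in parentheses
over_30 = df[df["age"] > 30]
over_30_not_cy = df[(df["age"] > 30) & (df["name"] != "Cy")]
```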

#### **4. Data Types and Type Conversion**
- Checking data types (`.dtypes`, `info()`)  
- Converting types: `astype()`  
- Handling categorical data  
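
A short sketch of inspecting and converting types, assuming a toy frame whose `id` column arrived as strings:

```python
import pandas as pd

df = pd.DataFrame({"id": ["1", "2", "3"], "grade": ["low", "high", "low"]})

# Inspect the current types
dtypes_before = df.dtypes

# Convert the string ids to integers and the repeated labels to a categorical
df["id"] = df["id"].astype("int64")
df["grade"] = df["grade"].astype("category")

# Categoricals store each distinct label once plus an integer code per row
categories = list(df["grade"].cat.categories)
codes = df["grade"].cat.codes.tolist()

df.info()  # prints column types, non-null counts, and memory usage
```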

#### **5. Basic Operations on DataFrames**
- Descriptive statistics: `mean()`, `sum()`, `count()`, `std()`  
- Value counts and unique values  
- Sorting: `sort_values()` and `sort_index()`  
- Renaming columns and indices  

#### **6. Working with Text Data**
- String methods with `.str` accessor  
- Splitting, replacing, lower/upper casing  

#### **7. Importing and Exporting Data**
- Reading from CSV, Excel, JSON, SQL  
- Writing to file formats  
- Parameters for handling missing values and column types  
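
A CSV-only sketch of the read and write round trip, using an in-memory buffer instead of a real file; the `"missing"` marker and the column names are assumptions chosen for the example:

```python
import io

import pandas as pd

csv_text = "id,price,city\n1,9.5,Oslo\n2,missing,Rome\n"

# na_values adds extra markers to treat as missing; dtype pins column types
df = pd.read_csv(
    io.StringIO(csv_text),
    na_values=["missing"],
    dtype={"id": "int64", "city": "string"},
)

# Write back out without the index column
buffer = io.StringIO()
df.to_csv(buffer, index=False)
written = buffer.getvalue()
```

The other readers (`read_excel()`, `read_json()`, `read_sql()`) follow the same pattern but need a file, engine, or database connection, so they are left out here.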

#### **8. Practical Examples**
- Quick intro dataset: Titanic, Iris, or Movies  
- Loading, inspecting, basic analysis  

---

### **Book 2: 02_data_cleaning.ipynb**
#### **1. Overview of Data Cleaning in Pandas**
- Importance of cleaning in the data science pipeline  
- Defining "clean" data  

#### **2. Handling Missing Data**
- Detecting null values: `isnull()`, `notnull()` (aliases of `isna()`, `notna()`)  
- Dropping missing data: `dropna()`  
- Filling missing data: `fillna()`  
  - Methods: forward fill (`ffill()`), backward fill (`bfill()`), scalar fill (`fillna(value)`)  
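
A minimal sketch of detecting, dropping, and filling missing values on a toy Series. It uses the dedicated `ffill()` and `bfill()` methods because recent pandas releases deprecate `fillna(method=...)`:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

missing = s.isna()            # isnull() is an alias of isna()
dropped = s.dropna()          # removes the missing entries
forward = s.ffill()           # carry the last valid value forward
backward = s.bfill()          # pull the next valid value backward
zero_filled = s.fillna(0)     # replace with a scalar
```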

#### **3. Duplicates**
- Finding duplicates: `duplicated()`  
- Removing duplicates: `drop_duplicates()`  
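
As a small example with hypothetical `user` and `item` columns, duplicates can be flagged across whole rows or judged on a subset of columns:

```python
import pandas as pd

df = pd.DataFrame({"user": ["a", "b", "a", "a"], "item": [1, 2, 1, 3]})

# True for every row that repeats an earlier row exactly
dup_mask = df.duplicated()

# Drop exact repeats, keeping the first occurrence
deduped = df.drop_duplicates()

# Judge duplicates on one column only: keep the first row per user
first_per_user = df.drop_duplicates(subset="user", keep="first")
```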

#### **4. Dealing with Outliers**
- Identifying outliers with statistical methods (IQR, Z-score)  
- Removing or flagging outliers  
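
One way to sketch both statistical rules on invented data. The 1.5 × IQR fence is the conventional choice; the z-score cutoff of 2 is an assumption picked for this tiny sample, where larger cutoffs such as 3 cannot be reached:

```python
import pandas as pd

values = pd.Series([10, 12, 11, 13, 12, 95])

# IQR rule: flag points beyond 1.5 * IQR from the quartiles
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
is_outlier_iqr = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)

# Z-score rule: flag points far from the mean in standard-deviation units
z = (values - values.mean()) / values.std()
is_outlier_z = z.abs() > 2  # assumed cutoff for this example

# Either flag the rows or remove them
flagged = pd.DataFrame({"value": values, "outlier": is_outlier_iqr})
cleaned = values[~is_outlier_iqr]
```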

#### **5. Data Type Correction**
- Converting types with `pd.to_numeric()`, `pd.to_datetime()`, `astype()`  
- Parsing dates and dealing with different formats  
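
A hedged sketch of coercing messy text columns, assuming unparseable entries should become missing values rather than raise an error (`errors="coerce"`); the column names and the date format are illustrative:

```python
import pandas as pd

raw = pd.DataFrame(
    {
        "amount": ["12.5", "n/a", "7"],
        "when": ["2024-01-05", "2024-02-10", "not a date"],
    }
)

# Unparseable numbers become NaN instead of raising
raw["amount"] = pd.to_numeric(raw["amount"], errors="coerce")

# An explicit format avoids ambiguous parsing; unparseable dates become NaT
raw["when"] = pd.to_datetime(raw["when"], format="%Y-%m-%d", errors="coerce")
```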

#### **6. Data Transformation Techniques**
- Scaling numerical columns (min-max, z-score)  
- Applying functions: `apply()`, `map()`, `applymap()` (deprecated in pandas 2.1 in favor of `DataFrame.map()`)  
- Lambda functions  
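
The scaling formulas and function-application methods above might look like this on a toy frame (the column names and the code-to-label mapping are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"height": [150.0, 160.0, 170.0], "code": ["a", "b", "a"]})
h = df["height"]

# Min-max scaling to [0, 1] and z-score standardization, both vectorized
df["height_minmax"] = (h - h.min()) / (h.max() - h.min())
df["height_z"] = (h - h.mean()) / h.std()

# Series.map with a dict translates values element-wise
df["label"] = df["code"].map({"a": "alpha", "b": "beta"})

# DataFrame.apply with axis=1 passes each row to the lambda
df["summary"] = df.apply(lambda row: f"{row['code']}:{row['height']:.0f}", axis=1)
```

Row-wise `apply()` is convenient but slow on large frames, so the arithmetic form used for the scaling columns is usually preferable when one exists.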

#### **7. String Cleaning**
- Stripping whitespace  
- Replacing patterns with regex (`str.replace(..., regex=True)`; since pandas 2.0 the default is a literal match)  
- Lowercasing, removing punctuation  
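
A possible chained cleanup, treating "punctuation" as anything that is neither a word character nor whitespace (that definition is an assumption for the example):

```python
import pandas as pd

messy = pd.Series(["  Hello, World!  ", "DATA-cleaning?? "])

cleaned = (
    messy.str.strip()                                # trim surrounding whitespace
    .str.lower()                                     # normalize case
    .str.replace(r"[^\w\s]", "", regex=True)         # drop punctuation
)
```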

#### **8. Index Handling**
- Setting and resetting index (`set_index()`, `reset_index()`)  
- Dealing with hierarchical (MultiIndex) data  
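
As a brief illustration with made-up sales data, setting two columns as the index produces a MultiIndex, and `reset_index()` flattens it back:

```python
import pandas as pd

df = pd.DataFrame(
    {"region": ["N", "N", "S"], "year": [2023, 2024, 2023], "sales": [5, 7, 3]}
)

# Two key columns become a hierarchical (MultiIndex) row index
indexed = df.set_index(["region", "year"])

# Select one cell with a tuple of level values
north_2024 = indexed.loc[("N", 2024), "sales"]

# Move the index levels back into ordinary columns
flat = indexed.reset_index()
```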

#### **9. Practical Data Cleaning Workflow**
- Step-by-step cleaning on a messy dataset  
- Using a reproducible pipeline  

---

### **Book 3: 03_groupby_merge.ipynb**
#### **1. Understanding `groupby()`**
- Grouping logic: split-apply-combine  
- Aggregations: `sum()`, `mean()`, `agg()`  
- Custom aggregation functions  
- Multi-level groupings  
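
A compact split-apply-combine sketch on a hypothetical `sales` table:

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "store": ["A", "A", "B", "B"],
        "product": ["x", "y", "x", "x"],
        "amount": [10, 20, 5, 15],
    }
)

# One aggregation per group
totals = sales.groupby("store")["amount"].sum()

# Several aggregations at once
summary = sales.groupby("store")["amount"].agg(["sum", "mean"])

# Custom aggregation: range (max minus min) within each group
spread = sales.groupby("store")["amount"].agg(lambda g: g.max() - g.min())

# Grouping on two keys yields a MultiIndex result
by_store_product = sales.groupby(["store", "product"])["amount"].sum()
```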

#### **2. Transform and Filter**
- Using `transform()` to broadcast group-wise values  
- Filtering groups with `filter()`  
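
To show the difference from aggregation: `transform()` returns a result aligned with the original rows, while `filter()` keeps or drops whole groups. The threshold of 25 below is arbitrary:

```python
import pandas as pd

sales = pd.DataFrame(
    {"store": ["A", "A", "B", "B"], "amount": [10, 20, 5, 15]}
)

# Broadcast each store's total back onto its rows
sales["store_total"] = sales.groupby("store")["amount"].transform("sum")
sales["share"] = sales["amount"] / sales["store_total"]

# Keep only the groups whose total exceeds the threshold
big_stores = sales.groupby("store").filter(lambda g: g["amount"].sum() > 25)
```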

#### **3. Pivot Tables and Crosstabs**
- Creating pivot tables: `pivot_table()`  
- Frequency tables with `crosstab()`  
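
A small sketch using invented store and product labels; `pivot_table()` aggregates values, while `crosstab()` counts co-occurrences:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "store": ["A", "A", "B", "B"],
        "product": ["x", "y", "x", "x"],
        "amount": [10, 20, 5, 15],
    }
)

# Sum of amount per store (rows) and product (columns); empty cells become 0
pivot = pd.pivot_table(
    df, index="store", columns="product", values="amount",
    aggfunc="sum", fill_value=0,
)

# Frequency table: how many rows fall in each store/product combination
counts = pd.crosstab(df["store"], df["product"])
```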

#### **4. Combining DataFrames**
- Concatenation: `pd.concat()`  
  - Axis-based stacking  
- Merging: `pd.merge()`  
  - Inner, left, right, outer joins  
  - Merging on index vs columns  
- Joining: `join()`  
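
The combination methods above can be compared on two hypothetical tables, `users` and `tx` (transactions), that share a `user_id` key:

```python
import pandas as pd

users = pd.DataFrame({"user_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
tx = pd.DataFrame({"user_id": [1, 1, 4], "amount": [9.0, 4.5, 2.0]})

# Concatenation stacks frames along an axis (rows by default)
stacked = pd.concat([users, users], ignore_index=True)

# Merge on a shared column; `how` controls which keys survive
inner = pd.merge(users, tx, on="user_id", how="inner")
left = pd.merge(users, tx, on="user_id", how="left")
outer = pd.merge(users, tx, on="user_id", how="outer", indicator=True)

# join() aligns on the index instead of a column
joined = users.set_index("user_id").join(tx.set_index("user_id"), how="left")
```

The `indicator=True` option adds a `_merge` column recording whether each row came from the left frame, the right frame, or both, which helps when auditing unmatched keys.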

#### **5. Reshaping Data**
- `melt()` for unpivoting  
- `pivot()` for re-pivoting  
- Stack and unstack  
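
A round-trip sketch between wide and long layouts, with made-up quarterly columns `q1` and `q2`:

```python
import pandas as pd

wide = pd.DataFrame({"id": [1, 2], "q1": [10, 30], "q2": [20, 40]})

# Wide to long: one row per (id, quarter) pair
long = wide.melt(id_vars="id", var_name="quarter", value_name="value")

# Long back to wide: quarters become columns again
back = long.pivot(index="id", columns="quarter", values="value")

# stack() moves column labels into the row index; unstack() reverses it
stacked = back.stack()
unstacked = stacked.unstack()
```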

#### **6. Practical Case Studies**
- Grouping and aggregating real datasets (e.g., sales data, movie ratings)  
- Merging multiple data sources (users + transactions)  

---

### **Book 4: 04_eda_with_pandas.ipynb**
#### **1. Introduction to Exploratory Data Analysis (EDA)**
- Goal of EDA  
- Pandas as an EDA tool  

#### **2. Overview and Structure Inspection**
- Checking data dimensions, types  
- Null values heatmap (if using visualization)  
- Summary statistics  

#### **3. Univariate Analysis**
- Distributions of individual features  
- Value counts for categoricals  
- Histograms for numericals (`plot.hist()`)  

#### **4. Bivariate Analysis**
- Correlation matrix: `.corr()`  
- Scatter plots, bar plots, groupby comparisons  
- Crosstabs for category-category relationships  
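
As a non-visual sketch of bivariate checks on fabricated columns, `.corr()` gives pairwise Pearson correlations and a grouped mean compares a numeric feature across categories:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": [1, 2, 3, 4],
        "y": [2, 4, 6, 8],      # rises with x
        "z": [4, 3, 2, 1],      # falls as x rises
        "group": ["a", "a", "b", "b"],
    }
)

# Pairwise correlations between the numeric columns
corr = df[["x", "y", "z"]].corr()

# Compare a numeric feature across categories
mean_x_by_group = df.groupby("group")["x"].mean()
```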

#### **5. Feature Relationships**
- Conditional filtering and visualization  
- Grouped statistics  
- Boxplots and violin plots (if visual tools used)  

#### **6. Outlier and Anomaly Detection**
- Visualizing with boxplots  
- Using quantile thresholds  

#### **7. Feature Engineering for EDA**
- Creating new columns  
- Binning and bucketing (`pd.cut()`, `pd.qcut()`)  
- Label encoding or mapping  
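
A short sketch of binning and mapping; the bin edges, labels, and size codes are arbitrary choices for the example:

```python
import pandas as pd

ages = pd.Series([5, 17, 30, 64, 80])

# Fixed-width style bins with explicit edges; intervals are right-closed by default
age_group = pd.cut(ages, bins=[0, 18, 65, 120], labels=["child", "adult", "senior"])

# Quantile-based bins: split at the median into two groups
age_half = pd.qcut(ages, q=2, labels=["low", "high"])

# Simple label encoding with an explicit mapping
sizes = pd.Series(["S", "M", "L", "M"])
size_code = sizes.map({"S": 0, "M": 1, "L": 2})
```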

#### **8. Visualization Integration (Optional)**
- Quick visualizations with `.plot()`  
- Using `matplotlib` or `seaborn` alongside Pandas  

#### **9. End-to-End EDA Walkthrough**
- Load → Clean → Explore → Engineer  
- Narrative-driven exploration of a dataset  

---

### **Book 5: 05_advanced_pandas.ipynb**
#### **1. Introduction to Advanced Pandas**
- Recap of basic Pandas concepts  
- Importance of advanced techniques for large and complex datasets  

#### **2. Advanced Indexing and Selection**
- MultiIndexing and hierarchical data manipulation  
- Advanced slicing with `.xs()` and `.loc[]` on MultiIndex DataFrames  
- Boolean indexing with complex conditions  
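
A sketch of the MultiIndex selection tools named above, on a fabricated region-by-year table:

```python
import pandas as pd

idx = pd.MultiIndex.from_product(
    [["N", "S"], [2023, 2024]], names=["region", "year"]
)
df = pd.DataFrame({"sales": [5, 7, 3, 4]}, index=idx)

# Cross-section on an inner level; the selected level is dropped from the result
sales_2024 = df.xs(2024, level="year")

# Selecting an outer-level label with loc leaves the inner level as the index
north = df.loc["N"]

# IndexSlice selects on inner levels while keeping every level in the result
all_regions_2024 = df.loc[pd.IndexSlice[:, 2024], :]

# Boolean mask that mixes a column condition with an index-level condition
south_above_3 = df[(df["sales"] > 3) & (df.index.get_level_values("region") == "S")]
```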

#### **3. Complex Data Reshaping**
- Advanced pivoting and unpivoting techniques (`pivot`, `melt`, and `stack`/`unstack`)  
- Working with wide-to-long data formats  
- Handling hierarchical columns after reshaping  

#### **4. Time Series and Date/Time Functionality**
- Resampling and frequency conversion (`resample()`, `asfreq()`)  
- Rolling, expanding, and exponentially weighted windows (`rolling()`, `expanding()`, `ewm()`)  
- Shifting and lag features for time-series modeling  
- Handling time zones and datetime indexing  
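
The time-series operations above could be sketched on a six-day toy series; the window sizes and the `span` value are arbitrary:

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# Resample daily data into 2-day buckets
two_day_sum = ts.resample("2D").sum()

# Window statistics: fixed-size, growing, and exponentially weighted
rolling_mean = ts.rolling(window=3).mean()
expanding_mean = ts.expanding().mean()
ewm_mean = ts.ewm(span=3).mean()

# Lag feature: yesterday's value aligned with today
lag_1 = ts.shift(1)

# Attach a time zone to a naive DatetimeIndex
ts_utc = ts.tz_localize("UTC")

# Datetime indexing: slice by date strings (both ends included)
first_three = ts.loc["2024-01-01":"2024-01-03"]
```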

#### **5. Advanced Aggregation and Transformation**
- Custom aggregation functions using `agg()` and `apply()`  
- Grouped operations on MultiIndex data  
- Window functions for more dynamic aggregations  

#### **6. Feature Engineering in Pandas**
- Creating new features from date/time and text data  
- Handling categorical encoding and dummy variable creation  
- Techniques for reducing memory footprint and improving performance  
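
A minimal feature-engineering sketch with hypothetical `when`, `note`, and `color` columns:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "when": pd.to_datetime(["2024-01-05", "2024-07-20"]),
        "note": ["late delivery", "ok"],
        "color": ["red", "blue"],
    }
)

# Date/time features via the .dt accessor
df["month"] = df["when"].dt.month
df["weekday"] = df["when"].dt.day_name()

# Text features via the .str accessor
df["note_len"] = df["note"].str.len()
df["mentions_late"] = df["note"].str.contains("late", regex=False)

# One indicator (dummy) column per category value
dummies = pd.get_dummies(df["color"], prefix="color")
```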

#### **7. Performance Optimization in Pandas**
- Vectorization techniques and avoiding Python loops  
- Optimizing memory usage and data types  
- Parallel and out-of-core processing with external libraries (e.g., Dask)  
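
A rough sketch of the first two points on synthetic data: replacing a Python loop with a vectorized expression, then shrinking memory by downcasting integers and converting a low-cardinality text column to `category`. Dask is not shown, and the sizes and column names are arbitrary:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"qty": np.arange(1000, dtype="int64"), "city": ["Oslo", "Rome"] * 500}
)

# Loop version, shown only for contrast
doubled_loop = [q * 2 for q in df["qty"]]

# Vectorized version: one operation over the whole column
df["doubled"] = df["qty"] * 2

# Memory before and after tightening the dtypes
before = df.memory_usage(deep=True).sum()
small = df.assign(
    qty=pd.to_numeric(df["qty"], downcast="integer"),  # smallest int type that fits
    city=df["city"].astype("category"),                # codes plus two labels
)
after = small.memory_usage(deep=True).sum()
```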

#### **8. Best Practices and Real-World Examples**
- Integrating advanced Pandas techniques into data pipelines  
- Troubleshooting common pitfalls in large-scale data handling  
- Case studies showing advanced data wrangling and analysis  

---
