# FrontierML: Machine Learning with Real-World Data

Welcome to **FrontierML**, an interactive course that teaches machine learning through hands-on implementation with real-world data collection and analysis.

## 🎯 Course Overview

This course combines **theoretical understanding** with **practical implementation**, emphasizing:

- **Real-world data collection** through ethical web scraping and APIs
- **Mathematical foundations** with step-by-step derivations and citations
- **Implementation from scratch** to understand core concepts
- **Industry-standard tools** (scikit-learn, pandas, matplotlib)
- **Best practices** for reproducible data science

## 📚 Course Structure

### Chapter 1: Data Collection and Web Scraping
**File:** `01_data_collection.ipynb`

Learn to collect and preprocess real-world data:
- Ethical web scraping principles and techniques
- API interactions for data collection
- Data quality assessment and cleaning
- Feature engineering for machine learning
- Storage and management best practices

**Key Skills:** Web scraping, data preprocessing, feature engineering
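
As a taste of the ethical scraping topic, here is a minimal sketch of checking a site's `robots.txt` rules before fetching a page. It uses only the standard library. The robots.txt content, the bot name, and the `example.com` URLs are made up for illustration and are not taken from the notebook.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real scraper would download it from the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

USER_AGENT = "frontierml-course-bot"  # assumed name, for illustration only


def load_rules(robots_text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)

# Ask whether our user agent may fetch each URL, and how long to wait between requests.
listing_allowed = rules.can_fetch(USER_AGENT, "https://example.com/listings/1")
private_allowed = rules.can_fetch(USER_AGENT, "https://example.com/private/data")
delay_seconds = rules.crawl_delay(USER_AGENT)

print(listing_allowed, private_allowed, delay_seconds)
```

A scraper built this way would skip any URL for which `can_fetch` returns `False` and sleep for `delay_seconds` between requests.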

---

### Chapter 2: Linear Regression
**File:** `02_linear_regression.ipynb`

Master the foundation of predictive modeling:
- Mathematical derivation of least squares estimation
- Implementation from scratch using NumPy
- Real estate price prediction with scraped data
- Model evaluation and interpretation
- Assumptions and diagnostics

**Key Skills:** Regression analysis, mathematical implementation, model evaluation
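
To give a feel for the from-scratch approach, here is a minimal least squares sketch in NumPy. It is an illustration under stated assumptions, not the notebook's implementation: the function names are hypothetical and the data is synthetic (a noise-free line), not scraped real estate data. `np.linalg.lstsq` solves the same minimization as the normal equations, usually with better numerical stability.

```python
import numpy as np


def fit_least_squares(X, y):
    """Return [intercept, slopes...] minimizing the sum of squared residuals."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(X.shape[0]), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def predict(X, beta):
    """Apply the fitted intercept and slopes to new rows of X."""
    X = np.asarray(X, dtype=float)
    return beta[0] + X @ beta[1:]


# Synthetic data: price = 50 + 3 * area, with no noise.
area = np.array([[30.0], [45.0], [60.0], [80.0]])
price = 50.0 + 3.0 * area[:, 0]

beta = fit_least_squares(area, price)
print(beta)                      # intercept and slope
print(predict([[100.0]], beta))  # prediction for an unseen area
```

Because the synthetic data has no noise, the fit should recover the intercept and slope used to generate it, up to floating-point error.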

---

### Chapter 3: Logistic Regression
**File:** `03_logistic_regression.ipynb`

Understand classification algorithms:
- Sigmoid function and maximum likelihood estimation
- Binary and multi-class classification
- Feature scaling and regularization
- ROC curves and performance metrics

**Key Skills:** Classification, probability estimation, performance evaluation
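
The sigmoid and maximum likelihood ideas can be sketched in a few lines of NumPy. This is a minimal illustration, assuming plain batch gradient descent on the mean negative log-likelihood; the function names, learning rate, step count, and the four-point dataset are all made up for the example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, written via tanh to avoid overflow for large |z|."""
    return 0.5 * (1.0 + np.tanh(0.5 * np.asarray(z, dtype=float)))


def add_intercept(X):
    """Prepend a column of ones so the first weight acts as the bias."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(X.shape[0]), X])


def fit_logistic(X, y, lr=0.5, steps=500):
    """Batch gradient descent on the mean negative log-likelihood."""
    design = add_intercept(X)
    y = np.asarray(y, dtype=float)
    w = np.zeros(design.shape[1])
    for _ in range(steps):
        p = sigmoid(design @ w)
        w -= lr * design.T @ (p - y) / len(y)  # gradient of the mean NLL
    return w


def predict_proba(X, w):
    """Estimated probability of class 1 for each row of X."""
    return sigmoid(add_intercept(X) @ w)


# Tiny made-up dataset: negative inputs are class 0, positive inputs are class 1.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])

w = fit_logistic(X, y)
labels = (predict_proba(X, w) >= 0.5).astype(int)
print(w, labels)
```

On linearly separable data like this, the weights keep growing with more steps, which is one motivation for the regularization covered in the chapter.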

---

### Chapter 4: Decision Trees
**File:** `04_decision_trees.ipynb`

Learn tree-based learning algorithms:
- Information theory and entropy calculations
- Tree construction and splitting criteria
- Overfitting prevention and pruning
- Feature importance analysis

**Key Skills:** Tree algorithms, information theory, interpretability
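
As an illustration of the entropy calculations behind splitting criteria, here is a small standard-library sketch. The function names and the toy labels are assumptions for the example; the notebook may structure this differently.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(labels, goes_left):
    """Entropy reduction from splitting labels with a boolean mask."""
    left = [y for y, flag in zip(labels, goes_left) if flag]
    right = [y for y, flag in zip(labels, goes_left) if not flag]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


labels = [0, 0, 1, 1]
perfect_split = information_gain(labels, [True, True, False, False])
useless_split = information_gain(labels, [True, False, True, False])
print(entropy(labels), perfect_split, useless_split)
```

A tree builder would evaluate candidate splits this way and keep the one with the highest gain: a split that separates the classes cleanly recovers all of the parent's entropy, while one that leaves both children as mixed as the parent gains nothing.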

---

### Chapter 5: Random Forests
**File:** `05_random_forest.ipynb`

Master ensemble learning techniques:
- Bootstrap aggregating (bagging) principles
- Random feature selection strategies
- Out-of-bag error estimation
- Hyperparameter tuning and optimization

**Key Skills:** Ensemble methods, hyperparameter tuning, model selection
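
The link between bagging and out-of-bag estimation can be seen with a short NumPy sketch. It draws one bootstrap sample and collects the rows that were never picked. The function name, sample size, and seed are arbitrary choices for illustration.

```python
import numpy as np


def bootstrap_indices(n_samples, rng):
    """Draw one bootstrap sample; return (in_bag, out_of_bag) index arrays."""
    in_bag = rng.integers(0, n_samples, size=n_samples)  # sample with replacement
    out_of_bag = np.setdiff1d(np.arange(n_samples), in_bag)  # rows never drawn
    return in_bag, out_of_bag


rng = np.random.default_rng(0)  # fixed seed so the draw is repeatable
n_samples = 1000
in_bag, out_of_bag = bootstrap_indices(n_samples, rng)

oob_fraction = len(out_of_bag) / n_samples
print(f"out-of-bag fraction: {oob_fraction:.3f}")
```

For large samples, each row is left out of a given bootstrap sample with probability close to 1/e (about 0.368), so every tree in a forest has a sizeable set of unseen rows that can serve as a built-in validation set.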

---

### Chapter 6: Support Vector Machines
**File:** `06_support_vector_machines.ipynb`

Understand margin-based classification:
- Geometric interpretation and margin maximization
- Kernel methods and the kernel trick
- Handling non-linearly separable data
- Multi-class classification strategies

**Key Skills:** Geometric thinking, kernel methods, optimization
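
The kernel trick can be checked numerically in a few lines. The sketch below, with hypothetical function names and arbitrary input vectors, compares a degree-2 polynomial kernel on 2-D inputs against the inner product of an explicit feature map. Both routes should give the same number, but the kernel never builds the higher-dimensional features.

```python
import numpy as np


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: k(x, z) = (x . z)^2."""
    return float(np.dot(x, z)) ** 2


def poly2_features(x):
    """Explicit feature map for 2-D input whose inner product equals poly2_kernel."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])


x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

k_implicit = poly2_kernel(x, z)                               # works in 2-D
k_explicit = float(poly2_features(x) @ poly2_features(z))     # works in 3-D
print(k_implicit, k_explicit)
```

For kernels such as the RBF kernel the corresponding feature space is infinite-dimensional, so the implicit route is the only practical one.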

---

### Chapter 7: Neural Networks
**File:** `07_neural_networks.ipynb`

Introduction to deep learning fundamentals:
- Perceptron and multi-layer networks
- Backpropagation algorithm implementation
- Activation functions and optimization
- Practical considerations for deep learning

**Key Skills:** Neural networks, gradient-based optimization, deep learning
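
As a preview of the starting point for this chapter, here is a minimal perceptron sketch in plain Python, trained on the logical AND function. The function names, learning rate, and epoch count are illustrative assumptions. Backpropagation for multi-layer networks is not shown here.

```python
def predict(weights, bias, x):
    """Step activation: output 1 when the weighted sum is positive, else 0."""
    activation = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation > 0 else 0


def train_perceptron(samples, targets, lr=1.0, epochs=20):
    """Classic perceptron rule: nudge weights by lr * error * input."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            error = y - predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Logical AND is linearly separable, so the perceptron rule converges on it.
AND_INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_TARGETS = [0, 0, 0, 1]

weights, bias = train_perceptron(AND_INPUTS, AND_TARGETS)
outputs = [predict(weights, bias, x) for x in AND_INPUTS]
print(weights, bias, outputs)
```

A single perceptron cannot learn functions that are not linearly separable, such as XOR, which is the usual motivation for moving to multi-layer networks.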

## 🛠️ Prerequisites

- **Python Programming:** Basic to intermediate Python skills
- **Mathematics:** Linear algebra, calculus basics, probability
- **Statistics:** Descriptive statistics, hypothesis testing concepts

## 📦 Required Libraries

All dependencies are listed in `requirements.txt`:
```bash
pip install -r requirements.txt
```

Key libraries:
- **Data Processing:** pandas, numpy
- **Machine Learning:** scikit-learn
- **Visualization:** matplotlib, seaborn, plotly
- **Web Scraping:** requests, beautifulsoup4, selenium
- **Jupyter Book:** jupyter-book

## 🚀 Getting Started

### Option 1: Jupyter Book (Recommended)
```bash
# Build the interactive book
make book

# Serve locally
make serve
```

### Option 2: Individual Notebooks
```bash
# Start Jupyter Lab
make jupyter

# Then open a notebook from the notebooks/ directory in the Jupyter Lab file browser
```

## 📖 Learning Philosophy

This course follows several key principles:

### 1. **Theory + Practice**
Every algorithm is presented with:
- Mathematical foundations and derivations
- Step-by-step implementation from scratch
- Real-world applications with actual data

### 2. **Real Data Focus**
Instead of toy datasets:
- Scrape real-world data from websites and APIs
- Handle messy, incomplete data
- Address real preprocessing challenges

### 3. **Reproducible Science**
All work follows scientific principles:
- Proper citations for all concepts
- Documented methodology and assumptions
- Version-controlled code and data

### 4. **Progressive Complexity**
Start simple, build complexity:
- Begin with linear models
- Progress to ensemble methods
- Conclude with neural networks

## 🎓 Learning Outcomes

By completing this course, you will:

1. **Collect real-world data** from various sources ethically and efficiently
2. **Understand the mathematical foundations** of core ML algorithms
3. **Implement algorithms from scratch** using NumPy and Python
4. **Apply production tools** like scikit-learn effectively
5. **Evaluate and interpret models** using appropriate metrics and visualizations
6. **Follow best practices** for reproducible data science workflows

## 📊 Assessment Approach

Each chapter includes:
- **Interactive exercises** embedded in notebooks
- **Real-world projects** with actual data
- **Mathematical problems** to test understanding
- **Implementation challenges** to build coding skills

## 🔗 Additional Resources

- **References:** Complete bibliography in `references.bib`
- **Utilities:** Custom functions in `utils/` directory
- **Data:** Processed datasets in `data/` directory
- **Documentation:** Additional guides in `docs/` directory

## 🎯 Confidence Ratings

Throughout this course, we provide confidence ratings (1-5) for:
- Mathematical explanations and derivations
- Implementation correctness and efficiency
- Data collection methodologies
- Best practice recommendations

---

**Ready to begin your machine learning journey?** Start with [Chapter 1: Data Collection](01_data_collection.ipynb) and learn to gather real-world data for analysis!