# 🧠 Session 1: Kickoff & Introduction to Machine Learning

**All models are wrong, but some are useful.** - [George Box](https://en.wikipedia.org/wiki/All_models_are_wrong)

![ML 19th Century](https://github.com/ValRCS/RBS_LIFT_AI_ML_Models/blob/main/img/Data_Model_19th_cent.png?raw=true)

Welcome to the first session of the *Machine Learning Foundation* course! In this session, we will cover the basics of Machine Learning, its main types, and the key concepts needed to get started.

## Course Repository

* GitHub: https://github.com/ValRCS/RBS_LIFT_AI_ML_Models

* [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ValRCS/RBS_LIFT_AI_ML_Models/blob/main/notebooks/Session_1_Course_Introduction.ipynb)




## 🕒 00:00–00:10 – Course Kickoff and Orientation
- Instructor and participant introductions
- Course goals and structure
- Topics to be covered
- Tools and environment setup (Python, Jupyter/Colab, GitHub)

**Icebreaker**: In one word, what do you expect from this course?

![AI Team](https://raw.githubusercontent.com/ValRCS/RBS_LIFT_AI_ML_Models/refs/heads/main/img/ai_ml_venn_diagram.png)

## 🕒 00:10–00:25 – What Is Machine Learning?


**Definition:** Machine Learning (ML) is a field of computer science that focuses on building systems that learn patterns from data and use them to make predictions or decisions.

**Difference from Traditional Programming:**
```
Traditional Programming: Data + Rules → Output
Machine Learning: Data + Output → Rules (Model)
```


The key distinction between traditional programming and machine learning lies in **how the rules (logic) are derived**. 

---

### 🔹 Traditional Programming

**Formula**:
`Data + Rules → Output`

**Interpretation**:
In classical programming, a human programmer explicitly defines the **rules** (logic, conditions, algorithms). The program takes **input data**, applies these **handcrafted rules**, and produces an **output**.

**Example**:
To determine if a person is eligible to vote:

```python
def can_vote(age):
    # Hand-coded rule chosen by the programmer: voting age of 18
    return age >= 18

print(can_vote(20))  # True
```

* **Rule**: age must be 18 or more
* **Input**: age = 20
* **Output**: True (can vote)

---

### 🔹 Machine Learning

**Formula**:
`Data + Output → Rules (Model)`

**Interpretation**:
In machine learning, instead of programming the rules manually, we feed the system **examples of data and corresponding correct outputs**, and the system **learns the rules** (in the form of a model). This is often referred to as “training a model.”

**Example**:
You provide:

* Inputs: Age, income, education, voting history, etc.
* Outputs: Whether each person voted or not

The ML algorithm finds **patterns** in these examples and produces a **model** that can predict future voting behavior.
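To make `Data + Output → Rules` concrete, here is a minimal sketch that recovers the voting-age rule from `can_vote` instead of hand-coding it. The labeled examples and the `learn_threshold` helper are hypothetical, and the "learning" is a brute-force search over candidate thresholds, far simpler than a real ML algorithm.

```python
# Hypothetical labeled examples: (age, was_allowed_to_vote)
examples = [(12, False), (15, False), (16, False), (17, False),
            (18, True), (19, True), (25, True), (40, True)]

def learn_threshold(data):
    """Try every observed age as a threshold and keep the one with the fewest mistakes."""
    best_t, best_errors = None, None
    for t in sorted({age for age, _ in data}):
        # Count examples where the rule "age >= t" disagrees with the label
        errors = sum((age >= t) != label for age, label in data)
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return best_t

threshold = learn_threshold(examples)  # the "rule" comes from the data

def learned_can_vote(age):
    return age >= threshold

print(threshold)             # 18
print(learned_can_vote(20))  # True
```

Here the rule is not written by the programmer; it is derived from the examples and their correct outputs.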

---

### 🔍 Core Conceptual Difference

| Aspect          | Traditional Programming            | Machine Learning                                        |
| --------------- | ---------------------------------- | ------------------------------------------------------- |
| **Rules**       | Hand-coded by a human              | Learned from data                                       |
| **Flexibility** | Fixed: changes require code edits  | Adaptive: can be retrained as new data arrives          |
| **Suitability** | Best for deterministic tasks       | Best for complex or fuzzy pattern recognition           |
| **Examples**    | Calculators, sorting algorithms    | Spam filters, voice recognition, recommendation systems |

---

### 💡 Why This Matters

Machine learning shifts the burden from the **human explicitly understanding the problem well enough to write rules**, to the **machine discovering patterns from data**, often uncovering insights that humans might miss.

In real-world applications where rules are too complex, subtle, or numerous to define manually (like facial recognition or fraud detection), ML often outperforms hand-written, rule-based approaches.



### Real-world Applications:
- Spam filtering
- Recommender systems (e.g. Netflix, YouTube)
- Voice recognition (e.g. Siri, Alexa)
- Fraud detection

![ML Examples](https://github.com/ValRCS/RBS_LIFT_AI_ML_Models/blob/main/img/Netflix_in_80s.png?raw=true)

## 🕒 00:25–00:35 – Types of Machine Learning

### 1. **Supervised Learning**
- Input: Features + Labels
- Goal: Predict output
- Example: Predict house price
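A minimal supervised-learning sketch, assuming a single hypothetical feature (house size in m²) and labels (price in thousands), fitted with ordinary least squares for a straight line. Real house-price models use many features and a library, but the idea is the same: learn from labeled examples, then predict for an unseen input.

```python
# Hypothetical training data: features (size in m^2) and labels (price in thousands)
sizes = [50, 70, 90, 110]
prices = [100, 140, 180, 220]

def fit_line(xs, ys):
    """Ordinary least squares with one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(sizes, prices)
predicted = slope * 80 + intercept  # price for an unseen 80 m^2 house
print(predicted)  # 160.0
```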

### 2. **Unsupervised Learning**
- Input: Features only
- Goal: Find structure or patterns
- Example: Group similar customers
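A minimal unsupervised sketch that groups customers by a single hypothetical feature (annual spend) using k-means, one common clustering technique. The number of groups (k = 2) and the starting centers are assumptions chosen by hand; note that no labels are used anywhere.

```python
# Hypothetical annual spend per customer (features only, no labels)
spend = [120, 150, 130, 900, 950, 1000]

def kmeans_1d(values, centers, iterations=10):
    """Alternate between assigning points to the nearest center and moving centers to cluster means."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep an old center if its cluster ended up empty
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

centers, clusters = kmeans_1d(spend, centers=[0.0, 500.0])
print(clusters)  # [[120, 150, 130], [900, 950, 1000]]
```

The algorithm discovers a low-spend group and a high-spend group on its own; naming or interpreting those groups is still a human job.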

### 3. **Reinforcement Learning**
- Input: Environment
- Goal: Maximize reward over time
- Example: Robot learning to walk
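A walking robot is far too complex for a few lines, so this is a toy stand-in: an agent repeatedly picks one of three actions, receives a reward from a hypothetical environment (`pull`, with fixed rewards), and learns by trial and error which action pays best. It uses epsilon-greedy action selection, a standard strategy for this simple "multi-armed bandit" form of reinforcement learning; `epsilon`, `steps`, and the rewards are assumptions.

```python
import random

def pull(arm):
    # Hypothetical environment: each action returns a fixed reward
    return [1.0, 5.0, 3.0][arm]

def epsilon_greedy(n_arms=3, steps=200, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    totals = [0.0] * n_arms  # sum of rewards received per action
    counts = [0] * n_arms    # how many times each action was tried
    for step in range(steps):
        if step < n_arms:
            arm = step                   # try every action once first
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: random action
        else:                            # exploit: best average reward so far
            arm = max(range(n_arms), key=lambda a: totals[a] / counts[a])
        totals[arm] += pull(arm)
        counts[arm] += 1
    return [t / c for t, c in zip(totals, counts)], counts

estimates, counts = epsilon_greedy()
best = max(range(len(estimates)), key=lambda a: estimates[a])
print(best, counts)
```

Over time the agent chooses action 1 (reward 5.0) most often, which is the reward-maximizing behavior the goal describes.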


### 4. **Semi-supervised Learning**
- Input: Small amount of labeled data + large amount of unlabeled data
- Goal: Improve learning with limited labels
- Example: Classify documents when only a small fraction have been labeled by hand
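A minimal sketch of self-training (pseudo-labeling), one common semi-supervised strategy: starting from two hypothetical labeled points, the model repeatedly labels the unlabeled point it is most confident about (here, the one closest to an already-labeled point) and adds it to the training set. The data and the nearest-neighbor labeling rule are assumptions for illustration.

```python
# Hypothetical data: only two labeled examples, several unlabeled ones
labeled = [(1.0, "small"), (10.0, "large")]
unlabeled = [1.5, 2.0, 8.5, 9.0, 5.0]

def nearest_label(x, data):
    """Label of the labeled point closest to x."""
    return min(data, key=lambda pair: abs(pair[0] - x))[1]

def self_train(labeled, unlabeled):
    """Pseudo-label one point at a time, always picking the point closest to the labeled set."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        x = min(remaining, key=lambda u: min(abs(u - l) for l, _ in labeled))
        labeled.append((x, nearest_label(x, labeled)))
        remaining.remove(x)
    return labeled

result = dict(self_train(labeled, unlabeled))
print(result)
```

Each newly pseudo-labeled point helps label the next one, which is how a few labels can be stretched across a larger unlabeled set (and also how early mistakes can spread).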

## 🕒 00:35–00:45 – Key Concepts & ML Workflow

### Key Terms:
- **Feature**: Input variable
- **Label**: Output variable (target)
- **Model**: A mathematical structure that maps inputs to outputs
- **Training**: Adjusting the model's parameters to fit the training data
- **Testing**: Evaluating model performance on held-out data that was not used for training
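Testing only means something if the test rows were kept out of training. A minimal sketch of a random hold-out split in plain Python; `split_train_test` is a hypothetical helper, and the 25% test fraction and seed are assumptions. In practice, scikit-learn's `train_test_split` (in `sklearn.model_selection`) provides a ready-made version.

```python
import random

def split_train_test(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and hold out a fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(20))  # stand-in for 20 labeled examples
train, test = split_train_test(rows)
print(len(train), len(test))  # 15 5
```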

### Common Problems:
- **Overfitting**: Model learns noise, not general patterns
- **Underfitting**: Model too simple to capture data patterns
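To see both problems side by side, this sketch compares three toy classifiers on hypothetical data (hours studied → passed) that contains one noisy label: a memorizer that copies the nearest training example (overfitting), a model that always predicts the majority label (underfitting), and a simple threshold rule. The data and the hand-picked threshold of 4 hours are assumptions for illustration.

```python
# Hypothetical data: (hours_studied, passed); (7, False) is a noisy label
train = [(1, False), (2, False), (3, False), (4, True), (5, True),
         (6, True), (7, False), (8, True), (9, True)]
test = [(1.2, False), (2.8, False), (6.9, True), (7.1, True), (8.6, True)]

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

def memorizer(x):
    # Overfits: copies the label of the closest training example, noise included
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def always_true(x):
    # Underfits: ignores the input and predicts the majority label
    return True

def threshold_rule(x):
    # Captures the general pattern (threshold picked by hand for illustration)
    return x >= 4

for name, model in [("memorizer", memorizer), ("always_true", always_true),
                    ("threshold", threshold_rule)]:
    print(name, round(accuracy(model, train), 2), round(accuracy(model, test), 2))
```

The memorizer scores 1.0 on training data but only 0.6 on test data because it learned the noisy point; the constant model is weak on both (0.56 and 0.6); the threshold rule generalizes best (0.89 and 1.0).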

### ML Pipeline Overview:
```text
1. Data Collection
2. Data Cleaning and Preprocessing
3. Model Training
4. Model Evaluation
5. Deployment
```

![ML Process](https://github.com/ValRCS/RBS_LIFT_AI_ML_Models/blob/main/img/ML_Pipeline.png?raw=true)
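The pipeline steps can be sketched end to end in plain Python. Everything here is hypothetical: the toy data (hours studied vs. passing an exam), the helper names, and the threshold model; deployment is only noted in a comment.

```python
def collect_data():
    # Step 1 (hypothetical data): (hours_studied, passed); None marks a missing value
    return [(1, False), (2, False), (None, True), (3, False), (4, True),
            (5, True), (6, True), (7, True), (2.5, False), (6.5, True)]

def clean_data(rows):
    # Step 2: drop rows with a missing feature
    return [(x, y) for x, y in rows if x is not None]

def train_model(rows):
    # Step 3: pick the threshold with the fewest training mistakes
    candidates = sorted({x for x, _ in rows})
    best = min(candidates, key=lambda t: sum((x >= t) != y for x, y in rows))
    return lambda x: x >= best

def evaluate_model(model, rows):
    # Step 4: accuracy on rows the model did not see during training
    return sum(model(x) == y for x, y in rows) / len(rows)

rows = clean_data(collect_data())
train, test = rows[:7], rows[7:]  # simple split; shuffle first in real projects
model = train_model(train)
print(evaluate_model(model, test))  # 1.0
# Step 5 (deployment) would save the model and serve predictions; omitted here.
```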

## Machine Learning Is a Process, Not an Algorithm

Successful ML systems follow a **lifecycle**, not a single modeling step.

A fuller, commonly used breakdown includes the following stages:

1. Frame the problem
2. Get the data
3. Explore the data
4. Prepare the data
5. Try multiple models
6. Improve and fine-tune
7. Present results
8. Deploy and monitor

## 🕒 00:45–00:50 – Session Summary & Next Steps

## Optional: If the Group Is Already Experienced
### Advanced Concepts Overview
This section can be covered or skipped depending on the audience.

- Bias vs variance intuition (conceptual)
- Why accuracy can be misleading
- Data leakage and why it invalidates results
- Why ML projects fail after deployment

No mathematics or code is required to understand these ideas.

### Bias vs Variance (Intuition Only)

**Bias** describes models that are *too simple* for the problem.  
They miss important patterns, even on training data.

**Variance** describes models that are *too sensitive* to the data.  
They fit training data extremely well but fail on new, unseen data.

**Intuition example**
- Predicting house prices using only *number of rooms* → high bias
- Predicting house prices using *hundreds of noisy features* → high variance

Most real ML work is about finding a **useful compromise**, not eliminating one entirely.

---

### Why Accuracy Can Be Misleading

Accuracy answers the question:  
> *How often is the model correct?*

Sometimes this is the **wrong question**.

**Example**
- Dataset: 99% “not fraud”, 1% “fraud”
- Model that always predicts “not fraud” → **99% accuracy**
- Real usefulness → **zero**

In many real problems (fraud detection, medical diagnosis, moderation),
**the type of mistake matters more than the total number of mistakes**.
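The arithmetic behind the fraud example can be checked in a few lines. This optional sketch assumes 1,000 hypothetical transactions with the 99%/1% split above and adds *recall* (the share of actual frauds the model catches), a standard metric that exposes what accuracy hides.

```python
# Hypothetical labels: 990 legitimate transactions (False) and 10 frauds (True)
labels = [False] * 990 + [True] * 10
predictions = [False] * len(labels)  # a "model" that always predicts "not fraud"

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
frauds_caught = sum(p and y for p, y in zip(predictions, labels))
recall = frauds_caught / sum(labels)  # share of actual frauds that were detected

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")  # accuracy=0.99, recall=0.00
```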

---

### Data Leakage (Why It Invalidates Results)

**Data leakage** occurs when information that would not be available at prediction time
(from the *future*, from the test set, or from the *answer itself*) is accidentally used during training.

**Examples**
- Predicting exam success using “final grade” as a feature
- Predicting customer churn using “account closed date”
- Normalizing or scaling data *before* splitting into train and test sets

The model appears to perform extremely well, but only because it had access
to information it would never have in real use.

**Rule of thumb**
> If your evaluation looks *too good to be true*, it probably is.
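An optional sketch of the scaling pitfall, using hypothetical values and Python's standard `statistics` module: a mean computed on all rows lets the test set influence preprocessing, whereas statistics fitted on the training rows only keep the evaluation honest.

```python
import statistics

values = [10.0, 12.0, 11.0, 13.0, 50.0, 55.0]  # hypothetical feature values
train, test = values[:4], values[4:]

# Leaky: the mean is computed on ALL rows, so the test rows shape preprocessing
leaky_mean = statistics.mean(values)

# Correct: fit the scaling statistics on training rows only, then apply them to test rows
train_mean = statistics.mean(train)
train_std = statistics.stdev(train)
scaled_test = [(v - train_mean) / train_std for v in test]

print(round(leaky_mean, 2), train_mean)  # 25.17 11.5
```

Libraries follow the same pattern; for example, scikit-learn's `StandardScaler` should be `fit` on the training data only and then used to `transform` both sets.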

---

### Why ML Projects Fail After Deployment

Many ML systems work well in notebooks but fail in real-world use.

**Common reasons**
- Incoming data changes over time
- Data pipelines break silently
- Model performance is never monitored
- Business goals change but the model does not
- No one is responsible for maintenance

**Key insight**
> Training a model is often the **smallest part** of a successful ML system.

This is why production ML systems need ongoing **monitoring, retraining, and human oversight**.
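Monitoring can start very simply. This optional sketch flags when the average of an incoming feature drifts far from what the model saw in training. The `drift_alert` helper, the data, and the threshold of two training standard deviations are hypothetical choices for illustration, not a standard method; real monitoring usually tracks many features and the model's own performance.

```python
import statistics

def drift_alert(train_values, live_values, max_shift=2.0):
    """Flag drift when the live mean moves more than `max_shift` training standard deviations."""
    mean = statistics.mean(train_values)
    std = statistics.stdev(train_values)
    shift = abs(statistics.mean(live_values) - mean) / std
    return shift > max_shift

training_ages = [25, 30, 35, 40, 45]  # hypothetical feature values seen in training
this_week = [26, 31, 36, 41, 44]
next_month = [55, 60, 65, 70, 72]

print(drift_alert(training_ages, this_week))   # False
print(drift_alert(training_ages, next_month))  # True
```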

---

### Optional Reflection Prompt (Advanced Groups)

- Which of these issues have you encountered in practice?
- Which one seems most likely in your own domain?
- Which problem cannot be fixed by “a better algorithm”?


## ✅ Summary and Exit Prompt
By the end of this session, you should be able to:
- Explain what machine learning is
- Recognize types of ML and their applications
- Understand the core concepts and ML process


## Reflection (Before We Move to Code in Next Session)

Consider the following questions:

- What problem from your domain *could* be framed as supervised learning?
- What would count as "success" for that problem?
- What data would you need, but probably do not yet have?

We will revisit these questions throughout the course.