# 🧠 Session 2: ML Process Overview + Warm-Up Activity

## 🕒 00:00–00:20 – ML Workflow – Step-by-Step Process Overview
**🎯 Objectives:**
- Familiarize participants with the full machine learning pipeline
- Explain key steps and their role in developing a successful model

**📚 Content:**
1. **Define the Problem**
   - What are we trying to predict or discover?
   - Example: Predict house prices, detect spam, group customers

2. **Collect and Prepare Data**
   - Data sources, quality, format
   - Cleaning, handling missing values, encoding, normalization

3. **Split the Data**
   - Train/test (and optionally validation) sets
   - Reason: measure generalization on unseen data; fit any preprocessing on the training set only to avoid data leakage

4. **Select a Model**
   - Depends on problem type (regression, classification, clustering)
   - Examples: Linear regression, decision trees, K-means

5. **Train the Model**
   - Fit model to training data
   - Use appropriate loss function and optimizer

6. **Evaluate Performance**
   - Classification: accuracy, precision/recall, confusion matrix; regression: RMSE
   - Cross-validation for a more stable estimate than a single train/test split

7. **Tune and Optimize**
   - Hyperparameter tuning, regularization, feature selection

8. **Deploy & Monitor**
   - Turn the model into a usable tool or service
   - Monitor model drift and retrain as needed
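
Steps 3 to 6 above can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the session's official solution: the Iris dataset, the decision tree, the 80/20 split, and the `random_state` values are all assumptions chosen to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Steps 1-2: the problem is classifying iris species; this dataset is already clean.
X, y = load_iris(return_X_y=True, as_frame=True)

# Step 3: hold out 20% of the rows; stratify keeps class proportions the same
# in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Steps 4-5: pick a model (assumed here: a shallow decision tree) and fit it
# to the training data only.
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)

# Step 6: evaluate on the held-out test set.
y_pred = model.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
test_confusion = confusion_matrix(y_test, y_pred)

print(f"Test accuracy: {test_accuracy:.2f}")
print(test_confusion)
```

The model never sees `X_test` during fitting, which is what makes the test accuracy a fair estimate of generalization.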

**🖼️ Visual Aid:** ML workflow pipeline diagram (see slides)
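
Cross-validation (step 6) and hyperparameter tuning (step 7) can be sketched as follows. The 5-fold setting, the decision tree, and the `max_depth` grid are illustrative assumptions rather than recommended values.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Step 6: 5-fold cross-validation yields five accuracy scores instead of one.
cv_scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=5)
print(f"CV accuracy: {cv_scores.mean():.2f} +/- {cv_scores.std():.2f}")

# Step 7: try a few tree depths and keep the one with the best mean CV score.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": [1, 2, 3, 4, 5]},
    cv=5,
)
search.fit(X, y)
print("Best parameters:", search.best_params_)
```

In a real project the search would typically run on the training split only, with the test set kept aside for one final check.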

**💬 Suggested Engagement:**
- Ask: “Which of these steps do you think takes the most time in real projects?”

## 🕒 00:20–00:30 – Quick Pair Discussion – ML Problem Classifier
**🎯 Objective:**
- Help learners apply the ML workflow to real-world scenarios

**📝 Activity:**
- Present short real-world scenarios:
  - Classify spam emails
  - Recommend books to users
  - Predict product return rates
  - Cluster patients based on symptoms

- In pairs, students identify:
  - ML type (supervised, unsupervised, etc.)
  - Features and labels
  - Key workflow stages
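
To make "features and labels" concrete before the discussion, a short sketch like the one below may help. It assumes the Iris dataset, where the `target` column is the label and the four measurements are the features; the variable names `features` and `label` are our own.

```python
from sklearn.datasets import load_iris

df = load_iris(as_frame=True).frame

# Features: the measured inputs the model learns from.
features = df.drop(columns="target")
# Label: the value we want to predict (present only in supervised problems).
label = df["target"]

print("Feature columns:", list(features.columns))
print("Label column:", label.name)
print("Distinct label values:", sorted(label.unique().tolist()))
```

An unsupervised scenario, such as clustering patients, would have only the `features` part and no label column.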

**💡 Follow-up:** 2–3 pairs present briefly; the instructor clarifies any misconceptions.

## 🕒 00:30–00:45 – Tools and Environment Setup
**🎯 Objectives:**
- Ensure readiness for hands-on tasks
- Introduce Python, Jupyter/Colab, and ML libraries

**📚 Content:**
- How to launch Jupyter or open Google Colab
- Load a dataset (e.g., Iris or Titanic)
- Import libraries: `pandas`, `scikit-learn`, `matplotlib`, `seaborn`

**💬 Talk-through:**
- `pandas` = data manipulation
- `scikit-learn` = ML models
- `matplotlib` / `seaborn` = visualizations
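
The division of labor between the libraries can be shown in one short cell. This is a sketch under assumptions: it uses the Iris dataset and plain `matplotlib`, leaving `seaborn` out to keep it short, and the choice of columns to plot is arbitrary.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

# scikit-learn: ships small example datasets alongside its models.
iris = load_iris(as_frame=True)

# pandas: the data arrives as a DataFrame that we can group and summarize.
df: pd.DataFrame = iris.frame
mean_by_class = df.groupby("target")["petal length (cm)"].mean()
print(mean_by_class)

# matplotlib: a scatter plot of two features, colored by class.
# In Jupyter or Colab the figure renders below the cell.
fig, ax = plt.subplots()
ax.scatter(df["petal length (cm)"], df["petal width (cm)"], c=df["target"])
ax.set_xlabel("petal length (cm)")
ax.set_ylabel("petal width (cm)")
```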

**🛠️ Preparation Checklist:**
- Access to GitHub repo or Colab
- Python/Jupyter/Colab working
- “Hello ML” notebook available (load data, view `df.head()`)

**📌 Pro Tip:** Assign a TA or assistant to help with technical issues.

## ✅ Session Wrap-Up
**Exit Reflection:**
- Which step in the ML process are you most curious about?
- Use poll, sticky notes, or group discussion.

In [1]:
print("Hello LIFT!")

Hello LIFT!


In [22]:
name = "Valdis"
print("Hello", name)

Hello Valdis


In [6]:
print(f"Still {name}")

Still Voldermārs


In [5]:
name = "Voldermārs"

In [7]:
counter = 0
print(f"{name} is {counter} years old")

Voldermārs is 0 years old


In [23]:
counter += 1 # increase counter variable by one
print(f"{name} is now {counter} years old")

Valdis is now 15 years old


In [24]:
# Hello ML Template: Load Dataset and Preview
import pandas as pd

# Example: Load Iris dataset
from sklearn.datasets import load_iris
iris = load_iris(as_frame=True)
df = iris.frame
df.head()

|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |


In [27]:
# Check basic info and summary
df.info()
df.describe() # gives us basic statistics about the numerical columns in our DataFrame

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
 4   target             150 non-null    int64  
dtypes: float64(4), int64(1)
memory usage: 6.0 KB


|   | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.057333 | 3.758 | 1.199333 | 1.0 |
| std | 0.828066 | 0.435866 | 1.765298 | 0.762238 | 0.819232 |
| min | 4.3 | 2.0 | 1.0 | 0.1 | 0.0 |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 | 0.0 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 | 1.0 |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 | 2.0 |
| max | 7.9 | 4.4 | 6.9 | 2.5 | 2.0 |
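
The `target` column holds integer codes, so its summary statistics are hard to interpret on their own. A possible follow-up cell, sketched here with a `species` column name of our own choosing, maps the codes to the species names stored with the dataset and counts the rows per class.

```python
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame

# Map the integer codes in `target` to the species names shipped with the dataset.
# The `species` column name is our own choice for this sketch.
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

# Count rows per class to check whether the dataset is balanced.
species_counts = df["species"].value_counts()
print(species_counts)
```

Each of the three species appears 50 times, so the dataset is balanced and plain accuracy is a reasonable first metric.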
