# Section 6 — Machine Learning & AI
This section covers common steps for building ML models in Python using **scikit-learn**, and introduces **Deep Learning** using **Keras + TensorFlow**.

## 6.1 Preprocessing
We often need to clean and prepare data before modeling. Common tasks:
- Handle missing values
- Scale/normalize numerical features
- Encode categorical variables
- Split data into training and testing sets
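To show what the encoding step does with a category it has never seen, the sketch below fits scikit-learn's `OneHotEncoder` on a few hypothetical city labels. It then transforms one known label and one unseen label, `'D'`. With `handle_unknown='ignore'`, the setting this section's pipeline uses, the unseen category becomes an all-zero row instead of raising an error.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training labels mirroring the 'city' column
train_cities = np.array([['A'], ['B'], ['A'], ['C']])
# 'D' never appears during fitting
new_cities = np.array([['B'], ['D']])

# handle_unknown='ignore' encodes unseen categories as all zeros
encoder = OneHotEncoder(handle_unknown='ignore')
encoder.fit(train_cities)

print(encoder.categories_)  # categories learned during fit, in sorted order

# transform() returns a sparse matrix by default; convert it for display
encoded = encoder.transform(new_cities).toarray()
print(encoded)
```

Each column corresponds to one learned category (`A`, `B`, `C`), so `'B'` maps to `[0, 1, 0]` and `'D'` maps to `[0, 0, 0]`.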

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer

# Example synthetic dataset
df = pd.DataFrame({
    'age': [25, 32, 47, np.nan, 52, 36],
    'income': [50000, 60000, 80000, 72000, 90000, np.nan],
    'city': ['A', 'B', 'A', 'C', 'B', 'A'],
    'target': [0, 1, 0, 1, 1, 0]
})
df
```
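Mean imputation is easy to check by hand on this dataset. The sketch below copies the two numeric columns into a separate `num_df`, so the `df` used by the pipeline is not overwritten. It fits `SimpleImputer(strategy='mean')` on all six rows. The learned fill values are the means of the observed entries: (25 + 32 + 47 + 52 + 36) / 5 = 38.4 for `age` and 352000 / 5 = 70400 for `income`. Inside the pipeline, the imputer is fitted on the training split only, so its means will usually differ from these.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Copy of the numeric columns under a new name, so the main `df` is untouched
num_df = pd.DataFrame({
    'age': [25, 32, 47, np.nan, 52, 36],
    'income': [50000, 60000, 80000, 72000, 90000, np.nan],
})

print(num_df.isna().sum())  # one missing value in each column

# strategy='mean' replaces each NaN with the mean of that column's observed values
imputer = SimpleImputer(strategy='mean')
filled = imputer.fit_transform(num_df)

print(imputer.statistics_)         # learned column means: 38.4 and 70400.0
print(filled[3, 0], filled[5, 1])  # the two cells that were missing
```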

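```python
# Define features and target
X = df[['age', 'income', 'city']]
y = df['target']

# Identify column types
num_features = ['age', 'income']
cat_features = ['city']

# Numeric columns: fill missing values with the column mean, then standardize
num_transformer = Pipeline([
    ('imputer', SimpleImputer(strategy='mean')),
    ('scaler', StandardScaler())
])

# Categorical columns: fill missing values with the most frequent value, then one-hot encode
# (handle_unknown='ignore' encodes categories not seen during fitting as all zeros)
cat_transformer = Pipeline([
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

# Apply each transformer to its own columns and concatenate the outputs
preprocessor = ColumnTransformer([
    ('num', num_transformer, num_features),
    ('cat', cat_transformer, cat_features)
])

# Hold out 30% of the rows for testing (2 of the 6 rows here)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fit the preprocessing on the training split only, then transform it
X_train_prepared = preprocessor.fit_transform(X_train)
X_train_prepared[:5]
```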
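`StandardScaler` computes the z-score (x − mean) / std from the data it is fitted on. It uses the population standard deviation (`ddof=0`, NumPy's default for `np.std`). The sketch below checks this on hypothetical training and test values. It also shows why preprocessing is fitted on the training split only: the test value is scaled with the training mean and standard deviation, so no test information leaks into the preprocessing.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training and test values for one numeric feature
train = np.array([[25.0], [32.0], [47.0], [52.0]])
test = np.array([[36.0]])

# Learn mean_ and scale_ from the training data only
scaler = StandardScaler().fit(train)

# Manual z-score using the population standard deviation (ddof=0)
mean = train.mean(axis=0)
std = train.std(axis=0)
manual_train = (train - mean) / std
manual_test = (test - mean) / std  # test data reuses the training statistics

print(scaler.mean_, scaler.scale_)
print(np.allclose(scaler.transform(train), manual_train))
print(np.allclose(scaler.transform(test), manual_test))
```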

## 6.2 Common Models
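We can start with classic supervised models such as linear regression (for continuous targets), and logistic regression or decision trees (for classification). Since `target` is binary here, we fit a logistic regression. Placing it in the same pipeline as the preprocessor ensures the imputation, scaling, and encoding are learned from the training split only.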

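```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Chain preprocessing and model so that fit() fits both on the training data
clf = Pipeline([
    ('preprocessor', preprocessor),
    ('model', LogisticRegression())
])

clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
# Fraction of test rows predicted correctly (only 2 test rows in this toy example)
accuracy_score(y_test, y_pred)
```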
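The model is just one named step of the pipeline, so other estimators can be swapped in without touching the preprocessing. The sketch below rebuilds the same toy pipeline under new names (`toy_df`, `toy_pipeline`, and so on, all hypothetical) so that `clf` above is not modified. It then uses `Pipeline.set_params(model=...)` to compare logistic regression with a shallow decision tree via 3-fold cross-validation. With only six rows, the resulting scores illustrate the mechanics and say nothing about which model is better.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Same toy data as above, under new names so earlier variables are not overwritten
toy_df = pd.DataFrame({
    'age': [25, 32, 47, np.nan, 52, 36],
    'income': [50000, 60000, 80000, 72000, 90000, np.nan],
    'city': ['A', 'B', 'A', 'C', 'B', 'A'],
    'target': [0, 1, 0, 1, 1, 0]
})
X_toy = toy_df[['age', 'income', 'city']]
y_toy = toy_df['target']

toy_preprocessor = ColumnTransformer([
    ('num', Pipeline([('imputer', SimpleImputer(strategy='mean')),
                      ('scaler', StandardScaler())]), ['age', 'income']),
    ('cat', Pipeline([('imputer', SimpleImputer(strategy='most_frequent')),
                      ('onehot', OneHotEncoder(handle_unknown='ignore'))]), ['city']),
])
toy_pipeline = Pipeline([('preprocessor', toy_preprocessor),
                         ('model', LogisticRegression())])

candidates = {
    'logistic_regression': LogisticRegression(),
    'decision_tree': DecisionTreeClassifier(max_depth=2, random_state=0),
}

mean_scores = {}
for name, model in candidates.items():
    # Replace only the step named 'model'; the preprocessing step is unchanged
    toy_pipeline.set_params(model=model)
    scores = cross_val_score(toy_pipeline, X_toy, y_toy, cv=3, scoring='accuracy')
    mean_scores[name] = scores.mean()
    print(f'{name}: mean CV accuracy = {mean_scores[name]:.2f}')
```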

## 6.3 Evaluation
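We evaluate models with metrics and validation procedures such as:
- Accuracy (fraction of correct predictions) for classification
- Mean squared error (MSE) or its square root (RMSE, in the target's units) for regression
- Cross-validation, which averages a metric over several train/test splits for a more reliable estimate than a single split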
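For regression, MSE is the mean of the squared residuals and RMSE is its square root. The sketch below uses made-up targets and predictions to compute both by hand and with `sklearn.metrics.mean_squared_error`. The residuals are 0.5, −0.5, 0 and −1, so MSE = (0.25 + 0.25 + 0 + 1) / 4 = 0.375. RMSE is taken with `np.sqrt` rather than a metric-specific flag, to avoid depending on a particular scikit-learn version.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical regression targets and predictions
y_true_reg = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_reg = np.array([2.5, 0.0, 2.0, 8.0])

# MSE = mean of squared residuals
mse_manual = np.mean((y_true_reg - y_pred_reg) ** 2)
mse = mean_squared_error(y_true_reg, y_pred_reg)  # 0.375

# RMSE = square root of MSE, in the same units as the target
rmse = np.sqrt(mse)

print(mse_manual, mse, rmse)
```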

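```python
from sklearn.model_selection import cross_val_score

# 3-fold CV on the full dataset; for a classifier, an integer cv uses stratified folds,
# and the whole pipeline (preprocessing + model) is refit on each training fold
scores = cross_val_score(clf, X, y, cv=3, scoring='accuracy')
scores.mean()
```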
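When you need several metrics or explicit control over the folds, `cross_validate` with an explicit splitter is an alternative to `cross_val_score`. The sketch below assumes a larger synthetic dataset from `make_classification` (named `X_syn`, `y_syn` so `X` and `y` above are not overwritten). It uses shuffled, stratified 5-fold CV and reports accuracy and ROC AUC from a single call. The spread across folds gives a rough sense of how stable the estimate is.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic binary classification data (hypothetical, larger than the toy DataFrame)
X_syn, y_syn = make_classification(n_samples=200, n_features=5, random_state=0)

# Explicit splitter: stratified 5-fold CV with shuffling
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Compute several metrics in one pass; results use keys like 'test_accuracy'
cv_results = cross_validate(LogisticRegression(), X_syn, y_syn, cv=cv,
                            scoring=['accuracy', 'roc_auc'])

for metric in ('test_accuracy', 'test_roc_auc'):
    fold_scores = cv_results[metric]
    print(f'{metric}: mean {fold_scores.mean():.3f}, std {fold_scores.std():.3f}')
```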

## 6.4 Intro to Deep Learning
### TensorFlow and Keras
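- **TensorFlow** is an open-source library for numerical computation on tensors, with automatic differentiation and GPU/TPU support, used to build, train, and deploy ML models.
- **Keras** is a high-level neural-network API, available inside TensorFlow as `tf.keras`, that makes it easier to define, train, and evaluate neural networks.

We'll build a simple feed-forward (fully connected) network with Keras to demonstrate the workflow: define the model, compile it, train it, and evaluate it.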
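The Keras documentation describes a `Dense` layer as computing `activation(dot(input, kernel) + bias)`. A plain NumPy sketch of the forward pass for the same 10 → 32 → 16 → 1 architecture makes this concrete. The weights here are random and untrained (hypothetical values, not taken from any Keras model). Training would adjust them by backpropagation with an optimizer such as Adam, which this sketch does not show.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Randomly initialised (untrained) weights for a 10 -> 32 -> 16 -> 1 network
W1, b1 = rng.normal(scale=0.1, size=(10, 32)), np.zeros(32)
W2, b2 = rng.normal(scale=0.1, size=(32, 16)), np.zeros(16)
W3, b3 = rng.normal(scale=0.1, size=(16, 1)), np.zeros(1)

def forward(x):
    # Each dense layer computes activation(x @ W + b)
    h1 = relu(x @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return sigmoid(h2 @ W3 + b3)  # probability of class 1 for each sample

x = rng.random((4, 10))  # a batch of 4 samples with 10 features each
probs = forward(x)
print(probs.shape)
print(probs.ravel())
```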

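```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Seed Python, NumPy and TensorFlow random generators to make runs repeatable
keras.utils.set_random_seed(42)

# Random demo data: 10 features, binary labels drawn independently of the features.
# Note: this overwrites X_train/y_train/X_test/y_test from the scikit-learn cells above.
X_train = np.random.rand(1000, 10)
y_train = np.random.randint(0, 2, size=(1000,))
X_test = np.random.rand(200, 10)
y_test = np.random.randint(0, 2, size=(200,))

# A small feed-forward network: two hidden ReLU layers and a sigmoid output
model = keras.Sequential([
    keras.Input(shape=(10,)),
    layers.Dense(32, activation='relu'),
    layers.Dense(16, activation='relu'),
    layers.Dense(1, activation='sigmoid')  # outputs the probability of class 1
])

# Adam optimizer with binary cross-entropy, the usual loss for a single sigmoid output
model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Train for 5 epochs; the last 20% of the training samples are held out for validation
history = model.fit(X_train, y_train, epochs=5, batch_size=32, validation_split=0.2, verbose=1)
```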
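The loss and metric passed to `compile` have simple definitions. Binary cross-entropy averages −[y·log(p) + (1 − y)·log(1 − p)] over the samples. For a single sigmoid output, accuracy counts a prediction as class 1 when p exceeds 0.5. The NumPy sketch below computes both for three hypothetical labels and probabilities. It mirrors the definitions conceptually; Keras's own implementation (for example, how it clips probabilities) may differ in detail, and the `eps` clipping here is an assumption added to avoid `log(0)`.

```python
import numpy as np

def binary_crossentropy(y_true, p, eps=1e-7):
    # Clip probabilities away from 0 and 1 so log(0) cannot occur
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def binary_accuracy(y_true, p, threshold=0.5):
    # Predict class 1 when the probability exceeds the threshold
    return np.mean((p > threshold).astype(int) == y_true)

# Hypothetical labels and predicted probabilities
y_demo = np.array([1, 0, 1])
p_demo = np.array([0.9, 0.2, 0.6])

print(binary_crossentropy(y_demo, p_demo))
print(binary_accuracy(y_demo, p_demo))  # all three thresholded predictions are correct
```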

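```python
# Evaluate on the held-out test set
loss, acc = model.evaluate(X_test, y_test, verbose=0)
# The labels are random, so accuracy near 0.5 (chance level) is expected here
print(f'Test loss: {loss:.3f}, test accuracy: {acc:.2f}')
```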
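Because the demo labels are independent of the features, a useful sanity check is to compare the network with a trivial baseline. The sketch below regenerates the same random data with NumPy seed 42 and fits scikit-learn's `DummyClassifier(strategy='most_frequent')`, which always predicts the most common training label. A network that does not clearly beat this baseline has not learned anything useful, which is the expected outcome on random labels.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Regenerate the random demo data (labels are independent of the features)
np.random.seed(42)
X_train = np.random.rand(1000, 10)
y_train = np.random.randint(0, 2, size=(1000,))
X_test = np.random.rand(200, 10)
y_test = np.random.randint(0, 2, size=(200,))

# Always predict the most frequent training label
baseline = DummyClassifier(strategy='most_frequent').fit(X_train, y_train)
baseline_acc = baseline.score(X_test, y_test)
print(f'Majority-class baseline accuracy: {baseline_acc:.2f}')
```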