# ML/AI Fundamentals - Your First Steps

Welcome to your ML journey! This notebook covers essential concepts every ML practitioner needs.

## 🎯 Learning Objectives
- Practice core vector and matrix operations with NumPy
- Compute descriptive statistics and plot common probability distributions
- Build your first ML model
- Visualize data effectively

## 1. Linear Algebra Foundations

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
import seaborn as sns

# Set style for better plots
# ('seaborn-v0_8' needs Matplotlib 3.6 or newer; older versions name this style 'seaborn')
plt.style.use('seaborn-v0_8')
np.random.seed(42)

### Vectors and Matrices

In [None]:
# Vectors (1D arrays)
vector_a = np.array([1, 2, 3])
vector_b = np.array([4, 5, 6])

print("Vector A:", vector_a)
print("Vector B:", vector_b)

# Vector operations
print("\nVector Operations:")
print("Addition:", vector_a + vector_b)
print("Dot product:", np.dot(vector_a, vector_b))
print("Magnitude of A:", np.linalg.norm(vector_a))

In [None]:
# Matrices (2D arrays)
matrix_A = np.array([[1, 2, 3],
                     [4, 5, 6],
                     [7, 8, 9]])

matrix_B = np.array([[9, 8, 7],
                     [6, 5, 4],
                     [3, 2, 1]])

print("Matrix A:")
print(matrix_A)
print("\nMatrix B:")
print(matrix_B)

# Matrix operations
print("\nMatrix multiplication:")
print(np.dot(matrix_A, matrix_B))

print("\nTranspose of A:")
print(matrix_A.T)
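
A common point of confusion: `*` on NumPy arrays multiplies element by element, while `np.dot` (or the `@` operator) performs matrix multiplication. The sketch below repeats the two matrices from the cell above under the shorter names `A` and `B` (names chosen here only for illustration) and compares the two operations.

```python
import numpy as np

# Same values as matrix_A and matrix_B above; A and B are illustrative names
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
B = np.array([[9, 8, 7],
              [6, 5, 4],
              [3, 2, 1]])

# Matrix product: each entry is a row of A dotted with a column of B
matmul_result = A @ B            # equivalent to np.dot(A, B) for 2D arrays

# Element-wise product: entries at the same position are multiplied
elementwise_result = A * B

print("A @ B:")
print(matmul_result)
print("\nA * B:")
print(elementwise_result)

# Transposing a product reverses the order: (A @ B).T equals B.T @ A.T
transpose_rule_holds = np.array_equal((A @ B).T, B.T @ A.T)
print("\n(A @ B).T == B.T @ A.T:", transpose_rule_holds)
```

For 2D arrays, `@` and `np.dot` give the same result; `@` tends to read more clearly in longer expressions.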

### 🧠 Exercise 1: Vector Operations
Create two vectors of your choice and compute:
1. Their dot product
2. The angle between them (hint: use dot product formula)
3. Their cross product (for 3D vectors)

In [None]:
# Your code here
# Exercise 1 solution space

## 2. Probability and Statistics

In [None]:
# Generate sample data
np.random.seed(42)
data = np.random.normal(100, 15, 1000)  # Mean=100, Std=15, n=1000

# Basic statistics
print("Descriptive Statistics:")
print(f"Mean: {np.mean(data):.2f}")
print(f"Median: {np.median(data):.2f}")
print(f"Standard Deviation: {np.std(data):.2f}")
print(f"Variance: {np.var(data):.2f}")

# Visualize distribution
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.hist(data, bins=30, density=True, alpha=0.7, color='skyblue')
plt.title('Distribution of Data')
plt.xlabel('Value')
plt.ylabel('Density')

plt.subplot(1, 2, 2)
plt.boxplot(data)
plt.title('Box Plot')
plt.ylabel('Value')

plt.tight_layout()
plt.show()
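
One detail worth knowing: `np.std` and `np.var` divide by `n` by default (`ddof=0`, the population formulas). When the data is a sample from a larger population, passing `ddof=1` divides by `n - 1` instead. The sketch below compares the two and also computes the quartiles that a box plot draws. It uses `np.random.default_rng`, which is an assumption made here for self-containment, so the numbers will not match the cell above exactly.

```python
import numpy as np

# Illustrative sample; generated separately from `data` in the cell above
rng = np.random.default_rng(42)
sample = rng.normal(100, 15, 1000)
n = sample.size

population_std = np.std(sample)          # ddof=0: divide by n
sample_std = np.std(sample, ddof=1)      # ddof=1: divide by n - 1

print(f"Population std (ddof=0): {population_std:.4f}")
print(f"Sample std (ddof=1):     {sample_std:.4f}")

# Quartiles and interquartile range, the quantities behind a box plot
q1, median, q3 = np.percentile(sample, [25, 50, 75])
iqr = q3 - q1

# Matplotlib's default whiskers reach at most 1.5 * IQR beyond the box
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
n_outliers = int(np.sum((sample < lower_fence) | (sample > upper_fence)))

print(f"Q1={q1:.2f}, median={median:.2f}, Q3={q3:.2f}, IQR={iqr:.2f}")
print(f"Points outside the whisker range: {n_outliers}")
```

With 1000 points the two standard deviations differ only slightly; the gap grows as the sample gets smaller.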

### Probability Distributions

In [None]:
from scipy import stats

# Different distributions
x = np.linspace(-4, 4, 100)

plt.figure(figsize=(15, 5))

# Normal distribution
plt.subplot(1, 3, 1)
y_normal = stats.norm.pdf(x, 0, 1)
plt.plot(x, y_normal, 'b-', linewidth=2, label='Normal(0,1)')
plt.title('Normal Distribution')
plt.legend()

# Exponential distribution
plt.subplot(1, 3, 2)
x_exp = np.linspace(0, 5, 100)
y_exp = stats.expon.pdf(x_exp, scale=1)
plt.plot(x_exp, y_exp, 'r-', linewidth=2, label='Exponential(1)')
plt.title('Exponential Distribution')
plt.legend()

# Uniform distribution
plt.subplot(1, 3, 3)
x_uniform = np.linspace(-2, 2, 100)
y_uniform = stats.uniform.pdf(x_uniform, -1, 2)
plt.plot(x_uniform, y_uniform, 'g-', linewidth=2, label='Uniform(-1,1)')
plt.title('Uniform Distribution')
plt.legend()

plt.tight_layout()
plt.show()
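
SciPy parameterizes continuous distributions with `loc` and `scale`, which explains the arguments used above: `stats.uniform.pdf(x, -1, 2)` has `loc=-1` and `scale=2`, so its support runs from `loc` to `loc + scale`, that is, Uniform(-1, 1). The short sketch below checks a few such facts numerically; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Uniform with loc=-1, scale=2 covers [-1, 1], so the density is 1/2 inside
uniform_density_inside = stats.uniform.pdf(0.0, loc=-1, scale=2)
uniform_density_outside = stats.uniform.pdf(1.5, loc=-1, scale=2)

# For the exponential distribution, `scale` is the mean (1 / rate)
expon_mean = stats.expon.mean(scale=1)

# The CDF gives probabilities: P(-1 <= Z <= 1) for a standard normal
p_within_one_sigma = stats.norm.cdf(1) - stats.norm.cdf(-1)

print(f"Uniform(-1, 1) density at 0:   {uniform_density_inside}")
print(f"Uniform(-1, 1) density at 1.5: {uniform_density_outside}")
print(f"Exponential(scale=1) mean:     {expon_mean}")
print(f"P(-1 <= Z <= 1):               {p_within_one_sigma:.4f}")
```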

## 3. Your First Machine Learning Model

In [None]:
# Create a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=2, n_redundant=0, 
                          n_informative=2, n_clusters_per_class=1, random_state=42)

# Convert to DataFrame for easier handling
df = pd.DataFrame(X, columns=['Feature_1', 'Feature_2'])
df['Target'] = y

print("Dataset shape:", df.shape)
print("\nFirst 5 rows:")
print(df.head())

print("\nClass distribution:")
print(df['Target'].value_counts())

In [None]:
# Visualize the data
plt.figure(figsize=(10, 6))

# Scatter plot colored by class
colors = ['red', 'blue']
for i, color in enumerate(colors):
    mask = df['Target'] == i
    plt.scatter(df.loc[mask, 'Feature_1'], df.loc[mask, 'Feature_2'],
                c=color, label=f'Class {i}', alpha=0.6)

plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Binary Classification Dataset')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

### Train Your First Model

In [None]:
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Training set size: {X_train.shape[0]}")
print(f"Test set size: {X_test.shape[0]}")

# Create and train the model
model = LogisticRegression(random_state=42)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(f"\nModel Accuracy: {accuracy:.3f}")
print("\nDetailed Classification Report:")
print(classification_report(y_test, y_pred))
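
To see what the fitted model is doing under the hood, the sketch below rebuilds the predictions by hand: a linear score from the learned weights, passed through the sigmoid function, then thresholded at 0.5. It regenerates the same synthetic dataset so it runs on its own; the names `X_demo`, `clf`, and `manual_proba` are illustrative and not part of the cells above. It assumes the binary case with scikit-learn's default settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Recreate the dataset and split with the same settings as above
X_demo, y_demo = make_classification(n_samples=1000, n_features=2, n_redundant=0,
                                     n_informative=2, n_clusters_per_class=1,
                                     random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2,
                                          random_state=42)

clf = LogisticRegression(random_state=42)
clf.fit(X_tr, y_tr)

# Linear score: w1 * x1 + w2 * x2 + b
scores = X_te @ clf.coef_.ravel() + clf.intercept_[0]

# Sigmoid turns the score into an estimated probability of class 1
manual_proba = 1.0 / (1.0 + np.exp(-scores))

# Predict class 1 when that probability is at least 0.5
manual_pred = (manual_proba >= 0.5).astype(int)

sklearn_proba = clf.predict_proba(X_te)[:, 1]
print("Probabilities match:", np.allclose(manual_proba, sklearn_proba))
print("Predictions match:  ", np.array_equal(manual_pred, clf.predict(X_te)))
```

The decision boundary is the set of points where the linear score equals zero, which is why logistic regression draws a straight line in two dimensions.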

In [None]:
# Visualize the decision boundary
def plot_decision_boundary(X, y, model, title):
    plt.figure(figsize=(10, 8))
    
    # Create a mesh
    h = 0.01
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                        np.arange(y_min, y_max, h))
    
    # Make predictions on the mesh
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    
    # Plot the decision boundary
    plt.contourf(xx, yy, Z, alpha=0.4, cmap=plt.cm.RdYlBu)
    
    # Plot the data points
    colors = ['red', 'blue']
    for i, color in enumerate(colors):
        mask = y == i
        plt.scatter(X[mask, 0], X[mask, 1], c=color, label=f'Class {i}', alpha=0.8)
    
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title(title)
    plt.legend()
    plt.show()

# Plot decision boundary for test data
plot_decision_boundary(X_test, y_test, model, 'Logistic Regression Decision Boundary')

## 4. Key ML Concepts

### Bias-Variance Tradeoff Demonstration

In [None]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error

# Generate regression data
np.random.seed(42)
n_samples = 100
X_reg = np.random.uniform(-1, 1, n_samples).reshape(-1, 1)
y_reg = 1.5 * X_reg.ravel() + 0.5 * X_reg.ravel()**2 + np.random.normal(0, 0.1, n_samples)

# Split the data
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(
    X_reg, y_reg, test_size=0.3, random_state=42)

# Test different polynomial degrees
degrees = [1, 2, 5, 10, 15]
train_errors = []
test_errors = []

plt.figure(figsize=(15, 10))

for i, degree in enumerate(degrees):
    # Create polynomial features
    poly_model = Pipeline([
        ('poly', PolynomialFeatures(degree=degree)),
        ('linear', LinearRegression())
    ])
    
    # Fit the model
    poly_model.fit(X_train_reg, y_train_reg)
    
    # Predictions
    y_train_pred = poly_model.predict(X_train_reg)
    y_test_pred = poly_model.predict(X_test_reg)
    
    # Calculate errors
    train_error = mean_squared_error(y_train_reg, y_train_pred)
    test_error = mean_squared_error(y_test_reg, y_test_pred)
    
    train_errors.append(train_error)
    test_errors.append(test_error)
    
    # Plot
    plt.subplot(2, 3, i+1)
    X_plot = np.linspace(-1, 1, 100).reshape(-1, 1)
    y_plot = poly_model.predict(X_plot)
    
    plt.scatter(X_train_reg, y_train_reg, alpha=0.6, color='blue', label='Train')
    plt.scatter(X_test_reg, y_test_reg, alpha=0.6, color='red', label='Test')
    plt.plot(X_plot, y_plot, color='green', linewidth=2)
    plt.title(f'Degree {degree}\nTrain MSE: {train_error:.3f}, Test MSE: {test_error:.3f}')
    plt.legend()

# Plot bias-variance tradeoff
plt.subplot(2, 3, 6)
plt.plot(degrees, train_errors, 'o-', label='Training Error', color='blue')
plt.plot(degrees, test_errors, 'o-', label='Test Error', color='red')
plt.xlabel('Polynomial Degree')
plt.ylabel('Mean Squared Error')
plt.title('Bias-Variance Tradeoff')
plt.legend()
plt.yscale('log')

plt.tight_layout()
plt.show()
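
A single train/test split can make the comparison between degrees depend on which points happened to land in the test set. Cross-validation is one common way to get a steadier estimate. It is not used in the cell above; the sketch below is an optional variation that scores the same polynomial degrees with 5-fold cross-validation on freshly generated data of the same form. The names `X_cv`, `y_cv`, and `cv_mse` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Data with the same structure as above: quadratic signal plus Gaussian noise
rng = np.random.default_rng(42)
X_cv = rng.uniform(-1, 1, 100).reshape(-1, 1)
y_cv = 1.5 * X_cv.ravel() + 0.5 * X_cv.ravel() ** 2 + rng.normal(0, 0.1, 100)

candidate_degrees = [1, 2, 5, 10, 15]
cv_mse = {}

for degree in candidate_degrees:
    pipe = Pipeline([
        ('poly', PolynomialFeatures(degree=degree)),
        ('linear', LinearRegression()),
    ])
    # scikit-learn maximizes scores, so MSE is reported as a negative number
    neg_mse_scores = cross_val_score(pipe, X_cv, y_cv, cv=5,
                                     scoring='neg_mean_squared_error')
    cv_mse[degree] = -neg_mse_scores.mean()
    print(f"Degree {degree:2d}: mean CV MSE = {cv_mse[degree]:.4f}")

best_degree = min(cv_mse, key=cv_mse.get)
print("Degree with lowest cross-validated MSE:", best_degree)
```

Because each point is used for validation exactly once, the averaged error is less sensitive to one lucky or unlucky split.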

### 🧠 Exercise 2: Model Comparison
Try different models on our classification dataset:
1. Decision Tree
2. Random Forest
3. Support Vector Machine

Compare their accuracies and visualize their decision boundaries.

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Your code here
# Exercise 2 solution space
models = {
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'Random Forest': RandomForestClassifier(random_state=42),
    'SVM': SVC(random_state=42)
}

# Train and evaluate each model
# (use `clf` rather than `model` so the logistic regression trained earlier is not overwritten)
results = {}
for name, clf in models.items():
    # Fit model
    clf.fit(X_train, y_train)
    
    # Predict
    y_pred_clf = clf.predict(X_test)
    
    # Calculate accuracy
    clf_accuracy = accuracy_score(y_test, y_pred_clf)
    results[name] = clf_accuracy
    
    print(f"{name} Accuracy: {clf_accuracy:.3f}")

# Plot comparison
plt.figure(figsize=(10, 6))
models_list = list(results.keys())
accuracies = list(results.values())

plt.bar(models_list, accuracies, color=['skyblue', 'lightgreen', 'lightcoral'])
plt.title('Model Comparison')
plt.ylabel('Accuracy')
plt.ylim(0, 1)
for i, v in enumerate(accuracies):
    plt.text(i, v + 0.01, f'{v:.3f}', ha='center')
plt.show()
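
Test accuracy alone does not show whether a model has memorized its training data. Comparing training and test accuracy side by side makes that visible. The sketch below does this for an unrestricted decision tree and a depth-limited one on the same synthetic dataset; the `max_depth=3` setting and the variable names are illustrative choices, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Recreate the dataset and split with the same settings as above
X_demo, y_demo = make_classification(n_samples=1000, n_features=2, n_redundant=0,
                                     n_informative=2, n_clusters_per_class=1,
                                     random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2,
                                          random_state=42)

trees = {
    'Unrestricted tree': DecisionTreeClassifier(random_state=42),
    'Tree with max_depth=3': DecisionTreeClassifier(max_depth=3, random_state=42),
}

gap_report = {}
for name, tree in trees.items():
    tree.fit(X_tr, y_tr)
    train_acc = accuracy_score(y_tr, tree.predict(X_tr))
    test_acc = accuracy_score(y_te, tree.predict(X_te))
    gap_report[name] = (train_acc, test_acc)
    print(f"{name}: train={train_acc:.3f}, test={test_acc:.3f}, "
          f"gap={train_acc - test_acc:.3f}")
```

A large gap between training and test accuracy is a typical sign of overfitting; a small gap with low accuracy on both points toward underfitting.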

## 🎯 Key Takeaways

Congratulations! You've completed your first ML notebook. Here's what you learned:

### ✅ Mathematical Foundations
- Vector and matrix operations are fundamental to ML
- Understanding probability distributions helps in data analysis
- Statistics provide insights into data quality and model performance

### ✅ Machine Learning Basics
- ML is about finding patterns in data
- A train/test split does not prevent overfitting; it lets you detect it by measuring performance on data the model has not seen
- Different algorithms have different strengths

### ✅ Bias-Variance Tradeoff
- Simple models might underfit (high bias)
- Complex models might overfit (high variance)
- The goal is finding the right balance

## 🚀 Next Steps

1. **Practice**: Try the exercises above
2. **Explore**: Modify the code and see what happens
3. **Read**: Review the concepts you found challenging
4. **Move On**: Ready for Module 02 - Python ML Stack!

## 📚 Additional Resources

- [Linear Algebra Review](https://www.khanacademy.org/math/linear-algebra)
- [Statistics and Probability](https://www.coursera.org/learn/stanford-statistics)
- [Scikit-learn Documentation](https://scikit-learn.org/stable/)
- [NumPy Tutorial](https://numpy.org/learn/)

Great job! You're well on your way to becoming an ML engineer! 🎉