<a href="https://colab.research.google.com/github/axel-sirota/nwm-llm-program/blob/main/Week0/Week0_Basic_Python_and_ML_Exercises.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Basic Python and Machine Learning Review Exercises

In this notebook, you'll review key concepts in Python and machine learning. Each exercise is designed to help solidify your understanding by applying these concepts to practical tasks. Please fill in the code where indicated.

## Exercise 1: Python Functions, Loops, and Dictionaries

### Task: Write a function that takes a list of numbers and returns a dictionary where the keys are the numbers and the values are the squares of the numbers.
You need to iterate over the list using a loop and store the squares in a dictionary.
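
As a warm-up for the pattern described above, here is a minimal sketch of filling a dictionary inside a loop. It maps words to their lengths rather than numbers to squares, and the name `length_dict` is a hypothetical helper chosen for illustration only.

```python
def length_dict(words):
    """Map each word to its length, adding one entry per loop iteration."""
    result = {}  # start with an empty dictionary
    for word in words:
        result[word] = len(word)  # key on the left, computed value on the right
    return result

lengths = length_dict(["loop", "dict", "function"])
print(lengths)  # {'loop': 4, 'dict': 4, 'function': 8}
```

The same assignment form, `result[key] = value`, applies to the exercise; only the key and the computed value change.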

In [None]:
def square_dict(numbers):
    result = {}  # Create an empty dictionary
    for num in numbers:
        # Your code here: add the number as the key and its square as the value
        pass
    return result

numbers = [1, 2, 3, 4, 5]
print(square_dict(numbers))  # Expected output: {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

## Exercise 2: Generators and Loops

### Task: Create a generator that yields squares of numbers from 1 to 10, and use a loop to print each generated value.
Generators use `yield` instead of `return` to produce a sequence of results lazily (one at a time).
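
To make "lazily" concrete, the sketch below uses a separate, hypothetical generator named `countdown` (not part of the exercise). Each value is computed only when the caller asks for it, first through `next` and then through `list`.

```python
def countdown(start):
    """Yield start, start - 1, ..., 1, producing one value per request."""
    current = start
    while current > 0:
        yield current  # execution pauses here until the next value is requested
        current -= 1

gen = countdown(3)
first = next(gen)   # runs the body only up to the first yield
rest = list(gen)    # consumes the values that remain
print(first, rest)  # 3 [2, 1]
```

Once a generator is exhausted it yields nothing further, so calling `list(gen)` again would return an empty list.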

In [None]:
def square_generator():
    for i in range(1, 11):
        yield None  # Replace None with the correct operation to yield squares of i

for value in square_generator():
    print(value)  # Expected output: 1, 4, 9, 16, ... 100

## Exercise 3: Linear Regression with California Housing Dataset

### Task: Load the California Housing dataset, perform a train-test split, and implement linear regression to predict housing prices.
You will use `sklearn` to load the dataset, split it into training and testing sets, and fit a linear regression model. You will also evaluate the model's performance.

Remember to:
- Split the data using `train_test_split`
- Fit the `LinearRegression` model
- Predict and evaluate using metrics like MSE and R2 score.
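
The split, fit, and evaluate workflow listed above can be sketched on synthetic data, leaving the California Housing solution to you. Everything here is an assumption made for illustration: the data follow a made-up line `y = 3x + 2` plus noise, and the `_demo` names are chosen so they do not overwrite the variables used in the exercise cell.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: one feature, target is roughly 3 * x + 2 with Gaussian noise
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 1))
y_demo = 3.0 * X_demo[:, 0] + 2.0 + rng.normal(0, 0.5, size=200)

# Hold out 20% of the rows for testing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

# Fit on the training rows only, then predict on the held-out rows
demo_model = LinearRegression()
demo_model.fit(X_tr, y_tr)
demo_pred = demo_model.predict(X_te)

# Both metrics take the true values first and the predictions second
demo_mse = mean_squared_error(y_te, demo_pred)
demo_r2 = r2_score(y_te, demo_pred)
print(f"MSE: {demo_mse:.3f}, R2: {demo_r2:.3f}")
```

Because the synthetic data are almost perfectly linear, the R2 score should come out close to 1; on real housing data, expect a noticeably lower value.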

In [None]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the data
california = fetch_california_housing()
X = california.data
y = california.target

# Split the data using the train_test_split function
X_train, X_test, y_train, y_test = None  # Replace None with a call to train_test_split

# Create the model
model = None  # Replace None with the correct operation to create a LinearRegression model
model.fit(X_train, y_train)

# Predict and evaluate
predictions = model.predict(X_test)
mse = None  # Calculate mean squared error here
r2 = None  # Calculate R2 score here
print(f"MSE: {mse}")
print(f"R2 Score: {r2}")

## Exercise 4: Logistic Regression with Iris Dataset

### Task: Load the Iris dataset, perform a train-test split, and implement logistic regression to classify the species of iris flowers.
You will use `LogisticRegression` to predict the species of iris based on the features. Remember to split the data first.

Steps:
- Load the Iris dataset
- Split the data into training and testing sets
- Train the logistic regression model
- Predict and evaluate the model's accuracy
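
The steps above can be rehearsed on a toy problem first. This is a sketch under stated assumptions: the three well-separated clusters come from `make_blobs` with hand-picked centers, `max_iter=200` is an arbitrary choice, and the `_toy` names are hypothetical so they do not clash with the exercise variables.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy data: three well-separated classes in two dimensions
X_toy, y_toy = make_blobs(
    n_samples=150,
    centers=[[0, 0], [6, 6], [-6, 6]],
    cluster_std=1.0,
    random_state=0,
)

# stratify keeps the class proportions the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=0, stratify=y_toy
)

# Create the classifier, fit it on the training split, predict on the test split
toy_model = LogisticRegression(max_iter=200)
toy_model.fit(X_tr, y_tr)
toy_pred = toy_model.predict(X_te)

# Accuracy is the fraction of test rows whose predicted class is correct
toy_acc = accuracy_score(y_te, toy_pred)
print(f"Accuracy: {toy_acc:.2f}")
```

Note that the model must be fitted before `predict` is called; scikit-learn raises a `NotFittedError` otherwise.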

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load the data
iris = load_iris()
X = None  # Replace None with the correct operation to get the data
y = None  # Replace None with the correct operation to get the target

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the model and fit it
model = None  # Replace None with the correct operation to create a LogisticRegression model
# Your code here: fit the model on X_train and y_train

# Predict and evaluate
predictions = model.predict(X_test)
accuracy = None  # Calculate accuracy here
print(f"Accuracy: {accuracy}")

## Exercise 5: Clustering with PCA and K-Means

### Task: Use PCA to reduce the dimensions of the Iris dataset and then perform K-Means clustering.
First, apply PCA to reduce the feature dimensions, and then use K-Means to cluster the data into 3 clusters.

Steps:
- Apply PCA to reduce to 2 components
- Fit the K-Means clustering algorithm
- Predict and analyze the clusters
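
As a rough illustration of the steps above, the sketch below runs the same pipeline on synthetic five-dimensional blobs instead of Iris. The data, the `_toy`-style names, and the `n_init=10` and `random_state=0` settings are all assumptions made for this example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic data: 90 points in 5 dimensions, drawn around 3 centers
X_hi, _ = make_blobs(
    n_samples=90, n_features=5, centers=3, cluster_std=0.5, random_state=0
)

# fit_transform learns the 2 principal directions and projects the data onto them
toy_pca = PCA(n_components=2)
X_2d = toy_pca.fit_transform(X_hi)
print(X_2d.shape)  # (90, 2)

# Share of the original variance that the 2 components retain
print(toy_pca.explained_variance_ratio_.sum())

# Cluster the projected points; fit_predict returns one label per row
toy_kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
toy_labels = toy_kmeans.fit_predict(X_2d)
print(toy_labels[:10])
```

K-Means labels are arbitrary integers, so cluster 0 need not correspond to class 0 of any ground truth; compare groupings rather than label values when analyzing the result.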

In [None]:
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Apply PCA (X holds the Iris features loaded in Exercise 4)
pca = None  # Replace None with the correct operation to create a PCA model
X_pca = None  # Replace None with the correct operation to apply PCA to the data

# Perform K-Means clustering
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)  # fixed seed so cluster labels are reproducible
kmeans.fit(X_pca)

# Predict cluster labels
clusters = kmeans.predict(X_pca)
print(clusters)  # Analyze the cluster assignments