# 📝 Exercises: Introduction to Machine Learning

**Module 1: Machine Learning with Python**

---

## 🎯 Goals

These exercises will help you consolidate what you have learned about:
- NumPy and array operations
- Pandas and data manipulation
- Visualization with Matplotlib
- Building your first Machine Learning model

---

## 📋 Instructions

1. Read each exercise carefully
2. Write your code in the cells provided
3. Run and check your results
4. If you get stuck, consult the guided practice notebook
5. **Don't look at the solutions until you've tried!**

---

## ⚙️ Initial Configuration

In [None]:
# Import the required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Plotting configuration
%matplotlib inline
plt.style.use('seaborn-v0_8-darkgrid')
plt.rcParams['figure.figsize'] = (10, 6)

# Seed for reproducibility
np.random.seed(42)

print("✅ Libraries imported successfully")

---

## Exercise 1: Operations with NumPy (⭐)

### Context:
You have the monthly sales (in thousands of dollars) of a store for a year.

### Tasks:
1. Create an array with the sales: `[45, 52, 48, 61, 55, 58, 62, 70, 65, 68, 72, 80]`
2. Calculate:
   - Average sales
   - Median
   - Standard deviation
   - Minimum and maximum sales
3. Calculate the percentage increase from the first to the last month
4. Create a new array with the normalized sales (subtract the mean and divide by the standard deviation)
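
If the steps above feel abstract, the following is a minimal sketch of the same operations on a small made-up array (the `visitors` values are hypothetical, not the exercise data, so the solution is still yours to write). Note that `ndarray.std()` computes the population standard deviation (`ddof=0`) by default.

```python
import numpy as np

# Hypothetical data for illustration only (not the exercise's sales figures)
visitors = np.array([120, 135, 150, 160, 180])

mean = visitors.mean()
median = np.median(visitors)
std = visitors.std()  # population standard deviation (ddof=0), NumPy's default

# Percentage change from the first to the last element
pct_change = (visitors[-1] - visitors[0]) / visitors[0] * 100

# Normalization (z-score): subtract the mean, divide by the standard deviation
z_scores = (visitors - mean) / std

print(f"mean={mean:.1f}, median={median:.1f}, std={std:.2f}")
print(f"change from first to last: {pct_change:.1f}%")
print(z_scores.round(2))
```

After normalization the array has mean 0 and standard deviation 1, which is a quick way to check your own result.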

In [None]:
# Your code here


---

## Exercise 2: Data Analysis with Pandas (⭐⭐)

### Context:
You have data on students with their grades in different subjects.

### Data:

In [None]:
# Run this cell to create the DataFrame
datos_estudiantes = {
    'Nombre': ['Ana', 'Luis', 'Carlos', 'María', 'Pedro', 'Laura', 'José', 'Carmen', 'Miguel', 'Sara'],
    'Matemáticas': [85, 78, 92, 88, 76, 95, 82, 90, 84, 91],
    'Física': [82, 75, 88, 85, 79, 92, 80, 87, 83, 89],
    'Química': [88, 80, 90, 92, 82, 94, 85, 91, 86, 93],
    'Horas_Estudio': [15, 10, 20, 18, 12, 22, 14, 19, 16, 21]
}

df_estudiantes = pd.DataFrame(datos_estudiantes)
print(df_estudiantes)

### Tasks:

1. Calculate each student's average (the mean of the three subjects) and add it as a new column called `Average`
2. Find the student with the highest average
3. Filter students with an average greater than 85
4. Calculate the correlation between `Horas_Estudio` and `Average`
5. Sort the DataFrame by `Average` from highest to lowest
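
As a reminder of the pandas calls these tasks typically rely on, here is a hedged sketch on a tiny made-up DataFrame. The column names `player`, `round_1`, `round_2`, `practice_hours`, and `mean_score` are hypothetical and chosen only for illustration; it is one possible approach, not the reference solution.

```python
import pandas as pd

# Hypothetical toy data (not the student DataFrame)
df = pd.DataFrame({
    "player": ["A", "B", "C", "D"],
    "round_1": [10, 20, 30, 40],
    "round_2": [20, 20, 40, 60],
    "practice_hours": [1, 2, 3, 5],
})

# Row-wise mean across selected columns (axis=1 means "across columns")
df["mean_score"] = df[["round_1", "round_2"]].mean(axis=1)

# Row with the highest mean: idxmax() returns the index label of the maximum
best = df.loc[df["mean_score"].idxmax()]

# Boolean filtering
above_25 = df[df["mean_score"] > 25]

# Pearson correlation between two columns
corr = df["practice_hours"].corr(df["mean_score"])

# Sort from highest to lowest
ranked = df.sort_values("mean_score", ascending=False)

print(best["player"], round(corr, 3))
print(ranked)
```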

In [None]:
# Your code here - Task 1: Compute the average


In [None]:
# Your code here - Task 2: Best student


In [None]:
# Your code here - Task 3: Filter students


In [None]:
# Your code here - Task 4: Correlation


In [None]:
# Your code here - Task 5: Sort


---

## Exercise 3: Visualization with Matplotlib (⭐⭐)

### Context:
Use the student DataFrame from the previous exercise, including the `Average` column you added.

### Tasks:

1. Create a **bar chart** showing each student's average
2. Create a **scatter plot** of `Horas_Estudio` vs `Average` and add a trend line
3. Create a **histogram** of the averages with 5 bins
4. Create a figure with 2x2 subplots showing:
   - Bar chart of the averages
   - Scatter plot of `Horas_Estudio` vs `Average`
   - Histogram of the averages
   - Line chart of the Math (`Matemáticas`) scores of all students
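
The trend line and the subplot grid are the two parts that tend to cause trouble, so here is a small sketch of both using made-up `x` and `y` arrays (hypothetical values, not the student data). It assumes a straight-line fit with `np.polyfit` and uses a 1x2 grid to stay short.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit a degree-1 polynomial; coefficients come back highest power first
slope, intercept = np.polyfit(x, y, deg=1)
trend = np.poly1d([slope, intercept])  # callable: trend(x) = slope * x + intercept

# One row, two columns of subplots; axes is a 1D array of Axes objects
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

axes[0].scatter(x, y, label="data")
axes[0].plot(x, trend(x), color="red", label="trend line")
axes[0].set_title("Scatter with trend line")
axes[0].set_xlabel("x")
axes[0].set_ylabel("y")
axes[0].legend()

axes[1].hist(y, bins=3, edgecolor="black")
axes[1].set_title("Histogram of y")
axes[1].set_xlabel("y")
axes[1].set_ylabel("count")

fig.tight_layout()
```

With `plt.subplots(2, 2)` the returned `axes` is a 2D array, so each panel is addressed as `axes[row, col]`.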

In [None]:
# Your code here - Task 1: Bar chart


In [None]:
# Your code here - Task 2: Scatter plot with trend line
# Hint: use np.polyfit() and np.poly1d() for the trend line


In [None]:
# Your code here - Task 3: Histogram


In [None]:
# Your code here - Task 4: 2x2 subplots


---

## Exercise 4: Machine Learning Model (⭐⭐⭐)

### Context:
We are going to create a model that predicts the sales of a product based on advertising spending.

### Data:

In [None]:
# Run this cell to generate the data
np.random.seed(42)

# Advertising spend (thousands of dollars)
gasto_publicidad = np.random.uniform(10, 100, 100)

# Sales = 50 + 3 * advertising + noise
ventas = 50 + 3 * gasto_publicidad + np.random.normal(0, 20, 100)

# Build the DataFrame
df_marketing = pd.DataFrame({
    'Gasto_Publicidad': gasto_publicidad,
    'Ventas': ventas
})

print("Marketing dataset:")
print(df_marketing.head())
print(f"\nTotal records: {len(df_marketing)}")

### Tasks:

1. **Exploration**:
   - Calculate descriptive statistics
   - Create a scatter plot of `Gasto_Publicidad` vs `Ventas`
   - Calculate the correlation between both variables

2. **Data Preparation**:
   - Separate the features (X) from the target (y)
   - Split the data into train (80%) and test (20%) sets

3. **Training**:
   - Create a Linear Regression model
   - Train it with training data
   - Show the coefficient and the intercept

4. **Evaluation**:
   - Make predictions on the test set
   - Calculate RMSE and R²
   - Create a graph showing:
     * The actual data points (scatter)
     * The regression line

5. **Prediction**:
   - What sales would you expect for an advertising spend of $50,000? (Spend is expressed in thousands of dollars, so this corresponds to `Gasto_Publicidad = 50`.)
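
The workflow in tasks 2 to 5 can be sketched end to end on a different synthetic dataset. Everything below (the `hours` and `score` variables, the true slope of 2 and intercept of 5, and the `random_state` values) is an assumption made up for illustration, so treat it as a template rather than the solution.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: score = 5 + 2 * hours + noise
rng = np.random.default_rng(0)
hours = rng.uniform(0, 10, 200)
score = 5 + 2 * hours + rng.normal(0, 1, 200)

# scikit-learn expects a 2D feature matrix of shape (n_samples, n_features)
X = hours.reshape(-1, 1)
y = score

# 80% train, 20% test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression()
model.fit(X_train, y_train)
print(f"coefficient: {model.coef_[0]:.3f}, intercept: {model.intercept_:.3f}")

# Evaluate on data the model has not seen
y_pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))  # same units as the target
r2 = r2_score(y_test, y_pred)
print(f"RMSE: {rmse:.3f}, R2: {r2:.3f}")

# Predict for a single new value; the input must also be 2D
new_pred = model.predict(np.array([[4.0]]))
print(f"prediction for 4 hours: {new_pred[0]:.2f}")
```

RMSE is computed here as the square root of `mean_squared_error`, which works across scikit-learn versions.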

In [None]:
# Your code here - Task 1: Exploration


In [None]:
# Your code here - Task 2: Data preparation


In [None]:
# Your code here - Task 3: Training


In [None]:
# Your code here - Task 4: Evaluation


In [None]:
# Your code here - Task 5: Prediction


---

## Exercise 5: Mini Project - Temperature Analysis (⭐⭐⭐)

### Context:
You have temperature data for a city over several days and want to analyze patterns.

### Data:

In [None]:
# Run this cell to generate the data
np.random.seed(42)

# 90 days of data
dias = np.arange(1, 91)

# Temperature with a seasonal trend (simulating spring)
temperatura_base = 15 + 0.2 * dias  # Gradual increase
variacion_diaria = 5 * np.sin(dias * 2 * np.pi / 7)  # Weekly cycle (7-day period)
ruido = np.random.normal(0, 2, 90)  # Random noise
temperatura = temperatura_base + variacion_diaria + ruido

df_temperatura = pd.DataFrame({
    'Dia': dias,
    'Temperatura_C': temperatura
})

print("Temperature dataset:")
print(df_temperatura.head(10))

### Tasks:

**Part 1: Exploratory Analysis**
1. Calculate:
   - Average, minimum and maximum temperature
   - Standard deviation
2. Create a line graph showing temperature over time
3. Add a horizontal line for the average temperature

**Part 2: Statistical Analysis**
1. Create a `Week` column that groups the days into blocks of 7
2. Calculate the average temperature per week
3. Create a bar chart with the weekly average
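
One common way to bucket consecutive day numbers into weeks is integer division. The sketch below shows the idea on a hypothetical 14-day frame (the `day`, `value`, and `week` names are made up for illustration), assuming days are numbered from 1.

```python
import numpy as np
import pandas as pd

# Hypothetical 14-day dataset (not the temperature data)
df = pd.DataFrame({
    "day": np.arange(1, 15),
    "value": np.arange(1, 15, dtype=float),
})

# Days 1-7 map to week 1, days 8-14 to week 2, and so on.
# Subtracting 1 first keeps day 7 in week 1 instead of pushing it to week 2.
df["week"] = (df["day"] - 1) // 7 + 1

# Mean of "value" within each week
weekly_mean = df.groupby("week")["value"].mean()
print(weekly_mean)
```

With 90 days the last group contains only 6 days (days 85 to 90), which is worth keeping in mind when comparing weekly averages.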

**Part 3: Prediction**
1. Train a linear regression model to predict the temperature from the day number (`Dia`)
2. Evaluate the model (RMSE and R²)
3. Predict the temperature for days 91, 92, 93, 94, 95
4. Create a graph showing:
   - Historical data (blue dots)
   - Predictions (red dots)
   - Trend line

**Part 4: Insights**
1. Is the temperature increasing or decreasing?
2. How reliable is the model for future predictions?
3. What limitations does this model have?

In [None]:
# Part 1: Exploratory Analysis


In [None]:
# Part 2: Statistical Analysis


In [None]:
# Part 3: Prediction


**Part 4: Write your insights here:**

1. Temperature trend:
   - [Your answer here]

2. Model reliability:
   - [Your answer here]

3. Limitations:
   - [Your answer here]

---

## 🎯 Self-assessment

Before looking at the solutions, check:

### Exercise 1: NumPy
- [ ] I correctly calculated all statistics
- [ ] I understand what normalization is
- [ ] I can explain the percentage increase

### Exercise 2: Pandas
- [ ] I created the average column correctly
- [ ] I can filter data with conditions
- [ ] I understand what correlation means

### Exercise 3: Visualization
- [ ] My charts have titles and labels
- [ ] I can create different types of charts
- [ ] I understand when to use each type

### Exercise 4: Simple ML Model
- [ ] I split train/test correctly
- [ ] I trained and evaluated the model
- [ ] I can interpret RMSE and R²

### Exercise 5: Project
- [ ] I completed all parts
- [ ] My charts are informative
- [ ] I can explain the limitations

---

## 🚀 Next Steps

1. **If you completed everything**: Excellent! Review the solutions to see alternative approaches
2. **If you got stuck**: That's okay. Review the theory material and the guided practice
3. **If you want more**: Try modifying the exercises with your own data

---

## 💡 Final Tips

- 📊 Visualization is key: always plot your data
- 🔍 Explore before you model: understand your data first
- 🧪 Experiment: change parameters and observe what happens
- 📖 Read errors calmly: Python tells you what's wrong
- 💪 Practice makes perfect: keep practicing!

---

**Good luck with the exercises! 🎓**