# 🎓 Introduction to Machine Learning - Guided Practice

**Module 1: Machine Learning with Python**

---

## 📋 Contents

1. Environment configuration in Google Colab
2. Introduction to NumPy
3. Introduction to Pandas
4. Visualization with Matplotlib
5. First contact with Scikit-learn
6. Complete example: House price prediction

---

## ⚠️ Important

- **Execute each cell in order** (Shift + Enter)
- **Read the comments** in the code
- **Experiment** by modifying values
- If something fails, read the error message and try again

---

## 1. Environment Configuration in Google Colab

### ✅ Verify that we are in Colab

Google Colab already has the core libraries installed, but it's always good to check!

In [None]:
# Check whether we are running in Google Colab
try:
    import google.colab  # only importable inside Colab
    print("✅ Running in Google Colab")
    IN_COLAB = True
except ImportError:
    print("⚠️  Not running in Colab, but the code will work the same")
    IN_COLAB = False

In [None]:
# Check the versions of the main libraries
import sys
import numpy as np
import pandas as pd
import matplotlib
import sklearn

print("📚 Library versions:")
print(f"  Python: {sys.version.split()[0]}")
print(f"  NumPy: {np.__version__}")
print(f"  Pandas: {pd.__version__}")
print(f"  Matplotlib: {matplotlib.__version__}")
print(f"  Scikit-learn: {sklearn.__version__}")
print("\n✅ All libraries are installed and ready to use")

### 🎨 Display Settings

We will configure Matplotlib so that the plots look good in the notebook.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Settings for nicer-looking plots
%matplotlib inline
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")

# Set the default figure size and font size
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 12

print("✅ Display settings ready")
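
The style name `seaborn-v0_8-darkgrid` is available in Matplotlib 3.6 and later; older releases used names such as `seaborn-darkgrid`. The following is a minimal sketch, assuming that falling back to Matplotlib's default style is acceptable, which checks that the style exists before applying it. The variable names `preferred` and `style` are illustrative only.

```python
import matplotlib.pyplot as plt

# Style name used in this notebook (available in Matplotlib 3.6+)
preferred = "seaborn-v0_8-darkgrid"

# Fall back to Matplotlib's built-in default if the style is missing
style = preferred if preferred in plt.style.available else "default"
plt.style.use(style)
print("Using style:", style)
```

This keeps the cell from raising an error on an older installation, at the cost of a plainer look.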

---

## 2. Introduction to NumPy

**NumPy** (Numerical Python) is the fundamental library for scientific computing in Python.

### Why NumPy?
- ⚡ **Fast**: Vectorized operations implemented in C
- 🧮 **Mathematics**: Linear algebra, statistics, etc.
- 🔢 **Multidimensional arrays**: The basis for everything in ML

### NumPy Arrays

In [None]:
# Create a simple array
lista_python = [1, 2, 3, 4, 5]
array_numpy = np.array([1, 2, 3, 4, 5])

print("Python list:", lista_python)
print("NumPy array:", array_numpy)
print("\nObject type:", type(array_numpy))
print("Shape:", array_numpy.shape)

### Vectorized Operations

In NumPy, operations are applied to **all elements** automatically. This is much faster than using loops!

In [None]:
# Element-wise operations
arr = np.array([1, 2, 3, 4, 5])

print("Original array:", arr)
print("Multiply by 2:", arr * 2)
print("Square:", arr ** 2)
print("Add 10:", arr + 10)

# No loops needed!

In [None]:
# Operations between arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

print("Array 1:", arr1)
print("Array 2:", arr2)
print("\nSum:", arr1 + arr2)
print("Element-wise product:", arr1 * arr2)
print("Dot product:", np.dot(arr1, arr2))
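
To make the idea of vectorization concrete, here is a small sketch that doubles the same values with an explicit loop and with a single NumPy expression. The names `valores`, `dobles_loop`, and `dobles_vec` are illustrative only.

```python
import numpy as np

valores = [1, 2, 3, 4, 5]   # plain Python list
arr = np.array(valores)     # same data as a NumPy array

# Loop version: build the result one element at a time
dobles_loop = []
for v in valores:
    dobles_loop.append(v * 2)

# Vectorized version: one expression, applied to every element
dobles_vec = arr * 2

print("Loop:      ", dobles_loop)
print("Vectorized:", dobles_vec)

# Caution: `*` on a list repeats the list instead of multiplying
print("List * 2:  ", valores * 2)
```

Both approaches give the same numbers, but the vectorized form is shorter and runs the loop inside NumPy rather than in Python.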

### Multidimensional Arrays

In ML, we work with **matrices** (2D arrays) all the time.

In [None]:
# Create a 2D matrix (rows x columns)
matriz = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

print("Matrix:")
print(matriz)
print("\nShape:", matriz.shape)  # (3 rows, 3 columns)
print("Number of dimensions:", matriz.ndim)
print("Total number of elements:", matriz.size)

In [None]:
# Indexing and slicing
print("Element at row 0, column 1:", matriz[0, 1])  # 2
print("First row:", matriz[0, :])  # [1 2 3]
print("Second column:", matriz[:, 1])  # [2 5 8]
print("\n2x2 submatrix:")
print(matriz[:2, :2])
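
In ML data sets, rows usually hold samples and columns hold features, so statistics are often computed along one axis. The sketch below, which reuses the same 3x3 matrix, shows how the `axis` argument selects the direction. The names `suma_columnas` and `media_filas` are illustrative only.

```python
import numpy as np

matriz = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

# axis=0 collapses the rows: one result per column
suma_columnas = matriz.sum(axis=0)

# axis=1 collapses the columns: one result per row
media_filas = matriz.mean(axis=1)

print("Column sums:", suma_columnas)
print("Row means:  ", media_filas)
```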

### Useful Statistical Functions

In [None]:
datos = np.array([10, 20, 30, 40, 50])

print("Data:", datos)
print("\nStatistics:")
print(f"  Mean (average): {np.mean(datos)}")
print(f"  Median: {np.median(datos)}")
print(f"  Standard deviation: {np.std(datos):.2f}")
print(f"  Minimum: {np.min(datos)}")
print(f"  Maximum: {np.max(datos)}")
print(f"  Sum: {np.sum(datos)}")
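
One detail worth knowing: `np.std` computes the population standard deviation by default (`ddof=0`), while pandas uses the sample version (`ddof=1`) by default. A short sketch of the difference on the same data, with illustrative variable names:

```python
import numpy as np

datos = np.array([10, 20, 30, 40, 50])

std_poblacional = np.std(datos)        # ddof=0 (NumPy default): divide by n
std_muestral = np.std(datos, ddof=1)   # ddof=1: divide by n - 1

print(f"Population std (ddof=0): {std_poblacional:.2f}")
print(f"Sample std (ddof=1):     {std_muestral:.2f}")
```

The two values converge as the number of data points grows, but they differ noticeably on small samples like this one.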

### 🎯 Mini Exercise

Create an array with the numbers 1 to 10, square each number, and calculate the average of the squares.

In [None]:
# Your code here (one possible solution is shown below)
numeros = np.arange(1, 11)  # 1 to 10
cuadrados = numeros ** 2
promedio = np.mean(cuadrados)

print("Numbers:", numeros)
print("Squares:", cuadrados)
print(f"Mean of the squares: {promedio}")
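
As an optional sanity check, the result can be compared with the closed-form sum of squares, n(n + 1)(2n + 1) / 6. This is only a sketch; `suma_formula` and `promedio_formula` are illustrative names.

```python
import numpy as np

n = 10
numeros = np.arange(1, n + 1)

promedio_numpy = np.mean(numeros ** 2)

# Closed form: 1^2 + 2^2 + ... + n^2 = n(n + 1)(2n + 1) / 6
suma_formula = n * (n + 1) * (2 * n + 1) // 6
promedio_formula = suma_formula / n

print("NumPy:  ", promedio_numpy)
print("Formula:", promedio_formula)
```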

---

## 3. Introduction to Pandas

**Pandas** is the library for analysis and manipulation of tabular data.

### DataFrame: Python's Excel

A DataFrame is like a table with rows and columns.

In [None]:
# Create a DataFrame from a dictionary
datos = {
    'Nombre': ['Ana', 'Luis', 'Carlos', 'María', 'Pedro'],
    'Edad': [25, 32, 28, 45, 38],
    'Ciudad': ['Madrid', 'Barcelona', 'Madrid', 'Valencia', 'Barcelona'],
    'Salario': [30000, 45000, 35000, 55000, 42000]
}

df = pd.DataFrame(datos)
print("DataFrame:")
print(df)

### Basic Exploration

In [None]:
# First rows
print("First 3 rows:")
print(df.head(3))

In [None]:
# DataFrame information
print("DataFrame information:")
df.info()  # info() prints its report directly and returns None

In [None]:
# Descriptive statistics
print("Statistics:")
print(df.describe())

### Data Selection

In [None]:
# Select a column
print("Column 'Nombre':")
print(df['Nombre'])

print("\nType:", type(df['Nombre']))  # It is a pandas Series

In [None]:
# Select multiple columns
print("Nombre and Salario:")
print(df[['Nombre', 'Salario']])

In [None]:
# Filter rows with a condition
print("People with Salario > 40000:")
print(df[df['Salario'] > 40000])

In [None]:
# Multiple conditions
print("People from Madrid with Edad < 30:")
filtro = (df['Ciudad'] == 'Madrid') & (df['Edad'] < 30)
print(df[filtro])
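
Conditions can also be combined with `|` (or) and negated with `~` (not). Each condition needs its own parentheses, and Python's `and` / `or` keywords do not work element-wise on Series. The sketch below rebuilds the same small DataFrame so that it runs on its own, and uses `.loc` to pick rows and columns in one step. The filter names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Nombre': ['Ana', 'Luis', 'Carlos', 'María', 'Pedro'],
    'Edad': [25, 32, 28, 45, 38],
    'Ciudad': ['Madrid', 'Barcelona', 'Madrid', 'Valencia', 'Barcelona'],
    'Salario': [30000, 45000, 35000, 55000, 42000],
})

# OR: people from Valencia, or with Salario above 43000
filtro_or = (df['Ciudad'] == 'Valencia') | (df['Salario'] > 43000)

# NOT: everyone who is not from Madrid
filtro_not = ~(df['Ciudad'] == 'Madrid')

# .loc takes a row filter and a list of columns in one step
print(df.loc[filtro_or, ['Nombre', 'Salario']])
print(df.loc[filtro_not, ['Nombre', 'Ciudad']])
```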

### Useful Operations

In [None]:
# Add a new column
df['Salario_Anual'] = df['Salario'] * 12
print("DataFrame with Salario_Anual:")
print(df)

In [None]:
# Group and compute statistics
print("Average salary by city:")
print(df.groupby('Ciudad')['Salario'].mean())

In [None]:
# Sort by a column
print("Sorted by Edad (descending):")
print(df.sort_values('Edad', ascending=False))
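
Grouping is not limited to a single statistic. As a sketch, `agg` accepts a list of function names and returns one column per statistic. The example rebuilds only the two columns it needs, and `resumen` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'Ciudad': ['Madrid', 'Barcelona', 'Madrid', 'Valencia', 'Barcelona'],
    'Salario': [30000, 45000, 35000, 55000, 42000],
})

# Several statistics per city in a single call
resumen = df.groupby('Ciudad')['Salario'].agg(['mean', 'max', 'count'])
print(resumen)
```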

---

## 4. Visualization with Matplotlib

"A picture is worth a thousand words" - especially in Machine Learning.

### Line Chart

In [None]:
# Example data
x = np.linspace(0, 10, 100)
y = np.sin(x)

plt.figure(figsize=(10, 6))
plt.plot(x, y, linewidth=2, label='sin(x)')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Sine Function')
plt.grid(True, alpha=0.3)
plt.legend()
plt.show()

### Scatter Plot

In [None]:
# Random data with a linear correlation
np.random.seed(42)
horas_estudio = np.random.uniform(1, 10, 50)
calificacion = 50 + 4 * horas_estudio + np.random.normal(0, 5, 50)

plt.figure(figsize=(10, 6))
plt.scatter(horas_estudio, calificacion, alpha=0.6, s=100)
plt.xlabel('Study Hours')
plt.ylabel('Grade')
plt.title('Relationship between Study Hours and Grade')
plt.grid(True, alpha=0.3)
plt.show()

print(f"Correlation: {np.corrcoef(horas_estudio, calificacion)[0,1]:.3f}")
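
The number printed by `np.corrcoef` is the Pearson correlation coefficient. To show where it comes from, the sketch below recomputes it from its definition (covariance divided by the product of the standard deviations) on data generated the same way as above. The names `dx`, `dy`, and `r_manual` are illustrative only.

```python
import numpy as np

np.random.seed(42)
horas_estudio = np.random.uniform(1, 10, 50)
calificacion = 50 + 4 * horas_estudio + np.random.normal(0, 5, 50)

# Deviations of each variable from its own mean
dx = horas_estudio - horas_estudio.mean()
dy = calificacion - calificacion.mean()

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_numpy = np.corrcoef(horas_estudio, calificacion)[0, 1]
print(f"Manual: {r_manual:.3f}  np.corrcoef: {r_numpy:.3f}")
```

A value close to 1 means the points lie near a rising straight line, which matches what the scatter plot shows.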

### Histogram

In [None]:
# Age distribution
edades = np.random.normal(35, 10, 1000)  # mean 35, std. dev. 10

plt.figure(figsize=(10, 6))
plt.hist(edades, bins=30, edgecolor='black', alpha=0.7)
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.title('Age Distribution')
plt.axvline(edades.mean(), color='red', linestyle='--', linewidth=2, label=f'Mean: {edades.mean():.1f}')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
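
A histogram is just a count of values per bin. If you want those counts as numbers instead of a picture, `np.histogram` returns them directly. The sketch below uses a hypothetical seed so that it is repeatable; `conteos` and `bordes` are illustrative names.

```python
import numpy as np

np.random.seed(0)  # hypothetical seed, only to make the sketch repeatable
edades = np.random.normal(35, 10, 1000)

# Counts per bin and the bin edges, without drawing anything
conteos, bordes = np.histogram(edades, bins=30)

print("Number of bins:", len(conteos))       # 30
print("Number of bin edges:", len(bordes))   # 31
print("Total count:", conteos.sum())         # 1000

# Index of the fullest bin and the age range it covers
i = conteos.argmax()
print(f"Most populated bin: {bordes[i]:.1f} to {bordes[i + 1]:.1f}")
```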

### Subplots: Multiple Plots

In [None]:
# Create a 2x2 grid of plots
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

# Plot 1: Sine
x = np.linspace(0, 10, 100)
axes[0, 0].plot(x, np.sin(x))
axes[0, 0].set_title('Sine')
axes[0, 0].grid(True, alpha=0.3)

# Plot 2: Cosine
axes[0, 1].plot(x, np.cos(x), color='orange')
axes[0, 1].set_title('Cosine')
axes[0, 1].grid(True, alpha=0.3)

# Plot 3: Scatter
axes[1, 0].scatter(np.random.randn(100), np.random.randn(100), alpha=0.5)
axes[1, 0].set_title('Random Data')
axes[1, 0].grid(True, alpha=0.3)

# Plot 4: Histogram
axes[1, 1].hist(np.random.randn(1000), bins=30, edgecolor='black', alpha=0.7)
axes[1, 1].set_title('Normal Distribution')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

---

## 5. First Contact with Scikit-learn

**Scikit-learn** is the standard library for classical Machine Learning in Python.

### Anatomy of a Model in Scikit-learn

All models in sklearn follow the same pattern:

```python
# Schematic only: `algoritmo`, `Modelo`, and `parametros` are placeholders
# 1. Import the model
from sklearn.algoritmo import Modelo

# 2. Create an instance of the model
modelo = Modelo(parametros)

# 3. Train it on data
modelo.fit(X_train, y_train)

# 4. Make predictions
predicciones = modelo.predict(X_test)

# 5. Evaluate
score = modelo.score(X_test, y_test)
```
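
The schematic above can be filled in with real estimators. The following sketch, which uses the Iris dataset explored in the next section, runs the same `fit` / `predict` / `score` calls on two different algorithms. The choice of `DecisionTreeClassifier` as the second model and the dictionary names are illustrative, not part of the original lesson.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Two different algorithms, same fit / predict / score calls
modelos = {
    "KNN": KNeighborsClassifier(n_neighbors=3),
    "Decision tree": DecisionTreeClassifier(random_state=42),
}

resultados = {}
for nombre, modelo in modelos.items():
    modelo.fit(X_train, y_train)                        # 3. train
    predicciones = modelo.predict(X_test)               # 4. predict
    resultados[nombre] = modelo.score(X_test, y_test)   # 5. evaluate
    print(f"{nombre}: accuracy = {resultados[nombre]:.2%}")
```

Because the interface is shared, swapping one algorithm for another usually means changing only the import and the constructor line.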

### Example: Iris Dataset (ML Classic)

The Iris dataset contains measurements of flowers and their species. It's perfect for learning!

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load the dataset
iris = load_iris()
X = iris.data  # Features (4 flower measurements)
y = iris.target  # Labels (3 species)

print("Dataset information:")
print(f"  Number of samples: {X.shape[0]}")
print(f"  Number of features: {X.shape[1]}")
print(f"  Classes: {iris.target_names}")
print(f"\nFeatures: {iris.feature_names}")

In [None]:
# Look at a few examples
df_iris = pd.DataFrame(X, columns=iris.feature_names)
df_iris['species'] = iris.target_names[y]
print("\nFirst 5 rows:")
print(df_iris.head())

In [None]:
# Visualize the relationship between two features
plt.figure(figsize=(10, 6))
colors = ['red', 'green', 'blue']
for i, species in enumerate(iris.target_names):
    mask = y == i
    plt.scatter(X[mask, 0], X[mask, 1], 
                c=colors[i], label=species, 
                alpha=0.6, s=100)

plt.xlabel('Sepal Length (cm)')
plt.ylabel('Sepal Width (cm)')
plt.title('Iris Dataset: Sepal Length vs Width')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

### Train a Simple Model: K-Nearest Neighbors (KNN)


In [None]:
# 1. Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(f"Training data: {X_train.shape[0]} samples")
print(f"Test data: {X_test.shape[0]} samples")
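
A plain random split does not guarantee that each species appears in the same proportion in both subsets. As an optional variation, `train_test_split` accepts a `stratify` argument that preserves the class balance. This is a sketch of that option, not the split used in the rest of this notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions equal in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print("Train class counts:", np.bincount(y_train))
print("Test class counts: ", np.bincount(y_test))
```

Stratifying matters most when classes are imbalanced or the data set is small.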

In [None]:
# 2. Create and train the model
modelo = KNeighborsClassifier(n_neighbors=3)
modelo.fit(X_train, y_train)

print("✅ Model trained successfully")

In [None]:
# 3. Make predictions
y_pred = modelo.predict(X_test)

print("First 10 predictions:")
print(f"  Predicted: {y_pred[:10]}")
print(f"  Actual:    {y_test[:10]}")

In [None]:
# 4. Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"\n🎯 Accuracy: {accuracy:.2%}")

print("\nClassification report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
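
Accuracy is simply the fraction of test samples whose predicted label matches the true label. The sketch below recomputes it by hand and adds a confusion matrix, which shows which species get confused with which. It repeats the split and model from above so that it runs on its own; `accuracy_manual` and `matriz_confusion` are illustrative names.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
modelo = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
y_pred = modelo.predict(X_test)

# Accuracy is the fraction of predictions that match the true labels
accuracy_manual = np.mean(y_pred == y_test)

# Rows are true classes, columns are predicted classes
matriz_confusion = confusion_matrix(y_test, y_pred)

print(f"Manual accuracy: {accuracy_manual:.2%}")
print(f"accuracy_score:  {accuracy_score(y_test, y_pred):.2%}")
print(matriz_confusion)
```

The diagonal of the confusion matrix holds the correct predictions, so its sum divided by the total equals the accuracy.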

### 🎉 You just trained your first ML model!

With just a few lines of code:
- ✅ We loaded data
- ✅ We split it into train/test sets
- ✅ We trained a model
- ✅ We made predictions
- ✅ We evaluated performance

And we got >90% accuracy! 🚀

---

## 6. Complete Example: House Price Prediction

Let's create a more realistic example using **Linear Regression**.

### Scenario:
We want to predict the price of a house based on:
- Size (m²)
- Number of rooms
- Age of the house

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Generate realistic synthetic data
np.random.seed(42)
n_casas = 200

# Features
tamano = np.random.uniform(50, 300, n_casas)  # 50-300 m²
habitaciones = np.random.randint(1, 6, n_casas)  # 1-5 rooms
edad = np.random.uniform(0, 50, n_casas)  # 0-50 years

# Price based on a formula (with some noise)
precio_base = 100000
precio = (
    precio_base
    + tamano * 1500  # +$1,500 per m²
    + habitaciones * 20000  # +$20,000 per room
    - edad * 1000  # -$1,000 per year of age
    + np.random.normal(0, 30000, n_casas)  # random noise
)

# Create the DataFrame
casas_df = pd.DataFrame({
    'Tamaño_m2': tamano,
    'Habitaciones': habitaciones,
    'Edad_años': edad,
    'Precio_USD': precio
})

print("House dataset:")
print(casas_df.head(10))

In [None]:
# Descriptive statistics
print("\nStatistics:")
print(casas_df.describe())

In [None]:
# Visualize relationships
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Size vs Price
axes[0].scatter(casas_df['Tamaño_m2'], casas_df['Precio_USD'], alpha=0.5)
axes[0].set_xlabel('Size (m²)')
axes[0].set_ylabel('Price (USD)')
axes[0].set_title('Size vs Price')
axes[0].grid(True, alpha=0.3)

# Rooms vs Price
axes[1].scatter(casas_df['Habitaciones'], casas_df['Precio_USD'], alpha=0.5)
axes[1].set_xlabel('Rooms')
axes[1].set_ylabel('Price (USD)')
axes[1].set_title('Rooms vs Price')
axes[1].grid(True, alpha=0.3)

# Age vs Price
axes[2].scatter(casas_df['Edad_años'], casas_df['Precio_USD'], alpha=0.5)
axes[2].set_xlabel('Age (years)')
axes[2].set_ylabel('Price (USD)')
axes[2].set_title('Age vs Price')
axes[2].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

### Train the Linear Regression Model

In [None]:
# Prepare the data
X = casas_df[['Tamaño_m2', 'Habitaciones', 'Edad_años']].values
y = casas_df['Precio_USD'].values

# Split into train/test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(f"Training: {len(X_train)} houses")
print(f"Test: {len(X_test)} houses")

In [None]:
# Create and train the model
modelo_casas = LinearRegression()
modelo_casas.fit(X_train, y_train)

print("✅ Model trained")
print("\nLearned coefficients:")
print(f"  Size:      ${modelo_casas.coef_[0]:,.2f} per m²")
print(f"  Rooms:     ${modelo_casas.coef_[1]:,.2f} per room")
print(f"  Age:       ${modelo_casas.coef_[2]:,.2f} per year")
print(f"  Intercept: ${modelo_casas.intercept_:,.2f}")
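
`LinearRegression` fits an ordinary least squares model, so the same coefficients can be recovered with plain NumPy. The sketch below generates its own small data set in the spirit of the house example (with a hypothetical seed, so the numbers differ from the cells above) and compares scikit-learn with `np.linalg.lstsq`. The names `X_unos` and `beta` are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Small synthetic data set in the spirit of the house example
rng = np.random.default_rng(0)  # hypothetical seed
X = np.column_stack([
    rng.uniform(50, 300, 200),   # size in m²
    rng.integers(1, 6, 200),     # rooms
    rng.uniform(0, 50, 200),     # age in years
])
y = 100000 + X @ np.array([1500, 20000, -1000]) + rng.normal(0, 30000, 200)

modelo = LinearRegression().fit(X, y)

# Least squares by hand: add a column of ones for the intercept
X_unos = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X_unos, y, rcond=None)

print("sklearn:", modelo.intercept_, modelo.coef_)
print("lstsq:  ", beta[0], beta[1:])
```

Because of the noise, the learned coefficients land near the values used to generate the data (1500, 20000, -1000) rather than exactly on them.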

In [None]:
# Make predictions
y_pred = modelo_casas.predict(X_test)

# Evaluate
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

print("📊 Evaluation Metrics:")
print(f"  RMSE: ${rmse:,.2f}")
print(f"  R² Score: {r2:.3f}")
print("\nInterpretation:")
print(f"  The model explains {r2*100:.1f}% of the variance in prices")
print(f"  Typical prediction error (RMSE): ±${rmse:,.0f}")
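
To see what these metrics measure, the sketch below computes RMSE and R² directly from their formulas on four hypothetical prices, and adds the mean absolute error (MAE) for comparison. RMSE weights large errors more heavily than MAE does, so it is not quite the same thing as the average error. All values here are made up for illustration.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical prices, only to illustrate the formulas
y_real = np.array([200000.0, 350000.0, 275000.0, 420000.0])
y_pred = np.array([210000.0, 330000.0, 280000.0, 400000.0])

errores = y_real - y_pred

rmse = np.sqrt(np.mean(errores ** 2))   # penalizes large errors more
mae = np.mean(np.abs(errores))          # plain average absolute error
r2 = 1 - np.sum(errores ** 2) / np.sum((y_real - y_real.mean()) ** 2)

print(f"RMSE: {rmse:,.0f}  MAE: {mae:,.0f}  R²: {r2:.3f}")
```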

In [None]:
# Visualize predictions vs actual values
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, alpha=0.5)
plt.plot([y_test.min(), y_test.max()], 
         [y_test.min(), y_test.max()], 
         'r--', linewidth=2, label='Perfect prediction')
plt.xlabel('Actual Price (USD)')
plt.ylabel('Predicted Price (USD)')
plt.title(f'Predictions vs Actual (R² = {r2:.3f})')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

### Prediction for a New House

In [None]:
# New house: 150 m², 3 rooms, 10 years old
casa_nueva = np.array([[150, 3, 10]])

precio_predicho = modelo_casas.predict(casa_nueva)[0]

print("🏠 Prediction for the new house:")
print("  Features: 150 m², 3 rooms, 10 years old")
print(f"  Estimated price: ${precio_predicho:,.2f}")
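
Under the hood, a linear regression prediction is the intercept plus each coefficient multiplied by its feature. The sketch below checks this on hypothetical noise-free data where the true relationship is known (y = 5 + 2·x1 + 3·x2). The data and names are illustrative and unrelated to the house data set.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data: y = 5 + 2*x1 + 3*x2
X = np.array([[1, 0], [0, 1], [1, 1], [2, 1], [1, 3]], dtype=float)
y = 5 + 2 * X[:, 0] + 3 * X[:, 1]

modelo = LinearRegression().fit(X, y)

nuevo = np.array([[4.0, 2.0]])
pred_sklearn = modelo.predict(nuevo)[0]

# Same prediction by hand: intercept + sum(coefficient * feature)
pred_manual = modelo.intercept_ + np.dot(modelo.coef_, nuevo[0])

print("predict():", pred_sklearn)
print("By hand:  ", pred_manual)
```

The same arithmetic applies to `modelo_casas`: its intercept plus the three learned coefficients times 150, 3, and 10.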

---

## 🎓 Summary

### ✅ You have learned:

1. **NumPy**: Arrays, vectorized operations, statistics
2. **Pandas**: DataFrames, filtering, grouping, analysis
3. **Matplotlib**: Line plots, scatter plots, histograms, subplots
4. **Scikit-learn**: Data loading, train/test split, models, prediction
5. **ML Pipeline**: From data to predictions

### 🚀 Next Steps:

1. Complete the **exercises** to practice
2. Experiment by modifying the examples
3. Continue with the next lesson: **Simple Linear Regression**

---

## 💡 Important Tips

- 🔄 **Practice regularly**: ML is learned by doing
- 📊 **Always visualize**: Charts reveal patterns
- 🧪 **Experiment**: Change parameters and see what happens
- 📖 **Read the documentation**: sklearn has great docs
- 🤝 **Collaborate**: Discuss with colleagues and learn from others

---

## 📚 Additional Resources

- [NumPy Quickstart](https://numpy.org/doc/stable/user/quickstart.html)
- [Pandas 10 Minutes Tutorial](https://pandas.pydata.org/docs/user_guide/10min.html)
- [Matplotlib Tutorials](https://matplotlib.org/stable/tutorials/index.html)
- [Scikit-learn Getting Started](https://scikit-learn.org/stable/getting_started.html)

---

**Congratulations on completing the guided practice! 🎉**

Now you are ready for the exercises. Good luck! 💪