# 30 Days of Machine Learning

Welcome to the **30 Days of Machine Learning** project! This Jupyter Notebook guides you through 30 days of hands-on machine learning projects, from beginner to intermediate levels. Each day focuses on a specific concept, technique, or project, building your skills progressively.

## How to Use This Notebook
- **Daily Tasks**: Each day has a markdown cell with objectives, resources, and steps, followed by code cells for implementation.
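- **Code Cells**: Each day includes a complete code example. Many cells use small simulated or built-in datasets so they run without downloads; swap in the linked datasets for more realistic results.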
- **Resources**: Links to datasets (e.g., Kaggle) and tutorials are provided.
- **Progress Tracking**: Run and modify code cells as you complete each task. Save your work regularly.
- **Sharing**: On Day 30, share your projects on GitHub or Kaggle.

## Prerequisites
- Python 3.x
- Libraries: NumPy, Pandas, Scikit-learn, Statsmodels, TensorFlow/Keras, Matplotlib, Seaborn, Gensim, Gymnasium, Flask, NLTK, Rasa, XGBoost
- Install dependencies: `pip install numpy pandas scikit-learn statsmodels tensorflow matplotlib seaborn gensim gymnasium flask nltk rasa xgboost`

Let's get started!

## Day 1: Introduction to Python for ML
**Objective**: Set up your environment and learn NumPy and Pandas basics.
**Resources**: [NumPy Docs](https://numpy.org/doc/stable/), [Pandas Docs](https://pandas.pydata.org/docs/)
**Steps**:
1. Install Python, Jupyter, and required libraries.
2. Create NumPy arrays and perform basic operations.
3. Load a CSV file with Pandas and explore its structure.
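
The cell below builds its DataFrame directly. For step 3, here is a sketch of actually parsing CSV text with `pd.read_csv`; it reads from an in-memory string via `io.StringIO` so it runs without a file, and the filename in the comment is hypothetical:

```python
import io

import pandas as pd

# CSV text held in memory; with a real file you would call
# pd.read_csv("your_file.csv") instead (hypothetical filename)
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,London
Charlie,35,Paris
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)        # (rows, columns)
print(df.dtypes)       # column types inferred by the parser
df.info()              # non-null counts and memory usage
print(df.describe())   # summary statistics for numeric columns
```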

In [None]:
import numpy as np
import pandas as pd

# Create a 2D NumPy array
array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("2D Array:\n", array)

# Perform basic operations
array_sum = np.sum(array)
array_mean = np.mean(array)
print("Sum:", array_sum)
print("Mean:", array_mean)

# Create a sample DataFrame (simulating a CSV)
data = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Paris']
})
print("\nFirst 5 rows of DataFrame:\n", data.head())

## Day 2: Exploratory Data Analysis (EDA)
**Objective**: Perform EDA on a Kaggle dataset.
**Dataset**: [Titanic](https://www.kaggle.com/c/titanic/data)
**Steps**:
1. Load the dataset with Pandas.
2. Visualize distributions with Matplotlib/Seaborn.
3. Identify missing values and correlations.
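
The simplified frame in the cell below has only numeric columns, so `data.corr()` works. The real Titanic CSV also has text columns such as `Name` and `Sex`, and since pandas 2.0 `DataFrame.corr()` no longer drops them silently, so pass `numeric_only=True`. A sketch on a made-up mixed-type frame:

```python
import numpy as np
import pandas as pd

# Made-up rows mixing numeric and text columns, like the real Titanic CSV
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22, 38, np.nan, 35],
    "Fare": [7.25, 71.83, 7.92, 53.10],
})

# Count missing values per column
missing = df.isnull().sum()
print(missing)

# Correlate numeric columns only; the text column is skipped
corr = df.corr(numeric_only=True)
print(corr.round(2))
```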

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load Titanic dataset (download from Kaggle or use sample data)
# For this example, we'll create a simplified dataset
data = pd.DataFrame({
    'Survived': [0, 1, 1, 0, 1],
    'Pclass': [3, 1, 2, 3, 1],
    'Age': [22, 38, 26, np.nan, 35],
    'Fare': [7.25, 71.83, 13.0, 8.05, 53.1]
})

# Summarize missing values
print("Missing Values:\n", data.isnull().sum())

# Plot histograms
data['Age'].hist(bins=10)
plt.title('Age Distribution')
plt.xlabel('Age')
plt.ylabel('Count')
plt.show()

# Correlation heatmap
sns.heatmap(data.corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()

## Day 3: Linear Regression
**Objective**: Build a linear regression model to predict house prices.
**Dataset**: [House Prices](https://www.kaggle.com/c/house-prices-advanced-regression-techniques)
**Steps**:
1. Load and preprocess the dataset.
2. Train a linear regression model.
3. Evaluate with RMSE and visualize predictions.
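
`LinearRegression` fits ordinary least squares: it picks the intercept and slope that minimise the squared error, and RMSE is the square root of that mean squared error. A sketch that fits the same kind of simulated data with `np.linalg.lstsq` and checks it against scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((100, 1)) * 10                     # square footage (simulated)
y = 50 + 30 * X[:, 0] + rng.normal(0, 10, 100)    # price with noise

# Prepend a column of ones so the first coefficient is the intercept
X_design = np.hstack([np.ones((X.shape[0], 1)), X])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, slope = coef

model = LinearRegression().fit(X, y)

# RMSE by hand: square root of the mean squared residual
y_pred = X_design @ coef
rmse = np.sqrt(np.mean((y - y_pred) ** 2))
print(f"intercept={intercept:.2f}, slope={slope:.2f}, RMSE={rmse:.2f}")
print(f"sklearn:   intercept={model.intercept_:.2f}, slope={model.coef_[0]:.2f}")
```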

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt

# Load dataset (replace with actual path)
# For this example, we'll simulate a simple dataset
np.random.seed(42)
X = np.random.rand(100, 1) * 10  # Square footage
y = 50 + 30 * X + np.random.randn(100, 1) * 10  # Price

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LinearRegression()
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f'RMSE: {rmse:.2f}')

# Visualize
plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test, y_pred, color='red', label='Predicted')
plt.xlabel('Square Footage')
plt.ylabel('Price')
plt.legend()
plt.show()

## Day 4: Data Preprocessing
**Objective**: Handle missing values and encode categorical data.
**Dataset**: Titanic or any dataset with missing values
**Steps**:
1. Impute missing numerical values (mean/median).
2. Encode categorical variables (One-Hot or Label Encoding).
3. Scale numerical features.

In [None]:
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Create sample dataset (simulating Titanic)
data = pd.DataFrame({
    'Age': [22, 38, np.nan, 35],
    'Sex': ['male', 'female', 'male', 'female'],
    'Fare': [7.25, 71.83, 8.05, 53.1]
})

# Impute missing values
imputer = SimpleImputer(strategy='mean')
data['Age'] = imputer.fit_transform(data[['Age']])

# Encode categorical variables
encoder = OneHotEncoder(sparse_output=False, drop='first')  # `sparse` was renamed `sparse_output` in scikit-learn 1.2
sex_encoded = encoder.fit_transform(data[['Sex']])
data['Sex_male'] = sex_encoded[:, 0]
data = data.drop('Sex', axis=1)

# Scale numerical features
scaler = StandardScaler()
data[['Age', 'Fare']] = scaler.fit_transform(data[['Age', 'Fare']])

print("Preprocessed Data:\n", data)

## Day 5: Logistic Regression
**Objective**: Build a binary classification model.
**Dataset**: [Iris](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_iris.html)
**Steps**:
1. Load Iris dataset (binary subset).
2. Train logistic regression model.
3. Evaluate with accuracy and confusion matrix.
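
Logistic regression computes a linear score $z = w \cdot x + b$ and passes it through the sigmoid $\sigma(z) = 1 / (1 + e^{-z})$ to get $P(y = 1 \mid x)$. A sketch that reproduces `predict_proba` from the fitted `coef_` and `intercept_` on the same binary Iris subset:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
mask = iris.target != 2
X = iris.data[mask][:, :2]
y = iris.target[mask]

model = LogisticRegression().fit(X, y)

# Linear score, then the sigmoid squashes it into (0, 1)
z = X @ model.coef_[0] + model.intercept_[0]
p_manual = 1.0 / (1.0 + np.exp(-z))

# Column 1 of predict_proba is P(class 1 | x)
p_sklearn = model.predict_proba(X)[:, 1]
print(np.allclose(p_manual, p_sklearn))

# predict() chooses class 1 when z > 0, i.e. when p > 0.5
manual_labels = (z > 0).astype(int)
print((manual_labels == model.predict(X)).all())
```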

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Load Iris dataset (binary subset: Setosa vs. Versicolor)
iris = load_iris()
X = iris.data[iris.target != 2][:, :2]  # Use first two features
y = iris.target[iris.target != 2]

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train logistic regression
model = LogisticRegression()
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Visualize confusion matrix
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()

## Day 6: Decision Trees
**Objective**: Use decision trees for classification.
**Dataset**: [Wine](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_wine.html)
**Steps**:
1. Load Wine dataset.
2. Train a decision tree classifier.
3. Visualize the tree.
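
`DecisionTreeClassifier` chooses each split to reduce impurity; its default criterion is Gini impurity, $G = 1 - \sum_k p_k^2$, where $p_k$ is the share of class $k$ in a node. A small sketch computing Gini for a node and the weighted impurity of two candidate splits (the helper names are illustrative):

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_impurity(left, right):
    """Impurity after a split, weighting each child by its share of samples."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

parent = np.array([0, 0, 0, 1, 1, 1])
print(gini(parent))                                      # 0.5
print(split_impurity(parent[:3], parent[3:]))            # 0.0, a perfect split
print(split_impurity(parent[[0, 1, 3]], parent[[2, 4, 5]]))  # about 0.444, a poor split
```

The tree would prefer the first split, because it reduces impurity from 0.5 to 0.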

In [None]:
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Load Wine dataset
wine = load_wine()
X, y = wine.data, wine.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train decision tree
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)

# Visualize tree
plt.figure(figsize=(12, 8))
plot_tree(model, feature_names=wine.feature_names, class_names=wine.target_names, filled=True)
plt.title('Decision Tree')
plt.show()

## Day 7: Random Forest
**Objective**: Improve classification with Random Forest.
**Dataset**: Wine or Titanic
**Steps**:
1. Train a Random Forest classifier.
2. Compare performance with decision tree.
3. Analyze feature importance.
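
`feature_importances_` in the cell below is impurity-based and computed from the training data, which can favour features with many distinct values. `sklearn.inspection.permutation_importance` is an alternative: it shuffles one feature at a time on held-out data and measures how much the score drops. A sketch on the Wine data:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=42, stratify=wine.target
)

rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)

# Shuffle each test-set column 10 times and record the mean accuracy drop
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=42)

order = np.argsort(result.importances_mean)[::-1]
for i in order[:5]:
    print(f"{wine.feature_names[i]:<30} {result.importances_mean[i]:.3f} ± {result.importances_std[i]:.3f}")
```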

In [None]:
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import numpy as np

# Load Wine dataset
wine = load_wine()
X, y = wine.data, wine.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Decision Tree
dt_model = DecisionTreeClassifier(random_state=42)
dt_model.fit(X_train, y_train)
dt_pred = dt_model.predict(X_test)
dt_accuracy = accuracy_score(y_test, dt_pred)

# Train Random Forest
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
rf_pred = rf_model.predict(X_test)
rf_accuracy = accuracy_score(y_test, rf_pred)

print(f'Decision Tree Accuracy: {dt_accuracy:.2f}')
print(f'Random Forest Accuracy: {rf_accuracy:.2f}')

# Feature importance
importances = rf_model.feature_importances_
indices = np.argsort(importances)[::-1]
plt.bar(range(X.shape[1]), importances[indices], align='center')
plt.xticks(range(X.shape[1]), np.array(wine.feature_names)[indices], rotation=90)
plt.title('Feature Importance')
plt.show()

## Day 8: K-Nearest Neighbors (KNN)
**Objective**: Implement KNN for classification.
**Dataset**: Iris
**Steps**:
1. Train KNN model.
2. Experiment with different k values.
3. Evaluate performance.
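
The cell below picks `best_k` by scoring on the test set, so the reported accuracy is optimistic: the test set has been used for tuning. A less biased sketch chooses k by 5-fold cross-validation on the training split only, and standardises features first because KNN is distance-based:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

k_values = list(range(1, 21))
cv_means = []
for k in k_values:
    pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    cv_means.append(cross_val_score(pipe, X_train, y_train, cv=5).mean())

best_k = k_values[int(np.argmax(cv_means))]

# Refit on the whole training split, then touch the test split exactly once
final = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=best_k))
final.fit(X_train, y_train)
test_acc = final.score(X_test, y_test)
print(f"best k (by CV): {best_k}, test accuracy: {test_acc:.2f}")
```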

In [None]:
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import numpy as np

# Load Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Experiment with different k values
k_values = range(1, 21)
accuracies = []

for k in k_values:
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracies.append(accuracy_score(y_test, y_pred))

# Plot accuracies
plt.plot(k_values, accuracies, marker='o')
plt.title('KNN Accuracy vs. k')
plt.xlabel('k')
plt.ylabel('Accuracy')
plt.show()

# Train final model with best k (k was chosen on the test set, so this accuracy is optimistic)
best_k = k_values[np.argmax(accuracies)]
model = KNeighborsClassifier(n_neighbors=best_k)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f'Best k: {best_k}, Accuracy: {accuracy_score(y_test, y_pred):.2f}')

## Day 9: Support Vector Machines (SVM)
**Objective**: Use SVM for classification.
**Dataset**: Iris
**Steps**:
1. Train SVM with different kernels.
2. Evaluate performance.
3. Visualize decision boundaries (for 2D data).
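
The `rbf` kernel scores similarity as $K(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$. With `SVC`'s default `gamma='scale'`, scikit-learn uses $\gamma = 1 / (n_{features} \cdot \mathrm{Var}(X))$. A sketch that computes the kernel matrix by hand and checks it against `sklearn.metrics.pairwise.rbf_kernel`:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics.pairwise import rbf_kernel

X = load_iris().data[:, :2]

# gamma='scale': 1 / (n_features * variance of all values in X)
gamma = 1.0 / (X.shape[1] * X.var())

# Squared Euclidean distance between every pair of rows via broadcasting
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_manual = np.exp(-gamma * sq_dists)

K_sklearn = rbf_kernel(X, X, gamma=gamma)
print(np.allclose(K_manual, K_sklearn))
print(K_manual[0, :3].round(3))   # similarity of the first flower to the first three
```

Identical points get similarity 1 and distant points approach 0, which is why the RBF kernel can draw curved boundaries that the linear kernel cannot.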

In [None]:
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import numpy as np
import matplotlib.pyplot as plt

# Load Iris dataset (use 2 features for visualization)
iris = load_iris()
X = iris.data[:, :2]  # First two features
y = iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train SVM with different kernels
kernels = ['linear', 'rbf']
for kernel in kernels:
    model = SVC(kernel=kernel, random_state=42)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(f'{kernel} kernel Accuracy: {accuracy_score(y_test, y_pred):.2f}')

# Visualize decision boundaries (linear kernel)
model = SVC(kernel='linear', random_state=42)
model.fit(X_train, y_train)

x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.02), np.arange(y_min, y_max, 0.02))
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm, edgecolors='k')
plt.title('SVM Decision Boundaries (Linear Kernel)')
plt.xlabel('Sepal Length')
plt.ylabel('Sepal Width')
plt.show()

## Day 10: K-Means Clustering
**Objective**: Perform clustering on customer data.
**Dataset**: [Mall Customers](https://www.kaggle.com/vjchoudhary7/customer-segmentation-tutorial-in-python)
**Steps**:
1. Load dataset and select features.
2. Apply K-Means clustering.
3. Visualize clusters and evaluate with silhouette score.
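
Age and income in the cell below differ in scale by about three orders of magnitude, so Euclidean distances, and therefore K-Means, are driven almost entirely by income. A sketch that standardises both features first and compares silhouette scores over several k on the same kind of simulated data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal([30, 30000], [5, 5000], (50, 2)),
    rng.normal([40, 60000], [5, 10000], (50, 2)),
    rng.normal([50, 90000], [5, 15000], (50, 2)),
])

# Put age and income on comparable scales (mean 0, standard deviation 1)
X_scaled = StandardScaler().fit_transform(X)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)
    print(f"k={k}: silhouette={scores[k]:.3f}")

best_k = max(scores, key=scores.get)
print("best k by silhouette:", best_k)
```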

In [None]:
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
import numpy as np
import matplotlib.pyplot as plt

# Simulate Mall Customers dataset
np.random.seed(42)
X = np.vstack([
    np.random.normal([30, 30000], [5, 5000], (50, 2)),
    np.random.normal([40, 60000], [5, 10000], (50, 2)),
    np.random.normal([50, 90000], [5, 15000], (50, 2))
])

# Apply K-Means
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)  # explicit n_init avoids a version-dependent default
labels = kmeans.fit_predict(X)

# Evaluate with silhouette score
silhouette = silhouette_score(X, labels)
print(f'Silhouette Score: {silhouette:.2f}')

# Visualize clusters
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], c='red', marker='x', s=200, label='Centroids')
plt.title('K-Means Clustering')
plt.xlabel('Age')
plt.ylabel('Annual Income')
plt.legend()
plt.show()

## Day 11: Principal Component Analysis (PCA)
**Objective**: Reduce dimensionality with PCA.
**Dataset**: Iris
**Steps**:
1. Apply PCA to reduce dimensions.
2. Visualize explained variance.
3. Train a classifier on reduced data.
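
PCA centres the data and projects it onto the top right-singular vectors of the centred matrix; each component's explained-variance ratio is its squared singular value divided by the sum over all components. A sketch that reproduces `explained_variance_ratio_` with NumPy's SVD:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Centre each feature, then take the SVD of the centred matrix
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Variance along each component is proportional to S**2
ratio_manual = S**2 / np.sum(S**2)
projected_manual = X_centered @ Vt[:2].T

pca = PCA(n_components=2).fit(X)
print(ratio_manual[:2].round(4), pca.explained_variance_ratio_.round(4))

# Components are only defined up to sign, so compare absolute values
print(np.allclose(np.abs(projected_manual), np.abs(pca.transform(X))))
```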

In [None]:
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

# Load Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Apply PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Visualize explained variance
print(f'Explained Variance Ratio: {pca.explained_variance_ratio_}')
plt.bar(range(1, 3), pca.explained_variance_ratio_, tick_label=['PC1', 'PC2'])
plt.title('Explained Variance by Principal Components')
plt.ylabel('Variance Ratio')
plt.show()

# Train classifier on reduced data
X_train, X_test, y_train, y_test = train_test_split(X_pca, y, test_size=0.2, random_state=42)
model = LogisticRegression(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f'Accuracy on PCA data: {accuracy_score(y_test, y_pred):.2f}')

## Day 12: Gradient Boosting with XGBoost
**Objective**: Use XGBoost for classification.
**Dataset**: Titanic
**Steps**:
1. Train XGBoost model.
2. Evaluate performance.
3. Analyze feature importance.

In [None]:
from xgboost import XGBClassifier, plot_importance
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import pandas as pd
import matplotlib.pyplot as plt

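# Simulate a tiny Titanic-like dataset (5 rows, so the 1-row test split below only demonstrates the API; the accuracy is not meaningful)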
data = pd.DataFrame({
    'Pclass': [1, 2, 3, 1, 3],
    'Age': [38, 26, 22, 35, 28],
    'Fare': [71.83, 13.0, 7.25, 53.1, 8.05],
    'Survived': [1, 1, 0, 1, 0]
})
X = data.drop('Survived', axis=1)
y = data['Survived']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train XGBoost
model = XGBClassifier(random_state=42, eval_metric='logloss')  # use_label_encoder is deprecated and ignored in recent XGBoost
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
print(f'Accuracy: {accuracy_score(y_test, y_pred):.2f}')

# Feature importance
plot_importance(model)
plt.show()

## Day 13: Hyperparameter Tuning
**Objective**: Tune model hyperparameters.
**Dataset**: Any from previous days
**Steps**:
1. Use GridSearchCV to tune parameters.
2. Compare performance before and after tuning.
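
An exhaustive grid grows multiplicatively: the cell below tries 3 × 3 × 2 = 18 combinations, each fitted 5 times. `RandomizedSearchCV` samples a fixed number of combinations instead, which scales better to large search spaces. A sketch using `scipy.stats.randint` distributions; the ranges are illustrative choices, not recommendations:

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

param_distributions = {
    "n_estimators": randint(50, 201),     # integers 50..200
    "max_depth": [None, 5, 10, 20],       # lists are sampled uniformly
    "min_samples_split": randint(2, 6),   # integers 2..5
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=10,           # number of sampled combinations
    cv=5,
    scoring="accuracy",
    random_state=42,
)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print(f"test accuracy: {search.score(X_test, y_test):.2f}")
```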

In [None]:
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import accuracy_score

# Load Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train default model
default_model = RandomForestClassifier(random_state=42)
default_model.fit(X_train, y_train)
default_pred = default_model.predict(X_test)
default_accuracy = accuracy_score(y_test, default_pred)

# Define parameter grid
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5]
}

# Run GridSearchCV
model = RandomForestClassifier(random_state=42)
grid_search = GridSearchCV(model, param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

# Evaluate tuned model
tuned_model = grid_search.best_estimator_
tuned_pred = tuned_model.predict(X_test)
tuned_accuracy = accuracy_score(y_test, tuned_pred)

print(f'Default Accuracy: {default_accuracy:.2f}')
print(f'Tuned Accuracy: {tuned_accuracy:.2f}')
print(f'Best Parameters: {grid_search.best_params_}')

## Day 14: Cross-Validation
**Objective**: Implement cross-validation for robust evaluation.
**Dataset**: Any from previous days
**Steps**:
1. Perform k-fold cross-validation.
2. Compare with train-test split results.
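
For a classifier, `cross_val_score(..., cv=5)` uses stratified 5-fold splitting: each fold serves once as the validation set while a fresh copy of the model is fitted on the other four. The same procedure written out with `StratifiedKFold`, checked against `cross_val_score`:

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)

skf = StratifiedKFold(n_splits=5)   # no shuffling, matching cv=5 for classifiers
fold_scores = []
for train_idx, val_idx in skf.split(X, y):
    fold_model = clone(model)                      # unfitted copy for each fold
    fold_model.fit(X[train_idx], y[train_idx])
    fold_scores.append(fold_model.score(X[val_idx], y[val_idx]))

fold_scores = np.array(fold_scores)
print(fold_scores.round(3), f"mean={fold_scores.mean():.3f} ± {fold_scores.std():.3f}")
print(np.allclose(fold_scores, cross_val_score(model, X, y, cv=5)))
```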

In [None]:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import accuracy_score

# Load Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression(random_state=42, max_iter=200)
model.fit(X_train, y_train)
split_pred = model.predict(X_test)
split_accuracy = accuracy_score(y_test, split_pred)

# K-fold cross-validation
cv_scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')

print(f'Train-Test Split Accuracy: {split_accuracy:.2f}')
print(f'Cross-Validation Accuracy: {cv_scores.mean():.2f} ± {cv_scores.std():.2f}')

## Day 15: Confusion Matrix and Metrics
**Objective**: Evaluate classification models.
**Dataset**: Any classification dataset
**Steps**:
1. Generate confusion matrix.
2. Calculate precision, recall, F1-score.
3. Visualize results.
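
Per class, precision is TP / (TP + FP), read down a column of the confusion matrix, and recall is TP / (TP + FN), read along a row. `average='weighted'` then averages the per-class values weighted by each class's number of true samples. A sketch on small made-up label arrays, checked against scikit-learn:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Made-up true and predicted labels for three classes
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 1, 2, 2, 0])

cm = confusion_matrix(y_true, y_pred)   # rows: actual class, columns: predicted class
tp = np.diag(cm)
precision = tp / cm.sum(axis=0)         # TP / (TP + FP), per column
recall = tp / cm.sum(axis=1)            # TP / (TP + FN), per row
f1 = 2 * precision * recall / (precision + recall)

# Weight each class by its support (number of true samples)
weights = cm.sum(axis=1) / cm.sum()
weighted_precision = np.sum(precision * weights)
weighted_recall = np.sum(recall * weights)
weighted_f1 = np.sum(f1 * weights)

print(f"precision: {weighted_precision:.3f} vs sklearn {precision_score(y_true, y_pred, average='weighted'):.3f}")
print(f"recall:    {weighted_recall:.3f} vs sklearn {recall_score(y_true, y_pred, average='weighted'):.3f}")
print(f"F1:        {weighted_f1:.3f} vs sklearn {f1_score(y_true, y_pred, average='weighted'):.3f}")
```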

In [None]:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score
import seaborn as sns
import matplotlib.pyplot as plt

# Load Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = LogisticRegression(random_state=42, max_iter=200)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Generate confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Calculate metrics
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

print(f'Precision: {precision:.2f}')
print(f'Recall: {recall:.2f}')
print(f'F1-Score: {f1:.2f}')

# Visualize confusion matrix
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.show()

## Day 16: Time Series Forecasting
**Objective**: Forecast time series data with ARIMA.
**Dataset**: [Air Passengers](https://www.kaggle.com/rakannimer/air-passengers)
**Steps**:
1. Load and preprocess time series data.
2. Fit ARIMA model.
3. Forecast and visualize results.
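
The middle number in `order=(5, 1, 0)` is d, the number of times the series is differenced before the AR part is fitted: with d = 1 the model works on $y_t - y_{t-1}$, and forecasts of the differences are turned back into levels by cumulative summation from the last observed value. A NumPy sketch of that round trip; the constant "drift" forecast of the differences is a hypothetical stand-in for what ARIMA would predict:

```python
import numpy as np

rng = np.random.default_rng(42)
series = np.cumsum(rng.normal(size=144)) + 100   # simulated random walk

# First difference: y[t] - y[t-1], one value shorter than the series
diffs = np.diff(series)

# Invert: start from the first level and add the differences back cumulatively
reconstructed = np.concatenate([[series[0]], series[0] + np.cumsum(diffs)])
print(np.allclose(reconstructed, series))

# Forecast differences (here: hypothetical constant drift), then rebuild levels
future_diffs = np.full(12, diffs.mean())
level_forecast = series[-1] + np.cumsum(future_diffs)
print(level_forecast[:3].round(2))
```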

In [None]:
from statsmodels.tsa.arima.model import ARIMA
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Simulate Air Passengers dataset
np.random.seed(42)
dates = pd.date_range(start='1949-01-01', periods=144, freq='MS')  # month-start dates; the plain 'M' alias is deprecated in pandas 2.2+
data = pd.Series(np.cumsum(np.random.randn(144)) + 100, index=dates)

# Fit ARIMA model
model = ARIMA(data, order=(5, 1, 0))
model_fit = model.fit()

# Forecast
forecast = model_fit.forecast(steps=12)

# Visualize
plt.plot(data, label='Historical')
plt.plot(forecast, label='Forecast', color='red')
plt.title('ARIMA Forecast')
plt.xlabel('Date')
plt.ylabel('Passengers')
plt.legend()
plt.show()

## Day 17: Introduction to Neural Networks
**Objective**: Learn TensorFlow/Keras basics.
**Dataset**: Iris
**Steps**:
1. Build a simple neural network.
2. Train and evaluate model.
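
Each `Dense` layer computes `activation(x @ W + b)`, and the final softmax turns the three output scores into class probabilities that sum to 1. A NumPy sketch of one forward pass through a 4 → 16 → 8 → 3 network shaped like the one below, using random untrained weights:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtracting the row max avoids overflow without changing the result
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
sizes = [4, 16, 8, 3]
# One (weights, biases) pair per layer: random weights, zero biases (untrained)
params = [(rng.normal(0, 0.5, (m, n)), np.zeros(n)) for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x):
    h = x
    for i, (W, b) in enumerate(params):
        z = h @ W + b
        h = softmax(z) if i == len(params) - 1 else relu(z)
    return h

x = rng.normal(size=(5, 4))   # 5 made-up samples with 4 features each
probs = forward(x)
print(probs.round(3))
print(probs.sum(axis=1))      # every row sums to 1
```

Training replaces the random weights with ones that minimise categorical cross-entropy; the forward computation stays the same.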

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

# Load Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
y = to_categorical(y)  # One-hot encode labels

# Split and scale data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Build neural network
model = Sequential([
    Dense(16, activation='relu', input_shape=(4,)),
    Dense(8, activation='relu'),
    Dense(3, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train and evaluate
model.fit(X_train, y_train, epochs=50, batch_size=16, verbose=0)
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f'Test Accuracy: {accuracy:.2f}')

## Day 18: Feedforward Neural Network
**Objective**: Build a deeper neural network.
**Dataset**: Any classification dataset
**Steps**:
1. Design a multi-layer network.
2. Train and evaluate.
3. Visualize loss curves.

In [None]:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt

# Load Iris dataset
iris = load_iris()
X, y = iris.data, iris.target
y = to_categorical(y)

# Split and scale data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Design deeper network
model = Sequential([
    Dense(32, activation='relu', input_shape=(4,)),
    Dense(16, activation='relu'),
    Dense(8, activation='relu'),
    Dense(3, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train
history = model.fit(X_train, y_train, epochs=100, batch_size=16, validation_split=0.2, verbose=0)

# Evaluate
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f'Test Accuracy: {accuracy:.2f}')

# Visualize loss
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Loss Curves')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()

## Day 19: Convolutional Neural Networks (CNNs)
**Objective**: Build a CNN for image classification.
**Dataset**: [MNIST](https://www.tensorflow.org/datasets/catalog/mnist)
**Steps**:
1. Load and preprocess images.
2. Build and train CNN.
3. Evaluate and visualize predictions.
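
A quick sanity check for a CNN is to work out its shapes and parameter counts by hand. With `Conv2D`'s default 'valid' padding and stride 1, a 3×3 convolution shrinks 28×28 to 26×26; 2×2 max pooling (stride 2 by default) halves that to 13×13; flattening 32 channels gives 13 × 13 × 32 = 5408 inputs to the first dense layer. A sketch of that arithmetic, which should agree with `model.summary()` for the network below:

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1

# Conv2D(32, (3, 3)) on a 28x28x1 input with 'valid' padding
conv_hw = conv_output_size(28, 3)                  # 26
conv_params = 3 * 3 * 1 * 32 + 32                  # kernel weights + one bias per filter

# MaxPooling2D((2, 2)) uses stride 2 by default
pool_hw = conv_output_size(conv_hw, 2, stride=2)   # 13
flat = pool_hw * pool_hw * 32                      # 5408

dense1_params = flat * 64 + 64
dense2_params = 64 * 10 + 10
total = conv_params + dense1_params + dense2_params
print(conv_hw, pool_hw, flat)                      # 26 13 5408
print(conv_params, dense1_params, dense2_params, total)  # 320 346176 650 347146
```

Almost all parameters sit in the first dense layer, which is why deeper CNNs usually add more conv/pool stages before flattening.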

In [None]:
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt

# Load and preprocess MNIST
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(-1, 28, 28, 1) / 255.0
X_test = X_test.reshape(-1, 28, 28, 1) / 255.0
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# Build CNN
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation='relu'),
    Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train
model.fit(X_train, y_train, epochs=5, batch_size=32, verbose=0)

# Evaluate
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f'Test Accuracy: {accuracy:.2f}')

# Visualize predictions
predictions = model.predict(X_test[:5])
for i in range(5):
    plt.imshow(X_test[i].reshape(28, 28), cmap='gray')
    plt.title(f'Predicted: {predictions[i].argmax()}, Actual: {y_test[i].argmax()}')
    plt.show()

## Day 20: Transfer Learning
**Objective**: Use pre-trained models for classification.
**Dataset**: [Cats vs Dogs](https://www.tensorflow.org/datasets/catalog/cats_vs_dogs)
**Steps**:
1. Load pre-trained model (e.g., VGG16).
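2. Freeze the pre-trained layers and train a new classification head on the dataset (unfreeze top layers afterwards to fine-tune).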
3. Evaluate performance.

In [None]:
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.utils import to_categorical
import numpy as np

# Simulate Cats vs Dogs dataset (random noise images and random labels, so accuracy should stay near chance)
X = np.random.rand(100, 224, 224, 3)  # Simulated images
y = np.random.randint(0, 2, 100)  # Binary labels
y = to_categorical(y)

# Load pre-trained VGG16
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False

# Build model
model = Sequential([
    base_model,
    Flatten(),
    Dense(128, activation='relu'),
    Dense(2, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train
model.fit(X, y, epochs=3, batch_size=16, validation_split=0.2, verbose=0)

# Evaluate (on the same simulated data used for training, so this is not a true test score)
loss, accuracy = model.evaluate(X, y, verbose=0)
print(f'Training-set Accuracy: {accuracy:.2f}')

## Day 21: NLP - Text Preprocessing
**Objective**: Preprocess text data for NLP.
**Dataset**: Any text dataset (e.g., IMDB reviews)
**Steps**:
1. Tokenize and clean text.
2. Remove stop words.
3. Create word frequency plot.
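
If the NLTK downloads are unavailable (for example, offline), a rough approximation of the same pipeline needs only the standard library: a regular expression for tokens and a hand-written stop-word set. The stop-word set below is a small illustrative subset, not NLTK's list, and the regex treats any run of letters as a word:

```python
import re
from collections import Counter

text = "This movie is great! I love the acting and the story. The movie was amazing."

# Illustrative stop-word subset (NLTK's English list is much longer)
stop_words = {"this", "is", "i", "the", "and", "was", "a", "an", "of", "to"}

# Lowercase, keep alphabetic runs only, then drop stop words
tokens = re.findall(r"[a-z]+", text.lower())
content = [t for t in tokens if t not in stop_words]

word_freq = Counter(content)
print(word_freq.most_common())
```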

In [None]:
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from collections import Counter
import matplotlib.pyplot as plt
import nltk

# Download NLTK data (NLTK 3.9+ tokenizers use 'punkt_tab'; older versions use 'punkt')
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('stopwords')

# Sample text (simulating IMDB reviews)
text = "This movie is great! I love the acting and the story. The movie was amazing."

# Tokenize and clean
tokens = word_tokenize(text.lower())
tokens = [word for word in tokens if word.isalpha()]

# Remove stop words
stop_words = set(stopwords.words('english'))
tokens = [word for word in tokens if word not in stop_words]

# Word frequency
word_freq = Counter(tokens)

# Plot
plt.bar(word_freq.keys(), word_freq.values())
plt.title('Word Frequency')
plt.xlabel('Words')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()

## Day 22: Sentiment Analysis
**Objective**: Build a sentiment analysis model.
**Dataset**: [IMDB](https://www.tensorflow.org/datasets/catalog/imdb_reviews)
**Steps**:
1. Preprocess text and create Bag of Words.
2. Train logistic regression.
3. Evaluate model.
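
Bag of Words counts every word equally; TF-IDF down-weights words that appear in many documents, and bigrams such as "not good" capture simple negation. Wrapping the vectorizer and classifier in a pipeline also guarantees the vocabulary is learned from training text only. A sketch on the same four made-up reviews (far too few for a meaningful accuracy, so it only shows predictions on two new sentences):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "This movie is great and amazing",
    "Terrible film, really bad",
    "I loved the story",
    "Not good, very boring",
]
labels = [1, 0, 1, 0]  # 1: positive, 0: negative

# ngram_range=(1, 2) adds word pairs such as "not good" to the vocabulary
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)

new_reviews = ["great story", "boring film"]
print(clf.predict(new_reviews))
print(clf.predict_proba(new_reviews).round(2))
```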

In [None]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Simulate IMDB dataset
texts = [
    "This movie is great and amazing",
    "Terrible film, really bad",
    "I loved the story",
    "Not good, very boring"
]
labels = [1, 0, 1, 0]  # 1: positive, 0: negative

# Preprocess text (Bag of Words)
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Split data (with only 4 texts the test set is a single review, so accuracy can only be 0 or 1)
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.25, random_state=42)

# Train logistic regression
model = LogisticRegression(random_state=42)
model.fit(X_train, y_train)

# Evaluate
y_pred = model.predict(X_test)
print(f'Accuracy: {accuracy_score(y_test, y_pred):.2f}')

## Day 23: Word Embeddings
**Objective**: Use Word2Vec for embeddings.
**Dataset**: Any text corpus
**Steps**:
1. Train Word2Vec model.
2. Visualize embeddings with t-SNE.
3. Find similar words.
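
`most_similar` ranks words by cosine similarity between their vectors, $\cos(u, v) = \frac{u \cdot v}{\lVert u \rVert \lVert v \rVert}$. A NumPy sketch of that ranking over a tiny made-up embedding table (in the cell below the vectors would come from `model.wv[word]`; the helper names here are illustrative):

```python
import numpy as np

# Made-up 3-dimensional embeddings for illustration only
embeddings = {
    "learning": np.array([0.9, 0.1, 0.3]),
    "deep": np.array([0.8, 0.2, 0.4]),
    "fun": np.array([0.1, 0.9, 0.2]),
    "intelligence": np.array([0.7, 0.0, 0.6]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def most_similar(word, topn=2):
    """Rank all other words by cosine similarity to `word`."""
    query = embeddings[word]
    scores = [(other, cosine(query, vec)) for other, vec in embeddings.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

print(most_similar("learning"))   # 'deep' first, then 'intelligence'
```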

In [None]:
from gensim.models import Word2Vec
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import numpy as np

# Sample corpus (far too small for meaningful similarities; it only demonstrates the API)
sentences = [
    ['machine', 'learning', 'is', 'fun'],
    ['deep', 'learning', 'is', 'powerful'],
    ['artificial', 'intelligence', 'is', 'exciting']
]

# Train Word2Vec
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1, workers=4)

# Find similar words
print("Words similar to 'learning':", model.wv.most_similar('learning', topn=3))

# Visualize embeddings with t-SNE
words = list(model.wv.key_to_index)
vectors = model.wv[words]
tsne = TSNE(n_components=2, perplexity=min(5, len(words) - 1), random_state=42)  # perplexity must be below the number of points (9 words here)
vectors_2d = tsne.fit_transform(vectors)

plt.scatter(vectors_2d[:, 0], vectors_2d[:, 1])
for i, word in enumerate(words):
    plt.annotate(word, (vectors_2d[i, 0], vectors_2d[i, 1]))
plt.title('Word Embeddings (t-SNE)')
plt.show()

## Day 24: Text Classification with LSTM
**Objective**: Build an LSTM for text classification.
**Dataset**: IMDB
**Steps**:
1. Preprocess text and create sequences.
2. Build and train LSTM.
3. Evaluate model.
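
Sequence models need every input to have the same length, so token-id sequences are padded or truncated to a fixed size. Keras's `pad_sequences` does this at the start of each sequence by default (`padding='pre'`, `truncating='pre'`). A pure-NumPy sketch of that default behaviour, using a hypothetical helper `pad_pre`:

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Hypothetical helper mimicking pad_sequences' defaults:
    truncate from the front and pad at the front."""
    out = np.full((len(sequences), maxlen), value, dtype=np.int32)
    for i, seq in enumerate(sequences):
        trimmed = seq[-maxlen:]               # keep the last maxlen tokens
        if len(trimmed):
            out[i, -len(trimmed):] = trimmed  # right-align, padding on the left
    return out

seqs = [[5, 8, 2], [7, 1, 4, 9, 3, 6], []]
print(pad_pre(seqs, maxlen=4))
```

Front padding keeps the real tokens next to the end of the sequence, which is the last thing an LSTM reads before producing its output.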

In [None]:
import numpy as np
from tensorflow.keras.layers import TextVectorization, Embedding, LSTM, Dense
from tensorflow.keras.models import Sequential
from sklearn.model_selection import train_test_split

# Simulate IMDB dataset
texts = [
    "This movie is great and amazing",
    "Terrible film, really bad",
    "I loved the story",
    "Not good, very boring"
] * 25  # Repeat to get 100 samples (test sentences therefore also appear in training)
labels = np.array([1, 0, 1, 0] * 25)

# Preprocess text: lowercase, strip punctuation, map words to integer ids,
# then pad/truncate every sequence to length 10
# (TextVectorization replaces the legacy Tokenizer, which Keras 3 no longer ships)
vectorizer = TextVectorization(max_tokens=1000, output_sequence_length=10)
vectorizer.adapt(np.array(texts))
X = vectorizer(np.array(texts)).numpy()

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, random_state=42)

# Build LSTM
model = Sequential([
    Embedding(1000, 32),
    LSTM(32),
    Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train
model.fit(X_train, y_train, epochs=5, batch_size=16, verbose=0)

# Evaluate
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f'Test Accuracy: {accuracy:.2f}')

## Day 25: Reinforcement Learning
**Objective**: Implement Q-Learning basics.
**Environment**: [Gymnasium](https://gymnasium.farama.org/) (the maintained successor to OpenAI Gym)
**Steps**:
1. Set up a simple environment (e.g., FrozenLake).
2. Implement Q-Learning algorithm.
3. Visualize learning progress.
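
The Q-learning update is $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$. To see it work without any environment library, here is a sketch on a hypothetical five-state corridor (action 1 moves right, action 0 moves left, reward 1 for reaching the last state). It breaks ties between equal Q-values at random, because plain `np.argmax` on an all-zero table always returns action 0, which can stall early exploration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2            # hypothetical corridor: states 0..4, goal at 4
alpha, gamma, epsilon = 0.1, 0.99, 0.1
Q = np.zeros((n_states, n_actions))

def step(state, action):
    """Move left (0) or right (1); reaching state 4 gives reward 1 and ends the episode."""
    next_state = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward, next_state == n_states - 1

def greedy(q_row):
    # Pick uniformly among the actions that share the highest value
    return int(rng.choice(np.flatnonzero(q_row == q_row.max())))

for episode in range(500):
    state, done = 0, False
    while not done:
        action = int(rng.integers(n_actions)) if rng.random() < epsilon else greedy(Q[state])
        next_state, reward, done = step(state, action)
        # No bootstrapping from a terminal state
        target = reward if done else reward + gamma * Q[next_state].max()
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state

print(Q.round(3))
print("greedy policy:", Q[:-1].argmax(axis=1))   # 1 means "move right"
```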

In [None]:
import gymnasium as gym  # maintained successor to OpenAI Gym (pip install gymnasium)
import numpy as np
import matplotlib.pyplot as plt

# Set up FrozenLake environment
env = gym.make('FrozenLake-v1', is_slippery=False)
n_states = env.observation_space.n
n_actions = env.action_space.n

# Initialize Q-table
Q = np.zeros((n_states, n_actions))

# Q-Learning parameters
alpha = 0.1
gamma = 0.99
epsilon = 0.1
episodes = 1000
rewards = []

# Q-Learning algorithm
for episode in range(episodes):
    state, _ = env.reset()  # Gymnasium returns (observation, info)
    total_reward = 0
    done = False
    while not done:
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            # Break ties randomly; argmax alone always picks action 0 on an all-zero row
            action = np.random.choice(np.flatnonzero(Q[state] == Q[state].max()))
        # Gymnasium returns five values; an episode ends when terminated or truncated
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state
        total_reward += reward
    rewards.append(total_reward)

# Visualize rewards
plt.plot(rewards)
plt.title('Q-Learning Rewards')
plt.xlabel('Episode')
plt.ylabel('Total Reward')
plt.show()

## Day 26: Simple Chatbot
**Objective**: Build a chatbot with NLTK or Rasa.
**Resources**: [NLTK Docs](https://www.nltk.org/), [Rasa Docs](https://rasa.com/docs/)
**Steps**:
1. Create a rule-based or ML-based chatbot.
2. Test with sample inputs.
3. Save model.
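
`chatbot.converse()` blocks waiting for keyboard input, which makes the bot awkward to test automatically (NLTK's `Chat` class also offers `respond(text)` for single inputs). The same rule-based idea fits in a few lines of standard-library Python. In this sketch the `respond` helper is hypothetical, it requires the whole cleaned input to match a pattern (NLTK matches from the start of the input instead), and it always returns the first listed reply so results are deterministic:

```python
import re

# Same patterns as the NLTK example
pairs = [
    (r"hi|hello", ["Hello!", "Hi there!"]),
    (r"how are you", ["I am doing great, thanks!"]),
    (r"what is your name", ["I am Grok, your friendly chatbot."]),
    (r"quit", ["Bye!"]),
]

def respond(text, default="Sorry, I don't understand."):
    """Hypothetical helper: reply for the first pattern matching the whole input."""
    cleaned = text.lower().strip(" ?!.")
    for pattern, responses in pairs:
        if re.fullmatch(pattern, cleaned):
            return responses[0]
    return default

for message in ["Hello", "How are you?", "What's the weather?"]:
    print(f"> {message}\n{respond(message)}")
```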

In [None]:
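from nltk.chat.util import Chat, reflections

# Define chatbot pairs: [regex pattern, list of possible responses]
# NLTK matches patterns case-insensitively from the start of the input
pairs = [
    [r'hi|hello', ['Hello!', 'Hi there!']],
    [r'how are you', ['I am doing great, thanks!']],
    [r'what is your name', ['I am Grok, your friendly chatbot.']],
    [r'quit', ['Bye!']]
]

# Create chatbot
chatbot = Chat(pairs, reflections)

# Test with sample inputs (non-interactive)
for message in ['Hello', 'How are you', 'What is your name']:
    print(f'> {message}\n{chatbot.respond(message)}')

# Interactive session (blocks until you type 'quit')
print("Type 'quit' to exit")
chatbot.converse()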
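For a rule-based bot, the "model" to save in Step 3 is simply its pattern/response pairs. A minimal sketch, assuming JSON is an acceptable storage format and using a hypothetical file name `chatbot_pairs.json`, saves the rules, reloads them, and checks that every pattern still compiles as a regular expression:

```python
import json
import re

# Hypothetical file name for the saved rules
PAIRS_FILE = 'chatbot_pairs.json'

pairs = [
    [r'hi|hello', ['Hello!', 'Hi there!']],
    [r'how are you', ['I am doing great, thanks!']],
    [r'quit', ['Bye!']],
]

# Save the rules as human-readable JSON
with open(PAIRS_FILE, 'w') as f:
    json.dump(pairs, f, indent=2)

# Reload them later, e.g. in a new session
with open(PAIRS_FILE) as f:
    loaded_pairs = json.load(f)

# Sanity check: every pattern must still be a valid regular expression
for pattern, responses in loaded_pairs:
    re.compile(pattern, re.IGNORECASE)

print(loaded_pairs == pairs)  # True
```

The reloaded list can be passed straight to `Chat(loaded_pairs, reflections)`. An ML-based bot (for example, one built with Rasa) would instead save trained model files through that framework's own tooling.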

## Day 27: Model Deployment
**Objective**: Deploy a model with Flask.
**Dataset**: Any classification dataset (Iris is used below)
**Steps**:
1. Train a model and save it.
2. Create a Flask app for predictions.
3. Test API endpoints.

In [None]:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
import joblib
from flask import Flask, request, jsonify

# Train and save model
iris = load_iris()
X, y = iris.data, iris.target
model = LogisticRegression(random_state=42, max_iter=200)
model.fit(X, y)
joblib.dump(model, 'model.pkl')

# Create Flask app
app = Flask(__name__)
model = joblib.load('model.pkl')

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json(force=True)
    features = [data['features']]
    prediction = model.predict(features)
    return jsonify({'prediction': int(prediction[0])})

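# To serve the API, copy this cell into a separate script (e.g., app.py) and run it from a terminal:
# app.run() blocks the notebook kernel, and debug=True enables an auto-reloader meant for scripts.
# if __name__ == '__main__':
#     app.run(debug=True)

# Then test the running server with curl (or the requests library):
# curl -X POST -H "Content-Type: application/json" -d '{"features": [5.1, 3.5, 1.4, 0.2]}' http://127.0.0.1:5000/predict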
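Step 3 can also be done without leaving the notebook. Flask ships a test client (`app.test_client()`) that sends requests directly to the app without starting a server. The sketch below is self-contained: it retrains the same Iris logistic regression in memory (skipping the `model.pkl` round-trip) and posts the first Iris sample, which belongs to class 0 (setosa).

```python
from flask import Flask, request, jsonify
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train the model in memory (same setup as the cell above, without saving to disk)
iris = load_iris()
model = LogisticRegression(random_state=42, max_iter=200)
model.fit(iris.data, iris.target)

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json(force=True)
    prediction = model.predict([data['features']])
    return jsonify({'prediction': int(prediction[0])})

# Send a request through Flask's test client; no server process is needed
client = app.test_client()
response = client.post('/predict', json={'features': [5.1, 3.5, 1.4, 0.2]})
print(response.status_code, response.get_json())
```

The same pattern extends naturally to unit tests for the API, for example posting malformed JSON and checking that the endpoint responds with an error status.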

## Day 28: Model Monitoring
**Objective**: Monitor model performance and drift.
**Dataset**: Any previous dataset
**Steps**:
1. Simulate new data.
2. Check for data drift.
3. Retrain model if needed.

In [None]:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
import numpy as np

# Load Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Train initial model
model = LogisticRegression(random_state=42, max_iter=200)
model.fit(X, y)
initial_pred = model.predict(X)
initial_accuracy = accuracy_score(y, initial_pred)

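# Simulate new data (with drift) by adding Gaussian noise to the features
rng = np.random.default_rng(42)
X_new = X + rng.normal(0, 0.5, X.shape)
new_pred = model.predict(X_new)
new_accuracy = accuracy_score(y, new_pred)

# Compare accuracy before and after the shift.
# This monitors performance degradation and needs true labels for the new data;
# note that the initial accuracy is measured on the training data itself.
print(f'Initial Accuracy: {initial_accuracy:.2f}')
print(f'New Data Accuracy: {new_accuracy:.2f}')

# Retrain if accuracy dropped by more than 10% (relative)
if new_accuracy < initial_accuracy * 0.9:
    model.fit(X_new, y)
    retrained_pred = model.predict(X_new)
    # Measured on the same data used for retraining, so this number is optimistic
    print(f'Retrained Accuracy: {accuracy_score(y, retrained_pred):.2f}')
else:
    print('No significant accuracy drop; keeping the current model.')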
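The cell above infers drift from an accuracy drop, which requires true labels for the new data. A label-free alternative is to compare each feature's distribution in new data against the training (reference) data. The sketch below swaps in a two-sample Kolmogorov-Smirnov test (`scipy.stats.ks_2samp`) per feature. The simulated drift (petal length shifted up by 1 cm) and the 0.05 significance level are assumptions chosen for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.datasets import load_iris

iris = load_iris()
X_ref = iris.data            # reference data the model was trained on
X_prod = X_ref.copy()        # simulated "production" data
X_prod[:, 2] += 1.0          # assumed drift: petal length shifted up by 1 cm

ALPHA = 0.05  # significance level (an assumption; tune for your use case)

drifted = []
for i, name in enumerate(iris.feature_names):
    # Compare the two empirical distributions of feature i
    result = ks_2samp(X_ref[:, i], X_prod[:, i])
    is_drift = result.pvalue < ALPHA
    if is_drift:
        drifted.append(name)
    print(f'{name:<20} KS={result.statistic:.3f}  p={result.pvalue:.4f}  drift={is_drift}')

print('Features with drift:', drifted)
```

A per-feature test like this can flag drift before any labels arrive. With many features, however, some false alarms are expected at a fixed α, so in practice the threshold is often tightened or corrected for multiple comparisons.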

## Day 29: Capstone Project
**Objective**: Build an end-to-end ML pipeline.
**Dataset**: [Stock Prices](https://www.kaggle.com/datasets/timoboz/stock-prices)
**Steps**:
1. Preprocess data and engineer features.
2. Train a model (e.g., LSTM or regression).
3. Evaluate and visualize predictions.

In [None]:
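from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense
import numpy as np
import matplotlib.pyplot as plt

# Simulate stock prices (a random walk starting near 100)
np.random.seed(42)
tf.random.set_seed(42)
prices = np.cumsum(np.random.randn(200)) + 100

seq_length = 10
split = int(0.8 * len(prices))  # first 80% of the series is used for training

# Preprocess data: fit the scaler on the training portion only to avoid leakage
# (test values may therefore fall slightly outside [0, 1])
scaler = MinMaxScaler()
scaler.fit(prices[:split].reshape(-1, 1))
prices_scaled = scaler.transform(prices.reshape(-1, 1))

# Create sequences: each sample holds seq_length past prices; the target is the next price
def create_sequences(data, seq_length):
    X, y = [], []
    for i in range(len(data) - seq_length):
        X.append(data[i:i + seq_length])
        y.append(data[i + seq_length])
    return np.array(X), np.array(y)

X, y = create_sequences(prices_scaled, seq_length)
# Sequence i predicts price i + seq_length, so training targets stay within the first `split` prices
n_train = split - seq_length
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

# Build LSTM (uses the default tanh activation, the standard choice for LSTM cells)
model = Sequential([
    Input(shape=(seq_length, 1)),
    LSTM(50),
    Dense(1)
])
model.compile(optimizer='adam', loss='mse')

# Train
model.fit(X_train, y_train, epochs=20, batch_size=16, verbose=0)

# Predict and convert back to price units
y_pred = scaler.inverse_transform(model.predict(X_test, verbose=0))
y_true = scaler.inverse_transform(y_test)

# Evaluate
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(f'Test RMSE: {rmse:.2f}')

# Visualize
plt.plot(y_true, label='Actual')
plt.plot(y_pred, label='Predicted')
plt.title('Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Price')
plt.legend()
plt.show()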
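A useful sanity check for any price forecaster is a naive persistence baseline that predicts each price to be the previous one. On a simulated random walk like this one, the expected next price equals the current price, so an LSTM that does not beat this baseline's RMSE is not adding real value. The sketch below assumes the same seed and series as the capstone cell, with the test set taken as the last 20% of prices.

```python
import numpy as np

# Same simulated random walk as the capstone cell (assumed setup)
np.random.seed(42)
prices = np.cumsum(np.random.randn(200)) + 100
split = int(0.8 * len(prices))

# Persistence forecast: predict each test price as the price one step earlier
y_true = prices[split:]          # last 20% of the series
y_naive = prices[split - 1:-1]   # previous price for each test target

naive_rmse = np.sqrt(np.mean((y_true - y_naive) ** 2))
print(f'Naive baseline RMSE: {naive_rmse:.2f}')
```

Because the simulated increments are standard normal, this baseline's RMSE should land roughly around 1. Comparing it with the LSTM's test RMSE shows whether the extra model complexity pays off.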

## Day 30: Present and Share
**Objective**: Share your projects.
**Resources**: [GitHub](https://github.com/), [Kaggle](https://www.kaggle.com/)
**Steps**:
1. Create a GitHub repository for your projects.
2. Write a blog post or Kaggle notebook.
3. Share on social media or forums.

In [None]:
# Instructions for sharing
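print("1. Create a GitHub repository:\n"
      "   - Go to github.com and create a new repository.\n"
      "   - In your project folder run: `git init`, `git add .`, `git commit -m 'Add 30 Days of ML'`,\n"
      "     `git branch -M main`, `git remote add origin <your-repo-url>`, `git push -u origin main`.\n")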
print("2. Write a blog post or Kaggle notebook:\n"
      "   - Summarize your 30-day journey.\n"
      "   - Upload to Kaggle or a blog platform like Medium.\n")
print("3. Share on social media:\n"
      "   - Post on LinkedIn or Twitter with #30DaysOfML.\n"
      "   - Include your GitHub/Kaggle link.")