# **Python `scikit-learn` Module Practice**
This notebook provides an overview and practice examples for the `scikit-learn` module, a library used for machine learning tasks such as classification, regression, clustering, and data preprocessing.

## **1. Installing Scikit-learn**
If `scikit-learn` is not already installed, install it with pip:
```bash
pip install scikit-learn
```

Import the necessary modules:

In [None]:
from sklearn import datasets, model_selection, preprocessing, metrics
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

## **2. Loading and Exploring Datasets**
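Scikit-learn ships several small built-in datasets, such as `iris`, `digits`, and `diabetes`. You can also use your own data, as long as it is shaped as a feature matrix `X` of shape `(n_samples, n_features)` and a target vector `y` of length `n_samples` (for example, NumPy arrays or a pandas DataFrame).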

In [None]:
# Load the iris dataset
iris = datasets.load_iris()
print(f"Features: {iris.feature_names}")
print(f"Target: {iris.target_names}")

# Split the dataset
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)
print(f"Training data shape: {X_train.shape}, Testing data shape: {X_test.shape}")
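The split above does not guarantee that every class appears in the test set in the same proportion as in the full data. A minimal sketch of loading the data directly as NumPy arrays (`return_X_y=True`) and passing `stratify=y` to keep the class balance; the names `X`, `y`, `X_tr`, `X_te`, `y_tr`, `y_te` are illustrative choices, not part of the original notebook.

```python
import numpy as np
from sklearn import datasets, model_selection

# return_X_y=True returns plain NumPy arrays instead of a Bunch object
X, y = datasets.load_iris(return_X_y=True)
print(f"X shape: {X.shape}, y shape: {y.shape}")
print(f"Samples per class: {np.bincount(y)}")

# stratify=y keeps the class proportions identical in the train and test splits
X_tr, X_te, y_tr, y_te = model_selection.train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(f"Test samples per class: {np.bincount(y_te)}")
```

Because iris has 50 samples per class, a stratified 20% test split contains exactly 10 samples of each class.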

## **3. Data Preprocessing**
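Scikit-learn provides tools for scaling and encoding features (`sklearn.preprocessing`) and for imputing missing values (`sklearn.impute`). Fit a transformer on the training data only, then apply it to the test data, so that no information from the test set leaks into training.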

In [None]:
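# Scale features to zero mean and unit variance
scaler = preprocessing.StandardScaler()
# Learn the per-feature mean and standard deviation from the training data only
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the training statistics on the test data (avoids data leakage)
X_test_scaled = scaler.transform(X_test)
print(f"First scaled training sample: {X_train_scaled[0]}")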
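Encoding and imputation are mentioned above but not shown. A minimal sketch using `SimpleImputer` and `OneHotEncoder` on small hypothetical arrays (the values and colour categories are made up for illustration):

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical numeric column with one missing value
X_num = np.array([[1.0], [np.nan], [3.0]])
imputer = SimpleImputer(strategy="mean")
X_num_filled = imputer.fit_transform(X_num)  # NaN is replaced by the column mean
print(X_num_filled.ravel())

# Hypothetical categorical column
X_cat = np.array([["red"], ["green"], ["red"]])
# handle_unknown="ignore" encodes unseen categories as all zeros instead of raising
encoder = OneHotEncoder(handle_unknown="ignore")
# fit_transform returns a sparse matrix by default; toarray() makes it dense for display
X_cat_encoded = encoder.fit_transform(X_cat).toarray()
print(encoder.categories_)  # categories are sorted: ['green', 'red']
print(X_cat_encoded)
```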

## **4. Building and Training Models**
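Every scikit-learn model (estimator) follows the same API: construct it with its hyperparameters, then call `fit(X, y)` to train it and `predict(X)` to make predictions. The examples below train a logistic regression and a random forest.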

In [None]:
# Train a Logistic Regression model
log_reg = LogisticRegression(max_iter=200)
log_reg.fit(X_train_scaled, y_train)

# Train a Random Forest Classifier (tree-based models do not need feature scaling)
rf_clf = RandomForestClassifier(random_state=42)
rf_clf.fit(X_train, y_train)
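Scaling and model fitting can be chained so the scaler is always fitted on exactly the data the model is trained on. A minimal sketch using `make_pipeline`; the variable names are illustrative and chosen so they do not overwrite the notebook's `X_train` or `log_reg`.

```python
from sklearn import datasets, model_selection
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = model_selection.train_test_split(
    X, y, test_size=0.2, random_state=42
)

# fit() scales the training data, then fits the classifier on the scaled output
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=200))
pipe.fit(X_tr, y_tr)

# score() applies the already-fitted scaler to the test data before predicting
accuracy = pipe.score(X_te, y_te)
print(f"Pipeline test accuracy: {accuracy:.3f}")
```

A pipeline behaves like a single estimator, so it can be passed directly to `GridSearchCV` or `cross_val_score` without leaking test-fold statistics into the scaler.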

## **5. Evaluating Models**
Evaluate model performance on the held-out test set using metrics such as accuracy, precision, recall, and the confusion matrix.

In [None]:
# Evaluate Logistic Regression model
y_pred_log = log_reg.predict(X_test_scaled)
print(f"Logistic Regression Accuracy: {metrics.accuracy_score(y_test, y_pred_log)}")

# Evaluate Random Forest Classifier
y_pred_rf = rf_clf.predict(X_test)
print(f"Random Forest Accuracy: {metrics.accuracy_score(y_test, y_pred_rf)}")

# Confusion matrix for the Random Forest (rows: true classes, columns: predicted classes)
print(f"Confusion Matrix:\n{metrics.confusion_matrix(y_test, y_pred_rf)}")
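Precision and recall are listed above but not computed. A minimal sketch using `precision_score`, `recall_score`, and `classification_report` on the same kind of iris split; `average="macro"` is one possible choice for multiclass data (an unweighted mean over classes), not the only one.

```python
from sklearn import datasets, metrics, model_selection
from sklearn.ensemble import RandomForestClassifier

iris_data = datasets.load_iris()
X_tr, X_te, y_tr, y_te = model_selection.train_test_split(
    iris_data.data, iris_data.target, test_size=0.2, random_state=42
)
clf = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
y_hat = clf.predict(X_te)

# Macro averaging: compute the metric per class, then take the unweighted mean
precision = metrics.precision_score(y_te, y_hat, average="macro")
recall = metrics.recall_score(y_te, y_hat, average="macro")
print(f"Macro precision: {precision:.3f}, macro recall: {recall:.3f}")

# Per-class precision, recall, F1 score and support in one table
print(metrics.classification_report(y_te, y_hat, target_names=iris_data.target_names))
```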

## **6. Hyperparameter Tuning**
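Tune hyperparameters with an exhaustive grid search (`GridSearchCV`) or a randomized search (`RandomizedSearchCV`). Both score each candidate setting with cross-validation on the training data.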

In [None]:
# Perform Grid Search on Random Forest
param_grid = {
    'n_estimators': [10, 50, 100],
    'max_depth': [None, 10, 20]
}
grid_search = model_selection.GridSearchCV(
    rf_clf, param_grid, cv=3
)
grid_search.fit(X_train, y_train)
print(f"Best Parameters: {grid_search.best_params_}")
print(f"Best CV Accuracy: {grid_search.best_score_:.3f}")
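Randomized search samples a fixed number of settings instead of trying every combination, which scales better when the grid is large. A minimal sketch with `RandomizedSearchCV`; the parameter ranges and `n_iter=5` are illustrative assumptions, and `scipy.stats.randint` (SciPy is a scikit-learn dependency) supplies the integer distribution.

```python
from scipy.stats import randint
from sklearn import datasets, model_selection
from sklearn.ensemble import RandomForestClassifier

X, y = datasets.load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = model_selection.train_test_split(
    X, y, test_size=0.2, random_state=42
)

param_distributions = {
    "n_estimators": randint(10, 101),  # integers from 10 to 100 inclusive
    "max_depth": [None, 5, 10, 20],    # lists are sampled uniformly
}
random_search = model_selection.RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions,
    n_iter=5,          # number of parameter settings sampled
    cv=3,
    random_state=42,   # makes the sampled settings reproducible
)
random_search.fit(X_tr, y_tr)
print(f"Best parameters: {random_search.best_params_}")
print(f"Best CV accuracy: {random_search.best_score_:.3f}")
```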

## **7. Clustering with KMeans**
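Scikit-learn supports unsupervised learning algorithms such as KMeans, which partitions unlabeled data into `n_clusters` groups by minimizing the squared distances between samples and their cluster centers. Here it is applied to the iris features without using the labels.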

In [None]:
# Apply KMeans clustering
# n_init is set explicitly because its default changed across scikit-learn versions
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans.fit(iris.data)
print(f"Cluster Centers:\n{kmeans.cluster_centers_}")
print(f"Predicted Clusters: {kmeans.labels_}")
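Cluster IDs are arbitrary (cluster 0 need not correspond to species 0), so the predicted labels cannot be compared with the true labels using accuracy. A minimal sketch using the adjusted Rand index, which is invariant to label permutations; comparing against the species is possible here only because iris happens to have labels.

```python
from sklearn import datasets, metrics
from sklearn.cluster import KMeans

iris_data = datasets.load_iris()
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(iris_data.data)

# 1.0 means identical partitions; values near 0.0 mean no better than random labeling
ari = metrics.adjusted_rand_score(iris_data.target, km.labels_)
print(f"Adjusted Rand index vs. species: {ari:.3f}")

# inertia_ is the sum of squared distances from each sample to its closest center
print(f"Inertia: {km.inertia_:.2f}")
```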

## **8. Model Persistence**
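Save and load trained models with `joblib` (efficient for objects that hold large NumPy arrays) or the standard-library `pickle` module. Only load files from trusted sources, because unpickling can execute arbitrary code, and load them with the same scikit-learn version that was used to save them.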

In [None]:
from joblib import dump, load

# Save the model
dump(rf_clf, 'random_forest_model.joblib')
print("Model saved.")

# Load the model
loaded_model = load('random_forest_model.joblib')
print("Model loaded.")
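The same round trip works with the standard-library `pickle` module. A minimal sketch that also checks the restored model predicts exactly like the original; the file name `random_forest_model.pkl` is a hypothetical choice.

```python
import pickle

import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier

X, y = datasets.load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=42).fit(X, y)

# Serialize the fitted model to disk and read it back
with open("random_forest_model.pkl", "wb") as f:
    pickle.dump(model, f)
with open("random_forest_model.pkl", "rb") as f:
    restored = pickle.load(f)

# A correctly restored model produces identical predictions
same = np.array_equal(model.predict(X), restored.predict(X))
print(f"Predictions match after reload: {same}")
```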

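## **9. Practical Example: Regression with the Diabetes Dataset**
Demonstrate a regression task. The Boston housing dataset (`load_boston`) was removed in scikit-learn 1.2, so this example uses the built-in diabetes dataset instead.

In [None]:
# Load the diabetes dataset (bundled with scikit-learn, no download needed)
from sklearn.linear_model import LinearRegression

diabetes = datasets.load_diabetes()
# Use separate names so the iris splits from earlier sections are not overwritten
X_train_reg, X_test_reg, y_train_reg, y_test_reg = model_selection.train_test_split(
    diabetes.data, diabetes.target, test_size=0.2, random_state=42
)

# Train a Linear Regression model
lin_reg = LinearRegression()
lin_reg.fit(X_train_reg, y_train_reg)

# Evaluate the model on the held-out data
y_pred_reg = lin_reg.predict(X_test_reg)
print(f"Mean Squared Error: {metrics.mean_squared_error(y_test_reg, y_pred_reg)}")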
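MSE is expressed in squared target units, which makes it hard to read on its own. A minimal sketch that also reports the root mean squared error (RMSE, in the target's own units) and the coefficient of determination R², using the bundled diabetes dataset (`load_diabetes`):

```python
import numpy as np
from sklearn import datasets, metrics, model_selection
from sklearn.linear_model import LinearRegression

X, y = datasets.load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = model_selection.train_test_split(
    X, y, test_size=0.2, random_state=42
)
reg = LinearRegression().fit(X_tr, y_tr)
y_hat = reg.predict(X_te)

mse = metrics.mean_squared_error(y_te, y_hat)
rmse = np.sqrt(mse)  # same units as the target
# R^2 is 1.0 for a perfect fit and 0.0 for always predicting the mean of y_te
r2 = metrics.r2_score(y_te, y_hat)
print(f"MSE: {mse:.2f}, RMSE: {rmse:.2f}, R^2: {r2:.3f}")
```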

## **10. Practical Example: Cross-Validation**
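Cross-validation splits the data into k folds, trains on k-1 of them, and scores on the remaining fold, repeating this k times. The spread of the k scores gives a more reliable estimate of performance than a single train/test split.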

In [None]:
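# Perform 5-fold cross-validation on the iris data
# (cross_val_score clones rf_clf, so its earlier fit is not reused)
scores = model_selection.cross_val_score(
    rf_clf, iris.data, iris.target, cv=5, scoring='accuracy'
)
print(f"Cross-validation scores: {scores}")
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")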
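To make the procedure explicit, the loop below reproduces what `cross_val_score` does for a classifier with `cv=5`: it splits the data with `StratifiedKFold` (no shuffling), fits a fresh clone of the estimator on each training fold, and scores it on the held-out fold. This is a sketch for illustration; in practice `cross_val_score` or `cross_validate` is the simpler choice.

```python
import numpy as np
from sklearn import datasets
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = datasets.load_iris(return_X_y=True)
base_model = RandomForestClassifier(random_state=42)

skf = StratifiedKFold(n_splits=5)
fold_scores = []
for train_idx, test_idx in skf.split(X, y):
    model = clone(base_model)  # fresh, unfitted copy for each fold
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(model.score(X[test_idx], y[test_idx]))  # accuracy
fold_scores = np.array(fold_scores)

# The manual loop should match cross_val_score with an integer cv
reference = cross_val_score(base_model, X, y, cv=5)
print(f"Fold accuracies: {fold_scores}")
print(f"Matches cross_val_score: {np.allclose(fold_scores, reference)}")
```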