<a href="https://colab.research.google.com/github/aidanbolinger/MachineLearning/blob/main/ICP_8_(1).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
from sklearn.model_selection import ParameterGrid, RandomizedSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.neighbors import KNeighborsClassifier
import pandas as pd
from google.colab import drive

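# Mount Google Drive (Colab only) and load the dataset
drive.mount('/content/drive')
data = pd.read_csv("/content/drive/MyDrive/Data.csv")

# Separate the features from the "Grade" target column
X = data.drop("Grade", axis=1)
y = data["Grade"]

# Hold out 20% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)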

# PCA cannot keep more components than there are features, so filter the candidates
n_features = X.shape[1]
valid_pca_components = [n for n in [2, 5, 10] if n <= n_features]

# Define classifiers, each paired with its search space (keys follow the pipeline's step__parameter naming)
classifiers = {
    'RandomForestClassifier': (
        RandomForestClassifier(random_state=42),
        {
            'pca__n_components': valid_pca_components,
            'classifier__n_estimators': [50, 100, 200],
            'classifier__max_depth': [None, 10, 20]
        }
    ),
    'LogisticRegression': (
        LogisticRegression(max_iter=1000, random_state=42),
        {
            'pca__n_components': valid_pca_components,
            'classifier__C': [0.01, 0.1, 1, 10],
            'classifier__penalty': ['l2'],
            'classifier__solver': ['lbfgs']
        }
    ),
    'Perceptron': (
        Perceptron(max_iter=1000, random_state=42),
        {
            'pca__n_components': valid_pca_components,
            'classifier__penalty': ['l2', 'l1'],
            'classifier__alpha': [0.0001, 0.001, 0.01],
        }
    ),
    'KNeighborsClassifier': (
        KNeighborsClassifier(),
        {
            'pca__n_components': valid_pca_components,
            'classifier__n_neighbors': [3, 5, 7],
            'classifier__weights': ['uniform', 'distance'],
        }
    )
}

# Cross-validation folds
cv_folds = [3, 5, 7]

# Loop through classifiers and CV options
for clf_name, (clf, param_grid) in classifiers.items():
    for cv_fold in cv_folds:
        print(f"\n---Classifier: {clf_name}, CV Folds: {cv_fold} ---")

        # Create pipeline
        pipe = Pipeline([
            ('scaler', StandardScaler()),
            ('pca', PCA()),
            ('classifier', clf)
        ])
        
        # Sample at most 10 candidates, or the whole grid if it is smaller
        total_params = len(ParameterGrid(param_grid))
        n_iter = min(10, total_params)
        
        # RandomizedSearchCV
        random_search = RandomizedSearchCV(
            pipe, 
            param_distributions=param_grid, 
            n_iter=n_iter, 
            cv=cv_fold, 
            random_state=42, 
            n_jobs=-1
        )
        random_search.fit(X_train, y_train)

        # Results
        print("Best parameters found:", random_search.best_params_)
        print(f"Best cross-validation score: {random_search.best_score_:.2f}")
        print(f"Test set score: {random_search.score(X_test, y_test):.2f}")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

---Classifier: RandomForestClassifier, CV Folds: 3 ---
Best parameters found: {'pca__n_components': 2, 'classifier__n_estimators': 50, 'classifier__max_depth': None}
Best cross-validation score: 0.93
Test set score: 0.93

---Classifier: RandomForestClassifier, CV Folds: 5 ---
Best parameters found: {'pca__n_components': 2, 'classifier__n_estimators': 50, 'classifier__max_depth': None}
Best cross-validation score: 0.94
Test set score: 0.93

---Classifier: RandomForestClassifier, CV Folds: 7 ---
Best parameters found: {'pca__n_components': 2, 'classifier__n_estimators': 50, 'classifier__max_depth': None}
Best cross-validation score: 0.96
Test set score: 0.93

---Classifier: LogisticRegression, CV Folds: 3 ---
Best parameters found: {'pca__n_components': 2, 'classifier__solver': 'lbfgs', 'classifier__penalty': 'l2', 'classifier__C': 1}
Best cross-validation score: 0.98
Test set score: 0.86

---Classifier: LogisticRegression, CV Folds: 5 ---
Best parameters found: {'pca__n_components': 2, 'classifier__solver': 'lbfgs', 'classifier__penalty': 'l2', 'classifier__C': 1}
Best cross-validation score: 0.98
Test set score: 0.86

---Classifier: LogisticRegression, CV Folds: 7 ---
Best parameters found: {'pca__n_components': 2, 'classifier__solver': 'lbfgs', 'classifier__penalty': 'l2', 'classifier__C': 1}
Best cross-validation score: 0.98
Test set score: 0.86

---Classifier: Perceptron, CV Folds: 3 ---
Best parameters found: {'pca__n_components': 2, 'classifier__penalty': 'l2', 'classifier__alpha': 0.0001}
Best cross-validation score: 0.98
Test set score: 0.93

---Classifier: Perceptron, CV Folds: 5 ---
Best parameters found: {'pca__n_components': 2, 'classifier__penalty': 'l1', 'classifier__alpha': 0.01}
Best cross-validation score: 1.00
Test set score: 1.00

---Classifier: Perceptron, CV Folds: 7 ---
Best parameters found: {'pca__n_components': 2, 'classifier__penalty': 'l2', 'classifier__alpha': 0.001}
Best cross-validation score: 0.98
Test set score: 0.86

---Classifier: KNeighborsClassifier, CV Folds: 3 ---
Best parameters found: {'pca__n_components': 2, 'classifier__weights': 'distance', 'classifier__n_neighbors': 5}
Best cross-validation score: 0.93
Test set score: 0.86

---Classifier: KNeighborsClassifier, CV Folds: 5 ---
Best parameters found: {'pca__n_components': 2, 'classifier__weights': 'distance', 'classifier__n_neighbors': 5}
Best cross-validation score: 0.96
Test set score: 0.86

---Classifier: KNeighborsClassifier, CV Folds: 7 ---
Best parameters found: {'pca__n_components': 2, 'classifier__weights': 'uniform', 'classifier__n_neighbors': 3}
Best cross-validation score: 0.98
Test set score: 0.86


Run the search with 3-fold, 5-fold, and 7-fold cross-validation.
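
To get a feel for what changing the fold count does, here is a minimal sketch of how a plain k-fold split divides the training rows. The helper `fold_sizes` and the training-set size of 56 are hypothetical (the size of Data.csv is not stated here). For classifiers with an integer `cv`, scikit-learn uses stratified folds, so the real sizes can differ slightly.

```python
def fold_sizes(n_samples, n_splits):
    """Validation-fold sizes for a plain (unstratified) k-fold split."""
    base, extra = divmod(n_samples, n_splits)
    # The first `extra` folds each take one additional sample.
    return [base + 1 if i < extra else base for i in range(n_splits)]


n_train = 56  # hypothetical training-set size, not taken from Data.csv

for k in (3, 5, 7):
    sizes = fold_sizes(n_train, k)
    # Each candidate is fitted k times, once per held-out fold.
    print(f"{k}-fold: validation sizes {sizes}, fits per candidate: {k}")
```

More folds mean smaller validation splits and more fits per candidate, so the score estimate uses more training data per fit but takes longer to compute.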

Replace the SVC classifier with RandomForestClassifier, LogisticRegression, Perceptron, and KNeighborsClassifier.

Update the param_grid accordingly (for RandomForestClassifier, for example, tune n_estimators and max_depth).
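
Because the search runs over a Pipeline, each key in the param_grid has to be written as the step name, a double underscore, and the parameter name (the notebook names its steps `scaler`, `pca`, and `classifier`). The sketch below is a hypothetical pure-Python helper, `check_param_grid`, that flags keys not following that pattern. It only inspects the key strings and does not check that the parameter exists on the estimator.

```python
def check_param_grid(step_names, param_grid):
    """Return the grid keys that are not of the form '<step>__<parameter>'."""
    bad_keys = []
    for key in param_grid:
        step, sep, param = key.partition("__")
        # A valid key has a known step, the '__' separator, and a parameter name.
        if not sep or step not in step_names or not param:
            bad_keys.append(key)
    return bad_keys


steps = ["scaler", "pca", "classifier"]  # step names used in the notebook's Pipeline

# Illustrative grid with two deliberate mistakes.
grid = {
    "pca__n_components": [2, 5],
    "classifier__n_estimators": [50, 100, 200],
    "n_estimators": [50, 100],  # missing the step prefix
    "clf__max_depth": [None, 10],  # 'clf' is not a step name
}

print(check_param_grid(steps, grid))
```

Keys that fail this check would make the search raise an invalid-parameter error at fit time, so it can be worth checking them before starting a long run.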

Also replace GridSearchCV with RandomizedSearchCV.
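
The practical difference is how many candidates get fitted: a grid search tries every combination, while a randomized search draws only `n_iter` of them. The sketch below mimics that idea with the standard library only. It is not scikit-learn's implementation, and the PCA values assume a dataset with at least 10 features. When every parameter is given as a list, as here, scikit-learn also samples without replacement.

```python
import itertools
import random

# Search space shaped like the RandomForestClassifier grid in the notebook.
param_grid = {
    "pca__n_components": [2, 5, 10],  # assumes at least 10 features
    "classifier__n_estimators": [50, 100, 200],
    "classifier__max_depth": [None, 10, 20],
}

# A grid search would evaluate every combination.
keys = list(param_grid)
all_candidates = [
    dict(zip(keys, values))
    for values in itertools.product(*param_grid.values())
]

# A randomized search evaluates only n_iter of them, capped at the grid size.
n_iter = min(10, len(all_candidates))
rng = random.Random(42)
sampled = rng.sample(all_candidates, n_iter)  # distinct candidates, no repeats

print(f"grid search candidates: {len(all_candidates)}")
print(f"randomized search candidates: {len(sampled)}")
```

With this grid the exhaustive search has 27 candidates and the randomized one has 10, so the best combination may be missed in exchange for fewer fits.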

Replace the dataset with your own CSV file using the code below:

In [None]:
import pandas as pd

data = pd.read_csv("your_dataset.csv")
X = data.drop("target_column", axis=1)
y = data["target_column"]
