
# Scikit-learn

The most prominent Python library for traditional machine learning (ML) is *scikit-learn*.  It provides simple and efficient tools for predictive data analysis that are accessible to everybody and reusable in various contexts.  Scikit-learn is an open-source ML library (commercially usable under the BSD license) that supports supervised and unsupervised learning and is built on NumPy, SciPy, and matplotlib.  It also provides tools for model fitting, data preprocessing, model selection, model evaluation, and many other utilities.  The main source of information about scikit-learn is its [webpage](https://scikit-learn.org/).
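The tools listed above share one interface. Every estimator is constructed with its hyperparameters, trained with `fit`, and applied with `predict`; classifiers and regressors also offer `score`. The minimal sketch below shows that pattern on the built-in iris dataset. The dataset and the choice of logistic regression are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset: X has shape (n_samples, n_features), y has shape (n_samples,)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Every estimator follows the same pattern: construct, fit, then predict/score
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy = clf.score(X_test, y_test)  # mean accuracy for classifiers
print(y_pred[:5], round(accuracy, 3))
```

Swapping `LogisticRegression` for any other estimator used in this notebook leaves the `fit`/`predict` calls unchanged.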

Scikit-learn is available on Windows, macOS, and Linux, and it can be installed through *pip* or *conda*.  A complete installation guide can be found [here](https://scikit-learn.org/stable/install.html).  If you have problems installing Python or scikit-learn on your local machine, you can also run the examples in Google Colab (or [Colaboratory](https://colab.research.google.com/)).
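After installing scikit-learn (for example with `pip install scikit-learn`), you can confirm that it and its core dependencies import correctly by printing their versions. This is a minimal check only.

```python
# Confirm that scikit-learn and the libraries it builds on can be imported
import numpy
import scipy
import sklearn

print("scikit-learn", sklearn.__version__)
print("NumPy", numpy.__version__)
print("SciPy", scipy.__version__)
```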

Importing basic libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import sklearn.neural_network as snn
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn import svm
from sklearn import neighbors
from sklearn import tree
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import BaggingClassifier
from sklearn import metrics
from scipy.stats import zscore

Example: loading MATLAB datasets

In [None]:
import scipy.io

mat = scipy.io.loadmat('sample_data/HSEfeatures.mat')
data = mat['heart']
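x = data[:, 0:27]   # columns 0-26: features
y = data[:, 27]     # column 27: class label, kept 1-D because scikit-learn expects y of shape (n_samples,)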
print(x.shape, y.shape)
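The file `sample_data/HSEfeatures.mat` is not bundled here. The sketch below therefore round-trips a synthetic matrix through `scipy.io.savemat` and `scipy.io.loadmat` to show what `loadmat` returns: a dictionary that maps MATLAB variable names to NumPy arrays. The variable name `heart` and the layout of 27 feature columns plus one label column mirror the cell above. The data itself is random and hypothetical.

```python
import os
import tempfile

import numpy as np
import scipy.io

# Hypothetical 10 x 28 matrix: 27 feature columns plus one 0/1 label column
rng = np.random.default_rng(0)
features = rng.normal(size=(10, 27))
labels = rng.integers(0, 2, size=(10, 1))
matrix = np.hstack([features, labels])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.mat")
    scipy.io.savemat(path, {"heart": matrix})  # the dict key becomes the MATLAB variable name
    mat = scipy.io.loadmat(path)               # dict: variable name -> NumPy array

data = mat["heart"]
x = data[:, :27]  # feature columns
y = data[:, 27]   # label column as a 1-D array
print(x.shape, y.shape)  # (10, 27) (10,)
```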

## Regression

Dataset loading

In [None]:
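dataset = datasets.fetch_california_housing()
Xtrain, Xtest, Ytrain, Ytest = train_test_split(dataset.data, dataset.target, test_size=0.3, random_state=0)
# Standardize both splits with the training-set mean and standard deviation,
# so the test data is scaled exactly like the data the models are trained on
mu, sigma = Xtrain.mean(axis=0), Xtrain.std(axis=0)
Xtrain = (Xtrain - mu) / sigma
Xtest = (Xtest - mu) / sigma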
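Scaling statistics should come from the training split only; otherwise, information from the test set leaks into preprocessing. One way to enforce this is to chain `StandardScaler` and the model in a `Pipeline`. Then `fit` learns the mean and standard deviation from the training data, and `predict` reuses them. The sketch below uses synthetic data from `make_regression` as an assumed stand-in for the housing data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data stands in for the housing features (assumption)
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# fit() learns the scaler's mean/std from X_train only; predict() reuses them on X_test
model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
model.fit(X_train, y_train)
y_hat = model.predict(X_test)

scaler = model.named_steps["standardscaler"]
print(np.round(scaler.mean_[:3], 3))  # per-feature means of X_train
```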

Support vector regression

In [None]:
mdl = svm.SVR(kernel='rbf')
mdl.fit(Xtrain, Ytrain)
Y_hat = mdl.predict(Xtest)
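`SVR` is mainly controlled by `C`, `epsilon`, and kernel parameters such as `gamma`. The sketch below compares a few values of `C` on a hypothetical one-dimensional nonlinear problem (a noisy sine curve). It is a small stand-in for the kind of search Assignment 1 asks for.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Hypothetical 1-D nonlinear problem: y = sin(x) + noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=400)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# C: penalty on training errors larger than epsilon (larger C = weaker regularization)
# epsilon: half-width of the tube inside which errors are not penalized
# gamma: RBF kernel coefficient ("scale" uses 1 / (n_features * X.var()))
results = {}
for C in (0.1, 1.0, 10.0):
    mdl = SVR(kernel="rbf", C=C, epsilon=0.1, gamma="scale")
    mdl.fit(X_train, y_train)
    results[C] = mean_squared_error(y_test, mdl.predict(X_test))

for C, mse in results.items():
    print(f"C={C}: test MSE = {mse:.4f}")
```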

K-nearest neighbor regression

In [None]:
mdl = neighbors.KNeighborsRegressor(n_neighbors=3, weights='distance')
mdl.fit(Xtrain, Ytrain)
Y_hat = mdl.predict(Xtest)
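With `weights='distance'`, the prediction is a weighted mean of the `n_neighbors` nearest training targets. Each target is weighted by the inverse of its distance to the query point. The sketch below recomputes this by hand from `kneighbors` on random, hypothetical data and checks the result against `predict`.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Small random training set and a few query points (hypothetical data)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))
y_train = rng.normal(size=50)
X_query = rng.normal(size=(5, 2))

knn = KNeighborsRegressor(n_neighbors=3, weights="distance").fit(X_train, y_train)
y_sklearn = knn.predict(X_query)

# Manual version: weight each of the 3 nearest targets by 1 / distance, then normalize
dist, idx = knn.kneighbors(X_query)  # both have shape (5, 3)
w = 1.0 / dist
y_manual = (w * y_train[idx]).sum(axis=1) / w.sum(axis=1)

print(np.allclose(y_sklearn, y_manual))  # True
```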

Neural network regression

In [None]:
mdl = snn.MLPRegressor(hidden_layer_sizes=(50, 50), max_iter=1000)
mdl.fit(Xtrain, Ytrain)
plt.plot(mdl.loss_curve_)
Y_hat = mdl.predict(Xtest)
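`MLPRegressor` records the training loss per epoch in `loss_curve_`. One variant worth trying is `early_stopping=True`. It holds out `validation_fraction` of the training data and stops once the validation score (R² for regressors) fails to improve by at least `tol` for `n_iter_no_change` consecutive epochs. The sketch below runs this on synthetic data, which is an assumption, and labels the loss plot.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption), standardized with training-set statistics
X, y = make_regression(n_samples=500, n_features=8, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# early_stopping: 10% of the training data is held out to monitor the validation R^2
mdl = MLPRegressor(hidden_layer_sizes=(50, 50), max_iter=1000, early_stopping=True,
                   validation_fraction=0.1, n_iter_no_change=10, random_state=0)
mdl.fit(X_train, y_train)

plt.plot(mdl.loss_curve_, label="training loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
print("epochs run:", mdl.n_iter_, "best validation R^2:", round(mdl.best_validation_score_, 3))
```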

Model evaluation: mean squared error

In [None]:
metrics.mean_squared_error(Ytest, Y_hat)
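The mean squared error is the average of the squared residuals, $\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$. The toy arrays below are hypothetical. They only check that the formula matches `metrics.mean_squared_error`. Note that the cell above evaluates whichever model was fitted last, because every model cell overwrites `Y_hat`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Toy targets and predictions (hypothetical values)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# MSE = (1/n) * sum((y_true - y_pred)^2)
mse_manual = np.mean((y_true - y_pred) ** 2)
mse_sklearn = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse_sklearn)  # root MSE, in the same units as the target

print(mse_manual, mse_sklearn)  # 0.375 0.375
```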

**Assignment 1**: Determine the best model to predict the median house value, after tuning the hyperparameters of each model.
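One way to approach the assignment is an exhaustive grid search with cross-validation. The sketch below applies `GridSearchCV` to a pipeline that scales the data and then fits k-NN. It uses synthetic data as an assumption; substitute the housing splits and the model of interest. The grid values are illustrative, not recommended settings.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption)
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

pipe = make_pipeline(StandardScaler(), KNeighborsRegressor())
# Pipeline step parameters are addressed as <step name>__<parameter>
param_grid = {
    "kneighborsregressor__n_neighbors": [3, 5, 10, 20],
    "kneighborsregressor__weights": ["uniform", "distance"],
}
# scikit-learn maximizes scores, so MSE is exposed as its negative
search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)

print(search.best_params_)
print("CV MSE:", -search.best_score_)
print("test MSE:", -search.score(X_test, y_test))
```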

## Classification

Dataset loading

In [None]:
Xtrain, Xtest, Ytrain, Ytest = train_test_split(x, y, test_size=0.3)
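For classification, passing `stratify=y` to `train_test_split` keeps the class proportions the same in the training and test splits. This matters for imbalanced labels. The sketch below uses hypothetical labels with an 80/20 class ratio.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 80 samples of class 0, 20 of class 1
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)

# stratify=y keeps the 80/20 ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

# Class counts: 56/14 in the training split, 24/6 in the test split
print(np.bincount(y_train), np.bincount(y_test))
```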

Logistic regression

In [None]:
mdl = LogisticRegression(max_iter=5000)
mdl.fit(Xtrain, Ytrain)
Ypred = mdl.predict(Xtest)
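For a binary problem, `LogisticRegression.predict` assigns the second class in `classes_` whenever that class's predicted probability exceeds 0.5. The sketch below checks this on synthetic data from `make_classification`. A custom decision threshold, like the one in the neural network cell further down, acts on these same probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary data (hypothetical)
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
mdl = LogisticRegression(max_iter=5000).fit(X, y)

# predict() returns classes_[1] when P(classes_[1] | x) > 0.5, else classes_[0]
proba = mdl.predict_proba(X)  # shape (n_samples, 2); each row sums to 1
manual = mdl.classes_[(proba[:, 1] > 0.5).astype(int)]
print(np.array_equal(manual, mdl.predict(X)))
```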

Quadratic discriminant analysis

In [None]:
mdl = QuadraticDiscriminantAnalysis()
mdl.fit(Xtrain, Ytrain)
Ypred = mdl.predict(Xtest)
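Quadratic discriminant analysis fits one Gaussian per class, each with its own mean vector and covariance matrix. The sketch below uses synthetic data and checks that `means_` equals the per-class feature means. It also sets `reg_param`, which shrinks each class covariance toward the identity. This can help when features are nearly collinear and a class covariance matrix is close to singular. The value 0.1 is an arbitrary illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

# Synthetic two-class data (hypothetical)
X, y = make_classification(n_samples=300, n_features=4, n_informative=3, n_redundant=0, random_state=0)

# reg_param: S = (1 - reg_param) * S + reg_param * I for each class covariance
qda = QuadraticDiscriminantAnalysis(reg_param=0.1, store_covariance=True).fit(X, y)

for k, cls in enumerate(qda.classes_):
    print(cls, np.allclose(qda.means_[k], X[y == cls].mean(axis=0)))
print(len(qda.covariance_), qda.covariance_[0].shape)  # one (n_features, n_features) matrix per class
```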

Support vector machine

In [None]:
mdl = svm.SVC()
mdl.fit(Xtrain, Ytrain)
Ypred = mdl.predict(Xtest)
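`SVC` uses an RBF kernel by default, and its results depend strongly on feature scaling and on `C` and `gamma`. The sketch below compares a linear kernel with an RBF kernel on the synthetic two-moons dataset, which is not linearly separable. The scaler is placed inside a pipeline.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two interleaving half-circles (synthetic): not linearly separable
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# SVC is sensitive to feature scale, so the scaler is part of each pipeline
scores = {}
for kernel in ("linear", "rbf"):
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0))
    clf.fit(X_train, y_train)
    scores[kernel] = clf.score(X_test, y_test)  # test accuracy
print(scores)
```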

K-nearest neighbors

In [None]:
mdl = neighbors.KNeighborsClassifier(n_neighbors=3)
mdl.fit(Xtrain, Ytrain)
Ypred = mdl.predict(Xtest)
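With the default `weights='uniform'`, `KNeighborsClassifier` predicts the majority label among the `n_neighbors` closest training points. The sketch below reproduces this vote by hand from `kneighbors` on random, hypothetical two-class data. An odd `n_neighbors` such as 3 avoids ties in a binary problem.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical binary data: label is 1 when x0 + x1 > 0
rng = np.random.default_rng(0)
X_train = rng.normal(size=(60, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
X_query = rng.normal(size=(10, 2))

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)

# Manual majority vote over the labels of the 3 nearest training points
_, idx = knn.kneighbors(X_query)  # idx has shape (10, 3)
votes = y_train[idx]
manual = np.array([np.bincount(row, minlength=2).argmax() for row in votes])
print(np.array_equal(manual, knn.predict(X_query)))
```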

Decision tree

In [None]:
mdl = tree.DecisionTreeClassifier()
mdl.fit(Xtrain, Ytrain)
tree.plot_tree(mdl)
Ypred = mdl.predict(Xtest)
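Besides `tree.plot_tree`, you can print a fitted tree as nested if/else rules with `export_text`. The sketch below uses the iris dataset and `max_depth=2` to keep the rules short; both are illustrative choices. Limiting the depth is also a common way to reduce overfitting.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
# max_depth caps the number of splits along any root-to-leaf path
mdl = DecisionTreeClassifier(max_depth=2, random_state=0)
mdl.fit(iris.data, iris.target)

# Text rendering of the learned if/else rules
rules = export_text(mdl, feature_names=list(iris.feature_names))
print(rules)
print("depth:", mdl.get_depth(), "leaves:", mdl.get_n_leaves())
```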

Naive Bayes

In [None]:
mdl = GaussianNB()
mdl.fit(Xtrain, Ytrain)
Ypred = mdl.predict(Xtest)
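`GaussianNB` assumes that, within each class, every feature follows an independent normal distribution. Its fitted parameters are therefore:

- per-class feature means (`theta_`);
- per-class variances (`var_` in recent scikit-learn versions, including a small `var_smoothing` term);
- class priors (`class_prior_`).

The sketch below uses synthetic data and checks the means against a direct computation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# Synthetic two-class data (hypothetical)
X, y = make_classification(n_samples=200, n_features=3, n_informative=2, n_redundant=0, random_state=0)
mdl = GaussianNB().fit(X, y)

# theta_[k] should equal the mean of the training rows of class classes_[k]
means_match = all(
    np.allclose(mdl.theta_[k], X[y == cls].mean(axis=0))
    for k, cls in enumerate(mdl.classes_)
)
print("per-class means match:", means_match)
print("class priors:", mdl.class_prior_)  # class frequencies in y
```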

Neural network

In [None]:
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Standardize the features (statistics are learned from the training set only)
scaler = StandardScaler()
Xtrain_scaled = scaler.fit_transform(Xtrain)
Xtest_scaled = scaler.transform(Xtest)

# Fit the neural network model
# (learning_rate='adaptive' only takes effect with solver='sgd'; 'adam' ignores it)
mdl = MLPClassifier(hidden_layer_sizes=(200, 100, 50, 20), max_iter=2000, solver='adam', activation='tanh', alpha=0.01, batch_size=64, learning_rate='adaptive')
mdl.fit(Xtrain_scaled, Ytrain)

# Plot the loss curve during training
plt.plot(mdl.loss_curve_)
plt.xlabel('Number of iterations')
plt.ylabel('Loss')
plt.title('Training loss curve')
plt.show()

# Predict on the test set: probability of the second class in mdl.classes_ (binary problem assumed)
Ypred_proba = mdl.predict_proba(Xtest_scaled)[:, 1]
threshold = 0.8  # Adjust the threshold as needed
# Map the thresholded probabilities back to the original class labels,
# so that the evaluation cell below can compare Ypred with Ytest
Ypred = np.where(Ypred_proba > threshold, mdl.classes_[1], mdl.classes_[0])
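Raising the decision threshold, as the cell above does with `threshold = 0.8`, makes positive predictions rarer. Recall can then only stay the same or decrease, while precision usually increases. The sketch below shows this trade-off on synthetic, imbalanced data. `LogisticRegression` stands in for the neural network to keep the example fast (an assumption). It also assumes a binary problem in which class 1 is the positive class.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: roughly 70% class 0, 30% class 1 (hypothetical)
X, y = make_classification(n_samples=500, weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]  # probability of class 1

results = []
for threshold in (0.3, 0.5, 0.8):
    pred = (proba > threshold).astype(int)
    precision = precision_score(y_test, pred, zero_division=0)
    recall = recall_score(y_test, pred)
    results.append((threshold, precision, recall))
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```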

Bagging

In [None]:
mdl = BaggingClassifier(estimator=svm.SVC(), n_estimators=25)
mdl.fit(Xtrain, Ytrain)
Ypred = mdl.predict(Xtest)

Classification report and confusion matrix

In [None]:
print(f"Classification report for classifier {mdl}:\n"
    f"{metrics.classification_report(Ytest, Ypred)}\n")
cm = metrics.confusion_matrix(Ytest, Ypred, labels=mdl.classes_)
disp = metrics.ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=mdl.classes_)
disp.plot();
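Most entries of the classification report can be derived from the confusion matrix. For a binary problem with labels `[0, 1]`, `confusion_matrix(...).ravel()` returns the counts in this order: true negatives, false positives, false negatives, true positives. The sketch below computes accuracy, precision, recall, and F1 from these counts on hypothetical labels.

```python
import numpy as np
from sklearn import metrics

# Hypothetical binary labels and predictions
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 1])

# For labels [0, 1], ravel() returns tn, fp, fn, tp
tn, fp, fn, tp = metrics.confusion_matrix(y_true, y_pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # of the predicted positives, how many are correct
recall = tp / (tp + fn)     # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)
print(tn, fp, fn, tp)  # 3 1 1 5
```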

**Assignment 2**: Determine the best model for classifying the handwritten digits, after tuning the hyperparameters of each model.

**Assignment 3**: Determine the best model for classifying the `fashion_mnist` dataset, after tuning the hyperparameters of each model.