# Scikit-learn

From the official site: https://scikit-learn.org

"In order to avoid potential conflicts with other packages it is strongly recommended to use a virtual environment (venv) or a conda environment." 

Here we use a venv (https://github.com/FabioRochaPoeta/initializaing-git-automated-power-shell) for three basic examples. User guide: https://scikit-learn.org/stable/user_guide.html

Here are the 15 most important features of this library: https://www.analyticsvidhya.com/blog/2021/07/15-most-important-features-of-scikit-learn/

In [None]:
# Installing
%pip install -U scikit-learn

## 1. Preprocessing (Dataset Transformations) - https://scikit-learn.org/stable/modules/preprocessing.html

In [8]:
from sklearn.preprocessing import StandardScaler

# Define a small sample dataset (3 samples, 3 features)
X = [[1, -1, 2], [2, 0, 0], [0, 1, -1]]
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# Print results
print("Original data:", X)
print("Scaled data:", X_scaled)


Original data: [[1, -1, 2], [2, 0, 0], [0, 1, -1]]
Scaled data: [[ 0.         -1.22474487  1.33630621]
 [ 1.22474487  0.         -0.26726124]
 [-1.22474487  1.22474487 -1.06904497]]


This code demonstrates how to use Scikit-learn's StandardScaler class to preprocess data by scaling each feature (column) to have zero mean and unit variance.

First, we create an instance of the StandardScaler class and fit it to the data using the fit method. Then, we transform the data using the transform method and store the scaled data in the X_scaled variable. Finally, we print both the original and scaled data to the console.
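As a quick sanity check of the "zero mean and unit variance" claim, the sketch below recomputes the scaling by hand with NumPy and compares it to the StandardScaler result. The variable name `X_manual` is introduced here for illustration only. It assumes the population standard deviation (`ddof=0`), which is NumPy's default for `std`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Same sample data as above, as a float array so NumPy arithmetic applies
X = np.array([[1, -1, 2], [2, 0, 0], [0, 1, -1]], dtype=float)

# Manual standardization (illustrative): subtract each column's mean and
# divide by each column's population standard deviation (ddof=0)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# Scikit-learn's version
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# After scaling, every column should have mean ~0 and standard deviation ~1
print("Column means after scaling:", X_scaled.mean(axis=0))
print("Column std devs after scaling:", X_scaled.std(axis=0))
print("Matches manual computation:", np.allclose(X_scaled, X_manual))
```

Because the scaler stores the fitted statistics (`scaler.mean_` and `scaler.scale_`), the same `transform` call can later be applied to new data using the statistics learned from the training data.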

## 2. Cross-Validating (Model selection and evaluation) - https://scikit-learn.org/stable/modules/cross_validation.html

In [9]:
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Load dataset
iris = load_iris()

# Train model with cross-validation
model = DecisionTreeClassifier()
scores = cross_val_score(model, iris.data, iris.target, cv=5)

# Print results
print("Cross-validation scores:", scores)
print("Mean score:", scores.mean())


Cross-validation scores: [0.96666667 0.96666667 0.9        0.93333333 1.        ]
Mean score: 0.9533333333333334


This code demonstrates how to use Scikit-learn's cross_val_score function to perform cross-validation on a machine learning model.

First, we load the Iris dataset using the load_iris function. Then, we create an instance of the DecisionTreeClassifier class and use the cross_val_score function to train and evaluate the model using 5-fold cross-validation. Finally, we print the cross-validation scores and their mean to the console.
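To make the splitting step more visible, here is a minimal sketch of roughly what 5-fold cross-validation does for a classifier: split the data into five stratified folds, fit a fresh copy of the model on four of them, and score it on the remaining one. The names `manual_scores` and `auto_scores` and the fixed `random_state=0` are assumptions added for this illustration (the seed makes the tree reproducible), so the scores may differ from the output shown above.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Load features and labels directly as arrays
X, y = load_iris(return_X_y=True)

# random_state is fixed here (an assumption for reproducibility)
model = DecisionTreeClassifier(random_state=0)

# Five stratified folds without shuffling
cv = StratifiedKFold(n_splits=5)

# Manual loop: train on four folds, score on the held-out fold
manual_scores = []
for train_idx, test_idx in cv.split(X, y):
    fold_model = clone(model)  # fresh, unfitted copy for every fold
    fold_model.fit(X[train_idx], y[train_idx])
    manual_scores.append(fold_model.score(X[test_idx], y[test_idx]))

# The same procedure through the helper function
auto_scores = cross_val_score(model, X, y, cv=cv)

print("Manual fold scores:", np.round(manual_scores, 4))
print("cross_val_score:   ", np.round(auto_scores, 4))
```

Passing the splitter object as `cv` instead of an integer makes the splitting strategy explicit, which helps when shuffling or a different number of folds is needed.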

## 3. Clustering Data (Unsupervised learning) - https://scikit-learn.org/stable/modules/clustering.html#overview-of-clustering-methods

In [10]:
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Generate data
X, y = make_blobs(n_samples=1000, centers=3, random_state=42)

# Cluster data
kmeans = KMeans(n_clusters=3, random_state=42)
labels = kmeans.fit_predict(X)

# Print results
print("Cluster labels:", labels)


Cluster labels: [0 0 1 2 2 0 1 1 1 1 2 0 2 1 1 1 2 2 2 2 1 0 0 0 1 2 2 2 1 1 0 0 1 2 2 0 0
 1 0 0 2 1 2 0 2 0 2 2 0 1 0 2 1 0 2 0 2 2 2 1 1 2 0 0 2 2 0 1 1 2 0 2 1 1
 1 0 1 2 2 2 1 2 2 2 1 0 1 0 2 2 2 2 1 1 0 1 0 2 1 1 1 0 0 2 0 1 1 2 1 2 0
 1 1 1 2 1 0 0 1 2 2 1 0 1 0 0 1 0 1 1 2 1 1 0 2 0 2 1 1 1 2 2 0 0 0 2 1 2
 2 2 2 0 1 0 2 1 2 1 2 0 1 2 2 1 2 2 1 1 0 0 1 2 2 1 2 0 1 0 1 0 2 1 2 1 2
 0 2 0 2 0 1 2 1 1 2 0 1 1 1 0 2 1 2 2 1 2 1 2 2 1 1 0 0 1 1 2 0 2 0 1 0 0
 1 2 0 2 2 1 0 2 2 0 2 0 1 1 0 1 0 1 0 0 0 1 0 2 1 2 1 2 1 1 0 1 2 1 2 1 1
 1 1 2 0 1 0 0 1 2 2 0 1 2 1 1 2 0 2 0 0 1 0 1 0 1 1 2 2 1 2 0 0 2 1 1 0 2
 1 0 0 2 0 0 1 0 2 1 0 1 0 0 0 0 0 1 0 0 1 2 0 0 1 0 0 1 0 1 2 1 2 2 1 1 0
 0 0 2 0 1 0 1 0 1 1 1 2 2 0 0 2 2 2 2 2 1 2 2 1 2 2 0 1 1 1 1 0 1 0 0 0 1
 1 2 2 1 2 1 2 0 1 2 0 1 0 1 1 0 2 0 1 0 1 2 0 2 0 0 0 2 1 2 2 0 2 1 2 0 1
 2 2 0 1 2 0 0 0 1 0 2 0 1 2 2 2 2 2 0 1 0 1 1 2 2 0 2 0 1 0 1 0 2 0 0 0 1
 2 2 1 2 0 0 0 0 2 2 0 2 1 0 1 2 1 1 2 2 1 1 1 0 1 2 0 2 2 0 0 1 0 2 1 2 0
 1 0 1 1 



This code demonstrates how to use Scikit-learn's KMeans class to cluster data.

First, we use the make_blobs function to generate synthetic data with three clusters. Then, we create an instance of the KMeans class with three clusters and fit it to the data using the fit_predict method. Finally, we print the cluster labels assigned by the algorithm to the console.
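Printing a thousand labels says little about the quality of the clustering, so the sketch below looks at a few summary values instead: the fitted cluster centers, the inertia (sum of squared distances to the nearest center), and the adjusted Rand index against the generating labels `y`. Setting `n_init=10` explicitly and using `adjusted_rand_score` are choices made for this illustration and are not part of the example above.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Same synthetic data as above; y holds the true blob each point came from
X, y = make_blobs(n_samples=1000, centers=3, random_state=42)

# n_init is set explicitly here (an assumption) so the behavior does not
# depend on the default of the installed scikit-learn version
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

# Coordinates of the three fitted cluster centers (one row per cluster)
print("Cluster centers:\n", kmeans.cluster_centers_)

# Sum of squared distances of samples to their closest center
print("Inertia:", kmeans.inertia_)

# Agreement with the true labels, ignoring how cluster ids are numbered
print("Adjusted Rand index:", adjusted_rand_score(y, labels))
```

The adjusted Rand index is used here because cluster ids are arbitrary: cluster 0 from KMeans need not correspond to blob 0 from make_blobs, so comparing the labels element by element would be misleading.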