# Basic Machine Learning Tutorials

This notebook walks through simple scikit-learn examples step by step.

Run the `pip install` cell below if you need to install the required libraries. In Google Colab they come pre-installed.

In supervised learning we observe pairs $(x, y)$ and seek a function $f(x;\theta)$ that predicts $y$ from $x$.
Each algorithm solves an optimization problem to find parameters $\theta$ that minimize a chosen loss function.
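
As a minimal sketch of this idea, the snippet below fits a one-parameter model $f(x;\theta)=\theta x$ by gradient descent on the mean squared error. The toy data, the learning rate, and the step count are illustrative assumptions, and gradient descent is only one of several ways to minimize a loss; the scikit-learn models used later rely on their own solvers.

```python
import numpy as np

# Toy data (assumption for illustration): y is roughly 3 * x plus small noise.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=200)
y_demo = 3.0 * x_demo + rng.normal(scale=0.1, size=200)

def mse_loss(theta):
    """Mean squared error of the model f(x; theta) = theta * x."""
    return np.mean((y_demo - theta * x_demo) ** 2)

theta = 0.0          # initial guess
learning_rate = 0.1  # hypothetical step size
for _ in range(200):
    # Derivative of the mean squared error with respect to theta.
    grad = -2.0 * np.mean((y_demo - theta * x_demo) * x_demo)
    theta -= learning_rate * grad

print("Estimated theta:", theta)
print("Final loss:", mse_loss(theta))
```

The estimate should land close to the slope used to generate the data, and the final loss should be close to the noise variance.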


In [None]:
!pip install scikit-learn matplotlib

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression, load_iris, load_digits
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report, ConfusionMatrixDisplay
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


## 0. Linear regression on synthetic data

Linear regression assumes a linear relationship between the feature $x$ and target $y$: 

$$y = w x + b + \epsilon.$$

Here we generate samples $x_i$ from a normal distribution and add Gaussian noise $\epsilon_i$ to the targets.
Using the least-squares criterion we recover the weight $w$ and bias $b$ that minimize
$$J(w,b)=\sum_i (y_i - w x_i - b)^2.$$
In matrix form the solution is $(\hat w, \hat b)^T = (X^T X)^{-1}X^T y$, where $X$ holds the feature in its first column and a column of ones for the bias in its second, provided $X^T X$ is invertible.
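
The closed-form solution can be checked numerically. The sketch below builds a design matrix with a column of ones, solves the normal equations with NumPy, and compares the result with `np.linalg.lstsq`. The toy data and the names `x_demo`, `y_demo`, and `X_design` are assumptions made for illustration and are not part of the cells that follow.

```python
import numpy as np

# Toy data (assumption for illustration): true weight 2.5 and bias 1.0.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=100)
y_demo = 2.5 * x_demo + 1.0 + rng.normal(scale=0.5, size=100)

# Design matrix: feature in the first column, ones for the bias in the second.
X_design = np.column_stack([x_demo, np.ones_like(x_demo)])

# Solve (X^T X) theta = X^T y instead of forming the inverse explicitly.
w_hat, b_hat = np.linalg.solve(X_design.T @ X_design, X_design.T @ y_demo)

# Cross-check against NumPy's least-squares solver.
theta_lstsq, *_ = np.linalg.lstsq(X_design, y_demo, rcond=None)

print("Normal equations:", w_hat, b_hat)
print("lstsq:           ", theta_lstsq)
```

Solving the linear system is generally preferred over computing $(X^T X)^{-1}$ directly, since it is cheaper and numerically more stable.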


In [None]:
# coef=True also returns the ground-truth coefficient used to generate y
X, y, coef = make_regression(n_samples=100, n_features=1, noise=10.0, coef=True, random_state=42)
print('X shape:', X.shape)
print('y shape:', y.shape)


In [None]:
plt.scatter(X, y, color='blue', label='Data')
plt.xlabel('Feature')
plt.ylabel('Target')
plt.legend()
plt.tight_layout()
plt.show()

In [None]:
model = LinearRegression()
model.fit(X, y)
print('True coefficient:', coef)
print('Learned coefficient:', model.coef_[0])
print('Intercept:', model.intercept_)


In [None]:
# Evenly spaced inputs over the observed feature range, shaped as a column for predict()
x_grid = np.linspace(X.min(), X.max(), 100).reshape(-1, 1)
y_pred = model.predict(x_grid)
print('Prediction shape:', y_pred.shape)
plt.scatter(X, y, color='blue', label='Data')
plt.plot(x_grid, y_pred, color='red', label='Fit')
plt.xlabel('Feature')
plt.ylabel('Target')
plt.legend()
plt.tight_layout()
plt.show()


## 1. Logistic regression on Iris

Logistic regression models the probability of each class. For binary output the model uses the sigmoid
$$P(y=1\mid x)=\sigma(w^T x+b)=\frac{1}{1+e^{-w^T x-b}}.$$
More generally, the multi-class version applies a softmax over the linear scores.
Training maximizes the data likelihood, which is equivalent to minimizing the cross-entropy loss (scikit-learn's `LogisticRegression` also adds L2 regularization by default).
We split the Iris dataset into train and test sets and measure classification accuracy.
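
To make the formulas above concrete, here is a small NumPy sketch of the sigmoid, the softmax, and the cross-entropy loss. The score matrix and labels are made-up values for illustration, and the helper functions are not part of scikit-learn.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(scores):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

# Hypothetical linear scores for 2 samples and 3 classes, with their true labels.
scores = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])

probs = softmax(scores)
print("Class probabilities:\n", probs)
print("Row sums:", probs.sum(axis=1))
print("Cross-entropy:", cross_entropy(probs, labels))

# With two classes and scores (0, z), the softmax reduces to the sigmoid of z.
z = 1.3
print(softmax(np.array([[0.0, z]]))[0, 1], sigmoid(z))
```

The last line illustrates why the binary sigmoid model is a special case of the multi-class softmax model.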


In [None]:
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
print('Train shape:', X_train.shape, y_train.shape)
print('Test shape:', X_test.shape, y_test.shape)


In [None]:
clf = LogisticRegression(max_iter=200)
clf.fit(X_train, y_train)
preds = clf.predict(X_test)
print('Accuracy:', accuracy_score(y_test, preds))


In [None]:
ConfusionMatrixDisplay.from_predictions(y_test, preds)
plt.title('Logistic Regression Confusion Matrix')
plt.tight_layout()
plt.show()


## 2. k-NN classification on digits

The $k$-nearest neighbors method stores all training examples. For a new sample $x$ we compute the Euclidean
distance to every training point, $$d(x, x_i)=\|x-x_i\|_2=\sqrt{\sum_j (x_j-x_{i,j})^2}.$$
The $k$ points with smallest distance form the neighborhood $N_k(x)$ and the predicted label is the most common
among them: $$\hat y=\operatorname{mode}(\{y_i : x_i \in N_k(x)\}).$$
This lazy approach has no training phase beyond storing the data.
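
The prediction rule can be sketched in a few lines of NumPy. The function `knn_predict` and the toy points below are hypothetical and meant only to illustrate the distance and majority-vote steps; `KNeighborsClassifier` may use faster neighbor-search structures and may break ties differently.

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Predict the label of x_new by majority vote among its k nearest neighbors."""
    # Euclidean distance from x_new to every training point.
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Indices of the k smallest distances.
    nearest = np.argsort(distances)[:k]
    # Most common label in the neighborhood (ties go to the smallest label here).
    values, counts = np.unique(y_train[nearest], return_counts=True)
    return values[np.argmax(counts)]

# Toy data (assumption for illustration): two well-separated clusters.
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([0.3, 0.3])))  # near the first cluster
print(knn_predict(X_toy, y_toy, np.array([4.8, 5.0])))  # near the second cluster
```

Each prediction scans the whole training set, which is why the cost of k-NN is paid at query time rather than at fit time.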


In [None]:
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
print('Train shape:', X_train.shape)
print('Test shape:', X_test.shape)


In [None]:
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
preds = knn.predict(X_test)
print('Test accuracy:', accuracy_score(y_test, preds))


In [None]:
ConfusionMatrixDisplay.from_predictions(y_test, preds)
plt.title('k-NN Confusion Matrix')
plt.tight_layout()
plt.show()


## 3. Decision tree classifier

Decision trees build a hierarchy of if-else rules. At each node we choose the feature and threshold that maximize
the reduction in impurity, often measured by the Gini index
$$G=1-\sum_k p_k^2,$$
where $p_k$ is the proportion of class $k$ in that node. Splits continue until nodes become pure or a maximum depth
is reached. The resulting tree makes predictions by following the learned rules from root to leaf.
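
The split criterion can be illustrated with a short sketch that computes the Gini index and searches for the best threshold on a single feature. The helpers `gini` and `best_threshold` and the toy arrays are assumptions for illustration; they show the idea only and are not how `DecisionTreeClassifier` is implemented internally.

```python
import numpy as np

def gini(labels):
    """Gini index G = 1 - sum_k p_k^2 of a set of class labels."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_threshold(feature, labels):
    """Return the threshold on one feature with the largest impurity reduction."""
    parent = gini(labels)
    best_t, best_gain = None, 0.0
    values = np.unique(feature)
    # Candidate thresholds: midpoints between consecutive distinct feature values.
    for t in (values[:-1] + values[1:]) / 2.0:
        left, right = labels[feature <= t], labels[feature > t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        gain = parent - weighted
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# Toy data (assumption for illustration): one feature, two classes.
feature_demo = np.array([1.0, 1.5, 2.0, 6.0, 6.5, 7.0])
labels_demo = np.array([0, 0, 0, 1, 1, 1])

threshold, gain = best_threshold(feature_demo, labels_demo)
print("Parent Gini:", gini(labels_demo))
print("Best threshold:", threshold, "impurity reduction:", gain)
```

A full tree repeats this search over every feature at each node and then recurses on the two child subsets.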


In [None]:
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
print('Train shape:', X_train.shape)


In [None]:
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)
preds = tree.predict(X_test)
print(classification_report(y_test, preds))


In [None]:
ConfusionMatrixDisplay.from_predictions(y_test, preds)
plt.title('Decision Tree Confusion Matrix')
plt.tight_layout()
plt.show()


This concludes the brief tour of basic machine learning examples using scikit-learn.
Each model relies on a mathematical objective or decision rule to map inputs to outputs.
Feel free to modify the code cells and explore further, trying different datasets or algorithms.
