# Adding Polynomial Features

Once you plot the features against the label and see that the relationship is not linear but follows a higher-order polynomial (e.g. a quadratic, degree 2), you can add polynomial features to the training data.
<br>
The following code adds polynomial features up to the given degree to the training dataset:

```python
from sklearn.preprocessing import PolynomialFeatures

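# degree sets the highest power to generate (2 = quadratic); include_bias=False omits the column of ones
poly_features = PolynomialFeatures(degree=2, include_bias=False)
# x: training features of shape (n_samples, n_features)
x_poly_transformed = poly_features.fit_transform(x)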
```
The transformed array is then fed to a training algorithm such as LinearRegression, which learns its weights and bias.
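
A minimal end-to-end sketch of polynomial regression. It uses hypothetical synthetic data, generated from y = 0.5x² + x + 2 plus noise; the coefficients are made up for illustration. The feature is expanded with PolynomialFeatures, and a plain LinearRegression is then fitted on the expanded array.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: y = 0.5 * x^2 + x + 2 plus Gaussian noise
rng = np.random.default_rng(42)
x = 6 * rng.random((100, 1)) - 3          # 100 samples in [-3, 3)
y = 0.5 * x[:, 0] ** 2 + x[:, 0] + 2 + rng.normal(0, 0.5, 100)

# Add x^2 as a second column: x_poly has shape (100, 2) with columns [x, x^2]
poly_features = PolynomialFeatures(degree=2, include_bias=False)
x_poly = poly_features.fit_transform(x)

# A linear model on [x, x^2] is a quadratic model in x
lin_reg = LinearRegression()
lin_reg.fit(x_poly, y)
print(lin_reg.intercept_, lin_reg.coef_)  # should be close to 2 and [1, 0.5]
```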

# Linear Models 


## Linear Regression 

```python
from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(x,y)
```

## Stochastic Gradient Descent 

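The following code runs for at most 1000 epochs (max_iter=1000). It stops earlier if the loss improves by less than 0.001 (tol=1e-3) for several consecutive epochs (n_iter_no_change, 5 by default). It starts with a learning rate of 0.1 (eta0=0.1), and penalty=None turns off regularization.
```python
from sklearn.linear_model import SGDRegressor  # use SGDClassifier for classification tasks

sgd_reg = SGDRegressor(max_iter=1000, tol=1e-3, penalty=None, eta0=0.1)
sgd_reg.fit(x, y.ravel())  # SGDRegressor expects a 1-D target array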
```
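
Gradient descent is sensitive to feature scales, so SGD is usually combined with a scaler. This sketch makes two assumptions: hypothetical synthetic linear data (y = 4 + 3x plus noise), and StandardScaler wrapped together with SGDRegressor in a pipeline. The hyperparameter values are illustrative, not tuned.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: y = 4 + 3x plus a little noise
rng = np.random.default_rng(42)
x = 2 * rng.random((200, 1))
y = 4 + 3 * x[:, 0] + rng.normal(0, 0.1, 200)   # 1-D target, as SGDRegressor expects

# Scale the features first, then run SGD for at most 1000 epochs
sgd_pipeline = make_pipeline(
    StandardScaler(),
    SGDRegressor(max_iter=1000, tol=1e-3, penalty=None, eta0=0.1, random_state=42),
)
sgd_pipeline.fit(x, y)
r2 = sgd_pipeline.score(x, y)   # coefficient of determination (R^2) on the training data
```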

## Ridge Regressor

Ridge Regression is a regularized version of Linear Regression. It adds an L2 penalty (alpha times the sum of the squared coefficients) to the cost function, which keeps the weights small and helps prevent the model from overfitting.

```python
from sklearn.linear_model import Ridge 

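# alpha sets the regularization strength: alpha=0 is plain Linear Regression, larger values shrink the weights more
ridge = Ridge(alpha=1.0, solver="cholesky")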
```
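
With fit_intercept=False (assumed here so that no unpenalized bias term is involved), Ridge minimizes `||y - Xw||² + alpha * ||w||²`. That objective has the closed-form solution `w = (XᵀX + alpha·I)⁻¹ Xᵀy`. The sketch below checks this against Scikit-Learn on hypothetical random data. It also shows that a larger alpha shrinks the weights.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical data with 3 features and no intercept term
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(0, 0.1, 50)

alpha = 1.0
# Closed-form ridge solution: w = (X^T X + alpha * I)^-1 X^T y
w_closed = np.linalg.solve(X.T @ X + alpha * np.eye(3), X.T @ y)

ridge = Ridge(alpha=alpha, solver="cholesky", fit_intercept=False)
ridge.fit(X, y)

# A much larger alpha pulls the weights toward zero
ridge_strong = Ridge(alpha=100.0, fit_intercept=False).fit(X, y)
```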

## Lasso Regressor 

Least Absolute Shrinkage and Selection Operator Regression (simply called Lasso Regression) is another regularized version of Linear Regression. Instead of the squared coefficients, it penalizes the sum of their absolute values (an L1 penalty).
<br>
An important characteristic of Lasso Regression is that it tends to completely eliminate the weights of the least important features by setting them to exactly zero, so it effectively performs feature selection.
<br>
The alpha hyperparameter sets the strength of the L1 penalty: larger values push more coefficients to zero.

```python
from sklearn.linear_model import Lasso 

lasso = Lasso(alpha=0.1)  # larger alpha zeroes out more coefficients
```
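
To see the feature-elimination effect, here is a sketch on hypothetical data where only the first of five features influences the target. With a moderate alpha, Lasso keeps that weight (slightly shrunk) and sets the other four to exactly zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical data: 5 features, but only the first one influences y
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + rng.normal(0, 0.1, 200)

lasso = Lasso(alpha=0.1)
lasso.fit(X, y)
print(lasso.coef_)  # first weight a little below 3, the other four exactly 0.0
```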

## Elastic Net

Elastic Net is a middle ground between Ridge Regression and Lasso Regression. Its regularization term is a weighted mix of Ridge’s L2 term and Lasso’s L1 term:

- l1_ratio controls the mix: l1_ratio=1 is equivalent to Lasso, and l1_ratio=0 gives a pure Ridge-style L2 penalty.
- alpha sets the overall regularization strength.

```python
from sklearn.linear_model import ElasticNet
elasticnet = ElasticNet(alpha=0.1, l1_ratio=0.5)  # equal mix of the L1 and L2 penalties
```
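
A quick check on hypothetical data that l1_ratio=1.0 reduces Elastic Net to Lasso. Scikit-Learn's ElasticNet minimizes `1/(2n)·||y - Xw||² + alpha·l1_ratio·||w||₁ + 0.5·alpha·(1 - l1_ratio)·||w||²`, so with l1_ratio=1.0 only the Lasso term is left.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Hypothetical data: 5 features, only the first two matter
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(0, 0.1, 100)

# l1_ratio=1.0: pure L1 penalty, i.e. the same objective as Lasso
enet_l1 = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# l1_ratio=0.5: an even mix of the L1 and L2 penalties
enet_mix = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
```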

## Logistic Regression 
```python
from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression()
log_reg.fit(x,y)
```

## Softmax Regression 
The Logistic Regression model can be generalized to support multiple classes directly, without having to train and combine multiple binary classifiers. This is called Softmax Regression, or Multinomial Logistic Regression.
<br>
Regularization is controlled by the C hyperparameter, which is the inverse of the regularization strength: smaller values of C mean stronger regularization.

```python 
from sklearn.linear_model import LogisticRegression

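# With solver="lbfgs" and more than two classes, Scikit-Learn fits a multinomial (softmax) model by default;
# the multi_class argument is redundant here and deprecated in recent versions. Note the uppercase C.
softmax = LogisticRegression(solver="lbfgs", C=10)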
```
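
Softmax Regression computes a score s_k(x) for each class and turns the scores into probabilities with the softmax function, exp(s_k) / Σ_j exp(s_j). The sketch below recomputes predict_proba() from decision_function() to make that relationship visible. It assumes the Iris dataset bundled with Scikit-Learn, with petal length and width as a hypothetical choice of features.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X = iris.data[:, 2:]   # petal length and petal width
y = iris.target        # three classes: setosa, versicolor, virginica

# lbfgs with 3 classes fits a multinomial (softmax) model
softmax = LogisticRegression(solver="lbfgs", C=10, max_iter=1000)
softmax.fit(X, y)

# Class scores s_k(x), one column per class
scores = softmax.decision_function(X)
# Softmax: exponentiate and normalize each row (subtracting the row max for numerical stability)
exp_scores = np.exp(scores - scores.max(axis=1, keepdims=True))
manual_proba = exp_scores / exp_scores.sum(axis=1, keepdims=True)

print(softmax.predict([[5, 2]]))  # predicted class index for a 5 cm x 2 cm petal
```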

# Decision Trees

Decision Trees recursively split the dataset on feature thresholds, choosing at each node the split that most reduces impurity. They make very few assumptions about the training data, so an unconstrained tree tends to overfit. The algorithm stops recursing in two cases: when it reaches the maximum depth (defined by the ***max_depth*** hyperparameter), or when it cannot find a split that will reduce impurity.
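
A minimal sketch of a depth-limited tree. It assumes the Iris dataset bundled with Scikit-Learn, with petal length and width chosen as features for illustration. export_text prints the learned splits so you can see where the recursion stopped.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X = iris.data[:, 2:]   # petal length and petal width
y = iris.target

# Stop splitting once the tree is 2 levels deep
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X, y)

# Print the learned splits as nested if/else rules
print(export_text(tree_clf, feature_names=["petal length (cm)", "petal width (cm)"]))
```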


## Hyperparameters
**criterion**: the function used to measure the impurity of each node, such as "gini" (the default) or "entropy".
<br>**min_samples_split**: the minimum number of samples a node must have before it can be split.
<br>**min_samples_leaf**: the minimum number of samples a leaf node must have.
<br>**max_leaf_nodes**: the maximum number of leaf nodes.
<br>**max_features**: the maximum number of features that are evaluated for splitting at each node.
> Increasing the min_* hyperparameters or reducing the max_* hyperparameters will regularize the model.
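
A sketch of the regularizing effect of min_samples_leaf, assuming a hypothetical noisy make_moons dataset. The unconstrained tree keeps splitting until it fits the training set perfectly. The regularized tree is not allowed to create any leaf with fewer than 4 training samples.

```python
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

# Hypothetical noisy two-class dataset
X, y = make_moons(n_samples=150, noise=0.2, random_state=42)

# Unconstrained: keeps splitting until every leaf is pure
tree_free = DecisionTreeClassifier(random_state=42).fit(X, y)
# Regularized: every leaf must contain at least 4 training samples
tree_reg = DecisionTreeClassifier(min_samples_leaf=4, random_state=42).fit(X, y)

# Leaves are the nodes with no children (children_left == -1)
is_leaf = tree_reg.tree_.children_left == -1
leaf_sizes = tree_reg.tree_.n_node_samples[is_leaf]
print(tree_free.get_n_leaves(), tree_reg.get_n_leaves())
```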


# Ensemble Algorithms

## Voting Classifiers

This method trains several different classifiers and aggregates their predictions to make the final prediction.
<hr>

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

log_clf = LogisticRegression()
svm_clf = SVC()
rand_frst = RandomForestClassifier()

voting_clf = VotingClassifier(estimators=[("lr", log_clf), ("rf", rand_frst), ("svc", svm_clf)],
                              voting="hard")

```
<hr>
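Hard voting follows a simple rule: the class predicted by the most classifiers wins.
<br>
With soft voting (voting="soft"), the ensemble works in three steps:

1. It calls each classifier's predict_proba() method.
2. It averages the class probabilities.
3. It predicts the class with the highest average probability.

Soft voting often performs better than hard voting because it gives more weight to highly confident votes.
<br>
Soft voting therefore requires every classifier to implement predict_proba(). The SVC class does not by default, so you need to set its probability hyperparameter to True. This makes SVC use cross-validation to estimate class probabilities, which slows down training, and it adds a predict_proba() method.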
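
A soft-voting sketch, assuming a hypothetical make_moons dataset split into training and test sets. Note probability=True on SVC, without which soft voting cannot call predict_proba().

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical dataset, split into training and test sets
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression()),
        ("rf", RandomForestClassifier(random_state=42)),
        # probability=True gives SVC the predict_proba() method that soft voting needs
        ("svc", SVC(probability=True, random_state=42)),
    ],
    voting="soft",
)
voting_clf.fit(X_train, y_train)
accuracy = voting_clf.score(X_test, y_test)
proba = voting_clf.predict_proba(X_test)   # averaged class probabilities, one row per instance
```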

## Bagging and pasting
Bagging and pasting use the same training algorithm for every predictor, but train each one on a different random subset of the training set. When sampling is performed with replacement, the method is called bagging (short for bootstrap aggregating). When sampling is performed without replacement, it is called pasting.
The following code trains an ensemble of 500 Decision Tree classifiers, each trained on 100 instances randomly sampled from the training set with replacement. This is bagging; to use pasting instead, set bootstrap=False. The n_jobs parameter tells Scikit-Learn how many CPU cores to use for training and predictions (-1 means all available cores):
```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
bag_clf = BaggingClassifier( DecisionTreeClassifier(), 
                            n_estimators=500,
                            max_samples=100, 
                            bootstrap=True, 
                            n_jobs=-1)
```
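
To see the difference between bagging and pasting, this sketch fits both variants on a hypothetical make_moons dataset. It then inspects estimators_samples_, which lists the training-set indices each tree was fitted on. With bootstrap=True some indices repeat; with bootstrap=False they are all distinct.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# bootstrap=True: bagging (each tree's 100 instances are drawn with replacement)
bag_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                            max_samples=100, bootstrap=True, random_state=42)
# bootstrap=False: pasting (drawn without replacement, so no duplicates within a tree)
paste_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                              max_samples=100, bootstrap=False, random_state=42)
bag_clf.fit(X_train, y_train)
paste_clf.fit(X_train, y_train)

# Indices of the training instances used by the first tree of each ensemble
bag_idx = bag_clf.estimators_samples_[0]
paste_idx = paste_clf.estimators_samples_[0]
print(len(np.unique(bag_idx)), len(np.unique(paste_idx)))  # bagging repeats some indices
```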
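### Out of bag score

With bagging, each predictor is trained on a bootstrap sample, so some training instances are never drawn for it. When max_samples equals the training-set size, this is about 37% of the instances on average. These out-of-bag (OOB) instances act as a free validation set. Setting oob_score=True makes Scikit-Learn predict each training instance using only the predictors that did not see it, and store the resulting accuracy in oob_score_.
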
```python
bag_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=500,
                            bootstrap=True, n_jobs=-1, oob_score=True)
bag_clf.fit(X_train, y_train)  # oob_score_ is only available after fitting
# to view the oob score
bag_clf.oob_score_
```
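
A self-contained sketch on a hypothetical make_moons dataset that compares the OOB estimate with the accuracy on a held-out test set. The two should be close, which is what makes the OOB score useful as a validation estimate.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

bag_clf = BaggingClassifier(DecisionTreeClassifier(), n_estimators=500,
                            bootstrap=True, oob_score=True, random_state=42)
bag_clf.fit(X_train, y_train)

oob_accuracy = bag_clf.oob_score_              # estimated from out-of-bag instances only
test_accuracy = bag_clf.score(X_test, y_test)  # accuracy on the held-out test set
# Per-instance OOB class probabilities, shape (n_train_samples, n_classes)
oob_proba = bag_clf.oob_decision_function_
```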

## Random Forest

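A Random Forest is an ensemble of Decision Trees, generally trained via the bagging method (or sometimes pasting), typically with max_samples set to the size of the training set. It adds extra randomness by searching for the best split among a random subset of the features at each node, rather than among all features.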

```python
from sklearn.ensemble import RandomForestClassifier
rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=-1)
rnd_clf.fit(X_train, y_train)
```
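
Since a Random Forest is essentially a set of bagged Decision Trees that each consider a random subset of features at every split, you can build a roughly equivalent ensemble by hand. This sketch assumes a hypothetical make_moons dataset and uses BaggingClassifier with max_features="sqrt" on the trees. The two ensembles use different random draws, so expect near-identical rather than identical predictions.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, random_state=42)
rnd_clf.fit(X_train, y_train)

# Hand-built near-equivalent: bootstrap samples the size of the training set,
# and each tree picks its split among sqrt(n_features) randomly chosen features
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(max_features="sqrt", max_leaf_nodes=16),
    n_estimators=500, max_samples=1.0, bootstrap=True, random_state=42)
bag_clf.fit(X_train, y_train)

# Fraction of test instances on which the two ensembles agree
agreement = np.mean(rnd_clf.predict(X_test) == bag_clf.predict(X_test))
```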

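## XGBoost
XGBoost is a separate Python library, not part of Scikit-Learn (install it with `pip install xgboost`). It provides an optimized implementation of gradient boosting, and its XGBRegressor and XGBClassifier classes follow the Scikit-Learn fit/predict API.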
```python
import xgboost
xgb_reg = xgboost.XGBRegressor()
xgb_reg.fit(X_train, y_train)
y_pred = xgb_reg.predict(X_val)
```

# Dimension Reduction by PCA 

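PCA (Principal Component Analysis) reduces the number of features in a dataset by projecting it onto the axes (principal components) that preserve as much of its variance as possible. How n_components is interpreted depends on its type:

- A float between 0 and 1 is the fraction of the variance to preserve.
- An integer is the number of dimensions to keep.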
```python
from sklearn.decomposition import PCA
pca = PCA(n_components=0.95)
x_reduced = pca.fit_transform(x)

# decompression: map back to the original feature space (the discarded variance is lost)
x_returned = pca.inverse_transform(x_reduced)
```
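
A self-contained sketch, assuming the digits dataset bundled with Scikit-Learn (8x8 images, 64 features). After fitting with n_components=0.95, pca.n_components_ reports how many components were needed, and explained_variance_ratio_ reports the share of variance each one keeps.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X = load_digits().data   # 1797 images of 8x8 pixels -> 64 features

pca = PCA(n_components=0.95)   # keep enough components for 95% of the variance
X_reduced = pca.fit_transform(X)
print(pca.n_components_, pca.explained_variance_ratio_.sum())

# Map back to 64 features; the discarded ~5% of variance is lost, so this is an approximation
X_recovered = pca.inverse_transform(X_reduced)
```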

## Incremental PCA
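Regular PCA needs the whole training set to fit in memory. Incremental PCA (IPCA) algorithms avoid this: you split the training set into mini-batches and feed an IPCA algorithm one mini-batch at a time. This is useful for large training sets and for applying PCA online.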

```python
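import numpy as np
from sklearn.decomposition import IncrementalPCA

n_batches = 100
# n_components must not exceed the number of features, and each batch needs at least n_components rows
inc_pca = IncrementalPCA(n_components=154)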
for X_batch in np.array_split(X_train, n_batches):
    inc_pca.partial_fit(X_batch)
X_reduced = inc_pca.transform(X_train)
```
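
A smaller, self-contained variant of the mini-batch loop. It assumes the digits dataset bundled with Scikit-Learn (64 features, so only 10 components are kept here). It also fits regular PCA on the same data to show that IPCA ends up explaining a very similar share of the variance.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, IncrementalPCA

X = load_digits().data   # 1797 samples, 64 features

n_batches = 10
inc_pca = IncrementalPCA(n_components=10)
# Feed one mini-batch at a time; each batch (~180 rows) has at least n_components rows
for X_batch in np.array_split(X, n_batches):
    inc_pca.partial_fit(X_batch)
X_reduced = inc_pca.transform(X)

# Regular PCA on the full dataset, for comparison
pca = PCA(n_components=10).fit(X)
print(inc_pca.explained_variance_ratio_.sum(), pca.explained_variance_ratio_.sum())
```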

# Clustering

## K-Means

```python
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=5)
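y_pred = kmeans.fit_predict(x)  # cluster index for each instance
# setting predefined centroids: list_of_centroids has shape (n_clusters, n_features);
# n_init=1 because the initialization is fixed, so there is nothing to repeat
kmeans = KMeans(n_clusters=5, init=list_of_centroids, n_init=1)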
```
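One way to check whether you chose the right number of clusters is the silhouette score, the mean silhouette coefficient over all instances. It ranges from -1 to +1. Values close to +1 mean that instances sit well inside their own cluster and far from the others, so higher is better.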
```python
from sklearn.metrics import silhouette_score
silhouette_score(X, kmeans.labels_)
```
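
A sketch of using the silhouette score to pick k. It assumes synthetic data made with make_blobs from five well-separated, hand-picked centers. It fits K-Means for several values of k and keeps the one with the highest score.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical data: 5 well-separated blobs
centers = [[0, 0], [5, 5], [-5, 5], [5, -5], [-5, -5]]
X, _ = make_blobs(n_samples=500, centers=centers, cluster_std=0.5, random_state=42)

# Fit K-Means for several k (n_init=10: keep the best of 10 initializations)
scores = {}
for k in range(2, 9):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    scores[k] = silhouette_score(X, kmeans.labels_)

best_k = max(scores, key=scores.get)   # k with the highest silhouette score
```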

## DBSCAN

```python
from sklearn.cluster import DBSCAN

dbscan = DBSCAN(eps=0.2, min_samples=5)  # example values; tune them for your data
dbscan.fit(X)
```
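DBSCAN can find clusters of arbitrary, non-linear shapes. It works as follows:

- For each instance, it counts how many instances lie within a distance eps of it (its eps-neighborhood).
- If that count, including the instance itself, is at least min_samples, the instance is a core instance.
- All instances in the neighborhood of a core instance belong to the same cluster.
- Instances that are neither core instances nor in any core instance's neighborhood are labeled as anomalies (label -1).

<br>
DBSCAN has no predict() method, so it cannot assign new instances to clusters on its own. A common workaround is to train a classifier, such as KNeighborsClassifier, on the core instances and their cluster labels: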
```python
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=50)
knn.fit(dbscan.components_, dbscan.labels_[dbscan.core_sample_indices_])
```
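
An end-to-end sketch of DBSCAN followed by a KNN classifier, assuming a hypothetical make_moons dataset (two interleaving half-circles). The new points in X_new are arbitrary examples.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier

# Two interleaving half-circles: non-convex clusters
X, _ = make_moons(n_samples=1000, noise=0.05, random_state=42)

dbscan = DBSCAN(eps=0.2, min_samples=5)
dbscan.fit(X)
# labels_: cluster index per instance, -1 for anomalies
n_clusters = len(set(dbscan.labels_) - {-1})

# Train a classifier on the core instances only, then use it for new points
knn = KNeighborsClassifier(n_neighbors=50)
knn.fit(dbscan.components_, dbscan.labels_[dbscan.core_sample_indices_])

X_new = np.array([[-0.5, 0], [0, 0.5], [1, -0.1], [2, 1]])
print(knn.predict(X_new))
```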

## Gaussian Mixture 
```python
from sklearn.mixture import GaussianMixture
gm = GaussianMixture(n_components=3, n_init=10)
gm.fit(X)
```

A fitted Gaussian mixture can also be used for anomaly detection: instances located in low-density regions are treated as anomalies.
```python
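import numpy as np

densities = gm.score_samples(X)  # log of the probability density at each instance
density_threshold = np.percentile(densities, 4)
anomalies = X[densities < density_threshold]  # instances in the lowest-density 4%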
```
Here the density threshold is the 4th percentile of the densities, so roughly 4% of the instances are flagged as anomalies.

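
A self-contained version of the anomaly-detection steps, assuming hypothetical data from make_blobs with three centers. With 500 instances and a 4th-percentile threshold, about 20 instances end up below the threshold.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Hypothetical data: 3 Gaussian blobs
X, _ = make_blobs(n_samples=500, centers=3, random_state=42)

gm = GaussianMixture(n_components=3, n_init=10, random_state=42)
gm.fit(X)

# score_samples returns the log of the probability density at each instance
densities = gm.score_samples(X)
density_threshold = np.percentile(densities, 4)   # 4th percentile of the densities
anomalies = X[densities < density_threshold]      # the ~4% lowest-density instances
```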
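To choose the number of clusters (n_components), or to compare models, fit models with different settings and pick the one that minimizes a theoretical information criterion. Two common choices are the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). BIC penalizes the number of parameters more heavily, so it tends to favor simpler models.
<br>
`gm.aic(X)`
<br>
`gm.bic(X)`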
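
A sketch of selecting n_components by BIC, assuming hypothetical data generated from three well-separated, hand-picked blob centers. One model is fitted per candidate k, and the lowest BIC wins.

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Hypothetical data: 3 well-separated blobs
centers = [[0, 0], [6, 6], [-6, 6]]
X, _ = make_blobs(n_samples=600, centers=centers, cluster_std=1.0, random_state=42)

# Fit one model per candidate number of clusters and record its BIC
bics = {}
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=42).fit(X)
    bics[k] = gm.bic(X)

best_k = min(bics, key=bics.get)   # lowest BIC wins
```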

## BayesianGaussianMixture
Rather than manually searching for the optimal number of clusters, you can use the BayesianGaussianMixture class, which is capable of giving weights equal (or close) to zero to unnecessary clusters. Set n_components to a value that you have good reason to believe is greater than the optimal number of clusters (this assumes some minimal knowledge about the problem at hand). The algorithm will then eliminate the unnecessary clusters automatically.

```python
from sklearn.mixture import BayesianGaussianMixture
bgm = BayesianGaussianMixture(n_components=10, n_init=10, random_state=42)
bgm.fit(X)
```
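
A sketch of the automatic pruning, assuming hypothetical data drawn from three well-separated blobs while allowing up to 10 components. max_iter=500 is an assumption, added to give the variational fit room to converge. The weights of the superfluous components should end up close to zero.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import BayesianGaussianMixture

# Hypothetical data: 3 blobs, but we deliberately allow up to 10 clusters
centers = [[0, 0], [6, 6], [-6, 6]]
X, _ = make_blobs(n_samples=600, centers=centers, cluster_std=1.0, random_state=42)

bgm = BayesianGaussianMixture(n_components=10, n_init=10, max_iter=500, random_state=42)
bgm.fit(X)

# Unnecessary components end up with weights close to 0
print(np.round(bgm.weights_, 2))
n_used = int((bgm.weights_ > 0.01).sum())
```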