# Xing's Cheat Sheet

A compilation of everything I learned during my master's thesis. Written by **Jun Xing Li**

# Scikit-learn

## *StandardScaler()*

In [1]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Create a 2D array
X = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
y = np.array([[10, 11, 12]])

pd.DataFrame(X)

   0  1  2
0  1  2  3
1  4  5  6
2  7  8  9


Now we create a *StandardScaler()* object, which exposes several methods. *fit_transform()* computes the per-column mean $\mu$ and standard deviation $\sigma$ of $X$ and then applies the following:

\begin{equation*}
    X_{scaled} = \frac{X - \mu}{\sigma}
\end{equation*}

To apply the same transformation to $y$, we use *transform()*, since the scaling parameters (the mean and standard deviation) are stored in the fitted object.

\begin{equation*}
    y_{scaled} = \frac{y - \mu}{\sigma}
\end{equation*}

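Note that the scaler should be fitted on the training data only and then reused to transform the test data. Fitting it on the test data, or on both sets together, leaks information about the test set into the model.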
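
As a rough sketch of what *fit_transform()* and *transform()* compute, the same numbers can be reproduced by hand with NumPy: the statistics are computed once and then reused on new rows. The sketch assumes the population standard deviation (`ddof=0`), which is what *StandardScaler* uses, and the names `X_train` and `X_new` are chosen here for illustration only.

```python
import numpy as np

# Same toy data as above: X_train is the data the scaler is fitted on,
# X_new is a new row with the same three columns.
X_train = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
X_new = np.array([[10, 11, 12]], dtype=float)

# "fit": learn the per-column statistics from the fitting data only.
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)  # ddof=0, i.e. the population standard deviation

# "transform": apply the stored statistics to any data set.
X_train_scaled = (X_train - mu) / sigma
X_new_scaled = (X_new - mu) / sigma

print(X_train_scaled)
print(X_new_scaled)
```

The new row is scaled with the mean and standard deviation of `X_train`, never with its own.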

In [2]:
scaler = StandardScaler()
X_obj = scaler.fit_transform(X)

pd.DataFrame(X_obj)

          0         1         2
0 -1.224745 -1.224745 -1.224745
1  0.000000  0.000000  0.000000
2  1.224745  1.224745  1.224745


In [3]:
y_scaled = scaler.transform(y)
pd.DataFrame(y_scaled)

         0        1        2
0  2.44949  2.44949  2.44949


# *args and **kwargs

[Source](https://www.youtube.com/watch?v=GdSJAZDsCZA)

*args collects any extra positional arguments into a tuple. Writing *args inside a call unpacks that tuple into separate arguments again.

In [10]:
def func1(*args):
    print("Tuple", args)
    print("All arguments", *args)
    

func1(1, "abc", 3, 4, 5)

Tuple (1, 'abc', 3, 4, 5)
All arguments 1 abc 3 4 5
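
The star also works in the other direction: at the call site it unpacks a list or tuple into separate positional arguments. A small sketch, where `total` and `values` are hypothetical names used only for illustration:

```python
def total(*numbers):
    # numbers is always a tuple, even when no arguments are passed
    return sum(numbers)

values = [1, 2, 3]

print(total())          # no arguments: numbers == ()
print(total(1, 2, 3))   # three separate positional arguments
print(total(*values))   # the list is unpacked into three arguments
```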


**kwargs stands for keyword arguments: it collects any extra keyword arguments into a dict.

In [22]:
def func2(**kwargs):
    print("The whole dict", kwargs)
    print("The keys", *kwargs)
    print("The values", *kwargs.values())

func2(abc=123, yolo=567)

The whole dict {'abc': 123, 'yolo': 567}
The keys abc yolo
The values 123 567


Always declare the parameters in the following order:

<ol>
    <li>Positional arguments</li>
    <li>Optional positional arguments (*args)</li>
    <li>Keyword arguments</li>
    <li>Optional keyword arguments (**kwargs)</li>
</ol>

In [27]:
def func3(arg_1, *args, arg3=None, arg4=6, **kwargs):
    print(arg_1, *args, arg3, arg4, kwargs)

func3("abc", "efg", "hij", arg3=5, abc=123)

abc efg hij 5 6 {'abc': 123}
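
One consequence of this ordering is that any parameter declared after *args can only be set by keyword, because every extra positional value is swallowed by *args first. A short sketch with a hypothetical `describe` function:

```python
def describe(name, *args, sep=", ", **kwargs):
    # sep comes after *args, so it can only be set by keyword
    extras = sep.join(str(a) for a in args)
    return f"{name}: {extras} {kwargs}"

# "-" is collected by *args here, so sep keeps its default
print(describe("abc", "efg", "-"))

# sep is set explicitly, and unit ends up in kwargs
print(describe("abc", "efg", "hij", sep="-", unit="cm"))
```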


In [36]:
def func4(**kwargs):
    print(kwargs)
    print(*kwargs)
    print(*kwargs.values())

def func5(kwargs):
    func4(**kwargs)
    
a = {'fit_intercept': True, 'n_jobs': None}
func5(a)

{'fit_intercept': True, 'n_jobs': None}
fit_intercept n_jobs
True None
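
The same unpacking is handy for passing a settings dict on to a constructor. The sketch below uses a hypothetical `make_model` function as a stand-in for an estimator constructor. It also shows how two dicts can be merged before unpacking, and what happens when a key is not accepted by the target function.

```python
def make_model(fit_intercept=True, n_jobs=None):
    # Hypothetical stand-in for an estimator constructor
    return {"fit_intercept": fit_intercept, "n_jobs": n_jobs}

defaults = {"fit_intercept": True, "n_jobs": None}
overrides = {"n_jobs": 2}

# When the same key appears twice, the later dict wins
settings = {**defaults, **overrides}
model = make_model(**settings)
print(model)

# A key the function does not accept raises a TypeError
try:
    make_model(**{"alpha": 1.0})
except TypeError as err:
    print("TypeError:", err)
```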


# MLxtend

## *SequentialFeatureSelector*

Implementation of a greedy search algorithm. The only way to be sure of finding the optimal subset of features is to try out every single combination of features. This is known as exhaustive feature selection, and it quickly becomes impractical because of its computational demands.

Whether or not each feature is included can be encoded as one bit of a binary number (e.g. 01011). The total number of combinations would be $2^n$, but we have to subtract the combination resulting in a null set. This gives the formula

\begin{equation*}
    \text{Combinations} = 2^n - 1
\end{equation*}

For a feature set of 14 features, this results in 16 383 unique combinations. To cut down the computation time, one can use sequential feature selection (SFS) instead.
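
The count can be checked by brute force with the standard library. The helper name `count_subsets` is made up for this sketch.

```python
from itertools import combinations

def count_subsets(n_features):
    # Count every non-empty subset by enumerating all subset sizes k = 1..n
    features = range(n_features)
    return sum(
        1
        for k in range(1, n_features + 1)
        for _ in combinations(features, k)
    )

print(count_subsets(4), 2**4 - 1)
print(count_subsets(14), 2**14 - 1)
```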

[Explain Sequential Forward Selection]
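
A minimal sketch of the greedy idea behind sequential forward selection, not mlxtend's implementation: start from an empty set and repeatedly add the single feature that gives the best score. `forward_selection`, `toy_score` and `weights` are hypothetical names, and the additive toy score merely stands in for a real model score such as cross-validated accuracy.

```python
def forward_selection(n_features, k, score):
    """Greedy forward selection: add one feature at a time."""
    selected = []
    evaluations = 0
    while len(selected) < k:
        best_feature, best_score = None, float("-inf")
        # Try every feature that is not selected yet
        for f in range(n_features):
            if f in selected:
                continue
            s = score(selected + [f])
            evaluations += 1
            if s > best_score:
                best_feature, best_score = f, s
        # Keep the candidate that scored best in this round
        selected.append(best_feature)
    return tuple(sorted(selected)), evaluations

# Hypothetical scoring function: each feature has a fixed usefulness
weights = [0.1, 0.3, 0.5, 0.9]

def toy_score(subset):
    return sum(weights[f] for f in subset)

subset, evaluations = forward_selection(n_features=4, k=3, score=toy_score)
print(subset, evaluations)
```

With 4 features and `k=3` this evaluates 4 + 3 + 2 = 9 subsets, compared with 15 for the exhaustive search.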

[Explain Sequential Floating Forward Selection] Leads to a better score, but at the cost of a slightly longer run time

[Ferri (1994)](https://www.sciencedirect.com/science/article/abs/pii/B9780444818928500407)

[mlxtend docs](https://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/)

In [37]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data
y = iris.target
knn = KNeighborsClassifier(n_neighbors=4)

In [45]:
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

sfs1 = SFS(knn,                 # the classifier
           k_features=3,        # the number of features to select
           forward=True,        # if True, perform forward selection
           floating=False,      # if True, perform floating search
           verbose=0,           # verbosity level (0 = no progress output)
           scoring='accuracy',  # the metric to use
           cv=0)                # 0 = no cross-validation (scored on the training data)

sfs1 = sfs1.fit(X, y)
sfs1.subsets_

{1: {'feature_idx': (3,),
  'cv_scores': array([0.96]),
  'avg_score': 0.96,
  'feature_names': ('3',)},
 2: {'feature_idx': (2, 3),
  'cv_scores': array([0.97333333]),
  'avg_score': 0.9733333333333334,
  'feature_names': ('2', '3')},
 3: {'feature_idx': (1, 2, 3),
  'cv_scores': array([0.97333333]),
  'avg_score': 0.9733333333333334,
  'feature_names': ('1', '2', '3')}}

In [49]:
# Note that the transform call is equivalent to
# X[:, sfs1.k_feature_idx_]
X_sfs = sfs1.transform(X)
print(X.shape)
print(X_sfs.shape)

(150, 4)
(150, 3)
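
The equivalence mentioned in the comment is plain NumPy column indexing. A small sketch on a made-up array, where `feature_idx` stands in for `sfs1.k_feature_idx_`:

```python
import numpy as np

X_demo = np.arange(20).reshape(5, 4)   # 5 samples, 4 features
feature_idx = (1, 2, 3)                # stand-in for sfs1.k_feature_idx_

# Keep all rows, but only the selected columns
X_selected = X_demo[:, list(feature_idx)]
print(X_demo.shape, X_selected.shape)
```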


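In [None]:
"""
# Disabled cell: X_train, X_test, y_train, y_test and BayesSearch_results
# are not defined in this sheet.
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from skopt import BayesSearchCV
from skopt.space import Real

ridge1 = Ridge()
ridge2 = Ridge()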

sfs_ridge = SFS(ridge1, k_features="best", forward=True, floating=True, 
                scoring='neg_mean_squared_error', cv=5, n_jobs=-1)
pipe_ridge = Pipeline([('sfs', sfs_ridge), 
                       ('ridge2', ridge2)])
bs_ridge = BayesSearchCV(pipe_ridge,
                   {
                         'ridge2__alpha': Real(1e-6, 1e+6, prior='log-uniform'),
                   }, 
                   n_iter=5,
                   cv=5, 
                   scoring='neg_root_mean_squared_error',
                   random_state=0)

bs_ridge = bs_ridge.fit(X_train.drop("Time", axis=1), y_train)
BayesSearch_results(X_train, X_test, y_train, y_test, bs_ridge)
"""