In [3]:
from sklearn.neural_network import MLPClassifier
import numpy as np

In [13]:
X = np.array([[0., 0.], [1., 1.]])
y = [0, 1]

clf = MLPClassifier(solver='lbfgs', alpha=1e-5, hidden_layer_sizes=(5, 2), random_state=1)
clf.fit(X, y)

MLPClassifier(alpha=1e-05, hidden_layer_sizes=(5, 2), random_state=1,
              solver='lbfgs')

In [14]:
# coefs_[i] is the weight matrix between layer i and layer i + 1
[coef.shape for coef in clf.coefs_]

[(2, 5), (5, 2), (2, 1)]

In [15]:
# intercepts_[i] is the bias vector of layer i + 1
clf.intercepts_

[array([-0.14962269,  0.75950271, -0.5472481 ,  6.92417703, -0.87510813]),
 array([-0.47635084, -0.76834882]),
 array([8.53354251])]

**Regressor**

Class **MLPRegressor** implements an MLP that trains using backpropagation with no activation function in the output layer (equivalently, the identity function is used as the output activation).

It uses the squared error as the loss function, and the output is a set of continuous values.

Also supports multi-output regression, in which samples can have more than one target.
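Here is a minimal sketch of multi-output regression, assuming a made-up dataset with two linear targets per sample. The shape of y becomes (n_samples, n_outputs), and the fitted model reports its output activation as 'identity'.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical synthetic data: 3 features, 2 continuous targets per sample
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 3))
Y = np.column_stack([X[:, 0] + 2 * X[:, 1], X[:, 1] - X[:, 2]])  # shape (200, 2)

reg = MLPRegressor(hidden_layer_sizes=(20,), solver="lbfgs", max_iter=2000, random_state=0)
reg.fit(X, Y)

pred = reg.predict(X[:5])
print(pred.shape)           # (5, 2): one column per target
print(reg.out_activation_)  # identity: no squashing in the output layer
print(reg.score(X, Y))      # R^2 averaged over both outputs
```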

**Classifier**

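Class **MLPClassifier** implements an MLP algorithm that trains using backpropagation.

It supports multi-class classification (using softmax as the output function) and multi-label classification, in which a sample can belong to more than one class.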

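Both are trained on two arrays:
* X: array of shape (n_samples, n_features), the training samples
* y: array of shape (n_samples,), the target values (class labels for MLPClassifier, continuous values for MLPRegressor)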
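The following sketch uses iris as an example dataset to contrast the two classification modes. With a 1-D y of class labels the output layer is softmax, so each row of predict_proba sums to 1. With a 2-D binary indicator y (the two labels below are invented for illustration) the output layer switches to an independent logistic unit per label.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)  # X: (150, 4), y: (150,) with 3 classes

# Multi-class: softmax output layer
clf = MLPClassifier(solver="lbfgs", hidden_layer_sizes=(10,), max_iter=1000, random_state=1)
clf.fit(X, y)
proba = clf.predict_proba(X[:3])
print(clf.out_activation_)  # softmax
print(proba.sum(axis=1))    # each row sums to 1

# Multi-label: y is a binary indicator matrix, one column per (hypothetical) label
Y_multi = np.column_stack([X[:, 0] > 5.8, X[:, 2] > 4.0]).astype(int)
clf_ml = MLPClassifier(solver="lbfgs", hidden_layer_sizes=(10,), max_iter=1000, random_state=1)
clf_ml.fit(X, Y_multi)
print(clf_ml.out_activation_)       # logistic: one sigmoid per label
print(clf_ml.predict(X[:3]).shape)  # (3, 2)
```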

**Regularization**

Both use the parameter **alpha** for L2 regularisation, which helps avoid overfitting by penalising weights with large magnitudes. The effect of varying alpha is illustrated in the scikit-learn example:

https://scikit-learn.org/stable/auto_examples/neural_networks/plot_mlp_alpha.html#sphx-glr-auto-examples-neural-networks-plot-mlp-alpha-py
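A rough way to see the penalty at work is to fit the same network with a tiny and a large alpha and compare the total squared magnitude of the weight matrices in coefs_. The biases in intercepts_ are not penalised. The make_moons data, layer sizes and the two alpha values are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=100, noise=0.2, random_state=0)

def total_weight_norm(model):
    # Sum of squared entries over all weight matrices (biases excluded)
    return sum(np.sum(W ** 2) for W in model.coefs_)

norms = {}
for alpha in (1e-5, 10.0):
    clf = MLPClassifier(solver="lbfgs", alpha=alpha, hidden_layer_sizes=(10, 10),
                        max_iter=2000, random_state=1)
    clf.fit(X, y)
    norms[alpha] = total_weight_norm(clf)
    print(alpha, norms[alpha], clf.score(X, y))
```

The large alpha should give much smaller weights and a smoother, less overfitted decision boundary.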

**Optimization**

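Both can be trained with SGD, Adam or L-BFGS, chosen through the **solver** parameter ('sgd', 'adam' or 'lbfgs'; the default is 'adam').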
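A minimal sketch comparing the three solvers on standardised iris data is shown below. The dataset, learning rates and max_iter values are illustrative assumptions. The momentum and nesterovs_momentum parameters only take effect with solver='sgd'.

```python
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

models = {
    "lbfgs": MLPClassifier(solver="lbfgs", max_iter=1000, random_state=0),
    "adam": MLPClassifier(solver="adam", learning_rate_init=1e-3, max_iter=1000, random_state=0),
    "sgd": MLPClassifier(solver="sgd", learning_rate_init=0.01, momentum=0.9,
                         nesterovs_momentum=True, max_iter=2000, random_state=0),
}

scores = {}
for name, clf in models.items():
    clf.fit(X, y)
    scores[name] = clf.score(X, y)
    print(name, clf.n_iter_, scores[name])  # iterations used and training accuracy
```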

**Scaling**

* Scale the data, since MLPs are sensitive to feature scaling: e.g. standardise each attribute to mean 0 and variance 1, or scale it to the range [0, 1] or [-1, 1]. Apply the same scaling to the test set for meaningful results.

The **StandardScaler** can do this.

* Regularisation parameter alpha

Use **GridSearchCV** to find a good alpha, usually in the range 10.0 ** -np.arange(1, 7).

* L-BFGS converges faster and finds better solutions on small datasets. For relatively large datasets, Adam is robust and usually converges quickly. SGD with momentum or Nesterov's momentum can outperform both if the learning rate is correctly tuned.
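The scaling and alpha tips can be combined in one sketch. A pipeline keeps the StandardScaler inside cross-validation, so it is fitted on the training folds only. GridSearchCV then searches alpha over 10.0 ** -np.arange(1, 7). The breast-cancer dataset, the lbfgs solver and cv=3 are illustrative choices, not taken from the notes.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# make_pipeline names the steps after their classes, hence "mlpclassifier__alpha"
pipe = make_pipeline(StandardScaler(), MLPClassifier(solver="lbfgs", max_iter=1000, random_state=1))
param_grid = {"mlpclassifier__alpha": 10.0 ** -np.arange(1, 7)}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X_train, y_train)

test_score = search.score(X_test, y_test)
print(search.best_params_, test_score)
```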



In [19]:
from sklearn.preprocessing import StandardScaler

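scaler = StandardScaler()

# Fit the scaler on the training data only, then apply the same
# transformation to both the training and the test data
#scaler.fit(X_train)
#X_train = scaler.transform(X_train)
#X_test = scaler.transform(X_test)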

# Basic end-to-end scikit-learn workflow

In [20]:
import pandas as pd
import numpy as np

# Import dataset and save to a dataframe
#data_df = pd.read_csv()

# Group data into features and labels
#X = data_df.drop("target",axis=1)
#y = data_df["target"]

In [21]:
# Split data into training and test sets
#from sklearn.model_selection import train_test_split
#X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1, shuffle=True)

Figure out which model to use

https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html

or 

https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier

https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor

In [23]:
# Instantiate an instance of the chosen model
#clf = ...

# Fit model to data
#clf.fit(X_train, y_train)

In [24]:
# Evaluate predictions
#clf.score(X_test, y_test)

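The commented steps above can be filled in as a runnable sketch. The iris dataset (loaded as a DataFrame with a "target" column) stands in for the elided pd.read_csv() call. The model choice and its parameters are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Import dataset and save to a dataframe (iris stands in for pd.read_csv here)
data_df = load_iris(as_frame=True).frame

# Group data into features and labels
X = data_df.drop("target", axis=1)
y = data_df["target"]

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1, shuffle=True)

# Scale: fit on the training set only, apply the same transform to both sets
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Instantiate the chosen model and fit it
clf = MLPClassifier(max_iter=1000, random_state=1)
clf.fit(X_train, y_train)

# Evaluate predictions (mean accuracy on the test set)
accuracy = clf.score(X_test, y_test)
print(accuracy)
```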

# PyTorch 

In [1]:
import torch
from torch import nn, optim
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn

**Creating a dataset class**

In [None]:
from torch.utils.data import Dataset, DataLoader

# For regression:
class Data(Dataset):
    
    # Initialisation
    def __init__(self, train=True):
        self.x = torch.arange(-3, 3, 0.1).view(-1, 1)
        self.f = -3 * self.x + 1
        self.y = self.f + 0.1 * torch.randn(self.x.size())
        self.len = self.x.shape[0]

        # Add outliers to the training set only
        if train:
            self.y[0] = 0
            self.y[50:55] = 20

    # Indexer
    def __getitem__(self, index):    
        return self.x[index], self.y[index]
    
    # Length
    def __len__(self):
        return self.len
    
# Create training dataset and validation dataset
train_data = Data()
val_data = Data(train = False)
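DataLoader is imported in the cell above but not yet used. This sketch shows how it batches and shuffles any Dataset. TensorDataset stands in for the Data class here (same line y = -3x + 1 plus noise, without outliers), and batch_size=16 is an arbitrary choice.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
x = torch.arange(-3, 3, 0.1).view(-1, 1)
y = -3 * x + 1 + 0.1 * torch.randn(x.size())
dataset = TensorDataset(x, y)  # stand-in for Data(); any Dataset behaves the same way

# DataLoader yields (x_batch, y_batch) tuples; the last batch may be smaller
loader = DataLoader(dataset, batch_size=16, shuffle=True)
batch_sizes = [xb.shape[0] for xb, yb in loader]
print(len(dataset), batch_sizes)
```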

**Training and test split**

**The Dataset**

In [4]:
import torch
import torch.nn as nn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

# Import dataset into a dataframe
#dataset = pd.read_csv()

# Have a look
#print(dataset.shape)
#dataset.head()

In [5]:
# Exploratory analysis - PLOTS?



In [None]:
# Data pre-processing
# Two types of information, categorical (discrete)
# or continuous
categorical_cols = ["columns..."]
numerical_cols = ["cols"]
outputs = ["cols"]

#check data types of columns
#dataset.dtypes

# Recast as category type (first step in numerical conversion)
for category in categorical_cols:
    dataset[category] = dataset[category].astype('category')

# Convert our columns to numpy arrays
# Convert each categorical column to its integer codes and stack them
# into an array of shape (n_samples, n_categorical_cols)
categorical_data = np.stack([dataset[col].cat.codes.values for col in categorical_cols], axis=1)

# Convert to tensor
categorical_data = torch.tensor(categorical_data, dtype=torch.int64)

# Repeat for numerical columns
numerical_data = np.stack([dataset[col].values for col in numerical_cols], 1)
numerical_data = torch.tensor(numerical_data, dtype=torch.float)

# And the output array: flatten the (n_samples, 1) array to 1-D
# so the targets have shape (n_samples,)
outputs = torch.tensor(dataset[outputs].values.flatten())

# Check them all
print(categorical_data.shape)
print(numerical_data.shape)
print(outputs.shape)

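# Split them up: bundle the tensors into one Dataset, then split it randomly
# (80/20 here as an example)
from torch.utils.data import TensorDataset, random_split
full_data = TensorDataset(categorical_data, numerical_data, outputs)
n_train = int(0.8 * len(full_data))
train_data, test_data = random_split(full_data, [n_train, len(full_data) - n_train])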
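The preprocessing cell above can be run end to end on a small made-up DataFrame. The column names and values below are hypothetical. The 8/2 split and the seeded generator are illustrative choices that keep the split reproducible.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical toy data: two categorical columns, two numerical columns, one output
dataset = pd.DataFrame({
    "colour": ["red", "blue", "red", "green", "blue", "green", "red", "blue", "green", "red"],
    "size": ["S", "M", "L", "S", "M", "L", "S", "M", "L", "S"],
    "height": [1.0, 2.5, 3.1, 0.7, 2.2, 1.9, 3.3, 2.8, 1.1, 0.4],
    "weight": [10.0, 20.0, 30.0, 15.0, 25.0, 35.0, 12.0, 22.0, 32.0, 18.0],
    "price": [5.0, 7.0, 9.0, 4.0, 6.0, 8.0, 5.0, 7.0, 9.0, 3.0],
})
categorical_cols = ["colour", "size"]
numerical_cols = ["height", "weight"]
output_cols = ["price"]

for col in categorical_cols:
    dataset[col] = dataset[col].astype("category")

# .cat.codes maps each category (sorted alphabetically here) to an integer
categorical_data = np.stack([dataset[col].cat.codes.values for col in categorical_cols], axis=1)
categorical_data = torch.tensor(categorical_data, dtype=torch.int64)

numerical_data = np.stack([dataset[col].values for col in numerical_cols], axis=1)
numerical_data = torch.tensor(numerical_data, dtype=torch.float)

outputs = torch.tensor(dataset[output_cols].values.flatten(), dtype=torch.float)

full_data = TensorDataset(categorical_data, numerical_data, outputs)
train_data, test_data = random_split(full_data, [8, 2], generator=torch.Generator().manual_seed(0))
print(categorical_data.shape, numerical_data.shape, outputs.shape)
print(len(train_data), len(test_data))
```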

**Creating a combination of modules**

In [16]:
# Sequential combines some Modules
model = torch.nn.Sequential(torch.nn.Linear(3, 1), torch.nn.Flatten(0, 1))

# Linear Module computes output from input using a linear function
# and holds its weights and biases internally

# Flatten Module flattens the output of the linear layer to a 1D tensor
# to match the shape of y
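A quick shape check makes the role of Flatten(0, 1) concrete. Linear(3, 1) maps an (N, 3) batch to (N, 1), and flattening dimensions 0 to 1 turns that into (N,). The batch size of 8 is arbitrary.

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(3, 1), torch.nn.Flatten(0, 1))

X = torch.randn(8, 3)  # 8 samples, 3 features
hidden = model[0](X)   # output of the Linear layer alone
y_pred = model(X)      # output of the whole Sequential
print(hidden.shape)    # torch.Size([8, 1])
print(y_pred.shape)    # torch.Size([8])
```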

**Loss functions**

In [None]:
loss_fn = torch.nn.MSELoss()  # averages the squared errors by default; reduction='sum' sums them instead
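A small worked example checks what MSELoss computes, using hand-picked numbers. The squared errors are 1, 0 and 4, so the mean reduction gives 5/3 and reduction="sum" gives 5.

```python
import torch

y_pred = torch.tensor([2.0, 0.0, 1.0])
y = torch.tensor([1.0, 0.0, 3.0])

mean_loss = torch.nn.MSELoss()(y_pred, y)                # (1 + 0 + 4) / 3
sum_loss = torch.nn.MSELoss(reduction="sum")(y_pred, y)  # 1 + 0 + 4
print(mean_loss.item(), sum_loss.item())
```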

**Optimiser** (optim)

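So far we have updated the weights manually inside a "with torch.no_grad():" block.
For larger models this becomes a burden. The **optim** package implements common optimisation algorithms (SGD, RMSprop, Adam, ...) and performs the update for us.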

In [14]:
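# Set up the optimizer, and link it to our 'model'
learning_rate = 1e-3  # example value; tune for the problem
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

# In each training iteration, first zero the gradients from the previous step
optimizer.zero_grad()

# ...then compute the loss and backpropagate, e.g.
# loss = loss_fn(model(X), y)
# loss.backward()

# Then step the optimizer to update the parameters
optimizer.step()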
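Putting the Sequential model, MSELoss and the RMSprop optimiser together gives a complete training loop. In this sketch the synthetic linear data, learning rate and step count are all hypothetical choices. The order inside the loop is forward pass, loss, zero_grad, backward, step.

```python
import torch

torch.manual_seed(0)
# Hypothetical synthetic data: y = 1*x0 - 2*x1 + 0.5*x2 + 0.3 + noise
X = torch.randn(200, 3)
y = X @ torch.tensor([1.0, -2.0, 0.5]) + 0.3 + 0.05 * torch.randn(200)

model = torch.nn.Sequential(torch.nn.Linear(3, 1), torch.nn.Flatten(0, 1))
loss_fn = torch.nn.MSELoss()
learning_rate = 1e-2
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

losses = []
for step in range(1000):
    y_pred = model(X)          # forward pass
    loss = loss_fn(y_pred, y)
    losses.append(loss.item())
    optimizer.zero_grad()      # clear gradients from the previous step
    loss.backward()            # compute new gradients
    optimizer.step()           # update the weights and bias

print(losses[0], losses[-1])
```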

# Remember

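* When updating weights manually, wrap the update in "with torch.no_grad():" so autograd does not track it. Then zero the gradients after updating the weights (otherwise they accumulate across iterations), or zero them before the next backward pass.

* nn.Module implements __call__, which runs forward() (plus any registered hooks), so you can simply call model(X) instead of model.forward(X), i.e. y_pred = model(X).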
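A tiny scalar example shows why the gradients must be zeroed, and what a manual update under no_grad looks like. The loss (2w)^2 and the learning rate are arbitrary. Since d(loss)/dw = 8w, the gradient at w = 1 is 8, and a second backward pass without zeroing adds another 8.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

def loss_fn(w):
    return (2 * w) ** 2  # d(loss)/dw = 8w

loss_fn(w).backward()
first_grad = w.grad.item()        # 8.0
loss_fn(w).backward()             # no zeroing in between
accumulated_grad = w.grad.item()  # 16.0: the two gradients were summed

w.grad.zero_()                    # start fresh
loss_fn(w).backward()             # grad is 8.0 again
learning_rate = 0.01
with torch.no_grad():             # do not record the update in the graph
    w -= learning_rate * w.grad   # 1.0 - 0.01 * 8.0 = 0.92
w.grad.zero_()                    # ready for the next iteration
print(first_grad, accumulated_grad, w.item())
```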

**Autograd function**

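A custom autograd function consists of just two functions:

1. Forward: computes output tensors from input tensors.

2. Backward: receives the gradient of some scalar value (e.g. the loss) with respect to the output tensors, and computes the gradient of that same value with respect to the input tensors.

So we can create a class which inherits from torch.autograd.Function, implement forward and backward as @staticmethods, and then call it through the class's apply method.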
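As a sketch, here is ReLU reimplemented as a custom Function (the class name MyReLU is hypothetical). forward saves its input on ctx for use in backward. backward passes the incoming gradient through where the input was non-negative and zeroes it elsewhere.

```python
import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)  # keep the input for the backward pass
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[x < 0] = 0     # ReLU has zero gradient for negative inputs
        return grad_input

x = torch.tensor([-1.0, 0.5, 2.0], requires_grad=True)
y = MyReLU.apply(x)  # custom Functions are called via .apply, not instantiated
y.sum().backward()
print(y)       # tensor([0.0000, 0.5000, 2.0000], grad_fn=...)
print(x.grad)  # tensor([0., 1., 1.])
```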

**NN** -
When building networks with many layers and learnable parameters we use the nn package. It defines a set of **Modules**, which are roughly equivalent to neural network layers, and also provides the loss functions commonly used for training.

**Module**  - 
Receives input Tensors and computes output Tensors, but also holds information about state (such as parameters). 
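The state a Module holds can be inspected directly, as in this short sketch. nn.Linear(3, 1) registers a weight of shape (1, 3) and a bias of shape (1,). Both require gradients, so an optimiser receives them through model.parameters().

```python
import torch

layer = torch.nn.Linear(3, 1)
for name, param in layer.named_parameters():
    print(name, tuple(param.shape), param.requires_grad)
# weight (1, 3) True
# bias (1,) True

n_params = sum(p.numel() for p in layer.parameters())  # 3 weights + 1 bias
print(n_params)  # 4
```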


**Custom nn Modules**