# Break linearity example

> Why have neural networks ruled in fitting power since 2006?

> This notebook is intended for company (Genomicare Bio) training, to help people feel more comfortable with neural networks (NN)

You have certainly heard ```neural network``` and ```deep learning``` a lot in recent years. Yes, it is really advanced in certain respects, but it is just a tool, and one that is much easier to use than its predecessor, linear regression.

This notebook is yet another effort to democratize deep learning. 

This kind of graph appears in many tutorials:

![Simple nn view from wikipedia](https://upload.wikimedia.org/wikipedia/commons/e/e4/Artificial_neural_network.svg)

## Some playing around

* Try using your [face to control Pac-Man](https://storage.googleapis.com/tfjs-examples/webcam-transfer-learning/dist/index.html)

* A neural [network playground](https://playground.tensorflow.org/)

# What is all the fuss about, and why?

### What?

In a one-line answer:

A neural network is one of the most powerful and easiest-to-use **predictors**: given x of almost any shape, it predicts y of almost any shape.

I could make that sentence even longer, but in essence it is an awesome **predictor**.

Think of $y = f(x)$: $f$ is a function (the model), a predictor.

### Why?

* It often works, with less effort than you would expect, and often better than human judgement
* Papers in this area often compare how models learn with how humans recognize things
* It works like a warp engine for other, more traditional models
* x and y can be really noisy and come in twisted, weird shapes. In other words, it handles all sorts of problem sets with **much less preprocessing** and **much less feature engineering**
* Real-life data is itself noisy, twisted and weirdly shaped

## Dataset
### Simplified Human Activity Recognition w/Smartphone

See the [dataset detail](https://www.kaggle.com/mboaglio/simplifiedhuarus)

> Abstract: Human Activity Recognition database built from the recordings of 30 subjects performing activities of daily living (ADL) while carrying a waist-mounted smartphone with embedded inertial sensors.

> Data Set Characteristics: Multivariate, Time-Series

> Data Set Information:

> The experiments have been carried out with a group of 30 volunteers within an age bracket of 19-48 years. Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a smartphone (Samsung Galaxy S II) on the waist. Using its embedded accelerometer and gyroscope, we captured 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50Hz. The experiments have been video-recorded to label the data manually. The obtained dataset has been randomly partitioned into two sets, where 70% of the volunteers was selected for generating the training data and 30% the test data.

> The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 sec and 50% overlap (128 readings/window). The sensor acceleration signal, which has gravitational and body motion components, was separated using a Butterworth low-pass filter into body acceleration and gravity. The gravitational force is assumed to have only low frequency components, therefore a filter with 0.3 Hz cutoff frequency was used. From each window, a vector of features was obtained by calculating variables from the time and frequency domain.

> Check the README.txt file for further details about this dataset.

> An updated version of this dataset can be found at [here](https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones). It includes labels of postural transitions between activities and also the full raw inertial signals instead of the ones pre-processed into windows.
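The window arithmetic in the description above (50Hz, 2.56 sec windows, 50% overlap, 128 readings/window) can be checked with a small sketch. The `sliding_windows` helper and the fake signal are illustrative assumptions, not the dataset authors' preprocessing code:

```python
SAMPLE_RATE_HZ = 50     # from the dataset description
WINDOW_SECONDS = 2.56   # from the dataset description
OVERLAP = 0.5           # 50% overlap between consecutive windows

window_size = round(SAMPLE_RATE_HZ * WINDOW_SECONDS)  # readings per window
step = int(window_size * (1 - OVERLAP))               # shift between window starts


def sliding_windows(signal, size, step):
    """Cut a 1-D sequence into fixed-width windows; a trailing partial window is dropped."""
    return [signal[start:start + size]
            for start in range(0, len(signal) - size + 1, step)]


# 10 seconds of fake readings, just to count the windows
signal = list(range(SAMPLE_RATE_HZ * 10))
windows = sliding_windows(signal, window_size, step)
print(window_size, step, len(windows))
```

Each row of the CSV files used below corresponds to one such window, already summarized into a feature vector.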

In [None]:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

from sklearn.decomposition import PCA

Install extra [packages](https://github.com/raynardj/forgebox)

In [None]:
!pip install -q forgebox

In [None]:
from forgebox.imports import *
from forgebox.ftorch.prepro import split_df
from forgebox.html import DOM
from forgebox.images.widgets import view_images

## Train data and test(validation) data

In [None]:
# constant
DATA = Path("/kaggle/input/simplifiedhuarus")
VALID_RATIO = .2

def read_data(filename: str) -> pd.DataFrame:
    """Read `<filename>.csv` from the dataset directory"""
    return pd.read_csv(DATA/f'{filename}.csv')

total_df = read_data("train")
# train/valid split
train_df, valid_df = split_df(total_df, valid = VALID_RATIO)

test_df = read_data("test")

In [None]:
total_df.sample(10)

In [None]:
train_df.vc("activity")

In [None]:
valid_df.vc("activity")

### Visualize data signal pattern

We have 561 input features at our disposal

In [None]:
feature_nums = train_df.query("activity=='LAYING'").values[:200,2:].shape[1]
feature_nums

In [None]:
plt.imshow(train_df.query("activity=='LAYING'").values[:200,2:].astype(np.float32))

In [None]:
plt.imshow(train_df.query("activity=='STANDING'").values[:200,2:].astype(np.float32))

In [None]:
plt.imshow(train_df.query("activity=='WALKING_UPSTAIRS'").values[:200,2:].astype(np.float32))

## Build up dataloader

In [None]:
from torch.utils.data import DataLoader,Dataset

We have a pretty balanced dataset

In [None]:
y_map = dict((v,k) for k,v in enumerate(train_df.vc("activity").index))

Target categories mapped to indices

In [None]:
y_map
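The model will output indices, so at some point they have to be turned back into activity names. A minimal sketch of the round trip, using a hypothetical label order (the real order comes from `train_df.vc("activity").index` and may differ):

```python
# Hypothetical label order, for illustration only
labels = ["LAYING", "STANDING", "SITTING",
          "WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS"]

# label -> index, same shape as y_map in the notebook
y_map_demo = {label: idx for idx, label in enumerate(labels)}

# index -> label, the inverse mapping used to decode predictions
idx_to_label = {idx: label for label, idx in y_map_demo.items()}

predicted = [0, 3, 3, 5]  # pretend these came out of a model
decoded = [idx_to_label[i] for i in predicted]
print(decoded)
```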

In [None]:
class ArrayDs(Dataset):
    def __init__(self, df, y_map=y_map):
        self.df = df
        self.y_map = y_map
        self.Xs = self.df.values[:,2:].astype(np.float32)
        if 'activity' in self.df.columns:
            self.has_y = True
            self.Ys = self.df['activity'].apply(lambda x: self.y_map[x]).values
        else:
            self.has_y = False
        
    def __len__(self): return len(self.df)
    
    def __getitem__(self,idx):
        if self.has_y:
            return self.Xs[idx],self.Ys[idx]
        else:
            return self.Xs[idx]
    
def get_data_dl(df: pd.DataFrame, batch_size: int=128, shuffle=False) -> DataLoader:
    ds = ArrayDs(df)
    return DataLoader(ds, shuffle=shuffle, batch_size=batch_size)

In [None]:
x, y = ArrayDs(train_df,)[6]

x, y

## Training boilerplate

> You can also use sklearn for this part

In [None]:
!pip install -q pytorch-lightning==1.0.4

In [None]:
from pytorch_lightning import LightningDataModule, LightningModule
import pytorch_lightning as pl

class AllData(LightningDataModule):
    def __init__(self):
        super().__init__()
        
    def prepare_data(self):
        self.total_df = read_data("train")
        # train/valid split
        self.train_df, self.valid_df = split_df(
            self.total_df, valid = VALID_RATIO)
        
        self.test_df = read_data("test")
    
    def train_dataloader(self): return get_data_dl(
        self.train_df, shuffle=True)
    
    def val_dataloader(self): return get_data_dl(self.valid_df)
    
    def test_dataloader(self): return get_data_dl(self.test_df)
    

class ltModule(LightningModule):
    def __init__(self, base_model):
        super().__init__()
        self.base_model = base_model
        self.crit = nn.CrossEntropyLoss()
        self.accuracy = pl.metrics.Accuracy()
            
    def configure_optimizers(self):
        opt = torch.optim.Adam(self.parameters(), lr=1e-3)
        return opt
    
    def forward(self, x): return self.base_model(x)
    
    def forward_pass(self, batch):
        x, y = batch
        y_ = self(x)
        loss = self.crit(y_,y)
        acc = self.accuracy(y_.argmax(dim=-1),y)
        return {'loss': loss, 'acc':acc}
        
    def training_step(self, batch, batch_idx): return self.forward_pass(batch)
        
    def validation_step(self, batch, batch_idx): return self.forward_pass(batch)
    
    def print_acc(self, outputs, phase):
        avg_acc = torch.stack([x['acc'] for x in outputs]).mean()
        print(f"[{phase}]\tAccuracy:\t{int(avg_acc.item()*100)}%", end="\t")
    
    def training_epoch_end(self, outputs):
        self.print_acc(outputs, "TRAIN")

    def validation_epoch_end(self, outputs):
        self.print_acc(outputs, "VALID")

def learn(base_model, max_epochs=5):
    """
    Train the model automatically: the entire pipeline
    """
    all_data = AllData()
    module = ltModule(base_model)
    trainer = pl.Trainer(max_epochs=max_epochs)
    trainer.fit(model = module, datamodule=all_data, )

## Linear relation

In [None]:
# create linear model
linear_model = nn.Linear(feature_nums,len(y_map))

# learning process
learn(linear_model)
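For intuition, here is roughly what the linear model plus ```nn.CrossEntropyLoss``` computes for a single sample, written out in plain Python. The numbers are made up, and the tiny 2-feature, 3-class setup is an assumption chosen to keep the arithmetic readable:

```python
import math


def linear(x, weight, bias):
    """logits[j] = sum_i x[i] * weight[j][i] + bias[j] (same layout as nn.Linear.weight)."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + b
            for row, b in zip(weight, bias)]


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[target])


x = [1.0, 2.0]                                   # 2 made-up input features
weight = [[0.5, -0.5], [0.0, 1.0], [-1.0, 0.0]]  # 3 classes x 2 features
bias = [0.0, 0.0, 0.5]

logits = linear(x, weight, bias)
probs = softmax(logits)
predicted = probs.index(max(probs))
print(logits, predicted, cross_entropy(logits, target=1))
```

The loss is small when the true class gets a high probability and grows when it does not; training nudges `weight` and `bias` to shrink it.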

## Break Linearity

> Basic neural network

In [None]:
HIDDEN_SIZE = 512

# create model structure
neural_network = nn.Sequential(
    nn.Linear(feature_nums, HIDDEN_SIZE),
    nn.BatchNorm1d(HIDDEN_SIZE),
    nn.ReLU(),
    nn.Linear(HIDDEN_SIZE, len(y_map))
)

# learning process
learn(neural_network)

## Visualize

### Weights structure

In [None]:
def print_param_shape(model) -> pd.DataFrame:
    """print out the parameter shapes of a model"""
    return pd.DataFrame(list({"name":k, "shape":tuple(p.T.shape)} 
        for k,p in model.named_parameters()))

In [None]:
print_param_shape(linear_model)

In [None]:
print_param_shape(neural_network)
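The shapes above translate directly into parameter counts. A quick back-of-the-envelope sketch, assuming the 561 input features, 6 classes and 512 hidden units used in this notebook (the helper names are made up for illustration):

```python
def linear_params(n_in, n_out):
    """A linear layer holds an n_in x n_out weight matrix plus one bias per output."""
    return n_in * n_out + n_out


def batchnorm_params(n_features):
    """BatchNorm1d learns one scale and one shift per feature."""
    return 2 * n_features


N_FEATURES, N_CLASSES, HIDDEN = 561, 6, 512

linear_total = linear_params(N_FEATURES, N_CLASSES)
nn_total = (linear_params(N_FEATURES, HIDDEN)
            + batchnorm_params(HIDDEN)
            + linear_params(HIDDEN, N_CLASSES))
print(linear_total, nn_total)
```

So the hidden layer buys its extra flexibility with roughly 87 times as many parameters.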

### Find influence

Input features: we print out the first 50 of our input columns

In [None]:
x_map = np.array(total_df.columns[2:])
print('\t'.join(x_map[:50]))

Output categories

In [None]:
y_map

In [None]:
linear_weight = linear_model.weight.data.T
linear_weight, linear_weight.shape

In [None]:
most_influential_rank = linear_weight.abs().sum(-1).argsort(descending=True).numpy()

x_map[most_influential_rank][:30]
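The ranking above sums the absolute weights of each feature across all classes and sorts from largest to smallest. The same idea on a toy example with made-up feature names and weights:

```python
# Hypothetical weights: feature name -> one weight per output class
weights = {
    "feat_a": [0.1, -0.2, 0.0],
    "feat_b": [1.5, -1.0, 0.5],
    "feat_c": [-0.4, 0.4, 0.3],
}

# Influence = sum of absolute weights across classes (sign does not matter here)
influence = {name: sum(abs(w) for w in row) for name, row in weights.items()}

# Most influential first
ranked = sorted(influence, key=influence.get, reverse=True)
print(ranked)
```

This reading of "influence" only makes sense when the input features are on comparable scales; otherwise a large weight may simply compensate for a small-valued feature.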

In [None]:
import plotly.express as px
def plot_heat(w,xname,yname,x,y,x_axis_top=True, height=None):
    pca = PCA(1,)
    new_order = pca.fit_transform(w)[:,0].argsort()
    new_w = w[new_order,:].T

    fig = px.imshow(new_w,
                    labels = dict(x=xname,y=yname),
                    x=x[new_order],y=y)
    if x_axis_top:
        fig.update_xaxes(side="top")
    if height:
        fig.update_layout(height=height)
    fig.show()

### Linear visualize
> We visualize the influences directly from ```input``` features to ```output``` categories

The weight matrix has a shape of 561 input columns x 6 labels

Here we pick the most influential 30 input features for visualization

In [None]:
plot_heat(
    w=linear_weight[most_influential_rank][:30],
    xname="features",
    yname="activities",
    x=x_map[most_influential_rank][:30],
    y=np.array(list(y_map.keys()))
)

In [None]:
plot_heat(
    w=linear_model.bias.data.numpy()[None,:],
    xname="bias",
    yname="activities",
    x=np.array(["bias_layer",]),
    y=np.array(list(y_map.keys()))
)

In [None]:
neural_network[0],neural_network[3]

## Break linearity
> But a linear model cannot express the full range of ```OR, AND, XOR``` logic gates (XOR in particular is out of reach). The weight $W_{ij}$ simply says "the bigger/smaller $X_{i}$, the better for label $Y_{j}$"

> Before neural networks, what people usually tried was complicated feature engineering to break linearity: treating things like $X_{1}^2X_{2}, X_{1}X_{2}^2, X_{1}^2X_{2}^2, X_{1}^3X_{2}$... as extra input dimensions
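To make the XOR point concrete: no single weighted sum of $X_{1}$ and $X_{2}$ can be high for (0,1) and (1,0) but low for both (0,0) and (1,1). Two hidden ReLU units are enough, though. The weights below are picked by hand for illustration, not learned:

```python
def relu(z):
    """Keep positive values, clip negatives to zero."""
    return max(0.0, z)


def xor_net(x1, x2):
    """Tiny hand-wired network: 2 inputs -> 2 hidden ReLU units -> 1 output."""
    h1 = relu(x1 + x2)          # fires when at least one input is on
    h2 = relu(x1 + x2 - 1.0)    # fires only when both inputs are on
    return h1 - 2.0 * h2        # cancel the "both on" case


for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))
```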


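### Where the hidden layer comes in
Instead of $X_{i} \times W_{ij} \Rightarrow Y_{j}$, we do $X_{i} \times L1_{ik} \Rightarrow H_{k}$, pass $H_{k}$ through a non-linear activation (```ReLU``` in the model above), then $H_{k} \times L2_{kj} \Rightarrow Y_{j}$

> A neural network uses one linear model to fit the input into hidden neurons (summarizing the input features into a middle layer), applies a non-linear activation to the hidden values, then uses another linear model to fit the hidden neurons into the output. The activation is what breaks the linearity: without it, the two linear models would multiply out into a single linear model.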
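One caveat is worth checking by hand: two linear layers with nothing in between multiply out to a single linear layer, so it is the activation between them (the ```nn.ReLU()``` in the model above) that actually breaks linearity. A small sketch with made-up matrices, biases left out for brevity:

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def relu(m):
    """Element-wise ReLU on a matrix."""
    return [[max(0.0, v) for v in row] for row in m]


X = [[1.0, -2.0], [0.5, 3.0]]               # 2 samples x 2 features
L1 = [[1.0, -1.0, 0.5], [0.0, 2.0, -1.0]]   # 2 features -> 3 hidden
L2 = [[1.0], [-1.0], [2.0]]                 # 3 hidden -> 1 output

two_layers = matmul(matmul(X, L1), L2)      # stacked, no activation
one_layer = matmul(X, matmul(L1, L2))       # a single merged weight matrix
with_relu = matmul(relu(matmul(X, L1)), L2) # activation in between

print(two_layers, one_layer, with_relu)
```

The first two results are identical, so stacking alone adds nothing; only the version with ReLU computes something a single linear model cannot.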

### Visualize input to hidden layer
> We pick the first 80 input features and the first 30 hidden neurons to visualize

In [None]:
plot_heat(
    w=neural_network[0].weight.data.T[:80,:30],
    xname="features",
    yname="hidden_feature",
    x=x_map[:80],
    y=np.arange(30)
)

### Visualize hidden layer to output

> We pick the first 30 hidden neurons and all our output categories to visualize

In [None]:
plot_heat(
    w=neural_network[3].weight.data.T[:30,:],
    xname="hidden_feature",
    yname="activity",
    x=np.array(list(map(lambda i:f"H{i}",range(30)))),
    y=np.array(list(y_map.keys())),
)

# Where to go from here

### Use the same model weights through time over and over again

Take a sentence, e.g. ```I will not fear. Fear is the mind-killer. I will face my fear. I will let it pass through me. When the fear has gone, there shall be nothing. Only I will remain.```

f("I will not fe") => "a"

f(" will not fea") => "r"

f("will not fear") => "."

...

f("Only I will re") => "m"

f("nly I will rem") => "a"

f("ly I will rema") => "i"

f("y I will remai") => "n"

Here $f(x)$ should not be a different model depending on where we are within the sentence. It should be one single model that is run **recurrently**. Hence the term RNN (recurrent neural network).
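The input/output pairs listed above can be generated by sliding a window over the text. This is a minimal sketch of the data side only, assuming a fixed context of 13 characters; a real RNN carries a hidden state from step to step rather than re-reading a fixed window:

```python
def next_char_pairs(text, context_size):
    """Yield (context, next_char) pairs by sliding a fixed-size window over the text."""
    for start in range(len(text) - context_size):
        yield text[start:start + context_size], text[start + context_size]


text = "I will not fear. Fear is the mind-killer."
pairs = list(next_char_pairs(text, context_size=13))

for context, target in pairs[:3]:
    print(repr(context), "=>", repr(target))
```

Every pair is meant to be fed to the same $f$, which is the whole point: one set of weights, reused at every position.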

### Use the same model weights through space over and over again
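Reusing the same weights across space is presumably a pointer to convolution, where one small kernel slides over every position of the input. A minimal 1-D sketch with a made-up signal and a hand-picked edge-detecting kernel:

```python
def conv1d(signal, kernel):
    """Slide one kernel across the signal (no padding, no kernel flip)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


signal = [0, 0, 1, 1, 1, 0, 0]   # a flat bump
edge_kernel = [-1, 1]            # responds to changes between neighbours

print(conv1d(signal, edge_kernel))
```

The same two weights detect the rising edge and the falling edge wherever they occur, with no need for a separate weight per position.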

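### Stick a neural network into the Bellman equation

$V^{\pi^*}(s) = \max_a \left\{ R(s,a) + \gamma \sum_{s'} P(s'|s,a) V^{\pi^*}(s') \right\}$

Solved exactly, this needs a value for every single state, which quickly becomes impractical. But [with a deep NN as the function approximator you can play all sorts of Atari games](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf)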
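When the state space is tiny, the Bellman equation above can be solved directly by applying it repeatedly until the values stop changing (value iteration). The 2-state world below is entirely made up, and its transitions are deterministic, so the sum over $s'$ has a single term:

```python
GAMMA = 0.5  # discount factor

# Hypothetical deterministic MDP: TRANSITIONS[state][action] = (reward, next_state)
TRANSITIONS = {
    0: {"stay": (0.0, 0), "go": (1.0, 1)},
    1: {"stay": (2.0, 1), "go": (0.0, 0)},
}


def value_iteration(transitions, gamma, n_iters=100):
    """Repeatedly apply the Bellman optimality update to every state."""
    values = {s: 0.0 for s in transitions}
    for _ in range(n_iters):
        values = {
            s: max(r + gamma * values[s2] for r, s2 in actions.values())
            for s, actions in transitions.items()
        }
    return values


V = value_iteration(TRANSITIONS, GAMMA)
print(V)
```

With a table like `V` there is one entry per state. For something like game frames that table is far too large to fill in, which is where a neural network is used to estimate the values instead.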

# Let several models play with / teach / fight each other

Ever since [this paper](https://arxiv.org/pdf/1406.2661.pdf), which came out of a bar fight, people have started to play with several models interacting together.

And this [imaginative paper](https://arxiv.org/pdf/1703.10593.pdf) is one good example

# How to start?

* For Genomicarers: just look for 💪Ray😲
* Mostly, Python
* [Kaggle](https://www.kaggle.com): competitions, and notebooks to jump-start an unfamiliar problem set
* [fast.ai](https://www.fast.ai): a tutorial:
    * almost writing a DL library from scratch
    
## Environments

* [Google Colab](https://colab.research.google.com/), free environment, GPU/TPU
* Kaggle kernels, also free environment, GPU/TPU

## About Ray
* [Kaggle profile](https://www.kaggle.com/raynardj)
    * Competition notes for [Understanding Clouds from Satellite Images](https://github.com/iofthetiger/ucsi)
    * and [Global Wheat Detection](https://github.com/iofthetiger/gwd)
    * and [Recursion Cell Image Classification](https://github.com/raynardj/python4ml/tree/master/experiments/rcic)
* My [experiments](https://genomicare.github.io/docs/docs/experiments/) at genomicare
* My [python tutorial](https://github.com/raynardj/python4ml) designed for machine learning