# Model Definition

- Choose, justify and apply a model performance indicator (e.g. F1 score, true positive rate, within cluster sum of squared error, …) to assess your model and justify the choice of an algorithm

- Implement your algorithm in at least one deep learning and at least one non-deep learning algorithm, compare and document model performance

- Apply at least one additional iteration in the process model involving at least the feature creation task and record impact on model performance (e.g. data normalizing, PCA, …)

- Depending on the algorithm class and data set size you might choose specific technologies / frameworks to solve your problem. Please document all your decisions in the ADD (Architectural Decisions Document).
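
As a reference point for the performance indicator requirement, here is a minimal pure-Python sketch of a macro-averaged F1 score for a multi-class problem. The function name `macro_f1` and the sample labels are hypothetical and are not taken from the project data. Spark's `MulticlassClassificationEvaluator` reports a support-weighted F1 by default, so its value can differ from the unweighted average shown here when classes are imbalanced.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for four tiers (0-3), for illustration only
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 0, 3, 3]
score = macro_f1(y_true, y_pred)
print(round(score, 3))
```

Macro averaging treats every tier equally, which can be preferable when the smaller tiers matter as much as the larger ones.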

In [85]:
# load cleaned data
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()

In [86]:
preppedDataDF = spark.read.parquet('preppedDataDF.parquet')
preppedDataDF.createOrReplaceTempView("preppedDataDF")

### Third iteration: new features added with a function and a loop

## Model 1: Logistic Regression (Classification)
### Supervised machine learning
#### Classification of tiers of players

In [118]:
from pyspark.ml.regression import LinearRegression
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.mllib.evaluation import MulticlassMetrics

lr = LogisticRegression(labelCol="label", featuresCol="features", maxIter=10, regParam=0.5, elasticNetParam=0.8)

from pyspark.ml import Pipeline
# Feature vectors were already assembled and normalized in the earlier feature_eng module,
# so the pipeline only needs the classifier stage.
pipeline = Pipeline(stages=[lr])

In [120]:
# Persist the estimator definition, overwriting any earlier copy
lr.write().overwrite().save("logistic_regression")

## Model 2: MultilayerPerceptronClassifier (MLP) (Classification)
### Basic deep learning
#### Classification of tiers of players

In [98]:
from pyspark.ml.classification import MultilayerPerceptronClassifier

In [122]:
# Number of input features, read from the first row instead of collecting the whole DataFrame
num_inputs = len(train.first()['features'])

In [138]:
# Specify the layers of the neural network:
# an input layer of size num_inputs (13 features), four hidden layers (64, 64, 64, 32)
# and an output layer of size 4 (classes)

layers = [num_inputs, 64, 64, 64, 32, 4]
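
To get a feel for the capacity of this layer specification, the number of trainable parameters can be counted by hand. The sketch below assumes `num_inputs` is 13, as the cell comment states, and uses a hypothetical helper `count_mlp_parameters`. It counts one weight per connection plus one bias per non-input unit, which is the usual convention for fully connected layers.

```python
def count_mlp_parameters(layers):
    """Count weights plus biases across consecutive fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers, layers[1:]))


# Assumes num_inputs == 13, as stated in the cell comment
layers = [13, 64, 64, 64, 32, 4]
total = count_mlp_parameters(layers)
print(total)
```

Comparing this count with the number of training rows gives a rough sense of whether the network is oversized for the data set.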

In [139]:
# create the trainer and set its parameters
MLP_trainer = MultilayerPerceptronClassifier(maxIter=100, layers=layers, blockSize=128, seed=17)

In [140]:
# Persist the estimator definition, overwriting any earlier copy
MLP_trainer.write().overwrite().save("MLP_trainer")

## Model 3: Neural Net (Classification)
### Deep learning
#### Classification of tiers of players

In [61]:
import torch
import torch.nn as nn
import torchvision
import torch.nn.functional as F

I considered adding another hidden layer, but the model already performs reasonably well without it.

In [63]:
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out = 64, num_inputs, 20, 4

# Build input and label tensors from the training data (convert to pandas only once)
train_pd = train.toPandas()
x = torch.tensor(list(train_pd['features']))
y = torch.tensor(list(train_pd['label']))

# define the model by subclassing Module
class Net(nn.Module):

    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate three nn.Linear modules and one nn.ReLU
        activation and assign them as member variables.

        D_in: input dimension
        H: dimension of hidden layer
        D_out: output dimension
        """
        super(Net, self).__init__()
        # Define the layers here; modules such as ReLU can be reused in forward()
        self.layer_1 = nn.Linear(D_in, H, bias=True)
        self.relu = nn.ReLU()
        self.layer_2 = nn.Linear(H, H, bias=True)
        self.output_layer = nn.Linear(H, D_out, bias=True)

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and return a
        Tensor of output data (raw class scores). We can use the modules defined
        in the constructor as well as arbitrary operators on Tensors.
        """
        out = self.layer_1(x)
        out = self.relu(out)
        out = self.layer_2(out)
        out = self.relu(out)
        out = self.output_layer(out)
        return out


# Instantiate the model with the dimensions defined above.
model = Net(D_in, H, D_out)
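
The cell above only builds the network; the training step is not shown here. As a rough sketch of how a classifier of this shape is typically trained, the loop below uses randomly generated stand-in data, the same layer sizes as `Net` (assuming 13 inputs) and mini-batches of 64. The choice of `CrossEntropyLoss`, the Adam optimizer, the learning rate and the epoch count are all assumptions, not the settings used in this project.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in data: 256 rows, 13 features, 4 classes
x = torch.randn(256, 13)
y = torch.randint(0, 4, (256,))

# Same layer sizes as Net (13 -> 20 -> 20 -> 4), written with nn.Sequential
model = nn.Sequential(
    nn.Linear(13, 20), nn.ReLU(),
    nn.Linear(20, 20), nn.ReLU(),
    nn.Linear(20, 4),
)

loss_fn = nn.CrossEntropyLoss()  # expects raw logits and integer class labels
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

losses = []
for epoch in range(20):
    for start in range(0, len(x), 64):  # mini-batches of 64 rows
        xb, yb = x[start:start + 64], y[start:start + 64]
        optimizer.zero_grad()          # clear gradients from the previous step
        loss = loss_fn(model(xb), yb)  # forward pass and loss
        loss.backward()                # backpropagate
        optimizer.step()               # update the weights
    losses.append(loss.item())         # loss of the last batch in each epoch
```

With real data, the features need to be a float tensor and the labels an integer (`long`) tensor for `CrossEntropyLoss` to accept them.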

In [102]:
torch.save(model,"torch_nn")
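
Saving the whole module pickles the object together with a reference to its class, so the file can only be loaded where the same class definition is importable. A commonly recommended alternative is to save only the `state_dict` and rebuild the architecture before loading. The sketch below uses a single `nn.Linear` layer as a hypothetical stand-in for the trained `Net`, and the file name is also an assumption.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(13, 4)  # hypothetical stand-in for the trained Net

# Hypothetical file name, written to a temporary directory
path = os.path.join(tempfile.mkdtemp(), "torch_nn_state.pt")
torch.save(model.state_dict(), path)  # saves only the parameter tensors

restored = nn.Linear(13, 4)  # rebuild the same architecture first
restored.load_state_dict(torch.load(path))
restored.eval()  # switch to inference mode before predicting
```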

