# Softmax Logistic Regression
This notebook builds a logistic regression model for the Iris dataset that performs multiclass classification using all of the set's attributes.

- Define the architecture using both the high-level and medium-level interfaces.
- Implement multiclass accuracy and evaluate the model with this metric.

In [1]:
import os
from itertools import islice as take
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision.datasets.utils import download_url
from torch.autograd import Variable
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
URL = 'https://raw.githubusercontent.com/gibranfp/CursoAprendizajeProfundo/master/data/iris/iris.csv'
base_dir = './data/iris/'
filename = 'iris.csv'
filepath = os.path.join(base_dir, filename)
SEED = 1
torch.manual_seed(SEED)

<torch._C.Generator at 0x7fd60785c710>

## Data

We will use a reference dataset called Iris, introduced by Ronald Fisher (yes, the same Fisher as in the Fisher-Yates shuffle algorithm). The set has four input attributes, the widths and lengths of the petals and sepals, and three output classes of iris flower: setosa, versicolour, and virginica.

![Petal and sepal](https://miro.medium.com/max/2550/1*7bnLKsChXq94QjtAiRn40w.png)
<center>Source: Suruchi Fialoke, October 13, 2016, Classification of Iris Varieties</center>

This set has 50 samples of each class. Let's read it and print a few samples from each class.

### Downloading and encoding the dataset

In [7]:
download_url(URL, base_dir, filename)
columns = ('SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm', 'Species')
df = pd.read_csv(filepath, names=columns)
df.loc[df.Species=='Iris-setosa', 'Species'] = 0
df.loc[df.Species=='Iris-versicolor', 'Species'] = 1
df.loc[df.Species=='Iris-virginica', 'Species'] = 2
x_trn = np.array(df.iloc[:,:4], dtype="float32")
y_trn = np.array(df.iloc[:, -1], dtype="float32")
print(x_trn.shape)
print(y_trn.shape)
pd.concat((df[1:5], df[50:55], df[100:105]))  

Using downloaded and verified file: ./data/iris/iris.csv
(150, 4)
(150,)


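| | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 |
| 50 | 7.0 | 3.2 | 4.7 | 1.4 | 1 |
| 51 | 6.4 | 3.2 | 4.5 | 1.5 | 1 |
| 52 | 6.9 | 3.1 | 4.9 | 1.5 | 1 |
| 53 | 5.5 | 2.3 | 4.0 | 1.3 | 1 |
| 54 | 6.5 | 2.8 | 4.6 | 1.5 | 1 |
| 100 | 6.3 | 3.3 | 6.0 | 2.5 | 2 |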
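
The cell above encodes the class names by overwriting strings in place with `df.loc`, which leaves the column with a generic object dtype. As a hedged alternative, the sketch below uses `Series.map` with an explicit dictionary. The toy frame, the `species_to_id` dictionary, and the `label` column are illustrative assumptions and are not part of the original notebook.

```python
import pandas as pd

# Toy frame with made-up rows; only the label strings mirror the Iris file.
toy = pd.DataFrame({
    "PetalLengthCm": [1.4, 4.7, 6.0],
    "Species": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

# Hypothetical helper dict: class name -> integer label.
species_to_id = {"Iris-setosa": 0, "Iris-versicolor": 1, "Iris-virginica": 2}

# map() builds a new column, and astype() makes the integer dtype explicit.
toy["label"] = toy["Species"].map(species_to_id).astype("int64")
print(toy["label"].tolist())  # [0, 1, 2]
```

Keeping the original string column next to the integer one makes it easy to translate predictions back to class names later.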


### Training data / Test data

In [8]:
features_train,features_test, labels_train, labels_test = train_test_split(x_trn, y_trn, random_state=42, shuffle=True)

In [9]:
features_train.shape

(112, 4)
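
The 112 training rows follow from the default `test_size` of 0.25 in `train_test_split`: 150 samples give 112 for training and 38 for testing. The sketch below reproduces those sizes on synthetic data and adds `stratify`, which the original call does not use, to keep the class proportions similar in both partitions. The zero-filled arrays are placeholders, not Iris data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Iris: 150 rows, 4 features, 50 samples per class.
X = np.zeros((150, 4), dtype="float32")
y = np.repeat([0, 1, 2], 50)

# test_size defaults to 0.25 when neither size is given;
# stratify=y keeps class proportions similar in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=42, shuffle=True, stratify=y
)
print(X_train.shape, X_test.shape)  # (112, 4) (38, 4)
print(np.bincount(y_test))          # per-class counts in the test partition
```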

### Model

In [10]:
class RegresionLogisticaMulticlase(nn.Module):
    def __init__(self, input_dim):
        super(RegresionLogisticaMulticlase, self).__init__()
        self.layer1 = nn.Linear(input_dim,50)
        self.layer2 = nn.Linear(50, 3)
        
    def forward(self, x):
        x = torch.sigmoid(self.layer1(x))
        # dim=-1 normalizes over the class dimension for batched and single-sample inputs
        x = F.softmax(self.layer2(x), dim=-1)
        return x
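
The class above uses the medium-level interface (subclassing `nn.Module`). The following is a minimal sketch of the same architecture with a high-level interface, assuming that term refers to `nn.Sequential`. The name `modelo_seq` and the random batch are illustrative and do not come from the original notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(1)

# Same layer sizes as the class-based model: 4 inputs -> 50 hidden -> 3 classes.
modelo_seq = nn.Sequential(
    nn.Linear(4, 50),
    nn.Sigmoid(),
    nn.Linear(50, 3),
    nn.Softmax(dim=1),  # normalizes across the class dimension of a batch
)

batch = torch.rand(5, 4)   # made-up batch of 5 samples
probs = modelo_seq(batch)
print(probs.shape)         # torch.Size([5, 3])
print(probs.sum(dim=1))    # each row sums to approximately 1
```

Because `nn.Softmax(dim=1)` is fixed to the second dimension, this sketch expects batched input of shape `(N, 4)`.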

In [20]:
modelo = RegresionLogisticaMulticlase(features_train.shape[1])
losses = []


optimizer = torch.optim.Adam(modelo.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
epocas = 100

x_train = torch.from_numpy(features_train).float()
y_train = torch.from_numpy(labels_train).long()
for epoca in range(1, epocas+1):
    print("Epoch #", epoca)
    y_pred = modelo(x_train)
    loss = loss_fn(y_pred, y_train)
    # the loss is multiplied by 100 only for display and logging
    print("The value of the loss function is: ", loss.item()*100)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    losses.append(loss.item()*100)

Epoch # 1
The value of the loss function is:  109.1550350189209
Epoch # 2
The value of the loss function is:  108.1132173538208
Epoch # 3
The value of the loss function is:  107.10022449493408
Epoch # 4
The value of the loss function is:  106.00082874298096
Epoch # 5
The value of the loss function is:  104.80237007141113
Epoch # 6
The value of the loss function is:  103.5280704498291
Epoch # 7
The value of the loss function is:  102.21449136734009
Epoch # 8
The value of the loss function is:  100.87023973464966
Epoch # 9
The value of the loss function is:  99.47323203086853
Epoch # 10
The value of the loss function is:  98.02560806274414
Epoch # 11
The value of the loss function is:  96.5609610080719
Epoch # 12
The value of the loss function is:  95.11640667915344
Epoch # 13
The value of the loss function is:  93.71005296707153
Epoch # 14
The value of the loss function is:  92.34743118286133
Epoch # 15
The value of the loss function is:  91.
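
The model's `forward` returns softmax probabilities, and the loop passes them to `nn.CrossEntropyLoss`, which applies log-softmax internally. Training still works, but softmax is effectively applied twice. The sketch below uses made-up logits and targets, not values from the notebook, to compare the loss on raw scores with the loss on probabilities.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(1)

logits = torch.randn(6, 3)                 # made-up raw scores for 6 samples
targets = torch.tensor([0, 1, 2, 0, 1, 2])

loss_fn = nn.CrossEntropyLoss()

# CrossEntropyLoss on raw logits equals NLL on log-softmax of those logits.
loss_from_logits = loss_fn(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Feeding probabilities instead applies softmax a second time inside the loss.
loss_from_probs = loss_fn(F.softmax(logits, dim=1), targets)

print(torch.allclose(loss_from_logits, manual))  # True
print(loss_from_logits.item(), loss_from_probs.item())
```

With three classes, a loss computed on probabilities cannot drop below log(1 + 2/e), roughly 0.551, because the inputs to the internal softmax are confined to [0, 1]. One possible variant, offered as a suggestion rather than as the notebook's method, is to return `self.layer2(x)` directly during training and apply softmax only when probabilities are needed.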


### Predictions

In [21]:
x_test = torch.from_numpy(features_test).float()
pred = modelo(x_test)


In [22]:
pred = pred.detach().numpy()

In [14]:
for j in range(len(x_test)):
    y_pred = np.argmax(modelo(x_test[j]).detach().numpy(), axis=0)
    print(f'True value = {labels_test[j]} Predicted value = {y_pred}')

True value = 1.0 Predicted value = 1
True value = 0.0 Predicted value = 0
True value = 2.0 Predicted value = 2
True value = 1.0 Predicted value = 1
True value = 1.0 Predicted value = 1
True value = 0.0 Predicted value = 0
True value = 1.0 Predicted value = 1
True value = 2.0 Predicted value = 2
True value = 1.0 Predicted value = 1
True value = 1.0 Predicted value = 1
True value = 2.0 Predicted value = 2
True value = 0.0 Predicted value = 0
True value = 0.0 Predicted value = 0
True value = 0.0 Predicted value = 0
True value = 0.0 Predicted value = 0
True value = 1.0 Predicted value = 1
True value = 2.0 Predicted value = 2
True value = 1.0 Predicted value = 1
True value = 1.0 Predicted value = 1
True value = 2.0 Predicted value = 2
True value = 0.0 Predicted value = 0
True value = 2.0 Predicted value = 2
True value = 0.0 Predicted value = 0
True value = 2.0 Predicted valu
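
The objectives call for a multiclass accuracy metric, which the cells above do not compute explicitly. The following is a minimal NumPy sketch: accuracy is the fraction of samples whose highest-scoring class matches the true label. The function name `multiclass_accuracy` and the toy arrays are assumptions made for illustration, and the printed value is not a result from the Iris model.

```python
import numpy as np

def multiclass_accuracy(scores, labels):
    """Fraction of rows whose highest-scoring class equals the true label."""
    predicted = np.argmax(scores, axis=1)
    return float(np.mean(predicted == np.asarray(labels).astype(int)))

# Made-up scores and labels (not results from the Iris model).
toy_scores = np.array([[0.8, 0.1, 0.1],
                       [0.2, 0.7, 0.1],
                       [0.3, 0.3, 0.4],
                       [0.6, 0.3, 0.1]])
toy_labels = np.array([0.0, 1.0, 2.0, 1.0])  # float labels, as in labels_test
print(multiclass_accuracy(toy_scores, toy_labels))  # 0.75
```

In the notebook this could be called as `multiclass_accuracy(pred, labels_test)`, since `pred` is already a NumPy array of shape `(N, 3)`. The imported `accuracy_score(labels_test, pred.argmax(axis=1))` from scikit-learn should give the same number.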

