# Workshop 1: Mobile Price Range Classification

In this workshop we will learn how to train a neural network on numeric input data to predict the price range of mobile phones ([dataset description](https://www.kaggle.com/iabhishekofficial/mobile-price-classification/data#)). The main blocks of the workshop are:

1. Get the data from Google Drive.
2. Load and Pre-process the data.
3. Define a Fully Connected Neural Network.
4. Choose loss function and optimizer.
5. Train the network.

The main libraries that will be used in the workshop are:

- TensorFlow
- Keras
- scikit-learn
- Matplotlib
- NumPy
- pandas

[Reference](https://towardsdatascience.com/building-our-first-neural-network-in-keras-bdc8abbc17f5)

## 1. Get the data from Google Drive

In [0]:
# Import libraries to interact with Google Drive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

In [0]:
# Authenticate with your Google account to get access to the data
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

In [0]:
# Download data
download = drive.CreateFile({'id': '1igTyUp-YTHG0ig9VeNu83R8jBWn5Djji'})
download.GetContentFile('mobile_price.zip')

In [0]:
# Extract data from zip file
import zipfile
from pathlib import Path
data_path = Path("./mobile_price.zip")

with zipfile.ZipFile(str(data_path), 'r') as zip_ref:
    zip_ref.extractall("./data")

In [0]:
# List files of ./data directory
ls ./data

## 2. Load and Pre-process the data



In [0]:
# Load dependencies for loading data
import numpy as np
import pandas as pd

In [0]:
# Load training dataset and check variables
dataset = pd.read_csv('./data/train.csv')

In [0]:
# Show variables
dataset.columns

In [0]:
# Show first 5 rows
dataset.head()

In [0]:
# Import dependencies for pre-processing
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split

In [0]:
# Convert pandas Dataframe to Numpy Array
dataset_numpy = dataset.values

In [0]:
# Show data type before and after conversion
print(type(dataset))
print(type(dataset_numpy))

In [0]:
# Separate input variables from output label
X = dataset_numpy[:, :20]
y = dataset_numpy[:, 20:21]

In [0]:
# Show shape of training data and labels
print("Shape of training data: ", X.shape)
print("Shape of training labels: ", y.shape)

In [0]:
# Show different classes to predict
np.unique(y)

In [0]:
# Normalizing the data to improve stability while training
sc = StandardScaler()
sc.fit(X)
X_norm = sc.transform(X)

In [0]:
# Show mean and Standard Deviation learnt from training data
print("Mean per variable: \n", sc.mean_)
print("Standard Deviation per variable: \n", sc.scale_)

In [0]:
# Show data before and after normalization
print("Before normalization: \n", X[1, :])
print("After normalization: \n", X_norm[1, :])
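The normalization applied by `StandardScaler` can be reproduced by hand: each variable has its mean subtracted and is then divided by its population standard deviation. The following is a minimal sketch in plain Python; the values in `column` are hypothetical and are not taken from `train.csv`.

```python
from statistics import fmean, pstdev

# Hypothetical values for one input variable (not taken from train.csv)
column = [500.0, 1000.0, 1500.0, 2000.0]

mean = fmean(column)   # corresponds to one entry of sc.mean_
std = pstdev(column)   # population standard deviation, one entry of sc.scale_

# z-score: subtract the mean, divide by the standard deviation
normalized = [(x - mean) / std for x in column]
print(normalized)
```

After this transformation the variable has mean 0 and standard deviation 1, which keeps variables measured on very different scales from dominating the training.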

In [0]:
# One-hot encoding of labels
onehot_enc = OneHotEncoder()
y_onehot = onehot_enc.fit_transform(y).toarray()

In [0]:
# Show labels before and after one-hot encoding
print("Before one-hot encoding: \n", y[0])
print("After one-hot encoding: \n", y_onehot[0])
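To make the encoding concrete, here is a small sketch of what one-hot encoding does to a single label. It assumes the four price-range classes are labelled 0 to 3; the helper `one_hot` is hypothetical and only illustrates the idea behind `OneHotEncoder`.

```python
# Assumption: the four price-range classes are labelled 0, 1, 2 and 3
classes = [0, 1, 2, 3]

def one_hot(label, classes):
    """Return a list with 1.0 at the position of `label` and 0.0 elsewhere."""
    return [1.0 if label == c else 0.0 for c in classes]

encoded = one_hot(2, classes)
print(encoded)  # [0.0, 0.0, 1.0, 0.0]
```

Each label becomes a vector with as many entries as there are classes, which matches the four outputs the network will produce.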

In [0]:
# Split data in training and validation partitions
X_train, X_val, y_train, y_val = train_test_split(X_norm, y_onehot, 
                                                  test_size=0.1)

In [0]:
# Show sizes of partitions
print("Size of training data: ", X_train.shape)
print("Size of training labels: ", y_train.shape)
print("Size of validation data: ", X_val.shape)
print("Size of validation labels: ", y_val.shape)

## 3. Define a Fully Connected Neural Network

In [0]:
# Select TensorFlow 1.x (Colab magic; it must run before Keras/TensorFlow is imported)
%tensorflow_version 1.x

# Import dependencies for designing Keras model
import keras
from keras.models import Sequential
from keras.layers import Dense

In [0]:
# Design simple neural network architecture
model = Sequential()
model.add(Dense(16, input_dim=20, activation='relu'))
model.add(Dense(12, activation='relu'))
model.add(Dense(4, activation='softmax'))
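The last layer uses a softmax activation, which turns the four raw outputs into probabilities that sum to 1. The sketch below shows the computation in plain Python; the input scores are hypothetical and the function is an illustration, not the Keras implementation.

```python
import math

def softmax(logits):
    """Map raw scores to positive values that sum to 1."""
    largest = max(logits)
    # Subtracting the largest score avoids overflow in exp()
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of the last layer for one phone
probs = softmax([1.0, 2.0, 0.5, 3.0])
predicted_class = probs.index(max(probs))  # class with the highest probability
print(probs, predicted_class)
```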

In [0]:
# Show model information
model.summary()

The number of parameters of each layer is obtained as follows:


*   params_dense_1 = (input_var_num + 1) * dense_1_neurons = (20 + 1) * 16 = 336
*   params_dense_2 = (dense_1_neurons + 1) * dense_2_neurons = (16 + 1) * 12 = 204
*   params_dense_3 = (dense_2_neurons + 1) * dense_3_neurons = (12 + 1) * 4 = 52

The +1 accounts for the bias: each neuron has one bias term in addition to one weight per input.
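The same arithmetic can be checked with a few lines of plain Python. This is a minimal sketch; the helper name `dense_params` is hypothetical and is not part of Keras.

```python
def dense_params(n_inputs, n_units):
    """Parameters of a fully connected layer: one weight per input plus one bias, per unit."""
    return (n_inputs + 1) * n_units

# Layer sizes of the model defined above: 20 inputs -> 16 -> 12 -> 4
layer_sizes = [20, 16, 12, 4]
per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)     # [336, 204, 52]
print(total_params)  # 592
```

The total should match the number of trainable parameters reported by `model.summary()`.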




## 4. Choose loss function and optimizer


In [0]:
# Choose loss function, optimizer and training metrics
model.compile(loss='categorical_crossentropy', optimizer='adam', 
              metrics=['accuracy'])
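As a rough illustration of what the `categorical_crossentropy` loss measures, the sketch below computes it by hand for a single sample. The predictions are hypothetical, and the clipping constant `eps` is a simplification chosen here to avoid `log(0)`; Keras additionally averages the loss over each batch.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy for one sample: -sum(y_true * log(y_pred))."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

y_true = [0.0, 0.0, 1.0, 0.0]          # one-hot label for class 2
confident = [0.05, 0.05, 0.85, 0.05]   # hypothetical prediction close to the label
wrong = [0.70, 0.10, 0.10, 0.10]       # hypothetical prediction far from the label

loss_confident = categorical_crossentropy(y_true, confident)
loss_wrong = categorical_crossentropy(y_true, wrong)
print(loss_confident, loss_wrong)
```

Only the probability assigned to the true class contributes to the loss, so the loss shrinks toward 0 as that probability approaches 1.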

## 5. Train the network



In [0]:
# Choose number of epochs and batch size and train the model
history = model.fit(X_train, y_train, epochs=100, batch_size=64, 
                    validation_data=(X_val, y_val))
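To get a feeling for what `epochs` and `batch_size` imply, the number of weight updates can be estimated as follows. This sketch assumes that `train.csv` contains 2000 rows, which is not stated in this notebook; check `X_train.shape` for the real figure.

```python
import math

n_rows = 2000      # assumption: number of rows in train.csv
batch_size = 64    # same value passed to model.fit above
epochs = 100       # same value passed to model.fit above

n_val = n_rows // 10       # test_size=0.1 holds out one tenth of the rows
n_train = n_rows - n_val

# One weight update per batch; the last batch of an epoch may be smaller
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print(n_train, n_val, steps_per_epoch, total_updates)
```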

In [0]:
# Import dependency for plotting the training process
import matplotlib.pyplot as plt

In [0]:
# Plot training and validation accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Val'], loc='upper left')
plt.show()

In [0]:
# Plot training and validation loss
plt.plot(history.history['loss']) 
plt.plot(history.history['val_loss']) 
plt.title('Model loss') 
plt.ylabel('Loss') 
plt.xlabel('Epoch') 
plt.legend(['Train', 'Val'], loc='upper left') 
plt.show()

In [0]:
# Save the model
from pathlib import Path
path = Path('./models')
path.mkdir(exist_ok=True)  # create the directory if it does not exist yet
model.save('./models/model_1.h5')

In [0]:
# List files of ./models directory
ls ./models

# Exercise 1: Train the model without normalization

In [0]:
# Split data without normalization in training and validation partitions


In [0]:
# Train the model


In [0]:
# Plot training and validation accuracy


In [0]:
# Plot training and validation loss


# Exercise 2: Train a simpler model and evaluate the results

In [0]:
# Split data with normalization in training and validation partitions


In [0]:
# Design a model like the previous but without the second Dense layer


In [0]:
# Compile the model


In [0]:
# Train the model


In [0]:
# Plot training and validation accuracy


In [0]:
# Plot training and validation loss


# Exercise 3: Train a more complex model and evaluate the results

In [0]:
# Split data with normalization in training and validation partitions


In [0]:
# Design a model like the previous but with 128 neurons in the first Dense layer
# and 256 in the second


In [0]:
# Compile the model


In [0]:
# Train the model


In [0]:
# Plot training and validation accuracy


In [0]:
# Plot training and validation loss


# Exercise 4: Predict the price range of the phones of the test.csv file

In [0]:
# Read the test data from the CSV file


In [0]:
# Show column names (pay attention: they may differ from the training data)


In [0]:
# Convert the data to numpy array


In [0]:
# Show shape of the data


In [0]:
# Get rid of a column if needed


In [0]:
# Normalize data. Important!! Always normalize test data with the mean and
# standard deviation learnt from the training data.
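
The principle behind this step can be sketched in plain Python with hypothetical values (not taken from the CSV files): the mean and standard deviation are computed from the training data only and then reused, unchanged, for the test data. With scikit-learn this corresponds to calling `transform` on the scaler that was already fitted, rather than fitting a new one.

```python
from statistics import fmean, pstdev

# Hypothetical values of one variable (not taken from the CSV files)
train_column = [2.0, 4.0, 6.0, 8.0]
test_column = [5.0, 10.0]

# Statistics are computed from the training data only
train_mean = fmean(train_column)
train_std = pstdev(train_column)

# The same statistics are reused to normalize the test data
test_norm = [(x - train_mean) / train_std for x in test_column]
print(test_norm)
```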


In [0]:
# Show data before and after normalization


In [0]:
# Load model saved during the example
from keras.models import load_model
model = load_model('./models/model_1.h5')

In [0]:
# Predict price range (data_norm is the normalized test data from the cells above)
predictions = model.predict(data_norm)

In [0]:
# Show predictions


In [0]:
# Convert predictions from class probabilities to scalar class labels


In [0]:
# Show all scalar predictions


In [0]:
# Show the first test sample in Dataframe format


In [0]:
# Show prediction for that sample
