# Logistic Regression Exercises

* We'll use scikit-learn and TensorFlow to predict ad click-through.
* The goal is to find the model settings that give the highest testing AUC.
* We also want to see whether the model can be trained on 10 million
  samples, or whether that is too much for it.

## Exercise 1:
In the logistic regression-based click-through prediction project, can you also tweak hyperparameters such as penalty, eta0, and alpha in the SGDClassifier model?

What is the highest testing AUC you are able to achieve?

### Step 1: Loading the data

In [1]:
import tensorflow as tf
import pandas as pd
from tensorflow.keras.models import Sequential

# Read the first 100,000 rows of the dataset
n_rows = 100_000
df = pd.read_csv("./dataset/train.csv", nrows=n_rows)

# Drop unnecessary columns and prepare X and Y
X = df.drop(['click', 'id', 'hour', 'device_id', 'device_ip'], axis=1).values
Y = df['click'].values

### Step 2: Splitting the data and one-hot encoding it
* We train the model on the first 90,000 samples and keep the remaining 10,000 for testing

In [2]:
# Split the data into training and testing sets (90% - 10%)
n_train = int(n_rows * 0.9)
X_train = X[:n_train]
Y_train = Y[:n_train].astype('float16')
X_test = X[n_train:]
Y_test = Y[n_train:].astype('float16')

# One-hot encode the categorical features
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(handle_unknown='ignore')
X_train_enc = enc.fit_transform(X_train).toarray().astype('float16')
X_test_enc = enc.transform(X_test).toarray().astype('float16')
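The `handle_unknown='ignore'` setting matters because the test rows come from later in the file and can contain categories that never appeared in the training rows. A minimal sketch on made-up toy values (the columns and categories below are hypothetical, not taken from the dataset) shows the effect: an unseen category is encoded as all zeros instead of raising an error. It also shows that `fit_transform` returns a SciPy sparse matrix, which is what the `.toarray()` calls in the cell above densify.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: two categorical columns (site, device)
X_train_toy = np.array([["site_a", "mobile"],
                        ["site_b", "desktop"],
                        ["site_a", "desktop"]], dtype=object)
# The test row contains a site that never appears in the training rows
X_test_toy = np.array([["site_c", "mobile"]], dtype=object)

enc_toy = OneHotEncoder(handle_unknown="ignore")
train_enc = enc_toy.fit_transform(X_train_toy)  # SciPy sparse matrix
test_enc = enc_toy.transform(X_test_toy)        # unknown site -> zeros in the site block

print(enc_toy.categories_)
print(train_enc.shape)        # 2 sites + 2 devices = 4 one-hot columns
print(test_enc.toarray())
```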


### Step 3: Preparing the model
* We'll tune penalty, eta0, and alpha
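The three hyperparameters listed above enter every SGD update. scikit-learn's objective is the average loss plus `alpha * R(w)`, where `R(w) = 0.5 * ||w||^2` for `'l2'` and `||w||_1` for `'l1'`, and with `learning_rate='constant'` the step size is always `eta0`. The sketch below performs one simplified update for log loss on a single sample. The `sgd_step` helper is hypothetical and follows the textbook update; scikit-learn's own implementation differs in details (for example, it applies L1 with a truncated-gradient scheme rather than a plain subgradient).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_step(w, b, x, y, eta0, alpha, penalty="l2"):
    """One simplified SGD update for log loss on a single sample (y in {0, 1})."""
    error = sigmoid(np.dot(w, x) + b) - y       # derivative of log loss w.r.t. the score
    grad_loss = error * x
    if penalty == "l2":
        grad_penalty = alpha * w                # derivative of alpha * 0.5 * ||w||^2
    elif penalty == "l1":
        grad_penalty = alpha * np.sign(w)       # subgradient of alpha * ||w||_1
    else:
        grad_penalty = np.zeros_like(w)
    # Constant learning rate: every step is scaled by eta0
    w_new = w - eta0 * (grad_loss + grad_penalty)
    b_new = b - eta0 * error                    # the intercept is not penalized
    return w_new, b_new

w = np.array([0.5, -0.2, 0.0])
x = np.array([1.0, 0.0, 1.0])                   # a one-hot style feature vector
for eta0 in (1e-3, 1e-2, 1e-1):
    w_new, _ = sgd_step(w, 0.0, x, y=1.0, eta0=eta0, alpha=1e-3)
    print(eta0, w_new - w)                      # change grows linearly with eta0
```

The inactive feature (`x[1] = 0`) still moves, but only through the penalty term, which pulls its weight toward zero; a larger `alpha` strengthens that pull.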

In [3]:
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV

# Preparing the parameters to use with GridSearch
parameters = {'penalty': ['l1', 'l2'],
              'eta0': [1e-03, 1e-02, 1e-01],
              'alpha': [1e-03, 1e-02]}

model = SGDClassifier(loss = "log_loss", fit_intercept=True, learning_rate='constant', verbose=1)

model_grid = GridSearchCV(model, parameters, n_jobs=-1, cv=27)


In [8]:
Y_test.shape

(10000,)

In [4]:
# Training and getting the best parameters
model_grid.fit(X_train_enc, Y_train)

print("Best model:", model_grid.best_params_)



-- Epoch 1
Norm: 1.38, NNZs: 5566, Bias: -0.213428, T: 90000, Avg. loss: 0.429156
Total training time: 1.11 seconds.
-- Epoch 2
Norm: 1.71, NNZs: 5566, Bias: -0.241317, T: 180000, Avg. loss: 0.420588
Total training time: 2.24 seconds.
-- Epoch 3
Norm: 1.94, NNZs: 5566, Bias: -0.256211, T: 270000, Avg. loss: 0.418425
Total training time: 3.35 seconds.
-- Epoch 4
Norm: 2.12, NNZs: 5566, Bias: -0.282163, T: 360000, Avg. loss: 0.416963
Total training time: 4.44 seconds.
-- Epoch 5
Norm: 2.27, NNZs: 5566, Bias: -0.306288, T: 450000, Avg. loss: 0.416112
Total training time: 5.52 seconds.
-- Epoch 6
Norm: 2.38, NNZs: 5566, Bias: -0.327163, T: 540000, Avg. loss: 0.415356
Total training time: 6.60 seconds.
-- Epoch 7
Norm: 2.49, NNZs: 5566, Bias: -0.361351, T: 630000, Avg. loss: 0.414835
Total training time: 7.67 seconds.
-- Epoch 8
Norm: 2.58, NNZs: 5566, Bias: -0.379046, T: 720000, Avg. loss: 0.414428
Total training time: 8.74 seconds.
-- Epoch 9
Norm: 2.66, NNZs: 5566, Bias: -0.386419, T: 81

Best model: {'alpha': 0.001, 'eta0': 0.001, 'penalty': 'l2'}


### Step 4: Predicting and evaluating


In [10]:
# Predicting on the test set and computing the ROC AUC
from sklearn.metrics import roc_auc_score
logistic_best = model_grid.best_estimator_

pred = logistic_best.predict_proba(X_test_enc)
print(f"The ROC_AUC after training with 100,000 samples is: {roc_auc_score(Y_test, pred[:,1])}")

The ROC_AUC after training with 100,000 samples is: 0.724661623809706
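ROC AUC can be read as the probability that a randomly chosen clicked impression receives a higher predicted score than a randomly chosen non-clicked one, with ties counting as half. Read this way, the 0.7247 above means the model ranks a clicked impression above a non-clicked one about 72% of the time, while 0.5 would be random ranking. A brute-force sketch of that definition (quadratic in the number of samples, so only suitable for small checks; `pairwise_auc` is a hypothetical helper) agrees with `roc_auc_score` on a toy example:

```python
from itertools import product

from sklearn.metrics import roc_auc_score

def pairwise_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))

y_toy = [0, 0, 1, 1, 0, 1]
p_toy = [0.10, 0.40, 0.35, 0.80, 0.40, 0.90]

print(pairwise_auc(y_toy, p_toy))   # 7 of 9 pairs ranked correctly
print(roc_auc_score(y_toy, p_toy))
```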


## Exercise 2: 
Can you try to use more training samples, for instance, 10 million samples, in the online learning solution?

In [11]:
# Redefining our variables: 11,000,000 rows in total
n_rows = 11_000_000
df = pd.read_csv("./dataset/train.csv", nrows=n_rows)
# Separating the features from the target
X = df.drop(['click', 'id', 'hour', 'device_id', 'device_ip'], axis=1).values
Y = df['click'].values

# Splitting into training (first 10,000,000 rows) and testing (last 1,000,000 rows)
n_train = 10_000_000
X_train = X[:n_train]
Y_train = Y[:n_train]
X_test = X[n_train:]
Y_test = Y[n_train:]

In [12]:
# One-hot encoding: fit the encoder once on the full training set
enc = OneHotEncoder(handle_unknown='ignore')
enc.fit(X_train)

In [14]:
# Initializing the SGD model. max_iter is set to 1 for online learning
sgd_lr_online = SGDClassifier(loss = 'log_loss', penalty = 'l2',
                              fit_intercept = True, max_iter = 1,
                              learning_rate = 'constant', eta0 = 0.001, alpha = 0.001, verbose = 1)


In [15]:
import timeit
# Looping 100 times over batches of 100,000 samples (10,000,000 in total).
# In online learning the classes must be passed to partial_fit.
start_time = timeit.default_timer()
batch_size = 100_000
for i in range(100):
    x_train = X_train[i * batch_size: (i + 1) * batch_size]
    y_train = Y_train[i * batch_size: (i + 1) * batch_size]
    x_train_enc = enc.transform(x_train)
    sgd_lr_online.partial_fit(x_train_enc.toarray(), y_train, classes=[0, 1])

print(f"--- {timeit.default_timer() - start_time:.3f} seconds ---")

-- Epoch 1
Norm: 1.43, NNZs: 5725, Bias: -0.211223, T: 100000, Avg. loss: 0.429137
Total training time: 5.57 seconds.
-- Epoch 1
Norm: 1.74, NNZs: 7439, Bias: -0.251675, T: 100000, Avg. loss: 0.419017
Total training time: 5.57 seconds.
-- Epoch 1
Norm: 1.97, NNZs: 8559, Bias: -0.306295, T: 100000, Avg. loss: 0.398179
Total training time: 5.17 seconds.
-- Epoch 1
Norm: 2.15, NNZs: 9332, Bias: -0.342524, T: 100000, Avg. loss: 0.377457
Total training time: 4.80 seconds.
-- Epoch 1
Norm: 2.30, NNZs: 10025, Bias: -0.375888, T: 100000, Avg. loss: 0.387828
Total training time: 4.85 seconds.
-- Epoch 1
Norm: 2.44, NNZs: 10530, Bias: -0.406709, T: 100000, Avg. loss: 0.410690
Total training time: 4.80 seconds.
-- Epoch 1
Norm: 2.57, NNZs: 11125, Bias: -0.441446, T: 100000, Avg. loss: 0.397483
Total training time: 4.80 seconds.
-- Epoch 1
Norm: 2.73, NNZs: 11572, Bias: -0.477114, T: 100000, Avg. loss: 0.379210
Total training time: 4.79 seconds.
-- Epoch 1
Norm: 2.84, NNZs: 11880, Bias: -0.500645,

-- Epoch 1
Norm: 3.59, NNZs: 20589, Bias: -1.491700, T: 100000, Avg. loss: 0.348972
Total training time: 4.78 seconds.
-- Epoch 1
Norm: 3.57, NNZs: 20617, Bias: -1.515423, T: 100000, Avg. loss: 0.348083
Total training time: 4.77 seconds.
-- Epoch 1
Norm: 3.56, NNZs: 20687, Bias: -1.514056, T: 100000, Avg. loss: 0.351274
Total training time: 4.77 seconds.
-- Epoch 1
Norm: 3.55, NNZs: 20743, Bias: -1.531812, T: 100000, Avg. loss: 0.350550
Total training time: 4.78 seconds.
-- Epoch 1
Norm: 3.53, NNZs: 20778, Bias: -1.534392, T: 100000, Avg. loss: 0.353044
Total training time: 4.78 seconds.
-- Epoch 1
Norm: 3.50, NNZs: 20805, Bias: -1.523599, T: 100000, Avg. loss: 0.355674
Total training time: 4.77 seconds.
-- Epoch 1
Norm: 3.51, NNZs: 20864, Bias: -1.546064, T: 100000, Avg. loss: 0.350048
Total training time: 4.79 seconds.
-- Epoch 1
Norm: 3.53, NNZs: 20914, Bias: -1.550158, T: 100000, Avg. loss: 0.341283
Total training time: 4.77 seconds.
-- Epoch 1
Norm: 3.55, NNZs: 20946, Bias: -1.551

In [21]:
# Applying the trained model on the testing set: the final 1,000,000 samples
x_test_enc = enc.transform(X_test)

pred = sgd_lr_online.predict_proba(x_test_enc)[:, 1]
print(f'Training samples: {n_train}, AUC on testing set: {roc_auc_score(Y_test, pred):.3f}')

Training samples: 10000000, AUC on testing set: 0.687


## Exercise 3: 
In the TensorFlow-based solution, can you tweak the learning rate, the number of training steps, and other hyperparameters to obtain a better performance?

### Step 1: Loading the data

In [26]:
import tensorflow as tf
import pandas as pd
from tensorflow.keras.models import Sequential

# Read the first 3,000,000 rows of the dataset
n_rows = 3_000_000
df = pd.read_csv("./dataset/train.csv", nrows=n_rows)

# Drop unnecessary columns and prepare X and Y
X = df.drop(['click', 'id', 'hour', 'device_id', 'device_ip'], axis=1).values
Y = df['click'].values

### Step 2: Splitting the data and one-hot encoding it


In [29]:
# Split the data into training and testing sets (90% - 10%)
n_train = int(n_rows * 0.9)
X_train = X[:n_train]
Y_train = Y[:n_train].astype('float32')  # NumPy has no 'float8' dtype
X_test = X[n_train:]
Y_test = Y[n_train:].astype('float32')

# One-hot encode the categorical features
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(handle_unknown='ignore')
X_train_enc = enc.fit_transform(X_train).toarray().astype('float32')
X_test_enc = enc.transform(X_test).toarray().astype('float32')
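Calling `.toarray()` on the encoded matrix allocates one value per (row, one-hot column) pair, almost all of them zeros. A rough sketch of the arithmetic compares the dense `float32` footprint with a CSR sparse matrix. The 2,700,000 training rows come from the cell above; the 10,000 one-hot columns and 19 categorical columns (one non-zero per row each) are assumed values for illustration, and the real column count is `X_train_enc.shape[1]`.

```python
def dense_bytes(n_rows, n_onehot_cols, itemsize=4):
    """Bytes for a dense one-hot matrix (itemsize=4 for float32)."""
    return n_rows * n_onehot_cols * itemsize

def csr_bytes(n_rows, nnz_per_row, itemsize=4, index_size=4):
    """Approximate CSR bytes: stored values + column indices + row pointers."""
    nnz = n_rows * nnz_per_row
    return nnz * itemsize + nnz * index_size + (n_rows + 1) * index_size

# Assumed sizes: 10,000 one-hot columns, 19 categorical columns per row
n_rows, n_cols, n_features = 2_700_000, 10_000, 19
print(f"dense: {dense_bytes(n_rows, n_cols) / 1e9:.1f} GB")
print(f"CSR:   {csr_bytes(n_rows, n_features) / 1e9:.2f} GB")
```

Under these assumptions the dense array needs about 108 GB against roughly 0.42 GB in CSR form, so `n_rows` may need to be reduced before this cell fits in memory.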

### Step 3: Using the Sequential model from Keras

* While Keras is mainly used for building neural networks, it can also be used to create a logistic regression model.
* In this case, the logistic regression model can be seen as a simple one-layer neural network with a sigmoid activation function.
* When we compile and train this model, it essentially learns the weights and bias of a logistic regression model.
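Concretely, `Dense(1, activation='sigmoid')` computes `sigmoid(X @ w + b)`, and `binary_crossentropy` is the mean log loss, the same objective that `SGDClassifier(loss='log_loss')` minimizes (Keras also clips predictions by a small epsilon). A NumPy sketch of the forward pass and the loss, with made-up inputs and weights, checked against scikit-learn's `log_loss`:

```python
import numpy as np
from sklearn.metrics import log_loss

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    """Mean log loss; predictions are clipped away from 0 and 1 for stability."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Made-up one-hot inputs (3 samples, 4 columns) and parameters
X_lr = np.array([[1, 0, 0, 1],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]], dtype=np.float32)
w_lr = np.array([0.8, -0.5, 0.1, 0.3])
b_lr = -0.2
y_lr = np.array([1.0, 0.0, 1.0])

p_lr = sigmoid(X_lr @ w_lr + b_lr)   # what Dense(1, activation="sigmoid") outputs
print(p_lr.round(4))
print(binary_crossentropy(y_lr, p_lr), log_loss(y_lr, p_lr))
```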


In [None]:
# Define the logistic regression model using Keras
# The Sequential model is a linear stack of layers in Keras, which is a popular deep learning library in Python.
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1, activation='sigmoid', input_shape=(X_train_enc.shape[1],))
])

# Set up the learning rate and optimizer
learning_rate = 0.0001
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)

# Compile the model with binary cross-entropy loss, since this is a binary classification problem,
# and track the ROC AUC as a metric
model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=[tf.keras.metrics.AUC()])
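With Adam, `learning_rate` sets roughly how far each weight moves per step, largely independent of the gradient's scale, because the update divides a running mean of gradients by the square root of a running mean of squared gradients. A sketch of the textbook Adam update for a single parameter, using the Keras defaults `beta_1=0.9`, `beta_2=0.999` and `epsilon=1e-7` (the `adam_steps` helper is hypothetical, and Keras's internal code arranges the bias correction slightly differently):

```python
import math

def adam_steps(grads, learning_rate=1e-4, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Apply textbook Adam to one parameter starting at 0; return its trajectory."""
    w, m, v = 0.0, 0.0, 0.0
    history = []
    for t, g in enumerate(grads, start=1):
        m = beta_1 * m + (1 - beta_1) * g        # running mean of gradients
        v = beta_2 * v + (1 - beta_2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta_1 ** t)            # bias corrections for the zero init
        v_hat = v / (1 - beta_2 ** t)
        w -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
        history.append(w)
    return history

# The first step moves the weight by about learning_rate for a tiny or a huge gradient
for g in (0.01, 100.0):
    print(g, adam_steps([g])[0])
```

With `learning_rate = 0.0001`, each weight therefore moves on the order of 1e-4 per step, which is one reason the learning rate is worth tuning in this exercise.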


In [None]:
# Train the model in mini-batches of 1,000 samples
batch_size = 1000
epochs = 18
model.fit(X_train_enc, Y_train, batch_size = batch_size, epochs = epochs, verbose = 1)
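The exercise mentions the number of training steps. In `model.fit`, each mini-batch is one optimizer step, so the step count follows from the cell above: 2,700,000 training rows, `batch_size = 1000` and `epochs = 18`. The ceiling accounts for a smaller final batch when the rows do not divide evenly; `training_steps` is a hypothetical helper.

```python
import math

def training_steps(n_train, batch_size, epochs):
    """Optimizer updates performed by model.fit: one per mini-batch per epoch."""
    steps_per_epoch = math.ceil(n_train / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs

steps_per_epoch, total_steps = training_steps(2_700_000, 1000, 18)
print(steps_per_epoch, total_steps)   # 2700 48600
```

Halving `batch_size` doubles the steps per epoch at the same learning rate, so batch size, epochs, and learning rate interact when tuning.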



### Step 4: Evaluating the model on the test set

In [None]:
# evaluate returns [loss, AUC] because AUC is the only compiled metric
_, auc = model.evaluate(X_test_enc, Y_test, verbose=0)
print(f'AUC on the testing set with {n_train:,} training samples: {auc:.3f}')
