# Logistic Regression using TensorFlow
* We'll revisit ad click-through prediction, this time using TensorFlow.

## Step 1: Loading the data

In [1]:
import tensorflow as tf
import pandas as pd
from tensorflow.keras.models import Sequential

# Read the first 300,000 rows of the dataset
n_rows = 300000
df = pd.read_csv("./dataset/train.csv", nrows=n_rows)

# Drop unnecessary columns and prepare X and Y
X = df.drop(['click', 'id', 'hour', 'device_id', 'device_ip'], axis=1).values
Y = df['click'].values

## Step 2: Splitting the data and one-hot encoding the features
* We train the model on the first 270,000 samples and hold out the remaining 30,000 for testing.

In [2]:
# Split the data into training and testing sets (90% - 10%)
n_train = int(n_rows * 0.9)
X_train = X[:n_train]
Y_train = Y[:n_train].astype('float16')
X_test = X[n_train:]
Y_test = Y[n_train:].astype('float16')

# One-hot encode the categorical features
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(handle_unknown='ignore')
X_train_enc = enc.fit_transform(X_train).toarray().astype('float16')
X_test_enc = enc.transform(X_test).toarray().astype('float16')
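To make the role of `handle_unknown='ignore'` concrete, here is a minimal pure-Python sketch of the same idea. The helpers `fit_categories` and `one_hot` and the toy rows are hypothetical, written only for illustration; they are not scikit-learn's implementation. The point is that a value never seen in the training rows is encoded as all zeros for its column instead of raising an error.

```python
def fit_categories(rows):
    """Collect the sorted set of values seen in each column of the training rows."""
    n_cols = len(rows[0])
    return [sorted({row[j] for row in rows}) for j in range(n_cols)]


def one_hot(rows, categories):
    """Encode rows column by column.

    A value that was not seen during fitting produces all zeros
    for that column (the 'ignore' behaviour).
    """
    encoded = []
    for row in rows:
        vec = []
        for value, cats in zip(row, categories):
            vec.extend(1.0 if value == c else 0.0 for c in cats)
        encoded.append(vec)
    return encoded


# Toy data (hypothetical): two categorical columns
train = [["phone", "app"], ["tablet", "site"], ["phone", "site"]]
test = [["tv", "app"]]  # "tv" never appears in the training rows

cats = fit_categories(train)        # [['phone', 'tablet'], ['app', 'site']]
train_enc = one_hot(train, cats)
test_enc = one_hot(test, cats)

print(train_enc[0])  # [1.0, 0.0, 1.0, 0.0]
print(test_enc[0])   # [0.0, 0.0, 1.0, 0.0]  (unknown "tv" -> zeros)
```

Fitting on the training rows only, as the cell above does, keeps information from the test set out of the encoding.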


## Step 3: Using the Sequential model from Keras

* While Keras is mainly used for building neural networks, it can also be used to create a logistic regression model.
* In this case, the logistic regression model can be seen as a simple one-layer neural network with a sigmoid activation function.
* When we compile and train this model, it essentially learns the weights and bias of a logistic regression model.
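The points above can be sketched without any deep learning library. The snippet below is a minimal illustration, assuming made-up weights, bias, and input: it computes what a single sigmoid unit outputs and the binary cross-entropy loss that training would minimize. The function names are hypothetical and chosen only for this example.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weights, bias):
    """One dense unit with sigmoid activation: sigmoid(w . x + b)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


def binary_cross_entropy(y_true, p, eps=1e-7):
    """Loss for a single example; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1.0 - y_true) * math.log(1.0 - p))


# Illustrative values only (not learned from the dataset)
weights = [0.8, -0.4, 0.0]
bias = -0.2
x = [1.0, 0.0, 1.0]          # e.g. a one-hot encoded row

p = predict_proba(x, weights, bias)   # sigmoid(0.8 + 0.0 - 0.2) = sigmoid(0.6)
loss = binary_cross_entropy(1.0, p)   # loss if the true label is "click"
print(f"p(click) = {p:.4f}, loss = {loss:.4f}")
```

Training adjusts `weights` and `bias` so that this loss, averaged over the training samples, gets smaller.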


In [3]:
# Define the logistic regression model using Keras
# Sequential is a linear stack of layers; a single Dense unit with a sigmoid
# activation is equivalent to logistic regression.
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1, activation='sigmoid', input_shape=(X_train_enc.shape[1],))
])

# Set up the learning rate and optimizer
learning_rate = 0.001
optimizer = tf.keras.optimizers.Adam(learning_rate = learning_rate)

# Compile the model with binary cross-entropy loss, since this is a binary classification problem
# Track ROC AUC as the evaluation metric (ROC is the default curve for tf.keras.metrics.AUC)
model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=[tf.keras.metrics.AUC()])


In [4]:
# Train the model in mini-batches of 1,000 samples
batch_size = 1000
epochs = 12
model.fit(X_train_enc, Y_train, batch_size = batch_size, epochs = epochs, verbose = 1)



Epoch 1/12
Epoch 2/12
Epoch 3/12
Epoch 4/12
Epoch 5/12
Epoch 6/12
Epoch 7/12
Epoch 8/12
Epoch 9/12
Epoch 10/12
Epoch 11/12
Epoch 12/12


<keras.callbacks.History at 0x1c8162e1760>
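As a quick check on what the training call above does, the arithmetic below works out how many weight updates it performs. This is a sketch under the settings used in this notebook (270,000 training samples, batch size 1,000, 12 epochs); `iter_batches` is a hypothetical helper, and it ignores the shuffling that `model.fit` applies by default.

```python
import math


def iter_batches(n_samples, batch_size):
    """Yield (start, stop) index pairs that cover n_samples in order."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)


n_train = 270_000
batch_size = 1000
epochs = 12

steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs
batches = list(iter_batches(n_train, batch_size))

print(steps_per_epoch)   # 270
print(total_updates)     # 3240
print(batches[0], batches[-1])  # (0, 1000) (269000, 270000)
```

Each epoch is one full pass over the training set, so the optimizer takes 270 steps per epoch here.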

## Final step: Making predictions and evaluating the model

In [5]:
_, auc = model.evaluate(X_test_enc, Y_test, verbose = 0)
print(f'AUC with 270,000 training samples on testing set: {auc:.3f}')


AUC with 270,000 training samples on testing set: 0.770
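
To interpret the score, ROC AUC can be read as the probability that a randomly chosen clicked sample receives a higher predicted score than a randomly chosen non-clicked one. The sketch below computes it exactly by comparing every positive/negative pair on a tiny made-up example. The function `roc_auc` is hypothetical and is not how `tf.keras.metrics.AUC` works internally; that metric approximates the area using a discretized set of thresholds, so its value can differ slightly from the exact one.

```python
def roc_auc(y_true, scores):
    """Exact ROC AUC by pairwise comparison; ties count as half a win."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and predicted scores (illustrative only)
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, scores)
print(auc)  # 0.75
```

The pairwise loop is quadratic in the number of samples, so it is only suitable for small illustrations like this one.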
