<a href="https://colab.research.google.com/github/LiamSwick/Data-Science-Sandbox/blob/main/Audiobook_Customer_Churn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [279]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

from google.colab import drive
drive.mount('/content/drive')

import tensorflow as tf


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [280]:
df = pd.read_csv('/content/drive/MyDrive/Audiobooks_data.csv', header= None)
unscaled_inputs = df.iloc[:,1:-1]
targets = df.iloc[:,-1]

In [281]:
num_nonzero_targets = int(targets.sum())
print(f"There are {num_nonzero_targets} non-zero targets out of {np.shape(targets)[0]}")

There are 2237 non-zero targets out of 14084


In [282]:
all_inputs = unscaled_inputs.to_numpy()
all_targets = targets.to_numpy()

zero_indices = np.where(all_targets == 0)[0]
nonzero_indices = np.where(all_targets != 0)[0]

shuffled_zero_indices = np.random.choice(zero_indices, len(nonzero_indices), replace=False)

indices_to_keep = np.concatenate([shuffled_zero_indices, nonzero_indices])
np.random.shuffle(indices_to_keep)

shuffled_inputs = all_inputs[indices_to_keep].astype(np.float32)
shuffled_targets = all_targets[indices_to_keep].astype(np.int32)

num_samples = shuffled_inputs.shape[0]
num_train = int(0.8 * num_samples)
num_validation = int(0.1 * num_samples)

unscaled_train = shuffled_inputs[:num_train]
train_targets = shuffled_targets[:num_train]

unscaled_val = shuffled_inputs[num_train:num_train+num_validation]
validation_targets = shuffled_targets[num_train:num_train+num_validation]

unscaled_test = shuffled_inputs[num_train+num_validation:]
test_targets = shuffled_targets[num_train+num_validation:]

scaler = StandardScaler()

train_inputs = scaler.fit_transform(unscaled_train)
validation_inputs = scaler.transform(unscaled_val)
test_inputs = scaler.transform(unscaled_test)

In [287]:
print(np.sum(train_targets), train_targets.shape[0], np.sum(train_targets)/train_targets.shape[0])
print(np.sum(validation_targets), validation_targets.shape[0], np.sum(validation_targets)/validation_targets.shape[0])
print(np.sum(test_targets), test_targets.shape[0], np.sum(test_targets)/test_targets.shape[0])


1805 3579 0.5043308186644314
210 447 0.4697986577181208
222 448 0.4955357142857143
(3579, 10)


In [284]:
np.savez("Audiobook_Data_Train", inputs = train_inputs, targets = train_targets)

np.savez("Audiobook_Data_Validation", inputs = validation_inputs, targets = validation_targets)

np.savez("Audiobook_Data_Test", inputs = test_inputs, targets = test_targets)


In [285]:
npz = np.load("Audiobook_Data_Train.npz")
train_inputs, train_targets = npz['inputs'].astype(np.float32), npz['targets'].astype(np.int32)

npz = np.load('Audiobook_Data_Validation.npz')
validation_inputs, validation_targets = npz['inputs'].astype(np.float32), npz['targets'].astype(np.int32)

npz = np.load('Audiobook_Data_Test.npz')
test_inputs, test_targets = npz['inputs'].astype(np.float32), npz['targets'].astype(np.int32)

In [288]:
input_size = 10
output_size = 2
hidden_layer_size = 100

model = tf.keras.Sequential([
    tf.keras.layers.Dense(hidden_layer_size, activation = 'relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation = 'relu'),
    tf.keras.layers.Dense(output_size, activation = 'softmax')
])

custom_optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer = custom_optimizer, loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])

batch_size = 100
max_epochs = 100

early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=2,           # Stop after 2 consecutive epochs without val_loss improvement
    restore_best_weights=True
)

model.fit( train_inputs,
           train_targets,
           batch_size = batch_size,
           epochs = max_epochs,
           callbacks = [early_stopping],
           validation_data = (validation_inputs, validation_targets),
           verbose = 2
           )

Epoch 1/100
36/36 - 5s - 146ms/step - accuracy: 0.7399 - loss: 0.5350 - val_accuracy: 0.7897 - val_loss: 0.4516
Epoch 2/100
36/36 - 0s - 11ms/step - accuracy: 0.7809 - loss: 0.4265 - val_accuracy: 0.7852 - val_loss: 0.4142
Epoch 3/100
36/36 - 0s - 13ms/step - accuracy: 0.7941 - loss: 0.3954 - val_accuracy: 0.7785 - val_loss: 0.4078
Epoch 4/100
36/36 - 0s - 11ms/step - accuracy: 0.7980 - loss: 0.3815 - val_accuracy: 0.8076 - val_loss: 0.3838
Epoch 5/100
36/36 - 0s - 4ms/step - accuracy: 0.8039 - loss: 0.3732 - val_accuracy: 0.8210 - val_loss: 0.3791
Epoch 6/100
36/36 - 0s - 4ms/step - accuracy: 0.8036 - loss: 0.3664 - val_accuracy: 0.8143 - val_loss: 0.3767
Epoch 7/100
36/36 - 0s - 4ms/step - accuracy: 0.8013 - loss: 0.3659 - val_accuracy: 0.8166 - val_loss: 0.3833
Epoch 8/100
36/36 - 0s - 4ms/step - accuracy: 0.8069 - loss: 0.3636 - val_accuracy: 0.8166 - val_loss: 0.3892


<keras.src.callbacks.history.History at 0x798d3e5c2ed0>
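
The log above ends at epoch 8 even though `max_epochs` is 100. A minimal sketch of why, replaying the logged `val_loss` values through a simplified patience rule. This is an illustration only, not Keras's actual implementation: `replay_early_stopping` is a hypothetical helper, and a minimum improvement threshold of 0 is assumed.

```python
def replay_early_stopping(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-based, under a simplified patience rule."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # Patience was never exhausted: training ran to the end
    return len(val_losses), best_epoch

# val_loss values copied from the training log above
logged_val_loss = [0.4516, 0.4142, 0.4078, 0.3838, 0.3791, 0.3767, 0.3833, 0.3892]
stop_epoch, best_epoch = replay_early_stopping(logged_val_loss, patience=2)
print(stop_epoch, best_epoch)
```

Under this rule the lowest `val_loss` is at epoch 6, and epochs 7 and 8 both fail to improve on it, which exhausts `patience=2`. Because `restore_best_weights=True`, the model evaluated afterwards should carry the epoch-6 weights rather than the epoch-8 ones.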

In [289]:
test_loss, test_accuracy = model.evaluate(test_inputs, test_targets)
# Cross-entropy loss is not a percentage, so report it as a plain number
print(f"Test loss: {test_loss:0.4f}")
print(f"Test accuracy: {test_accuracy*100:0.2f}%")

14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.8013 - loss: 0.3896
Test loss: 0.3924
Test accuracy: 79.69%


# Task
Create an interactive user interface to allow users to input values for the 10 features of the audiobook dataset, preprocess these inputs using the `scaler` object, make a prediction using the trained `model`, and display the prediction in a user-friendly format. The input fields should be designed based on the features present in the `unscaled_inputs` DataFrame.

## Analyze Model Inputs

### Subtask:
Examine the `train_inputs` and `unscaled_inputs` to understand the features the model expects and their original scales. This is crucial for designing appropriate input fields for the user interface.


**Reasoning**:
To understand the original features and their values, I will display the first few rows of the `unscaled_inputs` DataFrame. Additionally, I will print the shapes of both `unscaled_inputs` and `train_inputs` to verify their dimensions before and after scaling.



In [290]:
print("First 5 rows of unscaled_inputs:")
print(unscaled_inputs.head())
print("\nShape of unscaled_inputs:")
print(unscaled_inputs.shape)
print("\nShape of train_inputs (after scaling):")
print(train_inputs.shape)

First 5 rows of unscaled_inputs:
       1     2      3      4   5      6     7       8   9    10
0  1620.0  1620  19.73  19.73   1  10.00  0.99  1603.8   5   92
1  2160.0  2160   5.33   5.33   0   8.91  0.00     0.0   0    0
2  2160.0  2160   5.33   5.33   0   8.91  0.00     0.0   0  388
3  1620.0  1620   5.96   5.96   0   8.91  0.42   680.4   1  129
4  2160.0  2160   5.33   5.33   0   8.91  0.22   475.2   0  361

Shape of unscaled_inputs:
(14084, 10)

Shape of train_inputs (after scaling):
(3579, 10)
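
The raw columns span very different ranges (thousands in column 1, fractions in column 7), which is why any value typed into the interface has to pass through the fitted `scaler` before it reaches the model. A minimal sketch of the per-feature arithmetic, assuming the usual z-score definition with the population standard deviation. `standardize` is a hypothetical helper, not part of scikit-learn, and the sample values are the five rows of column 8 printed above; in the notebook the mean and standard deviation come from the training split, not from five rows.

```python
from statistics import fmean, pstdev

def standardize(value, column_values):
    """Z-score a single value against a reference column (assumed non-constant)."""
    mean = fmean(column_values)
    std = pstdev(column_values)  # population standard deviation (ddof = 0)
    return (value - mean) / std

# Column 8 from the first five rows of unscaled_inputs shown above
column_8 = [1603.8, 0.0, 0.0, 680.4, 475.2]
z_scores = [standardize(v, column_8) for v in column_8]
print([round(z, 3) for z in z_scores])
```

After this transformation the reference column has mean 0 and unit standard deviation, so a feature measured in thousands no longer dominates one measured in fractions.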


## Design User Input Fields

### Subtask:
Create a set of user-friendly input fields (e.g., text boxes, sliders) for each feature using `ipywidgets`. This will allow users to easily provide input for the model without needing to understand data formats.


**Reasoning**:
To create user-friendly input fields, I will import `ipywidgets` and define a list to hold one widget per feature. I will iterate through the columns of the `unscaled_inputs` DataFrame and create a `FloatText` widget for each. The head of `unscaled_inputs` shows that every column is numeric (a mix of integers and floats), so a float input can represent all of them.



In [291]:
import ipywidgets as widgets
from IPython.display import display

input_widgets = []

for i, col in enumerate(unscaled_inputs.columns):
    # Assuming all features are numerical and can be represented as floats
    # We can infer more specific types later if needed, but FloatText is a good general choice
    widget = widgets.FloatText(
        value=unscaled_inputs[col].mean(), # Initialize with the mean of the column for a sensible default
        description=f'Feature {col}:',
        disabled=False
    )
    input_widgets.append(widget)

print("Input widgets created successfully.")

Input widgets created successfully.


**Reasoning**:
The subtask of creating user-friendly input fields is completed by displaying the created `ipywidgets` to the user. This will allow the user to interact with the widgets and provide input.



In [292]:
print("Please enter values for each feature:")
for widget in input_widgets:
    display(widget)


Please enter values for each feature:


FloatText(value=1591.2816853978272, description='Feature 1:')

FloatText(value=1678.6086339108208, description='Feature 2:')

FloatText(value=7.103790826469753, description='Feature 3:')

FloatText(value=7.543805026980971, description='Feature 4:')

FloatText(value=0.16074978699233172, description='Feature 5:')

FloatText(value=8.909794802612891, description='Feature 6:')

FloatText(value=0.12565890372053395, description='Feature 7:')

FloatText(value=189.88898324339675, description='Feature 8:')

FloatText(value=0.0702215279750071, description='Feature 9:')

FloatText(value=61.9350326611758, description='Feature 10:')

**Reasoning**:
The subtask requires an interactive user interface to make a prediction. Now that the input widgets are displayed, the next step is to create a 'Predict' button that, when clicked, will gather the input from the widgets, preprocess it, and trigger the model prediction. A `Button` widget from `ipywidgets` is suitable for this.



In [293]:
predict_button = widgets.Button(
    description='Predict',
    disabled=False,
    button_style='success', # 'success', 'info', 'warning', 'danger' or ''
    tooltip='Click to make a prediction'
)

output_area = widgets.Output()

print("Click the button below to get a prediction based on your inputs:")
display(predict_button, output_area)


Click the button below to get a prediction based on your inputs:


Button(button_style='success', description='Predict', style=ButtonStyle(), tooltip='Click to make a prediction…

Output()

**Reasoning**:
Now that the input widgets and the 'Predict' button are displayed, I need to define the function that will be executed when the 'Predict' button is clicked. This function will collect the input values, preprocess them using the `scaler` object, make a prediction with the trained `model`, and display the result in the `output_area`.



In [294]:
def on_predict_button_clicked(b):
    with output_area:
        output_area.clear_output()
        try:
            # 1. Gather input values
            user_input_values = [widget.value for widget in input_widgets]

            # 2. Convert to numpy array and reshape for scaler
            input_array = np.array(user_input_values).reshape(1, -1).astype(np.float32)

            # 3. Preprocess inputs using the scaler
            scaled_input = scaler.transform(input_array)

            # 4. Make prediction using the trained model
            # verbose=0 keeps the Keras progress bar out of the output area
            prediction_probabilities = model.predict(scaled_input, verbose=0)
            prediction_label = np.argmax(prediction_probabilities, axis=1)[0]

            # 5. Display the prediction
            if prediction_label == 1:
                print("Prediction: The customer is likely to make a purchase.")
            else:
                print("Prediction: The customer is not likely to make a purchase.")
            print(f"Confidence (No Purchase, Purchase): {prediction_probabilities[0][0]:.2f}, {prediction_probabilities[0][1]:.2f}")

        except Exception as e:
            print(f"An error occurred: {e}")

# Link the button to the function
predict_button.on_click(on_predict_button_clicked)

print("Prediction logic attached to the button.")

Prediction logic attached to the button.


## Summary:

### Data Analysis Key Findings

*   **Input Features:** The `unscaled_inputs` DataFrame contains 10 numerical features, which serve as the basis for the interactive user interface.
*   **Interactive Input Design:** Ten `ipywidgets.FloatText` input fields were created, one for each feature. Each widget was initialized with the mean of its feature in `unscaled_inputs` and labeled by column index (e.g., 'Feature 1:'), since the CSV was loaded without a header row and the columns have no descriptive names.
*   **Prediction Control and Output:** A 'Predict' button (`ipywidgets.Button`) was implemented to trigger the prediction process, and an `ipywidgets.Output` area was established to display the prediction results to the user.
*   **Prediction Logic:** The callback gathers the user-provided values, converts them into a NumPy array of shape `(1, 10)`, scales them with the `scaler` fitted on the training split, and passes the result to `model` to generate a purchase prediction. The output includes a label (e.g., "likely to make a purchase") and the softmax probabilities for both outcomes.
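
The prediction steps listed above can also be kept in a plain function, separate from the button callback, so they can be checked without any widgets. A minimal sketch under stated assumptions: `predict_purchase`, `FakeScaler`, and `FakeModel` are hypothetical names introduced here for illustration, and the stand-in classes only mimic the `transform` and `predict` methods used in the notebook.

```python
def predict_purchase(values, scaler, model):
    """Scale one row of raw feature values and return (label, probabilities)."""
    row = [[float(v) for v in values]]        # one sample, n features
    scaled = scaler.transform(row)
    probabilities = model.predict(scaled)[0]  # probabilities for the single sample
    # Index of the largest probability, equivalent to an argmax
    label = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return label, [float(p) for p in probabilities]

class FakeScaler:
    """Stand-in for a fitted scaler: divides every value by 10."""
    def transform(self, rows):
        return [[x / 10.0 for x in row] for row in rows]

class FakeModel:
    """Stand-in for a trained model: class 1 wins when the first feature is positive."""
    def predict(self, rows):
        return [[0.2, 0.8] if row[0] > 0 else [0.9, 0.1] for row in rows]

label, probabilities = predict_purchase([5, 1, 2], FakeScaler(), FakeModel())
print(label, probabilities)
```

With a helper like this, the callback would only need to read the widget values, call the function with the real `scaler` and `model`, and print the result.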

### Insights or Next Steps

*   The developed interactive interface allows for immediate testing and demonstration of the model's predictive capabilities with custom user inputs, enhancing accessibility for non-technical stakeholders.
*   To improve robustness, consider adding input validation. `FloatText` accepts any number, so a bounded widget such as `BoundedFloatText`, with `min` and `max` taken from each feature's observed range in `unscaled_inputs`, would guide users and prevent unrealistic entries.
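
A minimal sketch of the range check suggested above, written without widgets so it runs on its own. `feature_bounds` and `out_of_range` are hypothetical helpers, and the sample rows are columns 1, 3, and 7 of rows 0, 1, and 3 from the `unscaled_inputs` head printed earlier; real bounds would be computed over the full dataset.

```python
def feature_bounds(rows):
    """Return a (min, max) pair for each column of a list of equal-length rows."""
    columns = zip(*rows)
    return [(min(col), max(col)) for col in columns]

def out_of_range(values, bounds):
    """Return the indices of values that fall outside their [low, high] bounds."""
    return [
        i for i, (value, (low, high)) in enumerate(zip(values, bounds))
        if not low <= value <= high
    ]

# Columns 1, 3 and 7 from rows 0, 1 and 3 of the head shown earlier
sample_rows = [
    [1620.0, 19.73, 0.99],
    [2160.0, 5.33, 0.00],
    [1620.0, 5.96, 0.42],
]
bounds = feature_bounds(sample_rows)
problems = out_of_range([1800.0, 50.0, 0.5], bounds)
print(bounds, problems)
```

Here only the second value is flagged, since 50.0 lies above the largest observed value for that column. The same bounds could be passed as `min` and `max` when constructing bounded widgets, or checked inside the callback before calling the model.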
