## Part 1: Preprocessing

In [80]:
# Import our dependencies
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras import layers

In [81]:
#  Import and read the attrition data
attrition_df = pd.read_csv('https://static.bc-edx.com/ai/ail-v-1-0/m19/lms/datasets/attrition.csv')
attrition_df.head()

Unnamed: 0,Age,Attrition,BusinessTravel,Department,DistanceFromHome,Education,EducationField,EnvironmentSatisfaction,HourlyRate,JobInvolvement,...,PerformanceRating,RelationshipSatisfaction,StockOptionLevel,TotalWorkingYears,TrainingTimesLastYear,WorkLifeBalance,YearsAtCompany,YearsInCurrentRole,YearsSinceLastPromotion,YearsWithCurrManager
0,41,Yes,Travel_Rarely,Sales,1,2,Life Sciences,2,94,3,...,3,1,0,8,0,1,6,4,0,5
1,49,No,Travel_Frequently,Research & Development,8,1,Life Sciences,3,61,2,...,4,4,1,10,3,3,10,7,1,7
2,37,Yes,Travel_Rarely,Research & Development,2,2,Other,4,92,2,...,3,2,0,7,3,3,0,0,0,0
3,33,No,Travel_Frequently,Research & Development,3,4,Life Sciences,4,56,3,...,3,3,0,8,3,3,8,7,3,0
4,27,No,Travel_Rarely,Research & Development,2,1,Medical,1,40,3,...,3,4,1,6,3,3,2,2,2,2


In [82]:
# Determine the number of unique values in each column.
attrition_df.nunique()

Unnamed: 0,0
Age,43
Attrition,2
BusinessTravel,3
Department,3
DistanceFromHome,29
Education,5
EducationField,6
EnvironmentSatisfaction,4
HourlyRate,71
JobInvolvement,4


In [83]:
# Create y_df with the Attrition and Department columns
y_df = attrition_df[['Attrition', 'Department']]
y_df.head()



Unnamed: 0,Attrition,Department
0,Yes,Sales
1,No,Research & Development
2,Yes,Research & Development
3,No,Research & Development
4,No,Research & Development


In [84]:
# Create a list of at least 10 column names to use as X data
selected_columns = [
    'Age', 'DistanceFromHome', 'Education',
    'EnvironmentSatisfaction', 'HourlyRate', 'JobInvolvement',
    'JobLevel', 'JobSatisfaction', 'TotalWorkingYears', "YearsAtCompany"
]

# Create X_df using your selected columns
X_df = attrition_df[selected_columns]

# Show the data types for X_df
X_df.dtypes




Unnamed: 0,0
Age,int64
DistanceFromHome,int64
Education,int64
EnvironmentSatisfaction,int64
HourlyRate,int64
JobInvolvement,int64
JobLevel,int64
JobSatisfaction,int64
TotalWorkingYears,int64
YearsAtCompany,int64


In [85]:
# Split the data into training and testing sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X_df, y_df, test_size=0.2, random_state=15)


In [86]:
# Convert your X data to numeric data types however you see fit
# Add new code cells as necessary

X_train.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1176 entries, 453 to 645
Data columns (total 10 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   Age                      1176 non-null   int64
 1   DistanceFromHome         1176 non-null   int64
 2   Education                1176 non-null   int64
 3   EnvironmentSatisfaction  1176 non-null   int64
 4   HourlyRate               1176 non-null   int64
 5   JobInvolvement           1176 non-null   int64
 6   JobLevel                 1176 non-null   int64
 7   JobSatisfaction          1176 non-null   int64
 8   TotalWorkingYears        1176 non-null   int64
 9   YearsAtCompany           1176 non-null   int64
dtypes: int64(10)
memory usage: 101.1 KB


In [87]:
# Create a StandardScaler
scaler = StandardScaler()

# Fit the StandardScaler to the training data
scaler.fit(X_train)

# Scale the training and testing data
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

In [88]:
from sklearn.preprocessing import OneHotEncoder
# Create a OneHotEncoder for the Department column
attrition_encoder = OneHotEncoder(sparse_output=False)

attrition_train = y_train[['Department']]
attrition_test = y_test[['Department']]
# Fit the encoder to the training data
attrition_encoder.fit(attrition_train)

# Create two new variables by applying the encoder
# to the training and testing data
y_train_attrition_encoded = attrition_encoder.transform(attrition_train)
y_test_attrition_encoded = attrition_encoder.transform(attrition_test)

# Verify the shapes
print("y_train_attrition_encoded shape:", y_train_attrition_encoded.shape)
print("y_test_attrition_encoded shape:", y_test_attrition_encoded.shape)

y_train_attrition_encoded shape: (1176, 3)
y_test_attrition_encoded shape: (294, 3)


In [89]:
# Create a OneHotEncoder for the Attrition column
department_encoder = OneHotEncoder(sparse_output=False)

department_train = y_train[['Department']]
department_test = y_test[['Department']]
# Fit the encoder to the training data
department_encoder.fit(department_train)

# Create two new variables by applying the encoder
# to the training and testing data
y_train_department_encoded = department_encoder.transform(department_train)
y_test_department_encoded = department_encoder.transform(department_test)

# Verify the shapes
print("y_train_department_encoded shape:", y_train_department_encoded.shape)
print("y_test_department_encoded shape:", y_test_department_encoded.shape)



y_train_department_encoded shape: (1176, 3)
y_test_department_encoded shape: (294, 3)


## Create, Compile, and Train the Model

In [90]:
# Find the number of columns in the X training data
num_column = X_train_scaled.shape[1]

# Create the input layer
input_layer = layers.Input(shape=(num_column,))

# Create at least two shared layers
shared_layer1 = layers.Dense(22, activation='relu')(input_layer)
shared_layer2 = layers.Dense(32, activation='relu')(shared_layer1)


In [91]:
# Create a branch for Department
# with a hidden layer and an output layer

# Create the hidden layer
department_hidden_layer = layers.Dense(16, activation='relu')(shared_layer2)

# Create the output layer
department_output_layer = layers.Dense(y_train_department_encoded.shape[1], activation='softmax', name='department_output')(department_hidden_layer)



In [92]:
# Create a branch for Attrition
# with a hidden layer and an output layer

# Create the hidden layer
attrition_hidden_layer = layers.Dense(16, activation='relu')(shared_layer2)

# Create the output layer
attrition_output_layer = layers.Dense(y_train_attrition_encoded.shape[1], activation='softmax', name='attrition_output')(attrition_hidden_layer)



In [93]:
# Create the model
model = Model(inputs=input_layer, outputs=[department_output_layer, attrition_output_layer])

# Compile the model
model.compile(optimizer='adam',
              loss={'department_output': 'categorical_crossentropy', 'attrition_output': 'categorical_crossentropy'},
              metrics={'department_output': 'accuracy', 'attrition_output': 'accuracy'})

# Summarize the model
model.summary()


In [94]:

# Train the model
model_train = model.fit(X_train_scaled,
                    [ y_train_department_encoded, y_train_attrition_encoded],
                    epochs=50)




Epoch 1/50
[1m37/37[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 2ms/step - attrition_output_accuracy: 0.5918 - department_output_accuracy: 0.3882 - loss: 1.9730
Epoch 2/50
[1m37/37[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - attrition_output_accuracy: 0.6458 - department_output_accuracy: 0.6284 - loss: 1.6581
Epoch 3/50
[1m37/37[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - attrition_output_accuracy: 0.6515 - department_output_accuracy: 0.6515 - loss: 1.5523
Epoch 4/50
[1m37/37[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - attrition_output_accuracy: 0.6323 - department_output_accuracy: 0.6312 - loss: 1.5836
Epoch 5/50
[1m37/37[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - attrition_output_accuracy: 0.6463 - department_output_accuracy: 0.6463 - loss: 1.5512
Epoch 6/50
[1m37/37[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step - attrition_output_accuracy: 0.6497 - department_output_

In [100]:
# Evaluate the model with the testing data
evaluation = model.evaluate(
    X_test_scaled,
    {'department_output': y_test_department_encoded, 'attrition_output': y_test_attrition_encoded},
    verbose=1
)

# Print the evaluation list to understand its structure
print(evaluation)

# Extract the specific metrics
test_loss = evaluation[0]
department_output_loss = evaluation[1]
attrition_output_loss = evaluation[2]

print(f"Test Loss: {test_loss}")
print(f"Test Department Output Loss: {department_output_loss}")
print(f"Test Attrition Output Loss: {attrition_output_loss}")


[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step - attrition_output_accuracy: 0.6897 - department_output_accuracy: 0.6862 - loss: 1.5671  
[1.533742904663086, 0.7006802558898926, 0.7108843326568604]
Test Loss: 1.533742904663086
Test Department Output Loss: 0.7006802558898926
Test Attrition Output Loss: 0.7108843326568604


In [102]:
# Print the accuracy for both department and attrition

# Calculate accuracy for the department output
department_pred = model.predict(X_test_scaled)[0]
department_accuracy = np.mean(np.argmax(department_pred, axis=1) == np.argmax(y_test_department_encoded, axis=1))

# Calculate accuracy for the attrition output
attrition_pred = model.predict(X_test_scaled)[1]
attrition_accuracy = np.mean(np.argmax(attrition_pred, axis=1) == np.argmax(y_test_attrition_encoded, axis=1))

print(f"Test Department Output Accuracy: {department_accuracy}")
print(f"Test Attrition Output Accuracy: {attrition_accuracy}")

[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 5ms/step 
[1m10/10[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 4ms/step 
Test Department Output Accuracy: 0.7108843537414966
Test Attrition Output Accuracy: 0.7006802721088435


# Summary

In the provided space below, briefly answer the following questions.

1. Is accuracy the best metric to use on this data? Why or why not?

2. What activation functions did you choose for your output layers, and why?

3. Can you name a few ways that this model might be improved?

# Model Evaluation and Improvement

### Accuracy as a Metric
You're right that accuracy can be misleading, especially in imbalanced datasets. In cases where one class significantly outnumbers another, a model might achieve high accuracy simply by predicting the majority class. Precision, recall, and F1-score provide a more nuanced view of performance, particularly for the minority class (e.g., employees leaving the company). Additionally, considering metrics like the ROC-AUC can help evaluate the trade-off between true positive and false positive rates.

### Activation Functions
Using Softmax for both outputs is a good choice. For the binary classification problem (attrition), while Softmax works, using the sigmoid function could be more straightforward. Sigmoid outputs a single probability score, which directly indicates the likelihood of an employee leaving, while Softmax handles multiple classes effectively. However, either can work depending on the context and how you interpret the outputs.

### Model Improvement Strategies
1. **More Data**: Beyond increasing volume, ensuring data quality and relevance is essential. Incorporating diverse sources can enhance model robustness.
   
2. **Feature Engineering**: In addition to your suggestions, consider interaction terms or polynomial features for complex relationships. Normalizing or scaling features can also improve training efficiency.

3. **Hyperparameter Tuning**: Techniques like grid search or random search can systematically explore combinations, and Bayesian optimization can be more efficient in finding optimal hyperparameters.

4. **Regularization**: Besides dropout and L2, consider techniques like early stopping, where training halts when performance on a validation set begins to degrade, helping to prevent overfitting.

5. **Different Architectures**: You might explore recurrent neural networks (RNNs) if your data has a temporal component. Alternatively, experimenting with attention mechanisms can be beneficial for capturing relevant features.

6. **Cross-Validation**: Implementing k-fold cross-validation can help ensure that the model's performance is robust across different subsets of the data.

7. **Ensemble Learning**: Beyond random forests and gradient boosting, consider stacking or blending models to leverage the strengths of various algorithms.

These enhancements can help improve your model's predictive power and overall reliability in making informed decisions.