## New analysis of the cheese platter problem

An airline stocks a certain amount of cheese platters on their flights based on certain factors. Some of these flights sell out of cheese platters. I will endeavor to create a model that correctly stocks each flight so that the airline does not run out of platters and can maximize profit.

In [29]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

In [4]:
data = pd.read_csv("cheesplate.csv")
data.head()

Unnamed: 0,Dptr Hour,Length of Flight (Hrs),Day of Week,Passengers Boarded,Passengers Booked 2 DtD,Stock Out Occurred,Cheese Platters Sold
0,14,4,Thursday,140,137,1,17
1,19,6,Tuesday,163,153,0,23
2,19,2,Saturday,165,160,0,20
3,6,6,Saturday,161,161,0,16
4,18,3,Friday,118,112,0,16


In [11]:
y = pd.get_dummies(data['Day of Week'])
data.drop(columns=["Day of Week"])
cheese = data.join(y)

In [22]:
cheese.head()
cheese = cheese.drop(columns=["Day of Week"])

In [24]:
X_train, X_test = train_test_split(cheese, test_size=.3)
X_train_label = X_train["Stock Out Occurred"]
X_train_data = X_train.drop(columns=["Stock Out Occurred"])

clf = RandomForestClassifier()
clf.fit(X_train_data, X_train_label)

RandomForestClassifier()

In [30]:
X_test_label = X_test["Stock Out Occurred"]
X_test_data = X_test.drop(columns=["Stock Out Occurred"])
scores = clf.score(X_test_data, X_test_label)
print(scores)
print(clf.feature_importances_)


0.8706666666666667
[0.21181374 0.08541306 0.18754157 0.19230144 0.22953083 0.01327472
 0.01456657 0.01371795 0.01265195 0.01270684 0.0128879  0.01359342]


In [37]:
clf1 = RandomForestClassifier(max_depth=5, max_features=6, bootstrap=True)
cv_scores = cross_val_score(clf1, X_train_data, X_train_label, cv=6)
print(cv_scores)

[0.85616438 0.84931507 0.8490566  0.85591767 0.86106346 0.864494  ]


In [38]:
depth = [2,3,4,5,6,7]
features = [1,2,3,4,5,6]
estimate = [50,100,150,200,250,300]
means = 0

for d in depth:
    for f in features:
        for e in estimate:
            clf = RandomForestClassifier(n_estimators=e, max_depth=d, max_features=f, bootstrap=True)
            cv_scores = cross_val_score(clf, X_train_data, X_train_label, cv=6)
            if np.mean(cv_scores) > means:
                best_depth = d
                best_features = f
                best_estimate = e
                means = np.mean(cv_scores)

print(best_depth, best_features, best_estimate)

7 6 250


In [40]:
new_depth = [7,8,9,10,11]
new_features = [6,7,8,9]
new_means = 0

for d in new_depth:
    for f in new_features:
        clf = RandomForestClassifier(n_estimators=250, max_depth = d, max_features=f)
        cv_scores = cross_val_score(clf, X_train_data, X_train_label, cv=5)
        if np.mean(cv_scores > new_means):
            best_depth1 = d
            best_features1 = f
            new_means = np.mean(cv_scores)

print(best_depth1, best_features1)

11 9
