#  Analyzing and Predicting Player Count Trends in Online Games

## Part 3: Model Construction

We will use a model known as SARIMAX, or Seasonal Auto-Regressive Integrated Moving Average with eXogenous regressors. At first we will not use exogenous regressors, but we will eventually incorporate our 'event' column to assess the impact of an ongoing event on our predictor of choice.

### Modelling Goals

We wish to predict 3 separate things:

1. Number of players at a given time in the future
2. Player growth over a given period
3. Player growth at a given time.
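
As a rough illustration of how these three targets differ, the sketch below computes each one on a small, made-up series of daily player counts. The values and variable names are hypothetical and are not taken from our data.

```python
import pandas as pd

# Hypothetical daily player counts; the values are made up for illustration.
players = pd.Series(
    [1000, 1100, 1210, 1089],
    index=pd.date_range("2021-01-01", periods=4, freq="D"),
    name="players",
)

# 1. Number of players at a given time
players_on_day = players.loc["2021-01-03"]

# 2. Growth over a period: relative change between the first and last day
growth_over_period = players.iloc[-1] / players.iloc[0] - 1

# 3. Growth at a given time: day-over-day relative change on one date
daily_growth = players.pct_change()
growth_on_day = daily_growth.loc["2021-01-03"]

print(players_on_day, growth_over_period, growth_on_day)
```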

### Modelling Process for SARIMA and SARIMAX

1. Transform Data
    - Outlier Removal
    - Discontinuity
2. Determine Seasonality
    - Spectral Analysis
3. Stationarity
    - ACF & PACF Analysis
4. Model Construction
5. Model Comparison

### Modelling Process for Our Data

We have to decide exactly which model we want to construct. Do we want a model that is built on a single game and can only predict that one game? Should we build a model from an amalgamation of all games? Or should we build a model on our control dataframe, Team Fortress 2, and see how it performs on other games? We will try all three:

1. Basic Modelling
    - CSGO
    - DOTA 2
    - Rocket League
    - Team Fortress 2
2. Amalgamized Modelling
    - Amalgamization Technique
    - Modelling
    - Testing
3. Control Model
    - Team Fortress 2 modelling
    - Testing

In [None]:
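# we import the necessary libraries

from fsds.imports import *

# pd, np and plt are used throughout, so we import them explicitly rather than relying on the star import
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import itertools
import statistics as stats
from datetime import datetime
import scipy.stats
from scipy.stats import boxcox as bc
import statsmodels as sm
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.statespace.sarimax import SARIMAX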

## Data Retrieval

In [None]:
# we import our dataframes

csgo = pd.read_csv('data/Clean/csgo.csv')
dota = pd.read_csv('data/Clean/dota.csv')
rl = pd.read_csv('data/Clean/rl.csv')
tf = pd.read_csv('data/Clean/tf.csv')

raw_ls = [csgo, dota, rl, tf]

ls = raw_ls

In [None]:
# we also need to drop unnecessary columns

ls = list(map(lambda df: df.drop(columns = ['Unnamed: 0', 'index']), ls))

In [None]:
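# and parse our 'time' column as datetimes and set it as the index, so that the date-based
# slicing and forecasting later on work on a proper DatetimeIndex

for df in ls:
    df['time'] = pd.to_datetime(df['time'])
    df.set_index(df['time'], inplace = True)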

In [None]:
ls[0]

## 1. Basic Modelling

### a. Preparation

#### Transformations

In [None]:
# we make a placeholder list

ls_one_a = ls

In [None]:
# we need to transform data before we even decide on our predictor column. Let's inspect players over time first. 


for df in ls:
    df_plot = df.copy()

    df_plot = df_plot.drop(columns = ['viewers', 'event', '%chg_players', '%chg_viewers'])
    df_plot = df_plot.drop(columns = ['time'])

    df_plot.plot(figsize = (15,6))
    plt.show()
    
    

In [None]:
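# we handle discontinuities: zero player counts are filled with the previous observation

def fill_zeroes(df, col):
    # treat zeroes as missing and forward-fill them; this avoids chained assignment (df[col][i] = ...)
    df[col] = df[col].replace(0, np.nan).ffill()
    return df

ls = list(map(lambda df: fill_zeroes(df, col = 'players'), ls))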


In [None]:
# and inspect to ensure this was successful

for i, df in enumerate(ls):
    df_plot = df.copy()
    df_plot.set_index(df_plot['time'], inplace = True)
    df_plot = df_plot.drop(columns = ['viewers', 'event', '%chg_players', '%chg_viewers'])
    df_plot = df_plot.drop(columns = ['time'])

    df_plot.plot(figsize = (15,6))
    plt.show()

In [None]:
# all of these games had major changes in number of players directly after their release, leading us to 
# model only player data a sufficient time after release. 

start_dates = [ "2016-01-01",
              "2016-01-01",
              "2016-01-01",
              "2013-01-01"]

trimmed_df_list = []

for i in range(0, 4):
    trimmed = ls[i].where(ls[i]['time'] >= start_dates[i]).dropna()
    trimmed_df_list.append(trimmed)
    

In [None]:
# we inspect our trimmed data

for i, df in enumerate(trimmed_df_list):
    df_plot = df.copy()
    df_plot.set_index(df_plot['time'], inplace = True)
    df_plot = df_plot.drop(columns = ['viewers', 'event', '%chg_players', '%chg_viewers'])
    df_plot = df_plot.drop(columns = ['time'])

    df_plot.plot(figsize = (15,6))
    plt.show()

ls_trimmed = trimmed_df_list

ls = trimmed_df_list


The series look much better, but we have more work to complete before they can be modelled.

#### Stationarity

In [None]:
# we visualize our mean player count over time. If this plot has a trend, our data is not stationary. 

for df in ls:
    
    rolling_mean = df['players'].rolling(window = 3).mean()
    
    fig = plt.figure(figsize = (15, 8))
   # plt.plot(df['players'], color = 'blue', label = 'Players')
    plt.plot(rolling_mean, color = 'orange', label = 'Rolling Average')

None of our games have stationary data. We can address this in multiple ways, such as subtracting a rolling mean or differencing. Before trying these, we take the log of the player counts, which compresses the scale of each series.
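
To make the "trend in the rolling mean" check concrete, here is a minimal sketch on a synthetic series (a linear trend plus a weekly cycle, not our data). The helper `rolling_mean_spread` and the 28-day window are assumptions chosen for illustration; the cells in this notebook use a 3-day window.

```python
import numpy as np
import pandas as pd

# Synthetic series: linear trend plus a weekly cycle (illustrative only).
t = np.arange(140)
series = pd.Series(100 + 2.0 * t + 10 * np.sin(2 * np.pi * t / 7))

def rolling_mean_spread(s, window=28):
    """Range (max - min) of the rolling mean; a large spread hints at a trend."""
    rm = s.rolling(window=window).mean().dropna()
    return rm.max() - rm.min()

# The raw series has a drifting mean; the differenced series does not.
spread_raw = rolling_mean_spread(series)
spread_diff = rolling_mean_spread(series.diff().dropna())
print(spread_raw, spread_diff)
```

A spread near zero after differencing is only a visual-style heuristic here; it is not a formal stationarity test.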

In [None]:
for df in ls:
    df['players_logged'] = np.log(df['players'])
    fig = plt.figure(figsize = (15, 8))
    plt.plot(df['players_logged'])

#### Rolling Mean Transformations

In [None]:
# we can make our data stationary by subtracting the rolling mean from our 'players' column. This centers our data
# on the mean, even if the mean changes over time. 

for df in ls:
    rolling_mean = df['players'].rolling(window = 3).mean()
    df_plot = df.copy()
    df_plot['players_sub_mean'] = df['players'] - rolling_mean
    df_plot.dropna(inplace = True)
    fig = plt.figure(figsize = (15,8))
    plt.plot(df_plot['players_sub_mean'], label = 'Players Centered on Rolling Average')
    plt.show()
    

The data looks beautifully stationary now, though there are severe outliers in some cases. We'll address these after checking how differencing makes our data look.

#### Differencing

We have already constructed a column that represents a form of differencing: '%chg_players' expresses the difference between consecutive observations as a percentage. Let's also construct the raw difference in each game's player counts.
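
The sketch below shows how the raw difference, the relative change, and the difference of logs relate on a toy series. The numbers are made up, and `pct_change` returns a fraction; whether our '%chg_players' column is stored as a fraction or as a percentage is not assumed here.

```python
import numpy as np
import pandas as pd

# Toy player counts (illustrative only).
players = pd.Series([1000.0, 1100.0, 1045.0, 1045.0])

raw_diff = players.diff(periods=1)          # change in player count
pct_chg = players.pct_change(periods=1)     # relative change (fraction, not percent)
log_diff = np.log(players).diff(periods=1)  # difference of logs

# log_diff equals log(1 + pct_chg), so the two are close for small changes.
assert np.allclose(log_diff.dropna(), np.log1p(pct_chg.dropna()))

print(pd.DataFrame({"raw_diff": raw_diff, "pct_chg": pct_chg, "log_diff": log_diff}))
```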

In [None]:
# using pandas diff function

for df in ls:
    df['difference'] = df['players'].diff(periods = 1)

In [None]:
# basic differences

for df in ls:
    fig = plt.figure(figsize = (15, 8))
    plt.plot(df['difference'], label = 'Differences in Players')

In [None]:
# differences of the rolling mean, though this will affect model interpretability

for df in ls:
    df['rolling_mean'] = df['players'].rolling(window = 3).mean()
    df['rolling_mean_diff'] = df['rolling_mean'].diff(periods = 1)
    fig = plt.figure(figsize = (15, 8))
    plt.plot(df['rolling_mean_diff'], label = 'Differences in Rolling Mean')



In [None]:
# differences in logged data

for df in ls:
    
    df['players_logged'] = np.log(df['players'])
    df['players_logged_diff'] = df['players_logged'].diff(periods = 1)
    
    fig = plt.figure(figsize = (15, 8))
    plt.plot(df['players_logged_diff'])

#### Outliers

We are going to deal with extreme outliers only, capping any value that lies more than 3 times the interquartile range beyond the quartiles. We will continue with the logged data, as it is on a small scale and is easily transformed back to a player count.
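
A minimal sketch of this kind of capping on toy data follows. The helper name `cap_outliers_iqr` and the sample values are hypothetical; the fence multiplier of 3 matches the cells in this section.

```python
import pandas as pd

def cap_outliers_iqr(s, k=3.0):
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] at the fence values."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return s.clip(lower=lower, upper=upper)

# Toy logged differences with one extreme value at the end.
values = pd.Series([-0.02, -0.01, 0.0, 0.01, 0.02, 0.9])
capped = cap_outliers_iqr(values)
print(capped)
```

Capping keeps the series the same length, which matters for a time series because dropping rows would leave gaps in the index.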

In [None]:
# we need to keep a list of dataframes with outliers to use as a testing space.

ls_w_outliers = ls

In [None]:
ls = list(map(lambda df: df.dropna(), ls))

In [None]:
# cap extreme outliers in 'players_logged_diff' at Q1 - 3*IQR and Q3 + 3*IQR

plotting_lists = []

for df in ls:
    col = df['players_logged_diff']
    Q1 = col.quantile(0.25)
    Q3 = col.quantile(0.75)
    Inter_qr = Q3 - Q1
    upper = Q3 + 3*Inter_qr
    lower = Q1 - 3*Inter_qr
    print(lower, upper)
    # values beyond the fences are replaced by the fence value, not dropped
    plotting_lists.append(col.clip(lower = lower, upper = upper).tolist())

In [None]:
for i in range(0, 4):
    ls[i]['outlier_removed_logged_diff'] = plotting_lists[i]

In [None]:
# let's see how this affected our data

for df in ls:
    fig = plt.figure(figsize = (15, 8))
    plt.plot(df['outlier_removed_logged_diff'])

We have beautifully stationary data, without outliers, and this is also very easy to return to our original player count!

This data is ready to be modelled. 
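
As a sketch of what "returning to the original player count" could look like, the example below inverts a logged difference on toy data by cumulatively summing the differences, adding the log of a known starting value, and exponentiating. The numbers are made up, and the inversion is exact only where no values were capped.

```python
import numpy as np
import pandas as pd

# Toy player counts (illustrative only).
players = pd.Series(
    [1000.0, 1100.0, 1045.0, 1200.0],
    index=pd.date_range("2021-01-01", periods=4, freq="D"),
)
log_diff = np.log(players).diff().dropna()

# Invert: cumulative sum of log differences, plus the log of the known
# starting count, then exponentiate.
start_value = players.iloc[0]
recovered = np.exp(np.log(start_value) + log_diff.cumsum())

print(recovered)
```

For a forecast, `start_value` would be the last observed player count before the forecast window.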

In [None]:
ls_prepped = ls

### b. Model Construction

We will use a grid search to find the optimal model orders. However, we still have to manually find S, the seasonal period of the model.
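
To give a sense of the size of such a search, the sketch below enumerates the candidate orders when each of p, d and q is allowed to be 0 or 1. The seasonal period of 7 is a placeholder here; it is determined from the ACF and PACF plots that follow.

```python
import itertools

# Each of p, d, q may be 0 or 1.
p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))

# Placeholder seasonal period (assumption for this sketch).
seasonal_period = 7
seasonal_pdq = [(P, D, Q, seasonal_period) for (P, D, Q) in pdq]

# Every non-seasonal order is paired with every seasonal order.
candidates = list(itertools.product(pdq, seasonal_pdq))
print(len(pdq), len(seasonal_pdq), len(candidates))  # 8 8 64
```

With 64 candidate models per game, the search stays cheap; widening the ranges grows the count quickly.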

In [None]:
# we define our predictor column and prepare it.

mod_ls = []

for df in ls:
    df_mod = pd.DataFrame()
    df_mod['ORLD'] = df['outlier_removed_logged_diff']
    mod_ls.append(df_mod)
mod_ls

In [None]:
# we plot both ACF and PACF

for df in mod_ls:
    acf = plot_acf(df, lags = 25)
    pacf = plot_pacf(df, lags = 25)

The above plots suggest that our seasonal period for all our games is 7 days, or one week. 
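
Reading the period off a plot can be cross-checked numerically. The sketch below does this on a synthetic daily series with a weekly cycle (not our data), using pandas' `Series.autocorr`. Restricting the search to lags below 14 is an assumption made so that multiples of the period, which also peak, are not picked up.

```python
import numpy as np
import pandas as pd

# Synthetic daily series with a weekly cycle plus noise (illustrative only).
rng = np.random.default_rng(0)
t = np.arange(364)
series = pd.Series(np.sin(2 * np.pi * t / 7) + 0.1 * rng.standard_normal(t.size))

# Autocorrelation at lags 1..13; the lag with the largest value is a
# candidate seasonal period.
lags = range(1, 14)
acf_by_lag = pd.Series({lag: series.autocorr(lag=lag) for lag in lags})
candidate_period = acf_by_lag.idxmax()

print(candidate_period)
```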

In [None]:
# we establish our parameter variables

p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))
pdqs = [(x[0], x[1], x[2], 7) for x in pdq]




In [None]:
ans_ls = []
for df in mod_ls:
    df = df.dropna()
    ans = []
    for comb in pdq:
        for combs in pdqs:
            mod = SARIMAX(df,
                          order=comb,
                          seasonal_order=combs,
                          enforce_stationarity=False,
                          enforce_invertibility=False)

            output = mod.fit()
            ans.append([comb, combs, output.aic])
            # combs already includes the seasonal period (7)
            print('ARIMA {} x {} : AIC Calculated = {}'.format(comb, combs, output.aic))

    ans_df = pd.DataFrame(ans, columns=['pdq', 'pdqs', 'aic'])
    ans_df = ans_df.loc[ans_df['aic'].idxmin()].dropna()
    
    ans_ls.append(ans_df)

In [None]:
for i in range(0, 4):
    print(ans_ls[i])

Our optimized models, according to the grid search:

CS:GO --- ARIMA(1,0,1)x(1,0,1,7)

DOTA 2 --- ARIMA(1,0,1)x(1,1,1,7)

ROCKET LEAGUE --- ARIMA(1,0,1)x(1,1,1,7)

TEAM FORTRESS 2 --- ARIMA(1,0,1)x(1,0,1,7)

### c. Model Validation

In [None]:
# we generate validation statistics for our models

csgo_model = SARIMAX(mod_ls[0],
                    order = (1,0,1),
                    seasonal_order = (1,0,1,7),
                    enforce_stationarity = False,
                    enforce_invertibility = False)
csgo_output = csgo_model.fit()
print(csgo_output.summary().tables[1])

In [None]:
dota_model = SARIMAX(mod_ls[1],
                    order = (1,0,1),
                    seasonal_order = (1,1,1,7),
                    enforce_stationarity = False,
                    enforce_invertibility = False)
dota_output = dota_model.fit()
print(dota_output.summary().tables[1])

In [None]:
rl_model = SARIMAX(mod_ls[2],
                    order = (1,0,1),
                    seasonal_order = (1,1,1,7),
                    enforce_stationarity = False,
                    enforce_invertibility = False)
rl_output = rl_model.fit()
print(rl_output.summary().tables[1])

In [None]:
tf_model = SARIMAX(mod_ls[3],
                    order = (1,0,1),
                    seasonal_order = (1,0,1,7),
                    enforce_stationarity = False,
                    enforce_invertibility = False)
tf_output = tf_model.fit()
print(tf_output.summary().tables[1])

In [None]:
models = [csgo_model, dota_model, rl_model, tf_model]
outputs = [csgo_output, dota_output, rl_output, tf_output]

In [None]:
for i, output in enumerate(outputs):
    
    output.plot_diagnostics(figsize = (10, 10))
    

#### Forecasting our Basic Models

We will be forecasting from 2020-12-15 to present, and visualizing the full month of December 2020 for comparison to our original values.  


In [None]:
# we construct predictions

predictions = list(map(lambda output: output.get_prediction(start = pd.to_datetime('2020-12-15'), dynamic = False), outputs))
pred_conf = list(map(lambda pred: pred.conf_int(), predictions))



In [None]:
ax = ls_w_outliers[0]['2020-12-01':]['players_logged_diff'].plot(label = "Truth")

predictions[0].predicted_mean.plot(ax = ax, label = "Forecasted", alpha = 0.8)
ax.fill_between(pred_conf[0].index,
                   pred_conf[0].iloc[:,0],
                   pred_conf[0].iloc[:,1], color = 'g', alpha = 0.5)

In [None]:
ax = ls_w_outliers[1]['2020-12-01':]['players_logged_diff'].plot(label = "Truth")

predictions[1].predicted_mean.plot(ax = ax, label = "Forecasted", alpha = 0.8)
ax.fill_between(pred_conf[1].index,
                   pred_conf[1].iloc[:,0],
                   pred_conf[1].iloc[:,1], color = 'g', alpha = 0.5)

In [None]:
ax = ls_w_outliers[2]['2020-12-01':]['players_logged_diff'].plot(label = "Truth")

predictions[2].predicted_mean.plot(ax = ax, label = "Forecasted", alpha = 0.8)
ax.fill_between(pred_conf[2].index,
                   pred_conf[2].iloc[:,0],
                   pred_conf[2].iloc[:,1], color = 'g', alpha = 0.5)

In [None]:
ax = ls_w_outliers[3]['2020-12-01':]['players_logged_diff'].plot(label = "Truth")

predictions[3].predicted_mean.plot(ax = ax, label = "Forecasted", alpha = 0.8)
ax.fill_between(pred_conf[3].index,
                   pred_conf[3].iloc[:,0],
                   pred_conf[3].iloc[:,1], color = 'g', alpha = 0.5)

In [None]:
for i in range(0, 4):
    forecasted = predictions[i].predicted_mean
    # compare against the same window the predictions cover (from 2020-12-15)
    truth = ls_w_outliers[i]["2020-12-15":]['players_logged_diff']
    error = forecasted - truth
    mse = (error ** 2).mean()
    print(mse)

### Part 1 Conclusions:

We constructed four models, one per dataframe. The summary statistics suggest that each is a viable model for predicting the logged differences in player counts: the coefficient p-values are below 0.05, and the mean squared errors are small. The MSE values mean little on their own, however, until we can compare them with those of other models.
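
One simple reference point for an MSE is a naive baseline. The sketch below compares a made-up forecast with a "no change" forecast, which predicts zero for a differenced series. The `mse` helper, the numbers, and the choice of baseline are all assumptions for illustration and are not results from our models.

```python
import pandas as pd

idx = pd.date_range("2020-12-15", periods=5, freq="D")
truth = pd.Series([0.02, -0.01, 0.03, -0.02, 0.01], index=idx)
forecast = pd.Series([0.015, -0.005, 0.02, -0.01, 0.0], index=idx)  # made-up model output

def mse(pred, actual):
    """Mean squared error over the dates both series share."""
    pred, actual = pred.align(actual, join="inner")
    return float(((pred - actual) ** 2).mean())

model_mse = mse(forecast, truth)

# Naive baseline for a differenced series: always predict "no change".
baseline_mse = mse(pd.Series(0.0, index=idx), truth)

print(model_mse, baseline_mse)
```

A model whose MSE is not clearly below the baseline's adds little, however small the number looks in isolation.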

## Part 2: Amalgamized Modelling

The goal in this section is to construct a single model from all of our available dataframes. We hope to obtain a model that predicts each of our games with accuracy comparable to that of the individual models in part 1 above.

We can construct this model by simply taking the average of all of our dataframe values. Because player counts differ in scale from game to game, we would have to scale the values first; our % change feature is already on a common scale, so we use it for this process.
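
A minimal sketch of this averaging on two hypothetical games follows. The series names and values are made up; `pd.concat` with `axis=1` aligns the games on their dates before the row-wise mean is taken.

```python
import pandas as pd

idx = pd.date_range("2016-01-01", periods=3, freq="D")

# Hypothetical per-game changes in player count (illustrative only).
game_a = pd.Series([0.10, -0.05, 0.00], index=idx, name="game_a")
game_b = pd.Series([0.30, 0.05, -0.10], index=idx, name="game_b")

# Align on date, then average across games for each date.
combined = pd.concat([game_a, game_b], axis=1)
mean_change = combined.mean(axis=1).rename("mean_%chg_players")

print(mean_change)
```

Aligning on the index rather than on row position guards against games whose series start on different dates or have missing days.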

### Taking mean values of dataframes

We are going to take the mean of '%chg_players', but we have to prepare this data first.

### Preparation

In [None]:
# resetting our dataframes to start fresh from when we trimmed the dataframe dates. 

ls = trimmed_df_list

ls = list(map(lambda df: df.dropna(), ls))

In [None]:
# in order to take the mean, we have to have all our series begin at the same time. We only have to alter team fortress 2

ls[3] = ls[3].where(ls[3]['time'] >= "2016-01-01").dropna()

In [None]:
# visualizing this feature
for df in ls:
    fig = plt.figure(figsize = (15, 8))
    plt.plot(df['%chg_players'])

In [None]:
# we should average our data before removing outliers

pc = '%chg_players'

# align the four games on their shared dates, then average across games for each date
average_pc = pd.concat([df[pc] for df in ls], axis = 1, join = 'inner').mean(axis = 1)

df_model = average_pc.to_frame(name = 'mean_%chg_players')
df_model_w_outliers = df_model.copy()
df_model

In [None]:
# while our data is already stationary, we have to address outliers, much like we did before with our logged differences. 

quantiles = df_model['mean_%chg_players'].quantile([0.25, 0.75])
Q1 = quantiles[0.25]
Q3 = quantiles[0.75]
IQR = Q3 - Q1
upper = Q3 + 3*IQR
lower = Q1 - 3*IQR

# values beyond the fences are capped at the fence value
df_model['PCOR'] = df_model['mean_%chg_players'].clip(lower = lower, upper = upper)

df_model = df_model.drop(columns = ['mean_%chg_players'])

df_model



In [None]:
# inspecting how this impacted our visualizations

fig = plt.figure(figsize = (15, 8))

plt.plot(df_model)

Data is ready to be modelled!

As before, we need to first ensure that our seasonality was not affected by our transformations so far. 

### Modelling

In [None]:
# plotting acf and pacf, showing that our seasonality is retained at s = 7

acf = plot_acf(df_model, lags = 25)
pacf = plot_pacf(df_model, lags = 25)

In [None]:
# defining our parameters for our gridsearch.

p = d = q = range(0, 2)
pdq = list(itertools.product(p, d, q))
pdqs = [(x[0], x[1], x[2], 7) for x in pdq]

In [None]:

models = []
for comb in pdq:
    for combs in pdqs:
        model = SARIMAX(df_model, 
                       order = comb,
                       seasonal_order = combs,
                       enforce_stationarity = False,
                       enforce_invertibility = False)
        output = model.fit()
        models.append([comb, combs, output.aic])
        print('ARIMA {} X {} : AIC Calculated = {}'.format(comb, combs, round(output.aic, 2)))
models_df = pd.DataFrame(models, columns = ['pdq', 'pdqs', 'aic'])
best_model = models_df.loc[models_df['aic'].idxmin()].dropna()
    

In [None]:
print(best_model)

### Validation

In [None]:
# we generate summary stats for the model, fitting on the outlier-removed series used in the grid search
# (df_model_w_outliers is kept for testing)

mean_model = SARIMAX(df_model,
             order = (1,0,1),
             seasonal_order = (1,0,1,7),
             enforce_stationarity = False,
             enforce_invertibility = False)
output = mean_model.fit()
print(output.summary().tables[1])

In [None]:
# we visualize validation plots

output.plot_diagnostics(figsize = (10, 10))
plt.show()

#### Forecasting 

We will forecast and calculate the mean squared error for each dataframe, using our constructed model. 

In [None]:
# we generate predictions

pred = output.get_prediction(start = pd.to_datetime('2020-12-15'))
pred_conf = pred.conf_int()

In [None]:
# and visualize the forecasts for each of our dataframes

ax = ls_w_outliers[0]['2020-12-01':]['%chg_players'].plot(label = "Truth")

pred.predicted_mean.plot(ax = ax, label = "Forecasted", alpha = 0.8)
ax.fill_between(pred_conf.index,
                   pred_conf.iloc[:,0],
                   pred_conf.iloc[:,1], color = 'g', alpha = 0.5)
plt.show()

In [None]:
ax = ls_w_outliers[1]['2020-12-01':]['%chg_players'].plot(label = "Truth")

pred.predicted_mean.plot(ax = ax, label = "Forecasted", alpha = 0.8)
ax.fill_between(pred_conf.index,
                   pred_conf.iloc[:,0],
                   pred_conf.iloc[:,1], color = 'g', alpha = 0.5)
plt.show()

In [None]:
ax = ls_w_outliers[2]['2020-12-01':]['%chg_players'].plot(label = "Truth")

pred.predicted_mean.plot(ax = ax, label = "Forecasted", alpha = 0.8)
ax.fill_between(pred_conf.index,
                   pred_conf.iloc[:,0],
                   pred_conf.iloc[:,1], color = 'g', alpha = 0.5)
plt.show()

In [None]:
ax = ls_w_outliers[3]['2020-12-01':]['%chg_players'].plot(label = "Truth")

pred.predicted_mean.plot(ax = ax, label = "Forecasted", alpha = 0.8)
ax.fill_between(pred_conf.index,
                   pred_conf.iloc[:,0],
                   pred_conf.iloc[:,1], color = 'g', alpha = 0.5)
plt.show()

In [None]:
# finally, we calculate the MSE for each of these forecasts. 

for df in ls_w_outliers:
    forecasted = pred.predicted_mean
    truth = df["2020-12-15":]['%chg_players']
    mse = ((forecasted - truth) ** 2).mean()
    print(mse)

### Conclusions on Amalgamized Modelling

We constructed one model from the mean change in players across all of our games. Its diagnostics suggest that it is more viable than our basic models in part 1: the QQ plot in particular suggests that the residuals of this model are closer to normally distributed than those of any other model constructed thus far.

This model also has the benefit that it can be applied to any game. 

## Part 3: Model Construction from Control Dataframe, Team Fortress 2

In this section, we construct a model solely from Team Fortress 2, our control dataframe. We call this game our 'control' because a large amount of data is available for it (13 years of daily player counts) and because it has had few events.

We would like to see if trends from one game are able to be used to predict trends on another, unrelated game. If this model performs well, we can say that trends from one game predict those of others. This seems to be the case from Part 2, since our amalgamized model performed well. 

Here, we are only using one game, which cuts down on the required data for our model. This is also more realistic in a business sense, as a company would want to be able to predict competitor trends based on their own trends. 

In [None]:
# we will be predicting logged differences for ease of comparison across models. 

# using our benchmarked lists, which contain all constructed columns

tf = ls_prepped[3]

tf_model_df = pd.DataFrame()

tf_model_df['ORLD'] = tf['outlier_removed_logged_diff']

tf_model_df.head()

In [None]:
# we can jump straight to model construction in this part. 

p = q = d = range(0,2)
pdq = list(itertools.product(p,d,q))
pdqs = [(x[0], x[1], x[2], 7) for x in pdq]

In [None]:

mods = []
for comb in pdq:
    for combs in pdqs:
        model = SARIMAX(tf_model_df, 
                       order = comb,
                       seasonal_order = combs,
                       enforce_stationarity = False,
                       enforce_invertibility = False)
        output = model.fit()
        mods.append([comb, combs, output.aic])
        print('ARIMA {} X {} : AIC Calculated = {}'.format(comb, combs, round(output.aic, 2)))
models_df = pd.DataFrame(mods, columns = ['pdq', 'pdqs', 'aic'])
best_model = models_df.loc[models_df['aic'].idxmin()].dropna()
   

In [None]:
print(best_model)

In [None]:
control_model = SARIMAX(tf_model_df,
                       order = (1,0,1),
                       seasonal_order = (1,0,1,7),
                       enforce_stationarity = False,
                       enforce_invertibility = False)
control_output = control_model.fit()
print(control_output.summary().tables[1])

In [None]:
pred = control_output.get_prediction(start = pd.to_datetime('2020-12-15'))
pred_conf = pred.conf_int()

In [None]:
ax =ls_w_outliers[0]["2020-12-01":]['players_logged_diff'].plot(label = "Truth")

pred.predicted_mean.plot(ax = ax, label = "Forecasted", alpha = 0.8)
ax.fill_between(pred_conf.index,
                   pred_conf.iloc[:,0],
                   pred_conf.iloc[:,1], color = 'g', alpha = 0.5)

In [None]:
ax =ls_w_outliers[1]["2020-12-01":]['players_logged_diff'].plot(label = "Truth")

pred.predicted_mean.plot(ax = ax, label = "Forecasted", alpha = 0.8)
ax.fill_between(pred_conf.index,
                   pred_conf.iloc[:,0],
                   pred_conf.iloc[:,1], color = 'g', alpha = 0.5)

In [None]:
ax =ls_w_outliers[2]["2020-12-01":]['players_logged_diff'].plot(label = "Truth")

pred.predicted_mean.plot(ax = ax, label = "Forecasted", alpha = 0.8)
ax.fill_between(pred_conf.index,
                   pred_conf.iloc[:,0],
                   pred_conf.iloc[:,1], color = 'g', alpha = 0.5)

In [None]:
ax =ls_w_outliers[3]["2020-12-01":]['players_logged_diff'].plot(label = "Truth")

pred.predicted_mean.plot(ax = ax, label = "Forecasted", alpha = 0.8)
ax.fill_between(pred_conf.index,
                   pred_conf.iloc[:,0],
                   pred_conf.iloc[:,1], color = 'g', alpha = 0.5)

In [None]:
# and we generate mse for each df. the last is best because this is tf model testing on tf data.

for df in ls_w_outliers:
    forecasted = pred.predicted_mean
    # the control model predicts logged differences, so we compare against that column
    truth = df["2020-12-15":]['players_logged_diff']
    mse = ((forecasted - truth) ** 2).mean()
    print(mse)

### Conclusions on our Control Model

While this model, constructed solely from Team Fortress 2 data, can be said to be viable from our summary stats, its mean squared error is much higher than that of our amalgamized model. This suggests that Team Fortress 2 alone is less effective at predicting trends than an amalgamization of all of our data. Note that the two models predict different quantities (logged differences here, mean percent change in Part 2), so their errors should be compared with care.

In [None]:
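# we save all of our constructed models for use in the next notebook. Note that these are the SARIMAX model
# specifications, not the fitted results, so each must be re-fit with .fit() after loading.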

import pickle

models = [csgo_model,
dota_model,
rl_model,
tf_model,
mean_model]

names = ["models/csgo_model.pkl",
"models/dota_model.pkl",
"models/rl_model.pkl",
"models/tf_model.pkl",
"models/mean_model.pkl"]

for i in range(len(models)):
    with open(names[i], 'wb') as file:
        pickle.dump(models[i], file)

In [None]:
# we also save our testing dataframes, those that retain outliers. 

names = ['csgo_test.csv',
        'dota_test.csv',
        'rl_test.csv',
        'tf_test.csv']

for i in range(len(ls_w_outliers)):
    ls_w_outliers[i].to_csv('data/Test/'+names[i])

## Next:

In the next notebook, we will compare our models and describe the pros and cons of each, and use the models to find interesting points in the future. We will also provide our recommendations to those looking to utilize our findings, and conclude with thoughts on how our investigation could have yielded more insight into these data. 