# Optimizing gold extraction from mined ore using Machine Learning

The Zyfra Group develops and implements industrial digital solutions to improve efficiency and safety of the mining, oil & gas, chemical industries. In this project, Zyfra has requested a prototype machine learning model to optimize production of gold from gold ore and eliminate unprofitable parameters. 

**Task Statement**

Prepare a prototype of a machine learning model for Zyfra. The company develops efficiency solutions for heavy industry. The model should predict the amount of gold recovered from gold ore. You have the data on extraction and purification. The model will help to optimize the production and eliminate unprofitable parameters.

## Data description

**Technological process**

- Rougher feed — raw material
- Rougher additions (or reagent additions) — flotation reagents: Xanthate, Sulphate, Depressant
 + Xanthate — promoter or flotation activator;
 + Sulphate — sodium sulphide for this particular process;
 + Depressant — sodium silicate.
- Rougher process — flotation
- Rougher tails — product residues
- Float banks — flotation unit
- Cleaner process — purification
- Rougher Au — rougher gold concentrate
- Final Au — final gold concentrate

**Parameters of stages**
- air amount — volume of air
- fluid levels
- feed size — feed particle size
- feed rate

The stages of the technological process for gold extraction from ore is shown

In [1]:
from IPython.display import Image
Image(url='https://pythondance.s3.us-west-2.amazonaws.com/Gold_extraction_process.JPG', width = 600, height = 300)

## Objectives

The objectives of this project is to:
- Simulate the gold recovery process from gold ore
- Develop a model to predict the amount of gold recovered and optimize gold production
- Predict final sMAPE metric 

<hr>

 # Table of contents

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ol>
        <li><a href="#open_the_data">Open the data file and study the general information</a></li>
        <li><a href="#analyze_the_data">Analyze the data</a></li>
        <li><a href="#build_the_model">Build the model</a></li>
        <li><a href="#overall_conclusion">Overall conclusion</a></li>
    </ol>
</div>
<br>
<hr>

<div id="open_the_data">
    <h2>Open the data file and study the general information</h2> 
</div>

We require the following libraries: *pandas* and *numpy* for data preprocessing and manipulation, *Scikit-Learn* for building our learning algorithms

In [None]:
# import pandas and numpy for data preprocessing and manipulation
import numpy as np
import pandas as pd
from scipy import stats as st

# import datetime for date conversion
import datetime
from datetime import datetime 

# seaborn and matplotlib for visualization
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# import train_test_split to split data
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
pd.options.mode.chained_assignment = None # to avoid SettingWithCopyWarning after scaling

# import machine learning module from the sklearn library
from sklearn.linear_model import LinearRegression # import linear regression 

# import regression metrics
from sklearn.metrics import explained_variance_score 
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# import warnings
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

print('Project libraries has been successfully been imported!')

In [None]:
# read the data
try:
    gold_recovery_train = pd.read_csv('https://code.s3.yandex.net/datasets/gold_recovery_train.csv')
    gold_recovery_test = pd.read_csv('https://code.s3.yandex.net/datasets/gold_recovery_test.csv')
    gold_recovery_full = pd.read_csv('https://code.s3.yandex.net/datasets/gold_recovery_full.csv')
except:
    gold_recovery_train = pd.read_csv('C:/Users/hotty/Desktop/Practicum by Yandex/Projects/Integrated Project 2/gold_recovery_train.csv')
    gold_recovery_test = pd.read_csv('C:/Users/hotty/Desktop/Practicum by Yandex/Projects/Integrated Project 2/gold_recovery_test.csv')
    gold_recovery_full = pd.read_csv('C:/Users/hotty/Desktop/Practicum by Yandex/Projects/Integrated Project 2/gold_recovery_full.csv')
print('Data has been read correctly!')

In [None]:
# function to determine if columns in file have null values
def get_percent_of_na(df, num):
    count = 0
    df = df.copy()
    s = (df.isna().sum() / df.shape[0])
    for column, percent in zip(s.index, s.values):
        num_of_nulls = df[column].isna().sum()
        if num_of_nulls == 0:
            continue
        else:
            count += 1
        print('Column {} has {:.{}%} percent of Nulls, and {} of nulls'.format(column, percent, num, num_of_nulls))
    if count != 0:
        print("\033[1m" + 'There are {} columns with NA.'.format(count) + "\033[0m")
    else:
        print()
        print("\033[1m" + 'There are no columns with NA.' + "\033[0m")
        
# function to display general information about the dataset
def get_info(df):
    """
    This function uses the head(), info(), describe(), shape() and duplicated() 
    methods to display the general information about the dataset.
    """
    print("\033[1m" + '-'*100 + "\033[0m")
    print('Head:')
    print()
    display(df.head())
    print('-'*100)
    print('Info:')
    print()
    display(df.info())
    print('-'*100)
    print('Describe:')
    print()
    display(df.describe())
    print('-'*100)
    display(df.describe(include='object'))
    print()
    print('Columns with nulls:')
    display(get_percent_of_na(df, 4))  # check this out
    print('-'*100)
    print('Shape:')
    print(df.shape)
    print('-'*100)
    print('Duplicated:')
    print("\033[1m" + 'We have {} duplicated rows.\n'.format(df.duplicated().sum()) + "\033[0m")
    print()

In [None]:
# study the general information about the dataset 
print('General information about the dataframe')
get_info(gold_recovery_train)
get_info(gold_recovery_test)
get_info(gold_recovery_full)

**Conclusion**

By looking at the data, we observed that:
- In the `gold_recovery_train.csv` dataset, we have 16,860 rows, 86 features, and 85 columns with NA's.
- In the `gold_recovery_test.csv` dataset, we have 5,856 rows, 52 features, and 51 columns with NA's.
- In the `gold_recovery_full.csv` dataset, we have 22,716 rows, 86 features, and 85 columns with NA's.

#### Calculate recovery and determine MAE between calculated recovery and feature values recovery

Let check that recovery is calculated correctly. The recovery process is simulated using:
\begin{equation}
    Recovery = \frac{C * (F - T)}{F * (C - T)} * 100%
\end{equation}

In [None]:
# gold extraction data
C = gold_recovery_train['rougher.output.concentrate_au']
F = gold_recovery_train['rougher.input.feed_au']
T = gold_recovery_train['rougher.output.tail_au']

# recovery calculation
calculated_recovery = (C * (F - T)) / (F * (C - T)) * 100
calculated_recovery

In [None]:
# MAE calculation
mae_calc = pd.DataFrame({'calculated_recovery': calculated_recovery, 'output_recovery': gold_recovery_train['rougher.output.recovery']}).dropna()
mae = mean_absolute_error(mae_calc.calculated_recovery, mae_calc.output_recovery)
print('The MAE score is: {:.2f}'.format(mae))

In [None]:
# check sample of calculated recovery and output recovery
mae_calc.sample(5)

**Conclusion**

From the calculations done, we can see that the `calculated_recovery` and `rougher.output.recovery` have similar values. We also observed that the MAE score is 0.0. This shows that the calculated value from the simulated recovery process is similar to the `rougher.output.recovery`. 

#### Locate features not available in test set

In [None]:
# features not available in the test set
features_diff = list(set(gold_recovery_train.columns).difference(gold_recovery_test.columns))
features_diff

**Conclusion**

By checking for features not available in the test set, we can see that we have 34 features in the train set that are not available in the test set. The features not available in the test set include features containing the concentration of metals (Au, Ag, Pb). The parameters type are output - product parameters and calculation characteristics. 

#### Data Preprocessing

In this section, we need to preprocess the data by converting the `date` datatype to datetime, filling in the missing values in the train, test and full dataset. For filling in the missing values, we would use the fill forward along the column axis.

In [None]:
# function to calculate the percentage of missing values
def missing_values_table(df):
    """
    This function calculates the percentage of
    missing values in the dataset
    """
    # Total missing values
    mis_val = df.isnull().sum()

    # Percentage of missing values
    mis_val_percent = 100 * df.isnull().sum() / len(df)

    # Make a table with the results
    mis_val_table = pd.concat([mis_val, mis_val_percent], axis=1)

    # Rename the columns
    mis_val_table_ren_columns = mis_val_table.rename(
    columns = {0 : 'Missing Values', 1 : '% of Total Values'})

    # Sort the table by percentage of missing descending
    mis_val_table_ren_columns = mis_val_table_ren_columns[
        mis_val_table_ren_columns.iloc[:,1] != 0].sort_values(
    '% of Total Values', ascending=False).round(1)

    # Print some summary information
    print ("Your selected dataframe has " + str(df.shape[1]) + " columns.\n"      
        "There are " + str(mis_val_table_ren_columns.shape[0]) +
            " columns that have missing values.")

    # Return the dataframe with missing information
    return mis_val_table_ren_columns

In [None]:
# convert date to datetime 
def convert_date_time(df, col):
    df[col] = pd.to_datetime(df[col])
    df.sort_values(col, inplace=True)
    
# function to forward fill missing values    
def fill_missing_values(df):
    df = df.fillna(method='ffill', axis=0, inplace=True)

##### Processing Missing Values

In [None]:
# check missing data in train
missing_values_table(gold_recovery_train)

In [None]:
# check missing data in test
missing_values_table(gold_recovery_test)

In [None]:
# check missing data in full data
missing_values_table(gold_recovery_full)

In [None]:
# filling in missing value using forward fill method
fill_missing_values(gold_recovery_train)
fill_missing_values(gold_recovery_test)
fill_missing_values(gold_recovery_full)

In [None]:
# re-check missing data in train dataset
missing_values_table(gold_recovery_train)

In [None]:
# re-check missing data in test dataset
missing_values_table(gold_recovery_test)

In [None]:
# re-check missing data in full dataset
missing_values_table(gold_recovery_full)

##### Process date feature

In [None]:
# process date column in dataset
convert_date_time(gold_recovery_train, 'date')
convert_date_time(gold_recovery_test, 'date')
convert_date_time(gold_recovery_full, 'date')

In [None]:
# check processed date column datatype
gold_recovery_train.dtypes

**Conclusion**

After data preprocessing, we have now filled NA/NaN values in the dataframe using the forward fill method in pandas. We rechecked to be sure that there is no more missing values in the dataset. We also processed the `date` feature by converting to the datatime datatype.

#### Conclusion

In this section, we inspected the data and observed that the `gold_recovery_train.csv` dataset have 16,860 rows, 86 features, and 85 columns with NA's. In the `gold_recovery_test.csv` dataset, we have 5,856 rows, 52 features, and 51 columns with NA's. In the `gold_recovery_full.csv` dataset, we have 22,716 rows, 86 features, and 85 columns with NA's. We then proceeded to calculate the recovery for the `rougher.output.recovery` feature using the training set. We computed the mean absolute error between the calculated value and feature values to be 0.0. This signifies that the recovery value is calculated correctly since there is no error. We noted that thirty-four (34) features from the output and calculation parameter type were not available in the test set. We then proceeded to preprocess the NA's using the forward fill method and changed the datatype in the `date` feature to datetime. Now the data is prepared and ready for analysis.

<div id="analyze_the_data">
    <h2>Analyze the data</h2> 
</div>

##### Distribution of concentration of metals across purification stage

To find out how the concentrations of metals (Au, Ag, Pb) changes depending on the purification stage, we plot the distribution of concentrations across each stage.

In [None]:
metals = ['au', 'ag', 'pb']
stage_parameter = ['rougher.output.concentrate', 'primary_cleaner.output.concentrate', 'final.output.concentrate']
xcolors = ['green', 'orange', 'dodgerblue']

# function to plot distribution of concentrations
def plot_distribution(df):
    kwargs = dict(hist_kws={'alpha': 0.6}, kde_kws={'linewidth':2})
    for element in metals:
        plt.figure(figsize=(10,6), dpi=80)
        for features, colour in zip(stage_parameter, xcolors):
            sns.distplot(df[features+'_'+element], color=colour, label=features+'_'+element, **kwargs)
            plt.axvline(0, c="r")
        plt.title('Distribution of ' +element+ ' concentrations across purification stage')
        plt.xlabel('Concentration of '+element)
        plt.legend();

In [None]:
# distribution of concentration of metals in the training set
plot_distribution(gold_recovery_train)

The plots above shows the distribution of concentration of metals across purification stage. Looking at the plot, we observed that the concentration of gold (au) increased from the `rougher.output` to the `final.output`. This shows that the process is functioning as it suppose to. The concentration of the other metals decreased in the case of silver (ag) or remain almost the same like lead (pb).

##### Compare the feed particle size distributions in the training set and in the test set

In [None]:
# function to plot distribution of feed particle size
def plot_particle_size_distribution(df):
    kwargs = dict(hist_kws={'alpha': 0.5}, kde_kws={'linewidth':2})
    plt.figure(figsize=(10,6), dpi=80)
    for features, colour, labels in zip(input_feed, xcolors, xlabel_):
        sns.distplot(features, color=colour, label=labels, **kwargs)
        plt.axvline(0, c="r")
    plt.title('Feed particle size distribution for '+ [x.split('__', 1)[1] for x in xlabel_][0])
    plt.xlabel('Distribution of feed particle size')
    plt.legend();

In [None]:
input_feed = [gold_recovery_train['rougher.input.feed_size'], gold_recovery_test['rougher.input.feed_size']]
xlabel_ = ['gold_recovery_train__rougher.input.feed_size', 'gold_recovery_test__rougher.input.feed_size']
xcolors = ['green', 'orange']

# distribution of feed particle size (rougher.input.feed_size)
plot_particle_size_distribution(gold_recovery_train)

In [None]:
input_feed = [gold_recovery_train['primary_cleaner.input.feed_size'], gold_recovery_test['primary_cleaner.input.feed_size']]
xlabel_ = ['gold_recovery_train__primary_cleaner.input.feed_size', 'gold_recovery_test__primary_cleaner.input.feed_size']
xcolors = ['dodgerblue', 'orange']

# distribution of feed particle size (primary_cleaner.input.feed_size)
plot_particle_size_distribution(gold_recovery_test)

We compared the feed particle size distribution for the training and test dataset. We can also observe that the particle size distributions are similar. The distribution do not vary significantly hence we expect the model evaluation to be correct. This means that we can train the model on the train set and test model on the test set without any significant problem.

##### Check for anomalies in total concentrations of all substances at different stages

In [None]:
# rougher.input.feed in full dataset
gold_recovery_full['rougher.input.feed.total_concentration'] = gold_recovery_full[['rougher.input.feed_ag', 'rougher.input.feed_pb', 'rougher.input.feed_sol', 'rougher.input.feed_au']].sum(axis=1)

# check for anomaly
total_conc_stage_1 = gold_recovery_full['rougher.input.feed.total_concentration']
plt.figure(figsize=(10,6))
plt.hist(total_conc_stage_1, density=True, bins=100)
# add title and axis names
plt.xlabel('rougher.input.feed_total concentration')
plt.ylabel('frequency')
plt.title("Distribution of total concentration in raw feed");

In [None]:
# rougher.output.concentrate in full dataset
gold_recovery_full['rougher.output.concentrate.total_concentration'] = gold_recovery_full[['rougher.output.concentrate_ag', 'rougher.output.concentrate_pb', 'rougher.output.concentrate_sol', 'rougher.output.concentrate_au']].sum(axis=1)

# check for anomaly
total_conc_stage_2 = gold_recovery_full['rougher.output.concentrate.total_concentration']
plt.figure(figsize=(10,6))
plt.hist(total_conc_stage_2, density=True, bins=100)
# add title and axis names
plt.xlabel('rougher.output.concentrate_total concentration')
plt.ylabel('frequency')
plt.title("Distribution of total concentration in rougher concentrate");

In [None]:
# final.output.concentrate in full dataset
gold_recovery_full['final.output.concentrate.total_concentration'] = gold_recovery_full[['final.output.concentrate_ag', 'final.output.concentrate_pb', 'final.output.concentrate_sol', 'final.output.concentrate_au']].sum(axis=1)

# check for anomaly
total_conc_stage_3 = gold_recovery_full['final.output.concentrate.total_concentration'] 
plt.figure(figsize=(10,6))
plt.hist(total_conc_stage_3, density=True, bins=100)
# add title and axis names
plt.xlabel('final.output.concentrate_total concentration')
plt.ylabel('frequency')
plt.title("Distribution of total concentration in final concentrate");

We sum the total concentrations of all substances at different stages in the process.i.e. the raw feed (`rougher.input.feed.total_concentration`), rougher concentrate(`rougher.output.concentrate.total_concentration`), and final concentrate(`final.output.concentrate.total_concentration`). We plotted a histogram and observed a peak in all three plots. This shows an anomaly in the data. It is strange to have total concentration around 0, so the peak is an abnormal value and we removed it by setting the threshold between 0.8 and 0.95.

In [29]:
# filtered full dataset without anomaly
gold_recovery_full_data = gold_recovery_full[(gold_recovery_full['rougher.input.feed.total_concentration'] > 0.8) & 
                                        (gold_recovery_full['rougher.output.concentrate.total_concentration'] > 0.95) & 
                                        (gold_recovery_full['final.output.concentrate.total_concentration'] > 0.8)]
print('The shape of the new dataset is: ', format(gold_recovery_full_data.shape))
print()
gold_recovery_full_data.describe()

The shape of the new dataset is:  (19904, 90)



Unnamed: 0,final.output.concentrate_ag,final.output.concentrate_pb,final.output.concentrate_sol,final.output.concentrate_au,final.output.recovery,final.output.tail_ag,final.output.tail_pb,final.output.tail_sol,final.output.tail_au,primary_cleaner.input.sulfate,...,secondary_cleaner.state.floatbank4_b_level,secondary_cleaner.state.floatbank5_a_air,secondary_cleaner.state.floatbank5_a_level,secondary_cleaner.state.floatbank5_b_air,secondary_cleaner.state.floatbank5_b_level,secondary_cleaner.state.floatbank6_a_air,secondary_cleaner.state.floatbank6_a_level,rougher.input.feed.total_concentration,rougher.output.concentrate.total_concentration,final.output.concentrate.total_concentration
count,19904.0,19904.0,19904.0,19904.0,19904.0,19904.0,19904.0,19904.0,19904.0,19904.0,...,19904.0,19904.0,19904.0,19904.0,19904.0,19904.0,19904.0,19904.0,19904.0,19904.0
mean,5.250317,10.005729,9.437927,44.024213,67.274249,9.722925,2.695731,10.405297,3.059586,143.984616,...,-461.784848,15.670516,-488.682801,12.21768,-487.183956,18.941251,-505.444267,57.056355,69.231266,68.718185
std,1.427628,1.575505,2.882103,4.686102,9.61995,2.396828,0.980749,3.041419,0.971711,45.810767,...,66.674388,5.510786,35.460418,5.307127,39.196781,5.541075,38.854243,7.863282,9.23144,6.498115
min,0.0,0.135894,0.0,0.225775,0.0,0.0,0.0,0.0,0.0,0.003112,...,-800.836914,-0.372054,-797.323986,0.528083,-800.220337,-0.079426,-809.741464,0.913565,1.064979,0.835351
25%,4.2782,9.146978,7.629914,43.275641,63.350638,8.077092,2.023136,8.763196,2.46349,113.36284,...,-500.187016,10.988093,-500.458352,8.97788,-500.130066,14.983458,-500.739496,53.372708,66.195289,67.860212
50%,5.108484,10.105279,9.191454,44.936239,68.236471,9.809782,2.739593,10.52745,2.996648,143.095289,...,-499.473279,14.997593,-499.793222,11.02112,-499.934117,19.95517,-500.047765,57.549919,70.342238,69.554069
75%,5.952,11.043546,10.969702,46.231796,72.741328,11.21799,3.344845,12.029298,3.588434,175.275803,...,-400.14428,18.025555,-498.348094,14.01748,-499.431386,23.991131,-499.481118,61.98872,74.59847,71.344193
max,16.001945,17.031899,19.61572,53.611374,100.0,19.552149,6.086532,22.861749,9.789625,265.983123,...,-7.834746,43.709931,-244.483566,27.926001,-137.740004,32.188906,-104.427459,76.978947,90.964431,80.210644


In [46]:
pd.set_option("display.max_rows", None, "display.max_columns", None)

In [1]:
pd.reset_option("display.max_rows", None, "display.max_columns", None)

NameError: name 'pd' is not defined

In [33]:
# make the `date` column the index
gold_recovery_full_data = gold_recovery_full_data.set_index('date')
gold_recovery_train = gold_recovery_train.set_index('date')
gold_recovery_test = gold_recovery_test.set_index('date')

In [37]:
#gold_recovery_test.head()

In [47]:
# new train dataset
gold_recovery_train_data = gold_recovery_full_data.merge(gold_recovery_train, on='date', how='left').dropna(axis=)
#gold_recovery_full_data.loc[gold_recovery_train.index]
gold_recovery_train_data.head()

SyntaxError: invalid syntax (<ipython-input-47-1c4488702f84>, line 2)

In [45]:
missing_values_table(gold_recovery_train_data)

Your selected dataframe has 175 columns.
There are 86 columns that have missing values.


Unnamed: 0,Missing Values,% of Total Values
final.output.concentrate_ag_y,5290,26.6
rougher.state.floatbank10_b_air_y,5290,26.6
rougher.state.floatbank10_f_air_y,5290,26.6
rougher.state.floatbank10_e_level_y,5290,26.6
rougher.state.floatbank10_e_air_y,5290,26.6
...,...,...
primary_cleaner.state.floatbank8_d_air_y,5290,26.6
primary_cleaner.state.floatbank8_c_level_y,5290,26.6
primary_cleaner.state.floatbank8_c_air_y,5290,26.6
primary_cleaner.state.floatbank8_b_level_y,5290,26.6


In [43]:
print(gold_recovery_train_data.columns)

Index(['final.output.concentrate_ag_x', 'final.output.concentrate_pb_x',
       'final.output.concentrate_sol_x', 'final.output.concentrate_au_x',
       'final.output.recovery_x', 'final.output.tail_ag_x',
       'final.output.tail_pb_x', 'final.output.tail_sol_x',
       'final.output.tail_au_x', 'primary_cleaner.input.sulfate_x',
       ...
       'secondary_cleaner.state.floatbank4_a_air_y',
       'secondary_cleaner.state.floatbank4_a_level_y',
       'secondary_cleaner.state.floatbank4_b_air_y',
       'secondary_cleaner.state.floatbank4_b_level_y',
       'secondary_cleaner.state.floatbank5_a_air_y',
       'secondary_cleaner.state.floatbank5_a_level_y',
       'secondary_cleaner.state.floatbank5_b_air_y',
       'secondary_cleaner.state.floatbank5_b_level_y',
       'secondary_cleaner.state.floatbank6_a_air_y',
       'secondary_cleaner.state.floatbank6_a_level_y'],
      dtype='object', length=175)


In [38]:
# new test dataset
gold_recovery_test_data = gold_recovery_full_data.loc[gold_recovery_test.index]
gold_recovery_test_data.head()

KeyError: "[Timestamp('2016-09-02 06:59:59'), Timestamp('2016-09-02 07:59:59'), Timestamp('2016-09-02 08:59:59'), Timestamp('2016-09-02 09:59:59'), Timestamp('2016-09-04 17:59:59'), Timestamp('2016-09-04 18:59:59'), Timestamp('2016-09-04 19:59:59'), Timestamp('2016-09-04 20:59:59'), Timestamp('2016-09-04 21:59:59'), Timestamp('2016-09-04 22:59:59'), Timestamp('2016-09-04 23:59:59'), Timestamp('2016-09-05 01:59:59'), Timestamp('2016-09-05 02:59:59'), Timestamp('2016-09-05 03:59:59'), Timestamp('2016-09-05 04:59:59'), Timestamp('2016-09-05 05:59:59'), Timestamp('2016-09-05 06:59:59'), Timestamp('2016-09-05 07:59:59'), Timestamp('2016-09-05 08:59:59'), Timestamp('2016-09-05 09:59:59'), Timestamp('2016-09-05 10:59:59'), Timestamp('2016-09-05 11:59:59'), Timestamp('2016-09-05 12:59:59'), Timestamp('2016-09-05 13:59:59'), Timestamp('2016-09-05 14:59:59'), Timestamp('2016-09-05 15:59:59'), Timestamp('2016-09-05 16:59:59'), Timestamp('2016-09-05 17:59:59'), Timestamp('2016-09-05 18:59:59'), Timestamp('2016-09-05 19:59:59'), Timestamp('2016-09-05 20:59:59'), Timestamp('2016-09-05 21:59:59'), Timestamp('2016-09-05 22:59:59'), Timestamp('2016-09-05 23:59:59'), Timestamp('2016-09-06 00:59:59'), Timestamp('2016-09-06 01:59:59'), Timestamp('2016-09-06 02:59:59'), Timestamp('2016-09-06 03:59:59'), Timestamp('2016-09-06 04:59:59'), Timestamp('2016-09-06 05:59:59'), Timestamp('2016-09-06 06:59:59'), Timestamp('2016-09-06 07:59:59'), Timestamp('2016-09-06 08:59:59'), Timestamp('2016-09-06 09:59:59'), Timestamp('2016-09-06 10:59:59'), Timestamp('2016-09-06 11:59:59'), Timestamp('2016-09-06 12:59:59'), Timestamp('2016-09-06 13:59:59'), Timestamp('2016-09-06 14:59:59'), Timestamp('2016-09-06 15:59:59'), Timestamp('2016-09-06 16:59:59'), Timestamp('2016-09-06 17:59:59'), Timestamp('2016-09-06 18:59:59'), Timestamp('2016-09-06 19:59:59'), Timestamp('2016-09-06 20:59:59'), Timestamp('2016-09-06 21:59:59'), Timestamp('2016-09-06 22:59:59'), Timestamp('2016-09-06 23:59:59'), Timestamp('2016-09-07 00:59:59'), Timestamp('2016-09-07 01:59:59'), Timestamp('2016-09-07 02:59:59'), Timestamp('2016-09-07 03:59:59'), Timestamp('2016-09-07 04:59:59'), Timestamp('2016-09-07 05:59:59'), Timestamp('2016-09-07 06:59:59'), Timestamp('2016-09-07 07:59:59'), Timestamp('2016-09-07 08:59:59'), Timestamp('2016-09-07 09:59:59'), Timestamp('2016-09-07 10:59:59'), Timestamp('2016-09-07 11:59:59'), Timestamp('2016-09-07 12:59:59'), Timestamp('2016-09-07 13:59:59'), Timestamp('2016-09-07 14:59:59'), Timestamp('2016-09-08 01:59:59'), Timestamp('2016-09-08 02:59:59'), Timestamp('2016-09-08 03:59:59'), Timestamp('2016-09-11 05:59:59'), Timestamp('2016-09-11 06:59:59'), Timestamp('2016-09-11 07:59:59'), Timestamp('2016-09-11 08:59:59'), Timestamp('2016-09-11 09:59:59'), Timestamp('2016-09-11 10:59:59'), Timestamp('2016-09-11 11:59:59'), Timestamp('2016-09-11 12:59:59'), Timestamp('2016-09-11 13:59:59'), Timestamp('2016-09-11 14:59:59'), Timestamp('2016-09-11 15:59:59'), Timestamp('2016-09-11 20:59:59'), Timestamp('2016-09-11 21:59:59'), Timestamp('2016-09-11 22:59:59'), Timestamp('2016-09-11 23:59:59'), Timestamp('2016-09-12 00:59:59'), Timestamp('2016-09-12 01:59:59'), Timestamp('2016-09-13 22:59:59'), Timestamp('2016-09-13 23:59:59'), Timestamp('2016-09-14 00:59:59'), Timestamp('2016-09-14 01:59:59'), Timestamp('2016-09-14 02:59:59'), Timestamp('2016-09-14 03:59:59'), Timestamp('2016-09-14 04:59:59'), Timestamp('2016-09-14 05:59:59'), Timestamp('2016-09-14 06:59:59'), Timestamp('2016-09-14 07:59:59'), Timestamp('2016-09-14 08:59:59'), Timestamp('2016-09-14 09:59:59'), Timestamp('2016-09-14 10:59:59'), Timestamp('2016-09-14 11:59:59'), Timestamp('2016-09-14 12:59:59'), Timestamp('2016-09-14 13:59:59'), Timestamp('2016-09-14 14:59:59'), Timestamp('2016-09-14 15:59:59'), Timestamp('2016-09-14 16:59:59'), Timestamp('2016-09-14 17:59:59'), Timestamp('2016-09-16 08:59:59'), Timestamp('2016-09-16 09:59:59'), Timestamp('2016-09-16 10:59:59'), Timestamp('2016-09-16 11:59:59'), Timestamp('2016-09-16 12:59:59'), Timestamp('2016-09-21 13:59:59'), Timestamp('2016-09-22 20:59:59'), Timestamp('2016-09-22 21:59:59'), Timestamp('2016-09-22 22:59:59'), Timestamp('2016-09-22 23:59:59'), Timestamp('2016-09-23 00:59:59'), Timestamp('2016-09-23 01:59:59'), Timestamp('2016-09-23 02:59:59'), Timestamp('2016-09-23 03:59:59'), Timestamp('2016-09-23 04:59:59'), Timestamp('2016-09-23 05:59:59'), Timestamp('2016-09-23 06:59:59'), Timestamp('2016-09-23 07:59:59'), Timestamp('2016-09-23 08:59:59'), Timestamp('2016-09-23 09:59:59'), Timestamp('2016-09-23 10:59:59'), Timestamp('2016-09-23 11:59:59'), Timestamp('2016-09-23 12:59:59'), Timestamp('2016-09-23 13:59:59'), Timestamp('2016-09-23 14:59:59'), Timestamp('2016-09-23 15:59:59'), Timestamp('2016-09-23 16:59:59'), Timestamp('2016-09-23 17:59:59'), Timestamp('2016-09-23 18:59:59'), Timestamp('2016-09-23 19:59:59'), Timestamp('2016-09-23 20:59:59'), Timestamp('2016-09-23 21:59:59'), Timestamp('2016-09-23 22:59:59'), Timestamp('2016-09-23 23:59:59'), Timestamp('2016-09-24 00:59:59'), Timestamp('2016-09-24 01:59:59'), Timestamp('2016-09-24 02:59:59'), Timestamp('2016-09-24 03:59:59'), Timestamp('2016-09-24 04:59:59'), Timestamp('2016-09-24 05:59:59'), Timestamp('2016-09-24 06:59:59'), Timestamp('2016-09-24 07:59:59'), Timestamp('2016-09-24 08:59:59'), Timestamp('2016-09-24 09:59:59'), Timestamp('2016-09-24 10:59:59'), Timestamp('2016-09-24 11:59:59'), Timestamp('2016-09-24 12:59:59'), Timestamp('2016-09-24 13:59:59'), Timestamp('2016-09-24 14:59:59'), Timestamp('2016-09-24 15:59:59'), Timestamp('2016-09-24 16:59:59'), Timestamp('2016-09-24 17:59:59'), Timestamp('2016-09-24 18:59:59'), Timestamp('2016-09-24 19:59:59'), Timestamp('2016-09-24 20:59:59'), Timestamp('2016-09-24 21:59:59'), Timestamp('2016-09-24 22:59:59'), Timestamp('2016-09-24 23:59:59'), Timestamp('2016-09-25 00:59:59'), Timestamp('2016-09-25 01:59:59'), Timestamp('2016-09-25 02:59:59'), Timestamp('2016-09-25 03:59:59'), Timestamp('2016-09-25 04:59:59'), Timestamp('2016-09-25 05:59:59'), Timestamp('2016-09-25 06:59:59'), Timestamp('2016-09-25 07:59:59'), Timestamp('2016-09-25 08:59:59'), Timestamp('2016-09-25 09:59:59'), Timestamp('2016-09-25 10:59:59'), Timestamp('2016-09-25 11:59:59'), Timestamp('2016-09-25 12:59:59'), Timestamp('2016-09-25 13:59:59'), Timestamp('2016-09-25 14:59:59'), Timestamp('2016-09-25 15:59:59'), Timestamp('2016-09-25 16:59:59'), Timestamp('2016-09-25 17:59:59'), Timestamp('2016-09-25 18:59:59'), Timestamp('2016-09-25 19:59:59'), Timestamp('2016-09-25 20:59:59'), Timestamp('2016-09-25 21:59:59'), Timestamp('2016-09-25 22:59:59'), Timestamp('2016-09-25 23:59:59'), Timestamp('2016-09-26 00:59:59'), Timestamp('2016-09-26 01:59:59'), Timestamp('2016-09-26 02:59:59'), Timestamp('2016-09-26 03:59:59'), Timestamp('2016-09-26 04:59:59'), Timestamp('2016-09-26 05:59:59'), Timestamp('2016-09-26 06:59:59'), Timestamp('2016-09-26 07:59:59'), Timestamp('2016-09-26 08:59:59'), Timestamp('2016-09-26 09:59:59'), Timestamp('2016-09-26 10:59:59'), Timestamp('2016-09-26 11:59:59'), Timestamp('2016-09-26 12:59:59'), Timestamp('2016-09-26 13:59:59'), Timestamp('2016-09-26 14:59:59'), Timestamp('2016-09-26 15:59:59'), Timestamp('2016-09-26 16:59:59'), Timestamp('2016-09-26 17:59:59'), Timestamp('2016-09-26 18:59:59'), Timestamp('2016-09-26 19:59:59'), Timestamp('2016-09-26 20:59:59'), Timestamp('2016-09-26 21:59:59'), Timestamp('2016-09-26 22:59:59'), Timestamp('2016-09-26 23:59:59'), Timestamp('2016-09-27 00:59:59'), Timestamp('2016-09-27 01:59:59'), Timestamp('2016-09-27 02:59:59'), Timestamp('2016-09-27 03:59:59'), Timestamp('2016-09-27 04:59:59'), Timestamp('2016-09-27 05:59:59'), Timestamp('2016-09-27 06:59:59'), Timestamp('2016-09-27 07:59:59'), Timestamp('2016-09-27 08:59:59'), Timestamp('2016-09-27 09:59:59'), Timestamp('2016-09-27 10:59:59'), Timestamp('2016-09-27 11:59:59'), Timestamp('2016-09-27 12:59:59'), Timestamp('2016-09-27 13:59:59'), Timestamp('2016-09-27 14:59:59'), Timestamp('2016-09-27 15:59:59'), Timestamp('2016-09-27 16:59:59'), Timestamp('2016-09-27 17:59:59'), Timestamp('2016-09-27 18:59:59'), Timestamp('2016-09-27 19:59:59'), Timestamp('2016-09-27 20:59:59'), Timestamp('2016-09-27 21:59:59'), Timestamp('2016-09-27 22:59:59'), Timestamp('2016-09-27 23:59:59'), Timestamp('2016-09-28 00:59:59'), Timestamp('2016-09-28 01:59:59'), Timestamp('2016-09-28 02:59:59'), Timestamp('2016-09-28 03:59:59'), Timestamp('2016-09-28 04:59:59'), Timestamp('2016-09-28 05:59:59'), Timestamp('2016-09-28 06:59:59'), Timestamp('2016-09-28 07:59:59'), Timestamp('2016-09-28 08:59:59'), Timestamp('2016-09-28 09:59:59'), Timestamp('2016-09-28 10:59:59'), Timestamp('2016-09-28 11:59:59'), Timestamp('2016-09-28 12:59:59'), Timestamp('2016-09-28 13:59:59'), Timestamp('2016-09-28 14:59:59'), Timestamp('2016-09-28 15:59:59'), Timestamp('2016-09-28 16:59:59'), Timestamp('2016-09-28 17:59:59'), Timestamp('2016-09-28 18:59:59'), Timestamp('2016-09-28 19:59:59'), Timestamp('2016-09-28 20:59:59'), Timestamp('2016-09-28 21:59:59'), Timestamp('2016-09-28 22:59:59'), Timestamp('2016-09-28 23:59:59'), Timestamp('2016-09-29 00:59:59'), Timestamp('2016-09-29 01:59:59'), Timestamp('2016-09-29 02:59:59'), Timestamp('2016-09-29 03:59:59'), Timestamp('2016-09-29 04:59:59'), Timestamp('2016-09-29 05:59:59'), Timestamp('2016-09-29 06:59:59'), Timestamp('2016-09-29 07:59:59'), Timestamp('2016-09-29 08:59:59'), Timestamp('2016-09-29 09:59:59'), Timestamp('2016-09-29 10:59:59'), Timestamp('2016-09-29 11:59:59'), Timestamp('2016-09-29 12:59:59'), Timestamp('2016-09-29 13:59:59'), Timestamp('2016-09-29 14:59:59'), Timestamp('2016-09-30 10:59:59'), Timestamp('2016-09-30 11:59:59'), Timestamp('2016-09-30 12:59:59'), Timestamp('2016-09-30 13:59:59'), Timestamp('2016-09-30 14:59:59'), Timestamp('2016-09-30 15:59:59'), Timestamp('2016-10-10 07:59:59'), Timestamp('2016-10-10 08:59:59'), Timestamp('2016-10-10 09:59:59'), Timestamp('2016-10-10 10:59:59'), Timestamp('2016-10-10 11:59:59'), Timestamp('2016-10-10 12:59:59'), Timestamp('2016-10-10 13:59:59'), Timestamp('2016-10-10 14:59:59'), Timestamp('2016-10-24 13:59:59'), Timestamp('2016-11-01 12:59:59'), Timestamp('2016-11-01 13:59:59'), Timestamp('2016-11-01 14:59:59'), Timestamp('2016-11-01 15:59:59'), Timestamp('2016-11-01 16:59:59'), Timestamp('2016-11-01 17:59:59'), Timestamp('2016-11-01 18:59:59'), Timestamp('2016-11-06 14:59:59'), Timestamp('2016-11-10 16:59:59'), Timestamp('2016-11-10 17:59:59'), Timestamp('2016-11-10 18:59:59'), Timestamp('2016-11-10 19:59:59'), Timestamp('2016-11-10 20:59:59'), Timestamp('2016-11-10 21:59:59'), Timestamp('2016-11-10 22:59:59'), Timestamp('2016-11-11 09:59:59'), Timestamp('2016-11-11 10:59:59'), Timestamp('2016-11-19 15:59:59'), Timestamp('2016-11-25 16:59:59'), Timestamp('2016-11-26 10:59:59'), Timestamp('2016-11-26 11:59:59'), Timestamp('2016-11-26 12:59:59'), Timestamp('2016-11-26 13:59:59'), Timestamp('2016-11-26 14:59:59'), Timestamp('2016-11-26 15:59:59'), Timestamp('2016-11-26 16:59:59'), Timestamp('2016-11-26 17:59:59'), Timestamp('2016-11-26 18:59:59'), Timestamp('2016-11-26 19:59:59'), Timestamp('2016-11-30 05:59:59'), Timestamp('2016-12-09 14:59:59'), Timestamp('2016-12-22 11:59:59'), Timestamp('2016-12-22 12:59:59'), Timestamp('2016-12-22 13:59:59'), Timestamp('2016-12-22 14:59:59'), Timestamp('2016-12-23 19:59:59'), Timestamp('2017-09-04 07:59:59'), Timestamp('2017-09-04 08:59:59'), Timestamp('2017-09-04 09:59:59'), Timestamp('2017-09-04 10:59:59'), Timestamp('2017-09-04 11:59:59'), Timestamp('2017-09-04 12:59:59'), Timestamp('2017-09-04 13:59:59'), Timestamp('2017-09-04 14:59:59'), Timestamp('2017-09-04 15:59:59'), Timestamp('2017-09-04 16:59:59'), Timestamp('2017-09-04 17:59:59'), Timestamp('2017-09-04 18:59:59'), Timestamp('2017-09-04 19:59:59'), Timestamp('2017-09-04 20:59:59'), Timestamp('2017-09-04 21:59:59'), Timestamp('2017-09-04 22:59:59'), Timestamp('2017-09-04 23:59:59'), Timestamp('2017-09-05 00:59:59'), Timestamp('2017-09-05 01:59:59'), Timestamp('2017-09-05 02:59:59'), Timestamp('2017-09-05 03:59:59'), Timestamp('2017-09-05 04:59:59'), Timestamp('2017-09-05 05:59:59'), Timestamp('2017-09-05 06:59:59'), Timestamp('2017-09-05 07:59:59'), Timestamp('2017-09-05 08:59:59'), Timestamp('2017-09-05 09:59:59'), Timestamp('2017-09-05 10:59:59'), Timestamp('2017-09-05 11:59:59'), Timestamp('2017-09-05 12:59:59'), Timestamp('2017-09-05 13:59:59'), Timestamp('2017-09-05 14:59:59'), Timestamp('2017-09-05 15:59:59'), Timestamp('2017-09-05 16:59:59'), Timestamp('2017-09-05 17:59:59'), Timestamp('2017-09-05 18:59:59'), Timestamp('2017-09-05 19:59:59'), Timestamp('2017-09-05 20:59:59'), Timestamp('2017-09-05 21:59:59'), Timestamp('2017-09-05 22:59:59'), Timestamp('2017-09-05 23:59:59'), Timestamp('2017-09-06 00:59:59'), Timestamp('2017-09-06 01:59:59'), Timestamp('2017-09-06 02:59:59'), Timestamp('2017-09-06 03:59:59'), Timestamp('2017-09-06 04:59:59'), Timestamp('2017-09-06 05:59:59'), Timestamp('2017-09-06 06:59:59'), Timestamp('2017-09-06 07:59:59'), Timestamp('2017-09-06 08:59:59'), Timestamp('2017-09-06 09:59:59'), Timestamp('2017-09-06 10:59:59'), Timestamp('2017-09-06 11:59:59'), Timestamp('2017-09-06 12:59:59'), Timestamp('2017-09-06 13:59:59'), Timestamp('2017-09-06 14:59:59'), Timestamp('2017-09-06 15:59:59'), Timestamp('2017-09-06 16:59:59'), Timestamp('2017-09-06 17:59:59'), Timestamp('2017-09-06 18:59:59'), Timestamp('2017-09-06 19:59:59'), Timestamp('2017-09-06 20:59:59'), Timestamp('2017-09-06 21:59:59'), Timestamp('2017-09-06 22:59:59'), Timestamp('2017-09-06 23:59:59'), Timestamp('2017-09-07 00:59:59'), Timestamp('2017-09-07 01:59:59'), Timestamp('2017-09-07 02:59:59'), Timestamp('2017-09-07 03:59:59'), Timestamp('2017-09-07 04:59:59'), Timestamp('2017-09-07 05:59:59'), Timestamp('2017-09-07 06:59:59'), Timestamp('2017-09-07 07:59:59'), Timestamp('2017-09-07 08:59:59'), Timestamp('2017-09-07 09:59:59'), Timestamp('2017-09-07 10:59:59'), Timestamp('2017-09-07 11:59:59'), Timestamp('2017-09-07 12:59:59'), Timestamp('2017-09-07 13:59:59'), Timestamp('2017-09-07 14:59:59'), Timestamp('2017-09-07 15:59:59'), Timestamp('2017-09-07 16:59:59'), Timestamp('2017-09-07 17:59:59'), Timestamp('2017-09-07 18:59:59'), Timestamp('2017-09-07 22:59:59'), Timestamp('2017-09-08 16:59:59'), Timestamp('2017-09-11 12:59:59'), Timestamp('2017-09-11 13:59:59'), Timestamp('2017-09-22 20:59:59'), Timestamp('2017-09-22 22:59:59'), Timestamp('2017-09-22 23:59:59'), Timestamp('2017-09-23 00:59:59'), Timestamp('2017-09-23 01:59:59'), Timestamp('2017-09-26 07:59:59'), Timestamp('2017-09-26 08:59:59'), Timestamp('2017-09-26 09:59:59'), Timestamp('2017-09-26 10:59:59'), Timestamp('2017-09-26 11:59:59'), Timestamp('2017-09-26 12:59:59'), Timestamp('2017-09-26 13:59:59'), Timestamp('2017-10-06 11:59:59'), Timestamp('2017-10-06 12:59:59'), Timestamp('2017-10-06 13:59:59'), Timestamp('2017-10-06 14:59:59'), Timestamp('2017-10-06 15:59:59'), Timestamp('2017-10-06 16:59:59'), Timestamp('2017-10-06 17:59:59'), Timestamp('2017-10-06 18:59:59'), Timestamp('2017-10-06 19:59:59'), Timestamp('2017-10-06 20:59:59'), Timestamp('2017-10-06 21:59:59'), Timestamp('2017-10-06 22:59:59'), Timestamp('2017-10-06 23:59:59'), Timestamp('2017-10-07 00:59:59'), Timestamp('2017-10-07 01:59:59'), Timestamp('2017-10-07 02:59:59'), Timestamp('2017-10-07 03:59:59'), Timestamp('2017-10-07 04:59:59'), Timestamp('2017-10-07 05:59:59'), Timestamp('2017-10-08 20:59:59'), Timestamp('2017-10-08 21:59:59'), Timestamp('2017-10-08 22:59:59'), Timestamp('2017-10-12 10:59:59'), Timestamp('2017-10-12 11:59:59'), Timestamp('2017-10-12 12:59:59'), Timestamp('2017-10-12 13:59:59'), Timestamp('2017-10-12 14:59:59'), Timestamp('2017-10-12 18:59:59'), Timestamp('2017-10-12 19:59:59'), Timestamp('2017-10-12 20:59:59'), Timestamp('2017-10-13 13:59:59'), Timestamp('2017-10-14 06:59:59'), Timestamp('2017-10-14 07:59:59'), Timestamp('2017-10-14 08:59:59'), Timestamp('2017-10-14 09:59:59'), Timestamp('2017-10-14 10:59:59'), Timestamp('2017-10-17 18:59:59'), Timestamp('2017-10-17 19:59:59'), Timestamp('2017-10-17 20:59:59'), Timestamp('2017-10-17 21:59:59'), Timestamp('2017-10-17 22:59:59'), Timestamp('2017-10-21 02:59:59'), Timestamp('2017-10-21 03:59:59'), Timestamp('2017-10-22 09:59:59'), Timestamp('2017-10-22 19:59:59'), Timestamp('2017-10-23 01:59:59'), Timestamp('2017-10-23 07:59:59'), Timestamp('2017-10-23 08:59:59'), Timestamp('2017-10-23 09:59:59'), Timestamp('2017-11-01 11:59:59'), Timestamp('2017-11-05 00:59:59'), Timestamp('2017-11-05 01:59:59'), Timestamp('2017-11-05 02:59:59'), Timestamp('2017-11-05 03:59:59'), Timestamp('2017-11-07 13:59:59'), Timestamp('2017-11-07 14:59:59'), Timestamp('2017-11-07 15:59:59'), Timestamp('2017-11-07 16:59:59'), Timestamp('2017-11-07 17:59:59'), Timestamp('2017-11-07 18:59:59'), Timestamp('2017-11-07 19:59:59'), Timestamp('2017-11-07 20:59:59'), Timestamp('2017-11-18 07:59:59'), Timestamp('2017-11-18 08:59:59'), Timestamp('2017-11-19 17:59:59'), Timestamp('2017-11-19 18:59:59'), Timestamp('2017-11-19 19:59:59'), Timestamp('2017-11-19 20:59:59'), Timestamp('2017-11-19 21:59:59'), Timestamp('2017-11-19 22:59:59'), Timestamp('2017-11-29 23:59:59'), Timestamp('2017-11-30 00:59:59'), Timestamp('2017-11-30 01:59:59'), Timestamp('2017-11-30 02:59:59'), Timestamp('2017-12-04 13:59:59'), Timestamp('2017-12-04 14:59:59'), Timestamp('2017-12-04 15:59:59'), Timestamp('2017-12-04 16:59:59'), Timestamp('2017-12-04 17:59:59'), Timestamp('2017-12-04 18:59:59'), Timestamp('2017-12-06 17:59:59'), Timestamp('2017-12-06 18:59:59'), Timestamp('2017-12-06 19:59:59'), Timestamp('2017-12-06 20:59:59'), Timestamp('2017-12-06 21:59:59'), Timestamp('2017-12-06 22:59:59'), Timestamp('2017-12-06 23:59:59'), Timestamp('2017-12-07 00:59:59'), Timestamp('2017-12-07 01:59:59'), Timestamp('2017-12-07 02:59:59'), Timestamp('2017-12-12 05:59:59'), Timestamp('2017-12-16 06:59:59'), Timestamp('2017-12-16 07:59:59'), Timestamp('2017-12-16 08:59:59'), Timestamp('2017-12-16 09:59:59'), Timestamp('2017-12-16 10:59:59'), Timestamp('2017-12-16 11:59:59'), Timestamp('2017-12-16 12:59:59'), Timestamp('2017-12-16 13:59:59'), Timestamp('2017-12-16 14:59:59'), Timestamp('2017-12-16 15:59:59'), Timestamp('2017-12-16 16:59:59'), Timestamp('2017-12-16 17:59:59'), Timestamp('2017-12-16 18:59:59'), Timestamp('2017-12-16 19:59:59'), Timestamp('2017-12-16 20:59:59'), Timestamp('2017-12-16 21:59:59'), Timestamp('2017-12-16 22:59:59'), Timestamp('2017-12-16 23:59:59'), Timestamp('2017-12-17 00:59:59'), Timestamp('2017-12-17 01:59:59'), Timestamp('2017-12-17 02:59:59'), Timestamp('2017-12-17 03:59:59'), Timestamp('2017-12-17 04:59:59'), Timestamp('2017-12-17 08:59:59'), Timestamp('2017-12-17 11:59:59'), Timestamp('2017-12-17 12:59:59'), Timestamp('2017-12-25 09:59:59'), Timestamp('2017-12-25 20:59:59'), Timestamp('2017-12-25 21:59:59'), Timestamp('2017-12-25 22:59:59'), Timestamp('2017-12-26 19:59:59'), Timestamp('2017-12-26 20:59:59'), Timestamp('2017-12-26 21:59:59'), Timestamp('2017-12-26 22:59:59'), Timestamp('2017-12-26 23:59:59'), Timestamp('2017-12-27 00:59:59'), Timestamp('2017-12-27 01:59:59'), Timestamp('2017-12-27 02:59:59'), Timestamp('2017-12-27 12:59:59'), Timestamp('2017-12-27 13:59:59'), Timestamp('2017-12-27 14:59:59'), Timestamp('2017-12-27 15:59:59')] not in index"

In [None]:
# # rougher.input.feed
# total_conc_stage_1 = gold_recovery_test_full[['rougher.input.feed_ag', 'rougher.input.feed_pb', 'rougher.input.feed_sol', 'rougher.input.feed_au']]
# total_conc_stage_1.head()

In [None]:
# total_conc_stage_1.plot(kind='box')

In [None]:
# # rougher.output.concentrate
# total_conc_stage_2 = gold_recovery_test_full[['rougher.output.concentrate_ag', 'rougher.output.concentrate_pb', 'rougher.output.concentrate_sol', 'rougher.output.concentrate_au']]
# total_conc_stage_2.head()

In [None]:
# total_conc_stage_2.plot(kind='box')

In [None]:
# # final.output.concentrate
# total_conc_stage_3 = gold_recovery_test_full[['final.output.concentrate_ag', 'final.output.concentrate_pb', 'final.output.concentrate_sol', 'final.output.concentrate_au',]]
# total_conc_stage_3.head()

In [None]:
# total_conc_stage_3.plot(kind='box')

<div id="build_the_model">
    <h2>Build the model</h2> 
</div>

<div id="overall_conclusion">
    <h2>Overall conclusion</h2> 
</div>

We need to predict two values:
- rougher concentrate recovery `rougher.output.recovery`
- final concentrate recovery `final.output.recovery`

In [None]:
# function to calculate sMAPE
def smape(rougher_true, rougher_predictions, final_true, final_predictions):
    rougher_smape = 100/len(rougher_true) * np.sum(2 * np.abs(rougher_true - rougher_predictions) / (np.abs(rougher_true) + np.abs(rougher_predictions)))
    final_smape = 100/len(final_true) * np.sum(2 * np.abs(final_true - final_predictions) / (np.abs(final_true) + np.abs(final_predictions)))
    return (.25 * rougher_smape) + (.75 * final_smape)