<br>

<h3 style="font-family: Verdana; font-size: 20px; font-style: normal; font-weight: normal; text-decoration: none; text-transform: none; letter-spacing: 2px; color: #FF8C00; background-color: #ffffff;"> <b>DATASET</b> DESCRIPTION</h3>

---

The data include:

* train dataset with game sessions data: `train.csv`
* correct answers for all questions for each session: `train_labels.csv`
* test dataset with game sessions data: `test.csv`
* sample submission file: `sample_submission.csv`


<br>

<h3 style="font-family: Verdana; font-size: 20px; font-style: normal; font-weight: normal; text-decoration: none; text-transform: none; letter-spacing: 2px; color: #FF8C00; background-color: #ffffff;"> <b>EVALUATION</b></h3>

---

This competition has two tracks: the first focuses on the <b>accuracy</b> of the model, and the second on its <b>efficiency</b>.
* <b>First track: Accuracy</b>
    * Submissions are evaluated on the <a>F1 score</a>: $F_1 = 2 \cdot \frac{precision \cdot recall}{precision + recall}$
* <b>Second track: Efficiency</b>
    * Must be among the submissions selected by a team for the Leaderboard Prize, or else among those submissions automatically selected under the conditions described in the My Submissions tab.
    * Must be ranked on the Private Leaderboard higher than the sample_submission.csv benchmark.
    * Must not have a GPU enabled. <b>The Efficiency Prize is CPU Only</b>.
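    * Eligible submissions are scored on <a>Efficiency</a>: $Efficiency = \frac{1}{Benchmark - maxF1} \cdot F1 + \frac{1}{32400} \cdot RuntimeSeconds$, where $F1$ is the submission's Private Leaderboard score, $Benchmark$ is the score of the sample_submission.csv benchmark, $maxF1$ is the highest F1 on the Private Leaderboard, and $RuntimeSeconds$ is the submission's runtime (32400 s = 9 hours). Lower is better.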
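
Both scoring rules can be sketched in a few lines of plain Python. This is a minimal sketch, not the official scorer: `f1_from_counts` and `efficiency_score` are hypothetical helper names, the efficiency function assumes the form in which the first term is $F1 / (Benchmark - maxF1)$, and the numbers in the example calls are made up. In practice, scikit-learn's `sklearn.metrics.f1_score` computes the same F1 directly from label arrays.

```python
def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall), from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return 0.0  # no true positives: define F1 as 0 instead of dividing by zero
    return 2 * precision * recall / (precision + recall)


def efficiency_score(f1: float, runtime_seconds: float,
                     benchmark_f1: float, max_f1: float) -> float:
    """Efficiency = F1 / (Benchmark - maxF1) + RuntimeSeconds / 32400 (assumed form).

    Since benchmark_f1 < max_f1, the first term is negative, so a higher F1
    or a shorter runtime both lower (improve) the score.
    """
    return f1 / (benchmark_f1 - max_f1) + runtime_seconds / 32400


# Hypothetical numbers, for illustration only
print(f1_from_counts(tp=8, fp=2, fn=2))
print(efficiency_score(f1=0.70, runtime_seconds=3240, benchmark_f1=0.60, max_f1=0.70))
```

The runtime term is divided by 32400 (9 hours), so under this form an hour of runtime costs about 0.11 points, which has to be weighed against the F1 gained.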

<br>

<a id="imports"></a>

<h1 style="font-family: Verdana; font-size: 24px; font-style: normal; font-weight: bold; text-decoration: none; text-transform: none; letter-spacing: 3px; background-color: #ffffff; color: #FF8C00;">&nbsp;&nbsp;IMPORTS&nbsp;&nbsp;&nbsp;&nbsp;</h1>

# Machine Learning and Data Science Imports
import pandas as pd
import numpy as np

# Built-In Imports
import os
import time
import gc

# Visualization Imports
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
from skimage import io
try:
    import mpl_scatter_density  # for density scatter graph
except ImportError:
    print("\tPlease install mpl_scatter_density!")
    
# Other Imports
from tqdm.notebook import tqdm # for progress bar
#import jo_wilder # the API of this competition


<br>

<h3 style="font-family: Verdana; font-size: 20px; font-style: normal; font-weight: normal; text-decoration: none; text-transform: none; letter-spacing: 2px; color: #FF8C00; background-color: #ffffff;">HELPER FUNCTIONS</h3>
<br>


# for checking features properties during feature engineering
def check_features(features_df):
    fig, ax = plt.subplots(1, 2, figsize=(18,9))
    fig.tight_layout(pad=10.0)
    sns.boxplot(ax=ax[0], data=features_df, orient="h")
    sns.violinplot(ax=ax[1], data=features_df, orient="h")
    plt.show()

# convert a list of feature Series into a single features DataFrame indexed by session_id
def to_df(features_series):
    features_df = pd.concat(features_series, axis=1)
    features_df = features_df.reset_index()
    features_df = features_df.set_index('session_id')
    return features_df
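
The `reset_index()` / `set_index('session_id')` round trip matters when the feature Series come from a multi-key groupby: the other index levels (here `level_group`) become ordinary columns, and only `session_id` stays in the index. The sketch below repeats the helper so it runs standalone and feeds it a hypothetical toy event log (not the competition data); the feature names `n_events` and `elapsed_time_max` are illustrative.

```python
import pandas as pd


def to_df(features_series):
    # Same logic as the helper above: concatenate column-wise, then index by session_id
    features_df = pd.concat(features_series, axis=1)
    features_df = features_df.reset_index()
    features_df = features_df.set_index('session_id')
    return features_df


# Hypothetical toy event log, for illustration only
events = pd.DataFrame({
    'session_id': [1, 1, 1, 2, 2],
    'level_group': ['0-4', '0-4', '5-12', '0-4', '0-4'],
    'elapsed_time': [0, 500, 1500, 0, 800],
})

# Each feature is a Series indexed by (session_id, level_group)
grouped = events.groupby(['session_id', 'level_group'])
n_events = grouped.size().rename('n_events')
elapsed_max = grouped['elapsed_time'].max().rename('elapsed_time_max')

features = to_df([n_events, elapsed_max])
print(features)
```

Each Series' `name` becomes its column name, so naming features with `.rename(...)` before calling the helper keeps the resulting DataFrame readable.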

<br>

<h3 style="font-family: Verdana; font-size: 20px; font-style: normal; font-weight: normal; text-decoration: none; text-transform: none; letter-spacing: 2px; color: #FF8C00; background-color: #ffffff;">LOAD DATA</h3>
<br>


dtypes = {'session_id': 'category',
          'elapsed_time': np.int32,
          'event_name': 'category',
          'name': 'category',
          'level': np.uint8,
          'page': 'category',
          'room_coor_x': np.float32,
          'room_coor_y': np.float32,
          'screen_coor_x': np.float32,
          'screen_coor_y': np.float32,
          'hover_duration': np.float32,
          'text': 'category',
          'fqid': 'category',
          'room_fqid': 'category',
          'text_fqid': 'category',
          'fullscreen': np.int8,
          'hq': np.int8,
          'music': np.int8,
          'level_group': 'category'}
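# Assumed Kaggle input path for this competition; adjust if your data lives elsewhere
DATA_DIR = "/kaggle/input/predict-student-performance-from-game-play"
print("\n\n... LOAD DATA FROM CSV FILE ...")
train = pd.read_csv(os.path.join(DATA_DIR, "train.csv"), dtype=dtypes)
test = pd.read_csv(os.path.join(DATA_DIR, "test.csv"), dtype=dtypes)
train_labels = pd.read_csv(os.path.join(DATA_DIR, "train_labels.csv"))
print("\n\n... LOAD DATA COMPLETE ...\n")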
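
The `dtypes` map above keeps memory down by storing repeated strings as `category` and small integers in narrow types such as `np.uint8`. The sketch below illustrates the effect on a hypothetical toy CSV built in memory (the column values are made up, not taken from the competition files), reading it once with pandas' default dtypes and once with an explicit `dtype=` map.

```python
import io

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows = 50_000

# Hypothetical toy log: a low-cardinality string column plus a small integer column
toy = pd.DataFrame({
    "event_name": rng.choice(["event_a", "event_b", "event_c"], size=n_rows),
    "level": rng.integers(0, 23, size=n_rows),
})
csv_text = toy.to_csv(index=False)


def memory_mb(df: pd.DataFrame) -> float:
    # deep=True also counts the Python string objects behind object columns
    return df.memory_usage(deep=True).sum() / 1024**2


default_df = pd.read_csv(io.StringIO(csv_text))
typed_df = pd.read_csv(io.StringIO(csv_text),
                       dtype={"event_name": "category", "level": np.uint8})

print(f"default dtypes: {memory_mb(default_df):.2f} MB")
print(f"with dtype map: {memory_mb(typed_df):.2f} MB")
```

A `category` column stores each distinct string once plus a small integer code per row, so the savings grow with the number of rows and shrink as the number of distinct values approaches the row count.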