# ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️
# MAKE A COPY OF THIS NOTEBOOK ON YOUR OWN GDRIVE
# Changes to this notebook will not be saved.
# ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️

# Unity Data Science Homework - Specification

At Unity, we develop **deep learning models** for [Real-Time Bidding (RTB)](https://en.wikipedia.org/wiki/Real-time_bidding).

To bid for an ad impression, we estimate the optimal bid value using the predicted install probability together with several other predictions.

In this homework, your task is to train a **Deep Learning Model** using the data sampled from our production environment and to predict the install probabilities for the ad impressions included in the test data.

Note that the install probability predicted by your model will be used directly for estimating the optimal bid values of ad impressions. Therefore, it is important for the predictions to be as accurate as possible.

**Here are the guidelines for the homework:**
*   Complete the homework using Python and any libraries of your choice (for example, NumPy, scikit-learn, TensorFlow, or PyTorch).
*   You're free to use any type of model for your intermediate steps, but your final model **must** be a Neural Network.
*   Use **ROC AUC**, **log loss**, and **prediction bias** to evaluate model performance. Feel free to use **other metrics** to discuss a model’s merit.
*   Perform exploratory data analysis.
*   Keep code clean and organized.
*   Please include all of your intermediate models and results, even the ones that don't work; they help us understand your model development.
*   Structure your work to showcase your understanding of the important steps leading to your final solution.
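
As an illustration of the three required metrics, here is a minimal sketch using scikit-learn. The helper name `evaluate_predictions` is hypothetical, and the definition of prediction bias used here (mean predicted probability divided by the observed install rate, minus one) is an assumption, since the specification does not define it. The labels and scores are toy values, not homework data.

```python
import numpy as np
from sklearn.metrics import log_loss, roc_auc_score


def evaluate_predictions(y_true, y_pred):
    """Return ROC AUC, log loss, and prediction bias for binary labels."""
    y_true = np.asarray(y_true).astype(int)
    # Clip so that predictions of exactly 0 or 1 cannot produce an infinite log loss.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), 1e-7, 1 - 1e-7)
    return {
        "roc_auc": roc_auc_score(y_true, y_pred),
        "log_loss": log_loss(y_true, y_pred, labels=[0, 1]),
        # Assumed definition: relative gap between the mean prediction and the
        # observed install rate (0 means unbiased on average).
        "prediction_bias": y_pred.mean() / y_true.mean() - 1.0,
    }


# Toy labels and scores, made up for illustration only.
metrics = evaluate_predictions([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(metrics)
```

A negative bias under this definition means the model under-predicts installs on average. Note that the helper needs both classes present in `y_true`; ROC AUC and the bias ratio are undefined otherwise.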


# Colab setup

The default kernel of this notebook is set to CPU. You can use a GPU kernel when working on parts of your solution that require more compute. Note that you will have a daily limit on the usage of those kernels.

## Download Data

Run the next cell to download the two datasets required for this homework. You will need to authenticate with your Google account to download the files. The download takes about 1.5 minutes.

The two files are `train_data.csv` and `assessment_data.csv`; their paths are stored in the variables `TRAIN_DATA_PATH` and `ASSESSMENT_DATA_PATH`, respectively.

`train_data.csv` has install labels. `assessment_data.csv` does not. The `assessment_data.csv` is used for assessing your submission.

They can be loaded with pandas like so:
```
import pandas as pd
trainDF = pd.read_csv(TRAIN_DATA_PATH, delimiter=';')
assessmentDF = pd.read_csv(ASSESSMENT_DATA_PATH, delimiter=';')
```

In [None]:
!pip install PyDrive

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

TRAIN_DATA_PATH = 'train_data.csv'
ASSESSMENT_DATA_PATH = 'assessment_data.csv'

downloaded = drive.CreateFile({'id':"1N3n7ThL-4mRod-lwgy047LkFeZ5iGFuJ"})
downloaded.GetContentFile(TRAIN_DATA_PATH)
print(f'Downloaded: {TRAIN_DATA_PATH}')

downloaded = drive.CreateFile({'id':"1BCQuIIE-Kh61ExDuhWNLOPae-9Q3m6IL"})
downloaded.GetContentFile(ASSESSMENT_DATA_PATH)
print(f'Downloaded: {ASSESSMENT_DATA_PATH}')

## Deliverables

- This Jupyter notebook containing:
    - All code to produce the results
    - Exploratory data analysis
    - Modelling approach
    - Performance evaluation of the model
    - Explanation of design choices
    - Discussion of future work
- A CSV file that contains the predicted install probabilities for the ad impressions in `assessment_data.csv`. The file should contain only the following columns:
    - ```id```: ID of the ad impression in the assessment data
    - ```install_proba```: Predicted install probability of the ad impression
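
The CSV deliverable can be assembled along the following lines. This is only a sketch: the helper `build_submission`, the output file name `submission.csv`, and the comma delimiter are assumptions (the specification fixes only the two column names), and the ids and probabilities are placeholder values rather than real predictions.

```python
import pandas as pd


def build_submission(ids, install_proba):
    """Build a two-column frame and check that the values are valid probabilities."""
    submission = pd.DataFrame({"id": ids, "install_proba": install_proba})
    # between() is False for NaN, so this also rejects missing predictions.
    if not submission["install_proba"].between(0, 1).all():
        raise ValueError("install_proba must be in [0, 1] with no missing values")
    return submission


# Placeholder ids and probabilities, for illustration only.
submission = build_submission(["a1", "b2", "c3"], [0.02, 0.10, 0.005])

# index=False keeps the file limited to the two required columns.
submission.to_csv("submission.csv", index=False)
```

In practice, `ids` would come from the `id` column of the assessment data and `install_proba` from your final model's predictions on the same rows, in the same order.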


## Data description

- ```id```: impression id
- ```timestamp```: time of the event in UTC
- ```campaignId```: id of the advertising campaign (the game being advertised)
- ```platform```: device platform
- ```softwareVersion```: OS version of the device
- ```sourceGameId```: id of the publishing game (the game being played)
- ```country```: country of user
- ```startCount```: how many times the user has started (any) campaigns
- ```viewCount```: how many times the user has viewed (any) campaigns
- ```clickCount```: how many times the user has clicked (any) campaigns
- ```installCount```: how many times the user has installed games from this ad network
- ```lastStart```: last time the user started any campaign
- ```startCount1d```: how many times the user has started (any) campaigns within the last 24 hours
- ```startCount7d```: how many times the user has started (any) campaigns within the last 7 days
- ```connectionType```: internet connection type
- ```deviceType```: device model
- ```install```: binary indicator of whether an install was observed (install=1) or not (install=0) after the impression
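
The two time columns above lend themselves to simple derived features. The sketch below is one possible starting point, not a prescribed approach: it assumes `timestamp` and `lastStart` are strings that `pd.to_datetime` can parse (the actual format in the files should be checked first), and the helper `add_time_features` and the new column names are hypothetical.

```python
import pandas as pd


def add_time_features(df):
    """Return a copy of df with hour, weekday, and recency features added."""
    out = df.copy()
    # errors="coerce" turns unparseable or missing values into NaT instead of raising.
    ts = pd.to_datetime(out["timestamp"], utc=True, errors="coerce")
    last = pd.to_datetime(out["lastStart"], utc=True, errors="coerce")
    out["hour"] = ts.dt.hour
    out["dayOfWeek"] = ts.dt.dayofweek  # Monday is 0
    # Seconds between the impression and the user's previous campaign start;
    # NaN when lastStart is missing.
    out["secondsSinceLastStart"] = (ts - last).dt.total_seconds()
    return out


# Toy rows in an assumed ISO 8601 format, for illustration only.
toy = pd.DataFrame({
    "timestamp": ["2019-01-07T12:30:00Z", "2019-01-08T01:00:00Z"],
    "lastStart": ["2019-01-07T12:00:00Z", None],
})
features = add_time_features(toy)
print(features)
```

Because the hour is computed in UTC, it does not reflect the user's local time; combining it with `country` is one way a model could recover that signal.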

# Your submission: