# Scalable Capital Data Science Test Assignment

## Description

Welcome to the Scalable Capital Data Science Test Assignment! <br>
In this assignment, you are given a dataset with which you can demonstrate skills that are typically required for a data analyst/scientist position. <br>
These skills include, but are not limited to:
- Exploratory Data Analysis
- Data Cleaning and Preprocessing
- Feature Engineering
- Model Building
- Model Evaluation

### Data Description
The dataset in this assignment is a gaming dataset from [Steam](http://store.steampowered.com). The objective is to 
1. understand and learn interesting patterns in the dataset
2. predict whether a user would recommend an item (game) or not
3. for a given item, recommend 5 similar items.

In `reviews.json`, you'll find reviews given by users to certain games, and whether they would recommend the games or not. <br>
In `users.json`, you can see the number of hours the users spend on the games. <br>
In `items.json`, you'll find metadata about the games.
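
As a starting point, the three files could be read with a small helper like the one below. This is only a sketch: the on-disk layout is not described here, so the helper assumes one record per line, written either as strict JSON or as a Python dict literal. The name `load_records` is hypothetical, and a file that holds a single JSON array would need `json.load` instead.

```python
import ast
import json


def load_records(path):
    """Read one record per line and return a list of dicts.

    Assumption: each non-empty line holds one record, written either as
    strict JSON or as a Python dict literal (single-quoted keys).
    """
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                # Fall back for Python-literal lines such as {'key': 'value'}
                records.append(ast.literal_eval(line))
    return records


# Example usage (uncomment once the files are in the working directory):
# reviews = load_records("reviews.json")
# reviews_df = pd.DataFrame(reviews)
```

If the records turn out to be nested, `pd.json_normalize` may help flatten them before analysis.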

**Important:**

To aid you through the assignment, we provide a structure that we think is suitable for the project. Nevertheless, you are free to restructure it in any other way you consider reasonable. <br>
Feel free to install any library that you need. If you do, please provide an accompanying `requirements.txt` file.<br>
We encourage you to use comments to explain your thought process and logic.

In [None]:
import pandas as pd
import numpy as np
import datetime
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
plt.rcParams['figure.figsize'] = 14, 6
sns.set()

## Part 1: Exploratory Data Analysis

Understanding the data is an important part of a data scientist's daily tasks. <br>
Please provide visualizations where necessary, and at the end, write a short summary of what you have learned from the data. <br>
*We recommend that you spend roughly 1/4 to 1/3 of your test assignment time on this part.*

Things you could look at:
- distribution of game genres
- distribution of sentiments among genres
- average amount of playtime 
- most popular games among users in terms of playtime, recommendations, etc.
- etc.
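
Two of the ideas above, the genre distribution and the average playtime, can be sketched on toy data as follows. The frames and the column names `item_id`, `genres`, `user_id` and `hours` are assumptions made for illustration; the real files may use different names and shapes.

```python
import pandas as pd

# Toy stand-in for the item metadata (column names are assumptions)
items = pd.DataFrame({
    "item_id": ["a", "b", "c"],
    "genres": [["Action", "Indie"], ["Indie"], ["Strategy", "Indie"]],
})

# Toy stand-in for per-user playtime (column names are assumptions)
playtime = pd.DataFrame({
    "user_id": ["u1", "u2", "u1"],
    "item_id": ["a", "a", "b"],
    "hours": [10.0, 30.0, 5.0],
})

# One row per (item, genre) pair, then count how often each genre occurs
genre_counts = items.explode("genres")["genres"].value_counts()

# Mean hours played per game, most-played first
mean_hours = (
    playtime.groupby("item_id")["hours"].mean().sort_values(ascending=False)
)

print(genre_counts)
print(mean_hours)
```

In the notebook, `genre_counts.plot(kind="bar")` would turn the counts into a quick bar chart.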

**Summary**


## Part 2: Modeling

### 2.1 Predict whether a user would recommend a game or not

#### Feature engineering

#### Baseline model
It's good practice to quickly come up with a baseline model, and iterate from there. <br>
For example, a naive model could always recommend the most popular games.
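
One possible reading of such a naive baseline for the prediction task is a majority-class model: it ignores all features and always predicts the label seen most often in training. The sketch below uses made-up labels, and the function names are hypothetical.

```python
from collections import Counter


def fit_majority_baseline(labels):
    """Return the most frequent label in the training data."""
    return Counter(labels).most_common(1)[0][0]


def predict_majority(majority_label, n_samples):
    """Predict the same label for every sample."""
    return [majority_label] * n_samples


# Made-up training labels: True means the user recommended the game
train_labels = [True, True, False, True]

majority = fit_majority_baseline(train_labels)
predictions = predict_majority(majority, n_samples=3)
print(majority, predictions)
```

Any learned model should beat this baseline on the chosen metric, otherwise it has not learned anything useful from the features.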

#### ML Model
Train a model on the data to predict whether a user would recommend a game.

#### Model(s) Evaluation
Evaluate your model(s) with appropriate metrics.
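
For a binary target such as "recommended or not", accuracy alone can mislead when one class dominates, so precision, recall and F1 are common companions. The sketch below computes them from scratch on made-up labels; `binary_metrics` is a hypothetical helper, and which metrics are appropriate remains a judgment call for the assignment.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for boolean labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fp = sum(1 for t, p in pairs if not t and p)      # false positives
    fn = sum(1 for t, p in pairs if t and not p)      # false negatives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for illustration
y_true = [True, True, False, False, True]
y_pred = [True, False, True, False, True]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```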

#### (Optional) Hyperparameter tuning

### 2.2 Game recommendation
- Pick a random game of your choice, and recommend 5 similar games.
- Pick a random user of your choice, and recommend 5 new games to that user.
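
One common way to find similar games is item-to-item cosine similarity on a game-by-user matrix, for example of hours played. The sketch below assumes such a matrix has already been built; the toy values and the name `top_k_similar` are illustrative only, and this is one option among many (content-based features from the item metadata would be another).

```python
import numpy as np


def top_k_similar(matrix, item_index, k=5):
    """Return the indices of the k rows most similar to row `item_index`.

    `matrix` has shape (n_items, n_users); similarity is cosine similarity.
    """
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    # Avoid division by zero for items nobody has played
    normalized = matrix / np.where(norms == 0, 1.0, norms)
    scores = normalized @ normalized[item_index]
    scores[item_index] = -np.inf  # never return the query item itself
    return np.argsort(-scores)[:k]


# Toy game-by-user matrix of hours played (values are made up)
hours = np.array([
    [10.0, 0.0, 5.0],  # game 0
    [8.0, 0.0, 4.0],   # game 1: same player profile as game 0
    [0.0, 7.0, 0.0],   # game 2: played by a different user only
    [1.0, 1.0, 1.0],   # game 3: played a little by everyone
])

similar_to_game_0 = top_k_similar(hours, item_index=0, k=2)
print(similar_to_game_0)
```

With the real data, `k=5` would give the five most similar games for the chosen item.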

Do you think your model / approach gives reasonable recommendations?

## Summary
- Give us some brief thoughts on your approach to the assignment.
- If you could have spent more time on the test assignment, what would you do to improve your model(s)?