# Steam's Videogames Platform 👾

## Overview
- **Platform:** Steam

---

## Company Description 📇
Steam is a video game digital distribution service and storefront from Valve. It was launched as a software client in September 2003 to provide game updates automatically for Valve's games, and expanded to distributing third-party titles in late 2005. Steam offers various features, such as:

- Digital rights management (DRM)
- Game server matchmaking with Valve Anti-Cheat measures
- Social networking
- Game streaming services

The Steam client's functions include:

- Game update automation
- Cloud storage for game progress
- Community features such as direct messaging, in-game overlay functions, and a marketplace for virtual collectibles

---

## Project 🚧
You're working for Ubisoft, a French video game publisher. They'd like to release a revolutionary new video game! They have asked you to conduct a global analysis of the games available on Steam's marketplace to better understand the video game ecosystem and today's trends.

---

## Goals 🎯
The ultimate goal of this project is to understand what factors affect the popularity or sales of a video game. Additionally, your boss has asked you to take advantage of this opportunity to analyze the video game market globally.

---

## Levels of Analysis

### Macro-Level Analysis

- **Which publisher has released the most games on Steam?**
- **What are the best-rated games?**
- **Do some years have more releases than others?** For example, were there more or fewer game releases during the Covid period?
- **How are the prices distributed?** Are there many games with a discount?
- **What are the most represented languages?**
- **Are there many games restricted to players aged 16 or 18 and over?**

### Genres Analysis

- **What are the most represented genres?**
- **Are there any genres that have a better positive/negative review ratio?**
- **Do some publishers have favorite genres?**
- **What are the most lucrative genres?**

### Platform Analysis

- **Are most games available on Windows, Mac, or Linux?**
- **Do certain genres tend to be preferentially available on certain platforms?**

---

## Flexibility of Analysis

You are free to follow these guidelines or to choose a different angle of analysis, as long as your analysis reveals relevant and useful information. 🤓

---

## Scope of this Project 🗁️
You'll have to use **Databricks** and **PySpark** to conduct this exploratory data analysis (EDA). In particular, you'll have to use Databricks' visualization tool to create the visualizations.

- **Dataset location:**
  `s3://full-stack-bigdata-datasets/Big_Data/Project_Steam/steam_game_output.json`
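
As a starting point, loading the file could look like the minimal sketch below. The helper name `load_games` and its `multi_line` parameter are illustrative, not part of the brief. `spark.read.json` expects one JSON record per line by default; if the file turns out to be a single multi-line JSON document, passing `multi_line=True` may be needed.

```python
# Path taken from the "Dataset location" entry above.
DATASET_PATH = (
    "s3://full-stack-bigdata-datasets/Big_Data/Project_Steam/"
    "steam_game_output.json"
)


def load_games(spark, path=DATASET_PATH, multi_line=False):
    """Read the Steam JSON file into a DataFrame.

    `spark` is an existing SparkSession (Databricks notebooks provide one
    under that name). `multi_line` is a hypothetical convenience flag that
    is forwarded to the reader's `multiLine` option.
    """
    return spark.read.json(path, multiLine=multi_line)
```

In a Databricks notebook, `games = load_games(spark)` followed by `games.printSchema()` should display the nested schema, which is worth inspecting before choosing which fields to flatten.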

---

## Helpers 🦭
To help you complete this project, here are a few tips:

- To adopt different levels of analysis, it might be useful to create different dataframes.
- As the dataset is semi-structured with a nested schema, PySpark's methods such as `getField()` and `explode()` may help you.
- There are some text and date fields in this dataset: PySpark offers utility functions to manipulate these types of data efficiently. 💡
- You can use aggregate functions and `groupBy` to conduct segmented analysis.

---

## Deliverable 📣
To complete this project, you should deliver:

1. **One or several notebooks** including data manipulation with PySpark and data visualization with Databricks' dashboarding tool.

2. To make sure the jury can view all the visualizations, please use the **"publish"** button on Databricks notebooks to create a public URL where a copy of your notebook will be available.

   - If Databricks notifies you that your notebook's size exceeds the maximum allowed size, split your notebook into several smaller notebooks.

3. Copy and paste the link(s) to your published notebooks into your GitHub repo so the jury can access them easily. 😌
