# <span style="color:darkBlue">Project Report: Game Price Prediction</span>

## <span style="color:Purple"> 1) Introduction </span>

The objective of this project is to predict the prices of video games from features gathered from the SteamDB website. Accurate price prediction can help developers and publishers plan their marketing and sales strategy, and can help consumers make informed purchasing decisions.

## <span style="color:Purple"> 2) Data Collection </span>
The data collection phase involved web scraping from the SteamDB website to gather information on approximately 3000 video games. This phase was divided into two main steps:  

#### 2.1 Web Scraping Game Information

In the first step, game URLs and detailed game information were scraped from the SteamDB search pages. The key features collected for each game include:
- `NAME`: The title of the game.
- `STORE_GENRE`: The genre(s) of the game.
- `RATING_SCORE`: The overall rating score of the game.
- `N_SUPPORTED_LANGUAGES`: The number of languages supported by the game.
- `DEVELOPERS`: The developer(s) of the game.
- `SUPPORTED_PLATFORMS`: The platforms on which the game is available.
- `POSITIVE_REVIEWS`: The number of positive reviews.
- `NEGATIVE_REVIEWS`: The number of negative reviews.
- `TECHNOLOGIES`: The technologies used in the game.
- `RELEASE_DATE`: The release date of the game.
- `TOTAL_TWITCH_PEAK`: The peak number of viewers on Twitch.
- `PRICE`: The price of the game.
- `N_DLC`: The number of downloadable content (DLC) packs available.
- `24_HOUR_PEAK`: The peak number of players in the last 24 hours.
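
As an illustration of how one scraped record could be written out with the columns listed above, here is a minimal sketch using only the Python standard library. The helper name `write_games` and the choice to leave missing fields empty are assumptions made for this example; the notebooks may store the data differently.

```python
import csv
import io

# Column names taken from the feature list above.
COLUMNS = [
    "NAME", "STORE_GENRE", "RATING_SCORE", "N_SUPPORTED_LANGUAGES",
    "DEVELOPERS", "SUPPORTED_PLATFORMS", "POSITIVE_REVIEWS",
    "NEGATIVE_REVIEWS", "TECHNOLOGIES", "RELEASE_DATE",
    "TOTAL_TWITCH_PEAK", "PRICE", "N_DLC", "24_HOUR_PEAK",
]


def write_games(rows, out):
    """Write game records (dicts) as CSV with a fixed column order.

    Hypothetical helper: fields missing from a record are written as
    empty strings, so gaps stay visible for later preprocessing.
    """
    writer = csv.DictWriter(out, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


if __name__ == "__main__":
    buffer = io.StringIO()
    # Made-up record, for illustration only.
    write_games([{"NAME": "Example Game", "PRICE": "$9.99", "N_DLC": 2}], buffer)
    print(buffer.getvalue())
```

Keeping the column order fixed in one place makes it easier to spot which fields a given page failed to provide.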

This step involved several challenges, including handling the large volume of data (3000 games) and ensuring the scraping process was efficient and reliable. The data collected in this step was saved in two files:
- `game_urls.txt`: Contains URLs of the games.
- `games_info.csv`: Contains detailed information of the games.
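
One common way to keep a long scraping run reliable is to retry failed requests and pause between pages. The sketch below shows that pattern under stated assumptions: `scrape_all`, its parameters, and the injected `fetch` callable are hypothetical names for this example and are not taken from `webScrape_1.ipynb`.

```python
import time


def scrape_all(urls, fetch, max_retries=3, delay=1.0, sleep=time.sleep):
    """Fetch every URL, retrying failures and pausing between requests.

    `fetch` is any callable that takes a URL and returns the page
    content, raising an exception on failure (an assumption here).
    Returns (results, failed): a dict of url -> content and a list of
    URLs that still failed after `max_retries` attempts.
    """
    results = {}
    failed = []
    for url in urls:
        for attempt in range(1, max_retries + 1):
            try:
                results[url] = fetch(url)
                break
            except Exception:
                if attempt == max_retries:
                    failed.append(url)  # give up on this URL, keep going
                else:
                    sleep(delay * attempt)  # wait longer after each failure
        sleep(delay)  # pause between games to avoid hammering the site
    return results, failed
```

Collecting the failed URLs instead of stopping means a run over thousands of games can finish and the failures can be retried later.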

#### 2.2 Web Scraping Game Prices

The second step gathered each game's price in USD so that prices are comparable regardless of region. The name, release date, and price of each game were scraped from the game's own page. This data was saved in:
- `games_details.csv`: Contains the name, release date, and price of the games.
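
Scraped prices usually arrive as text and need converting to numbers before modeling. The following is a minimal sketch of such a conversion. The input formats it handles (`$19.99`, `$1,299.00`, `Free`) are assumptions for this example, since the report does not state how prices appear on the page, and `parse_usd_price` is a hypothetical helper name.

```python
import re
from typing import Optional

# Matches a dollar amount such as "$19.99" or "$1,299.00".
_PRICE_RE = re.compile(r"\$\s*(\d+(?:,\d{3})*(?:\.\d+)?)")


def parse_usd_price(text: str) -> Optional[float]:
    """Convert a scraped price string to a float in USD.

    Returns 0.0 for free games and None when no price can be found,
    so missing prices can be detected during preprocessing.
    """
    cleaned = text.strip()
    if cleaned.lower() in {"free", "free to play"}:
        return 0.0
    match = _PRICE_RE.search(cleaned)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))
```

Returning `None` rather than a default value keeps missing prices distinguishable from free games.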

*** 

#### Data Files and Notebooks

The following files and notebooks were created and used during the data collection phase:
1. **Data Files**:
   - `game_urls.txt`: A text file containing URLs of the games.
   - `games_info.csv`: A CSV file containing detailed game information.
   - `games_details.csv`: A CSV file containing the name, release date, and price of the games.

2. **Notebooks**:
   - `webScrape_1.ipynb`: Notebook used to scrape game URLs and detailed information, resulting in `game_urls.txt` and `games_info.csv`.
   - `webScrape_2_price.ipynb`: Notebook used to scrape game prices, resulting in `games_details.csv`.


The data collection phase provided a comprehensive dataset that includes various features of video games and their prices. This dataset forms the foundation for the subsequent analysis and price prediction models. The challenges encountered during web scraping were addressed effectively to ensure data accuracy and completeness.


## <span style="color:Purple">B) Preprocessing (1)</span>

### 2) Data Collection
#### 2.3 Additional Web Scraping for Missing Prices
During preprocessing, it was found that many prices were missing even after the second web scraping step. To address this, a third web scraping notebook was created to gather more game data, focusing on saving the name, URL, and price for additional games. This additional data collection was crucial to ensure a more complete and accurate dataset for the price prediction model. The data collected in this step was saved in:
- `games_name_url_price.csv`: Contains the name, URL, and price of more games.
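
The idea of filling missing prices from the additional file can be sketched as follows. This assumes that both tables have `NAME` and `PRICE` columns and that games are matched by name, ignoring case and surrounding whitespace. Neither assumption is confirmed by the report, and `fill_missing_prices` is a hypothetical helper.

```python
def fill_missing_prices(games, extra):
    """Return copies of `games` with empty PRICE values filled from `extra`.

    `games` and `extra` are lists of dicts, e.g. rows read with
    csv.DictReader. Matching by normalized name is an assumption;
    a stable ID or URL would be a safer key if one is available.
    """
    lookup = {
        row["NAME"].strip().lower(): row["PRICE"]
        for row in extra
        if row.get("PRICE")
    }
    filled = []
    for game in games:
        updated = dict(game)  # do not modify the caller's rows
        if not updated.get("PRICE"):
            key = updated["NAME"].strip().lower()
            if key in lookup:
                updated["PRICE"] = lookup[key]
        filled.append(updated)
    return filled
```

Existing prices are left untouched, so the additional file only fills gaps and never overrides data from the earlier steps.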

### Data Files and Notebooks

The following files and notebooks were created and used during the data collection phase:
1. **Data Files**:
   - `game_urls.txt`: A text file containing URLs of the games.
   - `games_info.csv`: A CSV file containing detailed game information.
   - `games_details.csv`: A CSV file containing the name, release date, and price of the games.
   - `games_name_url_price.csv`: A CSV file containing the name, URL, and price of more games.

2. **Notebooks**:
   - `webScrape_1.ipynb`: Notebook used to scrape game URLs and detailed information, resulting in `game_urls.txt` and `games_info.csv`.
   - `webScrape_2_price.ipynb`: Notebook used to scrape game prices, resulting in `games_details.csv`.
   - `webScrape_3_URL.ipynb`: Notebook used to gather additional game prices, resulting in `games_name_url_price.csv`.

The data collection phase provided a comprehensive dataset that includes various features of video games and their prices. This dataset forms the foundation for the subsequent analysis and price prediction models. The challenges encountered during web scraping were addressed effectively to ensure data accuracy and completeness.

## <span style="color:Purple">C) Analysis</span>