GitHub - MBuggey/Capstone: GitHub Repository for my BrainStation Data Science Diploma capstone.

Project Overview

This project was created as the capstone project for BrainStation's Data Science Diploma program. Its purpose is to build a recommender algorithm for the Steam game store. The end goal is an algorithm that can read a user's play history and identify titles it believes the user would enjoy. Currently it implements a FunkSVDpp algorithm using two different rating metrics. Hybrid models are planned for the near future to allow greater variety and customizability of recommendations.
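As a rough illustration of the matrix factorization idea behind the current model, here is a minimal sketch of plain Funk SVD-style training with stochastic gradient descent. It is not the code from this repository, and it does not include the implicit-feedback terms of SVD++. The function names, hyperparameters, ids, and toy ratings are all assumptions made up for the example; the 1/0 ratings stand in for a metric such as is_recommended.

```python
import random

def funk_svd(ratings, n_factors=2, lr=0.05, reg=0.02, n_epochs=1000, seed=0):
    """Fit latent factors by SGD on observed (user, item, rating) triples."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    # Small random starting factors for every user and item.
    P = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    for _ in range(n_epochs):
        for u, i, r in ratings:
            pred = mu + sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            err = r - pred
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step with L2 regularization on both factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mu, P, Q

def predict(model, user, item):
    """Predicted rating; falls back to the global mean for unseen ids."""
    mu, P, Q = model
    if user not in P or item not in Q:
        return mu
    return mu + sum(pu * qi for pu, qi in zip(P[user], Q[item]))

# Hypothetical toy data: (user_id, app_id, rating), rating 1 = recommended.
toy_ratings = [
    (1, 10, 1), (1, 20, 1), (1, 30, 0),
    (2, 10, 1), (2, 30, 0),
    (3, 10, 0), (3, 30, 1),
    (4, 20, 0), (4, 30, 1),
]
model = funk_svd(toy_ratings)
```

Calling predict(model, 2, 20) then estimates a rating for a game that user 2 has not reviewed. A second rating metric could be swapped in by changing only the third element of each triple.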

Data Source

The version of the dataset that was used for this project can be downloaded here.

These datasets were sourced from Kaggle. The original poster specified a monthly update frequency, so an updated dataset may be available here.

Data Breakdown

The download consists of three separate CSV files and a JSON file.

games.csv features a breakdown of all the games available for sale in the Steam store. The initial CSV contains basic data that could support a base recommender system. The included JSON file, however, contains additional, non-tabular information on games, such as descriptions and tags. For our purposes we will have to extract at least some of that information and connect it to this file.
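One way the JSON extraction and merge could look is sketched below. It assumes the JSON file stores one object per line, keyed by app_id, with description and tags fields; that layout, the field names, the helper names, and the sample values are all assumptions for illustration and should be checked against the actual file.

```python
import json

def load_metadata(lines):
    """Map app_id -> description and tags from JSON-lines text."""
    meta = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        meta[record["app_id"]] = {
            "description": record.get("description", ""),
            "tags": record.get("tags", []),
        }
    return meta

def attach_metadata(games, meta):
    """Left join: every game row is kept, with empty metadata if none exists."""
    merged = []
    for game in games:
        extra = meta.get(game["app_id"], {"description": "", "tags": []})
        merged.append({**game, **extra})
    return merged

# Placeholder metadata; the description and tags are invented for the example.
sample_json = [
    '{"app_id": 13500, "description": "placeholder text", "tags": ["Action", "Adventure"]}',
]
sample_games = [
    {"app_id": 13500, "title": "Prince of Persia: Warrior Within™"},
    {"app_id": 1, "title": "Hypothetical game with no metadata"},
]
merged = attach_metadata(sample_games, load_metadata(sample_json))
```

A left join is used here so that games missing from the JSON file are kept rather than silently dropped, which makes gaps easy to count during cleaning.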

Column breakdown of games.csv is as follows:

app_id: Unique id for each individual game in the games.csv
title: Title of game
date_release: Release date of the game
win: Game is playable on Windows
mac: Game is playable on Mac
linux: Game is playable on Linux
rating: Overall user rating of game
positive_ratio: The percentage of user reviews for the game that are positive
user_reviews: Number of unique user reviews on a game
price_final: Current price of game as of last data scrape
price_original: Original price of game on release (specifically on Steam store release)
discount: Whether the game is discounted as of last data scrape (and by how much in %)
steam_deck: Whether game is playable on Steam's handheld device, the Steam Deck

app_id	title	date_release	win	mac	linux	rating	positive_ratio	user_reviews	price_final	price_original	discount	steam_deck
13500	Prince of Persia: Warrior Within™	2008-11-21	true	false	false	Very Positive	84	2199	9.99	9.99	0.0	true
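Every value in games.csv arrives as a string when read with a plain CSV reader, so the columns need converting before use. The sketch below parses the sample row above into typed values using only the standard library. The column groupings and the parse_game helper are assumptions for this example, and the inline text is a comma-separated copy of the sample row rather than the real file.

```python
import csv
import io
from datetime import date

# Comma-separated copy of the header and sample row shown above.
SAMPLE_CSV = (
    "app_id,title,date_release,win,mac,linux,rating,positive_ratio,"
    "user_reviews,price_final,price_original,discount,steam_deck\n"
    "13500,Prince of Persia: Warrior Within™,2008-11-21,true,false,false,"
    "Very Positive,84,2199,9.99,9.99,0.0,true\n"
)

BOOL_COLS = ("win", "mac", "linux", "steam_deck")
INT_COLS = ("app_id", "positive_ratio", "user_reviews")
FLOAT_COLS = ("price_final", "price_original", "discount")

def parse_game(row):
    """Convert one games.csv row from strings to typed Python values."""
    game = dict(row)
    for col in BOOL_COLS:
        game[col] = row[col].strip().lower() == "true"
    for col in INT_COLS:
        game[col] = int(row[col])
    for col in FLOAT_COLS:
        game[col] = float(row[col])
    game["date_release"] = date.fromisoformat(row["date_release"])
    return game

games = [parse_game(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
```

With the real file, io.StringIO(SAMPLE_CSV) would be replaced by an open file handle.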

users.csv contains information on unique Steam users. Though the information is sparse, it could potentially be used to 'grade' recommendations based on whether the user is a notable one.

Column breakdown of users.csv is as follows:

user_id: Unique id for each individual user of the Steam store. Useful for connecting back to recommendations.csv
products: Number of products owned by unique user
reviews: Number of reviews posted by unique user

user_id	products	reviews
7360263	359	0

recommendations.csv is the largest file of the three and focuses on individual user recommendations for video games. This will be the primary file for determining whether a game is performing well and whether a review could be a good or bad indicator of game success.

Column breakdown of recommendations.csv is as follows:

app_id: Unique id of the game being reviewed, can be tied back to app_id in games.csv
helpful: How many users voted this review as 'helpful'
funny: How many users voted this review as 'funny'
date: Date of review
is_recommended: Whether the user recommends the game or not
hours: Hours of playtime the user has on the game
user_id: Unique user id that can be tied back to users.csv
review_id: Auto-generated label for review number by row. Could potentially be deleted.

app_id	helpful	funny	date	is_recommended	hours	user_id	review_id
975370	0	0	2022-12-12	True	36.3	51580	0
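Since the goal is to read a user's play history, one natural first step is grouping recommendations.csv by user_id. The following is a minimal sketch of that grouping with the standard library. The history_by_user helper is an assumption for this example; the first data row is the sample above, and the other two rows are made up to show more than one user.

```python
import csv
import io
from collections import defaultdict

# First data row is the sample shown above; the other two rows are made up.
SAMPLE_CSV = (
    "app_id,helpful,funny,date,is_recommended,hours,user_id,review_id\n"
    "975370,0,0,2022-12-12,True,36.3,51580,0\n"
    "13500,2,0,2022-11-01,False,1.5,51580,1\n"
    "13500,0,1,2022-10-05,True,12.0,7360263,2\n"
)

def history_by_user(csv_text):
    """Group reviews into {user_id: [(app_id, hours, is_recommended), ...]}."""
    history = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        history[int(row["user_id"])].append((
            int(row["app_id"]),
            float(row["hours"]),
            # Compare case-insensitively: this file writes "True",
            # while games.csv writes "true".
            row["is_recommended"].strip().lower() == "true",
        ))
    return dict(history)

history = history_by_user(SAMPLE_CSV)
```

Each user's list keeps both hours and is_recommended, so either could later serve as the rating signal for a model.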

Steps to Complete

Initial GitHub repository and project breakdown
Preliminary EDA on the data, including cleaning and noting any problem areas
Extraction of JSON file and merging pertinent data into existing csv, cleaning data again once completed
Discover usable metrics which might lead to an initial model
Begin prototyping model
Build and test first model
Create a recommendation function
Implement hybrid model features to refine recommendation output
Package everything into a Streamlit app
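The "create a recommendation function" step could take roughly the following shape: given predicted scores for one user, drop the games already played and return the top few. This is only a sketch under assumed inputs. The recommend function, its parameters, and the scores below are hypothetical and not taken from the repository.

```python
def recommend(played, scores, n=3):
    """Return up to n unplayed app_ids, highest predicted score first."""
    unplayed = [(app_id, s) for app_id, s in scores.items()
                if app_id not in played]
    # Sort by score descending; break ties by app_id so output is stable.
    unplayed.sort(key=lambda pair: (-pair[1], pair[0]))
    return [app_id for app_id, _ in unplayed[:n]]

# Hypothetical predicted scores for one user, keyed by app_id.
scores = {10: 0.91, 20: 0.35, 30: 0.78, 40: 0.78}
played = {10}
top = recommend(played, scores, n=2)
```

Keeping the scoring separate from the ranking means a hybrid model can later supply the scores dictionary without changing this function.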
