An end-to-end text classification project covering data collection, model training, and deployment.
The model classifies game descriptions into 20 different game genres.
The keys of encrypted/kaggle/genre_types_encoded_kaggle.json show the game genres.
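As a quick way to inspect the genre list, the keys can be read with a few lines of Python. This is a minimal sketch that assumes the file is a single JSON object whose keys are genre names; the helper name `load_genre_names` is hypothetical and not part of the repository.

```python
import json
from pathlib import Path


def load_genre_names(path):
    """Return the genre names stored as the keys of a JSON object.

    Assumption: the file holds one JSON object (genre name -> encoded value).
    """
    with Path(path).open(encoding="utf-8") as f:
        mapping = json.load(f)
    # json.load preserves the key order found in the file.
    return list(mapping.keys())
```

Calling `load_genre_names("encrypted/kaggle/genre_types_encoded_kaggle.json")` from the repository root should then return the genre names, provided the file has the assumed shape.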
🔴Notice🔴
The original subfolder in each folder contains the code for the scraped dataset.
The kaggle subfolder in each folder contains the code for the dataset collected from the Kaggle platform.
Data was collected from a game website: Metacritic Game. The data collection process is divided into 2 steps:
1. Game URL Scraping: The game URLs were scraped with scraper\game_url_scraper.py, and the URLs are stored along with the game titles.
2. Game Details Scraping: Using the URLs, the game descriptions and genres are scraped with scraper\game_details_scraper.py, and they are stored in data/raw_data/game_detils.csv.
In total, I scraped 20,406 game details.
Initially, there were 33 different genres in the dataset.
After some analysis, I found that 12 of them were rare (probably custom genres added by users).
I removed those genres, which left 21 genres.
After that, I removed the descriptions that had no genres, which left 73,551 samples.
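The two cleaning steps above can be sketched in plain Python. This is only an illustration of the idea, not the code used in the repository: the function name `filter_rare_genres`, the `min_count` threshold, and the toy samples are all assumptions made for the example.

```python
from collections import Counter


def filter_rare_genres(samples, min_count):
    """Drop rare genres, then drop samples left without any genre.

    samples: list of (description, list_of_genres) pairs.
    min_count: hypothetical cutoff; genres seen fewer times are treated as rare.
    """
    counts = Counter(g for _, genres in samples for g in genres)
    kept = {g for g, n in counts.items() if n >= min_count}

    cleaned = []
    for description, genres in samples:
        remaining = [g for g in genres if g in kept]
        if remaining:  # skip descriptions with no genre left
            cleaned.append((description, remaining))
    return cleaned, sorted(kept)


# Toy data, invented for illustration only.
samples = [
    ("A fast shooter.", ["Action", "Shooter"]),
    ("A tile puzzle.", ["Puzzle"]),
    ("A fan-made oddity.", ["MyCustomGenre"]),
    ("A platform shooter.", ["Action", "Shooter", "MyCustomGenre2"]),
    ("Another puzzle.", ["Puzzle"]),
]
cleaned, kept = filter_rare_genres(samples, min_count=2)
print(kept)          # ['Action', 'Puzzle', 'Shooter']
print(len(cleaned))  # 4
```

Filtering genres first and empty samples second matters: a description whose only genres were rare ones ends up with no label and is removed in the second step.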
I fine-tuned a distilroberta-base model from HuggingFace Transformers using Fastai and Blurr. The model training notebook can be viewed here
The trained model is 900+ MB in size.
I compressed this model using ONNX quantization and brought it down to 78.7 MB.
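To give a feel for why quantization shrinks a model, here is a small pure-Python sketch of 8-bit affine quantization: each float weight is mapped to an integer in 0..255 using a scale and a zero point, so it needs 1 byte instead of the 4 bytes of a float32. This illustrates the general idea only; it is not the ONNX tooling used for this project, and the function names and sample weights are made up for the example.

```python
def quantize_uint8(weights):
    """Map floats to integers in 0..255 using a scale and a zero point."""
    lo, hi = min(weights), max(weights)
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # keep 0.0 exactly representable
    scale = (hi - lo) / 255 or 1.0       # fall back to 1.0 if all weights are 0
    zero_point = round(-lo / scale)
    q = [min(255, max(0, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float values from the 8-bit integers."""
    return [(v - zero_point) * scale for v in q]


# Illustrative weights, not taken from the real model.
weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale, zero_point = quantize_uint8(weights)
restored = dequantize(q, scale, zero_point)

# Each restored value differs from the original by at most one step (scale).
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, max_error <= scale)
```

The trade-off is a small rounding error per weight in exchange for a much smaller file, which is why the compressed model's accuracy is worth re-checking after quantization.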
The compressed model is deployed as a Gradio app on HuggingFace Spaces. The implementation can be found in the deployment folder or here
I also deployed a Flask app that takes a game description as input and shows the predicted genres as output.
See the dev branch.
The website is live here