
Using machine learning to predict the outcomes of professional sumo wrestling bouts in Japan Grand Tournaments.


chow-vincent/sumo_project


Sumo Wrestling Machine Learning Project

The goal of this project was to use machine learning to predict the outcomes of sumo wrestling bouts in Japanese Sumo Grand Tournaments.

Most of the project focused on data collection and exploration. On the prediction side, a simple logistic regression model using a few key features predicts the outcomes of bouts in a grand tournament with about 61% accuracy. That is better than random chance (about 50% for a bout with one winner and one loser), but not enough of a margin to beat the betting markets. Prediction leaves plenty of room for further work.
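A minimal sketch of how an accuracy figure like this can be estimated, assuming scikit-learn and a numeric feature matrix with one row per bout. The three features and the synthetic data are placeholders, not the project's actual features, so the printed accuracy will not reproduce the 61% reported above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical bout-level features: each row describes one bout from the
# perspective of "wrestler A" (e.g. rank difference, head-to-head win rate).
# Random numbers stand in for the real scraped features.
rng = np.random.default_rng(seed=0)
n_bouts = 500
X = rng.normal(size=(n_bouts, 3))
# Synthetic labels: 1 if wrestler A won, 0 otherwise, loosely tied to feature 0.
y = (X[:, 0] + rng.normal(scale=2.0, size=n_bouts) > 0).astype(int)

model = LogisticRegression()
# 5-fold cross-validated accuracy, to compare against the ~0.5 coin-flip baseline.
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print(f"mean accuracy: {scores.mean():.3f} (coin-flip baseline ~0.500)")
```

Cross-validation is used here so that the accuracy is measured on bouts the model did not see during fitting.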

Folders

data/ : data scraped from a public online database using the Beautiful Soup library, stored as pickled pandas DataFrames.

plots/ : Seaborn visualizations of the data, saved as PNG files.

tourneys/ : daily head-to-head bout lineups for the 2017 March (Haru) Basho.
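Storing DataFrames as pickle files, as described for data/, is a simple pandas round trip. A minimal sketch, with a hypothetical file name and illustrative columns (the rows reflect ranks at the 2017 Haru Basho but are not taken from the project's data):

```python
import os
import tempfile

import pandas as pd

# Illustrative profile table; column names are hypothetical.
rikishi = pd.DataFrame(
    {
        "shikona": ["Kisenosato", "Hakuho", "Terunofuji"],
        "rank": ["Yokozuna", "Yokozuna", "Ozeki"],
    }
)

with tempfile.TemporaryDirectory() as data_dir:
    path = os.path.join(data_dir, "rikishi.pkl")  # hypothetical file name
    rikishi.to_pickle(path)            # serialize the DataFrame, dtypes included
    restored = pd.read_pickle(path)    # load it back

print(restored.equals(rikishi))  # True
```

Pickle keeps dtypes and index intact, which is convenient between scraping and analysis steps, but pickle files should only be loaded from trusted sources.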

Python Scripts and Modules

Machine Learning Files:

  • machine_learning.py : Script for various machine learning tasks, including model evaluation and predicting the outcomes of new bouts.

  • ml_fxns.py : Module with helper functions for machine learning tasks such as data pre-processing and prediction.
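One common way to structure pre-processing and prediction helpers is to encode each bout as the difference between the two wrestlers' stats and let a fitted model return a win probability. This is a hypothetical sketch, not the contents of ml_fxns.py: `STATS`, `bout_features`, and `predict_bout` are illustrative names, and the toy data is made up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-wrestler stats: [rank score, career win rate].
STATS = {
    "A": np.array([9.0, 0.62]),
    "B": np.array([7.0, 0.55]),
    "C": np.array([4.0, 0.48]),
}


def bout_features(east: str, west: str) -> np.ndarray:
    """Pre-processing: represent a bout as east stats minus west stats."""
    return STATS[east] - STATS[west]


# Toy training bouts: (east, west, 1 if east won else 0).
history = [
    ("A", "B", 1), ("B", "C", 1), ("A", "C", 1),
    ("B", "A", 0), ("C", "B", 0), ("C", "A", 0),
]
X = np.array([bout_features(e, w) for e, w, _ in history])
y = np.array([label for _, _, label in history])

model = LogisticRegression().fit(X, y)


def predict_bout(east: str, west: str) -> float:
    """Prediction: probability that the east wrestler wins."""
    x = bout_features(east, west).reshape(1, -1)
    return float(model.predict_proba(x)[0, 1])  # column 1 = class "east won"


print(f"P(A beats C) = {predict_bout('A', 'C'):.2f}")
```

The difference encoding keeps the feature vector the same length regardless of which two wrestlers meet, and swapping east and west simply flips its sign.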

Data Scraping Files:

  • rikishi_scrape.py : Module of Beautiful Soup scraping functions, reused when scraping data across multiple HTML pages and multiple sumo wrestlers.

  • scrape_multiple_h2h.py : Script to scrape head-to-head data for multiple sumo wrestlers.

  • scrape_multiple_rikishi.py : Script to scrape basic profile data for multiple sumo wrestlers.
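A minimal sketch of the kind of table parsing these scraping files perform, assuming Beautiful Soup 4 (`bs4`) with Python's built-in `html.parser`. The HTML snippet, the `h2h` table class, and `parse_h2h_table` are hypothetical; the real database's markup will differ, and fetching pages over the network is left out.

```python
from bs4 import BeautifulSoup

# Stand-in for a downloaded head-to-head page (hypothetical markup and numbers).
HTML = """
<table class="h2h">
  <tr><th>Opponent</th><th>Wins</th><th>Losses</th></tr>
  <tr><td>Opponent A</td><td>3</td><td>1</td></tr>
  <tr><td>Opponent B</td><td>0</td><td>2</td></tr>
</table>
"""


def parse_h2h_table(html: str) -> list[dict]:
    """Return one dict per opponent row of a head-to-head table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="h2h")
    rows = []
    for tr in table.find_all("tr")[1:]:  # skip the header row
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        opponent, wins, losses = cells
        rows.append({"opponent": opponent, "wins": int(wins), "losses": int(losses)})
    return rows


print(parse_h2h_table(HTML))
```

A script covering many wrestlers would call a function like this once per downloaded page and concatenate the results into a DataFrame.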

Data Preparation Files:

  • data_extraction.py : Module with helper functions for processing information extracted from HTML tags using scraping libraries (e.g. Beautiful Soup).

  • database_ops.py : Module with helper functions to perform various operations with DataFrames.

  • feature_generation.py : Script to generate a DataFrame containing feature data and labels.

  • filter_duplicates.py : Script to filter out duplicate rows in the raw head-to-head DataFrame generated by the feature generation script.
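One plausible source of duplicates when head-to-head data is scraped per wrestler is that a bout between A and B appears once on A's page and once on B's. A minimal pandas sketch of collapsing such mirrored rows, assuming hypothetical `basho`, `day`, `rikishi`, `opponent`, and `win` columns; `drop_mirrored_bouts` is an illustrative name, not necessarily how filter_duplicates.py works.

```python
import pandas as pd

# Hypothetical raw rows: the day-13 bout between A and B is recorded twice,
# once from each wrestler's side.
raw = pd.DataFrame(
    {
        "basho": ["2017.03", "2017.03", "2017.03"],
        "day": [13, 13, 14],
        "rikishi": ["A", "B", "A"],
        "opponent": ["B", "A", "C"],
        "win": [1, 0, 0],
    }
)


def drop_mirrored_bouts(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the first row per bout, keyed on basho, day, and the unordered pair."""
    pair = [tuple(sorted(p)) for p in zip(df["rikishi"], df["opponent"])]
    return (
        df.assign(pair=pair)
        .drop_duplicates(subset=["basho", "day", "pair"])
        .drop(columns="pair")
    )


deduped = drop_mirrored_bouts(raw)
print(deduped)
```

Sorting the two names gives both orientations of a bout the same key, so `drop_duplicates` keeps only the first occurrence.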

Jupyter Notebooks

ml_playground.ipynb : notebook for playing around with various machine learning tasks.

testing_playground.ipynb : notebook for testing miscellaneous pieces of code.

visualizations.ipynb : notebook for generating Seaborn visualizations of scraped data.
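A minimal sketch of producing and saving one such figure, assuming seaborn and matplotlib are installed. The columns (`height_cm`, `weight_kg`) and the synthetic data are hypothetical stand-ins for whatever the scraped profile DataFrames contain.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # off-screen backend, so no display is needed

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for scraped profile data (hypothetical columns).
rng = np.random.default_rng(seed=1)
height_cm = rng.normal(184, 6, size=60)
weight_kg = 1.2 * height_cm - 70 + rng.normal(0, 15, size=60)
rikishi = pd.DataFrame({"height_cm": height_cm, "weight_kg": weight_kg})

fig, ax = plt.subplots(figsize=(6, 4))
sns.scatterplot(data=rikishi, x="height_cm", y="weight_kg", ax=ax)
ax.set_title("Height vs. weight (synthetic data)")

# Save as PNG, in the same spirit as the figures stored under plots/.
out_path = os.path.join(tempfile.mkdtemp(), "height_vs_weight.png")
fig.savefig(out_path, dpi=100)
plt.close(fig)
print(out_path)
```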
