AlexKinley/DS4635-soccer-final

What's in this repo

This repo contains code and supporting files for predicting the outcome of soccer matches, based on the Kaggle dataset at https://www.kaggle.com/competitions/football-match-probability-prediction/overview.

Where is the actual data?

Because Git handles large files poorly, the actual training and testing data are not in this repository.

After downloading the dataset from Kaggle, I used create_validation_set.R to randomly split it into a training set and a testing set (85% and 15% of the total training data, respectively). Then I used create_validation_set2.R to randomly split that training set again into training and testing data. The resulting files start with base_ and can be viewed as the bottom level of the validation set hierarchy.
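For readers who want to see the idea behind the random split, here is a minimal Python sketch. It is not the code in create_validation_set.R; the function name `split_indices`, the seed, and the row count of 1000 are assumptions made for illustration. It shuffles row indices and holds out 15% of them as the testing set.

```python
import random


def split_indices(n_rows, test_fraction=0.15, seed=0):
    """Randomly partition row indices 0..n_rows-1 into (train, test) lists."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = round(n_rows * test_fraction)
    test = sorted(indices[:n_test])
    train = sorted(indices[n_test:])
    return train, test


# Hypothetical example: 1000 rows, 15% held out for testing.
train_idx, test_idx = split_indices(1000)
```

Using the same index lists to select rows from both the predictors file and the results file would keep the two aligned, assuming both files list matches in the same order.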

To get this data, you have to download it from OneDrive. To try out a model, download all four files. The _train.csv files contain the predictors (base_train.csv) and the match results (base_res_train.csv) for the training set. The _test.csv files contain the same information for the testing set, which you can use to calculate the accuracy of your model after training it on the training data.

base_train.csv

base_res_train.csv

base_test.csv

base_res_test.csv
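As a rough illustration of how the results files could be used to score a model, the Python sketch below computes the accuracy of a majority-class baseline. The column name `target`, the labels, and the tiny in-memory CSV text are made-up stand-ins, not the real contents of base_res_train.csv or base_res_test.csv; check summary.md for the actual columns and swap in `open("base_res_train.csv", newline="")` and so on once the files are downloaded.

```python
import csv
import io
from collections import Counter


def read_column(handle, column):
    """Return the values of one named column from an open CSV file."""
    return [row[column] for row in csv.DictReader(handle)]


def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Made-up stand-ins for the results files; "target" is an assumed column name.
toy_train = io.StringIO("id,target\n1,home\n2,home\n3,away\n4,draw\n")
toy_test = io.StringIO("id,target\n5,home\n6,away\n")

train_labels = read_column(toy_train, "target")
test_labels = read_column(toy_test, "target")

# Baseline "model": always predict the most common training outcome.
majority = Counter(train_labels).most_common(1)[0][0]
baseline_accuracy = accuracy([majority] * len(test_labels), test_labels)
```

A real model would replace the majority-class prediction with predictions computed from the predictors in base_test.csv; the accuracy calculation would stay the same.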

The complete training data, before the bottom-level split, can be downloaded as match_res_train.csv and matches_train.csv. (Don't do this unless you explicitly don't want to use the bottom-level validation set.)

The .gitignore file is set up so that if you download those files and put them in this repo folder, they won't get committed.

What does the data look like?

See summary.md for basic information about the data, primarily what the columns are and what they look like.
