MLB Rankings

Style Guide

Documentation

Documentation for this project is available at Read the Docs - MLB Rankings

Version

0.0.1

What is this repo?

This is the computation engine that runs the statistical calculations using the measures listed below.

Project Goals

An application to predict which MLB teams will be contenders within the next 3-5 years. It will also suggest what a particular team can do to become a contender in 3-5 years. The focus is on which stage a team is in (buying, selling, rebuilding, etc.) and how aggressive it is in that mode. Player value is determined with WAR and years of control as the primary factors.

This will also be an exploration of ES2015 / ES6 and of Map and Reduce, and will draw many ideas and algorithms from Data Science.

Preface

Having played baseball for such a long time, this project is very interesting to me on many levels. Much of the baseball-specific analysis will be based upon The Hidden Game by John Thorn and Pete Palmer. However, a significant portion of the project will be based on principles of data science not specific to baseball, including data mining (with scheduling), association rules, and recommender systems utilizing Jaccard Similarity.

Statistical Components

Stat and Player Based Similarities (Collaborative Filtering)

The basis of the prediction simulator is to determine similarities between players and teams, providing a better model to predict player, and therefore team, performance. This requires a learning phase to be successful and to gauge accuracy. That should be easy for most of the common stats, as there is a massive amount of data dating back many years.

Measures

Jaccard Similarity - a statistic used for comparing the similarity and diversity of sample sets, here sets of individual player stats
Pearson Correlation Coefficient - measures how well two stats fit on a straight line
Adjusted Cosine Similarity - treats the stats for each player as vectors in n-dimensional space (n = number of players) and determines the angle between two vectors
- Important Adjustment - weight all values with the average of each stat for that particular year, as factors change from year to year
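The three measures above can be sketched in a few lines of ES2015. This is a minimal illustration, not the project's implementation: the function names (`jaccard`, `pearson`, `adjustedCosine`) and the plain-array inputs are assumptions, and the yearly adjustment is interpreted here as subtracting the league average for each stat and year before taking the cosine.

```javascript
// Jaccard similarity of two collections: |A ∩ B| / |A ∪ B|.
const jaccard = (a, b) => {
  const setA = new Set(a);
  const setB = new Set(b);
  const shared = [...setA].filter((x) => setB.has(x)).length;
  const total = new Set([...setA, ...setB]).size;
  return total === 0 ? 0 : shared / total;
};

const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;

// Pearson correlation coefficient of two equal-length numeric arrays.
const pearson = (xs, ys) => {
  const mx = mean(xs);
  const my = mean(ys);
  const cov = xs.reduce((sum, x, i) => sum + (x - mx) * (ys[i] - my), 0);
  const varX = xs.reduce((sum, x) => sum + (x - mx) * (x - mx), 0);
  const varY = ys.reduce((sum, y) => sum + (y - my) * (y - my), 0);
  const denom = Math.sqrt(varX * varY);
  return denom === 0 ? 0 : cov / denom;
};

// Cosine of the angle between two stat vectors after centering every value
// on the average for that stat and year (assumed form of the adjustment).
const adjustedCosine = (a, b, yearlyAverages) => {
  const ca = a.map((v, i) => v - yearlyAverages[i]);
  const cb = b.map((v, i) => v - yearlyAverages[i]);
  const dot = ca.reduce((sum, v, i) => sum + v * cb[i], 0);
  const norm = (v) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  const denom = norm(ca) * norm(cb);
  return denom === 0 ? 0 : dot / denom;
};

// Usage with made-up values.
const overlap = jaccard(['HR', 'RBI', 'SB'], ['HR', 'RBI', 'BB']); // 0.5
const linear = pearson([1, 2, 3], [2, 4, 6]); // 1
const angle = adjustedCosine([3, 1], [5, 1], [2, 2]);
```

All three functions return 0 when the denominator would be zero, which is a simplification chosen for this sketch.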

Determining Accuracy

To determine the accuracy of the models' predictions, we allocate all previous data for the system to be trained on and compare the predicted values with the known values.
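One way to picture this evaluation is a season-based holdout: train on every season before a target season, predict the target season, and score the predictions against the known values. The sketch below assumes a hypothetical record shape `{ player, season, value }`, uses mean absolute error as the score, and uses a deliberately naive per-player average as a stand-in model. None of these choices come from the project itself.

```javascript
// Mean absolute error between predicted and known values.
const meanAbsoluteError = (predicted, actual) =>
  predicted.reduce((sum, p, i) => sum + Math.abs(p - actual[i]), 0) / predicted.length;

// Stand-in model (assumption): predict a player's average over the training data.
const trainAverageModel = (records) => {
  const totals = new Map();
  records.forEach(({ player, value }) => {
    const t = totals.get(player) || { sum: 0, count: 0 };
    totals.set(player, { sum: t.sum + value, count: t.count + 1 });
  });
  return (player) => {
    const t = totals.get(player);
    return t ? t.sum / t.count : null;
  };
};

// Train on all seasons before targetSeason, then compare predictions for
// targetSeason with the known values. Returns null if nothing can be scored.
const evaluate = (records, targetSeason, train = trainAverageModel) => {
  const predict = train(records.filter((r) => r.season < targetSeason));
  const scored = records
    .filter((r) => r.season === targetSeason)
    .map((r) => ({ predicted: predict(r.player), actual: r.value }))
    .filter((r) => r.predicted !== null);
  if (scored.length === 0) return null;
  return meanAbsoluteError(scored.map((r) => r.predicted), scored.map((r) => r.actual));
};

// Usage with made-up values.
const sampleRecords = [
  { player: 'A', season: 2013, value: 2 },
  { player: 'A', season: 2014, value: 4 },
  { player: 'A', season: 2015, value: 5 },
  { player: 'B', season: 2014, value: 1 },
  { player: 'B', season: 2015, value: 1 },
];
const error2015 = evaluate(sampleRecords, 2015); // 1
```

Passing a different `train` function would let the same harness score any of the models discussed in this document.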

Algorithms and Models

Many of the algorithms employed will be those of Collaborative Filtering, which is often used in recommender systems to recommend products, movies, or shows. In this case, however, the recommendation will either be players that are similar or, in a narrower scope, stats that are similar, which are then used to predict players that are similar. This leads to the ultimate goal of predicting how teams will perform.
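As a rough illustration of "recommending" similar players, a nearest-neighbor lookup returns the k players whose stat vectors are closest to a query vector. The data shape `{ name, stats }`, the function names, and the use of Euclidean distance are assumptions made for this sketch. In practice the stats would likely need to be normalized first so that no single stat dominates the distance.

```javascript
// Euclidean distance between two equal-length stat vectors.
const euclidean = (a, b) =>
  Math.sqrt(a.reduce((sum, v, i) => sum + (v - b[i]) * (v - b[i]), 0));

// Return the k players closest to the query vector, nearest first.
const nearestPlayers = (query, players, k) =>
  players
    .map((p) => ({ name: p.name, distance: euclidean(query, p.stats) }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);

// Usage with made-up, already-normalized stat vectors.
const samplePlayers = [
  { name: 'A', stats: [1, 1] },
  { name: 'B', stats: [5, 5] },
  { name: 'C', stats: [2, 1] },
];
const closest = nearestPlayers([1, 2], samplePlayers, 2);
// closest lists player A first, then player C
```

This is lazy learning in the sense that nothing is computed until a query arrives; the "training" step is only storing the vectors.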

Variables

Name	Description
n	number of unique stats we are analyzing
p	number of players
v	number of values for the stats (includes stats over years)
u	undetermined stats for a given player
Pt	prediction time of all players and all of their stats
Lt	learning time used by the algorithm to build a dataset in order to determine predictions

All players of the same high-level position should have all of the stats.

The table below lists some of the algorithms that will be used to create a hybrid between memory-based and model-based collaborative filtering.

Name	Description	Performance	Use Case
K-Nearest Neighbors	The training phase stores only feature vectors. The classification phase assigns the label that is most frequent among the k training samples nearest to the query point.	TBD	Very simple; uses lazy learning
Slope One	description	Lt = pn², Pt = (n-x)	USE CASE
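Since the Slope One row is still a placeholder, here is a minimal sketch of the weighted Slope One scheme, with players in the role of users and stats in the role of items. The function names, the object-per-player input, and the `|`-joined Map keys (which assume stat names contain no `|`) are all choices made for this illustration. The nested loop over stat pairs for each player is what gives a learning cost on the order of pn².

```javascript
// Learning phase: for every ordered pair of stats (a, b), accumulate the
// sum and count of differences a - b over the players that have both.
const buildDeviations = (players) => {
  const diffs = new Map(); // key "a|b" -> { sum, count }
  players.forEach((stats) => {
    const names = Object.keys(stats);
    names.forEach((a) => {
      names.forEach((b) => {
        if (a === b) return;
        const key = `${a}|${b}`;
        const entry = diffs.get(key) || { sum: 0, count: 0 };
        diffs.set(key, {
          sum: entry.sum + (stats[a] - stats[b]),
          count: entry.count + 1,
        });
      });
    });
  });
  return diffs;
};

// Prediction phase: estimate the target stat from each known stat plus the
// average deviation, weighted by how many players supported that deviation.
const predictStat = (diffs, known, target) => {
  let numerator = 0;
  let denominator = 0;
  Object.keys(known).forEach((stat) => {
    const entry = diffs.get(`${target}|${stat}`);
    if (!entry) return;
    numerator += (entry.sum / entry.count + known[stat]) * entry.count;
    denominator += entry.count;
  });
  return denominator === 0 ? null : numerator / denominator;
};

// Usage with made-up values on a shared scale.
const deviations = buildDeviations([
  { a: 5, b: 3, c: 2 },
  { a: 3, b: 4 },
  { b: 2, c: 5 },
]);
const estimate = predictStat(deviations, { b: 2, c: 5 }, 'a'); // 13 / 3
```

Slope One only makes sense when the stats are on comparable scales, so real inputs would presumably be normalized before the learning phase.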

Potential Problems

Dev

Node.js - event-driven I/O server-side JavaScript environment based on V8
ES6 - the 2015 edition of the ECMAScript (JavaScript) language standard
MongoDB - NoSQL database system which stores data similar to JSON documents
Nginx - high performance HTTP server
Digital Ocean - simple cloud infrastructure for hosting
Front End Framework - TBD
Statistical Analysis Framework - TBD
