bynchang/UMAP-baseball

Uniform Manifold Approximation and Projection (UMAP) is a scalable manifold learning algorithm for dimension reduction and visualization of high-dimensional data. Its goal is to find a low-dimensional representation of the data that preserves the distance structure of the original space as much as possible. Mathematically, it does so by minimizing the fuzzy set cross entropy between the fuzzy topological representation of the data and that of the low-dimensional representation. The package's documentation gives a clear tutorial on the underlying theory, and I also recommend reading the original paper, which compares UMAP with other dimension reduction algorithms such as PCA and t-SNE.
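To make the objective concrete, here is a minimal sketch of the fuzzy set cross entropy for two fuzzy sets defined over the same edges. It is illustrative only and is not how the UMAP package computes or optimizes the loss; the function name, the dictionary representation of membership strengths, and the `eps` clipping are assumptions made for this example.

```python
import math


def fuzzy_set_cross_entropy(mu, nu, eps=1e-12):
    """Cross entropy between two fuzzy sets given as {edge: membership}.

    mu: membership strengths in the high-dimensional representation.
    nu: membership strengths in the low-dimensional representation.
    Edges missing from one dictionary are treated as membership 0.
    """
    total = 0.0
    for edge in set(mu) | set(nu):
        # Clip to (0, 1) so the logarithms stay finite (an assumption of this sketch).
        m = min(max(mu.get(edge, 0.0), eps), 1.0 - eps)
        n = min(max(nu.get(edge, 0.0), eps), 1.0 - eps)
        # Attractive term: penalizes edges present in mu but weak in nu.
        total += m * math.log(m / n)
        # Repulsive term: penalizes edges absent in mu but strong in nu.
        total += (1.0 - m) * math.log((1.0 - m) / (1.0 - n))
    return total


# Hypothetical membership strengths for three edges.
high_dim = {("a", "b"): 0.9, ("a", "c"): 0.1, ("b", "c"): 0.5}
good_layout = dict(high_dim)
poor_layout = {("a", "b"): 0.1, ("a", "c"): 0.9, ("b", "c"): 0.5}

print(fuzzy_set_cross_entropy(high_dim, good_layout))
print(fuzzy_set_cross_entropy(high_dim, poor_layout))
```

The value is zero when the two representations agree on every edge and grows as the low-dimensional layout places strongly connected points far apart or weakly connected points close together.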

In this project, I used UMAP to learn a latent representation of the pitch arsenals of MLB pitchers in the 2019 season. The data was scraped from the Baseball Savant website, which records the release speed and the horizontal and vertical movement of every pitch thrown by each pitcher. After learning the latent representation of each pitcher's arsenal, I ran a clustering algorithm (HDBSCAN) and was able to detect clusters of pitchers in the dataset.
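The workflow described above could be sketched roughly as follows. This is not the notebook's code: the field names (`pitcher`, `pitch_type`, `release_speed`, `pfx_x`, `pfx_z`), the sample rows, the choice to average each feature per pitch type, the zero fill for pitch types a pitcher does not throw, and all hyperparameters are assumptions made for illustration. The second function needs the `numpy`, `umap-learn`, and `hdbscan` packages and is defined here but not called.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical pitch-level rows; real data would have one row per pitch thrown.
PITCHES = [
    {"pitcher": "A", "pitch_type": "FF", "release_speed": 95.0, "pfx_x": -0.5, "pfx_z": 1.4},
    {"pitcher": "A", "pitch_type": "FF", "release_speed": 93.0, "pfx_x": -0.7, "pfx_z": 1.6},
    {"pitcher": "A", "pitch_type": "SL", "release_speed": 85.0, "pfx_x": 0.4, "pfx_z": 0.1},
    {"pitcher": "B", "pitch_type": "FF", "release_speed": 91.0, "pfx_x": -0.9, "pfx_z": 1.2},
]

FEATURES = ("release_speed", "pfx_x", "pfx_z")


def build_arsenal_vectors(pitches, pitch_types):
    """Return {pitcher: [mean of each feature for each pitch type]}."""
    grouped = defaultdict(list)
    for row in pitches:
        grouped[(row["pitcher"], row["pitch_type"])].append(row)

    vectors = {}
    for pitcher in sorted({row["pitcher"] for row in pitches}):
        vec = []
        for pitch_type in pitch_types:
            rows = grouped.get((pitcher, pitch_type), [])
            for feature in FEATURES:
                # Assumption: fill 0.0 when the pitcher never throws this pitch type.
                vec.append(mean([r[feature] for r in rows]) if rows else 0.0)
        vectors[pitcher] = vec
    return vectors


def embed_and_cluster(vectors, min_cluster_size=5):
    """Embed arsenal vectors with UMAP, then cluster the embedding with HDBSCAN."""
    # Imported here so the feature-building step runs without these packages.
    import numpy as np
    import umap
    import hdbscan

    names = list(vectors)
    X = np.array([vectors[name] for name in names])
    embedding = umap.UMAP(n_components=2, random_state=42).fit_transform(X)
    labels = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(embedding)
    # HDBSCAN labels points it considers noise as -1.
    return dict(zip(names, labels))


vectors = build_arsenal_vectors(PITCHES, pitch_types=["FF", "SL"])
print(vectors)
```

With a full season of data, `embed_and_cluster(vectors)` would return a cluster label per pitcher. Standardizing the features before embedding is a common step that this sketch leaves out.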

You can view the notebook here.
