My name is Alexander Sutherland and I'm a data scientist, researcher, and mathematician.
Pronouns: He/Him/His
Below are data science projects focused on interesting and meaningful questions.
-
Predicting Preterm Birth (https://github.com/kgrillaert/preterm_birth)
Capstone Project for Data Science Program – Academia to Industry at The Erdős Institute, The Ohio State University
Preterm birth is the leading cause of infant mortality and a major contributor to serious long-term afflictions. In this project, I led our group’s study of how well demographics predict preterm birth, using two datasets: one with only demographic information and one with demographic and health/lifestyle information. For each dataset, we evaluated logistic regression, support vector classifier, and boosted (e.g., AdaBoost, XGBoost) models. We also used the AI Fairness 360 package to test whether our models performed equally well across race and ethnicity groups. Our models performed only as well as the baseline model, highlighting the difficulty of predicting preterm birth from electronic health records alone. Future work should consider collecting thorough, individual-level data observed during pregnancy to provide a higher-quality data source for machine learning predictions.
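One common way to check whether predictions perform equally well across a demographic feature is to compare true positive rates between groups, often called the equal opportunity difference (a metric that AI Fairness 360 also provides). The sketch below computes it by hand on made-up labels. It is an illustration only: the choice of metric, the function name `true_positive_rate_by_group`, the group names, and the data are all assumptions, not the project's actual analysis or results.

```python
from collections import defaultdict

def true_positive_rate_by_group(y_true, y_pred, groups):
    """Return {group: TPR}, where TPR = true positives / actual positives."""
    positives = defaultdict(int)  # actual positive cases per group
    hits = defaultdict(int)       # correctly predicted positives per group
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == 1:
            positives[group] += 1
            hits[group] += int(pred == 1)
    return {g: hits[g] / positives[g] for g in positives}

# Hypothetical toy data: 1 = preterm, 0 = full term; "A" and "B" are placeholder groups.
y_true = [1, 1, 0, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 0, 1, 1, 1, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

tpr = true_positive_rate_by_group(y_true, y_pred, groups)
gap = tpr["B"] - tpr["A"]  # a gap near 0 means similar recall for both groups
print(tpr, gap)
```

A gap far from zero would suggest the model catches preterm births more reliably for one group than for another, which is the kind of disparity a fairness audit is meant to surface.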
-
Analyzing MLB Umpire Performance (https://github.com/AlexanderJSutherland/pitch-calling)
Personal Project
Major League Baseball (MLB) has always relied on human home plate umpires to determine whether each pitch is a ball or a strike. However, since the introduction of PITCHf/x in the mid-2000s, MLB has had access to increasingly accurate pitch tracking systems and is implementing some form of automated ball-strike calling as soon as 2026. In this project, I investigate how well umpires call balls and strikes by training models not on whether a pitch was a true ball or strike, but on whether an umpire called it a ball or a strike, and then comparing the models’ predictions to the true ball/strike calls. I compare the performance of these “pseudo-robo-umpires” (i.e., machine learning models trained on real umpire data) across four datasets with varying pitch location and umpire information. Additionally, since no human umpire can be expected to call balls and strikes perfectly, I investigate which non-location factors contribute most to incorrect ball/strike calls.
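The train-on-calls, score-against-truth idea can be sketched with a toy example. Everything here is an assumption made for illustration and is not the code or data from the repository: the pitches are synthetic, the strike-zone bounds are placeholder values, the simulated umpire simply perceives pitch location with Gaussian error, and the model is a hand-rolled k-nearest neighbors classifier.

```python
import random

# Hypothetical strike-zone bounds in feet (illustrative values, not the rulebook zone).
ZONE_X = (-0.83, 0.83)  # horizontal edges relative to the center of the plate
ZONE_Z = (1.5, 3.5)     # bottom and top of the zone

def true_call(x, z):
    """Return 1 (strike) if the location is inside the zone, else 0 (ball)."""
    return int(ZONE_X[0] <= x <= ZONE_X[1] and ZONE_Z[0] <= z <= ZONE_Z[1])

def umpire_call(x, z, rng, noise=0.15):
    """Simulated human call: the true rule applied to a noisily perceived location."""
    return true_call(x + rng.gauss(0, noise), z + rng.gauss(0, noise))

def make_pitches(n, rng):
    """Sample n pitch locations uniformly in a box around the zone."""
    return [(rng.uniform(-2, 2), rng.uniform(0.5, 4.5)) for _ in range(n)]

def knn_predict(train, labels, point, k=15):
    """Majority vote among the k training pitches closest to `point`."""
    order = sorted(
        range(len(train)),
        key=lambda i: (train[i][0] - point[0]) ** 2 + (train[i][1] - point[1]) ** 2,
    )
    votes = sum(labels[i] for i in order[:k])
    return int(2 * votes > k)

def accuracy(a, b):
    """Fraction of positions where the two label lists agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
train = make_pitches(2000, rng)
train_calls = [umpire_call(x, z, rng) for x, z in train]  # labels are umpire calls

test = make_pitches(500, rng)
truth = [true_call(x, z) for x, z in test]                # true ball/strike labels
test_calls = [umpire_call(x, z, rng) for x, z in test]    # what the umpire would call
preds = [knn_predict(train, train_calls, p) for p in test]

model_vs_truth = accuracy(preds, truth)        # pseudo-robo-umpire against the true zone
umpire_vs_truth = accuracy(test_calls, truth)  # simulated human against the true zone
print(f"model vs. truth:  {model_vs_truth:.3f}")
print(f"umpire vs. truth: {umpire_vs_truth:.3f}")
```

The model never sees a true label during training, so any agreement with the true zone comes entirely from what it learned from the umpire's calls. Comparing the two printed accuracies shows how a model that averages over many noisy calls relates to the individual calls themselves.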
-
Classifying MLB Pitch Types (https://www.github.com/AlexanderJSutherland/classifying-2024-MLB-pitches)
Personal Project
Pitchers across Major League Baseball (MLB) commonly throw 9-10 different types of pitches, but no two pitchers throw the same type of pitch in exactly the same way. In this project, I identify the most important measurable features for pitch classification and use them to classify individual pitches by type, paying special attention to the effect of including or excluding pitcher IDs. I used k-nearest neighbors, decision tree, and ensemble models, as well as a simple neural network.