Various ML coding projects
Generating correlated synthetic data and running several algorithms on it to understand how well each performs in different situations.
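One way to generate correlated data, as a minimal sketch: draw samples from a multivariate normal with a chosen covariance matrix (the target correlation of 0.8 and the sample size here are illustrative, not taken from the project itself).

```python
import numpy as np

rng = np.random.default_rng(42)

# Target covariance: two unit-variance features with correlation 0.8
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])

# Draw 1,000 samples from the correlated bivariate normal
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=1000)

# The empirical correlation should land close to the target 0.8
r = np.corrcoef(X[:, 0], X[:, 1])[0, 1]
```

Varying the off-diagonal entry of `cov` is then a simple lever for studying how algorithm performance changes with feature correlation.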
The purpose of this assignment is to conduct exploratory data analysis on a merged dataset. EDA should allow us to uncover information about the data such as trends, relationships, and patterns. It can also be used to identify parts of the data that need cleaning. By the end, we gain the necessary experience to handle, analyze, and visualize real-world-style data, which can be messy.
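The merge-then-inspect workflow can be sketched as follows; the two toy tables below are illustrative stand-ins for the assignment's actual source files.

```python
import pandas as pd

# Toy tables standing in for the assignment's source datasets (contents are made up)
left = pd.DataFrame({"id": [1, 2, 3, 4], "age": [34, 29, None, 51]})
right = pd.DataFrame({"id": [1, 2, 2, 5], "score": [88.0, 92.5, 90.0, 75.0]})

# Merge on the shared key, keeping every left-hand record
merged = left.merge(right, on="id", how="left")

# First-pass EDA checks: missing values flag columns that need cleaning,
# and describe() gives a quick distributional overview
missing = merged.isna().sum()
summary = merged.describe()
```

Unmatched keys surface as `NaN` after the left join, which is exactly the kind of messiness the EDA is meant to reveal.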
The goal of this project was to explore MLR and logistic regression in depth. This involved setting up the full ML pipeline, manipulating the data, and tuning the models for better performance. This provided invaluable experience in iterating on a model, much as we would when tackling a real-world problem.
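A minimal version of that pipeline, assuming scikit-learn and a synthetic dataset in place of the project's actual data, might look like this:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the project's dataset
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Logistic regression wrapped in a scale-then-fit pipeline
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)  # held-out accuracy

# Multiple linear regression on the same features (treating y as numeric)
mlr = make_pipeline(StandardScaler(), LinearRegression())
mlr.fit(X_train, y_train)
r2 = mlr.score(X_test, y_test)   # held-out R^2
```

Iterating on the model then amounts to swapping preprocessing steps or hyperparameters inside the pipeline and re-scoring on the held-out split.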
The purpose of this task is to gain experience and familiarity with using KNN and SGD regression and classification models. From there, we can put theory into practice by fully analyzing the results and data.
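The regression side of this comparison can be sketched with scikit-learn on synthetic data (the dataset and hyperparameters here are illustrative, not the task's actual ones):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic linear data standing in for the task's dataset
X, y = make_regression(n_samples=400, n_features=5, noise=5.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# KNN: each prediction averages the k nearest training targets
knn = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))
knn_r2 = knn.fit(X_train, y_train).score(X_test, y_test)

# SGD: a linear model fit by stochastic gradient descent
# (feature scaling matters a lot for SGD convergence)
sgd = make_pipeline(StandardScaler(), SGDRegressor(random_state=1))
sgd_r2 = sgd.fit(X_train, y_train).score(X_test, y_test)
```

Comparing the two R² scores illustrates the usual trade-off: SGD excels when the true relationship is linear, while KNN makes no such assumption.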
The purpose of this task is to gain experience and familiarity with using tree-based and KNN classification models. From there, we can put theory into practice by fully analyzing the results and data from the EPA.
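A compact sketch of the model comparison, using the built-in iris dataset as a stand-in for the EPA data (which is not reproduced here):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Iris stands in for the EPA dataset used in the assignment
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# One single tree, one tree ensemble, one distance-based baseline
models = {
    "tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

# Fit each model and score it on the held-out split
scores = {name: m.fit(X_train, y_train).score(X_test, y_test)
          for name, m in models.items()}
```

Lining the held-out accuracies up side by side is the starting point for the fuller analysis the task calls for.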
The overall purpose of this assignment is to tie together a variety of supervised learning techniques in order to analyze several questions related to the Behavioral Risk Factor Surveillance System. More specifically, the goal is to use supervised ML to identify patterns of comorbidity among the survey respondents. This is a comparative exercise focusing on pre-COVID (2019) and post-COVID (2021) health.
This assignment involves using various clustering algorithms on the MNIST data to group similar observations together based on their characteristics, with the aim of identifying patterns and outliers. The clustering algorithms used are k-means, mini-batch k-means, DBSCAN, and HDBSCAN, applied to both the original and noisy data. The essential steps are: importing and exploring the data, standardizing it, identifying and filtering out outliers, performing clustering with the chosen techniques, selecting 10-cluster solutions for each technique, and tuning the algorithms to compare the resulting clusters against the actual response. The best outcomes and the associated algorithms are commented on in the final output, a fully executed Jupyter notebook.
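The standardize-cluster-compare loop can be sketched as below. To keep the example small it uses scikit-learn's 8x8 digits dataset in place of MNIST, and it omits HDBSCAN (which may require a separate package or a recent scikit-learn version); the DBSCAN `eps` value is an illustrative guess that would need tuning on the real data.

```python
from sklearn.cluster import DBSCAN, KMeans, MiniBatchKMeans
from sklearn.datasets import load_digits
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# The small digits set stands in for MNIST to keep the sketch fast
X, y = load_digits(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Ten-cluster solutions, matching the ten digit classes
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X_scaled)
mbk = MiniBatchKMeans(n_clusters=10, n_init=10, random_state=0).fit(X_scaled)

# DBSCAN chooses the cluster count itself; eps must be tuned per dataset
db = DBSCAN(eps=5.0, min_samples=5).fit(X_scaled)

# Compare each solution against the true digit labels
ari_km = adjusted_rand_score(y, km.labels_)
ari_mbk = adjusted_rand_score(y, mbk.labels_)
```

Adjusted Rand index against the true labels gives a single number per algorithm, which makes the final "best outcome" comparison straightforward to report.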