christophergaldi6/Data_Science

Data Science

A collection of projects that use Linear Regression, Logistic Regression, K-Means, Hierarchical Agglomerative Clustering, and other methods to analyze problems and propose data-driven solutions. The work is done in Jupyter Notebooks with Pandas, NumPy, and other Python libraries.

Project List

1. FIFA Project

  • A project examining the spread of players, clubs, and nations represented in the game, and the relationship between players' in-game abilities and their real-life evaluation. I used Linear Regression, Ridge Regression, and LASSO Regression models to predict a player's overall rating and real-life evaluation from in-game stats, age, and wages. I also clustered in-game stats with a Hierarchical Clustering method to identify players' overall strengths and the general positions they play.

2. Diabetes Project #1

  • A data science project that uses a Linear Regression model with K-Fold cross-validation to predict a person's BMI, and a Logistic Regression model with K-Fold cross-validation to predict whether a person has been diagnosed with diabetes.

3. Food Analysis Project #2

  • A data science project that uses a Logistic Regression model with a train-test split and a K-Nearest Neighbors model to predict whether a menu item is a burger or a pizza based on dietary information. It also clusters the donuts sold on the Krispy Kreme menu. I visualized the data and results with scatterplots, confusion matrices, and clustering techniques to analyze the spread and contents of the data and the accuracy of the predictions.
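The regression projects above rely on comparing Linear, Ridge, and LASSO models and on K-Fold cross-validation. The following is a minimal sketch of how those pieces can fit together in scikit-learn. It is not the code from the notebooks: the data is synthetic, and the `alpha` values, the fold count, and the helper name `cross_validated_r2` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 200 rows, 5 numeric predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_coefs = np.array([3.0, -2.0, 0.0, 0.0, 1.5])
y = X @ true_coefs + rng.normal(scale=0.5, size=200)

# Hypothetical penalty strengths; a real analysis would tune alpha.
models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
}

# Five shuffled folds: each row is held out exactly once.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)


def cross_validated_r2(model):
    """Return the R^2 score on each of the five held-out folds."""
    # Scaling lives inside the pipeline so it is refit on each training fold.
    pipeline = make_pipeline(StandardScaler(), model)
    return cross_val_score(pipeline, X, y, cv=kfold, scoring="r2")


scores = {name: cross_validated_r2(model) for name, model in models.items()}

for name, fold_scores in scores.items():
    print(f"{name}: mean R^2 = {fold_scores.mean():.3f}")
```

Keeping the scaler inside the pipeline means the held-out fold never influences the scaling parameters, which gives a fairer estimate of out-of-sample accuracy.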

Assignments from CPSC 392

Homework 1: Exploring and Visualizing Data

  • Assignment using the Pandas and NumPy libraries to explore the data sets with scatterplots, line graphs, and other visualizations, and to present them effectively for analysis.

Homework 2

  • Assignment using the Pandas and NumPy libraries to create linear regression models that predict BMI, and to test the accuracy of the models using mean squared error, R² score, and coefficient evaluation. Through z-scoring and variable manipulation, I used the models to visualize the BMI data and predict body weight.
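To make the z-scoring and the two accuracy metrics concrete, here is a small sketch written in plain NumPy. It uses synthetic data and generic variable names (both assumptions), and it fits the model by least squares directly rather than through whichever modeling library the notebook uses.

```python
import numpy as np

# Synthetic stand-in data (assumption): two predictors and a numeric target.
rng = np.random.default_rng(1)
X = rng.normal(loc=[170.0, 70.0], scale=[10.0, 12.0], size=(100, 2))
y = 0.1 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(scale=1.0, size=100)

# z-score each predictor: subtract the column mean, divide by the column std.
X_z = (X - X.mean(axis=0)) / X.std(axis=0)

# Fit ordinary least squares with an explicit intercept column.
design = np.column_stack([np.ones(len(X_z)), X_z])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
y_pred = design @ coefs

# Mean squared error: the average squared residual.
mse = np.mean((y - y_pred) ** 2)

# R^2: the share of the target's variance explained by the model.
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"intercept and coefficients: {coefs}")
print(f"MSE = {mse:.3f}, R^2 = {r2:.3f}")
```

Because the predictors are z-scored, each coefficient is the change in the target per one standard deviation of that predictor, which makes the coefficients directly comparable. Note that these metrics are computed on the training data here; a held-out set would give a more honest estimate.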

Homework 3

  • Assignment that uses the Pandas and NumPy libraries to create logistic regression models to predict admissions and test the accuracy of the models. Using z-scoring and variable manipulation, I predicted admissions and other variables, and checked whether the models were accurate and precise through log odds, coefficient evaluation, and a confusion matrix. I also looked at the probability and odds of admission for specific students.
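The relationship between log odds, odds, and probability in a logistic regression can be sketched with the standard library alone. The intercept, coefficient, and z-score below are hypothetical values chosen for illustration, not results from the assignment.

```python
import math


def log_odds_to_probability(log_odds: float) -> float:
    """Convert log odds to a probability with the logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical fitted values (assumption), for a single z-scored predictor.
intercept = -1.0
coefficient = 0.8  # change in log odds per one standard deviation
z_score = 1.5      # a student 1.5 standard deviations above the mean

# The model is linear in log odds.
log_odds = intercept + coefficient * z_score

# Exponentiating log odds gives odds; odds / (1 + odds) gives probability.
odds = math.exp(log_odds)
probability = log_odds_to_probability(log_odds)

# exp(coefficient) is the odds ratio: the factor by which the odds
# multiply when the predictor increases by one standard deviation.
odds_ratio = math.exp(coefficient)

print(f"log odds = {log_odds:.3f}")
print(f"odds = {odds:.3f}")
print(f"probability = {probability:.3f}")
print(f"odds ratio = {odds_ratio:.3f}")
```

A positive coefficient gives an odds ratio above 1, meaning higher values of the predictor raise the odds of admission; a negative coefficient gives an odds ratio below 1.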

Homework 5

  • Assignment using the Pandas and NumPy libraries to create clustering models of each person's proficiency levels in different programming languages. I used the K-Means method, the Empirical method, and the Hierarchical Clustering method to cluster people by their proficiency in each language, and visualized the resulting clusters with scatterplots.
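As a rough illustration of two of the clustering methods named above, the sketch below runs K-Means and hierarchical agglomerative clustering on the same synthetic data with scikit-learn. The data, the choice of three clusters, Ward linkage, and the silhouette score as a quality check are all assumptions; none of them come from the assignment.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): three groups of 30 people,
# each person scored on two skills.
rng = np.random.default_rng(2)
centers = np.array([[1.0, 1.0], [5.0, 5.0], [1.0, 5.0]])
X = np.vstack([center + rng.normal(scale=0.4, size=(30, 2)) for center in centers])

# z-score the features so both contribute equally to distances.
X_z = StandardScaler().fit_transform(X)

# K-Means: assigns each point to the nearest of k centroids.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_z)

# Hierarchical agglomerative clustering: repeatedly merges the closest
# clusters until three remain (Ward linkage is an assumed choice).
hier_labels = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X_z)

# Silhouette score ranges from -1 to 1; higher means tighter,
# better-separated clusters.
kmeans_silhouette = silhouette_score(X_z, kmeans_labels)
hier_silhouette = silhouette_score(X_z, hier_labels)

print(f"K-Means silhouette: {kmeans_silhouette:.3f}")
print(f"Hierarchical silhouette: {hier_silhouette:.3f}")
```

The label arrays can be passed as the color argument of a scatterplot to show the clusters visually.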