This portfolio is a collection of my work completed independently and as class work to show my skills and abilities in Data Science, Data Analytics, and Data Engineering.

datagirlz19/datagirlz19.github.io

 
 

This portfolio is a collection of work I completed independently and for classes to show my skills and abilities in Data Science / Analytics. During college, I gained experience working in team settings and creating data-driven solutions. There, I learned how to design visualizations for impact, create machine learning models, and teach myself the skills needed to present actionable results to an audience. I hope you enjoy some of my insights.

To Contact Me

Send me an email at: swright22@wooster.edu

Message me on LinkedIn: www.linkedin.com/in/swright22

Check out my blog: www.datagirlz19.github.io

Project 1 - Predicting Low Income Households

Photo by Fredy Martinez on Unsplash

Description:

In 2020, the US census recorded information on over 48,000 individuals, which was consolidated into the dataset provided. We were tasked with creating a model to predict whether someone has an annual income of over $50,000.

To do this, we first performed basic data manipulation on the variables in the dataset to explore the information and gain some insight. Most of the variables were categorical, so we examined the unique categories of each one. We found that several categories overlapped and merged them to reduce the number of categories, which reduces potential complications in modeling.
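The category-merging step described above can be sketched as follows. This is a minimal illustration in Python rather than the project's own R code, and the labels in `EDUCATION_GROUPS` are hypothetical, since the dataset's actual category names are not listed here.

```python
from collections import Counter

# Hypothetical mapping from raw labels to broader groups (assumed for illustration;
# the real dataset's categories are not given in this README).
EDUCATION_GROUPS = {
    "Preschool": "No diploma",
    "1st-4th": "No diploma",
    "5th-6th": "No diploma",
    "HS-grad": "High school",
    "Some-college": "Some college",
    "Bachelors": "Degree",
    "Masters": "Degree",
}


def collapse_categories(values, mapping, default="Other"):
    """Replace each raw label with its broader group; unmapped labels become `default`."""
    return [mapping.get(value, default) for value in values]


raw = ["Preschool", "HS-grad", "Masters", "Bachelors", "1st-4th", "Doctorate"]
collapsed = collapse_categories(raw, EDUCATION_GROUPS)

# Compare how many distinct categories remain after merging.
print(len(set(raw)), "->", len(set(collapsed)))
print(Counter(collapsed))
```

Sending unmapped labels to a single `default` group is one possible design choice; it keeps rare categories from each becoming their own level in the model.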

The purpose of this project is to predict whether someone has an annual income of over $50,000.

The annual census is taken to help governments provide resources to families in need. However, the data is not always accurate or complete, for example when leadership changes the definition of a low-income family or when families fill out the census with inaccurate or incomplete information. In cases like these, it is important to keep track of low-income families so that the government can find ways to provide them with the necessary aid. This project aims to identify low-income households (those making under $50,000 annually) from specific predictors so that proper support can be given.

Skills Used:

Methods Used

  • Inferential Statistics
  • Machine Learning
  • Data Visualization
  • Predictive Modeling

Technologies

  • R

Project 2 - The College of Wooster Admissions Rate

Description:

The purpose of this project is to determine who is more likely to apply to Wooster and how the college can increase those numbers.

The College of Wooster is a small liberal arts school in Ohio whose prestigious Independent Study Program draws students from around the globe. Despite this appeal, the college struggles to enroll domestic students: only 55% of the students who apply are admitted, and only 16% of the students who are admitted accept the offer. We were allowed to analyze a dataset of over 2,000 students in hopes of finding insights and proposing small solutions to increase the number of students who accept.
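To make the two rates above concrete, the short sketch below chains them into an overall share of applicants who end up accepting an offer. Only the 55% and 16% figures come from the text; the function name and the example applicant count are assumptions for illustration.

```python
admit_rate = 0.55  # share of applicants who are admitted (from the description above)
yield_rate = 0.16  # share of admitted students who accept (from the description above)

# Share of all applicants who are admitted AND accept the offer.
enroll_rate = admit_rate * yield_rate


def expected_acceptances(applicants, admit=admit_rate, yld=yield_rate):
    """Expected number of accepted offers for a given number of applicants."""
    return applicants * admit * yld


print(f"Overall share of applicants who accept: {enroll_rate:.1%}")
print(f"Per 1,000 applicants: about {round(expected_acceptances(1000))} acceptances")
```

Under these rates, roughly 8.8% of applicants end up accepting, so a small improvement in the 16% acceptance rate moves the final number more than the same improvement in applications alone.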

Methods Used

  • Inferential Statistics
  • Data Visualization

Technologies

  • R

Project 3 - Classifying Natural Disaster Tweets

Description:

Twitter can be both a resource for finding urgent information and a tool for communicating useless information about sales and petty gossip.

Twitter can be a resource for urgent information, such as real-time reports of natural disasters and requests for help during times of crisis. Hashtags can be useful for sifting through the nonsense, but they can also be misused for sales and gossip. This project uses Natural Language Processing to classify messages posted on Twitter as reports of a natural disaster or as spam.

The purpose of this project is to use NLP to predict whether a tweet is reporting on a natural disaster/crisis or not.
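Before a classifier such as a random forest can label tweets, the text has to be turned into numeric features. The sketch below shows one common way to do that, a bag-of-words count matrix, using only the Python standard library. It is an illustration under assumptions, not the project's actual pipeline: the function names and the example tweets are made up.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a tweet, drop URLs, and split it into word tokens."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.findall(r"[a-z0-9']+", text)


def bag_of_words(tweets):
    """Return (vocabulary, count matrix) with one row of word counts per tweet."""
    tokenized = [tokenize(tweet) for tweet in tweets]
    vocab = sorted({token for tokens in tokenized for token in tokens})
    index = {token: i for i, token in enumerate(vocab)}
    rows = []
    for tokens in tokenized:
        row = [0] * len(vocab)
        for token, count in Counter(tokens).items():
            row[index[token]] = count
        rows.append(row)
    return vocab, rows


# Made-up example tweets, not taken from the project's data.
tweets = [
    "Wildfire spreading near the highway, evacuate now! http://example.com/x",
    "Huge #sale today, prices are on fire",
]
vocab, X = bag_of_words(tweets)
print(len(vocab), "distinct tokens;", len(X), "feature rows")
```

Each row of `X` could then be passed, together with a disaster/not-disaster label, to a classifier. The second example tweet shows why word counts alone can mislead: "fire" appears in a sales message.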

Methods Used

  • Inferential Statistics
  • Machine Learning
  • Natural Language Processing (NLP)
  • Random Forest Classification

Technologies

  • Python
  • Pandas, Seaborn, NumPy
  • Jupyter Notebooks
  • LaTeX
