Recent Grad of Statistics from Cal Poly SLO looking to become Data Analyst/Scientist - Data Projects

dho23/Portfolio

Projects

Hello! I'm Darren Ho, a recent Statistics graduate of California Polytechnic State University in San Luis Obispo. The coursework at Cal Poly was stimulating: it grew my passion for data and data-driven decision making, and left me eager to translate my classroom learning into real-world experience. The projects below were curated by the team at DataQuest and were designed to simulate working with real data in real-world scenarios while helping me learn and apply new data science skills.

Find me on: LinkedIn

Exploratory Data Analysis & Visualization

In this project, I acted as a data analyst for a company that builds Android and iOS apps. The goal was to analyze data from the Google Play Store and the App Store to help our developers understand what types of apps are likely to attract more users. The more users who see and engage with the ads, the better!

In this project, I worked with a dataset of submissions to the popular technology site Hacker News. I was interested in posts whose titles began with either Ask HN or Show HN, and compared the two types of posts to answer the following:

  • On average, do Ask HN or Show HN posts receive more comments?
  • On average, do posts created at a certain time receive more comments?
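The two questions above come down to grouped averages. Here is a minimal sketch of that calculation, assuming each post is reduced to a (title, comment count, hour created) tuple. The sample rows and the helper names `post_type` and `average_comments_by` are hypothetical and are not taken from the project itself.

```python
from collections import defaultdict

# Hypothetical sample rows: (title, number of comments, hour created).
posts = [
    ("Ask HN: How do you learn statistics?", 12, 15),
    ("Show HN: My first data dashboard", 4, 9),
    ("Ask HN: Best books on SQL?", 8, 15),
    ("Show HN: A tiny plotting library", 6, 21),
    ("Unrelated news story", 30, 9),
]

def post_type(title):
    """Return 'ask', 'show', or None based on the title prefix."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return None

def average_comments_by(rows, key_func):
    """Average the comment count of rows, grouped by key_func(row)."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in rows:
        key = key_func(row)
        if key is None:  # skip rows that belong to no group
            continue
        totals[key] += row[1]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}

# Question 1: average comments per post type.
avg_by_type = average_comments_by(posts, lambda row: post_type(row[0]))

# Question 2: average comments per creation hour (Ask HN posts only here).
ask_posts = [row for row in posts if post_type(row[0]) == "ask"]
avg_by_hour = average_comments_by(ask_posts, lambda row: row[2])
```

The same grouping helper serves both questions; only the key function changes.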

In this project, I worked with a dataset of used cars from eBay Kleinanzeigen, the classifieds section of the German eBay website. The aim was to clean the data and analyze the included used car listings.

In this project, I analyzed a dataset about the westbound traffic on the I-94 Interstate highway. John Hogue made the dataset available, and it can be downloaded from the UCI Machine Learning Repository. The goal of this analysis was to determine a few indicators of heavy traffic on the I-94, such as weather type, time of day, and time of the week. I also created data visualizations to make the results easier to interpret.

This project was based on Euro daily exchange rates from 1999 to 2021, with data from the European Central Bank. It was about applying what I had learned up to this point: exploring and cleaning the dataset, brainstorming an idea for storytelling data visualizations, and then coding it.

In this project, I worked with exit surveys from employees of the Department of Education, Training and Employment (DETE) and the Technical and Further Education (TAFE) institute in Queensland, Australia. I played the role of a data analyst to help answer the questions our "stakeholders" wanted to know. I found that extracting any meaningful insights from the data required many data cleaning tasks.

In this project, I worked with data from FiveThirtyEight to answer some questions about Star Wars (the data was collected prior to the 7th installment in the Star Wars Universe). In particular, I wondered: does the rest of America realize that "The Empire Strikes Back" is clearly the best film of the Star Wars franchise? The project included data cleaning, exploratory analysis, and some accompanying data visualizations.

Probability & Statistics

In this project, I looked into findings from a data journalist named Walt Hickey, who analyzed movie ratings data and found strong evidence to suggest that Fandango's rating system was biased and dishonest. Fandango displays a 5-star rating system on their website, where the minimum rating is 0 stars and the maximum is 5 stars. Hickey found that there's a significant discrepancy between the number of stars displayed to users and the actual rating. I wanted to further analyze recent movie ratings data to determine whether there had been any changes in Fandango's rating system after Hickey's analysis.

In this project, I acted as if I were working for an e-learning company that offers courses on programming. Most of our courses are on web and mobile development, but we also cover many other domains, like data science, game development, etc. We want to promote our product and we'd like to invest some money in advertisement. Our goal in this project is to find out the two best markets to advertise our product in.

In this project, I contributed to the development of a fictional mobile app for a medical institute that aimed to prevent and treat gambling addictions. The app would help gambling addicts better estimate their chances of winning. The team needed help creating the logical core of the app and calculating probabilities. The main purpose of this project was to practice applying probability and combinatorics (permutations and combinations) concepts in a setting that simulates a real-world scenario.
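As an illustration of the kind of calculation such an app could perform, the sketch below uses combinations to compute lottery odds. The 6-from-49 game format and the function names are assumptions made for this example; the project description does not specify them.

```python
from math import comb

def one_ticket_probability(total_numbers=49, numbers_drawn=6):
    """Chance that a single ticket matches every drawn number.

    The 6-from-49 default is an assumed game format.
    """
    return 1 / comb(total_numbers, numbers_drawn)

def probability_of_k_matches(k, total_numbers=49, numbers_drawn=6):
    """Chance that a single ticket matches exactly k of the drawn numbers."""
    # Choose k of the winning numbers and the rest from the losing numbers.
    ways_to_match = comb(numbers_drawn, k) * comb(
        total_numbers - numbers_drawn, numbers_drawn - k
    )
    return ways_to_match / comb(total_numbers, numbers_drawn)

jackpot_odds = one_ticket_probability()  # 1 in comb(49, 6) = 1 in 13,983,816
three_match_odds = probability_of_k_matches(3)
```

Because order does not matter in a draw like this, combinations rather than permutations give the number of possible outcomes.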

In this project, I studied the practical side of the multinomial Naive Bayes algorithm by building a spam filter for SMS messages. To classify messages as spam or non-spam, I needed the computer to learn how humans classify messages, use that human knowledge to estimate probabilities for a new message, and classify the new message based on the probability values. The goal was to create a spam filter that classifies new messages with high accuracy.
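The three steps described above (learn from labeled messages, estimate probabilities, classify) can be sketched in a few lines. This is a minimal illustration with a hypothetical four-message training set, whitespace tokenization, and Laplace smoothing with `alpha=1`. All of these are assumptions made for the example, not details of the project's implementation.

```python
import math
from collections import Counter

# Hypothetical toy training set of (message, label) pairs.
training = [
    ("win cash prize now", "spam"),
    ("claim your free prize", "spam"),
    ("are we meeting for lunch", "non-spam"),
    ("see you at lunch tomorrow", "non-spam"),
]

def train(messages, alpha=1):
    """Count words per label; alpha is the Laplace smoothing constant."""
    word_counts = {}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return word_counts, label_counts, vocabulary, alpha

def classify(text, model):
    """Return the label with the highest log posterior score."""
    word_counts, label_counts, vocabulary, alpha = model
    total_messages = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Start from the log prior P(label).
        score = math.log(label_counts[label] / total_messages)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:  # ignore words never seen in training
                continue
            # Add the smoothed log likelihood P(word | label).
            score += math.log(
                (word_counts[label][word] + alpha)
                / (n_words + alpha * len(vocabulary))
            )
        scores[label] = score
    return max(scores, key=scores.get)

model = train(training)
prediction = classify("free cash prize", model)
```

Working in log space avoids multiplying many small probabilities together, which would otherwise underflow on longer messages.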

Jeopardy is a popular TV show in the US where participants answer questions to win money. It's been running for many years, and is a major force in pop culture. Imagine that we want to compete on Jeopardy, and we're looking for any way to win. In this project, I worked with a dataset of Jeopardy questions to figure out some patterns in the questions that could help us win. This project incorporated hypothesis testing through significance testing and chi-squared tests.
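To make the chi-squared step concrete, here is a minimal sketch that compares how often a term appears in two groups of questions against the counts expected if the term were spread evenly. The split into high-value and low-value questions and all the counts below are hypothetical, chosen only to illustrate the arithmetic.

```python
def expected_counts(term_total, high_total, low_total):
    """Counts expected if the term appeared at the same rate in both groups."""
    total = high_total + low_total
    return [term_total * high_total / total, term_total * low_total / total]

def chi_squared_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: the term appears in 8 high-value and 2 low-value
# questions, in a dataset of 400 high-value and 600 low-value questions.
observed = [8, 2]
expected = expected_counts(sum(observed), high_total=400, low_total=600)
statistic = chi_squared_statistic(observed, expected)

# 3.841 is the chi-squared critical value for 1 degree of freedom at the
# 0.05 significance level.
is_significant = statistic > 3.841
```

With expected counts this small, the chi-squared approximation is rough, so a result like this would only be a hint that a term deserves a closer look.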

Machine Learning

I explored the fundamentals of machine learning using the k-nearest neighbors algorithm. So in this project, I practiced the machine learning workflow I had learned so far to predict a car's market price using its attributes. The dataset I worked with contained info on various cars. For each car, I had info about the technical aspects of the vehicle such as the motor's displacement, the weight of the car, the miles per gallon, how fast the car accelerates, and more.
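The core idea of k-nearest neighbors for price prediction fits in a few lines: find the k most similar cars and average their prices. The sketch below assumes two numeric features that have already been scaled to a comparable range. The sample rows and the name `knn_predict` are hypothetical.

```python
import math

# Hypothetical rows: ((scaled displacement, scaled weight), price).
cars = [
    ((0.20, 0.30), 8000),
    ((0.25, 0.35), 9000),
    ((0.60, 0.70), 16000),
    ((0.65, 0.75), 18000),
    ((0.90, 0.95), 30000),
]

def knn_predict(train_rows, query, k=3):
    """Predict a price as the mean price of the k nearest training cars."""
    # math.dist computes Euclidean distance (Python 3.8+).
    nearest = sorted(train_rows, key=lambda row: math.dist(row[0], query))[:k]
    return sum(price for _, price in nearest) / k

predicted_price = knn_predict(cars, (0.22, 0.32), k=2)
```

Scaling matters here because Euclidean distance would otherwise be dominated by whichever feature has the largest raw values, such as weight.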

In this project, I practiced what I had learned by exploring ways to improve the models I built through feature engineering, feature selection, and training & testing. I worked with housing data for the city of Ames, Iowa from 2006 to 2010. I was able to build a pipeline of functions that allowed for quick iterations on different models.

In this project, I tried to predict the total number of bikes people rented in a given hour. To accomplish this, I created a few different machine learning models and evaluated their performance. I calculated features, split the data into train and test sets, and then applied linear regression, decision tree, and random forest algorithms to the data.
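The train, predict, and evaluate loop described above can be sketched with the simplest of the three models. The example below fits a one-feature least-squares line on a training set and scores it on a held-out test set using root mean squared error. The data values, the single feature, and the choice of RMSE as the metric are assumptions made for illustration.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares fit for a single feature: y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical data: one feature value per hour and the rentals in that hour.
train_x, train_y = [1, 2, 3, 4], [15, 25, 35, 45]
test_x, test_y = [5, 6], [57, 63]

slope, intercept = fit_line(train_x, train_y)
test_predictions = [slope * x + intercept for x in test_x]
test_error = rmse(test_y, test_predictions)
```

Scoring every model on the same held-out test set is what makes the error values comparable across linear regression, decision trees, and random forests.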

In a previous lesson, I had learned how adding hidden layers of neurons to a neural network can improve its ability to capture nonlinearity in the data. So in this project, I explored why image classification is a hard task, observed the limitations of traditional machine learning models for image classification, and then trained, tested, and improved a few different deep neural networks for image classification. I wanted to explore the effectiveness of deep, feedforward neural networks at classifying images.

In this project, I put together all that I had learned and created a data science workflow. Defining a workflow gave me a framework that made iterating on ideas quicker and easier, allowing me to work more efficiently. I explored a workflow to make competing in the Kaggle Titanic competition easier, using a pipeline of functions to reduce the number of dimensions I needed to focus on.
