Projects

Hello! I'm Darren Ho, a recent Statistics graduate of California Polytechnic State University in San Luis Obispo. The coursework at Cal Poly was stimulating: it grew my passion for data and data-driven decision making, and it made me eager to translate what I learned in the classroom into real-world experience. The projects below were curated by the team at DataQuest and designed to simulate working with real data in real-world scenarios while helping me learn and apply new data science skills.

Find me on:

Exploratory Data Analysis & Visualization

📱 Profitable App Profiles for the App Store & Google Play Markets -

I acted as a data analyst for a company that builds Android and iOS apps. The goal of this project was to analyze data from Google Play and the App Store to help our developers understand what types of apps are likely to attract more users: the more users who see and engage with the ads, the better!

Exploring Hacker News Posts -

In this project, I worked with a dataset of submissions to the popular technology site Hacker News. I was interested in posts with titles that began with either Ask HN or Show HN, and wanted to compare the two types of posts to determine the following:

On average, do Ask HN or Show HN posts receive more comments?
On average, do posts created at a certain time receive more comments?
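
As a rough illustration of the first question (a sketch, not the code from the project notebook), the comparison comes down to grouping posts by title prefix and averaging their comment counts. The rows below are made up, and the helper names `post_type` and `average_comments` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (title, number of comments) rows; the real dataset has more columns.
posts = [
    ("Ask HN: How do you learn statistics?", 12),
    ("Show HN: My weekend project", 4),
    ("Ask HN: Best laptop for data work?", 8),
    ("Show HN: A tiny plotting library", 6),
    ("A regular news story", 30),
]


def post_type(title):
    """Return 'ask', 'show', or None based on how the title begins."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return None


def average_comments(rows):
    """Average comment count per post type, ignoring all other posts."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for title, comments in rows:
        kind = post_type(title)
        if kind is not None:
            totals[kind] += comments
            counts[kind] += 1
    return {kind: totals[kind] / counts[kind] for kind in totals}


averages = average_comments(posts)
print(averages)
```

The second question could be handled the same way by grouping on the hour of the post's creation time instead of the title prefix.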

🚙 Exploring eBay Car Sales Data -

In this project, I worked with a dataset of used cars from eBay Kleinanzeigen, the classifieds section of the German eBay website. The aim of this project was to clean the data and analyze the included used car listings.
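
A typical cleaning step for listing data like this is converting text columns to numbers. The sketch below assumes prices are stored as strings such as `$5,000` and odometer readings as `150,000km`; those formats, the sample rows, and the helper names are assumptions for illustration and are not taken from the project notebook.

```python
def clean_price(raw):
    """Convert a string such as '$5,000' to the integer 5000."""
    return int(raw.replace("$", "").replace(",", ""))


def clean_odometer(raw):
    """Convert a string such as '150,000km' to the integer 150000."""
    return int(raw.replace("km", "").replace(",", ""))


# Hypothetical listings in the assumed raw format.
listings = [
    {"price": "$5,000", "odometer": "150,000km"},
    {"price": "$0", "odometer": "125,000km"},
    {"price": "$18,300", "odometer": "70,000km"},
]

cleaned = [
    {
        "price": clean_price(row["price"]),
        "odometer_km": clean_odometer(row["odometer"]),
    }
    for row in listings
]

# Drop listings priced at zero, which are treated here as implausible.
plausible = [row for row in cleaned if row["price"] > 0]
print(plausible)
```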

Finding Heavy Traffic Indicators on I-94 -

In this project, I analyzed a dataset about westbound traffic on the I-94 Interstate highway. John Hogue made the dataset available, and it can be downloaded from the UCI Machine Learning Repository. The goal of this analysis was to determine a few indicators of heavy traffic on I-94, such as weather type, time of day, or time of the week. I also created data visualizations to give a better idea of what the results tell us.

💰 Storytelling Data Visualization on Exchange Rates -

This project was based on daily Euro exchange rates from 1999 to 2021, sourced from the European Central Bank. It was about applying what I had learned up to this point: exploring and cleaning the dataset, brainstorming an idea for a storytelling data visualization, and then coding it.

Employee Exit Surveys -

In this project, I worked with exit surveys from employees of the Department of Education, Training and Employment (DETE) and the Technical and Further Education (TAFE) institute in Queensland, Australia. I played the role of a data analyst answering the questions our "stakeholders" wanted to know. I found that extracting any meaningful insights from the data first required many data cleaning tasks.

Star Wars Survey -

In this project, I worked with data from FiveThirtyEight to answer some questions about Star Wars (the data was collected before the 7th installment in the Star Wars universe). In particular, I wondered: does the rest of America realize that "The Empire Strikes Back" is clearly the best film of the Star Wars franchise? The project included data cleaning, exploratory analysis, and some accompanying data visualizations.

Probability & Statistics

🎥 Investigating Fandango Movie Ratings -

In this project, I looked into findings from a data journalist named Walt Hickey, who analyzed movie ratings data and found strong evidence to suggest that Fandango's rating system was biased and dishonest. Fandango displays a 5-star rating system on their website, where the minimum rating is 0 stars and the maximum is 5 stars. Hickey found that there's a significant discrepancy between the number of stars displayed to users and the actual rating. I wanted to further analyze recent movie ratings data to determine whether there had been any changes in Fandango's rating system after Hickey's analysis.

💻 Finding the Best Markets to Advertise In -

In this project, I acted as if I were working for an e-learning company that offers courses on programming. Most of our courses are on web and mobile development, but we also cover many other domains, like data science and game development. We want to promote our product and would like to invest some money in advertising. Our goal in this project was to find the two best markets to advertise our product in.

Mobile App for Lottery Addiction -

In this project, I contributed to the development of a fictional mobile app being built by a medical institute that aims to prevent and treat gambling addiction. The app would help gambling addicts better estimate their chances of winning. The team needed help creating the logical core of the app and calculating probabilities. The main purpose of this project was to practice applying probability and combinatorics (permutations and combinations) concepts in a setting that simulates a real-world scenario.
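
To give a sense of the combinatorics involved, here is a minimal sketch that assumes a lottery in which 6 numbers are drawn without replacement from a pool of 49. That format and the function names are assumptions for illustration, not details taken from the project.

```python
from math import comb


def one_ticket_probability(numbers_drawn=6, pool_size=49):
    """Chance that a single ticket matches every winning number."""
    return 1 / comb(pool_size, numbers_drawn)


def multi_ticket_probability(n_tickets, numbers_drawn=6, pool_size=49):
    """Chance of winning when holding n_tickets distinct tickets."""
    return n_tickets / comb(pool_size, numbers_drawn)


def probability_of_matches(k, numbers_drawn=6, pool_size=49):
    """Chance that a ticket matches exactly k of the winning numbers."""
    # Choose k of the winning numbers and the rest from the non-winning ones.
    ways = comb(numbers_drawn, k) * comb(pool_size - numbers_drawn, numbers_drawn - k)
    return ways / comb(pool_size, numbers_drawn)


print(f"Possible outcomes: {comb(49, 6):,}")
print(f"One ticket: {one_ticket_probability():.10f}")
print(f"Exactly 3 matches: {probability_of_matches(3):.4f}")
```

Under these assumptions there are 13,983,816 possible outcomes, so a single ticket has roughly a 1 in 14 million chance of winning the top prize.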

📱 Building a Spam Filter with Naive Bayes -

In this project, I studied the practical side of the multinomial Naive Bayes algorithm by building a spam filter for SMS messages. To classify messages as spam or non-spam, I needed the computer to learn how humans classify messages, use that human knowledge to estimate probabilities for a new message, and classify the new message based on those probability values. The goal was to create a spam filter that classifies new messages with high accuracy.
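
The three steps above can be sketched with a toy multinomial Naive Bayes classifier. This is a minimal illustration under stated assumptions: the four training messages are invented, Laplace smoothing with `alpha=1` is one common choice, and words never seen in training are simply skipped. It is not the project's actual implementation.

```python
import math
from collections import Counter

# Hypothetical hand-labelled messages ("ham" means non-spam).
training = [
    ("spam", "win money now"),
    ("spam", "free prize win"),
    ("ham", "are we meeting now"),
    ("ham", "see you at lunch"),
]


def train(messages, alpha=1):
    """Count words per label so probabilities can be estimated later."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for label, text in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocabulary, alpha


def classify(text, model):
    """Return the label with the larger log posterior score."""
    word_counts, label_counts, vocabulary, alpha = model
    total_messages = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        n_words = sum(word_counts[label].values())
        # Start from the log prior, then add one log likelihood per word.
        score = math.log(label_counts[label] / total_messages)
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # words never seen in training are ignored
            smoothed = (word_counts[label][word] + alpha) / (
                n_words + alpha * len(vocabulary)
            )
            score += math.log(smoothed)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(training)
print(classify("win free money", model))
print(classify("see you at lunch", model))
```

Summing logarithms instead of multiplying raw probabilities avoids numerical underflow on longer messages.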

Winning Jeopardy -

Jeopardy is a popular TV show in the US where participants answer questions to win money. It's been running for many years, and is a major force in pop culture. Imagine that we want to compete on Jeopardy, and we're looking for any way to win. In this project, I worked with a dataset of Jeopardy questions to figure out some patterns in the questions that could help us win. This project incorporated hypothesis testing through significance testing and chi-squared tests.
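
For readers unfamiliar with the chi-squared test, the statistic compares observed counts with the counts expected if there were no relationship. The sketch below uses invented counts for how often a term appears in high-value versus low-value questions; the numbers and variable names are hypothetical.

```python
def chi_squared(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical totals: 400 high-value and 600 low-value questions overall.
high_total, low_total = 400, 600
total = high_total + low_total

# Hypothetical counts of questions containing one particular term.
observed = [30, 50]  # [high-value, low-value]
term_total = sum(observed)

# If the term were unrelated to question value, its occurrences would
# split in the same proportion as the questions themselves.
expected = [term_total * high_total / total, term_total * low_total / total]

statistic = chi_squared(observed, expected)
print(round(statistic, 4))
```

With one degree of freedom, a statistic below about 3.84 is not significant at the 5% level, so these made-up counts would give no evidence that the term favors high-value questions.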

Machine Learning

🚗 Predicting Car Prices -

I explored the fundamentals of machine learning using the k-nearest neighbors algorithm. In this project, I practiced the machine learning workflow I had learned so far to predict a car's market price from its attributes. The dataset contained information on various cars, including technical aspects of each vehicle such as the motor's displacement, the weight of the car, the miles per gallon, how fast the car accelerates, and more.
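
The idea behind k-nearest neighbors regression is to predict a price by averaging the prices of the most similar cars. Here is a minimal pure-Python sketch. The feature values (assumed to be already rescaled to the 0 to 1 range) and prices are invented, and `knn_predict` is a hypothetical helper rather than the project's code.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_features, train_prices, query, k=3):
    """Average the prices of the k training cars closest to the query."""
    distances = sorted(
        (euclidean(features, query), price)
        for features, price in zip(train_features, train_prices)
    )
    nearest = distances[:k]
    return sum(price for _, price in nearest) / len(nearest)


# Hypothetical cars described by two normalized features each.
train_features = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8), (0.5, 0.5)]
train_prices = [7000, 8000, 30000, 32000, 15000]

prediction = knn_predict(train_features, train_prices, query=(0.15, 0.15), k=2)
print(prediction)
```

Rescaling matters here because a feature measured in large units, such as weight, would otherwise dominate the distance calculation.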

🏘️ Predicting House Sale Prices -

In this project, I practiced what I had learned by exploring ways to improve the models I built through feature engineering, feature selection, and training & testing. I worked with housing data for the city of Ames, Iowa from 2006 to 2010. I was able to build a pipeline of functions that allowed for quick iterations on different models.
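
A pipeline of functions might look like the sketch below, where each stage can be swapped out independently to try a new idea. The column names, the sample rows, and the one-feature least-squares model are assumptions made for illustration; the project itself may have used different features and models.

```python
import math


def mean(values):
    return sum(values) / len(values)


def correlation(xs, ys):
    """Pearson correlation between two equally long lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def transform_features(rows):
    """Feature engineering stage: here, just drop rows with missing values."""
    return [row for row in rows if all(v is not None for v in row.values())]


def select_features(rows, target):
    """Feature selection stage: keep the column most correlated with the target."""
    ys = [row[target] for row in rows]
    candidates = [name for name in rows[0] if name != target]
    return max(
        candidates,
        key=lambda name: abs(correlation([row[name] for row in rows], ys)),
    )


def train_and_test(rows, feature, target):
    """Fit a one-feature least-squares line on the first half, return test RMSE."""
    half = len(rows) // 2
    train, test = rows[:half], rows[half:]
    xs = [row[feature] for row in train]
    ys = [row[target] for row in train]
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    squared_errors = [
        (intercept + slope * row[feature] - row[target]) ** 2 for row in test
    ]
    return math.sqrt(mean(squared_errors))


# Hypothetical housing rows; one has a missing value.
houses = [
    {"living_area": 1000, "garage_cars": 1, "sale_price": 100000},
    {"living_area": 1500, "garage_cars": 2, "sale_price": 150000},
    {"living_area": None, "garage_cars": 2, "sale_price": 120000},
    {"living_area": 2000, "garage_cars": 1, "sale_price": 200000},
    {"living_area": 1200, "garage_cars": 2, "sale_price": 120000},
    {"living_area": 1800, "garage_cars": 3, "sale_price": 180000},
    {"living_area": 900, "garage_cars": 1, "sale_price": 90000},
]

clean_rows = transform_features(houses)
best_feature = select_features(clean_rows, target="sale_price")
rmse = train_and_test(clean_rows, best_feature, target="sale_price")
print(best_feature, rmse)
```

Because each stage takes rows in and hands a result to the next, trying a different cleaning rule or selection criterion means editing one function and rerunning the same three calls.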

🚲 Predicting Bike Rentals -

In this project, I tried to predict the total number of bikes people rented in a given hour. To accomplish this, I created a few different machine learning models and evaluated their performance. I calculated features, split the data into train and test sets, and then applied linear regression, decision tree, and random forest algorithms to the data.

Building A Handwritten Digits Classifier -

In a previous lesson, I had learned how adding hidden layers of neurons to a neural network can improve its ability to capture nonlinearity in the data. In this project, I explored why image classification is a hard task, observed the limitations of traditional machine learning models for image classification, and trained, tested, and improved a few different deep neural networks for image classification. I wanted to explore the effectiveness of deep, feedforward neural networks at classifying images.

Creating a Kaggle Workflow -

In this project, I put together all that I had learned and created a data science workflow. Defining a workflow gave me a framework that made iterating on ideas quicker and easier, allowing me to work more efficiently. I explored a workflow to make competing in the Kaggle Titanic competition easier, using a pipeline of functions to reduce the number of dimensions I needed to focus on.


dho23/Portfolio
