This repository contains almost all of the coursework I completed for the Applied Data Science course at NYU Abu Dhabi. The course focuses on using Python for data science applications and also covers machine learning, with hands-on experience in applying appropriate ML models to prediction and classification tasks.
The final project of the course involved working with a large dataset, which in our case was the MovieLens 1M dataset. Our project centered on finding a link between user demographics and movie genre preferences. We used Python's pandas, numpy, matplotlib, seaborn and scikit-learn libraries to:
- munge the data
- select or engineer appropriate features for our analysis
- visualize the data
- perform statistical analysis to find relations between our variables of interest
- develop predictive models based on our hypothesis
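
As a rough illustration of the munging and feature-engineering steps above, here is a minimal pandas sketch. It is not the project's actual code: the three small tables are made-up stand-ins for the MovieLens users, movies and ratings data, and the column names (`user_id`, `movie_id`, `genres`, and so on) are assumptions chosen for the example.

```python
import pandas as pd

# Tiny in-memory stand-ins for the three MovieLens tables.
# All rows and column names here are illustrative assumptions, not real data.
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "gender": ["F", "M", "M"],
    "age": [1, 25, 35],
    "occupation": [10, 16, 7],
})
movies = pd.DataFrame({
    "movie_id": [10, 20],
    "title": ["Movie A", "Movie B"],
    "genres": ["Comedy|Romance", "Action"],
})
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 3],
    "movie_id": [10, 20, 20, 10],
    "rating": [5, 3, 4, 2],
})

# Munge: join the three tables so each row is one rating with user and movie info.
merged = ratings.merge(users, on="user_id").merge(movies, on="movie_id")

# Feature engineering: split the pipe-delimited genre string and
# produce one row per (rating, genre) pair.
merged["genre"] = merged["genres"].str.split("|")
exploded = merged.explode("genre")

# Simple summary relating a demographic variable to genre preference:
# mean rating per gender and genre.
genre_by_gender = exploded.pivot_table(
    index="gender", columns="genre", values="rating", aggfunc="mean"
)
print(genre_by_gender)
```

A table like `genre_by_gender` can then be passed to seaborn or matplotlib for plotting, or used as a starting point for statistical tests.
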

In the end, we developed a multi-label classification model and an SVD-based collaborative filtering model to predict movie genres that users would like, given their age, gender and occupation.
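
To give a sense of what a multi-label genre classifier of this kind can look like, here is a minimal scikit-learn sketch. It is an assumption-laden illustration rather than the model from the project: the toy data is invented, and the choice of one-hot encoding with a one-vs-rest logistic regression is just one reasonable setup.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer, OneHotEncoder

# Invented demographic features; values are treated as categorical codes.
X = pd.DataFrame({
    "age": [1, 18, 25, 25, 35, 50],
    "gender": ["F", "M", "M", "F", "M", "F"],
    "occupation": [10, 4, 16, 7, 7, 1],
})

# Invented targets: the set of genres each user is assumed to like.
liked_genres = [
    ["Comedy", "Romance"],
    ["Action"],
    ["Action", "Comedy"],
    ["Romance"],
    ["Action"],
    ["Comedy", "Romance"],
]

# Turn the genre sets into a binary indicator matrix (one column per genre).
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(liked_genres)

# One-hot encode the categorical features, then fit one binary classifier per genre.
model = make_pipeline(
    OneHotEncoder(handle_unknown="ignore"),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(X, Y)

# Predict genre indicators for a hypothetical new user and map them back to names.
new_user = pd.DataFrame({"age": [25], "gender": ["F"], "occupation": [16]})
pred = model.predict(new_user)
print(mlb.inverse_transform(pred))
```

Each column of `Y` corresponds to one genre, so a user can be assigned several genres at once, which is what distinguishes multi-label from ordinary multi-class classification.
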