A simple project on the use of map and reduce in Hadoop.

MagdaleneHo/MapReduce

MapReduce

A simple project on the use of map and reduce in Hadoop. The movie dataset comes from Kaggle: https://www.kaggle.com/rounakbanik/the-movies-dataset?select=movies_metadata.csv. The project uses two files from that dataset: movies_metadata (with some columns removed) and ratings (about 26 million rows).

This project aims to gather insights about movie ratings and budgets through descriptive analysis, using the parallel processing capabilities of Hadoop MapReduce.

The objectives are:

  1. To explore the relationship between high budget movies and the average ratings.
  2. To characterise the behaviour of the users in this dataset, specifically whether they are generous in their ratings, by comparing similar datasets for 2017 and 2019.
  3. To identify the top-rated movies and average ratings for each movie.
  4. To find the popular movies in the dataset through the number of ratings.
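
Objectives 3 and 4 both reduce to grouping ratings by movie, so a single map and reduce pass can produce the rating count and the average rating together. The sketch below is not the project's actual code: it simulates the map, shuffle/sort, and reduce phases in plain Python so it can run without a Hadoop cluster. The function names (`map_rating`, `reduce_ratings`, `run_job`), the sample rows, and the column order `userId,movieId,rating,timestamp` are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def map_rating(line):
    """Map step: emit (movieId, rating) pairs for one CSV line of ratings."""
    fields = line.strip().split(",")
    # Assumed column order: userId,movieId,rating,timestamp
    if len(fields) < 3 or fields[0] == "userId":
        return []  # skip the header row and malformed lines
    try:
        return [(fields[1], float(fields[2]))]
    except ValueError:
        return []  # skip rows whose rating is not numeric


def reduce_ratings(movie_id, ratings):
    """Reduce step: collapse one movie's ratings into (id, count, average)."""
    ratings = list(ratings)
    return movie_id, len(ratings), sum(ratings) / len(ratings)


def run_job(lines):
    """Simulate map, shuffle/sort, and reduce in a single process."""
    mapped = [pair for line in lines for pair in map_rating(line)]
    mapped.sort(key=itemgetter(0))  # stands in for Hadoop's shuffle and sort
    return [
        reduce_ratings(movie_id, (rating for _, rating in group))
        for movie_id, group in groupby(mapped, key=itemgetter(0))
    ]


# Made-up sample rows, for illustration only
sample = [
    "userId,movieId,rating,timestamp",
    "1,31,2.5,1260759144",
    "2,31,3.5,835355493",
    "2,10,4.0,835355493",
]
results = run_job(sample)
for movie_id, count, average in results:
    print(movie_id, count, average)
```

Sorting `results` by the count gives a popularity ranking (objective 4), and sorting by the average gives the top-rated movies (objective 3). On a real cluster the sort between the two steps is done by the framework, and the mapper and reducer would read from standard input, for example under Hadoop Streaming.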
