devals94/hadoop-proof-of-concept

hadoop-proof-of-concept

This repository contains proofs of concept for several Hadoop technologies: MapReduce, Hive, Pig, and Sqoop.

Hadoop MapReduce is demonstrated on the IPL dataset.

The objective with MapReduce is to determine:

  1. The total number of matches played in every season.
  2. The number of matches played in a particular stadium.
  3. The toss decision: how many times the toss winner chose to bat or to field, from season 1 to season 9.
  4. The number of matches played by a particular team at a particular stadium.
  5. The total number of matches played by every team.
  6. The total number of matches won by every team.
  7. How many times a team has won both the toss and the match at a particular stadium.
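
The counting objectives above all follow the same map, shuffle, and reduce pattern. As a rough illustration of the first one (matches per season), here is a minimal plain-Python sketch of that pattern. It does not use Hadoop and is not the code in this repository. The column layout (`id,season,venue,winner`), the function names, and the sample rows are assumptions made up for illustration; they are not real IPL records.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in an ASSUMED layout: id,season,venue,winner.
# These rows are placeholders, not real IPL results.
SAMPLE_CSV = """id,season,venue,winner
1,2008,Stadium A,Team X
2,2008,Stadium B,Team Y
3,2009,Stadium A,Team X
"""


def map_phase(rows):
    """Emit a (season, 1) pair for every match record."""
    for row in rows:
        yield row["season"], 1


def shuffle_phase(pairs):
    """Group emitted values by key, as a MapReduce framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def matches_per_season(csv_text):
    """Run the three phases in memory over CSV text."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(matches_per_season(SAMPLE_CSV))
```

In this sketch, changing the key emitted in `map_phase` (for example to `row["venue"]`, or to a `(winner, venue)` tuple) would cover the per-stadium and per-team variants of the same count.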

Hive is demonstrated on the McDonald's dataset.

The objective with Hive is to determine:

  1. The total count from the menu.
  2. The max calories of different categories.
  3. The top 10 items with the highest calories.
  4. The top 10 items with the highest sugars.
  5. The top 10 items with the highest proteins.
  6. The top 10 category-item combinations with the highest Vitamin A and Vitamin C.
  7. To partition the menu by category (for example: Category = Breakfast, Category = Beverages, Category = Desserts).
  8. To pick any value from a bucket after partitioning.

Pig is demonstrated on the 2016 Olympics in Rio de Janeiro dataset.

The objectives with Pig are to:

  1. Find the total participants by country.
  2. Find the total male and female participants.
  3. Find the total male participants and the total female participants per country.
  4. Find the total gold and silver medals won.
  5. Find the oldest participant.
  6. Find the youngest participant.
  7. Find the number of participants for a particular sport and country.
  8. Find the total participants per sport.

Sqoop is demonstrated on the Loandata dataset.

The objective with Sqoop is:

  1. To show export and import commands using Sqoop.
