HariRaagavTR/big-data-analysis


UE19CS322 : Big Data - Assignment 1

Aim of the assignment:

To read a data set and perform specific tasks using the Map-Reduce framework of the Hadoop Ecosystem.

Language used: Python 3.8.x
Datasets: 5% and 15%

Steps to run tasks:

To run the code locally on your system without using Hadoop, use the following command:

cat <path-to-json-file-dataset> | python3 mapper.py [command_line_arguments] | sort -k 1,1 | python3 reducer.py [command_line_arguments]
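As a rough illustration of what the mapper half of that pipeline does (not the actual mapper.py, which parses the assignment's JSON dataset according to its own command-line arguments), a hypothetical mapper might emit one tab-separated key/count pair per JSON record. The field name "subreddit" below is purely an assumed placeholder for whatever field the task keys on:

```python
import json
import sys

def map_line(line):
    # Parse one JSON record per line and emit tab-separated key/value
    # pairs. The field name "subreddit" is hypothetical -- substitute
    # whatever field the assignment task actually keys on.
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return  # skip malformed lines rather than crash the job
    yield f"{record.get('subreddit', 'UNKNOWN')}\t1"

if __name__ == "__main__":
    for line in sys.stdin:
        for pair in map_line(line):
            print(pair)
```

The `sort -k 1,1` stage then groups these pairs by key, which is exactly the guarantee the reducer relies on.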

To run the code on Hadoop HDFS on your local system:

  1. Start Hadoop on your local system.

  2. Create a directory within HDFS to store the dataset file:
hdfs dfs -mkdir /<folder-name>

  3. Create a folder called input inside it to hold the dataset:
hdfs dfs -mkdir /<folder-name>/input

  4. Add the JSON dataset file to the input directory created in the previous step:
hdfs dfs -put <path-to-json-file> /<folder-name>/input

  5. Verify that the JSON file was added successfully:
hdfs dfs -ls /<folder-name>/input

  6. Run the streaming job with the command below. Note that the output folder must NOT exist when running this command; Hadoop creates it internally.
hadoop jar <path-to-streaming-jar-file> -input /<folder-name>/input -output /<folder-name>/output -file <path-to-mapper-file> -file <path-to-reducer-file> -mapper "python3 mapper.py [command_line_arguments]" -reducer "python3 reducer.py"

  7. Once the job finishes, view the output using the following command:
hdfs dfs -cat /<folder-name>/output/part-00000
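The streaming job above feeds the reducer its input already sorted by key (Hadoop's shuffle provides this; locally, `sort -k 1,1` plays the same role). A minimal sketch of the classic streaming-reducer pattern that exploits this guarantee, assuming tab-separated key/value lines and integer counts (again an illustration, not the assignment's reducer.py):

```python
import sys

def reduce_stream(lines):
    # Input arrives sorted by key, so all lines for a key are
    # contiguous: keep a running total and flush it whenever the
    # key changes, then flush the final key at end of input.
    current_key, total = None, 0
    for line in lines:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current_key:
            if current_key is not None:
                yield f"{current_key}\t{total}"
            current_key, total = key, 0
        total += int(value)
    if current_key is not None:
        yield f"{current_key}\t{total}"

if __name__ == "__main__":
    for out in reduce_stream(sys.stdin):
        print(out)
```

This single-pass pattern is why the output folder's part-00000 file is itself sorted by key.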

Contributors:

Hari Raagav T R
Manasa S M
Lakshmi Narayan P

About

Assignment for "Big Data" course in PESU.
