This project uses Big Data technologies to analyze a large dataset (the dataset is assumed to be large) and build a classification model on it. Hadoop and Spark load and process the data, MongoDB serves as the data warehouse, and HDFS acts as the data lake.
The project starts with a large data source, such as a CSV file or another file format. The data is loaded into the Hadoop Distributed File System (HDFS) for scalable storage.
The next step is to create a sandboxed environment with Hadoop and Spark. From HDFS, the data is loaded into MongoDB, which provides scalability as the warehouse layer of the Big Data architecture.
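The ingestion step could be sketched with PySpark and the MongoDB Spark Connector as below. The connector version, connection URI, HDFS path, and database/collection names are illustrative assumptions, not details taken from the project; this requires a running Hadoop cluster and MongoDB instance, so it is a sketch rather than a runnable test.

```python
from pyspark.sql import SparkSession

# Build a Spark session with the MongoDB Spark Connector on the classpath.
# The connector coordinates/version and the URI are assumptions.
spark = (
    SparkSession.builder
    .appName("hdfs-to-mongodb")
    .config("spark.jars.packages",
            "org.mongodb.spark:mongo-spark-connector_2.12:10.2.1")
    .config("spark.mongodb.write.connection.uri", "mongodb://localhost:27017")
    .getOrCreate()
)

# Read the raw CSV from the HDFS data lake; the path is illustrative.
df = spark.read.csv("hdfs:///datalake/raw/data.csv",
                    header=True, inferSchema=True)

# Persist the records into the warehouse collection in MongoDB.
(df.write.format("mongodb")
   .option("database", "warehouse")     # assumed database name
   .option("collection", "records")     # assumed collection name
   .mode("overwrite")
   .save())
```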
The sandboxed environment is then used for exploratory data analysis with standard libraries and for feature selection.
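One common feature-selection approach during exploratory analysis is to keep only features that correlate with the target. The project does not specify its method, so the following is a minimal, self-contained sketch of correlation-based filtering; the column names, threshold, and toy data are all hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(columns, label, threshold=0.3):
    """Keep features whose |correlation with the label| meets the threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, label)) >= threshold]

# Toy data: one feature tracks the label, one is constant noise.
label = [0, 0, 1, 1, 1, 0]
columns = {
    "informative": [1, 2, 9, 8, 10, 1],   # strongly correlated with label
    "noise":       [5, 5, 5, 5, 5, 5],    # constant, zero correlation
}
print(select_features(columns, label))  # ['informative']
```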
Spark is used to run the analyses and to train and apply models on the altered dataset (undersampling was applied to the entire dataset to address class imbalance). The model is built using the results of the exploratory data analysis.
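The undersampling step could look like the following pure-Python sketch, which randomly downsamples the majority class to the minority class size. The column name, seed, and toy data are assumptions for illustration, not the project's actual code.

```python
import random

def undersample(rows, label_key="label", seed=42):
    """Randomly undersample every class down to the minority-class size.

    `rows` is a list of dicts; `label_key` names the class column.
    A fixed seed keeps the sampling reproducible.
    """
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    n_min = min(len(group) for group in by_class.values())
    rng = random.Random(seed)
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))
    return balanced

# Toy imbalanced dataset: 90 negatives, 10 positives.
rows = ([{"label": 0} for _ in range(90)]
        + [{"label": 1} for _ in range(10)])
balanced = undersample(rows)
print(len(balanced))  # 20: 10 rows per class
```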
An additional MapReduce job was added to demonstrate MapReduce skills.
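The MapReduce programming model behind that job can be illustrated with the classic word-count example, simulated here in plain Python (mapper, framework-style shuffle, reducer). This is a stand-in for the general pattern, not the project's actual job.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in the input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: sum all counts emitted for one word."""
    return key, sum(values)

lines = ["Spark and Hadoop", "Hadoop and HDFS"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'spark': 1, 'and': 2, 'hadoop': 2, 'hdfs': 1}
```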