Folasade-Ojo/Stocks-Dataset: This repository contains analysis done with Spark DataFrames on Google Cloud Platform using Zeppelin Notebook and HDFS

About the Dataset

This is a large stocks dataset of about 400 MB. It is loaded onto HDFS and then read into a Spark DataFrame using Scala.

Tools

Rather than running Spark directly on the GCP VM, it was initiated from a Zeppelin Notebook, which was chosen for its user-friendly interface and because errors are easier to spot and handle there.

Analysis

Loading the stocks dataset into a directory in Hadoop
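
This step is usually done from the VM shell with `hdfs dfs -mkdir -p` and `hdfs dfs -put`. The sketch below shows one possible equivalent from Scala using the Hadoop `FileSystem` API; it is not necessarily how the repository does it. The object name `LoadStocksToHdfs` and both paths are hypothetical placeholders.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object LoadStocksToHdfs {
  // Copies a local file into a target directory, creating the directory first.
  // Returns the path of the copied file inside that directory.
  def upload(fs: FileSystem, localFile: String, targetDir: String): Path = {
    val src = new Path(localFile)
    val dir = new Path(targetDir)
    fs.mkdirs(dir)                 // comparable to `hdfs dfs -mkdir -p`
    fs.copyFromLocalFile(src, dir) // comparable to `hdfs dfs -put`
    new Path(dir, src.getName)
  }

  def main(args: Array[String]): Unit = {
    // Uses whatever default file system the Hadoop configuration on the
    // classpath points at (HDFS on a configured cluster node).
    val fs = FileSystem.get(new Configuration())
    // Hypothetical paths: replace with the real local file and HDFS directory.
    val dest = upload(fs, "/home/user/stocks.csv", "/user/stocks")
    println(s"Uploaded to $dest, exists = ${fs.exists(dest)}")
  }
}
```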

Creating the Spark schema in Zeppelin Notebook
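
A minimal sketch of what an explicit schema might look like. The README does not list the columns, so every column name and type below is an assumption, as are the header row, the `yyyy-MM-dd` date format, and the helper name `StocksSchema`. Adjust them to match the real file.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types._

object StocksSchema {
  // Hypothetical column layout; the actual dataset may differ.
  val schema: StructType = StructType(Seq(
    StructField("exchange", StringType),
    StructField("symbol", StringType),
    StructField("date", DateType),
    StructField("open", DoubleType),
    StructField("high", DoubleType),
    StructField("low", DoubleType),
    StructField("close", DoubleType),
    StructField("volume", LongType),
    StructField("adj_close", DoubleType)
  ))

  // Reads a CSV file with the schema above instead of inferring types,
  // which avoids an extra pass over a file of this size.
  def load(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")           // assumption: the file has a header row
      .option("dateFormat", "yyyy-MM-dd") // assumption: ISO-style dates
      .schema(schema)
      .csv(path)
}
```

In a Zeppelin Spark paragraph, where `spark` is already defined, this could be called as `val stocks = StocksSchema.load(spark, "hdfs:///user/stocks/stocks.csv")`, with the path again being a placeholder.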

Business Questions

Commands were written to answer the following questions about the dataset.

Stocks with an average daily volume greater than 1 million shares.

Top 3 stocks by volume for the year 2004

Top 3 stocks by volume whose symbol starts with “G”

Symbols whose closing price is greater than my age

Top 10 stocks with the largest intraday price change
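
The five questions above could be expressed with the DataFrame API roughly as follows. This is a sketch, not the repository's actual queries. It assumes columns named `symbol`, `date`, `high`, `low`, `close`, and `volume`, reads "by volume" as total volume per symbol, reads "intraday price change" as the largest `high - low` seen for a symbol, and takes the age as a parameter because the README does not state it. The object and method names are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

object StockQueries {
  // 1. Symbols whose average daily volume exceeds 1 million shares.
  def highVolume(stocks: DataFrame): DataFrame =
    stocks.groupBy("symbol")
      .agg(avg("volume").as("avg_volume"))
      .filter(col("avg_volume") > 1000000)

  // 2. Top 3 symbols by total volume traded during 2004.
  def top3In2004(stocks: DataFrame): DataFrame =
    stocks.filter(year(col("date")) === 2004)
      .groupBy("symbol")
      .agg(sum("volume").as("total_volume"))
      .orderBy(desc("total_volume"))
      .limit(3)

  // 3. Top 3 symbols by total volume among symbols starting with "G".
  def top3StartingWithG(stocks: DataFrame): DataFrame =
    stocks.filter(col("symbol").startsWith("G"))
      .groupBy("symbol")
      .agg(sum("volume").as("total_volume"))
      .orderBy(desc("total_volume"))
      .limit(3)

  // 4. Distinct symbols that closed above the given age on at least one day.
  def closeAbove(stocks: DataFrame, age: Int): DataFrame =
    stocks.filter(col("close") > age)
      .select("symbol")
      .distinct()

  // 5. Top 10 symbols by their largest single-day high-to-low range.
  def top10IntradayChange(stocks: DataFrame): DataFrame =
    stocks.groupBy("symbol")
      .agg(max(col("high") - col("low")).as("max_intraday_change"))
      .orderBy(desc("max_intraday_change"))
      .limit(10)
}
```

Each method returns a DataFrame, so in Zeppelin the result can be inspected with `.show()`, for example `StockQueries.top3In2004(stocks).show()`.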

Repository files

README.md

The Queries in GCP and Zeppelin Notebook.txt
