This repository contains an analysis done with Spark DataFrames on Google Cloud Platform, using Zeppelin Notebook and HDFS.

Folasade-Ojo/Stocks-Dataset

About the Dataset

The stocks dataset is about 400 MB in size. It is loaded onto HDFS and then read into a Spark DataFrame using Scala.

Tools

Instead of running Spark directly on the GCP VM, the session was initiated in a Zeppelin notebook because of its user-friendly interface and the ease with which it handles errors.

Analysis

Loading the stocks dataset into a directory in Hadoop

image
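
The upload step can also be sketched in Scala with the Hadoop `FileSystem` API, which does roughly what `hdfs dfs -mkdir -p` followed by `hdfs dfs -put` does on the command line. This is a minimal sketch, not necessarily the method shown in the screenshot; the helper name `uploadToHdfs` and both path arguments are hypothetical.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper: copies a local file into a target directory.
// The file system is taken from the Hadoop configuration on the classpath,
// so on a cluster node it resolves to HDFS.
def uploadToHdfs(localFile: String, targetDir: String): Unit = {
  val fs = FileSystem.get(new Configuration())
  val target = new Path(targetDir)
  // Create the directory (and any missing parents) if it does not exist yet.
  if (!fs.exists(target)) fs.mkdirs(target)
  // Copy the file into the directory, keeping its original file name.
  fs.copyFromLocalFile(new Path(localFile), target)
}
```

Without a cluster configuration on the classpath, `FileSystem.get` falls back to the local file system, which is convenient for trying the helper out before running it against HDFS.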

Creating the Spark schema in Zeppelin Notebook

image
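
A schema definition of this kind might look like the sketch below. The column names and types (`symbol`, `date`, `open`, `high`, `low`, `close`, `volume`), the CSV format, the header row, and the date pattern are all assumptions, since the actual file layout is only visible in the screenshot; adjust them to match the real data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types._

// Assumed column layout for a daily stock price file.
val stocksSchema = StructType(Seq(
  StructField("symbol", StringType),
  StructField("date", DateType),
  StructField("open", DoubleType),
  StructField("high", DoubleType),
  StructField("low", DoubleType),
  StructField("close", DoubleType),
  StructField("volume", LongType)
))

// Reads a CSV file with the explicit schema instead of letting Spark infer
// the types, which avoids an extra pass over a file of this size.
def loadStocks(spark: SparkSession, path: String): DataFrame =
  spark.read
    .option("header", "true")           // assumption: first line is a header
    .option("dateFormat", "yyyy-MM-dd") // assumption: ISO-style dates
    .schema(stocksSchema)
    .csv(path)
```

In a Zeppelin Spark paragraph the `spark` session is already provided, so a call such as `loadStocks(spark, path)` with the HDFS path of the uploaded file would return the DataFrame used in the questions that follow.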

Business Questions

Commands were written to answer the following questions about the dataset.

Stocks with an average daily volume greater than 1 million shares.

image
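
One way to express this question with the DataFrame API is sketched below. It assumes one row per symbol per trading day and columns named `symbol` and `volume`; the function name and the `threshold` parameter are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{avg, col}

// Averages the daily volume per symbol and keeps the symbols whose
// average exceeds the threshold (1 million shares by default).
def highVolumeStocks(stocks: DataFrame, threshold: Double = 1000000.0): DataFrame =
  stocks
    .groupBy("symbol")
    .agg(avg("volume").as("avg_daily_volume"))
    .filter(col("avg_daily_volume") > threshold)
    .orderBy(col("avg_daily_volume").desc)
```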

Top 3 stocks by volume for the year 2004

image
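
A possible formulation is sketched below. It interprets "by volume" as the total volume traded during the year, which is an assumption, as are the column names `symbol`, `date`, and `volume` and the function and parameter names.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, sum, year}

// Keeps only rows from the target year, sums the volume per symbol,
// and returns the n symbols with the highest totals.
def topByVolumeInYear(stocks: DataFrame, targetYear: Int = 2004, n: Int = 3): DataFrame =
  stocks
    .filter(year(col("date")) === targetYear)
    .groupBy("symbol")
    .agg(sum("volume").as("total_volume"))
    .orderBy(col("total_volume").desc)
    .limit(n)
```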

Top 3 stocks by volume whose symbol starts with “G”

image
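
This question could be answered along the lines below. The sketch again assumes that "by volume" means total volume and that the columns are named `symbol` and `volume`; the function and parameter names are hypothetical. Note that `startsWith` is case-sensitive, so lowercase symbols would not match.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, sum}

// Filters symbols by prefix, sums the volume per symbol,
// and returns the n symbols with the highest totals.
def topByVolumeWithPrefix(stocks: DataFrame, prefix: String = "G", n: Int = 3): DataFrame =
  stocks
    .filter(col("symbol").startsWith(prefix))
    .groupBy("symbol")
    .agg(sum("volume").as("total_volume"))
    .orderBy(col("total_volume").desc)
    .limit(n)
```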

Symbols whose closing price is greater than my age

image
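
A sketch of this filter follows. The age is not stated in the text, so it is left as the hypothetical parameter `age`; the sketch also assumes a `close` column and treats a symbol as matching if its closing price exceeded the age on at least one day.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Returns each symbol that closed above the given age at least once.
def symbolsClosingAbove(stocks: DataFrame, age: Int): DataFrame =
  stocks
    .filter(col("close") > age)
    .select("symbol")
    .distinct()
```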

Top 10 stocks with the largest intraday price change

image
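
The ranking could be computed as sketched below. "Intraday price change" is interpreted here as the daily high minus the daily low, with each symbol ranked by its single largest daily range; both are assumptions (the difference between `close` and `open` would be another reasonable reading), as are the column, function, and parameter names.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, max}

// Computes the daily high-low range, takes the largest range per symbol,
// and returns the n symbols with the biggest values.
def largestIntradayChange(stocks: DataFrame, n: Int = 10): DataFrame =
  stocks
    .withColumn("intraday_change", col("high") - col("low"))
    .groupBy("symbol")
    .agg(max("intraday_change").as("max_intraday_change"))
    .orderBy(col("max_intraday_change").desc)
    .limit(n)
```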
