This project uses a large stocks dataset of about 400 MB, which was loaded onto HDFS and read into a Spark DataFrame using Scala.
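The loading step could look roughly like the sketch below. It is not the project's actual code. It assumes the file is a CSV with a header row, one row per symbol per trading day, and hypothetical columns `symbol`, `date`, `open`, `high`, `low`, `close` and `volume`. The real dataset's column names and date format may differ.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types._

object LoadStocks {
  // Hypothetical schema: adjust names and types to match the real file.
  val stockSchema: StructType = StructType(Seq(
    StructField("symbol", StringType),
    StructField("date", DateType),
    StructField("open", DoubleType),
    StructField("high", DoubleType),
    StructField("low", DoubleType),
    StructField("close", DoubleType),
    StructField("volume", LongType)
  ))

  // Reads a header-prefixed CSV (from HDFS or any Hadoop-supported path)
  // into a DataFrame using the explicit schema above.
  def loadStocks(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")
      .option("dateFormat", "yyyy-MM-dd") // assumed date format
      .schema(stockSchema)
      .csv(path)
}
```

In a Zeppelin Spark paragraph, where `spark` is already defined, this might be called as `val stocks = LoadStocks.loadStocks(spark, "hdfs:///data/stocks.csv")`. The path is only a placeholder. An explicit schema is used instead of `inferSchema` because inference needs an extra pass over the whole file.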
Instead of running Spark directly on the GCP VM, it was launched from a Zeppelin notebook because of its user-friendly interface and easier error handling.
Commands were written to answer the following questions about the dataset:
1. Stocks with an average daily volume greater than 1 million shares
2. Top 3 stocks by volume for the year 2004
3. Top 3 stocks by volume whose symbol starts with “G”
4. Symbols whose closing price is greater than my age
5. Top 10 stocks with the largest intraday price change
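The questions above could be expressed with the DataFrame API roughly as follows. This is a sketch under stated assumptions, not the commands actually used in the notebook. It assumes hypothetical columns `symbol`, `date` (a date type), `high`, `low`, `close` and `volume`. It also reads "by volume" as total volume summed over the period, and "intraday price change" as a day's high minus low, keeping each symbol's largest day. The age threshold is a parameter because the source does not state it.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

object StockQueries {
  // Symbols whose average daily volume exceeds 1,000,000 shares.
  def highVolumeSymbols(stocks: DataFrame): DataFrame =
    stocks.groupBy("symbol")
      .agg(avg("volume").as("avg_volume"))
      .filter(col("avg_volume") > 1000000)

  // Top 3 symbols by total volume within one calendar year.
  def top3ByVolumeIn(stocks: DataFrame, yr: Int): DataFrame =
    stocks.filter(year(col("date")) === yr)
      .groupBy("symbol")
      .agg(sum("volume").as("total_volume"))
      .orderBy(desc("total_volume"))
      .limit(3)

  // Top 3 symbols by total volume among symbols starting with `prefix`.
  def top3ByVolumeWithPrefix(stocks: DataFrame, prefix: String): DataFrame =
    stocks.filter(col("symbol").startsWith(prefix))
      .groupBy("symbol")
      .agg(sum("volume").as("total_volume"))
      .orderBy(desc("total_volume"))
      .limit(3)

  // Distinct symbols that closed above `threshold` on at least one day.
  def symbolsClosingAbove(stocks: DataFrame, threshold: Double): DataFrame =
    stocks.filter(col("close") > threshold)
      .select("symbol")
      .distinct()

  // Top 10 symbols by their largest single-day (high - low) range.
  def top10IntradayChange(stocks: DataFrame): DataFrame =
    stocks.groupBy("symbol")
      .agg(max(col("high") - col("low")).as("max_intraday_change"))
      .orderBy(desc("max_intraday_change"))
      .limit(10)
}
```

Given a loaded DataFrame `stocks`, the second and third questions would then be `StockQueries.top3ByVolumeIn(stocks, 2004).show()` and `StockQueries.top3ByVolumeWithPrefix(stocks, "G").show()`. Under a different reading, such as close minus open for intraday change or average rather than total volume, only the aggregate expression needs to change.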