This repository has been archived by the owner on Jun 13, 2023. It is now read-only.

digitalghost-dev/stock-data-pipeline

Stock Data Pipeline with Python and Google Cloud

Tip
This project is now archived. The visualization still works, but it stopped being updated as of March 30th, 2022. I archived the project because I no longer wanted to pay for API usage.
Tip
Any data in this project or on my website is for informational purposes only and should not be taken as investment advice.

Version

Overview

  • Extracts and transforms S&P 500 stock data with Python from a financial API.
  • Loads the data into Cloud Storage, transfers it to BigQuery, and renders it on my webpage.
  • Runs the Python code as a scheduled cron job on a GCP Compute Engine virtual machine.

Important Links

How the Pipeline Works

Data Pipeline

  1. A cron job triggers main.py to run.
  2. main.py calls the IEX Cloud API.
  3. The data is processed and cleaned by removing commas, hyphens, and/or other extra characters from the company name column.
  4. main.py creates a CSV file with the prepared data.
  5. load.py copies the CSV file to a Cloud Storage bucket.
  6. The CSV file is loaded into BigQuery.
  7. When the webpage loads, the data is queried through the BigQuery API and then displayed.
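
Steps 3 to 6 above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of main.py or load.py. The column names, the exact set of characters removed, the function names, and the `WRITE_TRUNCATE` load behavior are all assumptions. The Google Cloud imports are placed inside the functions so the cleaning and CSV helpers can be used without the client libraries installed.

```python
import csv
import re

# Assumption: only commas and hyphens are stripped; the steps above also
# mention "other extra characters" without listing them.
_UNWANTED = re.compile(r"[,\-]")


def clean_company_name(name: str) -> str:
    """Replace unwanted characters with spaces, then collapse whitespace."""
    return " ".join(_UNWANTED.sub(" ", name).split())


def write_csv(rows, path):
    """Write rows (dicts) to a CSV file, cleaning the company name column.

    The field names are hypothetical; the real pipeline may use others.
    """
    fieldnames = ["symbol", "company_name", "latest_price"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            cleaned = {**row, "company_name": clean_company_name(row["company_name"])}
            writer.writerow(cleaned)


def upload_csv(bucket_name: str, source_path: str, blob_name: str) -> str:
    """Copy a local CSV file to a Cloud Storage bucket and return its URI."""
    from google.cloud import storage  # requires google-cloud-storage

    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_filename(source_path)
    return f"gs://{bucket_name}/{blob_name}"


def load_csv_to_bigquery(gcs_uri: str, table_id: str) -> None:
    """Load a CSV file from Cloud Storage into a BigQuery table."""
    from google.cloud import bigquery  # requires google-cloud-bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # skip the header row written by write_csv
        autodetect=True,      # let BigQuery infer the schema
        # Assumption: each run replaces the previous table contents.
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    job = client.load_table_from_uri(gcs_uri, table_id, job_config=job_config)
    job.result()  # block until the load job finishes
```

A run would call `write_csv`, pass the resulting file to `upload_csv`, and hand the returned `gs://` URI to `load_csv_to_bigquery`. The bucket and table names are left to the caller because the README does not state them.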

CI/CD

  • None

Notes:

  • The file that connects to BigQuery to pull the data when the page loads is located in my website repository, since that repository renders the frontend.
  • The pipeline does not account for holidays.
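
Because the pipeline does not account for holidays, a scheduled run can fire on a day when the market is closed. The guard below is a hedged sketch of one way a script could check the date before doing any work. It is not part of this repository. The function name is hypothetical and the holiday set is a small, incomplete sample rather than a full exchange calendar.

```python
from datetime import date

# Illustrative and incomplete: two 2022 US market holidays. A real guard
# would need a complete calendar from the exchange.
MARKET_HOLIDAYS = {
    date(2022, 1, 17),  # Martin Luther King Jr. Day
    date(2022, 2, 21),  # Washington's Birthday
}


def is_trading_day(day: date) -> bool:
    """Return False on weekends and on dates in MARKET_HOLIDAYS."""
    if day.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return day not in MARKET_HOLIDAYS
```

A script could call `is_trading_day(date.today())` at startup and exit early when it returns `False`.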

Pipeline Flowchart

stock-data-flowchart

Services Used