
Stock Visualizer and Forecast Application on Google Cloud Platform


I. Project Overview

In this project, we build an application that visualizes stock market data from Yahoo Finance and forecasts market movement over time, using the architecture pipeline shown in Figure 1.

App link

https://googlecloudplatformapp1-hmlu6pvwmq-ue.a.run.app/

Demo link

https://youtu.be/wIXzjELHWNQ

Architecture Diagram



Figure 1: Architecture diagram

II. Source Code Stored in GitHub

Our source code is stored in the GitHub repo https://github.com/Minjieli6/google_cloud_platform_app1. It can be cloned and set up with the following commands.

git clone git@github.com:Minjieli6/google_cloud_platform_app1.git
cd google_cloud_platform_app1/
virtualenv ~/.venv && source ~/.venv/bin/activate
make all
#Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Below is the list of files contained in this repo.

requirements.txt lists all the libraries, modules, and packages the Python project depends on or requires to run, such as pytest, pylint, dash, plotly, jinja2, gunicorn, etc.

Makefile defines a set of tasks to be executed, automating the software build procedure and other complex tasks with dependencies.
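
A minimal sketch of what such a Makefile might contain is shown below; the exact targets are assumptions inferred from the make all and make install commands used elsewhere in this README.

install:
	pip install --upgrade pip && pip install -r requirements.txt

lint:
	pylint --disable=R,C main.py

test:
	python -m pytest -vv test_main.py

all: install lint test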

main.py contains the main part of the application, including Dash server and layout.
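
As a rough sketch of that structure (the actual layout and callbacks are more involved; the details here are assumptions):

# Minimal sketch of a Dash app exposing a WSGI server object for gunicorn.
import dash
from dash import dcc, html

app = dash.Dash(__name__)
server = app.server  # WSGI entry point that gunicorn serves

app.layout = html.Div([
    html.H1("Stock Visualizer and Forecast"),
    dcc.Input(id="ticker", type="text", value="^GSPC"),  # hypothetical input id
    dcc.Graph(id="price-chart"),                         # hypothetical graph id
])

if __name__ == "__main__":
    app.run_server(host="0.0.0.0", port=8080, debug=False)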

test_main.py is a test file that checks whether the open data source is available.
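
One plausible shape for such a test, assuming it simply requests a small slice of data and asserts that something comes back:

# Hypothetical test: verify the Yahoo Finance endpoint returns data.
import time
from datetime import datetime

import pandas as pd


def test_data_source_available():
    period1 = int(time.mktime(datetime(2010, 1, 1, 23, 59).timetuple()))
    period2 = int(time.mktime(datetime.now().timetuple()))
    url = (
        "https://query1.finance.yahoo.com/v7/finance/download/^GSPC"
        f"?period1={period1}&period2={period2}&interval=1d"
        "&events=history&includeAdjustedClose=true"
    )
    df = pd.read_csv(url)
    assert not df.empty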

Dockerfile is used to execute all the commands to automatically build an image of the application.
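
A minimal sketch of such a Dockerfile; the base image, port handling, and gunicorn invocation are assumptions consistent with a Dash app served by gunicorn on Cloud Run.

FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Cloud Run injects the port to listen on via $PORT (defaults to 8080).
CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 main:server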

app.yaml specifies how URL paths correspond to request handlers and runs the Dash app on GCP using gunicorn.
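
For illustration, a minimal app.yaml of this kind could look like the following (the runtime and entrypoint are assumptions):

runtime: python39
entrypoint: gunicorn -b :$PORT main:server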

III. Continuous Deployment with GitHub Actions

The file main.yml under the folder .github/workflows/ has been set up as the GitHub Actions workflow. The credentials have been set up using SSH keys from GCP Security via Secret Manager with Identity-Aware Proxy. The commands below are used to pull and push code to GitHub from the GCP terminal.

git config --global user.email "minjieli6@gmail.com"
git config --global user.name "Minjieli6"
git status
git add *
git commit -m "merging code"
git pull
git push
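
The workflow file itself might look roughly like the sketch below; the exact steps are assumptions based on the Makefile targets used elsewhere in this README.

name: CI
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: "3.9"
      - name: Install, lint, and test
        run: make all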

IV. Data

Real Time Extraction

We get the stock market data from Yahoo Finance with several input parameters, such as the stock symbol, start time, end time, and frequency, through a URL as shown below. In this case it is unnecessary to store the data in Google Cloud Storage, since we can leverage real-time extraction.

import time
from datetime import datetime

import pandas as pd

# Query parameters: daily S&P 500 data from 2010-01-01 to now.
interval = '1d'
ticker = '^GSPC'
period1 = int(time.mktime(datetime(2010, 1, 1, 23, 59).timetuple()))
period2 = int(time.mktime(datetime.now().timetuple()))

query_string = f"https://query1.finance.yahoo.com/v7/finance/download/{ticker}?period1={period1}&period2={period2}&interval={interval}&events=history&includeAdjustedClose=true"

df = pd.read_csv(query_string)
print(df.head(3))

Google Cloud Storage

Alternatively, we can upload the data to Google Cloud Storage and then use a Google Cloud Function, scheduled through Google Cloud Scheduler, to keep it updated, as shown in Figure 4. However, this would incur Cloud Storage costs.

Figure 4: Cloud Storage for BigQuery ML
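
A sketch of the kind of Cloud Function this alternative pipeline would use; the function name, bucket name, and trigger wiring are assumptions, with Cloud Scheduler invoking the function on a schedule.

# Hypothetical HTTP-triggered Cloud Function: fetch fresh data and
# write it to Cloud Storage. The bucket name is an assumption.
import time
from datetime import datetime

import pandas as pd
from google.cloud import storage


def refresh_stock_data(request):
    period1 = int(time.mktime(datetime(2010, 1, 1, 23, 59).timetuple()))
    period2 = int(time.mktime(datetime.now().timetuple()))
    url = (
        "https://query1.finance.yahoo.com/v7/finance/download/^GSPC"
        f"?period1={period1}&period2={period2}&interval=1d"
        "&events=history&includeAdjustedClose=true"
    )
    df = pd.read_csv(url)

    bucket = storage.Client().bucket("my-stock-data-bucket")  # hypothetical bucket
    bucket.blob("gspc.csv").upload_from_string(df.to_csv(index=False))
    return "ok"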

V. ML predictions created and served out (AutoML, BigQuery, etc.)

NeuralProphet

The NeuralProphet model is embedded in the app to predict future values from historical data. NeuralProphet, a Python library for neural-network-based time series models, is built on top of PyTorch and inspired by Facebook Prophet and the AR-Net library. NeuralProphet optimizes with gradient descent through PyTorch, applies AR-Net for autocorrelation, leverages a separate feed-forward neural network (FFNN) for lagged regressors, and can configure nonlinear deep layers of the FFNNs.

from neuralprophet import NeuralProphet

# Rename columns to the ds/y convention NeuralProphet expects.
df['ds'] = df['Date']
df['y'] = df['Adj Close']

# 360-day forecast horizon with 360 lagged inputs (AR-Net).
model = NeuralProphet(n_forecasts=360, n_lags=360, epochs=100)
model.fit(df[['ds', 'y']], freq='D')
future = model.make_future_dataframe(df[['ds', 'y']], periods=360, n_historic_predictions=len(df))
forecast = model.predict(future)

model.plot_components(forecast)

Figure 5.1: S&P time series decomposition

GCP BigQuery ML

Alternatively, we can easily create an end-to-end ARIMA model for training and forecasting the stock with BigQuery ML. In Figure 5.2, we train and deploy the model directly in SQL, then visualize the forecasted values with Data Studio in Figure 5.3. BigQuery ML automatically preprocesses the data with missing-value imputation, timestamp deduplication, anomaly detection, holiday effects, and seasonal and trend decomposition. The best model is selected by the lowest AIC score, and the auto-ARIMA time-series model can be scheduled to retrain automatically. The result can be loaded and displayed in Python using the code further below.

Figure 5.2: Create an ARIMA model in SQL with BigQuery ML
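
The SQL behind a setup like Figure 5.2 could look roughly like this; the input table and column names are assumptions (AMZN_output matches the table queried in the Python code below).

-- Hypothetical: train an ARIMA time-series model directly in BigQuery ML.
CREATE OR REPLACE MODEL `Dataset.AMZN_arima`
OPTIONS(
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'timestamp',
  time_series_data_col = 'adj_close'
) AS
SELECT timestamp, adj_close FROM `Dataset.AMZN_input`;  -- hypothetical input table

-- Forecast the next 360 days and persist the result for the app to read.
CREATE OR REPLACE TABLE `Dataset.AMZN_output` AS
SELECT * FROM ML.FORECAST(MODEL `Dataset.AMZN_arima`,
                          STRUCT(360 AS horizon));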

Figure 5.3: S&P time series decomposition

import plotly.express as px
from google.cloud import bigquery

gcp_project = 'i-mariner-347323'
db_project = 'Dataset'

client = bigquery.Client(project=gcp_project)
dataset_ref = client.dataset(db_project)

def gcp2df(sql):
    """Run a BigQuery SQL query and return the result as a DataFrame."""
    query = client.query(sql)
    results = query.result()
    return results.to_dataframe()

qry = """SELECT * FROM `i-mariner-347323.Dataset.AMZN_output`"""
df = gcp2df(qry)
print(df.head())

fig = px.line(df, x="timestamp", y=df.columns, title='Amazon Stock Price')
fig.show()

VI. Stackdriver installed for monitoring

Google Cloud’s operations suite, formerly called Stackdriver, integrates monitoring, logging, and trace managed services for applications and systems running on Google Cloud. It not only provides visibility into the performance, uptime, and overall health of the app, but also lets users set alerts and be notified when metrics leave their expected ranges. Cloud Logging in Figure 6.1 shows real-time logs and helps with troubleshooting and debugging. Cloud Monitoring in Figure 6.2 is a custom dashboard where we track the container’s memory and CPU usage, as well as log entries. Cloud Trace in Figure 6.3 collects latency data from the app and tracks how requests propagate through it. A minimal logging hookup is sketched after the figures below.

Figure 6.1: GCP cloud logging

Figure 6.2: Cloud monitoring custom dashboard

Figure 6.3: Cloud trace
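
As a small illustration of the logging side, the sketch below routes the app’s standard Python logging to Cloud Logging; whether the app wires it up this way is an assumption, since Cloud Run also captures stdout/stderr automatically.

# Hypothetical sketch: attach Cloud Logging to the app's standard logger.
# Assumes the google-cloud-logging package is installed and default
# credentials are available (both hold inside Cloud Run).
import logging
import google.cloud.logging

client = google.cloud.logging.Client()
client.setup_logging()  # routes Python logging records to Cloud Logging

logging.info("Dash app started")  # appears in the Figure 6.1 log explorer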

VII. Deployed into GCP environment with Cloud Run

Deploy to GCP Cloud Run

The app is built with the Dockerfile and deployed as a container to the Cloud Run service [googlecloudplatformapp1] in project [second-strand-351703], region [us-east1], as shown in Figure 7.1.

Figure 7.1: Deployed into GCP environment with Cloud Run

make all
gcloud run deploy

These commands build the project and deploy the container to Cloud Run. We can see the API traffic, errors, and median latency in Figure 7.2. Once the app is deployed, more comprehensive metrics, including request and instance counts, are available in the Cloud Run metrics shown in Figure 7.3.

Figure 7.2: API services

Figure 7.3: Cloud run metrics
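
Spelled out with explicit flags, the deploy command would look something like the sketch below; the service name, project, and region come from Figure 7.1, while the remaining flags are assumptions.

gcloud run deploy googlecloudplatformapp1 \
  --source . \
  --region us-east1 \
  --project second-strand-351703 \
  --allow-unauthenticated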

Deploy locally

Alternatively, the app can easily be run on a local host using the commands below.

make install
python main.py

VIII. Result and Demo

The interactive Plotly Dash results display the high, low, adjusted close, moving average, and next 360 days of forecasted values in Figure 8. A demo is also attached.

Input Parameters:

  • 1st text input: stock symbol (such as ^GSPC, ^DJI, or FDN)
  • 2nd numeric input: number of days for the moving average (such as 30, 60, 90, or 120)
  • 3rd text input: mode (either visualization or forecast)
    • visualization shows historical data only
    • forecast shows both historical and forecasted values
      • Note: the forecast part is not stable; it may require a time delay, e.g. time.sleep(30), to allow for model training, fitting, and predicting
  • time sliders:
    • year slider for all the charts at the bottom of the page
    • range slider for each individual chart

Click the image below to watch the demo video.


Figure 8: Historical and forecasted values on the deployed app
