- I. Project Overview
- II. Source code stored in GitHub
- III. Continuous Deployment from CircleCI
- IV. Data
- V. ML predictions created and served out (AutoML, BigQuery, etc.)
- VI. Stackdriver installed for monitoring
- VII. Deployed into GCP environment with Cloud Run
- VIII. Result and Demo
In this project, we build an application that visualizes stock market data from Yahoo Finance and forecasts market movement over time, following the architecture pipeline shown in Figure 1.
The deployed app is available at https://googlecloudplatformapp1-hmlu6pvwmq-ue.a.run.app/
Figure 1: Architecture diagram
Our source code is stored in the GitHub repo https://github.com/Minjieli6/google_cloud_platform_app1. It can be cloned and set up with the following commands.
git clone git@github.com:Minjieli6/google_cloud_platform_app1.git
cd google_cloud_platform_app1/
virtualenv ~/.venv && source ~/.venv/bin/activate
make all
# Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)
Below is the list of files contained in this repo.
requirements.txt lists all the libraries, modules, and packages the Python project depends on, such as pytest, pylint, dash, plotly, jinja2, gunicorn, etc.
Makefile defines a set of tasks to be executed, simplifying and automating build procedures and complex tasks with dependencies.
main.py contains the main part of the application, including the Dash server and layout.
test_main.py is a test file that checks whether the data source is available (a minimal sketch appears after the git commands below).
Dockerfile contains the commands needed to automatically build a container image of the application.
app.yaml specifies how URL paths correspond to request handlers and runs the Dash app on GCP using gunicorn.
The file main.yml under the folder .github/workflows/ sets up the GitHub Actions workflow. The credentials are configured using SSH keys from GCP Security via Secret Manager with Identity-Aware Proxy. The commands below are used to pull and push code to GitHub from the GCP terminal.
git config --global user.email "minjieli6@gmail.com"
git config --global user.name "Minjieli6"
git status
git add *
git commit -m "merging code"
git pull
git push
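As mentioned above, a minimal availability check in test_main.py might look like the following sketch (the exact assertions in the repo may differ; the URL mirrors the query used in the Data section below):
import pandas as pd

def test_data_source_available():
    # Hypothetical smoke test: the Yahoo Finance download endpoint
    # should return a non-empty CSV for a known ticker and date range.
    url = ("https://query1.finance.yahoo.com/v7/finance/download/%5EGSPC"
           "?period1=1262304000&period2=1262563200&interval=1d"
           "&events=history&includeAdjustedClose=true")
    df = pd.read_csv(url)
    assert not df.empty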
We get the stock market data from Yahoo Finance with several input parameters, such as stock symbol, start time, end time, and frequency, through a URL as shown below. In this case, it is unnecessary to store the data in Google Cloud Storage, since we can leverage real-time extraction.
import time
from datetime import datetime
import pandas as pd

# Query parameters: daily frequency for the S&P 500 index
interval = '1d'
ticker = '^GSPC'
# Unix timestamps for the start and end of the requested range
period1 = int(time.mktime(datetime(2010, 1, 1, 23, 59).timetuple()))
period2 = int(time.mktime(datetime.now().timetuple()))
query_string = f"https://query1.finance.yahoo.com/v7/finance/download/{ticker}?period1={period1}&period2={period2}&interval={interval}&events=history&includeAdjustedClose=true"
df = pd.read_csv(query_string)
print(df.head(3))
Alternatively, we can upload the data to Google Cloud Storage and then use a Google Cloud Function, scheduled through Google Cloud Scheduler, to update it, as shown in Figure 4. However, this would incur Cloud Storage costs.
Figure 4: Cloud Storage for BigQuery ML
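For reference, uploading the extracted DataFrame to Cloud Storage might look like this minimal sketch (the bucket and object names here are hypothetical):
from google.cloud import storage

def upload_csv(df, bucket_name='stock-data-bucket', blob_name='sp500/daily.csv'):
    # Serialize the DataFrame to CSV and upload it as a single object.
    client = storage.Client()
    blob = client.bucket(bucket_name).blob(blob_name)
    blob.upload_from_string(df.to_csv(index=False), content_type='text/csv')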
The NeuralProphet model is embedded in the app to predict future values based on historical data. NeuralProphet, a Python library for time-series modeling with neural networks, is built on top of PyTorch and inspired by Facebook Prophet and the AR-Net library. NeuralProphet optimizes with gradient descent via PyTorch, applies AR-Net for autocorrelation, leverages a separate feed-forward neural network (FFNN) for lagged regressors, and configures nonlinear deep layers of the FFNNs.
from neuralprophet import NeuralProphet

# Rename columns to the 'ds'/'y' format NeuralProphet expects
df['ds'] = df['Date']
df['y'] = df['Adj Close']
# Autoregressive model: forecast 360 days ahead using 360 days of lags
model = NeuralProphet(n_forecasts=360, n_lags=360, epochs=100)
model.fit(df[['ds', 'y']], freq='D')
future = model.make_future_dataframe(df[['ds', 'y']], periods=360, n_historic_predictions=len(df))
forecast = model.predict(future)
model.plot_components(forecast)
Figure 5.1: S&P time series decomposition
Alternatively, we can easily create an end-to-end AutoML ARIMA model for training and forecasting the stock with BigQuery ML. In Figure 5.2, we train and deploy the ML model directly in SQL, then visualize the forecasted values with Data Studio in Figure 5.3. In BigQuery ML, data is automatically preprocessed with missing-value imputation, timestamp deduplication, anomaly detection, holiday effects, and seasonal and trend decomposition. The best model is the one with the lowest AIC score. The auto-ARIMA time-series model can be scheduled to retrain automatically. The result can be loaded and displayed in Python using the code below.
Figure 5.2: Create an ARIMA model in SQL with BigQuery ML
Figure 5.3: S&P time series decomposition
from google.cloud import bigquery
import plotly.express as px

gcp_project = 'i-mariner-347323'
db_project = 'Dataset'  # BigQuery dataset name
client = bigquery.Client(project=gcp_project)
dataset_ref = client.dataset(db_project)

def gcp2df(sql):
    # Run a SQL query and return the result as a pandas DataFrame
    query = client.query(sql)
    results = query.result()
    return results.to_dataframe()

qry = """SELECT * FROM `i-mariner-347323.Dataset.AMZN_output`"""
df = gcp2df(qry)
print(df.head())
fig = px.line(df, x="timestamp", y=df.columns, title='Amazon Stock Price')
fig.show()
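For completeness, the model shown in Figure 5.2 could be created and queried along these lines (a hedged sketch; the source table, column names, and model name are assumptions rather than the exact SQL in the figure):
create_sql = """
CREATE OR REPLACE MODEL `i-mariner-347323.Dataset.AMZN_arima`
OPTIONS(model_type = 'ARIMA_PLUS',
        time_series_timestamp_col = 'date',
        time_series_data_col = 'close') AS
SELECT date, close FROM `i-mariner-347323.Dataset.AMZN`
"""
client.query(create_sql).result()  # train the auto-ARIMA model in BigQuery

# Forecast the next 360 days from the trained model
forecast_df = gcp2df("""
SELECT * FROM ML.FORECAST(MODEL `i-mariner-347323.Dataset.AMZN_arima`,
                          STRUCT(360 AS horizon))
""")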
Google Cloud's operations suite, formerly called Stackdriver, integrates monitoring, logging, and tracing as managed services for applications and systems running on Google Cloud. It not only provides visibility into the performance, uptime, and overall health of the app, but also lets users set alerts and be notified when metrics fall outside expected ranges. Cloud Logging in Figure 6.1 shows real-time logs and helps improve troubleshooting and debugging. Cloud Monitoring in Figure 6.2 is a custom dashboard for tracking the container's memory and CPU usage, as well as log entries, etc. Cloud Trace in Figure 6.3 collects latency data from the app and tracks how requests propagate through it.
Figure 6.1: GCP cloud logging
Figure 6.2: Cloud monitoring custom dashboard
Figure 6.3: Cloud trace
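As an illustration, custom application logs can also be written from Python with the Cloud Logging client library (a minimal sketch; the log name is hypothetical):
from google.cloud import logging

client = logging.Client()
logger = client.logger('stock-app')  # hypothetical log name

logger.log_text('Forecast request received')  # plain-text entry
logger.log_struct({'ticker': '^GSPC', 'mode': 'forecast', 'status': 'ok'})  # structured entry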
The app is built with the Dockerfile and deployed as a container to the Cloud Run service [googlecloudplatformapp1] in project [second-strand-351703], region [us-east1], as shown in Figure 7.1.
Figure 7.1: Deployed into GCP environment with Cloud Run
make all
gcloud run deploy
These commands are used to deploy the container to Cloud Run. We can see the API traffic, errors, and median latency in Figure 7.2. Once the app is deployed, more comprehensive metrics, including request and instance counts, are available in the Cloud Run metrics shown in Figure 7.3.
Figure 7.2: API services
Figure 7.3: Cloud run metrics
Alternatively, the app can easily be run on a local host using the commands below.
make install
python main.py
The interactive results with Plotly Dash display high, low, adjusted close, moving average, and forecasted values for the next 360 days in Figure 8, driven by the inputs listed below (a minimal layout sketch follows the list). A demo is also attached.
- 1st text input: stock symbol (such as ^GSPC, ^DJI, or FDN, etc.)
- 2nd numeric input: number of days for the moving average (such as 30, 60, 90, 120, etc.)
- 3rd text input: mode (such as visualization or forecast)
- visualization: only historical data
- forecast: both historical and forecasted values
- Note: the forecast part is not stable; it may require a time delay to allow for model training, fitting, and prediction, e.g.
time.sleep(30)
- time sliders:
- a year slider for all the charts at the bottom of the page
- a range slider for each individual chart
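A minimal sketch of how these inputs might be wired up in the Dash layout (component IDs and defaults are hypothetical, not the exact code in main.py):
from dash import Dash, dcc, html

app = Dash(__name__)
app.layout = html.Div([
    dcc.Input(id='symbol', type='text', value='^GSPC'),        # 1st input: stock symbol
    dcc.Input(id='ma-days', type='number', value=30),          # 2nd input: moving-average window
    dcc.Input(id='mode', type='text', value='visualization'),  # 3rd input: visualization or forecast
    dcc.Graph(id='price-chart'),                               # chart with a range slider
])

if __name__ == '__main__':
    # Cloud Run expects the server to listen on port 8080
    app.run_server(host='0.0.0.0', port=8080)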
Demo link: https://youtu.be/wIXzjELHWNQ
Click the chart below for the demo video.
Figure 8: Historical and forecasted values on the deployed app