# Google Cloud Platform Project Creation Workbook 
 
Use this workbook to create a Google Cloud project with everything needed to collect new data and host your own web app. 
 
Prerequisites:  
+ Create a Google user account  <br><br>
+ Create your own personal Google Cloud project and enable billing
    - Enable the Free Tier account by selecting "Try it Free" here: [Try Google Cloud Platform for free](https://cloud.google.com/cloud-console)
    - Follow the steps to activate billing found here: [Create New Billing Account](https://cloud.google.com/billing/docs/how-to/manage-billing-account#create_a_new_billing_account)
        - A billing account is required for the APIs used in this project
        - Setting up this project will not exceed the $300 free trial, but make sure to delete the project if you do not want to be charged
        - Take note of the name of the project you create, because its billing account will be used with the new project <br><br>
+ Install and initialize the Google Cloud SDK by following the instructions found here: [Cloud SDK Quickstart](https://cloud.google.com/sdk/docs/quickstart) <br><br>
+ Set default region and zone following instructions here:

## Step 1 - Check Prerequisites Successfully Completed
Check that you have successfully installed and initialized the Cloud SDK by running the config list command. If you get an error, please refer to the troubleshooting steps found here: [Cloud SDK Quickstart](https://cloud.google.com/sdk/docs/quickstart).  
You should see output that includes your account along with any other configuration set up when running `gcloud init`.

In [1]:
!gcloud config list

[accessibility]
screen_reader = False
[app]
promote_by_default = false
[core]
account = cwilbar04@gmail.com
disable_usage_reporting = True
project = nba-predictions-dev



Your active configuration is: [default]
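
If you would rather check the settings in code than by eye, the INI-style output above can be parsed with Python's `configparser`. This is only a sketch: the helper name `missing_gcloud_settings` and the sample text are made up for illustration, and it assumes you pass in just the section/key lines that `gcloud config list` prints.

```python
import configparser

def missing_gcloud_settings(config_text, required=(("core", "account"), ("core", "project"))):
    """Return the required (section, key) pairs that are absent from the config text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return [(section, key) for section, key in required
            if not parser.has_option(section, key)]

# Illustrative sample only; the account is a placeholder and no project is set.
sample_output = """
[accessibility]
screen_reader = False
[core]
account = someone@example.com
disable_usage_reporting = True
"""
print(missing_gcloud_settings(sample_output))
```

An empty list would mean both an account and a default project are configured.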


Update all gcloud components to the latest release.

In [2]:
!gcloud components update

Beginning update. This process may take several minutes.

ERROR: (gcloud.components.update) You cannot perform this action because you do not have permission to modify the Google Cloud SDK installation directory [C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk].

Click the Google Cloud SDK Shell icon and re-run the command in that window, or re-run the command with elevated privileges by right-clicking cmd.exe and selecting "Run as Administrator".


## Step 2 - Create GCP Project

###### TO DO: Enter a name for the new project and billing project, then change this cell to a Code block and run it
###### Note: The project name must be unique across GCP. If you get an error when creating the project, change the project name here and try again.
new_project_id = 'YOUR_NEW_UNIQUE_PROJECT_NAME'
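
Before calling `gcloud projects create`, it can help to catch IDs that GCP would reject on format alone. The sketch below encodes the documented project ID rules (6 to 30 characters; lowercase letters, digits, and hyphens; starts with a letter; does not end with a hyphen). The function name is hypothetical, and global uniqueness still cannot be checked locally.

```python
import re

# 6-30 characters, lowercase letters/digits/hyphens, starts with a letter, no trailing hyphen.
PROJECT_ID_PATTERN = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")

def is_valid_project_id(project_id):
    """Return True when the ID satisfies the format rules (uniqueness is not checked)."""
    return PROJECT_ID_PATTERN.fullmatch(project_id) is not None

print(is_valid_project_id("nba-predictions-prod"))
print(is_valid_project_id("YOUR_NEW_UNIQUE_PROJECT_NAME"))
```

The placeholder value fails the check because of its uppercase letters and underscores, which is a useful reminder to replace it.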

In [42]:
new_project_id = 'nba-predictions-prod'

In [4]:
!gcloud projects create {new_project_id}

ERROR: (gcloud.projects.create) Project creation failed. The project ID you specified is already in use by another project. Please try an alternative ID.


**TO DO: Navigate to [Cloud Console](https://console.cloud.google.com/), Change to new project, and enable billing following instructions found here: [Enable Billing](https://cloud.google.com/billing/docs/how-to/modify-project#enable_billing_for_a_project)**

## Step 3 - Enable Necessary Cloud Services

This project uses:
+ BigQuery to Store Model Data 
+ Google Cloud Functions scheduled using Google Cloud Scheduler to Load new Data Daily
+ Google App Engine to Host Website
+ Google Firestore in Native Mode to store data used by the Web Page  
  
The list below contains all services needed at the time this workbook was created. Add to or remove from this list if the names or the necessary services have changed.

In [5]:
enable_services_list = [
    'appengine.googleapis.com',
    'bigquery.googleapis.com',
    'bigquerystorage.googleapis.com',
    'cloudapis.googleapis.com',
    'cloudbuild.googleapis.com',
    'clouddebugger.googleapis.com',
    'cloudfunctions.googleapis.com',
    'cloudresourcemanager.googleapis.com',
    'cloudscheduler.googleapis.com',
    'cloudtrace.googleapis.com',
    'compute.googleapis.com',
    'datastudio.googleapis.com',
    'deploymentmanager.googleapis.com',
    'firebaserules.googleapis.com',
    'firestore.googleapis.com',
    'logging.googleapis.com',
    'monitoring.googleapis.com',
    'oslogin.googleapis.com',
    'servicemanagement.googleapis.com',
    'serviceusage.googleapis.com',
    'sql-component.googleapis.com',
    'storage-api.googleapis.com',
    'storage-component.googleapis.com',
    'storage.googleapis.com'    
]

In [8]:
## Services can only be enabled 20 at a time at the time of workbook creation. Use this loop to enable 20 at a time.
for x in range(0,len(enable_services_list),20):
    !gcloud services enable {' '.join(enable_services_list[x:(x+20)])} --project={new_project_id}   

Operation "operations/acf.p2-130738074716-9c0adbcd-26f0-4865-8ec7-ec84fe7783ac" finished successfully.
Operation "operations/acf.p2-130738074716-53011cf0-3a9e-4a88-958a-c66262a5ff84" finished successfully.
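
The slicing in the loop above is easier to see in isolation. A minimal sketch, using placeholder service names and a hypothetical `chunked` helper, of how 24 services split into batches of at most 20:

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Placeholder names, only to show the batch sizes.
services = [f"service{i}.googleapis.com" for i in range(24)]
batches = list(chunked(services, 20))
print([len(batch) for batch in batches])
```

Two batches means two `gcloud services enable` calls, which matches the two "finished successfully" operations shown above.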


In [9]:
!gcloud services list --project={new_project_id}

NAME                                 TITLE
appengine.googleapis.com             App Engine Admin API
bigquery.googleapis.com              BigQuery API
bigquerystorage.googleapis.com       BigQuery Storage API
cloudapis.googleapis.com             Google Cloud APIs
cloudbuild.googleapis.com            Cloud Build API
clouddebugger.googleapis.com         Cloud Debugger API
cloudfunctions.googleapis.com        Cloud Functions API
cloudresourcemanager.googleapis.com  Cloud Resource Manager API
cloudscheduler.googleapis.com        Cloud Scheduler API
cloudtrace.googleapis.com            Cloud Trace API
compute.googleapis.com               Compute Engine API
containerregistry.googleapis.com     Container Registry API
datastore.googleapis.com             Cloud Datastore API
datastudio.googleapis.com            Data Studio API
deploymentmanager.googleapis.com     Cloud Deployment Manager V2 API
firebaserules.googleapis.com         Firebase Rules API
firestore.googleapis.com             Cloud Fi

## Step 4 - Create Necessary Service Accounts

There are four primary service accounts used in this project:  
- **App Engine default service account**
    - This gets created automatically when the App Engine API is enabled
    - Generally your_project_id@appspot.gserviceaccount.com  <br><br>
      
- **Compute Engine default service account**
    - This gets created automatically when the Compute Engine API is enabled
    - Generally your_project_number-compute@developer.gserviceaccount.com  <br><br>
      
- **Cloud Function service account**
    - We create this and add the necessary roles below using the Cloud SDK
    - cloudfunction-service-account@your_project_id.iam.gserviceaccount.com
    - This account is used as the service account to run all Cloud Functions in this project  <br><br>
      
- **CircleCI service account**
    - We create this and add the necessary roles below using the Cloud SDK
    - circleci-deployer@your_project_id.iam.gserviceaccount.com
    - This account is used in CircleCI for CI/CD to deploy and test App Engine and Cloud Functions 
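
Re-running the create commands against a project that already has these accounts fails with an "already exists" conflict. One way to avoid that is to compare the expected emails with the ones already listed before creating anything. The sketch below shows the comparison only; the helper names and the `my-project` ID are placeholders, and how you collect the existing emails is up to you.

```python
def service_account_email(name, project_id):
    """Build the email of a user-created service account in a project."""
    return f"{name}@{project_id}.iam.gserviceaccount.com"

def missing_service_accounts(existing_emails, names, project_id):
    """Return the account names whose emails are not in `existing_emails`."""
    existing = set(existing_emails)
    return [name for name in names
            if service_account_email(name, project_id) not in existing]

# Placeholder data: only the CircleCI account exists so far.
existing = [
    "my-project@appspot.gserviceaccount.com",
    "circleci-deployer@my-project.iam.gserviceaccount.com",
]
wanted = ["cloudfunction-service-account", "circleci-deployer"]
print(missing_service_accounts(existing, wanted, "my-project"))
```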

Check which service accounts are already created (should be the two default ones described above)

In [10]:
!gcloud iam service-accounts list --project={new_project_id}

DISPLAY NAME                            EMAIL                                                                       DISABLED
App Engine default service account      nba-predictions-prod@appspot.gserviceaccount.com                            False
Compute Engine default service account  130738074716-compute@developer.gserviceaccount.com                          False
Circle CI Service Account               circleci-deployer@nba-predictions-prod.iam.gserviceaccount.com              False
Cloud Function Service Account          cloudfunction-service-account@nba-predictions-prod.iam.gserviceaccount.com  False


In [11]:
!gcloud iam service-accounts create cloudfunction-service-account \
    --display-name="Cloud Function Service Account" \
    --description="Account used to run all Cloud Functions with necessary BigQuery and Firestore Permissions" \
    --project={new_project_id}

ERROR: (gcloud.iam.service-accounts.create) Resource in projects [nba-predictions-prod] is the subject of a conflict: Service account cloudfunction-service-account already exists within project projects/nba-predictions-prod.
- '@type': type.googleapis.com/google.rpc.ResourceInfo
  resourceName: projects/nba-predictions-prod/serviceAccounts/cloudfunction-service-account@nba-predictions-prod.iam.gserviceaccount.com


In [12]:
!gcloud iam service-accounts create circleci-deployer \
    --display-name="Circle CI Service Account" \
    --description="Account used by Circle CI with necessary permissions to Deploy to Cloud Functions and App Engine" \
    --project={new_project_id}

ERROR: (gcloud.iam.service-accounts.create) Resource in projects [nba-predictions-prod] is the subject of a conflict: Service account circleci-deployer already exists within project projects/nba-predictions-prod.
- '@type': type.googleapis.com/google.rpc.ResourceInfo
  resourceName: projects/nba-predictions-prod/serviceAccounts/circleci-deployer@nba-predictions-prod.iam.gserviceaccount.com


Check that the service accounts were created successfully

In [14]:
!gcloud iam service-accounts list --project={new_project_id}

DISPLAY NAME                            EMAIL                                                                       DISABLED
App Engine default service account      nba-predictions-prod@appspot.gserviceaccount.com                            False
Compute Engine default service account  130738074716-compute@developer.gserviceaccount.com                          False
Circle CI Service Account               circleci-deployer@nba-predictions-prod.iam.gserviceaccount.com              False
Cloud Function Service Account          cloudfunction-service-account@nba-predictions-prod.iam.gserviceaccount.com  False


Programmatically update the roles for the new service accounts using the guide found here: [Programmatic Change Access](https://cloud.google.com/iam/docs/granting-changing-revoking-access#programmatic)

In [37]:
# Save the policy file in a directory above the repo so that it is not committed to GitHub
# Raw string keeps the Windows backslashes literal without invalid-escape warnings
file_directory = r'..\..\policy.json'

In [38]:
# Write current policy to file directory
!gcloud projects get-iam-policy {new_project_id} --format json > {file_directory}

**If running in Jupyter Notebook, run the cell below to load and modify the policy file.**

In [39]:
import json

# Load the policy written by the previous cell
with open(file_directory) as f:
    policy = json.load(f)

def modify_policy_add_role(policy, role, member):
    """Adds a new role binding to a policy."""

    binding = {"members": [member], "role": role}
    policy["bindings"].append(binding)
    return policy

members = [f'serviceAccount:cloudfunction-service-account@{new_project_id}.iam.gserviceaccount.com', 
           f'serviceAccount:circleci-deployer@{new_project_id}.iam.gserviceaccount.com']
# Roles to grant to each service account
roles = {members[0]:['roles/bigquery.dataEditor','roles/datastore.user','roles/run.serviceAgent', 'roles/bigquery.user',
                    'roles/storage.admin'],
        members[1]:['roles/appengine.deployer','roles/appengine.serviceAdmin','roles/cloudbuild.builds.builder',
                   'roles/cloudfunctions.admin','roles/compute.storageAdmin','roles/iam.serviceAccountUser']}

for member in members:
    for role in roles[member]:
        policy = modify_policy_add_role(policy, role, member)

# Write the modified policy back to the same file
with open(file_directory, 'w') as json_file:
    json.dump(policy, json_file)
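
The cell above always appends a new binding, so running it twice leaves duplicate entries in the policy file. A possible alternative, sketched here with a hypothetical `add_role_member` helper and made-up sample data, is to merge the member into an existing binding for the same role. It deliberately skips bindings that carry a `condition`, since those should not be merged blindly.

```python
def add_role_member(policy, role, member):
    """Add `member` to the unconditional binding for `role`, creating it if needed."""
    bindings = policy.setdefault("bindings", [])
    for binding in bindings:
        if binding.get("role") == role and "condition" not in binding:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return policy
    bindings.append({"role": role, "members": [member]})
    return policy

# Made-up policy and member, for illustration only.
sample_policy = {"bindings": [{"role": "roles/bigquery.user", "members": ["user:a@example.com"]}]}
sample_member = "serviceAccount:sa@my-project.iam.gserviceaccount.com"
add_role_member(sample_policy, "roles/bigquery.user", sample_member)
add_role_member(sample_policy, "roles/bigquery.user", sample_member)  # second call changes nothing
add_role_member(sample_policy, "roles/datastore.user", sample_member)
print(sample_policy["bindings"])
```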

**If running the code directly in a console, open the policy file and add the members and roles below to its `bindings` list**  
**Change "your_project_id" to your project ID**

{"members": ["serviceAccount:cloudfunction-service-account@your_project_id.iam.gserviceaccount.com"], "role": "roles/bigquery.dataEditor"},  
{"members": ["serviceAccount:cloudfunction-service-account@your_project_id.iam.gserviceaccount.com"], "role": "roles/bigquery.user"},  
{"members": ["serviceAccount:cloudfunction-service-account@your_project_id.iam.gserviceaccount.com"], "role": "roles/datastore.user"},  
{"members": ["serviceAccount:cloudfunction-service-account@your_project_id.iam.gserviceaccount.com"], "role": "roles/run.serviceAgent"},  
{"members": ["serviceAccount:cloudfunction-service-account@your_project_id.iam.gserviceaccount.com"], "role": "roles/storage.admin"},  
{"members": ["serviceAccount:circleci-deployer@your_project_id.iam.gserviceaccount.com"], "role": "roles/appengine.deployer"},   
{"members": ["serviceAccount:circleci-deployer@your_project_id.iam.gserviceaccount.com"], "role": "roles/appengine.serviceAdmin"},   
{"members": ["serviceAccount:circleci-deployer@your_project_id.iam.gserviceaccount.com"], "role": "roles/cloudbuild.builds.builder"},   
{"members": ["serviceAccount:circleci-deployer@your_project_id.iam.gserviceaccount.com"], "role": "roles/cloudfunctions.admin"},  
{"members": ["serviceAccount:circleci-deployer@your_project_id.iam.gserviceaccount.com"], "role": "roles/compute.storageAdmin"},  
{"members": ["serviceAccount:circleci-deployer@your_project_id.iam.gserviceaccount.com"], "role": "roles/iam.serviceAccountUser"}

In [40]:
!gcloud projects set-iam-policy {new_project_id} {file_directory}

bindings:
- members:
  - serviceAccount:circleci-deployer@nba-predictions-dev.iam.gserviceaccount.com
  - serviceAccount:circleci-gcf-deployer@nba-predictions-dev.iam.gserviceaccount.com
  role: roles/appengine.deployer
- members:
  - serviceAccount:circleci-deployer@nba-predictions-dev.iam.gserviceaccount.com
  - serviceAccount:circleci-gcf-deployer@nba-predictions-dev.iam.gserviceaccount.com
  role: roles/appengine.serviceAdmin
- members:
  - serviceAccount:service-188994400757@gcp-gae-service.iam.gserviceaccount.com
  role: roles/appengine.serviceAgent
- members:
  - serviceAccount:biqquery-service-account@nba-predictions-dev.iam.gserviceaccount.com
  - serviceAccount:cloudfunction-service-account@nba-predictions-dev.iam.gserviceaccount.com
  role: roles/bigquery.dataEditor
- members:
  - serviceAccount:cloudfunction-service-account@nba-predictions-dev.iam.gserviceaccount.com
  role: roles/bigquery.user
- members:
  - serviceAccount:service-188994400757@gcp-sa-bigquerydatatransfer.i

Updated IAM policy for project [nba-predictions-dev].


In [41]:
# Remove policy file 
!del {file_directory}

## Step 5 - Create App Engine Application & Firestore in Native Mode Database

In order to deploy a specific application, you first need to create a placeholder application.

This application will get the latest information for each team from Firestore in Native Mode. We create an empty database here to change the Firestore mode from Datastore Mode to Native Mode.

**Change YOUR_REGION to your default region**  
See [Regions and Zones](https://cloud.google.com/compute/docs/regions-zones) for more info
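
Note that this workbook uses `us-central` for App Engine in this step and `us-central1` for Cloud Functions in Step 10. As far as I know, `us-central` and `europe-west` are the two App Engine location names that drop the numeric suffix used elsewhere in GCP. The sketch below (the helper name is hypothetical) shows one way to derive the regional name from the App Engine one so the two stay in sync.

```python
# App Engine location names that omit the numeric suffix used by other services.
APP_ENGINE_ALIASES = {"us-central": "us-central1", "europe-west": "europe-west1"}

def to_gcp_region(app_engine_region):
    """Map an App Engine location name to the matching regional name."""
    return APP_ENGINE_ALIASES.get(app_engine_region, app_engine_region)

print(to_gcp_region("us-central"))
print(to_gcp_region("us-east1"))
```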

In [20]:
## TO DO: Change region to your default region
region = 'us-central'

In [21]:
!gcloud app create --region={region} --project={new_project_id}

You are creating an app for project [nba-predictions-prod].
cannot be changed. More information about regions is at
<https://cloud.google.com/appengine/docs/locations>.

ERROR: (gcloud.app.create) The project [nba-predictions-prod] already contains an App Engine application. You can deploy your application using `gcloud app deploy`.


In [22]:
!gcloud firestore databases create --region={region} --project={new_project_id}

Waiting for operation [apps/nba-predictions-prod/operations/f9a56b55-5ca7-45a1-ba00-815b0e29025d] to complete...
...................................done.
Success! Selected Google Cloud Firestore Native database for nba-predictions-prod


## Step 6 - Create BigQuery Dataset

Your new project will need a dataset to store the data if you plan on copying/creating your own repository of data.  

This has to be a unique name per project.  

In my workflows I have named the dataset 'nba', but feel free to change it. Note that if you do change it, you will also need to change the dataset name in the other Python scripts in this project accordingly. 

In [None]:
dataset_name = 'nba'

In [None]:
!bq --location=US mk --dataset \
--description "Stores all National Basketball Association Data. Created using Project Creation workbook found at https://github.com/cwilbar04/nba-predictions/tree/main/notebooks" \
{new_project_id}:{dataset_name}  

## Step 7 - Load BigQuery Tables

All data in this project is taken from [BASKETBALL REFERENCE](https://www.basketball-reference.com/)

There are two options for loading the data to BigQuery:  
1. **Load the data yourself** 
    - Part 1: Raw Data
        - Navigate to [Initial Load Workbook](https://github.com/cwilbar04/nba-predictions/blob/main/notebooks/NBA%20Data%20Initial%20Load.ipynb) and change the start date to the desired starting date. For my model I loaded data starting from '10-1-1999'. Always choose a start date in between seasons if you don't want to get partial-season data. Warning: this may take a couple of days and require restarts. 
    - Part 2: Model Data
        - Navigate to [Initial Model Load Workbook](https://github.com/cwilbar04/nba-predictions/blob/main/notebooks/NBA%20Model%20Table%20Initial%20Load.ipynb) and change the project and dataset names to what you used in this workbook, then run all. <br><br>
2. **Copy Data**
    - For a quicker load process, simply copy the data directly from my public dataset by running the code blocks below. You must complete Step 6 - Create BigQuery Dataset first. Be careful of costs if the dataset you create is in a region other than US. At the time this workbook was created, dataset copying was still in beta and had no cost. See the documentation here for the latest info: [Copy Datasets](https://cloud.google.com/bigquery/docs/copying-datasets)

In [None]:
##### Copy Dataset Code Block. Only run if choosing option 2 above ####
## You first have to enable Data Transfer Service API ##
!gcloud services enable bigquerydatatransfer.googleapis.com --project={new_project_id}

In [None]:
##### Copy Dataset Code Block. Only run if choosing option 2 above ####
## Enabling the Data Transfer Service API can take a minute. Please wait and retry if you get an error.   ##
## The code below must be run in Python. To run outside of Python, replace {} with the correct values.    ##
## Params must be JSON formatted                                                                          ##
## Data will be transferred from my public dataset to the dataset you created in Step 6 above ##

import json
source_parameters = '{"source_dataset_id":"nba", "source_project_id":"nba-predictions-dev", "overwrite_destination_table":"true"}'
# json.dumps on a string wraps it in double quotes and escapes the inner quotes,
# so the shell receives the whole JSON document as a single argument
source_parameters_json = json.dumps(source_parameters)
run = f'bq mk --transfer_config\
                --project_id={new_project_id}\
                --data_source=cross_region_copy\
                --target_dataset={dataset_name}\
                --display_name="Initial load of public NBA dataset"\
                --no_auto_scheduling\
                --params={source_parameters_json}'
!{run}
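
The double use of JSON in the cell above can be confusing, so here is a small standalone sketch of what it does. It builds the same parameters from a dictionary (an assumption for readability; the cell starts from a string), encodes them once to get the JSON document, and encodes that string again to get a double-quoted, escaped shell argument. On POSIX shells, `shlex.quote` from the standard library is an alternative way to quote the same text.

```python
import json
import shlex

params = {
    "source_dataset_id": "nba",
    "source_project_id": "nba-predictions-dev",
    "overwrite_destination_table": "true",
}
params_json = json.dumps(params)          # the JSON document itself
shell_argument = json.dumps(params_json)  # same text, double-quoted with inner quotes escaped
posix_argument = shlex.quote(params_json) # single-quoted form for POSIX shells

print(shell_argument)
print(posix_argument)

# Decoding twice recovers the original dictionary.
assert json.loads(json.loads(shell_argument)) == params
```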

## Step 8 - Deploy Cloud Functions

This project uses three Cloud Functions that we will schedule using Cloud Scheduler in order to update the data daily:
1. **nba_basketball_reference_scraper**
    - This function loads game box scores and game player box scores from [BASKETBALL REFERENCE](https://www.basketball-reference.com/) into nba.raw_basketballreference_game and nba.raw_basketballreference_playerbox. It allows you to specify a start date and end date in a JSON header ({"StartDate":"1-1-1000","EndDate":"1-1-100"}).
    - If you don't provide a start date then it automatically uses the max game date from the raw_basketballreference_game table.
    - If you don't specify an end date then it automatically loads data up to yesterday (aka the last day games were guaranteed to be completed).
    - When we schedule this job we will not provide a start date or end date so it will always load the most recent data that is not already in the raw_basketballreference_game and raw_basketballreference_playerbox tables. <br><br>
       
2. **nba_model_game_refresh**
    - This function uses the view we will create in the next step to identify games that have been loaded to the raw_basketballreference_game table but have not yet been loaded into the model_game_data table. It then performs all of the necessary transformations to combine specific player stats, create moving average columns, and load the data into the model_game_data table.
    - This job also loads the most recent information for each team to Firestore that our web app uses when making predictions.
    - This job does not care what is in the JSON header.
    - We will schedule this to run daily one hour after the scraper function. <br><br>
    
3. **nba_get_upcoming_games**
    - This function gets the schedule from [BASKETBALL REFERENCE](https://www.basketball-reference.com/) for one week, including "today", and overwrites the schedule file stored in the App Engine default Cloud Storage bucket. This schedule will be used to display upcoming games on our web page.
    - This function will be scheduled to run one hour before the scraper function.
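
To make the scraper's date defaults concrete, here is a rough sketch of the fallback logic described above. It is not the code deployed in the function: the helper name, its arguments, and the `%m-%d-%Y` format (guessed from the '10-1-1999' style used in this workbook) are all assumptions.

```python
from datetime import date, datetime, timedelta

DATE_FORMAT = "%m-%d-%Y"  # assumed from the '10-1-1999' style used in this workbook

def resolve_date_range(payload, max_loaded_game_date, today):
    """Return (start, end), falling back to the defaults described for the scraper."""
    start_text = payload.get("StartDate")
    end_text = payload.get("EndDate")
    # No start date: continue from the latest game date already loaded.
    start = (datetime.strptime(start_text, DATE_FORMAT).date()
             if start_text else max_loaded_game_date)
    # No end date: stop at yesterday, the last day games are guaranteed complete.
    end = (datetime.strptime(end_text, DATE_FORMAT).date()
           if end_text else today - timedelta(days=1))
    return start, end

print(resolve_date_range({}, date(2021, 3, 9), date(2021, 3, 11)))
```

With an empty payload, as the scheduled job sends, both dates come from the defaults.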
  
**NOTE:** All three functions are set to allow all users to invoke them in the current build. This avoids setting up credentials for Cloud Scheduler. A future build will seek to remove this vulnerability by properly setting up Cloud Scheduler credentials.

**IMPORTANT** The deploy functions will only run if you have launched this notebook from a git cloned folder. Otherwise, you will need to change the "source" to the file path where the folders containing the relevant functions and requirements exist.

**NOTE:** Deploying can take some time, up to 5 minutes, for each function.
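
The three deploy cells below differ only in the function name, source folder, entry point, memory, timeout, and environment variables. If you prefer to deploy from a script, one option is to build the command as an argument list from those values. The sketch below uses only flags that appear in this workbook; the helper name and the `my-project` values are placeholders.

```python
def build_deploy_command(name, source, entry_point, project_id, service_account,
                         memory="1024MB", timeout=300, env_vars=None):
    """Return the gcloud argument list for one HTTP-triggered function deploy."""
    command = [
        "gcloud", "functions", "deploy", name,
        f"--source={source}",
        f"--project={project_id}",
        "--allow-unauthenticated",
        f"--entry-point={entry_point}",
        f"--memory={memory}",
        "--runtime=python38",
        f"--service-account={service_account}",
        "--trigger-http",
        f"--timeout={timeout}",
    ]
    if env_vars:
        pairs = ",".join(f"{key}={value}" for key, value in env_vars.items())
        command.append(f"--set-env-vars={pairs}")
    return command

# Placeholder project values, for illustration only.
command = build_deploy_command(
    "nba_get_upcoming_games", "../get_schedule", "write_to_bucket",
    project_id="my-project",
    service_account="cloudfunction-service-account@my-project.iam.gserviceaccount.com",
    memory="512MB", timeout=60,
    env_vars={"CLOUD_STORAGE_BUCKET": "my-project.appspot.com"},
)
print(" ".join(command))
```

The list could then be passed to `subprocess.run(command, check=True)`, which sidesteps shell quoting.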

In [23]:
## Set variables used in each deploy. You should not need to change these if you have followed 
# all of the steps above for creating the service account and the App Engine application.
CLOUD_FUNCTION_SERVICE_ACCOUNT = f'cloudfunction-service-account@{new_project_id}.iam.gserviceaccount.com'
CLOUD_STORAGE_BUCKET = f'{new_project_id}.appspot.com'

In [43]:
# Deploy function
FUNCTION_NAME='nba_basketball_reference_scraper'

!gcloud functions deploy {FUNCTION_NAME} \
  --source=../scraper \
  --project={new_project_id} \
  --allow-unauthenticated \
  --entry-point=nba_basketballreference_scraper \
  --memory=1024MB \
  --runtime=python38 \
  --service-account={CLOUD_FUNCTION_SERVICE_ACCOUNT} \
  --trigger-http \
  --timeout=300

# Set policy on function to allow allUsers to invoke
!gcloud functions add-iam-policy-binding {FUNCTION_NAME} \
  --member=allUsers \
  --role=roles/cloudfunctions.invoker \
  --project={new_project_id}

availableMemoryMb: 1024
buildId: d6a390cd-db7e-4eb8-8fcd-72acca0e1ac7
entryPoint: nba_basketballreference_scraper
httpsTrigger:
  securityLevel: SECURE_OPTIONAL
  url: https://us-central1-nba-predictions-prod.cloudfunctions.net/nba_basketball_reference_scraper
ingressSettings: ALLOW_ALL
labels:
  deployment-tool: cli-gcloud
name: projects/nba-predictions-prod/locations/us-central1/functions/nba_basketball_reference_scraper
runtime: python38
serviceAccountEmail: cloudfunction-service-account@nba-predictions-prod.iam.gserviceaccount.com
sourceUploadUrl: https://storage.googleapis.com/gcf-upload-us-central1-9863c3ad-8942-48e1-9766-5969f5bffe72/4b19ce68-91dc-4975-af05-90fc3c6709ed.zip?GoogleAccessId=service-130738074716@gcf-admin-robot.iam.gserviceaccount.com&Expires=1615445289&Signature=NkiD7lpfg2S0tRQsxGTj9AMEwD%2FvlXuAZIQSnYH67kF3ABh5xtHn3xKF50t65KTu8jYOa2jAXPa%2BXqvzfBKZLCKd92GoWmwQu%2F2dLeecc%2BWdThZfE7Lwn7W6i%2FkUqnzx2cEcOlrB77%2FNnIUSxJGtvAUiglFec19ZkMYByUdn4CHmnI1YaxP%2F7vDz2%2BMGB

Deploying function (may take a while - up to 2 minutes)...
..
For Cloud Build Stackdriver Logs, visit: https://console.cloud.google.com/logs/viewer?project=nba-predictions-prod&advancedFilter=resource.type%3Dbuild%0Aresource.labels.build_id%3Dd6a390cd-db7e-4eb8-8fcd-72acca0e1ac7%0AlogName%3Dprojects%2Fnba-predictions-prod%2Flogs%2Fcloudbuild
......................................done.


bindings:
- members:
  - allUsers
  role: roles/cloudfunctions.invoker
etag: BwW9PMNIilw=
version: 1


In [26]:
# Deploy function
FUNCTION_NAME='nba_model_game_refresh'

!gcloud functions deploy {FUNCTION_NAME} \
  --source=../data_model \
  --project={new_project_id} \
  --allow-unauthenticated \
  --entry-point=create_model_data \
  --memory=1024MB \
  --runtime=python38 \
  --service-account={CLOUD_FUNCTION_SERVICE_ACCOUNT} \
  --trigger-http \
  --timeout=300

# Set policy on function to allow allUsers to invoke
!gcloud functions add-iam-policy-binding {FUNCTION_NAME} \
  --member=allUsers \
  --role=roles/cloudfunctions.invoker \
  --project={new_project_id}

availableMemoryMb: 1024
buildId: f9aa3bcf-dd35-4e88-8001-44d97be4cfc1
entryPoint: create_model_data
httpsTrigger:
  securityLevel: SECURE_OPTIONAL
  url: https://us-central1-nba-predictions-prod.cloudfunctions.net/nba_model_game_refresh
ingressSettings: ALLOW_ALL
labels:
  deployment-tool: cli-gcloud
name: projects/nba-predictions-prod/locations/us-central1/functions/nba_model_game_refresh
runtime: python38
serviceAccountEmail: cloudfunction-service-account@nba-predictions-prod.iam.gserviceaccount.com
sourceUploadUrl: https://storage.googleapis.com/gcf-upload-us-central1-9863c3ad-8942-48e1-9766-5969f5bffe72/8505b351-ff8f-4876-ad12-74e69ad9c9fb.zip?GoogleAccessId=service-130738074716@gcf-admin-robot.iam.gserviceaccount.com&Expires=1615444524&Signature=sETCP7kZi6LLLG%2FULTfLc2NPGY7SmxEY8K3c0VVkKzT3HltqhQ5z7aLQnWyjm%2B1ryFM00BdZk2df8SafnDukl2qRbCYaTzdr8zkEuGr5kYNsuGHniRv9JNzJ75rGOfjHpgLU7%2FvKmj%2FJmGtAtQwmRzYJDjZC8JcOUlJ6XJcx0Fv918S2TQCjThzokX%2F6U7%2BxvKnQIsFQk9o6WSfnObPRBEjZmn%2B5S5cya

Deploying function (may take a while - up to 2 minutes)...
.
For Cloud Build Stackdriver Logs, visit: https://console.cloud.google.com/logs/viewer?project=nba-predictions-prod&advancedFilter=resource.type%3Dbuild%0Aresource.labels.build_id%3Df9aa3bcf-dd35-4e88-8001-44d97be4cfc1%0AlogName%3Dprojects%2Fnba-predictions-prod%2Flogs%2Fcloudbuild
................................................................................done.


bindings:
- members:
  - allUsers
  role: roles/cloudfunctions.invoker
etag: BwW9PJl7v5A=
version: 1


In [27]:
# Deploy function
FUNCTION_NAME='nba_get_upcoming_games'

!gcloud functions deploy {FUNCTION_NAME} \
  --source=../get_schedule \
  --project={new_project_id} \
  --allow-unauthenticated \
  --entry-point=write_to_bucket \
  --memory=512MB \
  --runtime=python38 \
  --service-account={CLOUD_FUNCTION_SERVICE_ACCOUNT} \
  --trigger-http \
  --timeout=60 \
  --set-env-vars=CLOUD_STORAGE_BUCKET={CLOUD_STORAGE_BUCKET}

# Set policy on function to allow allUsers to invoke
!gcloud functions add-iam-policy-binding {FUNCTION_NAME} \
  --member=allUsers \
  --role=roles/cloudfunctions.invoker \
  --project={new_project_id}

availableMemoryMb: 512
buildId: 20939101-2161-4d0b-9750-1fb4b1d8a2cd
entryPoint: write_to_bucket
environmentVariables:
  CLOUD_STORAGE_BUCKET: nba-predictions-prod.appspot.com
httpsTrigger:
  securityLevel: SECURE_OPTIONAL
  url: https://us-central1-nba-predictions-prod.cloudfunctions.net/nba_get_upcoming_games
ingressSettings: ALLOW_ALL
labels:
  deployment-tool: cli-gcloud
name: projects/nba-predictions-prod/locations/us-central1/functions/nba_get_upcoming_games
runtime: python38
serviceAccountEmail: cloudfunction-service-account@nba-predictions-prod.iam.gserviceaccount.com
sourceUploadUrl: https://storage.googleapis.com/gcf-upload-us-central1-9863c3ad-8942-48e1-9766-5969f5bffe72/9293a9f2-e8d5-4327-8bba-6887f0cbf721.zip?GoogleAccessId=service-130738074716@gcf-admin-robot.iam.gserviceaccount.com&Expires=1615444656&Signature=jT2FBgjG1d%2Fbe9%2BITzEfWybsBlj%2FmqARh8Hp%2BpmjcZSorpCPsqq9G1bcdA9o3IIi1CFsrpS95LvRGGEB6d8EkZilcYSjSm8vUJ%2B%2BUH3dHI1K7xFGwm%2Bwd5EWsK6biEuxTllO0WUnhxX2lmoMoql5X

Deploying function (may take a while - up to 2 minutes)...
..
For Cloud Build Stackdriver Logs, visit: https://console.cloud.google.com/logs/viewer?project=nba-predictions-prod&advancedFilter=resource.type%3Dbuild%0Aresource.labels.build_id%3D20939101-2161-4d0b-9750-1fb4b1d8a2cd%0AlogName%3Dprojects%2Fnba-predictions-prod%2Flogs%2Fcloudbuild
..................................................................done.


bindings:
- members:
  - allUsers
  role: roles/cloudfunctions.invoker
etag: BwW9PKBI4Tw=
version: 1


## Step 9 - Create BigQuery View

In order to use the nba_model_game_refresh function, we need to create a BigQuery view that identifies which games have been loaded into the raw_basketballreference_game table but have not yet been loaded into the model_game_data table. Copying datasets does not copy views, so you will always need to run this step, even if you copied the entire dataset directly.

**IMPORTANT** If you ever change the number of games used for the weighted moving average (W), you will need to update this view as well. The `game_number <= W` filter needs to match however many games you are averaging over. A future release will seek to remove this dependency, as it is too easy to miss.
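
The core of the view is the `row_number() OVER (PARTITION BY team ORDER BY game_date desc)` ranking combined with the `game_number <= W` filter: for each team, keep only its W most recent games, counting home and visitor appearances alike. The same idea in plain Python, as a sketch over made-up rows that reuse the view's column names (the helper name is hypothetical):

```python
from collections import defaultdict

def latest_games_per_team(games, window):
    """Return {team: [game_key, ...]} holding each team's `window` most recent games."""
    by_team = defaultdict(list)
    for game in games:
        # Each game counts once for the home team and once for the visitor.
        for team in (game["home_team_name"], game["visitor_team_name"]):
            by_team[team].append((game["game_date"], game["game_key"]))
    return {
        team: [key for _, key in sorted(rows, reverse=True)[:window]]
        for team, rows in by_team.items()
    }

# Made-up rows; ISO date strings sort in chronological order.
games = [
    {"game_key": "g1", "game_date": "2021-03-01", "home_team_name": "A", "visitor_team_name": "B"},
    {"game_key": "g2", "game_date": "2021-03-03", "home_team_name": "B", "visitor_team_name": "A"},
    {"game_key": "g3", "game_date": "2021-03-05", "home_team_name": "A", "visitor_team_name": "C"},
]
print(latest_games_per_team(games, 2))
```

With a window of 2, team A's oldest game (g1) drops out, which is what the filter does for W games in the view.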

In [49]:
## Choose how many games you want to include for the weighted moving average
W = 20

In [50]:
## Change dataset name (nba) if you chose a different dataset name earlier
view_name = 'nba.games_to_load_to_model'
view_query = f'CREATE OR REPLACE VIEW `{view_name}` AS \
WITH model_load_games as (SELECT \
distinct left(game_key,length(game_key)-1) as game_key \
FROM `nba.model_game` \
) \
    SELECT distinct order_of_games_per_team.game_key, \
    CASE WHEN model_load_games.game_key is NULL THEN 1 ELSE 0 END as NEEDS_TO_LOAD_TO_MODEL \
    FROM ( \
            SELECT team, game_key, row_number() OVER (PARTITION BY team ORDER BY game_date desc) as game_number \
            FROM ( \
                    SELECT \
                        home_team_name as team, game_date, game_key \
                    FROM  `nba.raw_basketballreference_game` \
                    UNION DISTINCT \
                    SELECT \
                        visitor_team_name as team, game_date, game_key \
                    FROM  `nba.raw_basketballreference_game` \
                 ) games_per_team \
            )order_of_games_per_team \
    LEFT JOIN model_load_games ON model_load_games.game_key = order_of_games_per_team.game_key \
    WHERE \
        game_number <= {W} \
        and team in ( \
                    SELECT \
                        distinct home_team_name as team_to_load \
                    FROM `nba.raw_basketballreference_game` \
                    WHERE \
                    game_date >= (SELECT date_sub(max(game_date), INTERVAL 1 YEAR) FROM `nba.raw_basketballreference_game` ) \
                    and game_key not in (SELECT game_key FROM model_load_games) \
                    UNION DISTINCT \
                    SELECT \
                        distinct visitor_team_name as team_to_load \
                    FROM `nba.raw_basketballreference_game` \
                    WHERE \
                    game_date >= (SELECT date_sub(max(game_date), INTERVAL 1 YEAR) FROM `nba.raw_basketballreference_game`) \
                    and game_key not in (SELECT game_key FROM model_load_games))'

run_view = f'''bq query --use_legacy_sql=false --project_id={new_project_id} "{view_query}"'''
!{run_view}

Replaced nba-predictions-prod.nba.games_to_load_to_model


Waiting on bqjob_r19e3ccb46c812488_000001781ff3213f_1 ... (0s) Current status: RUNNING
                                                                                      
Waiting on bqjob_r19e3ccb46c812488_000001781ff3213f_1 ... (0s) Current status: DONE   






## Step 10 - Create Cloud Scheduler Jobs

This is only required if you wish to keep your data up to date. If you do not, simply make sure you execute the nba_model_game_refresh and nba_get_upcoming_games functions once so that the web app can function with the most recent game and upcoming schedule information.

In [30]:
## TO DO: Replace region with the region your cloud functions are deployed to, and timezone with the time zone the schedules should run in
region = 'us-central1'
timezone = 'America/Chicago'

In [31]:
# Create daily scraper schedule
uri = f'https://{region}-{new_project_id}.cloudfunctions.net/nba_basketball_reference_scraper'
!gcloud scheduler jobs create http nba_basketball_reference_scraper_daily --project {new_project_id} \
--schedule "0 6 * * *" --uri {uri} --http-method GET \
--time-zone={timezone} \
--description="Calls http cloud function nba_basketball_reference_scraper every day to scrape the most recent days information and add to big query tables"

# Create daily model refresh schedule
uri = f'https://{region}-{new_project_id}.cloudfunctions.net/nba_model_game_refresh'
!gcloud scheduler jobs create http nba_model_game_refresh_daily --project {new_project_id} \
--schedule "0 7 * * *" --uri {uri} --http-method GET \
--time-zone={timezone} \
--description="Calls http cloud function nba_model_game_refresh every day to load the most recently scraped data in to the model table and most recent data for each team to firestore"

# Create upcoming games refresh schedule
uri = f'https://{region}-{new_project_id}.cloudfunctions.net/nba_get_upcoming_games'
!gcloud scheduler jobs create http nba_get_upcoming_games --project {new_project_id} \
--schedule "0 5 * * *" --uri {uri} --http-method GET \
--time-zone={timezone} \
--description="Calls http cloud function nba_get_upcoming_games every day to scrape the schedule for the upcoming week and store to cloud storage"
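
The three jobs above differ only in job name, target function, and cron schedule. As a minimal sketch, the commands could be generated from one table. The helper name `scheduler_command` and the project ID `my-project-id` are placeholders for illustration, and the description text is shortened. The flags are the same ones used in the cell above.

```python
import shlex


def scheduler_command(job_name, function_name, schedule, region, project_id, time_zone):
    """Build the argument list for `gcloud scheduler jobs create http`."""
    uri = f"https://{region}-{project_id}.cloudfunctions.net/{function_name}"
    return [
        "gcloud", "scheduler", "jobs", "create", "http", job_name,
        "--project", project_id,
        "--schedule", schedule,
        "--uri", uri,
        "--http-method", "GET",
        f"--time-zone={time_zone}",
        f"--description=Calls http cloud function {function_name} every day",
    ]


# (job name, function name, cron schedule: minute hour day-of-month month day-of-week)
jobs = [
    ("nba_get_upcoming_games", "nba_get_upcoming_games", "0 5 * * *"),
    ("nba_basketball_reference_scraper_daily", "nba_basketball_reference_scraper", "0 6 * * *"),
    ("nba_model_game_refresh_daily", "nba_model_game_refresh", "0 7 * * *"),
]

# "my-project-id" is a placeholder; in the notebook this would be new_project_id.
commands = [
    scheduler_command(job, fn, cron, "us-central1", "my-project-id", "America/Chicago")
    for job, fn, cron in jobs
]

for cmd in commands:
    print(shlex.join(cmd))
```

Listing the jobs in schedule order makes the daily sequence visible: the upcoming schedule is fetched at 05:00, new games are scraped at 06:00, and the model table is refreshed at 07:00 in the chosen time zone.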

## Step 11 - Trigger Cloud Functions

To populate Firestore with the most recent game data and Cloud Storage with the upcoming games, the deployed functions must be triggered. This can be done in the Console or by using the script below.

In [32]:
import json

# Quote the empty JSON body so it reaches gcloud as a single argument
empty_data = '{}'
empty_data = json.dumps(empty_data)
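
Calling `json.dumps` on a string that already holds JSON may look redundant. A small sketch of what it produces is below. The interpretation that this is done for shell quoting is an assumption based on how `empty_data` is interpolated into the `gcloud functions call` commands.

```python
import json

payload = '{}'                   # the JSON text the function should receive
shell_arg = json.dumps(payload)  # encoding a str adds surrounding double quotes

print(shell_arg)                 # "{}"

# Decoding the quoted form gives back the original empty-object text.
assert json.loads(shell_arg) == payload
```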

In [44]:
!gcloud functions call --project {new_project_id} nba_basketball_reference_scraper --data {empty_data}

executionId: mtjj6sp0slnc
result: Successfully loaded 0 row(s) to raw_basketballreference_playerbox and 0 to
  raw_basketballreference_game


In [34]:
!gcloud functions call nba_model_game_refresh --project {new_project_id} --data {empty_data}

executionId: 3pfcn0ytejkh
result: Function ended early. No new data to load.


In [35]:
!gcloud functions call nba_get_upcoming_games --project {new_project_id} --data {empty_data}

executionId: i522ky9qn7h6
result: Successfully updated bucket with upcoming games


## Step 12 - Create Static Model Training Data View

For transparency and auditability, we create a view over the model_game table that is restricted to specific dates and has a date-stamped name. This lets us come back later and train different models on the same data. These are created as views so they are not part of what is copied externally. You could create them as tables instead, but you would have to pay additional storage costs.

In [45]:
#TO DO: Change timezone to your timezone if desired
timezone = 'America/Chicago'

In [54]:
## Create a view that excludes each team's first game of every season because its rest days will be way off
# and its moving averages date back to the previous season.

query = f"""EXECUTE IMMEDIATE CONCAT(' \
                CREATE OR REPLACE VIEW `nba.model_training_data_' \
                , FORMAT_DATE('%Y%m%d', CURRENT_DATE(\\"{timezone}\\")) \
            ,'` AS \
                SELECT * FROM ( \
                    SELECT \
                        *, \
                        ROW_NUMBER() OVER (PARTITION BY g.SEASON, g.TEAM ORDER BY g.game_date asc) as SEASON_GAME_NUMBER, \
                    FROM nba.model_game g \
            ) WHERE SEASON_GAME_NUMBER > 1 and is_home_team = 1 \
                and game_date < DATE_SUB(CURRENT_DATE(\\"{timezone}\\"), INTERVAL 1 WEEK)')"""

run_query = f'''bq query --use_legacy_sql=false --project_id={new_project_id} "{query}"'''

!{run_query}

CREATE OR REPLACE VIEW `nba.model_training_data_20210311` AS                 SELECT * FROM (                     SELECT                         *,                         ROW_NUMBER() OVER (PARTITION BY g.SEASON, g.TEAM ORDER BY g.game_date asc) as SEASON_GAME_NUMBER,                     FROM nba.model_game g             ) WHERE SEASON_GAME_NUMBER > 1 and is_home_team = 1                 and game_date < DATE_SUB(CURRENT_DATE("America/Chicago"), INTERVAL 1 WEEK); 
-- at Dynamic SQL[1:18]
-- at [1:1]
Replaced nba-predictions-prod.nba.model_training_data_20210311




Waiting on bqjob_rdb38301aabf049b_000001781ff6c721_1 ... (0s) Current status: RUNNING
                                                                                     
Waiting on bqjob_rdb38301aabf049b_000001781ff6c721_1 ... (0s) Current status: DONE   
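
To make the view's filter concrete, here is a minimal pure-Python sketch of the same logic: number each team's games within a season by date, then keep only home-team rows after the first game that are more than a week old. The rows are hypothetical and exist only for illustration; the real view reads from `nba.model_game`.

```python
from datetime import date, timedelta
from itertools import groupby


def training_rows(rows, today):
    """Mimic the view's WHERE clause on a list of dict rows."""
    cutoff = today - timedelta(weeks=1)  # DATE_SUB(CURRENT_DATE, INTERVAL 1 WEEK)
    ordered = sorted(rows, key=lambda r: (r["season"], r["team"], r["game_date"]))
    kept = []
    # ROW_NUMBER() OVER (PARTITION BY season, team ORDER BY game_date)
    for _, games in groupby(ordered, key=lambda r: (r["season"], r["team"])):
        for game_number, row in enumerate(games, start=1):
            if game_number > 1 and row["is_home_team"] == 1 and row["game_date"] < cutoff:
                kept.append({**row, "season_game_number": game_number})
    return kept


# Hypothetical rows, not real data.
rows = [
    {"season": 2021, "team": "CHI", "game_date": date(2020, 12, 23), "is_home_team": 1},
    {"season": 2021, "team": "CHI", "game_date": date(2020, 12, 26), "is_home_team": 1},
    {"season": 2021, "team": "CHI", "game_date": date(2020, 12, 27), "is_home_team": 0},
    {"season": 2021, "team": "CHI", "game_date": date(2021, 3, 9), "is_home_team": 1},
]

result = training_rows(rows, today=date(2021, 3, 11))
print([(r["game_date"].isoformat(), r["season_game_number"]) for r in result])
# [('2020-12-26', 2)]
```

The first row is dropped as the season opener, the third as an away game, and the fourth because it falls inside the final week before the view was created.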


## Step 13 - Create Baseline Linear Model using View

We will now use the data in the view we just created to generate a linear model on all of the relevant variables. 

You will definitely want to open the Console to explore the model further, but that is left as a separate task.

**NOTE:** This is the most time-consuming and costly step. Be careful about running it too many times, but do experiment with different model types.

In [55]:
## If running Step 13 on the same date as Step 12, execute this cell as-is to set the view date
from datetime import datetime
view_date = datetime.now().strftime('%Y%m%d')

## If running Step 13 on a different day than Step 12, uncomment the line below and set it to the date you created the view in Step 12
# view_date = '20210310'
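
The view name in Step 12 is built from `CURRENT_DATE` in the chosen time zone, while `datetime.now()` uses the local clock of the machine running the notebook, so the two can disagree near midnight. A minimal sketch of computing the suffix for an explicit time zone follows. The helper name `view_date_suffix` is hypothetical, and the fixed UTC-6 offset only stands in for America/Chicago standard time in this demo.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def view_date_suffix(moment: datetime, tz: tzinfo) -> str:
    """Return the YYYYMMDD date of `moment` as seen in time zone `tz`."""
    return moment.astimezone(tz).strftime("%Y%m%d")


# Fixed UTC-6 offset used as a stand-in for America/Chicago (demo assumption).
central_standard = timezone(timedelta(hours=-6))
late_evening = datetime(2021, 3, 11, 5, 0, tzinfo=timezone.utc)

print(view_date_suffix(late_evening, central_standard))  # 20210310
print(view_date_suffix(late_evening, timezone.utc))      # 20210311
```

On Python 3.9 or later, passing `zoneinfo.ZoneInfo('America/Chicago')` as `tz` would also account for daylight saving time; on Windows this may require the `tzdata` package.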

In [56]:
model_query = f"""CREATE OR REPLACE MODEL nba.baseline_linear_model \
  OPTIONS(model_type='LINEAR_REG', input_label_cols=['spread']) \
    AS SELECT spread, \
        is_home_team, \
        incoming_is_win_streak, \
        incoming_is_win_streak_opponent, \
        incoming_wma_{W}_pace, \
        incoming_wma_{W}_efg_pct, \
        incoming_wma_{W}_tov_pct, \
        incoming_wma_{W}_ft_rate, \
        incoming_wma_{W}_off_rtg, \
        incoming_wma_{W}_opponent_efg_pct, \
        incoming_wma_{W}_opponent_tov_pct, \
        incoming_wma_{W}_opponent_ft_rate, \
        incoming_wma_{W}_opponent_off_rtg, \
        incoming_wma_{W}_starter_minutes_played_proportion, \
        incoming_wma_{W}_bench_plus_minus,\
        incoming_wma_{W}_opponnent_starter_minutes_played_proportion, \
        incoming_wma_{W}_opponent_bench_plus_minus, \
        incoming_rest_days - incoming_rest_days_opponent as rest_days_difference \
    FROM `nba.model_training_data_{view_date}`"""

model_query = f'''bq query --use_legacy_sql=false --project_id={new_project_id} "{model_query}"'''

!{model_query}

Created nba-predictions-prod.nba.baseline_linear_model




Waiting on bqjob_r26d98112aec956fe_000001781ff6ec32_1 ... (0s) Current status: RUNNING
                                                                                      
Waiting on bqjob_r26d98112aec956fe_000001781ff6ec32_1 ... (1s) Current status: RUNNING
                                                                                      
Waiting on bqjob_r26d98112aec956fe_000001781ff6ec32_1 ... (2s) Current status: RUNNING
                                                                                      
Waiting on bqjob_r26d98112aec956fe_000001781ff6ec32_1 ... (3s) Current status: RUNNING
                                                                                      
Waiting on bqjob_r26d98112aec956fe_000001781ff6ec32_1 ... (4s) Current status: RUNNING
                                                                                      
Waiting on bqjob_r26d98112aec956fe_000001781ff6ec32_1 ... (5s) Current status: RUNNING
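
As an alternative to the Console, the trained model can be inspected from the notebook with BigQuery ML's `ML.EVALUATE` function. The sketch below only builds the command string in the same style as the cells above. The helper name `evaluate_command` and the project ID `my-project-id` are placeholders.

```python
def evaluate_command(project_id, model="nba.baseline_linear_model"):
    """Build a bq command that returns BigQuery ML evaluation metrics."""
    query = f"SELECT * FROM ML.EVALUATE(MODEL {model})"
    return f'bq query --use_legacy_sql=false --project_id={project_id} "{query}"'


# "my-project-id" is a placeholder; in the notebook this would be new_project_id.
evaluate_model = evaluate_command("my-project-id")
print(evaluate_model)
```

In the notebook, the resulting string could be run with `!{evaluate_model}`. For a linear regression model the result should include error metrics such as mean absolute error and R squared.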
                                          

## Step 14 - Deploy App Engine App

We are finally ready to deploy the App Engine web app! If you have successfully completed all steps above, you should be able to navigate to a webpage that works the same as the [webpage](https://nba-predictions-prod.uc.r.appspot.com/) in the Readme.

As a prerequisite, make sure you are running this notebook from the folder created by the git clone, or replace the file path below with the correct file path.

In [57]:
!gcloud app deploy ../webapp/app.yaml --project={new_project_id} --promote --quiet
print(f'Check out your new web page at https://{new_project_id}.uc.r.appspot.com/')

Services to deploy:

descriptor:      [C:\environments\nba-predictions\nba-predictions\webapp\app.yaml]
source:          [C:\environments\nba-predictions\nba-predictions\webapp]
target project:  [nba-predictions-prod]
target service:  [default]
target version:  [20210311t002656]
target url:      [http://nba-predictions-prod.uc.r.appspot.com]


Beginning deployment of service [default]...
#= Uploading 11 files to Google Cloud Storage               =#
File upload done.
Updating service [default]...
..................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................

## Optional - Delete Project

To avoid ongoing charges for everything created in this workbook, run the command below to delete the project that you just created. Note that full deletion takes approximately 30 days, and you will still be billed for any charges accrued during this walkthrough. See [Deleting GCP Project](https://cloud.google.com/resource-manager/docs/creating-managing-projects?visit_id=637510410447506984-2569255859&rd=1#shutting_down_projects) for more information.

In [None]:
!gcloud projects delete {new_project_id}