# Google Cloud Platform Project Creation Workbook 
 
Use this workbook to create a Google Cloud project with everything needed to collect new data and host your own web app. 
 
Prerequisites:  
+ Create Google user account  <br><br>
+ Create your own personal Google Cloud Project and Enable Billing
    - Enable a Free Tier account by selecting "Try it Free" here: [Try Google Cloud Platform for free](https://cloud.google.com/cloud-console)
    - Follow the steps to activate billing found here: [Create New Billing Account](https://cloud.google.com/billing/docs/how-to/manage-billing-account#create_a_new_billing_account)
        - A billing account is required for the APIs used in this project
        - Setting up this project should not exceed the $300 free trial credit, but make sure to delete the project afterwards if you do not want to be charged
        - Take note of the name of the project you created, because its billing account will be used with the new project <br><br>
+ Install and initialize Google Cloud SDK by following instructions found here: [Cloud SDK Quickstart](https://cloud.google.com/sdk/docs/quickstart) <br><br>

## Step 1 - Check Prerequisites Successfully Completed
Check that you have successfully installed and initialized the Cloud SDK by running the config list command. If you get an error, please refer to the troubleshooting steps found here: [Cloud SDK Quickstart](https://cloud.google.com/sdk/docs/quickstart).  
You should see output that includes your account along with any other configuration you set up when running `gcloud init`.

In [None]:
!gcloud config list
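
If the command above fails with a "not found" style error, the executable is probably not on your `PATH`. The following is a minimal sketch, using only the Python standard library, that checks for the command-line tools used later in this workbook. The helper name `cli_available` is illustrative and not part of the Cloud SDK.

```python
import shutil

def cli_available(name):
    """Return True if an executable called `name` can be found on PATH."""
    return shutil.which(name) is not None

# Tools invoked with `!` in later cells of this workbook
for tool in ("gcloud", "kubectl", "docker", "bq", "gsutil"):
    status = "found" if cli_available(tool) else "NOT found"
    print(f"{tool}: {status}")
```

A tool reported as "NOT found" needs to be installed, or its install directory added to `PATH`, before the matching step will run.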

In [None]:
#!gcloud auth login

## Step 2 - Create GCP Project

In [None]:
###### TO DO: Enter an ID for the new project
###### Note: The project ID must be unique across GCP. If you get an error when creating the project, change the project ID here and try again.

new_project_id = 'spark-container-testing-5'
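
Besides being unique, a project ID has to satisfy a format rule: 6 to 30 characters, lowercase letters, digits, and hyphens only, starting with a letter and not ending with a hyphen. The sketch below checks that rule locally before calling `gcloud`. The function name `is_valid_project_id` is an assumption made for illustration, and the check cannot tell you whether the ID is already taken.

```python
import re

# 6-30 characters: a leading letter, then letters/digits/hyphens,
# and a final character that is not a hyphen.
PROJECT_ID_PATTERN = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")

def is_valid_project_id(project_id):
    """Return True if `project_id` matches the format rule described above."""
    return PROJECT_ID_PATTERN.fullmatch(project_id) is not None

for candidate in ("spark-container-testing-5", "Spark-Testing", "abc", "ends-with-hyphen-"):
    print(candidate, is_valid_project_id(candidate))
```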

In [None]:
!gcloud projects create {new_project_id}

In [None]:
!gcloud config set project {new_project_id}

#### IMPORTANT
***TO DO: Navigate to the [Cloud Console](https://console.cloud.google.com/), change to the new project, and enable billing following the instructions found here: [Enable Billing](https://cloud.google.com/billing/docs/how-to/modify-project#enable_billing_for_a_project)***


## Step 3 - Enable Necessary Cloud Services

This project uses:
+ Google Kubernetes Engine as the Kubernetes cluster manager
+ Google Container Registry to store Spark Docker container images
  
The list below contains all services needed at the time this workbook was created. Add to or remove from this list if the service names or the set of necessary services have changed.

In [None]:
enable_services_list = [
    'bigquery.googleapis.com',
    'bigquerystorage.googleapis.com',
    'cloudapis.googleapis.com',
    'cloudbuild.googleapis.com',
    'clouddebugger.googleapis.com',
    'cloudtrace.googleapis.com',
    'compute.googleapis.com',
    'container.googleapis.com',
    'containeranalysis.googleapis.com',
    'containerregistry.googleapis.com',
    'iam.googleapis.com',
    'iamcredentials.googleapis.com',
    'language.googleapis.com',
    'oslogin.googleapis.com',
    'servicemanagement.googleapis.com',
    'serviceusage.googleapis.com',
    'sql-component.googleapis.com',
    'storage-api.googleapis.com',
    'storage-component.googleapis.com',
    'storage.googleapis.com'    
]

In [None]:
## Services can only be enabled 20 at a time at the time of workbook creation. Use this loop to enable 20 at a time.
for x in range(0,len(enable_services_list),20):
    !gcloud services enable {' '.join(enable_services_list[x:(x+20)])} --project={new_project_id}   
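
The batching idea in the loop above can also be written as plain Python that builds one argument list per batch. This is a sketch under assumptions: the helper names `chunked` and `build_enable_commands` and the example service names are made up for illustration, and nothing is executed here.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def build_enable_commands(services, project_id, batch_size=20):
    """Return one `gcloud services enable` argument list per batch."""
    # Strip stray whitespace so names such as 'iam.googleapis.com ' are cleaned up
    cleaned = [name.strip() for name in services if name.strip()]
    return [
        ["gcloud", "services", "enable", *batch, f"--project={project_id}"]
        for batch in chunked(cleaned, batch_size)
    ]

# Placeholder service names and project ID, for illustration only
example_services = [f"service{i}.googleapis.com" for i in range(45)]
commands = build_enable_commands(example_services, "my-example-project")
print(len(commands))  # 45 services in batches of 20 give 3 commands
```

Each argument list could then be handed to `subprocess.run(command, check=True)` so that a failing batch raises an error instead of passing silently.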

In [None]:
# Check that services were enabled
!gcloud services list --project={new_project_id}

## Step 4 - Create Necessary Service Accounts

There are two primary service accounts used in this project:  
- **Deployment Service Account**
    - We create this and add necessary roles below using the Cloud SDK
    - deployer-sa@your_project_id.iam.gserviceaccount.com
    - This account is used to deploy and test the Docker container and the Kubernetes cluster<br><br>
- **BigQuery Service Account**
    - We create this and add necessary roles below using the Cloud SDK
    - bigquery-sa@your_project_id.iam.gserviceaccount.com
    - This account is used in the container for access to BigQuery

Check which service accounts already exist. The two accounts described above are created in the following cells, so at this point you should only see accounts that Google created by default for the project.

In [None]:
!gcloud iam service-accounts list --project={new_project_id}

In [None]:
!gcloud iam service-accounts create deployer-sa \
    --display-name="Deployment Service Account" \
    --description="Account used to deploy to Google Cloud Project" \
    --project={new_project_id}

In [None]:
!gcloud iam service-accounts create bigquery-sa \
    --display-name="BigQuery Service Account" \
    --description="Account used by Spark Containers to Connect to BigQuery" \
    --project={new_project_id}

Check service accounts were created successfully

In [None]:
!gcloud iam service-accounts list --project={new_project_id}

Programmatically update the roles for the new service accounts using the guide found here: [Programmatic Change Access](https://cloud.google.com/iam/docs/granting-changing-revoking-access#programmatic)

In [None]:
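# Save the policy file in a directory above where the repo is saved so that it is not committed to GitHub
# Raw string so that the backslashes in the Windows path are not treated as escape sequences
file_directory = r'..\..\policy.json'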

In [None]:
# Write current policy to file directory
!gcloud projects get-iam-policy {new_project_id} --format json > {file_directory}

**If you are running this as a Jupyter notebook, run the cell below to load and modify the policy file.**

In [None]:
import json

# Load the policy that was written to file_directory in the previous cell
with open(file_directory) as f:
    policy = json.load(f)

def modify_policy_add_role(policy, role, member):
    """Adds a new role binding to a policy."""

    binding = {"members": [member], "role": role}
    # setdefault covers a policy that has no "bindings" key yet
    policy.setdefault("bindings", []).append(binding)
    return policy

members = [f'serviceAccount:deployer-sa@{new_project_id}.iam.gserviceaccount.com', 
           f'serviceAccount:bigquery-sa@{new_project_id}.iam.gserviceaccount.com']

# Roles to grant, keyed by member
roles = {
        members[0]:['roles/editor'],
        members[1]:['roles/bigquery.dataEditor','roles/run.serviceAgent', 'roles/bigquery.user',
                    'roles/storage.admin']}

for member in members:
    for role in roles[member]:
        policy = modify_policy_add_role(policy, role, member)

# Write the modified policy back to the same file
with open(file_directory, 'w') as json_file:
    json.dump(policy, json_file)
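
The helper above appends a new binding on every call, so running the cell twice leaves duplicate entries in the policy file. As a hedged alternative, the sketch below reuses an existing unconditional binding for the same role and only adds the member when it is missing. The name `add_role_binding` and the sample policy are assumptions for illustration; bindings that carry a `condition` are deliberately left untouched.

```python
def add_role_binding(policy, role, member):
    """Add `member` to `role`, reusing an existing unconditional binding if any."""
    bindings = policy.setdefault("bindings", [])
    for binding in bindings:
        # Conditional bindings are kept separate and never modified here
        if binding.get("role") == role and "condition" not in binding:
            existing_members = binding.setdefault("members", [])
            if member not in existing_members:
                existing_members.append(member)
            return policy
    bindings.append({"role": role, "members": [member]})
    return policy

# Made-up policy and member, for illustration only
sample_policy = {
    "bindings": [{"role": "roles/owner", "members": ["user:owner@example.com"]}],
    "version": 1,
}
sample_member = "serviceAccount:bigquery-sa@my-example-project.iam.gserviceaccount.com"

add_role_binding(sample_policy, "roles/bigquery.user", sample_member)
add_role_binding(sample_policy, "roles/bigquery.user", sample_member)  # no duplicate added
print(len(sample_policy["bindings"]))
```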

In [None]:
!gcloud projects set-iam-policy {new_project_id} {file_directory}

In [None]:
# Remove policy file 
!del {file_directory}

## Step 5 - Create Kubernetes Engine Cluster

To deploy a container to Kubernetes and run an application, you first need to create a Kubernetes Engine cluster.

In [None]:
## TO DO: Change the region and zone to your default region and zone
COMPUTE_REGION = 'us-central1'
CLUSTER_NAME = 'spark-cluster'
COMPUTE_ZONE = 'us-central1-c'

In [None]:
#!gcloud compute regions list

In [None]:
!gcloud config set compute/region {COMPUTE_REGION}

In [None]:
!gcloud config set compute/zone {COMPUTE_ZONE}

In [None]:
# Create cluster with default settings. This may take several minutes
!gcloud container clusters create {CLUSTER_NAME} \
    --project={new_project_id}

In [None]:
# Get credentials to use when deploying to cluster
!gcloud container clusters get-credentials {CLUSTER_NAME}

In [None]:
account = f'bigquery-sa@{new_project_id}.iam.gserviceaccount.com' 

In [None]:
# Download bigquery service account json file
!gcloud iam service-accounts keys create sa.json \
    --iam-account={account}

In [None]:
# Create Kubernetes Secret from file
!kubectl create secret generic bigquery-credentials \
  --from-file ./sa.json

In [None]:
# Remove the service account key file from the local system now that it is stored as a Kubernetes Secret
!del sa.json
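
The `del` command only exists in the Windows shell. A platform-independent way to remove the key file is sketched below using only the standard library. The helper name `remove_if_exists` is an assumption, and the demonstration runs against a throwaway file in a temporary directory rather than the real `sa.json`.

```python
import tempfile
from pathlib import Path

def remove_if_exists(path):
    """Delete the file at `path` if present; return True when a file was removed."""
    target = Path(path)
    if target.is_file():
        target.unlink()
        return True
    return False

# Demonstration on a throwaway file standing in for sa.json
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_key = Path(tmp_dir) / "sa.json"
    demo_key.write_text("{}")
    first_call = remove_if_exists(demo_key)   # file existed, so it is removed
    second_call = remove_if_exists(demo_key)  # nothing left to remove
    still_there = demo_key.exists()

print(first_call, second_call, still_there)  # True False False
```

In the notebook, `remove_if_exists('sa.json')` would take the place of the shell command.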

## Step 6 - Create BigQuery Dataset

Your new project will need a dataset to store the data if you plan on copying/creating your own repository of data.  

This has to be a unique name per project.  

In this workbook the dataset is named 'amazon_reviews', but feel free to change it. If you do change it, you will also need to change the dataset name in the other Python scripts in this project.

In [None]:
dataset_name = 'amazon_reviews'

In [None]:
# Stop and re-run if this takes more than a minute
!bq --location=US mk --dataset \
--description "Stores transformed amazon review data originally found at https://nijianmo.github.io/amazon/index.html" \
{new_project_id}:{dataset_name}  
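
If you pick a different dataset name, note that BigQuery dataset IDs are more restrictive than project IDs: to the best of my knowledge they allow only letters, digits, and underscores, up to 1,024 characters, so a hyphenated name is rejected. The sketch below checks that rule locally. The function name `is_valid_dataset_id` is illustrative, and the rule as stated should be confirmed against the current BigQuery documentation.

```python
import re

# Assumed rule: letters, digits, and underscores only, 1 to 1,024 characters
DATASET_ID_PATTERN = re.compile(r"[A-Za-z0-9_]{1,1024}")

def is_valid_dataset_id(dataset_id):
    """Return True if `dataset_id` uses only the characters assumed above."""
    return DATASET_ID_PATTERN.fullmatch(dataset_id) is not None

for candidate in ("amazon_reviews", "amazon-reviews", "reviews 2018"):
    print(candidate, is_valid_dataset_id(candidate))
```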

## Step 7 - Build and Push Container to GCR



In [None]:
!docker build -t client-mode-spark-notebook ../client-mode

In [None]:
!docker tag client-mode-spark-notebook:latest gcr.io/{new_project_id}/client-mode-spark-notebook:latest

In [None]:
!docker push gcr.io/{new_project_id}/client-mode-spark-notebook:latest

## Step 8 - Deploy App to Cluster and Expose


In [None]:
APP_NAME = 'spark-server'

In [None]:
template = f'''
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-server
  labels:
    app: spark-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spark-server
  template:
    metadata:
      labels:
        app: spark-server
    spec:
      # The secret data is exposed to Containers in the Pod through a Volume.
      volumes:
      - name: secret-volume
        secret:
          secretName: bigquery-credentials
      containers:
      - image: gcr.io/{new_project_id}/client-mode-spark-notebook:latest
        name: client-mode-spark-notebook
        volumeMounts:
          # name must match the volume name defined above
          - name: secret-volume
            mountPath: /var/secrets/google    
'''

In [None]:
with open('deployment.yaml', 'w') as file:
    file.write(template)  
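
Because `kubectl apply -f` also accepts JSON manifests, the same Deployment can be built as a Python dictionary, which keeps the repeated names (app label, volume name) in one place. This is only a sketch of that alternative: the function `build_deployment`, its parameters, and the example project ID are assumptions, not part of the original workflow.

```python
import json

def build_deployment(app_name, image, secret_name, mount_path):
    """Return a Deployment manifest mirroring the YAML template above."""
    volume_name = "secret-volume"  # shared by the volume and its mount
    labels = {"app": app_name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app_name, "labels": labels},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "volumes": [
                        {"name": volume_name, "secret": {"secretName": secret_name}}
                    ],
                    "containers": [
                        {
                            "name": "client-mode-spark-notebook",
                            "image": image,
                            "volumeMounts": [
                                {"name": volume_name, "mountPath": mount_path}
                            ],
                        }
                    ],
                },
            },
        },
    }

manifest = build_deployment(
    app_name="spark-server",
    image="gcr.io/my-example-project/client-mode-spark-notebook:latest",  # placeholder project
    secret_name="bigquery-credentials",
    mount_path="/var/secrets/google",
)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Writing `manifest_json` to a file such as `deployment.json` would give a manifest that can be applied the same way as `deployment.yaml`.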

In [None]:
# !kubectl create deployment {APP_NAME} --save_config=true --image=gcr.io/{new_project_id}/client-mode-spark-notebook:latest --template={template} 

In [None]:
# !kubectl delete deployment {APP_NAME}

In [None]:
!kubectl apply -f deployment.yaml

In [None]:
!kubectl expose deployment {APP_NAME} --type LoadBalancer --port 80 --target-port 8888

In [None]:
!kubectl describe services
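
The external IP of a LoadBalancer service can take a minute or two to be assigned. If you prefer to read it programmatically, `kubectl get service <name> -o json` reports it under `status.loadBalancer.ingress`. The sketch below parses that JSON; the helper name `external_ip` is an assumption, and both sample documents are made up to show the pending and ready cases.

```python
import json

def external_ip(service_json):
    """Return the first external IP in a Service JSON document, or None if pending."""
    service = json.loads(service_json)
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        if "ip" in entry:
            return entry["ip"]
    return None

# Made-up documents: one before and one after the load balancer is provisioned
pending_service = json.dumps({"status": {"loadBalancer": {}}})
ready_service = json.dumps(
    {"status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}}}
)

print(external_ip(pending_service))  # None
print(external_ip(ready_service))    # 203.0.113.10
```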

## Step 9 - Create Cloud Storage Bucket

To load data into BigQuery with Spark we need a temporary Cloud Storage bucket, which we create below.

Bucket names must be globally unique, so we include the project ID in the name.

When running the Jupyter Notebook to load the data you will have to set the project ID so that it uses this bucket.

In [None]:
!gsutil mb gs://amazon_reviews_bucket-{new_project_id}
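
Other notebooks need to rebuild this bucket name from the project ID, so it can help to derive it in one place and check it against the basic naming rules: 3 to 63 characters, lowercase letters, digits, hyphens, underscores, and dots, starting and ending with a letter or digit. The sketch below covers only those basic rules, not every restriction Cloud Storage applies. The names `bucket_name_for` and `is_valid_bucket_name` are assumptions for illustration.

```python
import re

# Basic rules only: 3-63 characters, letter or digit at both ends
BUCKET_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")

def bucket_name_for(project_id):
    """Build the bucket name used in the cell above for a given project ID."""
    return f"amazon_reviews_bucket-{project_id}"

def is_valid_bucket_name(name):
    """Return True if `name` passes the basic naming rules described above."""
    return BUCKET_NAME_PATTERN.fullmatch(name) is not None

example_bucket = bucket_name_for("spark-container-testing-5")
print(f"gs://{example_bucket}", is_valid_bucket_name(example_bucket))
```

Since a project ID has at most 30 characters and the prefix has 22, a name built this way always stays under the 63-character limit.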

## Optional - Delete Project

To avoid ongoing charges for everything created in this workbook, run the command below to delete the project that you just created. Note that full deletion takes approximately 30 days, and you will still be billed for any charges accrued during this walkthrough. Check out [Deleting GCP Project](https://cloud.google.com/resource-manager/docs/creating-managing-projects?visit_id=637510410447506984-2569255859&rd=1#shutting_down_projects) for more information.

In [None]:
### Uncomment the line below to delete the project
#!gcloud projects delete {new_project_id}