# Create Google Cloud Resources and Train with AutoML

## Google Cloud Setup
1. To train a Twitter sentiment classification model with Google's AutoML cloud solution, you must first create a Google Cloud account [here](https://cloud.google.com).

2. In the Google Cloud console create a new Google Cloud project.

3. Confirm that billing is enabled for your Google Cloud project - [learn more](https://cloud.google.com/billing/docs/how-to/modify-project)

4. Enable the AutoML and Cloud Storage APIs [here](https://console.cloud.google.com/flows/enableapi?apiid=storage-component.googleapis.com,automl.googleapis.com,storage-api.googleapis.com&redirect=https://console.cloud.google.com&_ga=2.19444408.1477944611.1615487721-641531934.1615487721)

5. Create a service account with the roles `AutoML Admin` and `AutoML Service Agent` and download a key file for it [here](https://cloud.google.com/iam/docs/creating-managing-service-accounts#creating_a_service_account)

6. Get the [Project ID](https://cloud.google.com/resource-manager/docs/creating-managing-projects#identifying_projects) for your Cloud project

7. Create a [Cloud Storage bucket](https://cloud.google.com/storage/docs/creating-buckets) to import and store data in for model training

8. Follow the steps in the rest of this notebook.

---

__Resources__


* https://cloud.google.com/natural-language/automl/docs/how-to   

* https://github.com/GoogleCloudPlatform/ai-platform-samples/blob/master/notebooks/samples/tables/census_income_prediction/getting_started_notebook.ipynb



## Setup and Authenticate
Fill in the necessary account information to begin.

In [None]:
!pip install --upgrade --quiet --user google-cloud-automl

In [None]:
import sys

from google.cloud import automl

In [None]:
PROJECT_ID = '' #@param {type:"string"}
DISPLAY_NAME = '' #@param {type:"string"}
BUCKET_NAME = '' #@param {type:"string"}
COMPUTE_REGION = 'us-central1' #@param {type:"string"}

In [None]:
if 'google.colab' in sys.modules:    
  from google.colab import files
  keyfile_upload = files.upload()
  keyfile = list(keyfile_upload.keys())[0]
  %env GOOGLE_APPLICATION_CREDENTIALS $keyfile
  !gcloud auth activate-service-account --key-file $keyfile

In [None]:
!gsutil ls -al gs://$BUCKET_NAME
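
If authentication fails, a common cause is uploading the wrong file. The sketch below checks that an uploaded file looks like a service account key before it is handed to `gcloud`. The function name `check_service_account_key` and the list of required fields are our own choices for illustration; the check only inspects the JSON structure and does not contact Google Cloud.

```python
import json

# Fields a service account key file is expected to contain.
# This is a minimal subset chosen for the sanity check, not the full schema.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(key_json):
    """Return a list of problems found in a key file's contents (empty if none).

    `key_json` may be a str or bytes, e.g. a value from the dict that
    `files.upload()` returns in Colab.
    """
    try:
        data = json.loads(key_json)
    except ValueError as exc:  # covers invalid JSON and invalid UTF-8
        return ["not valid JSON: {}".format(exc)]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    problems = []
    for field in REQUIRED_KEY_FIELDS:
        if not data.get(field):
            problems.append("missing field: {}".format(field))
    if data.get("type") and data["type"] != "service_account":
        problems.append("unexpected key type: {}".format(data["type"]))
    return problems
```

In Colab this could be called as `check_service_account_key(keyfile_upload[keyfile])` right after the upload; an empty list means the file has the expected shape.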

## Create Dataset and Import Data
Make sure the variable values are set correctly for your purposes.

In [None]:
# A display name for the AutoML text sentiment dataset to create.
DATASET_DISPLAY_NAME = 'twitter_data' #@param {type: 'string'}
# The name of the CSV object in the bucket to import from (uploaded in a later cell if it doesn't exist yet).
INPUT_CSV_NAME = 'clean_twitter_data.csv' #@param {type: 'string'}
# A display name for the AutoML text sentiment model to create.
MODEL_DISPLAY_NAME = 'twitter_sentiment_model' #@param {type: 'string'}

assert all([
    PROJECT_ID,
    BUCKET_NAME,
    COMPUTE_REGION,
    DATASET_DISPLAY_NAME,
    INPUT_CSV_NAME,
    MODEL_DISPLAY_NAME,
])
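
A bare `assert all([...])` fails without saying which value is empty. As a small sketch, the helper below reports the offending names instead. `missing_settings` is a hypothetical helper written for this notebook, not part of any Google library, and the example values are placeholders.

```python
def missing_settings(settings):
    """Return the names of settings whose values are empty or whitespace-only."""
    return [
        name
        for name, value in settings.items()
        if not str(value or "").strip()
    ]


# Placeholder values for illustration; in the notebook, pass the real variables.
example = {
    "PROJECT_ID": "",
    "COMPUTE_REGION": "us-central1",
    "BUCKET_NAME": "  ",
}
print("Missing settings:", missing_settings(example))
```

In the notebook, the same call can take a dict built from `PROJECT_ID`, `COMPUTE_REGION`, `DATASET_DISPLAY_NAME`, `INPUT_CSV_NAME`, and `MODEL_DISPLAY_NAME`, raising an error only if the returned list is non-empty.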

## Import Training Data
First we create the dataset in Google Cloud. Then we populate it with our data. Lastly, we test our import by reading the dataset from Google Cloud.

In [None]:
client = automl.AutoMlClient()
project_location = 'projects/{}/locations/{}'.format(PROJECT_ID, COMPUTE_REGION)

metadata = automl.TextSentimentDatasetMetadata(
    sentiment_max=1
) 

dataset = automl.Dataset(
    display_name=DATASET_DISPLAY_NAME, text_sentiment_dataset_metadata=metadata
)

response = client.create_dataset(parent=project_location, dataset=dataset)
created_dataset = response.result()
dataset_id = created_dataset.name.split('/')[-1]

print('Dataset name: {}'.format(created_dataset.name))
print('Dataset id: {}'.format(dataset_id))
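
The dataset id above is simply the last segment of the resource name. As a sketch, the helper below splits a name of the form `projects/{project}/locations/{location}/datasets/{dataset_id}` into its parts and rejects anything that does not match that shape. `parse_dataset_name` is a hypothetical name used for illustration, not a method of the AutoML client.

```python
def parse_dataset_name(name):
    """Split an AutoML dataset resource name into project, location, and id."""
    parts = name.split("/")
    expected_keys = ("projects", "locations", "datasets")
    well_formed = (
        len(parts) == 6
        and tuple(parts[0::2]) == expected_keys  # the fixed path segments
        and all(parts[1::2])                     # the variable segments are non-empty
    )
    if not well_formed:
        raise ValueError("unexpected dataset resource name: {!r}".format(name))
    return {
        "project": parts[1],
        "location": parts[3],
        "dataset_id": parts[5],
    }
```

Calling `parse_dataset_name(created_dataset.name)["dataset_id"]` should yield the same value as the `split('/')[-1]` expression, with the added benefit of failing loudly on an unexpected name.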

In [None]:
# INPUT_CSV_NAME already ends in .csv, so no extension is appended here.
gcs_dataset_uri = 'gs://{}/{}'.format(BUCKET_NAME, INPUT_CSV_NAME)

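if 'google.colab' in sys.modules:    
  from google.colab import files
  dataset_upload = files.upload()
  dataset_csv = list(dataset_upload.keys())[0]
else:
  # Assumption: outside Colab, the CSV is in the current working directory.
  dataset_csv = INPUT_CSV_NAME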

!gsutil ls gs://$BUCKET_NAME || gsutil mb -l $COMPUTE_REGION gs://$BUCKET_NAME
!gsutil cp $dataset_csv $gcs_dataset_uri
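
The uploaded CSV has to follow the layout AutoML expects for sentiment data. As far as we can tell from the how-to guide linked under Resources, each row holds the text followed by an integer label between 0 and `sentiment_max`, optionally preceded by a `TRAIN`, `VALIDATION`, or `TEST` column; treat that layout as an assumption and confirm it against the guide. The sketch below writes the two-column form, and `to_sentiment_csv` is a hypothetical helper, not part of the client library.

```python
import csv
import io


def to_sentiment_csv(examples, sentiment_max=1):
    """Render (text, label) pairs as CSV text with one `text,label` row each.

    Labels must be integers in the range 0..sentiment_max, matching the
    `sentiment_max` given to the dataset metadata.
    """
    buffer = io.StringIO()
    # csv.writer quotes fields containing commas, quotes, or newlines.
    writer = csv.writer(buffer, lineterminator="\n")
    for text, label in examples:
        if not isinstance(label, int) or not 0 <= label <= sentiment_max:
            raise ValueError("label out of range: {!r}".format(label))
        writer.writerow([text, label])
    return buffer.getvalue()


# Placeholder rows for illustration only.
print(to_sentiment_csv([("great day", 1), ("bad, very bad", 0)]))
```

Using the `csv` module rather than string concatenation matters for tweets, which often contain commas and quotation marks.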

In [None]:
dataset_full_id = client.dataset_path(PROJECT_ID, COMPUTE_REGION, dataset_id)

input_uris = gcs_dataset_uri.split(",")
gcs_source = automl.GcsSource(input_uris=input_uris)
input_config = automl.InputConfig(gcs_source=gcs_source)

response = client.import_data(name=dataset_full_id, input_config=input_config)

print('Processing import...')
print('Data imported. {}'.format(response.result()))

## Train Model
Once the dataset has been imported, we can start the training job on Google Cloud. This takes several hours. You will be emailed when it completes.

In [None]:
metadata = automl.TextSentimentModelMetadata()
model = automl.Model(
    display_name=MODEL_DISPLAY_NAME,
    dataset_id=dataset_id,
    text_sentiment_model_metadata=metadata,
)

response = client.create_model(parent=project_location, model=model)

print('Training operation name: {}'.format(response.operation.name))
print('Training started...')


Confirm that your model has finished training. The call below blocks until the training operation completes.

In [None]:
created_model = response.result()
print('Training finished. {}'.format(created_model))
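
Because training runs for hours, blocking on `result()` is not always convenient. The long-running operation object also exposes a `done()` method, so one alternative is to poll it with a timeout. The sketch below assumes only that the object passed in has a `done()` method; `wait_until_done` and its parameters are hypothetical names, and the `sleep` and `clock` arguments exist so the loop can be exercised without real waiting.

```python
import time


def wait_until_done(operation, poll_seconds=60.0, timeout_seconds=6 * 60 * 60,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll `operation.done()` until it returns True or the timeout elapses.

    Returns True if the operation finished, False if the timeout was reached.
    """
    deadline = clock() + timeout_seconds
    while not operation.done():
        if clock() >= deadline:
            return False
        sleep(poll_seconds)
    return True
```

With the training operation from the cell above, `wait_until_done(response)` would return `True` once training is finished, after which `response.result()` returns immediately.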

## Deploy Model
Deploy the model once training has completed, then verify the deployment.

In [None]:
response = client.deploy_model(name=created_model.name)
print(f"Model deployment finished. {response.result()}")
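
To verify the deployment, one option is to fetch the model again and inspect its deployment state. The sketch below assumes the client's `get_model(name=...)` call returns a model whose `deployment_state` is an enum with a `name` attribute, which is how the `google-cloud-automl` message types behave as far as we know; `model_is_deployed` itself is a hypothetical helper.

```python
def model_is_deployed(client, model_name):
    """Return True if the model's deployment state is reported as DEPLOYED.

    `client` is expected to be an `automl.AutoMlClient` (or anything with a
    compatible `get_model` method); `model_name` is the full resource name.
    """
    model = client.get_model(name=model_name)
    # Compare by enum name so this helper does not need to import the enum.
    return model.deployment_state.name == "DEPLOYED"
```

In the notebook this could be called as `model_is_deployed(client, created_model.name)` after the deploy operation returns.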