# Train AutoML Model

To train and host our classification model, we will use AutoML with a tabular dataset in Google Cloud Vertex AI. The model predicts whether a particular user likes the Beatles, which makes this a binary classification problem.


First, we must create our Vertex AI Dataset.

In [1]:
PROJECT_NAME = 'mwpmltr'
LOCATION = "us-central1"
MODEL_NAME = "beatles_automl_file_out_2485_tags"
TARGET_COLUMN = "Like_The_Beatles"

In [2]:
from google.cloud import aiplatform

In [3]:
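from typing import List, Union


def create_and_import_dataset_tabular_gcs_sample(
    display_name: str,
    project: str,
    location: str,
    gcs_source: Union[str, List[str]],
) -> aiplatform.TabularDataset:
    """Creates a tabular dataset in Vertex AI from one or more CSV files in Cloud Storage."""
    # Point the SDK at the target project and region.
    aiplatform.init(project=project, location=location)

    # gcs_source accepts a single gs:// URI or a list of URIs.
    dataset = aiplatform.TabularDataset.create(
        display_name=display_name,
        gcs_source=gcs_source,
    )

    # Block until the dataset resource is fully created.
    dataset.wait()

    print(f'\tDataset: "{dataset.display_name}"')
    print(f'\tname: "{dataset.resource_name}"')

    # Return the dataset so callers can reuse it without looking it up again.
    return dataset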

In [4]:
dataset_name = f'{MODEL_NAME}_v1'
create_and_import_dataset_tabular_gcs_sample(
    display_name=dataset_name,
    project=PROJECT_NAME,
    location=LOCATION,
    gcs_source='gs://csalling-docai-datasets-regional/beatles/file_out_2485_tags.csv',
)

Creating TabularDataset
Create TabularDataset backing LRO: projects/55590906972/locations/us-central1/datasets/6508517299178176512/operations/4289139678214356992
TabularDataset created. Resource name: projects/55590906972/locations/us-central1/datasets/6508517299178176512
To use this TabularDataset in another session:
ds = aiplatform.TabularDataset('projects/55590906972/locations/us-central1/datasets/6508517299178176512')
	Dataset: "beatles_automl_file_out_2485_tags_v1"
	name: "projects/55590906972/locations/us-central1/datasets/6508517299178176512"
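
The resource name printed above follows the pattern `projects/<project>/locations/<location>/datasets/<dataset id>`. As a minimal sketch, the snippet below splits such a name into its parts, which can help confirm that a later cell points at the dataset you intended. The helper `parse_dataset_resource_name` is a hypothetical name introduced here for illustration; it is not part of the Vertex AI SDK.

```python
import re

# Pattern for a dataset resource name in the form printed in the output above.
_DATASET_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/datasets/(?P<dataset_id>[^/]+)$"
)


def parse_dataset_resource_name(resource_name: str) -> dict:
    """Split a dataset resource name into project, location, and dataset ID.

    Hypothetical helper for illustration; not an SDK function.
    """
    match = _DATASET_NAME.match(resource_name)
    if match is None:
        raise ValueError(f"Not a dataset resource name: {resource_name!r}")
    return match.groupdict()


parts = parse_dataset_resource_name(
    "projects/55590906972/locations/us-central1/datasets/6508517299178176512"
)
print(parts["project"], parts["location"], parts["dataset_id"])
```

Note that the project segment here is numeric, so it appears to be a project number rather than the `PROJECT_NAME` string used when initializing the SDK. As the log output suggests, the full resource name can be passed to `aiplatform.TabularDataset(...)` to load the dataset in another session.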


In [5]:
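# Load an existing dataset by its resource name.
# Note: this ID is not the one printed by the creation step above;
# substitute the resource name of the dataset you want to train on.
dataset = aiplatform.TabularDataset('projects/55590906972/locations/us-central1/datasets/2545107734534029312')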

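# Define an AutoML training job for a classification objective.
job = aiplatform.AutoMLTabularTrainingJob(
    display_name=f"{MODEL_NAME}-automl",
    optimization_prediction_type="classification"
)

# Start training; this call blocks until the job finishes and returns the model.
model = job.run(
    dataset=dataset,
    target_column=TARGET_COLUMN,
    # Dataset column that assigns each row to the training, validation, or test set.
    predefined_split_column_name="data_split",
    # Training budget: 1,000 milli node hours = 1 node hour.
    budget_milli_node_hours=1000,
    model_display_name=f"{MODEL_NAME}-automl",
    # Keep early stopping enabled so training can end before the budget is used up.
    disable_early_stopping=False,
)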

No column transformations provided, so now retrieving columns from dataset in order to set default column transformations.
The column transformation of type 'auto' was set for the following columns: ['tag_romantic', 'tag_blues_rock', 'tag_ninja_tune', 'tag_stoner_rock', 'tag_gothic_metal', 'Mumford_and_Sons', 'Pavement', 'Cake', 'The_Offspring', 'Soilwork', 'Death_Cab_for_Cutie', 'Paramore', 'Portishead', 'Tame_Impala', 'tag_hip_hop', 'tag_speed_metal', 'The_Decemberists', 'Ellie_Goulding', 'tag_french', 'tag_lord_of_the_rings', 'tag_vocal_jazz', 'tag_jam', 'Phoenix', 'Florence__and__the_Machine', 'ZZ_Top', 'tag_math_rock', 'Eels', 'Bon_Iver', 'Breaking_Benjamin', 'tag_blues', 'Queens_of_the_Stone_Age', 'tag_industrial_metal', 'Duran_Duran', 'Alanis_Morissette', 'The_National', 'Thievery_Corporation', 'Nirvana', 'tag_folk_metal', 'The_Black_Keys', 'tag_atmospheric', 'Kate_Bush', 'Four_Tet', 'user_name', 'Blind_Guardian', 'tag_composers', 'tag_electro', 'CHVRCHES', 'tag_britney_spears',