# Train AutoML Model

To train and host our classification model, we will use AutoML with a tabular dataset in Google Cloud Vertex AI. The model predicts whether a given user likes the Beatles, which makes this a binary classification problem.


First, we must create our Vertex AI Dataset.

In [6]:
PROJECT_NAME = 'ds-training-380514'
LOCATION = "us-central1"
MODEL_NAME = "beatles_automl_file_out_2485_tags"
TARGET_COLUMN = "Like_The_Beatles"

In [7]:
from google.cloud import aiplatform

In [8]:
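from typing import Sequence, Union


def create_and_import_dataset_tabular_gcs_sample(
    display_name: str,
    project: str,
    location: str,
    gcs_source: Union[str, Sequence[str]],
) -> None:
    """Creates an AutoML tabular dataset in Google Cloud Vertex AI from CSV data in Cloud Storage."""
    # Point the SDK at the target project and region.
    aiplatform.init(project=project, location=location)

    # Create the dataset resource and import the data from the given URI(s).
    dataset = aiplatform.TabularDataset.create(
        display_name=display_name,
        gcs_source=gcs_source,
    )

    # Block until the dataset resource is ready.
    dataset.wait()

    print(f'\tDataset: "{dataset.display_name}"')
    print(f'\tname: "{dataset.resource_name}"')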
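
`TabularDataset.create` accepts `gcs_source` as either a single URI string or a sequence of URIs. As a minimal sketch, the input could be normalized and checked before any API call is made. The helper name `normalize_gcs_sources` is hypothetical and is not part of the Vertex AI SDK; it only checks the `gs://` prefix and does not verify that the object exists.

```python
from typing import List, Sequence, Union


def normalize_gcs_sources(gcs_source: Union[str, Sequence[str]]) -> List[str]:
    """Return the source(s) as a list, rejecting anything that is not a gs:// URI."""
    # A bare string is treated as a single source.
    sources = [gcs_source] if isinstance(gcs_source, str) else list(gcs_source)
    if not sources:
        raise ValueError("At least one Cloud Storage URI is required.")
    for uri in sources:
        if not uri.startswith("gs://"):
            raise ValueError(f"Not a Cloud Storage URI: {uri!r}")
    return sources
```

The returned list can be passed directly as the `gcs_source` argument.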

In [9]:
dataset_name = f'{MODEL_NAME}_v1'
create_and_import_dataset_tabular_gcs_sample(
    display_name=dataset_name,
    project=PROJECT_NAME,
    location=LOCATION,
    gcs_source='gs://aaa-aca-ml-workshop/beatles/file_out_2485_tags.csv',
)

Creating TabularDataset
Create TabularDataset backing LRO: projects/354621994428/locations/us-central1/datasets/6306981215853346816/operations/8613369376675987456
TabularDataset created. Resource name: projects/354621994428/locations/us-central1/datasets/6306981215853346816
To use this TabularDataset in another session:
ds = aiplatform.TabularDataset('projects/354621994428/locations/us-central1/datasets/6306981215853346816')
	Dataset: "beatles_automl_file_out_2485_tags_v1"
	name: "projects/354621994428/locations/us-central1/datasets/6306981215853346816"
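
The resource name printed above has the form `projects/<project number>/locations/<region>/datasets/<dataset ID>`. Note that it carries the numeric project number rather than the project ID stored in `PROJECT_NAME`. A minimal sketch for splitting such a name into its parts follows; `parse_dataset_resource_name` is a hypothetical helper, not an SDK function, and the pattern is inferred from the log output in this notebook.

```python
import re
from typing import Dict

# Pattern inferred from the resource name printed in the log above.
_DATASET_NAME = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/datasets/(?P<dataset_id>[^/]+)$"
)


def parse_dataset_resource_name(resource_name: str) -> Dict[str, str]:
    """Split a dataset resource name into project, location and dataset ID."""
    match = _DATASET_NAME.match(resource_name)
    if match is None:
        raise ValueError(f"Unexpected dataset resource name: {resource_name!r}")
    return match.groupdict()


parts = parse_dataset_resource_name(
    "projects/354621994428/locations/us-central1/datasets/6306981215853346816"
)
print(parts["location"], parts["dataset_id"])
```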


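The training call in the next cell passes `budget_milli_node_hours=1000`. The unit is thousandths of a node hour, so 1000 corresponds to one node hour of training. As a small sketch, a conversion helper keeps that unit explicit; `node_hours_to_milli` is a hypothetical name introduced here for illustration.

```python
def node_hours_to_milli(node_hours: float) -> int:
    """Convert node hours to the milli node hours used by budget_milli_node_hours."""
    if node_hours <= 0:
        raise ValueError("node_hours must be positive")
    return int(round(node_hours * 1000))


print(node_hours_to_milli(1))    # 1000
print(node_hours_to_milli(2.5))  # 2500
```
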
In [10]:
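# Load the dataset created above by its full resource name.
dataset = aiplatform.TabularDataset('projects/354621994428/locations/us-central1/datasets/6306981215853346816')

# Define an AutoML training job for a classification objective.
job = aiplatform.AutoMLTabularTrainingJob(
    display_name=f"{MODEL_NAME}-automl",
    optimization_prediction_type="classification"
)

# Train the model; this call blocks until training completes.
model = job.run(
    dataset=dataset,
    target_column=TARGET_COLUMN,
    # Assign rows to the training, validation and test sets using the data_split column.
    predefined_split_column_name="data_split",
    # 1000 milli node hours = 1 node hour of training budget.
    budget_milli_node_hours=1000,
    model_display_name=f"{MODEL_NAME}-automl",
    # Keep early stopping enabled so training can finish before the budget is used up.
    disable_early_stopping=False,
)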

No column transformations provided, so now retrieving columns from dataset in order to set default column transformations.
The column transformation of type 'auto' was set for the following columns: ['Children_of_Bodom', 'tag_celtic', 'tag_latin', 'tag_power_metal', 'Beastie_Boys', 'tag_surf', 'Anathema', 'Animal_Collective', 'Imagine_Dragons', 'Opeth', 'user_name', 'The_Shins', 'tag_math_rock', 'tag_funk_rock', 'tag_japanese', 'tag_pop_rock', '65daysofstatic', 'Bjork', 'Audioslave', 'Grimes', 'Ludwig_van_Beethoven', 'tag_classical', 'Elvis_Presley', 'tag_lord_of_the_rings', 'tag_elephant_6', 'tag_legend', 'tag_experimental_hip_hop', 'Nujabes', 'Infected_Mushroom', 'tag_alternative_rock', 'tag_dnb', 'tag_noisegrind', 'tag_metalcore', 'tag_90s', 'tag_comedy', 'Joy_Division', 'tag_viking_metal', 'Kanye_West', 'Genesis', 'tag_thrash_metal', 'tag_post_hardcore', 'Belle_and_Sebastian', 'tag_gothic_metal', 'John_Williams', 'A_Tribe_Called_Quest', 'Sufjan_Stevens', 'tag_final_fantasy', 'tag_w