# Train AutoML Model

To train and host our classification model, we will use AutoML tabular training in Google Cloud Vertex AI. The model predicts whether a particular user likes the Beatles, so this is a binary classification problem.
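
Because training relies on a binary target column and a predefined split column (`data_split`, used in the training cell), it can help to sanity-check the CSV locally before uploading it. The sketch below is a minimal, hypothetical check: the feature columns (`user_id`, `tag_rock`, `tag_pop`) and sample rows are invented for illustration, and the allowed split values (`training`, `validation`, `test`) are assumed from the Vertex AI SDK docstring for `predefined_split_column_name`.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real CSV has many more feature columns.
SAMPLE_CSV = """user_id,tag_rock,tag_pop,Like_The_Beatles,data_split
u1,1,0,1,training
u2,0,1,0,training
u3,1,1,1,validation
u4,0,0,0,test
"""

TARGET_COLUMN = "Like_The_Beatles"
SPLIT_COLUMN = "data_split"
# Assumed allowed split values (from the SDK docstring for predefined_split_column_name).
ALLOWED_SPLITS = {"training", "validation", "test"}


def check_binary_dataset(csv_text: str) -> tuple:
    """Return (label counts, split counts); raise ValueError if the data looks wrong."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    labels = Counter(row[TARGET_COLUMN] for row in rows)
    # Normalize case and whitespace before comparing against the allowed values.
    splits = Counter(row[SPLIT_COLUMN].strip().lower() for row in rows)
    if len(labels) != 2:
        raise ValueError(f"expected 2 classes, found {sorted(labels)}")
    unknown = set(splits) - ALLOWED_SPLITS
    if unknown:
        raise ValueError(f"unexpected split values: {sorted(unknown)}")
    return labels, splits


labels, splits = check_binary_dataset(SAMPLE_CSV)
print(labels)
print(splits)
```

For the real file, you would read the downloaded CSV's text instead of `SAMPLE_CSV`.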


First, we create a Vertex AI tabular dataset from a CSV file stored in Cloud Storage.

In [None]:
PROJECT_NAME = "mwpmltr"  # Google Cloud project ID
LOCATION = "us-central1"  # Vertex AI region
MODEL_NAME = "beatles_automl_file_out_2485_tags"
TARGET_COLUMN = "Like_The_Beatles"  # label column the model predicts

In [None]:
from google.cloud import aiplatform

In [None]:
from typing import Sequence, Union


def create_and_import_dataset_tabular_gcs_sample(
    display_name: str,
    project: str,
    location: str,
    gcs_source: Union[str, Sequence[str]],
) -> aiplatform.TabularDataset:
    """Creates a Vertex AI tabular dataset from one or more CSV files in Cloud Storage."""
    aiplatform.init(project=project, location=location)

    dataset = aiplatform.TabularDataset.create(
        display_name=display_name,
        gcs_source=gcs_source,
    )

    # Block until the dataset resource has finished being created.
    dataset.wait()

    print(f'\tDataset: "{dataset.display_name}"')
    print(f'\tname: "{dataset.resource_name}"')
    return dataset
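
The function prints the dataset's full resource name, which has the form `projects/<project>/locations/<location>/datasets/<dataset_id>`; the training cell reloads the dataset from exactly such a string. The helper below, `parse_dataset_resource_name`, is a hypothetical convenience (not part of the Vertex AI SDK) that splits that string into its parts, for example to recover the dataset ID.

```python
import re
from typing import NamedTuple


class DatasetResource(NamedTuple):
    project: str  # project number or ID segment
    location: str
    dataset_id: str


# Matches names of the form projects/<p>/locations/<l>/datasets/<id>.
_RESOURCE_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)/datasets/(?P<dataset_id>[^/]+)$"
)


def parse_dataset_resource_name(name: str) -> DatasetResource:
    """Split a Vertex AI dataset resource name into project, location, and dataset ID."""
    match = _RESOURCE_RE.match(name)
    if match is None:
        raise ValueError(f"not a dataset resource name: {name!r}")
    return DatasetResource(**match.groupdict())


parts = parse_dataset_resource_name(
    "projects/55590906972/locations/us-central1/datasets/2545107734534029312"
)
print(parts.location, parts.dataset_id)
```

As far as we know, `aiplatform.TabularDataset(...)` accepts either the full resource name or just the dataset ID once `aiplatform.init` has set the project and location, so the ID alone is often enough.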

In [None]:
dataset_name = f"{MODEL_NAME}_v1"
create_and_import_dataset_tabular_gcs_sample(
    display_name=dataset_name,
    project=PROJECT_NAME,
    location=LOCATION,
    gcs_source="gs://csalling-docai-datasets-regional/beatles/file_out_2485_tags.csv",
)
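
If the notebook kernel restarts, you do not have to copy the resource name by hand: the SDK's `TabularDataset.list` classmethod accepts a `filter` and `order_by`. The sketch below assumes the filter syntax `display_name="..."` and the ordering `create_time desc`; the helper names `display_name_filter` and `find_latest_tabular_dataset` are hypothetical, and the SDK import is deferred so the pure helper can be used without `google-cloud-aiplatform` installed.

```python
def display_name_filter(display_name: str) -> str:
    """Build a list filter matching an exact display name (assumes no quotes in the name)."""
    return f'display_name="{display_name}"'


def find_latest_tabular_dataset(display_name: str, project: str, location: str):
    """Return the most recently created TabularDataset with this display name, or None."""
    # Imported lazily; requires the google-cloud-aiplatform package and credentials.
    from google.cloud import aiplatform

    datasets = aiplatform.TabularDataset.list(
        filter=display_name_filter(display_name),
        order_by="create_time desc",  # newest first
        project=project,
        location=location,
    )
    return datasets[0] if datasets else None


# Example (requires GCP access):
# dataset = find_latest_tabular_dataset(dataset_name, PROJECT_NAME, LOCATION)
print(display_name_filter("beatles_automl_file_out_2485_tags_v1"))
```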

In [None]:
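# Re-initialize the SDK so this cell also works after a kernel restart.
aiplatform.init(project=PROJECT_NAME, location=LOCATION)

# Load the dataset created above by its full resource name (as printed when it was created).
dataset = aiplatform.TabularDataset('projects/55590906972/locations/us-central1/datasets/2545107734534029312')

# AutoML tabular training job configured for classification.
job = aiplatform.AutoMLTabularTrainingJob(
    display_name=f"{MODEL_NAME}-automl",
    optimization_prediction_type="classification",
)

# Start training; with the default sync=True this blocks until the job finishes and returns the Model.
model = job.run(
    dataset=dataset,
    target_column=TARGET_COLUMN,
    # Rows are assigned to training/validation/test by the values in this CSV column.
    predefined_split_column_name="data_split",
    # 1,000 milli node hours = 1 node hour of training budget.
    budget_milli_node_hours=1000,
    model_display_name=f"{MODEL_NAME}-automl",
    # Keep early stopping enabled, so AutoML may finish before using the full budget.
    disable_early_stopping=False,
)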
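
To host the trained model, it can be deployed to an endpoint and queried online. The sketch below is a minimal example under several assumptions: the machine type `n1-standard-4` is an illustrative choice, instance values are sent as strings, and each classification prediction comes back as a dict with `classes` and `scores` lists. The helpers `to_instance`, `top_class`, and `deploy_and_predict` are hypothetical, not SDK functions. It relies only on `Model.deploy` and `Endpoint.predict`.

```python
from typing import Any, Dict, List, Tuple


def to_instance(features: Dict[str, Any]) -> Dict[str, str]:
    """Convert one row of feature values to strings (assumed request format)."""
    return {name: str(value) for name, value in features.items()}


def top_class(prediction: Dict[str, List[Any]]) -> Tuple[str, float]:
    """Pick the highest-scoring class from an assumed {'classes': [...], 'scores': [...]} dict."""
    scores = prediction["scores"]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return str(prediction["classes"][best]), float(scores[best])


def deploy_and_predict(model, rows: List[Dict[str, Any]], machine_type: str = "n1-standard-4"):
    """Deploy `model` (an aiplatform.Model) to a new endpoint and return (class, score) per row."""
    endpoint = model.deploy(machine_type=machine_type)
    response = endpoint.predict(instances=[to_instance(row) for row in rows])
    return [top_class(p) for p in response.predictions]


# Example with hypothetical feature names (requires GCP access; deployed endpoints incur cost):
# results = deploy_and_predict(model, [{"tag_rock": 1, "tag_pop": 0}])
```

Deployed endpoints are billed while they run, so undeploying (for example with `endpoint.undeploy_all()`) after experimenting is worth considering.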