# MLOps stage 1: data management

## Overview

This tutorial demonstrates how to use Vertex AI for end-to-end (E2E) MLOps on Google Cloud in production.

## Objective

In this tutorial, you learn how to use BigQuery as a dataset for training with Vertex AI.

This tutorial uses the following Google Cloud ML services:
- Vertex AI Datasets
- BigQuery Datasets

The steps performed include:
- Create a BigQuery dataset from CSV files.
- Create a Vertex AI Dataset resource from a BigQuery table.
- Select rows from a BigQuery dataset into a pandas DataFrame, which is compatible with custom training.
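The steps above can be sketched roughly as follows. This is a minimal illustration, not the tutorial's own implementation: the function names, the dataset and table IDs, and the Cloud Storage URI passed in by the caller are all hypothetical. The Google Cloud libraries are imported inside each function so that the pure helpers can be used without credentials or installed packages.

```python
def bq_table_uri(project_id: str, dataset_id: str, table_id: str) -> str:
    """Return the bq:// URI form that Vertex AI accepts as a BigQuery source."""
    return f"bq://{project_id}.{dataset_id}.{table_id}"


def build_select_query(project_id: str, dataset_id: str, table_id: str, limit: int) -> str:
    """Build a simple SELECT statement against a fully qualified table."""
    return f"SELECT * FROM `{project_id}.{dataset_id}.{table_id}` LIMIT {int(limit)}"


def load_csv_to_bigquery(project_id, dataset_id, table_id, gcs_csv_uri):
    """Step 1: create a BigQuery dataset and load a CSV file into a table."""
    from google.cloud import bigquery  # imported lazily; needs google-cloud-bigquery

    client = bigquery.Client(project=project_id)
    client.create_dataset(f"{project_id}.{dataset_id}", exists_ok=True)
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumes the CSV has a header row
        autodetect=True,  # let BigQuery infer the schema
    )
    load_job = client.load_table_from_uri(
        gcs_csv_uri, f"{project_id}.{dataset_id}.{table_id}", job_config=job_config
    )
    load_job.result()  # block until the load job finishes


def create_vertex_dataset(project_id, region, display_name, bq_uri):
    """Step 2: create a Vertex AI tabular Dataset resource from a BigQuery table."""
    from google.cloud import aiplatform  # needs google-cloud-aiplatform

    aiplatform.init(project=project_id, location=region)
    return aiplatform.TabularDataset.create(display_name=display_name, bq_source=bq_uri)


def read_rows(project_id, dataset_id, table_id, limit=1000):
    """Step 3: select rows from the table into a pandas DataFrame."""
    from google.cloud import bigquery

    client = bigquery.Client(project=project_id)
    sql = build_select_query(project_id, dataset_id, table_id, limit)
    return client.query(sql).to_dataframe()  # requires pandas to be installed
```

Keeping the URI and query construction in small pure functions makes them easy to check in isolation, while the three cloud-facing functions only run when called with real project details and credentials.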

## Dataset

The dataset used in this example is the [Synthetic Financial Fraud dataset from Kaggle](https://www.kaggle.com/datasets/ealaxi/paysim1). PaySim simulates mobile money transactions based on a sample of real transactions extracted from one month of financial logs from a mobile money service implemented in an African country. The original logs were provided by a multinational company that operates the mobile financial service, which currently runs in more than 14 countries worldwide.

### Installations

In [3]:
import os

# The Vertex AI Workbench Notebook product has specific requirements
IS_WORKBENCH_NOTEBOOK = os.getenv("DL_ANACONDA_HOME") and not os.getenv("VIRTUAL_ENV")
IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(
    "/opt/deeplearning/metadata/env_version"
)

# Vertex AI Notebook requires dependencies to be installed with '--user'
USER_FLAG = ""
if IS_WORKBENCH_NOTEBOOK:
    USER_FLAG = "--user"

extra_pkgs = "google-cloud-bigquery"
! pip3 install --upgrade --quiet {USER_FLAG} google-cloud-aiplatform $extra_pkgs

In [4]:
# Restart the kernel
import os

if not os.getenv("IS_TESTING"):
    # Automatically restart kernel after installs
    import IPython

    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)

## Import Libraries and Define Constants

In [1]:
import google.cloud.aiplatform as aiplatform
import pandas as pd
from google.cloud import bigquery

In [None]:
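PROJECT_ID = "[your-project-id]"  # placeholder: replace with your Google Cloud project ID
REGION = "us-central1"  # region used for Vertex AI resources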

## Initialize Vertex AI SDK for Python

In [None]:
aiplatform.init(project=PROJECT_ID, location=REGION)
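The `aiplatform.init` call above expects `PROJECT_ID` to hold a real project ID. One possible way to guard against an unset or placeholder value is sketched below. The helper name `resolve_project_id` is hypothetical, and falling back to the `GOOGLE_CLOUD_PROJECT` environment variable is an assumption about how your environment is configured, not something this tutorial prescribes.

```python
import os


def resolve_project_id(explicit=None):
    """Return an explicit project ID, or fall back to an environment variable.

    Assumption: GOOGLE_CLOUD_PROJECT is set in the environment when no
    explicit value is given. Bracketed placeholders such as
    "[your-project-id]" are treated as unset.
    """
    project_id = explicit or os.environ.get("GOOGLE_CLOUD_PROJECT", "")
    if not project_id or project_id.startswith("["):
        raise ValueError("Set a real Google Cloud project ID before initializing the SDK.")
    return project_id
```

With a helper like this, the value passed to `aiplatform.init(project=..., location=REGION)` fails fast with a clear message instead of surfacing later as an authentication or permission error.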