## **Prerequisites**

#### - Create an Azure Machine Learning workspace via the Azure portal
- Note the workspace name, resource group, and subscription ID
#### - Create and populate a .env file in the home directory
- Use [.sample.env](../.sample.env) as a guide
#### - Create and activate the conda virtual environment
- Run the following bash commands in a terminal
```bash
    conda env create --name many_models --file=../environment/conda.yaml
    conda activate many_models
    az login
```
- Select the many_models Python interpreter and kernel to run the remainder of this notebook
### **1. Load Data to Workspace**

In [1]:
import os
import time
import pandas as pd
from dotenv import load_dotenv, find_dotenv
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

In [2]:
# Load environment variables from .env file
load_dotenv(find_dotenv(), override=True)

# Confirm variables were loaded
print(os.getenv("WORKSPACE_NAME"))

mm-aml-wksp
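
Printing a single variable confirms that the `.env` file was found, but not that every value is present. The sketch below checks the four variables this notebook reads (`TENANT_ID`, `SUBSCRIPTION_ID`, `RESOURCE_GROUP_NAME`, `WORKSPACE_NAME`). The helper name `find_missing_vars` is hypothetical and not part of the original notebook.

```python
import os

# Variables read later in this notebook when building the credential and MLClient.
REQUIRED_VARS = (
    "TENANT_ID",
    "SUBSCRIPTION_ID",
    "RESOURCE_GROUP_NAME",
    "WORKSPACE_NAME",
)


def find_missing_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = find_missing_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Running this right after `load_dotenv` surfaces a missing entry before `MLClient` is constructed, where a `None` value would otherwise produce a less obvious error.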


In [3]:
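# Authenticate.
# NOTE: `tenantid` does not appear among the documented DefaultAzureCredential
# keyword arguments, so this value may be ignored. If authentication picks the
# wrong tenant, consider setting the AZURE_TENANT_ID environment variable instead.
credential = DefaultAzureCredential(tenantid=os.environ.get('TENANT_ID'))

# Get a handle to the workspace
ml_client = MLClient(
    credential=credential,
    subscription_id=os.environ.get('SUBSCRIPTION_ID'),
    resource_group_name=os.environ.get('RESOURCE_GROUP_NAME'),
    workspace_name=os.environ.get('WORKSPACE_NAME'),
)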

In [4]:
# set the version number of the data asset to the current UTC time
v1 = time.strftime("%Y.%m.%d.%H%M%S", time.gmtime())
local_path = "../data/oj_sim_sales/"
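
To make the version scheme concrete, the following sketch wraps the same `strftime` format in two small helpers. The names `make_version` and `parse_version` are hypothetical and used only for illustration.

```python
import calendar
import time

# Same format string as the notebook cell above.
VERSION_FORMAT = "%Y.%m.%d.%H%M%S"


def make_version(epoch_seconds=None):
    """Format a UTC timestamp as a version string (current time if None)."""
    return time.strftime(VERSION_FORMAT, time.gmtime(epoch_seconds))


def parse_version(version):
    """Convert a version string back to seconds since the Unix epoch (UTC)."""
    return calendar.timegm(time.strptime(version, VERSION_FORMAT))


print(make_version(0))  # 1970.01.01.000000
print(make_version())   # current UTC time, in the same shape as 2023.12.20.161635
```

Because every field is zero-padded and ordered from year down to second, sorting these version strings as plain text also sorts them chronologically.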

In [5]:
train_data = Data(
    name="oj-sim-sales-train",
    version=v1,
    description="Training Data - Chicago area orange juice sales data",
    path=local_path + "train_subset.csv",
    type=AssetTypes.URI_FILE,
)

test_data = Data(
    name="oj-sim-sales-test",
    version=v1,
    description="Validation Set - Chicago area orange juice sales data",
    path=local_path + "test_subset.csv",
    type=AssetTypes.URI_FILE,
)

# create data assets
ml_client.data.create_or_update(train_data)
ml_client.data.create_or_update(test_data)

Data({'skip_validation': False, 'mltable_schema_url': None, 'referenced_uris': None, 'type': 'uri_file', 'is_anonymous': False, 'auto_increment_version': False, 'auto_delete_setting': None, 'name': 'oj-sim-sales-test', 'description': 'Validation Set - Chicago area orange juice sales data', 'tags': {}, 'properties': {}, 'print_as_yaml': True, 'id': '/subscriptions/9a729243-1221-42c5-824c-9e44cb2da98d/resourceGroups/many-models-rg/providers/Microsoft.MachineLearningServices/workspaces/mm-aml-wksp/data/oj-sim-sales-test/versions/2023.12.20.161635', 'Resource__source_path': None, 'base_path': '/home/zacksoenen/Projects/many-models-azureml/demo_notebooks', 'creation_context': <azure.ai.ml.entities._system_data.SystemData object at 0x7f1eadad1a00>, 'serialize': <msrest.serialization.Serializer object at 0x7f1eadae7b80>, 'version': '2023.12.20.161635', 'latest_version': None, 'path': 'azureml://subscriptions/9a729243-1221-42c5-824c-9e44cb2da98d/resourcegroups/many-models-rg/workspaces/mm-aml-
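
Since `create_or_update` uploads the local files, it can help to confirm they exist before calling it. This is a minimal sketch using only the standard library. The helper name `find_missing_files` is hypothetical, and the directory and file names are the ones used in the cells above.

```python
from pathlib import Path


def find_missing_files(base_dir, file_names):
    """Return the file names that do not exist as regular files under base_dir."""
    base = Path(base_dir)
    return [name for name in file_names if not (base / name).is_file()]


missing = find_missing_files(
    "../data/oj_sim_sales/",
    ["train_subset.csv", "test_subset.csv"],
)
if missing:
    print("Files not found:", ", ".join(missing))
else:
    print("Both CSV files are present.")
```

The check is relative to the notebook's working directory, so a "not found" result may simply mean the kernel was started from a different folder.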

In [6]:
# Validate data upload
data_asset = ml_client.data.get("oj-sim-sales-train", label="latest")

df = pd.read_csv(data_asset.path)
display(df.head(10))

| Unnamed: 0 | WeekStarting | Store | Brand | Advert | Price | Revenue |
|---|---|---|---|---|---|---|
| 0 | 1990-06-14 | 1000 | dominicks | 1 | 2.59 | 31087.77 |
| 1 | 1990-06-14 | 1028 | dominicks | 1 | 2.64 | 45819.84 |
| 2 | 1990-06-14 | 1021 | minute_maid | 1 | 2.2 | 27271.2 |
| 3 | 1990-06-14 | 1011 | tropicana | 1 | 2.62 | 48213.24 |
| 4 | 1990-06-14 | 1009 | minute_maid | 1 | 2.67 | 50278.77 |
| 5 | 1990-06-14 | 1023 | tropicana | 1 | 2.15 | 24475.6 |
| 6 | 1990-06-14 | 1032 | dominicks | 1 | 2.16 | 34920.72 |
| 7 | 1990-06-14 | 1001 | dominicks | 1 | 2.3 | 28480.9 |
| 8 | 1990-06-14 | 1004 | tropicana | 1 | 2.46 | 44198.82 |
| 9 | 1990-06-14 | 1028 | minute_maid | 1 | 1.9 | 34975.2 |
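
The `Unnamed: 0` column is the name pandas assigns to a CSV column whose header is empty, which usually means a saved DataFrame index. The sketch below shows one way to read such a file so the index is restored and `WeekStarting` is parsed as a date. The inline sample reuses three rows from the output above, and its empty first header field is an assumption about how the file is laid out.

```python
import io

import pandas as pd

# Three rows copied from the preview above; the empty first header field is assumed.
SAMPLE_CSV = """,WeekStarting,Store,Brand,Advert,Price,Revenue
0,1990-06-14,1000,dominicks,1,2.59,31087.77
1,1990-06-14,1028,dominicks,1,2.64,45819.84
2,1990-06-14,1021,minute_maid,1,2.2,27271.2
"""

# Default read: the unnamed first column shows up as "Unnamed: 0".
raw = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Treat the first column as the index and parse the week start as a datetime.
clean = pd.read_csv(
    io.StringIO(SAMPLE_CSV),
    index_col=0,
    parse_dates=["WeekStarting"],
)

print(list(raw.columns))
print(list(clean.columns))
print(clean.dtypes["WeekStarting"])
```

The same two arguments can be passed when reading `data_asset.path`, if the extra column is not wanted downstream.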
