# Lab 01 - Data Access




## Datasets and datastores


https://docs.microsoft.com/en-us/azure/machine-learning/concept-azure-machine-learning-architecture#datasets-and-datastores

Azure Machine Learning Datasets make it easier to access and work with your data. By creating a dataset, you create a reference to the data source location along with a copy of its metadata. Because the data remains in its existing location, you incur no extra storage cost and don't risk the integrity of your data sources.

For more information, see *Create and register Azure Machine Learning Datasets* in the Azure Machine Learning documentation. For more examples that use Datasets, see the Azure Machine Learning sample notebooks.

Datasets use datastores to connect securely to your Azure storage services. **Datastores store connection information without putting your authentication credentials or the integrity of your original data source at risk**. The connection details, such as your subscription ID and token authorization, are kept in the Key Vault associated with the workspace, so you can access your storage securely without hard-coding credentials in your script.

<img src="https://docs.microsoft.com/en-us/azure/machine-learning/media/concept-data/data-concept-diagram.svg" width="700">

## Registering a Dataset

https://docs.microsoft.com/en-us/azure/machine-learning/how-to-connect-data-ui#create-datasets


<img src="https://docs.microsoft.com/en-us/azure/machine-learning/media/how-to-connect-data-ui/create-dataset-ui.gif" width="700">

## Challenge 01.01

***===> Create a new tabular dataset from a local file.***

* First, download https://bupademoflatfile.blob.core.windows.net/public/lab01sample.csv to your local PC
* Then register it in Azure Machine Learning as a new tabular dataset named **\<YOUR_INITIALS\>_LAB_01**
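The challenge is meant to be done in the studio UI. For reference, here is a minimal sketch of the same two steps using the Python SDK (`azureml-core`, v1): upload the file to the workspace's default blob datastore, then point a tabular dataset at it and register it. The function name, the `lab01` target folder and the example initials are hypothetical. The `azureml` import sits inside the function only so the sketch can be loaded without the SDK installed.

```python
import os


def register_local_csv(ws, local_path, dataset_name, target_folder="lab01"):
    """Upload a local CSV and register it as a tabular dataset (sketch, azureml-core v1).

    `register_local_csv` and the default `target_folder` are hypothetical names.
    """
    # Imported here so the function can be defined without azureml-core installed
    from azureml.core import Dataset

    # Upload the file to the workspace's default blob datastore
    datastore = ws.get_default_datastore()
    datastore.upload_files(files=[local_path], target_path=target_folder, overwrite=True)

    # Point a tabular dataset at the uploaded file, then register it under the given name
    remote_path = f"{target_folder}/{os.path.basename(local_path)}"
    tabular = Dataset.Tabular.from_delimited_files(path=(datastore, remote_path))
    return tabular.register(workspace=ws, name=dataset_name)


# Usage, with hypothetical initials "AB":
# register_local_csv(ws, "lab01sample.csv", "AB_LAB_01")
```

Passing a `(datastore, relative_path)` tuple lets the dataset reference the file where it was uploaded instead of copying it again, which matches the "reference plus metadata" model described at the top of this lab.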

## Challenge 01.02

***===> Create a new tabular dataset from a SQL table***

**Only one of you** needs to create the SQL **Datastore**; let's call it **bupasqldemo**.

* Use Service Principal: **5203e379-a33a-4aea-bd8d-20bef410cc72**
* Use secret: **9\~by7nEl35Zv-3lntYqwm7b-15\~ulIWHmw**


Then create the **Dataset**:
* Use the new datastore you created
* Use the query **SELECT * FROM DEMONSTRATOR** to get the list of demonstrators
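If you would rather script this challenge, here is a minimal sketch using the `azureml-core` (v1) SDK. It reuses the **bupasqldemo** datastore if someone has already registered it, registers it with service-principal credentials otherwise, and then builds a tabular dataset from the query. The function name, the dataset name and the `server_name`/`database_name` values are assumptions, since the lab does not give them. Pass the secret in as an argument rather than pasting it into the notebook.

```python
def register_sql_dataset(ws, server_name, database_name, tenant_id, client_id,
                         client_secret, dataset_name):
    """Create (or reuse) the bupasqldemo datastore and register a SQL-query dataset.

    Sketch for azureml-core v1; `register_sql_dataset` is a hypothetical helper name.
    """
    # Imported here so the function can be defined without azureml-core installed
    from azureml.core import Dataset, Datastore
    from azureml.data.datapath import DataPath

    # Only one person needs to register the datastore; everyone else reuses it
    if "bupasqldemo" in ws.datastores:
        sql_store = Datastore.get(ws, "bupasqldemo")
    else:
        sql_store = Datastore.register_azure_sql_database(
            workspace=ws,
            datastore_name="bupasqldemo",
            server_name=server_name,      # e.g. "myserver" for myserver.database.windows.net
            database_name=database_name,
            tenant_id=tenant_id,
            client_id=client_id,          # the service principal's application ID
            client_secret=client_secret,
        )

    # A DataPath pairs the datastore with the SQL query to run against it
    query = DataPath(sql_store, "SELECT * FROM DEMONSTRATOR")
    tabular = Dataset.Tabular.from_sql_query(query)
    return tabular.register(workspace=ws, name=dataset_name)
```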

## Now let's look at our dataset

First, we connect to the Azure Machine Learning workspace, authenticating as the service principal.

```python
import os

from azureml.core import Workspace, Dataset
from azureml.core.authentication import ServicePrincipalAuthentication

# Authenticate as the lab's service principal. The secret is read from an
# environment variable (hypothetical name AZUREML_SP_SECRET) instead of being
# hard-coded; set it to the secret given in Challenge 01.02 before running.
service_principal = ServicePrincipalAuthentication(
    tenant_id="502f2f1d-000c-410d-9d65-de9b3cfa9a83",
    service_principal_id="5203e379-a33a-4aea-bd8d-20bef410cc72",
    service_principal_password=os.environ["AZUREML_SP_SECRET"])

# Connect to the existing workspace using that identity
ws = Workspace(
    subscription_id='f9f80119-dbb0-496f-8e2c-351e0b95b66e',
    resource_group='bupa_demo',
    workspace_name='bupa_demo',
    auth=service_principal)

print("Found workspace {} at location {}".format(ws.name, ws.location))
```
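Rather than hard-coding credentials in the notebook, you can read every service-principal setting from the environment. Here is a small helper, assuming the hypothetical variable names `AZUREML_SP_TENANT_ID`, `AZUREML_SP_CLIENT_ID` and `AZUREML_SP_SECRET`. Its keys match the keyword arguments of `ServicePrincipalAuthentication`, so you can unpack the result straight into it.

```python
import os

# Hypothetical environment-variable names; use whatever your team standardizes on.
_ENV_NAMES = {
    "tenant_id": "AZUREML_SP_TENANT_ID",
    "service_principal_id": "AZUREML_SP_CLIENT_ID",
    "service_principal_password": "AZUREML_SP_SECRET",
}


def load_service_principal_settings():
    """Return ServicePrincipalAuthentication keyword arguments read from the environment."""
    # Report every missing variable at once instead of failing on the first one
    missing = [var for var in _ENV_NAMES.values() if not os.environ.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {arg: os.environ[var] for arg, var in _ENV_NAMES.items()}


# Usage, inside the notebook:
# service_principal = ServicePrincipalAuthentication(**load_service_principal_settings())
```

Failing early with a list of the missing names gives a clearer error than an authentication failure further down the notebook.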

Now we retrieve the registered dataset by name and load it into a pandas DataFrame.

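```python
import pandas as pd  # to_pandas_dataframe() needs pandas installed

# Look up the registered tabular dataset by name (latest version by default)
dataset = Dataset.get_by_name(ws, name='Kaggle_Insurance_Train')

# Load the whole dataset into memory as a pandas DataFrame
df_train = dataset.to_pandas_dataframe()
```
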
```python
df_train.head()
```

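| Unnamed: 0 | id | Gender | Age | Driving_License | Region_Code | Previously_Insured | Vehicle_Age | Vehicle_Damage | Annual_Premium | Policy_Sales_Channel | Vintage | Response |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Male | 44 | 1 | 28.0 | 0 | > 2 Years | True | 40454.0 | 26.0 | 217 | 1 |
| 1 | 2 | Male | 76 | 1 | 3.0 | 0 | 1-2 Year | False | 33536.0 | 26.0 | 183 | 0 |
| 2 | 3 | Male | 47 | 1 | 28.0 | 0 | > 2 Years | True | 38294.0 | 26.0 | 27 | 1 |
| 3 | 4 | Male | 21 | 1 | 11.0 | 1 | < 1 Year | False | 28619.0 | 152.0 | 203 | 0 |
| 4 | 5 | Female | 29 | 1 | 41.0 | 1 | < 1 Year | False | 27496.0 | 152.0 | 39 | 0 |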
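The `Unnamed: 0` column looks like a leftover row index. `Unnamed: 0` is the name pandas gives a column with an empty header, which is what you get when a DataFrame is written with `to_csv()` and its default `index=True`. Assuming the column carries no real information (check before dropping it), here is a sketch of removing such columns. The CSV text reuses the first rows above, trimmed to a few columns.

```python
import io

import pandas as pd

# The first rows shown above, trimmed to a few columns; the empty first header
# mimics a CSV written with to_csv(index=True).
csv_text = """,id,Gender,Age,Response
0,1,Male,44,1
1,2,Male,76,0
2,3,Male,47,1
"""


def drop_unnamed_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Drop columns pandas auto-named 'Unnamed: N' (usually a saved row index)."""
    unnamed = [col for col in df.columns if str(col).startswith("Unnamed:")]
    return df.drop(columns=unnamed)


raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))    # ['Unnamed: 0', 'id', 'Gender', 'Age', 'Response']

clean = drop_unnamed_columns(raw)
print(list(clean.columns))  # ['id', 'Gender', 'Age', 'Response']
```

Fixing the source is usually cleaner: write the file with `to_csv(index=False)`, or read it with `pd.read_csv(..., index_col=0)` so the column becomes the index instead of data.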
