# Want to *actually* do machine learning?
## Part 0: Configuration

_Made for Microsoft Build 2019_

This notebook ensures that you have all the necessary setup in order to run the remaining notebooks smoothly. In particular, we'll make sure you have an Azure Machine Learning (ML) workspace with the sample data we'll use in our walkthrough.

In general, we assume that you're running in an Azure ML Notebook VM, which already knows which workspace you're in, so you can skip some setup. It also comes with the Azure ML Python SDK pre-installed; make sure you're using the "Python 3.6 - AzureML" kernel.

> However, in order to upload local data, you'll need to either:
> 1. Run this notebook locally, or
> 2. Use the [Azure Portal](http://portal.azure.com) to upload via the web interface.


#### Table of Contents

1. [Configure your Azure ML workspace](#Configure%20your%20Azure%20ML%20workspace)
1. [Upload sample data to Azure Blob Storage](#Upload%20sample%20data%20to%20Azure%20Blob%20Storage)
1. [Next steps](#Next%20steps)

---

### Configure your Azure ML workspace

Already have a workspace you want to use? Or, want to create a new workspace? In either case, you'll need to provide the same parameters below:
* __Your subscription id__. You can get this from the [Azure portal](https://portal.azure.com).
* __A resource group name__. You will also need access to a [resource group](https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-overview#resource-groups), which organizes Azure resources and provides a default region for the resources in a group. You can see which resource groups you have access to, or create a new one, in the [Azure portal](https://portal.azure.com). If you don't have a resource group, the create workspace command will create one for you using the name you provide.
* __A name for your workspace__. The workspace name is unique within the subscription, so make it descriptive enough to distinguish it from other Azure ML workspaces. Depending on who has access to your subscription (for example, only you, your department, or your entire enterprise), choose a name that makes sense for that scope.
* (optional) __The region that will host your workspace__. The region is used only when you create a new workspace; you do not need to specify it if you are using an existing workspace. You can find the list of supported regions [here](https://azure.microsoft.com/en-us/global-infrastructure/services/?products=machine-learning-service). Pick a region that is close to your location or that contains your data.

Replace the default values in the cell below with your workspace parameters:

In [1]:
subscription_id = "db74d2db-c6a4-4287-b984-fd05ac72cf42"
resource_group = "omg_aml_testing"
workspace_name = "actuallydemosung"
workspace_region = "West Europe"
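
Before running the cells that follow, it can help to sanity-check the values you entered. The sketch below is not part of the Azure ML SDK: `looks_like_subscription_id` is a hypothetical helper that relies only on the fact that Azure subscription IDs are GUIDs. It checks the format only, so it cannot tell you whether the subscription exists or whether you can access it.

```python
import uuid


def looks_like_subscription_id(value):
    """Return True if value is a GUID in canonical hyphenated form.

    Hypothetical helper for illustration; it validates the format only.
    """
    try:
        # uuid.UUID also accepts braces or missing hyphens, so compare the
        # canonical rendering with the input to require the hyphenated form.
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


print(looks_like_subscription_id("db74d2db-c6a4-4287-b984-fd05ac72cf42"))
print(looks_like_subscription_id("my-subscription"))
```

A failed check usually means the subscription name was pasted in place of the ID.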

#### Access your existing workspace

If you already have an Azure ML workspace, the following cell attempts to load the workspace specified by your parameters. The cell can fail if the specified workspace doesn't exist or you don't have permission to access it.

In [26]:
from azureml.core import Workspace

try:
    ws = Workspace(
        subscription_id = subscription_id, 
        resource_group = resource_group, 
        workspace_name = workspace_name
    )
    print("Workspace configuration succeeded. Skip the workspace creation steps below. subscription_id = "+subscription_id)
except Exception:  # a bare except would also swallow KeyboardInterrupt
    print("Workspace not accessible. Change your parameters or create a new workspace below")

Workspace configuration succeeded. Skip the workspace creation steps below. subscription_id = db74d2db-c6a4-4287-b984-fd05ac72cf42
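
When the load fails, the message above does not say why. One way to surface the reason is to capture the exception and report it, as in the minimal sketch below. `try_load` and `fake_loader` are hypothetical names introduced here for illustration; the stand-in loader exists only so the sketch runs without the SDK, and the exception types the SDK actually raises are not assumed.

```python
def try_load(loader):
    """Call loader() and return (result, None) or (None, error description)."""
    try:
        return loader(), None
    except Exception as exc:
        return None, "{}: {}".format(type(exc).__name__, exc)


# Stand-in loader so the sketch runs without the SDK. With the SDK it could be
# lambda: Workspace(subscription_id=subscription_id,
#                   resource_group=resource_group,
#                   workspace_name=workspace_name)
def fake_loader():
    raise PermissionError("no access to resource group")


ws, error = try_load(fake_loader)
if ws is None:
    print("Workspace not accessible ({}).".format(error))
```

Keeping the error text makes it easier to tell a mistyped name from a missing permission.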


#### Create a new workspace

The following cell creates an Azure ML workspace in your subscription, provided you have the correct permissions.

This will fail if:
* You don't have permission to create a workspace in the resource group
* You don't have a resource group and you don't have permission to create a resource group
* You aren't a subscription owner or contributor and no Azure ML workspaces have ever been created in this subscription

If workspace creation fails, please work with your IT admin to provide you with the appropriate permissions or to provision the required resources.

In [24]:
from azureml.core import Workspace

# Create the workspace using the specified parameters
ws = Workspace.create(
    name = workspace_name,
    subscription_id = subscription_id,
    resource_group = resource_group, 
    location = workspace_region,
    create_resource_group = True,
    exist_ok = True
)
ws.get_details()

{'id': '/subscriptions/db74d2db-c6a4-4287-b984-fd05ac72cf42/resourceGroups/omg_aml_testing/providers/Microsoft.MachineLearningServices/workspaces/actuallydemosung',
 'name': 'actuallydemosung',
 'location': 'westeurope',
 'type': 'Microsoft.MachineLearningServices/workspaces',
 'tags': {},
 'workspaceid': '7e985739-46ef-4d8c-9eb2-84b2f57dd5bd',
 'description': '',
 'friendlyName': '',
 'creationTime': '2019-06-11T00:29:05.9019797+00:00',
 'keyVault': '/subscriptions/db74d2db-c6a4-4287-b984-fd05ac72cf42/resourcegroups/omg_aml_testing/providers/microsoft.keyvault/vaults/actuallydemosu1005639927',
 'applicationInsights': '/subscriptions/db74d2db-c6a4-4287-b984-fd05ac72cf42/resourcegroups/omg_aml_testing/providers/microsoft.insights/components/actuallydemosu9047442039',
 'identityPrincipalId': 'b089749e-6ddb-4508-a002-b2834068ce55',
 'identityTenantId': '72f988bf-86f1-41af-91ab-2d7cd011db47',
 'identityType': 'SystemAssigned',
 'storageAccount': '/subscriptions/db74d2db-c6a4-4287-b984-fd05
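
The `id` field in the details above encodes the subscription, resource group, and workspace name as alternating key/value path segments. As a rough sketch, assuming the ID keeps that alternating layout, the hypothetical helper `parse_resource_id` below splits it so you can confirm the workspace landed where you expected. Keys are lower-cased because the output above mixes `resourceGroups` and `resourcegroups`.

```python
def parse_resource_id(resource_id):
    """Split an Azure resource ID into a dict of lower-cased keys to values."""
    parts = [p for p in resource_id.split("/") if p]
    if len(parts) % 2 != 0:
        raise ValueError("expected alternating key/value segments: " + resource_id)
    return {key.lower(): value for key, value in zip(parts[0::2], parts[1::2])}


# The workspace ID copied from the details shown above.
workspace_id = (
    "/subscriptions/db74d2db-c6a4-4287-b984-fd05ac72cf42"
    "/resourceGroups/omg_aml_testing"
    "/providers/Microsoft.MachineLearningServices"
    "/workspaces/actuallydemosung"
)
parsed = parse_resource_id(workspace_id)
print(parsed["resourcegroups"], parsed["workspaces"])
```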

### Upload sample data to Azure Blob Storage

For this walkthrough, we use data from [NYC's Taxi and Limousine Commission data sets](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page). In particular, we're using six CSV files that we'll coalesce into one data set: the January through June 2018 Yellow Taxi files. On that page, expand the 2018 menu, click each data set to download it, and move all six files into one folder. The following cell uploads them to your workspace's Azure Blob Storage; replace the `src_dir` parameter with the path to the folder that contains all six files:

In [23]:
import azureml.data
from azureml.data.azure_storage_datastore import AzureFileDatastore, AzureBlobDatastore

datastore = ws.get_default_datastore()

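# Upload every file in the local folder to the workspace's default datastore.
# The raw string keeps the backslash in the Windows path from being treated
# as the start of an escape sequence.
datastore.upload(
    src_dir=r".\data",
    target_path="yellow_tripdata_2018_01",
    overwrite=True,
    show_progress=True)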


# datastore.upload(src_dir='your source directory',
#                  target_path='your target path',
#                  overwrite=True,
#                  show_progress=True)

Uploading .\data\yellow_tripdata_2018-01.csv


--- Logging error ---
Traceback (most recent call last):
  File "C:\Users\SUHON\AppData\Local\Continuum\anaconda3\lib\site-packages\azureml\data\azure_storage_datastore.py", line 256, in handler
    result = fn()
  File "C:\Users\SUHON\AppData\Local\Continuum\anaconda3\lib\site-packages\azureml\data\azure_storage_datastore.py", line 576, in <lambda>
    lambda target, source: lambda: self.blob_service.create_blob_from_path(self.container_name, target, source)
  File "C:\Users\SUHON\AppData\Local\Continuum\anaconda3\lib\site-packages\azureml\_vendor\azure_storage\blob\blockblobservice.py", line 463, in create_blob_from_path
    timeout=timeout)
  File "C:\Users\SUHON\AppData\Local\Continuum\anaconda3\lib\site-packages\azureml\_vendor\azure_storage\blob\blockblobservice.py", line 614, in create_blob_from_stream
    initialization_vector=iv
  File "C:\Users\SUHON\AppData\Local\Continuum\anaconda3\lib\site-packages\azureml\_vendor\azure_storage\blob\_upload_chunking.py", line 85, in _uploa

$AZUREML_DATAREFERENCE_4934ad9617b046738dad062fc8123123
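
Because the upload takes a while, it may be worth confirming that the folder really contains all six monthly files before starting. The sketch below uses only the standard library. The file naming pattern is an assumption inferred from the single file name in the upload log above, and `missing_files` and `EXPECTED_FILES` are hypothetical names; adjust the pattern if your downloads are named differently.

```python
from pathlib import Path

# Assumed naming pattern, inferred from the one file name in the upload log.
EXPECTED_FILES = ["yellow_tripdata_2018-{:02d}.csv".format(m) for m in range(1, 7)]


def missing_files(src_dir):
    """Return the expected file names that are not present in src_dir."""
    folder = Path(src_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


# Point this at the same folder you pass as src_dir to datastore.upload.
missing = missing_files("./data")
if missing:
    print("Missing before upload:", ", ".join(missing))
else:
    print("All six files found.")
```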

---

### Next steps

Now you can continue running our other notebooks for the walkthrough on Azure ML Notebook VMs. Continue to [Part 1: Ingest and Wrangle Data](1_ingest-wrangle-data.ipynb).