# Create your Datastore
A datastore is a storage abstraction over an Azure storage account. The datastore can use either an Azure blob container or an Azure file share as the back-end storage. Each workspace has a default datastore, and you can register additional datastores. Use the Python SDK API or the Azure Machine Learning CLI to store and retrieve files from the datastore.

In short, datastores let you keep track of the storage accounts, and the containers within them, that hold your data, so that the data is easier to manage.

To learn more, see the Datastore class reference: https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.datastore(class)?view=azure-ml-py

In [None]:
# Load Azure libraries
from azureml.core import Datastore
from azureml.core.workspace import Workspace
from azureml.core.authentication import InteractiveLoginAuthentication
import pandas as pd
import numpy as np
import json
import os

In [None]:
# After running this cell for the first time, replace the placeholder below with the Azure ML SDK version it reports
import azureml.core

print("This notebook was created using version <TYPE IN FIRST VERSION USED HERE> of the Azure ML SDK")
print("You are currently using version", azureml.core.VERSION, "of the Azure ML SDK")

In [None]:
ws = Workspace.from_config()
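
As noted in the introduction, each workspace has a default datastore and can hold additional registered ones. The sketch below shows one way to list them and flag the default. It relies only on `get_default_datastore()`, the `datastores` dictionary, and the `name` and `datastore_type` attributes; `describe_datastores` is a hypothetical helper, and the stand-in workspace (with assumed datastore names) exists only so the example runs without an Azure connection.

```python
from types import SimpleNamespace


def describe_datastores(workspace):
    """Return one line per registered datastore, flagging the workspace default."""
    default_name = workspace.get_default_datastore().name
    stores = workspace.datastores  # mapping of datastore name to datastore object
    lines = []
    for name in sorted(stores):
        kind = getattr(stores[name], "datastore_type", "unknown")
        marker = " (default)" if name == default_name else ""
        lines.append(f"{name}: {kind}{marker}")
    return lines


# Stand-in workspace (assumed names) so the sketch runs without Azure.
_blob = SimpleNamespace(name="workspaceblobstore", datastore_type="AzureBlob")
_files = SimpleNamespace(name="workspacefilestore", datastore_type="AzureFile")
demo_ws = SimpleNamespace(
    datastores={_blob.name: _blob, _files.name: _files},
    get_default_datastore=lambda: _blob,
)

for line in describe_datastores(demo_ws):
    print(line)
```

With a real workspace, calling `describe_datastores(ws)` should give the same kind of listing for your own datastores.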

In [None]:
# Set up your datastore by filling in the lower case values between double quotes
blob_datastore_name="<my-datastore-name>" # Name of the datastore.  This is your choice.
account_name=os.getenv("BLOB_ACCOUNTNAME", "<my-storage-account>") # Name of your Storage Account you use to store data
container_name=os.getenv("BLOB_CONTAINER", "<my-container>") # Name of the Container you use to store data
account_key=os.getenv("BLOB_ACCOUNT_KEY", "<my-storage-account-key>") # Your Storage Account key

# Set up a datastore using the normal parameters (for Blob Storage or Azure Data Lake Gen 2)
datastore = Datastore.register_azure_blob_container(workspace=ws, 
                                                    datastore_name=blob_datastore_name, 
                                                    container_name=container_name, 
                                                    account_name=account_name,
                                                    account_key=account_key,
                                                    overwrite=True)
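
The cell above falls back to template strings such as `<my-storage-account>` when an environment variable is not set, so a forgotten value only surfaces once registration fails. A minimal pre-check, assuming every placeholder is wrapped in angle brackets, could look like the following; `unfilled_settings` and the example values are hypothetical and not part of the Azure ML SDK.

```python
import re

# Matches values that are nothing but an angle-bracket placeholder, e.g. "<my-container>".
_PLACEHOLDER = re.compile(r"^<[^<>]*>$")


def unfilled_settings(settings):
    """Return the names of settings that are empty or still look like '<my-...>'."""
    return [
        name
        for name, value in settings.items()
        if not value or _PLACEHOLDER.match(value.strip())
    ]


example = {
    "account_name": "<my-storage-account>",  # untouched template value
    "container_name": "training-data",       # hypothetical real value
    "account_key": "",                        # e.g. an empty environment variable
}
print(unfilled_settings(example))
```

In the notebook, you might pass a dictionary built from `account_name`, `container_name`, and `account_key` and stop before registering if the returned list is not empty.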

In [None]:
# Output a nice table with all of your essential datastore information
dsoutput = {}
dsoutput['Workspace Name'] = ws.name
dsoutput['Datastore Name'] = datastore.name
dsoutput['Container Name'] = datastore.container_name
dsoutput['Resource Group'] = ws.resource_group
dsoutput['Storage Account'] = datastore.account_name
pd.set_option('display.max_colwidth', None)  # None disables truncation; -1 is rejected by recent pandas versions
dsoutputDf = pd.DataFrame(data = dsoutput, index = [''])
dsoutputDf.T
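
Once a blob datastore is registered, the SDK can store and retrieve files through it. The sketch below wraps the blob datastore's `upload` and `download` methods in a hypothetical `round_trip` helper; the paths are placeholders, and `RecordingDatastore` is a stand-in that only records calls so the example runs without Azure. Newer SDK releases may flag `upload` as deprecated in favor of dataset-based uploads, so treat this as an illustration rather than the recommended path.

```python
def round_trip(datastore, local_dir, remote_path, download_dir):
    """Upload local_dir to the datastore, then download the same prefix again."""
    datastore.upload(src_dir=local_dir, target_path=remote_path,
                     overwrite=True, show_progress=False)
    return datastore.download(target_path=download_dir, prefix=remote_path,
                              overwrite=True, show_progress=False)


class RecordingDatastore:
    """Stand-in that records calls instead of talking to Azure storage."""

    def __init__(self):
        self.calls = []

    def upload(self, src_dir, target_path=None, overwrite=False, show_progress=True):
        self.calls.append(("upload", src_dir, target_path))
        return self

    def download(self, target_path, prefix=None, overwrite=False, show_progress=True):
        self.calls.append(("download", target_path, prefix))
        return 0  # a real blob datastore reports how many files it downloaded


fake = RecordingDatastore()
downloaded = round_trip(fake, "./data", "raw", "./restored")
print(fake.calls)
```

With the `datastore` object from the registration cell, the call would be `round_trip(datastore, "./data", "raw", "./restored")`, using directories of your choice.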

## Most people can stop here and move on to the next notebook.  
If you want to set up your datastore using a service principal for enhanced security, read and work through the cells below.

In [None]:
# You can also set up a datastore using a service principal if you use Azure Data Lake Gen 2

client_id=os.getenv("CLIENT_ID","<my-client-id>") # Service Principal ID with permissions to access your Storage Account
client_secret=os.getenv("CLIENT_SECRET","<my-client-secret>") # Service Principal Password
tenant_id=os.getenv("TENANT_ID","<my-tenant-id>") # Your Tenant ID

# Set up a datastore using a service principal (Azure Data Lake Gen 2 only)
datastore = Datastore.register_azure_data_lake_gen2(workspace=ws,
                                                    datastore_name=blob_datastore_name,
                                                    account_name=account_name,
                                                    filesystem=container_name,
                                                    client_id=client_id,
                                                    client_secret=client_secret,
                                                    tenant_id=tenant_id,
                                                    overwrite=True)

In [None]:
# Output a nice table with all of your essential datastore information
dsoutput = {}
dsoutput['Workspace Name'] = ws.name
dsoutput['Datastore Name'] = datastore.name
dsoutput['Container Name'] = datastore.container_name
dsoutput['Resource Group'] = ws.resource_group
dsoutput['Storage Account'] = datastore.account_name
pd.set_option('display.max_colwidth', None)  # None disables truncation; -1 is rejected by recent pandas versions
dsoutputDf = pd.DataFrame(data = dsoutput, index = [''])
dsoutputDf.T
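
The summary table is built the same way for both the blob and the Data Lake Gen 2 datastore, so the shared fields could be collected in one small helper. The sketch below assumes only the attributes already used above (`name`, `resource_group`, `container_name`, `account_name`); `datastore_summary` and the sample values are hypothetical.

```python
from types import SimpleNamespace


def datastore_summary(workspace, datastore):
    """Collect the fields shown in the summary table into a plain dict."""
    return {
        "Workspace Name": workspace.name,
        "Datastore Name": datastore.name,
        "Container Name": datastore.container_name,
        "Resource Group": workspace.resource_group,
        "Storage Account": datastore.account_name,
    }


# Stand-in objects with made-up values so the sketch runs without Azure.
sample_ws = SimpleNamespace(name="demo-workspace", resource_group="demo-rg")
sample_store = SimpleNamespace(
    name="demo-datastore",
    container_name="demo-container",
    account_name="demostorage",
)

summary = datastore_summary(sample_ws, sample_store)
for field, value in summary.items():
    print(f"{field}: {value}")
```

In the notebook, `pd.DataFrame(data=datastore_summary(ws, datastore), index=['']).T` should render the same table as the cell above.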


Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.