# Reading from and Writing to Datastores

Copyright (c) Microsoft Corporation. All rights reserved.<br>
Licensed under the MIT License.

A datastore is a reference that points to an Azure storage service, such as a blob container. Each datastore belongs to a workspace, and a workspace can have many datastores.

A data reference points to a path in the Azure storage service that a datastore references. For example, given a datastore named `blob` that points to an Azure blob container, a data reference can point to `/test/data/titanic.csv` in that blob container.
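To make the relationship concrete, here is a minimal plain-Python model of it: a workspace owns datastores by name, and a data reference pairs one datastore with a path inside it. The classes `WorkspaceModel`, `DatastoreModel`, and `DataReferenceModel` are hypothetical illustrations, not the `azureml` classes used later in this notebook.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class DatastoreModel:
    """Hypothetical stand-in for a datastore: a named pointer to one storage service."""
    name: str
    storage_kind: str  # illustrative label such as "blob" or "adls"


@dataclass(frozen=True)
class DataReferenceModel:
    """Hypothetical stand-in for a data reference: a datastore plus a path inside it."""
    datastore: DatastoreModel
    path_on_datastore: str


@dataclass
class WorkspaceModel:
    """Hypothetical stand-in for a workspace, which can own many datastores."""
    name: str
    datastores: Dict[str, DatastoreModel] = field(default_factory=dict)

    def register(self, datastore: DatastoreModel) -> None:
        # Datastores are looked up by name within their workspace.
        self.datastores[datastore.name] = datastore


workspace_model = WorkspaceModel("my-workspace")
workspace_model.register(DatastoreModel(name="blob", storage_kind="blob"))
ref = DataReferenceModel(workspace_model.datastores["blob"], "/test/data/titanic.csv")
print(ref.datastore.name, ref.path_on_datastore)  # blob /test/data/titanic.csv
```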

## Read data from Datastore

Data Prep supports reading data from a `Datastore` or a `DataReference`. 

Passing a datastore to any of the Data Prep `read_*` methods reads everything in the underlying Azure storage service. To read a specific folder or file in that storage, pass a data reference instead.
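The scoping rule above can be modeled in a few lines. This is a hypothetical sketch of the described behavior, not Data Prep's implementation: `files_to_read` and the sample listing are illustrative, and paths are assumed to be `/`-separated.

```python
from typing import Iterable, List, Optional


def files_to_read(listing: Iterable[str], path_on_datastore: Optional[str] = None) -> List[str]:
    """Hypothetical model of which files a read covers.

    listing: every file path in the underlying storage, without a leading slash.
    path_on_datastore: None models passing the datastore itself; a string
    models passing a data reference to a file or folder.
    """
    files = sorted(listing)
    if path_on_datastore is None:
        # Datastore: everything in the underlying storage is read.
        return files
    target = path_on_datastore.strip("/")
    # Data reference: the exact file, or every file under the folder.
    return [f for f in files if f == target or f.startswith(target + "/")]


listing = ["crime0-10.csv", "input/crime0-10.csv", "input/other.csv", "notes.txt"]
print(files_to_read(listing, "/input/crime0-10.csv"))  # ['input/crime0-10.csv']
```

Matching on `target + "/"` rather than a bare prefix keeps a reference to `/inp` from accidentally selecting the `input` folder.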

```python
from azureml.core import Workspace, Datastore
from azureml.data.data_reference import DataReference

import azureml.dataprep as dprep
```

First, get the workspace. Replace `subscription_id`, `resource_group`, and `workspace_name` with the values for your own workspace.

```python
# Identify an existing workspace; replace these values with your own.
subscription_id = '35f16a99-532a-4a47-9e93-00305f6c40f2'
resource_group = 'DataStoreTest'
workspace_name = 'dataprep-centraleuap'

workspace = Workspace(subscription_id=subscription_id,
                      resource_group=resource_group,
                      workspace_name=workspace_name)
```
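Rather than editing the identifiers in the notebook, you can keep them in a JSON file. The sketch below is a hypothetical stdlib helper (`workspace_settings_from_json` and `load_workspace_settings` are illustrative names) that reads the three keys the `Workspace(...)` call above uses. The SDK also offers `Workspace.from_config()`, which reads a `config.json` with these same keys.

```python
import json
from pathlib import Path
from typing import Dict

REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def workspace_settings_from_json(text: str) -> Dict[str, str]:
    """Parse workspace identifiers from JSON text and check that none are missing."""
    settings = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise KeyError(f"workspace config is missing: {', '.join(missing)}")
    # Keep only the keys the Workspace constructor call uses.
    return {key: settings[key] for key in REQUIRED_KEYS}


def load_workspace_settings(config_path: str) -> Dict[str, str]:
    """Hypothetical helper: read the identifiers from a JSON file on disk."""
    return workspace_settings_from_json(Path(config_path).read_text())
```

With such a file in place, the cell above could become `workspace = Workspace(**load_workspace_settings('config.json'))`.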

```python
workspace.datastores
```

```
{'dataprep_adls': <azureml.data.azure_data_lake_datastore.AzureDataLakeDatastore at 0x7ff9fc69c160>,
 'dataprep_blob': <azureml.data.azure_storage_datastore.AzureBlobDatastore at 0x7ff9fc682ba8>,
 'dataprep_blob_key': <azureml.data.azure_storage_datastore.AzureBlobDatastore at 0x7ff9fc6dda20>,
 'dataprep_file': <azureml.data.azure_storage_datastore.AzureFileDatastore at 0x7ff9fc69c1d0>,
 'test_sql': <azureml.data.azure_sql_database_datastore.AzureSqlDatabaseDatastore at 0x7ff9fc682b38>}
```

You can now read a crime dataset from the `dataprep_blob` datastore. If you are using your own workspace, the file `crime0-10.csv` will not be there by default; upload it to the datastore yourself first.

```python
datastore = Datastore(workspace=workspace, name='dataprep_blob')
# datastore.path(...) gives a data reference to a single file in the blob container.
dflow = dprep.read_csv(path=datastore.path('crime0-10.csv'))
dflow.head(5)
```

| | ID | Case Number | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | ... | Ward | Community Area | FBI Code | X Coordinate | Y Coordinate | Year | Updated On | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10140490 | HY329907 | 07/05/2015 11:50:00 PM | 050XX N NEWLAND AVE | 820 | THEFT | $500 AND UNDER | STREET | False | False | ... | 41 | 10 | 06 | 1129230.0 | 1933315.0 | 2015 | 07/12/2015 12:42:46 PM | 41.973309466 | -87.800174996 | (41.973309466, -87.800174996) |
| 1 | 10139776 | HY329265 | 07/05/2015 11:30:00 PM | 011XX W MORSE AVE | 460 | BATTERY | SIMPLE | STREET | False | True | ... | 49 | 1 | 08B | 1167370.0 | 1946271.0 | 2015 | 07/12/2015 12:42:46 PM | 42.008124017 | -87.65955018 | (42.008124017, -87.65955018) |
| 2 | 10140270 | HY329253 | 07/05/2015 11:20:00 PM | 121XX S FRONT AVE | 486 | BATTERY | DOMESTIC BATTERY SIMPLE | STREET | False | True | ... | 9 | 53 | 08B | | | 2015 | 07/12/2015 12:42:46 PM | | | |
| 3 | 10139885 | HY329308 | 07/05/2015 11:19:00 PM | 051XX W DIVISION ST | 610 | BURGLARY | FORCIBLE ENTRY | SMALL RETAIL STORE | False | False | ... | 37 | 25 | 05 | 1141721.0 | 1907465.0 | 2015 | 07/12/2015 12:42:46 PM | 41.902152027 | -87.754883404 | (41.902152027, -87.754883404) |
| 4 | 10140379 | HY329556 | 07/05/2015 11:00:00 PM | 012XX W LAKE ST | 930 | MOTOR VEHICLE THEFT | THEFT/RECOVERY: AUTOMOBILE | STREET | False | False | ... | 27 | 28 | 07 | 1168413.0 | 1901632.0 | 2015 | 07/12/2015 12:42:46 PM | 41.885610142 | -87.657008701 | (41.885610142, -87.657008701) |


You can also read from an Azure SQL database. To do so, get the Azure SQL database datastore and pass it to `dprep.read_sql` together with the query to run.

```python
datastore = Datastore(workspace=workspace, name='test_sql')
dflow_sql = dprep.read_sql(data_source=datastore, query='SELECT * FROM team')
dflow_sql.head(5)
```

| | Name | Alias |
|---|---|---|
| 0 | Alpha | alpha |
| 1 | Bravo | bravo |
| 2 | Charlie | charlie |
| 3 | Dank | memes |
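To experiment with the query without an Azure SQL database, you can run it against a local stand-in. This sketch swaps in Python's built-in `sqlite3` with an in-memory `team` table holding the rows shown above; the column types are an assumption, and this is not how Data Prep executes `read_sql`.

```python
import sqlite3

# In-memory SQLite database standing in for the Azure SQL database behind 'test_sql'.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE team (Name TEXT, Alias TEXT)")
conn.executemany(
    "INSERT INTO team (Name, Alias) VALUES (?, ?)",
    [("Alpha", "alpha"), ("Bravo", "bravo"), ("Charlie", "charlie"), ("Dank", "memes")],
)

# Same query string as the read_sql call; fetchmany(5) plays the role of head(5).
cursor = conn.execute("SELECT * FROM team")
columns = [description[0] for description in cursor.description]
rows = cursor.fetchmany(5)
conn.close()
print(columns)  # ['Name', 'Alias']
```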


## Write data to Datastore

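You can also write a dataflow to a datastore. The code below writes the crime data read earlier to the `output/crime0-10` folder in the `dataprep_blob_key` datastore. `write_to_csv` only adds a write step to the dataflow; the data is written when `run_local()` executes it.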

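```python
# Destination datastore for the output.
dest_datastore = Datastore(workspace, 'dataprep_blob_key')
```

```python
# Add a write step targeting output/crime0-10, then execute the dataflow to perform the write.
dflow.write_to_csv(directory_path=dest_datastore.path('output/crime0-10')).run_local()
```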
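The write above relies on deferred execution: building the dataflow describes the work, and `run_local()` performs it. Our understanding is that every execution of the dataflow returned by `write_to_csv` performs the write again. The hypothetical `LazyFlow` class below models that idea in plain Python; it is not the `azureml.dataprep` `Dataflow` class, and the list `written` stands in for files on a datastore.

```python
from typing import Callable, List, Sequence, Tuple

Row = Sequence[str]
Step = Callable[[List[Row]], List[Row]]


class LazyFlow:
    """Hypothetical model of a lazily evaluated dataflow; not the azureml.dataprep class."""

    def __init__(self, source: Callable[[], List[Row]], steps: Tuple[Step, ...] = ()):
        self._source = source
        self._steps = steps

    def add_step(self, step: Step) -> "LazyFlow":
        # Return a new flow with one more step; nothing is executed here.
        return LazyFlow(self._source, self._steps + (step,))

    def run_local(self) -> List[Row]:
        # Only now is the source read and every step applied in order.
        rows = self._source()
        for step in self._steps:
            rows = step(rows)
        return rows


written: List[List[Row]] = []


def write_step(rows: List[Row]) -> List[Row]:
    # Appending to a list stands in for writing CSV files to a datastore.
    written.append(list(rows))
    return rows


flow = LazyFlow(lambda: [("ID",), ("10140490",)])
flow_with_write = flow.add_step(write_step)
print(len(written))  # 0: adding the write step wrote nothing
flow_with_write.run_local()
print(len(written))  # 1: the write ran when the flow was executed
```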

You can also read a specific file from the `dataprep_adls` datastore, which references an Azure Data Lake Store, by passing a `DataReference` that points to the file's path on the datastore.

```python
datastore = Datastore(workspace=workspace, name='dataprep_adls')
# Point a DataReference at one file on the Data Lake datastore.
dflow_adls = dprep.read_csv(path=DataReference(datastore, path_on_datastore='/input/crime0-10.csv'))
dflow_adls.head(5)
```

| | ID | Case Number | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | ... | Ward | Community Area | FBI Code | X Coordinate | Y Coordinate | Year | Updated On | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10140490 | HY329907 | 07/05/2015 11:50:00 PM | 050XX N NEWLAND AVE | 820 | THEFT | $500 AND UNDER | STREET | False | False | ... | 41 | 10 | 06 | 1129230.0 | 1933315.0 | 2015 | 07/12/2015 12:42:46 PM | 41.973309466 | -87.800174996 | (41.973309466, -87.800174996) |
| 1 | 10139776 | HY329265 | 07/05/2015 11:30:00 PM | 011XX W MORSE AVE | 460 | BATTERY | SIMPLE | STREET | False | True | ... | 49 | 1 | 08B | 1167370.0 | 1946271.0 | 2015 | 07/12/2015 12:42:46 PM | 42.008124017 | -87.65955018 | (42.008124017, -87.65955018) |
| 2 | 10140270 | HY329253 | 07/05/2015 11:20:00 PM | 121XX S FRONT AVE | 486 | BATTERY | DOMESTIC BATTERY SIMPLE | STREET | False | True | ... | 9 | 53 | 08B | | | 2015 | 07/12/2015 12:42:46 PM | | | |
| 3 | 10139885 | HY329308 | 07/05/2015 11:19:00 PM | 051XX W DIVISION ST | 610 | BURGLARY | FORCIBLE ENTRY | SMALL RETAIL STORE | False | False | ... | 37 | 25 | 05 | 1141721.0 | 1907465.0 | 2015 | 07/12/2015 12:42:46 PM | 41.902152027 | -87.754883404 | (41.902152027, -87.754883404) |
| 4 | 10140379 | HY329556 | 07/05/2015 11:00:00 PM | 012XX W LAKE ST | 930 | MOTOR VEHICLE THEFT | THEFT/RECOVERY: AUTOMOBILE | STREET | False | False | ... | 27 | 28 | 07 | 1168413.0 | 1901632.0 | 2015 | 07/12/2015 12:42:46 PM | 41.885610142 | -87.657008701 | (41.885610142, -87.657008701) |
