# Getting started with Azure Data Lake Analytics using R: a tutorial with Jupyter Notebook and Azure CLI

## Contents
- [Create Azure Resources](#section1)
- [Get set up](#section2)
- [Submit Jobs](#section3)
- [View results](#section4)

<a id="section1"></a>

## Create Azure Resources

- This tutorial requires an Azure subscription.
- We will provision a **Data Lake Analytics account** and a **Data Lake Store** in your subscription.
- We will also transfer the scripts needed for this tutorial to a folder in your Data Lake Store.

In [27]:
import os
mycwd = os.getcwd()
print('My current working directory is',mycwd)

My current working directory is C:\Users\gshaheen\Documents


In [None]:
# Clone the repository
!git clone https://github.com/Azure/ADLAwithR-GettingStarted.git

**Note**: If you see an error like `'git' is not recognized as an internal or external command, operable program or batch file` even though git is installed, you may need to add your git installation to your PATH. Read more at https://stackoverflow.com/questions/4492979/git-is-not-recognized-as-an-internal-or-external-command

In [1]:
# Install Azure CLI if the `az` command is not already on the PATH
import shutil

az_installed = shutil.which('az') is not None

if az_installed:
    print('package already installed')
    !pip --version
else:
    !pip install -I azure-cli

package already installed
pip 9.0.1 from c:\program files\anaconda3\lib\site-packages (python 3.5)



Log in to your Azure account. Executing the command below will show a message like  
`To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the code XXXXXXXXX to authenticate.`   
Copy the code from your screen, open the URL, and paste the code.


In [None]:
!az login -o table

In [None]:
!az account list | grep name | grep -v '@'
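
The `grep` pipeline above depends on a Unix-style `grep` being available on your PATH. As an alternative, here is a minimal sketch that reads the subscription names from the JSON that `az account list` prints by default. `subscription_names` is a hypothetical helper, and the sample JSON is made up for illustration; only the `name` field is actually used.

```python
import json

def subscription_names(account_list_json):
    """Return the subscription names found in `az account list` JSON output."""
    return [sub["name"] for sub in json.loads(account_list_json)]

# Made-up sample shaped like the CLI's JSON output (illustrative values only).
sample = '''
[
  {"id": "00000000-0000-0000-0000-000000000000", "name": "My Subscription",
   "user": {"name": "someone@example.com", "type": "user"}},
  {"id": "11111111-1111-1111-1111-111111111111", "name": "Another Subscription",
   "user": {"name": "someone@example.com", "type": "user"}}
]
'''
print(subscription_names(sample))
```

In a notebook, the real output could be captured with something like `raw = !az account list -o json` and passed in as `"\n".join(raw)`.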

In [None]:
selected_subscription = "YOUR SUBSCRIPTION" # Pick a subscription from above
!az account set --subscription "$selected_subscription"

In [None]:
# make sure you are in the right subscription
!az account show

Create a unique string (one that no other Azure user has chosen). It may contain only lowercase letters and numbers, and must be fewer than 20 characters long.

In [21]:
import uuid
unique_string = "shaheendemo"  
unique_string = unique_string + str(uuid.uuid4())[:4]
print(unique_string)  

shaheendemo2b63
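
The naming rule above can be checked before any resource is created. This is a minimal sketch that assumes the rule exactly as stated in this tutorial (lowercase letters and digits only, fewer than 20 characters). `is_valid_unique_string` is a hypothetical helper, not part of the Azure CLI.

```python
import re

# Hypothetical helper encoding the rule stated in this tutorial:
# lowercase letters and digits only, fewer than 20 characters.
_NAME_RE = re.compile(r"[a-z0-9]{1,19}")

def is_valid_unique_string(name):
    """Return True if `name` satisfies the tutorial's naming rule."""
    return _NAME_RE.fullmatch(name) is not None

print(is_valid_unique_string("shaheendemo2b63"))  # the string generated above
print(is_valid_unique_string("Shaheen_Demo"))     # uppercase letters and an underscore
```

Running the check on `unique_string` right after generating it catches a bad prefix before the `az` commands fail.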


In [22]:
RESOURCE_GROUP_NAME = unique_string + 'rg'
ADLA_NAME = unique_string + 'adla'
ADLS_NAME = unique_string + 'adls'
LOCATION = 'Central US'

In [23]:
print(RESOURCE_GROUP_NAME)
print(ADLA_NAME)
print(ADLS_NAME)
print(LOCATION)

shaheendemo2b63rg
shaheendemo2b63adla
shaheendemo2b63adls
Central US


In [None]:
# To create a new resource group:
!az group create -n $RESOURCE_GROUP_NAME -l "$LOCATION"
#!az group create --name shaheendemo2b63rg --location "Central US"

In [None]:
# If you don't already have one, create a new Data Lake Store account.
# This will be the default Data Lake Store account associated with this ADLA account
!az dls account create --account $ADLS_NAME --resource-group $RESOURCE_GROUP_NAME
#!az dls account create --account "shaheendemo2b63adls" --resource-group "shaheendemo2b63rg"

In [None]:
# If you don't already have one, create a Data Lake Analytics account:
!az dla account create --account $ADLA_NAME --resource-group $RESOURCE_GROUP_NAME --location "$LOCATION" --default-data-lake-store $ADLS_NAME
#!az dla account create --account "shaheendemo2b63adla" --resource-group "shaheendemo2b63rg" --location "Central US" --default-data-lake-store "shaheendemo2b63adls"

**Note**: One additional important step is to install U-SQL Extensions in your ADLA account.  
-  Go to the Azure portal and, in the Data Lake Analytics blade, locate and click on Sample Scripts under the GETTING STARTED section in the left-hand menu. (You may need to scroll down or use the search feature.)
-  In the Sample Scripts blade, click on Install U-SQL Extensions to install the extensions in your account.
-  This step enables the R (and Python) extensions to work with ADLA.
-  This step may take several minutes to complete.

<a id='section2'></a>

## Get set up

Next we will upload the folder ADLSmaterial from our local machine to the folder TutorialMaterial in Data Lake Store. If you haven't changed directory since you cloned the repository, the folder ADLSmaterial should be in the location below.

In [36]:
path_folderlocal = os.path.join(mycwd, "ADLAwithR-GettingStarted","Tutorial","ADLSmaterial")

In [35]:
os.listdir(path_folderlocal)

['dplyrWithDependencies.zip',
 'magrittr_1.5.zip',
 'myiris.csv',
 'myiris_wheader.csv',
 'readme.txt',
 'Rscript2runlocally_packages.R',
 'rscriptEx2.R',
 'rscriptEx7b.R',
 'rscriptEx8b.R']

In [None]:
!az dls fs upload --account $ADLS_NAME --source-path "$path_folderlocal" --destination-path "/TutorialMaterial"
#!az dls fs upload --account shaheendemo2b63adls  --source-path "C:\Users\gshaheen\Documents\ADLAwithR-GettingStarted\Tutorial\ADLSmaterial"  --destination-path "/TutorialMaterial" 


The command above will create a folder TutorialMaterial in your Data Lake Store and transfer the contents of ADLSmaterial to it.

List the files in the folder TutorialMaterial that we created above in your Data Lake Store account.

In [None]:
!az dls fs list --account $ADLS_NAME --path /TutorialMaterial

<a id='section3'></a>

## Submit Jobs

In [9]:
# If you haven't changed the directory since you git cloned the path should be as follows
script_usql = os.path.join(mycwd, "ADLAwithR-GettingStarted","Tutorial",'Exercise1','usqlscriptEx1.usql')
os.path.isfile(script_usql)

True

In [None]:
# make sure you are in the right data lake analytics account
!az dla account show --account $ADLA_NAME 

In [20]:
# Submit script_usql to adla and grab the jobId for the submitted job
!az dla job submit --account $ADLA_NAME --job-name "myadlajobex2" --script @$script_usql | grep "jobId"

  "jobId": "bbbf4ce2-a3aa-11e7-a07d-54ee7528f313",


#### Congratulations! You submitted the first ADLA job of this tutorial. To check the status of your job:

In [32]:
jobId = "bbbf4ce2-a3aa-11e7-a07d-54ee7528f313"  # from above
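
Rather than copying the jobId by hand, it can be read from the submit output. This is a minimal sketch that assumes `az dla job submit` prints a JSON object containing a `"jobId"` key, as the grep output above suggests. `extract_job_id` is a hypothetical helper, and the sample is trimmed to the one field shown in this tutorial.

```python
import json

def extract_job_id(submit_output_json):
    """Return the "jobId" value from the JSON printed by `az dla job submit`."""
    return json.loads(submit_output_json)["jobId"]

# Trimmed, illustrative sample: only the "jobId" field shown above is included.
sample = '{"jobId": "bbbf4ce2-a3aa-11e7-a07d-54ee7528f313"}'
jobId = extract_job_id(sample)
print(jobId)
```

This avoids the `grep` step and keeps the id in a Python variable for the status commands that follow.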

In [34]:
!az dla job show --account $ADLA_NAME --job-identity $jobId | grep '"state":'

  "state": "Ended",


In [35]:
!az dla job show --account $ADLA_NAME --job-identity $jobId | grep '"result":'

  "result": "Succeeded",

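The two status checks above can be folded into a polling loop. This is a sketch under stated assumptions: the JSON from `az dla job show` carries the `"state"` and `"result"` keys seen in the outputs above, and `"Ended"` is the terminal state. `wait_for_job` and its `fetch` argument are hypothetical names, and the stand-in responses are illustrative rather than real CLI output.

```python
import json
import time

def wait_for_job(fetch, poll_seconds=10, max_polls=60):
    """Poll until the job's state is "Ended", then return its result string.

    `fetch` is any zero-argument callable that returns the JSON text of
    `az dla job show` (an assumption made so the loop is easy to test).
    """
    for _ in range(max_polls):
        job = json.loads(fetch())
        if job.get("state") == "Ended":
            return job.get("result")
        time.sleep(poll_seconds)
    raise TimeoutError("job did not reach the 'Ended' state in time")

# Illustrative stand-in for the CLI: reports "Running" once, then "Ended".
_responses = iter([
    '{"state": "Running", "result": "None"}',
    '{"state": "Ended", "result": "Succeeded"}',
])
final_result = wait_for_job(lambda: next(_responses), poll_seconds=0)
print(final_result)
```

In the notebook, `fetch` would wrap a real call to `az dla job show` for your account and jobId, for example through `subprocess`.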

<a id='section4'></a>

## View Results  
After a job is completed, you can use the following commands to preview the output file and download it:

In [27]:
# You can preview the output created
!az dls fs preview --account $ADLS_NAME --path "/TutorialMaterial/outex1.txt" --length 128 --offset 0

"Par,sepal_length,sepal_width,petal_length,petal_width,species,mynewcol\r\n0,5.1,3.5,1.4,0.2,Iris-setosa,6\r\n0,4.9,3.0,1.4,0.2,Iris-"

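The preview is printed as a quoted string with escaped `\r\n` line breaks, which is hard to read. Here is a minimal sketch that decodes it into rows, assuming the printed text is a valid JSON string literal as in the output above. The last row is cut off because the preview was limited to 128 bytes.

```python
import csv
import io
import json

# Preview text exactly as printed above (a JSON string literal).
preview = r'"Par,sepal_length,sepal_width,petal_length,petal_width,species,mynewcol\r\n0,5.1,3.5,1.4,0.2,Iris-setosa,6\r\n0,4.9,3.0,1.4,0.2,Iris-"'

text = json.loads(preview)  # turns the \r\n escapes into real line breaks
rows = list(csv.reader(io.StringIO(text, newline="")))
header, data = rows[0], rows[1:]
print(header)
print(data[0])
```

`header` holds the column names and `data[0]` the first complete record; the truncated final row should be ignored.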

In [31]:
# To download the output file:
path_todownload = os.path.join(mycwd, "ADLAwithR-GettingStarted","Tutorial",'Exercise1','outex1.txt')
!az dls fs download --account $ADLS_NAME --source-path "/TutorialMaterial/outex1.txt" --destination-path $path_todownload