# Iguazio Getting Started Example

This notebook contains code examples for performing common tasks to help you get started with the Iguazio Continuous Data Platform.

Follow the tutorial by running the paragraphs in order of appearance.

> **Tip:** You can also browse the files and directories that you write to the "bigdata" container in this tutorial from the platform dashboard: in the side navigation menu, select **Data**, and then select the **bigdata** container from the table. On the container data page, select the **Browse** tab, and then use the side directory-navigation tree to browse the directories. Selecting a file or directory in the browse table displays its metadata.


## Step 1: Load a sample CSV file from S3
Use `curl` to download a sample stock-data file from the Amazon public datasets on S3. This file belongs to the deutsche-boerse public dataset.
For additional public datasets, see https://registry.opendata.aws/.

In [42]:
%%sh 

mkdir -p /v3io/bigdata/examples

# Download a sample stocks file 
curl -L "deutsche-boerse-xetra-pds.s3.amazonaws.com/2018-03-26/2018-03-26_BINS_XETR07.csv" > /v3io/bigdata/examples/stocks.csv




  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  975k  100  975k    0     0  3969k      0 --:--:-- --:--:-- --:--:-- 3982k
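Before converting the file, it can help to check that the download produced a readable CSV file. The following is a minimal sketch that uses only the Python standard library; the `preview_csv` helper is a hypothetical name that is not part of the platform, and the path assumes the `/v3io` mount used in the download command above.

```python
import csv
import os
from itertools import islice


def preview_csv(path, num_rows=5):
    """Return the header row and the first num_rows data rows of a CSV file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty list if the file has no lines
        rows = list(islice(reader, num_rows))
    return header, rows


# Assumed location of the file downloaded in Step 1
stocks_path = "/v3io/bigdata/examples/stocks.csv"
if os.path.exists(stocks_path):
    header, rows = preview_csv(stocks_path)
    print(header)
    for row in rows:
        print(row)
```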


## Step 2: Convert the sample CSV file to a NoSQL table

Read the sample `stocks.csv` file that you downloaded in Step 1 into a Spark DataFrame, and write the data in NoSQL format to a new `stocks_tab` table.

> **Note:** To use the Iguazio Spark Connector, set the data-source format to `"io.iguaz.v3io.spark.sql.kv"`.

In [44]:


from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Iguazio Integration demo").getOrCreate()

# Read the sample stocks.csv file into a Spark DataFrame, and let Spark infer the schema of the CSV file.
# csv() selects Spark's built-in CSV data source, so no format() call is needed for the read.
myDF = spark.read.option("header", "true").option("inferSchema", "true").csv("v3io://bigdata/examples/stocks.csv")

# Show the DataFrame data
# myDF.show()

# Write the DataFrame data to a stocks_tab table in the "bigdata" container, and define the "ISIN" column as the table's key
myDF.write.format("io.iguaz.v3io.spark.sql.kv").mode("append").option("key", "ISIN").save("v3io://bigdata/examples/stocks_tab/")
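To confirm that the write succeeded, the table can be read back through the same connector. This is a sketch under stated assumptions: `read_kv_table` is a hypothetical helper, `spark` is the session created in the cell above, and the call relies only on the standard `DataFrameReader.format(...).load(...)` chain with the format name given in the note.

```python
# Data-source format of the Iguazio Spark Connector, as stated in this step
KV_FORMAT = "io.iguaz.v3io.spark.sql.kv"


def read_kv_table(spark, table_path):
    """Load a NoSQL table into a Spark DataFrame through the connector.

    `spark` is an existing SparkSession; `table_path` is a v3io:// table path.
    """
    return spark.read.format(KV_FORMAT).load(table_path)


# Usage in the notebook (needs the platform's Spark environment):
# stocksDF = read_kv_table(spark, "v3io://bigdata/examples/stocks_tab/")
# stocksDF.printSchema()
```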


## Step 3: Run interactive SQL queries

In [29]:
# run only once (load SQL magic)
%load_ext sql
%config SqlMagic.autocommit=False

In [45]:
%sql select * from v3io.bigdata."/examples/stocks_tab" where tradedvolume > 11000 order by tradedvolume

 * presto://default-tenant-presto.default-tenant.svc:8080/v3io
Done.


securitydesc,securitytype,time,isin,minprice,date,endprice,numberoftrades,mnemonic,currency,securityid,maxprice,tradedvolume,startprice
UBS-ETF-MSCI CANADA CDAD,ETF,07:34,LU0446734872,24.625,2018-03-26 00:00:00.000,24.625,2,UIM9,EUR,2505968,24.625,11016,24.625
"LLOYDS BKG GRP LS-,10",Common stock,07:12,GB0008706128,0.758,2018-03-26 00:00:00.000,0.758,1,LLD,EUR,2505381,0.758,11090,0.758
DT.TELEKOM AG NA,Common stock,07:00,DE0005557508,13.055,2018-03-26 00:00:00.000,13.06,14,DTE,EUR,2504954,13.07,12414,13.065
IS C.MSCI EMIMI U.ETF DLA,ETF,07:09,IE00BKM4GZ66,24.853,2018-03-26 00:00:00.000,24.853,1,IS3N,EUR,2505427,24.853,12629,24.853
ISIV-MSCI EMUMC.U.ETF EOA,ETF,07:55,IE00BCLWRD08,38.305,2018-03-26 00:00:00.000,38.305,1,IS3H,EUR,2505392,38.305,13062,38.305
XTR.MSCI JAPAN 4CEOH,ETF,07:04,LU0659580079,20.02,2018-03-26 00:00:00.000,20.02,5,XMK9,EUR,2506044,20.032,14357,20.026
ETFS DAX DLY2XSH.GO UC.DZ,ETF,07:15,DE000A0X9AA8,5.11,2018-03-26 00:00:00.000,5.112,7,DES2,EUR,2504257,5.112,14500,5.111
ETFS MET.SEC.DZ07/UN.XAG,ETC,07:04,DE000A0N62F2,12.685,2018-03-26 00:00:00.000,12.685,3,VZLC,EUR,2506199,12.685,15000,12.685
ISHS ESTXX BNKS.30-15 UC.,ETF,07:04,DE0006289309,12.084,2018-03-26 00:00:00.000,12.098,12,EXX1,EUR,2505027,12.098,15829,12.084
IS.S.E.600 INSUR.U.ETF A.,ETF,07:06,DE000A0H08K7,28.02,2018-03-26 00:00:00.000,28.02,1,EXH5,EUR,2504328,28.02,16234,28.02
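The same filter can also be expressed with the Spark DataFrame API instead of the SQL magic. The sketch below is an illustration, not the notebook's own method: `high_volume_trades` is a hypothetical helper, and it assumes a DataFrame with a `tradedvolume` column (Spark resolves column names case-insensitively by default, so the lowercase name used in the query above should also match the CSV header).

```python
def high_volume_trades(df, min_volume=11000):
    """Return rows whose traded volume exceeds min_volume, sorted ascending by volume.

    `df` is a Spark DataFrame; filter() accepts a SQL expression string.
    """
    condition = f"tradedvolume > {int(min_volume)}"
    return df.filter(condition).orderBy("tradedvolume")


# Usage in the notebook, with the DataFrame from Step 2:
# high_volume_trades(myDF).show()
```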


## Step 4: Write the stocks data as a Parquet table

In [46]:
myDF.write.parquet("v3io://bigdata/examples/stocks_prqt")
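Spark's default save mode raises an error when the target path already exists, so rerunning the cell above fails on the second run. A hedged sketch of a rerunnable variant, plus a read-back, follows; `write_parquet` and `read_parquet` are hypothetical helper names, and the choice of `"overwrite"` is an assumption that replaces any existing data at the path.

```python
def write_parquet(df, path, mode="overwrite"):
    """Write a Spark DataFrame as Parquet; "overwrite" replaces existing data at path."""
    df.write.mode(mode).parquet(path)


def read_parquet(spark, path):
    """Read a Parquet directory back into a Spark DataFrame."""
    return spark.read.parquet(path)


# Usage in the notebook:
# write_parquet(myDF, "v3io://bigdata/examples/stocks_prqt")
# read_parquet(spark, "v3io://bigdata/examples/stocks_prqt").show(5)
```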


## Step 5: Display the content of the example container directory
Use `ls` on the `/v3io` file-system mount to list the contents of the root directory of the "bigdata" container.
The example files are located in the **examples** directory, which should contain the `stocks.csv` file and the `stocks_tab` and `stocks_prqt` table directories.

In [47]:
!ls -lrt /v3io/bigdata/

total 0
drwxrwsrwx. 2 50 nogroup 0 Nov 12 12:30 mytsdb
drwxrwsrwx. 2 50 nogroup 0 Nov 13 16:42 bank2
drwxrwsrwx. 2 50 nogroup 0 Nov 15 10:43 family
drwxr-xr-x. 2 50 nogroup 0 Nov 18 10:15 iguazio
drwxrwsr-x. 2 50 nogroup 0 Nov 18 10:58 examples
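Because the container is mounted as a local directory, the same listing can be produced from Python. This is a minimal standard-library sketch; `list_by_mtime` is a hypothetical helper that returns entries oldest first, similar to the ordering of `ls -rt`.

```python
import os


def list_by_mtime(path):
    """Return the names of the entries in path, sorted by modification time (oldest first)."""
    with os.scandir(path) as entries:
        ordered = sorted(entries, key=lambda entry: entry.stat().st_mtime)
        return [entry.name for entry in ordered]


# Assumed mount point of the "bigdata" container
examples_dir = "/v3io/bigdata/examples"
if os.path.isdir(examples_dir):
    for name in list_by_mtime(examples_dir):
        print(name)
```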


## Step 6: User

In [None]:
%%sh

# List the files and directories in the root directory of the "bigdata" container
hadoop fs -ls v3io://bigdata/


## Remove Data

In [None]:
%%sh

hadoop fs -rm -r v3io://bigdata/examples/stock*

In [12]:
!rm -rf /v3io/bigdata/examples/*
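The cleanup can also be done from Python when a shell escape is not convenient. The sketch below is an assumption-laden illustration: `clear_directory` is a hypothetical helper, and unlike `rm -rf dir/*` it also removes hidden entries, because `os.listdir` includes names that start with a dot. The deletion is permanent, so check the path before calling it.

```python
import os
import shutil


def clear_directory(path):
    """Delete everything inside path but keep path itself; return the removed names, sorted."""
    removed = []
    for name in os.listdir(path):
        full_path = os.path.join(path, name)
        if os.path.isdir(full_path) and not os.path.islink(full_path):
            shutil.rmtree(full_path)  # remove a directory tree
        else:
            os.remove(full_path)  # remove a file or a symbolic link
        removed.append(name)
    return sorted(removed)


# Usage in the notebook (permanently deletes the example data):
# clear_directory("/v3io/bigdata/examples")
```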

In [15]:
!rm -rf /v3io/bigdata/"\$current_user"