**$$ Databricks\ Platform $$**

### Execute code in multiple languages

In [0]:
%python
print("Run on python")

In [0]:
%sql
select "Run on sql"

Run on sql
Run on sql


In [0]:
%scala
print("Run on scala")

In [0]:
%fs ls

path,name,size
dbfs:/FileStore/,FileStore/,0
dbfs:/aniketanil.chaudhary@diggibyte.com/,aniketanil.chaudhary@diggibyte.com/,0
dbfs:/databricks/,databricks/,0
dbfs:/databricks-datasets/,databricks-datasets/,0
dbfs:/databricks-results/,databricks-results/,0
dbfs:/local_disk0/,local_disk0/,0
dbfs:/mnt/,mnt/,0
dbfs:/tmp/,tmp/,0
dbfs:/user/,user/,0


In [0]:
%fs ls /databricks-datasets

path,name,size
dbfs:/databricks-datasets/COVID/,COVID/,0
dbfs:/databricks-datasets/README.md,README.md,976
dbfs:/databricks-datasets/Rdatasets/,Rdatasets/,0
dbfs:/databricks-datasets/SPARK_README.md,SPARK_README.md,3359
dbfs:/databricks-datasets/adult/,adult/,0
dbfs:/databricks-datasets/airlines/,airlines/,0
dbfs:/databricks-datasets/amazon/,amazon/,0
dbfs:/databricks-datasets/asa/,asa/,0
dbfs:/databricks-datasets/atlas_higgs/,atlas_higgs/,0
dbfs:/databricks-datasets/bikeSharing/,bikeSharing/,0


In [0]:
%fs head /databricks-datasets/README.md

<li>%fs is shorthand for dbutils.fs</li>

In [0]:
%fs help

Run file system commands on DBFS using dbutils directly

In [0]:
# Returns a list of file entries from the file system; the slice keeps the first five.
dbutils.fs.ls("/databricks-datasets")[0:5]

In [0]:
# Get all entries under the databricks-datasets path as a list.
db_files = dbutils.fs.ls("/databricks-datasets")

# Display the first 2 entries of the listing.
display(db_files[0:2])

path,name,size
dbfs:/databricks-datasets/COVID/,COVID/,0
dbfs:/databricks-datasets/README.md,README.md,976
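
The listing above shows that each entry carries `path`, `name`, and `size`, and that directories appear with a trailing `/` and a size of 0. A minimal sketch of splitting such a listing into directories and files follows. Because `dbutils` exists only inside Databricks, a hypothetical `FileInfo` namedtuple stands in here for the objects returned by `dbutils.fs.ls`, and `split_listing` is an illustrative helper name, not part of any Databricks API.

```python
from collections import namedtuple

# Hypothetical stand-in for the entries returned by dbutils.fs.ls (assumption:
# only the path, name, and size fields shown in the output above are used).
FileInfo = namedtuple("FileInfo", ["path", "name", "size"])

def split_listing(entries):
    """Separate directory entries (name ends with '/') from file entries."""
    dirs = [e for e in entries if e.name.endswith("/")]
    files = [e for e in entries if not e.name.endswith("/")]
    return dirs, files

# Sample rows copied from the displayed output above.
sample = [
    FileInfo("dbfs:/databricks-datasets/COVID/", "COVID/", 0),
    FileInfo("dbfs:/databricks-datasets/README.md", "README.md", 976),
]

dirs, files = split_listing(sample)
print([d.name for d in dirs])    # names of directory entries
print([f.name for f in files])   # names of file entries
```

In a notebook, the same helper could presumably be called as `split_listing(dbutils.fs.ls("/databricks-datasets"))`.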


In [0]:
%fs mounts

mountPoint,source,encryptionType
/databricks-datasets,databricks-datasets,
/databricks/mlflow-tracking,databricks/mlflow-tracking,
/databricks-results,databricks-results,
/databricks/mlflow-registry,databricks/mlflow-registry,
/mnt/tf-abfss,abfss://cntexapure@stexapure.dfs.core.windows.net,
/,DatabricksRoot,


<li>Creating widgets using dbutils.widgets</li>

In [0]:
# Create a text widget and a multiselect widget using dbutils
dbutils.widgets.text("name", "Brickster", "Name")
dbutils.widgets.multiselect("colors", "orange", ["red", "orange", "black", "blue"], "Favorite Color?")

Access the current value of the widget using the **`dbutils.widgets`** function **`get`**

In [0]:
name = dbutils.widgets.get("name")
colors = dbutils.widgets.get("colors").split(",")

html = "<div>Hi {}! Select your color preference.</div>".format(name)
for c in colors:
    html += """<label for="{}" style="color:{}"><input type="radio"> {}</label><br>""".format(c, c, c)

displayHTML(html)
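
The multiselect widget hands back its selection as one comma-separated string, which is why the cell above calls `split(",")`. One caveat: splitting an empty string yields `[""]` rather than an empty list. The sketch below is plain Python with hypothetical helper names (`parse_multiselect`, `build_greeting`); it drops empty entries and escapes values before placing them in HTML. In a notebook, the inputs would come from `dbutils.widgets.get`.

```python
from html import escape

def parse_multiselect(raw):
    """Turn a comma-separated widget value into a clean list of choices."""
    return [item.strip() for item in raw.split(",") if item.strip()]

def build_greeting(name, colors):
    """Build the same greeting markup as the cell above, with escaped values."""
    html = "<div>Hi {}! Select your color preference.</div>".format(escape(name))
    for c in colors:
        safe = escape(c, quote=True)
        html += '<label for="{0}" style="color:{0}"><input type="radio"> {0}</label><br>'.format(safe)
    return html

# Example values standing in for dbutils.widgets.get("colors") and get("name").
colors = parse_multiselect("orange,blue")
page = build_greeting("Brickster", colors)
print(page)
```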

<li>Removing all the widgets</li>

In [0]:
dbutils.widgets.removeAll()

In [0]:
spark

**$$ Reader\ and\ Writer $$**

### Objectives
<li>Read from CSV files</li>
<li>Read from JSON files</li>
<li>Write DataFrame to files</li>
<li>Write DataFrame to tables</li>
<li>Write DataFrame to a Delta table</li>
<h4>Methods</h4>
<li><b>DataFrameReader</b>: csv, json, option, schema</li>
<li><b>DataFrameWriter</b>: mode, option, parquet, format, saveAsTable</li>
<li><b>StructType</b>: toDDL</li>
<h4>Spark Types</h4>
<li><b>Types</b>: ArrayType, DoubleType, IntegerType, StringType, LongType, StructType, StructField</li>

<h3> DataFrameReader</h3>
<li>Interface used to load a DataFrame from external storage systems</li>
<h4>Syntax</h4>
<li>spark.read.parquet("path/to/files")</li>

In [0]:
# Reading file from CSV
csv_path = '/mnt/tf-abfss/data/ds/food_inspection_dinesh/tesla_stocks.csv'

# inferSchema is set to True so that Spark infers each column's actual type instead of reading everything as a string
userdf = (spark.read.option("sep",",").option("header",True).option("inferSchema", True).csv(csv_path))

userdf.printSchema()

### Defining a schema with explicit types

In [0]:
from pyspark.sql.types import DoubleType, StringType, IntegerType, StructType, StructField

userdf_schema = StructType([StructField('Date',StringType(),True),
                          StructField('Open', DoubleType(),True),
                          StructField('High', DoubleType(), True),
                          StructField('Low', DoubleType(), True),
                          StructField('Close', DoubleType(), True),
                          StructField('Adj Close', DoubleType(), True),
                          StructField('Volume', IntegerType(), True)])

In [0]:
# Reading file from CSV
csv_path = '/mnt/tf-abfss/data/ds/food_inspection_dinesh/tesla_stocks.csv'

userdf = (spark.read.option("sep",",")\
          .option("header",True)\
          .schema(userdf_schema)\
          .csv(csv_path))

userdf.printSchema()

In [0]:
# Rename the column because the space in "Adj Close" is a problem in the unquoted DDL schema string used below.
# withColumnRenamed returns a new DataFrame, so the result must be assigned back.
userdf = userdf.withColumnRenamed("Adj Close", "Adj_Close")

**Schema using DDL (Data Definition Language) syntax**

In [0]:
DDl_schema = "Date string, Open double, High double, Low double, Close double, Adj_Close double, Volume integer"

# Read the CSV, applying the column names and types from the DDL schema
ddl_userdf = (spark.read.option("sep",",")\
          .option("header",True)\
          .schema(DDl_schema)\
          .csv(csv_path))

ddl_userdf.printSchema()
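
Renaming the column is one way around the space in `Adj Close`. Spark SQL identifiers can generally be quoted with backticks, so a DDL string may also be able to keep the original name as ``"`Adj Close` double"``; this is worth verifying on your runtime. The sketch below is plain Python that assembles such a DDL string from (name, type) pairs. `quote_name` and `to_ddl` are hypothetical helpers written for illustration, not Spark functions.

```python
def quote_name(name):
    """Backtick-quote a column name unless it is a plain identifier."""
    if name.replace("_", "").isalnum():
        return name
    # Double any embedded backticks, then wrap the name in backticks.
    return "`" + name.replace("`", "``") + "`"

def to_ddl(fields):
    """Join (column name, type name) pairs into a DDL schema string."""
    return ", ".join("{} {}".format(quote_name(n), t) for n, t in fields)

# Column names and types taken from the schema defined earlier.
fields = [
    ("Date", "string"),
    ("Open", "double"),
    ("High", "double"),
    ("Low", "double"),
    ("Close", "double"),
    ("Adj Close", "double"),
    ("Volume", "integer"),
]

ddl_schema = to_ddl(fields)
print(ddl_schema)
```

The resulting string could then be passed to `.schema(...)` in the same way as `DDl_schema` above.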

**$$ DataFrame\ and\ Column $$**

In [0]:
from pyspark.sql.functions import col

# Three ways to refer to the Open column
userdf.Open
userdf['Open']
col('Open')

In [0]:
%scala
$"Open"

In [0]:
df = spark.read.csv(csv_path, header=True)

In [0]:
# select() returns a new DataFrame containing only the given columns
sel = df.select('Open', 'High')
display(sel)

Open,High
3.8,5.0
5.158,6.084
5.0,5.184
4.6,4.62
4.0,4.0
3.28,3.326
3.228,3.504
3.516,3.58
3.59,3.614
3.478,3.728
