## Overview

This notebook shows how to create and query a table or DataFrame from a file that you uploaded to DBFS. [DBFS](https://docs.databricks.com/user-guide/dbfs-databricks-file-system.html), the Databricks File System, stores data that you can query from inside Databricks. The notebook assumes that the file you want to read is already in DBFS.

This notebook is written in **Python**, so the default cell type is Python. You can switch languages in an individual cell with the `%LANGUAGE` syntax; Python, Scala, SQL, and R are all supported.

In [0]:
import pandas as pd
import numpy as np

In [0]:
container_name = 'coviddatacontainer'
storage_name = 'coviddata2019'
mount_name = '/mnt/covid_19_datastorage2' # This path can be used to access the contents of the blob container
# Read credentials from a Databricks secret scope instead of hardcoding them.
# The scope and key names below are hypothetical; substitute your own.
sas_key = dbutils.secrets.get(scope='covid-secrets', key='blob-sas-token')
account_key = dbutils.secrets.get(scope='covid-secrets', key='storage-account-key')

# Mount the container only if the mount point is not already in use
if not any(m.mountPoint == mount_name for m in dbutils.fs.mounts()):
  dbutils.fs.mount(
    source = "wasbs://%s@%s.blob.core.windows.net" % (container_name, storage_name),
    mount_point = mount_name,
    extra_configs = {"fs.azure.sas.%s.%s.blob.core.windows.net" % (container_name, storage_name) : sas_key })
# The account key allows direct wasbs:// access to the storage account
spark.conf.set(
  "fs.azure.account.key.%s.blob.core.windows.net" % storage_name,
  account_key)

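The mount call above assembles two strings from the container and storage account names: the `wasbs://` source URL and the configuration key under which the SAS token is passed. A minimal sketch of that string assembly in plain Python follows. The helper name `build_mount_args` is hypothetical and not part of `dbutils`, and the token value is a placeholder.

```python
def build_mount_args(container_name, storage_name, mount_name, sas_token):
    """Return keyword arguments shaped like those passed to dbutils.fs.mount.

    Hypothetical helper for illustration; it only builds strings and does
    not contact Azure or Databricks.
    """
    host = "%s.blob.core.windows.net" % storage_name
    return {
        "source": "wasbs://%s@%s" % (container_name, host),
        "mount_point": mount_name,
        "extra_configs": {"fs.azure.sas.%s.%s" % (container_name, host): sas_token},
    }


args = build_mount_args(
    "coviddatacontainer", "coviddata2019", "/mnt/covid_19_datastorage2",
    "<sas-token>",  # placeholder, not a real credential
)
print(args["source"])
```

On a cluster, these arguments could then be passed along as `dbutils.fs.mount(**args)`.
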
In [0]:
# File location and type
file_location = "/FileStore/tables/country_wise_latest.csv"
file_type = "csv"
# CSV options
infer_schema = "false"  # every column is read as a string
first_row_is_header = "true"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)
display(df)

Country/Region,Confirmed,Deaths,Recovered,Active,New cases,New deaths,New recovered,Deaths / 100 Cases,Recovered / 100 Cases,Deaths / 100 Recovered,Confirmed last week,1 week change,1 week % increase,WHO Region
Afghanistan,36263,1269,25198,9796,106,10,18,3.5,69.49,5.04,35526,737,2.07,Eastern Mediterranean
Albania,4880,144,2745,1991,117,6,63,2.95,56.25,5.25,4171,709,17.0,Europe
Algeria,27973,1163,18837,7973,616,8,749,4.16,67.34,6.17,23691,4282,18.07,Africa
Andorra,907,52,803,52,10,0,0,5.73,88.53,6.48,884,23,2.6,Europe
Angola,950,41,242,667,18,1,0,4.32,25.47,16.94,749,201,26.84,Africa
Antigua and Barbuda,86,3,65,18,4,0,5,3.49,75.58,4.62,76,10,13.16,Americas
Argentina,167416,3059,72575,91782,4890,120,2057,1.83,43.35,4.21,130774,36642,28.02,Americas
Armenia,37390,711,26665,10014,73,6,187,1.9,71.32,2.67,34981,2409,6.89,Europe
Australia,15303,167,9311,5825,368,6,137,1.09,60.84,1.79,12428,2875,23.13,Western Pacific
Austria,20558,713,18246,1599,86,1,37,3.47,88.75,3.91,19743,815,4.13,Europe
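Because the read does not enable schema inference, Spark loads every column as a string, so numeric columns such as `Confirmed` need an explicit cast before any arithmetic. The plain-Python sketch below illustrates the same idea with the standard `csv` module on two rows copied from the output above. It is an analogy for the typing issue, not Spark code, and the `to_number` helper is hypothetical.

```python
import csv
import io

# Header and two rows copied from the displayed output (a subset of columns)
sample = """Country/Region,Confirmed,Deaths,Deaths / 100 Cases
Afghanistan,36263,1269,3.5
Albania,4880,144,2.95
"""


def to_number(text):
    """Hypothetical helper: parse an integer if possible, otherwise a float."""
    try:
        return int(text)
    except ValueError:
        return float(text)


rows = list(csv.DictReader(io.StringIO(sample)))
# Without casting, every value is a string, as in the Spark DataFrame
raw_confirmed = rows[0]["Confirmed"]

# Cast every column except the country name
typed = [
    {k: (v if k == "Country/Region" else to_number(v)) for k, v in row.items()}
    for row in rows
]
total_confirmed = sum(row["Confirmed"] for row in typed)
print(type(raw_confirmed).__name__, total_confirmed)
```

In Spark, the corresponding step would be a cast on the column, or enabling schema inference at read time.
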


In [0]:
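# Keep the country, the confirmed count, and the ratio columns, then collect the result to pandas
pdf = df.select("Country/Region", "Confirmed", "Deaths / 100 Cases", "Recovered / 100 Cases", "Deaths / 100 Recovered").toPandas()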
display(pdf)

Country/Region,Confirmed,Deaths / 100 Cases,Recovered / 100 Cases,Deaths / 100 Recovered
Afghanistan,36263,3.5,69.49,5.04
Albania,4880,2.95,56.25,5.25
Algeria,27973,4.16,67.34,6.17
Andorra,907,5.73,88.53,6.48
Angola,950,4.32,25.47,16.94
Antigua and Barbuda,86,3.49,75.58,4.62
Argentina,167416,1.83,43.35,4.21
Armenia,37390,1.9,71.32,2.67
Australia,15303,1.09,60.84,1.79
Austria,20558,3.47,88.75,3.91
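The column names suggest that the ratio columns are derived from the raw counts, for example deaths per 100 confirmed cases. That reading is an assumption, since the dataset's definitions are not given here, but the sketch below checks it against two rows from the first output, rounding to two decimals.

```python
# Raw counts copied from the first displayed output: (confirmed, deaths, recovered)
counts = {
    "Afghanistan": (36263, 1269, 25198),
    "Albania": (4880, 144, 2745),
}


def per_100(numerator, denominator):
    """Return numerator per 100 units of denominator, rounded to 2 decimals."""
    return round(100 * numerator / denominator, 2)


ratios = {}
for country, (confirmed, deaths, recovered) in counts.items():
    ratios[country] = {
        "Deaths / 100 Cases": per_100(deaths, confirmed),
        "Recovered / 100 Cases": per_100(recovered, confirmed),
        "Deaths / 100 Recovered": per_100(deaths, recovered),
    }

print(ratios["Afghanistan"])
```

For these two rows the computed values agree with the displayed columns, which supports the assumed definitions without confirming them for the whole dataset.
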


In [0]:
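# Convert the pandas DataFrame back to a Spark DataFrame so it can be saved with the Spark writer
final_spark_country_data_frame = spark.createDataFrame(pdf)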

In [0]:
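# Write the result back to the blob container, keeping the column names as a header row
outputpath = "wasbs://coviddatacontainer@coviddata2019.blob.core.windows.net/output"
final_spark_country_data_frame.write.mode('overwrite').format('csv').option('header', 'true').save(outputpath)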
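
Spark's CSV writer saves to a directory rather than a single file: the `output` path will contain one or more `part-*` files alongside marker files such as `_SUCCESS`. A small sketch of picking the data files out of such a listing follows. The file names in `listing` are made-up examples, and `data_files` is a hypothetical helper.

```python
def data_files(names):
    """Return the CSV part files from a directory listing, sorted by name."""
    return sorted(n for n in names if n.startswith("part-") and n.endswith(".csv"))


# Made-up listing for illustration; real part file names include a generated ID
listing = ["_SUCCESS", "part-00001-example.csv", "part-00000-example.csv"]
parts = data_files(listing)
print(parts)
```

On a cluster, the names could come from `dbutils.fs.ls(outputpath)`, whose entries expose a `name` attribute.
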
