## Access dataframes using SQL

#### Objectives
1. Create temporary views on dataframes
2. Access the view from a SQL cell
3. Access the view from a Python cell

In [0]:
"""
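Spark offers two kinds of views for querying a dataframe with SQL: a temporary view, scoped to the current Spark session, and a global temporary view, shared across sessions in the same Spark application.
"""

Out[1]: '\nSpark offers two kinds of views for querying a dataframe with SQL: a temporary view, scoped to the current Spark session, and a global temporary view, shared across sessions in the same Spark application.\n'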

In [0]:
%run "../includes/configuration"

In [0]:
race_results = spark.read.parquet(f"{presentation_folder_path}/race_results")


In [0]:
# Create a temporary view on top of the dataframe
race_results.createOrReplaceTempView("v_race_results")  # 'v_race_results' is the name given to the view
# Because it is a temp view, it is only available within the Spark session that created it


In [0]:
# Use the %sql magic command to run a cell as SQL:

In [0]:
%sql
SELECT *
FROM v_race_results
WHERE race_year = 2020

race_year,race_name,race_date,circuit_location,driver_name,driver_number,driver_nationality,team,fastest_lap,race_time,points,position,created_date
2020,Hungarian Grand Prix,2020-07-19T13:10:00.000+0000,Budapest,Lewis Hamilton,44,British,Mercedes,70.0,1:36:12.473,26.0,1.0,2023-08-05T08:12:07.616+0000
2020,Tuscan Grand Prix,2020-09-13T13:10:00.000+0000,Mugello,Lewis Hamilton,44,British,Mercedes,58.0,2:19:35.060,26.0,1.0,2023-08-05T08:12:07.616+0000
2020,Russian Grand Prix,2020-09-27T11:10:00.000+0000,Sochi,Valtteri Bottas,77,Finnish,Mercedes,51.0,1:34:00.364,26.0,1.0,2023-08-05T08:12:07.616+0000
2020,Portuguese Grand Prix,2020-10-25T12:10:00.000+0000,Portimão,Lewis Hamilton,44,British,Mercedes,63.0,1:29:56.828,26.0,1.0,2023-08-05T08:12:07.616+0000
2020,Emilia Romagna Grand Prix,2020-11-01T12:10:00.000+0000,Imola,Lewis Hamilton,44,British,Mercedes,63.0,1:28:32.430,26.0,1.0,2023-08-05T08:12:07.616+0000
2020,Austrian Grand Prix,2020-07-05T13:10:00.000+0000,Spielburg,Valtteri Bottas,77,Finnish,Mercedes,68.0,1:30:55.739,25.0,1.0,2023-08-05T08:12:07.616+0000
2020,Styrian Grand Prix,2020-07-12T13:10:00.000+0000,Spielburg,Lewis Hamilton,44,British,Mercedes,68.0,1:22:50.683,25.0,1.0,2023-08-05T08:12:07.616+0000
2020,British Grand Prix,2020-08-02T13:10:00.000+0000,Silverstone,Lewis Hamilton,44,British,Mercedes,45.0,1:28:01.283,25.0,1.0,2023-08-05T08:12:07.616+0000
2020,70th Anniversary Grand Prix,2020-08-09T13:10:00.000+0000,Silverstone,Max Verstappen,33,Dutch,Red Bull,46.0,1:19:41.993,25.0,1.0,2023-08-05T08:12:07.616+0000
2020,Spanish Grand Prix,2020-08-16T13:10:00.000+0000,Montmeló,Lewis Hamilton,44,British,Mercedes,63.0,1:31:45.279,25.0,1.0,2023-08-05T08:12:07.616+0000


In [0]:
# Run SQL from a Python cell; spark.sql() returns a dataframe

p_race_year = 2019  # integer, to match the numeric race_year column
race_results_2019 = spark.sql(f"SELECT * FROM v_race_results WHERE race_year = {p_race_year}")

display(race_results_2019)

race_year,race_name,race_date,circuit_location,driver_name,driver_number,driver_nationality,team,fastest_lap,race_time,points,position,created_date
2019,Australian Grand Prix,2019-03-17T05:10:00.000+0000,Melbourne,Valtteri Bottas,77,Finnish,Mercedes,57.0,1:25:27.325,26.0,1.0,2023-08-05T08:12:07.616+0000
2019,Spanish Grand Prix,2019-05-12T13:10:00.000+0000,Montmeló,Lewis Hamilton,44,British,Mercedes,54.0,1:35:50.443,26.0,1.0,2023-08-05T08:12:07.616+0000
2019,Austrian Grand Prix,2019-06-30T13:10:00.000+0000,Spielburg,Max Verstappen,33,Dutch,Red Bull,60.0,1:22:01.822,26.0,1.0,2023-08-05T08:12:07.616+0000
2019,British Grand Prix,2019-07-14T13:10:00.000+0000,Silverstone,Lewis Hamilton,44,British,Mercedes,52.0,1:21:08.452,26.0,1.0,2023-08-05T08:12:07.616+0000
2019,German Grand Prix,2019-07-28T13:10:00.000+0000,Hockenheim,Max Verstappen,33,Dutch,Red Bull,61.0,1:44:31.275,26.0,1.0,2023-08-05T08:12:07.616+0000
2019,Russian Grand Prix,2019-09-29T11:10:00.000+0000,Sochi,Lewis Hamilton,44,British,Mercedes,51.0,1:33:38.992,26.0,1.0,2023-08-05T08:12:07.616+0000
2019,Abu Dhabi Grand Prix,2019-12-01T13:10:00.000+0000,Abu Dhabi,Lewis Hamilton,44,British,Mercedes,53.0,1:34:05.715,26.0,1.0,2023-08-05T08:12:07.616+0000
2019,Bahrain Grand Prix,2019-03-31T15:10:00.000+0000,Sakhir,Lewis Hamilton,44,British,Mercedes,36.0,1:34:21.295,25.0,1.0,2023-08-05T08:12:07.616+0000
2019,Chinese Grand Prix,2019-04-14T06:10:00.000+0000,Shanghai,Lewis Hamilton,44,British,Mercedes,47.0,1:32:06.350,25.0,1.0,2023-08-05T08:12:07.616+0000
2019,Azerbaijan Grand Prix,2019-04-28T12:10:00.000+0000,Baku,Valtteri Bottas,77,Finnish,Mercedes,50.0,1:31:52.942,25.0,1.0,2023-08-05T08:12:07.616+0000
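
The cell above interpolates `p_race_year` straight into the SQL text, which is fine for a hard-coded value but fragile once the value comes from a widget or another notebook. A minimal sketch of validating the inputs before building the query follows; the helper name `build_race_year_query` is hypothetical and not part of Spark or of this project.

```python
def build_race_year_query(view_name: str, race_year) -> str:
    """Return a SELECT statement that filters `view_name` by an integer race year."""
    # int() raises ValueError for strings that are not plain integers,
    # so text such as "2019 OR 1=1" never reaches the SQL string.
    year = int(race_year)
    # Accept only simple identifiers, optionally qualified with a database
    # (for example global_temp.gv_race_results).
    if not all(part.isidentifier() for part in view_name.split(".")):
        raise ValueError(f"Unexpected view name: {view_name!r}")
    return f"SELECT * FROM {view_name} WHERE race_year = {year}"


query = build_race_year_query("v_race_results", "2019")
# In the notebook this would be passed on as: spark.sql(query)
```

The helper only builds the string, so the same function can serve both the temp view and a qualified global temp view name.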


## Global Temp Views

In [0]:
""" Global temp views can be accessed from other notebooks attached to the same cluster """

Out[11]: ' Global temp views can be accessed from other notebooks attached to the same cluster '

In [0]:
# Create a global temp view on top of the dataframe
race_results.createOrReplaceGlobalTempView("gv_race_results")  # 'gv_race_results' is the name given to the view

In [0]:
""" 
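Spark registers global temp views in a database called global_temp, so queries must qualify the view name with it
"""

Out[13]: ' \nSpark registers global temp views in a database called global_temp, so queries must qualify the view name with it\n'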

In [0]:
%sql
SELECT *
FROM global_temp.gv_race_results
WHERE race_year = 2020

race_year,race_name,race_date,circuit_location,driver_name,driver_number,driver_nationality,team,fastest_lap,race_time,points,position,created_date
2020,Hungarian Grand Prix,2020-07-19T13:10:00.000+0000,Budapest,Lewis Hamilton,44,British,Mercedes,70.0,1:36:12.473,26.0,1.0,2023-08-05T08:12:07.616+0000
2020,Tuscan Grand Prix,2020-09-13T13:10:00.000+0000,Mugello,Lewis Hamilton,44,British,Mercedes,58.0,2:19:35.060,26.0,1.0,2023-08-05T08:12:07.616+0000
2020,Russian Grand Prix,2020-09-27T11:10:00.000+0000,Sochi,Valtteri Bottas,77,Finnish,Mercedes,51.0,1:34:00.364,26.0,1.0,2023-08-05T08:12:07.616+0000
2020,Portuguese Grand Prix,2020-10-25T12:10:00.000+0000,Portimão,Lewis Hamilton,44,British,Mercedes,63.0,1:29:56.828,26.0,1.0,2023-08-05T08:12:07.616+0000
2020,Emilia Romagna Grand Prix,2020-11-01T12:10:00.000+0000,Imola,Lewis Hamilton,44,British,Mercedes,63.0,1:28:32.430,26.0,1.0,2023-08-05T08:12:07.616+0000
2020,Austrian Grand Prix,2020-07-05T13:10:00.000+0000,Spielburg,Valtteri Bottas,77,Finnish,Mercedes,68.0,1:30:55.739,25.0,1.0,2023-08-05T08:12:07.616+0000
2020,Styrian Grand Prix,2020-07-12T13:10:00.000+0000,Spielburg,Lewis Hamilton,44,British,Mercedes,68.0,1:22:50.683,25.0,1.0,2023-08-05T08:12:07.616+0000
2020,British Grand Prix,2020-08-02T13:10:00.000+0000,Silverstone,Lewis Hamilton,44,British,Mercedes,45.0,1:28:01.283,25.0,1.0,2023-08-05T08:12:07.616+0000
2020,70th Anniversary Grand Prix,2020-08-09T13:10:00.000+0000,Silverstone,Max Verstappen,33,Dutch,Red Bull,46.0,1:19:41.993,25.0,1.0,2023-08-05T08:12:07.616+0000
2020,Spanish Grand Prix,2020-08-16T13:10:00.000+0000,Montmeló,Lewis Hamilton,44,British,Mercedes,63.0,1:31:45.279,25.0,1.0,2023-08-05T08:12:07.616+0000
