
This notebook shows you how to create and query a table or DataFrame loaded from data stored in Azure Blob storage.


### Step 1: Mount the ADLS container to DBFS
There are two ways to access Azure Blob storage: account keys and shared access signatures (SAS). This notebook uses an account key.

To get started, we need to set the storage account name, the container name, and the access key.

In [0]:
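# Configuration details
# NOTE: the account key is hardcoded here for brevity. In a shared notebook,
# consider reading it from a Databricks secret scope with dbutils.secrets.get.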
storage_account_name = "datalake840"
container_name = "worldcupt202024"
storage_account_access_key = "oM1I8RqaXUuUC7AYBoEjJz8vnDO3cJELnA+sQDMZG3KlIuiub+qZ2KdH8ZhJGa5UzCevxkmRyGMw+AStSdSf8A=="

# Set up the Spark configuration to access the storage account
spark.conf.set(
    f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net",
    storage_account_access_key,
)
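The configuration key and the `wasbs://` source URI follow a fixed pattern built from the account and container names. A minimal sketch of that pattern in plain Python follows; the helper names `account_key_conf` and `wasbs_uri` are hypothetical and are not part of Spark or `dbutils`.

```python
def account_key_conf(storage_account: str) -> str:
    """Return the Spark config key that holds the account key for a storage account."""
    return f"fs.azure.account.key.{storage_account}.blob.core.windows.net"


def wasbs_uri(container: str, storage_account: str, path: str = "") -> str:
    """Return the wasbs:// URI for a container, optionally with a path inside it."""
    base = f"wasbs://{container}@{storage_account}.blob.core.windows.net"
    return f"{base}/{path.lstrip('/')}" if path else base


# Values taken from the configuration cell above
conf_key = account_key_conf("datalake840")
source_uri = wasbs_uri("worldcupt202024", "datalake840")
print(conf_key)
print(source_uri)
```

Building both strings in one place keeps the `spark.conf.set` call and the later mount call from drifting apart if the account name changes.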


In [0]:
# Define the mount point
mount_point = "/mnt/world_cup_T-20"

# Unmount first if the mount point already exists, so the mount below is a clean remount
if any(mount.mountPoint == mount_point for mount in dbutils.fs.mounts()):
    dbutils.fs.unmount(mount_point)

# Mount the container
dbutils.fs.mount(
    source=f"wasbs://{container_name}@{storage_account_name}.blob.core.windows.net",
    mount_point=mount_point,
    extra_configs={
        f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net": storage_account_access_key
    },
)




/mnt/world_cup_T-20 has been unmounted.
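The unmount-then-mount logic can be tried outside Databricks with a stand-in for `dbutils.fs`. The sketch below is only an illustration: `FakeFs`, `MountInfo`, and `remount` are hypothetical names, and the fake mimics just the three calls used in the cell above.

```python
from dataclasses import dataclass, field


@dataclass
class MountInfo:
    """Stand-in for the entries returned by dbutils.fs.mounts()."""
    mountPoint: str
    source: str


@dataclass
class FakeFs:
    """Hypothetical in-memory substitute for dbutils.fs, for illustration only."""
    _mounts: list = field(default_factory=list)

    def mounts(self):
        return list(self._mounts)

    def unmount(self, mount_point):
        self._mounts = [m for m in self._mounts if m.mountPoint != mount_point]
        print(f"{mount_point} has been unmounted.")

    def mount(self, source, mount_point, extra_configs=None):
        self._mounts.append(MountInfo(mount_point, source))


def remount(fs, source, mount_point, extra_configs):
    """Unmount mount_point if it is already mounted, then mount source there."""
    if any(m.mountPoint == mount_point for m in fs.mounts()):
        fs.unmount(mount_point)
    fs.mount(source=source, mount_point=mount_point, extra_configs=extra_configs)


fs = FakeFs()
source = "wasbs://worldcupt202024@datalake840.blob.core.windows.net"
remount(fs, source, "/mnt/world_cup_T-20", {})  # first call: nothing to unmount
remount(fs, source, "/mnt/world_cup_T-20", {})  # second call: unmounts, then mounts again
print([m.mountPoint for m in fs.mounts()])
```

Running the function twice leaves exactly one mount entry, which is the property that makes the cell safe to re-run.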


In [0]:
# List files in the mounted directory
files = [f.name for f in dbutils.fs.ls(mount_point)]
files

['deliveries.csv', 'matches.csv']


### Step 2: Read the data

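Now that the container is mounted, we can read each CSV file into a DataFrame.

The `header` option treats the first row of each file as column names. Because neither `inferSchema` nor an explicit schema is set, every column is read as a string.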

In [0]:
deliveries = spark.read.format("csv").option("header", "true").load(f"{mount_point}/deliveries.csv")
matches = spark.read.format("csv").option("header", "true").load(f"{mount_point}/matches.csv")

In [0]:
matches.display()
deliveries.display()

season,team1,team2,date,match_number,venue,city,toss_winner,toss_decision,player_of_match,umpire1,umpire2,reserve_umpire,match_referee,winner,winner_runs,winner_wickets,match_type
2024,Canada,United States of America,2024/06/01,1,Grand Prairie Stadium,Dallas,United States of America,field,Aaron Jones,RK Illingworth,Sharfuddoula,L Rusere,RB Richardson,United States of America,,7.0,Group
2024,Papua New Guinea,West Indies,2024/06/02,2,Providence Stadium,Providence,West Indies,field,RL Chase,AT Holdstock,Rashid Riaz,HDPK Dharmasena,AJ Pycroft,West Indies,,5.0,Group
2024,Oman,Namibia,2024/06/02,3,Kensington Oval,Bridgetown,Namibia,field,D Wiese,J Madanagopal,JS Wilson,Asif Yaqoob,RS Madugalle,,,,Group
2024,Sri Lanka,South Africa,2024/06/03,4,Nassau County International Cricket Stadium,New York,Sri Lanka,bat,A Nortje,CM Brown,RA Kettleborough,AG Wharf,JJ Crowe,South Africa,,6.0,Group
2024,Afghanistan,Uganda,2024/06/03,5,Providence Stadium,Providence,Uganda,field,Fazalhaq Farooqi,Ahsan Raza,HDPK Dharmasena,Rashid Riaz,AJ Pycroft,Afghanistan,125.0,,Group
2024,Scotland,England,2024/06/04,6,Kensington Oval,Bridgetown,Scotland,bat,,Asif Yaqoob,Nitin Menon,J Madanagopal,RS Madugalle,No Result,,,Group
2024,Nepal,Netherlands,2024/06/04,7,Grand Prairie Stadium,Dallas,Netherlands,field,TJG Pringle,L Rusere,RJ Tucker,RK Illingworth,RB Richardson,Netherlands,,6.0,Group
2024,Ireland,India,2024/06/05,8,Nassau County International Cricket Stadium,New York,India,field,JJ Bumrah,AG Wharf,CB Gaffaney,CM Brown,DC Boon,India,,8.0,Group
2024,Papua New Guinea,Uganda,2024/06/05,9,Providence Stadium,Providence,Uganda,field,Riazat Ali Shah,AT Holdstock,Rashid Riaz,HDPK Dharmasena,J Srinath,Uganda,,3.0,Group
2024,Australia,Oman,2024/06/05,10,Kensington Oval,Bridgetown,Oman,field,MP Stoinis,Asif Yaqoob,JS Wilson,Nitin Menon,RS Madugalle,Australia,39.0,,Group


match_id,season,start_date,venue,innings,ball,batting_team,bowling_team,striker,non_striker,bowler,runs_off_bat,extras,wides,noballs,byes,legbyes,penalty,wicket_type,player_dismissed,other_wicket_type,other_player_dismissed
1,2024,2024-06-02,"Providence Stadium, Guyana",1,0.1,Papua New Guinea,West Indies,TP Ura,A Vala,AJ Hosein,0,0,,,,,,,,,
1,2024,2024-06-02,"Providence Stadium, Guyana",1,0.2,Papua New Guinea,West Indies,TP Ura,A Vala,AJ Hosein,1,0,,,,,,,,,
1,2024,2024-06-02,"Providence Stadium, Guyana",1,0.3,Papua New Guinea,West Indies,A Vala,TP Ura,AJ Hosein,0,0,,,,,,,,,
1,2024,2024-06-02,"Providence Stadium, Guyana",1,0.4,Papua New Guinea,West Indies,A Vala,TP Ura,AJ Hosein,0,0,,,,,,,,,
1,2024,2024-06-02,"Providence Stadium, Guyana",1,0.5,Papua New Guinea,West Indies,A Vala,TP Ura,AJ Hosein,0,0,,,,,,,,,
1,2024,2024-06-02,"Providence Stadium, Guyana",1,0.6,Papua New Guinea,West Indies,A Vala,TP Ura,AJ Hosein,0,0,,,,,,,,,
1,2024,2024-06-02,"Providence Stadium, Guyana",1,1.1,Papua New Guinea,West Indies,TP Ura,A Vala,R Shepherd,1,0,,,,,,,,,
1,2024,2024-06-02,"Providence Stadium, Guyana",1,1.2,Papua New Guinea,West Indies,A Vala,TP Ura,R Shepherd,0,0,,,,,,,,,
1,2024,2024-06-02,"Providence Stadium, Guyana",1,1.3,Papua New Guinea,West Indies,A Vala,TP Ura,R Shepherd,3,0,,,,,,,,,
1,2024,2024-06-02,"Providence Stadium, Guyana",1,1.4,Papua New Guinea,West Indies,TP Ura,A Vala,R Shepherd,0,0,,,,,,,,,
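Two details of the `deliveries` rows are worth noticing: the `venue` value contains a comma inside quotes, and most trailing fields are empty. The sketch below uses Python's standard `csv` module (not Spark) on two rows copied from the output above to show how a CSV reader handles both cases. Treat it as an illustration of the file format, not of Spark's parser.

```python
import csv
import io

# Header and first two data rows copied from the deliveries output above
sample = '''match_id,season,start_date,venue,innings,ball,batting_team,bowling_team,striker,non_striker,bowler,runs_off_bat,extras,wides,noballs,byes,legbyes,penalty,wicket_type,player_dismissed,other_wicket_type,other_player_dismissed
1,2024,2024-06-02,"Providence Stadium, Guyana",1,0.1,Papua New Guinea,West Indies,TP Ura,A Vala,AJ Hosein,0,0,,,,,,,,,
1,2024,2024-06-02,"Providence Stadium, Guyana",1,0.2,Papua New Guinea,West Indies,TP Ura,A Vala,AJ Hosein,1,0,,,,,,,,,
'''

rows = list(csv.DictReader(io.StringIO(sample)))

# The quoted venue stays a single field despite the embedded comma
print(rows[0]["venue"])

# Every value arrives as a string; empty fields are empty strings
print(repr(rows[0]["wides"]))

# Numeric columns need an explicit conversion before arithmetic
total_runs = sum(int(r["runs_off_bat"]) + int(r["extras"]) for r in rows)
print(total_runs)
```

The same consideration applies to the DataFrames: string columns such as `runs_off_bat` need a cast before they can be summed.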



### Step 3: Create the database worldcup_T_20


In [0]:
%sql
CREATE DATABASE IF NOT EXISTS worldcup_T_20;
USE worldcup_T_20;
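The mount point uses a hyphen (`world_cup_T-20`), while the database name uses an underscore. A hyphen is not valid in an unquoted SQL identifier, so a name containing one would have to be quoted with backticks in Spark SQL. One way to derive a safe name is sketched below; `to_sql_identifier` is a hypothetical helper, not a Spark function.

```python
import re


def to_sql_identifier(name: str) -> str:
    """Replace every character outside [A-Za-z0-9_] with an underscore."""
    return re.sub(r"[^A-Za-z0-9_]", "_", name)


db_name = to_sql_identifier("worldcup_T-20")
print(db_name)  # worldcup_T_20
print(f"CREATE DATABASE IF NOT EXISTS {db_name}")
```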


### Step 4: Save DataFrames as tables in the worldcup_T_20 database



In [0]:
deliveries.write.format("delta").mode("overwrite").saveAsTable("deliveries")
matches.write.format("delta").mode("overwrite").saveAsTable("matches")



We can now query these tables using Spark SQL. For instance, we can list the tables in the database and preview a few rows. Notice how the `%sql` magic lets us query the tables directly from SQL.

In [0]:
%sql
SHOW TABLES;
SELECT * FROM matches LIMIT 2;

season,team1,team2,date,match_number,venue,city,toss_winner,toss_decision,player_of_match,umpire1,umpire2,reserve_umpire,match_referee,winner,winner_runs,winner_wickets,match_type
2024,Canada,United States of America,2024/06/01,1,Grand Prairie Stadium,Dallas,United States of America,field,Aaron Jones,RK Illingworth,Sharfuddoula,L Rusere,RB Richardson,United States of America,,7,Group
2024,Papua New Guinea,West Indies,2024/06/02,2,Providence Stadium,Providence,West Indies,field,RL Chase,AT Holdstock,Rashid Riaz,HDPK Dharmasena,AJ Pycroft,West Indies,,5,Group
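Beyond a preview, a simple aggregation is a natural next query. The sketch below runs a `GROUP BY` over the toss decisions of the ten sample `matches` rows shown earlier. It uses Python's built-in `sqlite3` as a local stand-in for Spark SQL, which is an assumption made so the example runs anywhere. On the cluster, the same `SELECT` could be placed in a `%sql` cell against the `matches` table.

```python
import sqlite3

# (match_number, toss_winner, toss_decision) copied from the ten sample rows shown earlier
sample_rows = [
    (1, "United States of America", "field"),
    (2, "West Indies", "field"),
    (3, "Namibia", "field"),
    (4, "Sri Lanka", "bat"),
    (5, "Uganda", "field"),
    (6, "Scotland", "bat"),
    (7, "Netherlands", "field"),
    (8, "India", "field"),
    (9, "Uganda", "field"),
    (10, "Oman", "field"),
]

# In-memory database used only for this illustration
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches (match_number INTEGER, toss_winner TEXT, toss_decision TEXT)"
)
conn.executemany("INSERT INTO matches VALUES (?, ?, ?)", sample_rows)

# Count matches per toss decision, most common first
query = """
    SELECT toss_decision, COUNT(*) AS n_matches
    FROM matches
    GROUP BY toss_decision
    ORDER BY n_matches DESC
"""
result = conn.execute(query).fetchall()
print(result)
conn.close()
```

In this sample, eight of the ten toss winners chose to field.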
