
This notebook shows you how to create and query a table or DataFrame loaded from data stored in Azure Blob storage.


### Step 1: Set the data location and type

There are two ways to access Azure Blob storage: account keys and shared access signatures (SAS). This notebook uses an account key.
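Both methods work by placing a credential in a Spark configuration property. The property name encodes the storage account and, for SAS, the container. The sketch below only builds those property names as strings. It assumes the WASB driver's naming patterns: `fs.azure.account.key.<account>.blob.core.windows.net` for an account key (the pattern used later in this notebook) and `fs.azure.sas.<container>.<account>.blob.core.windows.net` for a container-scoped SAS token. The helper function names are hypothetical.

```python
def account_key_conf(account: str) -> str:
    """Spark conf property that holds a storage account key (WASB driver)."""
    return f"fs.azure.account.key.{account}.blob.core.windows.net"


def sas_conf(account: str, container: str) -> str:
    """Spark conf property that holds a SAS token scoped to one container (WASB driver)."""
    return f"fs.azure.sas.{container}.{account}.blob.core.windows.net"


print(account_key_conf("account4st0rage"))
print(sas_conf("account4st0rage", "input"))
```

Either name would then be passed to `spark.conf.set(name, credential)` together with the key or token.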

To get started, we need to set the location and type of the file.

In [0]:
storage_account_name = "account4st0rage"
# Do not hard-code the key: read it from a Databricks secret scope ("<scope-name>" and "<key-name>" are placeholders).
storage_account_access_key = dbutils.secrets.get(scope="<scope-name>", key="<key-name>")

In [0]:
file_location = "wasbs://input@account4st0rage.blob.core.windows.net/organizations-100.csv"
file_type = "csv"
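The `wasbs://` location follows the pattern `wasbs://<container>@<account>.blob.core.windows.net/<path>`, where `wasbs` is the TLS variant of the `wasb` scheme. The hypothetical helpers below sketch how to build such a URI from its parts and split it back apart with the standard library's `urllib.parse.urlparse`.

```python
from typing import Tuple
from urllib.parse import urlparse


def wasbs_uri(container: str, account: str, path: str) -> str:
    """Build wasbs://<container>@<account>.blob.core.windows.net/<path>."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{path.lstrip('/')}"


def split_wasbs_uri(uri: str) -> Tuple[str, str, str]:
    """Return (container, account, path) from a wasbs:// URI."""
    parts = urlparse(uri)
    if parts.scheme != "wasbs":
        raise ValueError(f"not a wasbs URI: {uri}")
    container = parts.username               # text before the '@'
    account = parts.hostname.split(".")[0]   # first label of the host name
    return container, account, parts.path.lstrip("/")


uri = wasbs_uri("input", "account4st0rage", "organizations-100.csv")
print(uri)
print(split_wasbs_uri(uri))
```

Building the path from `storage_account_name` keeps the account name in one place if it ever changes.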

In [0]:
# Register the account key with Spark so wasbs:// paths on this account can be read.
spark.conf.set(
  f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net",
  storage_account_access_key)


### Step 2: Read the data

Now that we have specified our file metadata, we can create a DataFrame. Notice that we use an *option* to specify that we want to infer the schema from the file. We can also explicitly set this to a particular schema if we have one already.
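If the schema is already known, you can supply it instead of inferring it. This also skips the extra pass over the data that inference requires. The following is a minimal sketch. The column types are an assumption based on the sample rows in this notebook; the source file does not declare them. The hypothetical `to_ddl` helper renders the columns as a DDL string and wraps names in backticks because several contain spaces.

```python
from typing import List, Tuple

# Header names from organizations-100.csv; the types are assumed from the sample rows.
COLUMNS: List[Tuple[str, str]] = [
    ("Index", "INT"),
    ("Organization Id", "STRING"),
    ("Name", "STRING"),
    ("Website", "STRING"),
    ("Country", "STRING"),
    ("Description", "STRING"),
    ("Founded", "INT"),
    ("Industry", "STRING"),
    ("Number of employees", "INT"),
]


def to_ddl(columns: List[Tuple[str, str]]) -> str:
    """Render (name, type) pairs as a DDL schema string with backtick-quoted names."""
    return ", ".join(f"`{name}` {dtype}" for name, dtype in columns)


ddl = to_ddl(COLUMNS)
print(ddl)
```

You could then pass the string to the reader, for example `spark.read.schema(ddl).csv(file_location, header=True)`. `DataFrameReader.schema` accepts either a `StructType` or a DDL-formatted string.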

First, let's create a DataFrame in Python.

In [0]:
# Use the first line as column names and let Spark infer the column types.
df = spark.read.csv(file_location, header=True, inferSchema=True)
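`header=True` tells the reader to take column names from the first line. The CSV reader's default quote character (`"`) keeps values such as `"Mckinney, Riley and Day"` in one column. Python's `csv` module behaves the same way on both points, so it can serve as a small stand-in for checking a sample. The two lines below are adapted from this file's output, with some columns removed.

```python
import csv
import io

# Header plus one data row, adapted from the sample output (columns trimmed).
SAMPLE = (
    "Index,Organization Id,Name,Country,Founded\n"
    '2,6A7EdDEA9FaDC52,"Mckinney, Riley and Day",Finland,2015\n'
)

# Like header=True: the first line supplies column names instead of being read as data.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(rows[0]["Name"])     # the quoted comma does not split the field
print(rows[0]["Founded"])  # still a string until a schema or inference assigns a type
```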


### Step 3: Query the data

Now that we have created our DataFrame, we can query it. For instance, you can identify particular columns to select and display.
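Selecting columns and filtering rows are the two most common queries. The plain-Python sketch below mimics both on a few rows copied from this notebook's output. `select` and `where` are hypothetical helpers, not Spark functions.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]

# A few rows (and columns) copied from this notebook's output.
ROWS: List[Record] = [
    {"Name": "Ferrell LLC", "Country": "Papua New Guinea", "Founded": "1990"},
    {"Name": "Mckinney, Riley and Day", "Country": "Finland", "Founded": "2015"},
    {"Name": "Holder-Sellers", "Country": "Turkmenistan", "Founded": "2004"},
]


def select(rows: Iterable[Record], *cols: str) -> List[Record]:
    """Keep only the named columns of each row (like DataFrame.select)."""
    return [{c: r[c] for c in cols} for r in rows]


def where(rows: Iterable[Record], pred: Callable[[Record], bool]) -> List[Record]:
    """Keep the rows for which pred returns True (like DataFrame.filter)."""
    return [r for r in rows if pred(r)]


recent = select(where(ROWS, lambda r: int(r["Founded"]) > 2000), "Name", "Founded")
print(recent)
```

In PySpark, the same query would be `df.filter(col("Founded") > 2000).select("Name", "Founded")`, with `col` imported from `pyspark.sql.functions`.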


### Step 4: (Optional) Create a view or table

If you want to query this data with SQL, you can register it as a temporary *view* or save it as a table.
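Registering a view lets you query the same data with SQL. The idea does not require Spark to illustrate. The sketch below uses an in-memory SQLite table as a stand-in for a temporary view, with three rows taken from this notebook's output and a hypothetical table name, `organizations`.

```python
import sqlite3

# An in-memory SQLite table stands in for a Spark temporary view.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE organizations ("Name" TEXT, "Country" TEXT, "Founded" INTEGER)')
conn.executemany(
    "INSERT INTO organizations VALUES (?, ?, ?)",
    [
        ("Best, Wade and Shepard", "Zimbabwe", 1991),
        ("Durham, Allen and Barnes", "Zimbabwe", 1993),
        ("Soto Group", "Vietnam", 1988),
    ],
)
# Once registered, the data can be queried with plain SQL.
rows = conn.execute(
    'SELECT "Country", COUNT(*) FROM organizations GROUP BY "Country" ORDER BY "Country"'
).fetchall()
print(rows)
conn.close()
```

In the notebook itself, the corresponding calls would be `df.createOrReplaceTempView("organizations")` followed by `spark.sql("SELECT Country, COUNT(*) FROM organizations GROUP BY Country")`. For a table that outlives the session, use `df.write.saveAsTable("organizations")`. In Spark SQL, column names that contain spaces, such as `Number of employees`, must be enclosed in backticks.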

In [0]:
df.describe().show()
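`describe()` reports the count, mean, standard deviation, minimum, and maximum for each column. For a single numeric column, the sketch below computes the same five figures with the `statistics` module. It uses the `Number of employees` values from the first ten rows shown further down. Like Spark's `stddev` aggregate, it assumes the sample standard deviation rather than the population standard deviation.

```python
import statistics

# "Number of employees" for the first ten rows in this notebook's output.
employees = [3498, 4952, 5287, 921, 7870, 4914, 7832, 4389, 8167, 9698]

summary = {
    "count": len(employees),
    "mean": statistics.mean(employees),
    "stddev": statistics.stdev(employees),  # sample standard deviation
    "min": min(employees),
    "max": max(employees),
}
print(summary)
```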

In [0]:
display(df.head(10))

Index,Organization Id,Name,Website,Country,Description,Founded,Industry,Number of employees
1,FAB0d41d5b5d22c,Ferrell LLC,https://price.net/,Papua New Guinea,Horizontal empowering knowledgebase,1990,Plastics,3498
2,6A7EdDEA9FaDC52,"Mckinney, Riley and Day",http://www.hall-buchanan.info/,Finland,User-centric system-worthy leverage,2015,Glass / Ceramics / Concrete,4952
3,0bFED1ADAE4bcC1,Hester Ltd,http://sullivan-reed.com/,China,Switchable scalable moratorium,1971,Public Safety,5287
4,2bFC1Be8a4ce42f,Holder-Sellers,https://becker.com/,Turkmenistan,De-engineered systemic artificial intelligence,2004,Automotive,921
5,9eE8A6a4Eb96C24,Mayer Group,http://www.brewer.com/,Mauritius,Synchronized needs-based challenge,1991,Transportation,7870
6,cC757116fe1C085,Henry-Thompson,http://morse.net/,Bahamas,Face-to-face well-modulated customer loyalty,1992,Primary / Secondary Education,4914
7,219233e8aFF1BC3,Hansen-Everett,https://www.kidd.org/,Pakistan,Seamless disintermediate collaboration,2018,Publishing Industry,7832
8,ccc93DCF81a31CD,Mcintosh-Mora,https://www.brooks.com/,Heard Island and McDonald Islands,Centralized attitude-oriented capability,1970,Import / Export,4389
9,0B4F93aA06ED03e,Carr Inc,http://ross.com/,Kuwait,Distributed impactful customer loyalty,1996,Plastics,8167
10,738b5aDe6B1C6A5,Gaines Inc,http://sandoval-hooper.com/,Uzbekistan,Multi-lateral scalable protocol,1997,Outsourcing / Offshoring,9698


In [0]:
display(df.tail(10))

Index,Organization Id,Name,Website,Country,Description,Founded,Industry,Number of employees
91,7ABc3c7ecA03B34,Sampson-Griffith,http://hendricks.org/,Benin,Multi-layered composite paradigm,1972,Textiles,3881
92,4e0719FBE38e0aB,Miles-Dominguez,http://www.turner.com/,Gibraltar,Organized empowering forecast,1996,Civic / Social Organization,897
93,dEbDAAeDfaed00A,Rowe and Sons,https://www.simpson.org/,El Salvador,Balanced multimedia knowledgebase,1978,Facilities Services,8172
94,61BDeCfeFD0cEF5,"Valenzuela, Holmes and Rowland",https://www.dorsey.net/,Taiwan,Persistent tertiary focus group,1999,Transportation,1483
95,4e91eD25f486110,"Best, Wade and Shepard",https://zimmerman.com/,Zimbabwe,Innovative background definition,1991,Gambling / Casinos,4873
96,0a0bfFbBbB8eC7c,Holmes Group,https://mcdowell.org/,Ethiopia,Right-sized zero tolerance focus group,1975,Photography,2988
97,BA6Cd9Dae2Efd62,Good Ltd,http://duffy.com/,Anguilla,Reverse-engineered composite moratorium,1971,Consumer Services,4292
98,E7df80C60Abd7f9,Clements-Espinoza,http://www.flowers.net/,Falkland Islands (Malvinas),Progressive modular hub,1991,Broadcast Media,236
99,AFc285dbE2fEd24,Mendez Inc,https://www.burke.net/,Kyrgyz Republic,User-friendly exuding migration,1993,Education Management,339
100,e9eB5A60Cef8354,Watkins-Kaiser,http://www.herring.com/,Togo,Synergistic background access,2009,Financial Services,2785


In [0]:
# Country descending, then Founded ascending within each country.
display(df.orderBy(["Country", "Founded"], ascending=[False, True]))

Index,Organization Id,Name,Website,Country,Description,Founded,Industry,Number of employees
95,4e91eD25f486110,"Best, Wade and Shepard",https://zimmerman.com/,Zimbabwe,Innovative background definition,1991,Gambling / Casinos,4873
32,f5afd686b3d05F5,"Durham, Allen and Barnes",http://chan-stafford.org/,Zimbabwe,Synergistic web-enabled framework,1993,Mechanical or Industrial Engineering,6135
46,1eD64cFe986BBbE,Walton-Barnett,https://ashley-schaefer.com/,Western Sahara,Right-sized clear-thinking flexibility,2001,Luxury Goods / Jewelry,1746
62,EB9f456e8b7022a,Soto Group,https://norris.info/,Vietnam,Enterprise-wide executive installation,1988,Business Supplies / Equipment,9097
10,738b5aDe6B1C6A5,Gaines Inc,http://sandoval-hooper.com/,Uzbekistan,Multi-lateral scalable protocol,1997,Outsourcing / Offshoring,9698
72,adcB0afbE58bAe3,Wagner LLC,https://decker-esparza.com/,Uruguay,Reactive attitude-oriented toolset,1987,International Affairs,6874
88,8cC1bDa330a5871,Pineda-Morton,https://www.carr.com/,United States Virgin Islands,Grass-roots methodical info-mediaries,1991,Printing,6168
66,fdFbecbadcdCdf1,"Wilkinson, Charles and Arroyo",http://hunter-mcfarland.com/,United States Virgin Islands,Assimilated 24/7 archive,1996,Building Materials,602
14,D2c91cc03CA394c,Glover-Pope,http://www.silva.biz/,United Arab Emirates,Persevering contextually-based approach,2013,Medical Practice,9079
17,68139b5C4De03B4,"Bowers, Guerra and Krause",http://www.carrillo-nicholson.com/,Uganda,De-engineered transitional strategy,1972,Primary / Secondary Education,6986
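`orderBy` with `ascending=[False, True]` (equivalently `[0, 1]`) sorts by `Country` in descending order and breaks ties by `Founded` in ascending order. This is why the two Zimbabwe rows appear as 1991, then 1993. Plain Python can reproduce a mixed-direction sort with two stable passes, secondary key first. The rows below are taken from the output above.

```python
from typing import Dict, List

rows: List[Dict[str, object]] = [
    {"Name": "Soto Group", "Country": "Vietnam", "Founded": 1988},
    {"Name": "Durham, Allen and Barnes", "Country": "Zimbabwe", "Founded": 1993},
    {"Name": "Walton-Barnett", "Country": "Western Sahara", "Founded": 2001},
    {"Name": "Best, Wade and Shepard", "Country": "Zimbabwe", "Founded": 1991},
]

# sorted() is stable (also with reverse=True), so sorting by the secondary key
# first and the primary key second yields Country desc, then Founded asc.
ordered = sorted(rows, key=lambda r: r["Founded"])
ordered = sorted(ordered, key=lambda r: r["Country"], reverse=True)
print([(r["Country"], r["Founded"]) for r in ordered])
```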


In [0]:
df.select("Name", "Country", "Founded", "Industry").display()

Name,Country,Founded,Industry
Ferrell LLC,Papua New Guinea,1990,Plastics
"Mckinney, Riley and Day",Finland,2015,Glass / Ceramics / Concrete
Hester Ltd,China,1971,Public Safety
Holder-Sellers,Turkmenistan,2004,Automotive
Mayer Group,Mauritius,1991,Transportation
Henry-Thompson,Bahamas,1992,Primary / Secondary Education
Hansen-Everett,Pakistan,2018,Publishing Industry
Mcintosh-Mora,Heard Island and McDonald Islands,1970,Import / Export
Carr Inc,Kuwait,1996,Plastics
Gaines Inc,Uzbekistan,1997,Outsourcing / Offshoring


In [0]:
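# Spark writes a directory named outputfile.csv containing part-*.csv files; rerunning fails if the path exists unless .mode("overwrite") is added.
df.select("Name", "Country", "Founded", "Industry").write.format("csv").option("header", True).option("sep", ",").save("wasbs://output@account4st0rage.blob.core.windows.net/outputfile.csv")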
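When you write with the `header` option, the first line holds the column names. Values that contain the separator are quoted so they can be read back as one field. Python's `csv` writer follows the same rule. The sketch below uses it as a stand-in, not Spark's writer, to show how one such row appears in the output.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")  # default quoting: only when needed
writer.writerow(["Name", "Country", "Founded", "Industry"])  # header row
writer.writerow(["Mckinney, Riley and Day", "Finland", 2015, "Glass / Ceramics / Concrete"])
print(buffer.getvalue())
```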

In [0]:
df.drop("Website").display()

Index,Organization Id,Name,Country,Description,Founded,Industry,Number of employees
1,FAB0d41d5b5d22c,Ferrell LLC,Papua New Guinea,Horizontal empowering knowledgebase,1990,Plastics,3498
2,6A7EdDEA9FaDC52,"Mckinney, Riley and Day",Finland,User-centric system-worthy leverage,2015,Glass / Ceramics / Concrete,4952
3,0bFED1ADAE4bcC1,Hester Ltd,China,Switchable scalable moratorium,1971,Public Safety,5287
4,2bFC1Be8a4ce42f,Holder-Sellers,Turkmenistan,De-engineered systemic artificial intelligence,2004,Automotive,921
5,9eE8A6a4Eb96C24,Mayer Group,Mauritius,Synchronized needs-based challenge,1991,Transportation,7870
6,cC757116fe1C085,Henry-Thompson,Bahamas,Face-to-face well-modulated customer loyalty,1992,Primary / Secondary Education,4914
7,219233e8aFF1BC3,Hansen-Everett,Pakistan,Seamless disintermediate collaboration,2018,Publishing Industry,7832
8,ccc93DCF81a31CD,Mcintosh-Mora,Heard Island and McDonald Islands,Centralized attitude-oriented capability,1970,Import / Export,4389
9,0B4F93aA06ED03e,Carr Inc,Kuwait,Distributed impactful customer loyalty,1996,Plastics,8167
10,738b5aDe6B1C6A5,Gaines Inc,Uzbekistan,Multi-lateral scalable protocol,1997,Outsourcing / Offshoring,9698


In [0]:
df.groupBy("Founded").count().display()

Founded,count
1987,2
2016,2
2020,2
2012,5
1972,5
1988,3
2017,3
1971,3
2014,3
1984,1
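`groupBy(...).count()` does not guarantee any row order, which is why the years above are not sorted. The sketch below performs the same counting with `collections.Counter`. It uses the `Founded` values from rows 1 to 10 and 91 to 95 shown earlier and ranks them by count, then by year. In PySpark, appending `.orderBy("count", ascending=False)` to the aggregation would sort the result by count.

```python
from collections import Counter

# Founded values for rows 1-10 and 91-95 of the sample output.
founded = [1990, 2015, 1971, 2004, 1991, 1992, 2018, 1970, 1996, 1997,
           1972, 1996, 1978, 1999, 1991]

counts = Counter(founded)  # year -> number of organizations
# Most frequent years first; ties broken by year so the order is deterministic.
ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
for year, n in ranked:
    print(year, n)
```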


In [0]:
# Databricks manages the SparkSession for the cluster, so a notebook normally does not stop it.
# spark.stop()