## Overview

This notebook shows how to create and query a table or DataFrame from a file that you uploaded to DBFS. [DBFS](https://docs.databricks.com/user-guide/dbfs-databricks-file-system.html) is the Databricks File System, which lets you store data for querying inside Databricks. The notebook assumes that the file you want to read is already in DBFS.

This notebook is written in **Python**, so the default cell type is Python. However, you can use a different language in a cell by starting it with the `%LANGUAGE` syntax. Python, Scala, SQL, and R are all supported.

In [2]:
# File location and type
file_location = "/FileStore/tables/categories-1.csv"
file_type = "csv"

# CSV options
# With inferSchema set to "false", every column is read as a string.
infer_schema = "false"
first_row_is_header = "true"
delimiter = ","

# These options apply to CSV files. For other file types, they are ignored.
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

display(df)

CategoryID,CategoryName,Description
1,Beverages,Soft drinks coffee teas beers and ales
2,Condiments,Sweet and savory sauces relishes spreads and seasonings
3,Confections,Desserts candies and sweet breads
4,Dairy Products,Cheeses
5,Grains/Cereals,Breads crackers pasta and cereal
6,Meat/Poultry,Prepared meats
7,Produce,Dried fruit and bean curd
8,Seafood,Seaweed and fish
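
Because `inferSchema` is set to `"false"`, every column in the DataFrame above is read as a string. One possible way to get typed columns is to pass an explicit schema to the reader. The sketch below is illustrative only: the column types are assumptions guessed from the sample rows shown, and `build_ddl` and `read_csv_with_schema` are hypothetical helper names that are not part of the original notebook.

```python
# Assumed column types for the categories file, guessed from the sample rows above.
CATEGORY_COLUMNS = {
    "CategoryID": "INT",
    "CategoryName": "STRING",
    "Description": "STRING",
}


def build_ddl(columns):
    """Join a {column name: type} mapping into a DDL-style schema string."""
    return ", ".join(f"{name} {dtype}" for name, dtype in columns.items())


def read_csv_with_schema(spark, file_location, columns):
    """Read a CSV file with an explicit schema instead of schema inference.

    `spark` is the SparkSession that Databricks notebooks provide.
    """
    return (
        spark.read.format("csv")
        .schema(build_ddl(columns))  # DataFrameReader.schema accepts a DDL string
        .option("header", "true")
        .option("sep", ",")
        .load(file_location)
    )


category_ddl = build_ddl(CATEGORY_COLUMNS)
print(category_ddl)
```

In a notebook cell this could be called as `df = read_csv_with_schema(spark, file_location, CATEGORY_COLUMNS)`.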


In [3]:
# Create a view or table

temp_table_name = "categories_csv"

df.createOrReplaceTempView(temp_table_name)
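
The cells in this notebook repeat the same load-and-register steps for each uploaded file. As a rough sketch, the view name can be derived from the file path and the steps wrapped in a function. The naming rule (drop the `-N` upload suffix and append `_csv`) is inferred from the names used in this notebook, and `view_name_for` and `load_csv_as_view` are hypothetical helpers rather than anything Databricks provides.

```python
import posixpath
import re


def view_name_for(file_location):
    """Derive a view name such as 'categories_csv' from a DBFS file path."""
    base = posixpath.basename(file_location)   # e.g. 'categories-1.csv'
    stem, ext = posixpath.splitext(base)       # ('categories-1', '.csv')
    stem = re.sub(r"-\d+$", "", stem)          # drop the '-1' upload suffix
    return f"{stem}_{ext.lstrip('.')}"


def load_csv_as_view(spark, file_location):
    """Read one CSV file and register it as a temp view; returns the view name."""
    df = (
        spark.read.format("csv")
        .option("inferSchema", "false")
        .option("header", "true")
        .option("sep", ",")
        .load(file_location)
    )
    name = view_name_for(file_location)
    df.createOrReplaceTempView(name)
    return name


names = [
    view_name_for(path)
    for path in (
        "/FileStore/tables/categories-1.csv",
        "/FileStore/tables/employee_territories-2.csv",
    )
]
print(names)
```

With this in place, each file could be registered with a single call such as `load_csv_as_view(spark, "/FileStore/tables/categories-1.csv")`.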

In [4]:
%sql

/* Query the created temp table in a SQL cell */
select * from `categories_csv`

CategoryID,CategoryName,Description
1,Beverages,Soft drinks coffee teas beers and ales
2,Condiments,Sweet and savory sauces relishes spreads and seasonings
3,Confections,Desserts candies and sweet breads
4,Dairy Products,Cheeses
5,Grains/Cereals,Breads crackers pasta and cereal
6,Meat/Poultry,Prepared meats
7,Produce,Dried fruit and bean curd
8,Seafood,Seaweed and fish


In [5]:
# A temp view is only available to this particular notebook. If you'd like other users to be able to query the data, you can also create a table from the DataFrame.
# Once saved, the table persists across cluster restarts and can be queried by other users from other notebooks.
# To do so, choose your table name; the last line of this cell saves the table.

permanent_table_name = "categories_csv"

df.write.format("parquet").saveAsTable(permanent_table_name)

In [6]:
# File location and type
file_location = "/FileStore/tables/customers-1.csv"
file_type = "csv"

# CSV options
infer_schema = "false"
first_row_is_header = "true"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

display(df)

CustomerID,CompanyName,ContactName,ContactTitle,Address,City,Region,PostalCode,Country,Phone,Fax
ALFKI,Alfreds Futterkiste,Maria Anders,Sales Representative,Obere Str. 57,Berlin,,12209,Germany,030-0074321,030-0076545
ANATR,Ana Trujillo Emparedados y helados,Ana Trujillo,Owner,Avda. de la Constitución 2222,México D.F.,,05021,Mexico,(5) 555-4729,(5) 555-3745
ANTON,Antonio Moreno Taquería,Antonio Moreno,Owner,Mataderos 2312,México D.F.,,05023,Mexico,(5) 555-3932,
AROUT,Around the Horn,Thomas Hardy,Sales Representative,120 Hanover Sq.,London,,WA1 1DP,UK,(171) 555-7788,(171) 555-6750
BERGS,Berglunds snabbköp,Christina Berglund,Order Administrator,Berguvsvägen 8,Luleå,,S-958 22,Sweden,0921-12 34 65,0921-12 34 67
BLAUS,Blauer See Delikatessen,Hanna Moos,Sales Representative,Forsterstr. 57,Mannheim,,68306,Germany,0621-08460,0621-08924
BLONP,Blondesddsl père et fils,Frédérique Citeaux,Marketing Manager,24 place Kléber,Strasbourg,,67000,France,88.60.15.31,88.60.15.32
BOLID,Bólido Comidas preparadas,Martín Sommer,Owner,C/ Araquil 67,Madrid,,28023,Spain,(91) 555 22 82,(91) 555 91 99
BONAP,Bon app',Laurence Lebihan,Owner,12 rue des Bouchers,Marseille,,13008,France,91.24.45.40,91.24.45.41
BOTTM,Bottom-Dollar Markets,Elizabeth Lincoln,Accounting Manager,23 Tsawassen Blvd.,Tsawassen,BC,T2F 8M4,Canada,(604) 555-4729,(604) 555-3745


In [7]:
# Create a view or table

temp_table_name = "customers_csv"

df.createOrReplaceTempView(temp_table_name)

In [8]:
%sql

/* Query the created temp table in a SQL cell */

select * from `customers_csv`

CustomerID,CompanyName,ContactName,ContactTitle,Address,City,Region,PostalCode,Country,Phone,Fax
ALFKI,Alfreds Futterkiste,Maria Anders,Sales Representative,Obere Str. 57,Berlin,,12209,Germany,030-0074321,030-0076545
ANATR,Ana Trujillo Emparedados y helados,Ana Trujillo,Owner,Avda. de la Constitución 2222,México D.F.,,05021,Mexico,(5) 555-4729,(5) 555-3745
ANTON,Antonio Moreno Taquería,Antonio Moreno,Owner,Mataderos 2312,México D.F.,,05023,Mexico,(5) 555-3932,
AROUT,Around the Horn,Thomas Hardy,Sales Representative,120 Hanover Sq.,London,,WA1 1DP,UK,(171) 555-7788,(171) 555-6750
BERGS,Berglunds snabbköp,Christina Berglund,Order Administrator,Berguvsvägen 8,Luleå,,S-958 22,Sweden,0921-12 34 65,0921-12 34 67
BLAUS,Blauer See Delikatessen,Hanna Moos,Sales Representative,Forsterstr. 57,Mannheim,,68306,Germany,0621-08460,0621-08924
BLONP,Blondesddsl père et fils,Frédérique Citeaux,Marketing Manager,24 place Kléber,Strasbourg,,67000,France,88.60.15.31,88.60.15.32
BOLID,Bólido Comidas preparadas,Martín Sommer,Owner,C/ Araquil 67,Madrid,,28023,Spain,(91) 555 22 82,(91) 555 91 99
BONAP,Bon app',Laurence Lebihan,Owner,12 rue des Bouchers,Marseille,,13008,France,91.24.45.40,91.24.45.41
BOTTM,Bottom-Dollar Markets,Elizabeth Lincoln,Accounting Manager,23 Tsawassen Blvd.,Tsawassen,BC,T2F 8M4,Canada,(604) 555-4729,(604) 555-3745


In [9]:
# A temp view is only available to this particular notebook. If you'd like other users to be able to query the data, you can also create a table from the DataFrame.
# Once saved, the table persists across cluster restarts and can be queried by other users from other notebooks.
# To do so, choose your table name; the last line of this cell saves the table.

permanent_table_name = "customers_csv"

df.write.format("parquet").saveAsTable(permanent_table_name)

In [10]:
# File location and type
file_location = "/FileStore/tables/employees-3.csv"
file_type = "csv"

# CSV options
infer_schema = "false"
first_row_is_header = "true"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

display(df)

EmployeeID,LastName,FirstName,Title,TitleOfCourtesy,BirthDate,HireDate,Address,City,Region,PostalCode,Country,HomePhone,Extension,Photo,Notes,ReportsTo,PhotoPath
1,Davolio,Nancy,Sales Representative,Ms.,1948-12-08 00:00:00.000,1992-05-01 00:00:00.000,507 - 20th Ave. E.Apt. 2A,Seattle,WA,98122,USA,(206) 555-9857,5467,0x151C2F00020000000D000E0014002100FFFFFFFF4269746D617020496D616765005061696E742E506963747572650001050000020000000700000050427275736800000000000000000020540000424D20540000000000007600000028000000C0000000DF0000000100040000000000A0530000CE0E0000D80E0000000000,"Education includes a BA in psychology from Colorado State University in 1970. She also completed ""The Art of the Cold Call."" Nancy is a member of Toastmasters International.",2.0,http://accweb/emmployees/davolio.bmp
2,Fuller,Andrew,Vice President Sales,Dr.,1952-02-19 00:00:00.000,1992-08-14 00:00:00.000,908 W. Capital Way,Tacoma,WA,98401,USA,(206) 555-9482,3457,0x151C2F00020000000D000E0014002100FFFFFFFF4269746D617020496D616765005061696E742E506963747572650001050000020000000700000050427275736800000000000000000020540000424D20540000000000007600000028000000C0000000DF0000000100040000000000A0530000CE0E0000D80E0000000000,Andrew received his BTS commercial in 1974 and a Ph.D. in international marketing from the University of Dallas in 1981. He is fluent in French and Italian and reads German. He joined the company as a sales representative was promoted to sales manager i,,http://accweb/emmployees/fuller.bmp
3,Leverling,Janet,Sales Representative,Ms.,1963-08-30 00:00:00.000,1992-04-01 00:00:00.000,722 Moss Bay Blvd.,Kirkland,WA,98033,USA,(206) 555-3412,3355,0x151C2F00020000000D000E0014002100FFFFFFFF4269746D617020496D616765005061696E742E506963747572650001050000020000000700000050427275736800000000000000000080540000424D80540000000000007600000028000000C0000000E0000000010004000000000000540000CE0E0000D80E0000000000,Janet has a BS degree in chemistry from Boston College (1984). She has also completed a certificate program in food retailing management. Janet was hired as a sales associate in 1991 and promoted to sales representative in February 1992.,2.0,http://accweb/emmployees/leverling.bmp
4,Peacock,Margaret,Sales Representative,Mrs.,1937-09-19 00:00:00.000,1993-05-03 00:00:00.000,4110 Old Redmond Rd.,Redmond,WA,98052,USA,(206) 555-8122,5176,0x151C2F00020000000D000E0014002100FFFFFFFF4269746D617020496D616765005061696E742E506963747572650001050000020000000700000050427275736800000000000000000020540000424D20540000000000007600000028000000C0000000DF0000000100040000000000A0530000CE0E0000D80E0000000000,Margaret holds a BA in English literature from Concordia College (1958) and an MA from the American Institute of Culinary Arts (1966). She was assigned to the London office temporarily from July through November 1992.,2.0,http://accweb/emmployees/peacock.bmp
5,Buchanan,Steven,Sales Manager,Mr.,1955-03-04 00:00:00.000,1993-10-17 00:00:00.000,14 Garrett Hill,London,,SW1 8JR,UK,(71) 555-4848,3453,0x151C2F00020000000D000E0014002100FFFFFFFF4269746D617020496D616765005061696E742E506963747572650001050000020000000700000050427275736800000000000000000020540000424D20540000000000007600000028000000C0000000DF0000000100040000000000A0530000CE0E0000D80E0000000000,Steven Buchanan graduated from St. Andrews University Scotland with a BSC degree in 1976. Upon joining the company as a sales representative in 1992. he spent 6 months in an orientation program at the Seattle office and then returned to his permanent po,2.0,http://accweb/emmployees/buchanan.bmp
6,Suyama,Michael,Sales Representative,Mr.,1963-07-02 00:00:00.000,1993-10-17 00:00:00.000,Coventry HouseMiner Rd.,London,,EC2 7JR,UK,(71) 555-7773,428,0x151C2F00020000000D000E0014002100FFFFFFFF4269746D617020496D616765005061696E742E506963747572650001050000020000000700000050427275736800000000000000000020540000424D16540000000000007600000028000000C0000000DF0000000100040000000000A0530000CE0E0000D80E0000000000,"Michael is a graduate of Sussex University (MA economics 1983) and the University of California at Los Angeles (MBA marketing 1986). He has also taken the courses ""Multi-Cultural Selling"" and ""Time Management for the Sales Professional."" He is fluent",5.0,http://accweb/emmployees/davolio.bmp
7,King,Robert,Sales Representative,Mr.,1960-05-29 00:00:00.000,1994-01-02 00:00:00.000,Edgeham HollowWinchester Way,London,,RG1 9SP,UK,(71) 555-5598,465,0x151C2F00020000000D000E0014002100FFFFFFFF4269746D617020496D616765005061696E742E506963747572650001050000020000000700000050427275736800000000000000000020540000424D16540000000000007600000028000000C0000000DF0000000100040000000000A0530000CE0E0000D80E0000000000,"Robert King served in the Peace Corps and traveled extensively before completing his degree in English at the University of Michigan in 1992 the year he joined the company. After completing a course entitled ""Selling in Europe"" he was transferred to the",5.0,http://accweb/emmployees/davolio.bmp
8,Callahan,Laura,Inside Sales Coordinator,Ms.,1958-01-09 00:00:00.000,1994-03-05 00:00:00.000,4726 - 11th Ave. N.E.,Seattle,WA,98105,USA,(206) 555-1189,2344,0x151C2F00020000000D000E0014002100FFFFFFFF4269746D617020496D616765005061696E742E506963747572650001050000020000000700000050427275736800000000000000000020540000424D16540000000000007600000028000000C0000000DF0000000100040000000000A0530000CE0E0000D80E0000000000,Laura received a BA in psychology from the University of Washington. She has also completed a course in business French. She reads and writes French.,2.0,http://accweb/emmployees/davolio.bmp
9,Dodsworth,Anne,Sales Representative,Ms.,1966-01-27 00:00:00.000,1994-11-15 00:00:00.000,7 Houndstooth Rd.,London,,WG2 7LT,UK,(71) 555-4444,452,0x151C2F00020000000D000E0014002100FFFFFFFF4269746D617020496D616765005061696E742E506963747572650001050000020000000700000050427275736800000000000000000020540000424D16540000000000007600000028000000C0000000DF0000000100040000000000A0530000CE0E0000D80E0000000000,Anne has a BA degree in English from St. Lawrence College. She is fluent in French and German.,5.0,http://accweb/emmployees/davolio.bmp
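
The `Photo` column holds a hexadecimal string prefixed with `0x`. Since the file is read without schema inference, Spark treats it as plain text. If the raw bytes are needed, the string can be decoded in Python. The sketch below uses only the first 32 bytes of the value shown for employee 1, and `decode_hex_literal` is a hypothetical helper introduced for illustration.

```python
def decode_hex_literal(text):
    """Convert a '0x...' hex string into bytes."""
    if text.lower().startswith("0x"):
        text = text[2:]
    return bytes.fromhex(text)


# First 32 bytes of the Photo value shown for employee 1 above.
photo_prefix = "0x151C2F00020000000D000E0014002100FFFFFFFF4269746D617020496D616765"
decoded = decode_hex_literal(photo_prefix)
print(len(decoded), decoded[20:])
```

The readable text in the decoded bytes suggests the value wraps a bitmap image, though the full binary layout is not documented in this notebook.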


In [11]:
# Create a view or table

temp_table_name = "employees_csv"

df.createOrReplaceTempView(temp_table_name)

In [12]:
%sql

/* Query the created temp table in a SQL cell */

select * from `employees_csv`

EmployeeID,LastName,FirstName,Title,TitleOfCourtesy,BirthDate,HireDate,Address,City,Region,PostalCode,Country,HomePhone,Extension,Photo,Notes,ReportsTo,PhotoPath
1,Davolio,Nancy,Sales Representative,Ms.,1948-12-08 00:00:00.000,1992-05-01 00:00:00.000,507 - 20th Ave. E.Apt. 2A,Seattle,WA,98122,USA,(206) 555-9857,5467,0x151C2F00020000000D000E0014002100FFFFFFFF4269746D617020496D616765005061696E742E506963747572650001050000020000000700000050427275736800000000000000000020540000424D20540000000000007600000028000000C0000000DF0000000100040000000000A0530000CE0E0000D80E0000000000,"Education includes a BA in psychology from Colorado State University in 1970. She also completed ""The Art of the Cold Call."" Nancy is a member of Toastmasters International.",2.0,http://accweb/emmployees/davolio.bmp
2,Fuller,Andrew,Vice President Sales,Dr.,1952-02-19 00:00:00.000,1992-08-14 00:00:00.000,908 W. Capital Way,Tacoma,WA,98401,USA,(206) 555-9482,3457,0x151C2F00020000000D000E0014002100FFFFFFFF4269746D617020496D616765005061696E742E506963747572650001050000020000000700000050427275736800000000000000000020540000424D20540000000000007600000028000000C0000000DF0000000100040000000000A0530000CE0E0000D80E0000000000,Andrew received his BTS commercial in 1974 and a Ph.D. in international marketing from the University of Dallas in 1981. He is fluent in French and Italian and reads German. He joined the company as a sales representative was promoted to sales manager i,,http://accweb/emmployees/fuller.bmp
3,Leverling,Janet,Sales Representative,Ms.,1963-08-30 00:00:00.000,1992-04-01 00:00:00.000,722 Moss Bay Blvd.,Kirkland,WA,98033,USA,(206) 555-3412,3355,0x151C2F00020000000D000E0014002100FFFFFFFF4269746D617020496D616765005061696E742E506963747572650001050000020000000700000050427275736800000000000000000080540000424D80540000000000007600000028000000C0000000E0000000010004000000000000540000CE0E0000D80E0000000000,Janet has a BS degree in chemistry from Boston College (1984). She has also completed a certificate program in food retailing management. Janet was hired as a sales associate in 1991 and promoted to sales representative in February 1992.,2.0,http://accweb/emmployees/leverling.bmp
4,Peacock,Margaret,Sales Representative,Mrs.,1937-09-19 00:00:00.000,1993-05-03 00:00:00.000,4110 Old Redmond Rd.,Redmond,WA,98052,USA,(206) 555-8122,5176,0x151C2F00020000000D000E0014002100FFFFFFFF4269746D617020496D616765005061696E742E506963747572650001050000020000000700000050427275736800000000000000000020540000424D20540000000000007600000028000000C0000000DF0000000100040000000000A0530000CE0E0000D80E0000000000,Margaret holds a BA in English literature from Concordia College (1958) and an MA from the American Institute of Culinary Arts (1966). She was assigned to the London office temporarily from July through November 1992.,2.0,http://accweb/emmployees/peacock.bmp
5,Buchanan,Steven,Sales Manager,Mr.,1955-03-04 00:00:00.000,1993-10-17 00:00:00.000,14 Garrett Hill,London,,SW1 8JR,UK,(71) 555-4848,3453,0x151C2F00020000000D000E0014002100FFFFFFFF4269746D617020496D616765005061696E742E506963747572650001050000020000000700000050427275736800000000000000000020540000424D20540000000000007600000028000000C0000000DF0000000100040000000000A0530000CE0E0000D80E0000000000,Steven Buchanan graduated from St. Andrews University Scotland with a BSC degree in 1976. Upon joining the company as a sales representative in 1992. he spent 6 months in an orientation program at the Seattle office and then returned to his permanent po,2.0,http://accweb/emmployees/buchanan.bmp
6,Suyama,Michael,Sales Representative,Mr.,1963-07-02 00:00:00.000,1993-10-17 00:00:00.000,Coventry HouseMiner Rd.,London,,EC2 7JR,UK,(71) 555-7773,428,0x151C2F00020000000D000E0014002100FFFFFFFF4269746D617020496D616765005061696E742E506963747572650001050000020000000700000050427275736800000000000000000020540000424D16540000000000007600000028000000C0000000DF0000000100040000000000A0530000CE0E0000D80E0000000000,"Michael is a graduate of Sussex University (MA economics 1983) and the University of California at Los Angeles (MBA marketing 1986). He has also taken the courses ""Multi-Cultural Selling"" and ""Time Management for the Sales Professional."" He is fluent",5.0,http://accweb/emmployees/davolio.bmp
7,King,Robert,Sales Representative,Mr.,1960-05-29 00:00:00.000,1994-01-02 00:00:00.000,Edgeham HollowWinchester Way,London,,RG1 9SP,UK,(71) 555-5598,465,0x151C2F00020000000D000E0014002100FFFFFFFF4269746D617020496D616765005061696E742E506963747572650001050000020000000700000050427275736800000000000000000020540000424D16540000000000007600000028000000C0000000DF0000000100040000000000A0530000CE0E0000D80E0000000000,"Robert King served in the Peace Corps and traveled extensively before completing his degree in English at the University of Michigan in 1992 the year he joined the company. After completing a course entitled ""Selling in Europe"" he was transferred to the",5.0,http://accweb/emmployees/davolio.bmp
8,Callahan,Laura,Inside Sales Coordinator,Ms.,1958-01-09 00:00:00.000,1994-03-05 00:00:00.000,4726 - 11th Ave. N.E.,Seattle,WA,98105,USA,(206) 555-1189,2344,0x151C2F00020000000D000E0014002100FFFFFFFF4269746D617020496D616765005061696E742E506963747572650001050000020000000700000050427275736800000000000000000020540000424D16540000000000007600000028000000C0000000DF0000000100040000000000A0530000CE0E0000D80E0000000000,Laura received a BA in psychology from the University of Washington. She has also completed a course in business French. She reads and writes French.,2.0,http://accweb/emmployees/davolio.bmp
9,Dodsworth,Anne,Sales Representative,Ms.,1966-01-27 00:00:00.000,1994-11-15 00:00:00.000,7 Houndstooth Rd.,London,,WG2 7LT,UK,(71) 555-4444,452,0x151C2F00020000000D000E0014002100FFFFFFFF4269746D617020496D616765005061696E742E506963747572650001050000020000000700000050427275736800000000000000000020540000424D16540000000000007600000028000000C0000000DF0000000100040000000000A0530000CE0E0000D80E0000000000,Anne has a BA degree in English from St. Lawrence College. She is fluent in French and German.,5.0,http://accweb/emmployees/davolio.bmp


In [13]:
# A temp view is only available to this particular notebook. If you'd like other users to be able to query the data, you can also create a table from the DataFrame.
# Once saved, the table persists across cluster restarts and can be queried by other users from other notebooks.
# To do so, choose your table name; the last line of this cell saves the table.

permanent_table_name = "employees_csv"

df.write.format("parquet").saveAsTable(permanent_table_name)

In [14]:
# File location and type
file_location = "/FileStore/tables/employee_territories-2.csv"
file_type = "csv"

# CSV options
infer_schema = "false"
first_row_is_header = "true"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

display(df)

EmployeeID,TerritoryID
1,6897
1,19713
2,1581
2,1730
2,1833
2,2116
2,2139
2,2184
2,40222
3,30346


In [15]:
# Create a view or table

temp_table_name = "employee_territories_csv"

df.createOrReplaceTempView(temp_table_name)

In [16]:
%sql

/* Query the created temp table in a SQL cell */

select * from `employee_territories_csv`

EmployeeID,TerritoryID
1,6897
1,19713
2,1581
2,1730
2,1833
2,2116
2,2139
2,2184
2,40222
3,30346


In [17]:
# A temp view is only available to this particular notebook. If you'd like other users to be able to query the data, you can also create a table from the DataFrame.
# Once saved, the table persists across cluster restarts and can be queried by other users from other notebooks.
# To do so, choose your table name; the last line of this cell saves the table.

permanent_table_name = "employee_territories_csv"

df.write.format("parquet").saveAsTable(permanent_table_name)

In [18]:
# File location and type
file_location = "/FileStore/tables/order_details-5.csv"
file_type = "csv"

# CSV options
infer_schema = "false"
first_row_is_header = "true"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

display(df)

OrderID,ProductID,UnitPrice,Quantity,Discount
10248,11,14.0,12,0.0
10248,42,9.8,10,0.0
10248,72,34.8,5,0.0
10249,14,18.6,9,0.0
10249,51,42.4,40,0.0
10250,41,7.7,10,0.0
10250,51,42.4,35,0.15
10250,65,16.8,15,0.15
10251,22,16.8,6,0.05
10251,57,15.6,15,0.05


In [19]:
# Create a view or table

temp_table_name = "order_details_csv"

df.createOrReplaceTempView(temp_table_name)

In [20]:
%sql

/* Query the created temp table in a SQL cell */

select * from `order_details_csv`

OrderID,ProductID,UnitPrice,Quantity,Discount
10248,11,14.0,12,0.0
10248,42,9.8,10,0.0
10248,72,34.8,5,0.0
10249,14,18.6,9,0.0
10249,51,42.4,40,0.0
10250,41,7.7,10,0.0
10250,51,42.4,35,0.15
10250,65,16.8,15,0.15
10251,22,16.8,6,0.05
10251,57,15.6,15,0.05


In [21]:
# A temp view is only available to this particular notebook. If you'd like other users to be able to query the data, you can also create a table from the DataFrame.
# Once saved, the table persists across cluster restarts and can be queried by other users from other notebooks.
# To do so, choose your table name; the last line of this cell saves the table.

permanent_table_name = "order_details_csv"

df.write.format("parquet").saveAsTable(permanent_table_name)

In [22]:
# File location and type
file_location = "/FileStore/tables/orders-3.csv"
file_type = "csv"

# CSV options
infer_schema = "false"
first_row_is_header = "true"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

display(df)

OrderID,CustomerID,EmployeeID,OrderDate,RequiredDate,ShippedDate,ShipVia,Freight,ShipName,ShipAddress,ShipCity,ShipRegion,ShipPostalCode,ShipCountry
10248,VINET,5,1996-07-04 00:00:00.000,1996-08-01 00:00:00.000,1996-07-16 00:00:00.000,3,32.38,Vins et alcools Chevalier,59 rue de l'Abbaye,Reims,,51100,France
10249,TOMSP,6,1996-07-05 00:00:00.000,1996-08-16 00:00:00.000,1996-07-10 00:00:00.000,1,11.61,Toms Spezialitäten,Luisenstr. 48,Münster,,44087,Germany
10250,HANAR,4,1996-07-08 00:00:00.000,1996-08-05 00:00:00.000,1996-07-12 00:00:00.000,2,65.83,Hanari Carnes,Rua do Paço,67,Rio de Janeiro,RJ,05454-876
10251,VICTE,3,1996-07-08 00:00:00.000,1996-08-05 00:00:00.000,1996-07-15 00:00:00.000,1,41.34,Victuailles en stock,2,rue du Commerce,Lyon,,69004
10252,SUPRD,4,1996-07-09 00:00:00.000,1996-08-06 00:00:00.000,1996-07-11 00:00:00.000,2,51.3,Suprêmes délices,Boulevard Tirou,255,Charleroi,,B-6000
10253,HANAR,3,1996-07-10 00:00:00.000,1996-07-24 00:00:00.000,1996-07-16 00:00:00.000,2,58.17,Hanari Carnes,Rua do Paço,67,Rio de Janeiro,RJ,05454-876
10254,CHOPS,5,1996-07-11 00:00:00.000,1996-08-08 00:00:00.000,1996-07-23 00:00:00.000,2,22.98,Chop-suey Chinese,Hauptstr. 31,Bern,,3012,Switzerland
10255,RICSU,9,1996-07-12 00:00:00.000,1996-08-09 00:00:00.000,1996-07-15 00:00:00.000,3,148.33,Richter Supermarkt,Starenweg 5,Genève,,1204,Switzerland
10256,WELLI,3,1996-07-15 00:00:00.000,1996-08-12 00:00:00.000,1996-07-17 00:00:00.000,2,13.97,Wellington Importadora,Rua do Mercado,12,Resende,SP,08737-363
10257,HILAA,4,1996-07-16 00:00:00.000,1996-08-13 00:00:00.000,1996-07-22 00:00:00.000,3,81.91,HILARION-Abastos,Carrera 22 con Ave. Carlos Soublette #8-35,San Cristóbal,Táchira,5022,Venezuela
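
Several rows in the output above (for example order 10250) look shifted: `ShipCity` shows `67` and `ShipCountry` shows a postal code. That pattern is what an unquoted comma inside an address field would produce, although the uploaded file itself is not shown here. One quick check, sketched below with Python's `csv` module, is to compare the number of fields in each raw line against the header. The sample text is a shortened, made-up stand-in for the real file, and `find_ragged_rows` is a hypothetical helper.

```python
import csv
import io


def find_ragged_rows(text, delimiter=","):
    """Return (line_number, field_count) for rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    return [
        (reader.line_num, len(row))
        for row in reader
        if len(row) != len(header)
    ]


# Made-up sample: the second data row has an unquoted comma in ShipAddress.
sample = (
    "OrderID,ShipName,ShipAddress,ShipCity\n"
    "10248,Vins et alcools Chevalier,59 rue de l'Abbaye,Reims\n"
    "10250,Hanari Carnes,Rua do Paço,67,Rio de Janeiro\n"
)
ragged = find_ragged_rows(sample)
print(ragged)
```

If such rows turn up, wrapping the affected field in double quotes in the source file lets the CSV reader keep it in a single column.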


In [23]:
# Create a view or table

temp_table_name = "orders_csv"

df.createOrReplaceTempView(temp_table_name)

In [24]:
%sql

/* Query the created temp table in a SQL cell */

select * from `orders_csv`

OrderID,CustomerID,EmployeeID,OrderDate,RequiredDate,ShippedDate,ShipVia,Freight,ShipName,ShipAddress,ShipCity,ShipRegion,ShipPostalCode,ShipCountry
10248,VINET,5,1996-07-04 00:00:00.000,1996-08-01 00:00:00.000,1996-07-16 00:00:00.000,3,32.38,Vins et alcools Chevalier,59 rue de l'Abbaye,Reims,,51100,France
10249,TOMSP,6,1996-07-05 00:00:00.000,1996-08-16 00:00:00.000,1996-07-10 00:00:00.000,1,11.61,Toms Spezialitäten,Luisenstr. 48,Münster,,44087,Germany
10250,HANAR,4,1996-07-08 00:00:00.000,1996-08-05 00:00:00.000,1996-07-12 00:00:00.000,2,65.83,Hanari Carnes,Rua do Paço,67,Rio de Janeiro,RJ,05454-876
10251,VICTE,3,1996-07-08 00:00:00.000,1996-08-05 00:00:00.000,1996-07-15 00:00:00.000,1,41.34,Victuailles en stock,2,rue du Commerce,Lyon,,69004
10252,SUPRD,4,1996-07-09 00:00:00.000,1996-08-06 00:00:00.000,1996-07-11 00:00:00.000,2,51.3,Suprêmes délices,Boulevard Tirou,255,Charleroi,,B-6000
10253,HANAR,3,1996-07-10 00:00:00.000,1996-07-24 00:00:00.000,1996-07-16 00:00:00.000,2,58.17,Hanari Carnes,Rua do Paço,67,Rio de Janeiro,RJ,05454-876
10254,CHOPS,5,1996-07-11 00:00:00.000,1996-08-08 00:00:00.000,1996-07-23 00:00:00.000,2,22.98,Chop-suey Chinese,Hauptstr. 31,Bern,,3012,Switzerland
10255,RICSU,9,1996-07-12 00:00:00.000,1996-08-09 00:00:00.000,1996-07-15 00:00:00.000,3,148.33,Richter Supermarkt,Starenweg 5,Genève,,1204,Switzerland
10256,WELLI,3,1996-07-15 00:00:00.000,1996-08-12 00:00:00.000,1996-07-17 00:00:00.000,2,13.97,Wellington Importadora,Rua do Mercado,12,Resende,SP,08737-363
10257,HILAA,4,1996-07-16 00:00:00.000,1996-08-13 00:00:00.000,1996-07-22 00:00:00.000,3,81.91,HILARION-Abastos,Carrera 22 con Ave. Carlos Soublette #8-35,San Cristóbal,Táchira,5022,Venezuela


In [25]:
# A temp view is only available to this particular notebook. If you'd like other users to be able to query the data, you can also create a table from the DataFrame.
# Once saved, the table persists across cluster restarts and can be queried by other users from other notebooks.
# To do so, choose your table name; the last line of this cell saves the table.

permanent_table_name = "orders_csv"

df.write.format("parquet").saveAsTable(permanent_table_name)

In [26]:
# File location and type
file_location = "/FileStore/tables/products-2.csv"
file_type = "csv"

# CSV options
infer_schema = "false"
first_row_is_header = "true"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

display(df)

ProductID,ProductName,SupplierID,CategoryID,QuantityPerUnit,UnitPrice,UnitsInStock,UnitsOnOrder,ReorderLevel,Discontinued
1,Chai,1,1,10 boxes x 20 bags,18.0,39,0,10,0
2,Chang,1,1,24 - 12 oz bottles,19.0,17,40,25,0
3,Aniseed Syrup,1,2,12 - 550 ml bottles,10.0,13,70,25,0
4,Chef Anton's Cajun Seasoning,2,2,48 - 6 oz jars,22.0,53,0,0,0
5,Chef Anton's Gumbo Mix,2,2,36 boxes,21.35,0,0,0,1
6,Grandma's Boysenberry Spread,3,2,12 - 8 oz jars,25.0,120,0,25,0
7,Uncle Bob's Organic Dried Pears,3,7,12 - 1 lb pkgs.,30.0,15,0,10,0
8,Northwoods Cranberry Sauce,3,2,12 - 12 oz jars,40.0,6,0,0,0
9,Mishi Kobe Niku,4,6,18 - 500 g pkgs.,97.0,29,0,0,1
10,Ikura,4,8,12 - 200 ml jars,31.0,31,0,0,0


In [27]:
# Create a view or table

temp_table_name = "products_csv"

df.createOrReplaceTempView(temp_table_name)

In [28]:
%sql

/* Query the created temp table in a SQL cell */

select * from `products_csv`

ProductID,ProductName,SupplierID,CategoryID,QuantityPerUnit,UnitPrice,UnitsInStock,UnitsOnOrder,ReorderLevel,Discontinued
1,Chai,1,1,10 boxes x 20 bags,18.0,39,0,10,0
2,Chang,1,1,24 - 12 oz bottles,19.0,17,40,25,0
3,Aniseed Syrup,1,2,12 - 550 ml bottles,10.0,13,70,25,0
4,Chef Anton's Cajun Seasoning,2,2,48 - 6 oz jars,22.0,53,0,0,0
5,Chef Anton's Gumbo Mix,2,2,36 boxes,21.35,0,0,0,1
6,Grandma's Boysenberry Spread,3,2,12 - 8 oz jars,25.0,120,0,25,0
7,Uncle Bob's Organic Dried Pears,3,7,12 - 1 lb pkgs.,30.0,15,0,10,0
8,Northwoods Cranberry Sauce,3,2,12 - 12 oz jars,40.0,6,0,0,0
9,Mishi Kobe Niku,4,6,18 - 500 g pkgs.,97.0,29,0,0,1
10,Ikura,4,8,12 - 200 ml jars,31.0,31,0,0,0


In [29]:
# A temp view is only available to this particular notebook. If you'd like other users to be able to query the data, you can also create a table from the DataFrame.
# Once saved, the table persists across cluster restarts and can be queried by other users from other notebooks.
# To do so, choose your table name; the last line of this cell saves the table.

permanent_table_name = "products_csv"

df.write.format("parquet").saveAsTable(permanent_table_name)

In [30]:
# File location and type
file_location = "/FileStore/tables/regions-2.csv"
file_type = "csv"

# CSV options
infer_schema = "false"
first_row_is_header = "true"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

display(df)

RegionID,RegionDescription
1,Eastern
2,Western
3,Northern
4,Southern


In [31]:
# Create a view or table

temp_table_name = "regions_csv"

df.createOrReplaceTempView(temp_table_name)

In [32]:
%sql

/* Query the created temp table in a SQL cell */

select * from `regions_csv`

RegionID,RegionDescription
1,Eastern
2,Western
3,Northern
4,Southern


In [33]:
# A temp view is only available to this particular notebook. If you'd like other users to be able to query the data, you can also create a table from the DataFrame.
# Once saved, the table persists across cluster restarts and can be queried by other users from other notebooks.
# To do so, choose your table name; the last line of this cell saves the table.

permanent_table_name = "regions_csv"

df.write.format("parquet").saveAsTable(permanent_table_name)

In [34]:
# File location and type
file_location = "/FileStore/tables/shippers-2.csv"
file_type = "csv"

# CSV options
infer_schema = "false"
first_row_is_header = "true"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

display(df)

ShipperID,CompanyName,Phone
1,Speedy Express,(503) 555-9831
2,United Package,(503) 555-3199
3,Federal Shipping,(503) 555-9931


In [35]:
# Create a view or table

temp_table_name = "shippers_csv"

df.createOrReplaceTempView(temp_table_name)

In [36]:
%sql

/* Query the created temp table in a SQL cell */

select * from `shippers_csv`

ShipperID,CompanyName,Phone
1,Speedy Express,(503) 555-9831
2,United Package,(503) 555-3199
3,Federal Shipping,(503) 555-9931


In [37]:
# A temp view is only available to this particular notebook. If you'd like other users to be able to query this data, you can also create a permanent table from the DataFrame.
# Once saved, the table persists across cluster restarts and can be queried by other users from other notebooks.
# To do so, choose your table name and run the line below.

permanent_table_name = "shippers_csv"

df.write.format("parquet").saveAsTable(permanent_table_name)

In [38]:
# File location and type
file_location = "/FileStore/tables/suppliers-3.csv"
file_type = "csv"

# CSV options
infer_schema = "false"
first_row_is_header = "true"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

display(df)

SupplierID,CompanyName,ContactName,ContactTitle,Address,City,Region,PostalCode,Country,Phone,Fax,HomePage
1,Exotic Liquids,Charlotte Cooper,Purchasing Manager,49 Gilbert St.,London,,EC1 4SD,UK,(171) 555-2222,,
2,New Orleans Cajun Delights,Shelley Burke,Order Administrator,P.O. Box 78934,New Orleans,LA,70117,USA,(100) 555-4822,,#CAJUN.HTM#
3,Grandma Kelly's Homestead,Regina Murphy,Sales Representative,707 Oxford Rd.,Ann Arbor,MI,48104,USA,(313) 555-5735,(313) 555-3349,
4,Tokyo Traders,Yoshi Nagase,Marketing Manager,9-8 Sekimai Musashino-shi,Tokyo,,100,Japan,(03) 3555-5011,,
5,Cooperativa de Quesos 'Las Cabras',Antonio del Valle Saavedra,Export Administrator,Calle del Rosal 4,Oviedo,Asturias,33007,Spain,(98) 598 76 54,,
6,Mayumi's,Mayumi Ohno,Marketing Representative,92 Setsuko Chuo-ku,Osaka,,545,Japan,(06) 431-7877,,Mayumi's (on the World Wide Web)#http://www.microsoft.com/accessdev/sampleapps/mayumi.htm#
7,Pavlova,Ltd.,Ian Devling,Marketing Manager,74 Rose St. Moonie Ponds,Melbourne,Victoria,3058,Australia,(03) 444-2343,(03) 444-6588
8,Specialty Biscuits,Ltd.,Peter Wilson,Sales Representative,29 King's Way,Manchester,,M14 GSD,UK,(161) 555-4448,
9,PB Knäckebröd AB,Lars Peterson,Sales Agent,Kaloadagatan 13,Göteborg,,S-345 67,Sweden,031-987 65 43,031-987 65 91,
10,Refrescos Americanas LTDA,Carlos Diaz,Marketing Manager,Av. das Americanas 12.890,Sao Paulo,,5442,Brazil,(11) 555 4640,,
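Rows 7 and 8 in the output above appear shifted by one column: `CompanyName` holds `Pavlova` and `ContactName` holds `Ltd.`. That would be consistent with an unquoted comma inside the company name in the source file, which is not shown here, so treat this as an assumption. The sketch below uses Python's standard `csv` module, not Spark, to show how quoting changes the field count. The shortened header and the quoted form `"Pavlova, Ltd."` are hypothetical, for illustration only.

```python
import csv
import io


def parse_line(line):
    """Parse a single CSV line into a list of fields."""
    return next(csv.reader(io.StringIO(line)))


# Shortened, hypothetical header and rows for illustration only.
columns = parse_line("SupplierID,CompanyName,ContactName")
unquoted_fields = parse_line("7,Pavlova,Ltd.,Ian Devling")
quoted_fields = parse_line('7,"Pavlova, Ltd.",Ian Devling')

# The unquoted row has one field more than the header, so every value
# after the company name lands one column too far to the right.
print(len(columns), len(unquoted_fields), len(quoted_fields))
print(quoted_fields[1])
```

If the source file does contain such rows, quoting the affected values before upload is one way to keep the columns aligned.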


In [39]:
# Create a view or table

temp_table_name = "suppliers_csv"

df.createOrReplaceTempView(temp_table_name)

In [40]:
%sql

/* Query the created temp table in a SQL cell */

select * from `suppliers_csv`

SupplierID,CompanyName,ContactName,ContactTitle,Address,City,Region,PostalCode,Country,Phone,Fax,HomePage
1,Exotic Liquids,Charlotte Cooper,Purchasing Manager,49 Gilbert St.,London,,EC1 4SD,UK,(171) 555-2222,,
2,New Orleans Cajun Delights,Shelley Burke,Order Administrator,P.O. Box 78934,New Orleans,LA,70117,USA,(100) 555-4822,,#CAJUN.HTM#
3,Grandma Kelly's Homestead,Regina Murphy,Sales Representative,707 Oxford Rd.,Ann Arbor,MI,48104,USA,(313) 555-5735,(313) 555-3349,
4,Tokyo Traders,Yoshi Nagase,Marketing Manager,9-8 Sekimai Musashino-shi,Tokyo,,100,Japan,(03) 3555-5011,,
5,Cooperativa de Quesos 'Las Cabras',Antonio del Valle Saavedra,Export Administrator,Calle del Rosal 4,Oviedo,Asturias,33007,Spain,(98) 598 76 54,,
6,Mayumi's,Mayumi Ohno,Marketing Representative,92 Setsuko Chuo-ku,Osaka,,545,Japan,(06) 431-7877,,Mayumi's (on the World Wide Web)#http://www.microsoft.com/accessdev/sampleapps/mayumi.htm#
7,Pavlova,Ltd.,Ian Devling,Marketing Manager,74 Rose St. Moonie Ponds,Melbourne,Victoria,3058,Australia,(03) 444-2343,(03) 444-6588
8,Specialty Biscuits,Ltd.,Peter Wilson,Sales Representative,29 King's Way,Manchester,,M14 GSD,UK,(161) 555-4448,
9,PB Knäckebröd AB,Lars Peterson,Sales Agent,Kaloadagatan 13,Göteborg,,S-345 67,Sweden,031-987 65 43,031-987 65 91,
10,Refrescos Americanas LTDA,Carlos Diaz,Marketing Manager,Av. das Americanas 12.890,Sao Paulo,,5442,Brazil,(11) 555 4640,,


In [41]:
# A temp view is only available to this particular notebook. If you'd like other users to be able to query this data, you can also create a permanent table from the DataFrame.
# Once saved, the table persists across cluster restarts and can be queried by other users from other notebooks.
# To do so, choose your table name and run the line below.

permanent_table_name = "suppliers_csv"

df.write.format("parquet").saveAsTable(permanent_table_name)

In [42]:
# File location and type
file_location = "/FileStore/tables/territories-1.csv"
file_type = "csv"

# CSV options
infer_schema = "false"
first_row_is_header = "true"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
  .option("inferSchema", infer_schema) \
  .option("header", first_row_is_header) \
  .option("sep", delimiter) \
  .load(file_location)

display(df)

TerritoryID,TerritoryDescription,RegionID
1581,Westboro,1
1730,Bedford,1
1833,Georgetow,1
2116,Boston,1
2139,Cambridge,1
2184,Braintree,1
2903,Providence,1
3049,Hollis,3
3801,Portsmouth,3
6897,Wilton,1


In [43]:
# Create a view or table

temp_table_name = "territories_csv"

df.createOrReplaceTempView(temp_table_name)

In [44]:
%sql

/* Query the created temp table in a SQL cell */

select * from `territories_csv`

TerritoryID,TerritoryDescription,RegionID
1581,Westboro,1
1730,Bedford,1
1833,Georgetow,1
2116,Boston,1
2139,Cambridge,1
2184,Braintree,1
2903,Providence,1
3049,Hollis,3
3801,Portsmouth,3
6897,Wilton,1


In [45]:
# A temp view is only available to this particular notebook. If you'd like other users to be able to query this data, you can also create a permanent table from the DataFrame.
# Once saved, the table persists across cluster restarts and can be queried by other users from other notebooks.
# To do so, choose your table name and run the line below.

permanent_table_name = "territories_csv"

df.write.format("parquet").saveAsTable(permanent_table_name)
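The load, register, and save cells above differ only in the file path and the table name, so they could be folded into one loop. The sketch below is one possible way to do that, not the notebook's own method. The helpers `table_name_from_path` and `load_and_save` are hypothetical names, and the rule of dropping a trailing numeric suffix such as `-2` is an assumption inferred from the paths used above.

```python
import posixpath
import re


def table_name_from_path(file_location):
    """Derive a name like 'regions_csv' from '/FileStore/tables/regions-2.csv'."""
    stem, ext = posixpath.splitext(posixpath.basename(file_location))
    stem = re.sub(r"-\d+$", "", stem)  # drop a trailing numeric suffix like "-2"
    return f"{stem}_{ext.lstrip('.')}"


def load_and_save(spark, file_location):
    """Read one CSV with the options used above, then register and save it."""
    df = (
        spark.read.format("csv")
        .option("inferSchema", "false")
        .option("header", "true")
        .option("sep", ",")
        .load(file_location)
    )
    name = table_name_from_path(file_location)
    df.createOrReplaceTempView(name)
    df.write.format("parquet").saveAsTable(name)
    return df


file_locations = [
    "/FileStore/tables/regions-2.csv",
    "/FileStore/tables/shippers-2.csv",
    "/FileStore/tables/suppliers-3.csv",
    "/FileStore/tables/territories-1.csv",
]
table_names = [table_name_from_path(p) for p in file_locations]
print(table_names)
```

In a Databricks notebook, where `spark` is already defined, the loop would be `for path in file_locations: load_and_save(spark, path)`.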