# Download NYC Yellow Taxi trip data and save as a managed table in your catalog
## Prerequisites
- Create a Databricks workspace with Unity Catalog enabled
- Create a storage account with hierarchical namespace enabled
- Create a container
- Create a folder inside the container to store your Unity Catalog data
- Create an Access Connector for Azure Databricks with a system-assigned managed identity enabled
- Assign the access connector's managed identity the Storage Blob Data Contributor role on the storage account
- Create a new storage credential using the access connector
- Create a new external location using the storage credential
- Create a new catalog called 'dev' using the external location
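
The last two prerequisites can also be scripted instead of clicked through in Catalog Explorer. The following is a minimal sketch, not a definitive setup: the storage account, container, folder, credential, and location names are all hypothetical placeholders, and it assumes the storage credential already exists. It only builds and prints the SQL statements; in a Databricks notebook you would pass each one to `spark.sql`.

```python
# Hypothetical names: replace them with the values from your own Azure setup.
STORAGE_ACCOUNT = "mystorageaccount"
CONTAINER = "unity"
FOLDER = "catalog-data"
CREDENTIAL = "my_storage_credential"  # assumed to exist already
LOCATION = "dev_location"
CATALOG = "dev"


def abfss_url(container, account, folder):
    """Build the ADLS Gen2 (abfss) URL for a folder inside a container."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{folder}"


def setup_statements():
    """Return the SQL that creates the external location and the catalog."""
    url = abfss_url(CONTAINER, STORAGE_ACCOUNT, FOLDER)
    return [
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{LOCATION}` "
        f"URL '{url}' WITH (STORAGE CREDENTIAL `{CREDENTIAL}`)",
        f"CREATE CATALOG IF NOT EXISTS `{CATALOG}` MANAGED LOCATION '{url}'",
    ]


for statement in setup_statements():
    print(statement)
    # In a Databricks notebook, run it with: spark.sql(statement)
```

Printing the statements first makes it easy to review the generated URL before anything is created in the metastore.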


In [0]:
# Load the expanded NYC Yellow Taxi data set and keep only trips picked up in 2018
wasbs_path = 'wasbs://nyctlc@azureopendatastorage.blob.core.windows.net/yellow'
df = spark.read.parquet(wasbs_path).filter("puYear = 2018")

# Alternatively, you can use the abridged data set in the samples Delta Sharing catalog:
# df = spark.read.table("samples.nyctaxi.trips")
# Note: this table has a different schema, with fewer columns, than the data set at the URL above

In [0]:
# Save the data as a managed table in the dev catalog
df.write.mode("overwrite").saveAsTable("dev.default.yellow_taxi")
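
To check that the table was registered as a managed table rather than an external one, one option is to inspect the output of `DESCRIBE TABLE EXTENDED`. The sketch below assumes the detailed table information includes a row whose `col_name` is `Type`; the helper name `table_type` is hypothetical, and it takes the session as a parameter so it does not depend on the notebook's global `spark`.

```python
def table_type(spark, table_name):
    """Return the 'Type' field from DESCRIBE TABLE EXTENDED, or None if absent."""
    rows = spark.sql(f"DESCRIBE TABLE EXTENDED {table_name}").collect()
    for row in rows:
        # Each row has the columns col_name, data_type, and comment.
        if row["col_name"] == "Type":
            return row["data_type"]
    return None


# In a Databricks notebook, where `spark` is predefined:
# print(table_type(spark, "dev.default.yellow_taxi"))
```

For a table written with `saveAsTable` and no explicit path, the reported type should be `MANAGED`.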

In [0]:
display(spark.read.table("dev.default.yellow_taxi"))

vendorID,tpepPickupDateTime,tpepDropoffDateTime,passengerCount,tripDistance,puLocationId,doLocationId,startLon,startLat,endLon,endLat,rateCodeId,storeAndFwdFlag,paymentType,fareAmount,extra,mtaTax,improvementSurcharge,tipAmount,tollsAmount,totalAmount,puYear,puMonth
2,2018-03-24T17:42:04.000Z,2018-03-25T01:04:03.000Z,4,3.98,151,229,,,,,1,N,1,17.0,0.0,0.5,0.3,1.78,0.0,21.53,2018,3
2,2018-02-28T13:53:50.000Z,2018-03-01T13:05:23.000Z,1,1.55,161,50,,,,,1,N,2,11.0,0.0,0.5,0.3,0.0,0.0,11.8,2018,3
2,2018-02-28T23:43:27.000Z,2018-03-01T00:03:25.000Z,5,4.81,230,13,,,,,1,N,1,18.0,0.5,0.5,0.3,3.86,0.0,23.16,2018,3
2,2018-02-28T13:09:42.000Z,2018-03-01T13:02:33.000Z,2,3.34,238,229,,,,,1,N,2,15.0,0.0,0.5,0.3,0.0,0.0,15.8,2018,3
2,2018-02-28T23:19:55.000Z,2018-03-01T00:12:57.000Z,1,14.44,100,210,,,,,1,N,1,48.5,0.5,0.5,0.3,5.0,5.76,60.56,2018,3
2,2018-02-28T18:09:14.000Z,2018-03-01T18:00:46.000Z,2,0.44,239,143,,,,,1,N,2,4.0,1.0,0.5,0.3,0.0,0.0,5.8,2018,3
2,2018-02-28T23:54:44.000Z,2018-03-01T00:08:16.000Z,4,2.42,24,42,,,,,1,N,2,11.5,0.5,0.5,0.3,0.0,0.0,12.8,2018,3
2,2018-02-28T14:48:41.000Z,2018-03-01T00:00:00.000Z,2,17.51,132,107,,,,,2,N,1,52.0,0.0,0.5,0.3,0.0,5.76,58.56,2018,3
1,2018-02-28T23:59:07.000Z,2018-03-01T00:06:45.000Z,1,2.3,162,236,,,,,1,N,1,9.0,0.5,0.5,0.3,2.06,0.0,12.36,2018,3
2,2018-02-28T11:36:48.000Z,2018-03-01T11:33:09.000Z,3,0.33,234,164,,,,,1,N,2,6.0,0.0,0.5,0.3,0.0,0.0,6.8,2018,3
