# Explore the NYC taxi trip Hudi snapshot BigLake table

This notebook explores the NYC taxi trips snapshot (yellow and green cabs) stored in the Hudi data lake. The snapshot is exposed as a BigLake external table and queried from Spark through the BigQuery Spark connector.

### 1. Get or create Spark Session with requisite Hudi configs

In [2]:
from pyspark.sql import SparkSession

spark = SparkSession.builder \
  .appName("NYC Taxi Hudi BigLake Explorer-PySpark") \
  .master("yarn") \
  .enableHiveSupport() \
  .getOrCreate()

23/07/11 05:04:21 WARN SparkSession: Using an existing Spark session; only runtime SQL configurations will take effect.


In [3]:
spark

### 2. Variables

In [6]:
import os

PROJECT_ID = ""
PROJECT_NBR = ""

# Resolve the Google Cloud project ID and project number with gcloud
# (skipped when IS_TESTING is set, which leaves both values empty)
if not os.getenv("IS_TESTING"):
    project_id_output = !gcloud config list --format 'value(core.project)' 2>/dev/null
    PROJECT_ID = project_id_output[0]
    print("PROJECT_ID: ", PROJECT_ID)

    project_nbr_output = !gcloud projects describe $PROJECT_ID --format='value(projectNumber)'
    PROJECT_NBR = project_nbr_output[0]
    print("PROJECT_NBR: ", PROJECT_NBR)

BIGQUERY_SCRATCH_DATASET="gaia_product_ds"
BIGLAKE_FQN=f"{PROJECT_ID}.{BIGQUERY_SCRATCH_DATASET}.nyc_taxi_trips_hudi_biglake"

print("BIGQUERY_SCRATCH_DATASET: ", BIGQUERY_SCRATCH_DATASET)
print("BIGLAKE_FQN: ", BIGLAKE_FQN)

PROJECT_ID:  apache-hudi-lab
PROJECT_NBR:  623600433888
BIGQUERY_SCRATCH_DATASET:  gaia_product_ds
BIGLAKE_FQN:  apache-hudi-lab.gaia_product_ds.nyc_taxi_trips_hudi_biglake
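
When `IS_TESTING` is set, `PROJECT_ID` stays empty and the f-string above yields a name that starts with a dot. A minimal sketch of a guard against that, assuming a hypothetical helper named `biglake_fqn` that is not part of the original notebook:

```python
def biglake_fqn(project_id: str, dataset: str, table: str) -> str:
    """Join the three parts into a project.dataset.table name.

    Hypothetical helper (an assumption, not from the notebook): it raises
    instead of silently returning a name with an empty component.
    """
    parts = {"project_id": project_id, "dataset": dataset, "table": table}
    # Collect the names of any components that are empty or whitespace only
    missing = [name for name, value in parts.items() if not value.strip()]
    if missing:
        raise ValueError(f"Empty table name component(s): {', '.join(missing)}")
    return ".".join(value.strip() for value in parts.values())


# Values taken from the cell output above
fqn = biglake_fqn("apache-hudi-lab", "gaia_product_ds", "nyc_taxi_trips_hudi_biglake")
print(fqn)
```

Failing early here makes a missing project ID visible in the variables cell rather than later, when the query runs.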


### 3. Explore the Hudi taxi trips in BigLake

In [9]:
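# The BigQuery connector needs views enabled and a dataset for
# materializing results before it can load a SQL query
spark.conf.set("viewsEnabled", "true")
spark.conf.set("materializationDataset", BIGQUERY_SCRATCH_DATASET)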

sql = """
  SELECT trip_year, AVG(tip_amount) as avg_tips_in_dollars
    FROM
      gaia_product_ds.nyc_taxi_trips_hudi_biglake
    GROUP BY
      trip_year
    ORDER BY
      trip_year
  """
tripTipsDF = spark.read.format("bigquery").load(sql)
tripTipsDF.show()


+---------+-------------------+
|trip_year|avg_tips_in_dollars|
+---------+-------------------+
|     2019|        2.108474790|
|     2020|        2.038248925|
|     2021|        2.310335055|
|     2022|        2.719184782|
+---------+-------------------+
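
The query above hard-codes the dataset and table name even though `BIGLAKE_FQN` already holds the fully qualified name. A possible variant builds the SQL from that name instead. This is a sketch only: `build_avg_tips_query` and `load_avg_tips` are hypothetical helper names, and the Spark call mirrors the `load(sql)` form used in the cell above.

```python
def build_avg_tips_query(table_fqn: str) -> str:
    """Return the average-tip-per-year query for the given table.

    Hypothetical helper (an assumption, not from the notebook). The name is
    wrapped in backticks because the project ID contains hyphens.
    """
    if "`" in table_fqn:
        raise ValueError("Table name must not contain backticks")
    return (
        "SELECT trip_year, AVG(tip_amount) AS avg_tips_in_dollars\n"
        f"FROM `{table_fqn}`\n"
        "GROUP BY trip_year\n"
        "ORDER BY trip_year"
    )


def load_avg_tips(spark, table_fqn: str):
    """Load the query result as a DataFrame through the BigQuery connector.

    Assumes viewsEnabled and materializationDataset are already set on the
    session, as in the cell above.
    """
    return spark.read.format("bigquery").load(build_avg_tips_query(table_fqn))


# Build the query for the table name printed in section 2
sql = build_avg_tips_query("apache-hudi-lab.gaia_product_ds.nyc_taxi_trips_hudi_biglake")
print(sql)
```

In the notebook this would be called as `load_avg_tips(spark, BIGLAKE_FQN).show()`, which keeps the table name in one place.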



In [None]:
%%javascript
// Shut down this notebook's session once exploration is finished
Jupyter.notebook.session.delete();

<IPython.core.display.Javascript object>