Module 4 Homework: Analytics Engineering with dbt In this homework, we'll use the dbt project in 04-analytics-engineering/taxi_rides_ny/ to transform NYC taxi data and answer questions by querying the models.
Setup Set up your dbt project following the setup guide Load the Green and Yellow taxi data for 2019-2020 into your warehouse Run dbt build to create all models and run tests After a successful build, you should have models like fct_trips, dim_zones, and fct_monthly_zone_revenue in your warehouse.
Question 1. dbt Lineage and Execution Given a dbt project with the following structure:
models/ βββ staging/ β βββ stg_green_tripdata.sql β βββ stg_yellow_tripdata.sql βββ intermediate/ βββ int_trips_unioned.sql (depends on stg_green_tripdata & stg_yellow_tripdata) If you run dbt run --select int_trips_unioned, what models will be built?
stg_green_tripdata, stg_yellow_tripdata, and int_trips_unioned (upstream dependencies) Any model with upstream and downstream dependencies to int_trips_unioned int_trips_unioned only int_trips_unioned, int_trips, and fct_trips (downstream dependencies) Question 2. dbt Tests You've configured a generic test like this in your schema.yml:
columns:
- name: payment_type
data_tests:
- accepted_values: arguments: values: [1, 2, 3, 4, 5] Your model fct_trips has been running successfully for months. A new value 6 now appears in the source data.
What happens when you run dbt test --select fct_trips?
dbt will skip the test because the model didn't change dbt will fail the test, returning a non-zero exit code dbt will pass the test with a warning about the new value dbt will update the configuration to include the new value Question 3. Counting Records in fct_monthly_zone_revenue After running your dbt project, query the fct_monthly_zone_revenue model.
What is the count of records in the fct_monthly_zone_revenue model?
12,998 14,120 12,184 15,421 Question 4. Best Performing Zone for Green Taxis (2020) Using the fct_monthly_zone_revenue table, find the pickup zone with the highest total revenue (revenue_monthly_total_amount) for Green taxi trips in 2020.
Which zone had the highest revenue?
East Harlem North Morningside Heights East Harlem South Washington Heights South Question 5. Green Taxi Trip Counts (October 2019) Using the fct_monthly_zone_revenue table, what is the total number of trips (total_monthly_trips) for Green taxis in October 2019?
500,234 350,891 384,624 421,509 Question 6. Build a Staging Model for FHV Data Create a staging model for the For-Hire Vehicle (FHV) trip data for 2019.
Load the FHV trip data for 2019 into your data warehouse Create a staging model stg_fhv_tripdata with these requirements: Filter out records where dispatching_base_num IS NULL Rename fields to match your project's naming conventions (e.g., PUlocationID β pickup_location_id) What is the count of records in stg_fhv_tripdata?
42,084,899 43,244,696 22,998,722 44,112,187 Submitting the solutions Form for submitting: https://courses.datatalks.club/de-zoomcamp-2026/homework/hw4 Solution Video: To be published after deadline Learning in Public We encourage everyone to share what they learned. This is called "learning in public".
Read more about the benefits here.
Example post for LinkedIn π Week 4 of Data Engineering Zoomcamp by @DataTalksClub complete!
Just finished Module 4 - Analytics Engineering with dbt. Learned how to:
β Build transformation models with dbt β Create staging, intermediate, and fact tables β Write tests to ensure data quality β Understand lineage and model dependencies β Analyze revenue patterns across NYC zones
Transforming raw data into analytics-ready models - the T in ELT!
Here's my homework solution:
Following along with this amazing free course - who else is learning data engineering?
You can sign up here: https://github.com/DataTalksClub/data-engineering-zoomcamp/ Example post for Twitter/X π Module 4 of Data Engineering Zoomcamp done!
- Analytics Engineering with dbt
- Transformation models & tests
- Data lineage & dependencies
- NYC taxi revenue analysis
My solution:
Free course by @DataTalksClub: https://github.com/DataTalksClub/data-engineering-zoomcamp/