A city traffic department wants to collect traffic data with UAV (drone) swarms at several locations across the city, and to use that data to improve traffic flow and to support a number of other undisclosed projects. You are responsible for building a scalable data warehouse to host the vehicle trajectory data extracted from footage captured by the drone swarms and by static roadside cameras.

The ELT framework helps analytics engineers in the city traffic department set up transformation workflows on an as-needed basis.

**Reference**

1. https://github.com/Nathnael12/DataEngineering_Datawarehouse_airflow

## Environment Setup

In [2]:
%run ~/prerun.ipynb

In [3]:
import os
from dotenv import load_dotenv
import pandas as pd

In [4]:
load_dotenv()

DBT_PROJECT_DIR = os.getenv('DBT_PROJECT_DIR')
SCHEMA = os.getenv('SCHEMA')

In [5]:
db_credentials = get_secret(secret_name='wysde')

USERNAME = db_credentials["RDS_POSTGRES_USERNAME"]
PASSWORD = db_credentials["RDS_POSTGRES_PASSWORD"]
HOST = db_credentials["RDS_POSTGRES_HOST"]
DBNAME = "postgres"
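
`get_secret` is not defined in this notebook; it presumably comes from `prerun.ipynb`. A minimal sketch of what such a helper could look like, assuming the credentials are stored as a JSON string in AWS Secrets Manager. The `client` and `region_name` parameters and the stub class are hypothetical and exist only for illustration:

```python
import json


def get_secret(secret_name, client=None, region_name="us-east-1"):
    """Fetch a JSON secret and return it as a dict.

    Hypothetical stand-in for the helper loaded by prerun.ipynb.
    The default region is an assumption.
    """
    if client is None:
        # boto3 is imported lazily so the function can be exercised
        # with a stub client when AWS access is unavailable.
        import boto3

        client = boto3.client("secretsmanager", region_name=region_name)
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response["SecretString"])


class StubSecretsClient:
    """Offline stub that mimics the one client method used above."""

    def get_secret_value(self, SecretId):
        # Placeholder values, not real credentials.
        payload = {
            "RDS_POSTGRES_USERNAME": "postgres",
            "RDS_POSTGRES_HOST": "localhost",
        }
        return {"SecretString": json.dumps(payload)}


demo_credentials = get_secret("wysde", client=StubSecretsClient())
print(demo_credentials["RDS_POSTGRES_USERNAME"])
```

Passing the client in as a parameter keeps the helper testable without network access; in the notebook itself the call would stay `get_secret(secret_name='wysde')`.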

### dbt

In [8]:
!mkdir -p $DBT_PROJECT_DIR/models

In [23]:
%%writefile $DBT_PROJECT_DIR/profiles.yml
default:
  outputs:
    dev:
      type: postgres
      threads: 2
      host: {HOST}
      port: 5432
      user: {USERNAME}
      pass: "{PASSWORD}"
      dbname: {DBNAME}
      schema: {SCHEMA}
  target: dev
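
The stock IPython `%%writefile` magic normally writes the cell body verbatim, so the `{HOST}`-style placeholders above are presumably filled in by something defined in `prerun.ipynb`. If that is not available, the same file can be rendered in plain Python. This is a sketch under that assumption; `PROFILE_TEMPLATE` and `render_profile` are hypothetical names, and the values passed in are placeholders:

```python
import json

PROFILE_TEMPLATE = """\
default:
  outputs:
    dev:
      type: postgres
      threads: 2
      host: {host}
      port: 5432
      user: {user}
      pass: {password}
      dbname: {dbname}
      schema: {schema}
  target: dev
"""


def render_profile(host, user, password, dbname, schema):
    """Return the profiles.yml text with the connection values filled in."""
    return PROFILE_TEMPLATE.format(
        host=host,
        user=user,
        # json.dumps emits a double-quoted string, a form YAML also reads,
        # so a password containing ':' or '#' is not misparsed.
        password=json.dumps(password),
        dbname=dbname,
        schema=schema,
    )


# Placeholder values; in the notebook these would be HOST, USERNAME,
# PASSWORD, DBNAME and SCHEMA.
profile_text = render_profile(
    "localhost", "postgres", "s3cr:et#", "postgres", "vehicle_trajectory"
)
print(profile_text)
```

The rendered text could then be saved with `pathlib.Path(DBT_PROJECT_DIR, "profiles.yml").write_text(profile_text)`.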

In [29]:
%%writefile $DBT_PROJECT_DIR/dbt_project.yml
name: 'VehicleTrajectory'
version: '1.0.0'
config-version: 2

profile: 'default'

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

target-path: "target"  
clean-targets:         
  - "target"
  - "dbt_packages"

In [27]:
!dbt debug --profiles-dir $DBT_PROJECT_DIR --project-dir $DBT_PROJECT_DIR

03:53:12  Running with dbt=1.3.1
dbt version: 1.3.1
python version: 3.9.7
python path: /Users/sparshagarwal/anaconda3/envs/env-spacy/bin/python
os info: macOS-10.16-x86_64-i386-64bit
Using profiles.yml file at /Users/sparshagarwal/Desktop/projects/recohut/de/data-engineering-private/workshops/_drone-analytics/_dbt/profiles.yml
Using dbt_project.yml file at /Users/sparshagarwal/Desktop/projects/recohut/de/data-engineering-private/workshops/_drone-analytics/_dbt/dbt_project.yml

Configuration:
  profiles.yml file [OK found and valid]
  dbt_project.yml file [OK found and valid]

Required dependencies:
 - git [OK found]

Connection:
  host: database-1.cy8ltogyfgas.us-east-1.rds.amazonaws.com
  port: 5432
  user: postgres
  database: postgres
  schema: vehicle_trajectory
  search_path: None
  keepalives_idle: 0
  sslmode: None
  Connection test: [OK connection ok]

All checks passed!


In [28]:
%%writefile $DBT_PROJECT_DIR/models/schema.yml
version: 2

sources:
  - name: source
    schema: {SCHEMA}
    tables:
      - name: endpoints_location
      - name: endpoints_trafficinfo
      
# models:
#   - name: dim_types
#     columns:
#       - name: Id
#         tests:
#           - unique

#   - name: fast_vehicles
#     description: "Query fast vehicles"
    
#   - name: vehicles_summary
#     description: "A summary of vehicles by distance and speed"
  
#   - name: fast_vehicles_summary
#     description: "A summary of vehicles by distance and speed"

#   - name: timely_summary
#     description: "A summary of vehicles by speed"

#   - name: speed_timely_summary
#     description: "A summary of vehicles by distance and speed"

#   - name: lat_timely_summary
#     description: "A summary of vehicles by distance and speed"

#   - name: lon_timely_summary
#     description: "A summary of vehicles by distance and speed"

## EDA

In [10]:
df_trafficinfo = pd.read_csv("data/endpoints_trafficinfo.csv")
df_trafficinfo

| Unnamed: 0 | id | track_id | type | traveled_d | avg_speed | lat | lon | speed | lon_acc | lat_acc | time | location_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 342 | 343 | Medium Vehicle | 177.75 | 9.724672 | 37.978357 | 23.737813 | 22.2933 | 0.0211 | -0.0124 | 311.0 | 43.0 |
| 1 | 1 | 2 | Car | 103.37 | 18.985780 | 37.978027 | 23.737237 | 19.5936 | -0.0688 | -0.8172 | 0.0 | 3.0 |
| 2 | 2 | 3 | Motorcycle | 130.36 | 32.589339 | 37.978110 | 23.737129 | 0.0053 | 0.0083 | 0.0000 | 0.0 | 4.0 |
| 3 | 3 | 4 | Motorcycle | 160.70 | 26.537342 | 37.978128 | 23.737149 | 0.0022 | 0.0031 | 0.0000 | 0.0 | 4.0 |
| 4 | 4 | 5 | Car | 164.14 | 26.379138 | 37.978152 | 23.737110 | 0.0015 | 0.0024 | 0.0000 | 0.0 | 4.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 756 | 278 | 279 | Car | 228.54 | 12.855408 | 37.978318 | 23.735342 | 24.9794 | -0.0001 | 0.0772 | 250.4 | 1.0 |
| 757 | 279 | 280 | Car | 507.76 | 33.601963 | 37.980734 | 23.734999 | 30.5495 | 0.0834 | -0.0967 | 252.2 | 41.0 |
| 758 | 280 | 281 | Medium Vehicle | 508.87 | 34.827755 | 37.980751 | 23.735024 | 38.1639 | 0.0646 | 0.0463 | 252.4 | 42.0 |
| 759 | 281 | 282 | Car | 235.18 | 13.105756 | 37.978319 | 23.735339 | 25.5763 | 0.0233 | 0.0206 | 252.6 | 1.0 |
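
The `Unnamed: 0` column in the preview is presumably an index that was saved along with the CSV. A short sketch of how it could be absorbed on load and how the data might be summarized per vehicle type. It uses five rows and a subset of columns copied from the preview rather than the full file, and the summary column names are illustrative:

```python
import io

import pandas as pd

# Five rows and six columns copied from the preview; the header starts
# with an empty name to mimic the unnamed first column of the CSV.
sample_csv = io.StringIO(
    ",id,track_id,type,traveled_d,avg_speed\n"
    "0,342,343,Medium Vehicle,177.75,9.724672\n"
    "1,1,2,Car,103.37,18.985780\n"
    "2,2,3,Motorcycle,130.36,32.589339\n"
    "3,3,4,Motorcycle,160.70,26.537342\n"
    "4,4,5,Car,164.14,26.379138\n"
)

# index_col=0 turns the unnamed first column into the index
# instead of a column called "Unnamed: 0".
df_sample = pd.read_csv(sample_csv, index_col=0)

# Count vehicles and average the distance and speed for each type.
summary = df_sample.groupby("type").agg(
    vehicles=("track_id", "count"),
    mean_traveled_d=("traveled_d", "mean"),
    mean_avg_speed=("avg_speed", "mean"),
)
print(summary)
```

On the full file the same pattern would be `pd.read_csv("data/endpoints_trafficinfo.csv", index_col=0)`.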


## Data Load

In [24]:
!dbt seed --profiles-dir $DBT_PROJECT_DIR --project-dir $DBT_PROJECT_DIR

03:49:02  Running with dbt=1.3.1
03:49:02  Partial parse save file not found. Starting full parse.
03:49:08  Found 0 models, 2 tests, 0 snapshots, 0 analyses, 289 macros, 0 operations, 2 seed files, 0 sources, 0 exposures, 0 metrics
03:49:08  
03:49:23  Concurrency: 2 threads (target='dev')
03:49:23  
03:49:23  1 of 2 START seed file vehicle_trajectory.endpoints_location ................... [RUN]
03:49:23  2 of 2 START seed file vehicle_trajectory.endpoints_trafficinfo ................ [RUN]
03:49:27  1 of 2 OK loaded seed file vehicle_trajectory.endpoints_location ............... [INSERT 47 in 4.59s]
03:49:44  2 of 2 OK loaded seed file vehicle_trajectory.endpoints_trafficinfo ............ [INSERT 761 in 20.77s]
03:49:47  
03:49:47  Finished running 2 seeds in 0 hours 0 minutes and 39.11 seconds (39.11s).
03:49:47  
03:49:47  Completed successfully
03:49:47  
03:49:47  Done. PASS=2 WARN=0 ERROR=

## Transformation with dbt
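
The `dbt debug`, `dbt seed`, and `dbt run` calls in this notebook all pass the same two flags. As a sketch, the invocation could be wrapped in a small Python helper so that a failing dbt step raises an exception instead of only printing to the cell output. `dbt_command` and `run_dbt` are hypothetical names, and the path in the example is a placeholder:

```python
import subprocess


def dbt_command(subcommand, project_dir):
    """Build the argument list for a dbt CLI call.

    Assumes profiles.yml and dbt_project.yml live in the same directory,
    as they do in this notebook.
    """
    return [
        "dbt", subcommand,
        "--profiles-dir", project_dir,
        "--project-dir", project_dir,
    ]


def run_dbt(subcommand, project_dir):
    """Run dbt and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(dbt_command(subcommand, project_dir), check=True)


# Only builds and prints the command; run_dbt("run", ...) would execute it.
print(dbt_command("run", "/path/to/dbt_project"))
```

In the notebook, `run_dbt("run", DBT_PROJECT_DIR)` would be the equivalent of the shell cell that follows.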

In [12]:
!dbt run --profiles-dir $DBT_PROJECT_DIR --project-dir $DBT_PROJECT_DIR

05:11:23  Running with dbt=1.3.1
05:11:24  Found 17 models, 3 tests, 0 snapshots, 0 analyses, 289 macros, 0 operations, 2 seed files, 2 sources, 0 exposures, 0 metrics
05:11:24  
05:11:36  Concurrency: 2 threads (target='dev')
05:11:36  
05:11:36  1 of 17 START sql table model vehicle_trajectory.dim_types ..................... [RUN]
05:11:36  2 of 17 START sql view model vehicle_trajectory.distance_distribution .......... [RUN]
05:11:41  2 of 17 OK created sql view model vehicle_trajectory.distance_distribution ..... [CREATE VIEW in 5.51s]
05:11:41  1 of 17 OK created sql table model vehicle_trajectory.dim_types ................ [SELECT 6 in 5.51s]
05:11:41  3 of 17 START sql view model vehicle_trajectory.fast_vehicles .................. [RUN]
05:11:41  4 of 17 START sql view model vehicle_trajectory.max_distance ................... [RUN]
05:11:47  4 of 17 OK created sql view model vehicle_trajectory.max_distance .......