# Defining the Production source

## Adding an abstraction layer for testability 

By defining the ingestion source in an external table, we can easily switch from the production source to a test one.

This lets you replace ingestion from a Kafka server in production with a small CSV file in your tests.

This notebook corresponds to the PROD stream (the **green** input source on the left of the diagram).

<img width="1000px" src="https://github.com/QuentinAmbard/databricks-demo/raw/main/product_demos/dlt-advanecd/DLT-advanced-unit-test-1.png"/>


## Production Source for customer dataset


In prod, we'll use Auto Loader to ingest files from our prod landing folder.

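To make deployment more flexible, we'll go one step further and read the landing location from a DLT pipeline parameter (`mypipeline.landing_path`), falling back to a default path when the parameter is not set.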

In [0]:
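import dlt

# Default production landing folder, used when the pipeline sets no override.
DEFAULT_LANDING_PATH = "/Volumes/dbdemos/dbdemos_dlt_unit_test/raw_data/prod"

@dlt.view(comment="Raw user data - Production")
def raw_user_data():
  # Read the landing path from the pipeline configuration, falling back to the default.
  landing_path = spark.conf.get("mypipeline.landing_path", DEFAULT_LANDING_PATH)
  return (
    # Auto Loader (cloudFiles) incrementally picks up new JSON files.
    spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaHints", "id int")  # read id as an int
      .load(f"{landing_path}/users_json")
  )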

In [0]:
@dlt.view(comment="Raw spend data - Production")
def raw_spend_data():
  # Same parameter lookup as raw_user_data, so both views follow the same landing folder.
  landing_path = spark.conf.get("mypipeline.landing_path", DEFAULT_LANDING_PATH)
  return (
    # Auto Loader (cloudFiles) incrementally picks up new CSV files.
    spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("cloudFiles.schemaHints", "id int, age int, annual_income float, spending_core float")
      .load(f"{landing_path}/spend_csv")
  )
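
Both views resolve their source folder the same way: look up `mypipeline.landing_path` and fall back to `DEFAULT_LANDING_PATH`. The sketch below isolates that lookup so it can be checked outside a pipeline. It is only an illustration: `resolve_source_path` is a hypothetical helper that is not part of the demo, the `/tmp/hypothetical_test_landing` path is made up, and a plain `dict` stands in for `spark.conf` (both accept `.get(key, default)`).

```python
DEFAULT_LANDING_PATH = "/Volumes/dbdemos/dbdemos_dlt_unit_test/raw_data/prod"


def resolve_source_path(conf, dataset, default_root=DEFAULT_LANDING_PATH):
    """Hypothetical helper: build the folder a view would read from.

    `conf` is any object exposing .get(key, default), such as spark.conf
    inside a pipeline or a plain dict in a local check.
    """
    root = conf.get("mypipeline.landing_path", default_root)
    # Strip a trailing slash so an override like ".../landing/" still joins cleanly.
    return f"{root.rstrip('/')}/{dataset}"


# No override configured: the production default is used.
prod_conf = {}
# Override configured (made-up path): the views would read from it instead.
test_conf = {"mypipeline.landing_path": "/tmp/hypothetical_test_landing/"}

users_prod_path = resolve_source_path(prod_conf, "users_json")
spend_test_path = resolve_source_path(test_conf, "spend_csv")
```

Inside the pipeline, the same helper could be called as `resolve_source_path(spark.conf, "users_json")`, which would keep the lookup in one place instead of repeating it in each view.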