In [4]:
%%capture
!pip install "dlt[sql_database, duckdb]"
!pip install pymysql
!pip install pyyaml

# **Data Contracts**



## **1. Exporting Schema with dlt**
When you run a pipeline, `dlt` internally generates a `<>.schema.json` file. You can export this schema in YAML format to a directory of your choice by passing `export_schema_path="schemas/export"` to `dlt.pipeline()`.


In [3]:
import dlt

data = [
    {"id": "1", "name": "bulbasaur", "size": {"weight": 6.9, "height": 0.7}},
    {"id": "4", "name": "charmander", "size": {"weight": 8.5, "height": 0.6}},
    {"id": "25", "name": "pikachu", "size": {"weight": 6, "height": 0.4}},
]

pipeline = dlt.pipeline(
    pipeline_name="pokemon_load_1",
    destination="duckdb",
    dataset_name="pokemon_data_1",
    export_schema_path="schemas/export", # <--- dir path for a schema export
)

load_info = pipeline.run(data, table_name="pokemon", write_disposition="replace")

In [5]:
import yaml

with open('/content/schemas/export/pokemon_load_1.schema.yaml', 'r') as f:
    schema = yaml.safe_load(f)

# Print the exported schema as a dictionary
print(schema)

{'version': 2, 'version_hash': 'thKAYcoKWxfP0pc0y9Fo9XgQAjyRyndTpBt+py+wcik=', 'engine_version': 11, 'name': 'pokemon_load_1', 'tables': {'_dlt_version': {'columns': {'version': {'data_type': 'bigint', 'nullable': False}, 'engine_version': {'data_type': 'bigint', 'nullable': False}, 'inserted_at': {'data_type': 'timestamp', 'nullable': False}, 'schema_name': {'data_type': 'text', 'nullable': False}, 'version_hash': {'data_type': 'text', 'nullable': False}, 'schema': {'data_type': 'text', 'nullable': False}}, 'write_disposition': 'skip', 'resource': '_dlt_version', 'description': 'Created by DLT. Tracks schema updates'}, '_dlt_loads': {'columns': {'load_id': {'data_type': 'text', 'nullable': False}, 'schema_name': {'data_type': 'text', 'nullable': True}, 'status': {'data_type': 'bigint', 'nullable': False}, 'inserted_at': {'data_type': 'timestamp', 'nullable': False}, 'schema_version_hash': {'data_type': 'text', 'nullable': True}}, 'write_disposition': 'skip', 'resource': '_dlt_loads', 
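The printed dictionary is hard to read at a glance. As the output shows, tables sit under the `tables` key and each table keeps its columns under `columns`. The snippet below is a minimal sketch that summarizes that structure; `summarize_schema` is a hypothetical helper (not part of `dlt`), and `sample_schema` is a trimmed copy of the `_dlt_loads` entry from the output above.

```python
from typing import Any, Dict, List, Tuple


def summarize_schema(schema: Dict[str, Any]) -> Dict[str, List[Tuple[str, str, bool]]]:
    """Map each table name to a list of (column name, data type, nullable) tuples."""
    summary = {}
    for table_name, table in schema.get("tables", {}).items():
        summary[table_name] = [
            # Assumption: a column without an explicit "nullable" entry is treated as nullable
            (col_name, col.get("data_type", "unknown"), col.get("nullable", True))
            for col_name, col in table.get("columns", {}).items()
        ]
    return summary


# Trimmed sample copied from the printed output above
sample_schema = {
    "name": "pokemon_load_1",
    "tables": {
        "_dlt_loads": {
            "columns": {
                "load_id": {"data_type": "text", "nullable": False},
                "schema_name": {"data_type": "text", "nullable": True},
                "status": {"data_type": "bigint", "nullable": False},
            },
            "write_disposition": "skip",
        }
    },
}

for table_name, columns in summarize_schema(sample_schema).items():
    print(table_name)
    for col_name, data_type, nullable in columns:
        print(f"  {col_name}: {data_type} (nullable={nullable})")
```

In the notebook, the same helper could be called on the dictionary loaded from the exported YAML file.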

### **1.1 dlt Schema Contents**

A `table schema` may have the following properties:

- `name`
- `description`
- `parent`: The name of the parent table if this is a child table.
- `columns`: A dictionary of column schemas, keyed by column name, defining the table's structure.
- `write_disposition`: A hint telling `dlt` how new data coming into the table should be loaded.


A `column schema` may have the following properties:

- `name`
- `description`
- `data_type`
- `precision`: Defines the precision for text, timestamp, time, bigint, binary, and decimal types.
- `scale`: Defines the scale for the decimal type.
- `is_variant`: Indicates that the column was generated as a variant of another column.

A `column schema` may have the following basic hints:

- `nullable`
- `primary_key`
- `merge_key`: Marks the column as part of the merge key used for incremental loads.
- `foreign_key`
- `root_key`: Marks the column as part of a root key, a type of foreign key that always refers to the root table.
- `unique`


A `column schema` may have the following performance hints:

- `partition`: Marks the column to be used for partitioning data.
- `cluster`: Marks the column to be used for clustering data.
- `sort`: Marks the column as sortable or ordered; on some destinations, this may generate an index, even if the column is not unique.

> Each destination can interpret these performance hints in its own way. For example, the `cluster` hint is used by Redshift to define table distribution, by BigQuery to specify a cluster column, and is ignored by DuckDB and Postgres when creating tables.
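
To make the property and hint names concrete, here is a column schema written as a plain Python dictionary, together with a small check on its keys. This is only an illustration: `KNOWN_COLUMN_KEYS` and `unknown_keys` are hypothetical names built from the lists above, not `dlt`'s own validation.

```python
# Property and hint names taken from the lists above
KNOWN_COLUMN_KEYS = {
    "name", "description", "data_type", "precision", "scale", "is_variant",
    "nullable", "primary_key", "merge_key", "foreign_key", "root_key", "unique",
    "partition", "cluster", "sort",
}


def unknown_keys(column_schema: dict) -> set:
    """Return the keys that are not among the documented properties and hints."""
    return set(column_schema) - KNOWN_COLUMN_KEYS


# A hypothetical column schema for an "id" column
id_column = {
    "name": "id",
    "data_type": "bigint",
    "nullable": False,
    "primary_key": True,
    "sort": True,
}

print(unknown_keys(id_column))                        # set()
print(unknown_keys({"name": "id", "indexed": True}))  # {'indexed'}
```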

## **2. Data Contracts**

Data contracts govern data quality: they specify the format, schema, and protocols for data exchange between database entities.

They aim to prevent downstream disruptions and keep data transformations stable and reliable.

With `dlt`, you can specify data contracts at three levels: tables, columns, and data types.


### **2.1. Level 1 - Table Level Data Contracts**

`dlt` provides two options at the table level:
- `evolve`: Allows new tables to be created (no constraints on schema changes).
- `freeze`: Prevents new tables from being added to the schema.

In [6]:
import dlt

# Sample data to be loaded
data = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"}]

# Create a dlt pipeline
table_pipeline = dlt.pipeline(
    pipeline_name="data_contracts_table_level", destination="duckdb", dataset_name="mydata"
)

# Load the data to the "users" table
load_info = table_pipeline.run(data, table_name="users")
print(load_info)

# Print the row counts for each table that was loaded in the last run of the pipeline
print("\nNumber of new rows loaded into each table: ", table_pipeline.last_trace.last_normalize_info.row_counts)

Pipeline data_contracts_table_level load step completed in 0.20 seconds
1 load package(s) were loaded to destination duckdb and into dataset mydata
The duckdb destination used duckdb:////content/data_contracts_table_level.duckdb location to store data
Load package 1746105341.7485833 is LOADED and contains no failed jobs

Number of new rows loaded into each table:  {'_dlt_pipeline_state': 1, 'users': 2}


In [9]:
# Define a dlt resource that allows the creation of new tables
@dlt.resource(name="new_users", schema_contract={"tables": "evolve"})
def new_users(input_data):
  yield input_data

# Run the pipeline again with the above dlt resource to load the same data into a new table "new_users"
load_info = table_pipeline.run(new_users(data))
print(load_info)

# Print the row counts for each table that was loaded in the last run of the pipeline
print("\nNumber of new rows loaded into each table: ", table_pipeline.last_trace.last_normalize_info.row_counts)

Pipeline data_contracts_table_level load step completed in 0.13 seconds
1 load package(s) were loaded to destination duckdb and into dataset mydata
The duckdb destination used duckdb:////content/data_contracts_table_level.duckdb location to store data
Load package 1746107081.3727233 is LOADED and contains no failed jobs

Number of new rows loaded into each table:  {'new_users': 2}


In [8]:
# Define a dlt resource that prevents any changes to the schema at the table level (no new tables can be added)
@dlt.resource(schema_contract={"tables": "freeze"})
def no_new_tables(input_data):
  yield input_data

# Now, run the pipeline with the resource above, attempting to load the same data into "newest_users".
# This will fail, as new tables can't be added.
load_info = table_pipeline.run(no_new_tables(data), table_name="newest_users")
print(load_info)

PipelineStepFailed: Pipeline execution failed at stage extract when processing package 1746106960.0782938 with exception:

<class 'dlt.common.schema.exceptions.DataValidationError'>
In schema: data_contracts_table_level: In Schema: data_contracts_table_level Table: newest_users  . Contract on tables with mode freeze is violated. Trying to add table newest_users but new tables are frozen.

### **2.2. Level 2 - Column Level Data Contracts**

At the column level, you can specify:
- `evolve`: Allows for the addition of new columns or changes in the existing ones.
- `freeze`: Prevents any changes to the existing columns.
- `discard_row`: Skips rows that have new columns but loads those that follow the existing schema.
- `discard_value`: Doesn't skip entire rows. Instead, it only skips the values of new columns, loading the rest of the row data.
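
Before running the pipeline, it can help to see the four modes side by side. The snippet below is a pure-Python sketch of the behavior described in the list above; it is not how `dlt` implements contracts, and `apply_column_contract` and `ContractViolation` are hypothetical names used only for illustration.

```python
from typing import Optional


class ContractViolation(Exception):
    """Raised by this sketch when a frozen contract is violated."""


def apply_column_contract(row: dict, known_columns: set, mode: str) -> Optional[dict]:
    """Return the row as it would be loaded, or None if the whole row is dropped."""
    new_columns = set(row) - known_columns
    if not new_columns or mode == "evolve":
        return dict(row)  # nothing new, or new columns are allowed
    if mode == "freeze":
        raise ContractViolation(f"new columns are not allowed: {sorted(new_columns)}")
    if mode == "discard_row":
        return None  # skip the whole row
    if mode == "discard_value":
        # Keep the row, but drop the values of columns that are not in the schema
        return {key: value for key, value in row.items() if key in known_columns}
    raise ValueError(f"unknown contract mode: {mode}")


known = {"id", "name", "age"}
row = {"id": 4, "name": "Kate", "age": 79, "phone": "123-456-7890"}

print(apply_column_contract(row, known, "evolve"))
print(apply_column_contract(row, known, "discard_row"))    # None
print(apply_column_contract(row, known, "discard_value"))  # {'id': 4, 'name': 'Kate', 'age': 79}
```

With `freeze`, the same call raises `ContractViolation` instead of returning a row.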

In [17]:
import dlt

# Create a new pipeline
column_pipeline = dlt.pipeline(
    pipeline_name="data_contracts_column_level", destination="duckdb", dataset_name="mydata"
)

# Load the initial data containing columns "id" and "name" into the "users" table
load_info = column_pipeline.run([{"id": 1, "name": "Alice"}], table_name="users", write_disposition="replace")
print(load_info)

Pipeline data_contracts_column_level load step completed in 0.04 seconds
1 load package(s) were loaded to destination duckdb and into dataset mydata
The duckdb destination used duckdb:////content/data_contracts_column_level.duckdb location to store data
Load package 1746107734.1478782 is LOADED and contains no failed jobs


In [18]:
import duckdb
conn = duckdb.connect(f"{column_pipeline.pipeline_name}.duckdb")

conn.sql("SELECT * FROM mydata.users").df()


Unnamed: 0,id,name,_dlt_load_id,_dlt_id,age
0,1,Alice,1746107734.1478782,cx8Awz2KN61ecg,


**Scenario**
Assume that Alice ☝️ is the first user at your imaginary company, and you have now decided to collect users' ages as well.

When you load the information for your second user, Bob, who also provided his age 👇, the schema contract at the column level set to `evolve` will allow `dlt` to automatically adjust the schema in the destination database by adding a new column for "age".

In [19]:
@dlt.resource(schema_contract={"columns":"evolve"})
def allow_new_columns(input_data):
    yield input_data

# Now, load a new row into the same table, "users", which includes an additional column "age"
load_info = column_pipeline.run(allow_new_columns([{"id": 2, "name": "Bob", "age": 35}]), table_name="users")
print(load_info)

# View the data that has been loaded
print("\n")
conn = duckdb.connect(f"{column_pipeline.pipeline_name}.duckdb")
conn.sql("SELECT * FROM mydata.users").df()

Pipeline data_contracts_column_level load step completed in 0.03 seconds
1 load package(s) were loaded to destination duckdb and into dataset mydata
The duckdb destination used duckdb:////content/data_contracts_column_level.duckdb location to store data
Load package 1746107737.026151 is LOADED and contains no failed jobs




Unnamed: 0,id,name,_dlt_load_id,_dlt_id,age
0,1,Alice,1746107734.1478782,cx8Awz2KN61ecg,
1,2,Bob,1746107737.026151,yAgj/PaAzu+uYg,35.0


In [20]:
# Define a dlt resource that skips rows that have new columns but loads those that follow the existing schema
@dlt.resource(schema_contract={"columns": "discard_row"})
def discard_row(input_data):
   yield input_data

# Attempt to load two additional rows. Only the row that follows the existing schema will be loaded
load_info = column_pipeline.run(
    discard_row([
        {"id": 3, "name": "Sam", "age": 30}, # This row will be loaded
        {"id": 4, "name": "Kate", "age": 79, "phone": "123-456-7890"} # This row will not be loaded
    ]),
    table_name="users"
)
print(load_info)

# View the data that has been loaded
print("\n")
conn = duckdb.connect(f"{column_pipeline.pipeline_name}.duckdb")
conn.sql("SELECT * FROM mydata.users").df()

Pipeline data_contracts_column_level load step completed in 0.03 seconds
1 load package(s) were loaded to destination duckdb and into dataset mydata
The duckdb destination used duckdb:////content/data_contracts_column_level.duckdb location to store data
Load package 1746107738.5064209 is LOADED and contains no failed jobs




Unnamed: 0,id,name,_dlt_load_id,_dlt_id,age
0,1,Alice,1746107734.1478782,cx8Awz2KN61ecg,
1,2,Bob,1746107737.026151,yAgj/PaAzu+uYg,35.0
2,3,Sam,1746107738.5064209,vYh/evfPmGh0FA,30.0


*Kate's record was not added as it did not follow the current schema*

In [21]:
# Define a dlt resource that only skips the values of new columns, loading the rest of the row data
@dlt.resource(schema_contract={"columns": "discard_value"})
def discard_value(input_data):
   yield input_data

# Load two additional rows. Since we're using the "discard_value" resource, both rows will be added
# However, the "phone" column in the second row will be ignored and not loaded
load_info = column_pipeline.run(
    discard_value([
        {"id": 5, "name": "Sarah", "age": "23"},
        {"id": 6, "name": "Violetta", "age": "22", "phone": "666-513-4510"}
    ]),
    table_name="users"
)
print(load_info)

# View the data that has been loaded
print("\n")
conn = duckdb.connect(f"{column_pipeline.pipeline_name}.duckdb")
conn.sql("SELECT * FROM mydata.users").df()

Pipeline data_contracts_column_level load step completed in 0.02 seconds
1 load package(s) were loaded to destination duckdb and into dataset mydata
The duckdb destination used duckdb:////content/data_contracts_column_level.duckdb location to store data
Load package 1746107741.2735934 is LOADED and contains no failed jobs




Unnamed: 0,id,name,_dlt_load_id,_dlt_id,age
0,1,Alice,1746107734.1478782,cx8Awz2KN61ecg,
1,2,Bob,1746107737.026151,yAgj/PaAzu+uYg,35.0
2,3,Sam,1746107738.5064209,vYh/evfPmGh0FA,30.0
3,5,Sarah,1746107741.2735934,l9nybxZ8+fTcbA,23.0
4,6,Violetta,1746107741.2735934,GJVynndkU872Cg,22.0


*phone number field was discarded*

In [22]:
# Define a dlt resource that only skips the values of new columns, loading the rest of the row data
@dlt.resource(schema_contract={"columns": "discard_value"})
def discard_value(input_data):
   yield input_data

# Load two additional rows. Since we're using the "discard_value" resource, both rows will be added
# However, the "phone" column in the second row will be ignored and not loaded
load_info = column_pipeline.run(
    discard_value([
        {"id": 7, "name": "Tim", "age": "35"},
        {"id": 8, "name": "Max", "phone": "666-513-4510"}
    ]),
    table_name="users"
)
print(load_info)

# View the data that has been loaded
print("\n")
conn = duckdb.connect(f"{column_pipeline.pipeline_name}.duckdb")
conn.sql("SELECT * FROM mydata.users").df()

Pipeline data_contracts_column_level load step completed in 0.03 seconds
1 load package(s) were loaded to destination duckdb and into dataset mydata
The duckdb destination used duckdb:////content/data_contracts_column_level.duckdb location to store data
Load package 1746107766.3035574 is LOADED and contains no failed jobs




Unnamed: 0,id,name,_dlt_load_id,_dlt_id,age
0,1,Alice,1746107734.1478782,cx8Awz2KN61ecg,
1,2,Bob,1746107737.026151,yAgj/PaAzu+uYg,35.0
2,3,Sam,1746107738.5064209,vYh/evfPmGh0FA,30.0
3,5,Sarah,1746107741.2735934,l9nybxZ8+fTcbA,23.0
4,6,Violetta,1746107741.2735934,GJVynndkU872Cg,22.0
5,7,Tim,1746107766.3035574,+uschJmv0Vwrzg,35.0
6,8,Max,1746107766.3035574,jBRu8FS300e9Rg,


*In short, `discard_value` loads every value that matches the current schema and discards all other fields.*

In [23]:
# Define a dlt resource that does not allow new columns in the data,
# so no columns other than id, name and age can be created
@dlt.resource(schema_contract={"columns": "freeze"})
def no_new_columns(input_data):
  yield input_data

# Attempt to load a row with additional columns when the column contract is set to freeze
# This will fail as no new columns are allowed.
load_info = column_pipeline.run(
    no_new_columns([
        {"id": 9, "name": "Lisa", "age": 40, "phone": "098-765-4321"}
    ]),
    table_name="users"
)
print(load_info)

PipelineStepFailed: Pipeline execution failed at stage normalize when processing package 1746107830.3729026 with exception:

<class 'dlt.normalize.exceptions.NormalizeJobFailed'>
Job for users.0c6ddb9188.typed-jsonl failed terminally in load 1746107830.3729026 with message In schema: data_contracts_column_level: In Schema: data_contracts_column_level Table: users Column: phone . Contract on columns with mode freeze is violated. Trying to add column phone to table users but columns are frozen..

### **2.3. Level 3 - Data Type Level Data Contracts**

At the data type level, you can specify:
- `evolve`: Allows any data type. This may result in variant columns upstream: `dlt` creates a new variant column when it cannot validate incoming data against the column's existing data type.
- `freeze`: Prevents any changes to the existing data types.
- `discard_row`: Omits rows with unverifiable data types.
- `discard_value`: Replaces unverifiable values with None, but retains the rest of the row data.
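
The same four modes can be sketched in pure Python for a single integer column. This is an illustration of the list above, not `dlt`'s implementation: `coerce_to_int`, `apply_data_type_contract`, and `ContractViolation` are hypothetical names, the coercion rules are deliberately simplified, and the sketch assumes that no variant column exists yet for the incoming value.

```python
from typing import Any, Optional

# Variant suffixes that appear in this notebook's output: age__v_text and age__v_bool
TYPE_SUFFIX = {str: "text", bool: "bool"}


class ContractViolation(Exception):
    """Raised by this sketch when a frozen contract is violated."""


def coerce_to_int(value: Any) -> Optional[int]:
    """Return value as an int, or None when this sketch cannot coerce it."""
    if isinstance(value, bool):
        return None  # booleans are treated as their own type, not as 0/1
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            return None
    raise TypeError(f"type {type(value).__name__} is outside the scope of this sketch")


def apply_data_type_contract(row: dict, column: str, mode: str) -> Optional[dict]:
    """Apply a data type contract to one integer column of a row."""
    value = row.get(column)
    if value is None:
        return dict(row)  # nothing to validate
    coerced = coerce_to_int(value)
    if coerced is not None:
        return {**row, column: coerced}
    if mode == "evolve":
        # Move the value into a variant column named {column}__v_{type}
        rest = {key: val for key, val in row.items() if key != column}
        return {**rest, f"{column}__v_{TYPE_SUFFIX[type(value)]}": value}
    if mode == "freeze":
        raise ContractViolation(f"cannot validate {value!r} for column {column}")
    if mode == "discard_row":
        return None  # omit the whole row
    if mode == "discard_value":
        return {**row, column: None}  # keep the row, drop only the value
    raise ValueError(f"unknown contract mode: {mode}")


print(apply_data_type_contract({"id": 2, "name": "Bob", "age": "thirty-five"}, "age", "evolve"))
# {'id': 2, 'name': 'Bob', 'age__v_text': 'thirty-five'}
print(apply_data_type_contract({"id": 4, "name": "Kate", "age": False}, "age", "discard_row"))
# None
print(apply_data_type_contract({"id": 6, "name": "Violetta", "age": "twenty-eight"}, "age", "discard_value"))
# {'id': 6, 'name': 'Violetta', 'age': None}
```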

In [24]:
import dlt

# Create a pipeline for loading data
data_type_pipeline = dlt.pipeline(
    pipeline_name="data_contracts_data_type", destination="duckdb", dataset_name="mydata"
)

# Load the initial data containing a column "age" of type int
load_info = data_type_pipeline.run([{"id": 1, "name": "Alice", "age": 24}], table_name="users")
print(load_info)

# View the data that has been loaded
print("\n")
conn = duckdb.connect(f"{data_type_pipeline.pipeline_name}.duckdb")
conn.sql("SELECT * FROM mydata.users").df()

Pipeline data_contracts_data_type load step completed in 0.33 seconds
1 load package(s) were loaded to destination duckdb and into dataset mydata
The duckdb destination used duckdb:////content/data_contracts_data_type.duckdb location to store data
Load package 1746108012.274977 is LOADED and contains no failed jobs




Unnamed: 0,id,name,age,_dlt_load_id,_dlt_id
0,1,Alice,24,1746108012.274977,XEspFwszoXDAKw


In [25]:
# Define dlt resource that accepts all data types
@dlt.resource(schema_contract={"data_type": "evolve"})
def allow_any_data_type(input_data):
    yield input_data

# Now, load a new row where the "age" column is passed as a string but will be validated and stored as an integer
load_info = data_type_pipeline.run(allow_any_data_type([{"id": 2, "name": "Bob", "age": "35"}]), table_name="users")
print(load_info)

# View the data that has been loaded
print("\n")
conn = duckdb.connect(f"{data_type_pipeline.pipeline_name}.duckdb")
conn.sql("SELECT * FROM mydata.users").df()

Pipeline data_contracts_data_type load step completed in 0.14 seconds
1 load package(s) were loaded to destination duckdb and into dataset mydata
The duckdb destination used duckdb:////content/data_contracts_data_type.duckdb location to store data
Load package 1746108138.5348387 is LOADED and contains no failed jobs




Unnamed: 0,id,name,age,_dlt_load_id,_dlt_id
0,1,Alice,24,1746108012.274977,XEspFwszoXDAKw
1,2,Bob,35,1746108138.534839,8Uiy1dcwD92VJw


In [26]:
# If you pass the age as "thirty-five", a new variant column will be added
load_info = data_type_pipeline.run(allow_any_data_type([{"id": 2, "name": "Bob", "age": "thirty-five"}]), table_name="users")
print(load_info)

# View the data that has been loaded
print("\n")
conn = duckdb.connect(f"{data_type_pipeline.pipeline_name}.duckdb")
conn.sql("SELECT * FROM mydata.users").df()

Pipeline data_contracts_data_type load step completed in 0.04 seconds
1 load package(s) were loaded to destination duckdb and into dataset mydata
The duckdb destination used duckdb:////content/data_contracts_data_type.duckdb location to store data
Load package 1746108212.2995057 is LOADED and contains no failed jobs




Unnamed: 0,id,name,age,_dlt_load_id,_dlt_id,age__v_text
0,1,Alice,24.0,1746108012.274977,XEspFwszoXDAKw,
1,2,Bob,35.0,1746108138.534839,8Uiy1dcwD92VJw,
2,2,Bob,,1746108212.2995057,zKmjBt6sAwYS9A,thirty-five


*A column named `age__v_text` was created. `dlt` names variant columns using the convention `{columnname}__v_{type}`.*

In [27]:
# Define dlt resource that omits rows with unverifiable data types
@dlt.resource(schema_contract={"data_type": "discard_row"})
def discard_row(input_data):
   yield input_data

# Attempt to load two additional rows. Only the row where all column types can be validated will be loaded
load_info = data_type_pipeline.run(
    discard_row([
        {"id": 3, "name": "Sam", "age": "35"}, # This row will be loaded
        {"id": 4, "name": "Kate", "age": False} # This row will not be loaded - changed it to boolean
    ]),
    table_name="users"
)
print(load_info)

# View the data that has been loaded
print("\n")
conn = duckdb.connect(f"{data_type_pipeline.pipeline_name}.duckdb")
conn.sql("SELECT * FROM mydata.users").df()

Pipeline data_contracts_data_type load step completed in 0.08 seconds
1 load package(s) were loaded to destination duckdb and into dataset mydata
The duckdb destination used duckdb:////content/data_contracts_data_type.duckdb location to store data
Load package 1746108395.313949 is LOADED and contains no failed jobs




Unnamed: 0,id,name,age,_dlt_load_id,_dlt_id,age__v_text
0,1,Alice,24.0,1746108012.274977,XEspFwszoXDAKw,
1,2,Bob,35.0,1746108138.534839,8Uiy1dcwD92VJw,
2,2,Bob,,1746108212.2995057,zKmjBt6sAwYS9A,thirty-five
3,3,Sam,35.0,1746108395.313949,5C5qkDQZB3S6OA,


*With `discard_row` at the data type level, every field must match the data type in the current schema; otherwise the whole row is discarded.*

In [28]:
# Define a dlt resource that replaces unverifiable values with None, but retains the rest of the row data
@dlt.resource(schema_contract={"data_type": "discard_value"})
def discard_value(input_data):
   yield input_data

# Load two additional rows. Since we're using the "discard_value" resource, both rows will be added.
# However, the "age" value "twenty-eight" in the second row cannot be loaded into the "age" column;
# it lands in the "age__v_text" variant column that was created earlier.
load_info = data_type_pipeline.run(
    discard_value([
        {"id": 5, "name": "Sarah", "age": 23},
        {"id": 6, "name": "Violetta", "age": "twenty-eight"}
    ]),
    table_name="users"
)
print(load_info)

# View the data that has been loaded
print("\n")
conn = duckdb.connect(f"{data_type_pipeline.pipeline_name}.duckdb")
conn.sql("SELECT * FROM mydata.users").df()

Pipeline data_contracts_data_type load step completed in 0.04 seconds
1 load package(s) were loaded to destination duckdb and into dataset mydata
The duckdb destination used duckdb:////content/data_contracts_data_type.duckdb location to store data
Load package 1746108911.73196 is LOADED and contains no failed jobs




Unnamed: 0,id,name,age,_dlt_load_id,_dlt_id,age__v_text
0,1,Alice,24.0,1746108012.274977,XEspFwszoXDAKw,
1,2,Bob,35.0,1746108138.534839,8Uiy1dcwD92VJw,
2,2,Bob,,1746108212.2995057,zKmjBt6sAwYS9A,thirty-five
3,3,Sam,35.0,1746108395.313949,5C5qkDQZB3S6OA,
4,5,Sarah,23.0,1746108911.73196,VeUcenAUCvhnDQ,
5,6,Violetta,,1746108911.73196,oM5d5tXuHsfkTQ,twenty-eight


In [30]:
# Define dlt resource that prevents any changes to the existing data types
@dlt.resource(schema_contract={"data_type": "freeze"})
def no_data_type_changes(input_data):
  yield input_data

# Attempt to load a row with a column value that can't be validated, in this case False
# This will fail as no data type changes are allowed with the "no_data_type_changes" resource
load_info = data_type_pipeline.run(no_data_type_changes([{"id": 7, "name": "Lisa", "age": False}]), table_name="users")
print(load_info)

PipelineStepFailed: Pipeline execution failed at stage normalize when processing package 1746109001.4054785 with exception:

<class 'dlt.normalize.exceptions.NormalizeJobFailed'>
Job for users.7e16e64735.typed-jsonl failed terminally in load 1746109001.4054785 with message In schema: data_contracts_data_type: In Schema: data_contracts_data_type Table: users Column: age__v_bool . Contract on data_type with mode freeze is violated. Trying to create new variant column age__v_bool to table users but data_types are frozen..

## **3. Using Pydantic and dlt**

**What is Pydantic?**
- A Python library for data validation.
- It uses type hints to automatically validate data and convert it to match the model definitions.

**Defining Schema with Pydantic**

```python
import dlt
from pydantic import BaseModel
from typing import List, Optional, Union

class Address(BaseModel):
    street: str
    city: str
    postal_code: str

class User(BaseModel):
    id: int
    name: str
    tags: List[str]
    email: Optional[str]
    address: Address
    status: Union[int, str]

@dlt.resource(name="user", columns=User)
def get_users():
    ...
```

This will set the schema contract to align with the default Pydantic behavior:
```python
{
  "tables": "evolve",
  "columns": "discard_value",
  "data_type": "freeze"
}
```


### **3.1. dlt Contracts with Pydantic**
In this section, we will see how `dlt` translates your data contract rules into Pydantic validation behaviors.


#### 1. Tables Contract

- This doesn't affect your Pydantic model
- It only applies when creating new tables in your database

#### 2. Columns Contract

- The contract modes for columns get mapped to Pydantic's "extra" behavior settings
- This mapping happens recursively for nested models
- The contract is applied when adding new columns to existing tables

The mapping works like this:

| Column Mode | Pydantic Extra | What it means |
|-------------|----------------|---------------|
| evolve | allow | Accept new fields not in the model and add them to the database |
| freeze | forbid | Reject any data with fields not defined in the model |
| discard_value | ignore | Accept but ignore fields not in the model |
| discard_row | forbid | Same as freeze - reject rows with extra fields |
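
The mapping in the table can be written down as a small lookup. This is a sketch for illustration only; `COLUMN_MODE_TO_PYDANTIC_EXTRA` is a hypothetical name, not a `dlt` constant. It also shows why the default contract from the previous section matches Pydantic's defaults: its column mode, `discard_value`, maps to `ignore`, which is the `extra` setting Pydantic uses when none is configured.

```python
# Mapping copied from the table above: dlt column mode -> Pydantic "extra" setting
COLUMN_MODE_TO_PYDANTIC_EXTRA = {
    "evolve": "allow",
    "freeze": "forbid",
    "discard_value": "ignore",
    "discard_row": "forbid",
}

# Default contract applied when a Pydantic model is passed as `columns`
default_pydantic_contract = {
    "tables": "evolve",
    "columns": "discard_value",
    "data_type": "freeze",
}

extra = COLUMN_MODE_TO_PYDANTIC_EXTRA[default_pydantic_contract["columns"]]
print(extra)  # ignore
```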

#### 3. Data Type Contract

This controls what happens when data doesn't match the expected type:

1. `evolve`: Creates a flexible model that accepts any data type (might create variant/mixed type columns)
2. `freeze`: Raises an error if data doesn't match the expected type
3. `discard_row`: Removes entire rows that have type mismatches
4. `discard_value`: Not supported yet


## **4. Extra Pointers**

---

- Unless you specify a schema contract, settings will default to `evolve` on all levels.

- The `schema_contract` argument accepts two forms:
  1. Full form: A detailed mapping of schema entities to their respective contract modes.
  ```python
  schema_contract={"tables": "freeze", "columns": "freeze", "data_type": "freeze"}
  ```
  2. Shorthand form: A single contract mode that will be uniformly applied to all schema entities.
  ```python
  schema_contract="freeze"
  ```

- Schema contracts can be defined for:
  1. `dlt` resources: The contract applies to the corresponding table and any child tables.
  ```python
  @dlt.resource(schema_contract={"columns": "evolve"})
  def items():
      ...
  ```
  2. `dlt` sources: The contract serves as a default for all resources within that source.
  ```python
  @dlt.source(schema_contract="freeze")
  def source():
      ...
  ```
  3. The `pipeline.run()`: This contract overrides any existing schema contracts.
  ```python
  pipeline.run(source(), schema_contract="freeze")
  ```

- You can change the contract on a `dlt` source via its `schema_contract` property.
```python
source = dlt.source(...)
source.schema_contract = {"tables": "evolve", "columns": "freeze", "data_type": "discard_row"}
```

- To update the contract for `dlt` resources, use `apply_hints`.
```python
resource.apply_hints(schema_contract={"tables": "evolve", "columns": "freeze"})
```

- The `discard_row` mode works at the table level: if two tables are in a parent-child relationship, such as `users` and `users__addresses`, and the contract is violated in the child table, the row in the child table (`users__addresses`) is discarded, while the corresponding parent row in the `users` table is still loaded.

- If a table is a `new table` that hasn't been created on the destination yet, `dlt` will allow the creation of new columns. During the first pipeline run, the column mode is temporarily changed to `evolve` and then reverted to the original mode. The following tables are considered new:
  1. Child tables inferred from the nested data.
  2. Dynamic tables created from the data during extraction.
  3. Tables containing incomplete columns - columns without a data type bound to them.

  > Note that tables with columns defined with Pydantic models are not considered new.
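
The two `schema_contract` forms and the override order described in the pointers above can be sketched in plain Python. This is a simplified illustration, not `dlt`'s own code: `expand_contract` and `resolve_contract` are hypothetical helpers, and the sketch assumes that a contract set at a higher-priority level replaces a lower one as a whole, whereas `dlt` may merge partial contracts differently.

```python
from typing import Dict, Optional, Union

Contract = Union[str, Dict[str, str]]
ENTITIES = ("tables", "columns", "data_type")
MODES = {"evolve", "freeze", "discard_row", "discard_value"}


def expand_contract(contract: Optional[Contract]) -> Dict[str, str]:
    """Expand a shorthand or partial contract into the full three-key form."""
    if contract is None:
        contract = {}
    if isinstance(contract, str):
        # Shorthand form: one mode applied to every schema entity
        contract = {entity: contract for entity in ENTITIES}
    unknown = set(contract) - set(ENTITIES)
    if unknown:
        raise ValueError(f"unknown schema entities: {sorted(unknown)}")
    # Entities that are not mentioned fall back to "evolve"
    full = {entity: contract.get(entity, "evolve") for entity in ENTITIES}
    for mode in full.values():
        if mode not in MODES:
            raise ValueError(f"unknown contract mode: {mode}")
    return full


def resolve_contract(
    resource: Optional[Contract] = None,
    source: Optional[Contract] = None,
    run: Optional[Contract] = None,
) -> Dict[str, str]:
    """Pick the contract that applies: run overrides resource, resource overrides source."""
    for candidate in (run, resource, source):
        if candidate is not None:
            return expand_contract(candidate)
    return expand_contract(None)  # no contract anywhere: evolve on all levels


print(expand_contract("freeze"))
# {'tables': 'freeze', 'columns': 'freeze', 'data_type': 'freeze'}
print(resolve_contract(resource={"columns": "discard_row"}, source="freeze"))
# {'tables': 'evolve', 'columns': 'discard_row', 'data_type': 'evolve'}
```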