# **Custom Destinations and Reverse ETL**

With dlt's `@dlt.destination` decorator, you can send data to destinations beyond the usual ones such as Postgres or BigQuery. For instance, you can send data to:
- APIs such as Notion or Slack
- Message queues
- Logging systems
- Custom data sinks

This is useful for reverse ETL, where data is pushed from a pipeline back into operational tools.

```python
import dlt
from dlt.common.typing import TDataItems
from dlt.common.schema import TTableSchema

@dlt.destination(
    batch_size=10, # how many items are batched together per function call
    loader_file_format="jsonl", # the format in which the load files are written
    name="my_custom_destination", # a custom name for the destination
    naming_convention="direct", # controls how table and column names are normalised; "direct" keeps all names unchanged
    max_table_nesting=0, # how deep nested fields are unnested into child tables; 0 means no unnesting
    skip_dlt_columns_and_tables=True, # whether dlt's internal tables and columns are withheld from the custom destination
    max_parallel_load_jobs=5, # upper limit on load jobs running at the same time
    loader_parallelism_strategy="table-sequential", # how load jobs are scheduled; here jobs for the same table run one after another
)
def my_destination(items: TDataItems, table: TTableSchema) -> None:
    ...
```

In [8]:
%%capture
!pip install pymysql duckdb dlt

## 1. Creating a Simple Print Output Custom Destination

In [6]:
import dlt
from dlt.common.typing import TDataItems # A single data item or a list as extracted from the data source
from dlt.common.schema import TTableSchema # TypedDict that defines properties of a table

@dlt.destination(batch_size=5)
def print_sink(items: TDataItems, table: TTableSchema):
  print(f"\nTable: {table['name']}")
  for item in items:
    print(item)

@dlt.resource
def simple_data():
    yield [{"id": i, "value": f"row-{i}"} for i in range(10)]


pipeline = dlt.pipeline("print_example", destination=print_sink)
load_info = pipeline.run(simple_data())


Table: simple_data
{'id': 0, 'value': 'row-0'}
{'id': 1, 'value': 'row-1'}
{'id': 2, 'value': 'row-2'}
{'id': 3, 'value': 'row-3'}
{'id': 4, 'value': 'row-4'}

Table: simple_data
{'id': 5, 'value': 'row-5'}
{'id': 6, 'value': 'row-6'}
{'id': 7, 'value': 'row-7'}
{'id': 8, 'value': 'row-8'}
{'id': 9, 'value': 'row-9'}


*The 10 rows reached the custom destination in two batches of five, as specified by `batch_size=5`.*

In [7]:
@dlt.destination(batch_size=2)
def print_sink(items: TDataItems, table: TTableSchema):
  print(f"\nTable: {table['name']}")
  for item in items:
    print(item)

@dlt.resource
def simple_data():
    yield [{"id": i, "value": f"row-{i}"} for i in range(6)]


pipeline = dlt.pipeline("print_example", destination=print_sink)
load_info = pipeline.run(simple_data())


Table: simple_data
{'id': 0, 'value': 'row-0'}
{'id': 1, 'value': 'row-1'}

Table: simple_data
{'id': 2, 'value': 'row-2'}
{'id': 3, 'value': 'row-3'}

Table: simple_data
{'id': 4, 'value': 'row-4'}
{'id': 5, 'value': 'row-5'}
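
The two runs above differ only in row count and `batch_size`, and the number of sink calls follows from simple arithmetic. The sketch below reproduces that arithmetic in plain Python. It does not use dlt's own batching code, and `chunked` and `expected_calls` are hypothetical helper names introduced only for illustration.

```python
from math import ceil
from typing import Iterator, List, Sequence


def chunked(rows: Sequence[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most batch_size rows (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])


def expected_calls(row_count: int, batch_size: int) -> int:
    """Number of sink calls needed for one table holding row_count rows."""
    return ceil(row_count / batch_size)


# Same data shapes as the two examples above.
rows = [{"id": i, "value": f"row-{i}"} for i in range(10)]
batches = list(chunked(rows, 5))
print(len(batches), [len(b) for b in batches])  # prints: 2 [5, 5]

rows = [{"id": i, "value": f"row-{i}"} for i in range(6)]
batches = list(chunked(rows, 2))
print(len(batches), [len(b) for b in batches])  # prints: 3 [2, 2, 2]
```

When the row count is not a multiple of the batch size, the last batch is simply shorter, so `expected_calls(7, 2)` gives 4.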


## 2. Creating Custom Destination - Notion Database

### 2.1. Initialise Database in Notion
Create a database in Notion with the columns
Accession (title), ID (text), Description (text)
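
In the Notion API, a title column is written with a `title` payload and a text column with a `rich_text` payload. As a minimal sketch of how the three columns above translate into a `properties` dictionary, the helper below builds that structure from plain strings. `build_properties` and `rich_text` are hypothetical names, the sample values are placeholders, and replacing a missing description with an empty string is an assumption rather than something the course notebook does.

```python
from typing import Any, Dict, List, Optional


def rich_text(content: Optional[str]) -> List[Dict[str, Any]]:
    """Wrap a string in Notion's rich-text list structure; None becomes ""."""
    return [{"text": {"content": content or ""}}]


def build_properties(
    accession: str, identifier: str, description: Optional[str]
) -> Dict[str, Any]:
    """Map one row onto the Accession / ID / Description columns."""
    return {
        "Accession": {"title": rich_text(accession)},          # title column
        "ID": {"rich_text": rich_text(identifier)},            # text column
        "Description": {"rich_text": rich_text(description)},  # text column
    }


# Placeholder values, not real data.
props = build_properties("EXAMPLE-ACC", "example_id", None)
print(props["Accession"])
print(props["Description"])
```

Keeping the mapping in a pure function makes it easy to check without calling the Notion API.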


### 2.2. Install Necessary Libraries and Fetch Notion Secret Key

1. Go to https://www.notion.so/profile/integrations
2. Create a new internal integration
3. Copy the key and store it in Google Colab's user data
4. Make sure to connect the page containing the database to the integration you created (on the page > options > connections > select the created integration)

In [9]:
%%capture
!pip install dlt pymysql notion-client

*Test whether your Notion integration works. You should receive 200 as the status code.*

In [18]:
import os
import requests
from google.colab import userdata

url = "https://api.notion.com/v1/search"

os.environ["NOTION_SECRET"] = userdata.get("NOTION_SECRET")

# Load token from environment variable
notion_token = os.getenv("NOTION_SECRET")

headers = {
    "Authorization": f"Bearer {notion_token}",
    "Content-Type": "application/json",
    "Notion-Version": "2022-06-28"
}

data = {
    "query": "dlt",
    "filter": {
        "value": "database",
        "property": "object"
    },
    "sort": {
        "direction": "ascending",
        "timestamp": "last_edited_time"
    }
}

response = requests.post(url, headers=headers, json=data)

print(response.status_code)
print(response.json())

200
{'object': 'list', 'results': [{'object': 'database', 'id': '1df9c4c8-7d7b-8000-be1b-e17d187a1b6e', 'cover': None, 'icon': None, 'created_time': '2025-04-24T15:47:00.000Z', 'created_by': {'object': 'user', 'id': '6fc75fc3-822e-42c0-9b58-cd1bff0ae559'}, 'last_edited_by': {'object': 'user', 'id': '6fc75fc3-822e-42c0-9b58-cd1bff0ae559'}, 'last_edited_time': '2025-04-24T16:10:00.000Z', 'title': [{'type': 'text', 'text': {'content': 'dlt Advanced Course', 'link': None}, 'annotations': {'bold': False, 'italic': False, 'strikethrough': False, 'underline': False, 'code': False, 'color': 'default'}, 'plain_text': 'dlt Advanced Course', 'href': None}], 'description': [], 'is_inline': False, 'properties': {'ID': {'id': '%5Cry%5C', 'name': 'ID', 'type': 'rich_text', 'rich_text': {}}, 'Description': {'id': 'pD%5Ex', 'name': 'Description', 'type': 'rich_text', 'rich_text': {}}, 'Accession': {'id': 'title', 'name': 'Accession', 'description': '', 'type': 'title', 'title': {}}}, 'parent': {'type':
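
The raw search response is hard to read. As a rough sketch, the helper below pulls out just the database IDs and titles from a payload shaped like the one printed above. `list_databases` is a hypothetical name, and `sample` is a trimmed copy of the fields visible in that output, not a full response.

```python
from typing import Any, Dict, List


def list_databases(payload: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return id and plain-text title for every database in a search payload."""
    found = []
    for result in payload.get("results", []):
        if result.get("object") != "database":
            continue  # skip pages and other object types
        # A title is a list of rich-text parts; join their plain text.
        title = "".join(part.get("plain_text", "") for part in result.get("title", []))
        found.append({"id": result["id"], "title": title})
    return found


# Trimmed copy of the response shown above.
sample = {
    "object": "list",
    "results": [
        {
            "object": "database",
            "id": "1df9c4c8-7d7b-8000-be1b-e17d187a1b6e",
            "title": [{"type": "text", "plain_text": "dlt Advanced Course"}],
        }
    ],
}
print(list_databases(sample))
```

In the notebook, this would be called as `list_databases(response.json())`. The `id` it returns is the kind of value a `database_id` parent expects.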

### 2.3. Load Data from MySQL Public Data

In [13]:
import dlt
from sqlalchemy import text
from dlt.sources.sql_database import sql_database

# using a query adapter to limit the number of rows ingested
def limit_rows(query, table):
    return text(f"SELECT * FROM {table.fullname} LIMIT 20")


source = sql_database(
    "mysql+pymysql://rfamro@mysql-rfam-public.ebi.ac.uk:4497/Rfam",
    table_names=["family",],
    query_adapter_callback=limit_rows
)
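
To see what statement a LIMIT-style adapter like `limit_rows` produces without connecting to the database, the sketch below builds the same SQL string from a stand-in object. `limit_rows_sql` and `fake_table` are hypothetical, and the stand-in assumes only that the table object exposes a `fullname` attribute, as the adapter above relies on. The real adapter wraps this string in `sqlalchemy.text()`.

```python
from types import SimpleNamespace


def limit_rows_sql(table, limit: int = 20) -> str:
    """Return the SQL string that the adapter would wrap in sqlalchemy.text()."""
    # int() guards against a non-numeric limit being interpolated into the SQL.
    return f"SELECT * FROM {table.fullname} LIMIT {int(limit)}"


# Stand-in for a SQLAlchemy Table; only the fullname attribute is modelled.
fake_table = SimpleNamespace(fullname="family")
print(limit_rows_sql(fake_table))  # prints: SELECT * FROM family LIMIT 20
```

Because the adapter replaces the whole query, any filtering or column selection present in the original query is dropped as well.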

### 2.4. Create Notion Custom Destination

In [15]:
import os
from google.colab import userdata
from notion_client import Client


os.environ["DESTINATION__NOTION__NOTION_AUTH"] = userdata.get('NOTION_SECRET')
os.environ["DESTINATION__NOTION__NOTION_PAGE_ID"] = userdata.get('NOTION_PAGE_ID')

@dlt.destination(name="notion")
def push_to_notion(items, table, notion_auth=dlt.secrets.value, notion_page_id=dlt.secrets.value):
    # notion_auth and notion_page_id are injected by dlt from the environment variables set above
    client = Client(auth=notion_auth)
    print(len(items))  # number of rows in this batch

    # create one Notion page (one database row) per item
    for item in items:
        client.pages.create(
            parent={"database_id": notion_page_id},
            properties={
                "Accession": {"title": [{"text": {"content": item["rfam_acc"]}}]},
                "ID": {"rich_text": [{"text": {"content": item["rfam_id"]}}]},
                "Description": {"rich_text": [{"text": {"content": item["description"]}}]},
            },
        )
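
The two `os.environ` assignments above follow dlt's environment-variable naming pattern: section names in upper case, joined by double underscores, ending with the argument name. The sketch below mirrors only that naming pattern and adds a check that the keys are set before the pipeline runs. `config_env_key` and `missing_keys` are hypothetical helpers, not part of dlt, and dlt can also resolve these values from other providers such as `secrets.toml`.

```python
import os
from typing import Iterable, List, Mapping


def config_env_key(*sections: str) -> str:
    """Join config sections into an upper-case, double-underscore env var name."""
    return "__".join(part.upper() for part in sections)


def missing_keys(required: Iterable[str], environ: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required keys that are unset or empty in environ."""
    return [key for key in required if not environ.get(key)]


auth_key = config_env_key("destination", "notion", "notion_auth")
page_key = config_env_key("destination", "notion", "notion_page_id")
print(auth_key)  # prints: DESTINATION__NOTION__NOTION_AUTH
print(page_key)  # prints: DESTINATION__NOTION__NOTION_PAGE_ID

# Lists whichever of the two keys are not yet set in this process.
print(missing_keys([auth_key, page_key]))
```

The middle section, `notion`, matches the `name="notion"` argument passed to `@dlt.destination`.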

### 2.5. Time to Test the Custom Destination

In [17]:
pipeline = dlt.pipeline("notion_pipeline", destination=push_to_notion, progress="log")
pipeline.run(source, table_name="rfam_family")
print(pipeline.last_trace)



------------------- Load sql_database in 1745510951.2799456 --------------------
Jobs: 1/2 (50.0%) | Time: 0.00s | Rate: 182361.04/s
Memory usage: 356.29 MB (12.50%) | CPU usage: 0.00%

10
10
------------------- Load sql_database in 1745510951.2799456 --------------------
Jobs: 1/2 (50.0%) | Time: 16.93s | Rate: 0.06/s
Memory usage: 356.29 MB (12.80%) | CPU usage: 0.00%

------------------- Load sql_database in 1745510951.2799456 --------------------
Jobs: 2/2 (100.0%) | Time: 16.93s | Rate: 0.12/s
Memory usage: 356.29 MB (12.80%) | CPU usage: 0.00%

Run started at 2025-04-24 16:10:31.760527+00:00 and COMPLETED in 16.96 seconds with 2 steps.
Step load COMPLETED in 16.94 seconds.
Pipeline notion_pipeline load step completed in 16.93 seconds
1 load package(s) were loaded to destination notion and into dataset None
The notion destination used <dlt.common.configuration.specs.base_configuration.CredentialsConfiguration object at 0x7aa8f5ba4a50> location to store data
Load package 1745510951