## Overview

`dlt` is an open-source library that you can add to your Python scripts to load data from various and often messy data sources into well-structured, live datasets.

How does it work?

`dlt` extracts data from a source, inspects its structure to generate a schema, organizes, normalizes and verifies the data, and loads the data into a destination, such as a database.


![img](images/dlt-high-level.png)

Below, we give you a preview of how you can get data from APIs, files, Python objects or pandas dataframes and move it into a local or remote database, data lake or a vector data store. 

Let's get started!

## Installation

Official releases of dlt can be installed from [PyPI](https://pypi.org/project/dlt/):

In [164]:
!pip install -q dlt

The command above installs only the library core. The example below uses `duckdb` as a [destination](https://dlthub.com/docs/dlt-ecosystem/destinations), so let's add it:

In [165]:
!pip install -q "dlt[duckdb]"

> Use a clean virtual environment for your experiments! Here are [detailed instructions](https://dlthub.com/docs/reference/installation).

## Quick start

Let's load a list of Python objects (dicts) into a `duckdb` database and inspect the created dataset.

> We will use `full_refresh` in our test examples. When you create a new pipeline script, you will experiment a lot. If you want the pipeline to reset its state and load data into a new dataset on every run, set the `full_refresh` argument of `dlt.pipeline` to `True`. Each time the pipeline is created, `dlt` adds a datetime-based suffix to the dataset name.

In [166]:
import dlt

data = [
    {'id': 1, 'name': 'Alice'},
    {'id': 2, 'name': 'Bob'}
]

pipeline = dlt.pipeline(
    pipeline_name='quick_start',
    destination='duckdb',
    dataset_name='mydata',
    full_refresh=True,
)
load_info = pipeline.run(data, table_name="users")
print(load_info)

Pipeline quick_start completed in 0.46 seconds
1 load package(s) were loaded to destination duckdb and into dataset mydata_20230904091046
The duckdb destination used duckdb:////home/alenaastrakhantseva/dlthub/spotlight_demo/quick_start.duckdb location to store data
Load package 1693818646.790694 is LOADED and contains no failed jobs
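
The dataset name in the output above ends with a timestamp. Here is a minimal sketch of how such a suffix could be built, assuming the `YYYYMMDDHHMMSS` layout that the output suggests. The helper name `suffixed_dataset_name` is hypothetical and not part of `dlt`:

```python
from datetime import datetime


def suffixed_dataset_name(dataset_name: str, created_at: datetime) -> str:
    # Append a second-resolution timestamp so every run targets a fresh dataset.
    return f"{dataset_name}_{created_at.strftime('%Y%m%d%H%M%S')}"


name = suffixed_dataset_name("mydata", datetime(2023, 9, 4, 9, 10, 46))
print(name)  # mydata_20230904091046
```

Because the suffix changes on every run, earlier datasets stay untouched, which is convenient while experimenting.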


### Now explore your data! 

To see the schema of the database you created, run the following command:

```sh
dlt pipeline <pipeline_name> show
```
[This command](https://dlthub.com/docs/reference/command-line-interface#show-tables-and-data-in-the-destination) generates and launches a simple Streamlit app that you can use to inspect the schemas and data in the destination.

The app requires `streamlit`, so install it first.

In the example above, the pipeline name is `quick_start`, so run:

In [139]:
!pip install -q streamlit pandas==2.0.0

In [167]:
!dlt pipeline quick_start show

Found pipeline quick_start in /home/alenaastrakhantseva/.dlt/pipelines

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://172.16.0.2:8501

^C
  Stopping...


## Load data from a variety of sources

Use dlt to load practically any data you deal with in your Python script into a dataset. 

The library creates and updates tables, infers data types and handles nested data automatically. Typical inputs include:
- list of dicts
- json
- csv
- API
- database
- etc.
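
To make "infer data types" more concrete, here is a toy sketch of type inference over a list of dicts. It is a simplified illustration, not `dlt`'s implementation. The function name `infer_columns` and the type mapping are assumptions chosen to resemble the `BIGINT` and `VARCHAR` column types that appear in the outputs later in this notebook:

```python
# Hypothetical mapping from Python types to SQL-like type names.
TYPE_NAMES = {bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE", str: "VARCHAR"}


def infer_columns(rows):
    """Return {column name: type name}, using the first non-null value seen."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            if value is None:
                continue  # a null value says nothing about the column type
            columns.setdefault(name, TYPE_NAMES.get(type(value), "VARCHAR"))
    return columns


schema = infer_columns([{"id": 1, "name": "Alice"}, {"id": 2, "name": None}])
print(schema)  # {'id': 'BIGINT', 'name': 'VARCHAR'}
```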

### From JSON

When creating a schema during normalization, `dlt` recursively unpacks nested structures, such as the JSON document below, into relational tables, creating and linking [children and parent tables](https://dlthub.com/docs/dlt-ecosystem/visualizations/understanding-the-tables#child-and-parent-tables).

In [169]:
# create test json file

import json

with open("test.json", 'w') as file:
    data = {
        'id': 1, 
        'name': 'Alice', 
        'job': {
            "company": "ScaleVector",
            "title": "Data Scientist",
        },
        'children': [
            {
                'id': 1, 
                'name': 'Eve'
            },
            {
                'id': 2, 
                'name': 'Wendy'
            }
        ]
    }
    json.dump(data, file)


In [170]:
# load test json to duckdb database

import json
import dlt


with open("test.json", 'r') as file:
    data = json.load(file)


pipeline = dlt.pipeline(
    pipeline_name='from_json',
    destination='duckdb',
    dataset_name='mydata',
    full_refresh=True,
)
# dlt works with lists of dicts, so wrap the data in a list
load_info = pipeline.run([data], table_name="json_data")
print(load_info)

Pipeline from_json completed in 0.56 seconds
1 load package(s) were loaded to destination duckdb and into dataset mydata_20230904091445
The duckdb destination used duckdb:////home/alenaastrakhantseva/dlthub/spotlight_demo/from_json.duckdb location to store data
Load package 1693818885.744856 is LOADED and contains no failed jobs


In [172]:
import duckdb

conn = duckdb.connect(f"{pipeline.pipeline_name}.duckdb")
conn.sql(f"SET search_path = '{pipeline.dataset_name}'")
display(conn.sql("DESCRIBE"))
data_table = conn.sql("SELECT * FROM json_data").df()
data_table

┌───────────┬──────────────────────┬─────────────────────┬──────────────────────┬──────────────────────────┬───────────┐
│ database  │        schema        │        name         │     column_names     │       column_types       │ temporary │
│  varchar  │       varchar        │       varchar       │      varchar[]       │        varchar[]         │  boolean  │
├───────────┼──────────────────────┼─────────────────────┼──────────────────────┼──────────────────────────┼───────────┤
│ from_json │ mydata_20230904091…  │ _dlt_loads          │ [load_id, schema_n…  │ [VARCHAR, VARCHAR, BIG…  │ false     │
│ from_json │ mydata_20230904091…  │ _dlt_pipeline_state │ [version, engine_v…  │ [BIGINT, BIGINT, VARCH…  │ false     │
│ from_json │ mydata_20230904091…  │ _dlt_version        │ [version, engine_v…  │ [BIGINT, BIGINT, TIMES…  │ false     │
│ from_json │ mydata_20230904091…  │ json_data           │ [id, name, job__co…  │ [BIGINT, VARCHAR, VARC…  │ false     │
│ from_json │ mydata_20230904091

Unnamed: 0,id,name,job__company,job__title,_dlt_load_id,_dlt_id
0,1,Alice,ScaleVector,Data Scientist,1693818885.744856,+j3y0Oxq0pFlWA
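
The `job__company` and `job__title` columns above come from flattening the nested `job` object. The following plain-Python sketch illustrates the idea: nested dicts become prefixed columns and nested lists become rows of a child table. It is a simplified illustration, not `dlt`'s implementation. The `normalize` function, the `_parent_id` column and the use of the record's own `id` for linking are assumptions made for brevity:

```python
from collections import defaultdict


def normalize(record, table, tables, parent_id=None):
    """Flatten one nested record into rows grouped by table name."""
    row = {} if parent_id is None else {"_parent_id": parent_id}
    pending = []  # nested lists, processed once the parent row is complete

    def walk(prefix, value):
        for key, item in value.items():
            column = f"{prefix}{key}"
            if isinstance(item, dict):
                walk(f"{column}__", item)  # nested dict -> prefixed columns
            elif isinstance(item, list):
                pending.append((column, item))  # nested list -> child table
            else:
                row[column] = item

    walk("", record)
    tables[table].append(row)
    for column, items in pending:
        for child in items:
            # Assumption: scalar list items are stored in a "value" column.
            child_record = child if isinstance(child, dict) else {"value": child}
            normalize(child_record, f"{table}__{column}", tables, row.get("id"))
    return tables


record = {
    "id": 1,
    "name": "Alice",
    "job": {"company": "ScaleVector", "title": "Data Scientist"},
    "children": [{"id": 1, "name": "Eve"}, {"id": 2, "name": "Wendy"}],
}
tables = normalize(record, "json_data", defaultdict(list))
for table_name, rows in tables.items():
    print(table_name, rows)
```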


### From API

Below we load the most recent issues from our [own dlt repository](https://github.com/dlt-hub/dlt) into the `issues` table. Without extra parameters, the GitHub API returns a single page of up to 30 items.

In [173]:
import dlt
import requests


# url to request dlt-hub/dlt issues
url = "https://api.github.com/repos/dlt-hub/dlt/issues"
# make the request and check if succeeded
response = requests.get(url)
response.raise_for_status()

pipeline = dlt.pipeline(
    pipeline_name='from_api',
    destination='duckdb',
    dataset_name='mydata',
    full_refresh=True,
)
load_info = pipeline.run(response.json(), table_name="issues")
print(load_info)

Pipeline from_api completed in 0.85 seconds
1 load package(s) were loaded to destination duckdb and into dataset mydata_20230904091800
The duckdb destination used duckdb:////home/alenaastrakhantseva/dlthub/spotlight_demo/from_api.duckdb location to store data
Load package 1693819080.431144 is LOADED and contains no failed jobs


In [174]:
import duckdb

conn = duckdb.connect(f"{pipeline.pipeline_name}.duckdb")
conn.sql(f"SET search_path = '{pipeline.dataset_name}'")
display(conn.sql("DESCRIBE"))
data_table = conn.sql("SELECT * FROM issues").df()
data_table.head()

┌──────────┬──────────────────────┬─────────────────────┬──────────────────────┬───────────────────────────┬───────────┐
│ database │        schema        │        name         │     column_names     │       column_types        │ temporary │
│ varchar  │       varchar        │       varchar       │      varchar[]       │         varchar[]         │  boolean  │
├──────────┼──────────────────────┼─────────────────────┼──────────────────────┼───────────────────────────┼───────────┤
│ from_api │ mydata_20230904091…  │ _dlt_loads          │ [load_id, schema_n…  │ [VARCHAR, VARCHAR, BIGI…  │ false     │
│ from_api │ mydata_20230904091…  │ _dlt_pipeline_state │ [version, engine_v…  │ [BIGINT, BIGINT, VARCHA…  │ false     │
│ from_api │ mydata_20230904091…  │ _dlt_version        │ [version, engine_v…  │ [BIGINT, BIGINT, TIMEST…  │ false     │
│ from_api │ mydata_20230904091…  │ issues              │ [url, repository_u…  │ [VARCHAR, VARCHAR, VARC…  │ false     │
│ from_api │ mydata_20230904091…

Unnamed: 0,url,repository_url,labels_url,comments_url,events_url,html_url,id,node_id,number,title,...,assignee__following_url,assignee__gists_url,assignee__starred_url,assignee__subscriptions_url,assignee__organizations_url,assignee__repos_url,assignee__events_url,assignee__received_events_url,assignee__type,assignee__site_admin
0,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt/issue...,https://github.com/dlt-hub/dlt/pull/607,1879448366,PR_kwDOGvRYu85ZcM8t,607,Added MongoDB documentation.,...,,,,,,,,,,
1,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt/issue...,https://github.com/dlt-hub/dlt/pull/606,1878822576,PR_kwDOGvRYu85ZaS70,606,Add support for TIME data type,...,,,,,,,,,,
2,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt/issue...,https://github.com/dlt-hub/dlt/issues/605,1877653146,I_kwDOGvRYu85v6raa,605,support TIME type,...,https://api.github.com/users/steinitzu/followi...,https://api.github.com/users/steinitzu/gists{/...,https://api.github.com/users/steinitzu/starred...,https://api.github.com/users/steinitzu/subscri...,https://api.github.com/users/steinitzu/orgs,https://api.github.com/users/steinitzu/repos,https://api.github.com/users/steinitzu/events{...,https://api.github.com/users/steinitzu/receive...,User,False
3,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt/issue...,https://github.com/dlt-hub/dlt/pull/603,1876027382,PR_kwDOGvRYu85ZQ-0y,603,Add module_config customization in the Weaviat...,...,,,,,,,,,,
4,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt/issue...,https://github.com/dlt-hub/dlt/pull/601,1873728536,PR_kwDOGvRYu85ZJKdR,601,Fixes docs on apply hints,...,,,,,,,,,,


## Append or replace your data

Run these examples twice and you will notice that each time a copy of the data is added to your tables.
We call this load mode `append`. It is very useful when, for example, a new folder with `json` log files is created daily and you want to ingest them.

In [177]:
import dlt


data = [
    {'id': 1, 'name': 'Alice'},
    {'id': 2, 'name': 'Bob'}
]

pipeline = dlt.pipeline(
    pipeline_name='append',
    destination='duckdb',
    dataset_name='mydata',
    full_refresh=False,
)
load_info = pipeline.run(data, table_name="users")
print(load_info)

Pipeline append completed in 0.26 seconds
1 load package(s) were loaded to destination duckdb and into dataset mydata
The duckdb destination used duckdb:////home/alenaastrakhantseva/dlthub/spotlight_demo/append.duckdb location to store data
Load package 1693819143.898282 is LOADED and contains no failed jobs


In [178]:
import duckdb

conn = duckdb.connect(f"{pipeline.pipeline_name}.duckdb")
conn.sql(f"SET search_path = '{pipeline.dataset_name}'")
data_table = conn.sql("SELECT * FROM users").df()
data_table

Unnamed: 0,id,name,_dlt_load_id,_dlt_id
0,1,Alice,1693819136.763849,6WSNEZYqGbBWeg
1,2,Bob,1693819136.763849,QcnOcIZKxZZ/8w
2,1,Alice,1693819143.898282,Cv/IeEVAfNCL+Q
3,2,Bob,1693819143.898282,T8B6UpmcodfFyg


Perhaps this is not what you want in the example above.
For example, if a CSV file is updated, how can we refresh its content in the database?
One method is to tell `dlt` to replace the data in the existing tables by using `write_disposition`.

In [181]:
import dlt


data = [
    {'id': 1, 'name': 'Alice'},
    {'id': 2, 'name': 'Bob'}
]

pipeline = dlt.pipeline(
    pipeline_name='replace',
    destination='duckdb',
    dataset_name='mydata',
    full_refresh=False,
)
load_info = pipeline.run(data, table_name="users", write_disposition="replace")
print(load_info)

Pipeline replace completed in 0.32 seconds
1 load package(s) were loaded to destination duckdb and into dataset mydata
The duckdb destination used duckdb:////home/alenaastrakhantseva/dlthub/spotlight_demo/replace.duckdb location to store data
Load package 1693819185.262428 is LOADED and contains no failed jobs


In [182]:
import duckdb

conn = duckdb.connect(f"{pipeline.pipeline_name}.duckdb")
conn.sql(f"SET search_path = '{pipeline.dataset_name}'")
data_table = conn.sql("SELECT * FROM users").df()
data_table

Unnamed: 0,id,name,_dlt_load_id,_dlt_id
0,1,Alice,1693819185.262428,jFrY0LpeuVPivg
1,2,Bob,1693819185.262428,Xrlm9Zc3EebJ+A
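
The difference between the two write dispositions can be modeled on an in-memory table. This is only a conceptual sketch, not how `dlt` talks to a destination. The `load` function and the list-based table are hypothetical:

```python
def load(table, rows, write_disposition="append"):
    """Apply rows to an in-memory table according to the write disposition."""
    if write_disposition == "replace":
        table.clear()  # drop the existing rows before loading
    elif write_disposition != "append":
        raise ValueError(f"unsupported write disposition: {write_disposition}")
    table.extend(dict(row) for row in rows)
    return table


data = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]

appended = []
load(appended, data)
load(appended, data)
print(len(appended))  # 4: the second run adds a copy of every row

replaced = []
load(replaced, data, write_disposition="replace")
load(replaced, data, write_disposition="replace")
print(len(replaced))  # 2: each run starts from an empty table
```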


## Declare loading behavior

You can fine-tune the loading process by decorating Python functions with `@dlt.resource`.

### Load only new data (incremental loading)

We can supercharge the example above and get only the users that were created since the last load.
Instead of using the `replace` write disposition and downloading all users each time the pipeline runs, we do the following:

In [185]:
import dlt


data = [
    {'id': 1, 'name': 'Alice', 'created_at': "2023-09-01"},
    {'id': 2, 'name': 'Bob', 'created_at': "2023-09-02"},
    {'id': 3, 'name': 'Chad', 'created_at': "2023-09-03"},
    {'id': 4, 'name': 'Carol', 'created_at': "2023-09-04"}
]

@dlt.resource
def users(
    created_at=dlt.sources.incremental("created_at", initial_value="2023-08-01")
):
    yield from data

pipeline = dlt.pipeline(
    pipeline_name='incremental',
    destination='duckdb',
    dataset_name='mydata',
    full_refresh=False,
)
load_info = pipeline.run(users)
print(load_info)

Pipeline incremental completed in 0.29 seconds
1 load package(s) were loaded to destination duckdb and into dataset mydata
The duckdb destination used duckdb:////home/alenaastrakhantseva/dlthub/spotlight_demo/incremental.duckdb location to store data
Load package 1693819322.241362 is LOADED and contains no failed jobs


The `@dlt.resource` decorator declares the table that the data is loaded into (by default the function name, here `users`) and the write disposition, which is `append` by default.

We also use `dlt.sources.incremental` to track the `created_at` field present in each user and to keep only the newly created users.

Now run the script. It loads all the users from our test data into `duckdb`. Run it again, and you will see that no users are added.

In [186]:
import duckdb

conn = duckdb.connect(f"{pipeline.pipeline_name}.duckdb")
conn.sql(f"SET search_path = '{pipeline.dataset_name}'")
data_table = conn.sql("SELECT * FROM users").df()
data_table.head()

Unnamed: 0,id,name,created_at,_dlt_load_id,_dlt_id
0,1,Alice,2023-09-01,1693819298.15018,vB8l/GcWY6n8lw
1,2,Bob,2023-09-02,1693819298.15018,z6ftrCSl7BHg4A
2,3,Chad,2023-09-03,1693819298.15018,ODUplz+EUMJlRg
3,4,Carol,2023-09-04,1693819322.241362,s7ybTxVmDCLbCQ
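
The table above shows Carol arriving in a later load than the other users. The idea behind cursor-based incremental loading can be sketched in plain Python: remember the highest cursor value seen so far and let only newer rows through. This is a simplified model, not `dlt`'s implementation. The `incremental_filter` function and the `state` dict are hypothetical, and the strict `>` comparison is an assumption:

```python
def incremental_filter(rows, cursor_field, state, initial_value):
    """Return rows newer than the stored cursor and advance the cursor."""
    last_value = state.get(cursor_field, initial_value)
    new_rows = [row for row in rows if row[cursor_field] > last_value]
    if new_rows:
        state[cursor_field] = max(row[cursor_field] for row in new_rows)
    return new_rows


users = [
    {"id": 1, "name": "Alice", "created_at": "2023-09-01"},
    {"id": 2, "name": "Bob", "created_at": "2023-09-02"},
    {"id": 3, "name": "Chad", "created_at": "2023-09-03"},
]
state = {}  # stands in for the state a pipeline keeps between runs

first_run = incremental_filter(users, "created_at", state, "2023-08-01")
users.append({"id": 4, "name": "Carol", "created_at": "2023-09-04"})
second_run = incremental_filter(users, "created_at", state, "2023-08-01")
third_run = incremental_filter(users, "created_at", state, "2023-08-01")

print(len(first_run), [u["name"] for u in second_run], third_run)
# 3 ['Carol'] []
```

ISO 8601 date strings compare correctly as plain strings, which is why no date parsing is needed here.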


## Update and deduplicate your data

The script above finds new users and adds them to the database.
It ignores any updates to user information.
To always get fresh content for all users, combine an incremental load with the `merge` write disposition,
as in the script below.

In [195]:
import dlt


data = [
    {'id': 1, 'name': 'Alice', 'created_at': "2023-09-01", 'updated_at': "2023-09-01"},
    {'id': 2, 'name': 'Boba', 'created_at': "2023-09-02", 'updated_at': "2023-09-05"},
    {'id': 3, 'name': 'Chad', 'created_at': "2023-09-03", 'updated_at': "2023-09-03"},
    {'id': 4, 'name': 'Carol', 'created_at': "2023-09-04", 'updated_at': "2023-09-04"}
]

@dlt.resource(
    write_disposition="merge",
    primary_key="id",
)
def users(
    updated_at=dlt.sources.incremental("updated_at", initial_value="2023-08-01")
):
    yield from data

pipeline = dlt.pipeline(
    pipeline_name='merge',
    destination='duckdb',
    dataset_name='mydata',
    full_refresh=False,
)
load_info = pipeline.run(users)
print(load_info)

Pipeline merge completed in 0.07 seconds
0 load package(s) were loaded to destination duckdb and into dataset None
The duckdb destination used duckdb:////home/alenaastrakhantseva/dlthub/spotlight_demo/merge.duckdb location to store data


Above we add the `primary_key` hint, which tells `dlt` how to identify users in the database so that it can find duplicates and merge their content.


In [196]:
import duckdb

conn = duckdb.connect(f"{pipeline.pipeline_name}.duckdb")
conn.sql(f"SET search_path = '{pipeline.dataset_name}'")
data_table = conn.sql("SELECT * FROM users").df()
data_table.head()

Unnamed: 0,id,name,created_at,updated_at,_dlt_load_id,_dlt_id
0,2,Boba,2023-09-02,2023-09-05,1693819361.004953,XPkSUWS/XYOe/g
1,1,Alice,2023-09-01,2023-09-01,1693819361.004953,6HYWF4lUAUoTnA
2,4,Carol,2023-09-04,2023-09-04,1693819361.004953,Ue2lSCGe7wgoNg
3,3,Chad,2023-09-03,2023-09-03,1693819361.004953,SzObGfgXId/PHw
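
Merging on a primary key amounts to an upsert: a row whose key already exists replaces the stored row, and any other row is added. Here is a minimal in-memory sketch of that idea. The `merge` function is hypothetical and does not reflect how `dlt` performs the merge in the destination:

```python
def merge(table, rows, primary_key):
    """Upsert rows into an in-memory table, matching on the primary key."""
    positions = {row[primary_key]: pos for pos, row in enumerate(table)}
    for row in rows:
        key = row[primary_key]
        if key in positions:
            table[positions[key]] = dict(row)  # update the existing row
        else:
            positions[key] = len(table)
            table.append(dict(row))  # insert a new row
    return table


table = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
merge(table, [{"id": 2, "name": "Boba"}, {"id": 3, "name": "Chad"}], "id")
print(table)
# [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Boba'}, {'id': 3, 'name': 'Chad'}]
```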


## Real life example

We can improve the GitHub API example above and get only the issues that were created or updated since the last load.

In [198]:
import dlt
import requests


@dlt.resource(
    table_name="issues",
    write_disposition="merge",
    primary_key="id",
)
def get_issues(
    updated_at = dlt.sources.incremental("updated_at", initial_value="1970-01-01T00:00:00Z")
):
    # url to request dlt-hub issues
    url = f"https://api.github.com/repos/dlt-hub/dlt/issues?since={updated_at.last_value}"

    while True:
        response = requests.get(url)
        # stop on HTTP errors instead of loading an error payload
        response.raise_for_status()
        page_items = response.json()

        if len(page_items) == 0:
            break
        yield page_items

        if "next" not in response.links:
            break
        url = response.links["next"]["url"]


pipeline = dlt.pipeline(
    pipeline_name='github_issues_merge',
    destination='duckdb',
    dataset_name='mydata',
    full_refresh=False,
)
# pass the resource to the pipeline; dlt iterates over the pages it yields
load_info = pipeline.run(get_issues)
print(load_info)

Pipeline github_issues_merge completed in 3.29 seconds
1 load package(s) were loaded to destination duckdb and into dataset mydata
The duckdb destination used duckdb:////home/alenaastrakhantseva/dlthub/spotlight_demo/github_issues_merge.duckdb location to store data
Load package 1693819683.2984 is LOADED and contains no failed jobs



Note that we now track the `updated_at` field, so we select all issues **updated** since the last pipeline run (which also includes newly created ones).

Also note how we use the **since** parameter of the [GitHub API](https://docs.github.com/en/rest/issues/issues?apiVersion=2022-11-28#list-repository-issues)
and `updated_at.last_value` to tell GitHub which issues we are interested in. `updated_at.last_value` holds the last `updated_at` value from the previous run.

Now you can run this script on a daily schedule, and each day you'll load only the issues created or updated after the previous pipeline run.
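
The pagination loop in `get_issues` can be studied without network access. The sketch below follows "next" links in the same way, but takes the page-fetching step as a parameter so that a fake two-page API can stand in for GitHub. The names `paginate`, `fetch_page` and `FAKE_PAGES` are hypothetical:

```python
from typing import Callable, Iterator, Optional, Tuple

Page = Tuple[list, Optional[str]]  # (items on this page, URL of the next page)


def paginate(first_url: str, fetch_page: Callable[[str], Page]) -> Iterator[list]:
    """Yield pages of items, following the next URL until it runs out."""
    url: Optional[str] = first_url
    while url is not None:
        items, url = fetch_page(url)
        if not items:
            break  # an empty page ends the iteration
        yield items


# Fake two-page API used instead of a real HTTP call.
FAKE_PAGES = {
    "page-1": ([{"id": 1}, {"id": 2}], "page-2"),
    "page-2": ([{"id": 3}], None),
}

pages = list(paginate("page-1", FAKE_PAGES.__getitem__))
print(pages)  # [[{'id': 1}, {'id': 2}], [{'id': 3}]]
```

Yielding whole pages instead of single items mirrors the resource above, which yields `page_items` one page at a time.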

In [199]:
import duckdb

conn = duckdb.connect(f"{pipeline.pipeline_name}.duckdb")
conn.sql(f"SET search_path = '{pipeline.dataset_name}'")
data_table = conn.sql("SELECT * FROM issues").df()
data_table.head()

Unnamed: 0,id,url,repository_url,labels_url,comments_url,events_url,html_url,node_id,number,title,...,assignee__following_url,assignee__gists_url,assignee__starred_url,assignee__subscriptions_url,assignee__organizations_url,assignee__repos_url,assignee__events_url,assignee__received_events_url,assignee__type,assignee__site_admin
0,1879448366,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt/issue...,https://github.com/dlt-hub/dlt/pull/607,PR_kwDOGvRYu85ZcM8t,607,Added MongoDB documentation.,...,,,,,,,,,,
1,1878822576,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt/issue...,https://github.com/dlt-hub/dlt/pull/606,PR_kwDOGvRYu85ZaS70,606,Add support for TIME data type,...,,,,,,,,,,
2,1877653146,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt/issue...,https://github.com/dlt-hub/dlt/issues/605,I_kwDOGvRYu85v6raa,605,support TIME type,...,https://api.github.com/users/steinitzu/followi...,https://api.github.com/users/steinitzu/gists{/...,https://api.github.com/users/steinitzu/starred...,https://api.github.com/users/steinitzu/subscri...,https://api.github.com/users/steinitzu/orgs,https://api.github.com/users/steinitzu/repos,https://api.github.com/users/steinitzu/events{...,https://api.github.com/users/steinitzu/receive...,User,False
3,1876027382,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt/issue...,https://github.com/dlt-hub/dlt/pull/603,PR_kwDOGvRYu85ZQ-0y,603,Add module_config customization in the Weaviat...,...,,,,,,,,,,
4,1873728536,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt/issue...,https://api.github.com/repos/dlt-hub/dlt/issue...,https://github.com/dlt-hub/dlt/pull/601,PR_kwDOGvRYu85ZJKdR,601,Fixes docs on apply hints,...,,,,,,,,,,


### Use existing verified sources

To use an existing verified source, run the `dlt init` [command](https://dlthub.com/docs/reference/command-line-interface#dlt-init).

List all verified sources:

In [200]:
!dlt init --list-verified-sources

Looking up for verified sources in https://github.com/dlt-hub/verified-sources.git...
airtable: Source that loads tables form Airtable.
salesforce: Source for Salesforce depending on the simple_salesforce python package.
notion: A source that extracts data from Notion API
jira: This source uses Jira API and dlt to load data such as Issues, Users, Workflows and Projects to the database. 
hubspot: This is a module that provides a DLT source to retrieve data from multiple endpoints of the HubSpot API using a specified API key. The retrieved data is returned as a tuple of Dlt resources, one for each endpoint.
pipedrive: Highly customizable source for Pipedrive, supports endpoint addition, selection and column rename
chess: A source loading player profiles and games from chess.com api
asana_dlt: This source provides data extraction from the Asana platform via their API.
facebook_ads: Loads campaigns, ad

This command shows all available verified sources and their short descriptions. For each source, it checks whether your local `dlt` version requires an update and prints the relevant warning.

Consider an example pipeline for the Pokemon API.

This command will initialize the pipeline example with Pokemon as the source and `duckdb` as the [destination](https://dlthub.com/docs/dlt-ecosystem/destinations):


In [201]:
!dlt --non-interactive init pokemon duckdb

Looking up the init scripts in https://github.com/dlt-hub/verified-sources.git...
No files to update, exiting


In [202]:
!python pokemon_pipeline.py

Pipeline pokemon completed in 1.46 seconds
1 load package(s) were loaded to destination duckdb and into dataset pokemon_data
The duckdb destination used duckdb:////home/alenaastrakhantseva/dlthub/spotlight_demo/pokemon.duckdb location to store data
Load package 1693819840.581949 is LOADED and contains no failed jobs


In [203]:
import duckdb

conn = duckdb.connect("pokemon.duckdb")
conn.sql("SET search_path = 'pokemon_data'")
display(conn.sql("DESCRIBE"))
data_table = conn.sql("SELECT * FROM pokemon").df()
data_table

┌──────────┬──────────────┬─────────────────────┬──────────────────────┬───────────────────────────────────┬───────────┐
│ database │    schema    │        name         │     column_names     │           column_types            │ temporary │
│ varchar  │   varchar    │       varchar       │      varchar[]       │             varchar[]             │  boolean  │
├──────────┼──────────────┼─────────────────────┼──────────────────────┼───────────────────────────────────┼───────────┤
│ pokemon  │ pokemon_data │ _dlt_loads          │ [load_id, schema_n…  │ [VARCHAR, VARCHAR, BIGINT, TIME…  │ false     │
│ pokemon  │ pokemon_data │ _dlt_pipeline_state │ [version, engine_v…  │ [BIGINT, BIGINT, VARCHAR, VARCH…  │ false     │
│ pokemon  │ pokemon_data │ _dlt_version        │ [version, engine_v…  │ [BIGINT, BIGINT, TIMESTAMP WITH…  │ false     │
│ pokemon  │ pokemon_data │ berries             │ [name, url, _dlt_l…  │ [VARCHAR, VARCHAR, VARCHAR, VAR…  │ false     │
│ pokemon  │ pokemon_data │ poke

Unnamed: 0,name,url,_dlt_load_id,_dlt_id
0,bulbasaur,https://pokeapi.co/api/v2/pokemon/1/,1693819840.581949,LE2RgzOt4YRRZA
1,ivysaur,https://pokeapi.co/api/v2/pokemon/2/,1693819840.581949,5p0B1OMIsSHPbA
2,venusaur,https://pokeapi.co/api/v2/pokemon/3/,1693819840.581949,sUOy96OHI9ezXQ
3,charmander,https://pokeapi.co/api/v2/pokemon/4/,1693819840.581949,EhXIRYatjZx66g
4,charmeleon,https://pokeapi.co/api/v2/pokemon/5/,1693819840.581949,QhrBLgATqWWZ5w
5,charizard,https://pokeapi.co/api/v2/pokemon/6/,1693819840.581949,qm6zHHCZv0s3nw
6,squirtle,https://pokeapi.co/api/v2/pokemon/7/,1693819840.581949,tjMaCTD3iXHlQg
7,wartortle,https://pokeapi.co/api/v2/pokemon/8/,1693819840.581949,n8opSKh8vrIrZQ
8,blastoise,https://pokeapi.co/api/v2/pokemon/9/,1693819840.581949,6NgYFW7S68zEbA
9,caterpie,https://pokeapi.co/api/v2/pokemon/10/,1693819840.581949,IwPQMfnqu/3zTQ
