# Python API

## Introduction

There are various client APIs for DuckDB. DuckDB’s “native” API is C++, with “official” wrappers available for C, Python, R, Java, Node.js, WebAssembly/Wasm, ODBC API, Julia, and a Command Line Interface (CLI).

In this notebook, we will explore the [DuckDB Python API](https://duckdb.org/docs/api/python/overview).

## Datasets

The following datasets are used in this notebook. You don't need to download them; they can be accessed directly from the notebook.

- [cities.csv](https://open.gishub.org/data/duckdb/cities.csv)
- [countries.csv](https://open.gishub.org/data/duckdb/countries.csv)

## Installation

Run the following cell to install the required packages if needed.

In [2]:
 %pip install duckdb duckdb-engine jupysql

## Library Import

In [3]:
import duckdb
import pandas as pd

## Installing Extensions

DuckDB’s Python API provides functions for installing and loading extensions, which are equivalent to running the `INSTALL` and `LOAD` SQL commands, respectively. The following example installs and loads the [httpfs extension](https://duckdb.org/docs/extensions/httpfs):

In [5]:
con = duckdb.connect()
con.install_extension("httpfs")
con.load_extension("httpfs")

## Data Input

DuckDB can ingest data from a wide variety of formats – both on-disk and in-memory. See the [data ingestion page](https://duckdb.org/docs/api/python/data_ingestion) for more information.

In [6]:
con.sql('SELECT 42').show()

┌───────┐
│  42   │
│ int32 │
├───────┤
│    42 │
└───────┘



In [7]:
con.read_csv('https://open.gishub.org/data/duckdb/cities.csv')

┌───────┬──────────────────┬─────────┬───────────┬───────────┬────────────┐
│  id   │       name       │ country │ latitude  │ longitude │ population │
│ int64 │     varchar      │ varchar │  double   │  double   │   int64    │
├───────┼──────────────────┼─────────┼───────────┼───────────┼────────────┤
│     1 │ Bombo            │ UGA     │    0.5833 │   32.5333 │      75000 │
│     2 │ Fort Portal      │ UGA     │     0.671 │    30.275 │      42670 │
│     3 │ Potenza          │ ITA     │    40.642 │    15.799 │      69060 │
│     4 │ Campobasso       │ ITA     │    41.563 │    14.656 │      50762 │
│     5 │ Aosta            │ ITA     │    45.737 │     7.315 │      34062 │
│     6 │ Mariehamn        │ ALD     │    60.097 │    19.949 │      10682 │
│     7 │ Ramallah         │ PSE     │  31.90294 │  35.20621 │      24599 │
│     8 │ Vatican City     │ VAT     │  41.90001 │  12.44781 │        832 │
│     9 │ Poitier          │ FRA     │  46.58329 │   0.33328 │      85960 │
│    10 │ Cl

In [8]:
con.read_csv('https://open.gishub.org/data/duckdb/countries.csv')

┌───────┬─────────────────────────┬─────────────┬─────────────┬──────────────┬──────────┬───────────┐
│  id   │         Country         │ Alpha2_code │ Alpha3_code │ Numeric_code │ Latitude │ Longitude │
│ int64 │         varchar         │   varchar   │   varchar   │    int64     │  double  │  double   │
├───────┼─────────────────────────┼─────────────┼─────────────┼──────────────┼──────────┼───────────┤
│     1 │ Afghanistan             │ AF          │ AFG         │            4 │     33.0 │      65.0 │
│     2 │ Albania                 │ AL          │ ALB         │            8 │     41.0 │      20.0 │
│     3 │ Algeria                 │ DZ          │ DZA         │           12 │     28.0 │       3.0 │
│     4 │ American Samoa          │ AS          │ ASM         │           16 │ -14.3333 │    -170.0 │
│     5 │ Andorra                 │ AD          │ AND         │           20 │     42.5 │       1.6 │
│     6 │ Angola                  │ AO          │ AGO         │           24 │    

## DataFrames

DuckDB can also query Pandas DataFrames directly, referring to them by their Python variable name.

In [9]:
pandas_df = pd.DataFrame({'a': [42]})
con.sql('SELECT * FROM pandas_df')

┌───────┐
│   a   │
│ int64 │
├───────┤
│    42 │
└───────┘

DuckDB can also ingest data from remote sources (e.g., HTTP, S3) and return a Pandas DataFrame.

In [10]:
df = con.read_csv('https://open.gishub.org/data/duckdb/cities.csv').df()
df.head()

| | id | name | country | latitude | longitude | population |
|---|---|---|---|---|---|---|
| 0 | 1 | Bombo | UGA | 0.5833 | 32.5333 | 75000 |
| 1 | 2 | Fort Portal | UGA | 0.671 | 30.275 | 42670 |
| 2 | 3 | Potenza | ITA | 40.642 | 15.799 | 69060 |
| 3 | 4 | Campobasso | ITA | 41.563 | 14.656 | 50762 |
| 4 | 5 | Aosta | ITA | 45.737 | 7.315 | 34062 |


## Result Conversion

DuckDB supports converting query results efficiently to a variety of formats. See the [result conversion page](https://duckdb.org/docs/api/python/result_conversion) for more information.

In [11]:
con.sql('SELECT 42').fetchall()  # Python objects

[(42,)]

In [12]:
con.sql('SELECT 42').df()  # Pandas DataFrame

| | 42 |
|---|---|
| 0 | 42 |


In [13]:
con.sql('SELECT 42').fetchnumpy()  # NumPy Arrays

{'42': array([42])}

## Writing Data to Disk

DuckDB supports writing Relation objects directly to disk in a variety of formats. Alternatively, the [COPY](https://duckdb.org/docs/sql/statements/copy) statement writes data to disk using SQL.

In [None]:
con.sql('SELECT 42').write_parquet('out.parquet')  # Write to a Parquet file
con.sql('SELECT 42').write_csv('out.csv')  # Write to a CSV file
con.sql("COPY (SELECT 42) TO 'out.parquet'")  # Copy to a parquet file

## Persistent Storage

By default, DuckDB operates on an **in-memory** database, so any tables you create are not persisted to disk. Passing a file path to `duckdb.connect` opens a connection to a persistent database instead. Any data written through that connection is persisted and can be reloaded by reconnecting to the same file.

In [None]:
# create a connection to a file called 'file.db'
con = duckdb.connect('file.db')
# create a table and load data into it
con.sql(
    "CREATE TABLE IF NOT EXISTS cities AS FROM read_csv_auto('https://open.gishub.org/data/duckdb/cities.csv')"
)
# query the table
con.table('cities').show()
# Note: connections are also closed implicitly when they go out of scope

In [None]:
# explicitly close the connection
con.close()

You can also use a context manager to ensure that the connection is closed:

In [None]:
with duckdb.connect('file.db') as con:
    con.sql(
        "CREATE TABLE IF NOT EXISTS cities AS FROM read_csv_auto('https://open.gishub.org/data/duckdb/cities.csv')"
    )
    con.table('cities').show()
    # the context manager closes the connection automatically

## Connection Object and Module

The connection object and the `duckdb` module can be used interchangeably – they support the same methods. The only difference is that when using the `duckdb` module a global in-memory database is used.

Note that if you are developing a package for others to use and that package uses DuckDB, it is recommended that you create connection objects instead of calling the methods on the `duckdb` module. The `duckdb` module uses a shared global database, which can cause hard-to-debug issues when it is used from multiple different packages.

In [None]:
duckdb.sql('SELECT 42')

In [None]:
# Recommended over the module-level call above: use an explicit connection
con = duckdb.connect()
con.sql('SELECT 42')

## References

- [DuckDB Python API Overview](https://duckdb.org/docs/api/python/overview)