# Exploring DuckDB’s Python API

This notebook contains the code examples from chapter 8 of *Getting Started with DuckDB*.

## Technical requirements

To run the examples in this notebook, you need to install the project's Python dependencies. From the root directory of the project, run the following command in your terminal, ideally inside a Python virtual environment created for this project:

    pip install -r requirements.txt

For complete instructions on how to set up your environment for working through the examples, please consult the *Technical requirements* section of the chapter *Setting up the DuckDB Python Client*.

## Working with the Relational API

In [None]:
import duckdb

result = duckdb.sql("SELECT 'quack!'") 

type(result) 

### Data ingestion 

In [None]:
pets_csv = duckdb.read_csv("Seattle_Pet_Licenses.csv") 

pets_csv 

In [None]:
pets_csv.types

In [None]:
pets_csv = duckdb.read_csv(
    "Seattle_Pet_Licenses.csv",
    dtype={"License Issue Date": duckdb.typing.DATE},
    date_format="%B %d %Y",
)
pets_csv.types

In [None]:
pets_csv.limit(5)

In [None]:
pets_csv.sql_query() 

In [None]:
pets_csv_alt = duckdb.sql(
    """ 
    SELECT * 
    FROM read_csv_auto( 
        'Seattle_Pet_Licenses.csv',  
        dateformat='%B %d %Y',  
        dtypes={'License Issue Date': 'DATE'} 
    ) 
    """
)

### Querying relations

In [None]:
pets = duckdb.sql(
    """ 
    SELECT  
        "License Issue Date" AS issue_date, 
        "Animal's Name" AS pet_name, 
        "Species" AS species, 
        "Primary Breed" AS breed 
    FROM pets_csv 
    """
)

pets.limit(5)

In [None]:
duckdb.sql("SELECT min(issue_date), max(issue_date) FROM pets")

### Composing queries with relations

In [None]:
pets.value_counts("species") 

In [None]:
pets.filter("species = 'Pig'")

In [None]:
pets.filter("species = 'Dog'").value_counts("pet_name").order("2 DESC").limit(10)

In [None]:
val_counts_sql = (
    pets.filter("species = 'Dog'")
    .value_counts("pet_name")
    .order("2 DESC")
    .limit(10)
    .sql_query()
)

val_counts_sql

In [None]:
import sqlparse

print(sqlparse.format(val_counts_sql, reindent=True))

In [None]:
pets.order("length(pet_name) DESC").limit(10)

In [None]:
pets.query("pets_rel", "SELECT *, length(pet_name) AS name_length FROM pets_rel").order(
    "name_length DESC"
).limit(10)

### Writing to disk 

In [None]:
pets.write_csv("seattle_pets.csv")

pets.write_parquet("seattle_pets.parquet")

In [None]:
duckdb.sql("COPY pets TO 'seattle_pets.csv'")

duckdb.sql("COPY pets TO 'seattle_pets.parquet'")

### Modifying the database

In [None]:
conn = duckdb.connect("seattle_pets.db")

In [None]:
conn.read_parquet("seattle_pets.parquet").create("pets")

In [None]:
conn.sql("SHOW TABLES")

In [None]:
conn.table("pets").count("*")

In [None]:
new_dog1 = ("2023-07-16", "Monty", "Dog", "Border Collie") 

conn.table("pets").insert(new_dog1) 

In [None]:
new_dog2 = ("2023-07-16", "Pixie", "Dog", "Australian Kelpie") 

new_dog_rel = conn.values(new_dog2) 

new_dog_rel.insert_into("pets") 

In [None]:
conn.table("pets").filter("issue_date = '2023-07-16'")

In [None]:
conn.close()

## Working with the Python DB-API

### Connecting to a database 

In [None]:
conn = duckdb.connect()

### Querying databases

In [None]:
conn.execute("CREATE TABLE seattle_pets AS SELECT * FROM 'seattle_pets.parquet'")

In [None]:
conn.execute("SELECT * FROM seattle_pets") 

In [None]:
conn.fetchone()

In [None]:
conn.description

In [None]:
[conn.fetchone() for _ in range(3)]

In [None]:
conn.fetchmany(3)

In [None]:
rest_rows = conn.fetchall()

len(rest_rows)

### Running SQL queries using prepared statements

In [None]:
import datetime

new_pet1 = (datetime.date.today(), "Ned", "Dog", "Border Collie")

conn.execute("INSERT INTO seattle_pets VALUES (?, ?, ?, ?)", parameters=new_pet1)

In [None]:
new_pet2 = {
    "name": "Simon",
    "species": "Cat",
    "breed": "Bombay",
    "issue_date": datetime.date.today(),
}

conn.execute(
    "INSERT INTO seattle_pets VALUES ($issue_date, $name, $species, $breed)", new_pet2
)

In [None]:
conn.execute(
    """ 
    SELECT * 
    FROM seattle_pets 
    WHERE issue_date = ?; 
    """,
    [datetime.date.today()],
).fetchall()

### Writing to disk

In [None]:
conn.execute("COPY seattle_pets TO 'seattle_pets_updates.csv'") 

conn.execute("COPY seattle_pets TO 'seattle_pets_updates.parquet'")

### Closing the database connection

In [None]:
conn.close()

### Database cursors

In [None]:
conn = duckdb.connect()

new_conn = conn.cursor()

## Integration with Python packages and language features 

### Consuming Python data structures

#### Querying Python objects via replacement scans 

In [None]:
import pandas as pd  

pets_df = pd.read_parquet("seattle_pets.parquet").sample(frac=1) 

duckdb.sql("SELECT * FROM pets_df").fetchone()

#### Registering objects as virtual tables 

In [None]:
pets_dict = {"seattle": pd.read_parquet("seattle_pets.parquet").sample(frac=1)}

duckdb.register("pets_from_pandas", pets_dict["seattle"])

duckdb.sql("SELECT * FROM pets_from_pandas").fetchone()

#### Creating tables from objects 

In [None]:
pets_df = pd.read_parquet("seattle_pets.parquet").sample(frac=1)

duckdb.sql("CREATE OR REPLACE TABLE pets_table_from_df AS SELECT * FROM pets_df")

duckdb.sql("SELECT * FROM pets_table_from_df").fetchone()

### Converting query results

#### Converting to dataframes 

In [None]:
conn = duckdb.connect()

seattle_pets = conn.from_parquet("seattle_pets.parquet")

pets_df = seattle_pets.df()

pets_df[pets_df["species"] == "Dog"].value_counts("breed")[:5]

In [None]:
import polars as pl

pets_df = seattle_pets.pl()

pets_df.filter(pl.col("species") == "Cat")["breed"].value_counts(sort=True)[:5]

#### Converting to Arrow tables

In [None]:
conn = duckdb.connect()

conn.execute("SELECT * FROM 'seattle_pets.parquet'")

pets_table = conn.arrow()

pets_table.schema

### Data types: from Python to DuckDB

In [None]:
varchar_type = duckdb.typing.VARCHAR 

bigint_type = duckdb.typing.BIGINT

In [None]:
varchar_type = duckdb.typing.DuckDBPyType(str) 

bigint_type = duckdb.typing.DuckDBPyType(int)

In [None]:
duckdb.values(
    [
        10,
        1_000_000,
        0.95,
        "hello string",
        b"hello bytes",
        True,
        datetime.date.today(),
        None,
    ]
)

In [None]:
duckdb.values([(1, 2), ["hello", "world"], {"key1": 10, "key2": "quack!"}])

### User-defined functions

In [None]:
import emoji

def emojify(species):
    """Converts a string into a single emoji."""
    emoji_str = emoji.emojize(f":{species.lower()}:")
    if emoji.is_emoji(emoji_str):
        return emoji_str
    return None

In [None]:
emojify("goat")

In [None]:
duckdb.create_function(
    "emojify", 
    emojify, 
    [duckdb.typing.VARCHAR],
    duckdb.typing.VARCHAR
)

In [None]:
duckdb.sql(
    """ 
    SELECT *, emojify(species) as emoji 
    FROM 'seattle_pets_updates.parquet' 
    USING SAMPLE 10 
    """
)

In [None]:
duckdb.remove_function("emojify")

In [None]:
duckdb.create_function("emojify", emojify, [str], str)

In [None]:
def emojify(species: str) -> str:
    """Converts a string into a single emoji."""
    emoji_str = emoji.emojize(f":{species.lower()}:")
    if emoji.is_emoji(emoji_str):
        return emoji_str
    return None

duckdb.remove_function("emojify")

duckdb.create_function("emojify", emojify)

### Handling exceptions

In [None]:
from duckdb import ConversionException

try:
    duckdb.execute("SELECT '5,000'::INTEGER").fetchall()
except ConversionException as error:
    print(error)
    # handle exception...

In [None]:
from duckdb import CatalogException

try:
    duckdb.sql("SELECT * from imaginary_table")
except CatalogException as error:
    print(error)
    # handle exception...

## Summary