# Data Export

## Introduction

This notebook demonstrates how to export data from a DuckDB database to various formats: Pandas DataFrames, CSV, JSON, Excel, Parquet, GeoJSON, Shapefile, and GeoPackage.

## Installation

Uncomment the following cell to install the required packages if needed.

In [None]:
# %pip install duckdb

## Library Import

In [1]:
import duckdb
import pandas as pd

## Installing Extensions

DuckDB’s Python API provides functions for installing and loading extensions, which are equivalent to running the `INSTALL` and `LOAD` SQL commands, respectively. The following cells install and load the [httpfs extension](https://duckdb.org/docs/extensions/httpfs), used here to read the sample data over HTTPS, and the spatial extension, which provides the geometry functions and GDAL-based export formats used below:

In [2]:
con = duckdb.connect()

In [3]:
con.install_extension("httpfs")
con.load_extension("httpfs")

In [4]:
con.install_extension("spatial")
con.load_extension("spatial")

## Sample Data

In [5]:
con.sql(
    """
CREATE TABLE IF NOT EXISTS cities AS
SELECT * EXCLUDE geometry, ST_GeomFromWKB(geometry) 
AS geometry FROM 'https://open.gishub.org/data/duckdb/cities.parquet'
"""
)

In [6]:
con.table("cities").show()

┌─────────┬────────┬───────────┬───────────┬──────────────────┬────────────┬─────────────────────────────┐
│ country │   id   │ latitude  │ longitude │       name       │ population │          geometry           │
│ varchar │ double │  double   │  double   │     varchar      │   double   │          geometry           │
├─────────┼────────┼───────────┼───────────┼──────────────────┼────────────┼─────────────────────────────┤
│ UGA     │    1.0 │    0.5833 │   32.5333 │ Bombo            │    75000.0 │ POINT (32.5333 0.5833)      │
│ UGA     │    2.0 │     0.671 │    30.275 │ Fort Portal      │    42670.0 │ POINT (30.275 0.671)        │
│ ITA     │    3.0 │    40.642 │    15.799 │ Potenza          │    69060.0 │ POINT (15.799 40.642)       │
│ ITA     │    4.0 │    41.563 │    14.656 │ Campobasso       │    50762.0 │ POINT (14.656 41.563)       │
│ ITA     │    5.0 │    45.737 │     7.315 │ Aosta            │    34062.0 │ POINT (7.315 45.737)        │
│ ALD     │    6.0 │    60.097 │    1

## To DataFrames

In [7]:
con.table("cities").df()

Unnamed: 0,country,id,latitude,longitude,name,population,geometry
0,UGA,1.0,0.58330,32.53330,Bombo,75000.0,"[0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,..."
1,UGA,2.0,0.67100,30.27500,Fort Portal,42670.0,"[0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,..."
2,ITA,3.0,40.64200,15.79900,Potenza,69060.0,"[0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,..."
3,ITA,4.0,41.56300,14.65600,Campobasso,50762.0,"[0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,..."
4,ITA,5.0,45.73700,7.31500,Aosta,34062.0,"[0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,..."
...,...,...,...,...,...,...,...
1244,BRA,1245.0,-22.92502,-43.22502,Rio de Janeiro,11748000.0,"[0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,..."
1245,BRA,1246.0,-23.55868,-46.62502,Sao Paulo,18845000.0,"[0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,..."
1246,AUS,1247.0,-33.92001,151.18518,Sydney,4630000.0,"[0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,..."
1247,SGP,1248.0,1.29303,103.85582,Singapore,5183700.0,"[0, 0, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,..."
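
The `geometry` column in the DataFrame output above appears as raw bytes, which is hard to read. One possible workaround, sketched here, is to convert it to WKT text with the spatial extension's `ST_AsText` function before calling `.df()`. The helper name `dataframe_with_wkt` and the `wkt` column name are illustrative and not part of DuckDB; the function assumes a connection with the spatial extension loaded and a table that has a `geometry` column.

```python
def dataframe_with_wkt(con, table="cities"):
    """Return `table` as a DataFrame with the geometry rendered as WKT text.

    Assumes `con` is a DuckDB connection with the spatial extension loaded
    and that `table` has a column named `geometry` (both are assumptions
    carried over from the cells above).
    """
    query = (
        "SELECT * EXCLUDE geometry, ST_AsText(geometry) AS wkt "
        f"FROM {table}"
    )
    return con.sql(query).df()


# Usage in this notebook (not executed here):
# dataframe_with_wkt(con).head()
```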


## To CSV

In [None]:
con.sql("COPY cities TO 'cities.csv' (HEADER, DELIMITER ',')")

In [None]:
# Export without the geometry column (overwrites the cities.csv written above)
con.sql("COPY (SELECT * EXCLUDE geometry FROM cities) TO 'cities.csv'")

In [None]:
con.sql(
    "COPY (SELECT * FROM cities WHERE country='USA') TO 'cities_us.csv' (HEADER, DELIMITER ',')"
)
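
After an export it can be useful to check the result without going back through DuckDB. The sketch below reads a CSV file with Python's standard `csv` module and reports the header and the number of data rows. The helper name `summarize_csv` is hypothetical, and the function assumes the file was written with a header row, as in the `HEADER` examples above.

```python
import csv


def summarize_csv(path):
    """Return (header, row_count) for a CSV file that starts with a header row."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])           # first row is taken as the header
        row_count = sum(1 for _ in reader)  # remaining rows are data
    return header, row_count


# Usage in this notebook (not executed here):
# header, n = summarize_csv("cities_us.csv")
```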

## To JSON

In [None]:
con.sql("COPY cities TO 'cities.json'")

In [None]:
con.sql("COPY (SELECT * FROM cities WHERE country='USA') TO 'cities_us.json'")
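
To read the exported file back in plain Python, a small loader along these lines may help. It assumes the file is newline-delimited JSON (one object per line), which, as far as I know, is what DuckDB writes unless the `ARRAY` option is set; if your file is a single JSON array instead, `json.load` would be the appropriate call. The helper name `read_ndjson` is hypothetical.

```python
import json


def read_ndjson(path):
    """Read a newline-delimited JSON file into a list of dicts."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Usage in this notebook (not executed here):
# rows = read_ndjson("cities_us.json")
```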

## To Excel

In [None]:
con.sql(
    "COPY (SELECT * EXCLUDE geometry FROM cities) TO 'cities.xlsx' WITH (FORMAT GDAL, DRIVER 'XLSX')"
)

## To Parquet

In [None]:
con.sql("COPY cities TO 'cities.parquet' (FORMAT PARQUET)")

In [None]:
con.sql(
    "COPY (SELECT * FROM cities WHERE country='USA') TO 'cities_us.parquet' (FORMAT PARQUET)"
)

## To GeoJSON

In [None]:
con.sql("COPY cities TO 'cities.geojson' WITH (FORMAT GDAL, DRIVER 'GeoJSON')")

In [None]:
con.sql(
    "COPY (SELECT * FROM cities WHERE country='USA') TO 'cities_us.geojson' WITH (FORMAT GDAL, DRIVER 'GeoJSON')"
)
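
A quick way to sanity-check a GeoJSON export is to count its features and geometry types using only the standard library. The sketch below assumes the file is a GeoJSON `FeatureCollection`, which is the usual layout for a layer export; the helper name `summarize_geojson` is hypothetical.

```python
import json
from collections import Counter


def summarize_geojson(path):
    """Return (feature_count, {geometry_type: count}) for a FeatureCollection."""
    with open(path, encoding="utf-8") as f:
        collection = json.load(f)
    features = collection.get("features", [])
    # Features with a null geometry are counted under the key None.
    geometry_types = Counter(
        (feature.get("geometry") or {}).get("type") for feature in features
    )
    return len(features), dict(geometry_types)


# Usage in this notebook (not executed here):
# count, types = summarize_geojson("cities_us.geojson")
```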

## To Shapefile

Note: this export doesn't work on Linux.

In [None]:
con.sql("COPY cities TO 'cities.shp' WITH (FORMAT GDAL, DRIVER 'ESRI Shapefile')")

## To GeoPackage

In [None]:
con.sql("COPY cities TO 'cities.gpkg' WITH (FORMAT GDAL, DRIVER 'GPKG')")
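
The exports above all follow the same `COPY ... TO ...` pattern and differ only in their options. As a rough sketch, the options used in this notebook can be collected in one place and selected by file extension. The names `COPY_OPTIONS` and `copy_statement` are hypothetical; the option strings are copied from the cells above, and the helper only builds the SQL text, so it does not validate the table name or query.

```python
from pathlib import Path

# Export options per file extension, copied from the cells in this notebook.
COPY_OPTIONS = {
    ".csv": "(HEADER, DELIMITER ',')",
    ".json": "",
    ".parquet": "(FORMAT PARQUET)",
    ".xlsx": "WITH (FORMAT GDAL, DRIVER 'XLSX')",
    ".geojson": "WITH (FORMAT GDAL, DRIVER 'GeoJSON')",
    ".shp": "WITH (FORMAT GDAL, DRIVER 'ESRI Shapefile')",
    ".gpkg": "WITH (FORMAT GDAL, DRIVER 'GPKG')",
}


def copy_statement(source, path):
    """Build a COPY statement for a table name or a SELECT query."""
    suffix = Path(path).suffix.lower()
    if suffix not in COPY_OPTIONS:
        raise ValueError(f"No export options known for '{suffix}'")
    # A SELECT query must be wrapped in parentheses; a table name is used as-is.
    if source.lstrip().upper().startswith("SELECT"):
        source = f"({source})"
    escaped_path = str(path).replace("'", "''")  # escape quotes in the SQL literal
    return f"COPY {source} TO '{escaped_path}' {COPY_OPTIONS[suffix]}".rstrip()


# Usage in this notebook (not executed here):
# con.sql(copy_statement("cities", "cities.gpkg"))
```

As in the Excel cell above, the caller still has to drop the geometry column for formats that cannot store it.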