# Arrow and DuckDB Example Notebook

This notebook demonstrates reading a CSV file into an Apache Arrow table, writing it to Parquet, and querying the result with the DuckDB Python client via SQL.

## References

- [Apache Arrow](https://arrow.apache.org/docs/index.html)
- [DuckDB](https://duckdb.org/docs/)
- [SQL](https://en.wikipedia.org/wiki/SQL)
- [Iris Dataset](https://archive.ics.uci.edu/ml/datasets/Iris)
    - Creator: R.A. Fisher
    - Donor: Michael Marshall (MARSHALL%PLU '@' io.arc.nasa.gov)

In [18]:
import pyarrow.csv as csv
import pyarrow.parquet as parquet
import duckdb
import pathlib
import urllib.request  # a bare `import urllib` does not reliably load the `request` submodule

data_link = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

In [19]:
parquet.write_table(
    table=csv.read_csv(
        input_file=urllib.request.urlopen(data_link),
        read_options=csv.ReadOptions(
            column_names=[
                "sepal length",
                "sepal width",
                "petal length",
                "petal width",
                "class",
            ]
        ),
    ),
    where="iris.parquet",
)

In [24]:
list(pathlib.Path(".").glob("*.parquet"))

[PosixPath('iris.parquet')]

In [27]:
duckdb.connect().execute(
    """
    select * from read_parquet('iris.parquet');
    """
).arrow()

pyarrow.Table
sepal length: double
sepal width: double
petal length: double
petal width: double
class: string
----
sepal length: [[5.1,4.9,4.7,4.6,5,...,6.7,6.3,6.5,6.2,5.9]]
sepal width: [[3.5,3,3.2,3.1,3.6,...,3,2.5,3,3.4,3]]
petal length: [[1.4,1.4,1.3,1.5,1.4,...,5.2,5,5.2,5.4,5.1]]
petal width: [[0.2,0.2,0.2,0.2,0.2,...,2.3,1.9,2,2.3,1.8]]
class: [["Iris-setosa","Iris-setosa","Iris-setosa","Iris-setosa","Iris-setosa",...,"Iris-virginica","Iris-virginica","Iris-virginica","Iris-virginica","Iris-virginica"]]