### Using DuckDB with the Python API

In [6]:
from IPython.display import display, Image

# URL of the DuckDB logo
image_url = 'https://duckdb.org/images/logo-dl/DuckDB_Logo.png'

# Display the image in the notebook
display(Image(url=image_url))

DuckDB is an in-process analytical database management system. With its spatial extension, it is useful in the geospatial domain for efficiently storing and retrieving location-based data.

In simpler terms, DuckDB helps organize and manage information related to locations, such as maps or other geographic data. It is handy for tasks such as tracking objects, analyzing spatial patterns, or finding nearby points on a map.

DuckDB stands out in the geospatial domain because it is fast, lightweight, and often more efficient than other geospatial packages:

* Speed: DuckDB uses a columnar, vectorized query engine built for analytical workloads, so queries and aggregations over geospatial data run quickly. This makes it a good fit for applications that need fast, interactive processing of location-based information.

* Lightweight: DuckDB runs in-process, with no separate server to install or manage, and it does not require significant computational resources. This makes it suitable for resource-constrained devices and systems, and it can contribute to faster deployment and lower operational costs.

* Efficiency: DuckDB balances performance against resource use. Its design allows quick data retrieval and processing, making it a reliable option for applications that deal with large geospatial datasets.

* Integration: DuckDB integrates with programming languages commonly used in data science and geospatial analysis, such as Python. Developers and data scientists can add it to existing workflows and keep using familiar tools.

### Objective of this task

* DuckDB Exploration:
   - Investigate DuckDB features for spatial analysis.
   - Examine geographic data handling tools.
* OSM Integration:
   - Assess DuckDB's OSM data integration capabilities.
   - Verify compatibility and interoperability.
* Spatial Analysis Techniques:
   - Identify DuckDB's spatial analysis methods.
   - Evaluate advanced analysis using OSM data.
* Nigeria-specific Application:
   - Tailor exploration to Nigeria's geography.
   - Examine DuckDB's effectiveness with OSM data in Nigeria.
* Data Visualization:
   - Assess DuckDB's geographic data visualization.
   - Explore visualization options for Nigerian geography.

In [None]:
%pip install duckdb leafmap pyogrio geopandas jupysql duckdb-engine

Import the necessary libraries

In [None]:
# Import the libraries
import duckdb
import pandas as pd
import leafmap
import os
import pyogrio

%load_ext sql

In [None]:
%config SqlMagic.autopandas = True
%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False

Download the data used in this analysis (uncomment the lines below to run the download)

In [None]:
# url = r'https://download.geofabrik.de/africa/nigeria-latest-free.shp.zip'
# leafmap.download_file(url,unzip=True)

In [None]:
# List the shapefiles in the OSM Nigeria shapefile folder
home_folder = 'data'
nigeria_folder = 'nigeria-latest-free_shp'
data_folder = os.path.join(home_folder, nigeria_folder)
data = os.listdir(data_folder)
for item in data:
    if item.endswith('.shp'):
        print(item)

Connecting to DuckDB

Create a database for Nigeria where all the data will be stored before analysis

In [None]:
con = duckdb.connect("nigeria.db")

Install and load the spatial extension

In [None]:
con.install_extension('spatial')
con.load_extension('spatial')

In [None]:
con.sql("SHOW TABLES")

In [None]:
osm_building = 'gis_osm_buildings_a_free_1.shp'
building_data = os.path.join(home_folder, nigeria_folder, osm_building)
# Get the first 10 building records
con.sql(f"SELECT * FROM ST_Read('{building_data}') LIMIT 10")

Count the total number of buildings mapped in Nigeria

In [None]:
%%timeit
osm_building = 'gis_osm_buildings_a_free_1.shp'
building_data = os.path.join(home_folder, nigeria_folder, osm_building)
# Count the buildings mapped on OSM in Nigeria
query = f"SELECT COUNT(*) FROM ST_Read('{building_data}')"
# fetchall() forces the lazily evaluated query to run so that it is actually timed
con.sql(query).fetchall()

Use leafmap to visualize the Nigeria building shapefile

Load data into the database using the DataFrame method
* Read the data using pyogrio. pyogrio is used instead of the more commonly known geopandas reader because of its speed: it can be about 18x faster.
* Use a for loop to load all the data in the Nigeria shapefile folder into the connected database, i.e. "nigeria.db"
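
The loop described above could look roughly like the following. This sketch only builds the `CREATE TABLE` statements, and it reads each file with `ST_Read` rather than through pyogrio. The helper names `table_name_for` and `build_load_statements` are hypothetical, the file list is an example, and the naming scheme (dropping the `gis_osm_` prefix and the `_free_1` suffix) is an assumption rather than anything DuckDB requires.

```python
import os
import re


def table_name_for(shapefile):
    """Derive a table name such as 'osm_nigeria_buildings_a' from a shapefile name."""
    stem = os.path.splitext(os.path.basename(shapefile))[0]
    # Strip the download's prefix and suffix, e.g. gis_osm_buildings_a_free_1
    stem = re.sub(r"^gis_osm_", "", stem)
    stem = re.sub(r"_free_1$", "", stem)
    return f"osm_nigeria_{stem}"


def build_load_statements(folder, filenames):
    """Return one CREATE TABLE statement per .shp file in filenames."""
    statements = []
    for name in sorted(filenames):
        if not name.endswith(".shp"):
            continue
        # Forward slashes keep the path valid inside the SQL string on Windows too
        path = os.path.join(folder, name).replace("\\", "/")
        table = table_name_for(name)
        statements.append(
            f"CREATE OR REPLACE TABLE {table} AS SELECT * FROM ST_Read('{path}')"
        )
    return statements


# Example file names; in the notebook this list would come from os.listdir(data_folder)
example_files = [
    "gis_osm_buildings_a_free_1.shp",
    "gis_osm_roads_free_1.shp",
    "gis_osm_roads_free_1.dbf",
]
statements = build_load_statements("data/nigeria-latest-free_shp", example_files)
for statement in statements:
    print(statement)
```

Each statement can then be run with `con.execute(statement)` once the spatial extension is loaded.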

In [None]:
%%time
# %%time runs the cell once; %%timeit would repeat the table creation
# Try a single layer first; a loop can load the remaining layers later
osm_building = 'gis_osm_buildings_a_free_1.shp'
building_data = os.path.join(home_folder, nigeria_folder, osm_building)
# Read the data into a GeoDataFrame
building_gdf = pyogrio.read_dataframe(building_data)
# Convert the geometries to WKB bytes so DuckDB can read the DataFrame
building_df = building_gdf.to_wkb()

# Register the DataFrame under an explicit name, then
# create a new table from the contents of the DataFrame
con.register('building_df', building_df)
query = """
CREATE OR REPLACE TABLE osm_nigeria_building AS
SELECT * REPLACE (ST_GeomFromWKB(geometry) AS geometry)
FROM building_df
"""
con.execute(query)

# If the table already exists, all we need to do is insert into it
# Insert into an existing table from the contents of a DataFrame
#con.execute("INSERT INTO existing_table SELECT * FROM loaded_Dataframe")

Show the tables again to check whether the data has been ingested into the database

In [None]:
con.sql("SHOW TABLES")

If that works for one layer, we can then load the rest of the data into the database in bulk

In [None]:
%sql duckdb:///:memory:
# %sql duckdb:///path/to/file.db

In [None]:
%%sql

SELECT * FROM duckdb_extensions();

## References 

* https://duckdb.org/docs/api/python/data_ingestion