Skip to content

AliMassoud/ADBC-Driver-with-PostgreSQL

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ADBC Writing to PostgreSQL Database

In this repository, we introduce an experiment that shows how much Arrow Database Connectivity (ADBC) is efficient in writing big datasets to PostgreSQL database. The experiment is conducted on a dataset of 10 and 40 million rows and 3 columns. The dataset is generated randomly using Python's pandas library. I use Jupyter Notebook to run the experiment.

What is ADBC?

It is an API collection that allows Arrow-native to access databases. It executes SQL queries, database catalogs, and more. It leverages Apache Arrow to avoid making copies of data on reads and speed up data access with eliminating serialization overhead. In addition, the driver is responsible for converting objects from/to arrow where required and that helps developers focus on their main objective.

Experiment

We compare the performance of ADBC driver and SQLAlchemy which is a popular Python SQL toolkit and Object-Relational Mapping (ORM) library. We measure the time taken to write 10 and 40 million rows to a PostgreSQL database. The experiment is conducted on a local machine with 16GB RAM and 8 cores.

To replicate the experiment, please follow instructions belows (make sure to have python, docker, and docker-compose installed on your machine):

  1. Clone repository.
  2. Navigate to project directory and run the following command to start PostgreSQL database on docker container:
docker-compose up -d
  1. Create a virtual environment, activate it and install required libraries:
python -m venv .venv
source .venv/bin/activate
make install

you are all set, you can replicate the experiment by using the jupyter notebook!

for more explanation about the experiment, please refer to this medium article:
ARTICLE

Results

The results show that ADBC is faster than SQLAlchemy in writing big datasets to PostgreSQL database. The time taken to write 10 million rows insertion is 32.4 seconds with ADBC and 1 minute 4.6 seconds with SQLAlchemy. The time taken to write 40 million rows insertion is 1 minute 33 seconds with ADBC and 4 minutes 30 seconds with SQLAlchemy. The results show that ADBC is 2.3 times faster than SQLAlchemy in writing 10 million rows and 2.2 times faster in writing 40 million rows.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published