This is a Linux/macOS utility to import CSV files into a PostgreSQL database using Docker and Python. It runs two services:
- A PostgreSQL server.
- A server loaded with Python and JupyterLab. JupyterLab provides GUI access to the host's CSV files, Python code, and the guest's directory tree and terminal.
Set up is done easily via terminal with make up. Importing CSVs is done via a CLI-enabled Python script with the command make import. There is a test module (make test) that allows testing the import process. There is also a logger script to notify and make a record of individual successes and failures when importing CSVs.
This project uses docker compose to run two services and a network.
This service runs the official Postgres image. See https://hub.docker.com/_/postgres.
- Container Name:
pg_server_csvs2postgres. - Exposes port
5432. - Initializes a database called
imported_csvs. - Persists data to
csvs-to-postgres/data/in host. - Permissions are default in the Postgres image.
This service is built on top of https://quay.io/jupyter/base-notebook, customized with a Dockerfile to include additional Python libraries. Additional packages, e.g., SQLAlchemy, are listed in csvs-to-postgres/code/docker/jupyter/requirements.txt. make is also installed for CLI automation.
- Mounts:
- directory
csvs-to-postgres/data/csvs/containing CSVs in host mounts to~/data. - Python scripts directory
csvs-to-postgres/code/python/mounts to~/code.
- directory
- Exposes port
8888for notebook access. - JupyterLab is reachable at http://localhost:8888/lab?token=custom_token.
rootandsudoare inaccessible;rootcan be enabled in Dockerfile.
Python code can be modified either in the host system, e.g., with VSCode, or in the guest system using JupyterLab. Changes sync and are persistent in the host, allowing script customization.
A bridge network is created to connect the two containers.
Below it is assumed (as an example) that the project's directory is saved to the Home directory. If that is not the case, adjust the path in the commands below accordingly by susbtituting ~ with the path to the directory containing the project. If downloading as a ZIP file, update the project's path with the corresponding nomenclature, e.g., by using csvs-to-postgres-main instead of csvs-to-postgres.
Open a terminal pointing to the project's directory ~/csvs-to-postgres/code/docker/ and send the command make up. This will set everything up.
CSVs to be imported should be placed inside the project's directory ~/csvs-to-postgres/data/csvs/. You can also use symlinks to point the directory to the original files without moving them. This directory is automatically created when the container is initialized, but can be set up beforehand with the CSV files inside. It won't be overwritten if it is created manually.
- Once Docker finishes setting up all services, navigate to the JupyterLab server at
http://localhost:8888/lab?token=custom_token. - Open a terminal inside JupyterLab and change the working directory with
cd /home/jovyan/code/etl/. The Makefile in this directory can be run directly in JupyterLab's terminal. Options include:make importto import all CSV files from~/csvs-to-postgres/data/csvs/(in the host computer) into the Postgres database. The CSV files are visible in/home/jovyan/data/in the JupyterLab container.make logsto show the 5 most recent logs, which are produced and saved after importing into/home/jovyan/code/logs/in the container, and in~/csvs-to-postgres/code/logs/in the host.make testto run import tests on the CSVs defined in the host at~/csvs-to-postgres/code/python/tests/test_csvs2postgres.py. You can open and edit this file with JupyterLab at/home/jovyan/code/etl/tests/test_csvs2postgres.py.
- Use
ctrl + cwith the original terminal window followed bymake downto send the shut down signal to Docker. Alternatively, from another terminal in your host pointing to~/csvs-to-postgres/code/docker/sendmake down. The Postgres database will be saved in the host at~/csvs-to-postgres/data/postgres/. This database can be accessed with the default credentials set up by the Postgres image.