Skip to content

This library contains several functions that allow you to migrate data from a CSV file or Pandas Dataframe into a PostgreSQL database using the libraries Psycopg2 and Pandas

License

Notifications You must be signed in to change notification settings

adolfojmnz/pandaspg

Repository files navigation

Description

This library contains several functions that allow you to migrate data from a CSV file or Pandas Dataframe into a PostgreSQL database using the libraries Psycopg2 and Pandas. Specifically, it includes functions for:

  • Loading data from a CSV file to a Pandas dataframe
  • Mapping Pandas Dataframe columns to their datatypes
  • Mapping Pandas Dataframe columns to suitable PostgreSQL datatypes
  • Connect to a PostgreSQL database
  • Creating new tables on a PostgreSQL database
  • Inserting data from a Pandas DataFrame into a table in a PostgreSQL database

Installation

Install The Required Dependencies

pip install pandas psycopg2

Install The pandaspg Package

pip install pandaspg
💡 Make sure you have a PostgreSQL database up and running!

Usage

I will walk you through a step-by-step example of how to migrate data from a CSV file into a PostgreSQL database. For demonstration purposes, download the exoplanets file located in this repo.


Download The CSV file

wget https://raw.githubusercontent.com/adolfojmnz/pandaspg/main/exoplanets_07-04-2023.csv
💡 This file contains a total of 34,112 rows and 92 columns with a total size of 28.6 MB

Import The Library

import pandaspg

Create a Pandas dataframe with the CSV file data

dataframe = pandaspg.csv_to_dataframe('exoplanets_04-07-2023.csv')

Generate a dictionary mapping the dataframe columns to their datatype

column_datatypes_dict = pandaspg.get_dataframe_column_dtypes_dict(dataframe)

Generate a dictionary mapping the dataframe columns to a suitable PostgreSQL datatype

pg_column_datatypes_dict = pandaspg.map_pandas_to_postgresql_datatypes(
    column_datatypes_dict
)

Connect to an existing and running PostgreSQL database

connection = pandaspg.connect_to_postgresql(
    database='analysis',
    user='postgres',
    password='postgres',
    host='localhost',
    port=5432,
)
💡 Replace the above parameters as you see fit.

Create a PostgreSQL table

pandaspg.create_postgresql_table(
    connection, 'exoplanets_csv', pg_column_datatypes_dict
)
💡 The parameter `exoplanets_csv` is the name of the database to be created.

Insert the data from the dataframe into the recently created table

pandaspg.insert_dataframe_into_postgresql(
    connection, 'exoplanets_csv', dataframe
)

Close the connection with the database

connection.close()

The Full Example

import pandaspg

dataframe = pandaspg.csv_to_dataframe('exoplanets_04-07-2023.csv')

column_datatypes_dict = pandaspg.get_dataframe_column_dtypes_dict(dataframe)

pg_column_datatypes_dict = pandaspg.map_pandas_to_postgresql_datatypes(
    column_datatypes_dict
)

connection = pandaspg.connect_to_postgresql(
    database='analysis',
    user='postgres',
    password='postgres',
    host='localhost',
    port=5432,
)

pandaspg.create_postgresql_table(
    connection, 'exoplanets_csv', pg_column_datatypes_dict
)

pandaspg.insert_dataframe_into_postgresql(
    connection, 'exoplanets_csv', dataframe
)

connection.close()

Inspect The Results

Enter the PostgreSQL prompt

psql -U postgres -d analysis

List the tables in the analysis database

\dt

Retrieve the data from the exoplanets_csv table

SELECT * FROM exoplanets_csv;

About

This library contains several functions that allow you to migrate data from a CSV file or Pandas Dataframe into a PostgreSQL database using the libraries Psycopg2 and Pandas

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages