Dsdb is a Buildnn open source project.
Tired of having to manage thousands of unstructured .csv outputs for your small Data Science experiments? Would you like to experience a real SQL-like data management of yout datasets with a real database?
Take a look at what you can do with Postgres.
Give a boost to your data management skills with DsDb
(Data Science DataBase).
- Basic usage of
DsDb
- Quickstart with docker-compose
- Pip Installation
- Connection to a custom DB server
Push a table from pandas to postgres with Just:
from dsdb import DsDbConnect
import pandas as pd
# Load some data or take a DataFrame you analyzed
df = pd.read_csv('my-ugly-file.csv')
with DsDbConnect() as con:
df.to_sql_table('table', con=con, if_exist='append')
and... that's it. To load data from the db:
with DsDbConnect() as con:
df_read = pd.read_sql_table('test', con)
The following workflow launches a dockerized jupyter
server with an underlying db.
Firs, retrieve our pre-made docker-compose.yml
file:
cd my-project-dir
wget https://raw.githubusercontent.com/buildnn/dsdb/master/docker-compose.yml
wget https://raw.githubusercontent.com/buildnn/dsdb/master/notebooks/dsdb_test.ipynb
touch .env
Open the .env
file and place the following text, filling the {text under curly brackets}
as suggested:
content of the .env
file -->
DSDB_USER=datascientist
DSDB_PASSWORD={your password}
DSDB_DB=dsdb
POSTGRES_USER=admin
POSTGRES_PASSWORD={your db password}
POSTGRES_DB=mydb
PGADMIN_DEFAULT_EMAIL={your email}
PGADMIN_DEFAULT_PASSWORD={another different password}
And then start the game
docker-compose up
And... that should be it.
Visit:
https://localhost:8888
to see jupyterhttps://localhost:5050
to visit the pgadmin panel (use the credentials in .env)
To pip-install this repo:
pip install dsdb
dsdb.DsDbConnect
uses a DsDb
object to connect to your db. It loads some
environment variables and uses them to perform
the connection. these are
DSDB_USER
: your username in the DBDSDB_PASSWORD
: your password to access the DBDSDB_DB
: The name of the DBDSDB_HOST
: The address of the DB serverDSDB_DRIVER
: The driver. E.g.'postgres+psycopg2'
for a standard postgres.
The following is a quick way to create them directly inside yout python script:
import os
os.environ['DSDB_USER'] = 'myuser'
# prompt a password input (never write pass explicitly!)
os.environ['DSDB_PASSWORD'] = input('password:')
os.environ['DSDB_DB'] = 'mydb'
os.environ['DSDB_HOST'] = 'localhost:5432' # server address
os.environ['DSDB_DRIVER'] = 'postgres+psycopg2'
...
another option is to create a custom dsdb.DsDb
object
to pass to dsdb.DsDbConnect
:
import dsdb
db = dsdb._utils_dsdb.DsDb(
usr='myuser',
pwd=input('password:'),
db='mydb',
host='localhost:5432', # server address
driver='postgres+psycopg2',
)
with dsdb.DsDbConnect(db=db) as con:
df.to_sql_table('table', con=con)
...