IBM/SqlDataPrep4ML

SQLDataPrep4ML - SQL Data Preprocessing Library

A library that implements machine learning data preprocessing functions which run directly in a SQL database. Unlike the in-memory execution approach of traditional libraries such as scikit-learn, it does not retrieve the data from the source; instead, it dynamically generates SQL queries and executes them in the RDBMS where the data is stored.

The current version supports the following RDBMS backends: IBM DB2 (LUW and Z) and PostgreSQL.
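
To illustrate the idea of generating SQL instead of fetching rows, here is a minimal sketch of a min-max scaler expressed as a query. It is not the library's API: the function name min_max_scale_sql, the table people, and the column age are hypothetical, and the example runs against an in-memory SQLite database from the Python standard library purely so that it is self-contained. SQLite is not one of the supported backends.

```python
import sqlite3


def min_max_scale_sql(table, column):
    """Return a SELECT that rescales `column` to [0, 1] inside the database.

    Hypothetical helper for illustration only. Identifiers are interpolated
    directly, so they must come from trusted code, never from user input.
    """
    return (
        f"SELECT ({column} - stats.lo) * 1.0 / (stats.hi - stats.lo) "
        f"AS {column}_scaled "
        f"FROM {table}, "
        f"(SELECT MIN({column}) AS lo, MAX({column}) AS hi FROM {table}) AS stats "
        f"ORDER BY {column}"
    )


# Demo data lives in the database; only the scaled result is fetched.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?)", [(20,), (30,), (40,)])

query = min_max_scale_sql("people", "age")
scaled = [row[0] for row in conn.execute(query)]
print(query)
print(scaled)
```

The minimum and maximum are computed by the database in a subquery, so the raw column never has to be loaded into Python memory. A production version would also need to handle a constant column, where the maximum equals the minimum.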

Structure of the library

Setup

Dependencies

DB2 driver

The DB2 driver requires the following libraries.
DB2 driver setup steps:
  1. Install via pip: https://github.com/ibmdb/python-ibmdb
  2. Install from source: https://pypi.org/project/ibm-db-sa/

Specifying DB2 Connection

A TCP/IP connection can be specified as follows: e = create_engine("db2+ibm_db://user:pass@host[:port]/database")

For a local socket connection, exclude the "host" and "port" portions: e = create_engine("db2+ibm_db://user:pass@/database")
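
The two URL forms above can be produced by one small helper. This is a sketch under stated assumptions: db2_url and make_engine are hypothetical names, not part of the library, and the port 50000 and the credentials are placeholder values. The password is percent-encoded so that characters such as "@" do not break the URL. SQLAlchemy is imported lazily, so the URL helper itself needs only the standard library.

```python
from urllib.parse import quote_plus


def db2_url(user, password, database, host=None, port=None):
    """Build a db2+ibm_db URL; leave host unset for a local socket connection."""
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    location = ""
    if host:
        location = host if port is None else f"{host}:{port}"
    return f"db2+ibm_db://{credentials}@{location}/{database}"


def make_engine(url):
    """Create a SQLAlchemy engine; requires sqlalchemy, ibm_db and ibm_db_sa."""
    from sqlalchemy import create_engine  # imported lazily on purpose
    return create_engine(url)


# Placeholder values, for illustration only.
tcp_url = db2_url("user", "p@ss", "database", host="host", port=50000)
local_url = db2_url("user", "pass", "database")
print(tcp_url)
print(local_url)
```

With the drivers installed, the engine would then be created with e = make_engine(tcp_url).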

Directories

  • sp_preprocessing - the library's Python module
  • Tests and Examples - a series of files showcasing various aspects of the library
  • Jupyter Notebook Examples
  • Performance Tests