CP4D-notebook-datastage

This project demonstrates the integration between a notebook and DataStage in the IBM CP4D environment. IBM CP4D is IBM Cloud Pak for Data, an open, extensible data platform that provides a data fabric to make all data available for AI and analytics, on any cloud.

Description

File validations are performed by a notebook in the Analytics Project, and transformations are applied to files through DataStage jobs in the Data Transformation Project. The implemented functionality can be exercised from the notebook. The basic steps are listed below, with a connection sketch after the list:

  1. Connect to IBM COS (S3) and DB2
  2. Take the data from the IBM COS landing bucket
  3. Perform checks and validations
  4. Insert the status into DB2
  5. Save the data in the IBM COS processed bucket
  6. Call DataStage REST APIs for data transformation
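
As a rough sketch of steps 1, 2, and 5, the connections in a CP4D Analytics notebook might look like the following. This is a minimal sketch assuming the ibm-cos-sdk (ibm_boto3) and ibm_db packages; every credential, endpoint URL, and bucket or object name here is a placeholder, not a value from this repository.

```python
import ibm_boto3
import ibm_db
from ibm_botocore.client import Config

# Step 1a: connect to IBM COS (S3-compatible object storage).
cos = ibm_boto3.client(
    "s3",
    ibm_api_key_id="<COS_API_KEY>",
    ibm_service_instance_id="<COS_INSTANCE_CRN>",
    config=Config(signature_version="oauth"),
    endpoint_url="https://s3.us-south.cloud-object-storage.appdomain.cloud",
)

# Step 1b: connect to DB2 with a DSN-style connection string.
db2_conn = ibm_db.connect(
    "DATABASE=<DB_NAME>;HOSTNAME=<DB2_HOST>;PORT=50000;PROTOCOL=TCPIP;"
    "UID=<DB2_USER>;PWD=<DB2_PASSWORD>;",
    "", "",
)

# Step 2: take a file from the landing bucket.
obj = cos.get_object(Bucket="<landing-bucket>", Key="client_data.csv")
raw_bytes = obj["Body"].read()

# ... steps 3-4: checks, validations, and audit inserts go here ...

# Step 5: save the validated data into the processed bucket.
cos.put_object(Bucket="<processed-bucket>", Key="client_data.csv", Body=raw_bytes)
```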

Advantages

  1. Functionality can be customized through notebooks.
  2. Transformations on millions of rows are processed through the DataStage tool integrated into the IBM CP4D environment.
  3. No need to lift and shift data between environments based on usage.

Environment Details

This implementation is done on IBM CP4D, which runs on top of an OpenShift cluster on IBM Cloud. IBM CP4D allows three kinds of projects:

  1. Analytics Project: Notebooks, AutoAI, data connections, dashboards, Federated Learning environment
  2. Data Transformation Project: DataStage ETL
  3. Data Quality Project: Data Cleansing and Data Matching, Business Terms, Rules

The conf directory contains sample_configuration_file.txt, the configuration file that holds the user-provided input parameters.

It holds the client name, file name, target table name, DataStage job name, data transformation project name, and the username and password of the user on the IBM CP4D cluster hosted on OpenShift, with the values separated by the "|" (pipe) symbol. Users store all their files, along with this configuration file, in the IBM COS bucket. The notebook reads the parameters from this configuration file and triggers the functionality. The packages required to build this project come pre-installed in the Analytics notebook.
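
For illustration, a single configuration line carrying the fields named above might look like the sketch below. The field order and names here are assumptions for demonstration; the authoritative layout is whatever sample_configuration_file.txt defines.

```python
# Hypothetical pipe-separated configuration line; the field order is assumed.
CONFIG_LINE = "acme|sales_2023.csv|SALES_TARGET|ds_transform_job|data-transform-proj|jdoe|secret"

FIELDS = [
    "client_name", "file_name", "target_table", "datastage_job",
    "transformation_project", "username", "password",
]

def parse_config(line: str) -> dict:
    """Split one pipe-separated configuration line into named parameters."""
    values = line.strip().split("|")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))

params = parse_config(CONFIG_LINE)  # e.g. params["datastage_job"] == "ds_transform_job"
```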

Working Details

The Analytics notebook works like a Jupyter notebook. When the user receives a request from a client, they place the parameters in the configuration file and execute the code stored in the notebook. The notebook code runs through the steps listed in the Description section.

The Analytics Project holds the data connectors for IBM COS and DB2.
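
As a sketch of step 4, the notebook can record validation results through the DB2 connector. The AUDIT_TABLE_STATUS table and its column names below are assumptions inferred from the screenshots, not the repository's actual schema; db2_conn is the connection from the earlier sketch.

```python
import ibm_db

def insert_status(conn, file_name: str, row_count: int, status: str) -> None:
    """Record the validation outcome of one file in DB2 (step 4)."""
    sql = ("INSERT INTO AUDIT_TABLE_STATUS (FILE_NAME, ROW_COUNT, STATUS) "
           "VALUES (?, ?, ?)")
    stmt = ibm_db.prepare(conn, sql)
    ibm_db.execute(stmt, (file_name, row_count, status))

insert_status(db2_conn, "client_data.csv", 1_000_000, "VALIDATED")
```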

Highlights

  1. IBM COS and DB2 connectors.
  2. Loading and storing data in S3 buckets.
  3. Integration between the Analytics and Data Transformation projects using REST APIs (see the sketch below).
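
A minimal sketch of the API integration from the notebook side is shown below: authenticate against the CP4D cluster, then trigger a DataStage job run. The /icp4d-api/v1/authorize and /v2/jobs/{job_id}/runs paths are assumptions based on the CP4D and Watson Data API documentation and vary by release; the host, job ID, project ID, and credentials are placeholders.

```python
import requests

CPD_HOST = "https://<cpd-cluster-host>"  # placeholder cluster URL

# Authenticate against CP4D to obtain a bearer token (endpoint path assumed).
auth = requests.post(
    f"{CPD_HOST}/icp4d-api/v1/authorize",
    json={"username": "<CPD_USER>", "password": "<CPD_PASSWORD>"},
    verify=False,  # CP4D clusters often use self-signed certificates
)
token = auth.json()["token"]

# Trigger a run of the DataStage job in the Data Transformation project
# (Jobs API path assumed; check the docs for your CP4D version).
run = requests.post(
    f"{CPD_HOST}/v2/jobs/<job_id>/runs",
    params={"project_id": "<project_id>"},
    headers={"Authorization": f"Bearer {token}"},
    json={"job_run": {}},
    verify=False,
)
print(run.status_code, run.json())
```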

Demo Screenshots

The output of the functionality can be seen through the Analytics notebook as shown below:

[IBM Jupyter notebook screenshot]

IBM COS bucket structure:

[IBM COS screenshot]

Count metrics in AUDIT_TABLE for a file:

[AUDIT_TABLE screenshot]

File status in AUDIT_TABLE_STATUS:

[AUDIT_TABLE_STATUS screenshot]
