Data Science Workspace (dsws)

An integration layer for common data science components. It provides common access patterns for the Hadoop component libraries used in parallel, distributed environments.

Some access patterns are standardized:

  • hive
  • imp
  • sql
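To make the idea of a standardized access pattern concrete, here is a minimal sketch of one way to put different backends behind the same call. It is not the dsws API: the names `SqlAccess`, `DbApiAccess`, and the `sql()` method are hypothetical, and an in-memory `sqlite3` database stands in for a Hive or Impala connection so the example runs anywhere.

```python
import sqlite3
from typing import Any, List, Protocol, Tuple


class SqlAccess(Protocol):
    """Hypothetical shared interface; not taken from dsws."""

    def sql(self, query: str) -> List[Tuple[Any, ...]]:
        ...


class DbApiAccess:
    """Wraps any DB-API 2.0 connection behind the same sql() call."""

    def __init__(self, conn: Any) -> None:
        self._conn = conn

    def sql(self, query: str) -> List[Tuple[Any, ...]]:
        cur = self._conn.cursor()
        try:
            cur.execute(query)
            # Statements such as CREATE TABLE produce no result set.
            return cur.fetchall() if cur.description else []
        finally:
            cur.close()


# sqlite3 is only a stand-in backend for this sketch.
access = DbApiAccess(sqlite3.connect(":memory:"))
access.sql("CREATE TABLE t (x INTEGER)")
access.sql("INSERT INTO t VALUES (1), (2)")
rows = access.sql("SELECT x FROM t ORDER BY x")
print(rows)
```

Because the wrapper relies only on `cursor()`, `execute()`, and `fetchall()`, any connection object that follows DB-API 2.0 could be passed in the same way under this assumption.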

Some access patterns are specific to CDSW:

Existing libraries, classes, configs, and types

library class config type default
dsws duct X
hive Hive cli
Beeline (Hive) hbl cli X
pyhs2 Pyhs2 conn X
Beeline (Impala) Ibl cli X
impyla Impyla conn X
spark Spark sess X
tb Tb webapp

Requirement notes. Some libraries are required by this package; others will only be available after install.

Impyla requirements

thrift==0.9.3
impyla>=0.14.0

For Hive and/or Kerberos support

sasl>=0.2.1
thrift_sasl>=0.2.1
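The pins above can be checked against the current environment before trying to connect. The following is a rough sketch using only the standard library (Python 3.8+). The helper names `parse_version`, `satisfies`, and `installed_version` are made up for this example, and the comparison only understands plain dotted numeric versions with `==` and `>=`; a real project would more likely rely on pip itself or the `packaging` library.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Pins copied from the requirement notes above.
PINS = ["thrift==0.9.3", "impyla>=0.14.0", "sasl>=0.2.1", "thrift_sasl>=0.2.1"]


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse leading numeric parts: '0.14.0' -> (0, 14, 0).

    Stops at the first non-numeric part, so suffixes such as 'rc1'
    are not compared correctly (a known limit of this sketch).
    """
    parts = []
    for part in text.split("."):
        if not part.isdigit():
            break
        parts.append(int(part))
    return tuple(parts)


def satisfies(installed: str, spec: str) -> bool:
    """Check a version against a 'name==x.y.z' or 'name>=x.y.z' pin."""
    match = re.fullmatch(r"([A-Za-z0-9_.-]+)(==|>=)([0-9.]+)", spec.strip())
    if match is None:
        raise ValueError(f"unsupported requirement: {spec!r}")
    _, op, required = match.groups()
    have, want = parse_version(installed), parse_version(required)
    # Pad with zeros so that '0.14' and '0.14.0' compare as equal.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    return have == want if op == "==" else have >= want


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


for pin in PINS:
    name = re.split(r"==|>=", pin)[0]
    found = installed_version(name)
    if found is None:
        print(f"{pin}: not installed")
    else:
        status = "ok" if satisfies(found, pin) else "version mismatch"
        print(f"{pin}: found {found} ({status})")
```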

For some of the example code



Upload to PyPI

pip install -U pip setuptools twine

python setup.py sdist

twine upload dist/*


So that there is a configuration to evaluate, the project ships with an example configuration specific to a quickstart instance.