Skip to content

Enable and extend Jupyter for Greenplum Database environments

Notifications You must be signed in to change notification settings

cantzakas/gpdb-jupyter

Repository files navigation

Enable and extend Jupyter Notebooks for Greenplum Database environments

Install Greenplum Sandbox VM

Download the Greenplum Sandbox VM before moving forward to the next steps. Read throught the instructions and tutorials available on greenplum.org on how to enable, connect to and start your Greenplum Database. SSH to your Greenplum Database.

Ensure you can run Python from the command line

Before you go any further, make sure you have Python and that the expected version is available from your command line. You can check this by running:

$ python --version

You should get some output like Python 2.7.12..

Ensure you can run pip from the command line

Additionally, you’ll need to make sure you have pip available. You can check this by running:

$ pip --version

If pip isn’t already installed, then first try to bootstrap it from the standard library:

$ python -m ensurepip --default-pip

If that still doesn’t allow you to run pip:

  • Securely Download get-pip.py[1].
  • Run python get-pip.py[2]. This will install or upgrade pip. Additionally, it will install setuptools and wheel if they’re not installed already.
Warning:
Be cautious if you’re using a Python install that’s managed by your operating system or another package manager. get-pip.py does not coordinate with those tools, and may leave your system in an inconsistent state. You can use python get-pip.py --prefix=/usr/local/ to install in /usr/local which is designed for locally-installed software.

Ensure pip, setuptools, and wheel are up to date

While pip alone is sufficient to install from pre-built binary archives, up to date copies of the setuptools and wheel projects are useful to ensure you can also install from source archives:

$ python -m pip install --upgrade pip setuptools wheel

Ensure ipython-sql & psycopg2 are up to date

You must also install the Python SQL toolkit, PostgreSQL adapter for the Python and sql_magic Python library for writing SQL to interact with relational databases, with the pip tool. You may achieve that by executing the following in the command-line:

$ pip install --upgrade ipython-sql psycopg2-binary sqlalchemy sql_magic

Install Jupyter Notebooks with pip

Install the Jupyter Notebook using:

$ pip install jupyter

Congratulations. You have installed Jupyter Notebook.

Start a Jupyter Notebook Server for the first time

You can start running a notebook server from the command line using the following command:

$ jupyter notebook

This will print some information about the notebook server in your console, and open a web browser to the URL of the web application (by default, http://127.0.0.1:8888).

Configure the Jupyter Notebook Server using the Command-Line options

The notebook server can be run with a variety of command line arguments.

To open a specific Jupyter Notebook

The following code should open the given notebook in the currently running notebook server, starting one if necessary.

$ jupyter notebook notebook.ipynb

To start the Jupyter Notebook using a custom IP or port

By default, the notebook server starts on port 8888. If port 8888 is unavailable or in use, the notebook server searches the next available port. You may also specify a port and an IP manually. In this example, we set the server’s port to 9999 and the server's IP to 10.0.2.45:

$ jupyter notebook --port 9999 -ip "10.0.2.15"

To get help about Jupyter Notebook server options

The full list of available options can be found here or can be generated by running the following command:

$ jupyter notebook --help

Running a public Jupyter Notebook Server

If you want to access your notebook server remotely via a web browser, you can do so by running a public notebook server. For optimal security when running a public notebook server, you should first secure the server with a password and/or SSL/HTTPS as described in Securing a notebook server.

Important
This is not the multi-user server you are looking for. This document describes how you can run a public server with a single user. This should only be done by someone who wants remote access to their personal machine. Even so, doing this requires a thorough understanding of the set-ups limitations and security implications. If you allow multiple users to access a notebook server as it is described in this document, their commands may collide, clobber and overwrite each other.
If you want a multi-user server, the official solution is JupyterHub. To use JupyterHub, you need a Unix server (typically Linux) running somewhere that is accessible to your users on a network. This may run over the public internet, but doing so introduces additional security concerns.

Check to see if you have a notebook configuration file, jupyter\_notebook\_config.py. The default location for this file is your Jupyter folder located in your home directory, which for Linux systems is /home/$USERNAME/.jupyter/jupyter_notebook_config.py. If you don’t already have a Jupyter folder, or if your Jupyter folder doesn’t contain a notebook configuration file, run the following command - This command will create the Jupyter folder if necessary, and create notebook configuration file, jupyter_notebook_config.py, in this folder.

$ jupyter notebook --generate-config

Automatic Password setup

As of notebook 5.3, the first time you log-in using a token, the notebook server should give you the opportunity to setup a password from the user interface. You will be presented with a form asking for the current token, as well as your new password; enter both and click on Login and setup new password. Next time you need to log in you’ll be able to use the new password instead of the login token, otherwise follow the procedure to set a password from the command line.

Starting at notebook version 5.0, you can enter and store a password for your notebook server with a single command. jupyter notebook password will prompt you for your password and record the hashed password in your jupyter_notebook_config.json.

$ jupyter notebook password
Enter password:  ****
Verify password: ****
[NotebookPasswordApp] Wrote hashed password to ~/.jupyter/jupyter_notebook_config.json

This can be used to reset a lost password; or if you believe your credentials have been leaked and desire to change your password. Changing your password will invalidate all logged-in sessions after a server restart.

Configure the Jupyter Notebook Server using the jupyter_notebook_config.json file

The same configuration parameters for which we previously used the command-line can also be configured on the jupyter_notebook_config.json file. Simply update the file with the appropriate configuration key-value pairs, for example as shown below:

{
  "NotebookApp": {
    "password": "sha1:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    "ip": "10.0.2.15",
    "notebook_dir": "/home/gpadmin/jupyter/",
    "open_browser": false,
    "port": 9999
  }
}

Install Jupyter Notebook Extensions

Run the following in a command prompt:

$ pip install jupyter_contrib_nbextensions && jupyter contrib nbextension install 

Enable Snippets Menu Jupyter Notebook Extension

  • Start up a Jupyter Notebook:

img

  • Navigate to the new Nbextensions tab:

img

  • Enable Jupyter Notebook Snippets Menu Extension:

img

  • Enable/Activate the Include custom menu content parsed from JSON string below option, and update the JSON text within the text area with the following:
{
    "name" : "Greenplum Database",
    "sub-menu" : [
        {
            "name" : "Connect to Greenplum Database (manual)",
            "snippet" : ["%load_ext sql", 
                "%sql postgres://gpadmin:pivotal@10.0.2.6:5432/gpadmin"]
        },
        {
            "name" : "Connect to Greenplum Database (via config file)",
            "snippet" : ["import json",
                "with open('/root/gpdb-config.json') as f:\n dbconf = json.load(f)",
                "%load_ext sql", 
                "%sql postgresql://{dbconf['user']}:{dbconf['passw']}@{dbconf['host']}:{str(dbconf['port'])}/{dbconf['database']}"]
        },
        "---",
        {
            "name" : "Get Greenplum Database Version",
            "snippet" : ["%%sql\nSELECT version()"]
        }
    ]
}

img

  • Create a new Python Notebook. Notice the Snippets menu which has now been added into your toolbar. Click to expand the menu and notice the Greenplum Database sub-menu and the Connect to Greenplum Database (manual), Connect to Greenplum Database (via config file), and Get Greenplum Database Version options which have been added, according to the JSON string above.

img

  • Click on the Connect to Greenplum Database (manual) option; the snippet code defined in the JSON string, is now added into the active cell:

img

  • Run the cells to execute the code inserted:

img


[1]: Secure in this context means using a modern browser or a tool like curl that verifies SSL certificates when downloading from https URLs.

[2]: Depending on your platform, this may require root or Administrator access. pip is currently considering changing this by making user installs the default behavior.

About

Enable and extend Jupyter for Greenplum Database environments

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published