
# Customise Pangeo Environment on Gadi


In this notebook we demonstrate how to:

- load additional modules 
- install packages not available on Gadi

The list of software already available on Gadi is [here](https://opus.nci.org.au/x/6YT_Ag). Users may need to load additional modules to run their own workflows. If a particular package is not
available on Gadi, users can install it under their own working directory, e.g., their project space, if that is permitted. Please note that the Pangeo environment should always be loaded **<span style="color:red">before</span>** loading other modules or installing new packages.

###  Step 1: Enable Pangeo in your shell environment

To enable the Pangeo environment, you can use the following command within jobs, or within an interactive environment:

```
$ module load pangeo
Loading pangeo/2020.05
  Loading requirement: intel-mkl/2019.3.199 python3/3.7.4 hdf5/1.10.5
    netcdf/4.7.3

```
Note that Pangeo has its own Python installation.
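
Because Pangeo brings its own Python, it can be worth confirming which interpreter is actually active after `module load pangeo`. The following is a minimal sketch using only the standard library. The helper name `describe_interpreter` is hypothetical, and the paths it reports depend on how the module is installed on the system.

```python
import sys


def describe_interpreter():
    """Return basic facts about the Python interpreter that is currently running."""
    return {
        "executable": sys.executable,  # full path of the python binary in use
        "prefix": sys.prefix,          # installation root of this interpreter
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running this before and after loading the module should show whether the interpreter changed.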


###  Step 2: Check if a module is available on Gadi

Let's see if **tensorflow** is available on Gadi.

```
[abc123@gadi-login-01 ~]$ module avail tensorflow

-----------------------/apps/Modules/modulefiles-----------------------
tensorflow/2.0.0

```

###  Step 3a: Load modules if they are available on Gadi

If the module exists, `module avail` lists all the available versions. Pick the one that you would like to use:

```
[abc123@gadi-login-01 ~]$ module load tensorflow/2.0.0
```

Check if the module has loaded properly:

```
[abc123@gadi-login-01 ~]$ module list

Currently loaded Module files:
 1) pbs                    5) netcdf/4.7.3           9) nccl/2.5.6-1+cuda10.1  
 2) intel-mkl/2019.3.199   6) pangeo/2020.05        10) openmpi/4.0.1          
 3) python3/3.7.4          7) cuda/10.1             11) tensorflow/2.0.0       
 4) hdf5/1.10.5            8) cudnn/7.6.5-cuda10.1  
```
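
A loaded module does not by itself guarantee that the active Python can find the package. As a hedged sketch, the check below asks Python whether a top-level package can be located on the current search path without importing it, which is useful for heavy packages. The function name `is_importable` is hypothetical.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module with this name can be found on sys.path.

    The module is only located, not imported.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "tensorflow" is the module loaded in the step above.
    print("tensorflow importable:", is_importable("tensorflow"))
```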

###  Step 3b: Install modules if they are NOT available on Gadi

Let's check whether the **Deep Graph Library** is available on Gadi. We can list all modules that start with the letter **d**:

```
[abc123@gadi-login-01 ~]$ module avail d

-----------------------/apps/Modules/modulefiles-----------------------
dalton/2018.2   dftbplus/19.1    dmo13/2018

```
We can see that the **Deep Graph Library** is not available as a module on Gadi and therefore we will need to install it ourselves.




There are multiple ways to install Python packages. For example, you can use **pip**, the de facto standard package manager for installing and managing Python packages (see the instructions here: https://packaging.python.org/tutorials/installing-packages/#installing-to-the-user-site ). Another popular option is [Conda](https://docs.conda.io/en/latest/), an open-source package, dependency and environment management system for any language: Python, R, Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN and more.


Please note: Additional packages should be installed within a user's own directory.

As an example, you can use pip to install a package by running:

```
$ pip install --user <package name>

```

Let's now install the package using pip. Note that the PyPI package used in this example is `deepgraph` (DeepGraph 0.2.2 in the log below):

```
[abc123@gadi-login-01 ~]$ pip install --user deepgraph

Collecting deepgraph
  Using cached https://files.pythonhosted.org/packages/fc/3e/4a34a5316a5f886b8d7a6787c24852d9e5a5ef00b4ec6af0736f681a3a58/DeepGraph-0.2.2.tar.gz
Requirement already satisfied: numpy>=1.6 in /apps/python3/3.7.4/lib/python3.7/site-packages/numpy-1.17.2-py3.7-linux-x86_64.egg (from deepgraph) (1.17.2)
Requirement already satisfied: pandas>=0.17.0 in /apps/pangeo/2019.12/lib/python3.7/site-packages (from deepgraph) (0.25.3)
Requirement already satisfied: pytz>=2017.2 in /apps/pangeo/2019.12/lib/python3.7/site-packages (from pandas>=0.17.0->deepgraph) (2019.3)
Requirement already satisfied: python-dateutil>=2.6.1 in /apps/python3/3.7.4/lib/python3.7/site-packages (from pandas>=0.17.0->deepgraph) (2.8.1)
Requirement already satisfied: six>=1.5 in /apps/python3/3.7.4/lib/python3.7/site-packages (from python-dateutil>=2.6.1->pandas>=0.17.0->deepgraph) (1.13.0)
Building wheels for collected packages: deepgraph
  Building wheel for deepgraph (setup.py) ... done
  Created wheel for deepgraph: filename=DeepGraph-0.2.2-cp37-cp37m-linux_x86_64.whl size=373893 sha256=53e6966cdd833e99af226dd925f9d9f1a10259053cd13f4391caa356bbfedabb
  Stored in directory: /home/900/nre900/.cache/pip/wheels/7f/4b/45/caf95420067f7a1795c5664bce0beda747d0ce931c2424c5ff
Successfully built deepgraph
Installing collected packages: deepgraph
Successfully installed deepgraph-0.2.2

```
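
With `--user`, pip installs into the per-user site-packages directory rather than the system location. The sketch below uses the standard `site` module to show where that directory is for the active interpreter and whether Python will search it. The printed path depends on your account and Python version, so no output is shown here.

```python
import site
import sys

# Directory that `pip install --user` targets for this interpreter.
user_site = site.getusersitepackages()

print("user site-packages:", user_site)
# True when the user site directory is enabled; False or None when it is disabled.
print("user site enabled:", site.ENABLE_USER_SITE)
# The directory is only added to sys.path if it exists when Python starts.
print("on sys.path:", user_site in sys.path)
```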

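### Step 4: Validate the new installation

Start Python and import the package. If the import returns without an error, the installation works.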


```
[abc123@gadi-login-01 ~]$ python

Python 3.7.4 (default, Nov  6 2019, 12:34:08) 
[GCC 8.2.1 20180905 (Red Hat 8.2.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import deepgraph
>>> exit()
```
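
The interactive check above can also be scripted. The following sketch imports a package and reports the file it was loaded from, which helps confirm that Python picked up the copy in your own space rather than a system-wide one. The helper name `locate_module` is hypothetical.

```python
import importlib


def locate_module(module_name):
    """Import module_name and return the file it was loaded from.

    Returns None if the import fails or the module has no __file__ attribute.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__file__", None)


if __name__ == "__main__":
    # "deepgraph" is the package installed in Step 3b.
    print(locate_module("deepgraph"))
```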

### Step 5: Add path into your job script

If you want to use software installed in your own space, you can add an `export` line to your Bash login file:

`export PYTHONPATH=<path to where you installed your own software>:$PYTHONPATH`

You can also add the same line to your job script, e.g. `run_ipynb_job.sh`, as shown below. Replace the path with the location in your own working space where you installed the software.

```
#!/bin/bash
#PBS -N pangeo_test
#PBS -P c25
#PBS -q normal
#PBS -l walltime=5:00:00
#PBS -l ncpus=96
#PBS -l mem=384GB
#PBS -l jobfs=100GB
#PBS -l storage=scratch/z00+scratch/c25+gdata/c25
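# Load Pangeo first, before any other modules or path changes
module load pangeo
# Make software installed in your own space importable
export PYTHONPATH=/g/data/c25/apps:$PYTHONPATH
pangeo.ini.all.sh
sleep infinity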
```
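
To confirm that the `PYTHONPATH` setting took effect inside a job or notebook, a check along the following lines may help. It is a sketch under the assumption that entries are separated by the platform path separator (`:` on Linux). The function name `pythonpath_report` is hypothetical.

```python
import os
import sys


def pythonpath_report(environ=None):
    """Return a list of (entry, exists, on_sys_path) tuples for PYTHONPATH.

    environ defaults to os.environ; a plain dict can be passed for testing.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("PYTHONPATH", "")
    # Compare absolute paths so that relative entries still match.
    search_path = {os.path.abspath(p) for p in sys.path}
    report = []
    for entry in raw.split(os.pathsep):
        if not entry:
            continue  # skip empty entries from a leading or trailing separator
        report.append(
            (entry, os.path.isdir(entry), os.path.abspath(entry) in search_path)
        )
    return report


if __name__ == "__main__":
    for entry, exists, active in pythonpath_report():
        print(f"{entry}: exists={exists}, on sys.path={active}")
```

An entry that does not exist usually points to a typo in the path or to a storage area missing from the `#PBS -l storage=` line.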