# Portable Jupyter Notebooks and libraries

Gregor von Laszewski, laszewski@gmail.com

In this notebook we demonstrate a portable way to develop large Jupyter notebooks.

## Editor

Although Colab provides an online editor, it has limited features and we
recommend avoiding it for larger development work. The same applies to Jupyter
Notebook and its successor JupyterLab. They are good for interactive experiments
with notebooks, and for notebook features that more powerful editors do not support.

Two other editors are available: VSCode and PyCharm. We recommend PyCharm, as it
has many code checking and formatting features that are well suited to developing
large Jupyter scripts.

### PyCharm configuration

It is important to create a Python venv and register it with your PyCharm
environment, so that PyCharm uses the same environment as the command line. We
name the environment ENV3. To create it, use

```bash
python -m venv ~/ENV3
```

To activate it on the command line, use

```bash
source ~/ENV3/bin/activate
```

On Windows, in a bash-compatible shell such as Git Bash, use

```bash
source ~/ENV3/Scripts/activate
```
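
To confirm from inside a notebook or script that the running interpreter is the
one from the venv, a check such as the following can help. It is a minimal sketch
that relies only on the standard library; the helper name `in_virtualenv` is our
own and not part of any library.

```python
import sys

def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # In a venv, sys.prefix points to the venv, while sys.base_prefix
    # points to the Python installation the venv was created from.
    return sys.prefix != sys.base_prefix

print("interpreter:", sys.executable)
print("inside a venv:", in_virtualenv())
```

If the interpreter path does not point into `~/ENV3`, the notebook kernel or
PyCharm is probably using a different environment than the command line.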

After you activate the environment, you can install modules into it with pip,
either individually or from a requirements.txt file. However, if you also use
Google Colab, it is easier to embed such installation steps in your program.
This makes them portable across Colab and your local machine.

### Installing Libraries

Here we give an example of how to install cloudmesh, a library with many
useful add-ons that make managing parts of a large script much easier.
To install the library, use

In [23]:
import os
os.system("pip install -U cloudmesh.common")



0
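
The call above runs whichever `pip` is found first on the search path, which is
not necessarily the one that belongs to the running kernel. A possible
alternative, sketched here with the standard library only, is to call pip
through `sys.executable`. The helper names `pip_install_command` and
`pip_install` are hypothetical and chosen for this illustration.

```python
import subprocess
import sys

def pip_install_command(package, upgrade=True):
    """Build a pip command that targets the interpreter running this code."""
    command = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        command.append("-U")
    command.append(package)
    return command

def pip_install(package, upgrade=True):
    """Run pip for the current interpreter and return its exit code."""
    return subprocess.run(pip_install_command(package, upgrade)).returncode

# Show the command without running it (running it needs network access).
print(" ".join(pip_install_command("cloudmesh.common")))
```

Calling `pip_install("cloudmesh.common")` returns pip's exit code, where 0
indicates success, just like the `os.system` call.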

Now you can use the many useful features of cloudmesh. One is a banner that
makes print statements more visible by putting a frame around them.

In [24]:
from cloudmesh.common.util import banner
banner("hallo")


# ----------------------------------------------------------------------
# hallo
# ----------------------------------------------------------------------



Another useful feature is an enhanced command to run command-line programs from
within Jupyter notebooks as well as regular Python programs. This method is
preferred because it allows portable program execution across a variety of
operating systems, Google Colab, and other environments. Thus we **DO NOT**
recommend that you use `!` or `!!` for command execution, as it can make it more
difficult to later move portions of the notebook into a Python library. Instead,
use the `Shell` class from cloudmesh, which contains many useful methods that
are portable between operating systems. Such methods include, for example,
`pwd`, `mkdir`, `run`, `grep`, and `cm_grep` (a grep that is portable across
Windows, Linux, and macOS). For details, please refer to
[Shell.py](https://github.com/cloudmesh/cloudmesh-common/blob/main/cloudmesh/common/Shell.py).
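
To illustrate why a Python function call is easier to move into a library than
the `!` syntax, the following sketch runs a command with the standard library
and returns its output as a string. It is not the cloudmesh implementation; the
function name `run_command` is an assumption made for this example.

```python
import subprocess
import sys

def run_command(args):
    """Run a command given as a list of arguments and return its stdout."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()

# sys.executable is used so that the example runs on any operating system.
output = run_command([sys.executable, "-c", "print('hello')"])
print(output)
```

Because this is plain Python, the same lines work unchanged in a notebook cell,
in a script, and inside a library function.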

In [25]:
from cloudmesh.common.Shell import Shell

In [26]:
basedir = Shell.pwd()
print(basedir)

/Users/grey/Desktop/github/dsc-spidal/dl-hec


In [27]:
datadir = f"{basedir}/data"

In [28]:
content = Shell.mkdir(datadir)
os.system(f"echo >> {datadir}/gregor.txt")
content = Shell.run(f"ls {datadir}")
print(content)

gregor.txt
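
The cell above still passes shell commands such as `ls` as strings, and these
may not exist on every system. Where that is a concern, the same steps can be
sketched with `pathlib` from the standard library. The temporary directory
below is only a stand-in for `basedir`, so that the example does not touch your
project.

```python
import tempfile
from pathlib import Path

# Stand-in for basedir (an assumption for this sketch).
basedir = Path(tempfile.mkdtemp())

# Build the path with the / operator instead of string formatting.
datadir = basedir / "data"
datadir.mkdir(parents=True, exist_ok=True)

# Create an empty file and list the directory without calling a shell.
(datadir / "gregor.txt").touch()
content = sorted(path.name for path in datadir.iterdir())
print(content)
```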



## Installing libraries directly from GitHub

When developing large libraries in a team, it is best practice to use a code
repository such as GitHub. Installing libraries into the venv on your local
machine is simple and is usually done with a requirements.txt file. However,
you can also install a library directly on your local machine by checking out
its code and running `pip install -e .` in the checked-out directory (the dot
is important). This makes modifications on your local machine directly
available to you, without reinstalling the library after each change.

However, on Google Colab you may need to do this directly from the notebook.
Note that if you change the library, the notebook kernel needs to be stopped or
interrupted and restarted before you load the new version. It is not sufficient
to just go to the cell and rerun it. This means you have to start the notebook
from the beginning. For this reason it is best to do all development on the
local computer first, before you move to Colab. This simplifies your
development cycle, especially when developing libraries. Thus notebooks are not
really suitable for best practices in software engineering, but they are great
for interactive exploration of principles and experiments.
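
The reason a rerun is not enough is that Python caches imported modules in
`sys.modules`. The following sketch illustrates this with a throwaway module
written to a temporary directory; the module name `demo_reload_lib` is
hypothetical. It also shows `importlib.reload`, which can pick up changes in
simple pure-Python modules, although a kernel restart remains the more reliable
option, in particular for names imported with `from ... import` and for
packages with many submodules.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a throwaway module (hypothetical name) into a temporary directory.
workdir = Path(tempfile.mkdtemp())
module_file = workdir / "demo_reload_lib.py"
module_file.write_text("VERSION = 1\n")

sys.path.insert(0, str(workdir))
sys.modules.pop("demo_reload_lib", None)  # forget any copy from an earlier run
old_flag = sys.dont_write_bytecode
sys.dont_write_bytecode = True            # avoid stale .pyc files in this demo

import demo_reload_lib
first = demo_reload_lib.VERSION

# Simulate an edit of the library on disk.
module_file.write_text("VERSION = 2  # edited\n")

# Importing again returns the cached module, so the edit is not visible.
rerun = importlib.import_module("demo_reload_lib").VERSION

# An explicit reload re-executes the module source.
importlib.reload(demo_reload_lib)
reloaded = demo_reload_lib.VERSION

sys.dont_write_bytecode = old_flag
print(first, rerun, reloaded)
```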

To include a library that is hosted on GitHub, we can install the latest version
of the code directly in Google Colab, as follows. Do not forget to interrupt the
kernel and restart it before running this command, as otherwise an earlier
version may still be loaded.

In [29]:
print(Shell.run("pip install -U git+https://github.com/DSC-SPIDAL/dl-hec.git"))

Collecting git+https://github.com/DSC-SPIDAL/dl-hec.git
  Cloning https://github.com/DSC-SPIDAL/dl-hec.git to /private/var/folders/q5/s8_pcggn5f73xnz11zjqrhlw0000gp/T/pip-req-build-87oh7t0a
  Running command git clone --filter=blob:none --quiet https://github.com/DSC-SPIDAL/dl-hec.git /private/var/folders/q5/s8_pcggn5f73xnz11zjqrhlw0000gp/T/pip-req-build-87oh7t0a
  Resolved https://github.com/DSC-SPIDAL/dl-hec.git to commit 36fd5def45d517abf6494208134222dcee427b1c
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'



Now you can use the functions defined in `hec.util`.

In [30]:
from hec.util import timenow

In [31]:
t = timenow()
print(t)

01/03/2023, 09:01:47 UTC


In [32]:
from hec.util import NaN
print(NaN)

nan


## Globals

Using global variables is often to be avoided, as it can introduce unintended
side effects. However, in some cases they can be an easy way to communicate
global state in programs. In that case, the following has to be considered:

1. Globals are defined separately for each file. Thus, if they need to be reused
   between files, they must be explicitly declared in each file in which the
   same global is reused.
2. Before a global variable can be reused, it must be declared and initialized.
   Without that, the global variable cannot be used.
3. It is advisable in large programs to declare all globals in a single file, so
   it is easier to track whether they have been declared. It is possible to have
   multiple such files, so that they can be organized by concern/topic.
4. If used in a notebook, the initialization of the global variables needs to
   be done first. Then each variable used needs to be explicitly imported. It
   helps to add such imports as comments in the file where the globals are
   declared, so that the imports can be copied into other files that use the
   global statement. To remove the comments in the file where they are supposed
   to be used, simply highlight all lines with the commented globals and press
   `COMMAND-/` to remove the `#`. We recommend that you remove all global
   variable statements that you do not use.
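
Point 1 can be illustrated with a short sketch. It uses a throwaway module
created in memory, with the hypothetical name `demo_globals`, as a stand-in for
a file such as `hec/global_variables.py`. The sketch shows that a name imported
with `from ... import` becomes a separate binding in the importing file, whereas
an attribute accessed through the module object is shared by all importers.

```python
import sys
import types

# Stand-in for a file that declares and initializes globals (hypothetical name).
demo = types.ModuleType("demo_globals")
demo.my_global_a = None
sys.modules["demo_globals"] = demo

from demo_globals import my_global_a  # copies the current binding into this namespace
import demo_globals as gl             # refers to the module object itself

my_global_a = 1                       # rebinds only the name in this namespace
local_value = my_global_a
module_value = gl.my_global_a         # the module still holds None

gl.my_global_a = 2                    # rebinds the attribute on the module
shared_value = sys.modules["demo_globals"].my_global_a

print(local_value, module_value, shared_value)
```

Accessing globals through the module, as in `gl.my_global_a`, is therefore the
variant in which an update made in one file is visible in all others.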

**THIS IS NOT YET WORKING**

In [1]:
from hec.global_variables import my_global_a
from pprint import pprint

print(my_global_a)

print("check if it is in globals:", 'my_global_a' in globals())

def my_function_using_a_global():
    global my_global_a
    print(my_global_a)
    my_global_a = 1
    print(my_global_a)

my_function_using_a_global()

print(f"my_global_a={my_global_a}", "should be 1")

None
check if it is in globals: True
None
1
my_global_a=1 should be 1


To import all global variables from a file, we can use the usual Python import
statement with an `as` clause to give it a "topic" name. We can then use each
variable by prefixing it with the topic name.

In [2]:
import hec.global_variables as gl

print(gl.my_global_a)
print(gl.my_global_b)
print(gl.my_global_c)

None
b
c


Naturally, we still could use an explicit import

In [3]:
from hec.global_variables import my_global_b
from hec.global_variables import my_global_c
print(my_global_b, "=", gl.my_global_b)
print(my_global_c, "=", gl.my_global_c)

b = b
c = c


## Classes to organize variables

The best way to organize large codes is to use classes, as they allow you not
only to group variables, but also to add methods that operate on these
variables.

In [2]:
from hec.myclass import myclass
config = myclass()

print(config.a)

config.print_values()

a
a
b
c
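
The source of `hec.myclass` is not shown here. A class that behaves like the
output above could look roughly as follows; the class name `Config`, the method
`values`, and the attribute values are assumptions made for illustration.

```python
class Config:
    """Groups related variables and the methods that operate on them."""

    def __init__(self):
        # Variables that would otherwise be globals.
        self.a = "a"
        self.b = "b"
        self.c = "c"

    def values(self):
        """Return all variables in a fixed order."""
        return [self.a, self.b, self.c]

    def print_values(self):
        """Print each variable on its own line."""
        for value in self.values():
            print(value)

config = Config()
print(config.a)
config.print_values()
```

Passing one such object to functions keeps the shared state explicit, which
avoids the import pitfalls of module-level globals.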


## Syntax highlights

Arguably one of the most valuable features of PyCharm is syntax highlighting.
You will need to make sure that the libraries you develop and import in your
code are marked with `Mark Directory as -> Sources Root`. This makes sure that
syntax highlighting is properly applied to your library. In our case we do this
for `dl-hec`.

## Filenames in lower case letters

To be consistent and to avoid issues with capitalization in filenames, we
recommend using all lower case letters for filenames, as some operating system
filesystems do not distinguish between lower and upper case letters in command
shells. That this is best practice can even be seen when exporting an IPython
notebook to a Python file, as the filename will be converted to all lower case
letters (at least on macOS).
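
A quick way to spot problematic names before they cause trouble is to group
filenames by their lower case form. The helper `case_collisions` below is a
hypothetical sketch that uses only the standard library.

```python
from collections import defaultdict

def case_collisions(names):
    """Return groups of names that differ only in capitalization."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    # Keep only the groups in which more than one spelling occurs.
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}

names = ["Data.py", "data.py", "util.py"]
print(case_collisions(names))
```

To check a real directory, pass `os.listdir(path)` as `names`.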

