# Modules and scripts

Pitfalls of Jupyter-based development:
- no separation between the implementation of the code (e.g. a class) and its use (e.g. an analysis using that class);
- mix-ups between global and local variables can lead to unintended consequences!
- Jupyter allows cells to be executed in ***any order*** and it's hard to keep track of what the program is doing: you may have the impression that everything works but once you reset the program and try to execute it in a **linear** fashion it breaks!

### Modules versus scripts
- Scripts give a sequence of commands to execute
- Modules have code that is designed to be imported and used by another file
- Both are stored as plain text files

In short
- script = code to execute (it does something!)
- module = code to import (class and function definitions)

### Scripts

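Let's try making a simple script and running it with `python helloworld.py`, `./helloworld.py` and `python -m helloworld`. On Unix-like systems the second form only works if the file is executable and starts with a shebang line such as `#!/usr/bin/env python3`. The last option treats the script as a module and searches for a module of this name in the module search path.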

In [1]:
import helloworld

Hello world!


Treating this as a module doesn't make too much sense - this is code to be executed, so it is better treated as a script. Note that the execution will only happen on the first import.
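The "first import only" behaviour comes from Python's module cache, `sys.modules`. The sketch below illustrates it with a throwaway module written to a temporary directory; the name `hello_demo` is made up for this example and merely stands in for `helloworld.py`, whose content is not shown here.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a one-line throwaway module (hypothetical name) to a temporary directory.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "hello_demo.py").write_text('print("Hello world!")\n')
sys.path.insert(0, tmp_dir)      # make the directory importable
importlib.invalidate_caches()    # make sure the new file is noticed

import hello_demo                # first import: the file is executed and prints
import hello_demo                # second import: found in the cache, nothing is printed
is_cached = "hello_demo" in sys.modules

reloaded = importlib.reload(hello_demo)  # explicitly executes the file again
```

`importlib.reload` is mostly useful while developing a module interactively; in a finished program a module is normally imported once.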

In [2]:
import helloworld

But you may have noticed that normally when you import something, nothing is executed.

In [3]:
import math

### Modules

Modules are loaded with the `import` statement. They don't necessarily have to be written in Python: in the standard CPython distribution, for example, `math` is implemented in C. They are helpful for organizing code and making it more reusable.

Scope is an important aspect of modules, which we will see shortly. **Namespace** is an important concept here. Recall the definition (from Wikipedia): a namespace is an abstract container or environment created to hold a logical grouping of unique identifiers or symbols. A namespace groups items (functions/classes/data) together and helps avoid conflicts from repeated names.
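As a small illustration of how namespaces avoid name conflicts, the standard-library modules `math` and `cmath` both define a function called `sqrt`. Because each lives in its own module namespace, the two can be used side by side:

```python
import cmath
import math

# Both modules define a function called `sqrt`; the module name tells them apart.
real_root = math.sqrt(4.0)        # real-valued square root
complex_root = cmath.sqrt(-4.0)   # complex-valued square root

print(real_root)     # 2.0
print(complex_root)  # 2j
print(math.sqrt is cmath.sqrt)  # False: two distinct objects with the same name
```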

Modules can be organized into packages, which we won't cover in depth.

In [4]:
import galaxy_mod

We have seen the built-in function `dir()` before - we will use it more today to:

- Check what is in the module
- Check what is in the **local symbol table** (sort of like the local namespace).

In [5]:
dir(galaxy_mod)

['MyGalaxyClass',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'galaxy_list',
 'name',
 'print_my_galaxy']

In [6]:
dir()

['In',
 'Out',
 '_',
 '_5',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__session__',
 '__spec__',
 '_dh',
 '_i',
 '_i1',
 '_i2',
 '_i3',
 '_i4',
 '_i5',
 '_i6',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 'exit',
 'galaxy_mod',
 'get_ipython',
 'helloworld',
 'math',
 'open',
 'quit']

In [7]:
galaxy_mod

<module 'galaxy_mod' from '/Users/elisa/Desktop/teaching/Python/galaxy_mod.py'>

Items from the module can be accessed with the dot operator.

In [8]:
print(galaxy_mod.name)

Galaxy catalog


In [9]:
print(galaxy_mod.galaxy_list)

['NGC 5128', 'TXS 0506+056', 'NGC 1068', 'GB6 J1040+0617', 'TXS 2226-184']


In [10]:
galaxy_mod.print_my_galaxy('NGC 1275')

My galaxy name = NGC 1275


In [11]:
galaxy_mod.MyGalaxyClass()

<galaxy_mod.MyGalaxyClass at 0x11331e600>

But this will not work, because `galaxy_list` lives in the namespace of `galaxy_mod` and not in the local one:

In [12]:
print(galaxy_list)

NameError: name 'galaxy_list' is not defined
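The file `galaxy_mod.py` itself is not reproduced in these notes. A minimal sketch that is consistent with the outputs shown in this section could look as follows; the actual file may well differ, and in particular the parameter name `galaxy_name` and the empty body of `MyGalaxyClass` are pure guesses.

```python
# Hypothetical reconstruction of galaxy_mod.py, inferred only from the outputs in these notes.

# Module-level data: these become attributes of the module after `import galaxy_mod`.
name = "Galaxy catalog"

galaxy_list = ["NGC 5128", "TXS 0506+056", "NGC 1068", "GB6 J1040+0617", "TXS 2226-184"]


def print_my_galaxy(galaxy_name):
    """Print the name of a galaxy."""
    print("My galaxy name =", galaxy_name)


class MyGalaxyClass:
    """Placeholder class; its real content is not shown in the notes."""
    pass
```

Note that the file contains only definitions and no top-level `print` calls, which is why importing it produces no output.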

You can import items from the module individually; we have seen this before.

In [14]:
from math import pi

In [15]:
print(pi)

3.141592653589793


In [16]:
from galaxy_mod import galaxy_list

In [17]:
dir()

['In',
 'Out',
 '_',
 '_11',
 '_5',
 '_6',
 '_7',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__session__',
 '__spec__',
 '_dh',
 '_i',
 '_i1',
 '_i10',
 '_i11',
 '_i12',
 '_i13',
 '_i14',
 '_i15',
 '_i16',
 '_i17',
 '_i2',
 '_i3',
 '_i4',
 '_i5',
 '_i6',
 '_i7',
 '_i8',
 '_i9',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 'exit',
 'galaxy_list',
 'galaxy_mod',
 'get_ipython',
 'helloworld',
 'math',
 'open',
 'pi',
 'quit']

In [18]:
print(galaxy_list)

['NGC 5128', 'TXS 0506+056', 'NGC 1068', 'GB6 J1040+0617', 'TXS 2226-184']


You can also import to alternative names. This can be particularly useful for avoiding overwriting a name in your local symbol table.

In [19]:
name = "Python for Physicists"

In [20]:
dir()

['In',
 'Out',
 '_',
 '_11',
 '_17',
 '_5',
 '_6',
 '_7',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__session__',
 '__spec__',
 '_dh',
 '_i',
 '_i1',
 '_i10',
 '_i11',
 '_i12',
 '_i13',
 '_i14',
 '_i15',
 '_i16',
 '_i17',
 '_i18',
 '_i19',
 '_i2',
 '_i20',
 '_i3',
 '_i4',
 '_i5',
 '_i6',
 '_i7',
 '_i8',
 '_i9',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 'exit',
 'galaxy_list',
 'galaxy_mod',
 'get_ipython',
 'helloworld',
 'math',
 'name',
 'open',
 'pi',
 'quit']

In [21]:
from galaxy_mod import name as name_galaxy_cat

In [22]:
print(name)
print(name_galaxy_cat)

Python for Physicists
Galaxy catalog


In [23]:
dir()

['In',
 'Out',
 '_',
 '_11',
 '_17',
 '_20',
 '_5',
 '_6',
 '_7',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__session__',
 '__spec__',
 '_dh',
 '_i',
 '_i1',
 '_i10',
 '_i11',
 '_i12',
 '_i13',
 '_i14',
 '_i15',
 '_i16',
 '_i17',
 '_i18',
 '_i19',
 '_i2',
 '_i20',
 '_i21',
 '_i22',
 '_i23',
 '_i3',
 '_i4',
 '_i5',
 '_i6',
 '_i7',
 '_i8',
 '_i9',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 'exit',
 'galaxy_list',
 'galaxy_mod',
 'get_ipython',
 'helloworld',
 'math',
 'name',
 'name_galaxy_cat',
 'open',
 'pi',
 'quit']

You can import everything from a module at once, but it is usually not a good idea. Why not?

In [24]:
from galaxy_mod import *

In [25]:
dir()

['In',
 'MyGalaxyClass',
 'Out',
 '_',
 '_11',
 '_17',
 '_20',
 '_23',
 '_5',
 '_6',
 '_7',
 '__',
 '___',
 '__builtin__',
 '__builtins__',
 '__doc__',
 '__loader__',
 '__name__',
 '__package__',
 '__session__',
 '__spec__',
 '_dh',
 '_i',
 '_i1',
 '_i10',
 '_i11',
 '_i12',
 '_i13',
 '_i14',
 '_i15',
 '_i16',
 '_i17',
 '_i18',
 '_i19',
 '_i2',
 '_i20',
 '_i21',
 '_i22',
 '_i23',
 '_i24',
 '_i25',
 '_i3',
 '_i4',
 '_i5',
 '_i6',
 '_i7',
 '_i8',
 '_i9',
 '_ih',
 '_ii',
 '_iii',
 '_oh',
 'exit',
 'galaxy_list',
 'galaxy_mod',
 'get_ipython',
 'helloworld',
 'math',
 'name',
 'name_galaxy_cat',
 'open',
 'pi',
 'print_my_galaxy',
 'quit']
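One reason to avoid `import *` is that it can silently overwrite names you have already defined. The sketch below shows this with the standard-library `math` module. A plain dictionary stands in for the notebook's global namespace, and `exec` is used only so that the effect on that namespace can be inspected; the values placed in the dictionary are made up for the illustration.

```python
import math

# A dictionary standing in for the notebook's global namespace.
namespace = {"pi": 3, "name": "Python for Physicists"}

# Run the star import against that dictionary, as it would run at module level.
exec("from math import *", namespace)

print(namespace["pi"])      # no longer 3: silently replaced by math.pi
print(namespace["name"])    # untouched, because math defines no public `name`
print("sqrt" in namespace)  # True: names we never asked for were added too
```

With `galaxy_mod`, which does define `name`, the same mechanism would replace a previously defined `name` without any warning.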

### Another example

In the next lecture we will start working with `numpy`. Let's preview using `numpy.random` for random number generation. The current recommended usage does not involve calling `numpy.random` directly but rather instantiating a random number generator object, i.e.:

```python
rng = np.random.default_rng(seed)
```

As an example, we will write a "wrapper class", a class that "wraps around" existing functionality to make it more convenient for us.

In [26]:
import numpy as np

class RNG:
    def __init__(self, seed : int = 0):
        self.rng = np.random.default_rng(seed=seed)

    def generate(self, shape : tuple = None):
        # this is a bit silly because it only replicates the behaviour of random()
        # but this is just meant as an example
        if shape is not None:
            return self.rng.random(shape)
        else:
            return self.rng.random()

In [27]:
rand = RNG(seed=27)

rand.generate(10)

array([0.69773622, 0.31381427, 0.1211971 , 0.32359152, 0.93121187,
       0.78966731, 0.01001912, 0.19893322, 0.29311369, 0.94341571])

Now let's make it into a module.

- Move the content of the cell containing the `RNG` definition to a separate file, `modules/utils.py`!
- Comment out the cell content and import from the file.

In [29]:
from modules.utils import RNG

rand = RNG(seed=55)
type(rand)

modules.utils.RNG

In [31]:
rand.generate(5)

array([0.87064561, 0.2183619 , 0.23009715, 0.4927431 , 0.7532229 ])

In [34]:
# 1. Import one or more items from a module with (optionally) an alias.

from modules.utils import RNG as random_number_generator

a = random_number_generator(9)
a.generate()

0.8702492039700847

In [35]:
# 2. Import a reference to the module (with optional alias) and access its symbols with the '.' operator.
import modules.utils as random_utils
c = random_utils.RNG(14)
# Similar to `import numpy as np`
c.generate()

0.8309833188891115

Why are the generated random numbers identical each time these cells are re-run?
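On the role of the seed: a pseudo-random generator started from the same seed always reproduces the same sequence. The sketch below uses `random.Random` from the standard library as a stand-in for the `RNG` wrapper (an assumption made to keep the example free of dependencies); a generator created with `np.random.default_rng(seed)` behaves analogously.

```python
import random

# Two generators with the same seed, and one with a different seed.
gen_a = random.Random(27)
gen_b = random.Random(27)
gen_c = random.Random(55)

seq_a = [gen_a.random() for _ in range(3)]
seq_b = [gen_b.random() for _ in range(3)]
seq_c = [gen_c.random() for _ in range(3)]

print(seq_a == seq_b)  # True: same seed, same sequence
print(seq_a == seq_c)  # False: different seed, different sequence
```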

## Packages

A **package** like `numpy` is a collection of modules. The package `numpy` provides the `numpy` module, which provides some basic functionality. Some features of `numpy` are accessible through other modules provided by the same package.

In [36]:
import numpy

# This is the main module of the package.
print(type(numpy))
# These are symbols contained in the main module.
print(type(numpy.ndarray))
print(type(numpy.array))
# This is a module accessible through the main module.
print(type(numpy.random))
# This is a symbol contained in the previous.
print(type(numpy.random.random))

<class 'module'>
<class 'type'>
<class 'builtin_function_or_method'>
<class 'module'>
<class 'builtin_function_or_method'>


The way a package makes its functionality accessible through its main module is based on a chain of `import` statements.
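To make the idea of a chain of imports concrete, the sketch below builds a tiny package in a temporary directory. All names (`toy_pkg`, `core`, `extras`, `greet`, `ANSWER`) are invented for this example, and this is not how `numpy` itself is laid out. The `__init__.py` file plays the role of the main module and re-exports what the submodules define.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create the package directory with two submodules and an __init__.py.
root = Path(tempfile.mkdtemp())
pkg = root / "toy_pkg"
pkg.mkdir()
(pkg / "core.py").write_text("def greet():\n    return 'hello from core'\n")
(pkg / "extras.py").write_text("ANSWER = 42\n")
# The main module imports from its submodules: this is the "chain".
(pkg / "__init__.py").write_text("from .core import greet\nfrom . import extras\n")

sys.path.insert(0, str(root))   # make the temporary directory importable
importlib.invalidate_caches()   # make sure the new files are noticed

import toy_pkg

print(toy_pkg.greet())          # function defined in core.py, reachable from the main module
print(toy_pkg.extras.ANSWER)    # submodule reachable through the main module
```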

Packages that are installable through `pip install package_name` are published on [PyPI](https://pypi.org/)! You may also learn how to write your own private package and install it locally.

### Best practices
- never use `import *`;
- if you plan to use only a few items from the module in specific places, use `from module import class as class_alias`;
- if you plan to use many features all the time, import the module with a short alias `import numpy as np`;
- you may store "constants" in modules but try not to store variables!
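The last point can be illustrated with a small sketch. To keep it self-contained, the module is built in memory with `types.ModuleType` rather than written to a file; the names `settings_demo`, `SPEED_OF_LIGHT` and `counter` are hypothetical.

```python
import sys
import types

# Build a tiny in-memory module (hypothetical name) instead of writing a file.
settings_demo = types.ModuleType("settings_demo")
settings_demo.SPEED_OF_LIGHT = 299_792_458  # a constant: never reassigned
settings_demo.counter = 0                   # a variable: reassigned later
sys.modules["settings_demo"] = settings_demo

from settings_demo import counter  # binds the *current* value to a local name

settings_demo.counter += 1          # some other code updates the module variable

print(counter)                # 0: the local name still refers to the old value
print(settings_demo.counter)  # 1: the module attribute has moved on
```

A constant never changes, so every importer sees the same value. A module-level variable, by contrast, can end up with different values depending on how and when it was imported.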

### Another script

To show the usefulness of modules, we now want to use the class we have written in a Python **script** that we can execute outside of Jupyter.

We write a script by creating a file such as `script.py` and defining a `main()` function that is called in the body of the script, as in the following:
```python
def main():
    pass

if __name__ == "__main__":
    main()    
```

The `if __name__ == "__main__":` guard ensures that `main()` is called only when the file is run as a script, not when it is imported. This matters because the same file can sometimes be either imported as a module or run as a script.
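The guard works because Python sets the variable `__name__` differently in the two cases. The sketch below writes a throwaway file (the name `guard_demo` is made up), runs it once as a script and once as an import, and captures what each run prints. `runpy.run_path` with `run_name="__main__"` is used here as an in-process stand-in for `python guard_demo.py`.

```python
import contextlib
import importlib
import io
import runpy
import sys
import tempfile
from pathlib import Path

# A throwaway file (hypothetical name) that reports __name__ and uses the guard.
source = (
    "def main():\n"
    "    print('Inside main')\n"
    "\n"
    "print('__name__ is', __name__)\n"
    "if __name__ == '__main__':\n"
    "    main()\n"
)
tmp_dir = Path(tempfile.mkdtemp())
script_path = tmp_dir / "guard_demo.py"
script_path.write_text(source)

# 1. Run the file as a script (in-process stand-in for `python guard_demo.py`).
with contextlib.redirect_stdout(io.StringIO()) as buffer:
    runpy.run_path(str(script_path), run_name="__main__")
output_as_script = buffer.getvalue()

# 2. Import the same file as a module.
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()
with contextlib.redirect_stdout(io.StringIO()) as buffer:
    import guard_demo
output_as_module = buffer.getvalue()

print(output_as_script)  # reports __main__, then main() runs
print(output_as_module)  # reports guard_demo, main() is not called
```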

Take a look at `simple_script.py` for an example script using the RNG class. You can launch it by running `python simple_script.py` from your console. Alternatively, we can invoke it in Jupyter:

In [38]:
# This command does not use the python interpreter internal to Jupyter but rather "invokes" the `python` interpreter of the underlying system!
!python simple_script.py

Inside main


We can also test the `main()` function of the script inside Jupyter, effectively using the script file as a module!

In [39]:
from simple_script import main as script_main # better use an alias as main is a very common name!

In [40]:
script_main()

Inside main
