# Scripts, modules, packages

# Scripts
- text file, input for interpreter
- define and use functions, variables

problem: as code grows, scripts become unwieldy

In [12]:
%%bash 

python script.py

12.56636
mod __name__: mod
28.27431
script __name__: __main__


### solution: modules
- single namespace
- defines functions, constants, variables, runnable code

### why
- helps maintain/extend code
- easier to read
- easier to reuse across different programs

# Modules

- identified by .py extension
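- `sys.path` holds the list of directories Python searches when importing
    - the directory of the script being run (the current directory in an interactive session)
    - directories listed in the shell variable `PYTHONPATH`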
- can be run directly or imported by other scripts/modules
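
To make the search path concrete, here is a minimal sketch that prints the first few entries of `sys.path` and any directories contributed by `PYTHONPATH`. The variable name `extra_dirs` is only illustrative, and the actual entries depend on your installation.

```python
import os
import sys

# sys.path is an ordinary list of directory strings, searched in order.
for entry in sys.path[:5]:
    print(repr(entry))

# PYTHONPATH, if set, is a list of directories joined by os.pathsep
# (":" on Linux/macOS, ";" on Windows).
pythonpath = os.environ.get("PYTHONPATH", "")
extra_dirs = [p for p in pythonpath.split(os.pathsep) if p]
print("PYTHONPATH entries:", extra_dirs)
```

Because `sys.path` is a plain list, a program can also append a directory to it at runtime, although setting `PYTHONPATH` or installing the package is usually cleaner.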


### Example: mod.py

```
pi = 3.14159

def circ_area(r):
    return float(pi*r**2)

#if __name__ == "__main__":
print(circ_area(2)) 
print("mod __name__:", __name__)
```

In [7]:
%%bash
python mod.py

12.56636
mod __name__: __main__


### Importing mod.py

Let's take a look at `script.py`, which imports `mod.py`:

```
import mod

print(mod.circ_area(3))
print("script __name__:", __name__)
```

In [13]:
%%bash 
python script.py

12.56636
mod __name__: mod
28.27431
script __name__: __main__


### Importing modules without executing their code

Based on the contents of `script.py` we expected two lines to be printed, but four appear: importing `mod` also runs the top-level `print` calls in `mod.py`.

#### special variable \_\_name\_\_

We can exploit the fact that, within a file, `__name__` depends on whether the file was run directly by the Python interpreter or imported by another file.

For `mod.py`, `__name__` takes on values:
- `__main__` if run directly by the Python interpreter
- `mod` if imported


To prevent runnable code within `mod.py` (e.g. tests) from being executed on import, uncomment the line containing `if __name__ == "__main__":`, indent the two `print` lines beneath it, and rerun the previous cell!
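
Applying that change to the `mod.py` listing above would give roughly the following. This is a sketch of the guarded version, not a copy of the course file; the docstring and comments are additions.

```python
pi = 3.14159

def circ_area(r):
    """Return the area of a circle with radius r."""
    return float(pi * r**2)

if __name__ == "__main__":
    # Runs only when the file is executed directly (python mod.py),
    # not when another file does `import mod`.
    print(circ_area(2))
    print("mod __name__:", __name__)
```

With the guard in place, `python script.py` should print only the two lines that come from `script.py` itself.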

### Modules: final notes
- to simplify syntax, you can use `from mod import circ_area` to use the function without the dot notation
- to speed up loading modules, Python caches the compiled version in the `__pycache__` directory
- Python comes with a library of standard modules, described in the Python Library Reference
- the built-in `dir()` function lists the names a module defines, e.g. run `dir(module_name)` after importing it
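
The two import styles and `dir()` can be tried without any extra files. The sketch below uses the standard library `math` module as a stand-in for `mod`; the variable `public_names` is only illustrative.

```python
import math
from math import sqrt  # bring one name into the current namespace

print(math.sqrt(16))  # dot notation: module_name.function_name
print(sqrt(16))       # after `from math import sqrt`, no prefix is needed

# dir() returns a list of strings; filter out the underscore-prefixed names.
public_names = [name for name in dir(math) if not name.startswith("_")]
print(public_names[:5])
```

The `from ... import ...` form is shorter to type, while the dot notation makes it obvious where each name comes from.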



In [11]:
import mod
dir(mod)

12.56636
mod __name__: mod


['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'circ_area',
 'pi']

# Packages

- a directory containing a collection of modules
    - each module contains code related to one topic (e.g. I/O functionality)
    - use an \_\_init\_\_.py file in directory to indicate it’s a package
- structured using python’s “dotted module names” namespace
    - Module name A.B specifies a submodule B in a package named A
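
As a rough illustration of dotted module names, the sketch below builds a throwaway package in a temporary directory and imports a submodule from it. The names `demo_pkg`, `greetings`, and `hello` are hypothetical; they play the roles of `A` and `B` in the bullet above.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package `demo_pkg` with one submodule `greetings`.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")  # marks the directory as a package
(pkg / "greetings.py").write_text("def hello():\n    return 'hello'\n")

sys.path.insert(0, str(root))  # make the package findable
importlib.invalidate_caches()

import demo_pkg.greetings  # dotted module name: package.submodule

print(demo_pkg.greetings.hello())     # prints: hello
print(demo_pkg.greetings.__name__)    # prints: demo_pkg.greetings
```

Note that the submodule's `__name__` is the full dotted path, which is what keeps two modules called `greetings` in different packages from colliding.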

### why use packages?
- divides code into well-structured, logical units with standard layout
    - easier to read/understand/use/extend
- many tools to build, install, distribute packages
- helps prevent namespace conflicts
    - authors of different modules (packages) don’t need to worry about each other’s variable (module) names





### package structure

This is a basic package that is not meant for publishing: it lacks the special files needed to do so (`setup.py`, `README.md`, and `MANIFEST.in`).

```
datascience
├── __init__.py
├── __main__.py
├── analysis            # subpackage
│   ├── __init__.py
│   └── regression.py
└── preprocess          # subpackage
    ├── __init__.py
    ├── filter.py
    └── transform.py
```

### accessing modules and functions

See the `__init__.py` file in the `datascience` directory to explore the various ways to change the behavior of import statements. Comment or uncomment its sections, restart the kernel (Kernel -> Restart Kernel), and re-import the package.
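
The contents of that `__init__.py` are not reproduced here, so the following is only a sketch of one common technique: re-exporting a function from a nested module so it is reachable directly from the package. The package name `ds_demo` and the body of `filter_1` are hypothetical, chosen so the example does not clash with the real `datascience` package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
sub = root / "ds_demo" / "preprocess"
sub.mkdir(parents=True)

# Subpackage: an empty __init__.py plus one module with one function.
(sub / "__init__.py").write_text("")
(sub / "filter.py").write_text(
    "def filter_1(xs):\n    return [x for x in xs if x > 0]\n"
)

# Top-level __init__.py re-exports the function under the package name.
(root / "ds_demo" / "__init__.py").write_text(
    "from .preprocess.filter import filter_1\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import ds_demo

print(ds_demo.filter_1([-1, 2, 3]))   # prints: [2, 3]
print("preprocess" in dir(ds_demo))   # prints: True
```

The relative import in `__init__.py` also binds the `preprocess` subpackage as an attribute of `ds_demo`, which is why it shows up in `dir()` even though only `filter_1` was imported by name.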

In [1]:
import datascience as ds
#from datascience import *  # not recommended!
dir(ds)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 'analysis',
 'filter_1',
 'filter_2',
 'lasso',
 'logistic',
 'preprocess',
 'transform_1',
 'transform_2']

In [3]:
dir(ds.filter)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'filter_1',
 'filter_2']

### \_\_main\_\_.py 

In addition to modules that the user imports, a package may also have a script that the user runs directly on the command line. A common way to do this (e.g. `python script.py`) is with a file that contains:
```
if __name__ == "__main__":
    main()
```

but it can be ambiguous which script in your package is the main file. Instead, you can create a `__main__.py` file, which Python automatically treats as the package's command-line entry point. The above package can then be run with:


```
python -m datascience
```

or

```
python datascience
```
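
To see the mechanism end to end, the sketch below writes a tiny package with a `__main__.py` into a temporary directory and runs it the same way `python -m` would. The package name `cli_demo` and its `main()` function are hypothetical and unrelated to the `datascience` package.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
pkg = root / "cli_demo"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "__main__.py").write_text(
    "def main():\n"
    "    print('running cli_demo as a program')\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    main()\n"
)

# Equivalent to typing `python -m cli_demo` from inside the directory `root`.
result = subprocess.run(
    [sys.executable, "-m", "cli_demo"],
    cwd=root, capture_output=True, text=True, check=True,
)
print(result.stdout.strip())
```

Inside `__main__.py`, `__name__` is still `"__main__"` when the package is run this way, so the usual guard keeps working.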