# Table of Contents:
- [Modules](#What-is-a-module?)
- [Packages](#Packages)
- [Files and Utilities](#Files-and-Utilities)

Based on the notebooks by Eric Franzosa, Ph.D. (franzosa@hsph.harvard.edu), edited by Prof. Nicola Tonellotto.

## What is a module?

* A **module** is Python code designed to be re-used in other code
    - Sometimes referred to as “Libraries” in other languages
* Don’t reinvent the wheel
* Also helps with organizing code related to a particular domain


* A module can be as small as a single Python script
    - In fact, every Python script can be used as a module (more on that shortly...)
* Some modules come bundled with Python
    - The Python “Standard Library”
* Other (more specialized) modules can be installed separately
    - For example, the Python "scientific stack": `scipy`, `numpy`, `pandas`, and `matplotlib`
    - Anaconda installed a bunch of these for us automatically
* You can write your own modules

## Using modules

* We use modules by importing them into our code

```python
import time

for i in range(3):
    time.sleep(1)
    print("sleep one second")
```

```python
from time import sleep

for i in range(3):
    print("sleep one second")
    sleep(1)
```

* `time` is a built-in module for dealing with matters related to, well, time
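A small sketch of two other `time` functions not used in the cells above: `time.perf_counter()`, a high-resolution clock suited to measuring how long something takes, and `time.ctime()`, which formats a timestamp as readable text. The 0.2-second pause is an arbitrary choice.

```python
import time

start = time.perf_counter()            # high-resolution clock for measuring intervals
time.sleep(0.2)                        # pause execution for 0.2 seconds
elapsed = time.perf_counter() - start  # seconds elapsed between the two readings
print(f"slept for about {elapsed:.2f} s")

# time.time() returns seconds since the epoch; time.ctime() turns that into a string
print(time.ctime(time.time()))
```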

Note:
* Modules contain the same sort of elements as other Python code
* Modules can contain variables
    - `math.pi` contains the value of pi (to many decimal places)
    - `string.ascii_uppercase` contains the uppercase English alphabet
* Modules can contain functions
    - `time.sleep` pauses execution for N seconds
    - `math.sqrt` returns the square root of a number
* Modules can contain classes defining other data types
    - `collections.Counter` is a special dictionary for counting items
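A quick sketch touching each kind of element listed above: two variables, a function, and a class. The word `"mississippi"` is just an arbitrary example input.

```python
import collections
import math
import string

print(math.pi)                  # a variable: 3.141592653589793
print(string.ascii_uppercase)   # a variable: ABCDEFGHIJKLMNOPQRSTUVWXYZ
root = math.sqrt(16)            # a function
print(root)                     # 4.0

counts = collections.Counter("mississippi")  # a class: counts each character
print(counts["s"], counts["m"])              # 4 1
```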

## A demonstrative example

* The following example assumes I have two Python files in the same folder
    - `script.py` is a new script I am working on (or a Jupyter notebook --> `notebook.ipynb`)
    - `module.py` is some existing code that I want to re-use
    
    <img src="Figures/module1.png" alt="drawing" width="300"/>

Aside: Atom is a code editor that can be used as an IDE (Integrated Development Environment), i.e. a software application that provides facilities for software development.


![drawing](Figures/module2.png)

![drawing](Figures/module3.png)

![drawing](Figures/module5.png)

![drawing](Figures/module6.png)

![drawing](Figures/module7.png)

![drawing](Figures/module8.png)

![drawing](Figures/module10.png)

![drawing](Figures/module11.png)

![drawing](Figures/module12.png)

![drawing](Figures/module13.png)

![drawing](Figures/module14.png)
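The figures walk through the example visually. As a rough, self-contained stand-in, the sketch below writes a hypothetical `module.py` into a temporary folder and then does what `script.py` would do: import it and call its function. The function `greet` and the `if __name__ == "__main__":` guard are illustrative assumptions, not the content of the screenshots; the guard shows how the same file can act both as a reusable module and as a script.

```python
import importlib
import os
import runpy
import sys
import tempfile

# Hypothetical contents of module.py: one reusable function plus a "main" guard
MODULE_SOURCE = '''\
def greet(name):
    return f"Hello, {name}!"

if __name__ == "__main__":
    # runs only when the file is executed as a script, not when it is imported
    print(greet("module.py run as a script"))
'''

folder = tempfile.mkdtemp()                 # stands in for the shared project folder
module_path = os.path.join(folder, "module.py")
with open(module_path, "w") as f:
    f.write(MODULE_SOURCE)

sys.path.insert(0, folder)                  # like running script.py from that folder
importlib.invalidate_caches()               # the file was created after startup

import module                               # what script.py would do
message = module.greet("script.py")
print(message)                              # Hello, script.py!
print(module.__name__)                      # module (so the guard did not run)

runpy.run_path(module_path, run_name="__main__")  # run module.py as a script
```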

## Module concepts
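- After the first import, you’ll find a new `__pycache__` folder next to `module.py`, containing a file such as `module.cpython-311.pyc` (the version tag depends on your interpreter; Python 2 simply wrote `module.pyc` in the same folder)
- This is an intermediate “compiled” representation of the Python code, called *bytecode* (oversimplifying)
- Bytecode is what the Python interpreter actually executes, so having it ready skips the parsing and compilation steps
- Script code is compiled “on the fly” every time it runs, whereas module code usually changes less often: Python saves time by caching the compiled version and reuses it as long as the source file has not been modified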

<img src="Figures/module15.png" alt="drawing" width="400"/>
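Rather than relying on an import, the sketch below compiles a hypothetical `module.py` explicitly with the standard-library `py_compile` module, which writes the bytecode file to the same `__pycache__` location an import would use. The module body is an arbitrary placeholder.

```python
import os
import py_compile
import tempfile

# Write a hypothetical module.py into a temporary folder
folder = tempfile.mkdtemp()
source_path = os.path.join(folder, "module.py")
with open(source_path, "w") as f:
    f.write("def double(x):\n    return 2 * x\n")

# Compile it to bytecode explicitly (an import would do the same implicitly)
pyc_path = py_compile.compile(source_path, doraise=True)
print(pyc_path)                  # path ends in __pycache__/module.<tag>.pyc
print(os.path.exists(pyc_path))  # True
```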

## Module concepts

* Not all modules live in your working directory
* Most of them live in the folder of your Python installation (more precisely, in the folder of the active environment, e.g. a conda environment)
* Newly installed packages go there too (typically in a `site-packages` subfolder)
    - e.g. when I type `conda install scipy`

```python
import sys

# A list of strings that specifies the search path for modules.
# Initialized from the environment variable PYTHONPATH, plus an installation-dependent default.
sys.path
```
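To see where a given module actually lives, one can inspect its `__file__` attribute. The sketch below also asks `sysconfig` for the directory where third-party packages are normally installed for the running interpreter; `json` is just an arbitrary standard-library example.

```python
import json
import sys
import sysconfig

# The first few directories Python searches when you write `import ...`
for entry in sys.path[:5]:
    print(entry)

# A standard-library package reports the file it was loaded from
print(json.__file__)          # .../json/__init__.py

# The usual "site-packages" directory of the running interpreter/environment
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages)
```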

# Packages

- Complex modules are organized in a nested structure
- **Packages** are a way of structuring Python's module namespace by
using “dotted module names”.
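    - E.g., the name `A.B` designates the submodule `B` in the package `A`.
    - `scipy.stats.wilcoxon` is a function in the module `stats` in the package `scipy`
    - On disk, a package is a folder (typically containing an `__init__.py` file) and a module is a `.py` file: in the simplest layout, `scipy` would be a folder, `stats` a file within that folder, and `wilcoxon` a function in that file (in the actual SciPy, `stats` is itself a subpackage folder)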
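A minimal sketch of the folder/file layout described above, built in a temporary directory. The package name `mypkg`, the submodule `stats`, and its function `mean` are hypothetical, chosen only to mirror the `scipy.stats.wilcoxon` example.

```python
import importlib
import os
import sys
import tempfile

# Build a tiny hypothetical package on disk:
#   mypkg/__init__.py
#   mypkg/stats.py   (defines a function `mean`)
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "mypkg")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")  # an empty __init__.py marks the folder as a regular package
with open(os.path.join(pkg_dir, "stats.py"), "w") as f:
    f.write("def mean(values):\n    return sum(values) / len(values)\n")

sys.path.insert(0, root)       # make the package importable
importlib.invalidate_caches()  # the files were created after the interpreter started

import mypkg.stats             # dotted name: package `mypkg`, submodule `stats`
result = mypkg.stats.mean([1, 2, 3, 4])
print(result)                  # 2.5
```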

```python
import matplotlib.pyplot  # pyplot: the MATLAB-like plotting interface of the matplotlib package
```

```python
import matplotlib.pyplot as plt  # the "as" keyword defines an alias
```

```python
from matplotlib.pyplot import plot  # import a single function into the current namespace
```

```python
plot([1, 2, 3], [1, 2, 1])
```
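If `matplotlib` is not installed, the same three import styles can be tried with a standard-library package: `json` is a package and `json.decoder` is one of its submodules. The sketch checks that the three names point to the same object.

```python
import json.decoder                   # binds `json`; use the full dotted path
import json.decoder as dec            # alias for the submodule
from json.decoder import JSONDecoder  # bind a single name from the submodule

# All three names refer to the very same class object
same = json.decoder.JSONDecoder is dec.JSONDecoder is JSONDecoder
print(same)                                # True
print(JSONDecoder().decode("[1, 2, 1]"))   # [1, 2, 1]
```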

# Files and Utilities

The *os* module includes many functions to interact with the file system.


```python
import os
os.listdir()  # returns a list with the names of the entries in the current directory
```
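`os.listdir()` returns bare names, without telling files and folders apart. A minimal sketch, using `os.path.isfile` and `os.path.isdir`, of how one might separate them in the current directory:

```python
import os

entries = os.listdir(".")                          # names only, relative to the cwd
files = [e for e in entries if os.path.isfile(e)]  # regular files
dirs = [e for e in entries if os.path.isdir(e)]    # sub-directories
print("current directory:", os.getcwd())
print("files:", files)
print("folders:", dirs)
```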

## Writing and Reading
To use a file we have to:
- open the file
- use the returned file object to read from it or write to it
- close the file

The built-in `open()` function opens a file and returns a **file object** (also called a *file handle*) that can be used to read or write the file. It is usually called with two arguments: the first is a string with the name (or path) of the file to open, the second is a string specifying the opening mode:

* `'r'`: opens the file in *read-only mode* (the file must already exist)
* `'w'`: opens the file in *write-only mode* (the file is created if missing, and truncated, i.e. emptied, if it already exists)
* `'a'`: opens the file in *appending mode* (writes go to the end of the file, which is created if missing)
* `'r+'`: opens the file in *reading and writing mode* (the file must already exist)

For example, `f = open('name', 'r')` opens the file `name` for reading and binds the file object to the variable `f`; call `f.close()` when you are finished.

If you don't provide the second argument, the file is opened in read-only mode (`'r'`) by default.
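A small sketch of how the modes differ in practice, using a throwaway file in a temporary folder (the name `modes.txt` is arbitrary). It uses the `with` statement, discussed further down, which closes each file automatically.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, "modes.txt")

with open(path, "w") as f:   # 'w' creates the file
    f.write("first\n")
with open(path, "w") as f:   # 'w' again: the old content is discarded
    f.write("second\n")
with open(path, "a") as f:   # 'a' keeps the content and writes at the end
    f.write("third\n")
with open(path) as f:        # no mode given: defaults to 'r'
    content = f.read()
print(repr(content))         # 'second\nthird\n'

with open(path, "r+") as f:  # 'r+' reads and writes an existing file
    first_line = f.readline()

try:
    open(os.path.join(folder, "missing.txt"), "r")  # 'r' requires an existing file
except FileNotFoundError as err:
    print("cannot read:", err)
```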

### Writing a File

```python
import os

out_dir = 'out'
if not os.path.exists(out_dir):  # check whether the directory already exists...
    os.makedirs(out_dir)         # ... otherwise create it
# (equivalently, in one call: os.makedirs(out_dir, exist_ok=True))
```

```python
a = range(100)
b = [x * x for x in a]
file_path = os.path.join(out_dir, 'prova.csv')  # join path components with the OS-specific separator
file_path
```

```python
mf = open(file_path, 'w')  # open the file in writing mode
for x, y in zip(a, b):
    mf.write(f'{x}, {y}\n')  # one "x, x*x" pair per line
mf.close()  # close the file!
```

### Reading a File

```python
mf = open(file_path, 'r')
for line_number, line in enumerate(mf):  # iterate over the lines of the file
    print(line, end='')  # each line already ends with '\n'
    if line_number == 10:  # stop after the first 11 lines (numbers 0 to 10)
        break
mf.close()  # close the file!
```


It is good practice to always close files after you are done with them (whether for writing or reading). The `with` statement is used with *context managers* to enforce conditions before and after a block is executed. The file object returned by `open()` is a context manager that closes the file when the block is left, even if an exception is raised inside it. The following code using the `with` statement is equivalent to the previous one:

```python
# the preferable syntax (both for writing and reading): the file is closed automatically
with open(file_path, 'r') as mf:
    for line_number, line in enumerate(mf):
        print(line, end='')
        if line_number == 10:
            break
```
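The same `with` syntax works for writing. A minimal sketch, writing five "x, x*x" lines to a hypothetical `squares.csv` in a temporary folder and then checking that the file object reports itself as closed once the block ends:

```python
import os
import tempfile

file_path = os.path.join(tempfile.mkdtemp(), "squares.csv")  # hypothetical output file

with open(file_path, "w") as mf:
    for x in range(5):
        mf.write(f"{x}, {x * x}\n")
# leaving the block closed the file (it would also do so if an error occurred inside)
print(mf.closed)  # True

with open(file_path) as mf:
    lines = mf.readlines()
print(len(lines))  # 5
```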

```python
with open(file_path, 'r') as mf:
    lines = mf.readlines()  # returns a list: each line (including its '\n') is an element
print(lines[0:10])
lines = [line.strip() for line in lines]  # remove leading/trailing whitespace, including '\n'
print(lines[0:10])
len(lines)
```
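Reading lines as strings is often only the first step. A sketch, assuming the same `x, x*x` line format written above, that parses each line back into a pair of integers. To stay self-contained it first recreates the file under a temporary folder instead of reusing `out/prova.csv`.

```python
import os
import tempfile

# Recreate a file in the same "x, y" format used above (temporary path)
file_path = os.path.join(tempfile.mkdtemp(), "prova.csv")
with open(file_path, "w") as mf:
    for x in range(100):
        mf.write(f"{x}, {x * x}\n")

pairs = []
with open(file_path) as mf:
    for line in mf:                            # read lazily, one line at a time
        left, right = line.strip().split(",")  # "3, 9" -> ["3", " 9"]
        pairs.append((int(left), int(right)))  # int() ignores surrounding spaces
print(pairs[:3])   # [(0, 0), (1, 1), (2, 4)]
print(len(pairs))  # 100
```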