<a href="https://colab.research.google.com/github/albertomanfreda/intensive_school_ml/blob/master/Lesson6.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Python Modules and the Standard Library

When creating a large project, it is quite rare to have all the code stored in a single file, as shorter files are much easier to use and maintain. To avoid repeating the same lines of code, it is highly desirable to define a function in just one place and reuse it from many different scripts.

Python offers this functionality through modules. A **module** is just an ordinary Python source file, containing statements and function (or class) definitions. You can import a module into another Python file using the **import** statement, which makes the functions, classes and variables defined there available.

Files (or packages) which are not meant to be executed directly, but rather to provide functionalities for other programs, are called **libraries**.

Even if you do write your code all in a single file, chances are that you will at least import modules from the **Python standard library** or from some other package, like *NumPy*.

Python is famous for coming *with batteries included*: in the standard library you will find useful functionalities for most of the tasks you need to accomplish.

## Different import styles

In [None]:
# 'math' is a library of mathematical functions
import math
# To access functions inside the math module we need the dot operator
print(math.sin(0.5))

# We can also import only specific functions
from math import cos
# In this case the function is accessible directly (no dot operator required)
print(cos(0.5))

# We can also do this, to import all the functions, though you really shouldn't
from math import *
# This is considered a bad practice, as we don't know which names are imported
tan(0.5)
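
One concrete reason why the star import is discouraged is that it can silently replace names you already rely on. The following is a minimal sketch (the variable names `star_names` and `shadowed` are made up for illustration) that lists which built-in functions would be overwritten by `from math import *`, without actually performing the star import.

```python
import builtins
import math

# Names that `from math import *` would bring in: the module's __all__ list
# if it defines one, otherwise every name not starting with an underscore
star_names = getattr(
    math, '__all__',
    [name for name in dir(math) if not name.startswith('_')]
)

# Names that would silently replace a built-in with the same name
shadowed = sorted(set(star_names) & set(dir(builtins)))
print(shadowed)

# The built-in pow() and math.pow() are not the same function:
print(pow(2, 3))       # → 8   (integer result)
print(math.pow(2, 3))  # → 8.0 (always a float)
```

After a star import, a plain call to `pow()` would use the version from math, which is an easy source of subtle bugs.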

## The Python Standard Library

### Built-in functions
These functions are always available and require no import statement (the Python interpreter loads them automatically at startup).

We have already seen a few of them: *print()*, *zip()*, *format()*, *len()*, *range()*, *type()*, ...

Let's take a look at a couple more.

In [None]:
# sum() can sum all the elements in an iterable
my_tuple = (1, 2, 3)
print(sum(my_tuple))

# max() [min()] returns the greatest [lowest] value
print(min(my_tuple), max(my_tuple))

# abs() computes the absolute value
print(abs(-3))


### math

The *math* module contains many useful mathematical functions. Note that, when working with arrays, you will usually use functions from NumPy instead of those from math.

In [None]:
import math
# Print the list of functions defined in math
dir(math)
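
To illustrate the remark about arrays, here is a small sketch showing that the functions in math work on one number at a time, so a sequence has to be processed with an explicit loop or comprehension. The list `values` is just sample data chosen for this example.

```python
import math

values = [0.0, 0.5, 1.0]

# math functions accept a single number, so we loop over the sequence
sines = [math.sin(x) for x in values]
print(sines)

# Passing the whole list raises a TypeError
try:
    math.sin(values)
except TypeError as error:
    print('math.sin does not accept a list:', error)
```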

### os

In [None]:
# os is a cross-platform library for interacting with the OS
import os
# Get the current working directory
print(os.getcwd())
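# Read an environment variable (os.environ behaves like a dictionary, so it can
# also be modified); .get() returns None instead of raising if it is not set
print(os.environ.get('PYTHONPATH'))
# Get info about the current system (os.uname() is available only on Unix)
print(os.uname())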
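
Since `os.environ` behaves like a dictionary, reading and modifying a variable might look like the sketch below. The variable names `EXAMPLE_SETTING` and `SURELY_UNDEFINED_VARIABLE_12345` are invented for this example and have no special meaning.

```python
import os

# Set a variable (values must be strings); this affects only the current
# process and the processes it starts
os.environ['EXAMPLE_SETTING'] = 'on'
value = os.environ.get('EXAMPLE_SETTING')
print(value)

# .get() accepts a default, returned when the variable is not defined
missing = os.environ.get('SURELY_UNDEFINED_VARIABLE_12345', 'not set')
print(missing)

# Remove the variable again
del os.environ['EXAMPLE_SETTING']
```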

### os.path and glob

In [None]:
# os.path is a library for manipulating paths
import os.path

# Split a path into (head, tail) at the last directory separator
print(os.path.split(os.getcwd()))

# Join arguments with the directory separator appropriate for the os
print(os.path.join('user', 'data', 'results'))

# Check if a file or a directory exists
print(os.path.exists(os.path.join(os.getcwd(), 'sample_data')))

# glob is a library supporting Unix-style path expansion
import glob

# Let's create a sample file
with open('sample_file.txt', 'w') as f:
    pass

# List all the txt files in the current directory
glob.glob(os.path.join(os.getcwd(), '*.txt'))
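
The same tools can be combined in a self-contained way. The sketch below works in a temporary directory (so it leaves no files behind) and uses made-up file names; it also shows `os.path.basename` and `os.path.splitext`, two other functions from os.path.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create three empty sample files
    for name in ('a.txt', 'b.txt', 'c.csv'):
        with open(os.path.join(tmp_dir, name), 'w'):
            pass

    # glob returns matches in arbitrary order, so we sort them
    txt_files = sorted(glob.glob(os.path.join(tmp_dir, '*.txt')))

    # Keep only the file name, then strip the extension
    base_names = [os.path.basename(path) for path in txt_files]
    stems = [os.path.splitext(name)[0] for name in base_names]

print(base_names)
print(stems)
```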

### datetime, time

In [None]:
# Libraries for time and dates handling
# WARNING: this is way more complex than you may think
import time
from datetime import datetime

# Print the current Unix Time - seconds since January 1, 1970, 00:00:00 (UTC)
print(time.time())

# Create a datetime object
date = datetime(2020, 9, 7, hour=12, minute=30, second=20)
print(date)
# Create another one, from a formatted string
date2 = datetime.strptime('2020-11-03 08:11:55', '%Y-%m-%d %H:%M:%S')
print(date2)
# Get the difference between the two
deltat = date2 - date
print(deltat)
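
The difference between two datetime objects is a `timedelta`. As a short sketch of what can be done with it, the example below rebuilds the same two dates and inspects the result; the name `later` is introduced only for illustration.

```python
from datetime import datetime, timedelta

date = datetime(2020, 9, 7, hour=12, minute=30, second=20)
date2 = datetime.strptime('2020-11-03 08:11:55', '%Y-%m-%d %H:%M:%S')
deltat = date2 - date

# A timedelta stores whole days plus the remaining seconds
print(deltat.days, deltat.seconds)
# total_seconds() gives the whole interval as a single float
print(deltat.total_seconds())

# Adding a timedelta to a datetime gives a new datetime
later = date + timedelta(days=30)
# strftime() is the inverse of strptime(): datetime -> formatted string
print(later.strftime('%Y-%m-%d %H:%M:%S'))
```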

### logging

The *logging* module is an invaluable library. It can be used to print nicely formatted information about the execution both to the terminal and to a file. Its major advantage over the basic *print()* function is that you can assign a **level** of importance to every message (CRITICAL > ERROR > WARNING > INFO > DEBUG > NOTSET) and show only the messages at or above a chosen level.

For example, you can add a lot of debug information to your code and, once you are done debugging, stop printing it just by changing the logging level, without having to remove all those lines.

In [None]:
import logging

# Only messages of level WARNING or greater are shown by default
logging.debug('This is not so relevant')
logging.info('This is ordinary stuff')
logging.warning('You should probably take a look at this')
logging.error('Oops... this sounds like it\'s getting serious!')
logging.critical('RUN!')
# The word 'root' which you see in the messages means that the 'root' logger
# is printing them (which is the default one)

In [None]:
# Let's change this

import logging

# Get a logger object
logger = logging.getLogger('mylog')
# Change the level
logger.setLevel(logging.DEBUG)

logger.debug('This is not so relevant')
logger.info('This is ordinary stuff')
logger.warning('You should probably take a look at this')
logger.error('Oops... this sounds like it\'s getting serious!')
logger.critical('RUN!')
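
Logging to a file works through a *handler* attached to the logger. The following is one possible setup, not the only one: the logger name `file_example`, the file name `example.log` and the message format are arbitrary choices made for this sketch.

```python
import logging
import os
import tempfile

# Write the log into a temporary directory to avoid cluttering the workspace
log_path = os.path.join(tempfile.mkdtemp(), 'example.log')

logger = logging.getLogger('file_example')
logger.setLevel(logging.DEBUG)

# The handler decides where messages go and can apply its own level
handler = logging.FileHandler(log_path)
handler.setLevel(logging.WARNING)
handler.setFormatter(
    logging.Formatter('%(asctime)s %(name)s %(levelname)s: %(message)s')
)
logger.addHandler(handler)

logger.debug('Not written: below the handler level')
logger.warning('Written to the file')

# Detach and close the handler, then read back what was logged
logger.removeHandler(handler)
handler.close()
with open(log_path) as log_file:
    content = log_file.read()
print(content)
```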

### Other relevant modules:

  * **argparse** for command line option parsing
  * **unittest** for testing your code
  * **random** for pseudo-random number generation (though we will use NumPy for that)
  * **collections** for even more fancy data containers
  * **itertools** for advanced iteration
  * **threading** and **multiprocessing** for thread-based and process-based parallelism

and many many others: https://docs.python.org/3/library/
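
As a small taste of two of the modules listed above, the sketch below uses `collections.Counter` to count items and `itertools.combinations` to enumerate pairs. The input strings are arbitrary sample data.

```python
import itertools
from collections import Counter

# Counter counts how many times each element appears
letter_counts = Counter('abracadabra')
print(letter_counts.most_common(1))  # → [('a', 5)]

# combinations() yields all the pairs, without repetition
pairs = list(itertools.combinations('abc', 2))
print(pairs)  # → [('a', 'b'), ('a', 'c'), ('b', 'c')]
```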


## Create your own module

Creating a module is as simple as creating a Python file. One thing that should be noted, however, is that all the code in a module that is not inside a function (or a class) is executed when the module is imported, so be careful about what you put there.
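
A common way to keep top-level code from running on import is to guard it with a check on the special variable `__name__`, which is set to `'__main__'` only when the file is executed directly. The sketch below shows the possible content of a hypothetical module file (say, *mymodule.py*); the function `square` is invented for this example.

```python
# Possible content of a hypothetical file mymodule.py

def square(x):
    """Return x squared."""
    return x * x


if __name__ == '__main__':
    # This branch runs only when the file is executed directly
    # (e.g. `python mymodule.py`), not when it is imported
    print(square(3))
```

With this layout, `import mymodule` only defines `square`, while running the file as a script also prints the result.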

In order for the import to work, the module needs to be in a place where the Python interpreter is able to find it. This means either:

* The same directory as the script calling it
* One of the directories stored in *PYTHONPATH*, an environment variable containing a list of directory names
* The default locations of the Python installation (for example, the standard library and *site-packages* directories)

You can see the full list of directories where modules are searched in the variable *sys.path*.

In [None]:
import os
import sys
# PYTHONPATH may not be defined: .get() returns None instead of raising KeyError
print(os.environ.get('PYTHONPATH'))
print(sys.path)
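
To see the search mechanism at work without any external file, the sketch below writes a tiny module into a temporary directory, adds that directory to `sys.path` and imports it. The module name `tiny_module` and its function `double` are made up for this example.

```python
import importlib
import os
import sys
import tempfile

# Create a directory containing a one-function module
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'tiny_module.py'), 'w') as module_file:
    module_file.write('def double(x):\n    return 2 * x\n')

# Until the directory is in sys.path, the interpreter cannot find the module
sys.path.append(module_dir)
# Recommended when modules are created while the program is running
importlib.invalidate_caches()

import tiny_module
print(tiny_module.double(21))  # → 42
```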


For this lesson, I have uploaded to my Drive space a *.py* file containing just a single function, *merge_strings*. Since we are working in Google Colab, we need to do some initialization in order to be able to access the module, mainly mounting my Drive space so that it can be accessed here. This kind of setup is not required when the files are on your own system.

We also update the sys.path variable so that the Python interpreter actually finds my module when we import it.


In [None]:
import sys
from google.colab import drive
drive.mount('/content/gdrive')
sys.path.append('/content/gdrive/My Drive/Colab Notebooks/')

In [None]:
""" Let's take a look at the content of the file using notebook support for bash
commands (don't worry if you don't understand the following line, it's not
Python code)"""
!cat '/content/gdrive/My Drive/Colab Notebooks/stringlib.py'

In [None]:
import stringlib
stringlib.merge_strings('a', 'b')