# Introduction to Programming
## Foundations - Chapter 8

- [Modules](#modules)

- [Built-in modules](#builtin)

    - [OS](#os)
    
    - [sys](#sys)
    
    - [collections](#collection)

    - [math](#math)

    - [statistics](#statistics)

    - [random](#random)

    - [datetime](#datetime)

---

<a id="modules"></a>
### Modules

What is a Module?
Consider a module to be the same as a code library.
In practical terms, it is simply a Python script containing a set of objects (classes, functions, and possibly variables) you want to include in your application.

To create a module you need to create a Python script, i.e. a `.py` file, and in this file you will write down all the components that will make up your module.

Example: create a script `mymodule.py` to include a variable, a function, and a class.
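As an illustration, a minimal sketch of what `mymodule.py` could contain is given below. The names and printed messages follow the ones used in the examples of this chapter, but the exact contents of the file are an assumption.

```python
# mymodule.py: hypothetical contents, using the names that appear in this chapter

# A variable
some_variable = [1, 2, 3]


# A function
def some_function():
    print("this is the function in mymodule")


# A class with a single method
class SomeClass:
    def some_method(self):
        print("this is the method in SomeClass in mymodule")
```

Note that the file only contains definitions: nothing is printed or computed until the function or the method is called.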

Then, to include the objects you have defined in the script `mymodule.py` into your application, you can just load them using the keyword `import` followed by the name of the script without the `.py` extension.
The general syntax to load a script is:
```
import <name of the module script without extension .py>
```

Example: import your module `mymodule.py`.

In [1]:
import mymodule

In order for `import` to work, Python must be able to find the script you are importing. The simplest way to ensure this is to keep it in the same folder as the application that is calling it.

Once you have imported your module, all the objects defined in it become available in your current application using the `.` syntax as follows:
```
<module name>.<object in the module>
```
and you might need to use parentheses after the object name if you are calling a function or a class.

To have an overview of the components in a module, you can use the `dir()` function, which returns a list of all the objects defined in the module.
Alternatively, you can use tab-completion after typing `<module name>.`: `<module name>.` + Tab.

Example: `mymodule.py` contains a variable, a function, and a class (an object template). Explore the module using `dir()`. Print the variable. Run the function. Create an object using the class and run the only available method.

In [2]:
print(dir(mymodule))

['SomeClass', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'some_function', 'some_variable']


In [3]:
print(mymodule.some_variable)

[1, 2, 3]


In [4]:
mymodule.some_function()

this is the function in mymodule


In [5]:
myobject = mymodule.SomeClass()

In [6]:
myobject.some_method()

this is the method in SomeClass in mymodule


Generally speaking, when using `import` there is no syntactic difference between loading a module and executing any generic Python script from within an application.

By convention, a module contains only definitions and does not contain any actions, so that when you import it additional objects and classes are created, but nothing is run.

However, if a script contains any commands to be run, these commands will be run when you use the `import` keyword.

Example: add a simple action to `mymodule.py` (for instance, a `for` loop that prints some values) and import it again. 

In [1]:
import mymodule

0
1
2
3
4
5
6
7
8
9


While you should stick to the convention of only including definitions when defining a module, you can also use the `import` functionality to split your workflow into multiple regular Python scripts to be loaded one after the other.
For instance, you could have one script for loading data, a second script for cleaning, a third for model estimation, and finally one for visualization.

As we said, a file containing a module or a file containing a general script have no formal differences: they are both `.py` files. 
What makes them a module or a regular script is just a matter of convention.
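The behavior described above can be checked with a small self-contained experiment. The sketch below writes a throwaway script (the name `noisy_module` is made up for this illustration) into a temporary folder, makes that folder visible to Python, and imports it twice while capturing what gets printed. It also shows a detail worth knowing: within one session, importing a module that is already loaded does not run its code again.

```python
import importlib
import io
import os
import sys
import tempfile
from contextlib import redirect_stdout

# Write a small throwaway script (hypothetical name) into a temporary folder.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "noisy_module.py"), "w") as f:
    f.write("for i in range(3):\n    print(i)\n")

# Tell Python where to look for the script.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

# First import: the loop at the top level of the file is executed.
first = io.StringIO()
with redirect_stdout(first):
    import noisy_module
print(repr(first.getvalue()))   # '0\n1\n2\n'

# Second import in the same session: the module is already loaded,
# so its code is not run again and nothing is printed.
second = io.StringIO()
with redirect_stdout(second):
    import noisy_module
print(repr(second.getvalue()))  # ''
```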

Using `import` is a convenient way to start your script for assignment 4.
Make sure that `CodeMaker.py` is in the same directory as the script or Jupyter notebook where you are building your solution.

In [2]:
import CodeMaker

In [3]:
CodeMaker.code_maker.respond([1,2,3,4])

['partial', 'wrong', 'partial', 'partial']

In [4]:
CodeMaker.code_maker.get_last_response()

['partial', 'wrong', 'partial', 'partial']

In [5]:
CodeMaker.code_maker.get_all_responses()

[['partial', 'wrong', 'partial', 'partial']]

In [6]:
CodeMaker.code_maker.find_matches([3,4,2,1], [1,2,3,4])

['partial', 'partial', 'partial', 'partial']

#### Nested modules

As projects grow in size, it can become necessary to split the components of your module into submodules to keep your code organized.

For instance, if you are building a mathematical model for your data, you generally would need functions to translate the mathematics into code, and functions to program an algorithm that will estimate that mathematical model.
You can define these functions in two submodules, to keep your code organized by topic.
Then, you can create a single module that does nothing more than load all the required submodules, so that from your application you only need to load this "umbrella" module to have access to all the different components you need.

Of course the need for and benefit of this grows with the number of submodules.
As we will see, large Python modules such as `matplotlib` have a large number of submodules.

Whenever modules are loaded using such a nested structure, the `.` syntax to access their components changes slightly.
In particular, you will need to use the `.` syntax to reconstruct the path from the "umbrella" module (the one you load directly) all the way to the component you want to use in some submodule.
To do this, tab-completion is very helpful.
You can as usual also use `dir()`.
The stylized syntax can look as follows:
```
<"umbrella module">.<specific submodule>.<specific component of the submodule>
```
where again parentheses might be required.

Example: define 2 submodules with 2 simple functions. Then define a module that merely loads both these submodules. `import` this "umbrella" module and access the 2 functions in the 2 submodules.

In [7]:
import umbrellamodule

In [8]:
dir(umbrellamodule)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'submodule1',
 'submodule2']

In [9]:
dir(umbrellamodule.submodule1)

['__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'some_function']

In [10]:
umbrellamodule.submodule1.some_function()

this is the function in submodule 1


In [11]:
umbrellamodule.submodule2.some_function()

this is the function in submodule 2
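For reference, the three files behind an example like this one could look as follows. This is only a sketch under the assumption that the "umbrella" module consists of two plain `import` statements. To keep it runnable as a single script, the files are written to a temporary folder first.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical file contents; the actual files may differ.
files = {
    "submodule1.py": (
        "def some_function():\n"
        "    print('this is the function in submodule 1')\n"
    ),
    "submodule2.py": (
        "def some_function():\n"
        "    print('this is the function in submodule 2')\n"
    ),
    # The umbrella module does nothing more than load the two submodules.
    "umbrellamodule.py": "import submodule1\nimport submodule2\n",
}

# Write the three files to a temporary folder and make it visible to Python.
module_dir = tempfile.mkdtemp()
for name, source in files.items():
    with open(os.path.join(module_dir, name), "w") as f:
        f.write(source)
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

import umbrellamodule

# Follow the path from the umbrella module down to each function.
umbrellamodule.submodule1.some_function()
umbrellamodule.submodule2.some_function()
```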


#### Alternative ways of loading

Using `import + <module name>` loads all the components in the module and makes them accessible using the syntax `<module name>.<...>`.
There are two alternatives to slightly change the behavior of `import` and how to access the components.

The first alternative is to define an **alias** for the module.
An alias is an alternative name to be used to access the module's components using the `.` syntax.
Typically, this alias will be an abbreviation of the module's name.

To import a module using an alias, use both the keyword `import` and the keyword `as`, as in the following stylized syntax:
```
import <module name> as <alias>
```

When an alias is used, the module's name cannot be used to access its components. 
The alias needs to be used throughout instead, replacing the module's name in the `.` syntax: `<alias>.<object in the module>`.

Example: create `newmodule.py` with a single simple function in it. Import `newmodule` under the alias `nm`. Run the function therein.

In [12]:
import newmodule as nm

In [13]:
nm.some_function()

I am the function in newmodule.


In [14]:
newmodule.some_function()

NameError: name 'newmodule' is not defined

Another option for importing is to selectively load one or more objects from a module.
To only import some named objects from a module, use the combination of keywords `from` and `import` as in the following stylized syntax:
```
from <module name> import <comma-separated names of the objects to be imported>
```

Including objects selectively is obviously only recommended when a few components of the module are needed. 
Selective import makes it more convenient to work with these few named objects, by loading them to the global namespace of your application under their own name.
This means that the objects imported using the `from` keyword can be called directly, without the `<module name>.` syntax.
In fact, the module name itself is not defined in your application after a selective import, so the `<module name>.` syntax cannot be used.

Example: create `anothernewmodule.py` with a single simple function in it. Import the function selectively and run it.

In [15]:
from anothernewmodule import some_function

In [16]:
anothernewmodule.some_function()

NameError: name 'anothernewmodule' is not defined

In [17]:
anothernewmodule.some_variable

NameError: name 'anothernewmodule' is not defined

In [18]:
some_function()

I am the function in anothernewmodule
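The same mechanism can be tried without creating any file, using a module that ships with Python. The sketch below selectively imports two objects from `math`, chosen here only for illustration, and uses them directly.

```python
from math import pi, sqrt

# Both names are available directly, without the `math.` prefix.
print(sqrt(16))  # 4.0
print(pi)        # 3.141592653589793
```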


You can use the `from` - `import` keywords in addition to the asterisk / star / splat `*` to import all the objects in a module under their own name in the global namespace (i.e. you do not have to access them with the `.` syntax). 
The stylized syntax is the following:
```
from <module name> import *
```

This is however not recommended unless you know exactly what is in the module, because you might accidentally overwrite names that are already in use, including Python's built-in functions, thus "breaking" your code.
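To see what such an accidental overwrite looks like, consider the sketch below. It uses a selective import rather than `*`, only to keep the illustration short; the effect on the name is the same. The module `math` happens to define a function called `pow`, which is also the name of a built-in function.

```python
import builtins

# The built-in pow() applied to integers returns an integer.
builtin_result = builtins.pow(2, 3)

# Importing a function with the same name changes what `pow` refers to.
from math import pow

shadowed_result = pow(2, 3)

print(builtin_result)   # 8
print(shadowed_result)  # 8.0
```

After the import, `pow` silently refers to the version in `math`, which always returns a float.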

<a id="builtin"></a>
### Built-in modules

In addition to defining and importing your own modules, the `import` functionality in Python allows you to load the functions stored in the modules that come with Python (the built-in modules) or in optional / third-party modules (such as the major data wrangling modules: matplotlib, numpy, pandas).
Let's first discuss built-in modules.

In addition to built-in functions, which are always available in Python, a large number of other pre-defined functions are bundled with Python distributions, but are not automatically made available. 
These functions are defined and stored in modules that come with Python, i.e. they are built-in modules, and can be loaded as usual with the `import` keyword, when and if needed.

These additional modules store functionalities that are common but not crucial for Python to work, and that require some level of specialization or sophistication of the program being written (or the programmer).
For instance, these are functionalities that allow Python to interact with the machine (the `os` or `sys` modules), basic mathematical or statistical functions (the `math` or `statistics` modules), or work with specific data (like time with `datetime`, or strings with `re`).

The list of built-in Python modules is available in the official documentation [here](https://docs.python.org/3/py-modindex.html), or they can be listed by calling all available modules with `help('modules')` (it takes a while for this function to complete). 
The latter lists both built-in and optional modules.

Once you know the name of the module you want to load, as we said you can use the standard syntax:
```
import <module name>
```

By convention, built-in modules are not aliased.

We are only going to discuss a subset of important functions and classes in these modules, if any.
For further details of their components, you can use `dir()` or the documentation.

<a id="os"></a>
#### OS

The `os` module provides functions to perform many operating system tasks from within a Python script. 
It includes functions for creating and removing a directory (folder), fetching its contents, changing and identifying the current directory, etc.
It can be imported as follows:

```python
import os
```
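A short sketch of the tasks just listed is given below. It works inside a temporary folder so that nothing on your disk is disturbed. The folder name `example_folder` is made up for the illustration.

```python
import os
import tempfile

# Work inside a temporary folder so nothing on disk is disturbed.
base_dir = tempfile.mkdtemp()

# Create a directory and fetch the contents of its parent.
new_dir = os.path.join(base_dir, "example_folder")  # hypothetical folder name
os.mkdir(new_dir)
print(os.listdir(base_dir))  # ['example_folder']

# Identify and change the current directory, then change back.
original_dir = os.getcwd()
os.chdir(new_dir)
print(os.getcwd())
os.chdir(original_dir)

# Remove the (empty) directory again.
os.rmdir(new_dir)
print(os.listdir(base_dir))  # []
```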

<a id="sys"></a>
#### sys

The sys module provides functions and variables used to manipulate different parts of the Python runtime environment (think about the options of Python).
It can be loaded as follows:

```python
import sys
```

The most useful of these manipulations is to instruct Python to look for a module you defined and that is stored in a specific directory.
After importing `sys`, the following syntax can be used:
```
sys.path.insert(0, <path to the directory where the module is stored>)
```
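As a concrete sketch, assuming your module lives in a folder called `/path/to/my/modules` (a placeholder, to be replaced with the actual location):

```python
import sys

# Hypothetical directory; replace it with the folder that holds your module.
module_dir = "/path/to/my/modules"

# Put the directory at the front of the list of places Python searches.
if module_dir not in sys.path:
    sys.path.insert(0, module_dir)

print(sys.path[0])
```

`sys.path` is an ordinary list of strings, so it can be inspected with `print(sys.path)` to check where Python currently looks for modules.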

<a id="collection"></a>
#### collections

The collections module provides alternatives to built-in data structures list, tuple and dict.

The most useful is the `deque` type, a list-like structure that can be appended to or popped from both the beginning and the end. 
It performs these operations much more efficiently than a list.
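A minimal sketch of these operations on a small `deque`:

```python
from collections import deque

queue = deque([1, 2, 3])

queue.append(4)       # add at the end
queue.appendleft(0)   # add at the beginning
print(queue)          # deque([0, 1, 2, 3, 4])

last = queue.pop()        # remove from the end
first = queue.popleft()   # remove from the beginning
print(first, last)        # 0 4
print(list(queue))        # [1, 2, 3]
```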

<a id="math"></a>
#### math

`math` is the built-in module for basic mathematical tasks.
It features some of the most popular mathematical functions, such as trigonometric functions, logarithmic and exponentiation functions, angle conversion functions, etc. 
It also contains useful popular mathematical constants, such as $\pi$, defined as variables up to the maximal number of digits your computer can support (with double precision).

Some of the most useful functions and variables in `math` are the following.

`math.exp()` takes a single argument and calculates the value of Euler's number $\mathrm e$ raised to the power of the argument that is passed, i.e. $\mathrm e ^ {\text{<argument>}}$.
Alternatively, this same computation can be performed using the constant `math.e`, which stores the value of Euler's number, and the usual exponentiation operator `**`.

`math.log()` and `math.log10()` take a single argument and return its natural and base-10 logarithms respectively.

`math.pi` stores the constant $\pi$.

`math.ceil()` and `math.floor()` round a fractional number to its next and previous integer respectively.

The complete list of functions and variables can be consulted [here](https://www.w3schools.com/python/module_math.asp).
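The functions and constants listed above can be tried out as follows. The input values are arbitrary and only serve as an illustration.

```python
import math

x = 2.0

# Two ways of computing e raised to the power x.
print(math.exp(x))
print(math.e ** x)
print(math.isclose(math.exp(x), math.e ** x))  # True

# Natural and base-10 logarithms.
print(math.log(math.e))
print(math.log10(1000))

# The constant pi.
print(math.pi)

# Rounding up and down.
print(math.ceil(2.1))   # 3
print(math.floor(2.9))  # 2
```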

Example: load the math package, then round a given number with the built-in `round()` and with floor and ceiling rounding.

In [20]:
import math
print(round(.5))
print(round(1.5))

0
2


In [21]:
print(math.floor(1.5))
print(math.ceil(1.5))

1
2


Example: load the math package and define a function for the Normal probability density function:
$$ \mathcal N(x | \mu, \sigma^2) = \frac{1}{\sqrt{2\pi} \sigma} \mathrm e ^ {- \frac{(x - \mu)^2}{2 \sigma^2}}$$

In [22]:
import math
def normal_pdf(x, mu, sigma):
    return math.exp(- .5 * ((x - mu) / sigma) ** 2) / (sigma * (2 * math.pi) ** (1/2))

In [24]:
normal_pdf(0, 0, 1)

0.3989422804014327

<a id="statistics"></a>
#### statistics

`statistics` provides basic statistical functions to work with numeric data. 
Some of the most useful functions and variables in `statistics` are the following.

The most commonly used are `statistics.mean()`, `statistics.median()`, `statistics.mode()`, and `statistics.stdev()`, which take as an input a numeric data structure and return the associated summary statistic. 

The complete documentation can be consulted [here](https://docs.python.org/3/library/statistics.html).
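As a quick sketch, the four functions can be applied to a small list of numbers. The data below are made up so that the results are easy to verify by hand.

```python
import statistics

data = [2, 4, 4, 5, 10]  # hypothetical data

print(statistics.mean(data))    # 5
print(statistics.median(data))  # 4
print(statistics.mode(data))    # 4
print(statistics.stdev(data))   # 3.0
```

Note that `statistics.stdev()` computes the sample standard deviation, i.e. it divides the sum of squared deviations by the number of observations minus one.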

Example: import `statistics`. Create a sample of 10 numbers. Calculate the density at zero of the sample distribution, assuming it is Normal. 

In [25]:
import statistics
sample = [1, 5, -10, -12, 4, -2, 20, -2, 9, -7]
normal_pdf(0, statistics.mean(sample), statistics.stdev(sample))

0.041702387227688674

In [27]:
print(statistics.mean(sample))

0.6


In [28]:
statistics.stdev(sample)

9.547541859324605

<a id="random"></a>
#### random

We have already used, a bit cluelessly, the built-in `random` module.
`random` provides functions to instruct your computer to generate randomness, or more precisely random variables.
It can be used to generate a random draw from a distribution or set, to shuffle elements randomly, etc.
It can also be used to make the computer-generated randomness replicable.

A computer uses what is known as a random number generator, which needs a number to start.
This number is called a **seed**.
By default, Python's random number generator seeds itself from a source of randomness provided by the operating system, falling back on the current system time if none is available.
`random` provides a function to change the seed so that the randomness you generate becomes replicable (which is important for making your results credible).


I consider the following the most important functions in `random`.

- `random.random()` takes no argument and generates a random float between 0 and 1, i.e. a random draw from a standard uniform distribution. It is the building block of all functions that generate a draw from a distribution. In fact, you can use `random.random()` plus the inverse of the cumulative distribution function of a given distribution to sample from it.

- `random.randrange()` is a function we have already used. Called with two integer arguments, it generates a random draw from the integers in `[<first argument>, <second argument>)`; called with a single argument, the range starts at 0. It can be given an optional third argument to set a step, so that the draw only considers integers at step-distance.

- `random.sample()` takes two arguments, a sequence (such as a list) and a sample size, and then draws elements from the sequence **without replacement** to create a sample of the desired size. Thus, the required sample size cannot exceed the length of the sequence. If the sample size is set exactly to the length of the sequence, then the sequence is shuffled.

- `random.seed()` takes one argument: the seed, i.e. the number to start the random number generator. It can be added before any other random function, so that the random numbers generated can be replicated every time you run your script.

The complete documentation can be consulted [here](https://docs.python.org/3/library/random.html).
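The sketch below runs through the four functions above. It also illustrates the inverse-CDF idea for one specific case, the exponential distribution, whose inverse cumulative distribution function is $-\log(1 - u) / \lambda$. The seed value and the rate `lam` are arbitrary choices made for this illustration.

```python
import math
import random

# Seeding makes the draws replicable: the same seed gives the same numbers.
random.seed(123)
first_run = [random.random() for _ in range(3)]
random.seed(123)
second_run = [random.random() for _ in range(3)]
print(first_run == second_run)  # True

# A random integer from [0, 10) in steps of 2, i.e. one of 0, 2, 4, 6, 8.
print(random.randrange(0, 10, 2))

# Three distinct elements drawn without replacement.
print(random.sample([10, 20, 30, 40, 50], 3))

# Inverse-CDF sampling: for an exponential distribution with rate `lam`,
# the inverse CDF is -log(1 - u) / lam, with u a standard uniform draw.
lam = 2.0  # hypothetical rate parameter
u = random.random()
exponential_draw = -math.log(1 - u) / lam
print(exponential_draw)
```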

Example: import random and consider the list `sample` from above. Sample 3 elements from it without replacement. Shuffle it. Sample 15 elements from it with replacement. Finally check the workings of `random.seed()` by making the above replicable.

In [44]:
import random
print(random.sample(sample, 3))
random.seed(1)
print(random.sample(sample, len(sample)))
[ random.sample(sample, 1) for i in range(0, 15)]

[1, 9, -7]
[-10, 5, 4, 1, -12, -2, -2, -7, 9, 20]


[[5],
 [-2],
 [1],
 [20],
 [20],
 [-7],
 [1],
 [-2],
 [4],
 [-12],
 [-7],
 [5],
 [-2],
 [1],
 [1]]
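Sampling with replacement can also be done in a single call with `random.choices()`, which returns a flat list rather than a list of one-element lists. The sketch below uses it, and then checks replicability by resetting the seed and repeating the draw. The seed value is arbitrary.

```python
import random

values = [1, 5, -10, -12, 4, -2, 20, -2, 9, -7]

# Draw 15 elements with replacement.
random.seed(1)
with_replacement = random.choices(values, k=15)
print(with_replacement)

# Resetting the seed and repeating the call gives the same draw.
random.seed(1)
repeated = random.choices(values, k=15)
print(repeated == with_replacement)  # True
```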

<a id="datetime"></a>
#### datetime

When we discussed basic data types, we did not mention dates and times.
Python however ships with the machinery required for dealing with this specific type of data.
To load this machinery, we need to import the built-in `datetime` module.

The following are the central functions of `datetime`. Since the module `datetime` contains several classes, one of which is also called `datetime`, more elaborate `.` syntax is required to access the following functions after you run `import datetime`. Alternatively, one could use the `from + import` syntax.

- `datetime.datetime.now()` returns the current date and time as `year`, `month`, `day`, `hour`, `minute`, `second` (accurate to several fractional digits).

In [1]:
import datetime

In [2]:
datetime.datetime.now()

datetime.datetime(2021, 10, 7, 12, 36, 23, 598225)

In [3]:
now = datetime.datetime.now()
print(now)

2021-10-07 12:37:33.186021


In [4]:
now

datetime.datetime(2021, 10, 7, 12, 37, 33, 186021)

- `datetime.datetime()` can be used to create a date. It takes three mandatory arguments: year, month, and day. As optional arguments it can take the following: `hour`, `minute`, `second`, `microsecond`, `tzinfo`. `tzinfo` allows you to specify the time zone and defaults to `None`. All the other optional arguments default to `0`.

In [5]:
dt_today = datetime.datetime(2021, 10, 7)
dt_today

datetime.datetime(2021, 10, 7, 0, 0)

In [6]:
print(dt_today)

2021-10-07 00:00:00


- `datetime.datetime.strptime()` and `datetime.datetime.strftime()` can be used to convert strings to datetime objects and vice versa respectively. `strptime()` is called on the `datetime.datetime` class, while `strftime()` is usually called as a method of an existing datetime object.

    - `datetime.datetime.strptime()` takes two arguments. The first is a string containing a date and time, e.g. `'20090511 19:11'`. The second is a string describing the format of the date and time in the string, using the legal format codes for dates and time (see table below). In the example, `'%Y%m%d %H:%M'`. It returns a datetime object with all the information available in the string.
    
    - `datetime.datetime.strftime()` also takes two arguments. The first is a datetime object. The second is the desired format of the string to be produced, again to be passed as a string following the legal format codes. When called as a method, `<datetime object>.strftime(<format>)`, only the format is passed. It returns a formatted string.

The legal format codes for date and time are the following:
<div align="center">
        <img src="LFC.png" alt="LFC" width="800"/>
</div>
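A minimal sketch of the round trip, using the date string and format given in the description of `strptime()` above. The output layout passed to `strftime()` is an arbitrary choice.

```python
import datetime

# String -> datetime, using format codes to describe the string.
parsed = datetime.datetime.strptime('20090511 19:11', '%Y%m%d %H:%M')
print(parsed)  # 2009-05-11 19:11:00

# datetime -> string, in a different layout.
formatted = parsed.strftime('%d/%m/%Y at %H:%M')
print(formatted)  # 11/05/2009 at 19:11
```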



Example: convert '210101 3:32PM' to a datetime object. Print that object as 'Friday January 2021, 01 - 03:32 PM'.

In [7]:
dt = datetime.datetime.strptime('210101 3:32PM', '%y%m%d %I:%M%p')
print(dt)

2021-01-01 15:32:00


In [8]:
dt

datetime.datetime(2021, 1, 1, 15, 32)

In [10]:
dt.strftime('%A %B %Y, %d - %I:%M %p')

'Friday January 2021, 01 - 03:32 PM'

`datetime` objects have useful methods.
In addition to those mentioned above, they feature methods `date()` and `time()` to only access the associated part of a datetime object.

In [11]:
print(dt.date())
print(dt.time())

2021-01-01
15:32:00


Another useful method is `replace()`.
It returns a copy of the object in which some of the information about date or time has been altered. 
`replace()` does not perform in-place modification of the object, so you need to reassign the result to update some information.
Use keyword arguments to pass only the change you wish to make. 

In [12]:
dt

datetime.datetime(2021, 1, 1, 15, 32)

In [13]:
dt.replace(year = 2020)
print(dt)

2021-01-01 15:32:00


In [14]:
print(dt.replace(year = 2020))

2020-01-01 15:32:00


In [15]:
dt = dt.replace(year = 2020)
print(dt)

2020-01-01 15:32:00


#### time

This module provides various time-related functions. 
The only function I wish to mention is the `sleep(<number of seconds>)` function, which allows you to instruct Python to wait a user-specified number of seconds before continuing with the execution of a script.

In [16]:
from time import sleep

In [18]:
for i in range(0,10):
    print(i)
    sleep(1)

0
1
2
3
4
5
6
7
8
9


#### re

We have mentioned how Python is extremely powerful for dealing with string data.
Part of Python's functionalities for manipulating and analyzing string data are provided by the built-in `re` module, standing for Regular Expressions.
A regular expression, also called "regex", is a special sequence of characters (somewhat similarly to legal format codes) that forms a search pattern.

I mention this module here only to list it among the built-in ones.
Its use will be discussed in the chapter on string manipulation.