# Modules

## 1) Definition:

Modules are libraries that contain many functions, which can be reused without having to write them again. Many modules already exist, and many more can be downloaded from the web. A non-exhaustive list, containing more than 300 modules, can be found at:
https://docs.python.org/3/py-modindex.html

Using a module in Python is easy. As an example, let's import the math and random modules, which we have already used before:

In [None]:
import math
print(math.cos(math.pi/2))
print(math.sin(math.pi/2))

In [None]:
import random
random.randint(0,10)

It is also possible to import only a single function from a module, for instance:

In [None]:
from random import randint
randint(0,10)

With the keyword from, one can import a specific function from a module. To import all the functions of a module, one needs to use the wildcard symbol "*":

In [None]:
from math import *
cos(pi/2)

It is less confusing, however, to import the whole module and call its functions through the module name. It is also possible to import a module under an alias:

In [None]:
import random as rand
rand.randint(1,10)

In this example the functions of the random module are accessible via the alias rand.
It is possible to remove a module name from the current namespace with the del statement; using the name afterwards raises a NameError:


In [None]:
import random
random.randint(0, 10)
del random
random.randint(0, 10)  # raises NameError: the name random is no longer defined
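
The del statement only unbinds the name; the interpreter still keeps the module object in its cache, sys.modules. The following minimal sketch illustrates this behaviour (the variable names message and still_cached are chosen here for illustration only):

```python
import sys
import random

del random  # remove the name from the current namespace

try:
    random.randint(0, 10)  # the name is gone, so this fails
except NameError as error:
    message = str(error)
    print("NameError:", message)

# The module object itself is still cached by the interpreter.
still_cached = "random" in sys.modules
print("still in sys.modules:", still_cached)

import random  # binds the name again, reusing the cached module
print(random.randint(0, 10))
```

Importing the module again is therefore cheap: Python does not reload the file, it simply binds the name to the cached module.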


## 2) Getting help with modules

Given the large number of modules available, it is not always easy to know what can be done with a module. To obtain some information, one can use the help() function.

In [None]:
import random
help(random)

The help() function is more general and can be used with any object loaded in memory:

In [None]:
t=[1,2,3]
help(t)

To know which names (functions, classes, constants) are available from a module, one can use the dir() function:


In [None]:
import random
dir(random)
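
The list returned by dir() also contains many names starting with an underscore, which are private or special by convention. As a small sketch, one could filter them out to keep only the public names (public_names is a name chosen here for illustration):

```python
import random

# Names starting with an underscore are private by convention; skip them.
public_names = [name for name in dir(random) if not name.startswith("_")]

print(len(public_names), "public names, for example:", public_names[:5])
```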

## 3) A few interesting modules

Among the popular modules are:
- math: basic mathematical functions and constants.
- random: generation of random numbers.
- sys: interaction with the Python interpreter.
- os: interaction with the operating system.
- time: timing information.
- datetime: date information.
- re: regular expressions.
- NumPy: vector and matrix manipulations.
- matplotlib: graphics and histograms.
- pandas: data analysis.
- scipy: data analysis and data fitting.

Let's take a look at a few of them:
### 3.1) sys module: 

The sys module contains functions and variables that are specific to the Python interpreter. It is particularly useful to get access to the arguments that are passed when one executes a script from the command line:

In [None]:
! python ../python/sys.py salut girafe 42

The script returns the content of the sys.argv list, which contains all the arguments used on the command line, including the name of the Python script itself (sys.argv[0]). It is therefore possible to access the arguments with sys.argv[n], with n>0. The sys module also provides the sys.exit() function to exit the Python interpreter.

For instance, look at the content of the following file (python/test_sys.py) using the Linux more command:

In [None]:
!more ../python/test_sys.py

In [None]:
!python ../python/test_sys.py

In [None]:
!python ../python/test_sys.py arg1
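
The content of python/test_sys.py is not reproduced here. As a rough sketch of how sys.argv and sys.exit() can be combined, the hypothetical function below (require_one_argument and its usage message are assumptions, not the actual script) checks the number of arguments and exits when it is wrong:

```python
import sys

def require_one_argument(argv):
    """Return the single user argument, or exit with a usage message.

    argv is expected to look like sys.argv: the script name first,
    then the arguments typed after it on the command line.
    """
    if len(argv) != 2:
        # sys.exit() raises SystemExit; given a string, the interpreter
        # prints it to stderr and exits with status 1.
        sys.exit(f"Usage: python {argv[0]} <argument>")
    return argv[1]

# Simulated command lines, so that the sketch runs without a real script call.
print(require_one_argument(["test_sys.py", "arg1"]))

try:
    require_one_argument(["test_sys.py"])
except SystemExit as stop:
    print("exit requested:", stop.code)
```

In a real script one would simply call require_one_argument(sys.argv) and let the interpreter stop when the argument is missing.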

One can use the arguments in a program. For instance, look at the file python/count_lines.py and execute it on the data/animals.txt file:

In [None]:
!python ../python/count_lines.py ../data/animals.txt
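
The file python/count_lines.py is not shown in this notebook. A script of this kind could look roughly like the sketch below; the function names count_lines and main, as well as the demonstration file and its three lines, are made up for illustration:

```python
import os
import tempfile

def count_lines(path):
    """Return the number of lines in the text file at path."""
    with open(path) as handle:
        return sum(1 for _line in handle)

def main(argv):
    """Count the lines of the file named by the first command-line argument."""
    return count_lines(argv[1])

# Self-contained demonstration on a temporary file (made-up content),
# standing in for a call such as: python count_lines.py some_file.txt
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo.txt")
    with open(demo_path, "w") as handle:
        handle.write("giraffe\ntiger\nmonkey\n")
    demo_count = main(["count_lines.py", demo_path])
    print(demo_path, "contains", demo_count, "lines")
```

In an actual script, main would be called with sys.argv instead of a hand-written list.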

### 3.2) os module:

The os module allows interaction with the operating system. Three useful functions of this module are:
- os.path.exists("filename") checks whether filename exists.
- os.getcwd() returns the current working directory of the running Python process.
- os.listdir("path") returns the content of the directory path.

In [None]:
import os

if os.path.exists("toto.pdb"):
    print("toto.pdb exists")
else:
    print("toto.pdb does not exist")

print(os.getcwd())

print(os.listdir("../data/"))
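
Since the result of the cell above depends on the files present on your machine, here is a self-contained sketch that creates a temporary directory and inspects it. It also uses os.path.join(), os.path.isfile() and os.mkdir(), which are not in the list above; the entry name subdir is arbitrary:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create one empty file and one sub-directory to have something to list.
    file_path = os.path.join(folder, "toto.pdb")
    open(file_path, "w").close()
    os.mkdir(os.path.join(folder, "subdir"))

    entries = sorted(os.listdir(folder))  # listdir() order is arbitrary
    print(entries)

    exists_before = os.path.exists(file_path)
    # Distinguish regular files from directories.
    is_file = {name: os.path.isfile(os.path.join(folder, name)) for name in entries}
    print(is_file)

# The temporary directory is deleted when the with block ends.
exists_after = os.path.exists(file_path)
print(exists_before, exists_after)
```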

### 3.3) re module:

The re module makes it possible to perform regular expression (regex) searches in a string. For instance:

In [None]:
import re
animals = "python anaconda tiger"
re.search("tiger", animals)

if re.search("tiger", animals):
    print(" OK ")

The following characters are important in regex searches:
- "^": corresponds to the beginning of the string.
    - example: ^ATG is found in ATGCG but not in CCATG.
- "\$": corresponds to the end of the string.
    - example: ATG\$ is found in TJATG but not in TJATGKD.
- ".": corresponds to any single character (except a newline, by default).
    - example: A.G is found in ARG, AJG, etc.
- "[ABC]": corresponds to the character A or B or C.
    - example: T[ABC]G is found in TAG, TBG or TCG, but not in TG.
- "[A-Z]": corresponds to any uppercase letter from A to Z.
    - example: C[A-Z]T is found in CAT.
- "[a-z]": corresponds to any lowercase letter from a to z.
- "[0-9]": corresponds to any digit.
- "[A-Za-z0-9]": corresponds to any letter or digit.
- "[^AB]": corresponds to any character except A and B.
- "\": escapes the following special character, so that it is taken literally.
    - example: A\.G is found in A.G but not in ARG, because the escaped dot no longer stands for any character.
- "*": corresponds to 0 to n repetitions of the preceding character or group in parentheses.
    - example: A(CG)*T is found in AT, ACGT, ACGCGT.
- "+": corresponds to 1 to n repetitions of the preceding character or group in parentheses.
    - example: A(CG)+T is found in ACGT, ACGCGT but not in AT.
- "?": corresponds to 0 or 1 repetition of the preceding character or group in parentheses.
    - example: A(CG)?T is found in AT and ACGT but not in ACGCGT.
- "{n}": corresponds to exactly n repetitions of the preceding character or group in parentheses.
    - example: A(CG){2}T is found in ACGCGT but not in ACGT, ACGCGCGT or ACGCG.
- "{n,m}": corresponds to n to m repetitions of the preceding character or group in parentheses.
    - example: A(C){2,4}T is found in ACCT, ACCCT and ACCCCT but not in ACT, ACCCCCT or ACCC.
- "{n,}": corresponds to at least n repetitions of the preceding character or group in parentheses.
    - example: A(C){2,}T is found in ACCT, ACCCT and ACCCCT but not in ACT or ACCC.
- "{,m}": corresponds to at most m repetitions of the preceding character or group in parentheses.
    - example: A(C){,2}T is found in AT, ACT and ACCT but not in ACCCT or ACC.
- "(CG|TT)": corresponds to the chain of characters CG or TT.
    - example: A(CG|TT)C is found in ACGC or ATTC.
    
- "\d": corresponds to any digit; for ASCII text it is equivalent to [0-9].
- "\w": corresponds to any word character; for ASCII text it is equivalent to [0-9A-Za-z_].
- "\s": corresponds to any whitespace character.
- And the classic "\t" and "\n" for tabulation and new line.
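
A few of the examples from this list can be checked directly with re.search(). The sketch below is only an illustration: the list named cases and its layout are chosen here, and the patterns are written as raw strings (r"...") so that backslashes reach the regex engine unchanged.

```python
import re

# (pattern, text, whether re.search() is expected to find the pattern)
cases = [
    (r"^ATG", "ATGCG", True),
    (r"^ATG", "CCATG", False),
    (r"ATG$", "TJATG", True),
    (r"ATG$", "TJATGKD", False),
    (r"A\.G", "A.G", True),
    (r"A\.G", "ARG", False),
    (r"A(CG)+T", "AT", False),
    (r"A(CG)?T", "ACGCGT", False),
    (r"A(C){2,4}T", "ACCCCCT", False),
    (r"A(CG|TT)C", "ATTC", True),
]

results = []
for pattern, text, expected in cases:
    # re.search() returns a match object when found, None otherwise.
    found = re.search(pattern, text) is not None
    results.append(found)
    print(f"{pattern!r:16} in {text!r:10}: {found} (expected {expected})")
```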

The re module provides several other functions:
- match(): returns a match object if the regex is found at the beginning of the string, and None otherwise.
- fullmatch(): returns a match object if the regex matches the entire string, and None otherwise.
- compile(): returns a compiled regex object, which is convenient and efficient when the same regex is applied many times.
- findall(): returns a list of all the substrings matching the regex.
- sub(): replaces the parts of the string matching the regex with another string.

Some examples:

In [None]:
animals = "python anaconda tiger"
print("match():",bool(re.match("python",animals)))
print("fullmatch():",bool(re.fullmatch("tiger",animals)))
regex=re.compile("tiger")
print("compiled search():",bool(regex.search(animals)))

# Raw string (r"..."), so that the backslash reaches the regex engine unchanged.
regex2 = re.compile(r"[0-9]+\.[0-9]+")
result = regex2.findall(" pi is equal to 3.14 and e is equal to 2.72")
print("findall():", result)

regex2.sub("something", " pi is equal to 3.14 and e is equal to 2.72")
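
The searching functions return a match object rather than a plain boolean, and this object carries more information than just "found". As a brief sketch (the variable names text, number, first and replaced are chosen here), one can read the matched text and its position:

```python
import re

text = " pi is equal to 3.14 and e is equal to 2.72"
number = re.compile(r"[0-9]+\.[0-9]+")

first = number.search(text)
if first:  # a match object is truthy, None is falsy
    # group() is the matched text; start() and end() give its position.
    print(first.group(), first.start(), first.end())
    print(text[first.start():first.end()])

# sub() returns a new string; the original text is left unchanged.
replaced = number.sub("something", text)
print(replaced)
```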


## 4) Exercise

- Display, on the same line, all the numbers from 10 to 20 and their square roots with up to 3 decimals.
- Display the name and content of the current working directory. Determine the number of files in the data and python directories.
- Display numbers from 1 to 10 with 1 second of interval between each display.
- Create a list with 10 random float numbers drawn from 0 to 1.
- Improve the python/count_lines.py script so that it returns an error message and exits if the file provided does not exist.
- Compute the pi number using a Monte Carlo method:
    - Let's start from a circle of radius $R=1$, which is inscribed in a square of side $s=2R=2$.
        - The circle surface is $\pi R^{2}=\pi$ while the square surface is $(2R)^{2}=4$.
    - If one draws N random points inside the square, using a uniform distribution, the probability that a point is in the circle is:
        - $p=\frac{\mathrm{surface\ circle}}{\mathrm{surface\ square}}=\frac{\pi}{4}$.
        - If n points out of the total N are in the circle, $p\approx\frac{n}{N}$ and therefore $\pi\approx4\times{}\frac{n}{N}$.
        
    - In detail:
        - Draw the x and y coordinates of a single point between -1 and 1 using the uniform() function of the random module, and store these coordinates (one x, y pair per line) with 6 digits in a file named data/pi_mc_coordinates.txt.
        - Compute the distance ($d=\sqrt{(x_b-x_a)^2+(y_b-y_a)^2}$) between the center of the circle (coordinates (0,0)) and the point.
        - Determine if this point is within the circle area (d<1); if it is the case, increment n.
        - Compute pi for 100, 1000 and 10000 iterations and compare the results to the value provided by the math module.

