# Scientific Programming for Data Science with Python - Introduction

## Introduction

### Get the notebooks

The notebooks are available here: https://gitlab.com/sylvain.caillou/scientific-programming-m1eit-m2aic-2019-2020

Clone the notebooks:

<code>git clone git@gitlab.com:sylvain.caillou/scientific-programming-m1eit-m2aic-2019-2020.git</code>

or 

<code>git clone https://gitlab.com/sylvain.caillou/scientific-programming-m1eit-m2aic-2019-2020.git</code>


### What is Data Science?

Source: "Python Data Science Handbook" (Jake VanderPlas) [ http://shop.oreilly.com/product/0636920034919.do ]

- It's a surprisingly hard definition to nail down, especially given how ubiquitous the term has become.

![Data Science Venn Diagram](images/Data_Science_VD.png)

<small>(Source: [Drew Conway](http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram). Used by permission.)</small>

While some of the intersection labels are a bit tongue-in-cheek, this diagram captures the essence of what I think people mean when they say "data science": it is fundamentally an *interdisciplinary* subject.

Data science comprises three distinct and overlapping areas: 

- the skills of a *statistician* who knows how to model and summarize datasets (which are growing ever larger); 

- the skills of a *computer scientist* who can design and use algorithms to efficiently store, process, and visualize this data; 

- and the *domain expertise*—what we might think of as "classical" training in a subject—necessary both to formulate the right questions and to put their answers in context.

Think of data science not as a new domain of knowledge to learn, but a new set of skills that you can apply within your current area of expertise.

## Why Python?





| High level  | Generic low level | Specialized low level |
| :---------: | :---------------: | :-------------------: |
| Python      | C                 | Assembly              |
| Julia       | C++               | CUDA                  |
| Mathematica | Fortran           | OpenCL                |
| R           | ...               | ...                   |
| MATLAB      |                   |                       |
| ...         |                   |                       |




* Python is a good compromise between simplicity and conciseness of code on the one hand, and performance on the other

* Python has a strong position in scientific computing: 
    * Very large community of users, easy to find help and documentation.

* Good numerical performance through close integration with time-tested and highly optimized code written in C and Fortran:
    * BLAS, ATLAS, LAPACK, ARPACK, Intel MKL, ...

* Readily available and suitable for use on high-performance computing clusters. 

* No license costs, no unnecessary use of research budget.


Python has emerged over the last couple of decades as a first-class tool for scientific computing tasks, including:

- Data management
- Data processing
- Visualization of large datasets

* Extensive ecosystem of scientific libraries and environments:

<img src="images/python_ecosystem_1.png" width="600">


The usefulness of Python for data science stems primarily from the large and active ecosystem of third-party packages: 

*NumPy* (http://numpy.scipy.org - Numerical Python) for manipulation of homogeneous array-based data, 

*Pandas* (https://pandas.pydata.org/ - Data Analysis and manipulation library) for manipulation of heterogeneous and labeled data, 

*SciPy* (http://www.scipy.org -  Scientific Python) for common scientific computing tasks, 

*Matplotlib* (http://www.matplotlib.org - graphics library) for publication-quality visualizations, 

*IPython* for interactive execution and sharing of code, 

*Scikit-Learn*, *Keras*, *PyTorch* for machine learning,

*A lot more!*

## Plan of the course

### Session 1 & 2 : Introduction to Scientific Programming with Python

- Introduction
- Development environment and tools
- Basics of Python
    - Simple types
    - Containers
    - Control flow
    - Exceptions
    - Context managers
- Advanced Python mechanisms
    - Packing / Unpacking
    - Comprehension
    - Iterable and iterators
    - Generators
- Style guide
- Design and packaging
 
### Session 3 & 4 : Introduction to Numpy

- Introduction 
- Basics on numpy array 
- Computation on array 
- Vectorization 
- Universal functions
- Aggregate functions
- Broadcasting
- Boolean arrays and masks
- Indexing 
- Structured Data Type 

### Session 5 : Data manipulation with Pandas

- Introduction
- Pandas objects 
- Data indexing and selection 
- Operations in pandas 
- Missing values 
- Concat and append 
- Merge and join 
- Aggregate and grouping 

### Session 6 : Data visualization
 
- Introduction 
- Quick overview of Matplotlib, Seaborn and Bokeh
- Introduction to Matplotlib
    - Simple line plots
    - Simple scatter plots
    - Error bars
    - Histograms and binning
    - Customize (figure, subplots, axes, lines, legends) 
 
### Session 7 : Exam and open discussion

- Exam (1h30)
- Outlook: machine learning with Python
    - Scikit-learn 
    - Keras, PyTorch
    - ...


## Development environment and tools

### Python interpreter

The standard way to use the Python programming language is to run Python code with the Python interpreter. The interpreter is a program that reads and executes the Python code in the files passed to it as arguments. At the command prompt, the command ``python`` is used to invoke the Python interpreter.

For example, to run a file ``my-program.py`` that contains Python code from the command prompt, use:

    $ python my-program.py

We can also start the interpreter by simply typing ``python`` at the command line, and interactively type python code into the interpreter. 

<img src="images/python-screenshot.jpg" width="600">


This is often how we want to work when developing scientific applications, or when doing small calculations. But the standard python interpreter is not very convenient for this kind of work, due to a number of limitations.

### Python IDEs

- PyCharm
- Spyder
- Pyzo
- Eclipse with a Python plugin
- Atom
- ...

<img src="images/spyder-screenshot.jpg" width="800">

Some advantages of IDEs:

* Powerful code editor, with syntax highlighting, dynamic code introspection and integration with the Python debugger.
* Variable explorer, IPython command prompt.
* Integrated documentation and help.

### IPython and Jupyter notebook

If Python is the engine of our data science task, you might think of IPython as the interactive control panel.

IPython is an interactive shell that addresses the limitations of the standard Python interpreter, and it is a workhorse for scientific use of Python. It provides an interactive prompt to the Python interpreter with greatly improved user-friendliness.

<img src="images/ipython-screenshot.jpg" width="600">

Some of the many useful features of IPython include:

* Command history, which can be browsed with the up and down arrows on the keyboard.
* Tab auto-completion.
* In-line editing of code.
* Object introspection, and automatic extraction of documentation strings from Python objects like classes and functions.
* Good interaction with operating system shell.
* A lot more...


IPython is closely tied with the [Jupyter project](http://jupyter.org), which provides a browser-based notebook that is useful for development, collaboration, sharing, and even publication of data science results.

The Jupyter notebook is a browser-based graphical interface to the IPython shell, and builds on it a rich set of dynamic display capabilities.

As well as executing Python/IPython statements, the notebook allows the user to include formatted text, static and dynamic visualizations, mathematical equations, JavaScript widgets, and much more.

#### Launching the Jupyter Notebook

<code>$ jupyter-notebook</code>

#### Shell Commands in IPython and Jupyter notebook

Any command that works at the command-line can be used in IPython by prefixing it with the ``!`` character.
For example, the ``ls``, ``pwd``, and ``echo`` commands can be run as follows:

```ipython
In [1]: !ls
myproject.txt

In [2]: !pwd
/home/jake/projects/myproject

In [3]: !echo "printing from the shell"
printing from the shell
```

#### Magic commands in IPython and Jupyter notebook

##### Running External Code: ``%run``
As you begin developing more extensive code, you will likely find yourself working in both IPython for interactive exploration, as well as a text editor to store code that you want to reuse.
Rather than running this code in a new window, it can be convenient to run it within your IPython session.
This can be done with the ``%run`` magic.

For example, imagine you've created a ``myscript.py`` file with the following contents:

```python
#-------------------------------------
# file: myscript.py

def square(x):
    """square a number"""
    return x ** 2

for N in range(1, 4):
    print(N, "squared is", square(N))
```

You can execute this from your IPython session as follows:

```ipython
In [6]: %run myscript.py
1 squared is 1
2 squared is 4
3 squared is 9
```

Note also that after you've run this script, any functions defined within it are available for use in your IPython session:

```ipython
In [7]: square(5)
Out[7]: 25
```

##### Help on Magic Functions: ``?``, ``%magic``, and ``%lsmagic``

Like normal Python functions, IPython magic functions have docstrings, and this useful
documentation can be accessed in the standard manner.
So, for example, to read the documentation of the ``%timeit`` magic simply type this:

```ipython
In [10]: %timeit?
```

Documentation for other functions can be accessed similarly.
To access a general description of available magic functions, including some examples, you can type this:

```ipython
In [11]: %magic
```

For a quick and simple list of all available magic functions, type this:

```ipython
In [12]: %lsmagic
```

#### Profiling and Timing Code

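IPython provides access to a wide array of functionality for timing and profiling code:

- ``%time``: Time the execution of a single statement
- ``%timeit``: Time repeated execution of a single statement for more accuracy
- ``%prun``: Run code with the profiler
- ``%lprun``: Run code with the line-by-line profiler
- ``%memit``: Measure the memory use of a single statement
- ``%mprun``: Run code with the line-by-line memory profiler

The last three are not built into IPython: ``%lprun`` comes from the ``line_profiler`` package, and ``%memit`` and ``%mprun`` from the ``memory_profiler`` package. They must be installed and then loaded with ``%load_ext line_profiler`` or ``%load_ext memory_profiler`` before use.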

##### Timing Code : ``%timeit`` and ``%time``

In [None]:
%timeit sum(range(100))

Note that because this operation is so fast, ``%timeit`` automatically does a large number of repetitions.
For slower commands, ``%timeit`` will automatically adjust and perform fewer repetitions:

In [None]:
%%timeit
total = 0
for i in range(1000):
    for j in range(1000):
        total += i * (-1) ** j

Sometimes repeating an operation is not the best option.
For example, if we have a list that we'd like to sort, we might be misled by a repeated operation.
Sorting a pre-sorted list is much faster than sorting an unsorted list, so the repetition will skew the result:

In [None]:
import random
L = [random.random() for i in range(100000)]
%timeit L.sort()

For this, the ``%time`` magic function may be a better choice. It also is a good choice for longer-running commands, when short, system-related delays are unlikely to affect the result.
Let's time the sorting of an unsorted and a presorted list:

In [None]:
import random
L = [random.random() for i in range(100000)]
print("sorting an unsorted list:")
%time L.sort()

In [None]:
print("sorting an already sorted list:")
%time L.sort()

For ``%time`` as with ``%timeit``, using the double-percent-sign cell magic syntax allows timing of multiline scripts:

In [2]:
%%time
total = 0
for i in range(1000):
    for j in range(1000):
        total += i * (-1) ** j

CPU times: user 280 ms, sys: 0 ns, total: 280 ms
Wall time: 281 ms
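Outside IPython (for instance in a plain `.py` script), the standard-library `timeit` and `time` modules offer a rough equivalent of `%timeit` and `%time`. A minimal sketch reusing the statements timed above; the numbers of loops and repeats are arbitrary choices, not values taken from the magics:

```python
import random
import time
import timeit

# %timeit-like: run the statement `number` times and get the total time in seconds
n_loops = 10_000
total = timeit.timeit("sum(range(100))", number=n_loops)
print(f"{total / n_loops * 1e6:.2f} us per loop")

# repeat the whole measurement; the minimum is usually the most stable estimate
runs = timeit.repeat("sum(range(100))", repeat=5, number=n_loops)
print(f"best of 5: {min(runs) / n_loops * 1e6:.2f} us per loop")

# %time-like: time a single run, e.g. sorting an unsorted list exactly once
values = [random.random() for _ in range(100_000)]
start = time.perf_counter()
values.sort()
elapsed = time.perf_counter() - start
print(f"single sort of an unsorted list: {elapsed * 1e3:.2f} ms")
```

Timing the sort only once avoids the pitfall described above, where later repetitions would sort an already sorted list.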


##### Profiling Full Scripts: ``%prun``

Python contains a built-in code profiler (the ``cProfile`` module), but IPython offers a much more convenient way to use it, in the form of the magic function ``%prun``.

By way of example, we'll define a simple function that does some calculations:

In [3]:
def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total

Now we can call ``%prun`` with a function call to see the profiled results:

In [4]:
%prun sum_of_lists(1000000)

 

In the notebook, the output is printed to the pager, and looks something like this:

```
14 function calls in 0.714 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        5    0.599    0.120    0.599    0.120 <ipython-input-19>:4(<listcomp>)
        5    0.064    0.013    0.064    0.013 {built-in method sum}
        1    0.036    0.036    0.699    0.699 <ipython-input-19>:1(sum_of_lists)
        1    0.014    0.014    0.714    0.714 <string>:1(<module>)
        1    0.000    0.000    0.714    0.714 {built-in method exec}
```

The result is a table that indicates, in order of total time on each function call, where the execution is spending the most time. In this case, the bulk of execution time is in the list comprehension inside ``sum_of_lists``.
From here, we could start thinking about what changes we might make to improve the performance in the algorithm.
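In a plain Python script, similar information is available from the standard-library `cProfile` and `pstats` modules, the profiler that `%prun` wraps. A minimal sketch that redefines the `sum_of_lists` function from above so it runs on its own, with a smaller `N` to keep the run short:

```python
import cProfile
import pstats

def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total

# Profile the call explicitly instead of using the %prun magic
profiler = cProfile.Profile()
profiler.enable()
result = sum_of_lists(100_000)
profiler.disable()

# Sort by internal time ("tottime"), like the table above, and show the top 5 entries
stats = pstats.Stats(profiler)
stats.sort_stats("tottime").print_stats(5)
```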

### Package managers : Pip and Conda

- Pip is the Python Packaging Authority’s recommended tool for installing packages from the Python Package Index, PyPI. 

- Conda is a cross platform package and environment manager that installs and manages conda packages from the Anaconda repository. 

#### Install conda

Install Miniconda (minimal installer): https://docs.conda.io/en/latest/miniconda.html

or Anaconda (full distribution, with many packages preinstalled): https://docs.conda.io/en/latest/anaconda.html

#### Use Virtual environments

It's not mandatory, but it's good practice: a virtual environment gives you a dedicated place to install the packages your project needs while leaving your default environment unchanged. To create one, type the following in a terminal (Linux or macOS) or in the Anaconda Prompt (Windows):

<code>$ conda create --name intro-scientific-programming python=3.7</code>

Activate "intro-scientific-programming" environment :

<code>$ source activate intro-scientific-programming (or conda activate intro-scientific-programming)</code>

To exit the environment:

<code>$ source deactivate (or conda deactivate)</code>



#### Package installation


Activate "intro-scientific-programming" environment :

<code> $ source activate intro-scientific-programming (or conda activate intro-scientific-programming)</code>

- Install numpy, pandas, scipy, scikit-learn, matplotlib and jupyter :

<code>$ conda install numpy pandas scipy scikit-learn matplotlib jupyter</code>

- Install PyTorch (the exact command depends on your OS and hardware; follow the instructions on the official PyTorch website: https://pytorch.org/get-started/locally/)

For example, on Linux with conda, Python 3.7 and no GPU:

<code>$ conda install pytorch-cpu torchvision-cpu -c pytorch</code>

#### Check installation and use the notebooks

Activate "intro-scientific-programming" environment :

<code>$ source activate intro-scientific-programming (or conda activate intro-scientific-programming)</code>

Launch jupyter notebook interface:

<code>$ jupyter-notebook</code>

A page should open in your default web browser (if not, go to http://localhost:8888).

Open the installation notebook and run the following cell (Ctrl+Enter). If it runs without errors, you are ready to go.

In [None]:
import numpy as np
import pandas as pd
import scipy
import sklearn
import matplotlib
import torch

print('numpy version : ', np.__version__)
print('pandas version : ', pd.__version__)
print('scipy version : ', scipy.__version__)
print('scikit-learn version : ', sklearn.__version__)
print('matplotlib version : ', matplotlib.__version__)
print('pytorch version : ', torch.__version__)

## Basics of Python

### What is Python?

[Python](http://www.python.org/) is a modern, general-purpose, object-oriented, high-level programming language.

General characteristics of Python:

* **clean and simple language:** Easy-to-read and intuitive code, easy-to-learn minimalistic syntax, maintainability scales well with size of projects.
* **expressive language:** Fewer lines of code, fewer bugs, easier to maintain.

Technical details:

* **dynamically typed:** No need to define the type of variables, function arguments or return types.
* **automatic memory management:** No need to explicitly allocate and deallocate memory for variables and data arrays, so memory leaks are much rarer than with manual memory management.
* **interpreted:** No explicit compilation step. The Python interpreter reads and executes the Python code directly.

Advantages:

* The main advantage is ease of programming, minimizing the time required to develop, debug and maintain the code.
* Well-designed language that encourages many good programming practices:
    * Modular and object-oriented programming, good system for packaging and re-use of code. This often results in more transparent, maintainable and bug-free code.
    * Documentation tightly integrated with the code.
* A large standard library, and a large collection of add-on packages.

Disadvantages:

* Since Python is an interpreted and dynamically typed programming language, the execution of python code can be slow compared to compiled statically typed programming languages, such as C and Fortran. 
* Somewhat decentralized, with different environments, packages and documentation spread out in different places, which can make it harder to get started.

### Module

Most of the functionality in Python is provided by *modules*. The Python Standard Library is a large collection of modules that provides *cross-platform* implementations of common facilities such as access to the operating system, file I/O, string management, network communication, and much more.

* The Python Language Reference: https://docs.python.org/3/reference/index.html
* The Python Standard Library: https://docs.python.org/3/library/

#### Import 

To use a module in a Python program it first has to be imported. A module can be imported using the `import` statement. For example, to import the module `math`, which contains many standard mathematical functions, we can do:

In [None]:
import math

This includes the whole module and makes it available for use later in the program. For example, we can do:

In [None]:
import math

x = math.cos(2 * math.pi)

print(x)

Alternatively, we can choose to import all symbols (functions and variables) of a module into the current namespace (so that we don't need to use the prefix "`math.`" every time we use something from the `math` module):

In [None]:
from math import *

x = cos(2 * pi)

print(x)

This pattern can be very convenient, but in large programs that include many modules it is often a good idea to keep the symbols from each module in their own namespaces, by using the `import math` pattern. This avoids potentially confusing namespace collisions, where a name imported from one module silently shadows a name from another.

As a third alternative, we can choose to import only a few selected symbols from a module by explicitly listing which ones we want to import instead of using the wildcard character `*`:

In [None]:
from math import cos, pi

x = cos(2 * pi)

print(x)

#### Looking at what a module contains, and its documentation

Once a module is imported, we can list the symbols it provides using the `dir` function:

In [None]:
import math

print(dir(math))

Using the function `help`, we can get a description of each function (almost: not every function has a *docstring*, as these descriptions are technically called, but the vast majority of functions are documented this way).

In [None]:
help(math.log)

In [None]:
math.log(10)

Some very useful modules from the Python standard library are `os`, `sys`, `math`, `shutil`, `re`, `subprocess`, `multiprocessing`, `threading`.

Complete lists of the standard modules for Python 2 and Python 3 are available at http://docs.python.org/2/library/ and http://docs.python.org/3/library/, respectively.

### Variables and types

#### Symbol names

Variable names in Python can contain alphanumerical characters `a-z`, `A-Z`, `0-9` and the underscore `_`. Normal variable names start with a letter; a name may also start with an underscore (by convention this signals internal use), but never with a digit.

By convention, variable names start with a lower-case letter, and class names start with a capital letter.

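In addition, there are a number of Python keywords that cannot be used as variable names. In Python 3.7 (the version used in this course), these keywords are:

    False, None, True, and, as, assert, async, await, break, class,
    continue, def, del, elif, else, except, finally, for, from, global,
    if, import, in, is, lambda, nonlocal, not, or, pass, raise, return,
    try, while, with, yield

(`exec` and `print` were keywords in Python 2, but are ordinary functions in Python 3.)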

Note: Be aware of the keyword `lambda`, which could easily be a natural variable name in a scientific program. But being a keyword, it cannot be used as a variable name.
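Rather than memorizing the list, you can query the standard-library `keyword` module, which reports the keywords of the Python version you are running. A short sketch:

```python
import keyword

# kwlist holds every reserved keyword of the running Python version
print(keyword.kwlist)

# iskeyword() tells whether a given name is reserved
print(keyword.iskeyword("lambda"))  # True: cannot be used as a variable name
print(keyword.iskeyword("lam"))     # False: a valid variable name
print(keyword.iskeyword("print"))   # False: print is a function in Python 3
```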

#### Assignment

The assignment operator in Python is `=`. Python is a dynamically typed language, so we do not need to specify the type of a variable when we create one.

Assigning a value to a new variable creates the variable:

In [5]:
# variable assignments
x = 1.0
my_variable = 12.2

Although not explicitly specified, a variable does have a type associated with it. The type is derived from the value that was assigned to it.

In [None]:
type(x)

If we assign a new value to a variable, its type can change.

In [6]:
x = 1
type(x)

int

If we try to use a variable that has not yet been defined, we get a `NameError`:

In [None]:
print(y)

#### Fundamental types

In [None]:
# integers
x = 1
type(x)

In [None]:
# float
x = 1.0
type(x)

In [None]:
# boolean
b1 = True
b2 = False

type(b1)

In [None]:
# complex numbers: note the use of `j` to specify the imaginary part
x = 1.0 - 1.0j
type(x)

In [None]:
print(x)


In [None]:
print(x.real, x.imag)

In [None]:
x = 1.0

# check if the variable x is a float
type(x) is float

In [None]:
# check if the variable x is an int
type(x) is int

We can also use the `isinstance` function for testing types of variables:

In [None]:
isinstance(x, float)
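The two checks are not equivalent: `isinstance` also accepts subclasses and can test against several types at once, whereas `type(x) is float` demands an exact match. A small illustration, relying on the fact that `bool` is a subclass of `int`:

```python
x = 1.0

# isinstance accepts a tuple of types: True if x matches any of them
print(isinstance(x, (int, float)))   # True

# isinstance respects subclasses, unlike an exact `type(...) is` check
print(type(True) is int)             # False: the exact type is bool
print(isinstance(True, int))         # True: bool is a subclass of int
```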

#### Type casting

In [None]:
x = 1.5

print(x, type(x))

In [None]:
x = int(x)

print(x, type(x))

In [None]:
z = complex(x)

print(z, type(z))

In [None]:
# this raises a TypeError: a complex number cannot be converted directly to float
x = float(z)
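Since a complex number cannot be converted to `float` directly, the usual approach is to choose explicitly which real quantity you want, for example the real part or the modulus. A minimal sketch, starting from the same value as above:

```python
z = complex(1)      # (1+0j), as in the cells above

# Extract a real number explicitly instead of calling float(z)
real_part = z.real  # real part, always a float
modulus = abs(z)    # modulus |z|, also a float

print(real_part, type(real_part))  # 1.0 <class 'float'>
print(modulus, type(modulus))      # 1.0 <class 'float'>
```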

### Operators and comparisons

Most operators and comparisons in Python work as one would expect:

* Arithmetic operators `+`, `-`, `*`, `/`, `//` (integer division), `%` (remainder), `**` (power)

In [None]:
1 + 2, 1 - 2, 1 * 2, 1 / 2

In [None]:
1.0 + 2.0, 1.0 - 2.0, 1.0 * 2.0, 1.0 / 2.0

In [7]:
# integer (floor) division
20 // 6

3

In [9]:
# remainder of the integer division (modulo)
20 % 6

2

In [10]:
# Note! The power operator in Python is **, not ^
2 ** 2

4
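Two details are worth knowing. `//` is *floor* division: it rounds toward minus infinity, which matters for negative operands. And `^` is a valid operator with a different meaning (bitwise XOR on integers), so using it by mistake for a power gives a silently wrong result rather than an error:

```python
# Floor division rounds toward minus infinity; % takes the sign of the divisor
print(-7 // 2)                    # -4 (not -3)
print(-7 % 2)                     # 1
print((-7 // 2) * 2 + (-7 % 2))   # -7: a == (a // b) * b + a % b always holds

# ^ is bitwise XOR on integers, not exponentiation
print(2 ^ 3)     # 1  (binary 10 XOR 11 = 01)
print(2 ** 3)    # 8
```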

* The boolean operators are spelled out as the words `and`, `not`, `or`. 

In [11]:
True and False, True or False , not (True)

(False, True, False)

* Comparison operators `>`, `<`, `>=` (greater or equal), `<=` (less or equal), `==` (equality), `is` (identity: both names refer to the same object).

In [None]:
2 > 1, 2 < 1

In [None]:
2 > 2, 2 < 2

In [None]:
2 >= 2, 2 <= 2

In [13]:
# equality
[1,2] == [1,2]

True

In [12]:
# objects identical?
l1 = l2 = [1,2]
l1 is l2

True

In [14]:
# objects identical?
l1 = [1,2]
l2 = [1, 2]
l1 is l2

False

### Containers: strings, lists, tuples, and dictionaries

#### Strings

Strings are the variable type used for storing text.

In [17]:
s = "Hello world"
type(s)

str

In [18]:
# length of the string: the number of characters
len(s)

11

In [None]:
# replace a substring in a string with something else
s2 = s.replace("world", "test")
print(s2)

We can index a character in a string using `[]`:

In [None]:
s[0]

Indexing starts at 0!

We can extract a part of a string using the syntax `[start:stop]`, which extracts characters between index `start` and `stop` -1 (the character at index `stop` is not included):

In [19]:
s[0:5]

'Hello'

In [20]:
s[4:5]

'o'

If we omit either (or both) of `start` or `stop` from `[start:stop]`, the default is the beginning and the end of the string, respectively:

In [None]:
s[:5]

In [None]:
s[6:]

We can also define the step size using the syntax `[start:stop:step]` (the default value for `step` is 1, as we saw above):

In [21]:
s[::1]

'Hello world'

In [22]:
s[::2]

'Hlowrd'
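Indices and the step can also be negative: a negative index counts from the end of the string, and a negative step traverses it backwards. A short example on the same string:

```python
s = "Hello world"

# Negative indices count from the end of the string
print(s[-1])     # d
print(s[-5:])    # world

# A negative step walks backwards, so [::-1] reverses the string
print(s[::-1])   # dlrow olleH
```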

This technique is called *slicing*. Read more about the syntax here: https://docs.python.org/3/library/functions.html#slice

Python has a very rich set of functions for text processing. See for example https://docs.python.org/3/library/string.html for more information.

In [None]:
print("str1", 1.0, False, -1j)  # print converts all its arguments to strings

In [23]:
print("value = %f" % 1.0)       # we can use C-style string formatting

# this formatting creates a string
s2 = "value1 = %.2f. value2 = %d" % (3.1415, 1.5)

print(s2)

# alternative, more intuitive way of formatting a string 
s3 = 'value1 = {0}, value2 = {1}'.format(3.1415, 1.5)

print(s3)

value = 1.000000
value1 = 3.14. value2 = 1
value1 = 3.1415, value2 = 1.5
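Since Python 3.6 (the course environment uses 3.7), *f-strings* offer a third, usually more readable way to format strings: the expressions are written directly inside the braces, with an optional format specification after `:` as in `str.format`. A minimal example:

```python
value1 = 3.1415
value2 = 1.5

# the part after ':' is a format specification, here 2 decimal places
s4 = f"value1 = {value1:.2f}, value2 = {value2}"
print(s4)   # value1 = 3.14, value2 = 1.5
```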


##### Unicode

In Python 3, all strings are Unicode, so you can use different alphabets, accented characters, pictograms, and so on.

In [16]:
unicode_str = "Les échecs (♔♕♖♗♘♙), c'est \u263A"  # French for "Chess is ☺"
print(unicode_str)
japanese_str = 'Some Japanese: ウェブ'
print(japanese_str)
mix_str = '你好 kollha दुनिया'
print('"' + mix_str + '"', 'means: "hello world"')

Les échecs (♔♕♖♗♘♙), c'est ☺
Some Japanese: ウェブ
"你好 kollha दुनिया" means: "hello world"


#### Lists

Lists are very similar to strings, except that each element can be of any type.

The syntax for creating lists in Python is [...]:


In [None]:
l = [10,20,30,40]

print(type(l))
print(l)

We can use the same slicing techniques to manipulate lists as we could use on strings:

In [None]:
print(l)

print(l[1:3])

print(l[::2])

In [None]:
l = [1, 'a', 1.0, 1-1j]

print(l)

Python lists can be inhomogeneous and arbitrarily nested:

In [None]:
nested_list = [1, [2, [3, [4, [5]]]]]

nested_list

Lists play a very important role in Python. For example they are used in loops and other flow control structures (discussed below). There are a number of convenient functions for generating lists of various types, for example the `range` function:

In [None]:
start = 10
stop = 30
step = 2

range(start, stop, step)

In [None]:
# in Python 3, range returns a lazy range object, which can be converted to a list using 'list(...)'.
# (in Python 2, range already returns a list)
list(range(start, stop, step))

In [None]:
list(range(-10, 10))

In [25]:
# convert a string to a list by type casting:
l = list(s)
l

['H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd']

##### Adding, inserting, modifying, and removing elements from lists

In [2]:
# create a new empty list
l = []

# add elements using `append`
l.append(1)
l.append(10)
l.append(100)

print(l)

[1, 10, 100]


 We can modify lists by assigning new values to elements in the list. In technical jargon, lists are *mutable*.

In [27]:
l[1] = 10000
l[2] = 100000000

print(l)

[1, 10000, 100000000]


In [4]:
l[1:3] = [0, 0]

print(l)

[1, 0, 0]


Insert an element at a specific index using `insert`:

In [30]:
l.insert(1, 10000000)
l

[1, 10000000, 0, 0]

Remove the first element with a specific value using `remove`:

In [5]:
l.remove(0)
l

[1, 10000000, 0]
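`remove` works by value. To remove an element by position, you can use the `del` statement or the `pop` method, which also returns the removed element. A small sketch starting from an explicit list so that it runs on its own:

```python
l = [1, 10000000, 0, 0]

# del removes the element at a given index (it also works with slices)
del l[1]
print(l)          # [1, 0, 0]

# pop removes and returns an element; by default the last one
last = l.pop()
print(last, l)    # 0 [1, 0]
```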

#### Tuples

Tuples are like lists, except that they cannot be modified once created, that is they are *immutable*. 

In Python, tuples are created using the syntax `(..., ..., ...)`, or even `..., ...`:

In [6]:
point = (10, 20)

print(point, type(point))

(10, 20) <class 'tuple'>


If we try to assign a new value to an element in a tuple we get an error:

In [None]:
point[0] = 20

#### Dictionaries

Dictionaries are also like lists, except that each element is a key-value pair. The syntax for dictionaries is `{key1 : value1, ...}`:

In [8]:
params = {"parameter1" : 1.0,
          "parameter2" : 2.0,
          "parameter3" : 3.0,}

print(type(params))
print(params)

<class 'dict'>
{'parameter1': 1.0, 'parameter2': 2.0, 'parameter3': 3.0}


Add an element to a dictionary:

In [14]:
params["parameters4"] = 4
params

{'parameter1': 1.0, 'parameter2': 2.0, 'parameter3': 3.0, 'parameters4': 4}

Access keys, values and items:

In [15]:
params.keys()

dict_keys(['parameter1', 'parameter2', 'parameter3', 'parameters4'])

In [16]:
params.values()

dict_values([1.0, 2.0, 3.0, 4])

In [18]:
params.items()

dict_items([('parameter1', 1.0), ('parameter2', 2.0), ('parameter3', 3.0), ('parameters4', 4)])
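Looking up a key that is not in the dictionary raises a `KeyError`. Two common ways to avoid this are to test membership with `in` (which checks the keys) or to use `get` with a default value. A short sketch on a fresh copy of the dictionary above:

```python
params = {"parameter1": 1.0, "parameter2": 2.0, "parameter3": 3.0}

# `in` tests whether a key exists
print("parameter1" in params)   # True
print("parameter5" in params)   # False

# get() returns a default value (None if not given) instead of raising KeyError
print(params.get("parameter5", 0.0))   # 0.0
print(params.get("parameter5"))        # None
```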

### Control flow

#### Conditional statements: if, elif, else
The Python syntax for conditional execution of code uses the keywords `if`, `elif` (else if), `else`:

In [None]:
statement1 = False
statement2 = False

if statement1:
    print("statement1 is True")
    
elif statement2:
    print("statement2 is True")
    
else:
    print("statement1 and statement2 are False")

Here we encounter for the first time a peculiar and unusual aspect of the Python programming language: program blocks are defined by their indentation level.

Compare to the equivalent C code:

    if (statement1)
    {
        printf("statement1 is True\n");
    }
    else if (statement2)
    {
        printf("statement2 is True\n");
    }
    else
    {
        printf("statement1 and statement2 are False\n");
    }

In C, blocks are defined by the enclosing curly brackets `{` and `}`, and the level of indentation (white space before the code statements) does not matter (it is completely optional).

In Python, however, the extent of a code block is defined by its indentation level (by convention four spaces per level; PEP 8 recommends spaces over tabs). This means that we have to be careful to indent our code correctly, or else we will get an `IndentationError`, or code that runs but does not do what we intended.
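Python checks indentation when it parses the code, so a missing indentation is reported before anything executes, as an `IndentationError` (a subclass of `SyntaxError`). A small sketch that compiles a badly indented snippet from a string, so the error can be observed without breaking the notebook; the exact wording of the message varies between Python versions:

```python
# The body of the `if` is not indented: this is rejected at parse time
source = "if True:\nprint('not indented')\n"

try:
    compile(source, "<example>", "exec")
    error_name = None
except IndentationError as err:
    # IndentationError is a subclass of SyntaxError
    error_name = type(err).__name__
    print(error_name, "-", err.msg)
```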

#### Loops

In Python, loops can be programmed in a number of different ways. The most common is the `for` loop, which is used together with iterable objects, such as lists. The basic syntax is:

##### for loops

In [20]:
for x in [1,2,3]:
    print(x)

1
2
3


In [22]:
for word in ["scientific", "computing", "with", "python"]:
    print(word)

scientific
computing
with
python


The `for` loop iterates over the elements of the supplied list, and executes the containing block once for each element. 

We can also iterate over a `range` object. For example:

In [19]:
for x in range(4):  # by default range starts at 0
    print(x)

0
1
2
3


In [21]:
for x in range(-3,3):
    print(x)

-3
-2
-1
0
1
2


Sometimes it is useful to have access to the indices of the values when iterating over a list. We can use the `enumerate` function for this:

In [23]:
for idx, x in enumerate(["scientific", "computing", "with", "python"]):
    print(idx, x)

0 scientific
1 computing
2 with
3 python


To iterate over key-value pairs of a dictionary:

In [None]:
for key, value in params.items():
    print(key + " = " + str(value))

##### While loops

In [24]:
i = 0

while i < 5:
    print(i)
    
    i = i + 1
    
print("done")

0
1
2
3
4
done


### Functions

A function in Python is defined using the keyword `def`, followed by a function name, a signature within parentheses `()`, and a colon `:`. The following code, with one additional level of indentation, is the function body.

Functions that return a value use the `return` keyword:

In [26]:
def square(x):
    """
    Return the square of x.
    """
    return x ** 2

In [None]:
square(4)

In [27]:
help(square)

Help on function square in module __main__:

square(x)
    Return the square of x.



We can return multiple values from a function using tuples (see above):

In [28]:
def powers(x):
    """
    Return a few powers of x.
    """
    return x ** 2, x ** 3, x ** 4

In [29]:
powers(2)

(4, 8, 16)
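The caller receives a single tuple, which is usually unpacked straight into one variable per value. A minimal sketch, redefining `powers` so that it runs on its own:

```python
def powers(x):
    """Return a few powers of x."""
    return x ** 2, x ** 3, x ** 4

# The returned tuple is unpacked directly into separate variables
x2, x3, x4 = powers(3)
print(x2, x3, x4)   # 9 27 81
```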

### Classes

Classes are the key feature of object-oriented programming. A class is a structure for representing an object and the operations that can be performed on the object.

In Python a class can contain *attributes* (variables) and *methods* (functions).

A class is defined almost like a function, but using the `class` keyword, and the class definition usually contains a number of class method definitions (a function in a class).

* Each class method should have an argument `self` as its first argument. This object is a self-reference.

* Some class method names have special meaning, for example:

    * `__init__`: The name of the method that is invoked when the object is first created.
    * `__str__` : A method that is invoked when a simple string representation of the class is needed, as for example when printed.
    * There are many more, see https://docs.python.org/3/reference/datamodel.html#special-method-names

In [15]:
class Point:
    """
    Simple class for representing a point in a Cartesian coordinate system.
    """
    
    def __init__(self, x, y):
        """
        Create a new Point at x, y.
        """
        self.x = x
        self.y = y
        
    def translate(self, dx, dy):
        """
        Translate the point by dx and dy in the x and y direction.
        """
        self.x += dx
        self.y += dy
        
    def __str__(self):
        return("Point at [%f, %f]" % (self.x, self.y))

To create a new instance of a class:

In [None]:
p1 = Point(0, 0) # this will invoke the __init__ method in the Point class

print(p1)         # this will invoke the __str__ method

To invoke a method on the class instance `p1`:

In [None]:
p2 = Point(1, 1)

p1.translate(0.25, 1.5)

print(p1)
print(p2)

### Exceptions

In Python, errors are managed with a special language construct called "exceptions". When an error occurs, an exception can be raised: it interrupts the normal program flow, and execution falls back to the closest enclosing `try`-`except` statement.

To generate an exception we can use the `raise` statement, which takes an argument that must be an instance of the class `BaseException` or a class derived from it. 

In [30]:
raise Exception("description of the error")

Exception: description of the error

A typical use of exceptions is to abort functions when some error condition occurs, for example:

    def my_function(arguments):
    
        if not verify(arguments):
            raise Exception("Invalid arguments")
        
        # rest of the code goes here

To gracefully catch errors that are generated by functions and class methods, or by the Python interpreter itself, use the `try` and  `except` statements:

    try:
        # normal code goes here
    except:
        # code for error handling goes here
        # this code is not executed unless the code
        # above generated an error

For example:

In [31]:
try:
    print("test")
    # generate an error: the variable test is not defined
    print(test)
except:
    print("Caught an exception")

test
Caught an exception


To get information about the error, we can access the `Exception` class instance that describes the exception by using for example:

    except Exception as e:

In [32]:
try:
    print("test")
    # generate an error: the variable test is not defined
    print(test)
except Exception as e:
    print("Caught an exception:" + str(e))

test
Caught an exception:name 'test' is not defined
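
A runnable version of the `my_function` pattern shown earlier, as a sketch: the function name `safe_sqrt` and the choice of `ValueError` are illustrative assumptions. Catching a specific exception class, rather than using a bare `except:`, avoids silently hiding unrelated errors (such as a misspelled variable name).

```python
import math

def safe_sqrt(x):
    """Return the square root of x; abort with ValueError if x is negative."""
    if x < 0:
        # abort the function: control goes to the closest matching except clause
        raise ValueError("cannot take the square root of a negative number: " + str(x))
    return math.sqrt(x)

results = []
for value in [4, -1, 9]:
    try:
        results.append(safe_sqrt(value))
    except ValueError as e:
        # only ValueError is caught here; any other exception still propagates
        print("Caught an exception: " + str(e))
        results.append(None)

print(results)  # [2.0, None, 3.0]
```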


### Context managers

Context managers allow you to allocate and release resources precisely when you want to. They are used through the `with` statement. Suppose you have two related operations that you’d like to execute as a pair, with a block of code in between: context managers let you do exactly that. For example:

In [24]:
with open('some_file.txt', 'w') as opened_file:
    opened_file.write('Hola!')

The above code opens the file, writes some data to it and then closes it. If an error occurs while writing the data, the file is still closed. The above code is equivalent to:

In [25]:
file = open('some_file.txt', 'w')
try:
    file.write('Hola!')
finally:
    file.close()

Compared with the first example, a lot of boilerplate code is eliminated just by using `with`. The main advantage of the `with` statement is that it ensures the file is closed, however the nested block exits (normally or because of an exception).

A common use case of context managers is locking and unlocking resources and closing opened files.
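
To see the pairing mechanism itself, here is a minimal sketch of a class-based context manager; the class name `Tracked` and its `log` list are hypothetical. `__enter__` runs when the `with` block starts (its return value is bound by `as`), and `__exit__` always runs when the block ends, even if an exception was raised inside it.

```python
class Tracked:
    """Hypothetical context manager that records when a resource is acquired and released."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __enter__(self):
        # acquire the resource; the returned object is bound to the "as" target
        self.log.append("acquire " + self.name)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # release the resource, whether the block exited normally or with an exception
        self.log.append("release " + self.name)
        return False  # do not suppress the exception: let it propagate

log = []
with Tracked("A", log) as res:
    log.append("using " + res.name)

try:
    with Tracked("B", log):
        raise RuntimeError("failure inside the block")
except RuntimeError:
    log.append("error handled")

print(log)
# ['acquire A', 'using A', 'release A', 'acquire B', 'release B', 'error handled']
```

The standard library's `contextlib.contextmanager` decorator offers a shorter way to write the same kind of object.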

## Advanced Python mechanisms

    

### Packing / Unpacking
    

Implicit packing and unpacking

In [34]:
# Tuple packing 
v = (10, 0, 20)
# Tuple unpacking
x, y, z = v

print(x)
print(y)
print(z)

10
0
20


Note: there must be as many variables on the left-hand side of the assignment as there are elements in the iterable being unpacked; otherwise a `ValueError` is raised:

In [None]:
a, b = (2, 3, 4)

In [None]:
a, b, c = (2, 3)

These packing / unpacking mechanisms also allow:

- to swap the values of two variables

In [None]:
a = 2
b = 3
print(a, b)

a, b = b, a
print(a, b)

- to return multiple values from a function

In [35]:
def f(x, y):
    return x+y, x**2, y**2

f(4, 3)

(7, 16, 9)

To complete the *unpacking* functionality, Python has an operator called ***splat*** (also known as the *unpacking operator*), represented by the **`*`** character, not to be confused with multiplication. Used this way, it is a unary operator: it applies to a single object and is placed in front of it.

Let's see the use cases of this operator.

#### Used on the left-hand side of an assignment, to collect several elements during unpacking

In [1]:
L = [0, 1, 2, 3, 4, 5]
first_elt, *b, last_elt = L
print(first_elt, b, last_elt)
print(type(first_elt), type(b), type(last_elt))

0 [1, 2, 3, 4] 5
<class 'int'> <class 'list'> <class 'int'>


In [None]:
a, b, *c = range(10)
print(a, type(a))
print(b, type(b))
print(c, type(c))

In [None]:
*a, (b, *c) = (0, 1, 2, (3, 4, 5))
print(a)
print(b)
print(c)

In [9]:
with open('./test_file.txt') as f:  # the file is closed at the end of the block
    *content, last_line = f          # iterating over a file yields its lines
print(*content)
print("----")
print(last_line)

Hello,
 blabla
 blabla
 blabla
 blibli

----
I love Python



The operator can also be used on the right-hand side of an assignment, inside a tuple, list or set literal, to force the unpacking of iterables:

In [None]:
T = *range(4), # instead of T = tuple(range(4))
print(T, type(T))

(0, 1, 2, 3) <class 'tuple'>

In [None]:
L = [*range(4)] # instead of L = list(range(4))
print(L, type(L))

In [36]:
L1 = (1, 2)
L2 = [3, 4]
L3 = range(5, 7)

allL = [*L1, *L2, *L3]
print(allL)

[1, 2, 3, 4, 5, 6]


#### The double-splat `**` must be used to unpack dictionaries

In [None]:
#unpacking D1 and D2 then packing in D to "concatenate" dictionaries
D1 = {'x': 1}
D2 = {'y': 2, 'z': 3}
D = {**D1, **D2} 
print(D)

# we can also do :
D = {'x': 1, **{'y': 2, 'z': 3}}
print(D)
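
When the unpacked dictionaries share keys, the value unpacked last wins. A short check (the dictionary names `defaults` and `user` are illustrative):

```python
defaults = {'color': 'blue', 'size': 10}
user = {'size': 12}

# keys of `user` override those of `defaults` because it is unpacked last
settings = {**defaults, **user}
print(settings)  # {'color': 'blue', 'size': 12}

# reversing the order reverses the priority
print({**user, **defaults})  # {'size': 10, 'color': 'blue'}
```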

#### The splat **`*`** can be used to unpack an iterable into function arguments

In [10]:
planets = ["earth", "mars", "jupyter"]
print(planets)
print(*planets)

['earth', 'mars', 'jupyter']
earth mars jupyter


In [11]:
def print_elements(elem1, elem2=None, elem3=None):
    print(elem1)
    print(elem2)
    print(elem3)

In [12]:
print_elements(planets)

['earth', 'mars', 'jupyter']
None
None


In [13]:
print_elements(*planets)

earth
mars
jupyter


Other example:

In [None]:
# explicit unpacking of a list into individual positional arguments
# passed to the function. The number of elements of the iterable
# must match the number of parameters defined by the function

def aire_rectangle(a, b):
    return a*b

rec1 = [3, 8]
   
print(aire_rectangle(*rec1))

In [None]:
# this cell raises a TypeError: 3 positional arguments are expected (see the definition below), but only 2 are supplied

def aire_rectangle(a, b, c):
    return a*b

rec1 = [3, 8]
   
print(aire_rectangle(*rec1))

In [None]:
# this cell does not generate an error, parameter c has a default value

def aire_rectangle(a, b, c=2):
    return a*b, c

rec1 = [3, 8]
   
print(aire_rectangle(*rec1))

print(aire_rectangle(*rec1, 4))

#### You can also pass a dictionary as named arguments with the double-splat `**`

In [3]:
# explicit unpacking of a dictionary into individual named (keyword) arguments
# passed to the function. The keys of the dictionary must match the names
# of the parameters defined by the function.


def aire_rectangle(cote1=0, cote2=0):
    return cote1*cote2

rec2 = {'cote1':4, 'cote2':8}

print(aire_rectangle(**rec2))

32


#### Explicit packing with the *splat* operator

In [16]:
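def square_sum(*args):
    # all positional arguments are packed into the tuple args
    total = 0  # "total" rather than "sum", to avoid shadowing the built-in sum()
    for arg in args:
        total = total + arg**2
    return total

print(square_sum(1, 2, 3, 10))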

114


#### Multiple named arguments: **`kwargs`**

In [None]:
def aire_rectangle2(**kwargs):  # the named arguments are packed into kwargs,
                                 # which is a dictionary
    if len(kwargs) == 2:
        result = 1
        for key, value in kwargs.items():
            result *= value
        return result
    else:
        print('Please pass exactly two named parameters')

In [None]:
# A dictionary is created from the named arguments
print(aire_rectangle2(cote1=4, cote2=8))
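
Both mechanisms can be combined in one signature: extra positional arguments are packed into a tuple (`args`) and extra named arguments into a dictionary (`kwargs`). A minimal sketch with a hypothetical `describe` function:

```python
def describe(first, *args, **kwargs):
    # first: required positional parameter
    # args: tuple of the remaining positional arguments
    # kwargs: dict of the named arguments
    return first, args, kwargs

print(describe(1, 2, 3, unit='cm', precise=True))
# (1, (2, 3), {'unit': 'cm', 'precise': True})

# the splat operators also work on the calling side
values = [1, 2, 3]
options = {'unit': 'cm'}
print(describe(*values, **options))
# (1, (2, 3), {'unit': 'cm'})
```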

### Comprehension

#### List comprehensions
~~~ python
a_list = [func(var) for var in iterable]
~~~
> apply the function `func` to `var` for every element `var` of `iterable`

~~~ python
a_list = [func(var) for var in iterable if condition]
~~~
> apply `func` to `var` for every element `var` of `iterable` for which `condition` is True


~~~ python
a_list = [func(var) if condition else anotherfunc(var) for var in iterable]
~~~

> for every element `var` of `iterable`, apply `func` to `var` if `condition` is True, otherwise apply `anotherfunc` to `var`

- the code is more concise
- the code is usually faster, as the timings below show:

In [19]:
%timeit L = [n ** 2 for n in range(10000)]

1.75 ms ± 22.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [20]:
%%timeit
L = []
for n in range(10000):
    L.append(n ** 2)

2.16 ms ± 11.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


A few other examples:

In [None]:
test_liste = [i**2 if i%2 else i**3 for i in range(10)]
type(test_liste), test_liste

In [None]:
mylist = ["this","is","a","test"]
selected = [ X.capitalize() if X.startswith("t") else X.upper() for X in mylist ]
print(selected)

In [None]:
from math import pi
[str(round(pi, i)) for i in range(1, 6)]

In [None]:
# creating a list of tuples
[(x, x**2) for x in range(6)]

In [None]:
[(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]
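
In this last example, the `for` clauses read in the same order as nested loops, and the `if` filter is tested in the innermost one. The equivalent explicit loops, as a check:

```python
# comprehension with two for clauses and a filter
pairs = [(x, y) for x in [1, 2, 3] for y in [3, 1, 4] if x != y]

# the same computation written with nested loops
pairs_loops = []
for x in [1, 2, 3]:          # first for clause = outer loop
    for y in [3, 1, 4]:      # second for clause = inner loop
        if x != y:           # condition tested in the inner loop
            pairs_loops.append((x, y))

print(pairs == pairs_loops)  # True
print(pairs)  # [(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]
```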

A slightly more complex but very useful example: **listing all the files of a directory in Python**:

> The Python language provides the developer with the **`os`** module to interact with the operating system, for instance to list all the files in a directory.

> The **`listdir()`** function returns the names of all the files and directories contained in the directory passed as a parameter (it does not descend into subdirectories). If you want to list only the files, the **`isfile()`** function of the **`os.path`** module checks each entry found. It requires the full path to the file, so we use the **`join()`** function to concatenate the directory with the file name. Together with a list comprehension, the processing fits in a single line.

In [22]:
from os import listdir
from os.path import isfile, join

my_dir="."
files = [f for f in listdir(my_dir) if isfile(join(my_dir, f))]
files

['test_file.txt', 'introduction_python.ipynb']
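
`listdir()` only looks at one level. To walk through subdirectories recursively, the standard library provides `os.walk()`, which yields a `(dirpath, dirnames, filenames)` triple for each directory. A sketch that builds a small temporary tree (the file names `a.txt` and `sub/b.txt` are made up for the demonstration) and compares the two approaches:

```python
import os
import tempfile
from os.path import join, isfile

# a temporary directory tree, only for the demonstration (deleted at the end of the with block)
with tempfile.TemporaryDirectory() as my_dir:
    os.makedirs(join(my_dir, "sub"))
    for path in [join(my_dir, "a.txt"), join(my_dir, "sub", "b.txt")]:
        with open(path, "w") as f:
            f.write("data")

    # listdir() only sees the first level: 'a.txt' and the directory 'sub'
    top_files = [f for f in os.listdir(my_dir) if isfile(join(my_dir, f))]

    # os.walk() yields (dirpath, dirnames, filenames) for my_dir and every subdirectory
    all_files = [os.path.relpath(join(dirpath, name), my_dir)
                 for dirpath, dirnames, filenames in os.walk(my_dir)
                 for name in filenames]

print(top_files)          # ['a.txt']
print(sorted(all_files))  # ['a.txt', 'sub/b.txt'] (the path separator depends on the OS)
```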

#### Dictionary comprehensions

The mechanism is the same as for lists; note the dictionary delimiters: braces instead of brackets, and a colon between each key and its value.

~~~ python
one_dict = {var: func(var) for var in iterable}
~~~

~~~ python
one_dict = {var: func(var) for var in iterable if condition}
~~~

~~~ python
one_dict = {var: func(var) if condition else elsefunc(var) for var in iterable}
~~~
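
Concrete instances of the three forms above, on a small illustrative list of words:

```python
words = ["data", "science", "with", "python"]

# key: the word, value: its length
lengths = {w: len(w) for w in words}

# filter: keep only the words longer than 4 characters
long_words = {w: len(w) for w in words if len(w) > 4}

# conditional value: upper-case the short words, keep the others unchanged
styled = {w: w.upper() if len(w) <= 4 else w for w in words}

print(lengths)     # {'data': 4, 'science': 7, 'with': 4, 'python': 6}
print(long_words)  # {'science': 7, 'python': 6}
print(styled)      # {'data': 'DATA', 'science': 'science', 'with': 'WITH', 'python': 'python'}
```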

In [27]:
data = range(100) # to "simulate" data

print({x : x**2 for x in data})

# same as :

one_dict = dict()
for x in data:
    one_dict[x] = x**2

print(one_dict)

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81, 10: 100, 11: 121, 12: 144, 13: 169, 14: 196, 15: 225, 16: 256, 17: 289, 18: 324, 19: 361, 20: 400, 21: 441, 22: 484, 23: 529, 24: 576, 25: 625, 26: 676, 27: 729, 28: 784, 29: 841, 30: 900, 31: 961, 32: 1024, 33: 1089, 34: 1156, 35: 1225, 36: 1296, 37: 1369, 38: 1444, 39: 1521, 40: 1600, 41: 1681, 42: 1764, 43: 1849, 44: 1936, 45: 2025, 46: 2116, 47: 2209, 48: 2304, 49: 2401, 50: 2500, 51: 2601, 52: 2704, 53: 2809, 54: 2916, 55: 3025, 56: 3136, 57: 3249, 58: 3364, 59: 3481, 60: 3600, 61: 3721, 62: 3844, 63: 3969, 64: 4096, 65: 4225, 66: 4356, 67: 4489, 68: 4624, 69: 4761, 70: 4900, 71: 5041, 72: 5184, 73: 5329, 74: 5476, 75: 5625, 76: 5776, 77: 5929, 78: 6084, 79: 6241, 80: 6400, 81: 6561, 82: 6724, 83: 6889, 84: 7056, 85: 7225, 86: 7396, 87: 7569, 88: 7744, 89: 7921, 90: 8100, 91: 8281, 92: 8464, 93: 8649, 94: 8836, 95: 9025, 96: 9216, 97: 9409, 98: 9604, 99: 9801}
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8:

### Iterables and iterators

**An iterable object is an object that supports iteration, i.e. a collection whose elements can be read one by one.**

To put it simply, it is anything you can loop over with a **`for`** loop (the **`for ... in ...`** construct).

In [1]:
my_iterable = ["a", "small", "list", "of", "words", "on", "wich", "you", "can", "iterate"]

for elem in my_iterable:
    print(elem, end=" ")

a small list of words on wich you can iterate 

The best-known examples are lists, but this applies to all containers, which are iterable by definition (**`tuple`**, **`dict`**, **`set`**, **`str`**), to files, and to the generators that we will see a little further on.

In [16]:
obj = "hi my dear friend"
for x in obj: 
    print(x, end=" ")

h i   m y   d e a r   f r i e n d 

In [17]:
obj = range(1,10)
for x in obj: 
    print(x, end=" ")

1 2 3 4 5 6 7 8 9 

In [18]:
obj = {'a': -2, 'b': 1, 'c': 3, 'd': -4, 'aa': 5, 'bb': 10, 'cc': -8, 'dd': 6}

for x in obj.keys(): 
    print(x, end=" ")
print("\n")  

for x in obj.values(): 
    print(x, end=" ")
print("\n")  
    
for x,y in obj.items(): 
    print(x,y, end=" ")    

a b c d aa bb cc dd 

-2 1 3 -4 5 10 -8 6 

a -2 b 1 c 3 d -4 aa 5 bb 10 cc -8 dd 6 

**Iterators are objects that let you traverse a sequence whose elements are not necessarily known in advance.**
The principle is similar to a cursor placed on the first element, which discovers the elements one by one as it moves along the sequence.


**A Python object is said to be iterable** if it has a **special `__iter__` method** and **the object returned** by this method **has a `__next__` method**.

**The object returned** by the special method `__iter__` is an **iterator**.
**The iterator is the object that moves along the iterable each time its `__next__` method is called.**

For example:

In [19]:
my_iterable = ["a", "small", "list", "of", "words", "on", "wich", "you", "can", "iterate"]


What does the **`__iter__`** method of our iterable return?

In [28]:
my_iterator = my_iterable.__iter__() 
print(my_iterator) # prints the type and memory address of the object

<list_iterator object at 0x7fc7766e6a20>


What does the **`__next__`** method of our iterator return?

In [29]:
my_iterator.__next__()

'a'

The iterator "points" to the first element of the iterable and returns its value.
Again?

In [30]:
print(my_iterator.__next__())
print(my_iterator.__next__())
print(my_iterator.__next__())
print(my_iterator.__next__())
print(my_iterator.__next__())
print(my_iterator.__next__())
print(my_iterator.__next__())
print(my_iterator.__next__())
print(my_iterator.__next__())
print(my_iterator.__next__())


small
list
of
words
on
wich
you
can
iterate


StopIteration: 

=> The **`__next__`** method moves the iterator forward and returns the next element, or raises a **`StopIteration`** exception when the end of the iterable is reached (no more elements to read). 


All Python collections work the same way. This common protocol gives good performance and a uniform way of traversing every kind of collection.

**Note**: Python offers 2 built-in functions, `iter()` and `next()`, that act as an "interface" to simplify the code.

The following instructions are equivalent:

> `my_iterator = iter(my_iterable)` <=> `my_iterator = my_iterable.__iter__()`
>
> `next(my_iterator)` <=> `my_iterator.__next__()`
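
A quick check of these equivalences. `next()` also accepts an optional second argument, a default value returned instead of raising `StopIteration` once the iterator is exhausted:

```python
words = ["a", "small", "list"]

it1 = iter(words)       # equivalent to words.__iter__()
it2 = words.__iter__()  # a second, independent iterator

print(next(it1), it2.__next__())  # a a
print(next(it1), next(it1))       # small list

# it1 is exhausted: with a default value, next() returns it instead of raising StopIteration
print(next(it1, "END"))           # END
```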



##### What does the `for` loop do?

> ~~~ python
> my_iterable = ["a", "small", "list", "of", "words", "that", "one", "can", "browse"]
>
> for elem in my_iterable:
>     print(elem, end=" ")
> ~~~

- The first operation performed by the **`for`** loop is to call the **`iter()`** function to create an **iterator** on the `my_iterable` object.

- It then calls **`next()`** on this iterator, again and again, to move through the iterable and get each item.

- The **`for`** loop ends automatically when it encounters the **`StopIteration`** exception, which it handles silently.

To better understand what is happening, we can rewrite what the **`for`** loop does with a `while` loop:

In [40]:
for elem in my_iterable:
    print(elem)

a
small
list
of
words
on
wich
you
can
iterate


In [41]:
my_iterator = iter(my_iterable)

while True:
    try:
        elem = next(my_iterator)
    except StopIteration:
        break
    
    print(elem)

a
small
list
of
words
on
wich
you
can
iterate


**Question:** What do you observe when you execute the previous cell again after commenting out the initialization of the iterator `my_iterator`?


In [42]:
#my_iterator = iter(my_iterable)

while True:
    try:
        elem = next(my_iterator)
    except StopIteration:
        break
    
    print(elem)

Answer: nothing is printed...

Important!

The reason is that **at the exit of the while loop** (the first one you executed), **the iterator was completely "consumed": all the elements of the iterable were read, and the iterator does not keep the items it has already read.**

It must be re-initialized in order to use it:

In [44]:
my_iterator = iter(my_iterable)

while True:
    try:
        elem = next(my_iterator)
    except StopIteration:
        break
    
    print(elem)

a
small
list
of
words
on
wich
you
can
iterate


Using iterators is an efficient way (small memory footprint) of successively accessing the different elements of an iterable object (string, list, tuple, dictionary, set, etc.): the iterator does not copy these elements, it only keeps a reference to the iterable and its current position.

Compare the size of an iterable and an iterator:

In [47]:
from sys import getsizeof # imports the function that computes the memory size of an object

my_iterable1 = ["a", "small", "list"]

my_iterable2 = ["a", "small", "world"]

my_iterable3 = ["a", "small", "list", "of", "words", "on", "wich", "you", "can", "iterate"]


print(getsizeof([])) # 64
print(getsizeof(my_iterable1)) # 64 + 8*3
print(getsizeof(my_iterable2)) # 64 + 8*3
print(getsizeof(my_iterable3)) # 64 + 8*10

my_iterator1 = iter(my_iterable1)
my_iterator2 = iter(my_iterable2)
my_iterator3 = iter(my_iterable3)

print(getsizeof(my_iterator1))
print(getsizeof(my_iterator2))
print(getsizeof(my_iterator3))


64
88
88
144
56
56
56


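`my_iterable1` and `my_iterable2` have the same size in memory??

Reminder: lists, like any container, do not contain the objects themselves but references to objects!

On the 64-bit CPython version used here, an empty list occupies 64 bytes in memory and each reference occupies 8 bytes (these exact values may vary with the Python version). The iterators, for their part, all have the same small size (56 bytes here), whatever the length of the list.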

#### Creating your own iterable and iterator

##### Iterator
We have just seen that an iterator must have at least 2 methods:

- **`__next__()`**: returns a value (that of the current element read in the iterable, or a value computed from it). This method raises a `StopIteration` exception to signal that all the values have been supplied, or that another stopping criterion has been met.

- **`__iter__()`**: returns `self`, so that the iterator is itself an iterable

The class creating this type of object will look like this:

~~~ python
class MyIterator:
    def __iter__(self):
        return self

    def __next__(self):
        # code that returns the value read, or computes the value to return
        # val = ...
        return val
~~~

##### Iterable
An iterable must have the **`__iter__()`** method, which returns an iterator object. It can also define an **`__init__()`** constructor method to set the properties of our iterable object. The class used to create an iterable can look like this:

~~~ python
class MyIterable:
    def __init__(self, param):
        # a possible constructor method to define the properties of an object of type MyIterable;
        # assume here that our object has a property "param"
        self.param = param

    def __iter__(self):
        return MyIterator(self)
~~~

##### Use
Creating an iterable `my_iterable` of type `MyIterable`, and an iterator on `my_iterable`:

~~~ python
my_iterable = MyIterable(param)
my_iterator = iter(my_iterable) # calls the __iter__() method of MyIterable
~~~

Reading in a while loop:

~~~ python
while True:
    try:
        elem = next(my_iterator) # calls the __next__() method of MyIterator
    except StopIteration:
        break
    print(elem)
~~~

Or in a `for` loop (more "pythonic": `iter()` and `next()` are called implicitly and the exception is handled automatically):

~~~ python
my_iterable = MyIterable(param)
for elem in my_iterable:
    print(elem)
~~~

##### Example
Let's review this through an example to better "visualize" things.

We are going to define a class **`MyRangeSimple`** that will allow us to create a simplified, **`range`**-like iterable object with 3 parameters (*properties*): `start`, `stop` and `step`.

- We will define an **`__init__()`** constructor method to set the `start`, `stop` and `step` properties of our **`MyRangeSimple`** object.

- The **`__iter__()`** method will return a **`MyRangeIterator`** object.

A **`MyRangeIterator`** object will return the numbers of the range `[start, stop]` (both ends included) with a step of `step`. The `start` value is used as the initial current value.

> For simplicity, we will not handle negative steps, and we will not check that `stop > start` or that `stop > start + step`.

In [None]:
class MyRangeSimple:

    def __init__(self, start, stop, step):
        self.stop = stop
        self.step = step
        self.start = start

    def __iter__(self):
        # a new iterator is created at each call
        return MyRangeIterator(self)


class MyRangeIterator:

    def __init__(self, my_range):
        self.current = my_range.start
        self.stop = my_range.stop
        self.step = my_range.step

    def __iter__(self):
        return self

    def __next__(self):

        # stopping condition
        if self.current > self.stop:
            raise StopIteration

        # compute the value to return
        val = self.current
        self.current += self.step
        return val

Create an iterable `my_numbers`:

In [None]:
my_numbers = MyRangeSimple(20,30, 2)
print(my_numbers)

`my_numbers` is an object of type `MyRangeSimple`. We can check that it is an iterable using the `Iterable` abstract base class of the `collections.abc` module:

In [None]:
import collections.abc
if isinstance(my_numbers, collections.abc.Iterable):
    print("it's an iterable :-)")

In [None]:
for number in my_numbers:
    print(number, end=' ')

In [None]:
print(list(my_numbers))

In [None]:
max(my_numbers)
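
Because `MyRangeSimple.__iter__` builds a new iterator at each call, the same object can be traversed as many times as needed, whereas an iterator (whose `__iter__` returns `self`) is consumed after one pass. A self-contained sketch of this difference, using a hypothetical `Countdown` iterable built on the same pattern:

```python
class Countdown:
    """Iterable: each call to iter() returns a fresh CountdownIterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator: keeps the current position and returns itself from __iter__."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current < 1:
            raise StopIteration
        val = self.current
        self.current -= 1
        return val


c = Countdown(3)
print(list(c), list(c))    # [3, 2, 1] [3, 2, 1]  -> the iterable is reusable

it = iter(c)
print(list(it), list(it))  # [3, 2, 1] []  -> the iterator is consumed after one pass
```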

Here is another example that simulates a die: we create an iterator that randomly draws integers between 1 and 6 and stops as soon as a 6 is drawn (the 6 itself is not returned):

In [8]:
#source : GNU/Linux Magazine HS n° 086

from random import randint

class Dice:
    def __iter__(self):
        return self
 
    def __next__(self):
        n = randint(1, 6)
        if n >= 6:
            raise StopIteration
        return n

dice = Dice()
for number in dice:
    print(number)

In [48]:
for number in dice:
    print(number)

5
5
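
For this particular pattern (call a function until it returns a given value), the built-in `iter()` also has a two-argument form, `iter(callable, sentinel)`, which calls `callable` repeatedly and stops as soon as it returns `sentinel`. A sketch of the same dice simulation; `roll` is a hypothetical helper, and the seed only makes the run reproducible:

```python
import random

random.seed(0)  # fixed seed so that the sequence of draws is reproducible

def roll():
    # one throw of a six-sided die
    return random.randint(1, 6)

# iter(callable, sentinel): calls roll() until it returns 6 (the 6 is not produced)
draws = list(iter(roll, 6))
print(draws)
```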


#### The `zip` and `enumerate` functions

##### `zip`
*`zip(iterable1, iterable2, ...)`*

The built-in **`zip()`** function lets you iterate over several iterables at the same time: it **returns an iterator of tuples `(elmt1, elmt2, ...)` built from the elements read in each iterable**:

In [None]:
numbersList = [1, 2, 3]
numbersTuple = ('ONE', 'TWO', 'THREE')

result = zip(numbersList, numbersTuple)
print(result)

Once created, the iterator can be used explicitly by passing it to the built-in `next()` function:

In [None]:
print(next(result))
print(next(result))
print(next(result))

It can be converted directly into a list:

In [None]:
list(result)

An empty list? **Remember: once the end of the sequence is reached, the iterator is consumed; it must be re-created!**

In [None]:
result = zip(numbersList, numbersTuple)
list(result)

More commonly, the iterator is used implicitly in a `for ... in` loop:

In [None]:
# yields each tuple
for elem in zip(numbersList, numbersTuple):
    print(elem)

In [None]:
# this amounts to unpacking the iterator: each tuple is assigned to a variable
elem1, elem2, elem3 = zip(numbersList, numbersTuple)
print(elem1)
print(elem2)
print(elem3)

In [None]:
# we can unpack each tuple in 2 values
for t_elema, t_elemb in zip(numbersList, numbersTuple):
    print(t_elema, t_elemb)

In [None]:
# this amounts to unpacking each tuple into 2 variables
(elem1a, elem1b), (elem2a, elem2b), (elem3a, elem3b) = zip(numbersList, numbersTuple)
print(elem1a, elem1b)
print(elem2a, elem2b) 
print(elem3a, elem3b)
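
Combined with the splat operator, `zip` can also perform the reverse operation and "unzip" a list of pairs: `zip(*pairs)` passes each pair to `zip` as a separate argument. A short sketch:

```python
pairs = [(1, 'ONE'), (2, 'TWO'), (3, 'THREE')]

# zip(*pairs) is the same call as zip((1, 'ONE'), (2, 'TWO'), (3, 'THREE'))
numbers, names = zip(*pairs)

print(numbers)  # (1, 2, 3)
print(names)    # ('ONE', 'TWO', 'THREE')
```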

Example with 3 iterables of different lengths: `zip` stops as soon as the shortest iterable is exhausted.

In [49]:
numbersList = [1, 2, 3]
strList = ['one', 'two']
numbersTuple = ('ONE', 'TWO', 'THREE')

result = zip(numbersList, strList, numbersTuple)
print(list(result))

for elem1, elem2, elem3 in zip(numbersList, strList, numbersTuple):
    print(elem1, elem2, elem3)

[(1, 'one', 'ONE'), (2, 'two', 'TWO')]
1 one ONE
2 two TWO


We can use the `zip_longest` function of the `itertools` module, which takes iterables as arguments (here 2 iterables `p` and `q`) and returns an iterator over the pairs `(p[0], q[0]), (p[1], q[1]), …`, continuing until the longest iterable is exhausted.

If the `p` and `q` iterables do not have the same length, the missing values of the shorter one are replaced by the value of the named parameter `fillvalue` (`None` by default).

In [51]:
from itertools import zip_longest

numbersList = [1, 2, 3]
strList = ['one', 'two']

for item in zip_longest(numbersList, strList, fillvalue='BLANK'):
    print(item)

(1, 'one')
(2, 'two')
(3, 'BLANK')


##### `enumerate`
*`enumerate(iterable, start=0)`*

The built-in `enumerate()` function takes an iterable object as argument and creates an iterator that returns the pairs `(index, elem)` formed of the successive objects `elem` of the iterable and their sequence number `index`.
> `start` (optional): the value from which `enumerate()` starts counting. If omitted, the counter starts at 0.

In [52]:
grocery = ['bread', 'milk', 'butter']
enumerateGrocery = enumerate(grocery)
print(type(enumerateGrocery))

<class 'enumerate'>


In [53]:
print(next(enumerateGrocery))
print(next(enumerateGrocery))
print(next(enumerateGrocery))

(0, 'bread')
(1, 'milk')
(2, 'butter')


It can be converted directly into a list:

In [54]:
list(enumerateGrocery)

[]

The list is empty? **Remember: once the end of the sequence is reached, the iterator is consumed; it must be re-created!**

In [55]:
enumerateGrocery = enumerate(grocery)
list(enumerateGrocery)

[(0, 'bread'), (1, 'milk'), (2, 'butter')]

More commonly, the iterator is used implicitly in a `for ... in` loop:

In [58]:
for i, grocery_elem in enumerate(grocery):
    print(i, grocery_elem)

0 bread
1 milk
2 butter


In [60]:
# example: changing the default start value of the counter
enumerateGrocery = enumerate(grocery, 10)
print(list(enumerateGrocery))

[(10, 'bread'), (11, 'milk'), (12, 'butter')]


###### The `itertools` module
There are other powerful tools for iterating over or manipulating iterables, available in a dedicated module: the **`itertools`** module (`import itertools`).
> we have already met it with the function `zip_longest(p, q, fillvalue=...)`, similar to `zip`, but which goes on until the longest iterable is exhausted instead of stopping at the shortest one.

There are many others: https://docs.python.org/3/library/itertools.html#module-itertools
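
A few of these tools, as a sketch: `count` produces an infinite sequence (so it must be cut, here with `islice`), and `chain` iterates over several iterables one after the other:

```python
from itertools import count, islice, chain

# count(10, 5) yields 10, 15, 20, ... without end; islice keeps only the first 4 values
first_four = list(islice(count(10, 5), 4))
print(first_four)  # [10, 15, 20, 25]

# chain traverses several iterables as if they were a single one
chained = list(chain([1, 2], (3, 4), range(5, 7)))
print(chained)  # [1, 2, 3, 4, 5, 6]
```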

### Generate data on the fly: generators

Very often, when you traverse a sequence, the only value you are interested in is the current one: there is no need to keep the whole list in memory.

In addition, CPython can have trouble releasing the memory used by large lists of integers and floats.

This memory is not necessarily returned to the operating system before the end of the program. If you allocate 100 lists of one million integers, you can end up in a critical situation where this memory is still held by the process, even if the lists are no longer used or you have called the garbage collector.
See examples of discussions on this topic [here](https://stackoverflow.com/questions/9617001/python-garbage-collection-fails) or [there](https://stackoverflow.com/questions/1316767/how-can-i-explicitly-free-memory-in-python).

To work around this problem, many Python functions, such as **`range`**, which returned lists of integers in Python 2, became lazy, "generator-like" objects in Python 3: they produce their values on demand instead of building a whole list.

    $ python2
    >>> range(10)
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

    $ python3
    >>> range(10)
    range(0, 10)
    >>> list(range(10))
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

> In Python 2, you can use `xrange` to get the same behavior as Python 3's `range`.

`range()` actually returns a lazy, iterable object: calling `iter()` on it gives an **iterator** that produces the values one at a time.
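
A quick check of this distinction: `next()` cannot be applied to a `range` object directly, only to the iterator returned by `iter()`; the `range` object itself is never consumed.

```python
r = range(3)

try:
    next(r)  # a range object is iterable, but it is not an iterator
except TypeError as e:
    print("TypeError:", e)  # TypeError: 'range' object is not an iterator

it = iter(r)               # create the iterator explicitly
print(next(it), next(it))  # 0 1

# the range itself is not consumed: it can be traversed again
print(list(r), list(r))    # [0, 1, 2] [0, 1, 2]
```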

Generators are iterators too: they are simply another (more concise) way of implementing an iterator.

> <div style = "background-color: #fcf2f2; border-color: #dfb5b4; border-left: 5px solid #dfb5b4; padding: 0.5em; padding: 8px 35px 8px 14px;">

> There are two ways to create generators:

> - **generator expressions**
> - **generator functions** (which are often simply called "**generators**").


###### Generator expressions *(generators by comprehension)*

Generator expressions are expressions that return iterators (generator objects).
**You can create a simple generator with a comprehension**, as in the example below.
> Note the use of `()` in place of `[]`
> PEP 289 presents them as a high-performance, memory-efficient generalization of list comprehensions.

In [10]:
from sys import getsizeof # imports the function that computes the memory size of an object

# building a list with a comprehension
list1 = [x**2 for x in range(20)]
print(list1)
print("memory size of the list:", getsizeof(list1))

# building a generator with a comprehension
gen1 = (x**2 for x in range(20))
print(gen1)
print("memory size of the generator:", getsizeof(gen1))

sum(gen1)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361]
memory size of the list: 264
<generator object <genexpr> at 0x7fc776fd2fc0>
memory size of the generator: 88


2470

Once you have a generator, call **`next`** to iterate over its elements, or pass it to a sequence constructor (such as `list`) if you really need all the values. The `sum(gen1)` call above has already consumed `gen1`, so we re-create it first:

In [None]:
gen1 = (x**2 for x in range(20))  # re-created: sum() consumed the previous generator
next(gen1)

In [None]:
next(gen1)

In [None]:
print(list(gen1), end=' ') # if the 2 previous code cells have been executed, 2 elements have already been read

We can iterate with a `for` loop:

In [None]:
gen1 = (x**2 for x in range(20))
for x in gen1:
    print(x, end=' ')

**Generators compute the next value on demand, rather than computing all the values at once.**

Just like in list comprehensions, we can use the conditional expression **`if-else`** in a generator expression:

In [None]:
list2 = [x**2 if x%2 == 0 else x**3 for x in range(10)]
print(list2)

gen2 = (x**2 if x%2 == 0 else x**3 for x in range(10))
for x in gen2:
    print(x, end=' ')

In [None]:
print(list2[0])

Contrary to the sequences, the generators are not switchable (the elements are not stored):

In [61]:
# Error: a generator cannot be indexed
print(gen2[0])

TypeError: 'generator' object is not subscriptable

You can, however, pass a generator to the built-in functions that accept any iterable:

In [None]:
list2 = [x**2 if x%2 == 0 else x**3 for x in range(10)]
print(sum(list2))

gen2 = (x**2 if x%2 == 0 else x**3 for x in range(10))
print(sum(gen2))

Unsurprisingly, both give the same result; but which computation is faster?

In [None]:
def f(size):
    mylist = [ val**2/2.0 for val in list(range(size))]
    return sum(mylist)

def g(size):
    mygen  = ( val**2/2.0 for val in range(size) )
    return sum(mygen)

In [None]:
%timeit f(50000)
%timeit g(50000)

The speed difference is small. What about memory usage?

In [62]:
# If this fails, install memory_profiler first!
# conda install memory_profiler   (or: pip install memory_profiler)
%load_ext memory_profiler


ModuleNotFoundError: No module named 'memory_profiler'

In [None]:
# increase the size to make the difference visible

%memit f(100000)
%memit g(100000)

%memit f(1000000)
%memit g(1000000)

The gain is mainly in memory usage, which can matter a lot when processing large volumes of data.
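If `memory_profiler` is not available, the standard-library module `tracemalloc` gives a rough comparison instead. This is a sketch added for illustration: `peak_memory`, `sum_with_list` and `sum_with_generator` are hypothetical names (the last two mirror `f` and `g` above), and `tracemalloc` only counts memory allocated by Python objects, so its figures will not match `%memit`, which measures the whole process.

```python
import tracemalloc


def peak_memory(func, *args):
    """Return the peak memory (bytes) traced while func(*args) runs."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak)
    finally:
        tracemalloc.stop()
    return peak


def sum_with_list(size):
    values = [val**2 / 2.0 for val in range(size)]  # all values stored
    return sum(values)


def sum_with_generator(size):
    values = (val**2 / 2.0 for val in range(size))  # one value at a time
    return sum(values)


size = 100_000
print("list:     ", peak_memory(sum_with_list, size), "bytes")
print("generator:", peak_memory(sum_with_generator, size), "bytes")
```

The exact numbers depend on the Python version, but the list version's peak grows with `size` while the generator version's stays roughly constant.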

###### Generator functions *(using the yield keyword)*

To write more complex generators, Python offers a **partial-return mechanism**, *generator functions*, built with the **`yield`** keyword.

The **`yield`** keyword is a bit like the **`return`** of an ordinary function, except that it does not end the function: it **pauses** it.

The next time a value is requested, execution resumes just after the **`yield`**.

With this keyword, a function can return a value and then be resumed to return another one.

#### Example 1
Here is an example of a generator function:

In [64]:
def a_generator_test_function():
    print("first call")
    yield 'first value returned'
    print("second call")
    yield 'second value returned'
    print("third call")
    yield 'third value returned'

Let's call this function:

In [65]:
a_generator_test_function()

<generator object a_generator_test_function at 0x7fc776692410>

Such a function returns an object of type **`generator`**, which works with the built-in **`next`** function (it implements the `__next__` method: a generator is an iterator!).

The key point here is that calling **`a_generator_test_function()`** does **not** execute the body of `a_generator_test_function`: it only sets up a mechanism, ready to be triggered on request.

Let's assign it to a variable to use it:

In [None]:
a_generator_object = a_generator_test_function()

With each call to **`next`**, execution resumes right after the last **`yield`**:

In [None]:
next(a_generator_object)

In [None]:
next(a_generator_object)

In [None]:
next(a_generator_object)
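Because the cells above print as they run, their output mixes with the returned values. The following sketch records each step in a list instead, which makes the pause/resume order explicit; `events` and `traced_generator` are illustrative names, not part of the original notebook.

```python
events = []  # records which parts of the function body have run


def traced_generator():
    events.append("body started")
    yield 1
    events.append("resumed after first yield")
    yield 2
    events.append("body finished")


gen = traced_generator()
print(events)                  # []: none of the body has run yet
print(next(gen))               # 1
print(events)                  # ['body started']
print(next(gen))               # 2
print(next(gen, "exhausted"))  # exhausted
print(events)
# ['body started', 'resumed after first yield', 'body finished']
```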

#### Example 2
Let's see another example.

In [68]:
def square_generator(n):
    for val in range(n+1):
        yield val**2

# initialize the generator: it can only produce a finite set of values
x = square_generator(4)

In [69]:
next(x), next(x), next(x), next(x), next(x)

(0, 1, 4, 9, 16)

Each `next()` call returns the value of the expression that follows the **`yield`** keyword in the body of the function.
While it is paused, the function keeps its "context", in particular the values of its local variables (here, the variable `val`), so that it can resume cleanly.

In [71]:
# this cell raises an error: only 5 values can be generated!
next(x)

StopIteration: 

The **`next`** function accepts an optional default value, which is returned instead of raising **`StopIteration`** once the iterator is exhausted:

In [72]:
next(x, "that's all!")

"that's all!"

If `f` is a generator, you can request all its values with a `for ... in f` loop (preferably only when the set of values is finite!). The loop itself then triggers the `next` mechanism and catches the end-of-iteration exception (`StopIteration`).
Let's create a new generator and ask it for all its values, the squares of 0 through 10.

In [74]:
x = square_generator(10)

In [75]:
for k in x: 
    print(k, end=' ')  # requests and prints the values

0 1 4 9 16 25 36 49 64 81 100 
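Roughly speaking, the `for` loop above behaves like the following hand-written loop. This is a simplified sketch (the interpreter implements `for` natively), with the illustrative names `iterator` and `collected`; `square_generator` is repeated so that the block runs on its own.

```python
def square_generator(n):
    for val in range(n + 1):
        yield val**2


# Hand-written equivalent of: for k in square_generator(10): ...
iterator = iter(square_generator(10))  # a generator is its own iterator
collected = []
while True:
    try:
        k = next(iterator)    # ask for the next value
    except StopIteration:     # raised once the generator is exhausted
        break                 # the for loop ends silently at this point
    collected.append(k)

print(collected)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```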

You can also use the `list` constructor to collect all the values into a list:

In [76]:
list(square_generator(10))

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

#### Example 3
This third example uses a generator to process a file, extracting the words longer than 3 characters:

In [77]:
def extract_words(file):
    with open(file) as f:
        for line in f:
            for word in line.split():
                if len(word) > 3:
                    yield word

for word in extract_words("./a_file_with_words.txt"):
    print(word)

file
with
words
moon
jupyter


In [78]:
! more a_file_with_words.txt

a
file
with
words
sun
moon
jupyter
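The same filtering can also be written as a single nested generator expression. In this sketch, `extract_words_lazily` is a hypothetical name, and `io.StringIO` stands in for the file (its contents are assumed to match the `more` output above), so the block runs without any file on disk. Unlike `extract_words`, this version does not open or close the file: the caller is responsible for that.

```python
import io


def extract_words_lazily(lines):
    # nested generator expression: produces one word at a time
    return (word
            for line in lines
            for word in line.split()
            if len(word) > 3)


# io.StringIO stands in for a_file_with_words.txt (assumed content)
fake_file = io.StringIO("a\nfile\nwith\nwords\nsun\nmoon\njupyter\n")
print(list(extract_words_lazily(fake_file)))
# ['file', 'with', 'words', 'moon', 'jupyter']
```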


##### Infinite generator

What is especially interesting about generators is that they can produce, one after the other, the successive values of a virtually infinite sequence.

Let's modify our generator as follows:

In [79]:
def infinite_square_generator():
    x, y = 0, 1  # beginning
    while True: # ad vitam aeternam
        yield x**2 # current value
        x, y = y, y+1

In [80]:
f = infinite_square_generator()
[next(f) for k in range(0, 10)]  # first 10 values

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [81]:
[next(f) for k in range(0, 5)]  # next 5 values

[100, 121, 144, 169, 196]
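Rather than calling `next` in a comprehension, the standard `itertools` module provides helpers that take a finite part of an infinite generator: `islice` simply stops asking for values, so the infinite sequence is never built. A minimal sketch, where `squares` is a simplified, hypothetical equivalent of `infinite_square_generator`:

```python
from itertools import islice, takewhile


def squares():
    # simplified equivalent of infinite_square_generator
    n = 0
    while True:
        yield n**2
        n += 1


print(list(islice(squares(), 10)))      # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(list(islice(squares(), 10, 15)))  # [100, 121, 144, 169, 196]

# takewhile stops at the first value that fails the condition
print(list(takewhile(lambda v: v < 50, squares())))
# [0, 1, 4, 9, 16, 25, 36, 49]
```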

###### Chaining generators

This rather simplistic example illustrates that generators can be chained together to form very efficient data processing pipelines!

In [None]:
def integers():
    for i in range(1, 9):
        yield i

def squared(seq):
    for i in seq:
        yield i * i
        
def negated(seq):
    for i in seq:
        yield -i

In [None]:
chain = negated(squared(integers()))

print(list(chain))
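To see why such a pipeline is memory-efficient, the following sketch adds a trace to each stage (the `traced_*` functions and the `trace` list are illustrative additions, not part of the original example). Each value travels through the whole chain before the next one is produced, so no intermediate list is ever built:

```python
trace = []  # records the order in which each stage does its work


def traced_integers(n):
    for i in range(1, n + 1):
        trace.append(f"produce {i}")
        yield i


def traced_squared(seq):
    for i in seq:
        trace.append(f"square {i}")
        yield i * i


def traced_negated(seq):
    for i in seq:
        trace.append(f"negate {i}")
        yield -i


result = list(traced_negated(traced_squared(traced_integers(2))))
print(result)  # [-1, -4]
print(trace)
# ['produce 1', 'square 1', 'negate 1', 'produce 2', 'square 2', 'negate 4']
```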

I invite you to read this [post](http://nvie.com/posts/iterators-vs-generators/), from which the following image is taken:

<img src = "images/relationships.png" style = "width: 600px;" />


## Style guide

source: PEP 8 -- Style Guide for Python Code [ https://peps.python.org/pep-0008/ ]

One of Guido's key insights is that code is read much more often than it is written. The guidelines provided here are intended to improve the readability of code and make it consistent across the wide spectrum of Python code. As PEP 20 says, "Readability counts".

A style guide is about consistency. Consistency with this style guide is important. Consistency within a project is more important. Consistency within one module or function is the most important.

However, know when to be inconsistent -- sometimes style guide recommendations just aren't applicable. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don't hesitate to ask!

In particular: do not break backwards compatibility just to comply with this PEP!

Some other good reasons to ignore a particular guideline:

- When applying the guideline would make the code less readable, even for someone who is used to reading code that follows this PEP.
- To be consistent with surrounding code that also breaks it (maybe for historic reasons) -- although this is also an opportunity to clean up someone else's mess (in true XP style).
- Because the code in question predates the introduction of the guideline and there is no other reason to be modifying that code.
- When the code needs to remain compatible with older versions of Python that don't support the feature recommended by the style guide.


#### Code Layout

##### Indentation

Use 4 spaces per indentation level.

In [None]:
# YES

# Aligned with opening delimiter.
foo = long_function_name(var_one, var_two,
                         var_three, var_four)

# Add 4 spaces (an extra level of indentation) to distinguish arguments from the rest.
def long_function_name(
        var_one, var_two, var_three,
        var_four):
    print(var_one)

# Hanging indents should add a level.
foo = long_function_name(
    var_one, var_two,
    var_three, var_four)

In [None]:
# NO

# Arguments on first line forbidden when not using vertical alignment.
foo = long_function_name(var_one, var_two,
    var_three, var_four)

# Further indentation required as indentation is not distinguishable.
def long_function_name(
    var_one, var_two, var_three,
    var_four):
    print(var_one)

##### Tabs or Spaces?

Spaces are the preferred indentation method.

Tabs should be used solely to remain consistent with code that is already indented with tabs.

**Python 3 disallows mixing the use of tabs and spaces for indentation.**

##### Maximum Line Length

Limit all lines to a maximum of 79 characters.

For flowing long blocks of text with fewer structural restrictions (docstrings or comments), the line length should be limited to 72 characters.

Limiting the required editor window width makes it possible to have several files open side-by-side, and works well when using code review tools that present the two versions in adjacent columns.
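Linters such as `pycodestyle` check line length for you. To make the rule concrete, here is a minimal, hypothetical checker written as a generator function; `long_lines` is an illustrative name, and it simply counts characters per line, ignoring the separate 72-character limit for docstrings and comments.

```python
def long_lines(source, limit=79):
    """Yield (number, length) of each line longer than `limit`."""
    for number, line in enumerate(source.splitlines(), start=1):
        if len(line) > limit:
            yield number, len(line)


sample = "x = 1\n" + "y = '" + "a" * 80 + "'\n"
for number, length in long_lines(sample):
    print(f"line {number}: {length} characters (limit 79)")
```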

##### Should a Line Break Before or After a Binary Operator?

For decades the recommended style was to break after binary operators. But this can hurt readability in two ways: the operators tend to get scattered across different columns on the screen, and each operator is moved away from its operand and onto the previous line. Here, the eye has to do extra work to tell which items are added and which are subtracted:

In [None]:
# No: operators sit far away from their operands
income = (gross_wages +
          taxable_interest +
          (dividends - qualified_dividends) -
          ira_deduction -
          student_loan_interest)

To solve this readability problem, mathematicians and their publishers follow the opposite convention. Donald Knuth explains the traditional rule in his Computers and Typesetting series: "Although formulas within a paragraph always break after binary operations and relations, displayed formulas always break before binary operations" [3].

Following the tradition from mathematics usually results in more readable code:

In [None]:
# Yes: easy to match operators with operands
income = (gross_wages
          + taxable_interest
          + (dividends - qualified_dividends)
          - ira_deduction
          - student_loan_interest)

##### Blank Lines

Surround top-level function and class definitions with two blank lines.

Method definitions inside a class are surrounded by a single blank line.

Extra blank lines may be used (sparingly) to separate groups of related functions. Blank lines may be omitted between a bunch of related one-liners (e.g. a set of dummy implementations).

Use blank lines in functions, sparingly, to indicate logical sections.

##### Source File Encoding

Code in the core Python distribution should always use UTF-8 (or ASCII in Python 2).

Files using ASCII (in Python 2) or UTF-8 (in Python 3) should not have an encoding declaration.

In the standard library, non-default encodings should be used only for test purposes or when a comment or docstring needs to mention an author name that contains non-ASCII characters; otherwise, using \x, \u, \U, or \N escapes is the preferred way to include non-ASCII data in string literals.

For Python 3.0 and beyond, the following policy is prescribed for the standard library (see PEP 3131): All identifiers in the Python standard library MUST use ASCII-only identifiers, and SHOULD use English words wherever feasible (in many cases, abbreviations and technical terms are used which aren't English). In addition, string literals and comments must also be in ASCII. The only exceptions are (a) test cases testing the non-ASCII features, and (b) names of authors. Authors whose names are not based on the Latin alphabet (latin-1, ISO/IEC 8859-1 character set) MUST provide a transliteration of their names in this character set.

Open source projects with a global audience are encouraged to adopt a similar policy.

#### Imports

Imports should usually be on separate lines:

In [None]:
# Yes: 
import os
import sys

# No:  
import sys, os

It's okay to say this though:

In [None]:
from subprocess import Popen, PIPE

Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.
Imports should be grouped in the following order:
- Standard library imports.
- Related third party imports.
- Local application/library specific imports.

You should put a blank line between each group of imports.

Absolute imports are recommended, as they are usually more readable and tend to be better behaved (or at least give better error messages) if the import system is incorrectly configured (such as when a directory inside a package ends up on sys.path):

In [None]:
import mypkg.sibling
from mypkg import sibling
from mypkg.sibling import example

However, explicit relative imports are an acceptable alternative to absolute imports, especially when dealing with complex package layouts where using absolute imports would be unnecessarily verbose:

In [None]:
from . import sibling
from .sibling import example

Standard library code should avoid complex package layouts and always use absolute imports.
Implicit relative imports should never be used and have been removed in Python 3.
When importing a class from a class-containing module, it's usually okay to spell this:

In [None]:
from myclass import MyClass
from foo.bar.yourclass import YourClass

If this spelling causes local name clashes, then spell them explicitly:

In [None]:
import myclass
import foo.bar.yourclass

and use "myclass.MyClass" and "foo.bar.yourclass.YourClass".

Wildcard imports (from <module> import *) should be avoided, as they make it unclear which names are present in the namespace, confusing both readers and many automated tools. There is one defensible use case for a wildcard import, which is to republish an internal interface as part of a public API (for example, overwriting a pure Python implementation of an interface with the definitions from an optional accelerator module and exactly which definitions will be overwritten isn't known in advance).

When republishing names this way, the guidelines below regarding public and internal interfaces still apply.

#### Module Level Dunder Names

Module level "dunders" (i.e. names with two leading and two trailing underscores) such as __all__, __author__, __version__, etc. should be placed after the module docstring but before any import statements except from __future__ imports. Python mandates that future-imports must appear in the module before any other code except docstrings:

In [1]:
"""This is the example module.

This module does stuff.
"""

from __future__ import barry_as_FLUFL

__all__ = ['a', 'b', 'c']
__version__ = '0.1'
__author__ = 'Cardinal Biggles'

import os
import sys

### String Quotes

In Python, single-quoted strings and double-quoted strings are the same. This PEP does not make a recommendation for this. Pick a rule and stick to it. When a string contains single or double quote characters, however, use the other one to avoid backslashes in the string. It improves readability.

For triple-quoted strings, always use double quote characters to be consistent with the docstring convention in PEP 257.

### Whitespace in Expressions and Statements

Avoid extraneous whitespace in the following situations:

- Immediately inside parentheses, brackets or braces.

In [None]:
# Yes: spam(ham[1], {eggs: 2})
# No:  spam( ham[ 1 ], { eggs: 2 } )

- Between a trailing comma and a following close parenthesis.

In [None]:
# Yes:
foo = (0,)
# No:
bar = (0, )

- Immediately before a comma, semicolon, or colon:

In [None]:
# Yes:
if x == 4: print(x, y); x, y = y, x
# No:
if x == 4 : print(x , y) ; x , y = y , x

However, in a slice the colon acts like a binary operator, and should have equal amounts on either side (treating it as the operator with the lowest priority). In an extended slice, both colons must have the same amount of spacing applied. Exception: when a slice parameter is omitted, the space is omitted.

In [None]:
# Yes:
ham[1:9], ham[1:9:3], ham[:9:3], ham[1::3], ham[1:9:]
ham[lower:upper], ham[lower:upper:], ham[lower::step]
ham[lower+offset : upper+offset]
ham[: upper_fn(x) : step_fn(x)], ham[:: step_fn(x)]
ham[lower + offset : upper + offset]

In [None]:
# No:
ham[lower + offset:upper + offset]
ham[1: 9], ham[1 :9], ham[1:9 :3]
ham[lower : : upper]
ham[ : upper]

- Immediately before the open parenthesis that starts the argument list of a function call:

In [None]:
# Yes:
spam(1)
# No:
spam (1)

- Immediately before the open parenthesis that starts an indexing or slicing:

In [None]:
# Yes:
dct['key'] = lst[index]
# No:
dct ['key'] = lst [index]

- More than one space around an assignment (or other) operator to align it with another.

In [None]:
# Yes:
x = 1
y = 2
long_variable = 3

# No:
x             = 1
y             = 2
long_variable = 3

#### Other Recommendations

Avoid trailing whitespace anywhere. Because it's usually invisible, it can be confusing: e.g. a backslash followed by a space and a newline does not count as a line continuation marker. Some editors don't preserve it and many projects (like CPython itself) have pre-commit hooks that reject it.

Always surround these binary operators with a single space on either side: assignment (=), augmented assignment (+=, -= etc.), comparisons (==, <, >, !=, <>, <=, >=, in, not in, is, is not), Booleans (and, or, not).

If operators with different priorities are used, consider adding whitespace around the operators with the lowest priority(ies). Use your own judgment; however, never use more than one space, and always have the same amount of whitespace on both sides of a binary operator.

In [None]:
# Yes:
i = i + 1
submitted += 1
x = x*2 - 1
hypot2 = x*x + y*y
c = (a+b) * (a-b)

# No:

i=i+1
submitted +=1
x = x * 2 - 1
hypot2 = x * x + y * y
c = (a + b) * (a - b)

Don't use spaces around the = sign when used to indicate a keyword argument, or when used to indicate a default value for an unannotated function parameter.

In [None]:
# Yes:
def complex(real, imag=0.0):
    return magic(r=real, i=imag)

# No:
def complex(real, imag = 0.0):
    return magic(r = real, i = imag)

Compound statements (multiple statements on the same line) are generally discouraged.

In [None]:
# Yes:
if foo == 'blah':
    do_blah_thing()
do_one()
do_two()
do_three()

# Rather not:
if foo == 'blah': do_blah_thing()
do_one(); do_two(); do_three()

While sometimes it's okay to put an if/for/while with a small body on the same line, never do this for multi-clause statements. Also avoid folding such long lines!

In [None]:
# Rather not:
if foo == 'blah': do_blah_thing()
for x in lst: total += x
while t < 10: t = delay()

# Definitely not:
if foo == 'blah': do_blah_thing()
else: do_non_blah_thing()

try: something()
finally: cleanup()

do_one(); do_two(); do_three(long, argument,
                             list, like, this)

if foo == 'blah': one(); two(); three()

### Comments

Comments that contradict the code are worse than no comments. Always make a priority of keeping the comments up-to-date when the code changes!

Comments should be complete sentences. The first word should be capitalized, unless it is an identifier that begins with a lower case letter (never alter the case of identifiers!).

Block comments generally consist of one or more paragraphs built out of complete sentences, with each sentence ending in a period.

You should use two spaces after a sentence-ending period in multi-sentence comments, except after the final sentence.

When writing English, follow Strunk and White.

Python coders from non-English speaking countries: please write your comments in English, unless you are 120% sure that the code will never be read by people who don't speak your language.

#### Block Comments

Block comments generally apply to some (or all) code that follows them, and are indented to the same level as that code. Each line of a block comment starts with a # and a single space (unless it is indented text inside the comment).

Paragraphs inside a block comment are separated by a line containing a single #.

#### Inline Comments

Use inline comments sparingly.

An inline comment is a comment on the same line as a statement. Inline comments should be separated by at least two spaces from the statement. They should start with a # and a single space.

Inline comments are unnecessary and in fact distracting if they state the obvious. Don't do this:

In [28]:
x = x + 1                 # Increment x

# But sometimes, this is useful:

x = x + 1                 # Compensate for border

#### Documentation Strings

Conventions for writing good documentation strings (a.k.a. "docstrings") are immortalized in PEP 257.

Write docstrings for all public modules, functions, classes, and methods. Docstrings are not necessary for non-public methods, but you should have a comment that describes what the method does. This comment should appear after the def line.

PEP 257 describes good docstring conventions. Note that most importantly, the """ that ends a multiline docstring should be on a line by itself:

In [31]:
"""
Return a foobang

Optional plotz says to frobnicate the bizbaz first.
"""

'\nReturn a foobang\n\nOptional plotz says to frobnicate the bizbaz first.\n'
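In practice a docstring is the first statement of a function, class or module, and Python keeps it at runtime: `help()` displays it and `inspect.getdoc()` returns it with the indentation cleaned up. A minimal sketch with an illustrative `hypotenuse` function:

```python
import inspect


def hypotenuse(a, b):
    """Return the length of the hypotenuse of a right triangle.

    The arguments a and b are the lengths of the two other sides.
    """
    return (a**2 + b**2) ** 0.5


print(hypotenuse(3, 4))  # 5.0
print(inspect.getdoc(hypotenuse))
```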

### Naming Conventions

#### Names to Avoid

Never use the characters 'l' (lowercase letter el), 'O' (uppercase letter oh), or 'I' (uppercase letter eye) as single character variable names.

In some fonts, these characters are indistinguishable from the numerals one and zero. When tempted to use 'l', use 'L' instead.

#### ASCII Compatibility

Identifiers used in the standard library must be ASCII compatible as described in the policy section of PEP 3131.

#### Package and Module Names

Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.

#### Class Names

Class names should normally use the CapWords convention.

The naming convention for functions may be used instead in cases where the interface is documented and used primarily as a callable.

#### Exception Names

Because exceptions should be classes, the class naming convention applies here. However, you should use the suffix "Error" on your exception names (if the exception actually is an error).

#### Global Variable Names

(Let's hope that these variables are meant for use inside one module only.) The conventions are about the same as those for functions.

#### Function and Variable Names

Function names should be lowercase, with words separated by underscores as necessary to improve readability.

Variable names follow the same convention as function names.

mixedCase is allowed only in contexts where that's already the prevailing style (e.g. threading.py), to retain backwards compatibility.

#### Function and Method Arguments

Always use self for the first argument to instance methods.

Always use cls for the first argument to class methods.

If a function argument's name clashes with a reserved keyword, it is generally better to append a single trailing underscore rather than use an abbreviation or spelling corruption. Thus class_ is better than clss. (Perhaps better is to avoid such clashes by using a synonym.)

#### Method Names and Instance Variables

Use the function naming rules: lowercase with words separated by underscores as necessary to improve readability.

Use one leading underscore only for non-public methods and instance variables.

To avoid name clashes with subclasses, use two leading underscores to invoke Python's name mangling rules.
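The naming rules above can be seen together in one small, hypothetical class (`TemperatureSensor`, `InvalidTemperatureError` and their members are invented for illustration). The last lines show what name mangling actually does: an attribute written `__calibration` inside the class is stored as `_TemperatureSensor__calibration`.

```python
class InvalidTemperatureError(ValueError):
    """Exception class: CapWords name ending in "Error"."""


class TemperatureSensor:
    """Class name: CapWords."""

    def __init__(self, sensor_id):
        self.sensor_id = sensor_id   # public attribute: lower_case
        self._raw_readings = []      # one leading underscore: non-public
        self.__calibration = 0.5     # two leading underscores: mangled

    def add_reading(self, value):    # method: lower_case_with_underscores
        if value < -273.15:
            raise InvalidTemperatureError(f"impossible value: {value}")
        self._raw_readings.append(value + self.__calibration)

    @classmethod
    def from_label(cls, label):      # class methods take cls first
        return cls(label.strip().lower())

    # class_ (trailing underscore) avoids a clash with the class keyword
    def describe(self, class_="indoor"):
        return f"{self.sensor_id} ({class_})"


sensor = TemperatureSensor.from_label("  Kitchen ")
sensor.add_reading(20.0)
print(sensor.describe())                       # kitchen (indoor)
print(sensor._raw_readings)                    # [20.5]
print(hasattr(sensor, "__calibration"))        # False: the name was mangled
print(sensor._TemperatureSensor__calibration)  # 0.5
```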

### Programming Recommendations

Code should be written in a way that does not disadvantage other implementations of Python (PyPy, Jython, IronPython, Cython, Psyco, and such).

For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn't present at all in implementations that don't use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.
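A minimal sketch of the two approaches (the function names are illustrative). Both return the same string, but only the `str.join` version avoids depending on CPython's in-place concatenation optimization; note that `join` accepts any iterable, including a generator expression.

```python
def build_csv_line_concat(values):
    line = ""
    for value in values:
        # each += may copy the whole string outside CPython's optimization
        line += str(value) + ","
    return line[:-1]  # drop the trailing comma


def build_csv_line_join(values):
    # str.join assembles the result once from all the pieces
    return ",".join(str(value) for value in values)


print(build_csv_line_join([1, 2.5, "x"]))  # 1,2.5,x
```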

Comparisons to singletons like None should always be done with is or is not, never the equality operators.

Also, beware of writing if x when you really mean if x is not None -- e.g. when testing whether a variable or argument that defaults to None was set to some other value. The other value might have a type (such as a container) that could be false in a boolean context!
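Here is a minimal sketch of that pitfall, with illustrative function names: an empty list passed by the caller is falsy, so `if not target` mistakes it for the missing `None` default.

```python
def append_item(item, target=None):
    # correct: only replace the default when no list was passed at all
    if target is None:
        target = []
    target.append(item)
    return target


def append_item_buggy(item, target=None):
    # bug: an empty list from the caller is falsy, so it gets replaced
    if not target:
        target = []
    target.append(item)
    return target


shared = []
append_item(1, shared)
print(shared)  # [1]

other = []
append_item_buggy(1, other)
print(other)   # []: the caller's list was silently ignored
```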

Use is not operator rather than not ... is. While both expressions are functionally identical, the former is more readable and preferred.

~~~python
# Yes:
    if foo is not None:
# No:
    if not foo is None:
~~~

#### Exception

Derive exceptions from Exception rather than BaseException. Direct inheritance from BaseException is reserved for exceptions where catching them is almost always the wrong thing to do.

Design exception hierarchies based on the distinctions that code catching the exceptions is likely to need, rather than the locations where the exceptions are raised. Aim to answer the question "What went wrong?" programmatically, rather than only stating that "A problem occurred" (see PEP 3151 for an example of this lesson being learned for the builtin exception hierarchy)

Class naming conventions apply here, although you should add the suffix "Error" to your exception classes if the exception is an error. Non-error exceptions that are used for non-local flow control or other forms of signaling need no special suffix.

Use exception chaining appropriately. In Python 3, "raise X from Y" should be used to indicate explicit replacement without losing the original traceback.

When deliberately replacing an inner exception (using "raise X" in Python 2 or "raise X from None" in Python 3.3+), ensure that relevant details are transferred to the new exception (such as preserving the attribute name when converting KeyError to AttributeError, or embedding the text of the original exception in the new exception message).
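A minimal sketch of explicit chaining with `raise ... from ...`, using a hypothetical `ConfigError` (derived from `Exception`, with the "Error" suffix) and a hypothetical `read_port` function. Each `try` clause is kept as small as possible, and the original exception remains available as the new exception's `__cause__` attribute:

```python
class ConfigError(Exception):
    """Raised when a configuration setting is missing or invalid."""


def read_port(settings):
    try:
        raw_value = settings["port"]
    except KeyError as exc:
        # keep the KeyError as __cause__ while raising a clearer error
        raise ConfigError("missing 'port' setting") from exc
    try:
        return int(raw_value)
    except ValueError as exc:
        raise ConfigError(f"invalid port value: {raw_value!r}") from exc


print(read_port({"port": "8080"}))  # 8080

try:
    read_port({})
except ConfigError as err:
    print(err)                  # missing 'port' setting
    print(repr(err.__cause__))  # KeyError('port')
```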

When raising an exception in Python 2, use raise ValueError('message') instead of the older form raise ValueError, 'message'.

The latter form is not legal Python 3 syntax.

The paren-using form also means that when the exception arguments are long or include string formatting, you don't need to use line continuation characters thanks to the containing parentheses.

When catching exceptions, mention specific exceptions whenever possible instead of using a bare except: clause:
~~~python
    try:
        import platform_specific_module
    except ImportError:
        platform_specific_module = None
~~~

A bare except: clause will catch SystemExit and KeyboardInterrupt exceptions, making it harder to interrupt a program with Control-C, and can disguise other problems. If you want to catch all exceptions that signal program errors, use except Exception: (bare except is equivalent to except BaseException:).

A good rule of thumb is to limit use of bare 'except' clauses to two cases:

- If the exception handler will be printing out or logging the traceback; at least the user will be aware that an error has occurred.
- If the code needs to do some cleanup work, but then lets the exception propagate upwards with raise. try...finally can be a better way to handle this case.

When binding caught exceptions to a name, prefer the explicit name binding syntax added in Python 2.6:
~~~python
    try:
        process_data()
    except Exception as exc:
        raise DataProcessingFailedError(str(exc))
~~~

This is the only syntax supported in Python 3, and avoids the ambiguity problems associated with the older comma-based syntax.

When catching operating system errors, prefer the explicit exception hierarchy introduced in Python 3.3 over introspection of errno values.

Additionally, for all try/except clauses, limit the try clause to the absolute minimum amount of code necessary. Again, this avoids masking bugs.

~~~python
# Yes:
try:
    value = collection[key]
except KeyError:
    return key_not_found(key)
else:
    return handle_value(value)

# No:
try:
    # Too broad!
    return handle_value(collection[key])
except KeyError:
    # Will also catch KeyError raised by handle_value()
    return key_not_found(key)
~~~

When a resource is local to a particular section of code, use a with statement to ensure it is cleaned up promptly and reliably after use. A try/finally statement is also acceptable.

#### Context manager

Context managers should be invoked through separate functions or methods whenever they do something other than acquire and release resources.

In [None]:
# Yes:
with conn.begin_transaction():
    do_stuff_in_transaction(conn)

# No:
with conn:
    do_stuff_in_transaction(conn)

The latter example doesn't provide any information to indicate that the __enter__ and __exit__ methods are doing something other than closing the connection after a transaction. Being explicit is important in this case.
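One way to follow this guideline is to expose the transaction as a dedicated method. The sketch below uses a hypothetical `Connection` class (not a real database API) whose `begin_transaction` method is built with `contextlib.contextmanager`, which turns a generator function into a context manager: the code before `yield` plays the role of `__enter__`, the code after it that of `__exit__`.

```python
from contextlib import contextmanager


class Connection:
    """Hypothetical connection that only records what it would do."""

    def __init__(self):
        self.log = []

    @contextmanager
    def begin_transaction(self):
        self.log.append("BEGIN")
        try:
            yield self                   # the with block runs here
        except Exception:
            self.log.append("ROLLBACK")  # an error in the block cancels it
            raise
        else:
            self.log.append("COMMIT")    # no error: the changes are kept


conn = Connection()
with conn.begin_transaction():
    conn.log.append("INSERT ...")
print(conn.log)  # ['BEGIN', 'INSERT ...', 'COMMIT']
```

The name `begin_transaction` tells the reader what the `with` block does, which a bare `with conn:` cannot.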

Be consistent in return statements. Either all return statements in a function should return an expression, or none of them should. If any return statement returns an expression, any return statements where no value is returned should explicitly state this as return None, and an explicit return statement should be present at the end of the function (if reachable).

~~~python
import math

# Yes:
def foo(x):
    if x >= 0:
        return math.sqrt(x)
    else:
        return None

def bar(x):
    if x < 0:
        return None
    return math.sqrt(x)

# No:
def foo(x):
    if x >= 0:
        return math.sqrt(x)

def bar(x):
    if x < 0:
        return
    return math.sqrt(x)
~~~

#### string 

Use string methods instead of the string module.

String methods are always much faster and share the same API with unicode strings. Override this rule if backwards compatibility with Pythons older than 2.0 is required.

Use ''.startswith() and ''.endswith() instead of string slicing to check for prefixes or suffixes.

startswith() and endswith() are cleaner and less error prone:
~~~python
    # Yes: 
    if foo.startswith('bar'):
    # No:
    if foo[:3] == 'bar':
~~~

Object type comparisons should always use isinstance() instead of comparing types directly.
~~~python
    # Yes: 
    if isinstance(obj, int):
    # No:  
    if type(obj) is type(1):
~~~
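The difference matters as soon as subclasses are involved. A minimal sketch with a hypothetical `Celsius` subclass of `float`:

```python
class Celsius(float):
    """Hypothetical float subclass that tags a value as a temperature."""


temperature = Celsius(21.5)

print(isinstance(temperature, float))  # True: subclasses are accepted
print(type(temperature) is float)      # False: exact type check fails
print(isinstance(True, int))           # True: bool subclasses int
print(isinstance(3.0, (int, float)))   # True: a tuple means "any of these"
```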
When checking if an object is a string, keep in mind that it might be a unicode string too! In Python 2, str and unicode have a common base class, basestring, so you can do:
~~~python
    if isinstance(obj, basestring):
~~~
Note that in Python 3, unicode and basestring no longer exist (there is only str) and a bytes object is no longer a kind of string (it is a sequence of integers instead).

For sequences, (strings, lists, tuples), use the fact that empty sequences are false.
~~~python
    # Yes:
    if not seq:
    if seq:

    # No:
    if len(seq):
    if not len(seq):
~~~
Don't write string literals that rely on significant trailing whitespace. Such trailing whitespace is visually indistinguishable and some editors (or more recently, reindent.py) will trim them.

Don't compare boolean values to True or False using ==.
~~~python
    # Yes:
    if greeting:
    # No:
    if greeting == True:
    # Worse:
    if greeting is True:
~~~