In [25]:
from IPython.display import HTML
from IPython.display import display

tag = HTML('''
<style>
.advanced-cell {
    background-color: #e84c2250;
}
.advanced-cell::after {
    position: absolute;
    display: block;
    top: -2px;
    right: -2px;
    width: 5px;
    height: calc(100% + 3px);
    content: '';
    background: #e84c22;
}
.advanced-label-row {
    border-bottom: 1px solid #e84c22;
    display: flex;
    font-weight: bold;
}
.advanced-label {
    margin-left: auto;
    background-color: #e84c22;
    padding: 5px 8px;
    color: white;
    margin-right: -2px;
}
</style>
<script>

// A function to hide/show highlight advanced topics in the notebook
var highlighted = false;
function highlight_advanced_topics() {
    $(".advanced-cell").removeClass("advanced-cell");
    $(".advanced-label-row").remove();
    if(highlighted) {
        highlighted = false;
        return;
    }
    var advanced = false;
    $(".jp-Cell.jp-MarkdownCell,.jp-Cell.jp-CodeCell").each(function(){
        if(!advanced) {
            if($(this).find(".advanced-start").length > 0) {
                $(this).before("<div class='advanced-label-row'><span class='advanced-label'>Advanced Topic</span></div>");
                $(this).addClass("advanced-cell");
                advanced = true;
            }        
        } else {
            if($(this).find(".advanced-stop").length > 0) {
                if($(this).find(".advanced-start").length > 0) {
                    $(this).before("<div class='advanced-label-row' style='margin-top: 10px;'><span class='advanced-label'>Advanced Topic</span></div>");
                    $(this).addClass("advanced-cell");
                } else {
                    advanced = false;
                }
            } else {
                $(this).addClass("advanced-cell");
            }
        }
    });
    highlighted = true
}

(function() {
  // Load the script
  const script = document.createElement("script");
  script.src = 'https://ajax.googleapis.com/ajax/libs/jquery/3.6.0/jquery.min.js';
  script.type = 'text/javascript';
  script.addEventListener('load', () => {
    $(document).ready(highlight_advanced_topics);
  });
  document.head.appendChild(script);
})();
</script>
<div class="m-5 p-5"><span class="alert alert-block alert-danger">Advanced topics in notebook are highlighted!</span></div>''')
display(tag)

# Python idioms and conventions

Like any other programming language, Python has its own set of peculiarities and patterns to *express* a certain idea. 
Some tasks you may already be familiar with have a particular syntax or feature specifically dedicated to them within the Python language.

Generally speaking, in programming an *idiom* is a particular way of writing code to perform a specific task, and it always follows the same structure. 

If you are already familiar with the concept of *design patterns*, note that idioms are different. Design patterns are high-level ideas, independent of the language, and they do not (usually) translate immediately into code. Idioms, on the other hand, are about code: they are the way things should be written when we want to perform a particular task in a given language.

As idioms are code, they are language dependent. Every language has its own idioms to perform a given task. Code that follows these idioms is said to be idiomatic. In Python, the code *becomes* Pythonic.

Idiomatic code usually performs better and is more compact, easier to understand, and more consistent.

## Indexes and slices

Let's start with data structures or types that support accessing their elements by index. Like in many other programming languages, the first element has index `0`. 

For example, how would you access the last element of a sequence? One way would be to access the element in the position of the length of the sequence minus one. This works in Python as well, but we can exploit negative indexes to start counting from the *end* of the data structure.

In [3]:
even = [0, 2, 4, 6, 8, 10]
even[len(even)-1] # OK
even[-1] # Pythonic

10

Python provides extra features when we want to access the elements in a different order than usual.
We can access multiple elements using slicing `[start:stop:step]`.
This way we get the elements of the sequence starting from the index `start` (included), up to the index `stop` (excluded), in strides of length `step`. 
Omitting `start` returns the elements from the beginning of the sequence, and omitting `stop` returns them up to the end; omitting `step` defaults to a stride of `1`.

In [5]:
print(even[:3])
print(even[3:])
print(even[::])
print(even[0:4:2])

[0, 2, 4]
[6, 8, 10]
[0, 2, 4, 6, 8, 10]
[0, 4]


If possible, we should always prefer to use this built-in syntax for slices, as opposed to manually trying to iterate over our sequences explicitly inside a loop, excluding the elements by hand.
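
As a small illustration of the point above, the sketch below collects every second element first with an explicit loop and then with a slice. The names `every_other_loop`, `every_other_slice` and `reversed_even` are only illustrative.

```python
even = [0, 2, 4, 6, 8, 10]

# Manual loop: track indexes by hand and skip the elements we do not want
every_other_loop = []
for i in range(len(even)):
    if i % 2 == 0:
        every_other_loop.append(even[i])

# Slice: same result, with the intent visible in a single expression
every_other_slice = even[::2]

# A negative step walks the sequence backwards
reversed_even = even[::-1]

print(every_other_loop)   # [0, 4, 8]
print(every_other_slice)  # [0, 4, 8]
print(reversed_even)      # [10, 8, 6, 4, 2, 0]
```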

We will see more details on this when we tackle `Sequences` as a topic. 

## Comprehensions and assignment expressions

Comprehension expressions are a concise way of writing code, and in general, code written this way tends to be easier to read. The exception is when we have to apply some heavy-weight transformation to the data we're collecting, as a comprehension might then lead to more complicated, harder-to-read code. 

The use of comprehensions is recommended to create data structures in a single operation, instead of multiple ones.

In [11]:
# Without comprehension
numbers = []
for i in range(10):
    numbers.append(pow(i, 3))
    
# With comprehension
numbers = [pow(i, 3) for i in range(10)]

Code written in this form usually performs better because the list is built by a single expression, without looking up and calling `list.append()` at every iteration.
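
One way to check the performance claim on your own machine is a rough measurement with the standard `timeit` module. The sketch below is only indicative: the function names and the sizes are arbitrary choices, and the timings depend on the machine and the Python version, so none are quoted here.

```python
import timeit

def cubes_with_loop(n):
    result = []
    for i in range(n):
        result.append(pow(i, 3))
    return result

def cubes_with_comprehension(n):
    return [pow(i, 3) for i in range(n)]

# Both versions build exactly the same list
assert cubes_with_loop(10) == cubes_with_comprehension(10)

# Run each version 200 times on 1000 elements and report the total time
loop_time = timeit.timeit(lambda: cubes_with_loop(1_000), number=200)
comprehension_time = timeit.timeit(lambda: cubes_with_comprehension(1_000), number=200)
print(f"loop: {loop_time:.4f}s, comprehension: {comprehension_time:.4f}s")
```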

<span class="advanced-start"></span>
### Example - Collecting usernames from email addresses

In [4]:
from typing import Iterable, Set
import re

def collect_usernames_v1(emails: Iterable[str]) -> Set[str]:
    """
    Iterate over a collection of email addresses and store usernames in a set
    """
    usernames = set()
    for email in emails:
        matched = re.match(r'(?P<username>[\w\.]+)@((?P<website>\w+)\.)+(?P<domain>\w+)', email)
        if matched is not None:
            username = matched.groupdict()["username"]
            usernames.add(username)
    return usernames

In [14]:
emails = ["sheev.palpatine@senate.republic.gov", "count.dooku@council.separatists.gov", "obi.wan@jedi", "padme.amidala@senate.republic.gov", "bail.organa@senate.republic.gov", "nute.gunray@trade.federation.com"]
collect_usernames_v1(emails)

{'bail.organa',
 'count.dooku',
 'nute.gunray',
 'padme.amidala',
 'sheev.palpatine'}

We can achieve the same functionality in fewer lines by using comprehension expressions in a way that resembles functional programming.

In [20]:
def collect_usernames_v2(emails: Iterable[str]) -> Set[str]:
    usernames = filter(None, (re.match(r'(?P<username>[\w\.]+)@((?P<website>\w+)\.)+(?P<domain>\w+)', email)
                              for email in emails))
    return {username.groupdict()["username"] for username in usernames}

In [21]:
collect_usernames_v2(emails)

{'bail.organa',
 'count.dooku',
 'nute.gunray',
 'padme.amidala',
 'sheev.palpatine'}

First, we try to match the regular expression against every string provided, and we filter out those that do not produce a match. The result is an iterator that we later use to extract the usernames in a set comprehension.

Since Python 3.8 and the introduction of assignment expressions, we can refactor `collect_usernames_v2`, and such a rewrite actually makes the code far easier to understand.

In [8]:
def collect_usernames_v3(emails: Iterable[str]) -> Set[str]:
    return {
        username.groupdict()["username"] 
        for email in emails
        if (username := re.match(r'(?P<username>[\w\.]+)@((?P<website>\w+)\.)+(?P<domain>\w+)', email)) is not None
    }

In [9]:
collect_usernames_v3(emails)

{'bail.organa',
 'count.dooku',
 'nute.gunray',
 'padme.amidala',
 'sheev.palpatine'}

On the third line of the comprehension we bind a temporary name to the result of applying the regular expression to the string. That name can then be reused elsewhere within the set comprehension, as we do on its first line.

All in all, when we have to decide in which way to write a piece of code, there is always a trade-off between compactness and adherence to conventions with respect to readability. If your code is an unreadable one-liner, the point is moot. Always keep in mind the *KISS* principle ([Keep It Simple, Stupid](https://en.wikipedia.org/wiki/KISS_principle)).

Circling back to assignment expressions: besides comprehensions, they can be very useful when performance matters. If we have to use a function as part of our transformation logic, we don't want to call that function more times than strictly necessary. Assigning the result of the function to a temporary identifier can be an optimization technique that also makes the code more readable.
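
The same idea outside a comprehension can be sketched as follows. The pattern `USERNAME` and the function `username_or_default` are hypothetical, introduced here only for illustration, and the pattern is a simplified version of the one used above.

```python
import re

# Hypothetical pattern, simplified with respect to the one used above
USERNAME = re.compile(r"(?P<username>[\w.]+)@")

def username_or_default(email: str, default: str = "unknown") -> str:
    # Without the assignment expression we would either call USERNAME.match
    # twice or need a separate assignment statement before the `if`
    if (matched := USERNAME.match(email)) is not None:
        return matched["username"]
    return default

print(username_or_default("obi.wan@jedi"))   # obi.wan
print(username_or_default("not-an-email"))   # unknown
```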

<span class="advanced-stop"></span>
## Underscores in Python
Single and double underscores have special meanings in Python variable and method names. Some of those meanings are merely related to conventions and intended as a hint to programmers. Others are enforced by the Python interpreter. We may encounter one of the following instances:
- Single leading underscore: `_x`
- Single trailing underscore: `x_`
- Double leading underscore: `__x`
- Double leading and trailing underscore: `__x__`
- Single underscore: `_`

When it comes to variable and method names, the single underscore prefix has a meaning by convention only. It's a hint to programmers that the Python community agrees upon, indicating that such an object is meant to be private. It does not affect the behavior of your programs.
Keep in mind that Python does not draw a strong distinction between *private* and *public* (unlike C++, Java, etc.), so treat it as a mere suggestion. It does have repercussions when it comes to the `import` statement, but we will see more about that later on in this module.

A single trailing underscore is used by convention to avoid naming conflicts with Python keywords or symbols already defined in the current namespace.
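
A minimal sketch of the trailing underscore convention is shown below. The function `make_tag` and the variable `type_` are made up for this example.

```python
# `class` is a keyword, so it cannot be used as a parameter name
def make_tag(name, class_):
    return f'<{name} class="{class_}"></{name}>'

# `type` is a built-in; a trailing underscore avoids shadowing it
type_ = "warning"

print(make_tag("span", class_="alert"))  # <span class="alert"></span>
print(type(type_).__name__)              # str
```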

In [34]:
name, surname, _, _ = ('Mario', 'Rossi', 'Torino', '01/01/1980')
print(name, surname, _)

Mario Rossi 01/01/1980


<span class="advanced-start"></span>
A double underscore prefix causes the Python interpreter to rewrite the name in order to avoid naming conflicts and collisions in subclasses. This is called *name mangling*. We will revisit this concept in more depth during the third module.
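
As a quick preview, the sketch below shows name mangling at work. The classes `Base` and `Derived` and the attribute `__token` are hypothetical names chosen for this example.

```python
class Base:
    def __init__(self):
        self.__token = "base"      # stored as _Base__token

class Derived(Base):
    def __init__(self):
        super().__init__()
        self.__token = "derived"   # stored as _Derived__token, no collision

d = Derived()
print(sorted(vars(d)))         # ['_Base__token', '_Derived__token']
print(d._Base__token)          # base
print(hasattr(d, "__token"))   # False
```

Both attributes coexist on the same instance because each one was rewritten with the name of the class in which it was defined.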

>Double underscores are often referred to as *dunders* in the Python community. Since they are ubiquitous in the language, dunders is just a convenience name.

Names that have both leading and trailing dunders are reserved for special use in the language. In this case, name mangling is not applied. Methods with these names are often referred to as *magic methods* (or *dunder methods*). They're a core feature in Python and are related to some of the language's internal aspects. We will tackle magic methods at a later stage during this module, while talking about the Python data model.

<span class="advanced-stop"></span>
By convention, a single stand-alone underscore is sometimes used as a name to indicate that a variable is temporary or insignificant. It can also be used as a *don't care* variable in unpacking expressions.

## A few examples of non-Pythonic vs. Pythonic code
One reason for the high readability of Python code is its relatively complete set of code style guidelines and idioms. When we say that some code is not Pythonic, we mean that it does not follow the common guidelines and fails to express its intent in what is considered the best (usually, most readable) way.

Do not stress (too much) about this, but take into account these guidelines and try to stick with them if possible.

### Variable definitions

In [24]:
notPythonicVarName = "lowerCamelCase is not the right choice for variables"

pythonic_var_name = "this is a preferred way to name variables"
a, b = 0, 1
c = "single variable"
name: str = "variable with a type annotation, only works with Python 3.6+"

### Variable exchange

In [40]:
# Rather than this...
tmp = a
a = b
b = tmp
# ...do this
a, b = b, a

### Looping
Try to exploit features of the language rather than manually handle looping over something.

In [None]:
numbers = [1,2,3,4,5]
# Rather than this...
for i in range(0, len(numbers)):
    print(numbers[i])
# ...do this
for n in numbers:
    print(n)

In [None]:
numbers = [1,2,3,4,5]
# Rather than this...
for i in range(len(numbers)):
    print(f'{i}:{numbers[i]}')
# ...do this
for i, n in enumerate(numbers):
    print(f'{i}:{n}')

In [None]:
numbers = [1,2,3,4,5]
# Rather than this...
for i in range(len(numbers)-1, -1, -1):
    print(numbers[i])
# ...do this
for n in reversed(numbers):
    print(n)

In [None]:
x = [1,2,3]
y = ['a', 'b', 'c', 'd']
# Rather than this...
n = min(len(x), len(y))
for i in range(n):
    print(f'{x[i]} --- {y[i]}')
# ...do this
for i, j in zip(x, y):
    print(f'{i} --- {j}')

### Create a homogeneous list of a given length

In [1]:
N = 10
all_zeros = [0] * N

### Create a homogeneous list of a given length containing (empty) lists
Because lists are mutable, the `*` operator would create a list of `N` references to the same list, which is not likely what you want. Use a list comprehension instead.

In [7]:
N = 10
n_lists = [[] for _ in range(N)]
n_lists

[[], [], [], [], [], [], [], [], [], []]
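
To see why the `*` operator is a trap here, the following sketch appends to the first inner list of each variant. The names `shared` and `independent` are illustrative only.

```python
N = 3

shared = [[]] * N                      # N references to one single list
independent = [[] for _ in range(N)]   # N distinct lists

shared[0].append("x")
independent[0].append("x")

print(shared)                             # [['x'], ['x'], ['x']]
print(independent)                        # [['x'], [], []]
print(shared[0] is shared[1])             # True
print(independent[0] is independent[1])   # False
```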


### Creating a string from a list

In [10]:
letters = ['s', 't', 'r', 'i', 'n', 'g']
word = ''.join(letters)

<span class="advanced-start"></span>
### List comprehensions and generators

In [None]:
# This will create the whole list in memory
max_distance_from_origin = max([euclidean_distance(p) for p in points])
# This will use constant space
max_distance_from_origin = max(euclidean_distance(p) for p in points)
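
The cell above assumes that `euclidean_distance` and `points` already exist. A self-contained sketch, with a hypothetical helper and made-up sample data, could look like this:

```python
import math

# Hypothetical helper and data, only to make the comparison runnable
def euclidean_distance(point):
    return math.hypot(*point)

points = [(3, 4), (1, 1), (6, 8)]

# The list comprehension materializes every distance before max() sees them
with_list = max([euclidean_distance(p) for p in points])
# The generator expression hands distances to max() one at a time
with_generator = max(euclidean_distance(p) for p in points)

print(with_list, with_generator)  # 10.0 10.0
```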

### Filtering a list
We should not remove items from a list while iterating over it: the loop skips the element that follows each removed one.

In [23]:
numbers = [1, 2, 3, 4, 5, 6, 7, 8]
for n in numbers:
    if n < 4:
        numbers.remove(n)
        
print(numbers) # Wrong result!

[2, 4, 5, 6, 7, 8]


Once again, use a comprehension instead, keeping the elements we want rather than removing the others.

In [27]:
# Comprehensions create a new list object
numbers = [1, 2, 3, 4, 5, 6, 7, 8]
filtered_numbers = [n for n in numbers if n >= 4]

# Generator expressions don't create another list: values are produced lazily
numbers = [1, 2, 3, 4, 5, 6, 7, 8]
filtered_numbers = (n for n in numbers if n >= 4)

<span class="advanced-stop"></span>
### Modifying the values of a list
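Assignment never copies an object: it only binds a new name to the existing one. If two or more variables refer to the same list, a change made through one of them is visible through all of them. To work on an independent (shallow) copy, use a full slice such as `numbers[:]`.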

In [30]:
numbers = [1, 2, 3, 4, 5, 6, 7, 8]
numbers_yet_again = numbers               
for i, v in enumerate(numbers):
    numbers[i] *= 2
print(numbers, numbers_yet_again)

[2, 4, 6, 8, 10, 12, 14, 16] [2, 4, 6, 8, 10, 12, 14, 16]


In [31]:
numbers = [1, 2, 3, 4, 5, 6, 7, 8]
a_different_copy_of_numbers = numbers[:]              
for i, v in enumerate(numbers):
    numbers[i] *= 2
print(numbers, a_different_copy_of_numbers)

[2, 4, 6, 8, 10, 12, 14, 16] [1, 2, 3, 4, 5, 6, 7, 8]


### Dictionaries

In [None]:
gems = {'sapphire':'blue', 'emerald':'green', 'ruby':'red'}
# This is ok
for key in gems:
    print(f'{key} -> {gems[key]}')
# This is better
for key, val in gems.items():
    print(f'{key} -> {val}')

In [56]:
names = ['sapphire', 'emerald', 'ruby']
colors = ['blue', 'green', 'red']
# This is ok
gems = {}
for name, color in zip(names, colors):
    gems[name] = color
# This is better
gems = dict(zip(names, colors))

### Lookup in a collection

In [14]:
def lookup_collection(item, coll):
    return item in coll

my_list = ['a', 'b', 'c', 'd']
my_set = set(my_list)

In [15]:
lookup_collection('s', my_set) # This happens in constant time on average

False

In [16]:
lookup_collection('s', my_list) # This happens in linear time

False
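
With only four elements the difference is not measurable. A rough way to observe it is to time both lookups on a larger collection with the standard `timeit` module. The data below is hypothetical, and no timings are quoted because they depend on the machine.

```python
import timeit

# Hypothetical data: the looked-up value sits at the end, the worst case for a list
items = list(range(100_000))
items_as_set = set(items)
target = 99_999

# Both containers give the same answer; only the cost differs
assert (target in items) == (target in items_as_set)

list_time = timeit.timeit(lambda: target in items, number=100)
set_time = timeit.timeit(lambda: target in items_as_set, number=100)
print(f"list: {list_time:.5f}s, set: {set_time:.5f}s")
```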

### Explicit code
The most explicit and straightforward manner is usually preferred. Favour readability over *cleverness*.

In [61]:
# Rather than this...
def coordinates(*args):
    x, y = args
    return dict(**locals())
# ...do this
def coordinates(x, y):
    return {'x': x, 'y': y}

### One statement per line

In [None]:
# Rather than this...
print('one'); print('two')
if x == 1: print('one')
if __some__complex__comparison__ and __some__other__complex__comparison__:
    pass
# ...do this
print('one')
print('two')
if x == 1:
    print('one')
cond1 = __some__complex__comparison__
cond2 = __some__other__complex__comparison__
if cond1 and cond2:
    pass

### Check if a variable equals a constant
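There is no need to explicitly compare a value to `True`, `False`, or `0`. Just make sure to remember which values evaluate to `True` or `False` in a boolean context. `None` is the exception: when you specifically need to tell it apart from other falsy values, check for it with `is`.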

In [None]:
# Just check the value...
if x:
    print('x is truthy!')
# ...or check for the opposite...
if not x:
    print('x is falsy!')
# ...or, since None is only one of several falsy values, check for it explicitly
if x is None:
    print('x is None!')
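
The difference between the checks shows up with values that are falsy but not `None`. In the sketch below, `describe` is a hypothetical helper written only for this example.

```python
def describe(x):
    if x is None:
        return "missing"
    if not x:
        return "falsy but present"
    return "truthy"

for value in (None, 0, "", [], 42, "text"):
    print(repr(value), "->", describe(value))
```

Here `0`, `""` and `[]` are all reported as `falsy but present`, while only `None` is reported as `missing`.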

### Line continuations
When a logical line of code is longer than the accepted limit, you need to split it over multiple physical lines. The Python interpreter will join consecutive lines if the last character of the line is a backslash. Remember, though, that any whitespace left after the backslash breaks the continuation and raises a `SyntaxError`.

A better solution is to use parentheses around your elements. The Python interpreter will keep joining lines until the parentheses are closed. The same behavior holds for curly braces and square brackets.

In [32]:
a_very_long_string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, \
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, \
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat."

In [33]:
a_better_very_long_string = (
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit, "
    "sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, "
    "quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat."
)

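Adjacent string literals are concatenated automatically, which is what makes the example above work. Be aware of this feature, because it also kicks in when you did not intend it!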

In [35]:
my_list_of_strings = ['a first string', 'a second string' 'I missed a comma', 'not the fourth string']
for s in my_list_of_strings:
    print(s)

a first string
a second stringI missed a comma
not the fourth string


### More and more examples...
This list of examples could go on and on and on... We will see additional ones in the following lessons and modules, in the proper context and with the proper timing.