# Week 11 Further Readings

This week we'll look at some Python constructs that make the language readable, likable and popular. These are often called **_syntactic sugar_** because they make code less verbose and more concise. [Formally](https://en.wikipedia.org/wiki/Syntactic_sugar),
> Syntactic sugar is syntax within a programming language that is designed to make things easier to read or to express. It makes the language "sweeter" for human use: things can be expressed more clearly, more concisely, or in an alternative style that some may prefer.

That means there are equivalent constructs that achieve the same thing but are rarely used because there is a more elegant or succinct way of writing the same code. In part one of this two-part reading series, let us explore a few such constructs. You may already be using some of them in your assignments and other coding exercises without realizing there is a more verbose way of writing them. The goal of discussing these here is two-fold:
1. Introduce you to a few (hopefully) new syntactic sugar constructs that you can adopt.
2. Make you aware of the shorthand notations and their longhand equivalents, so that you can choose between them to keep the code readable. It is important to note that less verbose code is not always more readable.

Let us start with some easy ones.

### Unpacking iterables

Unpacking an iterable means assigning its values to a series of variables one by one. We have used this quite a lot in assignments when working with BLAST results, to read in only the fields of interest and ignore the rest.

In [1]:
blast_hit = ["sp|P09835|UHPB_ECOLI|311-499", "NZ_CP028828.1", 42.408, 191, 105, 3, 1, 188, 621711, 622277, 2.66e-23, 93.6, 189]

#longer version
sseq_id = blast_hit[1]
pident = blast_hit[2]
leng = blast_hit[3]
sstart = blast_hit[8]
send = blast_hit[9]
qlen = blast_hit[12]

print(f'''Subject Sequence ID: {sseq_id}
% of identical positions: {pident}
Length: {leng}
Start of subject sequence alignment: {sstart}
End of subject sequence alignment: {send}
Query sequence length: {qlen}'''
)

Subject Sequence ID: NZ_CP028828.1
% of identical positions: 42.408
Length: 191
Start of subject sequence alignment: 621711
End of subject sequence alignment: 622277
Query sequence length: 189


In [2]:
#using variable unpacking
_, sseq_id, pident, leng, _, _, _, _, sstart, send, _, _, qlen = blast_hit

print(f'''Subject Sequence ID: {sseq_id}
% of identical positions: {pident}
Length: {leng}
Start of subject sequence alignment: {sstart}
End of subject sequence alignment: {send}
Query sequence length: {qlen}'''
)

Subject Sequence ID: NZ_CP028828.1
% of identical positions: 42.408
Length: 191
Start of subject sequence alignment: 621711
End of subject sequence alignment: 622277
Query sequence length: 189


You may remember from class that we can put a `*` in front of a variable to consume multiple values. For example, `*_` below consumes the 4 values between `leng` and `sstart`. However, note that a starred variable can be used only once in an assignment statement, because Python works out how many values `*` should consume from the number of other variables being unpacked.

In [3]:
_, sseq_id, pident, leng, *_, sstart, send, _, _, qlen = blast_hit

print(f'''Subject Sequence ID: {sseq_id}
% of identical positions: {pident}
Length: {leng}
Start of subject sequence alignment: {sstart}
End of subject sequence alignment: {send}
Query sequence length: {qlen}'''
)

Subject Sequence ID: NZ_CP028828.1
% of identical positions: 42.408
Length: 191
Start of subject sequence alignment: 621711
End of subject sequence alignment: 622277
Query sequence length: 189
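
To see what the starred variable actually receives, here is a small sketch. The `fields` list and the variable names are made up for illustration and are not taken from the assignments.

```python
# Hypothetical six-field record, for illustration only
fields = ["query_1", "subject_1", 42.408, 191, 105, 3]

# The starred variable collects everything in between into a list
first, *middle, last = fields
print(first, middle, last)

# It can also collect nothing at all
low, *nothing, high = [1, 2]
print(nothing)

# Two starred variables in one assignment are rejected before the code runs.
# compile() is used here only so that the error can be caught and printed.
star_error = None
try:
    compile("a, *b, *c = [1, 2, 3]", "<demo>", "exec")
except SyntaxError as err:
    star_error = err.msg
    print(f"SyntaxError: {star_error}")
```

The starred variable always ends up as a list, even when it captures zero or one value.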


Note that Python raises a `ValueError` if there are too few or too many variables for the values being unpacked. Even though the demo unpacks a list, the same concept extends to any iterable. When you unpack a dictionary, you get its keys by default.
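
A minimal sketch of these points, using made-up values (`record` and `coords` are illustrative names only):

```python
# Hypothetical three-field record, for illustration only
record = ["NZ_CP028828.1", 42.408, 191]

# Two variables for three values: Python raises ValueError
try:
    sseq_id, pident = record
except ValueError as err:
    print(f"ValueError: {err}")

# Unpacking works for any iterable, not just lists
x, y = (3, 4)                  # tuple
first, second, third = "ATG"   # string: one character per variable
print(x, y, first, second, third)

# Unpacking a dictionary yields its keys (in insertion order)
coords = {"start": 621711, "end": 622277}
key1, key2 = coords
print(key1, key2)

# Use .values() or .items() to unpack the values or the key-value pairs
start, end = coords.values()
print(start, end)
```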

An interesting consequence of variable unpacking, which you might have come across, is swapping two numbers without a third variable:

In [4]:
a = 38
b = 94
print(f"Before swapping a: {a}, b: {b}")

a, b = b, a
print(f"After swapping a: {a}, b: {b}")

Before swapping a: 38, b: 94
After swapping a: 94, b: 38


What happens behind the scenes when you write a statement like `a, b = b, a` is that the comma-separated values on the right side are packed into a tuple, which is then unpacked into the same number of comma-separated target variables on the left side. We commonly see tuples written as `(item1, item2, ...)`, but tuples can also be constructed by separating the items with commas, without explicit round brackets around them. These are called "implicit tuples".
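
The packing and unpacking steps can be spelled out separately, as in the sketch below. The intermediate variable `packed` exists only for illustration; the one-line swap does not need it.

```python
# A tuple is created by the commas, not by the parentheses
point = 3, 4
print(type(point), point)

# The swap, written as two explicit steps
a, b = 38, 94
packed = b, a   # the right-hand side is packed into a tuple
a, b = packed   # the tuple is unpacked into the variables on the left
print(a, b)

# A one-element tuple needs a trailing comma; parentheses alone are not enough
single = 5,
not_a_tuple = (5)
print(type(single), type(not_a_tuple))
```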

### Chained conditions

Just as we chained multiple methods (where the requirement was that one method's return type matches the next method's input type), we can also chain comparison operators. A good example of this is checking if a number falls within a range:

In [5]:
#longer version
a = 15
if a > 10 and a < 20:
    print("a is between 10 and 20")
else:
    print("a lies outside the range 10 and 20")

a is between 10 and 20


Instead, we can write this by chaining operators as below:

In [6]:
#using operator chaining
if 10 < a < 20:
    print("a is between 10 and 20")
else:
    print("a lies outside the range 10 and 20")

a is between 10 and 20


You may remember from assignments 4 and 5 that we checked whether the hits obtained from BLAST for Histidine Kinase domains lie within the range of a BED feature. We used a condition like this:

In [7]:
blast_sstart = 1745
blast_send = 1374
bed_start = 1210
bed_end = 1868

#longer version
if ((blast_sstart > bed_start and blast_send > bed_start) and (blast_sstart <= bed_end and blast_send <= bed_end)):
    print("Hit is within gene boundary")
else:
    print("Hit does not lie within the gene boundary")

Hit is within gene boundary


We can rewrite the same conditions using operator chaining as below which is more compact: 

In [8]:
if (bed_start < blast_sstart <= bed_end and bed_start < blast_send <= bed_end):
    print("Hit is within gene boundary")
else:
    print("Hit does not lie within the gene boundary")

Hit is within gene boundary


You'll find more examples and different ways of chaining operators [here](https://realpython.com/python-boolean/#chaining-comparison-operators).
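
One detail worth knowing is that a chained comparison such as `x < y < z` behaves like `x < y and y < z`, except that the middle expression is evaluated only once. The sketch below makes this visible with a hypothetical helper, `middle()`, that records each time it is called.

```python
calls = []

def middle():
    """Hypothetical helper that records every time it is evaluated."""
    calls.append("called")
    return 15

# Chained form: middle() is evaluated once
chained = 10 < middle() < 20
chained_calls = len(calls)
print(chained, chained_calls)

calls.clear()

# Expanded form: middle() is evaluated twice
expanded = 10 < middle() and middle() < 20
expanded_calls = len(calls)
print(expanded, expanded_calls)

# Chains can mix operators and contain more than two comparisons
a = 15
mixed = 10 < a <= 15 != 20
print(mixed)
```

This matters when the middle value comes from an expensive function call or from something with side effects.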

### Conditional expressions

If you have used other programming languages like C, you might be familiar with a syntax like `condition ? task_if_true : task_if_false`, which is called the ternary operator. It is a shorthand notation for a simple `if`-`else` statement.

In [9]:
#longer version
ch  = 'b'
if ch in "aeiou":
    print("It's a vowel")
else:
    print("It's a consonant")

It's a consonant


Python has a one-line, English-like equivalent of the above `if`-`else` block that plays the same role as the ternary operator. It is called a "conditional expression" and has the form:

`task_if_true if condition else task_if_false`

In [10]:
#using conditional expression
print("It's a vowel") if ch in "aeiou" else print("It's a consonant")

It's a consonant


Further, we can use this syntax for variable assignment:

In [11]:
today = "Sunday"
day = "weekday" if today in ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"] else "weekend"
print(day)

weekend
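
Because a conditional expression produces a value, it can go anywhere a value is expected, such as a function argument or a `return` statement. The sketch below uses a hypothetical helper, `classify()`, and a made-up `score` to show this, along with a chained form that is legal but harder to read.

```python
ch = "b"

# The whole conditional expression is passed to print() as one argument
print("It's a vowel" if ch in "aeiou" else "It's a consonant")

def classify(letter):
    """Illustrative helper: label a single lowercase letter."""
    return "vowel" if letter in "aeiou" else "consonant"

for letter in "ab":
    print(f"{letter} is a {classify(letter)}")

# Conditional expressions can be chained, but readability drops quickly
score = 72
grade = "high" if score >= 80 else "medium" if score >= 50 else "low"
print(grade)
```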


### Assignment expressions

This syntax is not something we see very often, as it was introduced only in Python 3.8. When we write a statement like `a = 10`, we assign the value 10 to the variable `a`; this is called an assignment statement. An assignment expression, in contrast, assigns a value to a variable and also evaluates to that value. Below is a quick demo to understand what that means:

In [12]:
a = 10

(b := 15)

15

In [13]:
#to show a and b are also assigned with the values
print(f"a: {a}, b: {b}")

a: 10, b: 15


Two things to note here:
* when we use an assignment expression, denoted by `:=`, it assigns the value and also returns it. Fun fact: `:=` is also called the "walrus operator" because to some people it looks like the face of a walrus lying on its side!
* using the walrus operator on its own in place of an assignment statement is illegal (as shown below), so we had to add parentheses around it to make it valid Python code. The rationale behind this is stated in [PEP 572](https://peps.python.org/pep-0572/#exceptional-cases), the proposal that introduced the walrus operator:
> This rule is included to simplify the choice for the user between an assignment statement and an assignment expression – there is no syntactic position where both are valid.


In [14]:
c := 20

SyntaxError: invalid syntax (1674197358.py, line 1)

Why do we care about assignment expressions? The basic idea behind them is to save intermediate results, often from expensive calculations, for later use. This could be evaluating a complex mathematical expression or pattern matching using `re` (the standard library module for handling regular expressions). Let us look at examples showing both:

Let us suppose we are trying to evaluate a sum like

$$
\sum_{n=0}^{\infty} \frac{1}{(2n + 1)^2}
$$

We stop the summation as soon as a term is less than 0.000000001. Since it is hard to say beforehand for which value of 'n' the term will drop below 0.000000001, we can write a loop like this:

In [15]:
sum = 0
for n in range(0, 20_000): #we can use _ to separate numbers instead of commas
    if (1 / ( 2 * n + 1)) ** 2 >= 0.000_000_001: 
        sum += (1 / (2 * n + 1)) ** 2
    else:
        break
print(f"n={n}")
print(f"sum={sum}")

n=15811
sum=1.233684738359571


If we look carefully, we are evaluating the expression `(1 / ( 2 * n + 1)) ** 2` twice, once in the `if` condition and once when adding to the sum, which wastes computation. To avoid this, we can store the value of the expression in a variable and use that variable instead.

In [16]:
#longer version
sum = 0
for n in range(0, 20_000): #we can use _ to separate numbers instead of commas
    expr = (1 / ( 2 * n + 1)) ** 2
    if expr >= 0.000_000_001: 
        sum += expr
    else:
        break
print(f"sum={sum}")

sum=1.233684738359571


By taking advantage of the walrus operator, we can rewrite the above code as:

In [17]:
#using walrus operator
sum = 0
for n in range(0, 20_000): #we can use _ to separate numbers instead of commas
    if (expr := (1 / ( 2 * n + 1)) ** 2) >= 0.000_000_001: 
        sum += expr
    else:
        break
print(f"sum={sum}")

sum=1.233684738359571


Notice how we combined the assignment statement and the conditional into one statement using `:=`. To be more adventurous, we can also use a conditional expression to make the code shorter, but readability starts to suffer. This shorter version also no longer breaks out of the loop early: it runs all 20,000 iterations and simply does not add the terms below the threshold. As mentioned at the beginning of the document, concise code can compromise readability, so it is necessary to balance the two.

In [18]:
sum = 0
for n in range(0, 20_000): #we can use _ to separate numbers instead of commas
    sum = sum + expr if (expr := (1 / ( 2 * n + 1)) ** 2) >= 0.000_000_001 else sum
print(f"sum={sum}")

sum=1.233684738359571


Now for the second example, using the `re` module. You may remember that in assignment 3 we extracted gene names from the header lines of HK_domain.faa. You can download this file from the discussion post or copy the existing file to your current working directory to follow along. We'll do the same exercise here in Python:

In [19]:
import re

headers = []
with open("HK_domain.faa") as fin:
    for line in fin:
        if line.startswith(">"):
            headers.append(line.strip("\n"))

In [20]:
#longer version
gene_names = []
for header in headers:
    match = re.search("GN=([a-z]{3}[A-Z])", header)
    if match:
        gene_names.append(match.group(1))

print(gene_names)

['baeS', 'glrK', 'hprS', 'basS', 'zraS', 'phoQ', 'dpiB', 'cusS', 'atoS', 'glnL', 'phoR', 'dcuS', 'evgS', 'narX', 'kdpD', 'rcsC', 'cpxA', 'torS', 'barA', 'envZ', 'cheA', 'arcB', 'qseC', 'rstB', 'creC', 'narQ']


Let us try to understand what the above code is doing: 
* `re.search()` - this function takes a regex pattern and a string to search for the pattern in. It returns a Match object if there is a match, else `None`. We need the `if` conditional to skip the headers where nothing matched.
* `match.group()` - returns one or more subgroups of the match (you may remember that this is similar to capture groups, where we put part of the pattern within round brackets and later use it for back-referencing with `\1`, `\2` etc). If we pass 0 as the argument, or no argument at all, it returns the entire match; 1 returns the first subgroup, and so on. We can also pass multiple arguments, in which case the function returns the corresponding subgroups as a tuple.
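
The behaviour of `group()` is easier to see on a single string. The sketch below uses a made-up header line (not one from HK_domain.faa) and splits the gene name into two capture groups purely for demonstration.

```python
import re

# Hypothetical header line, made up for illustration
header = ">sp|P00000|DEMO_ECOLI Example protein OS=Escherichia coli GN=demA PE=1 SV=1"

# Two capture groups: the three lowercase letters, then the uppercase letter
match = re.search(r"GN=([a-z]{3})([A-Z])", header)

whole = match.group(0)      # the entire matched text
first = match.group(1)      # first capture group
second = match.group(2)     # second capture group
pair = match.group(1, 2)    # several groups at once, returned as a tuple
all_groups = match.groups() # every capture group, as a tuple
print(whole, first, second, pair, all_groups)

# When nothing matches, re.search() returns None
no_match = re.search(r"GN=([a-z]{3})([A-Z])", "no gene name here")
print(no_match)
```

Calling `.group()` on `None` raises an `AttributeError`, which is why the result of `re.search()` is checked before it is used.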

This is a simple example of pattern matching, but as the complexity of the patterns grows and we need to make repetitive matches, it can start to become more computationally expensive.

The `re` module has many more useful tools for working with regexes and pattern matching. You can find them all in [the documentation](https://docs.python.org/3/library/re.html#).

Now back to using walrus operator:

In [21]:
#using walrus operator
gene_names = []
for header in headers:
    if (match := re.search("GN=([a-z]{3}[A-Z])", header)):
        gene_names.append(match.group(1))

print(gene_names)

['baeS', 'glrK', 'hprS', 'basS', 'zraS', 'phoQ', 'dpiB', 'cusS', 'atoS', 'glnL', 'phoR', 'dcuS', 'evgS', 'narX', 'kdpD', 'rcsC', 'cpxA', 'torS', 'barA', 'envZ', 'cheA', 'arcB', 'qseC', 'rstB', 'creC', 'narQ']


An assignment expression can be used in almost any context where expressions are permitted in Python. The most common case is an `if` statement, but it works in a `while` condition as well. However, there are places where you should not use it, including the one we saw earlier: as a replacement for the assignment operator `=`. More such scenarios are listed in [PEP 572](https://peps.python.org/pep-0572/#exceptional-cases).
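
As a rough sketch of the `while` case, the loop below reads a text stream in fixed-size chunks until nothing is left. An in-memory `io.StringIO` object with made-up sequence data stands in for a real file, and the second part shows the same idea inside a list comprehension.

```python
import io

# Stand-in for an open file: an in-memory stream with made-up sequence data
stream = io.StringIO("ATGCGTACGTTAGC")

chunks = []
# read(4) returns up to 4 characters, and an empty string (falsy) at the end
while (chunk := stream.read(4)):
    chunks.append(chunk)
print(chunks)

# In a comprehension, the walrus lets us filter on a computed value and keep it
values = [1, 2, 3, 4]
big_squares = [sq for v in values if (sq := v * v) > 5]
print(big_squares)
```

Without `:=`, the `while` version would need either a `while True:` loop with a `break`, or a read before the loop and another at the end of its body.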

To read more about the walrus operator and its use cases with examples, check out [this blog post](https://realpython.com/python-walrus-operator/).

Hopefully you enjoyed learning some pieces of syntactic sugar supported by Python! We'll explore some more next week.