
# Clean Code in Python

### Why clean code?

Look at the following function. Is it easy to understand what it does?

In [None]:
def password(p):
    n = 0
    c = 0
    for i in p:
        if i.isdigit():
            n += 1
        if i.isupper():
            c += 1
    return len(p) >= 8 and p.isalnum() and c == 2 and n == 2

[True, True, False, False]


Now look at the following code. The logic is the same: apart from the names and the docstring, the only thing that changed is the variable declaration at the start.

Is it easier to understand?

In [None]:
def is_valid(password):
    """Checks that a password is valid:

     * it has at least 8 characters
     * it contains only letters and digits
     * it contains exactly 2 digits
     * it contains exactly 2 capital letters
    """
    digits, capitals = 0, 0
    for c in password:
        if c.isdigit():
            digits += 1
        if c.isupper():
            capitals += 1
    return (len(password) >= 8 and password.isalnum()
           and digits == 2 and capitals == 2)

## Code Formatting

Pay attention to:
- Indentation
- Lines and line breaks
- Variable and function names
- Comments and docstrings

Above all: be consistent.

Check out [Style Guide for Python Code](https://peps.python.org/pep-0008/) for more details.

In [None]:
def tokenize(s): pass
def is_foreign_word(t): pass

def has_foreign_word(sentence):
  """Check if there's at least one foreign word in a sentence.
  
  sentence: a string that contains the raw (untokenized) sentence.
  Returns True if there's a foreign word in the sentence, and False otherwise.
  """
  for token in tokenize(sentence):
    if is_foreign_word(token):
      return True
  return False


['Hello', 'World']

### Lines and line breaks

Keep your lines under 80 characters (PEP 8 sets the limit at 79). In the cell below, line 11 (the `return` statement) is too long, and so is the `def` line:

In [None]:
def calculate_price(supplies_cost, wage, utilities_rate, time, tax, profit_margin):
  """Calculates the price (in $) of a product per item

   * supplies_cost: total cost of supplies for an item ($)
   * wage: labor cost per hour ($)
   * utilities_rate: cost of utilities (electricity, etc.) per hour ($)
   * time: time to make the item (in hours)
   * tax: tax paid during sale (%)
   * profit_margin: the profit margin for the product (%)
  """
  return (supplies_cost + (wage + utilities_rate) * time) * (1 + tax / 100) * (1 + profit_margin / 100)

calculate_price(20, 10, 1.5, 2, 20, 50)

77.4

Here are some strategies to make lines shorter.

Breaking up the line:

In [None]:
def calculate_price(supplies_cost, wage, utilities_rate,
                    time, tax, profit_margin):
  """Calculates the price (in $) of a product per item

   * supplies_cost: total cost of supplies for an item ($)
   * wage: labor cost per hour ($)
   * utilities_rate: cost of utilities (electricity, etc.) per hour ($)
   * time: time to make the item (in hours)
   * tax: tax paid during sale (%)
   * profit_margin: the profit margin for the product (%)
  """
  production_cost = (wage + utilities_rate) * time
  tax_coeff = (1 + tax / 100)
  profit_margin_coeff = (1 + profit_margin / 100)
  return (supplies_cost + production_cost) * tax_coeff * profit_margin_coeff

calculate_price(20, 10, 1.5, 2, 20, 50)

77.4

Using implicit line continuation:

In [None]:
def calculate_price(supplies_cost, wage, utilities_rate, 
                    time, tax, profit_margin):
  """Calculates the price (in $) of a product per item

   * supplies_cost: total cost of supplies for an item ($)
   * wage: labor cost per hour ($)
   * utilities_rate: cost of utilities (electricity, etc.) per hour ($)
   * time: time to make the item (in hours)
   * tax: tax paid during sale (%)
   * profit_margin: the profit margin for the product (%)
  """
  return ((supplies_cost + (wage + utilities_rate) * time) 
          * (1 + tax / 100) * (1 + profit_margin / 100))

calculate_price(20, 10, 1.5, 2, 20, 50)

Using explicit line continuation (a backslash at the end of the line):

In [None]:
def calculate_price(supplies_cost, wage, utilities_rate, 
                    time, tax, profit_margin):
  """Calculates the price (in $) of a product per item

   * supplies_cost: total cost of supplies for an item ($)
   * wage: labor cost per hour ($)
   * utilities_rate: cost of utilities (electricity, etc.) per hour ($)
   * time: time to make the item (in hours)
   * tax: tax paid during sale (%)
   * profit_margin: the profit margin for the product (%)
  """
  return (supplies_cost + (wage + utilities_rate) * time) \
         * (1 + tax / 100) * (1 + profit_margin / 100)

calculate_price(20, 10, 1.5, 2, 20, 50)

77.4

## Python Lifehacks

### Conditions

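Every value in Python is truthy (it counts as `True` in a condition), except:
- Zeroes (`0`, `0.0`)
- Empty sequences and collections (`""`, `[]`, `()`, `{}`, `set()`)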
- `None`
- `False`

You can check your values with the `bool()` function.
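As a quick check of the rule above, here is a minimal sketch that passes a few values through `bool()`. The names `falsy_values` and `truthy_values` are only illustrative, and the lists are not exhaustive.

```python
# Values that count as False in a condition
falsy_values = [0, 0.0, "", [], {}, set(), None, False]
# Values that count as True, including some that may look "empty"
truthy_values = [1, -1, "0", " ", [0], {"a": None}]

for value in falsy_values + truthy_values:
  print(repr(value), "->", bool(value))

all_falsy = not any(bool(v) for v in falsy_values)
all_truthy = all(bool(v) for v in truthy_values)
```

Note that `"0"` and `" "` are truthy: a string only counts as false when it is empty.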

On a match, `re.match()` returns a match object, which is truthy. When there is no match, it returns `None`, which is falsy. So the condition holds exactly when the regex matches.

In [None]:
import re

s = "hello"

if not s:
  print("The string is empty")

# `$` anchors the pattern at the end; without it, "housing" would match too
if re.match(".*ous$", "studious"):
  print("This word ends with `ous`")

This word ends with `ous`


Booleans can be numbers:
- `True` is 1
- `False` is 0
- `None` is **not** a number

In [None]:
sum([True, False, True])
# 2

sum([True, None, True])
# TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'

The `in` operator is useful in conditions:

In [None]:
word = "better"

if word == "good" or word == "better" or word == "best":
  print("This is a good word")

if word in {"good", "better", "best"}:
  print("This is a good word")

Both `if x not in` and `if not x in` work.

Pick whichever looks most readable to you, but note that linters such as pycodestyle flag `not x in` (E713) and suggest `x not in`.

In [None]:
digit = 5

if not digit in [0, 2, 4, 6, 8]:
  print("This is not an even digit")

if digit not in [0, 2, 4, 6, 8]:
  print("This is not an even digit")

This is not an even digit
This is not an even digit


### Data Structures

In [None]:
# list
["Hello", "world", "!"]

# set
{"Africa", "Europe", "Asia", "Australia",
 "Antarctica", "North America", "South America"}

# dictionary
{
    "hello": 5,
    "world": 5,
    "!": 1
}

{'!': 1, 'hello': 5, 'world': 5}

A set is useful when you don't care how many times an item appears, or if it doesn't make sense for it to appear more than once.

In [None]:
sentence = ["a", "good", "person", "is", "a", "person", "who", "does", "good"]
print(len(sentence))

sentence_set = set(sentence)
print(sentence_set)
print(len(sentence_set))

9
{'good', 'is', 'a', 'does', 'person', 'who'}
6


Dictionaries allow for complex structures and let us label data.

In [None]:
sentence = {
    "subject": "Obi-Wan Kenobi",
    "object": "our only hope",
    "verb": "is"
}

def svo(sentence):
  return (sentence['subject'] + " " + sentence['verb'] + " "
          + sentence['object'] + ".")

print(svo(sentence))

def yoda(sentence):
  return (sentence['object'] + ", " + sentence['subject'] + " "
          + sentence['verb'] + ".")
  
print(yoda(sentence))

Obi-Wan Kenobi is our only hope.
our only hope, Obi-Wan Kenobi is.


Data structures can be of arbitrary complexity and can contain other data structures.

In [None]:
sentence = [
    {"word": "I",
     "pos": "pronoun",
     "attrs": {"first-person", "singular", "subjective"}},
    {"word": "like",
     "pos": "verb",
     "attrs": {"present-tense", "abstract"}},
    {"word": "green",
     "pos": "adjective",
     "attrs": {"color", "absolute"}},
    {"word": "apples",
     "pos": "noun",
     "attrs": {"plural", "concrete", "fruit"}}
]

print(sentence[1]["pos"])
print(sentence[2:4])
print("fruit" in sentence[-1]["attrs"])

verb
[{'word': 'green', 'pos': 'adjective', 'attrs': {'color', 'absolute'}}, {'word': 'apples', 'pos': 'noun', 'attrs': {'fruit', 'plural', 'concrete'}}]
True


### Strings

Strings are **sequences** of characters. They behave like lists in many ways, although they cannot be modified in place. This means:
- We can access characters in strings using indices
- We can slice strings just like lists
- We can iterate over strings
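
One difference from lists is worth knowing: strings are immutable, so you can read a character by index but not assign to it. A minimal sketch (the variable names are only illustrative):

```python
s = "Hello World!"

# Reading by index works the same way as for a list
first_char = s[0]

# Assigning to an index fails because strings are immutable
try:
  s[0] = "J"
  assignment_worked = True
except TypeError:
  assignment_worked = False

# To "change" a string, build a new one instead
jello = "J" + s[1:]
print(first_char, assignment_worked, jello)
```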


In [None]:
s = "Hello World!"

last_char = s[-1]
# "!"

hello = s[:5]
# "Hello"

reversed_s = s[::-1]  # a name other than `reversed`, to avoid shadowing the built-in
# "!dlroW olleH"

for c in s:
  print(c.capitalize())
# H
# E
# L
# L
# O
#  
# W
# O
# R
# L
# D
# !

def vowel(c):
  if c in "aeiou":
    print(c, "is a vowel!")

vowel('a')

H
E
L
L
O
 
W
O
R
L
D
!
a is a vowel!


### List Comprehensions

A `for` loop is a tried and true method to get stuff done in Python.

In [None]:
squares = []
for x in range(10):
  squares.append(x**2)
print(squares)

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


But sometimes, you want something fancier.

In [None]:
squares = [x**2 for x in range(10)]
print(squares)

List comprehensions can help if you need to convert a data structure (most often a list) to another data structure.
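
The same syntax works with curly braces to build a set or a dictionary. Here is a small sketch that reuses the sentence from the set example; `lengths` and `word_lengths` are illustrative names.

```python
sentence = ["a", "good", "person", "is", "a", "person", "who", "does", "good"]

# list -> set: the distinct word lengths
lengths = {len(word) for word in sentence}

# list -> dict: each distinct word mapped to its length
word_lengths = {word: len(word) for word in sentence}

print(lengths)
print(word_lengths)
```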

You can include a condition in a list comprehension. This way, you can filter data:

In [None]:
odd_squares = [x**2 for x in range(10) if x % 2 == 1]
print(odd_squares)

[1, 9, 25, 49, 81]


Sometimes, you have several nested `for` loops.

In [None]:
sentence = ["Hello", "World", "!"]

vowels = []
for word in sentence:
  for c in word:
    if c in "aeiou":
      vowels.append(c)

print(vowels)

['e', 'o', 'o']


You can convert them to a list comprehension by using several `for`s.

**NB!** Keep in mind that you have to go left to right, from the outside to the inside. `word` should be declared before it's used later on.

In [None]:
sentence = ["Hello", "World", "!"]

vowels = [c for word in sentence for c in word if c in "aeiou"]
print(vowels)

['e', 'o', 'o']


If you mix up the order, your code will either throw an exception or fail silently.

In [None]:
sentence = ["Hello", "World", "!"]

# wrong order: uses existing `word` variable
vowels = [c for c in word for word in sentence if c in "aeiou"]
print(vowels)

[]
!


Sometimes, you write an extremely cool list comprehension and feel great. But then your friend looks at your code and says they can't understand a thing.

In [None]:
sentence = "hello to the world!"

" ".join([word if word in {"a", "an", "the", "in", "on", "up", "by", "to"} 
          else word.capitalize() for word in sentence.split()])

'Hello to the World!'

If you can't explain your list comprehension in a simple sentence, or if it barely fits on several lines, then it's likely that the logic you're writing is too complicated for a single list comprehension. Even if it compiles.

This is likely much more readable:

In [None]:
sentence = "hello to the world!"

capitalized_words = []
for word in sentence.split():
  # don't capitalize articles and prepositions
  if word in {"a", "an", "the", "in", "on", "up", "by", "to"}:
    capitalized_words.append(word)
  else:
    capitalized_words.append(word.capitalize())

" ".join(capitalized_words)

'Hello to the World!'

List comprehensions aren't only useful for creating different lists. Python has a number of functions that aggregate information in a list in some way.

You might have wondered in what situation we could encounter a list of booleans. Here's an example: we can create it via a list comprehension, and then use `all` or `any` on it to check a logical condition.

In [None]:
sentence = ["Hello", "World"]

print(all([word.istitle() for word in sentence]))

print(any([word.startswith("W") for word in sentence]))

True
True
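
If you drop the square brackets, `any` and `all` receive a generator expression and can stop as soon as the answer is known. The sketch below uses a hypothetical helper, `starts_with_h`, that records which words it was asked about, so you can see that only the first word gets checked.

```python
sentence = ["Hello", "World", "again"]
checked = []

def starts_with_h(word):
  """Hypothetical helper that records which words were examined."""
  checked.append(word)
  return word.startswith("H")

# No square brackets: a generator expression yields values one at a time
found = any(starts_with_h(word) for word in sentence)
print(found, checked)
```

With a list comprehension, all three words would be examined before `any` even starts.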


Remember the bit about how booleans can be numbers? Here's a possible way to use that.

If you want to count the vowels in a sentence, you might do it like this:
- Use a condition to filter the sentence and get a list of vowels.
- Count how many items it has.

In [None]:
sentence = ["Hello", "World", "!"]

vowels = [c for word in sentence for c in word if c in "aeiou"]
print(vowels)
len(vowels)

['e', 'o', 'o']


3

However, you could also do this:
- Get a list of `True`/`False` values that correspond to whether a character is a vowel.
- Call `sum` and get the number of the ones that are `True`.

In [None]:
sentence = ["Hello", "World", "!"]

vowels_boolean = [c in "aeiou" for word in sentence for c in word]
print(vowels_boolean)
sum(vowels_boolean)

[False, True, False, False, True, False, True, False, False, False, False]


3

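Both options work. Keep in mind that a list stores references to its items rather than copies, and the boolean list has one entry per character, so it is usually the longer of the two. If memory is a concern, pass a generator expression to `sum` so that no intermediate list is built.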
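
If you only need the count, a possible variant is to skip the list entirely. This is a minimal sketch with the same sentence; `vowel_count` is an illustrative name.

```python
sentence = ["Hello", "World", "!"]

# No intermediate list: `sum` consumes the values as they are produced
vowel_count = sum(c in "aeiou" for word in sentence for c in word)
print(vowel_count)
```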

### Destructuring

You all know how to do variable assignment (I hope!).

In [None]:
x = 5
y = 10
print(x, y)

5 10


However, you might not know you can also do it like this:

In [None]:
x, y = 5, 10
print(x, y)

5 10


Under the hood, this expression uses a mechanism called **destructuring** (the Python documentation calls it *unpacking*). In Python, you can easily break down a data structure into separate variables.

For example, a list:

In [None]:
fives = [5, 10]
x, y = fives
print(x, y)

5 10


However, if the number of elements doesn't match, we'll get an exception.

In [None]:
fives = [5, 10, 15, 20, 25]
x, y = fives
print(x, y)

ValueError: too many values to unpack (expected 2)

One way to fix it is to use an asterisk. It will collect all the remaining elements into a list.

In [None]:
fives = [5, 10, 15, 20, 25]
x, y, *others = fives
print(x, y, others)

5 10 [15, 20, 25]


You can also use it at the start:

In [None]:
sentence = ["Hello", "world", "!"]
*start, end = sentence
print(start, end)

['Hello', 'world'] !


Or even in the middle:

In [None]:
sentence = ["Hello", "my", "dear", "world", "!"]
first, *middle, last = sentence
print(first, middle, last)

if first.istitle() and last in "?!.":
  print("The sentence is ok!")

Hello ['my', 'dear', 'world'] !
The sentence is ok!


If you don't need a value, you can use an underscore (`_`) as a variable name. It's not required, but it's the convention in Python.

In [None]:
token = "apple/NN"
word, _ = token.split("/")
print(word)

apple


Destructuring can also be used in `for` loops. For example, to iterate over dictionaries:

In [None]:
frequencies = {
    "the": 500,
    "a": 400,
    "an": 50
}

for key, value in frequencies.items():
  print(f'The frequency of "{key}" is {value}')

The frequency of "the" is 500
The frequency of "a" is 400
The frequency of "an" is 50


But also in other cases:

In [None]:
word = "hello"

for i, c in enumerate(word):
  print(f"'{c}' is the letter number {i + 1}")

'h' is the letter number 1
'e' is the letter number 2
'l' is the letter number 3
'l' is the letter number 4
'o' is the letter number 5
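
Another common case is `zip`, which pairs up items from two sequences so that each pair can be unpacked in the loop header. A short sketch with made-up data:

```python
words = ["I", "like", "green", "apples"]
tags = ["pronoun", "verb", "adjective", "noun"]

# `zip` pairs up items; each pair is unpacked into `word` and `tag`
tagged = []
for word, tag in zip(words, tags):
  tagged.append(f"{word}/{tag}")

print(tagged)
```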


### Files

Hopefully, you already know how to work with files, but let's recap. 

In [None]:
# create file.txt
!echo -e "I am a file!\nNo, really." > file.txt

Use `with open("filename") as f:` to open a file. (You can use any variable name but `f` is most often used.)

- `open("filename", "r")` opens a file for reading (this is the **default** one)
- `open("filename", "w")` opens a file for writing (it overwrites the initial contents)
- `open("filename", "a")` opens a file for appending (adding at the end)

There are other options, but these are the main ones.

In [None]:
with open("file.txt") as f:
  lines = f.readlines()
  print(lines)

with open("my-file.txt", "w") as f:
  for line in lines:
    f.write("The file said: " + line)

['I am a file!\n', 'No, really.\n']
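
You don't have to read all the lines into a list first: a file object can be iterated over directly, one line at a time. The sketch below is self-contained, so it writes its own small file to a temporary directory (a hypothetical location, not the `file.txt` created above) and passes `encoding` explicitly.

```python
import os
import tempfile

# Create a small file to read (hypothetical path in a temporary directory)
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w", encoding="utf-8") as f:
  f.write("I am a file!\nNo, really.\n")

# Iterate over the file object directly instead of calling readlines()
stripped_lines = []
with open(path, encoding="utf-8") as f:
  for line in f:
    stripped_lines.append(line.rstrip("\n"))

print(stripped_lines)
```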


## General Wisdom

### Code Testing

Code testing is an art, and the margins of this Colab notebook are too narrow to contain it. However, here's some advice:

- Automated tests help you improve your code
- Use asserts to check that something works as you expect
- Write tests for edge cases and unexpected situations, too
- TDD (Test-Driven Development): try writing tests before you write the code
- Bad tests are better than no tests

In [None]:
def add_five(x):
  return x + 4  # bug: should be x + 5, so the assert below fails

assert add_five(7) == 12

AssertionError
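
To illustrate the advice about edge cases, here is a small sketch. `count_vowels` is a hypothetical helper written for this example; the point is the spread of inputs in the asserts, not the function itself.

```python
def count_vowels(text):
  """Hypothetical helper: count lowercase and uppercase vowels in a string."""
  return sum(c in "aeiou" for c in text.lower())

# The typical case
assert count_vowels("hello") == 2

# Edge cases: empty input, no vowels, capital letters, non-letters
assert count_vowels("") == 0
assert count_vowels("rhythm") == 0
assert count_vowels("AEIOU") == 5
assert count_vowels("!?1 2") == 0
```

If every assert passes, the cell prints nothing, which is exactly what you want to see.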

### Code Duplication

- If you find yourself copy-pasting code, it likely should be a function/variable
- Duplicate code breeds inconsistency
- However, premature optimization is the root of all evil

Original code:

In [None]:
def capitalize(token):
  if token.split("/")[1] == "NNP" or token.split("/")[1] == "NNPS":
    return token.split("/")[0].capitalize() + "/" + token.split("/")[1]
  return token

capitalize("apple/NNPS")

'Apple/NNPS'

Improving performance by removing duplicate method calls:

In [None]:
def capitalize(token):
  word_pos = token.split("/")
  if word_pos[1] == "NNP" or word_pos[1] == "NNPS":
    return word_pos[0].capitalize() + "/" + word_pos[1]
  return token

capitalize("apple/NNPS")

'Apple/NNPS'

Making variables clearer:

In [None]:
def capitalize(token):
  word, pos = token.split("/")
  if pos == "NNP" or pos == "NNPS":
    return word.capitalize() + "/" + pos
  return token

capitalize("apple/NNPS")

'Apple/NNPS'

Tidying up:

In [None]:
def capitalize(token):
  word, pos = token.split("/")
  if pos in {"NNP", "NNPS"}:
    return word.capitalize() + "/" + pos
  return token

capitalize("apple/NNPS")

'Apple/NNPS'

## What We've Learned

Let's look at our exam questions once again.



### 1. An easy task for a skilled student

In [None]:
%%file tests/shorten.py

import pytest

def shorten(phrase):
  # Your code here
  pass

def test_shorten():
  assert shorten("the tasks that are easy") == "the easy tasks"
  assert shorten("a teacher who is amazing") == "an amazing teacher"

Overwriting tests/shorten.py


Run this to test the function:

In [None]:
!pytest tests/shorten.py

platform linux -- Python 3.7.13, pytest-3.6.4, py-1.11.0, pluggy-0.7.1
rootdir: /content, inifile:
plugins: typeguard-2.7.1
collected 1 item

tests/shorten.py F                                                       [100%]

_________________________________ test_shorten _________________________________

    def test_shorten():
>     assert shorten("the tasks that are easy") == "the easy tasks"
E     AssertionError: assert None == 'the easy tasks'
E      +  where None = shorten('the tasks that are easy')

tests/shorten.py:9: AssertionError


## Let's Practice!

- [Practice: Volume 0](https://gitlab.com/grammarly-compling-summer-school/summer-school-2022/-/blob/master/classes/4_python/tasks.md)
- [Practice: Volume 1](https://gitlab.com/grammarly-compling-summer-school/summer-school-2022/-/blob/master/classes/4_python/tasks.md#volume-1)