# Code formatting and Tools

- Follow PEP-8 and agree on a coding style for your team (it can be an extension or modification of PEP-8).
- Use docstrings whenever possible. They can also help generate documentation automatically (using Sphinx, for example). Problem with docstrings: they need to be updated manually and can be long.

## Docstrings and Annotations

In [None]:
class Point:
    def __init__(self, lat, long):
        self.lat = lat
        self.long = long
 
 
def locate(latitude: float, longitude: float) -> Point:
    """Find an object in the map by its coordinates"""

In [None]:
locate.__doc__

We can get a dictionary with all the annotations:

In [None]:
locate.__annotations__

Docstrings don't need to include data types if we are using annotations, but it is a good idea to include examples of input and output for complex data types (like dynamic and nested ones).

In [None]:
def data_from_response(response: dict) -> dict:
    """If the response is OK, return its payload.
 
    - response: A dict like::
 
    {
        "status": 200, # <int>
        "timestamp": "....", # ISO format string of the current date time
        "payload": { ... } # dict with the returned data
    }
 
    - Returns a dictionary like::
 
    {"data": { .. } }
 
    - Raises:
    - ValueError if the HTTP status is != 200
    """
    if response["status"] != 200:
        raise ValueError
    return {"data": response["payload"]}

## Tools / Packages

- MyPy
- Pylint
- Makefile for typehint, lint, and test. The build should fail if any of these don't pass.
- Code formatter: Black (https://github.com/psf/black)

# Pythonic Code

## Creating your own sequences

Delegate as much as possible to existing functionalities (encapsulation). Every sequence object in Python has the methods \_\_len\_\_ and \_\_getitem\_\_:

In [None]:
class Items:
    def __init__(self, *values):
        self._values = list(values)
        
    def __len__(self):
        return len(self._values)
    
    def __getitem__(self, item):
        return self._values.__getitem__(item)
        

If you need to build your own, follow these rules:
- When indexing by a range (a slice), it should return an instance of the same class.
- Respect the semantics of a slice, like excluding the last element.
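
A minimal sketch of both rules, building on the `Items` idea above. When the index is a slice, the list's own slicing is reused (so the usual slice semantics apply) and the result is wrapped in a new instance of the same class. The class name `SliceableItems` is hypothetical and only used for illustration.

```python
class SliceableItems:
    """Hypothetical wrapper that delegates storage to a list."""

    def __init__(self, *values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, item):
        result = self._values[item]
        if isinstance(item, slice):
            # Slicing returns the same type, not a plain list
            return type(self)(*result)
        return result


items = SliceableItems(1, 2, 3, 4)
first_two = items[:2]  # the element at index 2 is excluded, as with lists
print(type(first_two).__name__, len(first_two), items[-1])
```

Negative indexes work here only because the underlying list already supports them, which is the point of delegating.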

## Context Managers

Context managers consist of two magic methods: \_\_enter\_\_ and \_\_exit\_\_. The with statement calls the \_\_enter\_\_ method and assigns whatever it returns to the variable after as. After the code in the context finishes, Python runs the \_\_exit\_\_ method (even if there is an error or exception).

In [None]:
def stop_database():
    run("systemctl stop postgresql.service")


def start_database():
    run("systemctl start postgresql.service")


class DBHandler:
    def __enter__(self):
        stop_database()
        return self

    def __exit__(self, exc_type, ex_value, ex_traceback):
        start_database()


def db_backup():
    run("pg_dump database")


def main():
    with DBHandler():
        db_backup()

We can also build context managers using a decorator. This is useful when we don't want to add more responsibilities (the magic methods) to an existing class.

In [None]:
import contextlib

@contextlib.contextmanager
def db_handler():
    stop_database()  # Everything before yield plays the role of __enter__
    try:
        yield  # Whatever is yielded here is assigned to the name after `as`
    finally:
        start_database()  # Plays the role of __exit__; runs even on error
    
with db_handler():
    db_backup()

Using a decorator without the with statement is also an option. In the example below, offline_backup runs inside a context manager. Disadvantage: we don't have access to an object inside the context (with ... as ...).

In [None]:
class dbhandler_decorator(contextlib.ContextDecorator):
    def __enter__(self):
        stop_database()

    def __exit__(self, exc_type, ex_value, ex_traceback):
        start_database()

# offl
@dbhandler_decorator()
def offline_backup():
    run("pg_dump database")

## Underscores (and interfaces)

Everything that is not strictly part of an object's interface (internal use only) should be prefixed with a single underscore. This is only a convention, as Python does not enforce it.

In [None]:
class Connector:
    def __init__(self, source):
        self.source = source
        self._timeout = 60
        # Common misconception: __ does not make it private
        # This is just name mangling: _<class-name>__<attribute-name>
        # Do not use double underscores
        self.__timeout = 60
        
conn = Connector("postgresql://localhost")

In [None]:
conn.__dict__

In [None]:
conn._Connector__timeout

## Properties

Useful for establishing different rules for retrieving an attribute and setting an attribute.  

Good to achieve command and query separation: @property is the query and @<property_name>.setter is the command that will do something.

**Advice**: Methods should do one thing (break action and check in separate methods).

In [None]:
import re

EMAIL_FORMAT = re.compile(r"[^@]+@[^@]+\.[^@]+")


def is_valid_email(potentially_valid_email: str):
    return re.match(EMAIL_FORMAT, potentially_valid_email) is not None


class User:
    def __init__(self, username):
        self.username = username
        self._email = None

    # This is called by <user>.email (retrieval logic)
    @property
    def email(self):
        return self._email
    
    # This will be run when we do <user>.email = <new_email> (modification logic)
    @email.setter
    def email(self, new_email):
        if not is_valid_email(new_email):
            raise ValueError(f"Can't set {new_email} as it's not a valid email")
        self._email = new_email

In [None]:
u1 = User("jsmith")
u1.email = 'asas'

In [None]:
u1.email = "jsmith@g.co"

## Iterable objects

When we iterate over an object, Python calls the iter() function on it. If the object has an \_\_iter\_\_ method, Python executes it.

In [None]:
from datetime import timedelta
from datetime import date

class DateRangeIterable:
    """An iterable that contains its own iterator object."""

    def __init__(self, start_date, end_date):
        self.start_date = start_date
        self.end_date = end_date
        self._present_day = start_date

    # By returning self, the object declares that it is its own
    # iterator, and the task of generating values is delegated
    # to __next__.
    # Because the same iterator is returned every time, it works
    # for only one loop before it is exhausted.
    def __iter__(self):
        return self

    def __next__(self):
        if self._present_day >= self.end_date:
            raise StopIteration
        today = self._present_day
        self._present_day += timedelta(days=1)
        return today

If we need to use the object multiple times, we can create a **generator** every time:

In [None]:
class DateRangeContainerIterable:
    def __init__(self, start_date, end_date):
        self.start_date = start_date
        self.end_date = end_date

    def __iter__(self):
        current_day = self.start_date
        while current_day < self.end_date:
            yield current_day
            current_day += timedelta(days=1)

The first approach returns an iterator with its own \_\_next\_\_ method. The second approach creates a generator every time that \_\_iter\_\_ is called (a generator is also an iterator).

In [None]:
r0 = DateRangeIterable(date(2018, 1, 1), date(2018, 1, 5))
r1 = DateRangeContainerIterable(date(2018, 1, 1), date(2018, 1, 5))
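
The difference shows up when the same object is looped over twice. The sketch below uses compact stand-ins for the two classes above (the names `OneShotRange` and `ReusableRange` are hypothetical) so that it can run on its own.

```python
from datetime import date, timedelta


class OneShotRange:
    """Iterator: __iter__ returns self, so state is shared across loops."""

    def __init__(self, start, end):
        self._current, self._end = start, end

    def __iter__(self):
        return self

    def __next__(self):
        if self._current >= self._end:
            raise StopIteration
        today = self._current
        self._current += timedelta(days=1)
        return today


class ReusableRange:
    """Iterable: __iter__ is a generator, so every loop starts fresh."""

    def __init__(self, start, end):
        self._start, self._end = start, end

    def __iter__(self):
        current = self._start
        while current < self._end:
            yield current
            current += timedelta(days=1)


one_shot = OneShotRange(date(2018, 1, 1), date(2018, 1, 5))
reusable = ReusableRange(date(2018, 1, 1), date(2018, 1, 5))

first_pass = len(list(one_shot)), len(list(reusable))
second_pass = len(list(one_shot)), len(list(reusable))
print(first_pass, second_pass)  # (4, 4) (0, 4)
```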

## Sequences

If our object does not have the \_\_iter\_\_ method, the iter() function will look for a \_\_getitem\_\_ method. A sequence is an object that implements \_\_getitem\_\_ and \_\_len\_\_. It is also expected to return its elements one at a time, by index, starting at index 0.  

**Note**: generators are O(n) to get the nth item, but use less memory. Sequences have all the elements in memory, but they are O(1) for retrieval.

In [None]:
class DateRangeSequence:
    def __init__(self, start_date, end_date):
        self.start_date = start_date
        self.end_date = end_date
        self._range = self._create_range()
        
    # Notice that we don't have yield this time
    # The method returns the whole sequence
    def _create_range(self):
        # Using a list here allows us to use several properties
        # without explicit definition (like negative indexes)
        days = []
        current_day = self.start_date
        while current_day < self.end_date:
            days.append(current_day)
            current_day += timedelta(days=1)
        return days

    def __getitem__(self, day_no):
        return self._range[day_no]

    def __len__(self):
        return len(self._range)

## Callable objects

The magic method \_\_call\_\_ will be called when we try to execute our object as a regular function.  
**Advantage**: objects have states, so we can maintain information across calls (functions with memory).

In [None]:
from collections import defaultdict

class CallCount:

    def __init__(self):
        self._counts = defaultdict(int)

    def __call__(self, argument):
        self._counts[argument] += 1
        return self._counts[argument]
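
A short usage sketch, repeating the class so the cell runs on its own. Each call returns how many times that particular argument has been seen so far.

```python
from collections import defaultdict


class CallCount:
    def __init__(self):
        self._counts = defaultdict(int)

    def __call__(self, argument):
        self._counts[argument] += 1
        return self._counts[argument]


cc = CallCount()
# The instance is used with function-call syntax and remembers past calls
results = [cc(1), cc(2), cc(1), cc(1), cc("something")]
print(results)  # [1, 1, 2, 3, 1]
```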

## Caveats

These are probably never justified, so refactor them when found.

**Mutable default arguments**  
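In the example below, the call works only the first time: the default list is created once, when the function is defined, so the second call tries to remove an element that is already gone.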

In [None]:
def func(a=list([1,2,3])):
    a.remove(1)
    return a

In [None]:
func()

Do this instead:

In [None]:
def func(a=None):
    if a is None:
        a = [1, 2, 3]  # a fresh list is created on every call
    a.remove(1)
    return a

In [None]:
func()

**Extending built-in types**

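Instead of inheriting directly from built-in types such as list, dict, or str, inherit from the wrappers in the collections module (UserList, UserDict, UserString).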

In [None]:
from collections import UserList

class GoodList(UserList):
    def __getitem__(self, index):
        value = super().__getitem__(index)
        if index % 2 == 0:
            prefix = "even"
        else:
            prefix = "odd"
        return f"[{prefix}] {value}"
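
To see why this matters, the sketch below compares the same override on a direct `list` subclass and on a `UserList` subclass. The helper `tag` and the class `BadList` are hypothetical names added for the comparison.

```python
from collections import UserList


def tag(index, value):
    """Hypothetical helper shared by both classes."""
    prefix = "even" if index % 2 == 0 else "odd"
    return f"[{prefix}] {value}"


class BadList(list):
    def __getitem__(self, index):
        return tag(index, super().__getitem__(index))


class GoodList(UserList):
    def __getitem__(self, index):
        return tag(index, super().__getitem__(index))


bad = BadList((0, 1, 2))
good = GoodList((0, 1, 2))

print(bad[0])      # explicit indexing uses the override
print(list(bad))   # iterating over a list subclass bypasses it
print(list(good))  # UserList iterates through __getitem__
```

With the built-in `list`, iteration reads the stored items directly, so the override is applied inconsistently. `UserList` routes iteration through `__getitem__`, so the customized behavior is applied everywhere.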

# General Traits of Good Code

## Design by Contract

**Preconditions**: this goes beyond checking types (like when using MyPy). Functions should properly validate the information they are going to handle.  

Important question: where should we place the validation logic? Should the client validate all the data before calling the function? Or should the function itself validate it?

**Postconditions**: check and validate for everything that a client might need.

**Pythonic way**: not very well defined (PEP 316, Programming by Contract, is deferred). A reasonable option is to raise exceptions such as RuntimeError or ValueError when a contract is violated.
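
One possible way to express a contract with plain exceptions is sketched below. The function `average_rating` and its 0 to 5 range are hypothetical; here the function itself validates its input (the precondition) and checks its own result before returning (the postcondition).

```python
def average_rating(ratings: list) -> float:
    """Hypothetical function with explicit pre- and postconditions."""
    # Precondition: the caller must pass a non-empty list of values in [0, 5]
    if not ratings:
        raise ValueError("ratings must not be empty")
    if any(not 0 <= r <= 5 for r in ratings):
        raise ValueError("every rating must be between 0 and 5")

    result = sum(ratings) / len(ratings)

    # Postcondition: the function promises a value in the same range
    if not 0 <= result <= 5:
        raise RuntimeError(f"contract violated: computed {result}")
    return result


print(average_rating([4, 5, 3]))  # 4.0
```

A ValueError tells the client that it broke the contract, while the RuntimeError would point to a bug in the function itself.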

## Defensive Programming / Error Handling

**Value Substitution**  
Replace the wrong value with something else.  
Trade-off between robustness and correctness.   
Not recommended for critical applications.  
Good use: default value when data is not provided.

**Exception Handling**  
Raise exceptions when there is actually something wrong with the code that callers need to be aware of. DO NOT use as a go-to for business logic.  

*Tip*: If a function raises too many exceptions, maybe it has too many responsibilities and needs to be broken into smaller functions.  

Observations: be careful about exposing tracebacks (privacy and security). Be specific when specifying except blocks (avoid empty except blocks).

Example using logger:

In [None]:
import logging
import time

logger = logging.getLogger(__name__)


def connect_with_retry(connector, retry_n_times, retry_threshold=5):
    """Tries to establish the connection of <connector> retrying
    <retry_n_times>.

    If it can connect, returns the connection object.
    If it's not possible after the retries, raises ConnectionError

    :param connector: An object with a `.connect()` method.
    :param retry_n_times int: The number of times to try to call
                                ``connector.connect()``.
    :param retry_threshold int: The time lapse between retry calls.

    """
    for _ in range(retry_n_times):
        try:
            return connector.connect()
        except ConnectionError as e:
            logger.info(
                "%s: attempting new connection in %is", e, retry_threshold
            )
            time.sleep(retry_threshold)
    exc = ConnectionError(f"Couldn't connect after {retry_n_times} times")
    logger.error(exc)
    raise exc  # the docstring promises ConnectionError after the retries

Using the `raise ... from ...` syntax below, the traceback retains the original exception ("The above exception was the direct cause of the following exception"):

In [None]:
class InternalDataError(Exception):
    """An exception with the data of our domain problem."""


def process(data_dictionary, record_id):
    try:
        return data_dictionary[record_id]
    except KeyError as e:
        raise InternalDataError("Record not present") from e

In [None]:
process(dictionary, 'z')

## Separation of Concerns

- Different responsibilities should go into different components, layers, or modules of the application.  
- If we have to modify or refactor some part of the code, the change should have minimal impact on the rest of the application.
- Well-defined software will achieve high cohesion and low coupling.

## Acronyms to Live By

**DRY / OAOO**  
**Don't repeat yourself / Once and Only Once**  
A common cause of duplication is doing a computation on the fly instead of capturing it as knowledge in a single named place (a variable, method, etc.).  

**YAGNI**  
**You Ain't Gonna Need It**  
Focus on the current requirements and don't try to guess what will be needed in the future. 

**KIS**  
**Keep It Simple**  
Similar to YAGNI.

**EAFP / LBYL**  
**Easier to Ask Forgiveness than Permission / Look Before You Leap**  
EAFP: perform the action and take care of the consequences in case it doesn't work. This typically means catching exceptions.  
LBYL: checking before doing the action (using an if before opening a file, for example).  

For Python, prefer EAFP (this is more explicit, as it raises the error instead of just not working).
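
A small sketch contrasting the two styles on the same task, converting text to an integer. The function names `parse_lbyl` and `parse_eafp` are hypothetical.

```python
def parse_lbyl(text: str):
    # LBYL: test the input first; the check must anticipate every valid form
    if text.isdigit():
        return int(text)
    return None


def parse_eafp(text: str):
    # EAFP: just try the conversion and handle the failure
    try:
        return int(text)
    except ValueError:
        return None


for raw in ["42", "-3", "abc"]:
    print(raw, parse_lbyl(raw), parse_eafp(raw))
```

The LBYL version rejects `"-3"` because `str.isdigit()` does not accept a sign, so the check and the action disagree. The EAFP version lets `int()` decide what is valid.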



## Composition and Inheritance

**Problem with inheritance**: every time that we extend a base class, we are creating a new one that is coupled with the parent (remember that low coupling is better). Inheritance should not be used only to reuse code.

**When inheritance is a good decision** (specialization)
- Good heuristic: are we going to use all the inherited methods? 
- If not, the superclass might have too much responsibility. Or it could mean that the subclass is not a proper specialization.
- **Very good use**: designing interfaces. This enforces that all the subclasses will have the necessary methods (think scikit-learn).
- Another good use: defining new exceptions.  
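
As an illustration of the interface use case, the sketch below uses the standard `abc` module. The names `Estimator`, `MeanEstimator`, and `Incomplete` are hypothetical, loosely inspired by the fit/predict convention mentioned above.

```python
from abc import ABC, abstractmethod


class Estimator(ABC):
    """Hypothetical interface: every estimator must implement fit and predict."""

    @abstractmethod
    def fit(self, values): ...

    @abstractmethod
    def predict(self): ...


class MeanEstimator(Estimator):
    def fit(self, values):
        self._mean = sum(values) / len(values)
        return self

    def predict(self):
        return self._mean


class Incomplete(Estimator):
    def fit(self, values):
        return self


print(MeanEstimator().fit([1, 2, 3]).predict())  # 2.0

try:
    Incomplete()  # predict is missing, so instantiation fails
except TypeError as error:
    print(f"TypeError: {error}")
```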

## Arguments in Functions and Methods

- If a function needs too many parameters to work properly, consider it a code smell.
- Work with immutable objects, and avoid side effects as much as possible.

Be careful when modifying mutable arguments in functions, because the changes persist outside the scope of the function. In general, it is a good idea not to change parameters at all:

In [94]:
def function(argument):
    argument += " added"
    print(argument)

In [95]:
mutable = list("hello")

In [96]:
function(mutable)

['h', 'e', 'l', 'l', 'o', ' ', 'a', 'd', 'd', 'e', 'd']


In [97]:
mutable

['h', 'e', 'l', 'l', 'o', ' ', 'a', 'd', 'd', 'e', 'd']
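
One way to avoid the side effect is to build and return a new object instead of modifying the argument in place. The function name `with_suffix` is hypothetical.

```python
def with_suffix(argument):
    # `+` creates a new list; `+=` would extend the caller's list in place
    return argument + list(" added")


original = list("hello")
extended = with_suffix(original)

print("".join(original))  # hello
print("".join(extended))  # hello added
```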

Use unpacking whenever possible:

In [None]:
USERS = [(i, f"first_name_{i}", f"last_name_{i}") for i in range(5)]
[first_name for (id, first_name, last_name) in USERS]

In [None]:
def function(a, b):
    return a+b

function(**{'a':10, 'b':20})

When defining a function, the opposite happens. Extra positional arguments are collected into the tuple args, and extra keyword arguments are collected into the dictionary kwargs.

In [None]:
def function(a, *args, **kwargs):
    print(a)
    print(args)
    print(kwargs)

b=2
c=3
function(1, b, c, d=4, e=5)

## Final Remarks

**Orthogonality**
- Orthogonality: changing a module, class, or function should have no impact on anything outside the component being modified.
- Always try to minimize side-effects on your code
- Orthogonality makes unit testing easier

**Structuring the code**  
- Having large files with lots of definitions is bad practice.
- Use packages (\_\_init__.py)
- Use a file with constants and import from it:

In [None]:
from myproject.constants import CONNECTION_TIMEOUT

# The SOLID Principles

## Single Responsibility Principle

- SRP states that a software component (in general, a class) must have only one responsibility.
- Tip: classes should be designed so that most of their properties and attributes are used by their methods, most of the time.
- If you find methods that are mutually exclusive or do not relate to each other, they should be broken down into smaller classes. In other words, orthogonal methods should not be in the same class.

## Open/Closed Principle

- Code should be open for extension but closed for modification.
- When something new appears in the domain problem, we only want to add new code, not modify existing code.

In [93]:
## Good example of how to use polymorphism for extension

class Event:
    def __init__(self, raw_data):
        self.raw_data = raw_data

    @staticmethod
    def meets_condition(event_data: dict):
        return False

class UnknownEvent(Event):
    """A type of event that cannot be identified from its data"""

class LoginEvent(Event):
    @staticmethod
    def meets_condition(event_data: dict):
        return (
            event_data["before"]["session"] == 0
            and event_data["after"]["session"] == 1
        )

class LogoutEvent(Event):
    @staticmethod
    def meets_condition(event_data: dict):
        return (
            event_data["before"]["session"] == 1
            and event_data["after"]["session"] == 0
        )

class SystemMonitor:
    """Identify events that occurred in the system."""

    def __init__(self, event_data):
        self.event_data = event_data

    def identify_event(self):
        # to add new types of events, we just need to inherit
        # from the Event class and have an implementation
        # of meets_condition
        for event_cls in Event.__subclasses__():
            try:
                if event_cls.meets_condition(self.event_data):
                    return event_cls(self.event_data)
            except KeyError:
                continue
        return UnknownEvent(self.event_data)

Observation on the meaning of @classmethod and @staticmethod

In [84]:
class Test:
    
    def method(self):
        print(self) 
        
    @classmethod
    def class_method(cls):
        print(cls)
        
    @staticmethod
    def static_method():
        print("no arguments needed")

The regular methods pass a reference to the instance as an argument:

In [85]:
test_instance = Test()

In [86]:
test_instance.method()

<__main__.Test object at 0x0000022DC51C98D0>


In [92]:
Test.method(test_instance)

<__main__.Test object at 0x0000022DC51C98D0>


Using the classmethod decorator, we pass a reference to the class:

In [87]:
test_instance.class_method()

<class '__main__.Test'>


In [88]:
Test.class_method()

<class '__main__.Test'>


Using the staticmethod decorator, we don't pass any argument by default (this works like a regular function).

## Liskov's Substitution Principle

- A client that works with a class should be able to use any of its subtypes interchangeably, without even noticing. In the example above, we can use the event subclasses instead of the base class.
- A good class must define a clear and concise interface, and as long as a subclass honors that interface, the program will remain correct.
- Some violations (like incorrect datatypes and incompatible method signatures) can be detected using Mypy and Pylint.
- Remember: the parent class defines a contract with its clients. Subclasses of this one must respect the contract as well.
- LSP emphasizes polymorphism and contributes to OCP!

## Interface Segregation

- An **Interface** is represented by the set of methods an object exposes. 
- Idea behind duck typing: any object is really represented by the methods it has, and by what it is capable of doing. 
- **Interface segregation principle**: interfaces should be small.

## Dependency Inversion

- Make your code independent of things that are fragile, volatile, or out of your control. Our code should depend on interfaces, and not on concrete implementations.
- This can involve creating an abstract base class to serve as the interface between our main class and its targets. In Python, duck typing makes this more flexible: having the methods required to interact with the main class can be enough (if it quacks like the interface, it does not necessarily need to inherit from it).
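
A minimal sketch of the duck-typed variant. The main class only assumes that its target has a `send()` method, so targets can be swapped without touching it. All class names here (`EventStreamer`, `ListTarget`, `PrintTarget`) are hypothetical.

```python
class ListTarget:
    """Hypothetical target that keeps events in memory."""

    def __init__(self):
        self.received = []

    def send(self, event):
        self.received.append(event)


class PrintTarget:
    """Another hypothetical target; it only needs a send() method."""

    def send(self, event):
        print(f"sending {event}")


class EventStreamer:
    """Depends on 'something with send()', not on a concrete target."""

    def __init__(self, target):
        self._target = target

    def stream(self, events):
        for event in events:
            self._target.send(event)


target = ListTarget()
EventStreamer(target).stream(["login", "logout"])
EventStreamer(PrintTarget()).stream(["login"])
print(target.received)  # ['login', 'logout']
```

An in-memory target like `ListTarget` is also convenient in unit tests, since no external system is needed.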

# Using Decorators to Improve Our Code

For syntax details about decorators, see the **decorators notebook**. 

Some **good uses** for decorators:  
- Transforming / validating parameters
- Tracing code / logging execution of a function and its parameters
- Monitoring metrics like time and CPU usage
- Implementing retry operations
- Moving repetitive logic to decorators

**Common mistake**: changing the function's properties like \_\_name\_\_. To avoid this, use **@wraps**. Good template:

In [None]:
from functools import wraps


def decorator(original_function):
    @wraps(original_function)
    def decorated_function(*args, **kwargs):
        # modifications done by the decorator ...
        return original_function(*args, **kwargs)

    return decorated_function
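
A small sketch that applies this template to call tracing. The decorator name `trace` and its `trace.calls` list are hypothetical. The last two prints show that `functools.wraps` keeps the name and docstring of the original function.

```python
from functools import wraps


def trace(original_function):
    """Hypothetical decorator that records each call in `trace.calls`."""

    @wraps(original_function)
    def decorated_function(*args, **kwargs):
        trace.calls.append((original_function.__name__, args, kwargs))
        return original_function(*args, **kwargs)

    return decorated_function


trace.calls = []


@trace
def add(a, b):
    """Return the sum of a and b."""
    return a + b


print(add(1, b=2))                     # 3
print(add.__name__, "|", add.__doc__)  # add | Return the sum of a and b.
print(trace.calls)                     # [('add', (1,), {'b': 2})]
```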

# Descriptors

A descriptor is an instance of a class that implements the descriptor protocol. It must have at least one of the following methods:
- \_\_get\_\_
- \_\_set\_\_
- \_\_delete\_\_
- \_\_set_name\_\_

The descriptor object has to be defined as a class attribute.

## \_\_get\_\_

If we simply assign an instance of an ordinary class to a class attribute, nothing special happens:

In [7]:
class Attribute:
    value = 42
    
class Client:
    attribute = Attribute()

In [8]:
Client().attribute

<__main__.Attribute at 0x1b52e806e48>

In [9]:
Client().attribute.value

42

When using a descriptor, the situation is different. When we access the class attribute (the descriptor instance), its \_\_get\_\_ method is called instead:

In [48]:
class DescriptorClass:
    def __get__(self, instance, owner):
        print('This is the __get__ method!')
        print(f"self:{self}")
        print(f"instance:{instance}")
        print(f"owner:{owner}")
        
class Client:
    descriptor = DescriptorClass()

In [49]:
# Notice that we are not getting an instance of DescriptorClass
# as our response. Besides, self refers to the instance
# of DescriptorClass, instance refers to the instance of Client,
# and owner refers to the class Client
Client().descriptor

This is the __get__ method!
self:<__main__.DescriptorClass object at 0x000001B52ED27080>
instance:<__main__.Client object at 0x000001B52ED27390>
owner:<class '__main__.Client'>


In [50]:
Client.descriptor

This is the __get__ method!
self:<__main__.DescriptorClass object at 0x000001B52ED27080>
instance:None
owner:<class '__main__.Client'>


## \_\_set\_\_

This method is called when we try to assign something to a descriptor. Be careful: if \_\_set\_\_ is not implemented, the assignment will override the descriptor. In this example, we implement a validation method with a descriptor:

In [51]:
def is_greater_10(x):
    if x > 10:
        return True
    return False

In [52]:
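class Validator:

    def __init__(self, validation):
        self.validation = validation

    def __set_name__(self, owner, name):
        # Remember the attribute name the descriptor was assigned to
        self._name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__[self._name]

    def __set__(self, instance, value):
        if not self.validation(value):
            raise ValueError(f"{value} is not greater than 10")
        instance.__dict__[self._name] = value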
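
A self-contained sketch of how a validating descriptor like this could be used. The names `ValidatedField` and `Order` are hypothetical. It assumes the attribute name is captured with `__set_name__` and the value is stored in the instance's `__dict__`.

```python
class ValidatedField:
    """Hypothetical data descriptor that validates on assignment."""

    def __init__(self, validation, message):
        self.validation = validation
        self.message = message

    def __set_name__(self, owner, name):
        # Called once at class creation with the attribute's name
        self._name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class, not on an instance
        return instance.__dict__[self._name]

    def __set__(self, instance, value):
        if not self.validation(value):
            raise ValueError(f"{value} {self.message}")
        instance.__dict__[self._name] = value


class Order:
    quantity = ValidatedField(lambda x: x > 10, "is not greater than 10")


order = Order()
order.quantity = 25
print(order.quantity)  # 25

try:
    order.quantity = 3
except ValueError as error:
    print(error)  # 3 is not greater than 10
```

Because the class defines `__set__`, it is a data descriptor, so `order.quantity` always goes through `__get__` even though the value lives in the instance's `__dict__`.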

In [53]:
Client().descriptor

This is the __get__ method!
self:<__main__.DescriptorClass object at 0x000001B52ED27080>
instance:<__main__.Client object at 0x000001B52ED37F98>
owner:<class '__main__.Client'>
