# Docstring and Typing

> ## If the implementation is hard to explain, it's a bad idea.
> ## If the implementation is easy to explain, it may be a good idea.

_[The Zen of Python](https://www.python.org/dev/peps/pep-0020/)_

The Zen of Python is a collection of 19 guiding aphorisms by Tim Peters. Even though it looks like an internal easter egg (try `import this`), it captures very important principles. In the two lines quoted above, Tim Peters stresses that how easily an implementation can be explained is a good measure of its quality.

One great way to find out whether your code is hard (or easy) to explain is to write a docstring. Docstrings are a way to document your code: they explain it to the user or to another developer.

Thus, when you are writing your code, you should think about how you can explain your code to the user or to another developer. Is it hard to explain? Then, the implementation is a bad idea. Is it easy to explain? Good job!

Another way to document your code is by writing thorough, standalone documentation for your program, but we will see that later in this module.

Apart from documentation, you should also think about the type of your variables. For example, if your function is intended to work with a list of numbers, then you can specify that in both the docstring and the function arguments.

Before moving on, run the following code to download a couple of scripts that will be necessary for the lesson:

In [None]:
!wget "https://aicore-files.s3.amazonaws.com/Foundations/Software_Engineering/spreadsheet_printer.py" "https://aicore-files.s3.amazonaws.com/Foundations/Software_Engineering/Date.py"

# Docstrings

> ## Code tells you how, comments tell you why.

By now, the code you have been working with should have comments explaining what each piece of code is doing. Commenting has many purposes:
- Describing sections of your code
- Explaining the use of algorithms that might be difficult to recognise from the code alone
- Tagging: Probably one of the most important, tagging is used to mark a section of the code (usually) as incomplete. Typical tags are BUG, FIXME, and (my favourite) TODO

In [1]:
my_list = [1, 2, 3]
# TODO: Check the length of this list 
length = 3
# FIXME: Use the len() function
length = len(my_list)


Docstrings are another way to add comments, but they are more concrete and targeted at functions, methods, classes, modules, or even packages (as we will see later).

Docstrings can be checked using the `__doc__` attribute, or using the built-in `help()` function.

In [8]:
import pandas as pd
# help(pd)
print(pd.__doc__)


pandas - a powerful data analysis and manipulation library for Python

**pandas** is a Python package providing fast, flexible, and expressive data
structures designed to make working with "relational" or "labeled" data both
easy and intuitive. It aims to be the fundamental high-level building block for
doing practical, **real world** data analysis in Python. Additionally, it has
the broader goal of becoming **the most powerful and flexible open source data
analysis / manipulation tool available in any language**. It is already well on
its way toward this goal.

Main Features
-------------
Here are just a few of the things that pandas does well:

  - Easy handling of missing data in floating point as well as non-floating
    point data.
  - Size mutability: columns can be inserted and deleted from DataFrame and
    higher dimensional objects
  - Automatic and explicit data alignment: objects can be explicitly aligned
    to a set of labels, or the user can simply ignore the labels and

In this case, we saw the docstring of the pandas module, but you can also check the docstrings of its classes and methods:

In [9]:
print(pd.DataFrame.from_dict.__doc__)


        Construct DataFrame from dict of array-like or dicts.

        Creates DataFrame object from dictionary by columns or by index
        allowing dtype specification.

        Parameters
        ----------
        data : dict
            Of the form {field : array-like} or {field : dict}.
        orient : {'columns', 'index'}, default 'columns'
            The "orientation" of the data. If the keys of the passed dict
            should be the columns of the resulting DataFrame, pass 'columns'
            (default). Otherwise if the keys should be rows, pass 'index'.
        dtype : dtype, default None
            Data type to force, otherwise infer.
        columns : list, default None
            Column labels to use when ``orient='index'``. Raises a ValueError
            if used with ``orient='columns'``.

        Returns
        -------
        DataFrame

        See Also
        --------
        DataFrame.from_records : DataFrame from structured ndarray, sequence
          

You can create a docstring by writing a description of the object's functionality as the first statement of its body. By convention, it is wrapped in triple quotes, either single (''' docstring ''') or double (""" docstring """):

In [10]:
def say_hi(name):
    # This function says hi to the user
    print(f"Hello {name}")

help(say_hi)

Help on function say_hi in module __main__:

say_hi(name)



In [11]:
def say_hi(name):
    """ This function says hi to the user """
    print(f"Hello {name}")

help(say_hi)

Help on function say_hi in module __main__:

say_hi(name)
    This function says hi to the user



As the first output shows, a regular comment does not work! Only the docstring version appears in `help()`.
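The difference can also be seen directly on the `__doc__` attribute. The snippet below is a minimal sketch; the function names `with_comment` and `with_docstring` are made up for the illustration.

```python
def with_comment(name):
    # This function says hi to the user
    return f"Hello {name}"


def with_docstring(name):
    """This function says hi to the user"""
    return f"Hello {name}"


# A comment is discarded by the interpreter, so there is nothing to show
print(with_comment.__doc__)    # None
# A docstring is stored on the function object itself
print(with_docstring.__doc__)  # This function says hi to the user
```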

The conventions for docstrings can be found in [PEP 257](https://www.python.org/dev/peps/pep-0257/). But don't go there yet! You won't find a tutorial on how to write docstrings there, but rather the rules to follow when writing one. You will see that docstrings can be one-line docstrings, like the one we saw in the `say_hi` function, or multi-line docstrings, which are more descriptive.

## Multi-line Docstrings

The structure of a multi-line docstring is:
- One-line summary
- An empty line
- An elaborated description

In [14]:
def say_hi(name):
    """
    This function says hi to the user

    The purpose of this function is to demonstrate how to document
    a function following the convention established in the PEP257.
    It actually does not do much, and I am writing this to fill
    the docstring... Lorem ipsum dolor sit amet.
    """
    print("Hello {}".format(name))

help(say_hi)

Help on function say_hi in module __main__:

say_hi(name)
    This function says hi to the user
    
    The purpose of this function is to demonstrate how to document
    a function following the convention established in the PEP257.
    It actually does not do much, and I am writing this to fill
    the docstring... Lorem ipsum dolor sit amet.
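Because the summary is always the first line, tools can extract it on its own. As a small sketch of this, the standard library's `inspect.getdoc()` returns the docstring with the surrounding blank lines and indentation cleaned up, while the raw `__doc__` attribute may still include the leading newline and indentation. The shortened docstring below is only for illustration.

```python
import inspect


def say_hi(name):
    """
    This function says hi to the user

    The purpose of this function is to demonstrate how to document
    a function following the convention established in PEP 257.
    """
    print(f"Hello {name}")


raw = say_hi.__doc__
clean = inspect.getdoc(say_hi)

# The raw string, shown with repr() so that whitespace is visible
print(repr(raw))

# After cleaning, the first line is the one-line summary,
# followed by the empty line and the elaborated description
lines = clean.splitlines()
summary = lines[0]
print(summary)
```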



## Docstring for Classes

So far, we have only seen docstrings for functions but, as mentioned, classes can have docstrings too.

They follow the same principles as the docstrings for functions, but there are a few more rules:
- The docstring should be the first thing in the class definition.
- Each public method should have a docstring. Private methods are exempt.
- There is no clear consensus on whether the `__init__` method should have a docstring. However, many frameworks refer to the class docstring when defining the `__init__` method docstring.

In [19]:
class Date:
    '''
    This class is used to represent a date.

    Attributes:
        year (int): The year of the date.
        month (int): The month of the date.
        day (int): The day of the date.
    '''
    def __init__(self, year: int, month: int, day: int):
        '''
        See help(Date) for accurate signature
        '''
        self.year = year
        self.month = month
        self.day = day

    def __str__(self):
        '''
        This function is used to return the string representation of the date.

        Returns:
            str: The string representation of the date.
        '''
        return "{0}-{1}-{2}".format(self.year, self.month, self.day)

    def __repr__(self):
        '''
        This function is used to return the string representation of the date.

        Returns:
            str: The string representation of the date.
        '''
        return "{0}-{1}-{2}".format(self.year, self.month, self.day)

    def __eq__(self, other):
        '''
        This function is used to compare the date with other date.

        Args:
            other (Date): The other date to be compared with.

        Returns:
            bool: True if the date is equal to the other date, False otherwise.
        '''
        return self.year == other.year and self.month == other.month and \
            self.day == other.day

    def __lt__(self, other):
        '''
        This function is used to compare the date with other date.

        Args:
            other (Date): The other date to be compared with.

        Returns:
            bool: True if the date is less than the other date, False otherwise.
        '''
        if self.year < other.year:
            return True
        elif self.year == other.year:
            if self.month < other.month:
                return True
            elif self.month == other.month:
                if self.day < other.day:
                    return True
        return False
        
    
    @staticmethod
    def is_date_valid(year, month, day):
        '''
        This function is used to check if the date is valid.

        Args:
            year (int): The year of the date.
            month (int): The month of the date.
            day (int): The day of the date.

        Returns:
            bool: True if the date is valid, False otherwise.
        '''
        return year >= 0 and month >= 1 and month <= 12 and \
            day >= 1 and day <= 31

    @classmethod
    def from_string(cls, date_as_string):
        '''
        This function is used to create a date from a string.

        Args:
            date_as_string (str): The string representation of the date.

        Returns:
            Date: The date created from the string.
        '''
        year, month, day = map(int, date_as_string.split('-'))
        return cls(year, month, day)

In [20]:
help(Date)

Help on class Date in module __main__:

class Date(builtins.object)
 |  Date(year: int, month: int, day: int)
 |  
 |  This class is used to represent a date.
 |  
 |  Attributes:
 |      year (int): The year of the date.
 |      month (int): The month of the date.
 |      day (int): The day of the date.
 |  
 |  Methods defined here:
 |  
 |  __eq__(self, other)
 |      This function is used to compare the date with other date.
 |      
 |      Args:
 |          other (Date): The other date to be compared with.
 |      
 |      Returns:
 |          bool: True if the date is equal to the other date, False otherwise.
 |  
 |  __init__(self, year: int, month: int, day: int)
 |      See help(Date) for accurate signature
 |  
 |  __lt__(self, other)
 |      This function is used to compare the date with other date.
 |      
 |      Args:
 |          other (Date): The other date to be compared with.
 |      
 |      Returns:
 |          bool: True if the date is less than the other date, Fa

Cool, isn't it? This seems a little bit tedious, but it will pay off in the future!
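If the rule about documenting every public method feels easy to forget, it can be checked programmatically. The following is a rough sketch using the standard `inspect` module; the `Account` class and the `undocumented_public_methods` helper are hypothetical names invented for this example.

```python
import inspect


class Account:
    """A hypothetical bank account, used only for this illustration."""

    def deposit(self, amount):
        """Add money to the account."""

    def withdraw(self, amount):
        pass  # no docstring on purpose

    def _audit(self):
        pass  # private helper: not expected to have a docstring


def undocumented_public_methods(cls):
    """Return the names of public methods of cls that lack a docstring."""
    missing = []
    for name, member in inspect.getmembers(cls, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private and special methods
        if not member.__doc__:
            missing.append(name)
    return missing


print(undocumented_public_methods(Account))  # ['withdraw']
```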

## Docstrings in Modules and Packages

Docstrings can also be included at the beginning of a module or inside a package containing multiple modules. The principle is the same for both, so here we will just look at a module-level docstring.

`Date.py` contains the same class as we have above, plus a function for displaying the date. If we import that into this namespace, we can check the documentation as well.

In [21]:
import Date
help(Date)

Help on module Date:

NAME
    Date

DESCRIPTION
    This module contains a class for representing the date.
    It also contains a function for printing the date in a
    readable format.

CLASSES
    builtins.object
        Date
    
    class Date(builtins.object)
     |  Date(year: int, month: int, day: int)
     |  
     |  This class is used to represent a date.
     |  
     |  Attributes:
     |      year (int): The year of the date.
     |      month (int): The month of the date.
     |      day (int): The day of the date.
     |  
     |  Methods defined here:
     |  
     |  __eq__(self, other)
     |      This function is used to compare the date with other date.
     |      
     |      Args:
     |          other (Date): The other date to be compared with.
     |      
     |      Returns:
     |          bool: True if the date is equal to the other date, False otherwise.
     |  
     |  __init__(self, year: int, month: int, day: int)
     |      See help(Date) for accu

We will see how to document a package later in this module. Just remember that its docstring goes in the `__init__.py` file.
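As a quick preview, the sketch below builds a throwaway package in a temporary directory and reads its docstring back. The package name `demo_package` and its docstring are made up for the example; a real package would simply keep its `__init__.py` next to its modules.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk; "demo_package" is a made-up name
with tempfile.TemporaryDirectory() as tmp_dir:
    package_dir = Path(tmp_dir) / "demo_package"
    package_dir.mkdir()
    # The package docstring is the first statement of __init__.py
    (package_dir / "__init__.py").write_text(
        '"""Tools for representing and printing dates."""\n'
    )

    # Make the temporary directory importable just long enough to import
    sys.path.insert(0, tmp_dir)
    try:
        demo_package = importlib.import_module("demo_package")
    finally:
        sys.path.remove(tmp_dir)

package_doc = demo_package.__doc__
print(package_doc)  # Tools for representing and printing dates.
```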

## Docstring for Command Line Interface

Sometimes your program is intended to be run from the command line, in which case you will pass arguments to it. `spreadsheet_printer.py` is a simple program that takes a file name as an argument and prints the contents of that file to the screen. You can check its documentation from the command line by typing `python spreadsheet_printer.py -h`, which prints a description of what the program does.
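The contents of `spreadsheet_printer.py` are not shown here, so the following is only a sketch of how such a help message is commonly produced with the standard `argparse` module. The function names and the help texts are assumptions. A script would typically pass its module docstring (`__doc__`) as the description; a function docstring stands in for it here so that the snippet runs anywhere.

```python
import argparse


def print_spreadsheet(file_loc):
    """Print the contents of a spreadsheet file to the screen."""
    with open(file_loc) as spreadsheet:
        print(spreadsheet.read())


def build_parser():
    """Create the command line parser for the hypothetical script."""
    # Reusing the docstring as the description keeps the two in sync
    parser = argparse.ArgumentParser(
        prog="spreadsheet_printer.py",
        description=print_spreadsheet.__doc__,
    )
    parser.add_argument("file_loc", help="location of the spreadsheet file")
    return parser


parser = build_parser()

# This is the text that the -h flag would print
help_text = parser.format_help()
print(help_text)

# Parsing a hypothetical command line: python spreadsheet_printer.py data.csv
args = parser.parse_args(["data.csv"])
print(args.file_loc)  # data.csv
```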

# Docstring Formats

Have you noticed that the docstrings in some of the examples in this notebook used different formats?

There are specific docstring formats that most users are familiar with. These formats also help documentation generators such as Sphinx to parse your docstrings and create your documentation automatically. Some of the most common formats are:

- [Google](https://google.github.io/styleguide/pyguide.html)
- [Sphinx or reStructuredText](http://sphinx-doc.org/markup/desc.html)
- [Numpydoc](https://numpydoc.readthedocs.io/en/latest/format.html)
- [Epytext](https://epytext.readthedocs.io/en/latest/format.html)

As an example, let's look at how the `spreadsheet_printer` module is documented using each one of these formats.

In [None]:
### Google

"""Gets and prints the spreadsheet's header columns

Args:
    file_loc (str): The file location of the spreadsheet
    print_cols (bool): A flag used to print the columns to the console
        (default is False)

Returns:
    list: a list of strings representing the header columns
"""

In [None]:
### Sphinx 

"""Gets and prints the spreadsheet's header columns

:param file_loc: The file location of the spreadsheet
:type file_loc: str
:param print_cols: A flag used to print the columns to the console
    (default is False)
:type print_cols: bool
:returns: a list of strings representing the header columns
:rtype: list
"""


In [None]:
### NumPy

"""Gets and prints the spreadsheet's header columns

Parameters
----------
file_loc : str
    The file location of the spreadsheet
print_cols : bool, optional
    A flag used to print the columns to the console (default is False)

Returns
-------
list
    a list of strings representing the header columns
"""


In [None]:
### Epytext
"""Gets and prints the spreadsheet's header columns

@type file_loc: str
@param file_loc: The file location of the spreadsheet
@type print_cols: bool
@param print_cols: A flag used to print the columns to the console
    (default is False)
@rtype: list
@returns: a list of strings representing the header columns
"""
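The docstrings above are shown detached from any code. As a minimal sketch of how one of them sits inside a function, here is the Google-style version attached to a hypothetical implementation. The function name `get_spreadsheet_cols` and its body are assumptions (the real `spreadsheet_printer.py` may differ), and the spreadsheet is assumed to be a CSV file.

```python
import csv
import os
import tempfile


def get_spreadsheet_cols(file_loc, print_cols=False):
    """Gets and prints the spreadsheet's header columns

    Args:
        file_loc (str): The file location of the spreadsheet
        print_cols (bool): A flag used to print the columns to the console
            (default is False)

    Returns:
        list: a list of strings representing the header columns
    """
    # Assumption: the spreadsheet is a CSV file whose first row is the header
    with open(file_loc, newline="") as csv_file:
        header = next(csv.reader(csv_file))
    if print_cols:
        print("\n".join(header))
    return header


# Write a tiny CSV file to try the function on
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, newline=""
) as tmp:
    tmp.write("name,age,role\nJim Halpert,36,Sales Representative\n")

columns = get_spreadsheet_cols(tmp.name, print_cols=True)
os.remove(tmp.name)
```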


# Typing

Typing is another way to comment your code. However, in this case, you are not explaining the code, but rather the intended data type of your variables.

Just like comments, type hints are not enforced by Python, so you don't have to follow them. But if they are there, it's because they are important!

We have been seeing type hinting from the beginning of this notebook. If you didn't notice, good! That means your eye is getting used to type hinting! Let's see some examples using functions. The syntax for type hinting is:

`def function_name(parameter_name: type) -> return_type:`
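Here is a small sketch of that syntax in action, using a made-up `repeat` function. It also shows where Python stores the hints, and that the interpreter does not enforce them at runtime.

```python
def repeat(word: str, times: int) -> str:
    """Return word repeated the given number of times."""
    return word * times


# The hints are stored on the function, where tools and readers can find them
print(repeat.__annotations__)
# {'word': <class 'str'>, 'times': <class 'int'>, 'return': <class 'str'>}

print(repeat("ab", 2))  # abab

# Python itself does not enforce the hints: a list is accepted at runtime,
# even though the hint says str
print(repeat([1], 2))  # [1, 1]
```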

In [25]:
import requests
from bs4 import BeautifulSoup
def get_html(url: str) -> BeautifulSoup:
    """
    Get the HTML of a URL
    
    Parameters
    ----------
    url : str
        The URL to get the HTML of
    
    Returns
    -------
    BeautifulSoup
        The parsed HTML of the URL
    """
    r = requests.get(url)
    if r.status_code == 200:
        return BeautifulSoup(r.text, 'html.parser')
    else:
        return None

Looks fine, right? We can see from the type hints that we pass a string to the function, and it will return a BeautifulSoup object.

But wait, what if we don't get a good response? We would return None. The typing library can help us declare multiple possible types for our return value.

In [1]:
import requests
from bs4 import BeautifulSoup
from typing import Union

def get_html(url: str) -> Union[BeautifulSoup, None]:
    """
    Get the HTML of a URL
    
    Parameters
    ----------
    url : str
        The URL to get the HTML of
    
    Returns
    -------
    BeautifulSoup or None
        The parsed HTML of the URL, or None if the request failed
    """
    r = requests.get(url)
    if r.status_code == 200:
        return BeautifulSoup(r.text, 'html.parser')
    else:
        return None

In this case, we are declaring that the function returns either a BeautifulSoup object or None. This can be simplified with the Optional type:

In [15]:
import requests
from bs4 import BeautifulSoup
from typing import Optional

def get_html(url: str) -> Optional[BeautifulSoup]:
    """
    Get the HTML of a URL
    
    Parameters
    ----------
    url : str
        The URL to get the HTML of
    
    Returns
    -------
    BeautifulSoup or None
        The parsed HTML of the URL, or None if the request failed
    """
    r = requests.get(url)
    if r.status_code == 200:
        return BeautifulSoup(r.text, 'html.parser')
    else:
        return None

The typing library has multiple ways to specify types. The most common ones are:

- [`typing.Any`](https://docs.python.org/3/library/typing.html#typing.Any): Essentially, a wildcard.
- [`typing.Callable`](https://docs.python.org/3/library/typing.html#typing.Callable): A function or method. 
- [`typing.Union`](https://docs.python.org/3/library/typing.html#typing.Union): A value that can be one of several types. Union[type1, type2, ...]
- [`typing.Optional`](https://docs.python.org/3/library/typing.html#typing.Optional): A value of the given type, or None. Optional[type] is shorthand for Union[type, None]
- [`typing.Tuple`](https://docs.python.org/3/library/typing.html#typing.Tuple): A tuple whose elements have the given types, in order. Tuple[type1, type2, ...]
- [`typing.List`](https://docs.python.org/3/library/typing.html#typing.List): A list whose elements all have the given type. List[type]

Apart from the types in the typing library, you can also use the built-in types directly as type hints:

- str: A string.
- int: An integer.
- float: A floating point number.
- bool: A boolean.
- None: The absence of a value, for example for a function that returns nothing.
- list: A list.
## Typecheckers: mypy

There are several tools that check that your code uses the correct types, for example mypy, pytype, pyright, or pyre.

A separate tool like this would not make much sense in a language like Java, where types are statically declared and checked by the compiler.

In this case, we are going to use mypy. You can use `mypy` to check the types of your code. First, install mypy:

`pip install mypy`

Then, you can use it to check the types of a specific file:

`mypy <filename>`

It will report any type errors it finds.
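As an illustration of the kind of mistake a type checker catches, consider the sketch below. The functions and the user data are invented for the example. Saved to a file and checked with `mypy`, the marked line in `shout_unsafe` should be reported, because the value may be None; the exact wording of the message depends on the mypy version.

```python
from typing import Optional


def find_user(user_id: int) -> Optional[str]:
    """Return the user name for the id, or None if it is unknown."""
    users = {1: "michael", 2: "dwight"}
    return users.get(user_id)


def shout_unsafe(user_id: int) -> str:
    # A type checker flags this line: find_user() may return None,
    # and None has no .upper() method
    return find_user(user_id).upper()


def shout_safe(user_id: int) -> str:
    name = find_user(user_id)
    # Handling the None case first satisfies the type checker
    if name is None:
        return "UNKNOWN"
    return name.upper()


print(shout_safe(1))   # MICHAEL
print(shout_safe(99))  # UNKNOWN

# Without a type checker, the problem only shows up when the code runs
try:
    shout_unsafe(99)
except AttributeError as error:
    print(f"Runtime failure a type checker would have flagged: {error}")
```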

Some libraries do not provide type information that mypy can read. In those cases you can write your own stubs, but if you don't want to spend time on that, you can add the following comment at the end of the line that imports the library:

`# type: ignore`

In [16]:
from bs4 import BeautifulSoup # type: ignore

# Pydantic

Type hints alone do not force the user to pass the specified type.

> [Pydantic](https://pydantic-docs.helpmanual.io/) validates the arguments passed to the models and functions we create with this library, and raises an error when they do not match the declared types.
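To make the difference concrete before installing anything, here is a minimal sketch of a hand-written runtime check that uses only the standard library. This is not how Pydantic works internally; it only illustrates what "enforcing" a type hint means. The class names `PlainPerson` and `StrictPerson` are made up for the example, and the check only handles plain classes such as `str` and `int`.

```python
from dataclasses import dataclass, fields


@dataclass
class PlainPerson:
    name: str
    age: int


@dataclass
class StrictPerson:
    name: str
    age: int

    def __post_init__(self):
        # Compare every attribute with its annotation and fail on a mismatch
        for field in fields(self):
            value = getattr(self, field.name)
            if not isinstance(value, field.type):
                raise TypeError(
                    f"{field.name} must be {field.type.__name__}, "
                    f"got {type(value).__name__}"
                )


# A plain dataclass accepts the wrong type without complaint
careless = PlainPerson(name="Dwight Schrute", age="Thirty Six")
print(type(careless.age).__name__)  # str

# The hand-written check rejects it when the object is created
try:
    StrictPerson(name="Dwight Schrute", age="Thirty Six")
except TypeError as error:
    message = str(error)
    print(message)  # age must be int, got str
```

Writing such checks by hand for every class quickly becomes repetitive, which is the work a validation library takes over.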

Install it using:


In [17]:
!pip install pydantic



Or

In [None]:
!conda install pydantic -c conda-forge

The most basic usage of Pydantic is through models, which are classes that inherit from `BaseModel`. We can declare the attributes of such a class the same way we would with the `dataclass` decorator.

In [7]:
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int
    role: str

In [19]:
'''
This is a Temperature module.
It contains the Temperature class, which allows you to:
- Set a temperature in either degrees Celsius or Fahrenheit
- Convert a temperature between degrees Celsius and Fahrenheit
- Set a temperature to 0
- Check if a temperature is valid (between -273 and 3000)
'''
from pydantic import BaseModel
from pydantic import validate_arguments


class Temperature(BaseModel):
    '''
    This class represents a temperature as a Pydantic model.

    Attributes:
        heat_level (float): The heat level in degrees Celsius.
    '''
    heat_level: float

    def temp_f_convert(self):
        '''
        Convert the temperature from degrees Celsius to Fahrenheit.

        Returns:
            int: The heat_level in Fahrenheit, rounded to the nearest integer.
        '''
        temp_f = round(1.8 * self.heat_level + 32)
        print("Temperature converted from Celsius to Fahrenheit")
        return temp_f

    @staticmethod
    @validate_arguments
    def temp_c_convert(temp_far: float):
        '''
        Convert a temperature from Fahrenheit to degrees Celsius.

        Args:
            temp_far (float): The temperature in Fahrenheit.

        Returns:
            str: The temperature in degrees Celsius, rounded to two
                decimal places, as a string.
        '''
        temp_cels = round((temp_far - 32) / 1.8, 2)
        return str(temp_cels)

    @staticmethod
    def is_temp_valid(check_temp: float):
        '''
        Check if a temperature is valid.

        Args:
            check_temp (float): The temperature to check.

        Returns:
            bool: True if the temperature is between -273 and 3000,
                False otherwise.
        '''
        return -273 <= check_temp <= 3000

    @classmethod
    @validate_arguments
    def new_temp_f(cls, temp_fh: float):
        '''
        Create a new Temperature instance from a temperature in Fahrenheit.

        Args:
            temp_fh (float): The temperature in Fahrenheit.

        Returns:
            Temperature: A new instance with heat_level in degrees Celsius.
        '''
        # Reuse the static method for the conversion. It returns a string,
        # which Pydantic casts back to a float when the model is created.
        temp_in_celsius = cls.temp_c_convert(temp_fh)
        return cls(heat_level=temp_in_celsius)

    @classmethod
    def standard(cls):
        '''
        Create a new Temperature instance set to zero degrees Celsius.

        Returns:
            Temperature: A new instance with heat_level equal to 0.
        '''
        standard_temp = 0
        return cls(heat_level=standard_temp)

Kettle = Temperature(heat_level=40.0)
print(Kettle)
Bath = Kettle.temp_f_convert()
print(Bath)
print(Temperature.temp_c_convert(36.1))
print(Temperature.is_temp_valid(600))
print(Temperature.new_temp_f(40.1))

heat_level=40.0
Temperature converted from Celsius to Fahrenheit
104
2.28
True
heat_level=4.5


Going back to the `Person` model: we created a class with three attributes, and we are __forcing__ the user to use those types when creating an instance.

In [19]:
michael = Person(name='Michael Scott', age=46, role='Regional Manager')

Nothing goes wrong here, but if we pass a value of a different type:

In [20]:
dwight = Person(name='Dwight Schrute', age='Thirty Six', role='Beet farmer')

ValidationError: 1 validation error for Person
age
  value is not a valid integer (type=type_error.integer)

Pydantic refuses to create the instance. One cool feature is that Pydantic will try to cast the arguments you pass to the types you specified in the class definition. Here, for `age` we are passing a string that contains a number:

In [5]:
jim = Person(name='Jim Halpert', age='36', role='Sales Representative')
type(jim.age)

int

Quite convenient, isn't it?

Additionally, Pydantic models have useful methods for exporting the values of their attributes:

In [21]:
print(jim.dict())
print(jim.json())

{'name': 'Jim Halpert', 'age': 36, 'role': 'Sales Representative'}
{"name": "Jim Halpert", "age": 36, "role": "Sales Representative"}


You can check more methods in the Pydantic [documentation](https://pydantic-docs.helpmanual.io/usage/models/).

Alternatively, if you are more comfortable working with decorators, you can achieve the same result using the `dataclass` decorator from Pydantic. You can also use the same hints from the `typing` module that we saw above.

In [22]:
from pydantic.dataclasses import dataclass
from typing import Optional

@dataclass
class Person:
    name: str
    age: int
    role: Optional[str] = None

pam = Person(name='Pam Beesly', age='36')
print(pam.role)

None


Lastly, if you want to use Pydantic checks on a function, you can decorate it with the `validate_arguments` decorator (in Pydantic v2 it is deprecated in favour of `validate_call`):

In [23]:
from pydantic import validate_arguments

@validate_arguments
def say_my_name(x: str):
    if x == 'Heisenberg':
        print("You're goddamn right")
    else:
        print(x)
        print(type(x))

say_my_name(3) # This will cast 3 as a string
say_my_name(['Jesse', 'Walter', 'Heisenberg']) 

3
<class 'str'>


ValidationError: 1 validation error for SayMyName
x
  str type expected (type=type_error.str)

Modern libraries are starting to use Pydantic to validate data coming from users. For example, FastAPI uses these models to define what is expected from requests.

# Summary

- Comments are a great way to document code, but you need more than that to create good documentation.
- Thus, we rely on docstrings and type hints.
- A docstring is a string that describes the purpose of a function, method, class, script, module, or package.
- Type hints document the intended type of a variable, argument, or return value.
- You can check that your code uses the correct types with `mypy`.
- You can force the user to pass certain types of data using Pydantic.