### The Importance of the Standard Library
Python is often described as coming with "batteries included," which is usually a reference to its standard library. The Python standard library is vast, and broader than the standard libraries of many other languages. It includes modules to connect to sockets, send emails, work with SQLite databases and locales, and encode and decode JSON and XML.

It is also known for including modules such as turtle and tkinter, which provide graphical interfaces that most users rarely use anymore but that have proven useful for teaching Python at schools and universities.

It even includes IDLE, an integrated development environment for Python, although IDLE is not widely used because other packages or external tools usually take its place. The modules in the standard library can be divided into high-level modules and lower-level modules:

#### High-Level Modules
The Python standard library is truly vast and diverse, providing a toolbelt that covers most everyday programming tasks. Open a Python interpreter and run the following snippet, typing each line after the >>> prompt, to draw graphics on the screen:

>>> from turtle import Turtle, done
>>> turtle = Turtle()
>>> turtle.right(180)
>>> turtle.forward(100)
>>> turtle.right(90)
>>> turtle.forward(50)
>>> done()

This code uses the turtle module to draw on the screen. The module lets you control a cursor, the turtle, that leaves a trail as it moves, and it provides functions to turn the turtle and move it around the screen, drawing as it advances.

Here is a detailed explanation of the turtle module code snippet:

- It creates a turtle in the middle of the screen.
- It rotates the turtle 180 degrees to the right.
- It moves forward 100 pixels, drawing as it walks.
- It rotates to the right once again, this time by 90 degrees.
- It moves forward 50 pixels.
- It finishes with done(), which starts the event loop and keeps the window open until you close it.

Before you dive further into this chapter, experiment with the turtle module: input different values and check the different outputs you get.

The turtle module you just used is one example of the high-level modules that the standard library offers.

Other examples of high-level modules include:

- difflib: To check the differences line by line across two blocks of text.
- re: For regular expressions, which will be covered in the Being Pythonic course.
- sqlite3: To create and interact with SQLite databases.
- Multiple data compressing and archiving modules, such as gzip, zipfile, and tarfile.
- xml, json, csv, and configparser: For working with multiple file formats.
- sched: To schedule events.
- argparse: For the straightforward creation of command-line interfaces.

Next, you will use another high-level module, argparse, to create a command-line interface that echoes the words passed in and, optionally, capitalizes them, all in a few lines of code. Because parse_args() reads the arguments given on the command line, save this code in a file named echo.py and run it from a terminal, for example, python echo.py "hello world" -c:

import argparse
parser = argparse.ArgumentParser()
parser.add_argument("message", help="Message to be echoed")
parser.add_argument("-c", "--capitalize", action="store_true")
args = parser.parse_args()
if args.capitalize:
    print(args.message.capitalize())
else:
    print(args.message)

This code example creates an instance of the ArgumentParser class, which helps you to create command-line interface applications.

It then defines two arguments in lines 3 and 4: message and capitalize.

Note that capitalize can also be passed as -c, and it becomes a Boolean flag because its action is set to store_true instead of the default action. At that point, you can call parse_args(), which takes the arguments passed on the command line, validates them, and exposes them as attributes of args.

The code then takes the input message and chooses whether to capitalize it based on the flag.
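Because parse_args() reads sys.argv by default, the script is awkward to try out from an interpreter. As a minimal sketch, you can pass parse_args() an explicit list of strings instead, which is handy for experimenting and testing; the build_parser() and echo() helper names below are hypothetical.

```python
import argparse


def build_parser():
    # The same two arguments as echo.py: a positional message and a -c flag
    parser = argparse.ArgumentParser()
    parser.add_argument("message", help="Message to be echoed")
    parser.add_argument("-c", "--capitalize", action="store_true")
    return parser


def echo(argv):
    # parse_args() accepts an explicit list instead of reading sys.argv
    args = build_parser().parse_args(argv)
    if args.capitalize:
        return args.message.capitalize()
    return args.message


print(echo(["hello world"]))        # hello world
print(echo(["-c", "hello world"]))  # Hello world
```

Note that str.capitalize() uppercases only the first character and lowercases the rest, so "SPAM" becomes "Spam".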

#### Lower-Level Modules
The standard library also contains multiple lower-level modules that users rarely interact with directly. In many areas, developers reach for third-party libraries instead: good examples are the different internet protocols, text formatting and templating, interacting with C code, testing, serving HTTP sites, and so on. The standard library comes with low-level modules to satisfy the needs of users in many of those scenarios, but you will usually see Python developers relying on libraries such as jinja2, requests, flask, cython, and cffi, which typically build on lower-level facilities (often from the standard library itself) and provide a nicer, simpler, more powerful interface. It is not that you cannot create an extension with the C API or ctypes; rather, cython removes much of the boilerplate that those standard library tools require you to write and optimize yourself, even for the most common scenarios.

Finally, there is another type of low-level module, which extends or simplifies the language. Notable examples of these are the following:

- asyncio: To write asynchronous code
- typing: To add type hints
- contextvars: To save state based on the context
- contextlib: To help with the creation of context managers
- doctest: To verify code examples in documentation and docstrings
- pdb and bdb: To access debugging tools
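As a small taste of contextlib, the sketch below uses its contextmanager decorator to turn a generator function into a context manager. The tag() function and the list it writes to are illustrative assumptions, not part of any library.

```python
from contextlib import contextmanager


@contextmanager
def tag(name, log):
    # Code before yield runs when the with block is entered
    log.append(f"<{name}>")
    try:
        yield
    finally:
        # Code after yield runs on exit, even if the block raised an exception
        log.append(f"</{name}>")


events = []
with tag("p", events):
    events.append("hello")
print(events)  # ['<p>', 'hello', '</p>']
```

Writing the same behavior as a class would need separate __enter__() and __exit__() methods; the decorator keeps setup and cleanup together in one function.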

There are also modules such as dis, ast, and code that allow the developer to inspect, interact with, and manipulate the Python interpreter and the runtime environment, but those aren't required by most beginner and intermediate developers.

#### Knowing How to Navigate in the Standard Library
Getting to know the standard library is key for any intermediate/advanced developer, even if you don't know how to use all the modules. Knowing what the library contains and when modules can be used provides any developer with a boost in speed and quality when developing Python applications.

While developers coming from other languages may try to implement everything on their own from scratch, experienced Python programmers will typically first ask themselves, "How can I do this with the standard library?" since using the code in the standard library brings multiple benefits, which will be explained later in the chapter.

The standard library makes code simpler and easier to understand. By using modules such as dataclasses, you can avoid writing code by hand that would otherwise take many lines and would be likely to contain bugs.

The dataclasses module lets you create value-semantic types with fewer keystrokes. Its dataclass decorator, applied to a class, generates the boilerplate for the most common methods, such as __init__(), __repr__(), and __eq__().
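To get a feel for that boilerplate, here is a rough, hand-written approximation of what @dataclass generates for a class with two fields. The real generated code supports more options, so treat ManualPoint as an illustration only.

```python
class ManualPoint:
    """Hand-written approximation of a @dataclass with fields x and y."""

    def __init__(self, x: int, y: int):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"{type(self).__name__}(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other):
        # Compare field by field, but only against instances of the same class
        if other.__class__ is not self.__class__:
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    # Like @dataclass with eq=True (and not frozen), make instances unhashable
    __hash__ = None


print(ManualPoint(10, 20))                         # ManualPoint(x=10, y=20)
print(ManualPoint(10, 20) == ManualPoint(10, 20))  # True
```

Every new field means touching all three methods by hand, which is exactly the kind of repetitive code where bugs creep in.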

##### Exercise: Using the dataclass Module
In this exercise, you will create a class to hold data for a geographical point. This is a simple structure with two coordinates, x and y.

Other developers who need to store geographical information will work with these points daily, so they need to be able to create them with an easy constructor, print them to see their values, and convert them into a dictionary to save them in their database and share them with other people.

In [1]:
# Import the dataclasses module
import dataclasses

In [3]:
#defining a dataclass
@dataclasses.dataclass
class Point:
    x: int
    y: int 

In [4]:
# Create an instance holding the data for a geographical point
p = Point(x=10, y=20)
print(p)

Point(x=10, y=20)


In [5]:
p2 = Point(x=10, y=20)

p == p2

True

In [6]:
#serialize the data
dataclasses.asdict(p)

{'x': 10, 'y': 20}

The dataclasses module is part of the standard library, so most experienced users will understand how a class decorated with @dataclass behaves. A custom implementation of those methods, by contrast, would require either further documentation or for users to read and fully understand the code of every class that hand-crafts them.

Moreover, using battle-tested code from the standard library is also key to writing an efficient and robust application. Functions such as sorted() and the list.sort() method use a custom sorting algorithm known as Timsort, a hybrid stable sorting algorithm derived from merge sort and insertion sort. It will usually perform better and contain fewer bugs than any algorithm that a user could implement in a limited amount of time.
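One practical consequence is that Python's sort is guaranteed to be stable: items that compare equal keep their original relative order. A short sketch with made-up records shows this when sorting by a single field.

```python
# Made-up records of (name, department); sort them by department only
staff = [("Eric", "Comedy"), ("Graham", "Writing"), ("John", "Comedy"), ("Terry", "Writing")]

by_department = sorted(staff, key=lambda person: person[1])
print(by_department)
# [('Eric', 'Comedy'), ('John', 'Comedy'), ('Graham', 'Writing'), ('Terry', 'Writing')]
```

Within each department, the names stay in the order they had in the input, which lets you build multi-level sorts out of several simple passes.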

##### Exercise: Extending the echo.py Example
Building on the capitalize tool that you saw earlier in this topic, you can implement an enhanced version of the echo tool in Linux, which is used in some embedded systems that have Python. You will use the previous code for capitalize and enhance it with a nicer description. The new version will also let the echo command take more than one word and repeat the words passed in.

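Write the enhanced script to echo.py with the %%writefile cell magic, then use %run to view its help text and try it out:

In [7]:
%%writefile echo.py
import argparse
parser = argparse.ArgumentParser(description="""
Prints out the words passed in, capitalizes them if required
and repeats them in as many lines as requested.
""")
parser.add_argument("message", nargs="+", help="Words to be echoed")
parser.add_argument("-c", "--capitalize", action="store_true")
parser.add_argument("-r", "--repeat", type=int, default=1,
                    help="Number of lines to print")
args = parser.parse_args()
words = [w.capitalize() for w in args.message] if args.capitalize else args.message
for _ in range(args.repeat):
    print(" ".join(words))

In [8]:
%run echo.py -h

In [9]:
%run echo.py hello world -c -r 2

Hello World
Hello World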

#### Using List Comprehensions
List comprehensions are a flexible, expressive way of writing Python expressions to create sequences of values. They make iterating over the input and building the resulting list implicit so that program authors and readers can focus on the important features of what the list represents. It is this concision that makes list comprehensions a Pythonic way of working with lists or sequences.

List comprehensions are built out of bits of Python syntax we have already seen. They are surrounded by square brackets ([]), the same symbols Python uses for a list literal. They contain a for element in collection clause, which is how Python iterates over the members of a collection. Optionally, they can filter elements out using an if clause.

##### Exercise: Using List Comprehensions

In [3]:
cubes = []
for x in [1,2,3,4,5]:
    cubes.append(x**3)
print(cubes)

[1, 8, 27, 64, 125]


Understanding this code involves keeping track of the state of the cubes variable, which starts as an empty list, and of the x variable, which is used as a cursor to keep track of the program's position in the list. This is all irrelevant to the task at hand, which is to list the cubes of each of these numbers. It would be better (more Pythonic, even) to remove all the irrelevant details. Luckily, list comprehensions allow us to do that.

In [4]:
cubes = [x**3 for x in [1,2,3,4,5]]
print(cubes)

[1, 8, 27, 64, 125]


Now the code is as short as it can be. Rather than telling you the recipe that the computer follows to build a list of the cubes of the numbers 1, 2, 3, 4, and 5, it tells you that the list contains the cube of x for every x in [1, 2, 3, 4, 5].

This is the essence of Pythonic coding: reducing the gap between what you say and what you mean when you tell the computer what it should do.

A list comprehension can also filter its inputs when building a list. To do this, you add an if expression to the end of the comprehension, where the expression can be any test of an input value that returns True or False. This is useful when you want to transform some of the values in a list while ignoring others. As an example, you could build a photo gallery of social media posts by making a list of thumbnail images from photos found in each post, but only when the posts are pictures, not text status updates.

You want to get Python to shout the names of the Monty Python cast, but only those whose name begins with "T". Enter the following Python code into a notebook:

In [6]:
names = ["Graham Chapman", "John Cleese", "Terry Gilliam", "Eric Idle", "Terry Jones"]

Those are the names you are going to use. Enter this list comprehension to filter only those that start with "T" and operate on them:

In [8]:
print([name.upper() for name in names if name.startswith("T")])

['TERRY GILLIAM', 'TERRY JONES']


##### Exercise: Using Multiple Input Lists
All the examples you have seen so far build one list out of another by performing an expression on each member of the list. You can also define a comprehension over multiple lists by using a different element name for each of the lists.

To show how this works, in this exercise, you will be multiplying the elements of two lists together. The Spam Café in Monty Python's Flying Circus (refer to the preceding note) famously served a narrow range of foodstuffs mostly centered around a processed meat product. You will use ingredients from its menu to explore multiple-list comprehension:

In [9]:
print([x*y for x in ['spam', 'eggs', 'chips'] for y in [1,2,3]])

['spam', 'spamspam', 'spamspamspam', 'eggs', 'eggseggs', 'eggseggseggs', 'chips', 'chipschips', 'chipschipschips']


Inspecting the result shows that the collections are iterated in a nested fashion, with the rightmost collection on the inside of the nest and the leftmost on the outside. Here, if x is set to spam, then x*y is calculated with y being equal to each of the values of 1, 2, and then 3 before x is set to eggs, and so on.
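The nesting can be made explicit by writing the same computation as ordinary loops. In this sketch, the first for clause of the comprehension becomes the outer loop and the second becomes the inner loop.

```python
foods = ['spam', 'eggs', 'chips']
counts = [1, 2, 3]

# The first for clause is the outer loop, the second is the inner loop
result = []
for x in foods:
    for y in counts:
        result.append(x * y)

print(result == [x * y for x in foods for y in counts])  # True
```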


In [10]:
print([x*y for x in [1,2,3] for y in ['spam', 'eggs', 'chips']])

['spam', 'eggs', 'chips', 'spamspam', 'eggseggs', 'chipschips', 'spamspamspam', 'eggseggseggs', 'chipschipschips']


Swapping the order of the lists changes the order of the results in the comprehension. Now, x is initially set to 1, then y to each of spam, eggs, and chips, before x is set to 2, and so on. While the result of any one multiplication does not depend on its order (for instance, 'spam'*2 and 2*'spam' both give 'spamspam'), the fact that the lists are iterated in a different order means that the same results are computed in a different sequence.



In [11]:
# The same list can be iterated multiple times in a list comprehension; the lists for x and y do not have to be different:

numbers = [1,2,3]
print([x**y for x in numbers for y in numbers])

[1, 1, 1, 2, 4, 8, 3, 9, 27]


##### Activity: Building a Chess Tournament
In this activity, you will use a list comprehension to create the fixtures for a chess tournament. Fixtures are strings of the form "player 1 versus player 2." Because there is a slight advantage to playing as white, you also want to generate the "player 2 versus player 1" fixture so that the tournament is fair. But you do not want people playing against themselves, so you should also filter out fixtures such as "player 1 versus player 1."

In [14]:
names = ['Magnus Carlsen', 'Fabiano Caruana', 'Yifan Hou', 'Wenjun Ju']
fixtures = [f'{p1} vs. {p2}' for p1 in names for p2 in names if p1 != p2]
print(fixtures)

['Magnus Carlsen vs. Fabiano Caruana', 'Magnus Carlsen vs. Yifan Hou', 'Magnus Carlsen vs. Wenjun Ju', 'Fabiano Caruana vs. Magnus Carlsen', 'Fabiano Caruana vs. Yifan Hou', 'Fabiano Caruana vs. Wenjun Ju', 'Yifan Hou vs. Magnus Carlsen', 'Yifan Hou vs. Fabiano Caruana', 'Yifan Hou vs. Wenjun Ju', 'Wenjun Ju vs. Magnus Carlsen', 'Wenjun Ju vs. Fabiano Caruana', 'Wenjun Ju vs. Yifan Hou']
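For comparison, the itertools module, covered in the Itertools section of this chapter, offers permutations(), which yields every ordered pair of elements taken from different positions. With unique player names, this sketch produces the same fixtures in the same order as the comprehension above; because it compares positions rather than values, duplicate names would behave differently.

```python
from itertools import permutations

names = ['Magnus Carlsen', 'Fabiano Caruana', 'Yifan Hou', 'Wenjun Ju']

# Every ordered pair of two different players
fixtures = [f'{p1} vs. {p2}' for p1, p2 in permutations(names, 2)]
print(len(fixtures))  # 12: each of the 4 players faces 3 opponents
```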


#### Set and Dictionary Comprehensions
List comprehensions are handy ways to concisely build sequences of values in Python. Other forms of comprehensions are also available, which you can use to build other collection types. A set is an unordered collection: you can see what elements are in a set, but you cannot index into a set nor insert an object at a particular location in the set because the elements are not ordered. An element can only be present in a set once, whereas it could appear in a list multiple times.

Sets are frequently useful in situations where you want to quickly test whether an object is in a collection but do not need to track the order of the objects in the collection. For example, a web service might keep track of all of the active session tokens in a set, so that when it receives a request, it can test whether the session token corresponds to an active session.

A dictionary is a collection of pairs of objects, where one object in the pair is called the key, and the other is called the value. In this case, you associate a value with a particular key, and then you can ask the dictionary for the value associated with that key. Each key may only be present in a dictionary once, but multiple keys may be associated with the same value. While the name "dictionary" suggests a connection between terms and their definitions, dictionaries are commonly used as indices (and, therefore, a dictionary comprehension is often used to build an index). Going back to your web service example, different users of the service could have different permissions, thus limiting the actions that they can perform. The web service could construct a dictionary in which the keys are session tokens, and the values represent user permissions. This is so that it can quickly tell whether a request associated with a given session is permissible.

The syntax for both set and dictionary comprehensions looks very similar to list comprehension, with the square brackets ([]) simply replaced by curly braces ({}). The difference between the two is how the elements are described. For a set, you need to indicate a single element, for example, { x for x in … }. For a dictionary, you need to indicate a pair containing the key and the value, for example, { key:value for key in… }

In [15]:
#to get a list
print([a + b for a in [0,1,2,3] for b in [4,3,2,1]])

[4, 3, 2, 1, 5, 4, 3, 2, 6, 5, 4, 3, 7, 6, 5, 4]


In [16]:
#change the list above to a set

print({a+b for a in [0,1,2,3] for b in [4,3,2,1]})

{1, 2, 3, 4, 5, 6, 7}


Notice that the set created in the second cell is much shorter than the list created in the first. The reason is that a set does not contain duplicate entries: try counting how many times the number 4 appears in each collection. It's in the list four times (because 0 + 4 = 4, 1 + 3 = 4, 2 + 2 = 4, and 3 + 1 = 4), but sets don't retain duplicates, so there's only one instance of the number 4 in the set. If you just removed the duplicates from the list produced in the first cell, you'd have a list of [4, 3, 2, 1, 5, 6, 7]. Sets don't preserve the order of their elements either, so the numbers appear in a different order in the set. The fact that they happen to appear in numerical order is an implementation detail of Python's set type (small integers hash to themselves), not something your code should rely on.
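If you want the de-duplicated list in its original order, one common idiom relies on the fact that dictionary keys are unique and, since Python 3.7, keep their insertion order. This sketch checks the counts described above as well.

```python
sums = [a + b for a in [0, 1, 2, 3] for b in [4, 3, 2, 1]]

# dict.fromkeys() keeps the first occurrence of each value, in order
unique_in_order = list(dict.fromkeys(sums))
print(unique_in_order)  # [4, 3, 2, 1, 5, 6, 7]

print(sums.count(4), len(sums), len(set(sums)))  # 4 16 7
```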




#### Using Dictionary Comprehensions
Curly-brace comprehensions can also be used to create a dictionary. The expression on the left-hand side of the for keyword in the comprehension should contain a key-value pair: you write the expression that generates the dictionary keys to the left of the colon and the expression that generates the values to the right. Note that a key can only appear once in a dictionary.

In [18]:
names = ["Eric", "Graham", "Terry", "John", "Terry"]
print({k: len(k) for k in names})

{'Eric': 4, 'Graham': 6, 'Terry': 5, 'John': 4}


Notice that the entry for Terry only appears once, because dictionaries cannot contain duplicate keys. You have created an index of the length of each name, keyed by name. An index like this could be useful in a game, where it could help lay out the score table for each player without repeatedly recalculating the length of each player's name.




##### Activity: Building a Scorecard Using Dictionary Comprehensions and Multiple Lists
You are the backend developer for a renowned college. The management has asked you to build a demo scorecard for their students based on the marks they have achieved in their exams.

Your goal in this activity is to use dictionary comprehension and lists in Python to build a demo scorecard for four students in the college.

In [23]:
students = ['Eric', 'Mark', 'Wade', 'Betty']
scores = [50, 79, 98, 56]
score = {students[i]: scores[i] for i in range(len(students))}

print(score)

{'Eric': 50, 'Mark': 79, 'Wade': 98, 'Betty': 56}
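A more Pythonic variant avoids indexing altogether by pairing the two lists with the built-in zip(). This sketch produces the same scorecard.

```python
students = ['Eric', 'Mark', 'Wade', 'Betty']
scores = [50, 79, 98, 56]

# zip() pairs each student with the score at the same position
score = {student: mark for student, mark in zip(students, scores)}
print(score)  # {'Eric': 50, 'Mark': 79, 'Wade': 98, 'Betty': 56}

# Equivalent, without writing a comprehension
print(score == dict(zip(students, scores)))  # True
```

Keep in mind that zip() stops at the end of the shorter list, so a missing score silently drops that student rather than raising an error.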


#### Default Dictionary
The built-in dictionary type considers it to be an error when you try to access the value for a key that doesn't exist. It will raise a KeyError, which you have to handle or your program crashes. Often, that's a good idea. If the programmer doesn't get the key correct, it could indicate a typo or a misunderstanding of how the dictionary is used.

It's often a good idea, but not always. Sometimes a programmer cannot know what the dictionary contains, for example when it is created from a file supplied by the user or from the content of a network request. In situations like this, any of the keys the programmer expects could be missing, and handling KeyError instances everywhere would be tedious and repetitive and would make the intent of the code harder to see.

For these situations, Python provides the collections.defaultdict type. It works like a regular dictionary, except that you can give it a function that creates a default value to use when a key is missing. Rather than raise an error, it calls that function and returns the result.
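A classic use is grouping values without first checking whether a key exists. In this sketch, list serves as the function that creates the default value, so the first access to a new key starts an empty list.

```python
from collections import defaultdict

names = ["Graham Chapman", "John Cleese", "Terry Gilliam", "Eric Idle", "Terry Jones"]

# list() is called to create an empty list the first time a key is seen
by_first_name = defaultdict(list)
for full_name in names:
    first, surname = full_name.split()
    by_first_name[first].append(surname)

print(by_first_name["Terry"])    # ['Gilliam', 'Jones']
print(by_first_name["Michael"])  # []
```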

##### Exercise: Adopting a Default Dict
In this exercise, you will be using a regular dictionary that raises a KeyError when you try to access a missing key:

In [24]:
john = { 'first_name': 'John', 'surname': 'Cleese' }
john['middle_name']

KeyError: 'middle_name'

In [26]:
#Now, import the defaultdict from collections and wrap the dictionary in a defaultdict:

from collections import defaultdict
safe_john = defaultdict(str, john)

print(safe_john['middle_name'])





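No exception is triggered at this stage; instead, an empty string is returned, and that empty string is also stored under the 'middle_name' key. The first argument to the constructor of defaultdict, called default_factory, can be any callable (that is, function-like) object that takes no arguments. You can use it to return a default value that is relevant to your domain. Because default_factory is not given the missing key, computing a value based on the key requires a dict subclass that defines a __missing__() method instead.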
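Two details are worth seeing in code. First, reading a missing key from a defaultdict stores the default under that key. Second, because default_factory receives no arguments, a default computed from the key calls for a dict subclass that defines __missing__(); the KeyAsDefault class below is a hypothetical example.

```python
from collections import defaultdict

john = {'first_name': 'John', 'surname': 'Cleese'}
safe_john = defaultdict(str, john)

print('middle_name' in safe_john)  # False
value = safe_john['middle_name']   # reading a missing key stores str(), that is ''
print(repr(value), 'middle_name' in safe_john)  # '' True


class KeyAsDefault(dict):
    """Hypothetical dict subclass that falls back to the key itself."""

    def __missing__(self, key):
        # d[key] calls this only when key is absent; nothing is stored
        return key


labels = KeyAsDefault({'py': 'Python'})
print(labels['py'], labels['rb'], 'rb' in labels)  # Python rb False
```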


Next, create a defaultdict that uses a lambda as its default_factory, the function that returns the default value for missing keys.

In [28]:
from collections import defaultdict
courses = defaultdict(lambda: 'No!')
courses['Java'] = 'This is Java'

In [29]:
print(courses['Python'])

No!


In [30]:
print(courses['Java'])

This is Java


The benefit of the default dictionary is that in situations where you know expected keys are likely to be missing, you can work with default values instead of sprinkling your code with exception-handling blocks. This is another example of Pythonicity: if what you mean is "use the value for the 'foo' key, but if that doesn't exist, use 'bar' as the value," then you should write that, rather than "use the value for the 'foo' key, but if you get an exception and the exception is KeyError, then use 'bar' as the value."

Default dicts are great for working with untrusted input, such as a file chosen by the user or an object received over the network. A network service shouldn't expect any input it gets from a client to be well formatted. If it treats the data it receives in a request as a JSON object, it should be ready for the data not to be in JSON format. Even if the data really is JSON, the program should not expect all of the keys defined by the API to have been supplied by the client. The default dict gives you a really concise way to work with such under-specified data.
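A minimal sketch of that idea, assuming requests arrive as JSON text and using hypothetical name and role fields: malformed input and JSON that is not an object are both treated as an empty payload, and any missing key falls back to an empty string. The parse_request() helper is illustrative, not part of any framework.

```python
import json
from collections import defaultdict


def parse_request(raw):
    """Hypothetical helper: turn untrusted JSON text into a forgiving mapping."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}  # not JSON at all
    if not isinstance(data, dict):
        data = {}  # valid JSON, but not a JSON object
    return defaultdict(str, data)


request = parse_request('{"name": "Eric"}')
print(repr(request['name']), repr(request['role']))  # 'Eric' ''

print(parse_request('not json')['name'] == '')  # True
```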

#### Iterators
The Pythonic secret that enables comprehensions to find all of the entries in a list, range, or other collection is an iterator. Supporting iterators in your own classes opens them up for use in comprehensions, for…in loops, and anywhere that Python works with collections. Your collection must implement a method called __iter__(), which returns the iterator.

The iterator itself is also a Python object with a simple contract. It must provide a __next__() method, along with an __iter__() method that returns the iterator itself so that the iterator can also be used directly in a loop. Each time __next__() is called, the iterator returns the next value in the collection. When the iterator reaches the end of the collection, __next__() raises StopIteration to signal that the iteration should terminate.

If you've used exceptions in other programming languages, you may be surprised by this use of an exception to signal a fairly commonplace situation. After all, plenty of loops reach an end, so it's not exactly an exceptional circumstance. Python is not so dogmatic about exceptions, favoring simplicity and expressiveness over universal rules-lawyering.
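To see how a for loop relies on this contract, here is a rough sketch of what Python does behind the scenes when it loops over a list: it gets an iterator with iter(), calls next() repeatedly, and treats StopIteration as the normal end of the loop.

```python
numbers = [1, 2, 3]

# Roughly what "for n in numbers: total += n" does behind the scenes
total = 0
iterator = iter(numbers)       # calls numbers.__iter__()
while True:
    try:
        n = next(iterator)     # calls iterator.__next__()
    except StopIteration:      # the end of the collection, not an error
        break
    total += n

print(total)  # 6
```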

Once you've learned the techniques to build iterators, the applications are limitless. Your own collections or collection-like classes can supply iterators so that programmers can work with them using Pythonic collection techniques such as comprehensions. For example, an application that stores its data model in a database can use an iterator to retrieve each row that matches a query as a separate object in a loop or comprehension. A programmer can say, "For each row in the database, do this to the row," and treat it like a list of rows, when your data model object is secretly running a database query each time the iterator's __next__() method is called.

##### Exercise: The Simplest Iterator
The easiest way to provide an iterator for your class is to use one from another object. If you are designing a class that controls access to its own collection, then it might be a good idea to let programmers iterate over your object using the collection's iterator. In this case, just have __iter__() return the appropriate iterator.

In this exercise, you will code an Interrogator who asks awkward questions to people on a quest. It takes a list of questions in its constructor. You will write a program that prints these questions in turn.

Using an Interrogator in a loop probably means asking each of its questions in sequence. The easiest iterator that achieves this is the iterator for the collection of questions, so implement the __iter__() method to return that iterator.


In [9]:
class Interrogator:
    def __init__(self, questions):
        self.questions = questions

    # Add the __iter__() method, which hands back the list's own iterator:
    def __iter__(self):
        return iter(self.questions)
    
#create a list of questions
questions = ["What is your name?", "What is your quest?", "What is the average airspeed velocity of an unladen swallow?"]

#Create an Interrogator:
awkward_person = Interrogator(questions)



In [10]:
#Now use the Interrogator in a for loop:
for question in awkward_person:
    print(question)

What is your name?
What is your quest?
What is the average airspeed velocity of an unladen swallow?


On the face of it, you've done nothing more than add a level of indirection between the Interrogator class and the collection of questions. From an implementation perspective, that's exactly right. However, from a design perspective, what you've done is much more powerful. You've designed an Interrogator class that programmers can ask to iterate over its questions, without having to tell the programmer anything about how the Interrogator stores its questions. While it's just forwarding a method call to a list object today, you could change that tomorrow to use a SQLite3 database or a web service call, and programmers using the Interrogator class will not need to change anything.
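As a sketch of that design freedom, the hypothetical SQLiteInterrogator below keeps its questions in an in-memory SQLite table (the table and column names are assumptions), yet callers still iterate over it with a plain for loop.

```python
import sqlite3


class SQLiteInterrogator:
    """Hypothetical variant of Interrogator that keeps its questions in SQLite."""

    def __init__(self, connection):
        self.connection = connection

    def __iter__(self):
        rows = self.connection.execute("SELECT text FROM questions ORDER BY id")
        # Hand back a list iterator, just as the original class does
        return iter([text for (text,) in rows])


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE questions (id INTEGER PRIMARY KEY, text TEXT)")
connection.executemany(
    "INSERT INTO questions (text) VALUES (?)",
    [("What is your name?",), ("What is your quest?",)],
)

# Callers loop over it exactly as before
for question in SQLiteInterrogator(connection):
    print(question)
```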

For a more complicated case, you need to write your own iterator. The iterator is required to implement a __next__() method, which returns the next element in the collection or raises StopIteration when it gets to the end.

##### Exercise: A Custom Iterator
In this exercise, you'll implement a classical-era algorithm called the Sieve of Eratosthenes. To find the prime numbers between 2 and an upper bound value, n, first list all of the numbers in that range. 2 is a prime, so return it. Then, remove 2 and all multiples of 2 from the list, and return the new lowest number (which will be 3). Continue until there are no more numbers left in the collection. Every number returned this way is a successively higher prime. It works because any number you find in the collection to return was not removed at an earlier step, so it has no prime factors smaller than itself.

First, build the architecture of the class. Its constructor needs to take the upper bound value and generate the list of possible primes. The object can be its own iterator, so its __iter__() method will return itself:



In [11]:
# Define the PrimesBelow class and its initializer:
class PrimesBelow:
    def __init__(self, bound):
        self.candidate_numbers = list(range(2, bound))

    # Implement the __iter__() method to return itself:
    def __iter__(self):
        return self

    # Define the __next__() method and the exit condition.
    # If there are no remaining numbers in the collection, then the iteration can stop:
    def __next__(self):
        if not self.candidate_numbers:
            raise StopIteration

        # The lowest remaining number is prime; sieve it and its multiples out
        next_prime = self.candidate_numbers[0]
        self.candidate_numbers = [x for x in self.candidate_numbers if x % next_prime != 0]
        return next_prime


# Use an instance of this class to find all the prime numbers below 100:
primes_to_a_hundred = [prime for prime in PrimesBelow(100)]
print(primes_to_a_hundred)

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]


The main body of the algorithm is in the __next__() method. With each iteration, it selects the lowest number remaining in the collection as next_prime, sieves that number and its multiples out of the collection, and returns it. If there are no numbers left, it raises StopIteration instead.

This exercise demonstrates that by implementing an iterative algorithm as a Python iterator, you can treat it like a collection. In fact, the program does not actually build the collection of all of the prime numbers: you did that yourself with the list comprehension over PrimesBelow(100). Otherwise, PrimesBelow generates one number at a time, whenever its __next__() method is called. This is a great way to hide the implementation details of an algorithm from a programmer. Whether you actually give them a collection of objects to iterate over or an iterator that computes each value as it is requested, programmers can use the results in exactly the same way.

##### Exercise: Controlling the Iteration
You do not have to use an iterator in a loop or comprehension. You can use the iter() function to get its argument's iterator object, and then pass that to the next() function to return successive values from the iterator. These functions call through to the __iter__() and __next__() methods, respectively. You can use them to add custom behavior to an iteration or to gain more control over the iteration.

In this exercise, you will print the prime numbers below 5. An error should be raised when the object runs out of prime numbers. To do this, you will use the PrimesBelow class created in the previous exercise:

In [12]:
class PrimesBelow:
    def __init__(self, bound):
        self.candidate_numbers = list(range(2,bound))
    def __iter__(self):
        return self
    def __next__(self):
        if len(self.candidate_numbers) == 0:
            raise StopIteration
        next_prime = self.candidate_numbers[0]
        self.candidate_numbers = [x for x in self.candidate_numbers if x % next_prime != 0]
        return next_prime
primes_under_five = iter(PrimesBelow(5))

In [15]:
#Repeatedly use next() with this object to generate successive prime numbers:
next(primes_under_five)

StopIteration: 

The first two calls return 2 and then 3. When the object runs out of prime numbers, the subsequent call to next() raises the StopIteration error shown above.


Being able to step through an iteration manually is incredibly useful in programs that are driven by a sequence of inputs, including a command interpreter. You can treat the input stream as an iteration over a list of strings, where each string represents a command. Call next() to get the next command, work out what to do, and then execute it. Then, print the result, and go back to next() to await the subsequent command. When StopIteration is raised, the user has no more commands for your program, and it can exit.
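A minimal sketch of that idea follows. The run_commands() helper and the command names add and echo are hypothetical; a list of strings stands in for the input stream (in a real program, iterating over sys.stdin yields one line at a time in the same way).

```python
def run_commands(lines):
    """Hypothetical mini interpreter: reads commands one at a time with next()."""
    commands = iter(lines)
    results = []
    while True:
        try:
            command = next(commands)   # ask for the next command
        except StopIteration:
            break                      # no more commands: the program can exit
        if not command.strip():
            continue                   # ignore blank lines
        name, *args = command.split()
        if name == "add":
            results.append(sum(int(arg) for arg in args))
        elif name == "echo":
            results.append(" ".join(args))
        else:
            results.append(f"unknown command: {name}")
    return results


print(run_commands(["add 2 3", "echo hello world", "", "jump"]))
# [5, 'hello world', 'unknown command: jump']
```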

#### Itertools
Iterators are useful for describing sequences, such as Python lists and ranges, and sequence-like collections, such as your own data types, that provide ordered access to their contents. Iterators make it easy to work with these types in a Pythonic way. Python's library includes the itertools module, which has a selection of helpful functions for combining, manipulating, and otherwise working with iterators. In this section, you will use a couple of helpful tools from the module. There are plenty more available, so be sure to check out the official documentation for itertools.

One of the important uses of itertools is in dealing with infinite sequences. There are plenty of situations in which a sequence does not have an end: everything from infinite series in mathematics to the event loop in a graphical application. A graphical user interface is usually built around an event loop in which the program waits for an event (such as a keypress, a mouse click, a timer expiring, or something else) and then reacts to it. The stream of events can be treated as a potentially infinite list of event objects, with the program taking the next event object from the sequence and doing its reaction work. Iterating over such a sequence with either a Python for..in loop or a comprehension will never terminate. There are functions in itertools for providing a window onto an infinite sequence, and the following exercise will look at one of those.

##### Exercise: Using Infinite Sequences and takewhile
An alternative to the Sieve of Eratosthenes for generating prime numbers is to test each number in turn to see whether it has any divisors other than 1 and itself. This algorithm uses a lot more time than the Sieve in return for a lot less space.

In this exercise, you will implement this trial-division algorithm, which uses less space than the Sieve, as an iterator:

In [16]:

class Primes:
    def __init__(self):
        self.current = 2
        
    def __iter__(self):
        return self
     
    def __next__(self):
        while True:
            current = self.current
            square_root = int(current ** 0.5)
            is_prime = True
            if square_root >= 2:
                for i in range(2, square_root + 1):
                    if current % i == 0:
                        is_prime = False
                        break
            self.current += 1
            if is_prime:
                return current

Note: The class you just entered is an iterator, but its __next__() method never raises StopIteration, so the iteration never ends. Even though you know that each prime number it returns is bigger than the previous one, a comprehension doesn't know that, so you can't simply filter out large values.

In [None]:
#Enter the following code to get a list of primes that are lower than 100:
[p for p in Primes() if p < 100]

Because the iterator never raises StopIteration, this program will never finish, and you'll have to force it to exit. The reason is that this list comprehension is equivalent to the following loop:

In [18]:
myList = []
for p in Primes():
    if p < 100:
        myList.append(p)

KeyboardInterrupt: 

To work with this iterator, itertools provides the takewhile() function, which wraps the iterator in another iterator. You also supply takewhile() with a Boolean function, and its iteration will take values from the supplied iterator until the function returns False, at which time it raises StopIteration and stops. This makes it possible to find the prime numbers below 100 from the infinite sequence entered previously.

In [19]:
#Use takewhile() to turn the infinite sequence into a finite one:
import itertools
print([p for p in itertools.takewhile(lambda x: x<100, Primes())]) #the 'takewhile' wraps the iterator into another iterator

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
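takewhile() is not the same as filtering: it stops at the first value for which the function returns False, even if later values would pass. That is exactly what makes it safe to use on an infinite sequence. A small sketch with arbitrary values:

```python
import itertools

readings = [1, 4, 6, 3, 8, 2]
# takewhile() stops for good at the first value that fails the test (6)...
taken = list(itertools.takewhile(lambda x: x < 5, readings))
# ...whereas a filtering comprehension checks every value.
filtered = [x for x in readings if x < 5]
print(taken)     # [1, 4]
print(filtered)  # [1, 4, 3, 2]

# Because it stops, takewhile() can cut an infinite sequence down to size.
squares = list(itertools.takewhile(lambda n: n < 50, (n * n for n in itertools.count(1))))
print(squares)   # [1, 4, 9, 16, 25, 36, 49]
```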


##### Exercise: Turning a Finite Sequence into an Infinite One, and Back Again
In this exercise, consider a turn-based game, such as chess. The person playing white makes the first move. Then, the person playing black takes their turn. Then white. Then black. Then white, black, white, and so on until the game ends. If you had an infinite list of white, black, white, black, white, and so on, then you could always look at the next element to decide whose turn it is:

In [21]:
import itertools
players = ['White', 'Black']

#Use the itertools function cycle to generate an infinite sequence of turns:
turns = itertools.cycle(players)

To demonstrate that this has the expected behavior, you'll want to turn it back into a finite sequence so that you can view the first few members of the turns iterator. You can use takewhile() for that, and, here, combine it with the count() function from itertools, which produces an infinite sequence of numbers.

In [22]:
#List the players who take the first 10 turns in a chess game:
countdown = itertools.count(10, -1)
print([turn for turn in itertools.takewhile(lambda x:next(countdown)>0, turns)])

['White', 'Black', 'White', 'Black', 'White', 'Black', 'White', 'Black', 'White', 'Black']
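The lambda above ignores its argument and uses the countdown purely as a counter. As an alternative (a different tool from the same module, not the approach used above), itertools.islice() takes the first N items of any iterator directly:

```python
import itertools

players = ['White', 'Black']
turns = itertools.cycle(players)
# islice(iterator, stop) yields at most `stop` items and then stops.
first_ten = list(itertools.islice(turns, 10))
print(first_ten)
# islice() consumed those items, so the cycle carries on from where it stopped.
print(next(turns))  # White
```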


This is the "round-robin" algorithm for allocating actions (in this case, making a chess move) to resources (in this case, the players), and has many more applications than board games. A simple way to do load balancing between multiple servers in a web service or database application is to build an infinite sequence of the available servers and choose one in turn for each incoming request.
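A minimal sketch of round-robin assignment using cycle(); the server names and the assign() helper are hypothetical:

```python
import itertools

# Hypothetical server names, for illustration only.
servers = ["app-1", "app-2", "app-3"]
next_server = itertools.cycle(servers)

def assign(request_id):
    """Pick the next server, in round-robin order, for an incoming request."""
    return request_id, next(next_server)

assignments = [assign(request_id) for request_id in range(5)]
print(assignments)
# [(0, 'app-1'), (1, 'app-2'), (2, 'app-3'), (3, 'app-1'), (4, 'app-2')]
```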

#### Generators
A function that returns a value does all of its computation and then gives up control to its caller, handing back that value. This is not the only possible behavior for a function. It can instead yield a value, which passes control (and the value) back to the caller but leaves the function's state intact. Later, it can yield another value, or finally return to indicate that it is done. A function that yields is called a generator.
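A minimal sketch of a generator (countdown() is an illustrative name, not one from the exercises) shows the function's local state being kept between yields:

```python
def countdown(start):
    """Yield start, start - 1, ..., 1."""
    while start > 0:
        yield start      # hand a value back and pause here
        start -= 1       # resumes here, with `start` intact, on the next request
    # Falling off the end (or a bare return) ends the iteration.

gen = countdown(3)       # calling the function runs none of its body yet
print(next(gen))         # 3: runs until the first yield
print(next(gen))         # 2: resumes after the yield
print(list(gen))         # [1]: the remaining values
```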

Generators are useful because they allow a program to defer or postpone calculating a result until it's required. Finding the successive digits of π, for example, is hard work, and it gets harder as the number of digits increases. If you wrote a program to display the digits of π, you might calculate the first 1,000 digits. Much of that effort will be wasted if the user only asks to see the first 10 digits. Using a generator, you can put off the expensive work until your program actually requires the results.

A real-world example of a situation where generators can help is I/O. A stream of data coming from a network service can be represented by a generator that yields each piece of data as it becomes available and returns when the stream is closed. Using a generator allows the program to pass control back and forth between the I/O stream, when data is available, and the caller, where the data is processed.
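A sketch of the I/O idea, assuming a binary stream with a read() method; io.BytesIO stands in for the network stream here, and read_chunks() and the 4-byte chunk size are illustrative choices:

```python
import io

def read_chunks(stream, chunk_size=4):
    """Yield successive chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:        # read() returns b"" once there is no more data
            return           # ends the iteration (raises StopIteration for the caller)
        yield chunk

stream = io.BytesIO(b"hello world")
for chunk in read_chunks(stream):
    print(chunk)             # b'hell', then b'o wo', then b'rld'
```

For a socket, the analogous call is recv(), which returns an empty bytes object when the other side closes the connection.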

Python internally turns generator functions into objects that use the iterator protocol (namely __iter__, __next__, and the StopIteration error), so the work you put into understanding iterations in the previous section means you already know what generators are doing. There is nothing you can write for a generator that could not be replaced with an equivalent iterator object. However, sometimes a generator is easier to write or understand, and writing code that is easier to understand is at the heart of what it means to be Pythonic.
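To see the equivalence, here is the same sequence written both ways; evens_below() and EvensBelow are illustrative names. The generator needs no explicit state attributes or StopIteration, because Python provides them:

```python
def evens_below(bound):
    """Generator version: local variables hold the state."""
    n = 0
    while n < bound:
        yield n
        n += 2

class EvensBelow:
    """Equivalent iterator written with the iterator protocol."""
    def __init__(self, bound):
        self.bound = bound
        self.n = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.n >= self.bound:
            raise StopIteration
        value = self.n
        self.n += 2
        return value

gen = evens_below(7)
print(iter(gen) is gen)                # True: a generator is its own iterator
print(list(gen), list(EvensBelow(7)))  # [0, 2, 4, 6] [0, 2, 4, 6]
```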

##### Exercise: Generating a Sieve
In this exercise, you will be rewriting the Sieve of Eratosthenes as a generator function and comparing it with the result of the iterator version:

In [23]:
#Rewrite the Sieve of Eratosthenes as a generator function that yields its values:
def primes_below(bound):
    candidates = list(range(2, bound))
    while candidates:
        prime = candidates[0]
        #'yield' hands prime back to the caller and pauses here until the next value is requested
        yield prime
        #Sieve out prime and all of its multiples before looking for the next prime
        candidates = [c for c in candidates if c % prime != 0]

In [25]:
#Confirm that the result is the same as the iterator version:
print([prime for prime in primes_below(100)])

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]


That's really all there is to generators: they're just a different way of expressing an iterator. They do, however, communicate a different design intention, namely that the flow of control is going to pass back and forth between the generator and its caller.

##### Activity: Using Random Numbers to Find the Value of Pi
The Monte Carlo method is a technique for approximating a numerical solution using random numbers. As the name, taken from the famous casino, suggests, chance is at the core of Monte Carlo methods. They use random sampling to obtain information about a function that would be difficult to calculate deterministically. Monte Carlo methods are frequently used in scientific computation to explore probability distributions, and in other fields including quantum physics and computational biology. They're also used in economics to explore the behavior of financial instruments under different market conditions. There are many applications for the Monte Carlo principle.

In this activity, you'll use a Monte Carlo method to find an approximate value for π. Here's how it works: two random numbers, (x,y), each between 0 and 1, represent a random point in a square with one corner at (0,0) and sides of length 1.

Using Pythagoras' Theorem, if the value of $$\sqrt{x^2 + y^2}$$ is less than 1, then the point is also inside the top-right quarter of a circle centered at (0,0) with a radius of 1.

Generate lots of points, count how many are within the quarter circle, and divide that count by the total number of points generated. Because the square has an area of 1, this gives you an approximation of the area of the quarter circle, which is π/4. Multiply by 4, and you have an approximate value of π. Data scientists often use this technique to find the area under more complex curves that represent probability distributions.
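Written out, the reasoning in the previous paragraph is:

$$
P\left(\sqrt{x^2+y^2} < 1\right) = \frac{\text{area of quarter circle}}{\text{area of square}} = \frac{\pi \cdot 1^2 / 4}{1 \cdot 1} = \frac{\pi}{4}
\quad\Longrightarrow\quad
\pi \approx 4 \cdot \frac{N_{\text{inside}}}{N_{\text{total}}}
$$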

##### Steps:
Write a generator to yield successive estimates of π. The steps are as follows:

- Define your generator function.
- Set the total number of points, and the number within the circle segment, to 0.
- Do the following substeps 10,000 times:
  - Generate two numbers between 0 and 1, using Python's random.random() function.
  - Add 1 to the total number of points.
  - Use math.sqrt() to find out how far the point represented by the numbers is from (0,0).
  - If the distance is less than 1, add 1 to the number of points within the circle.
  - Calculate your estimate for π: 4 * (points within the circle) / (total points generated).
  - If you have generated a multiple of 1,000 points, yield the approximate value for π. If you have generated 10,000 points, return the value.
- Inspect the successive estimates of π and check how close they are to the true value (math.pi).

In [26]:
import math
import random

In [27]:
#Define the approximate_pi function:
def approximate_pi():
    #Set the counters to zero:
    total_points = 0
    within_circle = 0

    #Generate 10,000 random points, updating the estimate as you go:
    for _ in range(10000):
        #x and y are random numbers between 0 and 1, which, together, represent a point in the unit square
        x = random.random()
        y = random.random()
        total_points += 1
        #Use Pythagoras' Theorem to work out the distance between the point and the origin, (0,0):
        distance = math.sqrt(x**2 + y**2)
        if distance < 1:
            #The point is inside the square and also inside the quarter circle of radius 1 centered on the origin
            within_circle += 1
        #Produce a result every 1,000 points
        if total_points % 1000 == 0:
            #The ratio of points within the circle to total points should be approximately pi/4
            pi_estimate = 4 * within_circle / total_points
            if total_points == 10000:
                #After 10,000 points, return the final estimate to end the iteration.
                #A generator's return value is not produced as an item, so it does not appear in the list below.
                return pi_estimate
            else:
                yield pi_estimate

#Use the generator to collect the estimates of pi, and how far each one is from math.pi:
estimates = [estimate for estimate in approximate_pi()]
errors = [estimate - math.pi for estimate in estimates]

In [30]:
print(estimates)
print(errors)

[3.168, 3.12, 3.16, 3.179, 3.1808, 3.179333333333333, 3.164, 3.159, 3.1502222222222223]
[0.026407346410207033, -0.02159265358979301, 0.018407346410207026, 0.03740734641020671, 0.039207346410206956, 0.03774067974354001, 0.02240734641020703, 0.017407346410206692, 0.008629568632429141]
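The output contains nine estimates, not ten: the value approximate_pi() passes to return at 10,000 points is not an item of the iteration. Python attaches it to the StopIteration exception as its value attribute. A sketch with a hypothetical stand-in generator shows two ways to recover it:

```python
def estimates_with_final():
    """Hypothetical stand-in shaped like approximate_pi(): yields, then returns."""
    yield 3.1
    yield 3.14
    return 3.142          # becomes StopIteration.value, not an iterated item

print(list(estimates_with_final()))   # [3.1, 3.14]: the returned value is missing

# Option 1: drive the generator with next() and read StopIteration.value.
gen = estimates_with_final()
values = []
while True:
    try:
        values.append(next(gen))
    except StopIteration as stop:
        final = stop.value
        break
print(values, final)                  # [3.1, 3.14] 3.142

# Option 2: a wrapper generator; `yield from` evaluates to the returned value.
def including_final(generator):
    final_value = yield from generator
    yield final_value

print(list(including_final(estimates_with_final())))  # [3.1, 3.14, 3.142]
```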


#### Regular Expressions
Regular expressions (or regexes) are a domain-specific programming language, defining a grammar for expressing efficient and flexible string comparisons. Introduced in 1951 by Stephen Cole Kleene, regular expressions have become a popular tool for searching and manipulating text. As an example, if you're writing a text editor and you want to highlight all web links in a document and make them clickable, you might search for text that starts with http or https, followed by ://, followed by a run of printable characters that ends when you stop finding printable characters (such as at a space, a newline, or the end of the text), and highlight everything up to that point. This is possible with standard Python string operations, but you will end up with a complex loop that is difficult to get right. Using regexes, you match against https?://\S+.
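A short sketch of that search with re.findall(); the sample text is invented for illustration. Note that \S+ also captures trailing punctuation such as a comma, so real link detection usually needs a stricter pattern:

```python
import re

text = "See https://docs.python.org/3/ or http://example.com, then reply."
# ? makes the s optional; \S+ takes every following non-whitespace character.
links = re.findall(r"https?://\S+", text)
print(links)  # ['https://docs.python.org/3/', 'http://example.com,']
```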

The following features are used in the preceding URL regular expression:

- Most characters match themselves, so "h" in a regex means "match exactly the letter h."
- Enclosing characters in square brackets means choosing between alternatives, so if we thought a web link might be capitalized, we could start with "[Hh]" to mean "match either H or h." In the body of the URL, we want to match any non-whitespace character, and rather than write them all out, we use the \S character class. Other character classes include \w (word characters), \W (non-word characters), and \d (digits).
- Two quantifiers are used: ? means "0 or 1 time," so "s?" means "match whether the text has no s at this point or has it exactly once." The quantifier + means "1 or more times," so "\S+" says "one or more non-whitespace characters." There is also a quantifier, *, meaning "0 or more times."

Additional regex features that you will use in this chapter are listed here:

- Parentheses () introduce a numbered sub-expression, sometimes called a "capture group." Groups are numbered from 1, in the order in which their opening parentheses appear in the expression.
- A backslash followed by a number refers to a numbered sub-expression, described previously. As an example, \1 refers to the first sub-expression. These references can be used in the replacement text when substituting matches of the regex, or later in the same expression to match the same text again. Because backslashes are also escape characters in ordinary Python strings, this is written as "\\1" in a Python string, or as r"\1" in a raw string.
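Each of these features can be tried in isolation; the sample strings below are arbitrary:

```python
import re

# ? : the preceding item is optional (0 or 1 time)
print(bool(re.fullmatch(r"https?", "http")), bool(re.fullmatch(r"https?", "https")))  # True True
# [Hh] : a character class that chooses between alternatives
print(re.findall(r"[Hh]ttp", "Http and http"))   # ['Http', 'http']
# \d+ : one or more digits
print(re.findall(r"\d+", "room 101, floor 7"))   # ['101', '7']
# (\w)\1 : a capture group followed by a backreference to it (doubled letters);
# with one group, findall() returns the text captured by that group
print(re.findall(r"(\w)\1", "balloon"))          # ['l', 'o']
# The raw string r"\1" and the ordinary string "\\1" are the same two characters
print(r"\1" == "\\1")                            # True
```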

Regular expressions have various uses throughout software development, as so much software deals with text. Validating user input in a web application, searching for and replacing entries in text files, and finding interesting events in application log files are all uses that regular expressions can be put to in a Python program.

##### Exercise: Matching Text with Regular Expressions
In this exercise, you'll use the Python re module to find instances of repeated letters in a string.

The regex you will use is "(\w)\\1+". Here, (\w) matches a single word character (that is, any letter, digit, or the underscore character, _) and stores it in numbered sub-expression 1. Then, \\1+ uses a quantifier to find one or more further occurrences of the same character. The steps for using this regex are as follows:

In [31]:
#Import the re module:
import re

In [32]:
#Define the string that you will search for, and the pattern by which to search:
title = "And now for something completely different"
pattern = r"(\w)\1+"  #raw string: the same pattern as "(\\w)\\1+"

In [33]:
#Search for the pattern and print the result:
print(re.search(pattern, title))

<re.Match object; span=(35, 37), match='ff'>


The re.search() function finds a match anywhere in the string; if it doesn't find any match, it returns None. If you were only interested in whether the beginning of the string matched the pattern, you could use re.match(). Alternatively, you can anchor the pattern to the beginning of the string with the ^ marker: re.search("^(\w)\\1+", title) has the same effect as re.match(pattern, title).
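A sketch comparing the three calls on the same title (aardvark is just an arbitrary word that does start with a doubled letter):

```python
import re

title = "And now for something completely different"
pattern = r"(\w)\1+"

print(re.search(pattern, title))      # finds 'ff' anywhere: span=(35, 37)
print(re.match(pattern, title))       # None: the title does not start with a doubled letter
print(re.search(r"^(\w)\1+", title))  # None: anchoring with ^ behaves like re.match()
print(re.match(pattern, "aardvark"))  # matches 'aa' at the start
```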

##### Exercise: Using Regular Expressions to Replace Text
In this exercise, you'll use a regular expression to replace occurrences of a pattern in a string with a different pattern. The steps are as follows:



In [34]:
#Define the text to search:
import re
description = "The Norwegian Blue is a wonderful parrot. This parrot is notable for its exquisite plumage."

In [35]:
#Define the pattern to search for, and its replacement:
pattern = "(parrot)"
replacement = "ex-\\1"

In [36]:
#Substitute the replacement for the search pattern, using the re.sub() function:
print(re.sub(pattern, replacement, description))

The Norwegian Blue is a wonderful ex-parrot. This ex-parrot is notable for its exquisite plumage.


The replacement refers to the capture group, "\1", which is the first expression in the search pattern to be surrounded by parentheses. In this case, the capture group is the whole word parrot. This lets you refer to the word parrot in the replacement without having to type it out again.
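With more than one group, the replacement can refer to each one by number, for example to swap two words. The \g<1> form, also shown, is the re module's unambiguous spelling of a group reference, useful when the reference is followed by other characters:

```python
import re

# Two capture groups, referred to as \2 and \1 in the replacement.
print(re.sub(r"(\w+) (\w+)", r"\2 \1", "Norwegian Blue"))  # Blue Norwegian
# \g<1> is the same reference as \1, written so that following text cannot be misread.
print(re.sub(r"(parrot)", r"\g<1>s", "one parrot"))        # one parrots
```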

##### Activity: Regular Expressions
At your online retail company, your manager has had an idea for a promotion. There is a whole load of old "The X-Files" DVDs in the warehouse, and she has decided to give one away for free to any customer whose name contains the letter x.

In this activity, you will be using Python's re module to find winning customers. The x could be capitalized if it's their initial, or lower case if it's in the middle of their name, so use the regular expression [Xx] to search for both cases:

In [37]:
import re

customers = ['Xander Harris', 'Jennifer Smith', 'Timothy Jones', 'Amy Alexandrescu', 'Peter Price', 'Weifung Xu']

In [40]:
winner = [customer for customer in customers if re.search('[Xx]', customer)]

print(winner)

['Xander Harris', 'Amy Alexandrescu', 'Weifung Xu']
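As an alternative to the [Xx] character class, the re.IGNORECASE flag makes the whole pattern case-insensitive; a sketch with the same customer list:

```python
import re

customers = ['Xander Harris', 'Jennifer Smith', 'Timothy Jones', 'Amy Alexandrescu', 'Peter Price', 'Weifung Xu']
# With re.IGNORECASE, the pattern 'x' matches both 'x' and 'X'.
winners = [customer for customer in customers if re.search('x', customer, re.IGNORECASE)]
print(winners)  # ['Xander Harris', 'Amy Alexandrescu', 'Weifung Xu']
```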


### Software Development

#### Debugging
Sooner or later in your development, you will reach a point where you see your program behave differently than you initially expected. In situations like these, you usually look back at the source code and try to understand what is different between your expectations and the code or inputs that are being used. To facilitate that process, there are multiple methods (some general, and some specific to Python) that you can use to "debug" or "troubleshoot" the issue.

Usually, the first action of an experienced developer, when frustration arises from unexpected results in their code, is to look at the logs or any other output that the application produces. A good starting point is to increase the logging verbosity, as discussed in the Standard Library course. If you are not able to troubleshoot the problem with logs alone, it usually means that you should look back at how your application logs its state and activity (producing what are known as traces), as there might be a good opportunity to improve it.

If verifying the inputs and outputs of the program and reading its logs is not enough, the usual next step in Python is to use the Python debugger, pdb.

The pdb module and its command-line interface allow you to navigate through the code as it runs and ask questions about the state of the program, its variables, and the flow of execution. It is similar to other tools, such as gdb, but it works at a higher level and is designed for Python.

There are two main ways to start pdb. You can run the tool and pass it a file, or you can use the built-in breakpoint() function.

In [44]:
# This is a comment
this = "is the first line to execute"
def secret_sauce(number):
    if number <= 10:
        return number + 10
    else:
        return number - 10
def magic_operation(x, y):
    res = x + y
    res *= y
    res /= x
    res = secret_sauce(res)
    return res
print(magic_operation(2, 10))

50.0


In [None]:
#When you begin executing the script with pdb, it works as follows:

$ python3.8 -m pdb magic_operation.py
> [...]Lesson08/1.debugging/magic_operation.py(3)<module>()
-> this = "is the first line to execute"
(Pdb)

It will stop on the first line of Python code to execute and give you a prompt to interact with pdb.

The first line of the output shows which file and line you are currently in, while the final line shows the pdb prompt, (Pdb), which tells you which debugger you are running and that it is waiting for input from the user.

Another way to start pdb is to change the source code. At any point in the code, you can write "import pdb; pdb.set_trace()" to tell the Python interpreter that you want to start a debugging session at that point; this also works in versions of Python earlier than 3.7. If you are using Python 3.7 or a later version, you can use the built-in breakpoint() function instead.

If you execute the magic_operation_with_breakpoint.py file attached in the GitHub repository, which has breakpoint() in one of its lines, you will see that the debugger starts for you where you requested it.

When you are running things in an IDE, or code in a large application, you could achieve the same effect by using the pdb commands described later, but just dropping that line into the file is by far the simplest and fastest way:

In [None]:
$ python3.7 magic_operation_with_breakpoint.py
> [...]/Lesson08/1.debugging/magic_operation_with_breakpoint.py(7)secret_sauce()
-> if number <= 10:
(Pdb)

At this point, you can get a list of all the commands by running help, or you can get more information about a specific command by running help followed by the command's name (for example, help break). The most commonly used commands are as follows:

- break filename:linenumber: This sets a breakpoint at the specified line, so that execution will stop at that point when you let the program continue running. Breakpoints can be set in any file, including the files of the standard library. If you want to set a breakpoint in a file that is part of a module, you can do so by using its path within the Python path. For example, to stop the debugger in the parser module, which is part of the html package of the standard library, you would run b html/parser:50 to stop the code on line 50 of that file.
- break function: You can request to stop the code when a specific function is called. If the function is in the current file, you can pass the function name. If the function is imported from another module, you will have to pass the full function specification, for example, break html.parser.HTMLParser.reset, to stop at the reset function of the HTMLParser class of html.parser.
- break without arguments: This lists all the breakpoints that are currently set.
- continue: This continues the execution until a breakpoint is found. This is quite useful when you start a program, set breakpoints in all the lines of code or functions you want to inspect, and then just let it run until it stops at any of those.
- where: This prints a stack trace with the current line of execution where the debugger stopped. It is useful for knowing what called this function, or for moving around the stack.
- down and up: These two commands allow you to move around in the stack. If you are in a function call, you can use up to move to the caller of the function and inspect the state in that frame, and you can use down to go deeper in the stack again after moving up.
- list: The first call displays 11 lines of code around the point where execution stopped. Successive calls to list display the following lines in batches of 11. To list the code around the current line again, use list . (list followed by a period).
- longlist: This shows the source code of the current function in the current frame that is being executed.
- next: This executes the current line and moves to the following one, without stopping inside any function that the line calls.
- step: This executes the current line and stops at the first possible occasion, either inside a function that the line calls or on the next line of the current function. This is useful when you don't want to just execute a function, but want to step through it.
- p: This prints the value of an expression. It is useful for checking the content of variables.
- pp: This pretty-prints the value of an expression. It is useful when you are trying to print long structures.
- run/restart: This restarts the program, keeping all the breakpoints set. It is useful if execution has already gone past the point you wanted to inspect.

Many commands have shortcuts; for example, you can use b instead of break, c or cont instead of continue, l instead of list, ll for longlist, and so on.
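pdb normally reads commands from the keyboard, but a pdb.Pdb instance can also be given file-like stdin and stdout objects. The sketch below uses that to script a short session against the secret_sauce() function from earlier, which is a convenient way to see exactly what a command such as p prints without an interactive terminal. It is an illustration of the commands, not the usual way to run pdb:

```python
import io
import pdb

def secret_sauce(number):
    if number <= 10:
        return number + 10
    else:
        return number - 10

# The commands you would otherwise type at the (Pdb) prompt, one per line.
commands = io.StringIO("p number\nc\n")
transcript = io.StringIO()
debugger = pdb.Pdb(stdin=commands, stdout=transcript, nosigint=True, readrc=False)
# runcall() stops at the first line of secret_sauce, runs the commands, and returns its result.
result = debugger.runcall(secret_sauce, 60.0)

print(result)                 # 50.0
print(transcript.getvalue())  # the (Pdb) prompts and the output of p number
```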

##### Exercise: Debugging a Salary Calculator
In this exercise, you will use the skills you learned to use pdb to debug an application that is not working as expected.

This is a salary calculator. Our company is using this to calculate the salary increase that will be given to our employees year after year, and a manager has reported that she is getting a 20% raise when the rulebook seems to suggest that she should be getting a 30% raise.

You are told only that the manager's name is Rose, and you find that the code for the salary raise calculation is the following:

In [1]:
  
"""Adjusts the salary rise of an employee"""

def _manager_adjust(salary, rise):
    if rise < 0.10:
        # We need to keep managers happy.
        return 0.10

    if salary >= 1_000_000:
        # They are making enough already.
        return rise - 0.10


def calculate_new_salary(salary, promised_pct, is_manager, is_good_year):
    rise = promised_pct

    # remove 10% if it was a bad year
    if not is_good_year:
        rise -= 0.01
    else:
        pass

    # managers have a special adjust
    if is_manager:
        rise = _manager_adjust(salary, rise)

    # Extra bonus for people with high rises
    if rise >= 0.20:
        rise = rise + 0.10

    salary_increase = salary * rise
    return int(salary + salary_increase)


rose_salary = calculate_new_salary(1_000_000, 0.30, True, True)
print("Rose's salary will be:", rose_salary)

Rose's salary will be: 1200000


###### The following steps will help you complete this exercise:

1. Understand the problem by asking the right questions.

The first step is to fully understand the issue, evaluate whether there is an issue with the source code, and get all the possible data. You need to ask the user who reported the error, and yourself, questions such as the following:

- What version of the software were they using?

- When did the error happen for the first time?

- Has it worked before?

- Is it an intermittent failure, or can the user consistently reproduce it?

- What was the input of the program when the issue manifested?

- What is the output and what would be the expected output?

- Do we have any logs or any other information to help us debug the issue?

In this instance, you learn that this happened with the latest version of the script and that the person who reported it could reproduce it. It seems to be happening only to Rose, but that might be related to the arguments she is providing.

For instance, she reported that her current salary is $1,000,000. She was told she would get a 30% raise. She knows that managers earning that much have 10% taken off their raise, but because the company had a good year and her raise was still high, she expected the extra 10% bonus, bringing it back to 30%. However, she saw that her new salary was $1,200,000, rather than $1,300,000.

You can translate this into the following arguments:

- salary: 1,000,000
- promised_pct: 0.30
- is_manager: True
- is_good_year: True

The expected output was 1,300,000, and the output she reported was 1,200,000. You don't have any logs about the execution, as the code does not log anything.

2. Reproduce the issue by running the calculate_new_salary function with the known arguments.

The next step in your debugging investigation is to confirm that you can reproduce the issue. If you are not able to reproduce it, then some of the input or assumptions that either you or the user made were incorrect, and you should go back to step 1 for clarification.

In this scenario, trying to reproduce the issue is easy; you just need to run the function with the known arguments:

In [2]:
rose_salary = calculate_new_salary(1_000_000, 0.30, True, True)
print("Rose's salary will be:", rose_salary)

Rose's salary will be: 1200000


3. Run the code with other inputs, such as a salary of 2,000,000 or a promised raise of 40%, to see the difference.

In some situations, it is helpful to try with other inputs to see how the program behaves before even running the debugger. This can give you some extra information. You know that there are special rules for people that earn a million dollars or more, so what happens if you raise that number to, say, $2,000,000?

Consider the following:

In [3]:
rose_salary = calculate_new_salary(2_000_000, 0.30, True, True)
print("Rose's salary will be:", rose_salary)

Rose's salary will be: 2400000


In [4]:
#You can also try changing the percentage, so let's try that with a promised initial raise of 40%:

rose_salary = calculate_new_salary(1_000_000, 0.40, True, True)
print("Rose's salary will be:", rose_salary)

Rose's salary will be: 1400000


From just trying out different inputs, you have seen what is special about Rose's situation: it is her 30% increase. When you start to debug things in the following step, you will know to keep an eye on the code that interacts with the promised percentage, as changing the initial salary did not make a difference.

4. Start the debugger by firing up pdb and set up a breakpoint in your calculate_new_salary function:

In [9]:
$ python -m pdb salary_calculator.py
(Pdb) b calculate_new_salary

(Run this in a terminal, such as an Anaconda shell, rather than in a notebook cell.)

5. Now run continue or c to ask the interpreter to run until the function is executed:

In [11]:
(Pdb) c

6. Run the where command in order to get information about how you got to this point:

In [12]:
(Pdb) where


When you can pinpoint the issue to a part of the program, you can go step by step, running the code and checking whether the result of each line matches your expectations.
An important step here is to think about what you expect to happen before you run each line. This might seem to slow down debugging, but it pays off: a result that looks correct but is not is much easier to catch when you stated your expectation beforehand than when you only judge it after the fact. Let's do this in your program.
7. Run the l command to confirm where you are in the program, and args to print the arguments of the function:

In [13]:
(Pdb) l


Then use args to print the arguments of the function:

In [None]:
(Pdb) args

You are effectively on the first line of the function, and the arguments are what you expected. You could also run ll to print the whole function.
8. Advance the lines of code by using n to move one line at a time:

In [None]:
(Pdb) n

Next, the code checks whether it was a good year. As the variable is True, execution skips that branch and jumps to line 23. As Rose is a manager, execution does enter the manager branch, where the manager adjustment is performed.
9. Print the value of the raise before and after the _manager_adjust function is called by running p rise.

You can run step to get into the function, but the error is unlikely to be there, so you can print the current raise before and after executing the function. You know that, as she is earning a million dollars, her pay should be adjusted, and, therefore, the rise should be 0.2 after executing it:

In [14]:
(Pdb) p rise
0.3
(Pdb) n
> /Lesson08/1.debugging/Exercise112.py (27)calculate_new_salary()
-> if rise >= 0.20:
(Pdb) p rise
0.19999999999999998


The adjusted raise is 0.19999999999999998 rather than 0.20, so what is going on here? There is clearly an issue within the _manager_adjust function, so you will have to restart the debugging session and investigate it.
10. Set a breakpoint in _manager_adjust and restart the program. You can then continue to the second breakpoint and print the lines and arguments at that point by running c, c, ll, and args:

In [None]:
(Pdb) b _manager_adjust
Breakpoint 2 at /Lesson08/1.debugging/Exercise112.py:3
(Pdb) restart

You see the input is what you expected (0.3), but you know the output is not. Rather than 0.2, you are getting 0.19999999999999998. Let's walk through this function code to understand what is happening. By running "n" three times until the end of the function, you can then use "rv" to see the returned value as follows:

In [15]:
(Pdb) n
> /Lesson08/1.debugging/Exercise112.py (8)_manager_adjust()
-> if salary >= 1_000_000:
(Pdb) n
> /Lesson08/1.debugging/Exercise112.py (10)_manager_adjust()
-> return rise - 0.10
(Pdb) n
--Return--
> /Lesson08/1.debugging/Exercise112.py (10)_manager_adjust()->0.19999999999999998
-> return rise - 0.10
(Pdb) rv
0.19999999999999998

You found the error: when you subtract 0.10 from 0.30, the result is not 0.20, as you might have expected, but that odd number, 0.19999999999999998. This happens because binary floating-point numbers cannot represent most decimal fractions exactly, a well-known issue in computer science. You should not rely on exact comparisons with floats; if you need exact decimal fractions, use the decimal module instead, as seen in previous chapters.
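To see the effect outside the debugger, the following sketch compares float and Decimal arithmetic for the same subtraction, and shows math.isclose as a tolerance-based alternative when floats cannot be avoided. The function manager_adjust_decimal is a hypothetical rewrite for illustration: it reproduces only the high-salary branch of _manager_adjust that is visible in the debugging session above, and its fallback of returning the raise unchanged is an assumption.

```python
import math
from decimal import Decimal

# Binary floats cannot represent 0.30 or 0.10 exactly, so the difference drifts.
float_rise = 0.30 - 0.10
print(float_rise)            # 0.19999999999999998
print(float_rise >= 0.20)    # False: a cap check like "if rise >= 0.20" misses it

# If you must compare floats, compare with a tolerance instead of exactly.
print(math.isclose(float_rise, 0.20))  # True

# Decimals built from strings keep exact decimal fractions.
decimal_rise = Decimal("0.30") - Decimal("0.10")
print(decimal_rise)                      # 0.20
print(decimal_rise >= Decimal("0.20"))   # True


def manager_adjust_decimal(salary, rise):
    """Hypothetical Decimal-based version of the high-salary branch of _manager_adjust."""
    if salary >= 1_000_000:
        return rise - Decimal("0.10")
    # Assumption for this sketch: other salaries keep the promised raise.
    return rise


print(manager_adjust_decimal(1_000_000, Decimal("0.30")))  # 0.20
```

Note that the Decimal values are created from strings: Decimal(0.30) would faithfully copy the float's binary error instead of avoiding it.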

In this exercise, you have learned how to identify errors when you perform debugging. You can now start to think about how to fix these errors and propose solutions to your colleagues.

##### Activity: Debugging Sample Python Code for an Application
Consider the following scenario: you have a program that creates a picnic basket for you. The baskets are created in a function that depends on whether the user wants a healthy meal and whether they are hungry. You provide a set of initial items in the basket, but users can also customize this via a parameter.

A user reported that they got more strawberries than expected when creating multiple baskets. When asked for more information, they said that they tried to create a healthy basket for a non-hungry person first, and a non-healthy basket for a hungry person with just "tea" in the initial basket. Those two baskets were created correctly, but when the third basket was created for a healthy person who was also hungry, the basket appeared with one more strawberry than expected.

In this activity, you need to run the reproducers mentioned and check for the error in the third basket. Once you have found the error with the basket, you need to debug the code and fix the error.

There is a reproducer in the code example, so continue the debugging from there, and figure out where the issue is in the code.

Take a look at the following steps:

- First, write test cases with the inputs described in the user's report.
- Next, confirm whether the error report is genuine.
- Then, run the reproducers in the code file and confirm the error in the code.
- Finally, fix the code with the simple logic of if and else. 

In [16]:
DEFAULT_INITIAL_BASKET = ["orange", "apple"]

def create_picnic_basket(healthy, hungry, initial_basket=DEFAULT_INITIAL_BASKET):
    basket = initial_basket
    if healthy:
        basket.append("strawberry")
    else:
        basket.append("jam")

    if hungry:
        basket.append("sandwich")
    return basket

# Reproducer
print("First basket:", create_picnic_basket(True, False))
print("Second basket:", create_picnic_basket(False, True, ["tea"]))
print("Third basket:", create_picnic_basket(True, True))

First basket: ['orange', 'apple', 'strawberry']
Second basket: ['tea', 'jam', 'sandwich']
Third basket: ['orange', 'apple', 'strawberry', 'strawberry', 'sandwich']
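The extra strawberry comes from how Python evaluates default arguments: the default value is created once, when the def statement runs, not on every call. Because basket = initial_basket only binds a second name to that same list, every call that relies on the default appends to one shared list. The following sketch uses a trimmed, hypothetical copy of the function (buggy_basket) to make the sharing visible with the is operator and the function's __defaults__ attribute:

```python
DEFAULT_INITIAL_BASKET = ["orange", "apple"]


def buggy_basket(healthy, initial_basket=DEFAULT_INITIAL_BASKET):
    # Same pattern as create_picnic_basket: no copy is made here.
    basket = initial_basket
    basket.append("strawberry" if healthy else "jam")
    return basket


first = buggy_basket(True)
second = buggy_basket(True)

# Both calls returned the very same list object...
print(first is second)                  # True
# ...which is also the module-level default list, now mutated.
print(first is DEFAULT_INITIAL_BASKET)  # True
print(DEFAULT_INITIAL_BASKET)           # ['orange', 'apple', 'strawberry', 'strawberry']
# The function keeps its default values in the __defaults__ tuple.
print(buggy_basket.__defaults__[0] is DEFAULT_INITIAL_BASKET)  # True
```

This is why the corrected version below uses None as a sentinel and builds a fresh list inside the function on every call. It still appends to any list the caller passes in (such as ["tea"]); if callers should not see their own list modified, copying it first, for example with list(basket), is one option.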


In [17]:
def create_picnic_basket(healthy, hungry, basket=None):
    if basket is None:
        basket = ["orange", "apple"]
    if healthy:
        basket.append("strawberry")
    else:
        basket.append("jam")
    if hungry:
        basket.append("sandwich")
    return basket

print("First basket:", create_picnic_basket(True, False))
print("Second basket:", create_picnic_basket(False, True, ["tea"]))
print("Third basket:", create_picnic_basket(True, True))

First basket: ['orange', 'apple', 'strawberry']
Second basket: ['tea', 'jam', 'sandwich']
Third basket: ['orange', 'apple', 'strawberry', 'sandwich']


#### Automated Testing
Even though you have learned how to debug applications when errors are reported, you would prefer not to have errors in your applications in the first place. To increase the chances of having a bug-free code base, most developers rely on automated testing.

At the beginning of their careers, most developers just test their code manually as they develop it. By providing a set of inputs and verifying the output of the program, you can get a basic level of confidence that your code "works," but this quickly becomes tedious and does not scale as the code base grows and evolves. Automated testing allows you to record a series of steps and stimuli that you perform on your code, together with the output you expect from them.

This is very effective at reducing the number of bugs in a code base, because you verify the code as you implement it and keep a record of all those verifications, which can be rerun on every future modification of the codebase.

The number of test lines that you write for each line of code really depends on each application. There are well-known cases, such as SQLite, where orders of magnitude more lines of tests are written than lines of code, which greatly improves confidence in the software and allows quick release of new versions as features are added without needing the extensive quality assurance (QA) that other systems might require.

Automated testing is similar to the QA process that we see in other engineering fields. It is a key step of all software development and should be taken into account when developing a system.

Additionally, automated tests help you to troubleshoot, as you have a set of test scenarios that you can adapt to simulate the user's input and environment, and then keep as what is known as a regression test (not to be confused with regression in machine learning). Regression tests are run whenever there is a code change, and they verify that the cases that worked before the change still work in the same manner after it.
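As an illustration, the picnic basket bug from the preceding activity could be captured as a regression test, so that the fix is checked again after every change. This is a minimal sketch using the standard library unittest module, described in the following sections. It repeats the corrected create_picnic_basket inline so the example is self-contained; the test class name is hypothetical.

```python
import unittest


def create_picnic_basket(healthy, hungry, basket=None):
    # Corrected version from the activity: a fresh default list on every call.
    if basket is None:
        basket = ["orange", "apple"]
    if healthy:
        basket.append("strawberry")
    else:
        basket.append("jam")
    if hungry:
        basket.append("sandwich")
    return basket


class TestPicnicBasketRegression(unittest.TestCase):
    def test_user_report_sequence(self):
        # Replays the exact sequence of calls from the user's bug report.
        create_picnic_basket(True, False)
        create_picnic_basket(False, True, ["tea"])
        third = create_picnic_basket(True, True)
        self.assertEqual(third, ["orange", "apple", "strawberry", "sandwich"])

    def test_default_basket_is_not_shared(self):
        # Two calls that rely on the default must not return the same list.
        self.assertIsNot(create_picnic_basket(True, False),
                         create_picnic_basket(True, False))
```

Saved in a file such as test_basket.py (a hypothetical name), it can be run with python -m unittest test_basket.py. Against the original mutable-default version, test_user_report_sequence fails, which is exactly the signal a regression test is meant to give.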

##### Test Categorization
One of the first things to think about when writing an automated test is "What are we verifying?", and the answer depends on the "level" of testing that you are doing. There is a lot of literature about how to categorize test scenarios by the functionality they validate and the dependencies they have. Writing a test that just validates a simple Python function in your source code is not the same as writing one that validates an accounting system that connects to the internet and sends emails. To validate large systems, it is common to create different types of tests, usually known as the following:

- Unit tests: These are tests that just validate a small part of your code. Usually, they just validate a function with specific inputs within one of your files and only depend on code that has already been validated with other unit tests.
- System Integration tests (SIT): These are more coarse-grained tests that will either validate interactions between different components of your codebase (known as integration tests without environment) or the interactions between your code and other systems and the environment (known as integration tests with the environment).
- Functional or end-to-end tests / User Acceptance Testing (UAT): These are usually really high-level tests that depend on the environment and often on external systems that validate the solution with inputs as the user provides them.

Say that you were to test the workings of Twitter using these types of tests:

- A unit test would verify a single function, such as one that checks whether a tweet body is shorter than a specific length.
- An integration test would validate that, when a tweet is injected into the system, the trigger that notifies other users is called.
- An end-to-end test would ensure that, when a user writes a tweet and clicks Send, they can then see it on their home page.
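To make the unit-test level concrete, here is a minimal sketch of the first case. The function is_tweet_body_valid and the 280-character limit are assumptions for illustration, not Twitter's actual code; the point is that the test needs nothing but the function itself, with no network, database, or other users involved.

```python
import unittest

MAX_TWEET_LENGTH = 280  # assumed limit, for illustration only


def is_tweet_body_valid(body):
    """Hypothetical check: a tweet body must not exceed the length limit."""
    return len(body) <= MAX_TWEET_LENGTH


class TestTweetBodyLength(unittest.TestCase):
    def test_short_body_is_valid(self):
        self.assertTrue(is_tweet_body_valid("Hello, world!"))

    def test_body_at_limit_is_valid(self):
        # Edge case: exactly at the limit.
        self.assertTrue(is_tweet_body_valid("x" * MAX_TWEET_LENGTH))

    def test_body_over_limit_is_invalid(self):
        self.assertFalse(is_tweet_body_valid("x" * (MAX_TWEET_LENGTH + 1)))
```

The integration and end-to-end cases, by contrast, would need the surrounding system (storage, notification triggers, a browser) to be running, which is what makes them slower and harder to set up.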

Software developers tend to prefer unit tests, as they don't have external dependencies and are more stable and faster to run. The more coarse-grained the tests, the closer they get to what the user will actually do, but both integration and end-to-end tests usually take much longer to run, as their dependencies need to be set up, and they are usually flakier: for example, the email server might not be working on the day you run them, meaning you would be unable to run your tests successfully.

Note: This categorization is a simplification of the ones used by many experts working in the field. If you are interested in the different levels of testing and getting the right balance of tests, then a good place to start is the famous Testing Pyramid.

##### Test Coverage
Something that generates debate across the community is test coverage. When you write tests for your code, you start to exercise it and hit different code paths. As you write more tests, you cover more and more of the code that you are testing. The percentage of code that your tests exercise is known as test coverage, and developers will argue that different percentages are "the right amount." Getting to 100% coverage might seem unnecessary, but it proves quite useful in large codebases that need to perform tasks such as migrating from Python 2 to Python 3. However, this all depends on how much you are willing to invest in testing your application, and each developer might target a different number for each of the projects that they run.

Moreover, something important to remember is that 100% coverage does not mean that your code does not have bugs. You can write tests that exercise your code but do not properly validate it, so be mindful of falling into the trap of just writing tests to hit the coverage target. Tests should be written to exercise the code with inputs that will be provided by users and try to find edge cases that can uncover issues with the assumptions that you made at the time that you wrote it, and not just to hit a number.

##### Writing Tests in Python with Unit Testing
The Python standard library comes with a module, unittest, for writing test scenarios and validating your code. Usually, when you are creating tests, you create a test file that validates the source code of another file. In that file, you create a class that inherits from unittest.TestCase and has methods whose names start with test; these are the ones run on execution. You can record expectations through methods such as assertEqual and assertTrue, which are inherited from the base class.

##### Exercise: Checking Sample Code with Unit Testing
In this exercise, you will write and run tests for a function that checks whether a number is divisible by another. This will help you to validate the implementation and potentially find any existing bugs:

1. Create a function, is_divisible, which checks whether a number is divisible by another, and save it in a file named sample_code.py. (This function is also provided in the sample_code.py file.) The file just has a single function:

In [19]:
def is_divisible(x, y):
    if x % y == 0:
        return True
    else:
        return False

2. Create a test file, test_unittest.py, that will include the test cases for the function. Then, add the skeleton for a test case:

In [None]:
import unittest
from sample_code import is_divisible
class TestIsDivisible(unittest.TestCase):
    def test_divisible_numbers(self):
        pass
if __name__ == '__main__':
    unittest.main()

This code imports the function to test, is_divisible, and the unittest module. It then creates the common boilerplate to start writing tests: a class that inherits from unittest.TestCase and two final lines that allow us to run the code and execute the tests.

3. Now, write the test code:

In [None]:
    def test_divisible_numbers(self):
        self.assertTrue(is_divisible(10, 2))
        self.assertTrue(is_divisible(10, 10))
        self.assertTrue(is_divisible(1000, 1))
    def test_not_divisible_numbers(self):
        self.assertFalse(is_divisible(5, 3))
        self.assertFalse(is_divisible(5, 6))
        self.assertFalse(is_divisible(10, 3))

You now write the code for your tests by using the self.assertX methods. There are different methods for different kinds of assertions; for example, self.assertEqual checks whether its two arguments are equal and fails otherwise. Here, you use self.assertTrue and self.assertFalse to create the preceding tests.
4. Run the test:

Run the test file with a Python interpreter. By using -v, you get extra information about the test names as the tests are running:

In [None]:
python test_unittest.py -v (in your anaconda shell)

5. Now, add more complex tests:

In [None]:
    def test_dividing_by_0(self):
        with self.assertRaises(ZeroDivisionError):
            is_divisible(1, 0)

By adding a test that passes 0 as the divisor, you check that the function raises an exception.
The assertRaises context manager validates that the code within the context raises the exception passed in.
So, there you go: you have a test suite built with the standard library unittest module.
unittest is a great tool for writing automated tests, but the community generally seems to prefer a third-party tool named pytest. pytest allows you to write tests as plain functions in your test files, using Python's own assert statement.

This means that rather than using self.assertEqual(a, b), you can just write assert a == b. Additionally, pytest comes with enhancements such as output capturing, modular fixtures, and user-defined plugins. If you plan to develop a test suite bigger than a few tests, consider trying pytest.

In [26]:
#full test code
import unittest
from sample_code import is_divisible

class TestIsDivisible(unittest.TestCase):
    
    def test_divisible_numbers(self):
        self.assertTrue(is_divisible(10, 2))
        self.assertTrue(is_divisible(10, 10))
        self.assertTrue(is_divisible(1000, 1))
    
    def test_not_divisible_numbers(self):
        self.assertFalse(is_divisible(5, 3))
        self.assertFalse(is_divisible(5, 6))
        self.assertFalse(is_divisible(10, 3))
    def test_dividing_by_0(self):
        with self.assertRaises(ZeroDivisionError):
            is_divisible(1, 0)
        
if __name__ == '__main__':
    unittest.main()


#### Writing a Test with pytest
Even though unittest is part of the standard library, it is more common to see developers use pytest to write and run tests.

In [None]:
from sample_code import is_divisible
import pytest
def test_divisible_numbers():
    assert is_divisible(10, 2) is True
    assert is_divisible(10, 10) is True
    assert is_divisible(1000, 1) is True
def test_not_divisible_numbers():
    assert is_divisible(5, 3) is False
    assert is_divisible(5, 6) is False
    assert is_divisible(10, 3) is False
def test_dividing_by_0():
    with pytest.raises(ZeroDivisionError):
        is_divisible(1, 0)

This code creates three test cases by using pytest. The main difference is that, instead of a class with assert methods, you create free functions and use Python's own assert keyword. pytest also gives more explicit error reports when these assertions fail. To execute a pytest-based test file, run the following command:

In [None]:
pytest test_pytest.py

#### Creating a PIP Package
When you are working with Python code, you need to differentiate between the source code tree, source distributions (sdist), and binary distributions (such as wheels, explained below). The folder where you work on the code is known as the source code tree; it contains the code as it is laid out in the folder, along with Git files, configuration files, and others. A source distribution is a way to package your code so that it can be installed on any machine: it contains all the source code without any development-related files. A binary distribution is similar, but its files are ready to be installed on the system, so no build step is needed on the client host. Wheels are a particular standard for binary distributions that replaced the old format, Python eggs. When you consume a Python wheel, you get a file that is ready to be installed without any compilation or build step. This is especially useful for Python packages with C extensions.

When you want to distribute your code to users, you need to create source or binary distributions and then upload them to a repository. The most common Python repository is PyPI, which allows users to install packages by using pip.

The Python Package Index (PyPI) is an official package repository maintained by the Python Software Foundation. Anyone can publish packages to it, and many Python tools default to consuming packages from it. The most common way to consume packages from PyPI is through pip, the installer recommended by the Python Packaging Authority (PyPA).

The most common tool to package our source code is setuptools. With setuptools, you can create a setup.py file that contains all the information about how to create and install the package. Setuptools comes with a method named setup, which should be called with all the metadata that we want to create a package with.

Here's some example boilerplate code for a setup.py file that could be copied and pasted when creating a package:

In [29]:
import setuptools
setuptools.setup(
    name="packt-sample-package",
    version="1.0.0",
    author="Author Name",
    author_email="author@email.com",
    description="packt example package",
    long_description="This is the longer description and will appear in the web.",
    py_modules=["packt"],
    classifiers=[
        "Programming Language :: Python :: 3",
        "Operating System :: OS Independent",
    ],
)



Take special note of the following parameters:

- name: The name of the package on PyPI. It is a good practice to have it match your library or file import name.
- version: A string that identifies the version of the package.
- py_modules: A list of Python modules (files) to package. You can also use the packages keyword to target full Python packages; you will explore how to do this in the next exercise.

You can now create the source distribution by running the following:

In [None]:
python3.7 setup.py sdist (in the anaconda shell)

This will generate a file in the dist folder, which is ready to be distributed to PyPI.

If you have the wheel package installed, you can also run the following to create a wheel:

python3.7 setup.py bdist_wheel

Once you have this file generated, you can install Twine, which is the tool recommended by the PyPA for uploading packages to PyPI. With twine installed, you just need to run the following:

twine upload dist/*

You can test your package by installing any of the artifacts in the dist folder.

Usually, you won't just have a single file to distribute, but a whole set of files within a folder, which makes a Python package. In those situations, there is no need to list all the files within the folder one by one; you can just use the following line instead of the py_modules option:

packages=setuptools.find_packages(),

This will find and include all the packages in the directory where the setup.py file is.

##### Exercise: Creating a Distribution That Includes Multiple Files within a Package
In this exercise, you are going to create your own package that contains multiple files and upload it to the test version of PyPI:

1. Create a virtual environment and install twine and setuptools. Start by creating a virtual environment with all the dependencies that you need. Make sure you are in an empty folder to start:

python3.7 -m venv venv

. venv/bin/activate

python3.7 -m pip install twine setuptools

You now have all the dependencies you need to create and distribute your package.

2. Create the actual package source code. You will create a Python package named john_doe_package. Note: please change this to your first and last name:

mkdir john_doe_package

touch john_doe_package/__init__.py

echo "print('Package imported')" > john_doe_package/code.py

The second line creates the package's __init__.py file, and the last line creates another Python file, code.py, which you will package within the Python package. The preceding commands are for Linux users. On Windows, perform the following steps instead: create a folder named john_doe_package and create an empty file named __init__.py in it. Then create another file named code.py in the same folder and paste the following line of code into it:
print('Package imported')

This is a basic Python package that just contains an __init__ file and another file named code—we can add as many files as desired. The '__init__' file marks the folder as a Python package.
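To see how this layout behaves once it is importable, the following sketch recreates the same two files in a temporary directory (rather than your working folder), puts that directory on sys.path, and imports the code module through the package. It uses only the standard library; the temporary location is an assumption of the sketch, not part of the exercise.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build john_doe_package/ with an empty __init__.py and a code.py module.
root = Path(tempfile.mkdtemp())
package_dir = root / "john_doe_package"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")
(package_dir / "code.py").write_text("print('Package imported')\n")

# Make the parent folder importable, much as installing the package would.
sys.path.insert(0, str(root))
importlib.invalidate_caches()  # the files were created after startup

# Importing the submodule runs code.py, which prints "Package imported".
module = importlib.import_module("john_doe_package.code")
print(module.__name__)  # john_doe_package.code
```

Strictly speaking, Python 3 can also import a folder without __init__.py as a namespace package, but setuptools.find_packages() only picks up folders that contain an __init__.py file, which is why the file matters for the setup.py used in this exercise.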
3. Add the setup.py file. You need to add a setup.py file at the top of your source tree to indicate how your code should be packaged. Add a setup.py file like the following:



In [None]:
import setuptools
setuptools.setup(
    name="john_doe_package",
    version="1.0.0",
    author="Author Name",
    author_email="author@email.com",
    description="packt example package",
    long_description="This is the longer description and will appear in the web.",
    packages=setuptools.find_packages(),
    classifiers=[
        "Programming Language :: Python :: 3",
        "Operating System :: OS Independent",
    ],
)


The previously mentioned code is a function call where you pass all the metadata. Be sure to change john_doe_package to the name of your own package.
4. Create the distribution by calling the setup.py file:

python3.7 setup.py sdist

This will create a source distribution. You can test it out by installing it locally:

cd dist && python3.7 -m pip install *

On Windows, go to CMD, navigate to the dist folder, and execute the following command instead:

python3.7 -m pip install john_doe_package-1.0.0.tar.gz

5. Upload to Test PyPI. The last step is to upload the file to the test version of PyPI. Note that this method is applicable only to Linux users. To run this step, you need an account on Test PyPI; go to test.pypi.org to create one. Once created, you can run the following command to upload the package to the web:

twine upload --repository-url=https://test.pypi.org/legacy/ dist/*

This will prompt you for the user and password that you used to create your account. Once this is uploaded, you can go here, click on your project

#### Adding More Information to Your Package
So, you have seen how to create a really simple package. When you create a package, you should also include a README file that can be used to generate a description of the project and is part of the source distribution. This file gets packaged by default.

Consider exploring the different attributes that can be used with setuptools.setup. By having a look through documentation, you can find a lot of useful metadata that might be appropriate for your package.

Additionally, to facilitate testing, many people consider it good practice to place all the source code of your package within a src directory. Because Python adds the current working directory to the Python path, a package sitting at the top of your project folder can be imported straight from the source tree without being installed; the src directory prevents the interpreter from finding it that way. If your package contains any logic about the data files that are packaged with your code, you should really use the src directory, as it forces you to work against the installed version of your package rather than the source directory tree.

PyPA has recently created a guide on how to package projects, which contains further details than those discussed in this book.

Note: If you need to package multiple applications, consider having a look through packaging.python.org.

#### Creating Documentation the Easy Way
A critical part of all software that is distributed across the world is documentation. Documentation allows the users of your code to understand how to call the different functions that you provide without having to read the code. There are multiple levels of documentation that you are going to explore in this topic: you will see how to write documentation that can be consumed in the console and on the web. Depending on the purpose and size of your project, you should consider how broad your documentation should be and what kind of instructions and information it should contain.

##### Docstrings
In Python, documentation is part of the language. When you declare a function, you can use a docstring to document its interface and behavior. A docstring is created by placing a triple-quoted string block just after the function signature, as the first statement of the body. This content is available not only to the reader of the code but also to the user of the API, as it is stored in the __doc__ attribute of the function, class, or module. It is also the content that the help function displays for the object passed to it. As an example, take a look at the contents of the __doc__ attribute of the print function:





In [None]:
print(print.__doc__)

It is the same content that you get by calling help(print). You can create your own function with a __doc__ attribute, as follows:

In [None]:
def example():
    """Prints the example text"""
    print("Example")

example.__doc__

'Prints the example text'

You can now use help on your function by executing help(example).

Docstrings usually contain a title with a short description of the function and a body with further information about what it does in detail. Additionally, you can document all the parameters the function takes, including their types, the return type, and whether it raises any exceptions. This is all really useful information for your users, and even for yourself when you have to use the code at a later time.
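For example, the is_divisible function from the testing section could be documented as follows. The section layout (Args, Returns, Raises) follows the Google docstring style, one common convention that Sphinx's Napoleon extension can read; other styles, such as NumPy's, are equally valid. The body is condensed to a single return statement, which behaves the same as the original if/else version.

```python
def is_divisible(x, y):
    """Check whether x is divisible by y.

    Args:
        x (int): The dividend.
        y (int): The divisor.

    Returns:
        bool: True if x is divisible by y, False otherwise.

    Raises:
        ZeroDivisionError: If y is 0.
    """
    return x % y == 0


# help(is_divisible) shows this same text, preceded by the signature.
print(is_divisible.__doc__)
```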

#### Using Sphinx
Using docstrings to document APIs is useful, but quite often you need something more: a website with guides and other information about your library. In Python, the most common way to build one is Sphinx. Sphinx allows you to easily generate documentation in multiple formats, such as PDF, EPUB, or HTML, from reStructuredText (RST) sources with some markup. Sphinx also comes with multiple plugins, and some of them are useful for Python, such as generating API documentation from docstrings or allowing you to view the code behind the API implementation.

Once installed via pip, it comes with two main CLI scripts, which the user interacts with: sphinx-build and sphinx-quickstart. The first is used to build the documentation on an existing project with Sphinx configuration, while the second can be used to quickly bootstrap a project.

When you bootstrap a project, Sphinx will generate multiple files for you, and the most important ones are as follows:

- conf.py: This contains all the user configuration for generating the documentation. This is the most common place to look for configuration parameters when you want to customize something in the Sphinx output.
- Makefile: An easy-to-use makefile that can be used to generate the documentation with a simple "make html." There are other targets that can be useful, such as the one to run doctests.
- index.rst: The main entry point for your documentation.

Usually, most projects create a folder named docs within their source tree root to contain everything related to the documentation and Sphinx. This folder can then refer to the source code by either installing it or by adding it to the path in their configuration file.

If you are not familiar with RST, it is best to have a quick look through sphinx-doc.org. It has a short explanation of the different special syntaxes you can find in RST that will be translated into special HTML tags such as links, anchors, tables, images, and others.

On top of this, Sphinx is easily extendible via plugins. Some of them are part of the default distribution when you install sphinx. Plugins allow you to extend the functionality to do things such as automatically create documentation for your modules, classes, and functions by just writing a single directive.

Finally, there are multiple themes available when you generate documentation with Sphinx—these are all configurable in conf.py. Quite often, you can find more Sphinx themes available on PyPI, which can be just installed easily via pip.

##### Exercise: Documenting a Divisible Code File
In this exercise, you are going to document the is_divisible function that you created in the testing topic, saved this time in a module named divisible.py:

1. Create a folder structure.

First, create an empty folder with just the divisible.py module and another empty folder named docs. The divisible.py module should contain the following code:

def is_divisible(x, y):
    if x % y == 0:
        return True
    else:
        return False

2. Run the sphinx-quickstart tool.

Make sure you have Sphinx installed (otherwise, run python3 -m pip install --user sphinx) and run sphinx-quickstart within the docs folder. You can accept the default value for every prompt by pressing Return, except for the following:

- Project name: divisible
- Author name: write your name here
- Project release: 1.0.0
- autodoc: y
- intersphinx: y

With these options, you are ready to start a project that can be easily documented and that generates HTML output with Sphinx. You have also enabled two of the most common extensions: autodoc, which you will use to generate documentation from the code, and intersphinx, which allows you to reference other Sphinx projects, such as the Python standard library documentation.

3. Build the documentation for the first time.

Building the documentation is easy: run make html within the docs directory to generate the HTML output. You can then open the index.html file from the docs/_build/html folder in your browser.

4. Configure Sphinx to find your code.

The next step is to generate and include documentation from your Python source code. To make that possible, first edit the conf.py file within the docs folder and uncomment these three lines:

#import os

#import sys

#sys.path.insert(0, os.path.abspath('.'))

Once they are uncommented, change the last line to the following, because the divisible.py module sits one level above the docs folder:

sys.path.insert(0, os.path.abspath('..'))

A more robust alternative is to make sure your package is installed in the environment where Sphinx runs. This takes a little more setup, but it avoids editing sys.path.

Last but not least, you are going to use another extension called Napoleon. It lets you write docstrings in the Google or NumPy style, which are easier to read than plain reStructuredText, and converts them for Sphinx. To enable it, add the following line to the extensions list in conf.py, after 'sphinx.ext.autodoc':

    'sphinx.ext.napoleon',
You can read the sphinx.ext.napoleon documentation for more information about the docstring styles it supports.
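Taken together, the conf.py settings from steps 2 and 4 amount to something like the following excerpt. This is a sketch: a generated conf.py contains more settings, and the intersphinx_mapping value is an assumption added here so that references to standard library objects (such as ZeroDivisionError) can resolve to the Python documentation.

```python
# conf.py (excerpt): a sketch of the settings from steps 2 and 4.
import os
import sys

# divisible.py lives one level above the docs folder, so put that folder
# on the import path so that autodoc can import the module.
sys.path.insert(0, os.path.abspath('..'))

extensions = [
    'sphinx.ext.autodoc',      # pull documentation from docstrings
    'sphinx.ext.napoleon',     # understand Google/NumPy-style docstrings
    'sphinx.ext.intersphinx',  # link to other Sphinx projects
]

# Assumed mapping: lets references to standard library objects
# link to the Python documentation.
intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
```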

5. Generate documentation from the source code.

Adding the documentation from a module to Sphinx is simple: add the following two lines to your index.rst:

    .. automodule:: divisible
       :members:

Once those two lines are added, run make html again and check whether any errors are reported. If none appear, you are all set: Sphinx is now configured to pull the documentation from your docstrings into the generated pages.

6. Add docstrings.

To give Sphinx something to work with, add a docstring at the module level and one docstring for the function that you defined.

Your divisible.py file should now look like the following:

"""Functions to work with divisibles."""


def is_divisible(x, y):
    """Check if a number is divisible by another.

    Arguments:
        x (int): Dividend of the operation.
        y (int): Divisor of the operation.

    Returns:
        True if x can be divided by y without remainder,
        False otherwise.

    Raises:
        :obj:`ZeroDivisionError` if y is 0.
    """
    if x % y == 0:
        return True
    else:
        return False

You are using the Google-style syntax that Napoleon understands to describe the arguments that the function takes, what it returns, and the exception it raises. Note that you use the special :obj: role to reference the exception that it raises; this generates a link to the definition of the object.

If you run make html again, the generated page should now include the module docstring and the documentation for the is_divisible function.
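autodoc builds its output from the docstrings attached to your objects, so a quick way to check roughly what Sphinx will see is to inspect them directly. The following sketch repeats the function (using an equivalent one-line body), prints the first line of the cleaned-up docstring returned by inspect.getdoc, and confirms the behavior the docstring promises, including the ZeroDivisionError for y equal to 0.

```python
import inspect


def is_divisible(x, y):
    """Check if a number is divisible by another.

    Arguments:
        x (int): Dividend of the operation.
        y (int): Divisor of the operation.

    Returns:
        True if x can be divided by y without remainder,
        False otherwise.

    Raises:
        :obj:`ZeroDivisionError` if y is 0.
    """
    return x % y == 0


# The docstring text (with indentation cleaned up) that tools can read
print(inspect.getdoc(is_divisible).splitlines()[0])

print(is_divisible(10, 5))  # True
print(is_divisible(10, 3))  # False

try:
    is_divisible(1, 0)
except ZeroDivisionError as exc:
    print("Raised as documented:", exc)
```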

#### More Complex Documentation
In the previous exercise, you wrote simple documentation for a really small module. Most libraries also include tutorials and guides along with their API documentation. Django, Flask, and CPython are good examples; their documentation is all generated with Sphinx.

Note that if you intend your library to be used extensively and successfully, documentation will be a key part of it. The API documentation that you generated before is the right place to describe how an API behaves. However, there is also room for short guides on specific features, or for tutorials that walk users through the most common steps to start a project.

Additionally, there are services such as Read the Docs, which greatly simplify building and hosting documentation. You can take the project that you just generated and connect it to Read the Docs through its web interface, so that your documentation is hosted on the web and automatically rebuilt every time you update the master branch of your project.

Note: You can go to readthedocs.org to create an account and set up your repositories in GitHub to automatically generate documentation.

#### Source Management
When you work with code, you need a way to keep a picture of how your code evolves and how changes are applied to different files. For instance, say that you make a mistaken change that suddenly breaks your code, or you start making changes and just want to go back to the previous version. Many people start by copying their source code into different folders and naming them with timestamps at checkpoints in different phases of the project. This is the most rudimentary approach to version control.

Version control is the practice of keeping control of code as it evolves over time. Developers have long needed software that can do this job efficiently, and one of the most popular tools for it is Git. Git is a distributed version control system that allows developers to manage their code locally as it evolves, look at its history, and easily collaborate with other developers. Git is used to manage some of the biggest projects in the world, such as Windows, CPython, Linux, and Git itself; at the same time, it is useful and versatile for small projects as well.

##### Repository
A repository is an isolated workspace where you can work with your changes and have git record them and track the history of them. One repository can contain as many files and folders as you want, with all of them tracked by git.

There are two ways to create a repository: you can either clone an existing repository with git clone <url of the repository>, which brings a local copy of that repository into your current path, or you can create a repository from an existing folder with git init, which marks the folder as a repository by creating the necessary files.

Once you have a local repository, you can start working with the version control system by issuing commands to add changes, check previous versions, and more.

##### Commit
Commit objects make up the history of your repository. A repository has many commits: one for every time you run git commit. Each commit contains the commit message (a title and an optional body), the author of the changes, the person who added the commit to the repository (the committer), the dates when the changes were made and committed, an ID in the form of a hash, and the hash of its parent commit. Because every commit points to its parent, the commits form a graph that lets you see the history of your source code. You can see the content of any commit by running git show <commit sha>.
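To see why a commit's hash also pins down its whole history, here is a deliberately simplified model in Python. It is not git's real object format (git also hashes the file tree, timestamps, and a type header); it only assumes that a commit's ID is a SHA-1 hash of its contents, and that those contents include the parent's ID. The author name "Ada" is a placeholder.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    """A toy commit: not git's real format, just the fields that matter here."""
    title: str
    author: str
    parent: Optional[str]  # ID of the parent commit, None for the first one

    @property
    def sha(self) -> str:
        # The parent's ID is part of the hashed content, so changing any
        # earlier commit changes the IDs of every commit that follows it.
        content = f"{self.parent}\n{self.author}\n{self.title}"
        return hashlib.sha1(content.encode()).hexdigest()


def build_history(titles, author="Ada"):  # "Ada" is a placeholder author
    """Chain commits together, each pointing at the previous one."""
    history, parent = [], None
    for title in titles:
        commit = Commit(title, author, parent)
        history.append(commit)
        parent = commit.sha
    return history


original = build_history(["Initial", "Add docs", "Fix typo"])
rewritten = build_history(["Initial (edited)", "Add docs", "Fix typo"])

# Same last title, but a different ancestor, so a different ID
print(original[-1].sha != rewritten[-1].sha)  # True
```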

When you run git commit, you create a commit from all the changes in the staging area. An editor opens so that you can write the commit message, made up of a title and a body.

Note: There are good guides online on how to write good commit messages. We suggest that you read one after finishing this workshop.

##### Staging Area
When you are working locally, making changes to your files and source code, git reports that those files have changed but that the changes are not yet saved in the repository. By running git status, you can see which files were modified. If you decide that you want to save those changes in the staging area in preparation for a commit, you can add them with the git add <path> command. It accepts files as well as folders; for a folder, it adds all the changed files within it. Once the changes are in the staging area, the next git commit command saves them in the repository as a commit object.

Sometimes, you don't want to add all the contents of a file to the staging area, just part of them. For this use case, both git commit and git add have a -p option that walks you through each changed chunk of your code and asks whether you want to add it.

##### Undoing Local Changes
When you are working on a file, you can run git diff to see all the changes that you have made locally but that are not yet in the staging area or in a commit. Sometimes, you realize that you want to undo those changes and go back to the version saved in the staging area or in the last commit. You can do this by checking out the file with git checkout -- <path>, which works for both files and folders. Alternatively, git stash sets all your local changes aside and leaves you with a clean working copy; git stash pop brings them back.

If instead you want to move your repository back to a previous commit in its history, run git reset <commit sha>. By default, this keeps the changes from the later commits in your working files; add --hard to discard them as well.

##### History
As mentioned before, the repository has a commit history that includes every commit made so far. You can see it by running git log, which shows the title, body, and other information for each commit. The most important part of each entry is the commit sha, which uniquely identifies the commit.

##### Ignoring Files
When you work with your source code, you may find that running your program, or some other action, creates files in your repository that you don't want git to track. In that scenario, you can list them in a special file named .gitignore, placed at the top of the repository, using glob patterns. This is especially handy for things such as IDE-generated files, compiled Python files, and more.
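Each line of a .gitignore file is a glob pattern. As a rough approximation only (git's own rules also handle negation with !, directory-only patterns ending in /, and patterns anchored to a folder), Python's fnmatch module can show how such patterns match file names. The patterns below are illustrative, not a recommended list.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Example .gitignore contents (illustrative only)
GITIGNORE = """
# compiled Python files
*.pyc
__pycache__
# IDE settings
.idea
.vscode
"""


def load_patterns(text):
    """Return the non-empty, non-comment lines of a .gitignore file."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def is_ignored(path, patterns):
    """Approximate check: ignore the path if any of its parts match a pattern."""
    parts = PurePosixPath(path).parts
    return any(fnmatch(part, pattern) for part in parts for pattern in patterns)


patterns = load_patterns(GITIGNORE)
for path in ["app.py", "pkg/__pycache__/app.cpython-312.pyc", ".idea/workspace.xml"]:
    print(path, "->", "ignored" if is_ignored(path, patterns) else "tracked")
```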

##### Exercise: Making a Change in CPython Using git
In this exercise, you are going to change a file in the CPython repository by cloning the repository and working on your local copy. For the sake of the exercise, you will just add your name to the list of contributors to the project.

Note: The repository will be on your local PC, so no one will see the changes – don't worry.

Begin by installing git. On Windows, you can install it from git-scm.com; on Unix, you can install it with your system's package manager or by following the instructions on git-scm.com.

If you are running on Windows, follow this exercise using the Git Bash shell that comes with Git for Windows. On Unix, just use your preferred terminal:

1. Now, begin by cloning the cpython repository.

As mentioned before, you can create a repository by cloning an existing one. You can clone the CPython source code by running the following:

git clone https://github.com/python/cpython.git
This creates a folder named cpython in the current directory. Don't worry if it takes a few minutes; CPython has a lot of code and a long history.

2. Edit the Misc/ACKS file and confirm the changes.

You can now add your name to the Misc/ACKS file. To do this, open the file at that path and add your name in the right place; the list is sorted alphabetically by surname.

Check the changes by running git status. This command shows whether any files have changed.

Note how git status also tells you how to proceed, whether you want to add the changes to the staging area in preparation for a commit or discard them. Next, check the content of the changes by running git diff.

This provides you with a nice output that indicates the changes in the lines. Green with a plus sign means that a line was added, while red with a minus sign means a line was removed.

3. Now commit the changes.

Now that you are happy with the changes, add them to the staging area by running git add Misc/ACKS. This moves the file into the staging area, so that you can commit it at any time by running git commit. When you run git commit, an editor opens for you to write the commit message. Add a title and a body, separated by an empty line.

When you save and close the editor, the commit is created.

You have created your first commit. You can check its contents by running git show.



#### Developing Collaboratively
At its heart, membership of a programming team involves multiple people sharing their changes through git and ensuring that you are incorporating everybody else's changes when doing your own work.

There are many ways for people to work together using git. The developers of the Linux kernel each maintain their own repository and share potential changes over email, which they each choose whether to incorporate or not. Large companies, including Facebook and Google, use trunk-based development, in which all changes must be made on the main branch, usually called the "master."

A common workflow popularized by support in the GitHub user interface is the pull request.

In the pull request workflow, you maintain your repository as a fork in GitHub of the canonical version from which software releases are made, usually referred to as upstream. You make a small collection of related changes, each representing progress toward a single bug fix or new feature, in a named branch on your own repository, which you push to your hosted repository with git push. When you are ready, you submit a pull request to the upstream repository. The team reviews these changes together in the pull request, and you add any further work needed to the branch. When the team is happy with the pull request, a supervisor or another developer merges it upstream, and the changes are "pulled" into the canonical version of the software.

The advantage of the pull request workflow is that it's made easy by the user interface in applications such as Bitbucket, GitHub, and GitLab. The disadvantage comes from keeping those branches around while the pull request is being created and is under review. It's easy to fall behind as other work goes into the upstream repository, leaving your branch out of date and introducing the possibility that your change will conflict with some other changes, and those conflicts will need a resolution.

To deal with fresh changes and conflicts as they arise, rather than as a huge headache when it comes time to merge the pull request, you use git to fetch changes from the upstream repository, and either merge them into your branch or rebase your branch on the up-to-date upstream revision. Merging combines the history of commits on two branches and rebasing reapplies commits such that they start at the tip of the branch you are rebasing against. Your team should decide which of these approaches they prefer.
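The difference between merging and rebasing can be sketched with a toy model in which a branch is just a list of commit labels. This is an illustration under simplifying assumptions (no conflicts, a single shared base commit), not how git actually stores branches; in particular, real merge history is a graph, which the sketch flattens into a list.

```python
def rebase(feature, upstream, base):
    """Replay the feature branch's own commits on top of upstream's tip.

    Rebased commits are new commits (marked with ') because their parent changed.
    """
    own_commits = feature[feature.index(base) + 1:]
    return upstream + [f"{commit}'" for commit in own_commits]


def merge(feature, upstream, base):
    """Keep both histories and tie them together with a merge commit."""
    upstream_only = upstream[upstream.index(base) + 1:]
    merge_commit = f"Merge({feature[-1]}, {upstream[-1]})"
    # Real history is a graph; here the upstream commits are simply listed
    # before the merge commit that joins the two lines of work.
    return feature + upstream_only + [merge_commit]


upstream = ["A", "B", "U1", "U2"]  # new work landed upstream after B
feature = ["A", "B", "F1", "F2"]   # your branch, started from B

print(rebase(feature, upstream, "B"))  # ['A', 'B', 'U1', 'U2', "F1'", "F2'"]
print(merge(feature, upstream, "B"))   # ['A', 'B', 'F1', 'F2', 'U1', 'U2', 'Merge(F2, U2)']
```

With rebasing, your commits end up after the latest upstream work as if you had started from it; with merging, both lines of work are preserved and joined by an extra commit.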

##### Exercise: Writing Python on GitHub as a Team
In this exercise, you will learn how to host code on GitHub, make a pull request, and then approve changes to the code. To make this exercise more effective, you can collaborate with a friend.

1. Log in to GitHub and create a new repository.

2. Give the repository an appropriate name, such as python-demo, and click on Create.

3. Now click on Clone or download to see the repository's URL. It shows the HTTPS URL by default, but you will need the SSH URL, so click on Use SSH on the same tab.

4. Copy the SSH URL from GitHub. Then, using your local command prompt (such as CMD on Windows), clone the repository by running git clone followed by the URL you copied.

5. In your new python-demo directory, create a Python file. It doesn't matter what it contains; for instance, create a simple one-line test.py file, as shown in the following code snippet:

echo "x = 5" >> test.py

6. Commit your changes and push them to GitHub:

git add .

git commit -m "Initial"

git push origin master

7. Create a new branch called dev:

git checkout -b dev

8. Create a new file called hello_world.py. You can do this in a text editor or, in a Unix shell, with the following command (the single quotes keep the double quotes inside the file):

echo 'print("Hello World!")' >> hello_world.py

9. Commit the new file to the dev branch and push it to the python-demo repository:

git add .

git commit -m "Adding hello_world"

git push --set-upstream origin dev

10. Go to the project repository in your web browser and click on Compare & pull request.

11. Here, you can see a list of the changes made on the dev branch. You can also write an explanation for whoever reviews your code before deciding whether it should be merged into the master branch.

12. Click on Create pull request to submit it on GitHub.

Now, if you are working with a partner, switch back to the original repository that you own and view their pull request. You can comment on it if you have any concerns about the changes; otherwise, simply click on Merge pull request.

#### Dependency Management
In the IT world, most complex programs depend on libraries beyond the Python standard library. You may use numpy or pandas to deal with multidimensional data or matplotlib to visualize data in graphs or any number of other libraries available to Python developers.

Just like your own software, the libraries developed by other teams change frequently as bugs are fixed, features are added, and old code is removed or refactored (restructured without changing its behavior). That means it's important that everyone on your team uses the same version of each library, so that it works the same way for all of them.

Additionally, you want your customers or the servers where you deploy your software to use the same versions of the same libraries as well, so that everything works the same way on their computers, too.
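One lightweight way to notice version drift is to compare the versions installed in the current environment against the versions your team agreed on. The following sketch uses importlib.metadata from the standard library (Python 3.8 and later); the versions in PINNED are hypothetical examples, not recommendations.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical team-wide pins; replace them with your project's real requirements.
PINNED = {"numpy": "1.26.4", "pandas": "2.2.2"}


def check_versions(pinned):
    """Return {package: (expected, installed)} for every mismatch.

    installed is None when the package is missing from this environment.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


for name, (expected, installed) in check_versions(PINNED).items():
    print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```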

There are multiple tools for solving this problem. These include pip, easy_install, brew, and conda, to name a few. You are already familiar with pip, and in some contexts, it suffices to use this package manager to keep track of dependencies.

#### Virtual Environments
Here, you will use conda to create "virtual environments." When you code in Python, you have certain versions of certain packages installed, and you are also using a specific version of Python. However, what if you are working on two projects that each require different versions of the same packages? You would need to reinstall the packages every time you switched between the projects, which would be a hassle. Virtual environments address this problem.

A virtual environment contains a set of particular packages at specific versions. By switching between virtual environments, you can switch between different packages and versions instantly. Typically, you will have a different virtual environment for each major project you are working on.
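Because each environment has its own interpreter and package directories, a quick way to confirm which one is active is to ask Python itself. This sketch reports the running interpreter, the environment prefix, and the folder where packages are installed. When a conda environment such as example_env is active, these paths should point inside that environment's folder; the exact paths depend on your installation.

```python
import sys
import sysconfig


def describe_environment():
    """Summarize the interpreter and install locations in use right now."""
    return {
        "python": sys.version.split()[0],
        "executable": sys.executable,                       # interpreter binary in use
        "prefix": sys.prefix,                               # root of the active environment
        "site_packages": sysconfig.get_paths()["purelib"],  # where packages get installed
    }


for key, value in describe_environment().items():
    print(f"{key:>13}: {value}")
```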

##### Exercise: Creating and Setting Up a conda Virtual Environment to install numpy and pandas
In this exercise, you'll create a virtual environment with conda and execute some simple code to import basic libraries. This exercise will be performed in the conda environment.

Now, with conda installed on your system, you can create a new conda environment and include packages in it; for example, numpy.

1. Now you should run the following command using the Anaconda Prompt program, which is now installed on your computer:

conda create -n example_env numpy

2. Activate the conda environment:

conda activate example_env

You can add other packages to the environment with conda install.

3. Now, add pandas to the example_env environment:

conda install pandas

4. Next, open a Python terminal within the virtual environment by typing python, and then verify that you can import pandas and numpy as expected:

python

import pandas as pd

import numpy as np

5. Now, exit the Python terminal using the exit() function:

exit()

6. Finally, deactivate the virtual environment:

conda deactivate

Note: Some commands in this chapter are shown with a leading $ sign. Do not type the dollar sign; it only indicates that the command is run in a terminal.

##### Saving and Sharing Virtual Environments
Now, suppose you have built an application that relies on various Python packages. You now decide that you want to run the application on a server, so you want a way of setting up the same virtual environment on the server as you have running on your local machine. As you previously encountered with pip freeze, the metadata defining a conda environment can be easily exported to a file that can be used to recreate an identical environment on another computer.
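Both conda env export and pip freeze boil down to listing what is installed together with exact versions. As an illustration of that idea (not a replacement for either tool), the following sketch builds pip-freeze-style name==version lines from the standard library's importlib.metadata. The output file name requirements.txt is an assumed default.

```python
from importlib.metadata import distributions


def freeze():
    """Return sorted 'name==version' pins for the packages in this environment."""
    pins = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


def write_requirements(path="requirements.txt"):
    """Save the pins so the same versions can be installed elsewhere."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(freeze()) + "\n")


print("\n".join(freeze()[:5]))  # show the first few pins
```

A file written by write_requirements() uses the same name==version format that pip install -r accepts.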

##### Exercise: Sharing Environments between a conda Server and Your Local System
In this exercise, you will export the metadata of the example_env conda environment that you created in the previous exercise to a text file, and learn how to recreate the same environment from this file.

This exercise will be performed on the conda environment command line:

1. Activate your example environment, example_env:

conda activate example_env

2. Now, export the environment to a text file:

conda env export > example_env.yml

The env export command produces the environment's metadata as text (mainly a list of package versions), and the > example_env.yml part of the command stores that text in a file. The .yml extension denotes YAML, an easy-to-read file format that is commonly used to store configuration information.

3. Now deactivate that environment and remove it from conda:

conda deactivate

conda env remove --name example_env

4. You no longer have an example_env environment, but you can recreate it by importing the example_env.yml file you created earlier in the exercise:

conda env create -f example_env.yml 

You have now learned how to save your environment and how to create an environment from the saved file. You can use this approach to transfer your environment between your own computers, to share it with another developer, or even to deploy code to a server.

#### Deploying Code into Production
You now have all the pieces to get your code onto another computer and running. You can package your code so that it can be installed with pip, and use conda to create a portable definition of the environment your code needs. These tools still leave users a few steps to follow to get up and running, and each step adds effort and complexity that may put them off.

A common tool for one-command setup and installation of software is Docker. Docker is based on Linux container technologies; on Windows and macOS, Docker runs these Linux containers inside a lightweight Linux virtual machine. Programmers create Docker images, which are Linux filesystems containing all of the code, tools, and configuration files necessary to run their applications. Users download these images and use Docker to run them, or deploy them onto networks with docker-compose, Docker Swarm, Kubernetes, or similar tools.

You prepare your program for Docker by creating a file named Dockerfile that tells Docker what goes into your image. For a Python application, that means Python and your Python code.

Firstly, you need to install Docker.

Note that after installing, you may need to restart your computer.

To test Docker, run the hello-world image to confirm that Docker is correctly configured. hello-world is a tiny official image on Docker Hub that just prints a welcome message:

docker run hello-world

##### Exercise: Dockerizing Your Fizzbuzz Tool
In this exercise, you'll use Docker to create an executable version of a simple Python script that prints the numbers from 1 to 100. However, instead of multiples of 3 it prints Fizz, instead of multiples of 5 it prints Buzz, and instead of multiples of both it prints FizzBuzz.

This exercise will be performed in the docker environment:

1. Create a new directory called my_docker_app and cd into this directory, as shown in the following code snippet:

mkdir my_docker_app

cd my_docker_app

2. Within this directory, create an empty file called Dockerfile. You can create this with Jupyter Notebook, or your favorite text editor. Ensure this file does not have any extensions, such as .txt.
3. Now, add the first line to your Dockerfile:

FROM python:3

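This line tells Docker to start from an image that has Python 3 installed. Specifically, it uses the official Python image, which is built on top of the Debian Linux distribution. (A smaller variant based on the minimal Alpine distribution is available as python:3-alpine.)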

4. Next, create a fizzbuzz.py file in the my_docker_app directory with the following code:

for num in range(1,101):
    string = ""
    if num % 3 == 0:
        string = string + "Fizz"
    if num % 5 == 0:
        string = string + "Buzz"
    if num % 5 != 0 and num % 3 != 0:
        string = string + str(num)
    print(string)

5. Now add a second line to your Dockerfile. This line tells Docker to copy the fizzbuzz.py file into the root of the image:

ADD fizzbuzz.py /

6. Finally, add the command that Docker must run:

CMD [ "python", "./fizzbuzz.py" ]

7. Your Dockerfile file should look like this:

FROM python:3

ADD fizzbuzz.py /

CMD [ "python", "./fizzbuzz.py" ]

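Note: The image you build in the next step is stored and managed by Docker rather than saved as a regular file in your project folder, so use docker commands to work with it instead of trying to access its files directly.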

8. Now build your Docker image. You will give it the name fizzbuzz_app:

$ docker build -t fizzbuzz_app .

This command builds an image on your system that contains everything required to run your code in a minimal Linux environment.

9. Now you can run your program inside Docker:

docker run fizzbuzz_app

You can see the full list of Docker images available on your system by running docker images. This list should include your new fizzbuzz_app application. 

Finally, suppose your fizzbuzz script imported a third-party library, for example pandas (it shouldn't need to). In that case, your code would break, because the Python installation in the Docker image does not include the pandas package.

10. To fix this, you can simply add a RUN pip install pandas line to your Dockerfile. The updated Dockerfile looks like this:

FROM python:3

ADD fizzbuzz.py /

RUN pip install pandas

CMD [ "python", "./fizzbuzz.py" ]
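If you want to test the FizzBuzz logic before packaging it, one option is to wrap it in a function and keep the printing under a main guard. This is a sketch of a variation on the script above, not a required change for the exercise; the Dockerfile's CMD would run it in the same way.

```python
def fizzbuzz(num):
    """Return the FizzBuzz word for num, or the number itself as a string."""
    result = ""
    if num % 3 == 0:
        result += "Fizz"
    if num % 5 == 0:
        result += "Buzz"
    return result or str(num)


def main(limit=100):
    """Print the sequence from 1 to limit, as the original script does."""
    for num in range(1, limit + 1):
        print(fizzbuzz(num))


if __name__ == "__main__":
    main()
```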

#### Multiprocessing
It's common to need to execute more than one thing in parallel in a modern software system. Machine learning programs and scientific simulations benefit from using the multiple cores available in a modern processor, dividing their work up between concurrent threads operating on the parallel hardware. Graphical user interfaces and network servers do their work "in the background," leaving a thread available to respond to user events or new requests.

CPython's implementation, in particular its global interpreter lock, puts some limits on the ways in which a Python program can run work in parallel.

The three safest ways to work are as follows:

- Find a library that solves your problem and handles multiprocessing for you (which has been carefully tested).
- Launch a new Python interpreter by running another copy of your script as a completely separate process.
- Create a new thread within the existing interpreter to do some work concurrently.

The first of these is the easiest and the most likely to succeed. The second is fairly simple, but it imposes the most overhead on your computer, as the operating system is now running two independent Python interpreters. The third is very complicated and easy to get wrong, and it gains little for CPU-bound work, because Python maintains a Global Interpreter Lock (GIL), which means that only one thread at a time can execute Python bytecode. A quick rule of thumb for choosing between the three approaches is to always pick the first one. If no library exists to address your needs, pick the second. If you absolutely need to share memory between the concurrent tasks, or if your concurrent work is I/O-bound, then you can carefully choose the third.
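As a sketch of the third option in its safest form, the standard library's concurrent.futures module lets a pool of threads wait on I/O concurrently. The fetch function below is a hypothetical stand-in that simulates I/O with time.sleep; because sleeping releases the GIL, the waits can overlap even though only one thread executes Python bytecode at a time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(item):
    """Hypothetical I/O-bound task: pretend to wait on a network call."""
    time.sleep(0.1)  # sleeping releases the GIL, like real blocking I/O
    return f"result for {item}"


def fetch_all(items, workers=5):
    # map() hands items to the pool and yields results in input order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, items))


start = time.perf_counter()
results = fetch_all(range(5))
elapsed = time.perf_counter() - start
print(results)
print(f"took {elapsed:.2f}s")  # typically close to one sleep, not five
```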

#### Multiprocessing with execnet
It's possible to launch a new Python interpreter with the standard library's subprocess module. However, doing so leaves it up to you to decide what code to run and how to share data between the "parent" and "child" Python scripts.
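To make that extra work concrete, here is a minimal sketch that uses subprocess to square a number in a separate interpreter. You must pass the child's code as a string and handle the data exchange yourself, in this case by writing to the child's standard input and parsing its standard output.

```python
import subprocess
import sys

# Code for the child interpreter: read one number, print its square.
CHILD_CODE = "import sys; n = int(sys.stdin.read()); print(n ** 2)"


def square_in_child(number):
    """Run CHILD_CODE in a new Python process and return its answer."""
    completed = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],  # same interpreter as the parent
        input=str(number),                   # data sent to the child's stdin
        capture_output=True,
        text=True,
        check=True,                          # raise if the child fails
    )
    return int(completed.stdout)             # parse the child's stdout ourselves


print(square_in_child(7))  # 49
```

Every new task would need another process launch and its own text-based protocol, which is the bookkeeping that execnet takes off your hands.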

An easier interface is the execnet library. execnet makes it very easy to launch a new Python interpreter running some given code, including alternative implementations such as Jython and IronPython, which integrate with the Java virtual machine and the .NET common language runtime, respectively. It exposes an asynchronous communication channel between the parent and child Python scripts, so the parent can send data for the child to work on and carry on with its own work until it is ready to receive the result. If the parent is ready before the child has finished, the parent waits.

##### Exercise: Working with execnet to Execute a Simple Python Squaring Program
In this exercise, you'll create a squaring process that receives x over an execnet channel and responds with x**2. This is much too small a task to warrant multiprocessing, but it does demonstrate how to use the library.

This exercise will be performed on a Jupyter notebook:

1. First, install execnet using the pip package manager:

$ pip install execnet

2. Now write the square function, which receives numbers on a channel and returns their square:

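import execnet

def square(channel):
    # Runs in the child interpreter: keep serving while the channel is open
    while not channel.isclosed():
        number = channel.receive()    # wait for a number from the parent
        number_squared = number**2
        channel.send(number_squared)  # send the result back to the parent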

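Note: Due to the way execnet works, you must type the following examples into a Jupyter notebook rather than the interactive >>> prompt; execnet needs to read the source code of the function you send, and the plain interactive prompt does not keep it.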

The while not channel.isclosed() statement ensures that the function only keeps working while the channel between the parent and child Python processes is open. number = channel.receive() takes the number that the parent process wants squared. It is then squared in the number_squared = number**2 line. Lastly, channel.send(number_squared) sends the squared number back to the parent process.

3. Now set up a gateway channel to a remote Python interpreter running that function:

gateway = execnet.makegateway()

channel = gateway.remote_exec(square)

The gateway manages the connection to the child Python process, while the channel is what you use to actually send and receive data between the processes.

4. Now send some integers from the parent process to the child process, as shown in the following code snippet:

for i in range(10):
    channel.send(i)
    i_squared = channel.receive()
    print(f"{i} squared is {i_squared}")

Here, you loop through 10 integers, send each one through the channel, and then receive the result with channel.receive(). The loop prints the following:

0 squared is 0
1 squared is 1
2 squared is 4
3 squared is 9
4 squared is 16
5 squared is 25
6 squared is 36
7 squared is 49
8 squared is 64
9 squared is 81

5. When you are done with the remote Python interpreter, exit the gateway so that the remote interpreter shuts down:

gateway.exit()

In this exercise, you learned how to use execnet to pass data between Python processes.

#### Multiprocessing with the Multiprocessing Package
The multiprocessing module is built into Python's standard library. Similar to execnet, it allows you to launch new Python processes. However, it provides an API that is lower-level than execnet. This means that it's harder to use than execnet, but affords more flexibility. An execnet channel can be simulated by using a pair of multiprocessing queues.
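As a sketch of that idea, the following program rebuilds the squaring exercise with multiprocessing: one Queue carries numbers to the child and another carries the results back. Using None as a sentinel that tells the child to stop is a choice made for this sketch, not part of the library. Unlike the execnet version, it is meant to be run as a script, because the process-starting code must sit under an if __name__ == "__main__": guard on platforms that start children by importing the main module.

```python
from multiprocessing import Process, Queue


def square_worker(inbox, outbox):
    """Child process: square each number from inbox until None arrives."""
    while True:
        number = inbox.get()
        if number is None:  # sentinel: the parent has no more work
            break
        outbox.put(number ** 2)


def main():
    inbox, outbox = Queue(), Queue()  # together they play the role of a channel
    child = Process(target=square_worker, args=(inbox, outbox))
    child.start()
    for i in range(10):
        inbox.put(i)
        print(f"{i} squared is {outbox.get()}")
    inbox.put(None)  # ask the child to finish
    child.join()


if __name__ == "__main__":
    main()
```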