### The Importance of the Standard Library
Python is often described as coming with "batteries included," which refers to its standard library. The standard library is vast and covers an unusually wide range of tasks. It includes modules to connect to a socket, to send emails, to work with SQLite databases, to handle locale settings, and to encode and decode JSON and XML.

It is also renowned for including such modules as turtle and tkinter, graphical interfaces that most users probably don't use anymore, but they have proven useful when Python is taught at schools and universities.

It even includes IDLE, an integrated development environment for Python. These tools are not widely used, because other packages within the standard library are used more often or external tools replace them. The modules in the standard library can be divided into high-level modules and lower-level modules:

#### High-Level Modules
The Python standard library is truly vast and diverse, providing a toolbelt that can be used to write most trivial programs. You can open an interpreter and run the following code snippet to draw graphics on the screen. Lines that begin with the >>> symbol are typed at the interpreter prompt:

>>> from turtle import Turtle, done
>>> turtle = Turtle()
>>> turtle.right(180)
>>> turtle.forward(100)
>>> turtle.right(90)
>>> turtle.forward(50)
>>> done()

This code uses the turtle module to draw on the screen. The turtle is a cursor that leaves a trail behind it as it moves, and the module provides functions to move the turtle around the screen and draw as it advances.

Here is a detailed explanation of the turtle module code snippet:

1. It creates a turtle in the middle of the screen.
2. It then rotates it 180 degrees to the right.
3. It moves forward 100 pixels, painting as it walks.
4. It then rotates to the right once again, this time by 90 degrees.
5. It then moves forward 50 pixels once again.
6. It ends the program using done().

You can go ahead and explore and input different values, playing around a bit with the turtle module and checking the different outputs you get, before you dive further into this chapter.

The turtle module you worked on is an example of one of the high-level modules that the standard library offers.

Other examples of high-level modules include:

- difflib: To check the differences line by line across two blocks of text.
- re: For regular expressions, which will be covered in the Being Pythonic course.
- sqlite3: To create and interact with SQLite databases.
- Multiple data compressing and archiving modules, such as gzip, zipfile, and tarfile.
- xml, json, csv, and configparser: For working with multiple file formats.
- sched: To schedule events.
- argparse: For the straightforward creation of command-line interfaces.
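
As a quick taste of one of these modules, the following is a minimal sketch of difflib comparing two small blocks of text. The two lists of lines are invented for illustration.

```python
import difflib

# Two hypothetical versions of a menu, one line per list entry.
old_lines = ["spam", "eggs", "chips"]
new_lines = ["spam", "bacon", "chips"]

# unified_diff yields the diff one line at a time; lineterm="" is used
# because the input lines carry no trailing newlines.
diff = list(difflib.unified_diff(old_lines, new_lines,
                                 fromfile="old", tofile="new", lineterm=""))
print("\n".join(diff))
```

Lines prefixed with - exist only in the first block, and lines prefixed with + exist only in the second.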

Now, you will use another high-level module, argparse, as an example and see how it can be used to create a command-line interface that echoes words passed in and, optionally, capitalizes them in a few lines of code. Because parse_args() reads its arguments from the command line, the code is best saved in a script (such as the echo.py file used later in this topic) and run with a message as its argument:

import argparse
parser = argparse.ArgumentParser()
parser.add_argument("message", help="Message to be echoed")
parser.add_argument("-c", "--capitalize", action="store_true")
args = parser.parse_args()
if args.capitalize:
    print(args.message.capitalize())
else:
    print(args.message)

This code example creates an instance of the ArgumentParser class, which helps you to create command-line interface applications.

It then defines two arguments in lines 3 and 4: message and capitalize.

Note that capitalize can also be referred to as -c, and it becomes a Boolean flag option by changing the default action to store_true. At that point, you can just call parse_args, which will take the arguments passed in the command line, validate them, and expose them as attributes of args.

The code then takes the input message and chooses whether to capitalize it based on the flag.
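
If you want to experiment from an interpreter or a notebook, where there is no command line to read, parse_args() also accepts an explicit list of arguments. The sketch below relies on that; the sample message is an assumption made for illustration.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("message", help="Message to be echoed")
parser.add_argument("-c", "--capitalize", action="store_true")

# A list passed to parse_args() stands in for the command line.
plain = parser.parse_args(["hello world"])
flagged = parser.parse_args(["hello world", "-c"])

print(plain.message, plain.capitalize)
print(flagged.message.capitalize(), flagged.capitalize)
```

The store_true action means that capitalize is False unless the flag is present.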

#### Lower-Level Modules
The standard library also contains multiple lower-level modules that users rarely interact with directly. Good examples are the different internet protocol modules, text formatting and templating, interacting with C code, testing, serving HTTP sites, and so on. The standard library comes with low-level modules to satisfy the needs of users in many of those scenarios, but you will usually see Python developers relying on external libraries such as jinja2, requests, flask, cython, and cffi, which are built on top of the low-level standard library modules and provide a nicer, simpler, more powerful interface. It is not that you cannot create an extension with the C API or ctypes, but cython removes a lot of the boilerplate that the standard library approach requires you to write, and it optimizes for the most common scenarios.

Finally, there is another type of low-level module, which extends or simplifies the language. Notable examples of these are the following:

- asyncio: To write asynchronous code
- typing: To add type hints
- contextvars: To save state based on the context
- contextlib: To help with the creation of context managers
- doctest: To verify code examples in documentation and docstrings
- pdb and bdb: To access debugging tools
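
To give a feel for how these modules simplify the language, here is a minimal sketch that uses contextlib to build a context manager from a plain function. The recorded() function and the events list are hypothetical names chosen for this illustration.

```python
import contextlib

events = []

@contextlib.contextmanager
def recorded(name):
    # Code before the yield runs when the with block is entered.
    events.append(f"enter {name}")
    try:
        yield
    finally:
        # Code in the finally clause runs when the with block exits.
        events.append(f"exit {name}")

with recorded("demo"):
    events.append("working")

print(events)
```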

There are also modules such as dis, ast, and code that allow the developer to inspect, interact with, and manipulate the Python interpreter and the runtime environment, but those aren't required by most beginner and intermediate developers.

#### Knowing How to Navigate in the Standard Library
Getting to know the standard library is key for any intermediate/advanced developer, even if you don't know how to use all the modules. Knowing what the library contains and when modules can be used provides any developer with a boost in speed and quality when developing Python applications.

While developers from other languages may try to implement everything on their own from scratch, experienced Python programmers will always first ask themselves "how can I do this with the standard library?" since using the code in the standard library brings multiple benefits, which will be explained later in the chapter.

The standard library makes code simpler and easier to understand. By using modules such as dataclasses, you can replace code that would otherwise take many lines to write by hand and would most likely include bugs.

The dataclasses module allows you to create value-semantic types with fewer keystrokes by providing a decorator that can be applied to a class and that generates the boilerplate for the most common methods, such as __init__(), __repr__(), and __eq__().

##### Exercise: Using the dataclass Module
In this exercise, you will create a class to hold data for a geographical point. This is a simple structure with two coordinates, x and y.

These points are used by other developers who need to store geographical information. They will be working with these points daily, so they need an easy constructor, a readable printed form that shows the values, and a way to convert a point into a dictionary so that it can be saved to their database and shared with other people.

In [1]:
#Import the dataclasses module
import dataclasses

In [3]:
#defining a dataclass
@dataclasses.dataclass
class Point:
    x: int
    y: int 

In [4]:
#create an instance that holds the data for a geographical point
p = Point(x=10, y=20)
print(p)

Point(x=10, y=20)


In [5]:
p2 = Point(x=10, y=20)

p == p2

True

In [6]:
#serialize the data
dataclasses.asdict(p)

{'x': 10, 'y': 20}

The dataclasses module is part of the standard library, so most experienced users will already understand how a class decorated with the dataclass decorator behaves. A custom implementation of the same methods would require either further documentation to be written or for users to read and fully understand the code in every class that crafts those methods by hand.
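
To make the comparison with a custom implementation concrete, the following sketch places a hand-written class next to a dataclass with the same fields. ManualPoint and AutoPoint are hypothetical names, and the hand-written version covers only the constructor, the printed form, and equality.

```python
import dataclasses

class ManualPoint:
    """Hand-written equivalent of a two-field dataclass."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        return f"ManualPoint(x={self.x!r}, y={self.y!r})"

    def __eq__(self, other):
        if not isinstance(other, ManualPoint):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

@dataclasses.dataclass
class AutoPoint:
    x: int
    y: int

print(ManualPoint(1, 2), ManualPoint(1, 2) == ManualPoint(1, 2))
print(AutoPoint(1, 2), AutoPoint(1, 2) == AutoPoint(1, 2))
```

Every method in ManualPoint is something a reader has to check by hand, whereas the decorated class behaves in the way the standard library documents.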

Moreover, using the battle-tested code that the standard library provides is also key to writing efficient and robust applications. The sorted() function and the list.sort() method use a custom sorting algorithm known as Timsort. This is a hybrid, stable sorting algorithm derived from merge sort and insertion sort, and it will usually give better performance and fewer bugs than any algorithm that a user could implement in a limited amount of time.
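
The word "stable" has a practical meaning: items that compare as equal keep their original relative order. A small sketch, using invented records, shows the effect.

```python
# Hypothetical (name, group) records.
records = [("Terry", 2), ("Eric", 1), ("John", 2), ("Graham", 1)]

# Sort by group only; within a group, the input order is preserved.
by_group = sorted(records, key=lambda record: record[1])
print(by_group)
```

Eric stays ahead of Graham, and Terry stays ahead of John, because the sort never reorders records whose keys are equal.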

##### Exercise: Extending the echo.py Example
After the creation of the capitalize tool that you saw earlier in this topic, you can implement an enhanced version of the echo tool from Linux, which is useful on some embedded systems that have Python. You will use the previous code for capitalize and enhance it to have a nicer description. The enhanced echo command will also be able to repeat the words passed in and to take more than one word.

In [7]:
#Run this once echo.py exists in the working directory to see the tool's help text
%run echo -h


In [8]:
import argparse

parser = argparse.ArgumentParser(description="""
Prints out the words passed in, capitalizes them if required
and repeats them in as many lines as requested.
""")
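
One possible way to complete the tool is sketched below. It is only an illustration under stated assumptions: the --repeat option name and the build_parser() and echo_lines() helpers are hypothetical, and the arguments are passed in as a list so that the code also runs inside a notebook.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(description="""
    Prints out the words passed in, capitalizes them if required
    and repeats them in as many lines as requested.
    """)
    # nargs="+" accepts one or more words instead of exactly one.
    parser.add_argument("message", nargs="+", help="Messages to be echoed")
    parser.add_argument("-c", "--capitalize", action="store_true")
    # Hypothetical option name, chosen for this sketch.
    parser.add_argument("--repeat", type=int, default=1,
                        help="Number of lines to print")
    return parser

def echo_lines(argv):
    """Return the lines the tool would print for the given arguments."""
    args = build_parser().parse_args(argv)
    words = args.message
    if args.capitalize:
        words = [word.capitalize() for word in words]
    return [" ".join(words)] * args.repeat

for line in echo_lines(["hello", "world", "-c", "--repeat", "2"]):
    print(line)
```

In a script saved as echo.py, the last two lines would call echo_lines(sys.argv[1:]) instead, after importing sys.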

#### Using List Comprehensions
List comprehensions are a flexible, expressive way of writing Python expressions to create sequences of values. They make iterating over the input and building the resulting list implicit so that program authors and readers can focus on the important features of what the list represents. It is this concision that makes list comprehensions a Pythonic way of working with lists or sequences.

List comprehensions are built out of bits of Python syntax you have already seen. They are surrounded by square brackets ([]), which is Python's syntax for a literal list. They contain a for element in list clause, which is how Python iterates over the members of a collection. Optionally, they can filter elements out of a list using the familiar syntax of the if expression.

##### Exercise: Using List Comprehensions

In [3]:
cubes = []
for x in [1,2,3,4,5]:
    cubes.append(x**3)
print(cubes)

[1, 8, 27, 64, 125]


Understanding this code involves keeping track of the state of the cubes variable, which starts as an empty list, and of the x variable, which is used as a cursor to keep track of the program's position in the list. This is all irrelevant to the task at hand, which is to list the cubes of each of these numbers. It will be better – more Pythonic, even – to remove all the irrelevant details. Luckily, list comprehensions allow us to do that.

In [4]:
cubes = [x**3 for x in [1,2,3,4,5]]
print(cubes)

[1, 8, 27, 64, 125]


Now the code is as short and succinct as it can be. Rather than telling you the recipe that the computer follows to build a list of the cubes of the numbers 1, 2, 3, 4, and 5, it tells you that it calculates the cube of x for every x starting from 1 and smaller than 6. 

This is the essence of Pythonic coding: reducing the gap between what you say and what you mean when you tell the computer what it should do.

A list comprehension can also filter its inputs when building a list. To do this, you add an if expression to the end of the comprehension, where the expression can be any test of an input value that returns True or False. This is useful when you want to transform some of the values in a list while ignoring others. As an example, you could build a photo gallery of social media posts by making a list of thumbnail images from photos found in each post, but only when the posts are pictures, not text status updates.
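
The photo gallery idea can be sketched as follows. The post records and the thumbnail_name() helper are invented for illustration; a real gallery would resize images rather than just derive a file name.

```python
# Hypothetical post records.
posts = [
    {"kind": "photo", "file": "parrot.jpg"},
    {"kind": "status", "text": "And now for something completely different"},
    {"kind": "photo", "file": "lumberjack.png"},
]

def thumbnail_name(filename):
    # Stand-in for real image processing: only derives a new name.
    stem, dot, extension = filename.rpartition(".")
    return f"{stem}_thumb{dot}{extension}"

# Transform the photos and skip everything else.
gallery = [thumbnail_name(post["file"]) for post in posts if post["kind"] == "photo"]
print(gallery)
```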

You want to get Python to shout the names of the Monty Python cast, but only those whose name begins with "T". Enter the following Python code into a notebook:

In [6]:
names = ["Graham Chapman", "John Cleese", "Terry Gilliam", "Eric Idle", "Terry Jones"]

Those are the names you are going to use. Enter this list comprehension to filter only those that start with "T" and operate on them:

In [8]:
print([name.upper() for name in names if name.startswith("T")])

['TERRY GILLIAM', 'TERRY JONES']


##### Exercise: Using Multiple Input Lists
All the examples you have seen so far build one list out of another by evaluating an expression for each member of the list. You can also define a comprehension over multiple lists by giving a different element name to each of the lists.

To show how this works, in this exercise, you will be multiplying the elements of two lists together. The Spam Café in Monty Python's Flying Circus (refer to the preceding note) famously served a narrow range of foodstuffs mostly centered around a processed meat product. You will use ingredients from its menu to explore multiple-list comprehension:

In [9]:
print([x*y for x in ['spam', 'eggs', 'chips'] for y in [1,2,3]])

['spam', 'spamspam', 'spamspamspam', 'eggs', 'eggseggs', 'eggseggseggs', 'chips', 'chipschips', 'chipschipschips']


Inspecting the result shows that the collections are iterated in a nested fashion, with the rightmost collection on the inside of the nest and the leftmost on the outside. Here, if x is set to spam, then x*y is calculated with y being equal to each of the values of 1, 2, and then 3 before x is set to eggs, and so on.
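
The nesting order can be checked by writing out the equivalent loops. This sketch reuses the lists from the comprehension above; the variable names are chosen for illustration.

```python
foods = ['spam', 'eggs', 'chips']
counts = [1, 2, 3]

comprehension = [x * y for x in foods for y in counts]

nested = []
for x in foods:        # leftmost for clause: the outer loop
    for y in counts:   # rightmost for clause: the inner loop
        nested.append(x * y)

print(nested == comprehension)
```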


In [10]:
print([x*y for x in [1,2,3] for y in ['spam', 'eggs', 'chips']])

['spam', 'eggs', 'chips', 'spamspam', 'eggseggs', 'chipschips', 'spamspamspam', 'eggseggseggs', 'chipschipschips']


Swapping the order of the lists changes the order of the results in the comprehension. Now, x is initially set to 1, then y to each of spam, eggs, and chips, before x is set to 2, and so on. While the result of any one multiplication does not depend on its order (for instance, the results of 'spam'*2 and 2*'spam' are the same, namely, spamspam), the fact that the lists are iterated in a different order means that the same results are computed in a different sequence.



In [11]:
#The same list can be iterated multiple times in a list comprehension; the lists for x and y do not have to be different:

numbers = [1,2,3]
print([x**y for x in numbers for y in numbers])

[1, 1, 1, 2, 4, 8, 3, 9, 27]


##### Activity: Building a Chess Tournament
In this activity, you will use a list comprehension to create the fixtures for a chess tournament. Fixtures are strings of the form "player 1 vs. player 2." Because there is a slight advantage to playing as white, you also want to generate the "player 2 vs. player 1" fixture so that the tournament is fair. But you do not want people playing against themselves, so you should also filter out fixtures such as "player 1 vs. player 1."

In [14]:
names = ['Magnus Carlsen', 'Fabiano Caruana', 'Yifan Hou', 'Wenjun Ju']
fixtures = [f'{p1} vs. {p2}' for p1 in names for p2 in names if p1 != p2]
print(fixtures)

['Magnus Carlsen vs. Fabiano Caruana', 'Magnus Carlsen vs. Yifan Hou', 'Magnus Carlsen vs. Wenjun Ju', 'Fabiano Caruana vs. Magnus Carlsen', 'Fabiano Caruana vs. Yifan Hou', 'Fabiano Caruana vs. Wenjun Ju', 'Yifan Hou vs. Magnus Carlsen', 'Yifan Hou vs. Fabiano Caruana', 'Yifan Hou vs. Wenjun Ju', 'Wenjun Ju vs. Magnus Carlsen', 'Wenjun Ju vs. Fabiano Caruana', 'Wenjun Ju vs. Yifan Hou']


#### Set and Dictionary Comprehensions
List comprehensions are handy ways to concisely build sequences of values in Python. Other forms of comprehensions are also available, which you can use to build other collection types. A set is an unordered collection: you can see what elements are in a set, but you cannot index into a set nor insert an object at a particular location in the set because the elements are not ordered. An element can only be present in a set once, whereas it could appear in a list multiple times.

Sets are frequently useful in situations where you want to quickly test whether an object is in a collection but do not need to track the order of the objects in the collection. For example, a web service might keep track of all of the active session tokens in a set, so that when it receives a request, it can test whether the session token corresponds to an active session.

A dictionary is a collection of pairs of objects, where one object in the pair is called the key, and the other is called the value. In this case, you associate a value with a particular key, and then you can ask the dictionary for the value associated with that key. Each key may only be present in a dictionary once, but multiple keys may be associated with the same value. While the name "dictionary" suggests a connection between terms and their definitions, dictionaries are commonly used as indices (and, therefore, a dictionary comprehension is often used to build an index). Going back to your web service example, different users of the service could have different permissions, thus limiting the actions that they can perform. The web service could construct a dictionary in which the keys are session tokens, and the values represent user permissions. This is so that it can quickly tell whether a request associated with a given session is permissible.
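
The web service example can be sketched with a set for the active sessions and a dictionary comprehension for the permissions index. All tokens, roles, and helper names here are hypothetical.

```python
# Hypothetical session data.
active_sessions = {"a1b2", "c3d4"}
roles_by_token = {"a1b2": "admin", "c3d4": "viewer"}
actions_by_role = {"admin": {"read", "write"}, "viewer": {"read"}}

# Index: session token -> set of permitted actions.
permissions = {token: actions_by_role[role]
               for token, role in roles_by_token.items()}

def is_permitted(token, action):
    # Membership tests on sets and dictionaries do not depend on order.
    if token not in active_sessions:
        return False
    return action in permissions.get(token, set())

print(is_permitted("a1b2", "write"))
print(is_permitted("c3d4", "write"))
print(is_permitted("zzzz", "read"))
```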

The syntax for both set and dictionary comprehensions looks very similar to list comprehension, with the square brackets ([]) simply replaced by curly braces ({}). The difference between the two is how the elements are described. For a set, you need to indicate a single element, for example, { x for x in … }. For a dictionary, you need to indicate a pair containing the key and the value, for example, { key:value for key in… }.

In [15]:
#to get a list
print([a + b for a in [0,1,2,3] for b in [4,3,2,1]])

[4, 3, 2, 1, 5, 4, 3, 2, 6, 5, 4, 3, 7, 6, 5, 4]


In [16]:
#change the list above to a set

print({a+b for a in [0,1,2,3] for b in [4,3,2,1]})

{1, 2, 3, 4, 5, 6, 7}


Notice that the set created in the second cell is much shorter than the list created in the first. The reason for this is that the set does not contain duplicate entries – try counting how many times the number 4 appears in each collection. It's in the list four times (because 0 + 4 = 4, 1 + 3 = 4, 2 + 2 = 4, and 3 + 1 = 4), but sets don't retain duplicates, so there's only one instance of the number 4 in the set. If you just removed the duplicates from the list, you'd have a list of [4, 3, 2, 1, 5, 6, 7]. Sets don't preserve the order of their elements either, so the numbers appear in a different order in the set. The fact that the numbers in the set appear in numerical order is a side effect of how the set type is implemented in Python and should not be relied on.
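
If you do want to drop duplicates while keeping the first-seen order, one common approach (not the one used in the exercise) relies on the fact that dictionaries keep their keys in insertion order. A minimal sketch:

```python
sums = [a + b for a in [0, 1, 2, 3] for b in [4, 3, 2, 1]]

# dict.fromkeys keeps one key per distinct value, in first-seen order.
unique_in_order = list(dict.fromkeys(sums))
print(unique_in_order)

# The set holds the same values, without any promise about order.
print(set(sums) == set(unique_in_order))
```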




#### Using Dictionary Comprehensions
Curly-brace comprehensions can also be used to create a dictionary. The expression on the left-hand side of the for keyword in the comprehension should contain a key-value pair. You write the expression that will generate the dictionary keys to the left of the colon and the expression that will generate the values to the right. Note that a key can only appear once in a dictionary.

In [18]:
names = ["Eric", "Graham", "Terry", "John", "Terry"]
print({k: len(k) for k in names})

{'Eric': 4, 'Graham': 6, 'Terry': 5, 'John': 4}


Notice that the entry for Terry only appears once, because dictionaries cannot contain duplicate keys. You have created an index of the length of each name, keyed by name. An index like this could be useful in a game, where it could work out how to lay out the score table for each player without repeatedly having to recalculate the length of each player's name.




##### Activity: Building a Scorecard Using Dictionary Comprehensions and Multiple Lists
You are the backend developer for a renowned college. The management has asked you to build a demo scorecard for their students based on the marks they have achieved in their exams.

Your goal in this activity is to use dictionary comprehension and lists in Python to build a demo scorecard for four students in the college.

In [23]:
students = ['Eric', 'Mark', 'Wade', 'Betty']
scores = [50,79,98,56]
score = {students[i]: scores[i] for i in range(len(students))}

print(score)

{'Eric': 50, 'Mark': 79, 'Wade': 98, 'Betty': 56}
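
An alternative sketch pairs the two lists with the built-in zip() function, which avoids indexing altogether. The scorecard and top_student names are chosen for illustration.

```python
students = ['Eric', 'Mark', 'Wade', 'Betty']
scores = [50, 79, 98, 56]

# zip() walks both lists in step and yields (student, score) pairs.
scorecard = {student: score for student, score in zip(students, scores)}
print(scorecard)

# The dictionary can then answer questions such as who scored highest.
top_student = max(scorecard, key=scorecard.get)
print(top_student)
```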


#### Default Dictionary
The built-in dictionary type considers it to be an error when you try to access the value for a key that doesn't exist. It will raise a KeyError, which you have to handle or your program crashes. Often, that's a good idea. If the programmer doesn't get the key correct, it could indicate a typo or a misunderstanding of how the dictionary is used.

It's often a good idea, but not always. Sometimes a programmer cannot know in advance what the dictionary contains, for example, when it is created from a file supplied by the user or from the content of a network request. In situations like this, any of the keys the programmer expects could be missing, but handling KeyError instances everywhere would be tedious and repetitive and would make the intent of the code harder to see.

For these situations, Python provides the collections.defaultdict type. It works like a regular dictionary, except that you can give it a function that creates a default value to use when a key is missing. Rather than raise an error, it calls that function and returns the result.
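
A common use of this behavior is grouping values, with list as the function that creates the default. The sketch below groups surnames by first name; the variable names are chosen for illustration.

```python
from collections import defaultdict

names = ["Graham Chapman", "John Cleese", "Terry Gilliam", "Eric Idle", "Terry Jones"]

# list() creates an empty list the first time each key is seen.
surnames_by_first_name = defaultdict(list)
for full_name in names:
    first_name, surname = full_name.split()
    surnames_by_first_name[first_name].append(surname)

print(dict(surnames_by_first_name))
```

With a regular dictionary, each first name would need a check for whether its list already exists before appending.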

##### Exercise: Adopting a Default Dict
In this exercise, you will be using a regular dictionary that raises a KeyError when you try to access a missing key:

In [24]:
john = { 'first_name': 'John', 'surname': 'Cleese' }
john['middle_name']

KeyError: 'middle_name'

In [26]:
#Now, import the defaultdict from collections and wrap the dictionary in a defaultdict:

from collections import defaultdict
safe_john = defaultdict(str, john)

print(safe_john['middle_name'])




Using the wrapped dictionary does not throw an error when undefined keys are used. No exception is triggered at this stage; instead, an empty string is returned. The first argument to the constructor of defaultdict, called default_factory, can be any callable (that is, function-like) object. It is called with no arguments, so it cannot see the missing key, but you can use it to return a default value that is relevant to your domain.
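
Two details of this behavior are worth seeing in code. The sketch below shows that looking up a missing key in a defaultdict also stores the default, and that a dict subclass with a __missing__() method is one way to build a default from the key itself. KeyAwareDefaults is a hypothetical class written for this illustration.

```python
from collections import defaultdict

lengths = defaultdict(int)
print(lengths["missing"])      # int() is called with no arguments
print("missing" in lengths)    # the lookup stored the default value

class KeyAwareDefaults(dict):
    """Hypothetical helper that derives the default from the key."""

    def __missing__(self, key):
        # Called by dict lookups when the key is absent; nothing is stored.
        return f"<no value for {key}>"

labels = KeyAwareDefaults(first_name="John")
print(labels["first_name"])
print(labels["middle_name"])
```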


Create a defaultdict that uses a lambda as its default_factory. default_factory is a function that returns the default value for the missing keys.

In [28]:
from collections import defaultdict
courses = defaultdict(lambda: 'No!')
courses['Java'] = 'This is Java'

In [29]:
print(courses['Python'])

No!


In [30]:
print(courses['Java'])

This is Java


The benefit of the default dictionary is that in situations where you know it is likely that expected keys will be missing from a dictionary, you can work with default values and not have to sprinkle your code with exception-handling blocks. This is another example of Pythonicity: if what you mean is "use the value for the 'foo' key, but if that doesn't exist, then use 'bar' as the value," then you should write that, rather than "use the value for the 'foo' key, but if you get an exception and the exception is KeyError, then use 'bar' as the value."

Default dicts are great for working with untrusted input, such as a file chosen by the user or an object received over the network. A network service shouldn't expect any input it gets from a client to be well formatted. If it treats the data it receives in a request as a JSON object, it should be ready for the data to not be in JSON format. Even if the data really is JSON, the program should not expect all of the keys defined by the API to have been supplied by the client. The default dict gives you a really concise way to work with such under-specified data.

#### Iterators
The Pythonic secret that enables comprehensions to find all of the entries in a list, range, or other collection is an iterator. Supporting iterators in your own classes opens them up for use in comprehensions, for…in loops, and anywhere that Python works with collections. Your collection must implement a method called __iter__(), which returns the iterator.

The iterator itself is also a Python object with a simple contract. It must provide a __next__() method. Each time __next__() is called, the iterator returns the next value in the collection. When the iterator reaches the end of the collection, __next__() raises StopIteration to signal that the iteration should terminate. Iterators are also expected to provide an __iter__() method that returns the iterator itself, so that an iterator can be used anywhere a collection is expected.
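
The contract can be seen by writing out, roughly, what a for loop does on your behalf. This is a simplified sketch rather than the interpreter's actual implementation, and the questions list is sample data.

```python
questions = ["What is your name?", "What is your quest?"]

seen = []
iterator = iter(questions)       # calls questions.__iter__()
while True:
    try:
        item = next(iterator)    # calls iterator.__next__()
    except StopIteration:
        break                    # the end of the collection was reached
    seen.append(item)

print(seen)
```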

If you've used exceptions in other programming languages, you may be surprised by this use of an exception to signal a fairly commonplace situation. After all, plenty of loops reach an end, so it's not exactly an exceptional circumstance. Python is not so dogmatic about exceptions, favoring simplicity and expressiveness over universal rules-lawyering.

Once you've learned the techniques to build iterators, the applications are limitless. Your own collections or collection-like classes can supply iterators so that programmers can work with them using Pythonic collection techniques such as comprehensions. For example, an application that stores its data model in a database can use an iterator to retrieve each row that matches a query as a separate object in a loop or comprehension. A programmer can say, "For each row in the database, do this to the row," and treat it like a list of rows, when your data model object is secretly running a database query each time the iterator's __next__() method is called.
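
A minimal sketch of the database idea, using the sqlite3 module and an in-memory database, could look like this. The KnightRoster class, the table, and its rows are all invented for illustration. The sketch runs one query per loop and lets the cursor hand out rows one at a time, which is a simplification of the scenario described above.

```python
import sqlite3

class KnightRoster:
    """Hypothetical data model object that is iterable like a list of rows."""

    def __init__(self, connection):
        self.connection = connection

    def __iter__(self):
        # A sqlite3 cursor is itself an iterator over the result rows.
        return self.connection.execute(
            "SELECT name, quest FROM knights ORDER BY name")

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE knights (name TEXT, quest TEXT)")
connection.executemany("INSERT INTO knights VALUES (?, ?)",
                       [("Lancelot", "Grail"), ("Galahad", "Grail")])

roster = KnightRoster(connection)
knight_names = [name for name, quest in roster]
print(knight_names)
```

The code that uses roster never sees SQL; it only sees something it can loop over.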

##### Exercise: The Simplest Iterator
The easiest way to provide an iterator for your class is to use one from another object. If you are designing a class that controls access to its own collection, then it might be a good idea to let programmers iterate over your object using the collection's iterator. In this case, just have __iter__() return the appropriate iterator.

In this exercise, you will be coding an Interrogator who asks awkward questions to people on a quest. It takes a list of questions in its constructor. You will write a program that prints these questions.

Using an Interrogator in a loop probably means asking each of its questions in sequence. The easiest iterator that can achieve this is the iterator for the collection of questions. Therefore, you will implement the __iter__() method to return that object.


In [9]:
class Interrogator:
    def __init__(self, questions):
        self.questions = questions

    # Add the __iter__() method:
    def __iter__(self):
        return iter(self.questions)

#create a list of questions
questions = ["What is your name?", "What is your quest?", "What is the average airspeed velocity of an unladen swallow?"]

#Create an Interrogator:
awkward_person = Interrogator(questions)



In [10]:
#Now use the Interrogator in a for loop:
for question in awkward_person:
    print(question)

What is your name?
What is your quest?
What is the average airspeed velocity of an unladen swallow?


On the face of it, you've done nothing more than add a level of indirection between the Interrogator class and the collection of questions. From an implementation perspective, that's exactly right. However, from a design perspective, what you've done is much more powerful. You've designed an Interrogator class that programmers can ask to iterate over its questions, without having to tell the programmer anything about how the Interrogator stores its questions. While it's just forwarding a method call to a list object today, you could change that tomorrow to use a SQLite3 database or a web service call, and programmers using the Interrogator class will not need to change anything.

For a more complicated case, you need to write your own iterator. The iterator is required to implement a __next__() method, which returns the next element in the collection or raises StopIteration when it gets to the end.

##### Exercise: A Custom Iterator
In this exercise, you'll implement a classical-era algorithm called the Sieve of Eratosthenes. To find prime numbers between 2 and an upper bound value, n, first, list all of the numbers in that range. Now, 2 is a prime, so return that. Then, remove 2 from the list, and all multiples of 2, and return the new lowest number (which will be 3). Continue until there are no more numbers left in the collection. Every number that gets returned using this method is a successively higher prime. It works because any number you find in the collection was not removed at an earlier step, so it has no prime factors lower than itself.

First, build the architecture of the class. Its constructor needs to take the upper bound value and generate the list of possible primes. The object can be its own iterator, so its __iter__() method will return itself:



In [11]:
#Define the PrimesBelow class and its initializer:
class PrimesBelow:
    def __init__(self, bound):
        self.candidate_numbers = list(range(2,bound))

#Implement the __iter__() method to return itself:
    def __iter__(self):
        return self

#Define the __next__() method and the exit condition. 
#If there are no remaining numbers in the collection, then the iteration can stop:
    def __next__(self):
        if len(self.candidate_numbers) == 0:
            raise StopIteration
            
        next_prime = self.candidate_numbers[0]
        self.candidate_numbers = [x for x in self.candidate_numbers if x % next_prime != 0]
        return next_prime
    
#Use an instance of this class to find all the prime numbers below 100:
primes_to_a_hundred = [prime for prime in PrimesBelow(100)]
print(primes_to_a_hundred)

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]


The main body of the algorithm is in the __next__() method. With each iteration, it finds the next lowest prime. If there isn't one, it raises StopIteration. If there is one, it sieves that prime number and its multiples from the collection and then returns the prime number.

The implementation of __next__() is completed by selecting the lowest number in the collection as the value for next_prime and removing any multiples of that number before returning the new prime.

This exercise demonstrates that by implementing an iterative algorithm as a Python iterator, you can treat it like a collection. In fact, the program does not actually build the collection of all of the prime numbers: you did that yourself by using the PrimesBelow class in a list comprehension, but otherwise, PrimesBelow was generating one number at a time, whenever its __next__() method was called. This is a great way to hide the implementation details of an algorithm from a programmer. Whether you actually give them a collection of objects to iterate over or an iterator that computes each value as it is requested, programmers can use the results in exactly the same way.

##### Exercise: Controlling the Iteration
You do not have to use an iterator in a loop or comprehension. You can use the iter() function to get its argument's iterator object, and then pass that to the next() function to return successive values from the iterator. These functions call through to the __iter__() and __next__() methods, respectively. You can use them to add custom behavior to an iteration or to gain more control over the iteration.

In this exercise, you will print the prime numbers below 5. An error should be raised when the object runs out of prime numbers. To do this, you will use the PrimesBelow class created in the previous exercise:

In [12]:
class PrimesBelow:
    def __init__(self, bound):
        self.candidate_numbers = list(range(2,bound))
    def __iter__(self):
        return self
    def __next__(self):
        if len(self.candidate_numbers) == 0:
            raise StopIteration
        next_prime = self.candidate_numbers[0]
        self.candidate_numbers = [x for x in self.candidate_numbers if x % next_prime != 0]
        return next_prime
primes_under_five = iter(PrimesBelow(5))

In [15]:
#Repeatedly use next() with this object to generate successive prime numbers:
next(primes_under_five)

StopIteration: 

When the object runs out of prime numbers, the subsequent use of next() raises the StopIteration error, as the preceding output shows.


Being able to step through an iteration manually is incredibly useful in programs that are driven by a sequence of inputs, including a command interpreter. You can treat the input stream as an iteration over a list of strings, where each string represents a command. Call next() to get the next command, work out what to do, and then execute it. Then, print the result, and go back to next() to await the subsequent command. When StopIteration is raised, the user has no more commands for your program, and it can exit.
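
The loop described above can be sketched as follows. This is a minimal illustration, assuming the commands arrive as an iterable of strings; the run_commands name and the upper-casing "work" are hypothetical placeholders, not part of the exercise code:

```python
def run_commands(lines):
    """Process commands one at a time until the input runs out."""
    commands = iter(lines)  # any iterable of strings: a list, an open file, sys.stdin
    results = []
    while True:
        try:
            command = next(commands)  # wait for the next command
        except StopIteration:
            break  # no more commands, so the program can exit
        # Stand-in for "work out what to do, and then execute it"
        results.append(command.strip().upper())
    return results

print(run_commands(["hello\n", "status\n", "quit\n"]))
```

Stepping manually with next() rather than using a for loop leaves room to do extra work between commands, such as printing a prompt.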

#### Itertools
Iterators are useful for describing sequences, such as Python lists and ranges, and sequence-like collections, such as your own data types, that provide ordered access to their contents. Iterators make it easy to work with these types in a Pythonic way. Python's library includes the itertools module, which has a selection of helpful functions for combining, manipulating, and otherwise working with iterators. In this section, you will use a couple of helpful tools from the module. There are plenty more available, so be sure to check out the official documentation for itertools.

One of the important uses of itertools is in dealing with infinite sequences. There are plenty of situations in which a sequence does not have an end: everything from infinite series in mathematics to the event loop in a graphical application. A graphical user interface is usually built around an event loop in which the program waits for an event (such as a keypress, a mouse click, a timer expiring, or something else) and then reacts to it. The stream of events can be treated as a potentially infinite list of event objects, with the program taking the next event object from the sequence and doing its reaction work. Iterating over such a sequence with either a Python for..in loop or a comprehension will never terminate. There are functions in itertools for providing a window onto an infinite sequence, and the following exercise will look at one of those.

##### Exercise: Using Infinite Sequences and takewhile
An alternative algorithm to the Sieve of Eratosthenes for generating prime numbers is to test each number in sequence to see whether it has any divisors other than 1 and itself. This algorithm uses a lot more time than the Sieve in return for a lot less space.

In this exercise, you will implement this alternative algorithm, which trades extra time for less space than the Sieve when generating prime numbers:

In [16]:

class Primes:
    def __init__(self):
        self.current = 2
        
    def __iter__(self):
        return self
     
    def __next__(self):
        while True:
            current = self.current
            square_root = int(current ** 0.5)
            is_prime = True
            if square_root >= 2:
                for i in range(2, square_root + 1):
                    if current % i == 0:
                        is_prime = False
                        break
            self.current += 1
            if is_prime:
                return current

Note: The class you just entered is an iterator, but the __next__() method never raises a StopIteration error. That means it never exits. Even though you know that each prime number it returns is bigger than the previous one, a comprehension doesn't know that, so you can't simply filter out large values.

In [None]:
#Enter the following code to get a list of primes that are lower than 100:
[p for p in Primes() if p < 100]

Because the iterator never raises StopIteration, this program will never finish. You'll have to force it to exit. This is because the list comprehension is equivalent to the following loop, which keeps testing ever larger primes against the condition without ever stopping:

In [18]:
myList = []
for p in Primes():
    if p < 100:
        myList.append(p)

KeyboardInterrupt: 

To work with this iterator, itertools provides the takewhile() function, which wraps the iterator in another iterator. You also supply takewhile() with a Boolean function, and its iteration will take values from the supplied iterator until the function returns False, at which time it raises StopIteration and stops. This makes it possible to find the prime numbers below 100 from the infinite sequence entered previously.

In [19]:
#Use takewhile() to turn the infinite sequence into a finite one:
import itertools
print([p for p in itertools.takewhile(lambda x: x<100, Primes())]) #the 'takewhile' wraps the iterator into another iterator

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]


##### Exercise: Turning a Finite Sequence into an Infinite One, and Back Again
In this exercise, consider a turn-based game, such as chess. The person playing white makes the first move. Then, the person playing black takes their turn. Then white. Then black. Then white, black, white, and so on until the game ends. If you had an infinite list of white, black, white, black, white, and so on, then you could always look at the next element to decide whose turn it is:

In [21]:
import itertools
players = ['White', 'Black']

#Use the itertools function cycle to generate an infinite sequence of turns:
turns = itertools.cycle(players)

To demonstrate that this has the expected behavior, you'll want to turn it back into a finite sequence so that you can view the first few members of the turns iterator. You can use takewhile() for that, and, here, combine it with the count() function from itertools, which produces an infinite sequence of numbers.

In [22]:
#List the players who take the first 10 turns in a chess game:
countdown = itertools.count(10, -1)
print([turn for turn in itertools.takewhile(lambda x:next(countdown)>0, turns)])

['White', 'Black', 'White', 'Black', 'White', 'Black', 'White', 'Black', 'White', 'Black']


This is the "round-robin" algorithm for allocating actions (in this case, making a chess move) to resources (in this case, the players), and has many more applications than board games. A simple way to do load balancing between multiple servers in a web service or database application is to build an infinite sequence of the available servers and choose one in turn for each incoming request.
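
As a rough sketch of the load-balancing idea, the following pairs incoming requests with servers taken in turn from an infinite cycle. The server names and the assign() helper are assumptions made for illustration only. The last line also shows itertools.islice(), which is another way to take a fixed number of items from an infinite iterator; unlike takewhile(), it does not need a predicate and does not read an extra item from the underlying iterator:

```python
import itertools

servers = ["server-a", "server-b", "server-c"]  # hypothetical server names
rotation = itertools.cycle(servers)  # infinite sequence: a, b, c, a, b, c, ...

def assign(request_ids):
    """Pair each incoming request with the next server in the rotation."""
    return [(request_id, next(rotation)) for request_id in request_ids]

first_batch = assign([1, 2, 3, 4])
print(first_batch)

# islice() takes exactly the requested number of items from the infinite cycle
first_turns = list(itertools.islice(itertools.cycle(["White", "Black"]), 4))
print(first_turns)
```

Because the rotation object keeps its position between calls, a later batch of requests continues from where the previous one stopped.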

#### Generators
A function that returns a value does all of its computation and then gives up control to its caller, supplying that value as it does so. This is not the only possible behavior for a function. It can instead yield a value, which passes control (and the value) back to the caller but leaves the function's state intact. Later, it can yield another value, or finally return to indicate that it is done. A function that yields is called a generator.

Generators are useful because they allow a program to defer or postpone calculating a result until it's required. Finding the successive digits of π, for example, is hard work, and it gets harder as the number of digits increases. If you wrote a program to display the digits of π, you might calculate the first 1,000 digits. Much of that effort will be wasted if the user only asks to see the first 10 digits. Using a generator, you can put off the expensive work until your program actually requires the results.

A real-world example of a situation where generators can help is when dealing with I/O. A stream of data coming from a network service can be represented by a generator that yields the available data until the stream is closed, at which point it returns. Using a generator allows the program to pass control back and forth between the I/O stream, when data is available, and the caller, where the data can be processed.
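
The following is a minimal sketch of that idea, assuming the data source is any file-like object with a read() method. Here, io.BytesIO stands in for a real network stream, and the read_chunks name and chunk size are hypothetical:

```python
import io

def read_chunks(stream, chunk_size=4):
    """Yield successive chunks from a file-like object until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return  # end of the stream, so the generator finishes
        yield chunk  # hand the chunk to the caller, then pause here

stream = io.BytesIO(b"abcdefghij")  # stand-in for data arriving from a network service
chunks = list(read_chunks(stream))
print(chunks)
```

Each chunk is read only when the caller asks for it, so the caller can process the data as it arrives instead of waiting for the whole stream.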

Python internally turns generator functions into objects that use the iterator protocol (such as __iter__, __next__, and the StopIteration error), so the work you put into understanding iterations in the previous section means you already know what generators are doing. There is nothing you can write for a generator that could not be replaced with an equivalent iterator object. However, sometimes, a generator is easier to write or understand, and writing code that is easier to understand is at the heart of what it means to be Pythonic.
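
A short check makes this relationship visible. The countdown() generator below is a hypothetical example, not part of the exercises; it shows that a generator object responds to iter() and next() in the same way as the iterator classes you wrote earlier:

```python
def countdown(start):
    """Yield start, start - 1, ..., 1."""
    while start > 0:
        yield start
        start -= 1

gen = countdown(2)
print(iter(gen) is gen)  # a generator object is its own iterator
print(next(gen))
print(next(gen))
try:
    next(gen)
except StopIteration:
    print("The generator is exhausted")
```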

##### Exercise: Generating a Sieve
In this exercise, you will be rewriting the Sieve of Eratosthenes as a generator function and comparing it with the result of the iterator version:

In [23]:
#Rewrite the Sieve of Eratosthenes as a generator function that yields its values:
def primes_below(bound):
    candidates = list(range(2, bound))
    while candidates:
        next_prime = candidates[0]
        #'yield' hands next_prime to the caller and pauses here; calling primes_below() returns a generator, not a value
        yield next_prime
        #Sieve out the prime and its multiples before looking for the next one
        candidates = [c for c in candidates if c % next_prime != 0]

In [25]:
#Confirm that the result is the same as the iterator version:
print ([prime for prime in primes_below(100)])

[2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97]


That's really all there is to generators — they're just a different way of expressing an iterator. They do, however, communicate a different design intention; namely, that the flow of control is going to pass back and forth between the generator and its caller.

##### Activity: Using Random Numbers to Find the Value of Pi
The Monte Carlo method is a technique that is used for approximating a numerical solution using random numbers. The name comes from the famous casino, and chance is at the core of Monte Carlo methods. They use random sampling to obtain information about a function that would be difficult to calculate deterministically. Monte Carlo methods are frequently used in scientific computation to explore probability distributions, and in other fields including quantum physics and computational biology. They're also used in economics to explore the behavior of financial instruments under different market conditions. There are many applications for the Monte Carlo principle.

In this activity, you'll use a Monte Carlo method to find an approximate value for π. Here's how it works: two random numbers, (x,y), somewhere between (0,0) and (1,1), represent a random point in a square positioned at (0,0) with sides of length 1:

Using Pythagoras' Theorem, if the value of $$\sqrt{x^2 + y^2}$$ is less than 1, then the point is also in the top-right quadrant of a circle centered at (0,0) with a radius of 1:

Generate lots of points, count how many are within the circle segment, and divide the number of points within the circle by the total number of points generated. This gives you an approximation of the area of the circle segment, which should be π/4. Multiply by 4, and you have an approximate value of π. Data scientists often use this technique to find the area under more complex curves that represent probability distributions.

##### Steps: 
Write a generator to yield successive estimates of π. The steps are as follows:

- Define your generator function.
- Set the total number of points, and the number within the circle segment, to 0.
- Do the following substeps 10,000 times:
  - Generate two numbers between 0 and 1, using Python's random.random() function.
  - Add 1 to the total number of points.
  - Use math.sqrt() to find out how far the point represented by the numbers is from (0,0).
  - If the distance is less than 1, add 1 to the number of points within the circle.
  - Calculate your estimate for π: 4 * (points within the circle) / (total points generated).
  - If you have generated a multiple of 1,000 points, yield the approximate value for π. If you have generated 10,000 points, return the value.
- Inspect the successive estimates of π and check how close they are to the true value (math.pi).

In [26]:
import math
import random

In [27]:
#Define the approximate_pi function:
def approximate_pi():
    #Set the counters to zero:
    total_points = 0
    within_circle = 0

    #Generate 10,000 random points, refining the estimate as you go:
    for i in range(10000):
        #Here, x and y are random numbers between 0 and 1, which, together, represent a point in the unit square
        x = random.random()
        y = random.random()
        total_points += 1
        #Use Pythagoras' Theorem to work out the distance between the point and the origin, (0,0):
        distance = math.sqrt(x**2 + y**2)
        if distance < 1:
            #If the distance is less than 1, this point is both inside the square and inside a circle of radius 1, centered on the origin
            within_circle += 1
        #Yield a result every 1,000 points
        if total_points % 1000 == 0:
            #The ratio of the points within the circle to the total points generated should be approximately π/4
            pi_estimate = 4 * within_circle / total_points
            if total_points == 10000:
                #After 10,000 points are generated, return the final estimate to complete the iteration.
                #A returned value ends the iteration and is not one of the yielded values.
                return pi_estimate
            else:
                yield pi_estimate

#Use the generator to find the estimates for the value of π
estimates = [estimate for estimate in approximate_pi()]
errors = [estimate - math.pi for estimate in estimates]

In [30]:
print(estimates)
print(errors)

[3.168, 3.12, 3.16, 3.179, 3.1808, 3.179333333333333, 3.164, 3.159, 3.1502222222222223]
[0.026407346410207033, -0.02159265358979301, 0.018407346410207026, 0.03740734641020671, 0.039207346410206956, 0.03774067974354001, 0.02240734641020703, 0.017407346410206692, 0.008629568632429141]
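
The printed list contains nine estimates rather than ten because the final estimate is returned, not yielded. A value returned from a generator is attached to the StopIteration error that ends the iteration, so a loop or comprehension never sees it. The following sketch uses a hypothetical two_then_done() generator to show how such a value can be retrieved if it is needed:

```python
def two_then_done():
    yield 1
    yield 2
    return "final"  # ends the iteration; the value travels on StopIteration

print(list(two_then_done()))  # only the yielded values are collected

gen = two_then_done()
values = []
while True:
    try:
        values.append(next(gen))
    except StopIteration as stop:
        final = stop.value  # the value passed to 'return'
        break

print(values, final)
```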


#### Regular Expressions
Regular expressions (or regexes) are a domain-specific programming language, defining a grammar for expressing efficient and flexible string comparisons. Introduced in 1951 by Stephen Cole Kleene, regular expressions have become a popular tool for searching and manipulating text. As an example, if you're writing a text editor and you want to highlight all web links in a document and make them clickable, you might search for strings that start with http or https, followed by ://, followed by a run of non-whitespace characters, stopping when you reach whitespace (such as a space or newline) or the end of the text, and highlight everything up to that point. With standard Python syntax, this is possible, but you will end up with a very complex loop that is difficult to get right. Using regexes, you match against https?://\S+.

The features of regular expressions used in the preceding pattern are as follows:

- Most characters match their own identities, so "h" in a regex means "match exactly the letter h."
- Enclosing characters in square brackets can mean choosing between alternates, so if we thought a web link might be capitalized, we could start with "[Hh]" to mean "match either H or h." In the body of the URL, we want to match against any non-whitespace characters, and rather than write them all out, we use the \S character class. Other character classes include \w (word characters), \W (non-word characters), and \d (digits).
- Two quantifiers are used: ? means "0 or 1 time," so "s?" means "match if the text does not have s at this point or has it exactly once." The quantifier, +, means "1 or more times," so "\S+" says "one or more non-whitespace characters." There is also a quantifier *, meaning "0 or more times."

Additional regex features that you will use in this chapter are listed here:

- Parentheses () introduce a numbered sub-expression, sometimes called a "capture group." They are numbered from 1, in the order that they appear in the expression.
- A backslash followed by a number refers to a numbered sub-expression, described previously. As an example, \1 refers to the first sub-expression. These can be used when replacing text that matches the regex or to store part of a regex to use later in the same expression. Because of the way that backslashes are interpreted by Python strings, this is written as \\1 in an ordinary Python string, or as \1 in a raw string such as r"\1".

Regular expressions have various uses throughout software development, as so much software deals with text. Validating user input in a web application, searching for and replacing entries in text files, and finding interesting events in application log files are all uses that regular expressions can be put to in a Python program.
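
As a small illustration of the link-matching pattern discussed earlier, the following sketch uses re.findall() to collect every match in a piece of text. The sample text is made up for the example:

```python
import re

text = "Docs are at https://docs.python.org/3/ and http://example.com today."
url_pattern = r"https?://\S+"  # http, an optional s, ://, then non-whitespace characters

links = re.findall(url_pattern, text)
print(links)
```

Note that \S+ stops only at whitespace, so punctuation that directly follows a link would be included in the match; a real text editor would likely need a stricter pattern.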

##### Exercise: Matching Text with Regular Expressions
In this exercise, you'll use the Python re module to find instances of repeated letters in a string.

The regex you will use is (\w)\1+. Here, (\w) searches for a single character from a word (that is, any letter, digit, or the underscore character, _) and stores that in a numbered sub-expression, \1. Then, \1+ uses a quantifier to find one or more occurrences of the same character. In an ordinary Python string, the backreference is written as \\1, as explained earlier. The steps for using this regex are as follows:

In [31]:
#Import the re module:
import re

In [32]:
#Define the string that you will search for, and the pattern by which to search:
title = "And now for something completely different"
pattern = r"(\w)\1+"  #a raw string, so the backslashes reach the regex engine unchanged

In [33]:
#Search for the pattern and print the result:
print(re.search(pattern, title))

<re.Match object; span=(35, 37), match='ff'>


The re.search() function finds matches anywhere in the string: if it doesn't find any matches, it will return None. If you were only interested in whether the beginning of the string matched the pattern, you could use re.match(). Similarly, modifying the search pattern to start with the beginning-of-line marker (^) achieves the same aim, as in re.search("^(\w)\\1+", title).
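
The difference between the two functions can be checked with a short sketch. The second test string, "oops", is an assumption chosen because it begins with a repeated letter:

```python
import re

pattern = r"(\w)\1+"
title = "And now for something completely different"

anywhere = re.search(pattern, title)        # finds 'ff' near the end of the string
at_start = re.match(pattern, title)         # None: the string does not begin with a repeat
anchored = re.search("^" + pattern, title)  # None as well: ^ pins the search to the start
starts_with_repeat = re.match(pattern, "oops")

print(anywhere.group(), at_start, anchored, starts_with_repeat.group())
```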

##### Exercise: Using Regular Expressions to Replace Text
In this exercise, you'll use a regular expression to replace occurrences of a pattern in a string with a different pattern. The steps are as follows:



In [34]:
#Define the text to search:
import re
description = "The Norwegian Blue is a wonderful parrot. This parrot is notable for its exquisite plumage."

In [35]:
#Define the pattern to search for, and its replacement:
pattern = "(parrot)"
replacement = "ex-\\1"

In [36]:
#Substitute the replacement for the search pattern, using the re.sub() function:
print(re.sub(pattern, replacement, description))

The Norwegian Blue is a wonderful ex-parrot. This ex-parrot is notable for its exquisite plumage.


The replacement refers to the capture group, "\1", which is the first expression in the search pattern to be surrounded by parentheses. In this case, the capture group is the whole word parrot. This lets you refer to the word parrot in the replacement without having to type it out again.
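
Capture groups become more useful when a pattern has several of them, because the replacement can reorder them. The following is a small sketch with a made-up name string; it uses raw strings so that \1 and \2 can be written with a single backslash:

```python
import re

name = "Cleese, John"
# Group 1 is the surname and group 2 is the first name
swapped = re.sub(r"(\w+), (\w+)", r"\2 \1", name)
print(swapped)
```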

##### Activity: Regular Expressions
At your online retail company, your manager has had an idea for a promotion. There is a whole load of old "The X-Files" DVDs in the warehouse, and she has decided to give one away for free to any customer whose name contains the letter x.

In this activity, you will be using Python's re module to find winning customers. The x could be capitalized if it's their initial, or lower case if it's in the middle of their name, so use the regular expression [Xx] to search for both cases:

In [37]:
import re

customers = ['Xander Harris', 'Jennifer Smith', 'Timothy Jones', 'Amy Alexandrescu', 'Peter Price', 'Weifung Xu']

In [40]:
winners = [customer for customer in customers if re.search('[Xx]', customer)]

print(winners)

['Xander Harris', 'Amy Alexandrescu', 'Weifung Xu']
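
An alternative sketch of the same search, which is one option rather than the activity's required solution, passes the re.IGNORECASE flag so that a plain x matches both cases:

```python
import re

customers = ['Xander Harris', 'Jennifer Smith', 'Timothy Jones',
             'Amy Alexandrescu', 'Peter Price', 'Weifung Xu']

# re.IGNORECASE makes 'x' match both 'x' and 'X'
winners_ignoring_case = [customer for customer in customers
                         if re.search('x', customer, re.IGNORECASE)]
print(winners_ignoring_case)
```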


### Software Development

#### Debugging
Sooner or later in your development, you will reach a point where you see your program behave differently than you initially expected. In situations like these, you usually look back at the source code and try to understand what is different between your expectations and the code or inputs that are being used. To facilitate that process, there are multiple methods (in general, and some that are specific to Python) that you can use to try to "debug" or "troubleshoot" the issue.

Usually, the first action of an experienced developer, when frustration arises from unexpected results in their code, is to look at the logs or any other output that the application produces. A good starting point is trying to increase the logging verbosity, as discussed in the Standard Library course. If you are not able to troubleshoot the problem with just logs, it usually means that you should look back at how you are instructing your application to log its state and activity (producing what are known as traces), as there might be a good opportunity to improve it.

Once you have verified the inputs and outputs of the program and checked its logs, the usual next step in Python is to use the Python debugger, pdb.

The pdb module and its command-line interface allow you to navigate through the code as it runs and ask questions about the state of the program, its variables, and the flow of execution. It is similar to other tools, such as gdb, but it is at a higher level and is designed for Python.

There are two main ways to start pdb. You can run the tool and feed it a file, or you can call the breakpoint() function from within your code.

In [44]:
# This is a comment
this = "is the first line to execute"
def secret_sauce(number):
    if number <= 10:
        return number + 10
    else:
        return number - 10
def magic_operation(x, y):
    res = x + y
    res *= y
    res /= x
    res = secret_sauce(res)
    return res
print(magic_operation(2, 10))

50.0


In [None]:
#When you begin executing the script with pdb, it works as follows:

$ python3.8 -m pdb magic_operation.py
> [...]Lesson08/1.debugging/magic_operation.py(3)<module>()
-> this = "is the first line to execute"
(Pdb)

It will stop on the first line of the Python code to execute and give us a prompt to interact with pdb.

The first line shows you which file you are currently in, along with the line number, while the final line shows the pdb prompt, (Pdb), which tells you which debugger you are running and that it is waiting for input from the user.

Another way to start pdb is to change the source code. At any point in the code, you can write "import pdb;pdb.set_trace()" to tell the Python interpreter that you want to start a debugging session at that point; this form also works in earlier versions of Python. If you are using Python 3.7 or a later version, you can use breakpoint() instead.

If you execute the magic_operation_with_breakpoint.py file attached in the GitHub repository, which has breakpoint() in one of its lines, you will see that the debugger starts for you where you requested it.
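
The following sketch shows one way to place such a breakpoint. It is not the file from the repository: the debug parameter is a hypothetical addition that keeps the debugger from starting unless it is explicitly requested:

```python
def secret_sauce(number, debug=False):
    if debug:
        breakpoint()  # starts a pdb session at this point (Python 3.7 or later)
    if number <= 10:
        return number + 10
    return number - 10

# debug defaults to False, so this call runs straight through
print(secret_sauce(60.0))
```

Calling secret_sauce(60.0, debug=True) would stop at the (Pdb) prompt inside the function. Setting the PYTHONBREAKPOINT environment variable to 0 makes Python skip all breakpoint() calls, which is useful if one is left in the code by mistake.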

When you are running things in an IDE or code in a large application you could achieve the same effect by using the operations that we will demonstrate later, but just dropping that line in the file is by far the simplest and fastest way:

In [None]:
$ python3.7 magic_operation_with_breakpoint.py
> [...]/Lesson08/1.debugging/magic_operation_with_breakpoint.py(7)secret_sauce()
-> if number <= 10:
(Pdb)

At this point, you can get a list of all the commands by running help, or you can get more information about a specific command by running help followed by the name of that command. The most commonly used commands are as follows:

- break filename:linenumber: This sets a breakpoint in the specified line. It ensures that the code will stop at that point when you continue the execution with other commands. Breakpoints can be set in any file, including those in the standard library. If you want to set a breakpoint in a file that is part of a module, you can do so by just using its full path within the Python path. For example, to stop the debugger in the parser module, which is part of the html package of the standard library, you would perform b html/parser:50 to stop the code on line 50 of the file.
- break function: You can request to stop the code when a specific function is called. If the function is in the current file, you can pass the function name. If the function is imported from another module, you will have to pass the full function specification, for example, break html.parser.HTMLParser.reset, to stop at the reset method of the HTMLParser class of html.parser.
- break without arguments: This lists all the current breakpoints that are set in the current state of the program.
- continue: This continues the execution until a breakpoint is found. This is quite useful when you start a program, set breakpoints in all the lines of code or functions you want to inspect, and then just let it run until it stops at any of those.
- where: This prints a stack trace with the current line of execution where the debugger stopped. It is useful to know what called this function or to be able to move around the stack.
- down and up: These two commands allow you to move around in the stack. If you are in a function call, you can use up to move to the caller of the function and inspect the state in that frame, or you can use down to go deeper in the stack after you have moved up.
- list: This displays 11 lines of code around the point where the execution stopped. Successive calls to list will display the following lines in batches of 11. To start again from where the execution stopped, use list . (that is, list followed by a period).
- longlist: This shows the source code of the current function in the current frame that is being executed.
- next: This executes the current line and moves to the following one, without stopping inside any functions that the line calls.
- step: This executes the current line and stops at the first opportunity within the function being executed. This is useful when you don't want to just execute a function, but want to step through it.
- p: This prints the value of an expression. It is useful for checking the content of variables.
- pp: This allows you to pretty print an expression. It is useful when you are trying to print long structures.
- run/restart: This restarts the program keeping all the breakpoints still set. It is useful if you have passed an event you expected to see.

Many commands have shortcuts; for example, you can use b instead of break, c or cont instead of continue, l instead of list, ll for longlist, and so on.