# I. Introduction to Python > 18. Modules (Part 1)


**[<< Previous lesson](./17_Buil-in-Functions-Part-2-and-Lambda.ipynb)   |   [Next lesson >>](./19_Modules-Part-2.ipynb)**

<hr>
&nbsp;

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Introduction</a></span></li><li><span><a href="#Importing-a-module" data-toc-modified-id="Importing-a-module-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Importing a module</a></span></li><li><span><a href="#The-collection-module" data-toc-modified-id="The-collection-module-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>The <code>collection</code> module</a></span><ul class="toc-item"><li><span><a href="#Counter" data-toc-modified-id="Counter-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span><code>Counter</code></a></span></li><li><span><a href="#defaultdict" data-toc-modified-id="defaultdict-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span><code>defaultdict</code></a></span></li></ul></li><li><span><a href="#The-timeit-module" data-toc-modified-id="The-timeit-module-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>The <code>timeit</code> module</a></span></li><li><span><a href="#Comparing-map(),-lambda-functions-and-list-comprehension-with-timeit()" data-toc-modified-id="Comparing-map(),-lambda-functions-and-list-comprehension-with-timeit()-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Comparing <code>map()</code>, <code>lambda</code> functions and list comprehension with <code>timeit()</code></a></span></li><li><span><a href="#Credits" data-toc-modified-id="Credits-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Credits</a></span></li></ul></div>

<hr>
&nbsp;

## Introduction

Programs would be really hard to understand if everything was in a single file. This is why we use **modules**. Modular programming refers to the process of breaking a large programming task into separate, smaller, more manageable subtasks or modules.

This process makes the code:
- simpler, since the focus is on a small portion of the problem
- easier to maintain
- reusable
- less prone to name collisions (see [Scope](./13_Global-Local-and-Nonlocal.ipynb#4))

Technically, a module is a file that contains a collection of **functions** and **global variables**. We organise modules into a **package** (a directory), and we group related modules and packages into a **library**.
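As a small illustration of a module holding both global variables and functions, here is a sketch using the standard library's `string` module. It is chosen only as an example; the `import` keyword itself is covered in the next section.

```python
import string

# ascii_lowercase is a global variable defined inside the string module
letters = string.ascii_lowercase

# capwords() is a function defined inside the same module
title = string.capwords('hello world')

print(letters)          # abcdefghijklmnopqrstuvwxyz
print(title)            # Hello World

# a module is a file: this prints where string.py lives on your machine
print(string.__file__)
```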

![Module](./attachments/module.png)

<hr>
&nbsp;

## Importing a module

In [1]:
# we use the keyword import to import a module
import random

In [2]:
# now we can call the function randint() from random
# randint() returns a random integer in [a, b] (both included)
random.randint(0, 100)

64

You can rerun the cell, and you will usually get a different number each time.

In [3]:
# you need to specify where the function comes from when using it
randint(0, 100)

NameError: name 'randint' is not defined

In [4]:
# Unless you import the function only
from random import randint

In [5]:
# in this case, we write the function without the name of the module
randint(0, 100)

32

In [6]:
# we can also import the function under a different name
from random import randint as pick_random

In [7]:
pick_random(0, 10)

6

&nbsp;

**NOTE:** You might see the use of **`*`** like this:

    from module import *
    
**DON'T do it**. It imports every public name from the module into your namespace, which is bad practice: those names can collide with, and silently replace, names you already use. Instead, be as specific as possible.
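To see what a name collision looks like, here is a minimal sketch. The `math` and `cmath` modules are picked only for illustration because both define a function called `sqrt()`. The same silent replacement can happen with `import *`, except that you don't even see which names are involved.

```python
from math import sqrt
from cmath import sqrt   # silently replaces math's sqrt

# we now get cmath's version, which returns a complex number
result = sqrt(-1)
print(result)            # 1j

# keeping the module name in front removes the ambiguity
import math
import cmath

print(math.sqrt(4))      # 2.0
print(cmath.sqrt(-1))    # 1j
```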

Now let's explore a few useful modules.

<hr>
&nbsp;

## The `collection` module


The **`collections`** module provides different types of containers as alternatives to the basic ones: `dict`, `list`, `set`, and `tuple`. A container is an object that stores other objects and provides a way to access them and iterate over them.

### `Counter`

A **`Counter`** is a subclass of `dict` used to count elements. The elements are stored as dictionary keys and their counts are stored as the values.

In [8]:
from collections import Counter

In [9]:
# counter with list
lst = [1,2,2,2,2,3,3,3,1,2,1,12,3,2,32,1,21,1,223,1]

Counter(lst)

Counter({1: 6, 2: 6, 3: 4, 12: 1, 32: 1, 21: 1, 223: 1})

In [10]:
# counter with strings
Counter('aabsbsbsbhshhbbsbs')

Counter({'a': 2, 'b': 7, 's': 6, 'h': 3})

In [11]:
# counter with words in a sentence
s = 'How many times does each word show up in this sentence word times each each word'
words = s.split()
Counter(words)

Counter({'How': 1,
         'many': 1,
         'times': 2,
         'does': 1,
         'each': 3,
         'word': 3,
         'show': 1,
         'up': 1,
         'in': 1,
         'this': 1,
         'sentence': 1})

In [12]:
# Counter objects have a method called most_common()
mycounter = Counter(words)
mycounter.most_common(1)

[('each', 3)]

In [13]:
# and we can choose how many elements we want to get
mycounter.most_common(3)

[('each', 3), ('word', 3), ('times', 2)]
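Since a `Counter` is a dictionary, its counts can also be read by key. The sketch below uses a made-up list of colours (`votes` is a hypothetical name) to show this, along with two behaviours that differ from a plain `dict`: a missing element has a count of 0, and `update()` adds to the existing counts.

```python
from collections import Counter

votes = Counter(['red', 'blue', 'red', 'green', 'red'])

# read a count like a dictionary value
print(votes['red'])          # 3

# an element that was never seen has a count of 0 (no KeyError)
print(votes['yellow'])       # 0

# update() adds new observations to the existing counts
votes.update(['blue'])
print(votes.most_common(2))  # [('red', 3), ('blue', 2)]
```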

&nbsp;

### `defaultdict`

**`defaultdict`** is a dictionary-like object that provides the same methods as a standard dictionary, but it takes a function as its first argument (`default_factory`). Let's see why it is interesting.

In [14]:
# this is a (normal) empty dictionary
d = {}

In [15]:
# if I try to get something out of it
d[0]

KeyError: 0

In [16]:
# and it's not because the dictionary is empty, look here
d = {'a':0, 'b':1}
d['c']

KeyError: 'c'

A **`defaultdict`**, in contrast, **does not raise a KeyError** when you look up a missing key with `d[key]`. Instead, the key is added with the value returned by the default factory.

In [17]:
from collections import defaultdict

In [18]:
# we pass a data type as an argument
d  = defaultdict(int)

In [19]:
d['a']

0

In [20]:
d[0]

0

In [21]:
d  = defaultdict(str)

In [22]:
d['a']

''

In [23]:
d[0]

''

In [24]:
# or we can pass a default value with lambda
d = defaultdict(lambda: 5)

In [25]:
d['a']

5

In [26]:
d[0]

5
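A common use of this behaviour is grouping items without first checking whether a key exists. The following is a minimal sketch with a made-up list of words (`words` and `groups` are illustrative names), using `list` as the default factory.

```python
from collections import defaultdict

words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

# every missing key starts as an empty list
groups = defaultdict(list)
for word in words:
    groups[word[0]].append(word)   # no need to create the list first

print(dict(groups))
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}

# looking up a missing key with [] also adds it to the dictionary
print('z' in groups)   # False
groups['z']
print('z' in groups)   # True
```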

Check the [Python documentation](https://docs.python.org/3/library/collections.html) for more information on the `collections` module.

<hr>
&nbsp;

## The `timeit` module

The **`timeit`** module allows you to check how long a piece of code takes to run.

In [27]:
import timeit

In [28]:
# let's consider this big number
99999 ** 999

9900596848431960165680474400729030428710512983698523993401321263022788367415028399819213175538745186745440380432774441989942373434073458860589066820901643196435927916092157933123615175334553000572454454893088576287009828186572429841867232840312216770417106182557352594257996425240576501402454631961249679654782981101325584242180799894326536056687015676739263806047771610421662047092028277769094779071150355035718084096978346466497327302286373388715911295487769283064073585109741564683257714952900956088696657045918291440438102872999117974402524913962020470751712787129535639998907406171763073327743789239579973972669375410457693878268455371387733155232247385229477332387160014363131098267758280048418072023197324606256535749422654493453423459629292746370892284045843302493345836687850084450703967700323852331170755375055393725250873556523528120015417216330870254283409781103974177143166239606011160329711765344432919258921753947617840337038673418754030680983351500752908637042782107261149056131593725

In [29]:
# let's see how long it takes Python to calculate it
timeit.timeit("99999 ** 999")

44.50339371700102

That took a very long time...

This is because `timeit()` is a timer that runs the code many times (**1,000,000x** by default) and calculates how long it took **in total**.

In [30]:
# we can specify how many times to run the code
timeit.timeit("99999 ** 999", number=1_000)

0.055053178999514785

In [31]:
# or even run it only once
timeit.timeit("99999 ** 999", number=1)

8.734799848753028e-05

In [32]:
# there is also a repeat function, here we repeat 3x
timeit.repeat("99999 ** 999", repeat=3, number=1_000)

[0.0585462520011788, 0.04841283099995053, 0.049350995001077536]

`repeat()` is a timer that runs `timeit()` several times (**5x** by default) and stores each result in a list. If we don't specify a `number`, each run uses the default number of executions (i.e. 1,000,000x).
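Each value returned by `repeat()` is still a total for `number` executions. One common way to summarise the list, sketched below, is to keep the smallest total and divide it by `number` to estimate the time of a single execution. The statement `sum(range(100))` is only a placeholder, and your timings will differ.

```python
import timeit

number = 1_000
runs = timeit.repeat("sum(range(100))", repeat=3, number=number)

# the smallest total is the run least disturbed by other processes
best_total = min(runs)

# divide by the number of executions to get the time for one execution
per_execution = best_total / number

print(runs)
print(f"best: {per_execution * 1e6:.2f} µs per execution")
```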

In [33]:
# to make the code easier to read we can also do as follows
statement = "999999 ** 9999"
timeit.timeit(stmt=statement, number=1_000)

2.3138745309988735

In [34]:
# it works the same with repeat
statement = "999999 ** 9999"
timeit.repeat(stmt=statement, repeat=4, number=100)

[0.28364791799867817,
 0.22693738500129257,
 0.21895545000006678,
 0.2355707179995079]

In [35]:
# now let's try with the following list comprehension
statement = "[float(num) for num in range(100)]"
timeit.timeit(stmt=statement, number=100_000)

0.6556005940001342

In [36]:
# and let's compare it with the equivalent using map()
statement = "list(map(float,range(100)))"
timeit.timeit(stmt=statement, number=100_000)

0.48041807700064965

It looks like `map()` is faster than the list comprehension in this case.

**NOTE:** we can also run a timer on bigger blocks of code.

In [37]:
# Let's write a function with a for loop
def func_1(n):
    result = []
    for i in range(n):
        result.append(str(i))
    return result

In [38]:
# Let's have the same with list comprehension
def func_2(n):
    return [str(num) for num in range(n)]

In [39]:
# And another one using map()
def func_3(n):
    return list(map(str,range(n)))

In [40]:
# check
func_1(12)

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11']

In [41]:
# they return the same result
func_2(12)

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11']

In [42]:
func_3(12)

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11']

In [43]:
# we do the same as before
stmt1 = '''
def func_1(n):
    result = []
    for i in range(n):
        result.append(str(i))
    return result
'''

In [44]:
stmt2 = '''
def func_2(n):
    return [str(num) for num in range(n)]
'''

In [45]:
stmt3 = '''
def func_3(n):
    return list(map(str,range(n)))
'''

In [46]:
# and let's compare the 3 of them with 100,000,000 runs each
print(timeit.timeit(stmt=stmt1, number=100_000_000))
print(timeit.timeit(stmt=stmt2, number=100_000_000))
print(timeit.timeit(stmt=stmt3, number=100_000_000))

4.098156865000419
4.222038174999398
4.208762567999656


The three timings are almost identical, and the `for` loop even comes out slightly ahead. But don't you find the results low given the number of runs?

This is because what is measured is the time it took to **define the functions**, not to run them.

In [47]:
# this is the way to go about it in this case
setup1 = '''
def func_1(n):
    result = []
    for i in range(n):
        result.append(str(i))
    return result
'''

stmt1 = 'func_1(100)'

In [48]:
setup2 = '''
def func_2(n):
    return [str(num) for num in range(n)]
'''

stmt2 = 'func_2(100)'

In [49]:
setup3 = '''
def func_3(n):
    return list(map(str,range(n)))
'''

stmt3 = 'func_3(100)'

In [50]:
print(timeit.timeit(stmt=stmt1, setup=setup1, number=100_000))
print(timeit.timeit(stmt=stmt2, setup=setup2, number=100_000))
print(timeit.timeit(stmt=stmt3, setup=setup3, number=100_000))

1.51843768100116
1.265808915000889
1.073520716998246
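Writing the function definition inside a setup string is one option. As an alternative sketch, `timeit.timeit()` also accepts a callable that takes no arguments as its `stmt`, so a function that is already defined can be timed directly. The helper `to_strings()` below is a hypothetical stand-in for `func_2`. Note that going through a `lambda` adds a small call overhead to every execution.

```python
import timeit

def to_strings(n):
    return [str(num) for num in range(n)]

# the lambda wraps the call so that timeit can run it without arguments
total = timeit.timeit(lambda: to_strings(100), number=10_000)

print(f"{total:.4f} s in total for 10,000 calls")
```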



Check the [Python documentation](https://docs.python.org/3/library/timeit.html) for more information on the `timeit` module.

&nbsp;

**NOTE:** We can also use the [built-in magic commands](https://ipython.readthedocs.io/en/stable/interactive/magics.html#) **`%timeit`** and **`%%timeit`**. But these are **ONLY** available **in IPython and Jupyter**.

In [51]:
# %timeit works only for the current line
%timeit func_1(10)

1.6 µs ± 72.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [52]:
# this does not work
%timeit
func_1(10)

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

**NOTE:** If we want to time the entire cell, we use **`%%timeit`**. However, the following rules then apply:
- **`%%timeit`** is at the top of the cell
- nothing before the command, not even commented code

In [53]:
%%timeit
func_2(10)

1.45 µs ± 51.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [54]:
# we can add the option -n to set the number of loops
%timeit -n 100000 func_3(100)

10.2 µs ± 363 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


<hr>
&nbsp;

## Comparing `map()`, `lambda` functions and list comprehension with `timeit()`

In our previous examples we have seen that `map()` was faster. However, we said in the previous lesson that `map()` was slower. Let's investigate this further.

In [55]:
# Let's generate a list of squares with comprehension vs map()
statement1 = "[x**2 for x in range(50)]"
statement2 = "list(map(lambda x: x**2, range(50)))"

print(timeit.timeit(stmt=statement1, number=500_000))
print(timeit.timeit(stmt=statement2, number=500_000))

5.131992253000135
5.729579064000063


In [56]:
# check that both statements are equal
A = [x**2 for x in range(50)]
B = list(map(lambda x: x**2, range(50)))
A == B

True

In [57]:
# a simpler calculation
statement1 = "[x+2 for x in range(50)]"
statement2 = "list(map(lambda x: x+2, range(50)))"

print(timeit.timeit(stmt=statement1, number=1_000_000))
print(timeit.timeit(stmt=statement2, number=1_000_000))

1.8724786869988748
3.518423627001539


In [58]:
# with a conditional
statement1 = "[x if x%2 == 0 else 0 for x in range(50)]"
statement2 = "list(map(lambda x: x if x%2 == 0 else 0, range(50)))"

print(timeit.timeit(stmt=statement1, number=1_000_000))
print(timeit.timeit(stmt=statement2, number=1_000_000))

2.8895411909998074
4.5576480320014525


In [59]:
# check that both statements are equal
A = [x if x%2 == 0 else 0 for x in range(50)]
B = list(map(lambda x: x if x%2 == 0 else 0, range(50)))
A == B

True

In [60]:
# and without a lambda function
statement1 = "[len(x) for x in 'This is a test sentence'.split(' ')]"
statement2 = "list(map(len, 'This is a test sentence'.split(' ')))"

print(timeit.timeit(stmt=statement1, number=1_000_000))
print(timeit.timeit(stmt=statement2, number=1_000_000))

0.5056269920005434
0.4091475069999433


In [61]:
# check that both statements are equal
A = [len(x) for x in 'This is a test sentence'.split(' ')]
B = list(map(len, 'This is a test sentence'.split(' ')))
A == B

True

In [62]:
# what if we call an external function
setup1 = '''
def convert_to_fahrenheit(temp):
    return (9/5)*temp + 32
'''

In [63]:
statement1 = "[convert_to_fahrenheit(x) for x in range(50)]"
statement2 = "list(map(convert_to_fahrenheit, range(50)))"

print(timeit.timeit(stmt=statement1, setup=setup1, number=1_000_000))
print(timeit.timeit(stmt=statement2, setup=setup1, number=1_000_000))

5.940471326999614
4.855648755999937


In [64]:
# what about filter
statement1 = "[x for x in range(50) if x % 2 == 0]"
statement2 = "list(filter(lambda x: x % 2 == 0, range(50)))"

print(timeit.timeit(stmt=statement1, number=1_000_000))
print(timeit.timeit(stmt=statement2, number=1_000_000))

2.5198309420011356
3.8934332900007576


In [65]:
# check that both statements are equal
A = [x for x in range(50) if x % 2 == 0]
B = list(filter(lambda x: x % 2 == 0, range(50)))
A == B

True

&nbsp;

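These results suggest that **it is usually better to use a list comprehension**, especially when `map()` or `filter()` would need a lambda function. When `map()` is given an existing function (such as `len` or `convert_to_fahrenheit`), it was slightly faster in our tests. List comprehensions are also generally more readable.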

<hr>
&nbsp;

## Credits
- [Pierian Data](https://github.com/Pierian-Data/Complete-Python-3-Bootcamp)
- [Geeks for Geeks](https://www.geeksforgeeks.org/python-map-vs-list-comprehension/)
- [Stack overflow](https://stackoverflow.com/questions/1247486/list-comprehension-vs-map)
- [pymotw](https://pymotw.com/2/timeit/)
- [finxter](https://blog.finxter.com/which-is-faster-list-comprehension-or-map-function-in-python/)
- [switowski](https://switowski.com/blog/for-loop-vs-list-comprehension)