In [1]:
from data import *

# Using %timeit: your turn!
You'd like to create a list of integers from 0 to 50 using the range() function. However, you are unsure whether using list comprehension or unpacking the range object into a list is faster. Let's use %timeit to find the best implementation.

For your convenience, a reference table of time orders of magnitude is provided below (faster at the top).

|symbol|name|unit (s)|
|-------|-------|----------|
|ns|nanosecond|10^-9|
|µs (us)|microsecond|10^-6|
|ms|millisecond|10^-3|
|s|second|10^0|
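
To make the table concrete, here is a small helper that expresses a duration given in seconds using the largest unit from the table that keeps the value at or above 1. The name format_seconds is hypothetical and is only an illustration; IPython's %timeit does its own formatting internally.

```python
def format_seconds(seconds: float) -> str:
    """Express a duration using the largest unit from the table that keeps the value >= 1."""
    # Units ordered from largest to smallest, with their size in seconds
    units = [("s", 1e0), ("ms", 1e-3), ("µs", 1e-6), ("ns", 1e-9)]
    for name, size in units:
        if seconds >= size:
            return f"{seconds / size:.3g} {name}"
    # Anything below one nanosecond is still reported in nanoseconds
    return f"{seconds / 1e-9:.3g} ns"


# 547 ns from the unpacking timing below, written in seconds
print(format_seconds(5.47e-7))  # → 547 ns
```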

In [2]:
# Create a list of integers (0-50) using list comprehension
nums_list_comp = [num for num in range(0,51)]
print(nums_list_comp)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]


In [3]:
# Create a list of integers (0-50) by unpacking range
nums_unpack = [*range(51)]
print(nums_unpack)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]


## Question
Use %timeit within your IPython console (i.e. not within the script.py window) to compare the runtimes for creating a list of integers from 0 to 50 using list comprehension vs. unpacking the range object. Don't include the print() statements when timing.

Which method was faster?

Unpacking the range object was faster than list comprehension.

In [4]:
%timeit [num for num in range(0,51)]

1.48 µs ± 217 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [5]:
%timeit [*range(51)]

547 ns ± 34.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
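Outside IPython, the standard-library timeit module can run a similar comparison. This is a sketch that approximates %timeit rather than reproducing its exact algorithm: it times each statement over 7 runs and keeps the fastest per-loop time, which the timeit documentation suggests is usually the most informative figure. The loop count of 10,000 is an arbitrary choice.

```python
import timeit

# Sanity check: both approaches must build the same list
assert [num for num in range(0, 51)] == [*range(51)]

statements = {
    "list comprehension": "[num for num in range(0, 51)]",
    "unpacking": "[*range(51)]",
}

loops = 10_000  # loops per run (an arbitrary choice)
fastest_per_loop = {}
for label, stmt in statements.items():
    # timeit.repeat returns one total time in seconds per run
    run_totals = timeit.repeat(stmt, repeat=7, number=loops)
    fastest_per_loop[label] = min(run_totals) / loops

for label, seconds in fastest_per_loop.items():
    print(f"{label}: {seconds * 1e9:.0f} ns per loop")
```

Exact numbers will vary by machine and Python version.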


# Using %timeit: specifying number of runs and loops
A list of 480 superheroes has been loaded into your session (called heroes). You'd like to analyze the runtime for converting this heroes list into a set. Instead of relying on the default settings for %timeit, you'd like to use only 5 runs with 25 loops per run.

What is the correct syntax for %timeit when using only 5 runs with 25 loops per run?

%timeit -r5 -n25 set(heroes)

In [6]:
%timeit -r5 -n25 set(heroes)

8.87 µs ± 954 ns per loop (mean ± std. dev. of 5 runs, 25 loops each)
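The -r and -n flags correspond to the repeat and number arguments of the standard-library timeit.repeat() function. The sketch below, which uses a hypothetical stand-in for the heroes list, reproduces the "5 runs, 25 loops each" setup and summarizes it as a per-loop mean ± standard deviation, the same kind of summary %timeit prints.

```python
import statistics
import timeit

# Hypothetical stand-in for the 480-element heroes list
heroes = [f"hero_{i}" for i in range(480)]

runs, loops = 5, 25  # roughly equivalent to %timeit -r5 -n25
run_totals = timeit.repeat("set(heroes)", repeat=runs, number=loops, globals={"heroes": heroes})

# Convert each run's total time into a per-loop time
per_loop = [total / loops for total in run_totals]
mean = statistics.mean(per_loop)
std = statistics.pstdev(per_loop)  # population standard deviation across runs
print(f"{mean * 1e6:.2f} µs ± {std * 1e9:.0f} ns per loop "
      f"(mean ± std. dev. of {runs} runs, {loops} loops each)")
```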


# Using %timeit: formal name or literal syntax
Python allows you to create data structures using either a formal name or a literal syntax. In this exercise, you'll explore how using a literal syntax for creating a data structure can speed up runtimes.

|data structure	|formal name|	literal syntax|
|---------------|-----------|-----------------|
|list	|list()|	[]|
|dictionary	|dict()|	{}|
|tuple	|tuple()|	()|

In [7]:
# Create a list using the formal name
formal_list = list()
print(formal_list)

# Create a list using the literal syntax
literal_list = []
print(literal_list)

[]
[]


In [8]:
# Print out the type of formal_list
print(type(formal_list))

# Print out the type of literal_list
print(type(literal_list))

<class 'list'>
<class 'list'>


In [9]:
%timeit list()

79.6 ns ± 9.92 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [10]:
%timeit []

20 ns ± 1.28 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)


## Question
Use %timeit in your IPython console to compare runtimes between creating a list using the formal name (list()) and the literal syntax ([]). Don't include the print() statements when timing.

Which naming convention is faster?

Using the literal syntax ([]) to create a list is faster.
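
One way to see why the literal is faster is to inspect the bytecode with the standard-library dis module. `[]` compiles to a single BUILD_LIST instruction, while `list()` must first look up the name list at runtime (it could have been rebound) and then call it. Exact opcode names differ between Python versions, so treat the printed lists as illustrative.

```python
import dis


def opnames(source: str) -> list[str]:
    """Return the bytecode instruction names for a single expression."""
    code = compile(source, "<expr>", "eval")
    return [ins.opname for ins in dis.get_instructions(code)]


literal_ops = opnames("[]")
formal_ops = opnames("list()")

print("[]     ->", literal_ops)
print("list() ->", formal_ops)
```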

# Using cell magic mode (%%timeit)
From here on out, you'll be working with a superheroes dataset. For this exercise, a list of each hero's weight in kilograms (called wts) is loaded into your session. You'd like to convert these weights into pounds.

You could accomplish this using the below for loop:

```python
hero_wts_lbs = []
for wt in wts:
    hero_wts_lbs.append(wt * 2.20462)
```

Or you could use a numpy array to accomplish this task:

```python
wts_np = np.array(wts)
hero_wts_lbs_np = wts_np * 2.20462
```

Use %%timeit in your IPython console to compare runtimes between these two approaches. Make sure to press SHIFT+ENTER after the magic command to add a new line before writing the code you wish to time. After you've finished coding, answer the following question:

Which of the above techniques is faster?

The numpy technique was faster.

In [11]:
def loop_technique():
    hero_wts_lbs = []
    for wt in wts:
        hero_wts_lbs.append(wt * 2.20462)

In [12]:
import numpy as np 

def numpy_technique():
    wts_np = np.array(wts)
    hero_wts_lbs_np = wts_np * 2.20462

In [13]:
%timeit loop_technique()

36.2 µs ± 1.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [14]:
%timeit numpy_technique()

15.5 µs ± 180 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
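Before trusting a speed comparison, it helps to confirm that both techniques produce the same numbers. loop_technique() and numpy_technique() above discard their results, which is fine for timing but not for checking output, so the sketch below uses returning versions and a small hypothetical weights list in place of wts. np.allclose compares floating-point results with a tolerance instead of exact equality.

```python
import numpy as np

# Hypothetical sample weights in kilograms, standing in for wts
wts = [441.0, 65.0, 90.0, 72.0, 48.5]


def loop_technique(weights):
    """Convert kg to lb one element at a time with a Python loop."""
    hero_wts_lbs = []
    for wt in weights:
        hero_wts_lbs.append(wt * 2.20462)
    return hero_wts_lbs


def numpy_technique(weights):
    """Convert kg to lb with a single vectorized multiplication."""
    wts_np = np.array(weights)
    return wts_np * 2.20462


print(np.allclose(loop_technique(wts), numpy_technique(wts)))  # → True
```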


# Pop quiz: steps for using %lprun
Below is the convert_units() function, which converts the heights and weights of our favorite superheroes from metric units to Imperial units.

```python
def convert_units(heroes, heights, weights):

    new_hts = [ht * 0.39370  for ht in heights]
    new_wts = [wt * 2.20462  for wt in weights]

    hero_data = {}

    for i,hero in enumerate(heroes):
        hero_data[hero] = (new_hts[i], new_wts[i])

    return hero_data
```

Suppose you have a list of superheroes (named heroes) along with each hero's height (in centimeters) and weight (in kilograms) loaded as NumPy arrays (named hts and wts respectively).

What are the necessary steps you need to take in order to profile the convert_units() function acting on your superheroes data if you'd like to see line-by-line runtimes?

The following two steps are necessary:

- Use %load_ext line_profiler to load the line_profiler within your IPython session.
- Use %lprun -f convert_units convert_units(heroes, hts, wts) to get line-by-line runtimes.
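
The %lprun magic requires the third-party line_profiler package. If you only need function-level timings, the standard library's cProfile is a swapped-in alternative: it reports time per function call, not per line, so it does not replace %lprun for line-by-line analysis. The sketch below profiles convert_units() on small hypothetical data.

```python
import cProfile
import io
import pstats


def convert_units(heroes, heights, weights):
    new_hts = [ht * 0.39370 for ht in heights]
    new_wts = [wt * 2.20462 for wt in weights]
    hero_data = {}
    for i, hero in enumerate(heroes):
        hero_data[hero] = (new_hts[i], new_wts[i])
    return hero_data


# Hypothetical sample data standing in for heroes, hts, and wts
heroes = ["Hero A", "Hero B", "Hero C"]
hts = [203.0, 191.0, 185.0]
wts = [441.0, 65.0, 90.0]

profiler = cProfile.Profile()
result = profiler.runcall(convert_units, heroes, hts, wts)

# Write the collected statistics, sorted by cumulative time, into a string buffer
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats()
print(buffer.getvalue())
```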

In [15]:
def convert_units(heroes, heights, weights):

    new_hts = [ht * 0.39370  for ht in heights]
    new_wts = [wt * 2.20462  for wt in weights]

    hero_data = {}

    for i,hero in enumerate(heroes):
        hero_data[hero] = (new_hts[i], new_wts[i])

    return hero_data

In [16]:
%load_ext line_profiler

In [17]:
%lprun -f convert_units convert_units(heroes, hts, wts)

Timer unit: 1e-07 s

Total time: 0.0008973 s
File: <ipython-input-15-d844055423a7>
Function: convert_units at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
     1                                           def convert_units(heroes, heights, weights):
     2                                           
     3         1        868.0    868.0      9.7      new_hts = [ht * 0.39370  for ht in heights]
     4         1        798.0    798.0      8.9      new_wts = [wt * 2.20462  for wt in weights]
     5                                           
     6         1          6.0      6.0      0.1      hero_data = {}
     7                                           
     8       481       3361.0      7.0     37.5      for i,hero in enumerate(heroes):
     9       480       3930.0      8.2     43.8          hero_data[hero] = (new_hts[i], new_wts[i])
    10                                           
    11         1         10.0     10.0      0.1      return hero_data

In [18]:
%timeit convert_units(heroes, hts, wts)

106 µs ± 6.95 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


# Using %lprun: spot bottlenecks
Profiling a function allows you to dig deeper into the function's source code and potentially spot bottlenecks. When you see certain lines of code taking up the majority of the function's runtime, it is an indication that you may want to deploy a different, more efficient technique.

Let's dig deeper into the convert_units() function.

```python
def convert_units(heroes, heights, weights):

    new_hts = [ht * 0.39370  for ht in heights]
    new_wts = [wt * 2.20462  for wt in weights]

    hero_data = {}

    for i,hero in enumerate(heroes):
        hero_data[hero] = (new_hts[i], new_wts[i])

    return hero_data
```

Load the line_profiler package into your IPython session. Then, use %lprun to profile the convert_units() function acting on your superheroes data. Remember to use the special syntax for working with %lprun (you'll have to provide a -f flag specifying the function you'd like to profile).

The convert_units() function, heroes list, hts array, and wts array have been loaded into your session. After you've finished coding, answer the following question:

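What percentage of time is spent on the new_hts list comprehension line of code relative to the total amount of time spent in the convert_units() function?

In the %lprun output above, the new_hts list comprehension line accounts for 9.7% of the total time.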
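
The % Time column is each line's Time divided by the function's total time. Using the Time values from the convert_units() profile above (in timer units of 1e-07 s), a quick calculation reproduces both the total time and the percentages:

```python
# Time column from the %lprun output for convert_units(), in units of 1e-07 s
line_times = {
    "new_hts list comprehension": 868.0,
    "new_wts list comprehension": 798.0,
    "hero_data = {}": 6.0,
    "for loop header": 3361.0,
    "dictionary assignment": 3930.0,
    "return": 10.0,
}

total_units = sum(line_times.values())
print(f"Total time: {total_units * 1e-7:.7f} s")  # → Total time: 0.0008973 s

for line, units in line_times.items():
    print(f"{line}: {100 * units / total_units:.1f}%")
```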



In [19]:
def convert_units_broadcast(heroes, heights, weights):

    # Array broadcasting instead of list comprehension
    new_hts = heights * 0.39370
    new_wts = weights * 2.20462

    hero_data = {}

    for i,hero in enumerate(heroes):
        hero_data[hero] = (new_hts[i], new_wts[i])

    return hero_data

In [20]:
%load_ext line_profiler

The line_profiler extension is already loaded. To reload it, use:
  %reload_ext line_profiler


In [21]:
test_hts = np.array(hts)
test_wts = np.array(wts)

%lprun -f convert_units_broadcast convert_units_broadcast(heroes, test_hts, test_wts)

Timer unit: 1e-07 s

Total time: 0.0007267 s
File: <ipython-input-19-097b3089decf>
Function: convert_units_broadcast at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
     1                                           def convert_units_broadcast(heroes, heights, weights):
     2                                           
     3                                               # Array broadcasting instead of list comprehension
     4         1        113.0    113.0      1.6      new_hts = heights * 0.39370
     5         1         23.0     23.0      0.3      new_wts = weights * 2.20462
     6                                           
     7         1          6.0      6.0      0.1      hero_data = {}
     8                                           
     9       481       2819.0      5.9     38.8      for i,hero in enumerate(heroes):
    10       480       4287.0      8.9     59.0          hero_data[hero] = (new_hts[i], new_wts[i])
    11                             

In [22]:
%timeit convert_units_broadcast(heroes, test_hts, test_wts)

148 µs ± 2.77 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
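The broadcast version above ran slower under %timeit (148 µs vs. 106 µs) even though its conversion lines became cheap. One likely contributor is that the loop still indexes the arrays element by element, and each new_hts[i] on a NumPy array creates a NumPy scalar object, which tends to cost more than indexing a Python list. The function below is a hypothetical variant, not part of the original exercise: it keeps the broadcasting, converts the results back to lists with .tolist(), and pairs values with zip() instead of indexing. Whether it is actually faster on your data is something to confirm with %timeit.

```python
import numpy as np


def convert_units_broadcast_zip(heroes, heights, weights):
    """Hypothetical variant: broadcast the conversions, then pair values with zip()."""
    # Broadcasting converts every element in one vectorized operation
    new_hts = (heights * 0.39370).tolist()
    new_wts = (weights * 2.20462).tolist()
    # zip() pairs each hero with its converted height and weight without indexing
    return dict(zip(heroes, zip(new_hts, new_wts)))


# Hypothetical sample data
heroes = ["Hero A", "Hero B", "Hero C"]
hts = np.array([203.0, 191.0, 185.0])
wts = np.array([441.0, 65.0, 90.0])

result = convert_units_broadcast_zip(heroes, hts, wts)
# Indexing the array yields a NumPy scalar; .tolist() yields plain Python floats
print(type(hts[0]), type(result["Hero A"][0]))
```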


# Using %mprun: Hero BMI
You'd like to calculate the body mass index (BMI) for a selected sample of heroes. BMI can be calculated using the formula below:

$$\text{BMI} = \frac{\text{weight (kg)}}{\text{height (m)}^2}$$

A random sample of 25,000 superheroes has been loaded into your session as an array called sample_indices. This sample is a list of indices, each corresponding to a superhero's position in the heroes list.

A function named calc_bmi_lists has also been created and saved to a file titled bmi_lists.py. For convenience, it is displayed below:

```python
def calc_bmi_lists(sample_indices, hts, wts):

    # Gather sample heights and weights as lists
    s_hts = [hts[i] for i in sample_indices]
    s_wts = [wts[i] for i in sample_indices]

    # Convert heights from cm to m and square with list comprehension
    s_hts_m_sqr = [(ht / 100) ** 2 for ht in s_hts]

    # Calculate BMIs as a list with list comprehension
    bmis = [s_wts[i] / s_hts_m_sqr[i] for i in range(len(sample_indices))]

    return bmis
```

Notice that this function performs all necessary calculations using list comprehension (hence the name calc_bmi_lists()). Dig deeper into this function and analyze the memory footprint for performing your calculations using lists:

- Load the memory_profiler package into your IPython session.
- Import calc_bmi_lists from bmi_lists.
- Once you've completed the above steps, use %mprun to profile the calc_bmi_lists() function acting on your superheroes data. The hts array and wts array have already been loaded into your session.
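
If memory_profiler is not available, the standard library's tracemalloc is a rough, swapped-in alternative: it reports the current and peak memory allocated through Python's allocator between start() and stop(), rather than a line-by-line Increment column. The sketch below measures calc_bmi_lists() on hypothetical random data sized like the 25,000-element sample.

```python
import random
import tracemalloc


def calc_bmi_lists(sample_indices, hts, wts):
    s_hts = [hts[i] for i in sample_indices]
    s_wts = [wts[i] for i in sample_indices]
    s_hts_m_sqr = [(ht / 100) ** 2 for ht in s_hts]
    bmis = [s_wts[i] / s_hts_m_sqr[i] for i in range(len(sample_indices))]
    return bmis


# Hypothetical data: 480 heights (cm) and weights (kg), plus 25,000 random indices
random.seed(0)
hts = [random.uniform(150.0, 210.0) for _ in range(480)]
wts = [random.uniform(45.0, 150.0) for _ in range(480)]
sample_indices = [random.randrange(480) for _ in range(25_000)]

tracemalloc.start()
bmis = calc_bmi_lists(sample_indices, hts, wts)
current, peak = tracemalloc.get_traced_memory()  # both values are in bytes
tracemalloc.stop()

print(f"current: {current / 2**20:.2f} MiB, peak: {peak / 2**20:.2f} MiB")
```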

After you've finished coding, answer the following question:

How much memory do the list comprehension lines of code consume in the calc_bmi_lists() function? (i.e., what is the total sum of the Increment column for these four lines of code?)

In [23]:
def loadSample(file_dir):
    with open(file=file_dir) as file:
        numbers = [int(line) for line in file]
    return numbers

sample_indices = loadSample('sample.txt')

In [24]:
%load_ext memory_profiler

In [25]:
from bmi_lists import calc_bmi_lists

In [26]:
%mprun -f calc_bmi_lists calc_bmi_lists(sample_indices, test_hts, test_wts)




Filename: e:\Github-Workspace\nhutnamhcmus\datacamp-playground\writing-efficient-python-code\2. Timing and profiling code\bmi_lists.py

Line #    Mem usage    Increment  Occurences   Line Contents
     1     62.7 MiB     62.7 MiB           1   def calc_bmi_lists(sample_indices, hts, wts):
     2                                         
     3                                             # Gather sample heights and weights as lists
     4     63.7 MiB      1.0 MiB       25003       s_hts = [hts[i] for i in sample_indices]
     5     64.6 MiB      1.0 MiB       25003       s_wts = [wts[i] for i in sample_indices]
     6                                         
     7                                             # Convert heights from cm to m and square with list comprehension
     8     65.7 MiB      1.1 MiB       25003       s_hts_m_sqr = [(ht / 100) ** 2 for ht in s_hts]
     9                                         
    10                                             # Calculate BMIs as

# Using %mprun: Hero BMI 2.0
Let's see if using a different approach to calculate the BMIs can save some memory. Recall that each hero's height and weight is stored in a NumPy array. That means you can use NumPy's array indexing and broadcasting to perform your calculations. A function named calc_bmi_arrays has been created and saved to a file titled bmi_arrays.py. For convenience, it is displayed below:

```python
def calc_bmi_arrays(sample_indices, hts, wts):

    # Gather sample heights and weights as arrays
    s_hts = hts[sample_indices]
    s_wts = wts[sample_indices]

    # Convert heights from cm to m and square with broadcasting
    s_hts_m_sqr = (s_hts / 100) ** 2

    # Calculate BMIs as an array using broadcasting
    bmis = s_wts / s_hts_m_sqr

    return bmis
```

Notice that this function performs all necessary calculations using arrays.

Let's see if this updated array approach decreases your memory footprint:

- Load the memory_profiler package into your IPython session.
- Import calc_bmi_arrays from bmi_arrays.
- Once you've completed the above steps, use %mprun to profile the calc_bmi_arrays() function acting on your superheroes data. The sample_indices array, hts array, and wts array have been loaded into your session.

After you've finished coding, answer the following question:

How much memory do the array indexing and broadcasting lines of code consume in the calc_bmi_arrays() function? (i.e., what is the total sum of the Increment column for these four lines of code?)
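
A back-of-the-envelope calculation helps explain the smaller increments for arrays. A float64 array stores each value in 8 bytes, so a 25,000-element array needs 200,000 bytes (about 0.19 MiB). A list instead stores a pointer per element (8 bytes on a typical 64-bit CPython build) plus a separate object for each value it creates. The sketch below, assuming hypothetical random data, checks the array size and confirms that list and array calculations agree.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 480 heights (cm) and weights (kg), plus 25,000 sample indices
hts = rng.uniform(150.0, 210.0, size=480)
wts = rng.uniform(45.0, 150.0, size=480)
sample_indices = rng.integers(0, 480, size=25_000)

# Array version: fancy indexing plus broadcasting
s_hts = hts[sample_indices]
bmis_arrays = wts[sample_indices] / (s_hts / 100) ** 2

# List version: the same calculation with a list comprehension
bmis_lists = [wts[i] / (hts[i] / 100) ** 2 for i in sample_indices]

print(s_hts.nbytes)                          # → 200000
print(np.allclose(bmis_arrays, bmis_lists))  # → True
```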

In [27]:
%load_ext line_profiler

The line_profiler extension is already loaded. To reload it, use:
  %reload_ext line_profiler


In [28]:
from bmi_arrays import calc_bmi_arrays

In [29]:
%mprun -f calc_bmi_arrays calc_bmi_arrays(sample_indices, test_hts, test_wts)




Filename: e:\Github-Workspace\nhutnamhcmus\datacamp-playground\writing-efficient-python-code\2. Timing and profiling code\bmi_arrays.py

Line #    Mem usage    Increment  Occurences   Line Contents
     1     63.1 MiB     63.1 MiB           1   def calc_bmi_arrays(sample_indices, hts, wts):
     2                                         
     3                                             # Gather sample heights and weights as arrays
     4     63.5 MiB      0.4 MiB           1       s_hts = hts[sample_indices]
     5     63.7 MiB      0.2 MiB           1       s_wts = wts[sample_indices]
     6                                         
     7                                             # Convert heights from cm to m and square with broadcasting
     8     64.0 MiB      0.3 MiB           1       s_hts_m_sqr = (s_hts / 100) ** 2
     9                                         
    10                                             # Calculate BMIs as an array using broadcasting
    11     64.0

# Bringing it all together: Star Wars profiling
A list of 480 superheroes has been loaded into your session (called heroes) as well as a list of each hero's corresponding publisher (called publishers).

You'd like to filter the heroes list based on a hero's specific publisher, but are unsure which of the below functions is more efficient.

```python
def get_publisher_heroes(heroes, publishers, desired_publisher):

    desired_heroes = []

    for i,pub in enumerate(publishers):
        if pub == desired_publisher:
            desired_heroes.append(heroes[i])

    return desired_heroes
```

```python
def get_publisher_heroes_np(heroes, publishers, desired_publisher):

    heroes_np = np.array(heroes)
    pubs_np = np.array(publishers)

    desired_heroes = heroes_np[pubs_np == desired_publisher]

    return desired_heroes
```
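
The NumPy version relies on boolean masking: `pubs_np == desired_publisher` produces an array of True/False values, and indexing heroes_np with that mask keeps only the positions where the mask is True. A tiny example with hypothetical miniature data:

```python
import numpy as np

# Hypothetical miniature versions of heroes and publishers
heroes_np = np.array(["Yoda", "Batman", "Han Solo"])
pubs_np = np.array(["George Lucas", "DC Comics", "George Lucas"])

# Element-wise comparison builds the boolean mask
mask = pubs_np == "George Lucas"
print(mask)             # → [ True False  True]
print(heroes_np[mask])  # → ['Yoda' 'Han Solo']
```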


In [30]:
def get_publisher_heroes(heroes, publishers, desired_publisher):

    desired_heroes = []

    for i,pub in enumerate(publishers):
        if pub == desired_publisher:
            desired_heroes.append(heroes[i])

    return desired_heroes
def get_publisher_heroes_np(heroes, publishers, desired_publisher):

    heroes_np = np.array(heroes)
    pubs_np = np.array(publishers)

    desired_heroes = heroes_np[pubs_np == desired_publisher]

    return desired_heroes

In [31]:
import numpy as np

# Use get_publisher_heroes() to gather Star Wars heroes
star_wars_heroes = get_publisher_heroes(heroes, publishers, desired_publisher='George Lucas')

print(star_wars_heroes)
print(type(star_wars_heroes))

# Use get_publisher_heroes_np() to gather Star Wars heroes
star_wars_heroes_np = get_publisher_heroes_np(heroes, publishers, desired_publisher='George Lucas')

print(star_wars_heroes_np)
print(type(star_wars_heroes_np))

['Darth Vader', 'Han Solo', 'Luke Skywalker', 'Yoda']
<class 'list'>
['Darth Vader' 'Han Solo' 'Luke Skywalker' 'Yoda']
<class 'numpy.ndarray'>


In [32]:
%timeit get_publisher_heroes(heroes, publishers, desired_publisher='George Lucas')

18.7 µs ± 492 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [33]:
%timeit get_publisher_heroes_np(heroes, publishers, desired_publisher='George Lucas')

78.6 µs ± 1.88 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [34]:
%lprun -f get_publisher_heroes get_publisher_heroes(heroes, publishers, desired_publisher="George Lucas")

Timer unit: 1e-07 s

Total time: 0.0002598 s
File: <ipython-input-30-74702c55f56d>
Function: get_publisher_heroes at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
     1                                           def get_publisher_heroes(heroes, publishers, desired_publisher):
     2                                           
     3         1         11.0     11.0      0.4      desired_heroes = []
     4                                           
     5       481       1307.0      2.7     50.3      for i,pub in enumerate(publishers):
     6       480       1253.0      2.6     48.2          if pub == desired_publisher:
     7         4         24.0      6.0      0.9              desired_heroes.append(heroes[i])
     8                                           
     9         1          3.0      3.0      0.1      return desired_heroes

In [35]:
%lprun -f get_publisher_heroes_np get_publisher_heroes_np(heroes, publishers, desired_publisher="George Lucas")

Timer unit: 1e-07 s

Total time: 0.000108 s
File: <ipython-input-30-74702c55f56d>
Function: get_publisher_heroes_np at line 10

Line #      Hits         Time  Per Hit   % Time  Line Contents
    10                                           def get_publisher_heroes_np(heroes, publishers, desired_publisher):
    11                                           
    12         1        568.0    568.0     52.6      heroes_np = np.array(heroes)
    13         1        374.0    374.0     34.6      pubs_np = np.array(publishers)
    14                                           
    15         1        135.0    135.0     12.5      desired_heroes = heroes_np[pubs_np == desired_publisher]
    16                                           
    17         1          3.0      3.0      0.3      return desired_heroes

In [36]:
from publisher_heroes import get_publisher_heroes
from publisher_heroes_np import get_publisher_heroes_np

In [37]:
%mprun -f get_publisher_heroes_np get_publisher_heroes_np(heroes, publishers, desired_publisher="George Lucas")




Filename: e:\Github-Workspace\nhutnamhcmus\datacamp-playground\writing-efficient-python-code\2. Timing and profiling code\publisher_heroes_np.py

Line #    Mem usage    Increment  Occurences   Line Contents
     3     64.0 MiB     64.0 MiB           1   def get_publisher_heroes_np(heroes, publishers, desired_publisher):
     4                                         
     5     64.0 MiB      0.0 MiB           1       heroes_np = np.array(heroes)
     6     64.0 MiB      0.0 MiB           1       pubs_np = np.array(publishers)
     7                                         
     8     64.0 MiB      0.0 MiB           1       desired_heroes = heroes_np[pubs_np == desired_publisher]
     9                                         
    10     64.0 MiB      0.0 MiB           1       return desired_heroes

In [38]:
%mprun -f get_publisher_heroes get_publisher_heroes(heroes, publishers, desired_publisher="George Lucas")




Filename: e:\Github-Workspace\nhutnamhcmus\datacamp-playground\writing-efficient-python-code\2. Timing and profiling code\publisher_heroes.py

Line #    Mem usage    Increment  Occurences   Line Contents
     1     64.0 MiB     64.0 MiB           1   def get_publisher_heroes(heroes, publishers, desired_publisher):
     2                                         
     3     64.0 MiB      0.0 MiB           1       desired_heroes = []
     4                                         
     5     64.0 MiB      0.0 MiB         481       for i,pub in enumerate(publishers):
     6     64.0 MiB      0.0 MiB         480           if pub == desired_publisher:
     7     64.0 MiB      0.0 MiB           4               desired_heroes.append(heroes[i])