Note: This notebook was created as part of the DataCamp course of the same name

# Writing Efficient Python Code

As a Data Scientist, the majority of your time should be spent gleaning actionable insights from data -- not waiting for your code to finish running. Writing efficient Python code can help reduce runtime and save computational resources, ultimately freeing you up to do the things you love as a Data Scientist. In this course, you'll learn how to use Python's built-in data structures, functions, and modules to write cleaner, faster, and more efficient code. We'll explore how to time and profile code in order to find bottlenecks. Then, you'll practice eliminating these bottlenecks, and other bad design patterns, using Python's Standard Library, NumPy, and pandas. After completing this course, you'll have the necessary tools to start writing efficient Python code!

**Instructor:** Logan Thomas, Scientific Software Technical Trainer @ Enthought

# $\star$ Chapter 1: Foundations for efficiencies
In this chapter, you'll learn what it means to write efficient Python code. You'll explore Python's Standard Library, learn about NumPy arrays, and practice using some of Python's built-in tools. This chapter builds a foundation for the concepts covered ahead.

#### Course overview
* Write cleaner, faster, more efficient Python code
* Time and profile your code for bottlenecks
* Eliminate bottlenecks and bad design patterns using
    * Python's Standard Library
    * NumPy 
    * pandas
#### Defining efficient
* In the context of this course, **efficient** refers to code that satisfies two key criteria:
    * Minimal completion time (*fast runtime*)
        * Small latency between execution and returning a result
    * Minimal resource consumption (*small memory footprint*)
        * Skillful allocation of resources without unnecessary overhead
* **The goal of writing efficient code is to reduce both *latency* and *overhead*.**
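
Both criteria can be measured directly. The sketch below is only an illustration, assuming a sequence of one million integers: it uses `sys.getsizeof()` to compare the footprint of a list with that of a `range` object, and `time.perf_counter()` to measure how long one operation takes. The variable names are placeholders, not part of the course material.

```python
import sys
import time

n = 1_000_000

# Memory footprint: a list holds a reference to every element,
# while a range object only stores its start, stop, and step.
# Note: sys.getsizeof() reports the container itself, not the objects it refers to.
as_list = list(range(n))
as_range = range(n)
list_bytes = sys.getsizeof(as_list)
range_bytes = sys.getsizeof(as_range)
print(f"list:  {list_bytes} bytes")
print(f"range: {range_bytes} bytes")

# Runtime: perf_counter() is a high-resolution clock suited to measuring latency
start = time.perf_counter()
total = sum(as_range)
elapsed = time.perf_counter() - start
print(f"sum of 0..{n - 1} = {total}, computed in {elapsed:.4f} seconds")
```

The exact byte counts and timings depend on the machine and Python version, so only the relative difference is meaningful.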

#### Defining Pythonic
* Focus on *readability*
* Using Python's constructs as intended ("Pythonic")
* Pythonic code tends to be less verbose and easier to interpret
* Although Python supports code that doesn't follow its guiding principles, this type of code tends to run slower
* **Pythonic code = efficient code**

```
# Non-Pythonic
doubled_numbers = []

for i in range(len(numbers)):
    doubled_numbers.append(numbers[i]*2)
```
***

```    
# Pythonic
doubled_numbers = [x * 2 for x in numbers]
```

**The Zen of Python**- Tim Peters

In [2]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


#### Pythonic vs. Non-Pythonic Looping

In [3]:
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

In [4]:
# Print the list created using the Non-Pythonic approach
i = 0
new_list = []
while i < len(names):
    if len(names[i]) >= 6:
        new_list.append(names[i])
    i += 1
print(new_list)

['Kramer', 'Elaine', 'George', 'Newman']


A more *Pythonic* approach would loop over the contents of `names`, rather than using an index variable. Print `better_list`.

In [5]:
# Print the list created by looping over the contents of names
better_list = []
for name in names:
    if len(name) >= 6:
        better_list.append(name)
print(better_list)

['Kramer', 'Elaine', 'George', 'Newman']


The most *Pythonic* way to do this is with a list comprehension. Print `best_list`.

In [6]:
# Print the list created by using list comprehension
best_list = [name for name in names if len(name) >= 6]
print(best_list)

['Kramer', 'Elaine', 'George', 'Newman']


### Building with built-ins
* Python comes with "batteries included": a number of built-in components ship with every installation
* Together, these built-in components are referred to as the **Python Standard Library**
* Built-in types:
    * `list`, `tuple`, `set`, `dict`, and others
* Built-in functions:
    * `print()`, `len()`, `range()`, `round()`, `enumerate()`, `map()`, `zip()`, and others
* Built-in modules:
    * `os`, `sys`, `itertools`, `collections`, `math`, and others
* **Python's built-ins have been optimized to work within the Python language itself.** Therefore, we should default to using a built-in solution (if one exists), rather than developing our own.
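
As a small illustration of defaulting to built-ins, the sketch below computes a total with a hand-written loop and then with `sum()`, alongside `max()` and `sorted()`. The `values` list is made up for this example.

```python
values = [3, 1, 4, 1, 5, 9, 2, 6]

# Hand-written approach: accumulate the total in a loop
manual_total = 0
for v in values:
    manual_total += v

# Built-in approach: one call each, no manual bookkeeping
builtin_total = sum(values)
largest = max(values)
ordered = sorted(values)

print(manual_total, builtin_total)  # 31 31
print(largest)                      # 9
print(ordered)                      # [1, 1, 2, 3, 4, 5, 6, 9]
```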

### `range(start, stop, step)`
* Creates a range object, which we can then convert into a list and print
* Unpack a range object with **`*`**  

In [26]:
nums = range(11)
print(nums)

range(0, 11)


In [27]:
print(list(nums))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


***

In [28]:
print(nums)

range(0, 11)


In [29]:
print(*nums)

0 1 2 3 4 5 6 7 8 9 10


In [34]:
nums2 = [*nums]

In [35]:
print(nums2)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]


In [36]:
print([*range(1,12,2)])

[1, 3, 5, 7, 9, 11]


### `enumerate(iterable, start=0)`
* Pairs each item in the object provided with an index, producing index-item tuples
* Similar to range, **enumerate returns an enumerate object, which can also be converted into a list and printed.**

In [9]:
letters = ['a', 'b', 'c', 'd']

In [10]:
indexed_letters = enumerate(letters)

In [11]:
indexed_letters_list = list(indexed_letters)
print(indexed_letters_list)

[(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd')]


* We can also specify the starting index of `enumerate` with the keyword argument **`start`**
* Below, we tell enumerate to start the index at five by passing `start=5` into the function call.

In [13]:
indexed_letters2 = enumerate(letters, start=5)

In [14]:
indexed_letters2_list = list(indexed_letters2)
print(indexed_letters2_list)

[(5, 'a'), (6, 'b'), (7, 'c'), (8, 'd')]


### `map(function, object)`
* Applies a function to each element in an object
* Note that `map` also returns a map object, which can then be converted into a list and printed
* First argument = function you'd like to apply
* Second argument = object you'd like to apply the function on (for example: list)

In [15]:
nums = [1.5, 2.3, 3.4, 4.6, 5.0]

In [17]:
rnd_nums = map(round, nums)

In [18]:
print(list(rnd_nums))

[2, 2, 3, 5, 5]


#### `map()` + `lambda`
* `map()` can also be used with a `lambda`, or anonymous, function
* Notice below that we can use `map` and a `lambda` expression to apply a self-defined function to our original list `nums`.
* `map` provides a quick and clean way to apply a function to each element of an object without writing a for loop.

In [19]:
nums = [1, 2, 3, 4, 5]

In [20]:
sqrd_nums = map(lambda x: x ** 2, nums)

In [21]:
print(sqrd_nums)

<map object at 0x7fb9f0a57d00>


In [22]:
print(list(sqrd_nums))

[1, 4, 9, 16, 25]


#### Exercises: range()

In [37]:
# Create a range object that goes from 0 to 5
nums = range(6)
print(type(nums))

# Convert nums to a list
nums_list = list(nums)
print(nums_list)

# Create a new list of odd numbers from 1 to 11 by unpacking a range object
nums_list2 = [*range(1,12,2)]
print(nums_list2)

<class 'range'>
[0, 1, 2, 3, 4, 5]
[1, 3, 5, 7, 9, 11]


#### Exercises: enumerate()

In [38]:
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

If you wanted to attach an index representing a person's arrival order, you could use the following for loop:

```
indexed_names = []
for i in range(len(names)):
    index_name = (i, names[i])
    indexed_names.append(index_name)

# indexed_names: [(0, 'Jerry'), (1, 'Kramer'), (2, 'Elaine'), (3, 'George'), (4, 'Newman')]
```

But, that's not the most efficient solution. Let's explore how to use `enumerate()` to make this more efficient.

In [39]:
# Rewrite the for loop to use enumerate
indexed_names = []
for i, name in enumerate(names):
    index_name = (i,name)
    indexed_names.append(index_name) 
print(indexed_names)

[(0, 'Jerry'), (1, 'Kramer'), (2, 'Elaine'), (3, 'George'), (4, 'Newman')]


Rewrite the previous for loop using `enumerate()` and list comprehension to create a new list, `indexed_names_comp`.

In [40]:
# Rewrite the above for loop using list comprehension
indexed_names_comp = [(i,name) for i,name in enumerate(names)]
print(indexed_names_comp)

[(0, 'Jerry'), (1, 'Kramer'), (2, 'Elaine'), (3, 'George'), (4, 'Newman')]


Create another list (`indexed_names_unpack`) by using the star character (`*`) to unpack the *enumerate object* created from using `enumerate()` on `names`.

In [42]:
indexed_names_unpack = [*enumerate(names)]
print(indexed_names_unpack)

[(0, 'Jerry'), (1, 'Kramer'), (2, 'Elaine'), (3, 'George'), (4, 'Newman')]


This time, start the index for `enumerate()` at one instead of zero.

In [41]:
print([*enumerate(names, 1)])

[(1, 'Jerry'), (2, 'Kramer'), (3, 'Elaine'), (4, 'George'), (5, 'Newman')]


#### Exercises: `map()`
Suppose you wanted to create a new list (called `names_uppercase`) that converted all the letters in each name to uppercase. You could accomplish this with the for loop below:

```
names_uppercase = []

for name in names:
  names_uppercase.append(name.upper())

# names_uppercase: ['JERRY', 'KRAMER', 'ELAINE', 'GEORGE', 'NEWMAN']
```

Let's explore using the `map()` function to do this more efficiently in one line of code.

In [43]:
# Use map to apply str.upper to each element in names
names_map = map(str.upper, names)

# Print the type of the names_map
print(type(names_map))

# Unpack names_map into a list
names_uppercase = [*names_map]

# Print the list created above
print(names_uppercase)

<class 'map'>
['JERRY', 'KRAMER', 'ELAINE', 'GEORGE', 'NEWMAN']


### The power of NumPy arrays
* NumPy arrays provide a fast and memory-efficient alternative to Python lists
* NumPy arrays are **homogeneous**, meaning they *must* contain elements of the same type.
* Check the element type with the `.dtype` attribute
* **Homogeneity allows NumPy arrays to be more memory efficient and faster than Python lists.**
* Requiring that all elements be the same type eliminates the overhead needed for data type checking.
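
A brief sketch of what homogeneity looks like in practice. The arrays below are made-up examples: when integers and a float are mixed, NumPy upcasts every element to a single floating-point type, whereas a Python list keeps each element's own type.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])  # the integers are upcast so all elements share one type

print(ints.dtype)   # an integer dtype (the exact width is platform-dependent)
print(mixed.dtype)  # float64

# A Python list, by contrast, keeps each element's own type
py_list = [1, 2.5, 3]
list_types = [type(x).__name__ for x in py_list]
print(list_types)   # ['int', 'float', 'int']
```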

### NumPy array broadcasting
* When analyzing data, you'll often want to perform operations over entire collections of values quickly 
    * Python lists do *not* support broadcasting.
* A big advantage of NumPy arrays is their **broadcasting functionality.**
* **NumPy arrays vectorize operations so they are performed on all elements of an object at once.**
* This allows us to efficiently perform calculations over entire arrays

```
nums = [-2, -1, 0, 1, 2]
nums ** 2
# Raises TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'
```

In [45]:
import numpy as np

In [46]:
nums_np = np.array([-2, -1, 0, 1, 2])
nums_np ** 2

array([4, 1, 0, 1, 4])

* Notice that by squaring the `nums_np` array above, each **element** is squared at once.

#### NumPy array indexing
* Another advantage of NumPy is its **indexing capabilities**
* When comparing 1D lists with 1D NumPy arrays, basic indexing and slicing work the same way.
* However, when comparing 2D arrays and lists, the advantages of arrays are clear

<img src='data/efficient1.png' width="600" height="300" align="center"/>

* Notice that with lists, we must use a list comprehension to return columns.
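
A minimal sketch of the 2D comparison, using a small hypothetical nested list (`nums2`) and its array counterpart (`nums2_np`). These names and values are assumptions chosen for illustration.

```python
import numpy as np

nums2 = [[1, 2, 3],
         [4, 5, 6]]
nums2_np = np.array(nums2)

# Single element: chained brackets for the list, one bracket pair for the array
elem_list = nums2[0][1]
elem_np = nums2_np[0, 1]
print(elem_list, elem_np)  # 2 2

# First column: the list needs a comprehension, the array needs only a slice
first_col_list = [row[0] for row in nums2]
first_col_np = nums2_np[:, 0]
print(first_col_list)  # [1, 4]
print(first_col_np)    # [1 4]
```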

#### NumPy array boolean indexing
* Suppose we wanted to gather only positive numbers from the sequence below

<img src='data/efficient2.png' width="600" height="300" align="center"/>

* To do this using a list, we need to either:
    * Write a for loop to filter the list, or
    * Use a list comprehension
* In either case, using a NumPy array to index is less verbose and has a faster runtime
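
The three approaches can be sketched side by side. The example assumes the sequence `[-2, -1, 0, 1, 2]` used earlier in this chapter; the result names are placeholders.

```python
import numpy as np

nums = [-2, -1, 0, 1, 2]
nums_np = np.array(nums)

# Boolean indexing: the comparison builds a mask, and the mask selects elements
mask = nums_np > 0
positives_np = nums_np[mask]
print(mask)          # [False False False  True  True]
print(positives_np)  # [1 2]

# List alternative 1: a for loop
positives_loop = []
for n in nums:
    if n > 0:
        positives_loop.append(n)

# List alternative 2: a list comprehension
positives_comp = [n for n in nums if n > 0]
print(positives_loop, positives_comp)  # [1, 2] [1, 2]
```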

#### Exercises:

```
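import numpy as np

# Assumption: the exercise expects `nums` to be a 2D NumPy array that already exists;
# the values below are a stand-in for illustration only
nums = np.array([[1, 2, 3, 4, 5],
                 [6, 7, 8, 9, 10]])

# Print second row of nums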
print(nums[1,:])

# Print all elements of nums that are greater than six
print(nums[nums > 6])

# Double every element of nums
nums_dbl = nums * 2
print(nums_dbl)

# Replace the third column of nums
nums[:,2] = nums[:,2] + 1
print(nums)
```

#### Exercises

```
# Create a list of arrival times
arrival_times = [*range(10, 51, 10)]

# Convert arrival_times to an array and update the times
arrival_times_np = np.array(arrival_times)
new_times = arrival_times_np - 3

# Use list comprehension and enumerate to pair guests to new times
guest_arrivals = [(names[guest],time) for guest,time in enumerate(new_times)]

# Map the welcome_guest function to each (guest,time) pair
welcome_map = map(welcome_guest, guest_arrivals)

guest_welcomes = [*welcome_map]
print(*guest_welcomes, sep='\n')
```
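
The exercise above relies on a `welcome_guest` function that is not defined in these notes. The sketch below makes the exercise runnable with a hypothetical stand-in: the function body and its message are assumptions, not the course's original definition.

```python
import numpy as np

names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']


def welcome_guest(guest_and_time):
    """Hypothetical stand-in: build a greeting from a (guest, time) pair."""
    guest, time = guest_and_time
    return f"Welcome, {guest}! Your arrival time is {int(time)}."


# Arrival times 10, 20, ..., 50, each shifted by 3 using array arithmetic
arrival_times = [*range(10, 51, 10)]
new_times = np.array(arrival_times) - 3

# Pair each guest with the updated time, then map the greeting over the pairs
guest_arrivals = [(names[guest], time) for guest, time in enumerate(new_times)]
guest_welcomes = [*map(welcome_guest, guest_arrivals)]
print(*guest_welcomes, sep='\n')
```

Passing each pair as a single tuple argument matches how `map()` calls the function once per element of `guest_arrivals`.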

# $\star$ Chapter 2: Timing and profiling code
In this chapter, you will learn how to gather and compare runtimes between different coding approaches. You'll practice using the line_profiler and memory_profiler packages to profile your code base and spot bottlenecks. Then, you'll put what you've learned into practice by replacing these bottlenecks with efficient Python code.

### Examining runtime
* Here, we will learn how to examine the runtime of our code
* Runtime is an important consideration when thinking about efficiency
* To compare runtimes, we need to be able to compute the runtime for a line or multiple lines of code
* Calculate runtime with IPython magic command `%timeit`
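
`%timeit` is only available inside IPython and Jupyter. As a rough equivalent for plain Python scripts, the standard library's `timeit` module can time a callable over many repetitions. The sketch below applies it to the loop and list-comprehension filters from Chapter 1; the function names and the repetition count are assumptions for illustration.

```python
import timeit

names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']


def loop_version():
    # Filter with an explicit for loop
    result = []
    for name in names:
        if len(name) >= 6:
            result.append(name)
    return result


def comprehension_version():
    # Filter with a list comprehension
    return [name for name in names if len(name) >= 6]


# timeit.timeit() returns the total seconds taken to call the function `number` times
loop_time = timeit.timeit(loop_version, number=10_000)
comp_time = timeit.timeit(comprehension_version, number=10_000)

print(f"for loop:           {loop_time:.4f} s")
print(f"list comprehension: {comp_time:.4f} s")
```

Timings vary between runs and machines, so compare the two figures from the same run rather than treating either as an absolute value.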

#### Magic commands
* **Magic commands** are enhancements that have been added on top of normal Python syntax
* These commands are prefixed with the percentage sign `%`
* See all available magic commands with `%lsmagic`

In [47]:
%lsmagic

Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %conda  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%pypy  %%

<img src='data/string23.png' width="600" height="300" align="center"/>