# Making stuff run faster.  A few strategies. 

This lesson presents a few options for making things run faster or more efficiently in Python.

gde 4.2.2020

### Don't worry about it, unless it's actually a problem!

"The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming."

-- Donald Knuth

"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."

    Variant in Knuth, "Structured Programming with Goto Statements". Computing Surveys 6:4 (December 1974), pp. 261–301, §1. doi:10.1145/356635.356640

## Understanding the basics

There are different types of resource issues you may run into when working with computers.  The solution will depend on which of these is a constraint.

First, understand that there are capacity problems and there are speed problems.  Capacity problems include: 

1. Running out of memory
2. Running out of storage space

With a capacity problem, the time for an operation is basically unaffected until you hit some limit, and then you are out of luck.  With a speed problem, several things affect how long your program takes to run, so it is worth understanding what is fast and what is slow on a computer.  From fastest to slowest, you can roughly expect:

1. CPU operations 
2. Accessing RAM 
3. Sending output to the console
4. Reading from a solid-state drive
5. Writing to a solid-state drive
6. Reading from or writing to a traditional hard drive
7. Access over a local network 
8. Access over the internet
...
100. Anything that involves a human

That last one includes both manual data processing and the act of writing the code itself.  In scientific computing, most of the programs we write will be used by just a couple of people.  What we really care about is minimizing the total time it takes to get results.
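A rough way to see the gap between memory and disk for yourself is sketched below. The file name and data size are arbitrary choices made for this illustration, and the disk figure also includes the cost of converting numbers to and from text, so treat the numbers as an order-of-magnitude hint rather than a benchmark. Operating system caching can also make the disk look faster than it really is.

```python
import os
import tempfile
import time

data = list(range(200_000))

# Sum the numbers while they sit in RAM.
start = time.perf_counter()
total_in_memory = sum(data)
ram_seconds = time.perf_counter() - start

# Write the same numbers to a temporary file, then read them back and sum.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "numbers.txt")  # hypothetical file name
    start = time.perf_counter()
    with open(path, "w") as f:
        f.write("\n".join(map(str, data)))
    with open(path) as f:
        total_from_disk = sum(int(line) for line in f)
    disk_seconds = time.perf_counter() - start

print(f"in memory: {ram_seconds:.4f} s, via disk: {disk_seconds:.4f} s")
```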


## Step 1: Diagnose the problem

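### 1. Use the resource monitor

Run each of the cells below while watching your operating system's resource monitor (Task Manager on Windows, Activity Monitor on macOS, top or htop on Linux).  The first should show up as high CPU usage, the second as steadily growing memory usage.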

In [None]:
# a computationally intensive loop: watch CPU usage while it runs
import math 

for i in range(0,10000):
    x = math.factorial(i)


In [None]:
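# a memory intensive loop: watch memory usage climb while it runs
# (at this size it needs terabytes, so interrupt the kernel once you have seen the effect)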
l = []
for i in range(0,10000):
    new_list = [4.564234] * i * i
    l.append(new_list)

### 2. Use a timer

In [3]:
# overall time
import math 
import datetime

start_time = datetime.datetime.now()
for i in range(0,10000):
    x = math.factorial(i)

time_elapsed = datetime.datetime.now() - start_time
print('Finished in ', time_elapsed)


Finished in  0:00:10.989655


In [4]:
# time for multiple steps

import math 
import datetime

start_time = datetime.datetime.now()

# step 1
for i in range(0,10000):
    x = math.factorial(i)
step1_time = datetime.datetime.now() - start_time    
print('Finished step 1 in ', step1_time)

# step 2
l = []
for i in range(0,1000):
    new_list = [4.564234] * i * i
    l.append(new_list)
step2_time = datetime.datetime.now() - (start_time + step1_time)
print('Finished step 2 in ', step2_time)


Finished step 1 in  0:00:11.027193
Finished step 2 in  0:00:02.080413
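The same per-step timing can also be sketched with time.perf_counter, a clock intended for measuring short intervals, wrapped in a small context manager so that each step is timed independently of the others. The names timed and timings are introduced here for illustration only, and the loop sizes are smaller than in the cells above so that the sketch finishes quickly.

```python
import math
import time
from contextlib import contextmanager

timings = {}  # label -> elapsed seconds (hypothetical helper structure)


@contextmanager
def timed(label):
    """Time the enclosed block and record the result under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start
        print(f"Finished {label} in {timings[label]:.3f} s")


with timed("step 1"):
    for i in range(1000):
        x = math.factorial(i)

with timed("step 2"):
    squares = [i * i for i in range(100_000)]
```

Because each block records its own start time, there is no need to subtract earlier steps from a shared start time.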


### 3. Use a profiling tool

https://docs.python.org/3/library/profile.html

https://www.pluralsight.com/guides/quick-profiling-in-python

https://jiffyclub.github.io/snakeviz/

https://mortada.net/easily-profile-python-code-in-jupyter.html

https://jakevdp.github.io/PythonDataScienceHandbook/01.07-timing-and-profiling.html

1. The built-in profiler, cProfile, profiles function by function

In [None]:
import cProfile
import math  # needed by factorial_range below

In [None]:
def factorial_range(n):
    for i in range(0,n):
        x = math.factorial(i)

In [None]:
pr = cProfile.Profile()
pr.enable()
factorial_range(10000)
pr.disable()
pr.print_stats()

In [None]:
# dump_stats writes a binary file (readable by pstats or snakeviz), not plain text
pr.dump_stats("factorial_range_profile.prof")
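A file written by dump_stats can be read back with the standard-library pstats module. The sketch below repeats the profiling on a smaller input so that it runs on its own, writes the stats to a temporary file (the file name is a placeholder), and prints the five entries with the largest cumulative time.

```python
import cProfile
import math
import os
import pstats
import tempfile


def factorial_range(n):
    for i in range(n):
        math.factorial(i)


pr = cProfile.Profile()
pr.enable()
factorial_range(500)
pr.disable()

with tempfile.TemporaryDirectory() as tmp_dir:
    profile_path = os.path.join(tmp_dir, "example.prof")  # placeholder name
    pr.dump_stats(profile_path)
    # Stats loads the whole file immediately, so the temp file can go away after this.
    stats = pstats.Stats(profile_path)

# Shorten file paths, sort by cumulative time, and show the top 5 entries.
stats.strip_dirs().sort_stats("cumulative").print_stats(5)
```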

2. To profile line by line, use line_profiler, either through its notebook magics or with kernprof from the command line.

First, install: 

    conda install line_profiler

In [5]:
%load_ext line_profiler

The line_profiler extension is already loaded. To reload it, use:
  %reload_ext line_profiler


In [6]:
%lprun -f factorial_range factorial_range(10000)

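UsageError: Could not find function 'factorial_range'.
NameError: name 'factorial_range' is not defined

Note: this error appears because the cell that defines factorial_range had not been run in this session.  Run that cell first, then rerun %lprun.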


3. To do memory profiling, use memory_profiler
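memory_profiler is a third-party package that has to be installed separately. As a rough stand-in that needs only the standard library, tracemalloc can report how much memory Python has allocated while a piece of code runs. This is a different tool from memory_profiler, and the build_lists function is a made-up example modelled on the memory intensive loop earlier in this lesson, with a much smaller size.

```python
import tracemalloc


def build_lists(n):
    """Build n lists of growing size, like the memory intensive loop above."""
    result = []
    for i in range(n):
        result.append([4.564234] * i * i)
    return result


tracemalloc.start()
lists = build_lists(100)
# get_traced_memory returns (currently allocated bytes, peak bytes) since start()
current_bytes, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"current: {current_bytes / 1e6:.1f} MB, peak: {peak_bytes / 1e6:.1f} MB")
```

tracemalloc only sees allocations made through Python's memory allocator, so memory used inside some compiled extensions may not be counted.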



### Chunking

1. HDF5
2. pandas chunking
3. Multithreading
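The idea behind chunking is to process data a piece at a time so that only one piece has to fit in memory. pandas exposes this through the chunksize argument of read_csv, which yields one DataFrame per chunk. The sketch below shows the same pattern using only the standard library; iter_chunks is a hypothetical helper written for this example, and range(1_000_000) stands in for a data source too large to load at once.

```python
import itertools


def iter_chunks(iterable, chunk_size):
    """Yield successive lists of at most chunk_size items from iterable."""
    iterator = iter(iterable)
    while True:
        chunk = list(itertools.islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


running_total = 0
n_chunks = 0
for chunk in iter_chunks(range(1_000_000), chunk_size=100_000):
    # Only one chunk of 100,000 values is held in memory at a time.
    running_total += sum(chunk)
    n_chunks += 1

print(n_chunks, running_total)
```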