# Caching

There are multiple meanings of the word "caching" when it comes to computers and, particularly, code performance. In this section we will look at two common meanings relevant to code performance.

## Caching Variables

When the same value is calculated repeatedly, it may be worth storing it rather than recalculating it each time. The more often the value is needed and the more expensive the calculation, the more worthwhile this strategy becomes.

For example, consider the following code, which aims to calculate:

$\sum\limits_{i=0}^{1000}\sum\limits_{j=0}^{1000}\sin{\left(\frac{i\pi}{1000}\right)}\sin{\left(\frac{j\pi}{1000}\right)} $

In [1]:
!pip install line_profiler
import math
%load_ext line_profiler

def sum_function():
  result=0

  for i in range(0, 1001):
    for j in range(0, 1001):
      result = result + math.sin(i * math.pi / 1000)*math.sin(j * math.pi / 1000)

  return(result)

%lprun -f sum_function print(sum_function())

405284.0679028919


The first thing we might notice is that we're currently performing the division by 1000 about 2,000,000 times (twice in each of the 1001 × 1001 inner-loop iterations), even though the factor $\frac{\pi}{1000}$ always has the same value. We can pre-calculate this value once and use it repeatedly:

In [2]:
!pip install line_profiler
import math
%load_ext line_profiler

def sum_function():
  result=0
  pi_over_1000 = math.pi / 1000

  for i in range(0, 1001):
    for j in range(0, 1001):
      result = result + math.sin(i * pi_over_1000)*math.sin(j * pi_over_1000)

  return(result)

%lprun -f sum_function print(sum_function())

The line_profiler extension is already loaded. To reload it, use:
  %reload_ext line_profiler
405284.06790289184


The next thing we might notice is that there are two nested ```for``` loops. The variable ```j``` takes 1,001 different values for each value ```i``` takes, but $\sin{\left(\frac{i\pi}{1000}\right)}$ does not depend on ```j```. Thus, we can calculate this value once per iteration of the outer loop and cache it:

In [3]:
!pip install line_profiler
import math
%load_ext line_profiler

def sum_function():
  result=0
  pi_over_1000=math.pi/1000

  for i in range(0, 1001):
    sin_i = math.sin(i * pi_over_1000)
    for j in range(0, 1001):
      result = result + sin_i * math.sin(j * pi_over_1000)

  return(result)

%lprun -f sum_function print(sum_function())

The line_profiler extension is already loaded. To reload it, use:
  %reload_ext line_profiler
405284.06790289184


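This reduces the number of times we call ```math.sin``` from 2,004,002 (twice per inner-loop iteration) to 1,003,002 (once per inner-loop iteration plus once per outer-loop iteration).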
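The call counts can be checked directly rather than worked out by hand. The following is a minimal sketch, not part of the original notebook: ```count_sin_calls```, ```counted_sin``` and the ```steps``` parameter are hypothetical names introduced here, and the wrapper simply tallies calls to ```math.sin``` for the two loop structures. With ```steps=1000``` each loop runs over 1,001 values, so the counts come out at 2 × 1001² and 1001 + 1001² respectively.

```python
import math

def count_sin_calls(steps=1000):
    """Return (uncached_calls, cached_calls) for the two loop structures."""
    calls = 0

    def counted_sin(x):
        # Hypothetical wrapper: behaves like math.sin but tallies each call
        nonlocal calls
        calls += 1
        return math.sin(x)

    step = math.pi / steps

    # Both sines evaluated inside the inner loop
    calls = 0
    result = 0
    for i in range(steps + 1):
        for j in range(steps + 1):
            result += counted_sin(i * step) * counted_sin(j * step)
    uncached_calls = calls

    # sin(i * step) cached in the outer loop
    calls = 0
    result = 0
    for i in range(steps + 1):
        sin_i = counted_sin(i * step)
        for j in range(steps + 1):
            result += sin_i * counted_sin(j * step)
    cached_calls = calls

    return uncached_calls, cached_calls

# A small number of steps keeps the check quick: 2 * 11**2 and 11 + 11**2
print(count_sin_calls(10))  # prints (242, 132)
```

Passing a small value of ```steps``` makes it easy to confirm the pattern before trusting it for the full-size loops.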

Finally, we might notice that we only ever call ```math.sin``` with 1,001 different values, so we can create a list of the resulting values to cache them:

In [4]:
!pip install line_profiler
import math
%load_ext line_profiler

def sum_function():
  result=0
  pi_over_1000=math.pi/1000

  sin_values=[]

  for i in range(0, 1001):
    sin_values.append(math.sin(i * pi_over_1000))

  for sin_i in sin_values:
    for sin_j in sin_values:
      result = result + sin_i * sin_j

  return(result)

%lprun -f sum_function print(sum_function())

The line_profiler extension is already loaded. To reload it, use:
  %reload_ext line_profiler
405284.06790289184


The resulting code calls the ```sin``` function only 1,001 times and runs in about half the time compared to the original code. However, it does use more memory and is less readable.
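If ```line_profiler``` is not available, the standard-library ```timeit``` module offers a rough way to compare the two extremes. The sketch below is illustrative only: ```sum_uncached``` and ```sum_list_cached``` are hypothetical names for the first and last versions above, with the loop bound turned into a ```steps``` parameter so that a smaller problem can be timed quickly. The measured ratio will vary between machines and Python versions.

```python
import math
import timeit

def sum_uncached(steps=1000):
    # Mirrors the first version: everything recomputed in the inner loop
    result = 0
    for i in range(steps + 1):
        for j in range(steps + 1):
            result = result + math.sin(i * math.pi / steps) * math.sin(j * math.pi / steps)
    return result

def sum_list_cached(steps=1000):
    # Mirrors the last version: each sine computed once and stored in a list
    step = math.pi / steps
    sin_values = [math.sin(i * step) for i in range(steps + 1)]
    result = 0
    for sin_i in sin_values:
        for sin_j in sin_values:
            result = result + sin_i * sin_j
    return result

# Time a smaller problem (steps=200) a few times to keep the run short
uncached_time = timeit.timeit(lambda: sum_uncached(200), number=5)
cached_time = timeit.timeit(lambda: sum_list_cached(200), number=5)
print(f"uncached: {uncached_time:.3f} s, list-cached: {cached_time:.3f} s")
```

Both functions should agree to within floating-point rounding, which is worth checking whenever a calculation is restructured for speed.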

## Caching Function Results

Often, a function is called repeatedly with the same arguments and therefore returns the same result each time. If the function is expensive, a significant amount of time can be saved by recording that it has already been called with those arguments and returning the stored result without executing the body of the function again. For example, the following code tests a recursive function designed to calculate the Fibonacci sequence:

In [5]:
import cProfile

def fibonacci(n):
  if n < 2:
    return(n)
  else:
    return(fibonacci(n-1) + fibonacci(n-2))

cProfile.run('print(fibonacci(32))')

2178309
         7049193 function calls (39 primitive calls) in 1.570 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
7049155/1    1.570    0.000    1.570    1.570 <ipython-input-5-fa872ea90043>:3(fibonacci)
        1    0.000    0.000    1.570    1.570 <string>:1(<module>)
        3    0.000    0.000    0.000    0.000 iostream.py:195(schedule)
        2    0.000    0.000    0.000    0.000 iostream.py:307(_is_master_process)
        2    0.000    0.000    0.000    0.000 iostream.py:320(_schedule_flush)
        2    0.000    0.000    0.000    0.000 iostream.py:382(write)
        3    0.000    0.000    0.000    0.000 iostream.py:93(_event_pipe)
        3    0.000    0.000    0.000    0.000 socket.py:342(send)
        3    0.000    0.000    0.000    0.000 threading.py:1062(_wait_for_tstate_lock)
        3    0.000    0.000    0.000    0.000 threading.py:1104(is_alive)
        3    0.000    0.000    0.000    0.000 threading.py:

When we run this code we see that the function is called a very large number of times to calculate the desired value. However, we know that the function is only ever called with integer values of ```n``` from 0 to 32. This means that if we could cache the results of the function for those 33 values of ```n```, we could skip the body of the function in most calls and thus eliminate most of the function calls and most of the time spent.
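To make the idea concrete, the cache can be written by hand using a dictionary. This is a minimal sketch under assumed names (```fibonacci_cache```, ```body_runs``` and ```fibonacci_manual``` are hypothetical and not part of the original notebook); it stores one entry for each distinct value of ```n``` from 0 to 32.

```python
# Hypothetical module-level cache mapping n -> fibonacci(n)
fibonacci_cache = {}

# Counts how many times the body of the function actually runs
body_runs = 0

def fibonacci_manual(n):
    global body_runs

    # Return the stored result if this value of n has been seen before
    if n in fibonacci_cache:
        return fibonacci_cache[n]

    body_runs += 1
    if n < 2:
        value = n
    else:
        value = fibonacci_manual(n - 1) + fibonacci_manual(n - 2)

    # Store the result so later calls with the same n can skip the body
    fibonacci_cache[n] = value
    return value

print(fibonacci_manual(32))   # prints 2178309
print(len(fibonacci_cache))   # prints 33
```

The dictionary lookup replaces almost all of the recursive work, at the cost of keeping every computed value in memory.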

It's possible to tell Python to cache the results of calls to a function automatically. This stores the results behind the scenes for the most recently used combinations of arguments. To do this we import the ```lru_cache``` "decorator" from the ```functools``` module and add it to the function:

In [6]:
import cProfile
from functools import lru_cache

@lru_cache(maxsize=32)
def fibonacci(n):
  if n < 2:
    return(n)
  else:
    return(fibonacci(n-1) + fibonacci(n-2))

cProfile.run('print(fibonacci(32))')

2178309
         71 function calls (39 primitive calls) in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     33/1    0.000    0.000    0.000    0.000 <ipython-input-6-8ba46ca0dba5>:4(fibonacci)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        3    0.000    0.000    0.000    0.000 iostream.py:195(schedule)
        2    0.000    0.000    0.000    0.000 iostream.py:307(_is_master_process)
        2    0.000    0.000    0.000    0.000 iostream.py:320(_schedule_flush)
        2    0.000    0.000    0.000    0.000 iostream.py:382(write)
        3    0.000    0.000    0.000    0.000 iostream.py:93(_event_pipe)
        3    0.000    0.000    0.000    0.000 socket.py:342(send)
        3    0.000    0.000    0.000    0.000 threading.py:1062(_wait_for_tstate_lock)
        3    0.000    0.000    0.000    0.000 threading.py:1104(is_alive)
        3    0.000    0.000    0.000    0.000 threading.py:506(i

Decorators, when added to functions, modify how the function behaves. In this case, the ```lru_cache``` decorator causes the results of the function to be stored for the last ```maxsize``` unique combinations of arguments provided. When the function is called with one of the stored combinations of arguments, the cached value is returned instead of executing the function body.

In this case, the body of the function is bypassed in almost every call and, as almost all of the function calls are made from within the body of the function, most function calls are also eliminated. This means the number of calls to ```fibonacci``` is reduced from over 7,000,000 to just 33 and the run-time is decreased from over a second to almost nothing.

This example happens to be a case where this tactic is particularly effective, as we can guarantee that only a small number of distinct values will be passed as an argument and the number of function calls was initially very high.
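A function wrapped with ```lru_cache``` also gains a ```cache_info()``` method, which reports how often the cache was used. The sketch below is illustrative; ```fibonacci_cached``` is a hypothetical name chosen to avoid clashing with the function defined above.

```python
from functools import lru_cache

@lru_cache(maxsize=32)
def fibonacci_cached(n):
    if n < 2:
        return n
    return fibonacci_cached(n - 1) + fibonacci_cached(n - 2)

fibonacci_cached(32)

# hits: calls answered from the cache; misses: calls that ran the body
info = fibonacci_cached.cache_info()
print(info.hits, info.misses, info.maxsize, info.currsize)

# cache_clear() empties the cache, e.g. before profiling the function again
fibonacci_cached.cache_clear()
```

The number of misses corresponds to the number of times the function body was actually executed, so it should match the call count reported by the profiler.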

## Exercise
Look at the sample code below. It is designed to calculate the sum

$\sum\limits_{i=0}^{1,000,000} \left(\cos{\left(\frac{i\pi}{k}\right)}\right)^{2}$

for an integer $k$ (the sample code uses $k = 7$).

The sample code appears twice: the first copy shows the initial function together with its profile, and the second copy is left for you to edit. Try to optimise the second copy using either of the caching approaches discussed earlier in this notebook so that it runs more quickly. Below it are two sample solutions showing two different levels of caching, and one further example which removes the need for caching through calculation.

In [9]:
# The initial function
import cProfile
import math

# This function returns the value of cos(x) ** 2
def cos_squared(x):
  return(math.cos(x) ** 2)

# This function calculates the desired sum
def sum_cos_squared_i_pi_over_k(k, n):
  # This assert statement makes sure k is an integer. If it's not, an error will be thrown.
  # This allows us to assume k is an integer
  assert(type(k) == int)
  
  #Initialise evaluation at 0 and use it to track the cumulative sum
  evaluation = 0

  for i in range(n):
    # Add each value of cos(i * pi / k) ** 2
    evaluation = evaluation + cos_squared(i * math.pi / k)

  return(evaluation)

cProfile.run('print(sum_cos_squared_i_pi_over_k(7, 1000001))')

500001.3117440707
         2000041 function calls in 0.602 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.265    0.265    0.602    0.602 <ipython-input-9-546a4875c4df>:10(sum_cos_squared_i_pi_over_k)
  1000001    0.230    0.000    0.337    0.000 <ipython-input-9-546a4875c4df>:6(cos_squared)
        1    0.000    0.000    0.602    0.602 <string>:1(<module>)
        3    0.000    0.000    0.000    0.000 iostream.py:195(schedule)
        2    0.000    0.000    0.000    0.000 iostream.py:307(_is_master_process)
        2    0.000    0.000    0.000    0.000 iostream.py:320(_schedule_flush)
        2    0.000    0.000    0.000    0.000 iostream.py:382(write)
        3    0.000    0.000    0.000    0.000 iostream.py:93(_event_pipe)
        3    0.000    0.000    0.000    0.000 socket.py:342(send)
        3    0.000    0.000    0.000    0.000 threading.py:1062(_wait_for_tstate_lock)
        3    0.000    0.000    0.

In [0]:
# For you to edit
import cProfile
import math

# This function returns the value of cos(x) ** 2
def cos_squared(x):
  return(math.cos(x) ** 2)

# This function calculates the desired sum
def sum_cos_squared_i_pi_over_k(k, n):
  # This assert statement makes sure k is an integer. If it's not, an error will be thrown.
  # This allows us to assume k is an integer
  assert(type(k) == int)
  
  #Initialise evaluation at 0 and use it to track the cumulative sum
  evaluation = 0

  for i in range(n):
    # Add each value of cos(i * pi / k) ** 2
    evaluation = evaluation + cos_squared(i * math.pi / k)

  return(evaluation)

cProfile.run('print(sum_cos_squared_i_pi_over_k(7, 1000001))')

In [8]:
# We first notice that cos(i*pi/7 + 2*m*pi) = cos(i*pi/7) for any integer m
# The argument i*pi/7 advances by 2*pi every 14 steps, so only i % 14 matters
# Thus we can cache the values of cos_squared for 14 values
# The 14 values should be i*pi/7 for i = 0 to 13
import cProfile
import math
from functools import lru_cache

@lru_cache(maxsize=14)
def cos_squared(x):
  return(math.cos(x) ** 2)

def sum_cos_squared_i_pi_over_k(k, n):
  assert(type(k) == int)
  
  evaluation = 0

  for i in range(n):
    # Reduce i modulo 2 * k (14 when k = 7) so only 2 * k distinct arguments occur
    evaluation = evaluation + cos_squared(i % (2 * k) * math.pi / k)

  return(evaluation)

cProfile.run('print(sum_cos_squared_i_pi_over_k(7, 1000001))')

500001.3117449025
         67 function calls in 0.290 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       14    0.000    0.000    0.000    0.000 <ipython-input-8-43e30ac9eb46>:11(cos_squared)
        1    0.290    0.290    0.290    0.290 <ipython-input-8-43e30ac9eb46>:15(sum_cos_squared_i_pi_over_k)
        1    0.000    0.000    0.290    0.290 <string>:1(<module>)
        3    0.000    0.000    0.000    0.000 iostream.py:195(schedule)
        2    0.000    0.000    0.000    0.000 iostream.py:307(_is_master_process)
        2    0.000    0.000    0.000    0.000 iostream.py:320(_schedule_flush)
        2    0.000    0.000    0.000    0.000 iostream.py:382(write)
        3    0.000    0.000    0.000    0.000 iostream.py:93(_event_pipe)
        3    0.000    0.000    0.000    0.000 socket.py:342(send)
        3    0.000    0.000    0.000    0.000 threading.py:1062(_wait_for_tstate_lock)
        3    0.000    0.000    0.000 

In [11]:
# Now, we note that we're asking the lru_cache to keep track of which values have been used
# We can do this more efficiently by caching and accessing these values ourselves
# We cache the 14 values in a list
import cProfile
import math
from functools import lru_cache

@lru_cache(maxsize=14)
def cos_squared(x):
  return(math.cos(x) ** 2)

def sum_cos_squared_i_pi_over_k(k, n):
  assert(type(k) == int)
  
  # Create the list
  cos_squared_values=[]

  # Populate the list with the 2 * k (here 14) values
  for i in range(2 * k):
    cos_squared_values.append(cos_squared(i * math.pi / k))

  evaluation = 0

  # cos(x) ** 2 repeats every pi, so the i-th term equals the (i % k)-th list entry
  # This means only the first k entries of the list are actually read
  for i in range(n):
    evaluation = evaluation + cos_squared_values[i % k]

  return(evaluation)

cProfile.run('print(sum_cos_squared_i_pi_over_k(7, 1000001))')

500001.3117449025
         81 function calls in 0.080 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       14    0.000    0.000    0.000    0.000 <ipython-input-11-78ea43ca144d>:10(cos_squared)
        1    0.080    0.080    0.080    0.080 <ipython-input-11-78ea43ca144d>:14(sum_cos_squared_i_pi_over_k)
        1    0.000    0.000    0.080    0.080 <string>:1(<module>)
        3    0.000    0.000    0.000    0.000 iostream.py:195(schedule)
        2    0.000    0.000    0.000    0.000 iostream.py:307(_is_master_process)
        2    0.000    0.000    0.000    0.000 iostream.py:320(_schedule_flush)
        2    0.000    0.000    0.000    0.000 iostream.py:382(write)
        3    0.000    0.000    0.000    0.000 iostream.py:93(_event_pipe)
        3    0.000    0.000    0.000    0.000 socket.py:342(send)
        3    0.000    0.000    0.000    0.000 threading.py:1062(_wait_for_tstate_lock)
        3    0.000    0.000    0.00

In [0]:
# Next we realise that the for loop is largely redundant as the same value is being added a large number of times
# We can calculate the number of times a given value will be added and, instead, add that value multiplied by that number of times
# This largely removes the need for caching.
# This is the fastest version, but is more difficult to understand at a glance
import cProfile
import math

def cos_squared(x):
  return(math.cos(x) ** 2)

def sum_cos_squared_i_pi_over_k(k, n):
  assert(type(k) == int)
  
  evaluation = 0

  # Loop over the 2 * k (here 14) values of i which give distinct arguments
  for i in range(2 * k):
    # Each value of i will be represented (n - i - 1) // (2 * k) + 1 times
    evaluation = evaluation + ((n - i - 1) // (2 * k) + 1) * cos_squared(i * math.pi / k)

  return(evaluation)

cProfile.run('print(sum_cos_squared_i_pi_over_k(7, 1000001))')
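The repeat count used in the last solution is easy to get wrong by one, so it is worth checking against a direct loop for a few small values of ```n```. The following is a sketch under assumed names (```direct_sum```, ```counted_sum``` and ```repeat_counts``` are hypothetical helpers, not part of the original notebook); it compares the two approaches and confirms that the repeat counts add up to ```n```.

```python
import math

def direct_sum(k, n):
    # Straightforward evaluation: one cosine per term
    return sum(math.cos(i * math.pi / k) ** 2 for i in range(n))

def repeat_counts(k, n):
    # Number of terms in range(n) whose index is congruent to i modulo 2 * k
    return [(n - i - 1) // (2 * k) + 1 for i in range(2 * k)]

def counted_sum(k, n):
    # Each distinct value is multiplied by the number of times it occurs
    total = 0.0
    for i, repeats in enumerate(repeat_counts(k, n)):
        total += repeats * math.cos(i * math.pi / k) ** 2
    return total

for n in (0, 1, 13, 14, 15, 1000):
    # Every term must be counted exactly once
    assert sum(repeat_counts(7, n)) == n
    # The two sums should agree to within floating-point rounding
    assert math.isclose(direct_sum(7, n), counted_sum(7, n), rel_tol=1e-9, abs_tol=1e-9)

print("counting formula agrees with the direct loop")
```

Checking the boundary cases around a multiple of ```2 * k``` (here 13, 14 and 15) is a cheap way to catch off-by-one errors in the count.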