<span><h1 style="color:#4987FF">
Using Appropriate Data Structures
</h1></span>

### For any given problem that you know how to solve with code, you can probably think of a few different ways of solving it.
### For example, you might use a NumPy array or a list for one part, and a for loop or a list comprehension for another.

### This notebook focuses on heuristics for choosing the 'best' approach for a given problem.

In [4]:
"""
1st) Lists vs. numpy arrays -

General:
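    Lists -> 
    mutable, appending / removing at the end is cheap (inserting in the middle is O(n)),
    no built-in way to operate on all elements at once (you need a loop or comprehension)
    
    np.arrays -> 
    contents are mutable but the size is fixed, so np.append / np.insert / np.delete
    return a new copy each time; can operate on all elements at once (vectorised).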
    
Heuristics:
    > If you want to gradually build up a list-like structure (and don't know final size), use a list
    > If you already know what your elements are (or at least how many you'll have),
      and want to do operations on them all, use a np.array
"""

import numpy as np
import timeit

def build_list(count):
    out = []
    for i in range(count):
        out.append(i)
    return out

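def build_array(count):
    out = np.array([])  # empty float64 array
    for i in range(count):
        # np.append copies the whole array on every call,
        # so this loop does O(count**2) work overall
        out = np.append(out, i)
    return out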

print('build_list, build_array;')
%timeit build_list(5000)
%timeit build_array(5000)

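# Note: np.linspace(0, 5000) uses the default num=50, so these hold
# 50 evenly spaced values between 0 and 5000 (not 5000 values).
prebuilt_array = np.linspace(0, 5000)
prebuilt_list = list(prebuilt_array)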

def square_all_array(list_struct):
    return list_struct**2

def square_all_list(list_struct):
    out = []
    for i in list_struct:
        out.append(i**2)
    return out

print('\n\nsquare_all_list, square_all_array;')
%timeit square_all_list(prebuilt_list)
%timeit square_all_array(prebuilt_array)

build_list, build_array;
100 loops, best of 3: 2.26 ms per loop
10 loops, best of 3: 156 ms per loop


square_all_list, square_all_array;
10000 loops, best of 3: 78.9 µs per loop
The slowest run took 8.60 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.11 µs per loop
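
The two heuristics can be combined when the final size is unknown but vectorised operations are wanted afterwards. The following is a minimal sketch, assuming it is acceptable to collect values in a list first and convert once at the end. The names `build_then_convert` and `build_preallocated` are illustrative and do not appear in the cells above.

```python
import numpy as np

def build_then_convert(count):
    """Grow a plain list, then convert to an array once at the end."""
    out = []
    for i in range(count):
        out.append(i)
    return np.array(out, dtype=float)

def build_preallocated(count):
    """If the final size is known up front, allocate once and fill in place."""
    out = np.empty(count, dtype=float)
    for i in range(count):
        out[i] = i
    return out

a = build_then_convert(5)
b = build_preallocated(5)
print(a)
print(a ** 2)  # vectorised operation on the finished array
```

Both functions avoid calling `np.append` inside the loop, which is the step that makes `build_array` slow.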


### Note: You could argue that I've compared apples with oranges here, but that's kind of the point: the structures aren't the same and don't always support the same operations. I've tried to be fair and implement each case as well as possible for that data type.
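
The `%timeit` lines only work inside IPython or Jupyter. Outside a notebook, a roughly comparable measurement can be taken with the standard `timeit` module. This is a sketch only: the choice of 100 calls per run and 3 runs is an assumption, not what `%timeit` picks automatically.

```python
import timeit

def build_list(count):
    out = []
    for i in range(count):
        out.append(i)
    return out

# Call the function 100 times per run, do 3 runs, keep the fastest run.
runs = timeit.repeat(lambda: build_list(5000), number=100, repeat=3)
best_per_call = min(runs) / 100
print('build_list: {:.1f} us per call'.format(best_per_call * 1e6))
```

Taking the minimum over several runs mirrors the "best of 3" wording in the outputs above.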

In [5]:
"""
2nd) For loops vs. List comprehensions -

General:
    Less extreme differences in speed.
    Main advantages of each are;
        LC - shorter, quicker to write, slightly faster
        FL - easier to read, can edit multiple variables in the loop
    
Heuristics:
    > if your for loop changes one variable, and you care about speed, do it as a list comprehension
    > if it does more than one thing, and/or changes more than one variable, keep it as a for loop
    
Note: These heuristics are mostly about readability; the speed benefit of LCs is small
      (roughly 20% in the timing below).
Also note: list comprehensions can be nested, but deep nesting quickly hurts readability.
"""

# Using prebuilt_list from last example
def square_list_FL(list_struct):
    out = []
    for element in list_struct:
        out.append(element**2)
    return out

def square_list_LC(list_struct):
    return [element**2 for element in list_struct]

print('square_list_FL, square_list_LC;')
%timeit square_list_FL(prebuilt_list)
%timeit square_list_LC(prebuilt_list)

square_list_FL, square_list_LC;
10000 loops, best of 3: 88.1 µs per loop
10000 loops, best of 3: 68.8 µs per loop
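
To make the readability heuristic concrete, here is a small sketch with made-up data (`values` and `rows` are illustrative): a single-output transformation and a nested comprehension written as list comprehensions, next to a loop that updates two variables and is arguably clearer left as a for loop.

```python
values = [1, 2, 3, 4]

# One output, one expression: a list comprehension reads well.
squares = [v ** 2 for v in values]

# Nested comprehension: flatten a list of rows (outer loop first).
rows = [[1, 2], [3, 4], [5, 6]]
flat = [item for row in rows for item in row]

# Two things updated per iteration: a for loop is usually clearer.
evens = []
odd_total = 0
for v in values:
    if v % 2 == 0:
        evens.append(v)
    else:
        odd_total += v

print(squares, flat, evens, odd_total)
```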


In [8]:
"""
3rd) np.ndarray vs. np.matrix - 

General:
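    The NumPy docs no longer recommend np.matrix and say it may be removed in the future
    np.array can be used instead of np.matrix in almost all cases
    Operators differ: * and ** are matrix product / matrix power on np.matrix but
    elementwise on np.ndarray (use @ and np.linalg.matrix_power there), so timing
    the same expression on both types does not compare like with like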
    
Heuristic:
    Always prefer np.ndarray to np.matrix
"""

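def generate_dense_matrix(N, density=0.2):
    # Each comparison gives a boolean N x N array that is True with
    # probability 1 - density/2, so most entries are non-zero.
    real = np.random.rand(N, N) > density/2
    imaginary = (np.random.rand(N, N) > density/2) * 1j
    # bool + complex promotes to a complex128 array
    return real + imaginary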

big_ndarray = generate_dense_matrix(500)
big_matrix = np.matrix(generate_dense_matrix(500))

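# Caution: these two lines do different work.
# On np.matrix, ** is the matrix power (repeated matrix multiplication);
# on np.ndarray, ** raises each element to the power separately.
# The ndarray equivalent of the first line is np.linalg.matrix_power(big_ndarray, 300).
%timeit big_matrix**300
%timeit big_ndarray**300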

1 loop, best of 3: 1.36 s per loop
1 loop, best of 3: 186 ms per loop
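
When reading the timings above, keep in mind that `**` means different things on the two types. A minimal sketch with a small illustrative matrix `A` (not taken from the cell above) shows the difference on a plain `np.ndarray`, with `np.linalg.matrix_power` as the like-for-like counterpart of `np.matrix.__pow__`:

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])

elementwise = A ** 3                       # each entry cubed
matrix_pow = np.linalg.matrix_power(A, 3)  # A @ A @ A

print(elementwise)
print(matrix_pow)
```

A fair speed comparison between the two types would therefore time `np.linalg.matrix_power(big_ndarray, 300)` against `big_matrix**300`. No timing for that is given here because it was not part of the original run.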
