# Table of Contents (click to jump):
## [Native Python Solution](#natpython)
## ["Smarter" Python Solution](#smartpython)
## [Numpy Solution](#numpypython)
## [Numba Solution](#numbapython)

# Python Speed Module

# Function Headers and Objects
These are used throughout the program. The different implementations I used throughout this study are located within their respective sections. This cell should be run before anything else in the notebook so that all of the functions are in memory.

In [4]:
from ctypes import CDLL
randlib = CDLL("libc.so.6")
import time, sys

# Look here for spoilers of what I used to speed up code
import numpy as np
from numba import jit, cuda

def genrand3dpt(MIN, MAX):
    X = (MAX-MIN)*(float(randlib.rand())/2147483647)+MIN
    Y = (MAX-MIN)*(float(randlib.rand())/2147483647)+MIN
    Z = (MAX-MIN)*(float(randlib.rand())/2147483647)+MIN
    return X,Y,Z

class point:
    def __init__(self, x:float ,y: float, z: float):
        self.x, self.y, self.z = x, y, z
    
def setCase(case):
    if(case=='T1'):
        seed = 7
        pts = 100
        expected = "Expected Min, Max: 526.986 15183.808\n"
    elif case=='T2':
        seed = 7
        pts = 1000
        expected = "Expected Min, Max: 70.299 15784.777\n"
    elif case=='T3':
        seed = 7
        pts = 10000
        expected = "Expected Min, Max: 9.730 16509.943\n"
    elif case=='T4':
        seed = 7
        pts = 30000
        expected = "Expected Min, Max: 9.270 16643.182\n"
    elif case=='T5':
        seed = 7
        pts = 40005
        expected = "Expected Min, Max: 8.705 16830.027\n"
    return seed, pts, expected

# Native Python Solution <a class="anchor" id="natpython"></a>
The intention here is to demonstrate that native Python is slow. I kept it as close to the original C code as I could, purely for demonstration purposes. I know some list comprehensions would probably make it faster and more succinct, but that's not the point of this particular test. It is nearly a 1-to-1 version of my C implementation, just slightly more Pythonic.

In [45]:
def shortest_native(srandseed, num_points):

    points = []
    for i in range(num_points):
        tempx, tempy, tempz = genrand3dpt(0, 10000) # call random points macro
        points.append(point(tempx, tempy, tempz)) # this part differs from the C code 
                                                  # in the fact that we're adding to an array as
                                                  # we go through the generation.
    loopMAX = 0
    loopMIN = 1.7976931348623157e+308 # kinda big
    
    for i in range(num_points):      
        for j in range(i+1,num_points):
            xsqrd = (points[i].x - points[j].x)**2
            ysqrd = (points[i].y - points[j].y)**2
            zsqrd = (points[i].z - points[j].z)**2
            distance = xsqrd + ysqrd + zsqrd
            if loopMAX < distance:
                loopMAX = distance
            if loopMIN > distance: # plain `if`, so a pair that raises the max is still checked against the min
                loopMIN = distance
          
    print(loopMIN**0.5, loopMAX**0.5)
    

In [46]:
%%time
seed, pts, expected = setCase('T1')
randlib.srand(seed)
print(expected)
shortest_native(seed, pts)


Expected Min, Max: 526.986 15183.808

526.9855752684347 15183.807320792597
CPU times: user 5.65 ms, sys: 0 ns, total: 5.65 ms
Wall time: 3.65 ms


## Native Python Summary
As expected, the Python ran *slower* than the C code. Who would have thought?

**Important Note** *Python floats internally are the same as doubles in C, so I had to run my original C code with doubles instead of floats. Timings vary from original code for this reason.*
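
The float/double claim can be checked from inside Python. This is a minimal sketch using only the standard library. It also shows that the `1.7976931348623157e+308` sentinel used for `loopMIN` in the code above is simply the largest finite double.

```python
import struct
import sys

# A C double is 8 bytes with a 53-bit significand on IEEE 754 platforms;
# CPython's float reports the same layout.
print(struct.calcsize("d"))        # size of a C double in bytes
print(sys.float_info.mant_dig)     # significand bits of a Python float

# The "kinda big" sentinel used for loopMIN is the largest finite double.
sentinel = 1.7976931348623157e+308
print(sentinel == sys.float_info.max)

# float("inf") is an alternative starting value for a running minimum.
print(float("inf") > sentinel)
```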

**System Specifications** AMD Ryzen 5 3600 (6 cores), overclocked to 4.2 GHz

### Here is a table containing timings for all five cases:

| Case Name |Points| C Code Time | Python Timing  |
|:---:       |:---:|   :----:   |           ---: |
|T1|100|**real** 0.0 sec<br/>**user** 0.0 sec<br/>**sys** 0.0 sec<br/> | **real** 0.00419 sec<br/>**user** 0.00655 sec<br/>**sys** 0.0 sec<br/> |
| | | |
|T2|1000|**real** 0.0 sec<br/>**user** 0.0 sec<br/>**sys** 0.0 sec<br/> |  **real** 0.304 sec<br/>**user** 0.306 sec<br/>**sys** 0.0 sec<br/> |
| | | |
|T3|10000|**real** 0.08 sec<br/>**user** 0.08 sec<br/>**sys** 0.0 sec<br/> |**real** 29.8 sec<br/>**user** 29.8 sec<br/>**sys** 0.00033 sec<br/>                |
| | | |
|T4|30000| **real** 0.74 sec<br/>**user** 0.74 sec<br/>**sys** 0.0 sec<br/> | **real** 4min 16s 🤣 <br/>**user** 4min 16 s<br/> sys 0ns |
| | | |
|T5|40000|**real** 1.41 sec<br/>**user** 1.41 sec<br/>**sys** 0.0 sec<br/>  |**real** 7min 36s 🤡<br/>**user** 7min 36s<br/> sys 0ns|

**Thermals**: Peaked at 70 degrees Celsius, idle temp 32-45 degrees

If you could not already tell by my emojis, I was most surprised by cases 4 and 5. Without optimization, special methods, or other trickery, the Python gets smoked by the C. I am really just preaching to the choir here: Python is slower than C, and any first-year programmer could have told you that. My intention here is to have a baseline to beat. If we aren't going faster than this, we aren't fulfilling the purpose of the study.

# ===============================================

# First Optimization: "Smarter" Python <a class="anchor" id="smartpython"></a>
List comprehensions, not allocating objects, and other micro-optimizations should help our code run just a tad faster.

## A Few Things First
There are a few things I found while reading online and testing code that can measurably improve the speed of Python code. The first was to avoid creating objects for number crunching: I stored my numbers as plain values in a nested list (one [x, y, z] triple per point) rather than creating a list of point objects. Run the cell below to see the difference.

In [4]:
# Demonstrating that Allocation without Objects is Faster...
num_points=50000

start= time.time()
points = []
for i in range(num_points):
    tempx, tempy, tempz = genrand3dpt(0, 10000)
    points.append(point(tempx, tempy, tempz))
print("Using Objects",time.time()-start)
del points

start= time.time()
points = [ [each for each in genrand3dpt(0, 10000)] for i in range(num_points)]
print("Pure Points, no Objects",time.time()-start)
del points

Using Objects 0.09811925888061523
Pure Points, no Objects 0.08208441734313965
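
A single `time.time()` measurement like the ones above can be noisy. As a rough sketch, the same comparison can be repeated with the standard library's `timeit` module. To keep the snippet self-contained it uses `random.uniform` in place of the libc-backed `genrand3dpt`, and the names `Point3`, `rand_triple`, `build_objects`, and `build_lists` are made up for this illustration.

```python
import random
import timeit

class Point3:
    """Stand-in for the notebook's `point` class (hypothetical name)."""
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

def rand_triple(lo=0.0, hi=10000.0):
    # Stand-in for genrand3dpt: three uniform floats in [lo, hi]
    return (random.uniform(lo, hi), random.uniform(lo, hi), random.uniform(lo, hi))

def build_objects(n):
    return [Point3(*rand_triple()) for _ in range(n)]

def build_lists(n):
    return [list(rand_triple()) for _ in range(n)]

n = 5000
# repeat() returns one total time per run; the minimum is the least noisy figure
t_obj = min(timeit.repeat(lambda: build_objects(n), number=5, repeat=3))
t_lst = min(timeit.repeat(lambda: build_lists(n), number=5, repeat=3))
print(f"objects: {t_obj:.4f} s   plain lists: {t_lst:.4f} s")
```

The exact numbers will differ from machine to machine, so treat the ratio between the two as the interesting part.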


The next thing I did was convert any loops I could to list comprehensions. Run the cell below to see the time difference between indexing with range(), iterating with enumerate(), mapping a lambda, and using a list comprehension. What you should see is that the list comprehension is indeed the fastest.

In [5]:
start= time.time()
test = [i for i in range(0,10000000)]
for i in range(10000000):
    test[i] = test[i]**2

print("Using range() construct", time.time()-start)
print(test[:5],'\n')
del test

start= time.time()
test = [i for i in range(0,10000000)]
for i, each in enumerate(test):
    test[i] = each**2

print("Using in/enumerate statement",time.time()-start)
print(test[:5],'\n')
del test

# just for fun
start = time.time()
test = [i for i in range(0,10000000)]
test=list(map(lambda x: x**2, test))

print("Using lambda trickery", time.time()-start)
print(test[:5],'\n')
del test

start= time.time()
test = [i for i in range(0,10000000)]
test = [each**2 for each in test]
print("Using list comprehension", time.time()-start)
print(test[:5],'\n')
del test

Using range() construct 2.7793142795562744
[0, 1, 4, 9, 16] 

Using in/enumerate statement 2.8723955154418945
[0, 1, 4, 9, 16] 

Using lambda trickery 2.3698582649230957
[0, 1, 4, 9, 16] 

Using list comprehension 2.1546289920806885
[0, 1, 4, 9, 16] 



Sure, the time differences from these two techniques are quite small, but they add up during execution. The code that came out of these optimizations was also far shorter. The implementation is below:

In [32]:
def shortest_smartpython(srandseed, num_points):
    points = [ [each for each in genrand3dpt(0, 10000)] for i in range(num_points)]
    loopMAX = 0
    loopMIN = 1.7976931348623157e+308

    for i in range(len(points)):
        for j in range(i+1, num_points):
            distance = (points[i][0] - points[j][0])**2 + (points[i][1] - points[j][1])**2 + (points[i][2]- points[j][2])**2
            if loopMAX < distance:
                loopMAX = distance
            if loopMIN > distance: # plain `if`, so a pair that raises the max is still checked against the min
                loopMIN = distance

    print(loopMIN**0.5, loopMAX**0.5)

In [33]:
%%time
seed, pts, expected = setCase('T2')
randlib.srand(seed)
print(expected)
shortest_smartpython(seed, pts) # 7408973.643113739

Expected Min, Max: 70.299 15784.777


# ===============================================

## "Smarter" Python Summary
The smarter Python was consistently faster, though only by about 10 percent: a few seconds on case 3 and up to 41 seconds on case 5. I believe this speed increase comes mostly from the faster allocation of the array. Originally, I had the arithmetic inside of a list comprehension as well, but I was getting wall times of 25 minutes on case 5, which is obviously less than ideal.

For brevity I am no longer writing the C code timings nor the point count for each case. Below is a table with my average timings.

| Case Name | Python Timing  | How much faster? |
|:---:       |:---:|        ---: |
| T1 | **real** 0.00371 sec<br/>**user** 0.00521 sec<br/>**sys** 0.0 sec<br/>  | 0.00048 seconds|
| T2 | **real** 0.26 sec<br/>**user** 0.262 sec<br/>**sys** 0.0 sec<br/>  | 0.042 seconds |
| T3 | **real** 27 sec<br/>**user** 27 sec<br/>**sys** 0.0 sec<br/>  | 2.8 seconds |
| T4 | **real** 3 min 57 sec<br/>**user** 3 min 57 sec<br/>**sys** 0.0 sec<br/>  | 19 seconds|
| T5 | **real** 6 min 55 sec<br/>**user** 6 min 55 sec<br/>**sys** 0.0 sec<br/>  | 41 seconds|

I am positive there are even more optimizations you can do using basic python, but let's be honest here...if I am doing serious scientific programming, seven minutes of wall time for a relatively simple program is not acceptable.


# Second Optimization: Numpy Arrays<a class="anchor" id="numpypython"></a>
I would be remiss if I didn't mention the elephant in the room. Numpy arrays are implemented in C, so operations on them are usually wickedly fast. They should allow for some pretty decent speed increases.

## A Few Things First (Again)
When I first started running a numpy optimization, I was actually running **slower** than the regular Python code. The only real thing I did differently was declare my Python list as a numpy array and then step through it like the regular algorithm.

This was a mistake. It was only after I changed my code to lean on numpy functions, instead of stepping through the array explicitly with loops, that I began to see speed improvements. This article: https://shihchinw.github.io/2019/03/performance-tips-of-numpy-ndarray.html offers a great explanation of why. Most of it was review from systems (don't step through column-major arrays in row-major order, or vice versa), but the author makes a point of discussing vectorization, which in turn explains why avoiding Python-level loops leads to such higher speeds. My current implementation is much faster than the regular Python, but still slower than C.
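
To make the vectorization point concrete, here is a small sketch that is not part of the original study. It computes the squared distances from one point to every point in an array twice: first with an explicit Python loop over the numpy array, then with a single broadcast expression. The array size and variable names are arbitrary.

```python
import time
import numpy as np

rng = np.random.default_rng(7)
pts = rng.uniform(0, 10000, size=(20000, 3))
origin = pts[0]

# Looping in Python: each iteration pays interpreter and indexing overhead
start = time.perf_counter()
looped = np.empty(len(pts))
for j in range(len(pts)):
    d = origin - pts[j]
    looped[j] = d[0]**2 + d[1]**2 + d[2]**2
t_loop = time.perf_counter() - start

# Vectorized: one broadcast subtraction and one row-wise sum, both run in C
start = time.perf_counter()
vectorized = ((origin - pts)**2).sum(axis=1)
t_vec = time.perf_counter() - start

print(np.allclose(looped, vectorized))
print(f"loop: {t_loop:.4f} s   vectorized: {t_vec:.4f} s")
```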

I imagine taking out the remaining outer loop and replacing it with a single numpy expression would make this even faster, but I found that combining methods ended up with odd results (namely np.add.reduce(np.subtract(points[i], points[i+1:])\*\*2 ,1) ). Below is my faster implementation:

In [193]:


def shortest_numpypython(srandseed, num_points):
    points = np.empty((num_points,3), dtype=np.double)

    for i in range(num_points):
        points[i]  = genrand3dpt(0, 10000) # call random points macro    
    loopMAX = 0
    loopMIN = 1.7976931348623157e+308
    
    for i in range(num_points-1):
        # Squared coordinate differences between point i and every later point j:
        # [[(xi-xj)^2, (yi-yj)^2, (zi-zj)^2] for j in i+1..n-1]
        shorts = np.subtract(points[i], points[i+1:])**2

        # Sum each row to get the squared (not yet square-rooted) Euclidean distance:
        # 1d array [(xi-xj)^2 + (yi-yj)^2 + (zi-zj)^2 for j in i+1..n-1]
        shorts = np.add.reduce(shorts, 1)

        # Compute each reduction once, and check the min even on rows that raise the max
        rowMAX, rowMIN = np.amax(shorts), np.amin(shorts)
        if loopMAX < rowMAX:
            loopMAX = rowMAX
        if loopMIN > rowMIN:
            loopMIN = rowMIN

    print(np.sqrt(loopMIN), np.sqrt(loopMAX))



In [201]:
%%time
seed, pts, expected = setCase('T4')
randlib.srand(seed)
print(expected)
shortest_numpypython(seed, pts) # 7408973.643113739

Expected Min, Max: 9.270 16643.182

9.27028748694085 16643.18207021964
CPU times: user 8.91 s, sys: 0 ns, total: 8.91 s
Wall time: 8.9 s
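
For what it's worth, the remaining outer loop in `shortest_numpypython` can also be removed with broadcasting. The sketch below is an assumption about how that might look, not something that was benchmarked in this study, and `min_max_dist_vectorized` is a made-up name. It builds the full N×N×3 array of coordinate differences, so memory grows quadratically: fine for a thousand points, but tens of gigabytes for the 30000-point case, which is one reason to keep the row-at-a-time loop.

```python
import numpy as np

def min_max_dist_vectorized(points):
    """Return (min, max) pairwise Euclidean distance for an (N, 3) array."""
    # diff[i, j] = points[i] - points[j], shape (N, N, 3)
    diff = points[:, None, :] - points[None, :, :]
    sq = (diff**2).sum(axis=-1)              # squared distances, shape (N, N)
    iu = np.triu_indices(len(points), k=1)   # indices of pairs with i < j
    pair_sq = sq[iu]
    return np.sqrt(pair_sq.min()), np.sqrt(pair_sq.max())

rng = np.random.default_rng(7)
pts = rng.uniform(0, 10000, size=(500, 3))
print(min_max_dist_vectorized(pts))
```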


## Numpy Python Summary
As expected, numpy improved speeds quite a bit. My initial hunch was that the smaller point counts would show little difference, because the overhead of allocating the array would eat into the gains from using numpy. As you will see in the results, this is mostly correct. It was indeed faster, but not by much in cases 1 and 2 (100 and 1000 points respectively). Case 3 showed a massive speed increase for its 10000 points. Cases 4 and 5 were the most satisfying because they cut down computation time by literal minutes.

I have attached a table showing numpy speeds, and the original table showing the vanilla python code timings. You can see the improvement.

###  <center>Numpy Python</center>
| Case Name | Python Timing  | How much faster? |
|:---:       |:---:|        ---: |
| T1 | **real** 0.00306 sec<br/>**user** 0.00468 sec<br/>**sys** 0.0 sec<br/>  |0.00116 sec|
| T2 | **real** 0.0236 sec<br/>**user** 0.0254 sec<br/>**sys** 0.0 sec<br/>  | 0.068 sec |
| T3 | **real** 1.07 sec<br/>**user** 1.07 sec<br/>**sys** 0.0 sec<br/>  | 28.71 sec |
| T4 | **real** 8.91 sec<br/>**user** 8.91 sec<br/>**sys** 0.0 sec<br/>  | 4 min, 7 sec !!! |
| T5 | **real** 15.7 sec<br/>**user** 15.7 sec<br/>**sys** 0.00272 sec<br/>  | 7 min, 16.3 sec !!!!|

###  <center>Vanilla Python</center>
| Case Name |Points| C Code Time | Python Timing  |
|:---:       |:---:|   :----:   |           ---: |
|T1|100|**real** 0.0 sec<br/>**user** 0.0 sec<br/>**sys** 0.0 sec<br/> | **real** 0.00419 sec<br/>**user** 0.00655 sec<br/>**sys** 0.0 sec<br/> |
| | | |
|T2|1000|**real** 0.0 sec<br/>**user** 0.0 sec<br/>**sys** 0.0 sec<br/> |  **real** 0.304 sec<br/>**user** 0.306 sec<br/>**sys** 0.0 sec<br/> |
| | | |
|T3|10000|**real** 0.08 sec<br/>**user** 0.08 sec<br/>**sys** 0.0 sec<br/> |**real** 29.8 sec<br/>**user** 29.8 sec<br/>**sys** 0.00033 sec<br/>                |
| | | |
|T4|30000| **real** 0.74 sec<br/>**user** 0.74 sec<br/>**sys** 0.0 sec<br/> | **real** 4min 16s 🤣 <br/>**user** 4min 16 s<br/> sys 0ns |
| | | |
|T5|40000|**real** 1.41 sec<br/>**user** 1.41 sec<br/>**sys** 0.0 sec<br/>  |**real** 7min 36s 🤡<br/>**user** 7min 36s<br/> sys 0ns|

I am happy with these results, but I was a little disappointed that I couldn't just make one modification to the code and see speed increases. I had to do a lot of googling on methods and also rewrite my loop arithmetic. It might be my laziness talking, but it would certainly be nice if there was some way to speed up my python code without adding too much to the actual code itself...

# Third Optimization: Numba <a class="anchor" id="numbapython"></a>
Oh yeah...there is a library that kind of does that. Is it too good to be true?

## Intro
I work with a mechanical engineer named Andy who has basically become a software engineer, and just a few days ago I was telling him about my independent study. He recommended that I check out Numba, a library he had used in the past to speed up Python. From the way he explained it, I was skeptical at first, but once I read into the methodology behind it I was a bit more convinced.

### How does it Work?
From their website: *"Numba generates optimized machine code from pure Python code using the LLVM compiler infrastructure. With a few simple annotations, array-oriented and math-heavy Python code can be just-in-time optimized to performance similar as C, C++ and Fortran, without having to switch languages or Python interpreters."*

In short, they turn Python into a JIT-compiled language! I was familiar with the term from Theory of Programming and knew that it was certainly going to be faster than regular Python code. What I deduced is that the first call of a decorated function was going to be slower, since that is when Numba compiles it, while subsequent calls were going to be faster.
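
Numba compiles a decorated function the first time it is called with a given set of argument types, which can be seen by timing two consecutive calls. This is a sketch rather than the code used for the timings in this notebook: `min_max_sq` is a made-up name, it returns values instead of printing, and it uses `njit` (Numba's shorthand for `jit(nopython=True)`). The `try/except` fallback is only there so the snippet still runs, uncompiled, on a machine without Numba, in which case both calls take about the same time.

```python
import time
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (slowly) without Numba installed
    def njit(func):
        return func

@njit
def min_max_sq(points):
    """Smallest and largest squared pairwise distance in an (N, 3) array."""
    lo = np.inf
    hi = 0.0
    n = points.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            d = ((points[i, 0] - points[j, 0])**2
                 + (points[i, 1] - points[j, 1])**2
                 + (points[i, 2] - points[j, 2])**2)
            if d > hi:
                hi = d
            if d < lo:
                lo = d
    return lo, hi

rng = np.random.default_rng(7)
pts = rng.uniform(0, 10000, size=(300, 3))

for label in ("first call (includes compilation)", "second call"):
    start = time.perf_counter()
    result = min_max_sq(pts)
    print(label, time.perf_counter() - start, result)
```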

To use Numba, all one has to do is import it and then decorate the function with the @jit decorator. Optional arguments can be supplied within parentheses after it.

I would say that is it and you are done but that would be a lie. If you are lazy like me and don't want to change a lot, you have to change the order of one step in the process (at least of this algorithm).

Numba doesn't like complex types and function calls within your decorated function. If they are there, they need to be annotated with types. This got complicated because I was calling a C function through an imported shared library. To get around this, I moved the creation of my array outside the body of the function and passed it in as an argument.

Below is my implementation. You may notice that it is basically the same as the smart Python method. The only differences are that I am using a numpy array for storage, for faster allocation and passing of data, and that it has the @jit decorator.

In [6]:
@jit
def shortest_numba(points, ptcnt):
    loopMAX = 0
    loopMIN = 1.7976931348623157e+308 # kinda big
    for i in range(len(points)):
        for j in range(i+1, ptcnt):
            distance = (points[i][0] - points[j][0])**2 + (points[i][1] - points[j][1])**2 + (points[i][2]- points[j][2])**2
            if loopMAX < distance:
                loopMAX = distance
            if loopMIN > distance: # plain `if`, so a pair that raises the max is still checked against the min
                loopMIN = distance
          
    print(loopMIN**0.5, loopMAX**0.5)

"""
@jit(target="cuda")
def shortest_cudanumba(points, ptcnt):
    loopMAX = 0
    loopMIN = 1.7976931348623157e+308 # kinda big
    for i in range(len(points)):
        for j in range(i+1, ptcnt):
            distance = (points[i][0] - points[j][0])**2 + (points[i][1] - points[j][1])**2 + (points[i][2]- points[j][2])**2
            if loopMAX < distance:
                loopMAX = distance
            elif loopMIN > distance:
                loopMIN = distance
          
          
    print(loopMIN**0.5, loopMAX**0.5)
    
"""


In [12]:
%%time
seed, pts, expected = setCase('T1')
randlib.srand(seed)
print(expected)
points = np.empty((pts,3), dtype=np.double)

for i in range(pts):
    points[i]  = genrand3dpt(0, 10000) # call random points macro   
shortest_numba(points, pts)

Expected Min, Max: 526.986 15183.808

526.9855752684347 15183.807320792597
CPU times: user 4.27 ms, sys: 190 µs, total: 4.46 ms
Wall time: 2.22 ms


## Numba Python Summary
Out of all of the solutions, this one satisfied me the most. WOW. Nearly C speeds for hardly any changes to my original Python code. That is beyond remarkable in my honest opinion. Here is what Numba has going for it: a simple import, a simple change to the code, very large speedups, and a fallback to Python if it can't JIT compile without having issues with types. I really enjoy how a fairly naive solution to this problem in Python could be saved just by using this library.

Here is the final table of my timings:

###  <center>Numba</center>
| Case Name | Python Timing  | How much faster? |
|:---:       |:---:|        ---: |
| T1 | **real** 0.00222 sec<br/>**user** 0.00427 sec<br/>**sys** 0.0 sec<br/>  | 0.002 slower|
| T2 | **real** 0.00038 sec<br/>**user** 0.0 sec<br/>**sys** 0.000513 sec<br/>  | 0.30362 sec |
| T3 | **real** 0.0869 sec<br/>**user** 1.07 sec<br/>**sys** 0.0 sec<br/>  | 29 sec |
| T4 | **real** 0.656 sec<br/>**user** 0.649 sec<br/>**sys** 0.00987 sec<br/>  | 4 min, 15 sec !!! |
| T5 | **real** 1.14 sec<br/>**user** 1.15 sec<br/>**sys** 0.000335 sec<br/>  | 7 min, 35 sec !!!!|

The first two rows' timings make sense because a common feature of most JIT compilers is that they are supposed to 'know' when it is worth compiling. Sometimes they may pick and choose what to compile because the time required for compiling and then running beats out the time to just run it. It was only when those loops got significantly larger that the JIT speedup started showing up.

Let me bring back a chart for a second to jog your memory...when I saw this I chuckled:

| Case Name |Points| C Code Time | Python Timing  |
|:---:       |:---:|   :----:   |           ---: |
|T1|100|**real** 0.0 sec<br/>**user** 0.0 sec<br/>**sys** 0.0 sec<br/> | **real** 0.00419 sec<br/>**user** 0.00655 sec<br/>**sys** 0.0 sec<br/> |
| | | |
|T2|1000|**real** 0.0 sec<br/>**user** 0.0 sec<br/>**sys** 0.0 sec<br/> |  **real** 0.304 sec<br/>**user** 0.306 sec<br/>**sys** 0.0 sec<br/> |
| | | |
|T3|10000|**real** 0.08 sec<br/>**user** 0.08 sec<br/>**sys** 0.0 sec<br/> |**real** 29.8 sec<br/>**user** 29.8 sec<br/>**sys** 0.00033 sec<br/>                |
| | | |
|T4|30000| **real** 0.74 sec<br/>**user** 0.74 sec<br/>**sys** 0.0 sec<br/> | **real** 4min 16s 🤣 <br/>**user** 4min 16 s<br/> sys 0ns |
| | | |
|T5|40000|**real** 1.41 sec<br/>**user** 1.41 sec<br/>**sys** 0.0 sec<br/>  |**real** 7min 36s 🤡<br/>**user** 7min 36s<br/> sys 0ns|

Sure, we smoked the Python in terms of speed, but you know what else we smoked in cases 4 and 5? **The C code!**
I am quite positive that was a coincidence, but thought it was funny enough to mention. Regardless, it puts Numba in my toolbox as a viable option, despite its certain limitations.

## Final Results Summary
In terms of speed gains versus time I put into it, Numba came out on top. The Numba solution took a whopping 5 minutes to put together (not counting the time it took to write the initial algorithm). I am sure the numpy solution could have been made faster than the Numba solution, but I was already about an hour deep into it just researching ways to vectorize the code so that I would see the speed increases I wanted. What resulted was syntactically very different code that in my opinion isn't as readable as the basic Python code. That is one thing the original implementation had over all the others: it was very easy to follow. I think any programmer, even one who *doesn't* code in Python, could follow what was going on (perhaps even a non-programmer could?). This aspect was slightly muddied by the "smarter" Python solution's use of list comprehensions, and then lost even more with the numpy implementation.

That is yet another reason why I like Numba. I kept the simplicity of my original Python code but beat my numpy timings for less effort. I see that as an absolute win. What numpy does have over Numba, however, is that it is much easier to use user-defined functions and things like that alongside it.

I guess there is really a use case for each solution here now that I think about it.

For fun, here is the One Table to Rule them All (I *dare* the reader to double click this cell and look at the markdown code)

|Case Name|Points|C Code Time|Python Timing|Smarter Python Timing |Numpy Timing |Numba Timing |
|:---:    |:---:|   :----:   |  :----:     | :----:   | :----:   |            ---: |
|T1|100|**real** 0.0 sec<br/>**user** 0.0 sec<br/>**sys** 0.0 sec<br/> | **real** 0.00419 sec<br/>**user** 0.00655 sec<br/>**sys** 0.0 sec<br/> | **real** 0.00371 sec<br/>**user** 0.00521 sec<br/>**sys** 0.0 sec<br/> |**real** 0.00306 sec<br/>**user** 0.00468 sec<br/>**sys** 0.0 sec<br/>|**real** 0.00222 sec<br/>**user** 0.00427 sec<br/>**sys** 0.0 sec<br/>  |
| | | |
|T2|1000|**real** 0.0 sec<br/>**user** 0.0 sec<br/>**sys** 0.0 sec<br/> |  **real** 0.304 sec<br/>**user** 0.306 sec<br/>**sys** 0.0 sec<br/> |**real** 0.26 sec<br/>**user** 0.262 sec<br/>**sys** 0.0 sec<br/> |**real** 0.0236 sec<br/>**user** 0.0254 sec<br/>**sys** 0.0 sec<br/>|**real** 0.00038 sec<br/>**user** 0.0 sec<br/>**sys** 0.000513 sec<br/>  |
| | | |
|T3|10000|**real** 0.08 sec<br/>**user** 0.08 sec<br/>**sys** 0.0 sec<br/> |**real** 29.8 sec<br/>**user** 29.8 sec<br/>**sys** 0.00033 sec<br/> |**real** 27 sec<br/>**user** 27 sec<br/>**sys** 0.0 sec<br/> |**real** 1.07 sec<br/>**user** 1.07 sec<br/>**sys** 0.0 sec<br/> |**real** 0.0869 sec<br/>**user** 1.07 sec<br/>**sys** 0.0 sec<br/>  | 
| | | |
|T4|30000| **real** 0.74 sec<br/>**user** 0.74 sec<br/>**sys** 0.0 sec<br/> | **real** 4min 16s 🤣 <br/>**user** 4min 16 s<br/> sys 0ns |**real** 3 min 57 sec<br/>**user** 3 min 57 sec<br/>**sys** 0.0 sec<br/>|**real** 8.91 sec<br/>**user** 8.91 sec<br/>**sys** 0.0 sec<br/>  |**real** 0.656 sec<br/>**user** 0.649 sec<br/>**sys** 0.00987 sec<br/>  |
| | | |
|T5|40000|**real** 1.41 sec<br/>**user** 1.41 sec<br/>**sys** 0.0 sec<br/>  |**real** 7min 36s 🤡<br/>**user** 7min 36s<br/> sys 0ns|**real** 6 min 55 sec<br/>**user** 6 min 55 sec<br/>**sys** 0.0 sec<br/>|**real** 15.7 sec<br/>**user** 15.7 sec<br/>**sys** 0.00272 sec<br/> | **real** 1.14 sec<br/>**user** 1.15 sec<br/>**sys** 0.000335 sec<br/>  |


 
 

# ================================================

# Parallelization, Calling C and Other Methods: Part 2 of the Hit Series "Speeding Up My Code"