# Table of Contents (click to jump):
## [Native Python Solution](#natpython)
## ["Smarter" Python Solution](#smartpython)

# Python Speed Module

In [3]:
from ctypes import CDLL
randlib = CDLL("libc.so.6")

import numpy as np
import time, sys

# Function Headers and Objects
These are used throughout the notebook. The different implementations I used in this study are located within their respective sections. Run this cell first so that all of the functions are in memory.

In [4]:


def genrand3dpt(MIN, MAX):
    X = (MAX-MIN)*(float(randlib.rand())/2147483647)+MIN
    Y = (MAX-MIN)*(float(randlib.rand())/2147483647)+MIN
    Z = (MAX-MIN)*(float(randlib.rand())/2147483647)+MIN
    return X,Y,Z

class point:
    def __init__(self, x:float ,y: float, z: float):
        self.x, self.y, self.z = x, y, z
    
def setCase(case):
    if(case=='T1'):
        seed = 7
        pts = 100
        expected = "Expected Min, Max: 526.986 15183.808\n"
    elif case=='T2':
        seed = 7
        pts = 1000
        expected = "Expected Min, Max: 70.299 15784.777\n"
    elif case=='T3':
        seed = 7
        pts = 10000
        expected = "Expected Min, Max: 9.730 16509.943\n"
    elif case=='T4':
        seed = 7
        pts = 30000
        expected = "Expected Min, Max: 9.270 16643.182\n"
    elif case=='T5':
        seed = 7
        pts = 40005
        expected = "Expected Min, Max: 8.705 16830.027\n"
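    else:
        # Fail loudly instead of hitting an UnboundLocalError on the return below
        raise ValueError(f"Unknown test case: {case!r}")
    return seed, pts, expected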

# Native Python Solution <a class="anchor" id="natpython"></a>
The intention here is to demonstrate that native Python is slow. I kept it as close to the original C code as I could, purely for demonstration purposes. I know I could use list comprehensions to make it faster and more succinct, but that is not the point of this particular test. It is nearly a one-to-one version of my C implementation, just slightly more Pythonic.

In [None]:
def shortest_native(srandseed, num_points):

    points = []
    for i in range(num_points):
        tempx, tempy, tempz = genrand3dpt(0, 10000) # call random points macro
        points.append(point(tempx, tempy, tempz)) # this part differs from the C code 
                                                  # in the fact that we're adding to an array as
                                                  # we go through the generation.
    loopMAX = 0
    loopMIN = 1.7976931348623157e+308 # largest finite double (sys.float_info.max)
    
    for i in range(num_points):      
        for j in range(i+1,num_points):
            xsqrd = (points[i].x - points[j].x)**2
            ysqrd = (points[i].y - points[j].y)**2
            zsqrd = (points[i].z - points[j].z)**2
            distance = xsqrd + ysqrd + zsqrd
            if loopMAX < distance:
                loopMAX = distance
            elif loopMIN > distance:
                loopMIN = distance
          
    print(loopMIN**0.5, loopMAX**0.5)
    

In [28]:
%%time
seed, pts, expected = setCase('T5')
randlib.srand(seed)
print(expected)
shortest_native(seed, pts)


Expected Min, Max: 8.705 16830.027

8.704671647258278 16830.026589988684
CPU times: user 7min 36s, sys: 0 ns, total: 7min 36s
Wall time: 7min 36s


## Native Python Summary
As expected, the Python ran *slower* than the C code. Who would have thought?

**Important Note** *Python floats are C doubles internally, so I reran my original C code with doubles instead of floats. The C timings differ from my original results for this reason.*
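The note above can be checked from Python itself. This is a minimal sketch, not part of the original study: `as_c_float` is a hypothetical helper that rounds a value to single precision to show what a C `float` would have stored for a coordinate in the 0 to 10000 range.

```python
import ctypes
import struct
import sys

# A Python float occupies the same 8 bytes as a C double; a C float has only 4.
double_size = ctypes.sizeof(ctypes.c_double)
float_size = ctypes.sizeof(ctypes.c_float)

# The sentinel used for loopMIN is the largest finite double.
largest_double = sys.float_info.max

def as_c_float(value):
    """Round a Python float to single precision and back (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", value))[0]

coordinate = 1234.56789012345
print("sizeof(double), sizeof(float):", double_size, float_size)
print("loopMIN sentinel is the largest double:", largest_double == 1.7976931348623157e+308)
print("double:", coordinate, "float:", as_c_float(coordinate))
```

The last line shows the coordinate losing digits once it passes through single precision, which is why the C code had to use doubles for the results to match.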

**System Specifications** AMD Ryzen 5 3600 (6 cores), overclocked to 4.2 GHz

### Here is a table containing timings for all five cases:

| Case Name | Points | C Code Timing | Python Timing |
|:---:|:---:|:---:|---:|
|T1|100|**real** 0.0 sec<br/>**user** 0.0 sec<br/>**sys** 0.0 sec|**real** 0.00419 sec<br/>**user** 0.00655 sec<br/>**sys** 0.0 sec|
|T2|1000|**real** 0.0 sec<br/>**user** 0.0 sec<br/>**sys** 0.0 sec|**real** 0.304 sec<br/>**user** 0.306 sec<br/>**sys** 0.0 sec|
|T3|10000|**real** 0.08 sec<br/>**user** 0.08 sec<br/>**sys** 0.0 sec|**real** 29.8 sec<br/>**user** 29.8 sec<br/>**sys** 0.00033 sec|
|T4|30000|**real** 0.74 sec<br/>**user** 0.74 sec<br/>**sys** 0.0 sec|**real** 4 min 16 sec 🤣<br/>**user** 4 min 16 sec<br/>**sys** 0 ns|
|T5|40000|**real** 1.41 sec<br/>**user** 1.41 sec<br/>**sys** 0.0 sec|**real** 7 min 36 sec 🤡<br/>**user** 7 min 36 sec<br/>**sys** 0 ns|

**Thermals**: Peaked at 70 degrees Celsius; idle temperature was 32-45 degrees Celsius.

If you could not already tell by my emojis, I was most surprised by cases 4 and 5. Without optimization, special methods, or other trickery, the Python gets smoked by the C. I am really just preaching to the choir here: Python is slower than C, and any first-year programmer could have told you that. My intention is to establish a baseline to beat. If the later versions are not faster than this, the study is not serving its purpose.
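The jump in cases 4 and 5 follows from how many pairs the nested loops visit: for n points there are n(n-1)/2 of them. The sketch below is a rough back-of-the-envelope check, assuming the point counts returned by `setCase` and the native Python wall times from the table above; `pair_count` is a hypothetical helper, not part of the original code.

```python
def pair_count(num_points):
    """Number of unordered pairs visited by the nested i < j loops."""
    return num_points * (num_points - 1) // 2

# case -> (points from setCase, native Python wall time in seconds from the table)
native_runs = {
    "T3": (10000, 29.8),
    "T4": (30000, 4 * 60 + 16),
    "T5": (40005, 7 * 60 + 36),
}

ns_per_pair = {}
for case, (num_points, wall_seconds) in native_runs.items():
    pairs = pair_count(num_points)
    ns_per_pair[case] = wall_seconds / pairs * 1e9
    print(f"{case}: {pairs:,} pairs, about {ns_per_pair[case]:.0f} ns per pair")
```

The cost per pair stays roughly constant at a little under 0.6 microseconds, so the wall time grows with the square of the point count rather than with the point count itself.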

# ===============================================

# First Optimization: "Smarter" Python <a class="anchor" id="smartpython"></a>
List comprehensions, avoiding object allocation, and other micro-optimizations should help our code run a bit faster.

## A Few Things First
There are a few things I found while reading online and testing code that can noticeably improve the speed of Python code. One of the first was to avoid creating objects for number crunching. So the first thing I did was store my numbers as plain values in a list of `[x, y, z]` lists rather than in a list of point objects. Run the cell below to see this difference.

In [311]:
# Demonstrating that Allocation without Objects is Faster...
num_points=50000

start= time.time()
points = []
for i in range(num_points):
    tempx, tempy, tempz = genrand3dpt(0, 10000)
    points.append(point(tempx, tempy, tempz))
print("Using Objects",time.time()-start)
del points

start= time.time()
points = [ [each for each in genrand3dpt(0, 10000)] for i in range(num_points)]
print("Pure Points, no Objects",time.time()-start)
del points

Using Objects 0.19133377075195312
Pure Points, no Objects 0.12816953659057617
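A single `time.time()` measurement can be noisy at this scale, so the same comparison can be repeated with `timeit`. This is only a sketch under assumptions: `Point`, `random_xyz`, `build_objects`, `build_lists`, and `best_time` are hypothetical stand-ins, and `random.uniform` replaces the libc `rand()` call so the block runs without `randlib`.

```python
import random
import timeit

class Point:
    """Stand-in for the notebook's `point` class."""
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

def random_xyz(low, high):
    """Stand-in for genrand3dpt, using the standard library instead of libc."""
    return (random.uniform(low, high),
            random.uniform(low, high),
            random.uniform(low, high))

def build_objects(num_points):
    """Allocate one Point object per generated point."""
    return [Point(*random_xyz(0, 10000)) for _ in range(num_points)]

def build_lists(num_points):
    """Store each generated point as a plain [x, y, z] list."""
    return [list(random_xyz(0, 10000)) for _ in range(num_points)]

def best_time(func, num_points, repeats=5):
    """Best wall time over several runs, which filters out most scheduling noise."""
    return min(timeit.repeat(lambda: func(num_points), number=1, repeat=repeats))

num_points = 50000
print("Using objects          ", best_time(build_objects, num_points))
print("Pure points, no objects", best_time(build_lists, num_points))
```

Taking the minimum of several runs is a common convention for micro-benchmarks; the absolute numbers will differ from the ones above because the random number source is different.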


The next thing I did was convert any loops I could to list comprehensions. Run the cell below to see the time difference between four approaches: indexing with `range()`, iterating over the container with `enumerate()`, mapping a lambda, and a list comprehension. What you should see is that the list comprehension is indeed the fastest.

In [310]:
start= time.time()
test = [i for i in range(0,10000000)]
for i in range(10000000):
    test[i] = test[i]**2

print("Using range() construct", time.time()-start)
print(test[:5],'\n')
del test

start= time.time()
test = [i for i in range(0,10000000)]
for i, each in enumerate(test):
    test[i] = each**2

print("Using in statement",time.time()-start)
print(test[:5],'\n')
del test

# just for fun
start = time.time()
test = [i for i in range(0,10000000)]
test=list(map(lambda x: x**2, test))

print("Using lambda trickery", time.time()-start)
print(test[:5],'\n')
del test

start= time.time()
test = [i for i in range(0,10000000)]
test = [each**2 for each in test]
print("Using list comprehension", time.time()-start)
print(test[:5],'\n')
del test

Using range() construct 2.8472323417663574
[0, 1, 4, 9, 16] 

Using in statement 2.922112464904785
[0, 1, 4, 9, 16] 

Using lambda trickery 2.441319465637207
[0, 1, 4, 9, 16] 

Using list comprehension 2.181307077407837
[0, 1, 4, 9, 16] 



Sure, the time differences between these approaches are quite small, but they add up during execution. The code that resulted from these optimizations is also far shorter. The implementation is below:

In [1]:
def shortest_smartpython(srandseed, num_points):
    points = [ [each for each in genrand3dpt(0, 10000)] for i in range(num_points)]
    
    loopMAX = 0
    loopMIN = 1.7976931348623157e+308

    for i in range(len(points)):
        for j in range(i+1, num_points):
            distance = (points[i][0] - points[j][0])**2 + (points[i][1] - points[j][1])**2 + (points[i][2]- points[j][2])**2
            if loopMAX < distance:
                loopMAX = distance
            elif loopMIN > distance:
                loopMIN = distance

    print(loopMIN**0.5, loopMAX**0.5)

In [5]:
%%time
seed, pts, expected = setCase('T2')
randlib.srand(seed)
print(expected)
shortest_smartpython(seed, pts)

Expected Min, Max: 70.299 15784.777

70.29876940940112 15784.777591681941
CPU times: user 191 ms, sys: 0 ns, total: 191 ms
Wall time: 188 ms


# ===============================================

## "Smarter" Python Summary
The smarter Python was consistently faster, saving anywhere from a fraction of a millisecond on T1 to 41 seconds on T5 (roughly 7 to 15 percent of the native wall time). I believe this speed increase is mostly due to the faster allocation of the array. Originally, I had the arithmetic inside a list comprehension, but I was getting wall times of 25 minutes on case 5, which is obviously less than ideal.
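One way to check where the time actually goes is to time the allocation and the pair loop separately. The following is a sketch under assumptions, not a measurement from this study: `timed_phases` is a hypothetical helper, `random.uniform` stands in for `genrand3dpt`, and the point count is kept small so the cell finishes quickly.

```python
import random
import time

def timed_phases(num_points, seed=7):
    """Return (allocation_seconds, loop_seconds) for one run (hypothetical helper)."""
    random.seed(seed)

    # Phase 1: build the list of [x, y, z] lists.
    start = time.perf_counter()
    points = [[random.uniform(0, 10000) for _ in range(3)]
              for _ in range(num_points)]
    allocation_seconds = time.perf_counter() - start

    # Phase 2: the same nested pair loop used in shortest_smartpython.
    start = time.perf_counter()
    smallest_sq, largest_sq = float("inf"), 0.0
    for i in range(num_points):
        for j in range(i + 1, num_points):
            dist_sq = ((points[i][0] - points[j][0]) ** 2
                       + (points[i][1] - points[j][1]) ** 2
                       + (points[i][2] - points[j][2]) ** 2)
            if largest_sq < dist_sq:
                largest_sq = dist_sq
            if smallest_sq > dist_sq:
                smallest_sq = dist_sq
    loop_seconds = time.perf_counter() - start

    return allocation_seconds, loop_seconds

allocation_seconds, loop_seconds = timed_phases(1000)
print(f"allocation: {allocation_seconds:.4f} s, pair loop: {loop_seconds:.4f} s")
```

Running the same split on the object-based version would show how much of the saving comes from allocation and how much from indexing a list instead of reading attributes inside the loop.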

For brevity, I no longer list the C code timings or the point count for each case. Below is a table with my average timings.

| Case Name | Python Timing | How much faster? |
|:---:|:---:|---:|
| T1 | **real** 0.00371 sec<br/>**user** 0.00521 sec<br/>**sys** 0.0 sec | 0.00048 seconds |
| T2 | **real** 0.26 sec<br/>**user** 0.262 sec<br/>**sys** 0.0 sec | 0.042 seconds |
| T3 | **real** 27 sec<br/>**user** 27 sec<br/>**sys** 0.0 sec | 2.8 seconds |
| T4 | **real** 3 min 57 sec<br/>**user** 3 min 57 sec<br/>**sys** 0.0 sec | 19 seconds |
| T5 | **real** 6 min 55 sec<br/>**user** 6 min 55 sec<br/>**sys** 0.0 sec | 41 seconds |

I am positive there are even more optimizations you can do using basic python, but let's be honest here...if I am doing serious scientific programming, seven minutes of wall time for a relatively simple program is not acceptable.
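As one illustration of the kind of basic-Python optimization hinted at above, the inner loop can avoid repeated indexing by unpacking each point once and multiplying instead of using `**2`. This is a sketch only: `shortest_hoisted` is a hypothetical function that was not timed in this study, and it uses `random.uniform` in place of `genrand3dpt` so it runs on its own.

```python
import random

def shortest_hoisted(points):
    """Return (min_distance, max_distance) over all pairs of 3D points.

    With fewer than two points this returns (inf, 0.0).
    """
    smallest_sq = float("inf")
    largest_sq = 0.0
    for i in range(len(points) - 1):
        xi, yi, zi = points[i]                # unpack the outer point once
        for xj, yj, zj in points[i + 1:]:     # slice copy avoids indexing by j
            dx = xi - xj
            dy = yi - yj
            dz = zi - zj
            dist_sq = dx * dx + dy * dy + dz * dz
            if dist_sq > largest_sq:
                largest_sq = dist_sq
            if dist_sq < smallest_sq:
                smallest_sq = dist_sq
    return smallest_sq ** 0.5, largest_sq ** 0.5

random.seed(7)
demo_points = [(random.uniform(0, 10000),
                random.uniform(0, 10000),
                random.uniform(0, 10000)) for _ in range(500)]
print(shortest_hoisted(demo_points))
```

The slice `points[i + 1:]` copies references on every outer iteration, which trades some temporary memory for fewer index lookups. Whether that is a net win at 40,000 points would need to be measured, and the loop is still quadratic either way.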


# Second Optimization: Numpy Arrays
