## Comparison Between a Data Generator Using Threads vs Not

### Summary
A plain generator is fine for yielding data, but consider what happens while you process that data outside the generator: the generator sits idle until you request the next item, and only then resumes executing while you wait for it to return a value. A better approach would be to run the generator's work in a background thread between calls, so that the next value is ready as soon as you ask for it. This is a simple (incomplete) experiment on the timing difference. The measured times vary from run to run, and how much the threaded approach helps depends on how long the generator takes to produce an item relative to the gap between the main thread's requests.
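
As a rough sanity check on what to expect, the wait per item can be estimated from two numbers: how long the generator needs to produce an item and how long the consumer spends between requests. The sketch below assumes an idealised model (constant times, no thread overhead, one item prepared ahead). The function name `expected_wait` and its parameters are illustrative and not part of the experiment's code.

```python
def expected_wait(produce_s, consume_s, threaded):
    """Estimated time the consumer blocks on each request, in seconds.

    Idealised model: constant times, no threading overhead, and at most
    one item prepared ahead of the consumer.
    """
    if not threaded:
        # A plain generator only starts working when asked, so the
        # consumer always waits for the full production time.
        return produce_s
    # A background producer works while the consumer is busy, so only
    # the part of the production time not hidden by the gap remains.
    return max(0, produce_s - consume_s)


# Values used in this experiment: 3 s to produce, 2 s between requests.
print(expected_wait(3, 2, threaded=False))  # 3
print(expected_wait(3, 2, threaded=True))   # 1
```

Under this model the threaded version can only hide as much production time as the consumer spends elsewhere; if the consumer is faster than the producer, some waiting remains.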

#### Create a simple data generator

In [1]:
import threading
import time
import numpy as np

def do_sleep(i):
    time.sleep(i)

def initial_data_gen():
    for i in range(30):
        do_sleep(3)
        yield i

#### Create a threaded generator

In [2]:
input_ = 5
result = None
access_input = threading.Lock()
access_output = threading.Lock()

def thread_do_sleep():
    global result
    # read the input under its mutex
    with access_input:
        val = input_
    # simulate slow work in the background
    time.sleep(3)
    # publish the result under the output mutex
    with access_output:
        result = val + 1

# Starting a new thread on each round is slow - a single long-lived worker would be better
def threaded_data_gen():
    global input_
    # write the first input under its mutex, then start the first worker
    with access_input:
        input_ = 0
    worker = threading.Thread(target=thread_do_sleep)
    worker.start()

    for i in range(1,30):
        # wait for the worker to finish; join() guarantees result has been written,
        # which avoids reading None if the main thread gets here before the worker
        worker.join()
        with access_output:
            value = result

        # hand the value back as the next input before starting the next worker
        with access_input:
            input_ = value
        worker = threading.Thread(target=thread_do_sleep)
        worker.start()

        # the new worker keeps running while the caller processes this value
        yield value
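
One possible way to avoid starting a new thread on every round is a single long-lived worker that feeds a bounded `queue.Queue`. The following is a minimal sketch under that assumption, not the approach used in the cell above; `background_gen` and `slow_source` are hypothetical names, and the delays are shortened so the example finishes quickly.

```python
import queue
import threading
import time


def background_gen(source, maxsize=1):
    """Yield items from `source`, which is consumed in a worker thread."""
    items = queue.Queue(maxsize=maxsize)
    done = object()  # sentinel marking the end of `source`

    def worker():
        for item in source:
            items.put(item)   # blocks while the queue is full
        items.put(done)

    threading.Thread(target=worker, daemon=True).start()

    while True:
        item = items.get()    # blocks until the worker has an item ready
        if item is done:
            return
        yield item


def slow_source(n, delay_s):
    """Hypothetical stand-in for the slow generator in this experiment."""
    for i in range(n):
        time.sleep(delay_s)
        yield i


values = list(background_gen(slow_source(5, 0.01)))
print(values)  # [0, 1, 2, 3, 4]
```

The queue does the blocking and hand-off that the two mutexes attempt by hand, and `maxsize` bounds how far ahead the worker may run. Two limitations of this sketch: the worker only starts on the first `next()` call, and an exception raised inside `source` is not forwarded to the consumer.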


#### Test the average yield times of each generator for the same task

In [4]:
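def test_functions():
    generator_v1 = initial_data_gen()
    generator_v2 = threaded_data_gen()

    g1_time = []
    g2_time = []

    for i in range(20):
        # simulate the main thread processing the previous values
        time.sleep(2)

        # time how long the plain generator takes to yield
        start = time.time()
        next(generator_v1)
        end = time.time()
        g1_time.append(end - start)

        # time how long the threaded generator takes to yield
        # (its worker also keeps running while generator_v1 is being timed)
        start = time.time()
        next(generator_v2)
        end = time.time()
        g2_time.append(end - start)

    print("The average time for non_threaded is " + str(np.mean(g1_time)) + " the average time for threaded is " + str(np.mean(g2_time)))

test_functions()

The output below was recorded with a version of this cell that took both timestamps back to back without calling `next()` on either generator and subtracted `end` from `start`. It therefore shows only the negated overhead of two consecutive `time.time()` calls, not yield times, and has not been re-measured.

The average time for non_threaded is -2.3245811462402344e-06 the average time for threaded is -2.5033950805664064e-07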
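
Because the test above exercises both generators inside one loop, work done by a background thread can overlap with the time spent on the other generator. A helper that times one generator at a time avoids that. The sketch below is one way to do it; `mean_next_time` and `slow_gen` are hypothetical names, the delays are shortened for illustration, and `time.perf_counter()` is used because it is intended for measuring short intervals.

```python
import time


def mean_next_time(gen, n_calls, gap_s):
    """Average seconds spent inside next(gen) over n_calls requests."""
    waits = []
    for _ in range(n_calls):
        time.sleep(gap_s)               # simulated work between requests
        start = time.perf_counter()
        next(gen)
        waits.append(time.perf_counter() - start)
    return sum(waits) / len(waits)


def slow_gen(n, delay_s):
    """Plain generator that takes delay_s seconds to produce each item."""
    for i in range(n):
        time.sleep(delay_s)
        yield i


avg = mean_next_time(slow_gen(10, 0.03), n_calls=5, gap_s=0.02)
print(f"plain generator: {avg:.3f} s per next()")
```

Running the same helper on each generator in turn, with the same `gap_s`, gives two averages that can be compared directly.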
