<a href="https://colab.research.google.com/github/InNoobWeTrust/made-up-noob-algo/blob/main/parallel_programming/parallel_programming_practice.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
'''
# Problem statement:
Given a sequence of numbers, find the missing number.

Example data: 1 2 3 5 (the missing number is 4)

Assumptions:
- sorted
- increasing by a constant step
- exactly one number is missing
- data length n is a fixed constant
- fixed size, can be large (but manageable)

Edge cases:
- The sequence can start from any number
- No number overflow (int64)
'''

import random
import sys
import numpy as np

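NUM_ITEMS = 10
def prepare_test_data(n = NUM_ITEMS, step = 1):
    '''
    TDD approach to the problem: define the test datasets first.
    This kind of data is easy to augment.

    Returns (arr, missing): n numbers increasing by `step` with one
    number removed from the middle, and the removed number.
    '''
    # at least 2 items are needed so that a middle index exists below
    if n < 2:
        n = NUM_ITEMS
    # randomly pick a start number so the last item stays <= sys.maxsize
    rand_start = random.randint(0, sys.maxsize - n * step)
    # n + 1 terms: rand_start, rand_start + step, ..., rand_start + n * step
    arr = list(range(rand_start, rand_start + (n + 1) * step, step))
    # randomly remove a number in the middle (never the first or the last)
    rand_idx = random.choice(range(1, n))
    missing = arr.pop(rand_idx)
    return arr, missing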
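Because `prepare_test_data` draws from the global `random` state, a failing case is hard to replay. A minimal sketch, assuming a seedable variant is acceptable: the hypothetical `prepare_test_data_seeded` below uses its own `random.Random(seed)` instance with the same construction, and the hypothetical `check_test_case` asserts the invariants from the problem statement (constant step except for exactly one doubled gap, first and last items kept).

```python
import random
import sys

def prepare_test_data_seeded(n=10, step=1, seed=None):
    """Hypothetical seedable variant of prepare_test_data (same construction)."""
    if n < 2:
        raise ValueError("need n >= 2 to remove a number from the middle")
    rng = random.Random(seed)  # private RNG: same seed -> same test case
    start = rng.randint(0, sys.maxsize - n * step)
    arr = list(range(start, start + (n + 1) * step, step))
    missing = arr.pop(rng.randrange(1, n))  # never the first or the last item
    return arr, missing

def check_test_case(arr, missing, step=1):
    """Hypothetical helper: assert the problem statement's invariants."""
    diffs = [b - a for a, b in zip(arr, arr[1:])]
    # every difference is `step`, except exactly one of 2 * step
    assert set(diffs) <= {step, 2 * step}
    assert diffs.count(2 * step) == 1
    gap = diffs.index(2 * step)
    assert arr[gap] + step == missing
    assert arr[-1] <= sys.maxsize  # no overflow past the platform's max int

arr, missing = prepare_test_data_seeded(n=1000, step=5, seed=42)
check_test_case(arr, missing, step=5)  # raises AssertionError if an invariant breaks
```

Re-running with the same `seed` rebuilds the exact same case, so a failure found once can be replayed in a test.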

In [2]:
'''
Check the generated test data
'''
test_arr, missing_num = prepare_test_data(int(1e5), 5)
print('Test data:', test_arr)
print('Missing number:', missing_num)

Test data: [6106591458371871951, 6106591458371871956, 6106591458371871961, 6106591458371871966, 6106591458371871971, 6106591458371871976, 6106591458371871981, 6106591458371871986, 6106591458371871991, 6106591458371871996, 6106591458371872001, 6106591458371872006, 6106591458371872011, 6106591458371872016, 6106591458371872021, 6106591458371872026, 6106591458371872031, 6106591458371872036, 6106591458371872041, 6106591458371872046, 6106591458371872051, 6106591458371872056, 6106591458371872061, 6106591458371872066, 6106591458371872071, 6106591458371872076, 6106591458371872081, 6106591458371872086, 6106591458371872091, 6106591458371872096, 6106591458371872101, 6106591458371872106, 6106591458371872111, 6106591458371872116, 6106591458371872121, 6106591458371872126, 6106591458371872131, 6106591458371872136, 6106591458371872141, 6106591458371872146, 6106591458371872151, 6106591458371872156, 6106591458371872161, 6106591458371872166, 6106591458371872171, 6106591458371872176, 6106591458371872181, 6
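The printed list is too long to check by eye. Under the problem's assumptions (exactly one missing number, first and last items kept), the missing number can also be recovered without locating the gap, from the arithmetic-series sum. The complete sequence from `arr[0]` to `arr[-1]` has k = (arr[-1] - arr[0]) / step + 1 terms and sums to k(arr[0] + arr[-1]) / 2, so the missing number is that total minus `sum(arr)`. This is a sketch of an independent test oracle (hypothetical name `missing_by_sum`), not part of the original approach. Python's arbitrary-precision integers keep the intermediate sums exact even near `sys.maxsize`.

```python
def missing_by_sum(arr, step=1):
    """Hypothetical oracle: assumes exactly one number is missing from arr."""
    first, last = arr[0], arr[-1]
    count = (last - first) // step + 1            # terms in the complete sequence
    expected_total = count * (first + last) // 2  # arithmetic series sum (always even product)
    return expected_total - sum(arr)              # exact: Python ints do not overflow

print(missing_by_sum([1, 2, 3, 5]))          # 4
print(missing_by_sum([10, 15, 25], step=5))  # 20
```

In the notebook, `missing_by_sum(test_arr, 5) == missing_num` should hold, since `test_arr` was generated with `step=5`.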

In [3]:
'''
Intuition:
- We know the step size
- The first and last numbers in the array are never the missing number
- If 2 consecutive numbers in the array do not differ by the step
  size, then the missing number lies between them. Simple?
'''

def validate_increasing(a, b, step = 1):
    '''
    Simple logic to check whether 2 consecutive numbers are
    increasing by the given step size
    '''
    return b - step == a

def check_missing_number(arr, step = 1):
    '''
    Linear scan with a sliding window of 2 items: return the first
    missing number, or None if no gap is found
    '''
    for i in range(0, len(arr) - 1):
        if not validate_increasing(arr[i], arr[i+1], step):
            return arr[i] + step
    return None

import os
from multiprocessing.pool import ThreadPool

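def check_missing_parallel(arr, step = 1):
    '''
    - Split the data into multiple windows that overlap by 1 item, so a
      gap falling on a window boundary is still seen by one window.
    - For each window, let a thread find the missing number. Then wait
      and collect the results.
    - Return the first non-None result (in a list), or None.
    Note: these are threads, so under CPython's GIL this pure-Python
    loop does not actually run in parallel.
    '''
    num_proc = os.cpu_count() or 2
    # window size in items; at least 1 so the range() step below is non-zero
    window = max(1, len(arr) // num_proc)
    # windows start every `window` items and carry 1 extra item (the overlap);
    # together they cover every adjacent pair, up to the end of the array
    splits = [arr[i : i + window + 1] for i in range(0, len(arr) - 1, window)]
    with ThreadPool(num_proc) as pool:
        candidates = pool.map(lambda w: check_missing_number(w, step), splits)
    # pool.map keeps the window order, so this is the earliest gap;
    # compare with None so a missing number of 0 is not dropped
    found = [c for c in candidates if c is not None]
    return found[:1] or None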
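Why the windows must overlap by 1 item: if a gap falls exactly on a window boundary, the two numbers around it land in different windows and neither window sees the gap. The small sketch below makes this concrete. It uses a hypothetical `split_windows(arr, window, overlap)` helper and a sequential stand-in for the thread pool (`check_windows`, also hypothetical). The split rule matches the one used in `check_missing_parallel`.

```python
def scan_window(arr, step=1):
    """Linear scan (same logic as check_missing_number): first missing number or None."""
    for a, b in zip(arr, arr[1:]):
        if b - a != step:
            return a + step
    return None

def split_windows(arr, window, overlap):
    """Hypothetical helper: windows start every `window` items and carry `overlap` extra items."""
    return [arr[i : i + window + overlap] for i in range(0, len(arr), window)]

def check_windows(arr, window, overlap, step=1):
    """Sequential stand-in for pool.map: scan each window, return the first hit."""
    for w in split_windows(arr, window, overlap):
        found = scan_window(w, step)
        if found is not None:
            return found
    return None

data = [1, 2, 3, 4, 6, 7, 8, 9]  # 5 is missing, right at the boundary of two 4-item windows
print(check_windows(data, window=4, overlap=0))  # [1, 2, 3, 4], [6, 7, 8, 9] -> None: gap unseen
print(check_windows(data, window=4, overlap=1))  # [1, 2, 3, 4, 6], [6, 7, 8, 9] -> 5
```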

In [4]:
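%time
# note: `%time` with nothing after it times an empty statement, not the call below;
# the call also uses the default step=1, while test_arr was built with step=5
check_missing_number(test_arr)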

CPU times: user 3 µs, sys: 1e+03 ns, total: 4 µs
Wall time: 8.82 µs


6106591458371871952
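The value above is `test_arr[0] + 1` (6106591458371871951 + 1). The call relies on the default `step=1`, while `test_arr` was generated with `step=5`, so the very first pair already fails the check. One way to avoid this mismatch, sketched here as an assumption rather than part of the original code, is to infer the step from the data. With exactly one missing number and an increasing sequence, at most one difference equals 2 * step, so the smaller of the first two differences is the step. The names `infer_step` and `check_missing_number_auto` are hypothetical.

```python
def infer_step(arr):
    """Hypothetical helper: the step of an increasing sequence with at most one gap."""
    if len(arr) < 3:
        raise ValueError("need at least 3 numbers to tell the step from the gap")
    # only one difference can be 2 * step, so the smaller of the first two is step
    return min(arr[1] - arr[0], arr[2] - arr[1])

def check_missing_number_auto(arr):
    """Linear scan like check_missing_number, with the step inferred from arr."""
    step = infer_step(arr)
    for a, b in zip(arr, arr[1:]):
        if b - a != step:
            return a + step
    return None

print(check_missing_number_auto([10, 15, 25, 30]))  # step 5 -> 20
print(check_missing_number_auto([10, 20, 25, 30]))  # gap in the first pair -> 15
```

In the notebook, either `check_missing_number(test_arr, 5)` or `check_missing_number_auto(test_arr)` should then agree with `missing_num`.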

In [5]:
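%time
# note: `%time` with nothing after it times an empty statement, not the call below;
# the call also uses the default step=1, while test_arr was built with step=5
check_missing_parallel(test_arr)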

CPU times: user 4 µs, sys: 1e+03 ns, total: 5 µs
Wall time: 6.91 µs


[6106591458371871952]
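Two things limit what these timings say. First, in IPython, `%time` alone on a line times an empty statement, so the microsecond figures above do not include either call. `%%time` as the first line of a cell, or `%time expr` on the same line, times the code itself. Second, `ThreadPool` runs threads, and under CPython's GIL a pure-Python loop like `check_missing_number` is not expected to speed up across threads. The plain-Python sketch below times both approaches with `time.perf_counter` on data whose step is passed correctly. It assumes compact re-implementations of the two functions under the hypothetical names `scan` and `scan_threaded`.

```python
import os
import random
import time
from multiprocessing.pool import ThreadPool

def scan(arr, step=1):
    """Sequential linear scan: first missing number, or None."""
    for a, b in zip(arr, arr[1:]):
        if b - a != step:
            return a + step
    return None

def scan_threaded(arr, step=1, workers=None):
    """Threaded variant: windows overlapping by 1 item, first hit in window order."""
    workers = workers or os.cpu_count() or 2
    window = max(1, len(arr) // workers)
    splits = [arr[i : i + window + 1] for i in range(0, len(arr) - 1, window)]
    with ThreadPool(workers) as pool:
        results = pool.map(lambda w: scan(w, step), splits)
    return next((r for r in results if r is not None), None)

def timed(fn, *args):
    """Return (result, seconds) for one call, measured with perf_counter."""
    t0 = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - t0

# test data in the spirit of prepare_test_data: n = 100_000 items, step = 5
step = 5
start_value = random.randint(0, 10**12)
arr = list(range(start_value, start_value + 100_001 * step, step))
missing = arr.pop(random.randrange(1, 100_000))

for fn in (scan, scan_threaded):
    result, seconds = timed(fn, arr, step)
    print(f"{fn.__name__}: {result} (expected {missing}) in {seconds * 1e3:.2f} ms")
```

A single `perf_counter` measurement is noisy. Repeating each call (for example with the `timeit` module) gives steadier numbers.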

# Conclusion
Not worth optimizing: just use the noob's version, which is easier to understand and easier to debug.
Only when we actually see a performance issue do we fix it...
Or simply say `It's not a bug, it's a feature!`
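If a performance issue does show up, the assumptions in the problem statement (sorted, constant step, exactly one missing number) allow a different technique than parallelism: binary search. Before the gap, `arr[i] == arr[0] + i * step`. From the gap on, `arr[i] == arr[0] + (i + 1) * step`. So the boundary can be found in O(log n) comparisons instead of a full O(n) scan. Below is a sketch under those assumptions, with the hypothetical name `find_missing_bisect`.

```python
def find_missing_bisect(arr, step=1):
    """Binary search for the single missing number; None if there is no gap."""
    lo, hi = 0, len(arr) - 1
    if hi < 1 or arr[hi] == arr[0] + hi * step:
        return None  # fewer than 2 items, or the last item is where it should be
    # invariant: arr[lo] is still on track (before the gap), arr[hi] is after it
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if arr[mid] == arr[0] + mid * step:
            lo = mid  # no gap up to mid
        else:
            hi = mid  # gap is at or before mid
    return arr[lo] + step

print(find_missing_bisect([1, 2, 3, 5]))      # 4
print(find_missing_bisect([10, 15, 25], 5))   # 20
```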