### Exploring creating dictionaries with base Python, Numba and C++

Looks like base Python is as fast as anything here for dictionary creation.

But a numba.typed.Dict can be read more than twice as fast as a base Python dict (with basic integer keys, anyway). If you try to account for the type difference by making the keys float64 (a base Python dict isn't restricted to a fixed-width key type like int32), the numba dict still reads roughly 50% faster.

So we could stick to numba.typed.Dict and wrap other operations in numba (e.g. Pythagoras for distances to houses/facilities/bus stops/etc.) for speed.
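As a minimal sketch of that idea, assuming facility coordinates are held in NumPy arrays, a Pythagoras nearest-distance lookup can be compiled with `njit`. The function name `nearest_distance` and the array layout are hypothetical; if numba isn't installed, the no-op stand-in decorator just runs the function as plain python.

```python
import math

import numpy as np

try:
    from numba import njit
except ImportError:
    # No-op stand-in so the sketch still runs (slowly) without numba;
    # handles both bare @njit and @njit(...) usage.
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


@njit
def nearest_distance(x, y, xs, ys):
    """Straight-line (Pythagoras) distance from (x, y) to the closest point (xs[i], ys[i])."""
    best = np.inf
    for i in range(xs.shape[0]):
        dx = xs[i] - x
        dy = ys[i] - y
        dist = math.sqrt(dx * dx + dy * dy)
        if dist < best:
            best = dist
    return best


# Hypothetical bus stop coordinates
stop_xs = np.array([0.0, 3.0, 10.0])
stop_ys = np.array([0.0, 4.0, 10.0])
print(nearest_distance(3.0, 0.0, stop_xs, stop_ys))  # 3.0
```

Keeping the loop inside the compiled function means the per-point arithmetic never goes back through the Python interpreter.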

C++ might be slower because of I/O (presumably the cost of passing data between Python and C++), so it's possible it would be faster with more operations per call. But it's not obvious that it would be.

In [1]:

from numba import njit, vectorize, types
from numba.typed import Dict
import random
from concurrent.futures import ThreadPoolExecutor
import example   # compiled from c++ file in same folder

In [2]:
def make_big_dictionary():
    d = {}
    for i in range(1000000):
        d[i] = "a"  # int key -> str value

    return None


In [3]:
%%time
make_big_dictionary()

CPU times: user 138 ms, sys: 54.7 ms, total: 193 ms
Wall time: 194 ms
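Base python also has built-in ways to build the same dictionary without an explicit loop, such as `dict.fromkeys` or a dict comprehension. These weren't timed here, so whether they beat the loop is untested; the sketch only shows they produce the same result. The function names are hypothetical.

```python
def make_big_dictionary_loop(n):
    d = {}
    for i in range(n):
        d[i] = "a"
    return d


def make_big_dictionary_fromkeys(n):
    # Every key from range(n) maps to the same value object "a"
    return dict.fromkeys(range(n), "a")


def make_big_dictionary_comprehension(n):
    return {i: "a" for i in range(n)}


print(make_big_dictionary_loop(1000) == make_big_dictionary_fromkeys(1000) == make_big_dictionary_comprehension(1000))  # True
```

`dict.fromkeys` shares one value object between all keys, which is fine for an immutable string like `"a"` but not for a mutable value such as a list.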


In [4]:
make_big_dictionary_jit = njit()(make_big_dictionary)

In [5]:
%%time
make_big_dictionary_jit()  # this first call includes JIT compilation, hence the time below. Once compiled it's maybe 10% faster than the non-jit func: would have to run many times to check. It's marginal

CPU times: user 1.31 s, sys: 215 ms, total: 1.53 s
Wall time: 1.32 s
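Because a numba function compiles on its first call, a single `%%time` run mixes compilation with execution. A minimal sketch for the "run many times" check, using only the standard library: call the function once to warm up, then take the best of several timed runs. `best_time` is a hypothetical helper; IPython's `%timeit` magic is an alternative, ideally after one warm-up call.

```python
import timeit


def best_time(func, *args, repeat=5, number=1):
    """Return the fastest per-call time (seconds) of func(*args) over `repeat` trials.

    The first call is made outside the timed trials so that, for a numba
    function, JIT compilation is not counted.
    """
    func(*args)  # warm-up call (triggers compilation for @njit functions)
    trials = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(trials) / number


def build(n):
    return {i: "a" for i in range(n)}


print(f"{best_time(build, 100_000):.4f} s")
```

In the notebook this would be used as e.g. `best_time(make_big_dictionary)` vs `best_time(make_big_dictionary_jit)`. Taking the minimum rather than the mean reduces the influence of background noise on a busy machine.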


In [6]:
make_big_dictionary_jit_parallel = njit(parallel=True)(make_big_dictionary)

In [7]:
%%time
make_big_dictionary_jit_parallel()
# raises a warning: parallel=True found nothing to parallelise (the loop uses range rather than prange, and dictionary insertion can't safely run in parallel anyway)

CPU times: user 425 ms, sys: 24.2 ms, total: 449 ms
Wall time: 454 ms


The keyword argument 'parallel=True' was specified but no transformation for parallel execution was possible.

To find out why, try turning on parallel diagnostics, see https://numba.pydata.org/numba-doc/latest/user/parallel.html#diagnostics for help.

File "<ipython-input-2-cb2b39cfe4ae>", line 1:
def make_big_dictionary():
^
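`parallel=True` only has something to work with when the code contains parallelisable constructs such as `prange` loops or array expressions. As a sketch of a loop it can split across threads (a reduction over a NumPy array, unlike dictionary insertion), assuming numba's `prange`; without numba, the fallback just runs a serial loop. `sum_of_squares` is a hypothetical example function.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Serial stand-ins so the sketch still runs without numba
    prange = range

    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


@njit(parallel=True)
def sum_of_squares(values):
    total = 0.0
    # Iterations are independent; numba treats `total +=` as a reduction
    for i in prange(values.shape[0]):
        total += values[i] * values[i]
    return total


print(sum_of_squares(np.arange(4.0)))  # 0 + 1 + 4 + 9 = 14.0
```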



In [8]:
%%time
example.make_dict(1000000)     # c++ takes maybe 50% longer than base python here

CPU times: user 183 ms, sys: 24.2 ms, total: 207 ms
Wall time: 208 ms


In [9]:
%%time
example.make_dict(100000000) 
# slower than the numba dict (about half the speed using a plain C++ int, the same with 'signed long int').
# So I don't think I/O is to blame

CPU times: user 19.9 s, sys: 7.1 s, total: 27 s
Wall time: 27.7 s


In [10]:
@njit
def make_big_numba_dictionary():
    dict_to_populate = Dict.empty(  # Dict is from numba.typed
        key_type=types.int32,   #types.float64[:] would mean float array
        value_type=types.unicode_type,  # key is int32, value is string
    )
    
    for i in range(1000000):
        dict_to_populate[i] = "a"
        
    return None 
    

In [11]:
%%time
make_big_numba_dictionary()
# takes about the same time with a numba dictionary (this first call also includes JIT compilation)

  dict_to_populate[i] = "a"


CPU times: user 495 ms, sys: 105 ms, total: 600 ms
Wall time: 606 ms


### Now looking at the speed of reading a dictionary

In [12]:
@njit
def read_big_numba_dictionary(dict_to_populate):
    for i in range(len(dict_to_populate)):
        a = dict_to_populate[i]
    return None

In [13]:
def read_big_dictionary_base(dict):
    for i in range(len(dict)):
        a = dict[i]
    return None
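Both read functions index with `range(len(...))`, which only works because the keys are exactly 0 to n-1. A sketch of an alternative that visits every entry without assuming that, by iterating over `items()`; it wasn't timed here, so its speed relative to indexing is untested. `count_matching` is a hypothetical name.

```python
def count_matching(d, target):
    """Count values equal to target, visiting every entry via items()."""
    count = 0
    for _key, value in d.items():
        if value == target:
            count += 1
    return count


sparse = {0: "a", 10: "b", 1000: "a"}
print(count_matching(sparse, "a"))  # 2
# Indexing with range(len(sparse)) would raise KeyError at key 1
```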

In [14]:
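# Neither of these works when called with a base python dict: nopython mode can't type a python dict argument (see the TypingError in In [19])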
read_big_dictionary_jit = njit()(read_big_dictionary_base)
read_big_dictionary_jit_parallel = njit(parallel=True)(read_big_dictionary_base)

In [15]:
@njit
def make_big_numba_dictionary_return_data(dict_size):
    dict_to_populate = Dict.empty(  # Dict is from numba.typed
        key_type=types.int32,   #types.float64[:] would mean float array
        value_type=types.unicode_type,  # key is int32, value is string
    )
    
    for i in range(dict_size):
        dict_to_populate[i] = "a"
        
    return dict_to_populate

dict_to_populate = make_big_numba_dictionary_return_data(1000000)


def make_big_dictionary(dict_size):
    d = {}
    for i in range(dict_size):
        d[i] = "a"  # int key -> str value

    return d

base_dict_to_populate = make_big_dictionary(1000000)



In [16]:
%%time
read_big_numba_dictionary(dict_to_populate)
# This is maybe 10% faster with an int32 key than int64, and 50% slower if you make it float64.
# Note: this is the first call, so the time below includes JIT compilation

CPU times: user 231 ms, sys: 4.96 ms, total: 236 ms
Wall time: 236 ms


In [17]:
%%time
read_big_dictionary_base(base_dict_to_populate)
# slightly under half the speed of read_big_numba_dictionary()

CPU times: user 76 ms, sys: 2.53 ms, total: 78.5 ms
Wall time: 76.7 ms


In [18]:
%%time
example.read_in_dict(base_dict_to_populate)

CPU times: user 201 ms, sys: 17.2 ms, total: 218 ms
Wall time: 218 ms


In [19]:
%%time
read_big_dictionary_jit(base_dict_to_populate)

TypingError: Failed in nopython mode pipeline (step: nopython frontend)
non-precise type pyobject
During: typing of argument at <ipython-input-13-00fe08a05826> (2)

File "<ipython-input-13-00fe08a05826>", line 2:
def read_big_dictionary_base(dict):
    for i in range(len(dict)):
    ^ 

This error may have been caused by the following argument(s):
- argument 0: Cannot determine Numba type of <class 'dict'>
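The TypingError arises because nopython mode can't take a base python `dict` as an argument. One workaround, sketched here under the assumption that all keys are ints and all values strings, is to copy the data into a `numba.typed.Dict` first; the copy loop runs in the interpreter, so it has its own cost. `to_typed_dict` is a hypothetical name; without numba it falls back to a plain dict copy.

```python
try:
    from numba import types
    from numba.typed import Dict

    def to_typed_dict(py_dict):
        """Copy an int -> str python dict into a numba.typed.Dict."""
        typed = Dict.empty(key_type=types.int64, value_type=types.unicode_type)
        for key, value in py_dict.items():
            typed[key] = value
        return typed

except ImportError:
    def to_typed_dict(py_dict):
        # Fallback without numba: a plain copy, so the sketch still runs
        return dict(py_dict)


typed_copy = to_typed_dict({0: "a", 1: "a", 2: "a"})
print(len(typed_copy), typed_copy[1])  # 3 a
```

In the notebook, `read_big_numba_dictionary(to_typed_dict(base_dict_to_populate))` would then type-check, although the conversion time would need to be counted in any comparison.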


In [None]:

# for testing at a larger scale (100 million entries)
giant_base_dict_to_populate = make_big_dictionary(100000000)
giant_numba_dict_to_populate = make_big_numba_dictionary_return_data(100000000)


In [None]:
%%time
read_big_numba_dictionary(giant_numba_dict_to_populate)   # time taken increases more than linearly with size 
                                                            # (x400 for x100 size increase)
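To pin down how the time grows with size, one option is to compare the time per entry across several sizes: if reading scaled linearly, the per-entry time would stay roughly flat. A minimal sketch with hypothetical names, taking the make and read functions as arguments so it could be pointed at `make_big_numba_dictionary_return_data` / `read_big_numba_dictionary` as well as at base python.

```python
import time


def per_entry_times(make, read, sizes):
    """Return {size: seconds per entry} for read(make(size))."""
    results = {}
    for n in sizes:
        d = make(n)
        read(d)  # warm-up, so JIT compilation isn't counted for numba functions
        start = time.perf_counter()
        read(d)
        results[n] = (time.perf_counter() - start) / n
    return results


def make_base(n):
    return {i: "a" for i in range(n)}


def read_base(d):
    for i in range(len(d)):
        d[i]


for n, t in per_entry_times(make_base, read_base, [10_000, 100_000, 1_000_000]).items():
    print(f"{n:>9,} entries: {t * 1e9:.1f} ns per entry")
```

Each size is timed once after the warm-up, so for noisy results it would be worth repeating the measurement a few times.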