# **Python Lists and Tuples - Possible Use Cases**

In Python, the most commonly used data structure is the list. Compared to arrays in many other programming languages, Python lists are dynamic and easy to use. Tuples are used less often than lists, but they are essentially an immutable kind of list (meaning they cannot be changed once created; to "change" one you have to create a new tuple).


When we create a new list, Python first allocates a block of memory that holds the addresses (pointers to the actual data) of the items it is going to contain. This is why lists are so flexible: the list itself stores only references, not the data or its type. Another interesting fact is that a Python list also keeps track of its own size, **which is why the `len` function has O(1) time complexity on lists**.
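A quick way to see that a list stores references rather than the values themselves is to compare two lists of equal length that hold very different objects. This is only a minimal sketch, assuming CPython, where `sys.getsizeof` reports the size of the list object itself; the names `small_items`, `large_items` and `mixed` are made up for the illustration.

```python
import sys

# Two lists built the same way, with the same number of elements.
small_items = [i for i in range(1000)]            # small integers
large_items = ["x" * 1000 for _ in range(1000)]   # 1000-character strings

# getsizeof counts the list's own pointer array, not the objects it points to,
# so both lists are expected to report the same size on CPython.
print(sys.getsizeof(small_items), sys.getsizeof(large_items))

# A single list can also mix types freely, since each slot is just a reference.
mixed = [1, "two", 3.0, [4]]
print([type(item).__name__ for item in mixed])
```

The strings take far more memory than the integers, but that memory belongs to the string objects, not to the list.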

In [16]:
arr_big = [int(i) for i in range(10000000)]
arr_small = [int(i) for i in range(10)]

In [65]:
from time import time
from random import randint

The reason for measuring time manually, as below, is that `len()` is so fast that the `%timeit` magic function gave unreliable results for some reason. Honestly, I am not exactly sure what causes that issue. As a workaround, I measured the speeds myself.

[More Info regarding above](https://stackoverflow.com/questions/32248882/complexity-of-len-with-regard-to-sets-and-lists)

In [63]:
n_iter = 1000000
t = 0
for i in range(n_iter):
    st = time()
    x = len(arr_big)
    et = time()
    t += et - st

# Divide by the number of iterations actually run
print("Average Time:", t / n_iter)

Average Time: 1.1945462226867677e-07


In [64]:
n_iter = 1000000
t = 0
for i in range(n_iter):
    st = time()
    x = len(arr_small)
    et = time()
    t += et - st

# Divide by the number of iterations actually run
print("Average Time:", t / n_iter)

Average Time: 1.1460018157958985e-07


As we can see, the average time is tiny and almost equal in both cases.

Also, since the data is stored as an array of addresses, an item at a known index can be located directly, which means constant time complexity.
(item address = start address + index * pointer size)

In [88]:
n_iter = 1000000
t = 0
for i in range(n_iter):
    idx = 500000  # randint(0, len(arr_big) - 1)

    st = time()
    x = arr_big[idx]
    et = time()

    t += et - st

# Divide by the number of iterations actually run
print("Average Time:", t / n_iter)

Average Time: 8.819222450256348e-08


In [89]:
n_iter = 1000000
t = 0
for i in range(n_iter):
    idx = 5  # randint(0, len(arr_small) - 1)

    st = time()
    x = arr_small[idx]
    et = time()

    t += et - st

# Divide by the number of iterations actually run
print("Average Time:", t / n_iter)

Average Time: 7.789707183837891e-08


We can observe that the execution times are extremely close to each other even though the list sizes are vastly different. (This is general C array behaviour if you ask me! I also think array values get cached, so if we changed the index on each iteration, the large list might show a slightly higher average execution time due to cache misses. Not sure though!)

But if we don't know the index, we need to go through the list items one by one to find a value. In such cases Python lists have O(n) time complexity (in fact, the list `index()` method behaves this way). If we need more speed than that, we will need a different data structure, such as a dictionary.
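To make the difference concrete, here is a small sketch that finds the position of a value first with `list.index` and then through a dictionary built from the same data. The sample values and the `index_of` dictionary are assumptions made up for this example.

```python
arr = [10, 40, 20, 50, 30]

# O(n): list.index scans from the left until it finds the value.
pos_by_scan = arr.index(50)

# One-off O(n) pass to build a value -> index mapping.
# If a value repeats, setdefault keeps the first position, like list.index does.
index_of = {}
for i, value in enumerate(arr):
    index_of.setdefault(value, i)

# Average O(1): a hash lookup instead of a scan.
pos_by_dict = index_of[50]

print(pos_by_scan, pos_by_dict)  # 3 3
```

Building the dictionary costs one full pass, so it only pays off when the same data is searched many times.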

On the other hand, if the list is sorted, we can use binary search to find values in O(log n) time. Python's built-in list `sort()` method uses Timsort, a combination of the insertion sort and merge sort algorithms, which means O(n log n) complexity.

In [157]:
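def binary_search(arr, val):
    """Return the index of val in the sorted list arr, or -1 if it is absent."""
    lidx = 0
    ridx = len(arr) - 1

    # Narrow the [lidx, ridx] window until it is empty or val is found
    while lidx <= ridx:
        mid = (lidx + ridx) // 2

        if arr[mid] < val:
            lidx = mid + 1      # val can only be in the right half
        elif arr[mid] > val:
            ridx = mid - 1      # val can only be in the left half
        else:
            return mid

    return -1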


In [158]:
arr = [i for i in range(100)]
binary_search(arr, 99)

99

This is still slower than a dictionary lookup, but it may be useful in some cases.

Python's built-in `bisect` module also helps with additional tasks on a sorted list, such as inserting new items or finding the closest element, while keeping the list sorted.
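As a first small illustration of the module, an exact-match lookup can be written on top of `bisect.bisect_left`. This is just a sketch: the helper name `index_sorted` and the sample list are made up here, and it assumes the input list is already sorted.

```python
import bisect

def index_sorted(arr, val):
    """Return the index of val in the sorted list arr, or -1 if absent."""
    # bisect_left gives the leftmost position where val could be inserted
    idx = bisect.bisect_left(arr, val)
    if idx < len(arr) and arr[idx] == val:
        return idx
    return -1

sorted_values = [3, 8, 15, 23, 42]
print(index_sorted(sorted_values, 23))  # 3
print(index_sorted(sorted_values, 9))   # -1
```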

In [164]:
import bisect

In [196]:
def find_closest(arr, val):
    '''
    Return the index of the element of the sorted list arr closest to val.

    bisect_left returns the position where val would be inserted,
    so the result has to be adjusted a bit to get the closest element.
    '''
    idx = bisect.bisect_left(arr, val)

    # val is greater than every element: the last element is the closest
    if idx == len(arr):
        return idx - 1

    # val is already present in arr
    if arr[idx] == val:
        return idx

    # val is lower than every element: the first element is the closest
    if idx == 0:
        return 0

    # Otherwise compare the neighbours on the left and right of the insertion point
    if arr[idx] - val > val - arr[idx - 1]:
        return idx - 1
    else:
        return idx

In [197]:
arr = [14, 265, 496, 661, 683, 734, 881, 892, 973, 992]

In [198]:
arr[find_closest(arr, -250)]

14

In [199]:
arr[find_closest(arr, 500)]

496

In [200]:
arr[find_closest(arr, 1100)]

992

In [201]:
for i in range(10):
    new_number = randint(0, 1000)
    bisect.insort(arr, new_number)
print(arr)

[14, 159, 163, 174, 244, 265, 406, 429, 496, 531, 537, 661, 683, 734, 881, 892, 924, 927, 973, 992]


`bisect.insort` lets us insert values into an already sorted list without breaking the sorted order.

So what about tuples? What are the differences between lists and tuples?
As we already know, tuples are immutable, meaning their contents cannot be changed once they are created.

This means tuples are inherently meant for a different purpose. They are good for describing multiple properties of an object that will not change once it is created.
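A short sketch of that usage follows. The `point` record and its field layout are made up for the example.

```python
# A record whose fields are not expected to change (values are made up).
point = ("origin", 0.0, 0.0)   # (label, x, y)

label, x, y = point            # fields are unpacked by position

try:
    point[1] = 5.0             # tuples do not support item assignment
    modified = True
except TypeError:
    modified = False

# "Changing" a tuple really means building a new one.
moved = (point[0], 5.0, point[2])
print(modified, moved)
```

The original `point` is left untouched; only the new tuple `moved` carries the updated value.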

Since Python lists are dynamic, we can append items on the fly. But this operation can be costly if many elements have to be moved in memory every time the list grows. Therefore, like any dynamically allocated data structure, a list overallocates memory when it has to grow. In CPython versions before 3.9 (the growth rule was changed later), the number of extra slots M reserved on top of the new size N follows the formula below.

**<center>M = (N >> 3) + (3 if N < 9 else 6)</center>**

In [7]:
N = 5
M = (N >> 3) + (3 if N < 9 else 6)  # N right-shifted by 3 bits (N // 8), plus a constant
M

3

One thing to note is that when you define a list directly with its items, it does not overallocate. But when items are added to a list using `append`, overallocation kicks in.
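The effect of overallocation on `append` can be observed without any extra tools by watching how the size of the list object changes. This is a rough sketch that assumes CPython, where `sys.getsizeof` reflects the currently allocated pointer array; the exact sizes and the number of resizes depend on the Python version, so none are asserted here.

```python
import sys

a = []
sizes = [sys.getsizeof(a)]
for i in range(100):
    a.append(i)
    sizes.append(sys.getsizeof(a))

# Count how many appends actually changed the list's allocated size.
resizes = sum(1 for before, after in zip(sizes, sizes[1:]) if after != before)
print("appends:", len(a), "resizes:", resizes)
```

Most appends leave the reported size unchanged, because they fill slots that an earlier resize had already reserved.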

In [1]:
%load_ext memory_profiler

In [2]:
%memit x = [i*i for i in range(1000_000)]

peak memory: 153.79 MiB, increment: 73.35 MiB


In [3]:
%%memit
a = []
for i in range(1000_000):
    a.append(i*i)

peak memory: 157.78 MiB, increment: 38.20 MiB


In [30]:
%timeit [i*i for i in range(100_000)]

4.68 ms ± 191 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [31]:
%%timeit
a = []
for i in range(100_000):
    a.append(i*i)

7.24 ms ± 62.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


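On the other hand, since tuples are static, they do not overallocate. Instead, they are allocated at exactly the required size. Because of this, tuples are a bit faster to create than lists. Part of the gap in the literal benchmark below also comes from CPython storing a tuple of constants as a single precomputed constant, so no new tuple has to be built on each run.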

In [5]:
%timeit [0,1,2,3,4,5,6,7,8,9]

52 ns ± 0.434 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [6]:
%timeit (0,1,2,3,4,5,6,7,8,9)

6.35 ns ± 0.0361 ns per loop (mean ± std. dev. of 7 runs, 100,000,000 loops each)
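The difference also shows up in memory use. The sketch below compares the size of a list and a tuple holding the same ten items; it assumes CPython, and the exact byte counts vary between versions, so only the comparison matters.

```python
import sys

items = range(10)
as_list = list(items)
as_tuple = tuple(items)

# The tuple has a smaller header and no spare capacity for growth.
print(sys.getsizeof(as_list), sys.getsizeof(as_tuple))
```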
