# Python Lists and Arrays


This notebook contains the code and most of the text for the article [Python Lists vs. Arrays:  How to Choose Between Them](https://codesolid.com/python-lists-vs-arrays/).  See the article for further details and analysis.

When newcomers think of arrays, they may assume that Python implements them as the built-in list type.  However, as in other languages, Python lists and arrays are actually quite different.  Even experienced Python developers often reach for the list type first when they need a sequence type, but the way arrays are implemented has advantages of its own.

As with most things in Python, the language's consistency means that you can often easily replace one with the other, or at least that their APIs are very similar.  Let's look at lists and arrays in turn.

In [1]:
import sys
sys.version

'3.8.12 (9ef55f6fc369, Oct 25 2021, 05:10:01)\n[PyPy 7.3.7 with GCC Apple LLVM 13.0.0 (clang-1300.0.29.3)]'

In [1]:

from datetime import datetime

# A list initialized with a literal
short_list = ["first", 2, datetime.now()]

# Define a list using a list comprehension
long_list = [x for x in range(1,101)]


# Lists grow dynamically.
long_list_2 = list()
for x in range(1,101):
    long_list_2.append(x)
    
print(long_list == long_list_2)

True


From the first example above (```short_list```), we see that lists in Python are able to hold elements of disparate types.  The second example (```long_list```) shows a list comprehension to generate a list.  The third example creates a list identical to the one in the second example, using a simple loop as one might do in another language.  The other thing this demonstrates is that the list is created dynamically, and mutated in place.  We call append to add each element in turn, and the list grows in size to accommodate this.

Despite the name and the ability to grow dynamically, Python lists are not implemented internally as linked lists.  Rather, each list is an array of references to objects, and the memory for that array is over-allocated as items are appended so that most appends do not require a resize.  Because the underlying storage is an array, indexing any element is efficient no matter where in the list the element is located.
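One way to observe the over-allocation described above is to watch the reported size of a list while appending to it. The following is a minimal sketch that assumes CPython; ```sys.getsizeof``` is implementation-specific and may not be supported on other interpreters such as PyPy. The variable names are illustrative only.

```python
import sys

sizes = []
numbers = []
for n in range(64):
    numbers.append(n)
    # getsizeof reports the list object itself (its header plus the array
    # of references), not the objects that the references point to.
    sizes.append(sys.getsizeof(numbers))

# Collect the list lengths at which the reported size changed.
growth_points = [i + 1 for i in range(1, len(sizes)) if sizes[i] != sizes[i - 1]]

print(f"{len(numbers)} appends, {len(set(sizes))} distinct sizes")
print(f"Size changed at lengths: {growth_points}")
```

If the size changes far less often than once per append, the list is reserving spare capacity ahead of time rather than resizing on every call.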

Next we turn to Python arrays.

There are a few differences between lists and arrays that we should go over up front:

* The list type is always available as part of Python's built-ins.  With arrays, you have to import the array class from the array module.
* Arrays do not have a literal syntax like the ```[item1, item2, ..., itemN]``` syntax that can be used to initialize a list.  For arrays, the array constructor must be used.
* Arrays must always contain objects of the same type, and the type must be specified when the object is initialized.

Some examples will make this clear.
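As a first illustration of the differences listed above, here is a short sketch showing that an array is built with its constructor and a type code, and that it rejects values of the wrong type while a list accepts them. The variable names are chosen for this example only.

```python
from array import array

# Arrays have no literal syntax: pass a type code and an iterable.
int_array = array("i", [1, 2, 3])
int_array.append(4)

# Appending a value of the wrong type is rejected.
rejected = False
try:
    int_array.append("five")
except TypeError as exc:
    rejected = True
    print(f"TypeError: {exc}")

# A list accepts the same value without complaint.
mixed_list = [1, 2, 3]
mixed_list.append("five")
print(int_array.tolist())
print(mixed_list)
```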

In [2]:
b1 = b"123\n4"
print(f"Bytes of length: {len(b1)}:  {b1}")

b2 = br"123\n4"
print(f"Raw bytes of length: {len(b2)}:  {b2}")


Bytes of length: 5:  b'123\n4'
Raw bytes of length: 6:  b'123\\n4'


In [60]:
b1 = b"Hello, Python fans!"
print(b1[7:13])
print(b1[7])

# This code would raise an exception:
# TypeError: 'bytes' object does not support item assignment
# b1[7] = 'J'

b'Python'
80


In [62]:
bytes_object = b'Python'
print(type(bytes_object[0]))

<class 'int'>


In [3]:
bytearray_object = bytearray(100)
print(bytearray_object[0:5])
bytearray_object[0:5] = b'Hello'
print(bytearray_object[0:5])

bytearray(b'\x00\x00\x00\x00\x00')
bytearray(b'Hello')


In [70]:
class FileIterator:
    def __init__(self, filename):
        self.filename = filename
        self.file_handle = open(filename, 'r')
        
    def readline(self):
        while line := self.file_handle.readline():
            yield line
        self.file_handle.close()
        
requirements = FileIterator('requirements.txt')
for line in requirements.readline():
    print(line)


alabaster==0.7.12

anyio==3.5.0

appnope==0.1.2

argcomplete==2.0.0

argon2-cffi==21.3.0

argon2-cffi-bindings==21.2.0

asttokens==2.0.5

attrs==21.4.0

Babel==2.9.1

backcall==0.2.0

black==21.12b0

bleach==4.1.0

certifi==2021.10.8

cffi==1.15.0

charset-normalizer==2.0.10

click==8.0.3

colorama==0.4.4

cycler==0.11.0

debugpy==1.5.1

decorator==5.1.1

defusedxml==0.7.1

distlib==0.3.4

docutils==0.17.1

entrypoints==0.3

executing==0.8.2

filelock==3.4.2

fonttools==4.29.0

idna==3.3

imagesize==1.3.0

iniconfig==1.1.1

ipykernel==6.7.0

ipython==8.0.1

ipython-genutils==0.2.0

ipywidgets==7.6.5

jedi==0.18.1

Jinja2==3.0.3

json5==0.9.6

jsonschema==4.4.0

jupyter==1.0.0

jupyter-client==7.1.2

jupyter-console==6.4.0

jupyter-core==4.9.1

jupyter-server==1.13.4

jupyterlab==3.2.8

jupyterlab-pygments==0.1.2

jupyterlab-server==2.10.3

jupyterlab-widgets==1.0.2

kiwisolver==1.3.2

livereload==2.6.3

MarkupSafe==2.0.1

matplotlib-inline==0.1.3

mistune==0.8.4

mypy-extensions==0.4.3

In [90]:
print(sum((x for x in range(10) if x %2 == 1)))

25


In [4]:
from array import array

# Build an array of unicode characters (strings) from a single larger string.
string_elements = array("u", "Hello, array lovers everywhere!")
print(len(string_elements))
print(string_elements[0])
print(string_elements[0:5])
print(type(string_elements[0]))
print(string_elements.itemsize)

31
H
array('u', 'Hello')
<class 'str'>
4


In the example above, the first parameter to the array constructor is a type code, in this case saying we're dealing with Unicode characters.  The second parameter is a string, which is converted into an array of type ```wchar_t``` internally in C.  However, when we access an element of the array, Python converts the character to the Python string type, as we see in the fourth line of output.  Slices, by contrast, come back as a new array with the same type code, as the third line shows.

We get a very different result if we change the type to an array of bytes.

In [18]:
from array import array

# Build an array of unsigned bytes from a bytes literal.
string_elements = array('B', b"Hello, array lovers everywhere!")

print(len(string_elements))
print(string_elements[0])
print(string_elements[0:5])
print(type(string_elements[0]))
print(string_elements.itemsize)

31
72
array('B', [72, 101, 108, 108, 111])
<class 'int'>
1


In [5]:
# Build an array from the list
long_list = [x for x in range(1,101)]

# Array of signed short integers
short_array = array("h", long_list)

# Array of unsigned long integers
long_array = array("L", long_list)

In the byte-array example above (type code ```'B'```), the "b" prefix before the string literal makes it a bytes object, and each element is stored internally as an unsigned char.  Because the text contains only ASCII characters, the length Python returns is 31 for both the Unicode array and the byte array: UTF-8 encodes each of the 128 ASCII characters as the same single byte that ASCII uses.  Text containing non-ASCII characters would need more bytes than it has characters.
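The relationship between character count and byte count can be checked with a small sketch. The sample strings here are made up for illustration, and the accented character is written as an escape so that it is unambiguously the single code point U+00E9.

```python
from array import array

ascii_text = "Hello"
accented_text = "h\u00e9llo"  # "e" with an acute accent, one code point

for text in (ascii_text, accented_text):
    encoded = text.encode("utf-8")
    byte_array = array("B", encoded)
    print(f"{text!r}: {len(text)} characters, {len(byte_array)} bytes")
```

The ASCII string has the same length either way, while the accented string has five characters but six bytes, because UTF-8 uses two bytes for U+00E9.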

In [20]:
double_array = array("d", [42.42])
print(double_array.itemsize)

8
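The ```itemsize``` attribute also gives a rough way to compare how much raw data different type codes need for the same values. This is only a sketch: it counts the element storage and ignores the fixed overhead of the array object, and the exact item sizes depend on the platform's C types.

```python
from array import array

values = list(range(1, 101))
short_array = array("h", values)   # signed short integers
double_array = array("d", values)  # double-precision floats

for arr in (short_array, double_array):
    payload = arr.itemsize * len(arr)
    print(f"type code {arr.typecode!r}: {arr.itemsize} bytes per item, "
          f"{payload} bytes of element data")
```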


In addition to the space savings (in the case of more compact data types like short integers or chars), it's possible that arrays also differ from lists in runtime efficiency.  A list stores references to full Python objects, while an array stores raw values and has to convert each one to a Python type when it is accessed, so either the lookup code or the looping code may be faster in one case than in the other.

To test this out, we can use the timeit function to see if the iteration time differs significantly between arrays and lists.  In each case we'll iterate over a sequence of 10,000 numbers 10,000 times.

In [6]:
from timeit import timeit

result = timeit(setup="num_list = [x for x in range(0,10000)]", stmt="for i in num_list: z = i", number=10000)
print(result)

0.2262013330037007


In [7]:
from timeit import timeit

result = timeit(setup="from array import array; num_array = array('i', [x for x in range(0,10000)])", stmt="for i in num_array: z = i", number=10000)
print(result)

0.24180920800426975


In [4]:
from timeit import timeit

result = timeit(setup="import numpy as np; np_array = np.array([x for x in range(0,10000)])", stmt="for i in np_array: z = i", number=10000)
print(result)

  add_newdoc('numpy.core.multiarray', 'ndarray', ('__class_getitem__',
  add_newdoc('numpy.core.multiarray', 'dtype', ('__class_getitem__',
  add_newdoc('numpy.core.numerictypes', 'number', ('__class_getitem__',


35.17996854199737


In [8]:
# Sum the same values four ways.  First build a long sequence (256,000 values)
# of numbers that each fit in a byte.

from timeit import timeit

iterations = 500

def display(elapsed_time, message, rounding=3):
    print(f"{round(elapsed_time, rounding)} seconds -- {message}.")

result = timeit(setup="num_list = [n for n in range(0,256)] * 1000", 
                stmt="sum(num_list)", number=iterations)
display(result, "Python sum function on a list of numbers")

result = timeit(setup="l = [n for n in range(0,256)] * 1000; num_array = bytearray(l)", 
                stmt="sum(num_array)", number=iterations)
display(result, "Python sum function on a bytearray of numbers")

result = timeit(setup="import numpy as np; l = [n for n in range(0,256)] * 1000; np_array = np.array(l)", 
                stmt="sum(np_array)", number=iterations)
display(result, "Python sum function on a NumPy array", rounding=4)

result = timeit(setup="import numpy as np; l = [n for n in range(0,256)] * 1000; np_array = np.array(l)", 
                stmt="np_array.sum()", number=iterations)
display(result, "NumPy sum method on a NumPy array")


0.29 seconds -- Python sum function on a list of numbers.
0.633 seconds -- Python sum function on a bytearray of numbers.
74.7397 seconds -- Python sum function on a NumPy array.
0.056 seconds -- NumPy sum method on a NumPy array.


In [10]:
list_numbers = [0, 0, 0, 0, 0]
print(list_numbers)
# Slicing a list makes a copy, so changing the slice leaves the original alone.
first_two = list_numbers[0:2]
first_two[0] = 100
first_two[1] = 99
print(list_numbers)

[0, 0, 0, 0, 0]
[0, 0, 0, 0, 0]


In [11]:
list_numbers = [0, 0, 0, 0, 0]
print(list_numbers)
# Assigning to a slice mutates the list in place and can change its length.
list_numbers[0:1] = [5, 6]
print(list_numbers)

[0, 0, 0, 0, 0]
[5, 6, 0, 0, 0, 0]


In [38]:
memoryview_numbers = memoryview(bytearray(5))
print(memoryview_numbers.tolist())
# Slicing a memoryview shares the underlying buffer instead of copying it.
first_two = memoryview_numbers[0:2]
first_two[0] = 100
first_two[1] = 99
print(memoryview_numbers.tolist())

[0, 0, 0, 0, 0]
[100, 99, 0, 0, 0]
