# Introducing Python Object Types (Data Structures)

Before we get to the code, let’s establish how this chapter fits into the overall Python picture. From a more concrete perspective, Python programs
can be decomposed into modules, statements, expressions, and objects, as follows:

1. Programs are composed of modules.
2. Modules contain statements.
3. Statements contain expressions.
4. Expressions create and process objects.

## Numbers

Python’s core object set includes the usual suspects: integers that have no
fractional part, floating-point numbers that do, and more exotic types—complex numbers
with imaginary parts, decimals with fixed precision, rationals with numerator and
denominator, and full-featured sets. Built-in numbers are enough to represent most
numeric quantities—from your age to your bank balance—but more types are available
as third-party add-ons.
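As a quick illustration of the more exotic types named above, here is a minimal sketch using the standard-library decimal and fractions modules (they ship with Python but must be imported). The sample values are arbitrary.

```python
from decimal import Decimal
from fractions import Fraction

# Complex numbers: the imaginary part is written with a trailing j
z = (1 + 2j) * (3 - 1j)

# Decimals store exact base-10 digits, avoiding binary float rounding
d = Decimal('0.1') + Decimal('0.2')

# Fractions keep an exact numerator and denominator
f = Fraction(1, 3) + Fraction(1, 6)

# Sets are unordered collections of unique items (duplicates collapse)
s = {1, 2, 2, 3}

print(z, d, f, s)  # prints: (5+5j) 0.3 1/2 {1, 2, 3}
```

Compare the Decimal result with plain floats, where 0.1 + 0.2 is not exactly 0.3.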

In [3]:
# Integer Addition
123 + 222

345

In [2]:
# Floating-point multiplication
2.5 * 4

10.0

In [3]:
# Power: 2 raised to the 4th power (** is exponentiation)
2 ** 4

16
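One caution about notation: in Python, ^ is not a power operator. A small sketch contrasting the two (pow is the built-in function form of **):

```python
# ** is exponentiation: 2 raised to the 4th power
power = 2 ** 4

# ^ is bitwise exclusive-or on integers, not a power operator:
# 0b010 ^ 0b100 == 0b110
xor = 2 ^ 4

# pow() is the built-in function equivalent of **
also_power = pow(2, 4)

print(power, xor, also_power)  # prints: 16 6 16
```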

Besides expressions, a handful of useful numeric modules ship with Python (modules are just packages of additional tools that we import to use).

The math module contains more advanced numeric tools as functions, while the
random module performs random-number generation and random selections (here, from a Python list coded in square brackets: an ordered collection of other objects, introduced later in this chapter):

In [4]:
import math
math.pi

3.141592653589793

In [5]:
math.sqrt(9)

3.0

In [14]:
import random
random.random()

0.3274396710674369

In [13]:
random.choice([1,4,3,4])

1

## Strings

Strings are used to record both textual information (your name, for instance) as well
as arbitrary collections of bytes (such as an image file’s contents). Strictly speaking, strings
are sequences of one-character strings; other, more general sequence types include
lists and tuples, covered later.
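A short sketch of both points, assuming Python 3: indexing a string gives back another one-character string (there is no separate character type), while raw bytes live in a separate bytes type. The b'\x89PNG' value is a made-up sample, not read from an actual file.

```python
S = 'Spam'

# Indexing a string yields another string of length 1
first = S[0]
print(type(first).__name__, len(first))  # prints: str 1

# Raw bytes use a b'' literal; indexing bytes yields an int in range 0..255
data = b'\x89PNG'
print(data[0], data[1:])  # prints: 137 b'PNG'
```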

In [16]:
# Make a 4-character string, and assign it to a name
S = 'Spam'

In [17]:
# Length
len(S)

4

In [18]:
# The first item in S, indexing by zero-based position
S[0]

'S'

In [19]:
# The second item from the left
S[1]

'p'

In [20]:
# The last item from the end in S
S[-1]

'm'

In [21]:
# The second to last item from the end
S[-2]

'a'

In [22]:
# Slice of S from offsets 1 through 2 (not 3)
S[1:3]

'pa'

In [23]:
# Everything past the first
S[1:]

'pam'

In [24]:
# Everything but the last three
S[:-3]

'S'

In [25]:
# Everything but the last two
S[:-2]

'Sp'

In [26]:
# The last char
S[-1]

'm'

In [27]:
# Everything up to, but not including, offset 2
S[:2]

'Sp'

In [33]:
# All of S as a top-level copy
S[:]

'Spam'

In [29]:
# Concatenation
S + 'xyz'

'Spamxyz'

In [34]:
# For numbers, + means addition
2 + 5

7

In [37]:
# Repetition
S*8

'SpamSpamSpamSpamSpamSpamSpamSpam'

In [38]:
# Note the difference between [:-2] and [:2] slicing
S1 = 'SpamSpamSpamSpamSpamSpamSpamSpam'
# Everything but the last two
S1[:-2]


'SpamSpamSpamSpamSpamSpamSpamSp'

In [39]:
# From offset 0 up to, but not including, offset 2
S1[:2]

'Sp'

**Polymorphism:** Notice that the plus sign (+) means different things for different objects: addition for numbers, and concatenation for strings. This is a general property of Python that we’ll call polymorphism: the meaning of an operation depends on the objects being operated on.
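To see polymorphism at work in your own code, here is a sketch with a hypothetical helper named combine: it never checks the types of its arguments, and + does whatever those types define.

```python
def combine(x, y):
    # The meaning of + is decided by the objects passed in, at runtime
    return x + y


print(combine(2, 5))           # prints: 7
print(combine('Spam', 'xyz'))  # prints: Spamxyz
print(combine([1, 2], [3]))    # prints: [1, 2, 3]
```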

**Immutability:** Strings are immutable in Python; they cannot be changed in place after they are created. For example, you can’t change a string by assigning to one of its positions, but you can always build a new one and assign it to the same name. Immutability can be used to guarantee that an object remains constant throughout your program.

In [40]:
# Immutable objects cannot be changed
S[0] = 'z'

TypeError: 'str' object does not support item assignment

In [41]:
# But we can run expressions to make new objects
S = 'z' + S[1:]
S

'zpam'

In [42]:
S

'zpam'

Every object in Python is classified as either immutable (unchangeable) or not. In terms of the core types, *numbers*, *strings*, and *tuples* are immutable; *lists*, *dictionaries*, and *sets* are not. They can be changed in place freely, as can most new objects you’ll code with classes.
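A minimal sketch of the practical difference: a change made in place to a list is visible through every name that references it, whereas a string operation builds a new object and leaves the original alone.

```python
# A list changes in place: every name referencing it sees the change
L = [1, 2, 3]
alias = L
L[0] = 99
print(alias)  # prints: [99, 2, 3]

# A string cannot change in place: "changing" it builds a new object
S = 'Spam'
original = S
S = 'z' + S[1:]
print(original, S, S is original)  # prints: Spam zpam False
```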

In addition to generic sequence operations, though, strings also have operations all their own, available as *methods*—functions that are attached to and act upon a specific object.

In [51]:
S = 'Spam'

# Find the offset of a substring in S
z = 'pa'
S.find(z)

1

In [52]:
S.find?

In [53]:
S

'Spam'

In [54]:
# Replace occurrences of a string in S with another
S.replace('pa', 'XYZ')

'SXYZm'

In [55]:
# The original string is unchanged
S

'Spam'

**Other methods:** Strings also have methods to split on a delimiter, convert case, test their content (letters, digits, and so on), and strip whitespace characters off their ends.
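Here is a small sketch of these method families, plus join (the inverse of split). The sample strings are arbitrary.

```python
line = 'aaa|bbb|cccc|dd'

# split breaks a string into a list; join is its inverse
parts = line.split('|')
rebuilt = '|'.join(parts)
print(parts, rebuilt == line)  # prints: ['aaa', 'bbb', 'cccc', 'dd'] True

# strip removes whitespace from both ends (rstrip/lstrip do one side)
padded = '  spam\n'
print(repr(padded.strip()))  # prints: 'spam'

# Content tests return True or False; lower returns a new string
print('123'.isdigit(), 'spam'.isalpha(), 'Spam'.lower())  # prints: True True spam
```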

In [63]:
line = 'aaa|bbb|cccc|dd'
line


'aaa|bbb|cccc|dd'

In [64]:
# Split on a delimiter into a list of substrings
line.split('|')

['aaa', 'bbb', 'cccc', 'dd']

In [65]:
S = 'spam'

# Upper- and lowercase conversions
S.upper()

'SPAM'

In [66]:
S

'spam'

In [67]:
# Content tests: isalpha, isdigit, etc.
S.isalpha()

True

In [68]:
line = 'aaa, bbb, cccc, dd\n  '

# Remove whitespace characters on the right side
line.rstrip()

'aaa, bbb, cccc, dd'

In [69]:
line

'aaa, bbb, cccc, dd\n  '

In [70]:
# Combine two operations: strip the right side, then split into a list
line = line.rstrip().split(',')

In [71]:
line

['aaa', ' bbb', ' cccc', ' dd']

**Getting Help:** The built-in dir function returns a list of all the attributes available for any object passed to it. Assuming S is still the string, here are its attributes (dir gives only the methods’ names, not what they do):

In [72]:
dir(S)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'removeprefix',
 'removesuffix',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
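 'swapcase',

Names with leading and trailing double underscores are implementation hooks. For example, the string’s __add__ method is what actually performs concatenation when you use +:
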
In [73]:
S + 'NI!'

'spamNI!'

In [74]:
S.__add__('NI!')

'spamNI!'
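Since dir mixes both kinds of names together, a list comprehension (covered later in this chapter) can filter out the double-underscore hooks. This is just one convenient way to see the ordinary methods, not an official API.

```python
S = 'spam'

# Keep only names that don't start with a double underscore
methods = [name for name in dir(S) if not name.startswith('__')]
print(methods[:5])

# The + operator and the __add__ method produce the same result
print(S + 'NI!' == S.__add__('NI!'))  # prints: True
```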

The help function is one of a handful of interfaces to a system of code that ships with Python
known as PyDoc, a tool for extracting documentation from objects. Where dir lists names, help explains what a given method does:

In [75]:
help(S.replace)

Help on built-in function replace:

replace(old, new, count=-1, /) method of builtins.str instance
    Return a copy with all occurrences of substring old replaced by new.
    
      count
        Maximum number of occurrences to replace.
        -1 (the default value) means replace all occurrences.
    
    If the optional argument count is given, only the first count occurrences are
    replaced.



## Lists

The Python list object is the most general sequence provided by the language. Lists
are positionally ordered collections of arbitrarily typed objects, and they have no fixed
size. They are also mutable—unlike strings, lists can be modified in place by assignment
to offsets as well as a variety of list method calls. Accordingly, they provide a
very flexible tool for representing arbitrary collections—lists of files in a folder,
employees in a company, emails in your inbox, and so on.

In [76]:
# A list of three different-type objects
L = [123, 'spam', 1.23]

In [77]:
# Number of items in the list
len(L)

3

We can index, slice, and so on, just as for strings:

In [95]:
# Indexing by position
L[0]

123

In [96]:
# Slicing a list returns a new list
L[:-1]

[123, 'spam']

In [97]:
# Concat/repeat make new lists too
L + [4, 5, 6]

[123, 'spam', 1.23, 4, 5, 6]

In [102]:
L * 3

[123, 'spam', 1.23, 123, 'spam', 1.23, 123, 'spam', 1.23]

In [115]:
# We're not changing the original list
L

[123, 'spam', 1.23]

In [116]:
dir(L)

['__add__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

Further, lists have no fixed *size*. That is, they can grow and shrink on demand, in response to list-specific operations.

In [117]:
# Growing: add object at end of list
L = [123, 'spam', 1.23]
L.append('NI')
L

[123, 'spam', 1.23, 'NI']

In [118]:
# Shrinking: delete an item in the middle
L.pop(2)

1.23

In [119]:
# del L[2] deletes from a list too
L

[123, 'spam', 'NI']
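Besides append and pop, lists support a few other size-changing operations. A brief sketch using standard list methods and the del statement:

```python
L = [1, 2, 3]
L.insert(1, 'x')   # insert before offset 1 -> [1, 'x', 2, 3]
L.extend([4, 5])   # add several items at the end -> [1, 'x', 2, 3, 4, 5]
L.remove('x')      # delete the first matching value -> [1, 2, 3, 4, 5]
del L[0]           # delete by offset -> [2, 3, 4, 5]
print(L)           # prints: [2, 3, 4, 5]
```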

Because lists are mutable, most list methods also change the list
object in place, instead of creating a new one:

In [120]:
M = ['bb', 'aa', 'cc']
M.sort()
M

['aa', 'bb', 'cc']

In [121]:
M.sort?

In [122]:
M.sort(reverse=True)
M

['cc', 'bb', 'aa']

In [123]:
M.reverse()
M

['aa', 'bb', 'cc']

In [124]:
help(M.reverse)

Help on built-in function reverse:

reverse() method of builtins.list instance
    Reverse *IN PLACE*.
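Because these methods work in place, they return None rather than the list, a common stumbling block. This sketch contrasts the sort method with the sorted built-in, which returns a new list instead:

```python
M = ['bb', 'aa', 'cc']

# In-place methods return None, so don't assign their result back to the name
result = M.sort()
print(result, M)  # prints: None ['aa', 'bb', 'cc']

# The sorted built-in leaves its input alone and returns a new list
N = ['bb', 'aa', 'cc']
print(sorted(N), N)  # prints: ['aa', 'bb', 'cc'] ['bb', 'aa', 'cc']
```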



**Nesting:** We can nest Python's core data types in any combination, and as deeply as we like. One immediate application of this feature is to represent matrices, or "multidimensional arrays", in Python.

In [125]:
# A 3 x 3 matrix, as nested lists; code can span lines if bracketed
M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

In [126]:
M

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [129]:
# Get row 2
M[1]

[4, 5, 6]

In [130]:
# Get row 2, then get item 3 within the row
M[1][2]

6

**Comprehensions:** In addition to sequence operations and list methods, Python includes a more advanced operation known as a list comprehension expression, which turns out to be a powerful way to process structures like our matrix.

In [131]:
# Loop over the rows, picking item 3 of each: collect column 3
col3 = [row[2] for row in M]

col3

[3, 6, 9]

In [132]:
# Loop over the rows, picking item 2 of each: collect column 2
col2 = [row[1] for row in M]

col2

[2, 5, 8]
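A comprehension is essentially a compact for loop that collects results. As a sketch, the column-2 comprehension above behaves like this explicit loop:

```python
M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# Comprehension form
col2 = [row[1] for row in M]

# Equivalent explicit loop: build the result list with append
col2_loop = []
for row in M:
    col2_loop.append(row[1])

print(col2, col2 == col2_loop)  # prints: [2, 5, 8] True
```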

In [133]:
# The matrix is unchanged
M

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [134]:
# Add 1 to each item in column 2
[row[1] + 1 for row in M]

[3, 6, 9]

In [136]:
# Filter out odd items
[row[1] for row in M if row[1] % 2 == 0]

[2, 8]

In [137]:
# Collect a diagonal from matrix
diag = [M[i][i] for i in [0, 1, 2]]

diag

[1, 5, 9]

In [138]:
# Repeat characters in a string
doubles = [c * 2 for c in 'spam']

doubles

['ss', 'pp', 'aa', 'mm']

In [139]:
# Double each character of 345's string form
[c * 2 for c in str(345)]

['33', '44', '55']

In [140]:
[int(c) * 2 for c in str(345)]

[6, 8, 10]

In [141]:
[int(c) * 2 for c in str(345) + 'a']

ValueError: invalid literal for int() with base 10: 'a'

In [142]:
[int(c) * 2 for c in str(345) + 'a' if c.isdigit()]

[6, 8, 10]

In [147]:
# Fails: strings are immutable and have no reverse method
[c for c in (str(345) + 'a').reverse()]

AttributeError: 'str' object has no attribute 'reverse'

In [148]:
[c * 2 for c in (str(345) + 'a')]

['33', '44', '55', 'aa']

In [149]:
systems = ['Windows', 'macOS', 'Linux']
print('List:', systems)
systems.reverse()
print('Updated List:', systems)

List: ['Windows', 'macOS', 'Linux']
Updated List: ['Linux', 'macOS', 'Windows']


In [150]:
# Also fails: str(systems) is a string, not a list
[c for c in str(systems).reverse()]

AttributeError: 'str' object has no attribute 'reverse'

In [151]:
my_list = ["3", "4", "5", "a"]

In [152]:
my_list.reverse()

In [153]:
my_list

['a', '5', '4', '3']

In [154]:
my_list = ["3", "4", "5", "a"]
my_list.reverse()
[c for c in my_list]

['a', '5', '4', '3']

In [155]:
c = "4"
c.isdigit()

True

In [156]:
range(4)

range(0, 4)

In [157]:
a = range(0,4)
a

range(0, 4)

The following illustrates **range**, a built-in that generates successive integers. In Python 3.X, range produces its values on demand, so it needs a surrounding list call to display them all.

In [158]:
# Generate values from 0 to 3
list(range(4))

[0, 1, 2, 3]

In [160]:
# Generate values from -6 to 6 by 2 (step size) 
list(range(-6, 7, 2))

[-6, -4, -2, 0, 2, 4, 6]
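Because a range generates its values on demand, it can also answer some questions without building a list. A sketch (len, indexing, and in all work directly on range objects):

```python
r = range(-6, 7, 2)

# A range is a lazy sequence: len, indexing, and membership tests
# work without materializing all of its values
print(len(r), r[0], r[-1], 4 in r, 5 in r)  # prints: 7 -6 6 True False

# list() materializes every value
print(list(r))  # prints: [-6, -4, -2, 0, 2, 4, 6]
```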

In [162]:
# Multiple values
# [0*0, 0*0*0], [1*1, 1*1*1], [2*2,2*2*2], [3*3,3*3*3]
[[x ** 2, x**3] for x in range(4)]

[[0, 0], [1, 1], [4, 8], [9, 27]]

In [163]:
# Multiple values with "if" filters
# range(-6, 7, 2) yields -6, -4, -2, 0, 2, 4, 6
# the if filter keeps only 2, 4, 6
[[x, x/2, x * 2] for x in range(-6, 7, 2) if x > 0]

[[2, 1.0, 4], [4, 2.0, 8], [6, 3.0, 12]]

## Dictionaries

Python dictionaries are not sequences at all, but are instead known as mappings. They simply map keys to associated values. Dictionaries, the only mapping type in Python’s core object set, are also mutable: like lists, they may be changed in place and can grow and shrink on demand.

In [2]:
D = {'food': 'Spam', 'quantity': 4, 'color': 'pink'}

In [3]:
# Fetch value of key 'food'
D['food']

'Spam'

In [167]:
# Add 1 to the 'quantity' value twice: once by explicit assignment,
# once with the augmented += form (4 + 1 + 1 == 6)
D['quantity'] = D['quantity'] + 1

D['quantity'] += 1

D

{'food': 'Spam', 'quantity': 6, 'color': 'pink'}

You can start with an empty dictionary and fill it out one key at a time.

In [5]:
D = {}

# Create keys by assignment
D['name'] = 'Bob'
D['job'] = 'dev'
D['age'] = 40

D

{'name': 'Bob', 'job': 'dev', 'age': 40}

In [6]:
print(D['name'])

Bob


In other applications, dictionaries can also be used to replace searching operations—indexing a dictionary by key is often the fastest way to code a search in Python.
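A rough sketch of the idea, using hypothetical sample records and a hypothetical helper named find_job: a list must be scanned item by item, while a dictionary keyed on the search field jumps straight to the entry (dictionary lookups are hash-based, so they take roughly constant time on average).

```python
# Hypothetical sample records, as (name, job) pairs
records = [('Bob', 'dev'), ('Sue', 'mgr'), ('Ann', 'janitor')]


def find_job(name):
    # Searching a list means scanning until a match turns up
    for rec_name, job in records:
        if rec_name == name:
            return job
    return None


# A dictionary keyed by name replaces the scan with one index operation
jobs = dict(records)

print(find_job('Sue'), jobs['Sue'])  # prints: mgr mgr
```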

We can also make dictionaries by passing to the dict type name either keyword arguments (a special name=value syntax in function calls), or the result of zipping together sequences of keys and values obtained at runtime (e.g., from files).

In [7]:
# Keywords
bob1 = dict(name='Bob', job='dev', age=40)

bob1

{'name': 'Bob', 'job': 'dev', 'age': 40}

In [10]:
# zip returns an iterator object; its pairs aren't shown until we ask for them
z = zip(['name', 'job', 'age'], ['Bob', 'dev', 40])
z

<zip at 0x231c515b140>

In [11]:
zipped = list(zip(['name', 'job', 'age'], ['Bob', 'dev', 40]))
print(zipped)

[('name', 'Bob'), ('job', 'dev'), ('age', 40)]


In [12]:
# Zipping
bob2 = dict(zip(['name', 'job', 'age'], ['Bob', 'dev', 40]))

bob2

{'name': 'Bob', 'job': 'dev', 'age': 40}

In [13]:
[[a, b] for a, b in zip(['name', 'job', 'age'], ['Bob', 'dev', 40])]

[['name', 'Bob'], ['job', 'dev'], ['age', 40]]

**Nesting Revisited**: The following dictionary, coded all at once as a literal, captures more structured information.

In [14]:
rec = {'name': {'first': 'Bob', 'last': 'Smith'},
       'jobs': ['dev', 'mgr'],
       'age': 40.5}

In [15]:
# 'name' is a nested dictionary
rec['name']

{'first': 'Bob', 'last': 'Smith'}

In [16]:
# Index the nested dictionary
rec['name']['last']

'Smith'

In [17]:
# 'jobs' is a nested list
rec['jobs']

['dev', 'mgr']

In [18]:
# Index the nested list
rec['jobs'][-1]

'mgr'

In [19]:
# Expand Bob's job description in place
rec['jobs'].append('janitor')

rec

{'name': {'first': 'Bob', 'last': 'Smith'},
 'jobs': ['dev', 'mgr', 'janitor'],
 'age': 40.5}

The real reason for showing you this example is to demonstrate the flexibility of Python’s core data types. As you can see, nesting allows us to build up complex information structures directly and easily. Building a similar structure in a low-level language like C would be tedious and require much more code: we would have to lay out and
declare structures and arrays, fill out values, link everything together, and so on.

**Garbage Collection**: Just as importantly, in a lower-level language we would have to be careful to clean up all of the object’s space when we no longer need it. In Python, when we lose the last reference to the object—by assigning its variable to something else, for example—all
of the memory space occupied by that object’s structure is automatically cleaned up for us.
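Reclamation itself isn't visible in normal code, but a weak reference can hint at it. This sketch uses the standard weakref module with a throwaway class named Record (hypothetical; plain dict and list objects can't be weakly referenced). The immediate None result assumes CPython, whose reference counting frees an object as soon as its last reference disappears; other Python implementations may reclaim it later.

```python
import weakref


class Record:
    # A throwaway user-defined class: its instances support weak references,
    # unlike plain dict or list objects
    pass


rec = Record()
probe = weakref.ref(rec)  # refers to rec without keeping it alive
print(probe() is rec)     # prints: True

rec = None                # drop the last strong reference
print(probe())            # prints: None (CPython frees the object right away)
```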

In [22]:
rec["name"]["middle"] = "Michael"

rec

{'name': {'first': 'Bob', 'last': 'Smith', 'middle': 'Michael'},
 'jobs': ['dev', 'mgr', 'janitor'],
 'age': 40.5}

**Missing Keys:** Although we can assign to a new key to grow a dictionary, fetching a nonexistent key is an error.

In [23]:
D = {'a': 1, 'b': 2, 'c': 3}

D

{'a': 1, 'b': 2, 'c': 3}

In [24]:
# Assigning new keys grows dictionaries
D['e'] = 99

D

{'a': 1, 'b': 2, 'c': 3, 'e': 99}

In [25]:
# Referencing a nonexistent key is an error
D['f']

KeyError: 'f'

In [26]:
'f' in D

False

In [27]:
'e' in D

True

Besides the in membership test (typically used in an if statement), there are a variety of ways to avoid accessing nonexistent keys in the dictionaries we create. One is the **get** method, a conditional index with a default:
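The common options side by side, as a sketch reusing the dictionary D from the cells above (redefined here so the block runs on its own):

```python
D = {'a': 1, 'b': 2, 'c': 3, 'e': 99}

# 1. Test membership first with an if statement
if 'f' not in D:
    print('missing')  # prints: missing

# 2. get with a default value
value = D.get('x', 0)

# 3. An if/else expression
value2 = D['x'] if 'x' in D else 0

# 4. Try the index and catch the KeyError
try:
    value3 = D['x']
except KeyError:
    value3 = 0

print(value, value2, value3)  # prints: 0 0 0
```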

In [33]:
# Index but with a default
value = D.get('x', 0)

value

0

In [34]:
help(D.get)

Help on built-in function get:

get(key, default=None, /) method of builtins.dict instance
    Return the value for key if key is in the dictionary, else default.



In [37]:
# Without a default, get returns None, which the notebook doesn't display
xy = D.get("x")
xy

We can grab a list of keys with the dictionary **keys** method.

In [38]:
# Keys list (in insertion order, since Python 3.7)
Ks = list(D.keys())

Ks

['a', 'b', 'c', 'e']

In [46]:
# Sort the keys list in place, in descending order
Ks.sort(reverse=True)
Ks

['e', 'c', 'b', 'a']

In [47]:
# Iterate through sorted keys
for key in Ks:
    print(key, "=>", D[key])

e => 99
c => 3
b => 2
a => 1


In [48]:
[[key, D[key]] for key in Ks]

[['e', 99], ['c', 3], ['b', 2], ['a', 1]]

The built-in **sorted** call returns its result as a new list and can sort a variety of object types; here it sorts the dictionary's keys automatically.

In [49]:
for key in sorted(D):
    print(key, '=>', D[key])

a => 1
b => 2
c => 3
e => 99


In [50]:
sorted(D)

['a', 'b', 'c', 'e']
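Since sorted accepts any iterable, the same call handles other shapes of data too. A brief sketch with an arbitrary sample dictionary:

```python
D = {'c': 3, 'a': 1, 'e': 99, 'b': 2}

# sorted works on any iterable and always returns a new list
print(sorted(D))                # prints: ['a', 'b', 'c', 'e']
print(sorted(D, reverse=True))  # prints: ['e', 'c', 'b', 'a']

# items() yields (key, value) pairs, which sort by key first
print(sorted(D.items()))        # prints: [('a', 1), ('b', 2), ('c', 3), ('e', 99)]

# Strings are iterables too
print(sorted('spam'))           # prints: ['a', 'm', 'p', 's']
```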

In [51]:
D['a']

1