# Built-in Data Structures, Functions, & Files

## 3.1 Data Structures and Sequences

- ###  Tuple
A fixed-length, immutable sequence of Python objects.

In [1]:
tup = 4, 5, 6

In [2]:
nested_tup = (4, 5, 6), (7, 8)

In [3]:
tup

(4, 5, 6)

In [4]:
nested_tup

((4, 5, 6), (7, 8))

In [5]:
# any sequence or iterator can be converted into a tuple
tuple([4, 0, 2])

(4, 0, 2)

In [6]:
tup = tuple("string")

In [7]:
tup

('s', 't', 'r', 'i', 'n', 'g')

In [8]:
# access elements by index (zero-based)
tup[2]

'r'

Once a tuple is created, you cannot change which object is stored in each slot.

In [9]:
tup = tuple(["foo", [1, 3, 4], True])

In [10]:
tup[2] = False  # fails

TypeError: 'tuple' object does not support item assignment

If an object inside a tuple is mutable, such as a list, you can still modify that object in place.

In [11]:
tup[1].append(3)

In [12]:
tup

('foo', [1, 3, 4, 3], True)
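A minimal sketch of the distinction above: the tuple's slots are fixed, but a mutable object held in a slot can still change. The names `record`, `rebind_failed`, and `same_object` are illustrative only.

```python
# Illustrative names; not part of the original notes.
record = ("foo", [1, 3, 4], True)
inner_id_before = id(record[1])

# Rebinding a slot is rejected with TypeError.
try:
    record[2] = False
    rebind_failed = False
except TypeError:
    rebind_failed = True

# Mutating the list stored in slot 1 is allowed, and the tuple
# still refers to the very same list object afterwards.
record[1].append(3)
same_object = id(record[1]) == inner_id_before

print(rebind_failed, same_object, record)
```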

In [14]:
# tuples can be concatenated and multiplied
(3, None, "foo") + (4, 0) + ("bar",)

(3, None, 'foo', 4, 0, 'bar')

In [15]:
("foo", "bar")*4

('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')

#### Unpacking Tuples

If you assign to a tuple-like expression of variables, Python will attempt to unpack the value on the right-hand side of the equals sign.

In [16]:
tup = (4, 5, 6)

In [17]:
a, b, c = tup

In [18]:
c

6

In [19]:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

In [20]:
for a, b, c in seq:
    print("a={0}, b={1}, c={2}".format(a, b, c))

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9


To capture an arbitrarily long list of remaining values, use the special syntax *rest. Writing it as *_ is a common convention for values you intend to discard.

In [21]:
values = 1, 3, 4, 5, 6, 7, 8

In [22]:
a, b, *_ = values

In [23]:
a

1

In [24]:
_

[4, 5, 6, 7, 8]
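A short sketch of a few more unpacking forms, assuming Python 3. The starred name can sit in any position and always receives a list, and plain unpacking also gives a swap without a temporary variable. Variable names are illustrative.

```python
values = 1, 3, 4, 5, 6, 7, 8

# The starred name collects whatever is left over, as a list.
first, second, *rest = values
head, *middle, tail = values

# If nothing is left over, the starred name is an empty list.
x, y, *leftover = (10, 20)

# Swapping two names by unpacking.
p, q = 1, 2
p, q = q, p

print(rest, middle, leftover, (p, q))
```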

- ### List
Unlike tuples, lists are variable length and their contents can be modified in-place.

In [25]:
a_list = [3, 4, 5, None]

In [26]:
tup = ("foo", "bar", "bat")

In [27]:
b_list = list(tup)

In [28]:
b_list

['foo', 'bar', 'bat']

In [29]:
# lists can be modified
b_list[1] = "peekaboo"

In [30]:
b_list

['foo', 'peekaboo', 'bat']

The list function is frequently used in data processing to materialize an iterator or generator expression.
    

In [31]:
gen = range(10)

In [32]:
gen

range(0, 10)

In [33]:
list(gen)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Adding and removing elements

In [34]:
b_list.append("dwarf")  # append adds to end of list

In [35]:
b_list

['foo', 'peekaboo', 'bat', 'dwarf']

In [36]:
b_list.insert(2, "fox")  # inserts in specific location

In [37]:
b_list

['foo', 'peekaboo', 'fox', 'bat', 'dwarf']

"insert" is computationally expensive compared with "append", because every element after the insertion point has to be shifted to make room.
If you need to insert elements at both the beginning and end of a sequence, you may wish to explore collections.deque, a double-ended queue.
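A minimal sketch of the deque alternative mentioned above, using the standard-library `collections.deque`. Its `appendleft` and `popleft` methods work on the front of the queue without shifting the other elements. The sample values are illustrative.

```python
from collections import deque

items = deque(["foo", "peekaboo", "bat"])

# Both ends support cheap insertion and removal.
items.appendleft("start")
items.append("end")
first = items.popleft()
last = items.pop()

print(first, last, list(items))
```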

In [38]:
# removes and returns indexed element from list
b_list.pop(2)

'fox'

In [39]:
b_list  # no "fox"

['foo', 'peekaboo', 'bat', 'dwarf']

In [40]:
b_list.append("foo")

In [41]:
b_list

['foo', 'peekaboo', 'bat', 'dwarf', 'foo']

In [42]:
b_list.remove("foo")  # removes the first occurrence of the value

In [43]:
b_list

['peekaboo', 'bat', 'dwarf', 'foo']

In [44]:
"dwarf" in b_list  # looks for value using the "in" keyword

True

In [45]:
"dwarf" not in b_list  # "not" negates "in"

False

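Use the "extend" method to append several elements to an existing list. It is generally less expensive than "+", which has to create a new list and copy both operands into it.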

In [46]:
x = [4, None, "foo"]

In [47]:
%time x.extend([7, 9, (3,4)])

CPU times: user 5 µs, sys: 1e+03 ns, total: 6 µs
Wall time: 10 µs


In [48]:
%time [4, None, 'foo'] + [7, 8, (2, 3)]  # more apparent difference with larger list

CPU times: user 5 µs, sys: 1e+03 ns, total: 6 µs
Wall time: 10 µs


[4, None, 'foo', 7, 8, (2, 3)]
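Timings this small are mostly noise, so here is a sketch of the behavioral difference instead: "extend" changes the existing list, while "+" leaves its operands alone and returns a new one. The `chunks` loop shows the usual pattern for building up a large list. All names are illustrative.

```python
x = [4, None, "foo"]
alias = x

# extend mutates the existing list, so every reference sees the change.
x.extend([7, 8, (2, 3)])
extended_in_place = alias is x

# + builds a brand-new list; y itself is unchanged.
y = [4, None, "foo"]
combined = y + [7, 8, (2, 3)]

# Typical pattern: grow one list instead of concatenating repeatedly.
chunks = [[1, 2], [3], [4, 5, 6]]
everything = []
for chunk in chunks:
    everything.extend(chunk)

print(extended_in_place, len(alias), combined is y, len(y), everything)
```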

- ### Sorting!

In [49]:
a = [3, 5, 2, 7, 8, 1]

In [50]:
a.sort()

In [51]:
a

[1, 2, 3, 5, 7, 8]

"sort" has a few handy options. One is passing a sort key: a function that produces a value to use when sorting the objects.

In [52]:
b = ["saw", "small", "He", "foxes", "six"]

In [53]:
b.sort(key=len)

In [54]:
b

['He', 'saw', 'six', 'small', 'foxes']
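A small sketch of two related options, using the built-in `sorted` so the original list is left untouched: `reverse=True` flips the order, and any one-argument function such as `str.lower` can serve as the key. Python's sort is stable, so items with equal keys keep their original relative order.

```python
b = ["saw", "small", "He", "foxes", "six"]

# Longest first; ties keep their original order because the sort is stable.
by_length_desc = sorted(b, key=len, reverse=True)

# Case-insensitive alphabetical order.
alphabetical = sorted(b, key=str.lower)

print(by_length_desc)
print(alphabetical)
```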

- ### Binary search and maintaining a sorted list

Warning: make sure the list is sorted before using bisect. The module does not check, so an unsorted list raises no error but gives incorrect results.

In [55]:
import bisect
c = [1, 2, 2, 2, 3, 4, 7]

In [56]:
bisect.bisect(c, 2)  # returns index where 2 would be inserted in sorted list

4

In [57]:
bisect.insort(c, 5)  # inserts 5 allowing c to remain sorted

In [58]:
c

[1, 2, 2, 2, 3, 4, 5, 7]
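A short sketch of how the insertion point is chosen when the value is already present. `bisect.bisect` is the same as `bisect.bisect_right` (insert after equal values), while `bisect.bisect_left` inserts before them. The second half illustrates the warning above on a deliberately unsorted list.

```python
import bisect

c = [1, 2, 2, 2, 3, 4, 7]

left = bisect.bisect_left(c, 2)    # index before the run of 2s
right = bisect.bisect_right(c, 2)  # index after the run of 2s
count_of_twos = right - left

# On an unsorted list there is no error, but insort no longer
# produces a sorted result.
unsorted = [7, 1, 4, 2]
bisect.insort(unsorted, 3)
still_sorted = unsorted == sorted(unsorted)

print(left, right, count_of_twos, still_sorted)
```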

- ### Slicing

In [59]:
seq = [7, 2, 4, 7, 5, 6, 0, 1]

In [60]:
seq[1:5]

[2, 4, 7, 5]

In [61]:
seq[3:4] = [6, 3]

In [62]:
seq

[7, 2, 4, 6, 3, 5, 6, 0, 1]

In [63]:
seq[:5]

[7, 2, 4, 6, 3]

In [64]:
seq[3:]

[6, 3, 5, 6, 0, 1]

In [65]:
seq[-4:]

[5, 6, 0, 1]

In [66]:
# a step can be used after a second colon
seq[::2]  # take every other element 

[7, 4, 3, 6, 1]

In [67]:
# a step of -1 reverses a list or tuple
seq[::-1]

[1, 0, 6, 5, 3, 6, 4, 2, 7]
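A brief sketch of the slicing rules the cells above rely on: the stop index is excluded, a full slice makes a shallow copy, out-of-range bounds are tolerated, and assigning to a slice can change the list's length. Variable names are illustrative.

```python
seq = [7, 2, 4, 7, 5, 6, 0, 1]

# The stop index is excluded, so seq[1:5] has 5 - 1 = 4 elements.
middle = seq[1:5]

# A full slice is a shallow copy; changing it leaves seq alone.
copy = seq[:]
copy[0] = 99

# Slices accept out-of-range bounds, unlike single-index access.
tail = seq[5:100]

# Replacing a one-element slice with two elements grows the list.
seq[3:4] = [6, 3]

print(middle, tail, seq)
```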

- ### Built-in Sequence Functions

#### enumerate
When iterating over a sequence, it is common to want to keep track of the index of the current item. enumerate yields (index, value) pairs for exactly this purpose.

In [68]:
some_list = ["foo", "bar", "baz"]

In [69]:
mapping = {}

In [70]:
# good when indexing data
for i, v in enumerate(some_list):
    mapping[v] = i

In [71]:
mapping

{'bar': 1, 'baz': 2, 'foo': 0}

In [72]:
mapping["bar"]

1
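A compact sketch of two variations: the same value-to-index mapping written as a dict comprehension, and the optional `start` argument of `enumerate` for numbering from 1. The name `numbered` is illustrative.

```python
some_list = ["foo", "bar", "baz"]

# Same mapping as the loop above, as a dict comprehension.
mapping = {value: index for index, value in enumerate(some_list)}

# start=1 gives human-friendly numbering without manual arithmetic.
numbered = [f"{n}. {value}" for n, value in enumerate(some_list, start=1)]

print(mapping)
print(numbered)
```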

#### sorted
Returns a new sorted list from the elements of any sequence.

In [73]:
sorted([4, 5, 6, 7, 4, 3, 2, 1])

[1, 2, 3, 4, 4, 5, 6, 7]

In [74]:
sorted("horse race")

[' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']

#### zip

zip pairs up the elements of a number of lists, tuples, or other sequences. In Python 3 it returns an iterator of tuples, which can be materialized with list.

In [75]:
seq1 = ["foo", "bar", "baz"]

In [76]:
seq2 = ["one", "two", "three"]

In [77]:
zipped = zip(seq1, seq2)  # zip takes any number of sequences

In [78]:
list(zipped)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

In [79]:
# number of elements is determined by shortest sequence
seq3 = [False, True]
list(zip(seq1, seq2, seq3))

[('foo', 'one', False), ('bar', 'two', True)]
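If dropping the unmatched elements is not what you want, the standard library's `itertools.zip_longest` pads the shorter inputs instead. The sketch below also shows that a zip object can only be consumed once. Variable names are illustrative.

```python
from itertools import zip_longest

seq1 = ["foo", "bar", "baz"]
seq3 = [False, True]

# zip stops at the shortest input; zip_longest pads with fillvalue.
truncated = list(zip(seq1, seq3))
padded = list(zip_longest(seq1, seq3, fillvalue=None))

# A zip object is a one-shot iterator: a second pass yields nothing.
zipped = zip(seq1, seq3)
first_pass = list(zipped)
second_pass = list(zipped)

print(truncated)
print(padded)
print(second_pass)
```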

In [80]:
for i, (a, b) in enumerate(zip(seq1, seq2)):
    print("{0}: {1}, {2}".format(i, a, b))

0: foo, one
1: bar, two
2: baz, three


In [81]:
pitchers = [("Nolan", "Ryan"), ("Roger", "Clemens"), ("Curt", "Schilling")]

In [82]:
# * unpacks the list into separate arguments, so zip(*pitchers) "unzips" the pairs
first_names, last_names = zip(*pitchers)

In [83]:
first_names

('Nolan', 'Roger', 'Curt')

In [84]:
last_names

('Ryan', 'Clemens', 'Schilling')

In [85]:
# another example of *
list(range(3, 6))

[3, 4, 5]

In [86]:
args = [3, 6]
list(range(*args))

[3, 4, 5]

#### reversed

In [87]:
# reversed iterates over the elements of a sequence in reverse order
# reversed returns a lazy iterator, so it does not create the reversed
# sequence until materialized (with list or a for loop)
list(reversed(range(20)))

[19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
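A small sketch of what the laziness means in practice: the object returned by `reversed` is not a list, it can be consumed only once, and the original sequence is left unchanged. Names are illustrative.

```python
letters = ["a", "b", "c"]

rev = reversed(letters)           # no reversed copy is built yet
is_list = isinstance(rev, list)

first_pass = list(rev)
second_pass = list(rev)           # the iterator is already exhausted

print(is_list, first_pass, second_pass, letters)
```

To reverse a list in place instead, use its own `reverse` method.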

- ### dict
    - dict is likely the most important built-in Python data structure. It is also called a hash map or associative array.
    - It is a flexibly sized collection of key-value pairs, where both the key and the value are Python objects.

In [88]:
empty_dict = {}

In [89]:
d1 = {"a": "some value", "b": [1, 2, 3, 4]}

In [90]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}

In [91]:
# you can access, insert, or set elements using the same syntax as before
d1[7] = "an integer"  # insert

In [92]:
d1

{7: 'an integer', 'a': 'some value', 'b': [1, 2, 3, 4]}

In [93]:
d1["b"]  # access with key "b"

[1, 2, 3, 4]

In [94]:
# check if a dict contains a key
"b" in d1

True

In [95]:
d1[5] = "some value"
d1["dummy"] = "another value"
d1

{5: 'some value',
 7: 'an integer',
 'a': 'some value',
 'b': [1, 2, 3, 4],
 'dummy': 'another value'}

In [96]:
del d1[5]  # deletes the key 5 and its value
d1

{7: 'an integer',
 'a': 'some value',
 'b': [1, 2, 3, 4],
 'dummy': 'another value'}

In [97]:
ret = d1.pop("dummy")  # pop deletes the key "dummy" and returns its value
ret  # "another value"

'another value'

In [98]:
d1

{7: 'an integer', 'a': 'some value', 'b': [1, 2, 3, 4]}

In [99]:
list(d1.keys())  # returns keys

['a', 'b', 7]

In [100]:
list(d1.values())  # returns values

['some value', [1, 2, 3, 4], 'an integer']
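A brief sketch of a few closely related dict calls that the cells above do not show. Indexing a missing key raises KeyError, while `get` and `pop` accept a default, and `items()` yields key-value pairs. The key "zzz" and the other names are illustrative. The ordering comment assumes Python 3.7 or later, where dicts preserve insertion order.

```python
d1 = {"a": "some value", "b": [1, 2, 3, 4], 7: "an integer"}

# get and pop take an optional default for missing keys.
missing = d1.get("zzz", "fallback")
popped = d1.pop("zzz", None)

# items() pairs each key with its value; handy with tuple unpacking.
pairs = [(key, value) for key, value in d1.items()]

# Iterating a dict yields its keys in insertion order (Python 3.7+).
keys_in_order = list(d1)

print(missing, popped, keys_in_order)
```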