# Built-in Data Structures, Functions, & Files

## 3.1 Data Structures and Sequences

- ###  Tuple
A fixed-length, immutable sequence of Python objects.

In [1]:
tup = 4, 5, 6

In [2]:
nested_tup = (4, 5, 6), (7, 8)

In [3]:
tup

(4, 5, 6)

In [4]:
nested_tup

((4, 5, 6), (7, 8))

In [5]:
# any sequence or iterator can be converted into a tuple
tuple([4, 0, 2])

(4, 0, 2)

In [6]:
tup = tuple("string")

In [7]:
tup

('s', 't', 'r', 'i', 'n', 'g')

In [8]:
# access an element by index (indexing is zero-based)
tup[2]

'r'

Once a tuple is created, you cannot change which object is stored in each slot.

In [9]:
tup = tuple(["foo", [1, 3, 4], True])

In [10]:
tup[2] = False  # fails

TypeError: 'tuple' object does not support item assignment

If an object inside a tuple is mutable, such as a list, you can still modify that object in place.

In [11]:
tup[1].append(3)

In [12]:
tup

('foo', [1, 3, 4, 3], True)

In [13]:
# tuples can be concatenated and multiplied
(3, None, "foo") + (4, 0) + ("bar",)

(3, None, 'foo', 4, 0, 'bar')

In [14]:
("foo", "bar")*4

('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')

#### Unpacking Tuples

If you try to assign to a tuple-like expression of variables, Python will attempt to unpack the value on the right-hand side of the equals sign.

In [15]:
tup = (4, 5, 6)

In [16]:
a, b, c = tup

In [17]:
c

6

In [18]:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

In [19]:
for a, b, c in seq:
    print("a={0}, b={1}, c={2}".format(a, b, c))

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9


To capture an arbitrarily long run of remaining values, prefix one name with *. The underscore in *_ is only a conventional name for values you plan to discard.

In [20]:
values = 1, 3, 4, 5, 6, 7, 8

In [21]:
a, b, *_ = values

In [22]:
a

1

In [23]:
_

[4, 5, 6, 7, 8]
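
As a further illustration of unpacking (not part of the original notes), the sketch below swaps two names without a temporary variable, unpacks a nested tuple, and uses a descriptive name, `rest`, in place of `_`. All variable names are made up for the example.

```python
# Swap two names without a temporary variable
x, y = 1, 2
x, y = y, x

# Nested tuples unpack when the left-hand side mirrors the structure
p, q, (r, s) = 4, 5, (6, 7)

# The starred name collects whatever is left over, always as a list
first, second, *rest = (1, 3, 4, 5, 6, 7, 8)

print(x, y)   # 2 1
print(r, s)   # 6 7
print(rest)   # [4, 5, 6, 7, 8]
```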

- ### List
Unlike tuples, lists are variable length and their contents can be modified in-place.

In [24]:
a_list = [3, 4, 5, None]

In [25]:
tup = ("foo", "bar", "bat")

In [26]:
b_list = list(tup)

In [27]:
b_list

['foo', 'bar', 'bat']

In [28]:
# lists can be modified
b_list[1] = "peekaboo"

In [29]:
b_list

['foo', 'peekaboo', 'bat']

The list function is frequently used in data processing as a way to materialize an iterator or generator expression
    

In [30]:
gen = range(10)

In [31]:
gen

range(0, 10)

In [32]:
list(gen)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Adding and removing elements

In [33]:
b_list.append("dwarf")  # append adds to end of list

In [34]:
b_list

['foo', 'peekaboo', 'bat', 'dwarf']

In [35]:
b_list.insert(2, "fox")  # inserts in specific location

In [36]:
b_list

['foo', 'peekaboo', 'fox', 'bat', 'dwarf']

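"insert" is computationally expensive compared with "append", because references to the subsequent elements have to be shifted to make room for the new element.
If you need to insert elements at both the beginning and end of a sequence, you may wish to explore collections.deque, a double-ended queue.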
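
A minimal sketch of the deque alternative mentioned above, assuming a small queue of strings (the variable names are illustrative only):

```python
from collections import deque

queue = deque(["bar", "bat"])
queue.appendleft("foo")   # add at the front without shifting the other elements
queue.append("dwarf")     # add at the end, like list.append
front = queue.popleft()   # remove and return the first element
back = queue.pop()        # remove and return the last element

print(front, back)        # foo dwarf
print(list(queue))        # ['bar', 'bat']
```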

In [37]:
# removes and returns indexed element from list
b_list.pop(2)

'fox'

In [38]:
b_list  # no "fox"

['foo', 'peekaboo', 'bat', 'dwarf']

In [39]:
b_list.append("foo")

In [40]:
b_list

['foo', 'peekaboo', 'bat', 'dwarf', 'foo']

In [41]:
b_list.remove("foo")  # removes the first occurrence of the value

In [42]:
b_list

['peekaboo', 'bat', 'dwarf', 'foo']

In [43]:
"dwarf" in b_list  # looks for value using the "in" keyword

True

In [44]:
"dwarf" not in b_list  # "not" negates "in"

False

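Use the "extend" method to append several elements to an existing list. It is generally less expensive than concatenating with "+", because "+" has to create a new list and copy the elements over.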

In [45]:
x = [4, None, "foo"]

In [46]:
%time x.extend([7, 9, (3,4)])

Wall time: 0 ns


In [47]:
%time [4, None, 'foo'] + [7, 8, (2, 3)]  # more apparent difference with larger list

Wall time: 0 ns


[4, None, 'foo', 7, 8, (2, 3)]
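
The timings above are too small to show a difference, so here is a sketch of the behavioural difference instead: extend mutates the existing list, while + builds a new one. The names `alias`, `y`, and `everything` are made up for this example.

```python
x = [4, None, "foo"]
alias = x                        # second name for the same list object

x.extend([7, 8])                 # mutates the existing list in place
same_after_extend = alias is x   # still one list object

y = x + [(2, 3)]                 # builds a brand-new list and copies every element
new_after_plus = y is not x

# Typical use: collect chunks by extending a single accumulator list
everything = []
for chunk in [[1, 2], [3], [4, 5, 6]]:
    everything.extend(chunk)

print(same_after_extend, new_after_plus)  # True True
print(everything)                         # [1, 2, 3, 4, 5, 6]
```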

- ### Sorting!

In [48]:
a = [3, 5, 2, 7, 8, 1]

In [49]:
a.sort()

In [50]:
a

[1, 2, 3, 5, 7, 8]

"sort" has a few handy options. One is the ability to pass a secondary sort key: a function that produces a value to use to sort the objects.

In [51]:
b = ["saw", "small", "He", "foxes", "six"]

In [52]:
b.sort(key=len)

In [53]:
b

['He', 'saw', 'six', 'small', 'foxes']
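
Two more variations on the key argument, shown as a sketch rather than as part of the original notes: reverse=True flips the order, and str.lower as the key gives a case-insensitive sort. The names `longest_first` and `alphabetical` are illustrative.

```python
b = ["saw", "small", "He", "foxes", "six"]

b.sort(key=len, reverse=True)   # longest first; ties keep their original order
longest_first = list(b)         # copy the result before sorting again

b.sort(key=str.lower)           # alphabetical, ignoring case
alphabetical = list(b)

print(longest_first)  # ['small', 'foxes', 'saw', 'six', 'He']
print(alphabetical)   # ['foxes', 'He', 'saw', 'six', 'small']
```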

- ### Binary search and maintaining a sorted list

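Warning: the bisect functions do not check whether the list is sorted. Using them on an unsorted list succeeds without an error but may give incorrect results.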

In [54]:
import bisect
c = [1, 2, 2, 2, 3, 4, 7]

In [55]:
bisect.bisect(c, 2)  # returns index where 2 would be inserted in sorted list

4

In [56]:
bisect.insort(c, 5)  # inserts 5 allowing c to remain sorted

In [57]:
c

[1, 2, 2, 2, 3, 4, 5, 7]
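
One common use of bisect is mapping a number onto a bucket. The sketch below assumes made-up grade boundaries (`breakpoints`, `labels`) and also shows how bisect_left and bisect_right differ when the value is already present.

```python
import bisect

breakpoints = [60, 70, 80, 90]        # must be sorted in ascending order
labels = ["F", "D", "C", "B", "A"]    # one more label than breakpoints

def grade(score):
    # bisect counts how many breakpoints are <= score; that count indexes labels
    return labels[bisect.bisect(breakpoints, score)]

grades = [grade(s) for s in [33, 60, 77, 89, 90, 100]]
print(grades)  # ['F', 'D', 'C', 'B', 'A', 'A']

c = [1, 2, 2, 2, 3, 4, 7]
left = bisect.bisect_left(c, 2)    # insertion point before the existing 2s
right = bisect.bisect_right(c, 2)  # insertion point after them (same as bisect.bisect)
print(left, right)  # 1 4
```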

- ### Slicing

In [58]:
seq = [7, 2, 4, 7, 5, 6, 0, 1]

In [59]:
seq[1:5]

[2, 4, 7, 5]

In [60]:
seq[3:4] = [6, 3]

In [61]:
seq

[7, 2, 4, 6, 3, 5, 6, 0, 1]

In [62]:
seq[:5]

[7, 2, 4, 6, 3]

In [63]:
seq[3:]

[6, 3, 5, 6, 0, 1]

In [64]:
seq[-4:]

[5, 6, 0, 1]

In [65]:
# a step can be used after a second colon
seq[::2]  # take every other element 

[7, 4, 3, 6, 1]

In [66]:
# a step of -1 reverses a list or tuple
seq[::-1]

[1, 0, 6, 5, 3, 6, 4, 2, 7]

- ### Built-in Sequence Functions

#### enumerate
When iterating over a sequence, it is common to want to keep track of the index of the current item. enumerate yields (index, value) pairs for exactly this purpose.

In [67]:
some_list = ["foo", "bar", "baz"]

In [68]:
mapping = {}

In [69]:
# useful when indexing data: map each value to its position in the list
for i, v in enumerate(some_list):
    mapping[v] = i

In [70]:
mapping

{'bar': 1, 'baz': 2, 'foo': 0}

In [71]:
mapping["bar"]

1

#### sorted
Returns a new sorted list from the elements of any sequence.

In [72]:
sorted([4, 5, 6, 7, 4, 3, 2, 1])

[1, 2, 3, 4, 4, 5, 6, 7]

In [73]:
sorted("horse race")

[' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']

#### zip

zip pairs up the elements of a number of lists, tuples, or other sequences. In Python 3 it returns an iterator of tuples, so wrap it in list to see the pairs.

In [74]:
seq1 = ["foo", "bar", "baz"]

In [75]:
seq2 = ["one", "two", "three"]

In [76]:
zipped = zip(seq1, seq2)  # zip takes any number of sequences

In [77]:
list(zipped)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

In [78]:
# number of elements is determined by shortest sequence
seq3 = [False, True]
list(zip(seq1, seq2, seq3))

[('foo', 'one', False), ('bar', 'two', True)]

In [79]:
for i, (a, b) in enumerate(zip(seq1, seq2)):
    print("{0}: {1}, {2}".format(i, a, b))

0: foo, one
1: bar, two
2: baz, three


In [80]:
pitchers = [("Nolan", "Ryan"), ("Roger", "Clemens"), ("Schilling", "Curt")]

In [81]:
# * unpacks the list into separate arguments, so zip(*pitchers) "unzips" the pairs
first_names, last_names = zip(*pitchers)

In [82]:
first_names

('Nolan', 'Roger', 'Schilling')

In [83]:
last_names

('Ryan', 'Clemens', 'Curt')

In [84]:
# another example of *
list(range(3, 6))

[3, 4, 5]

In [85]:
args = [3, 6]
list(range(*args))

[3, 4, 5]

#### reversed

In [86]:
# reversed iterates over the elements of a sequence in reverse order
# reversed returns an iterator, so it does not create the reversed sequence
# until materialized (for example with list or a for loop)
list(reversed(range(20)))

[19, 18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

- ### dict
    - dict is likely the most important built-in Python data structure. It is also called a hash map or associative array.
    - It is a flexibly sized collection of key-value pairs, where both key and value are Python objects.

In [87]:
empty_dict = {}

In [88]:
d1 = {"a": "some value", "b": [1, 2, 3, 4]}

In [89]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}

In [90]:
# you can access, insert, or set elements using the same bracket syntax as for a list or tuple
d1[7] = "an integer"  # insert

In [91]:
d1

{7: 'an integer', 'a': 'some value', 'b': [1, 2, 3, 4]}

In [92]:
d1["b"]  # access with key "b"

[1, 2, 3, 4]

In [93]:
# check if a dict contains a key
"b" in d1

True

In [94]:
d1[5] = "some value"
d1["dummy"] = "another value"
d1

{5: 'some value',
 7: 'an integer',
 'a': 'some value',
 'b': [1, 2, 3, 4],
 'dummy': 'another value'}

In [95]:
del d1[5]  # deletes the key 5 and its value
d1

{7: 'an integer',
 'a': 'some value',
 'b': [1, 2, 3, 4],
 'dummy': 'another value'}

In [96]:
ret = d1.pop("dummy")  # pop deletes the key "dummy" and returns its value
ret  # returns "another value"

'another value'

In [97]:
d1

{7: 'an integer', 'a': 'some value', 'b': [1, 2, 3, 4]}

In [98]:
list(d1.keys())  # returns keys

['a', 'b', 7]

In [99]:
list(d1.values())  # returns values

['some value', [1, 2, 3, 4], 'an integer']

In [100]:
d1.update({"b": "foo", "c": 12})  # merge one dict into another using the update method
d1

{7: 'an integer', 'a': 'some value', 'b': 'foo', 'c': 12}

#### Creating dicts from sequences

In [102]:
# a dict is essentially a collection of 2-tuples, so the dict function accepts a sequence of 2-tuples
mapping = dict(zip(range(5), reversed(range(5))))
mapping

{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

#### Default Values

In [105]:
""" 'get' returns None by default if the key is not present, while 'pop' raises
an exception. When setting values, a common case is for the values in a
dict to be other collections, like lists. For example, imagine categorizing a
list of words by their first letter as a dict of lists:"""

words = ["apple", "bat", "bar", "atom", "book"]
by_letter = {}
for word in words:
    letter = word[0]
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)

by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
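
The get and pop behaviour described in the cell above is not demonstrated there, so here is a short sketch. It also uses the dict method setdefault to collapse the if/else block into a single line. The dict `d` and the key "z" are made up for the example.

```python
d = {"a": "some value", "b": [1, 2, 3, 4]}

missing = d.get("z")            # None: the key is absent, no exception
fallback = d.get("z", "n/a")    # explicit default instead of None
popped = d.pop("z", "n/a")      # pop raises KeyError only when no default is given

# setdefault returns the existing value, or stores and returns the default
words = ["apple", "bat", "bar", "atom", "book"]
by_letter = {}
for word in words:
    by_letter.setdefault(word[0], []).append(word)

print(missing, fallback, popped)  # None n/a n/a
print(by_letter)  # {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
```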

In [107]:
"""defaultdict, from the built-in collections module, makes the previous task
easier. To create one, pass a type or function for generating the default value
for each slot in the dict"""

from collections import defaultdict
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

by_letter

defaultdict(list, {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']})

- ### set
A set is an unordered collection of unique elements. You can think of them like
dicts, but with keys only and no values. There are two ways to create one: via the
set function or with a set literal using curly braces:


In [108]:
set([2, 2, 2, 1, 3, 3])  # set function

{1, 2, 3}

In [109]:
{2, 2, 2, 1, 3, 3}  # set literal

{1, 2, 3}

In [None]:
# set functions can be found in pg.66
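
For quick reference, a sketch of the most common set operations using two made-up sets `a` and `b`. The results are passed through sorted only to make the printed order predictable.

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

union = a | b          # same as a.union(b)
intersection = a & b   # same as a.intersection(b)
difference = a - b     # elements in a but not in b
symmetric = a ^ b      # elements in exactly one of the two sets

print(sorted(union))         # [1, 2, 3, 4, 5, 6, 7, 8]
print(sorted(intersection))  # [3, 4, 5]
print(sorted(difference))    # [1, 2]
print(sorted(symmetric))     # [1, 2, 6, 7, 8]
print({1, 2}.issubset(a))    # True
```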

- ### List, Set, and Dict Comprehensions
List comprehensions are one of the most-loved Python language features. They
allow you to concisely form a new list by filtering the elements of a collection
and transforming the elements that pass the filter, all in one expression. They
take the basic form:

    [expr for val in collection if condition]
    
This is equivalent to the following for loop:


    result = []
    for val in collection:
        if condition:
            result.append(expr)

In [110]:
# example
strings = ["a", "as", "bat", "car", "dove", "python"]
[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']

In [113]:
# examples
unique_lengths = {len(x) for x in strings}
unique_lengths

{1, 2, 3, 4, 6}

In [115]:
loc_mapping = {val : index for index, val in enumerate(strings)}
loc_mapping

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

#### Nested list comprehensions

In [118]:
# list of lists containing English and Spanish names
all_data = [['John', 'Emily', 'Michael', 'Mary', 'Steven'],
            ['Maria', 'Juan', 'Javier', 'Natalia', 'Pilar']]

# we want to get a single list containing all names with two or more e's in them
names_of_interest = []
for names in all_data:
    enough_es = [name for name in names if name.count('e') >= 2]
    names_of_interest.extend(enough_es)

names_of_interest

['Steven']

In [125]:
# using a nested list comprehension
# the for clauses appear in the same order as the nested loops above
result = [name for names in all_data for name in names if name.count("e") >= 2]
result

['Steven']
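
The ordering of the for clauses is easiest to see when flattening. This sketch assumes a small list of tuples, `some_tuples`, and contrasts the flattening comprehension with its loop equivalent and with a comprehension nested inside another one.

```python
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# Flatten: the outer loop comes first, exactly as in nested for loops
flattened = [x for tup in some_tuples for x in tup]

# The same thing written as explicit loops
flattened_loop = []
for tup in some_tuples:
    for x in tup:
        flattened_loop.append(x)

# A comprehension inside a comprehension keeps the nesting instead
as_lists = [[x for x in tup] for tup in some_tuples]

print(flattened)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(as_lists)   # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```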

## 3.2 Functions
- ### Anonymous (Lambda) Functions
    - Called anonymous because, unlike functions declared with def, the function object is never given an explicit `__name__` (it is just `'<lambda>'`)

In [127]:
# a lambda is a function written as a single expression, introduced by the 'lambda' keyword
# below are two equivalent functions

def short_function(x):
    return x * 2

equiv_anon = lambda x: x * 2

In [128]:
# example two

def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)

# could have also written [x * 2 for x in ints]

[8, 0, 2, 10, 12]

In [133]:
# example three
# sort a collection of strings by the number of distinct letters in each string

strings = ["foo", "card", "bar", "aaaa", "abab"]
strings.sort(key=lambda x: len(set(x)))
strings

['aaaa', 'foo', 'abab', 'bar', 'card']

- ### Generators
    - A generator is a concise way to construct a new iterable object. Unlike normal functions, generators return a sequence of multiple results lazily, pausing after each one until the next one is requested. To create a generator use the 'yield' keyword instead of return in a function.

In [175]:
def squares(n=10):
    print("Generating squares from 1 to {0}".format(n ** 2))
    for i in range(1, n + 1):
        yield i ** 2

gen = squares()
gen

<generator object squares at 0x0451CC30>

In [176]:
# when the generator function is called, no code is executed immediately
# it is not until you request elements from the generator that it begins to execute
# example
 
for x in gen:
    print(x, end=' ')

Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100 

In [177]:
# a more concise way to make a generator is by using a generator expression
# analogous to list, dict, & set comprehensions, but enclosed in parentheses

gen = (x ** 2 for x in range(100))
gen

<generator object <genexpr> at 0x0476B510>

#### Generator expressions

In [179]:
# generator expressions can be used instead of list comprehensions as function args

sum(x ** 2 for x in range(100))  # same as sum(gen)

328350

In [180]:
dict((i, i ** 2) for i in range(5))

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
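
One consequence of lazy evaluation worth noting is that a generator can only be consumed once. A minimal sketch, using a shortened version of the squares function without the print call:

```python
def squares(n=3):
    for i in range(1, n + 1):
        yield i ** 2

gen = squares()
first = next(gen)       # runs the body up to the first yield
remaining = list(gen)   # consumes everything that is left
leftover = list(gen)    # the generator is now exhausted

print(first)      # 1
print(remaining)  # [4, 9]
print(leftover)   # []
```

To iterate again, call the generator function again to get a fresh generator.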

#### itertools module
The standard library itertools module has a collection of generators for many common data algorithms.

In [188]:
# For example, 'groupby' takes any sequence and a function, grouping consecutive
# elements in the sequence by return value of the function

import itertools

first_letter = lambda x: x[0]
names = ["Alan","Adam", "Wes", "Will", "Albert", "Steven"]
for letter, group in itertools.groupby(names, first_letter):
    print(letter, list(group))  # group is an iterator over one run of consecutive names

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']
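
Because groupby only groups consecutive elements, "A" appears twice in the output above. If you want one group per letter, a common approach is to sort by the same key first. A sketch, with `grouped` as an illustrative name:

```python
import itertools

names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]
first_letter = lambda x: x[0]

# Sorting by the same key makes equal keys adjacent, so each letter appears once
sorted_names = sorted(names, key=first_letter)
grouped = {
    letter: list(group)
    for letter, group in itertools.groupby(sorted_names, first_letter)
}

print(grouped)
# {'A': ['Alan', 'Adam', 'Albert'], 'S': ['Steven'], 'W': ['Wes', 'Will']}
```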
