## Built-in Data Structures, Functions, and Files
We’ll start with Python’s workhorse data structures: tuples, lists, dicts, and sets. Then, we’ll discuss creating your own reusable Python functions. Finally, we’ll look at the mechanics of Python file objects and interacting with your local hard drive.

### Data Structures and Sequences

**Tuple**

A tuple is a fixed-length, immutable sequence of Python objects.
Elements can be accessed with square brackets [] as with most other sequence types. As in C, C++, Java, and many other languages, sequences are 0-indexed in Python.
While the objects stored in a tuple may be mutable themselves, once the tuple is created it’s not possible to modify which object is stored in each slot.
If an object inside a tuple is mutable, such as a list, you can modify it in place.
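Here is a short sketch of both rules. Reassigning a slot raises `TypeError`, but a list stored in a slot can still be changed in place. The variable name `tup_with_list` is illustrative only.

```python
tup_with_list = tuple(['foo', [1, 2], True])

# Reassigning a slot fails: tuples do not support item assignment
try:
    tup_with_list[2] = False
except TypeError as exc:
    print('TypeError:', exc)

# The list object stored in slot 1 is itself mutable, so it can change in place
tup_with_list[1].append(3)
print(tup_with_list)  # ('foo', [1, 2, 3], True)
```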


In [9]:
tup = 4, 5, 6
print(tup)

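# tuples can contain other tuples
nested_tup = (4, 5, 6), (7, 8)
print(nested_tup)

# tuple() converts any sequence or iterator to a tuple
tup = tuple('string')
print(tup)

# elements are accessed with square brackets, starting at index 0
print(tup[1])

# tuple unpacking: the tuple on the right is unpacked into a and b
a, b = 1, 2
print(a+b)

(4, 5, 6)
((4, 5, 6), (7, 8))
('s', 't', 'r', 'i', 'n', 'g')
t
3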


In [11]:
# typical use: unpack each tuple while iterating over a sequence of tuples
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
for a, b, c in seq:
    print('a={0}, b={1}, c={2}'.format(a, b, c))

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9


In [13]:
values = 1, 2, 3, 4, 5
a, b, *rest = values

print(a,b)
print(rest)

a, b, *_ = values  # same unpacking; by convention, _ names values you intend to discard

1 2
[3, 4, 5]


In [16]:
a = (1, 2, 2, 2, 3, 4, 2)
a.count(2)

4

**List**

In contrast with tuples, lists are variable-length and their contents can be modified in-place. You can define them using square brackets [] or using the list type function.
Lists and tuples are semantically similar (though tuples cannot be modified) and can be used interchangeably in many functions.
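A minimal sketch of both points follows. A list can grow and change in place, while many built-in functions (`len`, `sum`, `sorted`) accept a list or a tuple alike. The variable names are illustrative.

```python
nums_list = [3, 1, 2]
nums_tuple = (3, 1, 2)

# Lists are variable-length and mutable
nums_list.append(4)
nums_list[0] = 30
print(nums_list)  # [30, 1, 2, 4]

# Many built-ins behave the same on either sequence type
print(len(nums_tuple), sum(nums_tuple), sorted(nums_tuple))  # 3 6 [1, 2, 3]

# Converting between the two types is a single call
print(tuple(nums_list), list(nums_tuple))  # (30, 1, 2, 4) [3, 1, 2]
```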

In [21]:
a_list = [2, 3, 7, None]
print(a_list)
tup = ('foo', 'bar', 'baz')
b_list = list(tup)
print(b_list)

b_list[1] = 'peekaboo'

[2, 3, 7, None]
['foo', 'bar', 'baz']


In [27]:
b_list.append('dwarf')          # add an element to the end
b_list.insert(2, 'baxplus')     # insert an element at a specific position
print(b_list)
b_list.pop(2)                   # remove (and return) the element at index 2
print(b_list)

['foo', 'peekaboo', 'baxplus', 'baz', 'dwarf']
['foo', 'peekaboo', 'baz', 'dwarf']


In [31]:
# enumerate yields (index, value) pairs
some_list = ['foo', 'bar', 'baz']
mapping = {}
for i, v in enumerate(some_list):
    mapping[v] = i
print(mapping)

# zip pairs up the elements of several sequences
seq1 = ['foo', 'bar', 'baz']
seq2 = ['ssa', 'blue', 'zabba']
zipped = zip(seq1, seq2)

list(zipped)

{'foo': 0, 'bar': 1, 'baz': 2}
[('foo', 'ssa'), ('bar', 'blue'), ('baz', 'zabba')]
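Two more `zip` behaviors are worth knowing. The result stops at the shortest input, and `zip(*pairs)` reverses the operation by "unzipping" a sequence of pairs. `zip` also returns a one-shot iterator, which is why the cell above wraps it in `list()`. The data below is illustrative.

```python
seq1 = ['foo', 'bar', 'baz']
seq3 = [False, True]

# zip stops as soon as the shortest input is exhausted
short = list(zip(seq1, seq3))
print(short)  # [('foo', False), ('bar', True)]

# zip(*pairs) "unzips" a sequence of pairs back into separate tuples
pairs = [('a', 1), ('b', 2), ('c', 3)]
letters, numbers = zip(*pairs)
print(letters)  # ('a', 'b', 'c')
print(numbers)  # (1, 2, 3)

# zip returns an iterator, which is exhausted after one pass
zipped = zip(seq1, seq3)
print(list(zipped))  # [('foo', False), ('bar', True)]
print(list(zipped))  # []
```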

**Dictionaries**


dict is likely the most important built-in Python data structure. In other programming languages it is often called a hash map or an associative array. It is a flexibly sized collection of key-value pairs, where key and value are Python objects. One way to create one is with curly braces {}, using colons to separate keys and values.
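A minimal sketch of the most common dict operations, with illustrative keys. `in` tests keys, `get` returns a default instead of raising `KeyError`, `del` and `pop` remove entries, and `update` merges another dict in place.

```python
d = {'a': 'some value', 'b': [1, 2, 3, 4]}

# `in` checks keys, not values
print('b' in d)  # True

# Indexing a missing key raises KeyError; get() returns a default instead
print(d.get('missing', 'default'))  # default

# Add an entry, then remove entries with del and pop
d['dummy'] = 'another value'
del d['dummy']
popped = d.pop('a')  # removes the key and returns its value
print(popped)        # some value

# update() merges another dict in place, overwriting existing keys
d.update({'b': 'foo', 'c': 12})
print(d)  # {'b': 'foo', 'c': 12}
```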

In [33]:
empty_dict = {}
d1 = {'a' : 'some value', 'b' : [1, 2, 3, 4]}
print(d1)
d1[7] = 'an integer'  # add a new key-value pair; keys need not be strings
print(d1['b'])

{'a': 'some value', 'b': [1, 2, 3, 4]}
[1, 2, 3, 4]


In [34]:
list(d1.keys())

['a', 'b', 7]

In [35]:
mapping = dict(zip(range(5), reversed(range(5))))  # build a dict from (key, value) pairs

In [36]:
mapping

{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

In [40]:
words = ['apple', 'bat', 'bar', 'atom', 'book']
by_letter = {}
for word in words:
    letter = word[0]
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)


In [41]:
by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
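The if/else in that loop is a common pattern, and the standard library offers two shorter ways to write it. The sketch below reuses the same `words` list. `dict.setdefault` returns the existing value for a key, or inserts a default first, and `collections.defaultdict` builds the default automatically on first access.

```python
from collections import defaultdict

words = ['apple', 'bat', 'bar', 'atom', 'book']

# setdefault returns the list for this letter, creating an empty one if needed
by_letter = {}
for word in words:
    by_letter.setdefault(word[0], []).append(word)

# defaultdict(list) calls list() to create the value for any missing key
by_letter2 = defaultdict(list)
for word in words:
    by_letter2[word[0]].append(word)

print(by_letter)         # {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
print(dict(by_letter2))  # same contents
```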

**Functions**

Functions are the primary and most important method of code organization and reuse in Python. As a rule of thumb, if you anticipate needing to repeat the same or very similar code more than once, it may be worth writing a reusable function. Functions can also help make your code more readable by giving a name to a group of Python statements.

A function may contain multiple return statements; the first one reached ends the call. If Python reaches the end of a function without encountering a return statement, None is returned automatically.
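A small sketch of both points. `classify` and `no_return` are hypothetical helpers, not from the text.

```python
def classify(x):
    # Each branch returns on its own; the first return reached ends the call
    if x < 0:
        return 'negative'
    elif x == 0:
        return 'zero'
    return 'positive'


def no_return():
    # Falls off the end without a return statement, so None is returned
    pass


print(classify(-5), classify(0), classify(3))  # negative zero positive
print(no_return())                             # None
```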

Functions can access variables in two different scopes: global and local. An alternative, more descriptive name for a variable scope in Python is a namespace. By default, any variable assigned within a function is assigned to the local namespace. The local namespace is created when the function is called and is immediately populated by the function’s arguments. After the function finishes, the local namespace is destroyed.

Assigning to variables outside of the function’s scope is possible, but those variables must be declared with the global keyword.
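A sketch of local versus global assignment follows. The function names are hypothetical. The list built inside `func` lives in the local namespace, assigning to `b` without `global` only creates a local name, and `bind_a_variable` uses `global` to rebind a module-level name.

```python
def func():
    a = []  # a new local list is created on every call
    for i in range(5):
        a.append(i)
    return a


print(func())  # [0, 1, 2, 3, 4]; the local name `a` is gone after the call

b = 'original'

def rebind_without_global():
    b = 'changed'  # creates a local `b`; the module-level `b` is untouched


rebind_without_global()
print(b)  # original

a = None

def bind_a_variable():
    global a  # declare that `a` refers to the module-level name
    a = []


bind_a_variable()
print(a)  # []
```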



In [4]:
import re

states = ['   Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
          'south   carolina##', 'West virginia?']

def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()                # drop leading/trailing whitespace
        value = re.sub(r'[!#?]', '', value)  # remove unwanted punctuation
        value = re.sub(r'\s+', ' ', value)   # collapse any run of whitespace to one space
        value = value.title()                # normalize capitalization
        result.append(value)
    return result


clean_strings(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South Carolina',
 'West Virginia']
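Because functions are ordinary objects, the cleaning steps can also be kept in a list and applied in order. Below is a sketch of that alternative using the same `states` data. The names `remove_punctuation`, `collapse_spaces`, and `clean_ops` are illustrative.

```python
import re

states = ['   Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda',
          'south   carolina##', 'West virginia?']


def remove_punctuation(value):
    return re.sub(r'[!#?]', '', value)


def collapse_spaces(value):
    return re.sub(r'\s+', ' ', value)


# Each step is a function that takes one string and returns one string
clean_ops = [str.strip, remove_punctuation, collapse_spaces, str.title]


def clean_strings(strings, ops):
    result = []
    for value in strings:
        for func in ops:  # apply each operation in order
            value = func(value)
        result.append(value)
    return result


print(clean_strings(states, clean_ops))
```

Adding or reordering a step then only means editing `clean_ops`, not the loop.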


**Anonymous (Lambda) Functions**

Python has support for so-called anonymous or lambda functions, which are a way of writing functions consisting of a single expression, the result of which is the return value. They are defined with the lambda keyword, which has no meaning other than “we are declaring an anonymous function”.

I usually refer to these as lambda functions in the rest of the book. They are especially convenient in data analysis because, as you’ll see, there are many cases where data transformation functions will take functions as arguments. It’s often less typing (and clearer) to pass a lambda function as opposed to writing a full-out function declaration or even assigning the lambda function to a local variable.
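For instance, the built-ins `sorted`, `map`, and `filter` all take a function argument, and an inline lambda keeps each call on one line. The data below is illustrative.

```python
words = ['banana', 'Apple', 'cherry', 'date']

# Sort case-insensitively by passing a key function
by_lower = sorted(words, key=lambda w: w.lower())
print(by_lower)  # ['Apple', 'banana', 'cherry', 'date']

# Transform every element
upper = list(map(lambda w: w.upper(), words))
print(upper)  # ['BANANA', 'APPLE', 'CHERRY', 'DATE']

# Keep only the elements for which the function returns True
long_words = list(filter(lambda w: len(w) > 5, words))
print(long_words)  # ['banana', 'cherry']
```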

In [5]:
def short_function(x):
    return x * 2

equiv_anon = lambda x: x * 2  # behaves the same as short_function

In [10]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)

[8, 0, 2, 10, 12]

In [17]:
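col = [1, 2, 23, 4, 4, 4, 2, 23, None, 2, 3, 21, 21, None, 1, 23, 12, 31, 31]

# count the missing (None) entries: map each element to a bool, then sum the bools
sum(apply_to_list(col, lambda x: x is None))

2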



In [8]:
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']

strings.sort(key=lambda x: len(set(x)))  # sort by the number of distinct letters

strings

['aaaa', 'foo', 'abab', 'bar', 'card']

In [9]:
some_dict = {'a': 1, 'b': 2, 'c': 3}
for key in some_dict:
    print(key)


a
b
c
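Iterating over a dict directly yields its keys, in insertion order (guaranteed since Python 3.7). The sketch below shows the other two views, `values()` and `items()`, on the same `some_dict`.

```python
some_dict = {'a': 1, 'b': 2, 'c': 3}

# values() yields just the values
total = sum(some_dict.values())
print(total)  # 6

# items() yields (key, value) tuples, which unpack directly in the loop
for key, value in some_dict.items():
    print(key, value)

# A dict comprehension over items() builds a new dict, here with keys and values swapped
inverted = {value: key for key, value in some_dict.items()}
print(inverted)  # {1: 'a', 2: 'b', 3: 'c'}
```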
