# Built-in Data Structures

## List

### bisect

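The built-in bisect module implements binary search and insertion into a sorted list. bisect.bisect() finds the location where an element should be inserted to keep the list sorted. The module does not check that the list is sorted, so using it on an unsorted list gives meaningless results.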

In [2]:
c = [1, 2, 2, 2, 3, 4, 7]

import bisect
bisect.bisect(c, 2)

4

In [3]:
bisect.bisect(c, 5)

6

In [4]:
c

[1, 2, 2, 2, 3, 4, 7]

bisect.insort() actually inserts the element into that location:

In [5]:
bisect.insort(c, 6)
c

[1, 2, 2, 2, 3, 4, 6, 7]
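
bisect.bisect() is an alias for bisect.bisect_right(), which returns the position after any equal entries, while bisect.bisect_left() returns the position before them. A minimal sketch of the difference; the list name `scores` is made up for illustration:

```python
import bisect

scores = [1, 2, 2, 2, 3, 4, 7]  # hypothetical, already-sorted data

# bisect_right (what bisect.bisect calls) lands after the run of 2s
right = bisect.bisect_right(scores, 2)

# bisect_left lands before the run of 2s
left = bisect.bisect_left(scores, 2)

print(left, right)  # 1 4

# insort_left inserts before equal elements; the list stays sorted
bisect.insort_left(scores, 2)
print(scores)  # [1, 2, 2, 2, 2, 3, 4, 7]
```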

### enumerate

In [12]:
some_list = ['foo', 'bar', 'baz']

In [13]:
mapping = {}

In [14]:
for i, v in enumerate(some_list):
    mapping[v] = i

In [15]:
mapping

{'foo': 0, 'bar': 1, 'baz': 2}

In [17]:
sorted([7, 1, 9, 6, 0, 3, 2])

[0, 1, 2, 3, 6, 7, 9]

### zip

In [31]:
seq1 = ['foo', 'bar', 'baz']
seq2 = ['one', 'two', 'three']
zipped = zip(seq1, seq2)
list(zipped)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

In [32]:
seq3 = [False, True]
list(zip(seq1, seq2, seq3))

[('foo', 'one', False), ('bar', 'two', True)]
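
zip stops at the shortest input, which is why the result above has only two tuples. When the leftover items matter, itertools.zip_longest from the standard library pads the shorter inputs instead. A small sketch, using the default fill value None:

```python
from itertools import zip_longest

seq1 = ['foo', 'bar', 'baz']
seq3 = [False, True]

# zip truncates to the shortest input
short = list(zip(seq1, seq3))

# zip_longest pads the shorter input with fillvalue (None by default)
padded = list(zip_longest(seq1, seq3, fillvalue=None))

print(short)   # [('foo', False), ('bar', True)]
print(padded)  # [('foo', False), ('bar', True), ('baz', None)]
```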

### enumerate + zip

In [25]:
for i, (a, b) in enumerate(zip(seq1, seq2)):
    print('{0}: {1}, {2}'.format(i, a, b))

0: foo, one
1: bar, two
2: baz, three


### unzip the zipped

In [33]:
pitchers = [('Nolan', 'Ryan'), ('Roger', 'Clemens'), ('Schilling', 'Curt')]
first_names, last_names = zip(*pitchers)
print(first_names)
print(last_names)

('Nolan', 'Roger', 'Schilling')
('Ryan', 'Clemens', 'Curt')


In [30]:
list(reversed(range(10)))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

## Dict

### Assigning

In [4]:
d1 = {'a':'some value', 'b':[1, 2, 3, 4]}
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}

In [5]:
d1['b']

[1, 2, 3, 4]

In [6]:
d1[5] = 'some value'
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 5: 'some value'}

### Creating dicts from sequences

In [21]:
''' equivalent to the following loop:
mapping = {}
for key, value in zip(key_list, value_list):
    mapping[key] = value
'''

mapping = dict(zip(range(1,6), ['a', 'b', 'c', 'd', 'e']))
mapping

{1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'e'}

### Modify

In [7]:
del d1[5]
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}

In [8]:
d1.update({'b': 'foo', 'c':12})

In [11]:
list(d1.keys())

['a', 'b', 'c']

In [12]:
list(d1.values())

['some value', 'foo', 12]

In [14]:
# You can merge one dict into another using the update method
d1.update({'b': 'foo', 'c': 20})
d1

{'a': 'some value', 'b': 'foo', 'c': 20}

### .setdefault()

dictionary.setdefault(keyname, value)

- If the key exists, the value argument has no effect.

- If the key does not exist, it is inserted with value as its value.

setdefault() returns the value stored under the key after the call.

In [6]:
words = ['apple', 'bat', 'bar', 'atom', 'book']

by_letter = {}
for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)
    # Suppose letter is 'a'. If by_letter has no key 'a' yet, setdefault adds 'a'
    # as a key with an empty list [] as its value. The call then returns that
    # value, the empty list, so .append(word) can add the word to it directly.
    #
    # The second time a word starting with 'a' comes up ('atom'),
    # by_letter.setdefault(letter, []) returns ['apple'], so 'atom' is appended to it.
    print(by_letter)

{'a': ['apple']}
{'a': ['apple'], 'b': ['bat']}
{'a': ['apple'], 'b': ['bat', 'bar']}
{'a': ['apple', 'atom'], 'b': ['bat', 'bar']}
{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
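
The same grouping can be sketched with collections.defaultdict, which creates the empty list on first access so that no setdefault() call is needed. This is an alternative shown for comparison, not a replacement for the loop above:

```python
from collections import defaultdict

words = ['apple', 'bat', 'bar', 'atom', 'book']

# list is the default factory: a missing key gets a fresh empty list
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

print(dict(by_letter))
# {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
```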


## set

A set is an **unordered** collection of **unique** elements. You can think of them like dicts, but keys only, no values.

Elements of a set must be hashable, which in practice means immutable objects such as numbers, strings, and tuples of hashable items.
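
A quick sketch of what that restriction looks like in practice: a tuple is accepted as a set element, while a list is rejected with a TypeError. The name `points` is made up for illustration, and the exact wording of the error message may differ between Python versions:

```python
# A tuple is hashable, so it can be a set element
points = {(0, 0), (1, 2)}
points.add((1, 2))  # duplicate, silently ignored
print(len(points))  # 2

# A list is mutable and unhashable, so adding it raises TypeError
try:
    points.add([3, 4])
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```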

In [23]:
a = {1, 2, 3, 4, 5}
b = set([3, 4, 5, 6, 7, 8])

### operations

In [24]:
a.union(b)
a | b

{1, 2, 3, 4, 5, 6, 7, 8}

In [25]:
a.intersection(b)
a & b

{3, 4, 5}
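
Beyond union and intersection, sets also support difference, symmetric difference, and subset tests. A short sketch using the same two sets (results are printed sorted because sets have no guaranteed order):

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

# Elements in a but not in b
diff = a - b          # same as a.difference(b)

# Elements in exactly one of the two sets
sym_diff = a ^ b      # same as a.symmetric_difference(b)

# Subset test: every element of {3, 4} is also in a
is_sub = {3, 4}.issubset(a)

print(sorted(diff))      # [1, 2]
print(sorted(sym_diff))  # [1, 2, 6, 7, 8]
print(is_sub)            # True
```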

## List, Set, and Dict Comprehensions


In [8]:
# List Comprehension

strings = ['a', 'as', 'bat', 'car', 'dove', 'python']
[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']

In [9]:
# Set comprehension
unique_lengths = {len(x) for x in strings}  # a set keeps only the distinct values


In [29]:
# Dict comprehension
loc_mapping = {val: index for index, val in enumerate(strings)}
loc_mapping

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

The for parts of the list comprehension are arranged according to the order of nesting, and any filter condition is put at the end as before. 

In [8]:
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
flattened = [x for tup in some_tuples for x in tup]
flattened

[1, 2, 3, 4, 5, 6, 7, 8, 9]
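
To make the ordering rule concrete, here is a sketch that adds a filter to the flattening example and writes the same logic as explicit loops. The for clauses appear in the same order in both forms, with the condition last; the even-number filter is just an illustrative choice:

```python
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# Comprehension: outer loop first, inner loop second, filter last
even_flat = [x for tup in some_tuples for x in tup if x % 2 == 0]

# Equivalent explicit loops, in the same order
even_loop = []
for tup in some_tuples:
    for x in tup:
        if x % 2 == 0:
            even_loop.append(x)

print(even_flat)  # [2, 4, 6, 8]
```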

# Functions

In [1]:
states = ['  Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda', 
          'south carolina##', 'West virginia?']

import re

def clean_strings1(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub('[!#?]', '', value)
        value = value.title()
        result.append(value)
    return result

clean_strings1(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South Carolina',
 'West Virginia']

## Make a **list of the operations** you want to apply to a particular set of strings

In [4]:
states = ['  Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda', 
          'south carolina##', 'West virginia?']

def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings2(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

clean_strings2(states, clean_ops)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South Carolina',
 'West Virginia']

In [6]:
states

['  Alabama ',
 'Georgia!',
 'Georgia',
 'georgia',
 'FlOrIda',
 'south carolina##',
 'West virginia?']

### map()

In [5]:
for x in map(remove_punctuation, states):
    print(x)

  Alabama 
Georgia
Georgia
georgia
FlOrIda
south carolina
West virginia


## Anonymous (Lambda) Functions

In [12]:
def short_function(x):
    return x*2

equiv_anon = lambda x: x * 2
three_doubled = equiv_anon(3)
three_doubled

6

In [13]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x*2)

[8, 0, 2, 10, 12]

In [16]:
# Sort a collection of strings by the number of distinct letters in each string:

strings = ['foo', 'card', 'bar', 'aaaa', 'abab']

strings.sort(key=lambda x: len(set(x)))
strings

['aaaa', 'foo', 'abab', 'bar', 'card']

## Generator

An **iterator** is any object that will yield objects to the Python interpreter when used in a context like a for loop. Most functions expecting a list or list-like object will also accept any iterable object. This includes built-in functions such as min, max, and sum, and type constructors like list() and tuple().
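
As a rough sketch of what a for loop does behind the scenes: it asks the object for an iterator with iter(), then calls next() on it until StopIteration is raised. The list `letters` is made up for illustration:

```python
letters = ['a', 'b']      # any iterable works here
it = iter(letters)        # obtain an iterator from the iterable

first = next(it)          # 'a'
second = next(it)         # 'b'

# Once the items run out, next() raises StopIteration,
# which is the signal a for loop uses to stop
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, exhausted)  # a b True
```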

A **generator** is a concise way to construct a new iterable object. Whereas normal functions execute and return a single result at a time, generators return a sequence of multiple results lazily, pausing after each one until the next one is requested. 

To create a generator, use the **yield** keyword instead of return in a function:



In [10]:
def squares(n=10):
    print('Generating squares from 1 to {0}'.format(n ** 2))
    for i in range(1, n+1):
        yield i ** 2
        
gen = squares(5)

# Calling the generator function does not run its body yet; it only returns a generator object:
gen

<generator object squares at 0x7fac18171bd0>

In [11]:
# It is not until you request elements from the generator that it begins executing its code:
for x in gen:
    print(x, end=' ')

Generating squares from 1 to 25
1 4 9 16 25 
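
One consequence worth noting is that a generator can be consumed only once. A minimal sketch, using a version of squares without the print call:

```python
def squares(n=10):
    for i in range(1, n + 1):
        yield i ** 2

gen = squares(3)
first_pass = list(gen)   # consumes the generator
second_pass = list(gen)  # nothing left to yield

print(first_pass)   # [1, 4, 9]
print(second_pass)  # []

# Calling the function again gives a fresh generator
fresh = list(squares(3))
```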

### Generator expressions:


In [18]:
gen = (x ** 2 for x in range(100))
print(next(gen))
print(next(gen))
print(next(gen))

0
1
4


Generator expressions can be used instead of list comprehensions as function arguments in many cases:


In [20]:
# The part in parentheses is a generator built from a generator expression
print(type(x ** 2 for x in range(100)))

# A generator is an iterator, so it can be passed directly as the argument to sum()
sum(x ** 2 for x in range(100))

<class 'generator'>


328350

In [14]:
dict((i, i ** 2) for i in range(5))

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

### itertools module

The standard library itertools module has a collection of generators for many common data algorithms. 

In [17]:
import itertools

names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']

first_letter = lambda x: x[0]
# groupby only groups consecutive items, which is why 'A' shows up twice below
for letter, group in itertools.groupby(names, first_letter):
    print(letter, list(group))

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']
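
Because groupby only merges adjacent items with the same key, 'A' appears twice in the output above. A common way to get one group per key is to sort by that key first. A sketch of the approach, collecting the groups into a dict for convenience:

```python
import itertools

names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
first_letter = lambda x: x[0]

# Sorting by the same key makes equal keys adjacent
sorted_names = sorted(names, key=first_letter)
grouped = {
    letter: list(group)
    for letter, group in itertools.groupby(sorted_names, first_letter)
}
print(grouped)
# {'A': ['Alan', 'Adam', 'Albert'], 'S': ['Steven'], 'W': ['Wes', 'Will']}
```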


## Errors and Exception Handling

### try-except

In [25]:
float('abc')  # the argument should be a numeric string, so this raises ValueError

ValueError: could not convert string to float: 'abc'

In [22]:
def attempt_float(x):
    try:
        return float(x)
    except:
        return x

#print(attempt_float('1.234'))
#print(attempt_float('abc'))
attempt_float('abc')

'abc'

In [24]:
float((3,5))  # the argument should be a string, not a tuple, so this raises TypeError

TypeError: float() argument must be a string or a number, not 'tuple'

Because attempt_float does not name an exception type, the except branch handles every exception:

In [23]:
attempt_float((3,5))  

(3, 5)

Naming the exception type makes the except branch handle only that kind of exception:

In [31]:
def attempt_float_b(x):
    try:
        return float(x)
    except ValueError:  # handle ValueError only
        return x

In [28]:
attempt_float_b('abc')

'abc'

In [30]:
attempt_float_b((1.3, 2.6))  # the TypeError is not handled by the except branch

TypeError: float() argument must be a string or a number, not 'tuple'

You can handle several specific exception types by listing them in a tuple:

In [33]:
def attempt_float_c(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x
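
If it helps to know which exception was caught, the instance can be bound with `as`. The function attempt_float_d below is a hypothetical variant written for illustration; it returns the exception's class name alongside the value:

```python
def attempt_float_d(x):
    """Hypothetical variant that also reports which exception occurred."""
    try:
        return float(x), None
    except (TypeError, ValueError) as exc:
        # exc is the exception instance; its class name tells the two cases apart
        return x, type(exc).__name__

print(attempt_float_d('1.5'))   # (1.5, None)
print(attempt_float_d('abc'))   # ('abc', 'ValueError')
print(attempt_float_d((3, 5)))  # ((3, 5), 'TypeError')
```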

### try-finally

In some cases, you may not want to suppress an exception, but you want some code to be **executed regardless of** whether the code in the try block succeeds or not.

In [None]:
f = open(path, 'w')  # path and write_to_file() are assumed to be defined elsewhere

try:
    write_to_file(f)
except:
    print('Failed')
else:
    print('Succeeded')
finally:
    f.close()
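
To see which branches run in each case, here is a self-contained sketch that records the order of execution. The helper `run` and the `action` callables are made up for illustration:

```python
def run(action):
    """Record which branches execute; `action` is a hypothetical callable."""
    steps = []
    try:
        action()
    except ZeroDivisionError:
        steps.append('except')
    else:
        steps.append('else')      # only when no exception was raised
    finally:
        steps.append('finally')   # always runs
    return steps

print(run(lambda: 1 / 1))  # ['else', 'finally']
print(run(lambda: 1 / 0))  # ['except', 'finally']
```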

## Files and Operating System

### read

In [None]:
path = 'examples/readme.txt'

f = open(path)
for line in f:
    print(line.rstrip())

f.seek(0)  # rewind, because the loop above consumed the file
lines = [x.rstrip() for x in f]

f.close()  # a file opened this way must be closed explicitly

In [None]:
path = 'examples/readme.txt'

with open(path) as f:
    lines = [x.rstrip() for x in f]  # This will automatically close the file f when exiting the with block

- **r**    Read-only mode
- **w**    Write-only mode; creates a new file (erasing the data for any file with the same name)
- **x**    Write-only mode; creates a new file, but fails if the file path already exists
- **a**    Append to existing file (create the file if it does not already exist)
- **r+**   Read and write
- **b**    Add to mode for binary files (e.g., 'rb' or 'wb')
- **t**    Text mode for files (automatically decoding bytes to Unicode). This is the default if not specified. Add t to other modes to use this (e.g., 'rt' or 'xt')
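
A small sketch of how the 'w', 'a', and 'x' modes behave, run inside a temporary directory so no real file is touched. The file name demo.txt is made up for illustration:

```python
import os
import tempfile

# Work in a throwaway directory so no real file is touched
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.txt')  # hypothetical file name

    with open(demo_path, 'w') as f:   # 'w' creates (or truncates) the file
        f.write('first\n')

    with open(demo_path, 'a') as f:   # 'a' keeps existing data and appends
        f.write('second\n')

    try:
        open(demo_path, 'x')          # 'x' refuses to overwrite an existing file
        x_failed = False
    except FileExistsError:
        x_failed = True

    with open(demo_path) as f:        # default mode is 'rt'
        contents = f.read()

print(repr(contents), x_failed)  # 'first\nsecond\n' True
```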


In [None]:
f = open(path)

# .read() returns a certain number of characters from the file
# .read() method advances the file handle’s position by the number of bytes read.
f.read(10)

# .tell() gives you the current position of the handle
f.tell()  

# .seek() changes the file position to the indicated byte in the file:
f.seek(3)

f.close()
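
The following sketch runs read, tell, and seek against a temporary file. It opens the file in binary mode, an assumption made here so that positions are plain byte offsets; in text mode, the value returned by tell() also depends on the encoding. The file name demo.txt is made up:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, 'demo.txt')  # hypothetical file name
    with open(demo_path, 'wb') as f:
        f.write(b'0123456789abcdef')

    with open(demo_path, 'rb') as f:
        head = f.read(5)            # first five bytes
        pos_after_read = f.tell()   # position advanced by the bytes read
        f.seek(3)                   # jump back to byte offset 3
        tail = f.read(4)            # four bytes starting at offset 3

print(head, pos_after_read, tail)  # b'01234' 5 b'3456'
```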

### write

In [None]:
with open(path) as source, open('tmp.txt', 'w') as handle:
    # copy only non-blank lines (a blank line is just '\n', so its length is 1)
    handle.writelines(x for x in source if len(x) > 1)