<h1>Data Structures and Sequences</h1>

<h3>Tuple </h3>

A tuple is a fixed-length, immutable sequence of Python objects. The easiest way to create one is with a comma-separated sequence of values:

In [1]:
tup = 4,5,6

In [2]:
tup

(4, 5, 6)

In [3]:
nested_tup = (4,5,6), (7,8)

In [4]:
nested_tup

((4, 5, 6), (7, 8))

To convert any sequence or iterator to a tuple, we use the <b>'tuple'</b> function

In [6]:
tuple([4,0,2])

(4, 0, 2)

Elements in a tuple can be accessed with square brackets [] as with most other sequence types

In [7]:
tup

(4, 5, 6)

In [8]:
tup[0]

4

In [10]:
tup[2]

6

While the objects stored in a tuple may be mutable themselves, once the tuple is created it's not possible to modify which object is stored in each slot

In [12]:
tup = tuple(['foo', [1,2], True])

In [13]:
tup[2] = False

TypeError: 'tuple' object does not support item assignment

If an object inside a tuple is mutable, such as a list, you can modify it in-place

In [14]:
tup[1].append(3)

In [15]:
tup

('foo', [1, 2, 3], True)

We can concatenate tuples using the + operator to produce longer tuples

In [17]:
(4, None, 'foo') + (6,0) + ('bar',)

(4, None, 'foo', 6, 0, 'bar')

Multiplying a tuple by an integer, as with lists, has the effect of concatenating together that many copies of the tuple

In [18]:
('foo', 'bar') * 4

('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')
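One detail worth keeping in mind is that multiplication repeats references to the same objects rather than copying them. The sketch below illustrates this with a mutable element; the names inner and repeated are illustrative and do not appear in the original notebook.

```python
# A tuple holding one mutable list, repeated three times.
inner = [1, 2]
repeated = (inner,) * 3

# All three slots refer to the same list object, not to copies.
same_object = all(item is inner for item in repeated)
print(same_object)  # True

# Mutating the list through one slot is visible through every slot.
repeated[0].append(3)
print(repeated)  # ([1, 2, 3], [1, 2, 3], [1, 2, 3])
```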

<h3>Unpacking Tuples </h3>

In [19]:
tup = (4,5,6)

In [20]:
tup

(4, 5, 6)

In [21]:
a, b, c = tup

In [22]:
a

4

In [23]:
b

5

In [24]:
c

6

Even sequences with nested tuples can be unpacked

In [25]:
tup = 4,5,(6,7)

In [26]:
a,b,(c,d) = tup

In [27]:
a

4

In [28]:
b

5

In [29]:
c

6

In [31]:
d

7

In Python, two variables can be swapped without a temporary variable like this:

In [32]:
a, b = 1,2

In [33]:
a

1

In [34]:
b

2

In [35]:
a, b = b, a

In [36]:
a

2

In [37]:
b

1

A common use of variable unpacking is iterating over sequences of tuples or lists

In [40]:
seq = [(1,2,3), (4,5,6), (7,8,9)]

In [41]:
for a, b, c in seq:
    print(f"a: {a}, b: {b} & c: {c}")

a: 1, b: 2 & c: 3
a: 4, b: 5 & c: 6
a: 7, b: 8 & c: 9


Another common use is returning multiple values from a function

Python 3 introduced more advanced tuple unpacking to help with situations where you may want to pluck a few elements from the beginning of a tuple. This uses the special syntax *rest, which is also used in function signatures to capture an arbitrarily long list of positional arguments

In [42]:
values = 1,2,3,4,5

In [43]:
a, b, *rest = values

In [44]:
a

1

In [45]:
b

2

In [46]:
rest

[3, 4, 5]

It is conventional among Python programmers to use an underscore (_) for unwanted variables instead of a name like rest

In [47]:
a, b, *_ = values

In [48]:
_

[3, 4, 5]
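The starred name does not have to come last. As a small sketch (the names first, middle, last and the helper total are made up for illustration), the same syntax can capture the middle of a sequence, and in a function signature it collects any extra positional arguments:

```python
values = 1, 2, 3, 4, 5

# The starred name can sit in the middle; it always receives a list.
first, *middle, last = values
print(first, middle, last)  # 1 [2, 3, 4] 5

# In a function signature, * collects extra positional arguments
# into a tuple (here named args).
def total(start, *args):
    return start + sum(args)

print(total(10, 1, 2, 3))  # 16
```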

<h3> Tuple Methods </h3>

Since the size and contents of a tuple cannot be modified, it is very light on instance methods. A particularly useful one is <b>'count'</b>, which counts the number of occurrences of a value.

In [49]:
a = (1,2,2,2,3,4,2)

In [50]:
a.count(1)

1

In [51]:
a.count(2)

4

In [52]:
a.index(4)

5

In [53]:
a.index(2)

1

Note that when a value is repeated, the index method returns only the index of its first occurrence.

<h3>List</h3>

In contrast to tuples, lists are variable length and their contents can be modified in-place. We can define them using square brackets [] or using the list function

In [56]:
a_list = [2,3,7, None]

In [57]:
tup = ('foo', 'bar', 'baz')

In [58]:
b_list = list(tup)

In [59]:
b_list

['foo', 'bar', 'baz']

In [61]:
b_list[1] = 'peekaboo'

In [62]:
b_list

['foo', 'peekaboo', 'baz']

The list function is frequently used in data processing as a way to materialize an iterator or generator expression.

In [63]:
gen = range(10)

In [64]:
gen

range(0, 10)

In [65]:
list(gen)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

<h3>Adding and Removing Elements</h3>

Elements can be appended to the end of the list with the append method

In [66]:
b_list

['foo', 'peekaboo', 'baz']

In [67]:
b_list.append('dwarf')

In [68]:
b_list

['foo', 'peekaboo', 'baz', 'dwarf']

Using insert an element can be inserted at a specific location in the list


In [69]:
b_list.insert(1, 'red')

In [70]:
b_list

['foo', 'red', 'peekaboo', 'baz', 'dwarf']

<b>Note: </b> The insertion index must be between 0 and the length of the list, inclusive

The inverse operation to insert is pop, which removes and returns an element at a particular index

In [71]:
b_list.pop(2)

'peekaboo'

In [72]:
b_list

['foo', 'red', 'baz', 'dwarf']

If no index is provided to the pop method, the last element in the list is removed and returned

In [73]:
b_list.pop()


'dwarf'

In [74]:
b_list

['foo', 'red', 'baz']

Elements can also be removed by value with remove, which locates the first such value and removes it from the list

In [75]:
b_list.append('foo')

In [77]:
b_list

['foo', 'red', 'baz', 'foo']

In [78]:
b_list.remove('foo')

In [79]:
b_list

['red', 'baz', 'foo']

To check if a list contains a value, we use the <b>'in'</b> keyword.

In [80]:
'dwarf' in b_list

False

In [81]:
'red' in b_list

True

In [82]:
'dwarf' not in b_list

True

Checking whether a list contains a value is a lot slower than doing so with dicts and sets, as Python makes a linear scan across the values of the list, whereas it can check the others (based on hash tables) in constant time.
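When the same list has to be searched many times, one option is to build a set from it once and test membership against the set. This is only a sketch: the names allowed_list, candidates and allowed_set are hypothetical, and it assumes every element is hashable.

```python
# A hypothetical list of allowed values and some items to check.
allowed_list = ['foo', 'red', 'baz', 'dwarf']
candidates = ['red', 'green', 'dwarf', 'blue']

# Build the set once; each later `in` check is a hash lookup
# rather than a scan over the whole list.
allowed_set = set(allowed_list)
matches = [item for item in candidates if item in allowed_set]
print(matches)  # ['red', 'dwarf']
```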

<h3>Concatenating and combining lists</h3>

Similar to tuples, adding two lists together with + concatenates them

In [84]:
[4, None, 'foo'] + [7,8,(2,3)]

[4, None, 'foo', 7, 8, (2, 3)]

If we have a list already defined, we can append multiple elements or another list to it using the <b>'extend'</b> method

In [85]:
x = [4, None, 'foo']

In [86]:
x.extend([7,8,(2,3)])

In [87]:
x

[4, None, 'foo', 7, 8, (2, 3)]

<b>Note: </b> List concatenation by addition is a comparatively expensive operation since a new list must be created and the objects copied over. Using extend to append elements to an existing list, especially if we are building up a large list, is usually preferable
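The two approaches can be compared side by side. In this sketch list_of_lists is a made-up input; both loops produce the same result, but the second one builds a new list on every iteration:

```python
list_of_lists = [[1, 2], [3, 4, 5], [6]]

# Preferred: grow one list in place.
everything = []
for chunk in list_of_lists:
    everything.extend(chunk)

# Slower alternative: each + creates a brand-new list and copies into it.
everything_concat = []
for chunk in list_of_lists:
    everything_concat = everything_concat + chunk

print(everything)  # [1, 2, 3, 4, 5, 6]
```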

<h3>Sorting</h3>

We can sort a list in-place (without creating a new object) by calling its sort method

In [89]:
a = [7,2,5,1,3]

In [90]:
a.sort()

In [91]:
a

[1, 2, 3, 5, 7]

<b>'sort'</b> has a few options that will occasionally come in handy. One is the ability to pass a secondary sort key, that is, a function that produces a value to use to sort the objects

In [92]:
b = ['saw', 'small', 'he', 'foxed', 'six']

In [93]:
b.sort(key=len)

In [94]:
b

['he', 'saw', 'six', 'small', 'foxed']
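Two further variations may be useful: the reverse option, and a key function that normalizes values before comparison. The sketch below reuses the list from above with one word capitalized (an assumption made here so that the case-insensitive sort has a visible effect):

```python
b = ['saw', 'small', 'He', 'foxed', 'six']

# reverse=True sorts in descending order; ties keep their original order.
b.sort(key=len, reverse=True)
print(b)  # ['small', 'foxed', 'saw', 'six', 'He']

# A key function can normalize values first, here to ignore case.
b.sort(key=str.lower)
print(b)  # ['foxed', 'He', 'saw', 'six', 'small']
```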

<h3>Binary Search and maintaining a sorted list </h3>

The built-in bisect module implements binary search and insertion into a sorted list. bisect.bisect finds the location where an element should be inserted to keep it sorted, while bisect.insort actually inserts the element into that location

In [95]:
import bisect

In [96]:
c = [1,2,2,2,3,4,7]

In [97]:
bisect.bisect(c,2)

4

In [98]:
bisect.bisect(c,5)

6

In [99]:
bisect.insort(c,6)

In [100]:
c

[1, 2, 2, 2, 3, 4, 6, 7]

<b>Note:</b> The bisect module functions do not check whether the list is sorted, as doing so would be computationally expensive. Thus, using them with an unsorted list will succeed without error but may lead to incorrect results.
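The module also distinguishes between inserting before or after equal elements. As a rough sketch (the variable names left, right and unsorted are illustrative), bisect_left and bisect_right bracket a run of equal values, and calling bisect on an unsorted list raises no error:

```python
import bisect

c = [1, 2, 2, 2, 3, 4, 7]

# bisect_left returns the position before any equal elements;
# bisect_right (the same function as bisect.bisect) the position after them.
left = bisect.bisect_left(c, 2)
right = bisect.bisect_right(c, 2)
print(left, right)  # 1 4

# The difference counts how many times 2 occurs in the sorted list.
print(right - left)  # 3

# On an unsorted list no error is raised, but the returned position
# does not correspond to a meaningful sorted insertion point.
unsorted = [7, 1, 4, 2]
position = bisect.bisect(unsorted, 3)
print(position)
```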

<h3>Slicing</h3>

We can select sections of most sequence types by using slice notation, which in its basic form consists of <b>start:stop</b> (stop exclusive) passed to the indexing operator []:

In [101]:
seq = [7,2,3,7,5,6,0,1]

In [104]:
len(seq)

8

In [103]:
seq[1:5]

[2, 3, 7, 5]

Slices can also be assigned to with a sequence

In [105]:
seq[3:4]= [6,3]

In [107]:
seq

[7, 2, 3, 6, 3, 5, 6, 0, 1]

In [108]:
len(seq)

9

While the element at the start index is included, the stop index is not included, so that the number of elements in the result is stop - start

Either the start or stop can be omitted, in which case they default to the start of the sequence and the end of the sequence, respectively

In [109]:
seq[:5]

[7, 2, 3, 6, 3]

Negative indices slice the sequence relative to the end:

In [111]:
seq[-4:]

[5, 6, 0, 1]

In [112]:
seq[-6:-2]

[6, 3, 5, 6]

A step can be used after a second colon to, say, take every other element

In [113]:
seq

[7, 2, 3, 6, 3, 5, 6, 0, 1]

In [114]:
seq[::2]

[7, 3, 3, 6, 1]

A clever use of this is to pass -1, which has the useful effect of reversing a list or tuple

In [115]:
seq[::-1]

[1, 0, 6, 5, 3, 6, 3, 2, 7]
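A few of the slicing properties described in this section can be checked directly. This is a small sketch using the same seq values; n and copy_of_seq are names introduced only for the illustration:

```python
seq = [7, 2, 3, 6, 3, 5, 6, 0, 1]

# Splitting at any index n and rejoining gives back the original list.
n = 4
print(seq[:n] + seq[n:] == seq)  # True

# A slice from start to stop holds stop - start elements
# (when both indices fall inside the sequence).
print(len(seq[2:6]))  # 4

# Slicing always builds a new list, so seq[:] is a shallow copy.
copy_of_seq = seq[:]
copy_of_seq[0] = 100
print(seq[0])  # 7
```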

<h3>Built-in Sequence Functions</h3>

<b>enumerate</b> - It's common when iterating over a sequence to want to keep track of the index of the current item. enumerate is a built-in Python function that returns a sequence of (index, value) tuples

<p>Syntax: for i, value in enumerate(collection): </p>

When we are indexing data, a helpful pattern that uses enumerate is computing a dict mapping the values of a sequence(which are assumed to be unique) to their locations in the sequence

In [116]:
some_list = ['foo', 'bar', 'baz']

In [118]:
mapping = {}

In [119]:
for index, value in enumerate(some_list):
    mapping[value] = index

In [120]:
mapping

{'foo': 0, 'bar': 1, 'baz': 2}

<b>sorted</b> - The sorted function returns a new sorted list from the elements of any sequence

In [121]:
sorted([7,1,2,6,0,3,2])

[0, 1, 2, 2, 3, 6, 7]

The sorted function accepts the same arguments as the sort method on lists

In [122]:
words = ['the', 'sorted', 'function', 'accepts', 'the', 'same']

In [123]:
sorted(words,key = len)

['the', 'the', 'same', 'sorted', 'accepts', 'function']

<b>zip</b> - It 'pairs' up the elements of a number of lists, tuples or other sequences to create a list of tuples

In [124]:
seq1 = ['foo', 'bar', 'baz']

In [125]:
seq2 = ['one', 'two', 'three']

In [126]:
zipped = zip(seq1, seq2)

In [127]:
zipped

<zip at 0x1cba23b5f88>

In [128]:
list(zipped)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

<b>zip</b> can take an arbitrary number of sequences, and the number of elements it produces is determined by the shortest sequence:

In [129]:
seq3 = ['False', 'True']

In [130]:
list(zip(seq1, seq2, seq3))

[('foo', 'one', 'False'), ('bar', 'two', 'True')]

A very common use of zip is simultaneously iterating over multiple sequences, possibly also combined with enumerate

In [131]:
seq1

['foo', 'bar', 'baz']

In [132]:
seq2

['one', 'two', 'three']

In [133]:
for i, (a,b) in enumerate(zip(seq1, seq2)):
    print(f"{i}: {a}, {b}")

0: foo, one
1: bar, two
2: baz, three


Given a <b>'zipped'</b> sequence, zip can be applied in a clever way to 'unzip' the sequence. Another way to think about this is converting a list of rows into a list of columns.

In [134]:
pitchers = [('Nolan', 'Ryan'), ('Roger', 'Clemens'),
            ('Curt', 'Schilling')]

In [138]:
first_name, last_name = zip(*pitchers)

In [139]:
first_name

('Nolan', 'Roger', 'Curt')

In [140]:
last_name

('Ryan', 'Clemens', 'Schilling')

<b>reversed</b> - reversed iterates over the elements of a sequence in reverse order

In [147]:
list(reversed(range(10)))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

<h3>dict</h3>

<b>'dict'</b> is likely the most important built-in Python data structure. In other languages it is often called a hash map or associative array. It is a flexibly sized collection of key-value pairs, where key and value are Python objects. One approach to creating one is to use curly braces {} and colons to separate keys and values.

In [148]:
empty_dict = {}

In [149]:
d1 = {'a': 'some value', 'b':[1,2,3,4]}

In [150]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4]}

We can access, insert or set elements using the same syntax as for accessing elements of a list or tuple

In [151]:
d1[7]  = 'an integer'

In [152]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

In [153]:
d1['b']

[1, 2, 3, 4]

We can check if a dict contains a key using the same syntax used for checking whether a list or tuple contains a value

In [154]:
'b' in d1

True

In [155]:
'b' in d1.keys()

True

We can delete values either using the <b>'del'</b> keyword or the pop method (which simultaneously returns the value and deletes the key)

In [156]:
d1[5] = 'some value'

In [157]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer', 5: 'some value'}

In [158]:
d1['dummy'] = 'another value'

In [159]:
d1

{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 5: 'some value',
 'dummy': 'another value'}

In [160]:
del d1[5]

In [161]:
d1

{'a': 'some value',
 'b': [1, 2, 3, 4],
 7: 'an integer',
 'dummy': 'another value'}

In [162]:
ret = d1.pop('dummy')

In [163]:
ret

'another value'

In [164]:
d1

{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}

The keys and values methods give you iterators of the dict's keys and values, respectively. Since Python 3.7, a dict preserves the order in which its keys were inserted, and these methods output the keys and values in that same order

In [165]:
list(d1.keys())

['a', 'b', 7]

In [166]:
list(d1.values())

['some value', [1, 2, 3, 4], 'an integer']

We can merge one dict into another using the update method

In [167]:
d1.update({'b' : 'foo', 'c': 12})

In [168]:
d1

{'a': 'some value', 'b': 'foo', 7: 'an integer', 'c': 12}

The update method changes dicts in-place, so any existing keys in the data passed to update will have their old values discarded
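If the original dict should stay untouched, one alternative is to unpack both dicts into a new dict literal. The sketch below assumes two small dicts named d1 and d2 that mirror the values used above; merged is a name introduced for the illustration:

```python
d1 = {'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}
d2 = {'b': 'foo', 'c': 12}

# Unpacking both dicts into a new literal leaves d1 and d2 unchanged.
# For duplicate keys the right-most dict wins, as with update.
merged = {**d1, **d2}
print(merged)   # {'a': 'some value', 'b': 'foo', 7: 'an integer', 'c': 12}
print(d1['b'])  # [1, 2, 3, 4]
```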

<h3>Creating dicts from sequences</h3>

It is common to end up with two sequences that you want to pair up element-wise in a dict. As a first cut, you might write code like this:

<pre>
    mapping = {}
    for key, value in zip(key_list, value_list):
        mapping[key] = value
</pre>

Since a dict is essentially a collection of 2-tuples, the dict function accepts a list of 2-tuples

In [172]:
mapping = dict(zip(range(5), reversed(range(5))))

In [173]:
mapping

{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

<b> Default Values </b>

It is very common to have logic like:
<pre>
    if key in some_dict:
        value = some_dict[key]
    else:
        value = default_value
</pre>

Thus, the dict methods get and pop can take a default value to be returned, so that the above if-else block can be written simply as:
<pre>
    value = some_dict.get(key,default_value)
</pre>

By default, get will return None if the key is not present, while pop will raise an exception
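The difference between the two methods can be seen in a short sketch. The dict contents here are made up for the illustration:

```python
some_dict = {'a': 1, 'b': 2}

# get returns the default (or None) instead of raising for a missing key.
print(some_dict.get('a', 0))   # 1
print(some_dict.get('z', 0))   # 0
print(some_dict.get('z'))      # None

# pop with a default removes the key if present and never raises.
print(some_dict.pop('b', -1))  # 2
print(some_dict.pop('b', -1))  # -1

# pop without a default raises KeyError for a missing key.
try:
    some_dict.pop('z')
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```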

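When setting values, a common case is for the values in a dict to be other collections, such as lists. For example, we can categorize a list of words by their first letters as a dict of lists:

In [5]:
words = ['apple', 'bat', 'bar', 'atom', 'book']
by_letter = {}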

In [6]:
for word in words:
    letter = word[0]
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)

In [7]:
by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

The <b>'setdefault'</b> dict method is for precisely this purpose. The preceding for loop can be rewritten as:

In [183]:
by_letter = {}
for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)

In [184]:
by_letter

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}

The built-in collections module has a useful class, defaultdict, which makes this even easier. To create one, you pass a type or function for generating the default value for each slot in the dict:

In [186]:
from collections import defaultdict
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

In [187]:
by_letter

defaultdict(list, {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']})
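The factory passed to defaultdict does not have to be list. As a sketch (the word list is assumed here, and counts and with_default are illustrative names), int gives a simple counter and a lambda can supply any fixed default:

```python
from collections import defaultdict

words = ['apple', 'bat', 'bar', 'atom', 'book']

# int() returns 0, so missing keys start counting from zero.
counts = defaultdict(int)
for word in words:
    counts[word[0]] += 1
print(dict(counts))  # {'a': 2, 'b': 3}

# Any zero-argument callable works as the factory, including a lambda.
with_default = defaultdict(lambda: 'unknown')
print(with_default['missing'])  # unknown
```

Note that looking up a missing key in a defaultdict also inserts that key with its default value.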

<h3>Valid dict key types</h3>

While the values of a dict can be any Python object, the keys generally have to be immutable objects like scalar types (int, float, string) or tuples (all the objects in the tuple need to be immutable, too). The technical term here is hashability. We can check whether an object is hashable (can be used as a key in a dict) with the hash function

In [188]:
hash('string')

-8095158123584499513

In [189]:
hash((1,2,(2,3)))

1097636502276347782

In [190]:
hash((1,2,[2,3]))

TypeError: unhashable type: 'list'

To use a list as a key, one option is to convert it to a tuple, which can be hashed as long as its elements also can


In [191]:
d = {}

In [192]:
d[tuple([1,2,3])] = 5

In [193]:
d

{(1, 2, 3): 5}
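The hash check can be wrapped in a small helper. The function is_hashable below is a hypothetical convenience, not part of the standard library; it simply reports whether hash raises TypeError:

```python
def is_hashable(obj):
    """Return True if obj can be used as a dict key (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable((1, 2, (2, 3))))      # True
print(is_hashable((1, 2, [2, 3])))      # False
print(is_hashable(frozenset({1, 2})))   # True
```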

<h3>Set</h3>

A set is an unordered collection of unique elements. We can think of them like dicts, but keys only, no values. A set can be created in two ways: via the <b>'set'</b> function or via a set literal with curly braces

In [194]:
set([2,2,2,1,3,3])

{1, 2, 3}

In [195]:
{2,2,2,1,3,3,}

{1, 2, 3}

Sets support mathematical set operations like union, intersection, difference, and symmetric difference.

In [196]:
a = {1,2,3,4,5}

In [197]:
b = {3,4,5,6,7,8}

The union of these two sets is the set of distinct elements occurring in either set. This can be computed with either the union method or the | binary operator.

In [198]:
a.union(b)

{1, 2, 3, 4, 5, 6, 7, 8}

In [201]:
a | b

{1, 2, 3, 4, 5, 6, 7, 8}

The intersection contains the elements occurring in both sets. The & operator or the intersection method can be used.

In [199]:
a.intersection(b)

{3, 4, 5}

In [200]:
a & b

{3, 4, 5}

![alt Text](Images/set_operations.png)

In [202]:
a

{1, 2, 3, 4, 5}

In [203]:
c = a.copy()

In [204]:
c

{1, 2, 3, 4, 5}

In [205]:
c|= b

In [206]:
c

{1, 2, 3, 4, 5, 6, 7, 8}

In [207]:
d = a.copy()

In [208]:
d

{1, 2, 3, 4, 5}

In [209]:
d&=b

In [210]:
d

{3, 4, 5}
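The difference and symmetric difference operations mentioned earlier follow the same pattern of a method plus an operator. A brief sketch with the same two sets (sorted is used only to make the printed order predictable; e is an illustrative name):

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

# Elements in a that are not in b.
print(sorted(a.difference(b)))  # [1, 2]
print(sorted(a - b))            # [1, 2]

# Elements in exactly one of the two sets.
print(sorted(a.symmetric_difference(b)))  # [1, 2, 6, 7, 8]
print(sorted(a ^ b))                      # [1, 2, 6, 7, 8]

# In-place form, analogous to |= and &=.
e = a.copy()
e -= b
print(sorted(e))  # [1, 2]
```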

Like dict keys, set elements generally must be immutable. To store list-like elements, you must convert them to tuples:

In [211]:
my_data = [1,2,3,4]

In [212]:
my_set= {tuple(my_data)}

In [213]:
my_set

{(1, 2, 3, 4)}

We can also check if a set is a subset of (is contained in) or a superset of (contains all elements of) another set

In [215]:
a_set = {1,2,3,4,5}

In [216]:
{1,2,3}.issubset(a_set)

True

In [217]:
a_set.issuperset({1,2,3})

True

Sets are equal if and only if their contents are equal:


In [219]:
{1,2,3} == {3,2,1}

True

<h3>List, Set and Dict Comprehensions </h3>

A list comprehension allows us to concisely form a new list by filtering the elements of a collection and transforming the elements that pass the filter, all in one concise expression of the form:
<pre> 
    [expr for val in collection if condition]
</pre>

This is equivalent to the following for loop:
<pre>
    result = []
    for val in collection:
        if condition:
            result.append(expr)
</pre>

In [220]:
strings = ['a', 'as', 'bat', 'car', 'dove', 'python']

In [221]:
[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']

Set and dict comprehensions are a natural extension, producing sets and dicts in an idiomatically similar way instead of lists. A dict comprehension looks like this:
<pre>
    dict_comp = {key-expr: value-expr for value in collection if condition}
</pre>

A set comprehension looks like the equivalent list comprehension except with curly braces instead of square brackets:
<pre>
    set_comp = {expr for value in collection if condition}
</pre>

In [222]:
strings

['a', 'as', 'bat', 'car', 'dove', 'python']

In [224]:
unique_lengths  = {len(x) for x in strings}

In [225]:
unique_lengths

{1, 2, 3, 4, 6}

We could also express this more functionally using the <b>'map'</b> function 

In [226]:
set(map(len, strings))

{1, 2, 3, 4, 6}

As a simple dict comprehension example, we could create a lookup map of these strings to their locations in the list:

In [227]:
loc_mapping = {value: index for index, value in enumerate(strings)}

In [228]:
loc_mapping

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

<h3>Nested List Comprehensions</h3>

In [229]:
all_data = [['John', 'Emily', 'Michael', 'Mary', 'Steven'],
            ['Maria', 'Juan', 'Javier', 'Natalia', 'Pillar']]

To get a single list containing all names with two or more e's in them we could do this with a simple for loop:

In [234]:
names_of_interest = []
for names in all_data:
    enough_es = [name for name in names if name.count('e')>=2]
    names_of_interest.extend(enough_es)

In [235]:
names_of_interest

['Steven']

But we can actually wrap this whole operation up in a single nested list comprehension like:

In [236]:
result = [name for names in all_data for name in names if name.count('e')>=2]

In [237]:
result

['Steven']

At first, nested list comprehensions are a bit hard to wrap your head around. The for parts of the list comprehension are arranged according to the order of nesting, and any filter condition is put at the end as before. Here is another example where we 'flatten' a list of tuples of integers into a simple list of integers

In [238]:
some_tuples = [(1,2,3),(4,5,6),(7,8,9)]

In [239]:
flattened = [x for tup in some_tuples for x in tup]

In [240]:
flattened

[1, 2, 3, 4, 5, 6, 7, 8, 9]

Keep in mind that the order of the for expressions would be the same if we wrote a nested for loop instead of a list comprehension:


In [241]:
flattened = []

In [242]:
for tup in some_tuples:
    for x in tup:
        flattened.append(x)

Building a list of lists using list comprehension

In [243]:
[[x for x in tup]for tup in some_tuples]

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

<h3> Functions </h3>

In [246]:
def my_function(x,y, z= 1.5):
    if z > 1:
        return z * (x+y)
    else:
        return z/ (x + y)

There is no issue with having multiple return statements. If Python reaches the end of a function without encountering a return statement, None is returned automatically.

Each function can have positional arguments and keyword arguments. Keyword arguments are most commonly used to specify default values or optional arguments. In the preceding function, x and y are positional arguments while z is a keyword argument. This means that the function can be called in any of these ways:

<pre>
    my_function(5,6,z=0.7)
    my_function(3.14,7,3.5)
    my_function(10,20)
</pre>

In [247]:
my_function(5,6,z=0.7)

0.06363636363636363

In [248]:
my_function(3.14,7,3.5)

35.49

In [249]:
my_function(10,20)

45.0

The main restriction on function arguments is that the keyword arguments must always follow the positional arguments (if any).
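As a sketch of what this allows, every argument of my_function (redefined here so the block runs on its own) can also be passed by keyword, in any order:

```python
def my_function(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

# Any argument can be passed by keyword, and keywords can come in any order.
print(my_function(x=5, y=6, z=7))  # 77
print(my_function(y=6, x=5, z=7))  # 77

# A positional argument after a keyword argument is a syntax error,
# so the call below is left commented out:
# my_function(x=5, 6)
```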

<h3>Namespaces, Scope and Local Functions</h3>

Functions can access variables in two different scopes: global and local. An alternative and more descriptive name describing a variable scope in Python is a namespace. Any variables that are assigned within a function by default are assigned to the local namespace. The local namespace is created when the function is called and immediately populated by the function’s arguments. After the function is finished, the local namespace is destroyed (with some exceptions that are outside the purview of this chapter).


In [261]:
def func():
    z = []
    for i in range(5):
        z.append(i)

When func() is called, the empty list z is created, five elements are appended, and then z is destroyed when the function exits.

Suppose instead we had declared z as follows:

In [263]:
z = []
def func():
    for i in range(5):
        z.append(i)

Assigning variables outside of the function's scope is possible, but those variables must be declared as global via the global keyword

In [265]:
a = None

In [266]:
def bind_a_variable():
    global a
    a = []

In [267]:
bind_a_variable()

In [268]:
print(a)

[]


<h3>Returning Multiple Values </h3>

In [269]:
def f():
    a = 5
    b = 6
    c = 7
    return a,b,c

In [270]:
a, b, c = f()

In [271]:
a

5

In [272]:
b

6

In [273]:
c

7

A potentially attractive alternative to returning multiple values like before might be to return a dict instead

In [2]:
def f():
    a = 5
    b = 6
    c = 7
    return {'a': a, 'b': b, 'c': c}

In [3]:
f()

{'a': 5, 'b': 6, 'c': 7}

<h3>Functions Are Objects</h3>

Since Python functions are objects, many constructs can be easily expressed that are difficult to do in other languages.

In [4]:
states = ['    Alababa', 'Georgiga!', 'Georgia', 'georgia', 'FlOrIda', 'south    carolina##', 'West virginia?']

To convert it into standard data we have to strip whitespace, remove punctuation symbols and standardize on proper capitalization. One way to do this is to use built-in string methods along with the 're' standard library module for regular expressions

In [5]:
import re

In [11]:
def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub('[!#?]', '', value)
        value = value.title()
        result.append(value)
    return result

In [12]:
clean_strings(states)

['Alababa',
 'Georgiga',
 'Georgia',
 'Georgia',
 'Florida',
 'South    Carolina',
 'West Virginia']

An alternative approach that we may find useful is to make a list of the operations we want to apply to a particular set of strings

In [15]:
def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

In [16]:
clean_ops = [str.strip, remove_punctuation, str.title]

In [17]:
def clean_strings(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

In [18]:
clean_strings(states, clean_ops)

['Alababa',
 'Georgiga',
 'Georgia',
 'Georgia',
 'Florida',
 'South    Carolina',
 'West Virginia']

A more functional pattern like this enables you to easily modify how the strings are transformed at a very high level. The clean_strings function is also now more reusable and generic.
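Functions can also be passed as arguments to other functions, such as the built-in map, which applies a function to each element of a sequence. The sketch below uses a shortened, made-up sample of state names rather than the full list above:

```python
import re

def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

# Shortened sample, assumed for this illustration.
sample_states = ['Georgia!', 'FlOrIda', 'south carolina##', 'West virginia?']

# map applies the function to every element lazily; list materializes it.
cleaned = list(map(remove_punctuation, sample_states))
print(cleaned)  # ['Georgia', 'FlOrIda', 'south carolina', 'West virginia']
```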

<h3>Anonymous or Lambda Functions</h3>

Python has support for so-called anonymous or lambda functions, which are a way of writing functions consisting of a single expression, the result of which is the return value. They are defined with the lambda keyword, which has no meaning other than "we are declaring an anonymous function"

In [19]:
def short_function(x):
    return x * 2

In [20]:
equivalent_anonymous = lambda x : x * 2

Lambda functions are especially convenient in data analysis because there are many cases where data transformation functions will take functions as arguments. It is often less typing (and clearer) to pass a lambda function as opposed to writing a full-out function declaration or even assigning the lambda function to a local variable.

In [21]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

In [22]:
ints = [4,0,1,5,6]

In [23]:
apply_to_list(ints, lambda x : x * 2)

[8, 0, 2, 10, 12]

In [24]:
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']

In [25]:
strings.sort(key=lambda x: len(set(x)))

In [26]:
strings

['aaaa', 'foo', 'abab', 'bar', 'card']

<b>Note: </b> One reason lambda functions are called anonymous functions is that, unlike functions declared with the def keyword, the function object itself is never given an explicit name: its __name__ attribute holds a generic placeholder rather than a name chosen by the programmer

<h3>Currying: Partial Argument Application</h3>

Currying is computer science jargon that means deriving new functions from existing ones by partial argument application.

In [32]:
def add_numbers(x,y):
    return x + y

In [33]:
add_numbers(3,4)

7

Using this function, we could derive a new function of one variable, add_five, that adds 5 to its argument

In [34]:
add_five = lambda y: add_numbers(5,y)


In [35]:
add_five(6)

11

Here, the second argument to add_numbers is said to be curried. 

The built-in functools module can simplify this process using the partial function

In [30]:
from functools import partial
add_five = partial(add_numbers,5)

In [37]:
add_five(4)

9

<h3>Generators</h3>

In [38]:
some_dict = {'a': 1, 'b': 2, 'c': 3}

In [40]:
for key in some_dict:
    print(key)

a
b
c


In [41]:
dict_iterator = iter(some_dict)

In [42]:
dict_iterator

<dict_keyiterator at 0x1fa490413b8>

An iterator is any object that will yield objects to the Python interpreter when used in a context like a for loop

In [43]:
list(dict_iterator)

['a', 'b', 'c']
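A generator is a concise way to construct a new iterable object. Where a normal function runs to completion and returns a single result, a generator uses the yield keyword to hand back values one at a time, on request. The function name squares below is an illustrative choice:

```python
def squares(n=10):
    # The body runs only when values are requested;
    # each yield hands back one value and pauses.
    for i in range(1, n + 1):
        yield i ** 2

gen = squares(5)   # no code in the body has run yet
print(list(gen))   # [1, 4, 9, 16, 25]

# A generator expression is the lazy analogue of a list comprehension.
total = sum(x ** 2 for x in range(100))
print(total)       # 328350
```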

<h3>itertools module</h3>


The standard library itertools module has a collection of generators for many common data algorithms. For example, <b>groupby</b> takes any sequence and a function, grouping consecutive elements in the sequence by return value of the function. Here's an example:

In [50]:
import itertools

In [51]:
first_letter = lambda x: x[0]

In [52]:
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']

In [53]:
for letter, names in itertools.groupby(names, first_letter):
    print(f"Letter : {letter} - {list(names)}")

Letter : A - ['Alan', 'Adam']
Letter : W - ['Wes', 'Will']
Letter : A - ['Albert']
Letter : S - ['Steven']
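Because groupby only groups consecutive elements, 'A' appears twice in the output above. If one group per key is wanted, a common approach is to sort by the same key first. A minimal sketch, where grouped is a name introduced for the illustration:

```python
import itertools

names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
first_letter = lambda x: x[0]

# Sorting by the same key first puts equal keys next to each other.
sorted_names = sorted(names, key=first_letter)
grouped = {
    letter: list(group)
    for letter, group in itertools.groupby(sorted_names, first_letter)
}
print(grouped)
# {'A': ['Alan', 'Adam', 'Albert'], 'S': ['Steven'], 'W': ['Wes', 'Will']}
```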


![alt Text](Images/itertools_func.png)

<h3>Errors and Exception Handling</h3>

In [54]:
float('1.2345')

1.2345

In [55]:
float('something')

ValueError: could not convert string to float: 'something'

In [56]:
def attempt_float(x):
    try:
        return float(x)
    except Exception as exc:
        return f"Exception occurred: {exc}"

In [57]:
attempt_float('1.2345')

1.2345

In [58]:
attempt_float('something')

"Exception occurred: could not convert string to float: 'something'"

We might want to suppress only ValueError, since a TypeError (the input was not a string or numeric value) might indicate a legitimate bug in the program. To do that, write the exception type after except:

In [59]:
def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        return x

In [60]:
attempt_float((1,2))

TypeError: float() argument must be a string or a number, not 'tuple'

We can catch multiple exception types by writing a tuple of exception types instead

In [61]:
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x

In some cases, we may not want to suppress an exception, but we do want some code to be executed regardless of whether the code in the try block succeeds. To do this, we use <b>'finally'</b>
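The following sketch shows the order of execution. The events list and the function parse_and_log are hypothetical names used only to record what runs; the exception from the second call still propagates to the caller after the finally block has run:

```python
events = []  # hypothetical log used to show the order of execution

def parse_and_log(x):
    try:
        events.append('start')
        return float(x)
    finally:
        # Runs whether float(x) succeeded or raised.
        events.append('cleanup')

print(parse_and_log('1.5'))  # 1.5

try:
    parse_and_log('something')
except ValueError:
    events.append('caught')

print(events)  # ['start', 'cleanup', 'start', 'cleanup', 'caught']
```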