# Notes for Chapter 3:

# Built-in Data Structures, Functions, and Files

***

## 1. Data Structures and Sequences

### Tuple

A tuple is an immutable, fixed-length sequence built into Python. The simplest way to create one is with a comma-separated sequence of values:

In [1]:
tup = 1, 2, 3

In [2]:
tup

(1, 2, 3)

This is the most basic way of creating a tuple. It is good practice, however, to always use parentheses, whether the tuple is simple or nested:

In [3]:
a_tup = (4, 5, 6)

nested_tup = ((1, 2, 3), (4, 5, 6))

In [4]:
a_tup

(4, 5, 6)

In [5]:
nested_tup

((1, 2, 3), (4, 5, 6))

In [6]:
type(nested_tup)

tuple

***

Furthermore, the built-in __tuple__ function (it is a type, not a keyword) can be used to convert any sequence or iterator to a tuple.

In [8]:
tuple([4, 0, 2])

(4, 0, 2)

In [11]:
tup = tuple('Hello')

In [12]:
tup

('H', 'e', 'l', 'l', 'o')

In [13]:
tup[0]

'H'

Elements inside a tuple can be accessed with index starting at 0, as with most other sequences.

***

If an object stored inside a tuple is mutable, such as a list, it can be modified in place, even though the tuple itself cannot be changed:

In [3]:
tup = ('hello', [1, 2], True)

In [4]:
tup[1].append(3)

In [5]:
tup

('hello', [1, 2, 3], True)

***

Tuples can be concatenated using the __+__ operator.

In [7]:
(4, None, 'hello') + (6, 0) + ('bar',)

(4, None, 'hello', 6, 0, 'bar')

Multiplying a tuple by an integer, as with lists, concatenates that many copies of the tuple.

In [9]:
(1, 2) * 4

(1, 2, 1, 2, 1, 2, 1, 2)

In [10]:
('hello', 'world') * 2

('hello', 'world', 'hello', 'world')

Note that the objects themselves are not copied, only the references to them.
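
The reference-sharing behaviour noted above can be seen in a small sketch. The names inner and repeated are illustrative and do not come from the original notes:

```python
# A mutable list stored inside a one-element tuple (illustrative names)
inner = [1, 2]
repeated = (inner,) * 3      # three references to the same list, not three copies

inner.append(3)              # mutate the single underlying list

print(repeated)              # every slot shows the appended value
print(all(item is inner for item in repeated))   # all slots are the same object
```

This is harmless for immutable elements such as numbers and strings, but worth remembering when the repeated tuple holds lists or dicts.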

***

##### Unpacking tuples

If a tuple-like expression of variables is assigned to, Python will attempt to unpack the value on the right-hand side of the equals sign.

In [11]:
tup = (1, 2, 3)

In [12]:
a, b, c = tup

In [13]:
a

1

###### Unpacking nested tuples

In [14]:
tup = 4, 5, (6, 7)

In [15]:
a, b, (c, d) = tup

In [16]:
c

6

In [17]:
a

4

###### Swapping in Python

In [18]:
a, b = 1, 2

In [19]:
a

1

In [20]:
b

2

In [21]:
b, a = a, b

In [22]:
a

2

In [23]:
b

1

***

Python 3 also supports a more advanced form of tuple unpacking for situations where you want to “pluck” a few elements from the beginning of a tuple. This uses the special syntax *rest, which is also used in function signatures to capture an arbitrarily long list of positional arguments:

In [24]:
values = 1, 2, 3, 4, 5

In [25]:
a, b, *rest = values

In [26]:
a

1

In [27]:
b

2

In [29]:
rest

[3, 4, 5]

The name rest is arbitrary and can be replaced with any valid variable name. By convention, unwanted variables are usually named with an underscore (_):

In [31]:
a, b, *_ = values

In [32]:
_

[3, 4, 5]

***

###### Tuple methods

Because a tuple is immutable, it has very few methods. One particularly useful method is count, which counts the number of occurrences of a value:

In [34]:
a = (1, 2, 3, 4, 2, 6, 1, 7, 9)

In [35]:
a.count(2)

2

***

### List

Lists are mutable, variable-length, and among the most widely used built-in Python sequences. A list is defined using square brackets or by passing another object to the list type function:

In [36]:
a_list = [1, 2, 3, None]

In [38]:
tup = ('x', 'y', 'z')

In [39]:
b_list = list(tup)

In [40]:
b_list

['x', 'y', 'z']

In [41]:
b_list[0] = 'e'

In [42]:
b_list

['e', 'y', 'z']

***

###### Adding and Removing Elements

The append method adds an element to the end of the list.

In [62]:
b_list.append('a')

In [63]:
b_list

['e', 'y', 'z', 'a']

Another method, insert, places an element at a specific index in the list.

In [66]:
b_list.insert(0, 'x')

In [67]:
b_list

['x', 'e', 'y', 'z', 'a']

The pop method removes the element at a given index and returns it.

In [69]:
b_list.pop(1)

'e'

In [70]:
b_list

['x', 'y', 'z', 'a']

Elements can also be removed by value with the remove method, which locates the first matching value and removes it.

In [72]:
b_list.insert(0, 'a')

In [73]:
b_list

['a', 'x', 'y', 'z', 'a']

In [75]:
b_list.remove('a')

In [76]:
b_list

['x', 'y', 'z', 'a']

The in keyword checks whether a list contains a value:

In [77]:
'a' in b_list

True

In [78]:
'hello' in b_list

False

Checking whether a list contains a value is a lot slower than doing so with dicts and
sets (to be introduced shortly), as Python makes a linear scan across the values of the
list, whereas it can check the others (based on hash tables) in constant time.
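
As a rough illustration of that difference, the sketch below times the same membership test against a list and a set using the standard library timeit module. The size n and the chosen target are arbitrary assumptions, and the absolute timings will vary by machine:

```python
import timeit

n = 100_000                      # illustrative collection size
as_list = list(range(n))
as_set = set(as_list)
target = n - 1                   # worst case for the list: the last element

# Each call repeats the membership test 100 times and returns total seconds
list_time = timeit.timeit(lambda: target in as_list, number=100)
set_time = timeit.timeit(lambda: target in as_set, number=100)

print(f'list: {list_time:.4f}s  set: {set_time:.6f}s')
```

On a typical machine the list lookup should be slower by several orders of magnitude, since it scans every element before finding the target.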

***

###### Concatenating and Combining lists

The __+__ operator concatenates two lists into a new list. Alternatively, the __extend__ method appends the elements of one list to an existing list in place.

In [79]:
[1, 2, 3] + [4, 5, 6]

[1, 2, 3, 4, 5, 6]

In [80]:
b_list

['x', 'y', 'z', 'a']

In [81]:
a_list

[1, 2, 3, None]

In [82]:
a_list.extend(b_list)

In [83]:
a_list

[1, 2, 3, None, 'x', 'y', 'z', 'a']

<span style='color:red'>Note:</span> List concatenation by addition is a comparatively expensive operation, since a new list must be created and the objects copied over. Using extend to append elements to an existing list, especially if you are building up a large list, is usually preferable.
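
A minimal sketch of the two patterns side by side. The name list_of_lists and its contents are hypothetical sample data:

```python
list_of_lists = [[1, 2], [3, 4], [5, 6]]   # hypothetical input chunks

# Preferred: extend grows one list in place
everything = []
for chunk in list_of_lists:
    everything.extend(chunk)

# Slower alternative: + builds a brand-new list on every iteration
everything_concat = []
for chunk in list_of_lists:
    everything_concat = everything_concat + chunk

print(everything)
print(everything == everything_concat)   # same contents either way
```

Both loops yield the same list; the difference is only in how much copying happens along the way.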

***

###### Sorting

The sort method sorts a list in place, without creating a new object.

In [86]:
a = [10, 8 , 9 , 6, 2, 4, 1, 5, 3 ]

In [87]:
a

[10, 8, 9, 6, 2, 4, 1, 5, 3]

In [88]:
a.sort()

In [89]:
a

[1, 2, 3, 4, 5, 6, 8, 9, 10]

sort accepts an optional key argument: a function that produces the value used to order each element. For example, strings can be sorted by length:

In [92]:
b = ['i','Thank You', 'fine', 'am']

In [93]:
b.sort(key = len)

In [94]:
b

['i', 'am', 'fine', 'Thank You']

***

###### Binary search and maintaining a sorted list

The built-in bisect module implements binary search and insertion into a sorted list. bisect.bisect finds the location where an element should be inserted to keep it sorted, while bisect.insort actually inserts the element into that location:

In [95]:
import bisect

In [104]:
a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9 , 10]

In [105]:
bisect.bisect(a_list, 2)

2

In [106]:
bisect.bisect(a_list, 9)

9

In [107]:
bisect.insort(a_list, 9)

In [108]:
a_list

[1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 10]
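
One caveat worth a sketch: the bisect functions assume the list is already sorted and do not check it, so using them on an unsorted list gives meaningless positions. The example below also shows bisect_left and bisect_right, which differ only in where they land relative to equal values. The list scores is illustrative:

```python
import bisect

scores = [1, 2, 2, 2, 3, 4, 7]          # must already be sorted

left = bisect.bisect_left(scores, 2)    # insertion point before the existing 2s
right = bisect.bisect_right(scores, 2)  # insertion point after them (bisect.bisect is an alias)

print(left, right)                      # 1 4

bisect.insort(scores, 6)                # insert while keeping the list sorted
print(scores)                           # [1, 2, 2, 2, 3, 4, 6, 7]
```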

***

###### Slicing

Slicing selects a section of a sequence by giving a start:stop range inside the indexing brackets.

In [111]:
seq = [5, 4, 3, 2, 1, 0]

In [112]:
seq[1:3]

[4, 3]

A sequence can also be assigned to a slice:

In [113]:
seq[3:4] = ['a', 'b']

In [114]:
seq

[5, 4, 3, 'a', 'b', 1, 0]

The element at the start index is included, but the stop index is excluded.

Either the start or the stop can be omitted, in which case it defaults to the start or the end of the sequence, respectively:

In [115]:
seq

[5, 4, 3, 'a', 'b', 1, 0]

In [116]:
seq[:4]

[5, 4, 3, 'a']

In [117]:
seq[1:]

[4, 3, 'a', 'b', 1, 0]

Furthermore, a step can be given after a second colon (__::__), for example to take every other element; a step of -1 reverses the sequence:

In [118]:
seq[::2]

[5, 3, 'b', 0]

In [119]:
seq[::-1]

[0, 1, 'b', 'a', 3, 4, 5]
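
Negative indices are not shown above but follow the same rules: they slice relative to the end of the sequence. A short sketch, reusing the same values as seq (the result names are illustrative):

```python
seq = [5, 4, 3, 'a', 'b', 1, 0]

last_four = seq[-4:]      # from the fourth-from-last element to the end
middle = seq[-6:-2]       # start and stop both counted from the end

print(last_four)          # ['a', 'b', 1, 0]
print(middle)             # [4, 3, 'a', 'b']
```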

***

### Built-in Sequence Functions

#### enumerate

enumerate is a built-in Python function that returns a sequence of (i, value) tuples, which saves keeping a manual counter while iterating:

In [None]:
for i, value in enumerate(collection):
    # do something with value
    # i is the index of value within collection
    pass

A dict can be combined with enumerate to record the index of each value in a collection:

In [122]:
some_list = ['He', 'is', 'tall']

In [134]:
mapping = {}

In [135]:
for i, value in enumerate(some_list):
    mapping[i] = value

In [136]:
mapping

{0: 'He', 1: 'is', 2: 'tall'}
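
The same pattern also works in the other direction, mapping each value to its position, and enumerate takes an optional start argument when counting should not begin at 0. A small sketch with illustrative names:

```python
some_list = ['He', 'is', 'tall']

# Map each value to its index (assumes the values are distinct)
position_of = {}
for i, value in enumerate(some_list):
    position_of[value] = i

# Count from 1 instead of 0
numbered = list(enumerate(some_list, start=1))

print(position_of)   # {'He': 0, 'is': 1, 'tall': 2}
print(numbered)      # [(1, 'He'), (2, 'is'), (3, 'tall')]
```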

***

###### sorted

The __sorted__ function returns a new sorted list from the elements of any sequence:

In [137]:
sorted([1, 2, 3])

[1, 2, 3]

In [138]:
a_list = ['volley', 'ball', 'xylophone', 'rex']

In [139]:
sorted(a_list)

['ball', 'rex', 'volley', 'xylophone']

In [140]:
a_list

['volley', 'ball', 'xylophone', 'rex']

The original object is unchanged: __sorted__ returns a new sorted list, unlike the __sort__ method, which sorts the list in place.

***

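###### zip

zip pairs up the elements of a number of lists, tuples, or other sequences, producing tuples. In Python 3 it returns a lazy zip object, which can be passed to list to materialize the pairs: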

In [141]:
seq1 = ['foo', 'bar', 'baz']

In [142]:
seq2 = ['one', 'two', 'three']

In [143]:
zipped = zip(seq1, seq2)

In [144]:
zipped

<zip at 0x1fd51fc0c80>

In [145]:
list(zipped)

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

zip can take an arbitrary number of sequences, and the number of elements it produces is determined by the shortest sequence:

In [146]:
seq3 = [False, True]

In [147]:
list(zip(seq1, seq2, seq3))

[('foo', 'one', False), ('bar', 'two', True)]
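
zip can also be used in reverse, to "unzip" a list of rows into separate columns, by unpacking the list with the * operator. A minimal sketch with illustrative names:

```python
pairs = [('foo', 'one'), ('bar', 'two'), ('baz', 'three')]

# zip(*pairs) is the same as zip(pairs[0], pairs[1], pairs[2])
first, second = zip(*pairs)

print(first)    # ('foo', 'bar', 'baz')
print(second)   # ('one', 'two', 'three')
```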

***

###### reversed

reversed returns an iterator that yields the elements of a sequence in reverse order, so it produces no values until it is materialized (for example with list or a for loop):

In [148]:
list(reversed(range(10)))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

***

### Dict

dict is another built-in Python data structure, and likely the most important. Strictly speaking it is a mapping rather than a sequence, and it is also known as a hash map or associative array. It is a flexibly sized collection of key-value pairs, where both key and value are Python objects.

In [149]:
empty_dict = {}

In [151]:
d1 = {1: 'hello', 2: 'world', 'a': [1, 2, 3]}

In [152]:
d1

{1: 'hello', 2: 'world', 'a': [1, 2, 3]}

Elements of a dict can be accessed, inserted, or set using the same square-bracket syntax used for lists and tuples.
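
As a small sketch, the same square-bracket syntax used for lists reads, inserts, and overwrites dict entries, and the in keyword tests for a key. A separate dict named example (an illustrative name) is used here so the d1 cells that follow are unaffected:

```python
example = {1: 'hello', 2: 'world', 'a': [1, 2, 3]}

example[7] = 'an integer'     # insert a new key
example[1] = 'hi'             # overwrite an existing key
value = example['a']          # look up by key

print(value)                  # [1, 2, 3]
print(2 in example)           # True: membership tests keys, not values
print('world' in example)     # False: 'world' is a value, not a key
```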

Furthermore, the __keys__ and __values__ methods give views of the dict's keys and values, respectively.

In [153]:
d1.keys()

dict_keys([1, 2, 'a'])

In [154]:
d1.values()

dict_values(['hello', 'world', [1, 2, 3]])

The __del__ keyword and the __pop__ method can be used to remove entries from a dict. For example:

In [155]:
d1['b'] = 'some value'

In [156]:
d1

{1: 'hello', 2: 'world', 'a': [1, 2, 3], 'b': 'some value'}

In [158]:
del d1['b']

In [159]:
d1

{1: 'hello', 2: 'world', 'a': [1, 2, 3]}

__pop__ returns the value removed.

In [160]:
d1.pop('a')

[1, 2, 3]

In [161]:
d1

{1: 'hello', 2: 'world'}

The __update__ method merges one dict into another in place; values for keys that already exist are overwritten.

In [162]:
d1.update({'b': 'foo', 'c': 12})

In [163]:
d1

{1: 'hello', 2: 'world', 'b': 'foo', 'c': 12}

In [164]:
d1.update({1: 'Bye'})

In [165]:
d1

{1: 'Bye', 2: 'world', 'b': 'foo', 'c': 12}

***

###### Creating dicts from sequences

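It is common to end up with two sequences that need to be paired up element-wise in a dict. A first attempt might use a loop over zip; key_list and value_list below are illustrative sample values:

In [None]:
key_list = ['a', 'b', 'c']      # sample keys (illustrative)
value_list = [1, 2, 3]          # sample values (illustrative)
mapping = {}
for key, value in zip(key_list, value_list):
    mapping[key] = value

Since a dict is essentially a collection of 2-tuples, the dict function also accepts a list of 2-tuples (or any iterable of pairs, such as a zip object):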

In [169]:
mapping = dict(zip(range(5), reversed(range(5))))

In [170]:
mapping

{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

***

###### Valid dict key types

While the values of a dict can be any Python object, the keys generally have to be immutable objects such as scalar types (int, float, string) or tuples (all the objects in the tuple need to be immutable, too). The technical term for this is hashability. An object's hashability can be checked with the __hash__ function:

In [171]:
hash('hello world')

-7051824215975767429

In [172]:
hash((1, 2 , (2, 3)))

-9209053662355515447

In [174]:
hash([1, 2])  # raises an error because lists are mutable, and therefore unhashable

TypeError: unhashable type: 'list'
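
To use a list as a key, one option is to convert it to a tuple, which is hashable as long as its elements are. A brief sketch; coords and lookup are illustrative names:

```python
coords = [1, 2, 3]           # a list cannot be used as a dict key
lookup = {}

lookup[tuple(coords)] = 'point A'    # the tuple version is hashable

try:
    lookup[coords] = 'point B'       # the list itself is rejected
except TypeError as exc:
    error_message = str(exc)

print(lookup)
print(error_message)
```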

***

###### Default Values

The dict methods __get__ and __pop__ can take a default value to be returned, so that the if-else block in __#sample 1__ below can be written more simply as in __#sample 2__:

In [None]:
# sample 1

if key in some_dict:
    value = some_dict[key]
else:
    value = default_value

In [None]:
# sample 2

value = some_dict.get(key, default_value)
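
A related case is when the default is itself a container that values are accumulated into. The sketch below groups words by first letter in two ways, with the dict method setdefault and with collections.defaultdict from the standard library. The word list and variable names are illustrative:

```python
from collections import defaultdict

words = ['apple', 'bat', 'bar', 'atom', 'book']   # sample data

# setdefault returns the existing list for the key, or stores and returns a new one
by_letter = {}
for word in words:
    by_letter.setdefault(word[0], []).append(word)

# defaultdict creates the missing list automatically on first access
by_letter_dd = defaultdict(list)
for word in words:
    by_letter_dd[word[0]].append(word)

print(by_letter)   # {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
```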

***

### set

A set is an unordered collection of unique elements. A set can be created either with the __set__ function or with a set literal in curly braces. Sets can be thought of as dicts that have keys only, no values.

In [2]:
set([1, 2, 3 , 4 , 4, 5])

{1, 2, 3, 4, 5}

###### Repeated values are discarded in a set

In [5]:
{1, 2 , 2, 2, 3, 4, 4, 4, 5, 5,}

{1, 2, 3, 4, 5}

***

Sets are especially useful because they support mathematical set operations such as union, intersection, difference, and symmetric difference.

In [6]:
a = {1, 2, 3, 4, 5}

In [7]:
b = {6, 7, 8, 2, 3, 9}

In [8]:
a.union(b)

{1, 2, 3, 4, 5, 6, 7, 8, 9}

In [11]:
# symbolic representation of union

a | b

{1, 2, 3, 4, 5, 6, 7, 8, 9}

In [12]:
a.intersection(b)

{2, 3}

In [13]:
# symbolic representation of intersection

a & b

{2, 3}
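
Difference and symmetric difference work the same way. A short sketch using the same values as a and b above; the result names are illustrative:

```python
a = {1, 2, 3, 4, 5}
b = {6, 7, 8, 2, 3, 9}

only_in_a = a - b                 # same as a.difference(b)
in_exactly_one = a ^ b            # same as a.symmetric_difference(b)

print(only_in_a)                  # {1, 4, 5}
print(in_exactly_one)             # {1, 4, 5, 6, 7, 8, 9}
```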

##### Other useful set methods

In [None]:
Function                  Alternative syntax               Description

a.add(x)                           N/A          Add element x to the set a
a.clear()                          N/A          Reset the set a to an empty state, discarding all of its elements

a.remove(x)                        N/A          Remove element x from the set a
a.pop()                            N/A          Remove arbitrary element from the set a, raising KeyError if the set is empty
 
a.union(b)                         a | b        All of the unique elements in a and b
a.update(b)                        a |= b       Set the contents of a to be the union of the elements in a and b

a.intersection(b)                  a & b        All of the elements in both a and b
a.intersection_update(b)           a &= b       Set the contents of a to be the intersection of the elements in a and b


a.difference(b)                    a - b        The elements in a that are not in b
a.difference_update(b)             a -= b       Set a to the elements in a that are not in b
a.symmetric_difference(b)          a ^ b        All of the elements in either a or b but not both
a.symmetric_difference_update(b)   a ^= b       Set a to contain the elements in either a or b but not both


a.issubset(b)                      N/A          True if the elements of a are all contained in b
a.issuperset(b)                    N/A          True if the elements of b are all contained in a
a.isdisjoint(b)                    N/A          True if a and b have no elements in common



In [14]:
# copy element of a set to another

c = a.copy()

In [15]:
c

{1, 2, 3, 4, 5}

We can check whether a set is a subset or a superset of another set:

In [17]:
a_set = {1, 2, 3, 4, 5}

In [18]:
{1, 2, 3}.issubset(a_set)

True

In [19]:
a_set.issuperset({1, 2, 3})

True

<span style='color:red'>Note:</span> Sets are equal only if their contents are equal:

In [21]:
{1, 2, 3} == {1, 2, 3}

True

***

### List, set and Dict Comprehensions

A list comprehension concisely forms a new list by filtering the elements of a collection and transforming the elements that pass the filter, all in one expression. The basic form is:

[expr for value in collection if condition]

In [None]:
# The syntax above is equivalent to the following for loop:

result = []
for value in collection:
    if condition:
        result.append(expr)

The filter condition can be omitted, leaving only the expression. For example, given a list of strings, we could filter out strings with length 2 or less and also convert them to uppercase like this:


In [22]:
strings = ['a', 'as', 'bat', 'hello', 'nice', 'trick']

In [23]:
[x.upper() for x in strings if len(x)>2]

['BAT', 'HELLO', 'NICE', 'TRICK']

Similarly, __dict comprehensions__:

In [None]:
dict_comp = {key-expr : value-expr for value in collection if condition}

__set comprehensions__:

In [None]:
set_comp = {expr for value in collection if condition}
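
The two templates above are not runnable as written, so here is a concrete sketch of each, reusing the strings list from the earlier example. The names unique_lengths and loc_mapping are illustrative:

```python
strings = ['a', 'as', 'bat', 'hello', 'nice', 'trick']

# Set comprehension: the distinct string lengths
unique_lengths = {len(x) for x in strings}

# Dict comprehension: map each string to its position in the list
loc_mapping = {value: index for index, value in enumerate(strings)}

print(unique_lengths)   # {1, 2, 3, 4, 5}
print(loc_mapping)
```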

###### Nested List Comprehensions

In [25]:
all_data = [['John', 'Emily', 'Michael', 'Mary', 'Steven'], ['Maria', 'Juan', 'Javier', 'Natalia', 'Pilar']]

Suppose this is a list of lists containing many names, and we want a single list of all names with two or more e's in them. With a for loop this would be:

In [None]:
eNames = []
for names in all_data:
    enough_es = [name for name in names if name.count('e') >= 2]
    eNames.extend(enough_es)

Similarly, the nested list comprehension would be:

In [26]:
result = [name for names in all_data for name in names if name.count('e') >= 2]

In [28]:
result

['Steven']

***

## Functions

A function, in simple terms, is a named block of code written to perform a specific task within a program. Defining a function makes that block of code reusable: it can be called again and again wherever it is needed.

Functions are declared with the def keyword and return values with the return keyword:

In [None]:
def sample_function(x, y, z=1.5):
    if z>1:
        return z* (x+y)
    else:
        return z/ (x+y)

If Python reaches the end of a function without encountering a return statement, None is returned automatically. Furthermore, there is no issue with having multiple return statements.

Each function can have positional arguments and keyword arguments. Keyword arguments are most commonly used to specify default values or optional arguments; they must follow any positional arguments.

In [None]:
sample_function(1, 2, z=0.5)

In the example above, z is passed as a keyword argument, while x and y are passed positionally.

***

### Namespaces, Scope and Local Functions

Functions can access variables in two different scopes: global and local. An alternative and more descriptive name for a variable scope in Python is a namespace.

Any variables that are assigned within a function by default are assigned to the local namespace. The local namespace is created when the function is called and immediately populated by the function’s arguments. After the function is finished, the local namespace is destroyed.

For example:

In [2]:
def func():
    a = []
    for i in range(10):
        a.append(i)


When func() is called, the empty list a is created, ten elements are appended, and then a is destroyed when the function exits. Alternatively, a can be declared outside the function, in which case func modifies the list in the global namespace:

In [4]:
a = []
def func():
    for i in range(10):
        a.append(i)
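
Appending to a global list works without any declaration, but rebinding a global name from inside a function requires the global keyword. The sketch below contrasts the two cases; all names are illustrative:

```python
counter = None                   # module-level name (illustrative)

def assign_locally():
    counter = ['local']          # creates a new local name; the global is untouched
    return counter

def assign_globally():
    global counter               # rebinding now targets the global name
    counter = []

assign_locally()
after_local = counter            # still None
assign_globally()
after_global = counter           # now an empty list

print(after_local, after_global)   # None []
```

Heavy use of global is generally a sign that the state would be better held in an object or passed explicitly.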
    

***

### Returning Multiple Values

In [5]:
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c

In [6]:
a, b, c = f()

In [7]:
return_value = f()

In [8]:
a

5

In [9]:
b

6

In [10]:
c

7

In [11]:
return_value

(5, 6, 7)

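This example shows how to return multiple values from a function and retrieve them: the function actually returns a single tuple, which is then unpacked into the result variables.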

***

### Functions are objects

In Python, functions are objects: they can be passed as arguments to other functions, stored in variables, and placed in data structures. Let's take a list of messy strings and clean it up to demonstrate this:

In [17]:
states = [' Alabama ', 'Georgia!', 'Georgia', 'georgia', 'FlOrIda', 'south   carolina##', 'West virginia?']

This data needs to be cleaned by stripping whitespace, removing punctuation symbols, and standardizing on proper capitalization. One way to do this is to use built-in string methods along with the __re__ standard library module for regular expressions:

In [18]:
import re

In [19]:
def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub('[!#?]', '', value)
        value = value.title()
        result.append(value)
    return result


In [20]:
clean_strings(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

Another approach to the same task is to make a list of the operations to apply to a particular set of strings:

In [21]:
def remove_punctuation(value):
    return re.sub('[!#?]', '', value)

clean_ops = [str.strip, remove_punctuation, str.title]

In [23]:
def clean_strings(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

In [24]:
clean_strings(states, clean_ops)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

The second approach makes it easy to modify how the strings are transformed at a very high level. The clean_strings function is also now more reusable.

***

### Anonymous (Lambda) Functions

Anonymous, or lambda, functions are a way of writing functions consisting of a single expression, the result of which is the return value. They are defined with the lambda keyword, which signifies the declaration of an anonymous function:

In [25]:
def short_function(x):
    return x * 2

In [27]:
equiv_anon = lambda x: x * 2

Lambda functions are convenient to pass as arguments to other functions. For example:

In [28]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

ints = [1, 2, 3, 4]

In [29]:
apply_to_list(ints, lambda x: x * 2)

[2, 4, 6, 8]

<span style='color:red'>Note:</span>

One reason lambda functions are called anonymous functions is that, unlike functions declared with the def keyword, the function object is never given an explicit name of its own (its \_\_name\_\_ attribute is just the placeholder '<lambda>').
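
Lambdas are often used as the key argument when sorting. As a sketch, the strings below are sorted by the number of distinct letters each contains; the list contents are illustrative:

```python
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']   # sample data

# set(x) holds the distinct characters of x, so len(set(x)) counts them
strings.sort(key=lambda x: len(set(x)))

print(strings)   # ['aaaa', 'foo', 'abab', 'bar', 'card']
```

Because sort is stable, 'foo' stays ahead of 'abab' even though both have two distinct letters.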

***

### Currying: Partial Argument Application

Deriving new functions from existing ones by partial argument application is known as currying. For example:

In [31]:
def add_numbers(x, y):
    return x + y

We could derive a new function of one variable, add_five, that adds 5 to its argument:

In [32]:
add_five = lambda y: add_numbers(5, y)

We can simplify this process using the partial function from the built-in functools module:

In [35]:
from functools import partial

In [36]:
add_five = partial(add_numbers, 5)
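
A quick check that the two approaches behave the same way. The names add_five_lambda and add_five_partial are illustrative, chosen so both versions can exist side by side:

```python
from functools import partial

def add_numbers(x, y):
    return x + y

add_five_lambda = lambda y: add_numbers(5, y)   # fixes x=5 by wrapping the call
add_five_partial = partial(add_numbers, 5)      # fixes x=5 by partial application

print(add_five_lambda(10), add_five_partial(10))   # 15 15
```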

***

### Generators

A generator is a concise way to construct a new iterable object. Whereas a normal function executes and returns a single result, a generator returns a sequence of multiple results lazily, pausing after each one until the next one is requested. To create a generator, use the yield keyword instead of return in a function:

In [38]:
def squares(n = 10):
    print('Generating squares from 1 to {0}'.format(n ** 2))
    for i in range(1, n + 1):
        yield i ** 2

In [43]:
gen = squares()

gen

<generator object squares at 0x000002110A3F23C0>

In [44]:
for x in gen:
    print(x, end = ' ')

Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100 
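
The pausing behaviour can be made visible by pulling values one at a time with the built-in next function. This sketch uses a trimmed-down version of squares without the print call; the result names are illustrative:

```python
def squares(n=10):
    for i in range(1, n + 1):
        yield i ** 2

gen = squares(3)

first = next(gen)        # runs the body up to the first yield
remaining = list(gen)    # resumes and collects everything that is left
exhausted = list(gen)    # a generator can only be consumed once

print(first, remaining, exhausted)   # 1 [4, 9] []
```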

##### Generator expressions

A __generator expression__ is another, even more concise way to make a generator. This is a generator analogue to list, dict, and set comprehensions; to create one, enclose what would otherwise be a list comprehension within parentheses instead of brackets:

In [51]:
gen = (x ** 2 for x in range(1, 11))

In [52]:
gen

<generator object <genexpr> at 0x000002110A3F2D60>

In [53]:
list(gen)

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

This produces the same values as the __squares__ generator defined above.

##### itertools module

The standard library itertools module has a collection of generators for many common data algorithms. For example, groupby takes any sequence and a function, grouping consecutive elements in the sequence by the return value of the function:

In [54]:
import itertools

In [55]:
first_letter = lambda x: x[0]

In [56]:
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']

In [58]:
for letter, names in itertools.groupby(names, first_letter):
    print(letter, list(names))

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']


Some useful itertools functions:

In [None]:
      Function                                                            Description

combinations(iterable, k)             Generates a sequence of all possible k-tuples of elements in the iterable, ignoring order
                                      and without replacement (see also the companion function combinations_with_replacement)

permutations(iterable, k)             Generates a sequence of all possible k-tuples of elements in the iterable, respecting
                                      order

groupby(iterable[, keyfunc])          Generates (key, sub-iterator) for each unique key

product(*iterables, repeat=1)         Generates the Cartesian product of the input iterables as tuples, similar to a nested for
                                      loop
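
A short sketch of the functions in the table, on small illustrative inputs. It also shows one way to make groupby collect all items with the same key rather than only consecutive runs, namely sorting by the key first:

```python
import itertools

letters = 'abc'   # illustrative input

pairs_unordered = list(itertools.combinations(letters, 2))
pairs_ordered = list(itertools.permutations(letters, 2))
grid = list(itertools.product([0, 1], repeat=2))

print(pairs_unordered)   # [('a', 'b'), ('a', 'c'), ('b', 'c')]
print(len(pairs_ordered))   # 6
print(grid)              # [(0, 0), (0, 1), (1, 0), (1, 1)]

# groupby only groups consecutive elements, so sort by the key first
names = ['Alan', 'Adam', 'Wes', 'Will', 'Albert', 'Steven']
first_letter = lambda x: x[0]
ordered = sorted(names, key=first_letter)
grouped = {letter: list(group)
           for letter, group in itertools.groupby(ordered, first_letter)}

print(grouped)
```

With the sort in place, 'Albert' joins the other names starting with A instead of forming a second A group.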


***

### Errors and Exception Handling

We can use try and except blocks to handle errors and exceptions in Python. For example, the __float__ function converts a string into a float, but fails with a __ValueError__ on improper inputs:

In [59]:
float('1.3555')

1.3555

In [60]:
float('hello world')

ValueError: could not convert string to float: 'hello world'

Suppose we want a version of float that fails gracefully. We can do this by wrapping the call in a try/except block:

In [78]:
def attempt_float(x):
    try:
        return float(x)
    except:
        return (print('(', x,')', 'cannot be converted to float.'))

In [79]:
attempt_float('34.355')

34.355

In [80]:
attempt_float('Hello World')

( Hello World ) cannot be converted to float.


We can name a specific exception type to suppress only that kind of error. For example, the __float__ function can raise exceptions other than ValueError:

In [81]:
float((1, 2))

TypeError: float() argument must be a string or a number, not 'tuple'

Since we want to suppress only ValueError, we can write:

In [83]:
def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        return (print('(', x,')', 'cannot be converted to float.'))

In [84]:
attempt_float((2, 1))

TypeError: float() argument must be a string or a number, not 'tuple'

Furthermore, we can catch multiple exception types by writing a tuple of exception types (the parentheses are required). For example:

In [85]:
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return (print('(', x,')', 'cannot be converted to float.'))

In some cases, you may not want to suppress an exception, but you want some code to be executed regardless of whether the code in the try block succeeds or not. To do this, use __finally__:

In [None]:
f = open(path, 'w')

try:
    write_to_file(f)
finally:
    f.close()

Similarly, code that should run only if the try block succeeds can be placed in an __else__ block:

In [None]:
f = open(path, 'w')

try:
    write_to_file(f)
except:
    print('failed')
else:
    print('succeeded')
finally:
    f.close()
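
The snippets above rely on path and write_to_file being defined elsewhere. A self-contained sketch, assuming a hypothetical write_to_file helper and a temporary file created with the standard library tempfile module, shows the order in which the blocks run:

```python
import os
import tempfile

def write_to_file(handle):
    # Hypothetical helper: the notes do not define it
    handle.write('hello\n')

path = os.path.join(tempfile.mkdtemp(), 'example.txt')   # throwaway location
events = []

f = open(path, 'w')
try:
    write_to_file(f)
except OSError:
    events.append('failed')       # runs only if writing raises OSError
else:
    events.append('succeeded')    # runs only if the try block raised nothing
finally:
    f.close()                     # always runs
    events.append('closed')

print(events)   # ['succeeded', 'closed']
```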

***

## Files and Operating System

We can use the built-in __open__ function to open a file for reading or writing, given either a relative or an absolute file path:

In [None]:
path = 'user/helloWorld.txt'

f = open(path)

By default, the file is opened in read-only mode 'r'. We can treat the file handle __f__ like a list and iterate over the lines like so:

In [None]:
for line in f:
    pass

The lines come out of the file with the end-of-line (EOL) markers intact, so you’ll
often see code to get an EOL-free list of lines in a file like:

In [None]:
lines = [x.rstrip() for x in open(path)]

lines

['Sueña el rico en su riqueza,',
'que más cuidados le ofrece;',
'',
'sueña el pobre que padece',
'su miseria y su pobreza;',
'',
'sueña el que a medrar empieza,',
'sueña el que afana y pretende,',
'sueña el que agravia y ofende,',
'',
'y en el mundo, en conclusión,',
'todos sueñan lo que son,',
'aunque ninguno lo entiende.',
'']


When you use open to create file objects, it is important to explicitly close the file when you are finished with it. Closing the file releases its resources back to the operating system:

In [None]:
f.close()

Another way to open files is the __with__ statement, which automatically closes the file when the with block exits. For example:

In [None]:
with open(path) as f:
    lines = [x.rstrip() for x in f]

Some of the most common methods used for readable files are __read__, __seek__ and __tell__. __read__ returns a certain number of characters from the file. What counts as a character is determined by the file's encoding; if the file is opened in binary mode, read returns raw bytes instead.

In [None]:
In [1]: f = open(path)

In [2]: f.read(10)
Out[2]: 'Sueña el r'

In [3]: f2 = open(path, 'rb') # Binary mode

In [4]: f2.read(10)
Out[4]: b'Sue\xc3\xb1a el '

The read method advances the file handle’s position by the number of bytes read. __tell__ gives you the current position:


In [None]:
In [5]: f.tell()
Out[5]: 11
    
In [6]: f2.tell()
Out[6]: 10

Even though we read 10 characters from the file, the position is 11 because it took that many bytes to decode 10 characters using the default encoding (ñ occupies two bytes in UTF-8).

__seek__ changes the file position to the indicated byte in the file:

In [None]:
In [7]: f.seek(3)
Out[7]: 3
    
In [8]: f.read(1)
Out[8]: 'ñ'
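
The cells above depend on a file that is not included with these notes. The sketch below recreates the idea with a temporary file holding only the first line of the poem, written explicitly as UTF-8. The file name and variable names are assumptions, and seek is shown in binary mode, where positions are plain byte offsets:

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), 'poem.txt')   # throwaway file
with open(demo_path, 'w', encoding='utf-8') as out:
    out.write('Sueña el rico en su riqueza,\n')

# Text mode: read counts characters
with open(demo_path, encoding='utf-8') as text_file:
    chars = text_file.read(10)

# Binary mode: read, tell and seek all count bytes
with open(demo_path, 'rb') as raw_file:
    raw = raw_file.read(10)
    byte_pos = raw_file.tell()
    raw_file.seek(3)                      # jump to the first byte of ñ
    enye = raw_file.read(2).decode('utf-8')

print(chars)      # Sueña el r
print(raw)        # b'Sue\xc3\xb1a el '
print(byte_pos)   # 10
print(enye)       # ñ
```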

Files should always be closed once they are no longer needed:

In [None]:
In [9]: f.close()
    
In [10]: f2.close()

#### Python File Modes

In [None]:
Mode                                         Description

r            Read-only mode

w            Write-only mode; creates a new file (erasing the data for any file with the same name)

x            Write-only mode; creates a new file, but fails if the file path already exists

a            Append to existing file (create the file if it does not already exist)

r+           Read and write

b            Add to mode for binary files (i.e., 'rb' or 'wb')

t            Text mode for files (automatically decoding bytes to Unicode). This is the default if not specified. Add t to
             other modes to use this (i.e., 'rt' or 'xt')

To write text to a file, you can use the file’s write or writelines methods. For example, we could create a version of the file at path with no blank lines like so:


In [None]:
In [11]: with open('tmp.txt', 'w') as handle:
            handle.writelines(x for x in open(path) if len(x) > 1)

In [12]: with open('tmp.txt') as f:
            lines = f.readlines()

In [13]: lines
Out[13]:
['Sueña el rico en su riqueza,\n',
'que más cuidados le ofrece;\n',
'sueña el pobre que padece\n',
'su miseria y su pobreza;\n',
'sueña el que a medrar empieza,\n',
'sueña el que afana y pretende,\n',
'sueña el que agravia y ofende,\n',
'y en el mundo, en conclusión,\n',
'todos sueñan lo que son,\n',
'aunque ninguno lo entiende.\n']

#### Python file methods or attributes

In [None]:
  Method                                                             Description

read([size])              Return data from file as a string, with optional size argument indicating the number of bytes to read

readlines([size])         Return list of lines in the file, with optional size argument

write(str)                Write passed string to file

writelines(strings)       Write passed sequence of strings to the file

close()                   Close the handle

flush()                   Flush the internal I/O buffer to disk

seek(pos)                 Move to indicated file position (integer)

tell()                    Return current file position as integer

closed                    True if the file is closed

***

### Bytes and Unicode with Files

The default behavior for Python files (whether readable or writable) is text mode, which means that you intend to work with Python strings (i.e., Unicode). This contrasts with binary mode, which you can obtain by appending b onto the file mode.

In [230]: with open(path) as f:
    chars = f.read(10)

In [231]: chars

Out[231]: 'Sueña el r'

UTF-8 is a variable-length Unicode encoding, so when we request some number of characters from the file, Python reads enough bytes (possibly more than that number) to decode that many characters. If we open the file in 'rb' mode instead, read requests an exact number of bytes:

In [None]:
In [232]: with open(path, 'rb') as f:
            data = f.read(10)

In [233]: data
Out[233]: b'Sue\xc3\xb1a el '
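
Depending on the encoding, the bytes can be decoded to a str manually, but only if every encoded character is complete. A small sketch using the same ten bytes as above; the variable names are illustrative:

```python
data = b'Sue\xc3\xb1a el '         # the ten bytes read in binary mode

text = data.decode('utf8')         # complete characters decode cleanly

# Cutting the two-byte ñ in half leaves an incomplete character
try:
    data[:4].decode('utf8')
    truncated_failed = False
except UnicodeDecodeError:
    truncated_failed = True

print(text)               # Sueña el 
print(truncated_failed)   # True
```

This is why reading a fixed number of bytes from a text file and decoding the result can fail at arbitrary boundaries.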