# Python Bootcamp Day 1 Afternoon

Box problem solution

In [None]:
import random

def individual_strategy(boxes_list, num_tries, num_to_find) :
    """ Implements the cycle strategy for a player to
    look for their number (num_to_find) in a list (boxes_list) with
    a limited number of tries (num_tries). The function
    returns True if the number is found, False otherwise.
    The algorithm treats boxes_list as a permutation and
    follows the cycle containing num_to_find. """

    count = 0
    idx = num_to_find
    while count < num_tries :
        curr = boxes_list[idx]
        if curr == num_to_find :
            return True
        idx = curr
        count += 1
    return False

def play_box_game(num_ppl, num_tries, strategy, num_runs = 1) :
    """ Given num_ppl, num_tries, a strategy, and num_runs, will
    play the game num_runs times. For each run we randomly shuffle
    the box list. Returns the fraction of games won."""

    boxes = list(range(num_ppl))
    game_results = []
    while len(game_results) < num_runs :
        result = True
        # Shuffle the boxes once per game
        random.shuffle(boxes)
        for person in range(num_ppl) :
            result = result and strategy(boxes,num_tries,person)
            if result is False : # We have lost !!!
                break 
        game_results.append(result)
    return sum(game_results)/len(game_results)

In [None]:
play_box_game(10, 5, individual_strategy, 100000)

## Docstrings, comments, and writing style

Along with **good variable names**, docstrings and comments are a great way to keep your code readable and organized. Let's begin with comments.

In [None]:
# This is a comment

four = 4 # This is a comment on the same line as some code

# To make multi-line comments
# just place a # at the beginning of every line

You should keep your **line length** around **80** characters for readability. __In general, the end of a line in python means the end of a statement or command__. However, sometimes you have a very long expression and you want to use multiple lines to make it look clean.

If your line is getting too long, you can use the `\` symbol to tell python to **explicitly** continue on the next line. Additionally, inside `[]`, `()` and similar brackets, python will **implicitly** continue reading on the next line.

In [None]:
# Continuing a line explicitly using \
if 6 < 4 \
    or 2 in [0,2,3] \
    and 'something' != 'else' :
        print("That's a lot of conditions to check.")

# You can also do this in declarations
example = 'this ' + 'is ' + 'not ' + 'the ' + 'best ' +\
    'example ' + 'however ' + 'I ' + 'hope ' + 'you ' + 'get ' +\
    'the ' + 'point'
    
print(example)

**Remark.** Notice that in the above `if` statement, the `and` operator binds more tightly than `or`, so the condition is grouped as `6 < 4 or (2 in [0,2,3] and 'something' != 'else')`. If you want to be explicit, use `()` around your `and`, `or` expressions.
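
A quick way to check how python groups a mixed `and`/`or` expression is to compare it against explicitly parenthesized versions. The sketch below uses made-up values, chosen only so that the two possible groupings give different answers.

```python
# Values chosen so the two possible groupings disagree
a, b, c = True, False, False

ungrouped = a or b and c
and_first = a or (b and c)   # `and` grouped first
or_first = (a or b) and c    # `or` grouped first

print(ungrouped, and_first, or_first)
print(ungrouped == and_first)
```

The ungrouped expression agrees with the version where `and` is grouped first.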

**Warning.** You cannot put *any* characters after the line continuation character `\`. Not even a space! Not even a comment.

In most cases, you should use implicit continuation. It generally reads better.

In [None]:
# For continuing implicitly, we can use ()
if (6 < 4
    or 2 in [0,2,3]
    and 'something' != 'else') :
        print("That's a lot of conditions to check.")
             
# In declarations we can also use implicit continuation
long_list = ['a', 'very', 'very', 'very', 'very', 'very',
             'very', 'very', 'long', 'list']

print(long_list)

The style guide conventions for python can be found at [www.python.org/dev/peps/pep-0008/](https://www.python.org/dev/peps/pep-0008/)

### Docstrings

It is important to document your functions so that users will know what kind of input the function expects and what kind of output it will produce. To document a function, we use **docstrings** in python, which are just **triple-quoted** strings on the line right after the `def` statement.

The docstring style guide can be found at [www.python.org/dev/peps/pep-0257/](https://www.python.org/dev/peps/pep-0257/)

In [None]:
"""
This is a multi-line comment

you can see if this is good style or not by looking at the style guide

"""

def bits_needed(n) :
    """
    Computes the number of bits needed to represent
    an integer. The count includes the sign.
    
    Parameters
    ----------
    n : int, required
    """
    return len('{0:+b}'.format(n))

In [None]:
help(bits_needed)

### Function annotation

If you ever encounter something like

In [None]:
def add(x : int, y : int) -> int:
    return x + y

help(add)

know that this is **function annotation**. It is simply for **readability** and **has no impact on how the code runs.** Python will **not** check the types for you if you write code this way. The above is **equivalent** to

In [None]:
def add(x,y) :
    return x + y
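
To see that annotations really are not enforced, here is a small sketch that calls an annotated function with strings. The annotations are stored on the function object in `__annotations__`, but nothing checks them at call time.

```python
def add(x: int, y: int) -> int:
    return x + y

# Annotations are stored on the function object but never enforced
print(add.__annotations__)

# Strings are not ints, yet the call succeeds because + works on str
result = add("foo", "bar")
print(result)
```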

### Functions are Objects!

This makes it very easy to pass functions around. In the box problem, you already saw that I can pass the `strategy` function to my `play_box_game` code.

For another example, you could make a custom function to apply to every element of a list.

In [None]:
def custom(data) :
    return 3*data
    
a = [1,2,3,4]
custom_a = list(map(custom, a)) # apply custom to each element in a
print(custom_a)

Above, you see the wonderful `map` function.
* `map(func, list_1, list_2, ...)` applies `func` to the elements of the supplied lists (or any other iterables). `map` returns an **iterator** object that yields `func(list_1[i], list_2[i], ...)`, stopping when the shortest list is exhausted.
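
As a small sketch of the "shortest list" rule, the example below maps a two-argument function over lists of different lengths. The function `weighted` and the sample lists are made up for this illustration.

```python
# Hypothetical helper used only for this illustration
def weighted(value, weight):
    return value * weight

values = [1, 2, 3, 4]
weights = [10, 20]            # shorter than values

pairs = map(weighted, values, weights)
print(pairs)                  # a map object, not a list

shortest_wins = list(pairs)   # consume the iterator
print(shortest_wins)
```

Only two results are produced, because `weights` runs out after two elements.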

Here is another example.

In [None]:
def diff_sq(a,b) :
    """ Return the difference squared """
    return (a - b)**2

def dist(x,y) :
    """
    Return the Euclidean distance between vectors x and y.
    
    Implementation detail
    ---------------------
    If x and y have different lengths, the longer one is
    truncated to the length of the shorter one. """
    # Here, we map the diff_sq function over the two lists, sum the results,
    # and take the square root
    return sum(map(diff_sq, x, y))**(1/2)

print(dist((1,0,2),(1,0,3)))

I stated that `map` returns an **iterator**. An **iterator** is an object that represents a stream of data. Basically, it's an object that supports a `.__next__()` method, so it can be passed as an argument to the `next(some_iterator)` function. This is a convenient construct because it allows for **lazy evaluation**, i.e. the value of the next object is only computed as needed. For example,

In [None]:
import random

# generate some random points in the plane with both coordinates
# between 0 and 1
# note : there are better ways to do this, as we will learn later
x_points = [ (random.random(),random.random()) for i in range(10) ]
y_points = [ (random.random(),random.random()) for i in range(10) ]

# print the first distance that's closer than 1/3
idx = 0
map_iterator = map(dist, x_points, y_points)
for d in map_iterator :
    if d < 1/3 :
        print("""Samples {} and {} were at
index {} and have distance {} < 1/3.""".format(x_points[idx],
                                               y_points[idx], idx, d))
        break
    idx += 1

Above, the `dist` function was never called for the points after the found index.
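
One way to make lazy evaluation visible is to record every call the mapped function receives. This is only a sketch: `noisy_square` and the `calls` list are made up for the illustration.

```python
calls = []  # records every argument the function actually sees

def noisy_square(n):
    calls.append(n)
    return n * n

squares = map(noisy_square, range(1000))
print(len(calls))        # nothing has been computed yet

first = next(squares)    # computes only the first value
second = next(squares)   # computes only the second value
print(first, second, calls)
```

Even though the `map` covers 1000 inputs, the function has only been called for the two values we asked for.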

## Functions, passing by value, and scope

In python, parameters are passed to functions as references to objects in memory. Therefore, you can modify the input value of a function only when the object is __mutable__. For example,

In [None]:
def reverse_list(L) :
    """Sorts the contents of L in reverse order. Returns L."""
    print("Local id of L is {}.".format(id(L)))
    L.sort(reverse=True)
    return L

In [None]:
import random
# pick 5 random numbers from 0,...,9
rand_list = random.sample(range(10),5)
print(rand_list)

In [None]:
print("Global id of rand_list is {}".format(id(rand_list)))

In [None]:
reversed_list = reverse_list(rand_list)
print("Global id of reversed_list is {}".format(id(reversed_list)))

In [None]:
rand_list is reversed_list

As you can see, `rand_list`, `reversed_list` and even `L` (while inside the function) point to the same bit of memory.

Python does the following when you call `reverse_list`
* declare a namespace for the function call of `reverse_list` (this is called a **stack frame**)
* copy the reference from `rand_list` to the **local** name `L`
* run the indented code under the `def` line
* the local namespace is cleared (i.e. the names used locally in the function are forgotten)
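
The steps above explain why mutating an argument is visible to the caller while reassigning the local name is not. Here is a minimal sketch with two made-up functions, `mutate` and `rebind`.

```python
def mutate(L):
    L.append(99)        # changes the object the caller also sees

def rebind(L):
    L = [0, 0, 0]       # only the local name L now points elsewhere
    return L

data = [1, 2, 3]
mutate(data)
after_mutate = list(data)

rebind(data)
after_rebind = list(data)

print(after_mutate, after_rebind)
```

The call to `rebind` leaves `data` untouched, because only the local reference was replaced.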

The **scope** of a variable name is a *textual* region of a python program where that variable name is directly accessible. 

In [None]:
L # Even though we ran the function above,
# L is not a variable name outside of the function call

Above, `L` has only **local** scope in the function `reverse_list`.

Namespaces are searched as a nested set. For example,

In [None]:
x = 5
def add_x(y) :
    return x+y # x does not exist in the local scope,
               # so it is pulled from the next one up

In [None]:
x = 165600

add_x(8)

Here, `x` has `global` (or `module`) level scope and can only be found in the global namespace.

In [None]:
x = 5
def add_7(y) :
    x = 7 # x is declared in the local scope
    return x+y # x now has local scope,
               # it is first found in the local namespace

In [None]:
print(add_7(7)) # inside the function x = 7
print(x) # outside the function x = 5

### Global variables

It is not recommended, but at times your code might work better (or read better) if you have a global variable that you would like to reassign or modify in the body of a function. 

In [None]:
L = 'a global list'.split() # outside we have L
print("Global id of L is {}".format(id(L)))

reverse_list(rand_list) # inside we also have L

In [None]:
def reverse_global() :
    global L # we can declare that we use the global L
    L.sort(reverse=True)
    return L

In [None]:
print(reverse_global())
print(L)

In [None]:
# cannot have both a local and global L
# (running this cell raises a SyntaxError)
def reverse_global_with_arg(L) :
    global L
    L.sort(reverse=True)
    return L

## Positional, named and default parameters

So far, for functions with multiple arguments we have used **position** to specify which parameter is set to which local variable. For example,


In [None]:
def print_args(a,b) :
    print(8*'-')
    print("a is", a)
    print("b is", b)
    print(8*'-')
    
print_args(5,'foo')
print_args('bar',6)

As you may have noticed earlier, I used a **named parameter** `reverse` in the method call `sort(reverse = True)`. Python allows you to specify the parameter name in the function call

In [None]:
print_args(b = 5, a = 'foo')

In [None]:
# What happens if you don't give all the parameters?
print_args(b = 5)

In [None]:
# What happens if you make up a variable
# name at the time of the call?
print_args(a = 5, c = 'bar')

In fact, the parameter `reverse` in the function call `sort(reverse = True)` is also an **optional parameter**. These are also known as **keyword** parameters. Optional parameters can be specified as part of the function definition. They must come **after** positional arguments and they need to be given a **function-global default value**.

In [None]:
# we define optional parameters the_list, reverse, and print_list
def append_and_sort(data, the_list = [],
                    reverse = False, print_list = False) :
    the_list.append(data)
    # how can we use reverse = reverse? Explain!
    the_list.sort(reverse = reverse)
    if print_list :
        print('The list is', the_list)

In [None]:
L = [3,2]
append_and_sort(5, the_list = L)
print(L)

Above, we can still use position to specify parameters. However, we did not need to specify the value of `reverse`, which defaults to `False`.

We can continue to use position to specify the values of optional parameters. However, for optional parameters that have boolean values **this is not recommended.**

In [None]:
# Which argument is set to True? Hard to tell.
append_and_sort(5, L, True)
print(L)

It is **much better** to use the named parameter format (also called keyword format).

In [None]:
# A mix of positional and optional
# (reverse is skipped, so it keeps its default value)
append_and_sort(5, the_list = L, print_list = True)

Above, we wanted to use the default value of `reverse` but a different value for `print_list`. We **must** use named parameters if we want to **skip** an optional parameter while using some positional arguments.

### Warning about default values
The default value of an optional parameter is a **function-global value**. This means that when the interpreter reads the definition of a function and creates the function object, it constructs the values of the default parameters **for all future function calls**.

In [None]:
append_and_sort(3,print_list = True)
append_and_sort(3,print_list = True)
append_and_sort(3,print_list = True)

Above, the default value of `the_list` is created **once** when the function definition is first read by the interpreter. Thus, **every time that the default value is used, it's the same piece of memory**! Keep in mind that there are two stages of memory allocation for functions.
* Function object stage : the interpreter reads the function definition and creates a function object. The default values of the function object are set.
* Function is called : the interpreter creates a new stack frame and copies the references to the function inputs. If default values are not reassigned by the input, those from the object creation stage are used.
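
A common way to avoid sharing one default list across calls is to use `None` as the default and build the list inside the function. The sketch below uses a made-up function, `append_fresh`, rather than the `append_and_sort` defined earlier.

```python
def append_fresh(data, the_list=None):
    # Hypothetical variant of append_and_sort, for illustration only
    if the_list is None:
        the_list = []       # a new list is created on every call
    the_list.append(data)
    return the_list

first = append_fresh(3)
second = append_fresh(3)
print(first, second, first is second)
```

Each call that relies on the default now gets its own list, so the results no longer accumulate.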

## Sets

As you have seen, lists and tuples can contain equivalent values multiple times (i.e. `a = [1,1,1,1]` is a totally valid list even though `1 == 1` is `True`). Sets in python are designed to contain equivalent values only once.

In python, sets come in two flavors, **mutable** (type `set`) and **immutable** (type `frozenset`)

To create a `set` in python, we can use the `set()` or `frozenset()` constructors.

In [None]:
a = set([1,1,1,'hello'])
b = frozenset([1,1,2,3,2])
print(a)
print(b)

In addition, there is the `{ }` notation, which creates **mutable** sets.

In [None]:
a = {1, 'hello', 492.4, 'hello', 1}
print(a)

Sets also support **iteration** and **set comprehension**

In [None]:
# set comprehension
a = { x % 10 for x in range(15) }
print(a)
# iteration
v = 0
for x in a :
    v += x
print(v)    

**Warning**. Sets are **unordered**, which means that there is **no guarantee** that iteration will be in any particular order.

**Warning.** To create an **empty** set you must always use `set()`, because `{}` creates an empty dictionary.
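
The reason is easy to check directly. In this short sketch, the variable names are made up for the illustration.

```python
empty_braces = {}      # empty braces give a dictionary
empty_set = set()      # the constructor gives an empty set

print(type(empty_braces))
print(type(empty_set))
```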

Sets have plenty of useful methods. Here are some methods that work for both `set` and `frozenset`.
* `len(s)` returns the number of elements in set `s` (cardinality of `s`).
* `x in s` tests `x` for membership in `s`. Similarly `x not in s`.
* `s.isdisjoint(other_set)` returns `True` if the sets have empty intersection, `False` otherwise.
* `s <= other_set` or `s.issubset(other_set)` tests whether `s` is a subset of `other_set`.
   * `s < other_set` tests whether `s` is a **proper** subset of `other_set`.
* `s >= other_set` or `s.issuperset(other_set)` tests whether `other_set` is a subset of `s`.
   * `s > other_set` tests whether `other_set` is a **proper** subset of `s`.
* `s | other_set` or `s.union(other_set)` returns a **new** set which is the union of `s` and `other_set`.
* `s & other_set` or `s.intersection(other_set)` returns a **new** set which is the intersection of `s` and `other_set`.
* `s - other_set` or `s.difference(other_set)` returns `s` minus `other_set` in a **new** set.
* `s ^ other_set` or `s.symmetric_difference(other_set)` returns the symmetric difference in a **new** set.
* `s.copy()` returns a **shallow** copy of `s`

In [None]:
a = {1,2,3}
b = {3,4,5}
c = {4,5}
print(a | b)
print(a & b & c) # can intersect multiple sets like this
print(a - b)
print(a ^ b)
print(c < b)
print(a.intersection(b,c)) # can also intersect multiple sets like this 

There are mutating variants of these operations that only work on `set`.
* `s.add(elem)` adds `elem` to `s`.
* `s.remove(elem)` removes `elem` from `s` but **raises** KeyError if `elem` is not in `s`.
* `s.discard(elem)` removes `elem` from `s` if present.
* `s.pop()` removes and returns an **arbitrary** element from the set. Raises KeyError if the set is empty.
* `s.clear()` removes all elements from `s`.
* `s |= other_set` or `s.update(other_set)` adds the contents of `other_set` to `s`.
* `s &= other_set` or `s.intersection_update(other_set)` keeps only the elements of `s` that are also in `other_set`.
* `s -= other_set` or `s.difference_update(other_set)` removes the elements of `other_set` from `s`.
* `s ^= other_set` or `s.symmetric_difference_update(other_set)` keeps only the elements found in exactly one of the two sets.
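
Here is a brief sketch contrasting `discard` with `remove`, and an in-place operator with its non-mutating counterpart. The names `s` and `t` are just sample variables.

```python
s = {1, 2, 3}
s.discard(10)            # absent element: silently ignored

try:
    s.remove(10)         # absent element: KeyError
    removed = True
except KeyError:
    removed = False

t = s                    # second name for the same set object
s |= {4}                 # in-place: t sees the change
s = s | {5}              # builds a new set: t does not see 5
print(removed, sorted(s), sorted(t))
```

After `s = s | {5}`, the name `s` refers to a new set, while `t` still refers to the original one.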

### Not all objects can be added to a set

Unlike a list, a set in python can only contain **hashable** objects, which in practice mostly means **immutable** objects. An object is hashable if `hash(obj)`, which calls the method `obj.__hash__()`, returns without error. A hash function should follow the rules
* if `obj_1 ==  obj_2` then `hash(obj_1) == hash(obj_2)`
* if `hash(obj_1) == hash(obj_2)` then `obj_1` and `obj_2` are very likely to be equivalent (==).

Most immutable types in python support hashing. Sets require a hash function to quickly look up, store, and remove objects. You can read more on this at [http://www.laurentluce.com/posts/python-dictionary-implementation/](http://www.laurentluce.com/posts/python-dictionary-implementation/)
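
A short sketch of what hashability means in practice: tuples and frozensets can go into a set, a list cannot, and equal objects share a hash.

```python
s = set()
s.add((1, 2))            # tuples of hashable items are hashable
s.add(frozenset({3}))    # frozensets are hashable too

try:
    s.add([1, 2])        # lists are mutable and unhashable
    added_list = True
except TypeError as err:
    added_list = False
    print("TypeError:", err)

# Equal objects must have equal hashes: 1 == 1.0
same_hash = hash(1) == hash(1.0)
print(len(s), added_list, same_hash)
```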

In [None]:
from math import isqrt # integer sqrt, available in python 3.8+

def primes_less_than(n) :
    """ Given an integer n, returns the list of primes less than n. """
    if not isinstance(n,int) or n < 3 : return []
    # We start with all odd numbers less than n
    candidates = set(range(3,n,2))
    # For odd number x = 3, 5, ..., isqrt(n), we will
    # remove x^2, x(x+2), x(x+4),... < n from candidates.
    # Notice that we are removing only the odd ones.
    # Why this works : assume y = a*b with 1 < a <= b
    # and y < n, then a <= isqrt(n), so y will be eliminated.
    for i in range(3, isqrt(n) + 1, 2) :
        # note, difference_update can take any iterable
        candidates.difference_update(range(i**2, n, 2*i))
    candidates.add(2)
    return sorted(candidates)

In [None]:
primes_less_than(30)

## Dictionaries

Dictionaries are another very useful built-in type. They map **keys** to **values**. For example,

In [None]:
month_to_days = {'April': 30,
 'August': 31,
 'December': 31,
 'February': 28,
 'January': 31,
 'July': 31,
 'June': 30,
 'March': 31,
 'May': 31,
 'November': 30,
 'October': 31,
 'September': 30}

print("August has {} days.".format(month_to_days['August']))

Above, the months are the **keys** of the dictionary and the days are the **values**. As you can see, we can construct dictionaries using the notation `{ key_1 : value_1, key_2 : value_2, ... }`.

One can also use the `dict()` constructor that takes a list (or any iterable) of **pairs**. For example,

In [None]:
coefs = dict([('x^2',1784),('x',3244), ('1', 9)])
print(coefs)

You can always read values using the `[]` notation or by calling `.get(key)`. If the key is missing, `[]` raises a KeyError while `.get(key)` returns `None`. For example

In [None]:
print(coefs['1'])
print(coefs.get('x^2'))
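
The two ways of reading differ when the key is missing. In this small sketch, `'x^5'` is a made-up key that is not in the dictionary.

```python
coefs = dict([('x^2', 1784), ('x', 3244), ('1', 9)])

missing = coefs.get('x^5')          # no such key: returns None
fallback = coefs.get('x^5', 0)      # or a default of our choosing

try:
    coefs['x^5']                    # [] raises instead
    raised = False
except KeyError:
    raised = True

print(missing, fallback, raised)
```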

Dictionaries are **mutable** objects. So you can use the assignment operator to define a new key-value pair.

In [None]:
coefs['x^3'] = 209
print(coefs)

a = {} # empty dictionary, NOT a set
a['something'] = 'else'
a['something'] = 'overwrite'

print(a)

We can also delete key-value pairs using the `del` keyword (if the pair exists) or you can use `pop()` to return and remove.

In [None]:
del coefs['1']
print(coefs)

value = coefs.pop('x^3')
print(coefs)
print(value)

Here are a few other useful commands.
* `some_dict.keys()` return the keys in `some_dict` as an iterable
* `some_dict.values()` return the values in `some_dict` as an iterable
* `some_dict.items()` return the (key,value) pairs in `some_dict` as an iterable
* `some_dict.popitem()` remove and return some item from `some_dict`
* `some_dict.update(other_dict)` **merge** the contents of `other_dict` into `some_dict`, overwriting for equivalent keys. 
* `some_dict.clear()` remove all contents from a dictionary
* `some_dict.copy()` return a **shallow** copy of the dictionary
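
To illustrate what **merge** and **shallow** mean here, the sketch below uses two made-up dictionaries, `base` and `clone`.

```python
base = {'a': [1, 2], 'b': 0}
clone = base.copy()          # shallow: values are shared, not copied

clone['b'] = 100             # reassigning a key affects only clone
clone['a'].append(3)         # mutating a shared value affects both

base.update({'b': -1, 'c': 7})   # overwrite 'b', add 'c'
print(base)
print(clone)
```

The list stored under `'a'` is the same object in both dictionaries, so the appended `3` shows up in each.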

Here is a simple function that counts letters in a string and returns a dictionary with the count.

In [None]:
def char_count(string) :
    char_dict = {}
    for c in string :
        char_dict[c] = string.count(c)
    return char_dict

print(char_count("some random string"))

#### Not all objects can be keys
Just like with sets, the keys of a dictionary have to be **immutable** (and **hashable**) types. Also, just like with sets, **keys don't all have to be the same type**.

In [None]:
f = {1 : 2, '1': 2, ('a','non','mutable', 'tuple') : 2}
print(f)

In [None]:
f = {['some', 'mutable', 'list'] : 2}

#### Membership, iteration and dictionary comprehension

Just like all other container types we have seen, we can test membership in dictionaries, iterate over dictionaries and use comprehensions to define them. Since dictionaries have both keys and values, we have to be a little careful.

Dictionaries only support the `key in dict` and `key not in dict` tests for **keys**.

In [None]:
coefs = dict([('x^2',1784),('x',3244), ('1', 9)])

if '1' in coefs :
    print("'1' is a key of coefs")
if 9 in coefs.values() :
    print("9 is a value of coefs")


**Remark**. If you need to search for a value, you can always do that by looking in `some_dict.values()`

We can also iterate over the keys, values, or key-value pairs in a dictionary

In [None]:
# iterate over keys
for k in coefs.keys() :
    print("Found key", k)

print('-'*20)

# iterate over values
for v in coefs.values() :
    print("Found value", v)
    
print('-'*20)

# iterate over keys and values
for k, v in coefs.items() :
    print("Key {} has value {}.".format(k, v))

**Note:** to iterate over keys, you can also use `for k in some_dict`.

Finally, dictionary comprehension is very similar to sets, however, we must use the `:` to separate keys and values.

In [None]:
self_powers = { i : i**i for i in range(10) }  
print(self_powers)

In [None]:
coefs_squared = { k : v**2 for k,v in coefs.items() }
print(coefs_squared)

If you want, you can also think of a dictionary as a map and easily define its inverse with dictionary comprehension. This only works if the values are also hashable, and if two keys share a value, only the last one survives in the inverse.

In [None]:
self_roots = { v : k for k,v in self_powers.items() }
print(self_roots)

**Warning.** Dictionary views such as `.keys()`, `.values()`, and `.items()` are guaranteed to be in insertion order **only** as of python 3.7. Sets still have **no** such guarantee.

## Sorting

The built-in call `sorted` can work on any iterable such as a `map` object, a tuple, list, or string (along with others).

In [None]:
sorted('a string')

One can use a **key function** to help you sort. A `key` function takes **one argument** and assigns it a **comparable value**, such as an integer. The idea is that for every element in a list, you compute a value that you then use to compare. For example,

In [None]:
def count_t(s) :
    return s.count('t')

data = "a long string that we will split into words".split()
sorted_data = sorted(data, key = count_t, reverse = True)
print(sorted_data)

We can also use key functions when looking for minimums and maximums.

In [None]:
max(data, key = count_t)

There is a module called `operator` that includes the functions `itemgetter` and `attrgetter`. Calling `operator.itemgetter(i)` returns a function which gives the $i^\text{th}$ element in a list. The function `operator.attrgetter('attr_name')` returns a function which gives the attribute with `attr_name` of an object. Here are two examples.

In [None]:
from operator import itemgetter

# itemgetter(1) behaves like this function
def get_item_1(x) :
    return x[1]

a = [ ('A',0, 2), ('G', -2, 4) ]
a_sorted = sorted(a, key = itemgetter(1))
print(a_sorted)

In [None]:
from operator import attrgetter

def get_imag(x) :
    return x.imag

a = [ 1.1 + 5.2j, 3.0 + 6.7j, -2.2 - 3.1j]
a_sorted = sorted(a, key = attrgetter('imag'))
print(a_sorted)

Further, if you like to write complicated comparison functions, you can use `functools.cmp_to_key`. This will generate from any comparison function a comparable object used for sorting.

In [None]:
from functools import cmp_to_key

def weird_cmp(x,y) :
    """ Returns a negative number if x < y, zero if x = y,
    or a positive number if x > y. Strings are placed
    before everything else. """
    if isinstance(x,str) :
        return -1
    elif isinstance(y,str) :
        return 1
    elif x == y :
        return 0
    return -1 if x < y else 1

sorted([3,2,1,'w'], key = cmp_to_key(weird_cmp))

## Basic exception handling

Exceptions in python are tools to tell the interpreter that something "bad" has happened and that the code **should stop** executing. This could be because someone gave bad input, there was a problem writing or reading data, or a function returned unexpected output (note: you should check the output of poorly documented functions that you don't trust). The keywords here are

* `assert some_boolean_statement` will cause the code to stop and raise an AssertionError if the statement evaluates to `False`.
* `raise exception_type("some user message")` will cause the code to stop and raise an exception of type `exception_type` and a user message `"some user message"`. There are many many built-in exception types. 

Some useful exception types are
   * `Exception()` this is the general type
   * `RuntimeError()` something bad happened as your code was running
   * `TypeError()` bad type given to function
   * `ValueError()` good type, but bad value
   * `SyntaxError()` bad syntax, couldn't parse

In [None]:
assert isinstance(3,int)
print("Assertion passed!")

In [None]:
assert type(1.) is int
print("Assertion passed!")

In [None]:
raise Exception("Instead of returning, \
I can raise an exception on bad input")

### Catching exceptions : try, except, finally block

If we want to catch exceptions, we can do this with a `try, except, finally` block

In [None]:
[0,1,2,3].index(5)

In [None]:
try :
    [0,1,2,3].index(5)
    print("I will only run if the lines above me don't fail")
except ValueError : # Run if ValueError is raised
    print("Got a ValueError")
finally : # Will always be run at the end
    print("I always run at the very end")

You don't have to use the `finally` block at all, if you don't need it. Also, if you want to catch all exceptions, you can just use `except` without the exception type.

In [None]:
def max_or_minusone(t) :
    """ Tries to return max(t),
    or -1 if max(t) fails. """
    try :
        return max(t)
    except :
        return -1

In [None]:
max_or_minusone([9,4,4])
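
A bare `except` also hides which error occurred. As a sketch of an alternative, you can name the exception types you expect and bind the exception object with `as`. The sample inputs below are made up for the illustration.

```python
results = []
for t in ([9, 4, 4], [], 7):
    try:
        results.append(max(t))
    except ValueError as err:      # max([]) : nothing to compare
        print("ValueError:", err)
        results.append("ValueError")
    except TypeError as err:       # max(7) : an int is not iterable
        print("TypeError:", err)
        results.append("TypeError")

print(results)
```

Any exception type that is not listed would still propagate, which is often what you want while debugging.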

Lastly, if you want to do nothing, just use `pass` in the exception block. The `pass` statement does nothing. It can be used when a statement is required syntactically but the program requires no action. You can also use it as a placeholder while working on code.

In [None]:
something = 'a string'
try :
    something += 1
except : # catch all exceptions
    pass # TODO : figure out what I want to do
print(something)

## Input and Output

### Reading files

We can use the `open` command to open (text) files to read line by line. For example, say I have a file called `some_data.csv` in my current working directory, then I can read it as follows.

In [None]:
try :
    file_handle = open('some_data.csv','r')
    for line in file_handle :
        print(line)
    file_handle.close()
except OSError : # raised when the file cannot be opened or read
    print("Failed to open file.")

In `open(path_to_file, mode)`
  * `path_to_file` can be an absolute path or a path **relative to your current working directory**.

We can also use other methods to read the text data.

* `file_handle.read()` returns the **whole** content of the file as a string
* `file_handle.read(size)` returns at most the next `size` characters (or bytes) of the file
* `file_handle.readline()` returns the whole next line, including the end of line character
* `list(file_handle)` or `file_handle.readlines()` returns a list of all the lines in the file
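
The following sketch runs these methods on a small throwaway file, so it does not depend on `some_data.csv`. The file name `demo.csv` and its contents are made up, and the file is created in a temporary directory.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'demo.csv')
f = open(path, 'w')
f.write("a,b\n1,2\n3,4\n")
f.close()

f = open(path, 'r')
head = f.read(3)              # at most 3 characters
rest_of_line = f.readline()   # whatever is left of the first line
next_line = f.readline()      # the whole second line
remaining = f.readlines()     # list of all lines still unread
at_end = f.read()             # empty string once the file is exhausted
f.close()

print(repr(head), repr(rest_of_line), repr(next_line), remaining, repr(at_end))

# clean up the throwaway file
os.remove(path)
os.rmdir(tmp_dir)
```

Each call picks up where the previous one stopped, which is why `readlines()` only returns the last line here.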

In [None]:
file_handle = open('some_data.csv','r')

In [None]:
file_handle.readline()

In [None]:
# don't forget to close
file_handle.close()

To get rid of the new line character in a newly read string, we can use `.rstrip()` or `.rstrip('\n')`

In [None]:
file_handle = open('some_data.csv','r')
data = []
for line in file_handle :
    clean_line = line.rstrip()
    fields = clean_line.split(',')
    data.append(fields)
file_handle.close()

# prints each element of data
# separated by a new line character
print(*data, sep = '\n')

When you read a file, the file object returned by `open` will keep track of its position in the file. You can view and change this position by using

* `file_handle.tell()` returns an integer giving the current position in the file counted by the number of characters (or bytes) from the beginning of the file
* `file_handle.seek(offset)` changes the current position in the file by moving `offset` number of characters from the beginning
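
A common pattern is to remember a position with `tell()` and return to it later with `seek()`. The sketch below does this on a made-up throwaway file created in a temporary directory.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'demo.txt')
f = open(path, 'w')
f.write("first\nsecond\n")
f.close()

f = open(path, 'r')
start = f.tell()             # position at the beginning of the file
first = f.readline()
mark = f.tell()              # remember where the second line starts
second = f.readline()
f.seek(mark)                 # jump back to the remembered position
second_again = f.readline()
f.seek(0)                    # jump back to the very beginning
first_again = f.readline()
f.close()

print(start, repr(first), repr(second), repr(second_again), repr(first_again))

# clean up the throwaway file
os.remove(path)
os.rmdir(tmp_dir)
```

For text files it is safest to pass `seek` either `0` or a value previously returned by `tell()`.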

In [None]:
file_handle = open('some_data.csv','r')
file_handle.seek(80) # move to 80 characters from beginning
print(file_handle.readline())
file_handle.close()

### Writing files

We use the `open(path_to_file, mode)` command again, but here the `mode` parameter will be different.

Here are all the `mode` parameter options.

* `'r'` open for reading (default)
* `'w'` open for writing, **erasing** the file first
* `'x'` open for exclusive creation, failing if the file already exists
* `'a'` open for writing, appending to the end of the file if it exists
* `'b'` binary mode
* `'t'` text mode (default)
* `'+'` open a disk file for updating (reading and writing)

If you are creating a **new** file that shouldn't already exist, use the `'x'` option to write it.

In [None]:
import random

random_ints = random.sample(range(1000), 5)
# changing the 'x' to a 'w' will overwrite the
# file every time I run this code
file_handle = open('random_numbers.txt','x')
for i in random_ints :
    file_handle.write(str(i)+'\n')
file_handle.close()

f = open('random_numbers.txt','r')
print(f.read())
f.close()

In [None]:
random_ints = random.sample(range(1000), 5)
file_handle = open('random_numbers.txt','a')
for i in random_ints :
    file_handle.write(str(i)+'\n')
file_handle.close()

f = open('random_numbers.txt','r')
print(f.read())
f.close()

The final three modes `'b', 't', '+'` are modifier modes. For example, to read a binary file, you would use the mode `'rb'`, to write one you would use `'xb'`, etc.

The `'+'` mode modifier allows for simultaneous reading and writing with, for example, the `'r+'` mode. However, I do **not recommend** using this mode unless you really really have to. It can also behave differently on windows and unix systems.

In [None]:
# open the file as binary
file_handle = open('random_numbers.txt','rb')
# read the first 20 bytes and
# print them as a bytes object (not decoded into a string)
print(file_handle.read(20))
file_handle.close()

### `with` statement for opening files

Python has a nice statement called the `with` statement. On some level, `with` is equivalent to placing the file commands in a `try, finally` block that always closes the file. Under the hood, this uses a **context manager**.

In [None]:
with open('random_numbers.txt', 'r') as file_handle :
    print(file_handle.readlines())
# the with statement will automatically close the file!
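
To make the comparison concrete, here is a rough sketch of the same idea written both ways. The file name `demo.txt` is made up and lives in a temporary directory. The `try`/`finally` version is only an approximation of what `with` does for files.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'demo.txt')

# Roughly what the with statement does for a file
f = open(path, 'w')
try:
    f.write("hello\n")
finally:
    f.close()              # runs even if write raised an exception

# The same pattern using with
with open(path, 'r') as g:
    content = g.read()

print(f.closed, g.closed, repr(content))

# clean up the throwaway file
os.remove(path)
os.rmdir(tmp_dir)
```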

### The `io` module

If you want to work with string streams or byte streams, you should use the `io` module. In fact, the `open()` function is the same as `io.open()`, at least in python 3.7.

In [None]:
import io

with io.StringIO() as text_stream :
    text_stream.write("First line\n")
    text_stream.write("Second line")
    text_stream.write(" with more text\n")
    print(text_stream.getvalue())
    text_stream.write("Another line")
    text_stream.seek(11)
    print(text_stream.readlines())

# if not using a with statement, be sure to call close

Similarly, there is a buffered bytes stream class `io.BytesIO` along with other base classes.

## Some modules for working with files

Just like you can work with files in the command line, you might want to list directories, copy files, and get system information from within your programs. The classic modules for this task are the `os`, `shutil`, and `glob` modules.

A newer, object-oriented module for the same tasks is `pathlib`.

In [None]:
import os
import shutil
import glob

import pathlib

### File paths

For the `os.path` module, paths are strings and can be manipulated as follows.

In [None]:
# get the present (or current) working directory
cur_path = os.getcwd()
print(cur_path)

In [None]:
# get the absolute path of a relative path
abs_path_up = os.path.abspath('..')
print(abs_path_up)

In [None]:
# get the base name of a file, including the extension
file_name = os.path.basename('../lectures/day-1/morning.ipynb')
print(file_name)

In [None]:
# get the directory name
dir_path = os.path.dirname('../lectures/day-1/morning.ipynb')
# NOTE : observe that the os.path module treats paths as strings
# so the dirname is the relative directory name
print(dir_path)

In [None]:
# get both the dir and file
dir_name, file_name = os.path.split('../lectures/day-1/morning.ipynb')
print(dir_name)
print(file_name)

In [None]:
# split off the file extension
root,ext = os.path.splitext('../lectures/day-1/morning.ipynb')
print(root)
print(ext)

In [None]:
# check if a path is absolute
print(os.path.isabs('../lectures/day-1/morning.ipynb'))
print(os.path.isabs('/Users/'))

In [None]:
# returns a canonical path for a file
print(os.path.realpath('../lectures/day-1/morning.ipynb'))

In [None]:
# join one path relative to another (or several)
old_path = "/Users/yarmola/"
new_path = os.path.join(old_path, 'python-bootcamp', 'lectures/day-1/',
                        'morning.pdf')
print(new_path)
# Notice that os.path.join takes care of all the slashes for you

The `os.path` module **does not automatically check for you if a file at a given path actually exists**. To do this, you can specifically check :

In [None]:
# check if there really is a file at a given path
print(os.path.isfile(new_path))

In [None]:
# check if there really is a dir at a path
print(os.path.isdir('/Users'))

### Copy, move, delete and rename files and directories

The `shutil` and `os` modules contain all the useful tools for these operations. Here are some examples.

In [None]:
# copy a file (use shutil.copytree for a directory)
path_of_copy = shutil.copy('some_data.csv', 'more_data.csv')
print(path_of_copy)

In [None]:
# delete a file
os.remove(path_of_copy)

In [None]:
# make directory
os.mkdir('test_dir_new')

In [None]:
# remove directory
os.rmdir('test_dir_new')

In [None]:
# change current working directory
# this is the same as `cd` in the terminal app
os.chdir('..')
print(os.getcwd())
os.chdir('python-bootcamp/lectures')
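
The cells above do not show renaming, moving, or working with whole directory trees, so here is a minimal sketch of those operations. Everything happens inside a throwaway directory from `tempfile.mkdtemp()`, and the names `src_dir`, `a.txt`, and so on are invented for the example.

```python
import os
import shutil
import tempfile

# work inside a throwaway directory so nothing real is touched
base = tempfile.mkdtemp()

src_dir = os.path.join(base, 'src_dir')
os.mkdir(src_dir)
with open(os.path.join(src_dir, 'a.txt'), 'w') as f :
    f.write('hello\n')

# rename a file in place
os.rename(os.path.join(src_dir, 'a.txt'),
          os.path.join(src_dir, 'b.txt'))

# copy a whole directory tree
copy_dir = shutil.copytree(src_dir, os.path.join(base, 'copy_dir'))

# move a file to a different location
moved = shutil.move(os.path.join(src_dir, 'b.txt'),
                    os.path.join(base, 'b_moved.txt'))

contents = sorted(os.listdir(base))
print(contents)

# remove a non-empty directory tree
# (os.rmdir only removes empty directories)
shutil.rmtree(base)
print(os.path.exists(base))
```

Be careful with `shutil.rmtree`: it deletes everything below the given path without asking.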

### Walking a directory tree

There are two useful tools to walk a directory tree. First there is `os.walk`

In [None]:
for root, dirs, files in os.walk('day-1') :
    print('='*40)
    print("The root directory is :", root)
    print('-'*40)
    print("The root directory has subdirectories:", *dirs, sep='\n' )
    print('-'*40)
    print("The root directory has files:", *files, sep='\n')
    print('='*40)
    print('\n')

The other is `glob`. It uses the **wildcard** character `*` **in path names**. For example, `../lectures/day-1/*.ipynb`.

* `glob.iglob(path, recursive=False)`
   * returns an iterator over the (possibly empty) collection of path names that match `path`. The argument `path` can be either absolute (like `/Users/yarmola/*`) or relative (like `../../Tools/*/*.gif`), and can contain shell-style wildcards.
   * if you set `recursive=True`, then `**` will match any files and zero or more directories and subdirectories.



In [None]:
# print all files and directories relative to ..
for x in glob.iglob("../*") :
    print(x)

In [None]:
# get all notebook (.ipynb) files two directories down
for x in glob.iglob("../*/*/*.ipynb") :
    print(x)

In [None]:
# get all directories and files in the tree below ..
for x in glob.iglob("../**", recursive = True) :
    print(x)
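
If you would rather have a list than loop over the results, `glob.glob` takes the same arguments as `glob.iglob`. The sketch below builds a small, made-up directory tree in a temporary directory so that the difference between `*` and a recursive `**` is easy to see.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir :
    # build a tiny tree of empty files (names are hypothetical)
    os.makedirs(os.path.join(tmp_dir, 'day-1', 'extra'))
    for parts in [('top.ipynb',),
                  ('day-1', 'morning.ipynb'),
                  ('day-1', 'extra', 'deep.ipynb'),
                  ('day-1', 'notes.txt')] :
        open(os.path.join(tmp_dir, *parts), 'w').close()

    # exactly one directory down
    one_level = glob.glob(os.path.join(tmp_dir, '*', '*.ipynb'))
    # zero or more directories down
    all_levels = glob.glob(os.path.join(tmp_dir, '**', '*.ipynb'),
                           recursive=True)

one_level_names = sorted(os.path.basename(p) for p in one_level)
all_level_names = sorted(os.path.basename(p) for p in all_levels)
print(one_level_names)
print(all_level_names)
```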

### `pathlib` module

The `pathlib` module is a newer, object-oriented version of many of the `os.path` methods. Usually, you will only need `from pathlib import Path`. Here is a mapping of the relevant methods.

|`os`, `os.path`, and `glob`|`pathlib`|
|--- |--- |
|os.path.abspath()|Path.resolve()|
|os.chmod()|Path.chmod()|
|os.mkdir()|Path.mkdir()|
|os.rename()|Path.rename()|
|os.replace()|Path.replace()|
|os.rmdir()|Path.rmdir()|
|os.remove(), os.unlink()|Path.unlink()|
|os.getcwd()|Path.cwd()|
|os.path.exists()|Path.exists()|
|os.path.expanduser()|Path.expanduser() and Path.home()|
|os.path.isdir()|Path.is_dir()|
|os.path.isfile()|Path.is_file()|
|os.path.islink()|Path.is_symlink()|
|os.stat()|Path.stat(), Path.owner(), Path.group()|
|os.path.isabs()|PurePath.is_absolute()|
|os.path.join()|PurePath.joinpath()|
|os.path.basename()|PurePath.name|
|os.path.dirname()|PurePath.parent|
|os.path.samefile()|Path.samefile()|
|os.path.splitext()|PurePath.suffix|
|glob.iglob() | Path.glob()|
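
Here is a short sketch of a few of these in action, reusing the notebook path from the `os.path` examples. None of the calls below touch the file system except `exists()`, so the path does not need to be real.

```python
from pathlib import Path

nb_path = Path('../lectures/day-1/morning.ipynb')

print(nb_path.name)           # like os.path.basename
print(nb_path.parent)         # like os.path.dirname
print(nb_path.suffix)         # the extension from os.path.splitext
print(nb_path.stem)           # the file name without the extension
print(nb_path.is_absolute())  # like os.path.isabs

# the / operator joins path components, like os.path.join
new_path = Path('/Users') / 'python-bootcamp' / 'lectures'
print(new_path)
print(new_path.exists())      # like os.path.exists
```

A `Path` can be passed directly to `open()`, so you rarely need to convert it back to a string.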


## Module name aliases and reloading

So far we have been loading modules by using `import module_name`. This process **executes all lines of the code in the file `module_name.py`**, creates an object `module_name`, and attaches to this object all the functions defined in `module_name.py` as function attributes and all (module) global variables as data attributes. For example, a file called `module_example.py` with the contents : 
```python
a = 7

print("Loaded with name :",__name__)

def print_a() :
    print(a)
    
def print_hello() :
    print("Hello!")
```
can be loaded with `import` and has the data attribute `module_example.a` and function attribute `module_example.print_a`.

In [None]:
%cat module_example.py

In [None]:
import module_example
print('-'*10)
print(module_example.a)
print(module_example.print_a)
module_example.print_a()

As you can see above, the module has a global `__name__` variable. The global variable `__name__` is set to the module's name (the filename without the `.py` extension) when a module is imported. Later, when we learn to use our modules as **scripts**, the variable `__name__` will become more important.
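
You can check this with any module. The sketch below uses the standard library module `random` rather than `module_example`, only so that it runs without any extra files.

```python
import random

# for an imported module, __name__ is the module's name
print(random.__name__)

# in the code you are running directly (a notebook cell or
# a script), __name__ is typically '__main__' instead
print(__name__)
```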

One thing we can do is create an **alias** for the module object by importing as follows.

In [None]:
import module_example as me
print('-'*10)
print(me.a)
print(me.print_a)
me.print_a()

**Calling `import` on a module that is already loaded does not reload it**. You may have noticed that `print("Loaded with name :",__name__)` didn't run again in the above code. Therefore, if you have made changes to your file and want to update the module in your interpreter, you will need to use the `importlib` module.

In [None]:
import importlib as impl
impl.reload(me)
me.print_a()
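
To see the difference between a repeated `import` and a reload, the sketch below writes a small module to a temporary directory, edits it, and then imports it again. The module name `reload_demo` is made up, and setting `sys.dont_write_bytecode` is just a precaution for this demo so that the edited source is always re-read.

```python
import importlib
import os
import shutil
import sys
import tempfile

# precaution for this demo: skip writing cached .pyc files
sys.dont_write_bytecode = True

tmp_dir = tempfile.mkdtemp()
module_path = os.path.join(tmp_dir, 'reload_demo.py')
with open(module_path, 'w') as f :
    f.write('a = 7\n')

# make the temporary directory importable
sys.path.insert(0, tmp_dir)
import reload_demo
print(reload_demo.a)

# change the file on disk
with open(module_path, 'w') as f :
    f.write('a = 12345\n')

import reload_demo      # already loaded, so nothing happens
before_reload = reload_demo.a
print(before_reload)

importlib.reload(reload_demo)
after_reload = reload_demo.a
print(after_reload)

# clean up
sys.path.remove(tmp_dir)
shutil.rmtree(tmp_dir)
```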

## Dependency Management

The last thing I wanted to start talking about today is dependency management. For this, we will use `pipenv`, though there are many guides for using other package managers.

Let's say we want to start with a special version of python.

```
mkdir new_project
cd new_project
pipenv --python 3.6.7
```

Let's install some packages

```
pipenv install requests
```

We can see what we have installed with `pipenv graph`.

In your project directory, there will only be two new files.

In [None]:
%ls new_project

The `Pipfile` and `Pipfile.lock` are files managed by `pipenv` to keep track of your current environment. The `Pipfile` tracks your project's top-level dependencies, while `Pipfile.lock` tracks all sub-dependencies and hashes of packages that should be on the system. Both files should be in your `git` repository.

Let's say that version `2.22.0` of `requests` is too new for us and we want at most `2.21.0`. We have two options :
  * run `pipenv install "requests<=2.21.0"` which will overwrite the `2.22.0` install, update the lock file and install everything
  * edit the `Pipfile` to have version "<=2.21.0" and run `pipenv update`

**Remark 1**. If you have any old `pip`-style requirements files, you can load them into `pipenv` with

`pipenv install -r requirements.txt`


**Remark 2**. The virtual environment files are by default installed into `~/.local/share/virtualenvs/`. If you would like `pipenv` to install virtual environment packages in the project directory (because, for example, some production service does not have user directory access), you can `export PIPENV_VENV_IN_PROJECT=1`

The calls `pipenv shell` and `pipenv run` let you execute your code in your new virtual environment. To run a single command inside the environment, use `pipenv run <command>`. To work interactively, `pipenv shell` hands you off to a new shell inside your virtual environment (wrapping the usual `activate` and `deactivate` calls).

To check for public security vulnerabilities listed at [pyup.io](https://pyup.io) and PEP 508 requirements, you can run `pipenv check`

You can install development-only packages with `pipenv install --dev`. This allows you to keep development packages out of production, since `pipenv install` by default installs only non-development packages.

Let's make a simple git repo and clone it to see what happens

```
git init
git add Pipfile Pipfile.lock
git commit -m "pipenv setup"
mkdir ../clone
cd ../clone
git clone ../new_project
cd new_project
export PIPENV_VENV_IN_PROJECT=1

pipenv sync
```

We can now play with the `requests` module with

```
pipenv shell
> python
> import requests
> text_request = requests.get("http://www.randomtext.me/api/lorem/p-5/5-10/")
> text_request.text
```

**Remark**. If you ever need to generate a `requirements.txt` file (or the list for `install_requires` in your `setup.py`), you can run `pipenv lock -r > requirements.txt`. Newer `pipenv` releases replace this with `pipenv requirements > requirements.txt`.

A good guide for using `pipenv` can be found at [realpython.com/pipenv-guide/](https://realpython.com/pipenv-guide/).

### Using your virtual environment with jupyter

If you want to use jupyter lab while developing, you could install jupyter with `pipenv install --dev jupyterlab` inside every one of your projects. A better alternative, however, is to keep the jupyter installation separate and add your virtual environment notebook kernel to the jupyter interface.

```
pipenv install --dev ipykernel
pipenv run python -m ipykernel install --user --name=new_project
```


In [None]:
import requests

text_request = requests.get("http://www.randomtext.me/api/lorem/p-5/5-10/")
text_request.text

In [None]:
import flask