# Default arguments

In [None]:
def default_arg(x, exponent=2):
    """Function with a default argument.
    Note how the name of the argument explains what it does.
    Always prefer meaningful names."""
    return x**exponent

In [None]:
default_arg(2) # can be called without the second argument

In [None]:
default_arg(2, 3)

In [None]:
# you can type out the argument name, making
# the call more readable
default_arg(2, exponent=5)

## Returning more than one thing

In [None]:
def return_two():
    return 1, 2 # will automatically turn into a tuple

In [None]:
return_two()

In [None]:
foo = 12
bar = 42

## Assigning more than one thing

In [None]:
foo, bar = bar, foo

In [None]:
foo, bar

In [None]:
one, two, three = range(1, 4) # any iterable works, e.g. a range or a list

In [None]:
one, two, three

# Dictionaries

Dictionaries, or `dict`s for short, are one of the cornerstones of Python. You create them with curly brackets `{` and `}`.

In [None]:
my_dict = {1000: 'a', 1024: 'b'}
my_dict

In [None]:
# Values are retrieved by key, using the same square brackets as list indexing.
my_dict[1000]

In [None]:
my_dict.keys() # retrieve only the keys

In [None]:
my_dict.values() # retrieve the values

In [None]:
# keys() and values() list their items in the same order
list(my_dict.values())[list(my_dict.keys()).index(1000)] == my_dict[1000]
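
A more idiomatic way to walk keys and values together is `items()`, which yields `(key, value)` pairs. This is a small sketch: the `prices` dict is made up, and the printed order assumes Python 3.7+, where dicts keep insertion order.

```python
prices = {'apple': 3, 'pear': 5}

# items() yields (key, value) pairs, so both can be unpacked in the loop header
pairs = []
for fruit, price in prices.items():
    pairs.append('{}: {}'.format(fruit, price))

print(pairs)  # ['apple: 3', 'pear: 5']
```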

In [None]:
# dict values can be nearly anything
my_dict[500] = []

In [None]:
my_dict

Dict **keys** need to be hashable.

In [None]:
hash(12) # hashable

In [None]:
hash(2.3) # hashable

In [None]:
hash('hello') # hashable

In [None]:
hash((1, 'foo')) # hashable

In [None]:
hash([1, 'foo']) # _not_ hashable

In [None]:
hash({'foo': 12}) # _not_ hashable
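
Because tuples of hashable items are themselves hashable, they make convenient compound keys, while a list key raises `TypeError`. A sketch with a made-up `grid` dict; the exact error wording differs between Python versions, so only the word "unhashable" is relied on here.

```python
# tuples of hashable items are hashable, so they work as dict keys
grid = {(0, 0): 'origin', (1, 2): 'point'}
print(grid[(1, 2)])  # point

error_message = None
try:
    # lists are mutable and therefore unhashable; the key is rejected
    grid[[1, 2]] = 'nope'
except TypeError as err:
    error_message = str(err)

print(error_message)
```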

Dicts can emulate a sort of `switch` statement, which Python lacked until version 3.10 added `match`.

In [None]:
fns = {'sum': sum,
       'len': len}

In [None]:
fns['sum']([1,2,3])

In [None]:
fns['len']([1,2,3])
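
The dispatch idea can be extended with `dict.get`, so that an unknown command returns a fallback instead of raising `KeyError`. A minimal sketch; `commands`, `describe`, and `dispatch` are hypothetical names, not part of any library.

```python
def describe(numbers):
    """Hypothetical helper, made up for this example."""
    return 'list of {} numbers'.format(len(numbers))

commands = {'sum': sum, 'len': len, 'describe': describe}

def dispatch(command, data):
    # get() returns None for unknown commands instead of raising KeyError
    fn = commands.get(command)
    if fn is None:
        return 'unknown command: ' + command
    return fn(data)

print(dispatch('sum', [1, 2, 3]))       # 6
print(dispatch('describe', [1, 2, 3]))  # list of 3 numbers
print(dispatch('max', [1, 2, 3]))       # unknown command: max
```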

In [None]:
# remove and return the value corresponding
# to the key 500
my_dict.pop(500)

In [None]:
my_dict # doesn't contain the pair (500, []) anymore

## Avoiding errors

Element access with square brackets (e.g. `my_dict['foo']`) raises a `KeyError` when the key (`'foo'` in this example) is not found. The methods `get` and `setdefault` help avoid this.

In [None]:
my_dict['foo']

In [None]:
# return "Not Here!" if key not found.
my_dict.get('foo', "Not Here!")

In [None]:
# still the same
my_dict

In [None]:
# return 5*5 = 25 if key 5 not found.
# in addition, add the key and value to the dict
my_dict.setdefault(5, 5*5)

In [None]:
my_dict # now contains (5, 25)

In [None]:
# setdefault calls to a key that's already contained 
# in the dict will return the previously stored value
# and _not_ modify the dictionary
my_dict.setdefault(5, "nope")

In [None]:
my_dict
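
A common use of `setdefault` is grouping items into lists with one line per item. A sketch with made-up words:

```python
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

by_letter = {}
for word in words:
    # setdefault inserts an empty list the first time a letter is seen
    # and returns the stored list either way, so append always works
    by_letter.setdefault(word[0], []).append(word)

print(by_letter)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```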

The `in` keyword can be used to check a dict for presence of a key.

In [None]:
5 in my_dict

# Dict comprehensions

Dict comprehensions work just like list comprehensions.

In [None]:
[i*2 for i in [1,2,3]]

In [None]:
{i: i**2 for i in [2,3,4]}
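
Like list comprehensions, dict comprehensions accept an `if` filter, and they are a concise way to swap keys and values. A small illustrative sketch:

```python
# keep only the even numbers
squares = {i: i**2 for i in range(6) if i % 2 == 0}
print(squares)   # {0: 0, 2: 4, 4: 16}

# swap keys and values (only safe when the values are unique and hashable)
inverted = {value: key for key, value in squares.items()}
print(inverted)  # {0: 0, 4: 2, 16: 4}
```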

# Special dictionaries

The `dict` subclasses `defaultdict` and `Counter` are a great way of keeping count of things, e.g. the number of occurrences of words in a text.

In [None]:
from collections import defaultdict, Counter

The `defaultdict` class takes a function that accepts no arguments and returns a default value; it is called to create the **value** for any missing key. The built-in `int` function is one such example (it is actually a type, and types are callable in Python, which can seem a bit strange at first).

In [None]:
int

In [None]:
int()

By the way, `float` and other types work very similarly.

In [None]:
float()

In [None]:
# if a key is not present, use the
# int function to create its value
count_dict = defaultdict(int)

In [None]:
# adds the key 'apple' with value int()
# and returns the value
count_dict['apple']

In [None]:
count_dict

In [None]:
count_dict['orange'] += 1

In [None]:
count_dict
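
Putting `defaultdict` to work: a sketch that counts words in a made-up sentence and then, since any zero-argument callable can serve as the factory, uses `list` to record where each word occurs.

```python
from collections import defaultdict

text = "the cat sat on the mat the end"

word_counts = defaultdict(int)
for word in text.split():
    word_counts[word] += 1    # missing words start at int() == 0

print(word_counts['the'])     # 3
print(word_counts['dog'])     # 0, and 'dog' is now a key

positions = defaultdict(list)
for index, word in enumerate(text.split()):
    positions[word].append(index)

print(positions['the'])       # [0, 4, 6]
```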

## Counter

The `Counter` class is initialized with anything iterable (e.g. lists, tuples, etc.) and can give you quick answers to questions like 'what are the 10 most used words in this document'.

In [None]:
counter = Counter([1,2,2,2,3,3,5])

In [None]:
counter

In [None]:
counter.most_common(2)
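
The same idea applied to words, as in the "most used words" question above. A sketch on a made-up sentence; per the `collections` documentation, elements with equal counts are listed in the order they were first encountered, and looking up a missing element returns 0 without adding it.

```python
from collections import Counter

text = "to be or not to be that is the question"
word_counter = Counter(text.split())

print(word_counter.most_common(2))  # [('to', 2), ('be', 2)]
print(word_counter['question'])     # 1
print(word_counter['missing'])      # 0, no KeyError for absent words
```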

# Gotchas

We'll now see some common pitfalls.

In [None]:
def add_one(some_list):
    some_list.append(1)

In [None]:
my_list = []

In Python, functions receive references to the objects passed to them (often called ['call by sharing'][pbr]). Mutable objects such as lists and dicts can therefore be modified inside function calls, while immutable ones like `int`, `float`, and `str` cannot.

[pbr]: https://en.wikipedia.org/wiki/Evaluation_strategy#Call_by_sharing

In [None]:
add_one(my_list)

In [None]:
my_list
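
The distinction that matters is *mutating* an object versus *rebinding* a name. A sketch with hypothetical functions `mutate`, `rebind`, and `increment`: only the in-place change is visible to the caller.

```python
def mutate(some_list):
    some_list.append(2)   # changes the object the caller also sees

def rebind(some_list):
    some_list = [99]      # only rebinds the local name; caller unaffected

numbers = [1]
mutate(numbers)
rebind(numbers)
print(numbers)  # [1, 2]

def increment(n):
    n += 1                # ints are immutable: this binds n to a new int
    return n

count = 5
increment(count)
print(count)    # 5
```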

Default values for function parameters are created **just once**. Be careful when modifying them.

In [None]:
def default_list(li = []):
    li.append(1)
    return li

In [None]:
default_list()

In [None]:
default_list()

In [None]:
# if you need mutable default values, create them like so:
def better_list(li = None):
    if li is None:
        li = []
    li.append(1)
    return li

In [None]:
better_list(), better_list(), better_list()
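
One way to see why `default_list` misbehaves: the default value is stored on the function object itself, in its `__defaults__` attribute, and every call without an argument reuses that same list. A short sketch (it redefines `default_list` so the output starts fresh):

```python
def default_list(li=[]):
    li.append(1)
    return li

# the default list lives on the function object and is shared between calls
print(default_list.__defaults__)  # ([],)
default_list()
default_list()
print(default_list.__defaults__)  # ([1, 1],)
```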

In [None]:
# lists are unhashable, so they can't be used as keys (TypeError)
my_dict[[1,2]]

In [None]:
# Break question: can functions be
# dictionary keys? Yes, they can!
{sum: 5}

# Argument lists

In [None]:
# functions can have an arbitrary number of arguments
def print_many(*args):
    for i in args:
        print(i)

In [None]:
print_many(1,4,'foo')

To better understand the example below, note that you can call the sum function like this:

In [None]:
sum([1,2,3])

But not like this:

In [None]:
sum(1,2,3)

In [None]:
def my_sum(*args):
    # args is a tuple in here
    return sum(args)

In [None]:
my_sum(1)

Now, `my_sum` *can* be called like in the **second** example above.

In [None]:
my_sum(1,2,3,4,5,6)

In [None]:
list(range(10))

Lists (and other iterables, such as a `range`) can be turned into separate arguments with a star in the function call.

In [None]:
my_sum(*range(10))

## Why?

Argument lists are most often used to pass arguments on to *another* function.

In [None]:
def apply_to_many(fn, *args):
    return fn(args)

In [None]:
apply_to_many(sum, 1, 2, 3)

In [None]:
# in the same way, functions can accept arbitrary
# _named_ arguments, i.e. arguments of the form
#  f(foo = 1, bar = 12)
def many_named_args(**kwargs):
    # kwargs is a dictionary
    print(kwargs)

In [None]:
many_named_args(arg1=42, arg2=9, name="James")

In [None]:
# just like argument lists, this technique is often used
# to pass arguments on to another function
def wrapped(fn, *args, **kwargs):
    if fn in (sum, len):
        return fn(*args, **kwargs)
    else:
        return None

In [None]:
wrapped(sum, [1,2,3])

In [None]:
wrapped(int, 3, key=5)
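
The counterpart of `*args`/`**kwargs` in a definition is `*` and `**` in a call, which spread a sequence and a dict into separate arguments. A sketch; `greet` and `forward` are made-up names.

```python
def greet(greeting, name, punctuation='!'):
    """A made-up function used only to show argument unpacking."""
    return greeting + ', ' + name + punctuation

positional = ('Hello', 'Spock')
options = {'punctuation': '.'}

# * spreads a sequence into positional arguments,
# ** spreads a dict into keyword arguments
print(greet(*positional, **options))  # Hello, Spock.

def forward(fn, *args, **kwargs):
    # collect whatever the caller passed and hand it on unchanged
    return fn(*args, **kwargs)

print(forward(greet, 'Live long', name='Spock'))  # Live long, Spock!
```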

# Decorators

Argument lists and dictionaries can be used to make new functions from existing ones. Python provides a beautiful syntax for this called *decorators*.

In [None]:
def cached(fn):
    """Decorator to cache the result of function calls."""
    result_cache = {}
    def inner(*args):
        print(result_cache) # for educational purposes
        # HOMEWORK: replace the lines below with _one_ call
        #           to dict.setdefault
        if args in result_cache:
            return result_cache[args]
        else:
            result = fn(*args)
            result_cache[args] = result
            return result
    return inner

The function above can be used to make a new function from an existing one.

In [None]:
my_cached_sum = cached(my_sum)

In [None]:
my_cached_sum(1,2,3)

In [None]:
my_cached_sum(1,2,3)

The two cells below are equivalent, but the upper one uses the nicer `@` syntax. This is called *decorator* syntax.

In [None]:
@cached
def my_cached_sum(*args):
    return sum(args)

In [None]:
def my_cached_sum(*args):
    return sum(args)

my_cached_sum = cached(my_cached_sum)
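
Two refinements worth knowing, sketched below assuming Python 3: `functools.wraps` copies the wrapped function's name and docstring onto the inner function, and `functools.lru_cache` is a ready-made standard-library alternative to a hand-written cache like `cached`. `count_calls`, `add`, and `slow_square` are made-up names.

```python
import functools

def count_calls(fn):
    """Decorator that counts how often the wrapped function is called."""
    @functools.wraps(fn)           # copy __name__, __doc__ etc. onto inner
    def inner(*args, **kwargs):
        inner.calls += 1
        return fn(*args, **kwargs)
    inner.calls = 0
    return inner

@count_calls
def add(a, b):
    """Add two numbers."""
    return a + b

add(1, 2)
add(3, 4)
print(add.calls)     # 2
print(add.__name__)  # add (without wraps this would be 'inner')

# the standard library already ships a caching decorator
@functools.lru_cache(maxsize=None)
def slow_square(x):
    return x * x

slow_square(4)
slow_square(4)
print(slow_square.cache_info().hits)  # 1
```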

# More on reading .csv

The Python `csv` module makes reading data from `csv` files much easier.

In [None]:
import csv

In [None]:
data = []
with open('data/trends.csv') as trends:
    reader = csv.DictReader(trends)
    for i in reader:
        data.append(i)

In [None]:
data[:5]
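
Since `trends.csv` may not be at hand, here is a self-contained sketch that feeds `csv.DictReader` from an in-memory string via `io.StringIO`; the columns and values are invented. Note that every field comes back as a string.

```python
import csv
import io

# io.StringIO stands in for a real file; the columns are invented
csv_text = "date,term,score\n2016-01-01,python,80\n2016-01-02,python,85\n"

reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)          # each row is a dict keyed by the header line

print(rows[0]['term'])       # python
# csv always yields strings, so convert numeric columns explicitly
scores = [int(row['score']) for row in rows]
print(scores)                # [80, 85]
```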

# JSON

JavaScript Object Notation, or `JSON`, is a standard format used all over the web to pass around data in a structured way. It looks very similar to Python dictionaries and lists.

In [None]:
import json

In [None]:
# make python objects out of a string
json.loads("""
{"foo": 12,
 "bar": [1,2,5]}""")

In [None]:
data = json.loads("""
{"foo": 12,
 "bar": [1,2,5]}""")

In [None]:
data, type(data)

In [None]:
# make a valid JSON string out of a python object
print(json.dumps(data, indent=2))
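
JSON and Python objects do not map one-to-one, so a round trip can change the data. A sketch with a made-up dict: integer keys become strings, and tuples become lists.

```python
import json

original = {1: (1, 2), 'ok': True, 'nothing': None}
text = json.dumps(original)
print(text)  # {"1": [1, 2], "ok": true, "nothing": null}

restored = json.loads(text)
# JSON object keys are always strings and JSON has no tuples,
# so the round trip does not give back an identical object
print(restored)              # {'1': [1, 2], 'ok': True, 'nothing': None}
print(restored == original)  # False
```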

# Objects

Objects are an important concept in modern programming languages. Objects are *instances* of classes. There can be many instances of any given class.

In [None]:
# empty class
class Person(object):
    pass

In [None]:
# create an object, i.e. an _instance_ of that class
kirk = Person()

In [None]:
kirk

In [None]:
# set attributes
kirk.firstname = "James"

In [None]:
kirk.middlename = "Tiberius"

In [None]:
kirk.lastname = "Kirk"

In [None]:
# access attributes
kirk.firstname

In [None]:
# create another instance
spock = Person()

In [None]:
# kirk has a lastname attribute, but spock doesn't (AttributeError)
spock.lastname

It is usually not desirable for objects of the same class to have different sets of attributes.

In [None]:
# define a class attribute, shared by all instances
class BetterPerson(object):
    is_better = True # attribute

In [None]:
guy = BetterPerson()

In [None]:
guy.is_better

In [None]:
guy.is_better = False # change attribute

In [None]:
guy.is_better

In [None]:
other_guy = BetterPerson()

In [None]:
# changing the attribute on one object won't
# affect other objects' attributes
other_guy.is_better
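
What happens under the hood: assigning `guy.is_better` creates an *instance* attribute that shadows the class attribute, while objects without their own value keep looking it up on the class. A sketch that repeats the `BetterPerson` definition so it runs on its own:

```python
class BetterPerson(object):
    is_better = True  # class attribute, shared by all instances

guy = BetterPerson()
other_guy = BetterPerson()

guy.is_better = False          # creates an instance attribute on guy only
print(guy.__dict__)            # {'is_better': False}
print(other_guy.__dict__)      # {} -> lookup falls back to the class

BetterPerson.is_better = 'maybe'  # changing the class attribute is seen
print(other_guy.is_better)     # maybe  (no instance value of its own)
print(guy.is_better)           # False  (instance attribute shadows it)
```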

The approach above gives all objects of the class the **same** *value* for the attribute. Usually, we want an attribute to have a different value for each object, like the name of a person. To this end, we use the special `__init__` method, often called the *constructor*.

In [None]:
class EvenBetterPerson(object):
    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname

In [None]:
kirk_v2 = EvenBetterPerson("James T.", "Kirk")

In [None]:
kirk_v2.firstname, kirk_v2.lastname

In [None]:
EvenBetterPerson() # can't create without providing info (TypeError)

## Inheritance

If we want to add functionality to a class, we could just copy and paste.

In [None]:
class EvenBetterPersonWithPrint(object):
    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname
    def print_me(self):
        # alternative: print("Person: {me.lastname}, {me.firstname}".format(me=self))
        print("Person: " + self.lastname + ", " + self.firstname)

In [None]:
p = EvenBetterPersonWithPrint("Stephen", "Hawking")
p.print_me()

In [None]:
p.firstname

In [None]:
p.lastname

In [None]:
p_prime = p

In [None]:
p_prime.lastname, p_prime.firstname

In [None]:
# classroom question: How do I delete things?
del p_prime

In [None]:
p_prime # NameError: del removed the name; the object is still reachable via p

Adding functionality can also be achieved by letting a class *inherit* from a base class.

In [None]:
class PersonWithFullName(EvenBetterPersonWithPrint):
    # PersonWithFullName will have all the attributes and
    # methods of EvenBetterPersonWithPrint, plus everything
    # defined in here
    def get_full_name(self):
        return self.firstname + " " + self.lastname

In [None]:
# the constructor still works
p = PersonWithFullName("Donald", "Trump")

In [None]:
# so does the print_me method
p.print_me()

In [None]:
# additional functionality
p.get_full_name()

In [None]:
type(p)

In [None]:
# classroom question (and answer):
# How do I know if I deal with a subclass?
isinstance(p, EvenBetterPersonWithPrint)
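
A subclass can also *override* a base class method and still call the original through `super()`. A sketch with made-up `CrewMember` and `Captain` classes; the two-argument `super(Captain, self)` form is used because it works in both Python 2 and 3.

```python
class CrewMember(object):
    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname

    def describe(self):
        return self.firstname + " " + self.lastname

class Captain(CrewMember):
    def __init__(self, firstname, lastname, ship):
        # let the base class set the attributes it knows about
        super(Captain, self).__init__(firstname, lastname)
        self.ship = ship

    def describe(self):
        # override the base method, but reuse its result
        return "Captain " + super(Captain, self).describe() + " of the " + self.ship

captain = Captain("James", "Kirk", "Enterprise")
print(captain.describe())               # Captain James Kirk of the Enterprise
print(isinstance(captain, CrewMember))  # True
print(issubclass(Captain, CrewMember))  # True
```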

# Web scraping

Scraping (sometimes: *crawling*) is a great way of retrieving information from the internet in a structured way. But you can also do a lot of harm.

- Be nice.
- Follow the rules.
- Read the terms and conditions.
- Read the robots.txt.
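
Python's standard library can check robots.txt rules for you via `urllib.robotparser` (Python 3). A sketch that parses made-up rules from a list of lines instead of downloading a real file:

```python
from urllib.robotparser import RobotFileParser

# made-up robots.txt content; in practice you would call
# set_url("https://example.com/robots.txt") followed by read()
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(robots_lines)

print(parser.can_fetch("*", "https://example.com/index.html"))    # True
print(parser.can_fetch("*", "https://example.com/private/data"))  # False
```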

## HTML

HTML is the language of websites.

In [None]:
with open('example.html') as f:
    html = f.read()

In [None]:
print(html)

Learn more about HTML in the [W3Schools HTML tutorial](http://www.w3schools.com/html/).

In [None]:
# To extract information from HTML pages easily,
# we will use BeautifulSoup
from bs4 import BeautifulSoup

In [None]:
soup = BeautifulSoup(html, 'lxml')

In [None]:
# body tag
soup.body

In [None]:
# head tag
soup.head

In [None]:
# _first_ list item
soup.li

In [None]:
# find _all_ list items
soup('li')

In [None]:
# get the _first_ list
soup.ul

In [None]:
# first li tag in the first ul tag
soup.ul.li

In [None]:
soup.li.text # get the text

In [None]:
# get the 'src' attribute of the first image tag
soup.img['src']

In [None]:
# get the tag name
soup.img.name

In [None]:
# get the tag name of the image's parent tag
soup.img.parent.name

In [None]:
# make a list from the list's child nodes
# (tags, but also the whitespace text between them)
list(soup.ul.children)

In [None]:
soup('h2') # find all second-level headlines

In [None]:
# find (potentially) all second-level headers
# with id 'list-header', though ids should be unique
soup('h2', {'id': 'list-header'})

In [None]:
# you'd use this to follow all links on a page ...
soup('a')

# Scrapy

You will find the scraping examples at https://github.com/dhesse/stk_inf_scraping.

In [None]:
# urljoin resolves relative URLs (Python 2: from urlparse import urljoin)
from urllib.parse import urljoin

In [None]:
urljoin('http://localhost:8888/files/example.html', 'example2.html')
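
Combining link extraction with `urljoin` turns every relative link on a page into an absolute URL you could follow next. To keep the sketch runnable without extra packages, it swaps BeautifulSoup for the standard library's `html.parser.HTMLParser` (Python 3); `LinkCollector` and the sample HTML are made up. With BeautifulSoup, the rough equivalent is iterating over `soup('a', href=True)` and joining each `a['href']`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags (stdlib-only sketch)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                # resolve relative links against the page's own URL
                self.links.append(urljoin(self.base_url, href))

page = '<p><a href="example2.html">next</a> <a href="/files/other.html">other</a></p>'
collector = LinkCollector('http://localhost:8888/files/example.html')
collector.feed(page)
collector.close()
print(collector.links)
# ['http://localhost:8888/files/example2.html', 'http://localhost:8888/files/other.html']
```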