Welcome to day 7 of the Python Challenge. The very last one!

If you missed any of the previous days, here are the links:

- [Day 1 (syntax, variable assignment, numbers)](https://www.kaggle.com/colinmorris/learn-python-challenge-day-1)
- [Day 2 (functions and getting help)](https://www.kaggle.com/colinmorris/learn-python-challenge-day-2)
- [Day 3 (booleans and conditionals)](https://www.kaggle.com/colinmorris/learn-python-challenge-day-3)
- [Day 4 (lists and objects)](https://www.kaggle.com/colinmorris/learn-python-challenge-day-4)
- [Day 5 (loops and list comprehensions)](https://www.kaggle.com/colinmorris/learn-python-challenge-day-5)
- [Day 6 (strings and dictionaries)](https://www.kaggle.com/colinmorris/learn-python-challenge-day-6)

Today's topic is... imports and libraries.

**TODO: write intro later when content is more firmed up. May want to mention intent of preparing them for their future encounters with non-standard Python libraries, esp. for data science.**

In [None]:
# import sys; sys.path.append('../input/whatever'); For purpose of starting with small module import example?
# or, idk, maybe not necessary

## Imports

So far we've talked about types and functions that are built into the language.

But one of the best things about Python (especially if you're a data scientist) is the vast number of high-quality custom libraries that have been written for it. 

Some of these libraries are in the "standard library", meaning you can find them anywhere you run Python. Other libraries can easily be added, even if they aren't always shipped with Python.

Either way, we'll access this code with **imports**.

We'll start our example by importing `math` from the standard library.

In [1]:
import math

print("It's math! It has type {}".format(type(math)))

It's math! It has type <class 'module'>


`math` is a module. A module is just a collection of variables (a *namespace*, if you like). We can see all the names in `math` using the built-in function `dir()`.

In [3]:
print(dir(math))

['__doc__', '__loader__', '__name__', '__package__', '__spec__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose', 'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'nan', 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'trunc']
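Under the hood, a module's names live in an ordinary dictionary, which the built-in `vars()` returns. This short sketch shows that `math.pi` is really a lookup in that dictionary. The exact set of names varies between Python versions.

```python
import math

# vars() returns the module's namespace: a dict mapping names to values
namespace = vars(math)
print(type(namespace))             # <class 'dict'>
print(namespace["pi"] == math.pi)  # True: math.pi is a lookup in that dict
print(math.__name__)               # math
```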


We can access these variables using dot syntax. Some of them refer to simple values, like `math.pi`:

In [4]:
print("pi to 4 significant digits = {:.4}".format(math.pi))

pi to 4 significant digits = 3.142


But most of what we'll find in the module are functions:

In [8]:
math.log(32, 2)

5.0

Of course, if we don't know what `math.log` does, we can call `help()` on it:

In [9]:
help(math.log)

Help on built-in function log in module math:

log(...)
    log(x[, base])
    
    Return the logarithm of x to the given base.
    If the base not specified, returns the natural logarithm (base e) of x.



We can also call `help()` on the module itself. This will give us the combined documentation for *all* the functions and values in the module (as well as a high-level description of the module). Click the "output" button to see the whole help page.

In [None]:
help(math)

### Other import syntax

If we know we'll be using functions in `math` frequently, we can import it under a shorter alias to save some typing (though in this case "math" is already pretty short).

In [7]:
import math as mt
mt.pi

3.141592653589793

> You may have seen code that does this with popular libraries `numpy` or `pandas`. It's a common convention to `import numpy as np` and `import pandas as pd`.

The `as` simply renames the imported module. It's equivalent to doing something like:

In [None]:
import math
mt = math
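You can check that no copy is made. Both names point at the very same module object, so the identity test `is` returns `True`:

```python
import math
import math as mt  # binds the same module object to a second name

# Both names refer to one module object
print(mt is math)         # True
print(mt.pi == math.pi)   # True
```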

Wouldn't it be great if we could refer to all the variables in the `math` module by themselves, i.e., just `pi` instead of `math.pi` or `mt.pi`? Good news: we can do that.

In [10]:
from math import *
print(pi, log(32, 2))

3.141592653589793 5.0


`import *` makes all the module's variables directly accessible to you (without any dotted prefix).

Bad news: some purists might grumble at you for doing this.

Worse: they kind of have a point.

In [11]:
from math import *
from numpy import *
print(pi, log(32, 2))

TypeError: return arrays must be of ArrayType

What the what? But it worked before!

These kinds of "star imports" can occasionally lead to weird, difficult-to-debug situations.

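The problem in this case is that the `math` and `numpy` modules both have functions called `log`, but they have different semantics. `math.log` accepts an optional base as its second argument. `numpy.log` has no base parameter: it treats a second positional argument as `out`, an array to store the result in. Because we import from `numpy` second, its `log` overwrites (or "shadows") the `log` variable we imported from `math`.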
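You can reproduce the same trap using only the standard library. `cmath` (math for complex numbers) defines many of the same names as `math`, including `sqrt`, so whichever star import comes last wins:

```python
from math import *
print(sqrt(16))   # 4.0, from math

from cmath import *  # cmath also defines sqrt, log, pi, ...
print(sqrt(16))   # (4+0j): the name sqrt now refers to cmath.sqrt
```

Nothing warns you that `sqrt` changed meaning. The only clue is that results suddenly come back as complex numbers.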

A good compromise is to import only the specific things we'll need from each module:

In [None]:
from math import log, pi
from numpy import asarray
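Sometimes you need both versions of a same-named function. In that case, you can combine `from ... import` with `as` to rename one of them, or just keep the module prefix. Here is a small sketch with `math` and `cmath`:

```python
import cmath
from math import sqrt              # the real-valued version
from cmath import sqrt as csqrt    # the complex version, renamed on import

print(sqrt(4))         # 2.0
print(csqrt(-4))       # 2j
print(cmath.sqrt(-4))  # 2j, the same function reached through the module prefix
```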

### Submodules

We've seen that modules contain variables which can refer to functions or values. Something to be aware of is that they can also have variables referring to *other modules*. 
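The standard library has examples too. `os.path`, for instance, is a module stored as an attribute of the `os` module. (The file name below is made up; `os.path.join` just builds a path string.)

```python
import os
import types

# os.path is itself a module, reached as an attribute of os
print(type(os.path))                          # <class 'module'>
print(isinstance(os.path, types.ModuleType))  # True

# Calling a function in the submodule takes two dots
print(os.path.join("data", "rolls.csv"))  # data/rolls.csv on Linux and macOS
```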

In [15]:
import numpy
print("numpy.random is a", type(numpy.random))
print("it contains names such as...",
      dir(numpy.random)[-15:]
     )

numpy.random is a <class 'module'>


So if we import `numpy` as above, then calling a function in the `random` "submodule" will require *two* dots.

In [18]:
# Roll 10 dice (high is exclusive, so high=7 gives values from 1 to 6)
rolls = numpy.random.randint(low=1, high=7, size=10)
rolls

array([4, 5, 2, 3, 5, 2, 4, 2, 3, 1])
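This is a detail worth double-checking with `help()`. `numpy.random.randint` treats `high` as *exclusive*, so a six-sided die needs `high=7`. The standard library's `random.randint(a, b)` behaves differently: it includes both endpoints. Here is a sketch of the same ten rolls using only the standard library:

```python
import random

random.seed(0)  # fix the seed so the rolls are the same on every run

# random.randint(1, 6) can return any of 1 through 6 (both ends included)
rolls_list = [random.randint(1, 6) for _ in range(10)]
print(rolls_list)
```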

## Oh the places you'll go, oh the objects you'll see

So after 6 days of the Python Challenge, you're a pro with ints, floats, bools, lists, strings, and dicts (right?). 

Even if that were true, it doesn't end there. As you work with various libraries for specialized tasks, you'll find that they define their own types, which you'll have to learn to work with. For example, if you work with the graphing library `matplotlib`, you'll encounter objects it defines to represent Subplots, Figures, TickMarks, and Annotations. `pandas` functions will give you DataFrames and Series.

In this section, I want to share with you a quick survival guide for working with strange types.

### Three tools for understanding strange objects

Earlier, we saw that calling a `numpy` function gave us an "array". We've never seen anything like this before (not in the Python Challenge, anyway). But don't panic: we have three familiar built-in functions to help us here.

**1: `type()`** (what is this thing?)

In [19]:
type(rolls)

numpy.ndarray
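When `type()` reports an unfamiliar class, its `__module__` attribute tells you which library defined it. That is a good hint about where to look for documentation. This sketch uses the standard library's `Fraction` as the unfamiliar object:

```python
from fractions import Fraction

third = Fraction(1, 3)
t = type(third)
print(t)             # <class 'fractions.Fraction'>
print(t.__name__)    # Fraction
print(t.__module__)  # fractions
```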

**2: `dir()`** (what can I do with it?)

In [21]:
print(dir(rolls))

['T', '__abs__', '__add__', '__and__', '__array__', '__array_finalize__', '__array_interface__', '__array_prepare__', '__array_priority__', '__array_struct__', '__array_ufunc__', '__array_wrap__', '__bool__', '__class__', '__complex__', '__contains__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__', '__dir__', '__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__iand__', '__ifloordiv__', '__ilshift__', '__imatmul__', '__imod__', '__imul__', '__index__', '__init__', '__int__', '__invert__', '__ior__', '__ipow__', '__irshift__', '__isub__', '__iter__', '__itruediv__', '__ixor__', '__le__', '__len__', '__lshift__', '__lt__', '__matmul__', '__mod__', '__mul__', '__ne__', '__neg__', '__new__', '__or__', '__pos__', '__pow__', '__radd__', '__rand__', '__rdivmod__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rlshift__', '__rmatmul__', '__rmod__', '__rmul__', 

In [22]:
# What am I trying to do with this dice roll data? Maybe I want the average roll, in which case the "mean"
# method looks promising...
rolls.mean()

3.1

In [23]:
# Or maybe I just want to get back on familiar ground, in which case I might want to check out "tolist"
rolls.tolist()

[4, 5, 2, 3, 5, 2, 4, 2, 3, 1]
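Most of the `dir()` output earlier is double-underscore names, which are hooks Python calls behind the scenes. When skimming, it helps to keep only the names without a leading underscore. Below is a small helper that does this (the name `public_names` is just for illustration). It's shown on a plain list so it runs without numpy:

```python
def public_names(obj):
    """Return the attribute names of obj that don't start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


print(public_names([1, 2, 3]))
# ['append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
```

Calling `public_names(rolls)` would similarly surface methods such as `mean`, `tolist`, and `ravel` without the wall of dunder names.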

**3: `help()`** (tell me more)

In [24]:
# That "ravel" attribute sounds interesting. I'm a big classical music fan.
help(rolls.ravel)

Help on built-in function ravel:

ravel(...) method of numpy.ndarray instance
    a.ravel([order])
    
    Return a flattened array.
    
    Refer to `numpy.ravel` for full documentation.
    
    See Also
    --------
    numpy.ravel : equivalent function
    
    ndarray.flat : a flat iterator on the array.



In [None]:
# Okay, just tell me everything there is to know about numpy.ndarray
# (Click the "output" button to see the novel-length output)
help(rolls)

(Of course, you might also prefer to check out [the online docs](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.ndarray.html))

### Operator overloading

What's the value of the below expression?

In [25]:
[3, 4, 1, 2, 2, 1] + 10

TypeError: can only concatenate list (not "int") to list

What a silly question. Of course it's an error. 

What about...

In [26]:
rolls + 10

array([14, 15, 12, 13, 15, 12, 14, 12, 13, 11])

We might think that Python strictly polices how pieces of its core syntax behave, such as `+`, `<`, `in`, `==`, or square brackets for indexing and slicing. In fact, it takes a very hands-off approach. When you define a new type, you can choose how addition works for it, or what it means for an object of that type to be equal to something else.
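Here is a minimal sketch of what "choosing how addition works" looks like. The class `Rolls` is hypothetical and not part of any library. It defines the special methods `__add__` and `__eq__`, which Python calls for `+` and `==`:

```python
class Rolls:
    """A toy container of dice rolls (hypothetical, for illustration only)."""

    def __init__(self, values):
        self.values = list(values)

    def __add__(self, other):
        # Called for `rolls + other`: add a number to every element, numpy-style
        return Rolls(v + other for v in self.values)

    def __eq__(self, other):
        # Called for `rolls == other`: equal if the other Rolls holds the same values
        if not isinstance(other, Rolls):
            return NotImplemented
        return self.values == other.values

    def __repr__(self):
        return "Rolls({})".format(self.values)


r = Rolls([4, 5, 2])
print(r + 10)                          # Rolls([14, 15, 12])
print(r + 10 == Rolls([14, 15, 12]))   # True
```

Note that `10 + r` would still raise a `TypeError`. That expression first asks `int` how to add a `Rolls`, and then falls back to `Rolls.__radd__`, which this sketch doesn't define.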

The designers of lists decided that adding them to numbers wasn't allowed. The designers of `numpy` arrays went a different way (adding the number to each element of the array).

Here are a few more examples of how `numpy` arrays interact unexpectedly with Python operators (or at least differently from lists).

In [28]:
# At which indices are the dice less than or equal to 3?
rolls <= 3

array([False, False,  True,  True, False,  True, False,  True,  True,
        True])
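With a plain list, `rolls_list <= 3` raises a `TypeError`, because lists don't define comparison with numbers. To get the same element-by-element answer, you need a list comprehension. The values below copy the dice rolls shown earlier:

```python
rolls_list = [4, 5, 2, 3, 5, 2, 4, 2, 3, 1]

# Compare each element to 3 individually
at_most_three = [r <= 3 for r in rolls_list]
print(at_most_three)
# [False, False, True, True, False, True, False, True, True, True]
```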

In [40]:
xlist = [[1,2,3],[2,4,6],]
# Create a 2-dimensional array
x = numpy.asarray(xlist)
print("xlist = {}\nx =\n{}".format(xlist, x))

xlist = [[1, 2, 3], [2, 4, 6]]
x =
[[1 2 3]
 [2 4 6]]


In [37]:
# Get the last element of the second row
x[1,-1]

6

In [38]:
# Get the last element of the second sublist?
xlist[1,-1]

TypeError: list indices must be integers or slices, not tuple

numpy's `ndarray` type is specialized for working with multi-dimensional data, so it defines its own logic for indexing, allowing us to index by a tuple to specify the index at each dimension.
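The comma is what makes this work. `x[1, -1]` is shorthand for `x[(1, -1)]`, and Python passes that tuple to the object's `__getitem__` method. A list's `__getitem__` rejects tuples, so with nested lists you chain two lookups instead. The sketch below uses `ShowKey`, a throwaway class made up for illustration:

```python
xlist = [[1, 2, 3], [2, 4, 6]]

# With nested lists, index one level at a time: second sublist, then its last element
print(xlist[1][-1])  # 6


class ShowKey:
    """Hypothetical class whose indexing just returns the key it was given."""

    def __getitem__(self, key):
        return key


print(ShowKey()[1, -1])  # (1, -1): the indices arrive as a single tuple
print(ShowKey()[1:3])    # slice(1, 3, None)
```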

### When does 1 + 1 not equal 2?

Things can get weirder than this. You may have heard of (or even used) tensorflow, a Python library widely used for deep learning. It makes extensive use of operator overloading.

**TODO: maybe explain a bit more. tf.constant (link to docs?) Cut? **

In [None]:
import tensorflow as tf
a = tf.constant(1)
b = tf.constant(1)
a + b

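In TensorFlow 1.x, `a + b` isn't 2. (TensorFlow 2.x executes eagerly by default, so there the same code gives a tensor holding the value 2.) Instead, it is (to quote tensorflow's documentation)...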

> a symbolic handle to one of the outputs of an `Operation`. It does not hold the values of that operation's output, but instead provides a means of computing those values in a TensorFlow `tf.Session`.
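To see how `+` can return a recipe rather than a result, here is a toy sketch. It is *not* how TensorFlow is implemented; the classes `Node`, `Constant`, and `Add` are hypothetical. Here, `__add__` builds a node that describes the sum, and nothing is computed until you ask for it:

```python
class Node:
    """Hypothetical base class: `+` on any node builds a new Add node."""

    def __add__(self, other):
        # Instead of adding, record that an addition should happen later
        return Add(self, other)


class Constant(Node):
    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value


class Add(Node):
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def evaluate(self):
        # The actual arithmetic only happens here, on request
        return self.left.evaluate() + self.right.evaluate()

    def __repr__(self):
        return "<Add node, not yet computed>"


a = Constant(1)
b = Constant(1)
total = a + b
print(total)             # <Add node, not yet computed>
print(total.evaluate())  # 2
```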



It's important to be aware that this sort of thing is possible, and that libraries often use operator overloading in non-obvious or magical-seeming ways.

Understanding how Python's operators work when applied to ints, strings, and lists is no guarantee that you'll be able to immediately understand what they do when applied to a tensorflow `Tensor`, or a numpy `ndarray`, or a pandas `DataFrame`.

Once you've had a little taste of DataFrames, for example, an expression like the one below starts to look appealingly intuitive:

In [41]:
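# Rows for countries with more than a billion people and a latitude below 40
# (assumes `df` is a pandas DataFrame of countries; it isn't defined in this notebook, hence the error)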
df[(df['population'] > 10**9) & (df['latitude'] < 40)]

NameError: name 'df' is not defined

In [None]:
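# Rows for countries whose name starts with "The" and whose latitude is below 40
# (string methods on a Series live under the .str accessor)
df[(df['country'].str.startswith('The')) & (df['latitude'] < 40)]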

But why does it work? What's each of those operations doing? It can help to know the answer when things start going wrong.
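Here is part of the answer. pandas overloads `&` (through the special method `__and__`) to mean "element-wise and". It can't overload the keyword `and`, because `and` only asks whether its left operand is truthy (via `bool()`), and pandas makes that question raise an error for a Series. The parentheses matter too. `&` binds more tightly than comparisons like `>` and `<`, so without them the expression would be parsed as something else entirely. The toy sketch below illustrates the idea with `Mask`, a hypothetical class that is not pandas:

```python
class Mask:
    """Hypothetical stand-in for a boolean pandas Series."""

    def __init__(self, flags):
        self.flags = list(flags)

    def __and__(self, other):
        # Called for `mask & other`: combine element by element
        return Mask(a and b for a, b in zip(self.flags, other.flags))

    def __bool__(self):
        # Like pandas, refuse to collapse many booleans into one
        raise ValueError("The truth value of a Mask is ambiguous")

    def __repr__(self):
        return "Mask({})".format(self.flags)


populous = Mask([True, False, True])
low_latitude = Mask([True, True, False])
print(populous & low_latitude)  # Mask([True, False, False])

try:
    populous and low_latitude   # `and` calls bool(populous), which raises
except ValueError as err:
    print("ValueError:", err)   # ValueError: The truth value of a Mask is ambiguous
```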

#### Curious how it all works?

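The short answer: special "dunder" (double underscore) methods. When Python evaluates `rolls + 10`, it calls (roughly) `rolls.__add__(10)` behind the scenes. Likewise, `rolls <= 3` calls `rolls.__le__(3)`, and `x[1, -1]` calls `x.__getitem__((1, -1))`. All those double-underscore names in the output of `dir(rolls)` are the hooks a library uses to decide how its objects respond to Python's operators.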

**TODO: outro**

<!--
# Final tips and tricks 

(dict constructor?)

## 1. Splitting long lines for readability

## 2. The star operator

## 3. Defensive programming with `assert`

## 4. Sets

## 5. The `zip` function

## 6. Weird slicing tricks

`L[:] = `, `L[::-1]`

## 7. `try`/`except`

## 8. `else:` clauses on loops

Just kidding, this is totally dorky. Don't do it.
-->