In [None]:
from IPython.display import HTML, YouTubeVideo, display

# Synopsis

One of the strengths of `Python` is the sheer amount of code that has been included by default in its distribution. In this unit, we will learn:

1. What libraries are
2. How to handle importing a library
3. Usage of several important, basic Python libraries (`math`, `glob`, `random`, `collections`, `os`, `time`, `datetime`, `operator`)

This notebook draws heavily from the following references: 

* [Brief Tour of the Standard Library](https://docs.python.org/3/tutorial/stdlib.html)
* [The Python Standard Library - Index](https://docs.python.org/3/library/index.html)
* [Think Python](http://www.greenteapress.com/thinkpython/)

If you're looking for more after this lecture or after this course, they are good starting points.

# Videos

In [None]:
vid = YouTubeVideo('BVXv0-1Rcc8', width = 600)
display(vid)

In [None]:
vid = YouTubeVideo('RjMbCUpvIgw', width = 600)
display(vid)

# The Python Standard Library

Of all the reasons to use `Python`, number one (by a landslide) is that the community of `Python` programmers provides a ton of support. Most problems that you will encounter have already been solved by someone; it's just a matter of knowing how to find and interpret the solutions.

Prepare to become intricately familiar with [stackoverflow](https://stackoverflow.com), a message board where people post a dizzying array of problems and solutions that are ranked by the community in terms of their helpfulness. And if you're not already proficient at articulating the precise nature of your problem when searching with [Google](https://www.google.com), prepare to learn. 

Google can either be your [best friend](https://www.google.com/#q=python+divide+string+into+list+of+characters), or a [mortal foe](https://www.google.com/#q=how+do+i+get+numbers+and+letters+as+a+list+in+python).

Some problems, however, are so general that they just keep occurring again and again and again. The community of `Python` programmers has taken some of these problems, figured out fast and efficient solutions to them, and provided the code directly to you in the form of a `library`. 

Not all libraries that have been written by someone out there will be included in a  `Python` release. You will likely write your own libraries as you work on projects.

When thinking about libraries or even code you see somewhere, it is important to think about **responsibility** and **maintenance**. Responsibility refers to the individuals or organizations releasing the code. Is this a beginner or an established organization such as the **Python Software Foundation**? The Python Software Foundation is much more likely to release tested, validated, well-written code, to have a system for users to report **bugs**, and to resolve reported bugs.

Maintenance refers to the process by which code is made compatible with new releases of other code that it depends upon.  Think of any piece of software as a building made of bricks. Each brick is another piece of code that the building depends upon for stability.  What makes the situation truly complex is that bricks are always being remade. If the building does not respond to changes in the bricks, it will just collapse.

Any release of `Python` will include the so-called `Standard Library`. A key principle with the functions included in the `Standard Library` is **trust** -- they will work and will work well. However, the `Standard Library` is quite limited in scope. That is why we asked you to download the pun-ly named `Anaconda`: it includes many libraries that have been vetted and that are known to work well together and to be well maintained. 

**When you find an answer to a question on stackoverflow, your confidence in that answer should be tempered and the results tested. If it's in the `Python Standard Library`, thousands of people have already tested it and it works. The same goes for some other very common packages, such as those included in `Anaconda`.**

Importantly, the source code of all those libraries is distributed widely. You can read it, check it, learn from it. You should avoid random libraries from unknown users whose source code is not distributed. **When you run them, you are giving that code your permissions to change things on the computer. They could encrypt or delete your data, steal your personal information, take control of your camera and microphone, attack sites on the Internet under your name, or damage your hardware.** 


# Dealing with Libraries 

## ... or how I learned to stop writing so much code and use the Python Standard Library

I've said the words "library" and "libraries" quite a few times now, but what are they? Think of a library as a collection of useful functions all relating to a generally similar set of tasks. You might have seen code like this:

```
import math
from math import log
import math as m
from math import *
```
We'll look at each of these in turn to see what is going on here. 

Suppose that I was interested in knowing the logarithm of the number 348. I might type:

In [None]:
log(348)

And you should see a `NameError`, because 'normal' Python doesn't know what `log` means. We never defined it, or told Python that `log(number)` means the exponent to which a fixed value, the base, must be raised to produce that number (phew). We could labor over _how_ to write that code, but logarithms are pretty common, right? Surely someone else has figured this out already. Enter `math`:

In [None]:
import math

In [None]:
log(348)

But still we have a `NameError`. 

**What's going on here?** 

Well, to access all the cool functions in `math`, we first need to tell Python that `log` refers to a specific function defined in the `math` library:

In [None]:
math.log(348)

Voila! 5.8522024... 

Suppose, however, that we found that a little tedious to type over and over again, and that all we really need from the `math` library is `log`; we don't care about all the other cool stuff that it has. Well, we could just type:

In [None]:
from math import log

And now `log()` should work out just fine:

In [None]:
log

In [None]:
log(348)

To assure ourselves that these two things are entirely equivalent:

In [None]:
math.log(348) == log(348)

.


.


**Indeed** 

In [None]:
math.log == log

How you choose to import functions from a library might depend a lot on your project. You'll also frequently see something like this:

In [None]:
import numpy as np

All we did was rename `numpy`. This is an area where following the almost universal convention is, for all purposes, mandatory. If you deviate from the convention, it will make your code harder to read. Imagine reading a book in which the word `the` had been replaced with `143`...

> It was 143 best of times, it was 143 worst of times, it was 143 age of wisdom, it was 143 age of foolishness, it was 143 epoch of belief, it was 143 epoch of incredulity, it was 143 season of Light, it was 143 season of Darkness, it was 143 spring of hope, it was 143 winter of despair, we had everything before us, we had nothing before us, we were all going direct to Heaven, we were all going direct 143 o143r way - in short, 143 period was so far like 143 present period, that some of its noisiest authorities insisted on its being received, for good or for evil, in 143 superlative degree of comparison only.

In [None]:
np.log(348)

In [None]:
np.log(348) == math.log(348)

.


.


**However**

In [None]:
np.log == math.log
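
The `import math as m` form from the list at the start of this section works the same way as `import numpy as np`: it only adds a second name for the same module. Here is a minimal sketch using only the standard library. The alias `m` is purely illustrative; `math` has no conventional short name, so plain `import math` is the safer habit.

```python
import math
import math as m  # illustrative alias only; not a community convention

# Both names refer to the very same module object
same_module = m is math
print(same_module)

# so a function reached through either name is the same function
same_function = m.log is math.log
print(same_function)
```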

**THIS IS HERE JUST FOR INFORMATION. DON'T DO IT!** 

There is a final way of importing libraries that you might see, but we're not going to actually run the code because it's the worst.
```
from math import *
```
You might be able to guess what this is doing, and some of you might see why it's a terrible idea. Instead of having to type `math.log()`, importing in this manner lets us access every function in `math` directly by name: `log()`, `exp()`, etc. This might seem nice and easy, but do you know everything that is in the `math` library? It might be huge. And what if my code is analyzing the revenues of a timber company and I happen to have a variable called `log` that refers to the price of a fallen tree? Depending on the order in which I run my code and my imports, `log` might refer either to a function or to my variable. If I always use 
```
from math import log
```
I have the same problem, in that I've redefined `log`, but at least I'm explicitly reminded of the name of the function that I'm importing. And if I were a timber company, I might see the error of my ways. But by using the first syntax:
```
from math import *
```
I'm importing perhaps hundreds or thousands of functions whose names I don't even know.

> You'll see `from math import *` in your googling. Don't do it. 
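
Here is a small sketch of the name clash described above. The timber price of `75.0` is a made-up value for illustration.

```python
from math import log

value = log(348)      # here `log` is still the function from math
print(value)

log = 75.0            # hypothetical price of a fallen tree; rebinds the name

try:
    log(348)          # `log` is now a float, so calling it fails
except TypeError as err:
    message = str(err)
    print("TypeError:", message)
```

Whichever assignment ran last wins, which is why the order of imports and variable definitions matters.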

## Documentation
So we found the logarithm of 348 a number of ways. But the astute among you may ask: logarithm to what base? Well, Jupyter (IPython Notebook) can be really helpful here. Try typing:

In [None]:
math.log?

A helpful little box should have popped up explaining a bit about math.log (which you can close by clicking the x in the upper right corner).
We could also type `math.log` then Shift+Tab. Try that below:

In [None]:
math.log()

or

In [None]:
help(math.log)

From either of these options we learned that `math.log` needs a number `x`. We could also give it a second number, separated by a comma, to specify the base. If we don't give that second argument, the base defaults to `e`, which gives the natural logarithm.

In [None]:
print(math.log(348))
print(math.log(348, 2))
print(math.log(348, 10))

But is it really the natural logarithm? Let's double check:

In [None]:
print(math.log(348, e))

Ack! Python doesn't know what e is! How do I know whether `math.log` is really using the base `e`?

Well, `e` is pretty mathy, so maybe the `math` library can help us. But how do I find out?

In [None]:
help(math)

Remember we could also just have typed `math` and then held down Shift+Tab to get a drop down of some available options. But it looks like `e` is in there somewhere which is just what we want (as is `pi`, `tau`, `inf` and `nan`).
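
If the `help(math)` output is too long to scan, one option is the built-in `dir()` function, which returns the names that a module defines. A minimal sketch that filters out the private names starting with an underscore (the variable name `public_names` is just for this example):

```python
import math

# dir() lists every name in the module; drop the ones starting with "_"
public_names = [name for name in dir(math) if not name.startswith("_")]

print(len(public_names))                       # how many public names math has
print("e" in public_names, "pi" in public_names)
```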

In [None]:
math.e

Success! Now is math.log really defaulting to base `e`?

In [None]:
math.log(348) == math.log(348, math.e)

You win `math` library, you win. 
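
One caution: comparing floating-point results with `==` works here, but it can fail when two routes to the same value differ in the last digit. The standard library offers `math.isclose` for comparisons within a tolerance. A minimal sketch:

```python
import math

natural = math.log(348)
explicit = math.log(348, math.e)
print(math.isclose(natural, explicit))

# Two ways to get a base-10 logarithm; these may or may not be
# bit-for-bit equal, but they are certainly close
via_base = math.log(348, 10)
via_log10 = math.log10(348)
print(math.isclose(via_base, via_log10))
```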

This is a fun little exercise. But really, I trusted the `math` library all along because I know it's part of the Python Standard Library. Trust is key when using built-in libraries and functions; otherwise you might never get anything done. Just don't spread that trust too broadly. You really _really_ need to get in the habit of looking at documentation when you use a library or a function that you've never used before. Thankfully Jupyter gives you a lot of options on how to do so, so you have no excuse. 

# `math` - Mathematical functions
[Package documentation](https://docs.python.org/3/library/math.html)

Our old friend `math` is as good a starting point as any. We'll learn a lot more about complex mathematics and statistics libraries later. But for now, you'll need some basic math aside from +-*/ (which should all work as expected!).

But to see why `math` is so great let's take a break and try a little exercise:

**Exercise:** what is the value of 21! (21 factorial)?

In [None]:
def calculate_factorial(number_of_interest):
    #Place your code here
    return factorial_of_number

In [None]:
number_of_interest = 21
calculate_factorial(number_of_interest)

When you're learning how to code these exercises are great practice. 

Once you're comfortable you'll know that it's far easier to write:

In [None]:
math.factorial(21)

Did your answers match up? They better!

In [None]:
math.factorial(number_of_interest) == calculate_factorial(number_of_interest)

# `time` - Time access and conversions

[Package documentation](https://docs.python.org/3/library/time.html)

Greatest Hits:
* `time.sleep(x)`: pauses for x seconds
* `time.time()`: gets current time in seconds

In [None]:
import time

In [None]:
time.time()

If you are wondering what that number means, it is counting seconds from January 1st, 1970, midnight UTC (the *epoch* time). For other systems, **time zero** is January 1, 1601, or 1858. 

Don't ask me why...
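
You can ask Python what it considers time zero. `time.gmtime(0)` converts "zero seconds since the epoch" into a UTC calendar date; on Windows and most Unix systems this is expected to be January 1st, 1970. A minimal sketch:

```python
import time

epoch = time.gmtime(0)  # struct_time for zero seconds since the epoch, in UTC
print(epoch.tm_year, epoch.tm_mon, epoch.tm_mday)

# The same conversion applied to the current time
now = time.gmtime(time.time())
print(now.tm_year, now.tm_mon, now.tm_mday)
```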

This function provides a way to see how long your code takes to run!

In [None]:
start_time = time.time()
for i in range(10000):
    trash = i**2
end_time = time.time()
print(end_time - start_time)

As you might have guessed, someone already tackled this problem and created a library called `timeit` that provides functions to help you do this.

In [None]:
import timeit

# Run the loop statement 10 times and return the total time in seconds
timeit.timeit('for i in range(10000): trash = i**2', number=10)
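
`timeit.timeit` also accepts a function instead of a string, which is often easier to read. The sketch below times the same loop both by hand and with `timeit`. `square_all` is a helper name made up for this example, and `time.perf_counter()` is a standard-library clock intended for measuring short durations.

```python
import time
import timeit


def square_all(n=10000):
    """Hypothetical helper: the same throwaway loop as above."""
    for i in range(n):
        trash = i ** 2


# Manual timing of a single run
start = time.perf_counter()
square_all()
elapsed = time.perf_counter() - start

# timeit calls the function `number` times and returns the total seconds
total_seconds = timeit.timeit(square_all, number=10)

print(f"single run: {elapsed:.6f} s")
print(f"average of 10 runs: {total_seconds / 10:.6f} s")
```

Averaging over several runs smooths out noise from whatever else your computer is doing at the time.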

__Exercise__: Remember when we made a function to rival `math.factorial`? Which one runs faster?

On rare occasions, you might actually want your code to run _slower_ (perhaps when scraping a website). You might want to take a little break between each time you run a line of code:

In [None]:
for i in range(5):
    print(i)
    time.sleep(1)

Compare that to:

In [None]:
for i in range(5):
    print(i)

# `copy` - Shallow and deep copy operations

[Package documentation](https://docs.python.org/3/library/copy.html)

Greatest Hits:
* `copy.copy(x)`
* `copy.deepcopy(x)`

This is a really subtle but important point that you need to be aware of.

In [None]:
import copy

Suppose I defined variable `x` and for the time being I want to have `y` equal the same thing:

In [None]:
x = [5, 6]
y = x

But now something came up, I need to change `y`:

In [None]:
y[0] = 2

So now what are the values of x and y?

In [None]:
print(x)
print(y)

Clearly that's not what we wanted at all. 

This is an important point in `Python`. Assigning one variable to another never copies the object; `y = x` just creates another `reference` to the same object (that is, another `pointer` to the same place in memory). For mutable collections such as lists, sets, and dictionaries, this means that a change made through one name shows up under the other.
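
One way to check whether two names refer to the same object is the `is` operator. A minimal sketch (the extra list `z` is introduced only for comparison):

```python
x = [5, 6]
y = x            # no copy is made; y is a second name for the same list
print(y is x)

z = [5, 6]       # a separate list that happens to hold equal values
print(z == x)    # equal contents
print(z is x)    # but not the same object
```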

So how would we get what we wanted? Enter `copy`

In [None]:
x = [5, 6]
y = copy.copy(x)
y[0] = 2
print(x)
print(y)

Much better. However, `copy.copy()` is _shallow_: if I have nested lists, for instance, it only copies the top-level list and not the lists inside it. It's a subtle point, but in most cases what you really want is `copy.deepcopy()`, which creates an independent replica of the variable you give it, all the way down. It is called exactly the same way as `copy.copy()`:

In [None]:
# Standard copy.copy doesn't work

x = ["a", [5, 6]]
y = copy.copy(x)
y[1][0] = 2
print(x)
print(y)

In [None]:
# But copy.deepcopy works!

x = ["a", [5, 6]]
y = copy.deepcopy(x)
y[1][0] = 2
print(x)
print(y)
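
To see why the two copies behave differently, compare the identity of the inner list. The names `shallow` and `deep` are just for this sketch:

```python
import copy

x = ["a", [5, 6]]
shallow = copy.copy(x)
deep = copy.deepcopy(x)

# Both copies are new outer lists
print(shallow is x, deep is x)

# The shallow copy still shares the inner list with x
print(shallow[1] is x[1])

# The deep copy has its own inner list
print(deep[1] is x[1])
```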

# `operator` - Standard operators as functions

[Package documentation](https://docs.python.org/3/library/operator.html)

This will be easiest to describe with an example:

In [None]:
import operator

In [None]:
x = [[5,4,3], [2, 4, 5], [9,2,1]]
sorted(x)


What actually happened here? 

The list of lists was sorted by the first value of each inner list (ties would be broken by the next value). 

**But suppose I wanted to sort based off the second? or the last?**

In [None]:
help(sorted)

In [None]:
sorted(x, key = operator.itemgetter(2))
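
The same pattern covers the other positions. As a sketch, `operator.itemgetter(1)` sorts on the second value, and passing several indexes makes `itemgetter` return a tuple, so ties on the second value are broken by the first:

```python
import operator

x = [[5, 4, 3], [2, 4, 5], [9, 2, 1]]

# Sort on the second value; lists that tie keep their original order
by_second = sorted(x, key=operator.itemgetter(1))
print(by_second)

# Sort on the second value, then break ties with the first value
by_second_then_first = sorted(x, key=operator.itemgetter(1, 0))
print(by_second_then_first)
```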


If you are trying to sort a list of dictionaries, you have to pass a dictionary key to `operator.itemgetter()` instead of an index.  

In [None]:
list_dict = [{'a': 1, 'b': 30}, {'a': 5, 'b': 3}, {'a': 10, 'b': 0}]

sorted(list_dict, key = operator.itemgetter('b'))

Notice that all dictionaries in the list must include the corresponding key; otherwise `itemgetter` raises a `KeyError`, as the next cell shows.

In [None]:
list_dict = [{'a': 1, 'b': 30}, {'a': 5, 'b': 3}, {'a': 10, 'b': 0}, {'a': 30}]

sorted(list_dict, key = operator.itemgetter('b'))
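
The `KeyError` here comes from the last dictionary, which has no `'b'`. One possible workaround, sketched below, is to swap `itemgetter` for a small function built on `dict.get` with a default. Sending the incomplete entries to the end by using infinity as the default is an assumption made for this example; your project may call for a different rule.

```python
list_dict = [{'a': 1, 'b': 30}, {'a': 5, 'b': 3}, {'a': 10, 'b': 0}, {'a': 30}]

# Assumption: dictionaries without 'b' should sort last,
# so they get infinity as their stand-in sort value
sorted_dicts = sorted(list_dict, key=lambda d: d.get('b', float('inf')))
print(sorted_dicts)
```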

# `collections` - Container datatypes

[Package documentation](https://docs.python.org/3/library/collections.html)

Greatest Hits:
* `collections.Counter`: counts repeated instances from an iterable

A pretty common problem that you may encounter is: given a list, how many times does each unique element appear inside of that list? 

An easy way to solve this is with `Counter`, but first try it by hand:

In [None]:
# Write code to count the occurrences of all the names in the list
#
dwarfs = ['Doc', 'Grumpy', 'Happy', 'Sleepy', 'Bashful', 'Sneezy', 
          'Doc', 'Dopey']

###Place your code here



print(dwarfs_count)

While that's a good exercise, it's obviously a little bit tedious. Since I mentioned that this is a common problem, as you might expect, the work has already been done for you. 

In [None]:
import collections


In [None]:
dwarfs_count = collections.Counter(dwarfs)
print(dwarfs_count)
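
A `Counter` behaves much like a dictionary with a few extras. As a short sketch: looking up a name that never appeared returns 0 instead of raising an error, and `most_common(n)` returns the `n` most frequent items.

```python
import collections

dwarfs = ['Doc', 'Grumpy', 'Happy', 'Sleepy', 'Bashful', 'Sneezy',
          'Doc', 'Dopey']
dwarfs_count = collections.Counter(dwarfs)

print(dwarfs_count['Doc'])          # 'Doc' appears twice in the list
print(dwarfs_count['Snow White'])   # missing names count as 0
print(dwarfs_count.most_common(1))  # the single most frequent name
```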

# `random` - Generate pseudo-random numbers

[Package documentation](https://docs.python.org/3/library/random.html)

Greatest Hits:
* `random.random()`: returns a number in the range [0.0, 1.0)
* `random.randint(a, b)`: returns an integer in the range [a, b]
* `random.choice(x)`: randomly returns a value from the sequence x
* `random.sample(x, y)`: randomly returns a sample of length y from the sequence x without replacement

In [None]:
import random

What if I just want a random number between 0 and 1, because who knows, it might be useful (it will be at some point):

In [None]:
random.random()

Make sure that you run that cell a few times, you should get a different answer every time.
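
Sometimes you want the opposite: the same "random" numbers on every run, for example so that a result can be reproduced. These numbers are pseudo-random, so fixing the starting point with `random.seed` repeats the sequence. A minimal sketch (the seed value 42 is arbitrary):

```python
import random

random.seed(42)                       # arbitrary seed chosen for illustration
first_run = [random.random() for _ in range(3)]

random.seed(42)                       # same seed, so the sequence restarts
second_run = [random.random() for _ in range(3)]

print(first_run == second_run)
```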

Sometimes integers are just easier to deal with. 

In [None]:
random.randint(7, 261)

__Exercise:__ Are the numbers 7 and 261 included or excluded from this random number generator:

In [None]:
#Place your code here



For a lot of statistical tests you'll want to be able to randomly select items from a list so here are two easy ways to do it. As always, if this is random it better give you different results when you run it multiple times!

In [None]:
dwarfs = list(set(dwarfs))

In [None]:
help(random.sample)

In [None]:
random.sample(dwarfs, 3)

In [None]:
random.choice(dwarfs)

# `datetime` - Basic date and time types

[Package documentation](https://docs.python.org/3/library/datetime.html)

We'll talk about this package a bit more later, but for now let's just give you some basics:

In [None]:
import datetime

The `date` class in the `datetime` module provides functions and data to work with dates.

In [None]:
today = datetime.date.today()

print(today)
print()
print(today.day)
print(today.year)


In [None]:
birthday = datetime.date(1984, 2, 25)

print(birthday)
print()
print(birthday.day)
print(birthday.month)
print(birthday.year)

In _one_ variable called `birthday` we now have lots of information. This is much easier to work with than having separate variables for each of these:
```
birth_day = 25
birth_month = 2
birth_year = 1984
```
or one variable that we have to split apart every time we only care about a particular piece of it:

```
birthday = '02-25-1984'
```
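
A `date` object can still produce that string form on demand, and it supports arithmetic that separate integers or strings do not. A minimal sketch (the March 1st date is chosen only to show that the 1984 leap day is handled for you):

```python
import datetime

birthday = datetime.date(1984, 2, 25)

# Format the date as text only when text is needed
as_text = birthday.strftime("%m-%d-%Y")
print(as_text)

# Day of the week as a number: Monday is 0, Sunday is 6
print(birthday.weekday())

# Subtracting two dates gives a timedelta
gap = datetime.date(1984, 3, 1) - birthday
print(gap.days)
```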

If you want to work with time, you need to use the `datetime` class in the `datetime` module:

In [None]:
now = datetime.datetime.today()
print(now)

This should print the time and date according to your system. If you were in a different timezone, the output would be different.

`datetime` provides tools to convert between time zones. If you want to convert dates to the Hebrew (HaLuah) or Hijri calendars, then you have to use other libraries.
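
As a minimal sketch of a time zone conversion using only the standard library: the fixed offset of UTC-06:00 below is an arbitrary choice for illustration, and a fixed offset does not follow daylight saving rules.

```python
import datetime

# An "aware" datetime that knows it is expressed in UTC
utc_now = datetime.datetime.now(datetime.timezone.utc)

# Assumption: a fixed UTC-06:00 offset, chosen only for this example
offset = datetime.timezone(datetime.timedelta(hours=-6))
local_time = utc_now.astimezone(offset)

print(utc_now.isoformat())
print(local_time.isoformat())

# Same instant, different wall-clock reading
print(local_time == utc_now)
```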

# Final remarks


That's all for the `Python Standard Library`, but you haven't seen the last of any of these packages. 

I highly recommend reading through some of the links I provided at the top of this notebook. **But the best way to learn about packages also happens to be the best way to learn programming. Practice. Practice. Practice.**

Just know that while you're practicing, if you encounter a problem that seems like it might already have been done before, turn to `Google`. 

**Trust the standard library.**

**Trust other packages and `stackoverflow` less.** 


Test functions that you use, make sure they give you results that you expect, and delight in how much time you'll have saved. 