# The Python Standard Library: "Batteries Included"

Of all the reasons to use Python, number one (by a landslide) is that the community of Python programmers provides a ton of support. Most problems that you will encounter have already been solved by someone; it's just a matter of knowing how to find and interpret the solutions.

Prepare to become intimately familiar with [Stack Overflow](https://stackoverflow.com), a question-and-answer site where people post a dizzying array of problems and solutions that are ranked by the community in terms of their helpfulness. And if you're not already proficient at articulating the precise nature of your problem to Google, prepare to learn. Google can either be your [best friend](https://www.google.com/#q=python+divide+string+into+list+of+characters), or a [mortal foe](https://www.google.com/#q=how+do+i+get+numbers+and+letters+as+a+list+in+python).

Some problems, however, are so general that they just keep occurring again and again and again. The community of Python programmers has taken some of these problems, figured out fast and efficient solutions to them, and provided the code directly to you in the form of a "library". Not all libraries come with the "default" Python because there are just too many of them. You can and likely will write your own libraries for specific problems you encounter. But some libraries are _really_ useful. The reason you all downloaded the punnily named "Anaconda" is that it's Python, only _bigger_. Instead of just giving you Python, Anaconda has collected a range of other common "libraries" that nearly every Python programmer uses and has installed these for you.

"Default" Python, however still has a lot of useful libraries known as the "Standard Library". A key principle here is _trust_, the things you find in the Python Standard Library will work. And they will work well. __When you find an answer to a question on stackoverflow, your confidence in that answer should be tempered and the results tested.__ If it's in the Python Standard Library, thousands of people have already tested it and it works. The same goes for some other very common packages, such as those included in Anaconda.

Before we move on, we should note that this notebook draws heavily on the following references, each of which does a great job of explaining the Standard Library and Python programming in general. If you're looking for more after this lecture or after this course, they are a good starting point:

* [Brief Tour of the Standard Library](https://docs.python.org/3/tutorial/stdlib.html)
* [The Python Standard Library - Index](https://docs.python.org/3/library/index.html)
* [Think Python](http://www.greenteapress.com/thinkpython/)

# Libraries
### (...or how I learned to stop writing so much code and use the Python Standard Library)

I've said the words "library" and "libraries" quite a few times now, but what are they? Think of a library as a collection of useful functions all relating to a generally similar topic. You might have seen code like this:

```
import math
from math import log
import math as awesome_mathematical_functions
from math import *
```
We'll look at each of these in turn to see what is going on here. 

Suppose that I was interested in knowing the logarithm of the number 348. I might type:

In [None]:
log(348)

You should see a `NameError`, because 'normal' Python doesn't know what `log` means. We never defined it, or told Python that when I type `log(number)` what I really mean is that I want the exponent to which another fixed value, the base, must be raised to produce that number (phew). We could labor and think about _how_ to write that code, but logarithms are pretty common, right? Surely someone else has figured this out already. Enter `math`:

In [None]:
import math

In [None]:
log(348)

But we still have a `NameError`. What's going on here? Well, `import math` only makes the name `math` available. To access all the cool functions inside it, we need to tell Python that the `log` we mean is the specific function that lives in the `math` library:

In [None]:
math.log(348)

Voila! 5.8522024... you get the point. Suppose we thought that was a little tedious to type over and over again, and all we really need the `math` library for is `log`; we don't care about all the other cool stuff that it has. Well, we could just type:

In [None]:
from math import log

And now `log()` should work out just fine:

In [None]:
log(348)

To assure ourselves that these two things are entirely equivalent:

In [None]:
math.log(348) == log(348)

How you choose to import functions from a library might depend a lot on your project. You'll also frequently see something like this:

In [None]:
import math as awesome_mathematical_functions

All we did was give `math` a different name. This is silly, because the alias in this example is longer than `math` itself. But as you'll see later, there are some packages that are so commonly used that even though their name is only 5 letters long people _always_ import them under a two-letter alias. Our function should work exactly the same as before though:

In [None]:
awesome_mathematical_functions.log(348)

In [None]:
awesome_mathematical_functions.log(348) == math.log(348)

There is a final way of importing libraries that you might see, but we're not going to actually run the code because it's the worst.
```
from math import *
```
You might be able to guess what this is doing, and some of you might see why it's a terrible idea. Instead of having to type `math.log()`, importing in this manner will let us access every function in `math` directly by name: `log()`, `exp()`, etc. This might seem nice and easy, but do you know everything that is in the `math` library? It might be huge. And what if my code is analyzing the revenues of a timber company and I happen to have a variable called `log` that refers to the price of a fallen tree? Depending on the order in which I run my code and my imports, `log` might refer either to a function or to my variable. If I always use
```
from math import log
```
I have the same problem, in that I've defined `log`, but I'm explicitly reminded of the name of the function that I'm importing. And if I were a timber company I might see the error of my ways. But by using the star syntax:
```
from math import *
```
I'm importing perhaps hundreds or thousands of functions whose names I don't even know.

tl;dr: you'll see `from math import *` in your googling. Don't do it.
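
To make the timber-company hazard concrete, here is a minimal sketch of the name clash. The price of `35.0` is a made-up value used only for illustration.

```python
from math import log

print(log(348))  # at this point log is the function from math

# Hypothetical variable: the price of one fallen tree (made-up number).
log = 35.0

try:
    log(348)  # log is now a float, so calling it fails
except TypeError as error:
    clash_message = str(error)
    print('TypeError:', clash_message)

# Importing the module itself keeps the function reachable as math.log.
import math
print(math.log(348))
```

With `import math`, the function and the variable live under different names, so neither can silently replace the other.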

### Documentation
So we found the logarithm of 348 a number of ways. But the astute among you may ask: logarithm of base what? Well, Jupyter (IPython Notebook) can be really helpful here. Try typing:

In [None]:
math.log?

A helpful little box should have popped up explaining a bit about `math.log` (which you can close by clicking the x in the upper right corner).
We could also type `math.log` then Shift+Tab. Try that below:

In [None]:
math.log

or

In [None]:
help(math.log)

From any of these options we learned that `math.log` needs a number `x`. We could also give it a second number, separated by a comma, to specify the base. If we don't give that second argument, the base defaults to `e`, which gives the natural logarithm.

In [None]:
print(math.log(348))
print(math.log(348, 2))
print(math.log(348, 10))

But is it really the natural logarithm? Let's double check:

In [None]:
print(math.log(348, e))

Ack! Python doesn't know what e is! How do I know whether `math.log` is really using the base `e`?

Well, `e` is pretty mathy. Maybe the `math` library can help us, but how do I know?

In [None]:
help(math)

Remember we could also just have typed `math` and then held down Shift+Tab to get a drop down of some available options. But it looks like `exp` is in there somewhere which is just what we want.

In [None]:
help(math.exp)

So e raised to the first power should just be e! Right?

In [None]:
math.exp(1)

Success! Now is math.log really defaulting to base e?

In [None]:
math.log(348) == math.log(348, math.exp(1))

You win `math` library, you win. 
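
As a side note, `math` also ships the constant directly as `math.e`, and `math.isclose` compares floats with a tolerance, which is usually safer than `==` for numbers that were computed in different ways. A minimal sketch of the same check:

```python
import math

print(math.e)

natural = math.log(348)                 # default base
explicit_base = math.log(348, math.e)   # base given explicitly

# isclose allows for tiny floating point differences between the two.
print(math.isclose(natural, explicit_base))
```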

This is a fun little exercise. But really, I trusted the `math` library all along because I know it's part of the Python Standard Library. Trust is key when using built-in libraries and functions, otherwise you might never get anything done. Just don't spread that trust too broadly. You really _really_ need to get in the habit of looking at documentation when you use a library or a function that you've never used before. Thankfully Jupyter gives you a lot of options for doing so, so you have no excuse.

# Time for the tour:

# `math` - Mathematical functions
[Package documentation](https://docs.python.org/3/library/math.html)

Our old friend `math` is as good of a starting point as any. We'll learn a lot more about complex mathematics and statistics libraries later. But for now, you'll need some basic math aside from +-*/ (which should all work as expected!).

But to see why `math` is so great let's take a break and try a little exercise:

**Exercise:** what is the value of 21! (21 factorial)?

In [None]:
def calculate_factorial(number_of_interest):
    #Place your code here

  
    return factorial_of_number

In [None]:
number_of_interest = 21
calculate_factorial(number_of_interest)

When you're learning how to code these exercises are great practice. Once you're comfortable you'll know that it's far easier to say:

In [None]:
math.factorial(21)

Did your answers match up? They better!

In [None]:
math.factorial(number_of_interest) == calculate_factorial(number_of_interest)

# `random` - Generate pseudo-random numbers

[Package documentation](https://docs.python.org/3/library/random.html)

Greatest Hits:
* `random.random()`: returns a number in the range [0.0, 1.0)
* `random.randint(a, b)`: returns an integer in the range [a, b]
* `random.choice(x)`: randomly returns a value from the sequence x
* `random.sample(x, y)`: randomly returns a sample of length y from the sequence x without replacement

In [None]:
import random

What if I just want a random number between 0 and 1, because who knows, it might be useful (it will be at some point):

In [None]:
random.random()

Make sure that you run that cell a few times; you should get a different answer every time.

Sometimes integers are just easier to deal with. 

In [None]:
random.randint(7, 261)

__Exercise:__ Are the numbers 7 and 261 included or excluded from this random number generator:

In [None]:
#Place your code here



For a lot of statistical tests you'll want to be able to randomly select items from a list so here are two easy ways to do it. As always, if this is random it better give you different results when you run it multiple times!

In [None]:
dwarfs = ['Doc', 'Grumpy', 'Happy', 'Sleepy', 'Bashful', 'Sneezy', 'Dopey']

In [None]:
random.sample(dwarfs, 3)

In [None]:
random.choice(dwarfs)
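
The "without replacement" part of `random.sample` is easy to check for yourself. A small sketch using the same list:

```python
import random

dwarfs = ['Doc', 'Grumpy', 'Happy', 'Sleepy', 'Bashful', 'Sneezy', 'Dopey']

picked = random.sample(dwarfs, 3)
print(picked)
print(len(set(picked)))  # 3, because no dwarf can be picked twice

# Asking for more items than the list holds raises a ValueError.
try:
    random.sample(dwarfs, 8)
except ValueError as error:
    print('ValueError:', error)
```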

# `os` - Miscellaneous operating system interfaces

[Package documentation](https://docs.python.org/3/library/os.html)

These should be pretty self-explanatory. But when you're navigating through file systems to read and write files you'll quickly learn how important they are.

In [None]:
import os

In [None]:
current_directory = os.getcwd()
print(current_directory)

In [None]:
contents = os.listdir(current_directory)
print(contents)
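
`os.listdir` returns bare names, so a common follow-up is to join each name back onto the directory with `os.path.join` and ask whether it is a file or a folder. A minimal sketch, using only the current directory:

```python
import os

current_directory = os.getcwd()

for name in sorted(os.listdir(current_directory)):
    # os.path.join inserts the right separator for your operating system.
    full_path = os.path.join(current_directory, name)
    kind = 'dir ' if os.path.isdir(full_path) else 'file'
    print(kind, name)
```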

# `glob` - Unix style pathname pattern expansion

[Package documentation](https://docs.python.org/3/library/glob.html)

`glob` doesn't have a lot. In fact, it's mostly just two functions, `glob.glob` and `glob.iglob`, which are quite similar but powerful nevertheless.

In [None]:
import glob

In [None]:
help(glob)

In [None]:
for infile in glob.glob(current_directory + '/*'):
    print(infile)
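
The `*` in the pattern is a wildcard, and it can be combined with ordinary text to match only some names. Here is a sketch that builds a throwaway directory so the result is predictable; the file names are made up for illustration.

```python
import glob
import os
import tempfile

# Build a temporary directory holding three made-up files.
with tempfile.TemporaryDirectory() as scratch_directory:
    for name in ['notes.txt', 'todo.txt', 'data.csv']:
        with open(os.path.join(scratch_directory, name), 'w') as handle:
            handle.write('placeholder')

    # '*' matches every name; '*.txt' matches only names ending in .txt
    everything = glob.glob(os.path.join(scratch_directory, '*'))
    text_files = glob.glob(os.path.join(scratch_directory, '*.txt'))

print(len(everything))
print(sorted(os.path.basename(path) for path in text_files))
```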

**Exercise:** how many files are in your current directory? How many of those are '.ipynb' files?

In [None]:
###Place code here


# `time` - Time access and conversions

[Package documentation](https://docs.python.org/3/library/time.html)

Greatest Hits:
* `time.sleep(x)`: pauses for x seconds
* `time.time()`: gets the current time in seconds since the epoch (January 1, 1970 on most systems)

In [None]:
import time


In [None]:
time.time()

In [None]:
time.time()

This can be a useful if somewhat tedious way to see how long your code takes to run!

In [None]:
start_time = time.time()
for i in range(10000):
    trash = i**2
end_time = time.time()
print(end_time - start_time)
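
For timing short snippets, the `time` module also provides `time.perf_counter()`, a clock intended for measuring intervals. A sketch of the same loop timed that way (the exact number printed will differ from machine to machine):

```python
import time

start = time.perf_counter()
for i in range(10000):
    trash = i**2
elapsed = time.perf_counter() - start

print(elapsed)  # elapsed time in seconds
```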

__Exercise__: Remember when we made a function to rival `math.factorial`? Which one runs faster?

In [None]:
###Place your code here







On rare occasions, you might actually want your code to run _slower_ (perhaps when scraping a website). You might want to take a little break between each time you run a line of code:

In [None]:
for i in range(10):
    print(i)
    time.sleep(3)

Compare that to:

In [None]:
for i in range(10):
    print(i)

# `datetime` - Basic date and time types

[Package documentation](https://docs.python.org/3/library/datetime.html)

We'll talk about this package a bit more later, but for now let's just give you some basics:

In [None]:
import datetime

In [None]:
today = datetime.date.today()
print(today)
print(today.day)
print(today.year)


In [None]:
birthday = datetime.date(1984, 2, 25)
print(birthday.day)
print(birthday.month)
print(birthday.year)

In _one_ variable called `birthday` we now have lots of information. This is much easier to work with than having separate variables for each of these:
```
birth_day = 25
birth_month = 2
birth_year = 1984
```
or one variable that we have to split apart every time we only care about a particular piece of it:

```
birthday = '02-25-1984'
```
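
If you are handed a date as a string like the one above, `datetime` can parse it for you. This is a minimal sketch; the format string `'%m-%d-%Y'` assumes the month-day-year layout shown, and `reference_day` is an arbitrary date chosen for illustration.

```python
import datetime

birthday_text = '02-25-1984'

# strptime parses text into a datetime; .date() drops the time of day.
birthday = datetime.datetime.strptime(birthday_text, '%m-%d-%Y').date()
print(birthday.month, birthday.day, birthday.year)

# Subtracting two dates gives a timedelta, which knows its length in days.
reference_day = datetime.date(2000, 1, 1)
days_between = (reference_day - birthday).days
print(days_between)
```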

# `copy` - Shallow and deep copy operations

[Package documentation](https://docs.python.org/3/library/copy.html)

Greatest Hits:
* `copy.copy(x)`
* `copy.deepcopy(x)`

This is a really subtle but important point that you need to be aware of.

In [None]:
import copy

Suppose I defined variable `x` and for the time being I want to have `y` equal the same thing:

In [None]:
x = [5, 6]
y = x

But now something came up and I need to change `y`:

In [None]:
y[0]= 2

So now what are the values of x and y?

In [None]:
print(x)
print(y)

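Ack! That's not what we wanted at all. Writing `y = x` did not make a second list; it just gave the same list a second name, so a change made through `y` shows up in `x` too. So how would we get what we wanted? Enter `copy`: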

In [None]:
x = [5, 6]
y = copy.copy(x)
y[0]= 2
print(x)
print(y)

Much better. However, `copy.copy()` is _shallow_. If I have a dictionary of lists, for instance, it would only copy the dictionary and not the underlying lists, so the copy and the original would still share those lists. It's a subtle point, but for almost all applications what you really want is `copy.deepcopy()`, i.e. it will create an exact replica, all the way down, of the variable you give it. And it is called in exactly the same way as `copy.copy()`:

In [None]:
x = [5, 6]
y = copy.deepcopy(x)
y[0]= 2
print(x)
print(y)
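
With a flat list like `[5, 6]` the two functions behave the same, so the difference only shows up with nested data. A small sketch with a dictionary of lists (the key name `'scores'` is made up for illustration):

```python
import copy

original = {'scores': [5, 6]}

shallow = copy.copy(original)      # new dictionary, same inner list
deep = copy.deepcopy(original)     # new dictionary and new inner list

shallow['scores'][0] = 2           # changes the list shared with original

print(original)  # {'scores': [2, 6]}
print(deep)      # {'scores': [5, 6]}
```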

# `operator` - Standard operators as functions

[Package documentation](https://docs.python.org/3/library/operator.html)

This will be easiest to describe with an example:

In [None]:
import operator

In [None]:
x = [[5,4,3], [2, 4, 5], [9,2,1]]
x.sort()
print(x)

What actually happened here? I sorted a list of lists based on the first value of each list. But suppose I wanted to sort based on the second? Or the last?

In [None]:
x.sort(key=operator.itemgetter(3))
print(x)

Whoops! Remember, the lists inside only have three elements in them, and we start indexing at 0, so I just told it to sort based on a non-existent entry (hence the `IndexError`).

In [None]:
x.sort(key=operator.itemgetter(2))
print(x)

Much better :)
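
Sorting on the second value works the same way, and `operator.itemgetter` also accepts several indexes if you want a tie-breaker. A short sketch using `sorted`, which returns a new list instead of changing `x`:

```python
import operator

x = [[5, 4, 3], [2, 4, 5], [9, 2, 1]]

# Sort on the second value only; ties keep their original order.
by_second = sorted(x, key=operator.itemgetter(1))
print(by_second)  # [[9, 2, 1], [5, 4, 3], [2, 4, 5]]

# Sort on the second value, then break ties with the first value.
by_second_then_first = sorted(x, key=operator.itemgetter(1, 0))
print(by_second_then_first)  # [[9, 2, 1], [2, 4, 5], [5, 4, 3]]
```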

# `collections` - Container datatypes

[Package documentation](https://docs.python.org/3/library/collections.html)

Greatest Hits:
* `collections.defaultdict`: automatically creates values for missing keys
* `collections.Counter`: counts repeated instances from an iterable

In [None]:
import collections
TAs = ['adam', 'peter', 'nick', 'adam', 'hyojun', 'joao']
TA_names_dict = dict(collections.Counter(TAs))
print(TA_names_dict)

# new_TAs = ['joao', 'chuyue']
# for name in new_TAs:
#     TA_names_dict[name] += 1
# print(TA_names_dict)

# for name in new_TAs:
#     if name in TA_names_dict.keys():
#         TA_names_dict[name] += 1
#     else:
#         TA_names_dict[name] = 1
# print(TA_names_dict)

# TA_names_default_dict = collections.defaultdict(int, TA_names_dict.items())
# new_TAs = ['joao', 'chuyue']
# for name in new_TAs:
#     TA_names_default_dict[name] += 1
# print(TA_names_default_dict)
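
The commented-out lines above hint at why `defaultdict` is handy. Here is a minimal self-contained sketch of the same idea: a plain dictionary raises a `KeyError` for a name it has never seen, while `defaultdict(int)` quietly starts that name at 0.

```python
import collections

TAs = ['adam', 'peter', 'nick', 'adam', 'hyojun', 'joao']
new_TAs = ['joao', 'chuyue']

plain_counts = dict(collections.Counter(TAs))
try:
    plain_counts['chuyue'] += 1   # 'chuyue' is not a key yet
except KeyError as error:
    print('KeyError:', error)

# defaultdict(int) creates a 0 for any missing key the first time it is used.
default_counts = collections.defaultdict(int, plain_counts)
for name in new_TAs:
    default_counts[name] += 1
print(dict(default_counts))
```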


# `csv` - CSV file reading and writing

[Package documentation](https://docs.python.org/3/library/csv.html)

In [None]:
import csv

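# newline='' is what the csv documentation recommends when opening csv files
with open('../Data/gpa_data.csv', 'r', newline='') as gpa_file:
    spreadsheet = csv.reader(gpa_file, delimiter=',')
    print(spreadsheet)  # prints the reader object itself, not the rows
    # for row in spreadsheet:
    #     print(row)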
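
If you don't have the data file handy, the same reader can be tried on text held in memory. This is a sketch with made-up rows (they are not taken from `gpa_data.csv`), using `io.StringIO` to stand in for an open file:

```python
import csv
import io

# Made-up rows for illustration only.
sample_text = 'name,gpa\nalex,3.5\nsam,3.9\n'

rows = list(csv.reader(io.StringIO(sample_text), delimiter=','))
header, records = rows[0], rows[1:]
print(header)
print(records)  # every value is a string until you convert it

gpas = [float(record[1]) for record in records]
print(sum(gpas) / len(gpas))
```

Note that `csv.reader` hands back every value as a string, so numbers need an explicit `float()` or `int()` before you do arithmetic with them.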

# `json` - JSON encoder and decoder

[Package documentation](https://docs.python.org/3/library/json.html)

Greatest Hits:
* `json.dump()` (dumps to file) and `json.dumps()` (dumps to string)
* `json.load()` (loads from a file) and `json.loads()` (loads a string)

In [None]:
import json
names = ['Adam H', 'Peter W', 'Joao M', 'Hyojun L']
ages = [21, 31, 24, 19]
age_dictionary = dict(zip(names, ages))

with open('../Data/TA_ages.json', 'w') as output_file:
    json.dump(age_dictionary, output_file)
    
#####This should be equivalent to (except that you have to close the file yourself):
# output_file = open('../Data/TA_ages.json', 'w')
# json.dump(age_dictionary, output_file)
# output_file.close()

In [None]:
with open('../Data/TA_ages.json', 'r') as input_file:
    TA_ages = json.load(input_file)
#####Which is equivalent to (except that you have to close the file yourself):
# input_file = open('../Data/TA_ages.json', 'r')
# TA_ages = json.load(input_file)
# input_file.close()

In [None]:
print(TA_ages)
print(age_dictionary)
# TA_ages == age_dictionary
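
The string versions, `json.dumps()` and `json.loads()`, work the same way without touching a file. A minimal sketch, including one caveat worth knowing: JSON object keys are always strings, so non-string keys do not survive a round trip unchanged.

```python
import json

age_dictionary = {'Adam H': 21, 'Peter W': 31, 'Joao M': 24, 'Hyojun L': 19}

as_text = json.dumps(age_dictionary)   # dictionary -> string
print(type(as_text))

round_trip = json.loads(as_text)       # string -> dictionary
print(round_trip == age_dictionary)    # True

# Caveat: an integer key comes back as a string.
keys_changed = json.loads(json.dumps({1: 'one'}))
print(keys_changed)                    # {'1': 'one'}
```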

**Exercise:** Write your own code (that does not use the json library) that writes a dictionary to a file and then reads that file back into a new dictionary.

**Bonus:** how long does the whole process take? And how long does it take to do the same thing using the json library?

In [None]:
my_dictionary = {'Ernest Hemingway': ['The Sun Also Rises', 'A Farewell to Arms', 'For Whom the Bell Tolls'],
                 'Charles Dickens': ['Great Expectations', 'A Tale of Two Cities'],
                 'Leo Tolstoy': ['Anna Karenina', 'War and Peace'],
                 'Harper Lee': ['To Kill a Mockingbird']
                 }
                 
###Place your code here

