# Python Syntax Review

This notebook provides a brief overview of many of the more commonly used features of the Python programming language.

The information here is provided in a brief outline format for quick review and reference, and is not intended to be a programming tutorial.

## Other resources

For more thorough introductions and tutorials to the Python programming language, here are a few great resources to check out:

 * [Interactive Learn Python tutorial](https://www.kaggle.com/learn/python) from Kaggle
 * [Python for Beginners](https://www.python.org/about/gettingstarted/) getting started guide
 * The [official Python docs](https://docs.python.org/3/)

# Data types

## What is type?


When we are dealing with various kinds of objects or values in programming, the type of a thing determines the operations we can perform with that thing.

In programming, the notion of type extends well beyond the realm of numbers into value concepts like character and text, as well as more complex data types like dates and times. 

For now, we will focus on these simple types:

 * The numeric types `int` and `float`.
 * The text type, `str` or string, which refers to a string of characters.
 * The boolean type, which is used to indicate truthiness (i.e. True and False).
 * The special null value in Python, which is called `None`

### Looking at type


The simplest way to get a feel for types in Python is simply to inspect the types of some things. Python has a builtin `type` function. You may already know the `print` function. We will use `type` in much the same way.

Let's call `type` on some things that come to mind.

## Numeric types

### What is the type of the number 1?

In [None]:
type(1)

int

`int` refers to the builtin "integer" type, which is how we generally refer to whole numbers.

### What is the type of 1.1?

In [None]:
type(1.1)

float

1.1 is a `float`, or floating-point number. You might think of these as "real" numbers, or decimal numbers.

### What about 1.0?

In [None]:
type(1.0)

float

Simply appending the point-zero onto an integer value suddenly makes it a float instead of an integer.

## Text as a type

Strings of text in Python have the type `str`. A single character is just a string of length 1 -- there is not a special character type.

In [None]:
type('a')

str

In [None]:
type('abc')

str

`str` is Python's **string** type. Here, "string" refers to a string of characters.



---
🐍 **Characters and strings in Python**

> Some languages have a distinct type for characters, e.g. the `'a'` above, and strings of characters, e.g. `'abc'`. Python treats this as the same type.

---

### Operations on strings



In [None]:
'cats' + 'and' + 'dogs'

'catsanddogs'

---
### 🔨 **Try it!**

> Consider the above example of `'catsanddogs'`. What if we wanted spaces between the words so this string reads correctly?

---

## Booleans and boolean logic


The boolean values are written in Python as `True` and `False`. Note the capitalization.

We can see these in action quite simply:

In [None]:
not True

False

In [None]:
not False

True

In [None]:
bool1 = True
bool2 = False
bool3 = True

print(bool1 or bool2 or bool3)
print(bool1 and bool2)
print(bool1 and bool3)
print(not (bool1 and bool2 and bool3))

We will go into the details of what is happening here in a future lesson. In the meantime, just keep in mind the following:

The boolean data type has exactly two explicit values, `True` and `False`. There is also a broader concept of "truthiness", which applies to all things and determines how a value evaluates in a boolean conditional check.

Most things evaluate to True, but a few examples of things that evaluate to False:

 * `False` (of course)
 * `0`
 * `0.0`
 * `""` (the empty string)
 * `None`

 Take a moment to understand the output of each of the `bool` evaluations below:

In [None]:
print(bool(0), 0)
print(bool(0.0), 0.0)
print(bool(1), 1)
print(bool(0.0000001), 0.0000001)
print(bool(""), "The empty string")
print(bool(" "), "A string with spaces")
print(bool(None), None)


False 0
False 0.0
True 1
True 1e-07
False The empty string
True A string with spaces
False None
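Truthiness is what an `if` statement actually checks. As a small sketch (the helper name `describe` is made up for illustration), note that empty containers such as `[]` and `{}` are also falsy, while a list holding a single falsy item is still truthy:

```python
def describe(value):
    """Return 'truthy' or 'falsy' depending on how value behaves in an if-check."""
    if value:
        return "truthy"
    return "falsy"


for candidate in [0, 1, "", " ", None, [], [0], {}]:
    print(repr(candidate), describe(candidate))
```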


## When there is no type

Most programming languages have a concept of a `null` value. This is relevant to the topic of types because we have to be able to answer the question: **what is the type of nothing**? Which is not quite the same thing as, for example, the type of an empty thing, or the type of zero.

In Python, the null value is indicated by the special built-in object called `None`.

In practice, there is not much you can do with `None` on its own, and the idea that we even need a concept of "nothingness" in programming might not be at all intuitive at this point. This concept, however, becomes extremely important in the context of the name binding of variables, where it is possible to have a name that appears to be something, but in fact refers to nothing. That _nothing_ in Python is called `None`.

In [None]:
myvar = None
myvar is None

True

In [None]:
myvar = 1
myvar is None

False
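One common place where `None` shows up unexpectedly is as the return value of a method that changes something in place. A minimal sketch (the variable names are only illustrative):

```python
numbers = [3, 1, 2]
result = numbers.sort()   # sort() changes the list in place and returns None

print(result is None)     # the name `result` exists, but refers to nothing
print(numbers)            # the original list itself was sorted
```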

## Operators and type

### Integer, and float, and mixed-type operations

Generally speaking, operations between floats result in floats:

In [None]:
print(2.0 * 3.0)
print(2.0 / 3.0)
print(2.0 / 1.0)
print(2.0 + 3.0)

6.0
0.6666666666666666
2.0
5.0


.. and operations between integers result in integers:

In [None]:
print(2 * 3)
print(2 + 3)

6
5


.. except for division which produces floats:

In [None]:
print(2 / 3)
print(3 / 1)

0.6666666666666666
3.0


.. but there is a special integer division symbol `//`:

In [None]:
print( 2 // 3)
print( 3 // 1)

0
3


.. but mixed float+integer operations always yield a float, even with `//`:

In [None]:
print(1 + 1.0)
print(1.0 + 1)
print(1.0 / 2)
print(2.0 / 1)
print(2.0 // 1)

2.0
2.0
0.5
2.0
2.0
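One detail worth knowing is that `//` rounds down toward negative infinity rather than simply dropping the decimal part, which matters for negative numbers. A quick sketch:

```python
print(7 // 2)        # 3
print(-7 // 2)       # -4, because floor division rounds down, not toward zero
print(int(-7 / 2))   # -3, because int() truncates toward zero
print(7.0 // 2)      # 3.0, still a float because one operand is a float
```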


# Strings

### Strings as iterables

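Strings are iterable sequences, so they support looping, indexing, and slicing in much the same way as lists (although, unlike lists, strings cannot be modified in place):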

In [None]:
for letter in 'abcd':
    print(letter)

a
b
c
d


In [None]:
'abcd'[:2]

'ab'

In [None]:
'abcd'[2:]

'cd'

In [None]:
'abcd'[0]

'a'

In [None]:
'abcd'[-1]

'd'

In [None]:
sorted('cbda')

['a', 'b', 'c', 'd']
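One difference from lists is that strings are immutable. As a rough sketch, trying to assign to a position in a string raises a `TypeError`, whereas a list copy of the characters can be changed freely:

```python
word = 'abcd'
letters = list(word)      # a list copy of the characters can be changed
letters[0] = 'z'
print(''.join(letters))   # zbcd

try:
    word[0] = 'z'         # strings do not support item assignment
except TypeError as err:
    print('TypeError:', err)
```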

### Common string operations

In [None]:
'aBcD'.lower()

'abcd'

In [None]:
'aBcD'.upper()

'ABCD'

In [None]:
'abcd'.startswith('a')

True

In [None]:
'abcd'.endswith('b')

False

#### Join a list of strings with a delimiter to create a single string.

In [None]:
",".join(["a", "b", "c", "d"])

'a,b,c,d'
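The reverse of `join` is the `split` method, which breaks a string into a list of pieces. A short sketch:

```python
joined = ",".join(["a", "b", "c", "d"])
parts = joined.split(",")            # split on the same delimiter
print(parts)                         # ['a', 'b', 'c', 'd']

# with no argument, split() breaks on any run of whitespace
words = "cats  and   dogs".split()
print(words)                         # ['cats', 'and', 'dogs']
```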

### String templating

#### Simple `%s` strings

Pass items to be rendered into the `%s` placeholders via the `%` operator.

In [None]:
"%s o'clock, %s o'clock" % ("one", "two")

"one o'clock, two o'clock"

#### f-strings

Use `{ }` constructs to pass values into an f-string.

In [None]:
sky_color = "blue"
grass_color = "green"
f"The sky is {sky_color}, the grass is {grass_color}."

'The sky is blue, the grass is green.'

See the [Python docs](https://docs.python.org/3/library/string.html) for more common operations and string formatting options.
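The `{ }` placeholders in an f-string can also carry a format specification after a colon. The values below are made up for illustration; this is just a sketch of a few commonly used specifications:

```python
price = 3.14159
count = 42

print(f"{price:.2f}")      # 3.14, two digits after the decimal point
print(f"{count:>6}")       # right-aligned in a field 6 characters wide
print(f"{count:06d}")      # 000042, zero-padded to 6 digits
print(f"{0.256:.1%}")      # 25.6%, formatted as a percentage
```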

# Data structures

## Lists

Lists are ordered, comma-delimited sequences denoted by square brackets. Lists may contain objects or values of any type. Here are some examples of lists:

```
numbers_list = [1, 2, 3, 4, 5]
floats_list = [1.0, 2.0, 3.0, 4.0, 5.0]
strings_list = ['cow', 'chicken', 'goat', 'horse']
mixed_list = [3, 1.0, 'cow', 2, 'chicken', 3.0]
```

Lists can contain the same item multiple times:

```
repetitive_list = [1, 2, 1, 1, 2, 3, 2, 3, 1, 3, 3]
```



### Sorting lists

Lists can be sorted and reverse sorted.

In [None]:
repetitive_list = [1, 2, 1, 1, 2, 3, 2, 3, 1, 3, 3]
sorted_list = sorted(repetitive_list)
reverse_sorted_list = sorted(repetitive_list, reverse=True)
print(sorted_list)
print(reverse_sorted_list)

[1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
[3, 3, 3, 3, 2, 2, 2, 1, 1, 1, 1]


> 💡 Note that the `sorted` function is a Python built-in function which takes `reverse` as an optional parameter. The `sorted` function returns a new, sorted copy of the list. The original list is intact as you can see here:

In [None]:
repetitive_list

[1, 2, 1, 1, 2, 3, 2, 3, 1, 3, 3]

**Sorting with the `sort` method**

There is another way to sort a list, which is to use the `sort` method call directly on the list. 

> 💡 Rather than returning a sorted copy of the list, `sort` will sort the original list in place:

In [None]:
repetitive_list.sort()
repetitive_list

[1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3]

The sort method also accepts a `reverse` parameter:

In [None]:
repetitive_list.sort(reverse=True)
repetitive_list

[3, 3, 3, 3, 2, 2, 2, 1, 1, 1, 1]

---

### Append to a list


You can append individual items to a list with the append method:

In [None]:
mylist = [1,2,3]
print(mylist)
mylist.append(4)
print(mylist)
mylist.append(5)
print(mylist)

[1, 2, 3]
[1, 2, 3, 4]
[1, 2, 3, 4, 5]


### "Adding" (i.e. concatenating) lists


Two or more lists can be concatenated together with the `+` operator. The result will be returned as a single list:

In [None]:
[1, 2, 3] + [4, 5, 6]

[1, 2, 3, 4, 5, 6]

### Extending a list with another list

An existing list can be extended with another list by using the `extend` method:

In [None]:
mylist = [1, 2, 3]
mylist.extend([4, 5, 6])
mylist

[1, 2, 3, 4, 5, 6]
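The difference between `append`, `extend`, and `+` is easy to mix up, so here is a side-by-side sketch (the single-letter names are just for illustration):

```python
a = [1, 2, 3]
a.append([4, 5])      # append adds its argument as ONE item
print(a)              # [1, 2, 3, [4, 5]]

b = [1, 2, 3]
b.extend([4, 5])      # extend adds each item of its argument
print(b)              # [1, 2, 3, 4, 5]

c = [1, 2, 3]
d = c + [4, 5]        # + builds a new list and leaves c unchanged
print(c, d)
```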

## Dictionaries

A simple dictionary that maps digits to their words:

In [None]:
digits = {
    1: 'one',
    2: 'two',
    3: 'three',
    4: 'four',
    5: 'five',
 }
digits

{1: 'one', 2: 'two', 3: 'three', 4: 'four', 5: 'five'}

In [None]:
digits = {}
digits[1] = 'one'
digits[2] = 'two'
digits[3] = 'three'
digits[4] = 'four'
digits[5] = 'five'

digits

{1: 'one', 2: 'two', 3: 'three', 4: 'four', 5: 'five'}

Another way to construct a dictionary is from the key-value tuple pairs:

In [None]:
dict([ (1, 'one'), (2, 'two'), (3, 'three'), (4, 'four'), (5, 'five') ])

{1: 'one', 2: 'two', 3: 'three', 4: 'four', 5: 'five'}

To de-reference an item, use the `[]` syntax:

In [None]:
digits[3]

'three'

Or use the `get` method:

In [None]:
digits.get(5)

'five'

### Assignment/updating dictionary entries

You can also use the `[key]` syntax to add new values to the dictionary:

In [None]:
digits[6] = 'six'
digits

{1: 'one', 2: 'two', 3: 'three', 4: 'four', 5: 'five', 6: 'six'}

And you can combine the dictionary with another dictionary using update:

In [None]:
digits.update({ 7: 'seven', 8: 'eight', 9: 'nine' })
digits

{1: 'one',
 2: 'two',
 3: 'three',
 4: 'four',
 5: 'five',
 6: 'six',
 7: 'seven',
 8: 'eight',
 9: 'nine'}

### Iterating dictionaries

The keys of the dictionary are given by the `keys` method:

In [None]:
digits.keys()

dict_keys([1, 2, 3, 4, 5, 6, 7, 8, 9])

Which we could then iterate:

In [None]:
for k in digits.keys():
    print(k, digits[k])

1 one
2 two
3 three
4 four
5 five
6 six
7 seven
8 eight
9 nine


The key-value pairs can be obtained with the `items` method. Note the parity between what you get from calling `items` and the format for creating a dictionary from key-value pairs above:

In [None]:
digits.items()

dict_items([(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four'), (5, 'five'), (6, 'six'), (7, 'seven'), (8, 'eight'), (9, 'nine')])

The most common way to iterate a dictionary is by iterating the items:

In [None]:
for k, v in digits.items():
    print(k, v)

1 one
2 two
3 three
4 four
5 five
6 six
7 seven
8 eight
9 nine


> 🐍 Note: In older versions of Python (before 3.7), the order of items in a dictionary was not guaranteed. In modern versions of Python, dictionaries preserve the order of insertion.

### Working with nested dictionaries

Data that is nested to deeper levels of a dictionary can be accessed with successive square-bracket dereferences:

In [None]:
translations = {
    "one": {
        "es": "uno",
        "de": "eins"
    },
    "two": {
        "es": "dos",
        "de": "zwei"
    },
    "three": {
        "es": "tres",
        "de": "drei"
    }
}

In [None]:
translations["one"]["es"]

'uno'

### Handling missing values

#### Catching the KeyError

If you try to directly access a missing key, Python will raise a `KeyError` that will need to be handled.

In [None]:
try:
    four = translations["four"]
except KeyError:
    print("There is no four")


There is no four


See the Exception Handling section below for more information about dealing with exceptions.

#### Using the get method

For safer access to potentially missing keys, a dictionary has a `get` method. By default, `get` returns `None` if the key is unavailable, but an alternate default value can be provided.

In [None]:
print(digits.get(10)) # No default. Returns None if missing

None


In [None]:
digits.get(10, "ten") # Default provided

'ten'

Nested lookups can be made safe by supplying an empty dictionary as the default, so that the second `get` always has a dictionary to work with:

In [None]:
four_es = translations.get("four", {}).get("es")
print(four_es)

None


# Control flow

## for-loops

A for-loop is a common construct for traversing the items of any iterable thing, such as a list.

### Basic for-loop

In [None]:
brands = ['wendys', 'burgerking', 'mcdonalds', 'tacobell', 'chipotle']

In [None]:
for b in brands:
    print(b)

wendys
burgerking
mcdonalds
tacobell
chipotle


### Enumerated for loop

A common need is to know the iteration count within the context of a loop. Python's `enumerate` is often used for this purpose:

In [None]:
for i, b in enumerate(brands):
    print(i, b)

0 wendys
1 burgerking
2 mcdonalds
3 tacobell
4 chipotle
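By default the count starts at 0. If a 1-based count reads better, `enumerate` accepts an optional `start` argument. A brief sketch, reusing a shortened version of the list above:

```python
brands = ['wendys', 'burgerking', 'mcdonalds']

for position, brand in enumerate(brands, start=1):
    print(position, brand)

numbered = list(enumerate(brands, start=1))
```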


## Conditionals

In [None]:
val_1 = 10
val_2 = 100

if val_1 < val_2:
    print("Val 1 is less")
else:
    print("Val 1 is more")

Val 1 is less


#### elif for alternative conditions

`elif` means "else if"

In [None]:
if val_1 == 1:
    print("first")
elif val_1 == 2:
    print("second")
else:
    print("at least third")

at least third


## Combining loops and conditionals

Combined with the `%` mod operator, the enumeration index can be handy for periodic logging in a large loop.

In [None]:
import string
letters = string.ascii_lowercase

for i, x in enumerate(letters):
    if i % 10 == 0:
        print(i)

0
10
20


# Functions

## Python's builtin functions


A full list of the builtin functions is available here: https://docs.python.org/3.10/library/functions.html

### The print function

`print` is used to print output. Print can:

 * print multiple things at a time
 * take a `sep` separator parameter
 * take an `end` end parameter

In [None]:
print('one')
print('one', 'two')
print('one', 'two', 'three', sep=',')
print('red', 'orange', 'yellow', end=';')
print('green', 'blue', 'violet')

one
one two
one,two,three
red orange yellow;green blue violet


### Functions for numbers

#### Absolute value

In [None]:
print(abs(10))
print(abs(-10))

10
10


#### Convert between types

In [None]:
print(float(23))
print(int(23.0))

23.0
23


#### Min and Max

In [None]:
print(min([10,20,30,40]))
print(max(50,60,70,80))

10
80


#### Some math related functions


In [None]:
print(round(2.4))

# round resolves ties by rounding to the nearest even number, so 2.5 rounds to 2, not 3. See the docs for details
print(round(2.5))
print(sum([2,3,4,5]))

2
2
14
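The behavior of `round` on values that sit exactly halfway between two integers is sometimes called "round half to even". A short sketch of the pattern, plus the optional second argument for decimal places:

```python
halves = [0.5, 1.5, 2.5, 3.5]
rounded = [round(value) for value in halves]
print(rounded)                 # ties go to the nearest even integer

print(round(3.14159, 2))       # a second argument sets the number of decimal places
```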


### Handy logic functions

#### Check the "truthiness" of something

In [None]:
print(1, bool(1))
print(0, bool(0))
print(1.0, bool(1.0))
print(0.0, bool(0.0))
print(23, bool(23))
print('foo', bool('foo'))
print('', bool(''))
print(None, bool(None))

1 True
0 False
1.0 True
0.0 False
23 True
foo True
 False
None False


### Functions for data structures

#### Convert between different serial structures

In [None]:
print(set([1,2,1,1,2,2,3,2,3,1,3]))
print(list(set([1,1,2,2,3,3])))
print(tuple([1,2,3]))

{1, 2, 3}
[1, 2, 3]
(1, 2, 3)


#### Enumerate a series


In [None]:
for i, val in enumerate(['red', 'orange', 'yellow']):
    print(i, val)

0 red
1 orange
2 yellow


#### Zip together two lists of items

In [None]:
list(zip(['apples', 'oranges', 'bananas'], ['red', 'orange', 'yellow']))

[('apples', 'red'), ('oranges', 'orange'), ('bananas', 'yellow')]
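Because `zip` produces pairs, it combines naturally with `dict` to build a dictionary from two parallel lists. A minimal sketch (the variable names are illustrative):

```python
fruits = ['apples', 'oranges', 'bananas']
colors = ['red', 'orange', 'yellow']

fruit_colors = dict(zip(fruits, colors))
print(fruit_colors)

# zip stops at the end of the shorter input
short_pairs = list(zip([1, 2, 3], ['a', 'b']))
print(short_pairs)             # [(1, 'a'), (2, 'b')]
```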

## Defining your own functions


A function is a named unit of functionality which:

 * Accepts parameters (a.k.a. arguments)
 * Executes some code, using the provided parameters
 * Returns some value


Functions are defined with the `def` keyword, and are delimited by a block of consistent whitespace indentation. It is strongly advised that you use the Python community standard of 4 spaces for your indentations.

A function definition for adding two numbers might look like this:

```
def add(val1, val2):
    total = val1 + val2  # named `total` to avoid shadowing the builtin `sum`
    return total
```

Make note of the following:

 * The function definition line starts with `def` and ends with `:`
 * The body of the function is indented 4 spaces
 * The function returns a value. If there is no `return` for your function, an implicit `None` will be returned as the value. 

Things to consider:

 * Name your functions clearly according to what they do
 * It is customary in Python to name functions using `snake_case`, not `camelCase` or `PascalCase`.
 * A function can accept both positional and named (keyword) arguments. Positional arguments must come first.
 * Something is always returned from a function, even if it is nothing. If you do not explicitly return a value, Python will return `None` for you.
 * It is possible to design a function that appears to return multiple values by simply returning a tuple of values.
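Several of the points above can be seen in one small sketch. The function names `min_max` and `say_hello` and the parameter `round_to` are hypothetical, chosen only for illustration:

```python
def min_max(values, round_to=None):
    """Return the smallest and largest of values as a (min, max) tuple.

    round_to is an optional keyword argument giving the number of decimal places.
    """
    lowest, highest = min(values), max(values)
    if round_to is not None:
        lowest, highest = round(lowest, round_to), round(highest, round_to)
    return lowest, highest   # this is really a single tuple


readings = [3.14159, 2.71828, 1.41421]
low, high = min_max(readings)                # positional argument only
print(low, high)
low, high = min_max(readings, round_to=2)    # positional, then keyword
print(low, high)


def say_hello(name):
    print("Hello,", name)    # no return statement


print(say_hello("world") is None)   # the implicit return value is None
```

Unpacking with `low, high = ...` is what makes the tuple feel like multiple return values.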

## Example functions


### A specialized printing function


Given a dictionary of brands and ratings, where each brand name is the key to a list of ratings values, print a table of min and max ratings by brand.

In [None]:
def brand_report(data):
    print('Brand', 'min', 'max', sep='\t')
    print('-----', '---', '---', sep='\t')
    for brand, ratings in data.items():
        print(brand, min(ratings), max(ratings), sep='\t')


brand_data = {
    'Nike': [3.0, 2.5, 4.0, 1.0],
    'Adidas': [2.0, 3.5, 4.0, 1.5],
    'Reebok': [1.0, 3.0, 3.5, 2.0]
}
brand_report(brand_data)

Brand	min	max
-----	---	---
Nike	1.0	4.0
Adidas	1.5	4.0
Reebok	1.0	3.5



# Standard library modules

## datetime and time

In [None]:
# import the datetime module
import datetime

# call the `now` class method on the datetime class
datetime.datetime.now()

datetime.datetime(2020, 5, 26, 1, 57, 27, 115614)

In [None]:
# import the date class from the datetime module and call `today`
from datetime import date
date.today()

datetime.date(2020, 5, 26)

In [None]:
# clock some duration of time
import time
start_time = time.time()
time.sleep(3) # do nothing for 3 seconds
end_time = time.time()
print('duration:', end_time - start_time)

duration: 3.0032832622528076


## statistics

In [None]:
import random
random.seed(123)
data = random.choices(range(1000), k=100)
data # is 100 random numbers from 0 to 999

In [None]:
import statistics
print('mean', statistics.mean(data))
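print('median', statistics.median(data))
print('median-low', statistics.median_low(data))
print('median-high', statistics.median_high(data))

# using a subset here; before Python 3.8, mode raised an error if there was no unique mode
print('mode (of 1st 20)', statistics.mode(data[:20]))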

print('stdev', statistics.stdev(data))
print('variance', statistics.variance(data))

mean 435.73
median 405.5
median-low 404
median-high 407
mode (of 1st 20) 87
stdev 287.37600242135176
variance 82584.96676767677


## collections

The collections module has a number of useful utilities for working with collections of things. One especially handy tool is `Counter`, which tallies how many times each item occurs.

In [None]:
import random
data = random.choices(['taco bell', 'wendys', 'burger king', 'mcdonalds'], k=10)
data # is a random list of fast food joints

['mcdonalds',
 'wendys',
 'wendys',
 'burger king',
 'mcdonalds',
 'mcdonalds',
 'wendys',
 'burger king',
 'burger king',
 'wendys']

In [None]:
from collections import Counter
counter = Counter(data)
counter

Counter({'burger king': 3, 'mcdonalds': 3, 'wendys': 4})

What if we want to count as we go through the data?

In [None]:
c2 = Counter()
for item in data:
    c2.update([item])
c2

Counter({'burger king': 3, 'mcdonalds': 3, 'wendys': 4})
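Two other `Counter` features come up often: `most_common` for a ranked list, and the fact that looking up a missing item gives 0 rather than an error. A small sketch with a fixed list so the result is repeatable:

```python
from collections import Counter

orders = ['wendys', 'mcdonalds', 'wendys', 'burger king', 'wendys', 'mcdonalds']
counts = Counter(orders)

top_two = counts.most_common(2)
print(top_two)                 # [('wendys', 3), ('mcdonalds', 2)]
print(counts['taco bell'])     # 0: a missing item counts as zero instead of raising KeyError
```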

## json

JSON is a web standard data format that comes from JavaScript. If you are into state diagrams of language syntax, you can peruse the official JSON documentation [here](https://www.json.org). But mainly, suffice it to say that JSON pretty much works like Python dictionaries.

**However**, when using JSON as an interchange format, the data comes into and goes out of Python code as a string. We might refer to this string data format as a JSON "object", although technically it is really just a string that happens to be in JSON format. (There is, technically, not really such a thing as a JSON **object** in Python.)

What this means is that we need some kind of codec to encode and decode JSON data. Specifically, we need to **parse** or **decode** the JSON string to produce a Python dictionary, and conversely we need to **encode** Python data structures into JSON strings in order to save them as JSON. The `json` module handles this work for you.

### Encoding JSON

Use `json.dumps` to "dump" a dictionary structure to a JSON string.

In [None]:
import json

media_urls = {
    'Facebook': 'https://www.facebook.com/',
    'Twitter': 'https://twitter.com/home',
    'Instagram': 'https://www.instagram.com/',
    'TikTok': 'https://www.tiktok.com/'
}

media_urls_as_json = json.dumps(media_urls)
media_urls_as_json # Note: from Python's perspective this is a string!

'{"Facebook": "https://www.facebook.com/", "Twitter": "https://twitter.com/home", "Instagram": "https://www.instagram.com/", "TikTok": "https://www.tiktok.com/"}'

`dumps` also takes an indent parameter for prettier printing. Typically, however, you would only use this for display purposes, not for general data manipulation.

In [None]:
print(json.dumps(media_urls, indent=4))

{
    "Facebook": "https://www.facebook.com/",
    "Twitter": "https://twitter.com/home",
    "Instagram": "https://www.instagram.com/",
    "TikTok": "https://www.tiktok.com/"
}


### Decoding JSON

Use `json.loads` to "load" a JSON string into a dictionary structure.

In [None]:
data = json.loads(media_urls_as_json)
data # this is a Python dictionary

{'Facebook': 'https://www.facebook.com/',
 'Instagram': 'https://www.instagram.com/',
 'TikTok': 'https://www.tiktok.com/',
 'Twitter': 'https://twitter.com/home'}
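A round trip through JSON does not always give back exactly what went in, because JSON has fewer types than Python. The sketch below uses a made-up dictionary to show two common surprises: non-string keys become strings, and tuples become lists.

```python
import json

original = {1: 'one', 'pair': (1, 2), 'flag': True, 'nothing': None}

encoded = json.dumps(original)
print(encoded)     # {"1": "one", "pair": [1, 2], "flag": true, "nothing": null}

decoded = json.loads(encoded)
print(decoded)     # {'1': 'one', 'pair': [1, 2], 'flag': True, 'nothing': None}
print(decoded == original)   # False: the int key became a string, the tuple became a list
```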

### JSON file i/o

A JSON string can be written out to a file, producing a file in the standard JSON data format. (Here `brand_hq_json` stands for any JSON string, such as the result of `json.dumps`.)

```
with open('brandhq.json', 'w') as outfile:
    outfile.write(brand_hq_json) # just like writing any other string to a file
```

It is not necessary to encode the data to a json string before writing it out. The json module provides tools for direct encoding/parsing to and from a file. Starting again with our data dictionary instead of the json string:

```
with open('brandhq.json', 'w') as outfile:
    json.dump(brand_hq, outfile) # the data comes first, then the file. Note the method is dump, not dumps (which stands for dump-string)
```

We can also go the other way. Given a json string, we can parse it into a dictionary:

In [None]:
data = json.loads(brand_hq_json)
data

Note the absence of quotes around this data. This is a dictionary, not a string:

In [None]:
data['Nike']

There is also a `load` method for working directly with a file:

```
with open('brandhq.json') as infile:
    data = json.load(infile)
```
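Putting `dump` and `load` together, here is a self-contained sketch of a full file round trip. The contents of `brand_hq` are placeholder values invented for this example, and a temporary directory is used so that no real files are touched:

```python
import json
import tempfile
from pathlib import Path

# hypothetical data; the values are placeholders, not real headquarters
brand_hq = {'BrandA': 'City A', 'BrandB': 'City B'}

with tempfile.TemporaryDirectory() as tmpdir:
    filepath = Path(tmpdir) / 'brandhq.json'

    with open(filepath, 'w') as outfile:
        json.dump(brand_hq, outfile)      # the object comes first, then the file

    with open(filepath) as infile:
        loaded = json.load(infile)

print(loaded == brand_hq)
```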


Beyond the builtin functions discussed above, the Python standard library has a number of modules available with additional functionality.

If you want to use code from a module, you will need to import it.

Some modules that will be useful for you include:

 * [datetime](https://docs.python.org/3.8/library/datetime.html) and [time](https://docs.python.org/3/library/time.html). Basic date and time types

 * [statistics](https://docs.python.org/3/library/statistics.html). Mathematical statistics functions.

 * [collections](https://docs.python.org/3.8/library/collections.html)

 * [pathlib](https://docs.python.org/3/library/pathlib.html).
Object-oriented filesystem paths.

 * [json](https://docs.python.org/3/library/json.html). JSON encoder and decoder.

### JSON-L files

JSON-L is a non-standard file format in which each line is a JSON data object. This differs from standard JSON which is a single object in a file.

You will sometimes run across data files with a `.json` extension, which are actually json-l files.

To read a JSON-L file, iterate the lines of the file and load each one as a JSON string.

```
import json

with open("datafile.jsonl") as f:
    for line in f:
        record = json.loads(line)
        # do something with the data record
```
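Writing JSON-L is the mirror image: encode each record with `json.dumps` and write it on its own line. The sketch below uses an in-memory `io.StringIO` buffer as a stand-in for a real file, with made-up records:

```python
import io
import json

records = [{'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]

# write one JSON document per line
buffer = io.StringIO()
for record in records:
    buffer.write(json.dumps(record) + '\n')

# read them back, skipping any blank lines
buffer.seek(0)
loaded = [json.loads(line) for line in buffer if line.strip()]
print(loaded == records)
```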

# External packages


What do you do when the standard library doesn't have what you need? There's a package for that. Python's central repository of 3rd party library packages is called PyPI (the Python Package Index). It hosts hundreds of thousands of packages of various utility.

Before you write code that seems like it should already be written, Google: "python <whatever>". Chances are someone has already tackled the same problem you are having now.

## Packages. What are they good for?


To mention a few things:

 * plotting / charting
 * machine learning
 * natural language processing
 * web application frameworks
 * working with various services, web APIs, etc.
 * template languages
 * data parsers or various codecs
 * database drivers
 * better handling of x, where x might be:
   - date/time processing
   - statistics / scientific calculation
   - web resource fetching

The list goes on. This is an applied course, so we will use a **lot** of 3rd party libraries! You should get used to reading library documentation, and even sometimes looking at the code!

## Installing packages in Colab

`pip` is the go-to installer for Python packages. In your local environment, you would simply run `pip install requests`, e.g. to install the requests library.

To install packages into the Colab runtime environment, we need to call out to the shell to execute pip. We do this with a bang:

```
!pip install requests
```

Note, however, that Colab has a lot of packages already installed. E.g.:

In [None]:
!pip install requests



Note the _Requirement already satisfied_ since this is a popular library that is pre-installed on Colab.

The easiest way to see all the installed packages is to call:

```
pip freeze
```

In [None]:
# help can show you all of the modules, but it is a bit verbose and slow
# help('modules')

# instead you can call out to the shell to get the "pip freeze" which shows packages and their versions
!pip freeze

# File Input/Output

## Basic file i/o in Python

**Resources and context blocks**

When we talk about resources in programming, we are talking about external things that we "connect" with: database connections, web connections and other networked resources, and operating system and filesystem resources like sockets and files.


---
### ⚠️ **Pro tip!** close your resources!

Always be sure to close any resources you open to avoid weird hangups and data corruption. The best way to do this is to open resources in a `with` block.

---

For the most part, file i/o is simple:

**open a file**:

```
f = open('/path/to/my/file')
```

**close the file**:

```
f.close()
```

But even better is to do your file activity within a managed context block. In Python, we do this using the `with` statement:

```
with open(my_filepath) as f:
    pass # do something with f here
# <<-- Python will close the file for you here
```
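To see the automatic close in action, the following sketch writes to and reads from a throwaway temporary file (created with `tempfile` so that no real data is touched) and checks the file object's `closed` attribute:

```python
import os
import tempfile

# create a throwaway file path so the sketch does not touch real data
fd, my_filepath = tempfile.mkstemp(suffix='.txt')
os.close(fd)

with open(my_filepath, 'w') as f:
    f.write('hello\n')
    print(f.closed)    # False: still open inside the block

print(f.closed)        # True: closed automatically on leaving the block

with open(my_filepath) as f:
    contents = f.read()
print(repr(contents))

os.remove(my_filepath)  # clean up the temporary file
```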


## File i/o in Colab

In the Colab runtime environment, we do not have direct access to the filesystem on your own computer, and files saved in the runtime itself are temporary. Instead, you will need to mount your personal Google Drive and access files there.

Here is an example of what it looks like to mount your Drive and list the contents (just showing the first 3 results here):

In [None]:
import os
from google.colab import drive
drive.mount('/content/drive')
os.listdir('drive/My Drive')[:3]

Mounted at /content/drive


['1Agenda-setting Journal', '1Proposal ', 'Conference Submissions']

## Working with pathlib

pathlib has some nice features for working with file paths, including `glob` for listing files that match a pattern, and the slash syntax for constructing a path.

```
from pathlib import Path
mydrive = Path('drive/My Drive')
datadir = mydrive / 'data'
csv_files = list(datadir.glob('*.csv'))
```
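The same ideas can be tried without Google Drive. The sketch below builds a small temporary directory with made-up file names, then uses the slash syntax, `glob`, and a few handy `Path` attributes:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    datadir = Path(tmpdir) / 'data'
    datadir.mkdir()
    (datadir / 'a.csv').write_text('x,y\n1,2\n')
    (datadir / 'b.csv').write_text('x,y\n3,4\n')
    (datadir / 'notes.txt').write_text('not a csv\n')

    csv_files = sorted(datadir.glob('*.csv'))   # glob order is not guaranteed, so sort
    names = [p.name for p in csv_files]         # file name with extension
    stems = [p.stem for p in csv_files]         # file name without extension
    suffixes = {p.suffix for p in csv_files}    # just the extension

print(names)
print(stems)
print(suffixes)
```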

## Processing a csv file

```
from csv import DictReader

with open('drive/My Drive/mydata.csv') as f:
    reader = DictReader(f)
    for i, row in enumerate(reader):
        # row is a dictionary of key-value pairs
        # with the keys corresponding to the CSV headers
        headers = row.keys()
```
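For a version that runs anywhere, the sketch below feeds `DictReader` a small in-memory CSV (the column names and values are made up) instead of a file on Drive. Note that every value comes back as a string, so numbers need converting:

```python
import io
from csv import DictReader

# a small in-memory stand-in for a CSV file; the columns are made up
csv_text = "brand,rating\nNike,3.0\nAdidas,3.5\n"

with io.StringIO(csv_text) as f:
    reader = DictReader(f)
    rows = list(reader)

print(reader.fieldnames)            # the header names, in order
print(rows[0]['brand'])             # look up a value by its column header
print(float(rows[1]['rating']))     # values are read as strings, so convert as needed
```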

# Regular expressions

https://docs.python.org/3/library/re.html

We will not go into a lot of detail about regular expressions in this course, but you need to know they exist as a potential tool in your toolbox. We will look at the concept briefly here to see how useful they can be.

Regular expressions (a.k.a. regexes) are a type of pattern matching language used to find patterns in strings. The details of this language are a bit specific to the environment you are in, but the concepts are general, and extend not only to programming languages, but also to operating systems and shells. Here are a few example matching patterns from Python's flavor of regular expressions:

`.*` Match any character (.) for any number of characters (*)

`[0-9]+` Match 1 or more (+) instances of a digit

`\b[a-zA-Z_]+\w*` Match a valid Python variable name

`\(?\d{3}\)? ?[.\- ]?\d{3}[.\- ]\d{4}` A *rough* pass at matching some common phone number formats


Let's see a couple of these in action.

In [None]:
import re
p = re.compile(r'\b[a-zA-Z_]+\w*')
code = """
var1 = 123
var2 = 456
sum_total = var1 + var2"""
p.findall(code)

['var1', 'var2', 'sum_total', 'var1', 'var2']

In [None]:
p = re.compile(r'\(?\d{3}\)? ?[.\- ]?\d{3}[.\- ]\d{4}')
p.findall('555 867 5309 | (555)867-5309 | (555) 867-5309 | 555.867.5309')

['555 867 5309', '(555)867-5309', '(555) 867-5309', '555.867.5309']

As you can see, regular expressions are a bit quirky. The need to know when to escape certain characters and additional quirks, like the greediness setting of a given regex engine, adds to confusion. More than a lot of things in coding, regular expressions take a fair amount of experimentation, patience, and practice before you start getting them right.

In general, a few things to keep in mind:

 * Use "raw" strings to define your regular expressions (e.g. r'foo' not 'foo')
 * Everything inside [ ] is implicitly "OR"ed. Although some operators have a special function in this context. E.g. the - is a range operator that you will need to escape if matching explicitly.
 * `+` means 1 or more. `*` means zero or more. `{ }` can be used to indicate an exact number or range of repeated matches.
 * Python's regex engine is "greedy" by default. If you have a runaway match, you may need to temper the greediness with a `?` mark.
 * Start simple and build up from there. E.g., `\d{3}-\d{4}` is a super simple 7-digit phone number match. Start there and work your way up to longer / more varied formats.
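The "start simple and build up" advice can be sketched as follows. The sample text is made up, and the `(?: )` syntax in the second step is a non-capturing group, which is not covered above but is a standard part of Python's `re` syntax:

```python
import re

text = 'Call 867-5309 or 555-867-5309.'

# step 1: a bare 7-digit number
step1 = re.findall(r'\d{3}-\d{4}', text)
print(step1)     # ['867-5309', '867-5309']

# step 2: allow an optional area code in front
step2 = re.findall(r'(?:\d{3}-)?\d{3}-\d{4}', text)
print(step2)     # ['867-5309', '555-867-5309']

# inside [ ], a literal - is escaped; the . needs no escape there
step3 = re.findall(r'\d{3}[.\-]\d{4}', '867.5309 867-5309')
print(step3)     # ['867.5309', '867-5309']
```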


#### Greediness in regular expressions

The greediness of a regular expression refers to how it decides when to stop matching. Python's regular expression quantifiers are greedy by default, meaning that for any given match, they will match as much of the text as possible. We can usually rein in the greediness, if needed, with the `?` operator.

For example, say we are trying to extract the HTML tags from an HTML document. A naive approach might use the pattern `<.*>`, but the greediness of the * will overmatch:

```
>>> markup = '<html><body></body></html>'
>>> p = re.compile(r'<.*>')
>>> p.findall(markup)
['<html><body></body></html>']
```

Instead, we need to use the non-greedy version of .*, which is `.*?`:

```
>>> p = re.compile(r'<.*?>')
>>> p.findall(markup)
['<html>', '<body>', '</body>', '</html>']
```

#### Capture groups

Capture groups enable us to extract sub-expressions from a match. These are most often read from the match objects returned by the `match`, `search`, and `finditer` functions, rather than from `findall`, which returns only the captured text when a pattern contains groups.

As an example, consider the HTML tag matcher above. Say we want to extract and list all the unique tag types without the HTML <,>,/ cruft. The above markup should give us ['html', 'body'] as the unique tag types. To do this, we define a capture group using ( ) as well as explicitly match the / closing operator to keep it out of our tag names:

In [None]:
markup = '<html><body></body></html>'
p = re.compile(r'</?(.*?)>')
{m.group(1) for m in p.finditer(markup)}

{'body', 'html'}

`group(1)` here refers to the first matching group, which means you could have multiple parenthetically denoted matching groups. `group(0)` is a special case which means the match of the whole expression.
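
To make the group numbering concrete, here is a minimal sketch with two capture groups. The pattern and sample string are made up for illustration.

```python
import re

# Two capture groups: the first 3 digits and the last 4 digits
p = re.compile(r'(\d{3})-(\d{4})')
m = p.search('Jenny: 867-5309')

whole = m.group(0)   # the match of the whole expression: '867-5309'
first = m.group(1)   # first parenthesized group: '867'
second = m.group(2)  # second parenthesized group: '5309'
both = m.groups()    # all groups as a tuple: ('867', '5309')
print(whole, first, second, both)
```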

# Exception handling

## The Exception block

The construct for an Exception block in Python is this:

```
try:
    # some code that might throw an exception here
except:
    # code to handle the exception
finally:
    # code that is ultimately run no matter what
```

Here, we will focus on try-except -- finally is used for final cleanup of resources that needs to happen whether or not an exception was thrown. You won't need `finally` in this course.
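
As a runnable sketch of the construct above, the hypothetical `safe_divide` function below records which parts of the block actually run. The function name and the `steps` list are illustrative only.

```python
def safe_divide(a, b):
    """Divide a by b, recording which parts of the block ran."""
    steps = []
    try:
        steps.append('try')
        result = a / b
    except ZeroDivisionError:
        # Only runs when the division fails
        steps.append('except')
        result = None
    finally:
        # Runs whether or not an exception was thrown
        steps.append('finally')
    return result, steps

print(safe_divide(6, 3))  # (2.0, ['try', 'finally'])
print(safe_divide(6, 0))  # (None, ['try', 'except', 'finally'])
```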


### What does a try block do?

The general flow of a try block is this:

 * **try** to do something
 * **if** that thing fails (i.e. throws an Exception), then execute the **except** portion of the block



**Exception handling** is a kind of control flow that provides a mechanism for defining alternative flow paths under unpredictable conditions.

### What kinds of things are unpredictable?

**External resources:**

 * System / OS specific resources (like the colab example above)
 * Databases / database connections
 * Web sites and web APIs / internet connections
 * Web API authentication and rate-limit conditions

**Internal things (in your code or data):**

 * Missing or corrupt data
   - possible divide by zero conditions
   - possible None value
   - missing dictionary keys

### Exception handling for external resources

For uncertain conditions with external resources, we almost always handle this with exception-based control flow, which is to say with a try-except block.

The goal here is to **define an alternative workflow which will make your code more robust and/or more graceful in exceptional conditions.**

This can mean different things depending on the resource. What are some ways you might handle the following?

 * A web API throws an authentication error
 * A web API throws a rate limit error
 * A web request throws a connectivity error
 * An import statement throws a module not found error
 * A file open statement throws a file not found error
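
As one possible answer for the import case, a common pattern is to fall back to an alternative module. This is only a sketch: `fast_json_hypothetical` is a made-up module name standing in for some optional dependency, and the fallback is the standard library `json` module.

```python
try:
    # Hypothetical optional dependency; this name is made up for illustration
    import fast_json_hypothetical as json_lib
except ModuleNotFoundError:
    # Alternative workflow: fall back to the standard library
    import json as json_lib

data = json_lib.loads('{"status": "ok"}')
print(data)
```

The rest of the program uses `json_lib` without needing to know which import succeeded.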

---
### Vocabulary: raising and throwing

You will generally hear of exceptions being "raised" or "thrown". In Python, the syntax itself contains a `raise` statement, which is part of the control flow of exception handling.

Either **raise** or **throw** is an acceptable term in general. For our purposes, both simply mean "an error occurs".
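
A minimal sketch of the `raise` statement in use. The `parse_age` function and its validation rule are hypothetical, chosen only to show an error being raised in one place and handled in another.

```python
def parse_age(text):
    age = int(text)  # int() itself raises ValueError for non-numeric text
    if age < 0:
        # Raise our own exception when the value makes no sense
        raise ValueError(f'age must be non-negative, got {age}')
    return age

try:
    parse_age('-5')
except ValueError as e:
    message = str(e)
    print('Invalid input:', message)
```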

---

### Example: handling unknown web resources

You have been handed a list of URLs that need to be fetched. Seems easy enough:

In [None]:
import requests

In [None]:
urls = [
    'http://google.com',
    'http://microsoft.com'
]

for url in urls:
    r = requests.get(url)
    print(r.text[:20])

<!doctype html><html



<!DOCTYPE html


But you really don't know where the URLs came from. There could be some bad data here. Perhaps just simply expired websites:

In [None]:
urls = [
    'http://google.com',
    'http://yyzzaabb99554.com',
    'http://microsoft.com'
]

for url in urls:
    r = requests.get(url)
    print(r.text[:20])

<!doctype html><html


ConnectionError: ignored

**How will you handle this?**

 * Inspect the error. Look through the whole stack trace for clues
 * Avoid hitting the Stack Overflow convenience search button for now. First, try to understand what is going on.
 * This stack trace looks daunting. It is giving you a lot of information (that you probably don't need) about what is going on under the hood. But even at a glance, the exception sequence stands out:
   - NewConnectionError
   - MaxRetryError
   - ConnectionError

Although we have not actually defined our request to execute retries, a lot of that underlying code is still part of the logic of fetching urls. We can ignore those errors for our purposes. We will handle this by catching the ConnectionError, which seems fairly intuitive:

In [None]:
urls = [
    'http://google.com',
    'http://yyzzaabb99554.com',
    'http://microsoft.com'
]

for url in urls:
    try:
        r = requests.get(url)
        print(r.text[:20])
    except requests.ConnectionError:
        print('Skipping URL:', url)

<!doctype html><html
Skipping URL: http://yyzzaabb99554.com



<!DOCTYPE html


Some questions you should be able to answer at this point:

 * Why is the `print(r.text[:20])` **inside** the try block? We are not catching a print-related exception, so why does this need to be here, and not outside the try block?
 * What are some other things we might have done other than simply print 'Skipping'?
 * Is this the end-all be-all handler for URL fetching? What else could go wrong?
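
As one possible answer to the second question, the handler could record failed URLs for a later retry instead of only printing. The sketch below is deliberately network-free so it runs anywhere: `fake_fetch` is a hypothetical stand-in for `requests.get`, and it raises Python's built-in `ConnectionError` rather than `requests.ConnectionError`.

```python
def fake_fetch(url):
    """Hypothetical stand-in for requests.get that fails for one known-bad host."""
    if 'yyzzaabb99554' in url:
        raise ConnectionError(f'cannot reach {url}')
    return f'<html>content of {url}</html>'

urls = [
    'http://google.com',
    'http://yyzzaabb99554.com',
    'http://microsoft.com'
]

pages = {}   # url -> fetched text
failed = []  # urls to retry or report later

for url in urls:
    try:
        pages[url] = fake_fetch(url)
    except ConnectionError:
        failed.append(url)

print('Fetched:', len(pages), 'Failed:', failed)
```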

### Why not just catch everything?

This is certainly possible:

```
try:
    r = requests.get(url)
except:
    print('Skipping:', url)
```

Which is effectively the same thing as:

```
try:
    r = requests.get(url)
except Exception:
    print('Skipping:', url)
```

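because `Exception` is the base class of nearly every exception you will handle, the one Exception to rule them all. (Strictly speaking, a bare `except:` goes slightly further and also catches `KeyboardInterrupt` and `SystemExit`, which derive from `BaseException` rather than `Exception`.)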

**However**, you will generally want to be as specific as possible (or at least as is practical) when handling exceptions. Exceptions form a hierarchy, and you can choose to handle a more general exception rather than a more specific one, but this can cause problems down the line. Consider this:

 * You discover a specific exception, but decide to handle a broader exception to save yourself from dealing with this again.
 * Sometime in the distant future, your code is failing. You are sure it is not due to said specific exception, but you cannot see why. The reason is that you have buried your exception and it is being handled in an unexpected way. E.g. consider this exception hierarchy:
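
One way to inspect such a hierarchy for yourself is the `__mro__` attribute of an exception class. The sketch below uses the built-in `FileNotFoundError`, chosen here purely for illustration.

```python
# Walk up the class hierarchy of a specific built-in exception
hierarchy = [cls.__name__ for cls in FileNotFoundError.__mro__]
print(hierarchy)
# ['FileNotFoundError', 'OSError', 'Exception', 'BaseException', 'object']

# A handler written for a broader class also catches the specific one
caught_by_oserror = issubclass(FileNotFoundError, OSError)
caught_by_exception = issubclass(FileNotFoundError, Exception)
print(caught_by_oserror, caught_by_exception)  # True True
```

So an `except OSError:` or `except Exception:` clause will silently absorb a `FileNotFoundError` along with everything else beneath it.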




### Burying the exception. Don't do this

Here is an extreme form of burying exceptions. Don't do this!

```
try:
    r = requests.get(url)
except:
    pass
```

🦁🦁🦁 and 🐯🐯🐯 and 🐻🐻🐻 oh my!

This is unmaintainable code that will cause you many headaches. When handling your exceptions:

 * Be as specific as is reasonable
 * Be sure to provide the alternate logic. Don't just bury an exception to squash it.

**How to handle this?**

It depends on what you want to do and on your overall program goals. One option might be to create the file if it doesn't exist:
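
A minimal sketch of that option, assuming the exception in question is a `FileNotFoundError`. The file name `notes.txt` is hypothetical, and a temporary directory is used so the file is guaranteed to be missing on the first attempt.

```python
import os
import tempfile

# Hypothetical path inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'notes.txt')

try:
    with open(path) as f:
        contents = f.read()
except FileNotFoundError:
    # Alternate logic: create an empty file and carry on with empty contents
    with open(path, 'w') as f:
        pass
    contents = ''

print(repr(contents), os.path.exists(path))
```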

### When to use exceptions vs other logic

It is not always clear. Generally speaking, you want to reserve exceptions for resource handling and use more standard logic for other things. In Python, exceptions are considered to be more "lightweight" than in other languages, and so you tend to see them used more often. On some level it comes down to coding style.

Here are some ways you could use exceptions, but might be better off with other approaches:

In [None]:
items = [ ('grass', 'green'), ('sky', 'blue'), ('money', 'green'), ('ocean', 'blue')]
items_by_color = {}

for item, color in items:
    try:
        items_by_color[color].append(item)
    except KeyError:
        items_by_color[color] = [item]
items_by_color

{'blue': ['sky', 'ocean'], 'green': ['grass', 'money']}

There is nothing wrong with that code, and there are certainly times where KeyError becomes necessary to handle. Although in this case, it is probably a bit more idiomatic to do something like we have already seen:

In [None]:
items_by_color = {}
for item, color in items:
    if color not in items_by_color:
        items_by_color[color] = []
    items_by_color[color].append(item)
items_by_color

{'blue': ['sky', 'ocean'], 'green': ['grass', 'money']}
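
Another option, sketched here as an alternative rather than as the approach this notebook relies on, is `collections.defaultdict` from the standard library, which creates the empty list for you the first time a key is used:

```python
from collections import defaultdict

items = [('grass', 'green'), ('sky', 'blue'), ('money', 'green'), ('ocean', 'blue')]

# Missing keys are initialized with list(), i.e. an empty list
items_by_color = defaultdict(list)
for item, color in items:
    items_by_color[color].append(item)

grouped = dict(items_by_color)
print(grouped)  # {'green': ['grass', 'money'], 'blue': ['sky', 'ocean']}
```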

### Exceptions you should never handle

 * NameError
 * SyntaxError
 * IndentationError

... and the like. Anything that screams **hey this is just bad code and will never work no matter what**.


In fact, Python won't even let you handle a SyntaxError or IndentationError in the code that contains it, as these are spotted and thrown when the interpreter compiles the file, before any of it runs.

But NameError falls generally into the category of things that fail at runtime, but will never be correct under any conditions. Don't do this:

In [None]:
# don't do this
try:
    for item in nonexistingthing:
        print(item)
except NameError:
    print('Everything is fine.')

Everything is fine.


There are some pretty advanced use cases where I could imagine you would do such a thing, but you will not likely run into them. In short: **don't try to "fix" things with exception handling** ... rather think of exception handling as a form of control flow and a way to choose alternate paths, based on current state and conditions.

This example also underscores why we specify the most practically specific Exception class. Consider the following code:

In [None]:
# don't do this
try:
    result = requests.get('http://some-non-existing-api').json()
    for item in my_result:
        print(item)
except Exception: # This is all exceptions possible!!!
    print('Nothing to see here.')

Nothing to see here.


What is the real error here? There are at least 3, but you wouldn't know it by using the catch-all Exception class!

### Some advanced usage


### Naming the exception for further handling

Sometimes you want to do something with the error you receive. You can give the exception a variable for further handling in the except block:

In [None]:
try:
    open('some-non-existing-file')
except FileNotFoundError as e: # e is an arbitrary variable name
    # The actual available properties will depend on the exception class
    # You can print(dir(e)) to see available properties
    print('NOT FOUND:', e.filename)

NOT FOUND: some-non-existing-file


### Handling multiple exceptions specifically

Sometimes you might want to group the handling of multiple exceptions. In this case, you can catch the multiple exceptions together in the except clause:

In [None]:
from json.decoder import JSONDecodeError

urls = [ 'https://google.com', 'http://u-cant-touch-this-url.com']

for url in urls:
    try:
        r = requests.get(url).json()
    except (JSONDecodeError, requests.ConnectionError):
        print('Bad URL:', url)

Bad URL: https://google.com
Bad URL: http://u-cant-touch-this-url.com

If the exceptions need different handling, give each one its own `except` clause instead:


In [None]:
from json.decoder import JSONDecodeError

urls = [ 'https://google.com', 'http://u-cant-touch-this-url.com']

for url in urls:
    try:
        r = requests.get(url).json()
    except JSONDecodeError:
        pass
    except requests.ConnectionError:
        print('Bad URL:', url)

Bad URL: http://u-cant-touch-this-url.com
