# Agenda

1. Recap + Q&A + Exercise
2. Functions
    - What are functions?
    - How do we define functions?
    - Arguments and parameters
    - A little about local vs. global variables
3. Modules and Packages
    - Using the `import` statement to use a module
    - Python standard library and its modules
    - PyPI and packages you can download from the Internet
    - `requests` and consuming APIs with JSON using Python

# Recap from yesterday

1. Dictionaries
    - Key-value stores: Every key has a value, and every value has a key
    - Keys are unique, but values aren't
    - Keys must be immutable, but values can be anything
    - We define a dict with `{}` -- empty braces are an empty dict.
    - You can define a dict with keys and values by putting them in the braces: `{'a':10, 'b':[10, 20, 30], 'c':12.34}`
    - You can retrieve from a dict using `[]` and the key you want: `d['a']`
    - You can search in a dict's keys with the `in` operator
    - You can assign to a dict via `=`, as in `d['a'] = 1234`
        - If the key already exists, then the value is updated
        - If the key doesn't yet exist, then the key-value pair is created
    - Three paradigms for dict use
        1. Define it at the top of the program, and use it as a read-only database. Examples: Months, countries, locations.
        2. Define the dict with keys and initial values. We don't change the keys (adding or removing them), but we do update the values. This is useful for counters of various sorts.
        3. Define an empty dict, and over time, add both keys and values. We saw the "rainfall" program, which used this.
    - Iterating over dicts
        - If you iterate over a dict, you get the keys
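        - You can (if you want) iterate over the result of invoking `dict.keys` or `dict.values`. The first is almost always unnecessary, because iterating over the dict itself already gives you the keys. The second can be useful when you only care about the values.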
        - You can also use `dict.items`, a method that returns a 2-element tuple with the key and value with each iteration. If you then iterate using unpacking, with `for key, value in d.items():`, that lets you iterate over keys and values, accessing them with variables, in a nice and compact way.
2. Files
    - To work with a file, you need to first open it by stating the filename. We do that with the `open` function.
    - If we don't specify otherwise, then the file is opened in read-only mode.
    - We can specify a second argument to `open`, either `'r'` (for reading) or `'w'` (for writing).
    - Once we have a file object, we can get the contents of the file as a string with the `read` method. However, this returns the entire file's contents, which can be overwhelming -- to us or to our computers.
    - A better way to read through a file is line by line, iterating over the file object
    - Each line is a string, up to and including the next newline character in the file. Each string we get back from iterating thus normally ends with `\n`; the one exception is the final line, if the file doesn't end with a newline.
    - Once we have the string, we can analyze it, or even use `str.split` to break it into pieces.
    - To write to a file, make sure to open it with the `'w'` option (which creates the file, or empties it if it already exists), and then use `file.write` to write strings to the file. When you're done writing to the file, be sure to close it. Otherwise, it's unknown when the data you've written will actually be written to disk.
    - It's common to use the `with` statement with files, because at the end of the `with` block, the file is automatically flushed and closed.
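
To make the recap concrete, here is a minimal sketch of the dict and file operations listed above. The dict contents come from the recap; the filename `recap_demo.txt` and the text written to it are made up for illustration, and the file goes into a temporary directory so nothing of yours is overwritten.

```python
import os
import tempfile

# --- dicts ---
d = {'a': 10, 'b': [10, 20, 30], 'c': 12.34}

print(d['a'])       # retrieve a value via its key
print('b' in d)     # `in` searches the keys...
print(10 in d)      # ...not the values

d['a'] = 1234       # the key exists, so its value is updated
d['z'] = 'new'      # the key doesn't exist, so the pair is added

for key, value in d.items():    # unpack each (key, value) tuple
    print(f'{key}: {value}')

# --- files ---
# hypothetical filename, placed in a fresh temporary directory
filename = os.path.join(tempfile.mkdtemp(), 'recap_demo.txt')

with open(filename, 'w') as outfile:     # flushed + closed at the end of the block
    outfile.write('first line\n')
    outfile.write('last line, no newline')

lines = []
with open(filename) as infile:           # no mode given, so read-only
    for one_line in infile:              # one string per line
        lines.append(one_line)

print(lines)
```

Printing `lines` as a list shows the trailing `\n` on the first string, and shows that the final string has none, because nothing in the file follows it.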

# Exercise: Count characters

The point of this program is to create a dictionary whose keys are characters and whose values indicate how many times each character appeared.

We'll give the program the name of a file. Our program will read through the file, one line at a time, adding to the count for each character. So the end result will be a dict counting how often each character appears in the file.

1. Define a variable `filename` that contains the name of a text file.
2. Define a variable `counts`, an empty dict.
3. Open and iterate over that file, one line at a time.
4. Inside of that loop, start a new (inner/nested) loop, iterating over each character in the current line.
    - If the character is already a key in `counts`, then just add 1 to the count
    - If the character is not already a key in `counts`, then add the key and the value 1
    - Note that we're counting all characters, including space, newline, etc.
5. Iterate over the `counts` dict, printing all keys and values.

Stage 1: Iterate over the lines of the file, and print them out.

In [2]:
# right now, all you need to do is
# (a) define a variable with a filename
# (b) open the named file, iterate over it one line at a time, and print the line

In [4]:
filename = 'wcfile.txt'    # in the same directory as Jupyter

for one_line in open(filename):   # open the file + iterate over it in a for loop
    print(one_line)               # print the current line from the file

This is a test file.



It contains 28 words and 20 different words.



It also contains 165 characters.



It also contains 11 lines.



It is also self-referential.



Wow!
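
The extra blank lines in this output come from getting two newlines per line: the one at the end of each string we read from the file, plus the one that `print` adds. Here is a small sketch of two common ways to avoid the doubling. It uses `io.StringIO` as a stand-in for the opened file, an assumption made only so the example runs without `wcfile.txt`; the text inside it is a made-up, shortened version of the file.

```python
import io

# stand-in for open('wcfile.txt'); iterating over it gives lines, as with a file
fake_file = io.StringIO('This is a test file.\n\nWow!\n')

stripped_lines = []
for one_line in fake_file:
    print(one_line, end='')                        # option 1: print adds no newline of its own
    stripped_lines.append(one_line.rstrip('\n'))   # option 2: remove the trailing newline

print(stripped_lines)
```

Neither option matters for the character-counting exercise, since we want to count the newlines too, but both are handy when the goal is just to display a file.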



In [5]:
# stage 2: we want to count the characters
# in order to do that, we need to get the characters

# Given the code we already have, expand it such that we don't print each line
# of the file, but rather we print each *character* in the file.
# the result of printing each character will be a very long, skinny column of characters


In [None]:
filename = 'wcfile.txt'    # in the same directory as Jupyter

for one_line in open(filename):   # open the file + iterate over it in a for loop
    print(one_line)               # print the current line from the file

In [7]:
# if I have a string, and I want to print every character
# in that string, I can just use this kind of for loop:

one_line = 'this is a line'

for one_character in one_line:
    print(one_character)

t
h
i
s
 
i
s
 
a
 
l
i
n
e


In [9]:
filename = 'wcfile.txt'    # in the same directory as Jupyter

for one_line in open(filename):   # iterating over a file -- we get strings, the lines in a file
    for char in one_line:         # iterating over a string -- we get characters
        print(char)               

T
h
i
s
 
i
s
 
a
 
t
e
s
t
 
f
i
l
e
.




I
t
 
c
o
n
t
a
i
n
s
 
2
8
 
w
o
r
d
s
 
a
n
d
 
2
0
 
d
i
f
f
e
r
e
n
t
 
w
o
r
d
s
.




I
t
 
a
l
s
o
 
c
o
n
t
a
i
n
s
 
1
6
5
 
c
h
a
r
a
c
t
e
r
s
.




I
t
 
a
l
s
o
 
c
o
n
t
a
i
n
s
 
1
1
 
l
i
n
e
s
.




I
t
 
i
s
 
a
l
s
o
 
s
e
l
f
-
r
e
f
e
r
e
n
t
i
a
l
.




W
o
w
!




In [11]:
# stage 3
# we have this code.
# as we go through each character, don't print it -- rather, either update the
# existing count for that character in the dict, or add it with a count of 1

filename = 'wcfile.txt'    # in the same directory as Jupyter
counts = {}                # this is where we'll keep a count of each character

for one_line in open(filename):   # iterating over a file -- we get strings, the lines in a file
    for char in one_line:         # iterating over a string -- we get characters

        # check to see if the character is already a key in the dict
        # if it is, then just +1 on the count
        # if it isn't, then add the key-value pair -- the character, and the count of 1
        if char in counts:
            counts[char] += 1   # we've seen this char before -- just increment the count
        else:
            counts[char] = 1    # we've never seen this char before -- add it, with a count of 1

print(counts)

{'T': 1, 'h': 2, 'i': 10, 's': 15, ' ': 22, 'a': 11, 't': 12, 'e': 10, 'f': 5, 'l': 7, '.': 5, '\n': 11, 'I': 4, 'c': 5, 'o': 9, 'n': 10, '2': 2, '8': 1, 'w': 3, 'r': 7, 'd': 4, '0': 1, '1': 3, '6': 1, '5': 1, '-': 1, 'W': 1, '!': 1}


In [12]:
counts['s']

15

In [13]:
counts[' ']

22
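
The `if`/`else` in stage 3 is the most explicit way to write the "update or add" logic. As an aside that isn't part of the exercise, the same counting can be sketched with `dict.get`, which returns a default value when the key isn't in the dict. The sample string below is made up so that the example runs without `wcfile.txt`.

```python
text = 'this is a line\n'    # made-up sample, standing in for the file's contents

counts = {}
for char in text:
    # counts.get(char, 0) is the current count, or 0 if we've never seen char
    counts[char] = counts.get(char, 0) + 1

print(counts)
```

Both versions build the same dict; the `if`/`else` form just makes the two cases easier to see.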

In [15]:
# stage 4
# now, instead of printing counts all at once, iterate through it
# and print each key-value pair on a line by itself

# setup
filename = 'wcfile.txt'    # in the same directory as Jupyter
counts = {}                # this is where we'll keep a count of each character

# calculation
for one_line in open(filename):   # iterating over a file -- we get strings, the lines in a file
    for char in one_line:         # iterating over a string -- we get characters

        # check to see if the character is already a key in the dict
        # if it is, then just +1 on the count
        # if it isn't, then add the key-value pair -- the character, and the count of 1
        if char in counts:
            counts[char] += 1   # we've seen this char before -- just increment the count
        else:
            counts[char] = 1    # we've never seen this char before -- add it, with a count of 1

# report
for key, value in counts.items():   # each iteration of items returns (key, value)
    print(f'{key}: {value}')

T: 1
h: 2
i: 10
s: 15
 : 22
a: 11
t: 12
e: 10
f: 5
l: 7
.: 5

: 11
I: 4
c: 5
o: 9
n: 10
2: 2
8: 1
w: 3
r: 7
d: 4
0: 1
1: 3
6: 1
5: 1
-: 1
W: 1
!: 1
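
In the report above, the newline key is printed as an actual line break, which is why `: 11` appears on a line by itself. One way to make whitespace keys visible is the `!r` conversion in an f-string, which prints the key's `repr` instead. The dict in this sketch is just a small subset of the counts shown above.

```python
counts = {'T': 1, ' ': 22, '\n': 11, '!': 1}   # subset of the counts from above

report = []
for key, value in counts.items():
    report.append(f'{key!r}: {value}')   # !r shows quotes and escapes such as \n

print('\n'.join(report))
```

With `!r`, the space appears as `' '` and the newline as `'\n'`, so every key-value pair stays on a single line.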


# Exercise: IP counts

There's a file called `mini-access-log.txt` in the same directory as Jupyter. It contains about 200 lines of log information from an old Web server. Each line starts with an IP address. We don't care about the rest of the line!

1. Create an empty dict.
2. Iterate over this file, one line at a time.
3. Grab the IP address from the start of each line.
4. If we have seen this IP address before, add to its count.
5. If we have never seen this IP address before, add it to the dict with a value of 1.
6. When we get to the end of the file, we will have a dict whose keys are IP addresses (strings) and whose values are integers (counts), indicating how often each IP address accessed our server.
7. Use a `for` loop to iterate over the dict and print each IP address and count.

In [16]:
!head mini-access-log.txt

67.218.116.165 - - [30/Jan/2010:00:03:18 +0200] "GET /robots.txt HTTP/1.0" 200 99 "-" "Mozilla/5.0 (Twiceler-0.9 http://www.cuil.com/twiceler/robot.html)"
66.249.71.65 - - [30/Jan/2010:00:12:06 +0200] "GET /browse/one_node/1557 HTTP/1.1" 200 39208 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
65.55.106.183 - - [30/Jan/2010:01:29:23 +0200] "GET /robots.txt HTTP/1.1" 200 99 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.183 - - [30/Jan/2010:01:30:06 +0200] "GET /browse/one_model/2162 HTTP/1.1" 200 2181 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
66.249.71.65 - - [30/Jan/2010:02:07:14 +0200] "GET /browse/browse_applet_tab/2593 HTTP/1.1" 200 10305 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.65 - - [30/Jan/2010:02:10:39 +0200] "GET /browse/browse_files_tab/2499?tab=true HTTP/1.1" 200 446 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.12 - - [30/J

In [None]:
# setup
counts = {}  # empty dict, soon to be filled with IP:count pairs

# calculations
for one_line in open('mini-access-log.txt'):
    print(one_line)

# report
print(counts)
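
Here is one possible way to fill in the calculation step. It's a sketch that assumes the IP address is everything before the first whitespace on each line, which is true of the lines shown by `head`. To keep the example self-contained, the hypothetical `sample_lines` list holds shortened copies of a few of those lines instead of opening `mini-access-log.txt`.

```python
# shortened copies of lines from the `head` output above
sample_lines = [
    '67.218.116.165 - - [30/Jan/2010:00:03:18 +0200] "GET /robots.txt HTTP/1.0" 200 99\n',
    '66.249.71.65 - - [30/Jan/2010:00:12:06 +0200] "GET /browse/one_node/1557 HTTP/1.1" 200 39208\n',
    '65.55.106.183 - - [30/Jan/2010:01:29:23 +0200] "GET /robots.txt HTTP/1.1" 200 99\n',
    '65.55.106.183 - - [30/Jan/2010:01:30:06 +0200] "GET /browse/one_model/2162 HTTP/1.1" 200 2181\n',
    '66.249.71.65 - - [30/Jan/2010:02:07:14 +0200] "GET /browse/browse_applet_tab/2593 HTTP/1.1" 200 10305\n',
]

# setup
counts = {}  # empty dict, soon to be filled with IP:count pairs

# calculations
for one_line in sample_lines:            # with the real file: open('mini-access-log.txt')
    ip_address = one_line.split()[0]     # split on whitespace, keep the first field

    if ip_address in counts:
        counts[ip_address] += 1          # seen before -- increment the count
    else:
        counts[ip_address] = 1           # first time -- add it with a count of 1

# report
for ip_address, count in counts.items():
    print(f'{ip_address}: {count}')
```

Note that `one_line.split()[0]` raises an `IndexError` on a completely blank line, so a log file containing empty lines would need an extra check before indexing.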