# Agenda

1. Recap + Q&A + Exercise
2. Functions
    - What are functions?
    - How do we define functions?
    - Arguments and parameters
    - A little about local vs. global variables
3. Modules and Packages
    - Using the `import` statement to use a module
    - Python standard library and its modules
    - PyPI and packages you can download from the Internet
    - `requests` and consuming APIs with JSON using Python

# Recap from yesterday

1. Dictionaries
    - Key-value stores: Every key has a value, and every value has a key
    - Keys are unique, but values aren't
    - Keys must be immutable, but values can be anything
    - We define a dict with `{}` -- empty braces are an empty dict.
    - You can define a dict with keys and values by putting them in the braces: `{'a':10, 'b':[10, 20, 30], 'c':12.34}`
    - You can retrieve from a dict using `[]` and the key you want: `d['a']`
    - You can search in a dict's keys with the `in` operator
    - You can assign to a dict via `=`, as in `d['a'] = 1234`
        - If the key already exists, then the value is updated
        - If the key doesn't yet exist, then the key-value pair is created
    - Three paradigms for dict use
        1. Define it at the top of the program, and use it as a read-only database. Examples: Months, countries, locations.
        2. Define the dict with keys and initial values. We don't change the keys (adding or removing them), but we do update the values. This is useful for counters of various sorts.
        3. Define an empty dict, and over time, add both keys and values. We saw the "rainfall" program, which used this.
    - Iterating over dicts
        - If you iterate over a dict, you get the keys
        - You can (if you want) iterate over the result of invoking `dict.keys` or `dict.values`. The first is almost always a bad idea (iterating over the dict itself already gives you the keys), and the second can be useful.
        - You can also use `dict.items`, a method that returns a 2-element tuple with the key and value with each iteration. If you then iterate using unpacking, with `for key, value in d.items():`, that lets you iterate over keys and values, accessing them with variables, in a nice and compact way.
2. Files
    - To work with a file, you need to first open it by stating the filename. We do that with the `open` function.
    - If we don't specify otherwise, then the file is opened in read-only mode.
    - We can specify a second argument to `open`, either `'r'` (for reading) or `'w'` (for writing).
    - Once we have a file object, we can get the contents of the file as a string with the `read` method. However, this returns the entire file's contents, which can be overwhelming -- to us or to our computers.
    - A better way to read through a file is line by line, iterating over the file object
    - Each line is a string, up to and including the next newline character in the file. So each string we get back from iterating ends with `\n`, except possibly the last line, if the file doesn't end with a newline.
    - Once we have the string, we can analyze it, or even use `str.split` to break it into pieces.
    - To write to a file, make sure to open it with the `'w'` option, and then use `file.write` to write strings to the file. When you're done writing to the file, be sure to close it. Otherwise, there's no telling when the data you've written will actually reach the disk.
    - It's common to use the `with` statement with files, because at the end of the `with` block, the file is automatically flushed and closed.
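A compact sketch tying the recap together. The dict is the one from the bullets above; the file name `recap-demo.txt` is hypothetical, and the file is created in the system's temp directory so the sketch doesn't touch any of your own files.

```python
import os
import tempfile

# --- Dictionaries ---
d = {'a': 10, 'b': [10, 20, 30], 'c': 12.34}
print(d['a'])       # 10: retrieve a value by its key
print('b' in d)     # True: `in` searches the keys...
print(20 in d)      # False: ...not the values
d['a'] = 1234       # 'a' exists, so its value is updated
d['z'] = 0          # 'z' doesn't exist, so the key-value pair is created

for key, value in d.items():    # unpack each (key, value) tuple
    print(f'{key}: {value}')

# --- Files ---
filename = os.path.join(tempfile.gettempdir(), 'recap-demo.txt')  # hypothetical name

with open(filename, 'w') as f:   # 'w' opens the file for writing
    f.write('first line\n')
    f.write('second line\n')
# the with block has ended, so the file is flushed and closed

words_per_line = []
with open(filename) as f:        # no mode given, so it's opened for reading
    for one_line in f:           # one line at a time
        words_per_line.append(one_line.split())

print(words_per_line)   # [['first', 'line'], ['second', 'line']]
os.remove(filename)
```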

# Exercise: Count characters

The point of this program is to create a dictionary whose keys are characters and whose values indicate how many times each character appeared.

We'll give the program the name of a file. Our program will read through the file, one line at a time, adding to the count for each character. So the end result will be a dict counting how often each character appears in the file.

1. Define a variable `filename` that contains the name of a text file.
2. Define a variable `counts`, an empty dict.
3. Open and iterate over that file, one line at a time.
4. Inside of that loop, start a new (inner/nested) loop, iterating over each character in the current line.
    - If the character is already a key in `counts`, then just add 1 to the count
    - If the character is not already a key in `counts`, then add the key and the value 1
    - Note that we're counting all characters, including space, newline, etc.
5. Iterate over the `counts` dict, printing all keys and values.

Stage 1: Iterate over the lines of the file, and print them out.

In [2]:
# right now, all you need to do is
# (a) define a variable with a filename
# (b) open the named file, iterate over it one line at a time, and print the line

In [4]:
filename = 'wcfile.txt'    # in the same directory as Jupyter

for one_line in open(filename):   # open the file + iterate over it in a for loop
    print(one_line)               # print the current line from the file

This is a test file.



It contains 28 words and 20 different words.



It also contains 165 characters.



It also contains 11 lines.



It is also self-referential.



Wow!
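Why all the extra blank lines? Each string we get from the file already ends with `\n`, and `print` adds a newline of its own, so every line is followed by an extra blank line. Here's a minimal sketch, using a small hypothetical sample file instead of `wcfile.txt`, that uses `repr` to make the `\n` visible and `str.rstrip('\n')` to remove it before printing.

```python
import os
import tempfile

# A small, hypothetical sample file, so this sketch runs without wcfile.txt
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('This is a test file.\n\nWow!\n')
    sample_filename = tmp.name

cleaned_lines = []
with open(sample_filename) as f:
    for one_line in f:
        print(repr(one_line))                        # e.g. 'This is a test file.\n'
        cleaned_lines.append(one_line.rstrip('\n'))  # drop the trailing newline

for one_line in cleaned_lines:
    print(one_line)    # print adds the only newline, so nothing is doubled

os.remove(sample_filename)
```

Another option is `print(one_line, end='')`, which tells `print` not to add its own newline.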



In [5]:
# stage 2: we want to count the characters
# in order to do that, we need to get the characters

# Given the code we already have, expand it such that we don't print each line
# of the file, but rather we print each *character* in the file.
# the result of printing each character will be a very long, skinny column of characters


In [None]:
filename = 'wcfile.txt'    # in the same directory as Jupyter

for one_line in open(filename):   # open the file + iterate over it in a for loop
    print(one_line)               # print the current line from the file

In [7]:
# if I have a string, and I want to print every character
# in that string, I can just use this kind of for loop:

one_line = 'this is a line'

for one_character in one_line:
    print(one_character)

t
h
i
s
 
i
s
 
a
 
l
i
n
e


In [9]:
filename = 'wcfile.txt'    # in the same directory as Jupyter

for one_line in open(filename):   # iterating over a file -- we get strings, the lines in a file
    for char in one_line:         # iterating over a string -- we get characters
        print(char)               

T
h
i
s
 
i
s
 
a
 
t
e
s
t
 
f
i
l
e
.




I
t
 
c
o
n
t
a
i
n
s
 
2
8
 
w
o
r
d
s
 
a
n
d
 
2
0
 
d
i
f
f
e
r
e
n
t
 
w
o
r
d
s
.




I
t
 
a
l
s
o
 
c
o
n
t
a
i
n
s
 
1
6
5
 
c
h
a
r
a
c
t
e
r
s
.




I
t
 
a
l
s
o
 
c
o
n
t
a
i
n
s
 
1
1
 
l
i
n
e
s
.




I
t
 
i
s
 
a
l
s
o
 
s
e
l
f
-
r
e
f
e
r
e
n
t
i
a
l
.




W
o
w
!




In [11]:
# stage 3
# we have this code.
# as we go through each character, don't print it -- rather, either update the
# existing count for that character in the dict, or add it with a count of 1

filename = 'wcfile.txt'    # in the same directory as Jupyter
counts = {}                # this is where we'll keep a count of each character

for one_line in open(filename):   # iterating over a file -- we get strings, the lines in a file
    for char in one_line:         # iterating over a string -- we get characters

        # check to see if the character is already a key in the dict
        # if it is, then just +1 on the count
        # if it isn't, then add the key-value pair -- the character, and the count of 1
        if char in counts:
            counts[char] += 1   # we've seen this char before -- just increment the count
        else:
            counts[char] = 1    # we've never seen this char before -- add it, with a count of 1

print(counts)

{'T': 1, 'h': 2, 'i': 10, 's': 15, ' ': 22, 'a': 11, 't': 12, 'e': 10, 'f': 5, 'l': 7, '.': 5, '\n': 11, 'I': 4, 'c': 5, 'o': 9, 'n': 10, '2': 2, '8': 1, 'w': 3, 'r': 7, 'd': 4, '0': 1, '1': 3, '6': 1, '5': 1, '-': 1, 'W': 1, '!': 1}


In [12]:
counts['s']

15

In [13]:
counts[' ']

22

In [15]:
# stage 4
# now, instead of printing counts all at once, iterate through it
# and print each key-value pair on a line by itself

# setup
filename = 'wcfile.txt'    # in the same directory as Jupyter
counts = {}                # this is where we'll keep a count of each character

# calculation
for one_line in open(filename):   # iterating over a file -- we get strings, the lines in a file
    for char in one_line:         # iterating over a string -- we get characters

        # check to see if the character is already a key in the dict
        # if it is, then just +1 on the count
        # if it isn't, then add the key-value pair -- the character, and the count of 1
        if char in counts:
            counts[char] += 1   # we've seen this char before -- just increment the count
        else:
            counts[char] = 1    # we've never seen this char before -- add it, with a count of 1

# report
for key, value in counts.items():   # each iteration of items returns (key, value)
    print(f'{key}: {value}')

T: 1
h: 2
i: 10
s: 15
 : 22
a: 11
t: 12
e: 10
f: 5
l: 7
.: 5

: 11
I: 4
c: 5
o: 9
n: 10
2: 2
8: 1
w: 3
r: 7
d: 4
0: 1
1: 3
6: 1
5: 1
-: 1
W: 1
!: 1
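The `if`/`else` in stages 3 and 4 is the classic way to count. Two common shortcuts do the same job; here's a sketch of both, run on a short hypothetical string instead of `wcfile.txt`. `dict.get(key, default)` returns `default` when the key is missing, and `collections.Counter` (a standard-library class the exercise didn't use) counts an iterable for you. Printing with `!r` also makes the space and newline keys visible, which the plain output above doesn't.

```python
from collections import Counter

text = 'This is a test.\nWow!\n'   # hypothetical sample, standing in for the file

# Shortcut 1: get() returns 0 for a character we haven't seen yet
counts = {}
for char in text:
    counts[char] = counts.get(char, 0) + 1

# Shortcut 2: Counter does the whole loop for us
counter = Counter(text)

for key, value in counts.items():
    print(f'{key!r}: {value}')   # !r shows ' ' and '\n' instead of blank space
```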


# Exercise: IP counts

There's a file called `mini-access-log.txt` in the same directory as Jupyter. It contains about 200 lines of log information from an old Web server. Each line starts with an IP address. We don't care about the rest of the line!

1. Create an empty dict.
2. Iterate over this file, one line at a time.
3. Grab the IP address from the start of each line.
4. If we have seen this IP address before, add to its count.
5. If we have never seen this IP address before, add it to the dict with a value of 1.
6. When we get to the end of the file, we will have a dict whose keys are IP addresses (strings) and whose values are integers (counts), indicating how often each IP address was accessing our server.
7. Use a `for` loop to iterate over the dict and print each IP address and count.

In [16]:
!head mini-access-log.txt

67.218.116.165 - - [30/Jan/2010:00:03:18 +0200] "GET /robots.txt HTTP/1.0" 200 99 "-" "Mozilla/5.0 (Twiceler-0.9 http://www.cuil.com/twiceler/robot.html)"
66.249.71.65 - - [30/Jan/2010:00:12:06 +0200] "GET /browse/one_node/1557 HTTP/1.1" 200 39208 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
65.55.106.183 - - [30/Jan/2010:01:29:23 +0200] "GET /robots.txt HTTP/1.1" 200 99 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
65.55.106.183 - - [30/Jan/2010:01:30:06 +0200] "GET /browse/one_model/2162 HTTP/1.1" 200 2181 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)"
66.249.71.65 - - [30/Jan/2010:02:07:14 +0200] "GET /browse/browse_applet_tab/2593 HTTP/1.1" 200 10305 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.71.65 - - [30/Jan/2010:02:10:39 +0200] "GET /browse/browse_files_tab/2499?tab=true HTTP/1.1" 200 446 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.65.12 - - [30/J

In [24]:
# setup
counts = {}  # empty dict, soon to be filled with IP:count pairs

# calculations
for one_line in open('mini-access-log.txt'):
    ip_address = one_line.split()[0]

    if ip_address in counts:     # if we've seen this IP address before,
        counts[ip_address] += 1  # add 1 to the existing count
    else:
        counts[ip_address] = 1   # first time with this IP? Add the new key-value pair

# report
for key, value in counts.items():    # go through each key-value pair
    print(f'{key}:\t{value}')         # print the key and the value

67.218.116.165:	2
66.249.71.65:	3
65.55.106.183:	2
66.249.65.12:	32
65.55.106.131:	2
65.55.106.186:	2
74.52.245.146:	2
66.249.65.43:	3
65.55.207.25:	2
65.55.207.94:	2
65.55.207.71:	1
98.242.170.241:	1
66.249.65.38:	100
65.55.207.126:	2
82.34.9.20:	2
65.55.106.155:	2
65.55.207.77:	2
208.80.193.28:	1
89.248.172.58:	22
67.195.112.35:	16
65.55.207.50:	3
65.55.215.75:	2
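To see what `split()[0]` grabs, and to answer the obvious follow-up question (which IP address made the most requests?), here's a sketch run on a few lines abbreviated from the `!head` output above, held in a list rather than read from the file. `max(counts, key=counts.get)` compares the keys by their counts; it's one common idiom, not something the exercise requires.

```python
# Abbreviated copies of lines from mini-access-log.txt (see the !head output above)
sample_lines = [
    '67.218.116.165 - - [30/Jan/2010:00:03:18 +0200] "GET /robots.txt HTTP/1.0" 200 99',
    '66.249.71.65 - - [30/Jan/2010:00:12:06 +0200] "GET /browse/one_node/1557 HTTP/1.1" 200 39208',
    '65.55.106.183 - - [30/Jan/2010:01:29:23 +0200] "GET /robots.txt HTTP/1.1" 200 99',
    '65.55.106.183 - - [30/Jan/2010:01:30:06 +0200] "GET /browse/one_model/2162 HTTP/1.1" 200 2181',
    '66.249.71.65 - - [30/Jan/2010:02:07:14 +0200] "GET /browse/browse_applet_tab/2593 HTTP/1.1" 200 10305',
    '66.249.71.65 - - [30/Jan/2010:02:10:39 +0200] "GET /browse/browse_files_tab/2499?tab=true HTTP/1.1" 200 446',
]

counts = {}
for one_line in sample_lines:
    ip_address = one_line.split()[0]   # split on whitespace; the IP is the first field
    if ip_address in counts:
        counts[ip_address] += 1
    else:
        counts[ip_address] = 1

busiest = max(counts, key=counts.get)  # the key whose value is largest
print(counts)
print(f'Busiest: {busiest} ({counts[busiest]} requests)')
```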


# Exercise: Running a program from the command line

1. In PyCharm, create a simple program -- just ask for the user's name and print a greeting.
2. Find out where your project is located.
3. Open a terminal/CMD window
4. Type "python" or (on Windows) "py -3" followed by the full path + name of the program you wrote.
5. Check that the program runs!

Your command will probably look like this:

    py -3 /Users/reuven/PycharmProjects/cisco-aug-2023/myprog.py

or maybe

    python /Users/reuven/PycharmProjects/cisco-aug-2023/myprog.py


# Functions!

We've already seen the "DRY rule," where we say, "Don't repeat yourself!"

- If you have the same code on several lines in a row, then you can "DRY up" that code with a loop.
- If you have the same code in several places in a program, then you can "DRY up" your code with a function.

A function lets you define a new verb in Python. That shortens your code, and gives you semantic power -- it lets you refer to a lot of things using a single term/name.

This idea of hiding lots of complexity under a single name is known in engineering as "abstraction." Functions are all about abstraction, letting us express complex ideas in a short amount of time, and ignoring the details that will just distract us.

# how do you have a successful startup?

1. Good idea
2. (something goes here)
3. Exit!

In [25]:
# how do we define a new function?

# - def -- this is the reserved word that lets us define it
# - name the function
# - empty parentheses (for now), where the parameters will go
# - colon at the end of the line
# - an indented block, the "body" of the function

def hello():
    print('Hello!')

In [27]:
hello()   # call the function hello, with the help of ()

Hello!


# Exercise: Hello function

1. In PyCharm, start a new file (hello.py)
2. Define a function called `hello` that, when run, asks the user for their name and prints a nice greeting on the screen using their name.
3. In the same file, lower down (after the definition), run the function with `hello()`.
4. Now, run the file (inside of PyCharm is OK), and see that it asks for your name and prints it.

# Next up

- Return values from functions
- Arguments and parameters
- Local variables

- Resume at :50

# What can we put in a function?

Anything we want:

- Variable assignment
- Input from the user
- Displaying things to the user
- Working with files
- `for` and `while` loops

You want to keep a function as short/small as possible:
- It's easier to understand
- It's easier to test

I like to write functions that are no more than 20 lines long.

Better (I think) to have more, short functions than fewer, longer functions.

If you end up having a really complex function, I often think it's best to break it into several little pieces, each its own function, and then call those functions from the main one.

In [28]:
def very_complex_thing():
    print('thing1')
    print('thing2')
    print('thing3')

# We can turn the above into:

def very_complex_thing():
    thing1()
    thing2()
    thing3()
    
def thing1():
    print('thing1')

def thing2():
    print('thing2')

def thing3():
    print('thing3')

very_complex_thing()

thing1
thing2
thing3


# Return values

When I call some functions, I get a value back:

    s = 'abcd'
    len(s)       # this returns the integer 4
    s.upper()    # this returns the string 'ABCD'

When a function returns a value, I can do any of several things with it:

- Print it
- Assign it to a variable
- Pass it to another function as an argument

Right now, our functions cannot be used in this way because they *print* values, but they don't *return* values. 

Returning a value ≠ printing a value on the screen.

A function can print as much as it wants, as often as it wants. But it only gets to return one value per call. A function might have code that returns different values, and even different types, depending on circumstances. But in any single invocation of the function, we'll only have one return value.

How can our functions start to return values?

They just need to use the `return` keyword. When Python hits `return` in a function, it immediately stops the function's execution and returns that value.
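Two consequences of that rule, sketched with hypothetical functions: `return` can end a function in the middle of a loop, and a function that never reaches a `return` (like our earlier print-only `hello`) returns the special value `None`.

```python
def first_negative(numbers):
    """Return the first negative number in numbers, or None if there isn't one."""
    for n in numbers:
        if n < 0:
            return n      # stops the function immediately, even mid-loop
    return None           # reached only if the loop found no negative number

def print_greeting(name):
    print(f'Hello, {name}')    # prints, but has no return statement

print(first_negative([3, -1, -5]))   # -1 (the -5 is never even checked)
print(first_negative([1, 2, 3]))     # None

result = print_greeting('Reuven')    # displays: Hello, Reuven
print(result)                        # None: nothing was returned
```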



In [29]:
def hello():
    name = input('Enter your name: ').strip()
    return f'Hello, {name}'

In [30]:
hello()

Enter your name:  Reuven


'Hello, Reuven'

In [31]:
# here, I took the output / return value from hello, 
# and immediately passed it to print

# first, the hello function runs, then it returns a string, then print displays that string
print(hello())

Enter your name:  Reuven


Hello, Reuven


In [32]:
# the return value of hello() can be assigned to a variable

s = hello()
print(s)

Enter your name:  Reuven


Hello, Reuven


In [33]:
s

'Hello, Reuven'

# Exercise: Calculator

1. Write a function, `calc`, that when run asks the user to enter three things:
    - first number
    - operator (a string, either `+` or `-`)
    - second number
2. Return the value you got from invoking the operator on the numbers.
3. Assign the return value from `calc` to the variable `result`, and print it.

Example:

    result = calc()
    Enter first number: 3
    Enter operator: +
    Enter second number: 10

    print(result)
    13

# Parameters

The `calc` function we just wrote works... but it feels weird to call a function and then to have to enter values manually.

It seems more reasonable for us to call the function and pass those values to it, and for those values to be assigned to variables in the function.

The variables in the function are known as *parameters*. The values that we pass when we call a function are known as *arguments*. 

In [34]:
def square(x):       # here, I've defined the function with one parameter
    return x ** 2    # I know that the parameter has a value, and can use it

In [38]:
# here, I call the square function with an integer argument, 5
# Python assigns 5 to the parameter x
# then the function body runs
# then we get a result back

square(5)

25

In [37]:
# here, I call the square function with an integer argument, 12345
# Python assigns 12345 to the parameter x
# then the function body runs
# then we get a result back

square(12345)


152399025

In [39]:
# what if I call a function with the wrong number of arguments?
# that is, with fewer or more arguments than the number of parameters?

square()   # no arguments!

TypeError: square() missing 1 required positional argument: 'x'

In [40]:
square(10, 20)

TypeError: square() takes 1 positional argument but 2 were given

# Exercise: `calc` with arguments

Rewrite `calc` such that it doesn't use `input` to get input from the user. Rather, it gets three arguments -- the first number, the operator, and the second number. It should still return the result it gets.

Example:

    value = calc(10, '+', 3)
    print(value)  # should be 13
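One possible solution, as a sketch. The exercise only mentions `+` and `-`; returning an error-message string for any other operator is an assumption, not part of the spec.

```python
def calc(first, op, second):
    """Apply op ('+' or '-') to first and second, and return the result."""
    if op == '+':
        return first + second
    elif op == '-':
        return first - second
    else:
        # Assumption: for any other operator, return a message instead of a number
        return f'Unknown operator: {op}'

value = calc(10, '+', 3)
print(value)   # 13
```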

# Arguments and parameters

When you define a function, much as when you assign to a variable, you're cutting the ties between that name and whatever it referred to before, including any previous function by that name.

This means that if you defined `hello` to take 0 arguments, and then you change `hello` to take two arguments, then any mention of `hello` will now be the two-argument version.  You can only have one version of a function at a time.
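A quick sketch of that replacement, using a hypothetical function `greet`: the second `def` rebinds the name, so the zero-argument version is gone.

```python
def greet():
    return 'Hello!'

def greet(name):            # same name: this replaces the zero-argument version
    return f'Hello, {name}!'

print(greet('Reuven'))      # Hello, Reuven!
# greet()                   # would now raise TypeError (missing argument 'name')
```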

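Can we restrict the types of values that we pass to a function? That is, let's say I have a function that works well with strings. Can I somehow tell Python that if someone tries to call it with a non-string type, it'll give an error?

Not automatically. Python doesn't check argument types when you call a function, so it will happily pass any value to any parameter. (Type hints exist, but Python itself doesn't enforce them.)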

In [41]:
def hello(name):
    return f'Hello, {name}!'

In [42]:
hello('world')


'Hello, world!'

In [43]:
hello('Reuven')

'Hello, Reuven!'

In [44]:
hello([10, 20, 30])

'Hello, [10, 20, 30]!'

In [45]:
hello(hello)   # yes, we can even pass the function to itself.

'Hello, <function hello at 0x7fd2c66b1e10>!'
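If you really do want a function to reject other types, you have to check for yourself. Here's a sketch of a hypothetical `strict_hello` that does so with `isinstance` and raises `TypeError`. The `name: str` annotation is a type hint: it documents the intent and helps tools such as editors, but Python doesn't enforce it when the function is called; only the `isinstance` check does.

```python
def strict_hello(name: str) -> str:
    # The annotation alone doesn't stop anything; this check does
    if not isinstance(name, str):
        raise TypeError(f'name must be a str, not {type(name).__name__}')
    return f'Hello, {name}!'

print(strict_hello('world'))    # Hello, world!

try:
    strict_hello([10, 20, 30])
except TypeError as e:
    print(e)                    # name must be a str, not list
```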

# Exercise: `count_vowels_file`

1. Define a function whose argument is a filename.
2. The function will return an integer, the number of vowels in that file.
3. This means you'll need to open the file, iterate over every line, then iterate over every character in the line.
4. Print the value returned by the function.

`count_vowels_file('/etc/passwd')` -- it'll return an integer.
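A sketch of one possible solution, assuming "vowels" means a, e, i, o, and u in either case (`y` isn't counted). It writes a small hypothetical sample file so it runs anywhere; with your own system you could pass `'/etc/passwd'` instead.

```python
import os
import tempfile

def count_vowels_file(filename):
    """Return how many vowels (a, e, i, o, u, either case) appear in filename."""
    total = 0
    with open(filename) as f:
        for one_line in f:              # one line at a time
            for char in one_line:       # one character at a time
                if char.lower() in 'aeiou':
                    total += 1
    return total

# Hypothetical sample file, so the sketch doesn't depend on /etc/passwd
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('This is a test.\nWow!\n')
    sample_filename = tmp.name

vowel_count = count_vowels_file(sample_filename)
print(vowel_count)   # 5
os.remove(sample_filename)
```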

# Next up:

1. Local vs. global variables (functions)
2. Modules
    - What is a module?
    - Importing from modules
    - Python standard library
    - PyPI and downloading packages
    - Download via an API
3. What's next + open Q&A

# Local vs. global variables

Without functions, all of the variables we define are known as "globals." That means that any part of our program can view and modify any variable.

If we're collaborating on a software project, and I use the variable name `x`, and you use the variable name `x`, then we're going to have what's known as a *namespace collision*. Meaning, someone is going to lose, and it'll be very ugly.

Functions solve this problem by keeping things local. Any variable you define or assign to in a function (i.e., inside of a function body) is totally separate from all global variables. This means that if two different functions both use `x` inside of their function bodies, neither affects the other -- because the variables only exist while the function runs. When the function returns, the variables go away.

This means that when you're writing a function, you can largely ignore the variable names outside of the function.

Parameters (the variables we declared at the top of the function) are local variables, too. They are automatically assigned values when we call the function, and they are also wiped away when the function exits.

In [46]:
x = 100      # outside of a function -- so it must be global

def myfunc():
    x = 200  # inside of a function -- so it must be local
    print(f'x = {x}')  # local variables, if they exist, get priority

print(f'Before, x = {x}')    # we'll get the global
myfunc()
print(f'After, x = {x}')     # we'll get the global, because the function returned

Before, x = 100
x = 200
After, x = 100
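The example above assigns to `x` inside the function. A function that only *reads* a name, without assigning to it, will find the global. And parameters, like any other local variable, vanish when the function returns. A sketch with hypothetical names:

```python
y = 10   # global

def show_y():
    print(f'Inside, y = {y}')   # no assignment to y in here, so Python uses the global

def add_one(n):          # the parameter n is local
    bigger = n + 1       # so is bigger
    return bigger

show_y()              # Inside, y = 10
print(add_one(y))     # 11
print(y)              # 10: the global is unchanged
# print(bigger)       # would raise NameError, since bigger existed only inside add_one
```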


# Modules

DRY -- don't repeat yourself

1. Several lines repeated -- replace with a loop!
2. The same code repeated in multiple places -- replace with a function!
3. The same code repeated in multiple *programs* -- use a library, known in Python as a "module."

Modules are Python's libraries. But they are also our *namespaces*, meaning that they ensure we don't have major namespace collisions.  
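Here's a small sketch of modules acting as namespaces: Python's built-in `pow` and the `math` module's `pow` share a name, but because one lives in the `math` namespace, they don't collide.

```python
import math

print(pow(2, 10))        # 1024: the built-in pow, which returns an int here
print(math.pow(2, 10))   # 1024.0: math.pow always returns a float
```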



In [None]:
# if I want to generate a random integer between 0-100,
# I can use the "random" module, and specifically the "randint"
# function in the module.

# first, I have to load the module
import random

# then any names defined in random.py are available as attributes of the module object
# meaning, if "random