In [1]:
# LOAD THE FILES FOR THIS NOTEBOOK
!wget -q --show-progress --no-check-certificate 'https://docs.google.com/uc?export=download&id=1SObn6N0Ed5vPrH5yOQUsfVvEnAQmoD2m' -O 'Class 8.zip'
from zipfile import ZipFile
with ZipFile('Class 8.zip', 'r') as zipObj:
  zipObj.extractall()



<b>LING 193 - Lecture 8<br>
Intro to Python II</b><br>
Andrew McInnerney<br>
September 26, 2022

# 1 Reading files
Sometimes we will want Python to access information stored in something other than a '.py' Python file. We can do that with the `open()` function. You can give `open()` a file name (for a file in the current working directory, which is usually where your script or notebook lives) or a path to the file's location. `open()` gives back a file object; calling `.read()` on it for a '.txt' file returns the contents as one long string. For us, it will tend to be more useful to get a list containing each line of the file. That can be done by adding `.splitlines()`, like this:

```
>>> open("...").read().splitlines()
```

When reading in files, it tends to be good practice to use `with`. This makes sure Python closes the file automatically as soon as the indented block finishes (even if an error occurs), instead of leaving it open for the rest of the runtime. This is what that looks like:

```
>>> with open("...") as file:
    x = file.read().splitlines()
```

It's also possible to read files from a URL, but we're not going to worry about that here.

Reading files is not something you're going to need to master in this class. In general, I'll give you the code to read in the files you need. What you should know is that the `open()` function is the main player in reading files.

Run the following code block to read in the Scrabble dictionary we've been using.

In [2]:
with open("Collins Scrabble Words (2019).txt") as file:
    wordlist = file.read().splitlines()

# 2 Dictionaries
It can be helpful to store information that can be accessed easily by a "key." For example, we might want to have a set of words paired with their lengths, or a set of letters paired with their frequencies. This is what "dictionaries" do in Python.

> NOTE: Python "dictionaries" are a type of <i>data structure</i>, not necessarily an actual real world <i>dictionary</i>. That's a really important distinction: for example, the Scrabble dictionary we've been using is actually a <i>list</i> as far as Python is concerned.

Dictionaries in Python are distinguished by two things, curly brackets `{}` and colons `:`.

In [None]:
example_dictionary = {"a":1}
print(type(example_dictionary))

Here is a dictionary that lists each letter of the alphabet with its position in the alphabet:

In [None]:
letters = {"a":1,"b":2,"c":3,"d":4,"e":5,"f":6,"g":7,"h":8,
           "i":9,"j":10,"k":11,"l":12,"m":13,"n":14,"o":15,
           "p":16,"q":17,"r":18,"s":19,"t":20,"u":21,"v":22,
           "w":23,"x":24,"y":25,"z":26}
print(letters)

We could define this dictionary more efficiently like this:

In [3]:
letters = {} # This defines an empty dictionary
alphabet = "abcdefghijklmnopqrstuvwxyz"
counter = 1
for letter in alphabet:
    letters[letter] = counter
    counter += 1
print(letters)

{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6, 'g': 7, 'h': 8, 'i': 9, 'j': 10, 'k': 11, 'l': 12, 'm': 13, 'n': 14, 'o': 15, 'p': 16, 'q': 17, 'r': 18, 's': 19, 't': 20, 'u': 21, 'v': 22, 'w': 23, 'x': 24, 'y': 25, 'z': 26}
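
The same loop pattern can build the other kind of pairing mentioned at the start of this section, words paired with their lengths. This is only a sketch: the three sample words are made up for illustration, and the names `words` and `lengths` are arbitrary.

```python
words = ["PAYMENT", "CAT", "ONION"]  # hypothetical sample words

lengths = {}                   # start with an empty dictionary
for word in words:
    lengths[word] = len(word)  # key: the word, value: its length

print(lengths)  # {'PAYMENT': 7, 'CAT': 3, 'ONION': 5}
```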


Dictionaries are made up of "items", which can be accessed like this:

In [4]:
print(list(letters.items()))

[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5), ('f', 6), ('g', 7), ('h', 8), ('i', 9), ('j', 10), ('k', 11), ('l', 12), ('m', 13), ('n', 14), ('o', 15), ('p', 16), ('q', 17), ('r', 18), ('s', 19), ('t', 20), ('u', 21), ('v', 22), ('w', 23), ('x', 24), ('y', 25), ('z', 26)]


The output here is a list of "pairs". Note that if you don't tell Python to give them to you as a list, it will give you a `dict_items` object instead, which prints like this:

In [5]:
print(letters.items())

dict_items([('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5), ('f', 6), ('g', 7), ('h', 8), ('i', 9), ('j', 10), ('k', 11), ('l', 12), ('m', 13), ('n', 14), ('o', 15), ('p', 16), ('q', 17), ('r', 18), ('s', 19), ('t', 20), ('u', 21), ('v', 22), ('w', 23), ('x', 24), ('y', 25), ('z', 26)])


Each dictionary item is made up of a "key" (on the left of the colon) and a "value" (on the right of the colon). We can access these like this:

In [6]:
print(list(letters.keys()))    # Same caveat about 'list()'
print(list(letters.values()))

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]


What turns out to be really useful is to retrieve a value with a key:

In [7]:
print(letters["a"])
print(letters["m"])
print(letters["z"])

1
13
26
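One thing to watch out for when retrieving values: asking for a key that isn't in the dictionary raises a `KeyError`. The sketch below shows a few standard ways to handle that, using a shortened version of `letters` so it can run on its own.

```python
letters = {"a": 1, "b": 2, "c": 3}  # shortened version, for illustration only

# 'in' checks whether something is a key of the dictionary.
print("b" in letters)       # True
print("?" in letters)       # False

# .get() returns None (or a default you supply) instead of raising an error.
print(letters.get("?"))     # None
print(letters.get("?", 0))  # 0

# Square brackets raise a KeyError for a missing key.
try:
    letters["?"]
except KeyError:
    print("'?' is not a key in the dictionary")
```
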


<b>PRACTICE:</b><br>
Create a dictionary with four items:<br>
- Keys should be `'APPLE'`, `'ORANGE'`, `'ONION'`, and `'BROCCOLI'`
- Values should be `'fruit'` or `'veggie'`, depending on whether the food is a fruit or a vegetable.

In [11]:
temp_dict = dict()
temp_dict['APPLE'] = 'fruit'
temp_dict['ORANGE'] = 'fruit'
temp_dict['ONION'] = 'veggie'
temp_dict['BROCCOLI'] = 'veggie'

print(temp_dict)


{'APPLE': 'fruit', 'ORANGE': 'fruit', 'ONION': 'veggie', 'BROCCOLI': 'veggie'}


# 3 Functions
The `def` keyword is a very powerful tool. We can use it to store sequences of commands as a reusable function. For instance, we can calculate the average of a numerical list with a sequence of steps like this:

In [None]:
data = [10, 20, 100, 45, 50]    # Here's a sample data set
total = 0                       # We start a tally
for datum in data:              # We iterate through each data point
    total += datum              # We add each data point to the tally,
average = total/len(data)       # Which we divide by the number of data points
print(average)                  # to give us the average

To calculate the average of a new data set, like this one,<br>
`data2 = [10, 20, 100, 45, 50, 100, 1000, 2035]`<br>
we would need to write out all those commands again.

But with `def`, we can define a function performing all the steps, simplifying the process.

In [None]:
def average(data):  # NOTE: the names of the arguments in the definition are *ARBITRARY*
    total = 0 
    for datum in data:
        total += datum
    average = total/len(data)
    return average

What that definition does is say, "Ok, let's pretend like we have some list called `data`. Here's what we're going to do with that list."

In this case, we tell it to start a tally, then run through each item in the list and add it to the tally, then divide by the total number of list items. That's just what we did before, but now we've stored it as a compact command. We can easily calculate the average of a list like this now:

In [None]:
print(average(data))

And it's easy to apply to new lists:

In [None]:
data2 = [10, 20, 100, 45, 50, 100, 1000, 2035]
print(average(data2))
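
As the comment in the definition of `average` notes, the argument names are arbitrary. To illustrate, here is a small sketch with two versions of the same function that differ only in the names they use internally. The names `average_a`, `average_b`, and `scores` are made up for this example.

```python
def average_a(data):
    total = 0
    for datum in data:
        total += datum
    return total / len(data)

def average_b(numbers):  # same logic, different argument name
    total = 0
    for n in numbers:
        total += n
    return total / len(numbers)

scores = [10, 20, 100, 45, 50]
print(average_a(scores))  # 45.0
print(average_b(scores))  # 45.0
```

Note also that the list we pass in (`scores`) doesn't need to share a name with the argument in the definition.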

This technique is very useful when we want to use the same logic multiple times. For example, maybe we want a function that tells us how many letters are in a given word, but only if that word is in a given wordlist.

In [12]:
def count_letters(word):
    if word in wordlist:
        return "'"+word+"'"+" HAS "+str(len(word))+" LETTERS"
    else:
        return "'"+word+"'"+" IS NOT IN WORDLIST"
        
print(count_letters("PAYMENT"))
print(count_letters("BELERIAND"))

'PAYMENT' HAS 7 LETTERS
'BELERIAND' IS NOT IN WORDLIST


<b>PRACTICE 1</b><br>
Write a function that takes a word as input, and returns the number of <b>orthographic vowels</b> (a, e, i, o, u) in that word.

In [22]:
def calculate_vowels(word):
  word = word.lower()
  vowels = "aeiou"
  orth_dict = dict()
  
  for i in range(len(word)):
    if word[i] in vowels:
      try:
        orth_dict[word[i]] += 1
      except KeyError:  # first time we've seen this vowel
        orth_dict[word[i]] = 1

  print(orth_dict)

word = "Collins"
calculate_vowels(word)

{'o': 1, 'i': 1}
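
The answer above prints a per-vowel tally. Since the prompt asks the function to *return* the number of vowels, here is one possible alternative sketch that returns a single count. The name `count_vowels` is made up for this example.

```python
def count_vowels(word):
    vowels = "aeiou"
    count = 0
    for letter in word.lower():  # lowercase so 'A' and 'a' both count
        if letter in vowels:
            count += 1
    return count                 # hand the number back instead of printing it

print(count_vowels("Collins"))   # 2
print(count_vowels("BROCCOLI"))  # 3
```

Because the count is returned rather than printed, it can be stored in a variable or used in later calculations.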


<b>PRACTICE 2</b><br>
Write a function that takes a list of words as input and returns the number of vowels in each word as output.<br>
Test the function out on the `fruits` and `vegetables` lists from last time:
```
fruits = ["apple", "orange", "lemon", "pear"]
vegetables = ["carrot", "tomato", "broccoli", "cucumber"]
```

In [27]:
fruits = ["apple", "orange", "lemon", "pear"]
vegetables = ["carrot", "tomato", "broccoli", "cucumber"]

print("Fruits list: ")
for fruit in fruits:
  print(f"{fruit}: ")
  calculate_vowels(fruit)

print()

print("Vegetables list: ")
for veggie in vegetables:
  print(f"{veggie}: ")
  calculate_vowels(veggie)

Fruits list: 
apple: 
{'a': 1, 'e': 1}
orange: 
{'o': 1, 'a': 1, 'e': 1}
lemon: 
{'e': 1, 'o': 1}
pear: 
{'e': 1, 'a': 1}

Vegetables list: 
carrot: 
{'a': 1, 'o': 1}
tomato: 
{'o': 2, 'a': 1}
broccoli: 
{'o': 2, 'i': 1}
cucumber: 
{'u': 2, 'e': 1}
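
The answer above loops over each list outside of any function. A minimal sketch of a version that is itself a function, taking a list of words and returning a dictionary of vowel counts, might look like this. The names `count_vowels` and `count_vowels_in_list` are made up for this example.

```python
def count_vowels(word):
    vowels = "aeiou"
    count = 0
    for letter in word.lower():
        if letter in vowels:
            count += 1
    return count

def count_vowels_in_list(words):
    counts = {}
    for word in words:
        counts[word] = count_vowels(word)  # key: the word, value: its vowel count
    return counts

fruits = ["apple", "orange", "lemon", "pear"]
vegetables = ["carrot", "tomato", "broccoli", "cucumber"]

print(count_vowels_in_list(fruits))
# {'apple': 2, 'orange': 3, 'lemon': 2, 'pear': 2}
print(count_vowels_in_list(vegetables))
# {'carrot': 2, 'tomato': 3, 'broccoli': 3, 'cucumber': 3}
```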
