In [421]:
# Initialize Otter
import otter
grader = otter.Notebook("ps2.ipynb")

# STATS 507 W22
## Problem Set 2
All functions will be tested by visible as well as hidden tests. The maximum amount of time any function is allowed to run is 10 seconds.

You may use the `collections` and `itertools` modules in this assignment.

In [422]:
import collections
import itertools

### Question 1: Solve the spelling bee (8 points)

![image.png](2021-June-27-bee.png)

Each night the New York Times publishes a puzzle called [Spelling Bee](https://www.nytimes.com/puzzles/spelling-bee). The rules are:

1. Words must contain at least 4 letters.
1. Words must include the center letter.
1. Letters can be used more than once.
1. ~Words cannot be proper nouns~. (For this problem, we will relax rule \#4 since it is not easy to find an online word list that excludes proper nouns. So, we will consider any word that satisfies rules 1-3 to be valid.)

Scoring is as follows:
- 4-letter words are worth 1 point each.
- Longer words earn 1 point per letter.
- Each puzzle includes at least one “pangram” which uses every letter. These are worth 7 extra points!

In the image above, the seven letters are **d**, **p**, **e**, **z**, **i**, **t** and the center letter **u**. Some valid words are:

- "dude" (score: 1)
- "duet" (score: 1)
- "duped" (score: 5)
- "puppet" (score: 6)
- "deputize" (score: 15, pangram)
- "deputized" (score: 16, pangram)

Some invalid words are:
- "dud" (too short)
- "deed"  (doesn't contain "u")
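
The rules and examples above can be checked with a short sketch. The helper name `bee_score`, the convention that the center letter comes first in `letters`, and returning `None` for invalid words are all assumptions made here for illustration; they are not part of the assignment's required interface.

```python
def bee_score(word, letters):
    """Score `word` for the puzzle `letters` (assumed: center letter first).

    Returns None for an invalid word; this convention is an assumption.
    """
    center = letters[0]
    if len(word) < 4 or center not in word or not set(word) <= set(letters):
        return None
    if len(word) == 4:
        return 1
    bonus = 7 if set(letters) <= set(word) else 0  # pangram bonus
    return len(word) + bonus


letters = "udpezit"  # center letter "u" first
for w in ["dude", "duet", "duped", "puppet", "deputize", "deputized", "dud", "deed"]:
    print(w, bee_score(w, letters))
```

The printed scores match the lists above: 1, 1, 5, 6, 15 and 16 for the valid words, and `None` for "dud" and "deed".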


**Note**: solutions to each puzzle are published by the website https://nytbee.com. In addition to the tests we have provided, you can use this web site to check your answers.

#### Word list
You may use the following word list, obtained from [an online source](https://github.com/dwyl/english-words), to test your code.

In [556]:
words = [line.strip() for line in open("words.txt", "rt")]
len(words)

370103

In [557]:
import random
[random.choice(words) for _ in range(10)]

['retributory',
 'anticommunistical',
 'androgonium',
 'vaganti',
 'unordinately',
 'revet',
 'dripproof',
 'adlumine',
 'geogonical',
 'frankish']

However, your functions should work if we pass in other, different word lists too.

**1(a)** (2 pts.) Write a function `word_score(word, letters)` which returns the integer score of a word according to the rules shown above.

In [558]:
def word_score(word: str, letters: str) -> int:
    """Return the spelling bee score for word spelled from letters.

    Args:
        word: the word to be scored
        letters: a string of seven letters. The first entry letters[0] is the center/required letter.

    Returns:
        an integer score for the word according to the rules of the game.

    Raises:
        ValueError if the word is invalid (too short, cannot be spelled from letters, or does not contain the center letter.)
    """
    # Validity depends only on the arguments, not on any global word list.
    if len(word) < 4 or letters[0] not in word or not all(ch in letters for ch in word):
        raise ValueError(f"invalid word: {word!r}")
    if len(word) == 4:
        return 1
    # Longer words earn one point per letter, plus 7 for a pangram.
    is_pangram = all(ch in word for ch in letters)
    return len(word) + 7 if is_pangram else len(word)
    ...

In [559]:
grader.check("q1a")

**1(b)** (6 pts.) Write a function `spelling_bee(letters, word_list)` which finds all valid words in `word_list` that can be spelled using the letters in `letters` according to the rules of Spelling Bee. The function should return a list of `(word, score)` tuples, each corresponding to a valid word and containing its score as calculated using the function you wrote above.

In [560]:
def spelling_bee(letters: str, word_list: "list[str]") -> "list[tuple[str, int]]":
    """
    Return all possible words in word_list that can be spelled using letters
    according to rules of Spelling Bee.

    Args:
        letters: string of length 7 containing the seven letters of the spelling bee.
            the yellow letter occupies position 0 of the string.
        word_list: a list of possible words.

    Returns:
        A list of (word, score) tuples. Each returned word:
            - Exists in word_list,
            - Is at least four letters long,
            - Is spelled entirely from letters in letters, and
            - Contains at least one instance of letters[0].
    """
    allowed = set(letters)
    center = letters[0]
    results = []
    for word in word_list:
        # Keep words that are long enough, contain the center letter,
        # and use only the seven allowed letters.
        if len(word) >= 4 and center in word and set(word) <= allowed:
            results.append((word, word_score(word, letters)))
    return results
    ...

In [561]:
grader.check("q1b")

**1(c)** (Extra credit) For a given set of letters, define the *game score* to be the sum of the scores of all the valid words. For example, if the output of `spelling_bee()` was 
```
>>> spelling_bee("wrlkaod", word_list)
[("workload", 15), ("draw", 1), ("drawl", 5), ("wallow", 6)]
```

the game score would be $15+1+5+6=27$. 
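
As a minimal sketch of this definition, the game score is just the sum of the second entry of each tuple. The helper name `game_score` is hypothetical and is not required by the assignment.

```python
def game_score(results):
    """Sum the scores in a list of (word, score) tuples."""
    return sum(score for _, score in results)


example = [("workload", 15), ("draw", 1), ("drawl", 5), ("wallow", 6)]
print(game_score(example))  # 27
```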

Using any distinct seven lower-case letters of your choosing, what is the **lowest** possible game score you can achieve for the word list supplied with this problem set? Store your answer in a variable called `lowest_letters`. (Remember that in order to form a valid game, it must be possible to find at least one pangram in `word_list` using the letters in `lowest_letters`.)

Scoring:
- The student(s) who gets the lowest score will receive +3 pts extra credit;
- The student(s) who gets the second lowest score will receive +2 pts extra credit;
- Any other student whose answer is in the bottom 50% of lowest scores submitted will get +1 pt extra credit.
- If multiple winning entries contain the same set of letters, each entry will receive $\max(1, \frac{1}{\text{num. entries}})$ points.

In [562]:
"""
import random
ll = []
for word in words:
    if len(word)==7:
        ll.append(word)

dis = []
for word in ll:
    dis.append(list(dict.fromkeys(word)))


final = []
for i in range(200):
    for word in ll:
        if dis[i] == list(word):
            final.append(word)

random.shuffle(final)

scores = []
for letters in final:
    letters = ''.join(random.sample(letters,len(letters)))
    scores.append((letters, sum([x[1] for x in spelling_bee(letters, words)])))

print(scores)

#my_score = sum([x[1] for x in spelling_bee(letters, words)])
#print(my_score)
"""
lowest_letters = 'juncatb'

### Question 2: Fun with Strings (4 points)

In this problem, you'll implement a few simple functions for dealing
with strings. You need not perform any error checking in any of the
functions for this problem.

**2(a)** (1 pt.) A palindrome is a word or phrase that reads the same backwards and
    forwards (<https://en.wikipedia.org/wiki/Palindrome>). So, for
    example, the words "level", "kayak" and "pop" are all English
    palindromes, as are the phrases "rats live on no evil star" and "Was
    it a car or a cat I saw?", provided we ignore the spaces and
    punctuation. Write a function called `is_palindrome`, which takes a string as its
    only argument, and returns a Boolean. Your function should return
    `True` if the argument is a palindrome, and `False` otherwise. For the purposes
    of this problem, you may assume that the input is a string and will
    consist only of alphanumeric characters (i.e., the letters, either
    upper or lower case, and the digits 0 through 9) and spaces. Your
    function should ignore spaces and capitalization in assessing
    whether or not a string is a palindrome, so that `Aba` and `ab a` are both
    considered palindromes.

In [563]:
def is_palindrome(string):
    """
    Returns whether a string is a palindrome, ignoring spaces and case.
    """
    cleaned = string.lower().replace(" ", "")
    return cleaned == cleaned[::-1]
    ...

In [564]:
grader.check("q2a")

**2(b)** (1 pt.) Let us say that a word is "abecedarian" if its letters appear in
    alphabetical order (repeated letters are okay). So, for example,
    "adder" and "beet" are abecedarian, whereas "dog" and "cat" are not.
    Write a function `is_abecedarian`, which takes a single argument in the form of a
    string and returns `True` if the argument is abecedarian and `False` otherwise.
    Here you may assume that the input consists only of alphabetic
    characters and spaces. Your function should ignore spaces, so that
    the string `abcd efgh xyz` is considered abecedarian. Your function should ignore
    capitalization, so that `Abc`, `A bc` and `aA Bc` are all considered
    abecedarian.

In [565]:
def is_abecedarian(string):
    """
    Returns whether a string is abecedarian or not.
    """
    string = string.strip().lower().replace(" ", "")
    return all([string[index] <= string[index+1] for index in range(len(string)-1)])
    ...

In [566]:
grader.check("q2b")

**2(c)** (2 pts.) Write a function called `double_vowels` that takes a string as its only argument
    and returns that string with all the vowels duplicated. For the
    purposes of this question, the vowels are the letters "a e i o u".
    Thus, `double_vowels('cat')` should return `caat`, `double_vowels('audio')` should return `aauudiioo`, `double_vowels('Aa')` should return `AAaa`,
    etc. **Hint:** there is a particularly elegant solution to this
    problem that makes use of the accumulator pattern we saw in lecture
    and the fact that Python strings implement the addition operation as
    string concatenation.

In [567]:
def double_vowels(string):
    """
    Returns a string with all the vowels duplicated
    """
    vowels = 'aeiou'
    result = ''
    for letter in string:
        # Compare case-insensitively so 'A' is doubled as well as 'a'.
        if letter.lower() in vowels:
            result = result + 2*letter
        else:
            result = result + letter

    return result
    ...


In [568]:
grader.check("q2c")

### Question 3: Fun with Lists (4 points)

In this problem, you'll implement a few very simple list operations.

**3(a)** (1 pt.) Write a function `list_reverse` that takes a list as an argument and returns
    that list, reversed. That is, given the list `[1,2,3]`, your function
    should return the reversed list, `[3,2,1]`. Your function should raise an
    appropriate error in the event that the input is not a list.

In [569]:
def list_reverse(lis):
    '''
    Returns the reversed list. 
    '''
    if not isinstance(lis, list):
        raise TypeError('list_reverse expects a list')
    return lis[::-1]
    ...


In [570]:
grader.check("q3a")

**3(b)** (1 pt.) Write a function `is_sorted` that takes a sequence `seq` as its only argument
    and returns `True` if the sequence is sorted in non-decreasing order and
    returns `False` otherwise. You may assume that `seq` is, in fact, a Python
    sequence. Your function should require a single traversal of the
    list. You may assume that all
    elements in the input sequence `seq` support the comparison operations
    (`==`, `<`/`>`, `>=`, etc), so that there is no need for error checking.
    (Indeed, if you try to make the comparison, say, `1 < 'cat'`, Python will
    raise an error for you, anyway.) **Note:** this problem illustrates
    a particularly useful aspect of Python's dynamic typing. It is
    possible to write this function while being agnostic as to the type
    of the input variable. It requires only that `seq` supports indexing
    and that the elements in `seq` support the comparison operations.

In [571]:
def is_sorted(seq):
    """
    Returns whether the sequence is sorted.
    """
    # A generator lets all() stop at the first out-of-order pair.
    return all(seq[index] <= seq[index+1] for index in range(len(seq)-1))
    ...

In [572]:
grader.check("q3b")

**3(c)** (2 pts.) This one is a common coding interview question. Write a function
    called `binary_search` that takes two arguments, a list of integers `t` (which is
    guaranteed to be sorted in ascending order) and an integer `elmt`, and
    returns `True` if `elmt` appears in list `t` and `False` otherwise. Of course, you
    could do this with the `in` operator, but that will be slow when the
    list is long, for reasons that we discussed in class. Instead, you
    should use *binary search*: To look for `elmt`, first look at the
    "middle" element of the list `t`. If it's a match, return `True`. If it
    isn't a match, compare `elmt` against the "middle" element, and recurse,
    searching the first or second half of the list depending on whether
    `elmt` is bigger or smaller than the middle element. **Hint:** be
    careful of the *base cases*: What should you do when `t` is empty,
    length 1, length 2, etc.? **Note:** your solution must actually make
    use of binary search to receive credit, and your solution must not
    use any built-in sorting or searching functions. **Note:** we could,
    if we wanted, use the function `is_sorted` that we wrote above to do error
    checking here, but there is a good reason not to do so. This reason
    will become clear when we make our brief foray into the topic of
    *runtime analysis* later in the semester.
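
When testing a hand-written binary search, one option is to compare its answers against an independent reference. The sketch below uses the standard library's `bisect.bisect_left` purely as such a reference; the assignment forbids built-in searching functions in the solution itself, and `reference_contains` is a hypothetical helper name.

```python
import bisect


def reference_contains(t, elmt):
    """Return True if elmt is in the sorted list t (reference for testing only)."""
    i = bisect.bisect_left(t, elmt)  # leftmost insertion point for elmt
    return i < len(t) and t[i] == elmt


t = [1, 3, 5, 7, 9, 11]
print(reference_contains(t, 7))   # True
print(reference_contains(t, 8))   # False
print(reference_contains([], 1))  # False
```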

In [573]:
def binary_search(t, elmt):
    """
    Returns whether elmt is in t
    """
    # Track index bounds instead of filtering t, which would scan the
    # whole list on every step and make the search linear.
    lo, hi = 0, len(t) - 1
    # Invariant: if elmt is in t, its index lies in [lo, hi].
    while lo <= hi:
        mid = (lo + hi) // 2
        if t[mid] == elmt:
            return True
        elif t[mid] < elmt:
            lo = mid + 1   # discard the left half
        else:
            hi = mid - 1   # discard the right half
    return False
    ...

In [574]:
grader.check("q3c")

### Question 4: More Fun with Strings (4 points)

In this problem, you'll implement some very simple counting operations
that are common in fields like biostatistics and natural language
processing. You need not perform any error checking in the functions for
this problem.

**4(a)** (2 pts.) Write a function called `char_hist` that takes a string as its argument and
    returns a dictionary whose keys are characters and values are the
    number of times each character appeared in the input. So, for
    example, given the string "gattaca", your function should return a
    dictionary with key-value pairs `g:1, a:3, t:2, c:1`. Your function should count *all*
    characters in the input (including spaces, tabs, numbers, etc). The
    dictionary returned by your function should have as its keys all and
    only the characters that appeared in the input (i.e., you don't need
    to have a bunch of keys with value 0). Your function should count
    capital and lower-case letters as the same, and key on the
    lower-case version of the character, so that `G` and `g` are both
    counted as the same character, and the corresponding key in the
    dictionary is `g`.
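
Since the `collections` module is allowed in this assignment, one possible way to sanity-check a character histogram is `collections.Counter`. This is only a sketch of an alternative approach, not the required implementation; the variable name `hist` is arbitrary.

```python
from collections import Counter

# Lower-case first so that "G" and "g" share the key "g".
hist = dict(Counter("GattaCA".lower()))
print(hist)  # {'g': 1, 'a': 3, 't': 2, 'c': 1}
```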

In [575]:
def char_hist(string):
    """
    Returns a dictionary whose keys are characters and values
    are the number of times each character appeared in the input.
    """
    d = {}
    for y in string.lower():
        d.setdefault(y,0)
        d[y] = d[y]+1
    return d
    ...

In [576]:
grader.check("q4a")

**4(b)** (2 pts.) In natural language processing and bioinformatics, we often want to
    count how often characters or groups of characters appear. Pairs of
    words or characters are called "bigrams". For our purposes in this
    problem, a bigram is a pair of characters. As an example, the string
    `mississippi` contains the following bigrams, in order: {'mi', 'is', 'ss', 'si', 'is', 'ss', 'si', 'ip', 'pp', 'pi'}. Write a function called `bigram_hist` that takes a string as its
    argument and returns a dictionary whose keys are 2-tuples of
    characters and values are the number of times that pair of
    characters appeared in the string. So, for example, when called on
    the string `mississippi`, your function should return a dictionary with keys
    {(m,i), (i,s), (s,s), (s,i), (i,p), (p,p), (p,i)}
    and respective count values 1,2,2,2,1,1,1. As another example,
    if the two-character string `ab` occurred four times in the input,
    then your function should return a dictionary that includes the
    key-value pair `(a,b):4`. Your function should handle all characters
    (alphanumerics, spaces, punctuation, etc). So, for example, the
    string `cat, dog` includes the bigrams `t,`, `, ` and ` d`. As in the previous
    subproblem, the dictionary produced by your function should only
    include pairs that actually appeared in the input, so that the
    absence of a given key implies that the corresponding two-character
    string did not appear in the input. Also as in the previous
    subproblem, you should count upper- and lower-case letters as the
    same, so that `GA`, `Ga`, `gA` and `ga` all count for the same key, `(g,a)`.
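
The bigram count described above can be sketched by pairing each character with its successor. The name `bigram_counts` is hypothetical, and the use of `collections.Counter` is one possible approach rather than the required one.

```python
from collections import Counter


def bigram_counts(text):
    """Count adjacent character pairs, ignoring case."""
    text = text.lower()
    # zip(text, text[1:]) pairs each character with its successor.
    return dict(Counter(zip(text, text[1:])))


counts = bigram_counts("mississippi")
print(counts[("s", "s")])  # 2
print(counts[("m", "i")])  # 1
```

Keys are 2-tuples of single characters, so a pair that never occurs is simply absent from the dictionary.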

In [577]:
def bigram_hist(string):
    """
    Returns a dictionary whose keys are 2-tuples of characters
    and values are the number of times that pair of characters
    appeared in the string.
    """
    string = string.lower()
    key_pairs = []
    d = {}
    for i in range(len(string)-1):
        key_pairs.append((string[i], string[i+1]))
    for y in key_pairs:
        d.setdefault(y,0)
        d[y] = d[y]+1
    return d
    ...

In [578]:
grader.check("q4b")

### Question 5: Tuples as Vectors (5 points)

In this problem, we’ll see how we can use tuples to represent vectors. Later in the semester, we’ll see the Python numpy and scipy packages, which provide objects specifically meant to enable matrix and vector operations, but for now tuples are all we have. So, for this problem we will represent a d-dimensional vector by a length-d tuple of floats.

**5(a)** (1 pt.) Implement a function called `vec_scalar_mult`, which takes two arguments: a tuple of numbers (floats and/or integers) `t` and a number (float or integer) `s` and returns a tuple of the same length as `t`, with its entries equal to the entries of `t` multiplied by `s`. That is, `vec_scalar_mult` implements multiplication of a vector by a scalar. Your function should check to make sure that the types of the input are appropriate (e.g., that `s` is a float or integer), and raise a `TypeError` with a suitable error message if the types are incorrect. However, your function should gracefully handle the case where the input `s` is an integer rather than a float, or the case where some or all of the entries of the input tuple are integers rather than floats. **Hint:** you may find it useful for this subproblem and the next few that follow it to implement a function that checks whether or not a given tuple is a “valid” vector (i.e., checks if a variable is a tuple and checks that its entries are all floats and/or integers).

In [580]:
def check(t):
    """Return True if t is a tuple whose entries are all floats and/or ints."""
    if type(t) != tuple:
        return False
    for i in t:
        if not (type(i)==float or type(i)==int):
            return False
    return True

def vec_scalar_mult(t, s):
    """
    Returns multiplication of a vector by a scalar.
    """
    if not check(t):
        raise TypeError('t has to be a tuple of floats and/or ints')
    if not (type(s)==float or type(s)==int):
        raise TypeError('s has to be float or int')
    return tuple(number*s for number in t)
    ...

In [581]:
grader.check("q5a")

**5(b)** (1 pt.) Implement a function called `vec_inner_product` which takes two "vectors" (i.e.,
    tuples of floats and/or ints) as its inputs and outputs a float
    corresponding to the inner product of these two vectors. Recall that
    the inner product of vectors $x,y \in \mathbb{R}^d$ is given by
    $\sum_{j=1}^d x_j y_j$. Your function should check whether or not
    the two inputs are of the correct type (i.e., both tuples), and
    raise a `TypeError` if not. Your function should also check whether or not
    the two inputs agree in their dimension (i.e., length, so that the
    inner product is well-defined), and raise a `ValueError` if not.

In [582]:
def vec_inner_product(a, b):
    """
    Returns the inner product of two vectors.
    """
    # Validate each argument separately; `check(a) or check(b) == False`
    # would parse as `check(a) or (check(b) == False)`.
    if not check(a) or not check(b):
        raise TypeError('inputs have to be tuples of floats and/or ints')

    if len(a) != len(b):
        raise ValueError('tuples must have the same dimension')

    inner_product = 0.0
    for (a_j, b_j) in zip(a, b):
        inner_product = inner_product + a_j*b_j

    return inner_product
    ...

In [583]:
grader.check("q5b")

**5(c)** (1 pt.) It is natural, following the above, to extend our scheme to the case
    of matrices. Recall that a matrix is simply a box of numbers. If you
    are not already familiar with matrices, feel free to look them up on
    Wikipedia or in any linear algebra textbook. We will represent a
    matrix as a *tuple of tuples*, i.e., a tuple whose entries are
    themselves tuples. We will represent an $m$-by-$n$ matrix as an
    $m$-tuple of $n$-tuples. To be more concrete, suppose that we are
    representing an $m$-by-$n$ matrix $M$ as a variable `my_mx`. Then `my_mx` will
    be a length-$m$ tuple of $n$-tuples, so that the $i$-th row of the
    matrix is given (as a vector) by the $i$-th entry of tuple `my_mx`.

Write a function `is_valid_mat` that takes a single argument and returns a
    Boolean, which is `True` if the given argument is a tuple that validly
    represents a matrix as described above, and returns `False` otherwise. A
    valid matrix will be a tuple of tuples such that

-   Every element of the tuple is itself a tuple,

-   each of these tuples is the same length, and

-   every element of each of these tuples is a number (i.e., a float or integer).

In [584]:
def is_valid_mat(mat):
    """
    Returns whether mat is a valid matrix.
    """
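    # The matrix itself must be a tuple before its rows are inspected.
    if type(mat) != tuple:
        return False
    for t in mat:
        if check(t) == False:
            return False

    # All rows must have the same length.
    return all(len(t) == len(mat[0]) for t in mat)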
    ...

In [585]:
grader.check("q5c")

**5(d)** (2 pts.) Write a function `mat_vec_mult` that takes a matrix (i.e., tuple of tuples) and
    a vector (i.e., a tuple) as its arguments, and returns a vector
    (i.e., a tuple of numbers) that is the result of multiplying the
    given vector by the given matrix (we are treating vectors as column
    vectors here, so the matrix acts "on the left"). Again, if you are
    not familiar with matrix-vector multiplication, refer to Wikipedia
    or any linear algebra textbook. Your function should check that all
    the supplied arguments are reasonable (e.g., using your function
    `is_valid_mat`), and raise an appropriate error if not. **Hint:** you may find
    it useful to make use of the inner-product function that you defined
    previously.
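
As a small worked example of the row-by-row computation, each entry of the result is the inner product of one matrix row with the vector. The names `row_dot`, `M`, `v` and `result` are hypothetical, and this sketch deliberately omits the input validation that the assignment asks for.

```python
def row_dot(row, vec):
    """Inner product of two equal-length tuples of numbers."""
    return sum(x * y for x, y in zip(row, vec))


M = ((1, 2), (3, 4), (5, 6))  # a 3-by-2 matrix as a tuple of row tuples
v = (1.0, -1.0)               # a length-2 column vector

result = tuple(row_dot(row, v) for row in M)
print(result)  # (-1.0, -1.0, -1.0)
```

A 3-by-2 matrix times a length-2 vector yields a length-3 tuple, one entry per row.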

In [586]:
def mat_vec_mult(mat, vec):
    """
    Returns the result of multiplying the given vector by the given matrix.
    """
    if not is_valid_mat(mat):
        raise TypeError('Incorrect Matrix')
    
    result = ()
    for vector in mat:
        result = result + (vec_inner_product(vector,vec),)
        
    return result
    ...

In [587]:
grader.check("q5d")

### Question 6: More Fun with Vectors (4 points)

In the previous problem, you implemented matrix and vector operations
using tuples to represent vectors. In many applications, it is common to
have vectors of dimension in the thousands or millions, but in which
only a small fraction of the entries are nonzero. Such vectors are
called *sparse* vectors, and if we tried to represent them as tuples, we
would be using thousands of entries just to store zeros, which would
quickly get out of hand if we needed to store hundreds or thousands of
such vectors.

A reasonable solution is to instead represent a sparse vector (or
matrix) by only storing its non-zero entries, with (index, value) pairs.
We will take this approach in this problem, and represent vectors as
dictionaries with positive integer keys (so we index into our vectors
starting from 1, just like in MATLAB and R). A *valid* sparse vector
will be a dictionary that has the properties that (1) all its indices
are positive integers, and (2) all its values are floats.
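
To make the representation concrete, here is a minimal sketch that converts a dense tuple into the dictionary form described above, with 1-based indices and float values. The helper name `to_sparse` is hypothetical and is not part of the assignment.

```python
def to_sparse(dense):
    """Convert a tuple of numbers into a {index: value} dict with 1-based keys."""
    # Zero entries are dropped; values are stored as floats.
    return {i: float(x) for i, x in enumerate(dense, start=1) if x != 0}


print(to_sparse((0, 0, 2.5, 0, -1)))  # {3: 2.5, 5: -1.0}
```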

**6(a)** (2 pts.) Write a function `is_sparse_vector` that takes one argument, and returns `True` if and
    only if the input is a valid sparse vector, and returns `False`
    otherwise. **Note:** your function should *not* assume that the
    input is a dictionary.

In [588]:
def is_sparse_vector(d):
    """
    Returns whether the given variable represents a sparse vector
    """
    if type(d) != dict:
        return False
    for i in d.keys():
        if not (isinstance(i, int) and i > 0 and isinstance(d[i], float)):
            return False
    # Every (index, value) pair passed, so d is a valid sparse vector.
    return True
    ...

In [589]:
grader.check("q6a")

**6(b)** (2 pts.) Write a function `inner_sparse` that takes two "sparse vectors" as its inputs,
    and returns a float that is the value of the inner product of the
    vectors that the inputs represent. Your function should raise an
    appropriate error in the event that either of the inputs is not a
    valid sparse vector. Note that by our definition, a sparse vector
    has no specified dimension, so there is no need to check that the
    dimensions of the arguments agree.
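
Because missing indices stand for zeros, only indices present in both dictionaries contribute to the inner product. The sketch below illustrates this with a set intersection of the key views. The name `sparse_dot` is hypothetical, and validation of the inputs is intentionally left out.

```python
def sparse_dot(u, v):
    """Inner product of two {index: value} dicts; missing keys count as zero."""
    shared = u.keys() & v.keys()  # dict key views support set intersection
    return sum((u[k] * v[k] for k in shared), 0.0)


u = {1: 2.0, 4: 3.0}
v = {4: 0.5, 7: 10.0}
print(sparse_dot(u, v))         # 1.5
print(sparse_dot(u, {2: 1.0}))  # 0.0
```

Starting the sum at `0.0` keeps the result a float even when the two vectors share no indices.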

In [590]:
def inner_sparse(vec1, vec2):
    """
    Returns the inner product of two sparse vectors
    """
    if not is_sparse_vector(vec1) or not is_sparse_vector(vec2):
        raise TypeError('Inputs are not sparse vectors')

    # Only indices present in both vectors contribute to the sum.
    # Start from 0.0 so the result is a float even with no shared indices.
    inner_product = 0.0
    for key in vec1.keys():
        if key in vec2:
            inner_product = inner_product + vec1[key]*vec2[key]
    return inner_product
    ...

In [591]:
grader.check("q6b")

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [592]:
grader.check_all()

q1a results: All test cases passed!

q1b results: All test cases passed!

q1c results: All test cases passed!

q2a results: All test cases passed!

q2b results: All test cases passed!

q2c results: All test cases passed!

q3a results: All test cases passed!

q3b results: All test cases passed!

q3c results: All test cases passed!

q4a results: All test cases passed!

q4b results: All test cases passed!

q5a results: All test cases passed!

q5b results: All test cases passed!

q5c results: All test cases passed!

q5d results: All test cases passed!

q6a results: All test cases passed!

q6b results: All test cases passed!

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

Upload this .zip file to Gradescope for grading.

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False)