## Week 6 Assignment - W200 Introduction to Data Science Programming, UC Berkeley MIDS

Write code in this Jupyter Notebook to solve each of the following problems. Each problem should have its solution in a separate cell. Please upload this **notebook**, your **scrabble.py** file, the **sowpods.txt** file, and your **wordscore** module (which contains the `score_word` function) to your GitHub repository in your SUBMISSIONS/week_06 folder by 11:59PM PST the night before class.

## 6-2. Cheating at Scrabble (90 points + 10 Extra Credit Points)

Write a Python program that takes a Scrabble rack as a command-line argument and prints all "valid Scrabble English" words that can be constructed from that rack, along with their Scrabble scores, sorted by score. The list of "valid Scrabble English" words is provided in the data source below. A Scrabble rack is made up of 2 to 7 characters.

Below are the requirements for the program:
- The program must run as a command-line tool, as shown below (not through an `input` statement!)
- Name the python file: `scrabble.py`
- Make a separate module named `wordscore` which contains at a minimum a function called `score_word`. This `score_word` function will take each word and return the score (scoring dictionary is described below). Import this function into your main `scrabble.py` program. 
- Allow anywhere from 2-7 character tiles (letters A-Z) to be inputted. 
- Do not restrict the number of same tiles (e.g., a user is allowed to input ZZZZZQQ).
- Output the **total** list of valid Scrabble words that can be constructed from the rack as (score, word) tuples, sorted by score from highest to lowest and then alphabetically by word, as shown in the first example below.
- Then output 'Total number of words:' and the total number.
- Handle input errors from the user with a helpful error message that suggests what might have caused the error and how to fix it.
- Implement wildcards as either `*` or `?`. A rack may contain at most two wildcards in total, and only one of each character (one `*` and one `?`). Use only `*` and `?` as wildcard characters. A wildcard can take any value A-Z. Replace the wildcard symbol with the letter in your answer (see the second example below).
- Wildcard characters are scored as 0 points, just like in the real Scrabble game. A word made entirely from two wildcards is valid: it should be output and scored as 0 points.
  - In a wildcard case where the same word can be made with or without the wildcard, display the highest score. For example: given the input 'I?F', the word 'if' can be made with the wildcard '?F' as well as the letters 'IF'. Since using the letters 'IF' scores higher, display that score.
- For partial credit, your program should take less than one minute to run with 2 wildcards in the input. For full credit, the program needs to run with 2 wildcards in less than 30 seconds.
- Write docstrings for the functions and put comments in your code.
- You may only use the Python standard library in this assignment. However, any function in the standard library is allowed.
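
The input rules in the list above (2 to 7 tiles, letters A-Z, at most one `*` and one `?`) could be checked along the following lines. This is only a minimal sketch: the helper name `validate_rack` and the wording of the messages are assumptions made for illustration, not part of the assignment.

```python
import string

VALID_LETTERS = set(string.ascii_uppercase)
WILDCARDS = ("*", "?")


def validate_rack(rack):
    """Return the rack in upper case, or raise ValueError with a helpful message.

    Hypothetical helper: the name and the messages are illustrative only.
    """
    rack = rack.upper()
    if not 2 <= len(rack) <= 7:
        raise ValueError(
            "The rack must contain 2 to 7 tiles, but {} were given.".format(len(rack)))
    for wildcard in WILDCARDS:
        if rack.count(wildcard) > 1:
            raise ValueError(
                "Only one '{}' wildcard is allowed per rack.".format(wildcard))
    # Anything that is neither a letter A-Z nor a wildcard is rejected.
    bad = sorted(set(rack) - VALID_LETTERS - set(WILDCARDS))
    if bad:
        raise ValueError(
            "Invalid tile(s) {}: use only letters A-Z, '*' and '?'.".format(bad))
    return rack
```

A caller could wrap `validate_rack` in a `try`/`except ValueError` block and print the message instead of letting a traceback reach the user.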

An example invocation and output:
```
$ python scrabble.py "ZAEFIEE"
(17, feaze)
(17, feeze)
(16, faze)
(15, fez)
(15, fiz)
(12, zea)
(12, zee)
(11, za)
(6, fae)
(6, fee)
(6, fie)
(5, ef)
(5, fa)
(5, fe)
(5, if)
(2, ae)
(2, ai)
(2, ea)
(2, ee)
Total number of words: 19
```
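
The ordering in the example above (highest score first, ties broken alphabetically) can be produced with a single sort key. The sketch below uses a few pairs copied from that output; the variable names are illustrative only.

```python
# A few (score, word) pairs copied from the example output, in scrambled order.
results = [(5, "if"), (17, "feeze"), (2, "ae"), (17, "feaze"), (5, "ef")]

# Negating the score sorts high scores first; ties fall back to the word, A to Z.
ordered = sorted(results, key=lambda pair: (-pair[0], pair[1]))

# Format each pair by hand: printing the tuple directly would add quotes
# around the word, which the example output does not show.
for score, word in ordered:
    print("({}, {})".format(score, word))
print("Total number of words:", len(ordered))
```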

An example wildcard invocation and output:
```
$ python scrabble.py "?F"
(4, ef)
(4, fa)
(4, fe)
(4, fy)
(4, if)
(4, of)
Total number of words: 6
```
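
One possible way to decide whether a word can be built from a rack with wildcards is to count letters with `collections.Counter`. The sketch below is an assumption about how this might be done, not the required approach; `wildcard_letters` is a hypothetical helper and the rack is assumed to have been validated already.

```python
from collections import Counter


def wildcard_letters(word, rack):
    """Return the letters of `word` that must be played with wildcards,
    or None if `word` cannot be built from `rack`.

    Hypothetical helper; `rack` is assumed to be validated already.
    """
    word, rack = word.lower(), rack.lower()
    wildcards = rack.count("*") + rack.count("?")
    # Counter subtraction keeps only positive counts, i.e. the letters
    # the word needs beyond what the real tiles provide.
    missing = Counter(word) - Counter(rack)
    if sum(missing.values()) > wildcards:
        return None
    return missing


print(wildcard_letters("fy", "?F"))
print(wildcard_letters("if", "I?F"))
print(wildcard_letters("fez", "?F"))
```

Because real tiles are always used before wildcards, scoring the word and then subtracting the values of the returned letters gives the highest possible score for that word, which is the behaviour the 'I?F' rule asks for. Note that an empty `Counter` is falsy, so a caller should test the result with `is None` rather than by truthiness.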

#### Extra Credit (+10 points):
Requirements:
- Allow a user to specify that a certain letter has to be at a certain location. For the extra credit, locations of certain letters must be specified at the command line, and may not be some sort of user prompt. How you do this is up to you!
- This needs to be included and called from your regular scrabble.py file and work with the base requirements above. That is, your program must work with or without the extra credit portion and the extra credit cannot be in a different .py file.  
- Please put comments, any assumptions you made, and a sample of how to run your extra credit in the extra credit cell of this notebook below - it is the last cell. If there is not an example of how to run the extra credit in this cell we will assume that you did not do the extra credit part!
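
Since the approach is left open, here is one hedged sketch of a command-line option for letter positions. It follows the idea described in the sample extra credit cell at the end of this notebook (a `--constraints` regular expression matched from the start of each word); `build_parser` and `apply_constraints` are hypothetical names used only for illustration.

```python
import argparse
import re


def build_parser():
    """Build a parser with a required rack and an optional constraint pattern."""
    parser = argparse.ArgumentParser(description="Scrabble words from tiles.")
    parser.add_argument("rack", help="Two to seven tiles, e.g. PENGU*?")
    parser.add_argument("--constraints", "-c", default=None,
                        help="Regular expression matched from the start of each word.")
    return parser


def apply_constraints(words, pattern):
    """Keep only the words whose beginning matches `pattern` (all words if None)."""
    if pattern is None:
        return list(words)
    regex = re.compile(pattern)
    return [word for word in words if regex.match(word)]


# Passing a list to parse_args stands in for the real command line.
args = build_parser().parse_args(["Pengu*n", "--constraints", "..n.*"])
print(apply_constraints(["penguin", "unpeg", "genip"], args.constraints))
```

With the pattern `..n.*`, only words with "n" in the third position are kept, so "unpeg" is filtered out. Leaving the option off keeps the base behaviour unchanged.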

#### The Data
The file at http://courses.cms.caltech.edu/cs11/material/advjava/lab1/sowpods.zip (also available at https://drive.google.com/file/d/1ewUiZL_4HanCDsaYB5pcKEgqjMFVgGnh/view?usp=sharing) contains all "valid Scrabble English" words in the official word list, one word per line. You should download the word file and keep it in your repository so that the program is standalone (instead of accessing it over the web from Python).

You can read data from a text file with the following code:
```
with open("sowpods.txt","r") as infile:
    raw_input = infile.readlines()
    data = [datum.strip('\n') for datum in raw_input]
```

This will show the first 6 words:
```
print(data[0:6])
```
Please use the dictionary below containing the letters and their Scrabble values:
```
scores = {"a": 1, "c": 3, "b": 3, "e": 1, "d": 2, "g": 2,
         "f": 4, "i": 1, "h": 4, "k": 5, "j": 8, "m": 3,
         "l": 1, "o": 1, "n": 1, "q": 10, "p": 3, "s": 1,
         "r": 1, "u": 1, "t": 1, "w": 4, "v": 4, "y": 4,
         "x": 8, "z": 10}
```
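
A minimal sketch of what `score_word` might look like, assuming the `scores` dictionary above lives in the `wordscore` module. Wildcard handling is deliberately left out of this version.

```python
scores = {"a": 1, "c": 3, "b": 3, "e": 1, "d": 2, "g": 2,
          "f": 4, "i": 1, "h": 4, "k": 5, "j": 8, "m": 3,
          "l": 1, "o": 1, "n": 1, "q": 10, "p": 3, "s": 1,
          "r": 1, "u": 1, "t": 1, "w": 4, "v": 4, "y": 4,
          "x": 8, "z": 10}


def score_word(word):
    """Return the Scrabble score of `word`, ignoring wildcards.

    The word is lower-cased first because the dictionary keys are lower case.
    """
    return sum(scores[letter] for letter in word.lower())


print(score_word("feaze"))
```

The main program could then use `from wordscore import score_word`, as the requirements ask.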

#### Tips:
- If you don't know what Scrabble is or the basic background of the game, please look it up online!
- We recommend that you try to break down the problem into steps on your own before writing any code. Once you've scoped generally what you want to do, then start writing some code.  If you get stuck, go back to thinking about the problem rather than trying to fix lots of errors at the code level.
- If you have questions on getting arguments from the command line, please review async video 6.17 and Drill 6.18.
- If you keep getting stuck, then check out: https://wiki.openhatch.org/wiki/Scrabble_challenge or https://drive.google.com/file/d/1g3yz5ljkzaAeQ-AgQR1Hofy8ZJ0jo25x/view?usp=sharing. This is where we got the idea for this assignment and it provides some helpful tips for guiding you along the way.  However, we would recommend that you try to implement this first before looking at the hints on the website.

Good luck!

### The code below will test your command line implementation of the scrabble.py code. We've made some of these tests available for you to try!

In [1]:
# Code for the testing

import subprocess
from nose.tools import assert_equal 
from nose.tools import assert_true
from nose.tools import assert_greater
from nose.tools import assert_less

In [3]:
""" Code runs and can produce at least one error message """
# Autograde cell - do not erase/delete

# no rack error
!python scrabble.py  

usage: scrabble.py [-h] [--constraints CONSTRAINTS] [--timer] rack
scrabble.py: error: the following arguments are required: rack


In [4]:
""" Does not fail due to trivial mistakes and takes correct wildcard characters """
# Autograde cell - do not erase/delete

# does not fail due to case
!python scrabble.py "PENguin"

(10, penguin)
(9, pening)
(8, genip)
(8, unpeg)
(7, ingenu)
(7, penni)
(7, ping)
(7, pung)
(7, unpen)
(7, unpin)
(6, gip)
(6, gup)
(6, peg)
(6, pein)
(6, peni)
(6, pig)
(6, pine)
(6, pug)
(5, ennui)
(5, genu)
(5, gien)
(5, ginn)
(5, nep)
(5, nip)
(5, pen)
(5, pie)
(5, pin)
(5, piu)
(5, pun)
(4, eng)
(4, gen)
(4, gie)
(4, gin)
(4, gnu)
(4, gue)
(4, gun)
(4, neg)
(4, nine)
(4, pe)
(4, pi)
(4, up)
(3, gi)
(3, gu)
(3, inn)
(3, nie)
(3, nun)
(3, ug)
(3, uni)
(2, en)
(2, in)
(2, ne)
(2, nu)
(2, un)
Total number of words: 53


In [5]:
# Autograde cell - do not erase/delete

# takes wildcards

!python scrabble.py "PEN*?in"

(7, enprint)
(7, neaping)
(7, ninepin)
(7, opening)
(7, pannier)
(7, pantine)
(7, peaning)
(7, peening)
(7, peining)
(7, pending)
(7, penguin)
(7, pening)
(7, penni)
(7, pennia)
(7, pennied)
(7, pennies)
(7, pennill)
(7, pennine)
(7, penning)
(7, pennis)
(7, pension)
(7, pfennig)
(7, pinbone)
(7, pinene)
(7, pinenes)
(7, pinken)
(7, pinkens)
(7, pinnace)
(7, pinnae)
(7, pinnate)
(7, pinned)
(7, pinner)
(7, pinners)
(7, pinnet)
(7, pinnets)
(7, pinnie)
(7, pinnies)
(7, pinnoed)
(7, pinnule)
(7, pinones)
(7, pontine)
(7, punnier)
(7, spinner)
(7, spinnet)
(7, spinney)
(7, spinone)
(7, tenpin)
(7, tenpins)
(6, alpine)
(6, apneic)
(6, aspine)
(6, dipnet)
(6, epigon)
(6, genip)
(6, genips)
(6, gipsen)
(6, hippen)
(6, impend)
(6, impone)
(6, incept)
(6, inept)
(6, inspan)
(6, instep)
(6, kippen)
(6, leptin)
(6, lineup)
(6, lippen)
(6, loipen)
(6, lupine)
(6, mispen)
(6, naping)
(6, napkin)
(6, nappie)
(6, nepit)
(6, ne

(2, tige)
(2, tike)
(2, tile)
(2, time)
(2, tin)
(2, tind)
(2, ting)
(2, tink)
(2, tins)
(2, tint)
(2, tiny)
(2, tire)
(2, tite)
(2, tone)
(2, trie)
(2, trin)
(2, tune)
(2, twin)
(2, tyin)
(2, tyne)
(2, unai)
(2, unbe)
(2, unce)
(2, unci)
(2, unde)
(2, uni)
(2, unis)
(2, unit)
(2, vain)
(2, vane)
(2, veil)
(2, vena)
(2, vend)
(2, vent)
(2, viae)
(2, vibe)
(2, vice)
(2, vide)
(2, vie)
(2, vied)
(2, vier)
(2, vies)
(2, view)
(2, vile)
(2, vin)
(2, vina)
(2, vino)
(2, vins)
(2, vint)
(2, viny)
(2, vire)
(2, vise)
(2, vite)
(2, vive)
(2, vlei)
(2, wain)
(2, wane)
(2, wean)
(2, ween)
(2, weid)
(2, weil)
(2, weir)
(2, wen)
(2, wena)
(2, wend)
(2, wens)
(2, went)
(2, when)
(2, whin)
(2, wice)
(2, wide)
(2, wiel)
(2, wife)
(2, wile)
(2, win)
(2, wind)
(2, wing)
(2, wink)
(2, wino)
(2, wins)
(2, winy)
(2, wire)
(2, wise)
(2, wite)
(2, wive)
(2, wren)
(2, wynn)
(2, yean)
(2, yen)
(2, yens)
(2, yeti)
(2, yi

In [6]:
""" Produces a list of all words and scores that matches our expectations """
# Autograde cell - do not erase/delete

# Windows and Mac/Linux end lines differently:
# - Windows ends each line with \r\n
# - Mac/Linux ends each line with just \n
# The autograder will detect the system platform and use that to determine the 'correct' solution

# The code below is shown here so you know how the autograder works
# Try this block to see if your solution matches
# If your answer looks like it matches but still throws an 'autograder' error, please double check your answer, BUT
# the purpose of the assignment isn't to make your code match the autograder - if your answer is correct it will be graded as correct
# We will check every answer the autograder deems incorrect

import platform

# test whether your output matches our expectation
cmd = ['python', 'scrabble.py', 'Penguin']
test = bytes.decode(subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()[0])

if platform.system() == 'Windows':
    solution = '(10, penguin)\r\n(9, pening)\r\n(8, genip)\r\n(8, unpeg)\r\n(7, ingenu)\r\n(7, penni)\r\n(7, ping)\r\n(7, pung)\r\n(7, unpen)\r\n(7, unpin)\r\n(6, gip)\r\n(6, gup)\r\n(6, peg)\r\n(6, pein)\r\n(6, peni)\r\n(6, pig)\r\n(6, pine)\r\n(6, pug)\r\n(5, ennui)\r\n(5, genu)\r\n(5, gien)\r\n(5, ginn)\r\n(5, nep)\r\n(5, nip)\r\n(5, pen)\r\n(5, pie)\r\n(5, pin)\r\n(5, piu)\r\n(5, pun)\r\n(4, eng)\r\n(4, gen)\r\n(4, gie)\r\n(4, gin)\r\n(4, gnu)\r\n(4, gue)\r\n(4, gun)\r\n(4, neg)\r\n(4, nine)\r\n(4, pe)\r\n(4, pi)\r\n(4, up)\r\n(3, gi)\r\n(3, gu)\r\n(3, inn)\r\n(3, nie)\r\n(3, nun)\r\n(3, ug)\r\n(3, uni)\r\n(2, en)\r\n(2, in)\r\n(2, ne)\r\n(2, nu)\r\n(2, un)\r\nTotal number of words: 53\r\n'
else:
    solution = '(10, penguin)\n(9, pening)\n(8, genip)\n(8, unpeg)\n(7, ingenu)\n(7, penni)\n(7, ping)\n(7, pung)\n(7, unpen)\n(7, unpin)\n(6, gip)\n(6, gup)\n(6, peg)\n(6, pein)\n(6, peni)\n(6, pig)\n(6, pine)\n(6, pug)\n(5, ennui)\n(5, genu)\n(5, gien)\n(5, ginn)\n(5, nep)\n(5, nip)\n(5, pen)\n(5, pie)\n(5, pin)\n(5, piu)\n(5, pun)\n(4, eng)\n(4, gen)\n(4, gie)\n(4, gin)\n(4, gnu)\n(4, gue)\n(4, gun)\n(4, neg)\n(4, nine)\n(4, pe)\n(4, pi)\n(4, up)\n(3, gi)\n(3, gu)\n(3, inn)\n(3, nie)\n(3, nun)\n(3, ug)\n(3, uni)\n(2, en)\n(2, in)\n(2, ne)\n(2, nu)\n(2, un)\nTotal number of words: 53\n'

print("\nDoes your output match our expectation?", test == solution)


Does your output match our expectation? True
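
If you would like to compare output on your own machine without keeping two platform-specific strings, one option is to let `subprocess` normalize line endings. This is only a sketch of an alternative, not how the autograder works; the child command below is a stand-in for `['python', 'scrabble.py', 'Penguin']`.

```python
import subprocess
import sys

# Stand-in for ['python', 'scrabble.py', 'Penguin']: a child process that
# writes one Windows-style and one Unix-style line ending as raw bytes.
cmd = [sys.executable, "-c",
       r"import sys; sys.stdout.buffer.write(b'(4, up)\r\n(2, un)\n')"]

# universal_newlines=True (spelled text=True from Python 3.7) decodes the
# output and converts every line ending to '\n'.
result = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True)
lines = result.stdout.splitlines()
print(lines)
```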


In [7]:
""" The code should run in less than 30 seconds """
# Autograde cell - do not erase/delete

import time
start = time.time()

#test the code in the command line
cmd = ['python', 'scrabble.py', 'PENGU*?']
out = bytes.decode(subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()[0])

tot_time = time.time() - start
print('Total time was {} seconds'.format(tot_time))
assert_less(tot_time, 30)

Total time was 0.30055856704711914 seconds


In [8]:
# Autograding test

In [9]:
# Autograding test

In [10]:
# Autograding test

In [11]:
# Autograding test

In [12]:
# Autograding test

In [13]:
# Autograding test

In [22]:
# Autograding test

In [16]:
# I have added two additional optional arguments to my scrabble program.
# --constraints or -c followed by a regular expression in Python standard format will filter the results accordingly.
# --timer or -t enables a timer
# Use scrabble.py -h for more information.
#
# Please note that the constraints expression is matched against the start of each result string, as opposed
# to using the Python search approach, which could match in multiple places.
#
# While the full flexibility of regular expressions is available, to specify a specific character in a given position
# the user can use the "." character to match each character up to the position in question and then place the
# character of choice at that spot.

! echo scrabble.py -h to show the help message:
! echo    
! python scrabble.py -h
! echo
! echo scrabble.py Pengu*n --constraints ..n.* matches "n" in the third position with zero or more characters to follow:
! echo    
! python scrabble.py Pengu*n --constraints ..n.*
! echo
! echo scrabble.py Pengu*n --constraints .u.n.+ --timer matches "u" in the second position, "n" in the fourth position with one or more characters to follow, and a timer:
! echo
! python scrabble.py Pengu*n --constraints .u.n.+ --timer

scrabble.py -h to show the help message:

usage: scrabble.py [-h] [--constraints CONSTRAINTS] [--timer] rack

Scrabble words from tiles.

positional arguments:
  rack                  A valid tile rack with two to seven tiles. Tiles can
                        be any upper or lower case alphabetic character. In
                        addition the tiles may include up to two wildcards, a
                        maximum of one '*' and a maximum of one '?'.

optional arguments:
  -h, --help            show this help message and exit
  --constraints CONSTRAINTS, -c CONSTRAINTS
                        A regular expression to constrain results. The
                        constraint expression will follow Python regular
                        expression formatting rules. Expressions are run
                        against the beginning of the string with Python match
                        and not Python search.
  --timer, -t           If specified, add a performance timer.

scrabble.py P

## If you have feedback for this homework, please submit it using the link below:

http://goo.gl/forms/74yCiQTf6k