Welcome to the exercises for day 6 (to go along with the day 6 tutorial notebook on [strings and dictionaries](https://www.kaggle.com/colinmorris/learn-python-challenge-day-6)).

Run the setup code below before working on the questions (and run it again if you leave this notebook and come back later).

In [1]:
"""
import sys; sys.path.insert(0, '../input/learntools/learntools')
from learntools.python import binder; binder.bind(globals())
from learntools.python.ex4 import *
print('Setup complete.')
"""
import sys
import os
ltp = os.path.abspath('../../../')
sys.path.append(ltp)
from learntools.python import binder
binder.bind(globals())
from learntools.python.ex6 import *

# Exercises

## 0. 

Let's start with a string lightning round! What are the lengths of the strings below?

For each of the five strings below, predict what `len()` would return when passed that string. Use the variable `length` to record your answer, then run the cell to check whether you were right.

In [2]:
a = ""
# Don't just write length = len(a). Come up with your own predictions.
length = 0
q0.a.check()

<span style="color:#33cc33">Correct:</span> 

The empty string has length zero. Note that the empty string is also the only string that Python treats as `False` when converting to a boolean.
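
As a minimal sketch of that point, the snippet below applies `bool()` to a few strings. The sample strings are illustrative and are not part of the exercise.

```python
# Only the empty string converts to False. Any non-empty string is truthy,
# even one holding just a space or the text "False".
samples = ["", " ", "0", "False"]
truthiness = [bool(s) for s in samples]
print(truthiness)  # [False, True, True, True]
```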

In [30]:
b = "it's ok"
length = len(b)
q0.b.check()

<span style="color:#33cc33">Correct:</span> 

Keep in mind Python includes spaces (and punctuation) when counting string length.

In [31]:
c = 'it\'s ok'
length = len(c)
q0.c.check()

<span style="color:#33cc33">Correct:</span> 

Even though we use different syntax to create it, the string `c` is identical to `b`. In particular, note that the backslash is not part of the string, so it doesn't contribute to its length.
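
One way to convince yourself of this is to compare the two strings directly. This is a small sketch reusing the `b` and `c` values from the cells above.

```python
b = "it's ok"
c = 'it\'s ok'
# The backslash only tells Python that the following quote does not end
# the string literal; it is not stored in the string itself.
same = (b == c)
print(same, len(b), len(c))  # True 7 7
```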

In [32]:
d = """hey"""
length = len(d)
q0.d.check()

<span style="color:#33cc33">Correct:</span> 

The fact that this string was created using triple-quote syntax doesn't make any difference in terms of its content or length. This string is exactly the same as `'hey'`.

In [33]:
e = '\n'
length = len(e)
q0.e.check()

<span style="color:#33cc33">Correct:</span> 

The newline character is just a single character! (Even though we represent it to Python using a combination of two characters.)
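
For contrast, here is a short sketch comparing the newline escape with an escaped backslash followed by the letter `n`. The variable name `backslash_n` is made up for this illustration.

```python
e = '\n'             # one character: a newline
backslash_n = '\\n'  # two characters: a literal backslash, then the letter n
print(len(e), len(backslash_n))  # 1 2
```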

## 1.

There is a saying that "Data scientists spend 80% of their time cleaning data, and 20% of their time complaining about cleaning data." Let's see if you can write a function to help clean data with US zip codes.

For our purposes, a valid zip code is any string consisting of exactly 5 digits.

HINT: `str` has a method that will be useful here. Use `help(str)` to review a list of string methods.

In [7]:
def is_valid_zip(zip_code):
    """Returns whether the input string is a valid (5 digit) zip code
    """
    pass

q1.check()

<span style="color:#ccaa33">Check:</span> When you've updated the starter code, `check()` will tell you whether your code is correct. 

In [34]:
def is_valid_zip(zip_code):
    """Returns whether the input string is a valid (5 digit) zip code
    """
    return zip_code.isdigit() and len(zip_code) == 4

q1.check()

<span style="color:#cc3333">Incorrect:</span> Expected return value of `True` given `zip_code='12345'`, but got `False` instead.

In [10]:
def is_valid_zip(zip_code):
    """Returns whether the input string is a valid (5 digit) zip code
    """
    return len(zip_code) == 5 and zip_code.isdigit()

q1.check()

<span style="color:#33cc33">Correct</span>

In [11]:
q1.hint()
q1.solution()

<span style="color:#3366cc">Hint:</span> Try looking up `help(str.isdigit)`

<span style="color:#33cc99">Solution:</span> 
```python
def is_valid_zip(zip_str):
    return len(zip_str) == 5 and zip_str.isdigit()
```
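
To see how the two conditions work together, here is a sketch that runs the solution on a handful of hypothetical inputs (the sample strings are assumptions chosen to hit each failure mode, not the checker's actual test cases).

```python
def is_valid_zip(zip_str):
    return len(zip_str) == 5 and zip_str.isdigit()

# Hypothetical inputs: correct, too short, too long, contains a letter,
# contains a space, and empty.
samples = ['12345', '1234', '123456', '1234x', '12 45', '']
results = {s: is_valid_zip(s) for s in samples}
print(results)
```

Only `'12345'` passes both checks. The strings `'1234x'` and `'12 45'` have the right length but fail `isdigit()`, which requires every character to be a digit.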

## 2.

A researcher has gathered thousands of news articles. But she wants to focus her attention on articles including a specific word. Complete the function below to help her filter her list of articles.

Your function should meet the following criteria:

- Do not include documents where the keyword string shows up only as a part of a larger word. For example, if she were looking for the keyword “closed”, you would not include the string “enclosed.” 
- She does not want you to distinguish upper case from lower case letters. So the phrase “Closed the case.” would be included when the keyword is “closed”.
- Do not let periods or commas affect what is matched. “It is closed.” would be included when the keyword is “closed”. But you can assume there are no other types of punctuation.


In [12]:
def word_search(doc_list, keyword):
    """
    Takes a list of documents (each document is a string) and a keyword. 
    Returns list of the index values into the original list for all documents 
    containing the keyword.

    Example:
    doc_list = ["The Learn Python Challenge Casino.", "They bought a car", "Casinoville"]
    >>> word_search(doc_list, 'casino')
    >>> [0]
    """
    pass


q2.check()

<span style="color:#ccaa33">Check:</span> When you've updated the starter code, `check()` will tell you whether your code is correct. 

In [13]:
def word_search(doc_list, keyword):
    """
    Takes a list of documents (each document is a string) and a keyword. 
    Returns list of the index values into the original list for all documents 
    containing the keyword.

    Example:
    doc_list = ["The Learn Python Challenge Casino.", "They bought a car", "Casinoville"]
    >>> word_search(doc_list, 'casino')
    >>> [0]
    """
    return [i for (i, s) in enumerate(doc_list) if keyword in s.lower().replace(".", " ").replace(",", " ").split()]


q2.check()

<span style="color:#33cc33">Correct</span>

In [14]:
def word_search(doc_list, keyword):
    """
    Takes a list of documents (each document is a string) and a keyword. 
    Returns list of the index values into the original list for all documents 
    containing the keyword.

    Example:
    doc_list = ["The Learn Python Challenge Casino.", "They bought a car", "Casinoville"]
    >>> word_search(doc_list, 'casino')
    >>> [0]
    """
    res = []
    i = 0
    for d in doc_list:
        if keyword in d:
            res.append(i)
        i += 1


q2.check()

<span style="color:#cc3333">Incorrect:</span> Expected return value of `[0]` given `doc_list=['The Learn Python Challenge Casino', 'They bought a car, and a horse', 'Casinoville?']`, `keyword='casino'`, but got `None` instead.

In [15]:
def word_search(doc_list, keyword):
    """
    Takes a list of documents (each document is a string) and a keyword. 
    Returns list of the index values into the original list for all documents 
    containing the keyword.

    Example:
    doc_list = ["The Learn Python Challenge Casino.", "They bought a car", "Casinoville"]
    >>> word_search(doc_list, 'casino')
    >>> [0]
    """
    res = []
    i = 0
    for d in doc_list:
        if keyword in d:
            res.append(i)
        i += 1
    return res


q2.check()

<span style="color:#cc3333">Incorrect:</span> Expected return value of `[0]` given `doc_list=['The Learn Python Challenge Casino', 'They bought a car, and a horse', 'Casinoville?']`, `keyword='casino'`, but got `[]` instead.

In [16]:
def word_search(doc_list, keyword):
    """
    Takes a list of documents (each document is a string) and a keyword. 
    Returns list of the index values into the original list for all documents 
    containing the keyword.

    Example:
    doc_list = ["The Learn Python Challenge Casino.", "They bought a car", "Casinoville"]
    >>> word_search(doc_list, 'casino')
    >>> [0]
    """
    res = []
    i = 0
    for d in doc_list:
        if keyword.lower() in [token.lower().strip('.,') for token in d.split()]:
            res.append(i)
        i += 1
    return res


q2.check()

<span style="color:#33cc33">Correct</span>

In [17]:
q2.hint()
q2.solution()

<span style="color:#3366cc">Hint:</span> Some methods that may be useful here: `str.split()`, `str.strip()`, `str.lower()`.

<span style="color:#33cc99">Solution:</span> 
```python
def word_search(documents, keyword):
    indices = []
    for i, doc in enumerate(documents):
        tokens = doc.split()
        normalized = [token.rstrip('.,').lower() for token in tokens]
        if keyword.lower() in normalized:
            indices.append(i)
    return indices
```
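
The sketch below illustrates why the solution compares against a list of normalized tokens rather than testing `keyword in doc` directly. The sample document is made up for this illustration.

```python
doc = "Casinoville is enclosed, they say."
keyword = "closed"

# Substring test: also matches inside the larger word "enclosed".
substring_hit = keyword in doc.lower()

# Token test: split on whitespace, trim trailing periods and commas,
# lowercase, then look for an exact match among whole words.
tokens = [token.rstrip('.,').lower() for token in doc.split()]
token_hit = keyword in tokens

print(tokens)
print(substring_hit, token_hit)  # True False
```

When the right-hand side of `in` is a string, Python does a substring search. When it is a list, Python checks for an element equal to the keyword, which is what the "whole word only" requirement calls for.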

## 3.

Now the researcher wants to supply multiple keywords to search for. Complete the function below to help her.

(You're encouraged to use the `word_search` function you just wrote when implementing this function. Reusing code in this way makes your programs more robust and readable - and it saves typing!)

In [18]:
def multi_word_search(doc_list, keywords):
    """
    Takes list of documents (each document is a string) and a list of keywords.  
    Returns a dictionary where each key is a keyword, and the value is a list of indices
    (from doc_list) of the documents containing that keyword

    >>> doc_list = ["The Learn Python Challenge Casino.", "They bought a car and a casino", "Casinoville"]
    >>> keywords = ['casino', 'they']
    >>> multi_word_search(doc_list, keywords)
    {'casino': [0, 1], 'they': [1]}
    """
    pass

q3.check()

<span style="color:#ccaa33">Check:</span> When you've updated the starter code, `check()` will tell you whether your code is correct. 

In [19]:
def multi_word_search(doc_list, keywords):
    """
    Takes list of documents (each document is a string) and a list of keywords.  
    Returns a dictionary where each key is a keyword, and the value is a list of indices
    (from doc_list) of the documents containing that keyword

    >>> doc_list = ["The Learn Python Challenge Casino.", "They bought a car and a casino", "Casinoville"]
    >>> keywords = ['casino', 'they']
    >>> multi_word_search(doc_list, keywords)
    {'casino': [0, 1], 'they': [1]}
    """
    result = {}
    for keyword in keywords:
        result[keyword] = word_search(doc_list, keyword)
    return result

q3.check()

<span style="color:#33cc33">Correct</span>

In [20]:
def multi_word_search(doc_list, keywords):
    """
    Takes list of documents (each document is a string) and a list of keywords.  
    Returns a dictionary where each key is a keyword, and the value is a list of indices
    (from doc_list) of the documents containing that keyword

    >>> doc_list = ["The Learn Python Challenge Casino.", "They bought a car and a casino", "Casinoville"]
    >>> keywords = ['casino', 'they']
    >>> multi_word_search(doc_list, keywords)
    {'casino': [0, 1], 'they': [1]}
    """
    res = {}
    for k in keywords:
        res[k] = word_search(doc_list, k)

q3.check()

<span style="color:#cc3333">Incorrect:</span> Expected return value of `{}` given `doc_list=['The Learn Python Challenge Casino', 'They bought a car', 'Casinoville?']`, `keywords=[]`, but got `None` instead.

In [21]:
def multi_word_search(doc_list, keywords):
    """
    Takes list of documents (each document is a string) and a list of keywords.  
    Returns a dictionary where each key is a keyword, and the value is a list of indices
    (from doc_list) of the documents containing that keyword

    >>> doc_list = ["The Learn Python Challenge Casino.", "They bought a car and a casino", "Casinoville"]
    >>> keywords = ['casino', 'they']
    >>> multi_word_search(doc_list, keywords)
    {'casino': [0, 1], 'they': [1]}
    """
    res = {}
    for k in keywords:
        res[k] = word_search(doc_list, k)
    return res

q3.check()

<span style="color:#33cc33">Correct</span>

In [22]:
q3.solution()

<span style="color:#33cc99">Solution:</span> 
```python
def multi_word_search(documents, keywords):
    keyword_to_indices = {}
    for keyword in keywords:
        keyword_to_indices[keyword] = word_search(documents, keyword)
    return keyword_to_indices
```
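
As an alternative sketch (not the official solution), the same dictionary can be built with a dict comprehension. The `word_search` below repeats the approach from question 2 so the snippet runs on its own.

```python
def word_search(documents, keyword):
    # Same approach as the solution to question 2.
    indices = []
    for i, doc in enumerate(documents):
        tokens = [token.rstrip('.,').lower() for token in doc.split()]
        if keyword.lower() in tokens:
            indices.append(i)
    return indices

def multi_word_search(documents, keywords):
    # One dictionary entry per keyword, built in a single expression.
    return {keyword: word_search(documents, keyword) for keyword in keywords}

doc_list = ["The Learn Python Challenge Casino.", "They bought a car and a casino", "Casinoville"]
result = multi_word_search(doc_list, ['casino', 'they'])
print(result)  # {'casino': [0, 1], 'they': [1]}
```

An empty keyword list yields an empty dictionary, which matches the case the checker reported above.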

## 4. 🌶🌶

Diamonds are beautiful, but they are just so expensive. Write a Python program to create counterfeit ASCII diamonds such as the following:
```
    /\
   //\\
  ///\\\
 ////\\\\
/////\\\\\
\\\\\/////
 \\\\////
  \\\///
   \\//
    \/
```

Your function should allow the caller to choose the size of the diamond (in terms of number of lines). The above diamond has a height of 10. Here's a 4-line diamond:

```
 /\ 
//\\
\\//
 \/ 
```

(You can assume your function will only be called with even numbers.)

In [23]:
def diamond(height):
    """Return a string resembling a diamond of specified height (measured in lines).
    height must be an even integer.
    """
    pass

print(diamond(4)) # Print results first to get visual feedback before checking your answer
q4.check()

None


<span style="color:#ccaa33">Check:</span> When you've updated the starter code, `check()` will tell you whether your code is correct. 

In [24]:
def diamond(height):
    """Return a string resembling a diamond of specified height (measured in lines).
    height must be an even integer.
    """
    hh = height // 2
    print("hh:",hh)
    def line(spaces, f, b):
        return " " * spaces + f * (hh - spaces) + b * (hh - spaces)
    l = "\n".join([line(hh - s - 1, "/", "\\") for s in range(hh)] + [line(s, "\\", "/") for s in range(hh)])
    #print(l)
    return l
    #return line(0, "A", "B")
    

print(diamond(4)) # Print results first to get visual feedback before checking your answer
print(diamond(2))
q4.check()

hh: 2
 /\
//\\
\\//
 \/
hh: 1
/\
\/
hh: 1
hh: 2
hh: 0
hh: 3


<span style="color:#33cc33">Correct</span>

We've provided an example height-4 diamond below as a Python string. It may help to inspect it in the console.

In [25]:
d4 = """ /\\ 
//\\\\
\\\\//
 \\/ """
print(d4)

 /\ 
//\\
\\//
 \/ 


In [26]:
q4.hint()
q4.solution()

<span style="color:#3366cc">Hint:</span> `str` has a few methods that help with the problem of padding a string to a certain size: two that might help here are `str.rjust()` or `str.center()`
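
As a rough sketch of what those padding methods do, the snippet below builds one row of a diamond with `rjust()` and `ljust()`. The names `half_width`, `left`, `right`, `row` and `centered` are chosen for this illustration.

```python
half_width = 3

# rjust pads with spaces on the left until the string is half_width long.
left = '//'.rjust(half_width)
# ljust pads with spaces on the right. '\\\\' is two literal backslashes.
right = '\\\\'.ljust(half_width)

row = left + right
print(row)

# center pads on both sides; here a 2-character string is centered in 6.
centered = '/\\'.center(6)
print(centered)
```

Each half of a row is padded to the same width, so the two halves line up around the middle of the diamond.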

<span style="color:#33cc99">Solution:</span> 
```python
def diamond(height):
    s = ''
    # The characters currently being used to build the left and right half of 
    # the diamond, respectively. (We need to escape the backslash with another
    # backslash so Python knows we mean a literal "\" character.)
    l, r = '/', '\\'
    # The "radius" of the diamond (used in lots of calculations)
    rad = height // 2
    for row in range(height):
        # The first time we pass the halfway mark, swap the left and right characters
        if row == rad:
            l, r = r, l
        if row < rad:
            # For the first row, use one left character and one right. For
            # the second row, use two of each, and so on...
            nchars = row+1
        else:
            # Until we go past the midpoint. Then we start counting back down to 1.
            nchars = height - row
        left = (l * nchars).rjust(rad)
        right = (r * nchars).ljust(rad)
        s += left + right + '\n'
    # Trim the last newline - we want every line to end with a newline character
    # *except* the last
    return s[:-1]
```

## 5. 🌶🌶

You’ve received a tip-off that the roulette wheel at the Learn Challenge Casino has some exploitable bias where the probability of landing on a given number changes depending on the number previously landed on. Analyze a list containing a history of roulette spins. 

Return a dictionary where the keys are numbers on the roulette wheel, and the values are dictionaries mapping numbers on the wheel to probabilities, such that `d[n1][n2]` is an estimate of the probability that the next spin will land on `n2`, given that the previous spin landed on `n1`.

Assume the history is long enough that all values on the wheel are seen at least once.

In [35]:
def conditional_roulette_probs(history):
    """

    Example: 
    conditional_roulette_probs([1, 3, 1, 5, 1])
    > {1: {3: 0.5, 5: 0.5}, 
       3: {1: 1.0},
       5: {1: 1.0}
      }
    """
    pass


print(conditional_roulette_probs([1, 3, 1, 5, 1])) # helpful for debugging
q5.check()

None


<span style="color:#ccaa33">Check:</span> When you've updated the starter code, `check()` will tell you whether your code is correct. 

In [36]:
def conditional_roulette_probs(history):
    """

    Example: 
    conditional_roulette_probs([1, 3, 1, 5, 1])
    > {1: {3: 0.5, 5: 0.5}, 
       3: {1: 1.0},
       5: {1: 1.0}
      }
    """
    x = {}
    for thing in history:
        pass
    return x


print(conditional_roulette_probs([1, 3, 1, 5, 1])) # helpful for debugging
q5.check()

{}


<span style="color:#cc3333">Incorrect:</span> Expected return value of `{1: {3: 0.5, 5: 0.5}, 3: {1: 1.0}, 5: {1: 1.0}}` given `history=[1, 3, 1, 5, 1]`, but got `{}` instead.

In [28]:
q5.solution()

<span style="color:#33cc99">Solution:</span> 
```python
def conditional_roulette_probs(history):
    counts = {}
    for i in range(1, len(history)):
        roll, prev = history[i], history[i-1]
        if prev not in counts:
            counts[prev] = {}
        if roll not in counts[prev]:
            counts[prev][roll] = 0
        counts[prev][roll] += 1

    # We have the counts, but still need to turn them into probabilities
    probs = {}
    for prev, nexts in counts.items():
        # The total spins that landed on prev (not counting the very last spin)
        total = sum(nexts.values())
        sub_probs = {next_spin: next_count/total
                for next_spin, next_count in nexts.items()}
        probs[prev] = sub_probs
    return probs

```
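
Here is a more compact variant of the same idea, offered as a sketch rather than the official solution. It pairs each spin with its successor using `zip(history, history[1:])`, counts the pairs, then divides each count by the total for its row.

```python
def conditional_roulette_probs(history):
    # counts[prev][nxt] = number of times nxt directly followed prev.
    counts = {}
    for prev, nxt in zip(history, history[1:]):
        counts.setdefault(prev, {})
        counts[prev][nxt] = counts[prev].get(nxt, 0) + 1
    # Normalize each inner dictionary so that its values sum to 1.
    return {
        prev: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for prev, nexts in counts.items()
    }

probs = conditional_roulette_probs([1, 3, 1, 5, 1])
print(probs)  # {1: {3: 0.5, 5: 0.5}, 3: {1: 1.0}, 5: {1: 1.0}}
```

For the example history, the consecutive pairs are (1, 3), (3, 1), (1, 5) and (5, 1). The number 1 is followed once by 3 and once by 5, giving 0.5 each.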

If you have any questions or just want to chat about Python, check out the [forum](https://kaggle.com/learn-forum).

Want feedback on your code? To share it with others or ask for help, you'll need to make it public. Save a version of your notebook that shows your current work by hitting the "Commit & Run" button. Once your notebook is finished running, go to the Settings tab in the panel to the left (you may have to expand it by hitting the [<] button next to the "Commit & Run" button) and set the "Visibility" dropdown to "Public".

Tomorrow is the last day of the challenge. I hope you've learned a lot of Python, and I hope you'll be ready to learn more then.