These exercises accompany the tutorial on [strings and dictionaries](https://www.kaggle.com/colinmorris/strings-and-dictionaries).

Run the setup code below before working on the questions (and run it again if you leave this notebook and come back later).

In [None]:
# SETUP. You don't need to worry for now about what this code does or how it works. If you're ever curious about the 
# code behind these exercises, it's available under an open source license here: https://github.com/Kaggle/learntools/
from learntools.core import binder; binder.bind(globals())
from learntools.python.ex6 import *
print('Setup complete.')

# Exercises

## 0. 

Let's start with a string lightning round to warm up. What are the lengths of the strings below?

For each of the five strings below, predict what `len()` would return when passed that string. Use the variable `length` to record your answer, then run the cell to check whether you were right.

In [None]:
a = ""
length = 0
q0.a.check()

In [None]:
b = "it's ok"
length = 7
q0.b.check()

In [None]:
c = 'it\'s ok'
length = 7
q0.c.check()

In [None]:
d = """hey"""
length = 3
q0.d.check()

In [None]:
e = '\n'
length = 1
q0.e.check()
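
As a quick way to double-check the predictions above, the sketch below calls `len()` on the same five literals. The list name `samples` is only an illustration and is not part of the exercise. The point to notice is that an escape sequence such as `\'` or `\n` is written with two characters in source code but counts as a single character in the resulting string.

```python
# The same five string literals as in cells a-e above.
samples = ["", "it's ok", 'it\'s ok', """hey""", '\n']

for s in samples:
    # repr() shows the string as Python would write it, escapes included.
    print(repr(s), len(s))
```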

## 1.

There is a saying that "Data scientists spend 80% of their time cleaning data, and 20% of their time complaining about cleaning data." Let's see if you can write a function to help clean US zip code data. Given a string, it should return whether or not that string represents a valid zip code. For our purposes, a valid zip code is any string consisting of exactly 5 digits.

HINT: `str` has a method that will be useful here. Use `help(str)` to review a list of string methods.

In [None]:
def is_valid_zip(zip_code):
    """Returns whether the input string is a valid (5 digit) zip code
    """
    return len(zip_code) == 5 and zip_code.isdigit()
# help(str)
q1.check()

In [None]:
q1.hint()
q1.solution()
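
To see how the two conditions interact, here is a small sketch that runs the same check on a few hand-picked inputs. The list `candidates` is made up for illustration. The length test rejects strings that are too short or too long, and `str.isdigit()` rejects anything containing letters or spaces.

```python
def is_valid_zip(zip_code):
    """Return True if zip_code is a string of exactly 5 digits."""
    return len(zip_code) == 5 and zip_code.isdigit()

# Hypothetical inputs chosen to exercise each failure mode.
candidates = ["12345", "1234", "123456", "1234x", "12 45"]
for candidate in candidates:
    print(repr(candidate), is_valid_zip(candidate))
```

One caveat: `str.isdigit()` also returns `True` for some non-ASCII digit characters, so a stricter cleaner might additionally require `zip_code.isascii()` (available from Python 3.7).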

## 2.

A researcher has gathered thousands of news articles. But she wants to focus her attention on articles including a specific word. Complete the function below to help her filter her list of articles.

Your function should meet the following criteria:

- Do not include documents where the keyword string shows up only as a part of a larger word. For example, if she were looking for the keyword “closed”, you would not include the string “enclosed.” 
- She does not want you to distinguish upper case from lower case letters. So the phrase “Closed the case.” would be included when the keyword is “closed”.
- Do not let periods or commas affect what is matched. “It is closed.” would be included when the keyword is “closed”. But you can assume there are no other types of punctuation.
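
The criteria above rule out a plain substring test. The sketch below contrasts the two approaches on single documents. The helper names `contains_substring` and `contains_word` are hypothetical and only serve the comparison. It assumes, as stated above, that periods and commas are the only punctuation.

```python
def contains_substring(doc, keyword):
    # Naive check: also matches the keyword inside a longer word.
    return keyword.lower() in doc.lower()

def contains_word(doc, keyword):
    # Split on whitespace, strip trailing periods/commas, and lowercase
    # each token, then look for an exact token match.
    tokens = [token.rstrip('.,').lower() for token in doc.split()]
    return keyword.lower() in tokens

print(contains_substring("The letter was enclosed.", "closed"))  # True (unwanted)
print(contains_word("The letter was enclosed.", "closed"))       # False
print(contains_word("Closed the case.", "closed"))               # True
print(contains_word("It is closed.", "closed"))                  # True
```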

In [None]:
def word_search(documents, keyword):
    """
    Takes a list of documents (each document is a string) and a keyword. 
    Returns list of the index values into the original list for all documents 
    containing the keyword.

    Example:
    >>> doc_list = ["The Learn Python Challenge Casino.", "They bought a car", "Casinoville"]
    >>> word_search(doc_list, 'casino')
    [0]
    """
    # list to hold the indices of matching documents
    indices = [] 
    # Iterate through the indices (i) and elements (doc) of documents
    for i, doc in enumerate(documents):
        # Split the string doc into a list of words (according to whitespace)
        tokens = doc.split()
        # Make a transformed list where we 'normalize' each word to facilitate matching.
        # Periods and commas are removed from the end of each word, and it's set to all lowercase.
        normalized = [token.rstrip('.,').lower() for token in tokens]
        # Is there a match? If so, update the list of matching indices.
        if keyword.lower() in normalized:
            indices.append(i)
    return indices


q2.check()

In [None]:
q2.hint()
q2.solution()

## 3.

Now the researcher wants to supply multiple keywords to search for. Complete the function below to help her.

(You're encouraged to use the `word_search` function you just wrote when implementing this function. Reusing code in this way makes your programs more robust and readable - and it saves typing!)

In [None]:
def multi_word_search(documents, keywords):
    """
    Takes a list of documents (each document is a string) and a list of keywords.  
    Returns a dictionary where each key is a keyword, and the value is a list of indices
    (into documents) of the documents containing that keyword.

    >>> doc_list = ["The Learn Python Challenge Casino.", "They bought a car and a casino", "Casinoville"]
    >>> keywords = ['casino', 'they']
    >>> multi_word_search(doc_list, keywords)
    {'casino': [0, 1], 'they': [1]}
    """
    keyword_to_indices = {}
    for keyword in keywords:
        keyword_to_indices[keyword] = word_search(documents, keyword)
    return keyword_to_indices


q3.check()

In [None]:
q3.solution()
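
Because each keyword maps independently to the result of `word_search`, the loop can also be written as a dictionary comprehension. This is a sketch of an equivalent formulation, not a replacement for the reference solution. The name `multi_word_search_compact` is hypothetical, and `word_search` is repeated here only so the block runs on its own.

```python
def word_search(documents, keyword):
    """Return indices of documents that contain keyword as a whole word."""
    indices = []
    for i, doc in enumerate(documents):
        normalized = [token.rstrip('.,').lower() for token in doc.split()]
        if keyword.lower() in normalized:
            indices.append(i)
    return indices

def multi_word_search_compact(documents, keywords):
    # One dictionary entry per keyword, each computed by word_search.
    return {keyword: word_search(documents, keyword) for keyword in keywords}

doc_list = ["The Learn Python Challenge Casino.", "They bought a car and a casino", "Casinoville"]
print(multi_word_search_compact(doc_list, ['casino', 'they']))
```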

## 4. <span title="Spicy" style="color: coral">🌶️🌶️</span>

Diamonds are beautiful, but they are just so expensive. Write a Python program to create counterfeit ASCII diamonds such as the following:
```
    /\
   //\\
  ///\\\
 ////\\\\
/////\\\\\
\\\\\/////
 \\\\////
  \\\///
   \\//
    \/
```

Your function should allow the caller to choose the size of the diamond (in terms of number of lines). The above diamond has a height of 10. Here's a 4-line diamond:

```
 /\ 
//\\
\\//
 \/ 
```

(You can assume your function will only be called with even numbers)

In [None]:
def diamond(height):
    """Return a string resembling a diamond of specified height (measured in lines).
    height must be an even integer.
    """
    pass


q4.check()

We've provided an example height-4 diamond below as a Python string. It may help to inspect it in the console.

In [None]:
d4 = """
 /\\ 
//\\\\
\\\\//
 \\/ """
print(d4)

def diamond(height):
    s = ''
    # The characters currently being used to build the left and right half of 
    # the diamond, respectively. (We need to escape the backslash with another
    # backslash so Python knows we mean a literal "\" character.)
    l, r = '/', '\\'
    # The "radius" of the diamond (used in lots of calculations)
    rad = height // 2
    for row in range(height):
        # The first time we pass the halfway mark, swap the left and right characters
        if row == rad:
            l, r = r, l
        if row < rad:
            # For the first row, use one left character and one right. For
            # the second row, use two of each, and so on...
            nchars = row+1
        else:
            # Until we go past the midpoint. Then we start counting back down to 1.
            nchars = height - row
        left = (l * nchars).rjust(rad)
        right = (r * nchars).ljust(rad)
        s += left + right + '\n'
    # Trim the last newline - we want every line to end with a newline character
    # *except* the last
    return s[:-1]

diamond(4)

In [None]:
q4.hint()
q4.solution()
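
For comparison, here is a sketch of another way to build the same string, using `str.center` to pad each row instead of `rjust` and `ljust`. The function name `diamond_centered` is hypothetical. It relies on the same assumption as the exercise, namely that `height` is even, so the padding on each row splits evenly between left and right.

```python
def diamond_centered(height):
    """Return a diamond of the given even height, one row per line."""
    half = height // 2
    # Top half: rows grow from 1 to half slashes on each side.
    top = [('/' * n + '\\' * n).center(height) for n in range(1, half + 1)]
    # Bottom half: the characters swap and rows shrink back down to 1.
    bottom = [('\\' * n + '/' * n).center(height) for n in range(half, 0, -1)]
    # Join with newlines so the last row has no trailing newline.
    return '\n'.join(top + bottom)

print(diamond_centered(4))
```

Like the loop-based version above, each row is padded to the full width, so shorter rows carry trailing spaces.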

## 5. <span title="Spicy" style="color: coral">🌶️🌶️</span>

Dice may not have any memory, but apparently the roulette wheel at the Learn Challenge Casino does. You’ve received a tip-off that the wheel has some exploitable bias where the probability of landing on a given number changes depending on the number previously landed on. Analyze a list containing a history of roulette spins. 

Return a dictionary where the keys are numbers on the roulette wheel, and the values are dictionaries mapping numbers on the wheel to probabilities, such that `d[n1][n2]` is an estimate of the probability that the next spin will land on n2, given that the previous spin landed on n1.

In [None]:
def conditional_roulette_probs(history):
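    """Estimate next-spin probabilities conditioned on the previous spin.

    Returns a dict of dicts d such that d[n1][n2] is an estimate of the probability
    that the next spin lands on n2, given that the previous spin landed on n1.

    Example:
    >>> conditional_roulette_probs([1, 3, 1, 5, 1])
    {1: {3: 0.5, 5: 0.5}, 3: {1: 1.0}, 5: {1: 1.0}}
    """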
    # dict where keys are numbers and values are dicts
    # counts[a][b] is the number of times we've spun the number b immediately after spinning a
    counts = {}
    # Iterate over the indices of history *except* the first (index 0). (In the loop, we'll 
    # be looking at each index alongside its previous index. But there's no previous index for i=0)
    for i in range(1, len(history)):
        # The numbers for the ith spin and the spin before it
        roll, prev = history[i], history[i-1]
        # If we haven't seen prev before, we need to add it to counts. (Otherwise counts[prev] will give a KeyError)
        if prev not in counts:
            counts[prev] = {}
        # Similar to above - add key to the inner dict if not present
        if roll not in counts[prev]:
            counts[prev][roll] = 0
        counts[prev][roll] += 1

    # We have the counts, but still need to turn them into probabilities
    probs = {}
    # dict.items() gives us a dictionary's (key, value) pairs as a sequence of tuples.
    for prev, nexts in counts.items():
        # The total number of spins that landed on prev (not counting the very last spin)
        total = sum(nexts.values())
        # Take the nexts dictionary and normalize it so that its values sum to 1 (and represent probabilities)
        sub_probs = {next_spin: next_count/total
                for next_spin, next_count in nexts.items()}
        probs[prev] = sub_probs
    return probs



q5.check()

In [None]:
q5.solution()
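
The counting step can also be expressed with the standard library's `collections` module. The sketch below is one possible alternative, not the reference solution. The name `conditional_probs_sketch` is hypothetical. It pairs each spin with its successor using `zip(history, history[1:])`, and a `defaultdict` of `Counter` objects removes the need for the explicit "key not present" checks.

```python
from collections import Counter, defaultdict

def conditional_probs_sketch(history):
    # counts[a][b] is the number of times b was spun immediately after a.
    counts = defaultdict(Counter)
    for prev, nxt in zip(history, history[1:]):
        counts[prev][nxt] += 1

    # Normalize each inner Counter so its values sum to 1.
    probs = {}
    for prev, nexts in counts.items():
        total = sum(nexts.values())
        probs[prev] = {spin: count / total for spin, count in nexts.items()}
    return probs

print(conditional_probs_sketch([1, 3, 1, 5, 1]))
```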

If you have any questions, be sure to post them on the [forums](https://www.kaggle.com/learn-forum).

Remember that your notebook is private by default, and in order to share it with other people or ask for help with it, you'll need to make it public. First, you'll need to save a version of your notebook that shows your current work by hitting the "Commit & Run" button. (Your work is saved automatically, but versioning your work lets you go back and look at what it was like at the point you saved it. It also lets you share a nice compiled notebook instead of just the raw code.) Then, once your notebook has finished running, go to the Settings tab in the panel to the left (you may have to expand it by hitting the [<] button next to the "Commit & Run" button) and set the "Visibility" dropdown to "Public".

# Keep Going

When you're ready, [click here](https://www.kaggle.com/colinmorris/working-with-external-libraries) to continue on to the next tutorial, which covers imports.