## Python challenges

This notebook includes a series of challenges to test your Python coding skills. If you get stuck, try googling for answers. If you don't understand *why* a particular answer works, try searching for the answer to that question. Revisit old tutorials from this class as needed, and finally, turn to the course chatroom for help. Best of luck.

In [None]:
# required software
# conda install numpy pandas toyplot requests -c conda-forge

In [1]:
import requests
import numpy as np
import pandas as pd
import toyplot

### Challenge 1: 
Execute the code cell below to see an example of how it works. 
Use markdown in the cell after the code-block to describe the function `random_words_api`. Try to be descriptive about what each step of code in this function does, and why it works. 

In [2]:
def random_words_api(nwords=10):
    "no docstring"
    URL = "https://random-word-api.herokuapp.com/word"
    response = requests.get(url=URL, params={"number": nwords})
    return response.json()

# demonstration
random_words_api(5)

['squilgee', 'webfoot', 'humifications', 'vegetists', 'bracted']

**Description:** The function ``random_words_api`` queries a web app on a public Heroku server, which hosts a collection of words.  There is one integer argument *nwords* which sets the number of words queried (default is 10).

The variable *URL* defines the web address of the app.

The variable *response* saves the results of a GET request.  The *url* (lowercase) argument is the *URL* (uppercase) variable defined earlier.  The *params* argument is a dictionary with one key-value pair: "number" as the key, and the function-level argument *nwords* as the value.  The `requests` library encodes this dictionary as a query string and appends it to the URL, which already ends in "/word".  With *nwords* set to 5, for example, the request goes to "https://random-word-api.herokuapp.com/word?number=5".

The function returns the body of the response after `response.json()` parses it from JSON into a Python object, which here is a list of strings.
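
How a *params* dictionary ends up in the request URL can be sketched without any network access. This minimal example uses the standard library's `urllib.parse.urlencode` as a stand-in for the encoding that `requests` performs; the helper name `build_query_url` is hypothetical and not part of the notebook.

```python
from urllib.parse import urlencode

URL = "https://random-word-api.herokuapp.com/word"


def build_query_url(base_url, params):
    """Append a URL-encoded query string to base_url (illustrative helper)."""
    return f"{base_url}?{urlencode(params)}"


# Build the URL that a request for five words would target.
example_url = build_query_url(URL, {"number": 5})
print(example_url)
```

The printed address is the base URL followed by `?number=5`, the same form that a GET request with `params={"number": 5}` targets.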

### Challenge 2: 
Use the `random_words_api` function to get 50 random words and store the result as a variable. Write a function that takes the list of words as input and returns a dictionary with the longest word as the key and the length of the longest word as a value. If there is a tie in the length of words then have it return additional words as keys with their lengths as values.

In [12]:
def word_sorter(words):
    
    # Create dict object.
    longest_word = {}
    
    # Find the length of the longest word once, using the function's own argument
    # rather than the global variable fifty_words.
    max_length = len(max(words, key=len))
    
    # Iterate over list of words.
    for word in words:
        
        # Words are added to the dict if their length equals the maximum length.
        # This allows for ties to be represented in a dict of more than one key-value pair.
        if len(word) == max_length:
            longest_word[word] = max_length
            
    # Return dict.
    return longest_word

fifty_words = random_words_api(50)
print(fifty_words)
word_sorter(fifty_words)

['proportionable', 'unclipped', 'bandas', 'coequate', 'alike', 'obligating', 'newspeaks', 'construal', 'nitrifiers', 'spahee', 'wooing', 'readdressing', 'threaten', 'compeer', 'nobleness', 'nonchalant', 'executrix', 'forebodings', 'excitements', 'synopses', 'determination', 'reviviscence', 'plenished', 'cruelly', 'monseigneur', 'rereminds', 'delightfully', 'rechauffes', 'bedizened', 'calutron', 'vees', 'sepulchering', 'seis', 'ghastly', 'geologers', 'unchoke', 'oleaster', 'bursitis', 'accumulation', 'perversely', 'preventative', 'cockiest', 'ultrarealistic', 'bullheaded', 'seepage', 'sheuchs', 'salpa', 'payout', 'kitsches', 'floppier']


{'proportionable': 14, 'ultrarealistic': 14}
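
The same result can be reached more compactly. The sketch below computes the maximum length once and then builds the dictionary with a comprehension; the name `longest_words` and the short sample list are illustrative choices, not part of the assignment.

```python
def longest_words(words):
    """Map each word of maximal length to that length, keeping ties."""
    max_length = max(len(word) for word in words)
    return {word: max_length for word in words if len(word) == max_length}


# A small fixed sample, so the example runs without calling the API.
sample = ["alike", "proportionable", "vees", "ultrarealistic"]
print(longest_words(sample))
```

Both 14-letter words appear as keys because the comprehension keeps every word whose length equals the maximum.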

### Challenge 3: 
Write a function to take the list of words as input and trim all words to be at most 5 characters in length, and return as a list.

In [14]:
def word_trimmer(words):
    
    # Create list object.
    trimmed_words = []
    
    # Iterate over list of words.
    for word in words:
        
        # If the word is more than five characters, subset only the first five characters, then append it to the new list.
        if len(word) > 5:
            word = word[:5]
            trimmed_words.append(word)
            
        # Otherwise, just append it without any modifications.
        else:
            trimmed_words.append(word)
    
    # Return list.
    return trimmed_words

fifty_words = random_words_api(50)
print(fifty_words)
word_trimmer(fifty_words)

['redivorce', 'swept', 'enactment', 'treat', 'assentors', 'inhumation', 'plimsoll', 'spideriest', 'avert', 'dysarthrias', 'groynes', 'stirks', 'venerabilities', 'parbuckled', 'backchats', 'bedazzles', 'avengers', 'methoxyl', 'bonfire', 'fluorination', 'priggish', 'superatom', 'insociable', 'disboweling', 'botanizer', 'jeremiad', 'dimidiating', 'prearm', 'sideboard', 'conceptualised', 'lesioned', 'tetracyclines', 'telexing', 'resmelted', 'preannounced', 'urbaner', 'dive', 'matzot', 'percent', 'seducement', 'endodontic', 'recant', 'fuelling', 'downscale', 'disjointed', 'sepaled', 'monkeyshines', 'dellies', 'microsurgical', 'macromere']


['rediv',
 'swept',
 'enact',
 'treat',
 'assen',
 'inhum',
 'plims',
 'spide',
 'avert',
 'dysar',
 'groyn',
 'stirk',
 'vener',
 'parbu',
 'backc',
 'bedaz',
 'aveng',
 'metho',
 'bonfi',
 'fluor',
 'prigg',
 'super',
 'insoc',
 'disbo',
 'botan',
 'jerem',
 'dimid',
 'prear',
 'sideb',
 'conce',
 'lesio',
 'tetra',
 'telex',
 'resme',
 'prean',
 'urban',
 'dive',
 'matzo',
 'perce',
 'seduc',
 'endod',
 'recan',
 'fuell',
 'downs',
 'disjo',
 'sepal',
 'monke',
 'delli',
 'micro',
 'macro']
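
Because slicing a string past its end returns the whole string, the length check is not strictly needed. A minimal sketch of that idea as a list comprehension follows; `trim_words` and its `max_length` parameter are hypothetical names introduced for illustration.

```python
def trim_words(words, max_length=5):
    """Return a new list with every word cut to at most max_length characters."""
    # word[:max_length] leaves shorter words unchanged.
    return [word[:max_length] for word in words]


sample = ["redivorce", "swept", "dive"]
print(trim_words(sample))
```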

### Challenge 4: 
Write a function to take a list of words as input and to count the occurrence of all letters in every word and return as a dictionary mapping letters to integers, e.g., {'a': 10, 'b': 3, 'c': 5, ...}.

In [50]:
def letter_occurrence(words):
    
    # Concatenate the inputted words into one long string.
    concat = "".join(words)
    
    # Create dict object.
    alphabet_counter = {}
    
    # Iterate over the concatenated string and build occurrence dict.
    for item in concat:
        if item not in alphabet_counter:
            alphabet_counter[item] = 1
        else:
            alphabet_counter[item] += 1
            
    # Return dict.
    return alphabet_counter
        
fifty_words = random_words_api(50)
print(fifty_words)
res = letter_occurrence(fifty_words)
for key in sorted(res.keys()):
    print(key, res[key])

['sequestrates', 'finalise', 'carboxylations', 'unfamiliar', 'specifics', 'overwrought', 'noncomparable', 'succours', 'marginalization', 'trochophore', 'uvular', 'crestless', 'unraised', 'thanker', 'commiserative', 'tekkies', 'epimers', 'wirelike', 'ionizations', 'clonal', 'segmentally', 'conniption', 'kelsons', 'shouting', 'harbinger', 'giveaway', 'impregnation', 'pretrial', 'proclivity', 'antechambers', 'overgrades', 'transfusional', 'maundering', 'shamuses', 'reedy', 'echini', 'antireform', 'intergradation', 'prorogate', 'germinabilities', 'surprints', 'heliometrically', 'entrapment', 'cowlstaves', 'fibroses', 'compared', 'extrasystoles', 'synovial', 'sardonic', 'prefocussing']
a 42
b 6
c 19
d 7
e 51
f 7
g 13
h 10
i 48
k 5
l 23
m 16
n 37
o 35
p 13
q 1
r 45
s 42
t 31
u 14
v 8
w 4
x 2
y 8
z 2
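
The standard library's `collections.Counter` does the same tallying in one step. The sketch below is an alternative to the explicit loop, not a replacement required by the challenge; `count_letters` is an illustrative name.

```python
from collections import Counter


def count_letters(words):
    """Count how often each character occurs across all words."""
    # Counter iterates over the joined string and tallies each character.
    return dict(Counter("".join(words)))


sample = ["reedy", "echini"]
counts = count_letters(sample)
for letter in sorted(counts):
    print(letter, counts[letter])
```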


### Challenge 5:
Use [toyplot](https://toyplot.readthedocs.io/en/stable/tutorial.html) to create a barplot of the occurrences of each letter in your dictionary from the previous challenge. This will represent a histogram of the letters. Play with the size and color of the figure to try to make it look nice.

In [64]:
cats = []
counts = []
for key in sorted(res.keys()):
    cats.append(key)
    counts.append(res[key])

canvas, axes, mark = toyplot.bars(counts, width=400, height=300, color = "purple")
axes.x.ticks.locator = toyplot.locator.Explicit(labels=cats)

### Challenge 6: 
Using numpy create a new variable called `arr` with 1000 random samples from a normal distribution. Use the numpy `.histogram` function to bin these values into 20 bins, and then plot the histogram using a barplot from toyplot. Color the bars of the histogram orange.

In [49]:
arr = np.random.normal(100, size = 1000)
hist = np.histogram(arr, bins = 20)

canvas = toyplot.Canvas(width=300, height=300)
axes = canvas.cartesian()
bars = axes.bars(hist, color = "orange")
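
`np.histogram` returns a pair of arrays: the count in each bin and the bin edges, with one more edge than there are bins. The sketch below inspects that pair; the seeded generator from `np.random.default_rng` is an assumption made here for reproducibility, whereas the cell above uses `np.random.normal` directly.

```python
import numpy as np

# Seeded generator so the example is reproducible (an assumption of this sketch).
rng = np.random.default_rng(0)
samples = rng.normal(loc=100, scale=1, size=1000)

# Unpack the two arrays returned by np.histogram.
counts, edges = np.histogram(samples, bins=20)

print(len(counts), len(edges))
print(counts.sum())
```

Every sample falls into exactly one bin, so the counts sum to the number of samples.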

### Challenge 7: 
Write a `while` statement to continue running code in a loop until a condition is met, and then call `break` to end the loop. Inside of the loop, randomly draw a single value from a uniform distribution between 0 and 100. If the value is less than 25 and greater than 22 then break the loop, otherwise, continue the loop until a value meeting this condition is sampled. Use a variable to keep track of how many iterations of the loop are run, and print this value after calling `break`. 


In [122]:
# Define a counter variable.
counter = 0

# Define while loop, incrementing the counter by 1 on each pass.
while True:
    counter += 1
    
    # Sample a single value from a uniform distribution between 0 and 100.
    n = np.random.uniform(0, 100)
    
    # Define condition for breaking loop.
    if 22 < n < 25:
        break

# Show final number of loops.
print("This function looped " + str(counter) + " times before breaking.")

This function looped 28 times before breaking.
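
Each draw lands between 22 and 25 with probability 3/100, so the number of iterations follows a geometric distribution with a mean of 100/3, or about 33. The sketch below checks this by simulation using the standard library's `random` module instead of numpy; `draws_until_hit`, the seed, and the number of trials are illustrative choices.

```python
import random


def draws_until_hit(rng, low=22, high=25):
    """Count uniform draws on [0, 100] until one falls strictly between low and high."""
    counter = 0
    while True:
        counter += 1
        if low < rng.uniform(0, 100) < high:
            return counter


# Repeat the experiment many times and average the loop counts.
rng = random.Random(42)
trials = [draws_until_hit(rng) for _ in range(20000)]
mean_draws = sum(trials) / len(trials)
print(round(mean_draws, 1))
```

A single run such as the 28 iterations reported above is one sample from this distribution, so individual counts vary widely around the mean.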


### Challenge 8: 
Use pandas to load a CSV file from https://eaton-lab.org/data/iris-data-dirty.csv and save as a dataframe. Add custom names to the columns in the dataframe, based on the type of values in them (e.g., numeric versus strings). You can come up with any column names you want for these.

In [124]:
df = pd.read_csv("https://eaton-lab.org/data/iris-data-dirty.csv", header = None)
df.columns = ["float1", "float2", "float3", "float4", "string1"]
df

|  | float1 | float2 | float3 | float4 | string1 |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
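
Column names can also be supplied while reading, through the `names` argument of `pd.read_csv`. The sketch below reads two rows from an in-memory string rather than the course URL so that it runs offline; the rows are copied from the output above, and the variable names `raw` and `small` are illustrative.

```python
import io

import pandas as pd

# Two header-less rows standing in for the remote CSV file.
raw = "5.1,3.5,1.4,0.2,Iris-setosa\n6.2,3.4,5.4,2.3,Iris-virginica\n"
columns = ["float1", "float2", "float3", "float4", "string1"]

# header=None marks the first row as data; names assigns the column labels.
small = pd.read_csv(io.StringIO(raw), header=None, names=columns)
print(small.dtypes)
```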


### Challenge 9: 
Calculate the mean value of the data in the left-most column for all data where the right-most column matches the value "Iris-setosa". 

In [75]:
# Variable to sum data in left-most column.
total = 0

# Variable to count number of values comprising sum.
count = 0

# Add value in left-most column to total if right-most column is "Iris-setosa" and iterate count when the condition is met.
for idx in df.index:
    if df.loc[idx, "string1"] == "Iris-setosa":
        total += df.loc[idx, "float1"]
        count += 1
        
# The mean is the total divided by the count.
mean = total / count
print(mean)

5.0102040816326525
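
The same mean can be computed without an explicit loop by selecting rows with a boolean mask. The sketch below uses a three-row dataframe invented for illustration, not the course data, with the column names from the previous challenge.

```python
import pandas as pd

# Tiny illustrative dataframe (made-up subset, not the full iris file).
small = pd.DataFrame({
    "float1": [5.1, 4.9, 6.7],
    "string1": ["Iris-setosa", "Iris-setosa", "Iris-virginica"],
})

# Boolean mask: True for rows whose right-most column is "Iris-setosa".
is_setosa = small["string1"] == "Iris-setosa"

# Select the left-most column for those rows and take the mean.
setosa_mean = small.loc[is_setosa, "float1"].mean()
print(setosa_mean)
```

Applied to the full dataframe, the equivalent expression would be `df.loc[df["string1"] == "Iris-setosa", "float1"].mean()`.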


### Challenge 10:
Create a copy of your iris data dataframe and name it `df2`. Sort the rows of this dataframe based on the values in the first (leftmost) column so that the lowest values are first, and the highest values at the bottom. After sorting, reset the index of the dataframe so that the index is once again ordered. Once you get this to work, try to rewrite it in a simpler form by chaining multiple function calls together to accomplish the goal in one line, by calling code that looks a bit like this, but with the correct function calls: `df.function().function().function()`

In [125]:
# Copy dataframe and sort rows by values in left-most column.
df2 = df.copy()
df2 = df2.sort_values(by = "float1")
df2 = df2.reset_index(drop = True)
df2

|  | float1 | float2 | float3 | float4 | string1 |
|---|---|---|---|---|---|
| 0 | 4.3 | 3.0 | 1.1 | 0.1 | Iris-setosa |
| 1 | 4.4 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 2 | 4.4 | 3.0 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |
| 4 | 4.5 | 2.3 | 1.3 | 0.3 | Iris-setosa |
| ... | ... | ... | ... | ... | ... |
| 145 | 7.7 | 2.8 | 6.7 | 2.0 | Iris-virginica |
| 146 | 7.7 | 2.6 | 6.9 | 2.3 | Iris-virginica |
| 147 | 7.7 | 3.8 | 6.7 | 2.2 | Iris-virginica |
| 148 | 7.7 | 3.0 | 6.1 | 2.3 | Iris-virginica |


In [126]:
# Repeat sorting in more compact code.
df3 = df.copy().sort_values(by = "float1").reset_index(drop = True)
df3

|  | float1 | float2 | float3 | float4 | string1 |
|---|---|---|---|---|---|
| 0 | 4.3 | 3.0 | 1.1 | 0.1 | Iris-setosa |
| 1 | 4.4 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 2 | 4.4 | 3.0 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |
| 4 | 4.5 | 2.3 | 1.3 | 0.3 | Iris-setosa |
| ... | ... | ... | ... | ... | ... |
| 145 | 7.7 | 2.8 | 6.7 | 2.0 | Iris-virginica |
| 146 | 7.7 | 2.6 | 6.9 | 2.3 | Iris-virginica |
| 147 | 7.7 | 3.8 | 6.7 | 2.2 | Iris-virginica |
| 148 | 7.7 | 3.0 | 6.1 | 2.3 | Iris-virginica |


### Challenge 11:
Write a function that uses string formatting (curly braces) to create a [mad lib](https://en.wikipedia.org/wiki/Mad_Libs) containing at least 4 words that will be filled in. The returned object of your function should be a string where the missing words are filled by randomly sampled words from the `random_words_api()` function. The sentence or paragraph of your mad lib can be anything you wish, be creative. 

In [104]:
def madlib():
    # Request four random words in a single call and unpack the returned list.
    word1, word2, word3, word4 = random_words_api(4)
    story = f"""The sheer force of the man shouting "{word1}" at the bear seems to have startled the bear into falling
off the cliff.  The man claims the bear shouted "{word2}" back as it fell, but this is highly unlikely, as bears 
cannot speak.  When later asked to describe the encounter in more detail, the man offered as further elaboration
"{word3}" and "{word4}", causing even more confusion."""
    # Return the filled-in string, as the challenge asks, instead of printing it.
    return story
    
print(madlib())

The sheer force of the man shouting "embarked" at the bear seems to have startled the bear into falling
off the cliff.  The man claims the bear shouted "fantast" back as it fell, but this is highly unlikely, as bears 
cannot speak.  When later asked to describe the encounter in more detail, the man offered as further elaboration
"tendons" and "trauchles", causing even more confusion.

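
Curly-brace placeholders can also be filled with `str.format`, which keeps the template separate from the words. The sketch below uses fixed words taken from the sample output so that it runs without the API; `fill_madlib` and the shortened template are illustrative.

```python
def fill_madlib(template, words):
    """Fill the {} placeholders of template with words, in order."""
    return template.format(*words)


template = 'The man shouted "{}" and the bear replied "{}", then "{}" and "{}".'
story = fill_madlib(template, ["embarked", "fantast", "tendons", "trauchles"])
print(story)
```

Passing the list returned by `random_words_api(4)` as `words` would fill the same template with random words.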

<div class="alert alert-success">
After completing all challenges in this notebook, save and download the .ipynb file to your computer. Move the file to your hack-program repo and put it in a folder called notebooks. Add/stage this file and folder and commit the change, and push to GitHub. The assignment is due by end of day on 3/7/2021.  
</div>