## Python challenges

This notebook includes a series of challenges to test your Python coding skills. If you get stuck, try googling for answers. If you don't understand *why* a particular answer works, try searching for the answer to that question. Revisit old tutorials from this class as needed, and finally, turn to the course chatroom for help. Best of luck.

In [None]:
# required software
# conda install numpy pandas toyplot requests -c conda-forge

In [3]:
import requests
import numpy as np
import pandas as pd
import toyplot

### Challenge 1: 
Execute the code cell below to see an example of how it works. 
Use markdown in the cell after the code-block to describe the function `random_words_api`. Try to be descriptive about what each step of code in this function does, and why it works. 

In [4]:
def random_words_api(nwords=10):
    "no docstring"
    URL = "https://random-word-api.herokuapp.com/word"
    response = requests.get(url=URL, params={"number": nwords})
    return response.json()

# demonstration
random_words_api(5)

['cytoskeletal', 'obsidians', 'welladays', 'placentation', 'wacker']

**Description:** 

*in general* - The function queries random-word-api, a Heroku-hosted web service that generates random words, one at a time by default. The function passes the number of words wanted as a request parameter, sends a GET request, and decodes the JSON response into a Python list of words.

*line by line*
  - **1** Defines the function `random_words_api` with one parameter, *nwords*, an integer that defaults to 10.
  - **2** Stores the endpoint of the random-word REST API (hosted on Heroku) in a variable called *URL*.
  - **3** Sends a GET request to that URL with the `requests` package. The `params` dictionary sets the `number` query parameter to *nwords*, so the caller chooses how many words come back. The result is stored in a variable called *response*.
  - **4** Decodes the JSON body of *response* and returns it, in this case as a Python list of words.

You can test the same REST call in a browser by appending `?number=5` (or any other integer) to the URL.
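The query-string behaviour described above can be checked without contacting the server. The sketch below uses the standard library's `urllib.parse.urlencode` to build the same kind of URL that `requests` assembles from `params`. The helper name `build_url` is hypothetical and only for illustration.

```python
from urllib.parse import urlencode

URL = "https://random-word-api.herokuapp.com/word"

def build_url(base, params):
    """Append URL-encoded query parameters to a base URL (illustrative helper)."""
    return f"{base}?{urlencode(params)}"

# the same parameters that random_words_api(5) hands to requests.get
print(build_url(URL, {"number": 5}))
```

Pasting the printed URL into a browser should be equivalent to calling the function with `nwords=5`.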

### Challenge 2: 
Use the `random_words_api` function to get 50 random words and store the result as a variable. Write a function that takes the list of words as input and returns a dictionary with the longest word as the key and the length of the longest word as a value. If there is a tie in the length of words then have it return additional words as keys with their lengths as values.

In [5]:
#storing 50 random words from random_words_api in a variable
my_words = random_words_api(50)

#defining function 
def longest_word(words):
    """
    This function takes a list of words and returns a dictionary.
    The longest word in the list is the key, and its length in letters is the value.
    If several words tie for the longest, each of them is included as a key.
    """
    #creating empty dictionary to then store value and key in for output
    new_dict = {}
    #defining list of all the lengths of every word in the list
    word_lengths = [len(x) for x in words]
    #storing the length of the longest word in the list
    #(named max_length so it does not shadow the function name)
    max_length = max(word_lengths)
    
    #iterating over the words together with their lengths
    for word, length in zip(words, word_lengths):
        #when we stumble on a word of the maximum length
        if length == max_length:
            #then add that word and its length to the dictionary
            new_dict[word] = length
    
    #return the filled dictionary        
    return new_dict

#test this function with the stored 50 words we have at the top of this cell
longest_word(my_words)

{'superstructures': 15, 'ophthalmologies': 15}
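The same result can be written more compactly with a dictionary comprehension. This is only an alternative sketch, not a replacement for the loop above. The function name `longest_words` and the sample list are made up for the example.

```python
def longest_words(words):
    """Map each longest word to its length, keeping ties."""
    max_length = max(len(word) for word in words)
    return {word: len(word) for word in words if len(word) == max_length}

# hypothetical sample list containing a two-way tie
sample = ["superstructures", "cat", "ophthalmologies", "qubit"]
print(longest_words(sample))
```

Both versions raise a `ValueError` on an empty list, because `max` has nothing to compare.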

### Challenge 3: 
Write a function to take the list of words as input and trim all words to be at most 5 characters in length, and return as a list.

In [6]:
#defining function
def word_trim(words):
    """
    This function takes a list of words as input and trims each one to at most 5 letters.
    Shorter words remain untrimmed.
    Returns the trimmed words as a new list and leaves the input list unchanged.
    """
    
    #slicing with [:5] keeps the first 5 letters and leaves shorter words as they are,
    #so no length check is needed; building a new list avoids modifying the input in place
    return [word[:5] for word in words]

#Testing on a shorter word list and printing it next to the full list to check it worked
words_again = random_words_api(10) #randomly generate 10 words

#printing
print(words_again) #regular list
print(word_trim(words_again)) #trimmed down list

['vocoder', 'recusals', 'rhythmizing', 'staffing', 'defied', 'sniffles', 'pyres', 'qubit', 'thunderingly', 'snowboarders']
['vocod', 'recus', 'rhyth', 'staff', 'defie', 'sniff', 'pyres', 'qubit', 'thund', 'snowb']


### Challenge 4: 
Write a function that takes a list of words as input, counts the occurrences of every letter across all of the words, and returns a dictionary mapping letters to integers, e.g., {'a': 10, 'b': 3, 'c': 5, ...}.  

In [7]:
#will use the list called words_again for testing

#defining function
def letter_count(words):
    """
    This function takes a list of words as input and returns a dictionary with the # occurrences of every letter in that list
    
    """
    #setting an empty dictionary to then fill and output
    new_dict = {}
    
    #iterating over the list of words
    for word in words:
        #nesting the for-loop to iterate over every letter in each word
        for letter in word:
            #if the letter is not already in the dictionary
            if letter not in new_dict:
                #start the count at 1
                new_dict[letter] = 1
            #otherwise the letter is already in the dictionary
            else:
                #add to the existing count
                new_dict[letter] += 1
    
    #return the new dictionary
    return new_dict


#testing this on a list of 50 words
occurrence_data=letter_count(my_words)
occurrence_data

{'o': 30,
 's': 46,
 'c': 23,
 'i': 44,
 'n': 28,
 'e': 45,
 'h': 9,
 'p': 13,
 'a': 35,
 't': 28,
 'm': 15,
 'd': 14,
 'r': 31,
 'z': 4,
 'l': 21,
 'g': 13,
 'q': 1,
 'u': 16,
 'v': 4,
 'f': 6,
 'y': 6,
 'b': 12,
 'k': 2,
 'w': 1}
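The standard library's `collections.Counter` does the same tallying in one step. Here is a short sketch for comparison. The function name `letter_count_counter` and the sample words are made up for the example.

```python
from collections import Counter

def letter_count_counter(words):
    """Count every letter across all words and return a plain dictionary."""
    # joining the words gives one long string; Counter tallies its characters
    return dict(Counter("".join(words)))

# hypothetical sample words
sample = ["banana", "bandana"]
print(letter_count_counter(sample))
```

Writing the nested loop by hand is still a good exercise, since it shows what `Counter` does internally.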

### Challenge 5:
Use [toyplot](https://toyplot.readthedocs.io/en/stable/tutorial.html) to create a barplot of the occurrences of each letter in your dictionary from the previous challenge. This will represent a histogram of the letters. Play with the size and color of the figure to try to make it look nice.

In [8]:
#getting list of letters in alphabetical order
letters = sorted(occurrence_data) #sorted() on a dictionary returns its keys as a sorted list
#letters #checking

#getting frequency values for each letter occurrence
freq = [occurrence_data[l] for l in letters] #getting value from dictionary for every letter in alphabetical order from list
#freq #checking

#Creating histogram bar plot
canvas = toyplot.Canvas(width=400, height=400) #setting up parameters for plot 

axes = canvas.cartesian(label="Frequency of Letters in Dictionary", 
                        xlabel="Letter", 
                        ylabel="# Occurrences") #setting axes labels

mark = axes.bars(
    freq, 
    style={"fill": "turquoise"}
                ) #setting y-axis values and color

axes.x.ticks.locator = toyplot.locator.Explicit(labels=letters) #labelling x-axis ticks

### Challenge 6: 
Using numpy create a new variable called `arr` with 1000 random samples from a normal distribution. Use the numpy `.histogram` function to bin these values into 20 bins, and then plot the histogram using a barplot from toyplot. Color the bars of the histogram orange.

In [10]:
#setting new variable arr with 1000 random samples from a normal distribution
arr = np.random.normal(size=1000)

#Creating bar plot histogram for arr

#setting plot parameters
canvas = toyplot.Canvas(width=600, height=400) 
#making sure axes are cartesian coordinates and labelling axes
axes = canvas.cartesian(label="Normal Distribution Histogram",
                       xlabel="Values",
                       ylabel="Frequency") 
#show axes ticks
axes.x.ticks.show = True
axes.y.ticks.show = True
#Binning values into 20 bins using np.histogram, and coloring them orange
bars = axes.bars(
    np.histogram(arr, bins=20),
    style={"fill":"orange"}
                )
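It helps to look at what `np.histogram` returns before plotting it. The cell above passes the result straight to `axes.bars`. The sketch below unpacks it instead. The seeded generator is an assumption added so that the example is reproducible.

```python
import numpy as np

# seed chosen arbitrarily so the example is reproducible
rng = np.random.default_rng(seed=123)
arr = rng.normal(size=1000)

# np.histogram returns a tuple: the count in each bin, and the bin edges
counts, edges = np.histogram(arr, bins=20)

print(len(counts), len(edges))  # 20 bins are bounded by 21 edges
print(counts.sum())             # every sample falls into exactly one bin
```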

### Challenge 7: 
Write a `while` statement to continue running code in a loop until a condition is met, and then call `break` to end the loop. Inside of the loop, randomly draw a single value from a uniform distribution between 0 and 100. If the value is less than 25 and greater than 22 then break the loop, otherwise, continue the loop until a value meeting this condition is sampled. Use a variable to keep track of how many iterations of the loop are run, and print this value after calling `break`. 


In [11]:
#starting while loop
while True:
    
    #randomly drawing a number between 0-100 from a uniform distribution
    draw = np.random.uniform(low=0,high=100)
    
    #conditional statement if drawn number is between 22-25
    if draw > 22 and draw < 25:
        #then break the loop
        break

#print the drawn number to show it will always be between 22-25        
print(draw)

24.964062573774328
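The prompt also asks for a count of how many iterations were run, which the cell above does not track. Here is a minimal sketch of the same loop with a counter. It uses the standard library's `random.uniform` in place of `np.random.uniform` so that it runs without extra packages, and the variable name `iterations` is made up for the example.

```python
import random

# count how many draws it takes to land between 22 and 25
iterations = 0
while True:
    iterations += 1
    # randomly draw a number between 0 and 100 from a uniform distribution
    draw = random.uniform(0, 100)
    if 22 < draw < 25:
        break

# printed after the loop has ended
print(draw)
print(iterations)
```

The target interval covers 3% of the range, so the loop runs about 33 times on average.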


### Challenge 8: 
Use pandas to load a CSV file from https://eaton-lab.org/data/iris-data-dirty.csv and save as a dataframe. Add custom names to the columns in the dataframe, based on the type of values in them (e.g., numeric versus strings). You can come up with any column names you want for these.

In [12]:
#saving url
url = "https://eaton-lab.org/data/iris-data-dirty.csv"

#loading it as a csv
data = pd.read_csv(url)
#saving column types as a string list
c_name = [str(i) for i in data.dtypes]

#creating new list of column names, replacing float64 with count
c_new = [w.replace("float64","count") for w in c_name]
#and replacing object with species
c_new = [w.replace("object","species") for w in c_new]
#creating empty dictionary to store values
c_num = {}

#appending number to each column name
for i in range(len(c_new)):
    #what's the name?
    name = c_new[i]
    #if it's in there
    if name in c_num:
        c_num[name] +=1 #count up
    else:
        c_num[name] = 1 #if not start the count

    #add number to the name to distinguish columns 
    if name == "species":
        c_new[i] = f"string_{c_num[name]}"
    else:
        c_new[i] = f"{name}_{c_num[name]}"
        
#setting this list as the new column names for the data
data.columns = c_new

#outputting data
data

| | count_1 | count_2 | count_3 | count_4 | string_1 |
|---|---|---|---|---|---|
| 0 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 2 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 3 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| 4 | 5.4 | 3.9 | 1.7 | 0.4 | Iris-setosa |
| ... | ... | ... | ... | ... | ... |
| 144 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 145 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 146 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 147 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |
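The renaming step can also be driven by a dtype check instead of matching dtype names as text. This is a minimal sketch that uses a small in-memory sample in place of the real file. The sample has no header row, so `header=None` is passed. Whether the real file needs that option is an assumption not confirmed here.

```python
import io
import pandas as pd

# hypothetical two-row sample standing in for the downloaded CSV
csv_text = "5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n"
df = pd.read_csv(io.StringIO(csv_text), header=None)

# keep a separate running count for numeric and string columns
counts = {"count": 0, "string": 0}
names = []
for dtype in df.dtypes:
    kind = "count" if pd.api.types.is_numeric_dtype(dtype) else "string"
    counts[kind] += 1
    names.append(f"{kind}_{counts[kind]}")

df.columns = names
print(df)
```

Without `header=None`, pandas treats the first row of a file as column names, so that row would not appear in the data.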


### Challenge 9: 
Calculate the mean value of the data in the left-most column for all data where the right-most column matches the value "Iris-setosa". 

In [14]:
#I can do this by keeping only the rows where string_1 equals Iris-setosa, then calculating the mean of count_1
data[data["string_1"]=="Iris-setosa"]["count_1"].mean()

5.008333333333334
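The same filter-then-average step can be written with `.loc`, which selects the rows and the column in a single indexing call. The sketch below runs on a tiny made-up dataframe, not the iris data, so the number it prints is unrelated to the result above.

```python
import pandas as pd

# hypothetical three-row dataframe using the column names from this notebook
toy = pd.DataFrame({
    "count_1": [5.0, 4.0, 7.0],
    "string_1": ["Iris-setosa", "Iris-setosa", "Iris-virginica"],
})

# boolean mask: True for the rows to keep
mask = toy["string_1"] == "Iris-setosa"

# .loc[rows, column] picks the masked rows of count_1, then .mean() averages them
setosa_mean = toy.loc[mask, "count_1"].mean()
print(setosa_mean)
```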

### Challenge 10:
Create a copy of your iris data dataframe and name it `df2`. Sort the rows of this dataframe based on the values in the first (leftmost) column so that the lowest values are first, and the highest values at the bottom. After sorting, reset the index of the dataframe so that the index is once again ordered. Once you get this to work, try to rewrite it in a simpler form by chaining multiple function calls together to accomplish the goal in one line, by calling code that looks a bit like this, but with the correct function calls: `df.function().function().function()`

In [16]:
#the chained functions are copy, sort_values(by=col1), and reset_index
#note: this didn't work until I looked up on stackexchange (shorturl.at/fsJ49) that I had to add drop=True
#without it the old index is kept as an extra column; with it the index is replaced by increasing integers, which is what we wanted
df2 = data.copy().sort_values(by="count_1").reset_index(drop=True)

#output data to show
df2

| | count_1 | count_2 | count_3 | count_4 | string_1 |
|---|---|---|---|---|---|
| 0 | 4.3 | 3.0 | 1.1 | 0.1 | Iris-setosa |
| 1 | 4.4 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 2 | 4.4 | 3.0 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.4 | 2.9 | 1.4 | 0.2 | Iris-setosa |
| 4 | 4.5 | 2.3 | 1.3 | 0.3 | Iris-setosa |
| ... | ... | ... | ... | ... | ... |
| 144 | 7.7 | 2.6 | 6.9 | 2.3 | Iris-virginica |
| 145 | 7.7 | 3.8 | 6.7 | 2.2 | Iris-virginica |
| 146 | 7.7 | 2.8 | 6.7 | 2.0 | Iris-virginica |
| 147 | 7.7 | 3.0 | 6.1 | 2.3 | Iris-virginica |
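The effect of `drop=True` is easiest to see on a tiny example. The sketch below uses a made-up three-row dataframe and compares `reset_index()` with and without it.

```python
import pandas as pd

# hypothetical unsorted dataframe
toy = pd.DataFrame({"count_1": [3.0, 1.0, 2.0]})

# without drop=True the old row labels are kept as a new column named "index"
kept = toy.sort_values(by="count_1").reset_index()

# with drop=True the old labels are discarded and rows are renumbered from 0
dropped = toy.sort_values(by="count_1").reset_index(drop=True)

print(kept)
print(dropped)
```

Neither call modifies `toy`, because `sort_values` and `reset_index` return new dataframes by default.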


### Challenge 11:
Write a function that uses string formatting (curly braces) to create a [mad lib](https://en.wikipedia.org/wiki/Mad_Libs) containing at least 4 words that will be filled in. The returned object of your function should be a string where the missing words are filled by randomly sampled words from the `random_words_api()` function. The sentence or paragraph of your mad lib can be anything you wish, be creative. 

In [20]:
#defining function
def my_mad_lib():
    """
    This function randomly samples 4 words from the random_words_api
    It then will insert them into a sentence and output the resultant mad lib
    """
    #save 4 random words in a list
    words = random_words_api(4)
    
    #return this sentence, using curly braces and indexing to insert words from the above list
    return f"I'm just so {words[0]} with you, Please don't {words[1]} with me, Why yes bananas are {words[2]}, right? All is now {words[3]} honeyboo!"

my_mad_lib()

"I'm just so disown with you, Please don't catbrier with me, Why yes bananas are imperiling, right? All is now maltreaters honeyboo!"

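A variation that is easier to test takes the words as an argument, so the sentence can be checked without calling the web API. This sketch uses `str.format` with numbered curly braces. The function name `fill_mad_lib`, the template, and the sample words are all made up for the example.

```python
# hypothetical template with four numbered blanks
TEMPLATE = "The {0} {1} decided to {2} over the {3}."

def fill_mad_lib(words, template=TEMPLATE):
    """Fill the numbered blanks in the template with the given words."""
    # *words unpacks the list so each word fills the blank with its position
    return template.format(*words)

print(fill_mad_lib(["sleepy", "qubit", "jump", "pyres"]))
```

Passing the result of `random_words_api(4)` as `words` would give a random mad lib each time.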
<div class="alert alert-success">
After completing all challenges in this notebook, save and download the .ipynb file to your computer. Move the file to your hack-program repo and put it in a folder called notebooks. Add/stage this file and folder and commit the change, and push to GitHub. The assignment is due by end of day on 3/7/2021.  
</div>