---
<a id='coffee_preference'></a>

# Practice Control Flow on the Coffee Preference Data Set

### 1) Load Coffee Preference data from file and print.

The code to load in the data is provided below. 

The statement `with open(..., 'r') as f:` opens a file in "read" mode (rather than "write"), assigns the open file object to `f`, and closes the file automatically when the indented block ends.

We can then use the file object's `.readlines()` method to read the CSV file into a list of strings, one per line, and assign that list to the variable `lines`. Each string keeps its trailing newline character.

In [None]:
with open('../assets/datasets/coffee-preferences.csv','r') as f:
    lines = f.readlines()
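
To see what `.readlines()` returns without touching the real file, here is a minimal sketch that reads from an in-memory file (`io.StringIO`). The contents of `fake_file` are made up for illustration and are not the actual data set.

```python
import io

# Hypothetical stand-in for the CSV file; the real file's contents will differ.
fake_file = io.StringIO("Name,Brand A\nAlice,3\nBob,\n")

sample_lines = fake_file.readlines()

# Each element is one line of the file, and the trailing '\n' is kept.
print(sample_lines)
```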

#### Iterate through `lines` and print them out.

In [None]:
for line in lines:
    print(line)

#### Print out just the `lines` object by typing "lines" in a cell and running it (`Shift` + `Enter`).

In [None]:
lines

---

### 2) Remove the remaining newline `'\n'` characters with a `for` loop.

Iterate through the lines of the data and remove the unwanted newline characters.

**`.replace('\n', '')`** is a built-in string method. Its first argument is the substring you want to replace and its second is the string you want to replace it with. It returns a new string and leaves the original unchanged.

In [None]:
for i, line in enumerate(lines):
    lines[i] = line.replace('\n', '')
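
Because strings are immutable, `.replace()` cannot change a line in place, which is why the loop assigns the result back into the list. The sketch below shows this on a made-up list (`sample_lines` is hypothetical) and also shows `.rstrip('\n')` in a list comprehension as one possible alternative.

```python
# Hypothetical sample lines, standing in for the real `lines` list.
sample_lines = ['Name,Brand A\n', 'Alice,3\n']

first = sample_lines[0]
cleaned = first.replace('\n', '')

# `first` is unchanged; `cleaned` is a new string without the newline.
print(repr(first))
print(repr(cleaned))

# Alternative: strip only trailing newlines, building a new list.
stripped = [line.rstrip('\n') for line in sample_lines]
print(stripped)
```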

---

### 3) Split the lines into "header" and "data" variables.

The header is the first string in the list of strings. It contains our data's column names.

In [None]:
header = lines[0]
data = lines[1:]

---

### 4) Split the header and data strings on commas.

To split a string on the comma character, use the built-in string method **`.split(',')`**.

Split the header on commas, then print the result. The method returns a list containing the items that were separated by commas; the original string is left unchanged.

Replace `header` with its list version, and replace each row string in `data` with a list (so `header` becomes a list of strings and `data` a list of lists):

In [None]:
print(header.split(','))
header = header.split(',')

for index, row in enumerate(data):
    data[index] = row.split(',')
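
As a quick illustration of how `.split(',')` treats missing values, the sketch below splits a made-up row (`sample_row` is hypothetical). Two adjacent commas produce an empty string, which is the kind of value a later step has to handle.

```python
# Hypothetical row with one missing rating between two commas.
sample_row = 'Alice,3,,5'

fields = sample_row.split(',')

# The gap between the adjacent commas becomes an empty string.
print(fields)
print(len(fields))
```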

---

### 5) Remove the "Timestamp" column.

We aren't interested in the "Timestamp" column in our data, so remove it from the header and data list.

Removing "Timestamp" from the header can be done with list functions or with slicing. To remove the timestamp column from the data, use a `for` loop.

Print out the new data object with the timestamps removed.

In [None]:
del header[0]
# you could also use slicing:
# header = header[1:]

for index, row in enumerate(data):
    data[index] = row[1:]

print(header)
print(data)

---

### 6) Convert numeric columns to floats and empty fields to `None`.

Iterate through the data and construct a new data list of lists that contains the numeric ratings converted from strings to floats and the empty fields (which are empty strings, '') replaced with the `None` object.

Use a nested `for` loop (a `for` loop within another `for` loop) to get the job done. You will likely need to use `if… else` conditional statements as well.

Print out the new data object to make sure you've succeeded.

In [None]:
# for each row (which is a list):
for index, row in enumerate(data):
    # look at all the items in the row
    for item_index, item in enumerate(row):
        # but ignore the first one (that's the name)
        if item_index > 0:
            # replace with None if it's an empty string
            if item == '':
                row[item_index] = None
            # or a float if it's a number
            else:
                row[item_index] = float(item)
    data[index] = row

print(data)
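
The same conversion can also be written with a small helper function and a list comprehension. This is only a sketch of an alternative: `convert_rating` and `sample_row` are hypothetical names introduced here, not part of the exercise.

```python
def convert_rating(item):
    """Return None for an empty string, otherwise the value as a float."""
    if item == '':
        return None
    return float(item)

# Hypothetical row: a name followed by rating strings, one of them empty.
sample_row = ['Alice', '3', '', '5']

# Keep the name as-is and convert every item after it.
converted = [sample_row[0]] + [convert_rating(item) for item in sample_row[1:]]
print(converted)
```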

---

### 7) Count the `None` values per person and put the counts in a dictionary.

Use a `for` loop to count the number of `None` values per person. Create a dictionary with the names of the people as keys and the counts of `None` as values.

Who rated the most coffee brands? Who rated the fewest?

In [None]:
none_counts = {}

for row in data:
    name = row[0]
    ratings = row[1:]
    none_counts[name] = ratings.count(None)

print(none_counts)

You can use `sorted` to sort the dictionary's items by value. With `reverse=True`, the largest counts come first.

The `lambda x: x[1]` is a way to specify a custom sort.

`x` represents an item in the dictionary as a (key, value) tuple, meaning `x[0]` is the key and `x[1]` is the value.

In [None]:
sorted(none_counts.items(), key=lambda x: x[1], reverse=True)
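
To make the custom sort concrete, here is a minimal sketch on a made-up dictionary (`sample_counts` and its values are hypothetical). It also shows `min` with a `key` argument as one way to pick out a single name.

```python
# Hypothetical counts of None values per person.
sample_counts = {'Alice': 2, 'Bob': 0, 'Cara': 5}

# Each item is a (name, count) tuple; x[1] selects the count.
by_count = sorted(sample_counts.items(), key=lambda x: x[1], reverse=True)
print(by_count)

# The person with the fewest None values rated the most brands.
rated_most = min(sample_counts, key=sample_counts.get)
print(rated_most)
```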

---

### 8) Calculate average rating per coffee brand.

**Excluding `None` values**, calculate the average rating per brand of coffee.

The final output should be a dictionary with the coffee brand names as keys and their average rating as the values.

Remember that the average can be calculated as the sum of the ratings over the number of ratings:

```python
average_rating = float(sum(ratings_list))/len(ratings_list)
```
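
When excluding `None` values, it is worth comparing against `None` explicitly. A plain truthiness check such as `if rating:` would also drop a rating of `0.0`. The sketch below uses a made-up list (`sample_ratings` is hypothetical) to show the difference.

```python
# Hypothetical ratings for one brand, including a missing value and a zero.
sample_ratings = [4.0, None, 0.0, 5.0]

# Explicit comparison keeps the 0.0 rating.
valid = [r for r in sample_ratings if r is not None]
average_rating = sum(valid) / len(valid) if valid else None
print(average_rating)

# A truthiness check silently discards 0.0 and inflates the average.
truthy = [r for r in sample_ratings if r]
print(sum(truthy) / len(truthy))
```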

Print your dictionary to see the average brand ratings.

In [None]:
# extract the brands
brands = header[1:]

ratings_dict = {}

# loop through each brand; `index` is the brand's position in `brands`, and
# `index + 1` is the position of its rating in each row, because the 0th
# item of each row is the person's name
for index, brand in enumerate(brands):
    # keep track of ratings per brand
    ratings_list = []
    for row in data:
        # compare with None explicitly so that a rating of 0.0 is not skipped
        if row[index+1] is not None:
            ratings_list.append(row[index+1])
    
    # now average the ratings and add to the dictionary
    # we'll also round it to two decimal places with the round function
    average_rating = round(float(sum(ratings_list))/len(ratings_list), 2)
    ratings_dict[brand] = average_rating

print(ratings_dict)

---

### 9) Create a list containing only the people's names.

In [None]:
people = []
for row in data:
    people.append(row[0])

print(people)

---

### 10) Picking a name at random. What are the odds of choosing the same name three times in a row?

Now, we'll use a `while` loop to "brute force" the odds of choosing the same name three times in a row randomly from the list of names.

"Brute force" is a term used frequently in programming for solving a problem by sheer computation rather than by a more efficient method. It applies here because the probability could be worked out directly with statistics, which is far cheaper than repeatedly playing out the scenario.

Below, we've imported the **`random`** module, which provides the essential function for this code: **`random.choice()`**.
The function takes a non-empty sequence, such as a list, as an argument and returns one of its elements at random.

In [None]:
import random
# Choose a random person from the list of people:
random.choice(people)

Write a function to choose a person from the list randomly three times and check if they are all the same.

Define a function that has the following properties:

1) Takes a list (your list of names) as an argument.
2) Selects a name using `random.choice(people)` three separate times.
3) Returns `True` if the name was the same all three times; otherwise returns `False`.

In [None]:
def random_draw(people):

    num_occurrences = 0
    same_name_three_times = False

    while not same_name_three_times:
        # increase the number of occurrences
        num_occurrences += 1

        # draw three names
        name_1 = random.choice(people)
        name_2 = random.choice(people)
        name_3 = random.choice(people)

        # test for equality
        if name_1 == name_2 == name_3:
            # found, so stop looping
            same_name_three_times = True
    
    print(f"The same name was drawn after {num_occurrences} occurrences")
    
# call the function!
random_draw(people)
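
The cell above keeps drawing until three names match and reports how many rounds that took. The function described in the numbered list, which returns `True` or `False` for a single round of three draws, could be sketched as follows. The names `all_three_match` and `sample_people` are hypothetical, and the comparison with `1 / n**2` assumes every name in the list is unique and equally likely.

```python
import random

def all_three_match(names):
    """Draw three names at random and report whether all three are equal."""
    first = random.choice(names)
    second = random.choice(names)
    third = random.choice(names)
    return first == second == third

# Hypothetical list of names; substitute the `people` list built earlier.
sample_people = ['Alice', 'Bob', 'Cara', 'Dev']

random.seed(0)  # fixed seed so repeated runs give the same estimate
trials = 100_000
hits = sum(all_three_match(sample_people) for _ in range(trials))
estimate = hits / trials

# With n unique, equally likely names, the exact probability is 1 / n**2:
# the first draw can be anything, and each later draw matches with chance 1/n.
exact = 1 / len(sample_people) ** 2
print(estimate, exact)
```

The simulated estimate should land close to the exact value, which illustrates why the simulation counts as brute force: the statistical answer needs one division, while the simulation needs many thousands of draws.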