# MSDS 430 Module 4 Python Assignment Solutions <font color=red>32 pts.</font>

<div class="alert alert-block alert-warning"><b>In this assignment you will read through the notebook and complete the exercises. Once you are satisfied with the results, submit your notebook, html file, and blizzard_totals.xlsx file to Canvas. Your files should include all output, i.e. run each cell and save your file before submitting.</b></div>

<div class="alert alert-block alert-danger"><b>Note:</b> You also must submit your <b>blizzard_totals.xlsx</b> file to Canvas for grading in addition to the usual notebook and html files.</div>

<div class="alert alert-block alert-info">One aspect of data science is working with data from files. In this assignment we will learn to read in data from four different file types:
    
1. text file (using a for loop and .readlines)
2. csv file (using pandas)
3. Excel file (using pandas)
4. JSON file (using pandas)
    
In the process we will be creating and manipulating Python lists. We will also see how data can be written to a new excel file. Later in the course we'll learn more about how to display this information neatly and manipulate the data more efficiently, but for now we start by learning the basics of reading and writing files.</div>

### Reading Text Files

You are given a file `DQ_Blizzard_Nutrition.txt` that contains nutritional information about Dairy Queen Blizzard&#174; varieties.  Each row in the text file is a list of twelve values that correspond to `Menu Item`, `Calories (kcal)`, `Fat Calories (kcal)`, `Total Fat (g)`, `Saturated Fat (g)`, `Trans Fat (g)`, `Cholesterol (mg)`, `Sodium (mg)`, `Carbohydrates (g)`, `Fiber (g)`, `Sugars (g)`, and `Protein (g)` separated by spaces.

In Python, the built-in `open` function takes the name of a text file in the current directory (or, more generally, a path to a text file anywhere on your computer) and returns a `file object`. This file object can be used to read from an existing text file, create and write to a new file, or append text to an existing file. See the following documentation for more information:

__[Opening Files in Python](https://docs.python.org/3/library/functions.html#open)__

For example, 
```python
fileName = open('my_file.txt', 'r')
```

opens a file named `my_file.txt` for reading (i.e. `mode = 'r'`) and returns a file object, which is assigned to the variable `fileName`. 

If the file cannot be opened for some reason (e.g. if the file doesn't exist in the current directory), then an error occurs. More specifically, an exception object (a `FileNotFoundError` for a missing file) is created and is said to be "raised". 
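
As a minimal sketch of how such an exception can be handled, the example below wraps `open` in a `try`/`except` block and uses a `with` statement so the file is closed automatically. The helper name `read_first_line` and the file names are hypothetical and chosen only for illustration; the demo writes its own temporary file so it does not depend on any course data.

```python
import os
import tempfile


def read_first_line(path):
    """Return the first line of a text file, or None if the file is missing."""
    try:
        # The 'with' statement closes the file even if an error occurs later
        with open(path, 'r') as f:
            return f.readline().rstrip('\n')
    except FileNotFoundError:
        print(f'Could not find {path}')
        return None


# Demonstrate with a temporary file (hypothetical names)
with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, 'example.txt')
    with open(existing, 'w') as f:
        f.write('first line\nsecond line\n')

    found = read_first_line(existing)
    missing = read_first_line(os.path.join(tmp, 'no_such_file.txt'))
```

Catching `FileNotFoundError` specifically, rather than every exception, keeps unrelated problems visible.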

In [1]:
# Set up notebook to display multiple outputs in one cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

Run the following cell to read in `DQ_Blizzard_Nutrition.txt` using the `open` function. No output will be displayed just yet. We will learn a few ways to view the contents of the file in just a moment.

In [2]:
# Open a text file
fileName = open('DQ_Blizzard_Nutrition.txt', 'r')

### Displaying File Contents

If `my_file` is a file object corresponding to a text file, you can iterate over the lines of text in the file as follows:
```python
for line in my_file:
    print(line)  # Do something with each line...for example we can print the line
```

In this next example we will use a `for` loop to iterate over each line (one at a time) in our file object `fileName`. The variable `line` takes on the value of each line, which is then printed. After a line is printed, the loop body runs again and the value of `line` is overwritten with the next line in the file object. The loop continues until there are no more lines to read.

In [3]:
for line in fileName:
    print(line)

Cinnamon/Roll/Centers/Blizzard   890.0  350.0  32.0 20.0 1.5  120.0  390.0  132.0  1.0  106.0  18.0

Frosted/Sugar/Cookie/Blizzard   960.0  290.0  42.0  19.0  1.0   75.0  400.0  129.0  1.0  100.0  17.0

Butterfinger/Blizzard  730.0  230.0  26.0  15.0  1.0   55.0  330.0  107.0  2.0   81.0  18.0

Choco/Brownie/Extreme/Blizzard   810.0  330.0  36.0  21.0  1.0   55.0  370.0  111.0  4.0   87.0  16.0

Chocolate/Chip/Cookie/Dough/Blizzard  1030.0  370.0  41.0  24.0  1.0   60.0  570.0  151.0  2.0  111.0  17.0

Heath/Blizzard   860.0  330.0  37.0  23.0  1.0   65.0  440.0  119.0  1.0  106.0  16.0

M&M’s/Chocolate/Candy/Blizzard   800.0  240.0  27.0  17.0  1.0   60.0  250.0  124.0  2.0  107.0  16.0

OREO/Cookie/Blizzard   790.0  280.0  31.0  15.0  1.0   50.0  400.0  117.0  1.0   88.0  14.0

OREO/Hot/Cocoa/Blizzard  1050.0  410.0  45.0  22.0  1.5   65.0  500.0  147.0  2.0  113.0  19.0

Reese's/Peanut/Butter/Cups/Blizzard   750.0  280.0  31.0  16.0  1.0   60.0  380.0  102.0  2.0   88.0  19.0

Reese

Now let's look at the variable `line`. Notice below that it holds only the last line from the file and that this variable is a single string.

In [4]:
line

type(line)

'Very/Cherry/Chip/Blizzard   730.0  230.0  25.0  18.0  1.0   65.0  250.0  113.0  2.0   97.0  16.0'

str
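
The blank lines between rows in the printed output above appear because each line read from a file keeps its trailing newline character and `print` adds another. The sketch below shows one way to avoid the double spacing by stripping the newline first. It uses `io.StringIO` with two shortened, made-up rows as a stand-in for the real file, so the names `sample_file` and `cleaned` are illustrative only.

```python
import io

# A small in-memory stand-in for a text file (hypothetical, shortened rows)
sample_file = io.StringIO('Heath/Blizzard 860.0\nOREO/Cookie/Blizzard 790.0\n')

cleaned = []
for line in sample_file:
    stripped = line.rstrip('\n')  # remove the trailing newline kept by the file iterator
    cleaned.append(stripped)
    print(stripped)               # prints without an extra blank line
```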

## Python Collection Data Types

Python has four built-in collection types: lists, tuples, dictionaries, and sets. For now we will focus primarily on lists, with some attention to tuples.

1. **Lists** are ordered sequences of elements, with that order being specified by the order that the elements are in when the list is created or as elements are added to the list.  

    1. Lists are created using the `[]` syntax.
    
    2. Lists are <font color ='green'>**mutable**</font>. You can add, remove, and replace values using methods such as `append()`, `extend()`, `insert()`, `pop()`, and `remove()`, or the `del` statement. 
    
    3. Lists can be created by string methods such as `split()` and `splitlines()`.
    
2. **Tuples** are similar to lists except for the very important fact that they are <font color = 'green'>**immutable**</font>.

    1. Tuples are created using the `()` syntax.
    
    2. Since tuples are immutable, they have no built-in methods that modify their elements.
    
    3. When to use a Tuple?  When you have data that will never change, like the days of the week:
       `days_of_the_week = ("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")`
    
3. **Both Lists and Tuples**

    1. Can include mixed data types.
    
    2. Are accessed by index.
    
    3. Contain a sequence of individual elements.
    
    4. Are stored in the order in which they were added.
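
The difference between mutable lists and immutable tuples can be illustrated with a short sketch. The variable names and values below are made up for the example.

```python
# Lists are mutable: elements can be added and replaced in place
flavors = ['Heath', 'OREO']
flavors.append('Snickers')      # add to the end
flavors[0] = 'Butterfinger'     # replace by index

# Tuples are immutable: assigning to an index raises a TypeError
days = ('Sunday', 'Monday')
try:
    days[0] = 'Funday'
    tuple_changed = True
except TypeError as err:
    tuple_changed = False
    print(f'Tuples cannot be modified: {err}')
```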

### Separate each column of line
Recall that each line of the file was read in as a single string. The goal is to break each line into a list of strings. To do this, we will use the `split()` method, which is defined in the string class. We will explore the string class and its methods in more detail in a later module.

To use the split method we need to first have a string object. Above we created a string object called `line`. We will call the `split()` method on this object in this way:
```python
line.split()
```

In [5]:
# Use the split method to create a list of values for the line
lst = line.split()
print(lst)

type(lst)

['Very/Cherry/Chip/Blizzard', '730.0', '230.0', '25.0', '18.0', '1.0', '65.0', '250.0', '113.0', '2.0', '97.0', '16.0']


list

Now that our string is split into a list of strings, we can use indexing to retrieve specific values. Run the following three cells for some examples showing how to access elements of the list.

In [6]:
print(f'The first element of the list is {lst[0]}')

The first element of the list is Very/Cherry/Chip/Blizzard


In [7]:
print(f'The fifth element of the list is {lst[4]}')

The fifth element of the list is 18.0


In [8]:
print(f'The last element of the list is {lst[-1]}')

The last element of the list is 16.0


<div class="alert alert-block alert-success"><b>Problem 1 (3 pts)</b>: Iterate over lines in the file as demonstrated above and print the following for each line:</div>

`" < Menu Item > has < Saturated Fat > g of saturated fat and < Cholesterol > mg of cholesterol"`<br>

<div class="alert alert-block alert-info">For example, the first line printed should look like this: </div>

`Cinnamon/Roll/Centers/Blizzard has 20.0 g of saturated fat and 120.0 mg of cholesterol`

In [9]:
# TODO: Use the open() method to read in 'DQ_Blizzard_Nutrition.txt' and assign it to the file object
# named 'blizzard'
blizzard = open("DQ_Blizzard_Nutrition.txt", "r")

# TODO: Iterate over each line in 'blizzard' and split each line into a list of strings
for line in blizzard:
    
    lines = line.split()
    
    # TODO: Print the sentence shown above for each line
    print(f'{lines[0]} has {lines[4]} g of saturated fat and {lines[6]} mg of cholesterol')

# TODO: Close the file
blizzard.close()

Cinnamon/Roll/Centers/Blizzard has 20.0 g of saturated fat and 120.0 mg of cholesterol
Frosted/Sugar/Cookie/Blizzard has 19.0 g of saturated fat and 75.0 mg of cholesterol
Butterfinger/Blizzard has 15.0 g of saturated fat and 55.0 mg of cholesterol
Choco/Brownie/Extreme/Blizzard has 21.0 g of saturated fat and 55.0 mg of cholesterol
Chocolate/Chip/Cookie/Dough/Blizzard has 24.0 g of saturated fat and 60.0 mg of cholesterol
Heath/Blizzard has 23.0 g of saturated fat and 65.0 mg of cholesterol
M&M’s/Chocolate/Candy/Blizzard has 17.0 g of saturated fat and 60.0 mg of cholesterol
OREO/Cookie/Blizzard has 15.0 g of saturated fat and 50.0 mg of cholesterol
OREO/Hot/Cocoa/Blizzard has 22.0 g of saturated fat and 65.0 mg of cholesterol
Reese's/Peanut/Butter/Cups/Blizzard has 16.0 g of saturated fat and 60.0 mg of cholesterol
Reese's/Take/Five/Blizzard has 21.0 g of saturated fat and 65.0 mg of cholesterol
Royal/New/York/Cheesecake/Blizzard/Filled/with/Strawberry has 21.0 g of saturated fat and

### Creating Lists

Our next objective is to create three lists from the data: (1) a list of the names of each Blizzard&#174;, (2) a list with the corresponding number of calories, and (3) a list with the corresponding amount of sodium. 

Since we closed the file in Problem #1, we need to reopen the `DQ_Blizzard_Nutrition.txt` file for reading. This time we read all the lines at once using the file object's `readlines` method.

In [10]:
blizzard = open("DQ_Blizzard_Nutrition.txt", "r")
lines = blizzard.readlines()
print(lines)
print(40*'=')

# Check the data type for 'lines'
type(lines)

['Cinnamon/Roll/Centers/Blizzard   890.0  350.0  32.0 20.0 1.5  120.0  390.0  132.0  1.0  106.0  18.0\n', 'Frosted/Sugar/Cookie/Blizzard   960.0  290.0  42.0  19.0  1.0   75.0  400.0  129.0  1.0  100.0  17.0\n', 'Butterfinger/Blizzard  730.0  230.0  26.0  15.0  1.0   55.0  330.0  107.0  2.0   81.0  18.0\n', 'Choco/Brownie/Extreme/Blizzard   810.0  330.0  36.0  21.0  1.0   55.0  370.0  111.0  4.0   87.0  16.0\n', 'Chocolate/Chip/Cookie/Dough/Blizzard  1030.0  370.0  41.0  24.0  1.0   60.0  570.0  151.0  2.0  111.0  17.0\n', 'Heath/Blizzard   860.0  330.0  37.0  23.0  1.0   65.0  440.0  119.0  1.0  106.0  16.0\n', 'M&M’s/Chocolate/Candy/Blizzard   800.0  240.0  27.0  17.0  1.0   60.0  250.0  124.0  2.0  107.0  16.0\n', 'OREO/Cookie/Blizzard   790.0  280.0  31.0  15.0  1.0   50.0  400.0  117.0  1.0   88.0  14.0\n', 'OREO/Hot/Cocoa/Blizzard  1050.0  410.0  45.0  22.0  1.5   65.0  500.0  147.0  2.0  113.0  19.0\n', "Reese's/Peanut/Butter/Cups/Blizzard   750.0  280.0  31.0  16.0  1.0   60.0 

list

The list `lines` created above is a single list with each row from the file as an item in the list. We can now iterate over `lines` and split each element into its own list. Doing so lets us build new lists by column index, for example one for `Fat Calories` or one for `Sugars`.<br>
Recall, last week we learned a few ways to create lists. We learned to build a list from scratch, create a list using list comprehensions, and create or change a list using the `append` method. In this next example we start with an empty list `cholesterol` then iterate over `lines` and append to the list based on indices of the appropriate column. Notice we also convert the `Cholesterol` value to a float at the time it is added to the list.

In [11]:
cholesterol = [] # Start with an empty list

for line in lines: # Iterate over the 'lines' list
    split = line.split()  # Split each string (element) in 'lines' into its own list
    cholesterol.append(float(split[6])) # append 'Cholesterol' to the 'cholesterol' list as a float type

print(cholesterol)  # Display the list

[120.0, 75.0, 55.0, 55.0, 60.0, 65.0, 60.0, 50.0, 65.0, 60.0, 65.0, 105.0, 65.0, 65.0, 70.0, 65.0, 65.0]
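
The same kind of list can also be built with a list comprehension, one of the approaches mentioned above. This is a sketch on two rows copied from the file output shown earlier; the names `sample_lines` and `cholesterol_sample` are illustrative and are not part of the assignment.

```python
# Two rows from the text file, used here as a small stand-in for 'lines'
sample_lines = [
    'Heath/Blizzard   860.0  330.0  37.0  23.0  1.0   65.0  440.0  119.0  1.0  106.0  16.0\n',
    'OREO/Cookie/Blizzard   790.0  280.0  31.0  15.0  1.0   50.0  400.0  117.0  1.0   88.0  14.0\n',
]

# Split each row, take the cholesterol column (index 6), and convert it to a float
cholesterol_sample = [float(row.split()[6]) for row in sample_lines]
print(cholesterol_sample)
```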


<div class="alert alert-block alert-success"><b>Problem 2 (5 pts.)</b>: Complete the following tasks:
    
1. Start with two empty lists: <b><i>calories</i></b> and <b><i>sodium</i></b> then iterate over the <b><i>lines</i></b> list, splitting each line in turn.
2. Append each value for calories and sodium (converted to a float type) to the appropriate list.
3. Print each list.
</div>

In [12]:
# TODO: Create two empty lists: 'calories' and 'sodium'
calories = []
sodium = []

# TODO: Iterate over the 'lines' list and split each line
for line in lines:
    split = line.split()
    
    # TODO: Append calories and sodium to the appropriate list while converting each to a float type
    calories.append(float(split[1]))
    sodium.append(float(split[7]))

<div class="alert alert-block alert-success"><b>Problem 2 continued:</b> Print the <b><i>calories</i></b> list.</div>

In [13]:
# TODO: Print the 'calories' list
print(calories)

[890.0, 960.0, 730.0, 810.0, 1030.0, 860.0, 800.0, 790.0, 1050.0, 750.0, 1120.0, 1040.0, 1040.0, 800.0, 1010.0, 1020.0, 730.0]


<div class="alert alert-block alert-success"><b>Problem 2 continued:</b> Print the <b><i>sodium</i></b> list.</div>

In [14]:
# TODO: Print the 'sodium' list
print(sodium)

[390.0, 400.0, 330.0, 370.0, 570.0, 440.0, 250.0, 400.0, 500.0, 380.0, 860.0, 530.0, 480.0, 340.0, 510.0, 390.0, 250.0]


One of Python's built-in string methods is `replace()`. It can be used when we want to replace or remove a character in a string. Below are two simple examples. The first one replaces the `#` symbol in a string with a blank space and the second one replaces `-` with `.`. In each case, the symbol is replaced everywhere it occurs in the string. We could add a third argument to limit the number of times the symbol is replaced. We will learn more efficient ways to remove or replace characters in strings later in the course.

In [15]:
my_string = 'abc#def#ghi'

my_string.replace('#', ' ')

'abc def ghi'

In [16]:
phone_num = '1-402-363-9951'
phone_num.replace('-', '.')

'1.402.363.9951'
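
The optional third argument to `replace()` limits how many occurrences are replaced. A brief sketch, reusing the phone number from the cell above:

```python
phone_num = '1-402-363-9951'

# Replace only the first '-' by passing a count of 1
first_only = phone_num.replace('-', '.', 1)
print(first_only)

# Strings are immutable, so replace() returns a new string
# and leaves the original unchanged
print(phone_num)
```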

As we can see from the file contents, the name of each Blizzard&#174; contains forward slashes in it. It would be nice if we could replace those with a blank space.

<div class="alert alert-block alert-success"><b>Problem 3 (5 pts.)</b>: Complete the following tasks:
    
1. Start with one empty list named <b><i>menu_item</i></b> then iterate over the <b><i>lines</i></b> list, splitting each line in turn. <br>
2. Obtain the element with the name of the Blizzard&#174; and replace <b><mark>/</mark></b> with a blank space.<br>
3. Append the revised name to the <b><i>menu_item</i></b> list.
4. Print the <b><i>menu_item</i></b> list.
</div>

In [17]:
# TODO: Start with an empty list named 'menu_item'
menu_item = []

# TODO: Iterate over the 'lines' list and split each line
for line in lines:
    split = line.split()
    
    # TODO: Obtain the element with the name of the Blizzard and replace / with a blank space
    split[0] = split[0].replace('/',' ')
    
    # TODO: Append the revised name to the 'menu_item' list
    menu_item.append(split[0])

# TODO: Print the 'menu_item' list
print(menu_item)

['Cinnamon Roll Centers Blizzard', 'Frosted Sugar Cookie Blizzard', 'Butterfinger Blizzard', 'Choco Brownie Extreme Blizzard', 'Chocolate Chip Cookie Dough Blizzard', 'Heath Blizzard', 'M&M’s Chocolate Candy Blizzard', 'OREO Cookie Blizzard', 'OREO Hot Cocoa Blizzard', "Reese's Peanut Butter Cups Blizzard", "Reese's Take Five Blizzard", 'Royal New York Cheesecake Blizzard Filled with Strawberry', 'Royal Ultimate Choco Brownie Blizzard Filled with Fudge', 'Snickers Blizzard', 'Snickers Brownie Blizzard', 'Turtle Pecan Cluster Blizzard', 'Very Cherry Chip Blizzard']


### Working with Methods

Next we will introduce two tools for working with lists and ask you to use them together in the next problem. First we have the built-in `max` function to get the maximum value in a list.

In [18]:
my_list = [1,2,3,10,4,5,6]
max(my_list)

10

Second, we can get the "position" of any value in the list using the `index` method. Note that the first position has `index` **zero** and not **one**. So it would be more accurate to think of the `index` as the <b><i>offset</i></b> as opposed to the <b><i>position</i></b>. 

In [19]:
my_list.index(10)

3

Run the following cell to double check that the value in position (offset) 3 really is 10.

In [20]:
my_list[3]

10

It is important to understand the two parts of a list and how to access them: (1) the actual value in the list and (2) the index that identifies the position of that value.

In [21]:
# Create a new list of colors named 'my_list2'
my_list2 = ['red','yellow','green','blue','purple']

# Display the elements of 'my_list2'
my_list2

# Confirm it is a list
type(my_list2)

# How long is our list?
len(my_list2)

['red', 'yellow', 'green', 'blue', 'purple']

list

5

Below we loop through `my_list2` to show how both the value and its index are referenced.

In [22]:
length = len(my_list2)

for num in range(0,length):
    print(f'The value is {my_list2[num]} and the index is {num}.')
    

The value is red and the index is 0.
The value is yellow and the index is 1.
The value is green and the index is 2.
The value is blue and the index is 3.
The value is purple and the index is 4.
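
As an alternative to `range(len(...))`, Python's built-in `enumerate` function yields the index and the value together. The sketch below repeats the loop above on a copy of the color list; the names `colors` and `pairs` are illustrative.

```python
colors = ['red', 'yellow', 'green', 'blue', 'purple']

pairs = []
for index, value in enumerate(colors):
    pairs.append((index, value))  # keep each (index, value) pair
    print(f'The value is {value} and the index is {index}.')
```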


<div class="alert alert-block alert-success"><b>Problem 4 (6 pts.)</b>: Define a function called <b><i>high_content</i></b> that takes three arguments: <b><i>item_list, info_list</i></b> and <b><i> label</i></b>. The first two arguments are lists and the third is a string variable. 
    
The lists used for this problem are ones created above: `menu_item`, `sodium`, and `cholesterol`.<br>

The function should find the Blizzard&#174; with the largest value for either sodium or cholesterol and print the name of the Blizzard&#174; together with the highest value and the label for that value. </div>

<div class="alert alert-block alert-info">For example, <br>

`high_content(['Item1','Item2','Item3'], [300,410,250],'sodium')` should print: 

**Item2 has the highest sodium level with 410 mg.**<br>
</div>

In [23]:
# TODO: Define a function called 'high_content' that takes 3 arguments 'item_list', 'info_list', and 'label'
def high_content(item_list,info_list, label):
    
    #TODO: Find the maximum value and save to a variable
    most = max(info_list)
    
    #TODO: Print the output as indicated above
    for num in range(len(item_list)):
        if info_list[num] == most:
            print(f'{item_list[num]} has the highest {label} level with {most} mg.')
    


<div class="alert alert-block alert-success"><b>Problem 4 continued:</b> Call the <b>high_content</b> function by passing the lists <b><i>menu_item</i></b> and <b><i>cholesterol</i></b> for the first two arguments and the label <b><i>'cholesterol'</i></b> for the third argument.</div>

In [24]:
#TODO: Call the function 'high_content' by passing the lists menu_item and cholesterol for the first
# two arguments and pass 'cholesterol' for the third argument
high_content(menu_item, cholesterol, 'cholesterol')

Cinnamon Roll Centers Blizzard has the highest cholesterol level with 120.0 mg.


<div class="alert alert-block alert-success"><b>Problem 4 continued:</b> Call the <b>high_content</b> function by passing the lists <b><i>menu_item</i></b> and <b><i>sodium</i></b> for the first two arguments and the label <b><i>'sodium'</i></b> for the third argument.</div>

In [25]:
#TODO: Call the function 'high_content' by passing the lists menu_item and sodium for the first
# two arguments and pass 'sodium' for the third argument
high_content(menu_item, sodium, 'sodium')

Reese's Take Five Blizzard has the highest sodium level with 860.0 mg.
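
The function can also be sketched by combining `max` with the list's `index` method, as introduced in this section. The name `high_content_by_index` is hypothetical, and this version is an alternative to the loop-based solution above rather than a replacement for it. One difference: `index` returns only the first matching position, so if two items tie for the maximum, only the first is reported, whereas the loop above prints every tied item.

```python
def high_content_by_index(item_list, info_list, label):
    """Report the item whose value in info_list is largest."""
    most = max(info_list)              # largest value
    position = info_list.index(most)   # offset of the first occurrence of that value
    message = f'{item_list[position]} has the highest {label} level with {most} mg.'
    print(message)
    return message


result = high_content_by_index(['Item1', 'Item2', 'Item3'], [300, 410, 250], 'sodium')
```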


### Reading files using Pandas

__[Pandas Overview](https://pandas.pydata.org/pandas-docs/stable/getting_started/overview.html)__

**Pandas** is a large library which is used extensively in Data Science to wrangle data.  As you can see in the cell below, the library must be loaded with the command:

<font color = 'green'>**import**</font> pandas <font color = 'green'>**as**</font> pd

**Pandas** gives you access to two additional data structures: (1) a one dimensional <i><u>Series</u></i> and (2) a two dimensional <i><u>DataFrame</u></i>. We will look at some basics of the DataFrame which has the following attributes:
1. a spreadsheet-like structure
2. an ordered collection of columns
3. columns that can each hold a different value type, such as numeric, boolean, or string
4. both a row index and a column index
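
These attributes can be seen on a small DataFrame built by hand. The sketch below uses a few values taken from the Blizzard data; the variable names are illustrative.

```python
import pandas as pd

# Build a small DataFrame from a dictionary of columns
sample = pd.DataFrame({
    'Menu Item': ['Heath Blizzard', 'OREO Cookie Blizzard'],
    'Calories (kcal)': [860, 790],
    'Trans Fat (g)': [1.0, 1.0],
})

column_names = list(sample.columns)          # the column index
row_labels = list(sample.index)              # the row index
calories_column = sample['Calories (kcal)']  # a single column is a Series

print(sample.dtypes)  # each column has its own value type
```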


We begin by importing pandas and reading in `DQ_Blizzard_Nutrition.csv`.

In [26]:
import pandas as pd

blizzard_csv = pd.read_csv('DQ_Blizzard_Nutrition.csv')
blizzard_csv

Unnamed: 0,Cinnamon Roll Centers Blizzard,890,350,32,20,1.5,120,390,132,1,106,18
0,Frosted Sugar Cookie Blizzard,960,290,42,19,1.0,75,400,129,1,100,17
1,Butterfinger Blizzard,730,230,26,15,1.0,55,330,107,2,81,18
2,Choco Brownie Extreme Blizzard,810,330,36,21,1.0,55,370,111,4,87,16
3,Chocolate Chip Cookie Dough Blizzard,1030,370,41,24,1.0,60,570,151,2,111,17
4,Heath Blizzard,860,330,37,23,1.0,65,440,119,1,106,16
5,M&M's Chocolate Candy Blizzard,800,240,27,17,1.0,60,250,124,2,107,16
6,OREO Cookie Blizzard,790,280,31,15,1.0,50,400,117,1,88,14
7,OREO Hot Cocoa Blizzard,1050,410,45,22,1.5,65,500,147,2,113,19
8,Reese's Peanut Butter Cups Blizzard,750,280,31,16,1.0,60,380,102,2,88,19
9,Reese's Take Five Blizzard,1120,440,49,21,1.0,65,860,145,5,111,29


<div class="alert alert-block alert-success"><b>Problem 5 (3 pts.)</b>: As you can see from the result of reading in the csv file, the first record appears as the header.

Read the file in again so that there is no header and all records from the file show correctly. Use the documentation as needed: __[pandas.read_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)__
</div>

In [27]:
#TODO: Read in the 'DQ_Blizzard_Nutrition.csv' file without a header
blizzard_csv = pd.read_csv('DQ_Blizzard_Nutrition.csv', header = None)

#TODO: Display the file contents to confirm all records are accurate
blizzard_csv

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11
0,Cinnamon Roll Centers Blizzard,890,350,32,20,1.5,120,390,132,1,106,18
1,Frosted Sugar Cookie Blizzard,960,290,42,19,1.0,75,400,129,1,100,17
2,Butterfinger Blizzard,730,230,26,15,1.0,55,330,107,2,81,18
3,Choco Brownie Extreme Blizzard,810,330,36,21,1.0,55,370,111,4,87,16
4,Chocolate Chip Cookie Dough Blizzard,1030,370,41,24,1.0,60,570,151,2,111,17
5,Heath Blizzard,860,330,37,23,1.0,65,440,119,1,106,16
6,M&M's Chocolate Candy Blizzard,800,240,27,17,1.0,60,250,124,2,107,16
7,OREO Cookie Blizzard,790,280,31,15,1.0,50,400,117,1,88,14
8,OREO Hot Cocoa Blizzard,1050,410,45,22,1.5,65,500,147,2,113,19
9,Reese's Peanut Butter Cups Blizzard,750,280,31,16,1.0,60,380,102,2,88,19
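
With `header = None`, pandas labels the columns with integers. If descriptive labels are wanted, `read_csv` also accepts a `names` argument. The sketch below is an optional extension, not part of the problem; it reads a two-row, three-column CSV from an in-memory string rather than the course file, and the column subset is chosen only for illustration.

```python
import io
import pandas as pd

# A small in-memory CSV with no header row (values taken from the Blizzard data)
csv_text = (
    'Heath Blizzard,860,440\n'
    'OREO Cookie Blizzard,790,400\n'
)

column_names = ['Menu Item', 'Calories (kcal)', 'Sodium (mg)']

# header=None treats the first row as data; names supplies the column labels
sample_csv = pd.read_csv(io.StringIO(csv_text), header=None, names=column_names)
print(sample_csv)
```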


### Reading Excel files with Pandas

Next we read in an Excel file using pandas and look at the information about the file using the `info` method. Notice that `info( )` tells you how many rows, how many non-nulls per column and each variable type. After this we look at the first 5 rows of the data using the `head` method and the last 5 rows using the `tail` method.

In [28]:
# Read in the file named 'DQ_Blizzard_Nutrition.xlsx'
blizzard_excel = pd.read_excel('DQ_Blizzard_Nutrition.xlsx')

# Inspect the data using the info method
blizzard_excel.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 12 columns):
 #   Column                                                    Non-Null Count  Dtype 
---  ------                                                    --------------  ----- 
 0   Nutrition information for Dairy Queen Blizzard varieties  20 non-null     object
 1   Unnamed: 1                                                18 non-null     object
 2   Unnamed: 2                                                18 non-null     object
 3   Unnamed: 3                                                18 non-null     object
 4   Unnamed: 4                                                18 non-null     object
 5   Unnamed: 5                                                18 non-null     object
 6   Unnamed: 6                                                18 non-null     object
 7   Unnamed: 7                                                18 non-null     object
 8   Unnamed: 8                      

In [29]:
# Look at the first 5 rows
blizzard_excel.head()

Unnamed: 0,Nutrition information for Dairy Queen Blizzard varieties,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11
0,"Includes fat content, cholesterol, sodium, car...",,,,,,,,,,,
1,Menu Item,Calories (kcal),Fat Calories (kcal),Total Fat (g),Saturated Fat (g),Trans Fat (g),Cholesterol (mg),Sodium (mg),Carbohydrates (g),Fiber (g),Sugars (g),Protein (g)
2,Cinnamon Roll Centers Blizzard,890,350,32,20,1.5,120,390,132,1,106,18
3,Frosted Sugar Cookie Blizzard,960,290,42,19,1,75,400,129,1,100,17
4,Butterfinger Blizzard,730,230,26,15,1,55,330,107,2,81,18


As we can see, the first two rows appear to be typed comments and are not part of the data with nutrition information. We can also use the `tail()` method to look at the last few lines of the file.

In [30]:
blizzard_excel.tail()

Unnamed: 0,Nutrition information for Dairy Queen Blizzard varieties,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11
15,Snickers Blizzard,800.0,250.0,28.0,15.0,1.0,65.0,340.0,120.0,1.0,102.0,19.0
16,Snickers Brownie Blizzard,1010.0,340.0,37.0,20.0,1.0,70.0,510.0,151.0,3.0,118.0,21.0
17,Turtle Pecan Cluster Blizzard,1020.0,470.0,52.0,29.0,1.0,65.0,390.0,123.0,3.0,99.0,18.0
18,Very Cherry Chip Blizzard,730.0,230.0,25.0,18.0,1.0,65.0,250.0,113.0,2.0,97.0,16.0
19,All products contain milk. Some products conta...,,,,,,,,,,,


Again, it appears there is information in the last row that is not part of the data we're interested in. There is a way to handle unneeded rows when reading in an excel file, which will be used in the next problem. __[pandas.read_excel](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html)__

<div class="alert alert-block alert-success"><b>Problem 6 (4 pts.)</b>: Read in <b>DQ_Blizzard_Nutrition.xlsx</b> again so that the header is correct, the footer is removed, and all records from the file show correctly.
</div>

In [31]:
# TODO: Read in 'DQ_Blizzard_Nutrition.xlsx' so that the header is correct and the footer is removed
blizzard_excel = pd.read_excel('DQ_Blizzard_Nutrition.xlsx', skiprows = 2, skipfooter = 1)

# TODO: Display the first five rows of data
blizzard_excel.head()

Unnamed: 0,Menu Item,Calories (kcal),Fat Calories (kcal),Total Fat (g),Saturated Fat (g),Trans Fat (g),Cholesterol (mg),Sodium (mg),Carbohydrates (g),Fiber (g),Sugars (g),Protein (g)
0,Cinnamon Roll Centers Blizzard,890,350,32,20,1.5,120,390,132,1,106,18
1,Frosted Sugar Cookie Blizzard,960,290,42,19,1.0,75,400,129,1,100,17
2,Butterfinger Blizzard,730,230,26,15,1.0,55,330,107,2,81,18
3,Choco Brownie Extreme Blizzard,810,330,36,21,1.0,55,370,111,4,87,16
4,Chocolate Chip Cookie Dough Blizzard,1030,370,41,24,1.0,60,570,151,2,111,17


<div class="alert alert-block alert-success"><b>Problem 6 continued</b>: Display the last 3 rows of data.
</div>

In [32]:
# TODO: Display the last 3 rows of data
blizzard_excel.tail(3)

Unnamed: 0,Menu Item,Calories (kcal),Fat Calories (kcal),Total Fat (g),Saturated Fat (g),Trans Fat (g),Cholesterol (mg),Sodium (mg),Carbohydrates (g),Fiber (g),Sugars (g),Protein (g)
14,Snickers Brownie Blizzard,1010,340,37,20,1.0,70,510,151,3,118,21
15,Turtle Pecan Cluster Blizzard,1020,470,52,29,1.0,65,390,123,3,99,18
16,Very Cherry Chip Blizzard,730,230,25,18,1.0,65,250,113,2,97,16
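
To see what `skiprows` and `skipfooter` do without needing the Excel file, the sketch below applies the same two arguments to `read_csv` on an in-memory string. This is a CSV analogue of the Excel call above, not the same function; with `read_csv`, `skipfooter` requires `engine='python'`. The text content is a made-up, shortened imitation of the spreadsheet layout.

```python
import io
import pandas as pd

# Two comment rows at the top, a header row, two data rows, and a footer row
text = (
    'Nutrition information title row\n'
    'Description row\n'
    'Menu Item,Calories (kcal)\n'
    'Heath Blizzard,860\n'
    'OREO Cookie Blizzard,790\n'
    'All products contain milk.'
)

# Skip the first two rows and the last row; the third row becomes the header
trimmed = pd.read_csv(io.StringIO(text), skiprows=2, skipfooter=1, engine='python')
print(trimmed)
```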


### Read a json file

Now let's look at another file type, a `json` file. If we look at the raw data in `DQ_Blizzard_Nutrition.json`, it looks like this:<br>

`{"Menu Item":{"0":"Cinnamon Roll Centers Blizzard","1":"Frosted Sugar Cookie Blizzard","2":"Butterfinger Blizzard","3":"Choco Brownie Extreme Blizzard","4":"Chocolate Chip Cookie Dough Blizzard","5":"Heath Blizzard","6":"M&M's Chocolate Candy Blizzard","7":"OREO Cookie Blizzard","8":"OREO Hot Cocoa Blizzard","9":"Reese's Peanut Butter Cups Blizzard","10":"Reese's Take Five Blizzard","11":"Royal New York Cheesecake Blizzard Filled with Strawberry","12":"Royal Ultimate Choco Brownie Blizzard Filled with Fudge","13":"Snickers Blizzard","14":"Snickers Brownie Blizzard","15":"Turtle Pecan Cluster Blizzard","16":"Very Cherry Chip Blizzard"},"Calories (kcal)":{"0":890,"1":960,"2":730,"3":810,"4":1030,"5":860,"6":800,"7":790,"8":1050,"9":750,"10":1120,"11":1040,"12":1040,"13":800,"14":1010,"15":1020,"16":730},"Fat Calories (kcal)":{"0":350,"1":290,"2":230,"3":330,"4":370,"5":330,"6":240,"7":280,"8":410,"9":280,"10":440,"11":410,"12":410,"13":250,"14":340,"15":470,"16":230},"Total Fat (g)":{"0":32,"1":42,"2":26,"3":36,"4":41,"5":37,"6":27,"7":31,"8":45,"9":31,"10":49,"11":46,"12":45,"13":28,"14":37,"15":52,"16":25},"Saturated Fat (g)":{"0":20,"1":19,"2":15,"3":21,"4":24,"5":23,"6":17,"7":15,"8":22,"9":16,"10":21,"11":21,"12":29,"13":15,"14":20,"15":29,"16":18},"Trans Fat (g)":{"0":1.5,"1":1.0,"2":1.0,"3":1.0,"4":1.0,"5":1.0,"6":1.0,"7":1.0,"8":1.5,"9":1.0,"10":1.0,"11":1.5,"12":1.0,"13":1.0,"14":1.0,"15":1.0,"16":1.0},"Cholesterol (mg)":{"0":120,"1":75,"2":55,"3":55,"4":60,"5":65,"6":60,"7":50,"8":65,"9":60,"10":65,"11":105,"12":65,"13":65,"14":70,"15":65,"16":65},"Sodium (mg)":{"0":390,"1":400,"2":330,"3":370,"4":570,"5":440,"6":250,"7":400,"8":500,"9":380,"10":860,"11":530,"12":480,"13":340,"14":510,"15":390,"16":250},"Carbohydrates (g)":{"0":132,"1":129,"2":107,"3":111,"4":151,"5":119,"6":124,"7":117,"8":147,"9":102,"10":145,"11":140,"12":146,"13":120,"14":151,"15":123,"16":113},"Fiber 
(g)":{"0":1,"1":1,"2":2,"3":4,"4":2,"5":1,"6":2,"7":1,"8":2,"9":2,"10":5,"11":2,"12":5,"13":1,"14":3,"15":3,"16":2},"Sugars (g)":{"0":106,"1":100,"2":81,"3":87,"4":111,"5":106,"6":107,"7":88,"8":113,"9":88,"10":111,"11":112,"12":117,"13":102,"14":118,"15":99,"16":97},"Protein (g)":{"0":18,"1":17,"2":18,"3":16,"4":17,"5":16,"6":16,"7":14,"8":19,"9":19,"10":29,"11":19,"12":19,"13":19,"14":21,"15":18,"16":16}}`

__[pandas.read_json](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html)__

Next we read the `json` file using Pandas.


In [33]:
blizzard_json = pd.read_json('DQ_Blizzard_Nutrition.json')

# Determine the data type of 'blizzard_json'
type(blizzard_json)

# Look at first 3 records
blizzard_json.head(3)

# Display the file info
blizzard_json.info()

pandas.core.frame.DataFrame

Unnamed: 0,Menu Item,Calories (kcal),Fat Calories (kcal),Total Fat (g),Saturated Fat (g),Trans Fat (g),Cholesterol (mg),Sodium (mg),Carbohydrates (g),Fiber (g),Sugars (g),Protein (g)
0,Cinnamon Roll Centers Blizzard,890,350,32,20,1.5,120,390,132,1,106,18
1,Frosted Sugar Cookie Blizzard,960,290,42,19,1.0,75,400,129,1,100,17
2,Butterfinger Blizzard,730,230,26,15,1.0,55,330,107,2,81,18


<class 'pandas.core.frame.DataFrame'>
Int64Index: 17 entries, 0 to 16
Data columns (total 12 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Menu Item            17 non-null     object 
 1   Calories (kcal)      17 non-null     int64  
 2   Fat Calories (kcal)  17 non-null     int64  
 3   Total Fat (g)        17 non-null     int64  
 4   Saturated Fat (g)    17 non-null     int64  
 5   Trans Fat (g)        17 non-null     float64
 6   Cholesterol (mg)     17 non-null     int64  
 7   Sodium (mg)          17 non-null     int64  
 8   Carbohydrates (g)    17 non-null     int64  
 9   Fiber (g)            17 non-null     int64  
 10  Sugars (g)           17 non-null     int64  
 11  Protein (g)          17 non-null     int64  
dtypes: float64(1), int64(10), object(1)
memory usage: 1.7+ KB
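
To make the column-oriented layout concrete, here is a minimal sketch that parses a two-row excerpt of the same structure from an in-memory string instead of a file. The names `sample` and `df` are hypothetical, and the excerpt keeps only two of the twelve columns.

```python
import io

import pandas as pd

# A small stand-in for the file contents: {column name: {row label: value}}
sample = (
    '{"Menu Item": {"0": "Cinnamon Roll Centers Blizzard", '
    '"1": "Frosted Sugar Cookie Blizzard"}, '
    '"Calories (kcal)": {"0": 890, "1": 960}}'
)

# read_json accepts a file path or a file-like object; StringIO wraps the string
df = pd.read_json(io.StringIO(sample))

# Each top-level key became a column and each nested key became a row
print(df.shape)
print(list(df.columns))
```

Reading from a string this way is only for illustration; in the assignment the same call is given the file name directly.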


## Adding a Column to the DataFrame

We want to create a new column for the percentage of calories that come from fat. We have the total calories and the fat calories for each Blizzard&#174;, so the fat percentage is 100 times the ratio of `Fat Calories (kcal)` to `Calories (kcal)`, rounded to two decimals.

In [34]:
# Add a new column to the dataframe
blizzard_json['Fat %'] = (100 * blizzard_json['Fat Calories (kcal)'] / blizzard_json['Calories (kcal)']).round(2)

# Inspect the dataframe to see the new column
blizzard_json.info()
blizzard_json.head()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 17 entries, 0 to 16
Data columns (total 13 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Menu Item            17 non-null     object 
 1   Calories (kcal)      17 non-null     int64  
 2   Fat Calories (kcal)  17 non-null     int64  
 3   Total Fat (g)        17 non-null     int64  
 4   Saturated Fat (g)    17 non-null     int64  
 5   Trans Fat (g)        17 non-null     float64
 6   Cholesterol (mg)     17 non-null     int64  
 7   Sodium (mg)          17 non-null     int64  
 8   Carbohydrates (g)    17 non-null     int64  
 9   Fiber (g)            17 non-null     int64  
 10  Sugars (g)           17 non-null     int64  
 11  Protein (g)          17 non-null     int64  
 12  Fat %                17 non-null     float64
dtypes: float64(2), int64(10), object(1)
memory usage: 1.9+ KB


| | Menu Item | Calories (kcal) | Fat Calories (kcal) | Total Fat (g) | Saturated Fat (g) | Trans Fat (g) | Cholesterol (mg) | Sodium (mg) | Carbohydrates (g) | Fiber (g) | Sugars (g) | Protein (g) | Fat % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Cinnamon Roll Centers Blizzard | 890 | 350 | 32 | 20 | 1.5 | 120 | 390 | 132 | 1 | 106 | 18 | 39.33 |
| 1 | Frosted Sugar Cookie Blizzard | 960 | 290 | 42 | 19 | 1.0 | 75 | 400 | 129 | 1 | 100 | 17 | 30.21 |
| 2 | Butterfinger Blizzard | 730 | 230 | 26 | 15 | 1.0 | 55 | 330 | 107 | 2 | 81 | 18 | 31.51 |
| 3 | Choco Brownie Extreme Blizzard | 810 | 330 | 36 | 21 | 1.0 | 55 | 370 | 111 | 4 | 87 | 16 | 40.74 |
| 4 | Chocolate Chip Cookie Dough Blizzard | 1030 | 370 | 41 | 24 | 1.0 | 60 | 570 | 151 | 2 | 111 | 17 | 35.92 |
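
The column arithmetic above is element-wise: each row's fat calories are divided by that same row's total calories. A minimal sketch on a two-row frame shows this, with the values copied from the first two records. The name `demo` is hypothetical, and `Series.round` is used here as an equivalent of the built-in `round` in the cell above.

```python
import pandas as pd

# Two records copied from the data; `demo` is a hypothetical stand-in frame
demo = pd.DataFrame({
    "Menu Item": ["Cinnamon Roll Centers Blizzard", "Frosted Sugar Cookie Blizzard"],
    "Calories (kcal)": [890, 960],
    "Fat Calories (kcal)": [350, 290],
})

# Dividing one column by another pairs the values row by row
demo["Fat %"] = (100 * demo["Fat Calories (kcal)"] / demo["Calories (kcal)"]).round(2)

print(demo["Fat %"].tolist())  # [39.33, 30.21]
```

Assigning to a column name that does not exist yet is what appends the new column at the end of the frame.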


<div class="alert alert-block alert-success"><b>Problem 7 (4 pts.)</b>: Create two new columns called <b><i>Saturated Fat %</i></b> and <b><i>Sugar Calories</i></b>. Round your newly calculated values to two decimals for each new column. 
    
The first new column is the ratio of <b>Saturated Fat (g)</b> to <b>Total Fat (g)</b>, multiplied by 100. 
    
The calculation for the second column involves two steps (which can be combined into one):

1. Convert <b>Sugars (g)</b> to calories by multiplying by 3.87.
2. Divide the result of Step 1 by <b>Calories (kcal)</b> and multiply by 100.

    
Display the file info and the first 5 rows.
</div>
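
As a hand check of the two formulas, the sketch below applies them to the first record only (Cinnamon Roll Centers Blizzard), using plain Python numbers rather than DataFrame columns. The helper `percent_of` and the variable names are hypothetical and are not part of the assignment.

```python
def percent_of(part: float, whole: float) -> float:
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Values for row 0, copied from the data above
saturated_fat_g, total_fat_g = 20, 32
sugars_g, calories_kcal = 106, 890

# Saturated Fat %: saturated fat as a share of total fat
saturated_fat_pct = percent_of(saturated_fat_g, total_fat_g)

# Sugar Calories: grams of sugar converted to calories (x 3.87), as a share of total calories
sugar_calories = percent_of(3.87 * sugars_g, calories_kcal)

print(saturated_fat_pct, sugar_calories)  # 62.5 46.09
```

The same two expressions, written with whole columns in place of the scalar values, give the new columns for every row at once.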

In [35]:
# TODO: Create a new column called 'Saturated Fat %'
blizzard_json['Saturated Fat %'] = (100 * blizzard_json['Saturated Fat (g)'] / blizzard_json['Total Fat (g)']).round(2)

# TODO: Create a new column called 'Sugar Calories'
blizzard_json['Sugar Calories'] = (100 * 3.87 * blizzard_json['Sugars (g)'] / blizzard_json['Calories (kcal)']).round(2)

# TODO: Show the file info
blizzard_json.info()

# TODO: Show the first five rows of your file
blizzard_json.head()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 17 entries, 0 to 16
Data columns (total 15 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Menu Item            17 non-null     object 
 1   Calories (kcal)      17 non-null     int64  
 2   Fat Calories (kcal)  17 non-null     int64  
 3   Total Fat (g)        17 non-null     int64  
 4   Saturated Fat (g)    17 non-null     int64  
 5   Trans Fat (g)        17 non-null     float64
 6   Cholesterol (mg)     17 non-null     int64  
 7   Sodium (mg)          17 non-null     int64  
 8   Carbohydrates (g)    17 non-null     int64  
 9   Fiber (g)            17 non-null     int64  
 10  Sugars (g)           17 non-null     int64  
 11  Protein (g)          17 non-null     int64  
 12  Fat %                17 non-null     float64
 13  Saturated Fat %      17 non-null     float64
 14  Sugar Calories       17 non-null     float64
dtypes: float64(4), int64(10), object(1)
memory

| | Menu Item | Calories (kcal) | Fat Calories (kcal) | Total Fat (g) | Saturated Fat (g) | Trans Fat (g) | Cholesterol (mg) | Sodium (mg) | Carbohydrates (g) | Fiber (g) | Sugars (g) | Protein (g) | Fat % | Saturated Fat % | Sugar Calories |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Cinnamon Roll Centers Blizzard | 890 | 350 | 32 | 20 | 1.5 | 120 | 390 | 132 | 1 | 106 | 18 | 39.33 | 62.5 | 46.09 |
| 1 | Frosted Sugar Cookie Blizzard | 960 | 290 | 42 | 19 | 1.0 | 75 | 400 | 129 | 1 | 100 | 17 | 30.21 | 45.24 | 40.31 |
| 2 | Butterfinger Blizzard | 730 | 230 | 26 | 15 | 1.0 | 55 | 330 | 107 | 2 | 81 | 18 | 31.51 | 57.69 | 42.94 |
| 3 | Choco Brownie Extreme Blizzard | 810 | 330 | 36 | 21 | 1.0 | 55 | 370 | 111 | 4 | 87 | 16 | 40.74 | 58.33 | 41.57 |
| 4 | Chocolate Chip Cookie Dough Blizzard | 1030 | 370 | 41 | 24 | 1.0 | 60 | 570 | 151 | 2 | 111 | 17 | 35.92 | 58.54 | 41.71 |


### Writing to a File

Once you have added new columns to your DataFrame, it is a good idea to write the result out to a new file so that you can reuse it later without repeating the calculations.


<div class="alert alert-block alert-success"><b>Problem 8 (2 pts.)</b>: Write your updated DataFrame out to an <b>Excel</b> file called <b><i>blizzard_totals.xlsx</i></b>.  Upload your new file into Canvas along with the .ipynb and HTML files. 
    
Reference: __[Pandas DataFrame to Excel File](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html)__
</div>

In [36]:
# TODO: Write your file to an Excel file called 'blizzard_totals.xlsx'
blizzard_json.to_excel('blizzard_totals.xlsx', header=True, index=False)
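
One way to confirm that a file was written as intended is to read it back and compare. The following is a minimal sketch of that round trip, not part of the assignment. The helper `write_and_reload`, the frame `demo`, and the file name `demo.xlsx` are hypothetical, and the sketch assumes an Excel engine such as `openpyxl` is installed.

```python
from pathlib import Path

import pandas as pd


def write_and_reload(df: pd.DataFrame, path: Path) -> pd.DataFrame:
    """Write df to an Excel file without the index, then read the file back."""
    # index=False keeps the row labels 0, 1, ... out of the spreadsheet
    df.to_excel(path, index=False)
    return pd.read_excel(path)


# Hypothetical two-row stand-in for blizzard_json, with values taken from the data
demo = pd.DataFrame({
    "Menu Item": ["Heath Blizzard", "OREO Cookie Blizzard"],
    "Calories (kcal)": [860, 790],
})
```

For instance, `write_and_reload(demo, Path('demo.xlsx'))` should return a frame with the same two columns and no extra index column. If `index=False` were omitted, the reloaded frame would gain an additional unnamed column holding the old row labels.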