**Working With CSV Files**

CSV files store tabular data, often a large number of records, as plain text. Think of them as heavily simplified spreadsheets (like Excel): each line is one row, and the values within a row are separated by commas.
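To make the plaintext point concrete, here is a minimal sketch using a small made-up table. The column names and values are invented for illustration and are not taken from the dataset used below. The first line is a header row, and every following line is one record.

```python
# A tiny, made-up CSV document held in a string (hypothetical sample data).
sample_csv = "id,chol,age\n1,203,46\n2,165,29\n3,,58\n"

# Because CSV is plain text, each line is a row and commas separate the values.
lines = sample_csv.splitlines()
header = lines[0].split(",")
first_record = lines[1].split(",")

print(header)        # the column names
print(first_record)  # the values are plain strings, not numbers
```

Splitting on commas by hand only works for simple data like this. It breaks as soon as a value itself contains a comma inside quotes, which is one reason to rely on the csv module instead.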

**Challenge 1**

Calculate the average and the highest cholesterol value based on the data available in the dataset.


In [2]:
# To parse CSV files, we use the csv module. CSV stands for "comma-separated values";
# the comma is what is known as a "delimiter." The csv module provides a number of built-in
# functions and classes that make it easier to parse and iterate through CSV files.
import csv

In [4]:
# Open the diabetes file. Note that when Python opens a file and stores it in a variable,
# the variable DOES NOT actually contain the file's text. In the example below, the
# diabetes_file variable holds a file object, which Python uses to read the file's contents.
# The csv module documentation recommends passing newline="" when opening CSV files.
diabetes_file = open("diabetes.csv", newline="")


# Now we need to tell Python that the file stored in the diabetes_file variable should be
# read and interpreted as a CSV file. We do that by calling the reader() function of the
# csv module, which returns a reader object that yields each row as a list of strings.
diabetes_data = csv.reader(diabetes_file)
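As a rough illustration of what the reader object produces, the sketch below parses a few made-up rows from an in-memory string instead of a file on disk. The names `sample_file` and `sample_reader` and the data itself are hypothetical. The points to notice are that every value arrives as a string and that a missing value shows up as an empty string.

```python
import csv
import io

# Hypothetical sample data standing in for a real file on disk.
sample_file = io.StringIO("id,chol\n1,203\n2,\n3,165\n")

sample_reader = csv.reader(sample_file)
rows = list(sample_reader)  # each row is a list of strings

header = rows[0]            # the first row holds the column names
data_rows = rows[1:]        # the remaining rows hold the data

# Convert non-empty values to integers; empty strings mark missing values.
chol_values = [int(row[1]) for row in data_rows if row[1] != ""]

print(header)
print(chol_values)
```

This is why the cells that follow check `row[1] != ""` and call `int(row[1])` before doing any arithmetic.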



In [8]:
# Calculate average cholesterol

diabetes_file.seek(0) # Reset the read position of the file object
next(diabetes_data) # Skip the header row so that it is not counted as a data row
cnt = 0 # Counts the rows that contain a cholesterol value
total = 0 # This variable will hold the sum of all cholesterol values

for row in diabetes_data:
    if row[1] != "": # Ignore rows where the cholesterol value is missing
        total = total + int(row[1])
        cnt = cnt + 1 # Increment the counter by one

print("Total: " , total)
print("Count: " , cnt)

avg_chol = total / cnt

print("Average: ", round(avg_chol, 2))

Total:  83554
Count:  402
Average:  207.85
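The same calculation can also be sketched with `csv.DictReader`, which consumes the header row itself and lets each value be looked up by column name. The column name `chol` and the sample rows below are assumptions made for illustration; the cell above only relies on the column's position (index 1).

```python
import csv
import io
from statistics import mean

# Hypothetical sample data; "chol" is an assumed column name.
sample_file = io.StringIO("id,chol\n1,200\n2,\n3,180\n4,220\n")

# DictReader reads the header row itself and yields one dict per data row.
reader = csv.DictReader(sample_file)
chol_values = [int(row["chol"]) for row in reader if row["chol"] != ""]

avg_chol = mean(chol_values)

print("Count: ", len(chol_values))
print("Average: ", avg_chol)
```

Collecting the values in a list first keeps the count and the sum tied to the same set of rows, so the header or a missing value cannot skew the average.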


In [9]:
# Calculate maximum cholesterol

diabetes_file.seek(0) # Reset the read position of the file object
next(diabetes_data) # Skip the header row
max_chol = 0 # This variable will hold the largest cholesterol value seen so far

for row in diabetes_data:
    if row[1] != "":
        # Every time through the loop (for every row that contains a value)
        # we compare the value from the data with the value stored in the
        # max_chol variable.
        # If the value from the data is larger, we set max_chol to that larger value.
        # After the loop finishes running, the largest value will be stored in max_chol.
        chol = int(row[1])
        if max_chol < chol:
            max_chol = chol


print("Maximum cholesterol: ", max_chol)

Maximum cholesterol:  443
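For comparison, the loop above can be condensed with Python's built-in `max()` function. This is only a sketch on made-up sample data (the rows and the name `sample_file` are hypothetical), not a replacement for the step-by-step version, which shows more clearly how the comparison works.

```python
import csv
import io

# Hypothetical sample data standing in for the real file.
sample_file = io.StringIO("id,chol\n1,200\n2,\n3,180\n4,220\n")

reader = csv.reader(sample_file)
next(reader)  # skip the header row

# max() walks through the non-empty values and keeps the largest one.
# default=None is returned if no row contains a value at all.
max_chol = max((int(row[1]) for row in reader if row[1] != ""), default=None)

print("Maximum cholesterol: ", max_chol)
```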
