# Data Analysis with CSVs

In this lesson, we'll complete a full data analysis starting from a raw data file. **Comma Separated Values** files, known as CSVs, are one of the most common file formats for storing tabular data. We'll show you how to load one into memory and work with it using Python.

# Objectives

Perform a complete data analysis by...

- Creating a Python data structure from a .csv file
- Exploring and cleaning the data 
- Conducting descriptive analysis
- Visualizing the results

# Loading a CSV

Before you can work with a CSV file, you need to load it into memory.

We'll be working with food data! Data source: https://www.kaggle.com/datasets/openfoodfacts/world-food-facts (version we're using has been lightly cleaned and edited for clarity, and because we don't need all 163 columns)

## First up - to the Terminal!

## Get the File Path

Make sure you have the path to your data file. For this example, it is in the `data` folder at the root of this repository.

Open a new Terminal window (one that doesn't have Jupyter running) and check out the _relative_ location and path between this notebook and the `OpenFoodFacts.csv` data file.

Now, let's save that relative path here, as a string variable.

The leading `.` refers to the current working directory, which is the root folder of the repository when working within this notebook.

In [1]:
csv_file_path = './data/OpenFoodFacts.csv'

## Inspecting CSV files

In a CSV file, each line represents one row of tabular data, and consecutive values in that row are separated by a comma. Often, the first row contains the column names separated by commas, also known as field names. Let's confirm that this is the case and learn about our dataset by printing the first five lines of the file.

We can inspect the data directly in our terminal! Use the bash command `head` with the flag `-n 5` to check out those first five lines.

### Printing Lines in Python

We use the `with open()` syntax to easily open and read the file in the notebook. Using this syntax will automatically close the file once the statement is done running.

In [2]:
# This code prints the first line of the CSV file

with open(csv_file_path) as csvfile:
    print(csvfile.readline())

product_name,brands,created_date,last_modified_date,serving_size,energy_100g,fat_100g,carbohydrates_100g,sugars_100g,proteins_100g
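
To see the automatic closing described above in action, here is a minimal sketch. It writes a tiny stand-in file with `tempfile` (the file contents and the name `sample_path` are made up for illustration) so that it runs without the real dataset.

```python
import os
import tempfile

# Write a tiny stand-in CSV so the example does not depend on the real data file
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("product_name,brands\nHoney,Steve's & Ed's\n")
    sample_path = tmp.name

with open(sample_path, encoding="utf-8") as csvfile:
    header = csvfile.readline()
    print(csvfile.closed)  # False: we are still inside the with block

print(csvfile.closed)      # True: the file was closed when the block ended

os.remove(sample_path)     # clean up the stand-in file
```

The variable `csvfile` still exists after the block, but trying to read from it would raise an error because the file is closed.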



Next, we can print the second line to look at an example of one observation from our dataset. 

In [3]:
# This code prints the second line of the CSV file

with open(csv_file_path) as csvfile:
    csvfile.readline()         # read past the header row
    print(csvfile.readline())  # print the first observation

Banana Chips Sweetened (Whole),,2017-03-09,2017-03-09,28.0 g,2243.0,28.57,64.29,14.29,3.57


### Discussion

Based on these two lines, what can we infer about the contents of the data file? What questions arise for you about the data?

- 


## Using the `csv` module

The [`csv` module](https://docs.python.org/3/library/csv.html) lets us easily process data in CSV files. We will use it to read each row in the `.csv` file and store its information in a Python object we can use for further analysis. 
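
One reason to reach for the `csv` module rather than splitting lines on commas yourself: some values in this dataset contain commas and are wrapped in double quotes. The sketch below uses a trimmed-down row modeled on this dataset (only three columns are kept for brevity) and compares a naive `split(',')` with `csv.reader`.

```python
import csv
import io

# A header row plus one data row whose product name contains a comma
header_line = 'product_name,brands,serving_size'
data_line = '"Salsa, Mild",Steve\'s & Ed\'s,31.0 g'

# Naive approach: split on every comma, including the one inside the quotes
naive = data_line.split(',')
print(len(naive))  # 4 pieces, one too many

# csv.reader understands the quoting and keeps the name in one piece
sample = io.StringIO(header_line + '\n' + data_line + '\n')
header, row = list(csv.reader(sample))
print(row)  # ['Salsa, Mild', "Steve's & Ed's", '31.0 g']
```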

In [4]:
import csv

The [`csv.DictReader`](https://docs.python.org/3/library/csv.html#csv.DictReader) object reads a file row by row and converts each row into a dictionary. By default, it uses the field names from the first row as the keys.

In [5]:
# Print the dictionary built from the first data row of the CSV file

with open(csv_file_path) as csvfile:
    reader = csv.DictReader(csvfile)
    print(next(reader))

{'product_name': 'Banana Chips Sweetened (Whole)', 'brands': '', 'created_date': '2017-03-09', 'last_modified_date': '2017-03-09', 'serving_size': '28.0 g', 'energy_100g': '2243.0', 'fat_100g': '28.57', 'carbohydrates_100g': '64.29', 'sugars_100g': '14.29', 'proteins_100g': '3.57'}


Let's get all of the data out of our file and into dictionaries, and store those dictionaries in a new list called `products`.

In [6]:
products = []

# encoding='utf-8' assumes the file is UTF-8 encoded; without it, some
# systems fall back to a platform default and raise a UnicodeDecodeError
with open(csv_file_path, encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        products.append(row)
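
A `UnicodeDecodeError` that mentions the `'charmap'` codec usually means Python decoded the file with a legacy platform default (often cp1252 on Windows) instead of the encoding the file was written in. The sketch below reproduces that kind of failure on a short made-up string and shows that naming the right encoding avoids it. That `OpenFoodFacts.csv` is UTF-8 is an assumption here, not something the file itself states.

```python
# UTF-8 bytes for a made-up word; the first letter is stored as bytes 0xC5 0x81
raw = "Łódź".encode("utf-8")

try:
    raw.decode("cp1252")  # cp1252 has no character for byte 0x81
except UnicodeDecodeError as err:
    message = str(err)
    print(message)

# Decoding with the encoding the bytes were written in works
decoded = raw.decode("utf-8")
print(decoded)  # Łódź
```

Passing `encoding='utf-8'` to `open()` applies the same idea to a whole file.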

In [7]:
# Now look at the first 3 entries
products[0:3]

[{'product_name': 'Banana Chips Sweetened (Whole)',
  'brands': '',
  'created_date': '2017-03-09',
  'last_modified_date': '2017-03-09',
  'serving_size': '28.0 g',
  'energy_100g': '2243.0',
  'fat_100g': '28.57',
  'carbohydrates_100g': '64.29',
  'sugars_100g': '14.29',
  'proteins_100g': '3.57'},
 {'product_name': 'Peanuts',
  'brands': 'Torn & Glasser',
  'created_date': '2017-03-09',
  'last_modified_date': '2017-03-09',
  'serving_size': '28.0 g',
  'energy_100g': '1941.0',
  'fat_100g': '17.86',
  'carbohydrates_100g': '60.71',
  'sugars_100g': '17.86',
  'proteins_100g': '17.86'},
 {'product_name': 'Organic Salted Nut Mix',
  'brands': 'Grizzlies',
  'created_date': '2017-03-09',
  'last_modified_date': '2017-03-09',
  'serving_size': '28.0 g',
  'energy_100g': '2540.0',
  'fat_100g': '57.14',
  'carbohydrates_100g': '17.86',
  'sugars_100g': '3.57',
  'proteins_100g': '17.86'}]

# Data Preparation

Now that we've gotten all of our data into a Python object, we can prepare it for analysis. Let's look at a sample observation and consider how we might want to process it to make it easier to analyze.

In [8]:
products[0]

{'product_name': 'Banana Chips Sweetened (Whole)',
 'brands': '',
 'created_date': '2017-03-09',
 'last_modified_date': '2017-03-09',
 'serving_size': '28.0 g',
 'energy_100g': '2243.0',
 'fat_100g': '28.57',
 'carbohydrates_100g': '64.29',
 'sugars_100g': '14.29',
 'proteins_100g': '3.57'}

**Discussion:** How might you clean these entries to make them easier to analyze?

- 


## Clean the Serving Size (and other numbers)

You might have noticed that all of these numbers are stored as strings! It's hard to do things like math on strings.

In addition, the `serving_size` here has ` g` at the end - in fact, I know from doing more analysis on this data that ALL of these are in grams, and have ` g` at the end. Let's remove that so we can treat those as numbers too!

In [9]:
# First, let's do a one-off example - how do we turn a string into a float?
float(products[0]['sugars_100g'])

14.29

In [10]:
# And how can we remove the ' g' from the end of the serving sizes?
float(products[0]['serving_size'].strip(" g"))

28.0
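
One detail worth knowing about `str.strip`: its argument is treated as a *set of characters* to remove from both ends, not as a suffix. That works for values shaped like `'28.0 g'`, but it can surprise you on other text. The sketch below shows the difference; `drop_suffix` is a hypothetical helper written for this illustration, not part of the lesson's code.

```python
serving = "28.0 g"
print(serving.strip(" g"))         # 28.0

# strip removes any leading or trailing ' ' and 'g' characters
print("gigantic egg".strip(" g"))  # igantic e


def drop_suffix(text, suffix=" g"):
    """Remove suffix from the end of text only if it is actually there."""
    if text.endswith(suffix):
        return text[:-len(suffix)]
    return text


print(drop_suffix(serving))         # 28.0
print(drop_suffix("gigantic egg"))  # gigantic egg
```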

In [11]:
# Now that we've figured out what to do, let's clean up these dicts!
for product in products:
    # Clean up serving size
    product['serving_size'] = float(product['serving_size'].strip(" g"))

    # Clean up all the values if the key contains "100g"
    for key, detail in product.items():
        if '100g' in key:
            # now - we have some blanks in here!
            # introducing: try / except!
            try: # tries to do this first thing
                product[key] = float(detail)
            except ValueError: # does this if the value can't be converted (e.g. a blank)
                product[key] = 0.0
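
The same try / except pattern can be wrapped in a small function so it is easy to test on its own. This is only a sketch: `to_float` and its `default` parameter are hypothetical names introduced here for illustration.

```python
def to_float(value, default=0.0):
    """Convert a CSV string to a float, returning default if it can't be parsed."""
    try:
        return float(value)
    except ValueError:
        # float('') and float('abc') both raise ValueError
        return default


print(to_float("14.29"))           # 14.29
print(to_float(""))                # 0.0
print(to_float("", default=None))  # None
```

Replacing blanks with `0.0` keeps every value numeric, but it also treats "unknown" as "zero", which pulls averages down. Passing `default=None` and skipping those entries later is one alternative.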

In [13]:
# Check it out:
products[5]

{'product_name': 'Organic Long Grain White Rice',
 'brands': 'Lundberg',
 'created_date': '2017-03-09',
 'last_modified_date': '2017-03-09',
 'serving_size': 45.0,
 'energy_100g': 1490.0,
 'fat_100g': 0.0,
 'carbohydrates_100g': 80.0,
 'sugars_100g': 0.0,
 'proteins_100g': 8.89}

## Clean the Dates

Next we'll clean the dates so that we can easily get the month and year when each food was added to this database.

### Using Python built-in methods

**Activity**: Process the `products` list to add numeric values for the month and year when each entry was first created! (aka parse out `created_date`)

In [14]:
# Your work here
for product in products:
    # Save the date to a variable
    string_date = product["created_date"]

    # Extract the created year and month from the string, and cast to int
    created_year = int(string_date[0:4])
    created_month = int(string_date[5:7])    

    # Add the created year and month to each dictionary in products
    product["created_year"] = created_year
    product["created_month"] = created_month

<details>
    <summary><b><u>Click Here for Answer Code</u></b></summary>
    
```python    
for product in products:
    # Save the date to a variable
    string_date = product["created_date"]

    # Extract the created year and month from the string, and cast to int
    created_year = int(string_date[0:4])
    created_month = int(string_date[5:7])    

    # Add the created year and month to each dictionary in products
    product["created_year"] = created_year
    product["created_month"] = created_month
```
</details>
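
Slicing by position works because every date here follows the `YYYY-MM-DD` pattern. As a sketch of an alternative that does not rely on character positions, you can split the string on the `-` separator instead:

```python
string_date = "2017-03-09"

# Split into ['2017', '03', '09'] and convert each piece to an int
year, month, day = (int(part) for part in string_date.split("-"))

print(year, month, day)  # 2017 3 9

# Same result as slicing by position
print(year == int(string_date[0:4]), month == int(string_date[5:7]))  # True True
```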

In [15]:
products[0]

{'product_name': 'Banana Chips Sweetened (Whole)',
 'brands': '',
 'created_date': '2017-03-09',
 'last_modified_date': '2017-03-09',
 'serving_size': 28.0,
 'energy_100g': 2243.0,
 'fat_100g': 28.57,
 'carbohydrates_100g': 64.29,
 'sugars_100g': 14.29,
 'proteins_100g': 3.57,
 'created_year': 2017,
 'created_month': 3}

# Data Analysis

Now that you have a cleaner version of the data, we can finally start to perform some data analysis.

## Question 1: What is the range of years for which we have data?

In [29]:
# Your work here
years = []
for product in products:
    years.append(product["created_year"])
print(max(years))
print(min(years))


2017
2012


<details>
            <summary><b><u>Answer</u></b></summary>

```python
years = [product['created_year'] for product in products]
print(min(years))
print(max(years))
```    
</details>

## Question 2: What is the average number of calories per 100g for all of these foods?

In [30]:
# Your work here
# total of energy_100g divided by the number of products
cals = [product['energy_100g'] for product in products]

print(sum(cals) / len(cals))


1181.9687873138237


<details>
            <summary><b><u>Answer</u></b></summary>

```python
cals = [product['energy_100g'] for product in products]

sum(cals) / len(cals)
```
</details>

## Question 3: What is the average amount of sugar per 100g in foods that are more than 1000 calories per 100g?

In [1]:
# Your work here
#sugars_100g 
sugars_over_1000cal = []

for product in products:
    if product['energy_100g'] > 1000:
        sugars_over_1000cal.append(product['sugars_100g'])
avgsugars = sum(sugars_over_1000cal)/len(sugars_over_1000cal)
avgsugars


<details>
            <summary><b><u>Answer</u></b></summary>

```python
prods_over_1000cal = [product for product in products if product['energy_100g'] > 1000]
sugar = [product['sugars_100g'] for product in prods_over_1000cal]
sum(sugar) / len(sugar)
```
</details>
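
To check the filter-then-average logic by hand, it helps to run it on a few rows first. The sketch below uses three products from this dataset with only the relevant keys kept, and uses `statistics.mean` from the standard library in place of `sum() / len()`.

```python
from statistics import mean

# Three products from the dataset, trimmed to the keys we need
sample = [
    {"product_name": "Banana Chips Sweetened (Whole)",
     "energy_100g": 2243.0, "sugars_100g": 14.29},
    {"product_name": "Cut Green Beans",
     "energy_100g": 155.0, "sugars_100g": 2.47},
    {"product_name": "Honey",
     "energy_100g": 1276.0, "sugars_100g": 80.95},
]

# Keep the sugar values only for products above the energy threshold
sugars = [p["sugars_100g"] for p in sample if p["energy_100g"] > 1000]

# Guard against an empty list, which would otherwise raise an error
avg_sugar = mean(sugars) if sugars else None

print(len(sugars))          # 2
print(round(avg_sugar, 2))  # 47.62
```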

## Question 4: How many foods were added (created) in each year?

In [43]:
# Your work here

created_years = [product['created_year'] for product in products]

# Tally the products created in each year
prod_year_count = {}
for year in created_years:
    # start a year at 0 the first time we see it, then add one
    prod_year_count[year] = prod_year_count.get(year, 0) + 1

prod_year_count

<details>
            <summary><b><u>Answer</u></b></summary>

```python
prod_year_count = {}
list_created_years = [product['created_year'] for product in products]
unique_prod_years = set(list_created_years)
for unique_year in unique_prod_years:
    num_added = len([year for year in list_created_years if year == unique_year])
    prod_year_count[unique_year] = num_added
    
prod_year_count
```
</details>

# Chart the Data

This rendering of the data directly in the notebook is helpful, but it takes a lot of effort to read it and make sense of the trends - let's visualize the results to make this easier.

In [None]:
from matplotlib import pyplot as plt

In [None]:
# create a figure and one plot
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(10, 5))

# place data in a bar chart
# where the x-axis is each year and
# the y-axis is the number of products added per year
ax.bar(list(prod_year_count.keys()),
       list(prod_year_count.values()))

# set axis labels
ax.set_xlabel("Year")
ax.set_ylabel("Number of Products Added")

# give the bar chart a title
ax.set_title("Foods Added to the Open Food Facts Database")

# display bar chart clearly
fig.tight_layout()

# Level Up: `Counter`

The `collections` module has a lot of useful tools for working with Python objects containing multiple elements. You can use the `Counter` class to easily count how many times each value occurs in a list.

For example, let's say that for all dictionaries in `products`, we want to count how many products were created in each year. 

In [None]:
from collections import Counter

prod_year_count = Counter([product['created_year'] for product in products])
prod_year_count

Let's sort these by year. Since the value in `created_year` is an integer, we can use Python's built-in [`sorted`](https://docs.python.org/3/library/functions.html#sorted) function to transform the `Counter` object into a `list` of `tuples` - `(year, count)` which are sorted in ascending order by year.

In [None]:
prod_year_count = sorted(prod_year_count.items())
prod_year_count
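
Here is the same idea on a small made-up list of years, so the intermediate results are easy to read. Note that after sorting you have a list of tuples rather than a dictionary, so methods like `.keys()` are no longer available; `zip(*pairs)` is one way to split the pairs back into separate sequences, for example for plotting.

```python
from collections import Counter

# Made-up years, for illustration only
created_years = [2017, 2016, 2017, 2015, 2017]

counts = Counter(created_years)
print(counts[2017])  # 3

# Sort the (year, count) pairs in ascending order by year
pairs = sorted(counts.items())
print(pairs)         # [(2015, 1), (2016, 1), (2017, 3)]

# Unzip the pairs into one tuple of years and one tuple of counts
years, totals = zip(*pairs)
print(years)         # (2015, 2016, 2017)
print(totals)        # (1, 1, 3)
```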

# Level Up: Using `datetime`

Python has a [`datetime`](https://docs.python.org/3.7/library/datetime.html) module that is the standard tool for handling dates and times. `datetime` objects make it easy to do fun things like subtract dates to calculate how far apart they are.

In [None]:
from datetime import datetime

In [None]:
for product in products:
    # Transform last_modified_date from string to datetime
    product["clean_date"] = datetime.strptime(product["last_modified_date"],
                                              "%Y-%m-%d")

    # Add the last-modified year and month to each dictionary in products
    product["last_modified_year"] = product["clean_date"].year
    product["last_modified_month"] = product["clean_date"].month

Let's inspect our work.

In [None]:
products[0]
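
The date subtraction mentioned at the start of this section can be sketched as follows. The two strings are in the same format as `created_date` and `last_modified_date`; the variable names are made up for this illustration.

```python
from datetime import datetime

# Two dates in the same YYYY-MM-DD format used by the dataset
created = datetime.strptime("2017-03-09", "%Y-%m-%d")
modified = datetime.strptime("2017-06-25", "%Y-%m-%d")

# Subtracting two datetime objects gives a timedelta
gap = modified - created
print(gap.days)  # 108
```

A `timedelta` also supports comparisons, so you could, for example, keep only the products whose entry was modified more than 30 days after it was created.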