# Chapter 4: NumPy Basics: Arrays and Vectorized Computation

In [1]:
import numpy as np

## Dataset: 80 Cereals - Nutrition data on 80 cereal products

The data file can be downloaded from [Kaggle.com](https://www.kaggle.com/crawford/80-cereals)

>If you like to eat cereal, do yourself a favor and avoid this dataset at all costs. After seeing these data it will never be the same for me to eat Fruity Pebbles again. - Kaggle

- Download the zip file from Kaggle (login required)
- Unzip it to get the `cereal.csv` file
- Move the CSV file to a suitable folder
- Open the CSV file in Notepad or Excel to examine its contents

In [3]:
import os
print("My current working directory:", os.getcwd())
print("Make sure the csv file exists:", os.listdir('Data/cereals'))
# my cereal file can be accessed as "Data/cereals/cereal.csv"

My current working directory: C:\Users\ch002\Dropbox\Teaching\CMP464Fall2019
Make sure the csv file exists: ['cereal.csv']


In [5]:
# Load the csv file with np.loadtxt()
# Spoiler alert: in the next chapter we will learn a more user-friendly
# way of loading data.

# How to use np.loadtxt()?
# np.loadtxt?   # Display documentation (IPython/Jupyter only; no parentheses)
# np.loadtxt??  # Display source code

In [6]:
# Try an example from documentation
from io import StringIO 
c = StringIO(u"0 1\n2 3")
np.loadtxt(c)

array([[0., 1.],
       [2., 3.]])

In [7]:
# Load cereal.csv as a numpy array named raw_data
raw_data = np.loadtxt("Data/cereals/cereal.csv",
                      dtype=str,
                      delimiter=",")
print(raw_data[0, :])

['name' 'mfr' 'type' 'calories' 'protein' 'fat' 'sodium' 'fiber' 'carbo'
 'sugars' 'potass' 'vitamins' 'shelf' 'weight' 'cups' 'rating']


In [8]:
# What is the shape of raw_data?
print("Shape:", raw_data.shape)

Shape: (78, 16)


In [18]:
# Create a list of feature names (call it feature_names)
feature_names = raw_data[0, :]
# print("Feature names:", raw_data[0, :])

# Print a list in a nicer format:
# Create a string that joins all values from the array
feature_string = ", ".join(feature_names)
print(feature_string)

name, mfr, type, calories, protein, fat, sodium, fiber, carbo, sugars, potass, vitamins, shelf, weight, cups, rating


In [100]:
# Assign the rest to data
data = raw_data[1:, :]

# Print the shape of data
print("Shape of data:", data.shape)

Shape of data: (77, 16)
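
The load-then-split pattern used above can be tried without the Kaggle file. The sketch below is a minimal illustration that assumes a tiny in-memory CSV; the product names and numbers in `csv_text` are made up and are not taken from `cereal.csv`.

```python
from io import StringIO

import numpy as np

# Hypothetical three-line CSV standing in for cereal.csv (values are made up).
csv_text = StringIO("name,calories,rating\nOat Rings,110,50.5\nBran Bits,70,68.4")

# dtype=str keeps every cell as text, so the header row fits in the same array.
raw = np.loadtxt(csv_text, dtype=str, delimiter=",")

header = raw[0, :]   # first row: column names
rows = raw[1:, :]    # remaining rows: one record per line

# Text columns must be converted explicitly before doing arithmetic.
calories = rows[:, 1].astype(float)

print(header)
print(rows.shape)
print(calories)
```

Because everything is loaded as strings, each numeric column has to go through `astype(float)` before it can be used in calculations, which is the same step applied to the ratings later in this notebook.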


### Content
What are the features?

- name: name of cereal
- mfr: Manufacturer of cereal
    - A = American Home Food Products;
    - G = General Mills
    - K = Kelloggs
    - N = Nabisco
    - P = Post
    - Q = Quaker Oats
    - R = Ralston Purina
- type:
    - cold
    - hot
- calories: calories per serving
- protein: grams of protein
- fat: grams of fat
- sodium: milligrams of sodium
- fiber: grams of dietary fiber
- carbo: grams of complex carbohydrates
- sugars: grams of sugars
- potass: milligrams of potassium
- vitamins: vitamins and minerals - 0, 25, or 100, indicating the typical percentage of FDA recommended
- shelf: display shelf (1, 2, or 3, counting from the floor)
- weight: weight in ounces of one serving
- cups: number of cups in one serving
- rating: a rating of the cereals (Possibly from Consumer Reports?)

Next, let's examine some important features:

In [21]:
# ---------Names------------
# Display the list of cereal names
print("\n".join(data[:, 0]))

100% Bran
100% Natural Bran
All-Bran
All-Bran with Extra Fiber
Almond Delight
Apple Cinnamon Cheerios
Apple Jacks
Basic 4
Bran Chex
Bran Flakes
Cap'n'Crunch
Cheerios
Cinnamon Toast Crunch
Clusters
Cocoa Puffs
Corn Chex
Corn Flakes
Corn Pops
Count Chocula
Cracklin' Oat Bran
Cream of Wheat (Quick)
Crispix
Crispy Wheat & Raisins
Double Chex
Froot Loops
Frosted Flakes
Frosted Mini-Wheats
Fruit & Fibre Dates; Walnuts; and Oats
Fruitful Bran
Fruity Pebbles
Golden Crisp
Golden Grahams
Grape Nuts Flakes
Grape-Nuts
Great Grains Pecan
Honey Graham Ohs
Honey Nut Cheerios
Honey-comb
Just Right Crunchy  Nuggets
Just Right Fruit & Nut
Kix
Life
Lucky Charms
Maypo
Muesli Raisins; Dates; & Almonds
Muesli Raisins; Peaches; & Pecans
Mueslix Crispy Blend
Multi-Grain Cheerios
Nut&Honey Crunch
Nutri-Grain Almond-Raisin
Nutri-grain Wheat
Oatmeal Raisin Crisp
Post Nat. Raisin Bran
Product 19
Puffed Rice
Puffed Wheat
Quaker Oat Squares
Quaker Oatmeal
Raisin Bran
Raisin Nut Bran
Raisin Squares
Rice Chex
Rice Kr

In [23]:
# How to sort a NumPy array?
ary = np.array([1, 3, 2, 7, 5])
print(ary)
# ary = np.sort(ary)
ary.sort()
print(ary)

[1 3 2 7 5]
[1 2 3 5 7]


In [24]:
matrix = np.array([[1, 4, 23, 19], 
                   [5, 2, 6, -20]])
print(matrix)
print("sorting each row:")
print(np.sort(matrix, axis=1)) # or: np.sort(matrix)
print("sorting each column:")
print(np.sort(matrix, axis=0))

[[  1   4  23  19]
 [  5   2   6 -20]]
sorting each row:
[[  1   4  19  23]
 [-20   2   5   6]]
sorting each column:
[[  1   2   6 -20]
 [  5   4  23  19]]
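
The two sorting styles shown above behave differently with respect to the original array. The following sketch, using a small made-up array, contrasts `np.sort()` (returns a copy) with the `sort()` method (works in place), and shows one common way to get descending order by reversing the ascending result.

```python
import numpy as np

values = np.array([1, 3, 2, 7, 5])

sorted_copy = np.sort(values)    # returns a new sorted array
before = values.tolist()         # values itself is still in its original order
descending = sorted_copy[::-1]   # reverse the ascending result for descending order

result = values.sort()           # sorts values in place and returns None
after = values.tolist()

print(before, after, descending, result)
```

Since the in-place method returns `None`, writing `values = values.sort()` would discard the array, which is a frequent mistake.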


In [25]:
# The names are already sorted in the file. How do we sort an array of strings ourselves?

# To make an example, let's shuffle the array first
name_data = np.array(data[:, 0])
example = np.array(name_data)
np.random.shuffle(example)
print("\n".join(example))

Count Chocula
Froot Loops
Special K
Bran Flakes
Honey-comb
Cinnamon Toast Crunch
Just Right Crunchy  Nuggets
Puffed Rice
Nut&Honey Crunch
Just Right Fruit & Nut
Kix
Maypo
Frosted Flakes
Rice Krispies
Basic 4
Apple Jacks
Triples
Muesli Raisins; Dates; & Almonds
Trix
Wheaties Honey Gold
All-Bran with Extra Fiber
Almond Delight
Honey Graham Ohs
Strawberry Fruit Wheats
Oatmeal Raisin Crisp
Cap'n'Crunch
Apple Cinnamon Cheerios
Raisin Nut Bran
Wheaties
Fruit & Fibre Dates; Walnuts; and Oats
Crispy Wheat & Raisins
Fruitful Bran
All-Bran
Corn Pops
Quaker Oatmeal
Puffed Wheat
Mueslix Crispy Blend
Corn Chex
Golden Grahams
Total Whole Grain
Honey Nut Cheerios
Muesli Raisins; Peaches; & Pecans
Quaker Oat Squares
Wheat Chex
Grape Nuts Flakes
Cocoa Puffs
Grape-Nuts
Double Chex
100% Natural Bran
Cream of Wheat (Quick)
100% Bran
Nutri-grain Wheat
Post Nat. Raisin Bran
Frosted Mini-Wheats
Product 19
Lucky Charms
Life
Crispix
Smacks
Total Raisin Bran
Corn Flakes
Cracklin' Oat Bran
Total Corn Flakes
Shre

In [26]:
# Now use sort() to sort the array
# print(np.sort(example)) # np.sort() returns a new sorted array
example.sort() # ndarray.sort() sorts the array itself (in place)
print(example)

['100% Bran' '100% Natural Bran' 'All-Bran' 'All-Bran with Extra Fiber'
 'Almond Delight' 'Apple Cinnamon Cheerios' 'Apple Jacks' 'Basic 4'
 'Bran Chex' 'Bran Flakes' "Cap'n'Crunch" 'Cheerios'
 'Cinnamon Toast Crunch' 'Clusters' 'Cocoa Puffs' 'Corn Chex'
 'Corn Flakes' 'Corn Pops' 'Count Chocula' "Cracklin' Oat Bran"
 'Cream of Wheat (Quick)' 'Crispix' 'Crispy Wheat & Raisins' 'Double Chex'
 'Froot Loops' 'Frosted Flakes' 'Frosted Mini-Wheats'
 'Fruit & Fibre Dates; Walnuts; and Oats' 'Fruitful Bran' 'Fruity Pebbles'
 'Golden Crisp' 'Golden Grahams' 'Grape Nuts Flakes' 'Grape-Nuts'
 'Great Grains Pecan' 'Honey Graham Ohs' 'Honey Nut Cheerios' 'Honey-comb'
 'Just Right Crunchy  Nuggets' 'Just Right Fruit & Nut' 'Kix' 'Life'
 'Lucky Charms' 'Maypo' 'Muesli Raisins; Dates; & Almonds'
 'Muesli Raisins; Peaches; & Pecans' 'Mueslix Crispy Blend'
 'Multi-Grain Cheerios' 'Nut&Honey Crunch' 'Nutri-Grain Almond-Raisin'
 'Nutri-grain Wheat' 'Oatmeal Raisin Crisp' 'Post Nat. Raisin Bran'
 'Product

In [None]:
feature_names

In [29]:
my_data = np.array([[1, 2, 13],
                    [4, 5, 6],
                    [7, 8, 9]])
last_col = np.array(my_data[:, 2]) # this creates a new copy of the last column
last_col = my_data[:, 2] # rebinding: last_col is now a view of my_data's last column
last_col.sort() # sorting the view in place also changes my_data
print(last_col)
print(my_data)

[ 6  9 13]
[[ 1  2  6]
 [ 4  5  9]
 [ 7  8 13]]
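
The output above shows that sorting a slice also reorders the parent array. A minimal sketch of how to check whether two arrays share memory, and how an explicit copy avoids the side effect, might look like this (`grid`, `col_view`, and `col_copy` are names chosen for this example):

```python
import numpy as np

grid = np.array([[1, 2, 13],
                 [4, 5, 6],
                 [7, 8, 9]])

col_view = grid[:, 2]          # basic slicing returns a view onto grid's memory
col_copy = grid[:, 2].copy()   # an explicit copy owns its own memory

shares_view = np.shares_memory(grid, col_view)
shares_copy = np.shares_memory(grid, col_copy)

col_copy.sort()                # only the copy is reordered; grid is unchanged

print(shares_view, shares_copy)
print(grid[:, 2])
```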


In [30]:
# ------------- ratings ------------
# What is the index of rating in feature_names?
print(np.where(feature_names == "rating"))
rating_data = np.array(data[:, 15]) # create a new numpy array with ratings
# print(rating_data)
# print(rating_data.dtype)
# change the data type from string to float
rating_data = rating_data.astype(float)
print(rating_data)
# rating_data_int = rating_data.astype(int)
# print(rating_data_int)

(array([15], dtype=int64),)
[68.402973 33.983679 59.425505 93.704912 34.384843 29.509541 33.174094
 37.038562 49.120253 53.313813 18.042851 50.764999 19.823573 40.400208
 22.736446 41.445019 45.863324 35.782791 22.396513 40.448772 64.533816
 46.895644 36.176196 44.330856 32.207582 31.435973 58.345141 40.917047
 41.015492 28.025765 35.252444 23.804043 52.076897 53.371007 45.811716
 21.871292 31.072217 28.742414 36.523683 36.471512 39.241114 45.328074
 26.734515 54.850917 37.136863 34.139765 30.313351 40.105965 29.924285
 40.69232  59.642837 30.450843 37.840594 41.50354  60.756112 63.005645
 49.511874 50.828392 39.259197 39.7034   55.333142 41.998933 40.560159
 68.235885 74.472949 72.801787 31.230054 53.131324 59.363993 38.839746
 28.592785 46.658844 39.106174 27.753301 49.787445 51.592193 36.187559]


In [None]:
# Find the maximum rating
print("maximum rating:", np.max(rating_data))
print("minimum rating:", np.min(rating_data))
print("average rating:", np.mean(rating_data))

# Find the index corresponding to the highest rating
highest_rating = np.max(rating_data)
highest_rating_index = np.where(rating_data == highest_rating)
print(highest_rating_index)
# np.where returns a tuple of index arrays; [0][0] is the first matching row
print("Product with highest rating:", data[highest_rating_index[0][0], 0])

lowest_rating = np.min(rating_data)
lowest_rating_i = np.where(rating_data == lowest_rating)
print(lowest_rating_i)
print(data[lowest_rating_i[0][0], 0])

In [107]:
# sort and argsort()
arr = np.random.rand(5)
print(arr.argsort())

# find the second largest value in arr
print("Second largest:", arr[arr.argsort()[-2]])
print(arr)

[0 3 2 1 4]
Second largest: 0.24412138705245656
[0.08837719 0.24412139 0.18426609 0.11674943 0.53219886]
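
The same `argsort()` idea extends from "second largest" to "top k", and the resulting indices can be used to look up values in a second, parallel array. The sketch below uses made-up labels and scores purely to illustrate the indexing pattern; it is one possible approach, not the only one.

```python
import numpy as np

# Made-up labels and scores, used only to illustrate the indexing pattern.
names = np.array(["A", "B", "C", "D", "E", "F"])
scores = np.array([41.0, 93.7, 18.0, 68.4, 29.5, 59.4])

order = scores.argsort()                 # indices that sort scores in ascending order

lowest_three = names[order[:3]]          # labels of the three smallest scores
highest_three = names[order[::-1][:3]]   # reverse the order, then take the first three

print(lowest_three)
print(highest_three)
```

Indexing `names` with the order computed from `scores` is what keeps each label attached to its own score.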


In [33]:
# Sort the ratings using sort() and argsort()



# Find the top five highest-rated cereal products



# Find the top five lowest-rated cereal products




maximum rating: 93.704912
minimum rating: 18.042851
average rating: 42.66570498701299
(array([3], dtype=int64),)
Product with highest rating: All-Bran with Extra Fiber
(array([10], dtype=int64),)
Cap'n'Crunch


In [None]:
# ------------- sugars -------------
# What is the index of sugars in feature_names?



# Is there any correlation between sugars and rating?
import matplotlib.pyplot as plt
%matplotlib inline
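# plt.plot(original_ratings, sugars_data)

# change data type from str to float
# (original_ratings and sugars_data must first be sliced from data as string
# columns in the step above; they are not defined elsewhere in this notebook)
rating_data = original_ratings.astype(float)
sugars_data = sugars_data.astype(float)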
# plt.plot(rating_data, sugars_data)

# sort the ratings data (sugars data should be changed accordingly, how?)
order = rating_data.argsort()
print("order:", order)
plt.plot(rating_data[order], sugars_data[order], 'b.')
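
Besides plotting, the correlation question can be answered with a single number. The sketch below assumes two short synthetic arrays (not values from the cereal file) and shows both steps from the cell above: reordering two columns with the same `argsort()` result, and computing the Pearson correlation coefficient with `np.corrcoef()`.

```python
import numpy as np

# Synthetic values, not taken from the cereal file.
ratings = np.array([68.4, 34.0, 59.4, 93.7, 29.5])
sugars = np.array([6.0, 8.0, 5.0, 0.0, 10.0])

order = ratings.argsort()
ratings_sorted = ratings[order]
sugars_sorted = sugars[order]    # same permutation keeps each pair aligned

# np.corrcoef returns a 2x2 matrix; the off-diagonal entry is Pearson's r.
r = np.corrcoef(ratings, sugars)[0, 1]
print("Pearson r:", r)
```

A value of `r` close to -1 would indicate that higher sugar tends to go with lower rating; reordering both arrays with the same index array does not change `r`.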


In [None]:
# ------------- Weight -------------
# What is the index of weight in feature_names?



# How many different weights per serving are there?
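
One possible way to count distinct values is `np.unique()`. The sketch below assumes a short, hypothetical list of serving weights stored as strings, as they would be after loading the CSV with `dtype=str`; the numbers are illustrative only.

```python
import numpy as np

# Hypothetical serving weights (in ounces), stored as strings like the loaded CSV columns.
weights = np.array(["1", "1", "1.33", "1", "0.5", "1.33"]).astype(float)

# return_counts=True also reports how often each distinct value occurs.
distinct, counts = np.unique(weights, return_counts=True)

print("Distinct weights:", distinct)
print("Occurrences:", counts)
print("Number of distinct weights:", len(distinct))
```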




## Measure nutrition by serving

The following project is inspired by [this Kaggle kernel](https://www.kaggle.com/frankwwu/how-cereal-manufacturers-mislead-consumers).

Manufacturers report nutrition facts per serving, but each manufacturer defines a serving differently, with its own weight and cup volume. As a result, nutrition values quoted per serving are hard to compare across products. Imagine comparing the nutrition facts of several cereals in a grocery store when each uses a different serving size: you would need a calculator and a piece of paper.
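
One way to put products on a common footing is to divide each nutrient by the serving weight. The sketch below uses illustrative numbers rather than rows from the dataset; `sugars_per_ounce` is a name chosen for this example.

```python
import numpy as np

# Illustrative values only: grams of sugar per serving and serving weight in ounces.
sugars = np.array([6.0, 8.0, 14.0])
weight = np.array([1.0, 1.0, 1.33])

# Element-wise division: each product is divided by its own serving weight.
sugars_per_ounce = sugars / weight

print(sugars_per_ounce)
```

In this made-up example the third product looks far sweeter per serving (14 g), but part of that gap comes from its larger serving; per ounce it drops to roughly 10.5 g.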

In [None]:
# Divide sugars by weight



#### Arithmetic with NumPy arrays

In [None]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr
arr * arr
arr - arr

In [None]:
1 / arr
arr ** 0.5

In [None]:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
arr2
arr2 > arr

In [None]:
# Plot ratings vs. unified_sugars_data




In [None]:
# What are the maximum and minimum amounts of sugar in a unified serving?




### Create our own ratings

- good-cereal-rating = protein + fiber + vitamins
- bad-cereal-rating = fat + sodium + potass + sugars

In [None]:
good_rating = data[:, 4].astype(float) + data[:, 7].astype(float) + data[:, 11].astype(float)
print(good_rating)

In [None]:
plt.plot(rating_data, good_rating, 'b.')

## More on NumPy Arrays
- Create a numpy array
- Change data type
- Arithmetic
- Boolean indexing
- Fancy indexing
- Reshape numpy arrays
- Element-wise array functions
- Statistical methods

In [48]:
# Create a numpy array
# 1. Array [6, 7.5, 8, 10.0]
array = np.array([6, 7.5, 8, 10.0])
print("array1:", array)
# 2. Array [[1, 2, 3, 4], [5, 6, 7, 8]]
array2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print("array\n",array2)
# 3. An array filled with 10 ones
array3 = np.ones(10)
print("array3")
print(array3)
# 4. A 3-by-6 2D array filled with zeros. np.zeros()
array4 = np.zeros(shape=(3,6))
print(array4)
# 5. Use np.arange() to create [0, ..., 99]
# array5 = np.arange(50, 100, 10)
array5 = np.arange(100)
print("array5")
print(array5)

# 6. Use np.arange() to create [1, 3, 5, 7, 9]
array6 = np.arange(1,10,2)
print(array6)

array1: [ 6.   7.5  8.  10. ]
array
 [[1 2 3 4]
 [5 6 7 8]]
array3
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[[0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0.]]
array5
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
 96 97 98 99]
[1 3 5 7 9]


In [68]:
# Data type
# Use np.random.rand() to create a 2*3 2D array of random numbers
array7 = np.random.rand(2, 3) # random numbers are sampled uniformly
                                # from [0, 1)
array8 = np.random.randn(2, 3) # sampled from Gaussian distribution
print(array7)
print(array8)

# draw 8 numbers randomly from [0, 10)
array9 = np.random.rand(8) * 10
print(array9)

# Simulate a sequence of 10 dice rolls (numbers are randomly
# drawn from {1, 2, 3, 4, 5, 6})
print((np.random.rand(10) * 6).astype(int) + 1)
# ?np.random.randint()
print(np.random.randint(1, high=7, size=10))

# Find the data type of the array
print(array7.dtype)

# Convert the type to np.int32 (the fractional part is dropped)
print(array7.astype(np.int32))

[[0.21391417 0.07924372 0.59632204]
 [0.0142967  0.92360409 0.50829102]]
[[-0.32904419  0.34162724  1.38937802]
 [-1.3966595  -0.67743766 -0.00604751]]
[9.63178261 7.40818221 7.21490203 0.13925985 9.67804508 3.89290541
 0.6196624  8.24624886]
[5 2 2 5 3 3 1 6 4 6]
[6 2 6 3 2 2 4 4 6 4]
float64
[[0 0 0]
 [0 0 0]]
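
The all-zero result above comes from how `astype()` converts floats to integers: it drops the fractional part instead of rounding. A small sketch with hand-picked values makes the difference visible; `np.rint()` is used here as one way to round before converting.

```python
import numpy as np

values = np.array([0.9, -0.9, 2.5, 3.5])

truncated = values.astype(np.int32)          # drops the fractional part (toward zero)
rounded = np.rint(values).astype(np.int32)   # rounds to nearest; ties go to the even integer

print(truncated)
print(rounded)
```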


In [69]:
# Array arithmetic
arr = np.array([[1., 2., 3.], [4., 5., 6.]])

print("arr * arr:\n", arr * arr)

print("arr - arr:\n", arr - arr)

print("arr > 3:\n", (arr > 3))

arr * arr:
 [[ 1.  4.  9.]
 [16. 25. 36.]]
arr - arr:
 [[0. 0. 0.]
 [0. 0. 0.]]
arr > 3:
 [[False False False]
 [ True  True  True]]


In [70]:
# Multiply two 2D arrays as matrices
mat1 = np.array([[1, 2, 3],
                 [4, 5, 6]])
mat2 = np.array([[7, 9],
                 [11, 13],
                 [15, 17]])
product = mat1.dot(mat2)
print(product)

[[ 74  86]
 [173 203]]
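
The `@` operator computes the same matrix product as `dot()` for 2D arrays, while `*` stays element-wise. The sketch below reuses the two matrices from the cell above to contrast the two operations.

```python
import numpy as np

mat1 = np.array([[1, 2, 3],
                 [4, 5, 6]])      # shape (2, 3)
mat2 = np.array([[7, 9],
                 [11, 13],
                 [15, 17]])       # shape (3, 2)

via_dot = mat1.dot(mat2)
via_operator = mat1 @ mat2        # same matrix product as dot() for 2D arrays

# * multiplies matching elements, so the shapes must agree: (2, 3) with (2, 3).
elementwise = mat1 * mat2.T

print(via_operator)
print(elementwise)
```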


In [81]:
# Reshape numpy arrays
data = np.arange(12)
print(data)

# Reshape data to a 2*6 matrix
data = data.reshape([2, 6])
print(data)

data = data.reshape([3, 4])
print(data)
# Reshape data so that it has 4 rows
data = data.reshape([4, -1]) # -1 is a placeholder
print(data)

data = data.reshape([-1, 2])
print(data)

data = data.reshape(-1)
print(data)

# data = data.reshape([-1, -1]) # this gives an error
# Swap rows and columns
mat = np.array([[1, 5, 9],
                [2, 6, 10],
                [3, 7, 11],
                [4, 8, 12]])
# transpose the matrix
mat = mat.T
print(mat)
mat = mat.reshape(12)
print(mat)

[ 0  1  2  3  4  5  6  7  8  9 10 11]
[[ 0  1  2  3  4  5]
 [ 6  7  8  9 10 11]]
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
[[ 0  1]
 [ 2  3]
 [ 4  5]
 [ 6  7]
 [ 8  9]
 [10 11]]
[ 0  1  2  3  4  5  6  7  8  9 10 11]
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
[ 1  2  3  4  5  6  7  8  9 10 11 12]
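
Two details of `reshape()` are easy to miss: for an ordinary contiguous array like the one here, the result shares memory with the original, and the requested shape must account for every element. A minimal sketch of both points (variable names are chosen for this example):

```python
import numpy as np

flat = np.arange(12)
grid = flat.reshape(3, -1)   # -1 lets NumPy infer the missing dimension (4 here)

grid[0, 0] = 99              # writing through the reshaped array changes flat as well

try:
    flat.reshape(5, -1)      # 12 elements cannot be split into 5 equal rows
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(grid.shape, flat[0], reshape_failed)
```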


In [88]:
# Boolean indexing
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
print("name == 'Bob'\n", (names == 'Bob'))
where_bob = (names == 'Bob')
print(names[where_bob])

data = np.arange(28).reshape([7, 4])
print("data:\n", data)
print("data[names == 'Bob']\n", data[names == 'Bob'])

mask = ((names == 'Bob') | (names == 'Will'))
print("mask:\n", mask)
print("data[mask]:\n", data[mask])
print("data[~mask]:\n", data[~mask])

name == 'Bob'
 [ True False False  True False False False]
['Bob' 'Bob']
data:
 [[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]
 [24 25 26 27]]
data[names == 'Bob']
 [[ 0  1  2  3]
 [12 13 14 15]]
mask:
 [ True False  True  True  True False False]
data[mask]:
 [[ 0  1  2  3]
 [ 8  9 10 11]
 [12 13 14 15]
 [16 17 18 19]]
data[~mask]:
 [[ 4  5  6  7]
 [20 21 22 23]
 [24 25 26 27]]
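
Boolean masks can also combine conditions on different arrays and can appear on the left side of an assignment. The sketch below uses made-up names and scores; note that each comparison needs its own parentheses, and that `&`, `|`, and `~` are used in place of Python's `and`, `or`, and `not`.

```python
import numpy as np

names = np.array(["Bob", "Joe", "Will", "Bob"])
scores = np.array([3.0, -1.0, 7.0, -4.0])

# Combine two element-wise tests; parentheses are required around each one.
selected = scores[(names == "Bob") & (scores < 0)]

# Assign through a mask: replace every negative entry with zero.
clipped = scores.copy()
clipped[clipped < 0] = 0.0

print(selected)
print(clipped)
```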


In [95]:
# Fancy indexing
arr = np.arange(100).reshape([10, 10])
print(arr)
# Extract the following rows (keep the order): [4, 3, 0, 6]
rows = [4, 3, 0, 6]
sub_arr = arr[rows]
print(sub_arr)
# Extract the following columns (keep the order): [-3, -5, -7]
cols = [-3, -5, -7]
sub_arr = arr[:, cols]
print(sub_arr)
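# Extract the intersection of the first 5 rows and first 5 columns
# Pairing two index lists picks elements (0,0), (1,1), ..., (4,4),
# i.e. the diagonal, not the 5x5 block:
sub_arr = arr[[0,1,2,3,4],[0,1,2,3,4]]
print(sub_arr)
# Slicing rows first and then columns gives the 5x5 block:
sub_arr = arr[:5][:, :5]
print(sub_arr)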

[[ 0  1  2  3  4  5  6  7  8  9]
 [10 11 12 13 14 15 16 17 18 19]
 [20 21 22 23 24 25 26 27 28 29]
 [30 31 32 33 34 35 36 37 38 39]
 [40 41 42 43 44 45 46 47 48 49]
 [50 51 52 53 54 55 56 57 58 59]
 [60 61 62 63 64 65 66 67 68 69]
 [70 71 72 73 74 75 76 77 78 79]
 [80 81 82 83 84 85 86 87 88 89]
 [90 91 92 93 94 95 96 97 98 99]]
[[40 41 42 43 44 45 46 47 48 49]
 [30 31 32 33 34 35 36 37 38 39]
 [ 0  1  2  3  4  5  6  7  8  9]
 [60 61 62 63 64 65 66 67 68 69]]
[[ 7  5  3]
 [17 15 13]
 [27 25 23]
 [37 35 33]
 [47 45 43]
 [57 55 53]
 [67 65 63]
 [77 75 73]
 [87 85 83]
 [97 95 93]]
[ 0 11 22 33 44]
[[ 0  1  2  3  4]
 [10 11 12 13 14]
 [20 21 22 23 24]
 [30 31 32 33 34]
 [40 41 42 43 44]]
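
When the rows and columns of interest are arbitrary lists rather than contiguous ranges, `np.ix_()` is one way to select their intersection in a single step. The sketch below reuses the row and column lists from the cell above and contrasts the result with element-by-element pairing.

```python
import numpy as np

arr = np.arange(100).reshape(10, 10)
rows = [4, 3, 0, 6]
cols = [-3, -5, -7]

# np.ix_ builds index arrays that select every (row, column) combination.
block = arr[np.ix_(rows, cols)]

# Passing two equal-length lists pairs them up element by element instead.
paired = arr[rows[:3], cols]

print(block)
print(paired)
```

Here `block` has one row per entry in `rows` and one column per entry in `cols`, whereas `paired` is a flat array with one element per (row, column) pair.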


In [99]:
# Create an array of 100 random numbers drawn uniformly from [0, 1).
arr = np.random.rand(100)
print(arr)

# Compute the mean
mean = np.mean(arr)
print('Mean:', mean)

# Compute the sum (named total to avoid shadowing Python's built-in sum)
total = np.sum(arr)
print(total)

# Compute the standard deviation
std = np.std(arr)
print("Standard deviation:", std)

[0.75895153 0.75723114 0.72957846 0.50767328 0.94609245 0.51189209
 0.02208804 0.14367895 0.19014938 0.99722831 0.63471951 0.74796715
 0.18658403 0.03396398 0.18386508 0.43030319 0.02735854 0.15610903
 0.12120197 0.94876429 0.72402324 0.63500856 0.32979472 0.95759337
 0.60433845 0.88716541 0.92267046 0.90928401 0.07988464 0.75778378
 0.13402818 0.68633397 0.99479077 0.31033912 0.91065641 0.14061893
 0.33818978 0.77967126 0.47915469 0.97433624 0.37538279 0.36577403
 0.75108892 0.88052191 0.61631846 0.89998157 0.93476979 0.31200036
 0.83896905 0.43686205 0.70636178 0.03428128 0.99972293 0.45896437
 0.77047936 0.4072329  0.09613266 0.80699996 0.61844946 0.07644007
 0.36912713 0.30922057 0.74362355 0.30884737 0.51420615 0.24712162
 0.39963765 0.47622398 0.91433826 0.1054581  0.20278824 0.13315929
 0.05218996 0.60055415 0.98946428 0.4577292  0.83272315 0.30077206
 0.01688476 0.25973886 0.79210683 0.51653777 0.05146598 0.13625535
 0.8064971  0.40753337 0.70660344 0.57893128 0.24048631 0.9861
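
The statistical functions above also accept an `axis` argument for 2D arrays, and `np.std()` has a `ddof` parameter that switches between the population and sample formulas. The sketch below uses a small fixed array so the results are easy to verify by hand.

```python
import numpy as np

table = np.arange(6, dtype=float).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

col_means = table.mean(axis=0)    # one value per column
row_sums = table.sum(axis=1)      # one value per row

pop_std = np.std(table[0])              # default ddof=0: divide by N
sample_std = np.std(table[0], ddof=1)   # ddof=1: divide by N - 1

print(col_means, row_sums)
print(pop_std, sample_std)
```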

# Week 2 Homework
1. Fat
    - Calculate fat per gram
    - What is the maximum and minimum value for fat per gram?
2. Calories
    - Calculate calories per gram
    - Find the top 5 cereals with the highest calories
    - Find the top 5 cereals with the lowest calories
3. Bad rating
    - Calculate bad-cereal-rating for each cereal
    - Plot Ratings vs. Bad-Cereal-Ratings.
    - Do they agree with each other?