# CrunchieMunchies
You work in marketing for YummyCorps, a food company that is developing a new kind of tasty, wholesome cereal called CrunchieMunchies. You want to demonstrate to consumers how healthy your cereal is in comparison to other leading brands, so you’ve dug up nutritional data on several different competitors.

Your task is to use NumPy statistical calculations to analyze this data and prove that your CrunchieMunchies cereal is the healthiest choice for consumers.

Look over the cereal.csv file. This file contains the reported calorie amounts for different cereal brands. Load the data from the file and save it as calorie_stats.

In [1]:
import numpy as np

calorie_stats = np.genfromtxt("files/cereal.csv", delimiter=",")

There are 60 calories per serving of CrunchieMunchies. How much higher is the average calorie count of your competition?

In [2]:
average_calories = np.mean(calorie_stats)
print(average_calories)

106.88311688311688
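The mean alone does not answer the question of how much higher the competition is, so here is a minimal sketch of the subtraction. To keep it self-contained, it assumes the dataset can be rebuilt from the sorted values this notebook prints rather than loaded from cereal.csv; `values`, `counts`, `crunchie_calories`, and `calorie_gap` are hypothetical names introduced for illustration.

```python
import numpy as np

# Assumption: data rebuilt from the notebook's sorted output, not read from cereal.csv.
values = [50, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
counts = [3, 2, 1, 7, 17, 29, 10, 2, 3, 2, 1]
calorie_stats = np.repeat(values, counts).astype(float)

crunchie_calories = 60  # calories per serving stated for CrunchieMunchies
average_calories = np.mean(calorie_stats)

# How many more calories the average competitor has per serving.
calorie_gap = average_calories - crunchie_calories
print(calorie_gap)
```

On this data the gap comes out to roughly 47 calories per serving.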


Does the average calorie count adequately reflect the distribution of the dataset? Let’s sort the data and see.

In [3]:
calorie_stats_sorted = np.sort(calorie_stats)
print(calorie_stats_sorted)

[ 50.  50.  50.  70.  70.  80.  90.  90.  90.  90.  90.  90.  90. 100.
 100. 100. 100. 100. 100. 100. 100. 100. 100. 100. 100. 100. 100. 100.
 100. 100. 110. 110. 110. 110. 110. 110. 110. 110. 110. 110. 110. 110.
 110. 110. 110. 110. 110. 110. 110. 110. 110. 110. 110. 110. 110. 110.
 110. 110. 110. 120. 120. 120. 120. 120. 120. 120. 120. 120. 120. 130.
 130. 140. 140. 140. 150. 150. 160.]


Do you see what I’m seeing? Looks like the majority of the cereals have calorie counts above the mean. Let’s see if the median is a better representative of the dataset.

In [4]:
median_calories = np.median(calorie_stats)
print(median_calories)

110.0
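The claim that most cereals sit above the mean can be checked directly instead of by eye. The sketch below counts them, again assuming the dataset is rebuilt from the sorted output printed in this notebook; `above_mean` and `share_above_mean` are hypothetical names.

```python
import numpy as np

# Assumption: data rebuilt from the notebook's sorted output, not read from cereal.csv.
values = [50, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
counts = [3, 2, 1, 7, 17, 29, 10, 2, 3, 2, 1]
calorie_stats = np.repeat(values, counts).astype(float)

average_calories = np.mean(calorie_stats)
median_calories = np.median(calorie_stats)

# Count the cereals whose calorie count is strictly above the mean.
above_mean = int(np.sum(calorie_stats > average_calories))
share_above_mean = above_mean / calorie_stats.size

print(above_mean, calorie_stats.size, share_above_mean)
```

A share above 0.5 means the mean is pulled down by a few low-calorie cereals, which is why the median lands higher than the mean here.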


While the median demonstrates that at least half of our values are over 100 calories, it would be more impressive to show that a significant portion of the competition has a higher calorie count than CrunchieMunchies.

Calculate different percentiles and print them to the terminal until you find the lowest percentile whose value is greater than 60 calories. Save this value to the variable nth_percentile.

In [8]:
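# Start at the 100th percentile (the maximum) and step down one percentile at a time.
n = 100
percentile = np.percentile(calorie_stats, n)
print("percentile of {} is {}".format(n, percentile))

# Keep going until the percentile value is no longer above 60 calories.
while percentile > 60:
    n -= 1
    percentile = np.percentile(calorie_stats, n)
    print("percentile of {} is {}".format(n, percentile))

# The previous step was the lowest percentile still above 60 calories.
# Note: this stores the percentile rank (n), not the calorie value.
nth_percentile = n + 1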

percentile of 100 is 160.0
percentile of 99 is 152.39999999999995
percentile of 98 is 150.0
percentile of 97 is 147.2
percentile of 96 is 140.0
percentile of 95 is 140.0
percentile of 94 is 140.0
percentile of 93 is 136.80000000000007
percentile of 92 is 130.0
percentile of 91 is 130.0
percentile of 90 is 124.00000000000006
percentile of 89 is 120.0
percentile of 88 is 120.0
percentile of 87 is 120.0
percentile of 86 is 120.0
percentile of 85 is 120.0
percentile of 84 is 120.0
percentile of 83 is 120.0
percentile of 82 is 120.0
percentile of 81 is 120.0
percentile of 80 is 120.0
percentile of 79 is 120.0
percentile of 78 is 120.0
percentile of 77 is 115.20000000000003
percentile of 76 is 110.0
percentile of 75 is 110.0
percentile of 74 is 110.0
percentile of 73 is 110.0
percentile of 72 is 110.0
percentile of 71 is 110.0
percentile of 70 is 110.0
percentile of 69 is 110.0
percentile of 68 is 110.0
percentile of 67 is 110.0
percentile of 66 is 110.0
percentile of 65 is 110.0
percentile 
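As an alternative to stepping down from 100, the same search can run upward from 0 and stop at the first hit, which prints far less. This is a sketch under the assumption that the dataset matches the sorted output printed earlier in this notebook and that only whole-number percentiles are of interest; `nth_value` is a hypothetical name.

```python
import numpy as np

# Assumption: data rebuilt from the notebook's sorted output, not read from cereal.csv.
values = [50, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
counts = [3, 2, 1, 7, 17, 29, 10, 2, 3, 2, 1]
calorie_stats = np.repeat(values, counts).astype(float)

nth_percentile = None  # stays None if no percentile exceeds 60 calories
nth_value = None

# Walk upward through whole-number percentiles and stop at the first one above 60.
for n in range(0, 101):
    value = np.percentile(calorie_stats, n)
    if value > 60:
        nth_percentile = n
        nth_value = value
        break

print("percentile of {} is {}".format(nth_percentile, nth_value))
```

With NumPy's default linear interpolation, the search stops at the 4th percentile, whose value is 70 calories.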

While the percentile shows us that the majority of the competition has a much higher calorie count, it’s an awkward concept to use in marketing materials.

Instead, let’s calculate the percentage of cereals that have more than 60 calories per serving. Save your answer to the variable more_calories and print it to the terminal.

In [9]:
more_calories = np.mean(calorie_stats > 60)
print(more_calories)

0.961038961038961
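Taking the mean of a boolean array gives a fraction between 0 and 1, so it still needs to be multiplied by 100 to read as a percentage. The sketch below shows that conversion along with the underlying count, assuming the dataset is rebuilt from the sorted output printed earlier; `count_more` and `percent_more` are hypothetical names.

```python
import numpy as np

# Assumption: data rebuilt from the notebook's sorted output, not read from cereal.csv.
values = [50, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
counts = [3, 2, 1, 7, 17, 29, 10, 2, 3, 2, 1]
calorie_stats = np.repeat(values, counts).astype(float)

# True where a cereal has more than 60 calories per serving.
is_higher = calorie_stats > 60

count_more = int(np.sum(is_higher))    # number of cereals above 60 calories
more_calories = np.mean(is_higher)     # fraction of cereals above 60 calories
percent_more = 100 * more_calories     # the same figure as a percentage

print("{} of {} cereals ({:.1f}%)".format(count_more, calorie_stats.size, percent_more))
```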


Wow! That’s a really high percentage: roughly 96% of the cereals have more than 60 calories per serving. That’s going to be very useful when we promote CrunchieMunchies. One question remains: how much variation exists in the dataset? Can we make the generalization that most cereals have around 100 calories, or is the spread even greater?

Calculate the amount of variation by finding the standard deviation. Save your answer to calorie_std and print it to the terminal. How can we incorporate this value into our analysis?

In [10]:
calorie_std = np.std(calorie_stats)
print(calorie_std)



19.35718533390827
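One possible way to incorporate the standard deviation is to express the 60-calorie figure as a distance from the mean in standard deviations, and to check how many cereals fall within one standard deviation of the mean. This is only a sketch, assuming the dataset is rebuilt from the sorted output printed earlier; `z_score` and `within_one_std` are hypothetical names.

```python
import numpy as np

# Assumption: data rebuilt from the notebook's sorted output, not read from cereal.csv.
values = [50, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
counts = [3, 2, 1, 7, 17, 29, 10, 2, 3, 2, 1]
calorie_stats = np.repeat(values, counts).astype(float)

average_calories = np.mean(calorie_stats)
calorie_std = np.std(calorie_stats)  # population standard deviation (NumPy's default, ddof=0)

# How many standard deviations 60 calories lies from the mean (negative means below).
z_score = (60 - average_calories) / calorie_std

# Fraction of cereals within one standard deviation of the mean.
within_one_std = np.mean(np.abs(calorie_stats - average_calories) <= calorie_std)

print(z_score, within_one_std)
```

On this data, 60 calories sits about 2.4 standard deviations below the mean, while roughly 82% of competing cereals fall within one standard deviation of it.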
