# CrunchieMunchies

You work in marketing for a food company <b>myCorps</b>, which is developing a new kind of tasty, wholesome cereal called <b>CrunchieMunchies</b>. 

You want to demonstrate to consumers how healthy your cereal is in comparison to other leading brands, so you’ve dug up nutritional data on several different competitors.

Your task is to use <em>NumPy statistical calculations</em> to analyze this data and prove that your <b>CrunchieMunchies</b> is the healthiest choice for consumers.






# Task Steps


1. First, import NumPy.

In [1]:
import numpy as np


2. Look over the <b><em>cereal.csv</em></b> file. This file contains the reported calorie amounts for different cereal brands. Load the data from the file and save it as <b><em>calorie_stats</em></b>.



In [2]:
calorie_stats=np.genfromtxt('cereal.csv', delimiter=',')
calorie_stats

array([ 70., 120.,  70.,  50., 110., 110., 110., 130.,  90.,  90., 120.,
       110., 120., 110., 110., 110., 100., 110., 110., 110., 100., 110.,
       100., 100., 110., 110., 100., 120., 120., 110., 100., 110., 100.,
       110., 120., 120., 110., 110., 110., 140., 110., 100., 110., 100.,
       150., 150., 160., 100., 120., 140.,  90., 130., 120., 100.,  50.,
        50., 100., 100., 120., 100.,  90., 110., 110.,  80.,  90.,  90.,
       110., 110.,  90., 110., 140., 100., 110., 110., 100., 100., 110.])
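Before computing statistics, it can help to confirm that the file loaded cleanly. The sketch below uses an in-memory string as a stand-in for `cereal.csv` (the sample values and the deliberately missing field are assumptions for illustration only) to show that `np.genfromtxt` turns a missing entry into `nan`, which would otherwise propagate into results such as the mean.

```python
import io
import numpy as np

# Hypothetical stand-in for cereal.csv: one row with a missing third field
sample = io.StringIO("70,120,,50")
loaded = np.genfromtxt(sample, delimiter=',')

# Missing fields come back as nan, so count them before trusting any statistic
missing = int(np.isnan(loaded).sum())
print(loaded.shape, missing)
```

If `missing` is greater than zero for the real file, functions such as `np.nanmean` may be a better fit than `np.mean`.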

3. There are <em>60 calories per serving of CrunchieMunchies</em>. How much <b>higher</b> is the <b>average calorie count</b> of your competition?

Save the answer to the variable <b>average_calories</b> and print the variable to the terminal to see the answer.


In [3]:
average_calories=np.average(calorie_stats)
print(average_calories)  # subtract 60 to compare with CrunchieMunchies
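To answer the "how much higher" question directly, subtract the 60 calories of CrunchieMunchies from the competitors' mean. This sketch rebuilds the dataset from value counts tallied from the printed array, so it runs without the CSV file; the names `values`, `counts`, and `difference` are illustrative and not part of the original notebook.

```python
import numpy as np

# Calorie values and how often each appears in the printed dataset (77 cereals)
values = [50, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
counts = [3, 2, 1, 7, 17, 29, 10, 2, 3, 2, 1]
calorie_stats = np.repeat(values, counts).astype(float)

average_calories = np.mean(calorie_stats)
difference = average_calories - 60  # gap to CrunchieMunchies' 60 calories
print(round(average_calories, 2), round(difference, 2))
```

On this data the competition averages roughly 47 calories per serving more than CrunchieMunchies.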

4. Does the <b>average calorie count</b> adequately reflect the distribution of the dataset? Let’s sort the data and see.

<b><em>Sort</em></b> the data and save the result to the variable <b>calorie_stats_sorted</b>. Print the sorted data to the terminal.


In [4]:

calorie_stats_sorted=np.sort(calorie_stats)
calorie_stats_sorted

array([ 50.,  50.,  50.,  70.,  70.,  80.,  90.,  90.,  90.,  90.,  90.,
        90.,  90., 100., 100., 100., 100., 100., 100., 100., 100., 100.,
       100., 100., 100., 100., 100., 100., 100., 100., 110., 110., 110.,
       110., 110., 110., 110., 110., 110., 110., 110., 110., 110., 110.,
       110., 110., 110., 110., 110., 110., 110., 110., 110., 110., 110.,
       110., 110., 110., 110., 120., 120., 120., 120., 120., 120., 120.,
       120., 120., 120., 130., 130., 140., 140., 140., 150., 150., 160.])

5. Do you see what I’m seeing? Looks like <b><em>the majority of the cereals are higher than the mean</em></b>. Let’s see if the <b>median</b> is a better representative of the dataset.

Calculate the median of the dataset and save your answer to <b><em>median_calories</em></b>. Print the median so you can see how it compares to the mean.

In [5]:
median_calories=np.median(calorie_stats_sorted)
median_calories

110.0
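The claim that most cereals sit above the mean can be checked numerically rather than by eye. The following sketch, again using the dataset rebuilt from the printed values (the helper names are assumptions), compares the mean with the median and measures the share of cereals above the mean.

```python
import numpy as np

# Dataset rebuilt from the counts of each value in the printed array
values = [50, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
counts = [3, 2, 1, 7, 17, 29, 10, 2, 3, 2, 1]
calorie_stats = np.repeat(values, counts).astype(float)

mean_calories = np.mean(calorie_stats)
median_calories = np.median(calorie_stats)

# A boolean comparison averaged with np.mean gives the fraction of True entries
share_above_mean = np.mean(calorie_stats > mean_calories)
print(median_calories, round(mean_calories, 2), round(share_above_mean, 2))
```

A median above the mean, together with about 61% of cereals above the mean, suggests that a few low-calorie cereals pull the average down.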

6. While the median demonstrates that <b><em><q>at least half of our values are over 100 calories</q></em></b>, it would be more impressive to show that a significant portion of the competition has a higher calorie count than CrunchieMunchies.

<b>Calculate different percentiles</b> and print them to the terminal until you find the lowest percentile that is greater than 60 calories. Save this value to the variable <b>nth_percentile</b>.


In [9]:
print(np.percentile(calorie_stats_sorted, 1))
print(np.percentile(calorie_stats_sorted, 2))
print(np.percentile(calorie_stats_sorted, 3))
nth_percentile=np.percentile(calorie_stats_sorted, 4)
nth_percentile

50.0
50.0
55.599999999999994


70.0
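Trying percentiles by hand works, but the same search can be written as a loop. This is a minimal sketch assuming whole-number percentiles and NumPy's default linear interpolation; `lowest_p` is a hypothetical name introduced here.

```python
import numpy as np

# Dataset rebuilt from the counts of each value in the printed array
values = [50, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
counts = [3, 2, 1, 7, 17, 29, 10, 2, 3, 2, 1]
calorie_stats = np.repeat(values, counts).astype(float)

lowest_p, nth_percentile = None, None
for p in range(1, 101):
    value = np.percentile(calorie_stats, p)
    if value > 60:
        # First whole-number percentile whose value exceeds 60 calories
        lowest_p, nth_percentile = p, value
        break

print(lowest_p, nth_percentile)
```

The loop stops at the 4th percentile with a value of 70 calories, which matches the manual search above.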

7. While the percentile shows us that the majority of the competition has a much higher calorie count, it’s an awkward concept to use in marketing materials.

Instead, let’s calculate the percentage of cereals that have more than 60 calories per serving. Save your answer to the variable more_calories and print it to the terminal.

In [12]:
# The mean of a boolean mask is the fraction of cereals above 60 calories
more_calories = np.mean(calorie_stats > 60)

more_calories


0.961038961038961
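The expressions `np.mean(calorie_stats > 60)` and `np.mean(calorie_stats[calorie_stats > 60])` look similar but answer different questions: the first gives the share of cereals above 60 calories, the second gives the average calorie count among those cereals. The sketch below, using the dataset rebuilt from the printed values (variable names other than `more_calories` are assumptions), shows both side by side.

```python
import numpy as np

# Dataset rebuilt from the counts of each value in the printed array
values = [50, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
counts = [3, 2, 1, 7, 17, 29, 10, 2, 3, 2, 1]
calorie_stats = np.repeat(values, counts).astype(float)

above_60 = calorie_stats > 60                        # boolean mask, one entry per cereal
more_calories = np.mean(above_60)                    # fraction of cereals above 60 calories
mean_of_filtered = np.mean(calorie_stats[above_60])  # average calories among those cereals

print(f"{more_calories:.1%}", round(mean_of_filtered, 2))
```

Here 74 of the 77 cereals, about 96%, have more than 60 calories per serving.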

8. Wow! That’s a really high percentage. That’s going to be very useful when we promote CrunchieMunchies. But one question is, how much variation exists in the dataset? Can we make the generalization that most cereals have around 100 calories or is the spread even greater?

Calculate the amount of variation by finding the <b><em>standard deviation</em></b>. Save your answer to calorie_std and print to the terminal. How can we incorporate this value into our analysis?

In [10]:
calorie_std=np.std(calorie_stats_sorted)
calorie_std



19.35718533390827
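One possible way to incorporate the standard deviation is to measure how many cereals fall within one standard deviation of the mean. This is only a sketch of that idea, not a prescribed method; `within_one_std` is a hypothetical name, and the dataset is rebuilt from the printed values.

```python
import numpy as np

# Dataset rebuilt from the counts of each value in the printed array
values = [50, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
counts = [3, 2, 1, 7, 17, 29, 10, 2, 3, 2, 1]
calorie_stats = np.repeat(values, counts).astype(float)

mean_calories = np.mean(calorie_stats)
calorie_std = np.std(calorie_stats)  # population standard deviation (ddof=0 by default)

# Fraction of cereals whose calories lie within one standard deviation of the mean
within_one_std = np.mean(np.abs(calorie_stats - mean_calories) <= calorie_std)
print(round(calorie_std, 2), round(within_one_std, 2))
```

About 82% of the cereals fall between roughly 88 and 126 calories, so "most cereals have around 100 to 120 calories" is a defensible generalization for this dataset.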

9. Write a short paragraph that sums up your findings and how you think this data could be used to <b>myCorp’s</b> advantage when marketing CrunchieMunchies.
