# Introduction
This assignment tests how well you can perform a range of data science tasks.

Each Problem Group below will center around a particular dataset that you have worked with before.

To ensure you receive full credit for a question, make sure you demonstrate the appropriate pandas, altair, or other commands as requested in the provided code blocks. 

You may find that some questions require multiple steps to fully answer. Others require some mental arithmetic in addition to pandas commands. Use your best judgment.

## Submission
Each problem group asks a series of questions. This assignment consists of two submissions:

1. After completing the questions below, open the Module 01 Assessment Quiz in Canvas and enter your answers to these questions there.

2. After completing and submitting the quiz, save this Colab notebook as a GitHub Gist by selecting `Save a copy as a GitHub Gist` from the `File` menu above. (You'll need to create a GitHub account for this.)

    In Canvas, open the Module 01 Assessment GitHub Gist assignment and paste the GitHub Gist URL for this notebook. Then submit that assignment.

## Problem Group 1

For the questions in this group, you'll work with the Netflix Movies Dataset found at this URL: [https://raw.githubusercontent.com/byui-cse/cse450-course/master/data/netflix_titles.csv](https://raw.githubusercontent.com/byui-cse/cse450-course/master/data/netflix_titles.csv)


### Question 1
Load the dataset into a pandas DataFrame and determine what data type is used to store the `release_year` feature.

In [4]:
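import pandas as pd

# Load the Netflix titles CSV straight from the course repository
netflix = pd.read_csv("https://raw.githubusercontent.com/byui-cse/cse450-course/master/data/netflix_titles.csv")
# Preview the first 20 rows (this shows the values, not the column dtypes)
netflix.head(20)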

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,81145628,Movie,Norm of the North: King Sized Adventure,"Richard Finn, Tim Maltby","Alan Marriott, Andrew Toth, Brian Dobson, Cole...","United States, India, South Korea, China","September 9, 2019",2019,TV-PG,90 min,"Children & Family Movies, Comedies",Before planning an awesome wedding for his gra...
1,80117401,Movie,Jandino: Whatever it Takes,,Jandino Asporaat,United Kingdom,"September 9, 2016",2016,TV-MA,94 min,Stand-Up Comedy,Jandino Asporaat riffs on the challenges of ra...
2,70234439,TV Show,Transformers Prime,,"Peter Cullen, Sumalee Montano, Frank Welker, J...",United States,"September 8, 2018",2013,TV-Y7-FV,1 Season,Kids' TV,"With the help of three human allies, the Autob..."
3,80058654,TV Show,Transformers: Robots in Disguise,,"Will Friedle, Darren Criss, Constance Zimmer, ...",United States,"September 8, 2018",2016,TV-Y7,1 Season,Kids' TV,When a prison ship crash unleashes hundreds of...
4,80125979,Movie,#realityhigh,Fernando Lebrija,"Nesta Cooper, Kate Walsh, John Michael Higgins...",United States,"September 8, 2017",2017,TV-14,99 min,Comedies,When nerdy high schooler Dani finally attracts...
5,80163890,TV Show,Apaches,,"Alberto Ammann, Eloy Azorín, Verónica Echegui,...",Spain,"September 8, 2017",2016,TV-MA,1 Season,"Crime TV Shows, International TV Shows, Spanis...",A young journalist is forced into a life of cr...
6,70304989,Movie,Automata,Gabe Ibáñez,"Antonio Banderas, Dylan McDermott, Melanie Gri...","Bulgaria, United States, Spain, Canada","September 8, 2017",2014,R,110 min,"International Movies, Sci-Fi & Fantasy, Thrillers","In a dystopian future, an insurance adjuster f..."
7,80164077,Movie,Fabrizio Copano: Solo pienso en mi,"Rodrigo Toro, Francisco Schultz",Fabrizio Copano,Chile,"September 8, 2017",2017,TV-MA,60 min,Stand-Up Comedy,Fabrizio Copano takes audience participation t...
8,80117902,TV Show,Fire Chasers,,,United States,"September 8, 2017",2017,TV-MA,1 Season,"Docuseries, Science & Nature TV","As California's 2016 fire season rages, brave ..."
9,70304990,Movie,Good People,Henrik Ruben Genz,"James Franco, Kate Hudson, Tom Wilkinson, Omar...","United States, United Kingdom, Denmark, Sweden","September 8, 2017",2014,R,90 min,"Action & Adventure, Thrillers",A struggling couple can't believe their luck w...
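
A preview of the rows shows the values but not how they are stored. As a minimal sketch, the data type of a column can be read from its `dtype` attribute, or from `dtypes` for every column at once. The `sample` frame below is a made-up stand-in for the Netflix data so the example runs without a download.

```python
import pandas as pd

# Toy stand-in for the Netflix data; these rows are invented for illustration.
sample = pd.DataFrame({
    "title": ["Show A", "Movie B", "Show C"],
    "release_year": [2013, 2016, 2019],
})

# Data type of a single column
year_dtype = sample["release_year"].dtype
print(year_dtype)

# Data types of every column at once
print(sample.dtypes)
```

The same two calls should work unchanged on the `netflix` frame loaded above.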


### Question 2
Filter your dataset so it contains only `TV Shows`. How many of those TV Shows were rated `TV-Y7`?

In [45]:
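# Keep only the rows whose type is "TV Show"
TV = netflix[netflix["type"] == "TV Show"]
# Of those, keep only the shows rated TV-Y7
TVY7 = TV[TV["rating"] == "TV-Y7"]
# count() reports non-null values per column; columns with no missing values give the row count
TVY7.count()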

show_id         100
type            100
title           100
director          7
cast             93
country          77
date_added       99
release_year    100
rating          100
duration        100
listed_in       100
description     100
dtype: int64
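
Because `count()` reports non-null values per column, the answer has to be read from a column with no missing data. A possible alternative, sketched here on an invented `shows` frame, is to combine both conditions into one boolean mask and sum it, which gives the number of matching rows directly.

```python
import pandas as pd

# Toy data, invented for illustration; None marks a missing director.
shows = pd.DataFrame({
    "type": ["TV Show", "TV Show", "Movie", "TV Show"],
    "rating": ["TV-Y7", "TV-MA", "TV-Y7", "TV-Y7"],
    "director": [None, "Someone", "Someone Else", None],
})

# Combine both conditions with & (each side needs its own parentheses)
mask = (shows["type"] == "TV Show") & (shows["rating"] == "TV-Y7")

# True counts as 1, so the sum is the number of matching rows
n_rows = int(mask.sum())

# count() on the same rows undercounts columns that contain missing values
per_column = shows[mask].count()
print(n_rows, per_column["director"])
```

In this toy frame two rows match, yet `count()` reports zero for `director` because both matching rows lack one.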

### Question 3
Further filter your dataset so it only contains TV Shows released between the years 2000 and 2009 inclusive. How many of *those* shows were rated `TV-Y7`?

In [46]:
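# Narrow the TV Shows to release years 2000 through 2009, inclusive at both ends
TV = TV[TV["release_year"] <= 2009]
TV = TV[TV["release_year"] >= 2000]
# Count the TV-Y7 shows that remain
TVY7 = TV[TV["rating"] == "TV-Y7"]
TVY7.count()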

show_id         4
type            4
title           4
director        0
cast            4
country         4
date_added      4
release_year    4
rating          4
duration        4
listed_in       4
description     4
dtype: int64
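
An inclusive range filter like this one can also be written with `Series.between`, which includes both endpoints by default. The sketch below uses invented rows whose years sit on and just outside the boundaries, to show which ones survive.

```python
import pandas as pd

# Toy data, invented for illustration; years straddle both boundaries.
shows = pd.DataFrame({
    "release_year": [1999, 2000, 2005, 2009, 2010],
    "rating": ["TV-Y7", "TV-Y7", "TV-MA", "TV-Y7", "TV-Y7"],
})

# between() keeps 2000 and 2009 themselves because both ends are inclusive by default
in_decade = shows["release_year"].between(2000, 2009)

decade_y7 = shows[in_decade & (shows["rating"] == "TV-Y7")]
n_decade_y7 = len(decade_y7)
print(n_decade_y7)
```

Only the 2000 and 2009 rows pass both conditions here; 1999 and 2010 fall outside the range and the 2005 row has a different rating.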

## Problem Group 2

For the questions in this group, you'll work with the Cereal Dataset found at this URL: [https://raw.githubusercontent.com/byui-cse/cse450-course/master/data/cereal.csv](https://raw.githubusercontent.com/byui-cse/cse450-course/master/data/cereal.csv)


### Question 4
After importing the dataset into a pandas DataFrame, determine the median amount of `protein` in cereal brands manufactured by Kellogg's (`mfr` code "K").

In [11]:
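# Load the cereal data, then keep only the Kellogg's rows (mfr code "K")
K = pd.read_csv("https://raw.githubusercontent.com/byui-cse/cse450-course/master/data/cereal.csv")
K = K[K["mfr"] == "K"]
# The 50% row of describe() is the median
K["protein"].describe()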

count    23.000000
mean      2.652174
std       1.070628
min       1.000000
25%       2.000000
50%       3.000000
75%       3.000000
max       6.000000
Name: protein, dtype: float64
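
The median appears in the `50%` row of `describe()`, but it can also be requested on its own with `median()`. A small sketch on an invented `cereal_sample` frame, selecting the rows and the column in one `.loc` call:

```python
import pandas as pd

# Toy data, invented for illustration.
cereal_sample = pd.DataFrame({
    "mfr": ["K", "K", "K", "G", "G"],
    "protein": [1, 3, 6, 2, 4],
})

# Select the protein values of mfr "K" rows, then take their median
k_median = cereal_sample.loc[cereal_sample["mfr"] == "K", "protein"].median()
print(k_median)
```

For the three toy "K" rows (1, 3, 6) the middle value is 3.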

### Question 5
In order to comply with new government regulations, all cereals must now come with a "Healthiness" rating. This rating is calculated based on this formula:

    healthiness = (protein + fiber) / sugar

Create a new `healthiness` column populated with values based on the above formula.

Then, determine the median healthiness value for only General Mills cereals (`mfr` = "G"), rounded to two decimal places.

In [17]:
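cereal = pd.read_csv("https://raw.githubusercontent.com/byui-cse/cse450-course/master/data/cereal.csv")
# healthiness = (protein + fiber) / sugar; the sugar column is named "sugars" in the CSV
cereal["healthiness"] = (cereal["protein"] + cereal["fiber"]) / cereal["sugars"]
# Keep the healthiness values for General Mills (mfr code "G") only
GHealth = cereal[cereal["mfr"] == "G"]
GHealth = GHealth["healthiness"]
# The 50% row of describe() is the median
GHealth.describe()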

count    22.000000
mean      0.902024
std       1.667625
min       0.076923
25%       0.212500
50%       0.475000
75%       0.666667
max       8.000000
Name: healthiness, dtype: float64
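
One thing worth checking with a ratio like this is division by zero: pandas does not raise an error but stores `inf`, which can distort summary statistics. The sketch below, on invented data with one zero-sugar row, computes the column, checks for infinite values, and takes the rounded median for one manufacturer. How such rows should be treated is a judgment call that the assignment does not specify.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration; the "K" row has zero sugar on purpose.
cereal_sample = pd.DataFrame({
    "mfr": ["G", "G", "G", "K"],
    "protein": [2, 3, 1, 4],
    "fiber": [2.0, 1.0, 0.0, 3.0],
    "sugars": [8, 4, 3, 0],
})

# Element-wise arithmetic on whole columns creates the new column in one step
cereal_sample["healthiness"] = (
    cereal_sample["protein"] + cereal_sample["fiber"]
) / cereal_sample["sugars"]

# Dividing by zero yields inf rather than an exception, so check for it explicitly
has_inf = bool(np.isinf(cereal_sample["healthiness"]).any())

# Median for the "G" rows only, rounded to two decimal places
g_values = cereal_sample.loc[cereal_sample["mfr"] == "G", "healthiness"]
g_median = round(g_values.median(), 2)
print(has_inf, g_median)
```

In the toy frame the infinite value belongs to the "K" row, so it does not affect the "G" median.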

## Problem Group 3

For the questions in this group, you'll work with the Titanic Dataset found at this URL: [https://raw.githubusercontent.com/byui-cse/cse450-course/master/data/titanic.csv](https://raw.githubusercontent.com/byui-cse/cse450-course/master/data/titanic.csv)

### Question 6

After loading the dataset into a pandas DataFrame, create a new column called `NameGroup` that contains the first letter of the passenger's surname in lower case.

Note that in the dataset, passengers' names are provided in the `Name` column and are listed as:

    Surname, Given names

For example, if a passenger's `Name` is `Braund, Mr. Owen Harris`, the `NameGroup` column should contain the value `b`.

Then count how many passengers have a `NameGroup` value of `k`.

In [42]:
import altair as alt
titanic = pd.read_csv("https://raw.githubusercontent.com/byui-cse/cse450-course/master/data/titanic.csv")

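def first_letter(name):
  # Names are stored as "Surname, Given names", so the first character belongs to the surname
  return name[0].lower()

titanic["NameGroup"] = titanic["Name"].apply(first_letter)
# Keep the passengers whose surname starts with "k" and count them
K = titanic[titanic["NameGroup"] == "k"]
K.count()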

PassengerId    28
Survived       28
Pclass         28
Name           28
Sex            28
Age            16
SibSp          28
Parch          28
Ticket         28
Fare           28
Cabin           5
Embarked       28
NameGroup      28
dtype: int64
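
The same column can be built without a helper function by using the vectorized `.str` accessor. This is a sketch on a few illustrative names in the `Surname, Given names` layout, not the full dataset.

```python
import pandas as pd

# A handful of illustrative names in "Surname, Given names" form.
passengers = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Kelly, Mr. James",
        "Kink, Mr. Vincenz",
        "Allen, Miss. Elisabeth Walton",
    ],
})

# .str[0] takes the first character of each name; .str.lower() lowercases it
passengers["NameGroup"] = passengers["Name"].str[0].str.lower()

# Summing the boolean comparison counts the matching rows
k_count = int((passengers["NameGroup"] == "k").sum())
print(k_count)
```

This relies on the surname coming first in every `Name` value, as the question states.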