# Setting up our Modules

In [2]:
import numpy as np
from datascience import *

# These lines do some fancy plotting magic.
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import warnings
warnings.simplefilter('ignore', FutureWarning)

In [3]:
deck = Table.read_table('deck.csv')

# Demo Question(s)

#### The setup is below:

In [None]:
# Load each table, keeping only the rows with a time before 2021 (that is, through 2020).
population = Table.read_table('population.csv').where('time', are.below(2021))
life_expectancy = Table.read_table('life_expectancy.csv').where('time', are.below(2021))
child_mortality = Table.read_table('child_mortality.csv').relabel(2, 'child_mortality_under_5_per_1000_born').where('time', are.below(2021))
fertility = Table.read_table('fertility.csv').where('time', are.below(2021))

<!-- BEGIN QUESTION -->

**Question 10.** Draw a line plot of the world population from 1800 through 2020 (inclusive of both endpoints). The world population is the sum of all of the countries' populations. You should use the `population` table defined earlier in the project. 

<!--
BEGIN QUESTION
name: q1_10
manual: True
-->

<!-- END QUESTION -->

**Question 11.** Create a function `stats_for_year` that takes a `year` and returns a table of statistics. The table it returns should have four columns: `geo`, `population_total`, `children_per_woman_total_fertility`, and `child_mortality_under_5_per_1000_born`. Each row should contain one unique Alpha-3 country code and three statistics: population, fertility rate, and child mortality for that `year` from the `population`, `fertility` and `child_mortality` tables. Only include rows for which all three statistics are available for the country and year.

In addition, restrict the result to country codes that appear in `big_50`, an array of the 50 most populous countries in 2020. This restriction will speed up computations later in the project.

After you write `stats_for_year`, try calling it on any year between 1960 and 2020. Try to understand the output of `stats_for_year`.

*Hint*: The tests for this question are quite comprehensive, so if you pass the tests, your function is probably correct. However, without calling your function yourself and looking at the output, it will be very difficult to understand any problems you have, so try your best to write the function correctly and check that it works before you rely on the `grader` tests to confirm your work.

*Hint*: What do all three tables have in common (pay attention to column names)?

<!--
BEGIN QUESTION
name: q1_11
manual: false
points:
- 0
- 0
- 0
- 4
-->

In [None]:
# We first create a population table that only includes the 
# 50 countries with the largest 2020 populations. We focus on 
# these 50 countries only so that plotting later will run faster.
big_50 = population.where('time', are.equal_to(2020)).sort("population_total", descending=True).take(np.arange(50)).column('geo')
population_of_big_50 = population.where('time', are.above(1959)).where('geo', are.contained_in(big_50))

def stats_for_year(year):
    """Return a table of the stats for each country that year."""
    p = population_of_big_50.where('time', are.equal_to(year)).drop('time')
    f = fertility.where('time', are.equal_to(year)).drop('time')
    c = child_mortality.where('time', are.equal_to(year)).drop('time')
    return ...

stats_for_year(2020) 

**Question 12.** Create a table called `pop_by_decade` with two columns called `decade` and `population`, in this order. It has a row for each year that starts a decade, in increasing order starting with 1960 and ending with 2020. For example, 1960 is the start of the 1960s. The `population` column contains the total population of all countries included in the result of `stats_for_year(year)` for the first `year` of the decade. You should see that these countries contain most of the world's population.

*Hint:* One approach is to define a function `pop_for_year` that computes this total population, then `apply` it to the `decade` column.  The `stats_for_year` function from the previous question may be useful here.

This first test is just a sanity check for your helper function if you choose to use it. You will not lose points for not implementing the function `pop_for_year`.

**Note:** The cell where you will generate the `pop_by_decade` table is below the cell where you can choose to define the helper function `pop_for_year`. You should define your `pop_by_decade` table in the cell that starts with the table `decades` being defined. 

<!--
BEGIN QUESTION
name: q1_12_0
manual: false
points: 
- 0
- 0
- 0
- 0
-->

In [None]:
def pop_for_year(year):
    """Return the total population for the specified year."""
    return ...

In [None]:
pop_for_year(1960)

Now that you've defined your helper function (if you've chosen to do so), define the `pop_by_decade` table.

<!--
BEGIN QUESTION
name: q1_12
manual: false
points:
- 0
- 0
- 0
- 0
- 0
- 0
- 4
-->

In [None]:
decades = Table().with_column('decade', np.arange(1960, 2021, 10))

pop_by_decade = ...
pop_by_decade.set_format(1, NumberFormatter)
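
The `decades` table relies on `np.arange` excluding its stop value, which is why the stop is 2021 rather than 2020. A quick check using only NumPy (the names `decade_starts` and `too_short` are introduced here for illustration):

```python
import numpy as np

# np.arange(start, stop, step) includes start but excludes stop,
# so a stop of 2021 is needed for 2020 to appear.
decade_starts = np.arange(1960, 2021, 10)
print(decade_starts)

# With a stop of 2020, the final decade would be dropped.
too_short = np.arange(1960, 2020, 10)
print(len(decade_starts), len(too_short))
```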

# Random Selection

## `np.random.choice` will help us produce randomly generated arrays

Consider the array of food we could get at the Sky Cafe here on campus: 

In [4]:
cafe_lunch_options = make_array('Salad Bar','Chicken Tendies','Quesadilla','Burger','Just hella fries',
                                'Daily Special','Sandwich','Cup Noodle')
cafe_lunch_options

array(['Salad Bar', 'Chicken Tendies', 'Quesadilla', 'Burger',
       'Just hella fries', 'Daily Special', 'Sandwich', 'Cup Noodle'],
      dtype='<U16')

#### I forgot to bring my lunch from home today. What am I getting? Run the cell below:

In [18]:
np.random.choice(cafe_lunch_options)

'Just hella fries'

#### Denise also forgot hers. She said, "I don't know, surprise me." What are we each getting?

In [19]:
np.random.choice(cafe_lunch_options, 2)

array(['Chicken Tendies', 'Cup Noodle'],
      dtype='<U16')
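
Note that `np.random.choice` samples with replacement by default, so two people could end up with the same lunch. A minimal sketch of both behaviors, assuming a plain NumPy array in place of `make_array` (the `options` array is a shortened stand-in for `cafe_lunch_options`):

```python
import numpy as np

options = np.array(['Salad Bar', 'Quesadilla', 'Burger', 'Sandwich'])

# Default is replace=True, so the same item may be drawn more than once.
with_repeats = np.random.choice(options, 10)

# replace=False guarantees distinct items; the size cannot exceed len(options).
no_repeats = np.random.choice(options, 2, replace=False)

print(with_repeats)
print(no_repeats)
```

Because the draws are random, the printed arrays will differ from run to run.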

## Appending items to arrays, and arrays to other arrays

In [22]:
this_array = make_array()
# np.append returns a new array; this_array itself is unchanged.
np.append(this_array, 3)

array([ 3.])
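
The heading also mentions appending arrays to arrays, which the cell above does not show. A small sketch, using `np.array` directly instead of `make_array` (the names `start`, `one_more`, and `combined` are illustrative):

```python
import numpy as np

start = np.array([1, 2])

# Appending a single item returns a new, longer array.
one_more = np.append(start, 3)

# Appending an array adds all of its items to the end.
combined = np.append(start, np.array([3, 4]))

print(start)      # the original array still has two items
print(one_more)
print(combined)
```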

In [24]:
lunch_this_week = make_array()
for i in np.arange(5):
    this_lunch = np.random.choice(cafe_lunch_options)
    print(this_lunch)
    # Reassign the result: np.append does not modify its argument in place.
    lunch_this_week = np.append(lunch_this_week, this_lunch)

lunch_this_week

Just hella fries
Sandwich
Cup Noodle
Daily Special
Sandwich


array(['Just hella fries', 'Sandwich', 'Cup Noodle', 'Daily Special',
       'Sandwich'],
      dtype='<U32')
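
The loop above draws one lunch at a time. As an alternative sketch (not the approach used in the cell above), passing a size to `np.random.choice` draws all five at once. Here `options` is a shortened stand-in for `cafe_lunch_options`:

```python
import numpy as np

options = np.array(['Salad Bar', 'Quesadilla', 'Burger', 'Sandwich'])

# One call draws five lunches, with repeats allowed, just as in the loop.
lunch_this_week = np.random.choice(options, 5)
print(lunch_this_week)
```

The loop is still useful when each draw needs extra work, such as printing the choice as it is made.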

## `tbl.sample` will help produce smaller, randomly generated tables

In [26]:
deck.show()

Rank,Suit
2,♠︎
2,♣︎
2,♥︎
2,♦︎
3,♠︎
3,♣︎
3,♥︎
3,♦︎
4,♠︎
4,♣︎


In [31]:
my_hand = deck.sample(25, with_replacement=False)
my_hand.show()

Rank,Suit
K,♠︎
6,♥︎
10,♠︎
6,♠︎
4,♥︎
8,♥︎
4,♠︎
8,♣︎
6,♦︎
8,♦︎
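
With `with_replacement=False`, no row of the deck can appear in the hand twice. The same idea can be sketched in plain NumPy by drawing distinct row indices. This is an analogy that assumes a standard 52-row deck, not a description of how `Table.sample` is implemented, and the names `num_cards`, `hand_size`, and `hand_rows` are illustrative:

```python
import numpy as np

num_cards = 52   # assumed size of a standard deck
hand_size = 25

# Draw 25 distinct row indices; replace=False rules out duplicates.
hand_rows = np.random.choice(np.arange(num_cards), hand_size, replace=False)
print(np.sort(hand_rows))
```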
