# STOR 120 -  Lab 3: Data Types and Arrays

Welcome to Lab 3!

So far in labs, we've used Python to manipulate numbers and work with tables.  But we need to discuss data types to deepen our understanding of how to work with data in Python.

In this lab, you'll first see how to represent and manipulate another fundamental type of data: text.  A piece of text is called a *string* in Python. You'll also see how to work with *arrays* of data, such as all the numbers between 0 and 100 or all the words in the chapter of a book. Lastly, you'll create tables and practice analyzing them with your knowledge of table operations.

First run the cell below.

In [1]:
# Just run this cell

import numpy as np
from datascience import *

# 1. Text
Programming doesn't just concern numbers. Text is one of the most common data types used in programs. 

Text is represented by a **string value** in Python. The word "string" is a programming term for a sequence of characters. A string might contain a single character, a word, a sentence, or a whole book.

To distinguish text data from actual code, we demarcate strings by putting quotation marks around them. Single quotes (`'`) and double quotes (`"`) are both valid, but the types of opening and closing quotation marks must match. The contents can be any sequence of characters, including numbers and symbols. 

We've seen strings before in `print` statements.  Below, two different strings are passed as arguments to the `print` function.

In [2]:
print("I <3", 'Data Science')

I <3 Data Science


Just as names can be given to numbers, names can be given to string values.  The names and strings aren't required to be similar in any way. Any name can be assigned to any string.

In [3]:
one = 'two'
plus = '*'
print(one, plus, one)

two * two


**Question 1.1.** Yuri Gagarin was the first person to travel through outer space.  When he emerged from his capsule upon landing on Earth, he [reportedly](https://en.wikiquote.org/wiki/Yuri_Gagarin) had the following conversation with a woman and girl who saw the landing:

    The woman asked: "Can it be that you have come from outer space?"
    Gagarin replied: "As a matter of fact, I have!"

The cell below contains unfinished code.  Fill in the `...`s so that it prints out this conversation *exactly* as it appears above.

<!--
BEGIN QUESTION
name: q1_1
-->

In [6]:
woman_asking = "The woman asked:"
woman_quote = '"Can it be that you have come from outer space?"'
gagarin_reply = 'Gagarin replied:'
gagarin_quote = '"As a matter of fact, I have!"'

print(woman_asking, woman_quote)
print(gagarin_reply, gagarin_quote)

The woman asked: "Can it be that you have come from outer space?"
Gagarin replied: "As a matter of fact, I have!"


## 1.1. String Methods

Strings can be transformed using **methods**. Recall that methods and functions are not technically the same thing, but we'll be using them interchangeably for the purposes of this course.

Here's a sketch of how to call methods on a string:

    <expression that evaluates to a string>.<method name>(<argument>, <argument>, ...)
    
One example of a string method is `replace`, which replaces all instances of some part of the original string (or a *substring*) with a new string. 

    <original string>.replace(<old substring>, <new substring>)
    
`replace` returns (evaluates to) a new string, leaving the original string unchanged.
    
Try to predict the output of this example, then run the cell!

In [7]:
# Replace one letter
hello = 'Hello'
print(hello.replace('o', 'a'), hello)

Hella Hello


You can call functions on the results of other functions.  For example, `max(abs(-5), abs(3))` evaluates to 5.  Similarly, you can call methods on the results of other method or function calls.

You may have already noticed one difference between functions and methods - a function like `max` does not require a `.` before it's called, but a string method like `replace` does.

In [8]:
# Calling replace on the output of another call to replace
'train'.replace('t', 'ing').replace('in', 'de')

'degrade'

Here's a picture of how Python evaluates a "chained" method call like that:

<img src="lab03-chaining_method_calls.jpg"/>

**Question 1.1.1.** Use `replace` to transform the string `'clarinetists'` into `'statistics'`. Assign your result to `new_word`.

<!--
BEGIN QUESTION
name: q111
-->

In [10]:
new_word = 'clarinetists'.replace('clarinet', 'stat').replace("sts", "stics")
new_word

'statistics'

There are many more string methods in Python, but most programmers don't memorize their names or how to use them.  In the "real world," people usually just search the internet for documentation and examples. A complete [list of string methods](https://docs.python.org/3/library/stdtypes.html#string-methods) appears in the Python language documentation. [Stack Overflow](http://stackoverflow.com) has a huge database of answered questions that often demonstrate how to use these methods to achieve various ends.

## 1.2. Converting to and from Strings

Strings and numbers are different *types* of values, even when a string contains the digits of a number. For example, evaluating the following cell causes an error because an integer cannot be added to a string.

In [None]:
8 + "8"

However, there are built-in functions to convert numbers to strings and strings to numbers. Some of these built-in functions have restrictions on the type of argument they take:

|Function |Description|
|-|-|
|`int`|Converts a string of digits or a float to an integer ("int") value|
|`float`|Converts a string of digits (perhaps with a decimal point) or an int to a decimal ("float") value|
|`str`|Converts any value to a string|

Try to predict what data type and value `example` evaluates to, then run the cell.

In [11]:
example = 8 + int("10") + float("8")

print(example)
print("This example returned a " + str(type(example)) + "!")

26.0
This example returned a <class 'float'>!


Suppose you're writing a program that looks for dates in a text, and you want your program to find the amount of time that elapsed between two years it has identified.  It doesn't make sense to subtract two texts, but you can first convert the text containing the years into numbers.

**Question 1.2.1.** Finish the code below to compute the number of years that elapsed between `one_year` and `another_year`.  Don't just write the numbers `1618` and `1648` (or `30`); use a conversion function to turn the given text data into numbers.

<!--
BEGIN QUESTION
name: q121
-->

In [13]:
# Some text data:
one_year = "1618"
another_year = "1648"

# Complete the next line.  Note that we can't just write:
#   another_year - one_year
# If you don't see why, try seeing what happens when you
# write that here.
difference = int(another_year) - int(one_year)
difference

30

## 1.3. Passing strings to functions

String values, like numbers, can be arguments to functions and can be returned by functions. 

The function `len` (derived from the word "length") takes a single string as its argument and returns the number of characters (including spaces) in the string.

Note that it doesn't count *words*. `len("one small step for man")` evaluates to 22, not 5.

**Question 1.3.1.**  Use `len` to find the number of characters in the long string in the next cell.  Characters include things like spaces and punctuation. Assign `speech_length` to that number.

(The string is the text of North Carolina Gov. Roy Cooper's 2021 inauguration speech from the [AP News](https://apnews.com/article/health-north-carolina-inaugurations-coronavirus-pandemic-0b2b6cc2c722a00971bc5f9e4c82a948of.asp).)  

<!--
BEGIN QUESTION
name: q131
-->

In [14]:
speech = "Well, Good morning everybody. I’m Roy Cooper. Today, I’m honored to stand before you to accept both the oath of office of Governor and the duties that come along with it. I’m thankful for my family — our First Lady Kristin Cooper and my three daughters — who inspire me every day. Transitions are a time for reflection, and a time for looking forward. My first term in this office was filled with triumphs, but also trials. First, the triumphs. Historic progress to make our state more inclusive and our environment cleaner. Record jobs announcements in rural and urban parts of our state that provided rewarding work to our people. Unrelenting efforts to make health care more accessible and public schools stronger. And as for the trials — the natural disasters. The overdue reckoning on racial justice. An unprecedented global pandemic. The earthquakes – those that shook the ground and those that shook the very foundation of our democracy. We’ve had our share of tough days. But we are North Carolinians. And in our state, difficulties don’t define us. What defines us is our strength, our resilience, our readiness to succeed at what comes next. When the pandemic forced classrooms to close in March, schools and volunteers made sure our kids got fed at home. When weary health care workers needed a boost, communities sent meals and care packages. When personal protective equipment ran short, North Carolina manufacturing companies pivoted to produce face shields, gowns, masks and more. And let’s not overlook the stories of neighbors helping neighbors. Grandchildren talking and singing through windows to their grandparents in nursing homes. A painted rainbow taped up in the hallway. Snacks and signs of encouragement left for delivery workers facing long hours. An overworked nurse getting a COVID-19 vaccine and telling everyone the biggest side effect is joy. Our state deserves a collective pat on the back. And as your governor, I say thank you. And before looking ahead, it’s worth looking back a hundred years where North Carolina was in 1920. The state had lost nearly 14,000 people in the Spanish flu pandemic. And in just a few years, North Carolina roared back. New manufacturing jobs paid reliable wages for the first time to thousands of North Carolinians. With more money in their pockets, people were able to afford to buy cars. And that created the challenge of needing roads for those cars to drive on. So, North Carolina responded and became known as the Good Roads state. Those roads got people to work, but they also enabled them to vacation and enjoy the natural beauty of our state. But now a century later, that cycle of challenge-and-response confronts North Carolina again. We are living it. We can see it. And we can solve it. As your governor, I commit to focus on our most important challenges. The challenge of emerging from this pandemic smarter and stronger than ever. The challenge of educating our people and ensuring that every North Carolinian gets health care. The challenge of overcoming disinformation and lies and recommitting to the truth. We can respect our disagreements, but we must cherish our democracy. And the challenge of forming a more perfect North Carolina, where every person has opportunity and access to the liberty that they deserve and our laws promise. As we enter 2021, we carry the imprint of our people’s frustration and loss as well as our determination and resilience. I hold close the memories of the suffering and the heroic North Carolinians. This new year and this new term as Governor is more than just turning the page of a calendar. The lessons we’ve all learned must usher in a new era. An era where we can acknowledge and work around our differences while refusing to sacrifice truth and facts at the altar of ideology. Where the dangerous events that took place at our nation’s Capitol can never be justified. So let’s reach together – to find ways all North Carolinians can afford to see a doctor. To get a quality education and a good paying job. To reform our systems that hurt people of color and to live and work in an economy that leaves no one behind, no matter who they are or where they live. Hey, let’s cast aside notions of red counties or blue counties and recognize that these are artificial divisions. Let’s place integrity at the forefront. We are all North Carolinians. These times of triumph and trial have shown us that we are more connected than we ever imagined. And one thing is clear, just as we did one hundred years ago — North Carolina is ready to roar again. And we will do it together. As the Bible tells us in the Book of Ecclesiastes, “Two are better than one, because they have good reward for their toil. For if they fall, one will lift up his fellow.” North Carolinians have shown we know how to lift one another up. I’m truly humbled by the trust that you, the people of North Carolina have placed in me to serve again as your Governor. I have faith in you, and thank you for putting your faith in me. Together, may we continue to be strong, resilient and ready. God Bless North Carolina and may God bless all of you."
speech_length = len(speech)
speech_length

5134

# 2. Arrays

Computers are most useful when you can use a small amount of code to *do the same action* to *many different things*.

For example, in the time it takes you to calculate the 18% tip on a restaurant bill, a laptop can calculate 18% tips for every restaurant bill paid by every human on Earth that day (That's if you're pretty fast at doing arithmetic in your head!).

**Arrays** are how we put many values in one place so that we can operate on them as a group. For example, if `billions_of_numbers` is an array of numbers, the expression

    .18 * billions_of_numbers

gives a new array of numbers that contains the result of multiplying each number in `billions_of_numbers` by .18.  Arrays are not limited to numbers; we can also put all the words in a book into an array of strings.

Concretely, an array is a **collection of values of the same type**. 

## 2.1. Making arrays

First, let's learn how to manually input values into an array. This typically isn't how programs work. Normally, we create arrays by loading them from an external source, like a data file.

To create an array by hand, call the function `make_array`.  Each argument you pass to `make_array` will be in the array it returns.  Run this cell to see an example:

Personal Note: [The difference between lists and arrays in Python](https://www.geeksforgeeks.org/difference-between-list-and-array-in-python/)

In [15]:
make_array(0.125, 4.75, -1.3)

array([ 0.125,  4.75 , -1.3  ])

Each value in an array (in the above case, the numbers 0.125, 4.75, and -1.3) is called an *element* of that array.

Arrays themselves are also values, just like numbers and strings.  That means you can assign them to names or use them as arguments to functions. For example, `len(<some_array>)` returns the number of elements in `some_array`.

**Question 2.1.1.** Make an array containing the numbers 0, 1, -1, $\pi$, and $e$, in that order.  Name it `interesting_numbers`.  

*Hint:* How have you found the values $\pi$ and $e$ earlier in this course?

<!--
BEGIN QUESTION
name: q211
-->

In [17]:
import math as math 

interesting_numbers = make_array(0,1,-1,math.pi, math.e)
interesting_numbers

array([ 0.        ,  1.        , -1.        ,  3.14159265,  2.71828183])

**Question 2.1.2.** Make an array containing the five strings `"Hello"`, `","`, `" "`, `"world"`, and `"!"`.  (The third one is a single space inside quotes.)  Name it `hello_world_components`.

*Note:* If you evaluate `hello_world_components`, you'll notice some extra information in addition to its contents: `dtype='<U5'`.  That's just NumPy's extremely cryptic way of saying that the data types in the array are strings.

<!--
BEGIN QUESTION
name: q212
-->

In [18]:
hello_world_components = make_array("Hello", ",", " ", "world", "!")
hello_world_components

array(['Hello', ',', ' ', 'world', '!'], dtype='<U5')

###  `np.arange`
Arrays are provided by a package called [NumPy](http://www.numpy.org/) (pronounced "NUM-pie"). The package is called `numpy`, but it's standard to rename it `np` for brevity.  You can do that with:

    import numpy as np

Very often in data science, we want to work with many numbers that are evenly spaced within some range.  NumPy provides a special function for this called `arange`.  The line of code `np.arange(start, stop, step)` evaluates to an array with all the numbers starting at `start` and counting up by `step`, stopping **before** `stop` is reached.

Run the following cells to see some examples!

In [19]:
# This array starts at 1 and counts up by 2
# and then stops before 6
np.arange(1, 6, 2)
# THis array outputs 1,3, and 5 because the arange function will give you output from 1-6, with an upwards intervals of 2
# starts at 1, add 2, then you get 3, then add two, then you get 5

array([1, 3, 5])

In [20]:
# This array doesn't contain 9
# because np.arange stops *before* the stop value is reached
np.arange(4, 9, 1)

array([4, 5, 6, 7, 8])

**Question 2.1.3.** Import `numpy` as `np` and then use `np.arange` to create an array with the multiples of 42 from 0 up to (**and including**) 4242.  (So its elements are 0, 42, 84, 126, etc.)

<!--
BEGIN QUESTION
name: q513
-->

In [22]:
import numpy as np
multiples_of_42 = np.arange(0,4243, 42)
multiples_of_42

array([   0,   42,   84,  126,  168,  210,  252,  294,  336,  378,  420,
        462,  504,  546,  588,  630,  672,  714,  756,  798,  840,  882,
        924,  966, 1008, 1050, 1092, 1134, 1176, 1218, 1260, 1302, 1344,
       1386, 1428, 1470, 1512, 1554, 1596, 1638, 1680, 1722, 1764, 1806,
       1848, 1890, 1932, 1974, 2016, 2058, 2100, 2142, 2184, 2226, 2268,
       2310, 2352, 2394, 2436, 2478, 2520, 2562, 2604, 2646, 2688, 2730,
       2772, 2814, 2856, 2898, 2940, 2982, 3024, 3066, 3108, 3150, 3192,
       3234, 3276, 3318, 3360, 3402, 3444, 3486, 3528, 3570, 3612, 3654,
       3696, 3738, 3780, 3822, 3864, 3906, 3948, 3990, 4032, 4074, 4116,
       4158, 4200, 4242])

##### Temperature readings
NOAA (the US National Oceanic and Atmospheric Administration) operates weather stations that measure surface temperatures at different sites around the United States.  The hourly readings are [publicly available](http://www.ncdc.noaa.gov/qclcd/QCLCD?prior=N).

Suppose we download all the hourly data from the Raleigh-Durham International Airport site for the semester, from January 1, 2022 to April 30, 2022.  To analyze the data, we want to know when each reading was taken, but we find that the data don't include the timestamps of the readings (the time at which each one was taken).

However, we know the first reading was taken at the first instant of January 2022 (midnight on January 1st) and each subsequent reading was taken exactly 1 hour after the last.

**Question 2.1.4.** Create an array of the *time, in seconds, since the start of the January 1st, and ending before the start of May 1st* at which each hourly reading was taken.  Name it `collection_times`.

*Hint 1:* How many days are in each month? Is it a leap year? There are 31 days in January, which is equivalent to ($31 \times 24$) hours or ($31 \times 24 \times 60 \times 60$) seconds.

<!--
BEGIN QUESTION
name: q214
-->

In [23]:
jan = (31*24*60*60)
feb = (28*24*60*60)
march = (31*24*60*60)
april = (30*24*60*60)

collection_times = jan + feb + march + april
collection_times

10368000

## 2.2. Working with single elements of arrays ("indexing")
Let's work with a more interesting dataset.  The next cell creates an array called `NCpop` that includes estimated North Carolina populations in every year from 1900 to roughly the present. The estimates come from the [FRED Economic Data](https://fred.stlouisfed.org/series/NCPOP).


In [29]:
NCpop = Table.read_table("Datasets/NCpop.csv").column("NCPOP")
NCpop

array([ 1897000,  1926000,  1956000,  1986000,  2017000,  2051000,
        2077000,  2105000,  2142000,  2174000,  2221000,  2276000,
        2313000,  2362000,  2421000,  2473000,  2513000,  2546000,
        2522000,  2535000,  2588000,  2651000,  2700000,  2761000,
        2830000,  2895000,  2959000,  3027000,  3082000,  3133000,
        3167000,  3184000,  3227000,  3268000,  3304000,  3323000,
        3346000,  3385000,  3440000,  3514000,  3574000,  3589000,
        3569000,  3654000,  3560000,  3533000,  3706000,  3769000,
        3837000,  3911000,  4068000,  4120000,  4109000,  4120000,
        4131000,  4242000,  4309000,  4368000,  4376000,  4458000,
        4573000,  4663000,  4707000,  4742000,  4802000,  4863000,
        4896000,  4952000,  5004000,  5031000,  5084411,  5203531,
        5301150,  5389852,  5470911,  5547188,  5607964,  5685607,
        5759492,  5823491,  5898980,  5956653,  6019101,  6077056,
        6164006,  6253954,  6321578,  6403700,  6480594,  6565

Here's how we get the first element of `NCpop`, which is the world population in the first year in the dataset, 1900.

In [30]:
NCpop.item(0)

1897000

The value of that expression is the number 1897000 (almost 2 million), because that's the first thing in the array `NCpop`.

Notice that we wrote `.item(0)`, not `.item(1)`, to get the first element.  This is a weird convention in computer science.  0 is called the *index* of the first item.  It's the number of elements that appear *before* that item.  So 3 is the index of the 4th item.

Here are some more examples.  In the examples, we've given names to the things we get out of `NCpop`.  Read and run each cell.

In [31]:
# The 13th element in the array is the population
# in 1912 (which is 1900 + 12).
nc_pop_1962 = NCpop.item(12)
nc_pop_1962

2313000

In [32]:
# The 66th element is the population in 1965.
nc_pop_2015 = NCpop.item(65)
nc_pop_2015

4863000

In [33]:
# The array has only 122 elements, so this doesn't work.
# (There's no element with 122 other elements before it.)
nc_pop_2022 = NCpop.item(122)
population_2022

IndexError: index 122 is out of bounds for axis 0 with size 122

Since `make_array` returns an array, we can call `.item(3)` on its output to get its 4th element, just like we "chained" together calls to the method `replace` earlier.

In [34]:
make_array(-1, -3, 4, -2).item(3)

-2

**Question 2.2.1.** Set `population_1982` to the North Carolina population in 1982, by getting the appropriate element from `NCpop` using `item`.

<!--
BEGIN QUESTION
name: q221
-->

In [35]:
population_1982 = NCpop.item(82)
population_1982

6019101

## 2.3. Doing something to every element of an array
Arrays are primarily useful for doing the same operation many times, so we don't often have to use `.item` and work with single elements.

##### Logarithms
Here is one simple question we might ask about world population:

> How big was the population in *orders of magnitude* in each year?

Orders of magnitude quantify how big a number is by representing it as the power of another number (for example, representing 104 as $10^{2.017033}$). One way to do this is by using the logarithm function. The logarithm (base 10) of a number increases by 1 every time we multiply the number by 10. It's like a measure of how many decimal digits the number has, or how big it is in orders of magnitude.

We could try to answer our question like this, using the `log10` function from the `math` module and the `item` method you just saw:

In [36]:
nc_pop_1900_magnitude = math.log10(NCpop.item(0))
nc_pop_1901_magnitude = math.log10(NCpop.item(1))
nc_pop_1902_magnitude = math.log10(NCpop.item(2))
nc_pop_1903_magnitude = math.log10(NCpop.item(3))
nc_pop_1982_magnitude = math.log10(NCpop.item(82))

But this is tedious and doesn't really take advantage of the fact that we are using a computer.

Instead, NumPy provides its own version of `log10` that takes the logarithm of each element of an array.  It takes a single array of numbers as its argument.  It returns an array of the same length, where the first element of the result is the logarithm of the first element of the argument, and so on.

**Question 2.3.1.** Use `np.log10` to compute the logarithms of the North Carolina population in every year.  Give the result (an array of 122 numbers) the name `NCpop_magnitudes`.  Your code should be very short.

<!--
BEGIN QUESTION
name: q231
-->

In [40]:
NCpop_magnitudes = np.log10(NCpop)
NCpop_magnitudes

array([6.27806733, 6.28465628, 6.29136885, 6.29797924, 6.3047059 ,
       6.31196566, 6.3174365 , 6.3232521 , 6.33081947, 6.33725954,
       6.34654856, 6.35717226, 6.36417563, 6.37327989, 6.38399479,
       6.39322412, 6.40019249, 6.4058584 , 6.40174508, 6.40397796,
       6.41296427, 6.42340973, 6.43136376, 6.44106641, 6.45178644,
       6.46164857, 6.47114497, 6.48101242, 6.48883263, 6.49596039,
       6.50064806, 6.50297306, 6.50879897, 6.51428205, 6.51904004,
       6.52153034, 6.52452594, 6.52955867, 6.53655844, 6.54580176,
       6.55315455, 6.55497346, 6.55254655, 6.56276854, 6.55145   ,
       6.54814364, 6.56890541, 6.57622614, 6.5839918 , 6.59228782,
       6.60938094, 6.61489722, 6.61373614, 6.61489722, 6.61605519,
       6.62757066, 6.63437649, 6.64028263, 6.64107731, 6.64914006,
       6.6602012 , 6.66866542, 6.6727442 , 6.67596155, 6.68142216,
       6.68690427, 6.68984141, 6.69478064, 6.6993173 , 6.70165432,
       6.70624065, 6.71629815, 6.72437009, 6.73157684, 6.73805

What you just did is called *elementwise* application of `np.log10`, since `np.log10` operates separately on each element of the array that it's called on. Here's a picture of what's going on:

<img src="lab03-array_logarithm.jpg">


The textbook's [section](https://www.inferentialthinking.com/chapters/05/1/Arrays)  on arrays has a useful list of NumPy functions that are designed to work elementwise, like `np.log10`.

##### Arithmetic
Arithmetic also works elementwise on arrays, meaning that if you perform an arithmetic operation (like subtraction, division, etc) on an array, Python will do the operation to every element of the array individually and return an array of all of the results. For example, you can divide all the population numbers by 1 million to get numbers in millions:

In [41]:
NCpop_in_millions = NCpop / 1000000
NCpop_in_millions

array([ 1.897   ,  1.926   ,  1.956   ,  1.986   ,  2.017   ,  2.051   ,
        2.077   ,  2.105   ,  2.142   ,  2.174   ,  2.221   ,  2.276   ,
        2.313   ,  2.362   ,  2.421   ,  2.473   ,  2.513   ,  2.546   ,
        2.522   ,  2.535   ,  2.588   ,  2.651   ,  2.7     ,  2.761   ,
        2.83    ,  2.895   ,  2.959   ,  3.027   ,  3.082   ,  3.133   ,
        3.167   ,  3.184   ,  3.227   ,  3.268   ,  3.304   ,  3.323   ,
        3.346   ,  3.385   ,  3.44    ,  3.514   ,  3.574   ,  3.589   ,
        3.569   ,  3.654   ,  3.56    ,  3.533   ,  3.706   ,  3.769   ,
        3.837   ,  3.911   ,  4.068   ,  4.12    ,  4.109   ,  4.12    ,
        4.131   ,  4.242   ,  4.309   ,  4.368   ,  4.376   ,  4.458   ,
        4.573   ,  4.663   ,  4.707   ,  4.742   ,  4.802   ,  4.863   ,
        4.896   ,  4.952   ,  5.004   ,  5.031   ,  5.084411,  5.203531,
        5.30115 ,  5.389852,  5.470911,  5.547188,  5.607964,  5.685607,
        5.759492,  5.823491,  5.89898 ,  5.956653, 

You can do the same with addition, subtraction, multiplication, and exponentiation (`**`). For example, you can calculate a tip on several restaurant bills at once (in this case just 3):

In [42]:
restaurant_bills = make_array(20.12, 39.90, 31.01)
print("Restaurant bills:\t", restaurant_bills)

# Array multiplication
tips = .2 * restaurant_bills
print("Tips:\t\t\t", tips)

Restaurant bills:	 [20.12 39.9  31.01]
Tips:			 [4.024 7.98  6.202]


<img src="lab03-array_multiplication.jpg">

**Question 2.3.2.** Suppose the total charge at a restaurant is the original bill plus the tip. If the tip is 20%, that means we can multiply the original bill by 1.2 to get the total charge.  Compute the total charge for each bill in `restaurant_bills`, and assign the resulting array to `total_charges`.

<!--
BEGIN QUESTION
name: q532
-->

In [43]:
total_charges = tips + restaurant_bills
total_charges

array([24.144, 47.88 , 37.212])

**Question 2.3.3.** The array `more_restaurant_bills` contains 100,000 bills!  Compute the total charge for each one.  How is your code different?

This code is different because it includes the tips and the bills into one line of code 

<!--
BEGIN QUESTION
name: q233
-->

In [48]:
more_restaurant_bills = Table.read_table("Datasets/more_restaurant_bills.csv").column("Bill")
more_total_charges = (.2 * more_restaurant_bills) + more_restaurant_bills
more_total_charges

array([20.244, 20.892, 12.216, ..., 19.308, 18.336, 35.664])

The function `sum` takes a single array of numbers as its argument.  It returns the sum of all the numbers in that array (so it returns a single number, not an array).

**Question 5.3.4.** What was the sum of all the bills in `more_restaurant_bills`, *including tips*?

<!--
BEGIN QUESTION
name: q534
-->

In [49]:
sum_of_bills = sum(more_total_charges)
sum_of_bills

1795730.0640000193

## 3. Creating Tables

An array is useful for describing a single attribute of each element in a collection. For example, let's say our collection is all US States. Then an array could describe the land area of each state. 

Tables extend this idea by containing multiple arrays, each one describing a different attribute for every element of a collection. In this way, tables allow us to not only store data about many entities but to also contain several kinds of data about each entity.

For example, in the cell below we have two arrays. The first one, `NCpop`, was defined above in section 2.2 and contains the North Carolina population in each year. The second array, `years`, contains the years themselves. These elements are in order, so the year and the North Carolina population for that year have the same index in their corresponding arrays.

In [50]:
# Just run this cell

years = np.arange(1900, 2021+1)
print("Population column:", NCpop)
print("Years column:", years)

Population column: [ 1897000  1926000  1956000  1986000  2017000  2051000  2077000  2105000
  2142000  2174000  2221000  2276000  2313000  2362000  2421000  2473000
  2513000  2546000  2522000  2535000  2588000  2651000  2700000  2761000
  2830000  2895000  2959000  3027000  3082000  3133000  3167000  3184000
  3227000  3268000  3304000  3323000  3346000  3385000  3440000  3514000
  3574000  3589000  3569000  3654000  3560000  3533000  3706000  3769000
  3837000  3911000  4068000  4120000  4109000  4120000  4131000  4242000
  4309000  4368000  4376000  4458000  4573000  4663000  4707000  4742000
  4802000  4863000  4896000  4952000  5004000  5031000  5084411  5203531
  5301150  5389852  5470911  5547188  5607964  5685607  5759492  5823491
  5898980  5956653  6019101  6077056  6164006  6253954  6321578  6403700
  6480594  6565459  6656987  6748135  6831850  6947412  7060959  7185403
  7307658  7428672  7545828  7650789  8081614  8210122  8326201  8422501
  8553152  8705407  8917270  911

Suppose we want to answer this question:

> In which year did the North Carolina's population cross 5 million?

You could technically answer this question by importing and viewing the full NCpop table that includes the years, or just from staring at the arrays and counting the position where the population first crossed 5 million. We won't do that since we want how to learn to do these things when those methods may not be available. 

The expression below:

- creates an empty table using the expression `Table()`,
- adds two columns by calling `with_columns` with four arguments,
- assigns the result to the name `population`, and finally
- evaluates `population` so that we can see the table.

The strings `"Year"` and `"Population"` are column labels that we have chosen. The names `NCpop` and `years` were assigned above to two arrays of the **same length**. The function `with_columns` takes in alternating strings (to represent column labels) and arrays (representing the data in those columns). The strings and arrays are separated by commas.

In [51]:
population = Table().with_columns(
    "Population", NCpop,
    "Year", years
)
population

Population,Year
1897000,1900
1926000,1901
1956000,1902
1986000,1903
2017000,1904
2051000,1905
2077000,1906
2105000,1907
2142000,1908
2174000,1909


Now the data is combined into a single table! It's much easier to parse this data. If you need to know what the population was in 1959, for example, you can tell from a single glance.

## 4. More Table Operations!

Now that you've worked with arrays, let's add a few more methods to the list of table operations that you saw in Lab 2.

### `column`

`column` takes the column name of a table (in string format) as its argument and returns the values in that column as an **array**. 

In [53]:
# Returns an array of the amount of restaurant bills
more_restaurant_bills
# This can't read the column of a column
# THe variable "more_restaurant_bills" is already assigned the column "Bill"
# So, I'm getting rid of the call for .column("Bill")

array([16.87, 17.41, 10.18, ..., 16.09, 15.28, 29.72])

Note that both the array above and the table `more_restaurant_bills` only have data for one variable, but their structures are different. Many function that we will use expect the data to be in either an array format or a table format, but can not handle both.

In [54]:
# Returns a table of the amount of restaurant bills
more_restaurant_bills = Table.read_table("Datasets/more_restaurant_bills.csv")
more_restaurant_bills

Bill
16.87
17.41
10.18
9.84
18.44
8.99
39.72
3.55
4.65
5.61


### `take`
The table method `take` takes as its argument an array of numbers.  Each number should be the index of a row in the table.  It returns a **new table** with only those rows. 

You'll usually want to use `take` in conjunction with `np.arange` to take the first few rows of a table.

In [55]:
# Take first 100 amounts of restaurant bills
more_restaurant_bills.take(np.arange(0, 100, 1))

Bill
16.87
17.41
10.18
9.84
18.44
8.99
39.72
3.55
4.65
5.61


The next questions will give you practice with combining the operations you've learned in this lab and the previous one to answer questions about the `population` table. First, check out the `population` table from section 2.

In [56]:
# Run this cell to display the population table.
population

Population,Year
1897000,1900
1926000,1901
1956000,1902
1986000,1903
2017000,1904
2051000,1905
2077000,1906
2105000,1907
2142000,1908
2174000,1909


**Question 4.1.** Compute the year when the North Carolina population first went above 5 million. Assign the year to `year_NCpop_crossed_5_million`. Here are some more ways that you can use the [where table function](http://data8.org/datascience/predicates.html)

<!--
BEGIN QUESTION
name: q41
-->

In [60]:
year_NCpop_crossed_5_billion = population.where("Population", are.above_or_equal_to(5000000))
year_NCpop_crossed_5_billion
# THe first year is 1968 where the population is above 5 million

Population,Year
5004000,1968
5031000,1969
5084411,1970
5203531,1971
5301150,1972
5389852,1973
5470911,1974
5547188,1975
5607964,1976
5685607,1977


**Question 4.2.** Find the average yearly change in North Carolina's population for 1980's (years from 1980 and 1990 (inclusive)) and the average yearly change in North Carolina population for 2010's (years from 2010 and 2020 (inclusive)). You should round to the nearest whole number.

*Hint*: Think of the steps you need to do and try to put them in an order that makes sense.

*Hint*: In a previous assignment you used a function that calculated the differences between adjacent values in an array

<!--
BEGIN QUESTION
name: q42
-->

In [80]:
NC_pop_1980_to_1990 = np.arange(80,91, 1)
NC_pop_1980_to_1990_pop = []

NC_pop_2010_to_2020 = np.arange(110, 121, 1)
NC_pop_2010_to_2020_pop = []

for each in NC_pop_1980_to_1990: 
    NC_pop_1980_to_1990_pop.append(NCpop.item(each))

for each in NC_pop_2010_to_2020: 
    NC_pop_2010_to_2020_pop.append(NCpop.item(each))

In [81]:
print(NC_pop_1980_to_1990_pop)
print(NC_pop_2010_to_2020_pop)

[5898980, 5956653, 6019101, 6077056, 6164006, 6253954, 6321578, 6403700, 6480594, 6565459, 6656987]
[9574586, 9658913, 9751810, 9846717, 9937295, 10037218, 10161802, 10275758, 10391358, 10501384, 10457177]


In [85]:
import statistics as st
st.mean(NC_pop_1980_to_1990_pop)
st.mean(NC_pop_2010_to_2020_pop)

# This is the average population between those two years 
# This does not give you the average yearly change 

10054001.636363637

In [87]:
# Average yearly change: 
# Final value - start value / start value x100

NCpop1980 = NCpop.item(80)
NCpop1990 = NCpop.item(90)
NCpop2010 = NCpop.item(110)
NCpop2020 = NCpop.item(120)

between_1980_and_1990 = ((NCpop1990 - NCpop1980)/NCpop1980)*100
between_2010_and_2020 = ((NCpop2020-NCpop2010)/NCpop2010)*100
print("Average yearly change in North Carolina's population for the 1980's:", between_1980_and_1990)
print("Average yearly change in North Carolina's population for the 2010's", between_2010_and_2020)

Average yearly change in North Carolina's population for the 1980's: 12.849797761646927
Average yearly change in North Carolina's population for the 2010's 9.2180591411472


Congratulations, you're done with lab 3!