# Python Tutorial Part II: Data Types
## Summer 2022
## Wendy and Allison

In the other notebook, we had our first look at Python and Jupyter notebooks. So far, we've only used Python to manipulate numbers.  There's a lot more to life than numbers, so Python lets us represent many other types of data in programs.

In this tutorial, we'll learn how to:
* represent and manipulate another fundamental type of data: text.  A piece of text is called a *string* in Python.
* invoke *methods*.  A method is very similar to a function.  It just looks a little different because it's tied to a particular piece of data (like a piece of text or a number).
* work with datasets in Python -- *collections* of data, like the numbers 2 through 5 or the words "welcome", "to", and "boulder".

We principally use two kinds of collections:
  * **Arrays:** An array is a collection of many pieces of a single kind of data, kept in order.  An array is like a single column in an Excel spreadsheet.
  * **Tables:** A table is a collection of many pieces of different kinds of data.  It's like an entire Excel spreadsheet.  Each row of a table represents one entity, and each column contains a different kind of data about each entity.

This week is about arrays.

Feel free to add cells at any point to aid you in any calculations!

This notebook was modified from Berkeley's Data8 data science course.

First, execute the following cell to set up the lab.

In [1]:
import math
import numpy as np

# 1. Review: The building blocks of Python code
The two building blocks of Python code are *expressions* and *statements*.  An expression is a piece of code that

* is self-contained, meaning it would make sense to write it on a line by itself, and
* usually has a value.

Here are some expressions, with values `3`, `-1`, `1`, and `32`, respectively:

    3
    2 - 3
    abs(2 - 3)
    max(3, pow(2, abs(2 - 3) + pow(2, 2)))

All these expressions but the first are *compound expressions*, meaning that they are actually combinations of several smaller expressions.  `2 - 3` combines the expressions `2` and `3` by subtraction.  In that example, `2` and `3` are called *subexpressions* because they're expressions that are part of a larger expression.
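
To see how a compound expression is built from subexpressions, here is a rough sketch that evaluates the last expression above one piece at a time.  The intermediate names (`inner_difference`, `exponent`, and so on) are made up for illustration; Python does not create them when it evaluates the one-line version.

```python
# Evaluate max(3, pow(2, abs(2 - 3) + pow(2, 2))) one subexpression at a time.
inner_difference = 2 - 3              # -1
inner_abs = abs(inner_difference)     # 1
inner_power = pow(2, 2)               # 4
exponent = inner_abs + inner_power    # 5
outer_power = pow(2, exponent)        # 32
result = max(3, outer_power)          # 32

# The step-by-step result matches the one-line compound expression.
print(result == max(3, pow(2, abs(2 - 3) + pow(2, 2))))  # True
```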

A *statement* is a whole line of code.  Some statements are just expressions.  The expressions listed above are examples.

Other statements *make something happen* rather than *having a value*.  After they are run, something in the world has changed.  For example, an assignment statement assigns a value to a name.  Here are some assignment statements:
    
    height = 1.3
    the_number_five = abs(-5)
    absolute_height_difference = abs(height - 1.688)

A key idea in programming is that large, interesting things can be built by combining many simple, uninteresting things.  The key to understanding a complicated piece of code is breaking it down into its simple components.

For example, a lot is going on in the last statement above, but it's really just a combination of a few things.  This picture describes what's going on.

<img src="https://github.com/cu-applied-math/stem-camp-notebooks/blob/master/2021/PythonIntro/figs/statement.jpg?raw=1">

### QUESTION 1.1
In the next cell, assign the name `this_year` to the larger number of the following two numbers:

1. the absolute value of $2^{5}-2^{11}$, and 
2. $2\times(2^{4}\times 3\times 21+1)$.

Try to use just one statement (one line of code).

In [2]:
this_year = ...
this_year

Ellipsis

# 2. Text
Programming doesn't just concern numbers. Text is one of the most common types of values used in programs. 

A snippet of text is represented by a *string value* in Python. The word "*string*" is a computer science term for a sequence of characters. A string might contain a single character, a word, a sentence, or a whole book.

To distinguish text data from actual code, we demarcate strings by putting quotation marks around them. Single quotes (`'`) and double quotes (`"`) are both valid. The contents can be any sequence of characters, including numbers and symbols. 

We've seen strings before in `print` statements.  Below, two different strings are passed as arguments to the `print` function.

In [3]:
print("I <3", 'Data Science')

I <3 Data Science


`print` prints all of its arguments together, separated by single spaces.

Just like names can be given to numbers, names can be given to string values.  The names and strings aren't required to be similar. Any name can be given to any string.
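
As a small sketch of these ideas: either kind of quote produces the same string, one kind of quote can appear inside the other, and a name does not have to resemble the string it refers to.  The names below are hypothetical and chosen only for illustration.

```python
# Single and double quotes make the same string.
single = 'Data Science'
double = "Data Science"
print(single == double)   # True

# Wrapping a string in single quotes lets double quotes appear inside it.
quoted = 'She said "hello"'
print(quoted)             # She said "hello"

# A name can be anything; it need not match the text it names.
breakfast = "boulder"
print(breakfast)          # boulder
```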

### QUESTION 2.1
Yuri Gagarin was the first person to travel through outer space.  When he emerged from his capsule upon landing on Earth, he [reportedly](https://en.wikiquote.org/wiki/Yuri_Gagarin) had the following conversation with a woman and girl who saw the landing:

    The woman asked: "Can it be that you have come from outer space?"
    Gagarin replied: "As a matter of fact, I have!"

The cell below contains unfinished code.  Fill in the `...`s so that it prints out this conversation *exactly* as it appears above.

In [4]:
woman_asking = ...
woman_quote = '"Can it be that you have come from outer space?"'
gagarin_reply = 'Gagarin replied:'
gagarin_quote = ...

print(woman_asking, woman_quote)
print(gagarin_reply, gagarin_quote)

Ellipsis "Can it be that you have come from outer space?"
Gagarin replied: Ellipsis


## 2.1. String Methods

Strings can be transformed using *methods*, which are functions that involve an existing string and some other arguments. For example, the `replace` method replaces all instances of some part of a string with some replacement. A method is invoked on a string by placing a `.` after the string value, then the name of the method, and finally parentheses containing the arguments. 

    <string>.<method name>(<argument>, <argument>, ...)

Otherwise, a method is pretty similar to a function.
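
For comparison, here is a minimal sketch of a function call next to a method call on the same string.  The `count` method is not covered elsewhere in this tutorial; it is used here only because its result is easy to check by eye.

```python
word = 'Hello'

# Function call: the string goes inside the parentheses.
print(len(word))          # 5

# Method call: the string comes first, followed by a dot.
print(word.count('l'))    # 2
```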

Try to predict the output of these examples, then execute them.

In [5]:
# Replace one letter
'Hello'.replace('o', 'a')

'Hella'

In [6]:
# Replace a sequence of letters, which appears twice
'hitchhiker'.replace('hi', 'ma')

'matchmaker'

Once a name is bound to a string value, methods can be invoked on that name as well.  The method returns a new string and leaves the original string unchanged, so a new name is needed to capture the result.

In [7]:
sharp = 'edged'
hot = sharp.replace('ed', 'ma')
print('sharp =', sharp)
print('hot =', hot)

sharp = edged
hot = magma


Remember that you can call functions on the results of other functions.  For example,

    max(abs(-5), abs(3))

has value 5.  Similarly, you can invoke methods on the results of other method (or function) calls.

In [8]:
# Calling replace on the output of another call to
# replace
'train'.replace('t', 'ing').replace('in', 'de')

'degrade'

Here's a picture of how Python evaluates a "chained" method call like that:

<img src="https://github.com/cu-applied-math/stem-camp-notebooks/blob/master/2021/PythonIntro/figs/chaining_method_calls.jpg?raw=1"/>

### QUESTION 2.1.1
Assign strings to the names `you` and `this` so that the final expression evaluates to a 10-letter English word with three double letters in a row.

*Hint:* After you guess at some values for `you` and `this`, it's helpful to see the value of the variable `the`.  Try printing the value of `the` by adding a line like this:
    
    print(the)

*Hint 2:* Run the tests if you're stuck.  They'll often give you help.

*Hint 3:* "Beekeeper"

In [9]:
# TO DO 2.1.1
you = ...
this = ...
a = 'beeper'
the = a.replace('p', you) 
print(the)
the.replace('bee', this)

TypeError: replace() argument 2 must be str, not ellipsis

Other string methods do not take any arguments at all, because the original string is all that's needed to compute the result. In this case, parentheses are still needed, but there's nothing in between the parentheses. Here are some methods that work that way:

|Method name|Value|
|-|-|
|`lower`|a lowercased version of the string|
|`upper`|an uppercased version of the string|
|`capitalize`|a version with the first letter capitalized|
|`title`|a version with the first letter of every word capitalized|


In [10]:
'bouLDeR SoLAR ReU'.title()

'Boulder Solar Reu'

In [11]:
'bouLDeR SoLAR ReU'.upper()

'BOULDER SOLAR REU'
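
The other two methods in the table behave similarly.  A quick sketch, reusing the same string under the made-up name `program`:

```python
program = 'bouLDeR SoLAR ReU'

print(program.lower())        # boulder solar reu
# capitalize uppercases the first letter and lowercases the rest.
print(program.capitalize())   # Boulder solar reu
```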

Methods can also take arguments that aren't strings.  For example, strings have a method called `zfill` that "pads" them with the character `0` so that they reach a certain length.  This can be useful for displaying numbers in a uniform format:

In [12]:
print("5.12".zfill(6))
print("10.50".zfill(6))

005.12
010.50


All these string methods are useful, but most programmers don't memorize their names or how to use them.  In the "real world," people usually just search the internet for *documentation* and examples.  ([Stack Overflow](http://stackoverflow.com) also has a huge database of answered questions.)

You can refer back to this tutorial for the methods we mention.
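
Besides searching online, one option is Python's built-in `help` function, which prints the documentation for whatever you pass to it.  This is just one possible way to look things up, shown here for `zfill`:

```python
# Print the built-in documentation for the string method zfill.
help(str.zfill)
```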

## 2.2. Converting to and from Strings

Strings and numbers are different *types* of values, even when a string contains the digits of a number. For example, evaluating the following cell causes an error because an integer cannot be added to a string.

In [13]:
8 + "8"

TypeError: unsupported operand type(s) for +: 'int' and 'str'

However, there are built-in functions to convert numbers to strings and strings to numbers. 

    int:   Converts a string of digits to an integer ("int") value
    float: Converts a string of digits, perhaps with a decimal point, to a decimal ("float") value
    str:   Converts any value to a string

Try to predict what the following cell will evaluate to, then evaluate it.

In [14]:
8 + int("8")

16
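
Here is a short sketch of the other two conversion functions.  Note one thing this tutorial does not otherwise cover: using `+` on two strings joins them end to end instead of adding numbers.

```python
# float converts text with a decimal point into a number.
decimal_sum = float("3.5") + 1
print(decimal_sum)       # 4.5

# str converts a number into text, so + joins the two strings.
joined = str(8) + "8"
print(joined)            # 88

# int converts a string of digits, so ordinary arithmetic works.
doubled = int("42") * 2
print(doubled)           # 84
```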

Suppose you're writing a program that looks for dates in a text, and you want your program to find the amount of time that elapsed between two years it has identified.  It doesn't make sense to subtract two texts, but you can first convert the text containing the years into numbers.

### QUESTION 2.2.1
Finish the code below to compute the number of years that elapsed between `one_year` and `another_year`.  Don't just write the numbers `1618` and `1648` (or `30`); use a conversion function to turn the given text data into numbers.

In [15]:
# Some text data:
one_year = "1618"
another_year = "1648"

# Complete the next line.  Note that we can't just write:
#   another_year - one_year
# If you don't see why, try seeing what happens when you
# write that here.
difference = ...
difference

Ellipsis

We can use the `type` function to check the type of a value or variable.

In [16]:
type(one_year)

str
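
A few more calls to `type`, sketched with made-up values, show how the same digits can have different types depending on how they are written or converted:

```python
print(type(1618))          # <class 'int'>
print(type(16.18))         # <class 'float'>
print(type("1618"))        # <class 'str'>

# Converting the string changes the type of the result.
converted = int("1618")
print(type(converted))     # <class 'int'>
```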

## 2.3. Strings as function arguments
String values, like numbers, can be arguments to functions and can be returned by functions.  The function `len` takes a single string as its argument and returns the number of characters in the string: its **len**gth.  Note that it doesn't count *words*.  `len("one small step for man")` is 22, not 5.
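
To check the claim above, here is a minimal sketch.  The `split` method, which breaks a string into a list of words, is not covered in this tutorial; it appears only to contrast counting characters with counting words.

```python
phrase = "one small step for man"

# len counts every character, including the four spaces.
print(len(phrase))            # 22

# Splitting on spaces first gives the number of words instead.
print(len(phrase.split()))    # 5
```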

### QUESTION 2.3.1
Use `len` to find out the number of characters in the very long string in the next cell.  (It's the first sentence of the English translation of the French [Declaration of the Rights of Man](http://avalon.law.yale.edu/18th_century/rightsof.asp).)  The length of a string is the total number of characters in it, including things like spaces and punctuation.  Assign `sentence_length` to that number.

In [17]:
a_very_long_sentence = "The representatives of the French people, organized as a National Assembly, believing that the ignorance, neglect, or contempt of the rights of man are the sole cause of public calamities and of the corruption of governments, have determined to set forth in a solemn declaration the natural, unalienable, and sacred rights of man, in order that this declaration, being constantly before all the members of the Social body, shall remind them continually of their rights and duties; in order that the acts of the legislative power, as well as those of the executive power, may be compared at any moment with the objects and purposes of all political institutions and may thus be more respected, and, lastly, in order that the grievances of the citizens, based hereafter upon simple and incontestable principles, shall tend to the maintenance of the constitution and redound to the happiness of all."
sentence_length = ...
sentence_length

Ellipsis

# 3. Importing packages

> What has been will be again,  
> what has been done will be done again;  
> there is nothing new under the sun.

Most programming involves work that is very similar to work that has been done before.  Since writing code is time-consuming, it's good to rely on others' published code when you can.  Rather than making us copy and paste, Python lets us *import* other code, creating a *module* that contains all of the names created by that code.

Python includes many useful modules that are just an `import` away.  We'll look at the `math` module as a first example.

Suppose we want to very accurately compute the area of a circle with radius 5 meters.  For that, we need the constant $\pi$, which is roughly 3.14.  Conveniently, the `math` module defines `pi` for us:

In [18]:
import math
radius = 5
area_of_circle = radius**2 * math.pi
area_of_circle

78.53981633974483

`pi` is defined inside `math`, and the way that we access names that are inside modules is by writing the module's name, then a dot, then the name of the thing we want:

    <module name>.<name>
    
In order to use a module at all, we must first write the statement `import <module name>`.  That statement creates a module object with things like `pi` in it and then assigns the name `math` to that module.  Above we have done that for `math`.
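
As another sketch of the `<module name>.<name>` pattern, we could compute the circumference of the same circle.  The name `circumference` is made up for this example.

```python
import math

radius = 5
# math.pi is looked up inside the math module each time we write it.
circumference = 2 * math.pi * radius
print(circumference)   # roughly 31.42
```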

### QUESTION 3.1 
`math` also provides the name `e` for the base of the natural logarithm, which is roughly 2.72.  Compute $e^{\pi}-\pi$, giving it the name `near_twenty`.

In [19]:
near_twenty = ...
near_twenty

Ellipsis

![XKCD](http://imgs.xkcd.com/comics/e_to_the_pi_minus_pi.png)

## 3.1. Importing functions
Modules can provide other named things, including functions.  For example, `math` provides the name `sin` for the sine function.  Having imported `math` already, we can write `math.sin(3)` to compute the sine of 3.  (Note that this sine function considers its argument to be in [radians](https://en.wikipedia.org/wiki/Radian), not degrees.  180 degrees are equivalent to $\pi$ radians.)
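
Because `math.sin` expects radians, an angle measured in degrees has to be converted first.  The sketch below uses `math.radians`, a function from the same module that is not mentioned elsewhere in this tutorial.

```python
import math

# A quarter turn is pi/2 radians, and its sine is 1.
quarter_turn = math.sin(math.pi / 2)
print(quarter_turn)                 # 1.0

# Convert 30 degrees to radians before taking the sine.
thirty_degrees = math.radians(30)
print(math.sin(thirty_degrees))     # approximately 0.5
```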

### QUESTION 3.1.1
A $\frac{\pi}{4}$-radian (45-degree) angle forms a right triangle with equal base and height, pictured below.  If the hypotenuse (the radius of the circle in the picture) is 1, then the height is $\sin(\frac{\pi}{4})$.  Compute that using `sin` and `pi` from the `math` module.  Give the result the name `sine_of_pi_over_four`.

<img src="http://mathworld.wolfram.com/images/eps-gif/TrigonometryAnglesPi4_1000.gif">
(Source: [Wolfram MathWorld](http://mathworld.wolfram.com/images/eps-gif/TrigonometryAnglesPi4_1000.gif))

In [20]:
sine_of_pi_over_four = ...
sine_of_pi_over_four

Ellipsis

For your reference, here are some more examples of functions from the `math` module:

In [21]:
# Calculating factorials.
math.factorial(5)

120

In [22]:
# Calculating logarithms (the logarithm of 8 in base 2).
# The result is 3 because 2 to the power of 3 is 8.
math.log(8, 2)

3.0

In [23]:
# Calculating square roots.
math.sqrt(5)

2.23606797749979

In [24]:
# Calculating cosines.
math.cos(math.pi)

-1.0

##### A function that displays a picture
People have written Python functions that do very cool and complicated things, like crawling web pages for data, transforming videos, or doing machine learning with lots of data.  Now that you can import things, when you want to do something with code, first check to see if someone else has done it for you.

Let's see an example of a function that's used for downloading and displaying pictures.

The module `IPython.display` provides a function called `Image`.  `Image` takes a single argument, a string that is the URL of the image on the web. It returns an *image* value that this Jupyter notebook understands how to display.  To display an image, make it the value of the last expression in a cell, just like you'd display a number or a string.

We can import a single function using the syntax `from _____ import ______`.
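
For instance, here is a minimal sketch of that syntax using the `math` module.  After this kind of import, the function is used by its own name, without the `math.` prefix.

```python
from math import sqrt

# sqrt is now available directly, with no module name in front.
root = sqrt(16)
print(root)   # 4.0
```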

### QUESTION 3.1.2
In the next cell, import the module `IPython.display` and use its `Image` function to display the image at this URL:

    https://www.nasa.gov/sites/default/files/m7.3-flare-zoom_0.jpg

Give the name `space_weather` to the output of the call to `Image`, then make the last line of the cell `space_weather` to see the image.  (It might take a few seconds to load the image.)

In [25]:
# Import the module IPython.display.
from IPython.display import Image
# Replace the ... with a call to the Image function
# in the IPython.display module, which should produce
# a picture.
space_weather = ...
space_weather

Ellipsis

# 4. Arrays
Up to now, we haven't done much that you couldn't do yourself by hand, without going through the trouble of learning Python.  Computers are most useful when you can use a small amount of code to *do the same action* to *many different things*.

For example, in the time it takes you to calculate the 18% tip on a restaurant bill, a laptop can calculate 18% tips for every restaurant bill paid by every human on Earth that day.  (That's if you're pretty fast at doing arithmetic in your head!)

Arrays are how we put many values in one place so that we can operate on them as a group.  For example, if `billions_of_numbers` is an array of numbers, the expression

    .18 * billions_of_numbers

gives a new array of numbers that's the result of multiplying each number in `billions_of_numbers` by .18 (18%).  Arrays are not limited to numbers; we can also put all the words in a book into an array of strings.
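
Here is a tiny version of that idea, with three made-up restaurant bills instead of billions.  It uses `np.array`, which is introduced properly in the next section.

```python
import numpy as np

# Three hypothetical bills, in dollars.
bills = np.array([20.0, 45.5, 100.0])

# One multiplication computes the 18% tip for every bill at once.
tips = .18 * bills
print(tips)
```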

Concretely, an array is like a column in an Excel spreadsheet.

<img src="https://github.com/cu-applied-math/stem-camp-notebooks/blob/master/2021/PythonIntro/figs/excel_array.jpg?raw=1">

## 4.1. Making arrays
You can type in the data that goes in an array yourself, but that's not typically how programs work. Normally, we create arrays by loading them from an external source, like a data file.

First, though, let's learn how to do it manually.

Arrays are provided by a package called [NumPy](http://www.numpy.org/) (pronounced "NUM-pie" or, if you prefer to pronounce things incorrectly, "NUM-pee").  The package is called `numpy`, but it's standard to rename it `np` for brevity.  You can do that with:

    import numpy as np

To create an array, call the function `np.array` and pass it a list of values written inside square brackets.  Each item in that list will be in the array it returns.  Run this cell to see an example:

In [26]:
import numpy as np
np.array([0.125, 4.75, -1.3])

array([ 0.125,  4.75 , -1.3  ])

Each item in an array (in the above case, the numbers 0.125, 4.75, and -1.3) is called an *element* of that array.

Arrays are values, just like numbers and strings.  That means you can assign them names or use them as arguments to functions.  Arrays also have a *shape*: the number of elements along each dimension.  We can use `np.shape` to see the shape of an array.
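
As a sketch of what a shape looks like, the example below checks a one-dimensional array and a small two-dimensional one.  The names `readings` and `grid` are made up, and two-dimensional arrays are not otherwise used in this tutorial.

```python
import numpy as np

readings = np.array([0.125, 4.75, -1.3])
print(np.shape(readings))   # (3,)
print(readings.shape)       # (3,)  the same information, as an attribute

# A list of two lists makes an array with 2 rows and 3 columns.
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(np.shape(grid))       # (2, 3)
```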

### QUESTION 4.1.1
Make an array containing the numbers 1, 2, and 3, in that order. Name it `small_numbers`. Then print the shape of the array.

In [27]:
small_numbers = ...
# print the array
# print the shape of the array

See if you can add the numbers 4, 5, and 6 to the `small_numbers` array. Call it `more_numbers`. Show the values in the array and print the new dimensions of the array.

*Hint:* See the [documentation](https://numpy.org/doc/stable/reference/generated/numpy.append.html#numpy.append) for `np.append`

In [28]:
more_numbers = ...
# print the shape of the array

# print the array
more_numbers

Ellipsis

Note: Python also has a built-in structure called a `list` that can be used as an alternative to an array in SOME cases. 
In short, arrays store data more efficiently, and we can perform arithmetic operations such as "divide all elements in the array by 3" directly on arrays, but not on lists.  Additionally, it is generally considered bad practice for arrays to change size because of the way they are stored in memory, so we should know the size of an array before we initialize it.  For example, we can initialize arrays using functions like `np.zeros()` or `np.ones()`.  We will explore this more later.

[Here](https://www.pythoncentral.io/the-difference-between-a-list-and-an-array/) is a brief article describing the differences between arrays and lists.
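
A rough sketch of the points in the note above: dividing an array by 3 works element by element, the same operation on a list raises an error, and `np.zeros` creates an array of a known size up front.  The `try`/`except` statement, which catches the error so the cell keeps running, is not covered in this tutorial.

```python
import numpy as np

values = np.array([3, 6, 9])
print(values / 3)          # [1. 2. 3.]

# The same arithmetic on a plain list is not supported.
try:
    [3, 6, 9] / 3
    list_division_failed = False
except TypeError as error:
    list_division_failed = True
    print("Lists do not support this:", error)

# Create an array of four zeros, ready to be filled in later.
placeholders = np.zeros(4)
print(placeholders)        # [0. 0. 0. 0.]
```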

In many cases, we can accomplish similar tasks using lists (with slightly different methods), as demonstrated below.

In [29]:
small_numbers_list = [1, 2, 3]
for i in range(4, 7):
    small_numbers_list.append(i)
small_numbers_list

[1, 2, 3, 4, 5, 6]

Again, we can always check the difference in variable type (i.e. array vs. list) using `type`.

In [30]:
print(type(small_numbers))
print(type(small_numbers_list))

<class 'ellipsis'>
<class 'list'>


### QUESTION 4.1.2
Make an array containing the numbers 0, 1, -1, $\pi$, and $e$, in that order.  Name it `interesting_numbers`.  *Hint:* How did you get the values $\pi$ and $e$ earlier?  You can refer to them in exactly the same way here.

In [31]:
interesting_numbers = ...
interesting_numbers

Ellipsis

### QUESTION 4.1.3
Make an array containing the five strings `"Hello"`, `","`, `" "`, `"world"`, and `"!"`.  (The third one is a single space inside quotes.)  Name it `hello_world_components`.

*Note:* If you print `hello_world_components`, you'll notice some extra information in addition to its contents: `dtype='<U5'`.  That's just NumPy's extremely cryptic way of saying that the things in the array are strings.

In [32]:
hello_world_components = ...
hello_world_components

Ellipsis

### 4.1.1. `np.arange`

Very often in data science, we want to work with many numbers that are evenly spaced within some range.  NumPy provides a special function for this called `arange`.  `np.arange(start, stop, space)` produces an array with all the numbers starting at `start` and counting up by `space`, stopping before `stop` is reached.

For example, the value of `np.arange(1, 6, 2)` is an array with elements 1, 3, and 5 -- it starts at 1 and counts up by 2, then stops before 6.  In other words, it's equivalent to `np.array([1, 3, 5])`.

`np.arange(4, 9, 1)` is an array with elements 4, 5, 6, 7, and 8.  (It doesn't contain 9 because `arange` stops *before* the stop value is reached.)
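
The two examples above can be checked directly.  (In NumPy's own documentation the third argument is called `step`; this tutorial calls it `space`.)

```python
import numpy as np

odds = np.arange(1, 6, 2)
print(odds)                # [1 3 5]

four_to_eight = np.arange(4, 9, 1)
print(four_to_eight)       # [4 5 6 7 8]

# The stop value itself is never included.
print(9 in four_to_eight)  # False
```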

### QUESTION 4.1.1.1
Use `np.arange` to create an array with the multiples of 99 from 0 up to (**and including**) 9999.  (So its elements are 0, 99, 198, 297, etc.)

In [33]:
multiples_of_99 = ...
multiples_of_99

Ellipsis

##### Temperature readings
NOAA (the US National Oceanic and Atmospheric Administration) operates weather stations that measure surface temperatures at different sites around the United States.  The hourly readings are [publicly available](http://www.ncdc.noaa.gov/qclcd/QCLCD?prior=N).

Suppose we download all the hourly data from the Oakland, California site for the month of December 2015.  To analyze the data, we want to know when each reading was taken, but we find that the data don't include the timestamps of the readings (the time at which each one was taken).

However, we know the first reading was taken at the first instant of December 2015 (midnight on December 1st) and each subsequent reading was taken exactly 1 hour after the last.

### QUESTION 4.1.1.2
Create an array of the *time, in seconds, since the start of the month* at which each hourly reading was taken.  Name it `collection_times`.

*Hint:* There were 31 days in December, which is equivalent to $31 \times 24$ hours or $31 \times 24 \times 60 \times 60$ seconds.  So your array should have $31 \times 24$ elements in it.

*Hint 2:* The `len` function works on arrays, too.  If your `collection_times` isn't passing the tests, check its length and make sure it has $31 \times 24$ elements.

In [34]:
collection_times = ...
collection_times

Ellipsis

In [35]:
# print length of the array

## 4.2 More Functions

The function `sum` takes a single array of numbers as its argument.  It returns the sum of all the numbers in that array (so it returns a single number, not an array).
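
A small sketch with a made-up array.  NumPy also provides its own `np.sum`, which gives the same answer here.

```python
import numpy as np

small = np.array([1, 2, 3, 4])

# Python's built-in sum adds up the elements one by one.
print(sum(small))       # 10

# NumPy's version gives the same total.
print(np.sum(small))    # 10
```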

### QUESTION 4.2.1
What is the sum of all the multiples of 99 from 0 up to and including 9999?  In other words, what is the sum of all the elements in `multiples_of_99`?

In [36]:
# TO DO 4.2.1

### QUESTION 4.2.2
The powers of 2 ($2^0 = 1$, $2^1 = 2$, $2^2 = 4$, etc.) arise frequently in computer science.  (For example, you may have noticed that storage on cell phones comes in powers of 2, like 16 GB, 32 GB, or 64 GB.)  Use `np.arange` and the exponentiation operator `**` to compute the first 30 powers of 2, starting from $2^0$.

In [37]:
powers_of_2 = ...
powers_of_2

Ellipsis

## 5.1 Writing Functions
A function is a block of reusable code, made to perform a specific task. We can define our own functions to build our programs in a modular way. Using functions also makes your code much easier for other people to read and understand.

In Python, we can define a function using `def`.

The function below greets the user!

In [38]:
def greet(name):
    print(f'Hello {name}!') 

*Note:* The `f` character just before the opening quote makes this an *f-string*.  Inside an f-string, writing a variable's name in curly braces inserts the *value* of that variable into the string.
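
To see what the `f` prefix changes, here is a short sketch with two made-up variables.  Without the `f`, the curly braces are treated as ordinary characters.

```python
planet = "Mars"
moons = 2

with_f = f'{planet} has {moons} moons.'
print(with_f)       # Mars has 2 moons.

without_f = '{planet} has {moons} moons.'
print(without_f)    # {planet} has {moons} moons.
```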

We can use the `greet` function with any name as an input like so:

In [39]:
my_name = "Boulder Solar"
greet(my_name)

Hello Boulder Solar!


Functions have *inputs* and *outputs*.  In the following function, which returns different values depending on whether a number is even or odd, the inputs are `n` and `angle`, and the outputs are the values that are *returned*.

To someone in your group, see if you can describe in words what the following function does.

In [40]:
def even_odd(n, angle):
    if n % 2 == 0: # if n is even, % is the modulo operator in python (if you don't know what it does, google it!)
        return n/2, n+1
    else: # if n is odd
        return np.sin(n * angle), n-1
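
The `%` operator used in `even_odd` gives the remainder after division.  A few quick examples, with made-up names for the results:

```python
even_remainder = 10 % 2   # 10 divides evenly by 2
odd_remainder = 7 % 2     # 7 = 3 * 2 + 1
other_remainder = 17 % 5  # 17 = 3 * 5 + 2

print(even_remainder)     # 0
print(odd_remainder)      # 1
print(other_remainder)    # 2
```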

In [41]:
# Use the function
even_odd(1, np.pi)

(1.2246467991473532e-16, 0)

In [42]:
# We can assign the output of a function to a variable
n_1, n_2 = even_odd(10, 5)
print(n_1)
print(n_2)

5.0
11
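
When a function returns two values separated by a comma, they can be unpacked into two names (as above) or kept together under one name.  The function `rectangle_stats` below is a hypothetical example written only to illustrate this.

```python
def rectangle_stats(width, height):
    # Return two outputs: the area and the perimeter.
    area = width * height
    perimeter = 2 * (width + height)
    return area, perimeter

# Unpack the two outputs into two names.
area, perimeter = rectangle_stats(3, 4)
print(area)        # 12
print(perimeter)   # 14

# Or keep both outputs together under a single name.
both = rectangle_stats(3, 4)
print(both)        # (12, 14)
```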


Here's another function. What does it do?

*Hint:* Call the function and uncomment the `print` statement if you need some help.

In [43]:
def example_function(n):
    total = 0
    for i in range(n):
        # print(i)
        total = i + total
    return total

### QUESTION 5.1

Define a function that takes an integer and a count `k` as inputs and returns an array of the first `k` powers of that integer.

Hint: There are many ways to do this, but if you are stuck, try using a for loop!

In [44]:
# TO DO 5.1