## Tips and tricks for using Jupyter Notebook


1. To run a cell, press `<Shift>+<Return>`. This will save you a lot of time, since it runs the cell and takes you directly to the next one. To run the next cell, just press `<Shift>+<Return>` again. To edit a cell, select it and press `<Enter>`.
2. Whenever you start a new session, don't forget to run all the cells above the cell you want to edit. If you don't, you will get a lot of error messages saying that some variable you are trying to use has not been defined. To run all the cells above the current one, go to Cell>>Run All Above.
3. When you run a cell, keep an eye on the kernel:
    - While the cell runs, your kernel is busy (the circle in the top right corner turns black). As long as the circle is black, there is no reason to rerun the cell or run other cells, so just be patient.
    - If your kernel keeps dying, that probably means either your internet connection is failing or you are asking Python to perform an unreasonably large amount of computation. In this case, review your work carefully.
4. If your kernel dies, go to Kernel>>Restart and then run all the cells above the one you are working on.
5. It is good practice to save your work regularly as you go through an assignment. Press `<Command>+S` (`<Ctrl>+S` on Windows and Linux) or go to File>>Save and Checkpoint.
6. To comment out one or more lines of code, select the lines and press `<Command>+/` (`<Ctrl>+/` on Windows and Linux).

For more common mistakes and gotchas, take a look at Niraj's <a href="https://drive.google.com/a/berkeley.edu/file/d/0B7D7GRdrAwBDQ2J3cGFZYzdjTTQ/view?usp=sharing">guide</a>.

Programming in Python
=========

**Note:** This chapter is very important, especially if you have no previous coding experience. I suggest familiarizing yourself with writing code as soon as possible, since this will make your life a lot easier when it comes to homework and labs (and even exams). I have included practice exercises at the end of each section for you to work on in your own time.

**Note 2:** I will try to take a different approach to Python than the textbook, since simply repeating the textbook material would not be very helpful.

<h2> Table of Contents </h2>

* **<a href="#section">Python Basics</a>**
    * <a href="#calc">Simple Calculator</a>
    * <a href="#bnum">Beyond numbers</a>
    * <a href="#data">Basic Data types</a>
    * <a href="#basic_exe">Python Basics Exercises</a>
* **<a href="#advanced">Advanced Programming</a>** 
    * <a href="#pacs">Packages and built-in functions</a>
    * <a href="#arrays">Arrays</a>
    * <a href="#ranges">Ranges</a>
    * <a href="#advanced_exe">Advanced Programming Exercises</a>
* **<a href="#table">Tables</a>**
    * <a href="#table_manip">Table Manipulation</a>

<a id="section"></a>
<h2> Python Basics </h2>

**Python comments**

Comments are parts of your code that Python ignores. You can add them to explain what you are doing; a comment starts with `#` and runs to the end of the line.

For example:

In [None]:
# This is a one line comment

<a id="calc"></a>
**Simple calculator**

In [None]:
#In its simplest form, you can think of Python as a fancy calculator
#Much like any calculator, you can type in an expression in a cell and press shift+Enter to run the cell
#e.g.
1+2

In [None]:
#Note: the notebook displays the value of the last expression in the cell
1+2 #You can't see me
2+2 #I am the last expression so you can see me

In [None]:
# Operations in Python
# Addition
1+2 

In [None]:
# Subtraction
2-1

In [None]:
# Multiplication
2*3

In [None]:
# Division
1/2

In [None]:
# Exponentiation
2**3 # base**exponent, 2 raised to the 3rd power

In [None]:
# Remainder
5%2 # Use the % sign to express the remainder of the Euclidean division between 5 and 2

In [None]:
# Order of operations: these rules dictate which operation happens first
# This is no different from what you learned in high school algebra
1 + 2**2 / 2 * 5 # First exponentiation, then division and multiplication (left to right), and finally addition and subtraction

In [None]:
# If we want to give priority to some subexpressions, we use ()
(1+2**2)/(2*5) #Everything inside parentheses will be evaluated first and then we follow the rules of the order of operations

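To see the precedence rules at work, here is a small sketch that evaluates the expression from the cell above one operation at a time. The intermediate names `step1` to `step3` and `result` are illustrative only. Because division and multiplication share the same precedence, Python applies them from left to right, which the last two lines show.

```python
# Evaluate 1 + 2**2 / 2 * 5 one operation at a time (illustrative names)
step1 = 2**2        # exponentiation first: 4
step2 = step1 / 2   # division and multiplication next, left to right: 2.0
step3 = step2 * 5   # 10.0
result = 1 + step3  # addition last: 11.0
print(result)                      # 11.0
print(result == 1 + 2**2 / 2 * 5)  # True

# Left to right matters: these two expressions differ
print(8 / 2 * 4)    # (8 / 2) * 4 = 16.0
print(8 / (2 * 4))  # 1.0
```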
<a id="bnum"></a>
**Beyond numbers**

* **_Names and Assignment Statements_**

In Python, we use assignment statements to bind a value to a name: this "saves" or "stores" the value in the name, and we can then use the name as a variable. Each name can be bound to only one value at a time!
Try to keep your names simple and as indicative of the variable content as possible.

In [None]:
number1 = 1 #assign the value 1 to the name number1
number2 = 2
result = number1 + number2
result

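Since a name holds only one value at a time, assigning to it again replaces the old value. A minimal sketch (the name `total` is just for illustration): Python evaluates the right-hand side first, using the current value, and only then rebinds the name.

```python
total = 10
total = total + 5  # right side uses the current value (10 + 5), then total is rebound to 15
print(total)       # 15
```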
* **_Call Expressions_**

We 'call' functions on a value or values to perform an action on the value(s), which are called argument(s). Python contains several built-in functions, such as abs (absolute value function) and max (finds the maximum value of the arguments). You can find a list of built-in functions here: https://docs.python.org/3/library/functions.html

In [None]:
#max function
max(1,2,3) #passes the arguments 1, 2, and 3 into the function max and calls it

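Call expressions can also be nested: the inner calls are evaluated first, and their results become the arguments of the outer call. A short sketch with the built-ins `abs`, `max` and `min` (the names `inner` and `outer` are illustrative):

```python
inner = abs(-7)               # 7
outer = max(inner, 3, 5)      # max(7, 3, 5) = 7
print(outer)                  # 7
print(max(abs(-7), 3, 5))     # the same call written in one line: 7
print(min(abs(-2), abs(-9)))  # min(2, 9) = 2
```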
<a id="data"></a>
* **_Basic Data Types_**

|       Type       |                     Example                    | Description                                                                                                                                                                                                                                                                                                               |
|:----------------:|:----------------------------------------------:|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Integer (`int`)  |                   `age = 19`                   | Integers are whole numbers, i.e. numbers without a decimal point. Note that 5 is an integer but 5.0 is NOT (it is a float). |
| Float (`float`)  | `macs_per_student = 1.0` OR `one_half = 0.5`   | Float numbers are numbers with decimals. You can use them in operations just like integers. As we saw above, the result of an operation between integers can give you a float number. The division operator always returns a float.                                                                                                                                      |
|  String (`str`)  | `student_name = 'Tom'` OR `student_name="Tom"` | Strings are a data type meant to represent regular text. You can write a string with either single quotes (') or double quotes ("), as in the example. Beware that, in general, you cannot use the operations we used on integers on strings (although there are some exceptions we will discuss later). |
| Boolean (`bool`) | `1 > 2` OR `True`                              | Booleans are a special data type that results from a logical operation. You can think of them as the answer to a true-or-false question. A boolean value is either `True` or `False`, and a logical expression such as `1 > 2` evaluates to one of the two. |

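You can check which of these types a value has with the built-in `type` function (used again further down). This sketch reuses the example values from the table; the name `comparison` is illustrative.

```python
age = 19
one_half = 0.5
student_name = 'Tom'
comparison = 1 > 2

print(type(age))           # <class 'int'>
print(type(5.0))           # <class 'float'> (5.0 is not an integer)
print(type(one_half))      # <class 'float'>
print(type(student_name))  # <class 'str'>
print(type(comparison))    # <class 'bool'>
print(comparison)          # False
```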
**Integers and Floating Point Numbers:**

In [None]:
int1 = 10
int2 = 2
a_float = int1/int2 #Division always returns a float, even if the dividend and divisor are both integers
a_float

In [None]:
#Floats will automatically switch to scientific notation when appropriate, but integers won't.
#Also, floats only keep about 16 significant digits: note that an_int has more digits than sci_notation
sci_notation = 2.0**100
an_int = 2**100
print("Float: ", sci_notation)
print("Int: ", an_int)

In [None]:
#Watch out for roundoff error with floats
not_two = (2**0.5)*(2**0.5) #Normally, we expect 2, but...
not_two

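Because of roundoff, testing floats with `==` can give surprising answers. A minimal sketch of two common workarounds: rounding before comparing, or the standard library's `math.isclose`, which treats two numbers as equal when they agree within a small relative tolerance (1e-09 by default).

```python
import math

not_two = (2**0.5) * (2**0.5)
print(not_two)                   # 2.0000000000000004
print(not_two == 2)              # False: the tiny roundoff error breaks exact equality
print(round(not_two, 10) == 2)   # True: rounding discards the error
print(math.isclose(not_two, 2))  # True: equal within a small relative tolerance
```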
**Strings**

In [None]:
#An example of a string
name = "Vasilis" # same as name ='Vasilis'
name

In [None]:
#Concatenating two strings into one using the + operator
greeting = "Hello " + name + "! It is nice to meet you."
greeting

In [None]:
#Make sure to add spaces when you need them
stuck = 'stuck'
together = 'together'
stuck + together

In [None]:
#Concatenation only works on strings! 
#The following print statement causes an error:
age = 19
print("I am " + age + " years old.") #Age is an integer, not a string

In [None]:
#However, you can convert between types! 

#The int and float functions convert numbers to integers and floating point numbers, respectively
twelve = int(12.9) #int drops the decimal part of a float (it truncates toward zero)
twelve_point_0 = float(12)
twelve, twelve_point_0

In [None]:
#The str function converts values to strings
str_3 = str(3)
str3_type = type(str_3) #the type function returns the data type of the value
str3_type

In [None]:
#Now we can concatenate what we wanted earlier
age = 19
print("I am " + str(age) + " years old.")

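A few more conversions, as a short sketch with illustrative names. `int` truncates toward zero, so for negative floats it moves up rather than down; `int` and `float` can also parse strings that look like numbers (although `int("12.9")` raises a `ValueError` because of the decimal point); and `str` goes the other way.

```python
positive = int(12.9)    # 12
negative = int(-12.9)   # -12: truncated toward zero, not rounded down to -13
parsed = float("3.5")   # 3.5: float can parse a string that looks like a number
whole = int("42") + 1   # 43
label = str(3.5) + "!"  # "3.5!"
print(positive, negative, parsed, whole, label)
```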
**Booleans**

In [None]:
# Booleans usually arise when we use what we call logical operators
# The most important ones are ==, !=, >, <, >= and <=
test1 = (1 >= 1) #greater than or equal
print(test1)
test2 = (2 == 2) #equal
print(test2)
test3 = (3 < 2) # less than
print(test3)

In [None]:
#You can also assign a boolean to be True or False
true = True
false = False
true==false

In [None]:
# Bonus Booleans
#You can also compare strings using operators like ==, < or > just as with numbers (character by character, roughly in alphabetical order)
print("cat" < "dog") # True
print("Cat"<"at") # True: uppercase letters come before lowercase ones
print("kitty"<"kite") # False

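String comparisons go character by character, using each character's numeric code, which you can inspect with the built-in `ord`. This sketch (the names `cat_vs_at` and `kitty_vs_kite` are illustrative) shows why `"Cat" < "at"` is True and `"kitty" < "kite"` is False.

```python
print(ord("C"), ord("a"))  # 67 97: the uppercase letters A-Z all come before the lowercase a-z
cat_vs_at = "Cat" < "at"   # decided by the first characters, 'C' vs 'a'
print(cat_vs_at)           # True

# "kitty" vs "kite": 'k', 'i', 't' match, then 't' is compared with 'e'
print(ord("t"), ord("e"))  # 116 101
kitty_vs_kite = "kitty" < "kite"
print(kitty_vs_kite)       # False, because 't' comes after 'e'
```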
<a id="basic_exe"></a>
**Python Basics Exercises**

1) What is the output when you run the following lines of code? What is assigned to the name 'lilo'? Write your answer in the cell below.

    lilo = "stitch"
    print(lilo)
    lilo = "Lilo"
    stitch

[YOUR ANSWER HERE]

2) My brother is X years old where X is selected at random from a list of ages. Given X, calculate the number of years it will take for him to turn 100 years old.

In [None]:
import numpy as np # Numpy is an important package that we'll be using in class - more on packages in a bit
age = np.random.choice(np.arange(1,20)) #This randomly selects a number between 1 and 19 
years =  ... 
print("My brother is " + ... + " years old.")
print("He will turn 100 in " + ... + " years.") #Replace the ...'s with your answer

3) In the context of Euclidean Division, we have the dividend, the divisor, the quotient and the remainder (look them up if you don't remember what they are). If I give you the dividend and divisor, can you give me the quotient and remainder? Print your results. Your output should look like this:

    Dividend: X
    Divisor: Y
    Quotient: Z
    Remainder: A

*Hint: You may want to take a look at what the function `int()` does when you call it with a float number, e.g. `int(1/3)`.*

In [None]:
# Randomly select a pair of dividend and divisor
dividend = np.random.choice(np.arange(1, 100)) 
divisor = np.random.choice(np.arange(1, 6))
quotient = ...
remainder = ... #fill in these two lines
print(...)
...

4) Find out whether a number is odd or even. Here, we will have to use an if statement; I encourage you to look up how the if statement is used. Replace `<condition>` with a boolean expression (e.g. `1==2` or `1>2`).

In [None]:
number = np.random.choice(np.arange(1,1000))
if <condition>:
    print(str(number) + " is odd.")
else:
    print(str(number) + " is even.")

5) Add the necessary parentheses to each expression to get the desired result. When you are done, every print statement should print True.

In [None]:
#example
print(1+2/3==1)
#Should be changed to
print((1+2)/3==1)

In [None]:
print(1+2/3+4-5*6+7+8+9/10 == -10)

6) Replace each '...' with a correct logical operator such that each Boolean expression evaluates to True. 

In [None]:
print(1.0 ... 1)
print(2 ... 3) 
print(2 ... 3) #write a different operator than the one above!
print(2 ... 3) #write a different operator again!
print("John" ... "David")
print("John" ... "david")

7) What happens when you run this line of code?

"2" > 2

[YOUR ANSWER HERE]

<a id="advanced"></a>
<h2> Advanced Programming</h2>

<a id="pacs"></a>
**Packages and built-in functions**

_Built-in Functions_

A `function` in programming is a procedure that takes in some input (usually referred to as its argument or arguments) and returns a result computed from that input. In Python and other programming languages, some functions already exist and are preloaded in the language (built-in functions), while others are written by users themselves (user-defined functions). We will look at some useful built-in functions now, and later in the course we will learn to write our own.

_Packages_

From now on we will be working quite a bit with what we call packages. The `datascience` and `numpy` packages are two examples that you will use extensively in this class. You can think of a package as a collection of `functions`: pieces of code that, given some input, automatically perform some operation. At the beginning of every notebook you will typically find a cell with code that looks like this:

    import numpy as np
    from datascience import *
    import matplotlib.pyplot as plt
    import math

Although you don't really need to know how these statements work, it is good to understand why they are there. We talked about built-in functions and briefly mentioned user-defined functions, but there are also functions that other people wrote and made available through packages such as numpy, math and others. To use those, we have to refer to them not just by their name but also by their package. So after running `import <package_name>`, you can call a package function with `<package_name>.<function_name>`. Alternatively, you can import the function directly with `from <package_name> import <function_name>` and then call it by its name alone, `<function_name>`.

In [None]:
#Built-in functions
#These are examples of some built-in functions that you could find yourself using in the class

#abs
abs1 = abs(-6) #take the absolute value of a number (float or integer)
abs2 = abs(1.329)
abs1, abs2

In [None]:
#round
#The round function is used to round float numbers. It takes up to two arguments (input values).
#The first one is the number we want to round and the (optional) second one is the number of decimal places to keep
# e.g. round(<some float>, 3) rounds the float to 3 decimal places
round_to_integer = round(1.987) # returns the integer 2 (round(1.987, 0) would return the float 2.0)
round_to_first_decimal = round(1.8633, 1)
round_to_second_decimal = round(1.9873737363, 2)
round_to_integer, round_to_first_decimal, round_to_second_decimal

In [None]:
#int vs round
#The function int() only keeps the integer part of a float number. It should not be confused with the round function!
# Note the difference
print(int(1.65))
print(round(1.65))

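Two more details of `round`, as a short sketch with illustrative names: with one argument it returns an `int`, while with a second argument of 0 it returns a `float`; and Python 3 rounds an exact tie such as 2.5 to the nearest even number, which may differ from the "round half up" rule you learned in school. The last line contrasts `int` and `round` on a negative number.

```python
as_int = round(1.987)       # 2, an int
as_float = round(1.987, 0)  # 2.0, a float
print(as_int, type(as_int), as_float, type(as_float))

ties = (round(2.5), round(3.5))  # (2, 4): an exact .5 tie goes to the even neighbor
print(ties)

print(int(-1.65), round(-1.65))  # -1 -2: int truncates toward zero, round goes to the nearest integer
```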
In [None]:
#abs
#The function abs() takes the absolute value of a number
print(abs(-2))
print(abs(2.0))

In [None]:
# max and min
max_of_2 = max(1,2)
max_of_3 = max(1,2,3)
min_of_2 = min(1,2)
min_of_3 = min(1,2,3)

max_of_2, max_of_3, min_of_2, min_of_3

As you can imagine, this is not an exhaustive list of all the built-in functions in Python, but these are good for now. We will learn about others later.

In [None]:
#importing functions from packages
#Let's use some functions from the math package
import math

math.log(1)

In [None]:
#An equivalent way of writing the above would be 
from math import log

log(1)

<a id="arrays"></a>
**Arrays**

Although Python has other types of sequences, such as lists and tuples, the main sequence we will use in this class is the numpy array. We call it the numpy array because it is the sequence type provided by the numpy package. To create an array, use the function `make_array` from the `datascience` package. For example, `my_array = make_array(1,2,3)`. Now let's explore the properties of arrays.

In [None]:
from datascience import * #Run this cell!

In [None]:
my_array = make_array(1,2,3) #This is how I make an array with elements 1,2 and 3. They don't have to be integers.
names = make_array('Tom', 'Tonny', 'Trevor') #An array of strings
my_array, names

In [None]:
#The reason we use arrays over other types of sequences is that they make it really easy to perform operations on them
#P.S. You can also think (and use) numpy arrays like vectors in linear algebra.
array1 = make_array(1,2,3)
array2 = make_array(3,2,1)
twos = make_array(2,2,2)
#In fact we can add arrays! 
#This is element-wise addition: each element in array1 is added to the corresponding element of array2
array1+array2

In [None]:
# Subtracting arrays (element-wise subtraction)
array1-array2

In [None]:
#Multiplying arrays (element-wise multiplication)
array1*array2

In [None]:
# Or even raise one array to the power of another!
array1**twos # Each element is raised to the second power! (If the output shows dtype=..., that tells us the type of the values.)

In [None]:
# Compare all elements in an array with a number - returns an array of booleans
print(array1==1) #Asks the question to each element: Are you equal to 1
print(array1<3) # Asks the question to each element: Are you less than 3

In [None]:
#We can also do operations between arrays and numbers for example:
result1 = array1+1 #Adds 1 to every element in the array
result2 = array1*2 #Multiplies every element in the array by 2
result1, result2

In [None]:
#Array indexing - A way of selecting an element from an array
a = make_array(4,2,1,6,3)

#How do I extract the first element from array a?
first_element = a.item(0)
second_element = a.item(1)
print(a)
print("First element is: " + str(first_element))
print("Second element is: " + str(second_element))

#We use .item(0) for the first element, not .item(1), because of a Python convention:
#Python is a 0-indexed language, so positions start at 0. Some other languages, such as R and MATLAB, start counting at 1.

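A small sketch of the indexing rules. It uses `np.array` in place of `make_array` (which builds a NumPy array) so that it runs without the datascience package. Square-bracket indexing `a[0]` is shown only as a common alternative to `.item(0)`. Valid positions run from 0 to `len(a) - 1`; asking for `a.item(len(a))` raises an `IndexError`.

```python
import numpy as np

# Same values as make_array(4, 2, 1, 6, 3)
a = np.array([4, 2, 1, 6, 3])

print(len(a))              # 5 elements, at positions 0 through 4
print(a.item(0), a[0])     # 4 4: .item(i) and a[i] both select position i
last = a.item(len(a) - 1)  # the last position is len(a) - 1
print(last)                # 3
```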
**Operating on numpy arrays**

Numpy contains many functions that we can use on arrays. Here are some of them, along with examples.

In [None]:
# Run this cell! It imports the numpy package.
import numpy as np # The 'as np' allows us to call np.<function_name> instead of numpy.<function_name> 

In [None]:
#np.sum - Sums all elements in an array.
ar = make_array(1,2,3,4)
total = np.sum(ar) # 1+2+3+4
total

In [None]:
#np.prod - Takes the product of all elements in an array.
ar = make_array(1,2,3,4)
product = np.prod(ar) # 1*2*3*4
product

In [None]:
#np.mean - Takes the mean (average) of all the elements in an array
scores = make_array(97, 85, 73, 90, 88) #Midterm scores corresponding to a section
section_average = np.mean(scores)
section_average

In [None]:
#np.diff - Returns an array of the differences between back to back elements in an array
violent_crimes = make_array(4, 8, 11, 5, 7, 3, 1, 0, 10, 13, 6, 9) # number of violent crimes per month in the city of Berkeley
monthly_change = np.diff(violent_crimes) # make_array(8-4, 11-8, 5-11, ...)
monthly_change #monthly_change is the change in the number of crimes in consecutive months

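Two properties of `np.diff` that follow from its definition, shown as a sketch on the same monthly counts (with `np.array` standing in for `make_array`): the result has one element fewer than the input, and adding up all the changes gives the last value minus the first value.

```python
import numpy as np

violent_crimes = np.array([4, 8, 11, 5, 7, 3, 1, 0, 10, 13, 6, 9])
monthly_change = np.diff(violent_crimes)  # 4, 3, -6, 2, -4, -2, -1, 10, 3, -7, 3

print(len(violent_crimes), len(monthly_change))  # 12 11: one fewer difference than values
# The changes add up to (last value) - (first value)
print(np.sum(monthly_change))                            # 5
print(violent_crimes.item(11) - violent_crimes.item(0))  # 9 - 4 = 5
```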
In [None]:
#np.sort - Takes in an array and returns an array with the original array's values sorted from least to greatest.
out_of_order = make_array(3,4,20,0)
in_order = np.sort(out_of_order)
in_order

In [None]:
#np.count_nonzero - Counts the non-zero elements of an array. With an array of booleans as input, it counts the number of True values
dummy_array = make_array(1,4,2,5,3,4,5,8,2,6)

#find how many values in this array are less than 5
less_than_five = dummy_array < 5 
print(less_than_five)
n_less_than_5 = np.count_nonzero(less_than_five)
print("Number of items in the array that are less than 5 is: " + str(n_less_than_5))

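Because `True` counts as 1 and `False` as 0, summing a boolean array gives the same count. This sketch (again with `np.array` in place of `make_array`) checks that on the array above and also applies `np.count_nonzero` to plain numbers.

```python
import numpy as np

dummy_array = np.array([1, 4, 2, 5, 3, 4, 5, 8, 2, 6])
less_than_five = dummy_array < 5

print(np.count_nonzero(less_than_five))          # 6
print(np.sum(less_than_five))                    # 6: each True counts as 1, each False as 0
print(np.count_nonzero(np.array([0, 3, 0, 7])))  # 2: on numbers, it counts the non-zero ones
```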
For more numpy functions check out the course textbook <a href="https://www.inferentialthinking.com/chapters/04/4/arrays.html">here</a>.

<a id="ranges"></a>
**Ranges**

_Textbook definition:_ A range is an array of numbers in increasing or decreasing order, each separated by a regular interval. 

To construct a range, we use numpy's `np.arange` function.

The structure of the expression is `np.arange(<start>, <end>, <step>)`. (When the step is 1, we can omit it and write `np.arange(<start>, <end>)`.)

Example: `np.arange(1, 11)` (same as `np.arange(1, 11, 1)`) gives an array of numbers 1 through 10 in order
(1,2,3,4,5,6,7,8,9,10)

start: The first element of the range.

end: The ending boundary of the range. **Important:** end is **NOT** included in the range. 

step: The difference between consecutive elements. It can be negative if you want the range to count down.

**Note:** You can also omit the start and step and provide only the end, as in `np.arange(5)`. This assumes start=0 and step=1, with end=5.

In [None]:
#make an array of all the numbers from 0 to 100
zero_to_100 = np.arange(101) # or zero_to_100 = np.arange(0,101) or zero_to_100 = np.arange(0,101, 1)
zero_to_100 #note that the argument is 101, but the number 101 is not included in the range

In [None]:
#make an array with all the multiples of 2 between 0 and 100 inclusive
even_0_to_100 = np.arange(0, 101, 2)
even_0_to_100

In [None]:
#make an array that goes 5, 4, 3, 2, 1
step_down = np.arange(5, 0, -1) #the step is -1
step_down

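A few more ranges that exercise the rules above, as a sketch with illustrative names: the default start and step, a step that does not land exactly on the end, a negative step, and using `len` to count the elements.

```python
import numpy as np

default_start = np.arange(5)           # start defaults to 0 and step to 1: 0, 1, 2, 3, 4
by_threes = np.arange(1, 11, 3)        # 1, 4, 7, 10 (the next value, 13, would pass the end)
counting_down = np.arange(10, 0, -3)   # 10, 7, 4, 1 (the end, 0, is excluded)
num_evens = len(np.arange(0, 101, 2))  # 51 even numbers from 0 to 100
print(default_start, by_threes, counting_down, num_evens)
```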
<a id="advanced_exe"></a>
**Advanced Programming Exercises**

1) Without using the built-in function max, find the maximum of three numbers a, b and c. Your output should be the same as the output of max(a,b,c).

**Hint:** You could do this with if statements, but an easier way would be to look at the numpy functions for arrays.

In [None]:
a = int(input("a")) #These three lines let you type in values for a, b, and c when you run the cell
b = int(input("b"))
c = int(input("c"))
maximum = ... #replace this with your answer
maximum

2) According to the Oakland Police Department End of Year Report (found <a href="http://www2.oaklandnet.com/oakca1/groups/police/documents/webcontent/oak062295.pdf">here</a>) the historical total number of crimes per year in Oakland is the following.

| 2012   | 2013   | 2014   | 2015   | 2016   |
|--------|--------|--------|--------|--------|
| 33,685 | 33,965 | 31,612 | 31,470 | 29,919 |

Without directly typing in a number, find how many years saw more crime than 2012.

In [None]:
total_crimes_by_year =  make_array(33685, 33965, 31612, 31470, 29919)
num_years = ...
num_years

3) Find all years of Presidential Elections in the United States between 1788 and 2016. How many years does this span?

In [None]:
pres_elections = ...
yearspan = ...
pres_elections, yearspan

4) The formula for compound interest is $$A = P\left(1 + \frac{r}{n}\right)^{nt}$$ where P is the starting amount, r is the interest rate (as a decimal), n is the number of times interest is compounded per year, and t is the time passed in years.

Let P = 1000, r = 0.05, and n = 2. Make an array that represents A over 10 years.

In [None]:
time_passed = make_array(1,2,3,4,5,6,7,8,9,10)
amount_of_money_per_year = ...
amount_of_money_per_year

5) You find a store that keeps discounting their items by a certain percentage each day for 10 days. How much will this $1000 item cost on each of the 10 days?

In [None]:
discounts = make_array(5, 10, 4, 8, 6, 4, 9, 20, 4, 12) #Percentage of discount each day

# Hint: First make the array of discounts into an array that represents the discounts as a decimal. 
# Then find what proportion of the total price each discount represents (i.e. a 5% discount means it 
# represents 95% of the total price of the product). Then you can calculate the price of the item each day

discounts_as_decimals = ...
discounts_as_proportion_of_total_price = ...
price_by_day = ...
price_by_day

6) You're at a grocery store. Apples are 1 dollar each, oranges are 2 each, and watermelons are 6 each. You decide you want to buy 3 apples, 5 oranges, and 28 watermelons. Without doing any arithmetic, find the total cost of your fruit purchase.  

In [None]:
prices = ...
amounts = ...
total = ...
total

If you are done with the exercises and are looking for more of a challenge, take a look at this <a href="http://www.practicepython.org/">page</a> for some more advanced exercises. (Much of that material goes beyond what you need to know for this class.)

<a id="table"></a>
# Tables

Tables are an important tool in data science for organizing and displaying data. To create them, the datascience package provides a very handy tool called `Table()`. Here it is below!

In [None]:
Table()

Calling this function will return an empty table. This empty table is a useful starting point for this class because it can be extended to contain rows and columns. Let's try that now!

Let's begin with a table that represents the wonderful instructors of Data 8. Suppose we want a table with one column, labeled "Instructors", containing the entries "John" and "David".

In [None]:
array_of_instructors = make_array('John', 'David') #with_column can take an array as an argument

table_of_instructors = Table().with_column('Instructors', array_of_instructors)
table_of_instructors

Notice the syntax for adding columns. Remember that if you want more than one entry in the column, you need to pass an array as the column's values.

However, a table with only one column is of little use to us as data scientists. Luckily, we can add additional columns to tables we've already created, as well as build tables with multiple columns with our original command.

In [None]:
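height = make_array('Very Tall', 'Not as tall') #use an array, just like for the Instructors column
table_of_instructors.with_column('Height', height) #returns a new table; table_of_instructors itself is unchanged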

In [None]:
table2 = Table().with_columns('Instructors', array_of_instructors, 
                              'Height', height)
table2

<a id="table_manip"></a>
**Table Manipulation**

Suppose we have a table of names of individuals, their favorite color, their favorite number, and favorite subject! I will manually construct the table below, so you can just ignore the following cell.

In [None]:
name = make_array('Henry', 'Mark', 'Michael', 'Sarah', 'Michelle', 'Chelsea', 'Alyssa', 'Kevin', 'Anna')
color = make_array('red', 'blue', 'green', 'orange', 'red', 'purple', 'blue', 'orange', 'blue')
number = make_array(18, 3, 8, 12, 19, 13, 15, 65, 2017)
subject = make_array('data science', 'data science','data science','data science','data science','data science','data science','data science','data science')
table = Table().with_columns('Name', name,
                          'Favorite Color', color,
                          'Favorite Number', number,
                          'Favorite Subject', subject)
table

Suppose we decide it is useless to maintain the "Favorite Subject" as a column because it is common knowledge that data science is the best subject ever. We can delete the column by simply selecting all the columns that aren't the last column.

In [None]:
table1 = table.select(0,1,2) #Remember that indices in Python start at 0!
table1

Now, suppose we want to filter this table to keep only the rows where the individual's favorite color is blue. Luckily, the built-in Table method `where` does this for us!

In [None]:
table1.where('Favorite Color', 'blue')

Moreover, suppose we want to find the name of the individual with the single largest favorite number. We can do this by first sorting the table by favorite number in descending order, then selecting the column with the names, and finally selecting the first item in that column.

In [None]:
name_of_largest = table.sort('Favorite Number', descending = True).column('Name').item(0)
name_of_largest

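Sorting is one way to find the row with the largest value. Another way, sketched here with plain NumPy arrays standing in for the table's `Name` and `Favorite Number` columns (an assumption made so the example runs without the datascience package), is `np.argmax`, which returns the position of the largest element; that position then picks out the matching name.

```python
import numpy as np

# Plain NumPy arrays with the same values as the table's columns
name = np.array(['Henry', 'Mark', 'Michael', 'Sarah', 'Michelle',
                 'Chelsea', 'Alyssa', 'Kevin', 'Anna'])
number = np.array([18, 3, 8, 12, 19, 13, 15, 65, 2017])

position = np.argmax(number)           # 8: the position of 2017
name_of_largest = name.item(position)  # the name at the same position
print(name_of_largest)                 # Anna
```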
Remember that calling .column returns an array. This is important because it means we can perform array arithmetic on column values!

Suppose we want to compute some summary statistics: what are the average, sum, and minimum of all the individuals' favorite numbers?

In [None]:
average = np.average(table.column('Favorite Number'))
summation = sum(table.column('Favorite Number'))
minimum = min(table.column('Favorite Number'))

average, summation, minimum

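Since a column is just an array, element-wise arithmetic and comparisons work on it directly. In this sketch a NumPy array with the same values stands in for `table.column('Favorite Number')`, and the names are illustrative. It also shows how a single very large value (2017) pulls the average far above most of the numbers.

```python
import numpy as np

# Stand-in for table.column('Favorite Number')
favorite_numbers = np.array([18, 3, 8, 12, 19, 13, 15, 65, 2017])

doubled = favorite_numbers * 2                      # element-wise arithmetic on the whole column
above_15 = np.count_nonzero(favorite_numbers > 15)  # 4 (18, 19, 65 and 2017)
average = np.mean(favorite_numbers)                 # 2170 / 9, about 241.1
above_average = np.count_nonzero(favorite_numbers > average)  # 1: only 2017
print(doubled)
print(above_15, average, above_average)
```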
# Table Practice Problems

UC Berkeley recently surveyed 10 students and asked for their height. We have provided the students' names in an array named `students` and their heights (in inches) in an array named `heights`. The cell below builds a table named `survey` with a column "Name" displaying each student's name and a column "Height" displaying their height.

In [None]:
students = make_array('Henry', 'Mark', 'Michael', 'Sarah', 'Michelle', 'Chelsea', 'Alyssa', 'Mark', 'Anna', 'Devin')
heights = make_array(60, 70, 68, 65, 73, 70, 58, 72, 64, 75)
survey = Table().with_columns('Name', students,
                           'Height', heights)
survey

1) For kicks, write a line of code that outputs a table containing only the names of the students.

In [None]:
names = ...
names

2) Let's see what we can find out about the data set! Suppose we are interested in who is the tallest of the 10 surveyed students. Write code that outputs the name of the tallest individual in the table.

In [None]:
tallest = ... #Write code in order to find this! Tables will soon become too big to manually inspect.
tallest

3) While finding the extremes (max, min) of the data is important, another question that often comes up in statistical analysis is the average. Use your knowledge of tables and arrays to assign the name `average` to the average height of the students.

In [None]:
average = ...
average

4) Another important statistic is the standard deviation. You will learn more about how to compute the standard deviation and about its properties later in the course, but for now, assign the name `sd` to the standard deviation of all the heights.

In [None]:
sd = ...
sd

5) Devin gets hold of the data and wonders how his height compares to the heights of his fellow students. He is a pretty tall person, so he suspects that his height will be above the average. Help him! Assign the name `devin` to the number of inches by which Devin is taller than the average.

In [None]:
devin = ...
devin

6) Finally, an instructor looks at the data and believes that having the name Mark makes a person grow taller. Find the difference between the _average height of the individuals named Mark_ and the _average height of the rest of the individuals_. Assign this to the name mark_difference.

In [None]:
mark_difference = ...
mark_difference

7) Can one conclude that having the name Mark will cause a person to grow taller? Why or why not?

(Fill in answer here)

(Notebook developed by Vasilis Oikonomou with Adnan Hemani, Niraj Rao and Erik Cheng in Spring 2017

Revised by Claire Zhang and Howard Ki in Fall 2017)