### Let's Review!

"Running" a cell is similar to pressing 'Enter' on a calculator once you've typed in an expression. After you've typed your code into a cell, running the cell will produce the output. When you run a code cell it computes all of the expressions contained within the cell.

To run a code cell, you can do one of the following:
- press __Shift + Enter__
- click __Cell -> Run Cells__ in the menu bar at the top of the screen

You can navigate the cells by either clicking on them or by using your up and down arrow keys. Try running the cell below to see what happens. 

In [26]:
print("Hello, World!")

Hello, World!


The input of the cell consists of the text/code that is contained within the cell's enclosing box. Here, the input is an expression in Python that "prints" or repeats whatever text or number is passed in. 

The output of running a cell appears directly below the cell. Notice that markdown cells have no output.

### Expressions

An expression is a combination of numbers, variables, operators, and/or other Python elements that the language interprets and acts upon. You can think of expressions as a set of __step-by-step instructions__ for Python to follow in order to produce a specific output. To edit a code cell, double-click it and make your changes!

In [27]:
# Replace the word 'friend' below with your name. Then run the cell. 

print("Welcome to Jupyter notebooks, friend.")

Welcome to Jupyter notebooks, friend.


Code cells can be used to evaluate arithmetic expressions. Below are a few basic examples of what Python can do!

In [28]:
# Addition
20+20

40

In [29]:
# Multiplication
10*8.5

85.0

In [30]:
# Division
625/25

25.0
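The `25.0` above is worth a second look: in Python 3, the `/` operator always produces a decimal number, even when the division comes out even. As a small sketch of the difference, the cell below also uses the floor-division operator `//`, which this notebook doesn't otherwise cover.

```python
# True division always returns a decimal (float) in Python 3
exact = 625 / 25         # 25.0

# Floor division rounds down to a whole number
whole = 625 // 25        # 25
rounded_down = 7 // 2    # 3, whereas 7 / 2 is 3.5

print(exact, whole, rounded_down)
```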

In [31]:
# Exponents
4**2

16

In [32]:
# A series of arithmetic operations
(2-4*5+7) + 18**2

313
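To see where `313` comes from, it can help to evaluate the expression one piece at a time. The sketch below follows Python's usual order of operations (parentheses and exponents first, then multiplication, then addition and subtraction from left to right). The intermediate variable names are made up for illustration.

```python
power = 18 ** 2            # exponent: 324
product = 4 * 5            # multiplication inside the parentheses: 20
inner = 2 - product + 7    # left to right: 2 - 20 + 7 = -11
result = inner + power     # -11 + 324 = 313

print(result)
```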

Note that any line beginning with a `#` (hash sign) is a comment: Python does not run it, so we use these lines to add notes about our code. Here's an example.

In [33]:
# 4**2

### Python Variables

Aside from numbers, Python has **variables**, names that act as placeholders for values. For example, let the variables `x` and `y` equal 10 and 99999, respectively. This action is called "defining a variable".

In [34]:
x = 10
y = 99999

Notice that assigning a number to a variable name such as `x` produces no output.

Now, we can use the variables `x` and `y` in expressions.

In [35]:
10 + 99999

100009

In [36]:
x + y

100009

Now what happens when the value of `x` changes?

In [37]:
x = 12

Then, the value of the expression also changes.

In [38]:
x + y

100011

**This is why the order in which you run code cells is important.** The expression `x + y` can yield different results depending on which cells you ran before.

What happens if you try to use a variable without assigning it to a value first?

In [39]:
x + y + z

NameError: name 'z' is not defined

You'll see that Python outputs a `NameError`. Python tried to find the value of `z`, but `z` hadn't been defined yet!

**Important:** If you see this error again in this notebook or in future notebooks, it is an indication that you might not have run all the previous cells or that you might be using variables without assigning values to them first.
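For illustration, here is a minimal sketch of the same error in isolation. It uses `try`/`except`, a Python feature this notebook doesn't otherwise cover, to catch the error and display its message instead of stopping. The names `a` and `not_yet_defined` are made up for the example.

```python
a = 5

try:
    # not_yet_defined has no value at this point, so Python raises a NameError
    total = a + not_yet_defined
except NameError as error:
    message = str(error)
    print("NameError:", message)

# Once the variable is defined, the same expression works
not_yet_defined = 3
total = a + not_yet_defined
print(total)
```

The fix is the same as in the notebook: define the variable (or run the cell that defines it) before using it.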

In [40]:
# Defining z here
z = 2019

In [41]:
# Good to go
x + y + z

102030

### Variable Types
As you saw in the examples above, two common types of variables are __integers__ (positive and negative whole numbers) and __decimals__ (positive and negative decimal numbers, which Python calls floats).

Another important type of variable is a __string__. Strings are sequences of characters, such as words or sentences. Strings are always surrounded by quotes. For example, `"Sociology"` is a string because it is surrounded by quotes, but `berkeley` and `1868` are not.

In [42]:
# String
"Sociology"

'Sociology'

In [43]:
# The variable subject is a string
subject = "Sociology"

# The variable berkeley is an integer
berkeley = 1868
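If you are ever unsure what type a value has, Python's built-in `type` function will tell you. The short sketch below uses made-up variable names and shows that a number wrapped in quotes is a string, not an integer.

```python
year = 1868
price = 85.0
course = "Sociology"
quoted_number = "1868"

print(type(year))           # <class 'int'>
print(type(price))          # <class 'float'>
print(type(course))         # <class 'str'>
print(type(quoted_number))  # <class 'str'>
```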

### Arrays

An array is a special type of variable that can hold more than one value at a time. You can think of it as a "list" or collection that stores multiple values in a single variable.

Run the following code cell. You do not have to know what it does, but at a high level, it is simply defining variables and functions that we'll be using later.

In [44]:
# Run and ignore
import numpy as np
array = lambda *args: np.array(args)
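If you are curious, the second line of that cell defines a small helper: `*args` collects however many values you pass in, and `np.array` turns them into a NumPy array. A roughly equivalent version written as an ordinary function might look like the sketch below (the name `make_array` is made up for illustration).

```python
import numpy as np

def make_array(*args):
    # args is a tuple holding every value passed to the function
    return np.array(args)

print(make_array(1, 2, 3))  # [1 2 3]
```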

Now, let's create our first array. To create an array, pass the values as arguments inside the parentheses that follow `array`.

In [45]:
array(1, 2, 3)

array([1, 2, 3])

Let's create an array consisting of the variables `x`, `y`, and `z` that we used earlier.

In [46]:
array(x, y, z)

array([   12, 99999,  2019])

Arrays are extremely useful because they allow us to perform calculations on the values stored inside the array. For example, let's take the mean of the variables `x`, `y`, and `z`.

In [47]:
var_array = array(x, y, z)
np.mean(var_array)

34010.0
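Arrays also support arithmetic on every element at once. The sketch below uses the same three values, written out literally so the cell runs on its own. `np.sum` and `np.max` are standard NumPy functions that this notebook doesn't otherwise use.

```python
import numpy as np

values = np.array([12, 99999, 2019])

doubled = values * 2            # multiplies each element by 2
total = np.sum(values)          # 102030
largest = np.max(values)        # 99999
average = total / len(values)   # 34010.0, the same result as np.mean(values)

print(doubled)
print(total, largest, average)
```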

## Specialized Functions

Now that we've covered the basics, let's go over some functions you'll encounter on the project.

#### Barchart Creation:
To create a barchart, use the `barchart` function. Its arguments, in order, are the categorical variable, the frequency of each category, the x label, the y label, the title of the graph, and the filename for saving.

In [48]:
#Example:
from functions import *
categories = array("gruyere", "brie", "cheddar", "provolone")
frequencies = array(10 , 30, 100, 60)
barchart(categories, frequencies, "Cheese Type", "Popularity", "Cheese Popularities", "cheese_chart")

FileNotFoundError: [Errno 2] No such file or directory: 'Output/cheese_chart.png'
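The `FileNotFoundError` above suggests that `barchart` tries to save the image into a folder named `Output` that doesn't exist yet. Assuming that is the cause (the source of `barchart` isn't shown here, so this is a guess), creating the folder before calling the function should let the save succeed.

```python
import os

# Create the Output folder in the current working directory;
# exist_ok=True means nothing happens if the folder is already there
os.makedirs("Output", exist_ok=True)

print(os.path.isdir("Output"))  # True
```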

#### Histogram Creation:
To create a histogram, use the `histogram` function. Its arguments, in order, are the array of numerical values, the x label, the y label, the title of the graph, and the filename for saving.

In [51]:
#Example
dice_distribution = array(1,1,1,3,3,3,3,3,3,3,2,2,2,4,4,4,4,5,5,5,5,6,6,6,6,6)
histogram(dice_distribution, "Dice Roll", "Frequency", "Distribution of Dice Rolls")

FileNotFoundError: [Errno 2] No such file or directory: 'Output/histogram.png.png'
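A histogram of this data boils down to counting how often each value occurs. As a sketch of those counts without any plotting, the cell below uses `Counter` from Python's standard library rather than the course's `histogram` function.

```python
from collections import Counter

dice_rolls = [1,1,1,3,3,3,3,3,3,3,2,2,2,4,4,4,4,5,5,5,5,6,6,6,6,6]

# Counter tallies how many times each face appears
counts = Counter(dice_rolls)

for face in sorted(counts):
    print(face, counts[face])
```

Each printed count corresponds to the height of one bar in the histogram.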

#### Filtering Values:
To filter values from a table, use the filter_values function. This function takes in the table first, the column with the values to be removed second, and an array of values to remove third.

In [60]:
#Example
grade_and_score = Table().with_columns([
        'letter', array('a', 'b', 'c','d','f', 'i'),
        'count',  array( 9,   10,   7,  5,  4,  1),
        'points', array(10,   8,   6,  4,  2,  0),
        ])
#print(grade_and_score)
filter_values(grade_and_score, 'count', array(9,1))

letter,count,points
b,10,8
c,7,6
d,5,4
f,4,2
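The same filtering idea can be sketched in plain Python. This is not how `filter_values` is implemented (its source isn't shown here); it only illustrates the logic of dropping rows whose value appears in a removal list, using a list of dictionaries as a stand-in for the table.

```python
rows = [
    {"letter": "a", "count": 9,  "points": 10},
    {"letter": "b", "count": 10, "points": 8},
    {"letter": "c", "count": 7,  "points": 6},
    {"letter": "d", "count": 5,  "points": 4},
    {"letter": "f", "count": 4,  "points": 2},
    {"letter": "i", "count": 1,  "points": 0},
]

# Values of 'count' that should be filtered out
to_remove = {9, 1}

# Keep only the rows whose count is not in the removal set
kept = [row for row in rows if row["count"] not in to_remove]

print([row["letter"] for row in kept])  # ['b', 'c', 'd', 'f']
```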


#### Creating Categories:
To create a categorical variable from another column, use the create_categories function. This function takes in the table first, the column with the values to be "categorized" second, and the endpoints of each category last.

In [62]:
#We can use the table from above to create a categorical variable for count
create_categories(grade_and_score, 'count', array(0,5,10))

letter,count,points,count_group
f,4,2,0 - 4
i,1,0,0 - 4
a,9,10,5 - 10+
b,10,8,5 - 10+
c,7,6,5 - 10+
d,5,4,5 - 10+
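To make the grouping rule concrete, here is a rough sketch of how labels like the ones above could be produced. The function `label_for` is hypothetical and its labeling rule is inferred only from the output shown; the real `create_categories` may work differently.

```python
def label_for(value, endpoints):
    """Return a group label for value, given sorted category endpoints."""
    # Every bin except the last runs from low up to (but not including) high
    for low, high in zip(endpoints[:-2], endpoints[1:-1]):
        if low <= value < high:
            return f"{low} - {high - 1}"
    # Anything not matched above lands in the final, open-ended bin
    return f"{endpoints[-2]} - {endpoints[-1]}+"

counts = [9, 10, 7, 5, 4, 1]
labels = [label_for(c, [0, 5, 10]) for c in counts]
print(labels)
```

With endpoints `0, 5, 10`, the counts 4 and 1 get the label `0 - 4`, and the rest get `5 - 10+`, matching the `count_group` column.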


#### Cross-Tabulation:
To create a cross-tabulation of two columns in a table, use the `cross_tab` function. This function takes in the table first, the column to be split into individual columns second, and the column to use for row values last.

In [70]:
#This shows the relationship between letter grades and point values.
cross_tab(grade_and_score, 'letter', 'points')

points,a (letter),b (letter),c (letter),d (letter),f (letter),i (letter),total (letter)
0,0,0,0,0,0,1,1
2,0,0,0,0,1,0,1
4,0,0,0,1,0,0,1
6,0,0,1,0,0,0,1
8,0,1,0,0,0,0,1
10,1,0,0,0,0,0,1
total,1,1,1,1,1,1,6
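Under the hood, a cross-tabulation counts how many rows share each combination of two values. The sketch below reproduces the counts in the table above with the standard library's `Counter`. It is an illustration of the idea, not the implementation of `cross_tab`.

```python
from collections import Counter

letters = ["a", "b", "c", "d", "f", "i"]
points = [10, 8, 6, 4, 2, 0]

# Count how many rows have each (points, letter) combination
pair_counts = Counter(zip(points, letters))

for p in sorted(set(points)):
    # One entry per letter column; a missing combination counts as 0
    row = [pair_counts[(p, letter)] for letter in letters]
    print(p, row, "total:", sum(row))
```

Because every letter grade here maps to exactly one point value, each row contains a single 1 and the rest are 0.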


#### Grouping Tables:
To count how often each value appears in a column, group the table by that column. Call the `.group` method on the table, passing the column of interest as the argument.

In [68]:
#This shows the count of each letter in the 'letter' column
grade_and_score.group('letter')

letter,count
a,1
b,1
c,1
d,1
f,1
i,1
