## Notebook 1: Introduction to Jupyter Notebooks and Python

## **Jupyter Notebooks**

Welcome to a Jupyter Notebook! **Notebooks** are documents that support interactive computing in which code is interwoven with text, visualizations, and more.

The way notebooks are formatted encourages **exploration**, allowing users to iteratively update code and document the results. In use cases such as **data exploration and communication**, notebooks excel. Science (and computational work in general) has become quite sophisticated: models are built upon experiments that are conducted on large swaths of data, methods and results are abstracted away into symbols, and papers are full of technical jargon. *A static document like a paper might not be sufficient to both effectively communicate a new discovery and allow someone else to discover it for themselves*.

<div class="alert alert-block alert-danger">
    <p style="font-size:20px">In this notebook, some more advanced topics are marked <i>"optional"</i>. You can simply read over these sections; don't worry about fully understanding them unless you are really interested. They may be useful later in the course, but they are not necessary for now, so feel free to just skim these parts!
</div>

<div class="alert alert-block alert-success">
    <p style="font-size:20px">This section is advanced/optional
</div>

<hr style="border: 2px solid #003262">
<hr style="border: 2px solid #C9B676">

## Learning Outcomes
Working through this notebook, you will learn about:
1. The history behind **Jupyter notebooks** and why they are used in computing
2. How a Jupyter notebook is **structured** and how to use one
3. Python fundamentals and working with **tabular data**

<hr style="border: 2px solid #003262">
<hr style="border: 2px solid #C9B676">

## A Brief History
The Jupyter Notebook is an _interactive computational environment_ that supports over **40** different programming languages, but was first released as a web-based interface for IPython in 2011. [Fernando Perez](https://bids.berkeley.edu/people/fernando-p%C3%A9rez), a professor in the Statistics department here at UC Berkeley, created [IPython](https://bids.berkeley.edu/research/ipython) as a graduate student in 2001 and co-founded [Project Jupyter](https://bids.berkeley.edu/research/project-jupyter) in 2014. 

Project Jupyter's name is a reference to the three core programming languages supported by Jupyter, which are `Julia`, `Python` and `R` (`"ju"` from `"Julia"`, `"pyt"` from `"Python"`, and `"er"` from `"R"`; all together you get `"ju" + "pyt" + "er" = "jupyter"`). 

The word notebook is an homage to Galileo's notebooks, in which he documented his discovery of the moons of Jupiter (the planet).

Though the Jupyter Notebook interface has been around only about a decade, the first notebook interface, [Mathematica](https://www.mathematica.org/), was released over 30 years ago in 1988. Other notebook interfaces have been released since then, but none have gained as much traction as Jupyter Notebooks have. Starting in the early 2000s, open-source scientific tools were becoming more and more popular and with the widespread popularity of Jupyter, it's well on its way to becoming a standard for sharing research methodology and results.

<img src="assets/mathematica.png" alt="Early Mathematica Interface" style="width: 350px;" class="center"/><br>
The early Mathematica Interface.

## Why (Jupyter) Notebooks?
Notebooks are used for [*literate programming*](https://en.wikipedia.org/wiki/Literate_programming), a programming paradigm introduced by [Donald Knuth](https://en.wikipedia.org/wiki/Donald_Knuth) in 1984, in which code written in a programming language is accompanied by documentation written in a natural language. In other words, the computer program comes with an explanation that a person can read. 

This approach to programming effectively treats software as works of literature ([Knuth](http://www.literateprogramming.com/knuthweb.pdf), "Literate Programming"). _It helps people build a strong conceptual map of what is happening in the code and gain clarity on the flow and logic of the program, which is helpful for both the writer and the reader_.

Jupyter leverages this idea and enables users to create and share documents that combine code, visualizations, narrative text, equations, and rich media. Notebooks are multipurpose and can be used in any discipline. The notebook is like a laboratory notebook, but for computing. 

Researchers can write code to work with their data while supplementing their methods with explanations, analysis, or hypotheses. Notebooks are also used in education because they enable students to engage with content presented in different forms, experience computation with no prior experience, and practice programming in a scaffolded way. 

<hr style="border: 2px solid #003262">
<hr style="border: 2px solid #C9B676">

## Notebook Structure

### Cell Types
A notebook is composed of rectangular sections called **cells**. There are 2 kinds of cells: markdown and code. 
- A **markdown cell**, such as this one, contains text. 
- A **code cell** contains code in Python, a programming language that we will be using with all of our data science modules in this class. You can select any cell by clicking it once.
    - Code cells can also contain code in other languages like Julia or R, both of which can also be used for data analysis

### Running Cells
To "run" a code cell (i.e. tell the computer to perform the programmed instructions in the cell), select it and then,
- Press `Shift` + `Enter` to run the cell and move to the following cell
- Press `Command/Control` + `Enter` to run the cell but stay on the same cell
    - This can be used repeatedly to re-run the same process
- Click the Run button in the toolbar at the top of the screen. 

### Results and Outputs of a Cell
When you run a cell, one of two things happens, depending on the type and contents of the cell:
1. If the cell is a markdown cell, the text will be rendered according to the structure (_Markdown_, _HTML_, etc.)
2. If the cell is a code cell, the result of the last line in the cell will be shown
    - This output may be text, a number, a picture, or a visualization
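As a small sketch of item 2 (the variable names here are made up for illustration): when a code cell contains several lines, the notebook displays only the value of the last line, so `print` is the way to show anything that comes earlier.

```python
# Sketch of the "last line" rule; `price` and `quantity` are illustrative names.
price = 5
quantity = 6

print(price + quantity)  # print shows its value wherever it appears in the cell
price - quantity         # not the last line, so a notebook does not display this value
price * quantity         # last line: in a notebook, this value appears as the cell's output
```

If you need to see more than one value from a single cell, wrapping each one in `print` is usually the simplest option.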

If a code cell is running, you will see an asterisk (\*) appear in the square brackets to the left of the cell. Once the cell has finished running, a number will replace the asterisk and any output from the code will appear under the cell.

Let's try it! **Run the cell below to see the output.** Feel free to play around with the code -- try changing 'World' to your name.

In [56]:
print("Hello World!") # Run the cell by using one of the methods we mentioned above!

Hello World!


### Comments
You'll notice that many code cells contain lines of blue text that start with a `#`. These are ***comments***. Comments often contain helpful information about what the code does or what you are supposed to do in the cell. The leading `#` tells the computer to ignore whatever text follows it.
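A minimal sketch of the two common ways comments appear (the variable name `total` is only for illustration): on a line of their own, or after some code on the same line.

```python
# A full-line comment: Python ignores this entire line.
total = 2 + 3  # An inline comment: Python runs the code, then ignores the rest of the line.
# print("This line never runs because it is commented out.")
print(total)
```

Placing a `#` in front of a line of code, as in the third line, is a quick way to switch that line off temporarily without deleting it.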

<hr style="border: 2px solid #003262">
<hr style="border: 2px solid #C9B676">

## Editing the Notebook

You can change the text in a markdown cell by clicking it twice. Text in markdown cells is written in [**Markdown**](https://daringfireball.net/projects/markdown/), a formatting language for plain text, so you may see some funky symbols should you try and edit a markdown cell we've already written. Once you've made changes to a markdown cell, you can exit editing mode by running the cell the same way you'd run a code cell.

<blockquote>
    <div class="alert alert-block alert-warning">
        <b>
            Try double-clicking on this text to see what some Markdown formatting looks like.
        </b>
    </div>
</blockquote>

### Manipulating Cells

Another feature of Jupyter Notebooks is the ability to add and delete cells, whether that be code or markdown. You can add cells by pressing the plus sign icon in the menu bar. This will add (by default) a code cell immediately below your current highlighted cell.

To convert a cell to markdown, you can press 'Cell' in the menu bar, select 'Cell Type', and finally pick the desired option. This works the other way around too!

To delete a cell, simply press the scissors icon in the menu bar. A common fear is deleting a cell that you needed -- but don't worry! This can be undone using 'Edit' > 'Undo Delete Cells'! If you accidentally delete content in a cell, you can use `Ctrl` + `Z` to undo.

<h3>Shortcuts</h3>

<div class="alert alert-block alert-success">
    <p style="font-size:20px">This section is optional
</div>

Select a cell by clicking on the empty space to the left of the text (a blue bar will appear to the left of the cell), then use the keys below:
<ul>
<li>To <b>add a cell <i>below</i></b> the selected one, press the <code>b</code> key (<i>b for below</i>) </li>
<li>To <b>add a cell <i>above</i></b> the selected one, press the <code>a</code> key (<i>a for above</i>) </li>
<li>To <b>copy a cell</b>, press the <code>c</code> key (<i>c for copy</i>) </li>
<li>To <b>cut a cell</b>, press the <code>x</code> key (<i>same as the general cut text command</i>) </li>
<li>To <b>delete a cell</b>, press the <code>d</code> key <b><i>twice</i></b> (<i>d for delete, twice to ensure the action</i>) </li>
<li>To <b>paste a cell</b>, press the <code>v</code> key (<i>same as the general paste text command</i>) </li>
<li>To <b>convert a cell to a markdown cell</b>, press the <code>m</code> key (<i>m for markdown</i>) </li>
<li>To <b>convert a cell to a code cell</b>, press the <code>y</code> key </li>
</ul>

### Saving and Loading

Your notebook will automatically save your text and code edits, as well as any graphs you generate or any calculations you make. However, you can also manually save the notebook in its current state by using `Ctrl` + `S`, clicking the floppy disk icon in the toolbar at the top of the page, or by going to the 'File' menu and selecting 'Save and Checkpoint'.

Next time you open your notebook, it will look the same as when you last saved it!

<blockquote>
    <div class="alert alert-block alert-info">
        <b>Note:</b> When you load a notebook you will see all the outputs from your last saved session (such as graphs, computations, etc.) but you won't be able to use any of the variables you assigned in your code without running it again.
    </div>
</blockquote>

An easy way to "catch up" to the last work you did is to highlight the cell you left off on and click "Run all above" under the Cell tab in the menu at the top of the screen.

<hr style="border: 2px solid #003262">
<hr style="border: 2px solid #C9B676">

## Python Basics
[**Python**](https://www.python.org/) is a programming language – a way for us to communicate with the computer and give it instructions.

Just like any language, Python has a set vocabulary made up of words it can understand, and a syntax which provides the rules for how to structure our commands and give instructions.

### Math
Python is a great language for math: its expressions are easy to read and look very similar to what you would type into a scientific calculator!

- `+` Acts as the addition operator
- `-` Acts as the subtraction operator and also as a negative sign when placed directly before a number (e.g., `5 - 2` is subtraction, while `-2` is negative two)
- `*` Acts as the multiplication operator
- `**` Acts as the exponentiation operator
- `/` Acts as the division operator
- `()` Acts as the grouping operator

There are two main types of numbers in Python: integers, also known as `int` (e.g., `4`, `7`, `15`, `2354`), and decimal numbers, also known as `float` (e.g., `13.0`, `14.5`, `2.731`, `3.1415`). 

When using the `/` operator, even if the result is a whole number, the result will be a `float`. For example, `10 / 5` is `2.0`, not `2`.
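The difference between `int` and `float` results can be checked with the built-in `type` function. This is a short sketch; the variable names are illustrative only.

```python
whole = 10 / 5      # division with / always produces a float
print(whole)        # 2.0
print(type(whole))  # <class 'float'>

count = 10 - 5      # adding, subtracting, or multiplying two ints keeps an int
print(count)        # 5
print(type(count))  # <class 'int'>

mixed = 2 + 1.5     # mixing an int with a float produces a float
print(mixed)        # 3.5
```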

> There are other, more advanced operators such as [floor division](https://www.pythontutorial.net/advanced-python/python-floor-division/) and [modulo](https://realpython.com/python-modulo-operator/). We won't be using them for the time being, so feel free to explore them if you'd like, but they are out of scope for this material.

Let's look at some examples of using these operators. As usual, feel free to play around with these cells or even add new ones to explore how these operations work!

In [None]:
1 + 3

In [None]:
1 + 3 - 4 + 5

In [None]:
1 + 10 / 2 + 7

In [None]:
3 * 4

In [68]:
5 ** 2

25

In [69]:
2 ** 3

8

In [70]:
4 ** .5

2.0

In [None]:
6 + 4 * 2 - 15

In [None]:
(6 + 4) * 2 - 15

In [None]:
5 * 10 / 10

In [None]:
2 - 100 * -2

### Strings
Strings are what we call words or text in Python. A string is surrounded by either single ('') or double ("") quotes. Here are some examples of strings:

In [None]:
"This is a string"

In [None]:
'This is too'

In [None]:
"eVeNtHiSiSaStRiNg"

You can even do math with strings!
<div class="alert alert-block alert-success">
    <p style="font-size:20px">These examples are advanced/optional
</div>

In [None]:
'Ha' * 10

In [None]:
"Ha" + " " + "Ha" + "     " + "Ha"

### Errors
Errors in programming are common and totally okay! Don't be afraid when you see an error, because more often than not the solution lies in the error message itself! Let's see what an error looks like. **Run the cell below to see the output.**

In [None]:
print('This line is missing something.' #we are missing a closing parentheses here!

The last line of the error output attempts to tell you what went wrong. 

The *syntax* of a language is its structure, and this `SyntaxError` tells you that you have created an illegal structure.  

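`EOF` means "End of File," so the message is saying Python expected you to write something more (in this case, a right parenthesis) before finishing the cell. Depending on your version of Python, the message may instead say that `'('` was never closed, which points to the same problem.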

There's a lot of terminology in programming languages, but you don't need to know it all in order to program effectively. If you see a cryptic message like this, you can often get by without deciphering it.  (Of course, if you're frustrated, you can usually find out by searching for the error message online.)

### Variables

In this Jupyter Notebook you will be assigning data, figures, numbers, text, or other objects to **variables**. 

You can even assign graph output or functions to variables, but that is out of scope for this assignment so don't worry about it for now! 

Variables are stored in a computer's memory, and you can use them over and over again in future calculations!

Sometimes, instead of trying to work with raw information all the time in a long calculation you will want to store it as a **variable** for easy access in future calculations. **Check out how we can use variables to our advantage below!**

<div class="alert alert-block alert-warning">
<b>Note:</b> In Python, variable names must be a combination of letters (capital and/or lowercase), numbers, and underscores ( _ ). <b>Variable names <i>cannot</i> begin with a number</b>
</div>

- The following are all valid variable names:
    - `pants`, `pan_cakes`, `_`, `_no_fun`, `potato940`, `bowser_32`, `FOO`, `BaR`, `bAr`
    
- These are invalid names: 
    - `123`, `1_fun`, `f@ke`, `fun time`, `fun_times!!`, `00f00`
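One way to check these naming rules yourself is the string method `str.isidentifier()`, which reports whether a piece of text follows Python's spelling rules for names. This is only a sketch: the method checks spelling alone, so reserved words such as `for` also pass it even though they cannot be used as variable names.

```python
# Candidate names copied from the two lists above.
valid_examples = ["pants", "pan_cakes", "_", "_no_fun", "potato940", "bowser_32", "FOO", "BaR", "bAr"]
invalid_examples = ["123", "1_fun", "f@ke", "fun time", "fun_times!!", "00f00"]

for name in valid_examples + invalid_examples:
    # Prints True for the valid examples and False for the invalid ones.
    print(name, "->", name.isidentifier())
```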

## Assignment Statements
Here is an example of what we call an _assignment statement_, which is what we use to create a variable:

In [36]:
x = 1 + 2 + 3 + 4 

The first part of the statement is the **variable name**, in this case `x`. Then we write an equals sign (`=`). The final part, on the right-hand side, is **the value** of the variable, in this case `1 + 2 + 3 + 4`. 

Also, notice that there was no output; an assignment statement has no output value! But, if we run a cell with the variable name itself, we can see the output, which is the value of `x`.

In [40]:
x #just run this cell

10

You can also use existing variables when assigning new ones, as follows:

In [41]:
y = (x + 3 + x + 2) * 4
y

100

### Variable Examples
Let's look at a couple examples of when using variables can help us immensely!

#### Example 1: Seconds in a Year
Let's say we want to find out how many seconds are in a year. We could calculate it directly as follows: $$60 \cdot 60 \cdot 24 \cdot 365$$ However, someone reading this may not understand it, and we might want to use this information for further calculations. Let's see how we can do this using variables!

In [9]:
days = 365 #the days in a year
hours = 24 #the hours in a day
minutes = 60 #the minutes in an hour
seconds = 60 #the seconds in a minute
seconds_per_year = days * hours * minutes * seconds #the seconds in one year
seconds_per_year

31536000

This method is far easier to understand, and we can use our new variable `seconds_per_year` to answer other questions! Let's say we wanted to find the number of seconds in half a year, $7$ years, $234$ years, or even $3.1415$ years, as follows!

In [10]:
print("Seconds in half a year:", seconds_per_year / 2)
print("Seconds in seven years:", seconds_per_year * 7)
print("Seconds in two hundred and thirty-four years:", seconds_per_year * 234)
print("Seconds in 3.1415 years:", seconds_per_year * 3.1415)

Seconds in half a year: 15768000.0
Seconds in seven years: 220752000
Seconds in two hundred and thirty-four years: 7379424000
Seconds in 3.1415 years: 99070344.0


As you can see, we don't need to redo the problem four times to get all the results; we just need to use the result we already calculated! (Cool, right?)

#### Example 2: Equation of a Line
If you haven't done algebra in a while, for reference, the equation of a line is $$y = mx + b$$ We can use variables to easily calculate the $y$ for any $x$, and we can set $m$ (the slope) and $b$ (the intercept).

In [11]:
m = 1 #the slope
b = 5 #the intercept
x = 4 # try changing this value to see how the output changes!
y = m * x + b
print(f"On the line y = {m}x + {b}, at x = {x}, y is equal to {y}")

On the line y = 1x + 5, at x = 4, y is equal to 9


## Lists
Variable values may be more sophisticated. We can store multiple numbers under a single name if we make the value a list. The following cell stores 3 numbers in a list:

In [None]:
y = [4,9,16]

You can add values to a list or even take them out! This can be useful when you want to do a lot of calculations and keep them under one variable. See the following example for one such reason why lists are helpful. In this example, we repeatedly add another element to the list, which is the sum of all of the numbers already in the list.

<div class="alert alert-block alert-success">
    <p style="font-size:20px">This example is advanced/optional
</div>

In [14]:
running_totals = [1]
running_totals = running_totals + [sum(running_totals)]
running_totals = running_totals + [sum(running_totals)]
running_totals = running_totals + [sum(running_totals)]
running_totals = running_totals + [sum(running_totals)]
running_totals = running_totals + [sum(running_totals)]
running_totals = running_totals + [sum(running_totals)]
running_totals

[1, 1, 2, 4, 8, 16, 32]
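The text above mentions that values can be added to a list or taken out of it. Here is a minimal sketch using the built-in list methods `append`, `remove`, and `pop` (the list name `squares` is just for illustration):

```python
squares = [4, 9, 16]

squares.append(25)    # add a value to the end of the list
print(squares)        # [4, 9, 16, 25]

squares.remove(9)     # take out the first value equal to 9
print(squares)        # [4, 16, 25]

last = squares.pop()  # take out and return the last value
print(last)           # 25
print(squares)        # [4, 16]
print(len(squares))   # 2
```

Unlike the `running_totals + [...]` pattern above, which builds a new list each time, these methods change the existing list in place.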

<h2>Loops</h2>

<div class="alert alert-block alert-success">
    <p style="font-size:20px">This section is advanced/optional
</div>

The code above looks a little repetitive. Instead of typing all of that out, we can use what is called a <i>for loop</i> to have the computer do it for us! The code in the cell below does exactly the same thing as the cell above, just in a shorter way!

In [15]:
running_totals = [1]
for _ in range(6): # do the intended action 6 times
    running_totals = running_totals + [sum(running_totals)]
running_totals

[1, 1, 2, 4, 8, 16, 32]
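To see what the loop is counting through, it can help to look at `range(6)` directly. In this short sketch the names `step` and `total` are illustrative. The `_` used in the loop above is an ordinary variable name that, by convention, signals the value is not needed inside the loop.

```python
print(list(range(6)))     # [0, 1, 2, 3, 4, 5]

total = 0
for step in range(6):
    total = total + step  # step takes the values 0, 1, 2, 3, 4, 5 in turn
print(total)              # 15
```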

## Functions
We've seen that values can have names (often called **variables**), but operations may also have names. A named operation is called a **function**. Python has some functions built into it.

In [16]:
round # a built-in function 

<function round(number, ndigits=None)>

If you put a question mark (`?`) after the name of a function, you can see some documentation on the function, such as what it is supposed to do!

In [17]:
round?

Signature: round(number, ndigits=None)
Docstring:
Round a number to a given precision in decimal digits.

The return value is an integer if ndigits is omitted or None.  Otherwise
the return value has the same type as the number.  ndigits may be negative.
Type:      builtin_function_or_method


Functions get used in *call expressions*, where a function is named and given values to operate on inside a set of parentheses. The `round` function returns the number it was given, rounded to the nearest whole number.

In [18]:
round(1988.74699) # a call expression using round

1989

A function may also be called on more than one value (called *arguments*). For instance, the `min` function takes however many arguments you'd like and returns the smallest. Multiple arguments are separated by commas.

In [19]:
min(9, -34, 0, 99)

-34
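The documentation for `round` shown earlier lists a second, optional argument called `ndigits`. As a brief sketch of calling one function with different numbers of arguments:

```python
print(round(1988.74699))      # 1989    (no ndigits: the result is an integer)
print(round(1988.74699, 2))   # 1988.75 (keep 2 digits after the decimal point)
print(round(1988.74699, -2))  # 2000.0  (negative ndigits rounds to the nearest hundred)
```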

### User-Defined Functions

<div class="alert alert-block alert-success">
    <p style="font-size:20px">This section is advanced/optional
</div>

One of the most useful features in Python is the ability to define your own functions using what we call a `def` statement. 

Here is an example of one such function based on our earlier example of seconds in one year:

In [23]:
def seconds(x):
    """Returns the number of seconds in `x` years"""
    days = 365
    hours = 24
    minutes = 60
    seconds = 60
    per_year = days * hours * minutes * seconds
    return per_year * x

Now we can use this function just like a built-in function!

In [24]:
seconds

<function __main__.seconds(x)>

In [25]:
seconds?

Signature: seconds(x)
Docstring: Returns the number of seconds in `x` years
File:      /var/folders/3k/k6qjt0ds37g89ylq_8dq2wwh0000gn/T/ipykernel_6474/789620005.py
Type:      function


In [30]:
print("Seconds in one year:", seconds(1))
print("Seconds in 57 years:", seconds(57))
print("Seconds in 6.022 years:", round(seconds(6.022)))
print("Seconds since Berkeley was founded (the year 1868):", seconds(2022-1868))

Seconds in one year: 31536000
Seconds in 57 years: 1797552000
Seconds in 6.022 years: 189909792
Seconds since Berkeley was founded (the year 1868): 4856544000
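As a second sketch of a `def` statement, here is the line equation from Example 2 written as a function with more than one argument. The name `line_y` and the order of its arguments are choices made for this illustration, not something defined elsewhere in the notebook.

```python
def line_y(x, m, b):
    """Returns y on the line y = m*x + b at the given x (illustrative helper)."""
    return m * x + b

print(line_y(4, 1, 5))      # 9, matching Example 2
print(line_y(0, 1, 5))      # 5, the intercept
print(line_y(4, m=2, b=5))  # 13, arguments can also be passed by name
```

Passing arguments by name, as in the last call, can make it clearer which number is the slope and which is the intercept.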


### Practice

<div class="alert alert-block alert-warning">
<ul>
    <li>The <code>abs</code> function takes one argument (just like <code>round</code>)</li>
    <li>The <code>max</code> function takes one or more arguments (just like <code>min</code>)</li>
</ul>



Try calling <code>abs</code> and <code>max</code> in the cell below. What does each function do?

Also try calling each function <i>incorrectly</i>, such as with the wrong number of arguments. What kinds of error messages do you see?
</div>

In [None]:
... # replace the ... with calls to abs and max

<hr style="border: 2px solid #003262">
<hr style="border: 2px solid #C9B676">

### Getting Started: Setting Up an Environment

Now that we've covered our bases with regards to the platform we'll be working on for this assignment, let's load some **libraries** we need to explore the data we are working with. Python **libraries** are extra packages we can load to help use tools that are not otherwise available. These can include visualization libraries such as `matplotlib` or numerical tools like `numpy`. You can see how we load these libraries below:

In [165]:
from datascience import * # This loads tools from the datascience library
import numpy as np # Loads numerical methods
import math, random #Loads math and random functions

import otter #This is so we can get a clean export PDF to turn in
generator = otter.Notebook()

For your reference, we'll break down one line of the large cell above. In this line:

```python
import numpy as np
```

We import the `numpy` library. Essentially, we're telling Python that we want to use a specific set of functions that has been named "`numpy`". We then use the `as` keyword to specify that we want to use some other name to refer to "`numpy`"; in this context we told Python we want to call it "`np`".

Let's say we want to use the `numpy.mean` function (this takes the mean of a list of numbers). Because we imported "`numpy`" as "`np`", we can now call `np.mean()`.

Note: We name `numpy` as "`np`" because it is the standard name in the industry; however, we could have said:

```python
import numpy as harrystyles
```
This would've allowed us to run `harrystyles.mean()`. However, this would be confusing for a reader, so it's considered best practice to use standard names such as "`np`" for "`numpy`", so that different people can all read your work and not be confused as to what function you are using or what module it came from.

The cell below runs what we just talked about. Feel free to play around and try things out!

In [193]:
import numpy as harrystyles
print("Harry Styles is NumPy?", harrystyles == np)
numbers = [1, 2, 3, 4, 5, 6, 7, 8]
print(f"Harry Styles' mean of {numbers}:", harrystyles.mean(numbers))
print(f"NumPy's mean of {numbers}:", np.mean(numbers))

# this line deletes the silly import of `harrystyles` that we just made
del harrystyles

Harry Styles is NumPy? True
Harry Styles' mean of [1, 2, 3, 4, 5, 6, 7, 8]: 4.5
NumPy's mean of [1, 2, 3, 4, 5, 6, 7, 8]: 4.5
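For comparison, here is a short sketch of the three common forms of the `import` statement, using the standard `math` module so that it runs without any extra libraries. The alias `mt` is made up for this illustration and is not a standard abbreviation.

```python
import math                 # use as math.factorial(...)
import math as mt           # the same module under an alias (illustrative, nonstandard)
from math import factorial  # import a single function by name

print(math.factorial(5))    # 120
print(mt.factorial(5))      # 120
print(factorial(5))         # 120
print(mt is math)           # True: both names refer to the same module
```

The `from ... import` form is related to the `from datascience import *` line in the setup cell, where `*` asks for all of the library's public names at once.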


#### Dot Notation
Python has a lot of [built-in functions](https://docs.python.org/3/library/functions.html) (that is, functions that are already named and defined in Python), but even more functions are stored in collections called *modules*. Earlier, we imported the `math` module so we could use it later. Once a module is imported, you can use its functions by typing the name of the module, then the name of the function you want from it, separated with a `.`.

<div class="alert alert-block alert-info">
<p style='font-size:18px'><b>Tip:</b> If you type the name of a <i>module</i>, but can't remember the name of the function you're looking for, type a dot <code>.</code>, then press the <code>Tab</code> key to bring up an auto-complete menu to help you find the function you're looking for!
    </p>
</div>

In [163]:
math.factorial(5) # a call expression with the factorial function from the math module

120

Many math operations can be applied to whole lists at once using `numpy`. Try calling `np.sqrt` on `y`, which you saved as the list `[4, 9, 16]`. (The `math` version, `math.sqrt`, only accepts a single number.)

Operations like `np.sqrt` output an array with the same length as the input. Others reduce a list to a single number: try calling `sum()` on `[4, 9, 16]`.
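One caveat worth sketching: `math.sqrt` accepts a single number, so calling it directly on a list raises a `TypeError`. The sketch below shows that error and one plain-Python way to take the square root of each element. `np.sqrt(y)` from `numpy` does the element-wise version in a single call.

```python
import math

y = [4, 9, 16]

try:
    math.sqrt(y)  # math.sqrt expects one number, not a list
except TypeError as error:
    print("math.sqrt(y) failed:", error)

roots = [math.sqrt(value) for value in y]  # apply math.sqrt to each element in turn
print(roots)   # [2.0, 3.0, 4.0]
print(sum(y))  # 29
```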

<!-- BEGIN QUESTION -->

<div class="alert alert-block alert-info">
<b>Question 1:</b>
`math` also has a function called `sqrt` that takes one argument and returns the square root. Call `sqrt` on 16 in the next cell.
</div>

<!--
BEGIN QUESTION
name: q1
points: 1
manual: true
-->

In [164]:
... # Replace the ... with a call to math.sqrt() to get the square root of 16

Ellipsis

**Answer here** *Double click to edit this markdown cell with your answer*

<!-- END QUESTION -->

### Random numbers and sampling
Random sampling plays a key role in data science. The random module implements functions for random sampling and random number generation. For example, the cell below generates a random integer between 1 and 50. 

Note that any whole number between 1 and 50 has an equal probability of being selected --- the sampling probabilities are uniform.

Try running this cell multiple times by holding `Command` (Mac) or `Control` (Windows) and pressing `Enter` repeatedly; notice how the output changes even though the code stays the same.

In [162]:
random.randint(1, 50) # the result is a whole number from 1 to 50 (both included) and changes from run to run
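A small sketch for checking the claim about the range: draw many numbers and confirm that every one falls between 1 and 50. Calling `random.seed` first makes the sequence repeatable, which is handy when you want someone else to get the same "random" results. The seed value 42 is an arbitrary choice.

```python
import random

random.seed(42)  # fixing the seed makes the sequence of draws repeatable
draws = [random.randint(1, 50) for _ in range(1000)]

print(min(draws), max(draws))            # the smallest and largest values drawn
print(all(1 <= d <= 50 for d in draws))  # True: both endpoints are allowed, nothing outside them
```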

<hr style="border: 2px solid #003262">
<hr style="border: 2px solid #C9B676">



## Tables

In most cases, when interacting with data you will be working with **tables**. In this section, we will cover how to examine and manipulate data using Python. 

**Tables** are the fundamental way we organize and display data. 
**Run the cell below to load a dataset.** We'll be working with this data in a future notebook.

In [183]:
prisons = Table.read_table("data/monthly_cdcr.csv") #Here we see an assignment statement
prisons

| year | month | institution_name | population_felons | civil_addict | total_population | designed_capacity | percent_occupied | staffed_capacity |
|---|---|---|---|---|---|---|---|---|
| 1996 | 1 | VSP (VALLEY SP) | 2294 | 0 | 2294 | 1980 | 115.9 | 1980 |
| 1996 | 1 | SCC (SIERRA CONSERVATION CENTER) | 322 | 0 | 322 | 320 | 100.6 | 320 |
| 1996 | 1 | NCWF (NO CAL WOMEN'S FACIL) | 786 | 4 | 790 | 400 | 197.5 | 760 |
| 1996 | 1 | CCWF (CENTRAL CA WOMEN'S FAC) | 2846 | 13 | 2859 | 2004 | 142.7 | 3224 |
| 1996 | 1 | CRC (CAL REHAB CTR, WOMEN) | 91 | 703 | 794 | 500 | 158.8 | 842 |
| 1996 | 1 | CIW (CA INSTITUTION FOR WOMEN) | 1690 | 36 | 1726 | 1026 | 168.2 | 1646 |
| 1996 | 1 | WSP (WASCO SP) | 4475 | 62 | 4537 | 2484 | 182.6 | 4484 |
| 1996 | 1 | SCC (SIERRA CONSERVATION CENTER) | 6010 | 0 | 6010 | 3606 | 166.7 | 5884 |
| 1996 | 1 | SRTA (SANTA RITA CO. JAIL-RC) | 811 | 0 | 811 | 395 | 205.3 | 750 |
| 1996 | 1 | RJD (RJ DONOVAN CORRECTIONAL FAC) | 4577 | 0 | 4577 | 2200 | 208.0 | 4566 |


This table is organized into **columns**, one for each category of information collected. You can also think about the table in terms of its rows, where each row represents all the information collected about a particular instance, in this case a state prison in a given month. By default, only the first 10 rows are shown.

**Table Attributes**

Every table has **attributes** that give information about the table, such as the number of rows and the number of columns. Attributes you'll use frequently include `num_rows` and `num_columns`, which give the number of rows and columns in the table, respectively. These are accessed using **dot notation**. Because they are attributes rather than functions, we don't put parentheses after them as we did with `print` in our Hello World example earlier.
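To make the idea of rows and columns concrete, here is a plain-Python sketch that counts them for three rows copied from the table above, using only the standard `csv` module. It illustrates what `num_rows` and `num_columns` report; it is not how the `datascience` library computes them.

```python
import csv
import io

# Three rows copied from the table displayed above, written as CSV text.
sample = """year,month,institution_name,population_felons,civil_addict,total_population,designed_capacity,percent_occupied,staffed_capacity
1996,1,VSP (VALLEY SP),2294,0,2294,1980,115.9,1980
1996,1,NCWF (NO CAL WOMEN'S FACIL),786,4,790,400,197.5,760
1996,1,"CRC (CAL REHAB CTR, WOMEN)",91,703,794,500,158.8,842
"""

rows = list(csv.reader(io.StringIO(sample)))
header, records = rows[0], rows[1:]

num_columns = len(header)  # one entry per column label
num_rows = len(records)    # the header line is not counted as a row

print(num_columns)  # 9
print(num_rows)     # 3
```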

In [184]:
prisons.num_columns # Get the number of columns

9

In [186]:
prisons.num_rows # Get the number of rows

9501

<!-- BEGIN QUESTION -->

<div class="alert alert-block alert-info">
<b>Question 2:</b>
Observe the output of the cell above. How many state prisons are included in our data set?
</div>

<!--
BEGIN QUESTION
name: Q2
points: 1
manual: true
-->

**Answer here** *Double click to edit this markdown cell with your answer*

<hr style="border: 2px solid #003262">
<hr style="border: 2px solid #C9B676">

## Notebooks in Practice

With proprietary software like *Mathematica*, users are supposed to trust the results returned and are unable to check the code. In contrast, *Jupyter* fosters **transparency** and hence encourages **reproducibility**, which refers to the ability to reproduce the results of a scientific study. Not only is the code behind the software available for anyone to inspect or tinker with, but code in the notebooks can also be examined or re-run to reproduce the results. 

[Theodore Gray](http://home.theodoregray.com/), the co-founder of [Wolfram Research](https://en.wikipedia.org/wiki/Wolfram_Research) who was also involved in creating the *Mathematica* interface, said about *Jupyter*,
>*"I think what they have is **acceptance from the scientific community** as a tool that is considered to be **universal**."*

In other words, *Jupyter Notebooks* support the computational work of researchers from different fields, from astronomy to psychology to literature, and therefore enable new ways for researchers in very different domains to **share research tools, methods, and learn from one another**.

The versatility of the Notebook also has important consequences for data science and the workflows that are involved when working with data in settings other than research, such as for **education and community science projects**. The process of working with data can be messy and nonlinear, which a Jupyter notebook can handle well because of its **flexibility** (though this messiness is often reflected in the resulting notebook!). 

The power of the notebook lies in its ability to include a **variety of media** with the computation as a means to maintain **accountability, integrity, and transparency** for both the author of the notebook and the audiences that you share your work with. 

**In a world in which algorithms and data analysis inform many aspects of life and where computation is getting more and more abstract, the ability to understand and reason about computational work is more important than ever.**