In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab0-gettingstarted.ipynb")

# Lab 0: Getting started

This lab is meant to help you familiarize yourself with using the LSIT server and Jupyter notebooks. Some light review of NumPy arrays is also included.

### Objectives
* Keyboard shortcuts, running cells, and viewing documentation in Jupyter Notebooks
* Review functions, lists, and loops
* Review NumPy arrays: indexing, attributes, and operations on arrays

### Collaboration Policy

Data science is a collaborative activity. While you may talk with others about the course assignments, we ask that you **write your solutions individually** and do not copy them from others.

By submitting your work in this course, whether it is homework, a lab assignment, or a quiz/exam, you agree and acknowledge that **this submission is your own work and that you have read the policies regarding Academic Integrity**: https://studentconduct.sa.ucsb.edu/academic-integrity. The Office of Student Conduct has policies, tips, and resources for proper citation use, recognizing actions considered to be cheating or other forms of academic theft, and students' responsibilities. You are required to read the policies and to abide by them.

_**If you collaborate with others, we ask that you indicate their names on your submission.**_

## Jupyter

Jupyter notebooks are organized into 'cells' that can contain either text or code. For example, this is a text cell.

Technically, Jupyter is an application/interface that runs atop a kernel -- a programming-language-specific independent environment in which code cells are executed. This basic organization allows for interactive computing with text integration.

Selecting a cell and pressing `Enter` will enter **edit mode** and allow you to edit the cell. From edit mode, pressing `Esc` will revert to **command mode** and allow you to navigate the notebook's cells.

In edit mode, most of the keyboard is dedicated to typing into the cell's editor. Thus, in edit mode there are relatively few shortcuts. In command mode, the entire keyboard is available for shortcuts, so there are many more. Here are a few useful ones:

1. `Ctrl` + `Return` : *Evaluate the current cell*
2. `Shift` + `Return`: *Evaluate the current cell and move to the next*
3. `s` : *save the notebook*
4. `k` / `j` : *move up one cell / move down one cell*
5. `a` : *create a cell above*
6. `b` : *create a cell below*
7. `dd` : *delete a cell*
8. `z` : *undo the last cell operation*
9. `m` : *convert a cell to markdown*
10. `y` : *convert a cell to code*


Take a moment to find out what the following commands do:

* Cell editing: `x, c, v, z`
* Kernel operations: `i`, `0` (press twice)


In [67]:
# Practice the above commands on this cell




### Running Cells and Displaying Output


Run the following cell.  

In [68]:
print("Hello, World!")

In Jupyter notebooks, everything passed to `print` is displayed below the cell. In addition, the value of **only the last line** is displayed after the cell runs, provided that line is an expression; values of earlier lines are not shown unless they are printed.

In [69]:
"Will this line be displayed?"

print("Hello" + ",", "world!")

5 + 3

### Viewing Documentation

To output the documentation for a function, use the `help()` function.

In [70]:
help(print)
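
`help()` also works on functions you define yourself, where it prints the function's signature and docstring. A minimal sketch, in which `double` is a hypothetical function made up for illustration:

```python
def double(x):
    """Return twice the value of x."""
    return 2 * x

# help() prints the signature and the docstring written above
help(double)

# The docstring itself is stored on the function's __doc__ attribute
doc = double.__doc__
print(doc)
```

Writing a one-line docstring for each function you define makes `help()` and `Shift` + `Tab` useful for your own code as well.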

You can also use Jupyter to view function documentation inside your notebook. The function must already be defined in the kernel for this to work.

Below, click your mouse anywhere on `print()` and use `Shift` + `Tab` to view the function's documentation. 

In [71]:
print('Welcome to this course!')

### Importing Libraries

In this course, we will be using common Python libraries to help us retrieve, manipulate, and perform operations on data. By convention, we import all libraries at the very top of the notebook. There is also a set of standard aliases used to shorten the library names. Below are some of the libraries that you may encounter throughout the course, along with their respective aliases.

In [72]:
import pandas as pd
import numpy as np

## Practice

Most assignments for this class will be given as notebooks organized into explanations and prompts followed by response cells; you will complete assignments by filling in all of the response cells.

Many response cells are followed by a test cell that performs a few checks on your work. Please be aware that test cells don't always confirm that your response is correct or incorrect. They are meant to give you some useful feedback, but it's your responsibility to interpret the feedback -- please be sure to read and think about test output if tests fail, and make your own assessment of whether you need to revise your response.

Below are a few practice questions for you to familiarize yourself with the process. These assume familiarity with basic Python syntax and the NumPy package.

#### Question 1

Write a function `summation` that evaluates the following summation for $n \geq 1$:

$$\sum_{i=1}^{n} \left(i^3 + 5 i^3\right)$$

*Hint*: `np.arange(1, 6).sum()` will generate an array comprising $1, 2, \dots, 5$ and then add up the elements of the array. Note that `np.arange(5)` would instead start at 0 and produce $0, 1, \dots, 4$.
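
The behavior of `np.arange` at its endpoints is easy to check directly. The sketch below is only an illustration of the hint, not a solution to the question; the variable names are made up for this example.

```python
import numpy as np

# np.arange(5) starts at 0 and stops before 5
zero_based = np.arange(5)
print(zero_based)            # [0 1 2 3 4]

# np.arange(1, 6) starts at 1 and stops before 6
one_to_five = np.arange(1, 6)
print(one_to_five)           # [1 2 3 4 5]
print(one_to_five.sum())     # 15

# Arithmetic applies to every element, so terms can be built before summing
squares = np.arange(1, 4) ** 2
print(squares.sum())         # 14
```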

In [73]:
def summation(n):
    """Compute the summation i^3 + 5 * i^3 for 1 <= i <= n."""
    ...

In [None]:
grader.check("q1")

Use your function to compute the sum for...

In [77]:
# n = 2
...

In [78]:
# n = 20
...

#### Question 2

The core of NumPy is the array. Let's use `np.array` to create an array. It takes a sequence, such as a list or range (remember that list elements are included between the square brackets `[` and `]`, such as `[1, 5, 3]`).

Below, create an array containing the values 1, 2, 3, 4, and 5 (in that order) and assign it the name `my_array`.

In [79]:
my_array = ...

In [None]:
grader.check("q2_a")

NumPy arrays are integer-indexed by position, with the **first element indexed as position 0**. Elements can be retrieved by enclosing the desired positions in brackets `[]`.

In [83]:
my_array[3]

To retrieve consecutive positions, specify the starting index and the ending index separated by `:`, for instance, `arr[from:to]`. This syntax is **non-inclusive of the right endpoint**, meaning that the element at the starting index is included in the output but the element at the ending index is *not*.

In [84]:
my_array[2:4]
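
To see which positions a slice returns, it can help to use an array whose values differ from their positions. A small sketch, where `arr` is a hypothetical array chosen only for illustration:

```python
import numpy as np

# Hypothetical example array; values are deliberately different from positions
arr = np.array([10, 20, 30, 40, 50])

# Positions 1, 2, and 3 are returned; position 4 is excluded
middle = arr[1:4]
print(middle)        # [20 30 40]

# Omitting an endpoint extends the slice to that end of the array
first_two = arr[:2]
last_two = arr[3:]
print(first_two)     # [10 20]
print(last_two)      # [40 50]

# The number of elements returned is (ending index) - (starting index)
print(len(middle))   # 3
```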

In addition to the values in an array, we can access attributes such as the array's shape and data type. Attributes are retrieved by name using syntax of the form `array.attr`. Some useful attributes are:

* `.shape`, a tuple with the length of each array dimension
* `.size`, the total number of elements in the array (the product of the entries of `.shape`)
* `.dtype`, the data type of the entries (float, integer, etc.)

A full list of attributes is [here](https://docs.scipy.org/doc/numpy-1.15.0/reference/arrays.ndarray.html#array-attributes).

In [85]:
my_array.shape

In [86]:
my_array.size

In [87]:
my_array.dtype
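
For a one-dimensional array, the shape and size carry the same information. The difference shows up with more dimensions, as in this sketch using a hypothetical 2-by-3 array named `grid`:

```python
import numpy as np

# Hypothetical 2-by-3 array used only for illustration
grid = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

print(grid.shape)   # (2, 3): 2 rows and 3 columns
print(grid.size)    # 6: total number of elements, 2 * 3
print(grid.dtype)   # float64
print(len(grid))    # 2: len() gives the length of the first dimension only
```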

Arrays, unlike Python lists, **cannot store items of different data types**.

In [88]:
# A regular Python list can store items of different data types
[1, '3']

In [89]:
# Arrays will convert everything to the same data type
np.array([1, '3'])

In [90]:
# Another example of array type conversion
np.array([5, 8.3])
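
The `.dtype` attribute shows which common type NumPy chose. The sketch below inspects the two conversions above and, as an additional illustration, uses `astype` to convert back; the variable names are made up for this example.

```python
import numpy as np

mixed = np.array([1, '3'])       # the integer 1 becomes the string '1'
print(mixed.dtype.kind)          # U  (a Unicode string type)

floats = np.array([5, 8.3])      # the integer 5 becomes the float 5.0
print(floats.dtype)              # float64

# astype returns a converted copy; here the strings are parsed as integers
as_ints = mixed.astype(int)
print(as_ints.sum())             # 4
```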

Arrays are also useful in performing *vectorized operations*. Given two or more arrays of equal length, arithmetic will perform **element-wise computations** across the arrays. 

For example, observe the following:

In [91]:
# Python list addition will concatenate the two lists
[1, 2, 3] + [4, 5, 6]

In [92]:
# NumPy array addition will add them element-wise
np.array([1, 2, 3]) + np.array([4, 5, 6])
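
The same element-wise behavior applies to other arithmetic operators and to operations between an array and a single number. A brief sketch with two small arrays, `a` and `b`, chosen for illustration:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

product = a * b        # multiplies matching positions
powers = a ** 2        # squares each element
shifted = a + 10       # adds 10 to each element
print(product)         # [ 4 10 18]
print(powers)          # [1 4 9]
print(shifted)         # [11 12 13]

# By contrast, multiplying a Python list by 2 repeats the list
repeated = [1, 2, 3] * 2
print(repeated)        # [1, 2, 3, 1, 2, 3]
```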

#### Question 3

Given the array `random_arr`, assign `valid_values` to an array containing all values $x$ such that $2x^4 > 1$.
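
One common way to select values that satisfy a condition is boolean indexing: a comparison on an array produces an array of `True`/`False` values, which can then be used inside the brackets. The sketch below uses a hypothetical array and a different condition from the one in this question.

```python
import numpy as np

# Hypothetical values, unrelated to random_arr
values = np.array([0.2, 0.9, 0.5, 0.7])

# The comparison is applied element-wise and yields a boolean array
mask = values > 0.6
print(mask.tolist())       # [False, True, False, True]

# Indexing with the boolean array keeps only the positions marked True
selected = values[mask]
print(selected.tolist())   # [0.9, 0.7]
```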

In [93]:
# for reproducibility - setting the seed will result in the same random draw each time
np.random.seed(42)

# draw 60 uniformly distributed random numbers (floats) between 0 and 1
random_arr = np.random.rand(60)

# solution here
valid_values = ...

In [None]:
grader.check("q3")

#### A note on `np.arange` and `np.linspace`

Usually we use `np.arange` to return an array that steps from `a` to `b` with a fixed step size `s`. While this is fine in some cases, we sometimes prefer to use `np.linspace(a, b, N)`, which divides the interval `[a, b]` into N equally spaced points.

`np.arange(start, stop, step)` produces an array with all the numbers starting at `start`, incremented by `step`, stopping **before** `stop` is reached. For example, the value of `np.arange(1, 6, 2)` is an array with elements 1, 3, and 5 -- it starts at 1 and counts up by 2, then stops before 6. `np.arange(4, 9, 1)` is an array with elements 4, 5, 6, 7, and 8. (It doesn't contain 9 because `np.arange` stops _before_ the stop value is reached.)

By default, `np.linspace` includes **both endpoints**, while `np.arange` does **not** include the second endpoint `b`. For this reason, especially when we are plotting ranges of values, we tend to prefer `np.linspace`.

Notice how the following two statements have different parameters but return the same result.

In [97]:
np.arange(-5, 6, 1.0)

In [98]:
np.linspace(-5, 5, 11)
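
The equivalence of the two calls, and the difference in how each treats the right endpoint, can be checked directly. A short sketch with variable names made up for this example:

```python
import numpy as np

by_step = np.arange(-5, 6, 1.0)      # fixed step of 1.0, stops before 6
by_count = np.linspace(-5, 5, 11)    # 11 points, both endpoints included

# Both arrays hold -5.0, -4.0, ..., 5.0
print(np.allclose(by_step, by_count))   # True

# Same interval [0, 1], different treatment of the right endpoint
with_stop = np.linspace(0, 1, 5)
without_stop = np.arange(0, 1, 0.25)
print(with_stop.tolist())       # [0.0, 0.25, 0.5, 0.75, 1.0]
print(without_stop.tolist())    # [0.0, 0.25, 0.5, 0.75]
```

With `np.linspace` you choose the number of points and the step size follows; with `np.arange` you choose the step size and the number of points follows.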

Check your understanding. Will `np.arange(1, 10)` produce an array that contains `10`? Add a cell below and check to confirm your answer.

# Submission


1. Make sure you **save the notebook** first.
2. Then go up to the `Kernel` menu and select `Restart & Clear Output` (make sure the notebook is saved first, because otherwise, you will lose all your work!). 
3. Now, go to `Cell -> Run All`. Carefully look through your notebook and verify that all computations execute correctly. You should see **no errors**; if there are any errors, make sure to correct them before you submit the notebook.
4. Then, go to `File -> Download as -> Notebook` and download the notebook to your own computer. This is your backup copy.
5. Export the notebook as HTML using the HTML_embed option, print the HTML file to PDF in Chrome on A4 paper, and upload the PDF to Gradescope.

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()