# STA 141B Lecture 2

The class website is <https://github.com/2019-winter-ucdavis-sta141b/notes>

## Today's References

* Python for Data Analysis, Ch. 1-3
* [Python Data Science Handbook][PDSH], Ch. 1
* [ProGit][], Ch. 1-2

The [Git cheatsheet](https://services.github.com/on-demand/downloads/github-git-cheat-sheet.pdf) is also helpful.

[PDSH]: https://jakevdp.github.io/PythonDataScienceHandbook/
[ProGit]: https://git-scm.com/book/

## Recap

In Tuesday's lecture:

* Jupyter Notebooks
* `print()`
* Python data types and `type()`
* Variables
* `def` to define functions
* Getting help with `help()`, `?`, `??`

Today:

* More about Jupyter
* Tuples, lists, and dictionaries
* Methods and attributes
* Control flow with `if`, `while`, `for`
* Modules
* git

## More about Jupyter

Jupyter breaks sections of the notebook into _cells_. You can choose the type of cell in the `Cell -> Cell Type` menu. Use "Code" for cells that contain code and "Markdown" for cells that contain text or images.

Code cells are set up to run Python code. When you open a Jupyter notebook, Jupyter starts a Python session called a _kernel_ in the background. Each time you run a code cell, the code is sent to the kernel, and then the results are displayed in the notebook. The kernel keeps its state between cells, so code you run in one cell can affect code you run in another cell.

__Caution!__ The state of the kernel depends on the order you run cells in, not the order cells appear in the notebook.

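You can interrupt or restart the kernel using the `Kernel` menu. Interrupting is mostly useful when you want to cancel a long-running computation. Restarting also clears the kernel's state, so every variable you defined is gone.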

Markdown cells allow you to input text and format it using the Markdown language. To learn more, see this [Markdown cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet).

## Python Data Structures

_Data structures_ are containers for data. Examples include vectors, lists, and data frames.

We'll use Python's built-in tuple, list, and dictionary data structures a lot.

### Tuples

A tuple is an ordered collection of values. Think of coordinates. Tuples are _immutable_, which means they can't be changed after they're created.

```python
# Make a tuple with commas ,
x = 1, 3, 5
```

```python
# Use parentheses ( ) for clarity
y = ("hi", 1, 3.7)
```

Three ways to get elements from a tuple:

```python
# 1. Indexing with [ ]. Indexes start from 0, not 1
x[0]  # first element
```

```
1
```

```python
# 2. Slicing with [ ] and :. A slice a:b gets the elements at indexes a through b - 1.
x[0:2]
```

```
(1, 3)
```

```python
# 3. Unpacking: assign to a tuple with the same number of variables on the left-hand side
u, v, w = x  # x is (1, 3, 5)
print(u)
print(v)
print(w)
```

```
1
3
5
```
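Unpacking also gives a compact way to swap two variables, because Python builds the tuple on the right-hand side before assigning anything. If the number of variables on the left doesn't match the number of elements, Python raises a `ValueError`. A small sketch (the variable names are only for illustration):

```python
a = 1
b = 2
# The right-hand side builds the tuple (2, 1) first, then unpacks it into a and b
a, b = b, a
print(a, b)  # 2 1

# Unpacking needs exactly as many variables as there are elements
point = (1, 3, 5)
try:
    first, second = point  # two names, three elements
except ValueError as err:
    print("ValueError:", err)
```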


```python
# Tuples can't be changed
x[1] = 3
```

```
TypeError: 'tuple' object does not support item assignment
```
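Since a tuple can't be changed, the usual workaround is to build a new tuple from pieces of the old one and reassign the variable to it. A minimal sketch, using slicing and `+` to join tuples:

```python
x = (1, 3, 5)
# (7,) is a one-element tuple; the trailing comma is what makes it a tuple
x = x[0:1] + (7,) + x[2:3]
print(x)  # (1, 7, 5)
```

The original tuple `(1, 3, 5)` is never modified; `x` simply refers to a new tuple afterwards.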

### Lists

A list is an ordered collection of values. Lists are _mutable_, which means they __can be changed__ after they're created.

Lists are somewhat less efficient than tuples (for example, they take up more memory), but they're more flexible.
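One way to see the efficiency difference is `sys.getsizeof()`, which reports the size in bytes of the container itself (not the values inside). The exact numbers depend on your Python version and computer, so treat this as a rough illustration. The flexibility shows up as the ability to grow a list in place with the `.append()` method (methods are covered below); a tuple has no such method.

```python
import sys

tup = (1, 3, 5)
lst = [1, 3, 5]
# Container sizes in bytes; the list needs extra bookkeeping so it can grow
print("tuple:", sys.getsizeof(tup))
print("list: ", sys.getsizeof(lst))

# A list can grow after it's created
lst.append(7)
print(lst)  # [1, 3, 5, 7]
```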

```python
# Make a list with square brackets [ ]
x = [1, 3, 5]
```

```python
# Three ways to get elements from a list (same as a tuple):
x[0]
```

```
1
```

```python
x[1:3]
```

```
[3, 5]
```

```python
u, v, w = x
print(u)
print(v)
print(w)
```

```
1
3
5
```


```python
# Delete a list element with del
del x[1]
print(x)
```

```
[1, 5]
```


Lists use _reference semantics_, which means that if you assign a list to two different variables, there's still only one list in memory, and both variables refer to it.

As a result, changing the list with one variable changes the list for the other variable.

```python
x = [1, 3, 5]
y = x
y[1] = 7

x
```

```
[1, 7, 5]
```
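Python's `is` operator checks whether two variables refer to the very same object in memory, so it's a direct way to confirm that `x` and `y` share one list. Compare it with `==`, which only checks whether the contents are equal:

```python
x = [1, 3, 5]
y = x            # y refers to the same list as x
z = [1, 3, 5]    # z is a separate list with equal contents

print(x is y)  # True: one list, two names
print(x is z)  # False: two different lists
print(x == z)  # True: same contents

y[1] = 7
print(x)  # [1, 7, 5]
print(z)  # [1, 3, 5], unaffected because it's a different list
```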

### Dictionaries

A dictionary (or `dict`) maps _keys_ to _values_: each key appears at most once and is paired with exactly one value, although different keys can share the same value. In other words, you use a key to look up a value.

Dictionaries are mutable and use reference semantics.

```python
# Make a dict with curly brackets { } and colons :
x = {"hello": 1, 3: 5}
```

Two ways to get elements from a dict:

```python
# 1. Indexing with [ ]
x["hello"]
```

```
1
```

```python
# 2. With the .get() method. We'll learn more about methods below.
# The .get() method gets an element OR returns a default value if the key can't be found.

x.get("hello", 10)  # second argument (10) is the default value if the key is not in the dictionary
```

```
1
```

```python
x.get("goodbye", 10)
```

```
10
```

```python
# Can also use del to delete dict elements
del x["hello"]
x
```

```
{3: 5}
```
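To add a new entry or replace an existing one, assign to a key with `[ ]`. Because each key appears at most once, assigning to an existing key replaces its value, while different keys are free to share a value. A short sketch with made-up data:

```python
ages = {"ana": 20}   # hypothetical example data
ages["ben"] = 21     # new key: adds an entry
ages["ana"] = 22     # existing key: replaces the old value
ages["cat"] = 22     # a different key can map to the same value
print(ages)  # {'ana': 22, 'ben': 21, 'cat': 22}
```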

## Methods and Attributes

In Python, every number, string, data structure, function, and so on is an _object_. Each object has a type, and may have other objects stored inside.

A _method_ is a function stored inside another object. Methods usually operate on the object they're stored in, for example by changing it or by computing something from it.

An _attribute_ is a non-function object stored inside another object.

```python
# Use . to access methods and attributes
```

```python
# Use dir() to list methods and attributes
```
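A quick sketch of both ideas: `.append()` is a method of lists (called with parentheses), `.real` is an attribute of complex numbers (no parentheses), and `dir()` returns a list of the names stored inside an object. The full output of `dir()` is long and includes special names that start with `_`, so this example keeps only the other names.

```python
x = [1, 3, 5]
x.append(7)       # method: a function stored in the list, called with ( )
print(x)          # [1, 3, 5, 7]

z = 3 + 4j
print(z.real)     # attribute: no ( ) needed; prints 3.0

# dir() returns a list of names; keep only those without a leading underscore
public = []
for name in dir(x):
    if not name.startswith("_"):
        public.append(name)
print(public)
```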

For example, we can use the `.copy()` method of lists and dicts to make a copy.

```python
x = [1, 3, 5]
y = x.copy()
y[1] = 7
# Since we made a copy, x is unchanged.
x
```

```
[1, 3, 5]
```
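The same pattern works for dictionaries: changing the copy leaves the original alone.

```python
x = {"hello": 1, 3: 5}
y = x.copy()
y["hello"] = 100   # change the copy only
print(x)  # {'hello': 1, 3: 5}
print(y)  # {'hello': 100, 3: 5}
```

Keep in mind that `.copy()` only copies the outer container (a _shallow_ copy), so any lists or dicts nested inside are still shared between the original and the copy.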

## Control Flow

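Python's `if` statement allows us to change the behavior of our code depending on whether a condition is met. Conditions are usually Boolean expressions (type `bool`), such as `x > 10`. Python also accepts other values and converts them to `bool`; for instance, `0` and empty containers count as `False`.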

Indentation determines whether code is inside or outside of a control flow statement! Be careful to get it right!

```python
x = 1
if x > 10:
    print("x is greater than 10")
else:
    print("x is less than or equal to 10")

    print("This line is indented, so it's inside the if-statement and only runs when x is not greater than 10")

print("This line is not indented, so it's outside the if-statement and always runs.")
```

```
x is less than or equal to 10
This line is indented, so it's inside the if-statement and only runs when x is not greater than 10
This line is not indented, so it's outside the if-statement and always runs.
```


Use `elif` to add additional options that also have conditions:

```python
if x > 10:
    print("x is greater than 10")
elif x == 1:
    print("x is one!")
else:
    print("x is less than or equal to 10, and not 1")
```

```
x is one!
```


Python's `while` loop allows us to run code repeatedly while some condition is met.

```python
x = 0
while x < 10:
    x = x + 1
    print(x)
```

```
1
2
3
4
5
6
7
8
9
10
```
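A `while` loop is most useful when you don't know ahead of time how many repetitions you need. As a small sketch, this counts how many times 100 can be halved before it drops below 1:

```python
value = 100
steps = 0
while value >= 1:
    value = value / 2
    steps = steps + 1
print(steps)  # 7
print(value)  # 0.78125
```

The loop checks the condition before every repetition, so it stops as soon as `value` falls below 1.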


Python's `for` loop allows us to iterate over elements of a string, tuple, list, or other object.

Objects that can be iterated over are _iterable_. We'll learn more about iterables next week.

```python
for i in [1, 2, 3]:
    print(i)
```

```
1
2
3
```
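Any iterable can go after `in`. A sketch with a string, a tuple, and a dictionary (the prices are made-up data); looping over a dict gives its keys, which you can then use to look up values:

```python
for letter in "hi":
    print(letter)

total = 0
for num in (1, 3, 5):
    total = total + num
print(total)  # 9

prices = {"apple": 1.25, "pear": 0.5}  # hypothetical example data
for key in prices:
    print(key, prices[key])
```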


```python
# A weird way to convert to lowercase that shows a non-trivial loop:

for letter in 'STA 141B':
    # Computers compute on numbers, so each letter is represented by a number in memory.
    # ord() gets the number that represents a letter
    num = ord(letter)
    if 65 <= num <= 90:  # A-Z are represented by 65-90
        # a-z are represented by 97-122, so a 32 number offset
        new_letter = num + 32
        # chr() converts a number that represents a letter back to the letter
        new_letter = chr(new_letter)
    else:
        new_letter = letter

    print(new_letter)
```

```
s
t
a
 
1
4
1
b
```


```python
# In practice, we can just use a built-in method to convert to lowercase
'STA 141B'.lower()

# Behind the scenes, .lower() relies on the same idea as our loop above, but it's
# written in C and also handles letters outside A-Z, such as 'É'.
```

```
'sta 141b'
```
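To check that the loop above really matches `.lower()` for these characters, here's a sketch that wraps it in a function (the name `to_lower` is made up for this example) and builds a new string instead of printing one letter per line:

```python
def to_lower(text):
    """Lowercase A-Z only, using the ord()/chr() offset trick."""
    result = ""
    for letter in text:
        num = ord(letter)
        if 65 <= num <= 90:  # 'A' to 'Z'
            new_letter = chr(num + 32)  # shift to 'a' to 'z'
        else:
            new_letter = letter
        result = result + new_letter
    return result

print(to_lower("STA 141B"))  # sta 141b
print(to_lower("STA 141B") == "STA 141B".lower())  # True
```

The function only knows about A to Z, so `to_lower("É")` returns `'É'` unchanged, while `"É".lower()` returns `'é'`.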

## Modules

A _module_ is a text file that contains Python code, usually a `.py` file.

Python's `import` statement lets us load code from a module to use in our script or notebook. Note: `import` is like a combination of R's `source()` and `library()` functions.

Python provides many built-in modules for common tasks (see [the list][py-modules]). Packages provide even more modules. 

[py-modules]: https://docs.python.org/3/library/index.html

```python
# Use . to access functions and variables in a module
import math

math.pi
```

```
3.141592653589793
```

```python
# You can give imported modules an alias to cut down on typing
import math as m

m.pi
```

```
3.141592653589793
```
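Since a module is just a `.py` text file, you can write your own. The sketch below writes a tiny module to a temporary directory, adds that directory to `sys.path` (the list of places Python searches for modules), and imports it. The module name `greetings` and its `hello()` function are hypothetical; in practice you'd usually put your `.py` file in the same directory as your notebook and skip the `sys.path` step.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a small module file to a fresh temporary directory
module_dir = Path(tempfile.mkdtemp())
(module_dir / "greetings.py").write_text(
    'def hello(name):\n    return "Hello, " + name + "!"\n'
)

# Let Python find modules in that directory, then import the new module
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()  # recommended after creating module files at runtime
import greetings

print(greetings.hello("STA 141B"))  # Hello, STA 141B!
```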

## Git

[git](http://git-scm.com/) is a distributed version control system. Let's break that down:

* _Distributed_ means every copy of a repo, on every computer, holds the full history of its files, and git can share changes between those copies.
* A _version control system_ is a tool to keep track of different versions or drafts of files.

With git, you can

* Get or send sets of files with a single command.
* Back up your work to another computer or server.
* Work collaboratively with others (git will help resolve editing conflicts).
* Undo changes to files or entire directories.

A collection of files tracked by git is called a _repository_ or _repo_. A repo looks like any other directory on your computer, but always contains a hidden `.git` directory to store git tracking info.

You've already used a git repo -- the class website.

### GitHub

The class website is hosted on [GitHub][], an online service for backing up and sharing git repos. You'll need a free GitHub account in order to submit assignments for this class.

[GitHub]: https://github.com/

### The Shell

We'll run git commands in the shell, a text-based program for interacting with computers. Git and the shell are not part of Python or Jupyter. To open a shell window:

* On Windows, run "Git Bash"
* On Mac OS, run "Terminal"
* On Linux, run your favorite terminal emulator. Mine is `st`.

You can use the shell to navigate and modify directories on your computer. Directories are like places, and the shell is always at one directory at a time, called the _working directory_. By default, when you run commands in the shell, they affect the working directory. 

The essential shell commands for navigation are:

* `pwd` to print the working directory path.
* `cd PATH` to change the working directory. Replace `PATH` with a path to a directory, or with `..` to go up one directory.
* `ls` to list files and directories in the working directory.
* `man COMMAND` to get help. Replace `COMMAND` with the name of a shell command.

To learn more, I recommend Software Carpentry's [Unix Shell Notes][swc-shell].

[swc-shell]: https://swcarpentry.github.io/shell-novice/

### Configuring git

The first time you use git, you need to set your name and email address so that you'll get credit for your work.

Replace my name with yours, and run
```sh
git config --global user.name "Nick Ulle"
```

Then replace my email with yours, and run
```sh
git config --global user.email naulle@ucdavis.edu
```

You can check the settings any time by running these commands with the last value omitted. For instance:
```sh
git config --global user.name
```

### Clone and Pull

Let's use git to download the class repo.

When you want to download a git repo __for the first time__, use `git clone URL`. Replace `URL` with the web URL of the repo. For GitHub repos, the URL is listed under the green "Clone or download" button on the repo's front page (newer versions of GitHub label this button "Code").

So to clone the class repo, run
```sh
git clone https://github.com/2019-winter-ucdavis-sta141b/notes.git
```
Now you have a _local_ copy of the repo, one that's on your computer. The copy on GitHub is _remote_, since it's not on your computer.

When someone else owns the remote repo, or when you work on a remote repo with other people, they might make changes after you've cloned the repo. For instance, I might upload some new notes to the class repo. You can use `git pull` to check for and download changes from the remote repo to your local repo.

### Add, Commit, Push

A typical git workflow is:

1. Clone a repo from a server (like GitHub) with `git clone`. This downloads remote -> local.
2. Make some changes to your local copy of the repo.
3. Tell git to track your changes with `git add`.
4. Tell git to save your changes with `git commit -m`.
5. Push the changes on your computer back to the server with `git push`. This uploads local -> remote.
6. Repeat 2-5 as many times as you like until finished.

Let's learn the add, commit, and push commands.

Git calls a record of changes a _commit_. A commit is similar to a snapshot or save point. Before you create a commit, you need to tell git which changes to record. Use `git add PATH` to tell git to record changes to a file. Replace `PATH` with the path to the file.

Once you're done adding files, it's time to create the commit. First, you can optionally run `git status` to check that the changes you meant to add were added.

If everything looks correct, use `git commit -m "MESSAGE"` to create a commit. Replace `MESSAGE` with a one-sentence message explaining what changed in the commit. The commit message is a reminder for you and anyone else using your repo, so make sure it's clear.

You can make as many commits as you want before pushing them back to the server. When you are ready to push them back to the server, make sure you are connected to the internet and then use `git push`.

There are lots of steps in this process, so there are lots of places where it can go wrong. __Pay attention to error messages__ and search online or ask on Piazza to get help!