
# Lecture 2: Getting Started with Python and Jupyter Notebooks

## Python is an interpreted language

* Programs do not need to be compiled before they are executed

* Allows for rapid development and exploration (important for statistics/data science)



* You can run the Python interpreter directly from the command line by calling `python` or `python3`


* We will instead use Jupyter Notebook with a Python 3 kernel

* A Jupyter notebook is divided into cells

* Cells may be subdivided into an input cell and an output cell

* Input cells generally either contain code or text 
    * The Markdown language can be used to format text
    * LaTeX can be used to input math
* Cell type defaults to "Code" and can be changed to "Markdown" using the Cell->Cell Type menu or a keyboard shortcut (typically Ctrl-M followed by M)

## Heading level 2

* Item 1
    * Item 2
    
$$\int_0^\infty e^x~dx$$

* Cells also may contain "magics", which are commands to the Jupyter notebook server

* For instance, to determine which directory you are in, you can use the "%pwd" (print working directory) magic:


You can use the "%cd" magic to change your directory:
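The `%pwd` and `%cd` magics only work inside Jupyter/IPython. A plain Python script can get the same information from the standard-library `os` module. This is a small sketch of that equivalent, not what the magics do internally. It moves to the parent directory and then returns to where it started.

```python
import os

# Plain-Python counterpart of %pwd: the current working directory
start_dir = os.getcwd()
print(start_dir)

# Plain-Python counterpart of %cd: change to the parent directory...
os.chdir("..")
print(os.getcwd())

# ...and change back to where we started
os.chdir(start_dir)
```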

### Hello World!

To print output in Python, use the `print()` function. It can display many data types without explicit format specifiers (unlike `printf` in C).

* Code cells may contain multiple Python statements. 
* All statements in a cell will be run sequentially when the cell is run
* Cells may be run using the "play" button, via the Cell->Run Cells menu, or with a keyboard shortcut (usually Shift-Enter)
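Here is a minimal example cell that illustrates both points. `print()` handles values of several types in a single call. The statements run from top to bottom when the cell is run. The variable name is only illustrative.

```python
# print() converts each argument to text and separates them with spaces;
# no format string is required.
print("Hello World!")
print(42, 3.14, [1, 2, 3])    # 42 3.14 [1, 2, 3]

# Statements in one cell run in order, so name is defined before it is printed.
name = "Python"
print("Hello,", name)          # Hello, Python
```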

Anything that follows a # (hash) symbol is a comment:

```python
# It is important to use comments to document your thinking on big assignments
```

Python has no dedicated multi-line comment syntax (like `/* */` in C). One workaround is to write a multi-line string that is not assigned to any variable. Multi-line strings are delimited by triple quotes (`'''` or `"""`):
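A short sketch of this workaround: the unassigned string is evaluated and then discarded, and the code after it runs as usual.

```python
'''
This string is not assigned to any variable, so Python evaluates it
and immediately discards it; it effectively acts as a multi-line comment.
'''
x = 1          # code after the "comment" runs normally
print(x)       # 1
```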

When we define a function, a string placed as the first statement of its body serves as the docstring (documentation string) for that function.

```python
def myFunction(x):
    '''This is the function docstring
    This is the second line'''
    return x
```

```python
help(myFunction)
```

```python
# Jupyter/IPython-only syntax (not plain Python): shows the signature and docstring
myFunction?
```

## Python is a dynamically typed language

* Variable types are determined when they are assigned values

* Operations are resolved using the types of the values involved when the code runs, so Python will usually do the *right thing* based on the type of the variable
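A minimal illustration: the same name can hold values of different types over time, and `/` gives a true quotient even for two integers.

```python
x = 5
print(type(x))     # <class 'int'>
x = 5.0
print(type(x))     # <class 'float'>
x = "five"
print(type(x))     # <class 'str'>

print(7 / 2)       # 3.5: / on two ints still produces a float
print(7 // 2)      # 3: // performs floor division
```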

Just be careful of the implications of this:
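One possible pitfall, sketched below: an operator's meaning depends on the operand types. A type mismatch only shows up as an error when the line actually runs. These values are illustrative and not taken from the original cell.

```python
print(2 * 3)        # 6
print("2" * 3)      # 222: string repetition, not arithmetic

# Mixing incompatible types fails at run time
try:
    result = "2" + 3
except TypeError as err:
    print("TypeError:", err)

# Convert explicitly when you mean arithmetic
print(int("2") + 3)  # 5
```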

## Indentation conveys meaning in Python

* Use indentation to indicate code blocks that belong together
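A small example. The two indented lines form the body of the `for` loop. The unindented `print()` runs once, after the loop finishes.

```python
total = 0
for k in range(1, 4):
    square = k * k      # inside the loop: runs for k = 1, 2, 3
    total += square     # inside the loop
print("Sum of squares:", total)   # outside the loop: Sum of squares: 14
```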

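Ranges in Python start at 0 by default and exclude the end point. For example, `range(n)` yields 0, 1, ..., n-1, so when a range starts at 0, its end point equals the number of values it produces (such as the length of the sequence you are indexing).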
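A quick check of these rules. It also shows why `range(len(seq))` produces exactly the valid indexes of `seq`:

```python
print(list(range(5)))        # [0, 1, 2, 3, 4]: starts at 0, stops before 5
print(list(range(2, 5)))     # [2, 3, 4]
print(len(range(5)))         # 5: the end point equals the number of values

letters = ["a", "b", "c"]
for i in range(len(letters)):    # i takes the values 0, 1, 2
    print(i, letters[i])
```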

## Main Data Types

The main data types in Python are:
* numbers (int, float, complex)
* string
* list
* tuple
* dictionary

* All data types are objects, which means they have methods associated with them
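A few examples of calling methods on strings, lists, and floats with the `value.method()` syntax:

```python
s = "heads"
print(s.upper())             # HEADS
print(s.count("e"))          # 1

nums = [3, 1, 2]
nums.sort()                  # list method that sorts in place
print(nums)                  # [1, 2, 3]

print((2.5).is_integer())    # False: even floats have methods
```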

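* Methods whose names begin and end with double underscores (such as `__add__`) are special ("dunder") methods that Python calls behind the scenes. They are not meant to be called directly, but you can still call them: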

* Usually, you don't call these special methods directly, because operators and built-in functions achieve the same thing and are easier to read:
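A side-by-side sketch: each special method call next to the operator or built-in that normally invokes it.

```python
x = 3
print(x.__add__(4))      # 7: the special method behind x + 4
print(x + 4)             # 7: the readable, preferred form

print("abc".__len__())   # 3: the special method behind len("abc")
print(len("abc"))        # 3
```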

* Some data types in Python are immutable. They cannot be changed. These include numbers, strings, and tuples
* Lists and dicts are mutable. They can be changed
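A minimal sketch of the difference. Item assignment fails on a tuple but works on a list. The last lines build a tuple `a` and appear to extend it. The values are illustrative, not the original cell's.

```python
t = (1, 2, 3)
try:
    t[0] = 10                 # tuples do not support item assignment
except TypeError as err:
    print("TypeError:", err)

lst = [1, 2, 3]
lst[0] = 10                   # lists do
print(lst)                    # [10, 2, 3]

a = (1, 2, 3)
a += (4,)                     # looks like modifying a tuple...
print(a)                      # (1, 2, 3, 4)
```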

* How did `a` change if tuples are immutable?
* The tuple itself did not change. A new tuple was created by adding 4 to the end of the previous tuple, and the name `a` was updated to point to the new tuple
* How can we tell?
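One way to check, sketched here: compare object identities with `id()` or the `is` operator. A second name, `original`, keeps the old tuple alive so we can compare the two objects directly.

```python
a = (1, 2, 3)
original = a                  # a second name for the same tuple
print(id(a))
a += (4,)
print(id(a))                  # a different id: a now names a new object
print(a is original)          # False
print(original)               # (1, 2, 3): the old tuple is unchanged
```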

* Compare with lists:
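The same identity check on a list gives the opposite result, because `append()` modifies the existing object:

```python
b = [1, 2, 3]
original = b
b.append(4)                   # modifies the list in place
print(b is original)          # True: still the same list object
print(original)               # [1, 2, 3, 4]: the change is visible through both names
```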

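You can also add items to a list with `+`, which concatenates two lists into a new list (while `+=` extends the existing list in place):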
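A short sketch contrasting the two forms:

```python
b = [1, 2, 3]
c = b + [4, 5]                # + builds a new list; b is unchanged
print(b, c)                   # [1, 2, 3] [1, 2, 3, 4, 5]

original = b
b += [6]                      # += on a list extends it in place
print(b is original, b)       # True [1, 2, 3, 6]
```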

Lists and tuples may contain any other objects, including other lists and tuples:
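For example (the values are illustrative), a container can mix numbers, strings, tuples, and lists. It can even hold lists nested inside lists:

```python
mixed = [1, "two", (3, 4), [5, [6, 7]]]
print(len(mixed))             # 4: the inner tuple and inner list each count as one item

pair = ("coin", ["H", "T"])   # a tuple holding a string and a list
print(pair)
```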

Note that tuples and lists are ordered collections, so we can access their members directly by position (index), starting from 0:
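A few indexing examples, including chained indexing into nested containers:

```python
flips = ["H", "T", "T", "H", "H"]
print(flips[0])               # H: the first item is at index 0
print(flips[1])               # T

nested = [1, "two", (3, 4), [5, [6, 7]]]
print(nested[2][1])           # 4: second item of the inner tuple
print(nested[3][1][0])        # 6: index into the inner list, then its inner list
```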

* Negative indexes start from the end of the list, with -1 denoting the last member in the list:
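For example:

```python
values = [10, 20, 30, 40]
print(values[-1])                 # 40: the last item
print(values[-2])                 # 30: the second-to-last item
print(values[-4] == values[0])    # True: -len(values) refers to the first item
```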

## Modules and Libraries

* Many of the tools we will use in this class are not built into the core Python language
* Instead, they are libraries or modules that provide particular functionality
* These include:

* **numpy** provides arrays, linear algebra, and math functions (many similar to the core MATLAB functions)
* **matplotlib** provides functions to generate plots similar to those in MATLAB
* **random** (part of Python's standard library, but still imported as a module) contains functions for generating random numbers and choices
* **scipy** provides many tools used in scientific computing including optimization, signal processing, and statistics
* **sklearn** (the scikit-learn package) provides tools to implement a wide variety of machine learning models and techniques
* **pytorch** (imported as `torch`) provides machine learning and deep learning tools for applications such as computer vision and natural language processing

To work with these libraries, import them:
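A minimal example, assuming NumPy is installed. After `import`, the module's functions are reached through the module name:

```python
import random
import numpy

print(random.choice(["H", "T"]))   # one randomly chosen face
print(numpy.sqrt(16.0))            # 4.0
```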

To reduce typing, you can give a library a shorter alias on import:
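For instance (again assuming NumPy is installed), `np` and `npr` are the aliases these notes use later. Another widely used convention is `import matplotlib.pyplot as plt`.

```python
import numpy as np
import numpy.random as npr          # alias for NumPy's random submodule

print(np.mean([1, 2, 3, 4]))        # 2.5
print(npr.randint(0, 2, size=5))    # five random values, each 0 or 1
```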

* When using matplotlib in Jupyter notebook, I recommend using the "%matplotlib inline" magic to make your graphs appear directly in the notebook:

# Introduction to Functions, Counting in Numpy, and Accelerating Simulations

## Introduction to Functions

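```python
import random

faces = ['H', 'T']   # assumed defined earlier in the lecture: the two faces of a coin

num_sims = 20
flips = 20
for sim in range(num_sims):
    coins = random.choices(faces, k=flips)   # one experiment: 20 fair coin flips
    print(coins.count('H'))                  # number of heads in this experiment
```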

Suppose we want to see how often 6 or fewer heads occur. We can count these extreme events and compute their relative frequency:

```python
import random

faces = ['H', 'T']   # assumed defined earlier in the lecture

num_sims = 1000
flips = 20

event_count = 0
for sim in range(num_sims):
    coins = random.choices(faces, k=flips)
    num_heads = coins.count('H')
    if num_heads <= 6:          # an "extreme" outcome
        event_count += 1

print("Relative frequency of 6 or fewer heads is ~", event_count / num_sims)
```

Let's consider how to further improve this code. 

We begin by turning this simulation into a function. New functions in Python are defined using the ```def``` keyword, followed by the function name, the arguments in parentheses, and then a colon. The commands to be run in the function follow in an indented block.

It is helpful to know how to indent a whole block of code in Jupyter. Open the Help->Keyboard Shortcuts menu and look under the Edit Mode section for the Indent command; for instance, on Windows it is Ctrl-] (Cmd-] on macOS). When you want to turn a code block into a function, copy and paste it into a new cell, select it, and indent it using the keyboard command. Then add the ```def``` statement above it.

It is easiest to understand through an example:
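A minimal sketch of wrapping the loop above in a function. The name `heads_sim`, its parameter names, and its default values are illustrative (hypothetical) choices and not necessarily those used in the lecture. Every parameter has a default, and the function returns the relative frequency instead of printing it.

```python
import random

faces = ["H", "T"]

def heads_sim(num_sims=10_000, flips=20, threshold=6):
    '''Estimate P(number of heads <= threshold) in `flips` fair coin flips
    using `num_sims` simulated experiments (hypothetical example function).'''
    event_count = 0
    for sim in range(num_sims):
        coins = random.choices(faces, k=flips)
        if coins.count("H") <= threshold:
            event_count += 1
    return event_count / num_sims
```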

Now we can call the function by its name followed by parentheses. Since we have provided default values for all of the function's arguments, we do not have to even provide any arguments:

We can pass arguments to the function according to their *position*, *keyword*, or both. For instance, to run only 100k simulations, we can do either of the following:

Keyword arguments can appear in any order and can appear after positional arguments:

However, positional arguments cannot follow keyword arguments:
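The rules above are easiest to see with a tiny stand-in function. `describe` is hypothetical; it just echoes its arguments so the effect of each calling style is visible. A positional argument after a keyword argument is rejected before anything runs. So the last example compiles that call from a string to observe the `SyntaxError`.

```python
def describe(num_sims=1000, flips=20):
    '''Hypothetical helper that reports the arguments it received.'''
    return (num_sims, flips)

print(describe())                         # (1000, 20): all defaults
print(describe(100_000))                  # (100000, 20): by position
print(describe(num_sims=100_000))         # (100000, 20): by keyword
print(describe(flips=10, num_sims=500))   # (500, 10): keywords in any order
print(describe(500, flips=10))            # (500, 10): positional, then keyword

# Positional after keyword is a compile-time error
try:
    compile("describe(flips=10, 500)", "<example>", "eval")
except SyntaxError as err:
    print("SyntaxError:", err.msg)
```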

Now let's see how long it takes to run this function. We will use Jupyter's built-in ```%timeit``` magic:
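`%timeit` exists only in Jupyter/IPython. In a plain script, the standard-library `timeit` module gives a comparable measurement. This sketch times a single 20-flip experiment rather than a whole function. The statement string and repetition count are illustrative choices.

```python
import timeit

# Run one 20-flip experiment 1,000 times; timeit returns total seconds.
total = timeit.timeit(
    'random.choices(["H", "T"], k=20).count("H")',
    setup="import random",
    number=1000,
)
print(f"{total / 1000 * 1e6:.2f} microseconds per experiment")
```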

If you have programmed in MATLAB, you may have been told to avoid ```for``` loops because they slow everything down. The same advice applies to numerical code in Python. Instead, we replace the lists with 2-D arrays, where one dimension indexes the individual coin flips and the other indexes the different experiments.

Since we are creating an *array* of values, we will be generating 1s and 0s instead of 'H's and 'T's.  We will use the ```numpy.random``` submodule. It will be convenient to import it as ```npr```. We will also use other parts of ```numpy```, so we will import it as ```np```, as usual:

```python
import numpy as np
import numpy.random as npr
```

Let's start by simulating 20 flips of a fair coin again. Here we draw 20 random values, each equally likely to be 0 (representing tails) or 1 (representing heads), using the ```randint()``` function:
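A minimal version of this step. Note that `randint`'s upper bound is exclusive, so `randint(0, 2, ...)` produces only 0s and 1s.

```python
import numpy.random as npr

# Integers from 0 up to, but not including, 2: 0 = tails, 1 = heads
coins = npr.randint(0, 2, size=20)
print(coins)
```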

Now, we can generate multiple rows like this by changing the size to a tuple. The tuple is interpreted as (rows, columns), so to conduct 5 simulations, we can do:
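For example, five simulations of 20 flips each:

```python
import numpy.random as npr

# size=(5, 20): 5 rows (simulations) by 20 columns (flips per simulation)
coins = npr.randint(0, 2, size=(5, 20))
print(coins.shape)      # (5, 20)
print(coins)
```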

Next, we need to translate the simulated coin flips into the number of heads in each simulation. We can do that by summing across the columns of each row. In NumPy terms, the rows are axis 0 and the columns are axis 1, so we use numpy's ```sum()``` with ```axis=1```:
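A short sketch. Summing over axis 1 collapses the columns, leaving one head count per simulation (row). The function form and the method form give the same result.

```python
import numpy as np
import numpy.random as npr

coins = npr.randint(0, 2, size=(5, 20))
num_heads = np.sum(coins, axis=1)    # one count per row (simulation)
print(num_heads.shape)               # (5,)
print(num_heads)
print(coins.sum(axis=1))             # equivalent method form
```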

We can perform comparisons on numpy arrays, and it will compare every element:
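For instance, with a fixed array of made-up head counts (used so the output is predictable), a single comparison produces one True/False per element:

```python
import numpy as np

num_heads = np.array([4, 11, 6, 9, 13])   # illustrative values, not simulated
print(num_heads <= 6)                     # [ True False  True False False]
```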

If we sum over an array of True/False values, it will treat True as 1 and False as 0. Thus, we can count how many items satisfy some condition easily:
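Continuing with the same illustrative values: summing the Boolean array counts the True entries, and dividing by the number of simulations gives a relative frequency.

```python
import numpy as np

num_heads = np.array([4, 11, 6, 9, 13])   # illustrative values, not simulated
extreme = num_heads <= 6
print(np.sum(extreme))                    # 2: True counts as 1, False as 0
print(np.sum(extreme) / len(num_heads))   # 0.4
```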

Now we are ready to put all of that into practice. Let's make a new function using these principles:
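One possible vectorized version, sketched under the same assumptions as before. The name `heads_sim_vec` and its defaults are hypothetical. It generates all flips as one `(num_sims, flips)` array, counts heads per row, and counts the extreme rows without any Python-level loop.

```python
import numpy as np
import numpy.random as npr

def heads_sim_vec(num_sims=100_000, flips=20, threshold=6):
    '''Hypothetical vectorized estimate of P(number of heads <= threshold).

    Each row of `coins` is one simulated experiment; each column is one flip.
    '''
    coins = npr.randint(0, 2, size=(num_sims, flips))   # 1 = heads, 0 = tails
    num_heads = np.sum(coins, axis=1)                    # heads per experiment
    event_count = np.sum(num_heads <= threshold)         # True counts as 1
    return event_count / num_sims

print("Relative frequency of 6 or fewer heads is ~", heads_sim_vec())
```

In a notebook, `%timeit heads_sim_vec()` can then be compared with the timing of the loop-based version.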