
<img src="https://github.com/abchapman93/DELPHI_Intro_to_NLP_Spring_2024/blob/main/media/DELPHI-long.png?raw=true" size="20%">
</br>

<h1 valign="center" align="center"><font size="+150">Introduction to NLP in Python</br>Spring 2024</font></h1>

# 1. Python Essentials
We'll start by learning some of the essentials of Python, as well as how to use Jupyter Notebooks.

# I. Jupyter Notebooks
Jupyter Notebooks are an environment for running code, displaying results and visualizations, and sharing human-readable information. A notebook is made up of *cells*, and each cell holds a single block of code or text.

The cell which you're reading now is called a *Markdown* cell: It is meant to be human-readable and allows formatting like:
- Bulleted or numbered lists
- **bold text**
- *italics*

Double-click on this cell to see what the raw markdown looks like. Then press `"Run"` or hit "Shift+Enter" to re-execute the cell so it renders in your browser. 

The other main cell type is the **Code** cell. This contains executable Python code. When you run a code cell, it executes the code inside and displays the output. This way, you can step through a notebook, running code and inspecting the output as you go.

You can change a cell type by clicking the dropdown menu above which says **"Markdown"**. To make it a Python cell, select **"Code"**.

In [None]:
print("This is a code cell.")

Inside of a code cell, you can write **comments** by putting a `#` symbol before text. Anything on the line after a `#` will not be executed and is just there for humans to read it.

In [None]:
# This line won't run
print("But this one will!") # (but nothing after this)

#### TODO
Run the cells below. Make sure they run successfully; otherwise you'll get errors later in the notebook!

In [None]:
!pip install https://github.com/abchapman93/DELPHI_Intro_to_NLP_Spring_2024/releases/download/v0.1/delphi_nlp_2024-0.1.tar.gz

In [None]:
from delphi_nlp_2024 import *
from delphi_nlp_2024.quizzes.quizzes import *
from delphi_nlp_2024.helpers import *

### TODO
Change the cell below to be a code cell. Then modify the code so the `print` statement will execute. Run the cell by either clicking the `"Run"` button above (it looks like a "play" button) or pressing "Shift+Enter".

In [None]:
# print("Hello, there!")

Note that in the code cell above, when we execute the cell it runs the single line of code and then displays the output underneath.

To create a new cell, press the **"+"** button in the menu. The default cell type is **Code**.

### TODO
Create two new cells below. Make the first cell a Markdown cell. Copy and paste the following text into the Markdown cell and edit it with your information. Run the cell. Notice how the formatting of the text is rendered once you execute the cell.

```
- **First Name**: your first name
- **Last Name**: your last name
- **Major**: Your department
```

Then, make the second cell a Code cell. Copy and paste the following code and execute it:

```
print("1 + 2 = ", 1 + 2)
```

- **First Name**: Alec
- **Last Name**: Chapman
- **Major**: Division of Epidemiology

In [None]:
print("1 + 2 = ", 1 + 2)

## Errors
Errors tell you that something in the code was incorrect and either couldn't run or failed some check that the developer put in. They aren't always pleasant, but you're going to see a lot of them, so we might as well get familiar with them!

There are several different types of errors you'll come across. Let's see some examples and learn how to understand them.

The most common is a **syntax error**. This tells you that you've made a mistake in the syntax of your code. The code below needs parentheses around `1`. Read the message at the bottom.

In [None]:
print 1

In [None]:
print(1)

Another common error is `NameError`. That means you haven't defined a variable yet, or maybe you misspelled something.

In [None]:
# Throws an error
x

In [None]:
# Runs successfully
x = 3
x

In [None]:
# Throws an error
my_vriable = 2
my_variable

In [None]:
# Runs successfully
my_variable = 2
my_variable

# Functions

#### What is a function?
One of the most important concepts in Python programming is **functions**. Functions allow us to take some input (e.g., data) and produce useful output (e.g., summary statistics).


#### Examples of functions

You already saw one example of a function earlier: `print`. This function takes a message and "prints" it out for the user to see. One of the messages we printed earlier was *Hello, there!*

#### TODO
Print your name in the cell below.

In [None]:
print("Alec")

The syntax for using a function is:

```python
function_name(arg1, arg2, ...)
```

- Type the function name
- Pass in any **arguments** or **parameters** in parentheses (sometimes there aren't any arguments!). 
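
As a quick sketch of this syntax, here are three of Python's built-in functions (chosen only for illustration; they are not the course helpers used below). `len` takes one argument, `max` takes any number of arguments, and `round` takes a number plus an optional number of decimal places. The input values are made up.

```python
# Each call is: function name, then arguments in parentheses
word_length = len("Python")     # one argument -> 6
biggest = max(3, 7, 2)          # several arguments -> 7
rounded = round(3.14159, 2)     # a number and how many decimals to keep -> 3.14

print(word_length, biggest, rounded)   # 6 7 3.14
```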

The `print` function, and a number of other functions, are loaded by default. Another useful function is `help`, which gives us information about how to use a function:

In [None]:
help(print)

In [None]:
help(help)

We'll learn how to write our own functions later, but first let's get some practice using functions.

#### TODO

Here are a few examples of functions that have been defined for you:
- `square(x)`: Squares a number
- `add(x, y)`: Adds x and y together
- `print_name(first_name, last_name, middle_name=None)`: Prints "My name is", followed by your first/last name and optional middle initial (we'll talk more about optional arguments later)

Using these functions, do the following:
1. Calculate the square of 2
2. Add 3 and 5
3. Add 3 and 5 and square the result
4. Print out your name

In [None]:
square(2)

In [None]:
add(3, 5)

In [None]:
square(add(3, 5))

In [None]:
print_name("Alec", "Chapman", "B")

## Variables
Now that we know how to use functions, the next tool we'll learn about is **variables**.

A variable is a name that refers to a value, and it can be reassigned to a different value later. In contrast, a **literal** is a specific value written directly in the code, like `5` or `"abc"`.

We **declare** a variable using the assignment operator `=`:

```python
var_name = ... 
```

So, for example, we can declare a variable **x** and give it the value of 5. Then we can pass `x` into our `square` function and get the value of `x**2`.

In [None]:
x = 5
square(x)

Since `x` is a variable, we can assign it a new value and run our function again with that value.

In [None]:
x = 3
square(x)

We can also define a variable using the output of a function.
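
A small sketch of this idea, using the built-in `max`, `min`, and `abs` functions (picked for illustration; the numbers are arbitrary): the value each function returns is stored in a variable, and those variables can then be passed into another function.

```python
# Save the value returned by a function in a variable
largest = max(4, 9, 2)     # max() returns 9
smallest = min(4, 9, 2)    # min() returns 2

# Variables holding function output can be used as inputs to other functions
spread = abs(smallest - largest)    # abs(2 - 9) -> 7
print(largest, smallest, spread)    # 9 2 7
```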

### TODO
What value will we get with the following code?
```python
x = 3
y = square(x)
x = x - 1
add(x, y)
```

In [None]:
# RUN CELL TO SEE QUIZ
x_y_changing_var_quiz 

#### TODO
Declare two variables, `r` and `pi`. Set `r` equal to 2 and `pi` equal to `3.14`.

In [None]:
r = 2
pi = 3.14

In [None]:
test_r_equals_2.test(r)

In [None]:
test_pi_equals.test(pi)

## Datatypes
Variables can be of any Python **datatype**. Here are some examples of data types:

### Integers
Integers are whole numbers. Each of the following cells contains an integer.

In [None]:
1

In [None]:
x = 3
x
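
To check a value's datatype, you can pass it to the built-in `type` function. This sketch confirms that `x` is an integer and previews the other datatypes covered in this section (the example values are arbitrary).

```python
x = 3

# type() reports the datatype of any value
print(type(x))         # <class 'int'>
print(type(1.3))       # <class 'float'>
print(type("Hello"))   # <class 'str'>
print(type(True))      # <class 'bool'>
```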

We can add two or more integers using the `+` operator (instead of the `add` function we used earlier - that was just an example):

In [None]:
x + 5

In [None]:
x + 5 + 4

Or multiply them with `*`:

In [None]:
x * 5

... divide them with `/` (note that the result is a float, not an integer):

In [None]:
x / 5
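
A quick check of that note, with arbitrary numbers: `/` always returns a float, even when the division comes out evenly.

```python
# True division with / always returns a float
a = 6 / 3
print(a)         # 2.0, not 2
print(type(a))   # <class 'float'>

b = 3 / 5
print(b)         # 0.6
```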

... or exponentiate them using `**`:

In [None]:
x ** x

### Floats
Floats (short for *floating-point numbers*) are the other main numeric datatype in Python: numbers with a decimal point.

In [None]:
1.3

In [None]:
pi = 3.14
pi

We can do all the same operations on floats that we did on integers. We can also do those same operations between floats and integers.

In [None]:
pi + pi

In [None]:
2 * pi

In [None]:
pi + x

In [None]:
pi ** x

We can convert a float to an int and vice-versa:

In [None]:
int(pi)

In [None]:
float(2)
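
One detail worth knowing when converting: `int` does not round. It drops the decimal part, which moves the value toward zero. If you want the nearest whole number, the built-in `round` is one option. The values below are just examples.

```python
truncated = int(3.99)        # 3: the .99 is dropped, not rounded up
truncated_neg = int(-3.99)   # -3: truncation moves toward zero
nearest = round(3.99)        # 4: round() goes to the nearest whole number

print(truncated, truncated_neg, nearest)   # 3 -3 4
```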

#### TODO
What data type is the value: 

```python
r=3
2*r*pi
```

In [None]:
# RUN CELL TO SEE QUIZ
quiz_data_type_2_pi_r

In [None]:
# RUN CELL TO SEE QUIZ
quiz_2_pi_r_part2

### Strings
**Strings** represent text. We make a string by wrapping characters in quotation marks:

In [None]:
"Hello!"

You can also use single quotes (although I personally prefer double-quotes):

In [None]:
'Hi!'
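
Both quote styles produce the same kind of value. One practical reason to pick one over the other is that a string can contain the *other* kind of quote without any special handling. The sentences here are made-up examples.

```python
# Double quotes let you put an apostrophe inside the string
s1 = "It's a nice day"

# Single quotes let you put double quotes inside the string
s2 = 'She said "hello"'

# The quote style does not change the value
same = 'Hi!' == "Hi!"
print(s1)     # It's a nice day
print(s2)     # She said "hello"
print(same)   # True
```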

You can also use **triple-quotes** to span multiple lines in the code:

In [None]:
# https://www.poetryfoundation.org/poems/42916/jabberwocky
"""
’Twas brillig, and the slithy toves
      Did gyre and gimble in the wabe:
All mimsy were the borogoves,
      And the mome raths outgrabe.

“Beware the Jabberwock, my son!
      The jaws that bite, the claws that catch!
Beware the Jubjub bird, and shun
      The frumious Bandersnatch!”
"""

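A triple-quoted string keeps the line breaks you type; they are stored as newline characters (`\n`). This sketch uses two lines of the poem above and the string method `splitlines` to show where those breaks are.

```python
# Line breaks typed inside triple quotes become "\n" characters
stanza = """All mimsy were the borogoves,
And the mome raths outgrabe."""

print("\n" in stanza)        # True
print(stanza.splitlines())   # ['All mimsy were the borogoves,', 'And the mome raths outgrabe.']
```
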
#### Concatenating strings
We can add strings together to get a longer string:

In [None]:
first = "Alec"
middle = "B."
last = "Chapman"

In [None]:
first + " " + middle + " " + last

However, you cannot add a string and a number:

In [None]:
first + 1
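
The cell above raises a `TypeError`. If you do want to join text and a number, one option is to convert the number to a string first with the built-in `str` function. The `age` value below is hypothetical.

```python
first = "Alec"

# str() turns the number 1 into the string "1", so + can join them
label = first + str(1)
print(label)   # Alec1

age = 30       # hypothetical value, for illustration only
age_text = "Age: " + str(age)
print(age_text)   # Age: 30
```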

#### TODO
What do you think will happen if you multiply a string and an int? Test it out below.

In [None]:
"Hello!"*25

In [None]:
quiz1

In [None]:
quiz2

### Booleans and comparators
**Booleans** can have two possible values: `True` or `False`. These are **logical** values and represent whether some condition holds.

In [None]:
True

In [None]:
False

A common way to encounter booleans is through **comparison operators**, which check things like whether two values are equal or whether one is greater than another.

To check whether two values are equal, we put double equal signs `==` between the two values. If the two values are equal, then this evaluates as `True`. Otherwise, we get `False`.

In [None]:
x = 3
x == 3

In [None]:
x == 2

Or we can check whether two values are not equal with `!=`:

In [None]:
x != 3

In [None]:
x != 2

To check whether one value is greater than another, we use `>`. If we want to check greater than or equal to, we add an equal sign `>=`.

In [None]:
x > 1

In [None]:
x > 3.0

In [None]:
x >= 3.0

And we do the same for less than, using `<` (or `<=` for less than or equal to):

In [None]:
x < 3.001

In [None]:
x <= 2.9999
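
A common beginner mistake is mixing up `=` (assignment) and `==` (comparison). This sketch puts all six comparison operators side by side, using the same `x = 3` as above; the numbers compared against are arbitrary.

```python
x = 3   # a single "=" assigns a value

# Each comparison evaluates to a boolean (True or False)
print(x == 3)   # equal to                  -> True
print(x != 3)   # not equal to              -> False
print(x > 2)    # greater than              -> True
print(x >= 4)   # greater than or equal to  -> False
print(x < 5)    # less than                 -> True
print(x <= 3)   # less than or equal to     -> True

# Comparisons work on strings too
print("abc" == "abc")   # True
```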

## Lists
Next we'll look at examples of **containers**. These are Python objects that can contain other Python objects. The first of these is the `list`: an ordered sequence of objects, marked by square brackets `[]`.

These are all examples of lists:

In [None]:
x = [1, 2, 3]
y = ["a", "b", "c"]
z = [1, "b", x]

#### TODO
What are the data types of the three elements of `z` in the previous cell?

In [None]:
# RUN CELL TO SEE QUIZ
quiz_data_types_z

You can access specific elements of a list using **indexing**: put a numeric index in square brackets after the name of the list. For example, to access the first element of `x`, we would write:

```python
x[0]
```

This might look a little funny to you: the *zeroth* index of x? The reason we use `x[0]` instead of `x[1]` for the first element is that Python uses **zero indexing**, meaning that the positions start at 0 and end at `len(x) - 1`. This is different from R and many other statistical software packages, which start counting at 1. You can think of `x[0]` as saying: **Give me the element of x which is `0` positions away from the beginning**.
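
Here is a sketch of what zero indexing means in practice, using the same list `x` from above: the first element is at index `0`, the last is at `len(x) - 1`, and `x[len(x)]` is one step past the end, so it raises an `IndexError`. The `try`/`except` is only there so the error is printed instead of stopping the cell.

```python
x = [1, 2, 3]

print(x[0])            # first element -> 1
print(x[len(x) - 1])   # last element (index 2) -> 3

# Index 3 is past the end of a 3-element list
try:
    x[len(x)]
except IndexError as err:
    print("IndexError:", err)   # IndexError: list index out of range
```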

#### TODO
What code would give you the second element of `x`?

In [None]:
# RUN CELL TO SEE QUIZ
quiz_second_element_x

To get the length of a list, we can use the built-in `len` function. 
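
A minimal example with the list `x` from above; `len` also works on strings, where it counts characters.

```python
x = [1, 2, 3]
print(len(x))          # 3 elements

# For a string, len() counts characters
print(len("Hello!"))   # 6
```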

#### `k` steps back
When we pass in 0 or a positive index `k` for list `x`, we go `k` positions from the beginning. But if we pass in a negative number, we go backwards from the end of the list. So one way to get the last element of `x` is:

In [None]:
x[-1]
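
To see that negative indices count from the end, this sketch compares `x[-1]` with the zero-indexed position of the last element, and shows that stepping back the full length of the list lands on the first element.

```python
x = [1, 2, 3]

print(x[-1])                    # last element -> 3
print(x[-1] == x[len(x) - 1])   # same position, counted two ways -> True
print(x[-len(x)])               # 3 steps back from the end is the first element -> 1
```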

## Dictionaries
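**Dictionaries** are collections of **key/value pairs**. Unlike lists, their elements are looked up by key rather than by numeric position, and each key must be unique. (Since Python 3.7, dictionaries do keep their keys in the order they were added.)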

A key/value pair is a unique mapping from one item (a key) to another (a value). An example of this in real life is a mapping from states to their capitals:

- Utah --> Salt Lake City
- Pennsylvania --> Harrisburg
- New York --> Albany

The states are the keys and the capitals are values. Let's see what this would look like in Python.

Dictionaries are declared using curly brackets `{}` with colons `:` mapping the keys and values.

In [None]:
state_capitals = {
    "Utah": "Salt Lake City",
    "Pennsylvania": "Harrisburg",
    "New York": "Albany"
}

state_capitals

Let's say we want to know the capital of a particular state. We can get this by using the key (i.e., state) as the index:

In [None]:
state_capitals["Utah"]

If you try to index using a key that isn't in the dictionary, you get an error:

In [None]:
state_capitals["Ohio"]
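
If you aren't sure whether a key exists, you can check first with the `in` operator, or use the dictionary's `get` method, which returns `None` (or a default value you supply) instead of raising a `KeyError`. The `"unknown"` default below is just an example.

```python
state_capitals = {
    "Utah": "Salt Lake City",
    "Pennsylvania": "Harrisburg",
    "New York": "Albany",
}

# `in` checks whether a key is present
print("Ohio" in state_capitals)   # False
print("Utah" in state_capitals)   # True

# get() returns None (or a chosen default) for a missing key instead of raising KeyError
print(state_capitals.get("Ohio"))              # None
print(state_capitals.get("Ohio", "unknown"))   # unknown
print(state_capitals.get("Utah"))              # Salt Lake City
```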

You can add a key/value pair to a dictionary like this:

In [None]:
state_capitals["California"] = "Sacramento"
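
Because each key can appear only once, assigning to a key that already exists replaces its value rather than adding a second entry. This sketch uses a separate, hypothetical dictionary named `capitals` that starts with a deliberately wrong value.

```python
capitals = {"Utah": "Provo"}   # deliberately wrong value, for illustration

# "Utah" is already a key, so this replaces its value instead of adding a second "Utah"
capitals["Utah"] = "Salt Lake City"

print(capitals)        # {'Utah': 'Salt Lake City'}
print(len(capitals))   # 1
```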

In [None]:
len(state_capitals)