# Python Overview

This notebook covers an introduction to Python. It includes:

- data types
- functions
- control flow
- files

## Data Types

### Variables & Names

When programming, we use variables to keep track of data. We differentiate our variables by naming them. In the following example, we add two numbers together.

The following line is an expression where we compute a value, specifically `2+2`.

In [None]:
2+2

We can use a variable to store the value returned by the expression. We assign the value to the variable by using the `=` sign.

In [None]:
a = 2+2

Here, the name of the variable is `a`. It is good practice to name the variable something descriptive. 

**Question:** What would be a more appropriate name for the variable that stores the result of the expression `2+2`?

In [None]:
# ... = 2+2

### Numbers

In [None]:
int_num = ...
float_num = ...

type(int_num), type(float_num)

**Question:** What happens if we add, subtract, multiply, and divide floats and ints?

In [None]:
# addition

In [None]:
# subtraction

In [None]:
# multiplication

In [None]:
# division

#### Long floats

In [None]:
float1 = 34.234592019394939293499293992434828482842
float2 = 34.2345920193949392934992939924

**Question:** Are these two values above equivalent?

**Question:** Are the values assigned to the variables `float1` and `float2` equivalent?

In [None]:
float1, float2

In [None]:
float1 == float2
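
One way to reason about comparisons like this: Python floats are typically stored as 64-bit doubles, which keep only about 15 to 17 significant decimal digits, so anything beyond that is rounded away. The sketch below uses its own illustrative values (not `float1` and `float2`) and assumes the usual IEEE 754 double representation.

```python
import math
import sys

# Number of decimal digits a float can reliably represent (15 on typical platforms).
print(sys.float_info.dig)

# A change far below the available precision is rounded away entirely.
tiny_change_is_lost = (1.0 + 1e-17 == 1.0)
print(tiny_change_is_lost)

# Rounding also means exact equality can fail where we might expect it to hold.
exact_match = (0.1 + 0.2 == 0.3)
print(exact_match)

# math.isclose compares with a tolerance instead of exact equality.
close_match = math.isclose(0.1 + 0.2, 0.3)
print(close_match)
```

When comparing computed floats, a tolerance-based check such as `math.isclose` is usually safer than `==`.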

### Strings

In [None]:
string = "hello world!"
string

#### String methods

In [None]:
help(str)

In [None]:
string.split()

**When do you think split() would be helpful in text analysis?**
<details>
<summary>Solution</summary>
    <b>Getting words from a string</b>
    <br>
    <i>There are some issues with this that we'll discuss later this week</i>

</details>

In [None]:
string_example = "What is an API? The Federal Circuit described an API as a tool that \
“allow[s] programmers to use ... prewritten code to build certain functions into their \
own programs, rather than write their own code to perform those functions from scratch.”"
string_example

Let's use `split()` here

**Question:** What is the length of `string_example`?

**Question:** Let's lowercase string_example

**Question:** How many `c`'s are in `string_example`?
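
As a hint for the questions above, here is a minimal sketch of the relevant string tools applied to a short, hypothetical sentence (`sentence` is not part of the notebook's data). The same calls should work on `string_example`.

```python
# Hypothetical example sentence, used here instead of string_example.
sentence = "The quick brown fox"

words = sentence.split()        # split on whitespace into a list of words
num_chars = len(sentence)       # number of characters, spaces included
lowered = sentence.lower()      # a new, lowercased copy; sentence is unchanged
num_o = sentence.count("o")     # how many times "o" appears

print(words, num_chars, lowered, num_o)
```

Note that `lower()` returns a new string rather than changing the original, so the result has to be stored in a variable to be kept.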

##### Combining Strings

In [None]:
string_example + " hello"

##### Combining Strings and Numbers

**f strings** - https://zetcode.com/python/fstring/

In [None]:
fav_fruit = "banana"

f"I like to eat {fav_fruit}"

In [None]:
"I like to eat " + fav_fruit

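The two cells above combine strings only. A minimal sketch of the same idea with numbers follows; `num_fruits` and `price` are hypothetical values chosen for illustration.

```python
num_fruits = 3      # hypothetical values for illustration
price = 0.5

# f-strings convert numbers to text automatically.
sentence = f"I ate {num_fruits} bananas"

# A format spec after the colon controls how a number is displayed.
total = f"Total: {num_fruits * price:.2f}"

# With +, a number must be converted with str() first;
# "I ate " + num_fruits would raise a TypeError.
concatenated = "I ate " + str(num_fruits) + " bananas"

print(sentence, total, concatenated)
```
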
### Collections
- Lists
- Tuples
- Sets
- Dictionaries

#### Lists 
Lists store multiple items in a single variable. (Definition is from [W3Schools](https://www.w3schools.com/python/python_lists.asp).)

In [None]:
fruits = ["bananas", "apples", "oranges"]
fruits

##### Size of lists

**Question:** How many items are in `fruits`?

##### Accessing items in lists

**Question:** How do we access the first item in the list `fruits`?

In [None]:
first_fruit = ...
first_fruit

**Question:** How do we access the last item in the list `fruits`?

In [None]:
last_fruit = ...
last_fruit

**Question:** How do we access the second item in the list `fruits`?

In [None]:
second_fruit = ...
second_fruit

##### Accessing sub-lists from lists

`[first_index : last_index]`

`first_index` is inclusive, `last_index` is exclusive

In [None]:
fruits[0:2]

In [None]:
fruits[1:2]

In [None]:
fruits[1:]

In [None]:
fruits[:2]

##### Adding to lists


`<list>.append(...)` adds ... to the end of the list (in-place)

In [None]:
fruits.append("strawberries")
fruits

Overwrite the item at a specific index

In [None]:
fruits[0] = "pear"
fruits

##### Other things we can do to a list

(Just run the code below; it's OK if we don't understand it just yet.)

In [None]:
[", ".join([func for func in dir(fruits) if not func.startswith("__") ])]

The official [Python documentation](https://docs.python.org/3.8/tutorial/datastructures.html#more-on-lists) contains descriptions of each of these methods.

**Iterating through a list**

> **for** *element* **in** *list*: <br>
        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;do something

*Indentation is important*
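
To make the role of indentation concrete, here is a small sketch using a hypothetical list called `colors`. Lines indented under the `for` run once per element; the first unindented line runs once, after the loop ends.

```python
colors = ["red", "green", "blue"]   # hypothetical list for illustration

shouted = []
for color in colors:
    # Both indented lines belong to the loop and run once per element.
    print(color)
    shouted.append(color.upper())

# This line is not indented, so it runs once, after the loop finishes.
print(shouted)
```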

**Question:** Iterate through each item in fruits and print out the item

### Tuples

In [None]:
tup = (1,2,3)
tup, type(tup)

Tuples are immutable lists. Let's look at what that means:

**Adding to a tuple**

In [None]:
tup.append(4)

In [None]:
tup[0] = 2
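
Both cells above are expected to fail, which is the point: a tuple cannot be changed after it is created. The sketch below catches the two errors so they can be inspected, using a hypothetical tuple named `point`, and shows that building a new tuple is still allowed.

```python
point = (1, 2, 3)   # hypothetical tuple for illustration

try:
    point.append(4)
except AttributeError as err:
    # Tuples have no append method at all.
    append_error = str(err)

try:
    point[0] = 2
except TypeError as err:
    # Item assignment is not supported either.
    assign_error = str(err)

# Building a new tuple is allowed; the original stays unchanged.
longer_point = point + (4,)
print(append_error, assign_error, longer_point, point)
```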

**What can we do with a tuple?**

Which Python function will list the options?
<details>
<summary>Solution</summary>
    <b>dir(tup)</b> or <b>help(tup)</b>

</details>


In [None]:
# run that command here

In [None]:
tup.count(1)

**Why use tuples?**

In [None]:
play = ("Shakespeare", "A Midsummer Night's Dream", 1595)
author, title, year = play
author, title, year

Tuple assignment is a way of unpacking values.

### Sets

In [None]:
set_example = set([0,1,2,3,4])
set_example, type(set_example)

**Adding to a set** 

Let's try adding the number `5` to our set

In [None]:
set_example.append(5)

Which Python function can you use to find out how to add to a set?
<details>
<summary>Solution</summary>
<b>dir(set_example)</b>

</details>


Now let's add 5 to our set

#### Set of authors

In [None]:
authors = set(["Shakespeare", "Austen", "Morrison", "Woolf", "Shakespeare"])
authors

**What would be a good use case of sets?**
<details>
<summary>Solution</summary>
<b>Vocabularies</b>

</details>

#### 2016 Republican & Democratic Miami Presidential Debates

Let's first store the debates in the variables `repub_debates` & `dem_debates`

In [None]:
# A with-block closes each file automatically once it has been read
with open("data/Republican_16_Miami_Debate.txt") as f:
    repub_debates = f.read()
with open("data/Democratic_16_Miami_Debate.txt") as f:
    dem_debates = f.read()

In [None]:
repub_debates[:3000]

**What types are `repub_debates` & `dem_debates`?**

**Let's build our vocabularies**

In [None]:
repub_vocab, dem_vocab = set(), set()

**Let's discuss an algorithm for adding words to our vocabularies**

<details>
<summary>Solution - don't show for a while</summary>
1. Split the string containing the entire debate into a list of words
<br>
2. Loop through the list of words, one word at a time, and add each word to the set
</details>

In [None]:
# skip

In [None]:
# skip

In [None]:
# skip

In [None]:
# skip

In [None]:
for word in repub_debates.split():
    repub_vocab.add(word)
for word in dem_debates.split():
    dem_vocab.add(word)

In [None]:
len(repub_vocab), len(dem_vocab)

**It's a good idea to sanity-check our variables.**
Let's look at our data a bit.

In [None]:
dem_vocab

#### Union, intersection, difference

**Question: How big is the entire vocabulary across both debates?**

Approach 1: add the lengths of the two vocabularies together

In [None]:
len(repub_vocab) + len(dem_vocab)

Why is this wrong?

<details>
<summary>Solution</summary>
<b>There are words in both vocabularies</b>

 What words do we think might be in both?   
</details>

Approach 2: Union of the two

In [None]:
len(repub_vocab.union(dem_vocab))

In [None]:
total_vocab = repub_vocab.union(dem_vocab)
total_vocab

**Question: What are the words used in one debate but not the other?**

In [None]:
dem_vocab.difference(repub_vocab)
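
Note that `difference` is one-directional: the cell above returns only the words in `dem_vocab` that are missing from `repub_vocab`. The sketch below, on two small hypothetical sets, shows both directions as well as `symmetric_difference`, which collects the words that appear in exactly one of the two sets.

```python
# Small hypothetical vocabularies for illustration.
vocab_a = {"jobs", "taxes", "trade"}
vocab_b = {"jobs", "climate", "trade"}

only_in_a = vocab_a.difference(vocab_b)                  # in a but not in b
only_in_b = vocab_b.difference(vocab_a)                  # in b but not in a
in_exactly_one = vocab_a.symmetric_difference(vocab_b)   # in either, but not both
in_both = vocab_a.intersection(vocab_b)                  # shared words

print(only_in_a, only_in_b, in_exactly_one, in_both)
```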

**Question: What do we notice about these words?**

<details>
<summary>Solution</summary>
<b>Punctuation</b>.
    
    Example: 'excuses' in dem_vocab, 'excuses.' in dem_vocab
    
    We'll deal with this later
</details>


### Dictionary

![key_val](images/key_val_dict.jpeg)
<p style='text-align: right;'>Image from <a href="https://medium.com/python-pandemonium/python-dictionaries-45cacc2b76aa">https://medium.com/python-pandemonium/python-dictionaries-45cacc2b76aa</a></p>

In [None]:
type({})

In [None]:
dict_example = {"a": 1, 
                "b": 2, 
                "c": 3}
dict_example

#### Dictionary functions

In [None]:
help(dict)

**Accessing items in dictionaries**

**Check if key is in a dictionary**
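
A minimal sketch of both operations, using a hypothetical dictionary named `ages` rather than `dict_example`:

```python
ages = {"Ana": 31, "Ben": 25}   # hypothetical dictionary for illustration

ana_age = ages["Ana"]           # look up a value by its key
has_ben = "Ben" in ages         # membership tests check keys, not values
has_25 = 25 in ages             # False: 25 is a value, not a key

# ages["Cy"] would raise a KeyError; get() returns a default instead.
cy_age = ages.get("Cy", 0)

print(ana_age, has_ben, has_25, cy_age)
```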

#### Use case
**Question:** What would be a good use of dictionaries from our previous example?

<details>
<summary>Solution</summary>
    <b>Term Frequencies</b>
</details>

In [None]:
repub_word_counts, dem_word_counts = {}, {}

for word in repub_debates.split():
    if word not in repub_word_counts:
        repub_word_counts[word] = 0 
    repub_word_counts[word] += 1
    
for word in dem_debates.split():
    if word not in dem_word_counts:
        dem_word_counts[word] = 0
    dem_word_counts[word] += 1


In [None]:
sorted_dict = {}
sorted_keys = sorted(dem_word_counts, key=dem_word_counts.get, reverse=True)
for w in sorted_keys:
    sorted_dict[w] = dem_word_counts[w]

print(sorted_dict) 
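
As an alternative to counting and sorting by hand, the standard library's `collections.Counter` can do both steps. This is a swapped-in technique, not what the cells above use, and `text` is a hypothetical stand-in for a debate transcript.

```python
from collections import Counter

text = "a b a c a b"    # hypothetical stand-in for a debate transcript

# Counter builds the word -> count mapping in one step.
word_counts = Counter(text.split())

# most_common returns (word, count) pairs, highest count first.
top_two = word_counts.most_common(2)
print(top_two)
```

A `Counter` also returns `0` for words it has never seen, so no `if word not in ...` check is needed.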

**In today's tutorial we will build candidate-specific vocabularies and compute their frequencies.**

## Functions

In [None]:
def add(x, y):
    """Returns the sum of two numbers passed as arguments"""
    return x + y

In [None]:
type(add)

In [None]:
add.__doc__
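
Defining a function does not run it; we still have to call it. A short sketch of a few ways to call `add` (the function is repeated here so the block runs on its own):

```python
def add(x, y):
    """Returns the sum of two numbers passed as arguments"""
    return x + y

total = add(2, 2)            # positional arguments
same_total = add(x=2, y=2)   # keyword arguments
joined = add("ab", "cd")     # + also works on strings, so this returns "abcd"

print(total, same_total, joined)
```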

In [None]:
dict_example[0]

## Control Statements

### Loops

We saw loops above. We can loop through any collection.

#### Looping through multiple collections

In [None]:
people = ['John', 'Mary', 'Karen']
fruits = ['apple', 'banana', 'orange']

for person, fruit in ...(people, fruits):
    # Do something
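
One built-in that fits this pattern is `zip`, which pairs up items from several collections by position. That `zip` is what the blank above is meant to hold is an assumption; the sketch below simply shows how it behaves.

```python
people = ["John", "Mary", "Karen"]
fruits = ["apple", "banana", "orange"]

pairs = []
for person, fruit in zip(people, fruits):
    # zip yields one (person, fruit) tuple per position.
    pairs.append(f"{person} likes {fruit}")

print(pairs)
```

If the collections have different lengths, `zip` stops at the end of the shortest one.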

#### Looping and getting indices

In [None]:
enumerate
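
A minimal sketch of `enumerate` in a loop, using a hypothetical list of fruits. It yields an `(index, item)` pair for each element, so we get the position and the value together.

```python
fruits = ["apple", "banana", "orange"]   # hypothetical list for illustration

indexed = []
for index, fruit in enumerate(fruits):
    # index starts at 0 by default.
    indexed.append((index, fruit))

# start=1 begins counting at 1 instead of 0.
numbered = [f"{n}. {fruit}" for n, fruit in enumerate(fruits, start=1)]

print(indexed, numbered)
```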

### Conditionals: If statements


In [None]:
if ___:
    # action 1
elif ___: # otherwise, if the second condition is true
    # action 2
else: # if neither the first nor the second condition is true
    # action 3
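
The template above can be filled in as follows. `describe` and its temperature thresholds are hypothetical, chosen only to show that the conditions are checked in order and that only the first matching branch runs.

```python
def describe(temperature):
    """Hypothetical helper: label a temperature given in degrees Celsius."""
    if temperature < 10:
        return "cold"
    elif temperature < 25:   # only checked when the first condition is False
        return "mild"
    else:                    # runs when neither condition above is True
        return "hot"

print(describe(5), describe(10), describe(30))
```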

## Python functions and libraries

**Question:** Write a function called `absolute()` that takes in a number and returns its absolute value

In [None]:
def absolute(number):
    '''
    Returns the absolute value of a number
    '''
    
    return

Let's look up built-in Python libraries

Many of the things we want to do in computational text analysis are already implemented in Python libraries:

- Find all people, places, numbers, organizations, countries mentioned in a text
- Identify all nouns, verbs, adjectives, or adverbs in a text
- Predict the sentiment of a tweet or news article 
- Determine the vocabulary and frequency of different terms
- Represent words with meaningful lists of numbers
- Develop a Machine Learning classifier to predict X from text