# Brief Overview of Python

Just to get us on the same page, this is a quick overview of the Python programming language and the key points we need to understand Jupyter Notebooks and some of the libraries we will look at.

Python relies on whitespace (indentation) to group blocks of code. It is important that Python code is indented correctly so that the computer can understand and execute it.
Try running the cell below. We want it to print the phrase "hello world".

Note the error and the space before the `print` command. Delete the leading whitespace on the `print` line and then run the cell again.

In [0]:
a = 'hello world'
 print(a)

## Variables

Variables are used to store data in Python. We will use variables a lot in Jupyter Notebooks.

A few things about variables:
* must start with a letter or an underscore
* cannot start with a number
* the name must only contain alphanumeric characters and underscores
* the names are case sensitive ('cat' does not equal 'Cat')
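* variable names cannot be Python keywords (e.g. `if`, `for`, `def`); also avoid built-in names such as `print`, because reusing them hides the built-in function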
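
As a quick check of the rules above, the built-in string method `str.isidentifier()` reports whether a name follows the letter/underscore/alphanumeric rules, and the standard-library `keyword` module reports whether a word is reserved. A minimal sketch; the example names are arbitrary:

```python
import keyword

# str.isidentifier() checks the letter/underscore/alphanumeric rules
print('cat'.isidentifier())     # True
print('_cat'.isidentifier())    # True
print('2cat'.isidentifier())    # False: starts with a number
print('my-cat'.isidentifier())  # False: '-' is not allowed

# keywords cannot be used as names, even though they pass isidentifier()
print(keyword.iskeyword('for'))    # True
print(keyword.iskeyword('print'))  # False: print is a built-in function, not a keyword

# names are case sensitive, so these are two different variables
cat = 'lowercase'
Cat = 'capitalized'
print(cat == Cat)  # False
```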

A variable is created the moment you first assign a value to it. So to make a variable, all you need to do is name it and assign it a value with `=`.


In [0]:
# create a variable called 'name' and print it out. Hint: look at the code above.

## Data Types

When working in Python, it is important to understand the basic data types.

### String

When working in NLTK and other language processing libraries we will work a lot with strings. A string is simply character data and will always be surrounded by single or double quotes (there is no difference between using single vs double). Think of a string as plain text. 

One thing to keep in mind is that strings are nothing but character data. So the string "4" is not the same thing as the number 4.

We can see this in the code cell below:

In [0]:
print ('4' + '4 \n', 4 + 4)
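
To look at this more closely, the built-in `type()` function shows which data type a value has, and `int()` and `str()` convert between the two. A short sketch (the variable names are just examples):

```python
# type() reports the data type of a value
print(type('4'))  # <class 'str'>
print(type(4))    # <class 'int'>

# convert strings to integers before doing arithmetic
total = int('4') + int('4')
print(total)  # 8

# convert integers to strings before joining text
joined = str(4) + str(4)
print(joined)  # 44

# mixing the two types directly raises an error
try:
  '4' + 4
except TypeError as err:
  print('TypeError:', err)
```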

### Numbers

There are two types of number data in Python that we will look at:
* integer: a whole number (positive, negative, or zero) without a decimal point (e.g. 1, 20543670, -34)
* float: a number with a decimal point (e.g. 1.34, -2.546, 2.0)

We can perform arithmetic and other calculations on these data types.
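
A short sketch of arithmetic with both number types. Mixing an integer and a float gives a float, and `/` always returns a float, while `//` divides and rounds down to a whole number:

```python
a = 7      # integer
b = 2.0    # float

print(type(a), type(b))  # <class 'int'> <class 'float'>

print(a + 3)   # 10   (int + int gives an int)
print(a + b)   # 9.0  (mixing int and float gives a float)
print(a / 2)   # 3.5  (/ always returns a float)
print(a // 2)  # 3    (// divides and rounds down)
print(a ** 2)  # 49   (** raises to a power)
```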

In [0]:
print (5*3)

In [0]:
print (1+2)

In [0]:
print (2.1*(3+3.134))

### Python Collections

#### Lists
Lists are collections of data that can be changed and reordered. For example, we could store the numbers 1, 4, -1, -1.3, and 3.14 in a list. A list is written with square brackets. In Python, a list containing the numbers above would look like this: <br/>
`[1, 4, -1, -1.3, 3.14]`<br/>
A list of animals would look like this:<br/>
`['cat','bear','horse','dog']`

We can assign these lists to a variable to refer to them.<br/>
`animals = ['cat','bear','horse','dog']`

To retrieve a value from a list, we refer to the position of the item in the list; this is called the index, or index position. Note that Python counts from 0, so the first item has index position 0.

`print(animals[1])`<br/>
`>>> bear`
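
A sketch of indexing with the same `animals` list. Because counting starts at 0, the last item is at position `len(animals) - 1`; asking for a position past the end raises an `IndexError`:

```python
animals = ['cat', 'bear', 'horse', 'dog']

print(animals[0])    # cat   (first item)
print(animals[3])    # dog   (last item: index is len(animals) - 1)
print(len(animals))  # 4

try:
  print(animals[4])  # there is no fifth item
except IndexError as err:
  print('IndexError:', err)
```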

Below, create a list of anything you like. Then practice accessing the data using the index position.

In [0]:
# type your list below this line. 

If we want to see several items from the list, we can take a slice by giving a start and a stop position separated by a colon. <br/>
Note: the slice stops just before the second number; the item at that index position is not included.<br/>
`print(animals[0:2])`<br/>
`>>> ['cat', 'bear']`

To continue to the end of a list, we can leave the stopping point blank:<br/>
`print(animals[2:])`<br/>
`>>> ['horse', 'dog']`
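
A few more slices of the same list, as a sketch. A blank start position means "from the beginning", and the item at the stop position is never included:

```python
animals = ['cat', 'bear', 'horse', 'dog']

print(animals[1:3])  # ['bear', 'horse']  (positions 1 and 2; stops before 3)
print(animals[:2])   # ['cat', 'bear']     (blank start: from the beginning)
print(animals[2:])   # ['horse', 'dog']    (blank stop: to the end)
print(animals[:])    # a copy of the whole list
```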

In [0]:
# Try retrieving different items, or groups of items, out of the list below.
animals = ['cat','bear','horse','dog']
print(animals[2:])

Lists can be changed: we can replace an item simply by assigning a new value to its index position.

In [0]:
animals

In [0]:
animals[0] = 'red'
animals
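
Besides reassigning by index, lists have built-in methods that change them in place. This sketch uses the standard `list` methods `append()`, `remove()`, and `sort()` to show a list being changed and reordered; the starting list matches `animals` after the cell above:

```python
animals = ['red', 'bear', 'horse', 'dog']

animals.append('fish')   # add an item to the end
animals.remove('red')    # remove the first matching item
animals.sort()           # reorder the list alphabetically, in place

print(animals)  # ['bear', 'dog', 'fish', 'horse']
```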

#### Sets

Sets are unordered collections of data that can have items added or removed. For our purposes, the key differences between a list and a set are that a set does not contain duplicate values and its values are not indexed.

Sets are made similarly to lists, but instead of square brackets, sets are written with curly brackets: `{}`. (Empty curly brackets `{}` create an empty dictionary, so use `set()` to make an empty set.)

Look at the code cell below and note the difference between the variables when printed.

In [0]:
s = {'dog','bone','cat','string','dog'}
l = ['dog','bone','cat','string','dog']
print(s)
print(l)

In [0]:
# `len` will give us the length, or the number of items, in a list or set.
print(len(s))
print(len(l))

Because a set is not indexed, we get values out of it by iterating through it with a `for` loop.



In [0]:
for word in s:
  print(word)

In [0]:
# a set can be changed, but note that the order of its items is not significant.
# The printed order may look arbitrary (or even sorted), but it has no meaning for the set and can differ between runs.
s.add('mouse')
print(s)
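
Since a set has no index positions, the usual questions to ask of it are whether it contains an item (`in`) or what it shares with another set. The `&` operator returns the items found in both sets, which is handy for finding words two collections have in common. A sketch with example words:

```python
s = {'dog', 'bone', 'cat', 'string'}

print('cat' in s)    # True
print('mouse' in s)  # False

s.remove('bone')     # remove an item (raises KeyError if it is missing)
print(len(s))        # 3

other = {'cat', 'mouse', 'dog'}
print(s & other)     # the items in both sets: 'cat' and 'dog', in no particular order
```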

#### Tuples
A tuple is a collection of ordered and unchangeable data.

Tuples are written with parentheses, or round brackets: `()`

Once a tuple is created, its items cannot be changed; the only way to "change" it is to assign a new tuple to the variable.

In [0]:
color = ('red','yellow','green','blue')
print (color)

In [0]:
color[0] 

In [0]:
# this raises a TypeError, because the items of a tuple cannot be reassigned
color[0] = 'cat'


In [0]:
print(color)
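
To "change" a tuple we build a new one and assign it to the same variable. This sketch catches the `TypeError` from item assignment so the cell keeps running, then rebuilds the tuple from a one-item tuple and a slice of the old one:

```python
color = ('red', 'yellow', 'green', 'blue')

try:
  color[0] = 'cat'   # tuples do not support item assignment
except TypeError as err:
  print('TypeError:', err)

# ('cat',) is a one-item tuple; the trailing comma is what makes it a tuple
color = ('cat',) + color[1:]
print(color)  # ('cat', 'yellow', 'green', 'blue')
```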

#### Dictionaries
Finally (for our purposes), a dictionary is a changeable collection of key-value pairs, indexed by key. (Since Python 3.7, dictionaries keep their keys in the order they were inserted.)

A dictionary is written with curly brackets `{}`, with each key separated from its value by a colon `:`.

In [0]:
my_dict = {
    'cat':'string',
    'dog':'bone'
}

There are many ways to access the values in the dictionary.

In [0]:
my_dict['cat']

In [0]:
my_dict.get('dog')

In [0]:
# looping over a dictionary gives its keys; use each key to look up its value
for key in my_dict:
  print(key, my_dict[key])
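
Looping over a dictionary gives its keys; the `items()` method gives keys and values together as pairs. `get()` also accepts a default that is returned when a key is missing, whereas square brackets raise a `KeyError`. A short sketch:

```python
my_dict = {'cat': 'string', 'dog': 'bone'}

# items() yields (key, value) pairs
for key, value in my_dict.items():
  print(key, '->', value)

print(my_dict.get('mouse'))              # None: the key is missing
print(my_dict.get('mouse', 'no entry'))  # no entry: the default is used

try:
  my_dict['mouse']
except KeyError as err:
  print('KeyError:', err)
```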

A dictionary can also have entries added, changed, and deleted:

In [0]:
my_dict['cat'] = 'plant'
my_dict

In [0]:
my_dict['mouse'] = 'cheese'
my_dict

In [0]:
del my_dict['cat']
my_dict

In [0]:
my_dict.keys()

In [0]:
my_dict.values()
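
`keys()` and `values()` return view objects rather than lists; wrapping them in `list()` gives an ordinary list. The `in` operator checks a dictionary's keys, not its values. A sketch using the dictionary as it stands after the cells above:

```python
my_dict = {'dog': 'bone', 'mouse': 'cheese'}

print(list(my_dict.keys()))    # ['dog', 'mouse']
print(list(my_dict.values()))  # ['bone', 'cheese']

print('dog' in my_dict)   # True: 'in' checks the keys
print('bone' in my_dict)  # False: values are not checked
```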

## Conditional Statements and For Loops

### Conditional statements
Python uses "conditional statements" to evaluate data and then execute an action based on the result of that evaluation. The basic construction is: *if this statement is true then do this action*

Python only cares whether the condition is `True` or `False`. For example: 2 + 2 equals 4; if this is true, say "you are right".

In Python, this kind of statement is written this way:<br/>
`if [my condition]:`
> `[do this action]`
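
The "2 + 2 = 4" example written as Python, as a sketch. Note that the condition uses `==` to compare; a single `=` assigns a value and cannot be used as a condition here:

```python
answer = 2 + 2

# the condition evaluates to True or False
print(answer == 4)  # True

if answer == 4:
  print('you are right')
```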

Look at the code below to explore how conditional statements work. Change the condition to see how the code responds.

Here is the notation for some basic comparisons:
* Equals: `a == b`
* Not equals: `a != b`
* Less than: `a < b`
* Less than or equal to: `a <= b`
* Greater than: `a > b`
* Greater than or equal to: `a >= b`
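
Each of these comparisons produces a Boolean value, `True` or `False`, and that value is what an `if` statement checks. A quick sketch with two example numbers:

```python
a = 3
b = 5

print(a == b)  # False
print(a != b)  # True
print(a < b)   # True
print(a <= 3)  # True
print(a > b)   # False
print(a >= b)  # False
```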

In [0]:
if 1 < 2:
  print ('this statement is true')
  
print('^^^^^^^^^^^^^^^^^^^^^^')

What can we do if the statement is false?

There are two options.
1. Ignore the action and continue as if it were never there, i.e. do nothing.
2. Offer an alternative to execute when the statement is `False` by including an `else` action.


**Note: observe how the code is indented to organize the actions.**

In [0]:
a = 2
b = 4
if a == b:
  print ('These are the same')
else:
  print('These are different')

### For Loops

A for loop is used to go through a sequence of data (e.g. a list, tuple, set, dictionary, and even a string).

It is written with a loop variable that takes on the value of each item in the sequence, one at a time.
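
A short sketch of the same loop pattern on other kinds of sequences: a string yields one character at a time, a tuple behaves like a list, and a dictionary yields its keys:

```python
# looping over a string gives one character at a time
letters = []
for letter in 'cat':
  letters.append(letter)
print(letters)  # ['c', 'a', 't']

# looping over a tuple works just like looping over a list
for color in ('red', 'green'):
  print(color)

# looping over a dictionary gives its keys
pets = {'cat': 'string', 'dog': 'bone'}
for key in pets:
  print(key, pets[key])
```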

In [0]:
# create a list of numbers to iterate through with a for loop
numbers = [1,4,6,7]

In [0]:
# use a loop to square each number and show the result
for n in numbers:
  print(n*n)

In [0]:
# try this on a list of strings
animals = ['cat','dog','bear','fish']
for a in animals:
  word = a.upper()
  print(word)

***Note: the values in the lists have not changed. The loop does not modify the list; it only uses each item as an input value.***

In [0]:
print(numbers)
print(animals)

The if statement and the for loop can be used together to execute more complex actions, particularly when evaluating containers of data.

In [0]:
# combine a for loop and an if statement to find all the numbers less than 5.

for n in numbers:
  if n < 5:
    print (n)

In [0]:
# we can use this feature to create a new list that contains the items we are interested in.

colors = ['gold','green','blue','red','yellow']
g_colors = []
for c in colors:
  if c[0] == 'g':
    g_colors.append(c)
    
print(g_colors)

## Functions

Finally, we can combine all of these features to create complex actions that can be reused to perform regular tasks.

A function is created with the keyword `def`, followed by the name of the function, parentheses, and a colon. A function can accept input in the form of arguments that are passed into the function.

`def [function name]([argument]):`
>`[code to execute]`

To use, or "call", a function, you simply write its name followed by parentheses, passing in any necessary arguments.

Let's first look at a very simple function that does not require any inputs.

In [0]:
# at the end of this cell type 'greeting()' to call the function.
def greeting():
  print('Hello There!')
  


In [0]:
# Now let's pass in an argument and see how we can make the function more versatile
def greeting(name):
  print('Hello There,', name + '!')

In [0]:
# now we create a variable and pass that into the function. 
my_name = 'Derek'
greeting(my_name)

In [0]:
# now we can use this function to greet anyone once we have a variable for the name.
your_name = ''  # type your name between the quotes
greeting(your_name)

In [0]:
# or we can just pass in a string
greeting('people')

In [0]:
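# useful for executing repetitive tasks. Compute the number of seconds in a given number of minutes.
# (the argument is called 'minutes' rather than 'min', which would hide Python's built-in min() function)
def seconds(minutes):
  sec = minutes * 60
  print('There are', sec, 'seconds in', minutes, 'minutes.')

seconds(24)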
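
The `seconds` function above prints its answer. A function can instead send a value back with the `return` keyword, so the result can be stored in a variable and reused. A sketch, where `minutes_to_seconds` is a hypothetical name:

```python
# a variant that returns the result instead of printing it
def minutes_to_seconds(minutes):
  return minutes * 60

sec = minutes_to_seconds(24)
print(sec)  # 1440

# because the value is returned, it can be used in further calculations
total = minutes_to_seconds(24) + minutes_to_seconds(6)
print(total)  # 1800
```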

Now let's combine everything we have learned to create a function that compares two lists and returns a dictionary telling us which words appear in both lists and how many times they appear in total.

There are lots of ways to write this code. I am going to focus on a method that relies on the aspects of Python covered in this notebook.

In [0]:
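def check_list(ls1, ls2):
  # create the dictionary to hold the results; creating it inside the
  # function means every call starts with an empty dictionary
  word_list = {}
  # convert each list to a set to remove duplicates
  setx = set(ls1)
  sety = set(ls2)
  # record each word that appears in both sets, starting its count at 0
  for x in setx:
    for y in sety:
      if x == y:
        word_list[x] = 0
  # count how many times each shared word appears across both lists
  all_words = ls1 + ls2
  for k in word_list.keys():
    for w in all_words:
      if k == w:
        word_list[k] += 1
  # send the finished dictionary back to the caller
  return word_list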

In [0]:
animals = ['cat','dog','bird','cat','bear']
pets = ['dog','cat','mouse','dog','bird']

check_list(animals,pets)
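
For comparison, here is a shorter sketch of the same task. It swaps the nested loops for set intersection (`&`) and the built-in `list.count()` method; `check_list_short` is a hypothetical name, and this is an alternative approach rather than the method above:

```python
def check_list_short(ls1, ls2):
  # words found in both lists (the sets remove duplicates)
  shared = set(ls1) & set(ls2)
  all_words = ls1 + ls2
  # count how often each shared word occurs across both lists
  counts = {}
  for word in shared:
    counts[word] = all_words.count(word)
  return counts

animals = ['cat', 'dog', 'bird', 'cat', 'bear']
pets = ['dog', 'cat', 'mouse', 'dog', 'bird']
print(check_list_short(animals, pets))
```

For these two lists it should give the same counts as `check_list` (cat 3, dog 3, bird 2), though the key order may vary because the shared words come from a set.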