# Introduction

We'll be using Jupyter Python notebooks in this course. They are extremely popular today, and there is an abundance of Python notebook and data science examples online.


## Which Python? 

We're going to use Python 3. Note that Python 2 code is generally not compatible with Python 3.

* Python 2.x is legacy (it reached end of life in January 2020)
  * But it still appears in older code and tutorials
* Python 3.x released “late” in 2008
  * Not backwards compatible with Python 2.x

Most major libraries and Python modules have been updated to support Python 3.x.

Be careful! Features introduced in Python 3.5 will not work on a system that only has Python 3.4 installed.
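One way to guard against a version mismatch like this is to check the interpreter version at the top of a notebook. The sketch below uses the standard `sys.version_info` tuple; the minimum version `(3, 5)` and the name `REQUIRED` are only illustrative choices, not a requirement of this course.

```python
import sys

# Minimum version this notebook is assumed to need (illustrative value).
REQUIRED = (3, 5)

# sys.version_info compares like a tuple: (major, minor, micro, ...).
is_new_enough = sys.version_info[:2] >= REQUIRED

print("Running Python", sys.version.split()[0])
print("New enough for this notebook:", is_new_enough)
```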

## Interpreted Python and Notebooks

Python is interpreted (though the source is first compiled behind the scenes to bytecode, which speeds up execution). Statements and expressions can be typed into the Python interpreter and executed immediately.

## How to use this notebook?

Each cell is either a "Markdown" cell or a "Code" cell. 

**Markdown syntax** is fairly basic and is worth exploring, to make your python notebooks more readable. Here are some reference links:

* https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html
* https://medium.com/ibm-data-science-experience/markdown-for-jupyter-notebooks-cheatsheet-386c05aeebed

**To execute a cell:** first make it active by clicking on it, then press Ctrl-Enter.

Cells are executed in the order in which you run them, not in the order in which they appear in the notebook.

## Why Python?

* Very object oriented
  * Python is much less verbose than Java
* NLP processing: symbolic
  * Python has built-in datatypes for strings, lists, and more.
* NLP processing: statistical
  * Python has strong numeric processing capabilities: matrix operations, etc.
  * Suitable for probability and machine learning code
* NLTK is implemented as a set of Python modules
  * (There are also some more “industrial” NLP modules for other programming languages.)

## How I Learn Python

Look at code from other programmers
1. Determine if it's good code that I should be modeling
2. Figure out if it's legacy python 2.x code

Read/Query unfamiliar functions
* There is a lot in the language!

Google

Easy to get better with practice!

Python API: https://docs.python.org/3/library/index.html

## Learning Python from Scratch

http://www.learnpython.org/ is very good these days!

Especially relevant sections:
1. Learn the Basics
2. Advanced: List Comprehensions, Multiple Function Arguments, Regular Expressions, Exception Handling

## Arithmetic

Expressions can be directly typed in.

In [1]:
5+9

14

Though both of the expressions below are evaluated, only the result of the last one is displayed when the cell is run.

In [None]:
2.5+3

4-3*2

Unless `print` is used, in which case every printed value is displayed.

Comments in Python begin with a `#`.

In [None]:
# no semicolons needed!
print(4/3)
2.0+5
4 % 2

## Variables

Variables have **types**, but don't need to be declared. The type of a variable can change. The type of an expression can be returned with the `type` function.

* Variables come into existence when they are first assigned 
* A variable can refer to an object of any type
* Drawback: <u>type errors are only caught at runtime </u>
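The points above can be seen in a short sketch. The variable names `v` and `result` are made up for this illustration.

```python
v = 5
print(type(v))        # <class 'int'>

v = "five"            # the same name now refers to a string
print(type(v))        # <class 'str'>

# The type mismatch below is only detected when the line actually runs.
try:
    result = v + 1    # str + int is not defined
except TypeError as err:
    result = None
    print("TypeError:", err)
```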

In [None]:
x = 5
y = 'Hello World'
z = 10.5

In [None]:
z+5

In [None]:
print(type(x))
print(type(y))
print(type(z))

Converting a float to an int (the fractional part is dropped):

In [None]:
int(z)
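Note that `int` truncates toward zero, which matches the mathematical floor only for non-negative numbers. A small comparison with the standard `math.floor` function (the list `values` is just example data):

```python
import math

values = [10.5, -10.5]

truncated = [int(v) for v in values]        # drops the fractional part
floored = [math.floor(v) for v in values]   # rounds toward negative infinity

print(truncated)   # [10, -10]
print(floored)     # [10, -11]
```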

## Whitespace Formatting

Python uses indentation to delimit blocks of code. No more curly braces!

In [None]:
for i in [1,2,3,4,5]:
    print(i)
    for j in [6,7,8,9,10]:
        print(j)
        print(i+j)
    print(i)
print('done')
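Because indentation is part of the syntax, a missing indent is an error rather than a style problem. The sketch below uses the built-in `compile` function only so that the error can be caught and shown; the two source strings are made-up examples.

```python
good_source = "for i in [1, 2]:\n    print(i)\n"
bad_source = "for i in [1, 2]:\nprint(i)\n"     # loop body is not indented

compile(good_source, "<example>", "exec")       # compiles without complaint

try:
    compile(bad_source, "<example>", "exec")
    error_name = None
except IndentationError as err:
    error_name = type(err).__name__

print(error_name)     # IndentationError
```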

## Strings

Strings can be delimited by either single or double quotation marks.

In [None]:
x = 'rich is a string'
y = "Richie is still a string"
print(y)

An unmatched quotation mark of the other kind can occur within the string.

In [None]:
"matt's"

Use triple double-quotes for multi-line strings or strings that contain both ' and " inside of them.

In [None]:
"""
A very long
string over multiple
lines including " and '.
"""

### String Functions

In [None]:
len(x)     # length of a string

The `dir` function returns operations that are defined on an object.

In [None]:
dir('x')

Using the `upper` method:

In [None]:
x.upper()

## Printing + String Concatenation + Int2Str

In Python 3, `print` is a function, so its arguments are always enclosed in `(` and `)`. (Python 2 notably differs: there, `print` is a statement.)

In [None]:
print(2+3, 5, 4*20.0)    # prints the result of each expression, space separated

Converting an int to a string:

In [None]:
s = "Hello World" + str(5)   # conversion function
s
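The conversion is needed because Python will not silently mix strings and numbers. A minimal sketch (the names `count` and `message` are made up here):

```python
count = 5

# Concatenating a str and an int directly raises TypeError.
try:
    message = "Hello World" + count
except TypeError:
    message = "Hello World" + str(count)   # explicit conversion

print(message)            # Hello World5
print(int("42") + 1)      # the reverse conversion, str -> int; prints 43
```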

## Null Values

Java's `null` = Python's `None`

In [None]:
found = None
x = 5
type(found)

## Sequence Types

1. Tuples
  * A simple immutable ordered sequence of items
  * Items can be of mixed types, including collection types 
2. Strings 
  * Immutable 
  * Conceptually very much like a tuple 
3. Lists 
  * Mutable ordered sequence of items of mixed types
  * Because lists are mutable, they carry slightly more overhead than tuples
  
All three sequence types share much of the same syntax and functionality.
https://docs.python.org/3/library/stdtypes.html#sequence-types-list-tuple-range 
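The difference between mutable and immutable sequences shows up as soon as you try to assign to an item. A small sketch with made-up example values:

```python
point = (1, 2, 3)       # tuple: immutable
letters = [1, 2, 3]     # list: mutable
word = "abc"            # string: immutable

letters[0] = 99         # fine, lists can be changed in place

errors = []
for seq in (point, word):
    try:
        seq[0] = 99     # not allowed for tuples or strings
    except TypeError:
        errors.append(type(seq).__name__)

print(letters)   # [99, 2, 3]
print(errors)    # ['tuple', 'str']
```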

### Lists

The list data structure is really useful and cool in Python. It's an ordered collection, similar to arrays in Java. However, a Python list can hold values of different types.

<u>Lists are defined using square brackets (and commas).</u>

In [None]:
x = [1,2,3]
y = [1,'a',2.3,False]  # lists can hold values of different types
z = [1,x]              # lists can contain other lists
w = list(range(50))    # the list [0,1,...,49]; in Python 3, range() alone returns a lazy range object

### Tuples

A tuple is very similar to a list, but it is immutable.
<u>Uses parentheses instead of the square brackets of a list.</u>

In [None]:
tu = (23, 'abc', 4.56, (2,3),'def')
tu

### Strings

Strings are defined using quotes (`"`, `'`, or `"""`).

In [None]:
st1 = "Hello World"
st2 = 'Hello World'
st3 = """This is a multi-line
string that uses triple quotes."""

### Are tuples <u>iterable</u>?

In [None]:
t = (1,2,3,(4,5))
for i in t:    
    print (i)

### Sequence Types

We can access individual members of a tuple, list, or string using square bracket "array" notation.

Note that all three are 0-based.

In [None]:
tu = (23, 'abc', 4.56, (2,3),'def')
tu[1] # Second item in the tuple. 

In [None]:
li = ["abc", 34, 4.34, 23]
li[1] # Second item in the list.

In [None]:
st = "Hello World"
st[1] # Second character in string. 

### Positive and Negative Indices

* Positive index: count from the left, starting with 0.
* Negative index: count from the right, starting with -1.

In [None]:
tu[-3]
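A negative index is shorthand for counting back from the length of the sequence. The sketch below repeats the tuple `tu` from above so that it runs on its own:

```python
tu = (23, 'abc', 4.56, (2, 3), 'def')

# A negative index i refers to the same item as len(tu) + i.
print(tu[-3])              # 4.56
print(tu[len(tu) - 3])     # 4.56
print(tu[-1])              # def (the last item)
```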

### Slicing: Return Copy of a Subset

Return a copy of the container with a subset of the original members. Start copying at the first index, and stop copying before the second index. 

You can also use negative indices when slicing. 

In [None]:
tu[1:4]

In [None]:
tu[1:-1]

* Omit the first index to make a copy starting from the beginning of the container.
* Omit the second index to make a copy starting at the first index and going to the end of the container. 

In [None]:
tu[:2]

In [None]:
tu[2:]
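Since a slice is a copy, changing it does not change the original container. A short sketch with a list (the names `middle` and `copy_all` are made up; note that the copy is shallow, so nested objects are still shared):

```python
li = ["abc", 34, 4.34, 23]

middle = li[1:3]      # items at index 1 and 2; index 3 is excluded
copy_all = li[:]      # omitting both indices copies the whole list

copy_all[0] = "changed"   # modify the copy only
print(middle)             # [34, 4.34]
print(copy_all[0])        # changed
print(li[0])              # abc (the original is untouched)
```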

### Other List Operations

1. length: `len(b)`
2. append: `b.append('a')`
3. insert: `b.insert(2, 'a')`
4. sorting: `b.sort()`
5. reversing: `b.reverse()`
6. iteration: `for item in a:`
7. double indexing: `b[2][1]`
8. finding index of first occurrence: `b.index('green')`
9. number of occurrences: `b.count('green')`
10. remove first occurrence: `b.remove('green')`
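The operations listed above can be tried on a small example. The lists `b` and `c` below are made-up data; sorting is shown on a separate list of numbers because a list that mixes strings and nested lists cannot be sorted.

```python
b = ['red', 'green', ['x', 'y'], 'green']

b.append('blue')            # add to the end
b.insert(1, 'pink')         # insert before index 1
print(len(b))               # 6
print(b[3][1])              # y (double indexing into the nested list)
print(b.index('green'))     # 2 (first occurrence)
print(b.count('green'))     # 2
b.remove('green')           # removes only the first 'green'
print(b.count('green'))     # 1

c = [3, 1, 2]
c.sort()                    # sorts in place
c.reverse()                 # reverses in place
print(c)                    # [3, 2, 1]
```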

## The `in` operator

Can be used as a Boolean test to:
* test whether a value is inside a container
* test for substrings

In [None]:
t = [1,2,4,5]
3 in t

In [None]:
4 in t

In [None]:
4 not in t

In [None]:
a = 'abcde'
print('c' in a)
print('ac' in a)

## The `+` operator

The + operator produces a new tuple, list, or string whose value is the concatenation of its arguments. 

In [None]:
(1, 2, 3) + (4, 5, 6) 

In [None]:
[1, 2, 3] + [4, 5, 6] 

In [None]:
"Hello" + " " + "World"

## The `*` operator

The * operator produces a new tuple, list, or string that “repeats” the original content.

In [None]:
print((1, 2, 3) * 3)
print([1, 2, 3] * 3)
print("Hello" * 3)

### Other String Operations

1. joining: `c = ' '.join(b) `
2. splitting: `c.split('r') `
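A quick sketch of both operations, using a made-up list `b`. Calling `split` with no argument splits on whitespace.

```python
b = ['three', 'short', 'words']

c = ' '.join(b)          # glue the items together with a space
print(c)                 # three short words

print(c.split('r'))      # ['th', 'ee sho', 't wo', 'ds']
print(c.split())         # ['three', 'short', 'words']
```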

## Checking for equality

`==` checks whether two values are equal, i.e. whether the contents of two objects are the same.

`is` checks whether two names refer to the very same object (the same memory address).
* `is` is often used to check whether an object is null (`None`)

In [None]:
x = [1,2,3]
y = [1,2,3]

In [None]:
x == y

In [None]:
x is y

In [None]:
x is None
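The difference matters most when two names refer to the same mutable object. In the sketch below, `z` is a made-up third name that is bound to the same list as `x`:

```python
x = [1, 2, 3]
y = [1, 2, 3]   # a second list with equal content
z = x           # a second name for the same list

print(x == y, x is y)   # True False
print(x == z, x is z)   # True True

z.append(4)             # changes the one list that x and z share
print(x)                # [1, 2, 3, 4]
print(y)                # [1, 2, 3]
```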

### Dictionaries

Dictionaries store a mapping between a set of keys and a set of values. 
* Keys can be any immutable type. 
  * (usually a String)
* Keys must be unique. (Assigning to an existing key replaces its value) 
* Values can be any type.
* A single dictionary can store values of different types. 
* Dictionaries preserve insertion order in Python 3.7 and later. (In earlier versions they were unordered, so a new entry could appear anywhere in the output.)

You can define, modify, view, lookup, and delete the key-value pairs in the dictionary. 

An empty pair of curly braces creates an empty dictionary.

In [None]:
d = {}
d1 = { 'age':50, 'weight':250, 'height':"5'6"}   # initializing a dictionary

### Returning the keys in the dictionary

In [None]:
d1.keys()   # in Python 3 this is a view object; use list(d1.keys()) for a list

In [None]:
d1.values() # view of the values

In [None]:
d1.items()  # view of the (key, value) tuples in the dictionary
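These three methods are most often used in loops. The sketch below repeats `d1` so that it runs on its own, and assumes Python 3.7 or later for the printed key order; `key_list` and `value_list` are made-up names.

```python
d1 = {'age': 50, 'weight': 250, 'height': "5'6"}

# Loop over (key, value) pairs.
for key, value in d1.items():
    print(key, '->', value)

# Wrap the results of keys() and values() in list() to get real lists.
key_list = list(d1.keys())
value_list = list(d1.values())
print(key_list)      # ['age', 'weight', 'height']
print(value_list)    # [50, 250, "5'6"]
```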

### Checking for the existence of a key

In [None]:
print ('age' in d1)
print ('marital_status' in d1)

### Accessing a value from its key

In [None]:
d1['age']
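Looking up a key that is not present raises a `KeyError`. The dictionary method `get` returns `None` (or a default that you supply) instead. A minimal sketch, with `status` as a made-up variable name:

```python
d1 = {'age': 50, 'weight': 250, 'height': "5'6"}

try:
    status = d1['marital_status']     # key is not present
except KeyError:
    status = 'unknown'

print(status)                                # unknown
print(d1.get('marital_status'))              # None
print(d1.get('marital_status', 'unknown'))   # unknown
```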

### Updating a dictionary value

In [None]:
# assignment
d1['age'] = 24
d1['age']

### Removing dictionary entries

In [None]:
del d1['age']
d1.items()

### If Statements

Python uses `elif` instead of `else if`.
Also note the `:` at the end of each condition, the absence of parentheses, and the indentation.

In [None]:
avg = 95
if avg == 100:
    print ("super, you get an A")
elif avg > 90:
    print ("a-")
elif avg >= 80:
    print ("BBBBBB")
else:
    print ("you fail")

## List Comprehensions

Really neat.
A list comprehension can transform a list into another list, or select only certain elements, much like set-builder notation in mathematics.

In [None]:
evens = [x for x in range(5) if x % 2 == 0]
squares = [x * x for x in range(5)]
even_squares = [x * x for x in evens]

print(evens)
print(squares)
print(even_squares)

**Another example:** Create a list of integers which specify the length of each word in a certain sentence, but only if the word is not the word `the`:

<u>not pythonic:</u>

In [None]:
sentence = "the quick brown fox jumps over the lazy dog"
words = sentence.split()
word_lengths = []
for word in words:
    if word != "the":
        word_lengths.append(len(word))
        
word_lengths

<u>pythonic:</u>

In [None]:
sentence = "the quick brown fox jumps over the lazy dog"
words = sentence.split()
word_lengths = [len(word) for word in words if word != "the"]

word_lengths

In [None]:
words = 'The quick brown fox jumps over the lazy dog'.split()
words

In [None]:
[[w.upper(), w.lower(), len(w)] for w in words]

### Nested loops are permitted in list comprehensions

In [None]:
pairs = [(x,y)
        for x in range (5)
        for y in range (3)]
pairs
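It can help to read a nested comprehension as ordinary nested loops, with the first `for` as the outer loop. The sketch below builds the same list both ways (`pairs_loop` is a made-up name):

```python
pairs = [(x, y)
         for x in range(5)
         for y in range(3)]

# The same result written as ordinary nested loops.
pairs_loop = []
for x in range(5):          # outer loop = first "for" in the comprehension
    for y in range(3):      # inner loop = second "for"
        pairs_loop.append((x, y))

print(pairs == pairs_loop)  # True
print(len(pairs))           # 15
print(pairs[:4])            # [(0, 0), (0, 1), (0, 2), (1, 0)]
```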

### Random Number Generation

You first need to import the `random` library.
Setting a seed with `random.seed` makes the sequence of numbers deterministic, which is useful for reproducible results.

In [None]:
import random

y = [random.random() for x in range(3)]
print(y)

y = [random.random() for x in range(3)]
print(y)

    
# now reproducible
random.seed(5)
y = [random.random() for x in range(3)]
print(y)

random.seed(5)
y = [random.random() for x in range(3)]
print(y)  
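Rather than comparing the printed numbers by eye, the sequences can be compared directly. The sketch below also shows a separate generator object created with `random.Random`, which keeps its own state; the names `first`, `second`, `third`, and `rng` are made up for this example.

```python
import random

random.seed(5)
first = [random.random() for _ in range(3)]

random.seed(5)
second = [random.random() for _ in range(3)]

print(first == second)     # True

# A separate generator object, seeded the same way, has its own state.
rng = random.Random(5)
third = [rng.random() for _ in range(3)]
print(first == third)      # True
```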

## Functions

In [None]:
def double(x):
    """documentation about the function
    this function multiplies its input by 2"""
    x = x * 2
    return x

# the triple quoted string can be extended over multiple lines!   

In [None]:
y = [1,2,3]
print(double(y))
print(double(3))
print(double("Cat"))

In [None]:
# multiple parameters
def another_method(x,y):
    x = x + 5
    y += 3
    return (y,x)   # can return multiple values

In [None]:
a,b = another_method(10,100)
print (a)
print (a,b)  # printing multiple values on same line

In [None]:
# optional arguments
def one_final_method(x,y,z=100):  # z has a default value if it is unspecified
    x = y+z
    return x

In [None]:
print (one_final_method(3,4,5))
print (one_final_method(3,4))
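Optional arguments can also be passed by name, which lets you skip over earlier defaults. The function `describe` below is a made-up example, not part of any library.

```python
def describe(name, greeting="Hello", punctuation="!"):
    """Build a greeting; the last two parameters are optional."""
    return greeting + ", " + name + punctuation

print(describe("Ada"))                     # Hello, Ada!
print(describe("Ada", "Hi"))               # Hi, Ada!
print(describe("Ada", punctuation="?"))    # Hello, Ada?
```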

## Object-Oriented Programming

Everybody ok?

Objects are usually composed of:
1. class variables
2. instance variables
3. method definitions (with object instance passed as first argument)

* A **static class variable** is shared by all instances of a class.
* Each object has its own **instance variables**.

`__init__` is the constructor (note the double underscores before and after `init`)
* Note the static variable
* Note the instance variables
* Note the use of `self` to refer to the instance.

In [None]:
class Employee:
    'Common base class for all employees'
    empCount = 0
    
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary
        Employee.empCount += 1

    def display_count(self):
        print ("Total Employee ", Employee.empCount)

    def display_employee(self):
        print ("Name : ", self.name,  ", Salary: ", self.salary)
        
    def modify_salary(self, newsalary):
        self.salary = newsalary

In [None]:
worker1 = Employee('burns', 90000)
worker2 = Employee('wyatt', 60000)

print(Employee.empCount)
worker1.display_count()
worker2.display_employee()

worker1.modify_salary(100000)

worker1.display_employee()
print(worker1.name)


### Another Example

In [None]:
class Dog:
    def __init__(self, name):
        self.name = name
        self.tricks = []    # creates a new empty list for each dog
    
    def add_trick(self, trick):
        self.tricks.append(trick)

In [None]:
d = Dog('Fido')
e = Dog('Buddy')
d.add_trick('roll over')
e.add_trick('play dead')
print(d.tricks)
print(e.tricks)
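For contrast, here is a sketch of what happens if the list is made a class variable instead. The class name `SharedDog` and the variables `fido` and `buddy` are made up for this illustration; the point is that every instance then appends to the same list.

```python
class SharedDog:
    tricks = []                 # class variable: one list shared by all dogs

    def __init__(self, name):
        self.name = name

    def add_trick(self, trick):
        self.tricks.append(trick)   # appends to the shared list

fido = SharedDog('Fido')
buddy = SharedDog('Buddy')
fido.add_trick('roll over')
buddy.add_trick('play dead')

print(fido.tricks)                   # ['roll over', 'play dead']
print(fido.tricks is buddy.tricks)   # True
```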

### Multiple Inheritance

Java does not allow a class to inherit from more than one class; Python does.

* Resolving attribute references: in the simplest cases, if an attribute is not found in `DerivedClassName`, it is searched for in `Base1`, then (recursively) in the base classes of `Base1`, and if it is not found there, it is searched for in `Base2`, and so on.

```python
class DerivedClassName(Base1, Base2, Base3):
    # statement-1
    # statement-2
    # ...
    #
    # statement-N
```
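A small runnable sketch of this lookup order. The classes `Base1`, `Base2`, and `Derived` and their methods are made up for the example; the `__mro__` attribute shows the order in which Python searches the classes.

```python
class Base1:
    def who(self):
        return "Base1"

class Base2:
    def who(self):
        return "Base2"

    def only_in_base2(self):
        return "found in Base2"

class Derived(Base1, Base2):
    pass    # defines nothing itself, so every lookup goes to the bases

obj = Derived()
print(obj.who())              # Base1 (Base1 is listed first)
print(obj.only_in_base2())    # found in Base2
print([cls.__name__ for cls in Derived.__mro__])
# ['Derived', 'Base1', 'Base2', 'object']
```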

### <u>Private</u> Instance Variables

* Python has no truly private instance variables
* Everything is public
* By convention, begin the name of a private instance variable with an underscore `_` to signal that it should be treated as private
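A minimal sketch of the underscore convention. The `Account` class is a made-up example; note that the last line still works, because the underscore is only a signal to other programmers.

```python
class Account:
    def __init__(self, balance):
        self._balance = balance     # leading underscore: treat as private

    def deposit(self, amount):
        self._balance += amount

    def balance(self):
        return self._balance

acct = Account(100)
acct.deposit(50)
print(acct.balance())    # 150
print(acct._balance)     # 150 (still reachable; Python does not enforce privacy)
```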

# The Biggest Topics to Know (I think)

* Assignments
* If Statements
* Loops
* Lists
* Tuples
* Dictionaries
* List Comprehensions
* Random Number Generation
* Functions
* OO


## Other topics

* Regular Expressions
* While Loops
* break, continue statements
* ...


