In [None]:
%matplotlib inline
import matplotlib
import seaborn as sns
sns.set()
matplotlib.rcParams['figure.dpi'] = 144

In [None]:
import expectexception

<!-- requirement: images/logic_tree.png -->

# Python


Python is an open-source, high-level, dynamically typed, general-purpose scripting language.  Its philosophy centers on code readability and flexibility, but there's an underlying "one right (i.e. pythonic) way to do things" mentality.  Almost everyone uses the [CPython](https://en.wikipedia.org/wiki/CPython) implementation, although there are some adherents to Jython (implemented in Java) and [PyPy](https://en.wikipedia.org/wiki/PyPy) (which is a faster but more restricted implementation of Python).

# Jupyter notebooks and the kernel


Python is interpreted, as opposed to compiled. This means that a program called the interpreter turns our Python code into instructions that the processor can execute. We are currently working in an interactive environment (sometimes called a [REPL](https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop)) called a Jupyter notebook. The Jupyter notebook has a Python interpreter kernel that will evaluate our code line-by-line. The format of the notebook also allows us to segregate code into blocks (called cells) that can be executed separately.

In [None]:
1+1

Since notebooks are interactive, we'll rely heavily on console output. Only the last evaluated statement will be displayed as output of a cell. If we want to inspect results of other statements, we'll need to `print` them.

In [None]:
# we only see the results of the last line

1+1
2+2
3+3

In [None]:
# to see them all we need to print

print(1+1)
print(2+2)
print(3+3)

# Variables

Evaluating input and returning results won't get us very far if there is no way to persist the output. For example, to calculate a number in the Fibonacci sequence, we need to remember the two numbers that precede it. We store information in variables for reference in later lines of code (or later cells in the notebook).

In [None]:
a = 1 + 1
b = 4.3 * a
c = 'hello'

print(a, type(a))
print(b, type(b))
print(c, type(c))

There are many types of information we can store in variables. Here we've just stored some simple types like an integer, a floating-point number, and a string. We'll encounter more complex objects later in the course.
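
As a minimal sketch of the Fibonacci idea mentioned above, two variables can remember the preceding numbers while a third holds the new one. The names `previous`, `current`, and `following` are illustrative only, and the sequence is assumed to start at 0, 1.

```python
# Illustrative names; the sequence is assumed to start 0, 1.
previous = 0
current = 1

# The next Fibonacci number is the sum of the two before it.
following = previous + current

# Shift the window forward by reusing the same variables.
previous = current
current = following
following = previous + current

print(following)
```

Because the values persist in variables, the last three lines could be repeated in a later cell to keep stepping through the sequence.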

# Functions


How do we act on this information? Sometimes we will interactively perform operations on variables, but often we'll want to perform the same operation many times. We can define functions that accept input, perform a task, and may return output. Functions help us [organize programs in logically structured ways](https://en.wikipedia.org/wiki/Modular_programming), which makes code easier to read and maintain.

In [None]:
def multiply_by_five(x):
    return x * 5

There are a few special elements of syntax in a function definition. Functions begin with a `def` statement (because we are _def_ining a function). This is followed by the name of the function (e.g. `multiply_by_five`). After that, enclosed in parentheses, are the _arguments_. Arguments are placeholder variables that we will use to refer to input in the _body_ of the function. The body of the function is an indented block under the definition that contains all the code the function will execute whenever we _call_ it. Within the body, `return` has the special meaning of returning some output and concluding execution of the function.

In [None]:
multiply_by_five(5)

In [None]:
my_number = 5.1
multiply_by_five(my_number)

We can also define _anonymous_ functions using the `lambda` keyword.

In [None]:
add = lambda x, y: x + y

add(2, 6)
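
Anonymous functions tend to be most useful when a short function is passed as an argument to another function. The sketch below assumes a hypothetical list called `words` and uses the built-in `sorted`, whose `key` parameter accepts a function.

```python
# Hypothetical data for illustration.
words = ['banana', 'fig', 'cherry']

# Sort by length; the lambda is used once and never given a name.
by_length = sorted(words, key=lambda word: len(word))

# The equivalent named function, written with def.
def word_length(word):
    return len(word)

by_length_named = sorted(words, key=word_length)

print(by_length)
```

Both calls produce the same ordering; the `lambda` form simply avoids naming a function that is needed in only one place.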

Python is a _dynamically typed_ language, meaning that we don't explicitly declare the type of a variable when it is defined. This is what lets us use `multiply_by_five` on both `int` and `float` without any special declaration that both are allowed. Sometimes this can result in surprising behavior.

In [None]:
multiply_by_five('what will this do?')

And sometimes this can result in an error.

In [None]:
%%expect_exception TypeError

multiply_by_five(None)

# Logic and program flow


Since functions may accept many different values and types of input, we often will want a function to behave in ways that depend on the input. For instance, maybe we want our function to interpret the input `None` as 0.

In [None]:
def multiply_by_five(x):
    if x is None:
        return 0
    else:
        return x * 5

In [None]:
# this is also valid; why?
def multiply_by_five(x):
    if x is None:
        return 0
    return x * 5

In [None]:
print(multiply_by_five(my_number))
print(multiply_by_five(None))

When `x` is `None`, the condition `x is None` evaluates to `True`, so the body of the `if` statement runs. `True` and `False` are called _Boolean_ values.

We can define complex logic by combining conditions using logical operators and by nesting conditionals.

In [None]:
def multiply_by_five(x):
    if x is None:
        return 0
    elif isinstance(x, int) or isinstance(x, float):
        return x * 5
    else:
        raise TypeError('Expected input to be number or None. Got %s' % type(x))

In [None]:
multiply_by_five(3)

In [None]:
%%expect_exception TypeError

multiply_by_five('hello')
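
Conditions can also be combined with `and` and negated with `not`. The sketch below uses a hypothetical function, `describe_number`, and relies on the fact that `isinstance` accepts a tuple of types, which is equivalent to chaining the checks with `or`. It also excludes `bool` explicitly, since `bool` is a subclass of `int` in Python.

```python
def describe_number(x):
    # isinstance accepts a tuple of types, equivalent to chaining with `or`.
    # bool is a subclass of int, so it is excluded explicitly here.
    if not isinstance(x, (int, float)) or isinstance(x, bool):
        return 'not a number'
    elif x >= 0 and x < 10:
        return 'small'
    else:
        return 'other'

print(describe_number(3))
print(describe_number(True))
print(describe_number(-2.5))
```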

For more complex logic, it can be helpful to make a diagram of the branching outcomes.

In [None]:
def hello_goodbye(lang, time):
    # compare string values with ==; `is` tests object identity, not equality
    lang = lang.lower()
    if lang == 'english':
        if time < 1200:
            return 'hello'
        return 'goodbye'
    elif lang == 'espanol' or lang == 'spanish':
        if time < 1200:
            return 'hola'
        return 'hasta luego'
    else:
        raise ValueError("Argument lang must be set to 'english', 'spanish', or 'espanol'.")

![logic_flowchart](images/logic_tree.png)


## Gotcha: Truthiness


Some objects act as `False` when treated as Booleans. Examples include `None` (which is like a null value), the number `0`, empty lists, and empty dictionaries.

In [None]:
if None or 0 or False or [] or {}:
    print("Python is not Truthy")
else:
    print("Python is truthy")
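
To check how a particular object will be treated in a condition, the built-in `bool` can be called on it directly. The following is a small sketch; the function `describe` is a hypothetical example of the common idiom of testing a container for emptiness.

```python
# Empty or zero-like values are treated as False.
print(bool(None), bool(0), bool(''), bool([]), bool({}))

# Non-empty containers are treated as True, even if their contents are falsy.
print(bool(1), bool('False'), bool([0]), bool({'key': None}))

def describe(items):
    # `not items` is True for an empty list, dictionary, or string.
    if not items:
        return 'empty'
    return 'has items'

print(describe([]))
print(describe([0]))
```

Note that the non-empty string `'False'` is truthy; only the empty string acts as `False`.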

# Iteration


Functions are great for packaging up code that we want to use repeatedly, but we also need a way to _execute_ code repeatedly. Repeatedly executing code is called iteration.

In [None]:
i = 0
while i < 10:
    print(i)
    i += 2

The `while` loop looks similar to its counterpart in other languages, but usually we will iterate using a `for` loop. Unlike in C-style languages, a Python `for` loop steps through a collection of objects, setting a placeholder variable to the current object in the collection. In the example below, we step through the numbers 0 through 3, with `x` taking on the value of each one.

In [None]:
# range is lazy in Python 3, so wrap it in list() to see its contents
print(list(range(4)))

for x in range(4):
    print(x)
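
The same pattern works for any collection, not only ranges of numbers. As a sketch, assuming a hypothetical list called `languages`, we can step through the values directly and use the built-in `enumerate` when the position is needed as well.

```python
# Hypothetical data for illustration.
languages = ['english', 'spanish', 'french']

# Step through the values directly; no index variable is needed.
for language in languages:
    print(language)

# When the position is needed too, enumerate yields (index, value) pairs.
positions = []
for index, language in enumerate(languages):
    positions.append((index, language))

print(positions)
```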

In C and Java, one iterates through an index.  In Python, one iterates through the *values* of a list, making the syntax cleaner.  In C, we would write
```c
int arr[10];
for (int i=0; i < 10; i++) {
    arr[i] = 2 * i;
}
```

In Python, this is

In [None]:
arr1 = list()
for i in range(10):
    arr1.append(2 * i)

Python also has a concise syntax for iteration called a _comprehension_, which can turn a simple `for` loop into a one-liner.

In [None]:
# this is a comprehension
arr2 = [2 * i for i in range(10)]

print("arr1 =", arr1)
print("arr2 =", arr2)
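
Comprehensions can also filter values with a trailing `if`, and the same syntax with curly braces builds a dictionary. The names `even_squares` and `square_of` below are illustrative.

```python
# Keep only the even numbers, then square them.
even_squares = [n * n for n in range(10) if n % 2 == 0]

# The same pattern builds a dictionary mapping each number to its square.
square_of = {n: n * n for n in range(4)}

print(even_squares)
print(square_of)
```

A comprehension is usually a good fit when the loop body is a single expression; longer logic tends to read better as an ordinary `for` loop.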

# Whitespace Matters


In Python, indentation marks a block of code, such as the body of a function, conditional, loop, or class. For example, while a function might end with `return`, not every function will have a `return`. Python knows where the body of a function ends based on indentation.

On the positive side, this reduces the number of lines of code (no more `{` and `}`).  On the downside, it makes multiline anonymous functions difficult (they don't exist in Python).

In [None]:
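# the indented line is the body of the loop
for i in range(10):
    print(i)

# this function has no return statement, so calling it returns None
def add(x, y):
    x + y

# methods are indented inside the class; their bodies are indented again
class Box(object):
    def __init__(self, inside):
        self._inside = inside

    def value(self):
        return self._inside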
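
A short sketch of two consequences of these rules, using hypothetical names: a function whose body ends without `return` gives back `None`, and whether a line is indented decides whether it belongs to a loop.

```python
def add_without_return(x, y):
    x + y  # the sum is computed and then discarded

def add_with_return(x, y):
    return x + y

print(add_without_return(1, 2))
print(add_with_return(1, 2))

total = 0
passes = 0
for i in range(3):
    total += i   # indented: runs on every pass through the loop
passes += 1      # not indented: runs once, after the loop finishes

print(total)
print(passes)
```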

# Putting it all together


We might combine functions, iteration, and logic to perform a complex task. Let's print out the prime numbers below some number, `n`. Notice that `print_primes` is very easy to understand, because we've written a helper function, `is_prime`, that checks if a single number is prime. This is called [_modular programming_](https://en.wikipedia.org/wiki/Modular_programming).

In [None]:
def is_prime(number):
    # 1 is a special case
    if number == 1:
        return False
    
    for divisor in range(2, number):
        if number % divisor == 0:
            return False

    return True

def print_primes(n):
    for number in range(1, n):
        if is_prime(number):
            print('%d is prime' % number)

print_primes(23)
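
One benefit of the modular structure is that the helper can be improved without touching the code that calls it. The following is a sketch of one possible refinement, not a replacement for the version above: a composite number always has a divisor no larger than its square root, so the search can stop there. The names `is_prime_fast` and `primes_below` are hypothetical, and returning a list instead of printing is a design choice made for this example.

```python
def is_prime_fast(number):
    # Numbers below 2 are not prime.
    if number < 2:
        return False

    divisor = 2
    # Stop once the divisor squared exceeds the number.
    while divisor * divisor <= number:
        if number % divisor == 0:
            return False
        divisor += 1

    return True

def primes_below(n):
    # Collect the primes in a list so the caller decides what to do with them.
    return [number for number in range(1, n) if is_prime_fast(number)]

print(primes_below(23))
```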

# Classes


Finally, you can see the notebook about [Classes](IW_Classes.ipynb) to learn more about classes in Python and how they differ from other languages.

# Flexible

Python supports multiple programming paradigms ([object-oriented](https://en.wikipedia.org/wiki/Object-oriented_programming), [imperative](https://en.wikipedia.org/wiki/Imperative_programming), [functional](https://en.wikipedia.org/wiki/Functional_programming)).
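
As a rough illustration, the same small task (summing the squares of a list) can be written in each of the three styles. The class name `SquareSummer` and the variable names are hypothetical.

```python
numbers = [1, 2, 3, 4]

# Imperative: describe each step and update state explicitly.
total_imperative = 0
for n in numbers:
    total_imperative += n * n

# Functional: compose functions, with no variable being updated.
total_functional = sum(map(lambda n: n * n, numbers))

# Object-oriented: bundle the data with the behavior that acts on it.
class SquareSummer(object):
    def __init__(self, values):
        self._values = values

    def total(self):
        return sum(v * v for v in self._values)

total_object_oriented = SquareSummer(numbers).total()

print(total_imperative, total_functional, total_object_oriented)
```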

# Concise

Python is more concise than C++ or Java, and not only because it drops superfluous symbols: it also abstracts away many implementation details, which usually reduces code size.

Opening a file in Java
``` java
import java.io.*;
...

BufferedReader myFile =
    new BufferedReader(
        new FileReader(argFilename));
```
        
Opening a file in Python
``` python
myFile = open(argFilename)
```
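
In practice, a file is often opened inside a `with` block so that it is closed automatically. The sketch below creates a throwaway file with the standard `tempfile` module purely so that it is self-contained; the file contents are made up for illustration.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
handle, argFilename = tempfile.mkstemp(suffix='.txt')
with os.fdopen(handle, 'w') as tmp:
    tmp.write('first line\nsecond line\n')

# The with block closes the file automatically, even if an error occurs.
with open(argFilename) as myFile:
    lines = [line.rstrip('\n') for line in myFile]

# Clean up the temporary file.
os.remove(argFilename)

print(lines)
```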

# Great for Data Science


For better or worse, data science has really gravitated towards Python as an implementation language.
- Partially this is because it's production ready (try getting your engineers to accept R in a production system) but still relatively easy to use (try getting your data scientists to write their code in Java or C).
- Where it lacks speed, it can plug directly into C via Cython, which has led to the creation of a whole host of numerical libraries (primarily [numpy](http://www.numpy.org/) and [scipy](https://www.scipy.org/)) that are the basis of important data science tools like [pandas](http://pandas.pydata.org/) and [scikit-learn](http://scikit-learn.org/).  We'll discuss this in more detail later. Python can also call C and Fortran code.

*Copyright &copy; 2017 The Data Incubator.  All rights reserved.*