In [1]:
# preamble to be able to run notebooks in Jupyter and Colab
try:
    from google.colab import drive
    import sys
    
    notes_home = "/content/drive/Shared drives/CSC310/notes"
    drive.mount('/content/drive')
    
    sys.path.insert(1,notes_home) # let the notebook access the notes folder
    prefix = notes_home + "/" # needed for data file access

except ModuleNotFoundError as err:
    prefix = "" # running native Jupyter environment -- no need for prefix

# Python for Data Science

* Anaconda3 ([www.anaconda.com](https://www.anaconda.com))
    * Python 3.x
    * Includes ALL major Python data science packages
        * scikit-learn
        * Pandas
        * PlotPy
    * Jupyter Notebooks


## A Whirlwind Tour of Python

If you are not familiar with Python, or you feel you are rusty, then I recommend looking at Jake VanderPlas’ intro to Python:

[A Whirlwind Tour of Python](https://www.oreilly.com/programming/free/files/a-whirlwind-tour-of-python.pdf)

## Python - simple commands!

Python provides an interactive interpreter that can be started from the shell:

```Python
lutz$ python
Python 3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul  2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 3 + 10.5
13.5
>>> 7/2
3.5
>>> print("hello world!")
hello world!
>>> 
```

But we are going to be using it from within Jupyter Notebooks!

## Loading Files

Assume that we have the following program stored in a file called `helloworld.py` in the folder `assets`:
```Python
"""                                                                                                               
helloworld.py
This is the classic program every programmer writes when he or she learns
a new programming language.
"""

def hello():
    "Just print 'hello world!' and that's it"
    print("hello world!") # print inserts a newline char    
```

In [2]:
import assets.helloworld

### Calling Functions in Modules

Functions belong to modules. If you want to execute a function in a module, you have to provide the module name as a qualifier (together with the folder the module lives in).

In [3]:
assets.helloworld.hello()

hello world!


One of the most helpful features of Python is the `help` function, which can be called on any Python object.

In [4]:
help(assets.helloworld)

Help on module assets.helloworld in assets:

NAME
    assets.helloworld

DESCRIPTION
    helloworld.py
    This is the classic program every programmer writes when he or she learns
    a new programming language.

FUNCTIONS
    hello()
        Just print 'hello world!' and that's it

FILE
    /Users/lutz/Dropbox/URI/Courses/CSC310/github stuff/ds-git/notes/assets/helloworld.py




**Docstrings shine!!!** - automatically generated documentation of your module


### Docstring vs Comment

* A docstring should document what your code does
  * Important for the user of your code
  * Docstrings are exported by Python into the help system
* A comment should comment on how your code does it
  * Important for your peer programmers modifying/understanding your code
  * Comments stay internal to the code
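
The difference between the two can be seen at run time. The following is a minimal sketch; the function `area` is a hypothetical example and not part of the course files. The docstring is stored on the function object (and is what `help` displays), while the comment is discarded by the interpreter.

```Python
def area(width, height):
    """Return the area of a rectangle with the given width and height."""
    # multiply the two sides (a note for programmers reading the source)
    return width * height

# the docstring travels with the function object
print(area.__doc__)

# the comment does not: it never shows up in the documentation
print("multiply" in area.__doc__)
```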


## Python - `import *` considered dangerous!

`from <module> import *` -- Any function or variable in `<module>` is imported into your local scope WITHOUT a module qualifier!


In [5]:
from assets.helloworld import *

In [6]:
hello()

hello world!


**Very Dangerous!** - it can lead to silent name clashes with strange effects on your code!


Suppose we have another file `helloagain.py` that also defines a `hello` function:

```Python
"""
helloagain.py

Here we demonstrate that Python silently clobbers clashing names if you are not careful.
"""

def hello():
    "Print out 'hello again!' and that's it"
    print("hello again!")
```

In [7]:
from assets.helloagain import * # Silently overwrote the original hello() - the original is no longer available!!

In [8]:
hello()

hello again!


**Never use `from <module> import *`  - you have no control over your name space!** Always import the names you need explicitly, renaming them with `as` if they clash, or call functions through their fully qualified module names.


In [9]:
from assets.helloworld import hello 
from assets.helloagain import hello as hi

In [10]:
hello()

hello world!


In [11]:
hi()

hello again!
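
The same idea can be tried without the course files. As a small sketch, the standard library modules `math` and `cmath` both define a function called `sqrt`; the alias `real_sqrt` is a name chosen here purely for illustration. Qualified names and explicit aliases keep both functions available side by side.

```Python
import math                          # use as math.sqrt
import cmath                         # use as cmath.sqrt
from math import sqrt as real_sqrt   # explicit import under a new name

print(math.sqrt(16))    # square root on real numbers
print(cmath.sqrt(-4))   # square root on complex numbers
print(real_sqrt(9))     # same function as math.sqrt, under its alias
```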


## Python - basic programming structures!

### The Loop

In [12]:
for i in range(5): # a for loop with range object
    print(i)

0
1
2
3
4


The `range` function:
```
range(stop) -> range object
range(start, stop[, step]) -> range object
  
Returns an object that produces a sequence of integers from start (inclusive) to stop (exclusive) by step.  
range(i, j) produces i, i+1, i+2, ..., j-1.
start defaults to 0, and stop is omitted!  range(4) produces 0, 1, 2, 3.
These are exactly the valid indices for a list of 4 elements.
When step is given, it specifies the increment (or decrement).
```

In [13]:
list(range(5))

[0, 1, 2, 3, 4]

In [14]:
list(range(5,0,-1))

[5, 4, 3, 2, 1]
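
A few more calls illustrate the `step` argument described above. This is only a sketch; the variable names are arbitrary.

```Python
evens = list(range(0, 10, 2))        # start at 0, stop before 10, step by 2
countdown = list(range(10, 0, -3))   # a negative step counts down
empty = list(range(5, 0))            # the default step of +1 cannot reach 0 from 5

print(evens)       # [0, 2, 4, 6, 8]
print(countdown)   # [10, 7, 4, 1]
print(empty)       # []
```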

In [15]:
lst = [1,2,3] # for loop over lists
for e in lst:
    print(e)

1
2
3


In [16]:
lst = ['chicken','turkey','duck']
for e in lst:
    print(e)

chicken
turkey
duck


### The if-then-else statement

In [17]:
x = input("type a value: ")
x = int(x)
if x==2:
    print('x equals 2')
else:
    print('x is something else')


type a value: 3
x is something else
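
When there are more than two cases, `elif` can be used to chain conditions. The sketch below wraps the test in a hypothetical function `describe` so that it runs without keyboard input.

```Python
def describe(x):
    """Return a short description of the integer x relative to 2."""
    if x == 2:
        return 'x equals 2'
    elif x > 2:
        return 'x is greater than 2'
    else:
        return 'x is less than 2'

for value in [1, 2, 3]:
    print(describe(value))
```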


### The function definition statement

We saw some of that already above. Here is another function definition with parameters.

In [18]:
def inc(x):
    return x+1


In [19]:
inc(3)

4

A slightly more complicated example using a recursive function.

In [20]:
"""                                                                                                               
fact.py                                                                                                           
                                                                                                                  
An example of a recursive function to                                                                             
 find the factorial of a number                                                                                   
"""

def factorial(x):
    """                                                                                                           
    This is a recursive function to find the factorial of an 
     integer x where x >= 0.  The function is not defined 
     for x < 0.                                                         
    """
    if x == 0:
        return 1
    else:
        return x * factorial(x-1)

In [21]:
factorial(3)

6
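
As the docstring says, `factorial` is not defined for `x < 0`; called with a negative number, the recursion never reaches the base case. One possible way to guard against that is sketched below. The name `safe_factorial` and the choice of `ValueError` are assumptions made for this example, not part of `fact.py`.

```Python
def safe_factorial(x):
    """Return x! for an integer x >= 0; raise ValueError otherwise."""
    if not isinstance(x, int) or x < 0:
        raise ValueError("factorial is only defined for integers x >= 0")
    if x == 0:
        return 1
    return x * safe_factorial(x - 1)

print(safe_factorial(5))   # 120

try:
    safe_factorial(-1)
except ValueError as err:
    print("error:", err)
```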

## Python Lists

In Python, lists are a cornerstone of programming. Consequently, lists have a lot of built-in functionality.

In [22]:
lst = [1,2,3]
lst

[1, 2, 3]

In [23]:
lst.append(4)
lst

[1, 2, 3, 4]

In [24]:
lst.reverse()
lst

[4, 3, 2, 1]

In [25]:
lst[0]

4

In [26]:
lst = []
lst

[]

In [27]:
len(lst)

0

Things you can do with lists: <br>
 append(...)<br>
 clear(...)<br>
 copy(...)<br>
 count(...)<br>
 extend(...)<br>
 index(...)<br>
 insert(...)<br>
 pop(...)<br>
 remove(...)<br>
 reverse(...)<br>
 sort(...)<br>
See `help([ ])`
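
A short sketch exercising several of these methods; the comments show the list after each step.

```Python
lst = [3, 1, 2]
lst.extend([5, 4])           # [3, 1, 2, 5, 4]
lst.insert(0, 9)             # [9, 3, 1, 2, 5, 4]
last = lst.pop()             # removes and returns 4 -> [9, 3, 1, 2, 5]
lst.remove(9)                # removes the first 9 -> [3, 1, 2, 5]
lst.sort()                   # sorts in place -> [1, 2, 3, 5]
position = lst.index(3)      # index of the first 3 -> 2
occurrences = lst.count(5)   # how often 5 occurs -> 1

print(lst, last, position, occurrences)
```

Note that most of these methods modify the list in place and return `None`; `pop`, `index`, and `count` are the ones here that return a value.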

### List Comprehensions

Comprehensions are a shorthand notation for constructing lists.

In [28]:
S = [x**2 for x in range(10)]
S

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Another more complicated example.

In [29]:
words = 'The quick brown fox jumps over the lazy dog'.split()
words

['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

In [30]:
stuff = [[w.upper(), w.lower(), len(w)] for w in words]
stuff

[['THE', 'the', 3],
 ['QUICK', 'quick', 5],
 ['BROWN', 'brown', 5],
 ['FOX', 'fox', 3],
 ['JUMPS', 'jumps', 5],
 ['OVER', 'over', 4],
 ['THE', 'the', 3],
 ['LAZY', 'lazy', 4],
 ['DOG', 'dog', 3]]

Note: strings are objects with member functions!

Note: we are constructing a list of lists!
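
A comprehension can also filter its input with a trailing `if` clause. As a small sketch on the same sentence, the following keeps only the words longer than four characters; the name `long_words` is chosen for illustration.

```Python
words = 'The quick brown fox jumps over the lazy dog'.split()

# keep only words with more than four characters, in lower case
long_words = [w.lower() for w in words if len(w) > 4]
print(long_words)   # ['quick', 'brown', 'jumps']
```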


## Data Structures

Python has a number of data structures beyond lists that make programming much easier:
* Tuples
* Sets
* Dictionaries


### Tuples

* A tuple consists of a number of values separated by commas
* Though tuples may seem similar to lists, they are often used in different situations and for different purposes. 
* Tuples are *immutable*, and usually contain a heterogeneous sequence of elements that are accessed via *unpacking* or *indexing*. 
* Lists are *mutable*, and their elements are usually homogeneous and are accessed by *iterating* over the list.


In [31]:
t = (12345, 54321, 'hello!')
t

(12345, 54321, 'hello!')

In [32]:
t[0]

12345

In [33]:
(x, y, z) = t     # tuple unpacking (a simple form of pattern matching)
x

12345

In [34]:
empty = ()
len(empty)

0

In [35]:
singleton = 'hello',    # <-- note trailing comma
len(singleton)

1

In [36]:
singleton

('hello',)
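
Immutability means that assigning to an element of a tuple fails. The sketch below shows the error and one common workaround, which is to copy the tuple into a list first; the variable names are arbitrary.

```Python
t = (12345, 54321, 'hello!')

try:
    t[0] = 0                 # tuples do not support item assignment
except TypeError as err:
    print("cannot modify a tuple:", err)

lst = list(t)                # a list copy of the tuple is mutable
lst[0] = 0
print(lst)
print(t)                     # the original tuple is unchanged
```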

### Sets

A set is an unordered collection with no duplicate elements.

In [37]:
basket = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'} # apple and orange have duplicate entries
basket # show that duplicates have been removed

{'apple', 'banana', 'orange', 'pear'}

In [38]:
'orange' in basket                 # fast membership testing

True

In [39]:
'crabgrass' in basket

False

Sets support the standard set operations such as union, intersection and set difference. Sets can also be built using **set comprehensions** which mirror the mathematical version of set comprehensions.

In [40]:
a = set('abracadabra')
a

{'a', 'b', 'c', 'd', 'r'}

In [41]:
b = set('alacazam')
b

{'a', 'c', 'l', 'm', 'z'}

In [42]:
a | b # union

{'a', 'b', 'c', 'd', 'l', 'm', 'r', 'z'}

In [43]:
a & b # intersection

{'a', 'c'}

In [44]:
a - b # difference

{'b', 'd', 'r'}

In [45]:
{x for x in set('abracadabra') if x not in set('abc')} # set comprehension

{'d', 'r'}
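
The operators above also have method equivalents, and there are a few more operations worth knowing. The following sketch reuses the same two sets; `sorted` is used only to get a predictable printing order, since sets themselves are unordered.

```Python
a = set('abracadabra')
b = set('alacazam')

print(a.union(b) == (a | b))          # method form of union
print(a.intersection(b) == (a & b))   # method form of intersection
print(sorted(a ^ b))                  # symmetric difference: in a or b, but not both
print({'a', 'c'} <= a)                # subset test
```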

### Dictionaries

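A dictionary is a collection of `key:value` pairs, with the requirement that the keys are unique (within one dictionary). Since Python 3.7, dictionaries preserve insertion order, but values are looked up by key rather than by position.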


In [46]:
tel = {'jack': 4098, 'sape': 4139}
tel

{'jack': 4098, 'sape': 4139}

In [47]:
tel['jack'] # looking up a value using a key

4098

In [48]:
tel['guido'] = 4127 # adding a new key:value pair
tel

{'guido': 4127, 'jack': 4098, 'sape': 4139}

In [49]:
del tel['sape'] # removing a key:value pair
tel

{'guido': 4127, 'jack': 4098}

In [50]:
list(tel.keys()) # we can just look at the keys in the dictionary

['jack', 'guido']

In [51]:
list(tel.values()) # we can just look at the values in the dictionary

[4098, 4127]
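
Keys and values can also be visited together, and a lookup for a key that may be missing can be made safe. A minimal sketch, rebuilding the dictionary from above so that it runs on its own:

```Python
tel = {'jack': 4098, 'guido': 4127}

# iterate over key:value pairs
for name, number in tel.items():
    print(name, number)

print('sape' in tel)          # membership test on the keys -> False
print(tel.get('sape', 0))     # get returns a default instead of raising KeyError -> 0
```

Using `tel['sape']` here would raise a `KeyError`, so `in` or `get` is the safer choice when a key may be absent.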