# Introduction to Programming (with Python!)

Now that you have seen something useful (our BLS API example), we will take the time to back up and cover some fundamentals. This notebook will cover some basic programming concepts, but do note that it is a poor substitute for the spectacular (and interactive) [How to Think Like a Computer Scientist](http://interactivepython.org/runestone/static/thinkcspy/index.html). Let's dive in...

In this programming introduction, we will cover the following topics:

1. Programming Environments
2. The Standard Library
3. Modules
4. Hello World!
5. Data Types
6. Variables
7. Errors
8. Collections
9. Control Flow
10. Functions
11. Iteration
12. Input/Output

In [1]:
from IPython.display import HTML, display, SVG

# Define function to allow us to display charts in an iframe
def show_iframe(url, iheight=400, iwidth=1000):
    display_string = '<iframe src={url} width={w} height={h}></iframe>'.format(url=url, w=iwidth, h=iheight)
    print(display_string)
    return HTML(display_string)

## Programming Environments

[Programming environments](https://en.wikipedia.org/wiki/Integrated_development_environment) are not languages themselves, but rather tools that are available to the user to facilitate writing and executing code. This very Notebook is a programming environment.  Python, coupled with the resources of the IPython and Jupyter projects, makes it possible to execute code in a variety of ways:

+ At the command line (a.k.a. shell) via a [REPL](https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop) (just discovered this [cool online REPL tool](https://repl.it/))
+ In an enhanced REPL called the [Qt Console](https://ipython.org/ipython-doc/3/interactive/qtconsole.html), which allows inline plotting
+ Execution of a script at the command line (e.g. `python some_script.py`)
+ Inside a Jupyter Notebook (like this one)

Jupyter Lab is an environment that houses all of these options, and allows for interactions between them. You can, for example, write some code in a text editor inside of Jupyter Lab, and then send a selection to a coupled Qt Console.

Even though we are largely using the Jupyter ecosystem, there are a lot of development environments out there. One particularly popular (and more heavyweight) Integrated Development Environment is PyCharm.

In [2]:
%%HTML 

<a href="https://www.youtube.com/watch?v=dWlk5JvEaWs">
  <img src="https://i.ytimg.com/vi/dWlk5JvEaWs/maxresdefault.jpg">
</a>

## The Standard Library

While we will spend a lot of our time working with tools that developers have put into packages, the base language has a rich set of functions built in.  These intrinsic functions are collectively known as the [Python standard library](https://docs.python.org/3/library/index.html).  They are the fundamental building blocks for all projects that use Python. To access them, you need only install the language itself.

In [3]:
show_iframe('https://docs.python.org/3/library/index.html', iwidth='95%', iheight=700)

<iframe src=https://docs.python.org/3/library/index.html width=95% height=700></iframe>


In [1]:
import random

dir(random)

['BPF',
 'LOG4',
 'NV_MAGICCONST',
 'RECIP_BPF',
 'Random',
 'SG_MAGICCONST',
 'SystemRandom',
 'TWOPI',
 '_BuiltinMethodType',
 '_MethodType',
 '_Sequence',
 '_Set',
 '__all__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 '_acos',
 '_bisect',
 '_ceil',
 '_cos',
 '_e',
 '_exp',
 '_inst',
 '_itertools',
 '_log',
 '_pi',
 '_random',
 '_sha512',
 '_sin',
 '_sqrt',
 '_test',
 '_test_generator',
 '_urandom',
 '_warn',
 'betavariate',
 'choice',
 'choices',
 'expovariate',
 'gammavariate',
 'gauss',
 'getrandbits',
 'getstate',
 'lognormvariate',
 'normalvariate',
 'paretovariate',
 'randint',
 'random',
 'randrange',
 'sample',
 'seed',
 'setstate',
 'shuffle',
 'triangular',
 'uniform',
 'vonmisesvariate',
 'weibullvariate']

In [2]:
help(random.uniform)

Help on method uniform in module random:

uniform(a, b) method of random.Random instance
    Get a random number in the range [a, b) or [a, b] depending on rounding.
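
As a small illustration of putting a standard library module to work, the sketch below draws a few numbers with `random.uniform` and rolls a die with `random.randint`. The seed value and the ranges are arbitrary choices made for this example.

```python
import random

# Fixing the seed makes the "random" draws repeatable from run to run.
random.seed(42)

# Three floating-point draws between 1.0 and 5.0.
draws = [random.uniform(1.0, 5.0) for _ in range(3)]
print(draws)

# randint includes both endpoints, so this behaves like a six-sided die.
die_roll = random.randint(1, 6)
print(die_roll)
```

Nothing beyond a working Python installation is needed to run this, which is the point of the standard library.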



## Modules

Python [modules](https://docs.python.org/3/tutorial/modules.html) are just files that contain Python definitions and statements. The purpose of modules is to make it possible to leverage durable tools. Once they are imported, the contents of the module are then available for use in the calling program.

We have a simple module, called `module_test` available right now.  Let's see if we can use it.

In [3]:
!cat module_test.py

'''
FILE:       module_test.py
AUTHOR:     Marvin
CREATED:    2/3/2018

This module is intended only to demonstrate how we can use modules in our programs.
'''

def squareit(x):
    return x**2

In [4]:
from module_test import squareit

squareit(9)

81

When we use an `import` statement, Python first checks the current directory to see if the module is located there. (`module_test.py` is in our current directory.)  If it doesn't see it there, it then checks the directories listed in `PYTHONPATH`, followed by an installation-dependent default location for packages, such as `/usr/local/lib/python/`. These two locations aren't defined for us, because Miniconda deposits packages elsewhere.

In [8]:
!echo $PYTHONPATH
!ls /usr/local/lib/python


ls: cannot access '/usr/local/lib/python': No such file or directory
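
If you are curious where Python will actually look on your machine, the search locations are exposed as the list `sys.path`. The sketch below prints that list and then asks an already imported module (`random`, chosen only as an example) where it was loaded from. The paths printed will differ from one installation to the next.

```python
import sys
import random

# sys.path is the ordered list of locations Python searches on import.
for location in sys.path:
    print(location)

# An imported module remembers the file it was loaded from.
print(random.__file__)
```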


Our packages reside under Miniconda: `[..]/miniconda3/pkgs`. In general, packages promote good coding hygiene by encouraging modularity, which makes code easier to debug and maintain.

When we import packages, we can import a specific function, like we did above.  We can also import entire libraries with the wild card operator `*`.

In [5]:
from seaborn import *

Once we do this, we can use any function or object in the library without having to reference the library first. **I strongly discourage you from doing this.** It makes code harder to follow because you don't have any easy way of knowing which components come from where. A better solution to the problem of avoiding excessive typing is to use an *alias*.

In [6]:
import numpy as np

Now I have the `numpy` library available; I just have to make sure that I use `np.` when referencing anything from it. The choice of alias is arbitrary. I can also use an alias when importing subcomponents of a library. Maybe, for example, I don't want all of `bokeh`.  I can grab just the `plotting` submodule.

In [7]:
import bokeh.plotting as bp

This avoids loading submodules I don't need, which can reduce import time and memory use. (Python still imports the parent `bokeh` package in order to reach `bokeh.plotting`.)
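
To see why keeping a module prefix (or alias) helps, consider two standard library modules that both define a function named `sqrt`. This is only a sketch: the alias `cm` is an arbitrary choice for illustration.

```python
import math
import cmath as cm  # alias chosen for this example

# Same function name, different modules, different behavior.
real_root = math.sqrt(16)
complex_root = cm.sqrt(-16)

print(real_root)     # 4.0
print(complex_root)  # 4j
```

Because each call names its module, a reader can tell at a glance which `sqrt` is in play. With wildcard imports of both modules, whichever was imported last would silently win.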

What if I don't know what component I need?  I can always import the base library and take a look with the built-in Python function `dir()`. Let's check out the submodules under the top level of `bokeh`.

In [12]:
import bokeh
dir(bokeh)

 '__base_version__',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '_version',
 'absolute_import',
 'colors',
 'core',
 'document',
 'embed',
 'events',
 'io',
 'layouts',
 'license',
 'model',
 'models',
 'palettes',
 'plotting',
 'print_function',
 'resources',
 'sampledata',
 'settings',
 'themes',
 'transform',
 'util',

## Hello World!

One of the first things that folks do when learning a new language is just making sure they can get something to run. The canonical method of doing this is a `Hello World` script.  We have one ready, but we will also go ahead and make one together.  Just so you see what it looks like, we can run it by using shell commands from inside the Notebook.

In [9]:
!pwd

/home/choct155/projects/telling_stories_with_data/examples


In [10]:
!ls

api_example.ipynb  hello_world.py  python_intro.ipynb	  test
cex		   module_test.py  random_data.csv	  test_io
figs		   __pycache__	   simulating_data.ipynb  two_cols.html


In [12]:
!ls cex/src

cex_food_spend_age_income.ipynb  module_test.py  test.py


In [16]:
!python hello_world.py

Hello World!
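
The contents of `hello_world.py` are not shown here, so the following is only a guess at what such a script might look like. The `main` function and the `__name__` guard are assumptions added for illustration; a single `print` call would produce the same output.

```python
# hello_world.py (assumed contents; the original file is not shown)

def main():
    # Print the traditional greeting.
    print('Hello World!')

# Only run main() when the file is executed as a script,
# not when it is imported as a module.
if __name__ == '__main__':
    main()
```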


## Data Types

In [18]:
if 3 > 4:
    print('success')

For our purposes, we are primarily concerned with four data types (the list is not exhaustive):

+ [Boolean](https://en.wikipedia.org/wiki/Boolean_data_type)
+ [Integer](https://en.wikipedia.org/wiki/Integer_(computer_science))
+ [Float](https://en.wikipedia.org/wiki/Floating-point_arithmetic#Floating-point_numbers)
+ [String](https://en.wikipedia.org/wiki/String_(computer_science))

The actual implementation of these broad classes of data is a bit more complicated, but we can get quite far with this representation.
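
One way to check which type a value has is the built-in `type()` function. The sketch below applies it to one example of each of the four types and then shows a few explicit conversions. The sample values are arbitrary.

```python
# One example value for each of the four types discussed above.
examples = [True, 42, 3.14, 'abc123']
type_names = [type(value).__name__ for value in examples]

for value, name in zip(examples, type_names):
    print('{!r} is a {}'.format(value, name))

# Converting between types is explicit.
print(int('123') + 456)        # 579
print(str(123) + '456')        # 123456
print(float(3))                # 3.0
print(bool(0), bool('text'))   # False True
```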

In [19]:
'abc123'

'abc123'

In [20]:
'123"'

'123"'

In [21]:
123 + 456

579

In [22]:
'123'+'456'

'123456'

In [26]:
str(123)

'123'

In [30]:
float('1e23')

1e+23

In [None]:
mylist = ['dog', 'cat', 'fish']

## Variables

In math, variables are symbolic representations of values or concepts.  In programming, the same general concept holds, but how it is actually implemented matters. Under the hood, a variable is actually [an address](https://en.wikipedia.org/wiki/Variable_(computer_science)) in the computer's memory. Why does this matter? If you change the value at that location, the symbolic representation will then point to a new value at the old location.

In Python, it's a bit tricky to displace the value of a variable without explicit reassignment, but that isn't true in all languages. The more common scenario is when a coder thinks that two variables are pointing at the same value, and they become decoupled.

In [31]:
a = 10
b = a

print('a => Value: {}, Location: {}'.format(a, id(a)))
print('b => Value: {}, Location: {}'.format(b, id(b)))

a => Value: 10, Location: 94859048353280
b => Value: 10, Location: 94859048353280


In [32]:
b = 20

print('a => Value: {}, Location: {}'.format(a, id(a)))
print('b => Value: {}, Location: {}'.format(b, id(b)))

a => Value: 10, Location: 94859048353280
b => Value: 20, Location: 94859048353600
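
The opposite surprise is also worth seeing: with a mutable object such as a list, two names can stay coupled when you expected them to be independent. The sketch below uses hypothetical names (`first`, `second`, `third`) to contrast sharing one list with making a copy.

```python
first = [1, 2, 3]
second = first           # both names refer to the same list object
second.append(4)         # mutating through one name...

print(first)             # ...is visible through the other: [1, 2, 3, 4]
print(first is second)   # True

third = list(first)      # a (shallow) copy is a separate object
third.append(5)

print(first)             # unchanged: [1, 2, 3, 4]
print(third)             # [1, 2, 3, 4, 5]
print(first is third)    # False
```

Reassignment (`b = 20`) points a name at a different object, while mutation (`second.append(4)`) changes the object that every name referring to it shares.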


## Errors

There are at least three main types of errors:

+ Syntax Errors
+ Runtime Errors
+ Semantic Errors

In [16]:
# A syntax error
mylist = [1, 2, 3

SyntaxError: unexpected EOF while parsing (<ipython-input-16-6ba40ac16db8>, line 2)

In [33]:
# A runtime error
def divideByZero(x):
    return x/0

divideByZero(10)

ZeroDivisionError: division by zero

In [35]:
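# A semantic error is code that runs without complaint but does the wrong thing
# (e.g. returning x*2 here would silently give 20). This version is correct.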
def squareit(x):
    return x**2
    
squareit(10)

100

In [41]:
False = True

SyntaxError: can't assign to keyword (<ipython-input-41-7d4e2f54c615>, line 1)

In [38]:
assert squareit(10) == 100
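
The `assert` above passes silently because `squareit` is correct. To see what a semantic error looks like, here is a sketch with a deliberately broken, hypothetical function `squareit_buggy`. It runs without raising anything on its own, and it even gives the right answer for one input.

```python
def squareit_buggy(x):
    # Bug: multiplies by 2 instead of raising to the power 2.
    return x * 2

print(squareit_buggy(2))    # 4, correct only by coincidence
print(squareit_buggy(10))   # 20, but the square of 10 is 100

# An assertion turns the silent mistake into a visible failure.
try:
    assert squareit_buggy(10) == 100, 'squareit_buggy(10) should be 100'
except AssertionError as err:
    print('Caught a semantic error: {}'.format(err))
```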

In [36]:
import sys
try:
    divideByZero(10)
except Exception:  # a bare except would also trap things like KeyboardInterrupt
    print('Houston we have a problem: {}'.format(sys.exc_info()[0]))

Houston we have a problem: <class 'ZeroDivisionError'>


## Collections

There are four main collection objects that you need to know about:

+ [List](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists)
+ [Set](https://docs.python.org/3/tutorial/datastructures.html#sets)
+ [Tuple](https://docs.python.org/3/tutorial/datastructures.html#tuples-and-sequences)
+ [Dictionary](https://docs.python.org/3/tutorial/datastructures.html#dictionaries)

All of these are incredibly useful, and will come up with some frequency.  Here is some free advice: **don't sleep on those dictionaries!**  They can hold virtually any object, which can then be looked up in constant time (on average).

In [4]:
mylist = [0,1,2,3]
mylist.append(4)
mylist.pop(4)

4

In [5]:
mylist

[0, 1, 2, 3]

In [7]:
mylist2 = []
mylist2

[]

In [8]:
mylist2.append(mylist)

In [9]:
mylist2

[[0, 1, 2, 3]]

In [12]:
mylist2[0][2]

2

In [13]:
mylist3 = [1,1,1,1]
mylist3

[1, 1, 1, 1]

In [15]:
myset = {1,1,2,3}
myset2 = {3,4,5}

In [18]:
myset | myset2

{1, 2, 3, 4, 5}

In [20]:
a,b,c = (1,2,3)

In [23]:
c

3

In [24]:
mydict = {'a':1, 'b':2, 'c':3}
mydict['a']

1

In [25]:
mydict

{'a': 1, 'b': 2, 'c': 3}
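
The cells above touch each collection briefly. The sketch below, using made-up data, highlights the property that most distinguishes each one: sets drop duplicates, tuples cannot be modified, and dictionaries look values up by key.

```python
# Lists keep duplicates and order; sets keep only unique members.
votes = ['yes', 'no', 'yes', 'yes']
unique_votes = set(votes)
print(len(votes), len(unique_votes))   # 4 2

# Tuples cannot be changed after creation.
point = (1, 2, 3)
try:
    point[0] = 10
except TypeError as err:
    print('Tuples are immutable: {}'.format(err))

# Dictionaries map keys to values.
ages = {'ann': 34, 'bob': 27}
ages['cat'] = 41                 # add a new key
print('bob' in ages)             # True
print(ages.get('dan', 'n/a'))    # n/a (a default instead of a KeyError)
```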

## Control Flow

There are many situations in which we want to change the order of operations in a program.  Maybe it's because we want to make a choice based upon a condition, or we want to repeat something many times before moving on, or some other situation has arisen. The general class of operations we use to deal with these cases falls under the [control flow](https://python.swaroopch.com/control_flow.html) umbrella.

For now, we are going to focus only on two such constructs, but they have the flexibility to work in a very wide variety of situations.

+ [If-Then-Else](https://en.wikipedia.org/wiki/Conditional_(computer_programming))
+ [For Loop](https://en.wikipedia.org/wiki/For_loop)
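
A minimal sketch of the two constructs working together: the `for` loop repeats a block once per item, and the `if`/`elif`/`else` chain picks one branch each time. The temperatures and cutoffs are invented for illustration.

```python
temperatures = [12, 25, 31]   # hypothetical readings
labels = []

for temp in temperatures:
    # Exactly one of these three branches runs for each reading.
    if temp < 15:
        label = 'cold'
    elif temp < 30:
        label = 'mild'
    else:
        label = 'hot'
    labels.append(label)
    print('{} degrees is {}'.format(temp, label))

print(labels)   # ['cold', 'mild', 'hot']
```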

## Functions

In mathematical terms, a function maps a set of inputs to a set of outputs.

$$f: X \rightarrow Y$$

The function maps values from the domain $X$ to values in the codomain $Y$.

$$y = f(x)$$

In programming, it's the same general principle. For example, our `squareit()` function maps values to their squared representation.

$$9 = \mathrm{squareit}(3)$$

All of the functions you see are derived from this idea. In Python, they generally follow this syntax:

```python
def function(args):
    result = ...  # [do stuff] with args to compute the result
    return result
```

Functions can be very effective and reusable tools if they are well designed.  They should do one thing, and nothing else.  You can think of them as verbs.
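
As a concrete instance of the template, here is a sketch of a small function that does one thing. The name `power` and its default argument are hypothetical choices for this example, generalizing `squareit()` slightly.

```python
def power(x, exponent=2):
    """Return x raised to exponent (squares x by default)."""
    return x ** exponent

print(power(3))                        # 9
print(power(3, exponent=3))            # 27
print([power(n) for n in [1, 2, 3]])   # [1, 4, 9]
```

The default value means the common case stays short, while the keyword argument keeps less common calls readable.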

## Iteration

The intuitiveness of iteration in Python is one of the best features of the language (though it is not unique to Python). 

In [20]:
mylist = [0,1,2,3,4,5]
mytuple = (0,1,2,3,4,5)
myset = {0,1,2,3,4,5}
mydict = {'a':0, 'b':1, 'c':2, 'd':3, 'e':4, 'f':6}

for num in mylist:
    print(num)

0
1
2
3
4
5


In [21]:
for num in mytuple:
    print(num)

0
1
2
3
4
5


In [22]:
for num in myset:
    print(num)

0
1
2
3
4
5


In [23]:
for key in mydict:
    print(key)

for num in mydict.values():
    print(num)

a
b
c
d
e
f
0
1
2
3
4
6


In [24]:
print(range(10))
for num in range(10):
    print(num)

range(0, 10)
0
1
2
3
4
5
6
7
8
9


In [25]:
for ltr in 'abcdef':
    print(ltr)

a
b
c
d
e
f


In [26]:
for elem in ['bob', 1, '#ffffff']:
    print(elem)

bob
1
#ffffff


[List comprehensions](https://www.datacamp.com/community/tutorials/python-list-comprehension) are convenient methods of transforming items in a collection. They can be used to apply functions on each element in a collection while it's being iterated over.  The general syntax is as follows:

`[function(val) for val in collection if condition]`

In [27]:
print(mylist)
print([val for val in mylist if val%2==0])
print([val**2 for val in mylist if val%2==0])

[0, 1, 2, 3, 4, 5]
[0, 2, 4]
[0, 4, 16]


Comprehensions aren't just for lists.  There are a lot of reasons for dictionary comprehensions, but what if you just want to swap the keys and values?

In [28]:
print(mydict)
{v:k for (k,v) in mydict.items()}

{'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, 'f': 6}


{0: 'a', 1: 'b', 2: 'c', 3: 'd', 4: 'e', 6: 'f'}

Sometimes you want to coordinate operations on elements of more than one list.  It is possible to do this by coordinating on index position.

In [29]:
print(mylist)
print(len(mylist))
print(range(len(mylist)))
for i in range(len(mylist)):
    print('mylist element {}: {}'.format(i, mylist[i]))
    print('my letters element {}: {}'.format(i, 'abcdef'[i]))
    print('my composite element {}: {}'.format(i, str(mylist[i])+'abcdef'[i]))

[0, 1, 2, 3, 4, 5]
6
range(0, 6)
mylist element 0: 0
my letters element 0: a
my composite element 0: 0a
mylist element 1: 1
my letters element 1: b
my composite element 1: 1b
mylist element 2: 2
my letters element 2: c
my composite element 2: 2c
mylist element 3: 3
my letters element 3: d
my composite element 3: 3d
mylist element 4: 4
my letters element 4: e
my composite element 4: 4e
mylist element 5: 5
my letters element 5: f
my composite element 5: 5f
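
Index-based coordination works, but Python also offers the built-ins `zip()` and `enumerate()`, which pair elements up without manual indexing. The sketch below reproduces the composite elements from the loop above. The variable names are chosen for this example.

```python
numbers = [0, 1, 2, 3, 4, 5]
letters = 'abcdef'

composites = []
# zip pairs the i-th number with the i-th letter;
# enumerate supplies the position i alongside each pair.
for i, (num, ltr) in enumerate(zip(numbers, letters)):
    composite = str(num) + ltr
    composites.append(composite)
    print('my composite element {}: {}'.format(i, composite))

print(composites)   # ['0a', '1b', '2c', '3d', '4e', '5f']
```

One difference to keep in mind: `zip()` stops at the end of the shorter input, whereas indexing past the end of a shorter list raises an `IndexError`.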


## Input/Output

The canonical way to write to and read from a file uses the built-in `open()` function, typically inside a `with` block so that the file is closed automatically.

In [30]:
with open('test_io', 'w') as f:
    f.write('We wrote stuff')

In [31]:
!cat test_io

We wrote stuff

In [34]:
with open('test_io', 'r') as f:
    io_text = f.read()

print(io_text)

We wrote stuff
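
Files are often processed one line at a time rather than in a single `read()`. The sketch below writes a few lines and reads them back by iterating over the file object. It uses a temporary directory (an assumption made here so the example does not overwrite `test_io`), and the file name is hypothetical.

```python
import os
import tempfile

lines = ['first line', 'second line', 'third line']

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'test_io_lines.txt')   # hypothetical file name

    # Write each string on its own line.
    with open(path, 'w') as f:
        for line in lines:
            f.write(line + '\n')

    # Iterating over a file object yields one line at a time.
    with open(path, 'r') as f:
        recovered = [line.rstrip('\n') for line in f]

print(recovered)   # ['first line', 'second line', 'third line']
```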


For the most part, however, we will be using library I/O functions.  Specifically, we will rely almost exclusively on [pandas](https://pandas.pydata.org/).

In [36]:
show_iframe('https://pandas.pydata.org/', iwidth='95%')

<iframe src=https://pandas.pydata.org/ width=95% height=400></iframe>


We can simulate some data, write it to disk as a CSV, and then read it back in.

In [38]:
import numpy as np
import pandas as pd

In [40]:
data = pd.DataFrame(np.random.uniform(size=16).reshape(4,4))

data

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 0.633516 | 0.188297 | 0.42434 | 0.969255 |
| 1 | 0.779906 | 0.182256 | 0.0493 | 0.570667 |
| 2 | 0.637426 | 0.876976 | 0.110847 | 0.695058 |
| 3 | 0.475013 | 0.020227 | 0.455767 | 0.63816 |


In [42]:
data.to_csv('random_data.csv')
!pwd
!ls

/home/choct155/projects/telling_stories_with_data/examples
api_example.ipynb  hello_world.py  python_intro.ipynb	  test
cex		   module_test.py  random_data.csv	  test_io
figs		   __pycache__	   simulating_data.ipynb  two_cols.html


In [43]:
recycled_data = pd.read_csv('random_data.csv')

recycled_data

|   | Unnamed: 0 | 0 | 1 | 2 | 3 |
|---|---|---|---|---|---|
| 0 | 0 | 0.633516 | 0.188297 | 0.42434 | 0.969255 |
| 1 | 1 | 0.779906 | 0.182256 | 0.0493 | 0.570667 |
| 2 | 2 | 0.637426 | 0.876976 | 0.110847 | 0.695058 |
| 3 | 3 | 0.475013 | 0.020227 | 0.455767 | 0.63816 |
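
Notice the extra `Unnamed: 0` column: `to_csv` writes the index to the file by default, and `read_csv` then treats it as ordinary data. The sketch below shows two common ways around this, using a small made-up frame and an in-memory buffer (`io.StringIO`) in place of a file on disk.

```python
import io
import pandas as pd

data = pd.DataFrame({'x': [0.1, 0.2], 'y': [0.3, 0.4]})   # hypothetical data

# Option 1: keep the index in the file, and tell read_csv which column holds it.
buffer = io.StringIO()
data.to_csv(buffer)
buffer.seek(0)
with_index = pd.read_csv(buffer, index_col=0)

# Option 2: leave the index out of the file entirely.
buffer = io.StringIO()
data.to_csv(buffer, index=False)
buffer.seek(0)
without_index = pd.read_csv(buffer)

print(list(with_index.columns))      # ['x', 'y']
print(list(without_index.columns))   # ['x', 'y']
```

Which option fits depends on whether the index carries meaning (dates or IDs, say) or is just a row counter.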
