# Fundamentals of Programming

Evan Bianco
[agilegeoscience](http://agilegeoscience.com), [@EvanBianco](http://twitter.com/EvanBianco)

$$ E = mc^2 $$

- Variables and assignment

- Native data types

- Operators and expressions

- Data collections and data structures

- Procedures and control: loops and making decisions

- Getting data, manipulating data

- Defining functions and calling functions 

- Writing and running programs 

- Objects and classes

# Variables and assignment

In [331]:
x = 7
y = 10

In [332]:
x, y

(7, 10)

In [333]:
x

7

In [334]:
x.__repr__()

'7'

Checking the `type` of a variable

In [335]:
type(x)

int

In [336]:
type(10.0)

float

In [337]:
%whos

Variable       Type                          Data/Info
------------------------------------------------------
a              list                          n=9
adder          function                      <function adder at 0x1093a4f28>
age_list       list                          n=12
age_names      list                          n=12
ages           dict                          n=12
b              list                          n=4
bad_val        str                            2469.207 M
badval         str                            2469.207 M
body           list                          n=17
c              int                           42
d              tuple                         n=0
depth          list                          n=10
double         function                      <function double at 0x1093a49d8>
e              tuple                         n=2
f              TextIOWrapper                 <_io.TextIOWrapper name='<...>ode='r' encoding='UTF-8'>
fib            list       

In [338]:
del y

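Variable names are lower case by convention. Don't use Python's keywords as names, especially `lambda`:

    False, None, True, and, as, assert, async, await, break, class, continue,
    def, del, elif, else, except, finally, for, from, global, if, import, in,
    is, lambda, nonlocal, not, or, pass, raise, return, try, while, with, yield

In Python 3, `print` and `exec` are ordinary functions rather than keywords; `async` and `await` became keywords in Python 3.7.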

# Native data `types`

In [339]:
z = 1.4 + 2.3

In [340]:
print(z)

3.6999999999999997
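The trailing digits come from binary floating-point representation: neither 1.4 nor 2.3 can be stored exactly. Here is a minimal sketch of comparing floats within a tolerance rather than exactly, using `math.isclose` from the standard library (Python 3.5 or later). The name `target` is only for illustration.

```python
import math

z = 1.4 + 2.3
target = 3.7

print(z == target)              # False: the two floats differ in the last bits
print(math.isclose(z, target))  # True: equal within a relative tolerance
print(round(z, 2))              # 3.7: rounding for display
```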


In [341]:
c = 2 + 1.5j  # same as writing: complex(2, 1.5) 
type(c)

complex

In [342]:
5 / 3

1.6666666666666667

In [343]:
5 // 3

1

Why are there 2 kinds of numbers?
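One way to approach the question: `int` values are exact whole numbers, while `float` values are approximate decimals. The sketch below shows how the two division operators relate to those types; the variable names are illustrative.

```python
true_div = 5 / 3     # true division always gives a float
floor_div = 5 // 3   # floor division of two ints gives an int
remainder = 5 % 3    # the modulo operator gives what is left over

print(type(true_div), type(floor_div))
print(floor_div * 3 + remainder)   # 5: quotient and remainder rebuild the original
print(10 / 2)                      # 5.0: still a float, even when the result is whole
```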

## Strings `str`

In [344]:
"5" + '3'

'53'

In [345]:
5, '5', 'five'

(5, '5', 'five')

In [346]:
str(5), str('5')  # This is called type casting.

('5', '5')

In [347]:
w = '300E30049592'
str(w)

'300E30049592'

In [348]:
int('5'), float('5'), int('five')

ValueError: invalid literal for int() with base 10: 'five'

In [349]:
'5' + '5'

'55'

In [350]:
s1 = ' #Nor degg: '

In [351]:
s1.strip()

'#Nor degg:'

In [352]:
dir(s1)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']

In [353]:
s1.strip(' #:').upper().replace(' ', '')

'NORDEGG'

In [354]:
len(s1)

12

In [355]:
snew.startswith('Nor')

False

## `str` indexing (how to count, part 1)
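A short sketch of indexing on a different string, so the exercise stays open. The string `rock` is a hypothetical example; the point is that Python counts from zero and that negative indices count from the end.

```python
rock = 'Dolomite'   # hypothetical example string

first = rock[0]             # 'D': counting starts at zero
third = rock[2]             # 'l'
last = rock[-1]             # 'e': negative indices count from the end
position = rock.index('m')  # 4: the index of the first 'm'

print(first, third, last, position)
```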

----
- **Exercise**: return the `e` character in `s1`


In [356]:
s1.isdigit()

False

Try `help(s1)`, `s1?`, `s1??`, `s1.<tab>`, `s1.upper()`, `s1.strip()`, `s1.startswith()`, `s1.pop()`

## Can we add strings together? 

In [357]:
s2, s3 = 'Limestone', 'Shale'

In [358]:
print(s2 + s3)

LimestoneShale


In [359]:
print(s2 * 5)

LimestoneLimestoneLimestoneLimestoneLimestone


In [360]:
print(s2 + 5)

TypeError: Can't convert 'int' object to str implicitly

In [361]:
lithology = s1 + s2 + ' has minor ' + s3 + ' fragments'
lithology

' #Nor degg: Limestone has minor Shale fragments'

In [362]:
'{0} {1} has minor {2} fragments'.format(s1, s2, s3)

' #Nor degg:  Limestone has minor Shale fragments'
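`str.format` fields can be filled by position or by name, and a format spec after a colon controls how numbers are shown. A minimal sketch with made-up values (`formation` and `depth` are illustrative names):

```python
formation = 'Wyandot'
depth = 867.156

by_position = '{0} top at {1} m'.format(formation, depth)
by_name = '{fm} top at {z:.1f} m'.format(fm=formation, z=depth)  # .1f: one decimal place

print(by_position)   # Wyandot top at 867.156 m
print(by_name)       # Wyandot top at 867.2 m
```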

## String methods and string formatting
----
- **Exercise**: Use a combination of string methods on `s1` and text formatting to produce the following output:

    `> The Nordegg limestone has minor shale fragments` 
    
    (ensure sentence case and remove `'#'`, `':'`, `'\n'`)

In [363]:
s1, s2, s3

(' #Nor degg: ', 'Limestone', 'Shale')

# Operators and expressions

* mathematical operations
* comparison operations
* bitwise operations
* augmented assignment, copies, and pointers
* boolean expressions

### mathematical operations

In [364]:
10**2

100

In [365]:
import math

math.sqrt(25)
math.

SyntaxError: invalid syntax (<ipython-input-365-fffb53cb805b>, line 4)

In [366]:
dir(math)

['__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'acos',
 'acosh',
 'asin',
 'asinh',
 'atan',
 'atan2',
 'atanh',
 'ceil',
 'copysign',
 'cos',
 'cosh',
 'degrees',
 'e',
 'erf',
 'erfc',
 'exp',
 'expm1',
 'fabs',
 'factorial',
 'floor',
 'fmod',
 'frexp',
 'fsum',
 'gamma',
 'gcd',
 'hypot',
 'inf',
 'isclose',
 'isfinite',
 'isinf',
 'isnan',
 'ldexp',
 'lgamma',
 'log',
 'log10',
 'log1p',
 'log2',
 'modf',
 'nan',
 'pi',
 'pow',
 'radians',
 'sin',
 'sinh',
 'sqrt',
 'tan',
 'tanh',
 'trunc']

### comparison operations

In [367]:
y

NameError: name 'y' is not defined
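The `NameError` appears because `y` was removed earlier with `del y`. Here is a sketch of the comparison operators, assuming `x` and `y` are assigned again with the values from the first section:

```python
x, y = 7, 10

print(x == y)   # False: equal to
print(x != y)   # True: not equal to
print(x < y)    # True: less than
print(x >= y)   # False: greater than or equal to
```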

### augmented assignment, copies and pointers

In [368]:
x >= y

NameError: name 'y' is not defined
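As a sketch of what this heading refers to: augmented assignment updates a name in one step, and assigning a list to a second name does not copy it. The names `count`, `original`, `alias`, and `duplicate` are illustrative only.

```python
count = 5
count += 2    # same as count = count + 2
count *= 10   # count is now 70

original = [1, 2, 3]
alias = original             # a second name for the same list object
duplicate = original.copy()  # an independent (shallow) copy

alias.append(4)              # changes the list that both names point to

print(original)    # [1, 2, 3, 4]
print(duplicate)   # [1, 2, 3]
print(alias is original, duplicate is original)   # True False
```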

### Boolean expressions

In [369]:
True and True, True and False

(True, False)

In [370]:
a = 6

a > 0 and a < 5

False

In [371]:
0 < a < 5

False

# Data collections and data structures

`list, dict, tuples, sets`

### `list`

Lists in Python are one-dimensional, ordered containers whose elements may be any Python objects. Lists are *mutable* and have methods for adding and removing elements. The literal syntax for a list is to surround comma-separated values with square brackets (`[]`). The square brackets are a syntactic hint that lists are indexable.

In [372]:
society = 'C.S.P.G.'  # this is a string
type(society.split('.'))

list

In [373]:
s = society.split('.') + ' convention'
s

TypeError: can only concatenate list (not "str") to list

In [374]:
fib = [1, 1, 2, 3, 5, 8] 
fib.append(13)

In [375]:
fib

[1, 1, 2, 3, 5, 8, 13]

In [376]:
fib.extend([21.0, 34.0, 55.0])
fib

[1, 1, 2, 3, 5, 8, 13, 21.0, 34.0, 55.0]

In [377]:
fib += [89.0, 144.0]
x = fib.pop()

In [378]:
x

144.0

In [379]:
fib[5] = 1000
fib

[1, 1, 2, 3, 5, 1000, 13, 21.0, 34.0, 55.0, 89.0]

### Indexing, slicing, striding

In addition to accessing a single element in a `list` or `string`, we can also *slice* or *stride* into data structures to access multiple elements at once.
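The general form is `sequence[start:stop:step]`, where `start` is included, `stop` is excluded, and any of the three may be omitted. A small sketch on a hypothetical string `word`:

```python
word = 'Jurassic'   # hypothetical example string

head = word[0:3]      # characters 0, 1 and 2
tail = word[-3:]      # the last three characters
every_other = word[::2]   # every second character, from the start
backwards = word[::-1]    # a step of -1 walks the string in reverse

print(head)          # Jur
print(tail)          # sic
print(every_other)   # Jrsi
print(backwards)     # cissaruJ
```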

In [380]:
name = 'Cambrian (C)'
name

'Cambrian (C)'

In [381]:
name[-3]

'('

## Without using the Python interpreter, what is the expected output of the following commands?

- a) `name[0:7]`

- b) `name[:-4]`

- c) `name[3:7]`

- d) `name[::2]`

In [382]:
age_names = ['Cambrian (C)', 'Ordivician (O)',  'Silurian (S)',  'Devonian (D)', 
           'Mississipian (M)', 'Pennsylvanian (IP)', 'Permian (P)',
           'Triassic (Tr)', 'Jurassic (J)',  'Cretaceous (C)', 
           'Tertiary (T)', 'Quaternary (Q)']

## Indexing practice

----
**Exercise**:

- return the string: 

    ` > Triassic (Tr)` 


- return just the word: 

    ` > Triassic`


- return the abbreviation, enclosed in parentheses:

    ` > (Tr)`


- return just the abbreviation: 

    ` > Tr` 


(bonus points if you can do the last one all in one line)

In [383]:
age_names[1].split()[0]

'Ordivician'

## Nested `list`

Lists can contain anything, including other lists.

In [384]:
[1, 2, [4, 'anything', 6]]

[1, 2, [4, 'anything', 6]]

In [385]:
age_list = [['Cambrian (C)', [544,495]], ['Ordivician (O)', [495, 492] ], 
            ['Silurian (S)', [442, 416]], ['Devonian (D)',[416, 354]], 
            ['Mississipian (M)', [354, 324]], ['Pennsylvanian (IP)', [324, 295]], 
            ['Permian (P)', [304, 248]], ['Triassic (Tr)', [248, 205]], 
            ['Jurassic (J)', [205, 144]], ['Cretaceous (C)', [160, 65]], 
            ['Tertiary (T)', [65, 1.8]], ['Quaternary (Q)']
            ]
# But at some point it just gets impractical

In [386]:
age_list[0][1][0]

544

### `dicts`

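Dictionaries are probably *the most important* data structure in Python... with the possible exception of ndarrays. A dictionary, or `dict`, is a mutable collection of key/value pairs in which every key is unique. In the Python version used here a `dict` has no guaranteed order; from Python 3.7 onward it preserves insertion order.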

For example, we could store a 'row' of our mini 'database' like this:

    {"Name": "Cambrian", "Abbreviation": "C", "Start": 544, "End": 495}
    
The structure we choose depends on features we want. To preserve order, we might make a list of dicts.
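A minimal sketch of the list-of-dicts layout, using two rows with the values from this notebook. The name `periods` and the lowercase keys are illustrative choices, not a fixed schema.

```python
periods = [
    {"name": "Cambrian", "abbreviation": "C", "start": 544, "end": 495},
    {"name": "Triassic", "abbreviation": "Tr", "start": 248, "end": 205},
]

first_name = periods[0]["name"]   # index into the list, then into the dict

durations = []
for period in periods:            # the list keeps the rows in order
    durations.append(period["start"] - period["end"])

print(first_name)   # Cambrian
print(durations)    # [49, 43]
```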

In [387]:
d = {'name': "Matt", 'home': 'Nova Scotia'}

In [388]:
d

{'home': 'Nova Scotia', 'name': 'Matt'}

In [389]:
ages = {
    'Cambrian': {"abbreviation": "C", "start": 544, "end": 495},
    'Ordivician': {"abbreviation": "O", "start": 495, "end": 492},  
    'Silurian': {"abbreviation": "S", "start": 442, "end": 416}, 
    'Devonian': {"abbreviation": "D", "start": 416, "end": 354},
    'Mississipian': {"abbreviation": "M", "start": 354, "end": 324},
    'Pennsylvanian': {"abbreviation": "IP", "start": 324, "end": 295}, 
    'Permian': {"abbreviation": "P", "start": 304, "end": 248},
    'Triassic': {"abbreviation": "Tr", "start": 248, "end": 205},
    'Jurassic': {"abbreviation": "J", "start": 205, "end": 144}, 
    'Cretaceous': {"abbreviation": "C", "start": 160, "end": 65}, 
    'Tertiary': {"abbreviation": "T", "start": 65, "end": 1.8}
}

In [390]:
ages['Cambrian']

{'abbreviation': 'C', 'end': 495, 'start': 544}

In [391]:
ages['Quaternary'] = {'abbreviation': 'Q', 'start':1.8, 'end':0}

In [392]:
ages['Quaternary']

{'abbreviation': 'Q', 'end': 0, 'start': 1.8}

----
**Exercise**: what is the expected output of:

* a) `ages['Triassic']`

* b) `ages['Jurassic']['Start']`

* c) What command would you type to return the age of the end of the Permian, 248?

* d) The start of the Cretaceous is wrong: it should be 144. Change it to the correct value.

* e) In `age_list`, we've lost the dates for the Quaternary Period [1.8 mya to present (0)]. Index into that entry, and append them.

In [393]:
# your code here

### `tuples`

*Tuples* are the immutable form of lists. They behave almost exactly the same as lists in every way except that you cannot change any of their values. There are no `append()` or `extend()` methods, and there are no *in-place* operators. 

They also differ from lists in their syntax. They are so central to how Python works, that *tuples* are defined by commas. Oftentimes, tuples will be seen surrounded by parentheses. These parentheses only serve to group actions or make the code more readable, not to actually define tuples.

In [394]:
a = (1,2,3,4)  # a length-4 tuple
b = (42,)      # length-1 tuple defined by the comma
c = (42)       # not a tuple, just the number 42
d = ()         # length-0 tuple- no commas means no elements
e = 42, 1      # a length-2 tuple

In [395]:
e

(42, 1)

In [396]:
type(b)

tuple

In [397]:
a[2] = 5

TypeError: 'tuple' object does not support item assignment

You can concatenate tuples together in the same way as lists, but be careful about the order of operations. This is where parentheses come in handy:

(1, 2) + (3, 4)

In [398]:
(1,2)+(3,4)

(1, 2, 3, 4)
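To see why the parentheses matter, here is a sketch of the same expression with and without them. Without parentheses, `+` is evaluated before the commas build the tuple.

```python
with_parens = (1, 2) + (3, 4)   # concatenates two tuples
without_parens = 1, 2 + 3, 4    # a 3-tuple whose middle element is 2 + 3

print(with_parens)      # (1, 2, 3, 4)
print(without_parens)   # (1, 5, 4)
```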

Note that even though tuples are immutable, they may have mutable elements. Suppose that we have a list embedded in a tuple. This list may be modified in-place even though the list may not be removed or replaced wholesale:

In [399]:
x = 1.0, [2, 4], 16
x[1].append(8)
x

(1.0, [2, 4, 8], 16)

### `set`

Instances of the `set` type are equivalent to mathematical sets. Like their math counterparts, literal sets in Python are defined by comma-separated values between curly braces (`{}`). Sets are unordered containers of unique values. Duplicated elements are ignored. Because they are unordered, sets are not sequences and cannot be indexed.

In [400]:
a = [1,2,2,2,3,4,5,5,5]

set(a)

{1, 2, 3, 4, 5}

In [401]:
# a literal set formed with elements of various types
{1.0, 1, 10, "one hundred", (1, 0, 0, 0)}

{1, 10, (1, 0, 0, 0), 'one hundred'}

In [402]:
# a literal set of special values
{True, False, None, "", 0.0, 0, 1}

{0, 1, None, ''}

In [403]:
# conversion from a list to a set
set([2.0, 4, 4, 4, 2.0])

{2.0, 4}
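Sets also support fast membership tests and the usual set algebra. A small sketch with illustrative rock names; `sorted()` is used only to print the results in a predictable order.

```python
pay_rocks = {'sand', 'siltstone', 'conglom'}
observed = set(['shale', 'sand', 'shale', 'siltstone'])

has_sand = 'sand' in observed    # membership test
common = observed & pay_rocks    # intersection: in both sets
non_pay = observed - pay_rocks   # difference: observed but not pay
everything = observed | pay_rocks   # union: in either set

print(has_sand)             # True
print(sorted(common))       # ['sand', 'siltstone']
print(sorted(non_pay))      # ['shale']
print(sorted(everything))   # ['conglom', 'sand', 'shale', 'siltstone']
```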

Here's a good time to take a break

- Variables and Assignment
- Native data types
- Operators and Expressions
- Data collections and data structures
- <font color='lightgrey'>Procedures and control: Loops and Making choices</font>
- <font color='lightgrey'>Getting data, manipulating data</font>
- <font color='lightgrey'>Defining functions and calling functions</font>
- <font color='lightgrey'>Writing and running programs</font>
- <font color='lightgrey'>Objects and classes</font>

<hr />

In [404]:
layers = ['shale','shale','shale','sand','sand','sand','sand','shale','shale','shale']
depth = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
porosity = [2, 3, 2, 14, 18, 17, 14, 2, 3, 3]
gamma = [85, 90, 77, 23, 27, 31, 25, 110, 113, 108]


**Exercise**: manipulating and viewing lists:

* a) Create a new list that has the integer 1 for sand and 0 for shale.
* b) Using `plt.plot()`, create a plot of the porosity values.
* c) Make a depth vs porosity plot so depth is vertically downwards.
* d) Use `plt.scatter()` to make a cross plot of `gamma` vs `porosity`.
* e) Explore other keyword arguments for `plt.plot()` and `plt.scatter()` to pretty things up.
* f) Bonus: use the keyword `c` in the call to `scatter` to distinguish between sand and shale.

# Procedures and control: loops and making decisions

## Loops

*Doing stuff many times*

the <code><font color="green">while</font></code> loop
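A `while` loop repeats its body for as long as a condition stays true. A minimal sketch with illustrative names (`depth`, `visited`):

```python
depth = 10
visited = []

while depth < 13:          # checked before every pass through the body
    visited.append(depth)
    depth += 1             # without this update the loop would never end

print(visited)   # [10, 11, 12]
print(depth)     # 13
```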

the <code><font color="green">for</font></code> loop

In [405]:
layers = ['shale','shale','shale','sand','sand','sand','sand','shale','shale','shale']

In [406]:
# for loop syntax
for lith in layers:
    print(lith)

shale
shale
shale
sand
sand
sand
sand
shale
shale
shale


<font color="#0A5394">**\*iteration, *iterable**</font>
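Any iterable can drive a `for` loop. Two built-ins that often help are `enumerate`, which yields an index with each item, and `zip`, which walks several sequences in step. A sketch on shortened, illustrative versions of the lists:

```python
layers = ['shale', 'sand', 'shale']
depth = [10, 11, 12]

for i, lith in enumerate(layers):   # i is the position, lith is the item
    print(i, lith)

for d, lith in zip(depth, layers):  # pairs up items at the same position
    print(d, lith)

pairs = list(zip(depth, layers))
print(pairs)   # [(10, 'shale'), (11, 'sand'), (12, 'shale')]
```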

## List comprehension

In [407]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [408]:
squares = []
for n in range(10):
    squares.append(n**2)
    
squares

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [409]:
n

9

In [410]:
[n**2 for n in range(10)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [411]:
# Turn these loops into list comprehensions:

squares = []
for n in range(10):
    squares.append(n**2)

[n**2 for n in range(10)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [412]:
names = ['William Smith', 'Mary Anning', 'Steve Gould']
short = []
for name in names:
    initial = name[0]
    surname = name.split()[-1]
    short.append(surname + initial)

In [413]:
short

['SmithW', 'AnningM', 'GouldS']

In [414]:
[name.split()[-1]+name[0] for name in names]

['SmithW', 'AnningM', 'GouldS']



## Making decisions

The <code><font color="green">if</font></code> statement

Use a loop and `if` / `else` statements to transform the list `layers` into a pay flag:

`pay = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0]`

In [415]:
pay_rocks = ['sand', 'siltstone', 'conglom']

In [416]:
layers = ['shale','dolomite','shale','siltstone','sand','conglom','granite','shale','shale','shale']
# the if statement:
pay = []
for rock in layers:
    if rock in pay_rocks:
        pay.append(1)
    else:
        pay.append(0)
pay

[0, 0, 0, 1, 1, 1, 0, 0, 0, 0]

In [417]:
[1 if rock in pay_rocks else 0 for rock in layers]

[0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
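A comprehension can also filter: an `if` placed after the `for` keeps only the items that pass the test. A sketch using the original `layers` and `depth` lists; the name `sand_depths` is illustrative.

```python
layers = ['shale', 'shale', 'shale', 'sand', 'sand', 'sand', 'sand', 'shale', 'shale', 'shale']
depth = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

# zip pairs each depth with its layer; the trailing if keeps only the sand rows
sand_depths = [d for d, rock in zip(depth, layers) if rock == 'sand']

print(sand_depths)   # [13, 14, 15, 16]
```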

<font color="#0A5394">**\*conditionals**</font>

# Getting data...

## ... from text files

You can explicitly read from and write to files directly in your code. Python makes working with files pretty simple.

The first step to working with a text file is to obtain a 'file object' using `open`.

In [418]:
# Open the file in read-only mode
fname = 'data/L30_tops.txt'

with open(fname, 'r') as f:  # use 'w' to write and 'a' to append to a file. Add a 'b' for binary files.
    header = f.readline()    # a string containing the first line in the file
    body = f.readlines()     # a list containing the remaining lines

Every line you get this way ends in a newline character, `\n`, so you'll often want to `strip()` it before doing anything with it.
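A sketch of what that looks like, using `io.StringIO` as a stand-in for a real file so the example runs without the data folder. The text inside `fake_file` is a made-up sample, not the actual contents of the tops file.

```python
import io

# Made-up stand-in for a small tops file; the real file is not reproduced here.
fake_file = io.StringIO("# name,depth\nWyanDot FM,867.156\nBase O-Marker, 2469.207 M\n")

header = fake_file.readline()   # first line, newline included
body = fake_file.readlines()    # remaining lines, each still ending in '\n'

clean = [line.strip() for line in body]   # strip() removes the trailing newline

print(repr(header))   # '# name,depth\n'
print(clean)          # ['WyanDot FM,867.156', 'Base O-Marker, 2469.207 M']
```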

In [419]:
bad_val = ' 2469.207 M'
float(bad_val.split()[0])

2469.207

In [420]:
fname = 'data/L30_tops.txt'
tops={}
with open(fname) as f:
    for line in f:
        if not line.startswith('#'):
            name, depth = line.strip().split(',')
            if not depth.isdigit():
                depth = depth.split()[0]
            tops[name] = float(depth)

In [421]:
tops['Base O-Marker'] - tops['WyanDot FM']

1602.051

----
**Exercise** create a tops dictionary:

* a) Modify the code snippet above to create a `dict` called `tops`, containing the formation name as the key and the depth as the value
* b) What is the thickness between the `WyanDot FM` and the `Base O-Marker`?

`tops['Base O-Marker'] - tops['WyanDot FM']`

# Defining and calling functions

the <code><font color="green">def</font></code> statement

In [422]:
def myfunc(args):
    """
    Documentation string
    """
    # statement
    # statement
    return 

In [423]:
help(myfunc)

Help on function myfunc in module __main__:

myfunc(args)
    Documentation string



<font color="#0A5394">**\*scope**</font>

In [424]:
def double(x):
    """takes a single input x and 
    returns double the input"""
    # this doesn't show
    y = x + x  
    return y

In [425]:
help(double)

Help on function double in module __main__:

double(x)
    takes a single input x and 
    returns double the input
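On scope: names assigned inside a function are local to it. In this sketch the `y` inside `double` is separate from a `y` defined outside, so calling the function leaves the outer one untouched. The outer value 100 is arbitrary.

```python
def double(x):
    """Return double the input."""
    y = x + x    # this y exists only while the function runs
    return y

y = 100              # a different y, defined outside the function
result = double(4)

print(result)   # 8
print(y)        # 100: unchanged by the call
```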



----

**Exercise**: write a function that takes *two* numbers as inputs and returns the sum of those two numbers

In [426]:
# your code here
def adder(x,y):
    return x + y

In [427]:
adder('ten','twelve')

'tentwelve'

----

**Exercise**: write a function called `vshale` that converts the list of `gamma` into a Vshale measurement. Use 20 API for 100% sand and 150 API for 100% shale

In [428]:
layers = ['shale','shale','shale','sand','sand','sand','sand','shale','shale','shale']
depth = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
porosity = [2, 3, 2, 14, 18, 17, 14, 2, 3, 3]
gamma = [85, 90, 77, 23, 27, 31, 25, 110, 113, 108]

In [429]:
def vshale(gamma, sand_end, shale_end):
    # your code here
    vsh = []
    for rock in gamma:
        vsh.append( (rock - sand_end) / (shale_end - sand_end) )
    return vsh

In [430]:
vshale(gamma,20,150)

[0.5,
 0.5384615384615384,
 0.43846153846153846,
 0.023076923076923078,
 0.05384615384615385,
 0.08461538461538462,
 0.038461538461538464,
 0.6923076923076923,
 0.7153846153846154,
 0.676923076923077]
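One possible variation, offered as a sketch rather than the intended answer: give the end points default values so they only need to be passed when they differ from 20 and 150 API. The loop is written here as a list comprehension.

```python
def vshale(gamma, sand_end=20, shale_end=150):
    """Convert a list of gamma-ray readings (API) to Vshale fractions."""
    return [(g - sand_end) / (shale_end - sand_end) for g in gamma]

gamma = [85, 90, 77, 23]

default_result = vshale(gamma)                # uses 20 and 150
custom_result = vshale(gamma, shale_end=120)  # override one end point by keyword

print(default_result[0])   # 0.5
print(custom_result[0])    # 0.65
```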

----

**Exercise**: write a function called `process_tops` that takes a
filename as input and returns a dictionary of the tops

In [439]:
del process_tops

In [435]:
def process_tops(fname):
    """
    Takes a filename as input and returns a dictionary of tops.
    fname : path to the tops file
    """
    tops={}
    with open(fname) as f:
        for line in f:
            if not line.startswith('#'):
                name, depth = line.strip().split(',')
                if not depth.isdigit():
                    depth = depth.split()[0]
                tops[name] = float(depth)
    return tops

In [451]:
topsfile = 'data/L30_tops.txt'

In [447]:
utils.make_tops(topsfile)

AttributeError: module 'utils' has no attribute 'make_tops'

In [438]:
my_tops = utils.process_tops(topsfile)

NameError: name 'fname' is not defined

In [326]:
my_tops

{'ABENAKI FM': 3404.3112,
 'Base O-Marker': 2469.207,
 'DAWSON CANYON FM': 984.50402,
 'LOGAN CANYON FM': 1136.904,
 'Lower BACCARO': 3964.5337,
 'Lower MISSISAUGA FM': 3190.6464,
 'MID BACCARO': 3485.0832,
 'Pay_sand_1-rft': 2478.0,
 'TD': 4268.0,
 'Upper MISSISAUGA FM': 2251.2529,
 'WyanDot FM': 867.156,
 'pay_sand_2': 2499.0,
 'pay_sand_3': 2543.0,
 'pay_sand_4': 2637.0,
 'sand_5': 2699.0,
 'sand_6': 2795.0,
 'sand_7': 2835.0}

In [None]:
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

topo = np.load('data/topography.npy')
plt.imshow(topo)
plt.show()

# Writing and running programs

Put the previous function in a text file named `utils.py`.

## Your first module

In [None]:
import utils

topsfile = 'data/L30_tops.txt'
tops = utils.process_tops(topsfile)

## ... from delimited files

In [None]:
import csv
with open('data/periods.csv', 'r') as f:
    reader = csv.DictReader(f, delimiter=',')
    for row in reader:
        print(row)

You can write out delimited data using `csv.writer`:

In [None]:
my_tops = {'GOC' : 1200.0 , 'OWC' : 1300.0, 'Top Reservoir' : 1100.0}

with open('hydrocarbon_contact.txt', 'w', newline='') as f:  # newline='' is recommended by the csv docs
    writer = csv.writer(f, delimiter=',')
    for name, depth in my_tops.items():
        writer.writerow([name, depth])
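To check a round trip without touching the disk, here is a sketch that writes the same rows into an in-memory `io.StringIO` buffer and reads them back with `csv.reader`. The buffer is a stand-in for a real file; note that values come back as strings and need converting.

```python
import csv
import io

my_tops = {'GOC': 1200.0, 'OWC': 1300.0, 'Top Reservoir': 1100.0}

buffer = io.StringIO()                  # in-memory stand-in for a file
writer = csv.writer(buffer, delimiter=',')
for name, depth in my_tops.items():
    writer.writerow([name, depth])

buffer.seek(0)                          # rewind before reading
recovered = {}
for name, depth in csv.reader(buffer, delimiter=','):
    recovered[name] = float(depth)      # csv gives back strings

print(recovered == my_tops)   # True
```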

## ... from the web

Use View Source in your browser to figure out where the age range is on the page, and what it looks like.

Try to find the same string here.

In [None]:
url = "http://en.wikipedia.org/wiki/Cretaceous"

In [None]:
import requests
r = requests.get(url)
r.text[:2000]

Using a [regular expression](https://docs.python.org/3/library/re.html):

In [None]:
import re

s = re.search(r'<i>(.+?million years ago)</i>', r.text)
text = s.group(1)
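`re.search` returns `None` when nothing matches, so it is worth checking before calling `.group()`. The sketch below runs the same pattern on a made-up HTML fragment (a stand-in, not the actual page source) and then pulls the numbers out of the captured text.

```python
import re

# Made-up stand-in for a fragment of page source; not the actual page content.
html = '<p>The Cretaceous lasted from <i>145 to 66 million years ago</i>.</p>'

match = re.search(r'<i>(.+?million years ago)</i>', html)

numbers = []
if match is not None:                 # search() gives None when there is no match
    text = match.group(1)             # the part captured by the parentheses
    numbers = [int(n) for n in re.findall(r'\d+', text)]
    print(text)                       # 145 to 66 million years ago

print(numbers)   # [145, 66]
```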

## Using built-in functions

## Importing modules

the <code><font color="green">import</font></code> statement
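The statement comes in a few common forms. A sketch using the standard library's `math` module; the alias `m` is an arbitrary choice.

```python
import math                  # use names through the module: math.sqrt
import math as m             # same module under a shorter alias
from math import pi, sqrt    # bring selected names in directly

root_a = math.sqrt(25)
root_b = m.sqrt(25)
root_c = sqrt(25)

print(root_a, root_b, root_c)   # 5.0 5.0 5.0
print(pi == math.pi)            # True: the same object under two names
```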


In [None]:
import this

## The Python standard library

[Built-in functions](https://docs.python.org/3/library/functions.html)

[Built-in Types](https://docs.python.org/3/library/stdtypes.html)

[docs.python.org](https://docs.python.org/3/library/)

In [None]:
import datetime

## External python packages

The Python Package Index, [PyPI](https://pypi.python.org/pypi)

* [SciPy](http://www.scipy.org/) -  a collection of often-used libraries