In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Fundamentals of Programming

Evan Bianco
[agilegeoscience](http://agilegeoscience.com), [@EvanBianco](http://twitter.com/EvanBianco)

- Variables and Assignment

- Native data types

- Operators and Expressions

- Data collections and data structures

- Procedures and control: Loops and Making choices

- Getting data, manipulating data

- Defining functions and calling functions 

- Writing and running programs 

- Objects and classes

## Variables and Assignment

In [None]:
x = 7
x

In [None]:
x.__repr__()

checking the `type` of a variable

In [None]:
type(x)

In [None]:
%whos

## Native data `types`

In [None]:
z = 1.4 + 2.3

In [None]:
print(z)

In [None]:
c = 2 + 1.5j  # same as writing: complex(2, 1.5)

In [None]:
5 / 3

In [None]:
5 // 3

Why are there 2 kinds of numbers?
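
One way to explore the question is to look at the type each kind of division returns, and at what each kind of number is good for. A minimal sketch (the variable names are only for illustration):

```python
# True division always returns a float; floor division of two ints returns an int.
true_div = 5 / 3
floor_div = 5 // 3
print(type(true_div), true_div)
print(type(floor_div), floor_div)

# Integers are exact at any size; floats are fast binary approximations.
big = 2 ** 100            # exact integer arithmetic
approx = 0.1 + 0.2        # not exactly 0.3 in binary floating point
print(big)
print(approx == 0.3)      # False
```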

## Strings `str`

In [None]:
s = 'Ekofisk'

## `str` indexing (how to count, part 1)

exercise: return the `'f'` character in `s`

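Try `help(s)`, `s?`, `s??`, `s.<tab>`, `s.upper()`, `s.strip()`, `s.startswith()`, `s.pop()`. The last one raises an `AttributeError`: strings are immutable and have no `pop` method.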
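
To practise counting without giving away the exercise, here is a short sketch on a different string (`rock` is a made-up variable, not part of the notebook):

```python
rock = 'Sandstone'

print(rock[0])        # 'S'  (counting starts at zero)
print(rock[-1])       # 'e'  (negative indices count from the end)
print(rock[:4])       # 'Sand'
print(rock[4:])       # 'stone'
print(len(rock))      # 9

# String methods return new strings; the original is never changed.
print(rock.upper())               # 'SANDSTONE'
print(rock.startswith('Sand'))    # True
```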

In [None]:
s2, s3 = 'Chalk ', 'Shale'

In [None]:
lithology = s + ' ' + s2 + 'has minor ' + s3 + ' fragments'

In [None]:
'{0} {1} has minor {2} fragments'.format(s, s2.strip(), s3)

## Operators and Expressions

* mathematical operations

* comparison operations

* bitwise operations

* augmented assignment, copies, and pointers

* boolean expressions

* conversion functions

### mathematical operations

### comparison operations

### bitwise operations

### augmented assignment, copies and pointers

In [None]:
x = 42
y = x
del x
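
The cell above hints that names are references to objects rather than boxes holding values. A small sketch of what that means for copies and augmented assignment (the names `a`, `b`, `c`, `p`, `q` are illustrative only):

```python
# Assignment never copies; it binds a second name to the same object.
a = [1, 2, 3]
b = a            # b is another name for the same list
c = a.copy()     # c is a new, independent list

a += [4]         # augmented assignment mutates a list in place
print(b)         # [1, 2, 3, 4]  (b sees the change)
print(c)         # [1, 2, 3]     (the copy does not)

del a            # removes the name 'a', not the list itself
print(b)         # [1, 2, 3, 4]

# For immutable objects such as ints, += rebinds the name instead.
p = 42
q = p
p += 1
print(p, q)      # 43 42
```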

### boolean expressions
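
A few boolean expressions to try, as a sketch. The depth and the interval limits are made-up numbers:

```python
depth = 1250.0

in_interval = 1100.0 <= depth < 1300.0     # comparisons can be chained
print(in_interval)                         # True

# and / or / not combine truth values; and / or stop as soon as the answer is known
print(depth > 1000 and depth < 1200)       # False
print(depth < 1000 or depth > 1200)        # True
print(not in_interval)                     # False

# Empty containers, zero and None count as false; most other things count as true
truthiness = [bool([]), bool(0.0), bool(None), bool('shale')]
print(truthiness)                          # [False, False, False, True]
```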

### conversion functions
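
The built-in type names double as conversion functions. A brief sketch, with illustrative variable names:

```python
start_age = int('544')        # str -> int
end_age = float('1.8')        # str -> float
label = str(65)               # int -> str
truncated = int(3.99)         # float -> int truncates toward zero
chars = list('abc')           # str -> list of characters
unique = set([1, 1, 2])       # list -> set (duplicates dropped)
empty_is_false = bool('')     # empty string -> False

print(start_age, end_age, label, truncated, chars, unique, empty_is_false)

# Not every string converts: int('1.8') raises ValueError.
```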

## Data collections and data structures

`list, dict, tuples, sets`

### `list`

Lists in Python are one-dimensional, ordered containers whose elements may be any Python objects. Lists are *mutable* and have methods for adding and removing elements. The literal syntax for lists is to surround comma-separated values with square brackets (`[]`). The square brackets are a syntactic hint that lists are indexable.

In [None]:
[1,1] + [3,3] + [4, 4]

In [None]:
fib = [1.0, 1.0, 2.0, 3.0, 5.0, 8.0]
fib.append(13.0)

In [None]:
# del fib  # deletes the name 'fib' entirely; left commented out because the next cells still need the list

In [None]:
fib.extend([21.0, 34.0, 55.0])

In [None]:
fib += [89.0, 144.0]

In [None]:
fibm = np.array(fib[:-1])
fibp = np.array(fib[1:])
plt.plot(fibp/fibm)
plt.title('Golden Ratio')
fibp/fibm

### indexing, slicing, striding

In [None]:
counts = [1, 2, 3, 4, 5, 6, 7, 8, 9]
counts[3]
counts[1:]  
counts[:-1] 
counts[::2] 
counts[1::2] 

In [None]:
periods = ['Cambrian (C)', 'Ordovician (O)', 'Silurian (S)', 'Devonian (D)', 
           'Mississippian (M)', 'Pennsylvanian (IP)', 'Permian (P)',
           'Triassic (Tr)', 'Jurassic (J)', 'Cretaceous (C)', 
           'Tertiary (T)', 'Quaternary (Q)']

exercise: 

- (a) return the 'Triassic (Tr)' string

- (b) return just the word 'Triassic'

- (c) return the abbreviation '(Tr)' enclosed in parentheses

- (d) return just the abbreviation 'Tr'

### Nested `list`

In [None]:
nested_periods = [
           ['Cambrian (C)', [544, 495]], ['Ordovician (O)', [495, 442]], 
           ['Silurian (S)', [442, 416]], ['Devonian (D)', [416, 354]], 
           ['Mississippian (M)', [354, 324]], ['Pennsylvanian (IP)', [324, 295]], 
           ['Permian (P)', [304, 248]], ['Triassic (Tr)', [248, 205]], 
           ['Jurassic (J)', [205, 144]], ['Cretaceous (C)', [160, 65]], 
           ['Tertiary (T)', [65, 1.8]], ['Quaternary (Q)']
           ]

exercise: what is the expected output of
* a) `nested_periods[:2]`

* b) `nested_periods[6]`

* c) what command would you type to return the age of the end of the Permian, 248?

* d) the start of the Cretaceous is wrong (it should be 144). Change it to the correct value

* e) We've lost the dates for the Quaternary Period [1.8, 0]. Index into that element, and append it.

### `tuples`

*Tuples* are the immutable form of lists. They behave almost exactly the same as lists in every way except that you cannot change any of their values. There are no `append()` or `extend()` methods, and there are no *in-place* operators. 

They also differ from lists in their syntax. They are so central to how Python works, that *tuples* are defined by commas. Oftentimes, tuples will be seen surrounded by parentheses. These parentheses only serve to group actions or make the code more readable, not to actually define tuples.

In [None]:
a = 1,2,3,4  # a length-4 tuple
b = (42,)    # length-1 tuple defined by the comma
c = (42)     # not a tuple, just the number 42
d = ()       # length-0 tuple- no commas means no elements

You can concatenate tuples together in the same way as lists, but be careful about the order of operations. This is where parentheses come in handy:

In [None]:
(1, 2) + (3, 4)

In [None]:
1, 2 + 3, 4

Note that even though tuples are immutable, they may have mutable elements. Suppose that we have a list embedded in a tuple. This list may be modified in-place even though the list may not be removed or replaced wholesale:

In [None]:
x = 1.0, [2, 4], 16
x[1].append(8)
x

### `Sets`

Instances of the `set` type are equivalent to mathematical sets. Like their math counterparts, literal sets in Python are defined by comma-separated values between curly braces (`{}`). Sets are unordered containers of unique values. Duplicated elements are ignored. Because they are unordered, sets are not sequences and cannot be indexed. Note that empty braces create a `dict`; use `set()` for an empty set.

In [None]:
# a literal set formed with elements of various types
{1.0, 10, "one hundred", (1, 0, 0, 0)}

In [None]:
# a literal set of special values
{True, False, None, "", 0.0, 0}

In [None]:
# conversion from a list to a set
set([2.0, 4, "eight", (16,)])
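
The uniqueness rule and the usual set operations can be checked directly. A sketch with made-up rock names (`rocks_a` and `rocks_b` are not part of the notebook):

```python
rocks_a = {'shale', 'chalk', 'sandstone', 'shale'}   # the duplicate 'shale' is dropped
rocks_b = {'chalk', 'limestone'}

print(len(rocks_a))            # 3
print(rocks_a & rocks_b)       # intersection: {'chalk'}
print(rocks_a | rocks_b)       # union of both sets
print(rocks_a - rocks_b)       # difference: in a but not in b
print('chalk' in rocks_a)      # True (membership tests are fast)

# Uniqueness is decided by equality: 0, 0.0 and False all compare equal,
# so the "special values" set above keeps only four elements.
special = {True, False, None, "", 0.0, 0}
print(len(special))            # 4
```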

### `dicts`

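Dictionaries are hands down *the most important* data structure in Python. Much of the language itself is built on them: modules, classes, and most objects store their attributes in dictionaries. A dictionary, or `dict`, is a mutable collection of unique key / value pairs. Since Python 3.7 dictionaries preserve insertion order, but items are looked up by key, not by position.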

In [None]:
perdict = {
           'Cambrian (C)' : (544, 495), 'Ordovician (O)': (495, 442), 
           'Silurian (S)' : (442, 416), 'Devonian (D)': (416, 354), 
           'Mississippian (M)' : (354, 324), 'Pennsylvanian (IP)' : (324, 295), 
           'Permian (P)' : (304, 248), 'Triassic (Tr)' : (248, 205), 
           'Jurassic (J)' : (205, 144), 'Cretaceous (C)' : (144, 65), 
           'Tertiary (T)' : (65, 1.8), 'Quaternary (Q)' : (1.8, 0.0)
           }

In [None]:
perdict

A dictionary can also be built from a list of key / value pairs: `perdict = dict([(k1, v1), (k2, v2), (k3, v3)])`
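
Typical dictionary operations, sketched on a small stand-in dictionary (`ages`, `durations`, `pairs` and `tops` are illustrative names; the values are taken from the tables above and from the tops example later in the notebook):

```python
ages = {'Triassic (Tr)': (248, 205), 'Jurassic (J)': (205, 144)}

# Look up a value by its key
start, end = ages['Jurassic (J)']
print(start - end)                    # 61

# Add or overwrite an entry by assigning to a key
ages['Cretaceous (C)'] = (144, 65)
print(len(ages))                      # 3

# .get() returns None (or a default) instead of raising KeyError
print(ages.get('Devonian (D)'))       # None
print('Devonian (D)' in ages)         # False

# Iterate over key / value pairs
durations = {}
for name, (start, end) in ages.items():
    durations[name] = start - end
print(durations['Cretaceous (C)'])    # 79

# Build a dict from a list of (key, value) pairs
pairs = [('GOC', 1200.0), ('OWC', 1300.0)]
tops = dict(pairs)
print(tops['OWC'])                    # 1300.0
```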

## Loops

*Doing stuff many times*

the <code><font color="green">while</font></code> loop

the <code><font color="green">for</font></code> loop

In [None]:
# while loop syntax


In [None]:
# for loop syntax

<font color="#0A5394">**\*iteration, \*iterable**</font>
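
The two loop forms might be filled in along these lines. This is only a sketch; the drilling increment and the short list of periods are made up for the example:

```python
# while loop: repeat as long as a condition holds
depth = 0
steps = 0
while depth < 100:
    depth += 30          # made-up increment
    steps += 1
print(steps, depth)      # 4 120

# for loop: visit each item of an iterable in turn
some_periods = ['Triassic', 'Jurassic', 'Cretaceous']
initials = []
for period in some_periods:
    initials.append(period[0])
print(initials)          # ['T', 'J', 'C']

# enumerate() yields (index, item) pairs when the position is needed too
for i, period in enumerate(some_periods):
    print(i, period)
```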

## Making choices

The <code><font color="green">if</font></code> statement

In [None]:
# the if statement:

# the if / elif statement:

# the if / else statement:

<font color="#0A5394">**\*conditionals**</font>
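
A sketch of an `if` / `elif` / `else` chain. The boundaries 248 and 65 come from the period table earlier in the notebook; the era names and the sample age are assumptions added for the example, and anything older than the Cambrian is lumped into the first branch for simplicity:

```python
age = 150.0   # Ma, a made-up sample age

if age > 248:
    era = 'Paleozoic'
elif age > 65:
    era = 'Mesozoic'
else:
    era = 'Cenozoic'

print(era)    # Mesozoic
```

Only the first branch whose condition is true runs; the remaining branches are skipped.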

## Getting data...

## ... from text files

You can explicitly read from and write to files directly in your code. Python makes working with files pretty simple.

The first step to working with a text file is to obtain a 'file object' using `open`.

In [None]:
file_for_reading = open('reading_file.txt', 'r')  # 'r' means read-only

file_for_writing = open('writing_file.txt', 'w')  # 'w' is for write - will destroy the file if it already exists

file_for_appending = open('appending_file.txt', 'a')  # 'a' is for appending to the end of a file

# don't forget to close your files when you're done
file_for_reading.close()
file_for_writing.close()
file_for_appending.close()

Because it is easy to forget to close your files, it is convenient to use them in a `with` block, at the end of which they will be closed automatically.

If you need to read a whole text file, you can just iterate over the lines of the file using `for`:

In [None]:
import re

starts_with_hash = 0

with open('input.txt', 'r') as f:
    for line in f:                    # look at each line in the file 
        if re.match("^#", line):      # use a regex to see if it starts with '#'
            starts_with_hash += 1     # if it does, add 1 to the count

In [None]:
filename = 'data/periods.txt'
periods = {}
with open(filename, 'r') as f:
    for line in f:
        print (line)

Exercise: write a function called `process_tops` that takes a filename as input and returns a dictionary of the tops.

In [None]:
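fname = 'data/periods.txt'
with open(fname) as f:
    i = 0
    line = f.readline()
    while line.startswith('#') and i < 10:   # skip up to 10 header lines starting with '#'
        i += 1
        line = f.readline()
    content = [line] + f.readlines()         # the first data line plus everything after it
    print(content)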

Every line you get this way ends in a newline character, `\n`, so you'll often want to `strip()` it before doing anything with it.
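
Putting `with`, line iteration and `strip()` together: the sketch below writes a tiny file to a temporary directory and reads it back. The file layout (a `#` header followed by comma-separated name, start, end) is an assumption for this example, not necessarily the layout of `data/periods.txt`:

```python
import os
import tempfile

# Write a small file to experiment with; it is closed when the block ends
path = os.path.join(tempfile.mkdtemp(), 'periods_demo.txt')
with open(path, 'w') as f:
    f.write('# name, start, end\n')
    f.write('Triassic (Tr), 248, 205\n')
    f.write('Jurassic (J), 205, 144\n')

rows = []
with open(path) as f:
    for line in f:
        line = line.strip()                      # drop the trailing '\n'
        if not line or line.startswith('#'):     # skip blank and comment lines
            continue
        name, start, end = [part.strip() for part in line.split(',')]
        rows.append((name, float(start), float(end)))

print(rows)
```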

In [None]:
# Open the file in read-only mode
f = open('myTextFile.txt', "r")

line = f.readline()    # a string containing the next line in the file
lines = f.readlines()  # a list containing all the remaining lines

# Close the file after reading the lines
f.close()

## ... from delimited files

In [7]:
import csv

with open('z_data/periods.csv', 'rt') as f:
    reader = csv.DictReader(f, delimiter=',')
    for row in reader:
        print (row)

{'start': '544', 'end': '495', 'name': 'Cambrian (C)'}
{'start': '495', 'end': '492', 'name': 'Ordivician (O)'}
{'start': '442', 'end': '416', 'name': 'Silurian (S)'}
{'start': '416', 'end': '354', 'name': 'Devonian (D)'}
{'start': '354', 'end': '324', 'name': 'Mississipian (M)'}
{'start': '324', 'end': '295', 'name': 'Pennsylvanian (IP'}
{'start': '304', 'end': '248', 'name': 'Permian (P)'}
{'start': '248', 'end': '205', 'name': 'Triassic (Tr)'}
{'start': '205', 'end': '144', 'name': 'Jurassic (J)'}
{'start': '160', 'end': '65', 'name': 'Cretaceous (C)'}
{'start': '65', 'end': '1.8', 'name': 'Tertiary (T)'}
{'start': '1.8', 'end': '0', 'name': 'Quaternary (Q)'}


You can write out delimited data using `csv.writer`:

In [None]:
my_tops = {'GOC' : 1200.0 , 'OWC' : 1300.0, 'Top Reservoir' : 1100.0}

with open('comma_delimited_stock_prices.txt', 'w', newline='') as f:  # text mode; the csv docs recommend newline=''
    writer = csv.writer(f, delimiter=',')
    for name, depth in my_tops.items():
        writer.writerow([name, depth])
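
To check that writing and reading agree without touching the disk, the same calls can be pointed at an in-memory text buffer. This is a sketch; the `name` and `depth` column headers are made up for the round trip:

```python
import csv
import io

my_tops = {'GOC': 1200.0, 'OWC': 1300.0, 'Top Reservoir': 1100.0}

buffer = io.StringIO()                      # behaves like an open text file
writer = csv.writer(buffer, delimiter=',')
writer.writerow(['name', 'depth'])          # header row
for name, depth in my_tops.items():
    writer.writerow([name, depth])

buffer.seek(0)                              # rewind before reading
reader = csv.DictReader(buffer, delimiter=',')
tops_back = {row['name']: float(row['depth']) for row in reader}

print(tops_back == my_tops)                 # True
```

Note that the csv module hands every field back as a string, which is why the depths are converted with `float`.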

## ... from the web

Use View Source in your browser to figure out where the age range is on the page, and what it looks like.

Try to find the same string here.

In [8]:
url = "http://en.wikipedia.org/wiki/Jurassic"

In [9]:
import requests
r = requests.get(url)
r.text[:500]

'<!DOCTYPE html>\n<html lang="en" dir="ltr" class="client-nojs">\n<head>\n<meta charset="UTF-8" />\n<title>Jurassic - Wikipedia, the free encyclopedia</title>\n<script>document.documentElement.className = document.documentElement.className.replace( /(^|\\s)client-nojs(\\s|$)/, "$1client-js$2" );</script>\n<script>window.RLQ = window.RLQ || []; window.RLQ.push( function () {\nmw.config.set({"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"wgNamespaceNumber":0,"wgPageName":"Jurassic","wgTitle":'

Using a [regular expression](https://docs.python.org/3/library/re.html):

In [10]:
import re

s = re.search(r'<i>(.+?million years ago)</i>', r.text)
text = s.group(1)
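
Web pages change, so it helps to test a pattern on a fixed string first. In this sketch `html` is a made-up snippet that imitates the kind of markup the pattern expects; it is not the real page source:

```python
import re

# Made-up stand-in for a fragment of the page
html = '<p>The Jurassic spanned <i>201.3–145 million years ago</i>.</p>'

match = re.search(r'<i>(.+?million years ago)</i>', html)

# re.search returns None when nothing matches, so check before calling .group()
if match is None:
    age_text = None
else:
    age_text = match.group(1)
print(age_text)

# A second pattern pulls the two numbers out of the captured text
start, end = re.search(r'([\.0-9]+)–([\.0-9]+)', age_text).groups()
print(float(start), float(end))
```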

## Defining and calling functions

the <code><font color="green">def</font></code> statement

In [7]:
def myfunc(args):
    """
    Documentation string
    """
    # statement
    # statement
    return # optional

<font color="#0A5394">**\*scope**</font>
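
A minimal illustration of scope (the names `unit`, `describe` and `label` are hypothetical): a function can read names defined outside it, but names created inside it disappear when it returns.

```python
unit = 'Ma'                      # defined outside the function

def describe(age):
    """Return the age as a labelled string."""
    label = '{0} {1}'.format(age, unit)   # 'unit' is found in the enclosing scope
    return label                          # 'label' exists only inside describe()

print(describe(65))              # 65 Ma

# 'label' was local to the function, so it is not visible out here
try:
    label
    label_visible = True
except NameError:
    label_visible = False
print(label_visible)             # False
```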

Exercise: Make a function to get the start and end ages of *any* geologic period, taking the name of the period as an argument.

In [11]:
def get_age(period):
    url = "http://en.wikipedia.org/wiki/" + period
    r = requests.get(url)
    start, end = re.search(r'<i>([\.0-9]+)–([\.0-9]+)&#160;million years ago</i>', r.text).groups()
    return float(start), float(end)

In [12]:
period = "Jurassic"
get_age(period)

(201.3, 145.0)

In [13]:
def duration(period):
    t0, t1 = get_age(period)
    duration = t0 - t1
    response = "According to Wikipedia, the {0} period was {1:.2f} Ma long.".format(period, duration)
    return response

In [14]:
duration('Cretaceous')

'According to Wikipedia, the Cretaceous period was 79.00 Ma long.'

## Using built-in functions

## Importing modules

the <code><font color="green">import</font></code> statement


In [None]:
import this

## The Python standard library

[Built-in functions](https://docs.python.org/3/library/functions.html)

[Built-in Types](https://docs.python.org/3/library/stdtypes.html)

[docs.python.org](https://docs.python.org/3/library/)

In [None]:
import datetime
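
As a taste of the standard library, `datetime` handles calendar arithmetic. A short sketch; the two dates are invented for the example:

```python
import datetime

spud = datetime.date(2015, 3, 14)      # hypothetical start date
finish = datetime.date(2015, 4, 2)     # hypothetical end date

elapsed = finish - spud                # subtracting dates gives a timedelta
print(elapsed.days)                    # 19

print(spud.isoformat())                # 2015-03-14
print(spud + datetime.timedelta(days=7))   # one week later
```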

## External Python packages

The Python Package Index, [PyPI](https://pypi.python.org/pypi)

* [SciPy](http://www.scipy.org/) -  a collection of often-used libraries

## Using external libraries

In [None]:
import numpy as np
import bruges

getting started with [bruges](https://github.com/agile-geoscience/notebooks/blob/master/Bruges_getting_started.ipynb)


## Writing and running programs

## Objects and Classes

In [1]:
class Layers(object):
    
    def __init__(self, layers, label=None):
        # Just make sure we end up with an array
        self.layers = np.array(layers)
        self.label = label or "My log"
        self.length = self.layers.size  # But storing len in an attribute is unexpected...
        
    def __len__(self):  # ...better to do this.
        return len(self.layers)
        
    def rcs(self):
        uppers = self.layers[:-1]
        lowers = self.layers[1:]
        return (lowers-uppers) / (uppers+lowers)
    
    def plot(self, lw=0.5, color='#6699ff'):
        fig = plt.figure(figsize=(2,6))
        ax = fig.add_subplot(111)
        ax.barh(range(len(self.layers)), self.layers, color=color, lw=lw, align='edge', height=1.0, alpha=1.0, zorder=10)
        ax.grid(zorder=2)
        ax.set_ylabel('Layers')
        ax.set_title(self.label)
        ax.set_xlim([-0.5,1.0])
        ax.set_xlabel('Measurement (units)')
        ax.invert_yaxis()  
        #ax.set_xticks(ax.get_xticks()[::2])    # take out every second tick
        ax.spines['right'].set_visible(False)  # hide the spine on the right
        ax.yaxis.set_ticks_position('left')    # only show y-axis ticks on the left spine
        
        plt.show()

In [None]:
layers = [0.23, 0.34, 0.45, 0.25, 0.23, 0.35]

In [None]:
l = Layers(layers, label='Well # 1')
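
The `__len__` method and the `rcs` calculation can also be tried without NumPy or Matplotlib. The class below is a stripped-down, list-based sketch with a hypothetical name (`SimpleLayers`); it mirrors the arithmetic of `Layers.rcs` but is not a replacement for the class above:

```python
class SimpleLayers:
    """A list-based imitation of Layers, without plotting."""

    def __init__(self, layers, label=None):
        self.layers = list(layers)
        self.label = label or "My log"

    def __len__(self):
        # Lets the built-in len() work on instances
        return len(self.layers)

    def rcs(self):
        # Pair each layer with the one below it
        uppers = self.layers[:-1]
        lowers = self.layers[1:]
        return [(lo - up) / (up + lo) for up, lo in zip(uppers, lowers)]


log = SimpleLayers([0.23, 0.34, 0.45, 0.25, 0.23, 0.35], label='Well # 1')

print(len(log))          # 6 layers
print(len(log.rcs()))    # 5 interfaces between them
print(log.label)         # Well # 1
```

There is always one fewer interface than there are layers, which is why `rcs` returns five values for six layers.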