<a href="https://colab.research.google.com/github/ds4geo/ds4geo/blob/master/DS4GEO_L2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Data Science for Geoscientists - Winter Semester 2020**
# **Session 2**

In the previous session, we handled data in a very simple way using pandas. In this session we will introduce a few other helpful python object types for handling data, and especially learn how to index and slice data (i.e. extract only certain parts of the data/object). Specifically, we will cover lists, dictionaries, and arrays from the numpy library.

We will also introduce simple array operations and aggregations, then apply these topics to a worked example from the geosciences.





# Section 1 - Lists, Dictionaries and Indexing

Lists and dictionaries are built-in python objects useful for storing and handling data.

# Lists
Python lists are ordered collections of other python objects. They are defined by square brackets [ ], with the items separated by commas.

In [None]:
a = [1,2,3] # List of integers
print("a:", a)

b = [1.5, 2.5, 3.5] # List of floats
print("b:", b)

In [None]:
# Lists can contain different types
c = [1, "data", 2.5]
print("c:", c)

# Including other lists (nested)
d = [[1,2,3], [4,5,6]]
print("d:", d)

e = [a, b]
print("e:", e)

In [None]:
# They can contain any other python objects
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
f = [pd, np, plt] # But there's no reason to actually do this
print("f:", f)

In [None]:
# From last week, you'll recall dir() can be used to find methods on objects
a = [1, 2, 3]
# Append adds a new item to the end of a list
a.append(4)
print("a:", a)

# Extend joins two lists *in place*
a.extend(b) # notice we don't assign a result: a itself is changed
print("a:", a)
# The + operator also joins two lists, but returns a new list instead of changing a *in place*:
h = a + b
print("h:", h)

# sort does what it suggests, in place
a.sort()
print("a:", a)
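The difference between *in place* methods and operators that return a new list is a common source of bugs (e.g. writing `a = a.sort()` leaves `a` as `None`). Here is a short sketch with throwaway variables `x` and `y`; it also uses the built-in `sorted()`, which was not introduced above, as the non-in-place counterpart of `.sort()`:

```python
x = [3, 1, 2]

# In-place methods change x itself and return None
result = x.sort()
print("x after x.sort():", x)          # [1, 2, 3]
print("return value of sort:", result)  # None

# sorted() and + build new lists; the original is left unchanged
y = [3, 1, 2]
y_sorted = sorted(y)
y_joined = y + [4]
print("y:", y)                # [3, 1, 2]
print("sorted(y):", y_sorted)  # [1, 2, 3]
print("y + [4]:", y_joined)    # [3, 1, 2, 4]
```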

In [None]:
# Tuples are another type very similar to lists except they can't be modified
# i.e. you cannot append to a tuple
# They are defined by parentheses ( ) instead of [ ]
a_tuple = (1, 2, 3)
print("a_tuple:", a_tuple)

# The specific reasons for using tuples are complex.
# You will see them in documentation, but usually you can just use a list
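A quick sketch of what "can't be modified" means in practice: assigning to a position works for a list but raises a `TypeError` for a tuple (the names `a_list` and `error_raised` are just for this example):

```python
a_tuple = (1, 2, 3)
a_list = [1, 2, 3]

# Lists can be changed position by position...
a_list[0] = 10
print("a_list:", a_list)

# ...but tuples cannot: item assignment raises a TypeError
error_raised = False
try:
    a_tuple[0] = 10
except TypeError as err:
    error_raised = True
    print("TypeError:", err)

# Tuples also have no append method
print("tuples have append:", hasattr(a_tuple, "append"))
print("a_tuple is unchanged:", a_tuple)
```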

# Dictionaries
Python dictionaries are collections of pairs known as keys and values. They function like language dictionaries where you look up a word (the key) and see its definition or translation (the value). Items are looked up by their key rather than by position (although since Python 3.7, dictionaries do remember the order in which keys were added).
They are defined with braces { }, with the pairs separated by commas and colons : indicating the key-value relationships.
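A minimal sketch of the basic dictionary operations, using a two-word version of the German to English dictionary built below: looking up a value by key, checking whether a key exists with `in`, and looping over key-value pairs with `.items()`:

```python
De2En = {"Bier": "Beer", "Wurst": "Sausage"}

# Look up a value by its key
print(De2En["Wurst"])   # Sausage

# Check whether a key exists with "in"
print("Kren" in De2En)  # False

# Loop over key-value pairs; order follows insertion (python 3.7+)
for german, english in De2En.items():
    print(german, "->", english)

# Keys and values can be extracted as lists
keys = list(De2En.keys())
values = list(De2En.values())
print(keys, values)
```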

In [None]:
# Create a simple German to English language dictionary
De2En = {"Bier": "Beer", "Wurst": "Sausage"}

# When making lists and dictionaries, you can wrap between lines for readability:
De2En = {"Bier": "Beer",
         "Wurst": "Sausage",
         "Kren": "Horseradish"}


In [None]:
# Values can be any python object, e.g. lists:
rocks = {"igneous": ["Granite", "Basalt", "Rhyolite"],
         "sedimentary": ["Sandstone", "Limestone"]}

# Keys must be immutable ("hashable") objects such as int, float, string or tuple; mutable objects such as lists or dicts cannot be keys
# Keys and values do not all have to be the same type
complex_dict = {0: "zero",
                "one point 5": 1.5,
                2.5: ["two", "point", "five"]}

# Dictionaries can also be nested like lists.
# Note the nesting is multi-line and aligned to improve readability
rock_dict = {"granite": {"type": "igneous",
                         "composition": {"quartz": 0.5,
                                         "feldspar": 0.2},
                         "locations": [(50.59671,-3.98289),
                                       (50.59591,-4.61987)]},
             "sandstone": {}}
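To see why lists can't be dictionary keys, here is a small sketch: a tuple of coordinates works as a key, while the same coordinates as a list raise a `TypeError`. The label `"granite outcrop A"` is made up for illustration:

```python
# Tuples are immutable, so they can be keys, e.g. (latitude, longitude) pairs
sites = {(50.59671, -3.98289): "granite outcrop A"}  # hypothetical label
print(sites[(50.59671, -3.98289)])

# Lists are mutable, so python refuses to use them as keys
list_key_failed = False
try:
    bad = {[50.59671, -3.98289]: "granite outcrop A"}
except TypeError as err:
    list_key_failed = True
    print("TypeError:", err)
```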


# List and Dictionary Indexing
You can select objects/data from lists and dictionaries using square brackets [ ].
List indexing is based on numeric positions, while dictionary indexing is based on its keys.

**Note** python positional indexing (for lists, numpy, pandas, etc.) always starts at 0, i.e. the first item is at position 0. This might seem counter-intuitive at first, but when combined with some other features of python, it actually simplifies code in many situations!
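One way zero-based indexing simplifies code: for a list of length `n`, the valid positions are exactly `range(n)`, the last item is at `n - 1`, and splitting at position `k` gives pieces of length `k` and `n - k` with no extra +1 or -1. A minimal sketch with a made-up list of sample names (it uses slices, which are covered in more detail further down):

```python
samples = ["A", "B", "C", "D", "E"]  # made-up sample names
n = len(samples)

# Valid positions are exactly 0 .. n-1, which is what range(n) produces
positions = list(range(n))
print(positions)       # [0, 1, 2, 3, 4]

# The last item is at position n - 1 (equivalently, -1)
print(samples[n - 1], samples[-1])  # E E

# Splitting at position k gives pieces of length k and n - k
k = 2
first, rest = samples[:k], samples[k:]
print(first, rest)     # ['A', 'B'] ['C', 'D', 'E']
```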

In [None]:
# Remind ourselves what is in variable "c"
print(c)

# Print positions 0, 1 and 2 of list "c"
print(c[0])
print(c[1])
print(c[2])

In [None]:
# If we try to index a position beyond the size of the list, we get an index error
print(c[3])

In [None]:
# List indexing also works with negative numbers, counting backwards from the end, with -1 being the last item
print(c)
print(c[-1]) # the last item in c

In [None]:
# With nested objects, indexing can be stacked with sets of square brackets [ ][ ]
print(d)
print(d[1])
print(d[1][2])

In [None]:
# Indexing tuples and strings works exactly the same way
print(a_tuple)
print(a_tuple[0])

print(c)
print(c[1])
print(c[1][2])

In [None]:
# For lists, tuples and strings (and numpy - see later), ranges of positions ("slices") also work.
# Slices are "half-open", i.e. they include the first index, but not the last.
# This means a slice such as 2:4 gives a result of length 4 - 2 = 2, even though indexing starts at 0
print(a)
print(a[2:4])
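A few more slice patterns that come up often, sketched with a hypothetical list of sample depths: leaving out the start or end, using negative positions, and adding a third number as the step:

```python
# Hypothetical sample depths in cm
depths = [0, 10, 20, 30, 40, 50, 60]

first_three = depths[:3]        # start left out -> from position 0: [0, 10, 20]
from_four = depths[4:]          # end left out -> to the end: [40, 50, 60]
last_two = depths[-2:]          # negative start counts from the end: [50, 60]
every_second = depths[::2]      # third number is the step: [0, 20, 40, 60]
reversed_depths = depths[::-1]  # a negative step walks backwards

print(first_three, from_four, last_two, every_second, reversed_depths)

# Half-open slices: the length of depths[2:5] is 5 - 2 = 3
print(len(depths[2:5]))
```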

In [None]:
# Also useful is finding the length of lists, dicts and strings:
print("length of list a:", len(a))
print("length of dict rocks:", len(rocks))
print("length of string in position 1 of list c:", len(c[1]))

In [None]:
# Dictionaries are indexed by their keys:
print(De2En["Bier"])

# An example of indexing nested objects
print(rocks["igneous"])
print(rocks["igneous"][1])
print(rock_dict["granite"])
print(rock_dict["granite"]["composition"])
print(rock_dict["granite"]["composition"]["quartz"])
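Indexing a key that doesn't exist raises a `KeyError`. A short sketch of the alternative `.get()` method, which returns a default instead; `rock_info` is a cut-down copy of `rock_dict` made just for this example:

```python
# A cut-down copy of rock_dict for this example
rock_info = {"granite": {"type": "igneous",
                         "composition": {"quartz": 0.5, "feldspar": 0.2}},
             "sandstone": {}}

# Square-bracket indexing of a missing key raises a KeyError
key_missing = False
try:
    rock_info["sandstone"]["type"]
except KeyError as err:
    key_missing = True
    print("KeyError:", err)

# .get() returns None (or a default you supply) instead of raising an error
print(rock_info["sandstone"].get("type"))             # None
print(rock_info["sandstone"].get("type", "unknown"))  # unknown
print(rock_info["granite"].get("type", "unknown"))    # igneous
```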

In [None]:
# You can also expand dictionaries using indexing assignment:
De2En["Semmel"] = "Bread roll"
print(De2En)

rocks["metamorphic"] = ["Gneiss", "Schist"]
print(rocks)

# And you can use methods on the objects indexed:
rocks["igneous"].append("Gabbro")
print(rocks)

# Section 2.2 - Numpy part 1
Last week we used the popular python library Pandas, but didn't introduce it formally.
This week we will also be using a popular library called Numpy.
Pandas is built upon Numpy, and they work well together.
Pandas is good at data handling, manipulation and analysis, while Numpy is the basis of numerical operations and processing.
See more here:
* https://pandas.pydata.org/
* https://numpy.org/
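Because Pandas is built on Numpy, the data inside a DataFrame can be pulled out as numpy arrays with `.to_numpy()`. A minimal sketch, with made-up concentration values:

```python
import numpy as np
import pandas as pd

# A small DataFrame with made-up element concentrations
df = pd.DataFrame({"Si": [46.2, 47.1, 45.8], "Ca": [8.1, 7.9, 8.4]})

# A single column comes out as a 1D numpy array
si = df["Si"].to_numpy()
print(type(si))

# The whole table comes out as a 2D array (rows x columns)
values = df.to_numpy()
print(values.shape)  # (3, 2)
```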

We will use both Pandas and Numpy throughout the course. Together (along with matplotlib), they are the basis of Data Science in python.

Numpy is based around multi-dimensional arrays (of data), and allows efficient indexing, operations and aggregation of said arrays.
For those not familiar with multi-dimensional arrays (also called nd-arrays), imagine an Excel spreadsheet as a 2-dimensional table/array with rows and columns, then imagine that you can have as many dimensions as you like.
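In numpy the number of dimensions is stored in `.ndim` and the size along each dimension in `.shape`. A small sketch comparing a spreadsheet-like 2D array with a 3D one (`np.zeros` is introduced properly further down):

```python
import numpy as np

# A 2D array is like a spreadsheet: 3 rows x 4 columns
table = np.zeros((3, 4))
print(table.ndim, table.shape)  # 2 (3, 4)

# Stacking 2 such "sheets" adds a third dimension
stack = np.zeros((2, 3, 4))
print(stack.ndim, stack.shape)  # 3 (2, 3, 4)
```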

As an example, in satellite remote sensing, it is typical to have a time-series of many multi-band (e.g. red, green, blue, infra-red) images. Therefore, you might have an array of 4 dimensions: [pixel rows, pixel columns, time, band]. So for each x-y pixel, at each point in time, you have a value for each band.
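The satellite example above can be sketched as a 4D array. The sizes, the band order and the use of random values in place of real imagery are all assumptions for illustration; the last lines preview the indexing covered later by pulling out the infra-red time series of a single pixel:

```python
import numpy as np

# Hypothetical sizes: a 100 x 120 pixel scene, 12 acquisition dates, 4 bands
n_rows, n_cols, n_times, n_bands = 100, 120, 12, 4
band_names = ["red", "green", "blue", "infra-red"]  # assumed band order

# Random values stand in for real reflectance data
images = np.random.random_sample((n_rows, n_cols, n_times, n_bands))
print(images.shape)  # (100, 120, 12, 4)

# One pixel (row 50, column 60), all dates, infra-red band -> 12 values
ir = band_names.index("infra-red")
pixel_series = images[50, 60, :, ir]
print(pixel_series.shape)  # (12,)
```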

In this section, we will create arrays, learn how to do simple operations on them and perform basic aggregations. In the section after that, we will explore Numpy's powerful indexing system.



## 2.2.1 - Creating Arrays

In [None]:
# Here we cover simple ways to create numpy arrays.
# We will cover loading and importing data, e.g. from pandas later.

# The simplest way to create an array is from a list
array = np.array([1,2,3])
print(array)

# Or with nested lists for multiple dimensions
array_2d = np.array([[1,2,3],[4,5,6]])
print(array_2d)

In [None]:
# numpy provides some functions to create arrays by shape:
# make a 1d array of 5 zeros
array_zeros = np.zeros(5) 
print(array_zeros)

# Make a 2d array of 1s
array_ones = np.ones((2,5))
print(array_ones)

# numpy arrays have a shape attribute, giving the length along each dimension:
print("array_zeros shape:", array_zeros.shape)
print("array_ones shape:", array_ones.shape)

In [None]:
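# Create an array of consecutive integers with np.arange
# Like slices, the end value is excluded, so this gives 15 to 24
arange_1 = np.arange(15,25)
print(arange_1)

# A third argument sets the step size (here 15, 17, ..., 23)
arange_2 = np.arange(15,25,2)
print(arange_2)

# If you need a standard python list in this style, use the built-in range
# (range itself is a lazy object, so wrap it in list() to see the values):
print(list(range(5)))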

In [None]:
# Create an array across a range by choosing the number of evenly spaced points (here 17, including both ends), rather than the step size
linspace_1 = np.linspace(0,4,17)
print(linspace_1)
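`np.arange` and `np.linspace` can describe the same points, but you specify them differently: a step versus a number of points, with the end excluded versus included. A sketch reproducing `linspace_1` with `arange` (the stop value 4.25 is chosen so that 4.0 is included); numpy's documentation recommends `linspace` when the step is not an integer, since floating-point rounding can make `arange` lengths surprising:

```python
import numpy as np

# linspace: choose the number of points; the end point (4) is included
by_count = np.linspace(0, 4, 17)

# arange: choose the step; the end point is excluded (half-open),
# so the stop value has to go one step past 4
by_step = np.arange(0, 4.25, 0.25)

print(len(by_count), len(by_step))
print(np.allclose(by_count, by_step))
```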

In [None]:
# Arrays of random numbers can be produced with np.random.random_sample
# (uniform on [0, 1)) and np.random.standard_normal (mean 0, standard deviation 1)
uni_random = np.random.random_sample(10)
print(uni_random)

norm_random = np.random.standard_normal(10)
print(norm_random)
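Each run of the cell above gives different numbers. If you need reproducible results, one option (not used elsewhere in this notebook) is numpy's newer random `Generator` interface, available since numpy 1.17, created with a fixed seed. The seed value 42 is arbitrary:

```python
import numpy as np

# Create a random number generator with a fixed (arbitrary) seed
rng = np.random.default_rng(seed=42)

uni = rng.random(10)            # 10 floats, uniform on [0, 1)
norm = rng.standard_normal(10)  # 10 floats, mean 0, standard deviation 1
print(uni)
print(norm)

# A new generator with the same seed reproduces exactly the same sequence
rng_again = np.random.default_rng(seed=42)
same = np.array_equal(uni, rng_again.random(10))
print("reproducible:", same)
```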

## 2.2.2 - Operations

In [None]:
# Python lets us do operations on integers and floats
print(1+2)
print(2*3)
print(2.5*5)
print(2**6)
print(64/4)

In [None]:
# But on lists, these operators do other things:
print([1,2,3] + [4]) # List concatenation
print([1,2,3] * 3) # List duplication
# Operators like / and - do not work on lists (they raise a TypeError)
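A sketch contrasting lists and numpy arrays for element-wise maths: with a list you need a loop (here a list comprehension), whereas an array applies the operator to each element directly, as the next cell shows in more detail:

```python
import numpy as np

values = [1, 2, 3]

# With a plain list, * repeats the list rather than multiplying each item
print(values * 2)  # [1, 2, 3, 1, 2, 3]

# Element-wise maths on a list needs a loop, e.g. a list comprehension
doubled_list = [v * 2 for v in values]
print(doubled_list)  # [2, 4, 6]

# Subtracting one list from another is not defined at all
list_subtraction_failed = False
try:
    [4, 5, 6] - [1, 2, 3]
except TypeError as err:
    list_subtraction_failed = True
    print("TypeError:", err)

# A numpy array applies the operator to every element directly
doubled_array = np.array(values) * 2
print(doubled_array)
```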

In [None]:
# Operators can be applied to numpy arrays in an intuitive way:
# Operators between a numpy array and a single int or float apply the operation to all elements in the array:
a = np.ones(5)
b = np.arange(5)

print("a:",a)
print("a + 1:",a + 1)
print("a - 1:",a - 1)
print("a * 2:",a * 2)
print("a / 2:",a / 2)

print("b * 2:", b * 2)

In [None]:
# Operations between arrays of the same shape are applied element-wise:
print("b:",b)
print("b * b:", b * b)
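A sketch with two different arrays of the same shape, plus what happens when the shapes don't match. The single-number case above works because numpy "broadcasts" the number across the array; two arrays of incompatible shapes such as (5,) and (3,) cannot be broadcast and raise a `ValueError`:

```python
import numpy as np

b = np.arange(5)          # 0, 1, 2, 3, 4
other = np.arange(5, 10)  # 5, 6, 7, 8, 9

# Same shape: each element pairs with the element at the same position
print("b + other:", b + other)  # 5, 7, 9, 11, 13
print("other - b:", other - b)  # 5, 5, 5, 5, 5
print("b * other:", b * other)  # 0, 6, 14, 24, 36

# Shapes (5,) and (3,) cannot be broadcast together -> ValueError
shape_mismatch = False
try:
    b + np.arange(3)
except ValueError as err:
    shape_mismatch = True
    print("ValueError:", err)
```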

## 2.2.3 - Aggregations
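Aggregations reduce an array to fewer values, such as a sum, mean, minimum or maximum. A minimal sketch of numpy's aggregation methods on a small made-up array, both over the whole array and along one axis (`axis=0` collapses the rows, giving one value per column; `axis=1` collapses the columns, giving one value per row):

```python
import numpy as np

# Made-up measurements: 3 samples (rows) x 4 repeat analyses (columns)
data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 4.0, 6.0, 8.0],
                 [0.5, 0.5, 0.5, 0.5]])

# Aggregate over the whole array
print(data.sum(), data.mean(), data.min(), data.max())

# axis=1: one mean per row (sample)
row_means = data.mean(axis=1)
print(row_means)  # 2.5, 5.0, 0.5

# axis=0: one maximum per column (analysis)
col_max = data.max(axis=0)
print(col_max)    # 2.0, 4.0, 6.0, 8.0
```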

# Section 3 - Numpy Exercise 1

# Section 4 - Numpy part 2

Aggregations and Indexing

# Section 5 - LA-ICPMS data reduction exercise

Motivation and overview