<a href="https://csdms.colorado.edu"><img style="float: center; width: 75%" src="../../media/logo.png"></a>

# Programming with Python
## Analyzing Topographic Data
### minutes: 30

>## Learning Objectives
>*   Explain what a library is, and what libraries are used for
>*   Load a Python library and use the tools it contains
>*   Read data from a file into a program
>*   Assign values to variables
>*   Select individual values and subsections from data
>*   Perform operations on arrays of data
>*   Display simple graphs

While a lot of powerful tools are built into languages like Python,
even more tools exist in [libraries](reference.html#library).

In order to load the elevation data,
we need to [import](reference.html#import) a library called NumPy.
You should use this library if you want to do fancy things with numbers (i.e., math),
especially if you have matrices or arrays.
We can load NumPy using:

In [None]:
import numpy

Importing a library is like pulling a toolbox out of a
storage locker and placing it on your workbench, making everything inside the toolbox accessible. Python has a set of built-in functions that are always available (the tools you always have available) and libraries provide
additional functionality (the specialized tools in the toolbox you only sometimes need).
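
As a small illustration of the toolbox analogy, the sketch below uses Python's standard `math` library as a stand-in for any library. The list `heights` and its values are made up for this example.

```python
import math

heights = [2650.0, 3100.5, 2875.25]  # made-up elevations in meters

# Built-in functions are always on the workbench: no import needed.
print(len(heights))
print(max(heights))

# sqrt lives in the math toolbox, so we reach it through the library name.
print(math.sqrt(16))
```

Without the `import math` line, the last call would fail with a `NameError`, because Python would not know what `math` refers to.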

## What is NumPy?
* NumPy is short for "Numerical Python" and it is a fundamental Python package for scientific computing.
* It uses a high-performance data structure known as the **n-dimensional array** or **ndarray**, a multi-dimensional array object, for efficient computation of arrays and matrices.


Once we’ve loaded the library, we can
call a function inside that library to read the data file:

In [None]:
numpy.loadtxt('../../data/topo.asc', delimiter=',')

The expression `numpy.loadtxt(...)` is a [function call](reference.html#function-call)
that asks Python to run the function `loadtxt` that belongs to the `numpy` library.
This [dotted notation](reference.html#dotted-notation), with the syntax `thing.component`, is used
everywhere in Python to refer to parts of things.

The function call to `numpy.loadtxt` has two [parameters](reference.html#parameter):
the name of the file we want to read,
and the [delimiter](reference.html#delimiter) that separates values on a line.
Both need to be character strings (or [strings](reference.html#string), for short)
so we write them in quotes.
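
To see the two parameters at work without needing the lesson's data file, here is a minimal sketch that reads comma-separated text from an in-memory stand-in. The name `fake_file` and its contents are assumptions made for this example; `numpy.loadtxt` accepts such file-like objects as well as file names.

```python
import io
import numpy

# Hypothetical stand-in for a data file: three rows of comma-separated values.
fake_file = io.StringIO("1.0,2.0,3.0\n4.0,5.0,6.0\n7.0,8.0,9.0\n")

# Same call pattern as in the lesson: what to read, and the delimiter between values.
small = numpy.loadtxt(fake_file, delimiter=',')

print(small)
print(small.shape)
```

Each line of text becomes one row of the array, and each delimiter marks the boundary between two columns.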

Within the Jupyter (IPython) notebook, pressing Shift+Enter runs the
commands in the selected cell. Because we haven't told IPython what to
do with the output of `numpy.loadtxt`, the notebook just displays it on
the screen. In this case, that output is the data we just loaded. By
default, only a few rows and columns are shown (with `...` to omit
elements when displaying big arrays).

Our call to `numpy.loadtxt` read the file but didn’t save it to memory.
In order to access the data, we need to [assign](reference.html#assignment) the values to a [variable](reference.html#variable).
A variable is just a name that refers to an object. Python’s variable names
must begin with a letter or an underscore and are [case sensitive](reference.html#case-sensitive). We can assign a
variable name to an object using `=`.


## Objects and their names

What happens when a function is called but the output is not assigned to
a variable is a bit more complicated than simply not saving it. The call
to `numpy.loadtxt` read the file and created an object in memory that
contains the data, but because we didn't assign it to a variable name,
there is no way for us to refer to this object.
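
The relationship between names and objects can be sketched with a tiny array. The names `values` and `alias` are made up for this illustration.

```python
import numpy

# This array is created, but no name refers to it afterwards.
numpy.arange(4)

# Assigning the result gives the object a name we can use later.
values = numpy.arange(4)

# A second name can refer to the very same object; no data is copied.
alias = values
alias[0] = 99

print(values)            # the change made through `alias` shows up here too
print(alias is values)   # True: two names, one object
```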

Let’s re-run numpy.loadtxt and assign the output to a variable name:

In [None]:
topo = numpy.loadtxt('../../data/topo.asc', delimiter=',')

This command doesn’t produce any visible output. If we want to see the
data, we can print the variable’s value with the command `print`:

In [None]:
print(topo)

Using its variable name, we can see what [type](reference.html#type) of object the variable refers to:

In [None]:
type(topo)

The function `type` tells us that the variable name `topo` currently
points to an N-dimensional array created by the NumPy library. We can also get the shape of the
array:

In [None]:
topo.shape

This tells us that `topo` has 500 rows and 500 columns. The file
we imported contains elevation data (in meters, 2 m spacing) for an
area along the Front Range of Colorado, so the area that this array represents is 1 km x 1 km.

The object of
type `numpy.ndarray` that the variable `topo` is assigned to contains the values of the array
as well as some extra information about the array. These are the [members](reference.html#member) or attributes of the object, and they
describe the data in the same way an adjective describes a noun. The
command `topo.shape` accesses the `shape` attribute of the object with the variable name
`topo`, which describes its dimensions. We use the same dotted notation
for the attributes of objects that we use for the functions inside
libraries because they have the same part-and-whole relationship.
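
A few attributes can be inspected on a small array built by hand. This is only a sketch; `grid` is a made-up array and not part of the lesson's data.

```python
import numpy

grid = numpy.zeros((3, 4))   # hypothetical array with 3 rows and 4 columns

print(type(grid))    # the kind of object `grid` refers to
print(grid.shape)    # (rows, columns)
print(grid.size)     # total number of elements
print(grid.dtype)    # type of the values stored in the array
```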



 ## Who's who in the memory

 You can use the `whos` command at any time to see what variables you have created and what modules you have loaded into the computer's memory. As this is an IPython command, it will only work if you are in an IPython
 terminal or the Jupyter Notebook.
 
 Try it to check what is currently in memory:

In [None]:
whos

## Plotting
 
Rasters are just big two-dimensional arrays of values. In the case of digital elevation models (DEMs), those values
are elevations. It's very hard to get a good sense of what this landscape
looks like by looking directly at the data. This information is better
conveyed through plots and graphics.

Data visualization deserves an entire lecture (or course) of its own,
but we can explore a few features of Python's `matplotlib` library here.
While there is no "official" plotting library in Python, this package is
the de facto standard.

We start by importing the `pyplot` module from the library `matplotlib`:

In [None]:
from matplotlib import pyplot as plt

We can use the function `imshow` within `matplotlib.pyplot` to display arrays as a 2D
image. 

Try to display the 2D `topo` array

In [None]:
plt.imshow(topo)

## Indexing

We can access individual values in an array using an [index](reference.html#index) in square brackets:

In [None]:
print('elevation at the corner of topo:', topo[0,0], 'meters')

In [None]:
print('elevation at a point in topo:', topo[137,65], 'meters')

When referring to entries in a two dimensional array, the indices are
ordered `[row,column]`. The expression `topo[137, 65]` should not surprise you but `topo[0,0]` might. Programming languages like Fortran and MATLAB
start counting at 1 because that’s what (most) humans have done for
thousands of years. Languages in the C family (including C++, Java,
Perl, and Python) count from 0 because that’s simpler for computers to
do. So if we have an M×N array in Python, the indices go from 0 to M-1
on the first axis (rows) and 0 to N-1 on the second (columns). In
MATLAB, the same array (or matrix) would have indices that go from 1 to
M and 1 to N. Zero-based indexing takes a bit of getting used to, but
one way to remember the rule is that the index is how many steps we have
to take from the start to get to the item we want.
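
Zero-based indexing is easier to check on an array small enough to read in full. The sketch below assumes a made-up 3 x 4 array called `grid`.

```python
import numpy

# Hypothetical 3 x 4 array holding the values 0..11, filled row by row.
grid = numpy.arange(12).reshape(3, 4)
print(grid)

print(grid[0, 0])   # first row, first column
print(grid[2, 3])   # last row, last column: indices M-1 and N-1

# There is no row with index 3 in a 3-row array.
try:
    grid[3, 0]
except IndexError as err:
    print('IndexError:', err)
```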

Python also allows for **negative indices** to refer to the position of
elements with respect to the end of each axis. An index of -1 refers to
the last item along an axis, -2 is the second to last, and so on. Since
index `[0,0]` is the upper left corner of an array, index `[-1,-1]` is
therefore the lower right corner of the array.
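
Negative indices can be tried out on a small made-up array (the name `grid` is an assumption of this sketch):

```python
import numpy

grid = numpy.arange(12).reshape(3, 4)   # made-up 3 x 4 array of 0..11

print(grid[-1, -1])   # lower right corner, the same element as grid[2, 3]
print(grid[-1, 0])    # lower left corner
print(grid[0, -2])    # first row, second-to-last column
```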

Print the lower right corner of the `topo` array: 

In [None]:
print(topo[-1,-1])

Print the upper left corner of the `topo` array: 

In [None]:
print(topo[0,0])

> ## In the Corner
>
> What may also surprise you is that when Python displays an array,
> it shows the element with index `[0, 0]` in the upper left corner
> rather than the lower left.
> This is consistent with the way mathematicians draw matrices,
> but different from the Cartesian coordinates.
> The indices are (row, column) instead of (column, row) for the same reason,
> which can be confusing when plotting data.

## Slicing

A command like `topo[0,0]` selects a single element in the array `topo`.
Indices can also be used to [slice](reference.html#slice) sections of the array. For example, we
can select the top left 5 x 5 corner of the array like this:

In [None]:
print(topo[0:5, 0:5])

The slice `[0:5]` means "Start at index 0 and go along the axis up to,
but not including, index 5".

We don’t need to include the upper or lower bound of the slice if we
want to go all the way to the edge. If we don’t include the lower bound,
Python uses 0 by default; if we don’t include the upper bound, the slice
runs to the end of the axis. If we don’t include either (i.e., if we
just use ‘:’), the slice includes everything. 
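
The default bounds are easiest to follow on a small array. This sketch assumes a made-up 4 x 5 array named `grid`.

```python
import numpy

grid = numpy.arange(20).reshape(4, 5)   # made-up 4 x 5 array of 0..19

print(grid[0:2, 0:3])     # rows 0-1, columns 0-2
print(grid[:2, 3:])       # omitted bounds: first two rows, columns 3 to the end
print(grid[:, -1])        # every row, last column only
print(grid[:, :].shape)   # bare colons keep everything
```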

Print out the first 5 rows and last 6 columns of the `topo` array:

In [None]:
print(topo[:5, -6:])

 ## Point elevations: Practice your skills 
 
 Use indexing to answer the following questions and check your answers
 against the data visualization:
 
 * Is the NW corner of the region higher than the SW corner? What's the elevation difference? You can assume that NW is the upper left corner of the matrix, at index `[0,0]`, rather than where Cartesian coordinates would put it (see also "In the Corner").
 * What's the elevation difference between the NE corner and the SE corner?
 * What's the elevation at the center of the region shown in the array?

In [None]:
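# NW corner minus SW corner
print(topo[0,0]-topo[-1,0])
# NE corner minus SE corner
print(topo[0,-1]-topo[-1,-1])
# Elevation at the center of the array
print(topo[int(topo.shape[0]/2),int(topo.shape[1]/2)])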

 ## Slicing strings
 
 Indexing and slicing behave the same way for any type of sequence,
 including numpy arrays, lists, and strings. Create a new variable called
 `text` and assign it the string "The quick brown fox jumped over the
 lazy dog." (note the capitalization and punctuation in each sentence, and include the quotes so Python recognizes it as a string).
 Then use slicing and the `print()` function to create these phrases:
 
 * the lazy dog.
 * The fox jumped over the dog
 * The lazy fox jumped over the quick brown dog.
 

In [None]:
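text = "The quick brown fox jumped over the lazy dog."

print(text[32:])
print(text[:4] + text[16:36] + text[41:44])
print(text[:4] + text[36:41] + text[16:36] + text[4:16] + text[41:])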
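
The claim that indexing and slicing work the same way on every sequence type can be checked on a string and a list side by side. Both `word` and `samples` are made up for this sketch.

```python
word = "topography"              # a string
samples = [10, 20, 30, 40, 50]   # a list of made-up values

print(word[0:4])      # first four characters
print(word[-5:])      # last five characters
print(samples[1:3])   # second and third items
print(samples[-1])    # last item
```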

 ## Plotting smaller regions 
 
 Use the function `imshow` from `matplotlib.pyplot` to make one plot showing the northern half of the region and another plot showing the southern half. Use the pyplot `show()` function to display the current figure and start a new one. Render the figures in the notebook using `%matplotlib inline`.

In [None]:
plt.figure()
plt.imshow(topo[:int(topo.shape[0]/2),:])
plt.figure()
plt.imshow(topo[int(topo.shape[0]/2):,:])

## Numerical operations on arrays

We can perform basic mathematical operations on each individual element of a NumPy array. We can create a new array with elevations in feet:

In [None]:
topo_in_feet = topo * 3.2808
print('Elevation in meters:', topo[0,0])
print('Elevation in feet:', topo_in_feet[0,0])

Arrays of the same size can be used together in arithmetic operations:

In [None]:
double_topo = topo + topo
print('Double topo:', double_topo[0,0], 'meters')
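
On arrays small enough to print in full, the element-by-element behavior is easy to verify by eye. The arrays `meters` and `offset` below are made-up values for this sketch.

```python
import numpy

meters = numpy.array([[2500.0, 2600.0],
                      [2700.0, 2800.0]])   # made-up elevations
offset = numpy.array([[1.0, 2.0],
                      [3.0, 4.0]])         # made-up corrections

print(meters * 3.2808)   # every element is multiplied by the scalar
print(meters + offset)   # elements are paired up by position
print(meters - meters)   # same shape in, same shape out: all zeros here
```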

We can also perform statistical operations on arrays:

In [None]:
print('Mean elevation:', topo.mean(), 'meters')

> ## Methods vs. attributes
> 
> `mean` is a method that belongs to the array `topo`, i.e., it is a
> function `topo` can inherently call just because of its type.
> When we call `topo.mean()`, we are asking `topo` to calculate its mean
> value. Because it is a function, we need to include parentheses in the
> command. Because it is a `numpy.ndarray`, `topo` also has an attribute called `shape`, but it doesn't include parentheses because
> attributes are objects, not functions.

Python will kindly tell us if we mix up the parentheses:
 

In [None]:
topo.mean
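
The difference between a method and an attribute can be sketched on a made-up 2 x 2 array (the name `grid` is an assumption of this example):

```python
import numpy

grid = numpy.array([[1.0, 2.0], [3.0, 4.0]])   # made-up 2 x 2 array

print(grid.shape)            # attribute: a stored value, no parentheses
print(grid.mean())           # method call: runs the function and returns 2.5
print(callable(grid.mean))   # without parentheses we only get the function itself
```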

NumPy arrays have many other useful methods. Print the min and max elevation of the `topo` dataset:

In [None]:
print('Highest elevation:', topo.max(), 'meters')
print('Lowest elevation:', topo.min(), 'meters')

We can also call methods on slices of the array:

In [None]:
half_len = int(topo.shape[0] / 2)

print('Highest elevation of NW quarter:', topo[:half_len,
:half_len].max(), 'meters')

print('Highest elevation of SE quarter:', topo[half_len:,
half_len:].max(), 'meters' )
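
The conversion with `int(...)` matters because `/` always produces a float, and floats cannot be used as slice bounds. The sketch below shows this on a made-up 6 x 6 array and uses floor division (`//`) as an alternative way to get an integer.

```python
import numpy

grid = numpy.arange(36).reshape(6, 6)   # made-up 6 x 6 array of 0..35

half = grid.shape[0] / 2     # true division always gives a float (3.0)
try:
    grid[:half]
except (TypeError, IndexError) as err:
    print('Cannot slice with a float:', err)

half_len = grid.shape[0] // 2   # floor division keeps it an integer (3)
print(grid[:half_len, :half_len].max())   # upper left quarter
print(grid[half_len:, half_len:].max())   # lower right quarter
```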

Methods can also be used along individual axes (rows or columns) of an
array. If we want to see how the mean elevation changes with longitude
(E-W), we can use the method along `axis=0`:

In [None]:
print(topo.mean(axis=0) )

To see how the mean elevation changes with latitude (N-S), we can use
`axis=1`:

In [None]:
print(topo.mean(axis=1) )
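
Which axis collapses is easier to see on an array with different numbers of rows and columns. The array `grid` below is made up for this sketch.

```python
import numpy

grid = numpy.array([[1.0, 2.0, 3.0],
                    [5.0, 6.0, 7.0]])   # made-up array: 2 rows, 3 columns

print(grid.mean())         # one number for the whole array
print(grid.mean(axis=0))   # collapses the rows: one mean per column
print(grid.mean(axis=1))   # collapses the columns: one mean per row
```

With `axis=0` the result has one entry per column (3 values here), and with `axis=1` it has one entry per row (2 values here).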

## Plotting, take two
 
It's hard to get a sense of how the topography changes across the
landscape from these big tables of numbers. A simpler way to display
this information is with line plots.

We are again going to use the `matplotlib` package for data
visualization. Since we imported the `matplotlib.pyplot` module once
already, those tools are available and can be called within Python
without importing them again.

We use the function `plot` to create basic line plots of the
topography:

In [None]:
plt.plot(topo[-1,:], 'r--')
plt.title('Topographic profile, southern edge')
plt.ylabel('Elevation (m)')
plt.xlabel('<-- West    East -->')
plt.show() 

# Northern edge
plt.plot(topo[0,:])
plt.title('Topographic profile, northern edge')
plt.ylabel('Elevation')
plt.xlabel('<-- West   East-->')
plt.show()

# Can you plot the southern edge?
plt.plot(topo[-1,:])
plt.title('Topographic profile, southern edge')
plt.ylabel('Elevation')
plt.xlabel('<-- West   East-->')
plt.show()

# And the mean elevation changes with longitude (E-W)?
plt.plot(topo.mean(axis=0))
plt.title('Topographic profile, mean elevations')
plt.ylabel('Elevation')
plt.xlabel('<-- West   East-->')
plt.show() 


> ## Scientists dislike typing
>
> So far we have used the syntax `import numpy` to import NumPy. However,
> in order to save typing, it is
> [often suggested](http://www.scipy.org/getting-started.html#an-example-script)
> to make a shortcut like so: `import numpy as np`. If you ever see
> Python code using a NumPy function with `np` (for example,
> `np.loadtxt(...)`), it's because they've used this shortcut.

To better compare these profiles, we can plot them as separate lines in
a single figure. This is matplotlib's default behavior: until a new figure is opened or the existing figure is shown with `plt.show()`, all subsequent calls to `plt.plot` draw on the same axes. The argument `label=` holds the label that will appear in the legend. Try it:

In [None]:
plt.plot(topo[0,:], label='North')
plt.plot(topo[-1,:], 'r--', label='South')
plt.plot(topo[int(len(topo)/2),:], 'g:', linewidth=3, label='Mid')

plt.title('Topographic profiles')
plt.ylabel('Elevation (m)')
plt.xlabel('<-- West    East -->')
plt.legend(loc = 'lower left')

plt.show() 

 ## Practice your skills: Make your own plots 

 Create a single plot showing how the maximum (`numpy.max()`),
 minimum (`numpy.min()`), and mean (`numpy.mean()`) elevation changes with longitude. Label the axes and include a title for the plot (Hint: use `axis=0`). Create a legend.

In [None]:
plt.plot(topo.min(axis=0), label='Min')
plt.plot(topo.max(axis=0), 'r--', label='Max')
plt.plot(topo.mean(axis=0), 'g:', linewidth=3, label='Mean')

plt.title('Topographic profiles')
plt.ylabel('Elevation (m)')
plt.xlabel('<-- West    East -->')
plt.legend(loc = 'lower left')

plt.show() 

 ## Practice your skills: Subplots 

 We often want to arrange separate plots in layouts with multiple rows
 and columns. The script below uses subplots to show the elevation
 profile at the western edge, the mid longitude, and eastern edge of
 the region. Subplots can be a little weird because they require the
 axes to be defined before plotting. Type (don't copy-paste!) the code
 below to get a sense of how it works.
 
This script uses a number of new commands. The function `plt.figure()`
creates a space into which we will place the three plots. The parameter
`figsize` tells Python how big to make this space. Each subplot is
placed into the figure using the `add_subplot` method. The `add_subplot`
method takes 3 parameters: the first denotes the total number of rows
of subplots in the figure, the second is the total number of columns of
subplots in the figure, and the final parameter identifies the
position of the subplot in the grid. The axes of each subplot are
stored in a different variable (`axes1`, `axes2`, `axes3`). Once a
subplot is created, the axes can be labeled using the `set_xlabel()`
(or `set_ylabel()`) method. `plt.show()` is called after the entire
figure is set up.

In [None]:
import numpy as np 
import matplotlib.pyplot as plt

topo = np.loadtxt('../../data/topo.asc', delimiter=',')

fig = plt.figure(figsize=(16.0, 3.0))

axes1 = fig.add_subplot(1,3,1)
axes2 = fig.add_subplot(1,3,2)
axes3 = fig.add_subplot(1,3,3)

axes1.plot(topo[:,0])
axes1.set_ylim([2500,3900])
axes1.set_ylabel('Elevation (m)')
axes1.set_xlabel('<-- N   S -->')
axes1.set_title('West')

axes2.plot(topo[:,int(len(topo)/2)])
axes2.set_ylim([2500,3900])
axes2.set_xlabel('<-- N   S -->')
axes2.set_title('Mid')

axes3.plot(topo[:,-1])
axes3.set_ylim([2500,3900])
axes3.set_xlabel('<--N   S -->')
axes3.set_title('East')

plt.show()

## Subplots of DEMs (Takehome) 
 
Make a 4x2 grid of subplots that use the function `imshow` to display each quarter of the dataset (i.e., split down the middle in both x and y) in the left column. In the right column, plot the corresponding east-west profile through the center of each image (cf. Mid).

When plotting the DEMs (left column):
* Don't label axes or add a colorbar. It can be tricky to do this with subplots.
* To set the range of colors for one subplot, include the arguments `vmin` and `vmax` in `imshow` like this:


In [None]:
vmin = topo.min()
vmax = topo.max()
plt.imshow(topo, vmin=vmin, vmax=vmax) 

In [None]:
fig = plt.figure(figsize=(16.0, 3.0))

axes1 = fig.add_subplot(4,2,1)
axes2 = fig.add_subplot(4,2,2)
axes3 = fig.add_subplot(4,2,3)
axes4 = fig.add_subplot(4,2,4)
axes5 = fig.add_subplot(4,2,5)
axes6 = fig.add_subplot(4,2,6)
axes7 = fig.add_subplot(4,2,7)
axes8 = fig.add_subplot(4,2,8)

vmin = topo.min()
vmax = topo.max()

topo1 = topo[:int(topo.shape[0]/2),:int(topo.shape[1]/2)]
axes1.imshow(topo1, vmin=vmin, vmax=vmax) 
axes1.axes.get_xaxis().set_visible(False)
axes1.axes.get_yaxis().set_visible(False)
axes2.plot(topo1[int(topo1.shape[0]/2),:])
axes2.set_ylim([vmin,vmax])
axes2.set_xlabel('<-- West    East -->')
axes2.set_title('North-West')


topo2 = topo[int(topo.shape[0]/2):,:int(topo.shape[1]/2)]
axes3.imshow(topo2, vmin=vmin, vmax=vmax) 
axes3.axes.get_xaxis().set_visible(False)
axes3.axes.get_yaxis().set_visible(False)
axes4.plot(topo2[int(topo2.shape[0]/2),:])
axes4.set_ylim([vmin,vmax])
axes4.set_xlabel('<-- West    East -->')
axes4.set_title('South-West')

topo3 = topo[:int(topo.shape[0]/2),int(topo.shape[1]/2):]
axes5.imshow(topo3, vmin=vmin, vmax=vmax) 
axes5.axes.get_xaxis().set_visible(False)
axes5.axes.get_yaxis().set_visible(False)
axes6.plot(topo3[int(topo3.shape[0]/2),:])
axes6.set_ylim([vmin,vmax])
axes6.set_xlabel('<-- West    East -->')
axes6.set_title('North-East')

topo4 = topo[int(topo.shape[0]/2):,int(topo.shape[1]/2):]
axes7.imshow(topo4, vmin=vmin, vmax=vmax) 
axes7.axes.get_xaxis().set_visible(False)
axes7.axes.get_yaxis().set_visible(False)
axes8.plot(topo4[int(topo4.shape[0]/2),:])
axes8.set_ylim([vmin,vmax])
axes8.set_xlabel('<-- West    East -->')
axes8.set_title('South-East')

plt.show()

ax=plt.imshow(topo)
ax.axes.get_xaxis().set_visible(False)
ax.axes.get_yaxis().set_visible(False)