# Notes on file types in lsdtopotools

Last updated by Simon M Mudd 12/05/2023

This notebook provides some information about file types in lsdtopotools. 

## If you are on colab

**If you are in the `docker_lsdtt_pytools` docker container, you do not need to do any of this. 
The following is for executing this code in the google colab environment only.**

If you are in the docker container you can skip to the **First get data** section. 

First we install `lsdviztools`. This will take around a minute. It is important you do this before the `condacolab` step. 

In [None]:
!pip install lsdviztools &> /dev/null

Now we need to install lsdtopotools. We do this using something called `mamba`. To get `mamba` we install something called `condacolab`. 

In [None]:
!pip install -q condacolab
import condacolab
condacolab.install()

Now use mamba to install `lsdtopotools`. 

In [None]:
!mamba install -y lsdtopotools &> /dev/null

## One extra bit for colab

When you install `lsdviztools` on colab you get the python `gdal` bindings (for doing geospatial stuff) but not the command line tools. You also need to install the command line tools for this exercise. 

In the `lsdtopotools` docker image these are already installed. 

It is faster to install `lsdtopotools` and `gdal` in one mamba call but I wanted to highlight use of gdal for this notebook. 

In [None]:
!mamba install -y gdal &> /dev/null
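If you want to confirm that the command line tools are actually available before going further, a minimal sketch like the one below can help. It only uses the python standard library; the function name `find_gdal_tools` is made up for this example, and the two tool names are simply the ones used later in this notebook.

```python
import shutil


def find_gdal_tools(tools=("gdalinfo", "gdalwarp")):
    """Return a dict mapping each tool name to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


# Report which of the command line tools can be found
for tool, path in find_gdal_tools().items():
    print(f"{tool}: {path if path else 'not found on PATH'}")
```

If either tool is reported as not found, rerun the install cell above before continuing.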

## Topographic data

`lsdtopotools`, for various historical reasons, reads and writes rasters in the ENVI bil format. These files have the data in binary format plus a plain-text (ASCII) header file that any human can read. 

When you download topographic data, however, it might come in all kinds of formats. How do we deal with different formats? 
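To give a feel for what the header file contains, here is a rough sketch that parses header text into a python dictionary. The header below is invented for illustration (the numbers are not from a real DEM), and `parse_envi_header` is a hypothetical helper, not part of `lsdtopotools`. It assumes the usual `key = value` layout of an ENVI `.hdr` file, where values in curly braces may run over several lines.

```python
def parse_envi_header(text):
    """Parse ENVI header text into a dict of string values."""
    fields = {}
    pending_key = None   # key of a {...} value that continues on later lines
    pending_value = ""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if pending_key is not None:
            # We are inside a multi-line {...} value
            pending_value += " " + line
            if "}" in line:
                fields[pending_key] = pending_value
                pending_key = None
            continue
        if "=" not in line:
            continue  # skips the leading "ENVI" line and blank lines
        key, value = (part.strip() for part in line.split("=", 1))
        key = key.lower()
        if value.startswith("{") and "}" not in value:
            pending_key, pending_value = key, value
        else:
            fields[key] = value
    return fields


# An invented header, for illustration only
sample_header = """
ENVI
description = {
my_DEM.bil}
samples = 120
lines = 80
bands = 1
header offset = 0
file type = ENVI Standard
data type = 4
interleave = bsq
byte order = 0
map info = {UTM, 1, 1, 350000, 4060000, 30, 30, 11, North, WGS-84}
"""

header = parse_envi_header(sample_header)
map_info = [item.strip() for item in header["map info"].strip("{}").split(",")]
print("columns:", header["samples"], "rows:", header["lines"])
print("UTM zone:", map_info[7], map_info[8])
```

The `map info` entry is where the UTM zone and hemisphere live, which becomes relevant when the data are reprojected later in this notebook.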

### Dealing with zipped and tarred files from opentopography

Below is an example that you might come across. 

*OpenTopography* is an outstanding community website that hosts topographic data. In this example you should go to their portal for global data: https://portal.opentopography.org/dataCatalog?group=global and then select a small region to download. The filenames here reflect downloading a Copernicus 30 DEM. 

1. Download a small area from OpenTopography. 
2. Copy the file (`rasters_COP30.tar.gz`) to this workspace (if you are in Google Colab you can drag and drop the file). 
3. We can see if the file is in the workspace by calling the Linux command `ls` using the `!` character, which tells this python notebook to send a command to the underlying Linux operating system:

In [None]:
!ls

If the file `rasters_COP30.tar.gz` is not there double check you have downloaded it and copied it to this workspace. 

Okay, this file is actually a "tarred" and "gzipped" file. There is a handy Linux tool to unzip and untar it all in one go: `!tar xzf your_filename.tar.gz`. 

If you are used to Windows unzip functions you will be happy to learn that this is much, much faster. 

In [None]:
!tar xzf rasters_COP30.tar.gz
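If you would rather stay in python (for example on a system without the `tar` command), the standard library `tarfile` module can do the same job. This is only a sketch: `extract_tar_gz` is a name made up for this example, and it should only be used on archives from a source you trust.

```python
import tarfile
from pathlib import Path


def extract_tar_gz(archive, destination="."):
    """Extract a .tar.gz archive into destination and return the member names."""
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(path=destination)
    return names


# Only attempt the extraction if the download is actually in the workspace
archive = Path("rasters_COP30.tar.gz")
if archive.exists():
    print(extract_tar_gz(archive))
else:
    print(f"{archive} not found in {Path.cwd()}")
```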

Now let's see what you got! There should now be a file called `output_COP30.tif`. 

In [None]:
!ls

Okay, so you should now recognise the `tif` file format, which is very common in geospatial data. 



### Converting and projecting raster data

You have a `.tif` DEM. Now what? `lsdtopotools` wants ENVI bil format. It also wants rasters in the UTM coordinate system. It turns out we can modify both of these things at once with the handy package `gdal`.

Note that there are pythonic ways to do this, but if you get the `gdal` command line tools, you can do it all in fewer steps. 

We can use something called `gdalinfo` to look at the file:

In [None]:
!gdalinfo output_COP30.tif

Okay, this prints a bunch of information, including the longitude and latitude of the origin, which looks like `Origin = (longitude,latitude)`. 

In my specific download the origin looks like this:
`Origin = (-118.708888888888907,36.655972222222218)`

This is a **geographic** coordinate system (a system designed for a sphere), but we want lengths and areas in topographic analysis so we need to switch to a **projected** coordinate system (where we project the sphere onto a flat surface). 

`lsdtopotools` **always** uses the UTM coordinate system, which is made up of zones. So we need to find out what zone we are in. We can use the python package `utm` for this. 

However, we need to **reverse the order of the coordinates from the origin, because in the origin they are in longitude, latitude, and `utm` takes them in latitude, longitude.**
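If you do not want to copy the origin by hand, the numbers can be pulled out of the `gdalinfo` text with a regular expression. This is a minimal sketch that assumes the `Origin = (x,y)` layout shown above; `parse_origin` is a hypothetical helper, and the sample line is the origin from my download.

```python
import re


def parse_origin(gdalinfo_text):
    """Return (longitude, latitude) from the 'Origin = (x,y)' line of gdalinfo output."""
    match = re.search(
        r"Origin\s*=\s*\(\s*([-+\d.eE]+)\s*,\s*([-+\d.eE]+)\s*\)", gdalinfo_text
    )
    if match is None:
        raise ValueError("No 'Origin = (x,y)' line found")
    # gdalinfo lists x (longitude) first, then y (latitude)
    return float(match.group(1)), float(match.group(2))


sample_line = "Origin = (-118.708888888888907,36.655972222222218)"
longitude, latitude = parse_origin(sample_line)
print("longitude:", longitude, "latitude:", latitude)
```

In a notebook you could capture the real output with `info = !gdalinfo output_COP30.tif` and pass `"\n".join(info)` to the function.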

In [None]:
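# Plug in your origin here (latitude first, then longitude)
import utm

# from_latlon returns (easting, northing, zone_number, zone_letter)
result = utm.from_latlon(36.655972222222218, -118.708888888888907)
UTM_zone = result[2]
print(UTM_zone)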
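For intuition, the zone number comes from slicing the globe into 60 strips that are each 6 degrees of longitude wide, starting at 180 degrees west. The sketch below reproduces that arithmetic without the `utm` package. It is a simplification: it ignores the special-case zones around Norway and Svalbard that the `utm` package handles, and both function names are made up for this example.

```python
import math


def utm_zone_from_longitude(longitude):
    """Standard 6-degree UTM zone number (1 to 60), ignoring the Norway and Svalbard exceptions."""
    return int(math.floor((longitude + 180.0) / 6.0)) % 60 + 1


def hemisphere(latitude):
    """Return 'north' or 'south' depending on the sign of the latitude."""
    return "north" if latitude >= 0 else "south"


# The origin from my download: latitude, then longitude
latitude, longitude = 36.655972222222218, -118.708888888888907
print(utm_zone_from_longitude(longitude), hemisphere(latitude))
```

For this origin the arithmetic is (-118.71 + 180) / 6 = 10.2, which rounds down to 10, and adding 1 gives zone 11, the same zone as above.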

Okay, in my data this is in zone 11, but you will probably have a different zone. We now recast the data in ENVI bil format in UTM zone 11 using gdal (swap in your own zone number):

In [None]:
!gdalwarp -t_srs '+proj=utm +zone=11 +datum=WGS84' -of ENVI output_COP30.tif my_DEM.bil
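If your zone differs, or your data are in the southern hemisphere, the command needs adjusting. The sketch below builds the same `gdalwarp` call from a zone number, adding the PROJ `+south` flag for southern hemisphere data. `build_gdalwarp_command` is a hypothetical helper written for this example; the filenames are the ones used above.

```python
import shlex


def build_gdalwarp_command(src, dst, zone, southern=False):
    """Build a gdalwarp call that reprojects src to UTM and writes dst in ENVI format."""
    srs = f"+proj=utm +zone={zone} +datum=WGS84"
    if southern:
        srs += " +south"  # southern hemisphere UTM zones need this flag
    return ["gdalwarp", "-t_srs", srs, "-of", "ENVI", src, dst]


cmd = build_gdalwarp_command("output_COP30.tif", "my_DEM.bil", 11)
print(shlex.join(cmd))

# To actually run it from python rather than with "!":
# import subprocess
# subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than one long string means the projection string does not need extra quoting when it is passed to `subprocess.run`.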

Okay, you now should have a DEM in both the correct format and projection for `lsdtopotools`. 

In a number of examples in the `lsdtopotools` notebooks, this process is automated by the opentopography scraper tool, but I wanted to let you see what goes on under the hood of `lsdtopotools` so you can see what sort of file manipulation you need from raw data. 