# Get Landsat Time Series (lsts) Sample Data

In this notebook we download the *Forest in Colorado, USA (P035-R032)* dataset from [Chris Holden's landsat_stack repository](https://github.com/ceholden/landsat_stack#landsat_stack) and save part of the data as single-layer TIFF files.

Specifically, we run through the following steps:

* We download and extract the data.
* We save the red, NIR and SWIR1 bands (bands 3, 4 and 5) and the Fmask band from the years 2008 to 2013 as single-layer TIFF files.
* We delete the stack.
* We create a dataframe with all *.tif* files.

The following is a list of all Landsat / Fmask bands we download.
The ones that are not struck out are the ones we keep.

* <del>Band 1 SR (SR * 10000)</del>
* <del>Band 2 SR (SR * 10000)</del>
* Band 3 SR (SR * 10000)
* Band 4 SR (SR * 10000)
* Band 5 SR (SR * 10000)
* <del>Band 7 SR (SR * 10000)</del>
* <del>Band 6 Thermal Brightness (C * 100)</del>
* Fmask
    * 0 - clear land
    * 1 - clear water
    * 2 - cloud
    * 3 - snow
    * 4 - shadow
    * 255 - NoData
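
As a minimal sketch of how the scale factor and the Fmask codes listed above might be used, the following pure-Python snippet converts stored integers back to reflectance and keeps only clear observations. The names `FMASK_CLASSES`, `SR_SCALE`, `is_clear` and `to_reflectance` and the sample values are hypothetical and not part of the notebook.

```python
# Fmask class codes as listed above.
FMASK_CLASSES = {
    0: "clear land",
    1: "clear water",
    2: "cloud",
    3: "snow",
    4: "shadow",
    255: "NoData",
}

SR_SCALE = 10000  # surface reflectance is stored as SR * 10000


def is_clear(fmask_value):
    """Return True for clear land or clear water observations."""
    return fmask_value in (0, 1)


def to_reflectance(stored_value):
    """Convert a stored integer value back to unitless reflectance."""
    return stored_value / SR_SCALE


# Hypothetical (stored value, Fmask code) pairs for one pixel over time.
observations = [(512, 0), (4873, 2), (498, 1), (0, 255)]
clear_reflectance = [to_reflectance(v) for v, m in observations if is_clear(m)]
print(clear_reflectance)
```

For real rasters the same logic would typically be applied to whole arrays rather than single values.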
    
The resulting data is included as a sample dataset under the name *lsts* (Landsat time series) in the ``eotools-dataset`` package.

First, download and extract the data.

In [1]:
!wget http://ftp-earth.bu.edu/public/ceholden/landsat_stacks/p035r032.tar.bz2
!tar xf p035r032.tar.bz2

--2019-02-03 13:51:17--  http://ftp-earth.bu.edu/public/ceholden/landsat_stacks/p035r032.tar.bz2
Resolving ftp-earth.bu.edu (ftp-earth.bu.edu)... 128.197.229.67
Connecting to ftp-earth.bu.edu (ftp-earth.bu.edu)|128.197.229.67|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 11963073 (11M) [application/x-bzip2]
Saving to: ‘p035r032.tar.bz2’


2019-02-03 13:51:46 (418 KB/s) - ‘p035r032.tar.bz2’ saved [11963073/11963073]
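
Where `wget` and `tar` are not available, the download and extraction step could be sketched with the Python standard library instead. This is an assumption-laden alternative, not what the notebook ran: the helper names `download` and `extract` are hypothetical, and the usage lines are commented out because they need network access.

```python
import tarfile
import urllib.request
from pathlib import Path

URL = "http://ftp-earth.bu.edu/public/ceholden/landsat_stacks/p035r032.tar.bz2"


def download(url, dst):
    """Download url to dst unless the file already exists."""
    dst = Path(dst)
    if not dst.exists():
        urllib.request.urlretrieve(url, str(dst))
    return dst


def extract(archive, dst_dir="."):
    """Extract a bzip2-compressed tar archive into dst_dir."""
    with tarfile.open(str(archive), mode="r:bz2") as tar:
        tar.extractall(path=str(dst_dir))


# Usage (requires network access):
# archive = download(URL, "p035r032.tar.bz2")
# extract(archive)
```

Skipping the download when the archive already exists makes it cheap to re-run the cell.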



Imports and the helper function.

In [3]:
import os
import pandas as pd
from pathlib import Path 
import shutil
import subprocess

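def save_as_single_layer_file(src_dir, overwrite=False, remove_stack=True):
    """Write the kept bands of the stack in src_dir as single-layer GeoTIFFs."""
    # NOTE: remove_stack is currently unused; the stacks are deleted further below.
    keep = [2, 3, 4, 7]  # zero-based positions of b3, b4, b5 and fmask
    band_names = ["b1", "b2", "b3", "b4", "b5", "b7", "b6", "fmask"]
    src = list(src_dir.glob("*gtif"))[0]
    # One output directory per scene, next to the "images" directory.
    dst_dir = src_dir.parent.parent / src_dir.stem
    dst_dir.mkdir(exist_ok=True)
    for bindex, bname in enumerate(band_names):
        if bindex not in keep:
            continue

        dst = dst_dir / f"{src_dir.stem.split('_')[0]}_{bname}.tif"
        if not dst.exists() or overwrite:
            # Fmask values fit into a byte; reflectance is stored as scaled Int16.
            ot = "Byte" if bname == "fmask" else "Int16"
            # gdal_translate band numbers are one-based, hence bindex + 1.
            subprocess.check_call(
                ["gdal_translate", "-ot", ot, "-b", str(bindex + 1),
                 "-co", "COMPRESS=DEFLATE", str(src), str(dst)])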
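
To make the band bookkeeping in the helper easier to follow, here is a small sketch that only builds the `gdal_translate` argument lists without running them. The function name `build_translate_commands` and the file names `stack.gtif` and `out` are hypothetical; the band order and the kept indices are taken from the helper above.

```python
from pathlib import Path

# Band order in the stack and zero-based positions of the layers we keep.
BAND_NAMES = ["b1", "b2", "b3", "b4", "b5", "b7", "b6", "fmask"]
KEEP = [2, 3, 4, 7]


def build_translate_commands(src, dst_dir, scene_id):
    """Return one gdal_translate argument list per kept band."""
    commands = []
    for bindex in KEEP:
        bname = BAND_NAMES[bindex]
        # Fmask fits into a byte; the reflectance bands are scaled Int16.
        ot = "Byte" if bname == "fmask" else "Int16"
        dst = Path(dst_dir) / f"{scene_id}_{bname}.tif"
        commands.append([
            "gdal_translate", "-ot", ot,
            "-b", str(bindex + 1),  # GDAL band numbers are one-based
            "-co", "COMPRESS=DEFLATE",
            str(src), str(dst),
        ])
    return commands


# Hypothetical source and destination names, for illustration only.
cmds = build_translate_commands("stack.gtif", "out", "LT50350322008110PAC01")
for cmd in cmds:
    print(" ".join(cmd))
```

Printing the commands first is a cheap way to check the band numbers and data types before any file is written.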


Save the selected bands and fmask of the selected years as single layer files. 

In [4]:
bdir_scenes_single = Path("./p035r032")
bdir_scenes = Path("./p035r032/images")
scene_dirs = list(bdir_scenes.glob("L*"))

counter = 0
for i, sdir in enumerate(scene_dirs):
    if int(sdir.stem[9:13]) < 2008:
        continue
    counter += 1
    print(f"{counter} / {len(scene_dirs)} - {sdir}")
    save_as_single_layer_file(src_dir=sdir, overwrite=False)

1 / 447 - p035r032/images/LE70350322009312EDC00
2 / 447 - p035r032/images/LE70350322010283EDC00
3 / 447 - p035r032/images/LE70350322011110EDC00
4 / 447 - p035r032/images/LT50350322011278PAC01
5 / 447 - p035r032/images/LE70350322008262EDC00
6 / 447 - p035r032/images/LT50350322009256PAC01
7 / 447 - p035r032/images/LT50350322009128PAC01
8 / 447 - p035r032/images/LT50350322009208PAC01
9 / 447 - p035r032/images/LT50350322011246PAC01
10 / 447 - p035r032/images/LE70350322008214EDC00
11 / 447 - p035r032/images/LE70350322009072EDC00
12 / 447 - p035r032/images/LE70350322012257EDC01
13 / 447 - p035r032/images/LE70350322012241EDC00
14 / 447 - p035r032/images/LE70350322009248EDC00
15 / 447 - p035r032/images/LE70350322008198EDC00
16 / 447 - p035r032/images/LT50350322010147PAC01
17 / 447 - p035r032/images/LT50350322008190PAC01
18 / 447 - p035r032/images/LE70350322012113EDC00
19 / 447 - p035r032/images/LE70350322011238EDC00
20 / 447 - p035r032/images/LE70350322011190EDC00
21 / 447 - p035r032/images/LE
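
The year filter above relies on the fixed layout of the scene identifiers. As a sketch, and assuming the identifiers consist of a three-character sensor code, the WRS path and row, and the acquisition date as year plus day of year, the parts could be extracted like this (`parse_scene_id` is a hypothetical helper, not part of the notebook):

```python
from datetime import datetime


def parse_scene_id(scene_id):
    """Split a scene ID such as LT50350322008110PAC01 into its parts."""
    return {
        "sensor": scene_id[:3],        # e.g. LT5 or LE7
        "path": int(scene_id[3:6]),    # WRS path
        "row": int(scene_id[6:9]),     # WRS row
        # Year and day of year, the same characters the loop above slices.
        "date": datetime.strptime(scene_id[9:16], "%Y%j").date(),
    }


info = parse_scene_id("LT50350322008110PAC01")
print(info["sensor"], info["path"], info["row"], info["date"])
```

Filtering on `info["date"].year` is equivalent to the `int(sdir.stem[9:13])` check used in the loop.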

Delete the *images* directory, which contains the complete downloaded data.

In [6]:
shutil.rmtree(bdir_scenes)

Let's derive the paths and some metadata from them and put the information in a dataframe.

In [8]:
layers_paths = list(Path(bdir_scenes_single).rglob("*.tif"))
layers_df = pd.Series([p.stem for p in layers_paths]).str.split("_", expand=True) \
    .rename({0: "sceneid", 1:"band"}, axis=1)
layers_df["date"] = pd.to_datetime(layers_df.sceneid.str[9:16], format="%Y%j")
layers_df["path"] = layers_paths
layers_df = layers_df.sort_values(["date", "band"])
layers_df = layers_df.reset_index(drop=True)
layers_df.head(10)

| | sceneid | band | date | path |
|---|---|---|---|---|
| 0 | LT50350322008110PAC01 | b3 | 2008-04-19 | p035r032/LT50350322008110PAC01/LT5035032200811... |
| 1 | LT50350322008110PAC01 | b4 | 2008-04-19 | p035r032/LT50350322008110PAC01/LT5035032200811... |
| 2 | LT50350322008110PAC01 | b5 | 2008-04-19 | p035r032/LT50350322008110PAC01/LT5035032200811... |
| 3 | LT50350322008110PAC01 | fmask | 2008-04-19 | p035r032/LT50350322008110PAC01/LT5035032200811... |
| 4 | LE70350322008118EDC00 | b3 | 2008-04-27 | p035r032/LE70350322008118EDC00/LE7035032200811... |
| 5 | LE70350322008118EDC00 | b4 | 2008-04-27 | p035r032/LE70350322008118EDC00/LE7035032200811... |
| 6 | LE70350322008118EDC00 | b5 | 2008-04-27 | p035r032/LE70350322008118EDC00/LE7035032200811... |
| 7 | LE70350322008118EDC00 | fmask | 2008-04-27 | p035r032/LE70350322008118EDC00/LE7035032200811... |
| 8 | LT50350322008126PAC01 | b3 | 2008-05-05 | p035r032/LT50350322008126PAC01/LT5035032200812... |
| 9 | LT50350322008126PAC01 | b4 | 2008-05-05 | p035r032/LT50350322008126PAC01/LT5035032200812... |


Reformat the data such that we can check if some of the scenes have missing bands.

In [9]:
counts_bands_per_sceneid = layers_df[["sceneid", "band", "path"]] \
    .pivot_table(index="sceneid", columns="band", aggfunc="count")
display(counts_bands_per_sceneid.head(2))
display(counts_bands_per_sceneid.tail(2))
counts_bands_per_sceneid.apply("sum", axis=0)

| sceneid | path / b3 | path / b4 | path / b5 | path / fmask |
|---|---|---|---|---|
| LE70350322008118EDC00 | 1 | 1 | 1 | 1 |
| LE70350322008150EDC00 | 1 | 1 | 1 | 1 |

| sceneid | path / b3 | path / b4 | path / b5 | path / fmask |
|---|---|---|---|---|
| LT50350322011278PAC01 | 1 | 1 | 1 | 1 |
| LT50350322011294PAC01 | 1 | 1 | 1 | 1 |


      band 
path  b3       105
      b4       105
      b5       105
      fmask    105
dtype: int64

This is not the case: every band is available for 105 scenes.
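
The pivot table shows counts, but it does not name scenes with gaps. As a rough cross-check without pandas, missing bands per scene could be listed as follows, assuming file stems of the form `<sceneid>_<band>` as produced above. `find_incomplete_scenes` and the sample stems are hypothetical.

```python
from collections import defaultdict

EXPECTED_BANDS = {"b3", "b4", "b5", "fmask"}


def find_incomplete_scenes(layer_stems):
    """Map each scene ID with missing layers to the sorted missing bands."""
    bands_per_scene = defaultdict(set)
    for stem in layer_stems:
        scene_id, band = stem.split("_", 1)
        bands_per_scene[scene_id].add(band)
    return {
        scene_id: sorted(EXPECTED_BANDS - bands)
        for scene_id, bands in bands_per_scene.items()
        if EXPECTED_BANDS - bands
    }


# Hypothetical stems: the second scene lacks b5 and fmask.
stems = [
    "LT50350322008110PAC01_b3", "LT50350322008110PAC01_b4",
    "LT50350322008110PAC01_b5", "LT50350322008110PAC01_fmask",
    "LE70350322008118EDC00_b3", "LE70350322008118EDC00_b4",
]
incomplete = find_incomplete_scenes(stems)
print(incomplete)
```

With the real data, `find_incomplete_scenes(p.stem for p in layers_paths)` should return an empty dictionary if every scene is complete.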

**The End**