
# FINN Preprocessor

## 1. User specified configurations

You specify three settings specific to your analysis.

1. `tag_af`: a string that identifies this active fire (AF) dataset throughout processing.

2. `af_fnames`: a list of paths to the shapefiles you downloaded from the FIRMS website.

The files you list in `af_fnames` are imported into the database schema "af_<i>tag_af</i>" and processed in the database.  
The final output files are "out_<i>tag_af</i>_*.csv" and "out_<i>tag_af</i>_*.shp".

3. `year_rst`: the year of the MODIS raster data to use for the analysis.

Other parameters such as `tag_lct`, `tag_vcf`, and `tag_regnum` (which identify the land cover, VCF, and region number datasets) are set automatically for the MODIS datasets.

The datasets are imported into the database schema "raster", with table names of the form "rst_<i>tag_lct</i>" (for example, "rst_modlct_2017").  

An overview raster (for example, "o_32_rst_modlct_2017") is created as well, because the full-resolution dataset is difficult to handle for QA.
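
The naming convention described above can be sketched as a small helper. This is only an illustration: the function name `db_names` and the dictionary keys are hypothetical, while the name patterns themselves follow the text and the later cells of this notebook.

```python
def db_names(tag_af, year_rst):
    """Return the database object names implied by the naming convention.

    Hypothetical helper for illustration; not part of the FINN code.
    """
    # MODIS tags are derived from the raster year
    tag_lct = 'modlct_%d' % year_rst
    tag_vcf = 'modvcf_%d' % year_rst
    return {
        'af_schema': 'af_%s' % tag_af,                  # schema holding the AF data
        'lct_table': 'raster.rst_%s' % tag_lct,         # land cover raster table
        'vcf_table': 'raster.rst_%s' % tag_vcf,         # VCF raster table
        'lct_overview': 'raster.o_32_rst_%s' % tag_lct, # coarse overview for QA
    }


names = db_names('testOTS_092018', 2017)
for key, value in names.items():
    print(key, '->', value)
```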


Below are example settings for the first cell.

Example 1.  Usage with a small local dataset.  It uses both the MODIS and VIIRS active fire files that come with the source code, in the "sample_datasets" directory.
```python
# tag to identify active fire dataset
tag_af = 'testOTS_092018'

# active fire shp file name and path
af_fnames = [
    '../sample_datasets/fire/testOTS_092018/fire_archive_M6_23960.shp',
    '../sample_datasets/fire/testOTS_092018/fire_archive_V1_23961.shp',
]

# MODIS raster datasets' year
year_rst = 2017
```

Example 2.  Another local fire example, using a near-real-time (NRT) dataset.

```python
# tag to identify active fire dataset
# USA (excl. ALK) 7 days fire downloaded 2019-01-13 
tag_af = 'testUSA_012019'

# shp file names
af_fnames = [
    '../sample_datasets/fire/testUSA_012019/MODIS_C6_USA_contiguous_and_Hawaii_7d.shp',
    '../sample_datasets/fire/testUSA_012019/VNP14IMGTDL_NRT_USA_contiguous_and_Hawaii_7d.shp',
]

# MODIS raster datasets' year
year_rst = 2017
```


Example 3.  Usage for a global 2016 run.  The AF file is provided by us and is downloaded from an AWS bucket that we host.
```python
# tag to identify active fire dataset
tag_af = 'mod_global_2016'

# shp file names
af_fnames = [
    '../sample_datasets/fire/global_2016/fire_archive_M6_28864.shp',
]

# MODIS raster datasets' year
year_rst = 2016
```


Example 4.  Template for your own data.
If you have downloaded AF shapefiles yourself, fill in the template below.  The AF files should reside somewhere under finn_preproc, i.e., the directory where you cloned (or downloaded) the source.  For example, the 'sample_datasets' directory comes with the source code and is used by the small local dataset example (Example 1).
```python
# tag to identify active fire dataset
tag_af = 'XXXXX'

# active fire shp file name and path
af_fnames = [
]

# MODIS raster datasets' year
year_rst = XXX

```
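
Before running the rest of the notebook, it can help to sanity-check the three settings. The sketch below is a hypothetical helper (`check_settings` is not part of the FINN code). The rule that `tag_af` contains only letters, digits, and underscores is an assumption made here because the tag becomes part of a database schema name; the actual code may accept more.

```python
import os
import re


def check_settings(tag_af, af_fnames, year_rst):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    # Assumption: keep the tag simple, since it ends up in "af_<tag_af>"
    if not re.fullmatch(r'[A-Za-z0-9_]+', str(tag_af)):
        problems.append('tag_af should contain only letters, digits and "_"')
    if not af_fnames:
        problems.append('af_fnames is empty')
    for fname in af_fnames:
        # the notebook expects paths to .shp files
        if os.path.splitext(fname)[1].lower() != '.shp':
            problems.append('not a .shp path: %s' % fname)
    if not isinstance(year_rst, int):
        problems.append('year_rst should be an integer year')
    return problems


print(check_settings('testOTS_092018',
                     ['../sample_datasets/fire/testOTS_092018/fire_archive_M6_23960.shp'],
                     2017))
```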

In [None]:
# tag to identify active fire dataset
tag_af = 'mod_global_2016'

# shp file names
af_fnames = [
    '../sample_datasets/fire/global_2016/fire_archive_M6_28864.shp',
]

# MODIS raster datasets' year
year_rst = 2016

The rest of the code should run without user editing.

## 2. Setting Environments

In [None]:
# python libraries
import sys
import os
import re
import glob
import datetime
import subprocess
import shlex
from urllib.parse import urlparse
from importlib import reload
from osgeo import gdal  # the bare `import gdal` form is deprecated in newer GDAL releases
import matplotlib.pylab as plt


# finn preproc codes
sys.path = sys.path + ['../code_anaconda']
import downloader
import af_import
import rst_import
import polygon_import
import run_step1
import run_step2
import export_shp
import plotter

## 3. Import AF dataset

### Test active fire data files exist

This particular sample AF dataset is provided by the FINN developers.  In other applications, it is the user's responsibility to provide the active fire shapefiles at the specified path/name.
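
When a listed shapefile is missing, the cell below guesses the name of the zip archive it may come from. A minimal sketch of that mapping as a pure function follows; `guess_zip_name` is a hypothetical name, and the patterns are the ones used in this notebook (`fire_archive_*.shp` from `DL_FIRE_*.zip`, and NRT files that share the base name of their zip).

```python
import os
import re

# archive downloads: fire_archive_<id>.shp lives in DL_FIRE_<id>.zip
RE_ARCHIVE = re.compile(r'fire_archive_(.*)\.shp$')
# NRT downloads: the zip has the same base name as the shapefile
RE_NRT = re.compile(r'(MODIS_C6|VNP14IMGTDL_NRT)_(.*)\.shp$')


def guess_zip_name(shp_path):
    """Return the expected zip path for a shapefile path, or None."""
    dirname, basename = os.path.split(shp_path)
    m = RE_ARCHIVE.match(basename)
    if m:
        return os.path.join(dirname, 'DL_FIRE_%s.zip' % m.group(1))
    if RE_NRT.match(basename):
        return os.path.join(dirname, basename[:-4] + '.zip')
    return None


print(guess_zip_name('fire/fire_archive_M6_23960.shp'))
print(guess_zip_name('fire/MODIS_C6_USA_contiguous_and_Hawaii_7d.shp'))
```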

In [None]:
# check input file exists
print('checking if input files exist:')
# raw strings with the dot escaped, so '.' matches only a literal dot
re_shp = re.compile(r'fire_archive_(.*)\.shp')
re_zip = re.compile(r'DL_FIRE_(.*)\.zip')

re_shp_nrt = re.compile(r'(MODIS_C6|VNP14IMGTDL_NRT)_(.*)\.shp')



for i,af_fname in enumerate(af_fnames):
    print("%s: " % af_fname, end='')
    
    pn,fn = os.path.split(af_fname)
    zname = None
    
    if os.path.exists(af_fname):
        print("exists.")
        # if .zip file, need to expand.
        if af_fname[-4:] == '.shp':
            # you are good
            print('OK')
        
        elif af_fname[-4:] == '.zip':
            # still need to unzip
            zname = af_fname
            # match against the file name only, not the full path
            m = re_zip.match(fn)
            if m:
                # m.group(1) is the archive id, e.g. 'M6_23960'
                arcname = m.group(1)
                sname = 'fire_archive_%s.shp' % arcname
            else:
                # cannot predict the name of the .shp file inside the zip
                raise RuntimeError('specify .shp file in af_fnames list!')
        else:
            raise RuntimeError('specify .shp file in af_fnames list!')
    else:
        print("doesn't exist.")
        
        if af_fname[-4:] == '.shp':
            # guess the zip file name
            
            pn,fn=os.path.split(af_fname)
            
            # see if it's the sample giant archive we provide 
            if fn == 'fire_archive_M6_28864.shp':
                zurl = 'https://s3-us-west-2.amazonaws.com/earthlab-finn/2016-global-DL_FIRE_M6_28864.zip'
                zn = '2016-global-DL_FIRE_M6_28864.zip'
                zname = os.path.join(pn, zn)
                sname = fn
                if not os.path.exists(zname):
                    print('downloading the sample AF file: %s' % zn)
                    subprocess.run(['wget', '-P', pn, zurl], check=True)
            else:

                # see if it's an archive of AF
                m = re_shp.match(fn)
                if m:
                    arcname = m.groups()[0]
                    zname = os.path.join(pn, 'DL_FIRE_%s.zip' % arcname)
                    sname = fn
                    print('  found zip: %s' % zname)
                else:
                    # see if it's NRT data
                    m = re_shp_nrt.match(fn)

                    if m:
                        # NRT downloads
                        zname = af_fname[:-4] + '.zip'
                        sname = fn
                        print('  found zip: %s' % zname)


                    else:
                        raise RuntimeError('cannot find file: %s' % af_fname)
        else:
            raise RuntimeError('cannot find file: %s' % af_fname)
    if zname:
        print('unzipping: %s' % zname)
        subprocess.run(['unzip', '-uo', zname, '-d', os.path.dirname(zname)],
                      check=True)
        assert os.path.exists(os.path.join(pn, sname))
        af_fnames[i] = os.path.join(pn, sname)
        print('OK: done')
        

### Import active fire data

Go ahead and import into database.

<b>Be careful!!</b> The code has no safeguard: it wipes the schema "af_<i>tag_af</i>" and starts over.  

A safer design is still being considered.
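
One possible safeguard, sketched below, is to ask the database whether the schema already exists before importing. This is not the author's design. The helper names `schema_check_command` and `schema_exists` are hypothetical, and the sketch assumes `psql` can connect with the same defaults used elsewhere in this notebook.

```python
def schema_check_command(tag_af):
    """Build a psql command that counts schemas named af_<tag_af>."""
    schema = 'af_%s' % tag_af
    # NOTE: tag_af is interpolated directly into the SQL, so this sketch
    # assumes a trusted, simple tag (letters, digits, underscores).
    sql = ("select count(*) from information_schema.schemata "
           "where schema_name = '%s';" % schema)
    # -t prints rows only and -A disables alignment, so stdout is just the count
    return ['psql', '-tA', '-c', sql]


def schema_exists(psql_stdout):
    """Interpret the output of the command above."""
    return int(psql_stdout.strip()) > 0


cmd = schema_check_command('testOTS_092018')
print(cmd)
```

The command could be run with `subprocess.run(cmd, stdout=subprocess.PIPE, check=True)`, and the import skipped (or confirmed with the user) when `schema_exists(p.stdout.decode())` is true.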

In [None]:
reload(af_import)

# TODO this is destructive and needs a safeguard!
# Tell the user the schema exists, list the table names and the number of rows in each,
# and ask the user to delete it manually (or similar) before proceeding.
af_import.main(tag_af, af_fnames)

print()
for i,fn in enumerate(af_fnames):
    print(fn)
    p = subprocess.run(['psql', '-c', 'select count(*) from "af_%s".af_in_%d;' % (tag_af, i+1)], stdout=subprocess.PIPE)
    print(p.stdout.decode())


## 4. Download raster datasets

### Settings for Land Surface Datasets (land cover, vegetation continuous field, region definitions)

In [None]:
# tag to identify datasets, automatically set to be modlct_YYYY, modvcf_YYYY
tag_lct = 'modlct_%d' % year_rst
tag_vcf = 'modvcf_%d' % year_rst

# tag for the region number polygon
tag_regnum = 'regnum'

# definition of variables in the raster files
rasters = [
        {
            'tag': tag_lct,
            'kind': 'thematic',
            'variable': 'lct'
        },
        {
            'tag': tag_vcf,
            'kind': 'continuous',
            'variables': ['tree', 'herb', 'bare'],
        },
        {
            'tag': tag_regnum,
            'kind': 'polygons',
            'variable_in': 'region_num',
            'variable': 'regnum',
        },
]

Check whether the extent of the raster dataset in the database encloses all fires.
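
The idea of the check can be illustrated in plain Python with bounding boxes. The real check (`af_import.check_raster_contains_fire`) runs in the database and its implementation is not shown here, so this is only a sketch. The function name `count_contained` and the key `n_contained` are hypothetical; `n_not_contained` is the key used in the cell below.

```python
def count_contained(points, extents):
    """Count fire points that fall inside at least one raster extent.

    points  : list of (lon, lat)
    extents : list of (xmin, ymin, xmax, ymax) bounding boxes
    """
    n_in = 0
    for lon, lat in points:
        if any(xmin <= lon <= xmax and ymin <= lat <= ymax
               for xmin, ymin, xmax, ymax in extents):
            n_in += 1
    return {'n_contained': n_in, 'n_not_contained': len(points) - n_in}


fires = [(-105.0, 40.0), (10.0, 10.0)]
tiles = [(-110.0, 30.0, -100.0, 50.0)]
cnts = count_contained(fires, tiles)
print(cnts)
```

If `n_not_contained` is greater than zero, some fires lie outside the imported raster, which is the condition that triggers the raster download below.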

In [None]:
# confirm that raster data covers extent of AF data
reload(af_import)
dct = {}
for i,fn in enumerate(af_fnames):
    for tag_rst in (tag_lct, tag_vcf):
        if len(af_fnames) == 1:
            cnts = af_import.check_raster_contains_fire(
                '"raster"."skel_rst_%s"' % tag_rst, 
                '"af_%s"."af_in"' % (tag_af)
            )

        else:
            cnts = af_import.check_raster_contains_fire(
                '"raster"."skel_rst_%s"' % tag_rst, 
                '"af_%s"."af_in_%d"' % (tag_af, i+1)
            )
        print(os.path.basename(fn), tag_rst, cnts)
        dct[(fn,tag_rst)] = cnts

        
# **TODO** In some cases a "fire" is detected over the ocean, where there is no raster coverage.
# Need to check whether that is the case for the 'n_not_contained' fires.

need_to_import_raster = False
if any(_['n_not_contained'] > 0 for _ in dct.values()):
    print('Some fires are not contained in the raster')
    print('Will download/import raster dataset')
    need_to_import_raster = True


Raster file URLs and directories for saving the data
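
The VCF URL contains a calendar date, while the VCF file names use day-of-year 065. Day 65 falls on March 6 in a common year and on March 5 in a leap year, which is why the cell below switches the day. A small sketch of that arithmetic follows; the function names `vcf_date` and `vcf_url` are hypothetical.

```python
import datetime


def vcf_date(year):
    """Calendar date of day-of-year 65 for the given year."""
    # January 1 is day 1, so day 65 is 64 days later
    return datetime.date(year, 1, 1) + datetime.timedelta(days=64)


def vcf_url(year):
    """Directory URL following the MOD44B.006 pattern used in this notebook."""
    d = vcf_date(year)
    return 'https://e4ftl01.cr.usgs.gov/MOLT/MOD44B.006/%d.%02d.%02d/' % (
        d.year, d.month, d.day)


print(vcf_date(2017), vcf_date(2016))
print(vcf_url(2016))
```

Deriving the date this way avoids a hand-written leap-year test. The `year % 4 == 0` test is correct for the years 1901 to 2099, which covers the MODIS record.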

In [None]:
if need_to_import_raster:
    # all raster downloads are stored in following dir
    download_rootdir = '../downloads'

In [None]:
if need_to_import_raster:
    # earthdata's URL for landcover and VCF
    is_leap = (year_rst % 4 == 0)
    url_lct = 'https://e4ftl01.cr.usgs.gov/MOTA/MCD12Q1.006/%d.01.01/' % year_rst
    url_vcf = 'https://e4ftl01.cr.usgs.gov/MOLT/MOD44B.006/%d.03.%02d/' % (year_rst, 5 if is_leap else 6)

    ddir_lct = download_rootdir +'/'+ ''.join(urlparse(url_lct)[1:3])
    ddir_vcf = download_rootdir +'/'+ ''.join(urlparse(url_vcf)[1:3])

    print('LCT downloads go to %s' % ddir_lct)
    print('VCF downloads go to %s' % ddir_vcf)

Download land cover type raster

In [None]:
if need_to_import_raster:
    reload(downloader)
    downloader.download_only_needed(url = url_lct, droot = download_rootdir, pnts=af_fnames[0])

Verify the checksums of the LCT files.  If a file is corrupted, it is downloaded again.
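
The general idea of checksum verification is sketched below. The checksum type and the source of the expected values used by `downloader.purge_corrupted` are not shown in this notebook. The sketch assumes an MD5 digest purely for illustration, and the helper names `md5_of` and `is_corrupted` are hypothetical.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        # read fixed-size chunks so large HDF files do not fill memory
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def is_corrupted(path, expected_md5):
    """True if the file's digest differs from the expected one."""
    return md5_of(path) != expected_md5.lower()
```

A file flagged by `is_corrupted` would be deleted and downloaded again.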

In [None]:
if need_to_import_raster:
    downloader.purge_corrupted(ddir = ddir_lct, url=url_lct)

Do the same for the vegetation continuous field (VCF) data.

In [None]:
if need_to_import_raster:
    downloader.download_only_needed(url = url_vcf, droot = download_rootdir, pnts=af_fnames[0])

In [None]:
if need_to_import_raster:
    downloader.purge_corrupted(ddir_vcf, url=url_vcf)

## 5. Import raster datasets

Downloaded files need preprocessing: extracting only the raster band that is needed and converting the coordinate system to WGS84.  Intermediate files are created in the following directories.

In [None]:
workdir_lct = '../proc_rst_%s' % tag_lct
workdir_vcf = '../proc_rst_%s' % tag_vcf
workdir_regnum = '../proc_rst_%s' % tag_regnum

print('LCT preprocessing occurs in %s' % workdir_lct)
print('VCF preprocessing occurs in %s' % workdir_vcf)
print('RegNum preprocessing occurs in %s' % workdir_regnum)

### Import land cover type

First, grab the HDF file names from the download directory.
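
The glob pattern in the cell below relies on the MODIS tile file naming: product, acquisition year and day of year, tile indices (`hHHvVV`), collection, and a production timestamp. A sketch of parsing such a name follows. The helper `parse_modis_name` is hypothetical, and the timestamp in the sample file name is made up for illustration.

```python
import os
import re

RE_MODIS = re.compile(
    r'^(?P<product>[A-Z0-9]+)\.A(?P<year>\d{4})(?P<doy>\d{3})'
    r'\.h(?P<h>\d{2})v(?P<v>\d{2})\.(?P<collection>\d{3})'
    r'\.(?P<production>\d+)\.hdf$')


def parse_modis_name(path):
    """Split a MODIS tile file name into its parts, or return None."""
    m = RE_MODIS.match(os.path.basename(path))
    if m is None:
        return None
    d = m.groupdict()
    return {
        'product': d['product'],
        'year': int(d['year']),
        'doy': int(d['doy']),               # day of year of acquisition
        'tile': (int(d['h']), int(d['v'])), # horizontal, vertical tile index
        'collection': d['collection'],
    }


# hypothetical file name; the production timestamp is invented
info = parse_modis_name('MCD12Q1.A2017001.h08v05.006.2019196134521.hdf')
print(info)
```

Listing the parsed `tile` values is a quick way to confirm that the downloaded tiles cover the fire locations.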

In [None]:
if need_to_import_raster:
    search_string = "%(ddir_lct)s/MCD12Q1.A%(year_rst)s001.h??v??.006.*.hdf" % dict(
        ddir_lct = ddir_lct, year_rst=year_rst)
    fnames_lct = sorted(glob.glob(search_string))
    print('found %d hdf files' % len(fnames_lct) )
    if len(fnames_lct) == 0:
        raise RuntimeError("check that the downloads were successful and that the search string is correct: %s" % search_string)

The next command performs three tasks: "merge", "resample", and "import".  The first two tasks create intermediate GeoTIFF files in the working directory (<i>workdir</i>).  The last task actually imports the data into the database's <i>raster</i> schema.
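
To give a rough idea of what such a three-task pipeline can look like, the sketch below builds command lines for the standard GDAL and PostGIS tools. It is not the implementation of `rst_import.main`, which may work differently. The function name, the intermediate file names, the tile size, and the choice of nearest-neighbor resampling are all assumptions. The inputs are assumed to be single-band rasters already extracted from the HDF files.

```python
def pipeline_commands(tag, fnames, workdir):
    """Build (but do not run) commands for merge, resample and import."""
    vrt = '%s/%s_merged.vrt' % (workdir, tag)  # hypothetical intermediate names
    tif = '%s/%s_wgs84.tif' % (workdir, tag)
    # merge: mosaic the tiles into one virtual raster
    merge = ['gdalbuildvrt', vrt] + list(fnames)
    # resample: reproject to WGS84 (EPSG:4326); nearest neighbor keeps class codes intact
    resample = ['gdalwarp', '-t_srs', 'EPSG:4326', '-r', 'near', vrt, tif]
    # import: raster2pgsql writes SQL to stdout, which would be piped to psql
    load = ['raster2pgsql', '-s', '4326', '-t', '100x100', '-I', '-C',
            tif, 'raster.rst_%s' % tag]
    return [merge, resample, load]


cmds = pipeline_commands('modlct_2017', ['a.tif', 'b.tif'], '../proc_rst_modlct_2017')
for c in cmds:
    print(' '.join(c))
```

Nearest-neighbor resampling suits thematic data such as land cover. For continuous fields, another resampling method may be preferable.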

In [None]:
if need_to_import_raster:
    reload(rst_import)

    rst_import.main(tag_lct, fnames=fnames_lct, workdir = workdir_lct)

At this point you should be able to see the raster in the database using QGIS.  
The cell below also attempts a quick QA check by drawing a simple image, but use of a GIS tool is encouraged.

In [None]:
%matplotlib inline
import plotter
reload(plotter)
try:
    plotter.plot('raster.o_32_rst_%s' % tag_lct, '../code_anaconda/modlct.clr')
except Exception as e:
    print("Got this error: " + str(e))
    print("Didn't work, use QGIS!")
    pass

### Import vegetation continuous fields

Analogous steps are repeated for the vegetation continuous fields.

In [None]:
if need_to_import_raster:
    # grab hdf file names
    search_string = "%(ddir_vcf)s/MOD44B.A%(year)s065.h??v??.006.*.hdf" % dict(
            ddir_vcf = ddir_vcf, year=year_rst)
    fnames_vcf = sorted(glob.glob(search_string))
    print('found %d hdf files' % len(fnames_vcf) )
    if len(fnames_vcf) == 0:
        raise RuntimeError("check that the downloads were successful and that the search string is correct: %s" % search_string)

In [None]:
if need_to_import_raster:
    reload(rst_import)
    rst_import.main(tag_vcf, fnames=fnames_vcf, workdir = workdir_vcf)

In [None]:
%matplotlib inline
import plotter
reload(plotter)
try:
    plotter.plot('raster.o_32_rst_%s' % tag_vcf)
except Exception as e:
    print("Got this error: " + str(e))
    print("Didn't work, use QGIS!")
    pass

### Import countries of the world shapefile

This is actually not raster data but vector data of polygons.  Since it serves a conceptually similar function to a raster (it specifies an attribute for a given geographic location), I treat it as if it were a raster dataset.  
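
The "attribute for a given location" idea can be illustrated with a point-in-polygon lookup. In the notebook this lookup happens in the database. The sketch below uses a plain ray-casting test on simple rings, with made-up square regions, and the function names are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: is (x, y) inside the ring of (x, y) vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def lookup_region(x, y, regions):
    """Return the region number of the first polygon containing the point."""
    for region_num, ring in regions:
        if point_in_polygon(x, y, ring):
            return region_num
    return None


# two made-up square "regions" for illustration
regions = [
    (1, [(0, 0), (10, 0), (10, 10), (0, 10)]),
    (2, [(10, 0), (20, 0), (20, 10), (10, 10)]),
]
print(lookup_region(5, 5, regions), lookup_region(15, 5, regions))
```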

In [None]:
if not os.path.exists(os.path.join(workdir_regnum, 'All_Countries.shp')):
    subprocess.run(['wget', '-P', workdir_regnum, 
                    'https://s3-us-west-2.amazonaws.com/earthlab-finn/All_Countries.zip'], 
                   check=True)
    subprocess.run(['unzip', os.path.join(workdir_regnum, 'All_Countries.zip'), '-d' , workdir_regnum ], 
                  check=True)

In [None]:
reload(polygon_import)
polygon_import.main('regnum', shpname = os.path.join(workdir_regnum, 'All_Countries.shp'))

## 6. Process active fire data

### Running "step 1" grouping points

In [None]:
reload(run_step1)

if 'firstday' not in dir(): firstday = None
if 'lastday' not in dir(): lastday = None
run_step1.main(tag_af, firstday=firstday, lastday=lastday, ver='v7m', run_prep = True, run_work=True)

### Running "step 2" intersection with raster datasets

In [None]:
reload(run_step2)

assert run_step2.ver == 'v8b'
run_step2.main(tag_af, rasters, firstday=firstday, lastday=lastday)

## 7. Export the output

The default output directory is this directory (where this Jupyter Notebook file resides), and the output file has a long name that includes the tag of each dataset.

In [None]:
outdir = '.'
shpname = 'out_{0}_{1}_{2}_{3}.shp'.format(tag_af, tag_lct, tag_vcf, tag_regnum)

In [None]:
schema = 'af_' + tag_af
tblname = 'out_{0}_{1}_{2}'.format(tag_lct, tag_vcf, tag_regnum)
flds = ('v_lct', 'f_lct', 'v_tree', 'v_herb', 'v_bare', 'v_regnum')

**[TODO]** Tell the user where these files went in the file system.

In [None]:
reload(export_shp)
export_shp.main(outdir, schema, tblname, flds, shpname)
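
One simple way to see where the output went is to list the files matching the "out_<i>tag_af</i>_*" pattern in the output directory. The helper below is a hypothetical sketch, not part of the FINN code.

```python
import glob
import os


def list_outputs(outdir, tag_af):
    """Return absolute paths of files named out_<tag_af>_* in outdir."""
    pattern = os.path.join(outdir, 'out_%s_*' % tag_af)
    return sorted(os.path.abspath(p) for p in glob.glob(pattern))


# prints nothing if no output files exist yet
for path in list_outputs('.', 'testOTS_092018'):
    print(path)
```

A shapefile consists of several files with the same base name (for example .shp, .shx, .dbf), so more than one path per output is expected.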