Start-MAJA is a basic orchestrator that processes whole time series of Sentinel-2 L1C products for a given tile using the MAJA L2A processor, which detects clouds and shadows and performs atmospheric correction. Start-MAJA works on Linux systems.
Latest commit b0aa0f7 Feb 10, 2019

Content

  1. Introduction
  2. System
  3. Change log
  4. MAJA output format
  5. Get and Install MAJA
  6. Use start_maja
  7. Example workflow
  8. Questions

Introduction

The following script will help you run the MAJA L2A processor on your computer, for Sentinel-2 data only so far. You can also run MAJA on the CNES PEPS collaborative ground segment using the maja-peps script, also available on GitHub. Using PEPS is much easier, but it is not meant for mass processing.

MAJA stands for Maccs-Atcor Joint Algorithm. This atmospheric correction and cloud screening software is based on the MACCS processor, developed for CNES by the company CS-SI from a method and a prototype developed at CESBIO (references 1, 2 and 3 below). In 2017, thanks to an agreement between CNES and DLR and to some funding from ESA, we started adding methods from DLR's atmospheric correction software ATCOR into MACCS. MACCS then became MAJA.

  • The first version resulting from this collaboration was MAJA V1-0. If you are using this version, you will also need the version v1_0 of start_maja.

  • A second version of MAJA, v2-1, was used in Theia but was not distributed to users, because version 3 became available shortly afterwards.

  • This version of start_maja.py is made to run MAJA 3.2.

MAJA has a feature that is unique among atmospheric correction processors: it uses multi-temporal criteria to improve cloud detection and aerosol retrieval. Because of this feature, it is important to use MAJA to process time series of images rather than single images. Moreover, these images have to be processed chronologically. To initialise the processing of a time series, a special mode named "backward mode" is used. To get a correct first product, we in fact process a small number of products in reverse chronological order (the default number of images processed in backward mode is 8, but consider increasing it if your region is very cloudy). Then all the products are processed in "nominal" mode, in chronological order. When a product is fully or nearly fully cloudy, it is not issued, to save processing time and disk space.
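
The processing order described above can be sketched as follows. This is only an illustration, not the code of start_maja.py: the function name plan_time_series, the layout of each step, and the choice to count the first date among the products used in backward mode are assumptions made for the example.

```python
def plan_time_series(dates, nb_backward=8):
    """Return the processing steps for a time series of acquisition dates.

    Each step is a tuple (mode, target_date, input_dates). Dates are
    yyyymmdd strings, so sorting them as text sorts them chronologically.
    NOTE: hypothetical helper, not part of start_maja.py.
    """
    ordered = sorted(dates)
    if not ordered:
        return []
    # Backward mode: the first products of the series, taken in reverse
    # chronological order, are used to produce the first L2A product.
    # Assumption: the first date itself is counted among the nb_backward products.
    backward_inputs = ordered[:nb_backward][::-1]
    steps = [("backward", ordered[0], backward_inputs)]
    # Nominal mode: every later date, one at a time, in chronological order.
    steps += [("nominal", d, [d]) for d in ordered[1:]]
    return steps


if __name__ == "__main__":
    for step in plan_time_series(["20170121", "20170101", "20170111"], nb_backward=2):
        print(step)
```

In a very cloudy region, the same sketch would simply be called with a larger nb_backward.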

For an overview of MAJA's methods without the details, please read: http://www.cesbio.ups-tlse.fr/multitemp/?p=6203. For all the details on the methods, MAJA's ATBD is available here: http://tully.ups-tlse.fr/olivier/maja_atbd/blob/master/atbd_maja.pdf, or as reference 4 below.

MAJA needs parameters, which ESA calls GIPP. We have also set up an internal repository containing parameters for all the sensors currently processed by MAJA, including Sentinel-2, Venµs and LANDSAT 8. This repository is kept up to date with the operational processors. See also the parameters section below.

System

MAJA works on Linux platforms. We have tested it on Linux RedHat 6+, CentOS 6+ and Ubuntu 12+. It requires at least 8 GB of memory per instance of MAJA running in parallel. It also requires disk space (1 GB per input L1C, 2 GB per output L2A), and can use several threads in parallel. The number of threads is set in the userconf files, and the default value is 8. Above 8 threads, performance no longer improves linearly with the number of threads.
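
As a rough sizing aid, the figures above can be turned into a small calculation. This is a back-of-the-envelope sketch: the function name estimate_resources is made up for the example, temporary working files are not counted, and the disk figure is an upper bound because fully cloudy products are not issued.

```python
L1C_GB = 1               # disk space per input L1C product (figure quoted above)
L2A_GB = 2               # disk space per output L2A product (figure quoted above)
MEM_GB_PER_INSTANCE = 8  # minimum memory per MAJA instance running in parallel


def estimate_resources(n_products, n_instances=1):
    """Estimate disk and memory needs for a time series.

    NOTE: hypothetical helper; it ignores temporary files and assumes
    every L1C product yields an L2A product.
    """
    return {
        "disk_gb": n_products * (L1C_GB + L2A_GB),
        "memory_gb": n_instances * MEM_GB_PER_INSTANCE,
    }


if __name__ == "__main__":
    # Example: 36 acquisitions of one tile, two MAJA instances in parallel.
    print(estimate_resources(36, n_instances=2))
```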

On our two-year-old computer, with 8 GB of memory and 8 threads, it takes 22 minutes to make an L2A product, except for the initialisation in "backward mode" (the first product in a time series), which takes about 1 hour.

Change Log

V3.2 (2019/02/01)

We moved start-maja to a new repository, which belongs to CNES rather than to Olivier Hagolle's personal GitHub account. This was also an opportunity to clean the repository, as the initial one contained binary parameters, had grown a lot and took a long time to download. So we started from scratch.

The older repository is still accessible at https://github.com/olivierhagolle/Start_maja, but will not be updated anymore.

Several improvements were made:

  • in the command line interface
  • to adapt it to the CNES HPC context (optional of course)
  • to account for MAJA V3.2 and work with CAMS data
  • to simplify DTM preparation (thanks to Peter Kettig's contribution)
  • we dropped this stupid (OH's) idea of removing the "GIPP_" characters to form the context name

MAJA V3.2 brings a couple of improvements compared to V3.1:

  • MAJA 3.2 works around a bug in Sentinel-2 L1C products, which since October 2018 have quite frequently provided the detector footprints in an incorrect order.
  • The CAMS data can also be used as a default value for AOT estimates. The default CAMS AOT is used with a low weight in the cost function. If MAJA does not find many suitable pixels to estimate the AOT, the CAMS value will have an influence, but in general a large number of measurements are available in an image, and in that case CAMS has no influence (except on the aerosol type, see V3.1 below). This improvement will be useful over snow-covered landscapes or bright deserts, or for images almost fully covered by clouds.

V3.1 (2018/07/09)

Until MAJA V3.1 there were two output formats, one for the products generated at Theia, and one for the products generated by MAJA used with standard ESA L1C products. In the future, we will adopt the Theia output format. For this version, however, we provide a choice of two outputs. To select the output format used by MAJA, you will need to choose between two binary versions (see the "Data format" section below).

MAJA 3.1 ships several improvements:

  • the main improvement is the use of Copernicus Atmosphere Monitoring Service (CAMS) aerosol products, which are used to constrain the aerosol type in the estimates. This brings a major improvement in places where the aerosols can differ a lot from the continental model that was used so far, although it might slightly degrade the results where the continental model was the correct one. However, a bug in the time and location interpolation of CAMS data was found, and we recommend activating the CAMS option only once it is fixed, with MAJA 3.1.2.

  • since version V2-1, MAJA also includes a correction for thin cirrus clouds and a directional effect correction used to improve the estimate of AOT when using Sentinel-2 time series coming from adjacent orbits. More information is available here: http://www.cesbio.ups-tlse.fr/multitemp/?p=13291

  • depending on the executable you download, you can have access to the same output format as the one used by the MUSCATE processing center.

  • and finally, MAJA is now provided for RedHat or Ubuntu Linux families.

V1.0 (2018/07/09)

We just added a tag, v1.0, to match the version number used for MAJA. The corresponding release can be accessed here.

v0.9.1 (2018/03/29)

Added MAJA error catching. As a result, the processing of a whole time series stops if MAJA fails for a given date.

v0.9 (2017/10/02)

  • this version of start_maja works with both S2A and S2B
  • we have found errors, especially regarding water vapour, in the parameters we provided in the "GIPP_nominal" folder. These parameters have been removed and we strongly advise you to do the same.
  • we have updated the parameters and provided them for both S2A and S2B in the folder GIPP_S2AS2B

Data format

We provide two versions of MAJA's binary code, depending on the format you wish to use:

  • the MAJA version with "Sentinel2-TM" plugin uses the Theia format as output. This format is described here.

  • the other version still uses the native format, described here. We might decide to stop support for this format in the coming versions.

Get MAJA

Get MAJA Software

MAJA is provided as binary code and should at least work on RedHat (6 and 7), CentOS, or recent Ubuntu versions. Its licence prevents commercial use of the code. For a licence allowing commercial use, please contact CNES (Olivier Hagolle). MAJA's distribution site is https://logiciels.cnes.fr/en/content/maja.

MAJA is provided in two versions, depending on the format you would like to use.

If you wish to use the MUSCATE format, which is documented here, you will have to download the TM binary.

If you wish to use the native format, which is documented here, as for MAJA 1_0, you will have to download the "NoTM" version. Be aware, however, that we will probably not maintain that version in the coming years.

Install MAJA

Installation of MAJA is straightforward on Linux systems. You just have to unzip the provided package and use the following command:

bash MAJA-3.2.2_TM.run --target /path/to/install

Basic Supervisor for MAJA processor

The basic supervisor start_maja successively processes all the files in a time series of Sentinel-2 images for a given tile, stored in a folder. The time series is initialised in "backward mode", and then all the dates are processed in "nominal" mode. The backward mode takes much more time than the nominal mode. On my computer, which is a fast one, the nominal mode takes 15 minutes and the backward mode takes almost one hour. No check is performed on the outputs, and the script does not check whether the time elapsed between two successive input products is so long that the initialisation in backward mode should be restarted.
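
Since start_maja does not check the time elapsed between successive products, you may want to inspect your list of L1C products beforehand. Here is a minimal sketch of such a check, under stated assumptions: the helper names are made up, the acquisition date is read from the third field of the standard L1C .SAFE name, and the 60-day threshold is an arbitrary placeholder, not a value recommended by MAJA.

```python
from datetime import datetime


def acquisition_date(safe_name):
    """Extract the acquisition date from an L1C product name such as
    S2A_MSIL1C_20180316T103021_N0206_R108_T32TMR_20180316T123927.SAFE."""
    stamp = safe_name.split("_")[2]  # e.g. "20180316T103021"
    return datetime.strptime(stamp[:8], "%Y%m%d").date()


def long_gaps(safe_names, max_gap_days=60):
    """Return the pairs of successive dates separated by more than max_gap_days.

    NOTE: hypothetical helper; max_gap_days is an arbitrary placeholder.
    """
    dates = sorted(acquisition_date(name) for name in safe_names)
    return [(a, b) for a, b in zip(dates, dates[1:]) if (b - a).days > max_gap_days]


if __name__ == "__main__":
    # Product names invented for the example.
    names = [
        "S2A_MSIL1C_20170101T103021_N0204_R108_T31TFJ_20170101T103018.SAFE",
        "S2A_MSIL1C_20170111T103021_N0204_R108_T31TFJ_20170111T103018.SAFE",
        "S2A_MSIL1C_20170501T103021_N0205_R108_T31TFJ_20170501T103018.SAFE",
    ]
    for before, after in long_gaps(names):
        print(f"Long gap between {before} and {after}")
```

If a long gap is reported, one option is to run the two halves of the series separately, so that each one starts with its own initialisation in backward mode.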

To use start_maja.py, you will need to configure the directories within the folders.txt file.

Download Sentinel-2 data:

We recommend using peps_download.py to download Sentinel-2 L1C products: https://github.com/olivierhagolle/peps_download

Parameters

The tool needs a lot of configuration files, which are provided in two directories, "userconf" and "GIPP_S2AS2B". I tend never to change "userconf", but GIPP_S2AS2B contains the parameters and look-up tables, which you might want to change. Most of the parameters lie within the L2COMM file. When I want to test different sets of parameters, I create a new GIPP folder, which I name GIPP_context, where context is passed on the command line with the -c option.

We provide two sets of parameters, one to work without CAMS data, and one to work with CAMS data. The latter needs a lot of disk space (~1.5 GB), as the LUTs are provided not for one aerosol type only, but for 5 aerosol types and 6 water vapour contents. As GitHub limits the repository size to 1 GB, we are using a GitLab repository to distribute the parameters (GIPP):

The look-up tables are too big to be put on our GitLab server, so you will have to download them by following the link in the GIPP readme file, and unzip them in your GIPP folder (I know, it's a bit complicated).

Folder structure

To run MAJA, you need to store all the necessary data in an input folder. Here is an example of its content in nominal mode.

S2A_MSIL1C_20180316T103021_N0206_R108_T32TMR_20180316T123927.SAFE
S2A_TEST_GIP_CKEXTL_S_31TJF____10001_20150703_21000101.EEF
S2A_TEST_GIP_CKQLTL_S_31TJF____10005_20150703_21000101.EEF
S2A_TEST_GIP_L2ALBD_L_CONTINEN_10005_20150703_21000101.DBL.DIR
S2A_TEST_GIP_L2ALBD_L_CONTINEN_10005_20150703_21000101.HDR
S2A_TEST_GIP_L2COMM_L_ALLSITES_10008_20150703_21000101.EEF
S2A_TEST_GIP_L2DIFT_L_CONTINEN_10005_20150703_21000101.DBL.DIR
S2A_TEST_GIP_L2DIFT_L_CONTINEN_10005_20150703_21000101.HDR
S2A_TEST_GIP_L2DIRT_L_CONTINEN_10005_20150703_21000101.DBL.DIR
S2A_TEST_GIP_L2DIRT_L_CONTINEN_10005_20150703_21000101.HDR
S2A_TEST_GIP_L2SMAC_L_ALLSITES_10005_20150703_21000101.EEF
S2A_TEST_GIP_L2TOCR_L_CONTINEN_10005_20150703_21000101.DBL.DIR
S2A_TEST_GIP_L2TOCR_L_CONTINEN_10005_20150703_21000101.HDR
S2A_TEST_GIP_L2WATV_L_CONTINEN_10005_20150703_21000101.DBL.DIR
S2A_TEST_GIP_L2WATV_L_CONTINEN_10005_20150703_21000101.HDR
S2B_OPER_SSC_L2VALD_32TMR____20180308.DBL.DIR
S2B_OPER_SSC_L2VALD_32TMR____20180308.HDR
S2B_TEST_GIP_CKEXTL_S_31TJF____10001_20150703_21000101.EEF
S2B_TEST_GIP_CKQLTL_S_31TJF____10005_20150703_21000101.EEF
S2B_TEST_GIP_L2ALBD_L_CONTINEN_10003_20150703_21000101.DBL.DIR
S2B_TEST_GIP_L2ALBD_L_CONTINEN_10003_20150703_21000101.HDR
S2B_TEST_GIP_L2COMM_L_ALLSITES_10008_20150703_21000101.EEF
S2B_TEST_GIP_L2DIFT_L_CONTINEN_10002_20150703_21000101.DBL.DIR
S2B_TEST_GIP_L2DIFT_L_CONTINEN_10002_20150703_21000101.HDR
S2B_TEST_GIP_L2DIRT_L_CONTINEN_10002_20150703_21000101.DBL.DIR
S2B_TEST_GIP_L2DIRT_L_CONTINEN_10002_20150703_21000101.HDR
S2B_TEST_GIP_L2SMAC_L_ALLSITES_10005_20150703_21000101.EEF
S2B_TEST_GIP_L2TOCR_L_CONTINEN_10002_20150703_21000101.DBL.DIR
S2B_TEST_GIP_L2TOCR_L_CONTINEN_10002_20150703_21000101.HDR
S2B_TEST_GIP_L2WATV_L_CONTINEN_10005_20150703_21000101.DBL.DIR
S2B_TEST_GIP_L2WATV_L_CONTINEN_10005_20150703_21000101.HDR
S2__TEST_AUX_REFDE2_T32TMR_0001.DBL.DIR
S2__TEST_AUX_REFDE2_T32TMR_0001.HDR
S2__TEST_GIP_L2SITE_S_31TJF____10001_00000000_99999999.EEF

The .SAFE file is the input product. The L2VALD files are the L2A product, which is the result of a previous execution of MAJA. The files with GIP in their names are parameter files for S2A and S2B, which you will find in this repository. The REFDE2 files are the DTM files. How to obtain them is explained below.

A "userconf" folder is also necessary, and is provided in this repository.
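
To check the content of an input folder against these roles, a small classifier can help. This is a sketch based only on the name patterns visible in the listing above; the function name classify_input and the returned labels are made up for the example.

```python
def classify_input(name):
    """Guess the role of a file in the MAJA input folder from its name.

    NOTE: hypothetical helper, based on the naming patterns of the
    example listing, not on MAJA's own file handling.
    """
    if name.endswith(".SAFE"):
        return "L1C input product"
    if "_L2VALD_" in name:
        return "L2A product from a previous run"
    if "_GIP_" in name:
        return "GIPP parameter file"
    if "_AUX_REFDE2_" in name:
        return "DTM"
    return "unknown"


if __name__ == "__main__":
    for name in [
        "S2A_MSIL1C_20180316T103021_N0206_R108_T32TMR_20180316T123927.SAFE",
        "S2B_OPER_SSC_L2VALD_32TMR____20180308.HDR",
        "S2A_TEST_GIP_L2COMM_L_ALLSITES_10008_20150703_21000101.EEF",
        "S2__TEST_AUX_REFDE2_T32TMR_0001.HDR",
    ]:
        print(f"{classify_input(name):32s} {name}")
```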

DTM

A DTM is needed to process data with MAJA. Of course, it depends on the tile you want to process. This DTM must be stored in the DTM folder, which is defined within the code. A tool to create this DTM is available in the "prepare_dtm" folder.

CAMS

If you intend to use the data from the Copernicus Atmosphere Monitoring Service (CAMS), which we use to obtain information on the aerosol type, you will need to download the CAMS data. A download tool is provided in the cams_download directory of this repository.

Example workflow

Here is how to process a set of data above tile 31TFJ, near Avignon in Provence, France. To process any other tile, you will need to prepare the DTM and store the data in the DTM folder.

Install

  • Install MAJA

  • Clone the current repository to get start_maja.py: https://github.com/CNES/Start-MAJA.git

Retrieve Sentinel-2 L1C data.

  • For instance, with peps_download.py (you need to have registered at https://peps.cnes.fr and to store the account and password in the peps.txt file):

python ./peps_download.py -c S2ST -l 'Avignon' -a peps.txt -d 2017-01-01 -f 2017-04-01 -w /path/to/L1C_DATA/Avignon

  • I tend to store the data per site. A given site can contain several tiles. All the L1C tiles corresponding to a site are stored in a directory named /path/to/L1C_DATA/Site

  • Unzip the L1C files in /path/to/L1C_DATA/Avignon

Add the GIPP parameters directory to the Start_maja folder

(see parameters section above)

Create DTM

Follow the DTM generation instructions: https://github.com/CNES/Start-MAJA/blob/master/prepare_dtm/Readme.md. Then copy the DTM into the "DTM" folder within the Start_maja folder.

Download CAMS data

If you want to use the CAMS option, follow the cams_download tool instructions: https://github.com/CNES/Start-MAJA/tree/master/cams_download. Downloading CAMS data from the ECMWF servers can take quite a long time these days.

Execute start_maja.py

  • To use the start_maja script, you need to configure the directories within the folders.txt file. Here is my own configuration, also provided in the folders.txt file in this repository.
repCode=/mnt/data/home/hagolleo/PROG/S2/lance_maja
repWork=/mnt/data/SENTINEL2/MAJA
repL1  =/mnt/data/SENTINEL2/L1C_PDGS
repL2  =/mnt/data/SENTINEL2/L2A_MAJA
repMaja=/mnt/data/home/hagolleo/Install-MAJA/maja/core/1.0/bin/maja
repCAMS  =/mnt/data/SENTINEL2/CAMS
  • repCode is where start_maja.py is stored, together with the DTM, userconf and GIPP directories
  • repWork is a directory to store the temporary files
  • repL1 is where to find the L1C data (without the site name, which is added afterwards)
    • The SAFE products must therefore be stored in the following location: repL1/site
  • repL2 is for the L2A data (without the site name, which is added afterwards)
  • repMaja is where the MAJA binary code is
  • repCAMS is where the CAMS data are stored
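
To illustrate the format of this file, here is a minimal sketch of how such key=value lines could be read and how the site name is appended to repL1. It is not the parser used by start_maja.py; the function names parse_folders and l1c_dir are made up for the example.

```python
import posixpath


def parse_folders(text):
    """Parse 'key = value' lines into a dict, ignoring any other line.

    NOTE: hypothetical helper, not the parser used by start_maja.py.
    """
    folders = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Keys may be padded with spaces, as in "repL1  =...".
        folders[key.strip()] = value.strip()
    return folders


def l1c_dir(folders, site):
    """Directory holding the SAFE products: the site name is appended to repL1."""
    return posixpath.join(folders["repL1"], site)


if __name__ == "__main__":
    example = """\
repWork=/mnt/data/SENTINEL2/MAJA
repL1  =/mnt/data/SENTINEL2/L1C_PDGS
repL2  =/mnt/data/SENTINEL2/L2A_MAJA
repCAMS  =/mnt/data/SENTINEL2/CAMS
"""
    folders = parse_folders(example)
    print(l1c_dir(folders, "Avignon"))
```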

Here is an example command line:

Usage   : python ./start_maja.py -f <folder_file> -c <context> -t <tile name> -s <Site Name> -d <start date>
Example : python ./start_maja.py -f folders.txt -c GIPP_MAJA_3_0_S2AS2B_CAMS -t 31TFJ -s Avignon -d 20170101 -e 20180101

Description of the command line options:

  • -f provides the name of the folders file
  • -c is the context: MAJA uses the GIPP files contained in the GIPP_xxx directory. The L2A products will be created in the repL2/Site/Tile/GIPP_xxx folder
  • -t is the tile number
  • -s is the site name
  • -d (yyyymmdd) is the first date to process within the time series
  • -e (yyyymmdd) is the last date to process within the time series
  • -z directly uses zipped L1C files
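
For readers who want to wrap or script these options, the sketch below mirrors them with argparse and builds the L2A output path described for -c. It is an illustration, not the actual argument handling of start_maja.py: the destination names, the choice of which options are required, and the helper l2a_output_dir are assumptions.

```python
import argparse
from datetime import datetime
from pathlib import PurePosixPath


def yyyymmdd(value):
    """argparse type: check that the value parses as a yyyymmdd date."""
    datetime.strptime(value, "%Y%m%d")  # raises ValueError on a bad date
    return value


def build_parser():
    # NOTE: hypothetical parser; which options are required is an assumption.
    parser = argparse.ArgumentParser(prog="start_maja.py")
    parser.add_argument("-f", dest="folder_file", required=True, help="folders file")
    parser.add_argument("-c", dest="context", required=True, help="GIPP_xxx directory")
    parser.add_argument("-t", dest="tile", required=True, help="tile number")
    parser.add_argument("-s", dest="site", required=True, help="site name")
    parser.add_argument("-d", dest="start_date", type=yyyymmdd, required=True)
    parser.add_argument("-e", dest="end_date", type=yyyymmdd)
    parser.add_argument("-z", dest="zipped", action="store_true")
    return parser


def l2a_output_dir(rep_l2, site, tile, context):
    """Build the output folder repL2/Site/Tile/GIPP_xxx described above."""
    return str(PurePosixPath(rep_l2) / site / tile / context)


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["-f", "folders.txt", "-c", "GIPP_MAJA_3_0_S2AS2B_CAMS",
         "-t", "31TFJ", "-s", "Avignon", "-d", "20170101", "-e", "20180101"]
    )
    print(l2a_output_dir("/mnt/data/SENTINEL2/L2A_MAJA", args.site, args.tile, args.context))
```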

Caution: when a product is more than 90% cloudy, the L2A product is not issued. However, a folder with NOTVALD in its name is created.

Known Errors

Some Sentinel-2 L1C products lack the angle information required by MAJA. In this case, MAJA stops processing with an error message. This causes issues particularly in backward mode. These products were acquired in February and March 2016 and have not been reprocessed by ESA (despite repeated requests on my part). You should remove them from the folder which contains the L1C products to process.

Questions

If you have issues or questions with MAJA, please raise an issue on this GitHub repository. It will serve as a forum.

References:

1: A multi-temporal method for cloud detection, applied to FORMOSAT-2, VENµS, LANDSAT and SENTINEL-2 images, O Hagolle, M Huc, D. Villa Pascual, G Dedieu, Remote Sensing of Environment 114 (8), 1747-1755

2: Correction of aerosol effects on multi-temporal images acquired with constant viewing angles: Application to Formosat-2 images, O Hagolle, G Dedieu, B Mougenot, V Debaecker, B Duchemin, A Meygret, Remote Sensing of Environment 112 (4), 1689-1701

3: A Multi-Temporal and Multi-Spectral Method to Estimate Aerosol Optical Thickness over Land, for the Atmospheric Correction of FormoSat-2, LandSat, VENμS and Sentinel-2 Images, O Hagolle, M Huc, D Villa Pascual, G Dedieu, Remote Sensing 7 (3), 2668-2691

4: MAJA's ATBD, O Hagolle, M Huc, C Desjardins, S Auer, R Richter, https://doi.org/10.5281/zenodo.1209633