# Welcome to this tutorial on training dataset construction!
            
### There are 2 steps:   
>- 1) Install and import libraries, create folders and define parameters.  
>    *Values followed by **#@param** are parameters that you can change.*
>- 2) Construct your dataset. 

# 1) Libraries
### First create a new **virtual environment**, then install all requirements by running the following:

In [1]:
!pip install -r requirements.txt



### Create all folders you will need

In [1]:
from utils import create_folders
create_folders()

### Your directory should look as follows:
Check that the folders (the ones **in bold**) are in your directory.
- **Main folder**
    >- **models**
    >    >* .joblib files (sklearn models)
    >    >* .sav files (mappers such as pca and umap)
    >    >* folders (tensorflow models)
    >- **results**
    >    >* .png images (confusion matrices)
    >    >* .log files (tensorflow training curves)
    >- **data**
    >    >- **train**
    >    >    * train*.tfrecord.gz files (training dataset)
    >    >- **eval**
    >    >    * traineval*.tfrecord.gz files (evaluation dataset)
    >    >- **inference**
    >    >   * .tfrecord.gz files (inference dataset)
    >    >   * *-mixer.json files (needed for georeferencing, if you want to add the prediction to Earth Engine Editor)
    >    >- **predictions**
    >    >    - **colored_pipes**
    >    >        * .kml files (colored-pipe nets corresponding to labels)
    >    >    - **kml**
    >    >        * .kml files and corresponding .png images (mask-prediction images)
    >    >    - **tfrecords**
    >    >        * .TFRecord files (needed if you want to add the prediction to Earth Engine Editor)
    >    >    * .csv files
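
A quick way to check the layout above is to test for each bold folder from Python. The following is a minimal sketch, not part of `utils`: the names `EXPECTED_FOLDERS` and `missing_folders` are hypothetical, and it assumes the notebook is run from the main folder and that the folder names match the tree above exactly.

```python
from pathlib import Path

# Folder layout copied from the tree above, relative to the main folder.
EXPECTED_FOLDERS = [
    "models",
    "results",
    "data/train",
    "data/eval",
    "data/inference",
    "data/predictions/colored_pipes",
    "data/predictions/kml",
    "data/predictions/tfrecords",
]


def missing_folders(root="."):
    """Return the expected folders that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]


# An empty list means every expected folder was found.
print("Missing folders:", missing_folders("."))
```

If the list is not empty, running `create_folders()` again should be the first thing to try.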

### Import, authenticate and initialize the Earth Engine library.  
If you have a Google account, authenticate with it; if not, you can use this one:  
Gmail address : [mounierseb93@gmail.com]    
Code : [mounse$15]

In [None]:
import ee
ee.Authenticate()
ee.Initialize()

In [None]:
import tensorflow as tf

In [None]:
from dataset_construction import TFDatasetConstruction

In [None]:
# Specify inputs (Landsat and Sentinel bands) to the model and the response variable.
LANDSAT  = ['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B10', 'B11']
SENTINEL = ['VV','VH','VV_1','VH_1']
RESPONSE = 'landcover'
# Specify the size and shape of patches expected by the model.
KERNEL_SIZE   = 128 #@param {type:"integer"}
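
To make the role of these parameters concrete, the sketch below shows one common way such values are combined: the band lists are merged into a single input list, the response is appended to form the full feature list, and the kernel size gives a square patch shape. The names `BANDS`, `FEATURES` and `KERNEL_SHAPE` are illustrative assumptions; how `TFDatasetConstruction` uses the parameters internally is not shown in this notebook.

```python
# Values repeated from the cell above so that this sketch runs on its own.
LANDSAT = ['B1', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B10', 'B11']
SENTINEL = ['VV', 'VH', 'VV_1', 'VH_1']
RESPONSE = 'landcover'
KERNEL_SIZE = 128

# Hypothetical helper names, not taken from dataset_construction.
BANDS = LANDSAT + SENTINEL                  # all model inputs
FEATURES = BANDS + [RESPONSE]               # inputs plus the label band
KERNEL_SHAPE = [KERNEL_SIZE, KERNEL_SIZE]   # square patches, in pixels

print(len(BANDS), "input bands, patch shape", KERNEL_SHAPE)
```

With the default values this gives 13 input bands and patches of 128 by 128 pixels.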

# 2) Dataset Construction  
a) First, connect to Google Drive using this address and password:  
Gmail address : [mounierseb93@gmail.com]  
Code : [mounse$15]

**Every time you run a cell, if you receive a message like "Please download file from Drive from folder ...", go to the folder mentioned on Google Drive and download the file into the same folder on your computer.**

b) Run the following only if you do not already have access to the training dataset (.tfrecord.gz files in the 'train' and 'eval' folders).
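
To decide whether the export is needed, you can count the files already present. This is a minimal sketch under a few assumptions: the notebook runs from the main folder, the files follow the `train*.tfrecord.gz` and `traineval*.tfrecord.gz` patterns listed in the directory tree, and the helper names `count_records` and `needs_export` are hypothetical.

```python
from pathlib import Path


def count_records(root="."):
    """Count the training and evaluation TFRecord files under `root`/data."""
    data = Path(root) / "data"
    return {
        "train": len(list((data / "train").glob("train*.tfrecord.gz"))),
        "eval": len(list((data / "eval").glob("traineval*.tfrecord.gz"))),
    }


def needs_export(root="."):
    """Return True if either the train or the eval folder has no records."""
    counts = count_records(root)
    return counts["train"] == 0 or counts["eval"] == 0


print(count_records("."), "export needed:", needs_export("."))
```

If `needs_export` returns `False`, the export cell can be skipped.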


In [None]:
# Export training and evaluation TFRecords.
tfdataconstructor = TFDatasetConstruction(LANDSAT, SENTINEL, RESPONSE, KERNEL_SIZE)
# Do not change these dates: the label dataset is from 2017.
tfdataconstructor.dataset_construction("2017-01-01", "2017-12-31")