For the next stage, we'd like you to try to tackle an ML task. Please take a look at this Kaggle-like challenge and see what you can do with it.

https://kelvins.esa.int/proba-v-super-resolution/home/

In particular, what we're looking for is
- Your analysis and understanding of the provided problem, challenge, data and results
- Quality and creativity of the solution
- Thought process and ability to convey it to us in words, tables, plots and/or visualizations
- Readability and usability of the code
- Feel free to bring your personal touch to problem-solving. Thinking outside of the box is a great way to impress us and make sure you're going to the next phase of the recruitment process! (;

Submitting solutions is appreciated and recommended.
Ranking high is potentially impressive but not necessarily expected given the time and other constraints.

There is no hard deadline for the task. We expect you to work on it occasionally in case you are busy with other affairs (which we suppose you are).

You are free to use your preferred language, frameworks or external data sources if you see fit.

Expected deliverables are:
- Code, preferably on GitHub
- A report describing your problem analysis, approach and results. It can be a PDF, a Markdown README, an ipynb, a mix thereof, or whatever you feel best conveys the information.

Please do not hesitate to contact us if you have questions or concerns.

# Report
## Contents
My report contains the following:
- Problem Analysis
    1. Objective
    2. Data specifics and challenges
    3. Submission
- Approaches
    1. Overview
    2. Notations
    3. Data
        - Single Image
        - Multiple Image
    4. ResidualCNN Single Image
    5. ResidualCNN Multiple Image
    6. GAN
- Results
    1. Results
    2. Challenges

## Problem Analysis
### Objective
In this project we try to super-resolve images taken by the PROBA-V (Vegetation) Earth observation satellite.


Each scene provides at least 9 low-resolution (300 m per pixel) images, from which we need to reconstruct a high-resolution image (100 m per pixel). This problem is known as Multi-Image Super-Resolution.


The low resolution images are captured within a time window of 30 days and are not aligned.


The goal of this challenge is not to enhance the visual appearance of the low-resolution images by imagining "fake" details that make them look good. The goal is to use the information from the different low-resolution images to recover real information about the landscape.


I took this challenge as a chance to try TensorFlow 2.0 features.

-------

## Approaches

### Overview
I use deep learning to transform the low-resolution images so that the result "resembles" the high-resolution image as closely as possible.

I used two approaches to feed the data to my model:
- As a single image built from all the low-resolution images using the "central_tendency" method from the baseline.
- By concatenating the low-resolution images along the channel axis to produce a 9-channel image.

With the first approach, the network doesn't need to learn to align the low-resolution images. This turns the problem into a classical single-image super-resolution problem, which might be simpler to tackle. The second approach is more in line with the goal of the competition: we expect each low-resolution image to bring its own relevant information, and averaging them is lossy. More details about the data are given below.
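A minimal NumPy sketch of the two input modes for a single band follows. `fuse_median`, `upscale_nearest` and `stack_channels` are hypothetical helpers, not the repository's code. Two choices in it are assumptions, since the text does not say how either step is done:
- the nearest-neighbour upscaling;
- keeping the first 9 LRs when a scene has more than 9.

```python
import numpy as np

def fuse_median(lrs):
    """Single-image mode: per-pixel median over an LR stack of shape (n, H, W)."""
    return np.median(lrs, axis=0)

def upscale_nearest(img, scale=3):
    """Hypothetical upscaling step: repeat each pixel `scale` times along both axes,
    so a 128x128 fused image becomes 384x384 (the baseline may use bicubic instead)."""
    return np.repeat(np.repeat(img, scale, axis=0), scale, axis=1)

def stack_channels(lrs, k=9):
    """Multi-image mode: move k LRs onto the channel axis -> (H, W, k).
    Assumption: when a scene has more than k LRs, only the first k are kept."""
    return np.moveaxis(lrs[:k], 0, -1)

rng = np.random.default_rng(0)
lrs = rng.random((12, 128, 128)).astype(np.float32)   # fake scene with 12 LRs
single = upscale_nearest(fuse_median(lrs))             # (384, 384)
multi = stack_channels(lrs)                            # (128, 128, 9)
```

The single-image input already has the HR size, whereas the 9-channel input keeps the LR size. In the second mode, the network itself has to perform the upsampling.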

### Notations
- LR(s): low resolution image(s)
- HR: high resolution image
- QM(s): LR(s) quality map
- SM: HR quality map
- SR: Generated (super-resolved) image
- Scene: one PROBA-V observation, including its LRs, HR, QMs and SM
- Train: Refers to the train dataset
- Dev: Refers to the validation dataset
- Test: Refers to the test dataset
- cMSE: Clear mean-square error

### Data
We are given 1160 scenes in train and 290 in test. I move 30 scenes from train to dev for (potential) model fine-tuning. All LR and HR images are Top-Of-Atmosphere reflectances for the RED and NIR spectral bands. I treat both bands the same way and perform no band-specific processing of the reflectances. The important factors are as follows:
1. The average brightness (pixel values) of the HR and the SR can differ.
2. Each scene has a "score" assigned by the competition.
3. For scoring, the SR is cropped by 3 pixels at each border to compensate for pixel shifts. The cropped image is then evaluated at every possible shifted position, and the best score is kept.

We deal with 1. by using the "clear" mean-square error (cMSE) loss, which first subtracts the average bias between HR and SR before taking the squared difference. The same applies to the clear mean absolute error. We could also "unbias" the whole dataset (per scene), but that is more problematic. We don't use the scores from 2. to weight the per-scene losses, although this could be added in the future. For 3., we simply crop our final predictions.
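The cMSE and the shift-tolerant scoring can be sketched in NumPy as follows. This minimal sketch makes these assumptions:
- pixel values are scaled to [0, 1];
- `clear` is a boolean mask, True where SM marks the HR pixel as unobstructed;
- `clear_mse` and `shifted_cpsnr` are hypothetical names;
- the per-scene normalisation by the competition's baseline score is left out.

```python
import numpy as np

def clear_mse(hr, sr, clear):
    """cMSE: MSE over clear pixels after removing the mean HR-SR brightness bias."""
    diff = hr[clear] - sr[clear]
    bias = diff.mean()                   # average brightness offset between HR and SR
    return np.mean((diff - bias) ** 2)   # same as mean((hr - (sr + bias)) ** 2)

def shifted_cpsnr(hr, sr, clear, border=3):
    """Crop `border` pixels from each side of SR, compare it with every equally sized
    window of HR (shifts 0..2*border in each direction) and keep the best cPSNR."""
    h, w = hr.shape
    ch, cw = h - 2 * border, w - 2 * border
    sr_crop = sr[border:h - border, border:w - border]
    best = -np.inf
    for u in range(2 * border + 1):
        for v in range(2 * border + 1):
            hr_win = hr[u:u + ch, v:v + cw]
            clear_win = clear[u:u + ch, v:v + cw]
            cmse = clear_mse(hr_win, sr_crop, clear_win)
            best = max(best, -10.0 * np.log10(cmse))   # cPSNR = -10 * log10(cMSE)
    return best
```

A higher cPSNR is better. With a 3-pixel border, the SR crop is compared against 7 × 7 = 49 HR windows. Keeping the maximum over these windows is what makes the metric tolerant to small misalignments.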

____________________

All LR and HR images have a 14-bit depth, but I found some images with 15 bits. I load the images as uint16 NumPy arrays and convert them to float32. I then write them in TFRecord format for convenience with the TensorFlow data API. These steps are shown in the "GenerateData" notebook.
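The sketch below shows the uint16 to float32 conversion. It also shows one simple way to spot the images that exceed 14 bits: count the bits needed by the brightest pixel. Both helper names are hypothetical. The text does not say whether values are rescaled, so the sketch keeps the raw digital numbers, and the TFRecord serialisation itself is not shown.

```python
import numpy as np

def to_float32(img):
    """uint16 image -> float32, keeping the raw digital numbers (no rescaling)."""
    if img.dtype != np.uint16:
        raise ValueError(f"expected uint16, got {img.dtype}")
    return img.astype(np.float32)

def bits_needed(img):
    """Bits needed by the brightest pixel: at most 14 for in-spec images, 15 for the outliers."""
    return int(img.max()).bit_length()

in_spec = np.array([[0, 2**14 - 1]], dtype=np.uint16)   # max value 16383
outlier = np.array([[0, 2**14]], dtype=np.uint16)       # max value 16384
```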

------------------
LRs have dimensions [128, 128] (300 m per pixel). HRs are three times larger in each dimension, at [384, 384] (100 m per pixel).

-------------------
All LR and HR images in train come with their own quality maps (QMs and SM, respectively). These maps indicate whether each pixel was obstructed (e.g. by clouds) when the observation was taken. I use SM to mask the model's loss function; QMs aren't used.

#### Single Image
We generate a single TFRecord for each scene that includes the LR, HR and SM. The LRs are fused through a median-based central tendency measure (for each pixel, we take the median value of all LRs at that pixel) and then upscaled. A priori, I thought this would be an easier and quicker baseline to start with, but it is still hard and slow to train. When loading a scene in this mode, the following data augmentation is used:
1. horizontal and/or vertical flipping
2. jittering (resize the image and then take a random crop)

#### Multiple Image
For multiple images, I still generate a single TFRecord for each scene. It includes the LRs and the HR, with the pixels that SM marks as obstructed already set to np.nan. Data augmentation is limited to horizontal/vertical flipping. This mode is a bit trickier because the number of LRs varies from scene to scene. We therefore also store the length of the list in the TFRecord, so that the sparse tensor can be reconstructed with the correct shape.
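The variable-length storage can be illustrated with plain NumPy. The LR stack is flattened and its length is stored next to it. Obstructed HR pixels are encoded as NaN, so the clear mask can be recovered at load time. The dictionary below is a hypothetical stand-in for the TFRecord features. In TensorFlow, a feature parsed with `tf.io.VarLenFeature` comes back as a `tf.SparseTensor`, which is why the stored length is needed to restore the `(n, 128, 128)` shape.

```python
import numpy as np

LR_HW = (128, 128)   # fixed LR size
HR_HW = (384, 384)   # fixed HR size

def pack_scene(lrs, hr, sm):
    """Hypothetical stand-in for writing one multi-image scene to a TFRecord."""
    hr_nan = np.where(sm, hr, np.nan).astype(np.float32)   # obstructed HR pixels -> NaN
    return {
        "lrs": lrs.astype(np.float32).ravel(),   # variable-length flat list of values
        "n_lrs": int(lrs.shape[0]),              # number of LRs, needed to restore the shape
        "hr": hr_nan.ravel(),
    }

def unpack_scene(record):
    """Rebuild the (n, 128, 128) LR stack, the HR and the clear mask."""
    lrs = record["lrs"].reshape((record["n_lrs"], *LR_HW))
    hr = record["hr"].reshape(HR_HW)
    clear = ~np.isnan(hr)   # True where SM marked the pixel as clear
    return lrs, hr, clear
```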

### ResidualCNN Single Image

My architecture is built from residual blocks, structured as follows:

<img src="Report/residual_block.png" alt="resnet_block" style="width: 250px;"/>

Compared to the original ResNet paper, I don't apply an activation function after the addition, which was found to work better here: http://torch.ch/blog/2016/02/04/resnets.html. I also replaced the ReLU units with LeakyReLU.

---------
The "ResidualCNNSuperResolution" notebook contains an experiment training a relatively simple CNN with only 3 residual blocks.
The model's input (the upscaled fused image) and its output already have the same height and width. All convolutions therefore use a stride of 1, and there is no upsampling operation. The original intuition was to try an "easy" baseline.
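Here is a minimal Keras sketch of this kind of model. It has 3 residual blocks, each with LeakyReLU inside the block and no activation after the addition. Stride is 1 everywhere and there is no upsampling layer. The following are assumptions, not values taken from the notebook:
- the filter count and kernel size;
- the absence of batch normalisation;
- the single input/output channel (one band per image).

```python
import tensorflow as tf

def residual_block(x, filters=64):
    """Conv -> LeakyReLU -> Conv, then add the skip connection; no activation after the add."""
    y = tf.keras.layers.Conv2D(filters, 3, strides=1, padding="same")(x)
    y = tf.keras.layers.LeakyReLU()(y)   # default negative slope
    y = tf.keras.layers.Conv2D(filters, 3, strides=1, padding="same")(y)
    return tf.keras.layers.Add()([x, y])

def build_model(size=384, channels=1, filters=64, n_blocks=3):
    """Stride-1 residual CNN whose output has the same height and width as its input."""
    inp = tf.keras.Input(shape=(size, size, channels))
    x = tf.keras.layers.Conv2D(filters, 3, padding="same")(inp)      # lift to `filters` channels
    for _ in range(n_blocks):
        x = residual_block(x, filters)
    out = tf.keras.layers.Conv2D(channels, 3, padding="same")(x)     # back to one band
    return tf.keras.Model(inp, out)
```

Every layer keeps the spatial size. Once the input has been upscaled, the output can therefore be compared pixel-for-pixel with the 384×384 HR.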


### ResidualCNN Multiple Image
