# Working Environment

We import all the packages you should need below, but feel free to import extra ones. If you need to install a package (e.g. **rasterio**), you can run a shell command from a notebook cell like this:

```shell
!pip install myPackage
```



To install a package in editable mode directly from a Git repository:

```shell
!pip install -e git+https://github.com/scikit-learn/scikit-learn.git#egg=scikit-learn
```

In [None]:
# All the imports are here
import warnings
from os import listdir
from os.path import isfile, join

import numpy as np
import rasterio

from matplotlib import pyplot as plt
%matplotlib inline 

# Data Manipulation Challenge description & instructions

In this challenge, you will have to load and manipulate satellite images, a typical daily task at Kayrros. Data manipulation is one of the core skills one should master, even before thinking about fancy algorithms and data pipelines. 

You will not be required to have any specific knowledge of imagery, as we will provide you with the necessary guidelines.

Please keep your code clean and remember to comment it. You are also encouraged to provide as much detail as you can in markdown cells or in comments, even for ideas you do not have time to explore. We will evaluate the cleanliness of your code as much as your results.

Have fun!

# Data loading and data description

## Setup

In [None]:
# Define the path where the images are located
IMG_PATH = 'data/'

# The functions below will be useful to load tif images as arrays
def rio_open(p):
    """
    Open an image with rasterio.

    Args:
        p: path to the image file

    Returns:
        rasterio dataset
    """
    with warnings.catch_warnings():  # noisy warning may occur here
        warnings.filterwarnings("ignore", category=UserWarning)
        return rasterio.open(p)

def rio_read(p):
    """
    Read an image with rasterio.

    Args:
        p: path to the image file

    Returns:
        numpy array
    """
    with rio_open(p) as x:
        return x.read().transpose((1, 2, 0)).squeeze()
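
To make the output shape of `rio_read()` concrete, here is a small sketch that applies the same `transpose` and `squeeze` steps to synthetic arrays. rasterio's `read()` returns arrays shaped `(bands, height, width)`; the arrays `single_band` and `three_band` below are made-up stand-ins, not real imagery.

```python
import numpy as np

# Stand-ins for what a rasterio read() call returns: (bands, height, width).
single_band = np.zeros((1, 4, 5))
three_band = np.zeros((3, 4, 5))

# Same post-processing as rio_read(): move bands last, then drop length-1 axes.
single_out = single_band.transpose((1, 2, 0)).squeeze()
three_out = three_band.transpose((1, 2, 0)).squeeze()

print(single_out.shape)  # (4, 5)
print(three_out.shape)   # (4, 5, 3)
```

A single-band file therefore comes back as a plain 2D array, which is what `plt.imshow()` expects for a one-channel image.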

## Loading and visualizing your first random satellite image band

Each satellite image is composed of 13 bands, going from visible light to near infrared. 

You will find these bands in the `data/` folder, in files named [image\_date]\_[band\_name].tif (for example, `20170106_B02.tif`).

The RGB images you are used to seeing (a photograph, for instance) are made of bands B02 (Blue), B03 (Green) and B04 (Red).

In [None]:
# Here, we show you how to load a given band from a random image and to visualize it
random_image_band = 'data/20170106_B02.tif'
img = rio_read(random_image_band)
plt.imshow(img)

## Loading the data set

In this section, you are expected to load the full data set in the following format: **dataset = {img\_date: {'B01': img\_B01_array, ..., 'B12': img\_B12_array}}**, where img\_B0X_array are the image bands as arrays, i.e. the output of the rio_read() function
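
One possible way to build this nested dictionary is sketched below. It is only an illustration, not the expected answer: the helper name `group_bands_by_date` and its `reader` parameter are hypothetical, and the file list is made up. The reader is passed in as an argument so the grouping logic can be tried without real files.

```python
from collections import defaultdict
from os.path import basename, splitext


def group_bands_by_date(paths, reader):
    """Group files named <date>_<band>.tif into {date: {band: array}}.

    `reader` is any callable that turns a path into an array
    (rio_read in this notebook).
    """
    dataset = defaultdict(dict)
    for path in paths:
        stem, ext = splitext(basename(path))
        if ext.lower() != ".tif":
            continue  # skip anything that is not a tif image
        date, band = stem.split("_", 1)
        dataset[date][band] = reader(path)
    return dict(dataset)


# Hypothetical file names following the naming scheme of the data folder.
example_paths = [
    "data/20170106_B02.tif",
    "data/20170106_B03.tif",
    "data/20180531_B02.tif",
    "data/readme.txt",
]
# A dummy reader that returns the path itself, just to show the grouping.
example = group_bands_by_date(example_paths, reader=lambda p: p)
print(sorted(example))
```

In the notebook, the paths could come from `listdir(IMG_PATH)` filtered with `isfile`, and `rio_read` would be passed as the reader.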

In [None]:
# Load the dataset under the required format
# WRITE YOUR CODE HERE

# Sanity check: visualize the 3rd band of the image from 2018/05/31
# WRITE YOUR CODE HERE

# Generating an RGB image

Now that we know how to visualize a given satellite image band, we are going to build a function that will allow us to visualize a given satellite image as an RGB image.

An RGB image can be represented as a 3D array, with the dimensions [channel, img_height, img_width] or [img_height, img_width, channel], with channels B02, B03 and B04.
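
As a rough sketch of how to move between the two layouts, the snippet below scales a synthetic channel-first array to the [0, 1] range and moves the channel axis last. The function name `to_displayable_rgb` is hypothetical, and min-max scaling is only one possible normalization (an assumption here, not a requirement of the challenge); clipping to percentiles is a common alternative when a few very bright pixels wash out the image.

```python
import numpy as np


def to_displayable_rgb(img):
    """Min-max scale a [channel, height, width] array to [0, 1], channels last."""
    img = img.astype(np.float64)
    lo, hi = img.min(), img.max()
    if hi > lo:
        scaled = (img - lo) / (hi - lo)
    else:
        scaled = np.zeros_like(img)  # constant image: avoid dividing by zero
    # [channel, height, width] -> [height, width, channel]
    return np.transpose(scaled, (1, 2, 0))


# Synthetic 3-band image (not real data).
rng = np.random.default_rng(0)
fake = rng.integers(0, 10000, size=(3, 4, 5))
rgb = to_displayable_rgb(fake)
print(rgb.shape)  # (4, 5, 3)
```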

In [None]:
# In order to test your function, you can use the following random image
# We want to display the image as an RGB composite, i.e. B04, B03, B02
img = np.array([dataset['20170630']['B04'], dataset['20170630']['B03'], dataset['20170630']['B02']])

In [None]:
# Build a function that takes a 3D array with the B04, B03 and B02 bands as input and returns the final rgb_img, ready to be visualized

# WRITE YOUR CODE HERE by completing the function below

def generate_rgb_img(img):
    
    # 1. First, you need to normalize your image. Otherwise you won't be able to visualize it.
    # Normalize your image here
    
    # 2. The function imshow() allows you to visualize an RGB image, but it needs to be in the format [img_height, img_width, channel_number]
    # It is currently in the format [channel_number, img_height, img_width], so transform the image into the correct format
    
    # Display the RGB image
    return None

# Sanity check: test your function
# WRITE YOUR CODE HERE

# Detecting patterns in satellite imagery: flaring detection

In this section, we are going to detect some activity that might be difficult to observe in an RGB image by using other satellite bands.

The activity we want to detect is called flaring: the process of burning gas in an open flame, which can be detected easily in the infrared spectrum picked up by the satellite.

In [None]:
# Load and visualize the RGB image on 2018/05/31. Can you easily identify flaring?

# WRITE YOUR CODE HERE

In [None]:
# You can generate a heatmap by simply adding B11 and B12, that is to say: heatmap = B11 + B12
# Below, generate a heatmap of the image and visualize it. You should see a clear signal in the center-right of the image

# WRITE YOUR CODE HERE

In [None]:
# To help us automate the detection process, it is easier to produce a mask of the flaring activity
# We can consider that there is flaring activity on a pixel if the value of the pixel is above 15,000
# Generate a mask of the flaring activity in the previous heatmap (mask: pixel=1 where there is flaring, pixel=0 where there is no flaring)
# Visualize the resulting mask

# WRITE YOUR CODE HERE
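
As a rough illustration of the heatmap and mask steps described in the comments above, here is a sketch on synthetic arrays. The arrays `b11` and `b12` and the constant name `FLARE_THRESHOLD` are made up for the example. The cast to a wider integer type assumes the bands may be stored as 16-bit integers, which is not stated in this notebook.

```python
import numpy as np

FLARE_THRESHOLD = 15000  # threshold quoted in the instructions

# Synthetic 4x4 stand-ins for bands B11 and B12 (not real data).
b11 = np.full((4, 4), 2000, dtype=np.uint16)
b12 = np.full((4, 4), 3000, dtype=np.uint16)
b11[1, 2] = 9000
b12[1, 2] = 8000

# Cast before adding so the sum cannot overflow a 16-bit integer type.
heatmap = b11.astype(np.int64) + b12.astype(np.int64)

# 1 where the heatmap exceeds the threshold, 0 elsewhere.
mask = (heatmap > FLARE_THRESHOLD).astype(np.uint8)
print(mask)
```

Both `heatmap` and `mask` are 2D arrays, so they can be passed directly to `plt.imshow()`.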

In [None]:
# We consider an image to have flaring activity if the number of pixels above 15,000 in its heatmap is >= 5.
# Write below a function that returns all the dates for which we should detect flaring activity in the data we provided

# WRITE YOUR CODE HERE
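
A minimal sketch of such a function is shown below, assuming the dataset follows the `{date: {band: array}}` format requested earlier and that the heatmap is `B11 + B12`. The function name `flaring_dates`, its parameters and the toy dataset are hypothetical and only serve the illustration.

```python
import numpy as np


def flaring_dates(dataset, threshold=15000, min_pixels=5):
    """Return the sorted dates whose heatmap has at least `min_pixels` pixels above `threshold`."""
    dates = []
    for date, bands in sorted(dataset.items()):
        heatmap = bands["B11"].astype(np.int64) + bands["B12"].astype(np.int64)
        if np.count_nonzero(heatmap > threshold) >= min_pixels:
            dates.append(date)
    return dates


# Toy dataset with made-up values (not real data).
quiet = np.full((4, 4), 1000)
four_hot = quiet.copy()
four_hot[0, :] = 9000          # 4 pixels whose sum reaches 18,000
five_hot = four_hot.copy()
five_hot[1, 0] = 9000          # a 5th hot pixel

toy = {
    "20170106": {"B11": quiet, "B12": quiet},
    "20170630": {"B11": four_hot, "B12": four_hot},
    "20180531": {"B11": five_hot, "B12": five_hot},
}
detected = flaring_dates(toy)
print(detected)  # ['20180531']
```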

Question: if you had 1,000 images to process, would you keep the same code structure? What would you change?

In [None]:
# Answer the question here, no need to re-write any code

# More fun on data manipulation: zooming in an image on the flaring activity detected

In [None]:
# First, find the center of the flaring activity: take all the images in which you detected a flare
# and combine them to find the center of the flaring activity in the image time series
# Your code should return the index of the pixel at the center of the flaring activity

# WRITE YOUR CODE HERE
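
One way to read "center" is the centroid of all flaring pixels, weighted by how often each pixel flares across the time series. The sketch below follows that interpretation, which is an assumption; other definitions (for example, the brightest pixel) would also be defensible. The function name `flaring_center` and the masks are made up.

```python
import numpy as np


def flaring_center(masks):
    """Return (row, col) of the weighted centroid of a list of 0/1 flaring masks."""
    combined = np.sum(masks, axis=0)  # per-pixel count of flaring detections
    rows, cols = np.nonzero(combined)
    if rows.size == 0:
        raise ValueError("no flaring pixel in any mask")
    weights = combined[rows, cols]
    row_c = int(round(float(np.average(rows, weights=weights))))
    col_c = int(round(float(np.average(cols, weights=weights))))
    return row_c, col_c


# Two synthetic 7x7 masks with a 3x3 flare centered on pixel (2, 4).
mask_a = np.zeros((7, 7), dtype=int)
mask_a[1:4, 3:6] = 1
mask_b = mask_a.copy()

center = flaring_center([mask_a, mask_b])
print(center)  # (2, 4)
```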

In [None]:
# Then, make a crop of band B02 using an image of your choice, centered on the pixel at the center of the flaring activity, with a size of 50 x 50 pixels
# If the crop is not entirely contained in the image, you can complete the missing pixels of the crop with the value 0
# Do not use any pre-built cropping functions from other libraries, only use NumPy array manipulation.
# Visualize the crop

# WRITE YOUR CODE HERE
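
A possible zero-padded crop using only NumPy slicing is sketched below. The function name `crop_centered` is hypothetical. With an even size such as 50, there is no exact middle pixel, so this sketch places the requested center at index `[size // 2, size // 2]` of the crop, which is a convention chosen for the example.

```python
import numpy as np


def crop_centered(band, center, size=50):
    """Return a size x size crop of a 2D array around `center`, zero-filled outside the image."""
    half = size // 2
    row_c, col_c = center
    top, left = row_c - half, col_c - half

    crop = np.zeros((size, size), dtype=band.dtype)

    # Part of the requested window that actually lies inside the image.
    r0, r1 = max(top, 0), min(top + size, band.shape[0])
    c0, c1 = max(left, 0), min(left + size, band.shape[1])

    if r0 < r1 and c0 < c1:
        crop[r0 - top:r1 - top, c0 - left:c1 - left] = band[r0:r1, c0:c1]
    return crop


# Synthetic 10x10 band where each value encodes its position (row * 10 + col).
band = np.arange(100).reshape(10, 10)
inside = crop_centered(band, (5, 5), size=4)   # fully inside the image
corner = crop_centered(band, (0, 0), size=4)   # partly outside, padded with 0
print(corner)
```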

In [None]:
# Finally, zoom in on the previously obtained crop (expand its size), i.e. produce an image of size 100x100 pixels from the crop of size 50x50
# Do not use any pre-built functions from other libraries
# For this, you will have to create "new" pixels, that you can either interpolate or duplicate from neighboring existing pixels
# Visualize the resulting image

# WRITE YOUR CODE HERE
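
The duplication option can be sketched with plain NumPy index arrays: each output pixel simply looks up the source pixel at its coordinates divided by the zoom factor. The function name `zoom_by_duplication` is hypothetical, and this covers only the duplication variant; an interpolating zoom would need extra averaging of neighboring pixels.

```python
import numpy as np


def zoom_by_duplication(img, factor=2):
    """Enlarge a 2D array by repeating each pixel `factor` times along both axes."""
    # Output row i reads source row i // factor, and likewise for columns.
    rows = np.arange(img.shape[0] * factor) // factor
    cols = np.arange(img.shape[1] * factor) // factor
    return img[rows[:, None], cols]


small = np.array([[1, 2],
                  [3, 4]])
big = zoom_by_duplication(small)
print(big)
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```

Applied to a 50x50 crop with `factor=2`, this yields a 100x100 array.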

Congrats! You made it. We hope you had fun. Please send back your solution in due time.