<a href="https://colab.research.google.com/github/WRFitch/fyp/blob/main/src/fyp_preliminary_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Setup


*   Install & import necessary libraries
*   Set up Earth Engine datastores. 



In [4]:
!pip uninstall fastai

Uninstalling fastai-2.2.2:
  Would remove:
    /usr/local/lib/python3.6/dist-packages/fastai-2.2.2.dist-info/*
    /usr/local/lib/python3.6/dist-packages/fastai/*
Proceed (y/n)? y
  Successfully uninstalled fastai-2.2.2


In [None]:
!pip install -U --no-cache-dir fastai
#!pip install fastai2
#!pip install tensorflow

In [1]:
# TODO organise imports based on utility and location.
import ee
import folium
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

from fastai import *
from fastai.tabular import *
from fastai.vision import *
from fastai.vision.all import *
from google.colab import drive
from osgeo import gdal
from PIL import Image
from pprint import pprint

In [2]:
ee.Authenticate()
ee.Initialize()



In [3]:
drive.mount('/content/drive')

#print(fastai.__version__)
print(folium.__version__)
print(tf.__version__)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
0.8.3
2.4.0


# Dataset import

### Import the following datasets into Google Drive

*   [Sentinel-2 Satellite photography](https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_SR)
*   [Sentinel-5 Precursor Data](https://developers.google.com/earth-engine/datasets/catalog/sentinel)
  *   [Aerosol](https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S5P_OFFL_L3_AER_AI)
  *   [Cloud](https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S5P_OFFL_L3_CLOUD)
  *   [Carbon Monoxide](https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S5P_OFFL_L3_CO)
  *   [Formaldehyde](https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S5P_OFFL_L3_HCHO)
  *   [Nitrogen Dioxide](https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S5P_OFFL_L3_NO2)
  *   [Ozone](https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S5P_OFFL_L3_O3)
  *   [Sulphur Dioxide](https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S5P_OFFL_L3_SO2)
  *   [Methane](https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S5P_OFFL_L3_CH4)
*   [ODIAC Fossil Fuel CO2 Emissions](https://db.cger.nies.go.jp/dataset/ODIAC/DL_odiac2019.html)

### TODO
- Systematise this into functions so I can easily select and make changes based on data resolution or climate dataset
- Import CO2 dataset
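The first TODO could be tackled by keeping one table of dataset settings and deriving both the Earth Engine image and the visualisation parameters from it. Below is a minimal sketch. It assumes the collection IDs, band names and display ranges used in the import cells further down this notebook. `S5P_DATASETS`, `VIS_PALETTE`, `vis_params` and `mean_image` are hypothetical names, not part of the notebook.

```python
# Hypothetical configuration table: one entry per Sentinel-5P product.
# Values are copied from the notebook's import cells: (collection ID, band, vis min, vis max).
S5P_DATASETS = {
    "CO":   ("COPERNICUS/S5P/OFFL/L3_CO",   "CO_column_number_density",                0.0,    0.05),
    "HCHO": ("COPERNICUS/S5P/OFFL/L3_HCHO", "tropospheric_HCHO_column_number_density", 0.0,    0.0003),
    "NO2":  ("COPERNICUS/S5P/OFFL/L3_NO2",  "tropospheric_NO2_column_number_density",  0.0,    0.0002),
    "O3":   ("COPERNICUS/S5P/OFFL/L3_O3",   "O3_column_number_density",                0.12,   0.15),
    "SO2":  ("COPERNICUS/S5P/OFFL/L3_SO2",  "SO2_column_number_density",               0.0,    0.0005),
    "CH4":  ("COPERNICUS/S5P/OFFL/L3_CH4",  "CH4_column_volume_mixing_ratio_dry_air",  1750.0, 1900.0),
}

VIS_PALETTE = ['black', 'blue', 'purple', 'cyan', 'green', 'yellow', 'red']


def vis_params(name, palette=VIS_PALETTE):
    """Hypothetical helper: the getMapId() visualisation dict for one dataset."""
    _, _, vmin, vmax = S5P_DATASETS[name]
    return {'palette': palette, 'min': vmin, 'max': vmax}


def mean_image(name, start_date, end_date, region):
    """Hypothetical helper: mean of one S5P band over a date range.

    Requires an authenticated and initialised Earth Engine session.
    """
    import ee  # imported lazily so the table above is usable without Earth Engine
    collection_id, band, _, _ = S5P_DATASETS[name]
    return (ee.ImageCollection(collection_id)
            .filterDate(start_date, end_date)
            .filterBounds(region)
            .select(band)
            .mean())
```

With this in place, each of the per-gas blocks reduces to a single call such as `mean_image('NO2', start_date, end_date, great_britain).getMapId(vis_params('NO2'))`. Adding or recalibrating a dataset then only touches the table.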

In [4]:
# Earth Engine username, used to import the classified image into the EE assets folder
USERNAME = 'wrfitch'
OUTPUT_DIR = USERNAME + "/out/"

# Define image collections for each dataset to be used 
s2 = ee.ImageCollection("COPERNICUS/S2_SR")
s5_CO = ee.ImageCollection("COPERNICUS/S5P/OFFL/L3_CO")
s5_HCHO = ee.ImageCollection("COPERNICUS/S5P/OFFL/L3_HCHO")
s5_NO2 = ee.ImageCollection("COPERNICUS/S5P/OFFL/L3_NO2")
s5_O3 = ee.ImageCollection("COPERNICUS/S5P/OFFL/L3_O3")
s5_SO2 = ee.ImageCollection("COPERNICUS/S5P/OFFL/L3_SO2")
s5_CH4 = ee.ImageCollection("COPERNICUS/S5P/OFFL/L3_CH4")
#TODO import CO2 dataset

# Define dataset boundaries for Great Britain and London
# TODO work out a polygon segmentation algorithm. There's probably a clever algorithm for this, but I could also just
#      iterate through simple squares that fit my bandwidth and storage constraints. I run out of memory when using the
#      GBR polygon anyway, so an iterative approach is necessary.
great_britain = ee.Geometry.Polygon(
        [[[-1.836112801004015, 59.808076330562756],
          [-8.779472176004015, 58.82140293049428],
          [-7.988456551004015, 55.71069203454839],
          [-11.196464363504015, 54.42753859549109],
          [-11.328300301004015, 50.967746003015044],
          [-9.526542488504015, 50.77361752815123],
          [-6.274589363504015, 51.81776248652293],
          [-5.395683113504015, 51.21615275310099],
          [-6.582206551004015, 49.56332371186494],
          [-3.110526863504015, 49.904165426606255],
          [1.240059073995985, 50.80139967619036],
          [2.426582511495985, 52.33095407387208],
          [1.767402823995985, 53.4183511305661],
          [0.5369340739959849, 53.44453305344514],
          [-1.616386238504015, 56.32474216074427],
          [-0.7814253010040151, 57.805828290000164]]])

london = ee.Geometry.Polygon(
        [[[-1.0666833726431624, 51.89360084338857],
          [-0.9321008531119124, 51.38908166135181],
          [-0.18503054061191238, 51.08470683562287],
          [0.4741491468881076, 51.193274483099074],
          [0.9822668226693576, 51.60282356474035],
          [0.2269567640756076, 52.071221592742454]]])

# import variables
# Could the start and end dates 
start_date = '2020-01-01'
end_date = '2021-01-01'  # filterDate's end date is exclusive, so this covers all of 2020
vis_palette = ['black', 'blue', 'purple', 'cyan', 'green', 'yellow', 'red']

# export variables
drive_path = "/content/drive/MyDrive/"
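The polygon-segmentation TODO suggests iterating through simple squares. Here is a minimal sketch of that approach, assuming axis-aligned longitude/latitude cells are acceptable. It splits a bounding box into a grid of rectangles no larger than `step` degrees on a side. `grid_cells` is a hypothetical helper. Each cell could then be turned into an Earth Engine geometry with `ee.Geometry.Rectangle([xmin, ymin, xmax, ymax])`, intersected with `great_britain` or `london`, and exported one at a time.

```python
import math


def grid_cells(xmin, ymin, xmax, ymax, step):
    """Hypothetical helper: split a lon/lat bounding box into rectangles
    at most `step` degrees wide and tall.

    Returns a list of [xmin, ymin, xmax, ymax] lists; edge cells are clipped to the box.
    """
    n_cols = math.ceil((xmax - xmin) / step)
    n_rows = math.ceil((ymax - ymin) / step)
    cells = []
    for row in range(n_rows):
        for col in range(n_cols):
            x0 = xmin + col * step
            y0 = ymin + row * step
            cells.append([x0, y0, min(x0 + step, xmax), min(y0 + step, ymax)])
    return cells


# Bounding box of the `london` polygon's vertices, rounded outwards.
london_cells = grid_cells(-1.1, 51.0, 1.0, 52.1, step=0.5)
print(len(london_cells))
```

A smaller `step` trades more export tasks for less memory per task, which is the constraint the TODO describes.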

In [5]:
# Import datasets 
# TODO analyse whether these min/max values are valid, recalibrate for highest variance where necessary. Separate values
#      may be necessary for different samples - for example, the perfect calibration for the UK won't work on the world. 
# TODO analyse whether it makes sense to analyse these on a highly localised level

# pre-filter to remove clouds - we can add them back in as data points from sentinel 5 if necessary
def maskS2clouds(image):
  qa = image.select('QA60')

  # Bits 10 and 11 are clouds and cirrus, respectively.
  cloudBitMask = 1 << 10
  cirrusBitMask = 1 << 11

  # Both flags should be set to zero, indicating clear conditions.
  mask = (qa.bitwiseAnd(cloudBitMask).eq(0)
          .And(qa.bitwiseAnd(cirrusBitMask).eq(0)))

  # Apply the mask and scale the surface reflectance values down by 10000.
  return image.updateMask(mask).divide(10000)

# High-resolution satellite photograph: cloud-masked median composite
s2_img = (s2.filterDate(start_date, end_date)
            .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20))
            .filterBounds(great_britain)
            .map(maskS2clouds)
            .median())
s2_id = s2_img.getMapId({'bands': ['B4', 'B3', 'B2'],
                         'min': 0,
                         'max': 0.3})

# Carbon monoxide
# Minmax scale is a bit off - recalibrate for Britain
CO_img = s5_CO.filterDate(start_date, end_date) \
              .filterBounds(great_britain) \
              .select('CO_column_number_density').mean()
CO_id = CO_img.getMapId( \
    {'palette': vis_palette, \
    'min': 0, \
    'max': 0.05})

# Formaldehyde
# Minmax scale is a bit off - recalibrate for Britain
HCHO_img = s5_HCHO.filterDate(start_date, end_date) \
                  .filterBounds(great_britain) \
                  .select('tropospheric_HCHO_column_number_density').mean()
HCHO_id = HCHO_img.getMapId( \
    {'palette': vis_palette, \
    'min': 0.0, \
    'max': 0.0003})

# Nitrogen Dioxide
NO2_img = s5_NO2.filterDate(start_date, end_date) \
                .filterBounds(great_britain) \
                .select('tropospheric_NO2_column_number_density').mean()
NO2_id = NO2_img.getMapId( \
    {'palette': vis_palette, \
    'min': 0.0, \
    'max': 0.0002})

# Ozone
O3_img = s5_O3.filterDate(start_date, end_date) \
              .filterBounds(great_britain) \
              .select('O3_column_number_density').mean()
O3_id = O3_img.getMapId( \
    {'palette': vis_palette, \
    'min': 0.12, \
    'max': 0.15})

# Sulphur Dioxide
SO2_img = s5_SO2.filterDate(start_date, end_date) \
                .filterBounds(great_britain) \
                .select('SO2_column_number_density').mean()
SO2_id = SO2_img.getMapId( \
    {'palette': vis_palette, \
    'min': 0.0, \
    'max': 0.0005})

# Methane
CH4_img = s5_CH4.filterDate(start_date, end_date) \
                .filterBounds(great_britain) \
                .select('CH4_column_volume_mixing_ratio_dry_air').mean()
CH4_id = CH4_img.getMapId( \
    {'palette': vis_palette, \
    'min': 1750, \
    'max': 1900})
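The cloud mask in `maskS2clouds` relies on bit flags in the `QA60` band. A pixel is kept only if bit 10 (opaque clouds) and bit 11 (cirrus) are both zero. The same bit arithmetic can be checked locally with NumPy on a few made-up QA60 values. This illustrates the logic only and makes no Earth Engine call. `clear_sky` is a hypothetical name.

```python
import numpy as np

CLOUD_BIT = 1 << 10   # 1024: opaque clouds
CIRRUS_BIT = 1 << 11  # 2048: cirrus clouds


def clear_sky(qa60):
    """Hypothetical helper: True where neither the cloud nor the cirrus bit is set."""
    qa60 = np.asarray(qa60, dtype=np.uint16)
    return ((qa60 & CLOUD_BIT) == 0) & ((qa60 & CIRRUS_BIT) == 0)


# Example QA60 values: clear, cloud, cirrus, both flags, an unrelated low bit.
print(clear_sky([0, 1024, 2048, 3072, 1]))
# → [ True False False False  True]
```

Other bits do not affect the result, which is why the value `1` still counts as clear.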

In [None]:
# Visualise data on a Folium map
# TODO find a valid attr value to replace the current val.
ee_attr = 'Map Data &copy; <a href="https://earthengine.google.com/">Google Earth Engine</a>'

# (map ID, layer name) for every overlay, in the order they are added
layers = [
    (s2_id, 'median composite'),
    (CO_id, 'Carbon Monoxide'),
    (HCHO_id, 'Formaldehyde'),
    (NO2_id, 'Nitrogen Dioxide'),
    (O3_id, 'Ozone'),
    (SO2_id, 'Sulphur Dioxide'),
    (CH4_id, 'Methane'),
]

# Named fmap to avoid shadowing Python's built-in map()
fmap = folium.Map(location=[51.5, 0.1], prefer_canvas=True)

for map_id, name in layers:
    folium.TileLayer(
        tiles=map_id['tile_fetcher'].url_format,
        attr=ee_attr,
        overlay=True,
        name=name,
    ).add_to(fmap)

fmap.add_child(folium.LayerControl())
fmap

In [15]:
# TODO random column is a placeholder for data sorting - remove when possible! (I could sort by some other parameter,
#      but that would influence the sampling)

# Export one table of the given image, sampled over the polygon at the given scale (metres per pixel).
def exportTable(image, polygon, scale):
  # Sample the image over the region and add a uniform random column (named 'random' by default).
  sample = image.sampleRegions(
      collection = ee.FeatureCollection(polygon),
      scale = scale
  ).randomColumn()

  # Export the sampled table to Drive. Region, scale and format can be set per call, which allows
  # iterating over regions here. Keep the loop as small as possible, given the functional paradigm EE uses.
  ee.batch.Export.table.toDrive(
    collection = sample,
    description = "exporting data at " + str(scale) + "m resolution",
    folder = str(scale) + "m",
    fileFormat = 'TFRecord'
  ).start()


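# Note: patchDimensions, maxFileSize and compressed are TFRecord format options;
# they do not apply when fileFormat is 'GeoTIFF'.
image_export_options = {
    'patchDimensions': [256, 256],
    'maxFileSize': 104857600,  # 100 MiB
    'compressed': True
}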

# Export one GeoTIFF of the given image over the polygon, at the given scale (metres per pixel).
# TODO reevaluate image export options - description needs coordinates
def exportGeotiff(image, polygon, scale):
  ee.batch.Export.image.toDrive(
    description = str(scale) + 'm_scale_img',
    fileFormat = 'GeoTIFF',
    folder = str(scale) + "m",
    # formatOptions = image_export_options,  # TFRecord-only options
    image = image,  # previously hard-coded to s2_img, ignoring the argument
    region = polygon,
    scale = scale,
  ).start()
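`patchDimensions` is given in pixels, so the ground area each 256 x 256 patch covers depends on the export `scale` in metres per pixel. `maxFileSize` of 104857600 bytes is 100 MiB (100 x 1024 x 1024). Below is a quick sketch of the patch arithmetic, assuming an illustrative 100 km x 100 km square region. `patch_footprint_km` and `patches_needed` are hypothetical helpers.

```python
import math

PATCH_PX = 256  # matches image_export_options['patchDimensions']


def patch_footprint_km(scale_m, patch_px=PATCH_PX):
    """Hypothetical helper: side length in km of one square patch at scale_m metres per pixel."""
    return patch_px * scale_m / 1000


def patches_needed(width_m, height_m, scale_m, patch_px=PATCH_PX):
    """Hypothetical helper: number of whole patches needed to tile a width x height region."""
    side_m = patch_px * scale_m
    return math.ceil(width_m / side_m) * math.ceil(height_m / side_m)


for scale in [1000, 500, 100, 50, 10]:
    # Illustrative 100 km x 100 km region.
    print(scale, patch_footprint_km(scale), patches_needed(100_000, 100_000, scale))
```

At 1000 m a single patch covers 256 km and swallows the whole region. At 10 m the same region needs 1600 patches, which is why the finer scales are the ones that run into storage limits.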

In [None]:
# Take a sample of the image at the points given and add a random column.
# TODO combine datasets into one. Can tabular recommenders include images?
# TODO save a small partition in Google Drive, then fetch the next one in a separate thread.
#      This should also start the training process, then delete the small partition from Drive.

# A range of scales (metres per pixel) for an iterative downscaler; these could be more evenly spaced.
# Below 100m the target precision becomes ambitious: a convolution map at 1km scale is very unlikely to
# contain enough detail to reconstruct individual roads and houses, but given this volume of data it's
# not unreasonable to assume some improvement is still possible.
sizes = [1000, 500]  # , 100, 50, 10

for scale in sizes:
  # This should only be run once, when setting up.
  # exportTable(s2_img, london, scale)
  pass

pprint(ee.batch.Task.list())
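`randomColumn()` adds a column of uniform random numbers in [0, 1), named `random` by default. A common use is splitting samples into training and validation sets by thresholding that column. In Earth Engine this would mean filtering the sampled collection with `ee.Filter.lt('random', 0.7)` and `ee.Filter.gte('random', 0.7)`. The local sketch below shows the same idea on plain Python rows. It assumes a 70/30 split, and both the threshold and the name `split_by_random_column` are illustrative only.

```python
import random


def split_by_random_column(rows, train_fraction=0.7, seed=0, column="random"):
    """Hypothetical helper: attach a uniform random value to each row, then split on a threshold.

    Mirrors the randomColumn()-then-filter pattern; rows are dicts.
    """
    rng = random.Random(seed)
    tagged = [dict(row, **{column: rng.random()}) for row in rows]
    train = [r for r in tagged if r[column] < train_fraction]
    valid = [r for r in tagged if r[column] >= train_fraction]
    return train, valid


rows = [{"pixel_id": i} for i in range(1000)]
train, valid = split_by_random_column(rows)
print(len(train), len(valid))  # roughly 700 / 300; exact counts depend on the seed
```

The split does not depend on row order, which is what makes a random column preferable to sorting by another parameter.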

In [None]:
pprint(ee.batch.Task.list())

In [None]:
sizes = [1000, 500]  # , 100, 50, 10

for scale in sizes:
  # This should only be run once, when setting up.
  # exportGeotiff(s2_img, london, scale)
  pass

pprint(ee.batch.Task.list())
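The comment on `sizes` notes that the scales could be more evenly spaced. The plan is a 2x upscaler, so one option is a geometric sequence in which each scale is half the previous one. Every step in the chain is then exactly one 2x upscaling. Here is a sketch, with `geometric_scales` as a hypothetical helper.

```python
def geometric_scales(coarsest, finest, factor=2):
    """Hypothetical helper: scales (m/px) from coarsest down to finest, each `factor` times finer."""
    scales = []
    scale = coarsest
    while scale >= finest:
        scales.append(scale)
        scale = scale / factor
    return scales


print(geometric_scales(1000, 10))
# → [1000, 500.0, 250.0, 125.0, 62.5, 31.25, 15.625]
```

If whole-metre scales are preferred, the values can be rounded. The ratio between steps is then only approximately 2.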

# Fastai & tensorflow processing

### TODO
- Convert GeoTIFF to PNG
- Access datasets for fast.ai retraining. 200-400 images were sufficient for object recognition; maybe double that for top-down satellite photos?
- Train a U-Net that can do 2x upscaling
- __Save the satellite upscaler, including a local copy__
- Transfer the upscaler to use methane data
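For the 2x upscaling U-Net, one common way to get training pairs is to treat each high-resolution image as the target. Its input is generated by downsampling by 2 and then resizing back to the original size, so input and output shapes match. A NumPy sketch follows. It assumes images are `(H, W, C)` arrays with even height and width. Block averaging for the downsample and nearest-neighbour for the upsample are assumptions here; bilinear or bicubic resampling would also be reasonable. The function names are hypothetical.

```python
import numpy as np


def downsample_2x(img):
    """Hypothetical helper: average each 2x2 block of an (H, W, C) image; H and W must be even."""
    h, w, c = img.shape
    return img.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def upsample_2x(img):
    """Hypothetical helper: nearest-neighbour upsampling, repeating every pixel into a 2x2 block."""
    return img.repeat(2, axis=0).repeat(2, axis=1)


def make_pair(high_res):
    """Hypothetical helper: (low-res input at full size, high-res target) for 2x super-resolution."""
    high_res = high_res.astype(np.float32)
    low_res_input = upsample_2x(downsample_2x(high_res))
    return low_res_input, high_res


rng = np.random.default_rng(0)
target = rng.random((256, 256, 3))
x, y = make_pair(target)
print(x.shape, y.shape)  # (256, 256, 3) (256, 256, 3)
```

Generating inputs from the targets this way avoids having to co-register separately exported low- and high-resolution images. The trade-off is that the degradation is synthetic rather than the sensor's own.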

In [None]:
import os

# Converts GeoTIFFs to PNG using selected bands. PNG holds 1 (grey), 2 (grey + alpha), 3 (RGB) or
# 4 (RGBA) bands, so an opaque colour image takes exactly 3 bands, and a single band on its own
# renders in greyscale.

def geotiffToPng(tif_dir):
  #TODO remap to ARGB to get more defined brightness data
  options_list = [
    '-ot Byte',
    '-of PNG',
    '-b 4',
    '-b 3',
    '-b 2',
    '-scale'
  ]
  options_string = " ".join(options_list)

  search_path = os.path.join(drive_path, tif_dir)
  print(search_path)

  for root, dirs, files in os.walk(search_path, topdown=False):
    for name in files:
      print(os.path.join(root, name))
      base, ext = os.path.splitext(os.path.join(root, name))
      if ext.lower() != ".tif":
        continue
      png_path = base + 'band_432.png'
      # Skip files that have already been converted.
      if os.path.isfile(png_path):
        print("A png file already exists for %s" % name)
        continue

      gdal.Translate(png_path, base + ext, options=options_string)
# get images 
geotiffToPng("100m")
#file_extract(drive_path + "1000m/imgs/1000m_scale_img.tif") #file_extract doesn't seem to want to play
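With `-ot Byte` and `-scale` given no range, `gdal.Translate` linearly maps the source minimum and maximum to the 0-255 byte range. The NumPy sketch below reproduces that mapping for a single band, which can help when checking why a band looks washed out or dark. It is an approximation: GDAL computes the source range itself, and its rounding and nodata handling may differ. `scale_to_byte` is a hypothetical name.

```python
import numpy as np


def scale_to_byte(band):
    """Hypothetical helper: linearly rescale one band from [min, max] to [0, 255] as uint8."""
    band = np.asarray(band, dtype=np.float64)
    lo, hi = band.min(), band.max()
    if hi == lo:  # flat band: avoid division by zero
        return np.zeros(band.shape, dtype=np.uint8)
    scaled = (band - lo) / (hi - lo) * 255.0
    return np.clip(np.round(scaled), 0, 255).astype(np.uint8)


# Reflectance-like values, as produced after maskS2clouds' divide(10000).
red = np.array([[0.02, 0.05], [0.10, 0.30]])
print(scale_to_byte(red))
```

A single bright outlier stretches the range and darkens everything else. That may be one reason to pass an explicit source range to `-scale` instead of relying on the computed one.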

In [None]:
# preprocess images 
# Slice low-res images 
# upscale low-res ones to be the same size as the required input 
# resize high-res test outputs to required output
# Map low-res input images to high-res output images using GeoTIFF metadata 
# utilise image transformations to expand dataset as much as possible - rotations & transforms should do the trick. 
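Two of the preprocessing steps listed above can be sketched in NumPy: slicing images into fixed-size tiles, and expanding the dataset with rotations and flips. The sketch assumes `(H, W, C)` arrays and drops partial tiles at the edges. The eight rotations and mirror images of a square suit top-down imagery, since it has no canonical "up". fastai's own augmentation (for example `aug_transforms(flip_vert=True)`) can apply such transforms on the fly, so this mainly shows what they do. `tile_image` and `dihedral_variants` are hypothetical names.

```python
import numpy as np


def tile_image(img, tile=256):
    """Hypothetical helper: cut an (H, W, C) image into non-overlapping tile x tile pieces,
    dropping partial tiles at the right and bottom edges."""
    h, w = img.shape[:2]
    return [img[r:r + tile, c:c + tile]
            for r in range(0, h - tile + 1, tile)
            for c in range(0, w - tile + 1, tile)]


def dihedral_variants(img):
    """Hypothetical helper: all 8 rotations and mirror images of a square (H, W, C) tile."""
    variants = []
    for flipped in (img, img[:, ::-1]):
        for k in range(4):
            variants.append(np.rot90(flipped, k, axes=(0, 1)))
    return variants


img = np.zeros((600, 520, 3), dtype=np.uint8)
tiles = tile_image(img)
augmented = [v for t in tiles for v in dihedral_variants(t)]
print(len(tiles), len(augmented))  # 4 32
```

For super-resolution pairs, the same variant must be applied to the low-resolution input and its high-resolution target so the two stay aligned.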

In [None]:
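from pathlib import Path

# List the PNGs produced by geotiffToPng; get_image_files is fastai's image-file finder.
png_dir = Path(drive_path) / "100m"
image_files = get_image_files(png_dir)
print(type(image_files))
print(image_files[0])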