<a href="https://colab.research.google.com/github/WRFitch/fyp/blob/main/src/fyp_ensemble_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Combining multiple models to interpolate greenhouse gases

This notebook begins an experiment in using tabular modelling to build on the CNN designed in [fyp_ai_analysis.ipynb](https://github.com/WRFitch/fyp/blob/main/src/fyp_ai_analysis.ipynb). Due to time constraints, it won't be finished by the official end of the project, but I intend to keep working on it in my own time, so it remains in this repository.

All notebooks in this project are to be considered development environments, rather than bona fide scripts that, when run, will produce the end product. Therefore, certain code blocks and documentation are added for developer convenience. 

## Setup

### Import and install necessary supplementals 

In [None]:
!pip uninstall -y fastai
!pip install -U --no-cache-dir fastai

In [None]:
import os 
import numpy as np 
import pandas as pd 

from fastai.tabular.all import * 
from fastai.vision.all import * 
from google.colab import drive

drive.mount('/content/drive')

In [None]:
%cd /content
!git clone https://github.com/WRFitch/fyp.git

In [None]:
# Import fyputil library
%cd /content/fyp/src/fyputil
import constants as c
import fyp_utils as fyputil
%cd /content

### Get data & Model

In [None]:
ghg_df = pd.read_csv(c.ghg_csv)
norm_df = fyputil.normGhgDf(ghg_df.copy())

In [None]:
ghg_df.iloc[0:10]

In [None]:
def getGhgsAsArr(img_path):
  return fyputil.getGhgsAsArr(img_path, ghg_df)

In [None]:
model_name = "mrghg_060321-resnet152_increased_dataset_size_to_4k"
cnn_model = load_learner(f"{c.model_dir}/{model_name}.pkl")

In [None]:
# Load the CNN's saved predictions for this model
predicted_df = pd.read_csv(f"{c.data_dir}/{model_name}.csv")

## Generate input CSV
- local readings & coordinates
  - Consider an encoding? 
  - Mathematical interpolation of nearby values. 
- Predicted reading
- Actual values 

If mathematical interpolation is used, the table stays at 18 columns, which is a manageable size.
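The notebook does not yet say which interpolation would be used. One common option is inverse distance weighting (IDW), sketched below as an illustration rather than as the project's chosen method. The function name `idw_interpolate`, the `power` parameter, and the `(longitude, latitude, value)` tuple layout are all assumptions made for this example, and distance is measured as a flat Euclidean distance in degrees, which is only a rough approximation over a small area.

```python
import math


def idw_interpolate(target, readings, power=2.0):
    """Estimate a value at `target` from nearby readings using IDW.

    target:   (longitude, latitude) of the point to estimate (hypothetical layout).
    readings: iterable of (longitude, latitude, value) tuples.
    power:    how quickly a reading's influence falls off with distance.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lon, lat, value in readings:
        # Flat Euclidean distance in degrees: an approximation, not a geodesic.
        dist = math.hypot(lon - target[0], lat - target[1])
        if dist == 0.0:
            # A reading exactly at the target needs no interpolation.
            return value
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one reading is required")
    return weighted_sum / weight_total
```

Collapsing the nearby readings into a single interpolated value per gas, instead of one column per neighbour, is what would keep the column count down.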

## Generate Tabular Model Ensemble
Train a tabular model on `ghg_df` that predicts the central reading from the eight nearest readings and the CNN model's output.
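As a rough sketch of what one training row for such a model might look like, the helper below picks the eight nearest readings to a central point and places them alongside the CNN's prediction and the actual value. Everything here is hypothetical: the name `build_feature_row`, the column names, and the `(longitude, latitude, value)` layout are not taken from `fyputil` or `ghg_df`, and the sketch handles a single gas with a flat distance in degrees.

```python
import math


def build_feature_row(centre, readings, cnn_prediction, k=8):
    """Build one tabular training row for a central reading.

    centre:         (longitude, latitude, actual_value) of the point to predict.
    readings:       list of (longitude, latitude, value) tuples to search.
    cnn_prediction: the CNN's predicted value for the central point.
    k:              number of nearest neighbours to include (eight by default).
    """
    lon0, lat0, actual = centre
    # Exclude the central point itself, then sort the rest by distance.
    others = [r for r in readings if (r[0], r[1]) != (lon0, lat0)]
    others.sort(key=lambda r: math.hypot(r[0] - lon0, r[1] - lat0))
    neighbours = others[:k]
    if len(neighbours) < k:
        raise ValueError(f"need at least {k} neighbouring readings")

    row = {"longitude": lon0, "latitude": lat0}
    for i, (_, _, value) in enumerate(neighbours):
        row[f"neighbour_{i}"] = value
    row["cnn_prediction"] = cnn_prediction
    row["actual"] = actual  # the target the tabular model would learn
    return row
```

A list of such rows could be turned into a DataFrame with `pd.DataFrame(rows)` and used as the tabular model's training data, with `actual` as the dependent variable.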

# Random other offcuts from other notebooks 

Currently, the networks have trouble picking up on the more subtle characteristics of the images, which exposes some flaws in my work. To predict greenhouse gases accurately, the network will need some supplemental information. This may include the following:
- **Latitude/Longitude.** Geography may affect predictions. All the images in my current dataset are near London, meaning they have far more greenhouse gases than most places. Encoding a knowledge of city geography into a neural net may take some work.
- **Property Value.** How valuable is this land? This could go some way to encoding city dynamics, as well as explaining where the land might be. If land is rural, but valuable, it's likely to be near major cities or airports. 
- **Nearby GHG Values.** Combined with wind direction, an understanding of the source and direction of airflow may describe how areas inherit GHGs from elsewhere. An example of this would be the high concentration of NO<sub>2</sub> north of Heathrow Airport, which may be caused by common flight patterns heading north.
- **Wind Direction.** See above. 
- **Land Use.** Depending on detail, this may help alleviate the "grey field/massive factory" issue described in my log. By establishing that certain areas are rural, residential, or industrial, we can limit errors that come from inferring purely visual information. If we can specifically define what a large grey box is doing, we can also come to more developed conclusions about its purpose. A recycling center, an oil refinery, and a brewery may all look similar from above, but information about what they _are_ will make a neural network less likely to confuse them.
- **Population Density/Economic Output.** This will work in a similar way to property value, where we can predict human activity and its effects on greenhouse gases. Economic output may have a complex relationship to GHG emissions that cannot be easily represented, depending on the form of industry. For example, an eco-tourist attraction may rely on its low carbon footprint for survival, whereas a petrol station relies on high carbon output.
- **Land Height**