# UNICEF Arm 2030 Vision #1: Flood Prediction in Malawi

The objective of this challenge is to build a machine learning model that helps predict the location and extent of floods in southern Malawi.

https://zindi.africa/competitions/2030-vision-flood-prediction-in-malawi

# Plan

I. Initialization

1. Imports

2. Global settings

3. Load databases

4. Instantiate the Analysis object


II. Analysis

1. Overview

2. Analysis by features


III. Pre-process data

1. Impute NaNs

2. Remove outliers

3. Engineer features

4. Transform categorical features


IV. Analysis on clean data

1. Observe correlations


V. Predict

1. Prepare data, tools, model

2. Train model

3. Test model

4. Benchmarks

## Imports

In [4]:
# System
import importlib

# Data
import numpy as np
import pandas as pd

# Graphs
import seaborn as sns

# Analysis
import cobratools as cobra

# ML
import torch
from torch.utils.data import Dataset, DataLoader
from torch.nn import Linear
from torch.nn import functional as F
from torch import nn, optim

## Settings

## Load Data

In [5]:
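data = pd.read_csv(filepath_or_buffer='data/Train.csv',
                   sep=',',
                   low_memory=False,
                   # error_bad_lines=False was deprecated in pandas 1.3 and
                   # removed in 2.0; on_bad_lines='warn' also skips malformed
                   # rows and emits a warning for each one.
                   on_bad_lines='warn')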

## Reorganize Data

In [12]:
# Current data structure (wide format: one column per weekly precipitation period)
data.head()

Unnamed: 0,X,Y,target_2015,elevation,precip 2014-11-16 - 2014-11-23,precip 2014-11-23 - 2014-11-30,precip 2014-11-30 - 2014-12-07,precip 2014-12-07 - 2014-12-14,precip 2014-12-14 - 2014-12-21,precip 2014-12-21 - 2014-12-28,...,precip 2019-03-24 - 2019-03-31,precip 2019-03-31 - 2019-04-07,precip 2019-04-07 - 2019-04-14,precip 2019-04-14 - 2019-04-21,precip 2019-04-21 - 2019-04-28,precip 2019-04-28 - 2019-05-05,precip 2019-05-05 - 2019-05-12,precip 2019-05-12 - 2019-05-19,LC_Type1_mode,Square_ID
0,34.26,-15.91,0.0,887.764222,0.0,0.0,0.0,14.844025,14.552823,12.237766,...,0.896323,1.68,0.0,0.0,0.0,0.0,0.0,0.0,9,4e3c3896-14ce-11ea-bce5-f49634744a41
1,34.26,-15.9,0.0,743.403912,0.0,0.0,0.0,14.844025,14.552823,12.237766,...,0.896323,1.68,0.0,0.0,0.0,0.0,0.0,0.0,9,4e3c3897-14ce-11ea-bce5-f49634744a41
2,34.26,-15.89,0.0,565.728343,0.0,0.0,0.0,14.844025,14.552823,12.237766,...,0.896323,1.68,0.0,0.0,0.0,0.0,0.0,0.0,9,4e3c3898-14ce-11ea-bce5-f49634744a41
3,34.26,-15.88,0.0,443.392774,0.0,0.0,0.0,14.844025,14.552823,12.237766,...,0.896323,1.68,0.0,0.0,0.0,0.0,0.0,0.0,10,4e3c3899-14ce-11ea-bce5-f49634744a41
4,34.26,-15.87,0.0,437.443428,0.0,0.0,0.0,14.844025,14.552823,12.237766,...,0.896323,1.68,0.0,0.0,0.0,0.0,0.0,0.0,10,4e3c389a-14ce-11ea-bce5-f49634744a41


In [None]:
# Proposed data structure (long format: one row per square and date)

# 2 Data Tables

# Square_ID | Date | Pos_X | Pos_Y | Precipitation | Elevation | LC_Type1 
# --------- | ---- | ----- | ----- | --------------| --------- | --------
# id1       | J    |        
# id1       | J-1  |          ...
# ...       | ...  |
# id1       | J-n  |
# id2       | J    |
# ...       | ...  |
# id2       | J-n  |
# ...

# This layout scales: older dates can be added almost without limit
# by appending rows, whereas the current wide layout needs one new
# column per added period.

# However, it adds overhead: the table is much longer, and viewing
# or modifying the data requires more complex queries.
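
The reshaping described above can be sketched with `pandas.DataFrame.melt`. This is only an illustration, not the notebook's final implementation: the helper name `wide_to_long` is hypothetical, the sample frame is a two-square, two-week excerpt built from the values shown in `data.head()` with placeholder IDs, and using the start of each weekly period as `Date` is an assumption. `target_2015` is not part of the table sketched above, so it is left out here.

```python
import pandas as pd


def wide_to_long(wide: pd.DataFrame) -> pd.DataFrame:
    """Reshape one-column-per-week precipitation into one row per (square, date)."""
    # Weekly columns are named like "precip 2014-11-16 - 2014-11-23".
    precip_cols = [c for c in wide.columns if c.startswith("precip ")]
    id_cols = ["Square_ID", "X", "Y", "elevation", "LC_Type1_mode"]

    long = wide.melt(id_vars=id_cols, value_vars=precip_cols,
                     var_name="period", value_name="Precipitation")

    # Assumption: the first date of each period is used as the row's date.
    start = long["period"].str.extract(r"^precip (\d{4}-\d{2}-\d{2})", expand=False)
    long["Date"] = pd.to_datetime(start)

    long = long.rename(columns={"X": "Pos_X", "Y": "Pos_Y",
                                "elevation": "Elevation",
                                "LC_Type1_mode": "LC_Type1"})
    cols = ["Square_ID", "Date", "Pos_X", "Pos_Y",
            "Precipitation", "Elevation", "LC_Type1"]
    # Most recent date first within each square (J, J-1, ..., J-n).
    return (long[cols]
            .sort_values(["Square_ID", "Date"], ascending=[True, False])
            .reset_index(drop=True))


# Tiny excerpt in the wide layout; "id1"/"id2" are placeholder IDs.
sample = pd.DataFrame({
    "X": [34.26, 34.26],
    "Y": [-15.91, -15.90],
    "elevation": [887.764222, 743.403912],
    "precip 2014-12-07 - 2014-12-14": [14.844025, 14.844025],
    "precip 2014-12-14 - 2014-12-21": [14.552823, 14.552823],
    "LC_Type1_mode": [9, 9],
    "Square_ID": ["id1", "id2"],
})

long_df = wide_to_long(sample)
print(long_df)

# The extra query cost mentioned above: getting back one column per date
# needs an explicit pivot.
wide_again = long_df.pivot(index="Square_ID", columns="Date", values="Precipitation")
```

Because the static attributes (position, elevation, land cover) are repeated on every row, the long table is larger than the wide one. That repetition is part of the overhead noted above.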
