## Task

Conduct exploratory data analysis (EDA) for a TensorFlow transfer learning pipeline that forecasts **weekly dengue cases** (`total_cases`) from 22 multivariate weather/environmental features.

#### Notebook sections
1. Get Data
2. Exploratory Data Analysis
3. Data Cleaning (TBC)
4. Feature Selection (TBC, possibly notebook 02)
5. Feature Engineering (TBC, possibly notebook 02)
6. Benchmark Model
7. Model Tuning (TBC)
8. Model Evaluation (TBC, possibly notebook 03)

In [1]:
import sys
import os
from pathlib import Path

# Add the parent directory (project root) to sys.path so that `src` is importable
if os.path.abspath("..") not in sys.path:
    sys.path.insert(0, os.path.abspath(".."))
    
from src.config import ProjectConfig  # project config file parser

import pandas as pd
import numpy as np
import random
import time
from datetime import timedelta

import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

def random_color():
    """
    Return a random named Matplotlib colour (just for fun).
    """
    # Look up the name -> colour mapping once, then pick one entry at random
    named_colors = mcolors.get_named_colors_mapping()
    return random.choice(list(named_colors.values()))

In [2]:
cnfg = ProjectConfig.load_configuration()
PATH_TO_RAW_DATA = cnfg.data.dirs["raw"]
FILE_TRAIN_RAW = cnfg.data.files["features_train"]

### Get Data

In [4]:
df_raw = pd.read_csv(PATH_TO_RAW_DATA / FILE_TRAIN_RAW)
df_raw.sample(1)

Unnamed: 0,city,year,weekofyear,week_start_date,ndvi_ne,ndvi_nw,ndvi_se,ndvi_sw,precipitation_amt_mm,reanalysis_air_temp_k,...,reanalysis_precip_amt_kg_per_m2,reanalysis_relative_humidity_percent,reanalysis_sat_precip_amt_mm,reanalysis_specific_humidity_g_per_kg,reanalysis_tdtr_k,station_avg_temp_c,station_diur_temp_rng_c,station_max_temp_c,station_min_temp_c,station_precip_mm
1121,iq,2004,4,2004-01-22,0.320586,0.336529,0.2566,0.344114,29.23,299.62,...,19.4,82.467143,29.23,17.491429,11.228571,28.733333,11.4,34.6,21.8,20.1
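
A quick first look at a freshly loaded frame usually covers its shape, column dtypes, and missing values. The sketch below is one possible way to do that, not the notebook's own method. It runs on a one-row subset copied from the sample output above, so `sample_csv`, `df_sample`, and the helper `first_look` are illustrative names that do not appear in the project. It also checks whether `weekofyear` agrees with the ISO week of `week_start_date`, on the assumption (not confirmed by the source) that the dataset uses ISO week numbering.

```python
import io

import pandas as pd

# One-row sample copied from the output above (subset of columns only).
sample_csv = io.StringIO(
    "city,year,weekofyear,week_start_date,ndvi_ne,precipitation_amt_mm,station_avg_temp_c\n"
    "iq,2004,4,2004-01-22,0.320586,29.23,28.733333\n"
)
df_sample = pd.read_csv(sample_csv, parse_dates=["week_start_date"])


def first_look(df):
    """Hypothetical helper: per-column dtype and count of missing values."""
    return pd.DataFrame(
        {"dtype": df.dtypes.astype(str), "n_missing": df.isna().sum()}
    )


summary = first_look(df_sample)
print(df_sample.shape)
print(summary)

# Assumption: weekofyear follows ISO week numbering. Compare it with the
# ISO week derived from the parsed start date.
iso_week = df_sample["week_start_date"].dt.isocalendar().week
week_matches = (iso_week.astype(int) == df_sample["weekofyear"]).all()
print("weekofyear matches ISO week:", bool(week_matches))
```

The same helper can be applied to the full frame with `first_look(df_raw)`, after parsing `week_start_date` as a date.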
