# Tables S5 and S7: Comparison of MOSAIKS, fine-tuned ResNet-18, pre-trained ResNet-152, and hybrid ResNet-18/MOSAIKS models

This notebook produces a table comparing MOSAIKS to alternative SIML (satellite imagery and machine learning) approaches:

- A fine-tuned ResNet-18 CNN
- A pre-trained ResNet-152 CNN
- A hybrid ResNet-18/MOSAIKS model

The fine-tuned ResNet-18 and pre-trained ResNet-152 models are trained in [train_CNN.py](../../Fig3_diagnostics/train_CNN.py) and [run_pretrained_resnet_regressions.ipynb](../../Fig3_diagnostics/run_pretrained_resnet_regressions.ipynb), so this notebook simply loads the out-of-sample predictions from those previously trained models. Most of the notebook is dedicated to creating and training the hybrid model. This model concatenates the last hidden layer of the ResNet-18 model (512 features) with the MOSAIKS features (8192 features) and runs a ridge regression on the concatenated feature set, allowing a different regularization parameter for each of the two feature sources.

Because the CNN features have already been trained on our entire 80k training+validation set, we train on this 80k, choose our hyperparameters using 10k of the remaining 20k (all 20k of which are reserved for testing in all other analyses), and report performance on the final 10k. To ensure comparability of results, we also re-evaluate both the pure MOSAIKS model and the pure ResNet-18 model, trained and evaluated on the same sets.

The final output table contains: (1) the original MOSAIKS test set results, (2) the original CNN test set results, (3 and 4) the corresponding results from both models on the harmonized 10k test set, (5) the hybrid model results on this harmonized test set, and (6) the pre-trained ResNet-152 model results. The ResNet-152 model is not re-tested on the harmonized 10k test set because its performance is not close to that of the other candidate models. These results are used to populate Supplementary Materials Tables S5 and S7.
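The differential regularization described above can be illustrated with a small NumPy sketch. The internals of `cnn.get_hybrid_adjust_weights_transformer` are not shown in this notebook, so the sketch assumes the two penalties are obtained by rescaling the CNN columns by `sqrt(l_rat)`, with `l_rat = lr / lc`, and then fitting an ordinary ridge regression with penalty `lr`. The function names and toy data are hypothetical. The sketch checks that this rescaling gives the same coefficients as a ridge regression with a separate penalty on each feature block.

```python
import numpy as np


def ridge_two_penalties(X, y, n_cnn_feat, lam_rcf, lam_cnn):
    """Closed-form ridge with one penalty per feature block (CNN columns last)."""
    penalties = np.full(X.shape[1], lam_rcf, dtype=float)
    penalties[-n_cnn_feat:] = lam_cnn
    return np.linalg.solve(X.T @ X + np.diag(penalties), X.T @ y)


def ridge_via_rescaling(X, y, n_cnn_feat, lam_rcf, lam_cnn):
    """Same solution from a single-penalty ridge on rescaled CNN columns."""
    l_rat = lam_rcf / lam_cnn
    scale = np.ones(X.shape[1])
    scale[-n_cnn_feat:] = np.sqrt(l_rat)
    X_scaled = X * scale  # multiply each CNN column by sqrt(l_rat)
    beta_scaled = np.linalg.solve(
        X_scaled.T @ X_scaled + lam_rcf * np.eye(X.shape[1]), X_scaled.T @ y
    )
    # map coefficients back to the original (unscaled) feature units
    return beta_scaled * scale


# toy data: 4 "RCF" columns followed by 2 "CNN" columns (hypothetical sizes)
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 6))
y_toy = rng.normal(size=50)

beta_direct = ridge_two_penalties(X_toy, y_toy, 2, lam_rcf=1.0, lam_cnn=100.0)
beta_rescaled = ridge_via_rescaling(X_toy, y_toy, 2, lam_rcf=1.0, lam_cnn=100.0)
print(np.allclose(beta_direct, beta_rescaled))
```

Under this assumption, a two-dimensional grid over `(lc, lr)` only needs a standard ridge solver, which is consistent with the pipeline below passing `l_rat` to the transformer and `lr` to the regressor.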

In [5]:
%load_ext autoreload
%autoreload 2

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [6]:
import os
import warnings
from dataclasses import dataclass
from os.path import join
from pathlib import Path

import numpy as np
import pandas as pd
from joblib import dump, load
from mosaiks import config as c
from mosaiks import transforms as m_transforms
from mosaiks.solve import cnn
from mosaiks.solve import data_parser as parse
from mosaiks.utils import io
from scipy.linalg import LinAlgWarning
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.pipeline import Pipeline
from threadpoolctl import threadpool_limits
from torch.nn import Sequential

In [7]:
fixed_lambda = False
overwrite = None

SUBSET_N = None
SUBSET_FEAT = None
OVERWRITE_CNN_FEAT = False
NUM_THREADS = None
LABELS_TO_RUN = "all"

L_CNN = np.logspace(-8, 6, 15)
L_RCF = np.logspace(-2, 5, 8)

In [8]:
if NUM_THREADS is not None:
    threadpool_limits(NUM_THREADS)
    os.environ["NUMBA_NUM_THREADS"] = str(NUM_THREADS)
if overwrite is None:
    # environment variables are strings, so parse the flag explicitly
    overwrite = os.getenv("MOSAIKS_OVERWRITE", "").lower() in ("1", "true", "yes")
if LABELS_TO_RUN == "all":
    LABELS_TO_RUN = c.app_order

subset_str = ""
if (SUBSET_N is not None) or (SUBSET_FEAT is not None):
    subset_str = "_subset"

save_patt = join(
    "{save_dir}",
    "model_cnnHybrid_{label}_{variable}_CONTUS_16_640_{sampling}_"
    f"{c.sampling['n_samples']}_{c.sampling['seed']}_"
    f"{c.features['random']['patch_size']}_"
    f"{c.features['random']['seed']}{subset_str}.pickle",
)

out_dir = Path(c.res_dir) / "tables" / "TableS5"
out_dir.mkdir(exist_ok=True, parents=True)

## Create CNN feature vectors

In [10]:
def get_model(task, remove_fc=True):
    test_r2, cnn_out = io.load_cnn_performance(task, c, extra_return=["model"])
    if remove_fc:
        cnn_out.fc = Sequential()
    return test_r2, cnn_out


def get_cnn_ids(task):
    model = load(
        Path(c.data_dir) / "output" / "cnn_comparison" / f"resnet18_{task}.pickle"
    )
    return {"test": model["ids_test"], "train": model["ids_train"]}


def load_cnn_feats(task, c):
    sample = getattr(c, task)["sampling"]
    subgrid_path = c.grid_paths[sample]
    cnn_ids = np.load(subgrid_path)["ID"].astype(str)
    cnn_feat_path = Path(c.features_dir) / f"CONTUS_{sample}_resnet18_{task}.npy"
    cnn_feats = np.load(cnn_feat_path)
    cnn_feats = pd.DataFrame(
        cnn_feats, index=cnn_ids, columns=[f"XR_{i}" for i in range(cnn_feats.shape[1])]
    )
    return cnn_feats


@dataclass
class DataCategory:
    X: np.ndarray
    Y: np.ndarray
    latlons: np.ndarray
    ids: np.ndarray


@dataclass
class AllData:
    train: DataCategory
    val: DataCategory
    test: DataCategory


def split_data(task, X, latlons, c):

    c = io.get_filepaths(c, task)
    c_app = getattr(c, task)

    Y = io.get_Y(c, c_app["colname"])

    # merge
    Y, X, latlons, ids = parse.merge(
        Y, X, latlons, pd.Series(Y.index.values, index=Y.index)
    )

    # drop nulls
    Y, valid = m_transforms.dropna_Y(Y, task)
    X, latlons, ids = map(lambda x: x[valid], (X, latlons, ids))

    # apply transform
    X, Y, latlons = getattr(m_transforms, f"transform_{task}")(
        X, Y, latlons, c_app["logged"]
    )

    # split train/test to match CNN
    cnn_test_ids = get_cnn_ids(task)
    in_test = np.isin(ids, cnn_test_ids["test"])
    in_train = np.isin(ids, cnn_test_ids["train"])
    X_test, Y_test, latlons_test, ids_test = map(
        lambda x: x[in_test], (X, Y, latlons, ids)
    )
    X_train, Y_train, latlons_train, ids_train = map(
        lambda x: x[in_train], (X, Y, latlons, ids)
    )

    # split test set in half for validation and test set
    rng = np.random.default_rng(c.ml_model["seed"])
    val_ixs = rng.choice(Y_test.shape[0], size=int(Y_test.shape[0] / 2), replace=False)
    all_ixs = np.arange(Y_test.shape[0])
    test_ixs = all_ixs[~np.isin(all_ixs, val_ixs)]
    X_val, Y_val, latlons_val, ids_val = map(
        lambda x: x[val_ixs], (X_test, Y_test, latlons_test, ids_test)
    )
    X_test, Y_test, latlons_test, ids_test = map(
        lambda x: x[test_ixs], (X_test, Y_test, latlons_test, ids_test)
    )

    # subset
    X_train = X_train[slice(SUBSET_N), slice(SUBSET_FEAT)]
    X_val = X_val[:, slice(SUBSET_FEAT)]
    X_test = X_test[:, slice(SUBSET_FEAT)]
    Y_train = Y_train[slice(SUBSET_N)]
    latlons_train = latlons_train[slice(SUBSET_N)]
    ids_train = ids_train[slice(SUBSET_N)]  # keep ids aligned with the subsetted rows

    train = DataCategory(X_train, Y_train, latlons_train, ids_train)
    val = DataCategory(X_val, Y_val, latlons_val, ids_val)
    test = DataCategory(X_test, Y_test, latlons_test, ids_test)

    return AllData(train, val, test)
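The validation/test split inside `split_data` can be viewed in isolation. The following is a minimal sketch, not the notebook's own helper: `split_in_half` is a hypothetical name, and the seed value is a placeholder for `c.ml_model["seed"]`. It shows how a seeded draw without replacement partitions the 20k held-out observations into two disjoint 10k halves.

```python
import numpy as np


def split_in_half(n_obs, seed):
    """Randomly partition range(n_obs) into validation and test index arrays."""
    rng = np.random.default_rng(seed)
    # draw half of the indices without replacement for validation
    val_ixs = rng.choice(n_obs, size=n_obs // 2, replace=False)
    # everything not drawn becomes the final test set
    is_val = np.zeros(n_obs, dtype=bool)
    is_val[val_ixs] = True
    test_ixs = np.flatnonzero(~is_val)
    return val_ixs, test_ixs


val_ixs, test_ixs = split_in_half(20_000, seed=0)  # seed=0 is a placeholder
print(len(val_ixs), len(test_ixs))
```

Because the generator is seeded, re-running the split in a later cell reproduces the same validation and test halves, which is what lets the MOSAIKS and ResNet-18 re-evaluations share the hybrid model's test set.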

In [11]:
models = {}
models["POP"] = {}
models["UAR"] = {}
for i in LABELS_TO_RUN:
    sample = getattr(c, i)["sampling"]
    _, models[sample][i] = get_model(i)

In [16]:
for sample in ["UAR", "POP"]:
    to_write = {
        task: model
        for task, model in models[sample].items()
        if OVERWRITE_CNN_FEAT
        or (
            not (
                Path(c.features_dir) / f"CONTUS_{sample}_resnet18_{task}.npy"
            ).is_file()
        )
    }

    # skip if nothing to overwrite
    if len(to_write) == 0:
        continue

    outputs = {i: [] for i in to_write}
    # get paths
    subgrid_path = c.grid_paths[sample]
    img_dir = Path(c.data_dir) / "raw" / "imagery" / f"CONTUS_{sample}"
    grid = np.load(subgrid_path)
    y = grid["lat"]  # nonsense y var
    ids = grid["ID"].astype(str)

    # configure dataloader
    dl = cnn.get_dataloader(img_dir, y, ids, shuffle=False, subset=SUBSET_N)

    for _, img, _ in dl:
        for task, model in to_write.items():
            outputs[task].append(model(img).detach().numpy())

    for task in outputs:
        outputs[task] = np.concatenate(outputs[task], axis=0)
        np.save(
            Path(c.features_dir) / f"CONTUS_{sample}_resnet18_{task}.npy",
            outputs[task],
        )

## Run regressions

In [9]:
X_all, latlons_all = {}, {}
X_all["UAR"], latlons_all["UAR"] = io.get_X_latlon(c, "UAR")
X_all["POP"], latlons_all["POP"] = io.get_X_latlon(c, "POP")

In [11]:
# run on all tasks
bad_tasks = []
for tx, task in enumerate(LABELS_TO_RUN):
    print(f"Running regressions for task {tx+1}/{len(LABELS_TO_RUN)}: {task}")

    # get general paths
    c = io.get_filepaths(c, task)
    c_app = getattr(c, task)
    sample = c_app["sampling"]

    # Get save path
    save_path = Path(
        save_patt.format(
            save_dir=c.fig_dir_prim,
            label=task,
            variable=c_app["variable"],
            sampling=sample,
        )
    )

    best_lr = None
    best_lc = None
    this_l_cnn = L_CNN.copy()
    this_l_rcf = L_RCF.copy()
    this_l_cnn_save = L_CNN.copy()
    this_l_rcf_save = L_RCF.copy()
    already_run_cnn = []
    already_run_rcf = []
    ill_conditioned_rcf = []
    ill_conditioned_cnn = []
    if save_path.is_file():
        model = load(save_path)
        if fixed_lambda:
            params = model.get_params()
            l_rat = params["transform__kw_args"]["l_rat"]
            lr = params["regress__regressor__alpha"]
            lc = lr / l_rat
            this_l_cnn = np.array([lc])
            this_l_rcf = np.array([lr])
            hp_hits_boundary_prev = model.hp_hits_boundary
        elif not overwrite:
            already_run_cnn = model.lambdas_cnn
            already_run_rcf = model.lambdas_rcf
            best_lr = model.best_lr
            best_lc = model.best_lc
            ill_conditioned_cnn = getattr(model, "ill_conditioned_cnn", [])
            ill_conditioned_rcf = getattr(model, "ill_conditioned_rcf", [])
            if (
                np.isin(this_l_cnn, already_run_cnn).all()
                and np.isin(this_l_rcf, already_run_rcf).all()
            ):
                print(
                    f"{task} task output file already exists and no new "
                    "hyperparameters are being tested."
                )
            this_l_cnn_save = np.sort(
                np.unique(np.concatenate((this_l_cnn_save, already_run_cnn)))
            )
            this_l_rcf_save = np.sort(
                np.unique(np.concatenate((this_l_rcf_save, already_run_rcf)))
            )
            this_l_cnn = this_l_cnn[~np.isin(this_l_cnn, ill_conditioned_cnn)]
            this_l_rcf = this_l_rcf[~np.isin(this_l_rcf, ill_conditioned_rcf)]

    # load X
    print("...Loading data")
    X, latlons = X_all[sample], latlons_all[sample]

    # load cnn features
    cnn_feats = load_cnn_feats(task, c)
    n_cnn_feat = cnn_feats.shape[1]

    # merge with RCF features
    print("...Merging RCF/CNN")
    X = X.join(cnn_feats)

    # load y and split all data into train/val/test based on CNN test set
    print("...Splitting train/test")
    data = split_data(task, X, latlons, c)

    # train model
    best_score = -np.inf
    ridge_regr = cnn.get_bounded_ridge_regressor(c, task)
    pipe = Pipeline(
        [
            (
                "transform",
                cnn.get_hybrid_adjust_weights_transformer(n_cnn_feat=n_cnn_feat),
            ),
            ("regress", ridge_regr),
        ]
    )

    with warnings.catch_warnings():
        warnings.filterwarnings("error", category=LinAlgWarning)

        if not fixed_lambda:
            # test for ill conditioning within CNN and skip these hp's if so to save time in
            # 2D grid search
            print("...Testing CNN only model")
            for lcx, lc in enumerate(this_l_cnn):
                ridge_regr.set_params(regressor__alpha=lc)
                try:
                    ridge_regr.fit(data.train.X[:, -n_cnn_feat:], data.train.Y)
                    score = ridge_regr.score(data.val.X[:, -n_cnn_feat:], data.val.Y)
                    if score > best_score:
                        best_pipe = ridge_regr
                        best_lc = lc
                        best_lr = None
                        best_score = score
                    break
                except LinAlgWarning:
                    pass
            if lcx != 0:
                print(
                    f"......Dropping first {lcx} CNN lambdas in grid search due to "
                    "ill-conditioning"
                )
            # record the dropped values before truncating the grid
            ill_conditioned_cnn += list(this_l_cnn[:lcx])
            this_l_cnn = this_l_cnn[lcx:]

            # test for ill conditioning within RCF and skip these hp's if so
            print("...Testing RCF only model")
            for lrx, lr in enumerate(this_l_rcf):
                ridge_regr.set_params(regressor__alpha=lr)
                try:
                    ridge_regr.fit(data.train.X[:, :-n_cnn_feat], data.train.Y)
                    score = ridge_regr.score(data.val.X[:, :-n_cnn_feat], data.val.Y)
                    if score > best_score:
                        best_pipe = ridge_regr
                        best_lc = None
                        best_lr = lr
                        best_score = score
                    break
                except LinAlgWarning:
                    pass
            if lrx != 0:
                print(
                    f"......Dropping first {lrx} RCF lambdas in grid search due to "
                    "ill-conditioning"
                )
            # record the dropped values before truncating the grid
            ill_conditioned_rcf += list(this_l_rcf[:lrx])
            this_l_rcf = this_l_rcf[lrx:]

        # now do grid search over remaining sets of both hyperparameters
        for lcx, lc in enumerate(this_l_cnn):
            print(f"...Testing CNN lambda {lcx+1} / {len(this_l_cnn)}")
            for lrx, lr in enumerate(this_l_rcf):
                print(f"......Testing RCF lambda {lrx+1} / {len(this_l_rcf)}", end="")
                if lr in already_run_rcf and lc in already_run_cnn:
                    print(".........skipping b/c hp set already run in previous search")
                    continue
                l_rat = lr / lc
                this_pipe = pipe.set_params(
                    transform__kw_args={
                        "l_rat": l_rat,
                        "n_cnn_feat": n_cnn_feat,
                    },
                    regress__regressor__alpha=lr,
                )

                # ignore models that raise an ill-conditioned warning
                try:
                    this_pipe.fit(data.train.X, data.train.Y)
                except LinAlgWarning:
                    print("...skipped due to ill-condition warning")
                    continue
                print("")

                score = this_pipe.score(data.val.X, data.val.Y)
                if score > best_score:
                    best_pipe = this_pipe
                    best_lc = lc
                    best_lr = lr
                    best_score = score

    if best_lr is None:
        bad_tasks.append(task)
        continue

    # refit model
    best_pipe.set_params(
        transform__kw_args={
            "l_rat": best_lr / best_lc,
            "n_cnn_feat": n_cnn_feat,
        },
        regress__regressor__alpha=best_lr,
    )
    best_pipe.fit(data.train.X, data.train.Y)
    best_pipe.val_r2 = best_score
    best_pipe.test_r2 = best_pipe.score(data.test.X, data.test.Y)
    best_pipe.lambdas_cnn = this_l_cnn_save
    best_pipe.lambdas_rcf = this_l_rcf_save
    best_pipe.best_lc = best_lc
    best_pipe.best_lr = best_lr
    best_pipe.ill_conditioned_cnn = ill_conditioned_cnn
    best_pipe.ill_conditioned_rcf = ill_conditioned_rcf
    if fixed_lambda:
        best_pipe.hp_hits_boundary = hp_hits_boundary_prev
    else:
        best_pipe.hp_hits_boundary = {
            "cnn": {
                "upper": best_lc == this_l_cnn_save[-1],
                "lower": best_lc == this_l_cnn_save[0],
            },
            "rcf": {
                "upper": best_lr == this_l_rcf_save[-1],
                "lower": best_lr == this_l_rcf_save[0],
            },
        }

    # save model
    dump(best_pipe, save_path)


if len(bad_tasks) > 0:
    raise ValueError(
        f"No non-ill-conditioned hyperparameter values available for tasks: {bad_tasks}"
    )

Running regressions for task 1/7: treecover
...Loading data
...Merging RCF/CNN
...Splitting train/test
...Testing CNN lambda 1 / 1
......Testing RCF lambda 1 / 1
Running regressions for task 2/7: elevation
...Loading data
...Merging RCF/CNN
...Splitting train/test
...Testing CNN lambda 1 / 1
......Testing RCF lambda 1 / 1
Running regressions for task 3/7: population
...Loading data
...Merging RCF/CNN
...Splitting train/test
...Testing CNN lambda 1 / 1
......Testing RCF lambda 1 / 1
Running regressions for task 4/7: nightlights
...Loading data
...Merging RCF/CNN
...Splitting train/test
...Testing CNN lambda 1 / 1
......Testing RCF lambda 1 / 1
Running regressions for task 5/7: income
...Loading data
...Merging RCF/CNN
...Splitting train/test
...Testing CNN lambda 1 / 1
......Testing RCF lambda 1 / 1
Running regressions for task 6/7: roads
...Loading data
...Merging RCF/CNN
...Splitting train/test
...Testing CNN lambda 1 / 1
......Testing RCF lambda 1 / 1
Running regressions for task 7/7
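
The pre-screening step in the cell above promotes ill-conditioning warnings to exceptions and drops every penalty that precedes the first clean fit. A minimal standard-library sketch of that pattern follows. `IllConditionedWarning` and `toy_fit` are hypothetical stand-ins for `scipy.linalg.LinAlgWarning` and the ridge fit, and the sketch assumes the grid is sorted in ascending order so that ill-conditioning occurs at the smallest penalties.

```python
import warnings


class IllConditionedWarning(RuntimeWarning):
    """Hypothetical stand-in for scipy.linalg.LinAlgWarning."""


def toy_fit(lam):
    """Hypothetical fit that warns when the penalty is too small."""
    if lam < 1e-4:
        warnings.warn("ill-conditioned system", IllConditionedWarning)
    return lam


def drop_leading_ill_conditioned(lambdas, fit):
    """Return (usable, dropped): the grid from the first clean fit onward,
    and the leading values that triggered a warning."""
    with warnings.catch_warnings():
        # turn the warning into an exception so it can be caught below
        warnings.filterwarnings("error", category=IllConditionedWarning)
        for ix, lam in enumerate(lambdas):
            try:
                fit(lam)
            except IllConditionedWarning:
                continue
            return list(lambdas[ix:]), list(lambdas[:ix])
    # no value produced a clean fit
    return [], list(lambdas)


usable, dropped = drop_leading_ill_conditioned([1e-8, 1e-6, 1e-4, 1e-2], toy_fit)
print(usable, dropped)
```

Returning the dropped values alongside the usable ones keeps the bookkeeping independent of the order in which the grid is truncated.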

## Validate that best chosen model is not hitting hyperparameter bounds

In [12]:
for task in c.app_order:

    print(task)

    # get general paths
    c = io.get_filepaths(c, task)
    c_app = getattr(c, task)
    sample = c_app["sampling"]
    subgrid_path = c.grid_paths[sample]

    # Get save path
    save_path = Path(
        save_patt.format(
            save_dir=c.fig_dir_prim,
            label=task,
            variable=c_app["variable"],
            sampling=c_app["sampling"],
        )
    )

    best_pipe = load(save_path)

    print(best_pipe.hp_hits_boundary)
    print(best_pipe.best_lc, best_pipe.best_lr, best_pipe.test_r2)
    print()

treecover
{'cnn': {'upper': False, 'lower': False}, 'rcf': {'upper': False, 'lower': False}}
1e-06 1.0 0.9425213101647336

elevation
{'cnn': {'upper': False, 'lower': False}, 'rcf': {'upper': False, 'lower': False}}
100.0 1.0 0.8071998090934125

population
{'cnn': {'upper': False, 'lower': False}, 'rcf': {'upper': False, 'lower': False}}
1e-06 1.0 0.8131466254822377

nightlights
{'cnn': {'upper': False, 'lower': False}, 'rcf': {'upper': False, 'lower': False}}
10000.0 1.0 0.9006131808056724

income
{'cnn': {'upper': False, 'lower': False}, 'rcf': {'upper': False, 'lower': False}}
99999.99999999999 1.0 0.5061013970456906

roads
{'cnn': {'upper': False, 'lower': False}, 'rcf': {'upper': False, 'lower': False}}
10000.0 1.0 0.5922103197828472

housing
{'cnn': {'upper': False, 'lower': False}, 'rcf': {'upper': False, 'lower': False}}
99999.99999999999 1.0 0.665529435859699
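
The boundary check reported above compares the selected penalty with the endpoints of the searched grid. As the printed value `99999.99999999999` shows, `np.logspace` does not always return exact powers of ten, so the sketch below uses a relative tolerance instead of exact equality. This is an illustrative alternative, not the notebook's code, and `hits_boundary` is a hypothetical helper name.

```python
import numpy as np


def hits_boundary(best, grid, rtol=1e-9):
    """Flag whether the selected penalty sits on either edge of the grid."""
    grid = np.sort(np.asarray(grid, dtype=float))
    return {
        # atol=0 so that very small penalties are compared on a relative scale only
        "upper": bool(np.isclose(best, grid[-1], rtol=rtol, atol=0.0)),
        "lower": bool(np.isclose(best, grid[0], rtol=rtol, atol=0.0)),
    }


l_cnn = np.logspace(-8, 6, 15)  # same CNN grid as L_CNN above
print(hits_boundary(1e5, l_cnn))
print(hits_boundary(1e6, l_cnn))
```

A `True` in either slot would suggest widening the grid in that direction before trusting the selected model.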



## Get MOSAIKS predictions using the same train and test dataset

In [21]:
# get mosaiks output pattern
mosaiks_patt = join(
    "{save_dir}",
    "outcomes_scatter_obsAndPred_{label}_{variable}_CONTUS_16_640_{sampling}_"
    f"{c.sampling['n_samples']}_{c.sampling['seed']}_random_features_"
    f"{c.features['random']['patch_size']}_"
    f"{c.features['random']['seed']}{subset_str}.data",
)

In [22]:
scores = pd.DataFrame(
    index=pd.Index(LABELS_TO_RUN, name="task"),
    columns=[
        "mosaiks",
        "mosaiks_10ktest",
        "resnet18",
        "resnet18_10ktest",
        "hybrid_10ktest",
    ],
)
for task in LABELS_TO_RUN:
    print(f"Running for task: {task}...")
    # get general paths
    c = io.get_filepaths(c, task)
    c_app = getattr(c, task)
    sample = c_app["sampling"]

    # Get optimal lambda and test r2
    mosaiks_outpath = Path(
        mosaiks_patt.format(
            save_dir=c.fig_dir_prim,
            label=task,
            variable=c_app["variable"],
            sampling=c_app["sampling"],
        )
    )
    test_model = load(
        mosaiks_outpath.parent / mosaiks_outpath.name.replace("scatter", "testset")
    )
    scores.loc[task, "mosaiks"] = r2_score(test_model["truth"], test_model["preds"])

    best_lambda = load(mosaiks_outpath)["best_lambda"]
    assert len(best_lambda) == 1
    best_lambda = best_lambda[0]

    print("...Loading data")

    # load last layer of CNN features and concatenate onto mosaiks in order to split
    # the same way
    cnn_feats = load_cnn_feats(task, c)
    n_cnn_feat = cnn_feats.shape[1]
    this_X = X_all[sample].join(cnn_feats, how="left")

    # load y and split all data into train/val/test based on CNN test set
    print("...Splitting train/test")
    data = split_data(task, this_X, latlons_all[sample], c)

    print("...Retraining and predicting using MOSAIKS")

    # train and evaluate MOSAIKS model on new 80k train/10k test dataset
    ridge_regr = Ridge(fit_intercept=False, random_state=0, alpha=best_lambda)
    ridge_regr.fit(data.train.X[:, :-n_cnn_feat], data.train.Y)
    scores.loc[task, "mosaiks_10ktest"] = ridge_regr.score(
        data.test.X[:, :-n_cnn_feat], data.test.Y
    )

    # evaluate RESNET on both 20k (original result) and 10k (harmonized test set)
    print("...predicting using Resnet")
    test_r2, model = get_model(task, remove_fc=False)
    scores.loc[task, "resnet18"] = test_r2
    weights = model.fc.weight.detach().numpy().T
    cnn_pred = np.dot(data.test.X[:, -n_cnn_feat:], weights)

    mean = data.train.Y.mean()
    std = data.train.Y.std()
    cnn_pred = cnn_pred * std + mean

    cnn_pred = cnn.clip_bounds(cnn_pred, c_app)
    scores.loc[task, "resnet18_10ktest"] = r2_score(data.test.Y, cnn_pred)

    # load hybrid model results
    hyb_outpath = Path(
        save_patt.format(
            save_dir=c.fig_dir_prim,
            label=task,
            variable=c_app["variable"],
            sampling=c_app["sampling"],
        )
    )
    scores.loc[task, "hybrid_10ktest"] = load(hyb_outpath).test_r2

# merge pretrained results
pretrained_outpath = (
    Path(c.data_dir)
    / "output"
    / "cnn_comparison"
    / "TestSetR2_resnet152_1e5_pretrained.csv"
)
pretrained = pd.read_csv(
    pretrained_outpath, header=None, index_col=0, names=["pretrained"]
)
pretrained.index.name = "task"
scores = scores.join(pretrained)

scores = pd.DataFrame(scores)
scores

Running for task: treecover...
...Loading data
...Retraining and predicting using MOSAIKS
...predicting using Resnet
Running for task: elevation...
...Loading data
...Retraining and predicting using MOSAIKS
...predicting using Resnet
Running for task: population...
...Loading data
...Retraining and predicting using MOSAIKS
...predicting using Resnet
Running for task: nightlights...
...Loading data
...Retraining and predicting using MOSAIKS
...predicting using Resnet
Running for task: income...
...Loading data
...Retraining and predicting using MOSAIKS
...predicting using Resnet
Running for task: roads...
...Loading data
...Retraining and predicting using MOSAIKS
...predicting using Resnet
Running for task: housing...
...Loading data
...Retraining and predicting using MOSAIKS
...predicting using Resnet


| task | mosaiks | mosaiks_10ktest | resnet18 | resnet18_10ktest | hybrid_10ktest | pretrained |
|---|---|---|---|---|---|---|
| treecover | 0.913321 | 0.894866 | 0.941586 | 0.940609 | 0.942521 | 0.657796 |
| elevation | 0.680964 | 0.681033 | 0.799261 | 0.803045 | 0.8072 | 0.315231 |
| population | 0.724783 | 0.714906 | 0.801206 | 0.808143 | 0.813147 | 0.289154 |
| nightlights | 0.846592 | 0.849689 | 0.890472 | 0.89124 | 0.900613 | 0.47535 |
| income | 0.452314 | 0.4535 | 0.473877 | 0.46613 | 0.506101 | 0.07085 |
| roads | 0.533398 | 0.532579 | 0.575179 | 0.579945 | 0.59221 | 0.160777 |
| housing | 0.586562 | 0.609621 | 0.495744 | 0.561459 | 0.665529 | 0.011157 |
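
The ResNet-18 re-evaluation in the cell above applies the final linear layer's weights to the stored last-layer features, maps the result back to label units using the training mean and standard deviation, clips it, and scores it with R². A minimal NumPy sketch of those steps follows. It assumes the network was trained on standardized labels and, like the cell above, ignores any bias term. `predict_from_last_layer`, the bounds, and the toy arrays are hypothetical, and `np.clip` stands in for `cnn.clip_bounds`, whose internals are not shown here.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def predict_from_last_layer(feats, weights, y_train, lower=None, upper=None):
    """Linear prediction on standardized scale, mapped back to label units."""
    z = feats @ weights  # prediction in standardized units
    pred = z * y_train.std() + y_train.mean()  # undo the standardization
    if lower is not None or upper is not None:
        pred = np.clip(pred, lower, upper)  # stand-in for cnn.clip_bounds
    return pred


# toy example with two features (hypothetical values)
feats_toy = np.eye(2)
weights_toy = np.array([1.0, -1.0])
y_train_toy = np.array([0.0, 2.0])  # mean 1, standard deviation 1
pred_toy = predict_from_last_layer(feats_toy, weights_toy, y_train_toy, lower=0.5)
print(pred_toy)
```

Scoring with `r2(data.test.Y, cnn_pred)` on the harmonized 10k test set then corresponds to the `resnet18_10ktest` column.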


In [25]:
scores.to_csv(out_dir / "MOSAIKS_vs_CNN.csv")