# Visualization of Domain Shift: or How I Suffered the Gap
As the competition host notes, there is a significant domain gap between the train set and the test set. To mitigate this, I built an adversarial validation model that predicts whether a given image comes from the train set or the test set. As I mentioned [here](https://www.kaggle.com/c/seti-breakthrough-listen/discussion/265921), this task is so easy that my cross-validation AUC reaches 0.999. In this notebook, I share some visualizations and what I tried (and failed) to close the train/test domain shift.


## Setup

In [None]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [None]:
av = pd.read_csv("../input/seti-adversarial-validation/oof.csv")
tmp = pd.read_csv("../input/seti-breakthrough-listen/train_labels.csv")
tmp["label"] = tmp["target"]
av = pd.merge(av, tmp[["id", "label"]], on="id", how="left")

# target: 0 = train, 1 = test; pred: test-likeness; label: needle or not (NaN for test rows)
display(av)

## Some Statistics
Train and test are clearly separated.

In [None]:
# Semi-transparent bars keep both distributions visible where they overlap
plt.hist(av.query("target == 0")["pred"], bins=100, alpha=0.5, label="train")
plt.hist(av.query("target == 1")["pred"], bins=100, alpha=0.5, label="test")
plt.xlabel("Predicted test-likeness")
plt.ylabel("Count")
plt.legend()
plt.grid()
plt.show()
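
The separation visible in the histogram can also be summarized as a single number. Below is a minimal sketch, assuming `av` has the `target` and `pred` columns described above. `rank_auc` is a hypothetical helper (not part of the original notebook) that computes the ROC AUC from ranks via the Mann-Whitney U statistic, and the toy dataframe only stands in for the real `oof.csv`.

```python
import pandas as pd

def rank_auc(target: pd.Series, pred: pd.Series) -> float:
    """ROC AUC via the Mann-Whitney U statistic (ties get average ranks)."""
    ranks = pred.rank(method="average")
    n_pos = int((target == 1).sum())  # test samples
    n_neg = int((target == 0).sum())  # train samples
    # Rank sum of the positive (test) samples, shifted to a U statistic
    u = ranks[target == 1].sum() - n_pos * (n_pos + 1) / 2
    return float(u / (n_pos * n_neg))

# Toy stand-in for `av` (assumption); use the real dataframe in the notebook
toy = pd.DataFrame({
    "target": [0, 0, 0, 1, 1, 1],
    "pred":   [0.01, 0.02, 0.60, 0.40, 0.90, 0.99],
})
print(rank_auc(toy["target"], toy["pred"]))
```

In the notebook, `rank_auc(av["target"], av["pred"])` should give a value close to the cross-validation AUC if the out-of-fold predictions come from the same model.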

Most train samples receive a predicted score below 0.01, regardless of the needle label.

In [None]:
fig, ax = plt.subplots(2, 2, figsize=(12, 6), tight_layout=True)

ax[0,0].plot(av.query("target == 0 and label == 0").sort_values("pred")[::-1]["pred"].values)
ax[0,0].set_xlabel("#Rank")
ax[0,0].set_ylabel("Predicted score")
ax[0,0].set_title("Label=0")
ax[0,0].grid()

ax[0,1].plot(av.query("target == 0 and label == 0").sort_values("pred")[::-1]["pred"].values)
ax[0,1].set_xlabel("#Rank")
ax[0,1].set_ylabel("Predicted score")
ax[0,1].set_title("Label=0 (log scale)")
ax[0,1].set_yscale("log")
ax[0,1].grid()

ax[1,0].plot(av.query("target == 0 and label == 1").sort_values("pred")[::-1]["pred"].values)
ax[1,0].set_xlabel("#Rank")
ax[1,0].set_ylabel("Predicted score")
ax[1,0].set_title("Label=1")
ax[1,0].grid()

ax[1,1].plot(av.query("target == 0 and label == 1").sort_values("pred")[::-1]["pred"].values)
ax[1,1].set_xlabel("#Rank")
ax[1,1].set_ylabel("Predicted score")
ax[1,1].set_title("Label=1 (log scale)")
ax[1,1].set_yscale("log")
ax[1,1].grid()

plt.show()
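
The claim that most train samples score below 0.01 for both labels can be checked numerically. A minimal sketch, assuming the `target`, `label`, and `pred` columns of `av` as described in the setup; `fraction_below` is a hypothetical helper and the toy dataframe is made up for illustration.

```python
import pandas as pd

def fraction_below(av: pd.DataFrame, threshold: float = 0.01) -> pd.Series:
    """Share of train samples (target == 0) with pred < threshold, per needle label."""
    train = av[av["target"] == 0]
    return (train["pred"] < threshold).groupby(train["label"]).mean()

# Toy stand-in for `av` (assumption); test rows have no needle label
toy = pd.DataFrame({
    "target": [0, 0, 0, 0, 1],
    "label":  [0, 0, 1, 1, float("nan")],
    "pred":   [0.001, 0.5, 0.002, 0.003, 0.99],
})
print(fraction_below(toy))
```

If the two fractions are similar on the real data, the adversarial model is not simply picking up on the presence of a needle.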

## Compare Cadences
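Below, I compare the cadences of four groups: typical train (train-like train), non-typical train (test-like train), non-typical test (train-like test), and typical test (test-like test). "Typical train" means train samples with a low predicted score; the other groups are defined analogously.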

Key findings:
- Typical train samples have "waves" in the background
- Typical test samples are smoother and more uniform
- Typical test samples have weaker waves in the diagonal direction

In [None]:
def get_train_filename_by_id(_id: str) -> str:
    return f"../input/seti-breakthrough-listen/train/{_id[0]}/{_id}.npy"

def get_test_filename_by_id(_id: str) -> str:
    return f"../input/seti-breakthrough-listen/test/{_id[0]}/{_id}.npy"

### Train-like Train

In [None]:
fig, ax = plt.subplots(6, 5, figsize=(12, 16), tight_layout=True)
ax = ax.flatten()

# 30 train samples with the lowest test-likeness score
for i, (_id, label) in enumerate(av.query("target == 0").sort_values("pred")[:30][["id", "label"]].values):
    x = np.load(get_train_filename_by_id(_id))
    # Stack the cadence snippets vertically into one 2D image
    x = np.vstack(x).astype(np.float32)
    ax[i].imshow(x, aspect=256/1638)
    ax[i].set_title(f"{_id}: {int(label)}")

### Test-like Train

In [None]:
fig, ax = plt.subplots(6, 5, figsize=(12, 16), tight_layout=True)
ax = ax.flatten()

# 30 train samples with the highest test-likeness score
for i, (_id, label) in enumerate(av.query("target == 0").sort_values("pred")[-30:][["id", "label"]].values):
    x = np.load(get_train_filename_by_id(_id))
    # Stack the cadence snippets vertically into one 2D image
    x = np.vstack(x).astype(np.float32)
    ax[i].imshow(x, aspect=256/1638)
    ax[i].set_title(f"{_id}: {int(label)}")

### Train-like Test

In [None]:
fig, ax = plt.subplots(6, 5, figsize=(12, 16), tight_layout=True)
ax = ax.flatten()

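# 30 test samples with the lowest test-likeness score (test rows have no needle label)
for i, _id in enumerate(av.query("target == 1").sort_values("pred")[:30]["id"].values):
    x = np.load(get_test_filename_by_id(_id))
    # Stack the cadence snippets vertically into one 2D image
    x = np.vstack(x).astype(np.float32)
    ax[i].imshow(x, aspect=256/1638)
    ax[i].set_title(f"{_id}")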

### Test-like Test

In [None]:
fig, ax = plt.subplots(6, 5, figsize=(12, 16), tight_layout=True)
ax = ax.flatten()

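# 30 test samples with the highest test-likeness score (test rows have no needle label)
for i, _id in enumerate(av.query("target == 1").sort_values("pred")[-30:]["id"].values):
    x = np.load(get_test_filename_by_id(_id))
    # Stack the cadence snippets vertically into one 2D image
    x = np.vstack(x).astype(np.float32)
    ax[i].imshow(x, aspect=256/1638)
    ax[i].set_title(f"{_id}")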

## Next Steps
I trained a needle detection model using only the test-like train samples, but it was hard to train. When I set the test-likeness threshold to 0.1 or 0.01, the model did not generalize at all (the validation loss did not decrease even with a random split). The threshold probably needs to be around 0.0001 to keep enough training samples.
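
To see how many training samples survive each threshold, something like the following could be used. This is a minimal sketch, assuming the `target`, `label`, and `pred` columns of `av`; `count_test_like_train` is a hypothetical helper, and the toy dataframe is made up and does not reflect the real counts.

```python
import pandas as pd

def count_test_like_train(av: pd.DataFrame, thresholds) -> pd.DataFrame:
    """For each threshold, count train samples with pred >= threshold and how many are needles."""
    train = av[av["target"] == 0]
    rows = []
    for t in thresholds:
        kept = train[train["pred"] >= t]
        rows.append({
            "threshold": t,
            "n_kept": len(kept),
            "n_needle": int((kept["label"] == 1).sum()),
        })
    return pd.DataFrame(rows)

# Toy stand-in for `av` (assumption); use the real dataframe in the notebook
toy = pd.DataFrame({
    "target": [0, 0, 0, 0, 1],
    "label":  [0, 1, 0, 1, float("nan")],
    "pred":   [0.00005, 0.0005, 0.02, 0.2, 0.99],
})
counts = count_test_like_train(toy, [0.1, 0.01, 0.0001])
print(counts)
```

Checking `n_needle` alongside `n_kept` matters because a subset with too few positive samples may be hard to train on regardless of its total size.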