# `nearest_record` example

This notebook demonstrates `nearest_record` from the `synthimpute` package, using the `mpg` sample dataset.

## Setup

In [1]:
import synthimpute as si
import pandas as pd
import numpy as np
from scipy.spatial.distance import euclidean

In [2]:
mpg = pd.read_csv(
    "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/mpg.csv"
)
# Drop the categorical columns (origin, name) and horsepower, which has missing values.
mpg.drop(["origin", "name", "horsepower"], axis=1, inplace=True)

## Synthesize

In [3]:
synth = si.rf_synth(mpg, ["cylinders"], random_state=0)

Synthesizing feature 1 of 5: acceleration...
Synthesizing feature 2 of 5: mpg...
Synthesizing feature 3 of 5: weight...
Synthesizing feature 4 of 5: model_year...
Synthesizing feature 5 of 5: displacement...


## `nearest_record`

In [4]:
nearest = si.nearest_record(synth, mpg, metric="euclidean")
nearest.head()

|   | id_A | id_B | dist |
|---|------|------|------|
| 0 | 0 | 271 | 0.348291 |
| 1 | 1 | 97 | 0.622349 |
| 2 | 2 | 236 | 10.34488 |
| 3 | 3 | 386 | 0.096679 |
| 4 | 4 | 84 | 1.956868 |
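
To make the output above concrete, here is a minimal sketch of a brute-force nearest-record search that returns the same three columns. It is not `synthimpute`'s implementation: the function name `nearest_record_sketch` and the toy data are hypothetical, and it assumes `id_A` and `id_B` are positional row numbers.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist


def nearest_record_sketch(XA, XB, metric="euclidean"):
    """For each row of XA, find the closest row of XB (brute force)."""
    # Pairwise distance matrix of shape (len(XA), len(XB)).
    dist = cdist(XA.to_numpy(dtype=float), XB.to_numpy(dtype=float), metric=metric)
    # Position of the closest XB row for each XA row.
    closest = dist.argmin(axis=1)
    return pd.DataFrame(
        {
            "id_A": np.arange(len(XA)),
            "id_B": closest,
            "dist": dist[np.arange(len(XA)), closest],
        }
    )


# Toy data, invented for illustration (not taken from the mpg dataset).
toy_a = pd.DataFrame({"x": [0.0, 5.0], "y": [0.0, 5.0]})
toy_b = pd.DataFrame({"x": [3.0, 5.0, 0.0], "y": [4.0, 6.0, 1.0]})
toy_nearest = nearest_record_sketch(toy_a, toy_b)
print(toy_nearest)
```

The full distance matrix takes memory proportional to `len(XA) * len(XB)`, which is fine for a dataset the size of `mpg` but may not scale to large inputs.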


Verify that the distance in the first row of `nearest` matches SciPy's `euclidean()` applied to the same pair of records.

In [5]:
euclidean(synth.iloc[0], mpg.iloc[int(nearest.iloc[0].id_B)])

0.3482910985210263
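
The same check can be extended from the first row to every row. The sketch below assumes a `nearest`-style frame with `id_A`, `id_B`, and `dist` columns holding positional row numbers, as in the output shown earlier; the helper `distances_match` and the toy frames are hypothetical stand-ins for `synth`, `mpg`, and `nearest`.

```python
import numpy as np
import pandas as pd


def distances_match(a, b, nearest, tol=1e-6):
    """Recompute each Euclidean distance and compare it with nearest['dist']."""
    # Select the paired rows by position.
    rows_a = a.to_numpy(dtype=float)[nearest["id_A"].to_numpy(dtype=int)]
    rows_b = b.to_numpy(dtype=float)[nearest["id_B"].to_numpy(dtype=int)]
    # Row-wise Euclidean distance between each pair.
    recomputed = np.linalg.norm(rows_a - rows_b, axis=1)
    return bool(np.allclose(recomputed, nearest["dist"].to_numpy(), atol=tol))


# Toy stand-ins with invented values.
toy_a = pd.DataFrame({"x": [0.0, 5.0], "y": [0.0, 5.0]})
toy_b = pd.DataFrame({"x": [3.0, 5.0, 0.0], "y": [4.0, 6.0, 1.0]})
toy_nearest = pd.DataFrame({"id_A": [0, 1], "id_B": [2, 1], "dist": [1.0, 1.0]})
print(distances_match(toy_a, toy_b, toy_nearest))
```

This only confirms that each recorded distance is correct for its pair; it does not confirm that the pair is actually the closest one.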