# Build Target

As a recap, the [training data](../data/processed/train-physicists-from-1901.csv) is a list of physicists who were eligible to receive a Nobel Prize in Physics. That is, they were still alive on or after 10 December 1901, the date the prize was first awarded. All of the physicists in the list are deceased. The data was deliberately sampled this way because the aim is to use the training set to build models that predict whether a physicist who is still alive has been, or is likely to be, awarded the Nobel Prize in Physics.
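The eligibility rule above amounts to a date filter: keep only physicists who died on or after the first award date. The sketch below illustrates that rule on toy data. It assumes a hypothetical `deathDate` column (the real files may name or store dates differently) and is not the preprocessing that actually produced the training file.

```python
import pandas as pd

# Date the Nobel Prize in Physics was first awarded.
FIRST_AWARD_DATE = pd.Timestamp('1901-12-10')

# Hypothetical toy data: 'deathDate' is an assumed column name for illustration only.
physicists = pd.DataFrame({
    'fullName': ['Physicist A', 'Physicist B', 'Physicist C'],
    'deathDate': pd.to_datetime(['1879-11-05', '1901-12-10', '1955-04-18']),
})

# Eligible if still alive on 10 December 1901, i.e. died on or after that date.
eligible = physicists[physicists['deathDate'] >= FIRST_AWARD_DATE]
print(eligible['fullName'].tolist())  # ['Physicist B', 'Physicist C']
```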

It is finally time to use the training data, together with the collected list of [Nobel Physics Laureates](../data/raw/nobel-physics-prize-laureates.csv), to create the target, which indicates whether each physicist is a *Nobel Laureate in Physics*.

In [None]:
import pandas as pd

## Reading in the Data

First, let's read in the training data and the list of Nobel Physics laureates.

In [None]:
train_physicists = pd.read_csv(
    '../data/processed/train-physicists-from-1901.csv')
train_physicists.head()

In [None]:
nobel_physicists = pd.read_csv(
    '../data/raw/nobel-physics-prize-laureates.csv')
nobel_physicists.head()

## Creating the Target

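It is now time to create the target from the data I have collected. A physicist is labelled `yes` if their full name (`fullName`) exactly matches a name in the `Laureate` column of the laureates data, and `no` otherwise.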

In [None]:
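def build_target(full_name, laureate):
    """Label each physicist 'yes' if their full name is in laureate, else 'no'."""
    # Exact (case-sensitive) membership test of each name against the laureates
    is_laureate = full_name.isin(laureate)
    target = is_laureate.map({True: 'yes', False: 'no'})
    target.name = 'physics_laureate'
    return target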
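To see what the labelling does, here is a standalone toy sketch of the same membership-then-map logic. The names are hypothetical stand-ins for `train_physicists.fullName` and `nobel_physicists.Laureate`.

```python
import pandas as pd

# Hypothetical toy inputs standing in for the full names and the laureates.
full_name = pd.Series(['Physicist A', 'Physicist B', 'physicist c'])
laureate = pd.Series(['Physicist B', 'Physicist C', 'Living Laureate'])

# Exact, case-sensitive membership test, then map booleans to 'yes'/'no'.
target = full_name.isin(laureate).map({True: 'yes', False: 'no'})
target.name = 'physics_laureate'

print(target.tolist())  # ['no', 'yes', 'no']
```

The third name shows that the comparison is exact: a difference in case or spelling between the two files silently produces a `no`.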

In [None]:
target = build_target(train_physicists.fullName, nobel_physicists.Laureate)
assert len(target) == len(train_physicists)
assert isinstance(target, pd.Series)
assert (target == 'yes').sum() == 123
target
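Since the training set contains only deceased physicists, laureates who are still alive cannot match any training name; any other unmatched laureate may point to a name mismatch between the two files. A minimal sketch of listing unmatched laureate names, on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy inputs standing in for train_physicists.fullName
# and nobel_physicists.Laureate.
full_name = pd.Series(['Physicist A', 'Physicist B', 'Physicist C'])
laureate = pd.Series(['Physicist B', 'Physicist C', 'Living Laureate'])

# Laureate names that do not appear anywhere in the training names.
unmatched = sorted(set(laureate) - set(full_name))
print(unmatched)  # ['Living Laureate']
```

Inspecting this list by eye is a quick way to confirm that each miss is explained (for example, a living laureate) rather than caused by inconsistent spelling.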

## Persisting the Data

Now that I have the training target series, I'll persist it for future use.

In [None]:
target.to_csv('../data/processed/train-target.csv', index=False, header=True)

Let's perform a quick sanity check to make sure the data is as expected.

In [None]:
# read_csv returns a one-column DataFrame; squeeze('columns') turns it into a Series
target_on_disk = pd.read_csv('../data/processed/train-target.csv').squeeze('columns')
assert target_on_disk.equals(target)
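The `squeeze=True` keyword of `read_csv` was deprecated in pandas 1.4 and removed in pandas 2.0, so calling `DataFrame.squeeze('columns')` on the result is the portable way to get a Series back. The sketch below shows the same write/read round trip on a hypothetical toy target, using an in-memory buffer in place of the CSV file.

```python
import io

import pandas as pd

# Hypothetical toy target standing in for the real training target.
target = pd.Series(['no', 'yes', 'no'], name='physics_laureate')

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
target.to_csv(buffer, index=False, header=True)
buffer.seek(0)

# Read back as a one-column DataFrame, then squeeze it into a Series.
target_on_disk = pd.read_csv(buffer).squeeze('columns')

print(target_on_disk.name)            # physics_laureate
print(target_on_disk.equals(target))  # True
```

Writing with `header=True` keeps the series name in the file, which is what lets the reloaded Series recover `physics_laureate` as its name.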