# Create a Learner for inference

In [None]:
from fastai.gen_doc.nbdoc import *

In this tutorial, we'll see how the same API allows you to create an empty [`DataBunch`](/basic_data.html#DataBunch) for a [`Learner`](/basic_train.html#Learner) at inference time (once you have trained your model) and how to call the `predict` method to get the predictions on a single item.

In [None]:
jekyll_note("""As usual, this page is generated from a notebook that you can find in the <code>docs_src</code> folder of the
<a href="https://github.com/fastai/fastai">fastai repo</a>. We use the saved models from <a href="/tutorial.data.html">this tutorial</a> to
have this notebook run quickly.""")


## Tabular

This section covers tabular data. First, let's import everything we'll need.

In [2]:
from fastai.tabular import *

We'll use a sample of the [adult dataset](https://archive.ics.uci.edu/ml/datasets/adult) here. Once we have read the csv file, we need to specify the dependent variable, the categorical variables, the continuous variables and the processors we want to use.

In [3]:
adult = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(adult/'adult.csv')


In [20]:
def train_test_split(df, train_percent=.8, seed=None):
    np.random.seed(seed)
    perm = np.random.permutation(df.index)
    m = len(df.index)
    train_end = int(train_percent * m)
    train = df.loc[perm[:train_end]]
    test = df.loc[perm[train_end:]]
    return train, test

In [21]:
df = df.reset_index(drop=True)

In [22]:
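# Pass the seed to the helper: it calls np.random.seed(seed) itself,
# so a global np.random.seed(42) beforehand would be overridden by seed=None
train_df, test_df = train_test_split(df, seed=42)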

In [23]:
len(train_df), len(test_df)

(26048, 6513)
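
Because the helper calls `np.random.seed(seed)` itself, the split is only repeatable when the seed is passed to the helper. The sketch below checks this on a hypothetical ten-row frame (the `toy` data is an assumption made for illustration, not part of the adult dataset).

```python
import numpy as np
import pandas as pd

def train_test_split(df, train_percent=.8, seed=None):
    # Mirrors the helper above: shuffle the index, then cut at train_percent
    np.random.seed(seed)
    perm = np.random.permutation(df.index)
    train_end = int(train_percent * len(df.index))
    return df.loc[perm[:train_end]], df.loc[perm[train_end:]]

# Hypothetical toy frame, used only for illustration
toy = pd.DataFrame({'x': range(10)})

train_a, test_a = train_test_split(toy, seed=42)
train_b, test_b = train_test_split(toy, seed=42)

# Same seed: same rows in the same order
same_split = train_a.index.equals(train_b.index) and test_a.index.equals(test_b.index)
# The two parts are disjoint and together cover every row
covers_all = sorted(train_a.index.tolist() + test_a.index.tolist()) == list(range(10))
print(len(train_a), len(test_a), same_split, covers_all)
```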

In [24]:
dep_var = '>=50k'


In [25]:
def unique_deps(x:Series)->Tuple[List, OrderedDict]:
    "Return the sorted unique values of `x` and the de-duplicating `OrderedDict` they came from."
    od = OrderedDict.fromkeys(x)
    res = sorted(od.keys())
    return res, od

In [26]:
classes, od = unique_deps(df[dep_var].values)
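
To make the two return values concrete, here is a small standalone sketch of the same idea using only the standard library. The function name `unique_sorted` and the labels are hypothetical and chosen for illustration.

```python
from collections import OrderedDict

def unique_sorted(values):
    # OrderedDict.fromkeys drops duplicates while keeping first-seen order
    seen = OrderedDict.fromkeys(values)
    return sorted(seen.keys()), seen

# Hypothetical labels, for illustration only
labels = ['yes', 'no', 'no', 'yes', 'no']
toy_classes, toy_od = unique_sorted(labels)
print(toy_classes)          # unique values in sorted order
print(list(toy_od.keys()))  # unique values in first-seen order
```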

In [27]:
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'native-country']
cont_names = ['education-num', 'hours-per-week', 'age', 'capital-loss', 'fnlwgt', 'capital-gain']
procs = [FillMissing, Categorify, Normalize]
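
As a rough mental model of the three processors, the following sketch reproduces their effect in plain pandas on a hypothetical frame. It is a conceptual illustration under simplifying assumptions (median fill, sample standard deviation), not fastai's implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
    'workclass': ['Private', 'State-gov', 'Private', 'Self-emp'],
    'age': [39.0, np.nan, 52.0, 28.0],
})

# 1. Fill missing continuous values (here with the column median)
median_age = toy['age'].median()
toy['age_na'] = toy['age'].isna()          # remember which rows were filled
toy['age'] = toy['age'].fillna(median_age)

# 2. Turn the categorical column into integer codes
toy['workclass'] = toy['workclass'].astype('category')
codes = toy['workclass'].cat.codes

# 3. Standardize the continuous column to zero mean and unit standard deviation
mean, std = toy['age'].mean(), toy['age'].std()
toy['age'] = (toy['age'] - mean) / std

print(toy)
print(codes.tolist())
```

In practice the statistics (median, categories, mean and standard deviation) would be computed on the training set and reused for validation and test data.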

Then we can use the data block API to assemble everything into a [`DataBunch`](/basic_data.html#DataBunch), attaching `test_df` as the test set.

In [28]:
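# The test set is attached with add_test; it is processed with the training set's procs
test = TabularList.from_df(test_df, path=adult, cat_names=cat_names, cont_names=cont_names)
data = (TabularList.from_df(train_df, path=adult, cat_names=cat_names, cont_names=cont_names, procs=procs)
                           .split_by_idx(valid_idx=range(800,1000))
                           .label_from_df(cols=dep_var)
                           .add_test(test)
                           .databunch())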



We then define a [`Learner`](/basic_train.html#Learner), fit it for one epoch, and save the model.

In [29]:
learn = tabular_learner(data, layers=[200,100], metrics=accuracy)
learn.fit(1, 1e-2)
learn.save('mini_train')

epoch,train_loss,valid_loss,accuracy
1,0.339178,0.359460,0.825000


In [None]:
# Predictions on the test set; get_preds needs a test set attached to the DataBunch
preds, y = learn.get_preds(DatasetType.Test)

In [None]:
# With a cross-entropy loss, get_preds already applies the softmax, so no np.exp is needed
probs = preds.numpy()
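
Exponentiating predictions only recovers probabilities when they are log-probabilities. A quick sanity check, sketched here with NumPy on hypothetical values (the helper `looks_like_probs` is an assumption made for illustration), is that every row of a probability array sums to 1.

```python
import numpy as np

# Hypothetical probabilities for three items and two classes
toy_probs = np.array([[0.9, 0.1],
                      [0.3, 0.7],
                      [0.5, 0.5]])
toy_log_probs = np.log(toy_probs)

def looks_like_probs(a):
    # Every value in [0, 1] and every row summing to 1
    return bool(np.all((a >= 0) & (a <= 1)) and np.allclose(a.sum(axis=1), 1.0))

already_probs = looks_like_probs(toy_probs)              # probabilities as they are
exp_of_logs = looks_like_probs(np.exp(toy_log_probs))    # exp undoes the log
exp_of_probs = looks_like_probs(np.exp(toy_probs))       # exp of probabilities is not a distribution
print(already_probs, exp_of_logs, exp_of_probs)
```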

In [None]:
indexes = list(test_df.index.values)

In [None]:
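# Map each test row to its most likely class and that class's probability
d = {}  # row index -> predicted class
p = {}  # row index -> probability of the predicted class
for indx, prob in zip(indexes, probs):
    max_idx = np.argmax(prob)
    max_val = prob[max_idx].item()
    p[indx] = max_val
    prob_c = classes[max_idx]
    d[indx] = prob_c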

In [None]:
df_preds = pd.DataFrame([d, p])
df_preds = df_preds.T

In [None]:
df_preds.rename(columns={0: dep_var, 1: 'Probability'}, inplace=True)

In [None]:
df_preds.head(n=2)
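
The loop, transpose and rename above can also be written with vectorized NumPy operations. The following sketch uses hypothetical probabilities, row labels and class names as stand-ins for `probs`, `indexes` and `classes`; it is one possible formulation, not the only one.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for probs, indexes and classes
toy_probs = np.array([[0.9, 0.1],
                      [0.3, 0.7],
                      [0.4, 0.6]])
toy_indexes = [17, 4, 23]
toy_classes = [0, 1]
dep_var = '>=50k'

max_idx = toy_probs.argmax(axis=1)                  # most likely class per row
toy_preds = pd.DataFrame({
    dep_var: np.asarray(toy_classes)[max_idx],      # predicted class label
    'Probability': toy_probs.max(axis=1),           # probability of that class
}, index=toy_indexes)
print(toy_preds)
```

Building the frame from a dict of arrays names the columns directly, so the transpose and rename steps are not needed.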