# fastai v2 Kernel Starter Code

The goal of this kernel is to show how to train a neural network with fastai 2.0 for this Kaggle competition.

## Grabbing the Library

First we need to enable internet access within this kernel and then `!git clone` the `fastai_dev` repository for us to import from.

In [None]:
#!git clone https://github.com/fastai/fastai_dev.git
#%cd fastai_dev/dev

We're going to need a variety of imports, most importantly the `tabular.core` module for building the dataset (the rest deal with training the model).

In [1]:
from fastai2.data.all import *
from fastai2.tabular.core import *
from fastai2.tabular.model import *
from fastai2.optimizer import *
from fastai2.learner import *
from fastai2.metrics import *
from fastai2.callback.all import *

## Setting Up Our Data

Let's make a `Path` object pointing to our data and combine `train.csv` with `building_metadata.csv` to grab some more information about these meter readings. For simplicity you can train on a subset of the training set; the `iloc` line below (commented out here) would keep the first 5000 rows. For the `DataFrame` preparation, please see ryches' kernel [here](https://www.kaggle.com/ryches/simple-lgbm-solution).

In [4]:
path = Path('../input')

In [5]:
train = pd.read_csv(path/'train.csv')
#train = train.iloc[:5000]
bldg = pd.read_csv(path/'building_metadata.csv')
weather_train = pd.read_csv(path/"weather_train.csv")

In [6]:
train = train[np.isfinite(train['meter_reading'])]

In [7]:
train['meter_reading'].min(), train['meter_reading'].max()

(0.0, 21904700.0)

In [8]:
train['meter_reading'] = np.log1p(train['meter_reading'])

In [9]:
train['meter_reading'].min(), train['meter_reading'].max()

(0.0, 16.902211829285342)

In [10]:
np.all(np.isfinite(train['meter_reading']))

True
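
The cells above train on `log1p(meter_reading)` rather than the raw reading, which squeezes a range of 0 to roughly 21.9 million into 0 to about 16.9. Here is a minimal sketch of the round trip, using the standard-library `math` module as a stand-in for the vectorised `np.log1p` / `np.expm1` (the sample readings other than the maximum are made up for illustration):

```python
import math

# Raw readings: the maximum is the one reported above, the others are illustrative.
raw_readings = [0.0, 12.5, 21904700.0]

# log1p(x) = log(1 + x) is defined at x = 0, unlike a plain log.
logged = [math.log1p(r) for r in raw_readings]

# expm1 is the inverse of log1p, so values in log space can be mapped
# back to the original units.
restored = [math.expm1(v) for v in logged]

print(logged)
print(restored)
```

Any predictions from a model trained on the transformed target would need the same `expm1` step before being compared with raw meter readings.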

In [11]:
bldg.head()

Unnamed: 0,site_id,building_id,primary_use,square_feet,year_built,floor_count
0,0,0,Education,7432,2008.0,
1,0,1,Education,2720,2004.0,
2,0,2,Education,5376,1991.0,
3,0,3,Education,23685,2002.0,
4,0,4,Education,116607,1975.0,


In [12]:
train = train.merge(bldg, left_on = 'building_id', right_on = 'building_id', how = 'left')

In [13]:
train.head()

Unnamed: 0,building_id,meter,timestamp,meter_reading,site_id,primary_use,square_feet,year_built,floor_count
0,0,0,2016-01-01 00:00:00,0.0,0,Education,7432,2008.0,
1,1,0,2016-01-01 00:00:00,0.0,0,Education,2720,2004.0,
2,2,0,2016-01-01 00:00:00,0.0,0,Education,5376,1991.0,
3,3,0,2016-01-01 00:00:00,0.0,0,Education,23685,2002.0,
4,4,0,2016-01-01 00:00:00,0.0,0,Education,116607,1975.0,


In [14]:
weather_train.head()

Unnamed: 0,site_id,timestamp,air_temperature,cloud_coverage,dew_temperature,precip_depth_1_hr,sea_level_pressure,wind_direction,wind_speed
0,0,2016-01-01 00:00:00,25.0,6.0,20.0,,1019.7,0.0,0.0
1,0,2016-01-01 01:00:00,24.4,,21.1,-1.0,1020.2,70.0,1.5
2,0,2016-01-01 02:00:00,22.8,2.0,21.1,0.0,1020.2,0.0,0.0
3,0,2016-01-01 03:00:00,21.1,2.0,20.6,0.0,1020.1,0.0,0.0
4,0,2016-01-01 04:00:00,20.0,2.0,20.0,-1.0,1020.0,250.0,2.6


In [15]:
train = train.merge(weather_train, left_on = ['site_id', 'timestamp'], right_on = ['site_id', 'timestamp'])
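
Note that this second merge does not pass `how`, so pandas falls back to its default inner join, whereas the building merge above used `how = 'left'`. A small sketch of the difference on toy frames (the `readings` and `weather` frames and their values are invented for illustration):

```python
import pandas as pd

# Three readings, but weather is only available for site 0.
readings = pd.DataFrame({
    "site_id": [0, 0, 1],
    "timestamp": ["2016-01-01 00:00:00", "2016-01-01 01:00:00", "2016-01-01 00:00:00"],
    "meter_reading": [0.0, 1.5, 2.0],
})
weather = pd.DataFrame({
    "site_id": [0, 0],
    "timestamp": ["2016-01-01 00:00:00", "2016-01-01 01:00:00"],
    "air_temperature": [25.0, 24.4],
})

# Default (inner) join: readings without a matching weather row are dropped.
inner = readings.merge(weather, on=["site_id", "timestamp"])

# Left join: every reading is kept, missing weather becomes NaN.
left = readings.merge(weather, on=["site_id", "timestamp"], how="left")

print(len(inner), len(left))
print(left["air_temperature"].isna().sum())
```

If keeping every meter reading matters, `how='left'` combined with `FillMissing` later on is one option; the kernel as written keeps only readings that have a weather row.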

In [16]:
train

Unnamed: 0,building_id,meter,timestamp,meter_reading,site_id,primary_use,square_feet,year_built,floor_count,air_temperature,cloud_coverage,dew_temperature,precip_depth_1_hr,sea_level_pressure,wind_direction,wind_speed
0,0,0,2016-01-01 00:00:00,0.000000,0,Education,7432,2008.0,,25.0,6.0,20.0,,1019.7,0.0,0.0
1,1,0,2016-01-01 00:00:00,0.000000,0,Education,2720,2004.0,,25.0,6.0,20.0,,1019.7,0.0,0.0
2,2,0,2016-01-01 00:00:00,0.000000,0,Education,5376,1991.0,,25.0,6.0,20.0,,1019.7,0.0,0.0
3,3,0,2016-01-01 00:00:00,0.000000,0,Education,23685,2002.0,,25.0,6.0,20.0,,1019.7,0.0,0.0
4,4,0,2016-01-01 00:00:00,0.000000,0,Education,116607,1975.0,,25.0,6.0,20.0,,1019.7,0.0,0.0
5,5,0,2016-01-01 00:00:00,0.000000,0,Education,8000,2000.0,,25.0,6.0,20.0,,1019.7,0.0,0.0
6,6,0,2016-01-01 00:00:00,0.000000,0,Lodging/residential,27926,1981.0,,25.0,6.0,20.0,,1019.7,0.0,0.0
7,7,0,2016-01-01 00:00:00,0.000000,0,Education,121074,1989.0,,25.0,6.0,20.0,,1019.7,0.0,0.0
8,8,0,2016-01-01 00:00:00,0.000000,0,Education,60809,2003.0,,25.0,6.0,20.0,,1019.7,0.0,0.0
9,9,0,2016-01-01 00:00:00,0.000000,0,Office,27000,2010.0,,25.0,6.0,20.0,,1019.7,0.0,0.0


In [17]:
del weather_train

In [18]:
train["timestamp"] = pd.to_datetime(train["timestamp"])
train["hour"] = train["timestamp"].dt.hour
train["day"] = train["timestamp"].dt.day
# Despite its name, "weekend" holds the day of the week (0 = Monday ... 6 = Sunday)
train["weekend"] = train["timestamp"].dt.weekday
train["month"] = train["timestamp"].dt.month

In [19]:
train.drop('timestamp', axis=1, inplace=True)
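
To make the date features concrete, here is a plain-Python sketch of the same split for a single timestamp, using the standard-library `datetime` instead of the pandas `.dt` accessor. The helper `time_features` and the extra `is_weekend` flag are hypothetical and not part of the kernel:

```python
from datetime import datetime

def time_features(ts: str) -> dict:
    """Split a 'YYYY-MM-DD HH:MM:SS' string into the calendar parts used above."""
    t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return {
        "hour": t.hour,
        "day": t.day,
        "weekday": t.weekday(),          # 0 = Monday ... 6 = Sunday, like pandas .dt.weekday
        "is_weekend": t.weekday() >= 5,  # hypothetical extra flag, not in the kernel
        "month": t.month,
    }

print(time_features("2016-01-01 00:00:00"))  # a Friday
print(time_features("2016-01-02 13:00:00"))  # a Saturday
```

The `weekend` column in the kernel corresponds to `weekday` here, which is why it later shows up as a categorical variable with several levels rather than a yes/no flag.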

## Making the DataBunch

Next, just like in fastai v1, we need to declare a few things: our categorical and continuous variables, our preprocessors (`Normalize`, `Categorify`, and `FillMissing`), and how we want to split our data. `fastai` v2 now includes a `RandomSplitter`, which is similar to `.split_by_rand_pct()` but works on a range of indices that we pass in (hence `range_of(train)`).

In [20]:
cat_vars = ["building_id", "primary_use", "hour", "day", "weekend", "month", "meter"]
cont_vars = ["square_feet", "year_built", "air_temperature", "cloud_coverage",
              "dew_temperature"]
dep_var = 'meter_reading'

In [21]:
procs = [Normalize, Categorify, FillMissing, Cuda]
splits = RandomSplitter()(range_of(train))
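
Conceptually, a random splitter shuffles the row indices and cuts them into a training part and a validation part. Here is a rough plain-Python sketch of that idea, assuming a 20% validation share; `random_split_indices` is a hypothetical helper, not the fastai implementation:

```python
import random

def random_split_indices(n_items, valid_pct=0.2, seed=None):
    """Shuffle range(n_items) and cut it into (train_idx, valid_idx)."""
    rng = random.Random(seed)
    idxs = list(range(n_items))
    rng.shuffle(idxs)
    cut = int(valid_pct * n_items)
    # The first `cut` shuffled indices go to validation, the rest to training.
    return idxs[cut:], idxs[:cut]

train_idx, valid_idx = random_split_indices(10, valid_pct=0.2, seed=42)
print(len(train_idx), len(valid_idx))
```

Because the split is purely random, readings from the same building and nearby hours can land on both sides of it, which is worth keeping in mind when reading the validation loss.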

Now that those are defined, we can create a `TabularPandas` object by passing in our dataframe, the `procs`, our variables, what our `y` is, and how we want to split our data. `fastai` v2 is built on a pipeline structure: first we declare what we want to do, then we call `databunch()` (the high-level API is not done yet, so there is nothing similar to directly DataBunching an object).

In [22]:
to = TabularPandas(train, procs, cat_vars, cont_vars, y_names=dep_var, splits=splits, is_y_cat=False)

If we look at what `to` actually is, we can see what looks like our processed data arranged into a dataframe that can easily be read!

In [23]:
to

          building_id  meter  meter_reading  site_id  primary_use  \
8762645          1019      1       5.013963       10            1   
11644712          213      4       5.741897        2            1   
9447939           988      3       5.476045        9            1   
13039607          186      1       4.377768        2            1   
262060           1159      2       0.000000       13           10   
167028           1211      2       1.503766       13            7   
6442559           388      1       5.463662        3            1   
8919853          1148      3       7.779726       13            2   
16015693          107      4       2.397895        1            1   
4694074           210      1       6.862036        2            1   
18626663           43      1       7.216225        0            8   
16363991          199      2       6.365838        2            7   
9713755           792      3       0.000000        7            1   
1972248          1351      2      

We can then also easily look at our training and validation datasets by calling `.train` or `.valid`.

In [24]:
to.train

          building_id  meter  meter_reading  site_id  primary_use  \
8762645          1019      1       5.013963       10            1   
11644712          213      4       5.741897        2            1   
9447939           988      3       5.476045        9            1   
13039607          186      1       4.377768        2            1   
262060           1159      2       0.000000       13           10   
167028           1211      2       1.503766       13            7   
6442559           388      1       5.463662        3            1   
8919853          1148      3       7.779726       13            2   
16015693          107      4       2.397895        1            1   
4694074           210      1       6.862036        2            1   
18626663           43      1       7.216225        0            8   
16363991          199      2       6.365838        2            7   
9713755           792      3       0.000000        7            1   
1972248          1351      2      

From here we can create our DataBunch object in one of two ways. We can either directly call `dbch = to.databunch()`, *or* we can take it one step further and build the dataloaders ourselves with custom settings. First let's look at the basic version.

In [25]:
dbch = to.databunch()
dbch.valid_dl.show_batch()

Unnamed: 0,square_feet,year_built,air_temperature,cloud_coverage,dew_temperature,meter_reading,building_id,primary_use,hour,day,weekend,month,meter,square_feet_na,year_built_na,air_temperature_na,cloud_coverage_na,dew_temperature_na
0,152559.001208,1970.0,-7.700001,5.556853e-08,-10.9,5.0,1029,Education,5,28,3,1,3,False,True,False,True,False
1,96336.000417,1912.999999,-4.4,5.556853e-08,-6.1,2.0,1396,Lodging/residential,3,10,5,12,0,False,False,False,True,False
2,9111.001273,1958.0,-1.700001,5.556853e-08,-7.8,2.0,1411,Education,19,10,5,12,0,False,False,False,True,False
3,428646.990609,1970.0,32.8,2.0,22.2,7.0,993,Education,20,28,6,8,0,False,True,False,False,False
4,71087.998445,1970.0,29.4,5.556853e-08,13.3,0.0,1259,Education,16,26,3,5,2,False,True,False,False,False
5,237545.993313,1970.0,0.600001,5.556853e-08,-10.6,8.0,1291,Office,21,24,6,1,2,False,True,False,False,False
6,1857.999255,1970.0,7.8,4.0,-1.1,1.0,381,Entertainment/public assembly,7,8,4,4,0,False,True,False,False,False
7,519.996752,1970.0,20.6,2.0,18.3,0.0,809,Entertainment/public assembly,4,1,1,11,0,False,True,False,False,False
8,345836.997385,1966.0,8.3,5.556853e-08,6.1,4.0,206,Public services,9,8,4,1,3,False,False,False,True,False
9,95181.999877,1970.0,20.0,8.0,17.8,6.0,1180,Education,12,19,4,8,2,False,True,False,False,False
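
In the batch above, every row whose `year_built_na` flag is `True` shows `year_built` as 1970.0, which is consistent with `FillMissing` replacing missing values with a single fill value and recording where it did so. Here is a rough plain-Python sketch of that idea, assuming a median fill (the helper `fill_missing` and the sample values are hypothetical, not the library's code):

```python
from statistics import median

def fill_missing(values):
    """Return (filled_values, was_missing), filling gaps with the median of observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    was_missing = [v is None for v in values]
    filled = [fill if v is None else v for v in values]
    return filled, was_missing

year_built = [2008.0, None, 1991.0, None, 1975.0]  # illustrative values
filled, year_built_na = fill_missing(year_built)
print(filled)
print(year_built_na)
```

The boolean indicator is then treated as one more categorical variable, which is why the `_na` columns appear next to the original categorical columns in the batch.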


In [26]:
is_categorical_dtype(to.meter_reading)

False

In [27]:
next(iter(dbch.train_dl))

(tensor([[1189,    7,   23,    7,    2,    6,    1,    1,    2,    1,    1,    1],
         [ 260,    1,   10,   13,    4,   10,    1,    1,    2,    1,    1,    1],
         [ 936,    7,   17,   26,    6,   11,    2,    1,    2,    1,    1,    1],
         [ 535,    2,   16,   21,    7,    8,    1,    1,    2,    1,    1,    1],
         [ 370,   10,   17,    2,    6,    1,    1,    1,    2,    1,    1,    1],
         [ 508,    1,   19,   19,    2,    4,    1,    1,    2,    1,    1,    1],
         [ 858,    2,    2,   10,    1,   10,    1,    1,    2,    1,    1,    1],
         [ 351,    1,   15,   29,    6,   10,    1,    1,    1,    1,    1,    1],
         [ 788,    5,    2,    3,    7,    4,    1,    1,    2,    1,    2,    1],
         [1219,    2,   21,   27,    1,    6,    1,    1,    2,    1,    2,    1],
         [1233,    1,   24,    4,    6,    6,    4,    1,    2,    1,    1,    1],
         [1190,    7,   21,    3,    4,   11,    3,    1,    2,    1,    1,    1],
    

In [28]:
trn_dl = TabDataLoader(to.train, bs=64, shuffle=True, drop_last=True)
val_dl = TabDataLoader(to.valid, bs=128)

Lastly we can create a `DataBunch` object by calling `DataBunch()` and passing in our two `TabDataLoader`s.

In [29]:
dbunch = DataBunch(trn_dl, val_dl)
dbunch.valid_dl.show_batch()

Unnamed: 0,square_feet,year_built,air_temperature,cloud_coverage,dew_temperature,meter_reading,building_id,primary_use,hour,day,weekend,month,meter,square_feet_na,year_built_na,air_temperature_na,cloud_coverage_na,dew_temperature_na
0,152559.001208,1970.0,-7.700001,5.556853e-08,-10.9,5.0,1029,Education,5,28,3,1,3,False,True,False,True,False
1,96336.000417,1912.999999,-4.4,5.556853e-08,-6.1,2.0,1396,Lodging/residential,3,10,5,12,0,False,False,False,True,False
2,9111.001273,1958.0,-1.700001,5.556853e-08,-7.8,2.0,1411,Education,19,10,5,12,0,False,False,False,True,False
3,428646.990609,1970.0,32.8,2.0,22.2,7.0,993,Education,20,28,6,8,0,False,True,False,False,False
4,71087.998445,1970.0,29.4,5.556853e-08,13.3,0.0,1259,Education,16,26,3,5,2,False,True,False,False,False
5,237545.993313,1970.0,0.600001,5.556853e-08,-10.6,8.0,1291,Office,21,24,6,1,2,False,True,False,False,False
6,1857.999255,1970.0,7.8,4.0,-1.1,1.0,381,Entertainment/public assembly,7,8,4,4,0,False,True,False,False,False
7,519.996752,1970.0,20.6,2.0,18.3,0.0,809,Entertainment/public assembly,4,1,1,11,0,False,True,False,False,False
8,345836.997385,1966.0,8.3,5.556853e-08,6.1,4.0,206,Public services,9,8,4,1,3,False,False,False,True,False
9,95181.999877,1970.0,20.0,8.0,17.8,6.0,1180,Education,12,19,4,8,2,False,True,False,False,False


WARNING! Look at the `meter_reading` column above: every value is a whole number, even though the log-transformed target is continuous. What's up?

## Training the Model

First we need to create a `TabularModel`, which needs the embedding matrix sizes, how many continuous variables to expect, the number of outputs (classes, or 1 for regression), and how big we want our layers. To get the embedding matrix sizes, we can call `get_emb_sz` on a `TabularPandas` object.

Let's start by defining our embedding size rule of thumb, along with our `get_emb_sz` function.

In [30]:
def emb_sz_rule(n_cat): 
    "Rule of thumb to pick embedding size corresponding to `n_cat`"
    return min(600, round(1.6 * n_cat**0.56))

In [31]:
def _one_emb_sz(classes, n, sz_dict=None):
    "Pick an embedding size for `n` depending on `classes` if not given in `sz_dict`."
    sz_dict = ifnone(sz_dict, {})
    n_cat = len(classes[n])
    sz = sz_dict.get(n, int(emb_sz_rule(n_cat)))  # rule of thumb
    return n_cat,sz

In [32]:
def get_emb_sz(to, sz_dict=None):
    "Get default embedding size from `TabularPreprocessor` `proc` or the ones in `sz_dict`"
    return [_one_emb_sz(to.procs.classes, n, sz_dict) for n in to.cat_names]

Now we pass in our `TabularPandas` object, `to`:

In [33]:
emb_szs = get_emb_sz(to); print(emb_szs)

[(1450, 94), (17, 8), (25, 10), (32, 11), (8, 5), (13, 7), (5, 4), (2, 2), (3, 3), (3, 3), (3, 3), (3, 3)]
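
The printed list has twelve entries: one for each of the seven `cat_vars` plus one for each of the five `_na` indicator columns seen in the batches earlier. As a quick check, this sketch re-applies the rule of thumb to the cardinalities copied from that output and adds up the resulting widths (`n_cont` is simply the length of `cont_vars`):

```python
def emb_sz_rule(n_cat):
    "Same rule of thumb as above: grows with n_cat**0.56 and is capped at 600."
    return min(600, round(1.6 * n_cat**0.56))

# Cardinalities copied from the printed `emb_szs`.
cardinalities = [1450, 17, 25, 32, 8, 13, 5, 2, 3, 3, 3, 3]
sizes = [emb_sz_rule(n) for n in cardinalities]
n_cont = 5  # number of entries in cont_vars

print(sizes)
print(sum(sizes) + n_cont)  # width of the input to the first linear layer
```

The sum of the embedding widths plus the continuous variables is the number of features fed into the first `Linear` layer of the model.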


The last piece of the puzzle we need is our basic `TabularModel`:

In [34]:
class TabularModel(Module):
    "Basic model for tabular data."
    def __init__(self, emb_szs, n_cont, out_sz, layers, ps=None, embed_p=0., y_range=None, use_bn=True, bn_final=False):
        ps = ifnone(ps, [0]*len(layers))
        if not is_listy(ps): ps = [ps]*len(layers)
        self.embeds = nn.ModuleList([Embedding(ni, nf) for ni,nf in emb_szs])
        self.emb_drop = nn.Dropout(embed_p)
        self.bn_cont = nn.BatchNorm1d(n_cont)
        n_emb = sum(e.embedding_dim for e in self.embeds)
        self.n_emb,self.n_cont,self.y_range = n_emb,n_cont,y_range
        sizes = [n_emb + n_cont] + layers + [out_sz]
        actns = [nn.ReLU(inplace=True) for _ in range(len(sizes)-2)] + [None]
        _layers = [BnDropLin(sizes[i], sizes[i+1], bn=use_bn and i!=0, p=p, act=a)
                       for i,(p,a) in enumerate(zip([0.]+ps,actns))]
        if bn_final: _layers.append(nn.BatchNorm1d(sizes[-1]))
        self.layers = nn.Sequential(*_layers)
    
    def forward(self, x_cat, x_cont):
        if self.n_emb != 0:
            x = [e(x_cat[:,i]) for i,e in enumerate(self.embeds)]
            x = torch.cat(x, 1)
            x = self.emb_drop(x)
        if self.n_cont != 0:
            x_cont = self.bn_cont(x_cont)
            x = torch.cat([x, x_cont], 1) if self.n_emb != 0 else x_cont
        x = self.layers(x)
        if self.y_range is not None:
            x = (self.y_range[1]-self.y_range[0]) * torch.sigmoid(x) + self.y_range[0]
        return x

As you may have noticed, most of what changed with the v2 API is focused on the dataloading / DataBunch creation. The rest of this kernel should look very familiar to fastai users.

In [35]:
model = TabularModel(emb_szs, len(to.cont_names), 1, [1000,500], y_range=(0,15)); model

TabularModel(
  (embeds): ModuleList(
    (0): Embedding(1450, 94)
    (1): Embedding(17, 8)
    (2): Embedding(25, 10)
    (3): Embedding(32, 11)
    (4): Embedding(8, 5)
    (5): Embedding(13, 7)
    (6): Embedding(5, 4)
    (7): Embedding(2, 2)
    (8): Embedding(3, 3)
    (9): Embedding(3, 3)
    (10): Embedding(3, 3)
    (11): Embedding(3, 3)
  )
  (emb_drop): Dropout(p=0.0, inplace=False)
  (bn_cont): BatchNorm1d(5, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (layers): Sequential(
    (0): BnDropLin(
      (0): Linear(in_features=158, out_features=1000, bias=True)
      (1): ReLU(inplace=True)
    )
    (1): BnDropLin(
      (0): BatchNorm1d(1000, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (1): Linear(in_features=1000, out_features=500, bias=True)
      (2): ReLU(inplace=True)
    )
    (2): BnDropLin(
      (0): BatchNorm1d(500, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (1): Linear(in_features=500, out_
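
The `y_range=(0,15)` argument activates the last branch of `forward`, which squashes the raw output through a sigmoid and rescales it into that interval. A minimal plain-Python sketch of that one line (`scale_to_range` is a hypothetical helper name; the maximum target is the value printed earlier in the kernel):

```python
import math

def scale_to_range(x, y_range):
    """Map an unbounded activation into (low, high) with a sigmoid, as in forward()."""
    low, high = y_range
    return (high - low) * (1.0 / (1.0 + math.exp(-x))) + low

y_range = (0, 15)
for activation in (-10.0, 0.0, 10.0):
    print(activation, scale_to_range(activation, y_range))

# Maximum of the log-transformed target printed earlier in the kernel.
max_target = 16.902211829285342
print(scale_to_range(10.0, y_range) < max_target)
```

Since a sigmoid never reaches its limits, predictions stay strictly between 0 and 15, so the largest log-transformed targets (up to about 16.9) are out of reach with this setting. A wider upper bound may be worth trying.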

Now we can define our optimization function and create our `Learner`

In [36]:
opt_func = partial(Adam, wd=0.01, eps=1e-5)
learn = Learner(dbunch, model, MSELossFlat(), opt_func=opt_func)

In [37]:
dbunch.train_dl.bs = 1024*4

In [None]:
learn.fit_one_cycle(5)

I still need to track down the bug that keeps us from fitting properly, but this is also just a subset of the data. Hope this helps you get started! :)

- muellerzr

In [None]:
# get_preds returns a (predictions, targets) tuple, so unpack it first
preds, targs = learn.get_preds()

In [None]:
preds.shape