# Example 3: Tile2Vec features for CDL classification
In this notebook, we'll use a Tile2Vec model that has been pre-trained on the NAIP dataset to embed a small NAIP dataset and then train a classifier on the corresponding Cropland Data Layer (CDL) labels.

In [1]:
import numpy as np
import os
import torch
from time import time
from torch.autograd import Variable

import sys
sys.path.append('../')
from src.tilenet import make_tilenet
from src.resnet import ResNet18

In [2]:
torch.cuda.empty_cache()

## Step 1. Loading pre-trained model
In this step, we will initialize a new TileNet model and then load the pre-trained weights.

In [3]:
# Setting up model
in_channels = 4
z_dim = 512
cuda = torch.cuda.is_available()
tilenet = make_tilenet(in_channels=in_channels, z_dim=z_dim)
#Use old model for now
#tilenet = ResNet18()
if cuda: tilenet.cuda()

In [4]:
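# Load parameters
model_fn = '../models/TileNetNoise_epoch50_s1.ckpt'
# map_location lets a GPU-saved checkpoint load on a CPU-only machine
checkpoint = torch.load(model_fn, map_location='cuda' if cuda else 'cpu')
tilenet.load_state_dict(checkpoint)
# Switch to evaluation mode so BatchNorm uses its running statistics
tilenet.eval()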

TileNet(
  (conv1): Conv2d(4, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (layer1): Sequential(
    (0): ResidualBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
    (1): ResidualBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, 

## Step 2. Embed NAIP tiles
In this step, we'll use TileNet to embed the NAIP tiles provided in `tile2vec/data/tiles`. There are 1000 tiles in total, named `1tile.npy` through `1000tile.npy`.

In [5]:
# Get data
tile_dir = '../data/tiles'
n_tiles = 1000
y = np.load(os.path.join(tile_dir, 'y.npy'))
print(y.shape)

(1000,)


In [6]:
# Embed tiles
t0 = time()
X = np.zeros((n_tiles, z_dim))
# Inference only, so skip gradient tracking
with torch.no_grad():
    for idx in range(n_tiles):
        tile = np.load(os.path.join(tile_dir, '{}tile.npy'.format(idx+1)))
        # Keep the first 4 NAIP channels (the 5th is the CDL mask)
        tile = tile[:,:,:4]
        # Rearrange (H, W, C) to PyTorch's (C, H, W) order and add a batch dimension
        tile = np.moveaxis(tile, -1, 0)
        tile = np.expand_dims(tile, axis=0)
        # Scale to [0, 1]
        tile = tile / 255
        # Embed tile
        tile = torch.from_numpy(tile).float()
        if cuda: tile = tile.cuda()
        z = tilenet.encode(tile)
        # Move the embedding back to the CPU before converting to NumPy
        X[idx,:] = z.cpu().numpy()
t1 = time()
print('Embedded {} tiles: {:0.3f}s'.format(n_tiles, t1-t0))

Embedded 1000 tiles: 8.408s
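
The per-tile preprocessing in the loop above can be factored into a small helper, which makes the shape and range conventions easier to check in isolation. This is a minimal sketch, not part of the Tile2Vec code: `preprocess_tile` is a hypothetical name, and the 50x50 `uint8` tile is a synthetic stand-in whose size and dtype are assumptions.

```python
import numpy as np

def preprocess_tile(tile, n_channels=4, max_value=255.0):
    """Convert an (H, W, C) tile to a (1, n_channels, H, W) float32 array in [0, 1]."""
    tile = tile[:, :, :n_channels]        # drop trailing channels (e.g. the CDL mask)
    tile = np.moveaxis(tile, -1, 0)       # (H, W, C) -> (C, H, W)
    tile = np.expand_dims(tile, axis=0)   # add a batch dimension
    return (tile / max_value).astype(np.float32)

# Synthetic stand-in for a tile on disk: 5 channels, uint8 (assumed size and dtype)
rng = np.random.default_rng(0)
fake_tile = rng.integers(0, 256, size=(50, 50, 5), dtype=np.uint8)

batch = preprocess_tile(fake_tile)
print(batch.shape, batch.dtype)
```

The resulting array can be passed to `torch.from_numpy` in the same way as in the loop.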


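## Step 3. Train a classifier
In this step, we'll split the dataset into train and test sets and train a classifier on the embeddings to predict CDL classes. The cells below fit a logistic regression on standardized features; the random forest variant is left commented out.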

In [7]:
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler 
from sklearn.pipeline import make_pipeline

In [8]:
# Check CDL classes
print(set(y))

{1.0, 2.0, 21.0, 24.0, 152.0, 28.0, 36.0, 176.0, 49.0, 54.0, 61.0, 69.0, 71.0, 72.0, 75.0, 76.0, 205.0, 204.0, 208.0, 212.0, 217.0, 225.0, 236.0, 111.0, 121.0, 122.0, 123.0, 124.0}


Since the CDL classes are not numbered in consecutive order, we'll start by reindexing the classes from 0.

In [9]:
# Reindex CDL classes
y = LabelEncoder().fit_transform(y)
print(set(y))

{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27}
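
The cell above discards the fitted encoder, so the mapping from the new indices back to CDL codes is lost. As a sketch of what the reindexing does, the same sorted-unique mapping can be reproduced with `np.unique`, which also keeps the lookup table. The short `cdl` array below is made up for illustration and only reuses a few codes from the printed set.

```python
import numpy as np

# A few CDL codes from the set printed above, in arbitrary order (illustrative only)
cdl = np.array([176.0, 1.0, 36.0, 1.0, 176.0, 2.0])

# classes: sorted unique codes; encoded: position of each label within classes
classes, encoded = np.unique(cdl, return_inverse=True)
print(classes)
print(encoded)

# Indexing the lookup table with the encoded labels recovers the original codes
decoded = classes[encoded]
print(np.array_equal(decoded, cdl))
```

With scikit-learn, keeping a reference to the fitted `LabelEncoder` and calling its `inverse_transform` method serves the same purpose.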


We can randomly split the data and train the classifier several times (here, 10 random 80/20 splits) to estimate the average accuracy and its spread.

In [10]:
n_trials = 10
accs = np.zeros((n_trials,))
for i in range(n_trials):
    # Split the data and train the classifier (RF variant commented out below)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2)
    #rf = RandomForestClassifier()
    #rf.fit(X_tr, y_tr)
    #accs[i] = rf.score(X_te, y_te)
    pipe = make_pipeline(StandardScaler(), LogisticRegression())
    pipe.fit(X_tr, y_tr)
    accs[i] = pipe.score(X_te, y_te)
print('Mean accuracy: {:0.4f}'.format(accs.mean()))
print('Standard deviation: {:0.4f}'.format(accs.std()))

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

Mean accuracy: 0.6930
Standard deviation: 0.0326
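
The warning in this output says the solver stopped at its iteration limit before converging. One option it suggests is raising `max_iter` (scikit-learn's default for `LogisticRegression` is 100). The sketch below shows the pattern on synthetic data and counts any convergence warnings raised during fitting. The data, the `_demo` names, and the value `max_iter=1000` are illustrative assumptions, not settings taken from this notebook, and the accuracy it prints says nothing about the CDL task.

```python
import warnings
import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the embeddings: 300 samples, 20 features, 3 classes
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 3, size=300)
X_demo = rng.normal(size=(300, 20)) + y_demo[:, None]

Xd_tr, Xd_te, yd_tr, yd_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0)

# max_iter=1000 is an arbitrary illustrative budget
pipe_demo = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Record warnings raised while fitting so they can be counted
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    pipe_demo.fit(Xd_tr, yd_tr)

n_conv = sum(issubclass(w.category, ConvergenceWarning) for w in caught)
acc_demo = pipe_demo.score(Xd_te, yd_te)
print('Convergence warnings: {}'.format(n_conv))
print('Accuracy: {:0.4f}'.format(acc_demo))
```

Applying the same change to the pipeline in cell 10 would alter the fitted models, so the accuracies reported above would need to be re-run rather than reused.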




In [11]:
accs

array([0.66 , 0.705, 0.73 , 0.665, 0.735, 0.655, 0.7  , 0.745, 0.665,
       0.67 ])
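
The summary statistics reported earlier can be recomputed from this array. Note that `np.std` defaults to the population form (`ddof=0`); the sketch below also shows the sample form (`ddof=1`) and a rough standard error of the mean. The standard error treats the 10 trials as independent, which is only an approximation here because the random splits reuse the same 1000 tiles.

```python
import numpy as np

# Accuracies copied from the output above
accs_shown = np.array([0.66, 0.705, 0.73, 0.665, 0.735,
                       0.655, 0.7, 0.745, 0.665, 0.67])

mean_acc = accs_shown.mean()
pop_std = accs_shown.std()              # ddof=0, the form used in the notebook
sample_std = accs_shown.std(ddof=1)     # ddof=1, with Bessel's correction
sem = sample_std / np.sqrt(len(accs_shown))  # rough standard error of the mean

print('Mean accuracy: {:0.4f}'.format(mean_acc))
print('Population std: {:0.4f}'.format(pop_std))
print('Sample std: {:0.4f}'.format(sample_std))
print('Standard error: {:0.4f}'.format(sem))
```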

In [12]:
X_te

array([[0.        , 0.01285347, 0.72326899, ..., 0.        , 0.00450513,
        0.04605139],
       [0.        , 0.        , 0.09026568, ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.50852895, ..., 0.73325729, 0.94260317,
        0.        ],
       ...,
       [0.        , 0.03592017, 0.79199857, ..., 0.        , 0.10955492,
        0.27053744],
       [0.        , 0.        , 0.        , ..., 2.97879338, 0.0836513 ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 2.29957056, 0.14116895,
        0.        ]])

In [13]:
y_te

array([12, 20,  9,  8, 17, 12, 17, 12, 18, 20, 13, 24, 17,  1,  9, 20, 22,
        9,  8, 14,  9, 17,  9, 12, 20, 12, 12, 12, 13,  9,  9,  8,  9, 12,
       12,  5,  9, 17, 16, 12,  9, 15,  8,  8,  9,  9,  9, 20, 20, 12, 12,
        8, 17, 20, 12,  9, 12,  9, 20,  8,  9,  9, 20, 16, 12,  9,  9,  9,
       20, 17,  9,  9, 20, 17,  9,  9,  5, 20, 12, 20,  5, 17, 20, 17, 15,
        9, 20, 12, 20,  8,  9,  9, 26,  9, 20,  9, 12,  9, 12,  9, 14, 17,
        8, 20,  6,  9,  8,  9,  5, 12,  9,  9, 24,  9, 12,  9, 12,  9, 12,
       20,  9, 20, 12, 20, 17,  9, 20, 20, 21,  9,  9, 20, 12, 20, 25, 12,
        9, 12, 15,  9, 20,  9, 12,  9, 15,  9,  9,  9, 12, 12, 12, 16,  9,
       14,  9,  3,  8,  9,  9, 12,  9, 12,  1, 17,  8, 12, 12,  9,  9, 12,
       20, 20, 15,  9, 20, 12,  8, 17, 15,  9,  5,  9, 23, 16,  9, 17,  9,
       17,  9, 17, 12,  5, 17, 20, 12,  6, 12, 12,  9,  9])
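
In the `y_te` printout above, a few labels (9, 12, and 20) appear far more often than the rest, so it helps to compare the classifier against a majority-class baseline. The sketch below is one way to compute it; `majority_baseline` is a hypothetical helper and the small label arrays are made up for illustration.

```python
import numpy as np

def majority_baseline(y_train, y_test):
    """Accuracy obtained by always predicting the most frequent training label."""
    majority = int(np.bincount(y_train).argmax())
    return majority, float(np.mean(y_test == majority))

# Tiny made-up label arrays for illustration only
y_train_demo = np.array([9, 9, 12, 9, 20, 12, 9, 17])
y_test_demo = np.array([9, 12, 9, 20, 9])

label, baseline_acc = majority_baseline(y_train_demo, y_test_demo)
print('Majority label: {}, baseline accuracy: {:0.2f}'.format(label, baseline_acc))
```

Calling `majority_baseline(y_tr, y_te)` on the notebook's own split gives the number to compare against the mean accuracy. `np.bincount` requires non-negative integer labels, which the reindexed `y` satisfies.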