## Tile Coding

---
<img src="Tiling.png" alt="drawing" width="800">

Tile coding is a form of coarse coding for multi-dimensional continuous spaces that is flexible and computationally efficient. It may be the most practical feature representation for modern sequential digital computers.


In tile coding, the receptive fields of the features are grouped into partitions of the state space. Each such partition is called a tiling, and each element of the partition is called a tile. A state therefore falls into exactly one tile per tiling.

---
## Tiling Offset

---
Tilings are always offset from each other by a fraction of a tile width in each dimension. If `w` denotes the tile width and `n` the number of tilings, then `w/n` is the fundamental unit of offset.

In particular, for a continuous space of dimension `k`, a good choice is to use the first `k` odd integers `(1, 3, 5, 7, ..., 2k-1)` as the displacement vector, measured in units of `w/n`, with `n` (the number of tilings) set to an integer power of 2 greater than or equal to `4k`. For example, with `k = 2`, `n = 2^3 = 8` satisfies `n >= 4k = 8`.
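As a rough sketch of this rule, the offsets for every tiling can be generated from the displacement vector. The helper name `asymmetric_offsets` and the per-dimension `tile_widths` argument are hypothetical and not part of the notebook; wrapping the step count modulo `n` (so that each offset stays below one tile width) is also an assumption of this sketch.

```python
import numpy as np

def asymmetric_offsets(tile_widths, num_tilings):
    """Return an array of shape (num_tilings, k) with one offset per tiling and dimension."""
    tile_widths = np.asarray(tile_widths, dtype=float)
    k = len(tile_widths)
    displacement = 2 * np.arange(k) + 1        # first k odd integers: 1, 3, ..., 2k-1
    unit = tile_widths / num_tilings           # fundamental unit w/n for each dimension
    # Tiling i is shifted by i * displacement units along each dimension.
    steps = np.arange(num_tilings)[:, None] * displacement[None, :]
    # Assumption: wrap the step count so every offset is smaller than one tile width.
    return (steps % num_tilings) * unit

offsets = asymmetric_offsets(tile_widths=[0.2, 0.3], num_tilings=8)
print(offsets.shape)   # (8, 2)
print(offsets[1])      # one unit in x, three units in y
```

Because every entry of the displacement vector is odd and `n` is a power of 2, each dimension receives `n` distinct offsets, so no two tilings line up along any axis.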

In [1]:
import numpy as np

For a 2-dimensional space with features `x` and `y` and `n` tilings, each tiling partitions the space into a grid of tiles:
> `for each tiling`:
>>   `bins` * `bins` tiles
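
The resulting feature count can be worked out directly. The short sketch below assumes 3 tilings with 10 bins per dimension (the values used later in this notebook); the variable names are illustrative only.

```python
num_tilings = 3
bins_x, bins_y = 10, 10

tiles_per_tiling = bins_x * bins_y              # tiles in one tiling's grid
total_features = num_tilings * tiles_per_tiling # tiles across all tilings
active_features = num_tilings                   # a state activates one tile per tiling

print(tiles_per_tiling, total_features, active_features)  # 100 300 3
```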

In [6]:
def create_tiling(feat_range, bins, offset):
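    """
    Create the tiling spec for one dimension (feature) of one tiling.
    feat_range: feature range; example: [-1, 1]
    bins: number of bins for that feature; example: 10
    offset: offset for that feature; example: 0.2
    return: the bins - 1 interior bin boundaries, shifted by offset
    """
    
    # linspace yields bins + 1 edges; drop the two outer edges and shift the rest
    return np.linspace(feat_range[0], feat_range[1], bins+1)[1:-1] + offset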

In [11]:
feat_range = [0, 1.0]
bins = 10
offset = 0.2

tiling_spec = create_tiling(feat_range, bins, offset)

tiling_spec

array([0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1])
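To see what these boundaries mean, here is a minimal sketch that maps a few sample values to bin indices with `np.digitize` (the same call the notebook uses further down). The `boundaries` array repeats the computation of `create_tiling([0, 1.0], 10, 0.2)`; the sample values are arbitrary.

```python
import numpy as np

# Same boundaries as create_tiling([0, 1.0], 10, 0.2): 0.3, 0.4, ..., 1.1
boundaries = np.linspace(0, 1.0, 10 + 1)[1:-1] + 0.2

samples = [0.0, 0.25, 0.35, 0.95, 1.2]
indices = np.digitize(samples, boundaries)  # count of boundaries at or below each sample
print(indices)  # [0 0 1 7 9]
```

Nine boundaries produce ten bins, indexed 0 to 9. Values below the first boundary land in bin 0 and values at or above the last boundary land in bin 9, so inputs outside the nominal range are still encoded.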

In [14]:
def create_tilings(feature_ranges, number_tilings, bins, offsets):
    """
    Create the boundary arrays for every tiling and feature dimension.
    feature_ranges: range of each feature; example: x: [-1, 1], y: [2, 5] -> [[-1, 1], [2, 5]]
    number_tilings: number of tilings; example: 3 tilings
    bins: number of bins for each tiling and dimension; example: [[10, 10], [10, 10], [10, 10]]: 3 tilings * [x_bin, y_bin]
    offsets: offset for each tiling and dimension; example: [[0, 0], [0.2, 1], [0.4, 1.5]]: 3 tilings * [x_offset, y_offset]
    return: array of shape (number_tilings, features, bins - 1), assuming equal bins everywhere
    """
    tilings = []
    # for each tiling
    for tile_i in range(number_tilings):
        tiling_bin = bins[tile_i]
        tiling_offset = offsets[tile_i]
        
        tiling = []
        # for each feature dimension
        for feat_i in range(len(feature_ranges)):
            feat_range = feature_ranges[feat_i]
            # tiling for 1 feature
            feat_tiling = create_tiling(feat_range, tiling_bin[feat_i], tiling_offset[feat_i])
            tiling.append(feat_tiling)
        tilings.append(tiling)
    return np.array(tilings)

In [17]:
feature_ranges = [[-1, 1], [2, 5]]  # 2 features
number_tilings = 3
bins = [[10, 10], [10, 10], [10, 10]]  # each tiling has a 10*10 grid
offsets = [[0, 0], [0.2, 1], [0.4, 1.5]]

tilings = create_tilings(feature_ranges, number_tilings, bins, offsets)

print(tilings.shape)  # number of tilings x features x (bins - 1) boundaries

(3, 2, 9)


In [21]:
tilings

array([[[-8.00000000e-01, -6.00000000e-01, -4.00000000e-01,
         -2.00000000e-01,  0.00000000e+00,  2.00000000e-01,
          4.00000000e-01,  6.00000000e-01,  8.00000000e-01],
        [ 2.30000000e+00,  2.60000000e+00,  2.90000000e+00,
          3.20000000e+00,  3.50000000e+00,  3.80000000e+00,
          4.10000000e+00,  4.40000000e+00,  4.70000000e+00]],

       [[-6.00000000e-01, -4.00000000e-01, -2.00000000e-01,
          5.55111512e-17,  2.00000000e-01,  4.00000000e-01,
          6.00000000e-01,  8.00000000e-01,  1.00000000e+00],
        [ 3.30000000e+00,  3.60000000e+00,  3.90000000e+00,
          4.20000000e+00,  4.50000000e+00,  4.80000000e+00,
          5.10000000e+00,  5.40000000e+00,  5.70000000e+00]],

       [[-4.00000000e-01, -2.00000000e-01,  1.11022302e-16,
          2.00000000e-01,  4.00000000e-01,  6.00000000e-01,
          8.00000000e-01,  1.00000000e+00,  1.20000000e+00],
        [ 3.80000000e+00,  4.10000000e+00,  4.40000000e+00,
          4.70000000e+00,  5.00

In [30]:
def get_tile_coding(feature, tilings):
    """
    feature: sample feature with multiple dimensions that need to be encoded; example: [0.1, 2.5], [-0.3, 2.0]
    tilings: tilings with a few layers
    return: the encoding for the feature on each layer
    """
    num_dims = len(feature)
    feat_codings = []
    for tiling in tilings:
        feat_coding = []
        for i in range(num_dims):
            feat_i = feature[i]
            tiling_i = tiling[i]  # tiling on that dimension
            coding_i = np.digitize(feat_i, tiling_i)
            feat_coding.append(coding_i)
        feat_codings.append(feat_coding)
    return np.array(feat_codings)

In [31]:
feature = [0.1, 2.5]

coding = get_tile_coding(feature, tilings)
coding

array([[5, 1],
       [4, 0],
       [3, 0]])
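Each row of this coding names the single active tile in one tiling. As an illustration, the coding can be expanded into the flat binary feature vector often used with linear function approximation. The helper `coding_to_feature_vector` is hypothetical, and the sketch assumes every tiling uses the same number of bins per dimension.

```python
import numpy as np

def coding_to_feature_vector(coding, bins_per_dim):
    """Expand per-tiling tile indices into one binary vector with a 1 per active tile."""
    coding = np.asarray(coding)
    num_tilings = coding.shape[0]
    tiles_per_tiling = int(np.prod(bins_per_dim))
    features = np.zeros(num_tilings * tiles_per_tiling, dtype=int)
    for t, tile in enumerate(coding):
        # Flatten the multi-dimensional tile index, then shift into tiling t's block.
        flat = np.ravel_multi_index(tuple(tile), bins_per_dim)
        features[t * tiles_per_tiling + flat] = 1
    return features

x = coding_to_feature_vector([[5, 1], [4, 0], [3, 0]], bins_per_dim=(10, 10))
print(x.shape, x.sum())   # (300,) 3
print(np.flatnonzero(x))  # [ 51 140 230]
```

Exactly one entry per tiling is set, which is why the number of active features always equals the number of tilings.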

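### Combining Q tables with tilings
---
For each tiling, there is a Q table of shape `<state_dim1 * state_dim2 * ... * num_actions>`. The value of a state-action pair is the sum of the entries that the state's tile coding selects in each table.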

In [94]:
class QValueFunction:
    
    def __init__(self, tilings, actions, lr):
        self.tilings = tilings
        self.num_tilings = len(self.tilings)
        self.actions = actions
        self.lr = lr/self.num_tilings  # learning rate equally assigned to each tiling
        self.state_sizes = [tuple(len(splits)+1 for splits in tiling) for tiling in self.tilings]  # [(10, 10), (10, 10), (10, 10)]
        self.q_tables = [np.zeros(shape=(state_size+(len(self.actions),))) for state_size in self.state_sizes]
        
    def value(self, state, action):
        state_codings = get_tile_coding(state, self.tilings)  # [[5, 1], [4, 0], [3, 0]] ...
        action_idx = self.actions.index(action)  # position of the action along the last axis
        
        value = 0
        for coding, q_table in zip(state_codings, self.q_tables):
            # sum the entry selected in each q table
            value += q_table[tuple(coding)+(action_idx,)]
        return value
    
    def update(self, state, action, target):
        state_codings = get_tile_coding(state, self.tilings)  # [[5, 1], [4, 0], [3, 0]] ...
        action_idx = self.actions.index(action)  # index by position, not by the raw action value
        
        for coding, q_table in zip(state_codings, self.q_tables):
            # move the selected entry of each q table toward the target
            idx = tuple(coding)+(action_idx,)
            q_table[idx] += self.lr*(target - q_table[idx])
            

In [95]:
q = QValueFunction(tilings, [0, 1, 2], 0.1)

In [99]:
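# Move Q(state=[0.3, 2.4], action=1) toward the target 2.1, then read it back.
# A single update on fresh tables gives about 0.21; the value shown below
# is consistent with this update having been applied twice.
q.update([0.3, 2.4], 1, 2.1)
q.value([0.3, 2.4], 1)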

0.41300000000000003
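The `update` method above moves each table entry toward the target on its own, so under repeated updates every table approaches the target and their sum approaches `num_tilings * target`. A common alternative, in the style of linear function approximation, computes one error from the summed estimate and applies the same step to every active tile. The sketch below illustrates that variant on plain arrays. The names `update_shared_error`, `tables`, and `codings` are hypothetical, and the tile indices are example values rather than output from the notebook.

```python
import numpy as np

def update_shared_error(tables, codings, action_idx, target, lr):
    """Apply one update using the error of the summed estimate; return the new estimate."""
    idxs = [tuple(c) + (action_idx,) for c in codings]
    estimate = sum(t[i] for t, i in zip(tables, idxs))
    # The learning rate is split evenly across tilings, as in QValueFunction.
    step = (lr / len(tables)) * (target - estimate)
    for t, i in zip(tables, idxs):
        t[i] += step
    return estimate + len(tables) * step

tables = [np.zeros((10, 10, 3)) for _ in range(3)]   # 3 tilings, 10 x 10 tiles, 3 actions
codings = [[6, 1], [5, 0], [4, 0]]                   # example active tile per tiling

first = update_shared_error(tables, codings, action_idx=1, target=2.1, lr=0.1)
for _ in range(200):
    value = update_shared_error(tables, codings, action_idx=1, target=2.1, lr=0.1)
print(round(value, 3))  # 2.1
```

With this formulation the summed estimate, rather than each individual table, converges to the target.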