# Table Header Detection

## Requirements
- Conda or pip
- MongoDB instance
- PyMongo (will be installed by the notebook)

In [3]:
import sys
#!conda install --yes --prefix {sys.prefix} pymongo
#!conda install --yes --prefix {sys.prefix} premailer

#!{sys.executable} -m pip install numpy --upgrade
#!{sys.executable} -m pip install pandas
#!{sys.executable} -m pip install cssutils
#!{sys.executable} -m pip install premailer
#!{sys.executable} -m pip install python-crfsuite

import os
import json
import re
import pandas as pd
from pymongo import MongoClient
import requests
from bs4 import BeautifulSoup
from bs4.element import Tag
from cssutils import parseStyle
from premailer import Premailer
import time
from dateutil.parser import parse
import math
import pycrfsuite
from sklearn.model_selection import train_test_split


## Loading the seed data into MongoDB
- The initial dataset is the [Wikipedia TabEL dataset](http://websail-fe.cs.northwestern.edu/TabEL/).
- The dataset lacks some styling information.
- We therefore crawl the Wikipedia pages ourselves.
  - This is feasible because we only need labeled data (for both training and testing).
  - We take the page IDs from the TabEL dataset as a starting point, since each of those pages should contain at least one relational table.

Each line of the TabEL dataset contains one JSON object representing a single table. However, the objects are not contained in a JSON array, so we wrap them into an array before parsing the file as a whole.

In [4]:
def wrapJSONObjectLineIntoTable(inputFilePath, outputFilePath):
    # Wrap a file with one JSON object per line into a single JSON array.
    with open(inputFilePath, 'r') as inputFile, open(outputFilePath, 'w') as outputFile:
        outputFile.write('[')

        # Write each line one step late so the last object gets no trailing comma.
        previousLine = None
        for tableLineJsonObject in inputFile:
            if previousLine:
                outputFile.write(previousLine + ',')
            previousLine = tableLineJsonObject
        if previousLine:
            outputFile.write(previousLine)

        outputFile.write(']')
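
As an alternative to rewriting the file, a line-delimited JSON file can also be parsed one line at a time. The following is a minimal sketch using only the standard library; `parse_json_lines` and the two sample records are illustrative assumptions, not part of the original notebook.

```python
import json

def parse_json_lines(lines):
    """Parse an iterable of lines, each holding one JSON object (hypothetical helper)."""
    # Skip blank lines so a trailing newline at the end of the file is harmless.
    return [json.loads(line) for line in lines if line.strip()]

# Toy input that mimics two lines of a line-delimited file.
sample = [
    '{"_id": "10000032-1", "pgId": 10000032}\n',
    '{"_id": "1000006-1", "pgId": 1000006}\n',
]
records = parse_json_lines(sample)
print(len(records), records[0]['pgId'])
```

pandas also accepts line-delimited input directly via `pd.read_json(path, lines=True)`, which may make the intermediate `_fixed.json` file unnecessary.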

Check whether the TabEL dataset has already been transformed into an array. If not, transform it now.

In [12]:
inputFilePath = os.path.join('data', 'wikipedia_0_5000.json')
outputFilePath = os.path.join('data', 'wikipedia_0_5000_fixed.json')
if not os.path.isfile(outputFilePath):
    wrapJSONObjectLineIntoTable(inputFilePath, outputFilePath)

Parse the JSON array.

In [83]:
tabEL = pd.read_json(os.path.join('data', 'wikipedia_0_5000_fixed.json'))
tabEL.head()

Unnamed: 0,_id,numCols,numDataRows,numHeaderRows,numericColumns,order,pgId,pgTitle,sectionTitle,tableCaption,tableData,tableHeaders,tableId
0,10000032-1,4,11,1,[1],0.535975,10000032,Mid Antrim (Northern Ireland Parliament consti...,Members of Parliament,Members of Parliament,"[[{'cellID': -1, 'textTokens': [], 'text': '',...","[[{'cellID': -1, 'textTokens': [], 'text': 'El...",1
1,1000006-1,4,21,1,[],0.856769,1000006,Römer (crater),Satellite craters,Satellite craters,"[[{'cellID': -1, 'textTokens': [], 'text': 'A'...","[[{'cellID': -1, 'textTokens': [], 'text': 'Rö...",1
2,10000088-1,2,1,3,[],0.318258,10000088,Whispermoon,,Track listing,"[[{'cellID': -1, 'textTokens': [], 'text': 'Al...","[[{'cellID': -1, 'textTokens': [], 'text': 'Pr...",1
3,10000218-1,2,6,1,[],0.553872,10000218,Khalsa Diwan Society Vancouver,First executive committee,First executive committee,"[[{'cellID': -1, 'textTokens': [], 'text': 'Pr...","[[{'cellID': -1, 'textTokens': [], 'text': 'Ti...",1
4,10000228-1,2,7,1,[1],0.951118,10000228,Julien Leparoux,Year-end charts,Year-end charts,"[[{'cellID': -1, 'textTokens': [], 'text': 'Na...","[[{'cellID': -1, 'textTokens': [], 'text': 'Ch...",1


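Sample 1000 unique page IDs. The HTML content for each of them is fetched in the next step.

In [101]:
# Drop duplicate page IDs first: one page can contribute several tables to TabEL.
pageIDSample = tabEL[['pgId']].drop_duplicates().sample(n=1000)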

## Crawl the Wikipedia pages and fetch all occurring tables
We use the page IDs from the TabEL dataset to crawl the Wikipedia HTML. A page may include multiple tables. We only extract HTML tables with the class `wikitable`. The styles from the CSS file are converted into inline styles.

In [8]:
BASE_URL = 'https://en.wikipedia.org/'
wikipediaCSSFilePath = os.path.join('data', 'wikipedia.css')
instance = Premailer(base_url=BASE_URL)

cssFilePath = os.path.join('data', 'wikipedia.css')
# Read the stylesheet once and close the file again.
with open(cssFilePath, 'r') as cssFile:
    css = cssFile.read()
style = '<style>' + css + '</style>'

def inlineCSS(html):
    return instance.transform(html.replace('</head>', style + '</head>'))

def crawl(tabEL):
    payload = { 'curid': str(tabEL['pgId']) }
    html = requests.get(BASE_URL, params=payload).text
    htmlWithInlineCSS = inlineCSS(str(html))
    return htmlWithInlineCSS
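
The two preparation steps around the crawl (building the `curid` URL and injecting the stylesheet into the page head before inlining) can be sketched with the standard library alone. `page_url` and `inject_stylesheet` are hypothetical helper names used for illustration; the actual inlining is still left to Premailer.

```python
from urllib.parse import urlencode

BASE_URL = 'https://en.wikipedia.org/'

def page_url(page_id):
    """Build the URL that resolves a Wikipedia page by its numeric page ID."""
    return BASE_URL + '?' + urlencode({'curid': str(page_id)})

def inject_stylesheet(html, css):
    """Place the CSS in a <style> element right before the closing </head> tag."""
    return html.replace('</head>', '<style>' + css + '</style></head>')

print(page_url(10000032))
print(inject_stylesheet('<html><head></head><body></body></html>', 'th{font-weight:bold}'))
```

If a page has no `</head>` tag, `inject_stylesheet` returns the HTML unchanged, so no styles would be inlined for that page.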

In [None]:
pageIDSample['HTML'] = pageIDSample.apply(crawl, axis='columns')

In [206]:
pageIDSample['HTML'] = pageIDSample['HTML'].str.replace('\n', '')
pageIDSample['HTML'] = pageIDSample['HTML'].str.replace('\t', '')

Since crawling is time-consuming, we first store the data in a file and in the database.

In [17]:
client = MongoClient()
db = client.bob
pages = db.pages
pages.insert_many(pageIDSample.to_dict('records'))
pageIDSample.to_json(os.path.join("data", "crawled.json"))
client.close()

If data files are shared, the following cell loads them into the database.

In [9]:
client = MongoClient()
db = client.bob
tables = db.tables
#tables.delete_many({})
input_file_path = '../data/new/tables.json'
with open(input_file_path, 'r') as file:
    for line in file:
        jsonTable = json.loads(line)
        # The file stores ids as {'$oid': ...}; unwrap them to a plain string id.
        jsonTable['_id'] = jsonTable['_id']['$oid']
        tables.insert_one(jsonTable)
client.close()

In [2]:
client = MongoClient()
db = client.bob
pages = db.pages
cursor = pages.find({})
pageIDSample = pd.DataFrame(list(cursor))
client.close()

In [18]:
pageIDSample.head()

Unnamed: 0,HTML,pgId
0,"<!DOCTYPE html><html class=""client-nojs"" lang=...",10041828
1,"<!DOCTYPE html><html class=""client-nojs"" lang=...",10086127
2,"<!DOCTYPE html><html class=""client-nojs"" lang=...",1008145
3,"<!DOCTYPE html><html class=""client-nojs"" lang=...",1012548
4,"<!DOCTYPE html><html class=""client-nojs"" lang=...",10128185


Now we extract the tables along with some metadata. Each row is assigned a unique ID (its index within the table) and a label: `Header` if the row consists of `th` cells only or is contained in a `thead`, otherwise `Data`.

In [None]:
# Raw string avoids the invalid escape sequence warning for \d.
HEADLINE_PATTERN = re.compile(r'[hH]\d')
LABEL_CONTROLS = [
    {
        'label': 'Header',
        'color': 'light-blue'
    }, {
        'label': 'Data',
        'color': 'lime'
    }, {
        'label': 'Other',
        'color': 'orange'
    }
]

def extractPageTitle(soup):
    headlines = soup.select('h1')
    return headlines[0].text if len(headlines) > 0 else 'N/A'

def extractTableTitle(table):
    for sibling in table.previous_siblings:
        if (sibling is not None and sibling.name is not None and HEADLINE_PATTERN.match(sibling.name)):
            return sibling.text
    return 'N/A'

def addLabelControls(row, rowIndex, soup):
    labelControlTag = soup.new_tag(
        'th',
        attrs={
            'class': 'flex space-evenly'
        }
    )
    for labelControl in LABEL_CONTROLS:
        labelControlButton = soup.new_tag(
            'a',
            attrs={
                'class': 'labelButton waves-effect waves-light btn-small ' + labelControl['color'],
                'onClick': 'annotate(' + str(rowIndex) + ', "' + labelControl['label'] + '");',
            }
        )
        labelControlButton.string = labelControl['label']
        labelControlTag.append(labelControlButton)
    row.insert(0, labelControlTag)
    
def tagRow(row, rowIndex, soup, isHead=False):
    row['data-label'] = 'Header' if isHead else 'Data'
    row['data-row-index'] = rowIndex
    addLabelControls(row, rowIndex, soup)
    
def isHeaderRow(row):
    thTags = row.find_all('th', recursive=False)
    childCount = len(row.contents)
    return childCount == len(thTags) or row.parent.name == 'thead'

def tagRows(table, soup):
    rows = table.find_all('tr')
    annotations = []
    for rowIndex, row in enumerate(rows):
        isHeader = isHeaderRow(row)
        tagRow(row, rowIndex, soup, isHeader)  
        annotations.append('Header' if isHeader else 'Data')
    return annotations

def removeTableWidthLimitation(table):
    if not table.has_attr('style'):
        return
    tableStyle = parseStyle(table['style'])
    tableStyle['width'] = '100%'
    tableStyle['font-size'] = '100%'
    table['style'] = tableStyle.cssText
        
def extractTableInformation(table, pageID, tableIndex, pageTitle, soup):
    extractedInformation = {
        'pageID': pageID,
        'tableIndex': tableIndex,
        'pageTitle': pageTitle
    }
    extractedInformation['html'] = table.prettify()
    annotations = tagRows(table, soup)
    removeTableWidthLimitation(table)
    extractedInformation['taggedHtml'] = table.prettify()
    extractedInformation['annotations'] = annotations
    extractedInformation['tableTitle'] = extractTableTitle(table)
    return extractedInformation

def hasNestedTable(table):
    return len(table.select('table')) > 0

def extractTables(page):
    soup = BeautifulSoup(page['HTML'])
    pageTitle = extractPageTitle(soup)
    wikiTables = soup.select('.wikitable')
    extractedTables = []
    for tableIndex, table in enumerate(wikiTables):
        if hasNestedTable(table):
            continue
        extractedTable = extractTableInformation(table, page['pgId'], tableIndex, pageTitle, soup)
        extractedTables.append(extractedTable)
    return extractedTables
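
The initial labeling rule used above (a row counts as a header if all of its cells are `th` or if it sits inside a `thead`) can be illustrated in isolation. This is a minimal sketch that swaps BeautifulSoup for the standard library's `html.parser`; `RowLabeler` is a hypothetical class and it assumes tables without nesting, which matches the notebook skipping nested tables.

```python
from html.parser import HTMLParser

class RowLabeler(HTMLParser):
    """Label each <tr> as 'Header' or 'Data' (illustrative, not the notebook's code)."""

    def __init__(self):
        super().__init__()
        self.labels = []
        self.in_thead = False
        self.cells = None  # cell tag names of the row currently being read

    def handle_starttag(self, tag, attrs):
        if tag == 'thead':
            self.in_thead = True
        elif tag == 'tr':
            self.cells = []
        elif tag in ('th', 'td') and self.cells is not None:
            self.cells.append(tag)

    def handle_endtag(self, tag):
        if tag == 'thead':
            self.in_thead = False
        elif tag == 'tr' and self.cells is not None:
            # Like the notebook's rule, a row without any cells also counts as th-only.
            only_th = all(cell == 'th' for cell in self.cells)
            self.labels.append('Header' if only_th or self.in_thead else 'Data')
            self.cells = None

html = (
    '<table><thead><tr><td>A</td><td>B</td></tr></thead>'
    '<tbody><tr><th>x</th><th>y</th></tr>'
    '<tr><th>k</th><td>1</td></tr></tbody></table>'
)
labeler = RowLabeler()
labeler.feed(html)
print(labeler.labels)
```

Rows that mix `th` and `td` cells, such as a data row with a leading row header, are labeled `Data` by this rule, which is why the labels are only a starting point for manual annotation.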

In [20]:
client = MongoClient()
db = client.bob
tables = db.tables
for extractedTables in pageIDSample.apply(extractTables, axis='columns').values:
    if len(extractedTables) > 0:
        tables.insert_many(extractedTables)
client.close()

ERROR	Property: Invalid value for "CSS Level 2.1" property: 1000 [1:1: width]


The data can now be labeled using the provided [labeling tool](https://github.com/RichStone/web-tables-header-detection/tree/master/Labeling%20Tool).

# Feature Extraction

In [4]:
client = MongoClient()
db = client.bob
tables = db.tables
cursor = tables.find({})
tables = pd.DataFrame(list(cursor))
client.close()

In [5]:
tables.head()

Unnamed: 0,_id,annotatedAt,annotations,features,html,pageID,pageTitle,tableIndex,tableTitle,taggedHtml
0,5cf28fdb1ae12a2691e6c562,1559401000000.0,"[Header, Header, Header, Data]","[{'row': 0, 'cell': 0, 'isMerged': True, 'isCe...","<table class=""wikitable floatright"" style=""flo...",10041828,Memories & Dust,0,,"<table class=""wikitable floatright"" style=""flo..."
1,5cf28fdb1ae12a2691e6c563,1559401000000.0,"[Header, Data]","[{'row': 0, 'cell': 0, 'isMerged': False, 'isC...","<table class=""wikitable"">\n <tbody>\n <tr>\n ...",10041828,Memories & Dust,1,Charts[edit],"<table class=""wikitable"">\n <tbody>\n <tr dat..."
2,5cf28fdb1ae12a2691e6c564,1559401000000.0,"[Header, Data]","[{'row': 0, 'cell': 0, 'isMerged': False, 'isC...","<table class=""wikitable"">\n <tbody>\n <tr>\n ...",10086127,Sant Esteve de Palautordera,0,Demography[edit],"<table class=""wikitable"">\n <tbody>\n <tr dat..."
3,5cf28fdb1ae12a2691e6c565,1559402000000.0,"[Header, Data, Data, Data, Data]","[{'row': 0, 'cell': 0, 'isMerged': False, 'isC...","<table class=""wikitable"" style=""text-align:cen...",1008145,Slovenia national football team,0,Euro 2000[edit],"<table class=""wikitable"" style=""text-align: ce..."
4,5cf28fdb1ae12a2691e6c566,1559402000000.0,"[Header, Data, Data, Data, Data]","[{'row': 0, 'cell': 0, 'isMerged': False, 'isC...","<table class=""wikitable"" style=""text-align:cen...",1008145,Slovenia national football team,1,2002 World Cup[edit],"<table class=""wikitable"" style=""text-align: ce..."


In [72]:
SHORT_TEXT_THRESHOLD = 20
LONG_TEXT_THRESHOLD = 40

def isInt(value):
    try: 
        int(value)
        return True
    except ValueError:
        return False
    
def getRowSpan(cell):
    if cell.has_attr('rowspan') and isInt(cell['rowspan']):
        return int(cell['rowspan'])
    return 1
    
def getColSpan(cell):
    if cell.has_attr('colspan') and isInt(cell['colspan']):
        return int(cell['colspan'])
    return 1

def isMerged(cell):
    return (
        getColSpan(cell) > 1 or
        getRowSpan(cell) > 1
    )

def isCenterAligned(cell, style):
    return (
        (cell.has_attr('align') and cell['align'] == 'center') or
        (style is not None and 'text-align' in style and style['text-align'] == 'center')
    )

def isThOrInTHead(cell):
    row = cell.parent
    rowParent = row.parent
    return (
        cell.name == 'th' or
        rowParent.name == 'thead'
    )

def extractLayoutFeatures(cell, style):
    return {
        'isMerged': isMerged(cell),
        'isCenterAligned': isCenterAligned(cell, style),
        'isTHOrInTHead': isThOrInTHead(cell)
    }

def isBold(cell, style):
    return bool(
        style is not None and (
            style['font-weight'] == 'bold' or 
            style['font-style'] == 'bold'
        ) or
        cell.find('b') or
        cell.find('strong')
    )

def isItalic(cell, style):
    return bool(cell.find('i'))

def isUnderlined(cell, style):
    # Wrap in bool(): cell.find('u') returns a Tag, which cannot be converted with int() later on.
    return bool(
        cell.find('u') or
        style is not None and style['text-decoration'] == 'underline'
    )

def isColored(cell, style):
    return (
        style is not None and (
            'background-color' in style or
            'color' in style
        )
    )

def extractStyleFeatures(cell, style):
    return {
        'isBold': isBold(cell, style),
        'isItalic': isItalic(cell, style),
        'isUnderlined': isUnderlined(cell, style)
    }

def getCellStyle(cell):
    return parseStyle(cell['style']) if cell.has_attr('style') else None

def getContentLength(cell):
    # Number of whitespace-separated tokens in the cell text.
    return len(re.sub(r'\s+', ' ', cell.get_text()).split())

def isEmpty(cell):
    return getContentLength(cell) == 0

def isText(cell):
    return cell.get_text().isalpha()

def isNumeric(cell):
    return cell.get_text().isdigit()

def isDate(cell):
    try: 
        parse(cell.get_text(), fuzzy=False)
        return True
    except (ValueError, OverflowError):
        return False
    
def isShortText(cell):
    return getContentLength(cell) <= SHORT_TEXT_THRESHOLD

def isLongText(cell):
    return getContentLength(cell) > LONG_TEXT_THRESHOLD

def isTotal(cell):
    return cell.get_text().lower() == 'total'

def extractValueFeatures(cell):
    return {
        'isEmpty': isEmpty(cell),
        'isText': isText(cell),
        'isNumeric': isNumeric(cell),
        'isDate': isDate(cell),
        'isShortText': isShortText(cell),
        'isLongText': isLongText(cell),
        'isTotal': isTotal(cell)
    }

def mapDictBoolValuesToInt(dictionary):
    return { key: int(value) for key, value in dictionary.items() }

def applyColSpanFactor(dictionary, colSpan):
    return { key: value * colSpan for key, value in dictionary.items() }

def merge(featuresA, featuresB):
    return { k: featuresA.get(k, 0) + featuresB.get(k, 0) for k in set(featuresA) | set(featuresB) }

def stringifyDictKeys(dictionary):
    return { str(key): value for key, value in dictionary.items() }

def numNormalisedCols(row):
    numCols = 0
    for cell in row.children:
        if type(cell) is Tag:
            numCols += getColSpan(cell)
    return numCols
            
def getSimilarity(feature, cell, neighbour, suffix):
    similarity = {}
    similarity[feature + 'A' + suffix] = cell[feature] and neighbour[feature]
    similarity[feature + 'B' + suffix] = cell[feature] and not neighbour[feature]
    return similarity
    
def extractSimilarityFeatures(cell, neighbour, suffix):
    similarityFeatures = {
        **getSimilarity('isMerged', cell, neighbour, suffix),
        **getSimilarity('isCenterAligned', cell, neighbour, suffix),
        **getSimilarity('isTHOrInTHead', cell, neighbour, suffix),
        **getSimilarity('isBold', cell, neighbour, suffix),
        **getSimilarity('isItalic', cell, neighbour, suffix),
        **getSimilarity('isUnderlined', cell, neighbour, suffix),
        **getSimilarity('isEmpty', cell, neighbour, suffix),
        **getSimilarity('isText', cell, neighbour, suffix),
        **getSimilarity('isNumeric', cell, neighbour, suffix),
        **getSimilarity('isDate', cell, neighbour, suffix),
        **getSimilarity('isShortText', cell, neighbour, suffix),
        **getSimilarity('isLongText', cell, neighbour, suffix),
        **getSimilarity('isTotal', cell, neighbour, suffix)        
    }
    return similarityFeatures    
    
def addSimilarityFeatures(normalizedFeatureTable):
    nftWithSimilarity = []
    numRows = len(normalizedFeatureTable)
    for rowIndex, row in enumerate(normalizedFeatureTable):
        nftWithSimilarity.append([])
        for cellIndex, cell in enumerate(row):
            features = cell
            if rowIndex > 0:
                features = {
                    **extractSimilarityFeatures(cell, normalizedFeatureTable[rowIndex - 1][cellIndex], 'u'), 
                    **features
                    }
            if rowIndex < numRows - 1:
                features = {
                    **extractSimilarityFeatures(cell, normalizedFeatureTable[rowIndex + 1][cellIndex], 'l'),
                    **features
                }
            intCellFeatures = mapDictBoolValuesToInt(features)
            nftWithSimilarity[-1].append(intCellFeatures)
    return nftWithSimilarity
    
def cleanOfEmptyCells(table):
    lastEmptyCellIndex = len(table[0])
    for row in table:
        for cellIndex,cell in enumerate(row):
            if cell == 'empty cell':
                lastEmptyCellIndex = min(lastEmptyCellIndex, cellIndex)
    newTable =[]
    for row in table:
        newTable.append(row[:lastEmptyCellIndex])
    return newTable

def normalizedFeatureTable(table):
    soup = BeautifulSoup(table['html'])
    rows = soup.select('tr')
    numRows = len(rows)
    numCols = numNormalisedCols(rows[0])
    nft = [['empty cell' for i in range(numCols)] for j in range(numRows)]
    for rowIndex, row in enumerate(rows):
        cellIndex = 0
        for cell in row.children:
            if type(cell) is not Tag:
                continue
            cellStyle = getCellStyle(cell)
            boolCellFeatures = {
                **extractLayoutFeatures(cell, cellStyle),
                **extractStyleFeatures(cell, cellStyle),
                **extractValueFeatures(cell)
            }
            boolCellFeatures['colCount'] = 1
            colSpan = getColSpan(cell)
            rowSpan = getRowSpan(cell)
            # find next empty cell
            while cellIndex < numCols and nft[rowIndex][cellIndex] != 'empty cell':
                cellIndex += 1
            for rIndex in range(rowIndex, min(rowIndex + rowSpan, numRows)):
                for cIndex in range(cellIndex, min(cellIndex + colSpan, numCols)):
                    nft[rIndex][cIndex] = boolCellFeatures
            cellIndex += colSpan
    nft = cleanOfEmptyCells(nft)
    return addSimilarityFeatures(nft)    
        
def isEmptyTable(table):
    soup = BeautifulSoup(table['html'])
    rows = soup.select('tr')
    return len(rows) == 0
    
def extractFeatures(table):
    if isEmptyTable(table):
        print(table['html'])
        return []
    featureTable = normalizedFeatureTable(table)
    rowFeatureTable = {}
    for rowIndex, row in enumerate(featureTable):
        # count how often every feature is true in a row
        rowFeatures = {}
        for cellFeatures in row:
            rowFeatures = merge(rowFeatures, cellFeatures)
        rowFeatureTable[rowIndex] = rowFeatures
    rowFeatureTable = stringifyDictKeys(rowFeatureTable)
    return rowFeatureTable
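
The core of `normalizedFeatureTable` is the expansion of merged cells into a rectangular grid, so that every grid slot holds the features of the cell covering it. The following is a minimal sketch of that idea on a toy table; `expand_spans` and the `(value, colspan, rowspan)` tuples are illustrative assumptions, with plain strings standing in for the feature dictionaries.

```python
EMPTY = object()  # sentinel for grid slots not yet covered by any cell

def expand_spans(rows, num_cols):
    """Expand rows of (value, colspan, rowspan) tuples into a full grid."""
    grid = [[EMPTY] * num_cols for _ in rows]
    for r, row in enumerate(rows):
        c = 0
        for value, colspan, rowspan in row:
            # Skip slots already filled by a rowspan coming from an earlier row.
            while c < num_cols and grid[r][c] is not EMPTY:
                c += 1
            # Copy the value into every slot the cell spans, clipped to the grid.
            for rr in range(r, min(r + rowspan, len(rows))):
                for cc in range(c, min(c + colspan, num_cols)):
                    grid[rr][cc] = value
            c += colspan
    return grid

toy = [
    [('Year', 1, 2), ('Sales', 2, 1)],
    [('EU', 1, 1), ('US', 1, 1)],
    [('2019', 1, 1), ('10', 1, 1), ('12', 1, 1)],
]
grid = expand_spans(toy, num_cols=3)
for row in grid:
    print(row)
```

Because a merged cell is copied into every slot it covers, summing the features per row weights each cell by its column span.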

In [73]:
tables['features'] = tables.apply(extractFeatures, axis='columns')
tables.head()

<td class="wikitable hlist" colspan="9" style="text-align:center;background-color:#eaecf0">
 1.000 mi = 1.609 km; 1.000 km = 0.621 mi
 <br/>
 <div class="hlist" style="margin-left:1.6em;text-align:center;font-size:90%">
  <ul align="inherit" style="margin:0; padding:0; text-align:inherit">
   <li style="display:inline; margin:0">
    <span style="border:1px solid #000;background-color:#ddffdd;color:#ddffdd">
    </span>
    <a href="https://en.wikipedia.org/wiki/Concurrency_(road)" title="Concurrency (road)">
     Concurrency
    </a>
    terminus
   </li>
   <li style="display:inline; margin:0">
    <span style="border:1px solid #000;background-color:#dcdcfe;color:#dcdcfe">
    </span>
    <a href="https://en.wikipedia.org/wiki/Electronic_toll_collection" title="Electronic toll collection">
     Electronic toll collection
    </a>
   </li>
   <li style="display:inline; margin:0">
    <span style="border:1px solid #000;background-color:#ffdddd;color:#ffdddd">
    </span>
    <a href="h

ERROR	Property: Invalid value for "CSS Level 2.1" property: auto [1:1: text-align]

ERROR	Property: Invalid value for "CSS Level 2.1" property: linear-gradient(transparent, transparent), url("data:image/svg+xml,%3Csvg xmlns=%22http://www.w3.org/2000/svg%22 width=%2221%22 height=%229%22 viewBox=%220 0 21 9%22%3E %3Cpath d=%22M14.5 5l-4 4-4-4zm0-1l-4-4-4 4z%22/%3E %3C/svg%3E") [1:1: background-image]
ERROR	Property: In

<td class="wikitable hlist" colspan="6" style="text-align:center;background-color:#eaecf0">
 1.000 mi = 1.609 km; 1.000 km = 0.621 mi
 <br/>
 <div class="hlist" style="margin-left:1.6em;text-align:center;font-size:90%">
  <ul align="inherit" style="margin:0; padding:0; text-align:inherit">
   <li style="display:inline; margin:0">
    <span style="border:1px solid #000;background-color:#ddffdd;color:#ddffdd">
    </span>
    <a href="https://en.wikipedia.org/wiki/Concurrency_(road)" title="Concurrency (road)">
     Concurrency
    </a>
    terminus
   </li>
   <li style="display:inline; margin:0">
    <span style="border:1px solid #000;background-color:#ffdddd;color:#ffdddd">
    </span>
    <a href="https://en.wikipedia.org/wiki/Interchange_(road)#Complete_and_incomplete_interchanges" title="Interchange (road)">
     Incomplete access
    </a>
   </li>
   <li style="display:inline; margin:0">
    <span style="border:1px solid #000;background-color:#dff9f9;color:#dff9f9">
    </span>
  



<td class="wikitable hlist" colspan="9" style="text-align:center;background-color:#eaecf0">
 1.000 mi = 1.609 km; 1.000 km = 0.621 mi
 <br/>
 <div class="hlist" style="margin-left:1.6em;text-align:center;font-size:90%">
  <ul align="inherit" style="margin:0; padding:0; text-align:inherit">
   <li style="display:inline; margin:0">
    <span style="border:1px solid #000;background-color:#ddffdd;color:#ddffdd">
    </span>
    <a href="https://en.wikipedia.org/wiki/Concurrency_(road)" title="Concurrency (road)">
     Concurrency
    </a>
    terminus
   </li>
   <li style="display:inline; margin:0">
    <span style="border:1px solid #000;background-color:#dcdcfe;color:#dcdcfe">
    </span>
    <a href="https://en.wikipedia.org/wiki/Electronic_toll_collection" title="Electronic toll collection">
     Electronic toll collection
    </a>
   </li>
   <li style="display:inline; margin:0">
    <span style="border:1px solid #000;background-color:#ffdddd;color:#ffdddd">
    </span>
    <a href="h

ERROR	PropertyValue: No match: ('CHAR', ':', 1

ERROR	PropertyValue: Unknown syntax or no value: border-top:1px solid darkgray
ERROR	CSSStyleDeclaration: Syntax Error in Property: border-top:border-top:1px solid darkgray
ERROR	PropertyValue: No match: ('CHAR', ':', 1, 22)
ERROR	PropertyValue: Unknown syntax or no value: border-top:1px solid darkgray
ERROR	CSSStyleDeclaration: Syntax Error in Property: border-top:border-top:1px solid darkgray
ERROR	PropertyValue: No match: ('CHAR', ':', 1, 39)
ERROR	PropertyValue: Unknown syntax or no value: border-top:1px solid darkgray
ERROR	CSSStyleDeclaration: Syntax Error in Property: border-top:border-top:1px solid darkgray
ERROR	PropertyValue: No match: ('CHAR', ':', 1, 39)
ERROR	PropertyValue: Unknown syntax or no value: border-top:1px solid darkgray
ERROR	CSSStyleDeclaration: Syntax Error in Property: border-top:border-top:1px solid darkgray
ERROR	PropertyValue: No match: ('CHAR', ':', 1, 54)
ERROR	PropertyValue: Unknown syntax or no value: border-top:1px solid darkgray
ERROR	CSSStyleDeclara

ERROR	CSSStyleDeclaration: Syntax Error in Property: border-top:border-top:1px solid darkgray
ERROR	PropertyValue: No match: ('CHAR', ':', 1, 39)
ERROR	PropertyValue: Unknown syntax or no value: border-top:1px solid darkgray
ERROR	CSSStyleDeclaration: Syntax Error in Property: border-top:border-top:1px solid darkgray
ERROR	PropertyValue: No match: ('CHAR', ':', 1, 39)
ERROR	PropertyValue: Unknown syntax or no value: border-top:1px solid darkgray
ERROR	CSSStyleDeclaration: Syntax Error in Property: border-top:border-top:1px solid darkgray
ERROR	PropertyValue: No match: ('CHAR', ':', 1, 54)
ERROR	PropertyValue: Unknown syntax or no value: border-top:1px solid darkgray
ERROR	CSSStyleDeclaration: Syntax Error in Property: border-top:border-top:1px solid darkgray
ERROR	PropertyValue: No match: ('CHAR', ':', 1, 22)
ERROR	PropertyValue: Unknown syntax or no value: border-top:1px solid darkgray
ERROR	CSSStyleDeclaration: Syntax Error in Property: border-top:border-top:1px solid darkgray
ERROR	

ERROR	No content to parse.
ERROR	PropertyValue: Unknown syntax or no value:  
ERROR	CSSStyleDeclaration: Syntax Error in Property: background: 



<table class="wikitable" style="border:1px #000; float:right;">
</table>



INFO	CSSStyleDeclaration: Stripped standalone semicolon: ;
ERROR	Property: Invalid value for "CSS Level 2.1" property: center [1:57: vertical-align]



<td class="wikitable hlist" colspan="6" style="text-align:center;background-color:#eaecf0">
 1.000 mi = 1.609 km; 1.000 km = 0.621 mi
 <br/>
</td>


ERROR	Property: Invalid value for "CSS Level 2

<td class="wikitable hlist" colspan="6" style="text-align:center;background-color:#eaecf0">
 1.000 mi = 1.609 km; 1.000 km = 0.621 mi
 <br/>
 <div class="hlist" style="margin-left:1.6em;text-align:center;font-size:90%">
  <ul align="inherit" style="margin:0; padding:0; text-align:inherit">
   <li style="display:inline; margin:0">
    <span style="border:1px solid #000;background-color:#ffdddd;color:#ffdddd">
    </span>
    <a href="https://en.wikipedia.org/wiki/Interchange_(road)#Complete_and_incomplete_interchanges" title="Interchange (road)">
     Incomplete access
    </a>
   </li>
  </ul>
 </div>
</td>
<td class="wikitable hlist" colspan="6" style="text-align:center;background-color:#eaecf0">
 1.000 mi = 1.609 km; 1.000 km = 0.621 mi
 <br/>
</td>


INFO	CSSStyleDeclaration: Stripped standalone semicolon: ;
ERROR	Property: Invalid value for "CSS Level 2.1" property: linear-gradient(transparent, transparent), url("data:image/svg+xml,%3Csvg xmlns=%22http://www.w3.org/2000/svg%22 width=%2221%22 height=%229%22 viewBox=%220 0 21 9%22%3E %3Cpath d=%22M14.5 5l-4 4-4-4zm0-1l-4-4-4 4z%22/%3E %3C/svg%3E") [1:1: background-image]
ERROR	Property: Invalid value for "CSS Level 2.1" property: linear-gradient(transparent, transparent), url("data:image/svg+xml,%3Csvg xmlns=%22http://www.w3.org/2000/svg%22 width=%2221%22 height=%229%22 viewBox=%220 0 21 9%22%3E %3Cpath d=%22M14.5 5l-4 4-4-4zm0-1l-4-4-4 4z%22/%3E %3C/svg%3E") [1:1: background-image]
ERROR	Property: Invalid value for "CSS Level 2.1" property: linear-gradient(transparent, transparent), url("data:image/svg+xml,%3Csvg xmlns=%22http://www.w3.org/2000/svg%22 width=%2221%22 height=%229%22 viewBox=%220 0 21 9%22%3E %3Cpath d=%22M14.5 5l-4 4-4-4zm0-1l-4-4-4 4z%22/%3E %3C/svg%3E") [1:1: backg

<td class="wikitable hlist" colspan="9" style="text-align:center;background-color:#f2f2f2">
 <div class="hlist" style="margin-left: 1.6em; text-align: center; font-size:90%;">
  <ul align="inherit" style="margin:0; padding:0; text-align:inherit">
   <li style="display:inline; margin:0">
    <span style="border:1px solid #000;background-color:#d3d3d3;color:#d3d3d3">
    </span>
    Former
   </li>
   <li style="display:inline; margin:0">
    <span style="border:1px solid #000;background-color:#ffdead;color:#ffdead">
    </span>
    Future
   </li>
  </ul>
 </div>
</td>


ERROR	Property: Invalid value for "CSS Level 2.1" property: linear-gradient(transparent, transparent), url("data:image/svg+xml,%3Csvg xmlns=%22http://www.w3.org/2000/svg%22 width=%2221%22 height=%229%22 viewBox=%220 0 21 9%22%3E %3Cpath d=%22M14.5 5l-4 4-4-4zm0-1l-4-4-4 4z%22/%3E %3C/svg%3E") [1:1: background-image]
ERROR	Property: Invalid value for "CSS Level 2.1" property: linear-gradient(transparent, transparent), url("data:image/svg+xml,%3Csvg xmlns=%22http://www.w3.org/2000/svg%22 width=%2221%22 height=%229%22 viewBox=%220 0 21 9%22%3E %3Cpath d=%22M14.5 5l-4 4-4-4zm0-1l-4-4-4 4z%22/%3E %3C/svg%3E") [1:1: background-image]
ERROR	Property: Invalid value for "CSS Level 2.1" property: linear-gradient(transparent, transparent), url("data:image/svg+xml,%3Csvg xmlns=%22http://www.w3.org/2000/svg%22 width=%2221%22 height=%229%22 viewBox=%220 0 21 9%22%3E %3Cpath d=%22M14.5 5l-4 4-4-4zm0-1l-4-4-4 4z%22/%3E %3C/svg%3E") [1:1: background-image]


<td class="wikitable hlist" colspan="6" style="text-align:center;background-color:#eaecf0">
 1.000 mi = 1.609 km; 1.000 km = 0.621 mi
 <br/>
 <div class="hlist" style="margin-left:1.6em;text-align:center;font-size:90%">
  <ul align="inherit" style="margin:0; padding:0; text-align:inherit">
   <li style="display:inline; margin:0">
    <span style="border:1px solid #000;background-color:#ddffdd;color:#ddffdd">
    </span>
    <a href="https://en.wikipedia.org/wiki/Concurrency_(road)" title="Concurrency (road)">
     Concurrency
    </a>
    terminus
   </li>
  </ul>
 </div>
</td>


ERROR	PropertyValue: No match: ('CHAR', ':', 1, 54)
ERROR	PropertyValue: Unknown syntax or no value: border-top:1px solid darkgray
ERROR	CSSStyleDeclaration: Syntax Error in Property: border-top:border-top:1px solid darkgray
ERROR	PropertyValue: No match: ('CHAR', ':', 1, 22)
ERROR	PropertyValue: Unknown syntax or no value: border-top:1px solid darkgray
ERROR	CSSStyleDeclaration: Syntax Error in Property: border-top:border-top:1px solid darkgray
ERROR	PropertyValue: No match: ('CHAR', ':', 1, 39)
ERROR	PropertyValue: Unknown syntax or no value: border-top:1px solid darkgray
ERROR	CSSStyleDeclaration: Syntax Error in Property: border-top:border-top:1px solid darkgray
ERROR	PropertyValue: No match: ('CHAR', ':', 1, 39)
ERROR	PropertyValue: Unknown syntax or no value: border-top:1px solid darkgray
ERROR	CSSStyleDeclaration: Syntax Error in Property: border-top:border-top:1px solid darkgray
ERROR	PropertyValue: No match: ('CHAR', ':', 1, 54)
ERROR	PropertyValue: Unknown syntax or no value:

<td class="wikitable hlist" colspan="6" style="text-align:center;background-color:#eaecf0">
 1.000 mi = 1.609 km; 1.000 km = 0.621 mi
 <br/>
 <div class="hlist" style="margin-left:1.6em;text-align:center;font-size:90%">
  <ul align="inherit" style="margin:0; padding:0; text-align:inherit">
   <li style="display:inline; margin:0">
    <span style="border:1px solid #000;background-color:#ffdddd;color:#ffdddd">
    </span>
    <a href="https://en.wikipedia.org/wiki/Interchange_(road)#Complete_and_incomplete_interchanges" title="Interchange (road)">
     Incomplete access
    </a>
   </li>
  </ul>
 </div>
</td>


ERROR	Property: Invalid value for "CSS Level 2.1" property: linear-gradient(transparent, transparent), url("data:image/svg+xml,%3Csvg xmlns=%22http://www.w3.org/2000/svg%22 width=%2221%22 height=%229%22 viewBox=%220 0 21 9%22%3E %3Cpath d=%22M14.5 5l-4 4-4-4zm0-1l-4-4-4 4z%22/%3E %3C/svg%3E") [1:1: background-image]
ERROR	Property: Invalid value for "CSS Level 2.1" property: linear-gradient(transparent, transparent), url("data:image/svg+xml,%3Csvg xmlns=%22http://www.w3.org/2000/svg%22 width=%2221%22 height=%229%22 viewBox=%220 0 21 9%22%3E %3Cpath d=%22M14.5 5l-4 4-4-4zm0-1l-4-4-4 4z%22/%3E %3C/svg%3E") [1:1: background-image]
ERROR	Property: Invalid value for "CSS Level 2.1" property: linear-gradient(transparent, transparent), url("data:image/svg+xml,%3Csvg xmlns=%22http://www.w3.org/2000/svg%22 width=%2221%22 height=%229%22 viewBox=%220 0 21 9%22%3E %3Cpath d=%22M14.5 5l-4 4-4-4zm0-1l-4-4-4 4z%22/%3E %3C/svg%3E") [1:1: background-image]
ERROR	Property: Invalid value for "CSS Level 2

<td class="wikitable hlist" colspan="7" style="text-align:center;background-color:#eaecf0">
 1.000 mi = 1.609 km; 1.000 km = 0.621 mi
 <br/>
 <div class="hlist" style="margin-left:1.6em;text-align:center;font-size:90%">
  <ul align="inherit" style="margin:0; padding:0; text-align:inherit">
   <li style="display:inline; margin:0">
    <span style="border:1px solid #000;background-color:#ddffdd;color:#ddffdd">
    </span>
    <a href="https://en.wikipedia.org/wiki/Concurrency_(road)" title="Concurrency (road)">
     Concurrency
    </a>
    terminus
   </li>
   <li style="display:inline; margin:0">
    <span style="border:1px solid #000;background-color:#dcdcfe;color:#dcdcfe">
    </span>
    <a href="https://en.wikipedia.org/wiki/Electronic_toll_collection" title="Electronic toll collection">
     Electronic toll collection
    </a>
   </li>
   <li style="display:inline; margin:0">
    <span style="border:1px solid #000;background-color:#ffdddd;color:#ffdddd">
    </span>
    <a href="h

ERROR	Property: Invalid value for "CSS Level 2.1" property: auto [1:1: text-align]
ERROR	Property: Invalid value for "CSS Level 2.1" property: auto [1:1: text-align]
ERROR	Property: Invalid value for "CSS Level 2.1" property: auto [1:1: text-align]
ERROR	Property: Invalid value for "CSS Level 2.1" property: auto [1:1: text-align]
ERROR	Property: Invalid value for "CSS Level 2.1" property: auto [1:1: text-align]
ERROR	Property: Invalid value for "CSS Level 2.1" property: auto [1:1: text-align]
ERROR	Property: Invalid value for "CSS Level 2.1" property: auto [1:1: text-align]
ERROR	Property: Invalid value for "CSS Level 2.1" property: auto [1:1: text-align]
ERROR	Property: Invalid value for "CSS Level 2.1" property: auto [1:1: text-align]
ERROR	Property: No property value found: background-color: [1:17: :]
ERROR	CSSStyleDeclaration: Syntax Error in Property: background-color:
ERROR	PropertyValue: Missing token for production Choice(ColorValue, Dimension, URIValue, Value, variable, MSValu

<td class="wikitable hlist" colspan="7" style="text-align:center;background-color:#eaecf0">
 1.000 mi = 1.609 km; 1.000 km = 0.621 mi
 <br/>
 <div class="hlist" style="margin-left:1.6em;text-align:center;font-size:90%">
  <ul align="inherit" style="margin:0; padding:0; text-align:inherit">
   <li style="display:inline; margin:0">
    <span style="border:1px solid #000;background-color:#ffdddd;color:#ffdddd">
    </span>
    <a href="https://en.wikipedia.org/wiki/Interchange_(road)#Complete_and_incomplete_interchanges" title="Interchange (road)">
     Incomplete access
    </a>
   </li>
  </ul>
 </div>
</td>


ERROR	CSSStyleDeclaration: Unexpected token, ignoring upto '"'. [1:18: "]
ERROR	CSSStyleDeclaration: Unexpected token, ignoring upto '"'. [1:18: "]
ERROR	PropertyValue: Missing token for production Choice(ColorValue, Dimension, URIValue, Value, variable, MSValue, CSSCalc, function): ('CHAR', '#', 1, 12)
ERROR	No content to parse.
ERROR	PropertyValue: Unknown syntax or no value: #
ERROR	CSSStyleDeclaration: Syntax Error in Property: background:#
ERROR	PropertyValue: Missing token for production Choice(ColorValue, Dimension, URIValue, Value, variable, MSValue, CSSCalc, function): ('CHAR', '#', 1, 12)
ERROR	No content to parse.
ERROR	PropertyValue: Unknown syntax or no value: #
ERROR	CSSStyleDeclaration: Syntax Error in Property: background:#
ERROR	PropertyValue: Missing token for production Choice(ColorValue, Dimension, URIValue, Value, variable, MSValue, CSSCalc, function): ('CHAR', '#', 1, 12)
ERROR	No content to parse.
ERROR	PropertyValue: Unknown syntax or no value: #
ERROR	CSSStyleD

ERROR	PropertyValue: Missing token for production Choice(ColorValue, Dimension, URIValue, Value, variable, MSValue, CSSCalc, function): ('CHAR', '#', 1, 12)
ERROR	No content to parse.
ERROR	PropertyValue: Unknown syntax or no value: #
ERROR	CSSStyleDeclaration: Syntax Error in Property: background:#
ERROR	PropertyValue: Missing token for production Choice(ColorValue, Dimension, URIValue, Value, variable, MSValue, CSSCalc, function): ('CHAR', '#', 1, 12)
ERROR	No content to parse.
ERROR	PropertyValue: Unknown syntax or no value: #
ERROR	CSSStyleDeclaration: Syntax Error in Property: background:#
ERROR	PropertyValue: Missing token for production Choice(ColorValue, Dimension, URIValue, Value, variable, MSValue, CSSCalc, function): ('CHAR', '#', 1, 12)
ERROR	No content to parse.
ERROR	PropertyValue: Unknown syntax or no value: #
ERROR	CSSStyleDeclaration: Syntax Error in Property: background:#
ERROR	PropertyValue: Missing token for production Choice(ColorValue, Dimension, URIValue, Value, 

ERROR	Property: Invalid value for "CSS Level 2.1" property: center [1:57: vertical-align]
ERROR	Property: Invalid value for "CSS Level 2.1" property: center [1:57: vertical-align]
ERROR	Property: Invalid value for "CSS Level 2.1" property: center [1:57: vertical-align]
ERROR	Property: Invalid value for "CSS Level 2.1" property: center [1:57: vertical-align]
ERROR	Property: Invalid value for "CSS Level 2.1" property: center [1:57: vertical-align]
ERROR	Property: Invalid value for "CSS Level 2.1" property: center [1:57: vertical-align]
ERROR	Property: Invalid value for "CSS Level 2.1" property: center [1:57: vertical-align]
ERROR	Property: Invalid value for "CSS Level 2.1" property: center [1:57: vertical-align]
ERROR	Property: Invalid value for "CSS Level 2.1" property: center [1:57: vertical-align]
ERROR	Property: Invalid value for "CSS Level 2.1" property: center [1:57: vertical-align]
ERROR	Property: Invalid value for "CSS Level 2.1" property: center [1:57: vertical-align]
ERROR	Prop

ERROR	No content to parse.
ERROR	PropertyValue: Unknown syntax or no value: #
ERROR	CSSStyleDeclaration: Syntax Error in Property: background:#
ERROR	PropertyValue: Missing token for production Choice(ColorValue, Dimension, URIValue, Value, variable, MSValue, CSSCalc, function): ('CHAR', '#', 1, 12)
ERROR	No content to parse.
ERROR	PropertyValue: Unknown syntax or no value: #
ERROR	CSSStyleDeclaration: Syntax Error in Property: background:#
ERROR	PropertyValue: Missing token for production Choice(ColorValue, Dimension, URIValue, Value, variable, MSValue, CSSCalc, function): ('CHAR', '#', 1, 12)
ERROR	No content to parse.
ERROR	PropertyValue: Unknown syntax or no value: #
ERROR	CSSStyleDeclaration: Syntax Error in Property: background:#
ERROR	PropertyValue: Missing token for production Choice(ColorValue, Dimension, URIValue, Value, variable, MSValue, CSSCalc, function): ('CHAR', '#', 1, 12)
ERROR	No content to parse.
ERROR	PropertyValue: Unknown syntax or no value: #
ERROR	CSSStyleDecla

<td class="wikitable hlist" colspan="7" style="text-align:center;background-color:#eaecf0">
 1.000 mi = 1.609 km; 1.000 km = 0.621 mi
 <br/>
 <div class="hlist" style="margin-left:1.6em;text-align:center;font-size:90%">
  <ul align="inherit" style="margin:0; padding:0; text-align:inherit">
   <li style="display:inline; margin:0">
    <span style="border:1px solid #000;background-color:#ddffdd;color:#ddffdd">
    </span>
    <a href="https://en.wikipedia.org/wiki/Concurrency_(road)" title="Concurrency (road)">
     Concurrency
    </a>
    terminus
   </li>
   <li style="display:inline; margin:0">
    <span style="border:1px solid #000;background-color:#dcdcfe;color:#dcdcfe">
    </span>
    <a href="https://en.wikipedia.org/wiki/Electronic_toll_collection" title="Electronic toll collection">
     Electronic toll collection
    </a>
   </li>
   <li style="display:inline; margin:0">
    <span style="border:1px solid #000;background-color:#ffdddd;color:#ffdddd">
    </span>
    <a href="h

Unnamed: 0,_id,annotatedAt,annotations,features,html,pageID,pageTitle,tableIndex,tableTitle,taggedHtml,logBin
0,5cf28fdb1ae12a2691e6c562,1559401000000.0,"[Header, Header, Header, Data]","{'0': {'isShortText': 2, 'isEmptyAl': 0, 'isTe...","<table class=""wikitable floatright"" style=""flo...",10041828,Memories & Dust,0,,"<table class=""wikitable floatright"" style=""flo...","{'0': {'isShortText': {'a': 2, 'b': 1}, 'isEmp..."
1,5cf28fdb1ae12a2691e6c563,1559401000000.0,"[Header, Data]","{'0': {'isShortText': 3, 'isEmptyAl': 0, 'isTe...","<table class=""wikitable"">\n <tbody>\n <tr>\n ...",10041828,Memories & Dust,1,Charts[edit],"<table class=""wikitable"">\n <tbody>\n <tr dat...","{'0': {'isShortText': {'a': 3, 'b': 1}, 'isEmp..."
2,5cf28fdb1ae12a2691e6c564,1559401000000.0,"[Header, Data]","{'0': {'isShortText': 6, 'isEmptyAl': 0, 'isTe...","<table class=""wikitable"">\n <tbody>\n <tr>\n ...",10086127,Sant Esteve de Palautordera,0,Demography[edit],"<table class=""wikitable"">\n <tbody>\n <tr dat...","{'0': {'isShortText': {'a': 6, 'b': 2}, 'isEmp..."
3,5cf28fdb1ae12a2691e6c565,1559402000000.0,"[Header, Data, Data, Data, Data]","{'0': {'isShortText': 11, 'isEmptyAl': 0, 'isT...","<table class=""wikitable"" style=""text-align:cen...",1008145,Slovenia national football team,0,Euro 2000[edit],"<table class=""wikitable"" style=""text-align: ce...","{'0': {'isShortText': {'a': 11, 'b': 3}, 'isEm..."
4,5cf28fdb1ae12a2691e6c566,1559402000000.0,"[Header, Data, Data, Data, Data]","{'0': {'isShortText': 9, 'isEmptyAl': 0, 'isTe...","<table class=""wikitable"" style=""text-align:cen...",1008145,Slovenia national football team,1,2002 World Cup[edit],"<table class=""wikitable"" style=""text-align: ce...","{'0': {'isShortText': {'a': 9, 'b': 3}, 'isEmp..."


In [44]:
client = MongoClient()
db = client.bob
tablesCollection = db.tables
dictTables = tables.to_dict('records')
for table in dictTables:
    # upsert=True: replace the document with this _id, or insert it if it is missing
    tablesCollection.replace_one({'_id': table['_id']}, table, upsert=True)
client.close()

## Logarithmic Binning

In [45]:
client = MongoClient()
db = client.bob
tablesCollection = db.tables  # separate name so the collection is not confused with the DataFrame
cursor = tablesCollection.find({})
tables = pd.DataFrame(list(cursor))
client.close()
tables.head()

Unnamed: 0,_id,annotatedAt,annotations,features,html,pageID,pageTitle,tableIndex,tableTitle,taggedHtml
0,5cf28fdb1ae12a2691e6c562,1559401000000.0,"[Header, Header, Header, Data]","{'0': {'isShortText': 2, 'isEmptyAl': 0, 'isTe...","<table class=""wikitable floatright"" style=""flo...",10041828,Memories & Dust,0,,"<table class=""wikitable floatright"" style=""flo..."
1,5cf28fdb1ae12a2691e6c563,1559401000000.0,"[Header, Data]","{'0': {'isShortText': 3, 'isEmptyAl': 0, 'isTe...","<table class=""wikitable"">\n <tbody>\n <tr>\n ...",10041828,Memories & Dust,1,Charts[edit],"<table class=""wikitable"">\n <tbody>\n <tr dat..."
2,5cf28fdb1ae12a2691e6c564,1559401000000.0,"[Header, Data]","{'0': {'isShortText': 6, 'isEmptyAl': 0, 'isTe...","<table class=""wikitable"">\n <tbody>\n <tr>\n ...",10086127,Sant Esteve de Palautordera,0,Demography[edit],"<table class=""wikitable"">\n <tbody>\n <tr dat..."
3,5cf28fdb1ae12a2691e6c565,1559402000000.0,"[Header, Data, Data, Data, Data]","{'0': {'isShortText': 11, 'isEmptyAl': 0, 'isT...","<table class=""wikitable"" style=""text-align:cen...",1008145,Slovenia national football team,0,Euro 2000[edit],"<table class=""wikitable"" style=""text-align: ce..."
4,5cf28fdb1ae12a2691e6c566,1559402000000.0,"[Header, Data, Data, Data, Data]","{'0': {'isShortText': 9, 'isEmptyAl': 0, 'isTe...","<table class=""wikitable"" style=""text-align:cen...",1008145,Slovenia national football team,1,2002 World Cup[edit],"<table class=""wikitable"" style=""text-align: ce..."


In [68]:
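def calcA(c, r):
    """Bin index 'a' for a feature value c in a row with r columns."""
    if c == 0:
        return 0
    if c == r:
        return r
    if c > r / 2.0:
        # more than half of the row: bin by the remainder r - c instead
        return math.floor(math.log2(r - c) + 1)
    return math.floor(math.log2(c) + 1)

def calcB(c, r):
    """Bin index 'b': order of magnitude of the column count r (c is unused)."""
    return math.floor(math.log2(r))

def isInSameBin(rowA, rowB, featureKey):
    """True if both rows fall into the same (a, b) bin for featureKey."""
    return (
        calcB(rowA[featureKey], rowA['colCount']) == calcB(rowB[featureKey], rowB['colCount']) and
        calcA(rowA[featureKey], rowA['colCount']) == calcA(rowB[featureKey], rowB['colCount'])
    )

def logBinTable(table):
    """Map every raw feature value of every row to its logarithmic bin."""
    if len(table['features']) == 0:
        # return an empty dict so the result type matches the non-empty case
        return {}
    logBins = {}
    for rowIndex, row in table['features'].items():
        logBin = dict(row)  # copy, so the stored features keep their colCount
        colCount = logBin.pop('colCount')
        logBin = {
            featureKey: {
                'a': calcA(feature, colCount),
                'b': calcB(feature, colCount)
            } for featureKey, feature in logBin.items()
        }
        logBins[rowIndex] = logBin
    return logBins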
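
The binning can be checked by hand on a single row. The following is a minimal sketch that re-implements the two bin functions in a self-contained form; the helper names (`calc_a`, `calc_b`, `log_bin_row`), the feature `isBold`, and the example values are assumptions made for illustration and are not taken from the dataset.

```python
import math

def calc_a(c, r):
    # same branching as calcA above: 0 for "none", r for "all",
    # otherwise a logarithmic bin of c (or of r - c past the halfway point)
    if c == 0:
        return 0
    if c == r:
        return r
    if c > r / 2.0:
        return math.floor(math.log2(r - c) + 1)
    return math.floor(math.log2(c) + 1)

def calc_b(r):
    # order of magnitude of the column count
    return math.floor(math.log2(r))

def log_bin_row(row):
    # hypothetical helper: bins every feature of one row
    features = dict(row)
    col_count = features.pop('colCount')
    return {
        key: {'a': calc_a(value, col_count), 'b': calc_b(col_count)}
        for key, value in features.items()
    }

# made-up row with 6 columns
row = {'colCount': 6, 'isShortText': 6, 'isEmptyAl': 0, 'isBold': 5}
bins = log_bin_row(row)
print(bins)

# rows of different width can still share a bin:
# 2 of 6 columns and 3 of 7 columns both map to (a=2, b=2)
same_bin = (calc_a(2, 6), calc_b(6)) == (calc_a(3, 7), calc_b(7))
print(same_bin)
```

With these inputs, `isShortText` (all 6 of 6 columns) gets `a = 6`, `isEmptyAl` gets `a = 0`, and `isBold` (5 of 6) gets `a = 1` because only one column remains; `b` is 2 for every feature since the row has 6 columns.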

In [69]:
tables['logBin'] = tables.apply(logBinTable, axis='columns')
tables.head()

Unnamed: 0,_id,annotatedAt,annotations,features,html,pageID,pageTitle,tableIndex,tableTitle,taggedHtml,logBin
0,5cf28fdb1ae12a2691e6c562,1559401000000.0,"[Header, Header, Header, Data]","{'0': {'isShortText': 2, 'isEmptyAl': 0, 'isTe...","<table class=""wikitable floatright"" style=""flo...",10041828,Memories & Dust,0,,"<table class=""wikitable floatright"" style=""flo...","{'0': {'isShortText': {'a': 2, 'b': 1}, 'isEmp..."
1,5cf28fdb1ae12a2691e6c563,1559401000000.0,"[Header, Data]","{'0': {'isShortText': 3, 'isEmptyAl': 0, 'isTe...","<table class=""wikitable"">\n <tbody>\n <tr>\n ...",10041828,Memories & Dust,1,Charts[edit],"<table class=""wikitable"">\n <tbody>\n <tr dat...","{'0': {'isShortText': {'a': 3, 'b': 1}, 'isEmp..."
2,5cf28fdb1ae12a2691e6c564,1559401000000.0,"[Header, Data]","{'0': {'isShortText': 6, 'isEmptyAl': 0, 'isTe...","<table class=""wikitable"">\n <tbody>\n <tr>\n ...",10086127,Sant Esteve de Palautordera,0,Demography[edit],"<table class=""wikitable"">\n <tbody>\n <tr dat...","{'0': {'isShortText': {'a': 6, 'b': 2}, 'isEmp..."
3,5cf28fdb1ae12a2691e6c565,1559402000000.0,"[Header, Data, Data, Data, Data]","{'0': {'isShortText': 11, 'isEmptyAl': 0, 'isT...","<table class=""wikitable"" style=""text-align:cen...",1008145,Slovenia national football team,0,Euro 2000[edit],"<table class=""wikitable"" style=""text-align: ce...","{'0': {'isShortText': {'a': 11, 'b': 3}, 'isEm..."
4,5cf28fdb1ae12a2691e6c566,1559402000000.0,"[Header, Data, Data, Data, Data]","{'0': {'isShortText': 9, 'isEmptyAl': 0, 'isTe...","<table class=""wikitable"" style=""text-align:cen...",1008145,Slovenia national football team,1,2002 World Cup[edit],"<table class=""wikitable"" style=""text-align: ce...","{'0': {'isShortText': {'a': 9, 'b': 3}, 'isEmp..."


In [71]:
client = MongoClient()
db = client.bob
tablesCollection = db.tables
dictTables = tables.to_dict('records')
for table in dictTables:
    # upsert=True: replace the document with this _id, or insert it if it is missing
    tablesCollection.replace_one({'_id': table['_id']}, table, upsert=True)
client.close()

## Conditional Random Fields

In [4]:
client = MongoClient()
db = client.bob
tablesCollection = db.tables  # separate name so the collection is not confused with the DataFrame
cursor = tablesCollection.find({})
tables = pd.DataFrame(list(cursor))
client.close()
tables.head()

Unnamed: 0,_id,annotatedAt,annotations,features,html,logBin,pageID,pageTitle,tableIndex,tableTitle,taggedHtml
0,5cf28fdb1ae12a2691e6c562,1559401000000.0,"[Header, Header, Header, Data]","{'0': {'isShortText': 2, 'isEmptyAl': 0, 'isTe...","<table class=""wikitable floatright"" style=""flo...","{'0': {'isShortText': {'a': 2, 'b': 1}, 'isEmp...",10041828,Memories & Dust,0,,"<table class=""wikitable floatright"" style=""flo..."
1,5cf28fdb1ae12a2691e6c563,1559401000000.0,"[Header, Data]","{'0': {'isShortText': 3, 'isEmptyAl': 0, 'isTe...","<table class=""wikitable"">\n <tbody>\n <tr>\n ...","{'0': {'isShortText': {'a': 3, 'b': 1}, 'isEmp...",10041828,Memories & Dust,1,Charts[edit],"<table class=""wikitable"">\n <tbody>\n <tr dat..."
2,5cf28fdb1ae12a2691e6c564,1559401000000.0,"[Header, Data]","{'0': {'isShortText': 6, 'isEmptyAl': 0, 'isTe...","<table class=""wikitable"">\n <tbody>\n <tr>\n ...","{'0': {'isShortText': {'a': 6, 'b': 2}, 'isEmp...",10086127,Sant Esteve de Palautordera,0,Demography[edit],"<table class=""wikitable"">\n <tbody>\n <tr dat..."
3,5cf28fdb1ae12a2691e6c565,1559402000000.0,"[Header, Data, Data, Data, Data]","{'0': {'isShortText': 11, 'isEmptyAl': 0, 'isT...","<table class=""wikitable"" style=""text-align:cen...","{'0': {'isShortText': {'a': 11, 'b': 3}, 'isEm...",1008145,Slovenia national football team,0,Euro 2000[edit],"<table class=""wikitable"" style=""text-align: ce..."
4,5cf28fdb1ae12a2691e6c566,1559402000000.0,"[Header, Data, Data, Data, Data]","{'0': {'isShortText': 9, 'isEmptyAl': 0, 'isTe...","<table class=""wikitable"" style=""text-align:cen...","{'0': {'isShortText': {'a': 9, 'b': 3}, 'isEmp...",1008145,Slovenia national football team,1,2002 World Cup[edit],"<table class=""wikitable"" style=""text-align: ce..."
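
Each `logBin` entry above maps a row index (stored as a string) to the binned features of that row, while `annotations` is a plain list with one label per row. For sequence labeling, both have to be turned into two aligned lists per table. The following is a minimal sketch of that conversion on a hand-made table; the function name `table_to_sequences` and the example values are assumptions for illustration only.

```python
def table_to_sequences(log_bin, annotations):
    # sort the string keys numerically so that row '10' does not come before row '2'
    row_keys = sorted(log_bin, key=int)
    features = [log_bin[key] for key in row_keys]
    labels = [annotations[int(key)] for key in row_keys]
    return features, labels

# made-up table with two rows, deliberately stored out of order
log_bin = {
    '1': {'isShortText': {'a': 1, 'b': 1}},
    '0': {'isShortText': {'a': 2, 'b': 1}},
}
annotations = ['Header', 'Data']

xseq, yseq = table_to_sequences(log_bin, annotations)
print(xseq)
print(yseq)
```

The result is one feature sequence and one label sequence of equal length, with position `i` in both lists describing row `i` of the table.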


In [27]:
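def getCRFFeatures(table):
    """Return the per-row feature dicts and labels of one table as aligned lists."""
    tableFeatures = table['logBin']
    tableAnnotations = table['annotations']
    rowFeatures = []
    rowAnnotations = []

    # row indices are string keys; sort them numerically to keep the row order
    for rowIndex in sorted(tableFeatures, key=int):
        rowFeatures.append(tableFeatures[rowIndex])
        rowAnnotations.append(tableAnnotations[int(rowIndex)])
    return (rowFeatures, rowAnnotations)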

tableFeatures = tables.apply(getCRFFeatures, axis='columns')
featureSequence = []
labelSequence = []
for tf in tableFeatures:
    featureSequence.append(tf[0])
    labelSequence.append(tf[1])

#items = pycrfsuite.ItemSequence(featureSequence)

In [28]:
X_train, X_test, y_train, y_test = train_test_split(featureSequence, labelSequence, test_size=0.1)

trainer = pycrfsuite.Trainer(verbose=True)

print(len(X_train))
print(len(y_train))

for xseq, yseq in zip(X_train, y_train):
    trainer.append(xseq, yseq)

# params copied from https://github.com/scrapinghub/python-crfsuite/blob/master/examples/CoNLL%202002.ipynb
trainer.set_params({
    'c1': 1.0,   # coefficient for L1 penalty
    'c2': 1e-3,  # coefficient for L2 penalty
    'max_iterations': 50,  # stop earlier

    # include transitions that are possible, but not observed
    'feature.possible_transitions': True
})

trainer.train('../data/firstTraining.crfsuite')

5229
5229
Feature generation
type: CRF1d
feature.minfreq: 0.000000
feature.possible_states: 0
feature.possible_transitions: 1
0....1....2....3....4....5....6....7....8....9....10
Number of features: 399
Seconds required: 0.612

L-BFGS optimization
c1: 1.000000
c2: 0.001000
num_memories: 6
max_iterations: 50
epsilon: 0.000010
stop: 10
delta: 0.000010
linesearch: MoreThuente
linesearch.max_iterations: 20

***** Iteration #1 *****
Loss: 54923.071128
Feature norm: 0.500000
Error norm: 158659.944288
Active features: 339
Line search trials: 2
Line search step: 0.000000
Seconds required for this iteration: 0.443

***** Iteration #2 *****
Loss: 45603.986961
Feature norm: 0.463833
Error norm: 156859.957943
Active features: 332
Line search trials: 1
Line search step: 1.000000
Seconds required for this iteration: 0.149

***** Iteration #3 *****
Loss: 17140.965007
Feature norm: 0.899114
Error norm: 261925.551554
Active features: 232
Line search trials: 3
Line search step: 0.250000
Seconds required

***** Iteration #40 *****
Loss: 1143.362281
Feature norm: 5.721949
Error norm: 526.482603
Active features: 318
Line search trials: 1
Line search step: 1.000000
Seconds required for this iteration: 0.209

***** Iteration #41 *****
Loss: 1140.634086
Feature norm: 5.805245
Error norm: 334.722788
Active features: 292
Line search trials: 1
Line search step: 1.000000
Seconds required for this iteration: 0.155

***** Iteration #42 *****
Loss: 1139.013944
Feature norm: 5.902124
Error norm: 405.136825
Active features: 291
Line search trials: 1
Line search step: 1.000000
Seconds required for this iteration: 0.159

***** Iteration #43 *****
Loss: 1136.534145
Feature norm: 5.916094
Error norm: 173.808118
Active features: 316
Line search trials: 1
Line search step: 1.000000
Seconds required for this iteration: 0.155

***** Iteration #44 *****
Loss: 1134.648046
Feature norm: 6.026881
Error norm: 333.645435
Active features: 315
Line search trials: 1
Line search step: 1.000000
Seconds required for thi