# Features

This example demonstrates how to use a dictionary mapping to represent features.

Features are clues extracted from the input that make the data easier to tag.

We use a Python dictionary in which each key names a feature and each value holds the clue extracted from the input data.
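
As a minimal sketch of that structure, the snippet below builds a feature dictionary for one of the vehicle numbers used later and pairs it with its label. The function name `extract_features` and the feature names `prefix` and `length` are hypothetical, chosen only for illustration; they are not the features used in the cells that follow.

```python
# A featureset is a plain dict: feature name -> value extracted from the input.
# 'prefix' and 'length' are illustrative feature names, not part of the original example.
def extract_features(number):
    return {
        'prefix': number[:2],   # first two characters of the number
        'length': len(number),  # total number of characters
    }

# A training example pairs a featureset with its label.
example = (extract_features('KA-01-G 0001 A'), 'gov')
print(example)
```

A list of such `(featureset, label)` pairs is the shape of input that the training calls below receive.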

In [1]:
import nltk
import random

data = [
    ('KA-01-F 1034 A', 'rtc'),
    ('KA-02-F 1030 B', 'rtc'),
    ('KA-03-FA 1200 C', 'rtc'),
    ('KA-01-G 0001 A', 'gov'),
    ('KA-02-G 1004 A', 'gov'),
    ('KA-03-G 0204 A', 'gov'),
    ('KA-04-G 9230 A', 'gov'),
    ('KA-27 1290', 'oth')
]
random.shuffle(data)
test_data = [
    'KA-01-G 0109',
    'KA-02-F 9020 AC',
    'KA-02-FA 0801',
    'KA-01 9129'
]

In [4]:
def learn_simple_features():
    def vehicle_number_feature(number):
        # Single feature: the character at index 6 (e.g. 'G' in 'KA-01-G 0001 A').
        return {'vehicle_class': number[6]}
    
    feature_sets = [(vehicle_number_feature(vn), cls) for vn, cls in data]
    classifier = nltk.NaiveBayesClassifier.train(feature_sets)
    for num in test_data:
        feature = vehicle_number_feature(num)
        print(f"(simple) {num} is of type {classifier.classify(feature)}")

In [5]:
learn_simple_features()

(simple) KA-01-G 0109 is of type gov
(simple) KA-02-F 9020 AC is of type rtc
(simple) KA-02-FA 0801 is of type rtc
(simple) KA-01 9129 is of type gov
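
The last prediction deserves a second look. One way to check it, sketched here in plain Python with copies of the training and test strings from above, is to compare the character at index 6 across both sets. The names `train_numbers`, `test_numbers`, and `class_char` are hypothetical helpers introduced for this check.

```python
# Copies of the vehicle numbers from the training and test data above.
train_numbers = [
    'KA-01-F 1034 A', 'KA-02-F 1030 B', 'KA-03-FA 1200 C',
    'KA-01-G 0001 A', 'KA-02-G 1004 A', 'KA-03-G 0204 A',
    'KA-04-G 9230 A', 'KA-27 1290',
]
test_numbers = ['KA-01-G 0109', 'KA-02-F 9020 AC', 'KA-02-FA 0801', 'KA-01 9129']

def class_char(number):
    # Same position the 'vehicle_class' feature reads.
    return number[6]

# Feature values observed during training.
seen = {class_char(n) for n in train_numbers}

# Test numbers whose feature value never occurred in training.
unseen = [n for n in test_numbers if class_char(n) not in seen]

print(sorted(seen))
print(unseen)
```

For `KA-01 9129`, index 6 holds `9`, a value that never occurs in the training data. The single feature therefore gives little direct evidence for any class, and the `gov` answer for that number should be treated with caution.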


In [6]:
def learn_features():
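    def vehicle_number_feature(number):
        # Two features: the character at index 6 plus the one just before it.
        return {
            'vehicle_class': number[6],  # e.g. 'G' in 'KA-01-G 0001 A'
            'vehicle_pre': number[5]     # '-' for most training numbers, ' ' in 'KA-27 1290'
        }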

    feature_sets = [(vehicle_number_feature(vn), cls) for vn, cls in data]
    classifier = nltk.NaiveBayesClassifier.train(feature_sets)
    for num in test_data:
        feature = vehicle_number_feature(num)
        print(f"(dual) {num} is of type {classifier.classify(feature)}")

In [7]:
learn_features()

(dual) KA-01-G 0109 is of type gov
(dual) KA-02-F 9020 AC is of type rtc
(dual) KA-02-FA 0801 is of type rtc
(dual) KA-01 9129 is of type oth
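
To see why the second feature changes the last prediction, the sketch below groups the character at index 5 by label, using a copy of the training data from above. It is plain Python and does not touch the classifier; the name `pre_by_label` is hypothetical.

```python
from collections import defaultdict

# Copy of the training data from above.
data = [
    ('KA-01-F 1034 A', 'rtc'), ('KA-02-F 1030 B', 'rtc'), ('KA-03-FA 1200 C', 'rtc'),
    ('KA-01-G 0001 A', 'gov'), ('KA-02-G 1004 A', 'gov'), ('KA-03-G 0204 A', 'gov'),
    ('KA-04-G 9230 A', 'gov'), ('KA-27 1290', 'oth'),
]

# Collect the values of the 'vehicle_pre' position (index 5) seen for each label.
pre_by_label = defaultdict(set)
for number, label in data:
    pre_by_label[label].add(number[5])

print(dict(pre_by_label))

# The same position in the test number that changed class.
print(repr('KA-01 9129'[5]))
```

In the training data a space at index 5 occurs only with `oth`, while `rtc` and `gov` always have a hyphen there. That is consistent with the classifier now labelling `KA-01 9129` as `oth` instead of `gov`.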
