# Skeleton Moment demonstrator

*Gareth Howells, January 2022*

This notebook provides code samples to explore the effectiveness of moment-based features for handwritten character recognition. The code is deliberately designed as a basic skeleton so that you can experiment with it. For that reason, many parameters are set via global "constants" in the cell below, and little parameter checking or error handling is provided. There are also commented-out **print** "debugging" commands, which you may uncomment to display information as you explore (or you can use the debugger).

The following cell provides the imports and constant values used by the system. To keep the user interface simple, variations in performance can be explored by amending the appropriate constants. It is recommended that only the constants under "values to explore" are edited.

In [1]:
import numpy as np
import pandas as pd
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt
import matplotlib.image as pltimg
import time

###############################################
## CHANGE THE FOLLOWING PATH BEFORE STARTING ##
###############################################
# change the following to the location where you have uploaded the test data
# there should be two folders within this folder called "train" and "test"
EXAMPLES_LOCATION = "/home/elr/wgjh1/jupyter/class_1_data/digits/" 

# Normally, you should not need to alter the following values
LOCATION_SUFFIX = ".norm"
ROWS = 24 # number of rows in the image (moment value y)
COLS = 16 # number of columns in the image (moment value x)

# Values to explore: change the following as you proceed with your investigation, or amend the driver function to read them in from the keyboard
NUMBER_TRAINING_EXAMPLES = 10   # change this to alter the number of training patterns
PMAX = 2 # you can increase the p values investigated
QMAX = 2 # you can independently change the q values investigated

MAX_EXAMPLES = NUMBER_TRAINING_EXAMPLES * 10 # maximum number of training or testing examples to be read in a given instance


## moment calculator

The following function calculates the pq'th moment value $m_{pq}$ for a given image. It is thus the fundamental component of this notebook.
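
For reference, the raw geometric moment is commonly defined as $m_{pq} = \sum_x \sum_y x^p y^q I(x, y)$. The sketch below evaluates that sum on a tiny made-up array. It is not the solution for this notebook's data: the name `raw_moment_sketch`, the zero-based indices, and the choice of columns as x and rows as y are all assumptions, so check them against the expected values given for the moment tester.

```python
import numpy as np

def raw_moment_sketch(p, q, image):
    """Hypothetical helper: sum of x**p * y**q * pixel over a 2-D array.

    Assumes zero-based indices, x = column index, y = row index.
    """
    rows, cols = image.shape
    y = np.arange(rows).reshape(rows, 1)  # column vector of row indices
    x = np.arange(cols).reshape(1, cols)  # row vector of column indices
    return float(np.sum((x ** p) * (y ** q) * image))

toy = np.array([[0, 1, 0],
                [1, 1, 1]])  # made-up 2 x 3 binary image

toy_m00 = raw_moment_sketch(0, 0, toy)  # number of set pixels
toy_m10 = raw_moment_sketch(1, 0, toy)  # sum of column indices of set pixels
toy_m01 = raw_moment_sketch(0, 1, toy)  # sum of row indices of set pixels
print(toy_m00, toy_m10, toy_m01)
```

Broadcasting the two index vectors avoids explicit nested loops, but a pair of `for` loops over rows and columns would compute the same sum.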

In [2]:
def moment(p,q,image):
    dim = image.shape # find dimensions of the image
    
    total = 0 # named "total" to avoid shadowing Python's built-in sum()
    
    # PUT YOUR CODE HERE
    
    return total

## Reading images from the example files

The following functions read a set of images from a given file. For simplicity, they are tailored to the file format supplied and hence are not generic. They would require modification to read differing file formats, although the remainder of the notebook should remain applicable with relatively little difficulty.

The first function, **read_image**, reads a single image into position **example** in the **images** array. Note that, in the file format supplied, there is an end-of-line character at the end of each row of the image, which is read in and discarded.

Note that subsequent images follow immediately after the given image; there is no form of image separator in the file (you can open the file with a simple text editor like **notepad** to see that it consists of only *0* and *1* characters).

The second function, **read_images**, governs the opening and closing of the files together with reading a set of images from the given file. Note that the location of the file is built from the constants above, the directory name, and a variable holding the name of the class as a numeral: "0", "1", "2", etc.
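
To make the file layout concrete, the sketch below parses two tiny images from an in-memory string laid out the same way: rows of `0` and `1` characters, each ended by a newline, with no separator between images. The 2 x 3 image size, the sample text, and the name `read_toy_image` are assumptions for illustration only; the real files use the **ROWS** and **COLS** constants.

```python
import io
import numpy as np

TOY_ROWS, TOY_COLS = 2, 3  # hypothetical small image size for illustration

def read_toy_image(f, example, images):
    """Read one image of 0/1 characters from f into images[example]."""
    for r in range(TOY_ROWS):
        for c in range(TOY_COLS):
            images[example, r, c] = int(f.read(1))  # one pixel per character
        f.read(1)  # discard the end-of-line character
    return images

sample_text = "010\n111\n100\n001\n"  # two made-up images back to back
toy_images = np.zeros((2, TOY_ROWS, TOY_COLS))

with io.StringIO(sample_text) as f:  # stands in for an opened data file
    for example in range(2):
        read_toy_image(f, example, toy_images)

print(toy_images[1])
```

Because there is no separator, the second image simply starts at the character after the newline that ends the first image's last row.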


In [3]:
def read_image(f,example,images):

    for x in range(ROWS):
        for y in range(COLS):
            images[example,x,y]=(int(f.read(1))) # read either the 0 or 1 character and convert to an integer 0 or 1
        f.read(1) # skip eol character
    
    return(images)
    

def read_images(dir_name, examples, class_name,images):

    location = EXAMPLES_LOCATION + dir_name + str(class_name) + LOCATION_SUFFIX # construct the filename from the name of the class
    
    with open(location) as f: # the file is closed automatically when the block ends
        for example in range(examples):
            read_image(f,example,images)
            
    return(images)

## Display pattern

Another support function to display a range of patterns from an array using **matplotlib**.

It displays the first **no** images from the array so requires modification to display a specific image.


In [4]:
def show_patterns(images,no):
    
    for nxt_img in range(no):
        print (nxt_img)
        imgplot = plt.imshow(images[nxt_img])
        plt.show()
        #time.sleep(2) # uncomment this and change the delay to watch images print one at a time.

## Generate moments for the given image 

This function generates all the required moments for a particular image, with p ranging from 0 to **pmax** - 1 and q ranging from 0 to **qmax** - 1.

The results are stored in a dictionary named **results** where the calculated moment is added to the end of the list of values for all images associated with the named moment $m_{pq}$. 

In [5]:
def gen_image_moments(pmax,qmax, image, results):

    for p in range(pmax):
        for q in range(qmax):
           
            m_name = "m" + str(p) + str (q) # generate the "name" of the moment to use to index the dictionary of moment values 
            
            m = moment(p,q,image) # calculate the moment value m_pq
            
            # print(m_name + " is "+ str(m) +"\n") # uncomment to see results as they are calculated
                                 
            results[m_name].append(m) # add the calculated moment to the appropriate entry in the dictionary --- can eliminate variable m if printing the value were not required
                        
            # print ("Moments for " + m_name + " is " + str(results[m_name])) # uncomment to see "running total" of moment values
            
    return(results)

## Generate all Training Feature values

This function generates all the required moment values for training by repeatedly calling the **gen_image_moments** function to build a dictionary containing all the required training values for a set of patterns. They are stored sequentially in the list associated with each dictionary item, i.e. the value of the moment $m_{pq}$ for the n'th training pattern is the n'th entry in the list held under the dictionary entry for $m_{pq}$.

The class identifier **class_id** of each pattern (i.e. the actual class of the training pattern whose values lie at the n'th position of each dictionary entry) is added after the moments have been calculated, as the n'th entry in the list held under the dictionary entry named **class_id**. It forms the target for the Decision Tree during training.

In [6]:
def gen_moment_features(class_id,no_training_pats,images,train_features):
        
    for pattern in range(no_training_pats):
        train_features = gen_image_moments(PMAX,QMAX,images[pattern],train_features)
        
        train_features["class_id"].append(class_id) # add the true class id of the training pattern at the end
    
    # print("Raw Trained Feature values")
    # print(train_features) # uncomment here if you want to see how the "raw" dictionary of trained values appears
    
    return(train_features)


## Read in and calculate the feature values for the test image set ##

**test_image** generates the moment feature values for a given image and returns them as a list.

**test_images** drives the process by reading in the number of images given by **no_pats** for the class given by **class_id**. Note that the file name is also constructed using this variable. The feature values of each image are collected in the list of lists **test_feature_vals**, where each component list consists of the moment features for a given image.

NOTE: these functions produce a list of lists, with each entry in the outer list comprising the list of moment features for a given pattern.
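
Each inner list is built with p as the outer loop and q as the inner loop, which is the same order used to generate the feature names, so the positions line up. A small sketch of that ordering, using a hypothetical placeholder function `fake_moment` in place of **moment**:

```python
def fake_moment(p, q):
    """Hypothetical stand-in that just encodes p and q so the order is visible."""
    return 10 * p + q

pmax, qmax = 2, 2
names = ["m" + str(p) + str(q) for p in range(pmax) for q in range(qmax)]
values = [fake_moment(p, q) for p in range(pmax) for q in range(qmax)]

# One inner list per test pattern; here two identical made-up patterns
list_of_lists = [values, list(values)]

print(names)
print(list_of_lists)
```

Position i of every inner list therefore corresponds to entry i of the name list.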

In [7]:
def test_image(image,pmax,qmax):
          
    # calculate list of moment values for the image and add it to the initially empty list of values
    
    test_feature_val = [ moment(p,q,image) for p in range(pmax) for q in range(qmax) ] 
  
    return(test_feature_val)


def test_images(class_id,no_pats):
    
    images = np.zeros((MAX_EXAMPLES,ROWS,COLS)) # empty array to store the test images
    
    images = read_images("test/bs/",no_pats,class_id,images) # read in the images, this time from the "test" folder, this may be modified to use other test files available
    
    #show_patterns(images,no_pats) # uncomment this statement to see the test images that will be used.
    
    # Generate the list of lists of test image---each entry is a list of feature values for one pattern
    test_feature_vals = [test_image(images[pat],PMAX,QMAX) for pat in range(no_pats)]
        
    # print(" Test vals ") # uncomment here to see the feature values used for testing
    # print(test_feature_vals)
    
    return(test_feature_vals)


Another relatively simple function, **generate_feature_dict**, generates an empty Python dictionary in which the name of each moment feature is paired with a list that will hold the associated values for each pattern, together with an additional entry **class_id** which contains a list of the ids for each pattern in the order they occur, e.g. the first entry contains the class identifier for the first entry in the corresponding lists of all the feature values. For the following example, therefore, **class_id** would contain a list identifying the class from which each row derives.

- 214.0  2759.0  1699.0  21463.0
- 167.0  2056.0  1336.0  14440.0
- 187.0  2371.0  1554.0  18152.0
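
As a sketch of how such a dictionary looks once filled, the block below stores the three example rows above column-wise and views them as a DataFrame. The column order m00, m01, m10, m11 is an inference from the first row matching the expected output of the moment tester, and the **class_id** values of 0 are placeholders, since the classes of these rows are not stated.

```python
import pandas as pd

example_dict = {
    "m00": [214.0, 167.0, 187.0],
    "m01": [2759.0, 2056.0, 2371.0],
    "m10": [1699.0, 1336.0, 1554.0],
    "m11": [21463.0, 14440.0, 18152.0],
    "class_id": [0, 0, 0],  # placeholder labels (assumption)
}

example_df = pd.DataFrame(example_dict)  # one row per pattern
print(example_df)

# Entry n of every list belongs to pattern n
second_pattern = [example_dict[name][1] for name in ["m00", "m01", "m10", "m11"]]
print(second_pattern)
```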

In [8]:
def generate_feature_dict(feature_names):
    
    feature_dict = dict()
    
    for nxt_feature in feature_names:
        feature_dict[nxt_feature] = list()
        
    feature_dict["class_id"] = list()
    
    return(feature_dict)
                                

# Training #

This function drives the training of the Decision Tree **test_tree** passed in as a parameter.

The images are read into an array called **images**. 

The feature values are stored in a dictionary, keyed by feature name, with the following format:
{"m00" : list(), "m01":list(), "m10":list(), "m11": list(), "class_id" : list()}

Training works as follows:
1. generate a list of the names of the moment features using the p and q numbers tagged onto the end of the character "m"
2. generate an empty dictionary to store the generated values
3. for each class, read in the required number of training images from the file, generate the moment features, placing the results in the dictionary
4. load the training data into a Pandas DataFrame object
5. divide the data frame into the training features and the target class **class_id** 
6. train the decision tree

The above comments are repeated prior to the Python statements associated with them below.

It is possible to print out the decision tree generated. This can be quite insightful in gaining an understanding of what is being deduced from the training patterns.
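
The last three training steps can be sketched on a toy dictionary as follows. The feature values and labels are made up for illustration, and the `toy_` names are hypothetical; `export_text` from scikit-learn is used here as a text alternative to plotting the tree.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

toy_names = ["m00", "m01"]
toy_cols = {
    "m00": [10.0, 12.0, 30.0, 33.0],  # made-up feature values
    "m01": [50.0, 55.0, 90.0, 95.0],
    "class_id": [0, 0, 1, 1],         # made-up labels
}

toy_df = pd.DataFrame(toy_cols)   # load the dictionary into a DataFrame
toy_features = toy_df[toy_names]  # training features
toy_target = toy_df["class_id"]   # target class

toy_tree = DecisionTreeClassifier(random_state=0).fit(toy_features, toy_target)

# Print the learned decision rules as indented text
print(export_text(toy_tree, feature_names=toy_names))

# Classify two unseen made-up patterns
unseen = pd.DataFrame({"m00": [11.0, 31.0], "m01": [52.0, 92.0]})
toy_predictions = toy_tree.predict(unseen)
print(toy_predictions)
```

With data this cleanly separated, a single threshold on one feature is enough, which is what the printed rules should show.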

In [9]:
def train_classifier(test_tree, no_classes, no_training_examples):
    images = np.zeros((MAX_EXAMPLES,ROWS,COLS))
    
    # generate a list of the names of the moment features using the p and q numbers tagged onto the end of the character "m"
    feature_names =  ["m" + str(p) + str (q) for p in range(PMAX) for q in range(QMAX)] 
        
    # print(feature_names) # uncomment this statement to see the generated names
    
    # generate an empty dictionary
    panel_cols = generate_feature_dict(feature_names)
    
    #print("cols : " + str(panel_cols)) # uncomment this statement to see the empty dictionary

    # for each class, read in the required number of training images from the file, generate the moment features,
    # placing the results in the dictionary
    
    for file_name in range(no_classes):
        images= read_images("train/br/",no_training_examples,file_name,images)
        
        #show_patterns(images,no_training_examples) # uncomment this line to see the training patterns

        features = gen_moment_features(file_name,no_training_examples,images,panel_cols)
        
    #load the training data into a Pandas DataFrame object:
    
    df = pd.DataFrame(panel_cols)

    #print(df) # uncomment here to see the data frame

    # Divide the data frame into the training features and the target class class_id 
    features = df[feature_names]
    
    # print( "FEATURES")
    # print(features) # uncomment here to see the actual dataframe
    
    target=df['class_id']
    test_tree = test_tree.fit(features,target) # train the decision tree
   
    # tree.plot_tree(test_tree) # uncomment here to see the actual decision tree produced. This can be quite insightful in 
    # plt.show()                # gaining an understanding of what is being deduced from the training patterns.

    return(test_tree)

# A simple function to test your moment calculator #

Use the following function initially to test your moment generator.

The answer should be:
    {'m00': [214.0], 'm01': [2759.0], 'm10': [1699.0], 'm11': [21463.0], 'class_id': []}

In [10]:
def moment_tester():
    
    image = np.zeros((1,ROWS,COLS)) # space for one image
    
    feature_names = ["m" + str(p) + str (q) for p in range(2) for q in range(2)] # generate feature names for moments to order 2 (so we can see the outputs)
             
    panel_cols = generate_feature_dict(feature_names) # create empty dictionary
    
    read_images("train/br/", 1, 0,image) # read 1 image from the class 0 training file
    
    features = gen_image_moments(2,2,image[0],panel_cols) # calculate the moments up to order 2 for this image
    
    print( "FEATURES")
    print(features)
    

# Main / Driver Function #

Main driver function for the system. It requests the number of training classes, trains the classifier, and then repeatedly asks for a test class and a number of examples to classify.

Feel free to edit this function to loop and ask for as many training samples as you require, or to test all classes. You could also add error checking. Additionally, you could explore changing the order of the moments by reading in the PMAX and QMAX values.
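
One possible shape for such error checking is sketched below. Both helper names are hypothetical and not part of the notebook, and the bounds passed in are for you to choose.

```python
def parse_bounded_int(text, low, high):
    """Hypothetical helper: convert text to an int within [low, high] or raise ValueError."""
    value = int(text.strip())  # raises ValueError for non-numeric text
    if not low <= value <= high:
        raise ValueError(f"expected a value from {low} to {high}, got {value}")
    return value

def ask_bounded_int(prompt, low, high):
    """Hypothetical helper: keep prompting until the user enters a valid value."""
    while True:
        try:
            return parse_bounded_int(input(prompt), low, high)
        except ValueError as err:
            print("Invalid input:", err)

print(parse_bounded_int(" 3 ", 1, 10))
```

Keeping the parsing separate from the prompting means the validation can be checked without typing anything at the keyboard.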


In [11]:
def driver():

    test_tree = DecisionTreeClassifier() # untrained decision tree

           
    no_classes = int(input("How many training classes?"))
    test_tree = train_classifier(test_tree, no_classes, NUMBER_TRAINING_EXAMPLES) # you may wish to rewrite this function to read in the number of training examples from the user

    # loop classes and class numbers until user manually stops execution
    while (True):
        class_name = input("Test class name>")
        no_examples = int(input("how many examples?"))

        test_features = test_images(class_name,no_examples)

        #print("tree features") # uncomment here to see the feature values
        #print(test_features)

        # the results are returned as a list of numerals showing the predicted class of each image in turn
        print("I think the test patterns are: " + str(test_tree.predict(test_features))) 

moment_tester() # use this function first to test your moment calculator. Once complete, comment this line out and uncomment driver
# driver() # uncomment here to run the main decision tree classifier.

