<a href="https://colab.research.google.com/github/TristanFaine/Master_2_MLVC_Recognize_Handwritten_Equation/blob/main/Project_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task

We were given InkML files, which contain metadata and a list of strokes, each stroke being a sequence of points: [(0,0),(1,0)]...  
These are online handwritten mathematical expressions. We try to recognize them and produce an LG (Labelled Graph) file as output.  
This is the sequence of actions performed:  
1) Determine possible stroke combinations.  
2) Remove impossible combinations with a classifier.  
3) Convert each combination to a symbol.  

We stopped at symbol recognition and do not handle spatial relations.  
This could be improved further by applying a grammar or language model to the predicted symbols, to check whether the resulting equation makes sense and, if it does not, to redo the prediction with different thresholds.
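To make the input format described at the top of this section concrete, here is a minimal sketch of how the strokes of an InkML file could be read with the standard library. It assumes the usual CROHME layout, where each `<trace>` element holds points separated by commas and coordinates separated by spaces; `read_strokes` and `SAMPLE` are hypothetical names used for illustration, not part of our scripts.

```python
import xml.etree.ElementTree as ET

# InkML elements live in this XML namespace.
INKML_NS = {"ink": "http://www.w3.org/2003/InkML"}

# Tiny hand-written sample (an assumption, not taken from the dataset).
SAMPLE = """<ink xmlns="http://www.w3.org/2003/InkML">
  <trace id="0">0 0, 1 0</trace>
  <trace id="1">0 1, 1 1, 2 1</trace>
</ink>"""


def read_strokes(inkml_text):
    """Return one list of (x, y) points per <trace> element."""
    root = ET.fromstring(inkml_text)
    strokes = []
    for trace in root.findall("ink:trace", INKML_NS):
        points = []
        for point in trace.text.strip().split(","):
            coords = point.split()
            # Keep only x and y; extra channels (e.g. time) are ignored.
            points.append((float(coords[0]), float(coords[1])))
        strokes.append(points)
    return strokes


strokes = read_strokes(SAMPLE)
print(len(strokes), "strokes, first one:", strokes[0])
```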

# Environment setup

## Getting project files

We will be importing the project files from our github repository.

In [1]:
!git clone https://github.com/TristanFaine/Master_2_MLVC_Recognize_Handwritten_Equation.git

Cloning into 'Master_2_MLVC_Recognize_Handwritten_Equation'...
remote: Enumerating objects: 124721, done.
remote: Counting objects: 100% (5010/5010), done.
remote: Compressing objects: 100% (3826/3826), done.
remote: Total 124721 (delta 1233), reused 4936 (delta 1167), pack-reused 119711
Receiving objects: 100% (124721/124721), 51.78 MiB | 28.34 MiB/s, done.
Resolving deltas: 100% (1237/1237), done.
Checking out files: 100% (214382/214382), done.


In [2]:
%cd Master_2_MLVC_Recognize_Handwritten_Equation/code

/content/Master_2_MLVC_Recognize_Handwritten_Equation/code


We will first show what each script does, and how to interpret their output, then we will show how to use the evaluation scripts, and finally detail how each script functions.

## Data to train our classifiers

[This archive](https://uncloud.univ-nantes.fr/index.php/s/OdAtErZgxKGjsNy) contains isolated symbols (in datasymbol_iso/) and full expressions (in FullExpressions/) that we use to train our classifiers.



# Showcasing parts of code

### segmenter.py explanation

The first action is to collect the possible stroke combinations. For now we simply take every combination of consecutive strokes.  
For example, the file used below has 5 strokes (indexed 0 to 4) and yields 14 hypotheses, covering every run of 1 to 4 consecutive strokes.  

Each line of the output corresponds to one hypothesis. Its fields are, in order: the record type (`O`), the hypothesis identifier (a dummy value such as `hyp0` for now, since we do not yet know which symbol the combination could be), the symbol label (`*` while it is unknown), the confidence of the model, and finally the indices of the strokes used.

At this point, the process could be optimized by removing hypotheses that do not make sense. For instance, building one symbol out of every single stroke of an expression is highly irregular and should not happen, so such a combination could be removed before calling the other scripts.

The ground truth for the segmentation is available in the original lg files, since the strokes used are listed next to each symbol.

In [8]:
!python3 segmenter.py -i ../data/formulaire001-equation001.inkml -o ../data/example.lg

In [7]:
!python3 segmenter.py -i ../data/formulaire001-equation001.inkml

O,hyp0,*,1.0,0
O,hyp1,*,1.0,1
O,hyp2,*,1.0,2
O,hyp3,*,1.0,3
O,hyp4,*,1.0,4
O,hyp5,*,1.0,0,1
O,hyp6,*,1.0,1,2
O,hyp7,*,1.0,2,3
O,hyp8,*,1.0,3,4
O,hyp9,*,1.0,0,1,2
O,hyp10,*,1.0,1,2,3
O,hyp11,*,1.0,2,3,4
O,hyp12,*,1.0,0,1,2,3
O,hyp13,*,1.0,1,2,3,4



As we've said earlier, we simply keep every consecutive stroke combination as our possible hypotheses.
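The enumeration above can be sketched in a few lines. This is not the code of segmenter.py, only an illustration: the function name `consecutive_hypotheses` and the `max_len=4` limit are assumptions, the latter inferred from the output shown above, where no hypothesis uses more than four strokes.

```python
def consecutive_hypotheses(n_strokes, max_len=4):
    """List every run of 1..max_len consecutive strokes as LG object lines."""
    lines = []
    hyp_id = 0
    # Shorter combinations first, then left to right, as in the output above.
    for length in range(1, min(max_len, n_strokes) + 1):
        for start in range(n_strokes - length + 1):
            strokes = ",".join(str(i) for i in range(start, start + length))
            # "*" is the placeholder label and 1.0 the placeholder confidence.
            lines.append(f"O,hyp{hyp_id},*,1.0,{strokes}")
            hyp_id += 1
    return lines


for line in consecutive_hypotheses(5):
    print(line)
```

With 5 strokes this produces 5 + 4 + 3 + 2 = 14 lines, matching the number of hypotheses listed for the example file.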

### Git stuff

If a commit goes wrong, amend it or run the two cells below.

In [36]:
#!git stash
#!git stash drop

Saved working directory and index state WIP on main: 32e0d4665 Saved results in repo for later analysis
Dropped refs/stash@{0} (4765e3a09f38b73edf054b8a4a9c50e225b47ee5)


In [37]:
#!git reset --soft HEAD^ 

In [None]:
#!git pull

error: You have not concluded your merge (MERGE_HEAD exists).
hint: Please, commit your changes before merging.
fatal: Exiting because of unfinished merge.


In [33]:
#!git config --global user.email "XXX@gmail.com"
#!git config --global user.name "XXX"

In [38]:
#!git add .

In [39]:
#!git add ../data/

In [41]:
#!git commit -m "Updated thresholds for segmentReco"

[main d755f17ca] Updated thresholds for segmentReco
 4 files changed, 1 insertion(+), 1 deletion(-)
 rewrite code/graph_train_segmentReco.png (98%)
 rewrite code/graph_train_segmentSelector.png (99%)
 rewrite code/output.png (99%)


In [42]:
#!git push https://hiddentoken@github.com/TristanFaine/Master_2_MLVC_Recognize_Handwritten_Equation.git

Counting objects: 7, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (7/7), done.
Writing objects: 100% (7/7), 67.40 KiB | 22.47 MiB/s, done.
Total 7 (delta 3), reused 0 (delta 0)
remote: Resolving deltas: 100% (3/3), completed with 3 local objects.
To https://github.com/TristanFaine/Master_2_MLVC_Recognize_Handwritten_Equation.git
   687eb689b..d7

## CROHME_train_segmentSelector.py & segmentSelect.py explanation

Since the classifiers in our pipeline are neural networks, they need to be trained beforehand. But let's first explain what this part of the process does:  

The script 'segmentSelect.py' takes as input the initial inkml file, alongside the "prototype" lg file. We combine the stroke combinations from the prototype file with the inkml stroke data to generate images, then we check whether each image makes sense as a symbol, regardless of context.  
This is a classification problem with two possible outputs: the image is valid or invalid.  
We also compare the confidence value of the model against a threshold in order to ignore unsure hypotheses, which should boost accuracy somewhat.

Now, for the training part: while we could simply store the weights of the models and import them, we still want to show the specifics of our training, which are driven by the characteristics of our data.

While training the model batch by batch, we make sure that each batch contains a representative random subset of the original data. This matters because we have far fewer invalid images than valid images, and we do not want the model to favor the valid class excessively.  
The rest of the training logic is standard: the state of the model is saved whenever the validation loss reaches a new low, and we implemented early stopping to prevent overfitting.
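The checkpoint-and-early-stopping logic described above can be sketched independently of any deep learning library. This is a simplified illustration, not the code of our training script: the function name and the `patience` value are assumptions, and the list of validation losses stands in for a real training loop.

```python
def train_with_early_stopping(val_losses, patience=3):
    """Return (best_epoch, last_epoch_run) for a sequence of validation losses.

    A "checkpoint" is taken whenever the validation loss reaches a new low.
    Training stops after `patience` consecutive epochs without improvement.
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New validation low: this is where the model state would be saved.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


# Hypothetical loss curve: the last value is never reached.
print(train_with_early_stopping([0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.5]))
```

In this made-up run the best checkpoint comes from epoch 2 and training stops at epoch 5, so the later improvement is never seen. That is the usual trade-off when choosing the patience.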

In [11]:
!python3 CROHME_train_segmentSelector.py

cuda:0
('invalid', 'valid')
nb classes 2 , training size 12000, val size 4000, test size 4000
valid invalid valid valid
AlexNet(
  (layer1): Sequential(
    (0): Conv2d(1, 96, kernel_size=(11, 11), stride=(4, 4), bias=False)
    (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (layer2): Sequential(
    (0): Conv2d(96, 384, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2), bias=False)
    (1): BatchNorm2d(384, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (layer3): Sequential(
    (0): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (fc): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU()
  )
  (fc1): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in

## segmentSelect.py example

In [26]:
!python3 segmentSelect.py -o ../data/example2.lg ../data/formulaire001-equation001.inkml ../data/example.lg 

In [32]:
!python3 segmentSelect.py ../data/formulaire001-equation001.inkml ../data/example.lg 

O,hyp0,*,0.21581217646598816,0
O,hyp2,*,0.06668483465909958,2
O,hyp6,*,0.8783318996429443,1,2
O,hyp7,*,0.7654682993888855,2,3
O,hyp8,*,0.9960795044898987,3,4
O,hyp10,*,0.5203719139099121,1,2,3
O,hyp11,*,0.2547537386417389,2,3,4
O,hyp12,*,0.09755025804042816,0,1,2,3
O,hyp13,*,0.5135354399681091,1,2,3,4



This script takes as input the initial inkml file, alongside the "prototype" lg file. We combine the stroke combinations with the inkml data to generate images, then we check whether each image makes sense as a symbol, regardless of context.

With the outputs obtained from the classifier, we then decide which hypotheses to keep. This is done simply by selecting a somewhat high threshold (here, 0.5) and keeping only the hypotheses for which the model is confident enough.
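The thresholding step can be sketched as a filter over lg lines. This is only an illustration under assumptions: `filter_hypotheses` is a hypothetical helper, the comparison is taken to be "greater than or equal", and the confidence is read from the fourth comma-separated field as in the outputs above.

```python
def filter_hypotheses(lg_lines, threshold=0.5):
    """Keep only object lines whose confidence reaches the threshold."""
    kept = []
    for line in lg_lines:
        fields = line.strip().split(",")
        # Expected layout: O, id, label, confidence, stroke indices...
        if len(fields) < 5 or fields[0] != "O":
            continue
        if float(fields[3]) >= threshold:
            kept.append(line.strip())
    return kept


# Hypothetical input lines in the same layout as the lg files above.
example = [
    "O,hypA,*,0.21,0",
    "O,hypB,*,0.88,1,2",
    "O,hypC,*,0.52,1,2,3",
]
print(filter_hypotheses(example))
```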

## segmentReco.py explanation

This training script behaves similarly to the one introduced before, but it instantiates the model with a number of classes equal to the number of classes present in our dataset.

This is achieved simply by browsing through the folder called "symbol_recognition" inside the "data" folder of our repository.

These are separate instances since they are meant to be used for two different tasks. It might be a good idea to try a slightly more complex architecture (or rather, one with slightly more filters) for this task and see whether it improves results.


In [14]:
!python3 CROHME_train_segmentReco.py

cuda:0
['i', 'a', 'gamma', 'M_', 'q', 'dot', 'geq', 'p', 'm', 'o', 'd', 'int', 's', ']', 'h', 'H_', 'b', 'pi', 'P_', 'forall', '!', 'beta', 'rightarrow', '+', 'e', 'log', 'A_', 'X_', ',', 'sum', 'y', 'G_', 'sqrt', 'R_', '-', 'C_', 'in', 'phi', 'Delta', '7', 'x', 'E_', 'B_', 'sigma', '8', 'lim', 'z', 'N_', '0', 'n', '{', 'sin', 'pm', 'tan', 'g', 'prime', 'leq', 'div_op', 'S_', '1', '6', 't', ')', 'neq', 'times', '}', '(', 'L_', 'lambda', 'cos', 'pipe', 'u', 'V_', 'v', 'lt', 'I_', 'k', '4', 'w', '3', 'mu', 'F_', 'ldots', '[', 'c', 'alpha', '=', '2', 'r', 'infty', 'Y_', 'f', 'exists', 'j', 'T_', '9', '5', 'theta', 'l', 'gt', 'div']
nb classes 101 , training size 51480, val size 17160, test size 17162
    m    P_    pm     a
AlexNet(
  (layer1): Sequential(
    (0): Conv2d(1, 96, kernel_size=(11, 11), stride=(4, 4), bias=False)
    (1): BatchNorm2d(96, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (layer2): Sequential(
    (0): Conv2d(96, 384, kernel

In [16]:
!python3 symbolReco.py -o ../data/example3.lg ../data/formulaire001-equation001.inkml ../data/example2.lg

In [30]:
!python3 symbolReco.py ../data/formulaire001-equation001.inkml ../data/example2.lg

O,hyp6,V_,0.39052459597587585,1,2
O,hyp6,ldots,0.2728962004184723,1,2
O,hyp8,=,0.26279333233833313,3,4



This script takes as input the initial inkml file, alongside the lg file previously obtained from segmentSelect.py. We read the list of hypotheses from the lg file, generate images from the corresponding stroke combinations in the inkml file, and see which symbols the model predicts for them.

We also use a threshold of 0.6 in order to keep only the plausible hypotheses.
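As an illustration of this step, the sketch below turns raw classifier scores into probabilities with a softmax and emits one lg line per label that reaches the threshold. It is a sketch under assumptions rather than the code of symbolReco.py: the helper names, the logits and the three-class list are all hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtracted for numerical stability
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_hypothesis(hyp_id, strokes, logits, class_names, threshold=0.6):
    """Emit one lg object line per class whose probability reaches threshold."""
    probs = softmax(logits)
    stroke_text = ",".join(str(s) for s in strokes)
    return [
        f"O,{hyp_id},{name},{prob},{stroke_text}"
        for name, prob in zip(class_names, probs)
        if prob >= threshold
    ]


# Hypothetical scores for a three-class toy problem.
lines = label_hypothesis("hyp8", [3, 4], [2.0, 0.0, 0.0], ["=", "-", "+"])
print(lines)
```

A lower threshold lets several labels through for the same hypothesis, which is why one hypothesis can appear on more than one line of the output.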

## selectBestSeg.py explanation

In [18]:
!python3 selectBestSeg.py -o ../data/examplefinal.lg  ../data/example3.lg

In [22]:
!python3 selectBestSeg.py ../data/example3.lg

O,hyp13,L_,0.6920832395553589,3,4,1,2



This script takes as input the lg file obtained from symbolReco.py and applies a greedy, sub-optimal algorithm in order to keep only the best hypotheses for the associated inkml file.

This is the final script used to predict characters. What remains is to evaluate this method and the model architectures used for it:
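One common way to implement such a greedy selection is sketched below: hypotheses are visited from most to least confident, and a hypothesis is kept only if none of its strokes has already been used. This criterion is an assumption made for illustration and is not necessarily the one used in selectBestSeg.py; `select_best` and the sample values are hypothetical.

```python
def select_best(hypotheses):
    """Greedy selection of non-overlapping hypotheses.

    Each hypothesis is a tuple (hyp_id, label, confidence, strokes).
    """
    used = set()
    selected = []
    # Most confident hypotheses get the first pick of strokes.
    for hyp in sorted(hypotheses, key=lambda h: h[2], reverse=True):
        strokes = set(hyp[3])
        if used.isdisjoint(strokes):
            selected.append(hyp)
            used |= strokes
    return selected


candidates = [
    ("hypA", "x", 0.9, (0, 1)),
    ("hypB", "+", 0.8, (1,)),
    ("hypC", "1", 0.7, (2,)),
]
print(select_best(candidates))
```

The approach is sub-optimal because an early high-confidence choice can block a set of hypotheses that would have scored better together.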

# Process everything

Please note this takes at least four hours to compute on a Colab session, meaning the session will probably interrupt itself after one or two hours.

In [None]:
! ./processAll.sh ../data/inkml_gt ../data/lg_output

Recognize: ../data/inkml_gt_mini/UN_101_em_0.inkml
../data/lg_output/hyp/UN_101_em_0.lg
Recognize: ../data/inkml_gt_mini/UN_101_em_10.inkml
../data/lg_output/hyp/UN_101_em_10.lg
Recognize: ../data/inkml_gt_mini/UN_101_em_11.inkml
../data/lg_output/hyp/UN_101_em_11.lg
Recognize: ../data/inkml_gt_mini/UN_101_em_12.inkml
../data/lg_output/hyp/UN_101_em_12.lg
Recognize: ../data/inkml_gt_mini/UN_101_em_13.inkml
../data/lg_output/hyp/UN_101_em_13.lg
Recognize: ../data/inkml_gt_mini/UN_101_em_14.inkml
../data/lg_output/hyp/UN_101_em_14.lg
Recognize: ../data/inkml_gt_mini/UN_101_em_15.inkml
../data/lg_output/hyp/UN_101_em_15.lg
Recognize: ../data/inkml_gt_mini/UN_101_em_16.inkml
../data/lg_output/hyp/UN_101_em_16.lg
Recognize: ../data/inkml_gt_mini/UN_101_em_17.inkml
../data/lg_output/hyp/UN_101_em_17.lg
Recognize: ../data/inkml_gt_mini/UN_101_em_18.inkml
../data/lg_output/hyp/UN_101_em_18.lg
Recognize: ../data/inkml_gt_mini/UN_101_em_19.inkml
../data/lg_output/hyp/UN_101_em_19.lg
Recognize: .

## Evaluating

In [None]:
# doing !export LgEvalDir = "/content/Master_2_MLVC_Recognize_Handwritten_Equation/lgeval/"
# doesn't work on colab since the current environment is like a sub-shell, you have to update
# the environment that spawns these sub-shells instead.
import os
os.environ['LgEvalDir'] = "/content/Master_2_MLVC_Recognize_Handwritten_Equation/lgeval/"

In [None]:
! ../lgeval/bin/evaluate /content/Master_2_MLVC_Recognize_Handwritten_Equation/data/lg_output/result /content/Master_2_MLVC_Recognize_Handwritten_Equation/data/lg_gt

The output stream was truncated and contains only the last 5000 lines.
      /content/Master_2_MLVC_Recognize_Handwritten_Equation/data/lg_gt/UN_456_em_735.lg
      ['0', '1', '10', '2', '3', '4', '5', '6', '7', '8', '9']
  >> Comparing UN_456_em_736.lg
  !! IO Error (cannot open): /content/Master_2_MLVC_Recognize_Handwritten_Equation/data/lg_output/result/UN_456_em_736.lg
  !! Inserting ABSENT nodes for:
      /content/Master_2_MLVC_Recognize_Handwritten_Equation/data/lg_output/result/UN_456_em_736.lg vs.
      /content/Master_2_MLVC_Recognize_Handwritten_Equation/data/lg_gt/UN_456_em_736.lg
      ['0', '1', '10', '11', '12', '13', '14', '15', '16', '2', '3', '4', '5', '6', '7', '8', '9']
  !! Inserting ABSENT nodes for:
      /content/Master_2_MLVC_Recognize_Handwritten_Equation/data/lg_output/result/UN_456_em_736.lg vs.
      /content/Master_2_MLVC_Recognize_Handwritten_Equation/data/lg_gt/UN_456_em_736.lg
      ['0', '1', '10', '11', '12', '13', '14', '15', '16

# Accessing results

The folder containing the metrics is created from wherever the evaluate script is called from.

In our case, that'd be the cloned repo's code folder, but we move it to its own folder for convenience's sake.

In [None]:
!mkdir ../Results

In [None]:
!mv ../code/Results_result/* ../Results

You can now check the results in ../Results/Summary.txt.

In [3]:
!cat ../Results/Summary.txt

LgEval Evaluation Summary
Tue Dec  6 02:39:58 2022

Output File Directory:  /home/antoine/MLCV/Master_2_MLVC_Recognize_Handwritten_Equation/data/lg_output/result
Ground Truth Directory: /home/antoine/MLCV/Master_2_MLVC_Recognize_Handwritten_Equation/data/lg_gt

****  PRIMITIVES   **************************************************************

  Directed   Rate(%)     Total   Correct    Errors    SegErr     ClErr    RelErr
---------------------------------------------------------------------------------
     Nodes      0.37     16621        61     16560
     Edges     86.79    325596    282574     43022     23734      3532     15756

     Total     82.59    342217    282635     59582


Undirected   Rate(%)     Total   Correct    Errors    SegErr     ClErr    RelErr
---------------------------------------------------------------------------------
     Nodes      0.37     16621        61     16560
Node Pairs     81.95 162798.00 133409.00     29389     11867      1766     15756

     Total

## Interpretation
PRIMITIVES  
Nodes = Correct stroke combinations  
Edges = Not sure, so we'll ignore these.  
Node pairs = Structure of equation: links between strokes (not handled in our study).  
OBJECT  
Object = Hypotheses.  
Classes = Symbols recognized correctly.  
Classes/Det = Symbols recognized correctly within correctly classified hypotheses.  
FILES  
objects = Correct segmentation: for a file, every hypothesis between output and ground truth is correct.   
Classes = Every symbol is predicted correctly in a file.  
class/det = Files segmented correctly, where each symbol is predicted correctly.

## Evaluation

### Summary

As of now, the results are terrible. The only saving grace is that the system barely manages to predict some segmentations. The OBJECTS line of the summary reads:  
7.05(R)     13.45(P)      9.26(F1)
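As a quick sanity check on these three numbers, F1 is the harmonic mean of recall and precision, $F_1 = \frac{2RP}{R+P}$. The small sketch below recomputes it from the rounded values shown above; `f1_score` is a helper written for this check, not part of LgEval.

```python
def f1_score(recall, precision):
    """Harmonic mean of recall and precision (both in percent)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


print(f1_score(7.05, 13.45))
```

This gives about 9.25, close to the reported 9.26. The small gap is consistent with R and P having been rounded before display.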

There are two possible reasons for this failure. The first is that our models simply do not perform well and need to be modified or trained differently.  
The second is that we made a mistake before processing everything: we forgot to set a new threshold inside symbolReco.py. The threshold was 0.05, which is really low, so the resulting lg files contained many hypotheses with many wrong symbols.

Still, that should have been mitigated by selectBestSeg.py. In addition, we have very low scores for the segmentation predictions, which depend on segmentSelect.py and not on symbolReco.py.

Therefore, the most plausible reason is the first one: our models do not perform well.  

Still, we will attempt to re-train the model tonight with the correct threshold for segmentReco and see whether the results improve.

### Checking the details for one file

In [4]:
!cat  ../Results/Metrics/UN_101_em_0.diff

*N,5,V_,1.0,:vs:,+,1.0
*N,6,ABSENT,1.0,:vs:,x,1.0
*N,2,ABSENT,1.0,:vs:,2,1.0
*N,9,gt,1.0,:vs:,-,1.0
*N,0,9,1.0,:vs:,x,1.0
*N,10,ABSENT,1.0,:vs:,1,1.0
*N,7,I_,1.0,:vs:,x,1.0
*N,3,V_,1.0,:vs:,M,1.0
*N,4,V_,1.0,:vs:,+,1.0
*N,1,lt,1.0,:vs:,x,1.0
*N,8,gt,1.0,:vs:,M,1.0
*E,6,7,_,1.0,:vs:,x,1.0
*E,7,6,_,1.0,:vs:,x,1.0
*E,4,6,_,1.0,:vs:,Right,1.0
*E,5,6,_,1.0,:vs:,Right,1.0
*E,9,10,_,1.0,:vs:,Right,1.0
*E,2,3,_,1.0,:vs:,Right,1.0
*E,0,2,_,1.0,:vs:,Sup,1.0
*E,1,2,_,1.0,:vs:,Sup,1.0
*E,6,8,_,1.0,:vs:,Sup,1.0
*E,3,5,V_,1.0,:vs:,_,1.0
*E,3,4,V_,1.0,:vs:,_,1.0
*E,5,3,V_,1.0,:vs:,_,1.0
*E,5,4,V_,1.0,:vs:,+,1.0
*E,4,3,V_,1.0,:vs:,_,1.0
*E,4,5,V_,1.0,:vs:,+,1.0
*E,9,8,gt,1.0,:vs:,_,1.0
*E,8,9,gt,1.0,:vs:,Right,1.0
*E,5,7,_,1.0,:vs:,Right,1.0
*E,4,7,_,1.0,:vs:,Right,1.0
*E,7,8,_,1.0,:vs:,Sup,1.0
*E,0,5,_,1.0,:vs:,Right,1.0
*E,0,4,_,1.0,:vs:,Right,1.0
*E,0,1,_,1.0,:vs:,x,1.0
*E,1,5,_,1.0,:vs:,Right,1.0
*E,1,4,_,1.0,:vs:,Right,1.0
*E,1,0,_,1.0,:vs:,x,1.0
*S,3,5
*S,3,4
*S,5,3
*S,4,3
*S,9,8
*S,8,9
*S,7,6
*

Ignoring the relations (lines starting with *E or *S), we can see that the model's predictions for the symbols are not even close to being correct, although it "only" missed three stroke combinations out of 11 (the nodes marked ABSENT).
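The count above can be reproduced by scanning the node lines of a .diff file. The sketch below is a hypothetical helper written for this notebook, not part of LgEval; it assumes the layout seen in the listing, where the third field of a `*N` line is the predicted label.

```python
def summarize_node_errors(diff_lines):
    """Count node lines in a .diff listing and how many are marked ABSENT."""
    total = 0
    absent = 0
    for line in diff_lines:
        fields = line.strip().split(",")
        if fields[0] != "*N":
            continue  # skip edge (*E), segment (*S) and other lines
        total += 1
        if fields[2] == "ABSENT":
            absent += 1
    return total, absent


# A few lines copied from the listing above.
sample = [
    "*N,5,V_,1.0,:vs:,+,1.0",
    "*N,6,ABSENT,1.0,:vs:,x,1.0",
    "*E,6,7,_,1.0,:vs:,x,1.0",
    "*S,3,5",
]
print(summarize_node_errors(sample))
```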