Separate Demo and Casework Simulation Use Cases #15

stephaniereinders · 2024-07-26T15:05:06Z

The Problem

Amy Crawford found that cluster templates with K=40 clusters yield the highest accuracy when there are 100 or more known writers. The drawback is that fitting a model and analyzing questioned documents with this many known writers takes several hours, so this scenario is unsuitable for a handwriter demo.

I tried using a template with K=40 clusters and only 5 known writers. Fitting the model was much faster, but the accuracy on the questioned documents was terrible: 0%! The model training documents used only 22 of the K=40 clusters, but around 1/6 of the graphs in the questioned documents fell into the other 18 clusters. As a result, roughly 1/6 of the questioned documents' data was thrown out and not used to estimate the questioned writers' profiles.
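To make the "thrown out" fraction concrete, here is a minimal Python sketch of how it could be measured. It is not handwriter's actual code: the function name `unused_cluster_fraction` and the toy cluster assignments are made up for illustration, and it assumes each graph has already been assigned an integer cluster label.

```python
def unused_cluster_fraction(training_assignments, questioned_assignments):
    """Return the fraction of questioned graphs whose cluster was never
    used by any training graph (hypothetical helper, for illustration)."""
    used_clusters = set(training_assignments)
    if not questioned_assignments:
        return 0.0
    dropped = sum(1 for c in questioned_assignments if c not in used_clusters)
    return dropped / len(questioned_assignments)


# Toy data, invented for this example: training graphs occupy clusters 1-3 only.
training = [1, 1, 2, 3, 3, 3]
# Two of the six questioned graphs land in clusters 4 and 5, which training never used.
questioned = [1, 2, 4, 3, 5, 1]

print(unused_cluster_fraction(training, questioned))
```

A check like this, run before analysis, would flag template and writer combinations where a large share of the questioned data cannot contribute to the writer profiles.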

I found that with a smaller cluster template (K=5 or K=8 clusters) and 5 known writers, the model achieved 100% accuracy on 2 questioned documents. However, the model performed poorly on the 20 other questioned documents that I tested. We could use this small cluster template, these known writers, and the 2 "successful" questioned documents as handwriter demo data because the model fitting and analysis run quite fast. However, we don't want to allow users to use the small cluster template on other data because the results are likely to be quite poor.

The Solution

Give users two options: see a demonstration of handwriter with data that we provide, or analyze their own data to simulate casework.

Option 1: Demo

Handwriter will use the small cluster template, the 5 known writers, and the 2 questioned documents. Users won't have the option to select their own data.

Option 2: Casework Simulation

Handwriter will use a template with K=40 clusters. Users will upload their own data but will be required to use at least 100 known writers. We can also include a link to the CSAFE Handwriting Database in case they would like to download data.
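The two options could be enforced with a small configuration check along these lines. This is only a sketch in Python under stated assumptions, not the implementation in the linked pull request: `RunConfig`, `choose_config`, and the template labels are hypothetical names. The minimum of 100 known writers and K=40 come from the proposal above; the demo template is left as "small" because the issue does not settle on K=5 or K=8.

```python
from dataclasses import dataclass

MIN_CASEWORK_WRITERS = 100  # from the proposal: at least 100 known writers


@dataclass(frozen=True)
class RunConfig:
    mode: str                # "demo" or "casework"
    template: str            # hypothetical label for the cluster template
    user_data_allowed: bool  # whether users may upload their own documents


def choose_config(mode, num_known_writers=None):
    """Return the settings for the requested mode, rejecting invalid input."""
    if mode == "demo":
        # Demo: fixed small template and bundled data; no user uploads.
        return RunConfig(mode="demo", template="small", user_data_allowed=False)
    if mode == "casework":
        # Casework simulation: K=40 template and enough known writers.
        if num_known_writers is None or num_known_writers < MIN_CASEWORK_WRITERS:
            raise ValueError(
                f"Casework simulation requires at least "
                f"{MIN_CASEWORK_WRITERS} known writers."
            )
        return RunConfig(mode="casework", template="K=40", user_data_allowed=True)
    raise ValueError(f"Unknown mode: {mode!r}")


demo = choose_config("demo")
casework = choose_config("casework", num_known_writers=120)
print(demo)
print(casework)
```

Keeping the small template reachable only through the demo branch is what prevents users from applying it to their own data.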

stephaniereinders linked a pull request Jul 30, 2024 that will close this issue

15 demo casework #16

Merged

stephaniereinders closed this as completed in #16 Jul 30, 2024
