Separate Demo and Casework Simulation Use Cases #15

Closed
stephaniereinders opened this issue Jul 26, 2024 · 0 comments · Fixed by #16

The Problem

Amy Crawford found that cluster templates with K=40 clusters yield the highest accuracy when there are 100 or more known writers. The drawback is that fitting a model and analyzing questioned documents with this many known writers takes several hours, which makes this scenario unsuitable for a handwriter demo.

I tried using a template with K=40 clusters and only 5 known writers. Fitting the model was much faster, but the accuracy on the questioned documents was terrible: 0%! The model training documents used only 22 of the K=40 clusters, but around 1/6 of the graphs in the questioned documents fell into the other 18 clusters. As a result, a large portion of the questioned documents' data was thrown out and not used to estimate the questioned writers' profiles.
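To make the cluster-coverage problem concrete, here is a minimal Python sketch of how the share of discarded graphs could be measured. It is not handwriter code: the function name `unseen_cluster_fraction`, the list-of-cluster-labels input format, and the toy data are all assumptions made for illustration.

```python
def unseen_cluster_fraction(training_clusters, questioned_clusters):
    """Fraction of questioned-document graphs assigned to clusters that
    never appear in the training documents.

    Both arguments are hypothetical: one cluster label per graph.
    """
    used = set(training_clusters)
    total = len(questioned_clusters)
    if total == 0:
        return 0.0
    unseen = sum(1 for c in questioned_clusters if c not in used)
    return unseen / total


# Toy data (made up): training graphs cover clusters 1..22 of a K=40 template,
# and one in six questioned graphs lands in cluster 30, which training never used.
training = list(range(1, 23))
questioned = [1, 2, 3, 4, 5, 30] * 10

fraction = unseen_cluster_fraction(training, questioned)
print(f"{fraction:.3f} of questioned graphs fall in clusters unused by training")
```

A check like this could be run before analysis to warn when too much of a questioned document would be ignored.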

I found that if I used a smaller cluster template with K=5 or K=8 clusters, and 5 known writers, the model achieved 100% accuracy on two questioned documents. However, the model performed poorly on the 20 other questioned documents that I tested. Because the model and analysis run quite fast, we could use this small cluster template, these known writers, and the 2 "successful" questioned documents as handwriter demo data. However, we don't want to let users apply the small cluster template to other data because the results are likely to be quite poor.

The Solution

Give users two options: see a demonstration of handwriter with data that we provide, or analyze their own data to simulate casework.

Option 1: Demo

Handwriter will use the small cluster template, the 5 known writers, and the 2 questioned documents. Users won't have the option to select their own data.

Option 2: Casework Simulation

Handwriter will use a template with K=40 clusters. Users will upload their own data and will be required to use at least 100 known writers. We can also include a link to the CSAFE Handwriting Database for users who would like to download data.
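The split between the two options could be enforced with a small validation step. The Python sketch below is only an illustration of the rule described above, not the handwriter implementation: `resolve_mode`, the mode strings, and the returned dictionary are hypothetical names. The demo template size is left out because the issue does not settle on K=5 or K=8.

```python
MIN_CASEWORK_WRITERS = 100  # minimum number of known writers, from the proposal
CASEWORK_CLUSTERS = 40      # K=40 template, from the proposal


def resolve_mode(mode, num_known_writers=None):
    """Return the settings for the chosen use case, or raise ValueError.

    Hypothetical helper: 'demo' uses bundled data only, 'casework' uses
    user-supplied data with at least MIN_CASEWORK_WRITERS known writers.
    """
    if mode == "demo":
        if num_known_writers is not None:
            raise ValueError("The demo uses bundled data; users cannot supply writers.")
        return {"mode": "demo", "bundled_data": True}
    if mode == "casework":
        if num_known_writers is None or num_known_writers < MIN_CASEWORK_WRITERS:
            raise ValueError(
                f"Casework simulation requires at least {MIN_CASEWORK_WRITERS} known writers."
            )
        return {"mode": "casework", "bundled_data": False, "clusters": CASEWORK_CLUSTERS}
    raise ValueError(f"Unknown mode: {mode!r}")


demo_settings = resolve_mode("demo")
casework_settings = resolve_mode("casework", num_known_writers=120)
```

Keeping the check in one place means the small template can never be paired with user data by accident.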

stephaniereinders linked a pull request on Jul 30, 2024 that will close this issue