The Problem
Amy Crawford found that cluster templates with K=40 clusters yield the highest accuracy when the number of known writers is 100+. The drawback is that fitting a model and analyzing questioned documents with this many known writers takes several hours, so this scenario is unsuitable for a handwriter demo.
I tried using a template with K=40 clusters and only 5 known writers. Fitting the model was much faster, but the accuracy on the questioned documents was terrible: 0%! The model training documents used only 22 of the K=40 clusters, but around 1/6 of the graphs in the questioned documents fell into the other 18 clusters. Because the model had no training data for those 18 clusters, a large portion of the questioned documents' data was thrown out and not used to estimate the questioned writers' profiles.
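To make the discarded-data problem concrete, here is a minimal Python sketch of the bookkeeping. It is not handwriter code (handwriter is an R package), and the function name `unseen_cluster_fraction` and the toy cluster assignments are made up for illustration. It only assumes that each graph has already been assigned a cluster number from the template.

```python
from collections import Counter


def unseen_cluster_fraction(train_clusters, questioned_clusters):
    """Fraction of questioned graphs whose cluster never appears in training.

    Both arguments are sequences of cluster labels, one per graph.
    Hypothetical helper for illustration only, not part of handwriter.
    """
    seen = set(train_clusters)
    counts = Counter(questioned_clusters)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no questioned graphs supplied")
    # Graphs in clusters the model never saw cannot inform the writer profile.
    dropped = sum(n for cluster, n in counts.items() if cluster not in seen)
    return dropped / total


# Toy data (invented): training graphs touch only clusters 1..22 of a K=40 template.
train = [c for c in range(1, 23) for _ in range(3)]

# 50 questioned graphs land in seen clusters, 10 land in clusters 23..32.
questioned = list(range(1, 23)) * 2 + list(range(1, 7)) + list(range(23, 33))

print(len(set(train)), "clusters used in training")
print(f"{unseen_cluster_fraction(train, questioned):.3f} of questioned graphs dropped")
```

A check like this could be run before analysis to warn when the known writers cover too few clusters of the template.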
I found that if I used a smaller cluster template with K=5 or K=8 clusters, and 5 known writers, the model achieved 100% accuracy on two questioned documents. However, the model tanked on the 20 other questioned documents that I tested. Because the model and analysis run quite fast, we could use this small cluster template, these known writers, and the 2 "successful" questioned documents as handwriter demo data. However, we don't want to let users apply the small cluster template to other data because the results are likely to be quite poor.
The Solution
Give users two options: see a demonstration of handwriter on data that we provide, or analyze their own data to simulate casework.
Option 1: Demo
Handwriter will use the small cluster template, the 5 known writers, and the 2 questioned documents. Users won't have the option to select their own data.
Option 2: Casework Simulation
Handwriter will use a template with K=40 clusters. Users will upload their own data but be required to use at least 100 known writers. We can also include a link to the CSAFE Handwriting Database if they would like to download data.
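The two options could be enforced with a small input check along these lines. This is only a sketch in Python, not the handwriter implementation: `RunPlan`, `plan_run`, and the mode strings are hypothetical names, and the demo template size is left unset because the K=5 versus K=8 choice is still open above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

MIN_CASEWORK_WRITERS = 100  # from Option 2: at least 100 known writers
CASEWORK_CLUSTERS = 40      # from Option 2: template with K=40 clusters
DEMO_WRITERS = 5            # from Option 1: the 5 bundled known writers


@dataclass(frozen=True)
class RunPlan:
    """Hypothetical summary of what a run will use."""
    mode: str
    uses_bundled_data: bool
    n_known_writers: int
    n_clusters: Optional[int]  # None for demo: K=5 vs K=8 is undecided


def plan_run(mode: str, user_writer_ids: Optional[Sequence[str]] = None) -> RunPlan:
    """Validate the requested mode against the rules proposed in this issue."""
    if mode == "demo":
        # Option 1: users cannot select their own data.
        if user_writer_ids:
            raise ValueError("demo mode runs only on the bundled data")
        return RunPlan("demo", True, DEMO_WRITERS, None)
    if mode == "casework":
        # Option 2: count distinct known writers in the uploaded data.
        n_writers = len(set(user_writer_ids or []))
        if n_writers < MIN_CASEWORK_WRITERS:
            raise ValueError(
                f"casework simulation needs at least {MIN_CASEWORK_WRITERS} "
                f"known writers, got {n_writers}"
            )
        return RunPlan("casework", False, n_writers, CASEWORK_CLUSTERS)
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `plan_run("casework", writer_ids)` with 99 distinct writer IDs raises a `ValueError`, while `plan_run("demo")` always returns the bundled-data plan.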