Question: do you know of a local_alignment implementation on sequences of letters? #3
I have some difficulty understanding exactly what the
For a linkage project I did quite a while back, I ran into the problem that string distances for some Dutch street names can be quite large. For example:
I don't know if this is similar to the problem you are running into. I first tried to solve this by standardisation of terms as
textreuse::align_local uses the Smith-Waterman algorithm (https://en.wikipedia.org/wiki/Smith%E2%80%93Waterman_algorithm) to find aligned sequences of words: instead of working on sequences of ACGT letters, as in DNA alignment, it works on words. Thanks for the input on these abbreviations. The problem I'm having with medieval and 18th/19th-century texts is that there was not much standardisation of names at that time.
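A minimal sketch of the same Smith-Waterman local alignment applied to characters instead of words, written in Python rather than R. The function names `smith_waterman` and `local_similarity` and the scoring parameters (match = 2, mismatch = -1, linear gap = -1) are illustrative assumptions, not the API or defaults of textreuse. To get a similarity instead of a raw score, the sketch divides the best local score by the highest score the shorter string could reach; with non-positive mismatch and gap penalties this keeps the result in [0, 1].

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of strings a and b, compared character by character."""
    # Keep only the previous row of the DP matrix; row 0 and column 0 are all zeros.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned against a gap
            left = curr[j - 1] + gap  # b[j-1] aligned against a gap
            # Clamping at 0 is what makes the alignment local: a bad prefix is dropped.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


def local_similarity(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> float:
    """Local alignment score scaled to [0, 1] by the best possible score for the shorter string."""
    shortest = min(len(a), len(b))
    if shortest == 0:
        return 0.0
    return smith_waterman(a, b, match, mismatch, gap) / (match * shortest)
```

With these settings, `local_similarity("Jansstraat", "Sint-Jansstraat")` returns 1.0, because the shorter name occurs unchanged inside the longer one: a local alignment ignores the unmatched prefix instead of penalising it the way a global edit distance would. The comparison is case-sensitive, so lowercasing both strings first may be useful for names.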
Thanks for the input, Jan. Closing, as it looks like an equivalent of textreuse::align_local that gives a similarity metric on sequences of letters does not exist. I will implement it myself.
OK. Success! If you have an implementation, I can imagine @markvanderloo might be interested in including the measure in the stringdist package.
FYI.
Thanks for releasing this package. I'm using it to match the names of 18th-19th-century persons in Bruges, as well as street addresses from medieval documents (extracted from image scans and from some OCR-ed images/documents), to an existing set of street addresses and person names.
Regarding the functions to find similarities: do you know if something like textreuse::align_local exists that works on a sequence of characters instead of a sequence of words, and that returns a similarity metric instead of a distance metric?
Adding @lmullen (author of textreuse), @markvanderloo (author of stringdist) and @djvanderlaan (author of reclin) just in case someone has pointers.
Thanks for any feedback.
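The matching task described above (linking names extracted from scans to an existing reference list) can be sketched as a best-match lookup under a character-level similarity. This is a minimal Python sketch, not a local alignment and not any of the R packages mentioned: it swaps in the standard library's `difflib.SequenceMatcher.ratio()`, which is based on Ratcliff/Obershelp "gestalt pattern matching" and returns a similarity in [0, 1]. The helper names `similarity` and `best_match` and the street names in the usage example are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Character-level similarity in [0, 1]; lowercase first so case differences are ignored.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(query: str, candidates: list[str]) -> tuple[str, float]:
    # Score every reference name and keep the most similar one (a simple linear scan).
    scored = [(candidate, similarity(query, candidate)) for candidate in candidates]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical example: an old spelling matched against a reference list of street names.
name, score = best_match("Steenstraet", ["Steenstraat", "Noordzandstraat", "Wollestraat"])
print(name, round(score, 2))
```

Any function returning a similarity in [0, 1] can replace `similarity` here, for example a normalised character-level local alignment score. The standard library's `difflib.get_close_matches` performs a comparable lookup with a similarity cutoff.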