Extend "Blasto" by the language Latin #1

Open
bi3mw opened this issue Aug 18, 2021 · 11 comments

Comments

@bi3mw

bi3mw commented Aug 18, 2021

Could you extend your program "Blasto" to support Latin? It would then be very useful for deciphering old manuscripts.

@Merricx
Owner

Merricx commented Aug 19, 2021

I can if I have a Latin quadgram dataset, but unfortunately I don't have one.
If you have one, maybe you can send the quadgram data here and I will add it to the supported languages.

@bi3mw
Author

bi3mw commented Aug 20, 2021

Unfortunately, I do not have such a dataset either, and I do not know how one could be created from a Latin corpus. I have asked in a forum whether anyone knows of a source, and I will get back to you if there is any feedback.

@marcoponzi

I put this together from a corpus of ~2.5M words. Would it be useful?

https://drive.google.com/file/d/1ZX0Fu3rWREViVayVat_1myvSJq8lk2u-/view
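
For readers wondering how a quadgram table can be derived from a plain-text corpus, here is a minimal sketch. It is not necessarily how the file above was produced: the function name `count_quadgrams` is hypothetical, and the choice to strip spaces and punctuation before counting is an assumption (some quadgram sets keep word boundaries).

```python
import re
from collections import Counter


def count_quadgrams(text: str) -> Counter:
    """Count overlapping 4-letter sequences in a text.

    Assumption: everything except A-Z is dropped first, so quadgrams
    may span word boundaries.
    """
    letters = re.sub(r"[^A-Z]", "", text.upper())
    return Counter(letters[i:i + 4] for i in range(len(letters) - 3))


if __name__ == "__main__":
    sample = "Gallia est omnis divisa in partes tres"
    # Print one "QUADGRAM count" pair per line, most frequent first.
    for quad, n in count_quadgrams(sample).most_common(5):
        print(quad, n)
```

On a real corpus the same function would be applied to the whole text (or accumulated file by file) and the resulting counts written out one quadgram per line.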

@Merricx
Owner

Merricx commented Aug 23, 2021

Nice one, let me check whether it can be implemented.

@marcoponzi

marcoponzi commented Aug 23, 2021

Something that occurred to me after I created that file: medieval Latin typically represented 'u' and 'v' with the same character. Should this be 'simulated' in the quadgrams by replacing u with v, or vice versa, in at least part of the corpus? Maybe by duplicating those lines so that they appear both with distinct u/v and with a single character?
Similarly, in medieval Latin, combinations like 'ae' and 'oe' were often written as they are pronounced: just 'e'. Would it be useful to also manage this in the same way as proposed for u/v?

@Merricx
Owner

Merricx commented Aug 23, 2021

I think duplicating those quadgrams as separate lines is the easiest way, although the quadgram file will be bigger.
If we instead replace 'u' with 'v' (or vice versa), we would also have to apply the same replacement to every candidate plaintext derived from the ciphertext we are trying to crack, and that could cost some accuracy in the recovered plaintext.
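
To illustrate the replacement approach and its trade-off, here is a rough sketch of a quadgram scorer that folds V into U before looking anything up. This is not Blasto's actual code: `normalize`, `make_scorer`, the toy counts, and the floor value for unseen quadgrams are all assumptions made for the example.

```python
import math
from collections import Counter


def normalize(text: str) -> str:
    """Keep only A-Z and fold V into U (hypothetical normalization)."""
    letters = "".join(c for c in text.upper() if "A" <= c <= "Z")
    return letters.replace("V", "U")


def make_scorer(counts: Counter):
    """Build a log-probability scorer from quadgram counts.

    The counts are assumed to come from a corpus that was normalized
    the same way as the candidates.
    """
    total = sum(counts.values())
    floor = math.log10(0.01 / total)  # penalty for unseen quadgrams
    logp = {q: math.log10(n / total) for q, n in counts.items()}

    def score(candidate: str) -> float:
        s = normalize(candidate)
        return sum(logp.get(s[i:i + 4], floor) for i in range(len(s) - 3))

    return score


if __name__ == "__main__":
    toy_counts = Counter({"UENI": 5, "UIDI": 3})  # toy data, not real frequencies
    score = make_scorer(toy_counts)
    # Both spellings get the same score, so the solver cannot tell them apart.
    print(score("VENI"), score("UENI"))
```

Because both spellings collapse to the same string before scoring, the solver can no longer distinguish 'u' from 'v' in its output, which is the loss of accuracy mentioned above.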

@marcoponzi

An updated version of the file, where I added the replacement of AE/OE with E and of V with U. This of course results in additional quadgrams (about 1% more lines).

https://drive.google.com/file/d/1F3R1byY_63bS4H6TLssn3PieUthCNxwc/view?usp=sharing
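
The kind of augmentation described here could look roughly like the sketch below. It is only an illustration under stated assumptions, not the script used for the file above: the helper names are hypothetical, and the merge rule (add a variant quadgram only if it is not already present, keeping existing counts unchanged) is one possible choice among several.

```python
import re
from collections import Counter


def quadgrams(text: str) -> list:
    """Return overlapping 4-letter sequences, ignoring non-letters."""
    letters = re.sub(r"[^A-Z]", "", text.upper())
    return [letters[i:i + 4] for i in range(len(letters) - 3)]


def medieval_variant(text: str) -> str:
    """Rewrite AE/OE as E and V as U, as discussed in the thread.

    Replacement happens before spaces are stripped, so 'a e' across a
    word boundary is left alone.
    """
    t = text.upper()
    return t.replace("AE", "E").replace("OE", "E").replace("V", "U")


def augmented_counts(text: str) -> Counter:
    """Classical quadgram counts plus any quadgrams new to the variant."""
    counts = Counter(quadgrams(text))
    variant = Counter(quadgrams(medieval_variant(text)))
    for quad, n in variant.items():
        if quad not in counts:  # assumption: do not double-count shared quadgrams
            counts[quad] = n
    return counts


if __name__ == "__main__":
    for quad, n in sorted(augmented_counts("caelum vivit").items()):
        print(quad, n)
```

With this merge rule the table only grows by the quadgrams that the variant spelling introduces, which matches the observation that the file gains a small number of extra lines.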

@zbelanger

I am also interested in the implementation of Latin in Blasto. Does it look like this can happen?

@bi3mw
Author

bi3mw commented Oct 2, 2022

Is there any progress yet on the implementation of Latin?

@bi3mw
Author

bi3mw commented Jan 8, 2024

> I am also interested in the implementation of Latin in Blasto. Does it look like this can happen?

If you are still interested, here is the mini version with Latin support:
https://www.dropbox.com/scl/fi/y066ahjjsccnpc8z9knpu/subst_solver_latin.zip?rlkey=l5en2nl8lps3rgv1rjiv6ln84&dl=1

@zbelanger

zbelanger commented Jan 8, 2024 via email
