
Example when explaining regular expressions for "Place search" #67

Closed
tutebatti opened this issue Jan 24, 2022 · 16 comments
Labels: discussion, help wanted


@tutebatti
Collaborator

The info text for the place search currently reads:

The search field supports JavaScript-style regular expressions. For example, to search for locations with an Arabic definite article, the query \ba([tdrzsṣḍṭẓln]|[tds]h)- can be used.
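For anyone puzzling over the info-text example, the following TypeScript sketch runs the quoted pattern against a few sample names. The sample names are made up for illustration and are not taken from the place list.

```typescript
// The pattern quoted in the info text: an "a" at a word boundary, then either
// one consonant from the class (assimilated articles such as "ar-", "aṣ-")
// or one of the digraphs "th", "dh", "sh", followed by a hyphen.
const arabicArticle = /\ba([tdrzsṣḍṭẓln]|[tds]h)-/;

// Invented sample names, chosen only to exercise the pattern.
const samples: string[] = ["al-Ahsa", "ar-Raqqa", "aṣ-Ṣalt", "ash-Sham", "Baghdad", "Halab"];

const withArticle = samples.filter((n) => arabicArticle.test(n));

// The first four names match; "Baghdad" and "Halab" do not.
console.log(withArticle);
```

Note that the class mixes DMG letters with English digraphs: an English-style `ash-` matches, while a DMG-style `aš-` (with `š`) does not.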

If I understand the list of places correctly, we do not use the DMG notation for Arabic articles (cf. https://de.wikipedia.org/wiki/DIN_31635), so that example makes little sense. Any better suggestions? @rpbarczok, you probably know the data itself better than @mfranke93?

@tutebatti tutebatti added help wanted Extra attention is needed discussion labels Jan 24, 2022
@mfranke93
Collaborator

I think at the time I wrote that example, at least some places did. I don't think any do anymore. The example might be too complex for casual users anyway, but I thought it was neat to demonstrate what it could be used for ;) Feel free to simplify it. Maybe this doesn't have to be so detailed, and we can link to a reference for power users who want to do more than a normal text search.

@mfranke93
Collaborator

mfranke93 commented Jan 24, 2022

By the way: The code that sorts the place names alphabetically still uses that exact RegEx, so that the Arabic definite articles don't affect the sorting, both in the visualization and in the reports. This is also why names with such an article, and names with an initial apostrophe, are aligned differently in the location list.
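The sorting behaviour described above might look roughly like the following sketch. It is not the actual Damast code: the helper names `sortKey` and `sortPlaceNames` are hypothetical, the pattern is anchored to the start of the name, and the set of characters treated as an initial apostrophe is an assumption.

```typescript
// Info-text pattern, anchored to the start of the name (assumption: only a
// leading article should be ignored when sorting).
const leadingArticle = /^a([tdrzsṣḍṭẓln]|[tds]h)-/;

// Assumed set of characters that may appear as an initial apostrophe.
const leadingApostrophe = /^['’ʿʾ]/;

// Hypothetical helper: the key a place name is sorted by.
function sortKey(placeName: string): string {
  return placeName.replace(leadingArticle, "").replace(leadingApostrophe, "").toLowerCase();
}

// Hypothetical helper: sorts a copy, so the caller's array is not reordered.
function sortPlaceNames(names: string[]): string[] {
  return names.slice().sort((a, b) => {
    const ka = sortKey(a);
    const kb = sortKey(b);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
}

// Order: al-Ahsa, ʿAkka, Baghdad, Mosul, ar-Raqqa
console.log(sortPlaceNames(["ar-Raqqa", "Baghdad", "al-Ahsa", "ʿAkka", "Mosul"]));
```

Sorting by a derived key while displaying the full name is what makes names with an article or apostrophe look "shifted" in an aligned list.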

@tutebatti
Collaborator Author

I could think of something simple like

searching for Bagh?dad would find "Bagdad" as well as "Baghdad", because h followed by ? matches zero or exactly one h.
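The proposed help-text example can be checked with a quick sketch. The candidate strings are invented, and the pattern is tested directly rather than through the actual search code.

```typescript
// "h?" makes the preceding "h" optional: zero or one occurrence.
const pattern = /Bagh?dad/;

// Invented candidate strings.
const candidates: string[] = ["Bagdad", "Baghdad", "Baghhdad", "Bagdaad"];
const hits = candidates.filter((c) => pattern.test(c));

// "Bagdad" and "Baghdad" match; "Baghhdad" (two h) and "Bagdaad" do not.
console.log(hits);
```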

Linking to the external documentation is good, too.

Btw, why is searching Bagdad finding "Baghdad" already now? As far as I can see, the former is not listed under "alternative names".

@mfranke93
Collaborator

I could think of something simple like

searching for Bagh?dad would find "Bagdad" as well as "Baghdad", because h followed by ? matches zero or exactly one h.

I like it!

Btw, why is searching Bagdad finding "Baghdad" already now? As far as I can see, the former is not listed under "alternative names".

Because "Bagdad" appears in the simplified column of the alternative names (name_var) table. That is one of the places searched. Why the place search claims "external URI matches" is beyond me though. That is a bug (#68).
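A rough sketch of why "Bagdad" finds Baghdad: the query is tested against the primary name and against every field of the alternative names, including the simplified column. The `Place` and `NameVariant` shapes, their field names, and the sample values are hypothetical and do not reflect the real `name_var` schema.

```typescript
// Hypothetical, simplified shapes; not the actual database schema.
interface NameVariant {
  name: string;           // the variant as stored, e.g. in Arabic script
  transcription?: string; // Latin transcription of the variant, if any
  simplified: string[];   // entries of the simplified column
}

interface Place {
  name: string;           // primary (already transcribed) name
  variants: NameVariant[];
}

// True if the query matches the primary name or any field of any variant.
// The query should not use the g flag: test() on a global RegExp is stateful.
function placeMatches(place: Place, query: RegExp): boolean {
  const fields: string[] = [place.name];
  place.variants.forEach((v) => {
    fields.push(v.name);
    if (v.transcription !== undefined) fields.push(v.transcription);
    v.simplified.forEach((s) => fields.push(s));
  });
  return fields.some((f) => query.test(f));
}

// Illustrative values only.
const baghdad: Place = {
  name: "Baghdad",
  variants: [{ name: "بغداد", transcription: "Baġdād", simplified: ["Bagdad"] }],
};

console.log(placeMatches(baghdad, /Bagdad/)); // prints true, via the simplified column
console.log(placeMatches(baghdad, /Basra/));  // prints false
```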

@tutebatti
Collaborator Author

I like it!

👍

Because "Bagdad" appears in the simplified column of the alternative names (name_var) table.

But these simplified names are not displayed in the tooltip?

@tutebatti
Collaborator Author

I'm not sure if there's a misunderstanding, but I cannot see any section or anything similar titled "transcription".
[screenshot]

@mfranke93
Copy link
Collaborator

mfranke93 commented Jan 24, 2022

transcription is a column of the alternative names. The primary name of a place is always already transcribed, but an alternative name could, for example, be in Arabic script, and then the transcription provides a "European-readable" version of the name. If you look at the URI page for Baghdad, it is what is written in parentheses for the Arabic name variant (بغداد). This also appears in reports. There is no such section here because it is not an attribute of the place itself.

@tutebatti
Copy link
Collaborator Author

In other words, there is a match when searching because the term matches the simplified transcription of an alternative name?

At any rate, I will discuss this with @rpbarczok. I'm not sure how much of this behavior must be made transparent to visitors who have no access to the database itself and can only see the tooltip or the URI page, neither of which shows the simplified transcription.

@mfranke93
Copy link
Collaborator

In other words, there is a match when searching because the term matches the simplified transcription of an alternative name?

Yes. See https://github.tik.uni-stuttgart.de/frankemx/damast/issues/64

@tutebatti
Copy link
Collaborator Author

OK. As @rpbarczok told me, the simplified column should be mostly consistent in that it contains a "normalized" form of the transcription (i.e., at least one of the strings in simplified represents that form). It is sufficient to make that transparent to the user.

(Of course, it would be preferable if the transcription were automatically normalized according to given patterns and the results stored in a separate column. Apparently, this is not (easily) implementable. Entering simplified transcriptions manually is error-prone.)
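Pattern-based normalization could, in principle, be sketched as a lookup table from DMG letters to basic Latin letters. The table and the helper name `toBasicLetters` below are hypothetical; the table is deliberately incomplete and expects precomposed (NFC) input. Keeping such a table complete is exactly the difficulty raised further down in the thread.

```typescript
// Hypothetical mapping from DMG letters to basic Latin letters.
// Incomplete by design; it only covers the letters used in the examples.
const basicLetter: { [ch: string]: string } = {
  "ā": "a", "ī": "i", "ū": "u",
  "ǧ": "g", "ġ": "g", "ḫ": "h", "ḥ": "h", "š": "s", "ṯ": "t", "ḏ": "d",
  "ṣ": "s", "ḍ": "d", "ṭ": "t", "ẓ": "z",
  "ʿ": "", "ʾ": "", // ayn and hamza are dropped
};

// Replaces every mapped non-ASCII character by its basic letter, keeping the
// case of the input letter; unmapped characters are left unchanged.
function toBasicLetters(transcription: string): string {
  return transcription.replace(/[^\x00-\x7f]/g, (ch) => {
    const lower = ch.toLowerCase();
    const mapped = basicLetter[lower];
    if (mapped === undefined) return ch;
    return ch === lower ? mapped : mapped.toUpperCase();
  });
}

console.log(toBasicLetters("Haǧar"));  // prints Hagar
console.log(toBasicLetters("Baġdād")); // prints Bagdad
console.log(toBasicLetters("Ḥalab"));  // prints Halab
```

Here `toBasicLetters("Haǧar")` yields `Hagar`, the basic-letter form given in the thread for the al-Ahsa variant.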

@rpbarczok

I forgot to mention that we also add a simplified English transcription to the simplified table (e.g. gh, kh, j, sh, etc.). Usually we use the simplified English transcription as the main name, but in the case that there is more than one Arabic variant. E.g., in the case of al-Ahsa: for the name variant هجر, we give the transcription Haǧar and the simplified forms Hagar and Hajar.

@rpbarczok rpbarczok reopened this Jan 25, 2022
@mfranke93 mfranke93 added this to To do in Public Instance at HU via automation Jan 26, 2022
@mfranke93 mfranke93 removed their assignment Jan 26, 2022
@tutebatti
Collaborator Author

tutebatti commented Jan 26, 2022

but in the case that there is more than one Arabic variant

@rpbarczok, you mean "but only in the case that"...?

What is more, I'm not sure what to tell the user based on what you stated.

@mfranke93
Collaborator

Just my 2 cents: We originally included this to make the search a bit more powerful and also more forgiving, so that we wouldn't have to type names exactly (with the ǧ etc.) but could use a Latin g. Since this is quite hard to do purely in software (there are a lot of letters with diacritics, and it is easy to miss some), we decided it would be good to store the typical "latinified" names in the database. In my opinion, this is an implementation detail users do not need to know about at all. The only thing to communicate here would be that the search box is a bit more forgiving regarding exact spelling (or accepts variant spellings of places).

@rpbarczok

I am sorry, the sentence was mutilated when I edited it. What I mean is: For Arabic and other forms, we usually have one transcribed form in the DMG transcription system. Additionally, we save the basic form of the letters in the simplified forms. We later decided to also include the simplified English transcription. So basically you can inform that the user usually should find a place also by entering the basic forms of the letters and by looking for a simplified English transcription, e.g. Hajar.
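The simplified English transcription differs from the basic-letter form mainly in the digraphs and in `j` (gh, kh, j, sh). A minimal sketch of deriving it, with a hypothetical and incomplete mapping table, a hypothetical helper name `englishSimplified`, and the assumption of precomposed input:

```typescript
// Hypothetical DMG-to-English mapping; incomplete, for illustration only.
const englishLetter: { [ch: string]: string } = {
  "ā": "a", "ī": "i", "ū": "u",
  "ǧ": "j", "ġ": "gh", "ḫ": "kh", "š": "sh", "ṯ": "th", "ḏ": "dh",
  "ḥ": "h", "ṣ": "s", "ḍ": "d", "ṭ": "t", "ẓ": "z",
  "ʿ": "", "ʾ": "", // ayn and hamza are dropped
};

// Replaces each mapped non-ASCII character; an uppercase input letter yields a
// capitalized replacement (e.g. "Š" becomes "Sh"). Unmapped characters are kept.
function englishSimplified(transcription: string): string {
  return transcription.replace(/[^\x00-\x7f]/g, (ch) => {
    const lower = ch.toLowerCase();
    const mapped = englishLetter[lower];
    if (mapped === undefined) return ch;
    if (ch === lower || mapped === "") return mapped;
    return mapped.charAt(0).toUpperCase() + mapped.slice(1);
  });
}

console.log(englishSimplified("Haǧar"));     // prints Hajar
console.log(englishSimplified("Baġdād"));    // prints Baghdad
console.log(englishSimplified("Ḫān Yūnus")); // prints Khan Yunus
```

With this table, `englishSimplified("Haǧar")` gives `Hajar`, the English simplified form from the al-Ahsa example.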

@tutebatti
Copy link
Collaborator Author

The only thing to communicate here would be that the search box is a bit more forgiving regarding exact spelling (or accepts variant spellings of places).

I might not be the average user here, but I would want to know exactly how the search works and how I can reproduce results. Either way, I will draft an explanation of the current behavior (which you can correct if necessary). This is already pretty good:

So basically you can inform that the user usually should find a place also by entering the basic forms of the letters and by looking for a simplified English transcription, e.g. Hajar.

Public Instance at HU automation moved this from To do to Done Jan 26, 2022