As a user, I want to search on metadata and transcriptions together so that I can find records by description or content. #10

rlskoeser · 2020-10-05T21:40:17Z

No description provided.

kmcelwee · 2020-10-14T19:16:11Z

Intersection of Cambridge PGPIDs that have links for which we have transcripts. Ignores 'b' and '-deleted' files.
cam-link-transcription-pgpids.txt

^^ Test set for converting transcriptions in search prototype

Basic tei transcription conversion & search #10

rlskoeser · 2020-11-03T21:22:43Z

The search prototype now includes text from the TEI transcriptions in the index. It's a fairly "dumb" implementation, just as a first pass so we can start searching on metadata & transcriptions together; I'm including the labels from the TEI as well as transcription text, and Solr doesn't know what language the transcriptions are, so it's treating them like English text for now. (In particular, this means stemming won't work, and tokenization may not work properly in all cases.)

When your search terms match transcription text, you should see a line of context with your search term highlighted.

Try searching on transcription terms alone and in combination with metadata searches.
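For anyone curious what the "dumb" first pass amounts to, here is a minimal sketch of pulling label and line text out of a TEI file into one indexable string. It is an illustration only, not the prototype's actual code: the function name `tei_to_index_text`, the sample document, and the assumption that transcription lines are `<l>` elements and labels are `<label>` elements are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real TEI transcriptions may be structured differently.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div>
      <label>Recto</label>
      <l n="1">first line</l>
      <l n="2">second line</l>
    </div>
  </body></text>
</TEI>"""


def tei_to_index_text(tei_xml: str) -> str:
    """Collect label and line text in document order, one entry per line.

    Assumes labels are <label> elements and transcription lines are <l>
    elements; no language detection or script-aware handling is attempted.
    """
    root = ET.fromstring(tei_xml)
    collected = []
    for element in root.iter():
        # Drop the "{namespace}" prefix so we can compare local tag names.
        local_name = element.tag.split("}")[-1]
        if local_name in ("label", "l"):
            text = "".join(element.itertext()).strip()
            if text:
                collected.append(text)
    # Keep line breaks so per-line context can be shown later.
    return "\n".join(collected)


index_text = tei_to_index_text(SAMPLE_TEI)
```

Because labels and transcription text end up in the same string, a search for a label such as "Recto" would match just like a search for transcription content, which is the trade-off mentioned above.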

sluescher · 2020-11-04T20:54:55Z

I do see the line of context. I can search for terms and metadata and it shows fine. However, I am still puzzled by the way it orders results.

For example: "מרכב illness bedbound Ramle". The first word is from the transcription; illness, bedbound, and Ramle are tags or appear in the description. However, the first two results have no Hebrew (the result I was looking for is result #3).

rlskoeser · 2020-11-04T21:50:40Z

The relevance score between the first few is not very different; I think the exact match on the short tag is continuing to be scored as "more relevant" vs the Hebrew term occurring once in a larger text field. Perhaps it's exaggerated because Ramle is a less common term. (That seems to be the case from what I can tell.)

You can use boolean operators, like "מרכב AND illness bedbound Ramle"; you can also try exact phrase and proximity searching within the transcriptions.
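As a rough sketch of the query forms mentioned above, the helpers below build query strings using standard Lucene/Solr syntax: a `+` prefix for a required term, double quotes for an exact phrase, and `~N` after a quoted phrase for proximity. The helper names (`require`, `phrase`, `build_query`) are made up for illustration and are not part of the prototype.

```python
def require(term: str) -> str:
    """Mark a term as required with the Lucene '+' prefix."""
    return f"+{term}"


def phrase(words, slop: int = 0) -> str:
    """Quote words as an exact phrase; a positive slop makes it a proximity search."""
    quoted = '"' + " ".join(words) + '"'
    return f"{quoted}~{slop}" if slop > 0 else quoted


def build_query(required, optional=()):
    """Join required terms (prefixed with '+') and optional terms with spaces."""
    parts = [require(term) for term in required] + list(optional)
    return " ".join(parts)


# Require the transcription term; leave the tag and description terms optional.
query = build_query(["מרכב"], ["illness", "bedbound", "Ramle"])

# Exact phrase, and the same two words within three positions of each other.
exact = phrase(["word1", "word2"])
nearby = phrase(["word1", "word2"], slop=3)
```

Requiring the Hebrew term this way filters out records that match only on the tags, so the short-tag matches can no longer outrank the transcription match.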

sluescher · 2020-11-09T13:18:50Z

Good morning!

I tried searching for words split across two lines and it works (or I assume it does, since I only see one line in the result, but the right document appears).

In one case, no line appeared at all. This search returns the document I want, T-S NS J24, at the bottom, with the two words running across lines 23-24. However, no transcription appears.

When I search for the shelfmark plus the two words (not in quotation marks), the text highlighted in the Hebrew script line is only one of the two words. I am assuming it looks for the first match? When I search with the Hebrew script in quotation marks, again no line appears?

Another thing that is somewhat annoying: when switching to Hebrew script (or Arabic), the direction of writing changes, and adding quotation marks, *, or other characters becomes a pain because they are added on the wrong side. This is an issue that Google search still has, so I am not sure whether it's fixable, but I thought I should mention it.

Boolean operators work well too. No negative reports.

rlskoeser · 2020-11-09T20:09:56Z

I noticed the direction of writing change when trying to input Hebrew search terms as well! It is indeed annoying. I'm going to put this on our list of questions coming out of the prototyping; hopefully we can get @gissoo to do some work on a better solution for the future.

Right now the transcription search works across lines, but the highlighting only applies to individual lines. I did fix it so it now shows up to three lines of matching context instead of just one, but an exact phrase that runs across lines won't ever match a single line, which I think explains what you're seeing.

I limited the highlighting to single lines because, with whitespace preserved, the highlighted term context could be quite large for some records (many short lines). I think it's ok for now, but I'm going to add this to the prototype questions document too.
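To make the search vs. highlighting mismatch concrete, here is a small pure-Python sketch. It does not use Solr and is not how Solr highlighting works internally; `document_matches`, `highlight_lines`, and the sample lines are hypothetical and only mimic the behavior described above: matching runs over the whole transcription, while highlighting checks one line at a time and returns at most three lines.

```python
MAX_SNIPPETS = 3  # mirrors the "up to three lines of matching context" limit


def document_matches(lines, phrase):
    """Search across lines: join the lines with spaces, then look for the phrase."""
    full_text = " ".join(line.strip() for line in lines)
    return phrase in full_text


def highlight_lines(lines, phrase, max_snippets=MAX_SNIPPETS):
    """Highlight per line: only lines containing the whole phrase are returned."""
    snippets = []
    for line in lines:
        if phrase in line:
            snippets.append(line.replace(phrase, f"<em>{phrase}</em>"))
            if len(snippets) == max_snippets:
                break
    return snippets


transcription = ["alpha beta", "gamma delta"]

# The phrase spans the line break, so the document matches...
found = document_matches(transcription, "beta gamma")
# ...but no single line contains it, so there is nothing to highlight.
cross_line_snippets = highlight_lines(transcription, "beta gamma")
# A term that sits within one line is highlighted as expected.
single_line_snippets = highlight_lines(transcription, "gamma")
```

In this sketch the cross-line phrase is found in the document but yields an empty list of snippets, which corresponds to the case where the right record comes back with no context line.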

sluescher · 2020-11-09T20:51:24Z

Sounds good! Happy to sign off

rlskoeser added the 🛠️ chore One-off task or update label Oct 5, 2020

rlskoeser assigned rlskoeser and kmcelwee Oct 14, 2020

rlskoeser changed the title ~~incorporate transcription data from TEI in search prototype~~ As a user, I want to search on metadata and transcriptions together so that I can find records by description or content. Oct 28, 2020

rlskoeser added 🧪 experiment Prototypes that support future work and removed 🛠️ chore One-off task or update labels Oct 28, 2020

rlskoeser added a commit that referenced this issue Nov 3, 2020

Merge pull request #25 from Princeton-CDH/experiment/tei-transcriptions

ac6559d

Basic tei transcription conversion & search #10

rlskoeser added the 🗜️ awaiting testing Implemented and ready to be tested label Nov 3, 2020

sluescher closed this as completed Nov 10, 2020

gissoo mentioned this issue Nov 11, 2020

Design a way to navigate to a cluster from the search results so the user can view a specific cluster and learn more about the documents it contains #32

Closed

rlskoeser removed the 🗜️ awaiting testing Implemented and ready to be tested label Nov 13, 2020

richmanrachel mentioned this issue Dec 16, 2021

As a user I would like to see transcription excerpts in my search results so I can tell which records have a transcription and can see some of the content. #299

Closed

7 tasks
