
Allow specifying audio context that manual lookup can associate with #136

Open

Calvin-Xu opened this issue Nov 23, 2022 · 4 comments

Labels: enhancement (New feature or request)

Comments

@Calvin-Xu
Contributor

(moved from #133)

Currently, Memento does not seem to support associating an OCR (or manual lookup) result with a particular stretch of audio in the absence of a subtitle file, which feels like it neglects Memento's most important feature.

I think it would be great if Memento allowed the user to choose a part of the current video as the current context. Some DWIM behavior I imagine (see the sketch after this list):

  • allow the user to continually mark points in the video with a hotkey
    • when adding a card:
      • if there are no points past the current playback time: extract context between the last two set points
      • if the current playback time is between two points: extract context between those two points
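
As a rough sketch of that selection rule (assuming the marks are stored as a sorted list of timestamps in seconds; the function name and signature are illustrative, not Memento's actual API):

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Pick the [start, end] context segment from user-set marks, following the
// DWIM rules above. Returns nothing if fewer than two marks exist.
std::optional<std::pair<double, double>>
selectContext(const std::vector<double> &marks, double playbackTime)
{
    if (marks.size() < 2)
        return std::nullopt; // not enough points to bound a segment

    // No mark past the current playback time: use the last two marks.
    if (playbackTime >= marks.back())
        return std::make_pair(marks[marks.size() - 2], marks.back());

    // Otherwise use the pair of marks straddling the current time.
    // (Times before the first mark fall back to the first segment here.)
    for (std::size_t i = 1; i < marks.size(); ++i)
    {
        if (playbackTime < marks[i])
            return std::make_pair(marks[i - 1], marks[i]);
    }
    return std::nullopt; // unreachable given the checks above
}
```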

Also, this might not be possible, but I wonder if Memento could use the timing information from sub-seek to determine the current {sentence}. That feels inflexible and might be expensive, so I don't really know.

@Calvin-Xu
Contributor Author

Calvin-Xu commented Nov 24, 2022

(responding to #133 (comment))

> It makes sense to me that manual search may be associated with an audio clip if raster subtitles were OCR'd.

Yes. This is actually what I tested OCR with.

> Outside of that, OCR will usually be associated with visual context as opposed to audio context.

You are totally correct. I realize I am actually envisioning this more for manual lookup; because Memento currently sends OCR results to lookup, I decided to talk about the two together.

My imagined main use case actually pertains to consuming content that does not have subtitles at all. I have encountered the following two scenarios:

  1. Video with no subtitles that I understand ~90% of. By listening closely and typing out what I think I heard into a web search engine, I eventually transcribe the sentence containing the new vocab / usage.

  2. Video with no timed subtitles, but a (partial) transcript is available in some other form. Examples include songs on YouTube, which often don't have subtitles even though the lyrics can easily be looked up, and TV news that shows a slightly altered version of the talking points on screen (OCR helps here).

The reason I wanted to add OCR in the first place was that Evangelion episode 14 uses cards of text throughout to communicate information. The second use case I found after implementing it was using https://github.com/Dudemanguy/mpv-manga-reader to turn Memento into a manga reader. For both of these cases, I don't see the benefit of extracting audio from the content.

I agree that this does not work in cases where the visual context is detached from the audio context. Perhaps I'll end up with multiple profiles: lookup with context, lookup without context, etc. I'd like to know what you think.

ripose-jp added the enhancement (New feature or request) label on Nov 24, 2022
@ripose-jp
Owner

Now that I understand the use case for this feature, my next concern is how to explain it to the user. My philosophy when designing Memento has been to keep everything as self-explanatory as possible; I don't want Memento to become a piece of software you need a guide to use.

> Yes. This is actually what I tested OCR with.

In this case {audio-media} should work, but {audio-context} will fail. I don't think this is worth fixing since this is expected behavior given the descriptions of the two features.

@Calvin-Xu
Contributor Author

Calvin-Xu commented Nov 24, 2022

> In this case {audio-media} should work, but {audio-context} will fail. I don't think this is worth fixing since this is expected behavior given the descriptions of the two features.

I think this makes enough sense.

> my next concern is how to explain it to the user

A number of video players, including mpv, support setting some kind of A-B loop (default keybind l in mpv) with established DWIM behavior:

> --ab-loop-a=, --ab-loop-b=
>
> Set loop points. If playback passes the b timestamp, it will seek to the a timestamp. Seeking past the b point doesn't loop (this is intentional).
>
> If a is after b, the behavior is as if the points were given in the right order, and the player will seek to b after crossing through a. This is different from old behavior, where looping was disabled (and as a bug, looped back to a on the end of the file).
>
> If either options are set to no (or unset), looping is disabled. This is different from old behavior, where an unset a implied the start of the file, and an unset b the end of the file.
>
> The loop-points can be adjusted at runtime with the corresponding properties. See also ab-loop command.

(https://mpv.io/manual/stable/#options-ab-loop-a)

though I agree it is not always the most intuitive feature.
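
For illustration, the two rules that matter here (reversed points behave as if given in the right order, and an unset point disables the loop) are cheap to mirror for an audio selection. A minimal sketch with hypothetical names:

```cpp
#include <algorithm>
#include <optional>
#include <utility>

// Normalize a user-set A-B pair the way mpv's ab-loop does: swap reversed
// points, and treat a missing endpoint as "no selection".
std::optional<std::pair<double, double>>
normalizeSelection(std::optional<double> a, std::optional<double> b)
{
    if (!a || !b)
        return std::nullopt; // either point unset: selection disabled
    return std::make_pair(std::min(*a, *b), std::max(*a, *b));
}
```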

I think providing a new marker like {audio-selection} would be a good move, and any explanation it needs can live in its description for those who want to use it.
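
To make that concrete, a hypothetical note field mapping once such a marker exists might look like the following ({audio-selection} is the proposed marker, not an existing one; the field names are illustrative):

```
Sentence: {sentence}
Audio:    {audio-selection}
```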

@ripose-jp
Owner

> I think providing a new marker like {audio-selection} would be a good move, and any explanation it needs can live in its description for those who want to use it.

I'm satisfied with this. I'm not sure when I'll get this done, but I have enough to go off of now.
