Audio often doesn't match when only using parts of files via secondsToProcess and startAtSecond #191

Closed
cjmanca opened this issue Jul 16, 2022 · 7 comments

Comments

@cjmanca

cjmanca commented Jul 16, 2022

Describe the bug
When using:
.From("/path/to/file", 123.4, 123.4, MediaType.Audio)

the match rate is much lower than when using:
.From("/path/to/file", MediaType.Audio)

This affects both BuildQueryCommand and BuildFingerprintCommand. Restricting the time range on only one of them has a smaller effect, but restricting both at the same time reduces the match rate to abysmal levels.

I've verified that the matches found with the non-partial version fall within the segments I'm restricting to (on both the track side and the query side), but those same segments just don't match once restricted to partial durations.

I'm using the InMemoryStorage, since I don't need to keep the fingerprints around for future use.

I'm not sure if it makes a difference, but my use case is picking out matching duplicates from longer streams, in this case finding intros and credits in TV shows. For instance, I fingerprint the last few minutes of every show and then query each show against those fingerprints to find the timing of the credits.
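
To make the "last few minutes" window concrete, here is a minimal sketch of how the two values could be computed from a known file duration. `SegmentMath` and `TailWindow` are hypothetical names used only for illustration and are not part of SoundFingerprinting; how the total duration is obtained is also left as an assumption.

```csharp
using System;

public static class SegmentMath
{
    // Hypothetical helper: returns the start offset and length (in seconds)
    // of the final `tailSeconds` of a file that is `totalSeconds` long.
    // If the file is shorter than the requested tail, the whole file is used.
    public static (double StartAtSecond, double SecondsToProcess) TailWindow(
        double totalSeconds, double tailSeconds)
    {
        if (totalSeconds < 0)
            throw new ArgumentOutOfRangeException(nameof(totalSeconds));
        if (tailSeconds < 0)
            throw new ArgumentOutOfRangeException(nameof(tailSeconds));

        double length = Math.Min(totalSeconds, tailSeconds);
        return (totalSeconds - length, length);
    }
}
```

The resulting pair would then be passed as `secondsToProcess` and `startAtSecond` to `From`, which is the call that shows the low match rate described above.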

I wanted to look into solving this myself, but it seems the Emy package isn't open source, and the ffmpeg integration is part of it, so I can't really trace the problem through.

  • OS: Windows 10
@cjmanca
Author

cjmanca commented Jul 16, 2022

Extra info: I rewrote my code to use ffmpeg to cut the time range I'm interested in out of each video into a separate file, and to fingerprint/query those files instead. It finds the matches perfectly, using the same timestamps that I was passing in as secondsToProcess and startAtSecond. There's definitely an issue there.

This is working for now, but I'd really rather not have to create these temporary video files just to fingerprint.
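
For reference, a workaround of this kind could look roughly like the sketch below. It is an illustration under assumptions, not the code actually used in this issue: `SegmentCutter` is a hypothetical helper, it assumes an `ffmpeg` executable is on the PATH, and it drops the video stream (`-vn`) since only audio is fingerprinted. The `-ss`, `-t`, `-i`, `-vn` and `-y` options are standard ffmpeg flags.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;

public static class SegmentCutter
{
    // Builds ffmpeg arguments that extract the audio of the range
    // [startAtSecond, startAtSecond + secondsToProcess) into `output`.
    public static string BuildArguments(
        string input, string output, double startAtSecond, double secondsToProcess)
    {
        // Invariant culture keeps '.' as the decimal separator on every locale.
        string start = startAtSecond.ToString("0.###", CultureInfo.InvariantCulture);
        string length = secondsToProcess.ToString("0.###", CultureInfo.InvariantCulture);

        // -ss and -t placed before -i apply to the input: seek, then limit duration.
        return $"-y -ss {start} -t {length} -i \"{input}\" -vn \"{output}\"";
    }

    // Runs ffmpeg and throws if it cannot be started or exits with an error.
    public static void Cut(
        string input, string output, double startAtSecond, double secondsToProcess)
    {
        var startInfo = new ProcessStartInfo(
            "ffmpeg", BuildArguments(input, output, startAtSecond, secondsToProcess))
        {
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using var process = Process.Start(startInfo)
            ?? throw new InvalidOperationException("Could not start ffmpeg.");
        process.WaitForExit();

        if (process.ExitCode != 0)
            throw new InvalidOperationException($"ffmpeg exited with code {process.ExitCode}.");
    }
}
```

The temporary file written by `Cut` would then be fingerprinted or queried in full, without secondsToProcess or startAtSecond.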

@AddictedCS
Owner

I tried to reproduce it with a sample example, but I couldn't.

The following example fingerprints the entire file, then queries only 15 seconds of it, starting at the 10th second.

public static class Issue191
{
    public static async Task<AVQueryResult> Reproduce()
    {
        var modelService = new InMemoryModelService();
        var mediaService = new FFmpegAudioService();

        string file = "path to a 30 seconds file";
        var hashes = await FingerprintCommandBuilder.Instance
            .BuildFingerprintCommand()
            .From(file)
            .UsingServices(mediaService)
            .Hash();

        var track = new TrackInfo("1", string.Empty, string.Empty);
        
        modelService.Insert(track, hashes);

        var results = await QueryCommandBuilder.Instance
            .BuildQueryCommand()
            .From(file, secondsToProcess: 15, startAtSecond: 10)
            .UsingServices(modelService, mediaService)
            .Query();
        
        return results;
    }
}

I get the result as expected with the following properties:

TrackCoverageWithPermittedGapsLength = 14.30
TrackMatchStartsAt = 10.03

The same holds if I fingerprint just a portion of the file:

var hashes = await FingerprintCommandBuilder.Instance
            .BuildFingerprintCommand()
            .From(file, secondsToProcess: 15, startAtSecond: 10)
            .UsingServices(mediaService)
            .Hash();

In this case, I get full matches, with

TrackCoverageWithPermittedGapsLength = 15
TrackMatchStartsAt = 0

@AddictedCS
Owner

Can you provide a sample example with expected and actual results?

@cjmanca
Author

cjmanca commented Jul 18, 2022

Mmm... I'd have to hunt for some that wouldn't be a copyright infringement, since the ones I'm using are TV shows ripped from DVD. It's not every file, either. Some match fine, but some are "problem files" that don't match at all. I have also seen some files that match only some of the time when using startAtSecond, but usually if a file doesn't match, it never matches.

Most of my library uses MKV as the container. Audio codecs vary, though, so I'll check whether the problem files have a codec in common.

It may have something to do with the encoding. Normally I get back 400+ items in the results when querying a whole file against the whole series (most matches are ~1 second, though), but for the files where it won't find matches, it returns no results at all, not even short ones. Those same files work fine when the full file is passed in. It only returns no results when using the startAtSecond option.

@cjmanca
Author

cjmanca commented Jul 20, 2022

I see that there are log statements in your source, but I can't see a way to access their output. How do I turn on logging?

@AddictedCS
Owner

Both QueryCommandBuilder and FingerprintCommandBuilder can be instantiated with ILoggerFactory:

var queryCommandBuilder = new QueryCommandBuilder(loggerFactory);

Once a logger factory is provided, the builder will start logging to whatever output the project has configured.
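
As an illustration, a console logger could be wired in roughly as follows. This is a sketch under assumptions: it presumes the Microsoft.Extensions.Logging.Console package is referenced and that both builders live in the `SoundFingerprinting.Builder` namespace; the Debug minimum level is an arbitrary choice.

```csharp
using Microsoft.Extensions.Logging;
using SoundFingerprinting.Builder; // assumed namespace of both builders

// Create a factory that writes to the console at Debug level and above.
using var loggerFactory = LoggerFactory.Create(logging =>
    logging.AddConsole().SetMinimumLevel(LogLevel.Debug));

// Pass the factory to the builders instead of using the static Instance.
var queryCommandBuilder = new QueryCommandBuilder(loggerFactory);
var fingerprintCommandBuilder = new FingerprintCommandBuilder(loggerFactory);
```

These instances would then replace `QueryCommandBuilder.Instance` and `FingerprintCommandBuilder.Instance` in the calls shown earlier.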

@AddictedCS
Owner

A similar bug has been described in #207. It is not exactly the same scenario, but a particular use case in which both Audio | Video fingerprints are generated during the query.
A fix has been provided in v8.24.0.
