NetworkStreamNmeaReceiver not keeping up when parsing messages #160

Open
pckv opened this issue Oct 12, 2022 · 13 comments

@pckv

pckv commented Oct 12, 2022

We've been having issues where the network stream occasionally stops for a few seconds before resuming. There are no logged error messages, and when the stream resumes it is as fast as before, so we're losing a lot of data. Eventually, after several minutes, this results in Corvus failing with the following error message:

System.ArgumentOutOfRangeException: The value needs to translate in milliseconds to -1 (signifying an infinite timeout), 0, or a positive integer less than or equal to the maximum allowed timer duration. (Parameter 'delay')
    at System.Threading.Tasks.Task.ValidateTimeout(TimeSpan timeout, ExceptionArgument argument)
    at System.Threading.Tasks.Task.Delay(TimeSpan delay)
    at Corvus.Retry.Async.SleepService.DefaultSleepService.SleepAsync(TimeSpan timeSpan)
    at Corvus.Retry.Retriable.RetryAsync(Func`1 asyncFunc, CancellationToken cancellationToken, IRetryStrategy strategy, IRetryPolicy policy, Boolean continueOnCapturedContext)
    at AisLog.Services.AisLogService.ExecuteAsync(CancellationToken cancellationToken) in /source/Services/AisLogService.cs:line 102
    at Microsoft.Extensions.Hosting.Internal.Host.TryExecuteBackgroundServiceAsync(BackgroundService backgroundService)

This started happening a few weeks ago on our local development setups, and just recently it has also started affecting our production server. CPU load is rarely high on either setup (<1%).
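
For reference, here is a minimal sketch (not our application code) of the Task.Delay constraint the exception message refers to; a retry strategy that computes a delay outside the allowed range fails in exactly this way:

// Minimal sketch, not project code: Task.Delay only accepts delays that
// translate to -1 ms (infinite), 0, or a positive value within the maximum
// timer duration, so a computed delay outside that range throws the same
// ArgumentOutOfRangeException as in the stack trace above.
TimeSpan badDelay = TimeSpan.FromMilliseconds(-2);

try
{
    await Task.Delay(badDelay);
}
catch (ArgumentOutOfRangeException ex)
{
    Console.WriteLine(ex.Message);
}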

I've run some tests with the receiver in an isolated environment, counting the number of messages received in one minute.

using Ais.Net.Receiver.Receiver;

var receiver = new NetworkStreamNmeaReceiver("153.44.253.27", 5631, TimeSpan.FromSeconds(0.5));
var receiverHost = new ReceiverHost(receiver);

var count = 0;

receiverHost.Sentences.Subscribe(_ => count++);
//receiverHost.Messages.Subscribe(_ => {});

await Task.WhenAny(receiverHost.StartAsync(), Task.Delay(TimeSpan.FromMinutes(1)));

Console.WriteLine(count);

Here are the results of 5 runs from my laptop:

$ dotnet run -> 2473
$ dotnet run -> 2410
$ dotnet run -> 2402
$ dotnet run -> 2400
$ dotnet run -> 2401

Subscribing to ReceiverHost.Sentences seemingly works fine, as every test I've run has resulted in between 2400 and 2500 messages.

However, uncommenting receiverHost.Messages.Subscribe(_ => {}); (thus also parsing the messages via a subscription) gives mixed results on the sentence count. Another 5 runs of this new code on my laptop:

$ dotnet run -> 743
$ dotnet run -> 499
$ dotnet run -> 935
$ dotnet run -> 837
$ dotnet run -> 373

I've also tried only subscribing to Messages and counting those, which gives similar numbers. Running tests with something like receiverHost.Sentences.Subscribe(Console.WriteLine); I can also see that subscribing to Sentences produces a steady stream of messages, whereas subscribing to Messages causes the stream to pause multiple times.
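
For completeness, a sketch of that variant (same setup as the snippet above), counting sentences and decoded messages independently so the two rates can be compared:

using Ais.Net.Receiver.Receiver;

var receiver = new NetworkStreamNmeaReceiver("153.44.253.27", 5631, TimeSpan.FromSeconds(0.5));
var receiverHost = new ReceiverHost(receiver);

var sentenceCount = 0;
var messageCount = 0;

// Subscribing to both streams makes the slowdown visible: the sentence rate
// drops as soon as message parsing is active.
receiverHost.Sentences.Subscribe(_ => sentenceCount++);
receiverHost.Messages.Subscribe(_ => messageCount++);

await Task.WhenAny(receiverHost.StartAsync(), Task.Delay(TimeSpan.FromMinutes(1)));

Console.WriteLine($"Sentences: {sentenceCount}, Messages: {messageCount}");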

The reason I've been specifying "laptop" is that the test seems to run fine (at the moment) on my desktop machine, which is understandably more powerful.

Laptop: Dell XPS 13 9310, Linux Ubuntu 22.04.1 LTS, Intel Core i7-1185G7, 16 GB RAM
Desktop: Windows 10, AMD Ryzen 5600X, 32 GB RAM
Production server: Virtual LXC Server, Linux Ubuntu 20.04 LTS, 2 vCPUs (unsure but might be Intel Core i7-3770), 2 GB RAM

What could be the cause of this? Considering that Ais.Net can reportedly handle more than a million sentences per minute, I'd assume this is a problem in the receiver.

@HowardvanRooijen
Contributor

Hi! Thanks for the details. What network stream source are you listening to?

@pckv
Author

pckv commented Oct 12, 2022

Hello and thanks for the quick response! I'm listening to the same source as used in my test code, so 153.44.253.27:5631, which is the public Norwegian AIS stream.

@HowardvanRooijen
Contributor

That can definitely be "less than reliable" - I've seen issues with this endpoint in a number of different scenarios: the machine hosting the service getting rebooted (I suspect), the broadcast service crashing (i.e. the TCP endpoint is available but nothing responds), and various other glitches.

Adding some more robustness has been on my todo list for a while - I host the service on a Raspberry Pi and it tends to fall over every couple of weeks, but root-causing has proved a little elusive. I'll see if I can get some time with @idg10 today to see if he has any ideas.

@pckv
Author

pckv commented Oct 12, 2022

I'll see if I can get some time with @idg10 today to see if he has any ideas.

That would be wonderful.

I've now also run the same test, this time on my desktop (which seemed to handle the public endpoint just fine), against the Norwegian Coastal Administration's closed source, which provides more data.

These new results demonstrate the real issue with parsing the messages. When subscribing to Sentences, the count is very high and consistent:

$ dotnet run -> 15945
$ dotnet run -> 15957
$ dotnet run -> 16322
$ dotnet run -> 16082
$ dotnet run -> 16391

But when subscribing to Messages and thus parsing every sentence, the results are amazingly low:

$ dotnet run -> 2
$ dotnet run -> 45
$ dotnet run -> 12
$ dotnet run -> 11
$ dotnet run -> 69

This is far lower than our production setup, which seemingly receives a few thousand messages every second in between all of the pauses, parses them, and stores them in the database. This is most certainly a problem that occurs when parsing the messages, not an actual problem with the endpoint. I'll attempt to run the first test (the one where it runs fine) while simultaneously performing some slightly expensive operations, to see if that has the same effect; a sketch of what I mean follows.
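
Something along these lines (hypothetical, just simulating work in the subscriber rather than doing anything real):

using Ais.Net.Receiver.Receiver;

// Variation of the first test: only the Sentences subscription, but with
// simulated per-sentence work, to see whether slow downstream processing
// alone reproduces the stalls.
var receiver = new NetworkStreamNmeaReceiver("153.44.253.27", 5631, TimeSpan.FromSeconds(0.5));
var receiverHost = new ReceiverHost(receiver);

var count = 0;

receiverHost.Sentences.Subscribe(_ =>
{
    count++;
    Thread.Sleep(1); // stand-in for "slightly expensive" per-message work
});

await Task.WhenAny(receiverHost.StartAsync(), Task.Delay(TimeSpan.FromMinutes(1)));

Console.WriteLine(count);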

@pckv
Author

pckv commented Oct 12, 2022

@HowardvanRooijen I've managed to find the cause of the pausing. The stream crashes due to an uncaught NotImplementedException thrown from

lineStreamProcessor.OnNext(new NmeaLineParser(lineAsAscii), 0);

The exception is raised because OnError is not implemented in NmeaToAisMessageTypeProcessor. Simply catching this exception solves our problem and the stream remains consistent:

try
{
    lineStreamProcessor.OnNext(new NmeaLineParser(lineAsAscii), 0);
}
catch (ArgumentException ex)
{
    // Existing handling: parse errors are forwarded to anyone observing the error stream.
    if (errorSubject.HasObservers)
    {
        errorSubject.OnNext((ex, line));
    }
}
catch (NotImplementedException e)
{
    // Our workaround: treat the NotImplementedException surfaced via OnError the same way.
    if (errorSubject.HasObservers)
    {
        errorSubject.OnNext((e, line));
    }
}

I still have no clue where this exception is raised, or why it is raised so rarely. It seems to come from the processor and not the NmeaLineParser.

I'm not sure what the best way to resolve this would be, so I hope you have a better idea of what to do here. Otherwise, I could create a PR that simply ignores the error (which I suppose is not ideal).

We're currently using version 0.2.0-preview.5, and would very much appreciate a new build ASAP.

@HowardvanRooijen
Contributor

What's throwing the NotImplementedException? Are you able to capture the data that's causing that to be thrown?

@pckv
Author

pckv commented Oct 12, 2022

Seems like there are some Class B vessel messages that are not being parsed properly. Here's a line that caused it to crash:

\g:1-2-22691,s:2573115,c:1665565535*47\!BSVDM,1,1,1,A,H3mhk`PEH400000000000000000,2*20

With the exception:

System.ArgumentException: Can only handle non-zero padding on the final message in a fragment

Since this is also an ArgumentException, I've resolved the previous issue again by simply doing:

public void OnError(in ReadOnlySpan<byte> line, Exception error, int lineNumber)
{
    // Rethrow so the existing catch (ArgumentException) around OnNext handles it.
    throw error;
}

@HowardvanRooijen
Contributor

Good troubleshooting! So it looks like there's some data being broadcast that doesn't adhere to the spec... it's one of the challenges of creating a data-driven parser: these real-world cases only happen in the real world and require sample data to fix. Adding some error handling will definitely help... I should add some more robust handling / logging to the receiver app.

@pckv
Author

pckv commented Oct 12, 2022

Preventing crashes when a message can't be parsed would at least be a good start for our case! We don't mind some messages being skipped. And since the same issues arise in the public AIS stream we used first, the problem should be reproducible.
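
To illustrate the kind of handling that would work for us (purely a sketch: the Errors property below is hypothetical, imagining the errorSubject from the earlier snippet exposed on the host):

using Ais.Net.Receiver.Receiver;

var receiver = new NetworkStreamNmeaReceiver("153.44.253.27", 5631, TimeSpan.FromSeconds(0.5));
var receiverHost = new ReceiverHost(receiver);

// Hypothetical: an observable of parse failures (e.g. the errorSubject shown
// earlier surfaced as a public property), so bad sentences are logged and
// skipped instead of crashing the stream.
receiverHost.Errors.Subscribe(error => Console.WriteLine($"Skipped unparseable sentence: {error}"));

// Normal processing continues for every sentence that does parse.
receiverHost.Messages.Subscribe(message => Console.WriteLine(message));

await receiverHost.StartAsync();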

@HowardvanRooijen
Contributor

There are a few different layers to this project. Ais.Net has proper test coverage in order to implement the spec; Ais.Net.Models and Ais.Net.Receiver are more demo/skunkworks projects that we wrote to explore other use cases for Ais.Net, to create good demos, and to capture some free real-world data for analysis for user group talks. It's definitely not a production system. Ais.Net is the high-performance layer; "Models" is the convenience layer that models the domain in a lower-performance way but is easier for a human to grok.

@HowardvanRooijen
Contributor

We've just merged a PR (#155) into master which adds better error handling, and you can also configure the verbosity of the log output, including statistics:

25/10/2022 14:17:57: Sentences: 2564 | Messages: 2126 | Errors 66
25/10/2022 14:18:57: Sentences: 2596 | Messages: 2114 | Errors 80
25/10/2022 14:19:57: Sentences: 2536 | Messages: 2069 | Errors 68
25/10/2022 14:20:57: Sentences: 2579 | Messages: 2107 | Errors 73
25/10/2022 14:21:57: Sentences: 2686 | Messages: 2175 | Errors 80
25/10/2022 14:22:57: Sentences: 2625 | Messages: 2152 | Errors 71
25/10/2022 14:23:57: Sentences: 2565 | Messages: 2103 | Errors 88
25/10/2022 14:24:57: Sentences: 2557 | Messages: 2060 | Errors 89
25/10/2022 14:25:57: Sentences: 2618 | Messages: 2110 | Errors 86
25/10/2022 14:26:57: Sentences: 2637 | Messages: 2160 | Errors 82

@HowardvanRooijen
Contributor

Oh, and on my Raspberry Pi, it's consuming 1.2% of the CPU and 1.8% of the memory.

@HowardvanRooijen
Contributor

We've just published these fixes (and various other updates) as a 0.2 release.
https://github.com/ais-dotnet/Ais.Net.Receiver/releases/tag/0.2.0

This release requires some updates to the configuration; you can find the details in the readme:
https://github.com/ais-dotnet/Ais.Net.Receiver/tree/0.2.0#configuration
