
RangeError [ERR_OUT_OF_RANGE] after longer operation times #168

Closed
sh-developers opened this issue Dec 7, 2018 · 18 comments

sh-developers commented Dec 7, 2018

Background

We implement a request/response communication pattern over AMQP and use Qpid (the C++ version) as the AMQP broker. We ported our applications from amqp10 to rhea because, when we moved from Node 8 to Node 10, we started getting buffer write errors (very similar to the one shown below).

We create our receiver queues via the same connection using the QMF2 framework (which is probably not relevant, because the error only occurs on sender connections, for which the queue is created by other services).

Description

The errors we have encountered with rhea only start showing up after the application has been running for a while (in this case about a day on a low-traffic test server). The value in the error starts at 256 and counts up from there (281 in the example below).
We have only observed this error when sending to an already existing queue.

Example

```
[RangeError [ERR_OUT_OF_RANGE]] "The value of "value" is out of range. It must be >= 0 and <= 255. Received 281" RangeError [ERR_OUT_OF_RANGE]: The value of "value" is out of range. It must be >= 0 and <= 255. Received 281
    at writeU_Int8 (internal/buffer.js:559:11)
    at Buffer.writeUInt8 (internal/buffer.js:569:10)
    at types.Writer.write_uint (/app/node_modules/rhea/lib/types.js:725:28)
    at types.Writer.write_variable_width (/app/node_modules/rhea/lib/types.js:751:10)
    at types.Writer.write_value (/app/node_modules/rhea/lib/types.js:789:14)
    at types.Writer.write (/app/node_modules/rhea/lib/types.js:779:14)
    at types.Writer.write_compound (/app/node_modules/rhea/lib/types.js:814:18)
    at types.Writer.write_value (/app/node_modules/rhea/lib/types.js:791:14)
    at types.Writer.write (/app/node_modules/rhea/lib/types.js:779:14)
    at types.Writer.write_compound (/app/node_modules/rhea/lib/types.js:814:18)
```

Further observations

Because the number always counts up, we think the value being written might be a counter of some sort. We use a correlationId consisting of a string (a ULID identifier), so it can't be that. All the other information we send is non-incrementing; the only other number we send ourselves is the userId, which is always the same in this environment and is always in the tens, not in the hundreds (and climbing).

If you need any further information, please tell us.

grs commented Dec 7, 2018

It looks like it is a string or similar value that is too large for the encoding used. The value complained about looks like it is the size, so e.g. a str8 or perhaps a vbin8 that is too large. However, there are alternate encodings for string and binary types that I would expect to be used. Are you explicitly wrapping any values in your message with a particular type?
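For illustration only (a minimal plain-Node sketch, not rhea's code): the 0-255 limit is simply what a single byte can hold, which is the size prefix used by the small encodings such as str8/vbin8.

```typescript
import { Buffer } from "buffer";

// A one-byte size field can only hold 0-255; larger sizes need the wider
// encodings (str32/vbin32), which use a four-byte size prefix.
const sizeField = Buffer.alloc(1);
sizeField.writeUInt8(255);      // fine: fits in a single byte
try {
    sizeField.writeUInt8(281);  // throws RangeError [ERR_OUT_OF_RANGE] on Node 10
} catch (err) {
    console.log((err as Error).message);
}
```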

Are you able to create a simple reproducer? Or perhaps get some protocol traces from a run of the application to see if there is any trend in the data?

@sh-developers

I noticed we just send the entire body as a plain JavaScript object. As a fix, I have now tried sending the body as a JSON-encoded string to see whether that resolves it.

grs commented Dec 14, 2018

Any update on this?

sh-developers commented Dec 14, 2018

Solution

The problem seems to be solved by serializing the body as JSON before placing a message on a queue (and deserializing it after receipt).
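Roughly what we do now, as a sketch (the broker address, queue name and payload below are placeholders for illustration, not a rhea recommendation):

```typescript
import * as rhea from "rhea";

// Placeholder broker address and queue for illustration only.
const connection = rhea.connect({ host: "localhost", port: 5672 });
const sender = connection.open_sender("example.queue");
const receiver = connection.open_receiver("example.queue");

let sent = false;
sender.on("sendable", () => {
    if (sent) return;
    sent = true;
    // Any plain object we previously passed directly as the body.
    const payload = { correlationId: "01ARZ3NDEKTSV4RRFFQ69G5FAV", userId: 42 };
    // Serialize the body to a JSON string before placing it on the queue.
    sender.send({ body: JSON.stringify(payload), correlation_id: payload.correlationId });
});

receiver.on("message", (context: rhea.EventContext) => {
    // Deserialize after receipt.
    const body = JSON.parse(context.message.body as string);
    console.log(body.userId);
});
```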

Prevention

I would recommend adding a notice/warning recommending that any body content be serialized before being placed on the bus, or documenting what the limitations of the AMQP serialization formats are. (Evidently we did not realize the limitations of the format when reading either the amqp10 or the rhea documentation.) For example, with larger messages, serialization of the body is better done with a dedicated message format like MessagePack/JSON/...

If you would like, I could formulate something for the readme, but someone who actually implemented the encoding/formatting code would be better suited for this.

This ticket is solved for us now, thank you for your help and attention!

grs commented Dec 14, 2018

Thanks for the update. Serialising JavaScript objects should not cause this error, so potentially there is still a bug. I can't see anything obvious when going through the code, though.

@MattOakley

We're getting this issue as well.

I think the number is the link_credit on the frame object passed to the output method in session.js.

For example, we get "The value of "value" is out of range. It must be >= 0 and <= 255. Received -3" when the link_credit is -3.
The frame:

```
delivery_count:951
drain:undefined
echo:undefined
handle:0
incoming_window:1866
link_credit:-3
next_incoming_id:951
next_outgoing_id:0
outgoing_window:4294967295
properties:undefined
```

grs commented Jan 4, 2019

@MattOakley is this for a receiving link (i.e. the first thing created on the session was a receiver, not a sender)? It is certainly a bug for the link credit to go negative. Do you have a simple test or reproducer to isolate this? Is it possible you ever call receiver.add_credit()/flow() with a negative number?

grs commented Jan 4, 2019

@MattOakley I pushed a check in the one place I see where the credit could in theory go negative (230e717); does that impact your use case?

@MattOakley

That does impact our use case; I'll run a test with it tomorrow.

Although I can't see why that method would be hit when the credit is zero. Wouldn't this throw off the number of in-flight messages (or worse)?

In answer to your earlier question:

It is a receiving link, and I don't have a simple script to reproduce it to hand; I could hopefully make one on Monday if it helps.

We have a relatively long-running on-message handler (approx. 1 min), and the issue happens after a minute or so of consistent message flow. The credit window is 0, and when the message handler is finished we call accept() and add_credit(1). This is the only place we call this.

grs commented Jan 4, 2019

I agree, I don't see why the credit is zero when a transfer is received unless the credit is reduced by the application (or there is some other aspect I have not yet understood).

If you are able to create a reproducer of some kind that would really help tracking it down.

@MattOakley

We are sending to queues on sender links as well; not sure if that could be important.

MattOakley commented Jan 7, 2019

Below is a reproducer script:

The change (230e717) fixes the connection error being thrown, but now writes a lot of "Received transfer when credit was 0" messages.
Also, there's a typo which makes the logs print out NaN.

```typescript
import * as container from "rhea";

const queueName = "rheaTest";
let senderConnection = container.connect({ port: 8091, host: "localhost" });
let receiverConnection = container.connect({ port: 8091, host: "localhost" });

senderConnection.on("error", async e => {
    console.log("Sender Error: " + e);
    debugger;
});

receiverConnection.on("error", async e => {
    console.log("Receiver Error: " + e);
    debugger;
});

// Grant an initial batch of credit when the receiver attaches.
receiverConnection.on("receiver_open", (context) => {
    context.receiver.flow(100);
});

let sender = senderConnection.open_sender({
    name: queueName,
    snd_settle_mode: 0,
    rcv_settle_mode: 0,
    target: {
        durable: 2,
        address: queueName
    }
});

container["once"]("sendable", (context: container.EventContext) => {
    // Send 1000 messages as soon as the sender has credit.
    let message = Array.from({ length: 250 }, (x, i) => i).join("");
    for (let i = 0; i < 1000; i++) {
        sender.send({ body: message, application_properties: [], durable: true, correlation_id: i });
    }

    let receiver = receiverConnection.open_receiver({
        name: queueName,
        autoaccept: false,
        rcv_settle_mode: 0,
        snd_settle_mode: 0,

        // Issue flow manually for async message handler.
        credit_window: 0,
        source: {
            durable: 2,
            address: queueName
        }
    });

    receiver.on("message", async (context: container.EventContext) => {
        console.log("received: " + context.message.correlation_id);

        // Simulate a long-running handler (25-30 seconds), then accept
        // the delivery and grant one more credit.
        let seconds = Math.floor(Math.random() * 6 + 25);
        await waitSeconds(seconds);

        console.log("resolved: " + context.message.correlation_id);
        context.delivery.accept();
        receiver.add_credit(1);
    });

    async function waitSeconds(seconds: number): Promise<void> {
        return new Promise<void>(resolve => {
            setTimeout(() => {
                resolve();
            }, seconds * 1000);
        });
    }
});
```

Running Node 10 on Windows.
Using RabbitMQ with AMQP 1.0.

The message "Received transfer when credit was 0" was printed out 49 times for 1000 messages.

grs commented Jan 7, 2019

Thanks! I can reproduce with that, and it looks like a bug in the RabbitMQ AMQP 1.0 support (I'm using v3.6.16 of the 1.0 plugin in my test). It is not honouring credit at all: if a receiver link is attached and no credit is issued, the broker still sends messages. I've verified that with an entirely different client.

grs commented Jan 7, 2019

FYI, I've raised the following issue: rabbitmq/rabbitmq-amqp1.0#67

grs commented Jan 7, 2019

The above issue is apparently fixed from 3.7 on; I upgraded, and indeed the simple issue is gone. However, I can still reproduce the problem, and it still looks to me like an error from the server, where it sends transfers for which credit has not been issued, just in a more subtle way.

grs commented Jan 7, 2019

I've confirmed with a Wireshark trace that the RabbitMQ server will occasionally send more transfers than it is entitled to by the granted credit. The issue is already reported: https://github.com/rabbitmq/rabbitmq-amqp1.0/issues/50

grs commented Jan 7, 2019

@MattOakley I have fixed the typo in the error message. The current code ensures that the credit counter doesn't go negative when the server violates the protocol by sending more transfers than credits granted.

We could close the receiver with an error since the server is violating the protocol. If that would be preferred, let's raise a separate issue for that.
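For anyone following along, the shape of that protection is roughly this (a sketch for illustration only, not the actual rhea source):

```typescript
// Sketch only: decrement the link credit for an incoming transfer, but clamp
// at zero and log, rather than letting the counter go negative and later be
// written into an unsigned field of an outgoing flow frame (which is what
// triggered the original RangeError).
function creditAfterTransfer(linkCredit: number): number {
    if (linkCredit <= 0) {
        console.error("Received transfer when credit was 0");
        return 0;
    }
    return linkCredit - 1;
}
```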

@MattOakley

Ok, thanks for your help
