
'job.data' passed in to processor callback is sometimes empty. #2461

Closed

tinytortoise-dev opened this issue Nov 10, 2022 · 17 comments

tinytortoise-dev commented Nov 10, 2022

Description

As the title says, the job.data passed in to the processor callback is sometimes an empty object, even though our code ALWAYS passes in an object with values. This happens occasionally in the production environment, never in the local environment.

Here's a high-level overview of the code:

import Queue from 'bull';
const queue = new Queue('queue');

onHappenEvent = async (evt) => {
  const job = await toBullJob(evt); // this job object always looks like: `{ name: 'some-job', params: { ... } }`
  const opts = {
    timeout: 60 * 1000,
    attempts: 1,
    removeOnComplete: true,
    removeOnFail: true,
    delay: 5 * 60 * 1000,
  };
  await queue.add(job, opts);
};

const processorCb = async (job) => {
  console.log(job.data); // most of the time the output is `{ name: 'some-job', params: { ... } }`, but sometimes `{}`...
  // do something
};
queue.process(1, processorCb);

Since we always pass a nonempty object with values to the queue.add function, we expect processorCb to always be handed THAT nonempty object. Does anyone have an idea what could be causing this?

Minimal, Working Test code to reproduce the issue.

(An easy to reproduce test case will dramatically decrease the resolution time.)

Bull version

3.22.12

Additional information

We have 3 Node instances in the production environment. All of them access a single Redis instance.

manast (Member) commented Nov 10, 2022

What's your maxmemory-policy configuration in Redis?

tinytortoise-dev (Author) commented:

@manast

> config get maxmemory*
maxmemory
0
maxmemory-samples
5
maxmemory-policy
noeviction

manast (Member) commented Nov 11, 2022

I have never heard of this issue before, and I cannot see how it would be possible if you really are sending data in every event. Have you tried putting a log right before adding the job to Bull, to verify the payload really is always an object and not undefined?

tinytortoise-dev (Author) commented:

Looking at our code, there seems to be no place where we could pass a non-object value such as null or undefined into Bull's add function. But we haven't verified it by actually logging the value yet, so we'll insert a logger and see the output.
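
Something along these lines, reusing the same toBullJob, queue, and opts as in the original snippet (a sketch, not our exact code):

onHappenEvent = async (evt) => {
  const job = await toBullJob(evt);
  // Log the exact payload immediately before handing it to Bull, so an
  // undefined or empty object would show up in the logs at add time.
  console.log('enqueueing payload:', JSON.stringify(job));
  if (!job || Object.keys(job).length === 0) {
    throw new Error('refusing to enqueue an empty job payload');
  }
  await queue.add(job, opts);
};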

tinytortoise-dev (Author) commented Nov 14, 2022

@manast
For reference, this is the job object passed to the processor function:

{
    "id": "38272958",
    "name": "__default__",
    "data": {}, // this is weirdly empty...
    "opts": {
        "attempts": 1,
        "delay": 0,
        "timestamp": 1668129039381
    },
    "progress": 0,
    "delay": null,
    "timestamp": null,
    "attemptsMade": 0,
    "stacktrace": [],
    "returnvalue": null,
    "finishedOn": null,
    "processedOn": 1668128992531
}

I also have some additional information that I initially thought was irrelevant to the issue, but I'll share it just in case:

  • We have 2 other queue instances, say 'other-queue1' and 'other-queue2'.
  • Both 'other-queue1' and 'other-queue2' have only one processor.
  • Processor concurrency is 100 for 'other-queue1' and 8 for 'other-queue2'.
  • The processor for 'other-queue1' is unnamed, and the one for 'other-queue2' is named.
  • We add unnamed jobs to 'other-queue1' and correspondingly named jobs to 'other-queue2'.

I didn't include this at first because I assumed one queue instance could never be affected by the others.

roych98 commented Nov 14, 2022

I don't see a reason why job.data should be empty, unless whatever comes back from toBullJob(evt) has nothing in its data object. Can you please show us what evt is and what the toBullJob function does?

tinytortoise-dev (Author) commented:

@roych98

interface Evt {
    deviceId: number; // id of the device that caused the event
}

// Gather the data the processor needs from Postgres and Redis.
const toBullJob = async (evt: Evt) => {
    const device = await myDb.deviceTable.getById(evt.deviceId);
    if (!device.otherParams) throw new Error();
    const deviceCache = await myRedis.deviceCache.getById(evt.deviceId);
    if (!deviceCache.evtHappenedAt) throw new Error();
    return {
        name: 'device_event_job',
        device: {
            id: evt.deviceId,
            evtHappenedAt: parseInt(deviceCache.evtHappenedAt),
        },
        otherParams: device.otherParams,
    };
};

NOTE: toBullJob is actually an async function (I've updated the original description).
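
To narrow down whether the payload is lost at write time or on the read path, we can also dump the raw job hash straight from Redis. A sketch with ioredis, assuming Bull v3's hash-per-job layout and the default 'bull' key prefix:

import Redis from 'ioredis';

const redis = new Redis();

const dumpJob = async (queueName, jobId) => {
  // Bull v3 stores each job as a hash under bull:<queue name>:<job id>;
  // the `data` field holds the JSON string that becomes job.data.
  const raw = await redis.hgetall(`bull:${queueName}:${jobId}`);
  console.log(raw.data);
};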

manast (Member) commented Nov 15, 2022

If that's the code, then I agree: it looks as if the data should always be defined. This path is also quite straightforward in Bull's internals, so I don't know why this is happening to you. As mentioned, this is the first time we have heard of data being empty, so there must be something special in your setup.

stale bot commented Jan 14, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Jan 14, 2023
stale bot closed this as completed Jan 21, 2023
hgeldenhuys commented:
I'm also getting this. I can confirm that the data is present right after I add a job, but by the time it is picked up and processed, the data object is empty.

manast (Member) commented Jul 5, 2023

@hgeldenhuys it is quite unlikely that this is an issue with Bull.

hgeldenhuys commented Jul 5, 2023

Hey @manast, where else do you think this could be? I'm using NestJS and processing bureau files. It all worked well for about 40k files, until I started processing 2GB files. Inside Redis, even the data attribute is gone:

[screenshot: job hash in Redis with the data attribute missing]

The "missing lock" error isn't accurate either; I think it masks the real issue. In the queue UI, it looks like 53 years elapsed on some jobs:

[screenshot: queue UI showing jobs with roughly 53 years elapsed]

// config
const settings = {
  lockDuration: 600_000 * 4,
  lockRenewTime: 600_000 * 2,
  stalledInterval: 0,
  maxStalledCount: 0,
};

const redis = {
  host: constants().REDIS_HOST,
  port: constants().REDIS_PORT,
  password: constants().REDIS_PASSWORD,
  username: constants().REDIS_USERNAME,
};

@Module({
  imports: [
    BullModule.forRoot({ redis, settings }),
    BullModule.registerQueue({ name: 'level0-monthly', redis, settings }),
// ...

const lineJobs: Level0Jobs = {
  id: summary.id,
  batch,
  srn: summary.srn,
  test: summary.test,
  total: summary.lines,
  filename: summary.filename,
  frequency: summary.frequency,
  type: 'data',
  lines,
};
const job = await queue.add(lineJobs, { removeOnComplete: this.BULL_REMOVE_ON_COMPLETE });

hgeldenhuys commented:
> I'm also getting this. I can confirm that the data is present right after I add a job, but by the time it is picked up and processed, the data object is empty.

Not only data; the timestamp is also missing:

[screenshot: a job stored in Redis with the timestamp missing]

hgeldenhuys commented:
This is speculative, but it might be related to Redis: at some point it becomes unavailable for a moment while swapping memory files, and all hell breaks loose. Bull doesn't get the jobs back properly, but overlays the failure attributes on top of an empty object. Maybe Bull should check that the timestamp is present before assuming the job object is okay?
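
In the meantime, a userland guard along these lines (a sketch, not a Bull feature) would at least surface the corruption instead of silently processing an empty job:

const processorCb = async (job) => {
  // A healthy job always has a timestamp and nonempty data; treat
  // anything else as corrupted rather than running the handler on it.
  if (!job.timestamp || Object.keys(job.data || {}).length === 0) {
    throw new Error(`job ${job.id} looks corrupted: empty data / missing timestamp`);
  }
  // ...normal processing
};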

manast (Member) commented Jul 5, 2023

@hgeldenhuys it looks like the job has been removed while still being processed. Data can be removed by Redis if you do not have the correct maxmemory-policy configured... I am not sure what you mean by 2GB files, but I hope you are not sending 2GB as job data; that is not a good idea for many reasons. https://docs.bullmq.io/guide/going-to-production#max-memory-policy
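
The usual pattern is to persist the heavy payload outside Redis and enqueue only a reference to it. A sketch, where saveLines and loadLines are hypothetical helpers backed by whatever blob store you use:

// Enqueue a small reference instead of the lines themselves.
const linesRef = await saveLines(summary.id, lines); // hypothetical helper
await queue.add({ id: summary.id, type: 'data', linesRef });

// The processor fetches the heavy data by reference.
queue.process(async (job) => {
  const lines = await loadLines(job.data.linesRef); // hypothetical helper
  // ...process the lines
});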

hgeldenhuys commented Jul 5, 2023

> @hgeldenhuys it looks like the job has been removed while still being processed. Data can be removed by Redis if you do not have the correct maxmemory-policy configured... I am not sure what you mean by 2GB files, but I hope you are not sending 2GB as job data; that is not a good idea for many reasons. https://docs.bullmq.io/guide/going-to-production#max-memory-policy

No, it was a 2GB file split into lines of data; the implication was just that I was pushing a lot more data through. I'll check out the max-memory-policy, but the missing timestamp results in a default 1970 date in the stall calculation, which then raises a false "missing lock" error. That's why many people think increasing the lock-duration variables will fix it, but it doesn't. This should be handled differently, or at least documented, because it puts people on the wrong troubleshooting path.
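
(The 1970 date falls out of a null timestamp coercing to 0, the Unix epoch; a two-line illustration:)

console.log(new Date(null)); // 1970-01-01T00:00:00.000Z
console.log((Date.now() - null) / (365.25 * 24 * 36e5)); // elapsed ≈ 53 "years" as of 2023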

I don't think the jobs are being deleted, because there are still a lot of intact jobs waiting; but as soon as they get processed, the missing-lock logic kicks in and hides the real issue (which is probably due to Redis's memory swap).

hgeldenhuys commented:
It was Redis. I set maxmemory higher and changed the maxmemory-policy, and I no longer get the missing lock issue.
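
For anyone landing here, the change boils down to something like this (an ioredis sketch; the BullMQ production guide linked above recommends noeviction):

import Redis from 'ioredis';

const redis = new Redis();

const ensureNoEviction = async () => {
  // CONFIG GET returns [key, value]; make sure Redis errors under memory
  // pressure instead of silently evicting Bull's job keys.
  const [, policy] = await redis.config('GET', 'maxmemory-policy');
  if (policy !== 'noeviction') {
    await redis.config('SET', 'maxmemory-policy', 'noeviction');
  }
};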
