Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use Task instead of spawn_link for starting workers #436

Merged
merged 1 commit into from
Jan 2, 2021

Conversation

mitchellhenke
Copy link
Contributor

Hello! 👋🏼

I maintain https://github.com/getsentry/sentry-elixir and have received issues where failed jobs in Exq don't report process metadata from the crashed process (getsentry/sentry-elixir#349). It looks like it's due to using spawn_link instead of Elixir modules like Task or GenServer that rely on :proc_lib underneath. The change would allow for some more detailed reporting on the state of the worker process is in when it crashed, including parent process, metadata, and more.

I wasn't sure if Task was appropriate, but the tests do pass 🙂

@ananthakumaran
Copy link
Collaborator

Existing

[
  msg: [
    "Process ",
    "#PID<0.242.0>",
    " raised an exception",
    10,
    "** (RuntimeError) hello",
    ["\n    " | "(exq_demo 0.1.0) lib/hardworker.ex:8: Hardworker.perform/0"],
    ["\n    " |
     "(exq 0.14.0) lib/exq/worker/server.ex:180: anonymous fn/4 in Exq.Worker.Server.dispatch_work/3"]
  ],
  md: [
    crash_reason: {%RuntimeError{message: "hello"},
     [
       {Hardworker, :perform, 0, [file: 'lib/hardworker.ex', line: 8]},
       {Exq.Worker.Server, :"-dispatch_work/3-fun-0-", 4,
        [file: 'lib/exq/worker/server.ex', line: 180]}
     ]},
    error_logger: %{emulator: true, tag: :error},
    gl: #PID<0.218.0>,
    pid: #PID<0.242.0>,
    time: 1609096346068586
  ]
]

New

[
  msg: [
    "Task #PID<0.322.0> started from #PID<0.323.0> terminating",
    [
      [10 | "** (RuntimeError) hello"],
      ["\n    " | "(exq_demo 0.1.0) lib/hardworker.ex:8: Hardworker.perform/0"],
      ["\n    " |
       "(exq 0.14.0) lib/exq/worker/server.ex:189: anonymous fn/4 in Exq.Worker.Server.dispatch_work/3"],
      ["\n    " |
       "(elixir 1.10.3) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2"],
      ["\n    " | "(stdlib 3.11) proc_lib.erl:249: :proc_lib.init_p_do_apply/3"]
    ],
    "\nFunction: #Function<0.27156016/0 in Exq.Worker.Server.dispatch_work/3>",
    "\n    Args: []"
  ],
  md: [
    crash_reason: {%RuntimeError{message: "hello"},
     [
       {Hardworker, :perform, 0, [file: 'lib/hardworker.ex', line: 8]},
       {Exq.Worker.Server, :"-dispatch_work/3-fun-0-", 4,
        [file: 'lib/exq/worker/server.ex', line: 189]},
       {Task.Supervised, :invoke_mfa, 2,
        [file: 'lib/task/supervised.ex', line: 90]},
       {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 249]}
     ]},
    domain: [:otp, :elixir],
    error_logger: %{tag: :error_msg},
    gl: #PID<0.300.0>,
    pid: #PID<0.323.0>,
    report_cb: &Task.Supervised.format_report/1,
    time: 1609096397377193,
    worker: "hardworker"
  ]
]
[
  msg: [
    "Process ",
    "#PID<0.323.0>",
    " terminating",
    [
      10,
      "** (RuntimeError) hello",
      ["\n    " | "(exq_demo 0.1.0) lib/hardworker.ex:8: Hardworker.perform/0"],
      ["\n    " |
       "(exq 0.14.0) lib/exq/worker/server.ex:180: anonymous fn/4 in Exq.Worker.Server.dispatch_work/3"],
      ["\n    " | "(stdlib 3.11) proc_lib.erl:234: :proc_lib.init_p/3"]
    ],
    [
      '\n',
      "Initial Call: ",
      "anonymous fn/0 in Exq.Worker.Server.dispatch_work/3",
      '\n',
      "Ancestors: ",
      "[#PID<0.322.0>, Exq.Worker.Sup, Exq.Sup, #PID<0.301.0>]",
      ['\n', "Message Queue Length", 58, 32, "0"],
      ['\n', "Messages", 58, 32, "[]"],
      ['\n', "Links", 58, 32, "[#PID<0.322.0>]"],
      ['\n', "Dictionary", 58, 32,
       "[\"$logger_metadata$\": %{worker: \"hardworker\"}]"],
      ['\n', "Trapping Exits", 58, 32, "false"],
      ['\n', "Status", 58, 32, ":running"],
      ['\n', "Heap Size", 58, 32, "610"],
      ['\n', "Stack Size", 58, 32, "27"],
      ['\n', "Reductions", 58, 32, "301"]
    ],
    []
  ],
  md: [
    ancestors: [#PID<0.322.0>, Exq.Worker.Sup, Exq.Sup, #PID<0.301.0>],
    crash_reason: {%RuntimeError{message: "hello"},
     [
       {Hardworker, :perform, 0, [file: 'lib/hardworker.ex', line: 8]},
       {Exq.Worker.Server, :"-dispatch_work/3-fun-0-", 4,
        [file: 'lib/exq/worker/server.ex', line: 180]},
       {:proc_lib, :init_p, 3, [file: 'proc_lib.erl', line: 234]}
     ]},
    domain: [:otp, :sasl],
    error_logger: %{tag: :error_report, type: :crash_report},
    file: "proc_lib.erl",
    function: "crash_report/4",
    gl: #PID<0.300.0>,
    initial_call: {Exq.Worker.Server, :"-dispatch_work/3-fun-0-", 0},
    line: 508,
    logger_formatter: %{title: 'CRASH REPORT'},
    mfa: {:proc_lib, :crash_report, 4},
    module: :proc_lib,
    pid: #PID<0.323.0>,
    report_cb: &:proc_lib.report_cb/2,
    time: 1609096766204713,
    worker: "hardworker"
  ]
]

From limited testing, this also introduces a crash report as well (assuming handle_sasl_reports is set to true). I will try to verify if there are any other changes introduced by it.

@mitchellhenke
Copy link
Contributor Author

Thanks for the quick response!

I should have better described the potential changes. The list of behaviors for processes started through proc_lib is here: https://erlang.org/doc/man/proc_lib.html#description. SASL is mentioned, though it was deprecated with the release of OTP 21, so my hope is the impact of that is more limited.

@ananthakumaran ananthakumaran merged commit f1bce6f into akira:master Jan 2, 2021
@mitchellhenke mitchellhenke deleted the use-task-for-workers branch January 2, 2021 16:44
@akira
Copy link
Owner

akira commented Jan 2, 2021

Thanks for PR @mitchellhenke!! @ananthakumaran let me know when we should cut a release.

@ananthakumaran
Copy link
Collaborator

@akira we can cut a release in a week or two. We have been running fe1b102 in prod and it looks stable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants