Several improvements to the pending TX/message jobs.

  • The jobs used to work in batches, meaning they started a series
    of N tasks and then waited for these tasks to finish before
    starting the next batch. This has the obvious disadvantage
    that the job ends up waiting on the slowest tasks, while other
    tasks could have been started earlier. The jobs now have a
    max concurrency parameter that limits the total number of tasks.
    The job automatically collects any finished task and spawns
    new ones up to the limit (see the first sketch after this list).

  • Refactored the message processing job to get rid of the i/j
    logic that modified the batch size depending on whether the job
    was processing STORE messages. The job now uses one semaphore
    per message type to limit the number of tasks running in
    parallel for a specific message type (see the second sketch
    after this list). This is still suboptimal, as semaphores per
    item type would make more sense (to limit the number of tasks
    fetching data from the network or IPFS), but that feature will
    be implemented later.

  • Several job metrics were removed, as they no longer had any meaning:

    • pyaleph_processing_pending_messages_gtasks_total
    • pyaleph_processing_pending_messages_i_total
    • pyaleph_processing_pending_messages_j_total

    These metrics were replaced by one metric per message type
    (see the third sketch after this list):

    • pyaleph_processing_pending_messages_aggregate_tasks
    • pyaleph_processing_pending_messages_forget_tasks
    • pyaleph_processing_pending_messages_post_tasks
    • pyaleph_processing_pending_messages_program_tasks
    • pyaleph_processing_pending_messages_store_tasks
  • The max concurrency parameter of each job and the limits per
    message type can now be configured from the config file (the
    last snippet after this list shows an illustrative example).
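
As a rough illustration of the first point, here is a minimal asyncio sketch of the bounded-concurrency pattern: tasks are spawned up to a limit, and as soon as one finishes a new one takes its place. The function and parameter names are illustrative, not the actual pyaleph implementation.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Set

async def run_with_max_concurrency(
    pending_items: AsyncIterator,
    process_one: Callable[..., Awaitable[None]],
    max_concurrency: int,
) -> None:
    """Keep up to max_concurrency tasks in flight, replacing each task
    as soon as it finishes instead of waiting for a whole batch."""
    tasks: Set[asyncio.Task] = set()
    async for item in pending_items:
        if len(tasks) >= max_concurrency:
            # Collect at least one finished task before spawning a new one.
            done, tasks = await asyncio.wait(
                tasks, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                task.result()  # re-raise any exception from the finished task
        tasks.add(asyncio.create_task(process_one(item)))
    # Drain whatever is still in flight once the queue is exhausted.
    if tasks:
        await asyncio.gather(*tasks)
```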
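
The per-message-type limits from the second point can be sketched with one asyncio.Semaphore per type. The limits, dictionary and function names below are hypothetical; in the PR the values come from the configuration described further down.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# Hypothetical limits -- in the PR these values come from the config file.
PER_TYPE_LIMITS: Dict[str, int] = {
    "AGGREGATE": 10,
    "FORGET": 10,
    "POST": 10,
    "PROGRAM": 10,
    "STORE": 2,  # STORE messages fetch data from the network/IPFS
}

SEMAPHORES = {
    message_type: asyncio.Semaphore(limit)
    for message_type, limit in PER_TYPE_LIMITS.items()
}

async def process_with_type_limit(
    message: dict, handler: Callable[[dict], Awaitable[None]]
) -> None:
    # At most PER_TYPE_LIMITS[type] tasks of a given type run at once;
    # extra tasks of that type wait here without blocking other types.
    async with SEMAPHORES[message["type"]]:
        await handler(message)
```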
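
For the new metrics, the idea is one task counter per message type instead of the old batch-oriented counters. A possible way to declare such gauges with prometheus_client, purely as an illustration (the actual wiring lives inside pyaleph):

```python
from prometheus_client import Gauge

# One gauge per message type, matching the metric names listed above.
pending_message_tasks = {
    message_type: Gauge(
        f"pyaleph_processing_pending_messages_{message_type}_tasks",
        f"Number of {message_type.upper()} message tasks currently running",
    )
    for message_type in ("aggregate", "forget", "post", "program", "store")
}

# A job would then inc() the gauge when spawning a task for a message
# of that type and dec() it when the task completes.
```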
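
Finally, the new limits are read from the node's configuration file. The snippet below is only a hypothetical illustration of what such a section could look like; the actual key names and defaults are defined in the sample configuration shipped with pyaleph.

```yaml
# Hypothetical example -- key names are illustrative only.
aleph:
  jobs:
    pending_messages:
      max_concurrency: 20
      per_message_type:
        STORE: 2
    pending_txs:
      max_concurrency: 20
```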

Also removed the now-useless views from the Aleph Node Metrics dashboards
and added a view to watch the number of messages by message type
over time.
@odesenfans merged commit 4ca3079 into aleph-im:dev on May 25, 2022.