Seperate python script for armada v1 and v2 system diagrams #2758

Merged
merged 19 commits into from
Aug 21, 2023

Conversation

Pradeep-Kurapati
Contributor

@Pradeep-Kurapati Pradeep-Kurapati commented Jul 27, 2023

This is a documentation improvement pull request

What this PR does:

This pull request adds a separate diagram for the Armada V1 system.

Issue PR fixes:

Fixes #2738

Special notes for your reviewer:

  1. Please check if the diagram is correct.
  2. Should we remove generate.py as I'm thinking of generate_v1.py and generate_v2.py?
  3. If this diagram is correct, I will commit generate_v2.py next.

Here is the resultant diagram:

image


@codecov

codecov bot commented Jul 27, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (3fa5ab2) 47.40% compared to head (23dd9ab) 47.40%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2758   +/-   ##
=======================================
  Coverage   47.40%   47.40%           
=======================================
  Files         395      395           
  Lines       44045    44045           
  Branches      487      487           
=======================================
  Hits        20879    20879           
  Misses      21582    21582           
  Partials     1584     1584           
Flag Coverage Δ
unittests 47.40% <ø> (ø)


@Sharpz7
Contributor

Sharpz7 commented Jul 28, 2023

Great start! A few things:

Armada V1 actually uses Lookout V2, but in a special way. The Pulsar -> Lookout V2 Ingester -> Lookout V2 API path exists, but here the Lookout V2 API also talks to the Lookout V1 UI. Can you add that in?

Also, Pulsar does not talk to the executors. Instead, the executors talk to the Armada Server. This way, the Armada Server is not responsible for keeping track of its executors: each executor sends data to the server, and the server responds with a decision. Also, if you are able to add the Armada clients talking to the server, as in the V2 doc, that would be great.

Also, the "Redis Scheduler" should be an Armada Component, not the Redis Icon.
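
The executor-to-server direction described in this comment can be sketched in a few lines of Python. This only illustrates the pattern and is not Armada's actual API: `ArmadaServer`, `handle_report`, `Job`, and the CPU-based fit rule are all hypothetical names and assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    job_id: str
    cpu: int


@dataclass
class ArmadaServer:
    """Hypothetical stand-in for the server: it holds queued jobs, not a registry of executors."""

    queue: List[Job] = field(default_factory=list)

    def handle_report(self, free_cpu: int) -> List[Job]:
        # The executor initiates the call and reports its spare capacity;
        # the server answers with a decision (the jobs that executor should run).
        leased, remaining = [], []
        for job in self.queue:
            if job.cpu <= free_cpu:
                leased.append(job)
                free_cpu -= job.cpu
            else:
                remaining.append(job)
        self.queue = remaining
        return leased
```

Because every exchange starts from the executor, the server in this sketch never needs to know which executors exist, which is the property the comment points out.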

@Pradeep-Kurapati
Contributor Author

This looks very wrong. Please review this. Thank you:

image

@Sharpz7
Contributor

Sharpz7 commented Jul 28, 2023

Yeah so this needs a few more things:

  • The Lookout V2 API talks to the Lookout V1 UI.
  • Pulsar does not talk to the Lookout V1 UI.
  • The Executor API does not talk to the Redis Scheduler. In fact, remove the Redis Scheduler: for V1 it is included inside the Armada Server. I.e., make all connections going to the Redis Scheduler go to the server instead.
  • https://drive.google.com/uc?id=1eeaX7EMdJfCXBgVKcNkiHV9yG1zqxBqO These components need to be duplicated for the V1 Lookout. There should be two of these, with the Lookout V1 and V2 APIs being the only components that talk to the Lookout UI.
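
The rules in this list can be written down as a small edge list plus a checker, which makes it easier to spot a stray arrow before rendering. This is a sketch under assumptions: the node names follow the wording of this thread, the two edges marked "assumption" are a guess at how the duplicated Lookout V1 components connect, and `check_v1_rules` and `to_dot` are hypothetical helpers rather than part of the PR's scripts.

```python
from typing import List, Tuple

Edge = Tuple[str, str]

UI = "Lookout V1 UI"
ALLOWED_UI_SOURCES = {"Lookout V1 API", "Lookout V2 API"}

V1_EDGES: List[Edge] = [
    ("Armada Client", "Armada Server"),
    ("Executor", "Armada Server"),
    ("Pulsar", "Lookout V2 Ingester"),
    ("Lookout V2 Ingester", "Lookout V2 API"),
    ("Lookout V2 API", UI),
    ("Pulsar", "Lookout V1 Ingester"),          # assumption
    ("Lookout V1 Ingester", "Lookout V1 API"),  # assumption
    ("Lookout V1 API", UI),
]


def check_v1_rules(edges: List[Edge]) -> List[str]:
    """Return one message per edge that breaks a rule from the review."""
    problems = []
    for src, dst in edges:
        if "Redis Scheduler" in (src, dst):
            problems.append(f"{src} -> {dst}: in V1 the scheduler lives inside the Armada Server")
        if dst == UI and src not in ALLOWED_UI_SOURCES:
            problems.append(f"{src} -> {dst}: only the Lookout APIs talk to the UI")
        if src == "Pulsar" and dst == "Executor":
            problems.append(f"{src} -> {dst}: executors talk to the Armada Server instead")
    return problems


def to_dot(edges: List[Edge], name: str = "armada_v1") -> str:
    """Render the edge list as Graphviz DOT text."""
    lines = [f"digraph {name} {{"]
    lines += [f'  "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)
```

`check_v1_rules(V1_EDGES)` returns an empty list for the edges above, and the DOT text from `to_dot` can be pasted into any Graphviz renderer for a quick look.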

I know this looks messy... but welcome to Armada :))

@Pradeep-Kurapati
Contributor Author

I did not clearly understand the last point. When you said V1 Lookout, were you referring to Lookout V1 UI or is it a component I didn't add? Some clarification about the last point would be great!

image

@Pradeep-Kurapati
Contributor Author

@Sharpz7, please review the Armada V2 system diagram as well:

image

@Pradeep-Kurapati
Contributor Author

@Sharpz7 kindly review the changes and provide feedback. Thank you.

@Sharpz7
Contributor

Sharpz7 commented Jul 31, 2023

V2 is looking good now!

V1 needs this change: https://drive.google.com/uc?id=1cBMG1pR6c5KUGAwRqvLA03tbmQMHZyEM. You need to add a second ingester and API for Lookout V1 and connect them as shown.

Also, now that it is mostly ready, give some documentation a go! Try building Armada yourself and experimenting, and please add the images directly into the PR as well.

@Pradeep-Kurapati
Contributor Author

@Sharpz7 Please review V1 diagram:

image

@Pradeep-Kurapati Pradeep-Kurapati changed the title Seperate python script for armada v1 system diagram Seperate python script for armada v1 and v2 system diagrams Jul 31, 2023
@Sharpz7
Contributor

Sharpz7 commented Jul 31, 2023

This looks good! Let's get the documentation written, including these diagrams, and then we can ask more people to review :))

You probably want to try and get armada running so you can see what is going on. Go to the developer docs to see how that happens :))

@Sharpz7 Sharpz7 requested review from kannon92 and Sharpz7 July 31, 2023 22:14
@Pradeep-Kurapati
Contributor Author

Roger that!

@Pradeep-Kurapati
Contributor Author

I updated relationships_diagram.md to include our new diagrams. Please suggest necessary reviews. Thanks!

@Pradeep-Kurapati
Contributor Author

@Sharpz7 @kannon92 Hi there! I just wanted to check in on the status of this pull request. Would appreciate any feedback or suggestions. Thank you!

@Sharpz7
Contributor

Sharpz7 commented Aug 15, 2023

Apologies for not getting back to you, will look at this today.

Contributor

@Sharpz7 Sharpz7 left a comment

I am happy for no more time to be spent on this. I think the diagrams do a good enough job.

@Sharpz7 Sharpz7 enabled auto-merge (squash) August 21, 2023 17:33
@Sharpz7 Sharpz7 merged commit 95faa44 into armadaproject:master Aug 21, 2023
18 checks passed
svc-gh-ghzonetrans-p pushed a commit that referenced this pull request Oct 23, 2023
* Update simulator

* Replace Output with C

* Typo

* Restore pkg proto

* Restore files

* Fixing simulator changes (#6)

* Fixing simulator changes

* Changed to less than or equal

Co-authored-by: Mustafa Ilyas <mustafai@uberit.net>

* Simulator Changes (#9)

* Add config and dependency injection to scheduler metrics (#2892)

* Replace metrics singleton with an injection pattern.

* fix

* add configuration structures to metrics

* add configuration

* rename elements

* Maker Pulsar ReceiverQueueSize Configurable (#2895)

* wip

* wip

* set receiverQueueSize to 100

* remove old PulsarReceiverQueueSize

* revert

* subscriptionin api

---------

Co-authored-by: Chris Martin <chris@cmartinit.co.uk>

* Add poll_interval (#2805)

* Add poll_interval

* Add poll_interval

* Added poll_interval

* update by running tox-e docs

---------

Co-authored-by: Kevin Hannon <kannon1992@gmail.com>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>

* Seperate python script for armada v1 and v2 system diagrams (#2758)

* Seperate python script for armada v1 system diagram

* removed generate.py so it can be replaced with two seperate files for Armada V1 and Armada V2

* Python script to generate Armada V2 system diagram

* generate_v1.py Update #1

* generate_v1.py Update Number:2

* generate.py runs generate_v1.py as well as generate_v2.py and it is consistent with our instructions as 'docs/design/diagrams/relationships'

* generate_v1.py Update No:3

* Armada V1 and Armada V2 diagrams

* updated relationships_diagram.md to include armada v1 and v2 diagrams
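
A wrapper of the kind described for generate.py could look roughly like the following. This is a minimal sketch and not the file merged in this PR: `run_generators` is a hypothetical helper, and it assumes both scripts sit in the same directory (for example docs/design/diagrams/relationships) and can be run directly with the Python interpreter.

```python
import subprocess
import sys
from pathlib import Path
from typing import Iterable, List


def run_generators(
    directory: Path,
    scripts: Iterable[str] = ("generate_v1.py", "generate_v2.py"),
) -> List[str]:
    """Run each diagram script in turn with the current interpreter.

    Returns the names of the scripts that ran. Raises
    subprocess.CalledProcessError if a script exits with a non-zero status.
    """
    # Resolve first so the script path stays valid after cwd changes in the child.
    directory = Path(directory).resolve()
    ran = []
    for name in scripts:
        script = directory / name
        subprocess.run([sys.executable, str(script)], cwd=str(directory), check=True)
        ran.append(name)
    return ran
```

Calling `run_generators(Path(__file__).parent)` from generate.py would keep a single entry point while the V1 and V2 diagrams stay in separate files.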

---------

Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>

* Add config to use autoupdater on tagged branches (#2905)

* #2904 add autoupdate config

* #2904 add label config and other options

* docs: create README.md for plugins directory (#2897)

* Create README.md for plugins directory

* Update README.md

* Update plugins/README.md

Co-authored-by: Kevin Hannon <kehannon@redhat.com>

* Update README.md

---------

Co-authored-by: Kevin Hannon <kehannon@redhat.com>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>

* Enables airflow operator level retry. (#2894)

* Update docker stuff for latest airflow 2.7.0

* Use AirflowException instead of AirflowFailException to allow for retries

* Remove codecov workflows (#2902)

* Upgrade Pulsar Client to v0.11 (#2896)

* update

* update pulsar client

* Fix bug causing server spinning

* Abstract out the retry until success logic for testing (#2901)

* Respond to review

---------

Co-authored-by: Chris Martin <chris@cmartinit.co.uk>
Co-authored-by: Daniel Rastelli <rastellidani@gmail.com>

* Sync quickstart/index.md with gh-pages/quickstart.md (#2891)

* Log Call Site (#2909)

* allow logger to report caller

* allow logger to report caller

* lint

---------

Co-authored-by: Chris Martin <chris@cmartinit.co.uk>

* Add cleaner test output for mage with os/exec.Command (#2907)

* feat: Update Semver from version 6.3.0 to 6.3.1 (#2686)

Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>

* fix: upgrade @typescript-eslint/parser from 5.52.0 to 5.61.0 (#2743)

Snyk has created this PR to upgrade @typescript-eslint/parser from 5.52.0 to 5.61.0.

See this package in npm:


See this project in Snyk:
https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr

Co-authored-by: snyk-bot <snyk-bot@snyk.io>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>
Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>

* fix: upgrade @types/react from 16.14.32 to 16.14.43 (#2747)

Snyk has created this PR to upgrade @types/react from 16.14.32 to 16.14.43.

See this package in npm:


See this project in Snyk:
https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr

Co-authored-by: snyk-bot <snyk-bot@snyk.io>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>
Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>

* Bump github.com/go-openapi/jsonreference from 0.20.0 to 0.20.2 (#2316)

Bumps [github.com/go-openapi/jsonreference](https://github.com/go-openapi/jsonreference) from 0.20.0 to 0.20.2.
- [Release notes](https://github.com/go-openapi/jsonreference/releases)
- [Commits](go-openapi/jsonreference@v0.20.0...v0.20.2)

---
updated-dependencies:
- dependency-name: github.com/go-openapi/jsonreference
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>
Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>

* Order leased jobs by serial (#2912)

This will ensure the job leased first gets sent to the cluster first

Currently we just order by postgres default sorting, which often picks the most recently leased jobs, causing the first-leased jobs to get stuck
 - This only occurs when scheduling is faster than leasing
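
The effect of an explicit ordering can be shown with a small query. This is a sketch only: it uses SQLite from the Python standard library as a stand-in for Postgres, and the `leased_runs` table, its columns, and both helper functions are hypothetical names chosen for the example.

```python
import sqlite3
from typing import List


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory table whose rows were inserted out of lease order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE leased_runs (job_id TEXT PRIMARY KEY, serial INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO leased_runs VALUES (?, ?)",
        [("job-c", 3), ("job-a", 1), ("job-b", 2)],
    )
    return conn


def leased_jobs_in_order(conn: sqlite3.Connection) -> List[str]:
    # Without ORDER BY the database may return rows in any order;
    # ordering by the serial column puts the earliest lease first.
    rows = conn.execute("SELECT job_id FROM leased_runs ORDER BY serial ASC")
    return [job_id for (job_id,) in rows]
```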

* Bump webpack from 5.75.0 to 5.77.0 in /internal/lookout/ui (#2302)

Bumps [webpack](https://github.com/webpack/webpack) from 5.75.0 to 5.77.0.
- [Release notes](https://github.com/webpack/webpack/releases)
- [Commits](webpack/webpack@v5.75.0...v5.77.0)

---
updated-dependencies:
- dependency-name: webpack
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>
Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>

* Bump word-wrap from 1.2.3 to 1.2.5 in /internal/lookout/ui (#2806)

Bumps [word-wrap](https://github.com/jonschlinkert/word-wrap) from 1.2.3 to 1.2.5.
- [Release notes](https://github.com/jonschlinkert/word-wrap/releases)
- [Commits](jonschlinkert/word-wrap@1.2.3...1.2.5)

---
updated-dependencies:
- dependency-name: word-wrap
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>
Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>

* resolve flaky (#2914)

Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>

* fix: upgrade @typescript-eslint/eslint-plugin from 5.52.0 to 5.61.0 (#2744)

Snyk has created this PR to upgrade @typescript-eslint/eslint-plugin from 5.52.0 to 5.61.0.

See this package in npm:


See this project in Snyk:
https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr

Co-authored-by: snyk-bot <snyk-bot@snyk.io>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>
Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>

* fix: upgrade react-router-dom from 6.9.0 to 6.14.1 (#2746)

Snyk has created this PR to upgrade react-router-dom from 6.9.0 to 6.14.1.

See this package in npm:


See this project in Snyk:
https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr

Co-authored-by: snyk-bot <snyk-bot@snyk.io>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>
Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>

* Bump semver from 6.3.0 to 6.3.1 in /internal/lookout/ui (#2661)

Bumps [semver](https://github.com/npm/node-semver) from 6.3.0 to 6.3.1.
- [Release notes](https://github.com/npm/node-semver/releases)
- [Changelog](https://github.com/npm/node-semver/blob/v6.3.1/CHANGELOG.md)
- [Commits](npm/node-semver@v6.3.0...v6.3.1)

---
updated-dependencies:
- dependency-name: semver
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>
Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>

* Run CodeQL once daily on a schedule (#2918)

* Helm chart update: executor  (#2917)

* Helm chart update: executor

At the moment the helm chart for the executor doesn't include priorityClass even though one is created in the chart. This means that the executor deployment is unable to set the priorityClass.

* Patch/dependencies (#2923)

* Bump github.com/go-openapi/strfmt from 0.21.3 to 0.21.7

Bumps [github.com/go-openapi/strfmt](https://github.com/go-openapi/strfmt) from 0.21.3 to 0.21.7.
- [Release notes](https://github.com/go-openapi/strfmt/releases)
- [Commits](go-openapi/strfmt@v0.21.3...v0.21.7)

---
updated-dependencies:
- dependency-name: github.com/go-openapi/strfmt
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump github.com/go-openapi/runtime from 0.24.2 to 0.26.0

Bumps [github.com/go-openapi/runtime](https://github.com/go-openapi/runtime) from 0.24.2 to 0.26.0.
- [Release notes](https://github.com/go-openapi/runtime/releases)
- [Commits](go-openapi/runtime@v0.24.2...v0.26.0)

---
updated-dependencies:
- dependency-name: github.com/go-openapi/runtime
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump github.com/goreleaser/nfpm/v2 from 2.25.1 to 2.29.0

Bumps [github.com/goreleaser/nfpm/v2](https://github.com/goreleaser/nfpm) from 2.25.1 to 2.29.0.
- [Release notes](https://github.com/goreleaser/nfpm/releases)
- [Changelog](https://github.com/goreleaser/nfpm/blob/main/.goreleaser.yml)
- [Commits](goreleaser/nfpm@v2.25.1...v2.29.0)

---
updated-dependencies:
- dependency-name: github.com/goreleaser/nfpm/v2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump github.com/go-playground/validator/v10 from 10.11.1 to 10.14.1

Bumps [github.com/go-playground/validator/v10](https://github.com/go-playground/validator) from 10.11.1 to 10.14.1.
- [Release notes](https://github.com/go-playground/validator/releases)
- [Commits](go-playground/validator@v10.11.1...v10.14.1)

---
updated-dependencies:
- dependency-name: github.com/go-playground/validator/v10
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump Grpc.Net.Client in /client/DotNet/ArmadaProject.Io.Client

Bumps [Grpc.Net.Client](https://github.com/grpc/grpc-dotnet) from 2.47.0 to 2.52.0.
- [Release notes](https://github.com/grpc/grpc-dotnet/releases)
- [Changelog](https://github.com/grpc/grpc-dotnet/blob/master/doc/release_process.md)
- [Commits](grpc/grpc-dotnet@v2.47.0...v2.52.0)

---
updated-dependencies:
- dependency-name: Grpc.Net.Client
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix: upgrade @mui/material from 5.10.17 to 5.13.6

Snyk has created this PR to upgrade @mui/material from 5.10.17 to 5.13.6.

See this package in npm:


See this project in Snyk:
https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr

* fix: upgrade prettier from 2.7.1 to 2.8.8

Snyk has created this PR to upgrade prettier from 2.7.1 to 2.8.8.

See this package in npm:


See this project in Snyk:
https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr

* fix: upgrade @mui/icons-material from 5.10.16 to 5.14.3

Snyk has created this PR to upgrade @mui/icons-material from 5.10.16 to 5.14.3.

See this package in npm:


See this project in Snyk:
https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr

* fix: upgrade eslint-plugin-import from 2.26.0 to 2.28.0

Snyk has created this PR to upgrade eslint-plugin-import from 2.26.0 to 2.28.0.

See this package in npm:


See this project in Snyk:
https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr

* fix: upgrade eslint-config-prettier from 8.5.0 to 8.10.0

Snyk has created this PR to upgrade eslint-config-prettier from 8.5.0 to 8.10.0.

See this package in npm:


See this project in Snyk:
https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr

* Trying to update klog

* go mod fix

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: snyk-bot <snyk-bot@snyk.io>
Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>

* Fix bug causing GetJobSetEvents to get stuck (#2903)

* Add error message of final job run to JobFailedMessage

When we hit the maximum retry limit, the JobFailedMessage just says something along the lines of
"Job has been retried too many times, giving up"

Now we include the final run error in that message - to make it easier to work out the cause of retries

* Fix bug causing GetJobSetEvents to get stuck

GetJobSetEvents only increments its fromId variable on sending new messages

However not all redis events produce api events that will be sent downstream

The issue here is that if we get 500 redis events in a row that don't produce api events, then the fromId never gets updated
 - Meaning the watch gets stuck here

To fix this, ReadEvents now returns a lastMessageId. So if there are no messages to process, the fromId should be updated using the lastMessageId
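
The cursor problem described above can be reproduced in a few lines. This is an illustrative Python sketch, not the Go code from the commit: `read_events` and `watch` are hypothetical, the stream is a plain list assumed to be sorted by message ID, and an event whose payload is `None` stands for a stored event that produces nothing to send downstream.

```python
from typing import List, Optional, Tuple

# (message_id, api_event); api_event is None when nothing is sent downstream.
StoredEvent = Tuple[int, Optional[str]]


def read_events(
    stream: List[StoredEvent], from_id: int, limit: int
) -> Tuple[List[Tuple[int, str]], Optional[int]]:
    """Return (api_events, last_message_id) for up to `limit` events after from_id."""
    batch = [event for event in stream if event[0] > from_id][:limit]
    api_events = [(mid, ev) for mid, ev in batch if ev is not None]
    last_message_id = batch[-1][0] if batch else None
    return api_events, last_message_id


def watch(
    stream: List[StoredEvent], from_id: int = 0, limit: int = 500, max_polls: int = 10
) -> Tuple[List[str], int]:
    """Poll the stream and return (sent_events, final_cursor)."""
    sent: List[str] = []
    for _ in range(max_polls):
        api_events, last_message_id = read_events(stream, from_id, limit)
        if last_message_id is None:
            break  # nothing new in the stream
        sent.extend(ev for _, ev in api_events)
        # Advance the cursor even when the batch yielded no API events.
        # Advancing only after sending a message would re-read the same batch.
        from_id = last_message_id
    return sent, from_id
```

With 500 silent events followed by one real event, this loop reaches the real event on its second poll; moving the cursor only when something was sent would never get past the first batch.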

* Formatting

* Bump @adobe/css-tools from 4.0.1 to 4.3.1 in /internal/lookout/ui (#2931)

Bumps [@adobe/css-tools](https://github.com/adobe/css-tools) from 4.0.1 to 4.3.1.
- [Changelog](https://github.com/adobe/css-tools/blob/main/History.md)
- [Commits](https://github.com/adobe/css-tools/commits)

---
updated-dependencies:
- dependency-name: "@adobe/css-tools"
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Improved etcd protection (#2925)

* Initial commit

* Delete unused code

* Export metrics collection delay metrics

* Add mutex to InMemoryJobRepository

* Add tests

* Lint

* Update internal/executor/configuration/types.go

* Lint

---------

Co-authored-by: JamesMurkin <jamesmurkin@hotmail.com>

* Stop executor requesting more jobs when it still has leased jobs (#2932)

* Stop executor requesting more jobs when it still has leased jobs

Currently we "queue" jobs to be submitted on the executor, where they sit in the leased state until they are submitted to kubernetes

However this causes 2 issues with our current setup:
 - It prevents back-pressure from working well on the scheduler side, as it sees all these "Leased" jobs as active and so just keeps scheduling more
 - In the case where we are slowing submission due to etcd going over its limit, we "queue" lots of jobs, and as soon as etcd goes under its limit we hit it with potentially thousands of jobs

This flow needs further work and thought - however for now this is the minimal fix to prevent bad behaviour
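
One way to picture the minimal fix is as a cap on how many jobs an executor asks for. The function below is a sketch under assumptions, not the logic from the commit: `jobs_to_request` and its parameters are hypothetical, with `max_leased_jobs` loosely modelled on the `maxLeasedJobs` setting named later in this commit list.

```python
def jobs_to_request(free_slots: int, leased_not_submitted: int, max_leased_jobs: int) -> int:
    """How many new jobs an executor should ask the scheduler for.

    free_slots: jobs the cluster could start right now.
    leased_not_submitted: jobs already leased but not yet submitted to Kubernetes.
    max_leased_jobs: upper bound on jobs allowed to wait in the leased state.
    """
    if leased_not_submitted >= max_leased_jobs:
        # Still working through earlier leases: request nothing so the
        # scheduler sees the back-pressure instead of piling up more work.
        return 0
    return max(0, min(free_slots, max_leased_jobs - leased_not_submitted))
```

Capping the request also bounds the burst that reaches etcd once submission resumes, since at most `max_leased_jobs` jobs can be waiting at any time.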

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>

* WIP

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>

* Fix scheduler side tests

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>

* Implement number of requested jobs on executor side

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>

* Remove unused config

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>

* Fixing panic on startup when etcd health monitor not registered

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>

* Enhance logging

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>

* Set more sensible default for maxLeasedJobs

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>

---------

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>

* Fix race in etcd protections (#2937)

* Initial commit

* Fix MultiHealthMonitor race

* Fix etcd health metric naming conflict (#2939)

* Fix metric naming conflict

* Fix metric names

* Fix metrix prefix

* Fix label

* Bump golang.org/x/sync from 0.1.0 to 0.3.0 (#2946)

Bumps [golang.org/x/sync](https://github.com/golang/sync) from 0.1.0 to 0.3.0.
- [Commits](golang/sync@v0.1.0...v0.3.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sync
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add more scheduler metrics (#2906)

* Add jobs considered and refactor to counters

* Add fair share metrics

* Add reset for gauge metrics

* format

* cycle imports

* modify cycle return struct

* verbose logging

---------

Co-authored-by: Albin Severinson <albin@severinson.org>

* Update config.yaml (#2953)

* Remove gang job cardinality submit check. Add placeholder for min gang size

* Add msumner91 and mustafai to magic list of trusted people (#2956)

* Add msumner91 to magic list of trusted people

* Update .mergify.yml

* Airflow: always set credentials from args in channel ctor (#2952)

In the GrpcChannelArguments constructor, always set the
credentials_callback_args member from what is given. Add a test to
verify serialization round-tripping is complete, and a __eq__
implementation for GrpcChannelArguments.

Signed-off-by: Rich Scott <richscott@sent.com>
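
The round-trip test and `__eq__` mentioned in the Airflow commit above follow a common pattern, sketched here. `ChannelArgs` is a hypothetical stand-in and not the operator's real `GrpcChannelArguments` class; its fields and its `serialize`/`deserialize` methods are assumptions made for the example.

```python
import json
from typing import Any, Dict, List, Optional


class ChannelArgs:
    """Hypothetical stand-in used to illustrate serialization round-tripping."""

    def __init__(self, target: str, credentials_callback_args: Optional[List[Any]] = None):
        self.target = target
        # Always assign the member from the argument, even when it is None.
        self.credentials_callback_args = (
            list(credentials_callback_args) if credentials_callback_args is not None else None
        )

    def serialize(self) -> Dict[str, Any]:
        return {
            "target": self.target,
            "credentials_callback_args": self.credentials_callback_args,
        }

    @classmethod
    def deserialize(cls, data: Dict[str, Any]) -> "ChannelArgs":
        return cls(data["target"], data["credentials_callback_args"])

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, ChannelArgs):
            return NotImplemented
        return self.serialize() == other.serialize()


def round_trip(args: ChannelArgs) -> ChannelArgs:
    """Serialize through JSON text and back, as a persistence layer might."""
    return ChannelArgs.deserialize(json.loads(json.dumps(args.serialize())))
```

Defining `__eq__` is what lets a test assert `round_trip(args) == args` directly instead of comparing fields one by one.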

* Removed Makefile from repo (#2915)

Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>

* Add per-queue scheduling rate-limiting (#2938)

* Initial commit

* Add rate limiters

* go mod tidy

* Updates

* Add tests

* Update default config

* Update default scheduler config

* Whitespace

* Cleanup

* Docstring improvements

* Remove limiter nil checks

* Add Cardinality() function on gctx

* Fix test

* Fix test

* Add note about signed commits to Contributor documentation (#2960)

* Add note about signed commits to Contributor documentation

Signed-off-by: Aviral Singh <itsaviral.2609@gmail.com>

* Add note about signed commits to Contributor documentation

---------

Signed-off-by: Aviral Singh <itsaviral.2609@gmail.com>

* ArmadaContext that includes a logger (#2934)

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* compilation!

* rename package

* more compilation

* rename to Context

* embed

* compilation

* compilation

* fix test

* remove old ctxloggers

* revert design doc

* revert developer doc

* formatting

* wip

* tests

* don't gen

* don't gen

* merged master

---------

Co-authored-by: Chris Martin <chris@cmartinit.co.uk>
Co-authored-by: Albin Severinson <albin@severinson.org>

* Bump armada airflow operator to version 0.5.4 (#2961)

* Bump armada airflow operator to version 0.5.4

Signed-off-by: Rich Scott <richscott@sent.com>

* Regenerate Airflow Operator Markdown doc.

Signed-off-by: Rich Scott <richscott@sent.com>

* Fix regenerated Airflow doc error.

Signed-off-by: Rich Scott <richscott@sent.com>

* Pin versions of all modules, especially around docs generation.

Signed-off-by: Rich Scott <richscott@sent.com>

* Regenerate Airflow docs using Python 3.10

Signed-off-by: Rich Scott <richscott@sent.com>

---------

Signed-off-by: Rich Scott <richscott@sent.com>

* Simulator Changes

Made a number of changes to the simulator and simulator tests, most notably:
 - Fixed implementation of minSubmitTime setting for workload
   specifications
 - Added tests for SchedulingConfigsFromPattern,
   ClusterSpecsFromPattern, WorkloadFromPattern
 - Added sample workloads, clusters and scheduling configs
 - Added tests which simulate per-pool and per-executorGroup scheduling
 - Implemented further metrics for use in simulator tests, such as a
   cluster's aggregate resources, number of preemptions and schedules
   for a given test run
 - Added optimisation to speed up simulator, whereby the scheduler skips
   the current schedule event if no eventSequences have been received
   since the previous schedule.

* Simplified TestClusterSpecsFromPattern and TestWorkloadFromPattern tests

* Removed unused test

* Fixed malformed yaml

* Improved metrics for simulations. Improved simulator tests with errorgroups.

* Removed all simulator test data except basic data necessary for testing

* Implementing CLI

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: Rich Scott <richscott@sent.com>
Signed-off-by: Aviral Singh <itsaviral.2609@gmail.com>
Co-authored-by: Daniel Rastelli <rastellidani@gmail.com>
Co-authored-by: Chris Martin <council_tax@hotmail.com>
Co-authored-by: Chris Martin <chris@cmartinit.co.uk>
Co-authored-by: Sarthak Negi <122533767+sarthaksarthak9@users.noreply.github.com>
Co-authored-by: Kevin Hannon <kannon1992@gmail.com>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>
Co-authored-by: Pradeep Kurapati <113408145+Pradeep-Kurapati@users.noreply.github.com>
Co-authored-by: Dave Gantenbein <dave@gr-oss.io>
Co-authored-by: Shivang Shandilya <101946115+ShivangShandilya@users.noreply.github.com>
Co-authored-by: Kevin Hannon <kehannon@redhat.com>
Co-authored-by: Clif Houck <me@clifhouck.com>
Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>
Co-authored-by: Kanu Mike Chibundu <michotall95@gmail.com>
Co-authored-by: snyk-bot <snyk-bot@snyk.io>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: JamesMurkin <jamesmurkin@hotmail.com>
Co-authored-by: owenthomas17 <owen@owen-thomas.co.uk>
Co-authored-by: Albin Severinson <albin@severinson.org>
Co-authored-by: Mark Sumner <m.sumner91@hotmail.co.uk>
Co-authored-by: Rich Scott <rich@gr-oss.io>
Co-authored-by: MeenuyD <116630390+MeenuyD@users.noreply.github.com>
Co-authored-by: Aviral Singh <itsaviral.2609@gmail.com>
Co-authored-by: Mustafa Ilyas <mustafai@uberit.net>

* Adding verbose flag to simulator CLI, changing logging context in simulator

* Improved simulator CLI output, removed redundant features, implemented parallel simulations by addressing mutability of structures inputted into the simulator

* Removed unknown logging library

* Changing threadSafeLogger Info call to Print. Adding separation back between simulation results

* Implemented stochastic runtime for jobs using a shifted exponential distribution (#13)

* Implemented stochastic runtime for jobs using a shifted exponential distribution

* Implemented min submit time from dependency completion (#14)

Co-authored-by: Mustafa Ilyas <mustafai@uberit.net>

* Fixed tests

* Fixed implementation of shifted exponential distribution

* Using FP unrounded parameters to sample from distribution

* Modified stochastic runtime definition

* Adding logging to simulator
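
The shifted exponential runtime mentioned in the commits above can be sampled as a fixed minimum plus an exponentially distributed tail. This is a minimal Python sketch, not the simulator's Go implementation: `sample_runtime` and its parameter names are hypothetical, and the parameters are kept as floats so no rounding happens before sampling.

```python
import random


def sample_runtime(minimum: float, tail_mean: float, rng: random.Random) -> float:
    """Sample a runtime of `minimum` plus an exponential tail with mean `tail_mean`."""
    if tail_mean <= 0:
        return minimum  # no stochastic part: the runtime is deterministic
    # random.expovariate takes the rate (1 / mean) of the exponential distribution.
    return minimum + rng.expovariate(1.0 / tail_mean)
```

Every sample is at least `minimum`, and the expected runtime is `minimum + tail_mean`. Passing in a seeded `random.Random` keeps simulation runs reproducible.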

Co-authored-by: Mustafa Ilyas <mustafai@uberit.net>

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: Rich Scott <richscott@sent.com>
Signed-off-by: Aviral Singh <itsaviral.2609@gmail.com>
Co-authored-by: Albin Severinson <larsalbins@uberit.net>
Co-authored-by: Mustafa Ilyas <Mustafa.Ilyas@gresearch.co.uk>
Co-authored-by: Mustafa Ilyas <mustafai@uberit.net>
Co-authored-by: Daniel Rastelli <rastellidani@gmail.com>
Co-authored-by: Chris Martin <council_tax@hotmail.com>
Co-authored-by: Chris Martin <chris@cmartinit.co.uk>
Co-authored-by: Sarthak Negi <122533767+sarthaksarthak9@users.noreply.github.com>
Co-authored-by: Kevin Hannon <kannon1992@gmail.com>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>
Co-authored-by: Pradeep Kurapati <113408145+Pradeep-Kurapati@users.noreply.github.com>
Co-authored-by: Dave Gantenbein <dave@gr-oss.io>
Co-authored-by: Shivang Shandilya <101946115+ShivangShandilya@users.noreply.github.com>
Co-authored-by: Kevin Hannon <kehannon@redhat.com>
Co-authored-by: Clif Houck <me@clifhouck.com>
Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>
Co-authored-by: Kanu Mike Chibundu <michotall95@gmail.com>
Co-authored-by: snyk-bot <snyk-bot@snyk.io>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: JamesMurkin <jamesmurkin@hotmail.com>
Co-authored-by: owenthomas17 <owen@owen-thomas.co.uk>
Co-authored-by: Albin Severinson <albin@severinson.org>
Co-authored-by: Mark Sumner <m.sumner91@hotmail.co.uk>
Co-authored-by: Rich Scott <rich@gr-oss.io>
Co-authored-by: MeenuyD <116630390+MeenuyD@users.noreply.github.com>
Co-authored-by: Aviral Singh <itsaviral.2609@gmail.com>
severinson added a commit that referenced this pull request Oct 27, 2023
* Sync out testsuite changes (#19)

* Update simulator

* Replace Output with C

* Typo

* Restore pkg proto

* Restore files

* Fixing simulator changes (#6)

* Fixing simulator changes

* Changed to less than or equal

Co-authored-by: Mustafa Ilyas <mustafai@uberit.net>

* Simulator Changes (#9)

* Add config and dependency injection to scheduler metrics (#2892)

* Replace metrics singleton with an injection pattern.

* fix

* add configuration structures to metrics

* add configuration

* rename elements

* Maker Pulsar ReceiverQueueSize Configurable (#2895)

* wip

* wip

* set receiverQueueSize to 100

* remove old PulsarReceiverQueueSize

* revert

* subscriptionin api

---------

Co-authored-by: Chris Martin <chris@cmartinit.co.uk>

* Add poll_interval (#2805)

* Add poll_interval

* Add poll_interval

* Added poll_interval

* update by running tox -e docs

---------

Co-authored-by: Kevin Hannon <kannon1992@gmail.com>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>

* Seperate python script for armada v1 and v2 system diagrams (#2758)

* Separate python script for armada v1 system diagram

* removed generate.py so it can be replaced with two separate files for Armada V1 and Armada V2

* Python script to generate Armada V2 system diagram

* generate_v1.py Update #1

* generate_v1.py Update Number:2

* generate.py now runs both generate_v1.py and generate_v2.py, which keeps it consistent with our instructions in 'docs/design/diagrams/relationships'

* generate_v1.py Update No:3

* Armada V1 and Armada V2 diagrams

* updated relationships_diagram.md to include armada v1 and v2 diagrams

---------

Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>

* Add config to use autoupdater on tagged branches (#2905)

* #2904 add autoupdate config

* #2904 add label config and other options

* docs: create README.md for plugins directory (#2897)

* Create README.md for plugins directory

* Update README.md

* Update plugins/README.md

Co-authored-by: Kevin Hannon <kehannon@redhat.com>

* Update README.md

---------

Co-authored-by: Kevin Hannon <kehannon@redhat.com>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>

* Enables airflow operator level retry. (#2894)

* Update docker stuff for latest airflow 2.7.0

* Use AirflowException instead of AirflowFailException to allow for retries

* Remove codecov workflows (#2902)

* Upgrade Pulsar Client to v0.11 (#2896)

* update

* update pulsar client

* Fix bug causing server spinning

* Abstract out the retry until success logic for testing (#2901)

* Respond to review

---------

Co-authored-by: Chris Martin <chris@cmartinit.co.uk>
Co-authored-by: Daniel Rastelli <rastellidani@gmail.com>

* Sync quickstart/index.md with gh-pages/quickstart.md (#2891)

* Log Call Site (#2909)

* allow logger to report caller

* allow logger to report caller

* lint

---------

Co-authored-by: Chris Martin <chris@cmartinit.co.uk>

* Add cleaner test output for mage with os/exec.Command (#2907)

* feat: Update Semver from version 6.3.0 to 6.3.1 (#2686)

Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>

* fix: upgrade @typescript-eslint/parser from 5.52.0 to 5.61.0 (#2743)

Snyk has created this PR to upgrade @typescript-eslint/parser from 5.52.0 to 5.61.0.

See this package in npm:


See this project in Snyk:
https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr

Co-authored-by: snyk-bot <snyk-bot@snyk.io>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>
Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>

* fix: upgrade @types/react from 16.14.32 to 16.14.43 (#2747)

Snyk has created this PR to upgrade @types/react from 16.14.32 to 16.14.43.

See this package in npm:


See this project in Snyk:
https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr

Co-authored-by: snyk-bot <snyk-bot@snyk.io>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>
Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>

* Bump github.com/go-openapi/jsonreference from 0.20.0 to 0.20.2 (#2316)

Bumps [github.com/go-openapi/jsonreference](https://github.com/go-openapi/jsonreference) from 0.20.0 to 0.20.2.
- [Release notes](https://github.com/go-openapi/jsonreference/releases)
- [Commits](go-openapi/jsonreference@v0.20.0...v0.20.2)

---
updated-dependencies:
- dependency-name: github.com/go-openapi/jsonreference
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>
Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>

* Order leased jobs by serial (#2912)

This ensures that the job leased first is sent to the cluster first.

Currently we order by Postgres's default sorting, which often picks the most recently leased jobs, causing the first-leased jobs to get stuck
 - This only occurs when scheduling is faster than leasing
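
The ordering change above can be illustrated with a minimal Python sketch. It is not the Armada code (which is Go and SQL); `LeasedRun`, `next_runs_to_send`, and the idea that `serial` grows with lease order are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeasedRun:
    job_id: str
    # Hypothetical: a counter that increases each time a lease is recorded.
    serial: int


def next_runs_to_send(runs, limit):
    """Return up to `limit` leased runs, oldest lease (lowest serial) first."""
    return sorted(runs, key=lambda run: run.serial)[:limit]


# Rows arrive in arbitrary order, as with an unordered database query.
runs = [LeasedRun("job-c", 3), LeasedRun("job-a", 1), LeasedRun("job-b", 2)]
ordered = [run.job_id for run in next_runs_to_send(runs, limit=2)]
```

With an explicit sort key, the earliest lease is always picked first, regardless of the order in which the rows come back.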

* Bump webpack from 5.75.0 to 5.77.0 in /internal/lookout/ui (#2302)

Bumps [webpack](https://github.com/webpack/webpack) from 5.75.0 to 5.77.0.
- [Release notes](https://github.com/webpack/webpack/releases)
- [Commits](webpack/webpack@v5.75.0...v5.77.0)

---
updated-dependencies:
- dependency-name: webpack
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>
Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>

* Bump word-wrap from 1.2.3 to 1.2.5 in /internal/lookout/ui (#2806)

Bumps [word-wrap](https://github.com/jonschlinkert/word-wrap) from 1.2.3 to 1.2.5.
- [Release notes](https://github.com/jonschlinkert/word-wrap/releases)
- [Commits](jonschlinkert/word-wrap@1.2.3...1.2.5)

---
updated-dependencies:
- dependency-name: word-wrap
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>
Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>

* resolve flaky (#2914)

Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>

* fix: upgrade @typescript-eslint/eslint-plugin from 5.52.0 to 5.61.0 (#2744)

Snyk has created this PR to upgrade @typescript-eslint/eslint-plugin from 5.52.0 to 5.61.0.

See this package in npm:


See this project in Snyk:
https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr

Co-authored-by: snyk-bot <snyk-bot@snyk.io>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>
Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>

* fix: upgrade react-router-dom from 6.9.0 to 6.14.1 (#2746)

Snyk has created this PR to upgrade react-router-dom from 6.9.0 to 6.14.1.

See this package in npm:


See this project in Snyk:
https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr

Co-authored-by: snyk-bot <snyk-bot@snyk.io>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>
Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>

* Bump semver from 6.3.0 to 6.3.1 in /internal/lookout/ui (#2661)

Bumps [semver](https://github.com/npm/node-semver) from 6.3.0 to 6.3.1.
- [Release notes](https://github.com/npm/node-semver/releases)
- [Changelog](https://github.com/npm/node-semver/blob/v6.3.1/CHANGELOG.md)
- [Commits](npm/node-semver@v6.3.0...v6.3.1)

---
updated-dependencies:
- dependency-name: semver
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>
Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>

* Run CodeQL once daily on a schedule (#2918)

* Helm chart update: executor  (#2917)

* Helm chart update: executor

At the moment the helm chart for the executor doesn't include priorityClass even though one is created in the chart. This means that the executor deployment is unable to set the priorityClass.

* Patch/dependencies (#2923)

* Bump github.com/go-openapi/strfmt from 0.21.3 to 0.21.7

Bumps [github.com/go-openapi/strfmt](https://github.com/go-openapi/strfmt) from 0.21.3 to 0.21.7.
- [Release notes](https://github.com/go-openapi/strfmt/releases)
- [Commits](go-openapi/strfmt@v0.21.3...v0.21.7)

---
updated-dependencies:
- dependency-name: github.com/go-openapi/strfmt
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump github.com/go-openapi/runtime from 0.24.2 to 0.26.0

Bumps [github.com/go-openapi/runtime](https://github.com/go-openapi/runtime) from 0.24.2 to 0.26.0.
- [Release notes](https://github.com/go-openapi/runtime/releases)
- [Commits](go-openapi/runtime@v0.24.2...v0.26.0)

---
updated-dependencies:
- dependency-name: github.com/go-openapi/runtime
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump github.com/goreleaser/nfpm/v2 from 2.25.1 to 2.29.0

Bumps [github.com/goreleaser/nfpm/v2](https://github.com/goreleaser/nfpm) from 2.25.1 to 2.29.0.
- [Release notes](https://github.com/goreleaser/nfpm/releases)
- [Changelog](https://github.com/goreleaser/nfpm/blob/main/.goreleaser.yml)
- [Commits](goreleaser/nfpm@v2.25.1...v2.29.0)

---
updated-dependencies:
- dependency-name: github.com/goreleaser/nfpm/v2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump github.com/go-playground/validator/v10 from 10.11.1 to 10.14.1

Bumps [github.com/go-playground/validator/v10](https://github.com/go-playground/validator) from 10.11.1 to 10.14.1.
- [Release notes](https://github.com/go-playground/validator/releases)
- [Commits](go-playground/validator@v10.11.1...v10.14.1)

---
updated-dependencies:
- dependency-name: github.com/go-playground/validator/v10
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* Bump Grpc.Net.Client in /client/DotNet/ArmadaProject.Io.Client

Bumps [Grpc.Net.Client](https://github.com/grpc/grpc-dotnet) from 2.47.0 to 2.52.0.
- [Release notes](https://github.com/grpc/grpc-dotnet/releases)
- [Changelog](https://github.com/grpc/grpc-dotnet/blob/master/doc/release_process.md)
- [Commits](grpc/grpc-dotnet@v2.47.0...v2.52.0)

---
updated-dependencies:
- dependency-name: Grpc.Net.Client
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix: upgrade @mui/material from 5.10.17 to 5.13.6

Snyk has created this PR to upgrade @mui/material from 5.10.17 to 5.13.6.

See this package in npm:


See this project in Snyk:
https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr

* fix: upgrade prettier from 2.7.1 to 2.8.8

Snyk has created this PR to upgrade prettier from 2.7.1 to 2.8.8.

See this package in npm:


See this project in Snyk:
https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr

* fix: upgrade @mui/icons-material from 5.10.16 to 5.14.3

Snyk has created this PR to upgrade @mui/icons-material from 5.10.16 to 5.14.3.

See this package in npm:


See this project in Snyk:
https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr

* fix: upgrade eslint-plugin-import from 2.26.0 to 2.28.0

Snyk has created this PR to upgrade eslint-plugin-import from 2.26.0 to 2.28.0.

See this package in npm:


See this project in Snyk:
https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr

* fix: upgrade eslint-config-prettier from 8.5.0 to 8.10.0

Snyk has created this PR to upgrade eslint-config-prettier from 8.5.0 to 8.10.0.

See this package in npm:


See this project in Snyk:
https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr

* Trying to update klog

* go mod fix

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: snyk-bot <snyk-bot@snyk.io>
Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>

* Fix bug causing GetJobSetEvents to get stuck (#2903)

* Add error message of final job run to JobFailedMessage

When we hit the maximum retry limit, the JobFailedMessage just says something along the lines of
"Job has been retried too many times, giving up"

Now we include the final run error in that message - to make it easier to work out the cause of retries

* Fix bug causing GetJobSetEvents to get stuck

GetJobSetEvents only increments its fromId variable when it sends new messages

However, not all redis events produce api events that will be sent downstream

The issue is that if we get 500 redis events in a row that don't produce api events, the fromId never gets updated
 - Meaning the watcher gets stuck at that point

To fix this, ReadEvents now returns a lastMessageId. If there are no messages to process, the fromId is updated using the lastMessageId

* Formatting
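
The GetJobSetEvents fix described above can be sketched roughly as follows. This is a simplified Python model, not the Go implementation: `read_events` and `watch` are hypothetical names, message ids are plain integers rather than Redis stream ids, and the stream is an in-memory list.

```python
def read_events(stream, from_id, limit):
    """Hypothetical stand-in for ReadEvents.

    `stream` is a list of (message_id, api_event_or_None) pairs sorted by id.
    Returns (messages, last_message_id), where last_message_id is the id of
    the last entry read even if it produced no API event, or None when
    nothing newer than from_id exists.
    """
    batch = [(mid, ev) for mid, ev in stream if mid > from_id][:limit]
    messages = [(mid, ev) for mid, ev in batch if ev is not None]
    last_message_id = batch[-1][0] if batch else None
    return messages, last_message_id


def watch(stream, from_id, limit):
    """Drain the stream, returning the API events sent and the final from_id."""
    sent = []
    while True:
        messages, last_message_id = read_events(stream, from_id, limit)
        if last_message_id is None:
            return sent, from_id  # caught up with the stream
        for mid, ev in messages:
            sent.append(ev)
            from_id = mid
        if not messages:
            # The fix: advance past a batch that produced no API events.
            from_id = last_message_id


# Three entries with no API event precede the only real one.
stream = [(1, None), (2, None), (3, None), (4, "e4"), (5, None)]
sent, final_id = watch(stream, from_id=0, limit=2)
```

Without the `if not messages` branch, the first batch (ids 1 and 2) would be re-read forever, which is the stuck behaviour the commit message describes.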

* Bump @adobe/css-tools from 4.0.1 to 4.3.1 in /internal/lookout/ui (#2931)

Bumps [@adobe/css-tools](https://github.com/adobe/css-tools) from 4.0.1 to 4.3.1.
- [Changelog](https://github.com/adobe/css-tools/blob/main/History.md)
- [Commits](https://github.com/adobe/css-tools/commits)

---
updated-dependencies:
- dependency-name: "@adobe/css-tools"
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Improved etcd protection (#2925)

* Initial commit

* Delete unused code

* Export metrics collection delay metrics

* Add mutex to InMemoryJobRepository

* Add tests

* Lint

* Update internal/executor/configuration/types.go

* Lint

---------

Co-authored-by: JamesMurkin <jamesmurkin@hotmail.com>

* Stop executor requesting more jobs when it still has leased jobs (#2932)

* Stop executor requesting more jobs when it still has leased jobs

Currently we "queue" jobs to be submitted on the executor - these sit in the leased state until they are submitted to kubernetes

However this causes 2 issues with our current setup:
 - It prevents back-pressure from working well on the scheduler side, as it sees all these "Leased" jobs as active and so just keeps scheduling more
 - When we are slowing submission because etcd has gone over its limit, we "queue" lots of jobs, and as soon as etcd goes under its limit we hit it with potentially thousands of jobs

This flow needs further work and thought - however for now this is the minimal fix to prevent bad behaviour

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>

* WIP

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>

* Fix scheduler side tests

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>

* Implement number of requested jobs on executor side

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>

* Remove unused config

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>

* Fixing panic on startup when etcd health monitor not registered

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>

* Enhance logging

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>

* Set more sensible default for maxLeasedJobs

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>

---------

Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
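
One way to read the change above is that the executor caps how many new jobs it asks for by how many leased jobs it is still holding. The Python sketch below only illustrates that idea; `jobs_to_request` and its parameters are hypothetical, and the exact rule used by the executor may differ.

```python
def jobs_to_request(available_slots, leased_not_submitted, max_leased_jobs):
    """Number of new jobs a hypothetical executor asks for in one lease call.

    available_slots:      jobs the cluster could accept right now
    leased_not_submitted: jobs already leased but not yet sent to kubernetes
    max_leased_jobs:      cap on jobs allowed to sit in the leased state
    """
    headroom = max(0, max_leased_jobs - leased_not_submitted)
    return min(available_slots, headroom)
```

For example, with `max_leased_jobs=5` and five jobs still waiting to be submitted, the sketch requests nothing until some of the waiting jobs have gone out.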

* Fix race in etcd protections (#2937)

* Initial commit

* Fix MultiHealthMonitor race

* Fix etcd health metric naming conflict (#2939)

* Fix metric naming conflict

* Fix metric names

* Fix metric prefix

* Fix label

* Bump golang.org/x/sync from 0.1.0 to 0.3.0 (#2946)

Bumps [golang.org/x/sync](https://github.com/golang/sync) from 0.1.0 to 0.3.0.
- [Commits](golang/sync@v0.1.0...v0.3.0)

---
updated-dependencies:
- dependency-name: golang.org/x/sync
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add more scheduler metrics (#2906)

* Add jobs considered and refactor to counters

* Add fair share metrics

* Add reset for gauge metrics

* format

* cycle imports

* modify cycle return struct

* verbose logging

---------

Co-authored-by: Albin Severinson <albin@severinson.org>

* Update config.yaml (#2953)

* Remove gang job cardinality submit check. Add placeholder for min gang size

* Add msumner91 and mustafai to magic list of trusted people (#2956)

* Add msumner91 to magic list of trusted people

* Update .mergify.yml

* Airflow: always set credentials from args in channel ctor (#2952)

In the GrpcChannelArguments constructor, always set the
credentials_callback_args member from what is given. Add a test to
verify serialization round-tripping is complete, and a __eq__
implementation for GrpcChannelArguments.

Signed-off-by: Rich Scott <richscott@sent.com>
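
A serialization round-trip check of the kind mentioned above might look like the sketch below. `ChannelArgs` is a hypothetical stand-in written for illustration, not the operator's real `GrpcChannelArguments`, and its `serialize` shape is an assumption.

```python
class ChannelArgs:
    """Hypothetical stand-in, not the real GrpcChannelArguments."""

    def __init__(self, target, credentials_callback_args=None):
        self.target = target
        # Always store exactly what was given, including None.
        self.credentials_callback_args = credentials_callback_args

    def serialize(self):
        """Return a plain dict that can be passed back to the constructor."""
        return {
            "target": self.target,
            "credentials_callback_args": self.credentials_callback_args,
        }

    def __eq__(self, other):
        if not isinstance(other, ChannelArgs):
            return NotImplemented
        return self.serialize() == other.serialize()


original = ChannelArgs("localhost:50051", credentials_callback_args={"token": "abc"})
restored = ChannelArgs(**original.serialize())
```

Defining `__eq__` is what lets a test compare `restored` with `original` by value; without it, two distinct instances would compare unequal even when every field matches.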

* Removed Makefile from repo (#2915)

Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>

* Add per-queue scheduling rate-limiting (#2938)

* Initial commit

* Add rate limiters

* go mod tidy

* Updates

* Add tests

* Update default config

* Update default scheduler config

* Whitespace

* Cleanup

* Docstring improvements

* Remove limiter nil checks

* Add Cardinality() function on gctx

* Fix test

* Fix test

* Add note about signed commits to Contributor documentation (#2960)

* Add note about signed commits to Contributor documentation

Signed-off-by: Aviral Singh <itsaviral.2609@gmail.com>

* Add note about signed commits to Contributor documentation

---------

Signed-off-by: Aviral Singh <itsaviral.2609@gmail.com>

* ArmadaContext that includes a logger (#2934)

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* wip

* compilation!

* rename package

* more compilation

* rename to Context

* embed

* compilation

* compilation

* fix test

* remove old ctxloggers

* revert design doc

* revert developer doc

* formatting

* wip

* tests

* don't gen

* don't gen

* merged master

---------

Co-authored-by: Chris Martin <chris@cmartinit.co.uk>
Co-authored-by: Albin Severinson <albin@severinson.org>

* Bump armada airflow operator to version 0.5.4 (#2961)

* Bump armada airflow operator to version 0.5.4

Signed-off-by: Rich Scott <richscott@sent.com>

* Regenerate Airflow Operator Markdown doc.

Signed-off-by: Rich Scott <richscott@sent.com>

* Fix regenerated Airflow doc error.

Signed-off-by: Rich Scott <richscott@sent.com>

* Pin versions of all modules, especially around docs generation.

Signed-off-by: Rich Scott <richscott@sent.com>

* Regenerate Airflow docs using Python 3.10

Signed-off-by: Rich Scott <richscott@sent.com>

---------

Signed-off-by: Rich Scott <richscott@sent.com>

* Simulator Changes

Made a number of changes to the simulator and simulator tests, most notably:
 - Fixed implementation of minSubmitTime setting for workload
   specifications
 - Added tests for SchedulingConfigsFromPattern,
   ClusterSpecsFromPattern, WorkloadFromPattern
 - Added sample workloads, clusters and scheduling configs
 - Added tests which simulate per-pool and per-executorGroup scheduling
 - Implemented further metrics for use in simulator tests, such as a
   cluster's aggregate resources, number of preemptions and schedules
   for a given test run
 - Added optimisation to speed up simulator, whereby the scheduler skips
   the current schedule event if no eventSequences have been received
   since the previous schedule.
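
The skip optimisation in the last bullet can be sketched as below. This is a toy Python model, not the simulator's Go code: the timeline is just a list of event kinds, and letting the very first schedule event run is an assumption.

```python
def count_scheduling_rounds(timeline):
    """Count schedule events that actually run a scheduling round.

    `timeline` is a list of strings, either "schedule" or "event_sequence".
    A schedule event is skipped when no event sequence has arrived since
    the previous schedule event that ran.
    """
    rounds = 0
    new_sequences = True  # assumption: the first schedule event always runs
    for kind in timeline:
        if kind == "event_sequence":
            new_sequences = True
        elif kind == "schedule":
            if not new_sequences:
                continue  # nothing changed, so skip this round
            rounds += 1
            new_sequences = False
    return rounds


timeline = ["schedule", "schedule", "event_sequence", "schedule", "schedule"]
rounds = count_scheduling_rounds(timeline)
```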

* Simplified TestClusterSpecsFromPattern and TestWorkloadFromPattern tests

* Removed unused test

* Fixed malformed yaml

* Improved metrics for simulations. Improved simulator tests with errorgroups.

* Removed all simulator test data except basic data necessary for testing

* Implementing CLI

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: Rich Scott <richscott@sent.com>
Signed-off-by: Aviral Singh <itsaviral.2609@gmail.com>
Co-authored-by: Daniel Rastelli <rastellidani@gmail.com>
Co-authored-by: Chris Martin <council_tax@hotmail.com>
Co-authored-by: Chris Martin <chris@cmartinit.co.uk>
Co-authored-by: Sarthak Negi <122533767+sarthaksarthak9@users.noreply.github.com>
Co-authored-by: Kevin Hannon <kannon1992@gmail.com>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>
Co-authored-by: Pradeep Kurapati <113408145+Pradeep-Kurapati@users.noreply.github.com>
Co-authored-by: Dave Gantenbein <dave@gr-oss.io>
Co-authored-by: Shivang Shandilya <101946115+ShivangShandilya@users.noreply.github.com>
Co-authored-by: Kevin Hannon <kehannon@redhat.com>
Co-authored-by: Clif Houck <me@clifhouck.com>
Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>
Co-authored-by: Kanu Mike Chibundu <michotall95@gmail.com>
Co-authored-by: snyk-bot <snyk-bot@snyk.io>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: JamesMurkin <jamesmurkin@hotmail.com>
Co-authored-by: owenthomas17 <owen@owen-thomas.co.uk>
Co-authored-by: Albin Severinson <albin@severinson.org>
Co-authored-by: Mark Sumner <m.sumner91@hotmail.co.uk>
Co-authored-by: Rich Scott <rich@gr-oss.io>
Co-authored-by: MeenuyD <116630390+MeenuyD@users.noreply.github.com>
Co-authored-by: Aviral Singh <itsaviral.2609@gmail.com>
Co-authored-by: Mustafa Ilyas <mustafai@uberit.net>

* Adding verbose flag to simulator CLI, changing logging context in simulator

* Improved simulator CLI output, removed redundant features, implemented parallel simulations by addressing mutability of structures inputted into the simulator

* Removed unknown logging library

* Changing threadSafeLogger Info call to Print. Adding separation back between simulation results

* Implemented stochastic runtime for jobs using a shifted exponential distribution (#13)

* Implemented stochastic runtime for jobs using a shifted exponential distribution

* Implemented min submit time from dependency completion (#14)

Co-authored-by: Mustafa Ilyas <mustafai@uberit.net>

* Fixed tests

* Fixed implementation of shifted exponential distribution

* Using FP unrounded parameters to sample from distribution

* Modified stochastic runtime definition

* Adding logging to simulator

Co-authored-by: Mustafa Ilyas <mustafai@uberit.net>
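
For readers unfamiliar with the term, a shifted exponential runtime is a fixed minimum plus an exponentially distributed tail. The Python sketch below shows one way to sample it; `sample_runtime`, `minimum`, and `tail_mean` are hypothetical names, and the simulator's actual parameterisation may differ.

```python
import random


def sample_runtime(minimum, tail_mean, rng):
    """Sample minimum + Exp(mean=tail_mean), in the same units as `minimum`.

    Floating-point parameters are used as given, without rounding.
    A non-positive tail_mean yields a deterministic runtime.
    """
    if tail_mean <= 0:
        return minimum
    # random.Random.expovariate takes the rate, i.e. 1 / mean.
    return minimum + rng.expovariate(1.0 / tail_mean)


rng = random.Random(0)  # seeded so simulations are reproducible
samples = [sample_runtime(60.0, 30.0, rng) for _ in range(10000)]
mean_runtime = sum(samples) / len(samples)
```

Every sample is at least `minimum`, and the sample mean approaches `minimum + tail_mean` as the number of samples grows.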

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: Rich Scott <richscott@sent.com>
Signed-off-by: Aviral Singh <itsaviral.2609@gmail.com>
Co-authored-by: Albin Severinson <larsalbins@uberit.net>
Co-authored-by: Mustafa Ilyas <Mustafa.Ilyas@gresearch.co.uk>
Co-authored-by: Mustafa Ilyas <mustafai@uberit.net>
Co-authored-by: Daniel Rastelli <rastellidani@gmail.com>
Co-authored-by: Chris Martin <council_tax@hotmail.com>
Co-authored-by: Chris Martin <chris@cmartinit.co.uk>
Co-authored-by: Sarthak Negi <122533767+sarthaksarthak9@users.noreply.github.com>
Co-authored-by: Kevin Hannon <kannon1992@gmail.com>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>
Co-authored-by: Pradeep Kurapati <113408145+Pradeep-Kurapati@users.noreply.github.com>
Co-authored-by: Dave Gantenbein <dave@gr-oss.io>
Co-authored-by: Shivang Shandilya <101946115+ShivangShandilya@users.noreply.github.com>
Co-authored-by: Kevin Hannon <kehannon@redhat.com>
Co-authored-by: Clif Houck <me@clifhouck.com>
Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>
Co-authored-by: Kanu Mike Chibundu <michotall95@gmail.com>
Co-authored-by: snyk-bot <snyk-bot@snyk.io>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: JamesMurkin <jamesmurkin@hotmail.com>
Co-authored-by: owenthomas17 <owen@owen-thomas.co.uk>
Co-authored-by: Albin Severinson <albin@severinson.org>
Co-authored-by: Mark Sumner <m.sumner91@hotmail.co.uk>
Co-authored-by: Rich Scott <rich@gr-oss.io>
Co-authored-by: MeenuyD <116630390+MeenuyD@users.noreply.github.com>
Co-authored-by: Aviral Singh <itsaviral.2609@gmail.com>

* Add missing brace

* Lint

* Lint

* Lint

* Cleanup

* Testsuite improvements

* Lint

* Tidying

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>
Signed-off-by: Rich Scott <richscott@sent.com>
Signed-off-by: Aviral Singh <itsaviral.2609@gmail.com>
Co-authored-by: Albin Severinson <Albin.Severinson@gresearch.co.uk>
Co-authored-by: Albin Severinson <larsalbins@uberit.net>
Co-authored-by: Mustafa Ilyas <Mustafa.Ilyas@gresearch.co.uk>
Co-authored-by: Mustafa Ilyas <mustafai@uberit.net>
Co-authored-by: Daniel Rastelli <rastellidani@gmail.com>
Co-authored-by: Chris Martin <council_tax@hotmail.com>
Co-authored-by: Chris Martin <chris@cmartinit.co.uk>
Co-authored-by: Sarthak Negi <122533767+sarthaksarthak9@users.noreply.github.com>
Co-authored-by: Kevin Hannon <kannon1992@gmail.com>
Co-authored-by: Adam McArthur <46480158+Sharpz7@users.noreply.github.com>
Co-authored-by: Pradeep Kurapati <113408145+Pradeep-Kurapati@users.noreply.github.com>
Co-authored-by: Dave Gantenbein <dave@gr-oss.io>
Co-authored-by: Shivang Shandilya <101946115+ShivangShandilya@users.noreply.github.com>
Co-authored-by: Kevin Hannon <kehannon@redhat.com>
Co-authored-by: Clif Houck <me@clifhouck.com>
Co-authored-by: Mohamed Abdelfatah <39927413+Mo-Fatah@users.noreply.github.com>
Co-authored-by: Kanu Mike Chibundu <michotall95@gmail.com>
Co-authored-by: snyk-bot <snyk-bot@snyk.io>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: JamesMurkin <jamesmurkin@hotmail.com>
Co-authored-by: owenthomas17 <owen@owen-thomas.co.uk>
Co-authored-by: Mark Sumner <m.sumner91@hotmail.co.uk>
Co-authored-by: Rich Scott <rich@gr-oss.io>
Co-authored-by: MeenuyD <116630390+MeenuyD@users.noreply.github.com>
Co-authored-by: Aviral Singh <itsaviral.2609@gmail.com>
Development

Successfully merging this pull request may close these issues.

Armada V1 and Armada V2 diagrams
3 participants