[RI2 Ch03] Update Lab Requirements #2413

Merged
merged 4 commits into from Jun 19, 2021

Conversation

rihabbanday
Collaborator

@rihabbanday rihabbanday commented May 11, 2021

Fixes #2397, Fixes #2208

@rihabbanday rihabbanday added this to In progress in RI2 via automation May 11, 2021
@rihabbanday rihabbanday changed the title from "{RI2 Ch03] Update Lab Requirements" to "[RI2 Ch03] Update Lab Requirements" May 11, 2021
@trevgc
Collaborator

trevgc commented May 11, 2021

The purpose of specifying infrastructure requirements in chapter 2, i.e. "Requirements for Labs", is to agree on a "standard / common" environment that 1) supports community collaboration on development and testing of RI-2 and 2) allows anybody to set up their own environment to easily deploy RI-2 (i.e. the purpose is not to define requirements for a production environment). IMO we should try to make the requirements as flexible as possible for current and anticipated needs, but also very practical, to minimize cost and support overhead.

To this end, should we specify 1) a minimum configuration and 2) an ideal configuration? Are there different deployment options that would impact the infrastructure requirements, and should these be captured here? 3.2.2 Connectivity specifies 5 physical networks (from the original Pharos spec.) ... is this still a requirement for RA-2? Is the out-of-band hardware management network being used for RI-2? (I am not sure that lab users have access; it may only be used by lab admins.) Does RI-2 require physically separate Admin and Public networks for dev/testing?

@acmacm
Collaborator

acmacm commented May 12, 2021

Trevor, I would like to see a configuration that remains useful in the long term. I would also like to see the networking specs addressed: it's time to include 25G to 40G links to the switch, if this topic is in scope.

@michaelspedersen
Collaborator

Looking at RA2 2.2.1, these values (mem, storage) are the minimum per-pod requirements. For the node/server, the amount of resources must be quite a bit higher, both to support multiple pods per node and to cover the overhead of the OS and K8s deployment.

I am thinking something along the lines of:

  • 2x (min.) 20 core x86_64 CPUs, for a total of 40 (or more) cores (80+ threads with SMT)
  • 256+GB memory (DDR4)
  • 1+TB storage (not sure if this should be raised to 1.5TB minimum)
    • Should be "solid" storage (NVMe and SSD).
    • Additional bulk storage can be added as HDD if needed (I have not looked at details for storage requirements)
  • 25Gbps+ network solution
    • Ideally at least 2 ports per socket (NUMA)
    • Additional NICs/ports (10Gbps+) might be useful as well
  • (to be considered) Additional PCI devices (e.g. FPGA, GPU, QAT) based on the mentions of device plugins
    • These could potentially be added later on

I think the above also follows some of the previous comments, but any additional input is appreciated.
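
The proposed values can also be written down as a small check, which may help when comparing candidate servers against the list. This is only a rough sketch: the `ServerSpec` class, the `PROPOSED_MINIMUMS` table and the `check_against_proposal` function are hypothetical names invented for illustration, 1 TB is treated as 1000 GB, and SMT is assumed to give two threads per core. None of this comes from RA2 or from the RI2 tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServerSpec:
    """Hardware summary for one lab server (hypothetical structure)."""
    sockets: int
    cores_per_socket: int
    memory_gb: int
    solid_state_storage_gb: int
    fastest_nic_gbps: int
    smt_enabled: bool = True

    @property
    def total_cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def total_threads(self) -> int:
        # Assumption: SMT exposes 2 hardware threads per physical core.
        return self.total_cores * (2 if self.smt_enabled else 1)


# Thresholds copied from the proposal in this comment.
# Assumption: "1+TB" is interpreted as at least 1000 GB of NVMe/SSD storage.
PROPOSED_MINIMUMS = {
    "sockets": 2,
    "cores_per_socket": 20,
    "memory_gb": 256,
    "solid_state_storage_gb": 1000,
    "fastest_nic_gbps": 25,
}


def check_against_proposal(spec: ServerSpec) -> List[str]:
    """Return a list of shortfalls; an empty list means the proposal is met."""
    shortfalls = []
    for field, minimum in PROPOSED_MINIMUMS.items():
        actual = getattr(spec, field)
        if actual < minimum:
            shortfalls.append(f"{field}: have {actual}, proposal asks for {minimum}+")
    return shortfalls


if __name__ == "__main__":
    candidate = ServerSpec(
        sockets=2,
        cores_per_socket=20,
        memory_gb=256,
        solid_state_storage_gb=1000,
        fastest_nic_gbps=25,
    )
    print(candidate.total_cores, candidate.total_threads)
    print(check_against_proposal(candidate))
```

The optional items in the list (ports per NUMA node, extra 10Gbps+ NICs, PCI devices such as FPGA/GPU/QAT) are deliberately left out of the check, since they are marked as open points above.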

@rihabbanday
Collaborator Author

@trevgc @lylavoie @acmacm I agree with the points listed above. We discussed the referenced issue #2397 in last week's RI2 call and concluded that it would be best to derive these minimum requirements from the RA2 Basic Profile. But the minimum requirements in RA2 have clearly raised new questions, which I believe should be clarified, ideally in a separate issue. I can't recall whether there was a discussion last year when these minimum specifications, physical networks, etc. were originally documented in this RI2. I'll probably refactor the entire chapter and stick to the ideal specifications, something similar to what @michaelspedersen has proposed and what we are currently using in Kuberef. The min vs. ideal split proposed by @trevgc sounds good, but one question remains open: how do we define the minimum requirements?

@lylavoie
Collaborator

@rihabbanday I agree that the minimum requirements for the architecture belong in the RA2 documentation. But for the lab setup, I think the requirements would 1) likely be above the minimum (I'm thinking minimum is a true minimum), and 2) might need to address differing scenarios such as development or trials.

@acmacm For 25G networking, that would bring things in line with the requirements for the RI1 lab, which already lists 25G, so I support adding that to the documentation.

@jgu17
Collaborator

jgu17 commented May 12, 2021

I agree with what was suggested above: having the "desired" or recommended lab spec is more important than having the true minimum for the RI. In my mind, the dev/trial vs. reference lab distinction is about the number of servers, even though it might not hurt to state that the dev configuration can have less RAM/disk etc. But for consistency across the lab pods (such as the Intel lab), using the same default RAM/CPU/disk spec even for dev might be beneficial.

The other question I have, without wanting to digress from what Rihab wants to focus on here, is that the upcoming Airship RI-2 has its own RI-2 lab requirements, most of which are similar to the desired/ideal spec Michael raised. What are everyone's thoughts on whether we want to list lab requirements separately for Airship and Kuberef, or find a common spec?

@gkunz
Collaborator

gkunz commented May 12, 2021

Hi all,
to @jgu17's question: I would like to go for (wish for) a common specification valid for Kuberef and Airship. Separate requirement sets are just confusing and do not add value. Moreover, as mentioned above, the lab requirements should be a solid configuration above the minimum specs which should remain valid for some time. I assume we should be able to find such a configuration which suits both Kuberef and Airship.

@pgoyal01
Collaborator

I want to second a number of similar comments. Can we also take "workload" requirements into account, such as open-source 5G core, Clearwater IMS, etc.?

@michaelspedersen
Collaborator

> Want to second a number of similar comments. Can we also take "workload" requirements into account such as open-source 5G core, Clearwater IMS, etc.

I agree it might be good to take some of these into consideration. I would, however, assume that most or all of them should work on "generic" hardware configurations; otherwise (at least in my opinion) it defeats the purpose of cramming them into a K8s cluster to begin with.

@rihabbanday
Collaborator Author

As agreed in the RI2 meeting today - start with ideal requirements for RI2 that would be valid for both Kuberef and Airship.

@lylavoie
Collaborator

I think this looks good now.

Collaborator

@michaelspedersen michaelspedersen left a comment

As mentioned during the RI2 meeting, I think these changes look good. It might still be good to leave the PR open for comments/requests for a few more days before merging.

@rihabbanday rihabbanday requested a review from pgoyal01 June 2, 2021 15:13
@pgoyal01 pgoyal01 requested a review from a team June 2, 2021 15:14
@pgoyal01
Collaborator

@walterkozlowski This is ready for merge

@pgoyal01 pgoyal01 linked an issue Jun 15, 2021 that may be closed by this pull request
@rihabbanday
Collaborator Author

@walterkozlowski @wmk-admin could you please merge/approve this PR? It has 3 approvals, but I am not able to merge it

@wmk-admin wmk-admin merged commit 195aa6f into anuket-project:master Jun 19, 2021
RI2 automation moved this from In progress to Done Jun 19, 2021
Successfully merging this pull request may close these issues.

[RI2 Ch03] Update Lab requirements
[RI2 ch04] Add physical server specs
9 participants