This repository has been archived by the owner on Oct 24, 2023. It is now read-only.

feat: revive CoreOS support #892

Merged
merged 40 commits into from Apr 11, 2019

Contributor

@alexeldeib alexeldeib commented Mar 27, 2019

Reason for Change:

Fixes #584

Issue Fixed:

Requirements:

Notes:
I took the absolute dead simplest approach I could to get this working. Right now the masters and agents successfully form a cluster. Seems like there are some issues with rpc-statd and possibly volume mounts. Got some running pods though!

@acs-bot acs-bot added the size/L label Mar 27, 2019
@alexeldeib alexeldeib changed the title feat: revive CoreOS support [WIP] [WIP] feat: revive CoreOS support Mar 27, 2019
@alexeldeib alexeldeib changed the title [WIP] feat: revive CoreOS support [WIP] chore: revive CoreOS support Mar 28, 2019
@alexeldeib
Contributor Author

alexeldeib commented Mar 28, 2019

I got rpc-statd to work on the master nodes by adding Requires=rpc-statd.service to kubelet.service, but that doesn't seem to do the trick on worker nodes. I'm only looking at this because I thought it was the reason the flexvolumes don't work, but plausibly it's a red herring? I think rpc-statd will only start when you actually try to do a mount, which etcd does on the masters, so maybe I'm chasing my own tail.

I need a deeper understanding of flexvolume implementation/kubernetes storage and dependencies, probably. The cluster looks pretty okay otherwise. Needs a LOT of testing.

@alexeldeib
Contributor Author

Also, anything customized like the SGX drivers or GPU drivers will almost certainly not work with coreos as-is.

@acs-bot acs-bot added size/XL and removed size/L labels Mar 28, 2019
ExecStart=/usr/bin/dockerd -H fd:// --storage-driver=overlay2 --bip={{WrapAsParameter "dockerBridgeCidr"}}
{{end}}
Member

nit: let's keep the existing whitespace/indentation usage pattern

Contributor Author

to clarify, anything that's not whitespace sensitive should be fully left-justified?

Contributor Author

I wasn't sure if the stuff inside content should be indented with the text or misaligned to make it more pronounced

Member

Well, you're correct that the template layer tokens that open and close block/lexical scope (e.g., {{) aren't whitespace-sensitive, but for readability/maintainability we like to align them.

E.g.,

        {{if foo}}
bar baz...
        {{end}}

So in this case let's add 4 spaces to the {{end}} on L150 so it's left-aligned w/ the {{if .IsCoreOS}}.
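To illustrate the point about alignment, here is a minimal sketch using Go's standard text/template package. The `params` type, the shortened ExecStart line, and the two template constants are assumptions made for illustration; only the `.IsCoreOS` name comes from this thread. Note that text/template emits the spaces in front of an indented token as literal text, so the sketch trims the rendered output and compares only the directive that the condition gates.

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
	"text/template"
)

// params is a hypothetical stand-in for the template data; only the
// IsCoreOS field name is taken from the review thread.
type params struct {
	IsCoreOS bool
}

// flush keeps the control tokens at column zero; indented moves them
// four spaces in. The ExecStart line is a shortened, illustrative directive.
const (
	flush    = "{{if .IsCoreOS}}\nExecStart=/usr/bin/dockerd\n{{end}}"
	indented = "    {{if .IsCoreOS}}\nExecStart=/usr/bin/dockerd\n    {{end}}"
)

// render executes tmpl and trims surrounding whitespace, so that only the
// gated directive is compared (the indentation itself is literal output).
func render(tmpl string, isCoreOS bool) string {
	t := template.Must(template.New("unit").Parse(tmpl))
	var buf bytes.Buffer
	if err := t.Execute(&buf, params{IsCoreOS: isCoreOS}); err != nil {
		panic(err)
	}
	return strings.TrimSpace(buf.String())
}

func main() {
	for _, tmpl := range []string{flush, indented} {
		fmt.Printf("coreos=%q other=%q\n", render(tmpl, true), render(tmpl, false))
	}
}
```

Both variants select the same branch for the same input, which is why aligning the tokens is treated here as a readability choice. Whether the extra literal whitespace matters depends on where the rendered text ends up.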

Member

Looks like you got it :)

@jackfrancis
Member

Generally lgtm, let's get rid of those .bak files (also added a nit) and add an E2E test to protect the coreos scenario going forward.

Thanks for adding this!

@acs-bot acs-bot added size/L and removed size/XL labels Mar 29, 2019
@jackfrancis
Member

@alexeldeib I just made your life more difficult by merging this:

#870

That said, I think this PR is mergeable, so feel free to get rid of the [WIP] prefix, lemme know if rebase is tricky

@alexeldeib
Contributor Author

@jackfrancis I don't think flexvolumes work, and the default model includes blobfuse and keyvault flexvolumes. I was hoping to make those work but it's turning out to be a bit tricky.

Would you prefer I write tests for the basic functionality and disable the flexvolumes (+ possibly validation for addons/customization which will not work with coreos) or wait to merge this until I get at least flexvols working?

I'll take a look at the merge.

@alexeldeib
Contributor Author

alexeldeib commented Mar 30, 2019

I haven't looked deeply at the e2e tests yet, but I would at least like to manually validate some normal scenarios, e.g. deploy nginx-ingress and try to hit some running pods to make sure things work as expected. Looks like e2e supports something like that.

I'm not well versed in PVs/kube storage but if flexvols don't work there could be some wider storage issues (although, I didn't see any issues loading the other volume plugins and e.g. mounting secrets).

@jackfrancis
Member

@alexeldeib Because CoreOS is still definitely a feature "edge case", we don't have to require functional parity w/ Ubuntu before merging. Let's make sure the docs aren't obviously misleading in places, adding a note where it makes sense ("CoreOS support coming soon!" or something), and yeah, make it so that we don't deliver addon specs that just result in failed pods at cluster creation time.

In terms of E2E tests, I'd like to throw the whole kitchen sink at a CoreOS-backed cluster and see what fails. Again, it's an experimental scenario, let's merge it when it's minimally functional, and then iterate towards functional parity.

@alexeldeib
Contributor Author

/azp run pr-e2e

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@Azure Azure deleted a comment from azure-pipelines bot Mar 30, 2019
@alexeldeib
Contributor Author

/azp run pr-e2e

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@codecov

codecov bot commented Mar 30, 2019

Codecov Report

Merging #892 into master will decrease coverage by <.01%.
The diff coverage is 79.16%.

@@            Coverage Diff             @@
##           master     #892      +/-   ##
==========================================
- Coverage   74.51%   74.51%   -0.01%     
==========================================
  Files         131      131              
  Lines       17926    17963      +37     
==========================================
+ Hits        13358    13385      +27     
- Misses       3824     3829       +5     
- Partials      744      749       +5

@alexeldeib
Contributor Author

/azp run pr-e2e

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@alexeldeib
Contributor Author

/azp run pr-e2e

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@alexeldeib alexeldeib changed the title [WIP] chore: revive CoreOS support chore: revive CoreOS support Mar 30, 2019
@alexeldeib
Contributor Author

Still need to update model validation to prevent deploying busted pods and update tests on CSE script to reflect coreos.
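For reference, the kind of model validation meant here could look roughly like the sketch below. Everything in it is an assumption for illustration: the `addon` and `cluster` types are simplified stand-ins rather than the project's API model, and the addon names are hypothetical labels for the blobfuse and keyvault flexvolumes mentioned earlier in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// addon and cluster are hypothetical, simplified types, not the real API model.
type addon struct {
	Name    string
	Enabled bool
}

type cluster struct {
	Distro string
	Addons []addon
}

// unsupportedOnCoreOS lists addons assumed not to work on CoreOS.
// The names are illustrative labels for the flexvolumes discussed in the thread.
var unsupportedOnCoreOS = map[string]bool{
	"blobfuse-flexvolume": true,
	"keyvault-flexvolume": true,
}

// validateAddons returns an error if a CoreOS cluster enables an addon
// from the unsupported list; other distros pass unchanged.
func validateAddons(c cluster) error {
	if !strings.EqualFold(c.Distro, "coreos") {
		return nil
	}
	var rejected []string
	for _, a := range c.Addons {
		if a.Enabled && unsupportedOnCoreOS[a.Name] {
			rejected = append(rejected, a.Name)
		}
	}
	if len(rejected) > 0 {
		return fmt.Errorf("addons not supported on coreos: %s", strings.Join(rejected, ", "))
	}
	return nil
}

func main() {
	c := cluster{Distro: "coreos", Addons: []addon{{Name: "keyvault-flexvolume", Enabled: true}}}
	fmt.Println(validateAddons(c))
}
```

Rejecting the model up front is one option; silently disabling the addon for CoreOS would be another, at the cost of surprising users who enabled it explicitly.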

e2e looking surprisingly good!

@alexeldeib
Contributor Author

/azp run pr-e2e

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@alexeldeib
Contributor Author

@jackfrancis @CecileRobertMichon think I managed the rebase, also added a simple validation test on addons. Hopefully this is good to merge!

backlog automation moved this from In progress to Under Review Apr 11, 2019
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

lgtm

@jackfrancis
Member

/lgtm

@acs-bot

acs-bot commented Apr 11, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alexeldeib, jackfrancis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@acs-bot acs-bot merged commit ef2000d into Azure:master Apr 11, 2019
backlog automation moved this from Under Review to Done Apr 11, 2019
Development

Successfully merging this pull request may close these issues.

CoreOs Support
5 participants