Local PV health monitor #2
/release-note-none
Force-pushed from 35fceb5 to b9cfe42
Do you have an upstream PR? What's the relationship with the PR here?
Yeah, upstream PR: kubernetes-retired/external-storage#528
This seems to be highly coupled with local storage monitoring, what's the plan for generalizing it?
src/local-pv-monitor/OWNERS (Outdated)
@@ -0,0 +1,4 @@
approvers:
We don't need this anymore since we already have a top-level one.
I still suggest we have owners under each package, just like Kubernetes does. Since different storage driver monitors are likely to be authored by different people, putting all reviewers into the top-level file means the bot may assign the wrong people to issues and PRs.
src/local-pv-monitor/cmd/main.go (Outdated)
@@ -20,24 +20,33 @@ import (
	"flag"
	"os"

	lvmonitor "github.com/caicloud/kube-storage-monitor/src/local-pv-monitor/pkg/monitor"
What's the reason for this layout (src/local-pv-monitor/cmd/main.go)? Ideally, we should just use cmd/main.go; here is our project template, FYI:
├── .github
│ ├── ISSUE_TEMPLATE
│ └── PULL_REQUEST_TEMPLATE
├── .gitignore
├── .pre-commit-config.yaml
├── CHANGELOG.md
├── Makefile
├── CODEOWNERS
├── README.md
├── bin
│ ├── admin
│ └── controller
├── cmd
│ ├── admin
│ │ └── admin.go
│ └── controller
│ └── controller.go
├── build
│ ├── admin
│ │ ├── Dockerfile
│ └── controller
│ ├── Dockerfile
├── docs
│ └── README.md
├── hack
│ ├── README.md
│ ├── deployment.yaml
│ └── script.sh
├── pkg
│ ├── utils
│ │ └── net
│ │ └── net.go
│ └── version
│ └── version.go
├── test
│ └── README.md
│ └── test_make.sh
├── third_party
│ └── README.md
└── vendor
└── README.md
A brief description of the layout:
- .github has two template files for creating PRs and issues. Please see the files for more details.
- .gitignore varies per project, but all projects need to ignore the bin directory.
- .pre-commit-config.yaml is for configuring pre-commit.
- Makefile is used to build the project.
- CHANGELOG.md contains auto-generated changelog information.
- CODEOWNERS contains owners of the project.
- README.md is a detailed description of the project.
- bin is to hold build outputs.
- cmd contains main packages; each subdirectory of cmd is a main package.
- build contains scripts, yaml files, dockerfiles, etc., to build and package the project.
- docs is for project documentation.
- hack contains scripts used to manage this repository, e.g. codegen, installation, verification, etc.
- pkg holds most of the project business logic.
- test holds all tests (except unit tests), e.g. integration and e2e tests.
- third_party is for all third-party libraries and tools, e.g. swagger ui, protocol buf, etc.
- vendor contains all vendored code.
Local PVs are kind of different from other storage drivers. The local PV monitor must be deployed as a DaemonSet in order to watch local paths on each node. So I plan to keep the local PV monitor separate from other monitors. Other storage driver PV monitors can be combined like you suggested.
@ddysher Regarding the plan for generalizing this: I planned to implement the local PV monitor as well as other kinds of storage PV monitors in this repo at first, since we are not sure we can merge all of these into the Kubernetes core repo. Also, monitors in this repo can be deployed independently; if possible, we can put
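To illustrate the DaemonSet deployment mentioned above, here is a minimal sketch of what such a manifest could look like; the image name, namespace, and host path are assumptions, not the project's actual deployment files:

```yaml
# Hypothetical sketch: run the local PV monitor on every node so it can
# inspect local paths. Image name and discovery path are assumptions.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: local-pv-monitor
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: local-pv-monitor
  template:
    metadata:
      labels:
        app: local-pv-monitor
    spec:
      containers:
      - name: monitor
        image: caicloud/local-pv-monitor:latest   # assumed image name
        volumeMounts:
        - name: local-disks
          mountPath: /mnt/disks
      volumes:
      - name: local-disks
        hostPath:
          path: /mnt/disks   # assumed local PV discovery path
```

Because each pod only sees its own node's filesystem via the hostPath mount, per-node health checks stay local.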
How about one binary for all use cases? That's more commonly seen in other projects. Or, if you want to run only one binary at a time, use a subcommand. I find it much easier to manage w.r.t. deployment. Implementation-wise, the plan sounds good to me.
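The subcommand idea above could be sketched roughly as follows. This is a hypothetical illustration, not the repo's actual code; the binary name `kube-storage-monitor` and subcommand names `local-pv`/`nfs-pv` are assumptions:

```go
package main

// Hypothetical sketch of "one binary, one subcommand per driver monitor".
// Subcommand and flag names are assumptions for illustration only.

import (
	"flag"
	"fmt"
	"os"
)

// run dispatches to a driver-specific monitor based on the first argument.
func run(args []string) (string, error) {
	if len(args) < 1 {
		return "", fmt.Errorf("usage: kube-storage-monitor <local-pv|nfs-pv> [flags]")
	}
	switch args[0] {
	case "local-pv":
		fs := flag.NewFlagSet("local-pv", flag.ContinueOnError)
		kubeconfig := fs.String("kubeconfig", "", "path to kubeconfig")
		if err := fs.Parse(args[1:]); err != nil {
			return "", err
		}
		return fmt.Sprintf("starting local PV monitor (kubeconfig=%q)", *kubeconfig), nil
	case "nfs-pv":
		return "starting NFS PV monitor", nil
	default:
		return "", fmt.Errorf("unknown subcommand %q", args[0])
	}
}

func main() {
	out, err := run(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```

One deployment artifact can then run either monitor by changing only the container args, which is the management benefit mentioned above.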
Sounds good, I will reconsider it.
Force-pushed from b9cfe42 to bc33a59
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: Assign the PR to them by writing. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
How about this change for the layout? @ddysher
Much better. You can "s/local-pv-monitor/local-pv", since you already have "monitor" in the parent directory.
Force-pushed from bc33a59 to 3dcf222
Seems not a big deal, done @ddysher
Will test this PR later.
Briefly scanned the implementation. Can you also add a design doc, README, Makefile, etc. (in later PRs)?
pkg/local-pv-monitor/monitor.go (Outdated)
// CheckNodeAffinity looks at the PV node affinity, and checks if the node has the same corresponding labels
// This ensures that we don't mount a volume that doesn't belong to this node
func CheckNodeAffinity(pv *v1.PersistentVolume, nodeLabels map[string]string) (bool, error) {
	affinity, err := helper.GetStorageNodeAffinityFromAnnotation(pv.Annotations)
This only gets affinity from the annotation? What if this is using a beta version where affinity is part of the volume spec?
Good catch, I will update this.
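The fix discussed here could look roughly like the sketch below: prefer the in-spec affinity and fall back to the annotation. This is a hypothetical illustration with simplified stand-in types; the real code uses the `v1.PersistentVolume` API and `helper.GetStorageNodeAffinityFromAnnotation`, and the annotation key shown is the alpha-era one:

```go
package main

// Hypothetical sketch: check node affinity from the in-spec field (beta)
// first, then fall back to the annotation (alpha). Types are simplified
// stand-ins for the real Kubernetes API objects.

import "fmt"

type NodeAffinity struct {
	RequiredLabels map[string]string // simplified stand-in for node selector terms
}

type PersistentVolume struct {
	Annotations  map[string]string
	SpecAffinity *NodeAffinity // stand-in for pv.Spec.NodeAffinity (beta)
}

// affinityFromAnnotation is a stand-in for
// helper.GetStorageNodeAffinityFromAnnotation; real code deserializes JSON.
func affinityFromAnnotation(ann map[string]string) *NodeAffinity {
	if _, ok := ann["volume.alpha.kubernetes.io/node-affinity"]; !ok {
		return nil
	}
	return &NodeAffinity{} // deserialization elided in this sketch
}

// CheckNodeAffinity prefers the in-spec field, falling back to the annotation.
func CheckNodeAffinity(pv *PersistentVolume, nodeLabels map[string]string) (bool, error) {
	affinity := pv.SpecAffinity
	if affinity == nil {
		affinity = affinityFromAnnotation(pv.Annotations)
	}
	if affinity == nil {
		return false, fmt.Errorf("no node affinity found on PV")
	}
	for k, v := range affinity.RequiredLabels {
		if nodeLabels[k] != v {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	pv := &PersistentVolume{SpecAffinity: &NodeAffinity{
		RequiredLabels: map[string]string{"kubernetes.io/hostname": "node-1"},
	}}
	ok, _ := CheckNodeAffinity(pv, map[string]string{"kubernetes.io/hostname": "node-1"})
	fmt.Println(ok)
}
```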
pkg/local-pv-monitor/monitor.go (Outdated)
	}
	return
}
// TODO: make sure that PV used bytes is not greater that PV capacity ?
Yeah, we should do this.
Updating the PV with its capacity on every check would put unreasonable pressure on the API server. To surface this, I'm thinking of a threshold-based approach: for example, the PV status is CapacityPressure if 80% of its capacity has been used.
> Updating the PV with its capacity on every check would put unreasonable pressure on the API server. To surface this, I'm thinking of a threshold-based approach: for example, the PV status is CapacityPressure if 80% of its capacity has been used.
Sounds good, can do this in the following PRs.
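The threshold idea discussed above amounts to a small predicate. A minimal sketch, assuming the 80% threshold from the comment (the threshold value and function name are assumptions, not the repo's API):

```go
package main

// Hypothetical sketch of the threshold-based approach: flag a PV as under
// "CapacityPressure" once usage crosses 80% of capacity, rather than
// writing exact used bytes to the API server on every check.

import "fmt"

const capacityPressureThreshold = 0.8 // assumed 80% threshold

// underCapacityPressure reports whether used bytes have reached the
// threshold fraction of capacity; zero/invalid capacity is never flagged.
func underCapacityPressure(usedBytes, capacityBytes int64) bool {
	if capacityBytes <= 0 {
		return false
	}
	return float64(usedBytes) >= capacityPressureThreshold*float64(capacityBytes)
}

func main() {
	fmt.Println(underCapacityPressure(850, 1000)) // 85% used: pressure
	fmt.Println(underCapacityPressure(100, 1000)) // 10% used: fine
}
```

Since the boolean only flips at the threshold, the monitor only needs to update the PV when the state changes, which keeps API server traffic low.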
pkg/local-pv-monitor/monitor.go (Outdated)
	}

	// check PV size: PV capacity must not be greater than device capacity and PV used bytes must not be greater that PV capacity
	dir, _ := monitor.VolUtil.IsDir(mountPath)
won't this go wrong?
Yes, it will. If an error occurs, the first return value will certainly be false, so I did not check the err. Maybe I need to log an error message.
pkg/local-pv-monitor/monitor.go (Outdated)

func (monitor *LocalPVMonitor) checkMountPoint(mountPath string, pv *v1.PersistentVolume) bool {
	// Retrieve list of mount points to iterate through discovered paths (aka files) below
	mountPoints, mountPointsErr := monitor.RuntimeConfig.Mounter.List()
This operation (listing mount points) can be potentially heavy if there are a lot of pods; any thoughts on optimizing this?
Cache mountpoints? Is this needed? We just need to list the mountpoints on this node, and each node will have a DaemonSet pod to do this.
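One cheap optimization compatible with the per-node argument above is to build the mount list into a set once per sync pass, so each PV check is a map lookup instead of a scan of the slice. A minimal sketch (function names are assumptions, not the repo's API):

```go
package main

// Hypothetical sketch: turn the node-local mount point list into a set so
// checking many PVs per sync pass costs O(1) each instead of O(mounts).

import "fmt"

// buildMountSet indexes the mount paths returned by a Mounter.List()-style call.
func buildMountSet(mountPaths []string) map[string]struct{} {
	set := make(map[string]struct{}, len(mountPaths))
	for _, p := range mountPaths {
		set[p] = struct{}{}
	}
	return set
}

// isMountPoint reports whether path appears in the indexed mount list.
func isMountPoint(set map[string]struct{}, path string) bool {
	_, ok := set[path]
	return ok
}

func main() {
	set := buildMountSet([]string{"/mnt/disks/vol1", "/mnt/disks/vol2"})
	fmt.Println(isMountPoint(set, "/mnt/disks/vol1")) // present
	fmt.Println(isMountPoint(set, "/mnt/disks/vol3")) // absent
}
```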
pkg/local-pv-monitor/monitor.go (Outdated)
for _, mp := range mountPoints {
	if mp.Path == mountPath {
		glog.V(10).Infof("mountPath is still a mount point: %s", mountPath)
		err := monitor.markOrUnmarkPV(pv, NotMountPoint, "yes", false)
If it changes from a mount point to not a mount point, then changes back again, will there be any problem?
Yeah, I am thinking about this too.
Basically, I think we still need to mark the local PV as unhealthy, because it had not been a mount point for a period of time (users may have lost some data).
But the current implementation unmarks the local PV; maybe I need to change that.
pkg/local-pv-monitor/monitor.go (Outdated)

if mark {
	// mark PV
	_, ok := volumeClone.ObjectMeta.Annotations[ann]
So right now, the only thing we do is mark the PV using an annotation and send an event, right?
Yes, since we do not have a taint API for PVs, we use annotations instead.
Yeah, sure, will do.
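The mark-or-unmark logic discussed above could be sketched as below. This is a hypothetical, simplified illustration; the real `markOrUnmarkPV` deep-copies the PV and calls the Kubernetes API and event recorder, and the annotation key here is an assumption:

```go
package main

// Hypothetical sketch of marking a PV via annotation in lieu of a taint
// API. It returns whether anything changed, so the caller can skip the API
// update and event when the PV is already in the desired state.

import "fmt"

const notMountPointAnn = "kube-storage-monitor/not-mountpoint" // assumed key

type PV struct{ Annotations map[string]string }

func markOrUnmarkPV(pv *PV, ann, value string, mark bool) bool {
	if pv.Annotations == nil {
		pv.Annotations = map[string]string{}
	}
	_, exists := pv.Annotations[ann]
	if mark {
		if exists {
			return false // already marked: avoid a redundant API update
		}
		pv.Annotations[ann] = value
		return true
	}
	if !exists {
		return false // already unmarked
	}
	delete(pv.Annotations, ann)
	return true
}

func main() {
	pv := &PV{}
	fmt.Println(markOrUnmarkPV(pv, notMountPointAnn, "yes", true)) // newly marked
	fmt.Println(markOrUnmarkPV(pv, notMountPointAnn, "yes", true)) // no-op
	fmt.Println(markOrUnmarkPV(pv, notMountPointAnn, "", false))   // unmarked
}
```

Returning a changed flag also gives a natural place to emit the event only on state transitions, rather than on every sync.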
Force-pushed from 3dcf222 to bacc6ae
Force-pushed from 6a502e5 to 6a89167
Force-pushed from 6a89167 to b66cf37
/hold
@NickrenREN what's the status?
Will update the dependencies today, and then we can merge it.
/hold cancel
We can go now @ddysher
Merging...
Partly fix: #1