
Local PV health monitor #2

Merged
merged 8 commits into caicloud:master from local-pv-health-monitor
Apr 17, 2018

Conversation

NickrenREN
Contributor

Partly fix: #1

@caicloud-bot caicloud-bot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. caicloud-cla: yes Indicates the PR's author has signed the Caicloud CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 8, 2018
@NickrenREN
Contributor Author

/release-note-none

@caicloud-bot caicloud-bot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 8, 2018
@NickrenREN NickrenREN mentioned this pull request Apr 8, 2018
@ddysher
Member

ddysher commented Apr 8, 2018

Do you have an upstream PR? What's the relationship with the PR here?

@NickrenREN
Contributor Author

Yeah, upstream PR: kubernetes-retired/external-storage#528.
It does the same work, but the upstream PR is integrated into the external-storage repo, while this one can be deployed separately.

Member

@ddysher ddysher left a comment


This seems to be highly coupled with local storage monitoring; what's the plan for generalizing it?

@@ -0,0 +1,4 @@
approvers:
Member


We don't need this anymore since we already have the top-level one.

Contributor Author

@NickrenREN NickrenREN Apr 8, 2018


I still suggest we have OWNERS files under each package, just like Kubernetes does.
Different storage driver monitors are likely to be authored by different people; if we put all the reviewers into the top-level file, the bot may assign the wrong people to issues and PRs.

@@ -20,24 +20,33 @@ import (
	"flag"
	"os"

	lvmonitor "github.com/caicloud/kube-storage-monitor/src/local-pv-monitor/pkg/monitor"
Member


What's the reason for this layout: src/local-pv-monitor/cmd/main.go?

Ideally, we should just use cmd/main.go; here is our project template, FYI:

├── .github
│   ├── ISSUE_TEMPLATE
│   └── PULL_REQUEST_TEMPLATE
├── .gitignore
├── .pre-commit-config.yaml
├── CHANGELOG.md
├── Makefile
├── CODEOWNERS
├── README.md
├── bin
│   ├── admin
│   └── controller
├── cmd
│   ├── admin
│   │   └── admin.go
│   └── controller
│       └── controller.go
├── build
│   ├── admin
│   │   └── Dockerfile
│   └── controller
│       └── Dockerfile
├── docs
│   └── README.md
├── hack
│   ├── README.md
│   ├── deployment.yaml
│   └── script.sh
├── pkg
│   ├── utils
│   │   └── net
│   │       └── net.go
│   └── version
│       └── version.go
├── test
│   ├── README.md
│   └── test_make.sh
├── third_party
│   └── README.md
└── vendor
    └── README.md

A brief description of the layout:

  • .github has two template files for creating PRs and issues. Please see the files for more details.
  • .gitignore varies per project, but all projects need to ignore the bin directory.
  • .pre-commit-config.yaml is for configuring pre-commit.
  • Makefile is used to build the project.
  • CHANGELOG.md contains auto-generated changelog information.
  • CODEOWNERS contains owners of the project.
  • README.md is a detailed description of the project.
  • bin holds build outputs.
  • cmd contains main packages; each subdirectory of cmd is a main package.
  • build contains scripts, yaml files, Dockerfiles, etc., to build and package the project.
  • docs holds project documentation.
  • hack contains scripts used to manage this repository, e.g. codegen, installation, verification, etc.
  • pkg holds most of the project's business logic.
  • test holds all tests (except unit tests), e.g. integration and e2e tests.
  • third_party is for all third-party libraries and tools, e.g. swagger ui, protocol buf, etc.
  • vendor contains all vendored code.

Contributor Author

@NickrenREN NickrenREN Apr 8, 2018


Local PVs are kind of different from other storage drivers: the local PV monitor must be deployed as a DaemonSet in order to watch local paths on each node.
So I plan to keep the local PV monitor separate from the other monitors. The other storage driver PV monitors can be combined as you suggested.

@NickrenREN
Contributor Author

NickrenREN commented Apr 8, 2018

@ddysher Regarding the plan for generalizing this: I planned to implement the local PV monitor as well as other kinds of storage PV monitors in this repo at first, since we are not sure we can merge all of these into the Kubernetes core repo.
BTW: we plan to support only the local PV monitor in external-storage at the first stage, and add other kinds of PV monitoring support if needed, but we can implement all of these monitors faster here even if upstream does not want to do so, because I think it is necessary and important for users to monitor their data.

Also, monitors in this repo can be deployed independently; if possible, we can put CSI monitor supporting code here later too.

@ddysher
Member

ddysher commented Apr 8, 2018

How about one binary for all use cases? That's more commonly seen in other projects; for example:

kube-monitor --monit local-pv blabla

kube-monitor --monit ceph,glusterfs blabla

kube-monitor --monit ceph,glusterfs,local-pv blabla
ERROR! cannot run local-pv with other types of storage monitoring

Or, if you want to run only one binary at a time, use subcommands:

kube-monitor local-pv --blabla

kube-monitor ceph --blabla

I find it much easier to manage w.r.t. deployment.

Implementation-wise, the plan sounds good to me.
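
For illustration, a minimal sketch of the flag-based variant suggested above, assuming a single hypothetical kube-monitor binary; only the --monit flag and the error text come from the examples, the rest is illustrative Go:

// Hypothetical single-binary CLI: a --monit flag whose value list may not
// mix local-pv with the network storage types (sketch, not the PR's code).
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

func main() {
	monit := flag.String("monit", "", "comma-separated storage types to monitor, e.g. ceph,glusterfs or local-pv")
	flag.Parse()

	types := strings.Split(*monit, ",")
	hasLocalPV := false
	for _, t := range types {
		if t == "local-pv" {
			hasLocalPV = true
		}
	}
	// local-pv runs as a per-node DaemonSet, so it cannot be combined with
	// the network storage monitors in one process.
	if hasLocalPV && len(types) > 1 {
		fmt.Fprintln(os.Stderr, "ERROR! cannot run local-pv with other types of storage monitoring")
		os.Exit(1)
	}
	// ... start the selected monitors ...
}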

@NickrenREN
Contributor Author

Sounds good, I will reconsider it.

@caicloud-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: nickrenren

Assign the PR to them by writing /assign @nickrenren in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@NickrenREN
Contributor Author

NickrenREN commented Apr 8, 2018

How about this change for the layout? @ddysher

├── cmd
│   ├── kube-storage-monitor
│   │   ├── app
│   │   │   └── server.go
│   │   └── local-pv-monitor
│   │       └── server.go
│   └── main.go
├── pkg
│   ├── local-pv-monitor
│   │   ├── config.go
│   │   ├── localVolumeCache.go
│   │   └── monitor.go
│   └── util
│       └── util.go

@ddysher
Member

ddysher commented Apr 8, 2018

Much better.

You can s/local-pv-monitor/local-pv, since you already have "monitor" in the parent directory.

@NickrenREN
Contributor Author

Seems not a big deal; done @ddysher

@NickrenREN
Contributor Author

will test this PR later

Member

@ddysher ddysher left a comment


Briefly scanned the implementation.

Can you also add design docs, a README, a Makefile, etc. (in later PRs)?

// CheckNodeAffinity looks at the PV node affinity, and checks if the node has the same corresponding labels
// This ensures that we don't mount a volume that doesn't belong to this node
func CheckNodeAffinity(pv *v1.PersistentVolume, nodeLabels map[string]string) (bool, error) {
	affinity, err := helper.GetStorageNodeAffinityFromAnnotation(pv.Annotations)
Member


This only gets affinity from the annotation? What if this is running a beta version where affinity is part of the volume spec?

Contributor Author


Good catch, I will update this.
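
A rough sketch of the fix under discussion: prefer the beta NodeAffinity field on the PV spec and fall back to the alpha annotation. Helper names follow the Kubernetes 1.10-era k8s.io/kubernetes/pkg/apis/core/v1/helper package; exact signatures are assumptions, and matchNodeSelectorTerms is a hypothetical helper:

package monitor // illustrative placement

import (
	"fmt"

	"k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
	helper "k8s.io/kubernetes/pkg/apis/core/v1/helper"
)

// checkNodeAffinity checks the beta spec field first, then the alpha annotation.
func checkNodeAffinity(pv *v1.PersistentVolume, nodeLabels map[string]string) (bool, error) {
	// Beta: node affinity is part of the volume spec.
	if pv.Spec.NodeAffinity != nil && pv.Spec.NodeAffinity.Required != nil {
		return matchNodeSelectorTerms(pv.Spec.NodeAffinity.Required.NodeSelectorTerms, nodeLabels)
	}
	// Alpha: node affinity is stored in an annotation.
	affinity, err := helper.GetStorageNodeAffinityFromAnnotation(pv.Annotations)
	if err != nil {
		return false, fmt.Errorf("error getting affinity from annotation: %v", err)
	}
	if affinity == nil || affinity.RequiredDuringSchedulingIgnoredDuringExecution == nil {
		return true, nil // nothing to check
	}
	return matchNodeSelectorTerms(affinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms, nodeLabels)
}

// matchNodeSelectorTerms (hypothetical) requires every term's match
// expressions to be satisfied by the node's labels.
func matchNodeSelectorTerms(terms []v1.NodeSelectorTerm, nodeLabels map[string]string) (bool, error) {
	for _, term := range terms {
		selector, err := helper.NodeSelectorRequirementsAsSelector(term.MatchExpressions)
		if err != nil {
			return false, err
		}
		if !selector.Matches(labels.Set(nodeLabels)) {
			return false, nil
		}
	}
	return true, nil
}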

}
return
}
// TODO: make sure that PV used bytes is not greater than PV capacity?
Member


Yeah, we should do this.

Updating the PV with its used capacity would put unreasonable pressure on the API server. To surface this, I'm thinking of a threshold-based approach: for example, the PV status becomes CapacityPressure once 80% of its capacity has been used.

Contributor Author


Updating the PV with its used capacity would put unreasonable pressure on the API server. To surface this, I'm thinking of a threshold-based approach: for example, the PV status becomes CapacityPressure once 80% of its capacity has been used.

Sounds good, we can do this in the following PRs.
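
A minimal sketch of that threshold idea, using the 80% figure from the comment; the constant and function names are illustrative, not part of this PR:

// Sketch of the threshold-based approach discussed above.
const capacityPressureThreshold = 0.8 // flag the PV once 80% of capacity is used

// underCapacityPressure reports whether usage has crossed the threshold.
func underCapacityPressure(usedBytes, capacityBytes int64) bool {
	if capacityBytes <= 0 {
		return false
	}
	return float64(usedBytes) >= capacityPressureThreshold*float64(capacityBytes)
}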

}

	// check PV size: PV capacity must not be greater than device capacity, and PV used bytes must not be greater than PV capacity
	dir, _ := monitor.VolUtil.IsDir(mountPath)
Member


won't this go wrong?

Contributor Author


Yes, it will. If an error occurs, the first return value will certainly be false, so I did not check the error.
Maybe I need to log an error message.
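
The suggested tweak might look like this sketch, reusing IsDir and glog from the snippet above:

// Sketch: log the error instead of silently discarding it.
dir, err := monitor.VolUtil.IsDir(mountPath)
if err != nil {
	glog.Errorf("checking whether %s is a directory failed: %v", mountPath, err)
}
if dir {
	// ... continue with the directory-based size checks ...
}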


func (monitor *LocalPVMonitor) checkMountPoint(mountPath string, pv *v1.PersistentVolume) bool {
	// Retrieve list of mount points to iterate through discovered paths (aka files) below
	mountPoints, mountPointsErr := monitor.RuntimeConfig.Mounter.List()
Member


This operation can potentially be heavy if there are a lot of pods (listing mount points); any thoughts on optimizing this?

Contributor Author


Cache mount points? Is this needed? We just need to list the mount points on this node, and each node will have a DaemonSet pod to do this.
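
If listing ever does become hot, one possible optimization is a short-lived cache so repeated checks reuse a single List() call. A sketch, assuming the k8s.io/kubernetes/pkg/util/mount package the PR already uses; mountCache and its TTL are hypothetical:

import (
	"sync"
	"time"

	"k8s.io/kubernetes/pkg/util/mount"
)

// Hypothetical mount-point cache with a short TTL; not part of the PR.
type mountCache struct {
	mu        sync.Mutex
	points    []mount.MountPoint
	fetchedAt time.Time
	ttl       time.Duration
}

// list returns cached mount points while they are fresh, otherwise
// re-lists via the mounter and refreshes the cache.
func (c *mountCache) list(mounter mount.Interface) ([]mount.MountPoint, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.points != nil && time.Since(c.fetchedAt) < c.ttl {
		return c.points, nil
	}
	points, err := mounter.List()
	if err != nil {
		return nil, err
	}
	c.points, c.fetchedAt = points, time.Now()
	return points, nil
}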

for _, mp := range mountPoints {
	if mp.Path == mountPath {
		glog.V(10).Infof("mountPath is still a mount point: %s", mountPath)
		err := monitor.markOrUnmarkPV(pv, NotMountPoint, "yes", false)
Member


If the path changes from a mount point to not a mount point, then changes back again, will there be any problem?

Contributor Author


Yeah, I am thinking about this too.
Basically I think we still need to mark the local PV as unhealthy, because it was not a mount point for a period of time (users may have lost some data).
But I unmark the local PV in the current implementation; maybe I need to change that.
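
One way to keep that history, sketched on the assumption that the monitor has an event recorder; wasMarked and the "recovered" value are hypothetical, while NotMountPoint is the annotation key from the snippets above:

// Sketch: on recovery, record past unhealthiness instead of erasing it.
if wasMarked {
	volumeClone.Annotations[NotMountPoint] = "recovered"
	monitor.Recorder.Eventf(pv, v1.EventTypeWarning, "MountPointRecovered",
		"local volume %s was not a mount point for a period of time; data may have been lost", pv.Name)
}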


if mark {
	// mark PV
	_, ok := volumeClone.ObjectMeta.Annotations[ann]
Member


So right now, the only thing we do is mark the PV using an annotation and send an event, right?

Contributor Author

@NickrenREN NickrenREN Apr 9, 2018


Yes; since we do not have a taint API for PVs, we use annotations instead.
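
For reference, the annotation-based marking amounts to something like this sketch; the annotation value handling and event reason are illustrative, monitor.Client and monitor.Recorder are assumed fields, and the client-go calls follow the 2018-era API:

// Sketch of annotation-based "taints": copy the PV, set the mark, update,
// and emit an event so users can see why the PV is considered unhealthy.
volumeClone := pv.DeepCopy()
if volumeClone.Annotations == nil {
	volumeClone.Annotations = make(map[string]string)
}
volumeClone.Annotations[NotMountPoint] = "yes"
if _, err := monitor.Client.CoreV1().PersistentVolumes().Update(volumeClone); err != nil {
	glog.Errorf("failed to mark PV %s: %v", pv.Name, err)
	return
}
monitor.Recorder.Eventf(pv, v1.EventTypeWarning, "NotMountPoint",
	"local volume %s is no longer a mount point", pv.Name)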

@NickrenREN
Contributor Author

can you also add design docs, a README, a Makefile, etc. (in later PRs)?

Yeah, sure, will do.

@NickrenREN NickrenREN changed the title Local PV health monitor WIP: Local PV health monitor Apr 10, 2018
@caicloud-bot caicloud-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 10, 2018
@NickrenREN NickrenREN force-pushed the local-pv-health-monitor branch 2 times, most recently from 6a502e5 to 6a89167 Compare April 11, 2018 10:37
@NickrenREN NickrenREN changed the title WIP: Local PV health monitor Local PV health monitor Apr 12, 2018
@caicloud-bot caicloud-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 12, 2018
@NickrenREN
Contributor Author

/hold

@caicloud-bot caicloud-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 12, 2018
@ddysher
Member

ddysher commented Apr 16, 2018

@NickrenREN what's the status?

@NickrenREN
Contributor Author

Will update the dependencies today, and then we can merge it.

@NickrenREN
Contributor Author

/hold cancel

@caicloud-bot caicloud-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 16, 2018
@NickrenREN
Contributor Author

We can go now, @ddysher.

@NickrenREN
Contributor Author

Merging...
Will enrich this later.

@NickrenREN NickrenREN merged commit 33a115f into caicloud:master Apr 17, 2018
@NickrenREN NickrenREN deleted the local-pv-health-monitor branch April 17, 2018 08:14