This repository has been archived by the owner on Feb 1, 2021. It is now read-only.

Filter for containerslots-label #1809

Closed
wants to merge 0 commits into from

Conversation

@ottes commented Feb 12, 2016

As proposed in #1734

@ottes (Author) commented Feb 12, 2016

I'm confused about the failing build...

The command "fgt golint ./..." exited with 1.
$ fgt find test/ -type f \( -name "*.sh" -or -name "*.bash" -or -name "*.bats" \) -exec grep -Hn -e "^ " {} \;

This exits with 0 on my local Mac, so any suggestions on how to fix this?
And any comments about the filter?

@abronan (Contributor) commented Feb 17, 2016

@xFragger:

scheduler/filter/slots.go:11:2: exported var ErrNoNodeWithFreeSlotsAvailable should have comment or be unexported

ErrNoNodeWithFreeSlotsAvailable should have a comment beginning with:

// ErrNoNodeWithFreeSlotsAvailable <...>
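
In other words, golint wants every exported identifier to carry a doc comment that starts with its own name. A minimal sketch of the fix (the comment wording and the error message below are illustrative, not the PR's actual text):

// ErrNoNodeWithFreeSlotsAvailable is returned when no node in the cluster
// has a free container slot left.
var ErrNoNodeWithFreeSlotsAvailable = errors.New("no node with free slots available in the cluster")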

@abronan (Contributor) commented Feb 17, 2016

/cc @dongluochen

@ottes (Author) commented Mar 9, 2016

So what is the next step here? Any suggestions? Anything I should change? Is there any hope this will be adopted?

@dongluochen (Contributor):

@xFragger Thanks for your PR! Can you add an integration test under test/integration? You can take a look at other examples like port-filters.bats. We would also like you to update the docs, but let's do that in a separate PR after this one.

@nishanttotla (Contributor):

@xFragger Could you also please squash your commits into a single one? The easiest way to do that (for the case of 5 commits, as here) is:

$ git reset --soft HEAD~5
$ git add .
$ git commit -s -m "<commit-message>"
$ git push -f <remote-branch-on-forked-repo>

Also, if this gets merged after #1796, you might have to write a little more code to print all constraints :)

@abronan (Contributor) commented Mar 10, 2016

ping @vieux WDYT about this?

@ottes (Author) commented Mar 14, 2016

Sorry for my git cluelessness, but I'm not able to squash the commits at the moment... I think it's caused by the two merges from master... can someone help me out with that?
About the docs, what speaks against using this PR for that?
Which doc files should I update?
Any idea why Travis fails?


@test "containerslots filter" {
  export DOCKER_DAEMON_ADDITIONAL_PARAMETERS="--label containerslots=2"
  start_docker_with_busybox 2

Contributor:
You can use start_docker_with_busybox 2 --label containerslots=2; no need for DOCKER_DAEMON_ADDITIONAL_PARAMETERS.

@nishanttotla (Contributor):

@xFragger if you can't squash commits with a soft reset, you might have to make a backup copy of the code, do a hard reset, then redo the changes. The current list of commits needs to be cleaned up to keep the repo history clean.

Secondly, you will need to make appropriate changes since #1796 was merged. Let us know if you have questions :)

@ottes (Author) commented Mar 15, 2016

So I will make the changes for #1796; when you guys give the okay, I will hard-reset and redo everything.

@nishanttotla (Contributor):

@xFragger sure, whatever you're comfortable with. In general, to avoid too many commits, you can use git commit --amend to add changes to an existing commit, then force push using git push -f to reflect those changes remotely.

@ottes (Author) commented Mar 15, 2016

I thought a while about it... isn't my case similar to the "healthy" case, and an exception as well? If it can't find a node with free slots, it's the same as not being able to find a node that is healthy.

As an alternative I could just return something like "free slot".

for _, node := range nodes {
    if slotsString, ok := node.Labels["containerslots"]; ok {
        slots, err := strconv.Atoi(slotsString) // if err => cannot cast to int, so no candidate for us

Contributor:

//if err => cannot cast to int, so no candidate for us

I think it's wrong to implicitly skip a node. It'd be better to fail explicitly.

Author:

I'm open to any ideas... but I don't know what I should do instead. Return an error, to prevent starting a container at all?

Contributor:

I'm thinking about 2 different approaches:

  • The label is invalid, so it doesn't take effect. The node would be included as a candidate. You may add error logging for the manager about the invalid label, though it might generate too much log output if we do it here.
  • Do validation on the label and put the node into the unhealthy state. Nodes in the unhealthy state wouldn't take new containers, but docker -H manager:port info would show the node as unhealthy. So it's an explicit failure.

What do you think?

Author:

I would be okay with both approaches... I don't think it's too much logging, but the problem is: who reads the syslog?! I guess nearly nobody...

The other approach sounds better to me: someone started a docker daemon with a faulty label value, so it will never be healthy enough to start any containers. Will the user see "why" it is unhealthy? There is some kind of error field in the info view, right?

Contributor:

For the second approach, it's a little more complicated than I thought. Yes, swarm info can show errors for a node. But unhealthy is designed for reachability failures, while pending is designed for node validation; see #1486. It looks like pending is a better state for this failure, but it was a one-way flow. Now, if a docker label gets updated on the fly, the node would have to move back to pending. It may require some work.

Contributor:

The first idea was not for the docker daemon, it was for the swarm agent running on the nodes; can't they have a command-line argument at start time?

I do not follow. The swarm agent is the docker daemon. A label is what you can put as a user specification.

Author:

You start one swarm master that controls your swarm, and on every docker daemon you start one swarm agent, don't you?
docker run -d swarm join --advertise=<node_ip>:2375 consul://<consul_ip>:8500
and my idea was to introduce another variable there (if possible):
docker run -d swarm join --slots=3 --advertise=<node_ip>:2375 consul://<consul_ip>:8500

Contributor:

It's not a good idea to repurpose join to change an agent's configuration. Swarm may deprecate join.

Author:

So, to be sure: I'm going to "just" ignore the label, as if it was not there, when it's not castable to an integer, and then we could possibly merge this thing?

Contributor:

I think that's an acceptable solution.
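
A minimal sketch of the agreed behavior, where a node whose containerslots label is missing or unparsable simply stays a candidate (the names and surrounding types are assumptions loosely modeled on the other filters in scheduler/filter, not the PR's exact code):

// Filter keeps only the nodes that still have a free container slot.
func (f *SlotsFilter) Filter(config *cluster.ContainerConfig, nodes []*node.Node) ([]*node.Node, error) {
    candidates := []*node.Node{}
    for _, n := range nodes {
        slotsString, ok := n.Labels["containerslots"]
        if !ok {
            candidates = append(candidates, n) // no label => no limit
            continue
        }
        slots, err := strconv.Atoi(slotsString)
        if err != nil {
            candidates = append(candidates, n) // invalid label => ignored, as agreed above
            continue
        }
        if len(n.Containers) < slots {
            candidates = append(candidates, n)
        }
    }
    if len(candidates) == 0 {
        return nil, ErrNoNodeWithFreeSlotsAvailable
    }
    return candidates, nil
}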

@ottes (Author) commented Apr 4, 2016

One last problem about #1796. If I got it right, I just return something like "free slot". It's more like the health filter in my opinion, which is a special case.

I think if all slots are full on all nodes, the message should be something like "Unable to find a node that satisfies the following conditions", then a line break, then "[free slots]", right?

@ottes (Author) commented Apr 4, 2016

Waiting for your feedback; I will hard-reset and redo everything to clean up the commit history when everything is okay for you.

# And the number of running containers should be 5
# (docker ps prints a header line, hence 6 lines in total).
run docker_swarm ps
[ "${#lines[@]}" -eq 6 ]
}

Contributor:

Format problem.

Author:

Missing linebreak? Or something else?

Contributor:

On line 77 I see a return character highlighted in red at the end.

@dongluochen (Contributor):

I think if all slots are full on all nodes, the message should be something like "Unable to find a node that satisfies the following conditions", then a line break, then "[free slots]", right?

Take a look at #1796. Each filter can reduce the candidate set. We print the filters up to the point where the candidate set becomes empty.
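
Roughly, the scheduler applies the filters one by one, and the filter that empties the candidate set contributes its GetFilters output to the error message. A hedged sketch of that flow (hypothetical names, not the actual #1796 code):

candidates := nodes
for _, f := range filters {
    remaining, err := f.Filter(config, candidates)
    if err != nil || len(remaining) == 0 {
        names, _ := f.GetFilters(config)
        return nil, fmt.Errorf("Unable to find a node that satisfies the following conditions %v", names)
    }
    candidates = remaining
}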

@ottes (Author) commented Apr 8, 2016

Sorry, could you just help with that last little thing? What exactly should my filter return, and under which circumstances?

@dongluochen (Contributor):

You already implement the GetFilters function. If this is the filter that leads to an empty candidate set, "free slots" would show up.

func (f *SlotsFilter) GetFilters(config *cluster.ContainerConfig) ([]string, error) {
    return []string{"free slots"}, nil
}

Here is the example for the port filter and how its test works:
https://github.com/docker/swarm/blob/master/scheduler/filter/port.go#L123
https://github.com/docker/swarm/blob/master/test/integration/port-filters.bats#L45

@ottes (Author) commented Apr 9, 2016

Yeah, I just did, but the question was whether this is okay for you, and whether we are ready for me to squash all changes into one commit.

@dongluochen (Contributor):

Yes, please squash the changes.

@ottes closed this Apr 18, 2016
@GordonTheTurtle:
Looks like we would not be able to merge this PR because of merge conflicts. Please rebase, fix conflicts, and force push to your branch.
