Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ddev rm, describe, and stop are now reliable even when a site directory is missing; fixes #459 #460

Merged
merged 49 commits into from
Nov 16, 2017

Conversation

nmccrory
Copy link
Contributor

@nmccrory nmccrory commented Sep 14, 2017

The Problem/Issue/Bug:

ddev rm, describe, and stop were unable to remove sites that were missing their config.yml or sites that did not have directories associated with them. This made it impossible to cleanup corrupted sites pointed out by ddev list.

How this PR Solves The Problem:

This PR solves the problem by using a site's containers to grab it's site-name and using that site-name to run Cleanup() on the site. It also checks for a missing directory and if the approot is missing it skips app.Down() and uses Cleanup().

Manual Testing Instructions:

Test this PR using the following steps:

  • Start a site using ddev start
  • rm -r the site directory
  • ddev list the site to see that it reports its status as app directory missing:
  • Use ddev rm <sitename> to remove the site

Automated Testing Overview:

Tests for ddev rm are run in TestCleanupWithoutCompose. Tests for ddev describe are run in TestDescribeMissingDirectory. And tests for ddev stop are run in TestLocalStopMissingDirectory.

Related Issue Link(s):

#459

Release/Deployment notes:

@beeradb
Copy link
Contributor

beeradb commented Sep 14, 2017

I'll give this a more thorough review in the AM, but at first glance, I would love it if this could be consolidated into the Down() method rather than living in the cmd for the following reasons:

  • As a general rule, we try to keep as much logic outside of things in the cmd directory as possible. We don't always achieve this, but the idea is that the cmd directory is just for defining commands and their flags + arguments and then passing the work along to our packaged code.
  • The conditional here is important logic for Down() to work correctly. We'd have to do the same checks anywhere we tried to call Down() from. With that in mind, it's probably better to just include it in that method and ensure Down() is as robust as it can be.

To be clear, I mean this code: https://github.com/drud/ddev/pull/460/files#diff-5c1b23b6448d5099a5a94f5b2b0685f2R43

Copy link
Member

@rfay rfay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just leaving my comments for now. Will look again after tests pass.

As mentioned in slack, "To make things easier in the future (but use with care) on a PR that exactly solves an issue, you can add "fixes #xxx" to the title. So #460 will exactly fix #459, so please add "fixes #459" to the PR title. That way it will auto-close or at least if the person who's pulling makes sure that makes it into the commit message it will."

case strings.Contains(status, SiteDirMissing):
status = color.RedString(status)
case strings.Contains(status, SiteConfigMissing):
status = color.RedString(status)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the removal here a rebase failure?

appRoot, err := CheckForConf(siteDir)
if err != nil {
return "", fmt.Errorf("unable to determine the application for this command. Have you run 'ddev config'? Error: %s", err)
// In the case of a missing .ddev/config.yml just return the site directory.
return siteDir, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What what about the situation where they do a ddev start and haven't done a ddev config in advance, which is where I think this came from? Is that still handled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes because it will still get the app root from the running web container and return it by itself if it can't find a config.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the message shown with this PR in the situation @rfay describes (ddev start on dir that config has not been run for): "Failed to start: Could not find a site in /Users/tannerjfco/sites/test. Please specify a site name or change directories: no .ddev/config.yaml file was found in this directory or any parent" This may be ok enough, but we no longer suggest to run config. Maybe change to "Could not find a configured site?"

@nmccrory
Copy link
Contributor Author

Still need to refactor some tests to work but this is ready for a second review @beeradb

if err != nil {
util.Failed("Failed to remove %s: %s", l.GetName(), err)
}
} else if missingConfig != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conditional can just be combined with the one above it, as they both do the same thing.

if !fileutil.FileExists(l.AppRoot()) || missingConfig != nil { ... }

Copy link
Contributor

@beeradb beeradb Sep 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might also suggest configErr over missingConfig. The latter really reads like it should be a bool.

if !fileutil.FileExists(l.AppRoot()) {
err := Cleanup(l)
if err != nil {
util.Failed("Failed to remove %s: %s", l.GetName(), err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This results in a really verbose message when running ddev rm in a directory where a config doesn't exist, example:

"Failed to remove: Could not find a site in /Users/beeradb/Projects/go/src/github.com/drud/ddev. Please specify a site name or change directories: no .ddev/config.yaml file was found in this directory or any parent"

It feels like we could simplify that a lot.

Also, "%v" is the preferred Printf token for errors.

@@ -812,13 +812,26 @@ func (l *LocalApp) Config() error {
func (l *LocalApp) Down(removeData bool) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little uncomfortable with how much nesting and complexity we have in this function now. There are a lot of paths through this function, and along with those paths, lots of room for bugs. It's also much harder to maintain than I would like.

Here are a couple competing ideas for how we might go about making this more maintainable:

  • We split out the removeData portion into its own method. I'm not entirely convinced that Down() should handle that anyway, as it's arguably separate concern from stopping/removing the containers. Doing this would allow us to simplify the conditionals at the top to something more like the following:
   if !fileutil.FileExists(l.AppRoot()) || missingConfig != nil {
      return Cleanup(l)
   }

   err := dockerutil.ComposeCmd(l.ComposeFiles(), "down", "-v")
   if err != nil {
       util.Warning("Could not stop site with docker-compose. Attempting manual cleanup.")
       return Cleanup(l)    
   }
  • We could consider just always calling Cleanup() regardless of whether a docker-compose file exists or not. At this point I don't believe we are using any volumes, so removing a site is just a matter of stopping and removing the containers. If we think we can trust the Cleanup() method then it would definitely simplify our code to just rely on it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to use the "just call Cleanup()" technique. I see no downside, and no reason that we should have been doing other things. The reason we do the dockerutil.ComposeCmd("down") is just that the code was always there. Now we have better code, let's use it instead.

+1 for split removeData into own function.

@nmccrory
Copy link
Contributor Author

Today I'll be pushing the changes requested in Brad's review however tomorrow I would really benefit from a paired programming session to help write tests for this. Any time for this? @beeradb @tannerjfco

@rfay
Copy link
Member

rfay commented Sep 19, 2017

Originally I thought this bug was too niche to deserve a test. But then there was an accident toward the end of Nick's work in which the tests still worked but the basic premise of the PR stopped working. Reminded me how valuable tests really are. So I suggested to Nick to go ahead and add a test in pkg that just starts a single site, rm -rf the codebase, then check that it can describe, then stop, and rm.

@nmccrory nmccrory changed the title ddev remove now removes sites with missing directories and/or configs ddev rm, describe, and stop are now reliable even when a site directory is missing; fixes #459 Sep 25, 2017
Copy link
Member

@rfay rfay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a catastrophic experience testing this manually, my current directory was completely destroyed. I restored and repeated the experience. I had used the make testcmd approach to getting a dummy ddev site going, and ctrl-c to stop it, then removed the directory where the site was:

rfay@Randy-Air:~/go/src/github.com/drud/ddev$ ddev list
2 local sites found.
NAME                  TYPE       LOCATION                                                                                URL                                     STATUS
TestMainCmdWordpress  wordpress  /private/var/folders/db/741gx8tj20d5z1f3zjpvw2vh0000gp/T/TestMainCmdWordpress954110703  http://TestMainCmdWordpress.ddev.local  app directory missing: /private/var/folders/db/741gx8tj20d5z1f3zjpvw2vh0000gp/T/TestMainCmdWordpress954110703
gchistory             drupal7    ~/workspace/gchistory                                                                   http://gchistory.ddev.local             running

DDEV ROUTER STATUS: running
rfay@Randy-Air:~/go/src/github.com/drud/ddev$ ddev rm --remove-data TestMainCmdWordpress
Stopping container: ddev-TestMainCmdWordpress-web
Stopping container: ddev-TestMainCmdWordpress-dba
Stopping container: ddev-TestMainCmdWordpress-db
Removing container: ddev-TestMainCmdWordpress-web
Removing container: ddev-TestMainCmdWordpress-dba
Removing container: ddev-TestMainCmdWordpress-db
Failed to remove TestMainCmdWordpress: failed to remove data directories: remove .: invalid argument

After completion, the entire current directory I was working in, which happened to be drud/ddev, was empty. I restored it with time machine, then tried it again, but same result. Obviously this is not what we want.

@rickmanelius
Copy link
Contributor

I added myself to the reviewer list because data loss is nothing to mess around with and I'd love to manually test this alongside Randy to ensure this is buttoned up.

@rickmanelius
Copy link
Contributor

Reset the testers now that the tests passed on circle. Going to try and recreate the issue @rfay surfaced.

@rickmanelius
Copy link
Contributor

Initial Testing

Preparation

  • Remove homebrew install. brew uninstall ddev
  • Get commit from nmccrory.
    • Add remote: git remote add nmccrory git@github.com:nmccrory/ddev.git
    • Fetch git fetch --all
    • Checkout git checkout nmccrory/20170912-ddev-rm
  • Build make clean && make darwin
  • Confirmed ddev version reports v0.9.1-43-g2756d26f

Testing

Active

  • Catastrophic scenario that Randy reported.
    • Setup a WordPress site using https://github.com/drud/wordpress.
    • Remove directory rm -r ~/git/wordpress
    • pwd shows ~/git/ddev
    • ddev list confirms wordpress is at status app directory missing: /Users/rmanelius/git/wordpress
    • running ddev rm wordpress2 shows the containers being stopped and then removed.
    • I verified the working directory was still in tact.

TODO

  • All other scenarios in scope of the parent issue, including rm, describe, and stop.

@rickmanelius
Copy link
Contributor

Testing

Confirmed

  • rm
    • Started a fresh wordpress site.
    • Confirmed site in ddev list
    • Move directory cd
    • Remove directory rm -rf ~/git/wordpress
    • ddev list confirms wordpress is at status app directory missing: /Users/rmanelius/git/wordpress
    • ddev rm wordpress stop and removes the containers
  • stop
    • Same prep steps as rm above
    • ddev stop wordpress shows a warning and prompts to run ddev rm wordpress although it doesn't wrap that in quotes.
  • describe
    • Same prep steps as rm above
    • ddev describe wordpress notes the status is "app directory missing: /Users/rmanelius/git/wordpress"

Copy link
Contributor

@rickmanelius rickmanelius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My manual testing worked, including the data loss scenario that @rfay identified. If he also gives a thumbs up, this is ready to merge.

Copy link
Member

@rfay rfay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Needs to use GetGlobalDdevDir(), especially for cross-platform predictability.
  • I'd rather see AppConfig get built right in the first place if that's possible.
  • I think it's really important to fix the shortsighted if !fileutil.FileExists(dir) { that allowed this to happen. We can at the very least check to see if dir is not empty; I kind of think that logic is pretty shortsighted though and can be structured better.

if fileutil.FileExists(l.AppConfig.AppRoot) {
dir = filepath.Dir(l.AppConfig.DataDir)
} else {
user, err := user.Current()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should all be using GetDdevGlobalDir(), don't reinvent the wheel.

// If it does not exist, manually set the data directory to remove.
var dir string
if fileutil.FileExists(l.AppConfig.AppRoot) {
dir = filepath.Dir(l.AppConfig.DataDir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another way to deal with this is to make sure that AppConfig.DataDir was already set correctly. I'd like that better. Could we push this farther up, to make sure that AppConfig gets populated a bit better? We're building it, right? So we can build it right? I think the question is about why it didn't get set right in the first place.

// Since the site may be used for Cleanup(), also check that AppConfig.DataDir is set.
// This ensures that the sites data can be removed with the containers.
if app.AppConfig.DataDir == "" {
dataDir := fmt.Sprintf("%s/%s", util.GetGlobalDdevDir(), app.AppConfig.Name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is siteName aka AppConfig.Name guaranteed to be populated and not empty? If it were somehow empty seems like we might delete ~/.ddev ?

Copy link
Contributor Author

@nmccrory nmccrory Sep 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see it potentially being empty if someone were to run ddev rm by itself with no site-name argument in a directory with a deleted .ddev/config.yml. Not quite sure how that would play out.

Simply because if they didn't give a site-name then we wouldn't even be able to use GetActiveAppRoot in the first place so it would likely error out.

_, err = CheckForConf(siteDir)
if err != nil {
return "", fmt.Errorf("Could not find a site in %s. Please specify a site name or change directories: %s", siteDir, err)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If no site name is given to ddev rm and the site's .ddev/config.yaml is missing, I have tested and confirmed that it will be caught here and properly error out telling the user to specify a site name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, please check in the code to make sure that siteName is never empty so there's no way we can construct a data directory of ~/.ddev

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we check to see if siteName is empty inside of GetActiveAppRoot and handle it there.

@rickmanelius
Copy link
Contributor

@rfay Wondering if this addresses your PR feedback.

Copy link
Member

@rfay rfay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are these remaining things that are pretty important to me:

  • Consolidate the handling of Down into one thing, without if/then/else. Unless there's a reason not to. I thought we had discussed it and we could just use the container-approach to down and not do any if-then-else about the config dirs.
  • Please look at where the data removal happens and put some code there so it can never, ever delete the wrong thing. I'd say look at the front of the filepath and make sure it's GlobalDdevDir, and look a the back of the filepath and make sure it's something past the /.
  • Take a look at my comment about the removal of filepath on the datadir and confirm that that removal was intentional. (Edit: I looked and it seems legit, but looks like it was sketchy before).

Let's make absolutely sure that we don't remove the user's home directory. Or ~/.ddev, or the current directory. Our --remove-data is a bit too powerful for my liking right now.

util.Warning("Could not stop site with docker-compose. Attempting manual cleanup.")
err = Cleanup(l)
if !fileutil.FileExists(l.AppRoot()) {
err := Cleanup(l)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I thought we were going to unconditionally do Cleanup() and skip the extra if/then/else stuff here, and simplify the code. Let's do that please, unless there was some reason not to do that.

@@ -837,7 +854,8 @@ func (l *LocalApp) Down(removeData bool) error {
}
}
}
dir := filepath.Dir(l.AppConfig.DataDir)
// Convenience identifier
dir := l.AppConfig.DataDir
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering why the filepath.Dir() was removed here? Is DataDir already a fully rooted path?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, took a look and it seems to me that filepath.Dir() should never have been used here right? That removes a level from the DataDir? Could end up removing the user's home directory?

Copy link
Contributor Author

@nmccrory nmccrory Sep 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filepath.Dir() was actually what was removing the tail end of the dataDir filepath because filepath.Dir() removes the last element of a filepath. For example if GlobalDdevDir was /Users/Nick/.ddev and I appended the site name to the end to make /Users/Nick/.ddev/site-name, filepath.Dir would trim that back to /Users/Nick/.ddev thus deleting every site's data. Very slippery slope and I agree that this situation needs to have an air tight seal.

func removeAllErrCheck(path string, assert *asrt.Assertions) {
err := os.RemoveAll(path)
assert.NoError(err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you had some trouble with os.RemoveAll() - Was that on windows testing on the test we had to move out? This code doesn't seem like it should be that necessary, although I don't think it does any harm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the code that Brad and I use to cleanup the directory name changes that we make to mock a missing directory in TestLocalStopMissingDirectory, TestDescribeMissingDirectory, and CleanupWithoutCompose.

@@ -17,7 +17,7 @@ const SiteNotFound = "not found"
const SiteDirMissing = "app directory missing"

// SiteConfigMissing defines the string used to denote when a site is missing its .ddev/config.yml file.
const SiteConfigMissing = ".ddev/config.yml missing"
const SiteConfigMissing = ".ddev/config.yaml missing"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, thought this would happen if .ddev was missing or docker_compose.yml was missing? Fine with me, don't think it's that important.

@rfay
Copy link
Member

rfay commented Sep 28, 2017

Oh and @beeradb please put this one on your docket for a revisit. I know you have commits in it, but still, needs your eyes. The big bug we uncovered has me spooked.

@rickmanelius
Copy link
Contributor

Looks like the build failed.

@rfay rfay self-assigned this Oct 2, 2017
nmccrory and others added 24 commits November 16, 2017 15:38
…moves application data by manually setting the site's data directory if the site is missing its directory. Test coverage included.
…. Add validation for unsafe AppConfig.DataDir values. Down() uses Cleanup() in all cases.
…Head is equal to GlobalDdevDir as tests revealed it did not meet our needs.
@rfay
Copy link
Member

rfay commented Nov 16, 2017

Handled @beeradb 's main comments, rebased against master. Will go with either implicit or explicit approval by @rickmanelius after tests are green.

Copy link
Contributor

@rickmanelius rickmanelius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to go ahead and approve because I gave a previous thumbs up and we now have essentially approvals across the board. This is a longstanding issue that I don't want to see blocked/stale, and I intend to re-review on our next major release as per usual. Any nitpicks or edge case issues can be filed as subsequent tickets versus holding up this larger PR.

@rfay rfay merged commit bdfbf02 into ddev:master Nov 16, 2017
@rfay
Copy link
Member

rfay commented Nov 16, 2017

Thanks @nmccrory for all your work on this. It was far more ambitious than anybody expected, and you hung with it a long time. Hope it feels good that it's finally in!

@rickmanelius rickmanelius added this to the v0.9.4 milestone Nov 17, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants