
drud-s3 provider implementation, fixes #998 #1005

Merged: rfay merged 36 commits into ddev:master on Aug 12, 2018

Conversation

@rfay (Member) commented Jul 22, 2018:

The Problem/Issue/Bug:

ddev-Live needs integration; it uses S3 buckets.

How this PR Solves The Problem:

Implement drud-s3 provider plugin

Remaining work:

  • Some ugly provider-type-specific mentions need to be refactored; this can be done much better.
  • AWS credentials need to be pulled into ~/.ddev; they currently live in the standard ~/.aws/credentials location.
  • Bucket Name has to be moved into config.
  • Validate environment in ValidateField
  • Use https://github.com/AlecAivazis/survey for prompting at least here.
  • Validate environment in environmentPrompt()
  • I don't think the environment prompt is actually working right; no display of available environments.
  • Checking for an existing project/site should probably probe by sitename-prefix instead of getting all available keys
  • Consider changing import.yaml to drud-s3.yaml
  • Figure out how ddev config drud-s3 --flag --flag is supposed to work; this is ugly everywhere, and in fact we added custom flags for pantheon.
  • Tests for the drud-s3 provider
  • Tests for the ddev config drud-s3 command
  • Add to ddev pull tests to support the new provider.
  • Add docs for usage
  • Make sure that AWS credentials used for tests do not allow access to writable directories.
  • Make tests for config UI work, or spin off issue to do that with survey

Review at this point should just be general "works/doesn't work" and "looks like the code is going the right/wrong direction", since there's quite a bit of work remaining.

Manual Testing Instructions:

  • Clone one of the ddev-local-tests repositories, git info in the 1password ddev vault, item called "ddev-local/ddev-live development environments"
  • ddev config drud-s3 - AWS credentials are in that same 1Password vault item.
  • Choose the bucket ddev-local-tests. Unfortunately, our AWS S3 keys allow us to see far too many buckets.
  • Any environment available there should work; they all have backups in them.
  • ddev pull
  • Look at the local project site.
  • Experiment with command-line arguments ddev config drud-s3 -h
  • Test regular ddev config and ddev config pantheon usage; ddev config has been completely refactored.

Automated Testing Overview:

  • Basic testing for the drud-s3 provider is in place
  • Testing of the UI (prompting) is not there at all, since we couldn't figure out how to do it yet.

Still needs:

  • Overall provider testing

Related Issue Link(s):

OP #998

Release/Deployment notes:

@rickmanelius (Contributor) commented:

Took this for a spin and it worked! One thing to note is that I originally used a repo called "drupal-kickstart" and I had to name the project "d7-kickstart" for it to connect. This is the same with Pantheon, so it's all good... just something worth noting in the testing criteria.

@beeradb (Contributor) left a review:

Overall this looks like a great start! I read over the code and added some comments; I'm sure most of them were known already.

The general approach here looks sound to me. Thanks for taking this on!

// used any time a field is set via `ddev config` on the primary app config, and
// allows provider plugins to have additional validation for top level config
// settings.
func (p *DrudS3Provider) ValidateField(field, value string) error {
@beeradb:

I really wonder if this should have passed all values into a single validate function instead of just doing it on individual fields. It would make validation of "combo fields" like site + environment name so much easier.

@rfay (author):

There is a single Validate() function, so in the end I just made ValidateField() do nothing and left everything to Validate().
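
For illustration, that arrangement might look like this sketch (the concrete check inside Validate() is an assumption, not the PR's code):

package ddevapp

import "fmt"

// DrudS3Provider is reduced here to the one field the sketch needs.
type DrudS3Provider struct {
	EnvironmentName string
}

// ValidateField is intentionally a no-op; per-field validation can't see
// related fields, so everything is deferred to Validate().
func (p *DrudS3Provider) ValidateField(field, value string) error {
	return nil
}

// Validate sees the whole configuration at once, which makes combo-field
// checks (site + environment) straightforward.
func (p *DrudS3Provider) Validate() error {
	if p.EnvironmentName == "" {
		return fmt.Errorf("an environment name is required")
	}
	return nil
}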

fmt.Println("\n\t- " + strings.Join(p.projectEnvironments, "\n\t- ") + "\n")
var environmentPrompt = "Type the name to select an environment to import from"
if p.EnvironmentName != "" {
environmentPrompt = fmt.Sprintf("%s (%s)", environmentPrompt, p.EnvironmentName)
@beeradb:

We've moved to https://github.com/AlecAivazis/survey for some of this stuff on the hosting side (although we haven't converted everything) and it makes things oh so much nicer.
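
For context, a survey-based environment prompt might look roughly like this sketch against the v1 API (the option values are placeholders, not from the PR):

package main

import (
	"fmt"

	survey "gopkg.in/AlecAivazis/survey.v1"
)

func main() {
	// In the provider this list would come from p.projectEnvironments.
	environments := []string{"production", "staging"}

	var environment string
	prompt := &survey.Select{
		Message: "Select an environment to import from:",
		Options: environments,
	}
	if err := survey.AskOne(prompt, &environment, nil); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("Selected:", environment)
}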


// This is currently depending on auth in ~/.aws/credentials
sess, err := session.NewSession(&aws.Config{
Region: aws.String("us-east-1")},
@beeradb:

This is probably going to have to be a first-class config setting.

@rfay (author):

Good point. It actually doesn't matter for this kind of use if I understand right; the region just declares where we query from. It would be important if we were creating buckets or that sort of thing.

//sessionLocation := filepath.Join(ddevDir, "DrudS3config.json")

// This is currently depending on auth in ~/.aws/credentials
sess, err := session.NewSession(&aws.Config{
@beeradb:

Ideally this would allow for environment variables as well I think.

@rfay (author):

For now, it's going to live in project/.ddev/import.yaml (which is gitignored).
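
For illustration, static keys from import.yaml could coexist with the SDK's default chain (which already covers environment variables and ~/.aws/credentials) along these lines; this is a sketch, not the PR's code:

package ddevapp

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/credentials"
	"github.com/aws/aws-sdk-go/aws/session"
)

// newS3Session prefers explicitly configured keys; with empty keys the SDK
// falls back to its default chain (AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY
// environment variables, then ~/.aws/credentials).
func newS3Session(accessKeyID, secretAccessKey string) (*session.Session, error) {
	cfg := &aws.Config{Region: aws.String("us-east-1")}
	if accessKeyID != "" && secretAccessKey != "" {
		cfg.Credentials = credentials.NewStaticCredentials(accessKeyID, secretAccessKey, "")
	}
	return session.NewSession(cfg)
}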

}

// Functions to allow golang sort-by-modified-date (descending) for s3 objects found.
type byModified []*s3.Object
@beeradb:

I'd probably prefer custom types at the top of the file or in their own separate file, just for visibility. That's probably just preference though.

@rfay (author):

I probably disagree, since this type is used only by the sort functions declared here. Open to better ways to do this, of course.
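
For readers following along, the sort.Interface implementation that the byModified type implies looks roughly like this (only the type declaration appears in the diff; the rest is a sketch):

package ddevapp

import (
	"sort"

	"github.com/aws/aws-sdk-go/service/s3"
)

// byModified sorts []*s3.Object by LastModified, newest first.
type byModified []*s3.Object

func (s byModified) Len() int      { return len(s) }
func (s byModified) Swap(i, j int) { s[i], s[j] = s[j], s[i] }
func (s byModified) Less(i, j int) bool {
	return s[i].LastModified.After(*s[j].LastModified)
}

// newestObject returns the most recently modified object, or nil if none.
func newestObject(objects []*s3.Object) *s3.Object {
	if len(objects) == 0 {
		return nil
	}
	sort.Sort(byModified(objects))
	return objects[0]
}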

// find what projects are available in the bucket and then to get the
// environments for that project.
func getDrudS3Projects(client *s3.S3, bucket string) (map[string]map[string]bool, error) {
objects, err := getS3ObjectsWithPrefix(client, bucket, "")
@beeradb:

I believe if you include the prefix here then you shouldn't have to process all items in the bucket.
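
In sketch form, a prefix-filtered listing with the v1 SDK looks like this (single page only; pagination is discussed in a later thread):

package ddevapp

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
)

// getS3ObjectsWithPrefix lets S3 filter keys server-side by prefix,
// so the whole bucket never has to be scanned client-side.
func getS3ObjectsWithPrefix(client *s3.S3, bucket, prefix string) ([]*s3.Object, error) {
	result, err := client.ListObjects(&s3.ListObjectsInput{
		Bucket: aws.String(bucket),
		Prefix: aws.String(prefix),
	})
	if err != nil {
		return nil, err
	}
	return result.Contents, nil
}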

if (len(components)) >= 2 {
tmp := make(map[string]bool)
tmp[components[1]] = true
projectMap[components[0]] = tmp
@beeradb:

nit: I found this pretty hard to follow by just reading the code. Structs are cheap, and field names can sometimes help with code like this, for instance projectMap["drud.io"].Environments["production"] = true. Not a big deal though; it just took me a minute to parse what was happening.
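
A sketch of the struct-based shape being suggested (projectInfo and addEnvironment are hypothetical names):

package ddevapp

// projectInfo names the inner map, so call sites read as
// projects["drud.io"].Environments["production"] = true.
type projectInfo struct {
	Environments map[string]bool
}

// addEnvironment records that an environment exists for a project,
// creating the project entry on first sight.
func addEnvironment(projects map[string]*projectInfo, project, environment string) {
	info, ok := projects[project]
	if !ok {
		info = &projectInfo{Environments: make(map[string]bool)}
		projects[project] = info
	}
	info.Environments[environment] = true
}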

rfay force-pushed the 20180721_drud_s3_provider branch 2 times, most recently from e80ba26 to 747ce7b (July 24, 2018).
@rickmanelius (Contributor) commented:

Hi @rfay. I took another spin on this...

  • I love the improvements on the interactive prompts and password masking for the S3 private key.
  • I was initially confused by the bucket name, only because our test key seems to have access to a few more buckets than I was expecting, but I figured it out. I think the dropdown is a nice touch.
  • The selection of environments is way better/easier. Big 👍
  • I confirmed the .gitignore blocks import.yml (will it be renamed to drud-s3.yml?), and that's great.
    • We may also want to put a big scary warning not to commit the keys, for security purposes, but that's probably too much handholding.
  • I confirmed the files and db downloads. I confirmed a cached version is used by running multiple ddev pulls, and occasionally removed a file in the global ~/.ddev/drud-s3/ to verify it re-downloads when the file is missing.

Overall this is a 👍

@andrewfrench (Contributor) commented:

Just tried this out. Survey is very cool; there are several places where we should be using this!

I was able to:

  • Clone the remote repo
  • Run ddev config and set up AWS credentials
  • Select a bucket (I agree with Rick, I found this part confusing)
  • ddev pull
  • Access the local site

Overall, very cool and a fairly seamless experience. Let me know when the code has stabilized and I'll jump on that.

rfay force-pushed the 20180721_drud_s3_provider branch 2 times, most recently from 4407515 to becf316 (July 30, 2018).
@rickmanelius (Contributor) commented:

BTW, it looks like a lot of progress has been made (based on the commit history and updates to the issue summary). I'm happy to hold off on a second-round review until you're ready, unless there is enough change to warrant a quick test. Not pushing, just offering :)

rfay force-pushed the 20180721_drud_s3_provider branch 3 times, most recently from a853209 to 2f654e4 (July 31, 2018).
@rfay (author) commented Jul 31, 2018:

I think this is ready for review. I completely refactored ddev config into 3 files, with mostly shared code. I think it's a major advance, as the old cmd/config code was just pure spaghetti.

I will still add a mention in the docs, and hope to follow up with the maintainers of survey and go-expect to see if maybe we can figure out how to test.

@rfay (author) commented Aug 1, 2018:

On second thought, please hold off on testing this. I found the very simple https://github.com/Bowery/prompt, which looks like it may do the job for us. Not as slick, but good enough; it seems to work on Windows and looks like it will be testable.

@rfay (author) commented Aug 1, 2018:

I tried almost all of the prompting packages at https://golanglibs.com/search?q=prompt&sort=top and didn't succeed in finding one that:

  • Worked on Windows in both git-bash and cmd
  • Tested successfully
  • Tested successfully on Windows

So I went ahead and wrapped our traditional technique in a new util.Prompt(), added tests for that, etc. This is not as pleasant as survey was, but it's much simpler code, and it actually works and is testable.

It certainly may be that next time around we/I will do better figuring out the testing with one of the other packages; the problem seems to be understanding golang terminal behavior and stdio.
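
The thread doesn't show util.Prompt() itself; as a rough idea, a wrapper around the traditional technique could be as simple as this sketch (the real signature may differ):

package util

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// Prompt prints a question with a default value, reads a line from stdin,
// and returns the default when the user just hits enter.
func Prompt(question, defaultValue string) (string, error) {
	fmt.Printf("%s (%s): ", question, defaultValue)
	answer, err := bufio.NewReader(os.Stdin).ReadString('\n')
	if err != nil {
		return "", err
	}
	answer = strings.TrimSpace(answer)
	if answer == "" {
		return defaultValue, nil
	}
	return answer, nil
}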

@rickmanelius (Contributor) commented:

Thank you @rfay! I'll get this tested with feedback today. Sorry about the testing behavior on the terminal, but better to go forward with the tried and true.

@rickmanelius (Contributor) left a review:

Preparation

  • git clone 4081072d
  • make darwin
  • ddev version => v1.0.0-40-g4081072d

Reference

Testing

  • Located "ddev-local/ddev-live development environments" entry.
    • Observed entries for D7 Kickstart, D8 Umami, and WordPress 4
  • Reviewing documentation
    • ddev config drud-s3 --help
      • LOL'd at "rweeMAGICSECRET"
      • Observed 4 flags that were added beyond the standard ddev config
        • --access-key-id string => drud-s3 only: AWS S3 access key ID
        • --bucket-name string => drud-s3 only: AWS S3 bucket
        • --environment string => Choose the environment for a project (production/staging/etc)
        • --secret-access-key string => drud-s3 only: AWS S3 secret access key
      • Observed the examples and they look sane.
        • Question: could we not put in a read-only bucket with a legitimately working command that someone could run a test against?
  • d7-kickstart
    • Run git clone
      • Observed no existing .ddev folder or settings*.php files
    • Run ddev config drud-s3
      • Answered first 3 stock items
      • Just hit "enter" through the access key id and secret access key.
      • Observed
        • (correct) failure of "There was a problem configuring your project: unable to list buckets: unable to list S3 buckets: EmptyStaticCreds: static credentials are empty"
        • No .ddev folder or config was produced!
    • Re-run ddev config drud-s3
      • Entered in a valid key id
      • Entered in a blank secret id
      • Observed
        • The same (and correct) failure of "There was a problem configuring your project: unable to list buckets: unable to list S3 buckets: EmptyStaticCreds: static credentials are empty"
        • No .ddev folder or config was produced!
    • Re-run ddev config drud-s3
      • Noted that there was no access key id because there was no config used from last time.
      • This time I entered in a value of 'a' for both and observed this error: "There was a problem configuring your project: unable to list buckets: unable to list S3 buckets: InvalidAccessKeyId: The AWS Access Key Id you provided does not exist in our records. status code: 403, request id: D3BC291964087D87, host id: bzwIga1ZKexfVxvbGSJMDGMkYcVUrILJq7iSlmayT77Zrd2SDXpdD7TT9R4tW6uaphqUyFRWDoI="
    • Re-run ddev config drud-s3
      • I entered the correct id/secret
      • Observed the list of buckets in a long string.
        • Noted "ddev-local-tests" but tested just hitting enter to see what happens.
      • Observed "There was a problem configuring your project: could not getDrudS3Projects: there are no objects matching in bucket"
    • Re-run ddev config drud-s3
      • Answered all the questions correctly
      • Set production as my target
      • Observed .ddev folder
      • Observed .ddev/config.yml with the 4 values
      • Observed a .gitignore file that excludes import.yml and states "#This file is created and managed by ddev"
    • Re-run ddev config drud-s3
      • Noted that I'm not prompted for my S3 keys again. Nice!
        • Could potentially explain how to change them there in the file or delete the file to specify new credentials. Probably not necessary given our use case here is a bit more limited.
      • Observed "Configuration complete. You may now run 'ddev start'."
        • Should we also state a ddev pull here as a valid option?
    • Run ddev start
      • Observed a valid install screen
    • Run ddev pull
      • Responded 'yes' to prompt of replacing the files and db.
      • Observed "Downloaded file /Users/rmanelius/.ddev/drud-s3/d7-kickstart/db-2018-08-02T00-10-16_UTC_D.sql.gz (749586 bytes)"
      • Observed "Downloaded file /Users/rmanelius/.ddev/drud-s3/d7-kickstart/files-2018-08-02T00-10-16_UTC_D.tar.gz (5273973 bytes)"
      • Verified both downloads are where they claimed to be.
      • Verified the site was working
      • Verified I could login with the admin credentials
    • CONFIRMED
  • WordPress
    • Same experience as D7 Kickstart
    • I do note the appropriate warning of "Wordpress sites require a search/replace of the database when the URL is changed. You can run "ddev exec wp search-replace [http://www.myproductionsite.example] http://wordpress4.ddev.local" to update the URLs across your database. For more information, see http://wp-cli.org/commands/search-replace/".
      • Not sure if we can automate that, because we might not know the URL it came from as well as the local URL.
    • CONFIRMED
  • Drupal 8 Umami
    • Same experience as D7 Kickstart
    • CONFIRMED

@andrewfrench (Contributor) left a review:

This is still marked as WIP, so no full review just yet; only a couple of notes on function usage.

}

// Find out if flags have been provided
flagsProvided := false
@andrewfrench:

flagSet.NFlag() returns the number of flags that were set, so I think this can be simplified into something like

if cmd.Flags().NFlag() == 0 {
  // prompt for config workflow
}

err := ioutil.WriteFile(gitignoreFile, gitignoreContent, 0644)
if err != nil {
return fmt.Errorf("Failed to write .ddev/.gitignore file: %v", err)
}
@andrewfrench:

There's a createGitIgnore() function in pkg/ddevapp/utils.go, perhaps that could be used here? I agree that adding the "created and managed by ddev" signature would be a good idea.

@andrewfrench (Contributor) left a review:

A few points regarding S3 interactions and some pre-existing functions.

Bucket: aws.String(bucket),
Prefix: aws.String(prefix),
MaxKeys: maxKeys,
})
@andrewfrench:

I don't think this is the best way to confidently get all keys here. As you noted above, this will return a maximum of 1000 keys, and a MaxKeys value over 1000 will always be ignored.

S3's ListObjectsOutput struct has a NextMarker *string field whose value can be passed into ListObjectsInput's Marker *string field to get the next <1000 objects. You can inspect *ListObjectsOutput.IsTruncated after each request to see if S3 indicates that there are more objects to return. You'll also need to pass in the Delimiter field, which in our case looks like the usual /.

@rfay (author):

I'm ok with adding this, and of course you're correct. OTOH, the upstream project is supposed to prune to a few weeklies and a few dailies. Do you think it's worth going ahead and adding the (hard-to-test) loop?

@andrewfrench:

If it's truly never expected to come up, then that's totally fair. Given that assumption, though, I think the way MaxKeys is currently defined is a bit confusing, and there may be a reasonable middle ground where we just simply check if S3 indicates that more results are available, then warn or fail on that.

@rfay (author):

I will do the loop. My fear was that I'd write code that didn't get executed enough and would be basically untested. But silly me, I realized I can make the max low and it will be exercised regularly. Or at least during testing.
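
A sketch of such a loop (the merged version in 8563d8c may differ); keeping MaxKeys deliberately small means the loop body is exercised during ordinary test runs:

package ddevapp

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
)

// getAllS3Objects pages through ListObjects results until S3 reports the
// listing is complete, so buckets with more than one page of keys are
// handled correctly.
func getAllS3Objects(client *s3.S3, bucket, prefix string) ([]*s3.Object, error) {
	var objects []*s3.Object
	input := &s3.ListObjectsInput{
		Bucket: aws.String(bucket),
		Prefix: aws.String(prefix),
		// Small page size so the pagination path is exercised in testing.
		MaxKeys: aws.Int64(100),
	}
	for {
		page, err := client.ListObjects(input)
		if err != nil {
			return nil, err
		}
		objects = append(objects, page.Contents...)
		if page.IsTruncated == nil || !*page.IsTruncated {
			return objects, nil
		}
		// NextMarker is only populated when a Delimiter is set; otherwise
		// the last key of this page is the marker for the next request.
		if page.NextMarker != nil {
			input.Marker = page.NextMarker
		} else {
			input.Marker = page.Contents[len(page.Contents)-1].Key
		}
	}
}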

func getLatestS3Object(client *s3.S3, bucket string, prefix string) (*s3.Object, error) {
// TODO: Manage maxKeys better; it would be nice if we could just get recent, but
// AWS doesn't support that.
maxKeys := aws.Int64(1000000000)
@andrewfrench:

Same note as above regarding this value being ignored if over 1000 and iterating through results if more are returned.

While S3 doesn't explicitly let you sort by most recent, it does return an alphanumerically sorted set of keys. If we were able to stick to a very strict naming convention (like always begin a key with a timestamp) and guarantee we didn't break that, we could rely on S3's sorted output, but I think that would probably be too fragile and it's best to sort the output ourselves.

rfay force-pushed the 20180721_drud_s3_provider branch from cdd40db to a120f16 (August 8, 2018).
rfay force-pushed the 20180721_drud_s3_provider branch from a378c5a to 8563d8c (August 11, 2018).
@rfay (author) commented Aug 12, 2018:

@andrewfrench, glad you pushed me to do better on the S3 objects query; I think this is a lot better: 8563d8c

rfay merged commit 39aa1a8 into ddev:master on Aug 12, 2018.
@rickmanelius (Contributor) commented:

This is closed, but I note that the AWS credentials used for testing still need to be verified as read-only. Can that be confirmed?

rfay deleted the 20180721_drud_s3_provider branch on May 18, 2020.