TechDocs: Deleted pages continue to show up when using an external storage #6132

Closed
OrkoHunter opened this issue Jun 21, 2021 · 10 comments · Fixed by #6488
Labels: area:techdocs, bug, good first issue, help wanted

Comments

@OrkoHunter
Member

OrkoHunter commented Jun 21, 2021

Feature Suggestion

When a page is deleted in the markdown source, the corresponding TechDocs site should reflect that the page is not available.

As of now, when using any of the cloud storage publishers (GCP, AWS, etc.), their publish method only uploads generated files into the bucket. This ensures that newly created pages are added and existing pages are updated, but it does not ensure that removed pages get deleted from the bucket.

Possible Implementation

Maybe delete the particular TechDocs site's folder before uploading.
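
For illustration only (not the actual publisher code): a minimal sketch of that idea against a GCS bucket, assuming the site's objects live under a common prefix. The bucket, prefix, and file-map names here are illustrative assumptions. Note that between the delete and the upload the site would have no objects at all.

```ts
// Hypothetical sketch: wipe the site's "folder" (prefix), then upload the new build.
import { Storage } from '@google-cloud/storage';

async function wipeThenUpload(
  bucketName: string,
  sitePrefix: string,
  localFiles: Map<string, string>, // localPath -> destination object name
) {
  const bucket = new Storage().bucket(bucketName);

  // Delete every object under the site's prefix.
  const [existing] = await bucket.getFiles({ prefix: sitePrefix });
  await Promise.all(existing.map(file => file.delete()));

  // Upload the freshly generated files. Until this completes, the site is empty.
  await Promise.all(
    [...localFiles].map(([localPath, destination]) =>
      bucket.upload(localPath, { destination }),
    ),
  );
}
```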

OrkoHunter added the bug, good first issue, help wanted, and area:techdocs labels Jun 21, 2021
OrkoHunter added this to the [TechDocs] Beta release milestone Jun 21, 2021
@Fox32
Contributor

Fox32 commented Jun 21, 2021

One problem with cleaning the folder first is that TechDocs might not work during the upload. Maybe we could have some kind of versioning where the versions are uploaded to separate folders and we just change the reference to the latest one. This would also allow travelling back to older versions of a site.

@andrewthauer
Collaborator

S3 has the ability to also delete files on the target, which might address this issue there. Not sure if the other storage options have this capability as well.

soapraj moved this from Incoming to To Do 📝 in TechDocs project board Jul 5, 2021
ottosichert self-assigned this Jul 12, 2021
ottosichert moved this from To Do 📝 to In Progress 🚧 in TechDocs project board Jul 13, 2021
@ottosichert
Contributor

I think we should choose an approach that is platform-agnostic. This keeps open the possibility of integrating other external storage providers with a minimal set of operations:

  • upload
  • delete
  • link
  • rename

I'm not sure if we should keep the whole history though, so I would propose the following approach:

  1. Build assets for my-site
  2. Abort if building was unsuccessful
  3. Upload assets to external storage with a unique suffix, e.g. the commit hash: my-site-a1b2c3
  4. If my-site is a static folder, move it to my-site-pending-deletion and create a link from my-site
  5. Point my-site link to my-site-a1b2c3
  6. Perform smoke test on my-site
  7. Rollback link and abort if test was unsuccessful
  8. Delete old folder

What do you think?

@OrkoHunter
Member Author

@ottosichert Interesting! When you say create a link from 'my-site' folder to 'my-site-<new>', do you mean a symlink?

I am just curious if this is supported for cloud storage providers. This says GCS doesn't support it.

@iamEAP
Member

iamEAP commented Jul 13, 2021

Yeah, echoing @OrkoHunter. This is conceptually solid, but I think the problem is going to be that we'll still ultimately be limited to working on individual objects by the various cloud providers' APIs (meaning, spotty support for folder-level operations).

@andrewthauer
Collaborator

andrewthauer commented Jul 13, 2021

I think you would likely need to delete and then recreate the entire structure to get the most breadth. A rename could work as well (if supported), but would end up being a swap with possibly 3 operations. Both cases could lead to an inconsistent state if there are failures in between. However, that could be just as likely for any copy/sync, except perhaps through additional logic errors introduced by added orchestration.

Why not support sync where possible and then fall back to something like a destroy / create?

As for storing history, I don't see the point. The docs are source controlled already, so this just takes up more cloud storage for something that can be recreated from git history. It might be useful at some point to support versioned docs though. However, that seems to serve a different use case imo.

@ottosichert
Contributor

Thanks for the input! I would argue though that it is preferable to use only one approach almost everywhere to reduce complexity.

So, replacing the link with a swap-rename operation, we could opt for these steps:

  1. Build assets for my-site
  2. Abort if building was unsuccessful
  3. Upload assets to external storage with a custom suffix, e.g. my-site-pending-deployment
  4. Rename folder my-site to my-site-pending-deletion
  5. Rename folder my-site-pending-deployment to my-site
  6. Perform smoke test on my-site
  7. Delete my-site-pending-deletion if test was successful, otherwise delete my-site and rename my-site-pending-deletion to my-site

I actually like that approach, as we no longer need to check for backwards compatibility; it is already included in the happy case.

Also, to make manual intervention easier in case of failures, the folder naming should be clear at every step of the above. Let me know if you have a suggestion to improve it further!
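
To make the flow concrete, here is a rough sketch over a hypothetical storage client that exposes prefix-level upload/rename/delete helpers. Those helpers are assumptions for illustration; as noted in the next comment, most object stores don't offer folder renames as first-class operations.

```ts
// All helpers below are hypothetical; folder/prefix names follow the steps above.
interface SiteStorage {
  uploadDir(localDir: string, prefix: string): Promise<void>;
  renamePrefix(from: string, to: string): Promise<void>;
  deletePrefix(prefix: string): Promise<void>;
}

async function publishWithSwap(
  storage: SiteStorage,
  localDir: string,
  site: string,
  smokeTest: () => Promise<boolean>,
) {
  const pendingDeployment = `${site}-pending-deployment`;
  const pendingDeletion = `${site}-pending-deletion`;

  await storage.uploadDir(localDir, pendingDeployment); // step 3
  await storage.renamePrefix(site, pendingDeletion); // step 4
  await storage.renamePrefix(pendingDeployment, site); // step 5

  if (await smokeTest()) { // step 6
    await storage.deletePrefix(pendingDeletion); // step 7, happy path
  } else {
    await storage.deletePrefix(site); // step 7, rollback
    await storage.renamePrefix(pendingDeletion, site);
  }
}
```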

@iamEAP
Member

iamEAP commented Jul 13, 2021

I think you're still going to run into an issue during implementation that's a result of folders not being a first-class concept in most cloud storage providers: folders are just UI magic on top of a flat list of objects.

In order to implement a rename folder operation, you'll have to implement a way to list all objects prefixed with the "folder" anyway (to then iterate over and rename each specific object).

And if you have that list, you could just as easily load and stash it up front (as existingFiles or something), upload the built files like we do today, then compare the list of uploadedFiles with existingFiles; anything that was in existingFiles but not in uploadedFiles can safely be deleted.
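
In other words (a minimal sketch; the existingFiles / uploadedFiles names come from the comment above and are illustrative):

```ts
// Objects that were in the bucket before the upload but are not part of the
// new build can safely be deleted once the upload has finished.
function staleObjects(existingFiles: string[], uploadedFiles: string[]): string[] {
  const uploaded = new Set(uploadedFiles);
  return existingFiles.filter(objectName => !uploaded.has(objectName));
}
```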

@ottosichert
Contributor

cc @camilaibs @iamEAP

While taking a closer look at the rename/move operations, it seems they are still performed on a file-by-file basis, and hence would incur downtime on the target TechDocs site while publishing.

Hence we decided to go for a different approach:

  • Retrieve a list of files of the previous build (currently existing in the bucket)
  • Upload and overwrite all new files (effectively merging both versions)
  • Delete stale files that are not present in the current build

Draft PR: #6488
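
A minimal sketch of that flow against a GCS bucket (helper, prefix, and file-map names are illustrative assumptions; the actual implementation lives in the draft PR above):

```ts
import { Storage } from '@google-cloud/storage';

async function publishAndPrune(
  bucketName: string,
  sitePrefix: string,
  builtFiles: Map<string, string>, // localPath -> destination object name
) {
  const bucket = new Storage().bucket(bucketName);

  // 1. List what the previous build left in the bucket.
  const [previous] = await bucket.getFiles({ prefix: sitePrefix });
  const previousNames = previous.map(file => file.name);

  // 2. Upload and overwrite everything from the current build; nothing is
  //    removed yet, so the site stays serveable throughout.
  await Promise.all(
    [...builtFiles].map(([localPath, destination]) =>
      bucket.upload(localPath, { destination }),
    ),
  );

  // 3. Delete stale objects: present before, absent from the current build.
  const uploaded = new Set(builtFiles.values());
  const stale = previousNames.filter(name => !uploaded.has(name));
  await Promise.all(stale.map(name => bucket.file(name).delete()));
}
```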

TechDocs project board automation moved this from In Progress 🚧 to Done ✅ Aug 10, 2021