Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GitLab as a sync backend #734

Merged
merged 19 commits into from
Nov 1, 2021
Merged

Conversation

chasecaleb
Copy link
Contributor

@chasecaleb chasecaleb commented Oct 28, 2021

Implements #733.

Notes:

  • I plan on rebasing to clean up my commits once fully implemented, but for now I made a draft to get feedback. Done
  • This only works for https://gitlab.com, not self-managed instances. Adding support for self-managed instances would be possible in a future PR if someone wants to implement it, but I don't want to myself. You would have to add inputs at sign-in to allow the user to provide domain and OAuth client ID + secrets. You would also need to provide/link to instructions for registering an OAuth application on their instance.

Additional work required before merging this:

@munen: You'll need to create an OAuth application with your GitLab account. I've been testing it with an application tied to my account, but I don't think anyone wants that in prod since I'm not a maintainer 馃槅. See instructions for a user-owned application and use the following settings:

  • Name: Organice (or whatever you want)
  • Redirect URI: TBD depending on how we handle project selection. I'll update this once determined.
  • Confidential: uncheck, because this is a client-side SPA
  • Expire access tokens: keep checked
  • Scopes: api only. This is unfortunately quite broad, but it's the best we can do. You might want to subscribe to notifications on https://gitlab.com/gitlab-org/gitlab/-/issues/22115 in case GitLab ever gets around to allowing project-specific OAuth tokens.

Once you have the application created, set REACT_APP_GITLAB_CLIENT_ID and REACT_APP_GITLAB_SECRET in .env.

@chasecaleb
Copy link
Contributor Author

@munen This is clearly still WIP, but I have OAuth working and getting ready to start on the actual sync backend implementation. However I'd like your feedback on how to handle selecting a GitLab project.

Ideas:

  • Put an input field on the sign in page for the user to provide the project URL, then make an API call to fetch the project ID. For example, my Emacs config URL is and https://gitlab.com/chasecaleb/emacs.d project ID is 18505842. This would be similar to how WebDAV provides input fields during sign on.
  • Instead of making user pick during sign in, make the GitLab implementation of getDirectoryListing show a list of projects for the root directory. So for instance a path of /foo/bar.org would correspond to file bar.org in the root of foo project.
  • Add a new page after OAuth sign in before redirecting back to / (App.js). So it would work like this:
    1. User clicks sign in
    2. User clicks GitLab logo
    3. App redirects user to GitLab to approve OAuth request
    4. GitLab redirects user to /sign_in/gitlab (or a different URL if you prefer)
    5. Users picks from a list of their projects. This would be styled similarly to the directory list view I imagine.
    6. App saves the project choice to localStorage and redirects user to /.

My opinion on the ideas:

  • Input field for the project URL: easiest to implement/maintain and what I would do if this was only for my own use. However I'm not sure how you feel about the UX.
  • Treating projects as top-level directories: "elegant" and has the advantage of letting users switch between projects without signing out and back in, but I don't think anyone has org files scattered across multiple repos. The potential for accidentally modifying the wrong project (whether due to a bug or user mistake due to e.g. a misconfigured capture template) is a significant concern though, so I think this is a bad idea.
  • Adding a new page: basically a more polished UX alternative to the input field idea. I'd rather just go with the input field for the sake of saving time, but I don't mind implementing this option if you have a strong preference.

Thoughts?

@munen
Copy link
Collaborator

munen commented Oct 29, 2021

Hi @chasecaleb

I have checked out your PR, created an application on Gitlab and confirmed that signing in works as described.

This only works for https://gitlab.com, not self-managed instances.

馃憤

Thank you for documenting the path forward for a potential future person interested in adding support for self-managed instances.

... I'd like your feedback on how to handle selecting a GitLab project.

I like option 1 very much for the same reasons you have addressed. In fact, those thoughts came up, I made a mental note to comment on them and then saw in the next sentence that you've thought about it yourself. It was quite a satisfactory read馃槉

So far, I have only glanced at the code, but it seems you've found all the right spots and put some nice stubs there.

Really good job, I'm looking forward to your next ping馃殌馃檹

@munen
Copy link
Collaborator

munen commented Oct 29, 2021

@chasecaleb The failed ci/circleci: build-docs task has no footing in your PR and will turn green once you rebase on master. I have just merged a fix for that.

As opposed to always showing once clicked.

Preparatory work for 200ok-ch#733
Make them usable for any sign in form (such as the WIP GitLab form), as
opposed to specific to WebDAV.

Preparatory work for 200ok-ch#733
Version control makes this redundant.

For 200ok-ch#733
@chasecaleb chasecaleb marked this pull request as ready for review October 31, 2021 04:26
Copy link
Contributor Author

@chasecaleb chasecaleb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@munen I'm done implementing this, so it's ready for your review. I added some review comments already, too. Let me know if there are any changes you'd like to see; I'm a software engineer by day so I'm used to iterating on PRs.

src/actions/sync_backend.js Show resolved Hide resolved
.env.sample Show resolved Hide resolved
src/components/SyncServiceSignIn/index.js Show resolved Hide resolved
};

let cachedDefaultBranch;
const getDefaultBranch = async () => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using the API to determine the default branch (e.g. master or main, typically), because I think that's what >99% of users are going to want for their org files. Sound reasonable?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. What if the user changes their default branch and it's still cached? Then, the subsequent API actions would yield errors like "not found" and the user have to logout/login, right?

Copy link
Collaborator

@munen munen Oct 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My 2c: Lots of tools fail when the repo admin changes the default branch. When some people switched to main, we've seen errors in npm packages, Clojure modules, CI and so forth and so on.

Personally, I'd cache the branch instead of making a conditional API request ahead of every other API request. If the user changes the default branch they can re-login. On the other hand, switching default branches might even just be a subset of branch related issues for particular users. For those, we could add a dropdown in the settings where she can select the branch she wants to work on.

Since the first option is easier, I'd personally go with that one and leave the latter to the user who wants to switch branches (or change default branches).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the user changes their default branch and it's still cached? Then, the subsequent API actions would yield errors like "not found" and the user have to logout/login, right?

Not quite. The cachedDefaultBranch is just a local variable scoped to the client object. It isn't persisted to localStorage or anything, so all the user has to do is refresh the page. Since this is a SPA that means it is only fetched once so long as they keep the browser tab open, don't refresh the page, and the browser doesn't unload the page due to low memory.

Personally, I'd cache the branch instead of making a conditional API request ahead of every other API request. If the user changes the default branch they can re-login.

@munen I am caching it, but only in a local variable instead of persisting to storage -- see previous paragraph. It seems like a good compromise of avoiding too many API calls without causing edge cases from persisting it long-term like what @schoettl was getting at. I would prefer to avoid persisting to localStorage until we add a dropdown in settings like you mention, out of concern for edge cases that are hard to convey to the user.

If that explanation doesn't make sense, you might check out the network requests tab of your browser's dev tools to see what I mean.

So does leaving this the way it is currently sound good, or are you suggesting a different form of caching?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So does leaving this the way it is currently sound good, or are you suggesting a different form of caching?

Works for me! I didn't say so explicitly, but when I said "I'd cache the branch" I already did want to agree with your existing strategy which I saw from the code. I could have been clearer in that.

In any case, kudos to the original implementation and thank you for the additional explanation for both of us 馃檹

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schoettl Will you resolve this conversation if you're fine with the current implementation, please?

src/util/settings_persister.js Outdated Show resolved Hide resolved
src/actions/org.js Outdated Show resolved Hide resolved
@munen
Copy link
Collaborator

munen commented Oct 31, 2021

@chasecaleb First of all, congratulations on this PR. From a quick glance it looks really good!

Before I dig into the code and your comments, I've pulled the code and tried it out. Logging in and browsing my files worked fine. When I made a change, this is what I got:

image

When I debug it, this is what's set for lastServerModifiedAt:

image

The lastServerModifiedAt is used to determine whether or not the last sync was earlier or later than the changed file in src/actions.org.js.

@munen
Copy link
Collaborator

munen commented Oct 31, 2021

I've tested the app locally and went over all comments. I have resolved the comments where appropriate.

Before digging into the code, let's continue discuss/resolve the open questions.

In any case, this looks very promising and I'm looking forward to using this new feature! 馃檹

@chasecaleb
Copy link
Contributor Author

馃う I fixed the lastModifiedDate issue. Not sure how I didn't notice sooner, but I was returning a Date object instead of a string.

@chasecaleb
Copy link
Contributor Author

I believe I've address all of the review issues now, let me know if you see anything else.

@munen
Copy link
Collaborator

munen commented Nov 1, 2021

I fixed the lastModifiedDate issue. Not sure how I didn't notice sooner, but I was returning a Date object instead of a string.

Confirmed. After pulling, the new sync back-end works as expected! 馃殌

@@ -32,6 +55,14 @@ const debouncedPushConfigToSyncBackend = _.debounce(
alert(`There was an error trying to push settings to your sync backend: ${error}`)
);
break;
case 'GitLab':
// INFO: Not calling syncBackendClient.createFile is a
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment might be misleading, because if you look at the updateConfigForGitlab function the main reason that it exists is actually to determine whether to call updateFile() or createFile(). See this comment (which maybe should be promoted to a jsdoc string on the function instead of an inline comment inside it):

// GitLab doesn't allow updating a file that doesn't exist or creating one that already exists, so
// need to figure out whether or not it exists.

I think it would make more sense to add the #736 link to this comment:

// Also avoid a pointless empty commit if file didn't actually change.

Copy link
Collaborator

@munen munen Nov 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did read the code. That's why I've annoated some docs. It's much appreciated that you've documented your rationale so well! 馃檹

I appreciate that GitLab doesn't support creating a file if it doesn't exist. That's not a reason to not adhere to the strategy pattern for sync-backends, though. The reason there's a strategy pattern is exactly that every sync back-end differs in the strategy. However, they all implement the same functionality and in fact even functions. Without this strategy pattern, it would quickly become very unwieldy to support multiple back-ends as their code would differ in imaginative and complicated ways.

createFile for GitLab could do the same sanity check than updateConfigForGitlab does, could it not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate that GitLab doesn't support creating a file if it doesn't exist.

Correct me if I'm wrong, but I assume you meant "doesn't support updating a file if it doesn't exist".

The reason there's a strategy pattern is exactly that every sync back-end differs in the strategy.

Definitely agreed, and to your point I agree it's unfortunate that there are so many switch statements around backend client calls that make this a leaky abstraction -- another example is the Google Drive backend having three arguments for createFile() instead of two.

createFile for GitLab could do the same sanity check than updateConfigForGitlab does, could it not?

Yes, although that makes me confused about the intended strategy interface. It seems like since there are separate createFile and updateFile functions that the GitLab implementation is "correct", because otherwise the strategy interface would have a single createOrUpdateFile function instead of distinct create vs update functions. I realize the other backends allow updating an existing file via createFile() but that seems like misuse of a leaky abstraction as opposed to intended behavior. I'm only basing this off my own inference while looking at the code base since there isn't a spec/contract for the strategy, so I might be way off base.

In theory, it seems like the correct implementation of debouncedPushConfigToSyncBackend should be something like one of these two options (rough and untested, just for illustration):

// Pre-emptively figure out whether create or update is the right call to make. This makes sense
// if there's an easy way to track this via local state, but I'm not sure how straightforward
// that would be. If the only way to determine if fileExists is by something like
// syncBackendClient.getFileContents(..) then I think the other example is better.
const debouncedPushConfigToSyncBackend = _.debounce(
  (syncBackendClient, contents) => {
    const fileExists = figureThisOut();
    let promise;
    if (fileExists) {
      promise = syncBackendClient.updateFile('/.organice-config.json', contents);
    } else {
      promise = syncBackendClient.createFile('/.organice-config.json', contents);
    }
    promise.catch((error) =>
      alert(`There was an error trying to push settings to your sync backend: ${error}`)
    );
  },
  1000,
  { maxWait: 3000 }
);

// Try updating first because the file is going to exist most of the time this is called, then
// fall back to create if it fails.
// This is likely a better idea than the other example.
const debouncedPushConfigToSyncBackend = _.debounce(
  (syncBackendClient, contents) => {
    syncBackendClient.updateFile('/.organice-config.json', contents).catch(() => {
      syncBackendClient
        .createFile('/.organice-config.json', contents)
        .catch((error) =>
          alert(`There was an error trying to push settings to your sync backend: ${error}`)
        );
    });
  },
  1000,
  { maxWait: 3000 }
);

Unfortunately the Google Drive backend breaks the strategy interface completely with the way it takes an extra 'root' argument, so that would have to be changed too, which would require a non-trivial amount of work I imagine. So to be clear I'm not suggesting we make a change like this right now, although in the future it could be a beneficial (albeit likely low priority) improvement.


Anyways, all of that was a bit of a tangent to the original point I was trying to make. I think we're talking past each other because there's two issues going on in this section of code:

  1. The redundant config syncs issue in Redundant config syncs聽#736
  2. The leaky abstraction issue that you just pointed out (here, for clarity) and I was responding to in the first half of this message.

What I meant in my original post in this thread was that maybe the code comments should be rearranged a little to clarify which bit of code is dealing with which of the two concerns. I'm not sure if I'm conveying my point very well, so I made a commit to illustrate what I originally meant about rearranging the comments: 53455e3 . I don't mind dropping the commit if you disagree.

Also I think I'm turning a molehill of a comment into a mountain here, which wasn't my intent. If I'm still not making sense or you disagree then let's just drop the commit that I added and leave it the way you had it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong, but I assume you meant "doesn't support updating a file if it doesn't exist".

Typo. Yes.

another example is the Google Drive backend having three arguments for createFile() instead of two.

True. Everything Google Drive is unfortunate (of course I mean only in organice cough). We've been waiting for aeons and I've personally put a lot of effort into trying to get Google to enable production mode for the Drive App. So far without success. There's other issues with the Drive integration, too. That it is now also in the way of creating a more solid sync back-end abstraction only compounds the already huge issue around Google Drive.

So the next step is either a complete cleanup and getting the Drive app to work or just to remove the code. Any contributor who runs into this the next time is welcome to remove it. I'm not aware that any maintainer is using it. If somebody else wants to step up and fix it, they can still do it.

Yes, although that makes me confused about the intended strategy interface. It seems like since there are separate createFile and updateFile functions that the GitLab implementation is "correct", because otherwise the strategy interface would have a single createOrUpdateFile function instead of distinct create vs update functions.

I didn't say that the interface is set in stone(; I just wanted to say that it's important to try and keep things as tidy as possible and not deviate if there's an existing pattern unless absolutely needed.

Also I think I'm turning a molehill of a comment into a mountain here, which wasn't my intent. If I'm still not making sense or you disagree then let's just drop the commit that I added and leave it the way you had it.

Thank you for taking the extra effort of putting some emotional context to your statements above 馃檹

Let's make this simple and just merge. I'm quite sure we're pretty much on the same page on everything(;

Thank you also for putting in the extra effort to further debug the strategy interface and come up with some (pretty solid) mock code. I'll put it into an appropriate issue for safe keeping and another time^^

       Add GitLab to all of them and show it as the second option
       after Dropbox.
@munen
Copy link
Collaborator

munen commented Nov 1, 2021

I have finished my code review and am very happy with the state of this PR. Amazing work, chasecaleb馃帄

I didn't refactor any code, because it's in perfect shape, already. I only made some small contributions:

The last thing for me to do is add the GitLab apps credentials to the deployment setup for organice.200ok.ch to work.

Again, amazing work, chasecaleb!馃檱馃檹

This will certainly make a lot of people happy.

I'll leave this PR open for chasecaleb to review my additions. After that, I'll merge to master. Personally, I'm very much looking forward to switch from Dropbox to GitLab for a while and see how it goes(;

@schoettl I'm assuming that the discussion in #734 (comment) has been resolved in your favor, as well. If it hasn't, I don't think it's a blocker for this PR. We can fix it as soon as possible afterwards. I hope that's ok for you.

@chasecaleb
Copy link
Contributor Author

You did an amazing job reviewing all of this in a timely manner and providing constructive feedback, even though we're on opposite sides of the world (I'm in USA), so thank you 馃檹 !

Rationale: The default page limit of 20 is too small to show my complete top level directory of Org files and folders. I didn't see a penalty for increasing the per_page limit in the GitLab documentation, too.

Yeah this makes sense, I was tempted to increase it to 50ish myself but wasn't sure, so I left it alone. There isn't any penalty for it other than a slightly longer wait until the first page of results is returned since the response size will be larger. That's still faster than having to fetch multiple pages though, of course.

I do have one bit of feedback left in #734 (comment), but to reiterate it's really not a big deal either way to me. So you're welcome to merge as it currently stands with commit 53455e3, or drop that commit and merge only up to the previous commit, 8cbeecb. Either way I'm good with merging to master.

Thanks again!

@munen munen mentioned this pull request Nov 1, 2021
@munen
Copy link
Collaborator

munen commented Nov 1, 2021

Thank you for the kind words, the same goes to you - there has been superbly fast iteration on this PR and the original issue 馃檱 馃檹 It's not often to see not a request, but an offer to implement - only to be followed up with a perfect implementation including tests within the next few days.

It's been a lot of fun! 馃挴

This PR will be merged right after this comment.

@munen munen merged commit 2557836 into 200ok-ch:master Nov 1, 2021
@chasecaleb chasecaleb deleted the gitlab-backend branch November 1, 2021 18:52
@chasecaleb chasecaleb mentioned this pull request Nov 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants