Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GitDumper should also checkout the main branches #148

Closed
erikyao opened this issue Jun 16, 2021 · 6 comments
Closed

GitDumper should also checkout the main branches #148

erikyao opened this issue Jun 16, 2021 · 6 comments
Assignees

Comments

@erikyao
Copy link
Contributor

erikyao commented Jun 16, 2021

When adding a new plugin to pending.biothings.io, the URL to the GitHub repo will be passed to http://localhost:19080/dataplugin/register_url,

image

which in the end calls the AssistantManager.register_url() method (assistant.py#L699).

The AssistantManager instance looks like to add a message (including the URL to register) to its corresponding MongoDB collection, and finally a GitDumper instance will receive the URL and then check the repo out. By default, GitDumper will only check master branches, but recently GitHub has changed its default branch name from master to main. Therefore our GitDumper cannot checkout the latest GitHub repo-based plugins.

The root cause in the code seems to be dumper.py#L1072:

DEFAULT_BRANCH = "master"
@zcqian
Copy link
Contributor

zcqian commented Jun 16, 2021

I think the better solution is to figure out what the remote HEAD is and use that as the default branch.

But please do note that while HEAD usually points to master or main (in GitHub's case), having a default branch is not required (per git-scm documentation).

Use git remote show origin (origin or whatever the name of the remote is) to find the "HEAD branch"

@newgene
Copy link
Member

newgene commented Jun 16, 2021

Per this post, can use this to get the default git branch.

git ls-remote --symref "$REPO_URL" HEAD | sed -nE 's|^ref: refs/heads/(\S+)\s+HEAD|\1|p'

@zcqian
Copy link
Contributor

zcqian commented Jun 16, 2021

The reason we don’t want to introduce sed is that there’s no guarantee that it exists, and piping probably doesn’t work as expected when running in a subprocess.

@newgene
Copy link
Member

newgene commented Jun 16, 2021

@zcqian certainly, within python code, you don't really need sed, should be easy to extract what you need with regex, same as what sed part intend to do in the above example.

@zcqian
Copy link
Contributor

zcqian commented Jun 16, 2021

That’s what I’m thinking for now. We will invoke git and scrape its output for information, and in this case we will take the default branch in this order:

  1. User defined default. The current class variable will default to None. Hoped this doesn’t break other people’s code.
  2. Remote HEAD
  3. master branch of the remote, if it exists
  4. main if master does not exist but main exists

raise if none of the above applies

Edit: changed how things work, now only taking into account the configured repo URL, not what's actually configured in the local git repository.

In the long run I don’t like scraping the standard output of git, or using GitPython (we use this in some places, it is also probably scraping the output, given that it depends on the git command). We should look into solutions like libgit2.

@zcqian
Copy link
Contributor

zcqian commented Jun 21, 2021

Fixed via #150

@zcqian zcqian closed this as completed Jun 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants