[WIP] Keeping Git branches in sync with job-server workspaces #1507

Open
iaindillingham opened this issue May 3, 2024 · 1 comment

iaindillingham commented May 3, 2024

In the Bristol-Cambridge-Oxford meeting on May 2, 2024, @venexia asked a question about keeping Git branches in sync with job-server workspaces. The question was prompted by an exchange with tech support.[1]

We should recognize that it may not be desirable to keep Git branches in sync with job-server workspaces. Nevertheless, this issue captures the question and the exchange with tech support. Ultimately, the intention is to improve the documentation by making recommendations to researchers.

In the following workflow, the primary branch is GitHub's default branch. It is often called main. The terms primary branch, primary workspace, secondary branch, and secondary workspace have no meaning beyond this issue. HEAD is Git-speak for "the current branch's latest commit".

  • Researcher creates repo from opensafely/research-template
  • Researcher commits to primary branch
  • Researcher creates primary workspace associated with primary branch
  • Researcher runs jobs in primary workspace
  • Primary workspace directories are created on L3 and L4 filesystems
  • Files are written to primary workspace directories
  • Researcher writes paper based on primary branch HEAD and files in primary workspace directories
  • Researcher submits paper 🎉

At this point, the paper is based on primary branch HEAD and files in primary workspace directories.

The paper is reviewed; further analysis is requested, which necessitates modifications to the dataset definition. The researcher doesn't want to overwrite files in primary workspace directories, because modifications to the dataset definition could result in a different dataset. 🙁

  • Researcher branches from primary branch, giving secondary branch
  • Researcher commits to secondary branch
  • Researcher creates secondary workspace associated with secondary branch
  • Researcher runs jobs in secondary workspace
  • Secondary workspace directories are created on L3 and L4 filesystems
  • Files are written to secondary workspace directories
  • Researcher updates paper based on secondary branch HEAD and files in secondary workspace directories
  • Researcher merges secondary branch into primary branch
  • Secondary branch is deleted (by researcher, by GitHub, etc.)
  • Researcher submits paper 🎉

At this point, the paper is based on primary branch HEAD and files in secondary workspace directories.

The paper is reviewed; further analysis is requested 🙁

  • Should the researcher commit to primary branch? Files in primary workspace directories are behind files in secondary workspace directories. The researcher would need to run jobs in primary workspace.

  • Should the researcher branch from primary branch, giving new secondary branch with same name as old secondary branch, and commit to new secondary branch? The researcher would not need to run jobs in secondary workspace.
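
The workflow and the two options above can be replayed as a small model. This is only a sketch to make the state easier to reason about, not a description of how job-server tracks anything. The commit IDs `c1` and `c2`, the branch names `main` and `secondary`, and the `Workspace`, `run_jobs`, `in_sync`, and `replay_workflow` names are all hypothetical. It also assumes the merge is a fast-forward, so primary branch HEAD and old secondary branch HEAD are the same commit; with a merge commit the IDs would differ even though the code is the same.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Workspace:
    """A workspace, associated with a Git branch by name (hypothetical model)."""

    branch: str
    # Commit that produced the files currently in the workspace directories.
    # None until jobs have been run.
    files_from: Optional[str] = None


def run_jobs(ws: Workspace, heads: Dict[str, str]) -> None:
    """Record that the workspace's files were produced from its branch HEAD."""
    ws.files_from = heads[ws.branch]


def in_sync(ws: Workspace, heads: Dict[str, str]) -> bool:
    """True when the branch exists and the files were produced from its HEAD."""
    return ws.branch in heads and ws.files_from == heads[ws.branch]


def replay_workflow():
    """Replay the steps up to the second submission; return the final state."""
    heads = {"main": "c1"}               # researcher commits to primary branch
    primary_ws = Workspace(branch="main")
    run_jobs(primary_ws, heads)          # first submission is based on c1

    heads["secondary"] = heads["main"]   # branch from primary branch
    heads["secondary"] = "c2"            # commit to secondary branch
    secondary_ws = Workspace(branch="secondary")
    run_jobs(secondary_ws, heads)        # second submission is based on c2

    heads["main"] = heads["secondary"]   # merge (assumed fast-forward)
    del heads["secondary"]               # secondary branch is deleted
    return heads, primary_ws, secondary_ws


heads, primary_ws, secondary_ws = replay_workflow()

# Option 1: continue on primary branch. The primary workspace's files are
# behind primary branch HEAD, so jobs would need to run there first.
print(in_sync(primary_ws, heads))    # False

# Option 2: recreate a secondary branch with the same name from primary branch.
heads["secondary"] = heads["main"]
print(in_sync(secondary_ws, heads))  # True
```

Under these assumptions, the first option starts from a workspace whose files are behind the branch, and the second starts from a workspace whose files already match it.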

Footnotes

  1. https://bennettoxford.slack.com/archives/C01D7H9LYKB/p1709140748180189

@iaindillingham
Member Author

> The researcher doesn't want to overwrite files in primary workspace directories, because modifications to the dataset definition could result in a different dataset.

I think the user wants to undertake an experiment: that is, to compare the dataset in the primary workspace with the dataset in the secondary workspace. The comparison need not be exact; it may be an approximation. DVC (Data Version Control) provides experiment management, which we could learn from. For more information, see:
