
Distrust the infrastructure in workflow management, issue tracking, and doc PR review #3958

Open
andrewdavidwong opened this issue Jun 5, 2018 · 13 comments

Comments

@andrewdavidwong
Member

@andrewdavidwong andrewdavidwong commented Jun 5, 2018

We aim to distrust the infrastructure. However, as we've discussed previously and more recently, we actually trust GitHub quite a bit for workflow and issue tracking. We also implicitly trust GitHub when having each other review documentation PRs before merging. For example, whenever I request that @marmarek review a PR because he has expertise that I lack, I have to trust that the interface telling me that he has approved the PR is being truthful when I merge it. I think we should seriously investigate ways of reducing our reliance on (i.e., distrusting) these aspects of GitHub.

@andrewdavidwong andrewdavidwong added this to the Ongoing milestone Jun 5, 2018
@andrewdavidwong andrewdavidwong changed the title Distrust the infrastructure in workflow management, issue tracking, and PR review Distrust the infrastructure in workflow management, issue tracking, and doc PR review Jun 5, 2018
@fosslinux

@fosslinux fosslinux commented Jun 5, 2018

How would this be possible? GitHub is closed-source; I think the only real way would be to migrate away from GitHub for these tasks.

My only thought would be git clone-ing the repository (or repositories) and cross-checking that actions were actually done. However, GitHub and the git command are still being trusted that what is on GitHub's servers is actually what is being cloned. The good thing there, though, is that git is open source. Does that mean we can trust git because it is open source?
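As a side note on why cloned git content is verifiable independently of GitHub: every object id is a SHA-1 over the object's typed content, so anyone can recompute the whole hash chain from a clone. A minimal sketch (with the caveat that SHA-1's weakening collision resistance is why signed tags and commits add stronger assurance):

```python
import hashlib

def git_object_id(obj_type: str, body: bytes) -> str:
    # git hashes "<type> <size>\0<body>"; the resulting id is what
    # commits, trees, and refs point at, forming a verifiable chain.
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

# Matches `echo hello | git hash-object --stdin`:
print(git_object_id("blob", b"hello\n"))
# → ce013625030ba8dba906f756967f9e9ca394464a
```

Because commit ids hash the tree and parent ids, tampering with any object on the server changes every descendant id, which a local clone would detect.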

@woju
Member

@woju woju commented Jun 5, 2018

One thing to do would be to actually back up the issues and pull requests, irrespective of any migration plans, which may or may not happen in the future. The backup should include reviews, comments, labels, milestones, and all the metadata, maybe down to the "👍" reactions under comments. We also sometimes use comments on commits outside of reviews. (Did I miss anything?)

Both @marmarek and I have extensive e-mail archives of GitHub notifications, which is better than nothing, but much of the metadata is missing.
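A minimal, stdlib-only sketch of such a backup against GitHub's public REST API (the repo URL is illustrative; reviews, reactions, and commit comments live under separate endpoints that would need analogous loops):

```python
import json
import re
import urllib.request

# Illustrative endpoint; issues and PRs both appear here.
API = "https://api.github.com/repos/QubesOS/qubes-issues/issues"

def next_page(link_header):
    """Extract the rel="next" URL from a GitHub Link header, or None."""
    if not link_header:
        return None
    m = re.search(r'<([^>]+)>;\s*rel="next"', link_header)
    return m.group(1) if m else None

def backup_issues(out_path, token=None):
    """Page through all issues (open and closed) and dump them as JSON lines."""
    url = API + "?state=all&per_page=100"
    with open(out_path, "w") as out:
        while url:
            req = urllib.request.Request(url)
            if token:  # unauthenticated requests are heavily rate-limited
                req.add_header("Authorization", "token " + token)
            with urllib.request.urlopen(req) as resp:
                for issue in json.load(resp):
                    out.write(json.dumps(issue) + "\n")
                url = next_page(resp.headers.get("Link"))
```

The JSON-lines output is trivially diffable, so re-running the script at intervals gives a crude incremental backup.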

@fosslinux

@fosslinux fosslinux commented Jun 5, 2018

@woju I see the benefit of that. I'm not sure how comments on commits outside reviews would work, but for everything else, maybe have a look at the new user Migration API: https://developer.github.com/changes/2018-05-24-user-migration-api/ (I know we're not migrating, but it might suit the purpose) and this gist script: https://gist.github.com/rodw/3073987.

@tokideveloper

@tokideveloper tokideveloper commented Jun 7, 2018

@andrewdavidwong wrote:

We aim to distrust the infrastructure. However, as we've discussed previously and more recently, we actually trust GitHub quite a bit for workflow and issue tracking. We also implicitly trust GitHub when having each other review documentation PRs before merging.
[…]
I think we should seriously investigate ways of reducing our reliance on (i.e., distrusting) these aspects of GitHub.

I see that distrusting any infrastructure you don't own is necessary. But wouldn't it be cheaper, easier, and less time-consuming to move to a trusted infrastructure, since there would then be no need to spend time, money, and thought on distrusting it?

@tokideveloper

@tokideveloper tokideveloper commented Jun 7, 2018

@woju wrote:

One thing to do would be to actually backup the issues and pull-requests […]. It should include reviews, comments, labels, milestones, […]

As far as I can see, these things can be migrated to GitLab using the GitHub-to-GitLab importer. And since you can host your own GitLab instance, you can rely on that part of the infrastructure, at least more than on the current GitHub instance.

[…] and all the metadata maybe up to those "+1" reactions under comments. Also we sometimes use comments to commits outside reviews. (Did I miss something?)

I don't know whether these are included in GitLab's importer. But GitLab is under heavy development, so chances are high that this will be implemented soon if it's missing. (Especially now, when many projects are moving to GitLab.)

@marmarek
Member

@marmarek marmarek commented Jun 7, 2018

It isn't only about "owning" the infrastructure. It's also about its complexity. Even if we ran our own servers in our own data center, there would still be an amazingly complex software stack there (all the HTTP servers, web applications, etc.), in which a lot of bugs surely exist. We prefer not to trust them, instead of attempting to secure them.

@tokideveloper

@tokideveloper tokideveloper commented Jun 7, 2018

Okay, I must admit that I failed to make my point. In my posts I wanted to say that migrating away from GitHub and taking ownership of the infrastructure would probably be feasible. But I failed to give the reasons why I think it's important to do so:

My reasons are on the level of power, not software safety/security. Surprisingly, GitHub was bought by MS. And the first action after that was bad IMHO and also surprising: this time it was censoring "upend" from the Trending page, but tomorrow it could also be

  • forcing developers to sign in with a MS account,
  • forcing the use of Bing when searching,
  • forcing developers to pay in order to use GitHub,
  • disadvantaging those projects (maybe QubesOS?) which could be adverse for MS (maybe in a way that we cannot notice) (this could lead to differently privileged classes of projects),
  • excluding certain developers from signing in,
  • modifying the terms of service in a way some developers wouldn't like it, so, they would leave GitHub,
  • taking projects over
  • or even shutting down projects.

Note that all of this could happen as surprisingly as the deal itself. E.g., if QubesOS is deleted, then you don't have any chance to migrate away. (I know that in this case someone would have the most recent Git repo, but the issues etc. would be lost.)

None of the reasons listed above could become a problem for QubesOS if it were hosted on its own infrastructure, I guess.

Am I wrong? Have I overlooked something?

@tokideveloper

@tokideveloper tokideveloper commented Jun 7, 2018

It isn't only about "owning" the infrastructure. It's also about its complexity. Even if we ran our own servers in our own data center, there would still be an amazingly complex software stack there (all the HTTP servers, web applications, etc.), in which a lot of bugs surely exist. We prefer not to trust them, instead of attempting to secure them.

Sorry, I can't resist: I've heard that there is an operating system where software you don't trust can be kinda "jailed" into so-called "qubes" in order to prevent them from affecting other parts of your computer/software. Maybe we could ask that project to help us? ;-)

@tokideveloper

@tokideveloper tokideveloper commented Jun 7, 2018

What if MS surprisingly changes the terms of service or modifies GitHub's software in such a way that issues, PR comments, etc. cannot be exported anymore? Or at least not in a free format? Or only in an encrypted form that cannot be used elsewhere? Or other things like that?

@fosslinux

@fosslinux fosslinux commented Jun 7, 2018

TL;DR: in any case, we probably have at least 30 days before MS could change anything.

@tokideveloper All of these are very important points. While I do think that this is an issue that calls for immediate action, there is one major point as to why MS (for now) (probably) could not change things just like that.

In the GitHub Terms of Service Part R: Changes to These Terms it states:

We reserve the right, at our sole discretion, to amend these Terms of Service at any time and will update these Terms of Service in the event of any such amendments. We will notify our Users of material changes to this Agreement, such as price changes, at least 30 days prior to the change taking effect by posting a notice on our Website. For non-material modifications, your continued use of the Website constitutes agreement to our revisions of these Terms of Service. You can view all changes to these Terms in our Site Policy repository.

(emphasis mine)

This clearly states that for material changes we get 30 days' notice, more than enough to migrate to GitLab / trusted infrastructure / BitBucket / something else. I would personally classify everything you have listed as material changes; however, the ToS do not define "material" or "non-material".

A few possible definitions of material:

  • Changes to the content of the site/how the site works.
  • Major changes.
  • Physical changes vs changes to the focus of the site.
  • something else.

While I do not disagree with the importance of these points, it should be considered that none of these are likely to happen any time soon. While there is some probability of them happening tomorrow, it is a low one. But I think caution should still be exercised and some backup should be made.

@tokideveloper

@tokideveloper tokideveloper commented Jun 8, 2018

@sstt011 Thank you for your investigation and estimation!

While there is a probability of [possible material changes] happening tomorrow, it is a low one.

Agreed (since I know that 30-days thing now).

But I think caution should still be taken and some backup should be made.

Yes! Maybe we could make a full backup (of repos, issues, PR comments, etc.) and measure the time it takes. If it takes around 30 days or longer, then we should definitely make backups at appropriate intervals.

However, if it's possible to make incremental backups, then I'd prefer doing that on a permanent basis.

@marmarek
Member

@marmarek marmarek commented Jun 8, 2018

There is a new migration API to download all the data associated with a user/repository. I don't see documentation about the archive format there, but I'd assume it's something machine-readable (a set of JSON files?) that could be used to import it into another service if needed.
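Driving that migrations API could look roughly like this sketch (stdlib only; the preview media type is the one documented in the announcement linked above, while `ORG`, the repo list, and token handling are placeholders):

```python
import json
import urllib.request

API = "https://api.github.com"

def migration_payload(repos, lock=False):
    """Request body for POST /orgs/{org}/migrations."""
    return json.dumps({"repositories": repos,
                       "lock_repositories": lock}).encode()

def start_migration(org, repos, token):
    """Ask GitHub to prepare an archive of repos plus issues, PRs,
    comments, milestones, etc.; returns the migration id to poll."""
    req = urllib.request.Request(
        f"{API}/orgs/{org}/migrations",
        data=migration_payload(repos),
        method="POST",
    )
    req.add_header("Authorization", "token " + token)
    req.add_header("Accept", "application/vnd.github.wyandotte-preview+json")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["id"]
```

Per the announcement, the returned id can then be polled until the migration's state is `exported`, after which the archive (git data plus JSON metadata) is downloadable, which would fit the backup-at-intervals idea above.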

@andrewdavidwong
Member Author

@andrewdavidwong andrewdavidwong commented Jun 9, 2018

But I think caution should still be taken and some backup should be made.

We should always have backups regardless of whether we plan to migrate away from the service.

There is a new migration API to download all the data associated with user/repository. I don't see documentation about archive format there, but I'd assume it is something machine readable (a set of json files?) that could be used to import it into another service if needed.

We should use it to make backups regardless of whether we plan to import them into another service.

See: https://groups.google.com/d/msg/qubes-devel/HDt1ZdDMfz4/Q8yS32a-EAAJ

Branched to: #3974

5 participants