Ignore DMCA Repos #39

Closed
BitesizedLion opened this issue Oct 15, 2021 · 6 comments

@BitesizedLion

Hey, it would be great if the tool could ignore DMCA'd repos. Currently it fails on them and kills the entire tool, which skips the rest of the repos.

@Justintime50
Owner

Hey there, can you provide the output of what happens when this dies? This tool spins up each repo's action in a separate thread, meaning that if one repo blows up during its operation, it should only affect that single repo and not stop the entire tool from running.

Additionally, do you know of a repo that currently has this issue that I can take a look at? Is there some kind of indicator via the API that says a repo is affected by DMCA? I wasn't able to find anything on a brief initial check. Without some kind of indicator, it would be difficult to tell the tool to skip it.

For context on DMCA: https://docs.github.com/en/github/site-policy/dmca-takedown-policy

@Justintime50
Owner

Here is an example of a repo that was taken down due to DMCA: https://github.com/CleanFlash/installer.

These are all published via https://github.com/github/dmca.

If you try retrieving this repo via API, it will return the following. With this info, we could check for a block key and simply filter those records out:

{
  "message": "Repository access blocked",
  "block": {
    "reason": "dmca",
    "created_at": "2021-10-05T15:29:49Z",
    "html_url": "https://github.com/github/dmca/blob/master/2021/10/2021-10-04-adobe.md"
  }
}
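As a rough illustration of the "check for a block key" idea, here is a minimal sketch. It assumes the response bodies have already been fetched and parsed into dictionaries; the helper names `is_blocked` and `filter_accessible`, and the second sample entry, are hypothetical and not part of the tool.

```python
from __future__ import annotations

from typing import Any


def is_blocked(api_response: dict[str, Any]) -> bool:
    """Return True when a parsed API response carries a "block" object."""
    return isinstance(api_response.get("block"), dict)


def filter_accessible(responses: dict[str, dict[str, Any]]) -> list[str]:
    """Keep only the repo names whose response has no "block" object."""
    return [name for name, body in responses.items() if not is_blocked(body)]


# Sample data: the first body is the response quoted above; the second is a
# made-up stand-in for an ordinary, unblocked repository response.
responses = {
    "CleanFlash/installer": {
        "message": "Repository access blocked",
        "block": {
            "reason": "dmca",
            "created_at": "2021-10-05T15:29:49Z",
            "html_url": "https://github.com/github/dmca/blob/master/2021/10/2021-10-04-adobe.md",
        },
    },
    "example-user/example-repo": {"name": "example-repo", "fork": False},
}

accessible = filter_accessible(responses)
print(accessible)
```

Whether the repo listing endpoint exposes the same `block` object as the single-repo lookup is not confirmed here, so a real filter might need one extra request per repo.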

Justintime50 self-assigned this Oct 15, 2021
@Justintime50
Owner

Justintime50 commented Oct 15, 2021

The tool behaves as intended. I found a user that had a DMCA repo and ran the tool against their repos:

venv/bin/python '/Users/jhammond/git/personal/github-archive/github_archive/cli.py' --https --users mahima145 --clone

Command 'git clone https://github.com/mahima145/project /Users/jhammond/github-archive/repos/mahima145/project' returned non-zero exit status 128.
Repo: mahima145/Practice clone success!
Repo: mahima145/system_design clone success!
Repo: mahima145/Cppfeatures clone success!
Repo: mahima145/Cpp-Interview-Question-Guide clone success!
Repo: mahima145/AlgoQuestions clone success!
GitHub Archive complete! Execution time: 0:00:01.706232.

It clones every repo that is not under a DMCA takedown just fine, because each git operation is self-contained in its own thread. As you can see, the first repo failed because it has a DMCA claim on it; however, the following 6 succeed. The tool does not "get killed" or fail completely due to the failure of a single repo. If you could provide additional details on how it failed for you, that would be great and I could continue looking into it; however, without additional context, I'm of the opinion there is no real issue here.
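For readers unfamiliar with the pattern, here is a minimal sketch of per-repo isolation. It is not the tool's actual code: `run_operation` and `stand_in` are hypothetical helpers, and a small Python subprocess that exits with a chosen status stands in for `git clone` so the example runs without network access.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_operation(name, command):
    """Run one command and report a failure instead of raising it."""
    try:
        subprocess.run(command, check=True, capture_output=True)
    except subprocess.CalledProcessError as error:
        return name, f"failed with exit status {error.returncode}"
    return name, "success"


def stand_in(exit_code):
    """Build a command that exits with the given status (a stand-in for git)."""
    return [sys.executable, "-c", f"import sys; sys.exit({exit_code})"]


# Hypothetical repo names; 128 mirrors the exit status in the output above.
operations = {
    "blocked-repo": stand_in(128),
    "repo-a": stand_in(0),
    "repo-b": stand_in(0),
}

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(run_operation, name, cmd) for name, cmd in operations.items()]
    results = dict(future.result() for future in futures)

for name, status in results.items():
    print(f"Repo: {name} {status}")
```

Because the exception is caught inside each worker, one non-zero exit status is reported for that repo only and the remaining operations still complete.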

@BitesizedLion
Author

BitesizedLion commented Oct 15, 2021

In this attempt the user had 100+ repos:
image

@BitesizedLion
Author

BitesizedLion commented Oct 15, 2021

Here you can see a user with 25 repos, yet the tool fails as soon as it hits the DMCA'd repo and doesn't continue:
image
image

@Justintime50
Owner

Justintime50 commented Oct 15, 2021

The "GitHub Archive complete" line tells me the tool itself didn't fail, because that's the last line printed once we've iterated over every repo. Although this user has 25 repos listed, 18 of them are forks, which are not included by default; to include them you need to pass the --forks flag. The output you see is expected: the tool is cloning all non-fork repos, which in this case is 7 (minus 1 which fails due to DMCA).
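To make the arithmetic concrete, here is a small sketch of fork filtering. It relies on the boolean `fork` field that GitHub's REST API includes on repository objects; the `select_repos` helper, the `include_forks` parameter, and the sample repo names are hypothetical and may not match how the tool implements --forks.

```python
def select_repos(repos, include_forks=False):
    """Return repo names, skipping forks unless include_forks is set."""
    return [repo["name"] for repo in repos if include_forks or not repo["fork"]]


# Made-up listing shaped like the case above: 25 repos, 18 of them forks.
repos = [{"name": f"fork-{i}", "fork": True} for i in range(18)]
repos += [{"name": f"own-{i}", "fork": False} for i in range(7)]

default_selection = select_repos(repos)
with_forks = select_repos(repos, include_forks=True)

print(len(default_selection), len(with_forks))
```

With the default setting only the 7 non-fork repos are selected, which matches the number of clone attempts described above.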
