-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added error handling #18
base: main
Are you sure you want to change the base?
Conversation
daget/__main__.py
Outdated
@@ -22,22 +22,22 @@ def main(): | |||
# get doi/url and resolve to landing page | |||
try: | |||
url = get_redirect_url(args.url) | |||
res = [ele for ele in ["dataverse.harvard.edu", "dataverse.no", "snd.se/catalogue", "snd.gu.se", "su.figshare.com", "figshare.scilifelab.se", "zenodo.org"] if(ele in url)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checking the url on a hardcoded list would not cover all cases, all repositories where schema.org contains distrubution information will work and the list of figshare repositories is quite long: https://knowledge.figshare.com/type-of-client/institutions
A more generic solution would be to check if get_file_list_from_repo()
returns any values.
except urllib.error.HTTPError: | ||
raise ResolveError(f"{url} not found") | ||
|
||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, giving more precise errors is good
daget/__main__.py
Outdated
exit(1) | ||
|
||
print(f'landing page: {url}') | ||
|
||
# get desitnation directory and create directory | ||
desitnation = os.path.realpath(args.destination) | ||
|
||
if not os.path.exists(desitnation): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure why this was removed, checking for empty destination directory was a feature i added to make sure the downloaded dataset will be exactly as the remote source.
super().__init__(message) | ||
self.url = url | ||
self.supported_urls = supported_urls or ["dataverse.harvard.edu", "dataverse.no", "snd.se/catalogue", "su.figshare.com", "figshare.scilifelab.se", "zenodo.org"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hardcoded list of repository url:s should be removed.
daget should try to get a file list via schema.org distribution (if it´s not figshare or zenodo) and if this fails it should throw the error instead. keeping a list of all suported url:s in the source coude is not a sustainable soultion
import urllib, urllib.error | ||
from daget.exceptions import RepoError, ResolveError | ||
|
||
|
||
def get_redirect_url(url): | ||
# if url provided is a shorthand doi (TODO: check with regex) | ||
if not url.startswith(('http://', 'https://')): | ||
if not re.match(r'^https?://', url): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
No description provided.