Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added error handling #18

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Added error handling #18

wants to merge 1 commit into from

Conversation

alireza1111
Copy link

No description provided.

@@ -22,22 +22,22 @@ def main():
# get doi/url and resolve to landing page
try:
url = get_redirect_url(args.url)
res = [ele for ele in ["dataverse.harvard.edu", "dataverse.no", "snd.se/catalogue", "snd.gu.se", "su.figshare.com", "figshare.scilifelab.se", "zenodo.org"] if(ele in url)]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checking the url on a hardcoded list would not cover all cases, all repositories where schema.org contains distrubution information will work and the list of figshare repositories is quite long: https://knowledge.figshare.com/type-of-client/institutions

A more generic solution would be to check if get_file_list_from_repo() returns any values.

except urllib.error.HTTPError:
raise ResolveError(f"{url} not found")

try:
Copy link
Owner

@borsna borsna Nov 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, giving more precise errors is good

exit(1)

print(f'landing page: {url}')

# get desitnation directory and create directory
desitnation = os.path.realpath(args.destination)

if not os.path.exists(desitnation):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why this was removed, checking for empty destination directory was a feature i added to make sure the downloaded dataset will be exactly as the remote source.

super().__init__(message)
self.url = url
self.supported_urls = supported_urls or ["dataverse.harvard.edu", "dataverse.no", "snd.se/catalogue", "su.figshare.com", "figshare.scilifelab.se", "zenodo.org"]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hardcoded list of repository url:s should be removed.
daget should try to get a file list via schema.org distribution (if it´s not figshare or zenodo) and if this fails it should throw the error instead. keeping a list of all suported url:s in the source coude is not a sustainable soultion

import urllib, urllib.error
from daget.exceptions import RepoError, ResolveError


def get_redirect_url(url):
# if url provided is a shorthand doi (TODO: check with regex)
if not url.startswith(('http://', 'https://')):
if not re.match(r'^https?://', url):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants