Cdaweb direct file access #89

Merged
45 commits merged on Sep 20, 2023

jeandet (Member) commented May 4, 2023

Big PR, mostly adding direct archive support plus some refactoring around the HTTP module.
Direct archive support means Speasy can access servers without web services, or any local folder, as long as:

  • the remote server enables mod_dir or generates an HTML page for each folder, from which Speasy can extract the file list
  • files are split at regular intervals, or at least the start date can be parsed from the file names
  • files are ISTP-compliant CDFs
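
The three conditions above can be illustrated with a minimal sketch. It is not Speasy's implementation: the HTML snippet, the file-name layout (`<dataset>_<YYYYMMDD>_v<NN>.cdf`) and the helper names `list_files` and `start_date` are all assumptions made for the example.

```python
import re
from datetime import datetime
from typing import List

# Hypothetical directory index, similar to what a web server may generate.
LISTING_HTML = """
<html><body>
<a href="../">Parent Directory</a>
<a href="demo_h0_mag_20230501_v01.cdf">demo_h0_mag_20230501_v01.cdf</a>
<a href="demo_h0_mag_20230502_v01.cdf">demo_h0_mag_20230502_v01.cdf</a>
<a href="README.txt">README.txt</a>
</body></html>
"""

# The hyphen is escaped so it is a literal character, not a range operator.
HREF_RE = re.compile(r'href="([A-Za-z0-9._\-]+)"')
# Assumed layout: ..._YYYYMMDD_vNN.cdf
DATE_RE = re.compile(r"_(\d{8})_v\d+\.cdf$")


def list_files(html: str, suffix: str = ".cdf") -> List[str]:
    """Return the linked file names ending with `suffix`."""
    return [name for name in HREF_RE.findall(html) if name.endswith(suffix)]


def start_date(file_name: str) -> datetime:
    """Parse the start date encoded in a file name (assumed YYYYMMDD)."""
    match = DATE_RE.search(file_name)
    if match is None:
        raise ValueError(f"no date found in {file_name!r}")
    return datetime.strptime(match.group(1), "%Y%m%d")


files = list_files(LISTING_HTML)
dates = [start_date(f) for f in files]
```

With the file list and the start dates in hand, picking the files that cover a requested time range becomes a simple comparison on `dates`.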

The most interesting parts for review are:

  • the archive module itself here, and here for the downloader strategies
  • the former HTTP module has been split into:
  • AMDA and CDAWeb have been refactored to use the new HTTP, AnyFiles and URL modules
  • Tests are here:

The module is documented here, and a human-readable version is here.

This PR also adds PyYAML as a new dependency.

@jeandet jeandet added the enhancement (New feature or request) and WIP (Work In Progress, don't merge) labels May 4, 2023
@jeandet jeandet force-pushed the cdaweb_direct_file_access branch 2 times, most recently from 9195a9c to 385964e Compare May 23, 2023 16:19
@jeandet jeandet force-pushed the cdaweb_direct_file_access branch from 9bb8937 to 842fe48 Compare May 24, 2023 13:35
@jeandet jeandet marked this pull request as ready for review May 24, 2023 13:56
speasy/core/file_access.py: code scanning alert fixed
@jeandet jeandet force-pushed the cdaweb_direct_file_access branch from f5cf14e to 2a535b9 Compare May 25, 2023 08:54
speasy/core/file_access.py: code scanning alerts fixed
speasy/core/any_files.py: code scanning alert fixed

RETRY_AFTER_LIST = [429, 503] # Note: Specific treatment for 429 & 503 error codes (see below)

_HREF_REGEX = re.compile(' href="([A-Za-z0-9.-_]+)">')

Check warning

Code scanning / CodeQL

Overly permissive regular expression range Medium

Suspicious character range that overlaps with 0-9 in the same character class, and overlaps with A-Z in the same character class, and is equivalent to [./0-9:;<=>?@A-Z\[\\\]^_].
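
To make the warning concrete: inside the character class, `.-_` is read as a range from `.` (U+002E) to `_` (U+005F), so the pattern accepts characters such as `<`, `>` and `?` while rejecting a literal hyphen. The sketch below compares the flagged pattern with one possible fix, escaping the hyphen. The `STRICT` pattern and the sample strings are assumptions for illustration, not necessarily the change that was made in this PR.

```python
import re

# Pattern as flagged: '.-_' is parsed as the range U+002E..U+005F.
PERMISSIVE = re.compile(' href="([A-Za-z0-9.-_]+)">')
# One possible fix (an assumption, not necessarily what the PR did):
# escape the hyphen so '.', '_' and '-' are three literal characters.
STRICT = re.compile(r' href="([A-Za-z0-9._\-]+)">')

samples = (
    '<a href="ace_h0_mfi_20230101_v06.cdf">',  # ordinary file name
    '<a href="a<b>?c=d">',                      # punctuation inside the range
    '<a href="my-file.cdf">',                   # literal hyphen
)

permissive_hits = [bool(PERMISSIVE.search(s)) for s in samples]
strict_hits = [bool(STRICT.search(s)) for s in samples]
```

Here `permissive_hits` is `[True, True, False]` and `strict_hits` is `[True, False, True]`: the flagged pattern lets the odd link through and drops the hyphenated file name.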
@@ -10,9 +10,9 @@
from .utils import load_catalog, load_csv, load_timetable
# General modules
from ...config import amda as amda_cfg
from ...core.any_files import any_loc_open

Check notice

Code scanning / CodeQL

Unused import Note

Import of 'any_loc_open' is not used.

RETRY_AFTER_LIST = [429, 503] # Note: Specific treatment for 429 & 503 error codes (see below)

_HREF_REGEX = re.compile(' href="([A-Za-z0-9.-_]+)">')

Check notice

Code scanning / CodeQL

Unused global variable Note

The global variable '_HREF_REGEX' is not used.
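
The comment on `RETRY_AFTER_LIST` mentions a specific treatment for 429 and 503, but that code is not shown in this thread. A common approach for these status codes is to honour the `Retry-After` response header, which carries either a number of seconds or an HTTP date. The following is a minimal sketch of that idea only; `retry_delay` and `DEFAULT_DELAY` are hypothetical names and are not taken from Speasy.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional

RETRY_AFTER_LIST = [429, 503]
DEFAULT_DELAY = 5.0  # seconds; arbitrary value chosen for this sketch


def retry_delay(status: int, headers: Mapping[str, str],
                now: Optional[datetime] = None) -> Optional[float]:
    """Return the seconds to wait before retrying, or None if no retry applies.

    `headers` is a plain mapping here, so the key lookup is case sensitive.
    """
    if status not in RETRY_AFTER_LIST:
        return None
    value = headers.get("Retry-After")
    if value is None:
        return DEFAULT_DELAY
    if value.strip().isdigit():
        # Form 1: delay expressed in seconds.
        return float(value)
    try:
        # Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return DEFAULT_DELAY
    if when.tzinfo is None:
        # Dates without a usable offset are treated as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

A caller would sleep for the returned delay and retry, and treat `None` as "do not retry on this status".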
speasy/webservices/csa/__init__.py: code scanning alert fixed
speasy/core/cdf/__init__.py: code scanning alert fixed
speasy/core/any_files.py: code scanning alert fixed
@jeandet jeandet force-pushed the cdaweb_direct_file_access branch 2 times, most recently from 9b35369 to 7ad431e Compare May 26, 2023 15:06
speasy/webservices/csa/__init__.py: code scanning alert fixed
speasy/core/any_files.py: code scanning alert fixed
@@ -104,13 +105,13 @@
return root


def _read_cdf(response: requests.Response, variable: str) -> SpeasyVariable:
with tarfile.open(fileobj=BytesIO(response.content)) as tar:
def _load_variable(archive: io.BytesIO, variable: str) -> SpeasyVariable:

Check notice

Code scanning / CodeQL

Explicit returns mixed with implicit (fall through) returns Note

Mixing implicit and explicit returns may indicate an error as implicit returns always return None.
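
The body of `_load_variable` is not shown here, so the following is only a generic illustration of what this note is about. Both functions are hypothetical: the first one falls off the end when the lookup fails, the second one states the same outcome explicitly.

```python
from typing import Dict, Optional


def find_mixed(variables: Dict[str, float], name: str):
    """Falls off the end when `name` is missing, silently returning None."""
    if name in variables:
        return variables[name]


def find_explicit(variables: Dict[str, float], name: str) -> Optional[float]:
    """Same behaviour, but the 'not found' case is spelled out."""
    if name in variables:
        return variables[name]
    return None


data = {"b_gse": 4.2}
missing_mixed = find_mixed(data, "unknown")
missing_explicit = find_explicit(data, "unknown")
```

Both calls return `None`; the explicit version simply makes it clear to the reader, and to the type checker, that this is intended.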
speasy/core/any_files.py: code scanning alert fixed
# -*- coding: utf-8 -*-

"""Tests for `speasy.common` package."""
import re

Check notice

Code scanning / CodeQL

Unused import Note

Import of 're' is not used.
speasy/core/url_utils.py: code scanning alert fixed

from speasy.core.url_utils import ensure_url_scheme, is_local_file

_HERE_ = os.path.dirname(os.path.abspath(__file__))

Check notice

Code scanning / CodeQL

Unused global variable Note

The global variable '_HERE_' is not used.
This greatly reduces the size of function signatures...

Signed-off-by: Alexis Jeandet <alexis.jeandet@member.fsf.org>
…tring

…y extractor

… and

remote files + regex filter and file list for archive module

Since CDA archive datasets have different CDF versions within the same
year, this PR adds the capability to list files from a remote location by
parsing HTML pages such as those generated by Apache mod_dir. This makes
it possible to give a predictable base URL with a small regex part to
filter matching files from either a remote or a local location.
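
As a rough illustration of the "predictable base plus small regex part" idea, the sketch below builds the date part of a file name from the requested day and leaves the version number to a regex, then keeps the highest version found. The dataset prefix, the naming layout and the helper names are assumptions for this example and do not come from the archive module.

```python
import re
from datetime import date
from typing import Iterable, Optional

# Hypothetical listing: the same day is published under two CDF versions.
LISTING = [
    "demo_h0_mag_20230501_v01.cdf",
    "demo_h0_mag_20230501_v02.cdf",
    "demo_h0_mag_20230502_v01.cdf",
    "demo_h0_mag_20230502_v01.cdf.sha256",
]


def pattern_for(day: date) -> re.Pattern:
    """Predictable part built from the date, version left to the regex."""
    return re.compile(rf"demo_h0_mag_{day:%Y%m%d}_v(\d+)\.cdf")


def latest_for(day: date, names: Iterable[str]) -> Optional[str]:
    """Return the matching file with the highest version, if any."""
    pattern = pattern_for(day)
    matches = [(int(m.group(1)), name) for name in names
               if (m := pattern.fullmatch(name))]
    return max(matches)[1] if matches else None


best = latest_for(date(2023, 5, 1), LISTING)
```

The same filter works whether `LISTING` comes from a parsed HTML index or from a local directory listing.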

codecov bot commented Jun 28, 2023

Codecov Report

Patch coverage: 88.32% and project coverage change: +2.62 🎉

Comparison is base (3187bdb) 83.86% compared to head (1df08de) 86.49%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #89      +/-   ##
==========================================
+ Coverage   83.86%   86.49%   +2.62%     
==========================================
  Files          45       51       +6     
  Lines        2771     3043     +272     
  Branches      437      473      +36     
==========================================
+ Hits         2324     2632     +308     
+ Misses        316      276      -40     
- Partials      131      135       +4     
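
The percentages in the table can be reproduced from the hit, miss and partial counts. The sketch below assumes that coverage is hits divided by all tracked lines and that the displayed values are truncated (not rounded) to two decimals. The truncation is inferred from the figures themselves, since 2324/2771 is about 83.8686%, and is not taken from Codecov's documentation.

```python
from fractions import Fraction


def coverage(hits: int, misses: int, partials: int) -> Fraction:
    """Coverage ratio: hits over all tracked lines."""
    return Fraction(hits, hits + misses + partials)


def as_percent(ratio: Fraction) -> str:
    """Format as a percentage truncated (not rounded) to two decimals."""
    hundredths = int(ratio * 10000)  # int() truncates toward zero
    return f"{hundredths / 100:.2f}"


base = coverage(2324, 316, 131)   # main, 2771 lines
head = coverage(2632, 276, 135)   # this PR, 3043 lines
```

`as_percent(base)`, `as_percent(head)` and `as_percent(head - base)` give `83.86`, `86.49` and `2.62`, matching the report.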
Flag        Coverage Δ
unittests   85.86% <88.32%> (?)

Impacted Files                                   Coverage Δ
speasy/webservices/amda/inventory.py             80.18% <0.00%> (ø)
speasy/webservices/amda/_impl.py                 77.67% <44.44%> (+9.53%) ⬆️
speasy/core/cdf/__init__.py                      75.47% <57.14%> (+2.97%) ⬆️
speasy/core/dataprovider.py                      89.61% <66.66%> (+0.13%) ⬆️
speasy/core/inventory/indexes.py                 90.90% <75.00%> (ø)
speasy/webservices/cda/__init__.py               81.91% <83.33%> (-0.56%) ⬇️
speasy/core/cdf/inventory_extractor.py           85.71% <85.71%> (ø)
speasy/webservices/csa/__init__.py               87.39% <88.88%> (+2.59%) ⬆️
speasy/core/any_files.py                         89.47% <89.47%> (ø)
speasy/webservices/generic_archive/__init__.py   89.55% <89.55%> (ø)
... and 16 more

... and 2 files with indirect coverage changes

@jeandet jeandet added this to the 1.2.0 milestone Jun 30, 2023
@jeandet jeandet force-pushed the cdaweb_direct_file_access branch from c100877 to b1a1e8b Compare July 4, 2023 13:44
headers usage

@jeandet jeandet force-pushed the cdaweb_direct_file_access branch from 2acbe77 to 256f49a Compare July 5, 2023 16:59
@jeandet jeandet added the New WS label Jul 10, 2023
@jeandet jeandet removed the WIP Work In Progress (don't merge) label Jul 19, 2023
jeandet (Member, Author) commented Sep 20, 2023

@brenard-irap can we merge this now?

@jeandet jeandet merged commit 5372fdf into SciQLop:main Sep 20, 2023
19 of 20 checks passed
Labels: enhancement (New feature or request), New WS
2 participants