Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Doctor_visits patching code #1977

Merged
merged 21 commits into from
Jul 8, 2024
Merged

Doctor_visits patching code #1977

merged 21 commits into from
Jul 8, 2024

Conversation

minhkhul
Copy link
Contributor

@minhkhul minhkhul commented Jun 20, 2024

Description

This change automates multi-day patching for doctor_visits indicators, producing csv files organized in batch issue format: [name-of-patch]/issue_[issue-date]/nssp/actual_data_file.csv, making it easier to ingest large batches of patch data.

Users start by specify in params.json the start issue and end issue of the patch, along with a directory where output is stored in. For example:

  "patch": {
    "patch_dir": "/Users/minhkhuele/Desktop/delphi/covidcast-indicators/doctor_visits/AprilPatch",
    "start_issue": "2024-04-20",
    "end_issue": "2024-04-21"
  }

the run patch.py will create a directory 'covidcast-indicators/doctor_visits/AprilPatch' containing:

AprilPatch/
    issue_20240420/
        doctor-visits/
            20240408_state_smoothed_adj_cli.csv
            20240408_state_smoothed_cli.csv
            ...
    issue_20240421/
        doctor-visits/
            20240409_state_smoothed_adj_cli.csv
            20240408_state_smoothed_cli.csv
            ...

@@ -51,9 +51,13 @@ def change_date_format(name):
name = '_'.join(split_name)
return name

def download(ftp_credentials, out_path, logger):
def download(ftp_credentials, out_path, logger, issue=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general thought: can we introduce using typing for the function parameters? I checked that typing was introduced as a part of the standard library at 3.5 so should be able to use it for this repo. As a newcomer it would be helpful to know the type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aysim319 can you turn this request into an issue?

@aysim319
Copy link
Contributor

I know we mentioned testing in person yesterday and was wondering if you were planning to add a basic pytest for sanity checks

Copy link
Contributor

@nmdefries nmdefries left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made some comments about adding example input/output.

Please add unit tests for download() and get_latest_filename that use the issue_date arg to make sure that is working as desired. Merge at will once that's done.

Everything else looks good.

Comment on lines +56 to +60
if not issue_date:
current_time = datetime.datetime.now()
else:
current_time = datetime.datetime.strptime(issue_date, "%Y-%m-%d").replace(hour=23, minute=59, second=59)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (optional): since we're using this chunk in multiple places, consider factoring it out into a separate function. (2 duplicates is okay to leave, if we use this in a third place, we definitely should factor it out.)

doctor_visits/delphi_doctor_visits/patch.py Outdated Show resolved Hide resolved
To use this module, you need to specify the range of issue dates in params.json.

It will generate data for that range of issue dates, and store them in batch issue format:
[name-of-patch]/issue_[issue-date]/doctor-visits/actual_data_file.csv
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: add example output

Suggested change
[name-of-patch]/issue_[issue-date]/doctor-visits/actual_data_file.csv
[name-of-patch]/issue_[issue-date]/doctor-visits/actual_data_file.csv
The run using the above example `params.json` will create a directory 'covidcast-indicators/doctor_visits/AprilPatch' containing:
AprilPatch/
issue_20240420/
doctor-visits/
20240408_state_smoothed_adj_cli.csv
20240408_state_smoothed_cli.csv
...
issue_20240421/
doctor-visits/
20240409_state_smoothed_adj_cli.csv
20240408_state_smoothed_cli.csv
...

@nmdefries
Copy link
Contributor

nmdefries commented Jul 2, 2024

question: do we have or do we need a helper script to put these files into the DB? That sounds useful. Due to the output dir structure defined here, this seems like it will need a specialized acquisition script. That would belong in delphi-epidata rather than here.

Co-authored-by: nmdefries <42820733+nmdefries@users.noreply.github.com>
@minhkhul
Copy link
Contributor Author

minhkhul commented Jul 4, 2024

@nmdefries No we don't need new script to go from csv to db. Acquisition on epidata already includes code that allow files in this structure to get put into db using the flag specific_issue_date. I set up this cronicle job that calls acquisition using the flag, and been loading patches that way to our db.

@minhkhul minhkhul merged commit 1526dc9 into main Jul 8, 2024
16 checks passed
@nmdefries nmdefries deleted the patch_doctor_visits branch July 8, 2024 23:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants