Dockerised framework for DAFNI #251

Merged
merged 5 commits into from
Jan 30, 2024

Conversation

f-allian
Contributor

@f-allian f-allian commented Dec 12, 2023

This PR contains an initial commit for the dockerisation of the causal testing framework, so that it can be hosted on DAFNI - closes issue #92 after review.

Steps to reproduce tests

This PR uses the vaccinating elderly example as a test case. The entry point to the framework is wrapped in a script called main_dafni.py, which takes four mandatory input arguments: data_path, dag_path, tests_path and variables_path.

1. Without Docker

Simply run:

python main_dafni.py --variables_path $VARIABLES --dag_path $DAG_PATH --data_path $DATA_PATH --tests_path $CAUSAL_TESTS,

and point to the path containing your configuration files (for this example, everything is defined under ./dafni/inputs). The resulting causal tests will be saved in .json format in the folder ./dafni/outputs. (Note: the folder structure here is important for DAFNI.)
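
For readers unfamiliar with the pattern, a wrapper with four mandatory path arguments could look roughly like the sketch below. This is not the contents of main_dafni.py: the `build_parser` name, the parser layout and the help strings are assumptions. Only the four argument names come from the description above, and the example file names are borrowed from the .env file quoted later in this conversation.

```python
import argparse
from pathlib import Path

# Argument names taken from the PR description; everything else is illustrative.
REQUIRED_PATHS = ("variables_path", "dag_path", "data_path", "tests_path")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with four mandatory path arguments (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="main_dafni.py")
    for name in REQUIRED_PATHS:
        parser.add_argument(
            f"--{name}",
            type=Path,
            required=True,
            help=f"Path supplied via --{name}",
        )
    return parser


# Parse an explicit argument list rather than sys.argv so the sketch runs anywhere.
args = build_parser().parse_args([
    "--variables_path", "dafni/inputs/variables.json",
    "--dag_path", "dafni/inputs/dag.dot",
    "--data_path", "dafni/inputs/simulated_data.csv",
    "--tests_path", "dafni/inputs/causal_tests.json",
])
print(args.dag_path)
```

Marking every argument as required means a missing path is reported by the parser before any framework code runs.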

2. With Docker

  1. Create a .env file in the ./dafni directory containing the environment variables, which are then passed into the Dockerfile for the build.
  2. Then let docker-compose build the image and create the container by running docker-compose up.

Overall, the total execution time (building the image and running the script) is ~1 minute on my computer (this may be slightly longer on a new setup, as there won't be any cached data).

Overall Progress

  • DAFNI / Docker #92
  • Create a wrapper of the causal testing framework to be used as an entry-point for Docker
  • Containerise the causal testing framework using Docker
  • Update the model_definition.yaml to contain the appropriate dataslot IDs (if any) needed for the execution of the framework
  • Conduct more tests using different examples (we should have plenty to test)
  • Upload to DAFNI.

@f-allian f-allian added the enhancement New feature or request label Dec 12, 2023
@f-allian f-allian self-assigned this Dec 12, 2023

github-actions bot commented Dec 12, 2023

🦙 MegaLinter status: ⚠️ WARNING

Descriptor | Linter | Files | Fixed | Errors | Elapsed time
⚠️ PYTHON | black | 29 | | 1 | 0.95s
✅ PYTHON | pylint | 29 | | 0 | 3.89s

See detailed report in MegaLinter reports

MegaLinter is graciously provided by OX Security

codecov bot commented Dec 12, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (48fd185) 95.69% compared to head (5972be0) 95.69%.
Report is 1 commits behind head on main.

@@           Coverage Diff           @@
##             main     #251   +/-   ##
=======================================
  Coverage   95.69%   95.69%           
=======================================
  Files          22       22           
  Lines        1557     1557           
=======================================
  Hits         1490     1490           
  Misses         67       67           

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4034519...5972be0. Read the comment docs.

Contributor

Perhaps I'm just not software engineering-y enough, but why do we need separate files for inputs and tests, each with a single key whose value is a list? Could we not either combine tests.json and variables.json into a single file, or have each of them contain just a list of the values, or is that in some way bad practice?

Contributor

A potential alternative to this would be to include this metadata in the DAG file as attributes for each variable. That could make things simpler from a user's point of view, as they'd then only have to make one file: the DAG, from which the tests can be built automatically.

Comment on lines +158 to +166
estimators = {"LinearRegressionEstimator": LinearRegressionEstimator}

# Step 3: Define the expected variables

expected_outcome_effects = {
    "Positive": Positive(),
    "Negative": Negative(),
    "NoEffect": NoEffect(),
    "SomeEffect": SomeEffect()}

Contributor

Is there any way we can get round this hardcoding? This makes it pretty much impossible for a user to implement customisations, and is also the only blocker to having a single default main.py file (I think), now we're putting variables in a file.
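
One common way round this kind of hardcoding is a small registry that classes add themselves to, so a user can contribute a custom effect without editing the entry-point script. The sketch below only illustrates that idea; it is not how the framework resolves effects. The classes are empty stand-ins for the framework's own `Positive`, `Negative`, `NoEffect` and `SomeEffect`, and `register_effect`, `make_effect` and `ExactValue` are hypothetical names.

```python
# Registry mapping the name used in a tests file to the class that implements it.
EFFECT_REGISTRY = {}


def register_effect(cls):
    """Class decorator: make cls available under its own class name."""
    EFFECT_REGISTRY[cls.__name__] = cls
    return cls


# Empty stand-ins for the framework's effect classes (assumption: the real
# classes can be constructed without arguments, as in the snippet above).
@register_effect
class Positive: ...


@register_effect
class Negative: ...


@register_effect
class NoEffect: ...


@register_effect
class SomeEffect: ...


def make_effect(name):
    """Instantiate the effect registered under name, with a helpful error."""
    try:
        return EFFECT_REGISTRY[name]()
    except KeyError:
        known = ", ".join(sorted(EFFECT_REGISTRY))
        raise ValueError(f"Unknown effect {name!r}; known effects: {known}") from None


# A user-defined customisation needs no change to the code above.
@register_effect
class ExactValue: ...


effect = make_effect("ExactValue")
print(type(effect).__name__)
```

The same approach would apply to the estimators dictionary; whether it fits the framework's plugin story is a design decision for the maintainers.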

@jmafoster1
Contributor

jmafoster1 commented Jan 3, 2024

@f-allian please can you post your .env file as an example? I've never made one before, so guessed at the following, but docker-compose up fails with error message main_dafni.py: error: argument --variables_path: expected one argument
My .env file:

VARIABLES=inputs/variables.json
DAG_PATH=inputs/dag.dot
DATA_PATH=inputs/simulated_data.csv
CAUSAL_TESTS=inputs/causal_tests.json

@christopher-wild
Contributor

christopher-wild commented Jan 3, 2024

Hi Michael,

I believe your VARIABLES line should instead be VARIABLES_PATH. i.e.

VARIABLES_PATH=inputs/variables.json
DAG_PATH=inputs/dag.dot
DATA_PATH=inputs/simulated_data.csv
CAUSAL_TESTS=inputs/causal_tests.json

Otherwise, your .env file is identical to the one Farhad shared with me.
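
A plausible explanation for the error message, sketched below, is that an unset variable expands to nothing, so the parser sees `--variables_path` immediately followed by the next flag. The sketch assumes the container's command references `$VARIABLES_PATH`; the `COMMAND` string, the `expand` and `parse_error` helpers and the parser itself are reconstructions for illustration, not the code in this PR.

```python
import argparse
import contextlib
import io
import re
import shlex

# Assumed shape of the container command; not copied from the Dockerfile.
COMMAND = ("--variables_path $VARIABLES_PATH --dag_path $DAG_PATH "
           "--data_path $DATA_PATH --tests_path $CAUSAL_TESTS")


def expand(command, env):
    """Mimic shell expansion: unset variables expand to an empty string."""
    expanded = re.sub(r"\$(\w+)", lambda m: env.get(m.group(1), ""), command)
    return shlex.split(expanded)


def parse_error(argv):
    """Return argparse's final error line for argv, or '' if parsing succeeds."""
    parser = argparse.ArgumentParser(prog="main_dafni.py")
    for name in ("variables_path", "dag_path", "data_path", "tests_path"):
        parser.add_argument(f"--{name}", required=True)
    stderr = io.StringIO()
    try:
        with contextlib.redirect_stderr(stderr):
            parser.parse_args(argv)
    except SystemExit:
        return stderr.getvalue().strip().splitlines()[-1]
    return ""


shared = {
    "DAG_PATH": "inputs/dag.dot",
    "DATA_PATH": "inputs/simulated_data.csv",
    "CAUSAL_TESTS": "inputs/causal_tests.json",
}
broken_env = {"VARIABLES": "inputs/variables.json", **shared}
fixed_env = {"VARIABLES_PATH": "inputs/variables.json", **shared}

print(parse_error(expand(COMMAND, broken_env)))  # reports the missing value
print(parse_error(expand(COMMAND, fixed_env)))   # empty string: parsing succeeded
```

With the misnamed variable, the first call should report the same "expected one argument" failure quoted above; with the corrected name, parsing succeeds.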

@jmafoster1
Contributor

Thanks Chris, that works. Would it be sensible to commit the .env file as part of the example, or is there a reason why this isn't a good idea?

@f-allian
Contributor Author

f-allian commented Jan 3, 2024

> Thanks Chris, that works. Would it be sensible to commit the .env file as part of the example, or is there a reason why this isn't a good idea?

@jmafoster1 There isn't a specific reason why I didn't include it in my commit in this case. Generally speaking, environment files contain passwords, API keys etc., which is why they're usually left out of version control. I don't think DAFNI requires any of that in our case, so I can provide a template if you think that's useful.

Also, if you think some of my variable naming conventions aren't helpful/easily identifiable, please let me know!

@jmafoster1
Contributor

I think it would be helpful to show an example/template. I'm happy with variable names, but we should update the "without docker" run command to include the _PATH suffix to be consistent.

dafni/Dockerfile Outdated
RUN pip install causal-testing-framework --no-cache-dir

# Use the necessary environment variables for the script's inputs
ENV VARIABLES=./inputs/variables.json \

Contributor

I understand that these are overwritten by the .env, but VARIABLES here should be VARIABLES_PATH. Also, if we're assuming a .env will be supplied, is setting these within the Dockerfile necessary?

Contributor Author

@rsomers1998 You were indeed correct! Thanks for flagging this.

Contributor

@christopher-wild christopher-wild left a comment

Looks really, really good! Found some small clean-ups.

WORKDIR /usr/src/app/

# Install core dependencies using PyPi
RUN pip install causal-testing-framework --no-cache-dir

Contributor

Out of curiosity, why is the --no-cache-dir flag used here? My guess would be that there is no causal-testing-framework wheel in the cache, as it's a fresh container.

Contributor Author

@cwild-UoS The --no-cache-dir option disables pip's cache, so downloaded packages are not stored in the image, which reduces the overall Docker image size. I haven't calculated what that difference is, but it's not too important for our purposes.

Contributor

Makes lots of sense! Good idea

@@ -0,0 +1,210 @@
import warnings
warnings.filterwarnings("ignore", message=".*The 'nopython' keyword.*")

Contributor

What warning does this filter out? I tried running the script without this line and didn't see any warnings.
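
As background on the mechanism, the `message` argument of `warnings.filterwarnings` is a regular expression matched against the start of the warning text, so the line under review silences any warning whose text contains "The 'nopython' keyword". The thread does not say which library emits it. The sketch below uses made-up warning texts and a hypothetical `emit_and_collect` helper to show the effect of such a filter.

```python
import warnings


def emit_and_collect(filter_pattern=None):
    """Emit two sample warnings and return the texts that were not filtered."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record everything by default
        if filter_pattern is not None:
            # Added last, so this filter is consulted before the "always" rule.
            warnings.filterwarnings("ignore", message=filter_pattern)
        # Illustrative messages, not copied from any real library.
        warnings.warn("The 'nopython' keyword argument was not supplied.",
                      DeprecationWarning)
        warnings.warn("Something unrelated happened.", UserWarning)
    return [str(w.message) for w in caught]


unfiltered = emit_and_collect()
filtered = emit_and_collect(".*The 'nopython' keyword.*")
print(len(unfiltered), len(filtered))
```

If no installed dependency emits a matching warning, the filter has no visible effect, which would be consistent with seeing nothing when the line is removed.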

"""
if not variables_path.exists() or variables_path.is_dir():
    raise ValidationError(f"Cannot find a valid settings file at {variables_path.absolute()}.")

Contributor

I'd argue this could be the built-in FileNotFoundError rather than a custom exception.
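
A minimal sketch of that suggestion follows, keeping the condition and message from the snippet under review and swapping only the exception type. The `require_file` name and the temporary files are illustrative assumptions.

```python
import tempfile
from pathlib import Path


def require_file(path: Path) -> Path:
    """Return path if it points at an existing non-directory, else raise."""
    if not path.exists() or path.is_dir():
        raise FileNotFoundError(f"Cannot find a valid settings file at {path.absolute()}.")
    return path


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    settings = folder / "variables.json"
    settings.write_text("{}")

    checked = require_file(settings)  # an existing file passes straight through

    outcomes = {}
    for label, candidate in (("missing", folder / "absent.json"), ("directory", folder)):
        try:
            require_file(candidate)
            outcomes[label] = "accepted"
        except FileNotFoundError:
            outcomes[label] = "rejected"

print(outcomes)
```

Because `FileNotFoundError` is a subclass of `OSError`, callers that already handle I/O errors would catch it without needing to import anything from the framework.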


constraints = set()

for variable, _inputs in zip(variables, inputs):

Contributor

Why is the element of inputs called _inputs rather than input? Typically, the leading-underscore convention is used for private variables.

Contributor Author

@cwild-UoS input is already a built-in function in Python, so it's not a good idea to overwrite it! I don't think the variable name matters too much in this case, but I can change it if you think it's needed.

Contributor

I think dropping the underscore would be good, just to signal the right intent. Also, isn't the name already not a built-in, being inputs rather than input?
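
The naming point can be checked directly: `input` is a built-in, `inputs` is not. One caveat is that reusing `inputs` as the loop variable would rebind the list being iterated, so the sketch below uses a different name. The `input_spec` name and the sample data are hypothetical and only stand in for whatever the real loop iterates over.

```python
import builtins

# Only the singular name collides with a built-in.
print(hasattr(builtins, "input"))   # the built-in function exists
print(hasattr(builtins, "inputs"))  # no built-in has the plural name

# Placeholder data standing in for the parsed variables file.
variables = ["x1", "x2"]
inputs = [{"datatype": "int"}, {"datatype": "float"}]

paired = []
for variable, input_spec in zip(variables, inputs):
    # input_spec avoids both the built-in name and the leading underscore,
    # and leaves the inputs list itself untouched.
    paired.append((variable, input_spec["datatype"]))

print(paired)
```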



if __name__ == "__main__":
    main()

Contributor

For some old-fashioned reason (Joe Heffer explained it once and I forgot), Python files should end with a blank line, haha.

@f-allian f-allian merged commit e7a3e90 into main Jan 30, 2024
@christopher-wild christopher-wild deleted the dafni-branch branch April 4, 2024 08:25
@f-allian f-allian linked an issue Apr 9, 2024 that may be closed by this pull request