-
Notifications
You must be signed in to change notification settings - Fork 5
Dockerised framework for DAFNI #251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
🦙 MegaLinter status:
|
Descriptor | Linter | Files | Fixed | Errors | Elapsed time |
---|---|---|---|---|---|
black | 29 | 1 | 0.95s | ||
✅ PYTHON | pylint | 29 | 0 | 3.89s |
See detailed report in MegaLinter reports
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #251 +/- ##
=======================================
Coverage 95.69% 95.69%
=======================================
Files 22 22
Lines 1557 1557
=======================================
Hits 1490 1490
Misses 67 67 Continue to review full report in Codecov by Sentry.
|
dafni/inputs/variables.json
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps I'm just not software engineering-y enough, but why do we need separate files for inputs and tests, each with a single key in for which the value is a list? Could we not either combine tests.json
and variables.json
into a single file, or have them each just contain a list of the values, or is that in some way bad practice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A potential alternative to this would be to include this metadata in the dag file as attributes for each variable. That could make things simpler from a user's point of view as they'd then only have to make 1 file (the DAG, the tests can be built automatically from the DAG).
estimators = {"LinearRegressionEstimator": LinearRegressionEstimator} | ||
|
||
# Step 3: Define the expected variables | ||
|
||
expected_outcome_effects = { | ||
"Positive": Positive(), | ||
"Negative": Negative(), | ||
"NoEffect": NoEffect(), | ||
"SomeEffect": SomeEffect()} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way we can get round this hardcoding? This makes it pretty much impossible for a user to implement customisations, and is also the only blocker to having a single default main.py
file (I think), now we're putting variables in a file.
@f-allian please can you post your
|
Hi Michael, I believe your
otherwise your |
Thanks Chris, that works. Would it be sensible to commit the |
@jmafoster1 There isn't a specific reason why I didn't include it in my commit in this case. Generally speaking, environment files typically contain passwords, API keys etc., which is why they're left out of the version controlling stage. I don't think DAFNI requires that to be specified in our case, so I can provide a template if you think that's useful. Also, if you think some of my variable naming conventions aren't helpful/easily identifiable, please let me know! |
I think it would be helpful to show an example/template. I'm happy with variable names, but we should update the "without docker" run command to include the |
dafni/Dockerfile
Outdated
RUN pip install causal-testing-framework --no-cache-dir | ||
|
||
# Use the necessaary environment variables for the script's inputs | ||
ENV VARIABLES=./inputs/variables.json \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that these are overwritten by the .env but VARIABLES
here should be VARIABLES_PATH
. Also, if we're assuming a .env to be supplied, are setting these within the dockerfile necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rsomers1998 You were indeed correct! Thanks for flagging this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks really really good! Found some small clean ups
WORKDIR /usr/src/app/ | ||
|
||
# Install core dependencies using PyPi | ||
RUN pip install causal-testing-framework --no-cache-dir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity why is the --no-cache-dir
flag used here? My guess would be that that there is no causal-testing-framework
wheel in cache as it's a fresh container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cwild-UoS The --no-cache-dir
option disables the downloading and storing of cached packages, which reduces the overall Docker image size. I haven't calculated what that difference is, but it's not too important for our purposes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes lots of sense! Good idea
dafni/main_dafni.py
Outdated
@@ -0,0 +1,210 @@ | |||
import warnings | |||
warnings.filterwarnings("ignore", message=".*The 'nopython' keyword.*") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What warning does this filter out? I tried running the script without this line and didn't see any warnings.
dafni/main_dafni.py
Outdated
""" | ||
if not variables_path.exists() or variables_path.is_dir(): | ||
|
||
raise ValidationError(f"Cannot find a valid settings file at {variables_path.absolute()}.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd argue this could be the built in FileNotFoundError
rather than a custom exception.
dafni/main_dafni.py
Outdated
|
||
constraints = set() | ||
|
||
for variable, _inputs in zip(variables, inputs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the element of inputs
called _inputs
rather than input
? Typically the underscore convention is used for private variables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cwild-UoS input
is already a pre-defined method in Python, so it's not a good idea to overwrite it! I don't think the variable name matters too much in this case, but I can change it if you think it's needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think changing it from having an underscore would be good just to give off the right intent. Also isn't it already non pre-defined being inputs
rather than input
?
dafni/main_dafni.py
Outdated
|
||
|
||
if __name__ == "__main__": | ||
main() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some old fashioned reason (joe heffer explained it once and I forgot), python files should end with a blank line haha
This PR contains an initial commit for the dockerisation of the causal testing framework, so that it can be hosted on DAFNI - closes issue #92 after review.
Steps to reproduce tests
This PR uses the vaccinating elderly example as a test-case. The entry point to the framework is wrapped in a script called
main_dafni.py
, which takes in 4 mandatory input arguments, includingdata_path
,dag_path
,tests_path
,variables_path
.1. Without Docker
Simply run:
python main_dafni.py --variables_path $VARIABLES --dag_path $DAG_PATH --data_path $DATA_PATH --tests_path $CAUSAL_TESTS
,and point to the path containing your configuration files (for this example, everything is defined under
./dafni/inputs
). The resultant causal tests will be saved in.json
format in the folder./dafni/outputs
. (Note; the folder structure here is important for DAFNI)2. With Docker
.env
file in the./dafni
directory containing the environment variables, which is then passed into the dockerfile for the build.docker-compose up
.Overall, the total execution time (building of the image and script execution) takes ~1 minute on my computer (this may take slightly longer on a new setup as there won't be any cached data).
Overall Progress
model_definition.yaml
to contain the appropriate dataslot IDs (if any) needed for the execution of the framework