Feature generation refactor (closes gh-32) #70
Augmented feature generation testing (which currently tests all features against previously computed values) with unit tests for specific algorithms. Where possible, the tests compare to a priori closed-form solutions; in the few cases where this isn't practical, we compare to the current output of that function (so these tests are only beneficial going forward).
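As an illustration of the closed-form style of test described above, here is a minimal sketch. The `amplitude` and `std` helpers and the synthetic sine input are hypothetical stand-ins, not functions from this PR; the point is only that a sine sampled uniformly over whole cycles has a known amplitude and a known standard deviation (amplitude / sqrt(2)).

```python
import math


def amplitude(values):
    """Hypothetical feature: half the peak-to-peak range."""
    return (max(values) - min(values)) / 2.0


def std(values):
    """Hypothetical feature: population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def make_sine(n_points, amp, n_cycles):
    """Sample `n_cycles` whole cycles of a sine wave at `n_points` points."""
    return [amp * math.sin(2 * math.pi * n_cycles * i / n_points)
            for i in range(n_points)]


def test_amplitude_closed_form():
    # With 1000 points over 5 cycles the peaks and troughs are sampled,
    # so the amplitude should match the generating value.
    values = make_sine(1000, 3.0, 5)
    assert abs(amplitude(values) - 3.0) < 1e-9


def test_std_closed_form():
    # Over whole cycles the mean is 0 and the mean of sin^2 is 1/2,
    # so the standard deviation is amp / sqrt(2).
    values = make_sine(1000, 3.0, 5)
    assert abs(std(values) - 3.0 / math.sqrt(2)) < 1e-9
```

Tests of this shape fail as soon as the algorithm itself is wrong, whereas comparisons against previously computed values only detect changes.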
Move relevant code from TCP library into new path; leave TCP code in place for now, but nothing there should be used anywhere at this point.
Remove unnecessary code from lc_tools module that is not used for feature generation; use dask for computing feature values, in the same way as is now done for science_features.
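For readers unfamiliar with the dask approach mentioned above, here is a rough sketch of the task-graph idea. It is not the graph used in this PR: the feature names and helper functions are hypothetical, and a small hand-written evaluator stands in for dask's scheduler so that the example has no dependencies. The dict follows dask's graph format (a key maps either to data or to a tuple of a callable and its arguments), so with dask installed `dask.get(feature_graph, keys)` should evaluate the same structure.

```python
def mean(values):
    return sum(values) / len(values)


def amplitude(maximum, minimum):
    return (maximum - minimum) / 2.0


def evaluate(graph, key):
    """Evaluate one key of a dask-style graph.

    Hand-rolled stand-in for a dask scheduler (no caching, no parallelism).
    """
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, args = task[0], task[1:]
        # String arguments that name another key are computed first.
        resolved = [
            evaluate(graph, arg) if isinstance(arg, str) and arg in graph else arg
            for arg in args
        ]
        return func(*resolved)
    return task  # plain data


# Hypothetical graph: "amplitude" reuses the "maximum" and "minimum" tasks.
feature_graph = {
    "m": [10.0, 12.0, 11.0, 13.0],
    "mean": (mean, "m"),
    "minimum": (min, "m"),
    "maximum": (max, "m"),
    "amplitude": (amplitude, "maximum", "minimum"),
}

features = {name: evaluate(feature_graph, name) for name in ("mean", "amplitude")}
```

Expressing features as a graph makes dependencies between them explicit, so only the intermediate values needed for the requested features have to be computed.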
@@ -2576,7 +2576,7 @@ def test_prediction_page(self):
res_dict = json.loads(rv.data)
while "currently running" in fa.check_job_status(res_dict["PID"]):
time.sleep(1)
-time.sleep(1)
+time.sleep(5)
Is it necessary to sleep 5 seconds here?!
Hm, I don't remember changing that...going to just revert it and double-check that tests still pass.
Naming conventions and implementation details were inconsistent between `lc_tools` (now `obs_feature_tools`), `science_feature_tools`, and various other places throughout the featurization pipeline. Improves consistency within these areas and removes some deprecated/unnecessary logic, including 1) featurization of a single time series now always returns a dict, instead of sometimes a list containing a single dict; 2) `short_fname` keys into dictionaries are now always the filename without the extension (previously the extension was included in some places and not others).
Undo test_prediction_page sleep change
This reverts commit 0120754. For now, restore TCP module to simplify the refactor pull request. Will remove TCP later after the replacement code has been merged.
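The `short_fname` convention from the commit messages above (a filename without its extension, used as a dictionary key) can be sketched as follows. The helper name `make_short_fname` and the example path are assumptions for illustration; the PR does not show how the key is actually derived.

```python
import os


def make_short_fname(path):
    """Return the basename of `path` with its final extension removed.

    Hypothetical helper; only the last extension is stripped, so
    "a.tar.gz" becomes "a.tar".
    """
    return os.path.splitext(os.path.basename(path))[0]


# Keying results by the extension-free name means lookups do not depend on
# whether the caller remembered to include ".dat".
results = {make_short_fname("/tmp/uploads/TESTRUN_215153.dat"): {"amplitude": 1.5}}
```

Deriving the key in one place avoids the earlier situation where the extension was included in some places and not others.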
2edcfc0 to 7c54921
return None


def generate_obs_features(t, m, e, features_to_compute=cfg.features_list_obs):
This whole module is so much cleaner now
Yay! Reminds me I need to write a test for the binning/peak-finding stuff...
This is such a big clean-up that I don't want to hold back too much on details. I've left a few comments, but I'm +1 on merging.
@@ -1554,7 +1554,7 @@ def test_prediction_proc(self):
entry = r.table("predictions").get("TEMP_TEST01").run(conn)
pred_results_list_dict = entry
assert(pred_results_list_dict["pred_results_list_dict"]
-["TESTRUN_215153.dat"][0][0] in ['Beta_Lyrae',
+["TESTRUN_215153"][0][0] in ['Beta_Lyrae',
'Herbig_AEBE'])
Here's one example of an indentation issue
Yeah, I forgot to save my vim plugins from my old machine and I'm still trying to get the indentation working correctly...
@bnaul just let me know when you've pushed the last changes you want to include and I'll merge
This must be the error Josh mentioned is happening in Drone...
try
f76e839 to c1776d2
dfd97e3 to 1059710
Tidying up: remove import of deprecated `lc_tools`, add a couple tests for `obs_feature_tools`, change some test data, remove extraneous comments, add a couple of docstrings, change `test_flask_app` database teardown procedure.
1059710 to 2660c11
I've taken care of the notes here and the other things I was still working on, ready when you are...
All clear!
Feature generation refactor (closes gh-32)
Once you've made sure that you've checked in all your files:
All relevant code was moved to mltsp.science_features in cesium-ml#70.
- Move relevant code from the `TCP` module into `science_features`; `TCP` is no longer in use anywhere, will be removed later
- Rename `lc_tools` to `obs_feature_tools`, remove unused legacy code
- `short_fname` should now always be a basename without an extension (this is used as a key in various dictionaries)
- Some featurization calls returned a `dict`, and some a `list` of a single `dict`; should always be a `dict` now, and a lot of tests were changed to reflect this flattening (e.g. `result[0]['somekey']` -> `result['somekey']`)

As far as I know this is functionally finished (all tests passing for me), just needs to be thoroughly reviewed. The new functionality should all have docstrings, but some of the old code that I changed was lacking docstrings to begin with; will try to go back and add them as I review my changes.
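The flattening from a single-element `list` to a plain `dict` can be illustrated with a small sketch. Both functions below are hypothetical and are not part of this PR: `featurize_single_ts` shows the new return shape, and `as_feature_dict` is one way downstream code might tolerate results stored in the old shape during a transition.

```python
def featurize_single_ts(t, m, e):
    """Hypothetical featurizer: always returns one flat dict of features."""
    return {"n_epochs": len(t), "max": max(m), "min": min(m)}


def as_feature_dict(result):
    """Accept either the old shape ([dict]) or the new shape (dict)."""
    if isinstance(result, dict):
        return result
    if isinstance(result, list) and len(result) == 1 and isinstance(result[0], dict):
        return result[0]
    raise TypeError("expected a dict or a list containing a single dict")


new_style = featurize_single_ts([0.0, 1.0, 2.0], [10.0, 12.0, 11.0], [0.1, 0.1, 0.1])
old_style = [new_style]  # what some code paths used to return

# New code indexes the dict directly; no more result[0]['somekey'].
assert new_style["max"] == as_feature_dict(old_style)["max"]
```

Returning a single shape everywhere removes the need for callers to know which code path produced a result.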