Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature generation refactor (closes gh-32) #70

Merged
merged 9 commits into from Oct 1, 2015

Conversation

bnaul
Copy link
Contributor

@bnaul bnaul commented Oct 1, 2015

  • Move relevant code from TCP module into science_features; TCP is no longer in use anywhere, will be removed later
  • Rename lc_tools to obs_feature_tools, remove unused legacy code
  • Improve consistency of style/parameters between various feature generation functions, including:
    • short_fname should now always be a basename without an extension (this is used as a key in various dictionaries)
    • Some featurization functions were returning a dict, and some a list of a single dict; should always be a dict now, and a lot of tests were changed to reflect this flattening (e.g. result[0]['somekey'] -> result['somekey'])

As far as I know this is functionally finished (all tests passing for me), just needs to be thoroughly reviewed. The new functionality should all have docstrings, but some of the old code that I changed was lacking docstrings to begin with; will try to go back and add them as I review my changes.

Augmented feature generation testing (which currently tests all features
against previously-computed values) with unit tests for specific
algorithms. Where possible the tests compare to a priori closed form
solutions; in a few cases where this isn't practical, we compare to the
current outputof that function (so these tests are only beneficial
going forward).
Move relevant code from TCP library into new path; leave TCP code in
place for now, but nothing there should be used anywhere at this point.
Remove unnecessary code from lc_tools module that is not used for
feature generation; use dask for computing feature values, in the same
way as is now done for science_features.
@@ -2576,7 +2576,7 @@ def test_prediction_page(self):
res_dict = json.loads(rv.data)
while "currently running" in fa.check_job_status(res_dict["PID"]):
time.sleep(1)
time.sleep(1)
time.sleep(5)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's necessary to sleep 5 seconds here?!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I don't remember changing that...going to just revert it and double-check that tests still pass.

Naming conventions and implementation details were inconsistent between
`lc_tools` (now `obs_feature_tools`), `science_feature_tools`, and in
various other places throughout the featurization pipeline. Improves
consistency within these areas and removes some deprecated/unnecessary
logic, including 1) featurization of a single time series now always
returns a dict, instead of sometimes a list containing a single dict;
2) `short_fname` keys into dictionaries are now always the filename
without the extension (previously the extension was included in some
places and not others).

Undo test_prediction_page sleep change
This reverts commit 0120754.
For now, restore TCP module to simplify the refactor pull request. Will
remove TCP later after the replacement code has been merged.
return None


def generate_obs_features(t, m, e, features_to_compute=cfg.features_list_obs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole module is so much cleaner now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay! Reminds me I need to write a test for the binning/peak-finding stuff...

@stefanv
Copy link
Contributor

stefanv commented Oct 1, 2015

This is such a big clean-up that I don't want to hold back too much on details. I've left a few comments, but I'm +1 on merging.

@@ -1554,7 +1554,7 @@ def test_prediction_proc(self):
entry = r.table("predictions").get("TEMP_TEST01").run(conn)
pred_results_list_dict = entry
assert(pred_results_list_dict["pred_results_list_dict"]
["TESTRUN_215153.dat"][0][0] in ['Beta_Lyrae',
["TESTRUN_215153"][0][0] in ['Beta_Lyrae',
'Herbig_AEBE'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's one example of an indentation issue

Copy link
Contributor

@stefanv stefanv Oct 1, 2015 via email

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I forgot to save my vim plugins from my old machine and I'm still trying to get the indentation working correctly...

@acrellin
Copy link
Member

acrellin commented Oct 1, 2015

@bnaul just let me know when you've pushed the last changes you want to include and I'll merge

@acrellin
Copy link
Member

acrellin commented Oct 1, 2015

➜  mltsp git:(7c54921) make init
python start_mltsp.py --db-init --force
Vendor:  Continuum Analytics, Inc.
Package: mkl
Message: trial mode expires in 29 days
Traceback (most recent call last):
  File "start_mltsp.py", line 3, in <module>
    from mltsp.Flask import flask_app as fa
  File "/home/arien/projects/mltsp/mltsp/Flask/flask_app.py", line 41, in <module>
    from .. import featurize
  File "/home/arien/projects/mltsp/mltsp/featurize.py", line 15, in <module>
    from . import lc_tools
ImportError: cannot import name lc_tools
make: *** [init] Error 1

@acrellin
Copy link
Member

acrellin commented Oct 1, 2015

This must be the error Josh mentioned is happening in Drone...

@acrellin
Copy link
Member

acrellin commented Oct 1, 2015

try
find . -name '*.pyc' -exec rm -rf {} \;
and then run the tests again

@acrellin
Copy link
Member

acrellin commented Oct 1, 2015

find . -name '*.pyc' -delete is probably much safer actually...

@stefanv stefanv changed the title Feature generation refactor Feature generation refactor (closes gh-32) Oct 1, 2015
@bnaul bnaul force-pushed the bnaul-TCP_refactor branch 2 times, most recently from dfd97e3 to 1059710 Compare October 1, 2015 21:50
Tidying up: remove import of deprecated `lc_tools`, add a couple tests
for `obs_feature_tools`, change some test data, remove extraneous
comments, add a couple of docstrings, change `test_flask_app` database
teardown procedure.
@bnaul
Copy link
Contributor Author

bnaul commented Oct 1, 2015

I've taken care of the notes here and the other things I was still working on, ready when you are...

@acrellin
Copy link
Member

acrellin commented Oct 1, 2015

All clear!

acrellin added a commit that referenced this pull request Oct 1, 2015
Feature generation refactor (closes gh-32)
@acrellin acrellin merged commit 1a4c6b3 into cesium-ml:master Oct 1, 2015
@stefanv
Copy link
Contributor

stefanv commented Oct 1, 2015

Once you've made sure that you've checked in all your files: git clean -xdf

bnaul added a commit to bnaul/cesium that referenced this pull request Oct 2, 2015
All relevant code was moved to mltsp.science_features in cesium-ml#70.
bnaul added a commit to bnaul/cesium that referenced this pull request Oct 2, 2015
All relevant code was moved to mltsp.science_features in cesium-ml#70.
@bnaul bnaul mentioned this pull request Oct 2, 2015
acrellin pushed a commit to acrellin/cesium that referenced this pull request Oct 7, 2015
All relevant code was moved to mltsp.science_features in cesium-ml#70.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants