Feature generation refactor (closes gh-32) #70

bnaul · 2015-10-01T00:09:31Z

Move relevant code from TCP module into science_features; TCP is no longer in use anywhere, will be removed later
Rename lc_tools to obs_feature_tools, remove unused legacy code
Improve consistency of style/parameters between various feature generation functions, including:
- short_fname should now always be a basename without an extension (this is used as a key in various dictionaries)
- Some featurization functions were returning a dict, and some a list of a single dict; should always be a dict now, and a lot of tests were changed to reflect this flattening (e.g. result[0]['somekey'] -> result['somekey'])
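A minimal sketch of the two conventions above. The helper names `short_fname_key` and `flatten_result` are hypothetical, chosen for illustration, and are not taken from the mltsp code:

```python
import os


def short_fname_key(path):
    """Return the basename of `path` without its (last) extension."""
    return os.path.splitext(os.path.basename(path))[0]


def flatten_result(result):
    """Return a single dict, unwrapping a one-element list if needed.

    Hypothetical helper: mirrors the change from result[0]['somekey']
    to result['somekey'] described above.
    """
    if isinstance(result, list):
        if len(result) != 1:
            raise ValueError("expected a list containing exactly one dict")
        return result[0]
    return result
```

Note that `os.path.splitext` strips only the final extension, so a name like `x.tar.gz` would become `x.tar`; whether that matters depends on the file names actually in use.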

As far as I know this is functionally finished (all tests passing for me), just needs to be thoroughly reviewed. The new functionality should all have docstrings, but some of the old code that I changed was lacking docstrings to begin with; will try to go back and add them as I review my changes.

Augmented feature generation testing (which currently tests all features against previously-computed values) with unit tests for specific algorithms. Where possible the tests compare to a priori closed-form solutions; in a few cases where this isn't practical, we compare to the current output of that function (so these tests are only beneficial going forward).
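As an illustration of the closed-form style of test described above, here is a sketch using a hypothetical feature `half_range` (half the difference between the maximum and minimum magnitude). It is not one of the actual mltsp feature functions; the point is only that a sine wave of known amplitude sampled at its peaks has a known answer:

```python
import math


def half_range(m):
    """Hypothetical feature: half the peak-to-peak range of the values."""
    return (max(m) - min(m)) / 2.0


def test_half_range_sine():
    """For m = A * sin(2*pi*t) sampled at its extrema, half_range is A."""
    amplitude = 2.5
    # Sampling every quarter period hits both the maxima and the minima.
    t = [k / 4.0 for k in range(8)]
    m = [amplitude * math.sin(2 * math.pi * ti) for ti in t]
    assert math.isclose(half_range(m), amplitude, rel_tol=1e-12)
```

A test against the function's current output would look the same, except that the expected value is a stored number rather than one derived from the input.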

Move relevant code from TCP library into new path; leave TCP code in place for now, but nothing there should be used anywhere at this point.

Remove unnecessary code from lc_tools module that is not used for feature generation; use dask for computing feature values, in the same way as is now done for science_features.

acrellin · 2015-10-01T00:20:31Z

mltsp/tests/test_flask_app.py

@@ -2576,7 +2576,7 @@ def test_prediction_page(self):
            res_dict = json.loads(rv.data)
            while "currently running" in fa.check_job_status(res_dict["PID"]):
                time.sleep(1)
-            time.sleep(1)
+            time.sleep(5)


Is it necessary to sleep 5 seconds here?!

Hm, I don't remember changing that...going to just revert it and double-check that tests still pass.

Naming conventions and implementation details were inconsistent between `lc_tools` (now `obs_feature_tools`), `science_feature_tools`, and various other places throughout the featurization pipeline. This commit improves consistency within these areas and removes some deprecated/unnecessary logic, including:
1) featurization of a single time series now always returns a dict, instead of sometimes a list containing a single dict;
2) `short_fname` keys into dictionaries are now always the filename without the extension (previously the extension was included in some places and not others).

Undo test_prediction_page sleep change

This reverts commit 0120754. For now, restore TCP module to simplify the refactor pull request. Will remove TCP later after the replacement code has been merged.

acrellin · 2015-10-01T00:35:55Z

mltsp/obs_feature_tools.py

+        return None
+
+
+def generate_obs_features(t, m, e, features_to_compute=cfg.features_list_obs):


This whole module is so much cleaner now

Yay! Reminds me I need to write a test for the binning/peak-finding stuff...

stefanv · 2015-10-01T14:23:37Z

This is such a big clean-up that I don't want to hold back too much on details. I've left a few comments, but I'm +1 on merging.

acrellin · 2015-10-01T18:20:17Z

mltsp/tests/test_flask_app.py

@@ -1554,7 +1554,7 @@ def test_prediction_proc(self):
            entry = r.table("predictions").get("TEMP_TEST01").run(conn)
            pred_results_list_dict = entry
            assert(pred_results_list_dict["pred_results_list_dict"]
-                   ["TESTRUN_215153.dat"][0][0] in ['Beta_Lyrae',
+                   ["TESTRUN_215153"][0][0] in ['Beta_Lyrae',
                                                    'Herbig_AEBE'])


Here's one example of an indentation issue

It may be worth installing a pyflakes checker for your editor (it typically finds this sort of thing quite easily).

Yeah, I forgot to save my vim plugins from my old machine and I'm still trying to get the indentation working correctly...

acrellin · 2015-10-01T18:32:06Z

@bnaul just let me know when you've pushed the last changes you want to include and I'll merge

acrellin · 2015-10-01T18:35:29Z

➜  mltsp git:(7c54921) make init
python start_mltsp.py --db-init --force
Vendor:  Continuum Analytics, Inc.
Package: mkl
Message: trial mode expires in 29 days
Traceback (most recent call last):
  File "start_mltsp.py", line 3, in <module>
    from mltsp.Flask import flask_app as fa
  File "/home/arien/projects/mltsp/mltsp/Flask/flask_app.py", line 41, in <module>
    from .. import featurize
  File "/home/arien/projects/mltsp/mltsp/featurize.py", line 15, in <module>
    from . import lc_tools
ImportError: cannot import name lc_tools
make: *** [init] Error 1

acrellin · 2015-10-01T18:37:37Z

This must be the error Josh mentioned is happening in Drone...

acrellin · 2015-10-01T18:38:42Z

try
find . -name '*.pyc' -exec rm -rf {} \;
and then run the tests again

acrellin · 2015-10-01T18:44:53Z

find . -name '*.pyc' -delete is probably much safer actually...
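For reference, a rough Python equivalent of the `find . -name '*.pyc' -delete` command, for environments where `find` isn't available. `remove_pyc` is a hypothetical helper, not part of mltsp. The reason this can matter: in Python 2 an orphaned `.pyc` can still be imported after its `.py` file has been removed, which can hide a stale import locally.

```python
import os


def remove_pyc(root="."):
    """Delete every *.pyc file under `root`; return the removed paths."""
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".pyc"):
                path = os.path.join(dirpath, name)
                os.remove(path)  # removes files only, never directories
                removed.append(path)
    return sorted(removed)
```

Unlike `-exec rm -rf`, this only ever removes regular files whose names end in `.pyc`.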

Tidying up: remove import of deprecated `lc_tools`, add a couple tests for `obs_feature_tools`, change some test data, remove extraneous comments, add a couple of docstrings, change `test_flask_app` database teardown procedure.

bnaul · 2015-10-01T21:58:37Z

I've taken care of the notes here and the other things I was still working on, ready when you are...

acrellin · 2015-10-01T22:02:42Z

All clear!

Feature generation refactor (closes gh-32)

stefanv · 2015-10-01T22:28:45Z

Once you've made sure that you've checked in all your files: git clean -xdf

All relevant code was moved to mltsp.science_features in cesium-ml#70.

bnaul added 6 commits September 25, 2015 16:00

Initial refactor of science feature generation

080f07e

Move relevant code from TCP library into new path; leave TCP code in place for now, but nothing there should be used anywhere at this point.

Remove mltsp.TCP module

0120754

Change test data to be consistent with other new tests

fced01b

Improve docstrings for science_features module.

57243c7

Refactor lc_tools feature generation

5f812c7

Remove unnecessary code from lc_tools module that is not used for feature generation; use dask for computing feature values, in the same way as is now done for science_features.

acrellin reviewed Oct 1, 2015

bnaul added 2 commits September 30, 2015 17:27

Revert "Remove mltsp.TCP module"

7c54921

This reverts commit 0120754. For now, restore TCP module to simplify the refactor pull request. Will remove TCP later after the replacement code has been merged.

bnaul force-pushed the bnaul-TCP_refactor branch from 2edcfc0 to 7c54921 on October 1, 2015 00:30

acrellin reviewed Oct 1, 2015

acrellin mentioned this pull request Oct 1, 2015

Review the current feature extractor code & add tests #32

Closed

bnaul force-pushed the bnaul-TCP_refactor branch from f76e839 to c1776d2 on October 1, 2015 21:37

stefanv changed the title ~~Feature generation refactor~~ Feature generation refactor (closes gh-32) Oct 1, 2015

bnaul force-pushed the bnaul-TCP_refactor branch 2 times, most recently from dfd97e3 to 1059710 on October 1, 2015 21:50

Remove old import + other style changes

2660c11

Tidying up: remove import of deprecated `lc_tools`, add a couple tests for `obs_feature_tools`, change some test data, remove extraneous comments, add a couple of docstrings, change `test_flask_app` database teardown procedure.

bnaul force-pushed the bnaul-TCP_refactor branch from 1059710 to 2660c11 on October 1, 2015 21:56

acrellin added a commit that referenced this pull request Oct 1, 2015

Merge pull request #70 from bnaul/bnaul-TCP_refactor

1a4c6b3

Feature generation refactor (closes gh-32)

acrellin merged commit 1a4c6b3 into cesium-ml:master Oct 1, 2015

bnaul added a commit to bnaul/cesium that referenced this pull request Oct 2, 2015

Remove mltsp.TCP module

96f624f

All relevant code was moved to mltsp.science_features in cesium-ml#70.

bnaul added a commit to bnaul/cesium that referenced this pull request Oct 2, 2015

Remove mltsp.TCP module

b999a37

All relevant code was moved to mltsp.science_features in cesium-ml#70.

bnaul mentioned this pull request Oct 2, 2015

Remove mltsp.TCP module #72

Merged

acrellin pushed a commit to acrellin/cesium that referenced this pull request Oct 7, 2015

Remove mltsp.TCP module

e4c7de5

All relevant code was moved to mltsp.science_features in cesium-ml#70.
