
time to ensemble!! #7

Closed
ClimbsRocks opened this issue Nov 20, 2015 · 1 comment

@ClimbsRocks
Owner

focus:
within the validations folder:

  1. read in each validation file.
  2. create a list of lists (predictionsAllRows).
  3. create a list for each row (predictionsForRow).
  4. append each algo's prediction to predictionsForRow.
  5. once we have read in all the predictions, read in the validation dataset (with all the features).
  6. append predictionsAllRows to our validationData using hstack.
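the steps above could be sketched roughly like this in python (the per-algo prediction lists here are made-up stand-ins for the files in the validations folder, and the feature matrix is a random placeholder):

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

# hypothetical stand-ins for the validation files: each inner list is
# one algo's predictions over the validation rows.
algo_predictions = [
    [0.9, 0.1, 0.8],   # e.g. random forest predictions
    [0.7, 0.2, 0.6],   # e.g. gradient boosting predictions
]

# steps 1-4: one predictionsForRow list per row; append each algo's
# prediction to it, collecting everything into predictionsAllRows.
n_rows = len(algo_predictions[0])
predictions_all_rows = [[] for _ in range(n_rows)]
for preds in algo_predictions:
    for row_idx, pred in enumerate(preds):
        predictions_all_rows[row_idx].append(pred)

# steps 5-6: the validation dataset with all the original features
# (random placeholder here), stacked column-wise with the predictions.
validation_data = csr_matrix(np.random.rand(n_rows, 4))
combined = hstack([validation_data, csr_matrix(predictions_all_rows)])

print(combined.shape)  # (3, 6): 4 original features + 2 prediction columns
```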

post-MVP:
break out the data at this point into validationTrain and validationTest.
test is just going to be our actual test data.

from there, run a RF over the data.
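a minimal sketch of the split + RF step, assuming scikit-learn; the feature matrix and labels below are synthetic placeholders for the combined validation data assembled above:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# synthetic stand-in for the validation features + first-round predictions
rng = np.random.RandomState(0)
X = rng.rand(100, 6)
y = (X[:, 0] + X[:, 4] > 1).astype(int)

# post-MVP: break the data out into validationTrain and validationTest
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# run a RF over the data
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))
```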

post-post-post MVP:
run a modified version of machineJS over the data. see how we can train the best classifiers possible over the validation data and the predictions from the first round of machineJS.

then ensembler will simply average the results together.
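the averaging step itself is just a per-row mean over the second-round classifiers' outputs; a tiny illustration with made-up prediction vectors:

```python
import numpy as np

# hypothetical second-round predictions, one row per classifier;
# ensembler simply averages them per validation row.
round_two_predictions = np.array([
    [0.9, 0.8, 0.7],   # classifier A
    [0.5, 0.6, 0.9],   # classifier B
])

ensembled = round_two_predictions.mean(axis=0)
print(ensembled)  # [0.7 0.7 0.8]
```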

what this means:
leave the current ensembler flow untouched. we will use that again on the second round, once we have run things back through machineJS.
we need to create a new workflow in machineJS to accommodate this (no splitData or dataFormatting. that might be the only difference)

http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.hstack.html

@ClimbsRocks
Owner Author

whew, finished!
