[SYSTEMDS-2844] RandomForest.dml implementation #1204
Closed
ChristofJ95 wants to merge 4 commits into apache:master from ChristofJ95:master
Conversation
Contributor
LGTM. Thanks for the initial builtin function and cleanups, @ChristofJ95 and team. I have now done some additional cleanups (running JUnit test, test script, builtin function registration, docs cleanup, replaced equi-height binning with quantile).
Contributor
Also, this script was/is certainly one of the most convoluted ones, so don't feel bad about not fully completing it. We appreciate your effort. I now also have a plan to rework this and make it much easier to understand (and hopefully faster too).
The main problem we currently have is that we do not understand why the parameter "cur_small_nodes" is 0. Although we believe that we initialize the binning correctly and return the corresponding matrix, no "small node" is found. As a result, the parfor loop in line 741 is executed over 1:0, which throws an error. We also wrapped the entire parfor in an if statement that skips it when "cur_num_small_nodes" = 0, but further errors are still thrown.
While debugging, we found that "cur_num_small_nodes" depends on "cur_max_I_gain", which in turn depends on "I_gain_portion", and so on. In the end we got completely lost trying to work out which parameter is actually responsible.
Additional remark:
The binning function we implemented was heavily simplified for debugging purposes and changed to use 2 while loops. In the original version, we computed everything within a single while loop (using the arithmetic mean) and returned it; we simplified it so that possible errors could be found more quickly.
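For illustration, here is a minimal Python sketch of the guard described above. It is not the DML code from this PR: the names "process_small_nodes" and "build_subtree" are hypothetical. It shows one possible reason, not confirmed in this thread, why skipping the loop alone might not be enough: code after the loop may still expect the loop's result to exist, so the sketch defines an empty result before the guard.

```python
def process_small_nodes(num_small_nodes, build_subtree):
    """Call build_subtree for nodes 1..num_small_nodes (1-based, inclusive).

    Hypothetical helper: always returns a list, which is empty when
    there are no small nodes to process.
    """
    # Define the result up front so that code after the loop always
    # has a well-defined (possibly empty) value to work with.
    results = []

    # Guard: with zero small nodes, do not iterate over a 1:0 range at all.
    if num_small_nodes <= 0:
        return results

    for node in range(1, num_small_nodes + 1):
        results.append(build_subtree(node))
    return results


if __name__ == "__main__":
    # Zero small nodes: the loop is skipped and an empty list comes back.
    print(process_small_nodes(0, lambda node: node * 10))
    # Three small nodes: the callback runs for nodes 1, 2 and 3.
    print(process_small_nodes(3, lambda node: node * 10))
```

If the real script concatenates or indexes per-node results after the loop, the same idea would apply there: the zero-node case needs an explicit empty value rather than an undefined one.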