Skip to content

[SYSTEMDS-2844] RandomForest.dml implementation #1204

Closed
ChristofJ95 wants to merge 4 commits intoapache:masterfrom
ChristofJ95:master
Closed

[SYSTEMDS-2844] RandomForest.dml implementation #1204
ChristofJ95 wants to merge 4 commits intoapache:masterfrom
ChristofJ95:master

Conversation

@ChristofJ95
Copy link
Contributor

@ChristofJ95 ChristofJ95 commented Mar 17, 2021

Das Hauptproblem, welches wir derzeit haben liegt darin das wir nicht verstehen, warum der Parameter „cur_small_nodes“ = 0 ist. Während wir glauben, dass wir das Binning richtig initialisieren, auch die entsprechende Matrix zurückgeben, wird dennoch kein "small node" gefunden. Folglich wird die perform schleife in Zeile 741 ausgeführt mit 1:0 was einen Error wirft. Wir haben ebenfalls die gesamte parfor in eine if Abfrage gegeben, welche bei dem Fall, dass "cur_num_small_nodes = 0" ist die gesamte parfor überspringt dennoch werden weitere Errors geworfen.
Beim Debuggen haben wir herausgefunden das „cur_num_small_nodes“ von „cur_max_I_gain“ abhängt. Dieses wiederrum hängt von „I_gain_portion“ usw. ab. Am Ende haben wir uns komplett darin verloren herauszufinden an welchem Parameter es am Endeffekt wirklich liegt.

Zusatzbemerkung:
Die implementierte Binning funktion wurde von uns zum Zwecke des debuggings extrem vereinfacht und auf 2 while schleifen geändert. In der originalen, haben wir alles innerhalb einer while schleife berechnet (Mittels arithmetischem Mittel) und zurückgegeben, zum Zwecke der Errorsuche haben wir diese vereinfacht um mögliche Fehler schneller zu finden.

Edit for archival purpose: (english translation)

The main problem we currently have is that we don't understand why the parameter "cur_small_nodes" = 0. While we believe that we initialize the binning correctly and also return the corresponding matrix, no "small node" is found. As a result, the parfor loop in line 741 is executed with 1: 0, which throws an error.

We have also given the entire parfor in an if query, which skips the entire parfor in the case that cur_num_small_nodes = 0 but further errors are thrown.
While debugging we found out that cur_num_small_nodes depends on cur_max_I_gain. This in turn depends on I_gain_portion and so on. In the end we got completely lost in figuring out which parameters really mattered in the end.

Additional remark:
The implemented binning function was extremely simplified by us for the purpose of debugging and changed to 2 while loops. In the original, we calculated everything within a while loop (using the arithmetic mean) and returned it, for the purpose of error search we have simplified it in order to find possible errors more quickly.

@ChristofJ95 ChristofJ95 reopened this Mar 17, 2021
@ChristofJ95 ChristofJ95 changed the title WIP [WIP] RandomForest.dml implementation Mar 17, 2021
@mboehm7
Copy link
Contributor

mboehm7 commented Apr 3, 2021

LGTM. Thanks for the initial builtin function and cleanups @ChristofJ95 and team. I now did some additional cleanups (running junit test, test script, builtin function registration, docs cleanup, replaced equi-height binning with quantile).

@asfgit asfgit closed this in 968f45a Apr 3, 2021
@mboehm7
Copy link
Contributor

mboehm7 commented Apr 3, 2021

Also, this script was/is certainly one of the most convoluted ones, so don't feel bad about not fully completing it. We appreciate your effort. I now also have a plan to rework this and make it much easier to understand (and hopefully faster too).

@j143 j143 changed the title [WIP] RandomForest.dml implementation [SYSTEMDS-2844] RandomForest.dml implementation May 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants