Skip to content

[DIA] Sherlock#1183

Closed
tuchris wants to merge 49 commits intoapache:masterfrom
tuchris:fb_sherlock
Closed

[DIA] Sherlock#1183
tuchris wants to merge 49 commits intoapache:masterfrom
tuchris:fb_sherlock

Conversation

@tuchris
Copy link
Contributor

@tuchris tuchris commented Feb 15, 2021

Hi,
this is our initial version of the implementation of the Sherlock project.
http://sherlock.media.mit.edu/assets/2019-Sherlock-KDD.pdf
The neural network part is implemented like described in the paper. Is there a possibility/need to write test cases for a neuronal network?
The feature extraction is currently missing, but a workflow on how we are planing to extract them is set up in the UtilFunctions.class. Could you please confirm if we are on the right trace?
Thanks for the review and feedback. :)

tuchris and others added 30 commits December 1, 2020 21:12
* no return value
* input file not changeable (must be a static string)
got compiliation error otherwise
values from paper look good, just use huge amount of samples
no test functions should be implemented within this file
removed input file dependencies
@tuchris
Copy link
Contributor Author

tuchris commented Feb 25, 2021

Thank you for the feedback. The hardcoded dependencies will be removed.
Regarding the testing, we have to test 3 parts. The 1.) preprocessing UtilFunction, 2.) training and 3.) testing of the network.
ad 1) is deterministic and testable
but ad 2&3, need large amount of input data to produce reasonable/stable results on training and validation. How shall we proceed here without commit too large training/validation files into the repository?
We would be very happy about some tips regarding this issue.. Maybe @mboehm7 or @Shafaq-Siddiqi could you provide us with some tips?

Many thanks and BR,
Christian, Jasmin, Mauro

@tuchris tuchris changed the title [DIA-WIP] Sherlock [DIA] Sherlock Feb 28, 2021
@corepointer
Copy link
Contributor

Testing training and validation would be out of scope of a unit test imho. And you are right, we don't want a lot of extra data files in the repository for testing. I'd merge the PR to staging with minor formatting changes. A few questions came to mind while looking over it:

  • What is sherlockPreprocessing.dml for and where is it used? That one also contains a path reference to a non existent directory.
  • How did you test the implementation? And where did you get the test data from? What about a simple shell script that downloads the needed data (if that exists somewhere online) and fires off the algorithm.

Regards,
Mark

@tuchris
Copy link
Contributor Author

tuchris commented Mar 3, 2021

Thank you for the information.

  • I will remove the preprocessing script, as it is going to be contributed via a separate PR.

  • You are right. A script/docu with the used files for training/testing will be added.

@tuchris
Copy link
Contributor Author

tuchris commented Mar 10, 2021

I added the download script right in the scripts/builtin/ folder - please move it to the correct location on merging.

Thanks for the good review!
BR Christian

@Baunsgaard
Copy link
Contributor

I added the download script right in the scripts/builtin/ folder - please move it to the correct location on merging.

Thanks for the good review!
BR Christian

if it is a dataset usefull in general it would be nice to integrate in our python tutorials.
see: src/main/python/systemds/examples/tutorials

it is now implemented as python class, to make it easier to use.
@tuchris
Copy link
Contributor Author

tuchris commented Mar 16, 2021

It is a dataset intended to do semantic data type detection. Yes, it could be useful to other projects too.
I moved (+updated) it to the suggested location.

@corepointer
Copy link
Contributor

Thank you for following up on this! We appreciate the extra effort you take 👍
I started merging this morning but was luckily interrupted by a meeting ;-) I'll test it and merge it in if I don't run into any issues.

@corepointer
Copy link
Contributor

Thanks again for the PR. I merged it in now (not in staging - that wouldn't work for a builtin function). I only made formatting modifications and did a test run of the JUnit test you provided. I hope I gave proper github credit to the two coauthors I found on the history of this branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants