
How to run the block-level experiment #28

Open
Kaka727 opened this issue Dec 20, 2018 · 9 comments

Comments

@Kaka727

Kaka727 commented Dec 20, 2018

The description can be found here.
https://github.com/Mondego/SourcererCC/issues/26
@dyangUCI @pedromartins4
Yes, I used the samples in this repository (test-env). The three projects are zipped, so I ran the command "python tokenizer.py zipblocks". But as I said, the file under /file_block_stats ("file-stats") is empty. I don't know what is wrong.

@dyangUCI
Contributor

Sorry, I cannot reproduce your error. When I run the command "python tokenizer.py zipblocks", data is written under file_blocks_stats. Did you unzip the folder test-env.tgz? Maybe that's the issue?

@Kaka727
Author

Kaka727 commented Dec 21, 2018

@dyangUCI
Thanks for your response. This time I reran the command, and the file "file_blocks_stats" does contain some content, as shown in the screenshot.
[image: screenshot of the file_blocks_stats contents, not preserved]

However, the file "file-tokens" is still empty. Is that also the case in your environment?
Thanks~

@dyangUCI
Contributor

Hi, I found the issue in the tokenizer. We once collected some extra information about Java functions for specific experiments and later abandoned it, but the code remained in tokenizer.py and caused index-out-of-range failures, so the result files were incomplete. Please pull the latest version of the project and rerun tokenizer.py; the output should be correct now.

@dyangUCI
Contributor

There will be 56 blocks in the tokens file. The stats file contains both file stats and block stats, 61 lines in total. You can check the results on your end accordingly.
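A quick way to compare your output against these numbers is to count the non-empty lines in each output directory. This is only a minimal sketch: the directory names used in the `__main__` section are assumptions (the thread itself uses several spellings), so point them at wherever your tokenizer run wrote its tokens and stats files.

```python
from pathlib import Path


def count_lines(directory):
    """Return {file name: number of non-empty lines} for each file in `directory`."""
    counts = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            with path.open(encoding="utf-8", errors="replace") as handle:
                counts[path.name] = sum(1 for line in handle if line.strip())
    return counts


def total_lines(directory):
    """Sum the non-empty line counts of all files in `directory`."""
    return sum(count_lines(directory).values())


if __name__ == "__main__":
    # Both directory names are assumptions; adjust them to your tokenizer output.
    for name in ("blocks_tokens", "file_block_stats"):
        if Path(name).is_dir():
            print(name, total_lines(name), count_lines(name))
        else:
            print(name, "not found")
```

If the counts match the comment above, the tokens directory should total 56 lines and the stats directory 61.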

@dyangUCI dyangUCI reopened this Dec 21, 2018
@Kaka727
Author

Kaka727 commented Dec 22, 2018

Yeah, thanks very much!
This time it really works~

@Kaka727
Author

Kaka727 commented Dec 22, 2018

But I still have some questions.
First, on my computer, the block-level clone results for the sample projects are empty. Is that expected?
Second, what do Node_1, Node_2, and so on represent?
Thanks~

@saini
Contributor

saini commented Dec 22, 2018

The number of Node folders is the number of processes that were run in parallel to carry out the clone detection. The numeric argument N in the command 'python controller.py N' tells the controller script to carry out clone detection using N processes. On systems with little memory, N should be 1. Each process reserves the amount of memory specified by the xmx and xms arguments to the JVM.
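To make the relationship between N and memory concrete, here is a rough sketch for picking N. It assumes each process needs about its xmx value in memory and ignores JVM overhead beyond the heap; `max_processes` and `node_folders` are hypothetical helpers for illustration, not part of controller.py, and the `NODE_i` naming follows the path mentioned elsewhere in this thread.

```python
def max_processes(available_mb, xmx_mb):
    """Largest N such that N heaps of xmx_mb fit in available_mb (never below 1)."""
    if xmx_mb <= 0:
        raise ValueError("xmx_mb must be positive")
    return max(1, available_mb // xmx_mb)


def node_folders(n):
    """Folder names expected for n parallel processes (naming is an assumption)."""
    return ["NODE_{}".format(i) for i in range(1, n + 1)]


if __name__ == "__main__":
    n = max_processes(available_mb=8192, xmx_mb=2048)
    print("python controller.py {}".format(n))
    print(node_folders(n))
```

With less free memory than one heap, the helper still returns 1, which matches the advice to use N = 1 on low-memory systems.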

@Kaka727
Author

Kaka727 commented Dec 23, 2018

@dyangUCI @saini
Thanks for your quick response.
I'd like to know the results for the three sample projects. On my computer, there is no "query" file under /NODE_1/output8.0 after running "python controller.py 1". What could be the problem?
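When reporting a problem like this, it helps to list exactly what each node's output folder contains. The sketch below assumes the `NODE_*` folders sit under one root directory and that the output folder is named `output8.0`, as in the path above; `find_outputs` is a hypothetical helper, not part of the project.

```python
from pathlib import Path


def find_outputs(root, output_dir="output8.0"):
    """Map each NODE_* folder under `root` to a list of (file name, size in bytes).

    A node with no output directory, or an empty one, maps to an empty list.
    """
    report = {}
    for node in sorted(Path(root).glob("NODE_*")):
        out = node / output_dir
        files = []
        if out.is_dir():
            files = [(p.name, p.stat().st_size)
                     for p in sorted(out.iterdir()) if p.is_file()]
        report[node.name] = files
    return report


if __name__ == "__main__":
    # "." assumes the script is run from the directory that holds the NODE_* folders.
    for node, files in find_outputs(".").items():
        print(node, files if files else "no output files")
```

An empty list for a node means that no output file was written there at all, while a file of size 0 means a file was created but nothing was written to it.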

@zhuwq585

@dyangUCI @saini
Thanks for your quick response.
I'd like to know the results for the three sample projects. On my computer, there is no "query" file under /NODE_1/output8.0 after running "python controller.py 1". What could be the problem?

Did you find the cause? (If you still remember it...)


4 participants