Passing in new java source code snippets to get predictions #109

Thirunayan22 · 2021-09-16T17:38:51Z

How can we pass a single snippet of Java code as a string into the model and get a generated comment back? For example, what preprocessing steps are involved?

SpirinEgor · 2021-09-18T12:09:17Z

Hi!

First of all, you need to extract paths from the code snippet. This should be done with the same tool that produced the data used for model training.

After that, you should write a small wrapper to pass your data point into the model. See the example in the README for how to load the model correctly, then pass your example into the model according to the signature of its forward method.

Thirunayan22 · 2021-09-18T15:52:43Z

Hi @SpirinEgor, thanks for responding!
To train the model, I downloaded the dataset and ran the train script provided in the README. I am a bit unclear: what do you mean by the "tool" used for model training?

SpirinEgor · 2021-09-18T16:14:05Z

What dataset did you download?

The tool I was speaking about is a tool for extracting paths from the AST of the code. There are several options:

JavaExtractor from original code2seq repo by Alon
astminer
PSIMiner

All of these tools use different parsers, so they build different ASTs and extract different paths.
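To make "paths from the AST" concrete, here is a minimal sketch that uses Python's standard `ast` module to connect pairs of leaves through their lowest common ancestor. It is only an illustration of the idea: it is not how JavaExtractor, astminer, or PSIMiner work, the function name `leaf_paths` is hypothetical, and the output format is not the one the model expects.

```python
import ast
from itertools import combinations


def leaf_paths(source: str):
    """Return (start_token, node_type_path, end_token) triples for all leaf pairs."""
    leaves = []  # each entry: (token, chain of nodes from the root down to the leaf)

    def walk(node, ancestors):
        chain = ancestors + [node]
        # Treat identifiers and literals as the terminals that carry tokens
        if isinstance(node, ast.Name):
            leaves.append((node.id, chain))
        elif isinstance(node, ast.Constant):
            leaves.append((repr(node.value), chain))
        for child in ast.iter_child_nodes(node):
            walk(child, chain)

    walk(ast.parse(source), [])

    paths = []
    for (tok_a, chain_a), (tok_b, chain_b) in combinations(leaves, 2):
        # Count the shared prefix; its last node is the lowest common ancestor
        shared = 0
        while (shared < min(len(chain_a), len(chain_b))
               and chain_a[shared] is chain_b[shared]):
            shared += 1
        up = [type(n).__name__ for n in reversed(chain_a[shared:])]
        top = type(chain_a[shared - 1]).__name__
        down = [type(n).__name__ for n in chain_b[shared:]]
        paths.append((tok_a, up + [top] + down, tok_b))
    return paths


for start, path, end in leaf_paths("x = y + 1"):
    print(start, "|".join(path), end)
```

A different parser would name and nest these nodes differently, which is why paths mined by one tool cannot be fed to a model trained on paths from another.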

hehehwang · 2021-09-23T10:17:00Z

Hello, I'm doing the same thing with my own Python AST miner.
Can you confirm whether I'm doing it correctly?
https://gist.github.com/hehehwang/d058c6fca986a5b479afe10245f63a3e
(I think something went wrong, since it keeps printing out special tokens..)

thank you!

SpirinEgor · 2021-09-23T10:38:06Z

The steps look correct. Could you provide more information:

The mined paths from the code snippet
The shapes of the input tensors (they should be something like [5; n_paths])

Also, the model was trained with masked method names, so you should mask them in your example too. As far as I remember, the mask token is <MN>, but you can check it: it should be at the top of the label_to_id counter. The correct code snippet is

def <MN>(n):
    if n == 0:
        return 1
    else:
        return n * <MN>(n-1)
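The masking step could be automated roughly as follows. This is a sketch under assumptions: the helper `mask_method_name` and the original name `factorial` are hypothetical, `<MN>` is the token recalled above and should be verified against your vocabulary, and the plain regex would also rewrite the name inside strings or comments. Whether masking is applied to the source text or to the extracted tokens depends on your miner.

```python
import ast
import re

MASK = "<MN>"  # assumed mask token; check it against your label vocabulary


def mask_method_name(source: str, mask: str = MASK) -> str:
    """Replace the first function's name, in its definition and its calls, with the mask."""
    tree = ast.parse(source)
    # ast.walk is breadth-first, so this picks the outermost function definition
    func = next(n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))
    # \b leaves longer identifiers that merely contain the name untouched
    return re.sub(rf"\b{re.escape(func.name)}\b", mask, source)


snippet = (
    "def factorial(n):\n"
    "    if n == 0:\n"
    "        return 1\n"
    "    else:\n"
    "        return n * factorial(n-1)\n"
)
print(mask_method_name(snippet))
```

Note that the masked text is no longer valid Python, so the snippet has to be parsed before the mask is applied.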

The output shape is [7; 1; vocab size], so the correct decoding is:

# [7]
predictions = output.squeeze(1).argmax(-1)
labels = [id_to_label(i.item()) for i in predictions]
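For readers who want to check the decoding logic without loading the model, the same steps can be traced on plain nested lists standing in for the output tensor. This is a dependency-free sketch, not the library's API: the helper `decode`, the toy vocabulary, the dict-style `id_to_label` lookup, and the `<EOS>`/`<PAD>` stop tokens are all assumptions made for illustration.

```python
def decode(output, id_to_label, stop_tokens=("<EOS>", "<PAD>")):
    """Greedy decoding of scores shaped [seq_len][1][vocab_size]."""
    labels = []
    for step in output:
        scores = step[0]  # drop the batch dimension of size 1, like squeeze(1)
        best = max(range(len(scores)), key=scores.__getitem__)  # like argmax(-1)
        label = id_to_label[best]
        if label in stop_tokens:  # assumed special tokens; stop once the sequence ends
            break
        labels.append(label)
    return labels


# Toy vocabulary and scores, purely illustrative
toy_vocab = {0: "<PAD>", 1: "<EOS>", 2: "get", 3: "value"}
toy_output = [
    [[0.1, 0.0, 0.9, 0.2]],
    [[0.0, 0.1, 0.3, 0.8]],
    [[0.0, 0.9, 0.1, 0.1]],
    [[0.9, 0.0, 0.0, 0.0]],
]
print(decode(toy_output, toy_vocab))
```

Taking the argmax over the wrong dimension, or forgetting to drop the batch dimension, yields indices that do not correspond to vocabulary entries, which is an easy way to end up with meaningless labels.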

hehehwang · 2021-09-23T14:12:53Z

It seems I made a silly mistake in the decoding.. Thank you for your help!
Now it works just fine (without the mask token, though).

thanks again!

SpirinEgor · 2021-09-24T10:03:53Z

Awesome!
If you have other questions, feel free to open an issue!

Thirunayan22 changed the title ~~Passing in new java source code line to get predictions~~ Passing in new java source code snippets to get predictions Sep 16, 2021

SpirinEgor closed this as completed Sep 24, 2021

hehehwang mentioned this issue Oct 3, 2021

can i get an information about the java-med dataset parser? #113

Closed
