Passing in new java source code snippets to get predictions #109

Closed
Thirunayan22 opened this issue Sep 16, 2021 · 7 comments

@Thirunayan22

Thirunayan22 commented Sep 16, 2021

How can we pass a single snippet of Java code into the model as a string and get a generated comment back? For example, what preprocessing steps are involved?

@Thirunayan22 Thirunayan22 changed the title Passing in new java source code line to get predictions Passing in new java source code snippets to get predictions Sep 16, 2021
@SpirinEgor
Contributor

Hi!

First of all, you need to extract paths from the code snippet. This should be done with the same tool that produced the data the model was trained on.

After that, you should write a small wrapper to pass your data point into the model. See the example in the README for how to load the model correctly, then pass your example into the model according to the signature of its forward method.

@Thirunayan22
Author

Hi @SpirinEgor, thanks for responding!
To train the model, I downloaded the dataset and ran the train script provided in the README. I am a bit unclear: what do you mean by the "tool" used for model training?

@SpirinEgor
Contributor

What dataset did you download?

The tool I was referring to is one that extracts paths from the AST of the code. There are several ways to do it:

  1. JavaExtractor from the original code2seq repo by Alon
  2. astminer
  3. PSIMiner

All these tools use different parsers, so they build different ASTs and extract different paths.
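
To make the extractor output more concrete, here is a minimal sketch that parses one line of the code2seq text format: a label followed by space-separated path contexts, each written as `start,path,end`, with sub-tokens and path nodes joined by `|`. It assumes your extractor writes this format. The function name `parse_c2s_line`, the `PathContext` type, and the sample line are made up for illustration; the real node names depend on the parser behind the tool you use.

```python
from typing import List, NamedTuple, Tuple


class PathContext(NamedTuple):
    start: List[str]  # sub-tokens of the start terminal
    path: List[str]   # AST node types along the path
    end: List[str]    # sub-tokens of the end terminal


def parse_c2s_line(line: str) -> Tuple[List[str], List[PathContext]]:
    """Split one code2seq-style line into (label sub-tokens, path contexts)."""
    label, *raw_contexts = line.strip().split(" ")
    contexts = []
    for raw in raw_contexts:
        if not raw:  # skip padding produced by repeated spaces
            continue
        start, path, end = raw.split(",")
        contexts.append(
            PathContext(start.split("|"), path.split("|"), end.split("|"))
        )
    return label.split("|"), contexts


# Hypothetical sample line; node names differ between extractors.
sample = "get|value this,Name|Return|Field,value n,Name|Binary|Literal,0"
label, contexts = parse_c2s_line(sample)
```

Comparing the parsed paths from two extractors on the same snippet is a quick way to see why a model trained on one tool's output should not be fed paths from another.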

@hehehwang

Hello, I'm doing the same thing with my own Python AST miner.
Can you confirm whether I'm doing it correctly?
https://gist.github.com/hehehwang/d058c6fca986a5b479afe10245f63a3e
(I think something went wrong, since it keeps printing out and token..)

thank you!

@SpirinEgor
Contributor

The steps look correct. Could you provide more information:

  • The paths mined from the code snippet
  • The shapes of the input tensors (they should be something like [5; n_paths])

Also, the model was trained with masked method names, so you should mask them in your example too. As far as I remember, the mask token is <MN>, but you can check: it should be at the top of the label_to_id counter. The correct code snippet is:

def <MN>(n):
    if n == 0:
        return 1
    else:
        return n * <MN>(n-1)
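
A masked snippet like the one above could be produced with a small helper. This is only a sketch: `mask_method_name` is a hypothetical name, and `<MN>` is the mask token as recalled in this thread, so verify it against your label vocabulary before relying on it.

```python
import ast
import re

MASK = "<MN>"  # assumed mask token; check it against the label vocabulary


def mask_method_name(source: str, mask: str = MASK) -> str:
    """Replace the first function's name, including recursive calls, with the mask."""
    tree = ast.parse(source)
    func = next(
        node for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)
    )
    # \b leaves identifiers that merely contain the name (e.g. factorial_2) intact.
    return re.sub(rf"\b{re.escape(func.name)}\b", mask, source)


snippet = (
    "def factorial(n):\n"
    "    if n == 0:\n"
    "        return 1\n"
    "    else:\n"
    "        return n * factorial(n-1)\n"
)
masked = mask_method_name(snippet)
```

The replacement is purely textual, so it would also touch the name inside strings or comments. The masked text is no longer valid Python, so if your miner parses the source first, it may be easier to apply the same replacement to the extracted tokens instead.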

The output shape is [7; 1; vocab size], so the correct decoding is:

# output: [7; 1; vocab size] -> predictions: [7]
predictions = output.squeeze(1).argmax(-1)
# map each predicted id back to its label string
labels = [id_to_label(i.item()) for i in predictions]
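
To see what `squeeze(1).argmax(-1)` does without loading the model, here is a dependency-free sketch of the same decoding on nested lists. The `decode` function, the toy scores, and the toy vocabulary are all made up for illustration, and `id_to_label` is assumed here to be a plain dict from id to label string.

```python
from typing import Dict, List, Sequence


def decode(
    output: Sequence[Sequence[Sequence[float]]],
    id_to_label: Dict[int, str],
) -> List[str]:
    """Decode scores shaped [steps][1][vocab_size] into one label per step."""
    labels = []
    for step in output:
        scores = step[0]  # drop the batch dimension of size 1, like squeeze(1)
        best_id = max(range(len(scores)), key=lambda i: scores[i])  # argmax(-1)
        labels.append(id_to_label[best_id])
    return labels


# Toy data: 3 decoding steps, batch size 1, vocabulary of 3 labels.
toy_output = [[[0.1, 0.7, 0.2]], [[0.8, 0.1, 0.1]], [[0.2, 0.3, 0.5]]]
toy_vocab = {0: "get", 1: "compute", 2: "value"}
decoded = decode(toy_output, toy_vocab)
```

The point to check in your own code is that the argmax runs over the vocabulary dimension (the last one), not over the sequence or batch dimension.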

@hehehwang

seems that i've made a silly mistake on decoding.. thank you for your help!
now it works just fine and clear (without mask token though)

thanks again!

@SpirinEgor
Contributor

Awesome!
If you have other questions, feel free to open an issue!
