Hello! Thanks for your work on InferCode, it's awesome!
My name is Maksim Zubkov, and I am doing my bachelor thesis at JetBrains Research on self-supervised learning techniques for source code. I want to compare the pre-training scheme proposed in your paper with the one I am investigating in my research.
I tried to initialize CodeClassificationData to train the model on my data, but I could not find a script that creates the files with a .pkl extension. It now seems that I have finally managed to run the preprocessing. These are the steps I followed:
As suggested in the README, I ran: docker run --rm -v $(pwd):/data -w /data --entrypoint /usr/local/bin/subtree -it yijun/fast examples/raw_code examples/subtrees node_types.csv to create the .ids.csv files in examples/subtrees.
Then I explored the yijun/fast docker image and found the binary /usr/local/bin/pkl. I ran docker with /usr/local/bin/pkl as the entry point, which produced several .pkl files.
Then I made minor changes to your repo, namely adding some __init__.py files.
The next step was to deal with the fast_pb2.py file, which I simply copied from the graph-ast repo.
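The two docker invocations above differ only in the entry point and its arguments, so they can be sketched with one small Python helper. This is a minimal sketch, not part of the InferCode repo: `fast_command` and the `/path/to/infercode` checkout path are hypothetical names, and the arguments of the `pkl` binary are deliberately left for the caller because they are not listed here.

```python
import shlex
from typing import List, Sequence

IMAGE = "yijun/fast"  # image name taken from the README command above


def fast_command(entrypoint: str, args: Sequence[str], workdir: str) -> List[str]:
    """Build the `docker run` argv for one binary of the image.

    Hypothetical helper: it only assembles the command, it does not run it.
    `workdir` is mounted as /data, mirroring `-v $(pwd):/data -w /data`.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{workdir}:/data",
        "-w", "/data",
        "--entrypoint", entrypoint,
        "-it", IMAGE,
        *args,
    ]


# Step 1: the subtree extraction command, with the arguments from the README.
subtree_cmd = fast_command(
    "/usr/local/bin/subtree",
    ["examples/raw_code", "examples/subtrees", "node_types.csv"],
    "/path/to/infercode",  # assumed location of the repo checkout
)
print(shlex.join(subtree_cmd))

# Step 2: the same pattern with /usr/local/bin/pkl as the entry point.
# Its arguments are not documented here, so the list is left empty on purpose.
pkl_cmd = fast_command("/usr/local/bin/pkl", [], "/path/to/infercode")
```

To actually execute a command built this way, it could be passed to `subprocess.run(cmd, check=True)` from a terminal, since `-it` expects an interactive TTY.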
Finally, I succeeded in creating the trees object and running put_trees_into_bucket, but could you please answer several questions:
Is this the correct procedure for preparing data for your model? If so, I can create a pull request that adds this information to the README. Or did I miss some important point?
I did not understand the difference between /usr/local/bin/pkl and /usr/local/bin/pklpos. Could you please explain it?
If I can help in any way with open-sourcing the InferCode code base, I would be glad to do so.