Commit
Chintan Shah committed Apr 21, 2019
1 parent c32c0b1, commit 89b1848
Showing 2 changed files with 42 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,19 @@ | ||
# deep-semantic-code-search | ||
# Deep Semantic Code Search | ||
Deep Semantic Code Search explores a joint embedding space for code and description vectors and uses it for a code search application. | ||
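As a rough illustration of how a joint embedding space can drive code search, the sketch below ranks code snippets by cosine similarity to a query vector. The vectors, the snippet names, and the `search` helper are hypothetical toy values chosen for this example; in the project itself the vectors would come from the trained models, and the ranking method used there may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def search(query_vec, code_vecs, top_k=2):
    """Return the names of the top_k code vectors closest to query_vec."""
    scored = [(cosine(query_vec, vec), name) for name, vec in code_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical 3-dimensional embeddings; real embeddings are learned and much larger.
CODE_VECS = {
    "read_csv_file": [0.9, 0.1, 0.0],
    "sort_list": [0.0, 1.0, 0.1],
    "open_socket": [0.1, 0.0, 1.0],
}

# Hypothetical embedding of a natural-language query such as "load a csv".
QUERY = [0.8, 0.2, 0.1]

if __name__ == "__main__":
    print(search(QUERY, CODE_VECS))
```

Because code and descriptions share one space, a query never has to match the code's tokens literally; it only has to land near the right code vector.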
|
||
|
||
These experiments have two parts: | ||
|
||
1. The first uses the approach suggested in [1]: we train that architecture on our own Python dataset. | ||
2. The second expands on the first using the methodology suggested in [2] and achieves reasonably good results. | ||
|
||
We can clearly observe that semantic information is captured in the results: | ||
|
||
![Query Results](screenshot.png) | ||
|
||
|
||
### References: | ||
|
||
[1] https://guxd.github.io/papers/deepcs.pdf | ||
|
||
[2] https://towardsdatascience.com/semantic-code-search-3cd6d244a39c |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
## Code summarization using transfer learning | ||
|
||
|
||
### How to run? | ||
|
||
Run these notebooks sequentially using the Docker containers listed below. | ||
|
||
1. The first notebook fetches and creates the dataset. | ||
2. The second notebook vectorizes the code sequence and description sequence and trains 3 seq2seq models: | ||
* Seq2Seq model from function tokens -> docstring | ||
* Seq2Seq model from api seq -> docstring | ||
* Seq2Seq model from method name -> docstring | ||
3. The third notebook trains an AWD LSTM model for docstrings using FastAI's implementation. | ||
4. The fourth notebook trains the final joint embedder from code to docstring vectors. | ||
5. The fifth notebook builds a search engine that uses the trained networks to output query results. | ||
6. The sixth notebook evaluates the model. | ||
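The three seq2seq models in step 2 take different views of the same function as input: its tokens, its API sequence, and its method name. The sketch below shows one way those views could be extracted from Python source with the standard `ast` and `tokenize` modules. The `describe_function` helper, the sample function, and the definition of "API sequence" as called names in source order are assumptions made for this example, not necessarily what the notebooks do.

```python
import ast
import io
import tokenize

# Hypothetical sample function used only for illustration.
SOURCE = '''
def load_rows(path):
    """Read a CSV file and return its rows."""
    with open(path) as handle:
        return list(csv.reader(handle))
'''


def call_name(node):
    """Render the callee of a call as a dotted name, e.g. csv.reader."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        return call_name(node.value) + "." + node.attr
    return "<expr>"  # anything more complex than a dotted name


def describe_function(source):
    """Extract method name, docstring, API sequence, and name tokens."""
    tree = ast.parse(source)
    func = next(n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef))

    # Assumed definition of the API sequence: called names in source order.
    calls = [n for n in ast.walk(func) if isinstance(n, ast.Call)]
    calls.sort(key=lambda n: (n.lineno, n.col_offset))

    # Function tokens: identifiers and keywords only; the docstring is a
    # STRING token, so it is excluded from this input view.
    tokens = [
        tok.string
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
        if tok.type == tokenize.NAME
    ]

    return {
        "method_name": func.name,
        "docstring": ast.get_docstring(func),
        "api_seq": [call_name(c.func) for c in calls],
        "tokens": tokens,
    }


if __name__ == "__main__":
    for key, value in describe_function(SOURCE).items():
        print(key, "->", value)
```

Each input view is paired with the same docstring as its target, so the three models learn to summarize the function from progressively less information.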
|
||
To run notebooks 1 to 6, we highly recommend using these Docker containers: | ||
|
||
#### Docker Containers | ||
|
||
- [hamelsmu/ml-gpu](https://hub.docker.com/r/hamelsmu/ml-gpu/): Use this container for any *GPU*-bound parts. | ||
|
||
- [hamelsmu/ml-cpu](https://hub.docker.com/r/hamelsmu/ml-cpu/): Use this container for any *CPU*-bound parts. |