Text2Code for Jupyter notebook
A proof-of-concept Jupyter extension that converts English queries into relevant Python code.
Blog post with more details:
Supported Operating Systems:
NOTE: We have renamed the plugin from mopp to jupyter-text2code. Uninstall mopp before installing the new jupyter-text2code version.
pip uninstall mopp
For Mac and Ubuntu installations without an NVIDIA GPU, an environment variable must be set explicitly at install time.
GPU install dependencies:
sudo apt-get install libopenblas-dev libomp-dev
git clone https://github.com/deepklarity/jupyter-text2code.git
cd jupyter-text2code
pip install .
jupyter nbextension enable jupyter-text2code/main
pip uninstall jupyter-text2code
- Start Jupyter notebook server by running the following command:
- If you don't see the Nbextensions tab in Jupyter notebook, run the following command:
jupyter contrib nbextension install --user
- You can open the sample notebooks/ctds.ipynb notebook for testing
- If installation happened successfully, then for the first time, Universal Sentence Encoder model will be downloaded from
- Click on the Terminal icon which appears on the menu (to activate the extension)
- Type "help" to see a list of currently supported commands in the repo
- Watch Demo video for some examples
Docker containers for jupyter-text2code
We have published CPU and GPU images to docker hub with all dependencies pre-installed.
Visit https://hub.docker.com/r/deepklarity/jupyter-text2code/ to download the images and for usage instructions.
CPU image size:
GPU image size:
Generate training data:
From a list of templates present at jupyter_text2code/jupyter_text2code_serverextension/data/ner_templates.csv, generate training data by running the following command:
cd scripts && python generate_training_data.py
This command generates data for intent matching and NER (Named Entity Recognition).
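As a rough illustration of what template-based generation involves, the sketch below fills placeholders in a few templates and records the character span of each entity, which is the kind of annotation an NER model needs. The templates, the `$placeholder` syntax, the entity pools, and the helper names `expand` and `generate` are all hypothetical; the actual format of ner_templates.csv and the logic in generate_training_data.py may differ.

```python
import random

# Hypothetical templates in the spirit of ner_templates.csv: (intent_id, template).
# The real file's columns and placeholder syntax may differ.
TEMPLATES = [
    (0, "import $fname"),
    (1, "show $numrows rows from $varname"),
]

# Hypothetical pools of entity values used to fill the placeholders.
ENTITY_VALUES = {
    "fname": ["data.csv", "sales.csv"],
    "numrows": ["5", "10"],
    "varname": ["df", "train"],
}


def expand(template, rng):
    """Fill each $placeholder and record (start, end, label) for every entity."""
    text = ""
    entities = []
    for i, token in enumerate(template.split()):
        if i > 0:
            text += " "
        if token.startswith("$"):
            label = token[1:]
            value = rng.choice(ENTITY_VALUES[label])
            entities.append((len(text), len(text) + len(value), label))
            text += value
        else:
            text += token
    return text, entities


def generate(n_per_template, seed=0):
    """Produce labelled rows usable for both intent matching and NER."""
    rng = random.Random(seed)
    rows = []
    for intent_id, template in TEMPLATES:
        for _ in range(n_per_template):
            text, entities = expand(template, rng)
            rows.append({"intent_id": intent_id, "text": text, "entities": entities})
    return rows
```

Each row carries an intent_id for the intent matcher and entity spans for the NER model, so one pass over the templates can feed both stages.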
Create intent index faiss
Use the generated data to create an intent matcher using faiss.
cd scripts && python create_intent_index.py
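Conceptually, an intent index maps a query to the intent of its nearest labelled example in embedding space. The sketch below shows that idea with the standard library only: a toy bag-of-words vector stands in for the sentence encoder, and a brute-force cosine search stands in for faiss. The class `IntentIndex`, the function names, and the example sentences are hypothetical and are not taken from create_intent_index.py.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real sentence encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IntentIndex:
    """Brute-force nearest-neighbour lookup; faiss plays this role at scale."""

    def __init__(self, examples):
        # examples: list of (text, intent_id) pairs from the generated data
        self.items = [(embed(text), intent_id) for text, intent_id in examples]

    def search(self, query):
        """Return (intent_id, similarity) of the closest stored example."""
        q = embed(query)
        vec, intent_id = max(self.items, key=lambda item: cosine(q, item[0]))
        return intent_id, cosine(q, vec)


index = IntentIndex([
    ("import data.csv", 0),
    ("load file sales.csv", 0),
    ("show 5 rows from df", 1),
    ("display 10 rows of train", 1),
])
```

Calling `index.search("show 10 rows from train")` returns the intent of the most similar stored example together with its similarity score, which could be thresholded to reject queries that match nothing well.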
Train NER model
cd scripts && python train_spacy_ner.py
Steps to add more intents:
- Add more templates in ner_templates with a new intent_id
- Generate training data. Modify generate_training_data.py if different generation techniques are needed or if introducing a new entity.
- Train intent index
- Train NER model
- Modify jupyter_text2code/jupyter_text2code_serverextension/__init__.py with the new intent's condition and add the actual code for the intent
- Reinstall plugin by running:
pip install .
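The step of adding a new intent's condition and code can be pictured as a dispatch from intent_id to a function that renders Python source from the extracted entities. The following is a minimal sketch under that assumption; the handler names, the entity keys (`fname`, `varname`, `numrows`), and the `INTENT_HANDLERS` table are hypothetical, and the real __init__.py may be organised differently (for example as a chain of conditionals).

```python
def import_file(entities):
    """Hypothetical handler for intent 0: load a CSV file into a DataFrame."""
    return f"import pandas as pd\ndf = pd.read_csv({entities['fname']!r})"


def show_rows(entities):
    """Hypothetical handler for intent 1: show the first N rows of a variable."""
    return f"{entities['varname']}.head({entities['numrows']})"


# Adding a new intent means registering one more handler under its intent_id.
INTENT_HANDLERS = {
    0: import_file,
    1: show_rows,
}


def generate_code(intent_id, entities):
    """Render code for a detected intent, or a comment if it is unsupported."""
    handler = INTENT_HANDLERS.get(intent_id)
    if handler is None:
        return "# Unsupported intent"
    return handler(entities)


print(generate_code(1, {"varname": "df", "numrows": "5"}))
```

A table like this keeps each intent's code in one place, so supporting a new command touches only the templates, the retrained models, and one new handler.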
- Publish Docker image
- Refactor code and make it more modular, remove duplicate code, etc.
- Add support for Windows
- Add support for more commands
- Improve intent detection and NER
- Explore sentence paraphrasing to generate higher-quality training data
- Gather real-world variable names and library names instead of randomly generating them
- Try NER with a transformer-based model
- With enough data, train a language model to directly do English->code like GPT-3 does, instead of having separate stages in the pipeline
- Create a survey to collect linguistic data
- Add Speech2Code support