# continue-dlt-demo

A demo on fine-tuning an LLM on coding data to improve autocomplete predictions for continue.dev within an organization, using dlt, Hugging Face, and Ollama.

## Creating the dataset from continue.dev autocomplete suggestions

The continue.dev tool has an autocomplete feature and logs, in an `autocomplete.jsonl` file, whether the user accepted or rejected each suggestion. `continue-hf-pipeline.py` contains a custom dlt destination that takes this data, converts it to a parquet file, and pushes it to a Hugging Face dataset repo in a format ready for fine-tuning an LLM.
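The actual pipeline lives in `continue-hf-pipeline.py`; the sketch below only illustrates the general shape of such a custom dlt destination. The jsonl path, the dataset repo id, and the file layout inside the repo are assumptions for illustration, not code from this repo.

```python
import json
from pathlib import Path

import dlt
from huggingface_hub import HfApi

REPO_ID = "your-org/continue-autocomplete"  # hypothetical dataset repo


@dlt.resource(name="autocomplete")
def autocomplete_suggestions(
    # Assumed location; continue.dev writes its dev data under ~/.continue/dev_data
    jsonl_path: str = "~/.continue/dev_data/autocomplete.jsonl",
):
    # Each jsonl line records one suggestion and whether it was accepted.
    with open(Path(jsonl_path).expanduser()) as f:
        for line in f:
            yield json.loads(line)


# batch_size=0 makes dlt hand the destination the path of the finished load
# file instead of row batches, and loader_file_format="parquet" makes that
# file a parquet file.
@dlt.destination(batch_size=0, loader_file_format="parquet")
def hf_dataset(items, table) -> None:
    # Assumes HF credentials via `huggingface-cli login` or the HF_TOKEN env var.
    HfApi().upload_file(
        path_or_fileobj=items,  # local path to the parquet load file
        path_in_repo=f"data/{table['name']}.parquet",
        repo_id=REPO_ID,
        repo_type="dataset",
    )


if __name__ == "__main__":
    pipeline = dlt.pipeline("continue_hf_pipeline", destination=hf_dataset)
    print(pipeline.run(autocomplete_suggestions()))
```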

## Finetuning process

I fine-tuned the starcoder2:3b model (bigcode/starcoder2-3b on Hugging Face) using the `SFTTrainer` from Hugging Face's TRL library, based on the fine-tuning code open-sourced by the model's creators.

I tried fine-tuning both on the dlt GitHub repository and on the autocomplete dataset described above. The code for the fine-tuning process can be found here: https://colab.research.google.com/drive/1jjb14BDlEeGjRmeXnfm41gDBlTNvsscn?usp=sharing
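The linked notebook contains the real training code; the sketch below only shows the general shape of a TRL `SFTTrainer` run against starcoder2-3b. The dataset id, text column name, LoRA settings, and hyperparameters are placeholders, and the exact keyword placement (`dataset_text_field`, `max_seq_length`) varies across trl versions.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

model_id = "bigcode/starcoder2-3b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Hypothetical dataset repo id: the one the dlt pipeline above pushes to.
dataset = load_dataset("your-org/continue-autocomplete", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="content",  # assumed name of the text column
    max_seq_length=1024,
    # LoRA keeps the 3B fine-tune cheap enough for a single Colab GPU.
    peft_config=LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    ),
    args=TrainingArguments(
        output_dir="starcoder2-3b-autocomplete",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        learning_rate=2e-5,
        max_steps=500,
        bf16=True,
    ),
)
trainer.train()
```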
