markdownbot

A basic AI chatbot fine-tuned on markdown data using open-source pretrained LLMs. A preprocessing script runs over a markdown database, chunks each markdown file, and compiles the chunks into a single text file. That text file is then used to fine-tune a pretrained LLM from the Hugging Face model hub (https://huggingface.co/models) with the Hugging Face training library. The "run_chatbot.py" file lets the user interact with the fine-tuned LLM via the console.
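The preprocessing step described above can be sketched roughly as follows. This is a minimal illustration, not the contents of "preprocessingScript.py": the function names `chunk_text` and `compile_vault`, the paragraph-based splitting, and the 1000-character limit are all assumptions made for the example.

```python
from pathlib import Path


def chunk_text(text, max_chars=1000):
    """Split text into chunks of at most max_chars, breaking on blank lines.

    Hypothetical helper: the chunking rule and size limit are assumptions.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Hard-split any single paragraph that is longer than the limit.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        # Start a new chunk if adding this paragraph would exceed the limit.
        if current and len(current) + 2 + len(para) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def compile_vault(vault_dir, out_file, max_chars=1000):
    """Chunk every .md file under vault_dir and write all chunks to out_file.

    Returns the number of chunks written.
    """
    count = 0
    with open(out_file, "w", encoding="utf-8") as out:
        for md_path in sorted(Path(vault_dir).rglob("*.md")):
            text = md_path.read_text(encoding="utf-8", errors="ignore")
            for chunk in chunk_text(text, max_chars):
                out.write(chunk + "\n\n")
                count += 1
    return count
```

Calling `compile_vault("my_vault_copy", "training_data.txt")` (both paths are placeholders) would produce one text file with chunks separated by blank lines. Splitting on paragraphs rather than at fixed offsets is just one option; chunk size is one of the preprocessing conditions worth experimenting with.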

Suggested Workflow:

1. Create a new project folder to work in and copy your markdown database into it, so the actual markdown vault is never edited.
2. Point "preprocessingScript.py" at the copied markdown database by editing line 8, then run it to produce a single text file for fine-tuning.
3. Edit lines 15 and 25 in "fine_tune_model.py" to point at your chosen pretrained Hugging Face model, then run the training script. Test the code with a small model such as gpt2 first to avoid long wait times.
4. Once training is complete, point "run_chatbot.py" at the fine-tuned model by editing line 11 and run the script to start a conversation. Running it in an IDE is suggested: the while loop has no end condition, so each conversation has to be stopped manually.
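Since the chat loop has no end condition, one possible tweak is to add exit keywords. The sketch below is not the code in "run_chatbot.py"; `chat_loop`, its parameters, and the exit words are hypothetical, and the reply function is passed in so the loop itself does not depend on any particular model.

```python
def chat_loop(generate_reply, read_input=input, write_output=print,
              exit_words=("exit", "quit")):
    """Run a console conversation until the user types an exit word.

    generate_reply: any callable mapping the user's text to a reply string.
    Returns the number of completed turns.
    """
    turns = 0
    while True:
        try:
            user_text = read_input("You: ").strip()
        except (EOFError, KeyboardInterrupt):
            # Ctrl-D / Ctrl-C also end the conversation cleanly.
            break
        if user_text.lower() in exit_words:
            break
        if not user_text:
            continue  # ignore empty input instead of sending it to the model
        write_output(f"Bot: {generate_reply(user_text)}")
        turns += 1
    return turns


# Possible wiring to a fine-tuned model (assumes the transformers library is
# installed and that "finetuned_model" is the folder the training step wrote):
#
# from transformers import pipeline
# generator = pipeline("text-generation", model="finetuned_model")
# chat_loop(lambda text: generator(text, max_new_tokens=50)[0]["generated_text"])
```

Typing "exit" or "quit" then ends the conversation without having to stop the process from the IDE.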

Notes:

All of the code here was generated using ChatGPT 4o, and little has been done to optimize it further. Experimenting with different training and preprocessing conditions is suggested.
deepfates also has a somewhat more sophisticated preprocessing script for Twitter archives here: https://gist.github.com/deepfates/78c9515ec2c2f263d6a65a19dd10162d
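For experimenting with training conditions, it may help to gather the settings in one place. The following is a rough sketch, not the contents of "fine_tune_model.py": `FineTuneConfig`, its field names, and its default values are hypothetical, and the code assumes the `transformers` and `datasets` libraries are installed.

```python
from dataclasses import dataclass


@dataclass
class FineTuneConfig:
    """Hypothetical bundle of settings to vary between experiments."""
    model_name: str = "gpt2"                # small model for quick test runs
    train_file: str = "training_data.txt"   # output of the preprocessing step
    output_dir: str = "finetuned_model"
    block_size: int = 128                   # max tokens per training example
    epochs: int = 1
    batch_size: int = 2


def fine_tune(cfg):
    """Fine-tune a causal LM on a plain text file using the given settings."""
    # Imported here so the config can be used without the heavy dependencies.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained(cfg.model_name)
    if tokenizer.pad_token is None:
        # gpt2 has no padding token; reuse end-of-sequence for padding.
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(cfg.model_name)

    # The "text" loader yields one example per line; drop the blank ones.
    dataset = load_dataset("text", data_files={"train": cfg.train_file})["train"]
    dataset = dataset.filter(lambda row: row["text"].strip() != "")

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True,
                         max_length=cfg.block_size)

    tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir=cfg.output_dir,
            num_train_epochs=cfg.epochs,
            per_device_train_batch_size=cfg.batch_size,
        ),
        train_dataset=tokenized,
        # mlm=False selects causal (next-token) language modeling.
        data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                                      mlm=False),
    )
    trainer.train()
    trainer.save_model(cfg.output_dir)
    tokenizer.save_pretrained(cfg.output_dir)
```

A run with different conditions would then look like `fine_tune(FineTuneConfig(epochs=3, block_size=256))`, which keeps each experiment's settings visible in a single line.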

