This project fine-tunes a Large Language Model (LLM) with LoRA to automatically generate high-quality Anki flashcards from knowledge sources such as Wikipedia articles or personal PDFs.
The system supports:
- Automatic training data generation from PDFs and Wikipedia pages
- LoRA fine-tuning
- Flashcard generation from PDFs
- Flashcard generation from Wikipedia pages
The goal is to produce concise, factual, and pedagogically effective Anki cards suitable for long-term learning.
The models used for data generation and LoRA fine-tuning can be adjusted to the user's computing power.
My personal fine-tuned model is available at https://huggingface.co/Guibibo/Mistral-7B-v0.3-FlashCards.
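The published model can be queried like any causal LM on the Hugging Face Hub. Below is a minimal sketch using the Transformers library; the prompt template in `build_prompt` is an assumption for illustration and should be matched to the format actually used during fine-tuning:

```python
# Hedged sketch: generating a flashcard with the published fine-tuned model.
# MODEL_ID comes from the README; the prompt template is a hypothetical
# example, not the exact format used in training.

MODEL_ID = "Guibibo/Mistral-7B-v0.3-FlashCards"

def build_prompt(passage: str) -> str:
    """Wrap a source passage in a flashcard-generation instruction."""
    return (
        "Generate a concise Anki flashcard (front and back) "
        f"from the following text:\n\n{passage}\n\nFlashcard:"
    )

if __name__ == "__main__":
    # Requires `pip install transformers accelerate` and enough memory
    # for a 7B model (roughly 16 GB in fp16).
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer(
        build_prompt("The mitochondrion produces most of the cell's ATP."),
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```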
- LoRA fine-tuning
- Domain-agnostic flashcard generation
- PDF parsing and knowledge extraction
- Wikipedia page ingestion
- Synthetic training data generation
- Anki-compatible outputs (APKG)
- Modular end-to-end pipeline
```bash
# Install dependencies
conda env create -f environment.yml

# Run the full pipeline
python3 -m main

# Combine revised training data files
python3 -m data.combine_data data/revised/file1.jsonl data/revised/file2.jsonl ...

# Convert combined data to the LoRA training format
python3 -m data.convert_to_lora path/to/file

# Run LoRA fine-tuning
python3 -m LoRA.run_lora
```
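The data-preparation steps above combine JSONL files of question/answer pairs and convert them into LoRA training examples. A minimal sketch of what that conversion might look like, assuming records with `question` and `answer` fields and a prompt/completion output format (the real field names and template live in `data/convert_to_lora`):

```python
import json

# Hypothetical instruction prefix -- the actual template is defined
# in data/convert_to_lora, not here.
INSTRUCTION = "Generate an Anki flashcard answer for the question below.\n"

def to_lora_example(record: dict) -> dict:
    """Convert one {question, answer} record into a prompt/completion pair."""
    return {
        "prompt": f"{INSTRUCTION}Question: {record['question']}\nAnswer:",
        "completion": record["answer"],
    }

def convert_file(src: str, dst: str) -> None:
    """Rewrite a JSONL file of Q/A records as LoRA training examples."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            fout.write(json.dumps(to_lora_example(json.loads(line))) + "\n")
```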