This repository contains the implementation of MARK: Memory Augmented Refinement of Knowledge, by Anish Ganguli, Prabal Deb, and Debleena Banerjee, published in the Microsoft Journal of Applied Research (MSJAR) 2025. The public version of the paper is available at arXiv:2505.05177.
MARK is a framework for memory-augmented conversational agents, designed to enhance knowledge retrieval and response quality using Azure OpenAI, Azure AI Search, and custom memory-building logic. It supports batch experimentation and evaluation on datasets such as MedMCQA, with modular agent and memory components.
- Memory-Augmented Agents: Integrates Azure AI Search and OpenAI embeddings for context-aware responses (see the sketch after this list).
- Batch Experimentation: Run large-scale experiments with different datasets and agent configurations.
- Automated Evaluation: Evaluate generated answers using custom information capture metrics.
- Extensible Architecture: Modular design for agents, memory, evaluation, and data handling.
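To make the retrieval flow behind these features concrete, here is a minimal sketch of a memory-augmented answer loop, assuming the environment variables listed in the setup below. It is not the repository's agent code: the helper name `answer_with_memory` and the index fields `content` and `embedding` are illustrative assumptions.

```python
import os

from openai import AzureOpenAI
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

# Clients built from the environment variables described in the setup section.
openai_client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_BASE_URL"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
)
search_client = SearchClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    index_name=os.environ["AZURE_SEARCH_INDEX_NAME"],
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_API_KEY"]),
)

def answer_with_memory(question: str, top_k: int = 3) -> str:
    """Hypothetical helper: retrieve stored memories, then answer with them as context."""
    # 1. Embed the question with the Azure OpenAI embedding deployment.
    embedding = openai_client.embeddings.create(
        model=os.environ["AZURE_OPENAI_EMBEDDING_MODEL"],
        input=question,
    ).data[0].embedding

    # 2. Vector search over the memory index ("embedding"/"content" are assumed field names).
    hits = search_client.search(
        search_text=None,
        vector_queries=[
            VectorizedQuery(vector=embedding, k_nearest_neighbors=top_k, fields="embedding")
        ],
    )
    memory = "\n".join(hit["content"] for hit in hits)

    # 3. Ask the chat deployment, grounding it in the retrieved memories.
    response = openai_client.chat.completions.create(
        model=os.environ["AZURE_OPENAI_DEPLOYMENT_NAME"],
        messages=[
            {"role": "system", "content": f"Use these memories when answering:\n{memory}"},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```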
- Clone the repository:

  ```bash
  git clone <repo-url>
  cd MARK
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Environment variables: create a `.env` file in the project root with the following keys (see code for required variables; a quick completeness check is sketched after this list):

  ```text
  AZURE_OPENAI_BASE_URL=
  AZURE_OPENAI_API_KEY=
  AZURE_OPENAI_DEPLOYMENT_NAME=
  AZURE_OPENAI_MODEL_NAME=
  AZURE_OPENAI_API_VERSION=
  AZURE_OPENAI_EMBEDDING_MODEL=
  AZURE_OPENAI_EMBEDDING_API_VERSION=
  AZURE_SEARCH_ENDPOINT=
  AZURE_SEARCH_API_KEY=
  AZURE_SEARCH_INDEX_NAME=
  CHAINLIT_USERNAME=
  CHAINLIT_PASSWORD=
  CHAINLIT_ROLE=
  CHAINLIT_AUTH_SECRET=
  AZURE_OPENAI_EVALUATION_DEPLOYMENT_NAME=
  AZURE_OPENAI_EVALUATION_API_VERSION=
  ```
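The scripts presumably load these values from the environment (python-dotenv is a common choice, and an assumption here). A small self-check like the following, illustrative and not part of the repository, catches a missing key before a long run fails midway:

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current working directory

REQUIRED = [
    "AZURE_OPENAI_BASE_URL", "AZURE_OPENAI_API_KEY",
    "AZURE_SEARCH_ENDPOINT", "AZURE_SEARCH_API_KEY", "AZURE_SEARCH_INDEX_NAME",
]
missing = [key for key in REQUIRED if not os.getenv(key)]
if missing:
    raise SystemExit(f"Missing .env keys: {', '.join(missing)}")
```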
Start an interactive memory builder agent session:
```bash
chainlit run ./experiment_mem_builder.py
```
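The `CHAINLIT_*` variables suggest the session is gated behind Chainlit's password authentication. A minimal app of that shape might look like the following sketch; the actual session and memory builder logic live in `experiment_mem_builder.py`:

```python
import os

import chainlit as cl

@cl.password_auth_callback
def auth(username: str, password: str) -> cl.User | None:
    # Compare against the CHAINLIT_* credentials from .env.
    if (username == os.environ["CHAINLIT_USERNAME"]
            and password == os.environ["CHAINLIT_PASSWORD"]):
        return cl.User(identifier=username,
                       metadata={"role": os.environ.get("CHAINLIT_ROLE", "user")})
    return None

@cl.on_message
async def on_message(message: cl.Message):
    # In the real app, this is where the memory builder agent refines stored knowledge.
    reply = f"(memory builder would process: {message.content})"
    await cl.Message(content=reply).send()
```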
Run batch experiments with memory-augmented agents:

```bash
python run_batch_experiment.py --file <input_data.csv> --limit 10 --type med_mcqa
```

- `--file`: Path to the input data file (CSV or MedMCQA format).
- `--limit`: Number of records to process (default: 10).
- `--type`: Dataset type (`med_mcqa` or `exp_2`).
Results are saved in the `.evaluation_input_data` directory.
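For orientation, a batch run amounts to roughly the following loop. This is a sketch under assumed column and field names (`question`, `generated_answer`) and an assumed output filename; the real implementation, including batching and thresholds, is in `run_batch_experiment.py`:

```python
import csv
import json
from typing import Callable

def run_batch(path: str, agent: Callable[[str], str], limit: int = 10) -> None:
    # Read at most `limit` records, mirroring the --limit flag.
    with open(path, newline="", encoding="utf-8") as f:
        records = list(csv.DictReader(f))[:limit]

    # Write one JSON line per record, where the evaluation step expects it.
    with open(".evaluation_input_data/results.jsonl", "w", encoding="utf-8") as out:
        for record in records:
            record["generated_answer"] = agent(record["question"])
            out.write(json.dumps(record) + "\n")
```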
Evaluate generated answers using information capture metrics:
```bash
python run_batch_evaluation.py --file <experiment_results.jsonl>
```

- `--file`: Path to the `.jsonl` file with generated answers.
Evaluation results and a summary are saved in the `.evaluation_output_data` directory.
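The information capture metric itself is implemented in `src/evaluation/`; purely as a shape reference, summarising a finished evaluation file could look like this sketch, where the score field name is an assumption:

```python
import json

def summarize(path: str) -> float:
    """Average a per-record score across an evaluation .jsonl file."""
    scores = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            row = json.loads(line)
            scores.append(float(row["information_capture_score"]))  # assumed field name
    return sum(scores) / len(scores)
```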
- `src/agents/`: Agent implementations (e.g., `ChatbotAgent`).
- `src/data/`: Data loaders and models (e.g., `MedMCQADataSet`, `EvaluationData`).
- `src/memory/`: Memory and Azure AI Search integration.
- `src/service/`: Memory builder logic.
- `src/evaluation/`: Evaluation metrics and scoring.
- `run_batch_experiment.py`: Batch experiment runner.
- `run_batch_evaluation.py`: Batch evaluation runner.
- Add new agents in `src/agents/` (see the skeleton after this list).
- Extend memory or evaluation logic in `src/memory/` and `src/evaluation/`.
- Update `.env` for new Azure/OpenAI endpoints or models.
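As a starting point for a new agent, a skeleton along these lines is usually enough; the method name `respond` is an assumption, so match whatever interface `ChatbotAgent` in `src/agents/` actually exposes:

```python
class EchoAgent:
    """Toy agent: same surface as a chat agent, but no memory or model calls."""

    def __init__(self, name: str = "echo"):
        self.name = name

    def respond(self, question: str, context: str = "") -> str:
        # A real agent would retrieve memory and call Azure OpenAI here.
        return f"[{self.name}] You asked: {question}"
```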
- Ensure all Azure resources (OpenAI, AI Search) are provisioned and accessible.
- For large-scale experiments, adjust batch sizes and thresholds as needed in the scripts.
MARK is licensed under the MIT License.
If you use this code or build on our paper in your research, please cite:
```bibtex
@misc{ganguli2025markmemoryaugmentedrefinement,
      title={MARK: Memory Augmented Refinement of Knowledge},
      author={Anish Ganguli and Prabal Deb and Debleena Banerjee},
      year={2025},
      eprint={2505.05177},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2505.05177},
}
```