ParamBench is a comprehensive graduate-level benchmark in Hindi designed to evaluate Large Language Models (LLMs) on their understanding of Indic subjects. The benchmark contains 17,275 multiple-choice questions across 21 subjects, covering a wide range of topics from Indian competitive examinations.
This benchmark is specifically designed to:
- Assess LLM performance on culturally and linguistically diverse content
- Evaluate understanding of India-specific knowledge domains
- Support the development of more culturally aware AI systems
- 17,275 Questions: Extensive collection of graduate-level MCQs in Hindi
- 21 Subjects: Comprehensive coverage of diverse academic domains
- Standardized Format: Consistent question structure for reliable evaluation
- Automated Evaluation: Scripts for benchmarking and analysis
- Detailed Metrics: Subject-wise and question-type-wise performance analysis
Each question in the dataset includes:
- `unique_question_id`: Unique identifier for each question
- `question_text`: The question text
- `option_a`, `option_b`, `option_c`, `option_d`: Four multiple-choice options
- `correct_answer`: The correct option (A, B, C, or D)
- `subject`: Subject category
- `exam_name`: Source examination
- `paper_number`: Paper/section identifier
- `question_type`: Type of question (MCQ, blank-filling, assertion/reasoning, etc.)
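For example, the dataset can be loaded and inspected with pandas. This is a minimal sketch; the file path follows the default project layout shown below, and the column names are the fields listed above:

```python
import pandas as pd

# Load the full benchmark (17,275 rows, one per question)
df = pd.read_csv("data/full-data.csv")

# Inspect a few of the fields described above
print(df[["unique_question_id", "subject", "question_type"]].head())

# Count questions per subject
print(df["subject"].value_counts())
```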
The benchmark covers 21 subjects including but not limited to:
- Music
- History
- Drama and Theatre
- Economics
- Anthropology
- Current Affairs
- Indian Culture
- And more...

```
ParamBench/
├── data/
│   └── full-data.csv        # Main dataset file
├── checkpoints/             # Model evaluation checkpoints
├── results/                 # Analysis results and visualizations
├── benchmark_script.py      # Main benchmarking script
├── analysis_models.py       # Analysis and visualization script
├── requirements.txt         # Python dependencies
└── README.md                # This file
```
Install the Python dependencies:

```bash
pip install -r requirements.txt
```

Key requirements:
- Python 3.8+
- PyTorch 2.0+
- Transformers 4.45+
- Pandas
- NumPy
- Plotly (for visualization)
- Clone the repository

  ```bash
  git clone https://github.com/yourusername/ParamBench.git
  cd ParamBench
  ```

- Run the benchmark

  ```bash
  python benchmark_script.py
  ```
The benchmark script supports various configuration options:
```python
# In benchmark_script.py
group_to_run = "small"   # Options: "small", "medium", "large", or "all"
batch_size = 16          # Adjust based on GPU memory
```
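The benchmark script handles model loading and scoring; the sketch below only illustrates the general idea of prompting a causal LM with one question and comparing its answer letter to `correct_answer`. The model name, prompt format, and decoding settings are illustrative assumptions, not the script's actual implementation:

```python
# Illustrative sketch only -- benchmark_script.py may differ in detail.
import pandas as pd
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-1.5B-Instruct"  # hypothetical small model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

df = pd.read_csv("data/full-data.csv")
row = df.iloc[0]

# Assemble an MCQ prompt from the dataset fields (format is an assumption)
prompt = (
    f"{row['question_text']}\n"
    f"A. {row['option_a']}\nB. {row['option_b']}\n"
    f"C. {row['option_c']}\nD. {row['option_d']}\n"
    "Answer with a single letter (A/B/C/D): "
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=5, do_sample=False)

# Keep only the newly generated tokens and extract the first A-D letter
generated = tokenizer.decode(
    output[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
predicted = next((ch for ch in generated.upper() if ch in "ABCD"), None)
print("predicted:", predicted, "| correct:", row["correct_answer"])
```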
After running benchmarks, generate comprehensive analysis reports:
```bash
python analysis_models.py
```
This will generate:
- Model performance summary CSV
- Subject-wise accuracy charts
- Question type analysis
- Combined report with all metrics
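If per-question predictions are saved to a CSV, subject-wise accuracy can be tabulated and plotted along these lines. The results file name and its columns (`subject`, `correct_answer`, `predicted_answer`) are assumptions for illustration, not the exact output of the analysis script:

```python
import pandas as pd
import plotly.express as px

# Hypothetical per-question results file written during benchmarking
results = pd.read_csv("results/predictions.csv")

# Accuracy per subject
results["is_correct"] = results["predicted_answer"] == results["correct_answer"]
subject_acc = results.groupby("subject")["is_correct"].mean().sort_values()

# Simple bar chart of subject-wise accuracy
fig = px.bar(
    subject_acc.reset_index(),
    x="subject",
    y="is_correct",
    title="Subject-wise accuracy",
)
fig.write_html("results/subject_accuracy.html")
```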