Analyze how the DocHPLT dataset is structured and how it should be processed in datamix