@hplt-project

HPLT - High Performance Language Technologies

A space that combines petabytes of natural language data with large-scale model training

Pinned

  1. OpusCleaner Public

    OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models.

    Python 51 15

  2. OpusTrainer Public

    Curriculum training

    Python 17 6

Repositories

Showing 10 of 22 repositories
  • data-analytics-tool Public

    HPLT Analytics

    HTML 13 GPL-3.0 1 0 0 Updated Mar 25, 2025
  • warc2text-runner Public

    Scripts for parallelized extraction of plain text from WARC archives, aiming at a common and reproducible extraction approach.

    Jupyter Notebook 4 0 4 1 Updated Mar 23, 2025
  • HPLT-WP4 Public

    Information and pipelines for WP4: language model training

    Jupyter Notebook 2 CC0-1.0 3 0 0 Updated Mar 19, 2025
  • HPLT-MT-Models Public

    Configuration and scripts for HPLT MT model releases.

    Python 4 0 1 0 Updated Mar 18, 2025
  • HPLT-textpipes Public

    Step-by-step schematic description of data processing in HPLT

    HTML 0 0 0 0 Updated Mar 11, 2025
  • OpusTrainer Public

    Curriculum training

    Python 17 MIT 6 19 0 Updated Mar 9, 2025
  • OpusPocus Public

    Marian machine translation training pipeline for thousands of models

    Python 2 0 20 (3 issues need help) 0 Updated Mar 3, 2025
  • Shell 1 0 3 0 Updated Feb 24, 2025
  • Jupyter Notebook 1 8 1 1 Updated Feb 23, 2025
  • monotextor-slurm Public

    Set of scripts to run a monotextor-like pipeline on Slurm HPC clusters

    Rust 2 GPL-3.0 0 0 0 Updated Feb 7, 2025
