Pull requests: NVIDIA-NeMo/Curator

Pull requests list

Implement safe extraction methods for tar files to prevent path traversal (label: r0.9.0)
#769 opened Jul 3, 2025 by abhinavg4
3 tasks done
Initial PR for Synthetic data generation
#767 opened Jul 2, 2025 by abhinavg4
3 tasks
Create .gitattributes to classify lang as Python
#766 opened Jul 2, 2025 by arhamm1
Presidio pii redaction
#765 opened Jul 2, 2025 by ayushdg Draft
3 tasks
Adding with_ for options in base class
#764 opened Jul 2, 2025 by abhinavg4
2 of 3 tasks
[DRAFT|WIP] Video Pipeline
#759 opened Jun 30, 2025 by suiyoubi
3 tasks
docs: readme refresher
#758 opened Jun 30, 2025 by lbliii
3 tasks
debug toml change
#754 opened Jun 27, 2025 by chtruong814
3 tasks
[Ray] Classifiers
#753 opened Jun 27, 2025 by sarahyurick Draft
3 tasks
[Ray] DocumentFilter and Filter/Score/ScoreFilter
#746 opened Jun 24, 2025 by sarahyurick
4 of 5 tasks
[Ray] Add Common Crawl Download Stage
#738 opened Jun 17, 2025 by praateekmahajan
3 tasks
[Ray] Common Crawl Draft
#734 opened Jun 17, 2025 by praateekmahajan Draft
3 tasks
Add classifier CLI script tests (label: gpuci)
#684 opened Apr 22, 2025 by sarahyurick
10 tasks done
SemDedup bug fix for single element cluster (label: gpuci)
#683 opened Apr 22, 2025 by praateekmahajan
3 tasks
ci: Fix code-freeze workflow
#638 opened Apr 8, 2025 by ko3n1g
3 tasks
Change prompt to try and get only topic names
#623 opened Apr 3, 2025 by abhinavg4
3 tasks
[WIP] Remote I/O in SemDedup
#621 opened Apr 2, 2025 by praateekmahajan Draft
3 tasks
Add Regex Modifier
#568 opened Feb 24, 2025 by shuoyangd
3 tasks done
Add option to skip data by adding a flag instead of removing them
#566 opened Feb 22, 2025 by shuoyangd
1 of 3 tasks
Add a way to pass expected language to FastTextLangId filter
#565 opened Feb 21, 2025 by shuoyangd
2 of 3 tasks
Create FastText classifier module
#546 opened Feb 13, 2025 by sarahyurick Draft
Hard negative mining for Retriever fine-tuning
#523 opened Feb 5, 2025 by vinay-raman
3 tasks done