A PyTorch Code Dataset for Cutting-Edge Fine-tuning
Install the package with pip:
pip install pytorch-dataset
GitHubRepoDownloader downloads and unzips every repository in a GitHub account:
from pytorch import GitHubRepoDownloader
# Example usage:
downloader = GitHubRepoDownloader(username="lucidrains", download_dir="lucidrains_repositories")
downloader.download_repositories()
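How the downloader works internally is not documented here. The following is only a minimal sketch of the download-and-unzip idea, built on the standard library and the public GitHub endpoints. The helper names (`list_repos`, `archive_url`, `extract_archive`) are hypothetical and not part of this package. The listing helper reads only the first page of results.

```python
import io
import json
import urllib.request
import zipfile
from pathlib import Path


def list_repos(username: str) -> list[dict]:
    """Return the first page (up to 100) of a user's public repositories.

    Hypothetical helper: calls the public GitHub REST API without a token,
    so it is subject to unauthenticated rate limits.
    """
    url = f"https://api.github.com/users/{username}/repos?per_page=100"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def archive_url(owner: str, repo: str, branch: str = "main") -> str:
    """Build the URL of the zip archive for one branch of a repository."""
    return f"https://github.com/{owner}/{repo}/archive/refs/heads/{branch}.zip"


def extract_archive(data: bytes, dest: Path) -> list[str]:
    """Unzip in-memory archive bytes into dest and return the member names."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

A full download loop would call `list_repos`, fetch `archive_url(owner, repo["name"], repo["default_branch"])` for each entry, and pass the response bytes to `extract_archive`.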
CodeDatasetBuilder cleans and formats the downloaded code, then pushes the resulting dataset to the Hugging Face Hub:
from pytorch import CodeDatasetBuilder
# Example usage:
code_builder = CodeDatasetBuilder("lucidrains_repositories")
code_builder.save_dataset(
    "lucidrains_python_code_dataset",
    exclude_files=["setup.py"],
    exclude_dirs=["tests"],
)
code_builder.push_to_hub("lucidrains_python_code_dataset", organization="kye")
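To illustrate what `exclude_files` and `exclude_dirs` could mean in practice, here is a small sketch of one plausible filtering and cleaning pass. It is an assumption about the behavior, not the package's actual implementation. `collect_python_files` and `clean_source` are hypothetical names, and the cleaning step shown (stripping trailing whitespace and surrounding blank lines) is only one possible choice.

```python
from pathlib import Path


def collect_python_files(root, exclude_files=(), exclude_dirs=()):
    """Return .py files under root, skipping excluded file and directory names."""
    root = Path(root)
    kept = []
    for path in sorted(root.rglob("*.py")):
        relative = path.relative_to(root)
        # Skip files whose name is excluded, e.g. "setup.py".
        if path.name in exclude_files:
            continue
        # Skip files located inside any excluded directory, e.g. "tests".
        if any(part in exclude_dirs for part in relative.parts[:-1]):
            continue
        kept.append(path)
    return kept


def clean_source(text):
    """Strip trailing whitespace per line and blank lines at both ends."""
    lines = [line.rstrip() for line in text.splitlines()]
    return "\n".join(lines).strip() + "\n"
```

Matching on directory names rather than full paths means a `tests` folder is skipped at any depth, which suits a directory that holds many unrelated repositories.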
License: MIT