
Agora-Lab-AI/Pytorch-Dataset



Pytorch-Dataset

A PyTorch Code Dataset for Cutting-Edge Fine-tuning

Installation

You can install the package with pip:

pip install pytorch-dataset

Usage

GitHubRepoDownloader downloads and unzips every repository owned by a GitHub account:

from pytorch import GitHubRepoDownloader

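# Download and unzip every repository of the GitHub user "lucidrains"
# into the local directory "lucidrains_repositories"
downloader = GitHubRepoDownloader(username="lucidrains", download_dir="lucidrains_repositories")
downloader.download_repositories()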
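To show what "downloads and unzips each repository" can involve, here is a minimal standard-library sketch. It is not the package's GitHubRepoDownloader. It assumes the public GitHub REST API endpoints for listing a user's repositories and for downloading a repository's default-branch zipball; the function names are hypothetical.

```python
import io
import json
import urllib.request
import zipfile
from pathlib import Path

API = "https://api.github.com"


def list_repo_names(username: str) -> list[str]:
    """Return the names of a user's public repositories, following pagination."""
    names, page = [], 1
    while True:
        url = f"{API}/users/{username}/repos?per_page=100&page={page}"
        with urllib.request.urlopen(url) as resp:
            batch = json.load(resp)
        if not batch:  # an empty page means we've seen every repository
            return names
        names.extend(repo["name"] for repo in batch)
        page += 1


def extract_zip_bytes(data: bytes, dest: Path) -> list[str]:
    """Unzip an in-memory archive into dest and return the archive's member names."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        archive.extractall(dest)
        return archive.namelist()


def download_repositories(username: str, download_dir: str) -> None:
    """Hypothetical helper: fetch each repo's default-branch zipball and unzip it."""
    root = Path(download_dir)
    for name in list_repo_names(username):
        # Omitting the ref makes GitHub serve the default branch.
        url = f"{API}/repos/{username}/{name}/zipball"
        with urllib.request.urlopen(url) as resp:
            extract_zip_bytes(resp.read(), root / name)
```

GitHub zipballs wrap their contents in a single top-level folder (for example `owner-repo-<sha>/`), so each repository lands one level deeper than `root / name`. Unauthenticated API requests are rate-limited, so a real downloader for a large account would likely need an access token.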

CodeDatasetBuilder cleans and formats the downloaded code, saves it as a dataset, and pushes it to the Hugging Face Hub:

from pytorch import CodeDatasetBuilder

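# Build the dataset from the repositories downloaded by GitHubRepoDownloader
code_builder = CodeDatasetBuilder("lucidrains_repositories")

# Save the cleaned dataset, skipping setup.py files and anything under tests directories
code_builder.save_dataset(
    "lucidrains_python_code_dataset",
    exclude_files=["setup.py"], exclude_dirs=["tests"]
)

# Upload the dataset to the Hugging Face Hub under the "kye" organization
code_builder.push_to_hub("lucidrains_python_code_dataset", organization="kye")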
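The `exclude_files` and `exclude_dirs` options suggest a directory walk that skips matching names. The following is a minimal sketch of that idea, not CodeDatasetBuilder's actual implementation. It assumes only `.py` files are kept, that empty files are dropped as "cleaning", and that the output is JSON Lines. `collect_python_files` and `save_jsonl` are hypothetical names.

```python
import json
import os
from pathlib import Path


def collect_python_files(root, exclude_files=(), exclude_dirs=()):
    """Yield {"path", "code"} records for .py files under root, skipping exclusions."""
    excluded_files, excluded_dirs = set(exclude_files), set(exclude_dirs)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in excluded_dirs)
        for filename in sorted(filenames):
            if filename.endswith(".py") and filename not in excluded_files:
                path = Path(dirpath) / filename
                code = path.read_text(encoding="utf-8", errors="replace")
                if code.strip():  # assumed cleaning step: drop empty files
                    rel = path.relative_to(root).as_posix()
                    yield {"path": rel, "code": code}


def save_jsonl(records, out_file):
    """Write one JSON object per line and return the number of records written."""
    count = 0
    with open(out_file, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
            count += 1
    return count
```

A JSON Lines file like this can be loaded with the Hugging Face `datasets` library via `load_dataset("json", data_files=...)` and uploaded with `push_to_hub`. Whether CodeDatasetBuilder takes that route is not stated here.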

License

MIT
