DataNexus is a simple-to-use Python module for getting transcripts, datasets, and similar resources into your projects. It can also extract a character's lines from a transcript, which makes it easier to prepare data for tasks such as fine-tuning a GPT-2 model.
- Downloading of Datasets and Transcripts
- Extract Characters from Transcripts
To get started:
pip install datanexus
from datanexus import datanexus
nexus = datanexus()  # use a separate name so the imported class is not shadowed
datasets = nexus.download_dataset('ironman.txt')
print(datasets)
You will need to create a folder called Models to successfully extract the character information:
from datanexus import datanexus
nexus = datanexus()  # create an instance, as in the download example
character = nexus.save_character('ironman.txt', 'Tony Stark:', 'Tony.txt')
print(character)
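The folder requirement and the idea behind character extraction can be sketched in plain Python. This is only an illustration under stated assumptions, not DataNexus's own implementation: the helper names `ensure_models_folder` and `extract_character_lines` are hypothetical, the Models folder is assumed to live in the current working directory, and the transcript is assumed to use one `Speaker: line` entry per line (as the 'Tony Stark:' argument above suggests).

```python
from pathlib import Path


def ensure_models_folder(base_dir="."):
    """Create the Models folder under base_dir if it is missing and return its path.

    Assumption: DataNexus looks for Models relative to the working directory.
    """
    models_dir = Path(base_dir) / "Models"
    # exist_ok=True makes the call safe to repeat on every run.
    models_dir.mkdir(parents=True, exist_ok=True)
    return models_dir


def extract_character_lines(transcript_text, speaker_prefix):
    """Return the lines spoken by one character, with the speaker prefix removed.

    Hypothetical stand-in for what a character extraction step might do.
    """
    spoken = []
    for raw_line in transcript_text.splitlines():
        line = raw_line.strip()
        if line.startswith(speaker_prefix):
            # Drop the prefix (e.g. 'Tony Stark:') and surrounding whitespace.
            spoken.append(line[len(speaker_prefix):].strip())
    return spoken
```

Calling `ensure_models_folder()` once before `save_character` should be enough if the assumption about the folder location holds. The second helper just shows the kind of prefix filtering that produces one character's lines for fine-tuning data.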
This function lists all of the available datasets:
from datanexus import datanexus
nexus = datanexus()  # create an instance, as in the download example
datasets = nexus.possible_models()
for dataset in datasets:
    print(dataset)
If you have any questions or run into any issues, feel free to create an issue on GitHub.
Feel free to join The Workshop Discord server and send me a ping (_Ethan_).