Custom Ipsum

Placeholder text generator using Markov chains. One method assumes too much about the English language; the other assumes nothing about anything. There are some fairly cool principles in here, though: I would hazard a guess that this is the most advanced program of its type (as long as you maintain a very narrow view of what NLP can mean; c'mon, LLMs don't count, do they?).
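For readers new to the idea, here is a minimal sketch of a word-level Markov chain text generator. It is not the code in this repository: the function names `build_chain` and `generate`, the `order` parameter, and the plain dictionary of follower lists are all assumptions made for illustration, and the actual project stores its weights in `.tensordict` files rather than in a structure like this.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each `order`-word state to the list of words observed after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=20, seed=None):
    """Walk the chain, picking each next word at random from the followers."""
    rng = random.Random(seed)
    states = sorted(chain)  # sorted so a fixed seed gives a repeatable result
    state = rng.choice(states)
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end (state only seen at the end of the text): restart.
            state = rng.choice(states)
            out.extend(state)
            continue
        nxt = rng.choice(followers)
        out.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(out[:length])


if __name__ == "__main__":
    sample = "the cat sat on the mat and the dog sat on the cat"
    print(generate(build_chain(sample, order=1), length=12, seed=1))
```

Repeated followers are kept in the list on purpose, so a word that follows a state more often is proportionally more likely to be picked. A larger `order` tends to give more coherent but less varied text.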

Disclaimer

fineweb-top5000.tensordict was generated using the top 5000 entries, sorted by language score, from the 🍷 FineWeb dataset (Penedo et al. 2024). samplemerged@7E-4.tensordict also contains these weights, merged into my own data at a ratio of 7.0 × 10⁻⁴. I therefore assume no liability for any issues arising from these models' outputs. The data may contain inappropriate content and are not intended for critical decision-making, and I advise against relying on them for such purposes.

References

Penedo, G., Kydlíček, H., Allal, L.B., Lozhkov, A., Mitchell, M., Raffel, C., Von Werra, L., and Wolf, T. (2024) ‘The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale’, available: https://doi.org/10.48550/ARXIV.2406.17557.
