
JoshuaJewell/Custom-Ipsum


Custom Ipsum

Placeholder text generator using Markov chains: one method assumes too much about the English language, the other assumes nothing about anything. There are some fairly cool principles in here, though. I would hazard a guess that this is the most advanced program of its type (as long as you maintain a very narrow view of what NLP can mean; c'mon, LLMs don't count, do they?).
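To make the idea concrete, here is a minimal word-level Markov chain sketch in Python. It is a generic illustration of the technique, not this repository's code: the function names `build_chain` and `generate`, the default `order=2`, and the restart-on-dead-end behaviour are all assumptions chosen for the example.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

State = Tuple[str, ...]


def build_chain(text: str, order: int = 2) -> Dict[State, List[str]]:
    """Map each run of `order` consecutive words to every word seen after it."""
    words = text.split()
    chain: Dict[State, List[str]] = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        # Duplicates are kept on purpose: they encode transition frequency.
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain: Dict[State, List[str]], length: int = 30,
             seed: Optional[int] = None) -> str:
    """Random-walk the chain until `length` words have been produced."""
    if not chain:
        raise ValueError("chain is empty; the training text is too short")
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    order = len(state)
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end (state only seen at the end of the text): restart
            # from a random state and keep writing.
            state = rng.choice(list(chain))
            out.extend(state)
            continue
        out.append(rng.choice(followers))
        state = tuple(out[-order:])
    return " ".join(out[:length])
```

Because every observed follower is stored (duplicates included), `rng.choice` picks the next word in proportion to how often it followed the current state, which is exactly the chain's transition probability. A higher `order` gives more fluent but less varied text.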

Disclaimer

fineweb-top5000.tensordict was generated using the top 5000 entries, sorted by language score, from the 🍷 FineWeb dataset (Penedo et al. 2024). samplemerged@7E-4.tensordict also contains these weights, mixed into my own data at a ratio of 7.0 × 10⁻⁴. Therefore, I assume no liability for any issues arising from these models' outputs. The data may contain inappropriate content and are not intended for critical decision-making, and I advise against relying on them for such purposes.
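One way to read "mixed in at a ratio" is as a weighted sum of two transition tables. The sketch below assumes (this is not confirmed by the repository) that each weight file holds per-state next-word weights, and that the FineWeb weights are scaled by 7.0 × 10⁻⁴ before being added to the author's own. `merge_weights`, `to_probabilities`, and the nested-dict layout are hypothetical and stand in for whatever the `.tensordict` files actually store.

```python
from typing import Dict, Tuple

State = Tuple[str, ...]
Weights = Dict[State, Dict[str, float]]


def merge_weights(own: Weights, other: Weights, ratio: float = 7e-4) -> Weights:
    """Hypothetical merge: own[s][w] + ratio * other[s][w] for every state s, word w."""
    merged: Weights = {}
    for state in own.keys() | other.keys():
        row = dict(own.get(state, {}))  # copy so `own` is left untouched
        for nxt, weight in other.get(state, {}).items():
            row[nxt] = row.get(nxt, 0.0) + ratio * weight
        merged[state] = row
    return merged


def to_probabilities(weights: Weights) -> Weights:
    """Normalise each state's row so its next-word weights sum to 1."""
    probs: Weights = {}
    for state, row in weights.items():
        total = sum(row.values())
        probs[state] = {nxt: w / total for nxt, w in row.items()}
    return probs
```

With a ratio this small, the external corpus mostly fills in transitions the author's own data never saw, while barely shifting the probabilities of transitions it already contains.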

References

Penedo, G., Kydlíček, H., Allal, L.B., Lozhkov, A., Mitchell, M., Raffel, C., Von Werra, L., and Wolf, T. (2024) ‘The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale’, available: https://doi.org/10.48550/ARXIV.2406.17557.

About

Placeholder text generator using Markov chains and too many assumptions about the English language.
