eartharxiv/Text-to-Speech: Exploring Text-to-Speech options to extend the EarthArXiv API usage

Requirements and Background

This code depends on a few additional Python libraries.

For text-to-speech, we're using Google Text-to-Speech (gTTS), version 1.2.2, installed via pip:

pip install gTTS
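Once gTTS is installed, turning a string into an mp3 takes only a couple of calls. The sketch below is a minimal, hypothetical helper (`text_to_mp3` is not a function from pdf2mp3.py). It assumes a recent gTTS release, whose API may differ in detail from the pinned 1.2.2, and it needs network access because gTTS sends the text to Google's service.

```python
"""Minimal sketch: save a string as an mp3 with gTTS (assumes `pip install gTTS`)."""


def text_to_mp3(text: str, mp3_path: str, lang: str = "en", slow: bool = False) -> None:
    """Synthesize `text` with Google Text-to-Speech and write it to `mp3_path`.

    Hypothetical helper for illustration. Requires network access, because
    gTTS sends the text to Google's text-to-speech service.
    """
    if not text.strip():
        # Reject empty input before importing gTTS or touching the network.
        raise ValueError("no text to speak")

    # Imported lazily so this module can be loaded where gTTS is not installed.
    from gtts import gTTS

    tts = gTTS(text=text, lang=lang, slow=slow)
    tts.save(mp3_path)
```

For example, `text_to_mp3("Hello from EarthArXiv.", "hello.mp3")` would write hello.mp3 to the current directory.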

This example was initially created to demonstrate converting EarthArXiv preprints to speech. As such, the code currently handles only the PDF-to-text-to-speech path, because PDF is the format returned by the EarthArXiv API. For the PDF-to-text step I use pdfminer, which has an excellent tutorial here. Note that pdfminer is only available for Python 2.
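Because the original pdfminer package targets Python 2, a Python 3 version of the PDF-to-text step could use the pdfminer.six fork instead. The sketch below assumes pdfminer.six is installed (`pip install pdfminer.six`) and uses its `pdfminer.high_level.extract_text` function; the helper names `pdf_to_text` and `split_pages` are hypothetical. pdfminer.six ends each page of its plain-text output with a form feed, which `split_pages` relies on to recover per-page text.

```python
"""Sketch of the PDF-to-text step, assuming pdfminer.six (a Python 3 fork of pdfminer)."""
from typing import List


def pdf_to_text(pdf_path: str) -> str:
    """Extract all text from a PDF with pdfminer.six's high-level API."""
    # Lazy import: pdfminer.six is only needed when a PDF is actually processed.
    from pdfminer.high_level import extract_text

    return extract_text(pdf_path)


def split_pages(text: str) -> List[str]:
    """Split extracted text into pages.

    pdfminer.six ends each page of its text output with a form feed ("\\f"),
    so splitting on it recovers per-page text. Blank pages are dropped.
    """
    return [page.strip() for page in text.split("\f") if page.strip()]
```

For example, `split_pages(pdf_to_text("preprint.pdf"))`, where preprint.pdf is a placeholder for a downloaded preprint, would return one string per non-empty page.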

I use the EarthArXiv Python API code to retrieve a preprint in PDF format. I then use pdf2mp3.py in this repository to convert the PDF to mp3.

Known Issues

This project is still very much under development. A number of known issues exist that we hope to resolve in future releases.

1.) Currently, we focus only on PDF files, because that is the format returned by the EarthArXiv API. No support yet exists for converting other formats to speech.

2.) My initial goal was to listen to preprints during my commute, so the current version outputs an mp3 file that I transfer to my iPod. Future versions will look at playing the audio directly within a Python application.

3.) This is less an issue than something I'd like to look into. Nic Weber has written some nice speech-to-text code. I'd love to explore capturing spoken preprint comments with this code and converting them to text, which could then be sent back to EarthArXiv.

4.) The conversion to mp3 is a bit clunky at the moment. Text-to-speech simply reads the document linearly from top to bottom. This raises the following issues (and probably a few others I haven't found yet):

a.) If a figure or table is embedded in the text, which it almost always is in scientific documents, the text-to-speech jumps from the main text to the figure caption and back. This is confusing to a listener.

b.) Scientific papers are often formatted in two columns, which leads to words being hyphenated where they wrap at the end of a line. Text-to-speech has trouble with words that are hyphenated only because of this wrapping.

I think issue 4.) can be addressed by pre-processing the text before converting it to mp3: go from PDF to text, remove figures, captions, and tables, repair the line-wrap hyphens, then convert to mp3. I just haven't had time to explore this yet.
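The pre-processing idea could be sketched with a few regular-expression heuristics. This is an illustration under stated assumptions, not a tested parser: it assumes paragraphs in the extracted text are separated by blank lines and that captions begin with "Figure N", "Fig. N", or "Table N". All function names are hypothetical. It does not remove table bodies, and it will also drop a body paragraph that happens to begin with, say, "Table 1 shows".

```python
"""Sketch of the proposed pre-processing step (heuristics, not a robust parser).

Assumes paragraphs are separated by blank lines and that captions start with
"Figure N", "Fig. N" or "Table N".
"""
import re

# A paragraph beginning like "Figure 3", "Fig. 3" or "Table 2" is treated as a caption.
CAPTION_RE = re.compile(r"^\s*(?:fig(?:ure)?\.?|table)\s*\d+", re.IGNORECASE)


def drop_captions(text: str) -> str:
    """Remove paragraphs that look like figure or table captions."""
    paragraphs = re.split(r"\n\s*\n", text)
    kept = [p for p in paragraphs if p.strip() and not CAPTION_RE.match(p)]
    return "\n\n".join(kept)


def dehyphenate(text: str) -> str:
    """Rejoin words that were split with a hyphen at the end of a line.

    Caveat: a genuine compound broken at its own hyphen ("two-\\ncolumn")
    is also joined ("twocolumn").
    """
    return re.sub(r"(\w)-\n[ \t]*(\w)", r"\1\2", text)


def unwrap_lines(text: str) -> str:
    """Turn single line breaks inside a paragraph into spaces."""
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)


def clean_for_speech(text: str) -> str:
    """Apply the cleaning steps in order: captions, hyphens, line wraps."""
    # Hyphens are repaired before unwrapping, while line breaks still mark them.
    return unwrap_lines(dehyphenate(drop_captions(text)))
```

In practice this cleaning would sit between the PDF-to-text step and the mp3 conversion. Hyphen repair runs before unwrapping because the line breaks are what identify wrap hyphens; the trade-off is that a genuine compound split at its own hyphen (such as "two-column") is joined into one word, which checking joined words against a dictionary could reduce.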
