Exploring Text-to-Speech options to extend the EarthArXiv API usage

eartharxiv/Text-to-Speech


Requirements and Background

This code depends on additional Python libraries.

For text-to-speech, we're using Google Text-To-Speech (gTTS) version 1.2.2, installed via pip:

pip install gTTS

This example was initially created to demonstrate converting EarthArXiv preprints to speech. As such, the code currently focuses exclusively on PDF to text to speech, since PDF is what the EarthArXiv API returns. For the PDF-to-text step I use pdfminer, which has an excellent tutorial here. Note that pdfminer is only available for Python 2.

I use the EarthArXiv Python API code to retrieve a preprint in PDF format. I then use pdf2mp3.py in this repository to convert the PDF to mp3.
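The text-to-speech half of that workflow can be sketched as follows. This is a minimal illustration, not the contents of pdf2mp3.py: the function names `mp3_path_for` and `text_to_mp3` are hypothetical, and the text is assumed to have already been extracted from the PDF. The gTTS call needs network access.

```python
import os


def mp3_path_for(pdf_path):
    """Derive an output name next to the input: 'paper.pdf' -> 'paper.mp3'."""
    root, _ext = os.path.splitext(pdf_path)
    return root + ".mp3"


def text_to_mp3(text, mp3_path, lang="en"):
    """Send text to Google Text-to-Speech and save the returned audio."""
    # Imported here so the helper above works even without gTTS installed.
    from gtts import gTTS

    speech = gTTS(text=text, lang=lang)  # requires network access
    speech.save(mp3_path)
    return mp3_path
```

For example, `text_to_mp3(extracted_text, mp3_path_for("paper.pdf"))` would write `paper.mp3`, where `extracted_text` stands for whatever the PDF-to-text step produced.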

Known Issues

This project is still very much under development. A number of known issues exist that we hope to resolve in future releases.

1.) Currently, we are only focused on PDF files as they are what is returned from the EarthArXiv API. No support currently exists for converting other formats to speech.

2.) My initial interest was to listen to preprints on my commute. As such, the current version outputs an mp3 file that I transfer to my iPod. Future versions will look at playing the audio directly within a Python application.

3.) Not really an issue, more something I'd like to look into. Nic Weber has some nice speech-to-text code. I'd love to explore capturing preprint comments via this code and converting them to text, which could then be sent back to EarthArXiv.

4.) The conversion to mp3 is a bit clunky at the moment. Text-to-Speech simply reads the document linearly from top to bottom. This raises the following issues (and probably a few others I haven't found yet):
a.) If a figure or table is embedded in the text, which it almost always is for science documents, the text-to-speech will jump from the main text to the figure caption and back. This is confusing to a listener.
b.) Science papers are often formatted in two columns, which leads to words being hyphenated as they wrap on the page. Text-to-speech has trouble with hyphenated words that normally aren't hyphenated.

I think 4.) can be addressed with some pre-processing of the text prior to converting it to mp3, i.e. go from PDF to text, remove figures/captions/tables, fix hyphens, then convert to mp3. Just haven't had time to explore this yet.
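A rough sketch of what that pre-processing might look like is below. It is not part of this repository. The caption pattern and the function names are assumptions made for illustration, and real extracted text will likely need more careful rules.

```python
import re

# Assumed caption pattern: a paragraph starting like "Figure 3." or "Table 2:".
CAPTION_START = re.compile(r"^(figure|fig\.|table)\s+\d+[.:]", re.IGNORECASE)


def join_hyphenated(text):
    """Rejoin words that were split by a hyphen at a line break."""
    # Caveat: this also merges genuine compounds that happen to break
    # at the hyphen, so it is a heuristic, not a complete fix.
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def drop_captions(text):
    """Remove blank-line-separated paragraphs that begin like a caption."""
    paragraphs = re.split(r"\n\s*\n", text)
    kept = [p for p in paragraphs if not CAPTION_START.match(p.strip())]
    return "\n\n".join(kept)


def preprocess(text):
    """Clean extracted text before handing it to text-to-speech."""
    return drop_captions(join_hyphenated(text))
```

The cleaned string returned by `preprocess` would then be passed to the text-to-speech step in place of the raw extracted text.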
