[feature request] Download transcription as text file (raw or in a subtitle format) #15

Open
turicas opened this issue Feb 26, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

@turicas
Contributor

turicas commented Feb 26, 2021

I had this idea during this use case:

  1. I needed to transcribe a 1-hour-long interview in Brazilian Portuguese (the file was over 80 MB and in M4A format)
  2. I split the original file into 5 OGG parts using ffmpeg (~8 MB each)
  3. I sent the files to the bot, received lots of 4,000-character messages in reply, and copied them into a text editor (this was boring)
  4. I found some errors while reading the text in the editor, but it was hard to find the corresponding chunk in the audio file so I could listen to it and fix the text manually
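Step 2 can be done with ffmpeg's segment muxer. The issue does not say which ffmpeg options were used, so this is a minimal sketch assuming fixed-length segments re-encoded to Opus (which requires an ffmpeg build with libopus). `build_split_command` is a hypothetical helper that only builds the argument list:

```python
import math
from typing import List


def build_split_command(
    input_path: str,
    duration_seconds: float,
    parts: int,
    output_pattern: str = "part%03d.ogg",
) -> List[str]:
    """Build an ffmpeg command that cuts the input into `parts` roughly equal OGG/Opus segments."""
    # Round up so that `parts` segments always cover the whole file;
    # the last segment may be slightly shorter than the others.
    segment_time = math.ceil(duration_seconds / parts)
    return [
        "ffmpeg",
        "-i", input_path,
        "-vn",                  # drop any embedded cover art (video stream)
        "-c:a", "libopus",      # re-encode the audio as Opus inside OGG
        "-f", "segment",        # use ffmpeg's segment muxer
        "-segment_time", str(segment_time),
        output_pattern,         # part000.ogg, part001.ogg, ...
    ]
```

For the 1-hour interview, `subprocess.run(build_split_command("interview.m4a", 3600, 5), check=True)` would ask ffmpeg for five segments of 720 seconds each.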

Being able to download the transcription as a text file would solve the problem in item 3, and using a subtitle format (like SRT) would help a lot with item 4. Attaching the file could be triggered automatically for audio longer than 1 minute.
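An SRT file is a sequence of numbered cues, each with a `start --> end` line (timestamps written as `HH:MM:SS,mmm`) followed by the text and a blank line. A minimal sketch, assuming the transcription step can provide `(start_seconds, end_seconds, text)` segments; whether the bot's speech backend actually exposes such timings is an assumption, and `format_timestamp` and `to_srt` are hypothetical helpers:

```python
from typing import Iterable, Tuple

# Hypothetical segment shape: (start_seconds, end_seconds, text)
Segment = Tuple[float, float, str]


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as SRT cues numbered from 1, separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(blocks)
```

With cue timestamps in the file, a wrong sentence in the editor points directly at the second of audio to listen to again.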

I'm willing to implement this feature if the maintainers accept the proposal.

@carloalbertobarbano
Member

I think having the option to generate a txt/subtitle file for longer audio files is great.
However, I would let users choose which mode they prefer (e.g. a command /mode <message/text/subtitle>).
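A `/mode` command could be parsed into an explicit per-chat choice, with the 1-minute automatic rule from the original proposal as a fallback when the user has not chosen. This is a hypothetical sketch, not the bot's actual code: `OutputMode`, `parse_mode_command`, `resolve_mode` and the 60-second threshold are assumptions, and the bot framework wiring is left out:

```python
from enum import Enum
from typing import Optional


class OutputMode(Enum):
    MESSAGE = "message"    # current behavior: reply with text messages
    TEXT = "text"          # attach a plain .txt file
    SUBTITLE = "subtitle"  # attach a subtitle (.srt) file


AUTO_ATTACH_SECONDS = 60  # assumed threshold: "longer than 1 minute"


def parse_mode_command(text: str) -> Optional[OutputMode]:
    """Parse '/mode <message|text|subtitle>'; return None if the command is invalid."""
    parts = text.split()
    if len(parts) != 2:
        return None
    # In group chats Telegram commands may arrive as '/mode@SomeBot'.
    command = parts[0].split("@", 1)[0]
    if command != "/mode":
        return None
    try:
        return OutputMode(parts[1].lower())
    except ValueError:
        return None


def resolve_mode(chosen: Optional[OutputMode], duration_seconds: float) -> OutputMode:
    """Use the user's explicit choice, otherwise attach a file only for long audio."""
    if chosen is not None:
        return chosen
    return OutputMode.TEXT if duration_seconds > AUTO_ATTACH_SECONDS else OutputMode.MESSAGE
```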

It's okay from my side. If @stefanodelbosco does not have any issue (I'm guessing not), you can definitely work on it! 👍

@stefanodelbosco stefanodelbosco added the enhancement New feature or request label Feb 26, 2021
@stefanodelbosco
Member

Hi,
I like the idea for the command '/mode' proposed by @carloalbertobarbano (message/text/subtitle) 👍
The result files should be generated in the "/data" directory and deleted as soon as they have been sent.
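The "/data" lifecycle described above could look roughly like this minimal sketch. `send_as_file` is a hypothetical helper and `send` is a placeholder for whatever upload call the bot uses. The file gets a unique name so concurrent requests do not collide, and it is removed in a `finally` block, so it is also deleted if sending fails (a design choice beyond what the comment specifies):

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def send_as_file(
    content: str,
    suffix: str,
    send: Callable[[Path], None],
    data_dir: str = "/data",
) -> None:
    """Write `content` to a uniquely named file in `data_dir`, pass it to `send`,
    then delete the file whether or not sending succeeded."""
    fd, name = tempfile.mkstemp(suffix=suffix, dir=data_dir)
    path = Path(name)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
        send(path)  # hypothetical upload callback
    finally:
        path.unlink(missing_ok=True)  # requires Python 3.8+
```

For example, `send_as_file(transcript, ".txt", upload_to_chat)`, where `upload_to_chat` is a hypothetical function that uploads the file to the user's chat.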

Remember that bots can currently send files of any type up to 50 MB in size (using the official Bot API at https://api.telegram.org).

In the future, TranscriberBot may use a custom Bot API server (https://github.com/tdlib/telegram-bot-api).
With a custom Bot API server, bots can upload files up to 2000 MB.
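The size limits mentioned above could be checked before attempting an upload. A transcript of even a 1-hour interview is far below 50 MB, so this is mainly a guard. This sketch assumes decimal megabytes (10^6 bytes), the more conservative reading of "MB", and `can_upload` is a hypothetical helper:

```python
MB = 1_000_000  # assumption: decimal megabytes, the stricter interpretation

OFFICIAL_API_LIMIT = 50 * MB    # https://api.telegram.org
LOCAL_SERVER_LIMIT = 2000 * MB  # custom server from tdlib/telegram-bot-api


def can_upload(size_bytes: int, custom_server: bool = False) -> bool:
    """Return True if a file of `size_bytes` is within the upload limit of the chosen server."""
    limit = LOCAL_SERVER_LIMIT if custom_server else OFFICIAL_API_LIMIT
    return size_bytes <= limit
```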

It's OK with me, you can work on it! 👍

Development

No branches or pull requests

3 participants