I wanted to have transcriptions of the debates and tried to do it myself. It has been a fun ride, and loads of work. 😅 I've been improving the process over time, so newer debates are likely better transcribed than the earlier ones. With new content landing every day, I haven't had the time to re-review the older ones.
Disclaimer: I try my best to review each debate's SRT (which never takes me less than 45 minutes, even for the easy ones). When several people talk at once, it is sometimes very hard to understand what is being said, let alone correct it. Overall, Whisper does a good job at this, but there were a couple of passages I had to write completely from scratch.
- 21h SIC: PS - IL
- 22h RTP3: PAN - Chega
- 18h RTP3: PCP - PAN
- 20h TVI: AD - BE
- 22h SICN: IL - Chega
- 18h CNN: IL - Livre
- 18h SICN: BE - Livre
- 18h SICN: IL - PAN
- 21h RTP: PS - Livre
- 22h CNN: Chega - PCP
- 21h RTP: PSD - Chega
- 18h CNN: PCP - Livre
- 22h RTP3: Chega - BE
- 18h RTP3: Livre - PAN
- 21h TVI: PS - Chega
- 22h RTP3: IL - PCP
- 18h CNN: IL - BE
- 20h30 RTP: PS - BE
- 22h SICN: Chega - Livre
- 20h30 TVI: PS - PCP
- 21h SIC: PSD - Livre
- 21h SIC: PS - PSD
- 21h RTP3: parties without parliamentary seats
- 21h RTP1: parties with parliamentary seats
PODCAST PROCESS
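# mp3 source: download, keep only the audio stream and drop all metadata, then skip the first 35 seconds
wget "url" -O 1.mp3
ffmpeg -i 1.mp3 -map 0:a -c:a copy -map_metadata -1 2.mp3
ffmpeg -i 2.mp3 -ss 35 -c:a copy 3.mp3
# mp4 source: download, extract the audio stream (assumed to be AAC) without metadata, then skip the first 20 seconds and encode it as 128 kbit/s MP3
wget "url" -O 1.mp4
ffmpeg -i 1.mp4 -map 0:a -c:a copy -map_metadata -1 2.aac
ffmpeg -i 2.aac -ss 20 -codec:a libmp3lame -b:a 128k 3.mp3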
- save an m3u8 stream to a file with VLC:
  - open network (in VLC)
  - first m3u8...
  - stream output
  - settings
  - file ... asd.ts
  - MPEG TS
- video to audio without transcoding:
  ffmpeg -i vlc-output.ts -vn -acodec copy audio.aac
- aac to mp3:
  ffmpeg -i audio.aac -acodec mp3 audio.mp3
- pinokio + whisper webui:
  - model: large-v3
  - language: portuguese
- toggle off the suffix checkbox
- supply the mp3 file and wait...
- get the output from the app's output folder
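set INFILE 2024-02-05_pan-chega.mp3  # fish syntax; in bash: INFILE=2024-02-05_pan-chega.mp3
# print the duration in seconds
ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$INFILE"
# render a 3622x512 spectrogram image
ffmpeg -i "$INFILE" -lavfi showspectrumpic=s=3622x512 spectrum.png
# render a 14488x512 waveform image (separate file name, so the spectrogram is not overwritten)
ffmpeg -i "$INFILE" -filter_complex "showwavespic=s=14488x512" -frames:v 1 waveform.png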
Playback key bindings:
- space: toggle playback
- up/down: move to the previous/next subtitle
- left/right: rewind/fast-forward by 15 seconds
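A minimal sketch of how the playback keys above could be handled in the browser. It assumes an HTMLMediaElement-like audio object (paused, currentTime, duration, play(), pause()) and subtitles with start times in milliseconds. The function name handlePlaybackKey is hypothetical; this is not the site's actual code.

```javascript
// Hypothetical playback-key handler (not the site's actual code).
// audio: HTMLMediaElement-like object (paused, currentTime, duration, play(), pause()).
// cues: subtitles sorted by start time, as { start, end } in milliseconds.
// Returns true when the key was handled.
function handlePlaybackKey(key, audio, cues) {
  const nowMs = audio.currentTime * 1000;
  // Index of the last cue that started at or before the playhead (-1 if none).
  let current = -1;
  for (let i = 0; i < cues.length && cues[i].start <= nowMs; i++) current = i;

  switch (key) {
    case " ": // space: toggle playback
      if (audio.paused) audio.play();
      else audio.pause();
      return true;
    case "ArrowUp": // jump to the start of the previous subtitle
      if (current > 0) audio.currentTime = cues[current - 1].start / 1000;
      return true;
    case "ArrowDown": // jump to the start of the next subtitle
      if (current + 1 < cues.length) audio.currentTime = cues[current + 1].start / 1000;
      return true;
    case "ArrowLeft": // rewind 15 seconds, not before the start
      audio.currentTime = Math.max(0, audio.currentTime - 15);
      return true;
    case "ArrowRight": { // fast-forward 15 seconds, not past the end (when known)
      const end = Number.isFinite(audio.duration) ? audio.duration : Infinity;
      audio.currentTime = Math.min(end, audio.currentTime + 15);
      return true;
    }
    default:
      return false;
  }
}
```

In a page, it could be attached with document.addEventListener("keydown", (e) => { if (handlePlaybackKey(e.key, audio, cues)) e.preventDefault(); }). Calling preventDefault stops space and the arrow keys from also scrolling the page.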
For each new debate (an mp3 file), we expect two additional files to be created:
- a subtitles file (srt), which initially comes from running Whisper over the mp3
- a JSON file listing the speakers and which subtitle indices belong to each speaker
The index.json also needs to be updated to list the name of the new debate (it is used by the search features of the main page).
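A minimal sketch of reading and writing the SRT file, in JavaScript like server.mjs. It assumes the usual SRT layout that Whisper produces: an index line, a "HH:MM:SS,mmm --> HH:MM:SS,mmm" line, one or more text lines, and a blank line between cues. Times are kept in milliseconds. All function names are hypothetical.

```javascript
// Hypothetical SRT helpers. Cue shape: { index, start, end, text }, times in ms.

// "00:01:02,345" -> 62345
function parseTimestamp(ts) {
  const m = /^(\d{2}):(\d{2}):(\d{2}),(\d{3})$/.exec(ts.trim());
  if (!m) throw new Error(`Bad SRT timestamp: ${ts}`);
  const [, h, min, s, ms] = m.map(Number);
  return ((h * 60 + min) * 60 + s) * 1000 + ms;
}

// 62345 -> "00:01:02,345" (fractional milliseconds are rounded)
function formatTimestamp(totalMs) {
  const ms = Math.round(totalMs);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  const h = Math.floor(ms / 3600000);
  const min = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h)}:${pad(min)}:${pad(s)},${pad(ms % 1000, 3)}`;
}

// Cues are separated by blank lines: index, "start --> end", then text lines.
function parseSrt(srtText) {
  return srtText
    .replace(/\r\n/g, "\n")
    .trim()
    .split(/\n\s*\n/)
    .map((block) => {
      const [indexLine, timeLine, ...textLines] = block.split("\n");
      const [start, end] = timeLine.split("-->").map(parseTimestamp);
      return { index: Number(indexLine), start, end, text: textLines.join("\n") };
    });
}

// Serialize cues back to SRT, renumbering them from 1.
function formatSrt(cues) {
  return cues
    .map((c, i) => `${i + 1}\n${formatTimestamp(c.start)} --> ${formatTimestamp(c.end)}\n${c.text}\n`)
    .join("\n");
}
```

Because formatSrt renumbers the cues from 1, the SRT indices stay contiguous after any join, split, delete or fill. If the speakers JSON stores subtitle indices, it has to be shifted the same way.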
When the site is running locally for editing, this should also be running:
node server.mjs
It applies the operations performed in the front end to the debate files on disk.
There's a set of key bindings for manipulating SRT and JSON files in tandem:
- j: joins the current subtitle with either its previous or next one
- s: splits the current subtitle by a ratio into 2 new ones
- e: edits the current subtitle's text content
- t: tweaks the start and end times of the current subtitle and its neighbors
- x: deletes the current subtitle
- f: fills the gap between the previous subtitle and the current one with a new subtitle
- 1: assigns the moderator role to the current subtitle (typically gray)
- 2: assigns the 1st debater role to the current subtitle (typically cyan)
- 3: assigns the 2nd debater role to the current subtitle (typically magenta)
- §: (the key before 1 on a Mac keyboard) clears any speaker role from the current subtitle
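Keeping the SRT and JSON files "in tandem" mostly comes down to shifting speaker indices whenever subtitles are added or removed. Below is a minimal sketch of the x, j, s and f operations under assumed data shapes. Cues are { start, end, text } in milliseconds. The speakers object maps a role to 1-based SRT indices. This JSON layout, the word-proportional text split, and all function names are hypothetical, not the site's actual implementation.

```javascript
// Hypothetical sketch: keep SRT cues and a speakers map in sync while editing.
// Assumed shapes (not the site's actual format):
//   cues:     [{ start, end, text }, ...] ordered, times in milliseconds
//   speakers: { moderator: [1, 4], debater1: [2], debater2: [3] }  (1-based SRT indices)

// Drop index `removed` (if given), then shift every index >= `from` by `delta`.
function remapSpeakers(speakers, from, delta, removed = null) {
  const out = {};
  for (const [role, indices] of Object.entries(speakers)) {
    out[role] = indices
      .filter((i) => i !== removed)
      .map((i) => (i >= from ? i + delta : i));
  }
  return out;
}

// x: delete cue n; later cues move up by one.
function deleteCue(cues, speakers, n) {
  return {
    cues: cues.filter((_, i) => i !== n - 1),
    speakers: remapSpeakers(speakers, n + 1, -1, n),
  };
}

// j: join cue n with cue n + 1 (joining with the previous one is joinWithNext(n - 1)).
// The merged cue keeps cue n's speaker.
function joinWithNext(cues, speakers, n) {
  const a = cues[n - 1];
  const b = cues[n];
  const merged = { start: a.start, end: b.end, text: `${a.text} ${b.text}` };
  return {
    cues: [...cues.slice(0, n - 1), merged, ...cues.slice(n + 1)],
    speakers: remapSpeakers(speakers, n + 2, -1, n + 1),
  };
}

// s: split cue n at `ratio` (between 0 and 1) of its duration. The words are
// divided in the same proportion, and both halves keep cue n's speaker.
function splitCue(cues, speakers, n, ratio) {
  const cue = cues[n - 1];
  const cut = Math.round(cue.start + (cue.end - cue.start) * ratio);
  const words = cue.text.trim().split(/\s+/);
  const k = Math.round(words.length * ratio);
  const first = { start: cue.start, end: cut, text: words.slice(0, k).join(" ") };
  const second = { start: cut, end: cue.end, text: words.slice(k).join(" ") };
  const shifted = remapSpeakers(speakers, n + 1, 1);
  // The new cue n + 1 belongs to whoever spoke cue n.
  for (const indices of Object.values(shifted)) {
    if (indices.includes(n)) {
      indices.push(n + 1);
      indices.sort((x, y) => x - y);
    }
  }
  return {
    cues: [...cues.slice(0, n - 1), first, second, ...cues.slice(n)],
    speakers: shifted,
  };
}

// f: fill the gap between cue n - 1 and cue n with a new, empty, unassigned cue.
function fillGapBefore(cues, speakers, n) {
  const start = n > 1 ? cues[n - 2].end : 0;
  const filler = { start, end: cues[n - 1].start, text: "" };
  return {
    cues: [...cues.slice(0, n - 1), filler, ...cues.slice(n - 1)],
    speakers: remapSpeakers(speakers, n, 1),
  };
}
```

Each function returns new arrays instead of mutating its inputs. That would make it straightforward to write both files back together only after an operation has succeeded.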