-
Notifications
You must be signed in to change notification settings - Fork 0
/
pipe_flv2annexb.py
435 lines (319 loc) · 22.6 KB
/
pipe_flv2annexb.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
#!/usr/bin/env python
# -*- coding: utf-8 -*-
""" Stream raw video frames to FFmpeg stdin PIPE,
FFmpeg encodes the video in H.264 codec and FLV (Flash Video) container format.
The code reads the FLV encoded data from FFmpeg stdout PIPE,
the AVC NAL units are extracted and converted from AVCC to Annex B format.
The output is a file containing H.264 elementary stream (in Annex B format).
What is the purpose?
The Python implementation is a "proof of concept", for C++ implementation.
Using PIPE interface to FFmpeg may be useful for prototyping, simulations and demonstrations.
Note: The implementation stores the output to a file for testing
The true purpose is to transmit the stream (over TCP/UDP for example).
The described concept is useful when:
1. Video encoding latency is important.
2. It is required to get the encoded data frame by frame.
Splitting the stream to "encoded frames" is required when customization is needed.
Example: Attaching a packet of meta data for each "encoded frame" (then streaming out the result).
Requirements:
1. Python 3.6 or above.
2. FFmpeg executable must be in the execution path.
Why FFmpeg?
1. FFmpeg is free, and open sourced (LGPL 3.0 license, but ffmpeg.exe is GPL 3.0).
2. FFmpeg is an extremely powerful and versatile tool for video encoding (and more).
3. FFmpeg has an extensive documentation, and many results in Google...
4. FFmpeg may be used as single statically linked executable (no setup needed).
5. FFmpeg is supported by Windows, Linux flavors, x86, ARM (and other platforms).
See: https://ffmpeg.org/
Why PIPE interface?
1. PIPE is a generic solution for inter-process communication:
PIPE interface is supported by Windows, Linux (and more).
PIPE interface is supported by most relevant programming languages.
2. Using PIPE interface is usually safer compared to using dynamic linking.
3. Using FFmpeg with PIPE interface is more simple then using FFmpeg C interface.
4. Using PIPE interface doesn't break the GPL 3.0 license.
See: http://zulko.github.io/blog/2013/09/27/read-and-write-video-frames-in-python-using-ffmpeg/
And: https://batchloaf.wordpress.com/2017/02/12/a-simple-way-to-read-and-write-audio-and-video-files-in-c-using-ffmpeg-part-2-video/
Input: --------- Output:
raw video | FFmpeg | Encoded video
------------> | Process | --------------> AVC Encoded stream in FLV container.
stdin PIPE --------- stdout PIPE
Why H.264 (AVC)?
1. In most cases, H.264 format is a system requirement.
2. Most platforms supports H.264 hardware acceleration.
[Unfortunately FFmpeg does not support H.265 encoded FLV].
Why FLV container?
1. FLV container is simple, and well documented.
2. FLV container is designed for streaming (opposed to MP4 [for example] that designed as file container).
3. In FLV, every payload ("access unit") starts with "Payload Size".
The size information is useful when reading from a PIPE (a way to know how many bytes to read).
Knowing the size is the main reason for using a container
(in H.264 elementary stream there is no size information when reading from a PIPE).
FLV container is selected because it is relatively simple to parse.
The advantage of using FLV over elementary H.264 stream is the "Payload Size" information.
H.264 Annex B encoded packets doesn't have packet size information.
The only way to know when packet ends, is when next packet begins.
The lack of size information adds a latency of one frame to the "pipeline".
The lack of size information also makes it difficult to split stream into packets.
Note: There is no method for querying the number of bytes in the PIPE.
Note: Reading byte by byte from stdout PIPE may solve the issues, but it is very inefficient.
NAL units:
AVC (H.264) encoded video stream is separated into NAL units.
NAL refers to Network Abstraction Layer.
See: https://en.wikipedia.org/wiki/Network_Abstraction_Layer
An encoded video frame "access unit" may be built from one or few NAL units.
In streaming video each keyframe (IDR frame) begins with SPS and PPS (data) NAL units,
then comes the "encoded data" NAL unit.
Other frames (P/B frames) are encoded as a single NAL unit.
See: https://yumichan.net/video-processing/video-compression/introduction-to-h264-nal-unit/
NAL units in FLV container:
FLV file/stream is divided into block names "FLV tags".
FLV Payload Data of one Tag applies one "access unit" (encoded video frame with one/few NAL units).
When reading FLV encoded data from stdout PIPE, we can read the Payload Size from Payload header first,
and then read the expected number of bytes from the PIPE.
AVCC versus Annex B:
FLV container stores the NAL units in AVCC format.
The H.264 elementary stream (our output) must be stored in Annex B format.
The code below, converts the NAL units from AVCC to Annex B format (among other things).
Assume NALU is a basic unit.
Annex B format: ([start code] NALU) | ( [start code] NALU)
AVCC format: ([length] NALU) | ([length] NALU)
In Annex B, [start code] may be 0x000001 or 0x00000001.
For Annex B and AVCC format, the NALUs are the same.
See: https://stackoverflow.com/questions/23404403/need-to-convert-h264-stream-from-annex-b-format-to-avcc-format
And: http://neurocline.github.io/dev/2016/07/28/video-and-containers.html
Adjusting stream format by setting FFmpeg flags:
The following FFmpeg flags are used for removing some irrelevant data:
"-flvflags +no_sequence_end+no_metadata+no_duration_filesize"
The flags removes the file "footer" and some irrelevant metadata.
The following FFmpeg flags are used for adding additional data: "-bsf:v dump_extra"
The "dump_extra" flag, forces the repetition of SPS and PPS NAL units for every key frame.
(Opposed to placing the SPS and PPS only once at the beginning of the file).
When using a file, there is no need to duplicate the SPS and PPS information.
When using a stream (i.e UDP streaming), there is no guarantee that the receiver
receives the beginning of the stream (it needs to re-sync on next key frame).
FLV (Flash Video) container format:
FLV stream begins with three letters "FLV" (we may use it for "sanity check").
The stream starts with a short header and data, that is irrelevant in our case.
After the header comes FLV body.
The FLV body contains the AVC encoded stream, arranged in "FLV tags"
Each "FLV tag" stats with a small header followed by AVC "access unit" payload.
FLV Tag Structure:
[Start of Last Packet][ Type ][Payload Length][timestamp][timestamp upper][identifier][Packet Payload]
4 Bytes 1 Byte 3 Byes 3 Byes 1 Byte 3 Bytes
AVC Packet Payload structure:
[Frame Type and Codec ID][AVC Packet Type][Composition Time][ Data ]
1 Byte 1 Byte 3 Bytes n Bytes
See: https://en.wikipedia.org/wiki/Flash_Video
And: https://www.adobe.com/content/dam/acom/en/devnet/flv/video_file_format_spec_v10.pdf
AVC Payload Data Structure (AVCC format NAL unit/units):
May be few NAL units: [SPS][PPS][Coded slice] (Each NAL unit starts with 4 bytes length)
Or one NAL unit: [Coded slice] (The NAL unit starts with 4 bytes length)
Converting the "FLV stream" to H.264 elementary stream:
1. The FLV "file header" is read from the PIPE, and ignored.
2. Read FLV Tag header, and extract Payload Length.
Repeat the process until stdout PIPE is closed:
2.1 Read NAL units until number of read bytes reaches Payload Length:
2.1.1 Read NAL unit length.
2.1.2 Read NAL unit data (according to length).
2.1.3 Convert the NAL unit format from AVCC to Annex B (replace length with 0x00000001 or 0x000001).
The Annex B NAL units are written to a file (for testing).
How to test:
The output is written to a file.
The file is compared to H.264 elementary stream file (encoded with the same set of encoder parameters).
The content of both files should be the same.
Tested using FFmpeg 4.2.2 under Windows 10
"""
import cv2 # Used for building synthetic video frames (for testing only).
import numpy as np # Used for building synthetic video frames (for testing only).
import subprocess as sp
import shlex
width, height, n_frames, fps = 1280, 720, 100, 25 # 100 frames, resolution 1280x720, and 25 fps
def make_raw_frame_as_bytes(i):
""" Build synthetic "raw BGR" image for testing, convert the image to bytes sequence """
p = width//60 # Font scale (p) and thickness (p*2) is proportional to image width.
img = np.full((height, width, 3), 60, np.uint8)
# Random BGR image in resolution width by height.
# img = np.random.randint(0, 99, (height, width, 3), np.uint8)
#img = np.random.randint(60, 64, (height, width, 3), np.uint8)
text = str(i+1)
(tsize_x, tsize_y), _ = cv2.getTextSize(text, cv2.FONT_HERSHEY_DUPLEX, p, p*2)
cv2.putText(img, text, ((width-tsize_x)//2, (height+tsize_y)//2), cv2.FONT_HERSHEY_DUPLEX, p, (255, 30, 30), p*2) # Blue number (centered).
raw_img_bytes = img.tobytes()
return raw_img_bytes
# https://www.programiz.com/python-programming/user-defined-exception
# Python user-defined exceptions
class FlvError(Exception):
""" FLV bad format or unsupported format exception """
def __init__(self, message):
self.message = message
super().__init__(self.message)
def __str__(self):
return self.message
def ReadFlvFileHeaderAndFirstPayload(f):
""" FLV files start with a standard header (9 bytes).
After the header comes the first payload, which contains irrelevant data.
The function reads the header and the first payload data. """
# https://en.wikipedia.org/wiki/Flash_Video
flv_signature = f.read(3)
if flv_signature != b'FLV':
raise FlvError('Bad signature') # FLV file must start with "FLV" letters.
version_byte = f.read(1)
if version_byte[0] != 1:
raise FlvError('Bad version') # Version byte must be 1
flags_byte = f.read(1)
if flags_byte[0] != 1:
raise FlvError(f'flags_byte = {flags_byte} ... Bitmask: 0x04 is audio, 0x01 is video (so 0x05 is audio+video), but we expect video only')
header_size = f.read(4) # Used to skip a newer expanded header
# Read first packet (and ignore it).
flv_payload_size = ReadFlvPacketHeader(f)
# Read "frame_type and _codec_id" byte for testing.
frame_type_and_codec_id = f.read(1)
codec_id = int(frame_type_and_codec_id[0]) & 0xF
frame_type = int(frame_type_and_codec_id[0]) >> 4 # 1 - keyframe, 2 - inter frame
if codec_id != 7:
raise FlvError(f'Codec ID is not AVC. codec_id = {codec_id}, instead of 7')
# Read AVC packet type for testing
avc_packet_type = f.read(1) # 0 - AVC sequence header, 1 - AVC NALU, 2 - AVC end of sequence
if avc_packet_type[0] != 0:
raise FlvError(f'First packet is supposed to be AVC sequence header, but avc_packet_type = {avc_packet_type}, instead of 0')
payload_data = f.read(flv_payload_size-2) # Data as defined by packet type
def ReadFlvPacketHeader(f):
""" Read header of FLV packet and return FLV payload size
After the header, the file is split into packets called "FLV tags",
which have 15-byte packet headers.
The first four bytes denote the size of the previous packet/tag
(including the header without the first field), and aid in seeking backward """
size_of_previous_packet = f.read(4) # For first packet set to NULL (uint32 big-endian)
packet_type = f.read(1) # For first packet set to AMF Metadata
flv_payload_size = f.read(3) # For first packet set to AMF Metadata (uint24 big-endian)
timestamp_lower = f.read(3) # For first packet set to NULL (uint24 big-endian)
timestamp_upper = f.read(1) # Extension to create a uint32_be value (0)
stream_id = f.read(3) # For first stream of same type set to NULL (uint24 big-endian)
if len(flv_payload_size) < 3:
return 0 # End of file.
# Convert uint24 big-endian to integer value
return (int(flv_payload_size[0]) << 16) + (int(flv_payload_size[1]) << 8) + int(flv_payload_size[2])
def ReadPacket5BytesHeader(f):
""" Read the 5 bytes of FLV packet header, and return codec_id """
# https://www.adobe.com/content/dam/acom/en/devnet/flv/video_file_format_spec_v10.pdf
frame_type_and_codec_id = f.read(1)
codec_id = int(frame_type_and_codec_id[0]) & 0xF
frame_type = int(frame_type_and_codec_id[0]) >> 4 # 1 - keyframe, 2 - inter frame
# According to video_file_format_spec_v10.pdf, if codec_id = 7, next comes AVCVIDEOPACKET
avc_packet_type = f.read(1) # 0 - AVC sequence header, 1 - AVC NALU, 2 - AVC end of sequence
composition_time = f.read(3)
if avc_packet_type[0] != 1:
raise FlvError(f'avc_packet_type = {avc_packet_type}')
return codec_id
def ReadFlvPayloadAndConvertToAnnexB(f):
"""Read FLV payload, and convert it to AVC Annex B format.
Return Annex B data as bytes array (return None if end of file).
The FLV payload may contain several AVC NAL units (in AVCC format).
Return empty if reached end of file.
https://yumichan.net/video-processing/video-compression/introduction-to-h264-nal-unit/ """
# annexb_payload is the "encoded frame" converted from AVCC to Annex B format.
annexb_payload = b''
# Read FLV packet header
flv_payload_size = ReadFlvPacketHeader(f)
if flv_payload_size == 0:
return None # End of file.
codec_id = ReadPacket5BytesHeader(f)
if codec_id != 7:
raise FlvError(f'Codec ID is not AVC. codec_id = {codec_id}, instead of 7')
flv_payload_size -= 5 # After reading the "5 Bytes Header", remaining size is 5 bytes less
# https://www.adobe.com/content/dam/acom/en/devnet/flv/video_file_format_spec_v10.pdf
# Next comes "Data"
# if AVCPacketType == 0 AVCDecoderConfigurationRecord
# else if AVCPacketType == 1 One or more NALUs (can be individual slices per FLV packets; that is, full frames are not strictly required)
# else if AVCPacketType == 2 Empty
# Keep reading AVC NAL units, until flv_payload_size is zero (flv_payload_size holds the remaining size).
while flv_payload_size > 0:
nal_size = f.read(4) # Read NAL unit size (uin32, big endian).
flv_payload_size -= 4 # Remaining size is 4 bytes less
if nal_size is None:
return None # End of file.
# Convert uint32 big-endian to integer value
nal_size = (int(nal_size[0]) << 24) + (int(nal_size[1]) << 16) + (int(nal_size[2]) << 8) + int(nal_size[3])
nal_data = f.read(nal_size) # Read NAL unit data
flv_payload_size -= nal_size # Remaining size is nal_size bytes less
# The number of leading zeros (2 or 3) has minor differences between encoders (the implementation tries to match the selected encoder).
if do_use_intel_quick_sync:
if ((nal_data[0] & 0xF) == 5) or (nal_data[0] & 0xF == 1):
# Coded slice of IDR picture and Coded slice of non IDR picture (for some reason begins with only 2 zeros when encoding with Intel QuickSync)
# SEI NAL unit (nal_data[0] == 1) is also begin with only 2 zeros.
annexb_payload_head = b'\x00\x00\x01' # Concatenate [0, 0, 1] to end of annexb_payload, and then NAL data
else:
# Other NAL units begins with 0 0 0 0 1 (for matching FFmpeg Annex B encoded stream)
annexb_payload_head = b'\x00\x00\x00\x01' # Concatenate [0, 0, 0, 1] to end of annexb_payload, and then NAL data
else:
if ((nal_data[0] & 0xF) == 5) or (nal_data[0] & 0xF == 6):
# Coded slice of an IDR picture (for some reason begins with only 2 zeros when encoding with libx264)
# SEI NAL unit (nal_data[0] == 6) is also begin with only 2 zeros.
annexb_payload_head = b'\x00\x00\x01' # Concatenate [0, 0, 1] to end of annexb_payload, and then NAL data
else:
# Other NAL units begins with 0 0 0 0 1 (for matching FFmpeg Annex B encoded stream)
annexb_payload_head = b'\x00\x00\x00\x01' # Concatenate [0, 0, 0, 1] to end of annexb_payload, and then NAL data
# Concatenate NAL data in Annex B format to annexb_payload
annexb_payload = annexb_payload + annexb_payload_head + nal_data
return annexb_payload
do_test_zero_latency = True # Set to True for testing latency of zero frames (applies latency added by the encoder).
# Stream FLV H.264 encoded video to stdout PIPE (the input is raw BGR video streamed to stdin PIPE).
# -flvflags +no_sequence_end+no_metadata+no_duration_filesize - removes redundant FLV metadata
# -bsf:v dump_extra - repeat SPS and PPS (and SEI) NAL units for every key frame.
# -g 25 -bf 3 - GOP size of 25 frames, and encode video with 3 B-Frames
# -threads 1 - Use single thread (reduce latency in some cases).
if do_test_zero_latency:
n_frames_latency = 0 # Latency of zero frames
do_use_intel_quick_sync = False # Using Intel Quick requires Hardware and Drivers support.
if do_use_intel_quick_sync:
ffmpeg_cmd = f'ffmpeg -hide_banner -threads 1 -framerate {fps} -video_size {width}x{height} -pixel_format bgr24 -f rawvideo -an -sn -dn -i pipe: -threads 1 -vcodec h264_qsv -async_depth 1 -forced_idr 1 -g 10 -bf 0 -b:v 20000k -pix_fmt nv12 -f flv -flvflags no_sequence_end+no_metadata+no_duration_filesize -bsf:v dump_extra -an -sn -dn pipe:'
# FFmpeg sub-process with same arguments, but without FLV container, and save output to a file (instead of stdout PIPE) for testing.
ffmpeg_test_cmd = f'ffmpeg -y -hide_banner -threads 1 -framerate {fps} -video_size {width}x{height} -pixel_format bgr24 -f rawvideo -an -sn -dn -i pipe: -threads 1 -vcodec h264_qsv -async_depth 1 -forced_idr 1 -g 10 -bf 0 -b:v 20000k -pix_fmt nv12 -f h264 -an -sn -dn out.264'
else:
ffmpeg_cmd = f'ffmpeg -hide_banner -threads 1 -framerate {fps} -video_size {width}x{height} -pixel_format bgr24 -f rawvideo -an -sn -dn -i pipe: -threads 1 -vcodec libx264 -x264-params bframes=0:force-cfr=1:no-mbtree=1:sync-lookahead=0:sliced-threads=1:rc-lookahead=0 -g 10 -pix_fmt yuv444p -crf 10 -f flv -flvflags no_sequence_end+no_metadata+no_duration_filesize -bsf:v dump_extra -an -sn -dn pipe:'
# FFmpeg sub-process with same arguments, but without FLV container, and save output to a file (instead of stdout PIPE) for testing.
ffmpeg_test_cmd = f'ffmpeg -y -hide_banner -threads 1 -framerate {fps} -video_size {width}x{height} -pixel_format bgr24 -f rawvideo -an -sn -dn -i pipe: -threads 1 -vcodec libx264 -x264-params bframes=0:force-cfr=1:no-mbtree=1:sync-lookahead=0:sliced-threads=1:rc-lookahead=0 -g 10 -pix_fmt yuv444p -crf 10 -f h264 -an -sn -dn out.264'
else:
# Using the following setting results latency of many frames
n_frames_latency = 26 # The exact value was found by trial and error
do_use_intel_quick_sync = False
ffmpeg_cmd = f'ffmpeg -hide_banner -threads 1 -framerate {fps} -video_size {width}x{height} -pixel_format bgr24 -f rawvideo -an -sn -dn -i pipe: -threads 1 -vcodec libx264 -g 25 -bf 3 -pix_fmt yuv444p -crf 10 -f flv -flvflags no_sequence_end+no_metadata+no_duration_filesize -bsf:v dump_extra -an -sn -dn pipe:'
# FFmpeg sub-process with same arguments, but without FLV container, and save output to a file (instead of stdout PIPE) for testing.
ffmpeg_test_cmd = f'ffmpeg -y -hide_banner -threads 1 -framerate {fps} -video_size {width}x{height} -pixel_format bgr24 -f rawvideo -an -sn -dn -i pipe: -threads 1 -vcodec libx264 -g 25 -bf 3 -pix_fmt yuv444p -crf 10 -f h264 -an -sn -dn out.264'
# Execute FFmpeg as sub-process.
ffmpeg_process = sp.Popen(shlex.split(ffmpeg_cmd), stdin=sp.PIPE, stdout=sp.PIPE)
# Execute another FFmpeg sub-process (used for testing).
ffmpeg_test_process = sp.Popen(shlex.split(ffmpeg_test_cmd), stdin=sp.PIPE)
flv_pipe = ffmpeg_process.stdout
out_f = None # Open output file (Annex B stream format)
for i in range(n_frames):
raw_img_bytes = make_raw_frame_as_bytes(i)
ffmpeg_process.stdin.write(raw_img_bytes) # Write raw video frame to input stream of FFmpeg sub-process.
ffmpeg_process.stdin.flush()
ffmpeg_test_process.stdin.write(raw_img_bytes)
ffmpeg_test_process.stdin.flush()
if i == 0:
ReadFlvFileHeaderAndFirstPayload(flv_pipe) # Read FLV header that should be ignored.
out_f = open('out_avcc.264', 'wb') # Open output file (Annex B stream format)
# Assume <n_frames_latency> frames latency
# Note: the reading process is supposed to be in a separate thread (performed here for simplicity).
if i >= n_frames_latency:
# Read FLV payload data and convert the AVC NAL unit/units from AVCC format to Annex B format.
annexb_payload = ReadFlvPayloadAndConvertToAnnexB(flv_pipe)
# Write encoded frame to output file.
# Note: "encoded frame" may contain few NAL units, but each FLV payload applies one "encoded frame" (one "access unit").
out_f.write(annexb_payload) # Write to file for testing.
ffmpeg_test_process.stdin.close()
ffmpeg_process.stdin.close() # Closing stdin "pushes" all the remaining frames from the encoder to stdout (FFmpeg feature).
# Handle the last (delayed) "encoded frames".
for i in range(n_frames_latency):
# Read FLV payload data and convert the AVC NAL unit/units from AVCC format to Annex B format.
annexb_payload = ReadFlvPayloadAndConvertToAnnexB(flv_pipe)
out_f.write(annexb_payload) # Write to file for testing.
if out_f is not None:
out_f.close()
extra_data_byte = flv_pipe.read(4) # Read extra trailing 4 bytes (FFmpeg puts the 4 bytes as a footer [instead of a header of the next frame]).
ffmpeg_process.stdout.close()
ffmpeg_process.wait()
ffmpeg_test_process.wait()