<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!-- saved from url=(0049)http://dranger.com/ffmpeg/ffmpegtutorial_all.html -->
<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>ffmpeg tutorial</title>
</head>
<body>
<h2 class="center">An ffmpeg and SDL Tutorial</h2>
<a href="http://dranger.com/ffmpeg/ffmpegtutorial_all.html#tutorial01.html">Part 1</a><br>
<a href="http://dranger.com/ffmpeg/ffmpegtutorial_all.html#tutorial02.html">Part 2</a><br>
<a href="http://dranger.com/ffmpeg/ffmpegtutorial_all.html#tutorial03.html">Part 3</a><br>
<a href="http://dranger.com/ffmpeg/ffmpegtutorial_all.html#tutorial04.html">Part 4</a><br>
<a href="http://dranger.com/ffmpeg/ffmpegtutorial_all.html#tutorial05.html">Part 5</a><br>
<a href="http://dranger.com/ffmpeg/ffmpegtutorial_all.html#tutorial06.html">Part 6</a><br>
<a href="http://dranger.com/ffmpeg/ffmpegtutorial_all.html#tutorial07.html">Part 7</a><br>
<a href="http://dranger.com/ffmpeg/ffmpegtutorial_all.html#tutorial08.html">Part 8</a><br>
<a href="http://dranger.com/ffmpeg/ffmpegtutorial_all.html#end.html">End</a><br>
<a name="ffmpeg.html"></a>
<p>ffmpeg is a wonderful library for creating video applications or even general purpose utilities. ffmpeg takes care of all the hard work of video processing by doing all the decoding, encoding, muxing and demuxing for you. This can make media applications much simpler to write. It's simple, written in C, fast, and can decode almost any codec you'll find in use today, as well as encode several other formats.
</p>
<p>
The only problem is that documentation is basically nonexistent. There is a single tutorial that shows the basics of ffmpeg, plus auto-generated doxygen documents. That's it. So, when I decided to learn about ffmpeg, and in the process about how digital video and audio applications work, I decided to document the process and present it as a tutorial.</p>
<p>There is a sample program that comes with ffmpeg called ffplay. It is a simple C program that implements a complete video player using ffmpeg. This tutorial will begin with an updated version of the original tutorial, written by Martin Böhme (I have <span class="crossout">stolen</span> liberally borrowed from that work), and work from there to developing a working video player, based on Fabrice Bellard's ffplay.c. In each tutorial, I'll introduce a new idea (or two) and explain how we implement it. Each tutorial will have a C file so you can download it, compile it, and follow along at home. The source files will show you how the real program works, how we move all the pieces around, as well as showing you the technical details that are unimportant to the tutorial. By the time we are finished, we will have a working video player written in less than 1000 lines of code!
</p>
<p>In making the player, we will be using SDL to output the audio and video of the media file. SDL is an excellent cross-platform multimedia library that's used in MPEG playback software, emulators, and many video games. You will need to download and install the SDL development libraries for your system in order to compile the programs in this tutorial.
</p>
<p>
This tutorial is meant for people with a decent programming background. At the very least you should know C and have some idea about concepts like queues, mutexes, and so on. You should know some basics about multimedia; things like waveforms and such, but you don't need to know a lot, as I explain a lot of those concepts in this tutorial.
</p>
<p>
There are printable HTML files along the way as well as old school ASCII files. You can also get a tarball of the text files and source code or just the source. You can get a printable page of the full thing in HTML or in text.
</p>
<p><b>UPDATE:</b> I've fixed a code error in Tutorial 7 and 8, as well as adding -lavutil.</p>
<p>
Please feel free to email me with bugs, questions, comments, ideas, features, whatever, at <em>dranger at gmail dot com</em>.
</p>
<p><em><b>>></b> Proceed with the tutorial!</em></p>
<a name="tutorial01.html"></a>
<h2>Tutorial 01: Making Screencaps</h2>
<span class="codelink">Code: tutorial01.c</span>
<h3>Overview</h3>
<p>
Movie files have a few basic components. First, the file itself is called a <b>container</b>, and the type of container determines where the information in the file goes. Examples of containers are AVI and Quicktime. Next, you have a bunch of <b>streams</b>; for example, you usually have an audio stream and a video stream. (A "stream" is just a fancy word for "a succession of data elements made available over time".) The data elements in a stream are called <b>frames</b>. Each stream is encoded by a different kind of <b>codec</b>. The codec defines how the actual data is COded and DECoded - hence the name CODEC. Examples of codecs are DivX and MP3. <b>Packets</b> are then read from the stream. A packet is a piece of data that is decoded into one or more raw frames that we can finally manipulate for our application. For our purposes, each packet contains complete frames, or multiple frames in the case of audio.
</p>
<p>
At its very basic level, dealing with video and audio streams is very easy:
</p><pre>10 OPEN video_stream FROM video.avi
20 READ packet FROM video_stream INTO frame
30 IF frame NOT COMPLETE GOTO 20
40 DO SOMETHING WITH frame
50 GOTO 20
</pre>
Handling multimedia with ffmpeg is pretty much as simple as this program, although some programs might have a very complex "DO SOMETHING" step. So in this tutorial, we're going to open a file, read from the video stream inside it, and our DO SOMETHING is going to be writing the frame to a PPM file.
<p></p>
<h3>Opening the File</h3>
<p>
First, let's see how we open a file in the first place. With ffmpeg, you have to first initialize the library. (Note that some systems might have to use <ffmpeg/avcodec.h> and <ffmpeg/avformat.h> instead.)
</p><pre>#include <avcodec.h>
#include <avformat.h>
...
int main(int argc, char *argv[]) {
av_register_all();
</pre>
This registers all available file formats and codecs with the library so they will be used automatically when a file with the corresponding format/codec is opened. Note that you only need to call av_register_all() once, so we do it here in main(). If you like, it's possible to register only certain individual file formats and codecs, but there's usually no reason why you would have to do that.
<p></p>
<p>
Now we can actually open the file:
</p><pre>AVFormatContext *pFormatCtx;
// Open video file
if(av_open_input_file(&pFormatCtx, argv[1], NULL, 0, NULL)!=0)
return -1; // Couldn't open file
</pre>
We get our filename from the first argument. This function reads the file header and stores information about the file format in the AVFormatContext structure we have given it. The last three arguments are used to specify the file format, buffer size, and format options, but by setting this to NULL or 0, libavformat will auto-detect these.
<p>This function only looks at the header, so next we need to check out the stream information in the file:
</p><pre>// Retrieve stream information
if(av_find_stream_info(pFormatCtx)<0)
return -1; // Couldn't find stream information
</pre>
This function populates <tt>pFormatCtx->streams</tt> with the proper information. We introduce a handy debugging function to show us what's inside:
<pre>// Dump information about file onto standard error
dump_format(pFormatCtx, 0, argv[1], 0);
</pre>
Now <tt>pFormatCtx->streams</tt> is just an array of pointers, of size <tt>pFormatCtx->nb_streams</tt>, so let's walk through it until we find a video stream.
<pre>int i;
AVCodecContext *pCodecCtx;
// Find the first video stream
videoStream=-1;
for(i=0; i<pFormatCtx->nb_streams; i++)
if(pFormatCtx->streams[i]->codec->codec_type==CODEC_TYPE_VIDEO) {
videoStream=i;
break;
}
if(videoStream==-1)
return -1; // Didn't find a video stream
// Get a pointer to the codec context for the video stream
pCodecCtx=pFormatCtx->streams[videoStream]->codec;
</pre>
The stream's information about the codec is in what we call the "codec context." This contains all the information about the codec that the stream is using, and now we have a pointer to it. But we still have to find the actual codec and open it:
<pre>AVCodec *pCodec;
// Find the decoder for the video stream
pCodec=avcodec_find_decoder(pCodecCtx->codec_id);
if(pCodec==NULL) {
fprintf(stderr, "Unsupported codec!\n");
return -1; // Codec not found
}
// Open codec
if(avcodec_open(pCodecCtx, pCodec)<0)
return -1; // Could not open codec
</pre>
Some of you might remember from the old tutorial that there were two other parts to this code: adding <tt>CODEC_FLAG_TRUNCATED</tt> to <tt>pCodecCtx->flags</tt> and adding a hack to correct grossly incorrect frame rates. These two fixes aren't in ffplay.c anymore, so I have to assume that they are not necessary anymore. There's another difference to point out since we removed that code: <tt>pCodecCtx->time_base</tt> now holds the frame rate information. <tt>time_base</tt> is a struct that has the numerator and denominator (AVRational). We represent the frame rate as a fraction because many codecs have non-integer frame rates (like NTSC's 29.97fps).
<p></p>
<h3>Storing the Data</h3>
<p>Now we need a place to actually store the frame:
</p><pre>AVFrame *pFrame;
// Allocate video frame
pFrame=avcodec_alloc_frame();
</pre>
Since we're planning to output PPM files, which are stored in 24-bit RGB, we're going to have to convert our frame from its native format to RGB. ffmpeg will do these conversions for us. For most projects (including ours) we're going to want to convert our initial frame to a specific format. Let's allocate a frame for the converted frame now.
<pre>// Allocate an AVFrame structure
AVFrame *pFrameRGB=avcodec_alloc_frame();
if(pFrameRGB==NULL)
return -1;
</pre>
Even though we've allocated the frame, we still need a place to put the raw data when we convert it. We use <tt>avpicture_get_size</tt> to get the size we need, and allocate the space manually:
<pre>uint8_t *buffer;
int numBytes;
// Determine required buffer size and allocate buffer
numBytes=avpicture_get_size(PIX_FMT_RGB24, pCodecCtx->width,
pCodecCtx->height);
buffer=(uint8_t *)av_malloc(numBytes*sizeof(uint8_t));
</pre>
<tt>av_malloc</tt> is ffmpeg's malloc - a simple wrapper around plain malloc that makes sure the memory addresses are aligned and such. It will <i>not</i> protect you from memory leaks, double freeing, or other malloc problems.
<p></p>
<p>
Now we use avpicture_fill to associate the frame with our newly allocated buffer. About the AVPicture cast: the AVPicture struct is a subset of the AVFrame struct - the beginning of the AVFrame struct is identical to the AVPicture struct.
</p><pre>// Assign appropriate parts of buffer to image planes in pFrameRGB
// Note that pFrameRGB is an AVFrame, but AVFrame is a superset
// of AVPicture
avpicture_fill((AVPicture *)pFrameRGB, buffer, PIX_FMT_RGB24,
pCodecCtx->width, pCodecCtx->height);
</pre>
Finally! Now we're ready to read from the stream!
<p></p>
<h3>Reading the Data</h3>
<p>
What we're going to do is read through the entire video stream by reading in the packet, decoding it into our frame, and once our frame is complete, we will convert and save it.
</p><pre>int frameFinished;
AVPacket packet;
i=0;
while(av_read_frame(pFormatCtx, &packet)>=0) {
// Is this a packet from the video stream?
if(packet.stream_index==videoStream) {
// Decode video frame
avcodec_decode_video(pCodecCtx, pFrame, &frameFinished,
packet.data, packet.size);
// Did we get a video frame?
if(frameFinished) {
// Convert the image from its native format to RGB
img_convert((AVPicture *)pFrameRGB, PIX_FMT_RGB24,
(AVPicture*)pFrame, pCodecCtx->pix_fmt,
pCodecCtx->width, pCodecCtx->height);
// Save the frame to disk
if(++i<=5)
SaveFrame(pFrameRGB, pCodecCtx->width,
pCodecCtx->height, i);
}
}
// Free the packet that was allocated by av_read_frame
av_free_packet(&packet);
}
</pre>
<span class="sidenote">
<b>A note on packets</b>
<p>
Technically a packet can contain partial frames or other bits of data, but ffmpeg's parser ensures that the packets we get contain either complete or multiple frames.
</p>
</span>
The process, again, is simple: <tt>av_read_frame()</tt> reads in a packet and stores it in the <tt>AVPacket</tt> struct. Note that we've only allocated the packet structure - ffmpeg allocates the internal data for us, which is pointed to by <tt>packet.data</tt>. This is freed by <tt>av_free_packet()</tt> later. <tt>avcodec_decode_video()</tt> converts the packet to a frame for us. However, we might not have all the information we need for a frame after decoding a packet, so <tt>avcodec_decode_video()</tt> sets frameFinished for us when we have the next frame. Then we use <tt>img_convert()</tt> to convert from the native format (<tt>pCodecCtx->pix_fmt</tt>) to RGB. Remember that you can cast an AVFrame pointer to an AVPicture pointer. Finally, we pass the frame and its width and height information to our SaveFrame function.
<p></p>
<p>
Now all we need to do is make the SaveFrame function to write the RGB information to a file in PPM format. We're going to be kind of sketchy on the PPM format itself; trust us, it works.
</p><pre>void SaveFrame(AVFrame *pFrame, int width, int height, int iFrame) {
FILE *pFile;
char szFilename[32];
int y;
// Open file
sprintf(szFilename, "frame%d.ppm", iFrame);
pFile=fopen(szFilename, "wb");
if(pFile==NULL)
return;
// Write header
fprintf(pFile, "P6\n%d %d\n255\n", width, height);
// Write pixel data
for(y=0; y<height; y++)
fwrite(pFrame->data[0]+y*pFrame->linesize[0], 1, width*3, pFile);
// Close file
fclose(pFile);
}
</pre>
We do a bit of standard file opening, etc., and then write the RGB data. We write the file one line at a time. A PPM file is simply a file that has RGB information laid out in a long string. If you know HTML colors, it would be like laying out the color of each pixel end to end, so <tt>#ff0000#ff0000</tt>... would be a red screen. (It's stored in binary and without the separator, but you get the idea.) The header indicates how wide and tall the image is, and the maximum value of the RGB components.
<p></p>
<p>
Now, going back to our main() function. Once we're done reading from the video stream, we just have to clean everything up:
</p><pre>// Free the RGB image
av_free(buffer);
av_free(pFrameRGB);
// Free the YUV frame
av_free(pFrame);
// Close the codec
avcodec_close(pCodecCtx);
// Close the video file
av_close_input_file(pFormatCtx);
return 0;
</pre>
You'll notice we use av_free for the memory we allocated with avcodec_alloc_frame and av_malloc.
<p></p>
<p>
That's it for the code! Now, if you're on Linux or a similar platform, you'll run:
</p><pre>gcc -o tutorial01 tutorial01.c -lavutil -lavformat -lavcodec -lz -lavutil -lm
</pre>
If you have an older version of ffmpeg, you may need to drop -lavutil:
<pre>gcc -o tutorial01 tutorial01.c -lavformat -lavcodec -lz -lm
</pre>
Most image programs should be able to open PPM files. Test it on some movie files.
<p></p>
<p>
<em><b>>></b> Tutorial 2: Outputting to the Screen</em>
<a name="tutorial02.html"></a>
</p><h2>Tutorial 02: Outputting to the Screen</h2>
<span class="codelink">Code: tutorial02.c</span>
<h3>SDL and Video</h3>
<p>
To draw to the screen, we're going to use SDL. SDL stands for Simple DirectMedia Layer, and is an excellent library for multimedia, is cross-platform, and is used in several projects. You can get the library at the official website or you can download the development package for your operating system if there is one. You'll need the libraries to compile the code for this tutorial (and for the rest of them, too).
</p>
<p>
SDL has many methods for drawing images to the screen, and it has one in particular that is meant for displaying movies on the screen - what it calls a YUV overlay. YUV (technically not YUV but YCbCr)
<span class="sidenote"><b>* A note: </b>There is a great deal of annoyance from some people at the convention of calling "YCbCr" "YUV". Generally speaking, YUV is an analog format and YCbCr is a digital format. ffmpeg and SDL both refer to YCbCr as YUV in their code and macros.</span>
is a way of storing raw image data like RGB. Roughly speaking, Y is the brightness (or "luma") component, and U and V are the color components. (It's more complicated than RGB because some of the color information is discarded, and you might have only 1 U and V sample for every 2 Y samples.) SDL's YUV overlay takes in a raw array of YUV data and displays it. It accepts 4 different kinds of YUV formats, but YV12 is the fastest. There is another YUV format called YUV420P that is the same as YV12, except the U and V arrays are switched. The 420 means it is subsampled at a ratio of 4:2:0, basically meaning there is 1 color sample for every 4 luma samples, so the color information is quartered. This is a good way of saving bandwidth, as the human eye does not perceive this change. The "P" in the name means that the format is "planar", simply meaning that the Y, U, and V components are in separate arrays. ffmpeg can convert images to YUV420P, with the added bonus that many video streams are in that format already, or are easily converted to that format.
</p>
<p>
So our current plan is to replace the <tt>SaveFrame()</tt> function from Tutorial 1, and instead output our frame to the screen. But first we have to start by seeing how to use the SDL Library. First we have to include the libraries and initialize SDL:
</p><pre>#include <SDL.h>
#include <SDL_thread.h>
if(SDL_Init(SDL_INIT_VIDEO | SDL_INIT_AUDIO | SDL_INIT_TIMER)) {
fprintf(stderr, "Could not initialize SDL - %s\n", SDL_GetError());
exit(1);
}
</pre>
<tt>SDL_Init()</tt> essentially tells the library what features we're going to use. <tt>SDL_GetError()</tt>, of course, is a handy debugging function.
<p></p>
<h3>Creating a Display</h3>
<p>
Now we need a place on the screen to put stuff. The basic area for displaying images with SDL is called a <b>surface</b>:
</p><pre>SDL_Surface *screen;
screen = SDL_SetVideoMode(pCodecCtx->width, pCodecCtx->height, 0, 0);
if(!screen) {
fprintf(stderr, "SDL: could not set video mode - exiting\n");
exit(1);
}
</pre>
This sets up a screen with the given width and height. The next option is the bit depth of the screen - 0 is a special value that means "same as the current display". (This does not work on OS X; see source.)
<p></p>
<p>
Now we create a YUV overlay on that screen so we can input video to it:
</p><pre>SDL_Overlay *bmp;
bmp = SDL_CreateYUVOverlay(pCodecCtx->width, pCodecCtx->height,
SDL_YV12_OVERLAY, screen);
</pre>
As we said before, we are using YV12 to display the image.
<p></p>
<h3>Displaying the Image</h3>
<p>
Well that was simple enough! Now we just need to display the image. Let's go all the way down to where we had our finished frame. We can get rid of all that stuff we had for the RGB frame, and we're going to replace the <tt>SaveFrame()</tt> with our display code. To display the image, we're going to make an AVPicture struct and set its data pointers and linesize to our YUV overlay:
</p><pre> if(frameFinished) {
SDL_LockYUVOverlay(bmp);
AVPicture pict;
pict.data[0] = bmp->pixels[0];
pict.data[1] = bmp->pixels[2];
pict.data[2] = bmp->pixels[1];
pict.linesize[0] = bmp->pitches[0];
pict.linesize[1] = bmp->pitches[2];
pict.linesize[2] = bmp->pitches[1];
// Convert the image into YUV format that SDL uses
img_convert(&pict, PIX_FMT_YUV420P,
(AVPicture *)pFrame, pCodecCtx->pix_fmt,
pCodecCtx->width, pCodecCtx->height);
SDL_UnlockYUVOverlay(bmp);
}
</pre>
First, we lock the overlay because we are going to be writing to it. This is a good habit to get into so you don't have problems later. The AVPicture struct, as shown before, has a <tt>data</tt> pointer that is an array of 4 pointers. Since we are dealing with YUV420P here, we only have 3 channels, and therefore only 3 sets of data. Other formats might have a fourth pointer for an alpha channel or something. <tt>linesize</tt> is what it sounds like. The analogous structures in our YUV overlay are the <tt>pixels</tt> and <tt>pitches</tt> variables. ("pitches" is the term SDL uses to refer to the width of a given line of data.) So what we do is point the three arrays of <tt>pict.data</tt> at our overlay, so when we write to pict, we're actually writing into our overlay, which of course already has the necessary space allocated. Similarly, we get the linesize information directly from our overlay. We change the conversion format to <tt>PIX_FMT_YUV420P</tt>, and we use <tt>img_convert</tt> just like before.
<p></p>
<h3>Drawing the Image</h3>
<p>
But we still need to tell SDL to actually show the data we've given it. We also pass this function a rectangle that says where the movie should go and what width and height it should be scaled to. This way, SDL does the scaling for us, and it can be assisted by your graphics processor for faster scaling:
</p><pre>SDL_Rect rect;
if(frameFinished) {
/* ... code ... */
// Convert the image into YUV format that SDL uses
img_convert(&pict, PIX_FMT_YUV420P,
(AVPicture *)pFrame, pCodecCtx->pix_fmt,
pCodecCtx->width, pCodecCtx->height);
SDL_UnlockYUVOverlay(bmp);
rect.x = 0;
rect.y = 0;
rect.w = pCodecCtx->width;
rect.h = pCodecCtx->height;
SDL_DisplayYUVOverlay(bmp, &rect);
}
</pre>
Now our video is displayed!
<p></p>
<p>
Let's take this time to show you another feature of SDL: its event system. SDL is set up so that when you type, or move the mouse in the SDL application, or send it a signal, it generates an <b>event</b>. Your program then checks for these events if it wants to handle user input. Your program can also make up events to send the SDL event system. This is especially useful when multithread programming with SDL, which we'll see in Tutorial 4. In our program, we're going to poll for events right after we finish processing a packet. For now, we're just going to handle the <tt>SDL_QUIT</tt> event so we can exit:
</p><pre>SDL_Event event;
av_free_packet(&packet);
SDL_PollEvent(&event);
switch(event.type) {
case SDL_QUIT:
SDL_Quit();
exit(0);
break;
default:
break;
}
</pre>
And there we go! Get rid of all the old cruft, and you're ready to compile. If you are using Linux or a variant, the best way to compile using the SDL libs is this:
<pre>gcc -o tutorial02 tutorial02.c -lavutil -lavformat -lavcodec -lz -lm \
`sdl-config --cflags --libs`
</pre>
sdl-config just prints out the proper flags for gcc to include the SDL libraries properly. You may need to do something different to get it to compile on your system; please check the SDL documentation for your system. Once it compiles, go ahead and run it.
<p></p>
<p>
What happens when you run this program? The video is going crazy! In fact, we're just displaying all the video frames as fast as we can extract them from the movie file. We don't have any code right now for figuring out <i>when</i> we need to display video. Eventually (in Tutorial 5), we'll get around to syncing the video. But first we're missing something even more important: sound!
</p>
<p>
<em><b>>></b> Playing Sound</em>
</p>
<a name="tutorial03.html"></a>
<h2>Tutorial 03: Playing Sound</h2>
<span class="codelink">Code: tutorial03.c</span>
<h3>Audio</h3>
<p>
So now we want to play sound. SDL also gives us methods for outputting sound. The <tt>SDL_OpenAudio()</tt> function is used to open the audio device itself. It takes as arguments an <tt>SDL_AudioSpec</tt> struct, which contains all the information about the audio we are going to output.
</p>
<p>
Before we show how you set this up, let's first explain how audio is handled by computers. Digital audio consists of a long stream of <b>samples</b>. Each sample represents a value of the audio waveform. Sounds are recorded at a certain <b>sample rate</b>, which simply says how fast to play each sample, and is measured in number of samples per second. Example sample rates are 22,050 and 44,100 samples per second, which are the rates used for radio and CD respectively. In addition, most audio can have more than one channel for stereo or surround, so for example, if the audio is in stereo, the samples will come 2 at a time. When we get data from a movie file, we don't know how many samples we will get, but ffmpeg will not give us partial samples - that also means that it will not split a stereo sample up, either.
</p>
<p>
SDL's method for playing audio is this: you set up your audio options - the sample rate (called "freq" for <b>frequency</b> in the SDL struct), number of channels, and so forth - and you also set a callback function and userdata. When we begin playing audio, SDL will continually call this callback function and ask it to fill the audio buffer with a certain number of bytes. After we put this information in the <tt>SDL_AudioSpec</tt> struct, we call <tt>SDL_OpenAudio()</tt>, which will open the audio device and give us back <i>another</i> AudioSpec struct. These are the specs we will <i>actually</i> be using; we are not guaranteed to get what we asked for!
</p>
<h3>Setting Up the Audio</h3>
<p>Keep that all in your head for the moment, because we don't actually have any information about the audio streams yet! Let's go back to the place in our code where we found the video stream and find which stream is the audio stream.
</p><pre>// Find the first video and audio streams
videoStream=-1;
audioStream=-1;
for(i=0; i < pFormatCtx->nb_streams; i++) {
if(pFormatCtx->streams[i]->codec->codec_type==CODEC_TYPE_VIDEO
&&
videoStream < 0) {
videoStream=i;
}
if(pFormatCtx->streams[i]->codec->codec_type==CODEC_TYPE_AUDIO &&
audioStream < 0) {
audioStream=i;
}
}
if(videoStream==-1)
return -1; // Didn't find a video stream
if(audioStream==-1)
return -1;
</pre>
From here we can get all the info we want from the <tt>AVCodecContext</tt> from the stream, just like we did with the video stream:
<pre>AVCodecContext *aCodecCtx;
aCodecCtx=pFormatCtx->streams[audioStream]->codec;
</pre>
<p></p>
<p>
Contained within this codec context is all the information we need to set up our audio:
</p><pre>wanted_spec.freq = aCodecCtx->sample_rate;
wanted_spec.format = AUDIO_S16SYS;
wanted_spec.channels = aCodecCtx->channels;
wanted_spec.silence = 0;
wanted_spec.samples = SDL_AUDIO_BUFFER_SIZE;
wanted_spec.callback = audio_callback;
wanted_spec.userdata = aCodecCtx;
if(SDL_OpenAudio(&wanted_spec, &spec) < 0) {
fprintf(stderr, "SDL_OpenAudio: %s\n", SDL_GetError());
return -1;
}
</pre>
Let's go through these:
<ul>
<li><tt>freq</tt>: The sample rate, as explained earlier.</li>
<li><tt>format</tt>: This tells SDL what format we will be giving it. The "S" in "S16SYS" stands for "signed", the 16 says that each sample is 16 bits long, and "SYS" means that the endian-order will depend on the system you are on. This is the format that <tt>avcodec_decode_audio2</tt> will give us the audio in.</li>
<li><tt>channels</tt>: Number of audio channels.</li>
<li><tt>silence</tt>: This is the value that indicates silence. Since the audio is signed, 0 is of course the usual value.</li>
<li><tt>samples</tt>: This is the size of the audio buffer that we would like SDL to give us when it asks for more audio. A good value here is between 512 and 8192; ffplay uses 1024.</li>
<li><tt>callback</tt>: Here's where we pass the actual callback function. We'll talk more about the callback function later.</li>
<li><tt>userdata</tt>: SDL will give our callback a void pointer to any user data that we want our callback function to have. We want to let it know about our codec context; you'll see why.</li>
</ul>
Finally, we open the audio with <tt>SDL_OpenAudio</tt>.
<p></p>
<p>
If you remember from the previous tutorials, we still need to open the audio codec itself. This is straightforward:
</p><pre>AVCodec *aCodec;
aCodec = avcodec_find_decoder(aCodecCtx->codec_id);
if(!aCodec) {
  fprintf(stderr, "Unsupported codec!\n");
  return -1;
}
avcodec_open(aCodecCtx, aCodec);
</pre>
<p></p>
<h3>Queues</h3>
<p>
There! Now we're ready to start pulling audio information from the stream. But what do we do with that information? We are going to be continuously getting packets from the movie file, but at the same time SDL is going to call the callback function! The solution is going to be to create some kind of global structure that we can stuff audio packets in so our <tt>audio_callback</tt> has something to get audio data from! So what we're going to do is to create a <b>queue</b> of packets. ffmpeg even comes with a structure to help us with this: <tt>AVPacketList</tt>, which is just a linked list for packets. Here's our queue structure:
</p>
<pre>typedef struct PacketQueue {
  AVPacketList *first_pkt, *last_pkt;
  int nb_packets;
  int size;
  SDL_mutex *mutex;
  SDL_cond *cond;
} PacketQueue;
</pre>
First, we should point out that <tt>nb_packets</tt> is not the same as <tt>size</tt>: <tt>size</tt> refers to a byte count that we get from <tt>packet-&gt;size</tt>. You'll notice that we have a mutex and a condition variable in there. This is because SDL runs the audio process as a separate thread. If we don't lock the queue properly, we could really mess up our data. We'll see how in the implementation of the queue. Every programmer should know how to make a queue, but we're including this so you can learn the SDL functions.
<p></p>
<p>
First we make a function to initialize the queue:
</p><pre>void packet_queue_init(PacketQueue *q) {
  memset(q, 0, sizeof(PacketQueue));
  q->mutex = SDL_CreateMutex();
  q->cond = SDL_CreateCond();
}
</pre>
Then we will make a function to put stuff in our queue:
<pre>int packet_queue_put(PacketQueue *q, AVPacket *pkt) {

  AVPacketList *pkt1;
  if(av_dup_packet(pkt) < 0) {
    return -1;
  }
  pkt1 = av_malloc(sizeof(AVPacketList));
  if (!pkt1)
    return -1;
  pkt1->pkt = *pkt;
  pkt1->next = NULL;

  SDL_LockMutex(q->mutex);

  if (!q->last_pkt)
    q->first_pkt = pkt1;
  else
    q->last_pkt->next = pkt1;
  q->last_pkt = pkt1;
  q->nb_packets++;
  q->size += pkt1->pkt.size;
  SDL_CondSignal(q->cond);

  SDL_UnlockMutex(q->mutex);
  return 0;
}
</pre>
<tt>SDL_LockMutex()</tt> locks the mutex in the queue so we can add something to it, and then <tt>SDL_CondSignal()</tt> sends a signal to our get function (if it is waiting) through our condition variable to tell it that there is data and it can proceed, then unlocks the mutex to let it go.
<p></p>
<p>
Here's the corresponding get function. Notice how <tt>SDL_CondWait()</tt> makes the function <b>block</b> (i.e. pause until we get data) if we tell it to.
</p><pre>int quit = 0;
static int packet_queue_get(PacketQueue *q, AVPacket *pkt, int block) {
  AVPacketList *pkt1;
  int ret;

  SDL_LockMutex(q->mutex);

  for(;;) {

    if(quit) {
      ret = -1;
      break;
    }

    pkt1 = q->first_pkt;
    if (pkt1) {
      q->first_pkt = pkt1->next;
      if (!q->first_pkt)
        q->last_pkt = NULL;
      q->nb_packets--;
      q->size -= pkt1->pkt.size;
      *pkt = pkt1->pkt;
      av_free(pkt1);
      ret = 1;
      break;
    } else if (!block) {
      ret = 0;
      break;
    } else {
      SDL_CondWait(q->cond, q->mutex);
    }
  }
  SDL_UnlockMutex(q->mutex);
  return ret;
}
</pre>
As you can see, we've wrapped the function in a forever loop so we will be sure to get some data if we want to block. We avoid looping forever by making use of SDL's <tt>SDL_CondWait()</tt> function. Basically, all CondWait does is wait for a signal from <tt>SDL_CondSignal()</tt> (or <tt>SDL_CondBroadcast()</tt>) and then continue. However, it looks as though we've trapped it within our mutex: if we hold the lock, our put function can't put anything in the queue! However, what <tt>SDL_CondWait()</tt> also does for us is to unlock the mutex we give it and then attempt to lock it again once we get the signal.
<p></p>
<h3>In Case of Fire</h3>
<p>
You'll also notice that we have a global <tt>quit</tt> variable that we check to make sure that we haven't sent the program a quit signal (SDL automatically handles TERM signals and the like). Otherwise, the thread will continue forever and we'll have to <tt>kill -9</tt> the program. ffmpeg also has a function for providing a callback to check and see if we need to quit some blocking function: <tt>url_set_interrupt_cb</tt>.
</p><pre>int decode_interrupt_cb(void) {
  return quit;
}
...
main() {
  ...
  url_set_interrupt_cb(decode_interrupt_cb);
  ...
  SDL_PollEvent(&event);
  switch(event.type) {
  case SDL_QUIT:
    quit = 1;
  ...
</pre>
This only applies to ffmpeg functions that block, of course, not SDL ones. We make sure to set the <tt>quit</tt> flag to 1.
<p></p>
<h3>Feeding Packets</h3>
<p>
The only thing left is to set up our queue:
</p><pre>PacketQueue audioq;
main() {
  ...
  avcodec_open(aCodecCtx, aCodec);

  packet_queue_init(&audioq);
  SDL_PauseAudio(0);
</pre>
<tt>SDL_PauseAudio()</tt> finally starts the audio device. It plays silence if it doesn't get data, which it won't right away.
<p></p>
<p>
So, we've got our queue set up, now we're ready to start feeding it packets. We go to our packet-reading loop:
</p><pre>while(av_read_frame(pFormatCtx, &packet)>=0) {
  // Is this a packet from the video stream?
  if(packet.stream_index==videoStream) {
    // Decode video frame
    ....
  } else if(packet.stream_index==audioStream) {
    packet_queue_put(&audioq, &packet);
  } else {
    av_free_packet(&packet);
  }
}
</pre>
Note that we don't free the packet after we put it in the queue. We'll free it later when we decode it.
<p></p>
<h3>Fetching Packets</h3>
<p>
Now let's finally make our <tt>audio_callback</tt> function to fetch the packets on the queue. The callback has to be of the form <tt>void callback(void *userdata, Uint8 *stream, int len)</tt>, where <tt>userdata</tt> of course is the pointer we gave to SDL, <tt>stream</tt> is the buffer we will be writing audio data to, and <tt>len</tt> is the size of that buffer. Here's the code:
</p><pre>void audio_callback(void *userdata, Uint8 *stream, int len) {

  AVCodecContext *aCodecCtx = (AVCodecContext *)userdata;
  int len1, audio_size;

  static uint8_t audio_buf[(AVCODEC_MAX_AUDIO_FRAME_SIZE * 3) / 2];
  static unsigned int audio_buf_size = 0;
  static unsigned int audio_buf_index = 0;

  while(len > 0) {
    if(audio_buf_index >= audio_buf_size) {
      /* We have already sent all our data; get more */
      audio_size = audio_decode_frame(aCodecCtx, audio_buf,
                                      sizeof(audio_buf));
      if(audio_size < 0) {
        /* If error, output silence */
        audio_buf_size = 1024;
        memset(audio_buf, 0, audio_buf_size);
      } else {
        audio_buf_size = audio_size;
      }
      audio_buf_index = 0;
    }
    len1 = audio_buf_size - audio_buf_index;
    if(len1 > len)
      len1 = len;
    memcpy(stream, (uint8_t *)audio_buf + audio_buf_index, len1);
    len -= len1;
    stream += len1;
    audio_buf_index += len1;
  }
}
</pre>
This is basically a simple loop that will pull in data from another function we will write, <tt>audio_decode_frame()</tt>, store the result in an intermediary buffer, attempt to write <tt>len</tt> bytes to <tt>stream</tt>, and get more data if we don't have enough yet, or save it for later if we have some left over. The size of <tt>audio_buf</tt> is 1.5 times the size of the largest audio frame that ffmpeg will give us, which gives us a nice cushion.
<p></p>
<h3>Finally Decoding the Audio</h3>
<p>
Let's get to the real meat of the decoder, <tt>audio_decode_frame</tt>:
</p><pre>int audio_decode_frame(AVCodecContext *aCodecCtx, uint8_t *audio_buf,
                       int buf_size) {

  static AVPacket pkt;
  static uint8_t *audio_pkt_data = NULL;
  static int audio_pkt_size = 0;

  int len1, data_size;

  for(;;) {
    while(audio_pkt_size > 0) {
      data_size = buf_size;
      len1 = avcodec_decode_audio2(aCodecCtx, (int16_t *)audio_buf, &data_size,
                                   audio_pkt_data, audio_pkt_size);
      if(len1 < 0) {
        /* if error, skip frame */
        audio_pkt_size = 0;
        break;
      }
      audio_pkt_data += len1;
      audio_pkt_size -= len1;
      if(data_size <= 0) {
        /* No data yet, get more frames */
        continue;
      }
      /* We have data, return it and come back for more later */
      return data_size;
    }
    if(pkt.data)
      av_free_packet(&pkt);

    if(quit) {
      return -1;
    }

    if(packet_queue_get(&audioq, &pkt, 1) < 0) {
      return -1;
    }
    audio_pkt_data = pkt.data;
    audio_pkt_size = pkt.size;
  }
}
</pre>
This whole process actually starts towards the end of the function, where we call <tt>packet_queue_get()</tt>. We pick the packet up off the queue, and save its information. Then, once we have a packet to work with, we call <tt>avcodec_decode_audio2()</tt>, which acts a lot like its sister function, <tt>avcodec_decode_video()</tt>, except in this case, a packet might have more than one frame, so you may have to call it several times to get all the data out of the packet.
<span class="sidenote"><b>* A note: </b>Why <tt>avcodec_decode_audio<b>2</b></tt>? Because there used to be an <tt>avcodec_decode_audio</tt>, but it's now deprecated. The new function reads from the <tt>data_size</tt> variable to figure out how big <tt>audio_buf</tt> is.</span>
Also, remember the cast to <tt>audio_buf</tt>, because SDL gives us an 8-bit int buffer, and ffmpeg gives us data in a 16-bit int buffer. You should also notice the difference between <tt>len1</tt> and <tt>data_size</tt>: <tt>len1</tt> is how much of the packet we've used, and <tt>data_size</tt> is the amount of raw data returned.
<p></p>
<p>
When we've got some data, we immediately return to see if we still need to get more data from the queue, or if we are done. If we still had more of the packet to process, we save it for later. If we finish up a packet, we finally get to free that packet.
</p>
<p>
So that's it! We've got audio being carried from the main read loop to the queue, which is then read by the <tt>audio_callback</tt> function, which hands that data to SDL, which SDL beams to your sound card. Go ahead and compile:
</p><pre>gcc -o tutorial03 tutorial03.c -lavutil -lavformat -lavcodec -lz -lm \
`sdl-config --cflags --libs`
</pre>
Hooray! The video is still going as fast as possible, but the audio is playing in time. Why is this? It's because the audio information has a sample rate: we're pumping out audio information as fast as we can, but the audio simply plays from that stream at its leisure, according to the sample rate.
<p></p>
<p>
We're almost ready to start syncing video and audio ourselves, but first we need to do a little program reorganization. The method of queueing up audio and playing it using a separate thread worked very well: it made the code more manageable and more modular. Before we start syncing the video to the audio, we need to make our code easier to deal with. Next time: Spawning Threads!
</p>
<p>
<em><b>>></b> Spawning Threads</em>
</p>
<a name="tutorial04.html"></a>
<h2>Tutorial 04: Spawning Threads</h2>
<span class="codelink">Code: tutorial04.c</span>
<h3>Overview</h3>
<p>
Last time we added audio support by taking advantage of SDL's audio functions. SDL started a thread that made callbacks to a function we defined every time it needed audio. Now we're going to do the same sort of thing with the video display. This makes the code more modular and easier to work with - especially when we want to add syncing. So where do we start?
</p>
<p>
First we notice that our main function is handling an awful lot: it's running through the event loop, reading in packets, and decoding the video. So what we're going to do is split all those apart: we're going to have a thread that will be responsible for decoding the packets; these packets will then be added to the queue and read by the corresponding audio and video threads. The audio thread we have already set up the way we want it; the video thread will be a little more complicated since we have to display the video ourselves. We will add the actual display code to the main loop. But instead of just displaying video every time we loop, we will integrate the video display into the event loop. The idea is to decode the video, save the resulting frame in <i>another</i> queue, then create a custom event (<tt>FF_REFRESH_EVENT</tt>) that we add to the event system, then when our event loop sees this event, it will display the next frame in the queue. Here's a handy ASCII art illustration of what is going on:
</p><pre> ________ audio _______ _____
| | pkts | | | | to spkr
| DECODE |----->| AUDIO |--->| SDL |-->
|________| |_______| |_____|
| video _______
| pkts | |
+---------->| VIDEO |
________ |_______| _______
| | | | |
| EVENT | +------>| VIDEO | to mon.
| LOOP |----------------->| DISP. |-->
|_______|<---FF_REFRESH----|_______|
</pre>
The main purpose of controlling the video display via the event loop is that, using an <tt>SDL_Delay</tt> thread, we can control exactly when the next video frame shows up on the screen. When we finally sync the video in the next tutorial, it will be a simple matter to add the code that will schedule the next video refresh so the right picture is being shown on the screen at the right time.
<p></p>
<h3>Simplifying Code</h3>
<p>
We're also going to clean up the code a bit. We have all this audio and video codec information, and we're going to be adding queues and buffers and who knows what else. All this stuff is for one logical unit, <i>viz.</i> the movie. So we're going to make a large struct that will hold all that information called the <tt>VideoState</tt>.
</p><pre>typedef struct VideoState {
  AVFormatContext *pFormatCtx;
  int videoStream, audioStream;
  AVStream *audio_st;
  PacketQueue audioq;
  uint8_t audio_buf[(AVCODEC_MAX_AUDIO_FRAME_SIZE * 3) / 2];
  unsigned int audio_buf_size;
  unsigned int audio_buf_index;
  AVPacket audio_pkt;
  uint8_t *audio_pkt_data;
  int audio_pkt_size;
  AVStream *video_st;
  PacketQueue videoq;
  VideoPicture pictq[VIDEO_PICTURE_QUEUE_SIZE];
  int pictq_size, pictq_rindex, pictq_windex;
  SDL_mutex *pictq_mutex;
  SDL_cond *pictq_cond;
  SDL_Thread *parse_tid;
  SDL_Thread *video_tid;
  char filename[1024];
  int quit;
} VideoState;
</pre>
Here we see a glimpse of what we're going to get to. First we see the basic information - the format context, the indices of the audio and video streams, and the corresponding AVStream objects. Then we can see that we've moved some of those audio buffers into this structure. These (audio_buf, audio_buf_size, etc.) were all for information about audio that was still lying around (or the lack thereof). We've added another queue for the video, and a buffer (which will be used as a queue; we don't need any fancy queueing stuff for this) for the decoded frames (saved as an overlay). The VideoPicture struct is of our own creation (we'll see what's in it when we come to it). We also notice that we've allocated pointers for the two extra threads we will create, plus the quit flag and the filename of the movie.
<p></p>
<p>
So now we take it all the way back to the main function to see how this changes our program. Let's set up our <tt>VideoState</tt> struct:
</p><pre>int main(int argc, char *argv[]) {
  SDL_Event event;
  VideoState *is;

  is = av_mallocz(sizeof(VideoState));
</pre>
<tt>av_mallocz()</tt> is a nice function that will allocate memory for us and zero it out.
<p></p>
<p>
Then we'll initialize our locks for the display buffer (<tt>pictq</tt>), because the event loop calls our display function - the display function, remember, will be pulling pre-decoded frames from <tt>pictq</tt> while, at the same time, our video decoder is putting information into it - and we don't know who will get there first. Hopefully you recognize this as a classic <b>race condition</b>. So we allocate the locks now, before we start any threads. Let's also copy the filename of our movie into our <tt>VideoState</tt>.
</p><pre>pstrcpy(is->filename, sizeof(is->filename), argv[1]);
is->pictq_mutex = SDL_CreateMutex();
is->pictq_cond = SDL_CreateCond();
</pre>
<tt>pstrcpy</tt> is a function from ffmpeg that does some extra bounds checking beyond strncpy.
<p></p>
<h3>Our First Thread</h3>
<p>
Now let's finally launch our threads and get the real work done:
</p><pre>schedule_refresh(is, 40);

is->parse_tid = SDL_CreateThread(decode_thread, is);
if(!is->parse_tid) {
  av_free(is);
  return -1;
}
</pre>
<tt>schedule_refresh</tt> is a function we will define later. What it basically does is tell the system to push a <tt>FF_REFRESH_EVENT</tt> after the specified number of milliseconds. This will in turn call the video refresh function when we see it in the event queue. But for now, let's look at <tt>SDL_CreateThread()</tt>.
<p></p>
<p>
<tt>SDL_CreateThread()</tt> does just that - it spawns a new thread that has complete access to all the memory of the original process, and starts the thread running on the function we give it. It will also pass that function user-defined data. In this case, we're calling <tt>decode_thread()</tt> and with our <tt>VideoState</tt> struct attached. The first half of the function has nothing new; it simply does the work of opening the file and finding the index of the audio and video streams. The only thing we do different is save the format context in our big struct. After we've found our stream indices, we call another function that we will define, <tt>stream_component_open()</tt>. This is a pretty natural way to split things up, and since we do a lot of similar things to set up the video and audio codec, we reuse some code by making this a function.
</p>
<p>
The <tt>stream_component_open()</tt> function is where we will find our codec decoder, set up our audio options, save important information to our big struct, and launch our audio and video threads. This is where we would also insert other options, such as forcing the codec instead of autodetecting it and so forth. Here it is:
</p><pre>int stream_component_open(VideoState *is, int stream_index) {

  AVFormatContext *pFormatCtx = is->pFormatCtx;
  AVCodecContext *codecCtx;
  AVCodec *codec;
  SDL_AudioSpec wanted_spec, spec;

  if(stream_index < 0 || stream_index >= pFormatCtx->nb_streams) {
    return -1;
  }

  // Get a pointer to the codec context for the video stream
  codecCtx = pFormatCtx->streams[stream_index]->codec;

  if(codecCtx->codec_type == CODEC_TYPE_AUDIO) {
    // Set audio settings from codec info
    wanted_spec.freq = codecCtx->sample_rate;
    /* .... */
    wanted_spec.callback = audio_callback;
    wanted_spec.userdata = is;

    if(SDL_OpenAudio(&wanted_spec, &spec) < 0) {
      fprintf(stderr, "SDL_OpenAudio: %s\n", SDL_GetError());
      return -1;
    }
  }
  codec = avcodec_find_decoder(codecCtx->codec_id);
  if(!codec || (avcodec_open(codecCtx, codec) < 0)) {
    fprintf(stderr, "Unsupported codec!\n");
    return -1;
  }

  switch(codecCtx->codec_type) {
  case CODEC_TYPE_AUDIO:
    is->audioStream = stream_index;
    is->audio_st = pFormatCtx->streams[stream_index];
    is->audio_buf_size = 0;
    is->audio_buf_index = 0;
    memset(&is->audio_pkt, 0, sizeof(is->audio_pkt));
    packet_queue_init(&is->audioq);
    SDL_PauseAudio(0);
    break;
  case CODEC_TYPE_VIDEO:
    is->videoStream = stream_index;
    is->video_st = pFormatCtx->streams[stream_index];
    packet_queue_init(&is->videoq);
    is->video_tid = SDL_CreateThread(video_thread, is);
    break;
  default:
    break;
  }
}
</pre>
This is pretty much the same as the code we had before, except now it's generalized for audio and video. Notice that instead of aCodecCtx, we've set up our big struct as the userdata for our audio callback. We've also saved the streams themselves as <tt>audio_st</tt> and <tt>video_st</tt>. We also have added our video queue and set it up in the same way we set up our audio queue. Most of the point is to launch the video and audio threads. These bits do it:
<pre> SDL_PauseAudio(0);
break;
/* ...... */
is->video_tid = SDL_CreateThread(video_thread, is);
</pre>
We remember <tt>SDL_PauseAudio()</tt> from last time, and <tt>SDL_CreateThread()</tt> is used in exactly the same way as before. We'll get back to our <tt>video_thread()</tt> function.
<p></p>
<p>
Before that, let's go back to the second half of our <tt>decode_thread()</tt> function. It's basically just a for loop that will read in a packet and put it on the right queue:
</p><pre>for(;;) {
  if(is->quit) {
    break;
  }
  // seek stuff goes here
  if(is->audioq.size > MAX_AUDIOQ_SIZE ||
     is->videoq.size > MAX_VIDEOQ_SIZE) {
    SDL_Delay(10);
    continue;
  }
  if(av_read_frame(is->pFormatCtx, packet) < 0) {
    if(url_ferror(&pFormatCtx->pb) == 0) {
      SDL_Delay(100); /* no error; wait for user input */
      continue;
    } else {
      break;
    }
  }
  // Is this a packet from the video stream?
  if(packet->stream_index == is->videoStream) {
    packet_queue_put(&is->videoq, packet);
  } else if(packet->stream_index == is->audioStream) {
    packet_queue_put(&is->audioq, packet);
  } else {
    av_free_packet(packet);
  }
}
</pre>
Nothing really new here, except that we now have a max size for our audio and video queue, and we've added a function that will check for read errors. The format context has a <tt>ByteIOContext</tt> struct inside it called <tt>pb</tt>. <tt>ByteIOContext</tt> is the structure that basically keeps all the low-level file information in it. <tt>url_ferror</tt> checks that structure to see if there was some kind of error reading from our file.
<p></p>
<p>
After our for loop, we have all the code for waiting for the rest of the program to end or informing it that we've ended. This code is instructive because it shows us how we push events - something we'll have to do later to display the video.
</p><pre>while(!is->quit) {
  SDL_Delay(100);
}

fail:
if(1){
  SDL_Event event;
  event.type = FF_QUIT_EVENT;
  event.user.data1 = is;
  SDL_PushEvent(&event);
}
return 0;
</pre>
We get values for user events by using the SDL constant <tt>SDL_USEREVENT</tt>. The first user event should be assigned the value <tt>SDL_USEREVENT</tt>, the next <tt>SDL_USEREVENT + 1</tt>, and so on. <tt>FF_QUIT_EVENT</tt> is defined in our program as <tt>SDL_USEREVENT + 2</tt>. We can also pass user data, and here we pass our pointer to the big struct. Finally we call <tt>SDL_PushEvent()</tt>. In our event loop switch, we just put this by the <tt>SDL_QUIT_EVENT</tt> section we had before. We'll see our event loop in more detail; for now, just be assured that when we push the <tt>FF_QUIT_EVENT</tt>, we'll catch it later and raise our <tt>quit</tt> flag.
<p></p>
<h3>Getting the Frame: <tt>video_thread</tt></h3>
<p>
After we have our codec prepared, we start our video thread. This thread reads in packets from the video queue, decodes the video into frames, and then calls a <tt>queue_picture</tt> function to put the processed frame onto a picture queue:
</p><pre>int video_thread(void *arg) {
  VideoState *is = (VideoState *)arg;
  AVPacket pkt1, *packet = &pkt1;
  int len1, frameFinished;
  AVFrame *pFrame;

  pFrame = avcodec_alloc_frame();

  for(;;) {
    if(packet_queue_get(&is->videoq, packet, 1) < 0) {
      // means we quit getting packets
      break;
    }
    // Decode video frame
    len1 = avcodec_decode_video(is->video_st->codec, pFrame, &frameFinished,
                                packet->data, packet->size);

    // Did we get a video frame?
    if(frameFinished) {
      if(queue_picture(is, pFrame) < 0) {
        break;
      }
    }
    av_free_packet(packet);
  }
  av_free(pFrame);
  return 0;
}
</pre>
Most of this function should be familiar by this point. We've moved our <tt>avcodec_decode_video</tt> call here and just replaced some of the arguments; for example, we have the AVStream stored in our big struct, so we get our codec from there. We just keep getting packets from our video queue until someone tells us to quit or we encounter an error.
<p></p>
<h3>Queueing the Frame</h3>
<p>
Let's look at the function that stores our decoded frame, <tt>pFrame</tt> in our picture queue. Since our picture queue is an SDL overlay (presumably to allow the video display function to have as little calculation as possible), we need to convert our frame into that. The data we store in the picture queue is a struct of our making:
</p><pre>typedef struct VideoPicture {
  SDL_Overlay *bmp;
  int width, height; /* source height & width */
  int allocated;
} VideoPicture;
</pre>
Our big struct has a buffer of these in it where we can store them. However, we need to allocate the <tt>SDL_Overlay</tt> ourselves (notice the <tt>allocated</tt> flag that will indicate whether we have done so or not).
<p></p>
<p>
To use this queue, we have two pointers - the writing index and the reading index. We also keep track of how many actual pictures are in the buffer. To write to the queue, we're going to first wait for our buffer to clear out so we have space to store our <tt>VideoPicture</tt>. Then we check and see if we have already allocated the overlay at our writing index. If not, we'll have to allocate some space. We also have to reallocate the buffer if the size of the window has changed! However, instead of allocating the overlay here, we push an event to the main thread and let it do the allocation, to avoid locking issues. (I'm still not quite sure why; I believe it's to avoid calling the SDL overlay functions in different threads.)
</p><pre>int queue_picture(VideoState *is, AVFrame *pFrame) {

  VideoPicture *vp;
  int dst_pix_fmt;
  AVPicture pict;

  /* wait until we have space for a new pic */
  SDL_LockMutex(is->pictq_mutex);
  while(is->pictq_size >= VIDEO_PICTURE_QUEUE_SIZE &&
        !is->quit) {
    SDL_CondWait(is->pictq_cond, is->pictq_mutex);
  }
  SDL_UnlockMutex(is->pictq_mutex);

  if(is->quit)
    return -1;

  // windex is set to 0 initially
  vp = &is->pictq[is->pictq_windex];

  /* allocate or resize the buffer! */
  if(!vp->bmp ||
     vp->width != is->video_st->codec->width ||
     vp->height != is->video_st->codec->height) {
    SDL_Event event;

    vp->allocated = 0;
    /* we have to do it in the main thread */
    event.type = FF_ALLOC_EVENT;
    event.user.data1 = is;
    SDL_PushEvent(&event);

    /* wait until we have a picture allocated */
    SDL_LockMutex(is->pictq_mutex);
    while(!vp->allocated && !is->quit) {
      SDL_CondWait(is->pictq_cond, is->pictq_mutex);
    }
    SDL_UnlockMutex(is->pictq_mutex);
    if(is->quit) {
      return -1;
    }
  }
The event mechanism here is the same one we saw earlier when we wanted to quit. We've defined <tt>FF_ALLOC_EVENT</tt> as <tt>SDL_USEREVENT</tt>. We push the event and then wait on the condition variable for the allocation function to run.
<p></p>
<p>
Let's look at how we change our event loop:
</p><pre>for(;;) {
  SDL_WaitEvent(&event);
  switch(event.type) {
  /* ... */
  case FF_ALLOC_EVENT:
    alloc_picture(event.user.data1);
    break;