<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<title>TICT Tutorial S1P9 - Code optimization</title>
<link rel="stylesheet" type="text/css" media="screen, projection" href="style.css" />
</head>
<body>
<!-- ====================================================================== -->
<!-- HEADER OF TUTORIAL -->
<!-- ====================================================================== -->
<h1 class="toptitle">TICT TUTORIAL SERIES 1 - Part IX</h1>
<h2 class="subtitle2">© TI-Chess Team 2004-2008</h2>
<h2 class="subtitle2">Optimizing code</h2>
<hr>
<!-- ====================================================================== -->
<!-- FOCUS OF TUTORIAL -->
<!-- ====================================================================== -->
<h2>Focus of this tutorial, several important notes</h2>
<blockquote>
<p>I've read the source of dozens of programs, starting with the TICT ones,
to look for optimizations and report them to their authors, sometimes
with modified code. That is more or less how I became a member of TICT:
Tom was too busy to implement my suggestions (no code from me at that time,
I was a beginner) and I had more time than he had. I found that the
same optimizations can be applied to many programs, so I thought I could
pack them into a tutorial, whose content can be applied <em>while</em>
programming or <em>after</em> programming.<br>
<br>
First of all, when I mention programs to which I applied
optimizations, the point is not to suggest that the coder was bad. After all,
I once did not know all those things about optimization, and it took me
years to learn what I know.<br>
The point is to give actual examples of already optimized code
in case my explanations are unclear. I know you may find them unclear,
since several people have asked me to "decrypt" what I wrote to them.<br>
<br>
I should also mention that optimization can collide with readability,
portability and maintainability, and that this tutorial therefore does not
deal with "modern" coding practices: it deals with optimized code on
platforms where optimization matters a lot, more than on other platforms.<br>
This tutorial is made mainly for the CISC 68000 processor in TI-68k
calculators. Although a number of optimizations apply to all simple enough
processors (not super-scalar or long-pipelined and, to a lesser extent,
without a branch prediction unit), many are 68000-specific. I have checked with
HP-GCC that a number of optimizations also work very well on the not-so-RISC
ARM9 used in the HP-49G+, but, for example, the comment about shifts and
rotates being slow obviously does not apply to ARM processors at all, as
instant shifts & rotates are a strength of ARM processors.<br>
<br>
The compiler version also comes into play, although for a while now
only GCC 4.0+ has been usable with TIGCCLIB. GCC 4.0+ versions are usually
stronger than GCC 3.3.x at optimizing, with the caveat that -O2, -O3 and
of course -O4 are now hardly usable if you want to recompile a program
designed for previous versions without facing a significant size increase...
That said, GCC 4.0+ often pessimizes hand-tuned C
constructions: all TIGCC-GCC 4.0.x versions so far (up to 4.0.2) grossly
miscompile cast-as-lvalues (a deprecated extension, removed from FSF GCC
but enabled again by Kevin to keep backwards compatibility; he was right
to do so, because that extension is powerful). On another project, both
versions perform terrible interprocedural register allocation, despite a
thoughtful explicit register passing convention... Maybe the IPO in GCC
4.1 and later can improve that.<br>
<br>
<br>
When I started programming on TI-68k calculators, I found the TICT
tutorials helpful, but I had never made one entirely by myself (I worked on
extending and optimizing S1P6). Here is "my own".<br>
I started writing it about four years ago, when I thought I'd have no
more time to program on TI-68k calculators during the school year (which
happened one year later). It was also becoming clear that I should spend
some time on other concepts (starting with OO), languages (Java, Perl,
PHP, etc.) and platforms.<br>
The TI-68k calculators platform is great for learning a number of concepts
and practices dealing with programming on embedded platforms (which is the
specialty of my last year of studies), but not much more than that -
although it has simple cooperative multitasking, for instance.<br>
I learnt a lot about programming, and about human relationships as well. I
could never have imagined several of the kinds of behaviour I saw. Forums and
e-mails taught me to poke deeper into the real state of things.
No single person in the community is irreplaceable.</p>
</blockquote>
<br>
<hr>
<!-- ====================================================================== -->
<!-- HAND-MADE CODE OPTIMIZATION TRICKS -->
<!-- ====================================================================== -->
<h2>Hand-made code optimization tricks</h2>
<strong>Coding style:</strong>
<blockquote>
<ul>
<li><span class="under">if/else if or switch optimization</span>: try to
make the items as symmetric as possible. If all items have common
instructions, put them last if possible: this helps the compiler, at
least on -Os level. For example, take this piece of code in
TICT-Explorer 1.40, extension.c:
<pre>case SDT_LIST: // 1<br> strcpy(comment,MSGDIRECT_COMMENT_LIST);<br> break;<br>case SDT_MAT: // 2<br> strcpy(comment,MSGDIRECT_COMMENT_MAT);<br> break;<br>case SDT_FUNC: // 3<br> *buftypeaddr = TYPE_BASIC;<br> strcpy(comment,MSGDIRECT_COMMENT_FUNC);<br> break;<br>case SDT_PRGM: // 4<br> *buftypeaddr = TYPE_BASIC;<br> strcpy(comment,MSGDIRECT_COMMENT_PRGM);<br> break;<br>case SDT_PIC: // 5<br> *buftypeaddr = TYPE_PIC;<br> strcpy(comment,MSGDIRECT_COMMENT_PIC);<br> break;<br>case SDT_STR: // 6<br> strcpy(comment,MSGDIRECT_COMMENT_STRING);<br> break;<br>case SDT_TEXT: // 7<br> *buftypeaddr = TYPE_TEXT;<br> strcpy(comment,MSGDIRECT_COMMENT_TEXT);<br> break;<br>case SDT_GDB: // 8<br> strcpy(comment,MSGDIRECT_COMMENT_GDB);<br> break;<br>case SDT_DATA: // 9<br> strcpy(comment,MSGDIRECT_COMMENT_DATA);<br> break;<br>case SDT_FIG: // 10<br> strcpy(comment,MSGDIRECT_COMMENT_FIG);<br> break;<br>case SDT_MAC: // 11<br> strcpy(comment,MSGDIRECT_COMMENT_MACR);<br> break;</pre>
If you swap the order of a <code>*buftypeaddr = ...;</code> and its
<code>strcpy(...);</code> and recompile, you'll notice a significant
size increase (at least with GCC 3.3-).</li>
<li><span class="under">inlining</span>: declare <code>inline</code>
(preferably preceded by <code>static</code> if the function is used
only in the file it is written in, although the unit-at-a-time mode in
GCC 4.0+ should figure that out on its own) functions used at only one
or two places in the program, all the more so if they're small and/or often
executed. In any case, the compiler usually won't inline a piece of code if
it can figure out that doing so would give worse code.<br>The GCC 4.0+ inliner
is more powerful than that of GCC 3.3-, and it's responsible for part of
the size increase with -O2 and higher (most likely the greatest part,
since the size increase is noticeable even on code with no "special"
things such as numerous multiplications or tight loops with small
numbers of iterations).<br>
Inlining can yield speed and size optimizations that would not have
been possible otherwise. The speed-optimized version of the new pure
ASM ttunpack routine by Samuel Stearley uses inlining, saving hundreds
of thousands of clocks (previously at < 30 KB/sec, now at > 80
KB/sec); by contrast, the size-optimized version (less than half the
size, but < 30 KB/sec) spends much more time branching all over
the place because nothing is inlined.<br>
Starting from TIGCC 0.96, the small version is the default one in the
specific launchers TIGCC generates, which should hardly ever be used
anyway: as soon as there's more than <strong>one</strong> such launcher
at a time on a calculator, it's smarter to use a generic launcher (ttstart,
SuperStart). That saves space and gives a single point of update in case
a HW update breaks the existing launchers (like HW3 did).</li>
<li><span class="under">optimizing structured programming</span>: if you
split your programs into many functions (as you're certainly taught to do
at school if you study computer science), try to reduce the drawbacks
of this practice, which always slows down programs on our platforms
(the processor doesn't feature a branch prediction unit) and can
increase size, all the more so if the compiler is not given
-fomit-frame-pointer (unlikely with GCC 4.0, unless you compile without
optimization - ! - or explicitly use -fno-omit-frame-pointer, see
below). Very short functions like:
<pre>returntype foo(type1 param1, type2 param2, type3 param3, type4 param4) {<br> return bar(param1, param2, param3, param4, TRUE);<br>}<br><br>globaltype returnglobal1(void) {<br> return global1;<br>}</pre>
*should* be declared <code>inline</code>, or be turned into macros
(provided that visibility is not hurt): this saves both space and run
time !<br>
Functions that are called thousands of times (like interrupt handler
subroutines) *should* be logically inlined (<code>static inline</code>
/ macros), all the more so if there are many of them. 34 clocks (the minimum
call/return penalty, not taking anything else, like parameter
passing/retrieving, into account) * 256 Hz (AI1 rate on HW2, higher on
HW1) * 20 subroutines is 174080 clocks per second, ~1.5% of the roughly
12 MHz HW2 processor speed.
This is negligible, but interrupt handlers should always execute as
fast as possible.<br>
All that said, while excessive splitting into subroutines slows programs down, can
increase size and lowers readability, the opposite excess reduces
extensibility and maintainability (<em>either way, do comment your
source code</em>), and it's fairly hard to make maintainable code out of
messy code.</li>
<li><span class="under">non-structured programming</span>: <em>provided
you handle error conditions correctly</em>, for efficiency, you can
use goto, break, continue, and returns in the middle of functions. My
teachers would kill me if I dared turn in homework coded the way the
modified tthdex is coded...</li>
<li><span class="under">global register variables</span>: probably the
best way to have optimized references to globals (see below).</li>
<li><span class="under">optimized string arrays</span> (all program
sizes) and/or <span class="under">optimized function pointer
arrays</span> (may be impossible with programs larger than 32 KB).
Travis Fischer (Fisch2) made a tool for strings and released it on
ticalc.org, you can find the link at the end of this file. Switching to
optimized function pointer arrays saved ~1400 bytes in GFA-TEM
(GFA-Basic).</li>
<li><span class="under">loop / array subscripts optimization 1</span>:
don't use C multi-dimensional arrays (use single-dimensional arrays
with an accessor macro); use auxiliary pointers with postincremented
/ predecremented mode instead of array subscripts whenever possible.
The compiler cannot usually do such optimizations because they don't
preserve the exact meaning of the program. That is to say, replace
code such as
<pre>for (i = 0; i < N; i++) {<br> T[i] = i;<br>}</pre>
by
<pre>ptr = &T[0];<br>for (i = 0; i < N; i++) {<br> *ptr++ = i;<br>}</pre>
or (excerpt from TICT-Explorer 1.30):
<pre>if (search_for_file) {<br> for (i=0;i<file_count;i++) {<br> if (!strcmp(search_for_file,file_list[i].name)) {<br> active_file = i;<br> if (active_file > C89_92(9,12)) {<br> file_winpos = C89_92(9,12);<br> }<br> else {<br> file_winpos = active_file;<br> }<br> }<br> }<br>}</pre>
by
<pre>if (search_for_file) {<br> file_t *f = &file_list[0];<br> for (i=0;i<file_count;i++) {<br> if (!strcmp(search_for_file,f->name)) {<br> active_file = i;<br> if (active_file > C89_92(9,12)) {<br> file_winpos = C89_92(9,12);<br> }<br> else {<br> file_winpos = active_file;<br> }<br> }<br> f++;<br> }<br>}</pre>
or (excerpt from TI-Chess 4.14-):
<pre>for (j=0;j<2;j++) {<br> if (!j) magic = MAGIC_BOOK_WHITE;<br> else magic = MAGIC_BOOK_BLACK;<br><br> nr_books[j] = FindAndOpenTICFiles(bookfiles[j],MAX_BOOKS_USED,magic);<br><br> for (i=0;i<nr_books[j];i++) {<br> src = bookfiles[j][i].start+6;<br> books[j][i].nr_pos = *(unsigned short*)src;<br> src += 2;<br> books[j][i].nr_moves = *(unsigned short*)src;<br> src += 2;<br> books[j][i].first_hashcode = *(hash_t*)src;<br> src += (books[j][i].nr_pos-1)*10;<br> books[j][i].last_hashcode = *(hash_t*)src;<br> }<br>}</pre>
by
<pre>for (j=0;j<2;j++) {<br> ptrbooks = &books[j][0];<br> ptrbookfiles = &bookfiles[j][0];<br><br> if (!j) magic = MAGIC_BOOK_WHITE;<br> else magic = MAGIC_BOOK_BLACK;<br><br> nr_books[j] = FindAndOpenTICFiles(bookfiles[j],MAX_BOOKS_USED,magic);<br><br> for (i=0;i<nr_books[j];i++) {<br> src = ptrbookfiles->start+6;<br> ptrbooks->nr_pos = *(((unsigned short*)src)++);<br> ptrbooks->nr_moves = *(((unsigned short*)src)++);<br> ptrbooks->first_hashcode = *(hash_t*)src;<br> src += (ptrbooks->nr_pos-1)*10;<br> ptrbooks->last_hashcode = *(hash_t*)src;<br><br> ptrbooks++;<br> ptrbookfiles++;<br> }<br>}</pre>
<br>
It may be hard to force GCC 4.0+ to generate postincremented mode.
Adding the construct Kevin suggested to me for TI-Chess 4.12
<pre>asm volatile (""::"a"(ptr));</pre>
right after a use of postincremented mode <em>might</em> improve
things.<br>
If the code within the loop allows it, you can replace
<pre>for (i = 0; i < 2000; i++) {<br> ...<br>}</pre>
by
<pre>for (i=2000; (i--);) {<br> ...<br>}</pre>
which makes GCC generate the dbf processor instruction, often leading
to smaller code overall (although GCC could sometimes be smarter when
generating dbf instructions).</li>
<li><span class="under">loop / array subscript optimization 2</span>:
DEREFSMALL is a useful macro defined as
<pre> #define DEREFSMALL(__p,__i) \<br> (*(typeof(&*(__p)))((unsigned char*)(__p)+(long)(short)((short)(__i)*sizeof(*(__p)))))</pre>
<em>(yes, the <code>&*</code> is necessary)</em><br>
For "small" arrays (well, smaller than 32768 bytes !),
<code>DEREFSMALL(p,i)</code> is just the same as <code>p[i]</code>, but
more optimized (faster and smaller). GCC cannot know by itself whether an
array is small enough for this construct, so it uses the
general way that never fails. Such a macro was probably used in AMS
1.xx in the HeapDeref function and macro (various functions using
ROM_CALL_441 "HeapTable" in some way), but AMS 2.xx (and obviously AMS
3.xx, which is even worse in terms of optimization) probably uses the
general way.</li>
<li><span class="under">local variables</span>: do not use large local
variables, especially if they are initialized and never change. Indeed,
they a) burden the stack, which may lead to stack overflows when the program
is launched from file explorers, as most of them do not leave the entire stack
empty, and b) make programs slower and bigger (code is required to
copy data from the executable onto the stack). The code itself does not
always allow this optimization, though. It was possible in Venus
(movements.c), and it is possible in TI-Pinball.</li>
<li><span class="under">loop strength</span>: beware of tight loops, as the plain
68000 doesn't have a branch prediction unit. Unrolling several loops a
bit costs several bytes (up to a few percent of the total size) but
can greatly increase speed. This happened in the ExtGraph tilemap engine
(Refresh* - 20%), and we could have pushed further in that direction.</li>
<li><span class="under">multiplications</span>: beware that -Os (default
TIGCC setting) will generate multiplies instead of the equivalent
bigger but faster add/shift/subtract sequence, when multiplying by a
non-power of 2. This might be an issue speed-wise. I use -O2 or -O3
when I need speed (C routines of ExtGraph or the TI-Chess engine for
example), I use -Os when I need size optimization and speed doesn't
matter (most TICT games, interface of TI-Chess, TICT-Explorer for
example). Using command-line compilation (batches, makefiles), you can
mix files compiled with different options: interface should usually be
-Os, algorithms should usually be -O2/-O3.<br>You should avoid TIGCC
Projects, at least in their current form. I switched a number of TICT
programs to them *before* I knew of their exact drawbacks (several
folks over at yAronet said they sucked, but I had never stumbled across
the problems), and now it doesn't make much sense to modify the
sources again to revert to batches.</li>
<li><span class="under">fast string drawing</span>: if your program is a
bit slow (closer to 10 FPS than to 20 FPS - more than 20 FPS is
pointless due to a rather bad screen) and you're drawing strings,
consider using fast methods such as the one used in ebook 2.06+,
TICT-Explorer 1.40+, TI-Chess 4.10+, S1P6, Ice Hockey 68k (other
programmers use similar methods); use <em>fastitoa.h</em> (browse down
the news page of the TICT website). Doing this boosted FL's Game of
Life, Ice Hockey 68k, etc. for a minimal size cost, if there is any cost at all,
given that their __regparm__ calling convention is more efficient
size-wise than that of DrawStr / sprintf.<br>
Like the kernel RAM_CALLs, this method gives direct access to the font
data. Unlike the kernel RAM_CALLs, pointers never point to garbage (due
to an unfortunate method to retrieve addresses, kernel RAM_CALLs can -
you can see that with an old Solar Striker version on a Titanium), it
is very fast to set up, and it takes the AMS 2.xx and later font
redefinition possibility into account.<br>
If you need a special drawing mode, tell me or tell someone on the
boards, so that you can be pointed to an existing program, or someone
that might make it for you. A complete set and support of such routines
has been in the todo list of ExtGraph 2.00 Betas for a long time, but
it's still not done.</li>
<li><span class="under">shifts and rotates</span> are rather slow. For
example, I made FastSprite32_MIRROR_H_R from ExtGraph twice as fast as
the original one by removing shifts. See also the assembly trick
below.</li>
<li><span class="under">bit instructions</span> can be very useful
size-wise and speed-wise. This is why the EXT_...PIX_AN macros in
ExtGraph 2.xx use them. We'll deprecate those macros (which proved to
be buggy so many times until 2.00 Beta 5...) when GCC always generates
bit instructions on the old EXT_...PIX version using EXT_...PIX_AM.
Compression/decompression routines also benefit from them.</li>
<li>AMS <span class="under">floating-point numbers</span> are slow. Peter
J. Rowe (Mig53) has worked on usable fast (binary) floating point
(MC68343-style) routines for TI-68k calculators; I don't know the
current state of that project. They boost the very few programs
that really need them, at the expense of size of course. Note that
fixed-point math may be enough (there's also a FIP library by Mig53,
and another one whose author I don't remember), and it's faster. ClosedGL
badly needs FFP routines, I talked about that with its author.</li>
<li><span class="under">instruction scheduling</span>: carefully analyse
your algorithms to schedule tests and branches, remove bottlenecks
(there may be another way to do the same thing faster: this often
happens with bit manipulations). In the core of the Dissolve effect,
there used to be a test *after* a shift: putting it *before* saves a
number of clocks several thousands of times...<br>
If, in the end, it turns out that GCC could be generating smarter code,
you can always switch to inline ASM with C operands, but it's not
always easy to use (ahem, the ExtGraph pixel macros...).</li>
<li><span class="under">calling conventions</span>: when passing
parameters through registers, try to keep most parameters in
d0-d2/a0-a1 (the TIGCC documentation suggests using <em>up to</em> six
registers to pass parameters). I used d3 or a2 in several functions of
ExtGraph, and a2-a3 in the tilemap engine because I just don't have
time to modify nearly all functions to have them take their parameters
outside of d0-d3/a0-a1 on the stack. If you use registers d3-d7/a2-a6
to pass parameters, you'll leave the compiler fewer registers it can use
permanently (the standard calling convention being "d0-d2/a0-a1 can be
destroyed"). This may <em>in fine</em> turn into less optimized code -
all the more so since it can prevent using -freg-relative-an / global register
variables (the ExtGraph tilemap engine patch made by Kevin so that the
TIGCCLIB doublebuffering is usable with the tilemap engine does, that's
why I don't support it; read on).</li>
<li><span class="under">file handling</span>: use vat.h functions instead
of stdio.h functions (faster, smaller, easy to use). This is basically
what you're doing on *nix platforms within a mmap ... munmap pair: all
files are memory-mapped on our platform.
When using vat.h functions, you can sometimes use SYM_STRs computed at
compile-time instead of ordinary C strings (which have to be converted
to SYM_STRs by SYMSTR at run-time). This has saved hundreds of bytes in
Ice Hockey 68k and TI-Chess.</li>
</ul>
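<p>A minimal sketch of the loop / array subscript advice from the list above, in plain C so that it can be tried with any host compiler. The grid dimensions, the <code>CELL</code> accessor macro and both function names are made up for illustration; whether a given GCC version really emits postincremented addressing or dbf for these loops still has to be checked in its assembly output.</p>

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 4   /* hypothetical grid dimensions */
#define COLS 8

/* Accessor macro: a single-dimensional array addressed like a 2-D one. */
#define CELL(grid, row, col) ((grid)[(row) * COLS + (col)])

/* Fill the grid with its own flat index, walking a pointer instead of
   indexing, with a counter that runs down to zero. */
static void fill_with_index(unsigned char *grid, size_t count)
{
    unsigned char *p = grid;
    unsigned char value = 0;
    size_t i;

    assert(grid != NULL);
    for (i = count; i--; ) {
        *p++ = value++;
    }
}

/* Sum one row: the accessor macro locates the start of the row once,
   then an auxiliary pointer walks through it. */
static unsigned int row_sum(const unsigned char *grid, int row)
{
    const unsigned char *p = &CELL(grid, row, 0);
    unsigned int sum = 0;
    int i;

    for (i = COLS; i--; ) {
        sum += *p++;
    }
    return sum;
}
```

<p>The point of the design is that the multiplication hidden in the accessor macro is paid once per row rather than once per element.</p>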
</blockquote>
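<p>The advice on large initialized local variables in the list above can be sketched as follows, assuming a table that never changes. The table contents and the function names are hypothetical; the only point is that a <code>static const</code> array at file scope stays in the program's data, whereas an initialized local array is typically rebuilt on the stack at every call.</p>

```c
#include <assert.h>

/* Hypothetical lookup table: number of set bits in each 4-bit value.
   Being static const, it is never copied onto the stack. */
static const unsigned char bits_in_nibble[16] = {
    0, 1, 1, 2, 1, 2, 2, 3,
    1, 2, 2, 3, 2, 3, 3, 4
};

/* Look up one 4-bit value. */
static unsigned int nibble_bits(unsigned int nibble)
{
    assert(nibble < 16u);
    return bits_in_nibble[nibble];
}

/* Count the set bits of a byte with two table lookups. */
static unsigned int byte_bits(unsigned char value)
{
    return nibble_bits(value & 0x0Fu) + nibble_bits((unsigned int)value >> 4);
}
```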
<strong>Memory allocation/management</strong>:
<blockquote>
<ul>
<li>rather than worrying about every possible case where an allocation
could fail, design your program in such a way that you can easily find
and free all the stuff you've already allocated.<br>
The best way to do this is usually to pack separate memory allocations
into a single allocation. This will save HANDLEs (the number of memory
blocks on our calculators is limited to 2000), and most of all code
space (since there's only one check for successful allocation and only
one free). Have a look at Ice Hockey 68k and TI-Chess for complete code
examples (optimizing memory allocation saved several hundreds of
bytes).<br>This approach is more sensitive to memory fragmentation, but I never
stumbled across the problem on my calculator, despite huge uptimes
(measured through FiftyMsecTick). If a program cannot allocate a single
block of 20 or 30 KB, well, the calculator cannot run large programs
either, so it should be reset !<br>
Never allocate small blocks (smaller than, say, 32 bytes): use a
pooling allocator instead, all the more so since the AMS functions are rather
slow.<br>
In addition to that, you can use ...throw functions and an error
handler (TRY/ONERR/ENDTRY or TRY/FINALLY/ENDTRY) that always frees
everything you allocated.</li>
<li>if speed matters, avoid memcpy/memset/memmove when the amount of data
is smaller than several hundreds of bytes, as these functions are
rather well optimized for "large" blocks (even if they cannot rival the
brute-force movem trick used in grayscale supports and plane copy
routines), but there's an overhead due to them being generic functions.
The GCC versions in TIGCC are unfortunately currently unable to generate
small inline copy loops (GCC 4.3+ is supposed to know how to do that).
For once, GCC will often generate code that is worse speed-wise than what the
[insert swear words here] compiler in TIFS spits out. Writing such
loops is easy, in both C and ASM.</li>
</ul>
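<p>Packing separate allocations into a single one, as suggested above, could look like the sketch below. It is only an illustration: <code>malloc</code> / <code>free</code> stand in for the AMS allocation functions, and the <code>Buffers</code> structure, the buffer sizes and the function names are assumptions, not code from Ice Hockey 68k or TI-Chess.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical buffer sizes. They are even, so the word-sized buffer
   carved out last starts at an even address. */
#define SCREEN_SIZE 3840u
#define TILES_SIZE  1024u
#define SCORE_COUNT 10u

typedef struct {
    unsigned char  *block;   /* the single allocation that owns everything */
    unsigned char  *screen;  /* SCREEN_SIZE bytes */
    unsigned char  *tiles;   /* TILES_SIZE bytes */
    unsigned short *scores;  /* SCORE_COUNT entries */
} Buffers;

/* One allocation, one check. Returns 1 on success, 0 on failure. */
static int buffers_alloc(Buffers *b)
{
    assert(b != NULL);
    b->block = malloc(SCREEN_SIZE + TILES_SIZE
                      + SCORE_COUNT * sizeof(unsigned short));
    if (b->block == NULL) {
        return 0;
    }
    b->screen = b->block;
    b->tiles  = b->screen + SCREEN_SIZE;
    b->scores = (unsigned short *)(b->tiles + TILES_SIZE);
    return 1;
}

/* One free releases every buffer at once. */
static void buffers_free(Buffers *b)
{
    assert(b != NULL);
    free(b->block);
    b->block = NULL;
}
```

<p>With this layout, the error path of the caller shrinks to a single test, and cleanup cannot forget one of the buffers.</p>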
</blockquote>
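<p>As an illustration of the remark above that small copy loops are easy to write by hand, here is one possible C version. The function name is made up, and the sketch assumes word-aligned, non-overlapping buffers; it is not a drop-in replacement for memcpy.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Copy `words` 16-bit words from src to dst with a plain loop.
   Assumes the two buffers do not overlap. */
static inline void copy_words(unsigned short *dst,
                              const unsigned short *src,
                              size_t words)
{
    assert(dst != NULL && src != NULL);
    while (words--) {
        *dst++ = *src++;
    }
}
```

<p>For a handful of words, a loop like this avoids the call and setup overhead of the generic functions; for large blocks, the library routines should remain the better choice.</p>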
<strong>Structures, unions:</strong>
<blockquote>
<ul>
<li>Pad structs and unions out to a size that is a power of two (GCC will
generate multiplies on -Os level otherwise, which may be an issue
speed-wise). Arrange structs to minimize the amount of wasted space:
pack chars together. Beware of words and longs at odd addresses (GCC
should warn you), they trigger the dreaded "Address Errors".</li>
<li>Put the most-frequently-accessed member of a struct first so that a
more efficient addressing mode can be used under some conditions. If
you use internal structures that you partly include in your savefiles,
all saved members should be consecutive so that you can use memcpy /
memset with VAT functions (you don't use stdio.h functions, do you ?).
This was done in Venus.</li>
<li>When using a switch, use tightly packed values for the cases if
possible, the jump tables will be smaller that way. If speed matters,
do not use if-else if-... chains when you can use a switch (it usually
increases size, but not always), except for small chains.</li>
<li>Don't mix types in such a way as to force many unnecessary sign
extensions. signed char subscripts do (and GCC usually warns about
them), as the 68000 doesn't have the d(an,dn/an.b) addressing mode.</li>
<li>Do sanity-check the compiler's output from time to time, using
-save-temps, especially on your inner loops; it might reveal an issue
with your code or, most often, bad code generated by GCC.</li>
</ul>
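<p>A small sketch of the struct layout advice above, assuming a hypothetical sprite record: the most frequently accessed member comes first, the two chars are packed together, and explicit padding brings the size from 6 to 8 bytes so that indexing an array of such records needs a shift rather than a multiply. All names are illustrative.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sprite record. */
typedef struct {
    int16_t x;        /* most frequently accessed member first: offset 0 */
    int16_t y;
    uint8_t type;     /* the two chars are packed together, */
    uint8_t flags;    /* so no hole appears between the members */
    uint8_t pad[2];   /* explicit padding: 6 -> 8 bytes, a power of two */
} Sprite;

/* Returns nonzero if n is a power of two. */
static int is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Read the first member of the index-th record. */
static int16_t sprite_x(const Sprite *sprites, size_t index)
{
    assert(sprites != NULL);
    return sprites[index].x;
}
```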
</blockquote>
<strong>Assembly tricks:</strong>
<blockquote>
<ul>
<li><span class="under">Pack writes to memory</span>. That is to say,
frequent<br>
<pre>move.w #word2,-(sp)<br>move.w #word1,-(sp)</pre>
can be replaced by
<pre>move.l #((word1)*65536+word2),-(sp)</pre>
Likewise,
<pre>clr.w d(sp)<br>clr.w (d+2)(sp)</pre>
and other combinations of arithmetic operations and addressing modes
can be replaced by
<pre>clr.l d(sp)</pre>
<strong>unless at least one of the variables is
<em>volatile</em></strong>, which is infrequent.<br>
The former optimization is now in the TIGCC peephole optimizer, I
bugged Kevin many times to add it ;-). Adding the latter in TIGCC could
save at least 100 more bytes in TICT-Explorer and similar programs
(many zero-initialized local variables on the stack). GTC can perform at
least the latter.</li>
<li>There's an interesting way to <span class="under">combine two bytes
into one word</span>, storing the result in a register. The first idea
that comes to mind is obviously
<pre>move.b <ea1>,dn<br>lsl.w #8,dn<br>move.b <ea2>,dn</pre>
However
<pre>move.b <ea1>,-(sp)<br>move.w (sp)+,dn<br>move.b <ea2>,dn</pre>
is faster and not necessarily bigger.<br>
This trick is used in at least the speed-optimized version of the
latest TTPack/PPG decompression routine mentioned above.</li>
<li>Think of <span class="under">using the CPU flags</span>, especially
the C and N flags and combinations of them. Conditionally doing something
when an unsigned char value is 0x80 or above, an unsigned short is 0x8000
or above, or an unsigned long is 0x80000000 or above can be achieved without
any comparison, with just (signed) pl and mi branches. Checking multiple
bits one at a time can be achieved by shifting and checking C.<br>
These kinds of tricks are frequently used in ExtGraph, among others, and
GCC can perform at least some of them on its own: for example, it can
generate a single unsigned comparison for the following code:
<pre>if ((foo < 0) || (foo > bar)) { ... }</pre>
Such tricks enabled me to save 2 bytes in ttstart and, most of
all, 12 bytes out of 32 (!) in the VTI detection method by JM.
</li>
<li><span class="under">immediate comparisons</span>: if the value of the
"comparand" can be destroyed, you can replace
<pre>cmpi.w #[-8..-1/1..8],<ea></pre>
by
<pre>subq.w #[-8..-1/1..8],<ea></pre>
(for the negative values, that means <code>addq.w</code> with the opposite
value, since subq itself only encodes 1..8). Also,
<pre>cmpi.size #0,<ea></pre>
is better written as
<pre>tst.size <ea></pre>
This is used in PolySnd, ExtGraph.</li>
<li>The trick used in kernel-based programs' headers (reproduced below in
a mix of assembly dialects) is the smallest way to push a pointer on
the stack. I like it because it's a very specific and infrequent, but
clever, use of bsr:
<pre>tst.w $30.w | Check the kernel magic.<br>beq.s there_below | Branch taken -> none installed.<br>movea.l $34.w,a0 | kernel::exec<br>jmp (a0) | The execution never resumes at printstr.<br>printstr: | Print the string whose address is on the stack.<br>movea.l $C8,a0<br>movea.l $398(a0),a0 | ST_helpMsg<br>jsr (a0)<br>| 4 to remove string address, 4 to undo the first bsr (not reproduced here)<br>| right before the program header.<br>addq.w #8,sp<br>| Return to launcher (AMS, a pstarter, ttstart, SuperStart, some file explorer, etc.).<br>rts<br>there_below:<br>| Pushes the address of the string right after this instruction and branch above. Never returns.<br>bsr.s printstr<br>.ascii "Kernel required"</pre>
</li>
</ul>
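<p>At the C level, the two flag tricks described above (a range test done with one unsigned comparison, and a top-bit test done as a sign test) can be sketched as follows. The function names are made up, and the signed cast relies on the usual two's complement conversion that GCC performs; what the compiler finally emits should still be checked in its output.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Equivalent to (value < 0) || (value > limit) when limit >= 0:
   a negative value converts to a large unsigned number, so a single
   unsigned comparison covers both conditions. */
static int out_of_range(int16_t value, int16_t limit)
{
    assert(limit >= 0);
    return (uint16_t)value > (uint16_t)limit;
}

/* "0x80 or above" for an unsigned byte is the same test as "negative"
   for the same bits read as a signed byte: only the sign (N) flag is
   needed, no comparison against a constant. */
static int top_bit_set(uint8_t value)
{
    return (int8_t)value < 0;
}
```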
A number of those optimizations were applied in the latest version (2.10)
of star (Starfield Effect by TICT), the latest version of TI-Miner, the
latest version of TICT-Explorer, TICT Tutorial S1P6, Ice Hockey 68k, Civ89, etc.
</blockquote>
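<p>The arithmetic behind the "pack writes to memory" trick above can be checked on any machine with the sketch below. It writes bytes in big-endian order by hand, as the 68000 does, and shows that one long store of <code>word1*65536+word2</code> leaves memory exactly as two consecutive word stores would. The helper names are hypothetical.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 16-bit value in big-endian byte order, like a 68000 move.w. */
static void store16_be(uint8_t *p, uint16_t v)
{
    assert(p != NULL);
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFFu);
}

/* Store a 32-bit value in big-endian byte order, like a 68000 move.l. */
static void store32_be(uint8_t *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)((v >> 16) & 0xFFu);
    p[2] = (uint8_t)((v >> 8) & 0xFFu);
    p[3] = (uint8_t)(v & 0xFFu);
}

/* word1 ends up at the lower address, word2 right after it. */
static uint32_t pack_words(uint16_t word1, uint16_t word2)
{
    return ((uint32_t)word1 << 16) | word2;
}
```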
<hr>
<!-- ====================================================================== -->
<!-- OPTIMIZED COMPILATION OPTIONS -->
<!-- ====================================================================== -->
<h2>Optimized compilation options</h2>
<strong>Most of those optimizations cannot be enabled by default in the
compiler, in order to preserve backwards compatibility and/or keep side
effects on the code as low as possible.</strong> It's up to you to use them.<br>
<blockquote>
<ul>
<li><strong>separate builds</strong> (one for 89/89T, one for 92+/V200)
just as I do in TICT programs. <em>This is a multi-kilobyte
optimization on Ice Hockey 68k, Hawk, Backgammon, many others - and the
quickest one !</em><br>
This point of view is not shared by everyone in the community. A certain
prominent member fights against on-calc incompatibility, calling on-calc
compatibility "functionality". This is arguable: some end users do
not like on-calc-incompatible programs, but on-calc compatibility
takes space that could be used to improve programs speed-wise and
<em>functionality</em>-wise...<br><br>
The fact is, calculators have been sold packaged with links for years.
In other words, most TI-68k users now have link cables, many more
than back in 2000 when I bought my 89. Internet connections are much more
common than in 2000 as well. This means that users can download the
binaries for their particular calculator model, and transfer them to their
calculators - or one of their friends in the same classroom can.
Moreover, TI-89(T) calculators are the majority. On-calc compatibility
basically makes 89 users bother with 92+/V200-only code which neither they
nor most calculators around them will ever use ! This code makes the programs
they use bigger (and very slightly slower, but the difference is definitely
not noticeable)...<br><br>
<em>On-calc incompatibility is actually not so much of a drawback in
terms of use</em>: if end-users *really* want a program (game, cheat,
"clack", etc.), then they do what they have to do to get it working
(upgrade the AMS, use PreOS, remove language localizations, use the
version adapted to their calculator model, etc.). TICT programs, which
I didn't create but happen to maintain, are rather widely used -
especially TI-Chess - while being and becoming on-calc incompatible,
aren't they ;-) ?<br><br>
I estimate TI-Chess would be more than 10 KB (!) larger (uncompressed)
if it were on-calc compatible, due to storing keyboard handling and GFX
for both models in the same executable. XtraKeys does exactly that, which
makes it much larger than it could be. And the first TICT-Explorer 1.30+
versions, way before the 1.40 ones, also become kilobytes larger when
the definitions of the C89_92 macros are changed to use compat.h definitions
(just for testing purposes).<br>
<strong>The "Optimize Calc Consts" option</strong> produces on-calc-incompatible
programs with a single build, but the results are far from being as
good as those separate builds can yield, because the compiler must
generate code that reads a global variable instead of optimizing
constants away. That was more than 1 KB of extra code, IIRC, for the first
TICT-Explorer 1.30+ versions. Therefore, I advise against using
that option.<br><br>
Some people object to compiling their program twice (or
three times if the program's design allows compiling an on-calc compatible
version <strong>in addition to</strong> the on-calc-incompatible ones -
TI-Chess' design does not, with good reason, as stated above), because it
takes more time. On computers running a real OS (i.e. not Win 9x or ME:
NT-derived Windows and, even more so, *nix/BSD launch external
programs quickly), this looks like a non-argument.<br>Indeed, compiling
the program more than once is hardly ever necessary during development,
i.e. most of the time. For TI-Chess and TICT-Explorer, the longest
compilation takes less than 15 seconds total on my 4+-year-old computer,
with a significant part of that time spent reordering sections for
greater optimization. It's true that when making the distribution
packages, I compile TI-Chess <em>eight</em> times and TICT-Explorer
<em>twelve</em> times, due to language localizations. But the process
is neither very frequent nor extremely long, and <strong>very</strong>
few other programs have more than two language localizations...</li>
<li><strong>-fomit-frame-pointer</strong> (now default in TIGCC 0.96+
with GCC 4.0+; for a long time it was not the default because it didn't
work with floating-point): the compiler will not use frame pointers
(safe ways to access local variables and parameters on the stack) when
they're not necessary, which results in more optimized (faster,
smaller) code. This option is an important one on the 68000
architecture, especially if there are many small subroutines (which
could often be logically inlined, as mentioned above).</li>
<li><strong>-mno-bss</strong> or, sometimes better, merging the BSS
section with the data section (<strong>-DMERGE_BSS</strong>), as a BSS
section is now used by default (unlike what TIGCC 0.94- did). Instead of
reserving space permanently in the binary for non-initialized globals,
the BSS support allocates a block of memory before _main is executed,
and frees it after _main returns.<br>
This looks like a great idea (that's what most platforms do anyway,
but they usually have an MMU), but it turns out that on our platform
BSS sections are inefficient in practice:
<ul>
<li>Many programs that have globals large enough so that BSS might
make sense, actually do the allocation work by themselves (which
was necessary in TIGCC 0.94-).</li>
<li class="li2">Worse, due to their nature, just like kernel or
compressed relocations to RAM_CALLs and ROM_CALLs, they force the use of
the relocated 68000 xxx.l addressing mode, which is less efficient
speed-wise and size-wise than the non-relocated d(pc) / d(an)
addressing modes that merging the BSS section with the data section often
makes usable...</li>
</ul>
As of TIGCC 0.96 Beta 4, -mno-bss / -DMERGE_BSS is compulsory in case
you want to use -freg-relative-an, as reg-relative references to BSS
are not yet supported.<br>
<em>This was a multi-kilobyte optimization on Ice Hockey 68k and a
number of other programs - and the code is very slightly faster</em> !</li>
<li><strong>-mpcrel</strong>, mutually exclusive with
<strong>-freg-relative-an</strong> as of TIGCC 0.96 Beta 5.</li>
<li><strong>-freg-relative-an</strong> (use only n=4 or 5, as a number of
routines use a2-a3; n=5 rules out OPTIMIZE_ROM_CALLS), mutually
exclusive with <strong>-mpcrel</strong> as of TIGCC 0.96 Beta 5.<br>
I added -freg-relative-a5 (and -mno-bss) in TI-Chess 4.12+.
The overall impact on size was not significant,
as the benefit of more efficient references was offset by the size
of some large globals, which BSS had been hiding for some time (since
TIGCC 0.95 was used, actually). Nevertheless, the compression ratios
jumped up. This is also visible in Backgammon (800 bytes over ~9000 !).</li>
<li><strong>-Wa,--all-relocs</strong> for stronger linker-side
optimization. Default in a number of situations, but I always forget
which ones, and defining it more than once won't hurt anything.</li>
<li><strong>-Wa,-l</strong>. It does not always work with programs larger
than 32 KB (though it can work on the significantly larger Venus, if
section reordering is disabled and files are manually reordered). Your
computer might become unresponsive due to thousands of errors if it is used
with a 32+ KB program, and reordering is impossible...</li>
<li><strong>-mregparm(=n)</strong> (do not use beyond n=5 or 6, see
above). I don't remember seeing a program made worse by switching to
-mregparm; it usually saves a lot. For backwards compatibility, it cannot
be enabled by default in TIGCC, as TIGCC 0.93- do not feature
__regparm__ mode.<br>
<strong>CAUTION</strong>, -mregparm will produce invalid code if you
use improperly-declared function pointers or libraries that are not
aware of -mregparm; be SURE to check that calling conventions match,
since those bugs, which caught me more than once on TICT software, are
hard to track down, although the TIGCC debugger support (along with
TIEmu) now helps find them.</li>
<li><strong>--optimize-code --cut-ranges --reorder-sections
--merge-constants -ffunction-sections -fdata-sections
-fmerge-all-constants</strong> (read the documentation for more
information), or their TIGCC project checkbox equivalents if you're
using a project. <strong>--reorder-sections</strong> might prevent you
from using -Wa,-l and -mpcrel in large programs, but usually improves
your program, at the expense of link time.</li>
<li><strong>F-Line instructions</strong> (ROM_CALLs, jumps) can reduce
size. However, they result in slightly slower code and require an
internal emulator to work on old AMS versions (very few programs cannot
work on AMS 2.03-, and those are mostly CAS additions, so you should
always use one instead of setting a high MIN_AMS). Although I fought
against them for quite some time, I've been using them in multiple TICT
programs for a while. After all, hardly any TICT program requires extreme
speed, and the difference is not too significant, unless the ROM_CALL
is small. Anyway, ROM_CALLs are written in a sloppy - and ever-worsening - way.</li>
<li><strong>-ftracer</strong> results in faster but larger code (more
duplications). The GCC 4.0+ new speed optimization options are even
stronger with it (but watch out for the size - this is why this option is
hardly usable on this platform !!).</li>
<li><strong>-fno-if-conversion</strong> may or may not decrease size, it
depends on the program.</li>
<li>New optimization options in GCC 4.0 sometimes seem to have a bad
effect size-wise or speed-wise, like <strong>-ftree-dominator-opts
</strong> (enabled by default when optimizing). <strong>However, this
may no longer be true in future TIGCC versions as the GCC 4.x
versions stabilize.</strong></li>
<li><strong>-fgcse-lm</strong>, <strong>-fgcse-sm</strong>,
<strong>-fgcse-las</strong> may help or not.</li>
</ul>
</blockquote>
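<p>To illustrate why separate builds beat a single on-calc compatible build
(and the "Optimize Calc Consts" compromise), here is a minimal sketch in plain
C. The names BUILD_TI89, plane_bytes_static, plane_bytes_runtime and is_ti89
are hypothetical and exist only for this example; they are not TIGCC
identifiers. With one build per model, the screen size is a literal the
compiler can fold away. A compatible build has to read a variable at run time
and carry the values for both models.</p>

```c
/* Hypothetical build switch: define BUILD_TI89 for the 89/89T build,
   leave it undefined for the 92+/V200 build. Not a TIGCC macro. */
#ifdef BUILD_TI89
#define SCREEN_WIDTH  160
#define SCREEN_HEIGHT 100
#else
#define SCREEN_WIDTH  240
#define SCREEN_HEIGHT 128
#endif

/* Separate builds: size in bytes of a 1-bit bitmap covering the visible
   screen. Everything here is a compile-time constant, so the compiler
   can reduce the whole function to a single literal. */
unsigned plane_bytes_static(void)
{
    return (unsigned)((SCREEN_WIDTH + 7) / 8) * SCREEN_HEIGHT;
}

/* On-calc compatible build: the model is only known at run time.
   Assumption: is_ti89 is set once at startup by some detection code. */
int is_ti89 = 0;

unsigned plane_bytes_runtime(void)
{
    /* Both sets of values must be stored, and each use costs a test. */
    unsigned width  = is_ti89 ? 160u : 240u;
    unsigned height = is_ti89 ? 100u : 128u;
    return ((width + 7u) / 8u) * height;
}
```

<p>In a real program the same pattern repeats for key codes, sprite
positions, menus and so on, which is where the kilobytes come from.</p>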
<hr>
<!-- ====================================================================== -->
<!-- COMPARISON BETWEEN DIFFERENT APPROACHES -->
<!-- ====================================================================== -->
<blockquote>
<table summary="Pros and cons of design / compilation options" border="1">
<caption>Comparison between -mpcrel, -Wa,-l, BSS, -freg-relative-an,
global register variables</caption>
<thead>
<tr>
<td><strong>Type</strong></td>
<td><strong>Pros</strong></td>
<td><strong>Cons</strong></td>
</tr>
</thead>
<tbody>
<tr>
<td><strong>-mpcrel</strong></td>
<td><ul>
<li class="li2">Position Independent Code</li>
<li class="li2">usually saves space</li>
<li class="li2">works best with -Wa,-l</li>
</ul>
</td>
<td><ul>
<li class="li2">takes up an address register in a semi-permanent
way, for writes most of the time</li>
<li class="li2">doesn't work with most programs larger than 32 KB
- may require disabling --reorder-sections</li>
</ul>
</td>
</tr>
<tr>
<td><strong>-Wa,-l</strong></td>
<td><ul>
<li class="li2">saves space</li>
<li class="li2">works best with -mpcrel</li>
</ul>
</td>
<td><ul>
<li class="li2">not very powerful</li>
<li class="li2">doesn't work with most programs larger than 32 KB
- may require disabling --reorder-sections</li>
</ul>
</td>
</tr>
<tr>
<td><strong>BSS</strong></td>
<td><ul>
<li class="li2">transparent for programmers</li>
<li class="li2">work with programs larger than 32 KB</li>
</ul>
</td>
<td><ul>
<li class="li2">references are relocated xxx.l: removing BSS from
Ice Hockey 68k using -mno-bss saved ~600 relocations and more
than 2 KB, compared to kernel-style BSS references !</li>
</ul>
</td>
</tr>
<tr>
<td><strong>-freg-relative-an</strong></td>
<td><ul>
<li class="li2">usually transparent for programmers</li>
<li class="li2">optimized (d(an) accesses)</li>
<li class="li2">works with most programs larger than 32 KB</li>
</ul>
</td>
<td><ul>
<li class="li2">takes up an address register permanently</li>
<li class="li2">dirty but simple hack needed to work in callbacks
and interrupt handlers (see TI-Chess 4.12+)</li>
</ul>
</td>
</tr>
<tr>
<td><strong>global register variables</strong></td>
<td><ul>
<li class="li2">optimized (d(an) accesses)</li>
<li class="li2">work with programs larger than 32 KB</li>
</ul>
</td>
<td><ul>
<li class="li2">takes up an address register permanently</li>
</ul>
</td>
</tr>
</tbody>
</table>
</blockquote>
<hr>
<!-- ====================================================================== -->
<!-- GENERAL ADVICE -->
<!-- ====================================================================== -->
<h2>General advice that doesn't really fit elsewhere</h2>
<blockquote>
<ul>
<li>Don't worry too much about returning all the way down to _main when
you want to quit; there's nothing wrong with a call stack of _main
-> show_main_menu -> pick_choice -> main_menu_quit -> exit.
An alternate way to do that is setjmp/longjmp or errors caught within a
TRY ... FINALLY ... ENDTRY block in _main, especially if your program
uses events in some form.</li>
<li>When making savefiles, you should use a custom type of file (OTH_TAG)
and both a magic number and a version number. Never use strings, as
some versions of TI-Connect choke on them if they contain too many
0x00 bytes. We usually use magic+version numbers for TICT programs, as
it increases stability (checking for known files and formats prevents
crashes), and it works very well.</li>
</ul>
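<p>The setjmp/longjmp alternative mentioned in the first item can be sketched
as follows, in standard C. The function names mirror the call stack given
above; run_program is a hypothetical stand-in for _main, and cleanup_runs only
exists so that the effect can be observed. This is an illustration under those
assumptions, not code taken from a TICT program.</p>

```c
#include <setjmp.h>

jmp_buf quit_point;    /* the place "quit" unwinds to */
int cleanup_runs = 0;  /* counts how many times cleanup was executed */

/* Deepest level: the user picked "quit" in the main menu. */
void main_menu_quit(void)
{
    longjmp(quit_point, 1);  /* never returns to its caller */
}

void pick_choice(void)    { main_menu_quit(); }
void show_main_menu(void) { pick_choice(); }

/* Stand-in for _main: returns 0 once the program has quit cleanly. */
int run_program(void)
{
    if (setjmp(quit_point) == 0) {
        /* First pass: run the program normally. */
        show_main_menu();
    }
    /* Reached after the longjmp (or a normal return):
       one single place to restore the screen, free memory, etc. */
    cleanup_runs++;
    return 0;
}
```

<p>The benefit is that the intermediate functions need no "did the user
quit ?" checks on the way back up.</p>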
</blockquote>
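<p>As a rough sketch of the magic number + version number check recommended
above for savefiles, in portable C: the header layout, the SAVE_MAGIC and
SAVE_VERSION values and the save_check function are all made up for this
example, and creating the actual OTH_TAG file on the calculator is not
shown.</p>

```c
#include <stdint.h>
#include <string.h>

#define SAVE_MAGIC   0x54494354UL /* hypothetical value: "TICT" in ASCII */
#define SAVE_VERSION 2u           /* hypothetical current format version */

typedef struct {
    uint32_t magic;    /* identifies the file as one of ours */
    uint16_t version;  /* identifies the layout of what follows */
    uint16_t level;    /* example payload */
} SaveHeader;

enum { SAVE_OK = 0, SAVE_BAD_MAGIC, SAVE_BAD_VERSION, SAVE_TOO_SHORT };

/* Validate a raw savefile buffer before trusting any of its contents.
   On success, the header is copied to *out. */
int save_check(const void *data, size_t size, SaveHeader *out)
{
    SaveHeader h;

    if (size < sizeof h)
        return SAVE_TOO_SHORT;
    memcpy(&h, data, sizeof h);   /* avoids unaligned accesses */
    if (h.magic != SAVE_MAGIC)
        return SAVE_BAD_MAGIC;    /* not our file at all */
    if (h.version != SAVE_VERSION)
        return SAVE_BAD_VERSION;  /* ours, but another format */
    *out = h;
    return SAVE_OK;
}
```

<p>Rejecting unknown files up front is what prevents the crashes mentioned
above; a distinct "bad version" result also leaves room for converting old
savefiles later.</p>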
<!-- ====================================================================== -->
<!-- USEFUL TOOLS, DOCS, WEBSITES, FOOD FOR THOUGHT I'D LIKE TO MENTION -->
<!-- ====================================================================== -->
<h2>Useful tools, docs, (semi-off-topic) food for thought</h2>
<blockquote>
<ul>
<li><a
href="http://www.ticalc.org/archives/files/fileinfo/350/35077.html">Travis
Fischer (Fisch2)'s tool</a> for optimized string arrays.</li>
<li><a href="http://www.jimrandomh.org/sgt/">Jim Babcock (JimRandomH)'s
tool</a> (beta) for easier and more powerful language
localizations.</li>
</ul>
<ul>
<li><a href="http://tiwiki.etherdream.org/Accueil">TI-Wiki</a>, another
TI-68k calculators documentation, more general but way less thorough
than the TIGCC documentation, and currently written nearly entirely in
French. There hasn't been any activity on it for a while.</li>
<li><a href="http://tifreakware.ath.cx/">TI-Freakware</a> and
<a href="http://board.boolsoft.org/">boolsoft</a>, two TI-68k/TI-Z80
programming message boards.</li>
</ul>
<ul>
<li><a href="http://www.joelonsoftware.org">Joel on software</a>, a good
resource on programming style and insightful thoughts on the way the
computer industry goes.</li>
<li>Sites of the <a href="http://ostg.com/">Open Source Technology
Group</a>, especially <a href="http://slashdot.org">Slashdot</a>, <a
href="http://newsforge.com">Newsforge</a> and <a
href="http://sf.net/">Sourceforge</a>; <a
href="http://lwn.net">Linux Weekly News</a>: large news and
programming sites. Slashdot's user comments have a reputation for
being somewhat bad, with some reason (quite a few comments are rated
sub-normal), but there are always many thorough and technical comments
(+4, +5 "insightful"/"informative" in Slashdot ratings) and solutions
among them. Digg!'s unmoderated news queue (a number of damagingly wrong
news items in a few months) and comments are worse... Looks like some
Slashdot ACs, trolls, kiddies have found a new haven there.</li>
<li><a href="http://distrowatch.com">Distrowatch</a>, the well-known
resource of information about the numerous Linux/BSD distributions.</li>
<li><a href="http://www.zegeniestudios.net/ldc/">Zegenie Linux Distribution
Chooser</a>, a tool that helps you find a GNU/Linux distribution tailored
to your needs. It worked all right in the various scenarios several of
my schoolmates and I tested.</li>
</ul>
<br>
<p>While reading news and following the trends over several years, I became
convinced that optimizing code in an old-fashioned way, on a
platform not so many people care about, is rather pointless compared to
all the locking down and privacy invasions that are imposed on us, for the
sake of large companies, or foreign governments, to try to keep a dysfunctional
system afloat...</p>
</blockquote>
<hr>
<br>
<h2>... And The Credits go to:</h2>
<ul>
<li>First, obviously, the TIGCC team for the TIGCC development environment.</li>
<li>The many proofreaders of this tutorial, especially Jim Babcock
(JimRandomH) and Travis Fischer (Fisch2) for their comments and
additions.</li>
<li>My schoolmate Yoann for making me aware of the beauty and power of
CSS, although I don't currently know too much about it.</li>
<li>*HTML tools, all of them usable under <a
href="http://www.debian.org/">Debian</a>-based <a
href="http://www.mepis.com/">Mepis GNU/Linux</a> but not necessarily
under Windows XP:
<ul>
<li>the powerful free <a href="http://www.nvu.com">Nvu</a> and <a
href="http://bluefish.openoffice.nl/">Bluefish</a> graphical
editors, and "lightweight" (?) <a
href="http://www.scintilla.org">SciTE</a>. No, vi, emacs and
derivatives are not text editors, nor will ever be ;-P</li>
<li class="li2">the <a href="http://www.w3c.org">World Wide Web
consortium (W3C)</a> tools Tidy (used through Bluefish and
as a <a href="http://www.mozilla.org/products/firefox">Firefox</a>
plugin) and <a href="http://www.w3.org/Amaya">Amaya</a> to check the
validity and accessibility of this page (Firefox is more
standards-compliant when rendering pages than Amaya though).</li>
</ul>
MEPIS is a rather popular GNU/Linux distribution, thanks to its ease of use.
Yes, it contains several non-free programs, most of which have good
free equivalents, with the notable exception of the unmatched binary
drivers for graphic cards, which are fast and support recent models...<br>
My main PC has been running MEPIS >95% of the time for about two and a
half years.
</li>
<li>... and <a href="mailto:lionel_debroux@yahoo.fr">Lionel Debroux
(me)</a> for writing this tutorial.</li>
</ul>
<h2>Contact TI-Chess Team Members</h2>
<ul>
<li class="li2">You can reach Thomas Nussbaumer at <a
href="mailto:thomas.nussbaumer@gmx.net">thomas.nussbaumer@gmx.net</a></li>
<li class="li2">Marcos Lopez (retired) can be reached at <a
href="mailto:marcos.lopez@wol.es">marcos.lopez@wol.es</a></li>
<li class="li2">You can reach Lionel Debroux at <a
href="mailto:lionel_debroux@yahoo.fr">lionel_debroux@yahoo.fr</a></li>
</ul>
<blockquote>
<p>Check the TICT HQ Website at <a
href="http://tict.ticalc.org">http://tict.ticalc.org</a> for more tutorials
and software.</p>
<p>More useful tips, tricks and hints can be found at our messageboard at:
<a
href="http://p080.ezboard.com/btichessteamhq">http://p080.ezboard.com/btichessteamhq</a>.</p>
<p>Suggestions, bug reports and similar are welcome (use our messageboard
for this).</p>
</blockquote>
<h2>How to thank the author ?</h2>
<blockquote>
<p>The usual: please give credit in your programs, and use the
messageboard.</p>
</blockquote>
<h2>Copyleft</h2>
<blockquote>
<p>This documentation and the accompanying stylesheet may be distributed
by any other website.</p>
<p>The author makes no representations or warranties about the suitability
of the software and/or the data files, either express or implied. The
author shall not be liable for any damages suffered as a result of using or
distributing this.</p>
<p>You are free to re-use any part of the sourcecode, and we'd like it if
you gave credits including a reference to the TICT-HQ (<a
href="http://tict.ticalc.org/">http://tict.ticalc.org/</a>).</p>
</blockquote>
<hr>
<em>Lionel Debroux, France, 2004-2008</em>
</body>
</html>