2.0 November 2022
=====================
Notable changes and features
----------------------------
- Added support for LLVM 15.
- Experimental RISC-V support:
- Generate RISC-V ASIPs with custom instruction set extensions
(see the RISC-V Tutorial in manual for usage).
- New Operation Triggered Architecture (OTA) formats that map to the formats
of the RISC-V ISA.
- New operations added to OSAL for RISC-V
- oacc-riscv compiler driver that is able to adapt to RISC-V
custom instructions.
- Toolset name change:
- To reflect ISA support that is wider than just TTAs, the "TCE" name has been
changed to "OA" (OpenASIP) in most places and the old name is being phased out.
- Tools such as the compiler "tcecc" have been deprecated and replaced with
"oacc". To keep backwards compatibility, the old tools still work.
- FUGen:
- Automated function unit RTL implementation generation feature: Hardware
is generated automatically if an operation has a DAG description that
can be resolved to an "HDL snippet".
- Zero-register support:
- Since RISC-V needed it, a new register file attribute "zero-register" was
added to ADF. This indicates that the first index of the register file is
hardcoded to a zero value and thus can be utilized as the zero immediate
value.
- DAG syntax changes:
- New "Var" and "OP" keywords that can be used alongside "SimValue"
and "EXEC_OPERATION".
- AlmaIFV2:
- Hardware generation support for AlmaIFV2 was added.
- Read more in the following publication:
Topi Leppänen, Panagiotis Mousouliotis, Georgios Keramidas, Joonas Multanen,
Pekka Jääskeläinen: "Unified OpenCL Integration Methodology for FPGA Designs",
in NorCAS 2021: IEEE Nordic Circuits and Systems Conference.
- RTL generation code has been unified.
- oacc --data-start can be used to redefine the global data
starting address in the default address space.
- Program Image Generation supports Bin2n-format, which is a binary format
where each word has been zero-padded to the next power-of-two. Used for
AlmaIF instruction memory region.
- The license has been changed to LGPL v2.1.
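A minimal sketch of the padding idea behind the Bin2n format mentioned above.
It assumes that the padded width is the smallest power of two (in bits) that
is at least the word width, that the zero bits go to the most significant
side, and that words are written as big-endian bytes. None of these details
are stated in these notes, the function names are hypothetical, and this is
not the Program Image Generator's code.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (n must be positive)."""
    if n <= 0:
        raise ValueError("width must be positive")
    return 1 << (n - 1).bit_length()


def pad_words(words, width_bits: int) -> bytes:
    """Zero-pad each width_bits-wide word up to a power-of-two width.

    Assumption: padding is added on the most significant side and each
    word is serialized as big-endian bytes (at least one byte per word).
    """
    padded_bits = max(8, next_power_of_two(width_bits))
    out = bytearray()
    for w in words:
        if w < 0 or w >> width_bits:
            raise ValueError("word does not fit in width_bits")
        out += w.to_bytes(padded_bits // 8, "big")
    return bytes(out)
```

With these assumptions a 43-bit instruction word occupies 64 bits in the
image, while a 32-bit word stays at 32 bits.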
Notable bugfixes
----------------------------
- GUIs: Patches to GTK assertion errors on (mostly) non-Ubuntu systems.
Fixes Github issue #140. Thanks to Sarah Clark of Google.
1.25 June 2022
=====================
Notable changes and features
----------------------------
- Added support for LLVM 14.
- TDGen: Register-file marked as 'reserved' will not be used by LLVM
RegAlloc anymore. This is needed for the upcoming inline asm named
reg support.
- Initial partial inline assembly support. It supports GNU's extended
assembly constructs such as local register variables and clobbers
in C code. Consequently, this enables clobber and physical register
constraints in LLVM IR.
- Experimental hardware loop compiler support based on LLVM's generic
hardware loop pass.
Notable bugfixes
----------------------------
- DDG: Operations mapped to different FUs no longer get a false dependency
edge.
1.24 October 2021
=====================
Notable changes and features
----------------------------
- Added support for LLVM 13.
- Experimental tool support for Blocks-CGRA [1] designs. Currently the
tooling is limited to translating VLIW-like Blocks (XML representation)
to an ADF file and reusing the TCE compiler for Blocks code generation.
[1] Wijtvliet, M. (2020). Blocks, a reconfigurable architecture combining
energy efficiency and flexibility. Technische Universiteit Eindhoven.
- Support for wxWidgets 3.1
Notable bugfixes
----------------------------
- tcecc: did not disable loop unrolling properly, even if asked to do so.
- OSEd: Element-width for SIntWord was fixed to 32.
- ProDe: Added an operation even when its operand width exceeded the socket
width.
1.23 June 2021
=====================
Notable changes and features
----------------------------
- Added support for LLVM 12.
Dropped support for LLVM versions older than 11.
Notable bugfixes
----------------
- The compiler instruction scheduler no longer
assumes an FU state dependency between operations
if the operations are forced to different FUs
(and thus modify different states).
Fixes GitHub issue #114.
1.22 December 2020
=====================
Notable changes and features
----------------------------
- Added support for LLVM 11.
Dropped support for LLVM 3.5, LLVM 3.6 and LLVM 3.7.
- Experimental support for 64-bit little-endian processor designs.
Please note that the support so far covers only the architecture level:
compilation and simulation are implemented, but RTL generation is
still incomplete (HDB for 64b FUs not yet added).
- New instruction scheduler which is much more aggressive at
performing software bypasses, which reduces register pressure.
It creates considerably faster code for processor architectures with
a small number of register file ports.
The old instruction scheduler can still be used by giving the --td-scheduler
parameter to tcecc.
- Compiler backend plugin compilation sped up by over 3x by creating a
plugin wrapper file that includes multiple source files, so that commonly
included files are linked and compiled only once for all the source files
in the wrapper.
- Upgraded Python scripts to version 3. Python 2 no longer supported.
Notable bugfixes
----------------
- Stack spill alias analysis now should finally work.
1.21 March 2020
=====================
Notable changes and features
----------------------------
- Added support for LLVM 10.
- Brought back the alias analysis which knows that spills of variables
into stack cannot alias with other memory operations.
This was working on some old TCE versions and ancient LLVM versions,
but has not been supported with the recent LLVM versions.
- 8-bit loads are no longer needed in a minimal machine; they can be
emulated with 32-bit loads with masking and shifting. Of course, this is
typically very slow and naturally should be used only in cases where
they are not needed in performance-critical parts of the applications
of interest.
- tcecc can now emulate 16-bit aligned loads, but emulation of 16-bit unaligned
loads is not yet supported, so not all code will compile if 16-bit loads are
left out from the machine (a work in progress).
- Both sign-extending and zero-extending loads are no longer needed in a
minimal machine; tcecc can now use either one to emulate the other
one (naturally with a performance penalty).
- The LLVM installer scripts for LLVM 9 and LLVM 10 updated to use the git
repo instead of svn to download the llvm sources.
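The load emulation described in the notes above can be sketched as follows.
This is a behavioural model only, assuming a little-endian byte order and a
flat byte array as memory; the function names are hypothetical and the code
tcecc actually emits is not shown in these notes.

```python
def load32(mem: bytes, addr: int) -> int:
    """Aligned 32-bit load; stands in for the machine's only load operation."""
    if addr % 4 != 0:
        raise ValueError("unaligned 32-bit load")
    return int.from_bytes(mem[addr:addr + 4], "little")


def load8u(mem: bytes, addr: int) -> int:
    """Zero-extending 8-bit load built from an aligned 32-bit load."""
    word = load32(mem, addr & ~3)   # load the word that contains the byte
    shift = (addr & 3) * 8          # bit offset of the byte (little-endian)
    return (word >> shift) & 0xFF   # shift the byte down and mask it


def load8s(mem: bytes, addr: int) -> int:
    """Sign-extending 8-bit load emulated with the zero-extending one."""
    value = load8u(mem, addr)
    return (value ^ 0x80) - 0x80    # flip the sign bit, then subtract it
```

Each emulated byte load costs one word load plus a few ALU operations,
which is why the notes recommend it only outside performance-critical code.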
1.20 October 2019
=====================
Notable changes and features
----------------------------
- Modified guard evaluation in the simulation model and the guarded register
file implementations. Guard now evaluates as true if and only if the least
significant bit stored in the register is 1.
- Support for LLVM 9
- Added new 1-bit shift instructions (shl1_32, shr1_32 and shru1_32) and
compiler support to use sequences of these for all shifts. This allows
compiling any C code for an architecture without a barrel shifter.
For static shift amounts, sequences of shift instructions that
shift by multiple bits can also be supported, if they are named like
shlN_32, where N is the number of bits shifted, and have the DAG set.
Left shift is now also optional: the compiler can use multiple additions
to achieve left shifting.
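A small behavioural sketch of how arbitrary shifts can be composed from the
1-bit shift operations named in these notes. It assumes that shl1_32 is a
left shift, shr1_32 an arithmetic right shift and shru1_32 a logical right
shift, all on 32-bit values; the helper names shl, shr, shru and shl_by_add
are hypothetical and do not correspond to compiler internals.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def shl1_32(x: int) -> int:
    return (x << 1) & MASK32


def shru1_32(x: int) -> int:
    return (x & MASK32) >> 1


def shr1_32(x: int) -> int:
    # Arithmetic shift: the sign bit is copied back into the top position.
    return ((x & MASK32) >> 1) | (x & SIGN32)


def _repeat(op, x: int, amount: int) -> int:
    for _ in range(amount):
        x = op(x)
    return x


def shl(x: int, amount: int) -> int:
    return _repeat(shl1_32, x, amount)


def shr(x: int, amount: int) -> int:
    return _repeat(shr1_32, x, amount)


def shru(x: int, amount: int) -> int:
    return _repeat(shru1_32, x, amount)


def shl_by_add(x: int, amount: int) -> int:
    """Left shift without any shifter: doubling is just x + x."""
    x &= MASK32
    for _ in range(amount):
        x = (x + x) & MASK32
    return x
```

A shift by N costs N operations in this scheme, so the trade-off is a
smaller datapath against slower shifts.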
Notable bugfixes
----------------
- Fixed broken software floating point conversion from unsigned int to float.
- Fixed instruction scheduler testbench to also work with little-endian ADFs.
Cleanups
--------
- Remove deprecated dynamic exception specifications.
1.19 April 2019
=====================
Notable changes and features
----------------------------
- Support for LLVM 8
Usability features
------------------
- ProDe connection tool now tries to more intelligently figure out the
direction of a previously unconnected socket based on the connected
FU port or RF port, from the bound FU operands or the RF port name.
- ProDe implementation selector dialog now shortens the HDB paths for
easier readability.
1.18 September 2018
=====================
Notable changes and features
----------------------------
- Support for LLVM 7.0.
- Added hexadecimal output to PIG.
- Added HDB with register files and a basic ALU optimized for Xilinx Series 7
devices. Thanks to Stephan Nolting and Guillermo Payá-Vayá / IMS, Leibniz
Univ. Hannover for the contribution of the shifter.
Notable bugfixes
----------------
- Bugfixes related to handling of long immediate units with
sign extension. Values to these could break in case multiple
instruction templates could write to these.
- Fix support for some math library routines (ceil, floor, round, exp2)
1.17 March 2018
=====================
Notable changes and features
----------------------------
- Support for LLVM 6.0.
Usability features
------------------
- Sane defaults for OSEd GUI text editor.
Misc.
-----
- Clarified the TCE tour text.
1.16 September 2017
=====================
Notable changes and features
----------------------------
- Support for LLVM 5.0.
- Support for little-endian TTAs.
- VLIWConnectIC: An explorer plugin which creates a VLIW-like interconnection
network and creates a separate RF for each distinct bus width.
Usability features
----------------------------
- Proxim: Added the ability to search for a pattern in the disassembly window.
The feature can be accessed by pressing ctrl-f in the main window.
- ProDe/Proxim: Zoom in/out the machine canvas with mouse wheel.
- ProDe: Improved search in 'Add FU From HDB'-dialog.
- OSEd: Fixed OperationDAGDialog scaling at resize time.
Notable bugfixes
----------------
- ProDe: fix crashes in "Add from OSAL"-dialog.
- ProDe: Address space dialog now displays max-address correctly with 32bits.
- Fixed execution of TCE tools from the build tree in Debian Stretch.
1.15 March 2017
=====================
Notable changes and features
----------------------------
- Support for LLVM 4.0.
Other features and improvements
-------------------------------
- Bus trace format is changed to a comma-separated values listing. Bus
values are displayed in hexadecimal and can be wider than 32 bits.
- GHDL test bench scripts updated to work with its latest version.
Notable bugfixes
----------------
- ttasim: Operation state is now reset with 'kill' command.
- tcecc: Fix alias analysis or separating RA save/restore on function.
- tcecc: Register renamer did not handle frame pointer correctly.
1.14 November 2016
=====================
Notable changes and features
----------------------------
- Support for LLVM 3.9.
- Support for wxWidgets 3.x and thus Ubuntu 16.04, which doesn't ship wxW 2.x
anymore. Check the TROUBLESHOOTING file if you encounter problems.
- Support for variable-length local arrays and alloca, that is, dynamic
stack objects. When using dynamic stack objects, the architecture must
have an additional 32-bit register (total minimum of 6) as one of them
will be used as a frame pointer in functions with dynamic stack objects.
- ProGe now can generate a simple control interface "AlmaIF" that can be
used for control and debug access to TCE processors integrated in SoCs.
- Added source code debugging window to Proxim.
Other features and improvements
-------------------------------
- TCE_INSTALL_DIR environment variable can be set pointing to directory where
user manually installed TCE.
- Added ability to modify memory in simulator with load_data command.
Usability features
------------------
- HDBEditor automatically fills new implementations with sequential opcode
numbers.
- Made IDFs more portable: when referring to the TCE-shipped HDB files,
a magic string tce: can be used instead of absolute paths, which differ
from system to system.
- SVG export was added when using wxWidgets 3.0 (replaces the EPS export)
- ProDe shows machine instruction width in status bar.
- ProDe now shows more information of the operations in the opset dialog.
- Column/list sorting in various ProDe/OSEd dialogs.
- List filtering in ProDe dialogs 'Add from opset' and 'Add FU from HDB'.
- Directory where ttasim dumps simulation traces can be changed using an
environment variable TTASIM_TRACE_DIR.
Documentation
-------------
- Added descriptions for all non-obvious base.opp operations.
Notable bugfixes
----------------
- If the processor had narrower than 32-bit buses that can transport
immediate values, broken code could have been generated
1.13 February 2016
====================
Notable changes and features
----------------------------
- Support for LLVM 3.8
- Semantics of extraBits in BEM changed: when the encoding is zero,
the encoding size is now calculated as 0 and one more extra bit is
needed than before to create the same encoding.
When new bem files are generated, this one more
extra bit is created. BEM files with the new semantic have version number
of >= 1.2. When an older BEM file is loaded, the amount of extrabits
is automatically converted to the new format.
Loading a new BEM file with older version of TCE is not supported.
Notable bugfixes
----------------
- Bus having only one source or destination socket which has a subfield
(trigger opcode index, register index, or immediate field) and no
nop encoding (nop encoding done by always false guard)
no longer loses the index field in the instruction encoding.
1.12 September 2015
====================
Notable changes and features
----------------------------
- Support for LLVM 3.7
Other features and improvements
------------------------------
- Operation definitions can be now overridden one-by-one
by redefining them in a local OSAL search path.
- Register files can be implemented using synchronous SRAMs. Included one
RF implementation using Xilinx's BRAMs.
Notable bugfixes
----------------
- Basic block with only call may have crashed the compiler
- If converter broke with floating point immediate values
- Fixes to FMA unit implementations.
- ICDecoder sometimes generated VHDL syntax when should have generated Verilog
1.11 March 2015
====================
Notable changes and features
----------------------------
- Support for LLVM 3.6.
- tcecc: Support for the "address of label" extension:
http://blog.llvm.org/2010/01/address-of-label-and-indirect-branches.html
- tcecc: Support for using SUB to flip the sign of constants in case the
constant cannot be encoded directly.
- HDBEditor: External ports and additional parameters can be defined for RF
implementations.
- ProGe: RF's external ports can be now generated.
- tcecc --init-sp: force the initial stack pointer value
- ProGe: DefaultICDecoderPlugin generates better VHDL code for code coverage.
- ProGe: Simulating processor can now generate bus trace variant that excludes
all locked states. The new variant is enabled when the regular bus trace is
enabled in the DefaultICDecoderPlugin.
- MachInfo prints bindings between ADF ports and OSAL operations' operands
in the function unit table.
- Clustered TTA mode compilation dropped when using LLVM 3.6.
Other features and improvements
------------------------------
- TestOsal: test context bitwidth attribute has been removed, since Operands
know their own width. Also, values that are printed in hex format are printed
using Operand's whole width, including leading zeroes.
- osed: Operand type can be set to Bool.
- ProGe: Generated HDL output files now have registers in deterministic order.
- Improved code coverage for default ic decoder plugin.
- Added verilog implementation to output global lock trace.
- The way SimValue stores values has been changed from little-endian to
big-endian convention. This doesn't affect the way how SimValue should be
used.
- Transfer of immediate 1 into boolean register no longer uses
long immediate when it's not needed.
- Better descriptions on some OSAL operations.
- Added helper script generatebustrace.sh under tools/scripts/ that generates
bustrace from given ADF and TPEF.
- tcecc: More user friendly error messages for several failures caused by
unsupported ADFs.
Notable bugfixes
----------------
- Fixed ProGe not generating valid verilog code for core with IUs.
- Fixed PIG failure if ADF had 2+ IUs one having size of one and others size
of one plus.
- SystemC simulation hook crash fixes.
- Fixes to FMA unit implementations.
Usability features
------------------
- ttasim: register values are always printed in hex format for the whole width.
- buildopset: use LDFLAGS environment variable to fix linking in 3rd party libs.
1.10 September 2014
====================
Notable changes and features
----------------------------
- Support for LLVM 3.5.
- machinfo: a tool for automatically printing out documentation of
the designed ADF in LaTeX.
Other features and improvements
------------------------------
- Compilation times improved.
- Functions marked as 'noinline' are not removed from the final program
even though the program itself does not refer to them.
- Template Slots can be edited in ProDe's Instruction Templates-dialog.
- Added select operation that can be used for select(?) operator
instead of conditional moves.
- Added patterns which can do select(?) operator without conditional moves
or select operations, using a series of ORs, ANDs, etc.
- Better error message when trying to compile to adf with broken port bindings.
- Build fixes for latest Mac OSX versions.
- Support for calling custom operations to be executed in any FU with a given
address space (e.g. _TCEAS_LDW("#1", addr, result), where 1 is the address
space numeric identifier, or _TCEAS_LDW("data", ...), where data is the name
of the address space).
- SLEEP operation that locks the core until an external signal has been
asserted.
- Always false guard can be used for encoding NOPs, potentially saving
bits in source and destination fields.
- Processor may not have ldq and ldh instructions and compiler
can then compile code which does not load 8/16 bit values.
Previously these operations were needed even when not used.
- C++11 compiling mode can be manually enabled by giving --enable-cxx11 for the
configure script. The mode is always enabled if LLVM version is 3.5 or higher.
- Default ic decoder plugin can print a separate global lock trace.
- Fixed issues spotted by compiling TCE using Clang++ 3.5.
- Added a SLEEP operation to the base operation set.
- tcecc no longer asserts/breaks if LDQ/STQ or LDH/STH is missing.
It, however, cannot emulate these, so only code which
does not load/store 8/16-bit values will compile if these are missing.
- tcecc: Lacking some immediate width no longer forces one register to be
wasted for regcopyadder.
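One common way to express the select(?) operator without conditional moves
or a select operation is to build an all-ones or all-zeros mask from the
condition and combine the operands with ANDs and ORs. The sketch below
models that idea on 32-bit values; it is an illustration under that
assumption, not the exact instruction patterns added to the compiler.

```python
MASK32 = 0xFFFFFFFF


def select(cond: int, a: int, b: int) -> int:
    """Return a if the lowest bit of cond is 1, else b, without branching."""
    mask = (0 - (cond & 1)) & MASK32          # 0xFFFFFFFF if cond, else 0
    return (a & mask) | (b & (mask ^ MASK32))  # keep a or b, never both
```

For example, select(1, 5, 9) yields 5 and select(0, 5, 9) yields 9.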
Code generator improvements
---------------------------
- Generation of conversions between half float and integers automatically.
Usability features
------------------
- ProDe: Bus-socket connection can be edited with a single click in the
Edit Connections mode.
- ProDe: When unit details are not printed, the unit label is printed
with a larger font for readability. The unit details toggle button is
now added by default to the toolbar.
- ProDe: The units are now rounded in case they are function units,
"more rounded" in case of a control unit, rectangles in case
of register files, and trapezoids in case of immediate units.
Also coloured: FU blue, LSU green, CU purple, IMMU orange, RF yellow.
- ProDe: In the non-unit-details mode only the unit's name is printed.
This enables making the unit visualization (and thus the whole processor
visualization) smaller.
- ProDe: Wider ports, sockets and buses look wider, narrower look narrower.
- ProDe: Units, ports and buses are closer to each other in the main view.
- tcecc: More user friendly error printouts when lacking immediate
capabilities or when running out of imem.
Notable bugfixes
----------------
- Fix PIG and ProGe when generating bits or RTL for a machine with a bus
(slot) that does not have any connections.
- Having volatile variables could have caused broken code being generated.
- Compiler sometimes refused to schedule to processors with relatively
sparse connectivity but no need for register copies.
- A wrong number of parameters in OSEd DAGs now results in a reasonable error
message instead of a crash.
- Boolean return values no longer cause compiler to fail.
- PIG generates initialization data also for variables initialized to zero.
- ProGe: Reset also immediate unit control signals.
1.9 January 2014
====================
Notable changes and features
----------------------------
- Support for LLVM 3.4.
- OSAL Operation: optional element-count and element-width attribute fields
added for in/out operands in operation schema. From now on operands have
information of their subword width and subword count.
Code generator improvements
---------------------------
- Support for vector comparisons with the vectorbackend.
Misc. smaller improvements
--------------------------
- Fixed the TCE sources to compile with Clang++, a lot of new warnings
exposed by it fixed.
- Standalone OpenCL support: More OpenCL API functions implemented;
support added for zero-copy buffers and reading from and writing to
buffers by manipulation of buffer pointers. Also reduced kernel call
overhead.
- Explorer: HDB files which are given with -b switch are now searched under
current working dir and default hdb search paths.
- OSEd: element width and element count of an operand are now displayed in
operation property dialog, and these fields can also be modified for input
and output operands.
- all memory accesses to volatile variables now happen in original order
- TCE code base builds and works now in g++'s C++11 mode (but does not yet
require it!) and Clang++
- Make the simulation function symbols of the compiled simulator unique
per compiled simulator engine instance so one can have multiple of them
in the same process to simulate (heterogeneous) multi-TTA setups.
Bugfixes
--------
- ldh_ldhu_ldq_ldqu_ldw_sth_stq_stw.vhdl did not react to glock properly but
updated a new value too early to the output.
- ttasim -q now simulates half float programs correctly
- some half float comparison routine fixed
- bottom-up-scheduler caused broken schedule with function units with
pipeline resource usage after last port usage (for example stores with
less than store op/cycle bandwidth)
- FPU's were buggy with numbers close to 2.
1.8 June 2013
===================
Notable changes and features
----------------------------
- Support for LLVM 3.3.
- Removed the use of llvm-ld (and its copy in the TCE source tree) for linking
bitcode libraries. It now uses llvm-link instead. Beware: it might require your
Newlib bitcode libraries to be rebuilt.
Code generator improvements
---------------------------
- If conversion enabled in llvm side of the compiler backend.
- Post-pass operand sharing removes some unnecessary operand writes
after instruction scheduler. Does not improve performance, only saves power.
- Computation support for half-precision floats.
- The trigger operand can be now changed on the fly for commutative
operations in case the schedule benefits from it.
Misc. smaller improvements
--------------------------
- Some ALU vhdl implementations in default HDB optimized to be smaller.
- Instruction decoder optimized to be smaller and faster.
- Added shl1add and shl2add operations for faster array indexing calculations
into base operation set and some ALU implementations.
- Estimator can print out info of the found and unfound cost data
with -v (thanks to Jani Boutellier).
- Fixed some issues when using libtce through a dlopen (icd loader or some other
"plugin interface"). When TCE loaded the operation behavior descriptions as
plugins, not all symbols of libtce they needed were found. Now tries to link
the OPBs against libtce whenever possible to mark the dependency.
- generatebits: -v (verbose) parameter now prints amount of full instruction
NOPs and amounts of consecutive 2, 3 and 4+ full instruction NOP groups
- Support for Boost versions up to 1.53.0 (1.42.0 is the oldest
tested version).
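To illustrate why shl1add and shl2add help with array indexing, here is a
behavioural sketch. It assumes that shl1add(a, b) computes (a << 1) + b and
shl2add(a, b) computes (a << 2) + b on 32-bit values; the operand order and
the helper element_address are assumptions made for this example and should
be checked against the operation descriptions in base.opp.

```python
MASK32 = 0xFFFFFFFF


def shl1add(a: int, b: int) -> int:
    return ((a << 1) + b) & MASK32


def shl2add(a: int, b: int) -> int:
    return ((a << 2) + b) & MASK32


def element_address(base: int, index: int, elem_size: int) -> int:
    """Address of array[index] for 2-byte or 4-byte elements, in one operation."""
    if elem_size == 2:
        return shl1add(index, base)
    if elem_size == 4:
        return shl2add(index, base)
    raise ValueError("only 2-byte and 4-byte elements are modelled")
```

Without the fused operations the same address takes a separate shift and add.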
Misc. changes
------------
- -lcpp parameter needed when compiling/linking c++ programs.
- tcecc: --emit-llvm not the default anymore
- tcecc: --sequential-schedule now passes -O0 to the llvmtce only to
enable aggressive LLVM optimization combined with a sequential
schedule.
Hardware Database Additions
---------------------------
- 2-wide SIMD units for half floats
Usability features
------------------
- HDBEditor: Double-clicking some lists in the "Add Implementation"-dialog
now opens editing the list item. (no need to press the "edit"-button)
- ProDe: User can add guards for all indices of a register file at once in the
"Register File Guard"-dialog
- ProDe: Multiselection and deletion enabled for register file guards in the
"Bus"-dialog
- ProDe: Added a UI element to assign IDs for address spaces in the
"Address Space"-dialog
- ProDe: Ordering of register file guard list changed to be more user
friendly in the "Bus"-dialog
- ProDe: When .idf is saved to a file, all file paths (.hdb, .vhdl, etc.)
pointing under the current working directory are made relative paths,
meaning that the absolute path above current working directory is cut off.
- ProDe: When loading an IDF file, implementation files are checked if they
exist under absolute path or current working dir. Files that are not found,
are searched under default search paths and if a file is found from there,
the file path is fixed to that location. User is prompted about this.
- ProDe: Cached HDB files are now monitored for changes, and they update
correctly to the list of FUs/RFs that can be added from HDB.
- ProDe: When saving IDF, default save location is the directory of the ADF.
- ProDe: Automatic implementation selection added to "Processor
Implementation"-dialog.
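The IDF path handling described above (paths under the current working
directory are stored as relative paths) can be sketched like this. The
function name make_portable is hypothetical, POSIX-style paths are assumed
for the example, and ProDe's own implementation is not shown in these notes.

```python
from pathlib import PurePosixPath


def make_portable(path: str, cwd: str) -> str:
    """Return path relative to cwd if it lies under cwd, else unchanged."""
    try:
        return str(PurePosixPath(path).relative_to(cwd))
    except ValueError:
        # Not located under cwd: keep the absolute path as it is.
        return path
```

For example, with cwd /home/u/proj the path /home/u/proj/hdb/my.hdb is
stored as hdb/my.hdb, while a path outside the working directory is kept.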
Bugfixes
--------
- ProGe/PlatformIntegrator: imem_mau_pkg.vhdl is now prepended with top
level entity name like the actual package. (bug: #1063667)
1.7 January 2013
====================
Notable changes and features
----------------------------
- Support for LLVM 3.2. Dropped support for LLVM 3.0.
- Initial support for half-precision floats.
- OpenCL host-tta-device mode simulation using pocl's ttasim driver.
Misc. smaller improvements
--------------------------
- GrowMachine explorer now copies the guards when it duplicates
buses.
- Several vector operations such as vector load/stores added to the
base operation set.
- Added support for vectors of 8- and 16-bit numbers in the vector backend.
- tcecc now compiles big basic blocks faster.
- tcecc --analyze-instruction-patterns dumps the operation graph just
after instruction selection. This is helpful in finding custom operation
candidates that are automatically selectable.
- ConnectionSweeper now starts from the loosest worsening threshold
and makes it gradually more strict. This leads to finding the
least-connected machines first. Also the default sweep mode is
more coarse grained: it tries to remove all connections (RF or
bypasses) of a bus at once, not one by one.
- explore: can now list plugin parameter descriptions (-p).
- Parts of the user manual revised by Dr. Erno Salminen. Thanks to Erno
for his contributions.
- Added a RawData data type to OSAL. This should be used as the operand data
type for all load/store/io methods which just transfer raw data which can be
of any type. If some other data type is used, data corruption may occur when
calling these operations as custom operations.
- DefaultDecoderGenerator produces slightly smaller and faster decoders.
- Added optimized ALUs to the asic_130nm_1.5V database.
Code generator improvements
---------------------------
- min, minu, max, maxu, minf, maxf support in codegen.
Now the compiler can use these operations automatically, if
available in the architecture.
Usability features
------------------
- HDBEditor: Source files section of FU Implementation dialog has now
Move Up/Move Down-buttons for rearranging the (compilation) order
of the files. Thanks to Antti Häyrinen for this contribution.
- ProDe: Editing processor implementation allows double clicking on the
component lists to get to the "select component implementation" dialog,
no need to press the select FU/IU/RF buttons anymore.
Also double clicking implementation in the dialog closes the dialog.
- ProDe: When editing processor implementation, pressing close button
no longer loses all unsaved idf data. If there is unsaved data,
it asks if the user wants to save the idf.
- ProDe: Selecting implementation for FU or RF no longer causes the RF or FU
list to scroll to the beginning.
- Pressing Close on OsED DAG editing dialog no longer loses unsaved data,
it saves all unsaved data.
Major bugfixes
--------------
- __attribute__((aligned)) now works for global variables.
- register to itself move as only instruction in llvm bb could cause
tce compiler to fail.
- explorer: The -f switch to pass compiler options to exploration took only
the first argument. Now takes the whole command line (e.g.: -f'-a -b -c').
- multiple address spaces with reduced connectivity could cause broken
code to be generated or compiler to abort.
- Operations with more than 3 inputs could cause usage of freed memory,
causing compiler failures
- TCEFU macros did not work correctly with memory operations,
the operation could end up in wrong LSU.
- Major simulator memory leaks fixed. Now very long automated DS explorations
are feasible again.
- Failed bypassing could cause compiler to fail on sparsely connected
machines.
- signed immediate with equal amount of bits to instruction address space
(1 too few because of the sign bit) could cause broken code
(negative jump addresses) to be generated
- If long immediate template consisted of multiple slots, long immediate
encoded instruction references were updated incorrectly to the
instruction bit image during the image creation.
- OSAL DAGs with constants work a bit better, smaller chance
to get strange error of invalid pattern.
Misc.
-----
- Removed the deprecated schedule binary and the old scheduler configuration
file framework. Now programs are always compiled with tcecc and all
compiler options are given as command line parameters.
Known problems
--------------
- If more than one read port or more than one write port of an RF
is connected to same bus, but there are other buses which connect only
either of those ports, loading the compiled program may fail or hang.
Workaround: either reduce connections so that all buses connect to
a maximum of only one read port of any RF and one write port of any RF
(it may still have connections to many ports in different RF's), or
add connections so that all buses that connect to some read port or some
write ports in some RF connect to all read or write ports.
1.6 June 2012
=====================
Notable changes and features
----------------------------
- Support for LLVM 3.1.
LLVM 3.0 *might* still work but is unsupported and some features
do not work with llvm 3.0.
Dropped support for LLVM 2.9.
- ProGe: Experimental support for Verilog and a small Verilog hardware
database (mixed_hdl.hdb) supporting minimal.adf. Thanks to
Vinogradov Viacheslav for contributing this!
- Support for the address_space attribute to allow using multiple
separate memories from C code.
- Simplified C++ interface to the processor simulation engine for making it
easier to build C++-based system simulation models.
- ADFCombiner: an explorer plugin to generate clustered-style TTA
machines from two input ADFs.
Usability features
------------------
- dumptpef -m has now more user friendly output (thanks to Kalle
Raiskila).
- OSAL files are searched also from environment variable TCE_OSAL_PATH.
Smaller features
----------------
- OSAL.hh: FU_NAME macro for accessing the name of the function
unit the operation is executed in. The standard STREAM operations
now use filenames FU_NAME.in and FU_NAME.out instead of the fixed
names. This allows easier simulation of TTAs with multiple
I/O FUs. By Jani Boutellier from University of Oulu.
- Allow program.ll (a fully linked LLVM assembly file) as an option for
program.bc as an input file for the DSExplorer applications.
Experimental features
---------------------
- Support for vector input to the compiler. Code is generated from
LLVM vector instructions by combining multiple TCE registers into vector
registers and mapping the incoming LLVM vector regs to them.
Allows using wide loads (ldw2, ldw4, ldw8) and stores (stw2, stw4, stw8)
to load and store these vectors. This can be enabled with parameter
--vector-backend. This feature requires predefined register file naming
to identify the "lanes", as generated with the ADFCombiner clustered machine
generator plugin. Works only with LLVM 3.1.
- Added experimental bottom-up-instruction scheduler. Can be enabled with
--bottom-up-scheduler command line parameter.
Misc. code generator improvements
---------------------------------
- Immediates marked as rematerializable; This should reduce spilling as
values that come from immediates are not spilled to stack.
- Registers can now be renamed to different RF during scheduling.
This is however not yet used to reduce number of register-to-register copies
on sparsely connected machines.
- Compiler can now make floating point constant values to be used from
immediates instead of always putting them to constant pool.
- No longer floating point aliases for 32-bit registers on llvm backend;
The same registers now belong to both i32 and f32 register classes.
Notable bugfixes
----------------
- tcecc: Missing antidependence edges could cause broken schedule when
bypasser used on partially connected machines.
- ttasim: Try to simulate machines with unconnected ports (only a warning
is printed in such cases).
- tcecc: some scheduling errors didn't fail gracefully
https://bugs.launchpad.net/tce/+bug/894816
- hdb: fpu_sp_mul.vhdl didn't freeze the pipeline with glock
https://bugs.launchpad.net/tce/+bug/942551
- Removed one broken fu architecture from hdb;
There was no implementation for it anyway
- ttasim: In compiling mode failed to generate simulation code for
operation with some dags.
- compatibility fix for Fedora 16's default libedit
(thanks to Vinogradov Viacheslav)
- tcecc: Scheduling of operations with more than 2 inputs sometimes failed.
1.5 December 2011
====================
Notable changes and features
----------------------------
- Support for LLVM 3.0.
LLVM 2.9 might still work but is unsupported.
Dropped support for LLVM 2.8 and older.
- Experimental OpenCL C Embedded Profile support in offline compilation
mode (we call it the OpenCL "standalone mode").
- tcecc: Floating point emulation code is not included by default anymore,
use --swfp in case you use floating points and your machine does
not support them.
- bclib: added a Light Weight PRinting library. Small functions useful
for debug printouts.
- Support for calling custom operations to be executed in specified
function unit (e.g. _TCEFU_ADD("ALU8", A1_Cb, A1_Cr, result2)).
Thanks to Hervé Yviquel for the patch.
- Generalizations to the architecture description format to allow
using the instruction scheduler for operation triggered architectures.
The Cell SPU is the proof of the concept architecture which can be
scheduled for out-of-the-box with LLVM 3.0 (see tcecc-spu).
Bugfixes
--------
- HDB: Hardware bug fix for load-store units in hibi_adapter.hdb and
stratixII.hdb.
Global lock signal might cause pipelined load result to be ignored.
- tcecc: Scheduler could sometimes fail to schedule on sparsely connected
machines.
- OSEd: OSEd crashed when selecting an operation for which there was no
simulation model in an otherwise valid .opb.
- OSEd: Reload modified simulation functions from a rebuilt simulation
function module (.opb) (Bug 179).
- OSEd: Renaming an operation might cause osed to crash.
- OSEd: Checking whether an operation with the same name already exists was broken.
- OSEd: When new operation is created, the DAG can now be edited immediately
without false error messages about missing outputs.
- ProDe: Bit width calculation of address spaces was incorrect if max-address
was power of 2.
- tcecc: standard libcalls are now converted to cheaper ones again using
the llvm -simplify-libcalls (e.g. printf("foo\n") -> puts("foo")).
This was broken due to adding -fno-libcall switch as default. Now it's
added only while building Newlib.
- Build system: Fixed build when --as-needed is used as link flag by some of
the libraries.
- 1-bit global constants had invalid size calculation. This could cause
compiler to fail to write program.
- tcedisasm: the starting address of data section initialization
output was computed wrongly.
- generatebits: MIF data image output had a rounding error which led to
missing data words at the end of image in case the number of words was
not divisible with the row width.
- OSAL DAG language had broken illegal recursive dag detection, which
resulted in some legal DAGs not being used. This happened in cases
where same operation was used multiple times inside a dag.
- OSAL: Added check of operation DAGs which do not write to some output
operand. Refuse to load this kind of broken DAGs.
- OSAL DAG language could not recursively use smaller patterns as
part of bigger patterns in instruction selection.
- tcecc: fail with an error in case the compiled program uses dynamic
stack objects (not yet supported by tcecc) instead of silently
producing invalid code.
- Do not save the backend plugins to disk while running the design
space exploration. This caused disk space fillup with long explorations
and small hard disks.
- Proxim: clicking OK on the options dialog crashed Proxim in case a simulation
was not initialized.
- ProDe: fixed a crash when checking programmability on a machine with more
than 1 boolean registers/no boolean register files.
- tcecc: On some platforms an exception thrown because of a missing symbol
(usually from a call to a function not linked in) crashed at the LLVM/TCE
library boundaries. Moved the exception handling closer to the call position
to produce a graceful error message printout for this case.
- Proxim: the configuration file was not saved to the correct location in
the user home dir.
- tcecc: fixed an issue compiling multiple source code in the same
command line with the same basename (but with a different suffix or
directory).
Code generator improvements
---------------------------
- Introduced jump with negative guard to llvm. This makes llvm's
BranchFolding pass generate more sane CFGs, and should result
in slightly better code being generated.
- Can rename registers during scheduling
- Does not save return address to stack in leaf functions.
- Alias analysis of LLVM is now exploited in DDG building to improve
parallelization.
- TCE instruction scheduler CFG is now generated directly from the LLVM CFG.
The old "builder" that builds the TCE CFG from a "flat" program
representation can still be used with --old-builder parameter, but this
disables also some other new features and will be removed in the next
TCE release.
- A major reorganization of the phases in the compiler backend. The memory
consumption of the compiler should be now smaller, but compile time for
small programs longer. NOTE: The old scheduler configuration system is now
deprecated (not used with the default tcecc options) and will be removed in
the next release.
Smaller features
----------------
- ProGe: switch -s that can be used to define a separate directory for
files that are potentially shared between multiple TTA processors in
the same (heterogeneous) TTA multicore design.
- ProGe and PIG: the string given with --entity-name is now used to make
the generated VHDL entity etc. names unique to allow easier
instantiation of multiple TTA cores in designs.
- tcecc: support for LLVM assembly files (.ll) as input.
Thanks to Hervé Yviquel for the patch.
- ProGe: test bench generation is now disabled by default, use '-t'
to generate the test bench.
Thanks to Hervé Yviquel for the patch.
- ProGe: HDL-file compilation order in Modelsim compilation script is
now fixed. Thanks to Vinogradov Vyacheslav for the patch
- tcedisasm now outputs to filename.tpef.S by default.
- generate_cachegrind now uses line numbering and counts NOPs per
instruction in case an assembly file is present as foobar.tpef.S.
- Generatebits prints out info about the imem usage and instruction compression
with the verbose flag (-v).
- tcecc: added switches --bypass-distance, --bypass-distance-nodre and
--no-kill-dead-results to control the software bypassing aggressiveness and
the dead result elimination.
- explorer: added switch --compiler_options="XYZ" to pass XYZ to tcecc when
calling it during exploration.
Usability features
------------------
- ProGe: Reasonable error message when implementation for some FU is invalid.
External interface changes
--------------------------
- OSAL.hh: removed RUNTIME_ERROR_WITH_DATA as it's a too specific helper
for OSAL API. Let's keep it minimal and clean.
Documentation
-------------
- Documented the different "datapath connectivity levels" and
their support in TCE.
- Added an "Unsupported C Language Features"-section.
- Added and fixed documentation on the floating point TTA designs.
- Added some documentation about the dialog used to define operation operand
bindings and timings to a function unit in ProDe.
- Added hints about avoiding the most common bottlenecks on TTA designs with
the current TCE compiler.
- Added some documentation for the OpenCL support.
1.4 April 2011
====================
Notable new features
--------------------
- Support for LLVM 2.9.
LLVM 2.7 and 2.8 unsupported but might still work, see below for known
problems. We strongly recommend upgrading to LLVM 2.9.
- OpenCL Embedded compliant FPU implementations by Timo Viitanen / TUT
- Generic VHDL implementations for the basic streaming operations
from Jani Boutellier / University of Oulu.
- ConnectionSweeper IC network exploration algorithm.
Optimizes the IC network by sweeping the buses of the machine and
removing the least important connections first until a cycle count
worsening threshold is reached. Tries to remove RF connections
first as they are usually more expensive than the bypass connections.
- Added --pareto_set switch to the explorer for printing pareto efficient
configurations. Currently supports the connectivity and cycle count as
the quality metrics.
- proge: IP-XACT support updated to version 1.5
- Added switch --print-resource-constraints to tcecc to assist in
deciding which resources to add to the machine to improve the
schedule. Dumps DDGs to dot files along with dependence and
resource constraint analysis data.
Code generator improvements
---------------------------
- Passes the first function parameter in register instead of stack.
- Uses negative guard more aggressively, less stupid guard xoring operations.
- Emulation pattern generation improved, can use immediates directly when
using DAG to emulate missing operations.
- Some other minor pattern improvements leading to slightly better code
on some situations.
- Alias analysis improvements, understands that register spills to stack
cannot alias with other memory operations
- Software Bypasser is much more aggressive.
Optimizations
-------------
- tcecc: Decreased scheduling time.
- tcecc: Decreased memory usage.
- ttasim: Compiled simulation (-q) can correctly simulate machines with
guard latency higher than 1. Simulating such machines no longer makes
the simulator revert to interpreting mode.
Smaller features
----------------
- tcecc: Reasonable error message if disk space runs out during TPEF writing.
- ttasim: Refuses to simulate a program that moves a too wide immediate to
a too narrow jump/PC port in the control unit. It would result in wrong