<!DOCTYPE html>
<html>
<head>
<title>Eduardo José Gómez Hernández</title>
<meta content="text/html; charset=UTF-8" http-equiv="content-type">
<meta charset="UTF-8">
<link rel="canonical" href="https://www.edujgh.net/news.html" />
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
.menu ul {
list-style-type: none;
margin: 0;
padding: 0;
overflow: hidden;
background-color: #333;
position: -webkit-sticky; /* Safari */
position: sticky;
width: 100%;
}
.menu li {
float: left;
border-right: 1px solid #bbb;
}
.menu li:last-child {
border-right: none;
}
.menu a {
display: block;
color: white;
text-align: center;
padding: 14px 16px;
text-decoration: none;
}
.menu li a:hover {
background-color: #111;
}
.menu li .active {
background-color: #4CAF50;
}
header h1 {
margin: 1em 0 0.5em 0;
color: #343434;
font-weight: normal;
font-family: 'Ultra', sans-serif;
font-size: 36px;
line-height: 42px;
text-transform: uppercase;
text-shadow: 0 2px white, 0 3px #777;
}
.note {
position: relative;
padding: 1em 1.5em;
margin: 2em auto;
color: #fff;
background: #97C02F;
overflow: hidden;
margin: 1em;
}
.note:before {
content: "";
position: absolute;
top: 0;
right: 0;
border-width: 0 16px 16px 0;
border-style: solid;
border-color: #fff #fff #658E15 #658E15;
background: #658E15;
-webkit-box-shadow: 0 1px 1px rgba(0,0,0,0.3), -1px 1px 1px rgba(0,0,0,0.2);
-moz-box-shadow: 0 1px 1px rgba(0,0,0,0.3), -1px 1px 1px rgba(0,0,0,0.2);
box-shadow: 0 1px 1px rgba(0,0,0,0.3), -1px 1px 1px rgba(0,0,0,0.2);
/* Firefox 3.0 damage limitation */
display: block; width: 0;
}
.note.rounded {
-moz-border-radius: 5px 0 5px 5px;
border-radius: 5px 0 5px 5px;
}
.note.rounded:before {
border-width: 8px;
border-color: #fff #fff transparent transparent;
-moz-border-radius: 0 0 0 5px;
border-radius: 0 0 0 5px;
}
.note header {
font-size: 175%;
text-decoration: underline;
}
img.profile {
border-radius: 30%;
-webkit-box-shadow: 2px 2px 5px 0px rgba(0, 0, 0, 1);
-moz-box-shadow: 2px 2px 5px 0px rgba(0, 0, 0, 1);
box-shadow: 2px 2px 5px 0px rgba(0, 0, 0, 1);
}
ul li a img {height: 1.5em;}
pre {
background: #303030;
color: #f1f1f1;
padding: 10px 16px;
border-radius: 2px;
border-top: 4px solid #5895fc;
border-bottom: 4px solid #5895fc;
-moz-box-shadow: inset 0 0 10px #000;
box-shadow: inset 0 0 10px #000;
counter-reset: line;
}
pre span {
display: block;
line-height: 1.5rem;
}
pre span:before {
counter-increment: line;
content: counter(line);
display: inline-block;
border-right: 1px solid #ddd;
padding: 0 .5em;
margin-right: .5em;
color: #888
}
hr {
border: 3px solid #f9c7fc;
border-radius: 5px;
}
</style>
</head>
<body>
<header>
<h1>Eduardo José Gómez Hernández</h1>
<nav class="menu">
<ul>
<li><a href="index.html">Biography</a></li>
<li><a href="research.html">Research</a></li>
<li><a href="education.html">Education and Experience</a></li>
<li><a href="teaching.html">Teaching</a></li>
<li><a href="students.html">Students</a></li>
<li><a href="reviewer.html">Reviewer</a></li>
<li><a href="tools.html">Tools</a></li>
<li><a href="news.html">News</a></li>
<li><a class="active" href="#">Blog</a></li>
</ul>
</nav>
</header>
<br>
This blog is no longer a mirror. It is now the main location of the blog :)
<br>
<h2>by Topic</h2>
<ul>
<li> TheZZAZZGlitch's April Fools Event
<ul>
<li><a href="#zzazz23-comp">Finally!!! I have implemented something very close to a compiler</a></li>
</ul>
</li>
<li> Splash-4
<ul>
<li><a href="#splash4-1">SPLASH-4 Article #1: Introduction - Road to Splash-4.1</a></li>
<li><a href="#splash4-2">SPLASH-4 Article #2: FFT extra barriers and prefetch</a></li>
<li><a href="#splash4-3">SPLASH-4 Article #3: CLOCK is not enough to measure time</a></li>
<li><a href="#splash4-4">SPLASH-4 Article #4: Old-style data types</a></li>
<li><a href="#splash4-5">SPLASH-4 Article #5: A special version Released - Splash-4.0.1</a></li>
</ul>
</li>
<li> Random
<ul>
<li><a href="#welcome">Welcome!!</a></li>
</ul>
</li>
</ul>
<hr>
<h2>by Date</h2>
<h3>2023</h3>
<ul>
<li>October
<ul>
<li><a href="#splash4-5">SPLASH-4 Article #5: A special version Released - Splash-4.0.1</a></li>
</ul>
</li>
<li>May
<ul>
<li><a href="#zzazz23-comp">Finally!!! I have implemented something very close to a compiler</a></li>
<li><a href="#splash4-4">SPLASH-4 Article #4: Old-style data types</a></li>
</ul>
</li>
<li>April
<ul>
<li><a href="#splash4-3">SPLASH-4 Article #3: CLOCK is not enough to measure time</a></li>
<li><a href="#splash4-2">SPLASH-4 Article #2: FFT extra barriers and prefetch</a></li>
</ul>
</li>
<li>February
<ul>
<li><a href="#splash4-1">SPLASH-4 Article #1: Introduction - Road to Splash-4.1</a></li>
<li><a href="#welcome">Welcome!!</a></li>
</ul>
</li>
</ul>
<hr>
<div class="blogpost" id="splash4-5">
<h2>SPLASH-4 Article #5: A special version Released - Splash-4.0.1</h2>
<p>After so long, a new article is out :)</p>
<p>During the development of another project, a question arose about the last lock that remains in Volrend. So, we decided to tackle it and, if the results were relevant enough, build another version.</p>
<p>The critical section is the following one:</p>
<pre><code> LOCK(Global->CountLock);
printf("%3ld\t%3ld\t%6ld\t%6ld\t%6ld\t%6ld\t%8ld\n",my_node,frame,exectime,
exectime1,num_rays_traced,num_traced_rays_hit_volume,
num_samples_trilirped);
UNLOCK(Global->CountLock);</code></pre>
<p>This section is used to print the progress of the application. Its output could be used to determine whether the application ran successfully. However, because the order of the lines was random, it was never used for this purpose; the output TIFF was used instead.</p>
<p>The new version moves this data to a shared structure, which is later printed by thread 0 without the need for extra synchronization (there was already a barrier after the code mentioned above).</p>
<pre><code> Global->progress[my_node].frame = frame;
Global->progress[my_node].exectime = exectime;
Global->progress[my_node].exectime1 = exectime1;
Global->progress[my_node].num_rays_traced = num_rays_traced;
Global->progress[my_node].num_traced_rays_hit_volume = num_traced_rays_hit_volume;
Global->progress[my_node].num_samples_trilirped = num_samples_trilirped;</code></pre>
<p>This new version is named "<code>Volrend No Print Lock</code>".</p>
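<p>The idea can be sketched in plain C as follows. This is a simplified, single-threaded illustration and not the actual Volrend code: <code>MAX_NODES</code>, <code>GlobalData</code>, <code>record_progress</code> and <code>format_progress</code> are hypothetical names, only three of the counters are kept, and the barrier between both steps is not shown.</p>

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NODES 64  /* hypothetical upper bound on the number of threads */

/* One slot per thread: each thread writes only its own entry, so no lock is needed. */
typedef struct {
    long frame;
    long exectime;
    long num_rays_traced;
} Progress;

typedef struct {
    Progress progress[MAX_NODES];
} GlobalData;

/* Clears every slot before the threads start. */
void init_progress(GlobalData *g) {
    memset(g, 0, sizeof *g);
}

/* Called by every thread before the barrier, in place of the locked printf. */
void record_progress(GlobalData *g, long my_node, long frame,
                     long exectime, long num_rays_traced) {
    assert(my_node >= 0 && my_node < MAX_NODES);
    g->progress[my_node].frame = frame;
    g->progress[my_node].exectime = exectime;
    g->progress[my_node].num_rays_traced = num_rays_traced;
}

/* Called by thread 0 after the barrier: formats all slots in node order.
 * Returns the number of characters stored in out (output is truncated to fit). */
size_t format_progress(const GlobalData *g, long num_nodes,
                       char *out, size_t out_size) {
    size_t used = 0;
    if (out_size == 0) return 0;
    out[0] = '\0';
    for (long node = 0; node < num_nodes; node++) {
        const Progress *p = &g->progress[node];
        int n = snprintf(out + used, out_size - used, "%3ld\t%3ld\t%6ld\t%6ld\n",
                         node, p->frame, p->exectime, p->num_rays_traced);
        if (n < 0) break;                       /* encoding error */
        if ((size_t)n >= out_size - used) {     /* line did not fit */
            used = out_size - 1;
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```

<p>Because the slots are indexed by thread number, the lines come out in thread order no matter which thread finished first.</p>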
<p>With this last change, we announce the new version Splash-4.0.1:</p>
<p>25 October 2023: Release Splash-4.0.1</p>
<ul>
<li>NEW: A new version for Volrend, Volrend-no_print_lock. The progress printing lock at adaptative.c.in has been replaced with a memory location. After the barrier sync, thread 0 prints all the progress in order.<br>
Besides the reduction in unnecessary synchronization, the output is always in thread+frame order.</li>
<li>FIX: The CLOCK macro has been replaced with a high-resolution timer; more info at issue #2.</li>
<li>MINI: SPLASH3_ROI_BEGIN and SPLASH3_ROI_END have been replaced with SPLASH4_ROI_BEGIN and SPLASH4_ROI_END.</li>
</ul>
<p>Remember, the new version is available at the repository: <a href="https://github.com/OdnetninI/Splash-4" target="_blank">https://github.com/OdnetninI/Splash-4</a></p>
<p>Best Regards, OdnetninI</p>
</div>
<hr>
<div class="blogpost" id="zzazz23-comp">
<h2>Finally!!! I have implemented something very close to a compiler</h2>
<p>I am still preparing two posts explaining why I did this, but as they are not ready yet, let me give you a very small summary:</p>
<p>Every year, the YouTuber "TheZZAZZGlitch" (a.k.a. zzazz) creates a challenge named "<code>TheZZAZZGlitch's April Fools Event</code>". Typically, this event runs on a Game Boy or Game Boy Advance with an unmodified Pokemon ROM and a custom save file. There is an exploration part, which anyone can play and solve without technical knowledge. But there is also a secondary part that requires understanding several topics: Reverse Engineering, Networking, Assembly (LR35902, ARM), Quick prototyping, among many others. I have been participating every year since 2017, most of them teamed up with my friend Radixan.</p>
<p>However, this year was special: zzazz did not have enough time to prepare a full challenge, as they got a new job at Microsoft and are preparing cybersecurity conferences, so they made something closer to a Capture The Flag (CTF) challenge. This time, there was an invented 16-bit architecture with its own instructions, applications, environment...</p>
<p>Okay, but how is a compiler related to a CTF challenge?</p>
<p>Well, after the challenge finished, we were waiting for zzazz to release the source code of the challenge, like they usually do. But there was no luck, only the leaderboard got released.</p>
<p>So, I made my own. I have implemented 99% of the challenge. I will not go into details, as I am describing everything in another post, but in summary, some information is missing that we were not able to recover before the challenge server died. However, it is possible to emulate enough of it to make the Flag available.</p>
<p>The point is that, during the development of the replica of the challenge, I wanted to have some tools to assemble and disassemble the binaries of the challenge.</p>
<p>The Disassembler was easy: we had already written it during the challenge itself. It required some tinkering and refactoring, but it worked. The main issue was the Assembler. I didn't want just a simple assembler that processes the input line by line; I wanted to have labels, constants, string reallocation, expressions... Just some things that make life easier for developers.</p>
<p>As the challenge did not have a specific assembly language, I invented my own, based on x86, ARM, Z80, 6502, ... everything I already know.</p>
<p>I split the challenge of making the assembler into two steps:</p>
<ul>
<li>Make a small, dirty version that just works, even if instructions need to change to make my life easier (like "<code>movi</code>" instead of the generic "<code>mov</code>" to indicate that we are using a number instead of a register).</li>
<li>When it works, refine the assembly language and create an assembler that has at least 4 modules: Lex, Grammar, Code, Symbols.</li>
</ul>
<p>The first version was a nightmare: a lot of small tricks and hardcoded things to make it work. But it did :)</p>
<ul>
<li>This assembler: <a href="https://github.com/OdnetninI/zzazz-2023-server/blob/e36cdcb5914540a29fc3363883eb513ecbe73376/tools/assember/src/assembler.c" target="_blank">https://github.com/OdnetninI/zzazz-2023-server/blob/e36cdcb5914540a29fc3363883eb513ecbe73376/tools/assember/src/assembler.c</a></li>
<li>was able to compile this program: <a href="https://github.com/OdnetninI/zzazz-2023-server/blob/e36cdcb5914540a29fc3363883eb513ecbe73376/tools/assember/13337.boot.asm" target="_blank">https://github.com/OdnetninI/zzazz-2023-server/blob/e36cdcb5914540a29fc3363883eb513ecbe73376/tools/assember/13337.boot.asm</a></li>
</ul>
<p>It worked.</p>
<p>The second version was a bit better (<a href="https://github.com/OdnetninI/zzazz-2023-server/tree/main/tools/assember" target="_blank">https://github.com/OdnetninI/zzazz-2023-server/tree/main/tools/assember</a>); however, some things are still done the quick way for faster development. For example, the code does not differentiate between "<code>call</code>" and "<code>caZZ</code>": it just assumes that if a word starts with the letter 'c', an 'a' follows it, and it is 4 characters long, then it is "<code>call</code>".</p>
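<p>To make that shortcut concrete, here is a minimal sketch of such a check next to a strict one. It is not the code from the repository; <code>looks_like_call</code> and <code>is_call</code> are hypothetical names used only for illustration.</p>

```c
#include <stdbool.h>
#include <string.h>

/* Shortcut check: only the length and the first two letters are inspected,
 * so any 4-character word starting with "ca" is accepted. */
bool looks_like_call(const char *token) {
    return strlen(token) == 4 && token[0] == 'c' && token[1] == 'a';
}

/* Strict check, for comparison: the whole mnemonic must match. */
bool is_call(const char *token) {
    return strcmp(token, "call") == 0;
}
```

<p>With these two functions, "<code>caZZ</code>" passes the shortcut check but fails the strict one, which is exactly the trade-off described above.</p>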
<p>But, after finishing some parts, adding more stuff, refining things, creating the symbol tables, allocating space, resolving symbols, doing the grammar matching... it ended up being a compiler, a very simple one.</p>
<p>It is full of bugs and things that can be improved, and better decisions could have been made, but in the end, it works. I have tried so many times to implement a compiler (I even had a subject about it during my Bachelor's), but this is the first time I was able to implement something that works, done completely by myself, and I am proud of it.</p>
<p>For some of you, this could be nothing, but for me, it was huge.</p>
<p>Please, check the code, and feel free to submit your improvements. The only requirement is that the already existing "<code>.asm</code>" files generate files that are binary identical to the challenge ones. I have some improvements in mind, but I need motivation to implement them.</p>
<p>This is all for this time.</p>
<p>Best Regards, OdnetninI</p>
</div>
<hr>
<div class="blogpost" id="splash4-4">
<h2>SPLASH-4 Article #4: Old-style data types</h2>
<p>Welcome to the next Splash-4.1 article.<br>
Sorry for not writing in a while, but I am preparing a lot of things, so I have not had much free time.</p>
<p>Before C99 standardized fixed-width integer types, it was very common to have large sections declaring custom types on top of the built-in ones:</p>
<pre><code>// Extracted from raytrace
typedef char CHAR;
typedef char S8;
typedef unsigned char UCHAR;
typedef unsigned char U8;
typedef short SHORT;
typedef short S16;
typedef unsigned short USHORT;
typedef unsigned short U16;
typedef long INT;
typedef unsigned long UINT;
typedef unsigned long BOOL;
typedef long LONG;
typedef long S32;
typedef unsigned long ULONG;
typedef unsigned long U32;
typedef float FLOAT;
typedef float R32;
typedef double DOUBLE;
typedef double R64;
typedef double REAL;</code></pre>
<p>In these cases, we have several possible solutions. The first one is to fix those typedefs with the right data type, which is the easiest solution. However, even though it is more time-consuming, it is better to use the correct data types in the code and remove the typedefs completely.</p>
<p>It is easy to think that a simple search and replace across the full code will solve the issue. This is far from the truth.</p>
<p>Now, it is important to check all the uses of those types and replace some of them with better alternatives. For example, some variables can only be positive, so no need for them to be signed. Other variables can use fewer bits. There is a full set of changes that can be applied to optimize not only memory usage but also CPU usage.</p>
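<p>As a small illustration of the kind of change meant here, the sketch below contrasts an old-style typedef with the standard types from <code>stdint.h</code> and <code>stdbool.h</code>. The <code>OldPoint</code> and <code>NewPoint</code> records and the <code>count_visible</code> function are made up for this example and do not come from raytrace.</p>

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Old style: the name promises 32 bits, but the real width of long depends on
 * the platform (it is 64 bits on 64-bit Linux, for example). */
typedef long S32;
typedef unsigned long BOOL;

/* Hypothetical record written with the old typedefs... */
typedef struct { S32 x; S32 y; BOOL visible; } OldPoint;

/* ...and the same record with standard types whose meaning is fixed. */
typedef struct { int32_t x; int32_t y; bool visible; } NewPoint;

/* int32_t is exactly 32 bits wide wherever it exists. */
_Static_assert(sizeof(int32_t) * CHAR_BIT == 32, "int32_t must be 32 bits wide");

/* Counts the visible points; a count can never be negative, so size_t fits. */
size_t count_visible(const NewPoint *points, size_t n) {
    size_t visible = 0;
    for (size_t i = 0; i < n; i++) {
        if (points[i].visible) visible++;
    }
    return visible;
}
```

<p>On platforms where <code>long</code> is 64 bits wide, the new record is also half the size of the old one, which is the memory saving mentioned above.</p>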
<p>Let me include an honorable mention: <code>"bool"</code>. In the past, the standard had no boolean type, so most people used unsigned or signed 8-bit variables instead, often with the help of macros. Sometimes longer types were used, and sometimes there were no macros at all. But nowadays, the <code>"stdbool"</code> header solves this issue. However, even today, there are reasons why some programmers prefer to avoid the <code>"bool"</code> type, but that is a story for another time.</p>
<p>Nowadays, typedef is used for reducing the size of data type names (from <code>"struct MyType"</code> to <code>"MyType"</code>), giving more meaningful names (from <code>"unsigned int"</code> to <code>"Index"</code> or <code>"index_t"</code>), and when the data is meant to be changed by the user (from <code>"float"</code> to <code>"Element_Type"</code>).</p>
<p>To conclude, this is the programmer's decision. But the reasons to use typedefs have changed due to standardization, language evolution, and the influence of other languages and programmers. Everything evolves.</p>
<p>Best Regards, OdnetninI</p>
</div>
<hr>
<div class="blogpost" id="splash4-3">
<h2>SPLASH-4 Article #3: CLOCK is not enough to measure time</h2>
<p>Hello again, folks,</p>
<p>I got another request in my email asking if I had any clue why the benchmarks were reporting <code>"0"</code> or <code>"nan"</code> execution times.</p>
<p>I had also noticed it, but never cared about it, as all my measurements were done using the ROI with my custom code.</p>
<p>But, if I want to enable more people to use the Splash-4 benchmarks as a replacement for Splash-2, Splash-2X, or Splash-3, I have to fix these issues. Note that in Splash-4.1, these statistics will not exist as the synchronization point is slower than the code itself. I am still thinking of a way of solving this question.</p>
<p>Going back to the issue mentioned in the email. Why is the timer reporting <code>"0"</code> or <code>"nan"</code>?</p>
<p>Well, <code>"nan"</code> can appear when there is a division by zero (more precisely, when zero is divided by zero). So let's check if this is the case.</p>
<pre><code>CLOCK(initdone);
...
CLOCK(finish);
...
Global->totaltimes[MyNum] = finish-initdone;
...
((double)Global->transtimes[0])/Global->totaltimes[0]</code></pre>
<p>So, let's check how the CLOCK macro works</p>
<pre><code>m4_define(CLOCK, `{long time(); ($1) = time(0);}')</code></pre>
<p>The <code>"time()"</code> C call, as described in its man page (section 2):</p>
<p><code>time() returns the time as the number of seconds since the Epoch, 1970-01-01 00:00:00 +0000 (UTC).</code></p>
<p>So, the problem is easy to understand: the measured parts take less than 1 second to execute, so the start and end timestamps are equal and the measured times are zero. And when a measurement happens to cross the boundary between two seconds, the benchmarks print <code>"0"</code> instead of <code>"nan"</code>.</p>
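<p>The effect can be reproduced in isolation. The sketch below mirrors the shape of the expression from the benchmark, assuming IEEE 754 floating point (which is what current compilers and CPUs use); <code>time_fraction</code> and <code>is_valid_fraction</code> are hypothetical helper names.</p>

```c
#include <math.h>
#include <stdbool.h>

/* Same shape as the benchmark expression: the numerator is cast to double and
 * the integer denominator is then converted to double before dividing. */
double time_fraction(long part, long total) {
    return ((double)part) / total;
}

/* True when the fraction is an ordinary number (neither nan nor infinite). */
bool is_valid_fraction(double fraction) {
    return isfinite(fraction);
}
```

<p>With whole-second timestamps, a run shorter than one second gives <code>time_fraction(0, 0)</code>, which is not a number. If only the total time crosses a second boundary, the call becomes <code>time_fraction(0, 1)</code>, which is a plain zero.</p>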
<p>The main problem in solving this is that the variables used for measuring time are 32 bits wide in most of the benchmarks. Therefore, this will become an issue again in the future. But until I find a better solution, I have replaced the CLOCK macro with this one, which uses a high-resolution timer:</p>
<pre><code>m4_define(CLOCK, `{
struct timeval FullTime;
gettimeofday(&FullTime, NULL);
($1) = (unsigned long)(FullTime.tv_usec + FullTime.tv_sec * 1000000);
}')</code></pre>
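<p>For comparison, this is how a similar timer could look as a plain C function with a 64-bit result. It is only a sketch of one possible alternative, not what Splash-4 uses: it swaps <code>gettimeofday</code> for the POSIX <code>clock_gettime</code> call with <code>CLOCK_MONOTONIC</code>, and <code>now_usec</code> and <code>elapsed_usec</code> are hypothetical names.</p>

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Returns a monotonic timestamp in microseconds, or 0 if the clock cannot be read.
 * A 64-bit counter of microseconds does not overflow in practice. */
uint64_t now_usec(void) {
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        return 0;
    }
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Elapsed time between two timestamps taken with now_usec(). */
uint64_t elapsed_usec(uint64_t start, uint64_t finish) {
    return finish - start;
}
```

<p>A monotonic clock never goes backwards, so the elapsed time cannot become negative if the system time is adjusted while the benchmark is running.</p>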
<p>I cannot point out the exact moment, but in Splash-2 I found both versions of the CLOCK macro, depending on the specific repository I look at. Also, in Splash-3 the problematic CLOCK macro is used.</p>
<p>After some testing, I declare the issue solved, for now.</p>
<p>Best Regards, OdnetninI</p>
</div>
<hr>
<div class="blogpost" id="splash4-2">
<h2>SPLASH-4 Article #2: FFT extra barriers and prefetch</h2>
<p>Hello everyone</p>
<p>Since I started developing Splash-4, I have noticed that, most commonly in the kernel apps, there are several synchronization points that are not needed at all.</p>
<p>After receiving an email asking why some of the barriers were there, as they seem to be doing nothing, I started looking into them again.</p>
<p>One of the first things I noticed is that nearly half of the barriers in FFT were there just to be able to measure the time between different execution parts. However, on today's hardware these benchmarks execute extremely fast, so this synchronization overhead is no longer insignificant.</p>
<pre><code>BARRIER(Global->start, P);
if ((MyNum == 0) || (dostats)) {
CLOCK(clocktime1);
printf("Step 2: %8lu\n", clocktime1-clocktime2);
}</code></pre>
<p>Therefore, after a deep analysis of which parts use data from other cores and which use only their own data, only three barriers are required (four if the reverse FFT is enabled to check the results). That is a great reduction from the original seven mandatory barriers.</p>
<p>Apart from the barriers, before starting computing, each thread tries to prefetch its data using:</p>
<pre><code>TouchArray(x, trans, umain2, upriv, MyFirst, MyLast);</code></pre>
<p>After doing some experiments on three different machines, I noticed that this "prefetch" was slowing down the application a little bit in some cases.</p>
<p>Prefetching could be important in specific architectures, which is why I decided to make it optional using the -y argument when executing the application.</p>
<p>I know this time there are no fancy graphs, but I hope to see you next time ☺</p>
<p>Best Regards, OdnetninI</p>
</div>
<hr>
<div class="blogpost" id="splash4-1">
<h2>SPLASH-4 Article #1: Introduction - Road to Splash-4.1</h2>
<img src="blog/splash4-1-img.webp" alt="Performance difference between Splash-3, Splash-3 with Atomics, Splash-3 with sense reversing Barriers and Splash-4">
<p>Hello everyone</p>
<p>Several months ago, my research group and I released the Splash-4 Benchmark Suite. It was published at IISWC 2022 (DOI <code>"10.1109/IISWC55918.2022.00015"</code>) and is available on GitHub: <a href="https://github.com/OdnetninI/Splash-4" target="_blank">https://github.com/OdnetninI/Splash-4</a></p>
<p>This release was a big update on those very old benchmarks (over 25 years old), significantly reducing the synchronization overhead of running the benchmarks, reducing the execution time and increasing the performance of the applications.</p>
<p>However, even after 3 updates, no one has invested enough time into the applications to modernize the code to today's standards.</p>
<p>To begin with, because most of the benchmarks were written in the early 90s, they follow a very old C programming style. As an example, all the variables of a function are defined at the beginning, creating a lot of lines that are not relevant at that moment when trying to understand the code.</p>
<p>During the development of Splash (or probably Splash-2), the authors introduced the M4 macro system. M4 has several good features and is much more powerful than the C preprocessor, but it also has several drawbacks. When using an external macro system, matching compiler errors with the original source files is not trivial; this is why the C preprocessor emits <code>"#line"</code> directives in its output. Another drawback is that M4 uses positional arguments for macros and never checks the number of parameters.</p>
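<p>To show what a <code>"#line"</code> directive does, here is a tiny standard C sketch. The file name <code>fft.c.in</code> and the line number are chosen only for illustration, and <code>reported_line</code> and <code>reported_file_is</code> are hypothetical helpers.</p>

```c
#include <string.h>

/* From the next line on, the compiler behaves as if it were reading line 100
 * of "fft.c.in", so its diagnostics point at the original source file. */
#line 100 "fft.c.in"
int reported_line(void) { return __LINE__; }
int reported_file_is(const char *name) { return strcmp(__FILE__, name) == 0; }
```

<p>A macro processor that writes directives like this into its output lets compiler errors refer to the file the programmer actually edited instead of the generated one.</p>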
<p>Therefore, the main goal of Splash-4.1 is to modernize the source code to make it easier to read, remove external dependencies (like the M4 macro processor), fix several bugs, and remove synchronization overheads that are not needed. All these changes will allow future researchers to understand and modify the benchmarks as they need, without the hassle of reading code that is nearly 30 years old.</p>
<p>In the following articles, I plan to show some of the steps done, and the reason behind them.</p>
<p>So, I hope to see you around in the following post ☺</p>
<p>Best Regards, OdnetninI</p>
</div>
<hr>
<div class="blogpost" id="welcome">
<h2>Welcome!!</h2>
<p>Hi everyone,</p>
<p>I am OdnetninI (a.k.a. Odi, Odnet or Eduardo), a Ph.D. student at the University of Murcia. I plan to post several series of articles focused on different topics I encounter in my research career (and sometimes also in my life). Hope to see you soon</p>
<p>Best Regards, OdnetninI</p>
</div>
</body>
</html>