/
README
222 lines (153 loc) · 8.11 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
Data Parallel Hands-on exercise: C-ray ray tracing
==================================================
Overview
--------
This directory contains the framework for writing a ray-tracing
program in Chapel. All of the ray tracing complexity that converts a
pixel coordinate into a color value is provided to you, as are utility
routines that can write out BMP or PPM files. All you have to do is
follow the directions below to create an array representing the image
and compute its pixel values.
File List
---------
Important files in this directory are as follows:
README : This file
c-ray.chpl : The main Chapel ray-tracing source file
Image.chpl : A helper module for creating image files
Makefile : A trivial Makefile showing how to compile the program
scene : A simple, default scene file describing four spheres
sphfract : A more complex scene file describing a fractal pattern
scene.bmp : The expected BMP file for the default scene
sphfract.bmp : The expected BMP file for 'sphfract'
If present, the following files are related to the Chapel testing
system, which we use to maintain this code. They can be ignored for
the purposes of the hands-on exercise:
CATFILES
CLEANFILES
c-ray.execopts
c-ray.good
Image.notest
Instructions
------------
STEP 0: Compile and run the framework as it stands
--------------------------------------------------
* Compile 'c-ray.chpl' as it stands (e.g. 'chpl c-ray.chpl').
* Run the resulting program (e.g., './c-ray').
* You should see a new file 'image.bmp'. Viewing this using your
favorite image viewer should show a small all-black rectangle.
* Run the program with --usage to print out a help message describing
the options that will affect its behavior (most of which won't have
any effect until you start ray-tracing).
* Browse the code as much or as little as you'd like.
STEP 1: Declare and write your image array
------------------------------------------
* Find the first "TODO" section in the program and replace it with an
array declaration (and optionally a domain for the array) that
describes the image of pixels to render. The array should store
'yres' x 'xres' pixel elements of type 'pixelType' (a configurable
integer type alias defined in the 'Image.chpl' module).
Note: Graphics people (and this program's '--size' argument)
describe images as "width x height" where Chapel typically uses
"rows x cols". This is why the program often uses (y, x) values
rather than (x, y) — to represent this transposition.
* Find the call to writeImage() at the end of main() and replace its
passing of the 'dummy' array with the one you created (the 'dummy'
array can be removed).
* When you re-compile and run the program now, you should get a black
image whose size corresponds to the --size argument. Try running it
with different values to verify this.
STEP 2: Compute the ray tracing using a serial loop
---------------------------------------------------
* Find the second "TODO" section in the program, bracketed by timer
calls, and replace it with a computation of your image array's
values via calls to 'computePixel()' (pre-defined further down in
the file), storing the results into your array. For this step, try
using a serial loop.
* After re-compiling and re-running, do you get a reasonable image?
* Use the --scene config const to try reading in a more interesting
sccene like 'sphfract'.
* Try experimenting with setting other configuration options on the
command-line to see if they work as expected. See the list of
options by searching on 'config' at the top of the file, or by
running the program with the --usage flag.
* Time how long these renderings take. Recompile with the --fast flag
(intended for performance runs, once a program is working, it will
turn off various execution-time checks and turn on back-end compiler
optimizations). Note these timings for future reference.
STEP 3: Compute the ray tracing using a parallel loop
-----------------------------------------------------
* Try changing your serial loop to a parallel one and make sure your
code still produces the right image.
* What kind of speed improvements do you see over the serial loop?
(with or without --fast?)
STEP 4: Compute the ray tracing using promotion
-----------------------------------------------
* Can you replace the above parallel loop with promotion? Does it affect the
speed at all?
STEP 5 (optional): Try the dynamic load balancing iterator
----------------------------------------------------------
* Ray tracing can be notoriously poorly load balanced since some
pixels result in far more ray bounces than others. Can you achieve
a speed improvement by applying the dynamic() iterator from the
DynamicIters module:
https://chapel-lang.org/docs/modules/standard/DynamicIters.html
Do you need to create a more load-imbalanced scene (or increase the
degree of parallelism?) in order to see a noticeable difference?
**********************************************************************
* The following steps are intended for later in the day, though *
* you're welcome to work ahead if you breezed through the prior steps*
**********************************************************************
STEP 6: Distribute the computation of the image
-----------------------------------------------
* Starting from step 3 or 4, make your array a distributed array by
making its domain a distributed domain. Do you still get the
correct image?
* Do you see additional overhead relative to your previous timings due
to the additional complexity of distributed arrays?
* If you've been working from a laptop, you've probably been compiling
in '--local' mode, which is the default when CHPL_COMM==none (which
is the default for laptops). If you've been working from a Cray or
cluster, you've likely been compiling in '--no-local' mode, which is
the default when CHPL_COMM!=none. Try compiling in both '--local'
and '--no-local' modes to see what kind of overhead is added when
compiling the code for a single vs. multiple locales.
STEP 7: Get set up to run on multiple locales
---------------------------------------------
* One way to run using multiple locales is to run on a Cray or
cluster. This is also the way to (potentially) see parallel speedup
as you run on an increasing number of locales.
* You can also set up your laptop to run multiple locales in an
oversubscribed manner (essentially, each locale will be a distinct
process running on your laptop). This won't result in scalable
performance, but can be a way to debug multi-locale programs and
runs in a convenient setting.
The following instructions are designed to help you get set up in
this mode:
https://chapel-lang.org/docs/platforms/udp.html
While these ones provide more general information about multi-locale
runs, for example, if running on a cluster:
https://chapel-lang.org/docs/usingchapel/multilocale.html
* Try running your program using two or more locales using the
'-nl'/'--numLocales' flags.
Note: Initially, you may need to reduce the image size using the
'--size' flag to make this step complete in a tractable amount of
time.
* Do you see a performance improvement or slip when running on
multiple locales? (recall that, on a laptop, the best you should be
able to do is break even since all of the "locales" are sharing the
same physical resources).
* What could you do within the code to "prove" to yourself that
different locales are involved in computing the various pixels when
it runs?
* Why are things so slow when running on multiple locales? Are you
able to figure this out either by (1) reasoning about the code, (2)
using the CommDiagnostics module:
https://chapel-lang.org/docs/modules/standard/CommDiagnostics.html
or (3) using the 'chplvis' tool:
https://chapel-lang.org/docs/tools/chplvis/chplvis.html
STEP 8: Make the program scale
---------------------------------
* Given the answer to the previous question (check with the
instructors if you need help with it), what can you do to make the
ray-tracing scalable? How close can you get to linear speed-up as
you add locales?