<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Universe OpenAstronomy (Posts about radis)</title><link>http://openastronomy.org/Universe_OA/</link><description></description><atom:link href="http://openastronomy.org/Universe_OA/categories/radis.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><lastBuildDate>Thu, 30 May 2024 01:00:38 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Benchmark Tests</title><link>http://openastronomy.org/Universe_OA/posts/2023/08/20230825_0000_1someshverma/</link><dc:creator>Somesh Verma</dc:creator><description><p>I have finished refactoring the code for vaex and have also written test cases that compare the spectrum calculated with a pandas dataframe against the spectrum calculated with a vaex dataframe. Various spectroscopic quantities, such as absorbance and emissivity, are also compared for both dataframes.
The maintainers raised many issues, and I have resolved almost all of them; on the remaining ones I commented to discuss the problem and possible solutions. The issues were mainly about making the changes more maintainable and easier to understand: simple programming logic is preferred over complex code that is not explained in detail.
Another request was to keep the test suite light, that is, to use test cases that take little time and few resources. Initially I did not focus on this and instead tested the changes more elaborately, writing test cases that cover many areas of the code.</p>
<!-- TEASER_END -->
<p>Later, as requested by the maintainers, I refactored the changes and made both the changes and the test cases lighter. This reduced the time required to test a new commit, since the execution time of the test cases went down.
After all this, the remaining task was to add benchmark tests that compare the memory use and the execution time of the vaex and pandas engines.</p>
<p>The benchmark test added to compare execution time is:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import time

import matplotlib.pyplot as plt
import numpy as np
from radis import calc_spectrum, config
from radis.misc.progress_bar import ProgressBar


def compare_vaex_pandas_time():
    """
    Compares the time performance of pandas and Vaex and generates a plot. This script takes several minutes to run.
    The results should show that vaex and pandas provide similar performance in terms of speed.

    Returns
    -------
    None.
    """
    time_list, timeC_list, lines_list = [], [], []
    time_list_va, timeC_list_va, lines_list_va = [], [], []
    wmin = 1000
    steps = 5
    wmax_arr = np.geomspace(10, 1000, steps)
    initial_engine = config[
        "DATAFRAME_ENGINE"
    ]  # To make sure the dataframe engine is not changed after running this test
    pb = ProgressBar(N=2 * steps)
    for i, engine in enumerate(["vaex", "pandas"]):
        config["DATAFRAME_ENGINE"] = engine
        for j, w_range in enumerate(wmax_arr):
            t0 = time.time()
            s = calc_spectrum(
                wmin,
                wmin + w_range,  # cm-1
                molecule="H2O",
                isotope="1,2,3",
                pressure=1.01325,  # bar
                Tgas=1000,
                mole_fraction=0.1,
                databank="hitemp",  # or 'hitran'
                wstep="auto",
                cutoff=1e-28,
                verbose=0,
            )
            t1 = time.time()
            if engine == "vaex":
                timeC_list_va.append(s.conditions["calculation_time"])
                lines_list_va.append(s.conditions["lines_calculated"])
                time_list_va.append(t1 - t0)
                # lines_list_va.append(s.conditions['lines_calculated']+s.conditions['lines_cutoff'])
            else:
                timeC_list.append(s.conditions["calculation_time"])
                lines_list.append(s.conditions["lines_calculated"])
                time_list.append(t1 - t0)
                # lines_list.append(s.conditions['lines_calculated']+s.conditions['lines_cutoff'])
            pb.update(i * steps + (j + 1))
    plt.figure()
    plt.plot(lines_list, time_list, "k", label="pandas total")
    plt.plot(lines_list, timeC_list, "k--", label="pandas computation")
    plt.plot(lines_list_va, time_list_va, "r", label="vaex total")
    plt.plot(lines_list_va, timeC_list_va, "r--", label="vaex computation")
    plt.ylabel("Time [s]")
    plt.xlabel("Number of lines")
    plt.legend()
    config["DATAFRAME_ENGINE"] = initial_engine
</code></pre></div></div>
<p><img alt="Vaex Comparison Time" src="https://1someshverma.github.io/images/timeComparison.png"></p>
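<p>A minimal, library-independent sketch of the same benchmarking pattern is given below. It assumes that a plain dictionary stands in for the RADIS config and that a hypothetical workload function replaces calc_spectrum; none of these names come from RADIS. It illustrates one possible design choice: restoring the original engine in a finally clause, so that the setting is reset even if one of the calculations fails.</p>

```python
import time

# Hypothetical stand-in for the RADIS config dictionary (assumption).
config = {"DATAFRAME_ENGINE": "pandas"}


def fake_workload(n_lines):
    """Hypothetical placeholder for calc_spectrum: sums n_lines numbers."""
    return sum(range(n_lines))


def time_engines(engines, sizes):
    """Return {engine: [elapsed seconds per size]} and restore the config."""
    initial_engine = config["DATAFRAME_ENGINE"]
    timings = {}
    try:
        for engine in engines:
            config["DATAFRAME_ENGINE"] = engine
            timings[engine] = []
            for n_lines in sizes:
                t0 = time.perf_counter()
                fake_workload(n_lines)
                timings[engine].append(time.perf_counter() - t0)
    finally:
        # Reset the engine even if one of the runs raised an exception.
        config["DATAFRAME_ENGINE"] = initial_engine
    return timings


timings = time_engines(["vaex", "pandas"], [10, 100, 1000])
print(timings)
```

<p>In this sketch the engine name only changes the dictionary entry; in a real benchmark the workload would read that entry to select the dataframe engine.</p>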
<p>The graph and the code for the memory comparison are:
<img alt="Vaex Comparison" src="https://1someshverma.github.io/images/vaexcomparison.png"></p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Compare the memory performance of Pandas and Vaex
def compare_pandas_vs_vaex_memory():
    """
    Compare memory usage of `engine="vaex"` and `engine="pandas"` in calc_spectrum.
    Expected behavior is "vaex" using much less memory. This function takes tens of seconds to run.

    Returns
    -------
    None.
    """
    import tracemalloc

    initial_engine = config[
        "DATAFRAME_ENGINE"
    ]  # To make sure the dataframe engine is not changed after running this test
    for engine in ["pandas", "vaex"]:
        config["DATAFRAME_ENGINE"] = engine
        tracemalloc.start()
        s = calc_spectrum(
            1000,
            1500,  # cm-1
            molecule="H2O",
            isotope="1,2,3",
            pressure=1.01325,  # bar
            Tgas=1000,  # K
            mole_fraction=0.1,
            wstep="auto",
            databank="hitemp",  # or 'hitran', 'geisa', 'exomol'
            verbose=0,
        )
        snapshot = tracemalloc.take_snapshot()
        memory = tracemalloc.get_traced_memory()  # returns (current, peak)
        tracemalloc.stop()
        # Some raw outputs
        print("\n******** Engine = {} ***********".format(engine))
        print(
            "Current, peak = {:.1e}, {:.1e} for {:} lines calculated".format(
                *memory, s.conditions["lines_calculated"]
            )
        )
        # More sophisticated
        print("*** List of biggest objects ***")
        top_stats = snapshot.statistics("lineno")
        for rank, stat in enumerate(top_stats[:3]):
            print("#{}".format(rank + 1))
            print(stat)
        # Clear for next engine in the loop
        tracemalloc.clear_traces()
    config["DATAFRAME_ENGINE"] = initial_engine
</code></pre></div></div></description><category>radis</category><guid>http://openastronomy.org/Universe_OA/posts/2023/08/20230825_0000_1someshverma/</guid><pubDate>Thu, 24 Aug 2023 23:00:00 GMT</pubDate></item><item><title>Progress on Kurucz and NIST databases</title><link>http://openastronomy.org/Universe_OA/posts/2023/08/20230811_1723_menasrac/</link><dc:creator>Racim MENASRIA</dc:creator><description><p>Since the last article, I received a lot of feedback and comments about the Kurucz PR.</p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/952/1*iyB8Ya_dKD5OIkQ8gbcovg.png"></figure><p>Here is an example of a Fe_I spectrum I can obtain with these conditions.</p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*alxgHx0L0Bg54hyUcp8h0Q.png"></figure><h4>The main remarks were that:</h4><p>I needed to adjust the code to make it more general and user friendly. I introduced a specie argument to SpectrumFactory and calc_spectrum that replaces atom and molecule and gathers them under a single name.</p>
<!-- TEASER_END -->
<p>I made sure to respect the Radis structure by moving files where needed and adding a new Partfunc class for Kurucz. <br>Then I added a few tests and removed old tests that were no longer needed.</p>
<p>I also cleaned up my PR: I removed all the unused methods from the Kurucz API, added references, and moved hardcoded arrays to proper files.</p>
<p>We asked the Exojax team for more help with the broadening parameters. For the moment, airbrd (the air broadening, which is required in the Radis format) relies on approximations and placeholders computed from the Kurucz parameters.<br>A simplified version of the broadening makes it possible to plot spectra for now, but there are still values to adjust for the various species.</p>
<p>I also started to work on the NIST database by fixing a parser developed last year. Though I can plot NIST spectra for some wavelengths, there are still issues to deal with, particularly regarding the FWHM.</p>
<img alt="" height="1" src="https://medium.com/_/stat?event=post.clientViewed&amp;referrerSource=full_rss&amp;postId=e955d61c1591" width="1"></description><category>radis</category><guid>http://openastronomy.org/Universe_OA/posts/2023/08/20230811_1723_menasrac/</guid><pubDate>Fri, 11 Aug 2023 16:23:24 GMT</pubDate></item><item><title>Writing Test Cases</title><link>http://openastronomy.org/Universe_OA/posts/2023/08/20230801_0000_1someshverma/</link><dc:creator>Somesh Verma</dc:creator><description><p>To test that the spectra produced with vaex and pandas are the same for non-equilibrium calculations, code similar to that used for the equilibrium calculations is used:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from radis import calc_spectrum
<!-- TEASER_END -->
import time

import numpy as np

t0 = time.time()
s, factory_s = calc_spectrum(
    1800,
    1820,  # cm-1
    molecule='CO',
    isotope='1',
    pressure=1.01325,  # bar
    Tgas=700,  # K
    Tvib=710,
    Trot=710,
    mole_fraction=0.1,
    wstep='auto',
    path_length=1,  # cm
    databank='hitemp',  # or 'hitran', 'geisa', 'exomol'
    optimization=None,
    engine='vaex',
    verbose=3,
    return_factory=True,
)
s.apply_slit(0.5, 'nm')  # simulate an experimental slit
t1 = time.time()
print('Time taken : ' + str(t1 - t0))

t0 = time.time()
s1, factory_s1 = calc_spectrum(
    1800,
    1820,  # cm-1
    molecule='CO',
    isotope='1',
    pressure=1.01325,  # bar
    Tgas=700,  # K
    Tvib=710,
    Trot=710,
    mole_fraction=0.1,
    wstep='auto',
    path_length=1,  # cm
    databank='hitemp',  # or 'hitran', 'geisa', 'exomol'
    engine='pandas',
    verbose=3,
    return_factory=True,
)
s1.apply_slit(0.5, 'nm')  # simulate an experimental slit (on the pandas spectrum)
t1 = time.time()
print(s.get("absorbance"))
s.plot('radiance_noslit')
print('Time taken : ' + str(t1 - t0))

# Both engines should give the same absorbance and the same line dataframe
print(np.allclose(s.get("absorbance"), s1.get("absorbance")))
for column in factory_s1.df1.columns:
    assert np.all(factory_s1.df1[column] == factory_s.df1[column].to_numpy())
</code></pre></div></div>
<p>I will add more test cases.</p></description><category>radis</category><guid>http://openastronomy.org/Universe_OA/posts/2023/08/20230801_0000_1someshverma/</guid><pubDate>Mon, 31 Jul 2023 23:00:00 GMT</pubDate></item><item><title>Adapting Kurucz to SpectrumFactory and what is next ?</title><link>http://openastronomy.org/Universe_OA/posts/2023/07/20230729_2343_menasrac/</link><dc:creator>Racim MENASRIA</dc:creator><description><h4><strong>Adapting Kurucz to SpectrumFactory and what is next ?</strong></h4><p>After my first pull request I received some feedback.<br>Both optional and major changes were requested. The most important one was that my code should <strong>better integrate with the existing Radis code</strong>. Indeed, although I added a new database with Kurucz, its API remained distinct, whereas Radis is meant to progress toward a common API.</p>
<p>Another key remark was that my code didn’t take into account the <strong>broadening effects</strong> that modify the lineshapes.<br>This is why I had a team meeting with my mentors to discuss the physics behind the code. It helped me a lot to understand what was expected.</p>
<p>After this I worked on adding broadening and merging the new AdB Kurucz with SpectrumFactory. To do so, I worked on an example that plots a spectrum using the Kurucz atomic data and <strong>SpectrumFactory.</strong> My first attempt was to use one of the existing Radis databank formats, named <strong>hdf5-radisdb</strong>, since I worked with hdf5 files in my class.<br>This attempt turned out to be too difficult because the formats were made for molecules and too many columns of my dataframe differed from the expected columns.<br>This is why I eventually decided to add <strong>a new format named “kurucz” </strong>to the load_databank method, which loads the Kurucz data in the proper form.</p>
<!-- TEASER_END -->
<p>Then, I worked on the <strong>eq_spectrum</strong> method to adjust it to this new format.<br>I added some methods and adapted methods from Exojax to handle linestrength computation, broadening, convolution and pressure layers, and to create a Spectrum object. It took a lot of effort and I modified many files, such as Broadening.py, Base.py, Factory.py and loader.py.<br>However, the spectrum I obtained was not convincing, and some parameters and units didn’t fit properly.</p>
<p>Moreover, while writing this spectroscopy code I fell behind in my project. That is why we organized a long meeting with one of my supervisors to take stock, and we adjusted the objectives of the project.</p>
<h4>We gave up the last objective, adding the CIA database, and agreed on the following timeline:</h4><ul><li>Finishing with SpectrumFactory for Kurucz ASAP</li><li>Moving to NIST</li><li>Then working on the DatabaseManager class architecture and adapting it to AdB and MdB manager subclasses</li><li>Moving to TheoReTS (this will require reaching people in Reims to fix the db that I still cannot access).</li><li>Working on developing an example during the last week to show what applications the atomic spectra physics brings.</li></ul><p>We also noticed that I had written my SpectrumFactory example from scratch rather than using the existing Radis methods, which is why I lost time and the result was inaccurate. However, the meeting gave me the right guidelines, and working on this code gave me a better understanding of the architecture, so adapting the example to the existing structure of the code should be easier now. We also discussed a few existing codes that could be a good starting point for adding NIST in addition to Kurucz.</p>
<h4><strong>Conclusion</strong></h4><p>The next weeks will take a lot of time and effort to complete the objectives, but in the end I am happy that we had this meeting because it unblocked me after I had been stuck with Kurucz for a while.</p>
<img alt="" height="1" src="https://medium.com/_/stat?event=post.clientViewed&amp;referrerSource=full_rss&amp;postId=d3453292daf1" width="1"></description><category>radis</category><guid>http://openastronomy.org/Universe_OA/posts/2023/07/20230729_2343_menasrac/</guid><pubDate>Sat, 29 Jul 2023 22:43:53 GMT</pubDate></item><item><title>Refactoring for Non-equilibrium Calculations</title><link>http://openastronomy.org/Universe_OA/posts/2023/07/20230725_0000_1someshverma/</link><dc:creator>Somesh Verma</dc:creator><description><p>I passed the midterm evaluation. Next, I have to refactor the part of the vaex code that is used in non-equilibrium calculations.</p>
<!-- TEASER_END --></description><category>radis</category><guid>http://openastronomy.org/Universe_OA/posts/2023/07/20230725_0000_1someshverma/</guid><pubDate>Mon, 24 Jul 2023 23:00:00 GMT</pubDate></item><item><title>Improving time efficiency of Vaex Implementation</title><link>http://openastronomy.org/Universe_OA/posts/2023/07/20230706_0000_1someshverma/</link><dc:creator>Somesh Verma</dc:creator><description><p>Although Vaex reduces the memory RADIS uses to compute a spectrum, it is slow for smaller databanks, which in our case means databanks with very few lines. In our implementation in RADIS, the slow performance of Vaex on smaller dataframes has three main causes:</p>
<ul>
<!-- TEASER_END -->
<li>First, Vaex is optimized for larger databanks and puts less focus on smaller dataframes.</li>
<li>Vaex uses virtual columns to reduce memory and only computes a virtual column when it is required. This saves memory, but when a virtual column is required multiple times it is computed multiple times, which costs time.
Pandas computes the column only once and stores it for further calculations, and the in-memory computations of Pandas are faster than Vaex for smaller dataframes.</li>
<li>Vaex is based on Apache Arrow and uses the Expression class for columns, whereas Pandas stores columns as NumPy arrays, so no conversion is required to use NumPy library functions. With Vaex, some operations require an explicit conversion to a NumPy array, which costs time.</li>
</ul>
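<p>The recomputation cost described in the second point can be illustrated with a toy model. This is a pure-Python sketch with hypothetical class names; it does not use Vaex or Pandas and does not reproduce how either library is implemented. It only counts how often a derived column is evaluated when it is lazy versus when it is stored.</p>

```python
class LazyColumn:
    """Toy model of a virtual column: recomputed on every access."""

    def __init__(self, func, data):
        self.func = func
        self.data = data
        self.evaluations = 0

    def values(self):
        self.evaluations += 1
        return [self.func(x) for x in self.data]


class StoredColumn:
    """Toy model of a materialized column: computed once, then reused."""

    def __init__(self, func, data):
        self.evaluations = 1
        self._values = [func(x) for x in data]

    def values(self):
        return self._values


data = [1.0, 2.0, 3.0]
lazy = LazyColumn(lambda x: 2 * x, data)
stored = StoredColumn(lambda x: 2 * x, data)

# Use each column three times, as a calculation with several steps might.
for _ in range(3):
    lazy.values()
    stored.values()

print(lazy.evaluations, stored.evaluations)  # prints: 3 1
```

<p>The stored column pays with memory for its single evaluation, which is the trade-off discussed above.</p>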
<p>Apart from this, there were issues in the Vaex implementation itself, which have now been replaced by better alternatives.
Initially, the timing graphs for Vaex and Pandas were as given below.</p>
<p>Total time is the sum of the loading time and the computation time for calculating the spectrum.
Plot of total time vs. number of lines:</p>
<p><img alt="Vaex Comparison" src="https://1someshverma.github.io/images/earlierTotal.png"></p>
<p>Computation time is the time required to compute the spectrum using Vaex or Pandas.
Plot of computation time vs. number of lines:</p>
<p><img alt="Vaex Comparison" src="https://1someshverma.github.io/images/earlierCom.png"></p>
<h4 id="optimizations">Optimizations</h4>
<ul>
<li>Calculating Sum</li>
</ul>
<p>At first we were computing the sum as</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> error = df[b].S.sum() / df.S.sum() * 100
</code></pre></div></div>
<p>but later changed it to</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>error_cutoff = df[b].sum(df[b].S) / df.sum(df.S) * 100
</code></pre></div></div>
<p>The time taken to calculate 25 spectra decreased from 6.0 s to 5.4 s.</p>
<p>The code used was:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from radis import calc_spectrum
import time

t0 = time.time()
for i in range(25):
    s = calc_spectrum(
        2000,
        2010,  # cm-1
        molecule='CO',
        isotope='1,2,3',
        pressure=1.01325,  # bar
        Tgas=1000,
        mole_fraction=0.1,
        databank='hitemp',  # or 'hitran'
        diluent="air",
        verbose=0,
        engine="vaex",
    )
t1 = time.time()
print(t1 - t0)
</code></pre></div></div>
<ul>
<li>Not using df.extract()
After masking some of the rows, that is, filtering rows based on some conditions, I was calling df.extract(). I later found that it took a lot of time, so I commented it out and refactored the code to work without it.
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>df = df.extract()  # later commented out
</code></pre></div> </div>
</li>
</ul>
<p>The improvement after this was quite impressive, as I found by running the code below.</p>
<p>It reduced the calculation time for the following code by 10 seconds:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from radis import calc_spectrum
import time

t0 = time.time()
s = calc_spectrum(
    1500,
    2500,  # cm-1
    molecule='H2O',
    isotope='1,2,3',
    pressure=1.01325,  # bar
    Tgas=700,  # K
    mole_fraction=0.1,
    wstep='auto',
    path_length=1,  # cm
    databank='hitemp',  # or 'hitran', 'geisa', 'exomol'
    engine='vaex',
)
s.apply_slit(0.5, 'nm')  # simulate an experimental slit
t1 = time.time()
print('Time taken : ' + str(t1 - t0))
</code></pre></div></div>
<p>And it reduced the calculation time by 0.5 seconds for the following code:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from radis import calc_spectrum
import time

t0 = time.time()
s = calc_spectrum(
    2000,
    2010,  # cm-1
    molecule='CO',
    isotope='1,2,3',
    pressure=1.01325,  # bar
    Tgas=1000,
    mole_fraction=0.1,
    databank='hitran',  # or 'hitemp'
    diluent="air",
    verbose=3,
    engine="vaex",
)
t1 = time.time()
print('Time taken : ' + str(t1 - t0))
</code></pre></div></div>
<p>After all of this, the updated timing graphs were as follows:
<img alt="Vaex Comparison" src="https://1someshverma.github.io/images/updatedCom.png"></p>
<p><img alt="Vaex Comparison" src="https://1someshverma.github.io/images/updatedTotal.png"></p>
<p>A significant improvement can be observed in these graphs.</p>
<p>Now, to improve the time performance for smaller dataframes, I convert the Vaex dataframes to Pandas for smaller databases. The overall improvements are:</p>
<ul>
<li>Memory performance is improved for all dataframes.</li>
<li>Time performance is the same for smaller dataframes, and for larger dataframes the time performance of Vaex is considerably better than that of Pandas.</li>
</ul></description><category>radis</category><guid>http://openastronomy.org/Universe_OA/posts/2023/07/20230706_0000_1someshverma/</guid><pubDate>Wed, 05 Jul 2023 23:00:00 GMT</pubDate></item><item><title>Setting Up for the Kurucz PR and transitioning to the TheoReTS</title><link>http://openastronomy.org/Universe_OA/posts/2023/07/20230702_1407_menasrac/</link><dc:creator>Racim MENASRIA</dc:creator><description><p>This week was not the most enjoyable phase of the project so far, as I had to exert considerable effort to fix failing tests before opening a pull request.</p>
<p>Once I ensured that the initial tests passed, I wrote my own tests to confirm that the new AdB Kurucz class didn’t interfere with any part of the existing code. At this point, I encountered a primary issue. I hadn’t noticed that one of the methods I had adapted from ExoJAX was still reading a file which necessitated an ExoJAX package dependency. This caused the build to fail on GitHub due to one of the tests in my kurucz_test.py file failing.</p>
<p>Since there’s a conflict related to the JAX installation on Windows, I couldn’t add it to the requirements file. Doing so would create a conflict for every Windows user installing Radis. Consequently, I had to write a program to extract the data from this package and store a copy of it in a local file called pfdat.txt. This enabled the problematic function to read from the local copy instead of the ExoJAX file. This solution successfully rectified the problem, and now my PR passes the tests and is awaiting review before merging.</p>
<!-- TEASER_END -->
<p>The next step is to transition to the TheoReTS as planned. According to the TheoReTS website, it is an information system for theoretical spectra based on variational predictions from molecular potential energy and dipole moment surfaces. It is jointly developed by the PMT team of GSMA (Reims), Tomsk University, and IAO Acad Sci. Russia. As a result, it provides two access points, one French and the other Russian. However, I noticed that the access to the French website (<a href="http://theorets.univ-reims.fr/">http://theorets.univ-reims.fr/</a>) is currently unavailable, preventing me from visualizing the data. This is an issue I should discuss with my mentors.</p>
<img alt="" height="1" src="https://medium.com/_/stat?event=post.clientViewed&amp;referrerSource=full_rss&amp;postId=d9643c0269aa" width="1"></description><category>radis</category><guid>http://openastronomy.org/Universe_OA/posts/2023/07/20230702_1407_menasrac/</guid><pubDate>Sun, 02 Jul 2023 13:07:42 GMT</pubDate></item><item><title>Comparing memory performance of Vaex and Pandas</title><link>http://openastronomy.org/Universe_OA/posts/2023/06/20230624_0000_1someshverma/</link><dc:creator>Somesh Verma</dc:creator><description><p>After completing all the changes needed to compute a spectrum using Vaex, I compared the memory used during the execution of the program. I used tracemalloc to measure the memory used while computing the spectrum.</p>
<h4 id="computing-the-spectrum">Computing the spectrum</h4>
<!-- TEASER_END -->
<p>The following code is used, and the maximum memory used during its execution is recorded for Vaex and Pandas:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
from radis import calc_spectrum
import tracemalloc

tracemalloc.start()
s, factory_s = calc_spectrum(
    1800,
    2500,  # cm-1
    molecule='H2O',
    isotope='1,2,3',
    pressure=1.01325,  # bar
    Tgas=700,  # K
    mole_fraction=0.1,
    path_length=1,  # cm
    databank='hitemp',  # or 'hitran', 'geisa', 'exomol'
    wstep='auto',
    use_cached=False,
    engine='pandas',
    return_factory=True,
)
s.apply_slit(0.5, 'nm')  # simulate an experimental slit
s.plot('radiance')
print(tracemalloc.get_traced_memory())  # (current, peak) in bytes
tracemalloc.stop()
</code></pre></div></div>
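<p>To repeat such a measurement for several configurations, the tracemalloc calls can be wrapped in a small helper. The sketch below is only an illustration: the helper name and the list-building workload are hypothetical and stand in for a spectrum calculation. It relies on tracemalloc.get_traced_memory() returning the current and peak traced sizes in bytes.</p>

```python
import tracemalloc


def peak_memory(func, *args):
    """Run func(*args) and return (result, peak traced bytes)."""
    tracemalloc.start()
    try:
        result = func(*args)
        # get_traced_memory() returns (current, peak), in bytes.
        current, peak = tracemalloc.get_traced_memory()
    finally:
        # Stop tracing so the next measurement starts from zero.
        tracemalloc.stop()
    return result, peak


def build_list(n):
    """Hypothetical workload: allocate a list of n floats."""
    return [float(i) for i in range(n)]


_, small_peak = peak_memory(build_list, 1_000)
_, large_peak = peak_memory(build_list, 100_000)
print(small_peak, large_peak)
```

<p>Note that tracemalloc only traces allocations made through Python's memory allocators, so the numbers are best used to compare runs with each other.</p>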
<h4 id="results">Results</h4>
<p>The graph below shows that Vaex uses much less memory than Pandas.</p>
<p><img alt="Vaex Comparison" src="https://1someshverma.github.io/images/vaexcomparison.png"></p></description><category>radis</category><guid>http://openastronomy.org/Universe_OA/posts/2023/06/20230624_0000_1someshverma/</guid><pubDate>Fri, 23 Jun 2023 23:00:00 GMT</pubDate></item><item><title>Implementation of the Kurucz database to Radis</title><link>http://openastronomy.org/Universe_OA/posts/2023/06/20230617_2033_menasrac/</link><dc:creator>Racim MENASRIA</dc:creator><description><p>As planned in my last article, I started my project by adding a first database to Radis: Kurucz.</p>
<p>I based my work on an existing class developed in Exojax. I reviewed the associated methods, which download the data from the database, store it in numpy arrays and extract the key information from it for further calculation.</p>
<p>By running a few examples in Exojax, I became familiar with the structure and nature of the data and with the key functions.</p>
<!-- TEASER_END -->
<p>However, while running these examples I noticed a problem in the install command of Exojax, Radis’s sister code. After further investigation with my mentors and the Exojax team, it appeared to be a jax problem, so we couldn’t fix it for the moment.<br>Since I could only make the examples work in a WSL environment, I couldn’t afford to import the jax libraries used in the AdbKurucz database implemented in Exojax. This is why I had to adapt the structure of the data and methods and stick to Pandas dataframes and numpy arrays.</p>
<h4><strong>First try : using a DatabaseManager structure</strong></h4><p>As I explained in the previous article, Radis has developed a special class to handle the database processes. Since Kurucz is an atomic database, I tried to implement it by making it inherit from the DatabaseManager class and setting the molecule parameter to “None”. Unfortunately, this led to so many exceptions in the methods that I gave up on the idea.</p>
<h4><strong>Second try : using Exojax methods without jax imports</strong></h4><p>This approach gave very nice results because most of the methods were already efficient. <br>Nevertheless, I had some errors because the data wasn’t loaded properly or because syntax errors had broken a few parts of the code.</p>
<p>I finally managed to load, store and use the data from Kurucz.</p>
<p>Then I added an example to show how this new database can be used.</p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*oJXYNuWaqdZFCpuhYTNSjw.png"></figure><p>This is the first spectrum that I obtained from the Kurucz database for Fe.</p>
<p><strong><em>A bit more explanation:</em></strong></p>
<p>The population of the lower energy level of a given transition is the number of atoms that are in that energy state at any given time. So if you have a large population in a certain energy state, you have a lot of atoms that are able to make the transition and therefore emit a photon. Einstein’s coefficient A for a particular transition is a measure of the probability of that transition occurring. So if A is large, then each atom has a high chance of making the transition and emitting a photon. Thus, the intensity of the spectral line (i.e. the number of photons emitted per unit time) is proportional to both the population of the lower energy level (the number of atoms capable of making the transition) and to A (the probability that each atom actually makes the transition). So the intensity can be approximately represented as A * population.</p>
<p>Users can choose the temperature; the function then interpolates the values from the database and plots the spectrum.</p>
<p>In order to generalize this to all the atoms and ions of the database, I had to adjust the function load_pf_Barklem2016() from Exojax and fix an error in the way the partition functions were extracted.</p>
<p>Now I can load the data and use it properly. For Kurucz’s data, each file corresponds to a single species of atom only. For example, “gf2600.all” is dedicated to absorption lines of “neutral iron atoms”. The “26” is the atomic number of iron, followed by a “00” indicating zero ionization (=neutral; Fe I). For example, if you want to use spectral lines of singly-ionized sodium (Na II or Na+), you should download “gf1101.all”.</p>
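<p>The naming rule described above can be written as a small helper. This is a sketch based only on the two examples given here; the function name is hypothetical, and the zero padding to two digits for atomic numbers below 10 is an assumption.</p>

```python
def kurucz_filename(atomic_number, ionization):
    """Build a name such as 'gf2600.all' from Z and the ionization stage.

    ionization is 0 for a neutral atom, 1 for singly ionized, and so on.
    """
    return "gf{:02d}{:02d}.all".format(atomic_number, ionization)


print(kurucz_filename(26, 0))  # neutral iron, Fe I: gf2600.all
print(kurucz_filename(11, 1))  # singly-ionized sodium, Na II: gf1101.all
```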
<p>Here is another example for Ca.</p>
<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*q_748Zt_K4rqW5fWtFAdTQ.png"></figure><h4><strong>What is next ?</strong></h4><p>I will finish this week by adding a few tests to ensure my code doesn’t break any part of the Radis architecture, and I may open a PR in the next few days.</p>
<p>Then the next step, for Week 4, will be to implement the TheoReTS database in Radis.</p>
<img alt="" height="1" src="https://medium.com/_/stat?event=post.clientViewed&amp;referrerSource=full_rss&amp;postId=27c2724fde74" width="1"></description><category>radis</category><guid>http://openastronomy.org/Universe_OA/posts/2023/06/20230617_2033_menasrac/</guid><pubDate>Sat, 17 Jun 2023 19:33:30 GMT</pubDate></item><item><title>Refactoring code for calculating Spectrum Using Vaex dataframe</title><link>http://openastronomy.org/Universe_OA/posts/2023/06/20230616_0000_1someshverma/</link><dc:creator>Somesh Verma</dc:creator><description><p>After the community bonding period, I continued refactoring the code for vaex.
I have done the following things in this period:</p>
<!-- TEASER_END -->
<ul>
<li>Wrote test cases for the fetch_databank() and load_databank() functions</li>
<li>Replaced the portions of the code that differ for Vaex compared with Pandas</li>
</ul>
<h4 id="test-cases-for-loading-dataframe-in-vaex-dataframe-format">Test Cases for loading dataframe in Vaex dataframe format</h4>
<p>The test cases check the following:</p>
<ul>
<li>The number of columns is the same in the Vaex dataframe and the Pandas dataframe</li>
<li>The number of lines is the same in both dataframes</li>
<li>The values of some columns match the corresponding column values in the other dataframe</li>
<li>The spectrum of the CO molecule calculated under equilibrium is the same with both dataframes</li>
</ul>
<p>One of the added test cases is shown below. It fetches the databank once as a Pandas dataframe and once as a Vaex dataframe, then checks whether the conditions above are satisfied.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>def test_df_from_vaex_and_pandas():
    from radis.lbl import SpectrumFactory

    sf = SpectrumFactory(
        2284,
        2300,
        wstep=0.001,  # cm-1
        pressure=20 * 1e-3,  # bar
        cutoff=0,
        path_length=0.1,
        mole_fraction=400e-6,
        molecule="CO",
        isotope="1,2",
        medium="vacuum",
        truncation=5,
        verbose=0,
    )
    sf.engine = 'vaex'
    sf.warnings["MissingSelfBroadeningWarning"] = "ignore"

    # Testing Hitran
    sf.fetch_databank("hitran", memory_mapping_engine='vaex', output='pandas', load_columns="all")
    df_pandas = sf.df0
    sf.fetch_databank("hitran", memory_mapping_engine='vaex', output='vaex', load_columns="all")
    df_vaex = sf.df0
    # The first value of the first row must be the same in both dataframes
    assert df_vaex[0][0] == df_pandas.iloc[0]["wav"]
    # The column names must be the same in both dataframes
    columns_vaex = df_vaex.column_names
    columns_pandas = df_pandas.columns
    comparison = (columns_vaex == columns_pandas)
    assert comparison.all()

    # Testing Hitemp
    sf.fetch_databank("hitemp", memory_mapping_engine='vaex', output='pandas', load_columns="all")
    df_pandas = sf.df0
    sf.fetch_databank("hitemp", memory_mapping_engine='vaex', output='vaex', load_columns="all")
    df_vaex = sf.df0
    assert df_vaex[0][0] == df_pandas.iloc[0]["wav"]
    columns_vaex = df_vaex.column_names
    columns_pandas = df_pandas.columns
    comparison = (columns_vaex == columns_pandas)
    assert comparison.all()
</code></pre></div></div>
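<p>The first three checks listed above (same columns, same number of lines, same values) can also be written without depending on either dataframe library. The sketch below assumes both dataframes have first been converted to plain dictionaries mapping column names to lists of values; the helper name <code>assert_same_table</code> is hypothetical and is not part of the Radis test suite.</p>

```python
import math


def _same_value(x, y, rel_tol):
    """Compare two cell values, treating NaN == NaN and allowing float round-off."""
    if isinstance(x, float) and isinstance(y, float):
        if math.isnan(x) and math.isnan(y):
            return True
        return math.isclose(x, y, rel_tol=rel_tol)
    return x == y


def assert_same_table(cols_a, cols_b, rel_tol=1e-9):
    """Assert that two {column name: list of values} tables hold the same data.

    Hypothetical helper for illustration only.
    """
    # 1. Same set of columns (order is ignored)
    assert set(cols_a) == set(cols_b), "column names differ"
    for name in cols_a:
        a, b = cols_a[name], cols_b[name]
        # 2. Same number of lines
        assert len(a) == len(b), f"row count differs in column {name!r}"
        # 3. Same values, row by row
        for i, (x, y) in enumerate(zip(a, b)):
            assert _same_value(x, y, rel_tol), f"{name!r} differs at row {i}"


# Toy data standing in for the two dataframes
table_pandas = {"wav": [2284.1, 2284.5], "iso": [1, 2]}
table_vaex = {"iso": [1, 2], "wav": [2284.1, 2284.5]}
assert_same_table(table_pandas, table_vaex)
```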
<h4 id="refactoring-the-code-to-calculate-the-spectrum-using-vaex-dataframe">Refactoring the code to calculate the Spectrum using Vaex dataframe</h4>
<p>To calculate the spectrum using a Vaex dataframe, I had to make changes to the following files:</p>
<ul>
<li>radis/api/hitranapi.py</li>
<li>radis/api/cdsdapi.py</li>
<li>radis/api/hdf5.py</li>
<li>radis/api/cache_files.py</li>
<li>radis/api/tools.py</li>
<li>radis/io/exomol.py</li>
<li>radis/io/geisa.py</li>
<li>radis/io/hitran.py</li>
<li>radis/io/query.py</li>
<li>radis/lbl/base.py</li>
<li>radis/lbl/broadening.py</li>
<li>radis/lbl/calc.py</li>
<li>radis/lbl/loader.py</li>
</ul>
<p>These are all the files involved in calculating the spectrum of one or more molecules, from loading the data to computing the spectrum. I have only refactored the parts required for equilibrium calculations. I kept the previous Pandas implementation and just added code for Vaex.</p>
<p>For loading, the main changes are in hitranapi.py, which parses the molecules. I had to spend a lot of time finding the Vaex equivalents of the Pandas operations.
Finally, I was able to resolve the issue; below is the final code that worked for me.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>import pandas as pd

# cast_to_int64_with_missing_values is a RADIS helper defined elsewhere in the package


def _parse_HITRAN_class1(df, verbose=True, dataframe_type="pandas"):
    r"""Diatomic molecules: CO, HF, HCl, HBr, HI, N2, NO+

    Parameters
    ----------
    df: pandas or Vaex Dataframe
        lines read from a HITRAN-like database
    dataframe_type: str
        ``"pandas"`` or ``"vaex"``

    Notes
    -----
    HITRAN syntax [1]_ :

    &gt;&gt;&gt;      v
    &gt;&gt;&gt; 13x I2

    References
    ----------
    .. [1] `Table 3 of Rothman et al. HITRAN 2004 &lt;https://www.cfa.harvard.edu/hitran/Download/HITRAN04paper.pdf&gt;`__
    """
    if dataframe_type == "vaex":
        # 1. Parse: extract_regex returns a struct keyed by the named group
        extracted_values = df['globu'].str.extract_regex(pattern = r"[ ]{13}(?P&lt;vu&gt;[\d ]{2})")
        df['vu'] = extracted_values.apply(lambda x : x.get('vu'))
        df['vu'] = df.evaluate(df['vu'])
        extracted_values = df['globl'].str.extract_regex(pattern = r"[ ]{13}(?P&lt;vl&gt;[\d ]{2})")
        df['vl'] = extracted_values.apply(lambda x : x.get('vl'))
        df['vl'] = df.evaluate(df['vl'])
        # 2. Convert to numeric
        cast_to_int64_with_missing_values(df, ["vu", "vl"], dataframe_type=dataframe_type)
        # 3. Clean
        del df["globu"]
        del df["globl"]
        return df
    elif dataframe_type == "pandas":
        # 1. Parse
        dgu = df["globu"].astype(str).str.extract(r"[ ]{13}(?P&lt;vu&gt;[\d ]{2})", expand=True)
        dgl = df["globl"].astype(str).str.extract(r"[ ]{13}(?P&lt;vl&gt;[\d ]{2})", expand=True)
        # 2. Convert to numeric
        cast_to_int64_with_missing_values(dgu, ["vu"], dataframe_type=dataframe_type)
        cast_to_int64_with_missing_values(dgl, ["vl"], dataframe_type=dataframe_type)
        # 3. Clean
        del df["globu"]
        del df["globl"]
        return pd.concat([df, dgu, dgl], axis=1)
    else:
        raise NotImplementedError(dataframe_type)
</code></pre></div></div>
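<p>To see what the regular expression in this function extracts, here is a minimal sketch using only Python’s built-in <code>re</code> module on a single string. The pattern is the one used above; the helper name <code>parse_vu</code> and the sample strings are made up for illustration, assuming the 15-character “13x I2” field layout from the docstring.</p>

```python
import re

# Same pattern as in _parse_HITRAN_class1: 13 spaces, then a 2-character field
PATTERN = re.compile(r"[ ]{13}(?P<vu>[\d ]{2})")


def parse_vu(globu):
    """Return the vibrational quantum number as an int, or None if it is missing.

    Hypothetical helper for illustration only.
    """
    match = PATTERN.search(globu)
    if match is None:
        return None  # the string does not follow the "13x I2" layout
    field = match.group("vu").strip()
    # A blank field is a missing value, which is why the real code
    # casts to an integer type that supports missing values
    return int(field) if field else None


print(parse_vu(" " * 13 + " 1"))  # one-digit quantum number, right-aligned
print(parse_vu(" " * 13 + "12"))  # two-digit quantum number
print(parse_vu(" " * 15))         # blank field
```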
<p>For the other parts of the code, I added an additional parameter, self.dataframe_type. Wherever an operation differs between Vaex and Pandas, I use it to run the code for the respective dataframe type.</p>
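<p>A minimal sketch of this branching pattern is given below. The class <code>DataFrameOps</code> and its method are hypothetical and are not the actual Radis code; the example relies only on the fact that <code>DataFrame.drop</code> takes the column names positionally in Vaex and through the <code>columns=</code> keyword in Pandas.</p>

```python
class DataFrameOps:
    """Hypothetical helper (not part of RADIS) that branches on ``dataframe_type``."""

    SUPPORTED = ("pandas", "vaex")

    def __init__(self, dataframe_type="pandas"):
        # Fail early on unsupported engines, as _parse_HITRAN_class1 does
        if dataframe_type not in self.SUPPORTED:
            raise NotImplementedError(dataframe_type)
        self.dataframe_type = dataframe_type

    def drop_columns(self, df, names):
        """Return ``df`` without the columns listed in ``names``."""
        if self.dataframe_type == "vaex":
            # Vaex: DataFrame.drop takes the column names positionally
            return df.drop(names)
        # Pandas: DataFrame.drop needs the ``columns=`` keyword
        return df.drop(columns=names)
```

<p>With this layout, the calling code stays the same for both engines, for example <code>DataFrameOps("vaex").drop_columns(df, ["globu", "globl"])</code>.</p>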
<h5 id="spectrum-using-vaex-and-pandas">Spectrum using Vaex and Pandas</h5>
<ul>
<li>Code used</li>
</ul>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from radis import calc_spectrum

s, factory_s = calc_spectrum(
    1800, 2500,  # cm-1
    molecule='H2O',
    isotope='1,2,3',
    pressure=1.01325,  # bar
    Tgas=700,  # K
    mole_fraction=0.1,
    path_length=1,  # cm
    databank='hitemp',  # or 'hitran', 'geisa', 'exomol'
    wstep='auto',
    use_cached=False,
    engine='vaex',
    return_factory=True,
)
s.apply_slit(0.5, 'nm')  # simulate an experimental slit
s.plot('radiance')
</code></pre></div></div>
<h6 id="using-vaex">Using Vaex</h6>
<p><img alt="spectrum using vaex" src="https://1someshverma.github.io/images/specturm-using-vaex.png"></p>
<h6 id="using-pandas">Using Pandas</h6>
<p>As I also kept the Pandas implementation, here is the spectrum calculated with it:</p>
<p><img alt="spectrum using pandas" src="https://1someshverma.github.io/images/specturm-using-vaex.png"></p></description><category>radis</category><guid>http://openastronomy.org/Universe_OA/posts/2023/06/20230616_0000_1someshverma/</guid><pubDate>Thu, 15 Jun 2023 23:00:00 GMT</pubDate></item></channel></rss>