Histograms from timed dataframe #116

rettigl · 2023-04-03T22:02:42Z

This adds support for a per-time-unit dataframe, and histogram calculation based on this dataframe.

Histogram calculation in the example notebook takes ~55 s on our machine. This can potentially still be improved, e.g. by using our fast binning method (which currently fails because of xarray generation and concatenation issues).
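For readers new to the idea, here is a minimal sketch of what a histogram from a timed dataframe amounts to, assuming one row per time unit that carries the current value of the scanned axis. The `histogram` helper, the `timed_delay` values, and the bin edges are hypothetical and written in plain Python to stay dependency-free; this is not the code in this PR.

```python
from bisect import bisect_right


def histogram(values, edges):
    """Count values per bin [edges[i], edges[i+1]); out-of-range values are dropped."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        if edges[0] <= value < edges[-1]:
            counts[bisect_right(edges, value) - 1] += 1
    return counts


# Hypothetical timed table: one delay-stage reading per time unit.
timed_delay = [0.1, 0.2, 0.3, 1.1, 1.2, 2.5]
edges = [0.0, 1.0, 2.0, 3.0]

# Number of time units spent in each delay bin.
time_per_bin = histogram(timed_delay, edges)
print(time_per_bin)
```

Each entry of `time_per_bin` counts the time units during which the scanned axis sat inside that bin, which is the quantity a normalization histogram needs.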

rettigl · 2023-05-15T21:14:39Z

This does not produce the right kind of normalization histogram yet, I think...

rettigl · 2023-05-17T21:41:14Z

This implements both variants now. Both routes produce almost identical results, but histograms from timed dataframes are substantially faster. A normalization routine for the compute function is also added.
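As an illustration of the normalization step, a minimal sketch assuming that normalization means dividing the binned electron counts by the time spent in each bin. The `normalize` function and the numbers are hypothetical and do not reproduce the routine added to the compute function.

```python
import math


def normalize(counts, time_per_bin):
    """Divide event counts by the time spent in each bin; unvisited bins become NaN."""
    return [
        count / time if time > 0 else math.nan
        for count, time in zip(counts, time_per_bin)
    ]


# Hypothetical numbers: electron counts and time units per delay bin.
electron_counts = [30, 10, 5, 0]
time_per_bin = [3, 2, 1, 0]

rate = normalize(electron_counts, time_per_bin)
print(rate)
```

Bins that were never visited are marked as NaN here rather than zero, so they cannot be mistaken for bins with a genuinely low count rate; that choice is an assumption of this sketch.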

One issue remains: If histogram axes are jittered, the actual jitter values differ between the computation of the data and of the normalization histogram, leading to small errors in the histogram on the order of a percent or so. A workaround would be to include the histogram calculation in bin_partition. This would also avoid reading the data twice, which I think is the major time bottleneck at the moment.
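The jitter mismatch can be sketched as follows, assuming uniform jitter and simple fixed-edge binning. The helpers `histogram` and `jitter`, the seeds, and the amplitude are hypothetical; the point is only that two independent random draws can move values across bin edges differently, while a single shared draw cannot.

```python
import random
from bisect import bisect_right


def histogram(values, edges):
    """Count values per bin [edges[i], edges[i+1]); out-of-range values are dropped."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        if edges[0] <= value < edges[-1]:
            counts[bisect_right(edges, value) - 1] += 1
    return counts


def jitter(values, amplitude, rng):
    """Add uniform noise in [-amplitude, amplitude] to every value."""
    return [value + rng.uniform(-amplitude, amplitude) for value in values]


values = [i * 0.01 for i in range(300)]
edges = [0.0, 1.0, 2.0, 3.0]

# Two independent jitter draws, as when data and normalization are jittered separately.
data_hist = histogram(jitter(values, 0.05, random.Random(1)), edges)
norm_hist = histogram(jitter(values, 0.05, random.Random(2)), edges)

# One shared draw, as when both histograms come from the same pass over the data.
shared = jitter(values, 0.05, random.Random(1))
data_shared = histogram(shared, edges)
norm_shared = histogram(shared, edges)

print(data_hist, norm_hist)      # may differ slightly near the bin edges
print(data_shared, norm_shared)  # identical by construction
```

Computing both histograms in the same pass, as suggested for bin_partition, corresponds to the shared-draw case.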

rettigl · 2023-05-29T20:41:21Z

Histogram on top of data (continuously scanned delay stage):

Performance comparison:

Two histogram methods compared (bottom: difference):

Histogram computed twice and compared (jittered):

Bottom line: All effects of randomness and of the difference between methods are < 1%.

rettigl · 2023-08-12T20:11:07Z

@zainsohail04 This one is waiting on your implementation in the Flash loader (and some review)...

rettigl · 2023-09-23T22:51:12Z

Rebased, and added a dummy implementation for the flash loader. Tests are also still pending.

coveralls · 2023-09-23T22:56:17Z

Pull Request Test Coverage Report for Build 6760852807

291 of 314 (92.68%) changed or added relevant lines in 14 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.7%) to 90.759%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
sed/loader/flash/loader.py	6	7	85.71%
sed/loader/generic/loader.py	3	4	75.0%
sed/loader/mpes/loader.py	34	39	87.18%
sed/core/processor.py	92	108	85.19%

Totals
Change from base Build 6733527384:	0.7%
Covered Lines:	4901
Relevant Lines:	5400

💛 - Coveralls

rettigl · 2023-10-10T15:19:59Z

@zain-sohail For me the flash timed_dataframes don't work. They consume endless amounts of memory and never finish computation.

zain-sohail · 2023-10-10T15:26:21Z

> @zain-sohail For me the flash timed_dataframes don't work. They consume endless amounts of memory and never finish computation.

Just tried and it seems to work fine. Where are you facing a problem?

rettigl · 2023-10-10T17:40:57Z

After merging the fast loading code from main, it is at least performing faster now, but still does not seem to do the right thing:
I am loading runs=[44824, 44825, 44826, 44827]

It does not make sense that the timed dataframe has 8 times more rows...

rettigl · 2023-10-10T17:42:59Z

Or does it? Are there so many empty macrobunches?

rettigl · 2023-10-10T17:50:46Z

Hmm, okay, the generated histogram does indeed not look too bad

rettigl · 2023-10-10T18:43:34Z

The time-stamp version does produce some strange artefacts, though:

steinnymir · 2023-10-10T18:46:09Z

> Or does it? Are there so many empty macrobunches?

If you look at the tables you show, the pulse IDs in the electron dataframe are very sparse (7, 38, 69...), so it seems like the count rate was really bad, and therefore I expect many empty pulses.

rettigl · 2023-10-10T18:49:50Z

rebased to current main

rettigl · 2023-10-10T18:54:43Z

> > Or does it? Are there so many empty macrobunches?

> If you look at the tables you show, the pulse IDs in the electron dataframe are very sparse (7, 38, 69...), so it seems like the count rate was really bad, and therefore I expect many empty pulses.

I realized that after posting my previous comment, but I don't understand the data format completely yet. What exactly are the train, pulse, and electron IDs? Are these macrobunches, microbunches, and electrons per microbunch? And what is the time base for the timed_dataframe? I was expecting macrobunches, but then I would expect one row per trainID, no? Now it seems to be one row per microbunch, which is certainly typically more rows than there are electrons. And why does it go up to 1000, if typically there are 400 or 500 microbunches in a macrobunch?

steinnymir · 2023-10-10T19:11:15Z

How you described it is all correct.
The reason there are 1000 is, I think, that the data structure accommodates up to 1000 pulses, as FLASH can theoretically provide 800, which it never does; the maximum is 500 in reality.
I'm not sure if using trainID (macrobunches) would still be OK. Don't you think a coarser step size induces more noise?
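To make the row counts concrete, a small sketch assuming the layout discussed here: the electron table has one row per detected electron, keyed by (trainId, pulseId, electronId), and the timed table has one row per pulse slot of every train. The IDs and the constant `PULSES_PER_TRAIN` are hypothetical illustration values.

```python
# Hypothetical sparse data: (trainId, pulseId, electronId) for each detected electron.
electrons = [(1, 7, 0), (1, 38, 0), (1, 38, 1), (1, 69, 0), (2, 12, 0)]

# Pulse slots reserved per train in the data structure (illustration value).
PULSES_PER_TRAIN = 1000

# Timed table: one row per pulse slot of every train that occurs in the data.
train_ids = sorted({train for train, _, _ in electrons})
timed_rows = [(train, pulse) for train in train_ids for pulse in range(PULSES_PER_TRAIN)]

# Pulses that actually delivered at least one electron.
occupied = {(train, pulse) for train, pulse, _ in electrons}

ratio = len(timed_rows) / len(electrons)
print(len(electrons), len(timed_rows), len(occupied), ratio)
```

With a low count rate, most pulse slots are empty, so the timed table easily ends up with many more rows than the electron table.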

rettigl · 2023-10-10T19:49:58Z

> I'm not sure if using trainID (macrobunches) would be still ok, don't you think a coarser step size induces more noise?

All you want here is to capture changes of the scanned variable, which typically is anyways only read out once per macrobunch, no? So no, there should be no difference. It would only make a (marginal) difference for parameters that are varied within the macrobunch. Not sure, though, what this would look like if you consider, e.g., BAM correction etc.
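A minimal sketch of this argument, assuming the scanned variable has exactly one reading per macrobunch and every macrobunch contains the same number of pulses. The names `train_delay` and `PULSES` and the values are hypothetical.

```python
from bisect import bisect_right


def histogram(values, edges):
    """Count values per bin [edges[i], edges[i+1]); out-of-range values are dropped."""
    counts = [0] * (len(edges) - 1)
    for value in values:
        if edges[0] <= value < edges[-1]:
            counts[bisect_right(edges, value) - 1] += 1
    return counts


# Hypothetical delay readings, one per train (macrobunch).
train_delay = {1: 0.2, 2: 0.4, 3: 1.5}
PULSES = 4  # pulses per train, assumed constant for this sketch
edges = [0.0, 1.0, 2.0]

# Time base of one row per train.
per_train = histogram(train_delay.values(), edges)

# Time base of one row per pulse: every pulse repeats the reading of its train.
per_pulse = histogram(
    [delay for delay in train_delay.values() for _ in range(PULSES)], edges
)

print(per_train, per_pulse)
```

Under these assumptions the per-pulse histogram is the per-train histogram times a constant factor, so the shape of the normalized result is the same for both time bases.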

steinnymir · 2023-10-10T20:23:23Z

> All you want here is to capture changes of the scanned variable, which typically is anyways only read out once per macrobunch, no? So no, there should be no difference. It would only make a (marginal) difference for parameters that are varied within the macrobunch. Not sure, though, how this would look like if you consider, e.g. BAM correction etc.

Right! But still, there might be some pulse-resolved normalizations one would want, like the FEL intensity for example. I'm afraid we need to keep the per-pulse dataframe for those.
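One way such a pulse-resolved normalization could look is a weighted histogram over the per-pulse rows, sketched below under the assumption that each row carries the scanned value and the FEL intensity of that pulse. The function `weighted_histogram` and the column contents are hypothetical.

```python
from bisect import bisect_right


def weighted_histogram(values, weights, edges):
    """Sum the weights of all values falling into each bin [edges[i], edges[i+1])."""
    totals = [0.0] * (len(edges) - 1)
    for value, weight in zip(values, weights):
        if edges[0] <= value < edges[-1]:
            totals[bisect_right(edges, value) - 1] += weight
    return totals


# Hypothetical per-pulse rows: delay reading and FEL intensity of each pulse.
pulse_delay = [0.2, 0.2, 1.5, 1.5]
fel_intensity = [1.0, 0.5, 2.0, 0.0]
edges = [0.0, 1.0, 2.0]

# Integrated FEL intensity delivered while the delay was inside each bin.
intensity_per_bin = weighted_histogram(pulse_delay, fel_intensity, edges)
print(intensity_per_bin)
```

A per-train time base could not reproduce this result whenever the intensity varies from pulse to pulse within a train.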

zain-sohail · 2023-10-10T21:47:52Z

> After merging the fast loading code from main, it is at least performing faster now, but still does not seem to do the right thing: I am loading runs=[44824, 44825, 44826, 44827]

It wouldn't make sense for there to be a timing difference in the computation of the dataframe; I would imagine it only makes creating the buffer files quicker. Regarding your question about why the dataframe is bigger, I think Steinn managed to answer it.

steinnymir

Looks good! I tested it together with the hextof changes from #169 and it seems to work correctly. The only exceptions are the comments I made. Once those are resolved, you can merge as far as I'm concerned.

sed/core/processor.py

steinnymir · 2023-10-27T09:52:55Z

I accidentally uploaded the branch where I tested this. I'll leave it until this is merged. It's hist_testing.

rettigl · 2023-10-28T22:08:49Z

I fixed this, and modified the tests to test the timed dataframes.

rettigl · 2023-10-28T22:10:04Z

> I accidentally uploaded the branch where I tested this. I'll leave it until this is merged. It's hist_testing.

I also updated your test branch

steinnymir

I tested this and it seems to work fine. LGTM!

steinnymir · 2023-11-04T09:11:59Z

closes #101

rettigl · 2023-11-04T10:51:46Z

I am still working on improving the tests for this and on updating the new processor functions; then I will merge.

sed/calibrator/energy.py

rettigl mentioned this pull request Apr 3, 2023

Histograms from timestamps #117

Closed

rettigl requested a review from steinnymir April 3, 2023 22:05

rettigl force-pushed the histograms_from_timed_dataframe branch 3 times, most recently from 7a2ba24 to c9b8a17 Compare May 15, 2023 21:05

rettigl requested a review from zain-sohail May 17, 2023 21:41

rettigl mentioned this pull request May 26, 2023

Possibility for normalization by the number of FEL pulses: #101

Closed

rettigl force-pushed the histograms_from_timed_dataframe branch from d3076e6 to 0d64a32 Compare September 23, 2023 22:47

rettigl force-pushed the histograms_from_timed_dataframe branch from 0d64a32 to 1fc17c6 Compare September 23, 2023 23:22

zain-sohail force-pushed the histograms_from_timed_dataframe branch from 1fc17c6 to b8f996b Compare September 29, 2023 12:42

rettigl force-pushed the histograms_from_timed_dataframe branch from b8f996b to 2fc1e0a Compare October 10, 2023 18:48

steinnymir requested changes Oct 27, 2023

rettigl and others added 8 commits October 28, 2023 22:37

Adds support for a timed dataframe, and histogram calculation from this dataframe

93ebed6

unified both methods for histogram calculation, and moved the actual code to binning.py

773f402

add config value for unit time of timed data frame

509a489

add normalization option to compute function of processor

c4ccef4

fix performance issue in mpes loader for timed dataframes; return normalization histograms as xarrays

add jittering of timed dataframe

57ac19b

fix merge issues

fc5cb26

added the timed dataframe

1acba31

add tests for timed dataframes, histogram generation and related processor function

4377ad2

add accessor functions for binned and normalized histograms and normalization histograms

rettigl force-pushed the histograms_from_timed_dataframe branch 3 times, most recently from 05c2eb2 to 0d621d9 Compare October 28, 2023 21:29

add exceptions for missing columns in timed_dataframe

9c3faf0

rettigl force-pushed the histograms_from_timed_dataframe branch from 0d621d9 to 9c3faf0 Compare October 28, 2023 21:30

Update processor tests to use mpes loader and test for timed dataframe

aa4c35f

rettigl requested a review from steinnymir October 28, 2023 22:08

steinnymir approved these changes Nov 3, 2023

rettigl added 2 commits November 3, 2023 23:20

Merge remote-tracking branch 'origin/main' into histograms_from_timed_dataframe

af86fa1

add timed dataframe to new processor functions

fc40a36

steinnymir linked an issue Nov 4, 2023 that may be closed by this pull request

Possibility for normalization by the number of FEL pulses: #101

Closed

fix bugs, and add tests for processor functions

157a07c

rettigl commented Nov 5, 2023

try fix for test failures

3dc7c5f

rettigl merged commit bb717a7 into main Nov 5, 2023
5 checks passed

rettigl deleted the histograms_from_timed_dataframe branch November 5, 2023 11:05
