
questions related to fast interferometry #49

Open
caseyjlaw opened this issue Nov 1, 2016 · 4 comments

@caseyjlaw
Collaborator

With help from Danny, I'm playing with bifrost in the context of my work with fast-sampled interferometric data. I like what you have so far and I think it could work for me, but there are a few big issues I need to understand first. I would appreciate any tips to help me move forward.
For context, my data analysis had previously been based on Python and multiprocessing (e.g., http://github.com/caseyjlaw/rtpipe). I need to redesign to use GPUs. I also need to scale up to a 32-node cluster, although that is easy given how my problem parallelizes.
The basic pipeline is:

  1. read visibility data into 4d numpy array,
  2. multiply each element by a complex gain,
  3. run statistical tests to identify bad data (and set it to 0),
  4. dedisperse and resample 4d array (iteration, baseline, channel, polarization),
  5. grid visibilities and run 2d FFT for each DM and time.
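For concreteness, the five steps might be sketched in plain NumPy roughly like this (the shapes, the 5-sigma flagging threshold, and the one-cell "gridding" are made-up placeholders for illustration, not the real rtpipe algorithms):

```python
import numpy as np

def process_chunk(vis, gains, delays, npix=64):
    """Toy end-to-end pass over a (time, baseline, channel, pol) array.
    All thresholds and shapes here are hypothetical."""
    # 2. apply complex gains (broadcast over the time axis)
    vis = vis * gains
    # 3. crude flagging: zero samples more than 5 sigma above the mean amplitude
    amp = np.abs(vis)
    vis[amp > amp.mean() + 5 * amp.std()] = 0
    # 4. dedisperse: roll each channel along the time axis by its integer delay
    for c, d in enumerate(delays):
        vis[:, :, c, :] = np.roll(vis[:, :, c, :], -d, axis=0)
    # 5. "grid" (here: dump everything into one cell) and 2-D FFT per time
    grid = np.zeros((vis.shape[0], npix, npix), dtype=complex)
    grid[:, 0, 0] = vis.sum(axis=(1, 2, 3))
    return np.fft.fft2(grid, axes=(1, 2))
```
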

I'd appreciate input on the following concepts:

  1. Block computing load
    In my playing, I see that bifrost tends to assume that each block gets a core. My tests quickly reveal bottlenecks in my bifrost pipeline. What is the best way to deal with blocks of different computing demand? A related point is how best to optimize for CPU usage (just profile, or is there a way of getting the equivalent of a resource pool as in multiprocessing?).

  2. Parallelism
    Traditionally, I've read data into shared memory (on say an 8-core worker node) and used multiprocessing as best I could to get 100% CPU usage. Multiprocessing makes this simple: open, start, close the pool. What is the best way to do this with bifrost? I could imagine having the Pipeline class include this as a feature via the threading library.

  3. High-D data
    I typically work with a 4d numpy array. How should I use concepts like "gulp" and so on? They seem tuned to a 1d stream.

  4. Iteration
    My past use case pushed me to read lots of data (1 to 10 GB) into memory and craft each stage of the pipeline to parallelize efficiently. This lets me measure statistical properties for large segments of data (e.g., to find a bad channel). However, bifrost encourages the algorithm to treat each iteration through a data stream as independent. What is the recommended way of measuring statistical properties over large data volumes with bifrost?
    I've also noticed that in some stages of my pipeline I'd like to iterate through the data structure in one way (e.g., by channel), while later I may want to iterate in a different way (e.g., by integration). Is that allowed with bifrost?

Thanks for your time. I am excited by the potential here and am willing to put more time into making it work for my use case.

@benbarsdell
Collaborator

benbarsdell commented Nov 2, 2016

This is excellent feedback, thanks Casey!

I think several of the issues you mention relate to the fundamental design of Bifrost vs. other approaches to solving similar problems. While the intention is to be as modular and flexible as possible (and we hope to support many varied use-cases), Bifrost primarily targets applications involving continuously streaming data; i.e., where there is a dominant 'time' axis. This manifests in its use of ring buffers (which, importantly, are not designed to behave like queues) to pass data between blocks and its ability to have each block run asynchronously. This design does not preclude it from being used for more batch-oriented processing, but it may mean that it provides less advantage over other approaches for such applications.
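As a toy illustration of the ring-buffer idea (this is not Bifrost's actual API; `MiniRing` and its methods are invented for the sketch): unlike a queue, data stays in place in a fixed buffer, so readers view contiguous spans and old frames are overwritten in place as the stream advances.

```python
import numpy as np

class MiniRing:
    """Toy ring buffer: a writer appends frames, readers view contiguous
    spans in place (multiple readers could re-read the same span).
    Bifrost's real rings add locking, GPU memory spaces, etc."""
    def __init__(self, nframes, frame_shape):
        self.buf = np.zeros((nframes,) + frame_shape)
        self.head = 0  # total frames ever written

    def write(self, frame):
        self.buf[self.head % len(self.buf)] = frame  # overwrite oldest slot
        self.head += 1

    def read_span(self, start, length):
        idx = [(start + i) % len(self.buf) for i in range(length)]
        return self.buf[idx]
```
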

  1. Block computing load
  2. Parallelism

Bifrost provides a means of implementing pipeline parallelism, but not batch parallelism. As it currently stands, parallelism within a block is the responsibility of the block (typically one would use OpenMP or the GPU). It might be possible to provide some form of automatic batch parallelism, but it has not been on my radar. The model I've had in mind so far is to use multiple pipelines for parallelising over batches (e.g., running on multiple compute nodes as you suggest).

  3. High-D data

This problem should be addressed imminently; there will be full support for N-D data, which should make things easier. However, blocks will still operate on 'gulps' of time.

  4. Iteration
    What is the recommended way of measuring statistical properties over large data volumes with bifrost?

If you need to compute statistics over large spans of time (and cannot do it using single-pass/online algorithms), then you will need to have blocks process large gulps at a time. Theoretically you could gulp in the entire dataset at once if it will fit in memory, but this is not the ideal way to use the library.
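For the single-pass/online route, something like Welford's algorithm lets a block keep running statistics across gulps without buffering the whole stream (a generic sketch, not a Bifrost API):

```python
class OnlineStats:
    """Welford's single-pass running mean/variance: update per sample (or
    per gulp, in a loop) without storing the data seen so far."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # population variance of everything seen so far
        return self.m2 / self.n if self.n else 0.0
```
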

I've also noticed that in some of stages of my pipeline I'd like to iterate through the data structure in one way (e.g., by channel), while later I may want to iterate in a different way (e.g., by integration). Is that allowed with bifrost?

Bifrost always steps through data along a single axis (usually time, e.g., streamed from some data source), where each 'frame' contains all the other axes. E.g., you might read in 10 time slices each containing 100 channels and 2 polarisations, and you process that [10,100,2] array before moving on to the next 10 time slices. The idea is for the time axis to be processed in small gulps so that they fit in memory.
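That stepping pattern is roughly the following (plain-NumPy sketch; `gulp_iter` is a made-up stand-in for a ring read, not a Bifrost call):

```python
import numpy as np

def gulp_iter(stream, gulp=10):
    """Yield [gulp, channel, pol] frames from a time-ordered array,
    mirroring how a block steps along the time axis of a ring."""
    for t0 in range(0, stream.shape[0], gulp):
        yield stream[t0:t0 + gulp]

# 50 time slices, each with 100 channels and 2 polarisations
stream = np.zeros((50, 100, 2))
shapes = [g.shape for g in gulp_iter(stream)]
```
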

A quick question about your processing pipeline: does 'iteration' correspond to the time axis? And for dedispersion, is that just rolling the time axis, without summing over channels?

@caseyjlaw
Collaborator Author

Thanks, Ben. Yes, we do use time as our basic iteration axis. That is also a natural way to parallelize the bulk of the processing (e.g., read 8 integrations and image one integration per core).

Your suggestion of reading in a small number of integrations would be fine to feed most of the algorithms in my pipeline. The killer is the dedispersion, though. We have dispersion delays as large as hundreds of integrations, which is why we typically read in frames larger than that.

I wonder if there's a way to do this by having a dedispersion block write out to multiple rings? Perhaps one ring per DM trial, each of which accumulates data until it has the entire DM sweep.
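One way to sketch that accumulate-until-full-sweep idea in plain NumPy (hypothetical [time, channel] data with integer-sample delays; `dedisperse_when_ready` is invented for illustration, using the roll-without-summing style discussed above):

```python
import numpy as np

def dedisperse_when_ready(gulps, delays):
    """Buffer incoming gulps until the buffer spans the full DM sweep,
    then roll each channel by its delay and emit only the fully-swept
    samples, keeping a max(delays)-sample overlap for the next round."""
    buf = np.empty((0, len(delays)))
    for g in gulps:
        buf = np.concatenate([buf, g])
        if len(buf) >= max(delays) + 1:
            dd = np.stack([np.roll(buf[:, c], -d)
                           for c, d in enumerate(delays)], axis=1)
            valid = len(buf) - max(delays)
            yield dd[:valid]        # rolls past `valid` wrap around: discard
            buf = buf[valid:]       # carry the overlap forward
```
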

It sounds like my algorithms are generally more oriented around batches. To bridge that to the stream concept, I'll need a good strategy for accumulating data. If there is a good general way to do that, it may extend your streaming concepts to more use cases.

@MilesCranmer
Collaborator

Hi Casey,

Thanks for trying Bifrost out. Would you like to join in on our next telecon? (usually they are on Fridays at 1pm PT). I think hearing more about your experience so far would be useful for prioritizing different development ideas we have for Bifrost.

Regarding data accumulation, would you need to put the DM trials in rings? Which blocks are immediately after the dedisp block in your pipeline? You can accumulate data over multiple gulps of a ring within a block, if that is what you are asking. I think if an algorithm requires some sort of internal accumulation, it generally might be best to keep that as one whole block (coarse-grained pipelines are fine in BF).

class ...Block...:
    # block definition (for current /master branch API)
    def main(self):
        ...
        accumulator = []  # persists across gulps of the ring
        for inspan in self.read('in_1'):
            ... # some calculation
            accumulator.append(data)
            ... # do something with accumulator

If you are reusing gulp data between reads: there are plans for a method that will let you provide stride specifications to ring reads, so you can re-read data on a ring. That will make sliding-window reads (for things like FIR filters) more convenient.
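Until then, much the same effect comes from carrying an overlap between gulps by hand (a sketch under that assumption; `fir_over_gulps` is a made-up helper, not a planned API):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def fir_over_gulps(gulp, taps, carry):
    """Apply an FIR filter to one gulp, carrying the trailing
    len(taps)-1 samples between calls -- the overlap a strided ring
    read would otherwise provide."""
    x = np.concatenate([carry, gulp])
    windows = sliding_window_view(x, len(taps))   # sliding windows over x
    y = windows @ taps[::-1]                      # convolution ordering
    return y, x[-(len(taps) - 1):]                # output + new carry
```
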

There are also plans to make another data type for storing models that are intermittently updated by a block (e.g., gains), but that data structure has not been fleshed out yet.

Also, apologies we don't have much documentation at the moment. Please post usage issues/send emails for questions as they come up...

@caseyjlaw
Collaborator Author

I put your telecon time on my calendar, so I'll keep it in mind.
In answer to your other questions:

  • The only blocks I am using now are NumpySourceBlock and NumpyBlock. No dedispersion in bifrost yet; just reading and multiplying by gain factors. My dedispersion algorithm is custom and runs on a 4d numpy array.
  • It does sound like a large block read (multiple GB) makes sense for me. I'm concerned that if I create lots of blocks and rings then I'll have huge memory usage. Is that a valid concern?
  • Nice to know there are options for accumulation. Your example makes me think I could just accumulate data in a block until it triggers some logic to run the algorithm (e.g., wait until 1 second of data is available before searching for DM delays of 1 second).
  • Some of the docs are good, but I am relying a lot on examples. I still don't know how the headers work, for example.

BTW, prior to hearing about bifrost, my plan was to use dask distributed. That is designed more generally and tends to serialize everything and pass it around. Serialization is my biggest concern there, which is why bifrost is attractive.
