real-time proc codes and recipe #341

ahmadtourei merged 4 commits into master from real_time_proc on Jan 11, 2024

Conversation

ahmadtourei (Collaborator):

Description

This PR provides code and a demonstration of real-time processing.

Checklist

I have (if applicable):

  • referenced the GitHub issue this PR closes.
  • documented the new feature with docstrings or appropriate doc page.
  • included a test. See testing guidelines.
  • added my name to the contributors page (docs/contributors.md).
  • added the "ready_for_review" tag once the PR is ready to be reviewed.

ahmadtourei added the ready_for_review label on Jan 10, 2024

d-chambers (Contributor) left a comment:

A few suggestions to improve conciseness, but it looks good otherwise.


## Set real-time processing parameters (if needed)

In this section, we define the window size and step size required for [rolling](https://dascore.org/api/dascore/proc/rolling/rolling.html) mean processing. With a sampling interval of 10 seconds, the cutoff frequency (Nyquist frequency) is 0.05 Hz. Additionally, we set the desired wait time after each run using the `sleep_time_mult` parameter, which acts as a multiplier on the number of seconds in each patch.

d-chambers (Contributor):

The link should use DASCore's internal linking:

[rolling](`dascore.proc.rolling.rolling`)

That way each doc build will link to its own version of the rolling page rather than pointing to this static URL.
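
For reference, a minimal sketch of what such a parameter block might look like; the names (`sampling_interval`, `window`, `step`, `sleep_time_mult`) follow the paragraph above and the exact values are illustrative only:

```python
import numpy as np

# Sampling interval of the data being processed (seconds per sample)
sampling_interval = 10.0

# Cutoff (Nyquist) frequency implied by that interval: 1 / (2 * 10 s) = 0.05 Hz
cutoff_freq = 1 / (2 * sampling_interval)

# Window and step sizes for the rolling mean, expressed as time deltas
window = np.timedelta64(100, "s")
step = np.timedelta64(50, "s")

# Wait time after each run, expressed as a multiple of the patch duration
sleep_time_mult = 1.5
```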

---


This recipe showcases the real-time processing capability of DASCore. Here, we demonstrate how to use DASCore to perform rolling mean processing on a spool in real time for edge computing purposes.

d-chambers (Contributor):

I suggest using the term "near real-time batch processing". True real-time usually implies some kind of high frequency streaming.

run_num = i+1
print(f"\nRun number: {run_num}")

# Select a updated sub-spool

d-chambers (Contributor):

I think you need "Select an updated sub-spool"

print(f"\nRun number: {run_num}")

# Select a updated sub-spool
sp = dc.spool(data_path).sort("time").update()

d-chambers (Contributor):

I am not 100% sure, but I don't think update will preserve the sorting, so I suggest you change to:

sp = dc.spool(data_path).update().sort("time")

Also, any reason not to define the spool outside of the while loop so we don't have to init it each iteration?
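
A minimal sketch of that suggestion, assuming `data_path` points at the directory receiving new DAS files and that `update()` returns the refreshed spool:

```python
import dascore as dc

data_path = "path/to/das_files"  # hypothetical directory written to in real time

# Create the spool once, outside the loop ...
spool = dc.spool(data_path)

while True:
    # ... then refresh its index for newly arrived files each pass,
    # sorting after the update so the order is guaranteed.
    sp = spool.update().sort("time")
    len_updated_sp = len(sp)
    ...  # rolling-mean processing on the new patches goes here
```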

len_updated_sp = len(sp)

# Get number of seconds in the first patch
sampling_interval = sp[0].attrs['d_time']

d-chambers (Contributor):

We are trying to move away from using attrs to get coord info. I suggest:

sp[0].coords.step("time")

ahmadtourei (Collaborator, Author):

Oops, I forgot to change this from my old code.


# Get number of seconds in the first patch
sampling_interval = sp[0].attrs['d_time']
num_sec = len(sp[0].coords["time"]) * sampling_interval

d-chambers (Contributor):

Maybe clearer as:

num_sec = sp[0].coords.max("time") - sp[0].coords.min("time")

Then we don't need sampling_interval in the previous line.

ahmadtourei (Collaborator, Author):

So, actually:

num_sec = sp[0].coords.max("time") - sp[0].coords.min("time") + sampling_interval

since sp[0].coords.max("time") - sp[0].coords.min("time") spans one sampling interval less than the full patch duration.
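
Combining the two suggestions above, a sketch of the duration calculation (assuming the time coordinate values are numpy datetime64, so differences come back as timedelta64):

```python
import numpy as np

patch = sp[0]

# Sampling interval taken from the coordinate rather than attrs
sampling_interval = patch.coords.step("time")

# Patch duration per the discussion above: (max - min) plus one sampling interval
duration = patch.coords.max("time") - patch.coords.min("time") + sampling_interval

# Convert the timedelta64 to float seconds for use with time.sleep()
num_sec = duration / np.timedelta64(1, "s")
```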

num_sec = len(sp[0].coords["time"]) * sampling_interval

# Set sleep time after each run to a multiple of the patch duration
sleep_time = num_sec * sleep_time_mult

d-chambers (Contributor):

Good idea on the sleep time being a multiple of the patch time duration.

Comment on lines 77 to 88
# Sleep longer
time.sleep(4*num_sec)

# Check whether new data was detected in the spool
sp = dc.spool(data_path).sort("time").update()
len_updated_sp = len(sp)

if len_last_sp == len_updated_sp:
print(f"No new data was detected in spool even after "
"four times of patch time, {4*num_sec} sec. "
"Real-time data processing ended successfully.")
break

d-chambers (Contributor):

Since the example is getting a bit long already, maybe remove this and just change sleep_time_mult to 3? Then update the print statement and just break like it was before.
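
A sketch of the simplified exit described here, reusing names from the quoted code and the spool sketch above (illustrative only, not the final recipe):

```python
import time

while True:
    ...  # process the current sub-spool

    # Wait one multiple of the patch duration, then check the spool once more
    time.sleep(sleep_time)
    sp = spool.update().sort("time")

    # Stop if no new patches arrived since the last pass
    if len(sp) == len_last_sp:
        print(f"No new data was detected in the spool after {sleep_time:.0f} sec. "
              "Real-time data processing ended successfully.")
        break
```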

Comment on lines 57 to 58
initial_run = (i == 0)
run_num = i+1

d-chambers (Contributor):

Can we just set i = 1 and

initial_run = (i==1)

so we can remove line 58?
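
For illustration, the counter arrangement this suggests might look like:

```python
# Start the counter at 1 so it doubles as the run number
i = 1
while True:
    initial_run = (i == 1)
    print(f"\nRun number: {i}")
    ...  # processing for this pass
    i += 1
```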

d-chambers (Contributor):

Also, probably goes without saying, but before merging, it would be good to manually run the code to make sure it works 😄. With the exec: false setting the doc build won't actually verify that for us.

ahmadtourei merged commit 8390f0f into master on Jan 11, 2024 (1 check passed)
ahmadtourei deleted the real_time_proc branch on January 11, 2024 at 20:57

ahmadtourei (Collaborator, Author):

> Also, probably goes without saying, but before merging, it would be good to manually run the code to make sure it works 😄. With the exec: false setting the doc build won't actually verify that for us.

Thank you for the comments! Please feel free to re-open if any further improvement is needed.
