Parameters for noisy, radiation damaged, inelastic scattering-inclusive, fractionated dataset generation #26

Open
JatGreer opened this issue Feb 21, 2024 · 0 comments

The dataset will be composed of the same conformations sampled from the partially open COVID spike trimer trajectory (...71...) from the D. E. Shaw Research datasets.

Previous datasets used no radiation damage, no inelastic scattering, a high electron dose in a 1 s window (45 e-/A^2/s, which affects the radiation damage, DQE and MTF metadata outputs) and either no fractionation or insufficient fractionation.

We need to decide what parameters to use to generate this dataset:

Dataset size:

  • Small enough to keep data volume low, but large enough for HRAs to have a good chance of training. Maarten has shown that cryoDRGN trains well on a dataset of 50k particles for the RTC dataset, with a fractionation of 3 images per movie (no radiation damage and static particles).
  • Decide on particles per micrograph (and therefore no. micrographs)
  • Understand capabilities of image compression for dataset transfer
  • With 3 images per movie and 4000x4000 images, 50k particles come to 30 GB. For 50k particles, each frame in a movie at our standard micrograph size therefore adds 10 GB of data, and 45 frames would be 450 GB of micrograph data.
  • Need to benchmark simulation speed (V100)
  • Need to benchmark image/movie compression
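
The volume estimate above can be sketched as a simple linear scaling. This is only a back-of-the-envelope helper: the function name is hypothetical, the 10 GB per frame figure is the one quoted above for 50k particles, and scaling linearly with particle count assumes the number of particles per micrograph stays fixed. Compression is not accounted for.

```python
def movie_volume_gb(n_frames: int,
                    n_particles: int = 50_000,
                    gb_per_frame_per_50k: float = 10.0) -> float:
    """Estimate uncompressed micrograph data volume in GB.

    Scales the quoted figure (10 GB per frame for 50k particles)
    linearly in the number of frames and in the number of particles.
    Assumes particles per micrograph and image size are unchanged.
    """
    return n_frames * gb_per_frame_per_50k * n_particles / 50_000


if __name__ == "__main__":
    # 3 frames matches the existing RTC-style dataset; 30 and 45 are
    # candidate fractionations discussed in this issue.
    for frames in (3, 30, 45):
        print(f"{frames:>2} frames: {movie_volume_gb(frames):.0f} GB")
```

Simulating in smaller batches (for example 10k particles at a time) reduces the volume that has to be held at once by the same linear factor.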

Answers:

  • Dataset estimate: 10k simulated at a time

Fractionation:

  • The fractionation of the movie affects how realistic the implementation of radiation damage is. If we use 45 e-/A^2/s for a single image with no radiation damage, we will see high-frequency information. If we turn on radiation damage, we lose high-frequency information. If we fractionate, we will still have high-frequency information in the first few frames but less in the later frames.
  • We should establish how many frames we actually need (given that we have no motion, and later frames may not be useful in data processing).
  • Establish how to vary the electron dose per second via Roodmus CLI arguments and how that syncs up with the current fractionation implementation.
  • After chatting with Colin: for our case, we should be able to set our total e-/A^2/s to a certain amount and then simply fractionate into multiple frames. We still need to look into the code to make sure we understand how this is actually applied (i.e. a tomography approach might assume different things than if the Parakeet code had been designed for SPA). One thing that comes to mind is that the DQE/MTF depends on e-/A^2/s. Double check how the Parakeet config parameters can be used to get an e-/A^2/s value that is compatible with the DQE interpolation present in Parakeet.

Answers:

  • The input is a total dose of 45 e-/A^2, which Parakeet splits across the fractionation.
  • We want a total of 45 e-/A^2, with 1.5 e-/A^2 per frame (30 frames). This means an exposure of 1.5 e-/A^2/s, which falls within the interpolation range of Parakeet's DQE functionality. This requires the exposure time to be set to 30 s.
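
The dose arithmetic in these answers can be checked with a small sketch. The class and attribute names are hypothetical and do not correspond to Parakeet or Roodmus parameters; the 1 to 5 e-/A^2/s DQE range is the one discussed in the DQE section of this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DosePlan:
    """Hypothetical container for the dose settings discussed above."""
    total_dose: float      # e-/A^2 accumulated over the whole movie
    dose_per_frame: float  # e-/A^2 delivered in each frame
    exposure_time: float   # total exposure time in seconds

    @property
    def n_frames(self) -> int:
        # Number of fractions needed to deliver the total dose.
        return round(self.total_dose / self.dose_per_frame)

    @property
    def dose_rate(self) -> float:
        # Average dose rate in e-/A^2/s.
        return self.total_dose / self.exposure_time

    def cumulative_dose(self) -> list[float]:
        # Dose accumulated by the end of each frame.
        return [self.dose_per_frame * (i + 1) for i in range(self.n_frames)]

    def in_dqe_range(self, low: float = 1.0, high: float = 5.0) -> bool:
        # True if the dose rate lies inside the tabulated DQE range.
        return low <= self.dose_rate <= high


plan = DosePlan(total_dose=45.0, dose_per_frame=1.5, exposure_time=30.0)

if __name__ == "__main__":
    print("frames:", plan.n_frames)
    print("dose rate (e-/A^2/s):", plan.dose_rate)
    print("inside DQE table range:", plan.in_dqe_range())
```

With a 1 s exposure, as in the previous datasets, the same total dose gives a rate of 45 e-/A^2/s, and `in_dqe_range()` returns `False`.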

Ice model:

  • Planning to continue using the GRF ice model. Need to make sure we understand how it is applied when fractionation is present.

Answers:

  • Using the GRF ice model

Radiation damage:

  • Affected by fractionation as explained above
  • Need to choose a sensitivity coefficient (i.e. understand the default and, if necessary, choose a better one)

Answers:

  • Using 0.022 as sensitivity coefficient and turning on radiation damage
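
To illustrate why fractionation matters once radiation damage is on, here is a rough sketch of a dose-dependent attenuation per frame. This is not Parakeet's implementation: it assumes a generic Gaussian envelope exp(-B s^2 / 4) with a B factor that grows linearly with cumulative dose, and `b_per_dose` is a hypothetical illustrative constant, not the 0.022 sensitivity coefficient above.

```python
import math


def frame_envelope(spatial_freq: float,
                   cumulative_dose: float,
                   b_per_dose: float = 4.0) -> float:
    """Assumed damage envelope for one frame.

    spatial_freq    : spatial frequency s in 1/A
    cumulative_dose : dose accumulated by the end of the frame, e-/A^2
    b_per_dose      : hypothetical B-factor growth per unit dose

    Returns exp(-B * s^2 / 4) with B = b_per_dose * cumulative_dose.
    """
    b_factor = b_per_dose * cumulative_dose
    return math.exp(-b_factor * spatial_freq ** 2 / 4.0)


if __name__ == "__main__":
    s = 1.0 / 3.0  # 3 A resolution
    for frame in (1, 10, 30):
        dose = 1.5 * frame  # 1.5 e-/A^2 per frame, as chosen above
        print(f"frame {frame:>2}: dose {dose:>4.1f}, "
              f"envelope {frame_envelope(s, dose):.3f}")
```

Under this assumed form, early frames keep most of their high-frequency signal while late frames keep very little, which is the qualitative behaviour described in the fractionation section.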

Inelastic scattering:

  • Need to decide whether to use an inelastic scattering model, understand the difference between the two models James suggests (zero_loss and mp_loss), and choose any sub-parameters of each.
  • The plan is currently to implement one of these in the new dataset.

Answers:

  • Not using inelastic scattering
  • Could be introduced in later datasets

DQE:

  • Parakeet's generation of the MTF file (metadata) assumes 1 e-/A^2/s (I think...), which is a hard-coded parameter. We also need to ensure the dose rate is within the range 1-5 covered by the DQE table, so that interpolation from the table is a reasonable approximation. Previous datasets used the values for 5, as the electron dose was set to 45 and no extrapolation is applied above 5.

Answers:

  • As explained above, 1.5 e-/A^2/s puts us within the DQE table values (which are pre-installation manual values for a Krios G4 at 300 kV).
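
The "no extrapolation above 5" behaviour can be sketched as a clamped linear interpolation. The function is a generic illustration, not Parakeet's code, and the table values below are hypothetical placeholders rather than the Krios G4 figures.

```python
from bisect import bisect_right


def interp_clamped(x: float, xs: list[float], ys: list[float]) -> float:
    """Linear interpolation over sorted xs, clamped at both ends.

    Values of x outside [xs[0], xs[-1]] return the nearest end value,
    i.e. there is no extrapolation.
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)  # index of the first table point above x
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical table: dose rates 1-5 and placeholder DQE values.
dose_rates = [1.0, 2.0, 3.0, 4.0, 5.0]
dqe_values = [0.90, 0.85, 0.80, 0.75, 0.70]

if __name__ == "__main__":
    for rate in (1.5, 5.0, 45.0):
        print(rate, interp_clamped(rate, dose_rates, dqe_values))
```

A rate of 45 returns the same value as a rate of 5, which is why the previous datasets effectively used the table entry for 5, whereas 1.5 is interpolated between the entries for 1 and 2.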

Sample thickness:

  • We use 500 A for all micrographs. This affects (I think) the radiation damage implementation. TODO: look this up again and finish.