
bug in agilent .uv parser #28

Closed
ethanbass opened this issue Mar 27, 2022 · 9 comments
ethanbass commented Mar 27, 2022

I was investigating the UV parser more and I think there are still some problems. For example, I was trying to import a UV file from my lab and it looks pretty good for about the first 15 minutes, but then the baseline starts going all over the place. Any idea what might be going on? I'm attaching a picture of the entab imported file in black and the CSV I exported from chemstation in blue.
[image: entab import (black) vs. ChemStation CSV export (blue)]

The example file that ships with entab doesn't look too good either:
[image: 280 nm trace from the example file that ships with entab]

Below is the code to reproduce what I did in R. You can find the file I tried to convert and the CSV version here: https://cornell.box.com/v/example-DAD-files
Thanks!
Ethan

library(entab)

# Read the .uv file with entab and reshape to wide format (one column per wavelength)
path <- "~/Library/CloudStorage/Box-Box/kessler-data/lactuca/botrytis_experiment/data/lettuce_roots/ETHAN_01_19_21 2021-01-20 00-27-52/679.D/dad1.uv"
r <- as.data.frame(Reader(path))
ch.entab <- data.frame(tidyr::pivot_wider(r, id_cols = "time",
                       names_from = "wavelength", values_from = "intensity"))

# Read the CSV exported from ChemStation for comparison
ch.csv <- read.csv("~/Library/CloudStorage/Box-Box/kessler-data/lactuca/botrytis_experiment/data/lettuce_roots/export3D/EXPORT3D_ETHAN_01_19_21 2021-01-20 00-27-52/679.CSV",
                   row.names = 1, header = TRUE,
                   fileEncoding = "utf-16", check.names = FALSE)

# Overlay the 280 nm traces: entab import in black, ChemStation CSV in blue
par(mfrow = c(1, 1))
matplot(ch.entab$time, ch.entab[, "X280"], type = "l", ylim = c(-100, 800))
matplot(ch.entab$time, ch.csv[, "280.00000"], type = "l", add = TRUE, lty = 2, col = "blue")
abline(v = 15, col = "red", lty = 3)

# The example file that ships with entab shows the same problem
example_file <- as.data.frame(Reader("~/entab/entab/tests/data/carotenoid_extract.d/dad1.uv"))
df <- data.frame(tidyr::pivot_wider(example_file, id_cols = "time",
                                    names_from = "wavelength", values_from = "intensity"))
matplot(df$time, df$X280, type = "l")
@ethanbass

I tried the Aston parser and it works beautifully!
[image: Aston import matching the ChemStation CSV export]

bovee commented Mar 30, 2022

I'm not sure if this was the issue (I haven't checked the graphs yet), but there's definitely a bug where it was pulling an unsigned int instead of a signed one (fixed in 7b751f5). I vaguely remember a bug like this happening in Aston a long time ago too, so it's possible there's still something else.
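For anyone following along, this class of bug is easy to illustrate: the intensity values in these files are signed, so reading the same bytes as an unsigned integer turns a small negative value into a huge positive one, which is exactly the kind of thing that makes a baseline jump around. A quick sketch in Python (entab itself is Rust; this is just an illustration of the signed/unsigned mix-up, not entab's actual code):

```python
import struct

# Two bytes encoding the value -10 as a big-endian signed 16-bit integer.
raw = struct.pack(">h", -10)

signed = struct.unpack(">h", raw)[0]    # correct interpretation: -10
unsigned = struct.unpack(">H", raw)[0]  # buggy interpretation:   65526

print(signed, unsigned)  # -10 65526
```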


ethanbass commented Mar 30, 2022

Thanks for looking into this. Your example file now seems to be reading correctly, but my file 679.D still has the crazy shifting baseline in both versions (CLI and entab-R). Also, in the R version there seems to be a newly introduced bug where some values that appear to be retention times are making it into the wavelength column (but this doesn't happen in the CLI version).

Also, I don't have benchmarks, but it seems like something you did slowed down the R version considerably. I'm not sure if this could be related to the retention times appearing with the wavelengths. The slowdown only seems to affect the ChemStation UV parser; the MassHunter parser, for example, is working beautifully from what I can tell.

bovee commented Mar 31, 2022

I think the R slowness/bad data is unrelated to the UV parsing, but might be from 622c036? It's extremely weird.

Thank you for the UV data BTW! I took a quick look and I think there are still two things going on:

  1. The values between Aston and Entab start the same, but go off track after the first record, so there's a parsing bug around record lengths I'll try to track down.

  2. Both of their values are (very slightly) different from the CSV. I think there's a multiplier or offset in the header that they need to be corrected by?
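To make point 2 concrete, a correction like that would just be a rescaling step applied after decoding. Here's a hypothetical sketch in Python (the parameter names and the exact formula are assumptions about what the header might carry, not the documented .uv layout):

```python
def correct_intensities(raw_counts, multiplier=1.0, offset=0.0):
    """Rescale raw detector counts using header-derived values.

    raw_counts: integers decoded from the file's delta-encoded records.
    multiplier, offset: hypothetical scale/offset values read from the header.
    """
    return [count * multiplier + offset for count in raw_counts]

# e.g. if the header said counts were stored in units of 1/2000 mAU:
print(correct_intensities([2000, -500, 10000], multiplier=1 / 2000))
# -> [1.0, -0.25, 5.0]
```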

bovee commented Mar 31, 2022

I refactored the UV parser a bit in 14059d2 and I think both of these issues should be fixed (and there should be metadata available on these files now).

I'm still not sure what's happening with the R bindings, but I can futz with it. You might also try deleting the current ones before reinstalling?


ethanbass commented Mar 31, 2022

Awesome, this is great!!! I tried removing the R package before reinstalling as you suggested and it seems to have helped dramatically with the speed. It also seems to have mostly fixed the issue I mentioned with retention times appearing in the wavelengths column (about 9 times out of 10). The weird part (!?) is that this behavior still happens about one time in ten if I repeatedly run the Reader on the same file 🧐 (and this seems to be independent of the file used). Also, I'm pretty confident the speed issue is related to this behavior: it runs much slower on the runs where it ends up producing the wrong values.

@ethanbass

Also, re: metadata, I'm not quite sure what kind of metadata there should be or how to access it?

bovee commented Apr 1, 2022

I opened a new bug (#29) for the retention time crossover issue to track that on its own since it's weird and I don't fully understand it.

Some of the file parsers read additional metadata out (e.g. sample name, operator name, etc.) if the file contains it and I've figured out the format; you can access it with the -m flag on the CLI or, in R, with Reader(path)$metadata().

bovee closed this as completed Apr 1, 2022
@ethanbass

sounds good!!
