Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: read.adp.ad2cp() should only read one "data type" #2005

Closed
richardsc opened this issue Oct 5, 2022 · 15 comments
Closed

Proposal: read.adp.ad2cp() should only read one "data type" #2005

richardsc opened this issue Oct 5, 2022 · 15 comments

Comments

@richardsc
Copy link
Collaborator

Basically, rather than try and read all the present data types into a single object (currently handled by doing, e.g. obj@data$average$... for the 'average' data type), we create an argument for read.adp.ad2cp() that specifies which one to read. The argument could be type or dataID or dataType (I like the latter).

This allows objects to fit the traditional oce style of @data and @metadata slots, without mixing up things that have different data fields and metadata. If an analyst needs to work with more than one of the data types together (e.g. velocities and echosounder), they can devise their own methods based working with the separate objects.

image

@richardsc richardsc added the ad2cp label Oct 5, 2022
@richardsc
Copy link
Collaborator Author

read.adp.ad2cp() could be made so tha if the user doesn't specify dataType, then it will parse the file for the types that are present, and print them to the console as a guide for what things can be read.

@dankelley
Copy link
Owner

Two very good ideas, thanks.

@dankelley dankelley self-assigned this Oct 6, 2022
@dankelley
Copy link
Owner

Work on this issue, likely to begin on the weekend, will be in a branch named i2005 which descends from branch ad2cp (which in turn descends from branch develop).

@dankelley
Copy link
Owner

specifies which one to read. The argument could be type or dataID or dataType (I >like the latter).

@richardsc I'd like to settle on the param name. I'll give some notes, then I'll vote, and then I'll ask you to vote.

Presently, we have an argument that does this called which. (Importantly, it permits multiple types, which will not be allowed in the changes). But we use which a lot in plotting things, so I don't like it here. I prefer your suggestions type, dataID and dataType.

I'm not keen on type because that's already an argument for some read functions. It means the type of instrument, and it has appeared in some objects as @metadata$type for over a decade, so I think it is "taken".

I prefer dataType over dataID because it's more inclusive. I would allow a string or a numeric code (base-10 or base-16), so folks could choose their method.

My vote:

  • -1 for type
  • -0.5 for dataID (because I think that means only the numeric codes, which not everyone will remember)
  • 0 for which (only >0 because it exists now, meaning I liked it before)
  • +1 for dataType

Can you vote, @richardsc so I can make a plan?

@richardsc
Copy link
Collaborator Author

I vote +1 for dataType, and I like your idea of accepting both the "name" (which would be in the docs in a table) and the base-16 code that is in the Nortek manual (i.e. the above screenshot).

@dankelley
Copy link
Owner

Good. I'll go with dataType. There will be concerns with what to put in metadata for each type, but they ought to be in new issues so I can keep track.

@dankelley
Copy link
Owner

As usual, I am starting with the documentation. It is below, in first-draft form, so the wording might change but not the meaning -- unless @richardsc interjects soonish.

As I go along, I am also modifying the test suite as appropriate. But I have not commented-out the now-failing tests, so that the package in this branch will not build-test without error. (I want those errors to tell me what to do next.)

Screenshot 2022-10-07 at 8 41 35 AM

@dankelley
Copy link
Owner

Here's an example of the unspecified-dataType usage (with a private file)

[1] "~/Dropbox/oce_secret_data/ad2cp/secret1_trimmed.ad2cp"
> read.adp.ad2cp(f1)
  IDhex IDdec    name occurance
1  0x15    21   burst        88
2  0x16    22 average        11
3  0xa0   160    text         1

@dankelley
Copy link
Owner

NB: I'm changing some warnings to messages, because they don't indicate a problematic state. This will involve changing tests like

            expect_warning(
                expect_warning(
                    d1 <- read.oce(f1, dataType="average"),
                    "using to=100 based on file contents"),
                "setting plan=0")

to read instead like

            expect_message(
                expect_message(
                    d1 <- read.oce(f1, dataType="average"),
                    "using to=100 based on file contents"),
                "setting plan=0")

An advantage of this is that people might use options(warn=3) to get into the debugger at spots that are actually problematic (e.g. an early end-of-file).

@dankelley
Copy link
Owner

I've made some progress in the "i2005" branch, commit b2199da. Only a small part of plotting is known to work (and I've not even begun to update its documentation).

@dankelley
Copy link
Owner

dankelley commented Oct 11, 2022

I am going to use this comment, which will be edited in-place, for a checklist of items remaining to be done.

  • read.adp.ad2cp(..., dataType="echosounderRaw") not computing distance (done in i2005 branch commit 9696ae8)

@dankelley
Copy link
Owner

I have started looking at this again, because I want this in the new CRAN. As a first step, I updated "ad2cp" based on "develop". Then I updated "i2005" based on "ad2cp" (which was tricky because there was a merge conflict that amounted to several hundred lines). Then I tested "ad2cp" and since it passed, I deleted the "i2005" branch.

My next step will be to merge "ad2cp" into "develop", and then to delete "ad2cp".

Why the branch deletions? Because the longer code sits in a branch other than "develop", the more likely it becomes that there will be merge conflicts that (a) waste my time and (b) may introduce errors because it's hard to compare logic flow across similar-but-different 200-line code chunks.

Since the ad2cp code was marked provisional, I don't see any reason to deprecate functions, or to make a new one called (something)NEW. The problem is that the object structure is changed. That means that all the generics -- summary(), subset() and plot etc -- need to be changed, not just read.*()..

@dankelley
Copy link
Owner

I've merged the "ad2cp" branch into "commit", and deleted "ad2cp" locally and on github.

@dankelley
Copy link
Owner

@richardsc I think this issue ought to be closed. It was basically completed a few months ago, and if there are problems, they ought to be discussed in new issues. Note that the "ad2cp" branch no longer exists; testing ought to be done in the "develop" branch.

Closing this gets us one step closer to being able to contemplate a CRAN release. And I always like to have a month or so of stable code before doing that. I am not too worried that read.adp.ad2cp() now works differently than it did prior to the posting of this issue, because it was always marked as provisional code, subject to change. And I am pretty certain that the new code is much more useful than the old code.

@dankelley
Copy link
Owner

During a discussion, CR and I decided to merge this. I've done that, checked that local build/check works, and pushed to GH as commit 68fb448 of the "develop" branch, so I am closing this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants