check dataset element description #329

Open
stevenchong opened this issue Jan 7, 2019 · 8 comments

@stevenchong
Contributor

Following a conversation with @mpsaloha and @gothub, we wanted to get clarification on the definition of a "dataset" that appears in the dataset element description:

The dataset field encompasses all information about a single dataset. A dataset is defined as all of the information describing a data collection event. This event may take place over some period of time and include many actual collections (a time series or remote sensing application) or it could be just one actual collection (a day in the field).

The second sentence caught our attention and sounds more relevant to the metadata about a dataset, rather than to a dataset itself.

If this description gets edited, note that it also appears in the DatasetType description.

@amoeba
Contributor

amoeba commented Jan 8, 2019

Good catch. I think the word 'dataset' is being used in two different ways in this description: first, to describe the scientific concept of a dataset and, second, to describe what an EML dataset is. I think sentence two holds if you use the second definition but not the first. I actually think that might have been the original intent of the wording.

Did you and the others think up any alternatives, or would you like to have a try at it if you think it still needs tweaking?

@mpsaloha
Contributor

mpsaloha commented Jan 8, 2019 via email

@amoeba
Contributor

amoeba commented Jan 9, 2019

Thanks @mpsaloha that looks pretty good.

I put together a version with minimal modification to increase clarity:

DatasetType is the base type for the dataset element. The dataset element is a container for the information describing a data collection event. This event may take place over some period of time and include many actual collections (a time series or remote sensing application) or it could be just one actual collection (a day in the field).

What do you think? If you like your version better, I'd be fine with that. I'll send this over to the #eml channel in case anyone else has thoughts.

@srearl
Contributor

srearl commented Jan 9, 2019

Hi @amoeba - Interesting discussion. I am surely overthinking this, but I find 'event' a bit misleading and, maybe, constraining, as it conveys the sense that a dataset results only from going into the field. I wonder if the language could be a bit more encompassing to make it read less "fieldy" and reflect that a dataset could in fact describe the output of extensive research. I played around a bit, focusing on research effort as a substitute.

DatasetType is the base type for the dataset element. The dataset element is a container for the information describing the features and products of a research effort. The research effort described may be expansive, taking place over an extended period of time and include many unique data products (e.g., tabular data tables, shapefiles), such as resulting from a thesis, or could be a short, focused effort resulting in one or more data products.

@amoeba
Contributor

amoeba commented Jan 9, 2019

Thanks for chiming in, @srearl! I take your point about the constrained scope of the current wording. We certainly do use EML dataset to document resources where concepts like temporal/spatial coverage do not apply (e.g., the derived output of running a physical simulation model).

@mpsaloha
Contributor

mpsaloha commented Jan 10, 2019 via email

@stevenchong
Contributor Author

I'll just point out that the eml-dataset module has this sentence that (sort of) defines a dataset:

"A dataset can be (and often is) composed of a series of data entities (tables) that are linked together by particular integrity constraints."

I don't see that text appearing in any of the field descriptions. Perhaps it's worth adding to the "dataset" field description.

Original context: https://github.com/NCEAS/eml/blob/BRANCH_EML_2_2/docs/eml-modules-resources.md#the-eml-dataset-module---dataset-specific-information
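
To make the quoted sentence concrete, here is a rough sketch (not taken from the EML docs) of a dataset acting as a container for several data entities. It assumes a simplified, non-validating document: `dataset`, `title`, `dataTable`, and `entityName` are EML element names, but the helper `build_dataset`, the title, and the table names are hypothetical, and required elements such as `creator` and `contact` are left out for brevity.

```python
import xml.etree.ElementTree as ET


def build_dataset(title, table_names):
    """Build a simplified <dataset> element holding one <dataTable> per name.

    Illustrative only: the result is not a valid EML document.
    """
    dataset = ET.Element("dataset")
    ET.SubElement(dataset, "title").text = title
    for name in table_names:
        # Each data entity (table) is a child of the single dataset container.
        table = ET.SubElement(dataset, "dataTable")
        ET.SubElement(table, "entityName").text = name
    return dataset


# Hypothetical example values, chosen only for illustration.
dataset = build_dataset("Stream chemistry survey", ["sites", "samples"])
xml_text = ET.tostring(dataset, encoding="unicode")
print(xml_text)
```

The point of the sketch is only that one `dataset` element can hold many entities, whether they came from a single day in the field or a longer research effort.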

@mpsaloha
Contributor

mpsaloha commented Jan 10, 2019 via email

5 participants