The description of the LTDataset class is currently not very clear #15

Open
ChristophLeonhardt opened this issue Apr 19, 2023 · 0 comments


@ChristophLeonhardt
Collaborator

The description states that the class "facilitates the linkage of datasets via shared unique identifiers". However, the class enriches data via shared attributes (i.e. metadata / variables), and this mechanism can also be used to add unique identifiers. The entire documentation focuses on adding these IDs, but other data can be added as well. This should be made clearer.
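To illustrate the distinction, here is a minimal base-R sketch using `merge()` rather than the LTDataset API. The toy tables and column names are hypothetical. Joining on a shared attribute enriches a dataset, and the join works the same way whether the added column is a unique identifier or any other variable.

```r
# Hypothetical toy data (illustration only, not LTDataset's API):
# a dataset to be enriched and an external table sharing the attribute "name".
speakers <- data.frame(
  name  = c("Person A", "Person B"),
  party = c("Party X", "Party Y"),
  stringsAsFactors = FALSE
)
reference <- data.frame(
  name       = c("Person B", "Person A", "Person C"),
  id         = c("ID_002", "ID_001", "ID_003"),  # a unique identifier ...
  birth_year = c(1965L, 1970L, 1980L),           # ... or any other variable
  stringsAsFactors = FALSE
)

# Enrichment: join on the shared attribute, keeping every row of `speakers`.
# Rows of `reference` without a match (Person C) are dropped.
enriched <- merge(speakers, reference, by = "name", all.x = TRUE)
print(enriched)
```

Only `id` is an identifier here; `birth_year` is added by exactly the same join.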

Also, the documentation of the class describes the first step as

a) the preparation of datasets which should be linked, i.e. the transformation into a comparable format and the assignment of shared unique identifiers

Currently, this step is done in advance. A decision is needed on where it should happen; the class is probably not the right place for this transformation.

#' LTDataset
#'
#' This class facilitates the linkage of datasets via shared unique
#' identifiers. Three steps are realized by this class: a) the preparation
#' of datasets which should be linked, i.e. the transformation into a comparable
#' format and the assignment of shared unique identifiers, b) the merge of
#' datasets based on these identifiers, c) the encoding or enrichment of the
#' data with three output formats (data.table, XML or CWB).
