Add ability to instantiate a dataset without instantiating a dataverse #28

Open
pdurbin opened this issue Oct 6, 2015 · 16 comments

@pdurbin
Member

pdurbin commented Oct 6, 2015

In working on an internal ticket that ultimately led to IQSS/dataverse#2599 being opened, I wrote the following to @garthg about my perception that it is impossible to instantiate a dataset without first instantiating a dataverse:

It looks like the get_dataset_by_doi method is really a loop that
iterates through get_datasets which seems to be a representation of the
list of datasets from the operation above that's failing:

$ grep get_dataset_by_doi dataverse/dataverse.py -A2
    def get_dataset_by_doi(self, doi, refresh=False):
        return next((s for s in self.get_datasets(refresh) if s.doi == doi), None)

I only bring this up because it seems like I can operate on your
dataset DOIs if I skip the "List datasets in a dataverse" operation and
go directly to these API endpoints:

http://guides.dataverse.org/en/4.2/api/sword.html#display-a-dataset-atom-entry

http://guides.dataverse.org/en/4.2/api/sword.html#display-a-dataset-statement

I'd be happy to be told I'm wrong about this.

Especially once datasets can be found via search (#21) I imagine that datasets will be able to be instantiated without instantiating a dataverse but I thought I'd go ahead and create this issue so we can talk about it.
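
To make the concern concrete, here is a minimal sketch of the lookup pattern quoted above. The `Dataset` and `Dataverse` classes below are stand-ins written for illustration, not the client's real classes; only the body of `get_dataset_by_doi` is copied from the grep output. The `list_calls` counter is a hypothetical addition that shows every DOI lookup going through the dataset listing.

```python
class Dataset:
    """Stand-in for the client's Dataset class (illustration only)."""

    def __init__(self, doi, title):
        self.doi = doi
        self.title = title


class Dataverse:
    """Stand-in for the client's Dataverse class (illustration only)."""

    def __init__(self, datasets):
        self._datasets = datasets
        self.list_calls = 0  # hypothetical counter, not part of the client

    def get_datasets(self, refresh=False):
        # Per the description above, the real method appears to be backed by
        # the "List datasets in a dataverse" operation.
        self.list_calls += 1
        return self._datasets

    def get_dataset_by_doi(self, doi, refresh=False):
        # Same expression as in the grep output: a linear scan of the listing.
        return next((s for s in self.get_datasets(refresh) if s.doi == doi), None)


dataverse = Dataverse([Dataset("doi:10.5072/FK2/LZGXQ8", "Study of Cats")])
found = dataverse.get_dataset_by_doi("doi:10.5072/FK2/LZGXQ8")
missing = dataverse.get_dataset_by_doi("doi:10.5072/FK2/UNKNOWN")
```

In this sketch a lookup cannot succeed, or even fail cleanly, unless the listing call works first, which is why a failing listing blocks access to a dataset whose DOI is already known.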

@pdurbin pdurbin changed the title Add ability to instatiate a dataset without instantiating a dataverse Add ability to instantiate a dataset without instantiating a dataverse Oct 6, 2015
@rliebz
Contributor

rliebz commented Oct 10, 2015

While it is possible to get some information from those endpoints, it appears that it's insufficient to create a Dataset object.

Example outputs from those endpoints:

<entry xmlns="http://www.w3.org/2005/Atom">
  <bibliographicCitation xmlns="http://purl.org/dc/terms/">rliebz@gmail.com, 2015, "Study of Cats", http://dx.doi.org/10.5072/FK2/LZGXQ8,  API Test Dataverse,  DRAFT VERSION</bibliographicCitation>
  <generator uri="http://www.swordapp.org/" version="2.0"/>
  <id>https://apitest.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:10.5072/FK2/LZGXQ8</id>
  <link href="https://apitest.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:10.5072/FK2/LZGXQ8" rel="edit"/>
  <link href="https://apitest.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:10.5072/FK2/LZGXQ8" rel="http://purl.org/net/sword/terms/add"/>
  <link href="https://apitest.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit-media/study/doi:10.5072/FK2/LZGXQ8" rel="edit-media"/>
  <link href="https://apitest.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/statement/study/doi:10.5072/FK2/LZGXQ8" rel="http://purl.org/net/sword/terms/statement" type="application/atom+xml; type=feed"/>
  <treatment xmlns="http://purl.org/net/sword/terms/">no treatment information available</treatment>
  <link href="http://dx.doi.org/10.5072/FK2/LZGXQ8" rel="alternate"/>
</entry>

<feed xmlns="http://www.w3.org/2005/Atom">
  <id>https://apitest.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:10.5072/FK2/LZGXQ8</id>
  <link href="https://apitest.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:10.5072/FK2/LZGXQ8" rel="self"/>
  <title type="text">Study of Cats</title>
  <author>
    <name>rliebz@gmail.com</name>
  </author>
  <updated>2015-10-10T12:12:00.569Z</updated>
  <category term="isMinorUpdate" scheme="http://purl.org/net/sword/terms/state" label="State">true</category>
  <category term="locked" scheme="http://purl.org/net/sword/terms/state" label="State">false</category>
  <category term="latestVersionState" scheme="http://purl.org/net/sword/terms/state" label="State">DRAFT</category>
</feed>

One issue is that there is no reference to the dataset id (the non-DOI id) in either of those endpoints, but the id is needed for some of the operations to take advantage of the native API. Normally the client gets that information from the parent Dataverse object by calling the native endpoint /dataverses/<alias>/contents and iterating through the datasets. If we haven't instantiated a Dataverse object, though, I don't think there's a way to get that id.
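
As a rough illustration of what the two SWORD responses do and do not provide, the sketch below parses trimmed copies of the example outputs with the standard library. The `summarize` helper and the shape of its return value are hypothetical and are not part of the client. The DOI is taken from the tail of the edit URI, which is an assumption based on the sample output only.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Trimmed copies of the example outputs shown above.
ENTRY_XML = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>https://apitest.dataverse.org/dvn/api/data-deposit/v1.1/swordv2/edit/study/doi:10.5072/FK2/LZGXQ8</id>
  <link href="http://dx.doi.org/10.5072/FK2/LZGXQ8" rel="alternate"/>
</entry>"""

STATEMENT_XML = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title type="text">Study of Cats</title>
  <category term="locked" scheme="http://purl.org/net/sword/terms/state" label="State">false</category>
  <category term="latestVersionState" scheme="http://purl.org/net/sword/terms/state" label="State">DRAFT</category>
</feed>"""


def summarize(entry_xml, statement_xml):
    """Collect the fields the SWORD entry and statement expose (hypothetical helper)."""
    entry = ET.fromstring(entry_xml)
    feed = ET.fromstring(statement_xml)
    edit_uri = entry.findtext(ATOM + "id")
    # Assumption: the DOI is whatever follows "/study/" in the edit URI.
    doi = edit_uri.split("/study/", 1)[1]
    # Map each state term to its text value, e.g. "locked" -> "false".
    states = {c.get("term"): c.text for c in feed.iter(ATOM + "category")}
    return {
        "doi": doi,
        "title": feed.findtext(ATOM + "title"),
        "state": states.get("latestVersionState"),
        "locked": states.get("locked") == "true",
    }


summary = summarize(ENTRY_XML, STATEMENT_XML)
```

The summary carries a DOI, a title, and state flags, but nothing that looks like a database id, which matches the gap described above.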

@pdurbin
Member Author

pdurbin commented Oct 10, 2015

One issue is that there is no reference to the dataset id (the non-DOI id) in either of those endpoints, but the id is needed for some of the operations to take advantage of the native API.

Right, it's absolutely an issue that SWORD operates on DOIs and the native API operates on database IDs. We want them both to operate on DOIs, which is what IQSS/dataverse#1837 is about.

Meanwhile, I've had the same thought that as a workaround (until the native API supports DOIs) perhaps SWORD could somehow expose the database ID of a dataset. I played around with this in a branch at IQSS/dataverse@639d8c3, as I mentioned in a comment on IQSS/dataverse#1837. In a comment at IQSS/dataverse@639d8c3 you can see that the XML contains "datasetEntityId".

By the way, another way to get database IDs is via the Search API if you use the (undocumented) show_entity_ids=true query parameter: https://github.com/IQSS/dataverse/blob/v4.2/src/main/java/edu/harvard/iq/dataverse/api/Search.java#L66 . It's probably time to simply document this since (again) the native API currently doesn't support DOIs.
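
A minimal sketch of how that workaround might look from Python, without making a network call. The `show_entity_ids=true` parameter comes from the comment above; the helper names, the `type=dataset` filter, and the assumed response shape (`data.items[]` entries carrying `name` and `entity_id`) are assumptions that should be checked against the Dataverse version in use. The sample response and its id value are made up for illustration.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query):
    """Build a Search API URL that asks for database ids (hypothetical helper)."""
    params = {"q": query, "type": "dataset", "show_entity_ids": "true"}
    return base_url.rstrip("/") + "/api/search?" + urlencode(params)


def entity_ids_by_name(response_json):
    """Map result names to database ids, assuming data.items[].entity_id exists."""
    items = response_json.get("data", {}).get("items", [])
    return {item["name"]: item["entity_id"] for item in items if "entity_id" in item}


url = build_search_url("https://apitest.dataverse.org/", "Study of Cats")

# Made-up response for illustration; the id 42 is not a real value.
sample_response = {"data": {"items": [{"name": "Study of Cats", "entity_id": 42}]}}
ids = entity_ids_by_name(sample_response)
```

Fetching `url` with an HTTP client and passing the decoded JSON to `entity_ids_by_name` would then give the id that the native API expects, if the response has the shape assumed here.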

Thanks for your patience with all this, @rliebz ! And thanks for your continued involvement in the Dataverse Python client!

@garthg

garthg commented Oct 12, 2015

Hi @pdurbin and @rliebz ,

I'm not sure this will be useful, but I wanted to make it available to you both just in case. I have a helper class that can handle some of the Dataset API functionality without instantiating a Dataset object. It also has a wrapper for finding the database ID using the undocumented "show_entity_ids=true" feature.

The code has minimal comments and some strange workarounds, so it's not production ready and I haven't released it properly anywhere, but I dumped it on Pastebin here if you'd like to look at it: http://pastebin.com/ipdhEPXA .

@pdurbin
Member Author

pdurbin commented Oct 15, 2015

@garthg wow, you've got a whole repo at https://github.com/garthg/petitions-dataverse you're working on! Great! I found this in your pastebin. :)

@garthg

garthg commented Oct 16, 2015

Hi @pdurbin ,

Yep! That's part of our ongoing project around the Antislavery Petitions. Kevin Condon helped me set up a process where the code in that repo creates a zip archive with a suitable structure of XML files for importing into a Dataverse by a non-public backend script. My recent work with you has been to migrate that to use the Dataverse API instead of a backend script.

If you're curious, you can also check out the front end prototype we built for the data at http://antislaverypetitions.pythonanywhere.com/map, which links back to the Dataverse studies.

@pdurbin
Member Author

pdurbin commented Oct 16, 2015

@garthg wow! That's fantastic! @mcrosas @thegaryking et al. should check out http://antislaverypetitions.pythonanywhere.com/map and how it links back to datasets under https://dataverse.harvard.edu/dataverse/antislaverypetitionsma . I love the timeline feature. :)

Yes, please keep reminding us of anything you need API-wise. I know IQSS/dataverse#2599 was a big issue and it's slated for the next release (4.2.1). Please keep the feedback coming!

@mercecrosas
Member

👍 Yes, really nice, @garthg! Great way to integrate visualization with the supporting data.

@vajlex

vajlex commented Oct 16, 2015

Truly excellent! Is there a schema for the minimum elements in the JSON needed for the front end to work? I would like to create another example based on the mapviewdb.json, though some of the elements seem specific to the dataset. Thanks for this!

@garthg

garthg commented Oct 16, 2015

Hi @vajlex and @mcrosas , thanks for the supportive comments!

@vajlex Regarding your question of minimum schema, the visualization is currently fairly tightly integrated with the petitions dataset. It would work with minimal changes on another petition dataset, or with some work you could adapt it to use different columns. Right now it expects the following to be defined per row:

  • time start
  • time end
  • signatures
  • title
  • topic
  • pds url
  • dataverse id
  • location

And it also expects pre-built maps of rowsForPlace, rowsForYear, and latLngForPlace.
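
For anyone adapting this to other data, here is a rough sketch of how the three pre-built maps could be derived from rows. The names rowsForPlace, rowsForYear, and latLngForPlace come from the description above; the per-row key names (`location`, `time_start`, `time_end`), the use of row indices as map values, and the sample data are assumptions and may not match the actual mapviewdb.json layout.

```python
from collections import defaultdict


def build_indexes(rows, coordinates):
    """Derive the three lookup maps from rows (hypothetical layout)."""
    rows_for_place = defaultdict(list)
    rows_for_year = defaultdict(list)
    for index, row in enumerate(rows):
        rows_for_place[row["location"]].append(index)
        # Assumption: a row counts for every year from its start to its end.
        for year in range(row["time_start"], row["time_end"] + 1):
            rows_for_year[year].append(index)
    # Keep coordinates only for places that actually occur in the rows.
    lat_lng_for_place = {
        place: coordinates[place] for place in rows_for_place if place in coordinates
    }
    return {
        "rowsForPlace": dict(rows_for_place),
        "rowsForYear": dict(rows_for_year),
        "latLngForPlace": lat_lng_for_place,
    }


# Made-up sample rows and coordinates for illustration.
rows = [
    {"title": "Petition A", "location": "Boston", "time_start": 1838, "time_end": 1839},
    {"title": "Petition B", "location": "Salem", "time_start": 1839, "time_end": 1839},
]
coordinates = {"Boston": [42.36, -71.06]}
indexes = build_indexes(rows, coordinates)
```

The source code in the repository linked below is the authoritative reference for the real file format.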

If you're interested in looking at the source code for generating the mapviewdb.json file as well as the html/css/js source, it's all available in another repo at:
https://github.com/garthg/petitions-visualization

@pdurbin I sure will keep hassling you! I really do appreciate how responsive you and the team have been.

@vajlex

vajlex commented Oct 16, 2015

Great @garthg
Actually, I think I can hack this with some of my data, which is historical placename data... so I think I can populate the rowsForPlace, rowsForYear, and latLngForPlace elements.
Will let you know how my experiment goes, meanwhile, awesome work you've done!

@garthg

garthg commented Oct 16, 2015

@vajlex That sounds promising! I'd love to see it when you get it up and running. Very cool!

@eaquigley

@garthg do you mind if I use http://antislaverypetitions.pythonanywhere.com/map as an example in a presentation on ways people are building off the Dataverse APIs?

It is for the Increasing Openness and Connections portion of this session: https://dlfforum2015.sched.org/event/62384c349f7a6aaf6aa5b3e7d6b5bd88#.VivEvBCrT-Y

@garthg

garthg commented Oct 24, 2015

@eaquigley Please feel free to include any of that work in your presentation! I'm excited that you're interested in it. If you receive any interesting comments or questions on it, I'd love to hear about that afterwards as well.

@mercecrosas
Member

@garthg I'm using it as one of the examples too in a talk on Monday, among other visualizations and analysis from data in Dataverse. Thanks!

@garthg

garthg commented Oct 26, 2015

@mcrosas That's great to hear that you chose to include it as an example! Really exciting. As with Elizabeth, if you get any comments or questions on the work, I'd love to hear about it afterwards.

@eaquigley

Will gladly report back any comments @garthg! Thanks for building it so we can show it off!


6 participants