[dataset] Microposts2013 wrapper #41

Closed
RicardoUsbeck opened this issue Nov 4, 2014 · 7 comments

Comments

@RicardoUsbeck
Collaborator

Write a wrapper for the Microposts2013 dataset.
Annotate the license, experiment type and language.
Give provenance.
Update https://github.com/AKSW/gerbil/wiki/Licences-for-datasets

@RicardoUsbeck RicardoUsbeck added this to the Version 2 - new core and better logging milestone Nov 4, 2014
@TortugaAttack
Contributor

Just to have it somewhere:

type: microposts
licence: CC BY-NC-SA 3.0
language: en
provenance: http://oak.dcs.shef.ac.uk/msm2013/ie_challenge/MSM2013-CEChallengeFinal.zip

@MichaelRoeder
Member

Please add it to this page: https://github.com/AKSW/gerbil/wiki/Licences-for-datasets

@TortugaAttack
Contributor

done

@sagnik

sagnik commented Aug 2, 2017

I am trying to evaluate the algorithms on the Microposts datasets on my local machine. The wiki says that GERBIL expects the data in specific files, namely:

Microposts2013

gerbil_data/datasets/microposts2013/goldStandard.tsv
gerbil_data/datasets/microposts2013/testSet.tsv
gerbil_data/datasets/microposts2013/TweetsTrainingSetCH.tsv

Microposts2014

gerbil_data/datasets/microposts2014/Microposts2014-NEEL_challenge_TweetsTestSet.csv
gerbil_data/datasets/microposts2014/Microposts2014-NEEL_challenge_TweetsTrainingSet.csv

Microposts2015

gerbil_data/datasets/microposts2015/dev/NEEL2015-dev-gold_v3.tsv
gerbil_data/datasets/microposts2015/dev/NEEL2015-dev-tweets.tsv
gerbil_data/datasets/microposts2015/test/NEEL2015-test-gold_v2.tsv
gerbil_data/datasets/microposts2015/test/NEEL2015-test-tweets.tsv
gerbil_data/datasets/microposts2015/training/NEEL2015-training-gold_v4.ts
gerbil_data/datasets/microposts2015/training/NEEL2015-training-tweets_v2.tsv

Microposts2016

gerbil_data/datasets/microposts2016/Dev Set/NEEL2016-dev.tsv
gerbil_data/datasets/microposts2016/Dev Set/NEEL2016-dev_neel.gs
gerbil_data/datasets/microposts2016/Test Set/NEEL2016-test.tsv
gerbil_data/datasets/microposts2016/Test Set/NEEL2016-test_neel.gs
gerbil_data/datasets/microposts2016/Training Set/NEEL2016-training.tsv
gerbil_data/datasets/microposts2016/Training Set/NEEL2016-training_neel.gs

I have the original datasets downloaded. If you ever wrote a transformer to convert them to the format GERBIL expects, could you please point me to it? Otherwise, if you have a data model for the Microposts data, I can write the transformer myself.
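A minimal Python sketch for checking that the Microposts2013 files sit at the paths listed above. It assumes the relative paths are resolved from the directory that contains `gerbil_data` (typically the GERBIL working directory); the helper `missing_files` is hypothetical and not part of GERBIL.

```python
import sys
from pathlib import Path

# Relative paths copied from the wiki listing above (Microposts2013 only).
MICROPOSTS2013_FILES = [
    "gerbil_data/datasets/microposts2013/goldStandard.tsv",
    "gerbil_data/datasets/microposts2013/testSet.tsv",
    "gerbil_data/datasets/microposts2013/TweetsTrainingSetCH.tsv",
]


def missing_files(root, relative_paths):
    """Return the relative paths that do not exist as regular files under root."""
    base = Path(root)
    return [p for p in relative_paths if not (base / p).is_file()]


if __name__ == "__main__":
    # Assumption: the first argument is the directory containing gerbil_data.
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_files(root, MICROPOSTS2013_FILES)
    for path in missing:
        print("missing:", path)
    if not missing:
        print("all Microposts2013 files found")
```

The list can be extended with the paths of the other Microposts years in the same way.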

@TortugaAttack
Contributor

Actually, they should just show up (the Microposts datasets are defined in the dataset.properties with the given paths).
If they do not show up in the web frontend, please check the logs and maybe create a new issue ;)
Cheerz

@MichaelRoeder
Member

MichaelRoeder commented Aug 3, 2017

@sagnik the files that you have listed there should be the original files. At least, those are the files that we got from the Microposts organizers. If you have different files or other problems, please open a new issue, since this issue is already closed.

@sagnik

sagnik commented Aug 3, 2017

Please see #206, @TortugaAttack; I added the logs there as well.


4 participants