New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a task that does entity linking through DBpedia Spotlight for a single document #52

Closed
wants to merge 5 commits into
base: master
from

Conversation

Projects
None yet
2 participants
@aolieman
Copy link
Contributor

aolieman commented Apr 30, 2014

I've tried to keep this task as consistent as possible with semanticize. It returns the same kind of output, but does take a couple of parameters. There is also a unittest.

@@ -8,4 +8,5 @@ nltk
scikit-learn>=0.13
setuptools>=1.3.2
weighwords
git+https://github.com/aolieman/pyspotlight.git

This comment has been minimized.

@larsmans

larsmans Apr 30, 2014

Collaborator

This is a bit problematic as it introduces a dependency on GitHub as well (and on you ;)

Can't we use the version on PyPI?

This comment has been minimized.

@aolieman

aolieman May 1, 2014

Contributor

I see the problem.

Yes, as a matter of fact, we can use the PyPI version. The changes I made were due to a bug in DBp Spotlight, which has been fixed in their newest build. Using PyPI pyspotlight could only cause a minor issue if it is used with custom DBp Spotlight instances running an older build. In that case, there will be some duplicates in the output.

I'll try to get my modifications accepted upstream, so pyspotlight can be made to work with any Spotlight instance. I've changed the dependency in my xtas fork, but find it impossible to push on NS WiFi. Will have to try later.

This comment has been minimized.

@larsmans

larsmans May 1, 2014

Collaborator

In fact, I'm getting a test failure from Travis for this...

The other option is to integrate this library into xtas. Since it's low-profile and therefore likely to break (no users == no complaints) in future versions, that might be a better idea. I can try to do a history merge to make later patching easier.

def test_dbpedia_spotlight():
en_text = u"Will the efforts of artists like Moby help to preserve the Arctic?"
nl_text = u"Ik kan me iets herrinneren over de burgemeester van Amstelveen \
en het achterwerk van M\xe1xima. Verder was Koningsdag een zwart gat."

This comment has been minimized.

@larsmans

larsmans Apr 30, 2014

Collaborator

Now that's real-world text.

This comment has been minimized.

@aolieman

aolieman May 1, 2014

Contributor

Not an entirely true story, but I'm glad you like it 😉

@@ -8,4 +8,5 @@ nltk
scikit-learn>=0.13
setuptools>=1.3.2
weighwords
pyspotlight

This comment has been minimized.

@aolieman

aolieman May 1, 2014

Contributor

Travis likes it better this way.
About integrating this library into xtas: seems reasonable, it's just a small convenience wrapper anyway. I seem to be using it more than the original maintainer at the moment, so it's likely I'll be doing the fixes when it breaks.

I looked into what a "history merge" is, but didn't find an explanation. If it makes patching from an upstream branch (e.g. my fork) really easy, this sounds like a good plan. And I would like to learn more about it :-).

But I also see the merits of having a version on PyPI that is maintained. I've asked the original maintainer if I can take this over from him while he is occupied with other things. Btw, my modification to pyspotlight allows for arbitrary keyword arguments to be passed, so it would be robust against the majority of DBp Spotlight REST API changes. Just as long as those docs are updated and we know which arguments to pass... ;-).

This comment has been minimized.

@larsmans

larsmans May 3, 2014

Collaborator

What I meant is actually a subtree merge. Can you keep in touch with upstream, and tell me if and when we need to integrate pyspotlight?

@larsmans

This comment has been minimized.

Copy link
Collaborator

larsmans commented May 2, 2014

All tests pass on my box. Merged by rebase as 1d43bc7, thanks!

@larsmans larsmans closed this May 2, 2014

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment