Use functionality of DBpedia Spotlight #13

Open
ablaette opened this issue Apr 13, 2023 · 3 comments

Comments

ablaette commented Apr 13, 2023

See this as an entry point:
https://github.com/dbpedia-spotlight/spotlight-docker

Alternative: https://opentapioca.org/ (without docker)

@ablaette
Author

The prebuilt Docker images are not available for all architectures, in particular not for Apple M1 (arm64). We therefore need to build the image from the Dockerfile as follows.

git clone https://github.com/dbpedia-spotlight/spotlight-docker.git
cd spotlight-docker
docker build -t dbpedia/dbpedia-spotlight:latest .

Then run the image as follows.

docker run -tid --restart unless-stopped --name dbpedia-spotlight.de --mount source=spotlight-model,target=/opt/spotlight -p 2222:80 dbpedia/dbpedia-spotlight spotlight.sh de

@ablaette
Author

This snippet passes data to DBpedia Spotlight using classes from the NLP package. Two caveats:

  • The code does not work yet: what is wrong with the httr::GET() call?
  • A new development version of polmineR is required; there was a bug with setAs(..., "AnnotatedPlainTextDocument").

library(polmineR)
use("polmineR")

merkel_speeches <- corpus("GERMAPARLMINI") %>% 
  subset(speaker == "Angela Dorothea Merkel") %>%
  as.speeches(s_attribute_name = "speaker", s_attribute_date = "date")

doc <- as(merkel_speeches[[2]], "AnnotatedPlainTextDocument")

y <- httr::GET(
  url = "http://localhost:2222/rest/annotate",
  body = list(
    "data-urlencode" = sprintf("text=%s", doc[["content"]]),
    "data" = "confidence=0.35"
  ),
  httr::accept_json()
)
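
A likely cause, offered as a hedged sketch rather than a confirmed diagnosis: "data-urlencode" and "data" look like curl command-line options, not request fields, and a GET request carries its parameters in the URL query string rather than in a body. The helper below (build_annotate_url is a hypothetical name, not part of httr or polmineR) uses only base R to show what such a query string would look like, assuming the endpoint expects the parameters text and confidence.

```r
# Hypothetical helper: build the GET URL for the annotate endpoint by hand.
# Assumption: the service reads `text` and `confidence` from the query string.
build_annotate_url <- function(text, confidence = 0.35,
                               base_url = "http://localhost:2222/rest/annotate") {
  params <- c(text = text, confidence = as.character(confidence))
  # Percent-encode every value, including reserved characters such as "&" or " "
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  query <- paste(names(encoded), encoded, sep = "=", collapse = "&")
  paste0(base_url, "?", query)
}

url <- build_annotate_url("Angela Merkel")
print(url)
```

With httr, the same effect should be achievable by passing the parameters as a named list to the query argument of httr::GET() instead of body.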

@ablaette
Author

I had hoped that the offset positions of input and output would correspond, but that does not seem to be the case:

library(polmineR)
library(dplyr)    # %>%, as_tibble(), mutate(), select()
library(purrr)    # pluck()
library(jsonlite)

merkel_speeches <- corpus("GERMAPARLMINI") %>% 
  subset(speaker == "Angela Dorothea Merkel") %>%
  as.speeches(s_attribute_name = "speaker", s_attribute_date = "date")

doc <- as(merkel_speeches[[2]], "AnnotatedPlainTextDocument")

request <- httr::GET(
  url = "http://localhost:2222/rest/annotate",
  query = list(
    text = substr(doc[["content"]], 1, 990),
    confidence = 0.35
  ),
  httr::add_headers('Accept' = 'application/json')
)

# Output
httr::content(request, as = "text") %>%
  jsonlite::fromJSON() %>%
  pluck("Resources") %>%
  head() %>%
  .[, c("@surfaceForm", "@offset")]

# Input
as.data.frame(doc[["annotation"]]) %>% 
  as_tibble() %>%
  mutate(word = sapply(features, `[[`, "word")) %>%
  mutate(pos = sapply(features, `[[`, "pos")) %>%
  select(-features) %>%
  head()
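
One possible explanation, not confirmed in this thread, is a difference in indexing conventions: the annotation spans on the input side are 1-based, while the "@offset" values in the response may be 0-based character offsets that arrive as strings. The sketch below tests that assumption on toy data; align_offsets and the two small data frames are hypothetical stand-ins for the real response and annotation tables.

```r
# Toy stand-in for the annotation table: 1-based, inclusive character spans
tokens <- data.frame(
  start = c(1L, 8L, 15L),
  end   = c(6L, 13L, 20L),
  word  = c("Angela", "Merkel", "sprach"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the "Resources" table: offsets as strings,
# ASSUMED here to be 0-based
resources <- data.frame(
  `@surfaceForm` = "Merkel",
  `@offset` = "7",
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: shift offsets to 1-based and look up the token
# that starts at that position (NA if no token starts there)
align_offsets <- function(resources, tokens) {
  start <- as.integer(resources[["@offset"]]) + 1L
  resources[["token_index"]] <- match(start, tokens[["start"]])
  resources
}

matched <- align_offsets(resources, tokens)
print(tokens$word[matched$token_index])
```

If the shifted offsets still fail to line up on real data, other causes would need checking, for example differences between the text sent in the request and the text the annotation spans refer to.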
