Differences in citations "cited_by_count" and "oa_fetch" #115

Open
JeffreySmithA opened this issue Jun 8, 2023 · 3 comments

Comments

@JeffreySmithA

JeffreySmithA commented Jun 8, 2023

Hi,

I'm struggling to reconcile two facts.

On the one hand, I fetch the works of a specific author, for example with the following code:

test_oa <- oa_fetch(
  author.id = "A2054529157",
  entity = "works",
  verbose = TRUE
)

When I look at the first row of this data frame (paper ID W2004018659) and open its counts_by_year, I get the following:

year / cited_by_count
2023/5
2022/6
2021/10
2020/15
2019/11
2018/11
2017/5
2016/12

On the other hand, when I try to count the citations to that same paper for each year, I use this code:

dat2 <- oa_fetch("works", cites = "W2004018659") |>
  dplyr::count(publication_year)
dat2

The citation counts no longer match. I get:

year/citations
2023/3
2022/7
2021/13
2020/13
2019/11
2018/11
2017/5
2016/13

Would anyone be able to explain why these numbers differ for the same work?

Thanks again in advance!
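
For anyone wanting to see the size of the gap at a glance, here is a minimal sketch that lines up the two tables above year by year. It makes no API call: the numbers are copied by hand from this post, and the data frame names (counts_by_year, citing_works, cmp) are just illustrative.

```r
# Per-year counts copied from the two tables in this post (no API call is made).
counts_by_year <- data.frame(
  year = 2023:2016,
  cited_by_count = c(5, 6, 10, 15, 11, 11, 5, 12)
)
citing_works <- data.frame(
  publication_year = 2023:2016,
  n = c(3, 7, 13, 13, 11, 11, 5, 13)
)

# Join the two sources on year; merge() sorts the result by year ascending.
cmp <- merge(counts_by_year, citing_works,
             by.x = "year", by.y = "publication_year", all = TRUE)

# Positive diff: counts_by_year reports more citations than the citing-works tally.
cmp$diff <- cmp$cited_by_count - cmp$n
print(cmp)

# Totals over the eight years shown.
print(colSums(cmp[, c("cited_by_count", "n")]))
```

The per-year differences partly cancel, so the two totals end up much closer than the individual years do.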

@trangdata
Collaborator

Hi @JeffreySmithA, good question! The OpenAlex docs explain that counts_by_year only goes back ten years, which explains some of the missing years, but I'm not sure why we see 12 here (perhaps they recently increased it to 12?).

List: Works.cited_by_count for each of the last ten years, binned by year. To put it another way: each year, you can see how many times this work was cited.
Any citations older than ten years old aren't included. Years with zero citations have been removed so you will need to add those in if you need them.
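
Since the docs say zero-citation years are dropped, a minimal sketch of adding them back in could look like the following. The input data frame here is made up for illustration (it is not real output for any work); only the column names year and cited_by_count follow what counts_by_year shows in this thread.

```r
# Hypothetical counts_by_year for one work, with 2019 and 2021 missing
# (made-up numbers, used only to illustrate the gap-filling step).
counts <- data.frame(
  year = c(2023, 2022, 2020, 2018),
  cited_by_count = c(5, 6, 15, 11)
)

# Build every year in the observed range, then left-join the observed counts.
all_years <- data.frame(year = seq(min(counts$year), max(counts$year)))
filled <- merge(all_years, counts, by = "year", all.x = TRUE)

# Years absent from the input come back as NA; treat them as zero citations.
filled$cited_by_count[is.na(filled$cited_by_count)] <- 0
print(filled)
```

This only fills gaps between the earliest and latest year present. To cover a fixed window instead, replace the seq() call with the range you need.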

I'm not sure exactly how cited_by_count is calculated, but I imagine that differences in how each citation is assigned to a year (perhaps not by the citing work's publication_year) account for some of the other discrepancies.

If resolving this issue is important to you, I recommend reaching out to the OpenAlex team.

@JeffreySmithA
Author

Thank you! I've reached out to them and will report back here when I get a response. If I don't get a response in the near future, I will close the thread.

@amacanovic

@JeffreySmithA did you ever receive a response from the OA team? Could you please share the response?

I am seeing discrepancies even in the ordinary oa_fetch, where the "cited_by_total" counts per year, added together, exceed the "cited_by_count" value within the author entity. I cannot deduce from the docs why this would be the case.

E.g. here:

test <- oa_fetch(entity = "authors", openalex_id = "https://openalex.org/A5002522655")

# cited_by_count returns 19k citations
test$cited_by_count
[1] 19254


# and if we add up together year-by-year citations since 2012 **only**, we get more!
sum(test[[8]][[1]]$cited_by_count)
[1] 29367
