Issue parsing some WoS exports with Group Authors. #13

Closed
Phocion opened this issue Feb 19, 2014 · 6 comments

Phocion commented Feb 19, 2014

Greetings all. I have been using Tethne to parse the large number of WoS exports I have for a particular project.

Out of 160 files (each containing the maximum 500 references), ~31 of them fail with the following error (although the actual string values vary):

In [19]: meta_list = rd.wos.convert(wos_list)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-19-c13337f31f8c> in <module>()
----> 1 meta_list = rd.wos.convert(wos_list)

/usr/local/lib/python2.7/dist-packages/tethne/readers/wos.pyc in convert(wos_data)
    592                         #  as our mapping key to ensure consistency with older
    593                         #  datasets.
--> 594                         author_index = wos_dict['AF'].index(author)
    595                         #e.g."WU, ZD"
    596                         author_au = wos_dict['AU'][author_index].upper()

ValueError: 'Brazilian Aging Brain Study Grp' is not in list

For example, the only lines on which "Brazilian Aging Brain Study Grp" appears in the file in question are:

data/ims/arrastra$ grep -H -r "Brazilian Aging Brain Study Grp" /data/arrastra/alzheimers_xph1_v4/Citations/
/data/arrastra/alzheimers_xph1_v4/Citations/sub1/savedrecs (12).txt.perror:CA Brazilian Aging Brain Study Grp
/data/arrastra/alzheimers_xph1_v4/Citations/sub1/savedrecs (12).txt.perror:   [Grinberg, L. T.; Alba, J. G.; Farfel, J. M.; Suemoto, C. K.; de Lucena Ferretti, R. E.; Leite, R. E. P.; de Andrade, M. P.; Pasqualucci, C. A.; Nitrini, R.; Jacob-Filho, W.; Brazilian Aging Brain Study Grp] Univ Sao Paulo, Sch Med, Brazilian Aging Brain Study Grp LIM22, Sao Paulo, Brazil.

I could easily add a check to ensure that the value is in fact present in the list, but that wouldn't really solve the issue. Perhaps this is failing because it is listed as a "Group Author" (WoS Field = CA)?
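
The mismatch described above can be sketched in a few lines. This is only an illustration, not Tethne's code: `wos_dict` here is a hypothetical, shortened record holding just the fields mentioned in this thread, and `names_in_address` and `unmatched_address_names` are made-up helper names. It assumes that a C1 entry lists its names inside leading square brackets, separated by semicolons, as in the grep output above.

```python
# Hypothetical, shortened record; the structure Tethne builds internally
# may differ. Only the fields discussed in this thread are included.
wos_dict = {
    "AF": ["Grinberg, L. T.", "Alba, J. G."],      # individual authors
    "CA": ["Brazilian Aging Brain Study Grp"],     # group author
    "C1": [
        "[Grinberg, L. T.; Alba, J. G.; Brazilian Aging Brain Study Grp] "
        "Univ Sao Paulo, Sch Med, Sao Paulo, Brazil."
    ],
}


def names_in_address(address):
    """Return the names listed inside the leading [...] of a C1 entry."""
    end = address.find("]")
    if not address.startswith("[") or end == -1:
        return []
    return [name.strip() for name in address[1:end].split(";")]


def unmatched_address_names(record):
    """Return names that occur in C1 but are absent from AF."""
    missing = []
    for address in record.get("C1", []):
        for name in names_in_address(address):
            if name not in record.get("AF", []):
                missing.append(name)
    return missing


print(unmatched_address_names(wos_dict))
# ['Brazilian Aging Brain Study Grp']
```

Any name this reports would make a lookup such as `wos_dict['AF'].index(name)` raise a ValueError, which matches the traceback. A membership check would silence the error but would also drop the group author from the mapping.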

Thanks!

@erickpeirson
Collaborator

Hi there, @Phocion; glad to hear that you're finding Tethne useful! I think that I see where the hiccup is occurring. Any chance you could send along some sample data? (send to erick.peirson@asu.edu)

Thanks!

@Phocion
Author

Phocion commented Feb 19, 2014

Thanks for the quick reply! Just sent an email as requested under my Drexel U. account.

erickpeirson added a commit that referenced this issue Feb 19, 2014
erickpeirson added a commit that referenced this issue Feb 19, 2014
@erickpeirson
Collaborator

Ok, see whether that works. The group author ('CA') field is now treated like a regular author in readers.wos.convert. The issue was that values from CA also appear in the author address field ('C1') but not in the author list, so the part of .convert() that maps authors to institutions could not find them.

In the future, we should think harder about what else we might want to do with the CA field.
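
A rough sketch of what "treated like a regular author" could look like, under stated assumptions. This is not the code from the referenced commits: `with_group_authors` is a hypothetical helper, the record is a made-up, shortened example, and the sketch assumes that AF and AU are parallel lists indexed the same way, as the lookup in the traceback suggests.

```python
def with_group_authors(record):
    """Return a copy of the record with CA values appended to AF and AU.

    AF and AU are assumed to be parallel lists, so a group author is
    appended to both to keep the indices aligned.
    """
    full_names = list(record.get("AF", []))
    short_names = list(record.get("AU", []))
    for group in record.get("CA", []):
        if group not in full_names:
            full_names.append(group)
            short_names.append(group)
    merged = dict(record)  # shallow copy; the input record is left unchanged
    merged["AF"] = full_names
    merged["AU"] = short_names
    return merged


# Hypothetical record with only the fields relevant to the lookup.
record = {
    "AF": ["Grinberg, L. T.", "Alba, J. G."],
    "AU": ["Grinberg, LT", "Alba, JG"],
    "CA": ["Brazilian Aging Brain Study Grp"],
}

merged = with_group_authors(record)
author_index = merged["AF"].index("Brazilian Aging Brain Study Grp")
author_au = merged["AU"][author_index].upper()
print(author_au)
```

With the group author present in both lists, the index lookup from the traceback succeeds instead of raising a ValueError. Whether a group author should share a list with individual authors is a separate design question.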

@erickpeirson erickpeirson reopened this Feb 19, 2014
@erickpeirson
Collaborator

Oops, closed automatically. Feel free to close if this resolves the problem.

@Phocion
Author

Phocion commented Feb 19, 2014

Thanks for the quick response! I'll test and confirm within the hour. One thing to consider about the CA field, at least in the way I'm thinking about it, is that it's used to credit the body of contributors from which the analyzed data originated. Many papers include "and the Alzheimer's Disease Neuroimaging Initiative" at the end of the author list if they used the ADNI dataset in their analysis. It's a specific way of citing this effort. Just a thought to consider.

@Phocion
Author

Phocion commented Feb 20, 2014

Looks great! Issue resolved. Thanks again for the quick turn-around!

@Phocion Phocion closed this as completed Feb 20, 2014
erickpeirson added a commit that referenced this issue Aug 25, 2014
The group author ('CA') field is now treated like a regular author in
readers.wos.convert. The issue was that values from CA are included in
the author address field ('C1'), and so the bit of .convert() that does
author-institution mapping got confused.


Former-commit-id: 0c4558f