## Web of Science API
This code is designed to interact with the WOS API using "premium" or "lite" access protocols.

-    `wos.py` contains code for interacting with the API directly.
-    `woscalls.py` includes calls made to the `Wos` class.
-    `metawos.py` can be used to extract metadata from search results.
-    `buildsearch.py` builds search strings from data supplied in a certain style of tsv file.

### Dependencies

-    [lxml](http://lxml.de/) for xml parsing
-    [suds](https://fedorahosted.org/suds/wiki/Documentation) for SOAP API interaction.

### The API Class

It's possible to work with the API class, `Wos`, directly using syntax like the following:

In [13]:
from wos import Wos
wos = Wos(client="Lite")

This will initiate the search client but not yet run any API calls. There are two options for the "client" keyword argument: "Search" and "Lite". The Lite API should be available to all institutions that subscribe to at least the Web of Knowledge Core Collection. Contact your representative to have access opened to your IP address, if it isn't already. [More details here](http://wokinfo.com/products_tools/products/related/webservices/). The Lite API provides basic search and basic metadata retrieval functionality, but _not_ citation retrieval.

The "Search" client is available at an additional cost, and does provide access to both forward and backward citation data. Access is available on a project basis, or as a yearly subscription. I would recommend first requesting a trial period to work with the API.

In [14]:
wos.authorize()

Search client authorized.


The `authorize` function attempts to authenticate based on IP address. If successful, an authorization token will be attached to all future requests in the session. 

In [15]:
wos.query_parameters('AU=(Peiretti AND Palmegiano) AND SO=(Animal Feed Science and Technology) AND PY=2004', database_id="WOK")

(queryParameters){
   databaseId = "WOK"
   userQuery = "AU=(Peiretti AND Palmegiano) AND SO=(Animal Feed Science and Technology) AND PY=2004"
   editions[] = <empty>
   symbolicTimeSpan = None
   timeSpan = 
      (timeSpan){
         begin = "1900-01-01"
         end = "2015-10-01"
      }
   queryLanguage = "en"
 }

First establish the `query_parameters` object, which should include a query string and, optionally, a set of parameters. Details on the structure of the query can be found in the API documentation, which can be requested from Thomson Reuters and which I'm also making available [here](https://www.msu.edu/~higgi135/WebServicesLiteguide.pdf) in a possibly outdated version.

Parameters can be provided as the following keyword arguments:

- **time_begin (str)** -- date in YYYY-MM-DD format.
- **time_end (str)** -- date in YYYY-MM-DD format.
- **database_id (str)** -- from the WOS set of database abbreviations. "WOS" corresponds to the WOS core collection.
- **query_language (str)** -- "en" is the only currently allowed value.
- **symbolic_timespan (str)** -- a human-readable timespan, e.g. "4weeks"; must be null if time_begin and time_end are used.
- **editions (list)** -- TODO list of sub-components of the selected database to use.
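
As an illustration of the query syntax, the sketch below assembles a query string from author, source, and year fields. `build_query` is a hypothetical helper and not part of `wos.py`; the field tags `AU`, `SO`, and `PY` are simply the ones used in the example query above.

```python
def build_query(authors, source=None, year=None):
    """Assemble a WOS-style query string (hypothetical helper, not part of wos.py)."""
    # All authors go inside a single AU=(...) clause, joined with AND.
    clauses = ["AU=({})".format(" AND ".join(authors))]
    if source:
        clauses.append("SO=({})".format(source))
    if year:
        clauses.append("PY={}".format(year))
    # The individual clauses are themselves joined with AND.
    return " AND ".join(clauses)


query = build_query(
    ["Peiretti", "Palmegiano"],
    source="Animal Feed Science and Technology",
    year=2004,
)
print(query)
```

The resulting string could then be passed to `wos.query_parameters`, together with any of the keyword arguments listed above.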

In [16]:
wos.retrieve_parameters()

(retrieveParameters){
   firstRecord = 1
   count = 100
   sortField[] = <empty>
 }

The `retrieve_parameters` function allows some control over the data that is returned.

- **first_record (int)** -- The number of the first record to return in the search.
- **count (int)** -- Number of records to return (maximum 100).
- **sort_field (list)** -- TODO Field to sort by (should be WOS field abbreviation).
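
Because `count` is capped at 100, larger result sets presumably have to be fetched in pages by advancing `first_record`. The sketch below only computes the paging windows; `page_windows` is a hypothetical helper, and it assumes record numbering starts at 1, as the default `firstRecord = 1` suggests.

```python
def page_windows(total_records, page_size=100):
    """Yield (first_record, count) pairs that together cover total_records."""
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    first = 1  # assumption: record numbering is 1-based
    while first <= total_records:
        # The last window may be shorter than page_size.
        yield first, min(page_size, total_records - first + 1)
        first += page_size


for first_record, count in page_windows(250):
    print(first_record, count)
```

Each pair could be supplied as the `first_record` and `count` keyword arguments of `wos.retrieve_parameters` before re-running the search.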

In [17]:
wos.search(wos.qp, wos.rp)

Found 1 Results for AU=(Peiretti AND Palmegiano) AND SO=(Animal Feed Science and Technology) AND PY=2004


(searchResults){
   queryId = "1"
   recordsFound = 1
   recordsSearched = 198371943
   records[] = 
      (liteRecord){
         uid = "WOS:000224567700011"
         title[] = 
            (labelValuesPair){
               label = "Title"
               value[] = 
                  "Chemical composition, organic matter digestibility and fatty acid content of evening primrose (Oenothera paradoxa) during its growth cycle",
            },
         source[] = 
            (labelValuesPair){
               label = "Issue"
               value[] = 
                  "3-4",
            },
            (labelValuesPair){
               label = "Pages"
               value[] = 
                  "293-299",
            },
            (labelValuesPair){
               label = "Published.BiblioDate"
               value[] = 
                  "OCT 15",
            },
            (labelValuesPair){
               label = "Published.BiblioYear"
               value[] = 
                  "2004",
   

Run the search by calling the `search` function with the query parameters and retrieve parameters objects as arguments (`wos.qp` and `wos.rp` respectively). 

Results can be found in the `wos.search_results.records` object, if any results were returned. More generally, `wos.search_results` can be used to find info about the response, including number of results.  
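
Each field of a record (`title`, `source`, and so on) is a list of label/value pairs, which can be awkward to query directly. A minimal sketch of flattening such a list into a dictionary follows; it uses plain dictionaries as stand-ins for the objects returned by the API, and `pairs_to_dict` is a hypothetical helper.

```python
def pairs_to_dict(pairs):
    """Map each label to its list of values."""
    return {pair["label"]: list(pair["value"]) for pair in pairs}


# Stand-in for the `source` field of the record shown above.
source_pairs = [
    {"label": "Issue", "value": ["3-4"]},
    {"label": "Pages", "value": ["293-299"]},
    {"label": "Published.BiblioYear", "value": ["2004"]},
]

source = pairs_to_dict(source_pairs)
print(source["Pages"][0])
```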

Additional methods can be used to get cited references as well as citing articles if the "Search" client is used.

In [2]:
from wos import Wos
wos = Wos(client="Lite")
wos.authorize()
wos.query_parameters('AU=(Peiretti AND Palmegiano) AND SO=(Animal Feed Science and Technology) AND PY=2004', database_id="WOK")
wos.retrieve_parameters(view_field=["title", "name"])
wos.search(wos.qp, wos.rp)

Search client authorized.
Found 1 Results for AU=(Peiretti AND Palmegiano) AND SO=(Animal Feed Science and Technology) AND PY=2004


(searchResults){
   queryId = "1"
   recordsFound = 1
   recordsSearched = 198821686
   records[] = 
      (liteRecord){
         uid = "WOS:000224567700011"
         title[] = 
            (labelValuesPair){
               label = "Title"
               value[] = 
                  "Chemical composition, organic matter digestibility and fatty acid content of evening primrose (Oenothera paradoxa) during its growth cycle",
            },
         source[] = 
            (labelValuesPair){
               label = "Issue"
               value[] = 
                  "3-4",
            },
            (labelValuesPair){
               label = "Pages"
               value[] = 
                  "293-299",
            },
            (labelValuesPair){
               label = "Published.BiblioDate"
               value[] = 
                  "OCT 15",
            },
            (labelValuesPair){
               label = "Published.BiblioYear"
               value[] = 
                  "2004",
   

The above code should return 1 result and can be used as a test to ensure code is working properly.

### Automating Searches

The `WosCalls` class provides a means of interacting with the API one level up. That is, lists of search strings or sets of search parameters can be provided to run in batch. *This functionality is still a work in progress*.

In [19]:
from woscalls import WosCalls
wosc = WosCalls(search_queries=bs.searches, database_id="WOK")
wosc.get_all_search_results()

Search client authorized.
Found 1 Results for AU=(Lambertsen) AND PY=1966 AND SO=(Acta Agric* Scand*)
Found 1 Results for AU=(Bentes) AND PY=1986 AND SO=(Acta Amazonica)
Found 1 Results for AU=(Maia) AND PY=1978 AND SO=(Acta Amazonica)
Found 0 Results for AU=(Loth) AND PY=1991 AND SO=(Agrochimica)
Found 1 Results for AU=(Jellum AND Powell) AND PY=1971 AND SO=(Agron* J*)
Found 5 Results for AU=(Bertoni) AND PY=1994 AND SO=(An* Asoc* Quim* Argent*)
Found 0 Results for AU=(Balnchini) AND PY=1981 AND SO=(Anal* Chem*)
Found 1 Results for AU=(Peiretti AND Palmegiano AND Masoero) AND PY=2004 AND SO=(Animal Feed Science and Technology)
Found 0 Results for AU=(Adhikari) AND PY=1991 AND SO=(Bangladesh J* Sci* Ind* Res*)
Found 0 Results for AU=(Serrano AND Guzm?n) AND PY=1994 AND SO=(Biochem* Systemat* Ecology)
Process complete.
Returned 10 records
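
The general batch pattern can be sketched independently of the API. The code below is not the `WosCalls` implementation: `run_batch` and `fake_search` are hypothetical names, and the search function is a stand-in that returns a result count without making any API call.

```python
def run_batch(queries, search_fn):
    """Run search_fn on each query and return {query: result_count}."""
    counts = {}
    for query in queries:
        try:
            counts[query] = search_fn(query)
        except Exception as err:  # keep going if a single query fails
            print("Search failed for {}: {}".format(query, err))
            counts[query] = None
    return counts


def fake_search(query):
    """Stand-in for a real API call: pretend only one query has a match."""
    return 1 if "Lambertsen" in query else 0


counts = run_batch(
    [
        "AU=(Lambertsen) AND PY=1966 AND SO=(Acta Agric* Scand*)",
        "AU=(Loth) AND PY=1991 AND SO=(Agrochimica)",
    ],
    fake_search,
)
# Queries with zero results are candidates for manual review.
unmatched = [q for q, n in counts.items() if n == 0]
print(unmatched)
```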


In [3]:
from woscalls import WosCalls
wosc = WosCalls(search_term_sets=bs.search_terms, database_id="WOK")
wosc.find_exact_match()

Search client authorized.
1 Found 0 Results for AU=(Furukawa) AND PY=1976 AND SO=(uu)
2 Found 0 Results for AU=(Furukawa) AND PY=1976 AND SO=(uu)
3 Found 0 Results for AU=(Nii) AND PY=1980 AND SO=(uu)
4 Found 0 Results for AU=(International AND Organization AND for AND Standardization) AND PY=1990 AND SO=((ISO/DIS 5507))
5 Found 0 Results for AU=(Wang) AND PY=1997 AND SO=(22nd World Congress of the International Society for Fat Research (Sept* 8-12, 1997), Kuala Lumpur, Malaysia)
6 Found 0 Results for AU=(Aitzetmueller AND Werner) AND SO=(4011-4013)
7 Found 0 Results for AU=(Kleiman) AND PY=1969 AND SO=(60th AOCS Meeting, San Francisco)
8 Found 0 Results for AU=(Daulatabad) AND PY=1988 AND SO=(Abstr* In: J* Oil Technol* Assoc* India)
9 Found 0 Results for AU=(Daulatabad) AND PY=1988 AND SO=(Abstr* Nr* 39 In: J* Oil Technol* Assoc* India)
10 Found 1 Results for AU=(Lambertsen) AND PY=1966 AND SO=(Acta Agriculturae Scandinavica)
11 Found 1 Results for AU=(Lambertsen) AND PY=1966 AND SO=(

The `WosCalls` class is additionally a place to house content-specific methods built on `Wos`. See the `run_phylo_process` method as it develops.

### Additional Information

The `BuildSearch` class is currently quite content specific but could in principle be broadened to allow for automatically generating searches from data in other formats, such as CSV or JSON. Currently the algorithm below assumes a very specific data structure to work.

In [1]:
from buildsearch import OhlroggeSearch
bs = OhlroggeSearch("data/ohlrogge/ohlrogge_test_10.txt")
bs.make_search_list() # from here the object bs.searches can be used in WosCalls()

SyntaxError: Non-ASCII character '\xc3' in file buildsearch.py on line 45, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details (buildsearch.py, line 45)

In [1]:
from buildsearch import OhlroggeSearch
bs = OhlroggeSearch("ohlrogge-data.tsv")
bs.make_search_dict() # from here the object bs.search_terms can be used in WosCalls()

File loaded.
Extracted 2896 lines


In [2]:
bs.search_terms

[{'author': '(Lambertsen)',
  'id': u'11702',
  'page': u'213',
  'plant_count': 8,
  'query': u'AU=(Lambertsen) AND PY=1966 AND SO=(Acta Agricultarae Scandinavica)',
  'result_count': 7,
  'source': '(Acta Agricultarae Scandinavica)',
  'volume': u'16',
  'year': u'1966'},
 {'author': '(Lambertsen)',
  'id': u'11703',
  'page': u'213',
  'plant_count': 8,
  'query': u'AU=(Lambertsen) AND PY=1966 AND SO=(Acta Agricultarae Scandinavica)',
  'result_count': 7,
  'source': '(Acta Agricultarae Scandinavica)',
  'volume': u'16',
  'year': u'1966'},
 {'author': '(Lambertsen)',
  'id': u'11704',
  'page': u'213',
  'plant_count': 8,
  'query': u'AU=(Lambertsen) AND PY=1966 AND SO=(Acta Agricultarae Scandinavica)',
  'result_count': 7,
  'source': '(Acta Agricultarae Scandinavica)',
  'volume': u'16',
  'year': u'1966'},
 {'author': '(Lambertsen)',
  'id': u'11705',
  'page': u'213',
  'plant_count': 8,
  'query': u'AU=(Lambertsen) AND PY=1966 AND SO=(Acta Agriculturae Scandinavica)',
  'resul

The `bs.searches` object contains a list of searches, suitable to pass as an argument in the `WosCalls` class.
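
As a rough sketch of how search-term dictionaries like those shown above might be generated from another format such as CSV, the example below reads rows with the standard `csv` module. The column names (`id`, `author`, `year`, `source`) and the helper `rows_to_search_terms` are assumptions made for illustration; this is not the `OhlroggeSearch` algorithm.

```python
import csv
import io


def rows_to_search_terms(csv_text):
    """Build one search-term dict per CSV row (hypothetical column names)."""
    terms = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        author = "({})".format(row["author"])
        source = "({})".format(row["source"])
        query = "AU={} AND PY={} AND SO={}".format(author, row["year"], source)
        terms.append({
            "id": row["id"],
            "author": author,
            "source": source,
            "year": row["year"],
            "query": query,
        })
    return terms


sample = (
    "id,author,year,source\n"
    "11702,Lambertsen,1966,Acta Agriculturae Scandinavica\n"
)
search_terms = rows_to_search_terms(sample)
print(search_terms[0]["query"])
```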

#### Get in touch!

If I can be of help in using this code, or if you have suggestions for improvement, please do contact me.

In [2]:
from buildsearch import OhlroggeSearch
bs = OhlroggeSearch("data/ohlrogge/ohlrogge_test_10.txt")

File loaded.


In [3]:
test = dict(wos.search_results.records[0])

In [10]:
type(test["authors"][0]["value"])

list

In [19]:
for ids in test["other"]:
    print ids["label"], ids["value"]

Contributor.ResearcherID.Names [Peiretti, Pier Giorgio, Peiretti, Pier Giorgio]
Contributor.ResearcherID.ResearcherIDs [B-6871-2013, None]
Identifier.Doi [10.1016/j.anifeedsci.2004.07.001]
Identifier.Ids [863ND]
Identifier.Issn [0377-8401]
Identifier.Xref_Doi [10.1016/j.anifeedsci.2004.07.001]
ResearcherID.Disclaimer [ResearcherID data provided by Thomson Reuters]


In [3]:
d = {'volume': u'32', 'source': '(Soviet plant physiology)', 'author': '(Rikhter)', 'query': u'AU=(Rikhter) AND PY=1985 AND SO=(Soviet plant physiology)', 'year': u'1985', 'page': u'755-760', 'id': u'12392'}

In [6]:
d2 = {'wos_ids': {'Identifier.Issn': u'[0038-5719]', 'ResearcherID.Disclaimer': u'[ResearcherID data provided by Thomson Reuters]', 'Identifier.Ids': u'[C2732]'}, 'wos_authors': ['RIKHTER, AA'], 'wos_source_data': {'SourceTitle': u'[SOVIET PLANT PHYSIOLOGY]', 'Published.BiblioYear': u'[1985]', 'Published.BiblioDate': u'[SEP-OCT]', 'Volume': u'[32]', 'Issue': u'[5]', 'Pages': u'[755-760]'}, 'wos_uid': 'WOS:A1985C273200010', 'wos_title': ['VARIABILITY OF SEED OIL FATTY-ACID COMPOSITION IN DIFFERENT SPECIES AND VARIETIES OF ALMOND']}

In [3]:
"Furukawa, K. et al.".strip().replace(",", "").replace(";", "").replace(".", "***").replace("et al.", "").replace("et al", "").replace('"', '')

'Furukawa K*** ***'

In [8]:
d

{'author': '(Rikhter)',
 'id': u'12392',
 'page': u'755-760',
 'query': u'AU=(Rikhter) AND PY=1985 AND SO=(Soviet plant physiology)',
 'source': '(Soviet plant physiology)',
 'volume': u'32',
 'wos_authors': ['RIKHTER, AA'],
 'wos_ids': {'Identifier.Ids': u'[C2732]',
  'Identifier.Issn': u'[0038-5719]',
  'ResearcherID.Disclaimer': u'[ResearcherID data provided by Thomson Reuters]'},
 'wos_source_data': {'Issue': u'[5]',
  'Pages': u'[755-760]',
  'Published.BiblioDate': u'[SEP-OCT]',
  'Published.BiblioYear': u'[1985]',
  'SourceTitle': u'[SOVIET PLANT PHYSIOLOGY]',
  'Volume': u'[32]'},
 'wos_title': ['VARIABILITY OF SEED OIL FATTY-ACID COMPOSITION IN DIFFERENT SPECIES AND VARIETIES OF ALMOND'],
 'wos_uid': 'WOS:A1985C273200010',
 'year': u'1985'}

In [5]:
b = {'wos_ids': {u'Identifier.Issn': [u'0038-5719'], u'ResearcherID.Disclaimer': [u'ResearcherID data provided by Thomson Reuters'], u'Identifier.Ids': [u'C2732']}, 'wos_authors': u'[RIKHTER, AA]', 'wos_source_data': {u'SourceTitle': [u'SOVIET PLANT PHYSIOLOGY'], u'Published.BiblioYear': [u'1985'], u'Published.BiblioDate': [u'SEP-OCT'], u'Volume': [u'32'], u'Issue': [u'5'], u'Pages': [u'755-760']}, 'wos_uid': u'WOS:A1985C273200010', 'wos_title': u'[VARIABILITY OF SEED OIL FATTY-ACID COMPOSITION IN DIFFERENT SPECIES AND VARIETIES OF ALMOND]'}

In [1]:
b = {u'12297': [{'wos_ids': {u'Identifier.Issn': [u'0030-2082'], u'ResearcherID.Disclaimer': [u'ResearcherID data provided by Thomson Reuters'], u'Identifier.Ids': [u'LM430']}, 'wos_title': u'[WILD PALMS IN MADAGASCAR - FATTY-ACID COMPOSITION OF OILS EXTRACTED FROM THE FRUITS OF 26 SPECIES]', 'author': '(Rabarisoa)', 'wos_uid': u'WOS:A1993LM43000006', 'wos_source_data': {u'SourceTitle': [u'OLEAGINEUX'], u'Published.BiblioYear': [u'1993'], u'Published.BiblioDate': [u'MAY'], u'Volume': [u'48'], u'Issue': [u'5'], u'Pages': [u'251-255']}, 'id': u'12297', 'volume': u'48', 'source': '(Oleagineux)', 'wos_authors': u'[RABARISOA, I, GAYDOU, EM, BIANCHINI, JP]', 'year': u'1993', 'query': u'AU=(Rabarisoa) AND PY=1993 AND SO=(Oleagineux)', 'page': u'251-255'}], u'11880': [], u'12295': [{'wos_ids': {u'Identifier.Issn': [u'0030-2082'], u'ResearcherID.Disclaimer': [u'ResearcherID data provided by Thomson Reuters'], u'Identifier.Ids': [u'LM430']}, 'wos_title': u'[WILD PALMS IN MADAGASCAR - FATTY-ACID COMPOSITION OF OILS EXTRACTED FROM THE FRUITS OF 26 SPECIES]', 'author': '(Rabarisoa)', 'wos_uid': u'WOS:A1993LM43000006', 'wos_source_data': {u'SourceTitle': [u'OLEAGINEUX'], u'Published.BiblioYear': [u'1993'], u'Published.BiblioDate': [u'MAY'], u'Volume': [u'48'], u'Issue': [u'5'], u'Pages': [u'251-255']}, 'id': u'12295', 'volume': u'48', 'source': '(Oleagineux)', 'wos_authors': u'[RABARISOA, I, GAYDOU, EM, BIANCHINI, JP]', 'year': u'1993', 'query': u'AU=(Rabarisoa) AND PY=1993 AND SO=(Oleagineux)', 'page': u'251-255'}], u'11508': [{'wos_ids': {u'ResearcherID.Disclaimer': [u'ResearcherID data provided by Thomson Reuters']}, 'wos_title': u"[Oil-bearing plants of Zaire. II. Botanical families providing oils of medium unsaturation., Les plantes a huile du Zaire. II. 
Familles botaniques fournissant des huiles d'insaturation moyenne.]", 'author': '(Kabele-Ngiefu)', 'wos_uid': u'CABI:19770351574', 'wos_source_data': {u'SourceTitle': [u'Oleagineux'], u'Volume': [u'31'], u'Published.BiblioYear': [u'1976'], u'Issue': [u'12'], u'Pages': [u'545-547']}, 'id': u'11508', 'volume': u'31', 'source': '(Oleagineux)', 'wos_authors': u'[Ngiefu, C. K., Paquot, C., Vieux, A., Kabele Ngiefu, C.]', 'year': u'1976', 'query': u'AU=(Kabele-Ngiefu) AND PY=1976 AND SO=(Oleagineux)', 'page': u'545'}], u'11509': [{'wos_ids': {u'ResearcherID.Disclaimer': [u'ResearcherID data provided by Thomson Reuters']}, 'wos_title': u"[Oil-bearing plants of Zaire. III. Botanical families providing oils of relatively high unsaturation., Les plantes a huile du Zaire. III. Familles botaniques fournissant des huiles d'insaturation relativement elevee.]", 'author': '(Kabele-Ngiefu)', 'wos_uid': u'CABI:19780360953', 'wos_source_data': {u'SourceTitle': [u'Oleagineux'], u'Volume': [u'32'], u'Published.BiblioYear': [u'1977'], u'Issue': [u'12'], u'Pages': [u'535-537']}, 'id': u'11509', 'volume': u'32', 'source': '(Oleagineux)', 'wos_authors': u'[Ngiefu, C. K., Paquot, C., Vieux, A., Kabele Ngiefu, C.]', 'year': u'1977', 'query': u'AU=(Kabele-Ngiefu) AND PY=1977 AND SO=(Oleagineux)', 'page': u'535'}], u'11506': [{'wos_ids': {u'ResearcherID.Disclaimer': [u'ResearcherID data provided by Thomson Reuters']}, 'wos_title': u"[Oil-bearing plants of Zaire. II. Botanical families providing oils of medium unsaturation., Les plantes a huile du Zaire. II. Familles botaniques fournissant des huiles d'insaturation moyenne.]", 'author': '(Kabele-Ngiefu)', 'wos_uid': u'CABI:19770351574', 'wos_source_data': {u'SourceTitle': [u'Oleagineux'], u'Volume': [u'31'], u'Published.BiblioYear': [u'1976'], u'Issue': [u'12'], u'Pages': [u'545-547']}, 'id': u'11506', 'volume': u'31', 'source': '(Oleagineux)', 'wos_authors': u'[Ngiefu, C. 
K., Paquot, C., Vieux, A., Kabele Ngiefu, C.]', 'year': u'1976', 'query': u'AU=(Kabele-Ngiefu) AND PY=1976 AND SO=(Oleagineux)', 'page': u'545'}], u'11507': [{'wos_ids': {u'ResearcherID.Disclaimer': [u'ResearcherID data provided by Thomson Reuters']}, 'wos_title': u"[Oil-bearing plants of Zaire. II. Botanical families providing oils of medium unsaturation., Les plantes a huile du Zaire. II. Familles botaniques fournissant des huiles d'insaturation moyenne.]", 'author': '(Kabele-Ngiefu)', 'wos_uid': u'CABI:19770351574', 'wos_source_data': {u'SourceTitle': [u'Oleagineux'], u'Volume': [u'31'], u'Published.BiblioYear': [u'1976'], u'Issue': [u'12'], u'Pages': [u'545-547']}, 'id': u'11507', 'volume': u'31', 'source': '(Oleagineux)', 'wos_authors': u'[Ngiefu, C. K., Paquot, C., Vieux, A., Kabele Ngiefu, C.]', 'year': u'1976', 'query': u'AU=(Kabele-Ngiefu) AND PY=1976 AND SO=(Oleagineux)', 'page': u'545'}], u'11504': [{'wos_ids': {u'Identifier.Issn': [u'0030-2082'], u'ResearcherID.Disclaimer': [u'ResearcherID data provided by Thomson Reuters']}, 'wos_title': u'[Oil plants of Zaire. I. Botanical families giving oils of relatively low unsaturation.]', 'author': '(Kabele-Ngiefu)', 'wos_uid': u'FSTA:1977-03-N-0175', 'wos_source_data': {u'SourceTitle': [u'Oleagineux'], u'Volume': [u'31'], u'Published.BiblioYear': [u'1976'], u'Issue': [u'7'], u'Pages': [u'335-337']}, 'id': u'11504', 'volume': u'31', 'source': '(Oleagineux)', 'wos_authors': u'[Kabele Ngiefu, C., Paquot, C., Vieux, A.]', 'year': u'1976', 'query': u'AU=(Kabele-Ngiefu) AND PY=1976 AND SO=(Oleagineux)', 'page': u'335'}], u'11505': [{'wos_ids': {u'ResearcherID.Disclaimer': [u'ResearcherID data provided by Thomson Reuters']}, 'wos_title': u"[Oil-bearing plants of Zaire. II. Botanical families providing oils of medium unsaturation., Les plantes a huile du Zaire. II. 
Familles botaniques fournissant des huiles d'insaturation moyenne.]", 'author': '(Kabele-Ngiefu)', 'wos_uid': u'CABI:19770351574', 'wos_source_data': {u'SourceTitle': [u'Oleagineux'], u'Volume': [u'31'], u'Published.BiblioYear': [u'1976'], u'Issue': [u'12'], u'Pages': [u'545-547']}, 'id': u'11505', 'volume': u'31', 'source': '(Oleagineux)', 'wos_authors': u'[Ngiefu, C. K., Paquot, C., Vieux, A., Kabele Ngiefu, C.]', 'year': u'1976', 'query': u'AU=(Kabele-Ngiefu) AND PY=1976 AND SO=(Oleagineux)', 'page': u'545'}], u'12037': [{'wos_ids': {u'Identifier.Issn': [u'0030-2082'], u'ResearcherID.Disclaimer': [u'ResearcherID data provided by Thomson Reuters'], u'Identifier.Ids': [u'H7985']}, 'wos_title': u'[INTERNAL AND EXTERNAL DISTRIBUTION OF TRIGLYCERIDE FATTY-ACIDS IN A FEW GAMMA-LINOLENIC OILS]', 'author': '(Muderhwa)', 'wos_uid': u'WOS:A1987H798500005', 'wos_source_data': {u'SourceTitle': [u'OLEAGINEUX'], u'Published.BiblioYear': [u'1987'], u'Published.BiblioDate': [u'MAY'], u'Volume': [u'42'], u'Issue': [u'5'], u'Pages': [u'207-211']}, 'id': u'12037', 'volume': u'42', 'source': '(Oleagineux)', 'wos_authors': u'[MUDERHWA, JM, DHUIQUEMAYER, C, PINA, M, GALZY, P, GRIGNAC, P, GRAILLE, J]', 'year': u'1987', 'query': u'AU=(Muderhwa) AND PY=1987 AND SO=(Oleagineux)', 'page': u'207'}], u'11503': [{'wos_ids': {u'ResearcherID.Disclaimer': [u'ResearcherID data provided by Thomson Reuters']}, 'wos_title': u"[Oil-bearing plants of Zaire. II. Botanical families providing oils of medium unsaturation., Les plantes a huile du Zaire. II. Familles botaniques fournissant des huiles d'insaturation moyenne.]", 'author': '(Kabele-Ngiefu)', 'wos_uid': u'CABI:19770351574', 'wos_source_data': {u'SourceTitle': [u'Oleagineux'], u'Volume': [u'31'], u'Published.BiblioYear': [u'1976'], u'Issue': [u'12'], u'Pages': [u'545-547']}, 'id': u'11503', 'volume': u'31', 'source': '(Oleagineux)', 'wos_authors': u'[Ngiefu, C. 
K., Paquot, C., Vieux, A., Kabele Ngiefu, C.]', 'year': u'1976', 'query': u'AU=(Kabele-Ngiefu) AND PY=1976 AND SO=(Oleagineux)', 'page': u'545'}], u'11500': [{'wos_ids': {u'ResearcherID.Disclaimer': [u'ResearcherID data provided by Thomson Reuters']}, 'wos_title': u"[Oil-bearing plants of Zaire. III. Botanical families providing oils of relatively high unsaturation., Les plantes a huile du Zaire. III. Familles botaniques fournissant des huiles d'insaturation relativement elevee.]", 'author': '(Kabele)', 'wos_uid': u'CABI:19780360953', 'wos_source_data': {u'SourceTitle': [u'Oleagineux'], u'Volume': [u'32'], u'Published.BiblioYear': [u'1977'], u'Issue': [u'12'], u'Pages': [u'535-537']}, 'id': u'11500', 'volume': u'32', 'source': '(Oleagineux)', 'wos_authors': u'[Ngiefu, C. K., Paquot, C., Vieux, A., Kabele Ngiefu, C.]', 'year': u'1977', 'query': u'AU=(Kabele) AND PY=1977 AND SO=(Oleagineux)', 'page': u'535'}], u'11501': [{'wos_ids': {u'ResearcherID.Disclaimer': [u'ResearcherID data provided by Thomson Reuters']}, 'wos_title': u"[Oil-bearing plants of Zaire. III. Botanical families providing oils of relatively high unsaturation., Les plantes a huile du Zaire. III. Familles botaniques fournissant des huiles d'insaturation relativement elevee.]", 'author': '(Kabele)', 'wos_uid': u'CABI:19780360953', 'wos_source_data': {u'SourceTitle': [u'Oleagineux'], u'Volume': [u'32'], u'Published.BiblioYear': [u'1977'], u'Issue': [u'12'], u'Pages': [u'535-537']}, 'id': u'11501', 'volume': u'32', 'source': '(Oleagineux)', 'wos_authors': u'[Ngiefu, C. 
K., Paquot, C., Vieux, A., Kabele Ngiefu, C.]', 'year': u'1977', 'query': u'AU=(Kabele) AND PY=1977 AND SO=(Oleagineux)', 'page': u'535'}], u'12239': [{'wos_ids': {u'Identifier.Issn': [u'0030-2082'], u'ResearcherID.Disclaimer': [u'ResearcherID data provided by Thomson Reuters'], u'Identifier.Ids': [u'AEE94']}, 'wos_title': u'[RESEARCH ON OENOTHERA RICH IN GAMMA-LINOLENIC ACID]', 'author': '(Pina)', 'wos_uid': u'WOS:A1984AEE9400006', 'wos_source_data': {u'SourceTitle': [u'OLEAGINEUX'], u'Volume': [u'39'], u'Published.BiblioYear': [u'1984'], u'Issue': [u'12'], u'Pages': [u'593-596']}, 'id': u'12239', 'volume': u'39', 'source': '(Oleagineux)', 'wos_authors': u'[PINA, M, GRAILLE, J, GRIGNAC, P, LACOMBE, A, QUENOT, O, GARNIER, P]', 'year': u'1984', 'query': u'AU=(Pina) AND PY=1984 AND SO=(Oleagineux)', 'page': u'593'}], u'11588': [{'wos_ids': {u'Identifier.Issn': [u'0030-2082'], u'ResearcherID.Disclaimer': [u'ResearcherID data provided by Thomson Reuters'], u'Identifier.Ids': [u'CC848']}, 'wos_title': u'[THE COMPOSITION OF FREE AND ESTERIFIED STEROLS IN TOMATO SEED OIL]', 'author': '(Kiosseoglou AND Boskou)', 'wos_uid': u'WOS:A1989CC84800005', 'wos_source_data': {u'SourceTitle': [u'OLEAGINEUX'], u'Published.BiblioYear': [u'1989'], u'Published.BiblioDate': [u'FEB'], u'Volume': [u'44'], u'Issue': [u'2'], u'Pages': [u'113-115']}, 'id': u'11588', 'volume': u'44', 'source': '(Oleagineux)', 'wos_authors': u'[KIOSSEOGLOU, B, BOSKOU, D]', 'year': u'1989', 'query': u'AU=(Kiosseoglou AND Boskou) AND PY=1989 AND SO=(Oleagineux)', 'page': u'113'}], u'12240': [{'wos_ids': {u'Identifier.Issn': [u'0030-2082'], u'ResearcherID.Disclaimer': [u'ResearcherID data provided by Thomson Reuters'], u'Identifier.Ids': [u'AEE94']}, 'wos_title': u'[RESEARCH ON OENOTHERA RICH IN GAMMA-LINOLENIC ACID]', 'author': '(Pina)', 'wos_uid': u'WOS:A1984AEE9400006', 'wos_source_data': {u'SourceTitle': [u'OLEAGINEUX'], u'Volume': [u'39'], u'Published.BiblioYear': [u'1984'], u'Issue': [u'12'], u'Pages': 
[u'593-596']}, 'id': u'12240', 'volume': u'39', 'source': '(Oleagineux)', 'wos_authors': u'[PINA, M, GRAILLE, J, GRIGNAC, P, LACOMBE, A, QUENOT, O, GARNIER, P]', 'year': u'1984', 'query': u'AU=(Pina) AND PY=1984 AND SO=(Oleagineux)', 'page': u'593'}], u'12241': [{'wos_ids': {u'Identifier.Issn': [u'0030-2082'], u'ResearcherID.Disclaimer': [u'ResearcherID data provided by Thomson Reuters'], u'Identifier.Ids': [u'AEE94']}, 'wos_title': u'[RESEARCH ON OENOTHERA RICH IN GAMMA-LINOLENIC ACID]', 'author': '(Pina)', 'wos_uid': u'WOS:A1984AEE9400006', 'wos_source_data': {u'SourceTitle': [u'OLEAGINEUX'], u'Volume': [u'39'], u'Published.BiblioYear': [u'1984'], u'Issue': [u'12'], u'Pages': [u'593-596']}, 'id': u'12241', 'volume': u'39', 'source': '(Oleagineux)', 'wos_authors': u'[PINA, M, GRAILLE, J, GRIGNAC, P, LACOMBE, A, QUENOT, O, GARNIER, P]', 'year': u'1984', 'query': u'AU=(Pina) AND PY=1984 AND SO=(Oleagineux)', 'page': u'593'}], u'11510': [], u'12084': [], u'12085': [], u'12082': [{'wos_ids': {u'ResearcherID.Disclaimer': [u'ResearcherID data provided by Thomson Reuters']}, 'wos_title': u"[Oil-bearing plants of Zaire. II. Botanical families providing oils of medium unsaturation., Les plantes a huile du Zaire. II. Familles botaniques fournissant des huiles d'insaturation moyenne.]", 'author': '(Ngiefu)', 'wos_uid': u'CABI:19770351574', 'wos_source_data': {u'SourceTitle': [u'Oleagineux'], u'Volume': [u'31'], u'Published.BiblioYear': [u'1976'], u'Issue': [u'12'], u'Pages': [u'545-547']}, 'id': u'12082', 'volume': u'31', 'source': '(Oleagineux)', 'wos_authors': u'[Ngiefu, C. K., Paquot, C., Vieux, A., Kabele Ngiefu, C.]', 'year': u'1976', 'query': u'AU=(Ngiefu) AND PY=1976 AND SO=(Oleagineux)', 'page': u'545'}], u'12083': [{'wos_ids': {u'ResearcherID.Disclaimer': [u'ResearcherID data provided by Thomson Reuters']}, 'wos_title': u"[Oil-bearing plants of Zaire. I. Botanical families providing oils of relatively low unsaturation., Les plantes a huile du Zaire. I. 
Familles botaniques fournissant des huiles d'insaturation relativement faible.]", 'author': '(Ngiefu)', 'wos_uid': u'CABI:19760347525', 'wos_source_data': {u'SourceTitle': [u'Oleagineux'], u'Volume': [u'31'], u'Published.BiblioYear': [u'1976'], u'Issue': [u'7'], u'Pages': [u'335-337']}, 'id': u'12083', 'volume': u'31', 'source': '(Oleagineux)', 'wos_authors': u'[Ngiefu, C. K., Paquot, C., Vieux, A.]', 'year': u'1976', 'query': u'AU=(Ngiefu) AND PY=1976 AND SO=(Oleagineux)', 'page': u'335'}], u'12081': [{'wos_ids': {u'ResearcherID.Disclaimer': [u'ResearcherID data provided by Thomson Reuters']}, 'wos_title': u"[Oil-bearing plants of Zaire. I. Botanical families providing oils of relatively low unsaturation., Les plantes a huile du Zaire. I. Familles botaniques fournissant des huiles d'insaturation relativement faible.]", 'author': '(Ngiefu)', 'wos_uid': u'CABI:19760347525', 'wos_source_data': {u'SourceTitle': [u'Oleagineux'], u'Volume': [u'31'], u'Published.BiblioYear': [u'1976'], u'Issue': [u'7'], u'Pages': [u'335-337']}, 'id': u'12081', 'volume': u'31', 'source': '(Oleagineux)', 'wos_authors': u'[Ngiefu, C. K., Paquot, C., Vieux, A.]', 'year': u'1976', 'query': u'AU=(Ngiefu) AND PY=1976 AND SO=(Oleagineux)', 'page': u'335'}]}

In [4]:
b["12037"]

[{'author': '(Muderhwa)',
  'id': u'12037',
  'page': u'207',
  'query': u'AU=(Muderhwa) AND PY=1987 AND SO=(Oleagineux)',
  'source': '(Oleagineux)',
  'volume': u'42',
  'wos_authors': u'[MUDERHWA, JM, DHUIQUEMAYER, C, PINA, M, GALZY, P, GRIGNAC, P, GRAILLE, J]',
  'wos_ids': {u'Identifier.Ids': [u'H7985'],
   u'Identifier.Issn': [u'0030-2082'],
   u'ResearcherID.Disclaimer': [u'ResearcherID data provided by Thomson Reuters']},
  'wos_source_data': {u'Issue': [u'5'],
   u'Pages': [u'207-211'],
   u'Published.BiblioDate': [u'MAY'],
   u'Published.BiblioYear': [u'1987'],
   u'SourceTitle': [u'OLEAGINEUX'],
   u'Volume': [u'42']},
  'wos_title': u'[INTERNAL AND EXTERNAL DISTRIBUTION OF TRIGLYCERIDE FATTY-ACIDS IN A FEW GAMMA-LINOLENIC OILS]',
  'wos_uid': u'WOS:A1987H798500005',
  'year': u'1987'}]
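
The record above pairs the original citation fields (`volume`, `page`) with the values returned by WOS (`wos_source_data`). One plausible way to check that the two agree is to compare the volume and the first page. This criterion is an assumption made for illustration and is not necessarily what `find_exact_match` does; `is_exact_match` is a hypothetical helper.

```python
def is_exact_match(term, source_data):
    """Return True when volume and first page agree (illustrative criterion only)."""
    # WOS values arrive as lists; take the first element when present.
    wos_volume = (source_data.get("Volume") or [None])[0]
    wos_pages = (source_data.get("Pages") or [""])[0]
    wos_first_page = wos_pages.split("-")[0]
    term_first_page = term.get("page", "").split("-")[0]
    # Refuse to match when the original citation lacks either field.
    if not term.get("volume") or not term_first_page:
        return False
    return term["volume"] == wos_volume and term_first_page == wos_first_page


term = {"volume": "42", "page": "207"}
source_data = {"Volume": ["42"], "Pages": ["207-211"]}
print(is_exact_match(term, source_data))
```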

In [3]:
len(wosc.article_data)

2854

In [7]:
import json
import codecs
with codecs.open("article_data.json", "w", "utf-8") as f:
    json.dump(wosc.article_data, f)

In [4]:
wosc.article_data.keys()

[u'11542',
 u'11543',
 u'11540',
 u'11541',
 u'11546',
 u'11547',
 u'11544',
 u'11545',
 u'11548',
 u'11549',
 u'11380',
 u'11381',
 u'10899',
 u'10898',
 u'11384',
 u'11386',
 u'11387',
 u'11388',
 u'10892',
 u'10891',
 u'10890',
 u'10897',
 u'10895',
 u'10894',
 u'12701',
 u'12700',
 u'12703',
 u'12702',
 u'12705',
 u'12704',
 u'12707',
 u'12706',
 u'12709',
 u'12708',
 u'10967',
 u'10966',
 u'10965',
 u'10964',
 u'10963',
 u'10962',
 u'10961',
 u'10960',
 u'10969',
 u'10968',
 u'11903',
 u'11973',
 u'12019',
 u'12018',
 u'12015',
 u'12014',
 u'12017',
 u'12016',
 u'12011',
 u'12010',
 u'12013',
 u'12012',
 u'11768',
 u'11769',
 u'11762',
 u'11763',
 u'11760',
 u'11761',
 u'11766',
 u'11767',
 u'11764',
 u'11765',
 u'10709',
 u'10708',
 u'10703',
 u'10702',
 u'10701',
 u'10700',
 u'10707',
 u'10706',
 u'10705',
 u'10704',
 u'12526',
 u'11121',
 u'12992',
 u'11616',
 u'12128',
 u'11614',
 u'11615',
 u'11612',
 u'11613',
 u'10639',
 u'10638',
 u'10637',
 u'12120',
 u'10635',
 u'10634',

In [5]:
wosc.article_data["11542"]

[{u'Identifier.Ids': [u'AX840'],
  u'Identifier.Issn': [u'0021-8561'],
  u'Identifier.Xref_Doi': [u'10.1021/jf60202a041'],
  u'Issue': [u'6'],
  u'Pages': [u'1204-1207'],
  u'Published.BiblioYear': [u'1975'],
  u'ResearcherID.Disclaimer': [u'ResearcherID data provided by Thomson Reuters'],
  u'SourceTitle': [u'JOURNAL OF AGRICULTURAL AND FOOD CHEMISTRY'],
  u'Volume': [u'23'],
  'author': u'(Karakoltsidis AND Constantinides)',
  'id': u'11542',
  'page': u'1204',
  'plant_count': 8,
  'query': u'AU=(Karakoltsidis AND Constantinides) AND PY=1975 AND SO=(Journal of agricultural and food chemistry)',
  'result_count': 7,
  'source': u'(Journal of agricultural and food chemistry)',
  'volume': u'23',
  'wos_authors': u'[KARAKOLTSIDIS, PA, CONSTANTINIDES, SM]',
  'wos_title': u'[OKRA SEEDS - NEW PROTEIN SOURCE]',
  'wos_uid': u'WOS:A1975AX84000048',
  'year': u'1975'}]

In [4]:
import codecs

with codecs.open("data/updated_ohlrogge_data_20151017.txt", "w", "utf-8") as f:
    headings = wosc.article_data["11542"][0].keys()
    f.write("\t".join(headings)+"\n")
    for key in wosc.article_data:
        for record in wosc.article_data[key]:
            print record
            record_data = []
            for h in headings:
                data_point = record.get(h, "None")
                if isinstance(data_point, list):
                    record_data.append(data_point[0] if len(data_point) > 0 else "None")
                else:
                    record_data.append(data_point)
            f.write("\t".join([unicode(i) for i in record_data]) + "\n")

{u'Identifier.Xref_Doi': [u'10.1021/jf60202a041'], 'orig_author': u'Karakoltsidis, P. A. Constantinides, S. M.', u'Identifier.Ids': [u'AX840'], u'Published.BiblioYear': [u'1975'], 'volume': u'23', 'year': u'1975', 'query': u'AU=(Karakoltsidis AND Constantinides) AND PY=1975 AND SO=(Journal of agricultural and food chemistry)', 'wos_result_count': 1, u'Issue': [u'6'], 'id': u'11542', 'result_count': u'9', 'wos_title': u'OKRA SEEDS - NEW PROTEIN SOURCE', 'author': u'(Karakoltsidis AND Constantinides)', u'SourceTitle': [u'JOURNAL OF AGRICULTURAL AND FOOD CHEMISTRY'], u'Identifier.Issn': [u'0021-8561'], u'Pages': [u'1204-1207'], u'Volume': [u'23'], 'source': u'(Journal of agricultural and food chemistry)', 'wos_authors': u'[KARAKOLTSIDIS, PA, CONSTANTINIDES, SM]', 'plant_count': u'1', u'ResearcherID.Disclaimer': [u'ResearcherID data provided by Thomson Reuters'], 'wos_uid': u'WOS:A1975AX84000048', 'page': u'1204'}
{u'Identifier.Ids': [u'R9553'], u'Published.BiblioYear': [u'1974'], 'year': 

AttributeError: 'str' object has no attribute 'get'
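
The error above is consistent with some entries in `article_data` being a single dict rather than a list of dicts: iterating over a dict yields its string keys, and a string has no `get` method. The sketch below reproduces that situation with made-up data. The `article_data` stand-in and the helper name `first_bad_key` are hypothetical and not part of `woscalls.py`.

```python
# Hypothetical stand-in for wosc.article_data: one entry is a list of
# records, the other is a bare dict.
article_data = {
    "11542": [{"id": "11542", "year": "1975"}],
    "11102": {"id": "11102", "year": "9999"},
}


def first_bad_key(data):
    """Return the key of the first entry whose items lack .get(), else None."""
    for key in sorted(data):
        # Iterating a list yields records (dicts); iterating a dict
        # yields its keys (strings), which is what triggers the error.
        for record in data[key]:
            if not hasattr(record, "get"):
                return key
    return None


bad_key = first_bad_key(article_data)
```

Running a check like this against the real data before exporting should show which ids need to be handled as a single record.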

In [11]:
import codecs

with codecs.open("data/updated_ohlrogge_data_20151017.tsv", "w", "utf-8") as f:
    headings = wosc.article_data["11542"][0].keys()
    f.write("\t".join(headings) + "\n")
    for key in wosc.article_data:
        entry = wosc.article_data[key]
        # An entry is either a single dict (e.g. "11102") or a list of dicts;
        # wrap a single dict in a list so both cases share one loop.
        records = [entry] if isinstance(entry, dict) else entry
        for record in records:
            record_data = []
            for h in headings:
                data_point = record.get(h, "None")
                # List-valued fields keep only their first element.
                if isinstance(data_point, list):
                    record_data.append(data_point[0] if len(data_point) > 0 else "None")
                else:
                    record_data.append(data_point)
            f.write("\t".join([unicode(i) for i in record_data]) + "\n")

In [8]:
wosc.article_data["11102"]

{'author': u'(Gulati)',
 'id': u'11102',
 'orig_author': u'Gulati, N.K. et al.',
 'page': u'',
 'plant_count': u'2',
 'query': u'AU=(Gulati) AND SO=(Forest Research Inst* Dehra Dun-248006)',
 'result_count': u'2',
 'source': u'(Forest Research Inst* Dehra Dun-248006)',
 'volume': u'SD 43',
 'year': '9999'}
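
Unlike `"11542"`, this entry is a bare dict that carries only the original query fields and no `wos_*` keys. A minimal sketch for tallying how many entries have each shape follows. The helper name `count_shapes` and the sample data are hypothetical, for illustration only.

```python
from collections import Counter


def count_shapes(article_data):
    """Count entries by container type: 'dict', 'list', or 'other'."""
    counts = Counter()
    for value in article_data.values():
        if isinstance(value, dict):
            counts["dict"] += 1
        elif isinstance(value, list):
            counts["list"] += 1
        else:
            counts["other"] += 1
    return counts


# Made-up sample mirroring the two shapes seen in this notebook.
sample = {
    "11542": [{"id": "11542"}],
    "10894": [{"id": "10894"}, {"id": "10894"}],
    "11102": {"id": "11102", "year": "9999"},
}
shape_counts = count_shapes(sample)
```

Note that `len()` on a dict returns its number of keys, so a length of 10 in a per-entry listing may reflect a ten-key dict like this one rather than ten records.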

In [7]:
for key, value in wosc.article_data.items():
    print key, len(value)

11542 1
11543 1
11540 10
11541 1
11546 10
11547 10
11544 10
11545 10
11548 1
11549 10
11380 1
11381 10
10899 1
10898 10
11384 1
11386 1
11387 1
11388 1
10892 10
10891 10
10890 10
10897 10
10896 1
10895 1
10894 2
12701 10
12700 10
12703 1
12702 10
12705 1
12704 10
12707 1
12706 1
12709 10
12708 10
10967 1
10966 1
10965 1
10964 10
10963 1
10962 1
10961 1
10960 1
10969 1
10968 1
11903 1
11973 1
12019 1
12018 10
12015 1
12014 1
12017 1
12016 1
12011 1
12010 1
12013 1
12012 1
11768 1
11769 1
11762 1
11763 1
11760 10
11761 1
11766 10
11767 1
11764 10
11765 10
10709 10
10708 1
10703 10
10702 1
10701 1
10700 1
10707 1
10706 1
10705 10
10704 1
12526 1
11121 10
12992 10
11616 1
12128 10
11614 1
11615 1
11612 1
11613 1
10639 1
10638 1
10637 10
12120 1
10635 10
10634 1
10633 1
10632 10
10631 1
10630 1
10417 10
10416 10
10143 10
10414 1
10413 1
10412 10
10411 1
11949 1
11946 1
11947 1
11944 1
11945 1
11942 10
11943 1
10419 1
10418 1
10387 10
11083 10
10385 10
10384 1
10383 10
10382 1
10381 1
11082 