# Harvesting RADAR from OAI

This notebook uses the Sickle OAI-PMH client:
https://pypi.org/project/Sickle/

## OAI has invalid records

The RADAR OAI endpoint sometimes returns invalid records.
For example, the record with **identifier** `10.22000/332` has no valid XML within its `metadata` section: the element contains plain text instead of child elements.

In [1]:
# Example
import requests
resp = requests.get(
    "https://www.radar-service.eu/oai/OAIHandler?verb=GetRecord&metadataPrefix=datacite&identifier=10.22000/332")
print(resp.text)

<?xml version="1.0" encoding="UTF-8" ?><?xml-stylesheet type="text/xsl" href="/oai/stylesheet"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2022-03-28T14:38:02Z</responseDate><request verb="GetRecord" metadataPrefix="datacite" identifier="10.22000/332">https://www.radar-service.eu/oai/OAIHandler</request><GetRecord><record><header><identifier>10.22000/332</identifier><datestamp>2022-03-02T15:08:54Z</datestamp></header><metadata>10.22000/332
        10.1029/2020JD033857
            Söder, Jens
            Jens
            Söder
            0000-0001-6869-545X
            Leibniz Institute of Atmospheric Physics at the University of Rostock
    SoederJGR2020
        Data to reproduce the figures in publication.
        Leibniz Institute of Atmospheric Physics at the University of Rostock
    2020
    2021
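
The kind of response shown above can also be detected programmatically. The following is a minimal sketch, not part of the original notebook: it uses only the standard library, and the helper name `metadata_has_xml` and the two shortened sample documents (with made-up identifiers) are assumptions for illustration. It treats a record as invalid when its `metadata` element has no child elements.

```python
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH responses (taken from the response shown above).
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def metadata_has_xml(oai_response):
    """Return True if the <metadata> element contains at least one child element."""
    root = ET.fromstring(oai_response)
    metadata = root.find(f".//{OAI_NS}metadata")
    # len() of an element is the number of its direct child elements.
    return metadata is not None and len(metadata) > 0


# Shortened, made-up sample documents (not real RADAR responses).
VALID = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><GetRecord><record>
<metadata><resource><identifier>example/1</identifier></resource></metadata>
</record></GetRecord></OAI-PMH>"""

INVALID = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><GetRecord><record>
<metadata>example/2
    Some title</metadata>
</record></GetRecord></OAI-PMH>"""

print(metadata_has_xml(VALID))    # True
print(metadata_has_xml(INVALID))  # False
```

For a live response, passing `resp.content` (bytes) rather than `resp.text` should avoid problems with the encoding declaration in some parsers.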
 

## Patch Sickle to show error details

Using `unittest.mock` from the standard library (the `mock` package on PyPI is its backport): https://pypi.org/project/mock/

In [2]:
from unittest import mock

In [3]:
from sickle import models
from sickle.utils import xml_to_dict

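def get_metadata(self):
    """Replacement for Record.get_metadata that names the failing record."""
    try:
        # The first child of <metadata> is the container element of the record.
        container = self.xml.find('.//' + self._oai_namespace + 'metadata')[0]
        metadata = xml_to_dict(container, strip_ns=self._strip_ns)
    except Exception as exc:
        # print(self.raw)
        # Chain the original error so the root cause stays visible.
        raise Exception(
            f"Invalid xml for metadata: {self.header.identifier}"
        ) from exc
    return metadata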
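
To illustrate how the replacement is applied, here is a small self-contained sketch of `mock.patch.object`. The `Record` class and both functions below are hypothetical stand-ins, not Sickle code. The point is only that the patched method is active inside the `with` block and the original is restored afterwards.

```python
from unittest import mock


class Record:
    """Hypothetical stand-in for a record class with an unhelpful error."""

    def get_metadata(self):
        raise IndexError("list index out of range")


def get_metadata_verbose(self):
    """Hypothetical replacement that raises a more descriptive error."""
    raise ValueError("Invalid xml for metadata: demo-id")


def error_message(record):
    """Call get_metadata and return the error as text."""
    try:
        record.get_metadata()
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"


record = Record()

# Inside the block the class attribute is swapped for the replacement.
with mock.patch.object(Record, "get_metadata", get_metadata_verbose):
    inside = error_message(record)

# After the block the original method is back in place.
outside = error_message(record)

print(inside)   # ValueError: Invalid xml for metadata: demo-id
print(outside)  # IndexError: list index out of range
```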

## Run patched Sickle

In [4]:
from sickle import Sickle

In [5]:
with mock.patch.object(models.Record, "get_metadata", get_metadata):
    sickle = Sickle("https://www.radar-service.eu/oai/OAIHandler")
    records = sickle.ListRecords(metadataPrefix='datacite', ignore_deleted=True)

    count = 0
    num_failures = 0
    num_success = 0
    last_valid_index = None  # stays None if no record could be parsed
    limit = 1000
    ok = True

    while ok:
        try:
            record = records.next()
        except StopIteration:
            print("all records received:", count)
            ok = False
        except Exception as e:
            print(count, "Error:", e)
            num_failures += 1
            # break
        else:
            num_success += 1
            last_valid_index = count
            print(count, record.header.identifier)
        count += 1
        if count >= limit:
            ok = False


0 10.22000/358
1 10.22000/360
2 10.22000/361
3 10.22000/374
4 10.22000/381
5 10.22000/385
6 10.22000/394
7 10.22000/399
8 10.22000/400
9 10.22000/404
10 10.22000/43
11 10.22000/44
12 10.22000/458
13 10.22000/53
14 10.22000/54
15 10.22000/64
16 10.22000/152
17 10.22000/155
18 10.22000/156
19 10.22000/237
20 10.22000/251
21 10.22000/258
22 10.22000/263
23 10.22000/271
24 10.22000/272
25 10.22000/275
26 10.22000/276
27 10.22000/280
28 10.22000/284
29 10.22000/286
30 10.22000/288
31 10.22000/289
32 10.22000/290
33 10.22000/291
34 10.22000/292
35 10.22000/293
36 10.22000/294
37 10.22000/295
38 10.22000/296
39 10.22000/297
40 10.22000/298
41 10.22000/299
42 10.22000/300
43 10.22000/301
44 10.22000/302
45 10.22000/303
46 10.22000/304
47 10.22000/305
48 10.22000/306
49 10.22000/307
50 10.22000/308
51 10.22000/309
52 10.22000/310
53 10.22000/311
54 10.22000/312
55 10.22000/313
56 10.22000/314
57 10.22000/315
58 10.22000/318
59 10.22000/319
60 10.22000/321
61 10.22000/324
62 10.22000/325
63 10.2
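
The loop above keeps calling `next()` after an error, which only works if the iterator stays usable once it has raised (a plain generator function would be finished after its first exception). The sketch below shows that pattern in isolation. `FlakyIterator` and `harvest` are hypothetical names, and the iterator is a toy stand-in, not a description of how Sickle is implemented.

```python
class FlakyIterator:
    """Toy class-based iterator: raises on bad items but keeps its position."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._items):
            raise StopIteration
        item = self._items[self._pos]
        # Advance before validating, so a bad item is skipped on the next call.
        self._pos += 1
        if item is None:
            raise ValueError(f"invalid item at position {self._pos - 1}")
        return item


def harvest(iterator, limit=1000):
    """Collect good items and error messages, continuing after failures."""
    good, failed = [], []
    for index in range(limit):
        try:
            item = next(iterator)
        except StopIteration:
            break
        except Exception as exc:
            failed.append((index, str(exc)))
        else:
            good.append((index, item))
    return good, failed


good, failed = harvest(FlakyIterator(["a", None, "b"]))
print(good)    # [(0, 'a'), (2, 'b')]
print(failed)  # [(1, 'invalid item at position 1')]
```

Collecting the failures in a list instead of printing them makes it easier to inspect or retry the affected records later.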

In [6]:
print("\n\nResult")
print("Failures:", num_failures)
print("Success:", num_success)
print("Last valid index:", last_valid_index)



Result
Failures: 38
Success: 114
Last valid index: 146
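
As a quick reading of these numbers, the share of failed records can be computed from the two counters. This is a minimal sketch that hard-codes the values printed above; in the notebook the variables `num_failures` and `num_success` would already be defined.

```python
# Values copied from the result printed above.
num_failures = 38
num_success = 114

attempted = num_failures + num_success
failure_share = num_failures / attempted

print(f"Attempted: {attempted}")               # Attempted: 152
print(f"Failure share: {failure_share:.1%}")   # Failure share: 25.0%
```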
