# Importing documents from Open Science sources

---
## HAL (France, UPHF)

 - Registration required: No
 - API: Yes
 - Cost: Free
 - Documentation: Yes (in French only)
 - Access: simple GET request
 - Bibliographic export formats: BibTeX, EndNote, COinS
 - Response formats: JSON (default), XML, Atom, CSV

- Fields: many are available, for example:
  - title, abstract, authors, origin (conference, journal, project, ...), date

In [1]:
import requests

---
### Define the request
See ["type de champs"](https://api.archives-ouvertes.fr/docs/search/?schema=fields#fields) (field types) for details about the parameters.

In [2]:
#  HAL API URL to search documents (here focused on UPHF docs)
base_url = 'https://api.archives-ouvertes.fr/search/uphf/'
#  HAL API URL to search documents (here focused on ANR projects)
#base_url = 'https://api.archives-ouvertes.fr/search/anr/'

In [3]:
# Search parameters:
# query = agent
# fields to return = title, abstract, authors, conference, journal, date, ...
# format = JSON (XML, BibTeX, ... are also possible)
# number of documents to return = 50 in this example
# Request parameters
#params = {
#    'query': 'artificial intelligence',  # put your search query here
#    'scope': 'your-scope',  # put the search scope here if needed
#}


# Query: match 'agent' in the abstract
query='abstract_t:(agent)'

# Fields to get back (title, abstract, ...) (see API doc)
details = 'title_s,authFullName_s,keyword_s,conferenceTitle_s,journalTitle_s,proceedings_s,bookTitle_s,abstract_s,submittedDateY_i,anrProjectAcronym_s,collIsParentOfColl_fs'

params = {
    'q': query,  # query string
    'fl': details,  # fields to return
    'sort': 'submittedDateY_i desc',  # sort by year, descending (use asc for ascending)
    'wt': 'json',  # output format (JSON here; XML, BibTeX, ... are also possible)
    'rows': 50,  # number of documents to return
}
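
To see roughly what such parameters look like once they are placed in the URL, the sketch below encodes them with the standard library. It only approximates what `requests` sends (the exact escaping may differ slightly), and the `example_*` names are illustrative copies with a shortened field list so that the block runs on its own.

```python
from urllib.parse import urlencode

# Illustrative copies of the values defined above (field list shortened)
example_base_url = 'https://api.archives-ouvertes.fr/search/uphf/'
example_params = {
    'q': 'abstract_t:(agent)',
    'fl': 'title_s,authFullName_s,abstract_s',
    'sort': 'submittedDateY_i desc',
    'wt': 'json',
    'rows': 50,
}

# urlencode percent-escapes reserved characters and turns spaces into '+'
preview_url = example_base_url + '?' + urlencode(example_params)
print(preview_url)
```

Pasting the printed URL into a browser is a quick way to test a query before running it from code.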

---
### Launch the request
A simple GET request queries the HAL repository and retrieves the requested documents in the format you have chosen.


In [14]:
# GET request
response = requests.get(base_url, params=params)
#response = requests.get(base_url)


In [16]:
# WARNING: you should check that the request succeeded (200 = OK, 404 = Not Found, 102 = Processing)
#if response.status_code == 200: 
#    print("response in JSON format : ")
#    print(response.content) 
#    ....
print(response.content[0:100])

b'{\n  "response":{"numFound":258,"start":0,"numFoundExact":true,"docs":[\n      {\n        "proceedings_'
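
The status check suggested in the comment above could look like the following minimal sketch. `ensure_ok` is a hypothetical helper, not part of `requests`; it only assumes that the object it receives exposes a `status_code` attribute, as `requests` responses do.

```python
from http import HTTPStatus


def ensure_ok(response):
    """Return the response if its status is 200, otherwise raise RuntimeError."""
    code = response.status_code
    if code != HTTPStatus.OK:
        try:
            phrase = HTTPStatus(code).phrase  # e.g. 'Not Found' for 404
        except ValueError:
            phrase = 'Unknown status'  # code not defined in the standard library
        raise RuntimeError(f"HAL request failed: {code} {phrase}")
    return response
```

It would be called as `response = ensure_ok(requests.get(base_url, params=params))`. `requests` also offers `response.raise_for_status()`, which raises an exception for 4xx and 5xx codes.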


---
### Decode the result
Choose the decoder that matches the response format:


In [6]:
data = response.json()
# data is a dictionary holding the number of articles found and the list of articles (one dictionary per article)

In [7]:
# extract the list of documents from the decoded response
articles = data["response"]["docs"]
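
The raw response printed earlier reports `numFound` 258, while `rows` limits a single request to 50 documents. Assuming the HAL API accepts a Solr-style `start` offset (the `start` key in the response suggests it does, but this should be checked against the API documentation), the remaining pages could be requested as sketched below. `page_params` is a hypothetical helper, and `example_params` is a shortened stand-in for the real parameters.

```python
def page_params(params, num_found):
    """Yield one copy of params per page, each with its own 'start' offset."""
    rows = params['rows']
    for start in range(0, num_found, rows):
        # Build a new dict so the caller's params are left untouched
        yield {**params, 'start': start}


example_params = {'q': 'abstract_t:(agent)', 'wt': 'json', 'rows': 50}
offsets = [p['start'] for p in page_params(example_params, 258)]
print(offsets)  # [0, 50, 100, 150, 200, 250]
```

Each generated dictionary could then be passed to `requests.get(base_url, params=...)`, extending one list with `data["response"]["docs"]` from every page.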

In [13]:
# Field names requested in 'fl', in display order
tab_details = details.split(',')

for article in articles:
    # Print every requested field that is present and non-empty
    for d in tab_details:
        value = article.get(d, '')
        if value != "":
            print(f"{d} : {value}")
    print("=" * 50 + "\n")

title_s : ['A Generic and Configurable Electronic Informer to Assist the Evaluation of Agent-Based Interactive Systems']
authFullName_s : ['Chi Dung Tran', 'Houcine Ezzedine', 'Christophe Kolski']
keyword_s : ['Interactive System', 'Configurable Model', 'Interactive Application', 'Interface Agent', 'Public Transport System']
conferenceTitle_s : Computer-Aided Design of User Interfaces VI, Proceedings of the 7th international conference on Computer- Aided Design of User Interfaces (CADUI 2008)
proceedings_s : 1
abstract_s : ['The evaluation of user interactive systems has been an active subject of research for many years. Many methods have been proposed but most existing evaluation methods do not take the specific architecture of an agent-based interac-tive system into account and nor do they focus on the coupling between the archi-tecture and evaluation phase. In this article, we propose an agent-based architec-ture of interactive systems that is considered as being mixed (it is both f