# Gather list of MICCAI papers
OpenAlex alone cannot tell us which papers were published at MICCAI, so we use the [dblp.org](https://dblp.org) API to retrieve them.

We query a URL of the following form: https://dblp.org/search/publ/api?q=stream%3Aconf%2Fmiccai%3A%20streamid%3Aconf%2Fmiccai%3A%20type%3AConference_and_Workshop_Papers%3A&h=1000&f=1000&format=json

A single query returns at most 1000 papers, so we iterate over the `f` parameter, which gives the index of the first paper in the result.
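
The pagination can be sketched with the standard library alone. The helper name `build_dblp_url` and its arguments are hypothetical and not part of this notebook; the query string and the `h`, `f`, and `format` parameters are taken from the URL above.

```python
from urllib.parse import quote, urlencode

DBLP_API = "https://dblp.org/search/publ/api"
# Decoded form of the q parameter used in the URL above.
MICCAI_QUERY = "stream:conf/miccai: streamid:conf/miccai: type:Conference_and_Workshop_Papers:"


def build_dblp_url(first, hits=1000):
    """Hypothetical helper: URL of one result page starting at offset `first`."""
    params = {"q": MICCAI_QUERY, "h": hits, "f": first, "format": "json"}
    # quote_via=quote encodes spaces as %20 (the default, quote_plus, would use '+').
    return f"{DBLP_API}?{urlencode(params, quote_via=quote)}"


# URLs of the first three pages, at offsets 0, 1000 and 2000.
page_urls = [build_dblp_url(first) for first in range(0, 3000, 1000)]
```

Building the URL from a parameter dictionary keeps the offset logic separate from the percent-encoding, which is easy to get wrong by hand.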


## Get list of papers

In [11]:
import requests

indice_paper = 0
nextPage = True
# Dictionary with DOI as key and title as value
lst_paper = {}
while nextPage:
    request_url = f"https://dblp.org/search/publ/api?q=stream%3Aconf%2Fmiccai%3A%20streamid%3Aconf%2Fmiccai%3A%20type%3AConference_and_Workshop_Papers%3A&h=1000&f={indice_paper}&format=json"
    request = requests.get(request_url)
    if request.status_code == 200:
        r_json = request.json()
        # "@sent" is '0' once a page comes back with no hits
        if r_json["result"]["hits"]["@sent"] != '0':
            for paper in r_json["result"]["hits"]["hit"]:
                info = paper["info"]
                # Keep only papers from 2013 onwards that have a DOI, a title, and MICCAI as venue
                if "doi" not in info or "title" not in info or int(info["year"]) < 2013 or info["venue"] != "MICCAI":
                    continue
                # Strip commas and newlines from the title before storing it
                title = info["title"].replace(",", "").replace("\n", "")
                lst_paper[info["doi"]] = title
            indice_paper += 1000
        else:
            nextPage = False
    else:
        # Stop instead of requesting the same offset forever
        print(f"Request failed with status {request.status_code}")
        nextPage = False
        
            

## Save it to a CSV file
This way we don't need to query the API again.

In [12]:
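import csv

# newline='' lets the csv module control line endings itself
with open('../../data/miccai_papers.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(["doi", "title"])
    # csv.writer quotes fields when needed, so the file stays parseable
    for doi, title in lst_paper.items():
        writer.writerow([doi, title])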

# Request OpenAlex
For each paper obtained above, we query the OpenAlex API by DOI and check whether a dataset appears in its references. We then apply the wrong-reference detection process.
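
The reference check itself can be sketched on a hand-written record, with no network access. The helper name `datasets_referenced` and the sample work are hypothetical; the `referenced_works` field and the two OpenAlex IDs are the ones that appear in this notebook.

```python
def datasets_referenced(work, datasets_id):
    """Hypothetical helper: names of the datasets whose OpenAlex ID is cited by `work`."""
    referenced = set(work.get("referenced_works", []))
    # Datasets whose DOI could not be resolved have a None ID and are skipped.
    return [
        name
        for name, oa_id in datasets_id.items()
        if oa_id is not None and oa_id in referenced
    ]


# Illustrative inputs only: a made-up work record that cites ACDC.
sample_datasets = {
    "ACDC": "https://openalex.org/W2804047627",
    "BRATS": "https://openalex.org/W1641498739",
    "Synapse": None,
}
sample_work = {"referenced_works": ["https://openalex.org/W2804047627"]}

print(datasets_referenced(sample_work, sample_datasets))  # prints ['ACDC']
```

Converting the reference list to a set once per paper keeps each membership test cheap when many datasets are checked.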

### Load list of papers and datasets

In [29]:
lst_paper = {}
with open('../../data/miccai_papers.csv', newline='') as f:
    for paper in csv.DictReader(f):
        lst_paper[paper["doi"]] = paper["title"]

# Dictionary with dataset name as key and DOI as value
datasets_doi = {}
with open('../../data/datasets.csv', newline='') as f:
    for ds in csv.DictReader(f):
        datasets_doi[ds["name"]] = ds["DOI"]

def doi_to_OpenAlexId(doi):
    """
    Convert a DOI to the OpenAlex ID used as a value in some API fields, such as "referenced_works"
    @param
        - doi: the DOI we want to convert
    @return
        The OpenAlex ID if the DOI is in the OpenAlex database, None otherwise
    """
    base_url = f"https://api.openalex.org/works/doi:{doi}"
    r = requests.get(base_url)
    if r.status_code == 200:
        r_json = r.json()
        return r_json["id"]
    return None

# Dictionary with dataset name as key and OpenAlex ID as value. We use the OpenAlex ID because that is what the "referenced_works" field of the API contains.
datasets_id = {}

# Convert each DOI to an OpenAlex ID
for ds in datasets_doi:
    openalex_id = doi_to_OpenAlexId(datasets_doi[ds])
    if not openalex_id:
        print(f"Couldn't convert DOI for {ds} into OpenAlex ID")
    datasets_id[ds] = openalex_id

Couldn't convert DOI for Synapse into OpenAlex ID


In [30]:
datasets_id

{'ACDC': 'https://openalex.org/W2804047627',
 'LA': 'https://openalex.org/W3093394156',
 'MSCMR': 'https://openalex.org/W4312016581',
 'M&Ms': 'https://openalex.org/W4226199676',
 'PROMISE12': 'https://openalex.org/W2106033751',
 'I2CVB': 'https://openalex.org/W2049522781',
 'BRATS': 'https://openalex.org/W1641498739',
 'Synapse': None}

### Query OpenAlex for each paper

In [34]:
from tqdm import tqdm

# For each dataset, the list of papers whose references include it
paper_referencing = {ds: [] for ds in datasets_id}

for doi in tqdm(lst_paper):
    title = lst_paper[doi]

    # Skip review/survey papers before spending an API call on them
    if "review" in title.lower() or "survey" in title.lower():
        continue

    request_url = f"https://api.openalex.org/works/doi:{doi}"
    request = requests.get(request_url)
    if request.status_code == 200:
        r_json = request.json()
        fulltext_url = r_json["open_access"]["oa_url"]

        for ds in paper_referencing:
            # A dataset without an OpenAlex ID (None) never matches
            if datasets_id[ds] in r_json["referenced_works"]:
                paper_referencing[ds].append((title, doi, r_json["publication_year"], r_json["abstract_inverted_index"], fulltext_url))

100%|██████████| 3839/3839 [28:28<00:00,  2.25it/s]


In [35]:
for d in paper_referencing:
    print(f"Number of citations for {d}: {len(paper_referencing[d])}")

Number of citations for ACDC: 40
Number of citations for LA: 6
Number of citations for MSCMR: 0
Number of citations for M&Ms: 8
Number of citations for PROMISE12: 19
Number of citations for I2CVB: 7
Number of citations for BRATS: 76
Number of citations for Synapse: 0
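
Each tuple collected above stores `abstract_inverted_index`, which OpenAlex returns as a mapping from each word to the positions where it occurs rather than as plain text. If the plain abstract is needed later, it could be rebuilt along these lines. The helper name `rebuild_abstract` and the sample index are hypothetical.

```python
def rebuild_abstract(inverted_index):
    """Hypothetical helper: turn a word -> positions mapping back into text."""
    if not inverted_index:
        # Some works have no abstract, in which case the field is None.
        return ""
    # Flatten to (position, word) pairs, then order them by position.
    positioned = [
        (pos, word)
        for word, positions in inverted_index.items()
        for pos in positions
    ]
    return " ".join(word for _, word in sorted(positioned))


# Illustrative index only, not taken from a real paper.
sample_index = {"cardiac": [0, 3], "MRI": [1], "and": [2], "CT": [4]}

print(rebuild_abstract(sample_index))  # prints: cardiac MRI and cardiac CT
```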
