### OpenAlex

Raw:

* JSONL file automatically downloaded

Data:

* id
* issns
* title
* other titles
* publisher id
* publisher title
* publisher other titles
* link

Metrics:

* Impact Factor (IF)
* H-index

In [None]:
from data.openalex import openalex

In [None]:
publishers, journals = openalex.get_data()

In [None]:
# Fix strange characters in The Lancet Gastroenterology & Hepatology
journals['S2530914053']['title'] = 'The Lancet Gastroenterology & Hepatology'

# Fix Nature Reviews titles
for journal in journals.values():
	journal['title'] = journal['title'].replace('Nature reviews.', 'Nature Reviews')

# Fix CA: A Cancer Journal for Clinicians title
journals['S126094547']['title'] = 'CA: A Cancer Journal for Clinicians'
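
For orientation, the record layout that the rest of this notebook relies on can be sketched as below. The keys are inferred from the cells that follow (`title`, `publisher`, `link`, `fields`, `metrics`, with each metric held as a list of values); the ids, titles, and numbers are hypothetical, and the real output of `openalex.get_data()` may carry more keys.

```python
# Hypothetical records mirroring the keys used later in this notebook.
publishers = {
    'P1': {'id': 'P1', 'title': 'Example Publisher'},
}
journals = {
    'S1': {
        'id': 'S1',
        'title': 'Nature reviews. Example',
        'publisher': 'P1',  # key into `publishers`, or None
        'link': 'https://example.org/journal',
        'fields': [],
        'metrics': {'if': [12.3], 'h': [150]},  # one list of values per metric
    },
}

# The same title clean-up as in the cell above, applied to the toy record.
for journal in journals.values():
    journal['title'] = journal['title'].replace('Nature reviews.', 'Nature Reviews')

print(journals['S1']['title'])  # Nature Reviews Example
```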

### Scopus

Raw:

* XLSX file manually downloaded from: https://www.elsevier.com/products/scopus/content#4-titles-on-scopus

	* Click on: `Download the Source title list`

Data:

* scopus id
* issns
* other titles
* active
* in scopus
* last year
* publisher other titles
* fields

In [None]:
from data.scopus import scopus
from data.merge import *

In [None]:
data = scopus.get_data(file='ext_list_March_2025.xlsx', sheet='Scopus Sources Mar. 2025')

In [None]:
exact_matches, pairs = create_pairs(journals, data)

In [None]:
exact_matches = filter_pairs(publishers, journals, data, exact_matches, pairs)

In [None]:
publishers, journals = update(publishers, journals, data, exact_matches)
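
The three calls above (`create_pairs`, `filter_pairs`, `update`) recur for every source below. Their implementation lives in `data.merge` and is not shown here, so the following is only a rough sketch of one way such matching could work: accept a record outright when its ISSNs point to exactly one known journal, otherwise propose candidate pairs by normalized title for later filtering. The function names, the `issns` and `title` keys on the incoming records, and the sample data are all assumptions for illustration.

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse punctuation so near-identical titles compare equal."""
    return re.sub(r'[^a-z0-9]+', ' ', title.lower()).strip()


def match_records(journals, data):
    """Return (exact_matches, pairs): ISSN hits first, then title-based candidates."""
    issn_index = {}
    title_index = {}
    for journal_id, journal in journals.items():
        for issn in journal.get('issns', []):
            issn_index[issn] = journal_id
        title_index.setdefault(normalize_title(journal['title']), []).append(journal_id)

    exact_matches, pairs = {}, []
    for index, record in enumerate(data):
        by_issn = {issn_index[issn] for issn in record.get('issns', []) if issn in issn_index}
        if len(by_issn) == 1:
            # Unambiguous ISSN hit: treat as an exact match.
            exact_matches[index] = by_issn.pop()
            continue
        # Otherwise keep title-based candidates for a later filtering step.
        for journal_id in title_index.get(normalize_title(record['title']), []):
            pairs.append((index, journal_id))
    return exact_matches, pairs


# Hypothetical sample data.
sample_journals = {
    'S1': {'title': 'Journal of Examples', 'issns': ['1234-5678']},
    'S2': {'title': 'Example: Letters', 'issns': []},
}
sample_data = [
    {'title': 'J. of Examples', 'issns': ['1234-5678']},
    {'title': 'Example - Letters', 'issns': []},
    {'title': 'Unrelated Journal', 'issns': []},
]
sample_exact, sample_pairs = match_records(sample_journals, sample_data)
```

In this toy run the first record matches by ISSN despite its different title, the second becomes a candidate pair through its normalized title, and the third is left unmatched.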

### CiteScore (Scopus)

Raw:

* XLSX files manually downloaded from: https://www.scopus.com/sources.uri

	* Select 1,000 journals

	* Click on: `Export to Excel`

	* Repeat for the next 1,000 journals until all journals are downloaded

Data:

* other titles
* publisher other titles

Metrics:

* CiteScore
* Source Normalized Impact per Paper (SNIP)
* SCImago Journal Rank (SJR)

In [None]:
from data.citescore import citescore

In [None]:
data = citescore.get_data()

In [None]:
exact_matches, pairs = create_pairs(journals, data)

In [None]:
exact_matches = filter_pairs(publishers, journals, data, exact_matches, pairs)

In [None]:
publishers, journals = update(publishers, journals, data, exact_matches)

### SCImago

Raw:

* CSV files manually downloaded from: https://www.scimagojr.com/journalrank.php

	* Select a year

	* Click on: `Download data`

	* Repeat for all the available years

Data:

* scopus id
* issns
* other titles
* last year
* publisher other titles
* fields

Metrics:

* Impact Factor (IF)
* H-index
* SCImago Journal Rank (SJR)

In [None]:
from data.scimago import scimago

In [None]:
data = scimago.get_data()
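
Since one CSV file is downloaded per year, a journal can end up with several values for the same metric. The sketch below shows one way such per-year files could be gathered into a list of values per journal. It is not the code behind `scimago.get_data()`; the `;` delimiter, the comma decimal separator, and the `Sourceid` and `SJR` column names are assumptions about the downloaded files, and the rows are made up.

```python
import csv
import io


def collect_yearly_values(files, column):
    """Gather one numeric column across per-year CSV files, keyed by source id.

    `files` maps a year to an open text file (assumed ';'-delimited).
    """
    values = {}
    for year in sorted(files):
        reader = csv.DictReader(files[year], delimiter=';')
        for row in reader:
            # Assumed comma decimal separator, e.g. '1,5' -> 1.5
            raw = (row.get(column) or '').replace(',', '.')
            if raw:
                values.setdefault(row['Sourceid'], []).append(float(raw))
    return values


# Hypothetical in-memory stand-ins for two yearly downloads.
files = {
    2022: io.StringIO('Sourceid;Title;SJR\n100;Journal of Examples;1,5\n'),
    2023: io.StringIO('Sourceid;Title;SJR\n100;Journal of Examples;2,5\n200;Example Letters;\n'),
}
sjr = collect_yearly_values(files, 'SJR')
print(sjr)  # {'100': [1.5, 2.5]}
```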

In [None]:
exact_matches, pairs = create_pairs(journals, data)

In [None]:
exact_matches = filter_pairs(publishers, journals, data, exact_matches, pairs)

In [None]:
publishers, journals = update(publishers, journals, data, exact_matches)

### CWTS

Raw:

* XLSX file manually downloaded from: https://www.journalindicators.com/downloads

	* Click on: `Download results of CWTS Journal Indicators`

Data:

* issns
* other titles
* publisher other titles
* fields

Metrics:

* Source Normalized Impact per Paper (SNIP)
* Self-Citation Ratio

In [None]:
from data.cwts import cwts

In [None]:
data = cwts.get_data(file='CWTS Journal Indicators March 2024.xlsx', sheet='Sources')

In [None]:
exact_matches, pairs = create_pairs(journals, data)

In [None]:
exact_matches = filter_pairs(publishers, journals, data, exact_matches, pairs)

In [None]:
publishers, journals = update(publishers, journals, data, exact_matches)

### Eigenfactor

Raw:

* JSONL file automatically downloaded

Data:

* issns
* other titles

Metrics:

* Eigenfactor Score
* Article Influence Score

In [None]:
from data.eigenfactor import eigenfactor

In [None]:
data = eigenfactor.get_data()

In [None]:
exact_matches, pairs = create_pairs(journals, data)

In [None]:
exact_matches = filter_pairs(publishers, journals, data, exact_matches, pairs)

In [None]:
publishers, journals = update(publishers, journals, data, exact_matches)

### SciScore

Raw:

* JSONL file automatically downloaded

Data:

* other titles

Metrics:

* Rigor & Transparency Index (RTI)

In [None]:
from data.sciscore import sciscore

In [None]:
data = sciscore.get_data()

In [None]:
exact_matches, pairs = create_pairs(journals, data)

In [None]:
exact_matches = filter_pairs(publishers, journals, data, exact_matches, pairs)

In [None]:
publishers, journals = update(publishers, journals, data, exact_matches)

### OSF

Raw:

* CSV file manually downloaded from: https://osf.io/qatkz

	* Click on: `⁝` → `Download`

Data:

* issns
* other titles
* publisher other titles

Metrics:

* Transparency and Openness Promotion Factor (TOP Factor)

In [None]:
from data.osf import osf

In [None]:
data = osf.get_data(file='top-factor.csv')

In [None]:
exact_matches, pairs = create_pairs(journals, data)

In [None]:
exact_matches = filter_pairs(publishers, journals, data, exact_matches, pairs)

In [None]:
publishers, journals = update(publishers, journals, data, exact_matches)

### Altmetric

Raw:

* CSV file manually downloaded from: https://www.altmetric.com/journal-selection-dashboard

	* Fill in the form

	* Access: https://lookerstudio.google.com/u/0/reporting/bf225056-5331-44ac-a9c8-ed75c745dce2/page/4RByC

	* Click on: `⁝` → `Export`

Data:

* other titles

Metrics:

* News mentions

In [None]:
from data.altmetric import altmetric

In [None]:
data = altmetric.get_data(file='Shareable Journal Selection Dashboard Demo (MT)_Journal List and Filtering_Tableau.csv')

In [None]:
exact_matches, pairs = create_pairs(journals, data)

In [None]:
exact_matches = filter_pairs(publishers, journals, data, exact_matches, pairs)

In [None]:
publishers, journals = update(publishers, journals, data, exact_matches)

### Fixes

In [None]:
# Remove the anomalous (largest) Impact Factor value for International Journal of Engineering and Technology
journals['S2764657047']['metrics']['if'].remove(max(journals['S2764657047']['metrics']['if']))

# Keep only the Multidisciplinary field (1000) for Science
journals['S3880285']['fields'] = [1000]
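
The outlier fix above raises an error if the journal id is missing or its list of values is empty. A slightly more defensive variant might look like the sketch below; `drop_largest` is a hypothetical helper, not part of this project, and the values are made up.

```python
def drop_largest(values):
    """Remove the largest value from a list in place and return it (None if the list is empty)."""
    if not values:
        return None
    largest = max(values)
    values.remove(largest)
    return largest


# Hypothetical Impact Factor values with one obvious outlier.
impact_factors = [0.4, 0.5, 97.0]
removed = drop_largest(impact_factors)
print(removed, impact_factors)  # 97.0 [0.4, 0.5]
```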

### Merge metrics

In [None]:
import numpy as np

from data.utils import *

# Collapse each metric's list of values into a single number
for journal in journals.values():
	for key, value in journal['metrics'].items():
		if key == 'h':
			# The H-index is an integer, so round its mean
			journal['metrics'][key] = round(np.mean(value)) if len(value) > 0 else None
		else:
			journal['metrics'][key] = float(np.mean(value)) if len(value) > 0 else None

	# Drop metrics that have no values
	journal['metrics'] = remove_none(journal['metrics'])
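
As a self-contained illustration of the merge step, the sketch below averages each metric's list, rounds the H-index to an integer, and drops metrics with no values. It uses `statistics.fmean` from the standard library in place of `np.mean`, and `remove_none` here is a stand-in written for this example; the helper in `data.utils` may behave differently.

```python
from statistics import fmean


def remove_none(mapping):
    """Stand-in helper: return a copy of the dict without None values."""
    return {key: value for key, value in mapping.items() if value is not None}


def merge_metrics(metrics):
    """Collapse each list of metric values into a single number."""
    merged = {}
    for key, values in metrics.items():
        if not values:
            merged[key] = None
        elif key == 'h':
            # The H-index is kept as an integer.
            merged[key] = round(fmean(values))
        else:
            merged[key] = fmean(values)
    return remove_none(merged)


merged = merge_metrics({'if': [2.0, 3.0], 'h': [10, 14], 'sjr': []})
print(merged)  # {'if': 2.5, 'h': 12}
```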

### Save

In [None]:
import json

results = []

for journal in journals.values():
	results.append({
		'id': journal['id'],
		'title': journal['title'],
		'publisher': publishers[journal['publisher']]['title'] if journal['publisher'] is not None else None,
		'link': journal['link'],
		'fields': journal['fields'],
		'metrics': journal['metrics'],
	})

results.sort(key=lambda x: x['title'])

with open('data/journals.jsonl', 'w', encoding='utf-8') as file:
	for result in results:
		file.write(json.dumps(result, ensure_ascii=False) + '\n')
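
The saved file holds one JSON object per line, so it can be read back line by line. A minimal sketch follows, using a temporary file and a hypothetical record rather than the real `data/journals.jsonl`.

```python
import json
import os
import tempfile


def read_jsonl(path):
    """Load a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, encoding='utf-8') as file:
        return [json.loads(line) for line in file if line.strip()]


# Write a hypothetical record the same way as the cell above, then read it back.
records = [{'id': 'S1', 'title': 'Journal of Examples', 'metrics': {'h': 12}}]
path = os.path.join(tempfile.mkdtemp(), 'journals.jsonl')
with open(path, 'w', encoding='utf-8') as file:
    for record in records:
        file.write(json.dumps(record, ensure_ascii=False) + '\n')

loaded = read_jsonl(path)
print(loaded == records)  # True
```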