As shown in `fill_quota.ipynb`, theses in all three repositories list referees, whereas only depositonce maintains a list of advisors. Referees could therefore serve as a substitute for venues. In refubium, the most frequent referee entry is "N.N.", a placeholder meaning that no referee is named, so it is excluded from the counts.
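The placeholder filter used in the cells below relies on a pattern whose first `.` is unescaped, so it matches any character in that position. The following is only a sketch of how a stricter check might look; `is_placeholder` and the name "Ng Ngoc, Anh" are hypothetical and invented for illustration, and switching to the stricter pattern could change the counts reported in this notebook.

```python
import re

# Pattern as used in this notebook (written as a raw string): the first `.` matches ANY character.
LOOSE = re.compile(r'N.\s?N\.?')
# Stricter variant (an assumption, not the notebook's pattern): literal dot, whole string must match.
STRICT = re.compile(r'N\.\s?N\.?')

def is_placeholder(name):
    """Return True if `name` is an 'N.N.'-style placeholder (hypothetical helper)."""
    return STRICT.fullmatch(name.strip()) is not None

for candidate in ['N.N.', 'N. N.', 'Ng Ngoc, Anh', 'Lauster, Roland']:
    print(repr(candidate), LOOSE.match(candidate) is not None, is_placeholder(candidate))
```

The loose pattern also flags the invented name because `g` is accepted where a dot was intended, while the strict variant only accepts the placeholder spellings.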

In [1]:
import json
from matplotlib import pyplot as plt
from collections import Counter
import re

In [2]:
with open('../../../data/processed/dim/depositonce.json') as f:
    tu = json.load(f)
with open('../../../data/processed/dim/edoc.json') as f:
    hu = json.load(f)
with open('../../../data/processed/dim/refubium.json') as f:
    fu = json.load(f)

In [3]:
referees = {'TU': {'total': 0, 'distinct': 0}, 'HU': {'total': 0, 'distinct': 0}, 'FU': {'total': 0, 'distinct': 0}}
seen_referees = {'TU': set(), 'HU': set(), 'FU': set()}  # sets give fast membership tests
nulls = {'TU': 0, 'HU': 0, 'FU': 0}   # theses without a named referee
totals = {'TU': 0, 'HU': 0, 'FU': 0}  # all documents, regardless of type
repos = ['TU', 'HU', 'FU']
for i, repo in enumerate([tu, hu, fu]):
    for doc in repo:
        totals[repos[i]] += 1
        if doc['type'][1] == 'thesis':
            has_referee = False
            for author in doc['authors']:
                # Skip "N.N." placeholders; note that the first `.` in the pattern matches any character.
                if author[1] in ('advisor', 'referee') and re.match(r'N.\s?N\.?', author[0]) is None:
                    has_referee = True
                    referees[repos[i]]['total'] += 1
                    if author[0] not in seen_referees[repos[i]]:
                        referees[repos[i]]['distinct'] += 1
                        seen_referees[repos[i]].add(author[0])
            if not has_referee:
                nulls[repos[i]] += 1

In [4]:
for repo in referees:
    print(f'{repo} has {referees[repo]["total"]} referees, {referees[repo]["distinct"]} distinct ones. {nulls[repo]} documents do not have a referee ({round(nulls[repo]/totals[repo], 2)}).')

TU has 8383 referees, 2785 distinct ones. 5 documents do not have a referee (0.0).
HU has 7308 referees, 3395 distinct ones. 30 documents do not have a referee (0.0).
FU has 9133 referees, 4863 distinct ones. 662 documents do not have a referee (0.05).
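The same role and placeholder filter recurs in several cells, so it could be factored into one helper. This is a minimal sketch, assuming records shaped like the ones used here (`doc['type'][1]` holds the document type and `doc['authors']` holds entries whose first two fields are name and role). `iter_referees` and the toy records are hypothetical and not part of the notebook.

```python
import re

PLACEHOLDER = re.compile(r'N.\s?N\.?')  # same pattern as in the notebook cells
ROLES = ('advisor', 'referee')

def iter_referees(doc):
    """Yield the names of all non-placeholder advisors/referees of one record."""
    for author in doc['authors']:
        if author[1] in ROLES and PLACEHOLDER.match(author[0]) is None:
            yield author[0]

# Toy records, invented for illustration only.
toy_repo = [
    {'type': ['doc', 'thesis'],
     'authors': [['Doe, Jane', 'author'], ['Roe, Richard', 'referee'], ['Poe, Alex', 'advisor']]},
    {'type': ['doc', 'thesis'],
     'authors': [['Moe, Max', 'author'], ['N.N.', 'referee']]},
    {'type': ['doc', 'thesis'],
     'authors': [['Loe, Kim', 'author'], ['Roe, Richard', 'referee']]},
]

theses = [doc for doc in toy_repo if doc['type'][1] == 'thesis']
names = [name for doc in theses for name in iter_referees(doc)]
total = len(names)                 # referee mentions
distinct = len(set(names))         # distinct referee names
without = sum(1 for doc in theses if not list(iter_referees(doc)))  # theses with no referee
print(total, distinct, without)
```

With a helper like this, the total, distinct and missing counts all derive from one definition of what counts as a referee.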


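What is the average number of referees per thesis? Only documents with at least one referee enter the average. Note that, unlike the other cells, this one does not filter on the document type.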

In [5]:
referees_per_doc = {'TU': [], 'HU': [], 'FU': []}
for i, repo in enumerate([tu, hu, fu]):
    for doc in repo:
        cnt = 0
        for author in doc['authors']:
            if author[1] in ('advisor', 'referee') and re.match(r'N.\s?N\.?', author[0]) is None:
                cnt += 1
        if cnt > 0:
            referees_per_doc[repos[i]].append(cnt)

In [6]:
for repo in referees_per_doc:
    print(repo, sum(referees_per_doc[repo]) / len(referees_per_doc[repo]))

TU 2.5565721256480636
HU 2.7639939485627836
FU 2.1991331567541534
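A mean alone does not show how the counts are spread. As a small sketch with invented numbers (the list `counts` is hypothetical; in this notebook it would correspond to `referees_per_doc[repo]`), the median and the distribution can be computed alongside the mean:

```python
from collections import Counter
from statistics import mean, median

counts = [2, 2, 3, 2, 5, 1, 2, 3]  # invented referee counts per document

print('mean:', mean(counts))
print('median:', median(counts))

# Distribution: how many documents list exactly k referees.
distribution = Counter(counts)
for k in sorted(distribution):
    print(k, distribution[k])
```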


Check how many theses each referee appears in.

In [7]:
people = {'TU': Counter(), 'HU': Counter(), 'FU': Counter()}
for i, repo in enumerate([tu, hu, fu]):
    for doc in repo:
        if doc['type'][1] == 'thesis':
            for author in doc['authors']:
                if author[1] in ('advisor', 'referee') and re.match(r'N.\s?N\.?', author[0]) is None:
                    # Counter returns 0 for unseen names, so no membership check is needed.
                    people[repos[i]][author[0]] += 1

In [8]:
for repo in people:
    if len(people[repo]) > 0:
        print(f'{repo} avg.: {round(sum(people[repo].values())/len(people[repo]), 2)}')
    else:
        print(f'{repo} avg.: 0')

TU avg.: 3.01
HU avg.: 2.15
FU avg.: 1.88


In [9]:
sorted_people = {'TU': {}, 'HU': {}, 'FU': {}}
for repo in sorted_people:
    sorted_people[repo] = {person: n for person, n in sorted(people[repo].items(), key=lambda item: item[1], reverse=True)}

In [10]:
for repo in sorted_people:
    print(repo)
    # The dict is already sorted by count, so the first five items are the top five.
    for person, n in list(sorted_people[repo].items())[:5]:
        print(person, n)
    print()

TU
Lauster, Roland 89
Müller, Klaus-Robert 82
Neubauer, Peter 80
Schomäcker, Reinhard 76
Knorr, Andreas 75

HU
Härdle, Wolfgang 66
Lohse, Thomas 60
Benson, Oliver 55
Herrmann, Andreas 50
Härdle, Wolfgang Karl 47

FU
Prof. Dr. Rupert Mutzel 72
Prof. Dr. Rainer Haag 65
Prof. Dr. Udo Heinemann 49
Prof. Dr. Martin Vingron 49
Prof. Dr. Petra Knaus 41
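The lists above suggest that names are not normalised: HU contains both "Härdle, Wolfgang" and "Härdle, Wolfgang Karl", and FU entries carry a "Prof. Dr." prefix and use "Given Family" order. The following is only a sketch of one possible normalisation; `name_key` and the title list are hypothetical, and it assumes that the two Härdle entries refer to the same person. Reducing a name to family name plus first given name can also merge people who are in fact distinct.

```python
import re
from collections import Counter

# Assumption: academic titles appear only as a prefix, as in the FU output.
TITLE_PREFIX = re.compile(r'^(?:(?:Prof|Dr|PD)\.?\s+)+')

def name_key(raw):
    """Hypothetical normalisation: strip titles, reduce to 'Family, FirstGiven'."""
    name = TITLE_PREFIX.sub('', raw.strip())
    if not name:
        return ''
    if ',' in name:
        last, given = [part.strip() for part in name.split(',', 1)]
    else:
        # "Given ... Family": treat the final token as the family name.
        *given_parts, last = name.split()
        given = ' '.join(given_parts)
    first = given.split()[0] if given else ''
    return f'{last}, {first}' if first else last

# Counts taken from the top-five output above.
raw_counts = {
    'Härdle, Wolfgang': 66,
    'Härdle, Wolfgang Karl': 47,
    'Prof. Dr. Rupert Mutzel': 72,
}
merged = Counter()
for raw, n in raw_counts.items():
    merged[name_key(raw)] += n
print(merged.most_common())
```

Under these assumptions the two Härdle entries collapse into a single key, which would change both the ranking and the number of distinct referees.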



In [11]:
for repo in sorted_people:
    # Count referees who appear in more than 10 theses. Unlike an early `break`,
    # this also prints a result when every count exceeds 10.
    print(repo, sum(1 for n in sorted_people[repo].values() if n > 10))


TU 157
HU 97
FU 88
