# Wikidata - External Reference Counts

Count the references to external resources across Wikidata properties and classes, grouped by the domain of the external IRI.

First, run the following queries and save their results to the indicated CSV files. Each query drops external values that point back to www.wikidata.org.

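* Query for all properties linked to an external property through *external superproperty* (P2235), *external subproperty* (P2236), or *equivalent property* (P1628), returning each property, the external IRI, and the kind of link (run on WDQS; save to __query_Props.csv__)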

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
SELECT DISTINCT ?prop ?ext ?type WHERE {
  ?prop wikibase:propertyType [] .
  { { ?prop wdt:P2235 ?ext . BIND("Ext_Superprop" AS ?type) }  # External superproperty
    UNION
    { ?prop wdt:P2236 ?ext . BIND("Ext_Subprop" AS ?type) }    # External subproperty
    UNION
    { ?prop wdt:P1628 ?ext . BIND("Equiv_Prop" AS ?type) } }   # Equivalent property
  FILTER (!CONTAINS(STR(?ext), "http://www.wikidata.org"))
} ORDER BY DESC(?type)
```
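
The queries can also be run from Python instead of through the web interfaces. This is a minimal sketch, assuming the standard SPARQL protocol (the query sent as a `query` URL parameter) and an endpoint that returns CSV when asked with `Accept: text/csv`. The names `build_csv_request` and `save_query_csv` and the User-Agent string are hypothetical; replace the User-Agent with your own tool name and contact details.

```python
import urllib.parse
import urllib.request

# Public Wikidata Query Service (WDQS) SPARQL endpoint
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Hypothetical User-Agent: replace with your own tool name and contact address
USER_AGENT = "ext-ref-counts/0.1 (contact: you@example.org)"


def build_csv_request(query, endpoint=WDQS_ENDPOINT, user_agent=USER_AGENT):
    """Build a SPARQL-protocol GET request that asks for CSV results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url,
        headers={
            "Accept": "text/csv",
            # WDQS asks clients to identify themselves with a descriptive User-Agent
            "User-Agent": user_agent,
        },
    )


def save_query_csv(query, path, endpoint=WDQS_ENDPOINT):
    """Send the query and write the raw CSV response to `path`."""
    request = build_csv_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=120) as response:
        data = response.read()
    with open(path, "wb") as output:
        output.write(data)
```

For example, `save_query_csv(props_query, "query_Props.csv")`, with `props_query` holding the query text above, would write the first file. QLever also offers a SPARQL endpoint for Wikidata; pass its URL as `endpoint` after checking in its documentation that it accepts this request form and returns CSV.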

* Query for classes (items that have a superclass or a subclass through *subclass of*, P279) with an external *equivalent class* (P1709) (run on QLever; save to __query_Equiv_Class.csv__)

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?item ?ext WHERE {
  ?item wdt:P1709 ?ext .                                 # Equivalent class
  { { ?item wdt:P279 ?x } UNION { ?y wdt:P279 ?item } }  # In class hierarchy
  FILTER (!CONTAINS(STR(?ext), "http://www.wikidata.org"))
}
```

* Query for classes (items that have a superclass or a subclass through *subclass of*, P279) with an external *exact match* (P2888) (run on QLever; save to __query_Exact_Class.csv__)

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT DISTINCT ?item ?ext WHERE {
  ?item wdt:P2888 ?ext .                                 # Exact match
  { { ?item wdt:P279 ?x } UNION { ?y wdt:P279 ?item } }  # In class hierarchy
  FILTER (!CONTAINS(STR(?ext), "http://www.wikidata.org"))
}
```
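
The counting cell reads the external IRI from the second column of every file, so all three exports must keep `?ext` in that position. A quick look at each header row catches a file saved with a different column order. This is a minimal sketch; `ext_index` and `ext_column` are hypothetical helpers, and they assume each file starts with a header row naming the variable as `ext` or `?ext`.

```python
import csv


def ext_index(header, name="ext"):
    """Return the position of the `name` column in a CSV header row."""
    for index, column in enumerate(header):
        # Accept both "ext" and "?ext" in case an endpoint keeps the variable's "?"
        if column.strip().lstrip("?") == name:
            return index
    raise ValueError(f"no {name!r} column in header {header}")


def ext_column(path, name="ext"):
    """Read only the header row of a results file and locate the `name` column."""
    with open(path, newline="", encoding="utf-8") as inputs:
        header = next(csv.reader(inputs), [])
    return ext_index(header, name)
```

Once the files are saved, `all(ext_column(f) == 1 for f in ("query_Props.csv", "query_Equiv_Class.csv", "query_Exact_Class.csv"))` should be true before the counting cell is run.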

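Then run the following cell. It reads the external IRI from each row of the three files, reduces it to its domain (the host name, plus the first path segment for purl.org IRIs, since purl.org serves many unrelated vocabularies), counts the rows per domain, and prints every domain with more than 50 references, most frequent first. The output is shown below.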

```python
import csv
from collections import Counter

files = ("query_Props.csv", "query_Equiv_Class.csv", "query_Exact_Class.csv")
domains = Counter()

for file in files:
    with open(file, newline="", encoding="utf-8") as inputs:
        # csv.reader handles quoted values that contain commas
        for row in csv.reader(inputs):
            if len(row) < 2:
                continue
            # The external IRI (?ext) is the second column in all three files;
            # strip() removes any trailing whitespace or line ending
            external = row[1].strip()
            # Skip the header row and any value that is not an HTTP(S) IRI
            if not external.startswith(("http://", "https://")):
                continue
            iri = external.split("//", 1)[1]
            segments = iri.split("/")
            final = segments[0]
            # purl.org serves many unrelated vocabularies, so keep the first path segment
            if final == "purl.org" and len(segments) > 1:
                final = final + "/" + segments[1]
            domains[final] += 1

# Print domains with more than 50 references, most frequent first
for domain, count in domains.most_common():
    if count > 50:
        print(domain, count)
```

```
identifiers.org 199777
purl.obolibrary.org 57676
www.orpha.net 8573
publications.europa.eu 5844
www.rhea-db.org 4392
schema.org 788
wordnet-rdf.princeton.edu 543
www.tcdb.org 494
dbpedia.org 424
purl.uniprot.org 406
www.ncbi.nlm.nih.gov 198
www.w3.org 163
www.lexinfo.net 141
id.loc.gov 120
www.uniprot.org 95
pcp-on-web.de 92
purl.org/ontology 80
purl.org/spar 69
purl.org/dc 66
cv.iptc.org 60
purl.org/coar 56
d-nb.info 55
```
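
The grouping rule used by the cell above (host name, plus the first path segment for purl.org) can be pulled out into a small function. This sketch swaps the manual `split("//")` and `split("/")` parsing for `urllib.parse.urlsplit`, which also copes with IRIs that have no path or that carry a query string directly after the host; for ordinary IRIs it yields the same keys. The name `domain_key` is hypothetical, and returning `None` for non-HTTP values is an assumption that mirrors the cell's `http` prefix check.

```python
from urllib.parse import urlsplit


def domain_key(iri):
    """Group an external IRI by host, or by host plus first path segment for purl.org."""
    parts = urlsplit(iri.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return None  # not an HTTP(S) IRI, e.g. a header cell or a literal
    key = parts.netloc
    segments = [segment for segment in parts.path.split("/") if segment]
    # purl.org redirects for many unrelated vocabularies, so split it by first segment
    if key == "purl.org" and segments:
        key += "/" + segments[0]
    return key
```

In the counting loop, `key = domain_key(row[1])` followed by `if key is not None: domains[key] += 1` would replace the manual splitting.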
