Institution data cleansing and name resolution
We use Microsoft Academic Graph for cleaning institution names and aliases.
The result of the process described below are stored in
inst_fullname.csv (containing names, urls and wikipedia urls), and
inst_alias.csv (containing mappings of multiple surfaces strings to a single institution as described in item 1 below).
e.g. (Searching name for a key)
aalto university --> aalto university aalto university school of business --> aalto university aalto university school of electrical engineering --> aalto university aalto university school of science --> aalto university
We found 6231 keys from 6245 institution names (Updated 11/July/2019)
|name == key (primary name)||6231|
|name != key (alias name)||14|
Common Error Types
- Different key values for aliases
at t --> at t at t labs --> at t labs auburn university --> auburn university auburn university at montgomery --> auburn university at montgomery
current solution: rely on MAG result as it is.
- Alias not registered or unrecognisable
No result: department of systems biology No result: school of industrial technology
current solution: use the name as a key
Affiliations.txt has (key, fullname, grid, url, wikipedia_url) tuples.
$ cut -f3,4,5,6,7 Affiliations.txt > inst_fullname