# Assortative Matching With Coarse Types: Chi-Square Estimator of Deep Parameters given Stochastic Classification of Economics Departments by Placement Rates

James Yu, 30 November 2021; modified by Mike, December

## VERSION: $n$, $m$ fixed, ratios endogenous

In [1]:
INSTALL_PACKAGES = false   # change this to true if you are running this notebook for the first time
YEAR_INTERVAL = 2003:2021  # change this to select the years of data to include in the estimation
NUMBER_OF_TYPES = 4        # change this to select the number of types to classify academic departments into
NUMBER_OF_SINKS = 1        # change this to 4 to use individual sink types
SAVE_TO_DATABASE = false   # change to true to save the type allocation to the database
TOTAL_DISTRIBUTIONS = NUMBER_OF_TYPES + NUMBER_OF_SINKS;

In [2]:
import Pkg
for package in ["BlackBoxOptim", "Distributions", "ForwardDiff", "HTTP", "JSON", "Optim", "Quadrature",
        "StatsPlots", "DotEnv", "MySQL", "DBInterface", "Tables"]
    if INSTALL_PACKAGES
        Pkg.add(package)
    end
end
using  JSON, HTTP, Distributions, DotEnv, MySQL, DBInterface, Tables

In [3]:
function get_data(url)
    # download the JSON payload and parse it into Julia objects
    resp = HTTP.get(url)
    es = String(resp.body)
    return JSON.parse(es)
end

get_data (generic function with 1 method)

In [4]:
function find_inst(name::String,outcomes,to_from::Bool = false)
    check = false
    for outcome in outcomes
        if to_from == true
            if outcome["to_name"] == name
                println(outcome)
                check = true
            end
        else
            if outcome["from_institution_name"] == name
                println(outcome)
                check = true
            end
        end
    end
    if check == false
        println("Not found")
    end
end

find_inst (generic function with 2 methods)

This is the type allocation notebook, modified by Mike.

This part gets all the placement data using `https://support.econjobmarket.org/api/placement_data`

In [11]:
url = "https://support.econjobmarket.org/api/placement_data"
placements = get_data(url);
m = length(placements)
println(m)
println(rand(1:m))  # a random index, just to spot-check the range
println(typeof(placements))

14882
11539
Vector{Any}


In [13]:
placements[rand(1:m)]

Dict{String, Any} with 17 entries:
  "to_shortname"          => "Mechanical Engineering, Indian I"
  "to_name"               => "Indian Institute of Technology, Delhi"
  "to_department"         => "Mechanical Engineering"
  "name"                  => "Decision Sciences"
  "category_id"           => "25"
  "from_shortname"        => "Deaptment of Business Administra"
  "to_oid"                => "9949"
  "recruiter_type"        => "4"
  "description"           => "Academic organization (other than econ, business,…
  "from_oid"              => "9079"
  "position_name"         => "Post-Doc"
  "year"                  => "2022"
  "aid"                   => "54010"
  "to_institution_id"     => "4394"
  "postype"               => "6"
  "from_institution_id"   => "2747"
  "from_institution_name" => "Aligarh Muslim University"

In [6]:
placements = unique(placements);
println(length(placements))

14882


Note for future debugging: apparently there are 16 duplicates.

In [7]:
const DB = DBInterface
cfg = DotEnv.config();

Find all the placements to assistant professor positions and put them in a set called `academic_builder`.
Put the other placements in a set called `sink_builder`.

In [8]:
i = 0
oid_mapping = Dict{}()
institution_mapping = Dict{}()
academic = Set{}()
academic_to = Set{}()
academic_builder = Set{}()
sink_builder = Set{}()
for placement in placements
    if in(parse(Int64, placement["year"]), YEAR_INTERVAL)
        push!(academic, placement["from_institution_name"])
        push!(academic_to, placement["to_name"])
        oid_mapping[placement["from_oid"]] = placement["from_institution_id"]
        oid_mapping[placement["to_oid"]] = placement["to_institution_id"]
        institution_mapping[placement["from_institution_id"]] = placement["from_institution_name"]
        institution_mapping[placement["to_institution_id"]] = placement["to_name"]
        if placement["position_name"] == "Assistant Professor"
            push!(academic_builder, placement)
        else
            push!(sink_builder, placement)
        end
    end
end

println(length(academic_builder), " total assistant professor outcomes")
println(length(sink_builder), " other outcomes")
println(length(academic_builder)+length(sink_builder))


6174 total assistant professor outcomes
3575 other outcomes
9749


This is smaller than the total placements because it doesn't include placements outside the year interval.

This piece of code deals with teaching universities by checking if they ever graduated Ph.D. students:

In [10]:
tch_sink = Set{}() # sink of teaching universities that do not graduate PhDs
for key in academic_to
    if !(key in academic)
        push!(tch_sink, key)
    end
end

println(length(academic))
println(length(academic_to))
println(length(tch_sink))

468
1386
983


Note that these counts are institutions, not placements. The logic relies on the fact that Julia dicts and sets cannot contain duplicates.
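As a small illustration of that reliance on set semantics, the sketch below uses made-up institution names (they are not from the data). It shows that pushing a name twice leaves a single entry, and that the teaching-university set can equivalently be written as a set difference.

```julia
# Hypothetical names, for illustration only.
graduating = Set{String}()
for name in ["A", "B", "A"]          # "A" appears twice in the toy data
    push!(graduating, name)
end

hiring = Set(["A", "C", "D", "C"])   # duplicates collapse on construction

# Institutions that hire but never graduate anyone: same result as the loop
# that fills tch_sink, written as a set difference.
teaching = setdiff(hiring, graduating)

println(length(graduating))
println(sort(collect(teaching)))
```

The set difference is only a more compact way of writing the loop; either form gives one entry per institution however many placements mention it.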

The next piece of code sorts all the sink departments (except teaching universities, which are dealt with above) by category:

In [12]:
acd_sink = Set{}()
gov_sink = Set{}()
pri_sink = Set{}()

for outcome in sink_builder
    # CODE global academic, other_placements, pri_sink, gov_sink, acd_sink
    if outcome["recruiter_type"] in ["6","7"]
        # private sector: for and not for profit
        push!(pri_sink, string(outcome["to_name"], " (private sector)"))
    elseif outcome["recruiter_type"] == "5"
        # government institution
        push!(gov_sink, string(outcome["to_name"], " (public sector)"))
    else
        # everything else including terminal academic positions
        push!(acd_sink, string(outcome["to_name"], " (academic sink)"))
    end
end

println(length(acd_sink))
println(length(gov_sink))
println(length(pri_sink))

492
114
136


Now that we have five sets, one for each category of department, we can construct a matrix representing the placements between these departments:

In [13]:
institutions = vcat(collect(academic), collect(acd_sink), collect(gov_sink), collect(pri_sink), collect(tch_sink));


In [14]:

out = zeros(Int64, length(institutions), length(collect(academic)))
i = 0
for outcome in academic_builder
    i += 1
    out[findfirst(isequal(outcome["to_name"]), institutions), findfirst(isequal(outcome["from_institution_name"]), institutions)] += 1
end
for outcome in sink_builder
    i += 1
    keycheck = ""
    if outcome["recruiter_type"] in ["6", "7"]
        keycheck = string(outcome["to_name"], " (private sector)")
    elseif outcome["recruiter_type"] == "5"
        keycheck = string(outcome["to_name"], " (public sector)")
    else
        keycheck = string(outcome["to_name"], " (academic sink)")
    end
    #println(keycheck)
    #println(findfirst(isequal(keycheck), institutions))
    #println(outcome["from_institution_name"]," ",findfirst(isequal(outcome["from_institution_name"]), institutions))
    out[findfirst(isequal(keycheck), institutions), findfirst(isequal(outcome["from_institution_name"]), institutions)] += 1
end
println("Total number of outcomes: ", i)
println(length(gov_sink))
println(length(pri_sink))


Total number of outcomes: 9749
114
136


In [42]:
sum(out)

9749
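The construction above finds each row and column with `findfirst`, which scans the whole `institutions` vector for every placement. A minimal sketch of the same counting with a dictionary lookup is shown below. The institution names and outcomes are toy values, and `index_of` and `toy_outcomes` are names introduced here for illustration only.

```julia
# Toy data: two graduating departments followed by one sink institution.
toy_institutions = ["U1", "U2", "Bank (public sector)"]
n_academic = 2

# Build the name => position lookup once instead of scanning for every outcome.
index_of = Dict(name => idx for (idx, name) in enumerate(toy_institutions))

# Each toy outcome is (hiring institution, graduating institution).
toy_outcomes = [("U2", "U1"), ("Bank (public sector)", "U1"), ("U1", "U2"), ("U2", "U1")]

toy_out = zeros(Int, length(toy_institutions), n_academic)
for (to, from) in toy_outcomes
    toy_out[index_of[to], index_of[from]] += 1   # rows hire, columns graduate
end

println(toy_out)
```

This assumes every name occurs once in the vector, which holds in the notebook as long as the suffixed sink names and the teaching-university names do not collide with the graduating departments.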

Finally, we get to the estimator. For this estimate, we assume that each observed set of placement outcomes between any pair of departments is drawn from a distribution common to the "type" of the hiring department and the "type" of the graduating department. Here this distribution is assumed to be Poisson, in line with classical stochastic block models used for similar estimations in Karrer and Newman (2011) and Peixoto (2014).

Given a particular assignment of departments to types, and given the placement outcomes, a single round of estimation computes the mean number of applicants from any single type $t$ department that are hired at a single type $t^\prime$ department, then evaluates the probability of each independent observation under a Poisson distribution with the corresponding mean. Summed together, the logarithms of these probabilities form a log-likelihood which can be used for maximum likelihood estimation.
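Written out, this is the following objective. The notation is introduced here for illustration and is not the notebook's own: $A_{ij}$ is the number of placements from graduating department $j$ to hiring department $i$, $\tau(\cdot)$ is the type assignment, and $N_{t^\prime t}$ is the number of department pairs $(i, j)$ with $\tau(i) = t^\prime$ and $\tau(j) = t$.

$$
\hat{\mu}_{t^\prime t} = \frac{1}{N_{t^\prime t}} \sum_{i:\,\tau(i)=t^\prime} \; \sum_{j:\,\tau(j)=t} A_{ij},
\qquad
\ell(\tau) = \sum_{i,j} \left[ A_{ij} \log \hat{\mu}_{\tau(i)\tau(j)} - \hat{\mu}_{\tau(i)\tau(j)} - \log\left(A_{ij}!\right) \right]
$$

The estimation routine minimises $-\ell(\tau)$ over assignments $\tau$, with the sink types held fixed.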

In [16]:
function bucket_estimate(assign::Array{Int64}, A::Matrix{Int64}, num, numsink)
    b = zeros(Int64, size(A)[1], size(A)[2])
    T = zeros(num + numsink, num)
    count = zeros(num + numsink, num)
    for i in 1:size(A)[1], j in 1:size(A)[2]
         # column-major linear index into T, which has num + numsink rows
         @inbounds val = (num + numsink) * (assign[j] - 1) + assign[i]
         @inbounds b[i, j] = val
         @inbounds T[val] = ((T[val] * count[val]) + A[i, j]) / (count[val] + 1)
         @inbounds count[val] += 1
    end
    L = 0.0
    @simd for i in eachindex(A)
        @inbounds L += logpdf(Poisson(T[b[i]]), A[i])
    end
    return -L, T
end

bucket_estimate (generic function with 1 method)
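To make the block means and the likelihood concrete, here is a minimal sketch on a toy matrix. It is not the notebook's function. `block_means` and `poisson_logpdf` are hypothetical helpers, the data are made up, and the Poisson log-probability is written out by hand so that the example needs no packages (the notebook itself uses `logpdf(Poisson(...), ...)` from Distributions).

```julia
# Toy data: 4 institutions (rows, hiring) and 2 graduating departments (columns).
A = [2 1;
     1 3;
     1 1;
     4 2]
assign = [1, 2, 3, 3]   # rows 1-2 are types 1 and 2; rows 3-4 share one sink type
num, numsink = 2, 1

# Average A[i, j] over all pairs with the same (hiring type, graduating type).
function block_means(assign, A, num, numsink)
    total = zeros(num + numsink, num)
    count = zeros(Int, num + numsink, num)
    for i in axes(A, 1), j in axes(A, 2)
        total[assign[i], assign[j]] += A[i, j]
        count[assign[i], assign[j]] += 1
    end
    return total ./ max.(count, 1)   # guard against empty blocks
end

# Hand-written Poisson log-probability; valid for λ > 0 only.
poisson_logpdf(λ, x) = x * log(λ) - λ - sum(log, 1:x; init = 0.0)

T = block_means(assign, A, num, numsink)
negloglik = -sum(poisson_logpdf(T[assign[i], assign[j]], A[i, j])
                 for i in axes(A, 1), j in axes(A, 2))

println(T)
println(negloglik)
```

The sketch accumulates totals and counts and divides at the end, whereas `bucket_estimate` updates a running mean. The two give the same block means up to floating-point rounding.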

Finally, we compute the maximum-likelihood estimated Poisson means by stochastically re-allocating departments to types and saving likelihood-improving re-allocations until no further re-allocations are found.
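The search is a greedy stochastic one: propose moving one randomly chosen department to a different type, keep the move only if the objective falls, and stop after a long run of rejected proposals. The sketch below shows that loop on a toy objective (the number of labels that differ from a hidden target), not on the Poisson likelihood. `toy_search`, `patience` and the target vector are illustrative assumptions.

```julia
using Random

# Greedy stochastic re-labelling on a toy objective.
function toy_search(target::Vector{Int}, num::Int; patience::Int = 1000,
                    rng = Random.default_rng())
    current = ones(Int, length(target))      # everyone starts in type 1
    best = count(current .!= target)         # objective to minimise
    misses = 0                               # consecutive rejected proposals
    while misses < patience
        proposal = copy(current)
        k = rand(rng, 1:length(proposal))
        # move element k to any type other than its current one
        proposal[k] = rand(rng, setdiff(1:num, [proposal[k]]))
        score = count(proposal .!= target)
        if score < best
            current, best, misses = proposal, score, 0
        else
            misses += 1
        end
    end
    return best, current
end

best, labels = toy_search([1, 3, 2, 2, 4], 4; rng = MersenneTwister(1))
println(best, " ", labels)
```

Because only strictly improving moves are accepted, a search of this kind can stop at a local optimum, which is one reason to compare the final objective across repeated runs.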

### To skip re-allocating, uncomment the following line instead if you have an allocation already. Do not run the cell after.

In [17]:
#est_alloc = JSON.parsefile("type_allocation.json");
function doit(sample, academic_institutions, asink, gsink, psink, tsink, all_institutions, num, numsink)
    # some initial states
    current_allocation = Array{Int64}(undef, length(all_institutions))
    cur_objective = Inf
    best_mat = nothing
    cursor = 1
    for inst in academic_institutions
        current_allocation[cursor] = 1
        cursor += 1
    end
    # the sinks must stay in fixed types
    # this was built to support more sinks, but by default we only use one
    # change the "current_allocation[cursor] = ..." lines to group sinks together
    for key in asink # other academic
        current_allocation[cursor] = num + min(1, numsink)
        cursor += 1
    end
    for key in gsink # public sector
        current_allocation[cursor] = num + min(2, numsink)
        cursor += 1
    end
    for key in psink # private sector
        current_allocation[cursor] = num + min(3, numsink)
        cursor += 1
    end
    for key in tsink # assistant professor at teaching universities
        current_allocation[cursor] = num + min(4, numsink)
        cursor += 1
    end
    blankcount = 0

    # BEGIN MONTE CARLO REALLOCATION ROUTINE
    while true
        # attempt to reallocate academic institutions to a random spot
        temp_allocation = copy(current_allocation)
        k = rand(1:length(academic_institutions))
        @inbounds temp_allocation[k] = rand(delete!(Set(1:num), temp_allocation[k]))
        # check if the new assignment is better
        test_objective, estimated_means = bucket_estimate(temp_allocation, sample, num, numsink)
        if test_objective < cur_objective
            print(test_objective, " ")
            blankcount = 0
            cur_objective = test_objective
            best_mat = estimated_means
            current_allocation = temp_allocation
        else
            blankcount += 1
            if blankcount % 1000 == 0
                print(blankcount, " ")
            end
        end
        if blankcount == 100000
            return cur_objective, best_mat, current_allocation
        end
    end
end
est_obj, est_mat, est_alloc = doit(out, collect(academic), collect(acd_sink), collect(gov_sink), collect(pri_sink),
    collect(tch_sink), institutions, NUMBER_OF_TYPES, NUMBER_OF_SINKS)

56466.84401132045 56424.586560782285 56405.6693928734 56378.31231579637 56356.56215587721 56332.66034988355 56309.010438527 56290.29400179503 56278.82390601321 56269.09264913503 56255.01695557108 56231.585557319486 56212.58755675718 56187.834053512306 56166.30772602402 55933.43677954476 55922.81331960971 55900.273067646856 55884.67480525072 55860.372316357105 55643.85819299809 55619.742418626025 55599.83570530347 55578.072442621735 55555.90737340263 55518.51448189492 55447.50492167668 55421.284772513936 55297.91743244851 55252.60101344534 54877.805720253826 54808.329237581114 54791.41250688725 54770.69315149761 54759.16970022621 54743.37127122358 54118.800542500794 54094.82556385884 53632.21641332765 53626.93108287333 53610.724734770905 53601.85105566877 53586.606723003744 53406.37389424328 53321.512273554865 53300.424152546126 53118.937237923696 53102.20427741214 53088.48161768851 53067.477366182386 53061.91037911947 52946.67706441528 52938.19559829428 52931.55200468726 52920.34804168

(41436.52328465017, [0.016198347107437994 0.0010019550342130928 0.13094098883572564 0.3069518716577538; 0.002346041055718491 0.0010893600416233059 0.018887945670628205 0.022058823529411777; … ; 0.014616755793226356 0.0007115749525616725 0.14551083591331263 2.3044982698961944; 0.00424418093983327 0.0010238429172510354 0.026712433257055485 0.0664620630861042], [2, 4, 2, 1, 1, 1, 1, 4, 2, 1  …  5, 5, 5, 5, 5, 5, 5, 5, 5, 5])

In [18]:
est_obj, est_mat, est_alloc = doit(out, collect(academic), collect(acd_sink), collect(gov_sink), collect(pri_sink), collect(tch_sink), institutions, NUMBER_OF_TYPES, NUMBER_OF_SINKS)

56487.04046492536 56038.4118153541 56002.0480589509 55984.212083460385 55967.34151313887 55948.457038532295 55924.591989433706 55913.35751988321 55890.62852865761 55870.263746183184 55864.35446295893 55840.49734140678 55822.74312203061 55798.61338449356 55781.5602502772 55759.109000889744 55750.640818890104 55702.00072139573 55660.17382125017 55645.154695236386 55625.62180436727 55607.97283358264 55603.58824471621 55582.60046421768 55559.27213882943 55545.15545464188 55521.66613007265 55502.84717288996 55485.81756907625 55482.03003489222 55460.825707993536 55441.405394125235 55429.58043887144 55405.84751552813 55144.010135545956 55136.63570453963 55134.42821812731 55125.05427567062 55111.87642299775 55090.777695470475 55078.21054276967 55062.25554077472 54956.638300197024 54952.90517957586 54935.07897904165 54912.17926359084 54839.22711531975 54674.7540379075 54653.98737115772 54631.281848454135 54608.407005564324 54592.51831605395 54082.94492216498 54062.2373118018 54042.55845264708 5

(41436.52328465017, [0.016198347107437994 0.13094098883572564 0.3069518716577538 0.0010019550342130928; 0.01722488038277513 0.18213296398891968 0.9520123839009291 0.0015916808149405836; … ; 0.002346041055718491 0.018887945670628205 0.022058823529411777 0.0010893600416233059; 0.00424418093983327 0.026712433257055485 0.0664620630861042 0.0010238429172510354], [4, 3, 4, 1, 1, 1, 1, 3, 4, 1  …  5, 5, 5, 5, 5, 5, 5, 5, 5, 5])

In [19]:
function bucket_extract(assign, A::Matrix{Int64}, num, numsink)
    T = zeros(Int64, num + numsink, num)
    for i in 1:size(A)[1], j in 1:size(A)[2]
         @inbounds T[(num + numsink) * (assign[j] - 1) + assign[i]] += A[i, j]  # T has num + numsink rows
    end
    return T
end

bucket_extract (generic function with 1 method)

The total placement counts between types (sums rather than means, since `bucket_extract` accumulates the raw counts) are:

In [20]:
est_mat = bucket_extract(est_alloc, out, NUMBER_OF_TYPES, NUMBER_OF_SINKS)

5×4 Matrix{Int64}:
  441   821   861   41
  108   263   615   15
   41    94   666    3
   96   178    93   67
 1208  1751  1949  438

The matrix is then ordered such that, for any two elements in symmetric positions across the diagonal, the element below the diagonal is greater than the element above the diagonal. For example, in the re-ordered matrix further down, the 615 in cell (2, 1) is greater than the 94 in the symmetric cell (1, 2), and the same holds for all the other symmetric pairs.

In [21]:
sum(est_mat)

9749

In [22]:
M = est_mat

5×4 Matrix{Int64}:
  441   821   861   41
  108   263   615   15
   41    94   666    3
   96   178    93   67
 1208  1751  1949  438

The next bit re-orders the tiers so that tier 1 occupies the first row and column, tier 2 the second, and so on. The tiers are ordered by the total placements of their graduates (the column sums of `M`), from highest to lowest.

In [23]:
# the new placements matrix
placement_rates = zeros(Int64, (TOTAL_DISTRIBUTIONS, NUMBER_OF_TYPES))
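# column sums of the estimated matrix: total placements of graduates from each type
ovector = sum(M, dims=1)
# column sums reordered highest to lowest
svector = sortslices(ovector,dims=2, rev=true) 
#println(svector)
#println(length(ovector))
# a mapping from a type's current index to the index (tier) it should have in the new matrix
# (assumes the column sums are distinct; tied types would map to the same tier)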
o = Dict{}()
for i in 1:length(ovector)
    for k in 1:length(svector)
        if ovector[1,i] == svector[1,k]
            o[i] = k
            break
        end
    end
end 
#println(o)
P = zeros(Int64, (TOTAL_DISTRIBUTIONS, NUMBER_OF_TYPES))
#shuffle the cells for the tier to tier placements
for i in 1:NUMBER_OF_TYPES
    for j in 1:NUMBER_OF_TYPES
        placement_rates[o[i],o[j]] = M[i,j]
    end
end
#shuffle the cells for tier to sink placements (separate since sink row indices don't change)
for i in NUMBER_OF_TYPES+1:NUMBER_OF_TYPES+NUMBER_OF_SINKS
    for j in 1:NUMBER_OF_TYPES
        placement_rates[i,o[j]] = M[i,j]
    end
end
placement_rates


            
    

5×4 Matrix{Int64}:
  666    94    41    3
  615   263   108   15
  861   821   441   41
   93   178    96   67
 1949  1751  1208  438
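The same re-ordering can be sketched with `sortperm`, which returns the permutation directly and gives tied totals distinct positions. The matrix below is the `M` printed earlier. `totals`, `order` and `reordered` are names introduced here for illustration.

```julia
# The estimated count matrix M from above (4 types plus 1 sink row).
M = [ 441   821   861   41;
      108   263   615   15;
       41    94   666    3;
       96   178    93   67;
     1208  1751  1949  438]
num = 4

totals = vec(sum(M, dims = 1))         # total placements of graduates of each type
order = sortperm(totals, rev = true)   # order[k] is the type that becomes tier k

# Permute rows and columns of the type-to-type block; sink rows only need
# their columns permuted.
reordered = vcat(M[order, order], M[num+1:end, order])

println(order)
println(reordered)
```

Note that `order` maps tier to type, which is the inverse of the dictionary `o` built in the cell above (type to tier).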

In [43]:
println(sum(placement_rates))

9749


In [44]:
o # this is the type to tier mapping: each key is a type, each value is its tier

Dict{Any, Any} with 4 entries:
  4 => 4
  2 => 2
  3 => 1
  1 => 3

To verify that this is ordered properly, we can check symmetric indices:

In [28]:
for i in 1:NUMBER_OF_TYPES, j in 1:NUMBER_OF_TYPES
    if i > j # not a diagonal and only check once
        if placement_rates[i, j] <= placement_rates[j, i]
            println("FAULT: hiring ", i, " with graduating ", j, ": downward rate: ", placement_rates[i, j], ", upward rate: ", placement_rates[j, i])
        end
    end
end
println("Check Complete")

Check Complete


If everything worked, this cell reports no faults. If it does report a fault, revisit the tier ordering produced by the re-ordering cell above.

James' code only matched an institution name with a tier. To save the matching to the database (and to use it for other purposes) we need to match tiers with oids. Two dictionaries were created above:
`oid_mapping` has oids as keys and the corresponding institution ids as values;
`institution_mapping` has institution ids as keys and the corresponding institution names as values.
The next function reverses this: it takes an institution name and returns the institution id, then the set of oids associated with that institution.
If no institution matches the name, the list of oids comes back empty.

In [29]:
function name_to_oid(institution_name::String, institutions::Dict, organizations::Dict)
    oids = String[]
    institution_id = nothing   # stays `nothing` if the name is not found
    # reverse lookup: find the institution id whose name matches
    for k in keys(institutions)
        if institutions[k] == institution_name
            institution_id = k
            break
        end
    end
    # collect every oid that belongs to that institution
    for k in keys(organizations)
        if organizations[k] == institution_id
            push!(oids, k)
        end
    end
    return institution_id, oids
end


name_to_oid (generic function with 1 method)
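Since `name_to_oid` scans both dictionaries on every call, an alternative is to invert them once and then look names up directly. The sketch below uses made-up ids and names. `name_to_id`, `oids_by_id` and `lookup` are hypothetical helpers, not part of the notebook.

```julia
# Toy versions of the two mappings (all values are made up).
toy_institution_mapping = Dict("10" => "Toy University", "11" => "Example College")  # id => name
toy_oid_mapping = Dict("100" => "10", "101" => "10", "102" => "11")                  # oid => id

# Invert once: name => institution id, and institution id => list of oids.
name_to_id = Dict(name => id for (id, name) in toy_institution_mapping)
oids_by_id = Dict{String, Vector{String}}()
for (oid, id) in toy_oid_mapping
    push!(get!(oids_by_id, id, String[]), oid)
end

# Unknown names fall through to an empty list of oids.
lookup(name) = get(oids_by_id, get(name_to_id, name, ""), String[])

println(sort(lookup("Toy University")))
println(lookup("Nowhere Institute"))
```

One caveat: if two institution ids share the same name, the inverted dictionary keeps only one of them, whereas `name_to_oid` returns whichever id it meets first.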

The lengths of the mappings. The total number of institutions should exceed the total number in the institution mapping by the number of academic institutions that hired at a level other than assistant professor (postdocs, lecturers, etc.). The mapping saved to the database should then contain the same number as `institution_mapping`.

In [34]:
println(length(institution_mapping))
print(length(institutions))

1451
2193

For MySQL:

In [35]:
#saving type allocation
function save_type(tier, oids, stm)
    for oid in oids
        DB.execute(stm, [tier, oid])
    end
end


save_type (generic function with 1 method)

In [41]:
o

Dict{Any, Any} with 4 entries:
  4 => 4
  2 => 2
  3 => 1
  1 => 3

In [37]:
#to record the type allocation
n = 0
B = Set{String}()

if SAVE_TO_DATABASE
    d = DB.connect(MySQL.Connection,cfg["host"], cfg["user"], cfg["password"], db =cfg["database"], port = parse(Int64,cfg["port"]))
    query = "drop table if exists type_distribution"
    DB.execute(d, query)
    query = "create table type_distribution (id int auto_increment primary key, type int, oid int,created timestamp default CURRENT_TIMESTAMP )"
    DB.execute(d, query)
    query = "insert into type_distribution set type=?,oid=?"
    stm = DB.prepare(d, query)
end
for j in 1:NUMBER_OF_TYPES
    println("Type ", j)
    println()
    i = 1
    for entry in est_alloc
        if entry == j
            push!(B,institutions[i])
            iid, oids = name_to_oid(institutions[i],institution_mapping, oid_mapping)
            println(institutions[i]," ",oids)
            if length(oids) == 0
                println("*****Error in data****")
                n += 1
            end
            if SAVE_TO_DATABASE
                save_type(j, oids, stm)
            end
        end
        i += 1
    end
    println(n," errors counted")
    println()
end
if SAVE_TO_DATABASE
    DB.close(d)
end


Type 1

University of Oslo ["3810", "675"]
Rice University ["2099", "110"]
University of Massachusetts, Amherst ["1635", "2250", "1642", "330"]
Eastern Kentucky University ["65"]
Peking University ["1830", "767", "3008", "1273", "2904"]
University of Lausanne (Université de Lausanne) ["376", "3355"]
Georgetown University ["6269", "1419", "99", "4600", "1184"]
Universidad de los Andes ["1033", "533"]
Vrije Universiteit Amsterdam ["898", "72", "250", "1369", "2252"]
Hong Kong Baptist University ["1080"]
Simon Fraser University ["332", "1310"]
Colorado State University ["4997", "1788", "3130"]
University of Leicester ["477"]
London Business School ["1754", "660", "3203", "3276"]
University of Amsterdam (Universiteit van Amsterdam) ["5211", "1343", "2389"]
Wuhan University ["1183", "4820"]
George Mason University ["2756", "1770", "1637"]
Drexel University ["2379", "634"]
University of Miami ["2546", "3527", "750"]
Concordia University ["160", "7009"]
University of Kentucky ["1535", "3924",

In [38]:
n = 0
B = Set{String}()
if SAVE_TO_DATABASE
    d = DB.connect(MySQL.Connection,cfg["host"], cfg["user"], cfg["password"], db =cfg["database"], port = parse(Int64,cfg["port"]))
    query = "insert into type_distribution set type=?,oid=?"
    stm = DB.prepare(d, query)
end
for j in NUMBER_OF_TYPES+1:NUMBER_OF_SINKS+NUMBER_OF_TYPES
    println("SINK ", j - NUMBER_OF_TYPES)
    println()
    i = 1
    for entry in est_alloc
        if entry == j 
            if occursin("(private sector)", institutions[i])
                ch = 17
            elseif (!occursin("(private sector)", institutions[i]) && !occursin("(public sector)", institutions[i]) 
                    && !occursin("(academic sink)", institutions[i]))
                ch = 0
            else ch = 16
            end
            if !(institutions[i][1:prevind(institutions[i], end,ch)] in B)
                iid, oids = name_to_oid(institutions[i][1:prevind(institutions[i],end,ch)],institution_mapping, oid_mapping)
                println(institutions[i]," ", oids)
                if length(oids) == 0
                    println("****** error in data*****")
                    n +=1
                end
                if SAVE_TO_DATABASE
                    save_type(j, oids, stm)
                end
                push!(B, institutions[i][1:prevind(institutions[i], end,ch)])
            end
        end
        i += 1
    end
    println()
    println(n," errors counted")
end
if SAVE_TO_DATABASE
    DB.close(d)   # d only exists when the connection was opened above
end

SINK 1

CNRS (Centre national de la recherche scientifique) (academic sink) ["270"]
De Nederlandsche Bank (academic sink) ["3357"]
Institute of Economics (Ekonomski institut), Zagreb (academic sink) ["3474"]
University of Bremen (Universität Bremen) (academic sink) ["7847"]
Texas A&M University, College Station (academic sink) ["1990", "596", "1823", "3003"]
Universitas Indonesia (academic sink) ["1544"]
University of Oxford (academic sink) ["1050", "1398", "1233", "612", "693", "1832", "1504", "5931", "217"]
Universitat Pompeu Fabra (academic sink) ["453", "998"]
City University of London (academic sink) ["883", "1094"]
Coventry University (academic sink) ["2996"]
Centre d'Etudes Prospectives et d'Informations Internationales (CEPII) (academic sink) ["1696"]
University of Sussex (academic sink) ["430", "2549", "831"]
Johns Hopkins University (academic sink) ["1043", "49", "1345", "2864", "2340", "2624", "1328", "537"]
State University of New York at Buffalo (academic sink) ["1756", "4

In [39]:
println("acd_sink ",length(acd_sink))
println("teaching ",length(tch_sink))
println("academic ",length(academic))
println(" gov ", length(gov_sink))
println("private ", length(pri_sink))
println("institutions ", length(institutions))

acd_sink 492
teaching 983
academic 468
 gov 114
private 136
institutions 2193


## References

Karrer, B., and M. E. J. Newman (2011): "Stochastic blockmodels and community structure in networks," Physical Review E, 83(1).

Peixoto, T. (2014): "Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models," Physical Review E, 89(1).