# Assortative Matching With Coarse Types: Chi-Square Estimator of Deep Parameters given Stochastic Classification of Economics Departments by Placement Rates

James Yu, 30 November 2021; modified by Mike in December

## VERSION: $n$, $m$ fixed, ratios endogenous

In [1]:
INSTALL_PACKAGES = false   # change this to true if you are running this notebook for the first time
YEAR_INTERVAL = 2003:2021  # change this to select the years of data to include in the estimation
NUMBER_OF_TYPES = 4        # change this to select the number of types to classify academic departments into
NUMBER_OF_SINKS = 1        # change this to 4 to use individual sink types
SAVE_TO_DATADASE = false   # change to true to save the type allocation to the database
TOTAL_DISTRIBUTIONS = NUMBER_OF_TYPES + NUMBER_OF_SINKS;

In [2]:
import Pkg
# install the required packages on first run (HTTP and FLoops are also used in this notebook)
if INSTALL_PACKAGES
    for package in ["BlackBoxOptim", "Distributions", "ForwardDiff", "JSON", "Optim", "Quadrature",
            "StatsPlots", "DotEnv", "MySQL", "DBInterface", "Tables", "HTTP", "FLoops"]
        Pkg.add(package)
    end
end
using JSON, HTTP, Distributions, DotEnv, MySQL, DBInterface, Tables, Random

In [3]:
# download a URL and parse its JSON body
function get_data(url)
    resp = HTTP.get(url)
    return JSON.parse(String(resp.body))
end

get_data (generic function with 1 method)

In [4]:
# print every outcome whose destination (to_from = true) or graduating department
# (to_from = false, the default) matches `name`; print "Not found" if none does
function find_inst(name::String, outcomes, to_from::Bool = false)
    check = false
    key = to_from ? "to_name" : "from_institution_name"
    for outcome in outcomes
        if outcome[key] == name
            println(outcome)
            check = true
        end
    end
    if !check
        println("Not found")
    end
end

find_inst (generic function with 2 methods)

This is the type allocation notebook, modified by Mike.

This cell downloads all the placement data from `https://support.econjobmarket.org/api/placement_data`.

In [5]:
url = "https://support.econjobmarket.org/api/placement_data"
placements = get_data(url);
println(length(placements))
println(typeof(placements))

14318
Vector{Any}


In [6]:
placements = unique(placements);
println(length(placements))

14318


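Note for future debugging: there are apparently 16 duplicate placements, but `unique` removes none of them (the count is 14318 before and after), so the duplicates presumably differ in at least one field.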

In [7]:
const DB = DBInterface
cfg = DotEnv.config();

Find all placements to assistant professor positions and put them in a set called `academic_builder`; put all other placements in a set called `sink_builder`.

In [8]:
oid_mapping = Dict()            # oid -> institution id
institution_mapping = Dict()    # institution id -> institution name
academic = Set()                # names of graduating departments
academic_to = Set()             # names of all destinations
academic_builder = Set()        # assistant-professor placements
sink_builder = Set()            # all other placements
for placement in placements
    # keep only placements whose year (stored as a string) lies in YEAR_INTERVAL
    if parse(Int64, placement["year"]) in YEAR_INTERVAL
        push!(academic, placement["from_institution_name"])
        push!(academic_to, placement["to_name"])
        oid_mapping[placement["from_oid"]] = placement["from_institution_id"]
        oid_mapping[placement["to_oid"]] = placement["to_institution_id"]
        institution_mapping[placement["from_institution_id"]] = placement["from_institution_name"]
        institution_mapping[placement["to_institution_id"]] = placement["to_name"]
        if placement["position_name"] == "Assistant Professor"
            push!(academic_builder, placement)
        else
            push!(sink_builder, placement)
        end
    end
end

println(length(academic_builder), " total assistant professor outcomes")
println(length(sink_builder), " other outcomes")
println(length(academic_builder)+length(sink_builder))


6951 total assistant professor outcomes
7048 other outcomes
13999


This total (13999) is smaller than the number of downloaded placements (14318) because placements outside `YEAR_INTERVAL` are excluded.
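The year filter and the split can be illustrated on a few hypothetical records. This is a sketch only. The `toy_` names and all values are made up, and the names were chosen so they do not overwrite the notebook's variables. The records carry only the two keys the split reads.

```julia
# Hypothetical toy records with only the keys the split uses; every value is made up.
toy_records = [
    Dict("year" => "2010", "position_name" => "Assistant Professor"),
    Dict("year" => "2015", "position_name" => "Economist"),
    Dict("year" => "1999", "position_name" => "Assistant Professor"),
]
toy_years = 2003:2021   # plays the role of YEAR_INTERVAL

# the year is stored as a string, so parse it before testing membership
toy_in_window = filter(r -> parse(Int, r["year"]) in toy_years, toy_records)

# assistant-professor placements vs. everything else
toy_academic = filter(r -> r["position_name"] == "Assistant Professor", toy_in_window)
toy_sink = filter(r -> r["position_name"] != "Assistant Professor", toy_in_window)

println(length(toy_in_window), " in window: ", length(toy_academic), " academic, ", length(toy_sink), " sink")
# 2 in window: 1 academic, 1 sink
```

The 1999 record is dropped by the year filter before the split, which is why the notebook's two sets together hold fewer placements than were downloaded.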

The next cell identifies teaching universities. These are destinations that never appear as a graduating department in the data, i.e. institutions that never placed a Ph.D. student:

In [9]:
tch_sink = Set{}() # sink of teaching universities that do not graduate PhDs
for key in academic_to
    if !(key in academic)
        push!(tch_sink, key)
    end
end

println(length(academic))
println(length(academic_to))
println(length(tch_sink))

692
1758
1168


These counts are of institutions, not placements. The logic relies on the fact that Julia `Dict`s and `Set`s cannot contain duplicate keys or elements.
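The loop above computes a set difference, so Julia's built-in `setdiff` gives the same result in a single call. Here is a minimal sketch on hypothetical institution names. The `toy_` names are illustrative only.

```julia
# Hypothetical institution names (not from the data).
toy_graduating = Set(["U1", "U2", "U3"])                     # placed at least one Ph.D.
toy_hiring = Set(["U1", "U2", "College A", "College B"])     # every destination

# destinations that never appear as a graduating department: the teaching sink
toy_tch_sink = setdiff(toy_hiring, toy_graduating)
println(sort(collect(toy_tch_sink)))   # ["College A", "College B"]
```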

The next cell sorts the remaining sink destinations into three categories by recruiter type: private sector, public sector, and academic sink. Teaching universities were handled above.

In [10]:
acd_sink = Set{}()
gov_sink = Set{}()
pri_sink = Set{}()

for outcome in sink_builder
    if outcome["recruiter_type"] in ["6","7"]
        # private sector: for and not for profit
        push!(pri_sink, string(outcome["to_name"], " (private sector)"))
    elseif outcome["recruiter_type"] == "5"
        # government institution
        push!(gov_sink, string(outcome["to_name"], " (public sector)"))
    else
        # everything else including terminal academic positions
        push!(acd_sink, string(outcome["to_name"], " (academic sink)"))
    end
end

println(length(acd_sink))
println(length(gov_sink))
println(length(pri_sink))

863
144
187
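This notebook writes out the recruiter-type rule twice: once here and again when the placement matrix is filled. Codes "6" and "7" mean private sector, "5" means government, and anything else is an academic sink. A single helper would keep the two copies consistent. `sink_label` below is a hypothetical name and is not part of the original notebook.

```julia
# Hypothetical helper: map a destination name and recruiter_type code to its sink label.
# "6"/"7" -> private sector, "5" -> public sector, anything else -> academic sink.
function sink_label(to_name, recruiter_type)
    if recruiter_type in ("6", "7")
        return string(to_name, " (private sector)")
    elseif recruiter_type == "5"
        return string(to_name, " (public sector)")
    else
        return string(to_name, " (academic sink)")
    end
end

println(sink_label("Central Bank X", "5"))   # Central Bank X (public sector)
```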


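The departments are now sorted into five sets: graduating departments, plus the academic, public-sector, private-sector, and teaching sinks. From these we can construct a matrix of placement counts. Rows index hiring institutions and sinks, and columns index graduating departments: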

In [11]:
institutions = vcat(collect(academic), collect(acd_sink), collect(gov_sink), collect(pri_sink), collect(tch_sink));


In [12]:

# name -> position in `institutions`; graduating departments come first, so a department's
# row index equals its column index
inst_index = Dict(name => k for (k, name) in enumerate(institutions))
out = zeros(Int64, length(institutions), length(academic))
i = 0
for outcome in academic_builder
    # assistant-professor placement: row = hiring department, column = graduating department
    i += 1
    out[inst_index[outcome["to_name"]], inst_index[outcome["from_institution_name"]]] += 1
end
for outcome in sink_builder
    # any other placement: row = the labelled sink, built exactly as in the sink-sorting cell
    i += 1
    if outcome["recruiter_type"] in ["6", "7"]
        keycheck = string(outcome["to_name"], " (private sector)")
    elseif outcome["recruiter_type"] == "5"
        keycheck = string(outcome["to_name"], " (public sector)")
    else
        keycheck = string(outcome["to_name"], " (academic sink)")
    end
    out[inst_index[keycheck], inst_index[outcome["from_institution_name"]]] += 1
end
println("Total number of outcomes: ", i)
println(length(gov_sink))
println(length(pri_sink))


Total number of outcomes: 13999
144
187


In [13]:
sum(out)

13999
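The matrix layout can be seen on a hypothetical toy example. Rows are destinations (graduating departments first, then sinks), columns are graduating departments, and each entry counts placements. The names and counts are made up.

```julia
# Hypothetical placements as (graduating department, destination) pairs; names are made up.
toy_pairs = [("U1", "U2"), ("U1", "Bank (private sector)"), ("U2", "U1"), ("U2", "U1")]
toy_depts = ["U1", "U2"]                                   # graduating departments
toy_rows = vcat(toy_depts, ["Bank (private sector)"])      # departments first, then sinks
toy_row_of = Dict(name => k for (k, name) in enumerate(toy_rows))

# rows: destinations; columns: graduating departments (same index as their row)
toy_out = zeros(Int, length(toy_rows), length(toy_depts))
for (from, to) in toy_pairs
    toy_out[toy_row_of[to], toy_row_of[from]] += 1
end
println(toy_out)   # [0 2; 1 0; 1 0]
```

Each placement adds exactly one to a single entry, so the entries sum to the number of placements. That is the check `sum(out)` performs above.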

We now turn to the estimator. We assume that the number of placements between any pair of departments is drawn from a distribution common to the "type" of the hiring department and the "type" of the graduating department. Here this distribution is assumed to be Poisson, in line with the classical stochastic block models used for similar estimations in Karrer and Newman (2011) and Peixoto (2014).

A single round of estimation takes as given an assignment of departments to types and the placement outcomes. For each pair of types $(t, t^\prime)$, it computes the mean number of graduates of a single type $t$ department hired by a single type $t^\prime$ department. It then evaluates the probability of each observed count under the Poisson distribution with its block's mean. Summing the logarithms of these probabilities gives a log-likelihood, which can be maximized over type assignments.
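In symbols, using notation introduced here for exposition:

- $A_{ij}$ is the number of graduates of department $j$ placed at institution or sink $i$.
- $\tau(i)$ is the type of row $i$. Sinks have fixed types, and column $j$ has the type of department $j$.
- $n_s$ and $m_t$ are the numbers of rows of type $s$ and columns of type $t$.

The block means and the log-likelihood are then

$$
\hat\lambda_{st} = \frac{1}{n_s\, m_t} \sum_{i:\,\tau(i)=s} \; \sum_{j:\,\tau(j)=t} A_{ij},
\qquad
\log L(\tau) = \sum_{i,j} \left[ A_{ij} \log \hat\lambda_{\tau(i)\tau(j)} - \hat\lambda_{\tau(i)\tau(j)} - \log A_{ij}! \right],
$$

with the convention $0 \log 0 = 0$ for blocks whose mean is zero. The function `bucket_estimate` returns $-\log L(\tau)$ together with the matrix of $\hat\lambda_{st}$. The search then looks for the assignment $\tau$ that makes $-\log L(\tau)$ as small as possible.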

In [14]:
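# Negative Poisson log-likelihood of the placement matrix A given the type assignment `assign`.
# Rows of A are hiring institutions and sinks, columns are graduating departments; T[s, t] is
# the estimated mean number of placements from one type-t department to one type-s row.
function bucket_estimate(assign::Array{Int64}, A::Matrix{Int64}, num, numsink)
    nrows = num + numsink                     # row types: academic types plus sinks
    b = zeros(Int64, size(A, 1), size(A, 2))  # linear index into T of each cell's block
    T = zeros(nrows, num)                     # block means
    count = zeros(nrows, num)                 # cells seen so far in each block
    for i in 1:size(A, 1), j in 1:size(A, 2)
         # column-major linear index of T[assign[i], assign[j]]; the stride is the row count of T
         @inbounds val = nrows * (assign[j] - 1) + assign[i]
         @inbounds b[i, j] = val
         # update the running mean of the block
         @inbounds T[val] = ((T[val] * count[val]) + A[i, j]) / (count[val] + 1)
         @inbounds count[val] += 1
    end
    L = 0.0
    @simd for i in eachindex(A)
        @inbounds L += logpdf(Poisson(T[b[i]]), A[i])
    end
    return -L, T
end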

bucket_estimate (generic function with 1 method)
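The following Base-only sketch shows what `bucket_estimate` computes. It uses a hypothetical 4-by-3 count matrix with two academic types and one sink. The Poisson log pmf is written out by hand rather than calling `Distributions.logpdf`. The sketch also checks that the linear index `(num + numsink) * (t - 1) + s` is the column-major position of `T[s, t]`. All `toy_` names and values are illustrative.

```julia
# Hypothetical 4x3 count matrix: rows 1-3 are graduating departments D1-D3, row 4 is the
# single sink; columns are D1-D3, so row i and column i are the same department (i <= 3).
toy_A = [0 1 0;
         1 0 0;
         2 2 0;
         1 3 2]
toy_num, toy_numsink = 2, 1
toy_nrows = toy_num + toy_numsink
toy_assign = [1, 1, 2, toy_nrows]    # D1, D2 -> type 1; D3 -> type 2; sink -> type 3

# block sums and cell counts, indexed as [row type, column type]
toy_sums = zeros(toy_nrows, toy_num)
toy_cells = zeros(toy_nrows, toy_num)
for i in axes(toy_A, 1), j in axes(toy_A, 2)
    s, t = toy_assign[i], toy_assign[j]
    toy_sums[s, t] += toy_A[i, j]
    toy_cells[s, t] += 1
end
toy_λ = toy_sums ./ max.(toy_cells, 1)   # block means (an empty block would get 0)

# the linear index is the column-major position of [s, t]: its stride is num + numsink
@assert all(LinearIndices(toy_λ)[s, t] == toy_nrows * (t - 1) + s
            for s in 1:toy_nrows, t in 1:toy_num)

# Poisson log pmf written out: k log λ - λ - log k!, with the convention 0 log 0 = 0
toy_poisson_logpdf(λ, k) = (k == 0 ? 0.0 : k * log(λ)) - λ - sum(log, 1:k; init = 0.0)

toy_nll = -sum(toy_poisson_logpdf(toy_λ[toy_assign[i], toy_assign[j]], toy_A[i, j])
               for i in axes(toy_A, 1), j in axes(toy_A, 2))
println(toy_λ)     # [0.5 0.0; 2.0 0.0; 2.0 2.0]
println(toy_nll)
```

For this matrix the negative log-likelihood is $12 - \log(16/3) \approx 10.326$. Blocks whose mean is zero contain only zeros and contribute nothing to it.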

In [15]:
using FLoops

In [16]:
function bucket_estimate_floop(assign::Array{Int64}, A::Matrix{Int64}, num, numsink,b::Matrix{Int64})
    T = zeros(num + numsink, num)
    count = zeros(num + numsink, num)
    for i in 1:size(A)[1], j in 1:size(A)[2]
         @inbounds val = (num + numsink) * (assign[j] - 1) + assign[i]  # column-major index of T[assign[i], assign[j]]
         @inbounds b[i, j] = val
         @inbounds T[val] = ((T[val] * count[val]) + A[i, j]) / (count[val] + 1)
         @inbounds count[val] += 1
    end
    L = 0.0
    @floop for i in eachindex(A)
        @inbounds @reduce L += logpdf(Poisson(T[b[i]]), A[i])
    end
    return -L, T
end

bucket_estimate_floop (generic function with 1 method)

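Finally, we compute the maximum-likelihood Poisson means by stochastic local search. At each step, one randomly chosen graduating department is moved to a different, randomly chosen type. The move is kept only if it lowers the negative log-likelihood. The search stops after 100,000 consecutive proposals fail to improve it, so the result is a local optimum that may depend on the random seed.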
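Here is a scaled-down sketch of this search on hypothetical data, written in Base Julia plus the `Random` standard library. It follows the same acceptance rule as `doit` below: one random department moves to a different random type, and the move is kept only if the objective falls. It differs from `doit` in two ways: it uses a small `patience` in place of 100,000, and it fixes a single sink. `toy_objective`, `toy_hill_climb`, and `patience` are illustrative names.

```julia
using Random

# Negative Poisson log-likelihood under `assign` (one type per row of A). Columns of A are the
# graduating departments, which are also rows 1:size(A, 2).
function toy_objective(assign, A, num, numsink)
    sums = zeros(num + numsink, num)
    cells = zeros(num + numsink, num)
    for i in axes(A, 1), j in axes(A, 2)
        sums[assign[i], assign[j]] += A[i, j]
        cells[assign[i], assign[j]] += 1
    end
    λ = sums ./ max.(cells, 1)
    nll = 0.0
    for i in axes(A, 1), j in axes(A, 2)
        l, k = λ[assign[i], assign[j]], A[i, j]
        nll -= (k == 0 ? 0.0 : k * log(l)) - l - sum(log, 1:k; init = 0.0)
    end
    return nll
end

# Greedy stochastic search: move one random department to a different random type, keep the
# move only if the objective falls, stop after `patience` consecutive rejected proposals.
function toy_hill_climb(A, num; patience = 500, rng = MersenneTwister(1234))
    ndept = size(A, 2)
    # departments start in type 1; the sink rows stay fixed in type num + 1
    assign = vcat(ones(Int, ndept), fill(num + 1, size(A, 1) - ndept))
    best = toy_objective(assign, A, num, 1)
    fails = 0
    while fails < patience
        cand = copy(assign)
        k = rand(rng, 1:ndept)
        cand[k] = rand(rng, setdiff(1:num, [cand[k]]))
        obj = toy_objective(cand, A, num, 1)
        if obj < best
            assign, best, fails = cand, obj, 0
        else
            fails += 1
        end
    end
    return best, assign
end

# Hypothetical data: D1 and D2 each hire 3 graduates from each of D3 and D4, D3 and D4 each
# hire 1 from each of D1 and D2; row 5 is the sink.
toy_P = [0 0 3 3;
         0 0 3 3;
         1 1 0 0;
         1 1 0 0;
         2 2 2 2]
toy_start = toy_objective(vcat(ones(Int, 4), 3), toy_P, 2, 1)
toy_best, toy_types = toy_hill_climb(toy_P, 2)
println(toy_start, " -> ", toy_best, "  types: ", toy_types)
```

Only improving moves are accepted, so the result is a local optimum. Rerunning with a different `rng` seed is a cheap way to check whether it is stable.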

### To skip re-allocation when you already have an allocation, uncomment the `JSON.parsefile` line at the top of the next cell and do not run the re-allocation code that follows it.

In [17]:
Random.seed!(1234)

TaskLocalRNG()

In [None]:
#est_alloc = JSON.parsefile("type_allocation.json");
function doit(sample, academic_institutions, asink, gsink, psink, tsink, all_institutions, num, numsink)
    # some initial states
    current_allocation = Array{Int64}(undef, length(all_institutions))
    cur_objective = Inf
    best_mat = nothing
    cursor = 1
    for inst in academic_institutions
        current_allocation[cursor] = 1
        cursor += 1
    end
    # the sinks must stay in fixed types
    # this was built to support more sinks, but by default we only use one
    # change the "current_allocation[cursor] = ..." lines to group sinks together
    for key in asink # other academic
        current_allocation[cursor] = num + min(1, numsink)
        cursor += 1
    end
    for key in gsink # public sector
        current_allocation[cursor] = num + min(2, numsink)
        cursor += 1
    end
    for key in psink # private sector
        current_allocation[cursor] = num + min(3, numsink)
        cursor += 1
    end
    for key in tsink # assistant professor at teaching universities
        current_allocation[cursor] = num + min(4, numsink)
        cursor += 1
    end
    blankcount = 0

    # BEGIN MONTE CARLO REALLOCATION ROUTINE
    while true
        # attempt to reallocate academic institutions to a random spot
        temp_allocation = copy(current_allocation)
        k = rand(1:length(academic_institutions))
        @inbounds temp_allocation[k] = rand(delete!(Set(1:num), temp_allocation[k]))
        # check if the new assignment is better
        test_objective, estimated_means = bucket_estimate(temp_allocation, sample, num, numsink)
        if test_objective < cur_objective
            print(test_objective, " ")
            blankcount = 0
            cur_objective = test_objective
            best_mat = estimated_means
            current_allocation = temp_allocation
        else
            blankcount += 1
            if blankcount % 1000 == 0
                print(blankcount, " ")
            end
        end
        if blankcount == 100000
            return cur_objective, best_mat, current_allocation
        end
    end
end
@time est_obj, est_mat, est_alloc = doit(out, collect(academic), collect(acd_sink), collect(gov_sink), collect(pri_sink),
    collect(tch_sink), institutions, NUMBER_OF_TYPES, NUMBER_OF_SINKS)

### Output of the cell above (the objective is printed after each accepted re-allocation)

7167.71403596654 87149.61885406218 87148.22380765609 87137.3598778919 87121.08450287407 87107.49513358274 87089.95521100592 87067.22995523066 87052.42110819285 87033.77503675935 87014.16041146395 86990.95315372797 86981.13212814093 86958.77089993196 86944.27284229794 86935.87397400678 86917.47202784251 86916.74157582349 86897.942167008 86882.51817746961 86873.79466863033 86873.2255956182 86854.54367711033 86843.87163316063 86827.37846181304 86807.8140260719 86802.78642604398 86786.98998096393 86775.48052296237 86756.62538460981 86737.6565426398 86723.53625220232 86719.0358606693 86711.3272902198 86693.42959895938 86676.7811185076 86654.84144016127 86636.91262302686 86624.56779982938 86603.95608740965 86581.21764476171 86562.2123948964 86542.25617678998 86523.72551719293 86507.42913447478 86488.53174148615 86471.28009163613 86470.28453385635 86453.39969779807 86449.97760778978 86446.62987768483 86427.57282611691 86405.44950754986 86383.13600645104 86360.05754275116 86344.10307452256 86320.55333416148 86319.10074547828 86300.78081766663 86278.03511877089 86255.91857782418 86232.92790463418 86231.41811084808 86223.49173332578 86204.26294299567 86185.96080498993 86164.47679715225 86144.34686904722 86120.5195254958 86100.11652467874 86077.95143727778 86062.62772052718 86058.82515073133 86058.11223829188 86036.0253509285 86012.50491928413 85988.82669877828 85973.12024838332 85957.39600828306 85936.45462461471 85930.01909373583 85911.35882489545 85888.19301704575 85868.69325092186 85847.9718765884 85823.89010478681 85799.13486552275 85775.45404618033 85774.23200159606 85763.123667893 85753.87042197675 85728.79015925525 85706.66020330759 85682.66871688623 85677.16397827596 85653.06465176311 85629.82027499902 85628.86781360637 85604.5548980143 85582.93637571349 85563.00914124158 85543.80168923474 85519.21735303767 85508.70690338955 85507.58680781185 85483.77008354448 85466.43973471073 85442.49995324542 85426.42067673136 85408.27310740697 85402.1175365539 85391.02973982232 
85379.92963322099 85361.0062355132 85355.04860045738 85332.15835034962 85312.7380270489 85297.7488903285 85292.35422794214 85267.16796851566 85242.82859387898 85217.46209313608 85203.39225107034 85177.12545998767 85176.7870912172 85156.20926661488 85133.06242448997 85126.2109300678 85103.58580122956 85077.77158887523 85053.09758626482 85027.11014513765 85004.99831114085 84997.28306014171 84970.46192089458 84946.74033634509 84921.67592050991 84911.852391156 84891.38958791652 84871.79085001812 84852.88622584227 84829.36452162777 84811.96706673999 84787.38540602036 84761.52049472158 84758.22452163088 84753.03892735996 84727.28060026935 84700.87423693291 84695.79715730867 84669.8568581013 84647.04158779424 84629.04481181609 84606.260707402 84605.5083138072 84578.53123525668 84566.28418668175 84539.79639717996 84519.85097051282 84491.65119311804 84467.24300915224 84444.87653301487 84416.42145004222 84401.82134325331 84374.3272006225 84347.7067807016 84330.30999181622 84303.99933220081 84276.41943029783 84248.93174884429 84223.03062218022 84200.10852094131 84199.4707165633 84174.82580360076 84173.85531137718 84146.3348395354 84128.73011486737 84123.76539719992 84102.6144558432 84085.03092905869 84057.15893744535 84033.74954606511 84005.67353005311 83980.01263594512 83955.39915794235 83932.24195651895 83910.3276297308 83900.51645803779 83879.43364683898 83852.62958642124 83823.95690436049 83798.7366519996 83771.00789992248 83749.87753024005 83720.62541438876 83719.90185445028 83691.77354934927 83665.33009794973 83643.39905490288 83624.9293861102 83609.3241454376 83608.69198657945 83579.28106415551 83578.42500098677 83564.12024249756 83553.62098350374 83532.93766053191 83524.43371585393 83511.77884380914 83494.07971107976 83493.75252709181 83471.52447784165 83458.22906458183 83431.23897913707 83415.70030283011 83409.71827791125 83383.2375429431 83382.55292438657 83376.82434373863 83347.36072957783 83316.85796515188 83290.39132331235 83266.7396571635 83255.04953167272 
83225.49660109801 83209.89735265508 83184.30343608938 83182.30840430922 83150.72422973378 83127.39879889667 83101.57748875639 83072.29854797687 83071.17031701037 83056.86109065848 83035.58235670268 83035.11461224114 83034.51398385795 83033.832588197 83005.7646873461 82997.86374633522 82969.74764808267 82941.48714144852 82917.20321084428 82896.76213213471 82896.15362504561 82867.76382733404 82867.20116048784 82841.19151394516 82840.94589880004 82808.2630157521 82807.301837701 82806.89315518684 82775.47233259236 82746.37601054572 82745.10333986602 82712.08814033931 82688.28319243946 82687.25200089884 82659.57616752625 82656.24336731424 82623.86622229694 82591.36284751199 82590.81602313202 82572.01127718395 82551.51271843073 82528.09196841644 82527.47207819043 82497.54685254139 82466.73144094515 82434.36369381136 82409.50421752616 82409.06585916114 82375.86767098158 82341.75170408621 82312.48383688636 82283.27144504001 82282.36548407833 82281.8804017306 82250.47520026824 82219.1510733262 82201.95671856105 82168.69553770073 82167.9143438016 82133.22208519849 82101.37800648573 82099.29750870114 82096.38327407966 82069.45642113202 82035.37665589554 82001.64657078864 81969.7201508475 81939.87697790703 81906.30504121268 81904.85469642388 81874.601670498 81842.2022399459 81808.61822740256 81801.32984946352 81772.0120540891 81764.12847013508 81746.75758613292 81745.46531936753 81712.62325564628 81710.50420122164 81675.68362524659 81643.76450744335 81643.16836093797 81641.4997229855 81606.00655266586 81605.3203411238 81580.71163877835 81580.07429503724 81580.05352380521 81578.85953245158 81546.6048847676 81511.51712143958 81479.10129012981 81476.60648663204 81470.12431944847 81469.51131748322 81467.61146855965 81437.97525018107 81406.31914520595 81369.98717895494 81368.78489131149 81346.84352819093 81346.47162020933 81328.40869048536 81292.64334144587 81263.96771944246 81228.48779342615 81217.85436095535 81181.7226812422 81146.00323659834 81145.02685765873 81116.97645690675 
81093.03472376234 81056.96185854456 81020.84007140945 80989.06207995834 80953.60657409397 80952.05422999628 80950.3897577763 80949.56229674505 80949.45165746877 80939.89810809433 80935.46065331888 80921.97515179 80921.14291778074 80918.92024233291 80889.45818915898 80854.22292719544 80821.73244768009 80819.39812458506 80794.34539783251 80767.54879298926 80747.42251593166 80747.26225211669 80724.23608562564 80706.32965465085 80705.9341871948 80696.09940059166 80656.20762278003 80626.68610232887 80606.37436496346 80594.00826286754 80553.6946108771 80520.38877896205 80519.53002712085 80484.62581899302 80482.69474870063 80452.02930044908 80412.46543048644 80375.3891822471 80366.38055797822 80365.00120815258 80329.65282138411 80328.53275853884 80324.00829698572 80300.65513889903 80260.43694232195 80220.44366009349 80185.20244367947 80184.7545193642 80184.57117409572 80163.47853408167 80126.03047867675 80122.95068231008 80083.19565088637 80048.38806134253 80043.11854028798 80005.27628399542 79972.70249029304 79970.43994076803 79968.5030177153 79930.11929659468 79899.28872997171 79873.12011695432 79869.56713304365 79868.69797718283 79827.98666907047 79794.58321496187 79792.96035767556 79791.11341253236 79747.83003110663 79746.71108752489 79734.33636106378 79713.81690170303 79706.9250508427 79666.84659240961 79636.90598703823 79635.66218802157 79629.79945336762 79614.60138949259 79578.8656001348 79535.1459625492 79533.37580454253 79491.36086545375 79451.06658117793 79450.31276429082 79449.6591425425 79448.72210534106 79448.03844018147 79424.89716084914 79382.92742985295 79354.26235711276 79354.05392161744 79352.45015215286 79349.96712999007 79318.81794060492 79276.79456420512 79274.4649415061 79272.86687101127 79227.33916494783 79226.19527672025 79180.48974597106 79145.33799466377 79103.9038405545 79085.93642391029 79040.78422192624 79001.48471874117 78998.6876173976 78997.83875833756 78996.46601784334 78996.3006623107 78995.91515296094 78954.89154950873 78954.1227626199 
78914.39121190405 78905.30790033717 78905.23578573586 78877.5376803856 78835.20944886185 78797.95959627576 78754.6955611955 78711.89119950822 78707.17571589146 78660.30424985505 78658.25108053636 78656.1074809516 78656.0725353308 78628.8420644258 78627.71378778489 78627.27636424151 78583.67731825782 78550.0005792197 78504.67405089871 78489.79815222838 78489.5963830036 78448.58086460718 78402.59495008316 78389.7975118163 78347.14820452796 78322.50605690268 78322.01310231867 78273.30581384379 78273.1765221043 78269.14264634863 78222.36626706469 78219.53973172937 78176.73417030658 78176.47326929304 78169.02965730829 78152.68756243541 78152.4996763453 78151.74618718498 78145.11465042902 78110.33898810555 78072.31126140502 78035.83758056765 78034.02205302134 78033.74625544777 78031.79033916564 77987.56611871826 77945.37760793102 77943.31411505608 77891.9676374851 77886.82857598913 77837.68716788029 77785.91676166252 77745.57498301449 77739.2976925099 77689.563686873 77688.78113527375 77687.42782265817 77644.47205559682 77641.89167254575 77641.33447808924 77641.22160041668 77598.69004508332 77597.08379754174 77597.08356554018 77577.07861781714 77532.90007682425 77532.8443103482 77529.96973737994 77525.55981076401 77479.53939128465 77446.04195171037 77434.40198950471 77388.71789787618 77387.8902014261 77387.03260387201 77386.08858331632 77385.44231604233 77340.74450695362 77298.21188876092 77243.53668329945 77228.78710220329 77181.2225357782 77180.25569562917 77178.5330615377 77178.12410783661 77175.73971012117 77123.64106624249 77112.47162969763 77102.64379981757 77100.0893755922 77096.0737433437 77040.07092023564 77039.0622657533 77037.9117094408 77007.72734481299 76956.32232171745 76949.49544609184 76897.8248601771 76882.56036953392 76882.09991625296 76880.230366896 76876.49916571895 76876.05694207504 76875.58310820113 76857.09180989079 76854.94844233656 76840.84268374812 76839.35277402533 76835.37103100953 76818.51707834104 76815.06741337001 76813.870744453 
76792.96843455316 76791.81871062967 76790.06759449525 76754.15512380034 76753.125570011 76744.55001879344 76743.92916541653 76714.89854621216 76671.11631595244 76637.74971618367 76602.65440446371 76601.50034345294 76585.63976087018 76582.05735726013 76538.51249665343 76534.9466508297 76534.11754583461 76533.36148936802 76530.43222475413 76525.52602041309 76511.93306572587 76511.14996910655 76487.03108703713 76438.09755372451 76437.03122255302 76427.11621704312 76390.54060615039 76389.19717055585 76384.1674447534 76377.62430718089 76377.15694596067 76357.90994237163 76357.7622353188 76352.64310431959 76291.64783066108 76290.00991411797 76270.1489949741 76269.6716641967 76269.47571305132 76264.20291456535 76207.94424673473 76201.75972438105 76145.8280615389 76141.47736129716 76126.20489153023 76125.80479636467 76122.30257358478 76121.07732667336 76118.60513137247 76117.35105501622 76116.78885078723 76112.18224681747 76110.93978608842 76109.85507261484 76053.90448930982 75991.3554781675 75942.38330005144 75927.71944017966 75927.68889022172 75923.33371219586 75914.77453009524 75912.25917911823 75900.14141968887 75842.21329909169 75779.97367581615 75723.35113594522 75689.50735030485 75689.00240594805 75687.69692035696 75686.34893591588 75676.12469879992 75669.2783088293 75667.72894359514 75643.44291563616 75638.78335663918 75579.40251469864 75576.83691485447 75573.48849042178 75555.40290986664 75550.00782710727 75527.80180780341 75526.98272045417 75467.25202497125 75463.89322566966 75409.21662876953 75408.88387351458 75342.26336780516 75338.21209649881 75335.83617071403 75334.23620446886 75267.24972514373 75208.02448967625 75205.31223363294 75200.57202071778 75199.61986774938 75198.43493985878 75196.62852880446 75193.04301023005 75190.18992399631 75159.99669729442 75138.94895708929 75109.34967507033 75103.1331013997 75098.38234976167 75048.81598353412 75025.1722956521 75024.65972203288 74957.18641705834 74920.91553357126 74919.40873872634 74875.37963396599 
74867.3799650937 74862.06646641645 74859.04428620538 74852.37975317649 74849.04966765924 74785.37789264161 74775.46284666481 74772.23839619584 74736.94436917614 74735.19997123502 74675.18825753064 74673.46415737622 74669.68448651729 74604.5283931722 74541.16036317323 74540.8364389864 74537.9161986723 74534.962304261 74522.3536318142 74522.19184908642 74500.25495791428 74491.01028006557 74486.09873644562 74485.03869457514 74471.76611368063 74469.89167284082 74412.84574562867 74408.59727279154 74390.07770513878 74387.74121699297 74322.67415781757 74319.4933762313 74296.06486228958 74294.9431063713 74246.48297452171 74225.65374187811 74223.20305457395 74222.8973315545 74204.93507516304 74136.19986029493 74133.6809387133 74060.02842902242 74058.51739445537 74055.6575193387 74055.27334133384 74052.76079747747 74052.1296744281 74048.7446846814 74042.6668389537 74042.66320613821 74015.06823810842 74009.90845806163 74009.08704931801 74008.69496031338 73994.36565671444 73993.36238582923 73991.39102049262 73988.81562893449 73969.75418879949 73941.84693325704 73935.45305922799 73913.46519831615 73909.72782300302 73908.50447927028 73903.79277250521 73838.29068550168 73761.69214814853 73753.45835551356 73753.16636618008 73751.06867160035 73748.46187480041 73747.74710544119 73745.08924384718 73744.46519655742 73739.50057674627 73729.0928664609 73726.99808484738 73721.91021367963 73721.2691313403 73715.84072763838 73713.910549784 73700.76904745992 73683.89204355559 73682.82237807158 73610.7762424235 73599.08399479781 73534.57200235219 73533.71747859613 73533.42697469855 73528.38832692575 73521.10628919031 73519.86089129385 73513.2597783443 73507.76151263212 73502.27425631104 73499.48205073709 73464.40589715687 73442.00483375299 73437.94063796292 73371.03145899606 73351.43865771734 73340.14022193594 73309.41246089594 73308.69401322103 73306.03867105056 73301.05875590828 73300.95644231295 73275.56271219709 73271.70997718888 73258.10548496117 73257.473399279 73191.31403923777 
73187.56313519708 73165.63957734598 73165.57576253758 73124.48793985424 73112.89523418926 73111.12157800136 73109.65664385707 73102.96716564205 73094.65898623114 73094.23165742397 73092.34329005906 73090.05690366865 73087.88576567463 73072.38363530682 72998.62072958228 72953.12250958676 72952.72807469577 72946.2396315106 72858.47468027144 72855.8525268395 72853.26800220706 72844.80160657382 72843.89501255969 72841.40662923777 72821.34076812549 72819.93267211205 72814.4250272883 72814.39900417061 72811.58375961107 72796.47006151003 72795.5184078073 72787.8295204193 72784.94013704949 72784.69281541328 72771.44529611693 72770.37882566922 72767.54429085697 72758.97261599245 72758.2395168408 72754.6777196157 72700.53934512823 72672.09466452578 72593.12834125746 72591.64923009383 72512.02621993421 72510.70245087246 72509.79177905586 72496.15286906049 72494.39578530972 72486.82435684024 72480.74732504226 72475.07109529823 72465.40473602201 72464.42900753528 72463.46911693575 72460.96834812404 72459.67031488223 72451.86406055593 72440.35516465708 72437.56165562387 72434.57864561943 72431.98879690205 72431.92761406642 72414.86953865174 72407.60761064268 72396.26808156974 72318.4277998456 72279.41780297345 72271.35954219018 72260.05649552072 72258.4843710004 72257.75032759582 72253.85985654853 72207.25108461475 72201.89947415669 72120.20908925538 72085.35094359965 72081.21614193352 72063.05045538799 72059.42353824488 72016.28244169433 72005.15095629361 71997.98138367511 71967.2207372804 71961.54787713978 71949.22347902035 71944.93749164585 71942.36851390793 71940.11982817123 71839.67351734525 71835.8845217403 71828.01208268202 71824.33393816504 71788.95942522898 71787.25811284284 71784.87206845825 71780.05459070971 71763.07344798058 71685.62569055363 71666.18612173929 71648.20517504617 71641.090899901 71639.3890178575 71635.50989471047 71631.59722311689 71631.4651312998 71531.92178154316 71530.15252939322 71530.02110969348 71527.31663335636 71476.20653087263 
71411.24804022037 71378.55546124076 71375.47593263163 71371.4076242733 71368.46569647845 71360.69187950762 71352.17112201583 71351.33606860624 71335.87047346715 71327.8346407129 71319.50615511114 71296.95404459999 71193.15539965608 71161.46778382437 71158.4104877191 71156.3462016166 71148.347953868 71134.28202283656 71109.4943039551 71106.4425997149 71075.30274133515 71013.17794330865 71010.91355309082 70899.87030999162 70896.00452049365 70853.83250768937 70843.06256839156 70839.50790742243 70836.23313982096 70832.92078993062 70828.54397293323 70812.07567261271 70803.3044193912 70799.896139303 70791.98399810668 70788.53674994012 70788.25567420284 70783.40961789162 70780.11792320416 70772.69423379445 70765.93670433541 70762.7469571954 70740.5312451535 70734.60714303242 70720.3854747955 70712.26812858213 70711.31885645501 70707.22263423688 70677.3344772791 70668.07639809902 70639.86741549519 70607.57941974862 70603.74374064356 70568.83424600241 70538.77882211423 70535.1034509543 70531.3783742556 70529.67850826622 70506.89328287022 70502.37109871153 70499.10192567033 70497.62774738473 70486.10274052912 70469.09678903011 70465.79759063467 70463.29123509185 70458.41061510413 70458.30071050025 70447.65840496434 70425.7202496264 70349.47461620488 70331.63516780105 70219.38564784364 70173.250817538 70169.45162035424 70143.45602986524 70073.32437784011 70068.81111717132 70067.24501194751 70050.40658903039 69980.75396884145 69976.15370471432 69974.60064230737 69972.21984365121 69967.56056151427 69939.06133747559 69936.67588446567 69890.70747168051 69888.40524213466 69861.44273497096 69843.49245130512 69837.85629265742 69833.45917805817 69832.07045525618 69828.06121448941 69809.23514111832 69806.39400518245 69804.97035648373 69800.33719531518 69796.52807764735 69794.02264663207 69790.46596975636 69783.54561607634 69779.55341935823 69774.35081717368 69774.11336967013 69748.28305957702 69710.95170979359 69641.83478833259 69603.5298063238 69598.72080484577 69587.12247110126 
69582.52834859569 69578.8348768177 69545.27250685966 69519.47259442409 69512.72499313817 69508.75504349815 69507.12224353323 69506.98188166192 69503.90398009474 69498.58244497767 69493.68728724169 69492.92256759625 69489.18218820156 69488.2098062019 69482.71812374092 69461.43795861996 69460.59461943235 69440.49304237831 69439.38256355358 69414.27285745108 69413.63255691611 69411.42027577639 69406.3789364394 69402.91096827434 69376.16497153004 69374.92052586308 69371.0566022293 69370.66954472507 69364.9769783633 69353.87370858599 69328.22376963189 69327.07731112515 69326.12173019849 69325.92164949066 69291.44783645435 69271.35632467683 69266.23411206914 69264.22896297365 69263.81852618272 69177.10893239737 69165.86175157146 69151.32417187568 69145.82755077674 69139.93392804312 69139.88069431565 69133.89643242172 69098.09665434061 69092.13880058807 69084.59012195567 69080.6283055683 69074.55965649607 69067.57693653519 69054.92780834714 69022.55253839605 69019.29382727308 69015.62480985487 69014.3148409516 68968.1969411958 68960.46904280993 68956.65165896088 68908.82751574402 68908.01098111196 68902.80066931153 68896.38190326568 68895.35875203782 68891.3825488788 68888.75202029281 68862.87681482204 68848.78613431158 68811.62685266782 68805.30221421822 68804.78082192421 68800.35784312965 68780.14713367958 68717.58952458874 68694.2804116369 68687.42849802975 68683.08241357651 68677.54891364125 68660.31776039267 68604.57238818258 68598.38026862749 68598.20303328645 68594.65487077157 68572.99630330858 68571.29915001354 68518.91270508293 68471.31692149548 68460.30031988904 68457.0754469166 68456.36146222959 68450.05278094021 68438.4846007774 68437.72527322867 68432.35454385151 68425.9191163563 68424.86384862846 68290.23420045868 68289.64347410057 68255.6257505117 68238.68421993474 68237.92926773298 68235.43039679166 68174.52287431629 68164.18466422161 68160.43015012983 68083.53629981678 68079.22558938916 68073.02802973024 68027.32665615046 67992.6532218189 
67982.23331008821 67975.51775388436 67950.82702996908 67926.39269491694 67919.61588788702 67919.26753857678 67917.77189020041 67908.98889606341 67825.12939690061 67820.80707977212 67811.48277796447 67804.96701692272 67800.94512679089 67800.11329780893 67792.44065103742 67785.67119964794 67784.1068340844 67783.52239641907 67780.4691319763 67778.40532222514 67770.47492583324 67766.65917426728 67765.83349118527 67755.0808179818 67731.95273686269 67731.30723284364 67730.26163160316 67726.5784390177 67726.1363005064 67724.00944075998 67719.15102437862 67715.38023025503 67680.74963965091 67661.17894948272 67657.44371158897 67638.57046179465 67638.17358441533 67633.52619681714 67632.0218946069 67620.5549762072 67596.0690437576 67581.7227957099 67558.34029157642 67552.51449098093 67548.24013120348 67547.99568746172 67547.31655140681 67537.91663808761 67537.68088557968 67536.638531262 67534.90658520968 67530.75478202397 67527.61454678267 67526.6048664426 67522.91123567324 67512.34210076812 67509.81625600871 67508.0480874164 67500.26532850147 67496.19142487767 67494.65151223193 67486.8939525352 67466.1014314649 67463.77457460697 67456.44971302309 67448.25187412316 67440.84086056193 67435.0930074908 67431.4034178892 67429.20774997748 67424.13288540616 67421.92849977061 67421.55589826498 67416.3285630802 67415.8812169038 67411.70410222 67404.19959006413 67391.48146136347 67388.4734613088 67376.6916591509 67375.39463951951 67327.050439479 67313.01072102076 67310.87721765417 67310.82483213526 67310.69147357014 67307.23229622944 67306.85074216523 67306.5617492245 67255.4156420599 67249.65631628642 67246.0559890093 67243.72164951434 67243.14052358594 67201.58399271464 67200.72368828661 67194.17917215174 67185.93396221615 67183.2131976372 67154.07902751456 67146.65102403147 67139.16631791653 67138.78945292011 67107.08474284351 67103.24178776132 67102.52767772129 67100.3812238902 67096.73636607843 67095.79387365219 67087.26371374469 67081.155248661 67077.15774598096 
67031.79248444168 67016.82443845547 67013.81398378377 67009.20890971832 67004.82143926396 66997.53664262267 66996.41269181564 66992.3431404683 66990.90341133234 66941.83700292249 66938.03667762999 66931.75602757726 66931.74386998218 66926.69147564491 66921.68541134374 66913.12817672934 66907.09115336805 66905.61972446072 66905.43726948349 66905.21540071264 66881.13104702294 66879.04809475406 66870.87545797578 66854.97558241677 66849.16812453017 66848.69035199926 66842.95077380684 66842.54037061051 66840.51399211341 66840.42479442523 66832.27412747202 66827.86137244888 66825.12499373077 66816.92393949628 66809.33482160495 66804.72777814027 66800.70233395329 66800.03461732177 66799.73159453682 66794.35506050466 66752.99426560116 66752.3412882237 66744.75068140657 66741.56407669905 66741.47734069856 66712.09543789522 66697.94778665525 66696.565616747 66696.49438016431 66689.91490196668 66684.1758472892 66684.01772441779 66675.7784633123 66661.93878345129 66660.885640425 66655.87452423248 66653.0794676184 66650.63428294618 66650.40238331766 66649.51366600457 66647.80382892142 66647.6596788887 66638.82294677602 66638.45283849142 66638.099302394 66629.0940151145 66628.70227417498 66626.93260313319 66619.82571723375 66619.30992517632 66615.63062178982 66610.40530991426 66571.1375270656 66565.61325431366 66565.58130610436 66563.52091834613 66563.12821159726 66561.94242217869 66561.73668985748 66560.98532739887 66560.96767640345 66560.5601299048 66555.1863847816 66553.4523748752 66549.48355191578 66540.65445783988 66538.96008169625 66537.5413066855 66537.25547949782 66528.2620826575 66521.79494158152 66518.07165268098 66503.64333850604 66502.77985403335 66500.55845244706 66482.95311453166 66482.82975249676 66476.18465918035 66474.88755441332 66472.62534635441 66471.50843545314 66466.05922103555 66465.6768695188 66461.01601204916 66455.18916366335 66452.9304424318 66445.34909676903 66438.39044889805 66429.41643594715 66423.86370598934 66422.06257689037 66421.31437537997 
66420.7101397019 66411.37407166514 66409.74062124953 66409.10232364596 66407.1445163571 66406.10099072987 66396.35572803208 66387.93761405942 66387.5479649625 66372.6200558351 66361.43536264315 66360.2406151342 66358.13481128476 66350.45982473617 66349.6802717238 66349.48495499868 66317.18104798981 66316.99767392308 66316.94726493293 66316.85564549555 66314.33085466122 66312.72773533105 66300.06614476965 66295.09107017024 66294.4421000883 66291.4464296331 66274.47912448535 66274.0032011566 66271.86561151971 66269.8655832008 66264.76904996124 66263.77246015909 66260.50102375844 66258.83313878426 66254.50688353593 66196.68226622618 66189.51708539957 66188.04304862891 66187.44901666405 66142.14839162753 66136.44923415697 66134.52338268331 66134.33075359148 66133.75116192708 66113.84712136905 66104.93160246413 66103.64922652006 66102.43735800973 66100.03642725898 66087.0529711706 66086.03586990398 66085.35520876916 66085.34685470544 66083.80352503457 66083.13461844213 66078.91077668214 66078.65517716027 66078.43255881287 66071.53081073402 66071.15637591748 66061.65250441071 66061.24406522169 66060.88632441587 66060.80834511096 66059.60471317118 66054.08450471621 66053.8360854242 66050.7969013156 66047.98951924656 66047.88038673076 66047.83459353622 66047.6572104447 66047.54389039271 66047.44882258064 66047.0442023817 66045.56355329033 66045.48379517115 66045.06084163826 66043.7265067791 66043.29773720405 66042.56236704305 66011.35083534614 66007.77609490375 66006.82433252028 66006.76894164509 65993.74290645543 65987.15327663225 65983.46972614584 65981.92575941418 65981.06183469409 65978.33688096111 65978.25903455463 65977.92096935255 65977.76898802545 65974.68501555086 65973.74159294713 65973.13779435669 65973.09811699697 65970.36973875396 65965.39318318697 65964.83210597977 65949.31315811501 65948.17462649943 65947.69663066175 65944.98544921662 65944.5300514644 65942.17127988323 65940.85368970533 65938.38549892737 65938.1215899154 65935.68873624703 65934.61296851138 
65933.18960214003 65929.21866467177 65925.28145097703 65925.10510092655 65923.42651769072 65922.76848187721 65920.355150313 1000 2000 3000 65917.77366988536 65917.42124579384 65917.35472768181 65915.191683115 1000 2000 3000 65913.74173835457 1000 2000 65913.67071983297 65913.54663343848 65913.38135904584 65912.91251332521 65912.86595519359 65912.38138264038 65911.87388184865 65911.0422651884 65910.91262789708 65909.44643254887 65907.18517001916 65906.6905428516 65906.63047319464 65903.66367389775 65903.04345749777 65900.11680201309 65898.03444366314 65897.89127352579 65897.72474233736 65897.42686146114 65889.75761242646 1000 2000 65889.4503775269 65888.60005291186 65887.96757268206 1000 2000 3000 65887.78502027241 65887.6735900183 1000 2000 65883.30746975438 65879.42451526482 65876.62593266835 65867.54309209646 65866.8705382418 65864.70234736537 65864.63872184069 65861.65396520337 65861.56001467745 65860.69255144284 65857.91063649424 65857.63904245653 65855.83938529169 65852.53674203974 65851.75722372605 1000 65849.42529692323 65849.39129165064 65849.21964358159 65849.13550889574 65849.01722979212 65848.309837367 65848.0190122691 65847.31340942821 65846.87780415732 65845.22126533584 1000 65844.68403011608 65844.50079455215 65843.99443125122 65833.12417188319 65823.5528378201 65820.2217303047 65820.11724628639 65815.84208487211 65815.59498372914 65812.9306441441 65812.68960067733 65809.75631444433 65809.49070219029 65809.40824727764 65808.68961578053 65807.856322961 65806.8239693821 1000 2000 65805.10277843967 65805.03041109092 1000 65804.22582580996 1000 65802.76896417789 1000 2000 3000 4000 65802.76660850768 65802.712087849 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 14000 15000 16000 17000 18000 19000 20000 21000 22000 23000 24000 25000 26000 27000 28000 29000 30000 31000 32000 33000 34000 35000 36000 37000 38000 39000 40000 41000 42000 43000 44000 45000 46000 47000 48000 49000 50000 51000 52000 53000 54000 55000 56000 57000 58000 59000 
60000 61000 62000 63000 64000 65000 66000 67000 68000 69000 70000 71000 72000 73000 74000 75000 76000 77000 78000 79000 80000 81000 82000 83000 84000 85000 86000 87000 88000 89000 90000 91000 92000 93000 94000 95000 96000 97000 98000 99000 100000 15356.835419 seconds (2.08 M allocations: 2.805 TiB, 1.15% gc time, 0.00% compilation time)
(65802.712087849, [2.1141975308641965 0.0006598046978094491 0.11655773420479303 0.012777777777777782; 0.015043547110055414 0.000603697790014729 0.009920357691770257 0.0020190023752968953; … ; 0.252222222222222 0.0006175771971496377 0.0974509803921567 0.012699999999999986; 0.06587066082615065 0.0009636113373857049 0.022799624300758936 0.0048664688427299185], [2, 4, 2, 2, 4, 2, 2, 2, 2, 4  …  5, 5, 5, 5, 5, 5, 5, 5, 5, 5])

In [18]:
state = copy(Random.default_rng()) # save a copy of the current RNG state, then reseed so both runs draw the same "random" numbers
Random.seed!(1234)

TaskLocalRNG()
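The cell relies on the fact that a generator's stream is fully determined by its state: two generators seeded alike, or a generator and a `copy` of it, produce identical draws. The sketch below shows both properties. It uses local `MersenneTwister` objects so the notebook's global generator is left untouched, and its variable names are illustrative only.

```julia
using Random

# Two generators with the same seed produce the same stream.
rng_a = MersenneTwister(1234)
rng_b = MersenneTwister(1234)
draws_a = rand(rng_a, 1:10, 5)
draws_b = rand(rng_b, 1:10, 5)

# A copy taken now replays exactly what the original draws next.
snapshot = copy(rng_a)
original = rand(rng_a, 3)
replay = rand(snapshot, 3)
```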

In [19]:
function doit_floop(sample, academic_institutions, asink, gsink, psink, tsink, all_institutions, num, numsink)
    # scratch matrix passed to bucket_estimate_floop on every evaluation
    b = zeros(Int64, size(sample)[1], size(sample)[2])
    # initial state: every academic institution starts in type 1
    current_allocation = Array{Int64}(undef, length(all_institutions))
    cur_objective = Inf
    best_mat = nothing
    cursor = 1
    for inst in academic_institutions
        current_allocation[cursor] = 1
        cursor += 1
    end
    # the sinks must stay in fixed types
    # this was built to support more sinks, but by default we only use one
    # change the "current_allocation[cursor] = ..." lines to group sinks together
    for key in asink # other academic
        current_allocation[cursor] = num + min(1, numsink)
        cursor += 1
    end
    for key in gsink # public sector
        current_allocation[cursor] = num + min(2, numsink)
        cursor += 1
    end
    for key in psink # private sector
        current_allocation[cursor] = num + min(3, numsink)
        cursor += 1
    end
    for key in tsink # assistant professor at teaching universities
        current_allocation[cursor] = num + min(4, numsink)
        cursor += 1
    end
    # number of consecutive rejected proposals
    blankcount = 0

    # BEGIN MONTE CARLO REALLOCATION ROUTINE (greedy: a proposal is kept only if it lowers the objective)
    while true
        # propose moving one randomly chosen academic institution (they occupy the first
        # length(academic_institutions) positions) to a different, randomly chosen type
        temp_allocation = copy(current_allocation)
        k = rand(1:length(academic_institutions))
        @inbounds temp_allocation[k] = rand(delete!(Set(1:num), temp_allocation[k]))
        # check if the new assignment is better
        test_objective, estimated_means = bucket_estimate_floop(temp_allocation, sample, num, numsink, b)
        if test_objective < cur_objective
            print(test_objective, " ")
            blankcount = 0
            cur_objective = test_objective
            best_mat = estimated_means
            current_allocation = temp_allocation
        else
            blankcount += 1
            if blankcount % 1000 == 0
                print(blankcount, " ")
            end
        end
        # stop after 100000 consecutive proposals without improvement
        if blankcount == 100000
            return cur_objective, best_mat, current_allocation
        end
    end
end



doit_floop (generic function with 1 method)
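The loop in `doit_floop` is a greedy search driven by random proposals. At each step it moves one academic institution to a different type, keeps the move only if the objective falls, and stops after a fixed number of consecutive rejections. The stripped-down sketch below runs on its own because it swaps `bucket_estimate_floop` for a hypothetical toy objective, `mismatches`, which counts entries that differ from a known allocation. The names `greedy_search`, `mismatches` and `toy_target` are illustrative, not part of the notebook.

```julia
using Random

# Greedy random-proposal search: move one movable entry to a different random type,
# keep the move only if the objective strictly decreases, and stop after
# `patience` consecutive rejections.
function greedy_search(objective, alloc::Vector{Int}, movable::Int, num::Int;
                       patience::Int = 1000, rng::AbstractRNG = Random.default_rng())
    best = objective(alloc)
    rejected = 0
    while rejected < patience
        proposal = copy(alloc)
        k = rand(rng, 1:movable)                               # only entries 1:movable may change type
        proposal[k] = rand(rng, setdiff(1:num, [proposal[k]]))  # any type other than the current one
        value = objective(proposal)
        if value < best
            best, alloc, rejected = value, proposal, 0          # accept and reset the rejection counter
        else
            rejected += 1
        end
    end
    return best, alloc
end

# Hypothetical toy problem: recover a known allocation of 7 "academic" entries into
# 4 types; the 8th entry stands in for a sink that stays fixed at type 5.
toy_target = [1, 2, 3, 1, 2, 3, 4, 5]
mismatches(a) = count(a[1:7] .!= toy_target[1:7])
toy_start = [1, 1, 1, 1, 1, 1, 1, 5]
toy_best, toy_alloc = greedy_search(mismatches, toy_start, 7, 4; rng = MersenneTwister(1234))
```

Because only improvements are accepted, the search can end in a local minimum of a real objective. The toy objective has none, so here the search recovers `toy_target` exactly.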

In [21]:
@time est_obj, est_mat, est_alloc = doit_floop(out, collect(academic), collect(acd_sink), collect(gov_sink), collect(pri_sink), collect(tch_sink), institutions, NUMBER_OF_TYPES, NUMBER_OF_SINKS)

87429.51396509117 87424.40729770961 87416.80166725596 87409.58037319232 87407.34251709997 87402.66757667929 87041.44276817921 87000.6411683785 86928.52346723192 86912.82196385112 86910.7689921239 86906.6460305377 86763.83756922305 86751.77162111866 86738.4299762939 86723.7176055536 86718.91881509808 86702.77634905092 86700.50092777653 86683.96728489985 86670.91647288017 86668.9913878547 86320.19774241728 86304.14027299744 86295.45882943878 86279.80241802127 86262.06818380282 86075.78649163293 86057.93292209749 86051.43059346496 86035.58710945083 86020.88317270644 86015.93572340399 86012.50806788079 85641.32595192705 85635.85938508411 85583.25155000719 85568.01136944073 85550.23983857938 85546.75701388613 85532.10217632542 85515.14074195613 85496.92012982654 85485.8507130594 85475.57758505712 85458.25987824745 85455.62732936858 85442.0773037119 85423.4047862952 85404.5564175505 85385.53978321818 85369.0262410478 85361.455348348 85352.25316251739 85334.3581965768 85316.34364229265 85299.53287208601 85287.06415387164 85267.57362600011 84923.01048061988 84905.09568484708 84887.07590672057 84867.6145057457 84851.92136605403 84832.25257011986 84822.81041743675 84808.53829674158 84804.66334539204 84790.32076149415 84770.57271223904 84759.42042654913 84636.58301052765 84616.88348610037 84597.07512666748 84577.41858971232 84448.18689265453 84431.26847692323 84423.55380547809 84406.61214526511 84386.67389547893 84290.03142781506 84270.15698421517 84257.28828258513 84238.76770907789 84220.4997913475 84204.89079673898 84187.35038653543 84120.34617427512 84106.36146779278 84033.95239302695 83449.18248671241 83432.85096410257 83416.22942069579 83405.45479900521 83385.74656593204 83380.08790853225 82779.4355154824 82766.20163162884 82751.74513088063 82742.17317422589 82722.88190299764 82627.64829518447 82617.96490055199 82601.79944494474 82586.53638044588 82535.22408635003 82519.04915731179 82508.3098944634 82496.51563084715 82478.69832730843 82467.82563135747 82433.10653916861 
82416.24522575649 82176.58915356117 82166.90278816629 82147.81681693545 82130.24585153192 82115.01792643758 82095.73769768763 82076.38117004184 82063.1593602642 81997.03984911632 81993.3824065185 81978.47399043628 81970.89146414123 81954.732369596 81946.71842932192 81927.3755840269 81926.58184316377 81912.93428795274 81894.64904607416 81875.67695461665 81864.55266785139 81848.0275687885 81828.43630652837 81661.5349376951 81659.86827020522 81647.2882880595 81627.85484595179 81608.3481042941 81588.76847116069 81571.7307807993 81553.46574276048 81536.9562496978 81401.39564478604 81389.06419992339 81372.43614868762 81361.39130096177 81350.25149233718 81330.25855382977 81329.2869167285 81313.57973698896 81016.88916992175 81004.05690851851 80985.96249189888 80923.01230689124 80909.8587538455 80513.52281706841 80495.66332762653 80476.32158775441 80459.96314643735 80446.94003350055 80434.36408448871 80427.79011245086 80418.54062065548 80416.01022994716 80407.3639291293 80387.74902183074 80370.71682967138 80336.11637673409 80332.26294074609 80316.80218140985 80298.62102567249 80280.23913583014 80260.3273353551 80240.34726368905 80222.83008864793 80204.36607264518 80184.20547619206 80165.49481995385 80157.0784894677 80139.79642489286 80120.96213529736 80020.40238397397 80006.34616227512 79990.17266828005 79972.27138031286 79537.82894439128 79535.91003508135 79528.52532422193 79434.43372170102 78994.09248626968 78974.721379673 78955.28392733255 78943.68269811146 78926.19938061625 78438.73719545567 78423.69177075071 78420.39919065707 78401.22705753997 78241.0304286122 78225.23697982162 78217.33638663711 78212.85566982512 78193.49941694553 78178.64578431477 78163.62524405142 78150.36239744144 77042.67770000636 77025.4063320579 77007.07651076272 76988.68263619425 76975.00768414405 76913.3513095315 76905.1024430354 76810.21861341802 76792.05792393917 76779.03506945659 76774.56128736278 76772.53026003645 76757.17175801286 76738.82480970127 76728.13022161448 76719.8690253105 
76703.00988291315 76691.12151176346 76684.6162861886 76674.44728772096 76156.94762836443 76140.60556659929 76136.67801250936 76119.78881506069 76107.31769541961 75987.54996478384 75982.81609596616 75965.03604769998 75947.19300765313 75870.743292445 75838.86621731783 75822.76625620738 75810.73309265627 75808.8282746434 75801.13609114011 75577.85938858172 75561.9882549055 75483.14247874268 75465.79724338619 75432.52598361262 75415.25826404654 75399.52400454471 75383.7348482517 75247.3384905 75234.63132344614 75221.93935436958 75207.57190339442 75190.23361389112 75177.3700653931 75159.92467954278 75145.22729572744 75130.40015956809 75112.93167561888 75097.84639674451 75090.51722747317 75079.88466582334 75070.18830315169 74983.05684434566 74243.81849171183 74230.85816568426 74223.42274925357 73471.1312078069 73387.35281488058 73371.59550738588 73363.48448802586 73356.08624304246 73349.2623659544 73336.27343999626 73322.99694170612 73310.71695776905 73295.55735724664 73288.2738727296 73272.28853086289 72638.82768234625 72624.59875809227 72612.03797973027 72599.82787284823 72584.44578044832 72577.09341884381 72563.34796196313 72551.58509780321 72546.96840103055 72531.3970995375 72421.73327857649 72413.49348189589 72413.12633319368 72401.69931682065 72386.224021475 72372.19643734934 72362.629057229 72356.2506064753 72343.59720586098 72335.56699296206 72261.90336852451 72254.32162793938 72251.17219603385 72249.6656677755 71991.34756663334 71980.22059745772 71966.44663827034 71951.1321547846 71938.74127082151 71923.31687224907 71916.98719488594 71906.8866188566 71899.20622127617 71893.38691145182 71856.98866886663 71853.44341341159 70928.31922280617 70924.5243779863 70910.14736328536 70907.54809715165 70907.20474869394 70899.12706357508 70887.47336519536 70882.97142556173 70869.3571838453 70858.51586880512 70743.84468653925 70729.91618876475 70717.8239962265 70703.0839629066 70699.98965425162 70685.1710970487 70670.29326547527 70659.67000832042 70397.70938432445 
70387.37499415515 70376.97066465071 70186.37647868384 70125.43267264694 70112.5373159131 70101.45376989286 70086.7193695941 69955.62826352971 69953.3232939177 69944.19670947807 69664.15317574749 69655.15329879179 69642.13165682416 69618.04964290597 69601.46711265483 69588.44431790021 69574.02198505859 69560.48983073898 69548.99724353786 69534.39139695904 69521.13071598003 69509.4862117052 69494.71735232355 69485.54111995769 69390.05691002653 69380.9867778379 69367.71203184556 69355.75555885967 69340.97538119822 69330.35369029653 69315.46436914991 69300.40878813791 69291.04073290847 69285.8524242399 69281.34202739305 69270.32868593794 69055.61960408055 69053.38883509708 69040.03318269989 69027.56540223709 68783.0450097949 68781.82367924474 68777.74129553972 68769.13388044003 68733.33765616085 68722.16050727342 68706.73769846495 68694.60848259043 68692.66031384793 68677.70349908782 68662.68395663997 68589.80411649396 68576.21048866026 68561.07107344014 68547.28383903316 68543.1863144596 68533.14458341476 68527.92406720875 68476.14212032952 68459.65353914048 68448.48524428323 68433.40864462536 68432.62917809925 68417.58189839762 68402.46995282566 68387.2930492085 68372.05089119787 68361.98384907372 68348.0508446169 68297.61244320677 68283.7793050092 68272.81639006485 68262.48516770377 68250.17969061833 68135.69391108747 68120.49195222586 68110.93241423588 68096.97798282519 68084.61308711882 68075.32578304163 68035.5479350944 68023.36278967775 68007.90060053449 67993.81108572803 67978.28101907301 67962.68272933678 67947.01586598798 67942.84792575584 67937.33926228774 67926.47125789436 67913.58218486796 67900.70891281463 67896.70753382379 67883.69542821014 67881.77758522348 67867.19606041702 67861.43732696128 67860.76963443383 67853.41832635766 67842.84875224147 67784.38154323546 67774.71585780868 67759.75188369858 67746.13791957636 67730.94892616308 67717.71742619686 67702.3767784147 67695.03917810853 67693.74877753563 67681.54801707082 67670.99830644965 
67665.96127378303 67651.22274755829 67637.43888168744 67570.3516613399 67553.49322773893 67480.29839456436 67478.8829474281 67466.72276958163 67462.68266730188 67446.76213387464 67428.08580782486 67412.6480548834 67397.14055318233 67393.99483679302 67391.82677487693 67384.09888605154 67370.14258016078 67354.66710153902 67336.99526816483 67319.70302872725 67302.57400148208 67288.39492577013 67271.09665312014 67258.92674289516 67197.31581065622 67181.67960066168 67177.54752217318 67168.70910878322 67151.37835740014 67150.64477265584 67132.59719110957 67121.36348289544 67119.4415154231 67115.62725604649 67106.64840426558 66845.22177376934 66833.7450543709 66816.10892736688 66816.10587400122 66798.31227144325 66795.65880418275 66779.27617167245 66773.01940336768 66766.6910752061 66764.42244160276 66755.55828648501 66741.81372608176 66723.69883325856 66705.49404085806 66700.92351082475 66682.58667014624 66664.18955023409 66651.99767651528 66651.30224276631 66634.37122920052 66630.39602863611 66627.10990996176 66609.78490411652 66526.549989789 66509.34304573832 66492.0863870573 66480.86697067421 66464.73037973378 66449.24768326717 66430.3788310717 66412.95917550765 66408.71990942021 66392.86331404539 66386.57356121333 66370.83619034669 66351.72781644385 66332.51829536044 66317.68693532143 66306.76752847756 66293.73519426194 66268.38056388589 66259.12188833731 66256.2555841178 66236.80348085069 66235.30434339428 66229.06765574528 66228.15527147625 66226.31679939434 66210.87905394525 66210.77411137536 66192.66613255942 66177.77245363923 66168.53594148805 66150.1687033793 66149.89506060389 66142.16709703398 66140.23252588922 66139.96654049476 66139.86283488787 66121.89778304496 66112.96318892561 66111.99856345268 66111.25597349108 66092.49558516507 66092.00322478796 66091.82960996372 66091.16609541698 66088.7702566309 66068.21808175184 66053.11461555632 66049.16956712083 66047.73787354902 66047.00895857095 66045.7741185386 66045.58424260883 66045.56490913968 
66043.98626723737 66042.83142429305 66041.79306913874 66041.34412674683 66036.2276631238 66035.43901024846 66035.43513913345 66034.8963603024 66034.75186727155 66033.50493963133 66031.22433681512 66031.18784604239 1000 66011.26181983278 66009.64121093752 1000 66009.5622722823 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 14000 15000 16000 17000 18000 19000 20000 21000 22000 23000 24000 25000 26000 27000 28000 29000 30000 31000 32000 33000 34000 35000 36000 37000 38000 39000 40000 41000 42000 43000 44000 45000 46000 47000 48000 49000 50000 51000 52000 53000 54000 55000 56000 57000 58000 59000 60000 61000 62000 63000 64000 65000 66000 67000 68000 69000 70000 71000 72000 73000 74000 75000 76000 77000 78000 79000 80000 81000 82000 83000 84000 85000 86000 87000 88000 89000 90000 91000 92000 93000 94000 95000 96000 97000 98000 99000 100000 2434.850830 seconds (19.86 M allocations: 4.852 GiB, 0.01% gc time)
(66009.5622722823, [0.012697705502339117 0.00062483789582891 0.09725880401911995 0.25152017689331124; 0.002004197024356918 0.0006064553806069095 0.010082706068209334 0.014876250658241148; … ; 0.01271420674405749 0.0006582411795681945 0.11655773420479303 2.1141975308641965; 0.00486348949578954 0.0009611101524532698 0.022762364895153663 0.06588108006397589], [2, 1, 2, 2, 1, 2, 2, 2, 2, 1  …  5, 5, 5, 5, 5, 5, 5, 5, 5, 5])

In [19]:
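function bucket_extract(assign, A::Matrix{Int64}, num, numsink)
    # T[r, c] = total placements from graduating type c into hiring type (or sink) r
    T = zeros(Int64, num + numsink, num)
    for i in 1:size(A, 1), j in 1:size(A, 2)
        # Cartesian indexing works for any number of sinks
        @inbounds T[assign[i], assign[j]] += A[i, j]
    end
    return T
end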

bucket_extract (generic function with 1 method)
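The bookkeeping in `bucket_extract` is easiest to see on a toy example. The sketch below uses a hypothetical function name, `aggregate_by_type`, and a made-up 3 × 2 matrix. It writes the same sum with Cartesian indexing `T[r, c]` and checks Julia's column-major rule for linear indices. Under that rule, cell (r, c) of a (num + numsink) × num matrix has linear index r + (num + numsink)(c − 1), which reduces to a stride of `num + 1` only when there is a single sink.

```julia
# Aggregate institution-level placements into type-level counts:
# T[r, c] sums A[i, j] over hiring institutions i of type r and graduating institutions j of type c.
function aggregate_by_type(assign::Vector{Int}, A::Matrix{Int}, num::Int, numsink::Int)
    T = zeros(Int, num + numsink, num)
    for i in axes(A, 1), j in axes(A, 2)
        T[assign[i], assign[j]] += A[i, j]
    end
    return T
end

# Toy data: institutions 1 and 2 are academic (types 2 and 1), institution 3 is the sink (type 3).
# Rows of toy_A are hiring institutions; columns are the graduating (academic) institutions.
toy_A = [0 4;
         1 0;
         5 6]
toy_T = aggregate_by_type([2, 1, 3], toy_A, 2, 1)

# Column-major storage in a 3 x 2 matrix: linear index of (r, c) is r + 3 * (c - 1).
linear_ok = all(LinearIndices((3, 2))[r, c] == r + 3 * (c - 1) for r in 1:3, c in 1:2)
```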

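Aggregating the institution-level placements by the estimated allocation gives the type-to-type placement counts. Rows are hiring types, with the sink in the last row, and columns are graduating types: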

In [20]:
est_mat = bucket_extract(est_alloc, out, NUMBER_OF_TYPES, NUMBER_OF_SINKS)

5×4 Matrix{Int64}:
  441   821   861   41
  108   263   615   15
   41    94   666    3
   96   178    93   67
 1208  1751  1949  438

The matrix is then reordered so that, for any two elements in symmetric positions across the diagonal, the element below the diagonal is greater than the element above it. For example, in the reordered matrix further down, the 615 in cell (2, 1) is greater than the 94 in the symmetric cell (1, 2) above the diagonal. The same holds for every other symmetric pair.
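Let $n$ be `NUMBER_OF_TYPES` and let $P_{ij}$ be the number of graduates of tier $j$ hired by tier $i$ (rows hire, columns graduate). Stated formally, the ordering requires

$$
P_{ij} > P_{ji} \qquad \text{for all } 1 \le j < i \le n,
$$

which is the condition tested by the symmetric-pair check further down.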

In [21]:
sum(est_mat)

9749

In [22]:
M = est_mat

5×4 Matrix{Int64}:
  441   821   861   41
  108   263   615   15
   41    94   666    3
   96   178    93   67
 1208  1751  1949  438

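The next cell reorders the types so that tier 1 occupies the first row and column, tier 2 the second, and so on. Tiers are ranked from highest to lowest by total placements, that is, by the column sums of the matrix (the number of graduates each type placed).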

In [23]:
# the reordered placements matrix
placement_rates = zeros(Int64, (TOTAL_DISTRIBUTIONS, NUMBER_OF_TYPES))
# column sums of the estimated matrix: total placements of graduates from each type
ovector = sum(M, dims=1)
# type indices ordered from the largest column sum to the smallest
perm = sortperm(vec(ovector), rev=true)
# a mapping from each current type index to the tier (row/column index) it gets in the new matrix;
# sortperm gives tied types distinct tiers
o = Dict{Any, Any}(perm[k] => k for k in eachindex(perm))
# move the cells for the tier-to-tier placements
for i in 1:NUMBER_OF_TYPES, j in 1:NUMBER_OF_TYPES
    placement_rates[o[i], o[j]] = M[i, j]
end
# move the cells for the tier-to-sink placements (separate since sink row indices don't change)
for i in NUMBER_OF_TYPES+1:TOTAL_DISTRIBUTIONS, j in 1:NUMBER_OF_TYPES
    placement_rates[i, o[j]] = M[i, j]
end
placement_rates

5×4 Matrix{Int64}:
  666    94    41    3
  615   263   108   15
  861   821   441   41
   93   178    96   67
 1949  1751  1208  438
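The reordering amounts to applying one permutation to the rows and columns of the type block, while only the columns move in the sink rows. The compact sketch below uses `sortperm` on the column sums and runs on the estimated matrix printed earlier. It names that matrix `M_est` so as not to overwrite `M`, and it reproduces the `placement_rates` matrix above.

```julia
# Placement counts as printed earlier (rows: hiring type, last row the sink; columns: graduating type).
M_est = [ 441   821   861   41;
          108   263   615   15;
           41    94   666    3;
           96   178    93   67;
         1208  1751  1949  438]
ntypes = 4

# Types ranked by column sum (total graduates placed), largest first.
tier_perm = sortperm(vec(sum(M_est, dims=1)), rev=true)

# Permute rows and columns of the type block; for the sink rows only the columns move.
reordered = vcat(M_est[tier_perm, tier_perm], M_est[ntypes+1:end, tier_perm])
```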

In [43]:
println(sum(placement_rates))

9749


In [44]:
o # the type-to-tier mapping: each key is an estimated type, its value is the tier that type was assigned

Dict{Any, Any} with 4 entries:
  4 => 4
  2 => 2
  3 => 1
  1 => 3

To verify that this is ordered properly, we can check symmetric indices:

In [28]:
for i in 1:NUMBER_OF_TYPES, j in 1:NUMBER_OF_TYPES
    if i > j # not a diagonal and only check once
        if placement_rates[i, j] <= placement_rates[j, i]
            println("FAULT: hiring ", i, " with graduating ", j, ": downward rate: ", placement_rates[i, j], ", upward rate: ", placement_rates[j, i])
        end
    end
end
println("Check Complete")

Check Complete


If the reordering worked, this cell prints no faults. If it prints any, the ranking by total placements is not consistent with a strict hierarchy. In that case, set the tier order explicitly by hand and rerun the check.
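If a fault does appear, one option is to search every tier ordering for one that passes the check. This is our suggestion, not a step the notebook takes. With four types there are only 4! = 24 orderings to try. The brute-force sketch below runs on the estimated matrix printed earlier, and its helper names (`all_orders`, `is_hierarchical`, `consistent_orders`) are hypothetical.

```julia
# All orderings of `items`, generated recursively (24 orderings for 4 types).
function all_orders(items::Vector{Int})
    length(items) <= 1 && return [copy(items)]
    orders = Vector{Vector{Int}}()
    for (idx, head) in enumerate(items)
        rest = vcat(items[1:idx-1], items[idx+1:end])
        for tail in all_orders(rest)
            push!(orders, vcat(head, tail))
        end
    end
    return orders
end

# True when every cell below the diagonal strictly exceeds its mirror above the diagonal.
is_hierarchical(P, n) = all(P[i, j] > P[j, i] for i in 1:n for j in 1:i-1)

# Orderings perm (perm[k] = type placed in tier k) whose reordered type block passes the check.
consistent_orders(M, n) = [perm for perm in all_orders(collect(1:n)) if is_hierarchical(M[perm, perm], n)]

M_est = [ 441   821   861   41;
          108   263   615   15;
           41    94   666    3;
           96   178    93   67;
         1208  1751  1949  438]
found = consistent_orders(M_est, 4)
```

For this matrix exactly one ordering passes: types 3, 2, 1, 4, which is the ranking by total placements stored in `o`. Brute force grows as n!, so it is only practical for a handful of types.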

James' code only matched an institution name with a tier. To save the matching to the database (and to use it for other purposes), we need to match tiers with oids. Two dictionaries were created above:

- `oid_mapping`, whose keys are oids and whose values are the corresponding institution ids.
- `institution_mapping`, whose keys are institution ids and whose values are the corresponding institution names.

The next function reverses these mappings. It takes an institution name, finds its institution id, and then collects the oids associated with that institution. If the name is not found, the returned institution id is `nothing` and the vector of oids is empty.

In [29]:
function name_to_oid(institution_name::String, institutions::Dict, organizations::Dict)
    oids = String[]
    # institution id whose name matches; stays `nothing` if the name is absent
    institution_id = nothing
    for (k, name) in institutions
        if name == institution_name
            institution_id = k
            break
        end
    end
    # collect every oid whose institution id matches
    for (k, iid) in organizations
        if iid == institution_id
            push!(oids, k)
        end
    end
    return institution_id, oids
end


name_to_oid (generic function with 1 method)
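`name_to_oid` scans both dictionaries on every call. That is fine for a few thousand institutions, but it repeats the same work inside the listing loops below. The sketch below builds the two reverse lookups once instead. It assumes that ids, names and oids can all be treated as strings, and the function names and toy ids are hypothetical. Unlike `name_to_oid`, which keeps the first match it meets, a name shared by two ids keeps whichever id is iterated last.

```julia
# Build both reverse lookups once: name => institution id, institution id => oids.
# Assumption: every id, name and oid can be converted to a String.
function build_reverse_maps(institutions::AbstractDict, organizations::AbstractDict)
    name_to_id = Dict(string(name) => string(iid) for (iid, name) in institutions)
    id_to_oids = Dict{String, Vector{String}}()
    for (oid, iid) in organizations
        push!(get!(id_to_oids, string(iid), String[]), string(oid))
    end
    return name_to_id, id_to_oids
end

# Same contract as name_to_oid: (nothing, String[]) when the name is unknown.
function lookup_oids(name::AbstractString, name_to_id, id_to_oids)
    iid = get(name_to_id, name, nothing)
    oids = iid === nothing ? String[] : get(id_to_oids, iid, String[])
    return iid, oids
end

# Hypothetical toy data in the shape described above.
toy_institutions = Dict("10" => "Example University A", "11" => "Example University B")
toy_oid_map = Dict("2099" => "10", "110" => "10", "675" => "11")
name_to_id, id_to_oids = build_reverse_maps(toy_institutions, toy_oid_map)
toy_iid, toy_oids = lookup_oids("Example University A", name_to_id, id_to_oids)
```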

The lengths of the mappings are printed below. The total number of institutions should exceed the number of entries in `institution_mapping` by the number of academic institutions that hired at a level other than assistant professor (postdocs, lecturers, etc.). The mapping saved to the database should then contain the same number of institutions as `institution_mapping`.

In [34]:
println(length(institution_mapping))
print(length(institutions))

1451
2193

Saving the type allocation to MySQL:

In [35]:
#saving type allocation
function save_type(tier, oids, stm)
    for oid in oids
        DB.execute(stm, [tier, oid])
    end
end


save_type (generic function with 1 method)

In [41]:
o

Dict{Any, Any} with 4 entries:
  4 => 4
  2 => 2
  3 => 1
  1 => 3

In [37]:
#to record the type allocation
n = 0
B = Set{String}()

if SAVE_TO_DATABASE
    d = DB.connect(MySQL.Connection,cfg["host"], cfg["user"], cfg["password"], db =cfg["database"], port = parse(Int64,cfg["port"]))
    query = "drop table if exists type_distribution"
    DB.execute(d, query)
    query = "create table type_distribution (id int auto_increment primary key, type int, oid int,created timestamp default CURRENT_TIMESTAMP )"
    DB.execute(d, query)
    query = "insert into type_distribution set type=?,oid=?"
    stm = DB.prepare(d, query)
end
for j in 1:NUMBER_OF_TYPES
    # j is the estimated type index; the tier ranking of each type is given by `o`
    println("Type ", j)
    println()
    for (i, entry) in enumerate(est_alloc)
        if entry == j
            push!(B, institutions[i])
            iid, oids = name_to_oid(institutions[i], institution_mapping, oid_mapping)
            println(institutions[i], " ", oids)
            if isempty(oids)
                println("*****Error in data****")
                n += 1
            end
            if SAVE_TO_DATABASE
                save_type(j, oids, stm)
            end
        end
    end
    # n accumulates across types, so this is a running total
    println(n, " errors counted")
    println()
end
if SAVE_TO_DATABASE
    DB.close(d)
end


Type 1

University of Oslo ["3810", "675"]
Rice University ["2099", "110"]
University of Massachusetts, Amherst ["1635", "2250", "1642", "330"]
Eastern Kentucky University ["65"]
Peking University ["1830", "767", "3008", "1273", "2904"]
University of Lausanne (Université de Lausanne) ["376", "3355"]
Georgetown University ["6269", "1419", "99", "4600", "1184"]
Universidad de los Andes ["1033", "533"]
Vrije Universiteit Amsterdam ["898", "72", "250", "1369", "2252"]
Hong Kong Baptist University ["1080"]
Simon Fraser University ["332", "1310"]
Colorado State University ["4997", "1788", "3130"]
University of Leicester ["477"]
London Business School ["1754", "660", "3203", "3276"]
University of Amsterdam (Universiteit van Amsterdam) ["5211", "1343", "2389"]
Wuhan University ["1183", "4820"]
George Mason University ["2756", "1770", "1637"]
Drexel University ["2379", "634"]
University of Miami ["2546", "3527", "750"]
Concordia University ["160", "7009"]
University of Kentucky ["1535", "3924",
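
The listing loop scans `est_alloc` once per type. When only the membership of each type is needed, for example to count institutions per type, a single pass is enough. The sketch below uses a hypothetical helper, `group_by_type`, and made-up names.

```julia
# Group names by assigned type in one pass: result[t] lists every name with allocation t.
function group_by_type(names::Vector{String}, alloc::Vector{Int}, ntypes::Int)
    groups = [String[] for _ in 1:ntypes]
    for (name, t) in zip(names, alloc)
        1 <= t <= ntypes && push!(groups[t], name)   # entries in sink types (t > ntypes) are skipped
    end
    return groups
end

toy_names = ["Inst A", "Inst B", "Inst C", "Sink X"]
toy_alloc_types = [2, 1, 2, 5]
toy_groups = group_by_type(toy_names, toy_alloc_types, 4)
```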

In [38]:
n = 0
B = Set{String}()
if SAVE_TO_DATABASE
    d = DB.connect(MySQL.Connection,cfg["host"], cfg["user"], cfg["password"], db =cfg["database"], port = parse(Int64,cfg["port"]))
    query = "insert into type_distribution set type=?,oid=?"
    stm = DB.prepare(d, query)
end
for j in NUMBER_OF_TYPES+1:TOTAL_DISTRIBUTIONS
    println("SINK ", j - NUMBER_OF_TYPES)
    println()
    for (i, entry) in enumerate(est_alloc)
        if entry == j
            name = institutions[i]
            # number of trailing characters (the sink label) to strip from the name
            if occursin("(private sector)", name)
                ch = 17   # length of " (private sector)"
            elseif occursin("(public sector)", name) || occursin("(academic sink)", name)
                ch = 16   # length of " (public sector)" and of " (academic sink)"
            else
                ch = 0    # names without a sink label are used as-is
            end
            base = name[1:prevind(name, end, ch)]
            # each underlying institution is listed and saved only once
            if !(base in B)
                iid, oids = name_to_oid(base, institution_mapping, oid_mapping)
                println(name, " ", oids)
                if isempty(oids)
                    println("****** error in data*****")
                    n += 1
                end
                if SAVE_TO_DATABASE
                    save_type(j, oids, stm)
                end
                push!(B, base)
            end
        end
    end
    println()
    println(n, " errors counted")
end
if SAVE_TO_DATABASE
    DB.close(d)
end

SINK 1

CNRS (Centre national de la recherche scientifique) (academic sink) ["270"]
De Nederlandsche Bank (academic sink) ["3357"]
Institute of Economics (Ekonomski institut), Zagreb (academic sink) ["3474"]
University of Bremen (Universität Bremen) (academic sink) ["7847"]
Texas A&M University, College Station (academic sink) ["1990", "596", "1823", "3003"]
Universitas Indonesia (academic sink) ["1544"]
University of Oxford (academic sink) ["1050", "1398", "1233", "612", "693", "1832", "1504", "5931", "217"]
Universitat Pompeu Fabra (academic sink) ["453", "998"]
City University of London (academic sink) ["883", "1094"]
Coventry University (academic sink) ["2996"]
Centre d'Etudes Prospectives et d'Informations Internationales (CEPII) (academic sink) ["1696"]
University of Sussex (academic sink) ["430", "2549", "831"]
Johns Hopkins University (academic sink) ["1043", "49", "1345", "2864", "2340", "2624", "1328", "537"]
State University of New York at Buffalo (academic sink) ["1756", "4
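
The fixed character counts 16 and 17 in the sink cell are the lengths of the suffixes " (public sector)"/" (academic sink)" and " (private sector)". The alternative sketch below is our suggestion, not the notebook's code: it strips the suffix with an end-anchored regular expression, so the lengths never need to be counted and parentheses inside the name itself are kept. The names `sink_suffix` and `base_name` are hypothetical.

```julia
# A trailing sink label, anchored at the end of the string.
sink_suffix = r" \((?:private sector|public sector|academic sink)\)$"

# Institution name with any trailing sink label removed.
base_name(name::AbstractString) = replace(name, sink_suffix => "")

oxford = base_name("University of Oxford (academic sink)")
cnrs = base_name("CNRS (Centre national de la recherche scientifique) (academic sink)")
plain = base_name("Rice University")
```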

In [39]:
println("acd_sink ",length(acd_sink))
println("teaching ",length(tch_sink))
println("academic ",length(academic))
println(" gov ", length(gov_sink))
println("private ", length(pri_sink))
println("institutions ", length(institutions))

acd_sink 492
teaching 983
academic 468
 gov 114
private 136
institutions 2193


## References

Karrer, B., and M. E. J. Newman (2011): "Stochastic Blockmodels and Community Structure in Networks," Physical Review E, 83(1).

Peixoto, T. (2014): "Efficient Monte Carlo and Greedy Heuristic for the Inference of Stochastic Block Models," Physical Review E, 89(1).