Convert quadrinomials to infrasubspecies script #819

Closed
2 tasks done
jonkerz opened this issue Dec 20, 2019 · 4 comments

Comments

@jonkerz
Member

jonkerz commented Dec 20, 2019

See #714, AC issue: https://antcat.org/issues/41

TODO:

jonkerz added the script label Dec 20, 2019
jonkerz mentioned this issue Dec 20, 2019
@jonkerz
Member Author

jonkerz commented Dec 20, 2019

Batch 1: Quadrinomials where the target subspecies exists

Script
jonkerz = User.find 60
Activity.execute_script_activity jonkerz, "Convert quadrinomials to infrasubspecies [batch 1], see %github819"

# For PaperTrail
antcat_bot = User.find 62
PaperTrail.request.whodunnit = antcat_bot.id

def quadrinomials
  Subspecies.joins(:name).where("(LENGTH(names.name) - LENGTH(REPLACE(names.name, ' ', '')) >= 3) ")
end

def puts_stats
  puts "Quadrinomials count: #{quadrinomials.count}"
end

def fix! soon_infrasubspecies, antcat_bot
  name_string = soon_infrasubspecies.name_cache
  raise "#{soon_infrasubspecies.id}: has soft-validation issues" if soon_infrasubspecies.soft_validation_warnings.size.positive?
  raise "#{soon_infrasubspecies.id}: #{name_string} contains weird characters" unless name_string =~ /^[[:alpha:][:blank:]-]+$/

  target_subspecies_name_string = name_string.split[0..2].join(' ')
  target_subspecies = Subspecies.where(name_cache: target_subspecies_name_string)

  target_subspecies_count = target_subspecies.count
  if target_subspecies_count == 0
    puts "#{soon_infrasubspecies.id}: found no subspecies".blue
    return
  end

  if target_subspecies_count > 1
    raise "#{soon_infrasubspecies.id}: found too many subspecies"
  end

  the_target_subspecies = target_subspecies.first

  Taxon.transaction do
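    # Change the STI type on both the name and the taxon. `update_columns` writes
    # straight to the database, skipping validations and callbacks.
    soon_infrasubspecies.name.update!(type: 'InfrasubspeciesName')
    soon_infrasubspecies.update_columns(type: 'Infrasubspecies')

    # Re-find the record so it is instantiated as an Infrasubspecies before linking it.
    infrasubspecies = Infrasubspecies.find(soon_infrasubspecies.id)
    infrasubspecies.update!(subspecies: the_target_subspecies)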

    puts "#{soon_infrasubspecies.id}: fixed!".green

    if infrasubspecies.soft_validation_warnings.size.positive?
      puts "#{soon_infrasubspecies.id}: but it has soft-validation issues...".red
    end

    infrasubspecies.create_activity :update, antcat_bot, edit_summary: "Convert subspecies quadrinomial to infrasubspecies quadrinomial, see %github819"
  end
end

puts_stats

quadrinomials.each do |taxon|
  fix! taxon, antcat_bot
end; nil

puts_stats

puts "Done"
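
The name handling in the script above can be tried without a database. This is a minimal plain-Ruby sketch: the helper names `quadrinomial?` and `target_subspecies_name` are hypothetical, and the example name is a made-up placeholder rather than a real taxon. It mirrors the SQL space count (three or more spaces means four or more words) and the `split[0..2]` step that derives the parent subspecies name.

```ruby
# Plain-Ruby sketch of the name handling; no Rails or database required.
# `quadrinomial?` and `target_subspecies_name` are hypothetical helper names.

# Mirrors the SQL check: LENGTH(name) - LENGTH(REPLACE(name, ' ', '')) >= 3
# counts spaces, so three or more spaces means four or more words.
def quadrinomial?(name)
  name.count(' ') >= 3
end

# The target subspecies name is the first three words of the quadrinomial.
def target_subspecies_name(name)
  name.split[0..2].join(' ')
end

example = 'Aus bus cus dus' # made-up placeholder, not a real taxon name
puts quadrinomial?(example)
puts target_subspecies_name(example)
```
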
Results

Quadrinomials count: 763

430274: found no subspecies
430286: found no subspecies
430288: found no subspecies
430291: fixed!
430310: fixed!
430337: fixed!
430351: fixed!
430355: found no subspecies
430359: fixed!
430377: fixed!
430585: fixed!
430687: fixed!
430738: found no subspecies
430743: found no subspecies
430745: found no subspecies
430759: found no subspecies
430768: found no subspecies
430770: found no subspecies
430771: found no subspecies
430887: found no subspecies
431016: fixed!
431020: found no subspecies
431173: fixed!
431203: fixed!
431244: found no subspecies
431284: found no subspecies
431285: found no subspecies
431426: found no subspecies
431553: fixed!
431554: fixed!
431559: fixed!
431579: fixed!
431651: fixed!
431678: found no subspecies
431745: fixed!
431824: found no subspecies
431825: found no subspecies
431902: found no subspecies
431925: found no subspecies
431949: found no subspecies
431955: found no subspecies
432060: fixed!
432068: found no subspecies
432109: found no subspecies
432110: found no subspecies
432115: found no subspecies
432125: fixed!
432144: found no subspecies
432157: fixed!
432165: fixed!
432181: found no subspecies
432188: fixed!
432229: fixed!
432242: fixed!
432243: found no subspecies
432267: fixed!
432279: fixed!
432285: found no subspecies
432287: found no subspecies
432295: fixed!
432298: found no subspecies
432303: found no subspecies
432304: found no subspecies
432305: found no subspecies
432306: found no subspecies
432326: found no subspecies
432337: found no subspecies
432341: fixed!
432342: fixed!
432346: found no subspecies
432393: fixed!
432410: found no subspecies
432429: found no subspecies
432431: found no subspecies
432435: found no subspecies
432451: fixed!
432459: fixed!
432469: fixed!
432484: fixed!
432490: found no subspecies
432495: found no subspecies
432500: found no subspecies
432507: fixed!
432612: fixed!
432624: fixed!
432637: found no subspecies
432646: found no subspecies
432650: found no subspecies
432663: found no subspecies
432672: fixed!
432683: found no subspecies
432710: fixed!
432723: found no subspecies
432747: found no subspecies
432758: found no subspecies
432770: found no subspecies
432814: found no subspecies
432820: fixed!
432829: found no subspecies
432837: found no subspecies
432845: found no subspecies
432859: found no subspecies
432874: fixed!
432881: found no subspecies
432926: found no subspecies
432982: fixed!
433012: found no subspecies
433019: fixed!
433058: fixed!
433060: found no subspecies
433067: found no subspecies
433082: found no subspecies
433083: found no subspecies
433085: found no subspecies
433091: fixed!
433110: found no subspecies
433128: found no subspecies
433132: fixed!
433149: found no subspecies
433151: found no subspecies
433167: found no subspecies
433169: found no subspecies
433179: fixed!
433204: found no subspecies
433209: found no subspecies
433224: found no subspecies
433260: found no subspecies
433261: found no subspecies
433281: found no subspecies
433293: found no subspecies
433310: found no subspecies
433312: found no subspecies
433326: found no subspecies
433327: found no subspecies
433336: found no subspecies
433338: found no subspecies
433347: found no subspecies
433348: fixed!
433357: fixed!
433380: found no subspecies
433385: found no subspecies
433396: found no subspecies
433416: found no subspecies
433428: fixed!
433464: found no subspecies
433479: fixed!
433522: found no subspecies
433540: fixed!
433586: fixed!
433589: found no subspecies
433596: found no subspecies
433607: found no subspecies
433616: found no subspecies
433622: found no subspecies
433634: fixed!
433644: found no subspecies
433650: found no subspecies
433654: found no subspecies
433677: found no subspecies
433683: found no subspecies
433693: found no subspecies
433711: found no subspecies
433714: fixed!
433776: found no subspecies
433813: found no subspecies
433818: fixed!
433820: found no subspecies
433827: found no subspecies
433834: found no subspecies
433841: found no subspecies
433858: found no subspecies
433905: fixed!
433906: fixed!
433930: found no subspecies
433931: found no subspecies
433946: found no subspecies
433968: found no subspecies
433992: found no subspecies
433997: found no subspecies
434019: fixed!
434040: fixed!
434070: found no subspecies
434372: found no subspecies
434373: found no subspecies
434378: found no subspecies
434379: found no subspecies
434384: found no subspecies
434394: found no subspecies
434398: found no subspecies
434413: found no subspecies
434415: found no subspecies
434424: found no subspecies
434442: found no subspecies
434457: found no subspecies
434468: found no subspecies
434481: found no subspecies
434484: found no subspecies
434492: found no subspecies
434504: found no subspecies
434511: found no subspecies
434599: fixed!
434621: found no subspecies
434646: fixed!
434740: found no subspecies
434769: fixed!
434789: fixed!
434857: found no subspecies
434884: fixed!
435165: fixed!
435181: found no subspecies
435245: found no subspecies
435246: found no subspecies
435253: fixed!
435255: fixed!
435258: fixed!
435296: found no subspecies
435322: fixed!
435331: found no subspecies
435373: found no subspecies
435378: found no subspecies
435380: fixed!
435396: found no subspecies
435431: found no subspecies
435437: found no subspecies
435441: fixed!
435450: found no subspecies
435458: fixed!
435460: fixed!
435470: found no subspecies
435478: fixed!
435481: found no subspecies
435483: fixed!
435490: fixed!
435512: found no subspecies
435521: fixed!
435530: found no subspecies
435538: fixed!
435555: found no subspecies
435571: found no subspecies
435588: fixed!
435633: fixed!
435656: fixed!
435658: found no subspecies
435688: found no subspecies
435693: found no subspecies
435695: found no subspecies
435732: found no subspecies
435753: found no subspecies
435763: fixed!
435773: found no subspecies
435789: fixed!
435795: found no subspecies
435799: found no subspecies
435801: found no subspecies
435853: fixed!
435856: found no subspecies
435875: found no subspecies
435881: fixed!
435893: fixed!
435894: found no subspecies
435905: fixed!
435915: found no subspecies
435962: found no subspecies
435969: fixed!
435972: fixed!
435987: fixed!
435993: found no subspecies
436053: found no subspecies
436060: found no subspecies
436064: fixed!
436066: found no subspecies
436077: fixed!
436079: fixed!
436092: found no subspecies
436097: fixed!
436099: fixed!
436100: found no subspecies
436102: fixed!
436104: found no subspecies
436132: found no subspecies
436136: found no subspecies
436149: found no subspecies
436167: fixed!
436257: found no subspecies
436312: found no subspecies
436320: found no subspecies
436324: found no subspecies
436328: found no subspecies
436329: found no subspecies
436331: found no subspecies
436333: found no subspecies
436334: found no subspecies
436341: found no subspecies
436342: found no subspecies
436356: found no subspecies
436358: found no subspecies
436359: found no subspecies
436366: found no subspecies
436367: found no subspecies
436368: found no subspecies
436369: found no subspecies
436385: found no subspecies
436588: fixed!
436757: fixed!
436768: found no subspecies
436815: fixed!
436862: fixed!
436876: fixed!
436916: found no subspecies
436923: found no subspecies
436925: found no subspecies
436942: found no subspecies
436943: found no subspecies
437006: found no subspecies
437108: found no subspecies
437236: fixed!
437266: found no subspecies
437269: found no subspecies
437293: found no subspecies
437354: found no subspecies
437382: found no subspecies
437384: found no subspecies
437388: fixed!
437389: found no subspecies
437392: found no subspecies
437449: found no subspecies
437455: found no subspecies
437465: fixed!
437494: found no subspecies
437523: fixed!
437532: fixed!
437572: found no subspecies
437575: found no subspecies
437579: found no subspecies
437588: fixed!
437602: fixed!
437608: fixed!
437611: fixed!
437627: found no subspecies
437629: found no subspecies
437701: fixed!
437704: fixed!
437708: found no subspecies
437730: found no subspecies
437753: found no subspecies
437762: fixed!
437775: fixed!
437802: fixed!
437817: found no subspecies
437827: found no subspecies
437830: found no subspecies
437841: found no subspecies
437846: found no subspecies
437848: fixed!
437877: fixed!
438026: found no subspecies
438160: found no subspecies
438169: found no subspecies
438172: found no subspecies
438322: found no subspecies
438426: fixed!
438492: found no subspecies
438500: found no subspecies
438503: found no subspecies
438522: fixed!
438524: found no subspecies
438542: found no subspecies
438559: found no subspecies
438571: found no subspecies
438609: fixed!
438647: found no subspecies
438710: fixed!
438728: fixed!
438737: found no subspecies
438767: fixed!
438865: fixed!
438886: fixed!
438908: fixed!
438936: found no subspecies
438945: fixed!
438974: found no subspecies
438991: fixed!
439013: fixed!
439015: found no subspecies
439144: fixed!
439150: found no subspecies
439255: found no subspecies
439286: fixed!
439323: fixed!
439325: fixed!
439364: fixed!
439365: found no subspecies
439366: fixed!
439373: found no subspecies
439546: found no subspecies
439552: fixed!
439553: fixed!
439554: found no subspecies
439556: fixed!
439558: found no subspecies
439563: fixed!
439566: fixed!
439567: fixed!
439568: fixed!
439569: found no subspecies
439575: fixed!
439576: fixed!
439580: found no subspecies
439582: fixed!
439583: found no subspecies
439589: found no subspecies
439597: fixed!
439600: found no subspecies
439602: found no subspecies
439603: found no subspecies
439604: found no subspecies
439607: fixed!
439613: found no subspecies
439648: fixed!
439649: found no subspecies
439654: found no subspecies
439660: fixed!
439843: found no subspecies
439883: fixed!
439891: found no subspecies
440011: found no subspecies
440051: found no subspecies
440066: found no subspecies
440077: found no subspecies
440079: found no subspecies
440081: found no subspecies
440105: found no subspecies
440108: found no subspecies
440134: found no subspecies
440151: found no subspecies
440153: found no subspecies
440161: found no subspecies
440176: found no subspecies
440177: found no subspecies
440185: found no subspecies
440196: found no subspecies
440205: found no subspecies
440213: found no subspecies
440230: found no subspecies
440235: found no subspecies
440237: found no subspecies
440249: found no subspecies
440251: found no subspecies
440342: found no subspecies
440354: found no subspecies
440356: found no subspecies
440399: found no subspecies
440415: found no subspecies
440486: found no subspecies
440523: found no subspecies
440551: found no subspecies
440570: found no subspecies
440591: found no subspecies
440671: fixed!
440716: found no subspecies
440720: found no subspecies
440728: found no subspecies
440734: fixed!
440778: found no subspecies
440783: found no subspecies
440794: found no subspecies
440795: found no subspecies
440830: found no subspecies
440843: found no subspecies
440863: found no subspecies
440888: found no subspecies
441239: fixed!
441392: found no subspecies
441408: found no subspecies
441425: found no subspecies
441468: found no subspecies
441484: found no subspecies
441663: found no subspecies
441666: found no subspecies
441669: found no subspecies
441728: fixed!
441762: fixed!
442171: fixed!
442194: fixed!
442245: found no subspecies
442369: fixed!
442401: fixed!
442466: found no subspecies
442478: found no subspecies
442494: found no subspecies
442516: found no subspecies
442539: found no subspecies
442542: found no subspecies
442586: found no subspecies
442589: found no subspecies
442600: found no subspecies
442601: found no subspecies
442603: fixed!
442625: fixed!
442771: fixed!
442963: fixed!
443125: found no subspecies
443183: fixed!
443315: fixed!
443325: fixed!
443340: found no subspecies
443344: fixed!
443391: fixed!
443392: fixed!
443414: found no subspecies
443514: found no subspecies
443530: fixed!
443552: found no subspecies
443625: found no subspecies
443659: found no subspecies
443669: found no subspecies
443706: fixed!
443752: found no subspecies
443755: fixed!
443772: found no subspecies
443799: found no subspecies
443828: found no subspecies
443833: found no subspecies
443890: found no subspecies
443925: found no subspecies
443929: found no subspecies
443932: fixed!
443954: found no subspecies
443981: found no subspecies
444045: fixed!
444066: found no subspecies
444093: fixed!
444181: fixed!
444187: fixed!
444193: found no subspecies
444196: found no subspecies
444215: found no subspecies
444240: fixed!
444325: found no subspecies
444334: fixed!
444340: fixed!
444343: found no subspecies
444375: found no subspecies
444390: fixed!
444407: found no subspecies
444411: fixed!
444486: fixed!
444515: found no subspecies
444544: fixed!
444559: found no subspecies
444623: fixed!
444625: found no subspecies
444639: found no subspecies
444674: fixed!
444752: found no subspecies
444839: fixed!
444856: fixed!
444871: found no subspecies
444876: found no subspecies
444886: found no subspecies
444888: found no subspecies
444941: found no subspecies
445078: fixed!
445123: found no subspecies
445165: fixed!
445189: found no subspecies
445249: found no subspecies
445336: found no subspecies
445342: found no subspecies
445347: found no subspecies
445361: fixed!
445387: found no subspecies
445396: found no subspecies
445419: found no subspecies
445465: found no subspecies
445468: found no subspecies
445627: found no subspecies
445635: found no subspecies
445721: found no subspecies
445756: found no subspecies
445926: found no subspecies
445969: found no subspecies
446027: found no subspecies
446047: fixed!
446062: found no subspecies
446125: found no subspecies
446144: found no subspecies
446184: found no subspecies
446228: fixed!
446266: found no subspecies
446294: found no subspecies
446357: found no subspecies
446410: found no subspecies
446462: found no subspecies
446526: found no subspecies
446714: found no subspecies
446734: found no subspecies
446738: fixed!
446742: found no subspecies
446748: found no subspecies
447055: found no subspecies
447093: fixed!
447130: fixed!
447218: found no subspecies
447267: found no subspecies
447363: found no subspecies
447408: found no subspecies
447429: found no subspecies
447675: found no subspecies
447747: fixed!
447754: fixed!
447873: found no subspecies
447903: fixed!
447905: fixed!
447928: fixed!
447960: fixed!
448257: found no subspecies
448882: found no subspecies
449277: fixed!
449304: found no subspecies
449355: found no subspecies
449367: fixed!
450089: found no subspecies
450090: found no subspecies
450109: fixed!
450165: fixed!
450175: found no subspecies
450184: found no subspecies
450220: fixed!
450227: found no subspecies
450230: fixed!
450252: found no subspecies
450262: fixed!
450288: found no subspecies
450302: found no subspecies
450307: found no subspecies
450348: fixed!
450351: found no subspecies
450355: fixed!
450356: found no subspecies
450370: found no subspecies
450444: fixed!
450475: found no subspecies
450480: found no subspecies
450505: fixed!
450590: found no subspecies
450618: found no subspecies
450639: found no subspecies
450674: found no subspecies
450681: found no subspecies
450746: found no subspecies
450748: found no subspecies
450760: found no subspecies
450786: fixed!
450796: found no subspecies
450876: fixed!
450906: found no subspecies
450990: fixed!
451016: fixed!
451056: fixed!
451078: fixed!
451087: found no subspecies
451190: fixed!
451228: fixed!
451356: found no subspecies
451357: found no subspecies
451365: found no subspecies
451370: found no subspecies
451371: found no subspecies
451376: found no subspecies
451399: found no subspecies
451401: found no subspecies
451424: found no subspecies
451462: fixed!
451492: found no subspecies
456462: fixed!
456717: found no subspecies
457066: found no subspecies
457083: found no subspecies
457111: fixed!
457192: fixed!
457207: fixed!
457255: found no subspecies
457276: fixed!
457615: fixed!
457703: found no subspecies
457780: found no subspecies
457881: fixed!
457962: fixed!
458031: found no subspecies
458112: fixed!
458121: fixed!
458147: fixed!
458162: fixed!
458184: fixed!
458186: found no subspecies
458332: fixed!
458333: found no subspecies
458338: found no subspecies
458343: fixed!
458421: found no subspecies
458424: fixed!
458852: fixed!
458902: fixed!
459015: fixed!
459038: fixed!
459349: fixed!
459495: fixed!
459499: fixed!
459507: fixed!
459509: fixed!
459776: found no subspecies
459807: fixed!
459838: fixed!
459909: found no subspecies
460466: fixed!
461201: fixed!
461226: found no subspecies
461278: fixed!
461299: fixed!
461311: fixed!
461322: fixed!
461383: found no subspecies
464793: fixed!
464914: fixed!
465783: found no subspecies
466016: found no subspecies
466017: found no subspecies
470441: fixed!
470483: fixed!
470566: fixed!
470576: fixed!
470584: fixed!
470615: fixed!
470632: fixed!
470643: fixed!
470659: found no subspecies
470689: fixed!
470690: fixed!
470789: fixed!
470909: found no subspecies
470961: found no subspecies
470993: fixed!
470994: fixed!
470996: fixed!
472997: fixed!
475263: fixed!
475270: fixed!
475276: fixed!
475311: fixed!
475319: fixed!
475320: fixed!
475462: fixed!
475751: fixed!
475842: fixed!
479326: fixed!
486512: fixed!
486817: fixed!
497260: fixed!
506795: fixed!
506898: fixed!
508189: found no subspecies
508199: found no subspecies

Quadrinomials count: 474
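
The count dropped from 763 to 474, so 289 records left the quadrinomials query, presumably the ones logged as `fixed!`. To check a log like the one above, the lines can be grouped by outcome. A minimal sketch, assuming the `ID: message` line format shown in the results; the sample lines are copied from the start of the log and `ids_by_outcome` is a hypothetical variable name.

```ruby
# Sample lines copied from the Batch 1 results above.
log = <<~LOG
  430274: found no subspecies
  430286: found no subspecies
  430288: found no subspecies
  430291: fixed!
  430310: fixed!
LOG

# Group taxon IDs by the message that follows the "ID: " prefix.
ids_by_outcome = Hash.new { |hash, key| hash[key] = [] }
log.each_line(chomp: true) do |line|
  id, outcome = line.split(': ', 2)
  # Skip blank lines and anything that does not start with a numeric ID.
  next unless id&.match?(/\A\d+\z/) && outcome

  ids_by_outcome[outcome] << id.to_i
end

ids_by_outcome.each { |outcome, ids| puts "#{outcome}: #{ids.size}" }
```

The IDs under `found no subspecies` are the ones the next step tries to recover.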

@jonkerz
Member Author

jonkerz commented Dec 20, 2019

Step 2: Re-create missing subspecies

Script
jonkerz = User.find(60)
Activity.execute_script_activity jonkerz, "Reified missing subspecies for quadrinomial, see %github819"

# For PaperTrail
def antcat_bot
  @antcat_bot ||= User.find(62)
end
PaperTrail.request.whodunnit = antcat_bot.id

$reified_with_issues = []
$reified_without_issues = []

def quadrinomials
  Subspecies.joins(:name).where("(LENGTH(names.name) - LENGTH(REPLACE(names.name, ' ', '')) >= 3) ")
end

def reify_subspecies subspecies_version
  puts "reify_subspecies.... version #{subspecies_version.id}".blue
  if (taxon = Taxon.find_by(id: subspecies_version.item_id))
    puts "taxon exists now".green
    return taxon
  end

  reified = subspecies_version.reify
  unless reified.is_a?(Subspecies)
    raise "reified subspecies is not a subspecies".red
  end

  unless reified.species
    puts "reified subspecies has no species".red
    return
  end

  if reified.status == Status::UNAVAILABLE_UNCATEGORIZED && reified.current_valid_taxon.nil?
    puts "#{subspecies_version.id}: reified subspecies has no current_valid_taxon".red
    return
  end

  raise "Name #{reified.name_id} already exists" if Name.where(id: reified.name_id).exists?

  reified_name = PaperTrail::Version.where(item_type: "Name", item_id: reified.name_id).last&.reify
  raise "found no name to reify" unless reified_name

  Taxon.transaction do
    if reified_name && reified_name.save!
      reified.ichnotaxon ||= false
      reified.nomen_nudum ||= false
      reified.collective_group_name ||= false
      reified.update!(name: reified_name)
      puts "reified #{reified.id}".green
    end
  end

  if reified.persisted?
    if reified.soft_validation_warnings.present?
      puts "#{reified.id}: reified has soft-validation issues...".red
      $reified_with_issues << reified.id
      reified.create_activity :create, antcat_bot, edit_summary: "Reified missing subspecies for quadrinomial (taxon has soft-validation issues), see %github819"
    else
      $reified_without_issues << reified.id
      reified.create_activity :create, antcat_bot, edit_summary: "Reified missing subspecies for quadrinomial, see %github819"
    end
    return reified
  else
    raise "could not persist subspecies version #{subspecies_version.id}"
  end
end

quadrinomials.each do |taxon|
  name_string = taxon.name_cache
  raise "#{taxon.id}: has soft-validation issues" if taxon.soft_validation_warnings.size.positive?
  raise "#{taxon.id}: #{name_string} contains weird characters" unless name_string =~ /^[[:alpha:][:blank:]-]+$/

  target_subspecies_name_string = name_string.split[0..2].join(' ')

  subspecies_version = PaperTrail::Version.where(item_type: "Taxon").where("object LIKE ?", "%name_cache: #{target_subspecies_name_string}\n%").last
  if subspecies_version
    reify_subspecies subspecies_version
  else
    puts "no subspecies version".red
  end
end; nil

puts "reified_with_issues = #{$reified_with_issues}"
puts "reified_without_issues = #{$reified_without_issues}"
puts "all_reified = #{$reified_with_issues + $reified_without_issues}"
puts "Done"
Results
reified_with_issues = [497287, 505104, 504885, 491154, 505448, 491526, 490467, 503153, 464741, 490044, 490574, 465957, 491372, 491120, 490618, 490074, 468634, 505895, 503332, 506856, 493406, 474883, 489783, 479474, 504415]

reified_without_issues = [495560, 462941, 462965, 463175, 463496, 463950, 464622, 465638, 466281, 464310, 464240, 484921, 465222, 466309, 465243, 490480, 466590, 466785, 502707, 490110, 484327, 495209, 466360, 484067, 484544, 466679, 464484, 485279, 466965, 467194, 485524, 467899, 467839, 467768, 468269, 492242, 468586, 468180, 467863, 485569, 485547, 468912, 469146, 469078, 503579, 469088, 469835, 470938, 470915, 470764, 470233, 486559, 471473, 471524, 471512, 471780, 472253, 506348, 472871, 473260, 473004, 474014, 473690, 474020, 473799, 473844, 473985, 474100, 487479, 474205, 475486, 475633, 475542, 477616, 476479, 476610, 476616, 477908, 477856, 487997, 488218, 488179, 477886, 488255, 488112, 478282, 478521, 488143, 488261, 488100, 478098, 477876, 488161, 488205, 479026, 488823, 488707, 479605, 479793, 488826, 488811, 488936, 480053, 488954, 480405, 480867, 481533, 481701, 481039, 481857, 489463, 482134, 482352, 466921, 471750, 472177, 480387, 483108, 466001, 465681]

all_reified = [497287, 505104, 504885, 491154, 505448, 491526, 490467, 503153, 464741, 490044, 490574, 465957, 491372, 491120, 490618, 490074, 468634, 505895, 503332, 506856, 493406, 474883, 489783, 479474, 504415, 495560, 462941, 462965, 463175, 463496, 463950, 464622, 465638, 466281, 464310, 464240, 484921, 465222, 466309, 465243, 490480, 466590, 466785, 502707, 490110, 484327, 495209, 466360, 484067, 484544, 466679, 464484, 485279, 466965, 467194, 485524, 467899, 467839, 467768, 468269, 492242, 468586, 468180, 467863, 485569, 485547, 468912, 469146, 469078, 503579, 469088, 469835, 470938, 470915, 470764, 470233, 486559, 471473, 471524, 471512, 471780, 472253, 506348, 472871, 473260, 473004, 474014, 473690, 474020, 473799, 473844, 473985, 474100, 487479, 474205, 475486, 475633, 475542, 477616, 476479, 476610, 476616, 477908, 477856, 487997, 488218, 488179, 477886, 488255, 488112, 478282, 478521, 488143, 488261, 488100, 478098, 477876, 488161, 488205, 479026, 488823, 488707, 479605, 479793, 488826, 488811, 488936, 480053, 488954, 480405, 480867, 481533, 481701, 481039, 481857, 489463, 482134, 482352, 466921, 471750, 472177, 480387, 483108, 466001, 465681]
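
The version lookup in the script matches on `name_cache: <name>\n`, including the trailing newline. A minimal sketch of why the newline matters, assuming the `object` column holds YAML text as the `LIKE` pattern suggests; the two version strings are heavily trimmed, hypothetical stand-ins with made-up names. Without the newline, a search for the trinomial would also match the quadrinomial's own versions, because the trinomial is a prefix of the quadrinomial.

```ruby
# Hypothetical, heavily trimmed stand-ins for the YAML stored in `versions.object`.
trinomial_version = "id: 1\nname_cache: Aus bus cus\ntype: Subspecies\n"
quadrinomial_version = "id: 2\nname_cache: Aus bus cus dus\ntype: Subspecies\n"

target = 'Aus bus cus'

# Equivalent of: object LIKE '%name_cache: Aus bus cus\n%'
with_newline = "name_cache: #{target}\n"
# Equivalent of: object LIKE '%name_cache: Aus bus cus%'
without_newline = "name_cache: #{target}"

puts trinomial_version.include?(with_newline)       # the subspecies version matches
puts quadrinomial_version.include?(with_newline)    # the quadrinomial version does not
puts quadrinomial_version.include?(without_newline) # without the newline it would
```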

@jonkerz
Member Author

jonkerz commented Jan 28, 2020

Batch 2: Quadrinomials where a Subspecies with the target name exists after missing subspecies were recreated

Script
jonkerz = User.find 60
EDIT_SUMMARY = "Convert quadrinomials to infrasubspecies [batch 2], see %github819"
Activity.execute_script_activity jonkerz, EDIT_SUMMARY

# For PaperTrail
antcat_bot = User.find 62
PaperTrail.request.whodunnit = antcat_bot.id

def quadrinomials
  Subspecies.joins(:name).where("(LENGTH(names.name) - LENGTH(REPLACE(names.name, ' ', '')) >= 3) ")
end

def puts_stats
  puts "Quadrinomials count: #{quadrinomials.count}"
end

def fix! soon_infrasubspecies, antcat_bot, edit_summary
  name_string = soon_infrasubspecies.name_cache
  soon_infrasubspecies_validation_issues = soon_infrasubspecies.soft_validations.failed.reject do |validation|
    validation.database_script.is_a?(DatabaseScripts::UnavailableUncategorizedTaxa)
  end
  raise "#{soon_infrasubspecies.id}: has soft-validation issues" if soon_infrasubspecies_validation_issues.present?
  raise "#{soon_infrasubspecies.id}: #{name_string} contains weird characters" unless name_string =~ /^[[:alpha:][:blank:]-]+$/

  target_subspecies_name_string = name_string.split[0..2].join(' ')
  target_subspecies = Subspecies.where(name_cache: target_subspecies_name_string)

  target_subspecies_count = target_subspecies.count
  if target_subspecies_count == 0
    puts "#{soon_infrasubspecies.id}: found no subspecies".blue
    return
  end

  if target_subspecies_count > 1
    raise "#{soon_infrasubspecies.id}: found too many subspecies"
  end

  the_target_subspecies = target_subspecies.first

  the_target_subspecies_validation_issues = the_target_subspecies.soft_validations.failed.reject do |validation|
    validation.database_script.is_a?(DatabaseScripts::UnavailableUncategorizedTaxa) ||
      validation.database_script.is_a?(DatabaseScripts::NonValidTaxaWithACurrentValidTaxonThatIsNotValid)
  end

  if the_target_subspecies_validation_issues.present?
    puts "target subspecies #{the_target_subspecies.id} has soft-validation issues"
    return
  end

  Taxon.transaction do
    soon_infrasubspecies.name.update!(type: 'InfrasubspeciesName')
    soon_infrasubspecies.update_columns(type: 'Infrasubspecies')

    infrasubspecies = Infrasubspecies.find(soon_infrasubspecies.id)
    infrasubspecies.update!(subspecies: the_target_subspecies)

    puts "#{soon_infrasubspecies.id}: fixed!".green

    if infrasubspecies.soft_validations.failed.present?
      puts "#{soon_infrasubspecies.id}: but it has soft-validation issues...".red
    end

    infrasubspecies.create_activity :update, antcat_bot, edit_summary: edit_summary
  end
end

puts_stats

quadrinomials.each do |taxon|
  fix! taxon, antcat_bot, EDIT_SUMMARY
end; nil

puts_stats

puts "Done"
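
The main change from batch 1 is that some known soft-validation failures no longer block a conversion. A minimal sketch of that filtering, using stand-in classes so it runs outside the app: `FailedValidation`, `IGNORED_SCRIPTS`, `blocking_issues` and `SomeOtherCheck` are hypothetical names, and only `DatabaseScripts::UnavailableUncategorizedTaxa` is taken from the script above.

```ruby
# Stand-ins so the sketch runs on its own; the real classes live in the app.
module DatabaseScripts
  class UnavailableUncategorizedTaxa; end
  class SomeOtherCheck; end # made-up name for illustration
end

# Hypothetical minimal shape of a failed soft validation.
FailedValidation = Struct.new(:database_script)

# Failures from these scripts are tolerated and do not block a conversion.
IGNORED_SCRIPTS = [DatabaseScripts::UnavailableUncategorizedTaxa].freeze

# Returns only the failures that should still stop the conversion.
def blocking_issues(failed_validations)
  failed_validations.reject do |validation|
    IGNORED_SCRIPTS.any? { |klass| validation.database_script.is_a?(klass) }
  end
end

only_ignored = [FailedValidation.new(DatabaseScripts::UnavailableUncategorizedTaxa.new)]
mixed = only_ignored + [FailedValidation.new(DatabaseScripts::SomeOtherCheck.new)]

puts blocking_issues(only_ignored).empty?
puts blocking_issues(mixed).size
```

Keeping the tolerated scripts in one list makes it easy to see what each batch chose to ignore.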

@jonkerz
Member Author

jonkerz commented Jul 15, 2020

Batch 3/4/5

Script

EDIT_SUMMARY = 'Convert quadrinomials to infrasubspecies [batch 5], see %github819'
ANTCATBOT = User.find_by!(name: 'AntCatBot')

# For activities and PaperTrail.
RequestStore.store[:current_request_uuid] = SecureRandom.uuid
PaperTrail.request.whodunnit = ANTCATBOT.id

def quadrinomials
  Subspecies.joins(:name).where("(LENGTH(names.name) - LENGTH(REPLACE(names.name, ' ', '')) >= 3) ")
end

def puts_stats
  puts <<~STATS
    Quadrinomials count: #{quadrinomials.count}
    Infrasubspecies count: #{Infrasubspecies.count}
  STATS
end

def fix! soon_infrasubspecies, antcat_bot, edit_summary
  name_string = soon_infrasubspecies.name_cache
  raise "#{soon_infrasubspecies.id}: has soft-validation issues" if soon_infrasubspecies.soft_validations.failed?
  raise "#{soon_infrasubspecies.id}: #{name_string} contains weird characters" unless name_string =~ /^[[:alpha:][:blank:]-]+$/

  target_subspecies_name_string = name_string.split[0..2].join(' ')
  target_subspecies = Subspecies.where(name_cache: target_subspecies_name_string)

  target_subspecies_count = target_subspecies.count
  if target_subspecies_count == 0
    puts "#{soon_infrasubspecies.id}: found no subspecies".blue
    return
  end

  if target_subspecies_count > 1
    raise "#{soon_infrasubspecies.id}: found too many subspecies"
  end

  the_target_subspecies = target_subspecies.first

  the_target_subspecies_validation_issues = the_target_subspecies.soft_validations.failed.reject do |validation|
    validation.database_script.is_a?(DatabaseScripts::NonValidTaxaWithACurrentTaxonThatIsNotValid)
  end

  if the_target_subspecies_validation_issues.present?
    puts "target subspecies #{the_target_subspecies.id} has soft-validation issues"
    return
  end

  Taxon.transaction do
    soon_infrasubspecies.name.update!(type: 'InfrasubspeciesName')
    soon_infrasubspecies.update_columns(type: 'Infrasubspecies')

    infrasubspecies = Infrasubspecies.find(soon_infrasubspecies.id)
    infrasubspecies.update!(subspecies: the_target_subspecies)

    puts "#{soon_infrasubspecies.id}: fixed!".green

    if infrasubspecies.soft_validations.failed.present?
      puts "#{soon_infrasubspecies.id}: but it has soft-validation issues...".red
    end

    infrasubspecies.create_activity :update, antcat_bot, edit_summary: edit_summary
  end
end

puts_stats

quadrinomials.each do |taxon|
  fix! taxon, ANTCATBOT, EDIT_SUMMARY
end; nil

Activity.execute_script_activity User.find_by!(name: 'Fredrik Palmkron'), EDIT_SUMMARY

puts_stats

puts "Done"


Output 3

Quadrinomials count: 248
Infrasubspecies count: 491




430288: fixed!
430355: fixed!
430770: fixed!
430771: fixed!
431244: fixed!
431284: fixed!
431285: fixed!
431426: found no subspecies
431824: fixed!
431825: fixed!
431902: found no subspecies
431925: fixed!
431949: found no subspecies
431955: found no subspecies
432115: found no subspecies
432243: fixed!
432298: fixed!
432303: fixed!
432304: fixed!
432306: fixed!
432326: fixed!
432337: found no subspecies
432435: fixed!
432495: fixed!
432650: fixed!
432663: found no subspecies
432683: fixed!
432723: fixed!
432747: fixed!
432758: found no subspecies
432770: fixed!
432837: fixed!
432926: found no subspecies
433012: fixed!
433082: fixed!
433083: fixed!
433085: fixed!
433149: fixed!
433167: found no subspecies
433209: fixed!
433261: fixed!
433293: found no subspecies
433336: fixed!
433338: fixed!
433385: fixed!
433396: fixed!
433522: fixed!
433622: fixed!
433644: fixed!
433776: fixed!
433841: fixed!
433858: fixed!
433930: found no subspecies
433946: fixed!
433968: fixed!
433992: fixed!
434372: fixed!
434373: fixed!
434378: fixed!
434379: fixed!
434384: fixed!
434394: fixed!
434398: fixed!
434413: found no subspecies
434415: fixed!
434442: fixed!
434492: fixed!
434504: fixed!
434511: fixed!
434621: fixed!
434740: fixed!
434857: found no subspecies
435181: fixed!
435246: fixed!
435373: fixed!
435378: fixed!
435396: found no subspecies
435431: fixed!
435450: found no subspecies
435470: fixed!
435481: fixed!
435512: fixed!
435530: fixed!
435555: found no subspecies
435753: fixed!
435773: fixed!
435801: fixed!
435856: fixed!
435875: fixed!
435962: fixed!
436060: fixed!
436066: fixed!
436092: fixed!
436104: fixed!
436324: found no subspecies
436329: found no subspecies
436341: found no subspecies
436358: fixed!
436359: found no subspecies
436768: fixed!
436916: fixed!
436923: fixed!
436925: fixed!
436942: fixed!
437006: fixed!
437108: fixed!
437266: fixed!
437293: fixed!
437354: fixed!
437382: fixed!
437384: fixed!
437455: fixed!
437575: fixed!
437579: fixed!
437627: fixed!
437753: fixed!
437827: fixed!
437830: fixed!
437841: found no subspecies
437846: fixed!
438026: found no subspecies
438160: fixed!
438169: fixed!
438172: found no subspecies
438492: fixed!
438500: fixed!
438524: fixed!
438542: fixed!
438571: fixed!
438647: fixed!
438936: fixed!
438974: fixed!
439015: found no subspecies
439365: fixed!
439546: fixed!
439558: fixed!
439569: found no subspecies
439583: fixed!
439602: fixed!
439603: found no subspecies
439604: found no subspecies
439649: found no subspecies
439654: found no subspecies
439843: fixed!
439891: fixed!
440011: fixed!
440051: fixed!
440066: fixed!
440079: fixed!
440108: fixed!
440134: fixed!
440151: fixed!
440176: found no subspecies
440177: found no subspecies
440185: fixed!
440196: fixed!
440230: fixed!
440237: fixed!
440251: fixed!
440342: found no subspecies
440354: fixed!
440356: fixed!
440486: fixed!
440523: fixed!
440551: fixed!
440570: fixed!
440716: found no subspecies
440720: fixed!
440728: fixed!
440778: fixed!
440795: fixed!
440830: found no subspecies
440863: fixed!
440888: found no subspecies
441392: fixed!
441425: found no subspecies
441484: fixed!
441666: fixed!
442478: fixed!
442494: fixed!
442516: fixed!
442539: fixed!
442586: fixed!
442589: fixed!
442600: fixed!
443125: fixed!
443340: fixed!
443659: found no subspecies
443752: fixed!
443828: fixed!
443954: fixed!
444639: found no subspecies
444871: found no subspecies
444876: found no subspecies
444886: found no subspecies
444888: found no subspecies
444941: found no subspecies
445123: fixed!
445189: fixed!
445249: fixed!
445627: fixed!
446125: fixed!
446357: found no subspecies
446714: fixed!
446734: fixed!
446742: found no subspecies
446748: fixed!
447218: fixed!
447267: fixed!
447429: fixed!
447675: fixed!
449304: found no subspecies
450089: fixed!
450090: found no subspecies
450175: found no subspecies
450184: found no subspecies
450227: fixed!
450252: fixed!
450288: fixed!
450307: found no subspecies
450351: fixed!
450356: fixed!
450370: found no subspecies
450480: fixed!
450618: fixed!
450639: found no subspecies
450681: found no subspecies
450748: found no subspecies
450760: found no subspecies
450796: fixed!
450906: fixed!
451087: found no subspecies
451357: found no subspecies
451376: fixed!
451399: fixed!
451424: fixed!
456717: found no subspecies
457066: found no subspecies
457083: found no subspecies
457255: found no subspecies
457703: found no subspecies
457780: found no subspecies
458031: found no subspecies
458333: found no subspecies
458338: found no subspecies
459909: found no subspecies
461383: found no subspecies
508189: found no subspecies

Quadrinomials count: 70
Infrasubspecies count: 669


Output 4

Quadrinomials count: 70
Infrasubspecies count: 670

431426: found no subspecies
431902: found no subspecies
431949: found no subspecies
431955: fixed!
432115: found no subspecies
432337: found no subspecies
432663: found no subspecies
432758: found no subspecies
432926: found no subspecies
433167: found no subspecies
433293: found no subspecies
433930: found no subspecies
434413: found no subspecies
434857: found no subspecies
435396: found no subspecies
435450: found no subspecies
435555: found no subspecies
436324: found no subspecies
436329: found no subspecies
436341: found no subspecies
436359: found no subspecies
437841: fixed!
438026: fixed!
438172: found no subspecies
439015: fixed!
439569: found no subspecies
439603: found no subspecies
439604: found no subspecies
439649: fixed!
439654: fixed!
440176: fixed!
440177: found no subspecies
440342: found no subspecies
440716: fixed!
440830: fixed!
440888: found no subspecies
441425: fixed!
443659: found no subspecies
444639: found no subspecies
444871: found no subspecies
444876: found no subspecies
444886: found no subspecies
444888: found no subspecies
444941: fixed!
446357: fixed!
446742: fixed!
449304: found no subspecies
450090: fixed!
450175: fixed!
450184: fixed!
450307: found no subspecies
450370: fixed!
450639: found no subspecies
450681: fixed!
450748: fixed!
450760: fixed!
451087: found no subspecies
451357: found no subspecies
456717: fixed!
457066: found no subspecies
457083: fixed!
457255: found no subspecies
457703: fixed!
457780: fixed!
458031: fixed!
458333: found no subspecies
458338: found no subspecies
459909: fixed!
461383: found no subspecies
508189: fixed!

Quadrinomials count: 43
Infrasubspecies count: 697

Output 4

Quadrinomials count: 43
Infrasubspecies count: 700

431426: fixed!
431902: fixed!
431949: fixed!
432115: fixed!
432337: fixed!
432663: fixed!
432758: fixed!
432926: fixed!
433167: fixed!
433293: fixed!
433930: fixed!
434413: fixed!
434857: fixed!
435396: fixed!
435450: fixed!
435555: fixed!
436324: fixed!
436329: fixed!
436341: fixed!
436359: fixed!
438172: fixed!
439569: fixed!
439603: fixed!
439604: fixed!
440177: fixed!
440342: fixed!
440888: fixed!
443659: fixed!
444639: fixed!
444871: fixed!
444876: fixed!
444886: fixed!
444888: fixed!
449304: fixed!
450307: fixed!
450639: fixed!
451087: fixed!
451357: fixed!
457066: fixed!
457255: fixed!
458333: fixed!
458338: fixed!
461383: fixed!

Quadrinomials count: 0
Infrasubspecies count: 743
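
As a sanity check on the logs, the drop in quadrinomials during each run should equal the rise in infrasubspecies. The sketch below is illustrative only: the counts are copied from the "Quadrinomials count" and "Infrasubspecies count" lines of the three outputs, while the `RUNS` structure and the helper names are assumptions made for this example. It does not explain why each run starts with slightly more infrasubspecies than the previous run ended with (669 vs 670, 697 vs 700); those presumably come from changes made outside the script.

```ruby
# Before/after counts copied from the three outputs; structure and names are hypothetical.
RUNS = [
  { label: 'Output 3',          quadrinomials: [248, 70], infrasubspecies: [491, 669] },
  { label: 'Output 4 (first)',  quadrinomials: [70, 43],  infrasubspecies: [670, 697] },
  { label: 'Output 4 (second)', quadrinomials: [43, 0],   infrasubspecies: [700, 743] }
].freeze

# Number of quadrinomials that disappeared during the run.
def converted(run)
  before, after = run[:quadrinomials]
  before - after
end

# Number of infrasubspecies that appeared during the run.
def gained(run)
  before, after = run[:infrasubspecies]
  after - before
end

RUNS.each do |run|
  status = converted(run) == gained(run) ? 'consistent' : 'MISMATCH'
  puts "#{run[:label]}: #{converted(run)} converted, #{gained(run)} gained (#{status})"
end
```

All three runs balance: 178, 27 and 43 records respectively, which also matches the number of "fixed!" lines in the last two outputs.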

@jonkerz jonkerz closed this as completed Jul 23, 2020