In [1]:
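from bs4 import BeautifulSoup
import requests
import json
import os

def load_tvtropes_for_media(media):
    # e.g. "Film/LaLaLand" is cached as "Film_LaLaLand.json"
    cached_file_name = f"{media.replace('/', '_', 1)}.json"

    if os.path.exists(cached_file_name):
        # if it already exists, we're done!
        return

    r = requests.get(f"https://tvtropes.org/pmwiki/pmwiki.php/{media}")
    # fail loudly on a 404 instead of caching an empty vector
    r.raise_for_status()
    # name the parser explicitly so bs4 doesn't have to guess (and warn)
    soup = BeautifulSoup(r.text, "html.parser")
    main_article = soup.find(id="main-article")
    if main_article is None:
        raise ValueError(f"no main article found for {media}")

    # binary trope vector: trope name -> 1 for every trope the page links to
    media_vector = {}
    main_prefix = '/pmwiki/pmwiki.php/Main/'
    for link in main_article.find_all('a'):
        href = link.get('href')
        # anchors without an href return None; skip those and non-trope links
        if not href or not href.startswith(main_prefix):
            continue
        media_vector[href[len(main_prefix):]] = 1

    with open(cached_file_name, "w") as fp:
        json.dump(media_vector, fp)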

## quick test

A quick test of this flow with some fake data:

In [11]:
!cat Series_TheTaleOfFoo.json

{
    "Foo": 1,
    "Bar": 1
}

In [12]:
!cat Series_FooTwoElectricBoogaloo.json

{
    "Bar": 1,
    "Baz": 1
}

Half of the tropes overlap! What does our similarity calculator say?

In [3]:
load_tvtropes_for_media("Series/FooTwoElectricBoogaloo")

In [6]:
!cargo run Series/TheTaleOfFoo Series/FooTwoElectricBoogaloo

[K[0m[0m[1m[32m    Finished[0m dev [unoptimized + debuginfo] target(s) in 0.00s         ] 2/3
[0m[0m[1m[32m     Running[0m `target/debug/cosine-similarity Series/TheTaleOfFoo Series/FooTwoElectricBoogaloo`
0.4999999999999999
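
The Rust calculator itself isn't shown in this section, so here is a rough Python sketch of what it is presumably computing: plain cosine similarity over the `{trope: 1}` dictionaries from the cached JSON files. The function name `cosine_similarity` and the in-memory dictionaries are assumptions made for illustration, not the actual implementation. For binary vectors the formula reduces to (number of shared tropes) / sqrt(size of A × size of B), which for the fake data is 1 / sqrt(2 × 2) = 0.5, matching the output above up to floating-point rounding.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as {trope: weight} dicts."""
    # Only tropes present in both vectors contribute to the dot product.
    dot = sum(weight * b[trope] for trope, weight in a.items() if trope in b)
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty vector has no direction; treat it as sharing nothing.
        return 0.0
    return dot / (norm_a * norm_b)

# Same contents as the two fake JSON files above.
foo = {"Foo": 1, "Bar": 1}
foo_two = {"Bar": 1, "Baz": 1}
print(cosine_similarity(foo, foo_two))  # approximately 0.5
```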


## a few test cases

Sometimes the list of tropes for a work gets so long that it is split across multiple pages. Popular, complex ("troperific"), and longer works tend to get this treatment. The script doesn't support this yet, so don't pick something _too_ huge, like Naruto or Buffy the Vampire Slayer.
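
If you did want to handle a split work by hand, one possible approach is to build a trope vector for each sub-page and union them into a single vector. The sketch below only shows the merge step; `merge_trope_vectors` and the example dictionaries are hypothetical, and how the sub-pages are named or fetched is left open because the script does not cover it.

```python
def merge_trope_vectors(*vectors):
    """Union several {trope: 1} vectors into one binary vector."""
    merged = {}
    for vector in vectors:
        # Every value is 1, so a repeated trope simply overwrites itself.
        merged.update(vector)
    return merged

# Hypothetical vectors for two sub-pages of the same work.
part_one = {"Foo": 1, "Bar": 1}
part_two = {"Bar": 1, "Baz": 1}
print(merge_trope_vectors(part_one, part_two))
```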

### Suicide Squad versus Alice in Wonderland

These couldn't be more different, right?

In [13]:
load_tvtropes_for_media("Film/SuicideSquad2016")

In [14]:
load_tvtropes_for_media("Literature/AlicesAdventuresInWonderland")

In [16]:
!cargo run Film/SuicideSquad2016 Literature/AlicesAdventuresInWonderland

[K[0m[0m[1m[32m   Compiling[0m cosine-similarity v0.1.0 (/spell)                        ] 0/3
[K[0m[0m[1m[32m    Finished[0m dev [unoptimized + debuginfo] target(s) in 0.73s         ] 2/3
[0m[0m[1m[32m     Running[0m `target/debug/cosine-similarity Film/SuicideSquad2016 Literature/AlicesAdventuresInWonderland`
Similarity score: 0.055827312578945035
These works both feature the following tropes: ["AdaptationalVillainy", "Mooks", "Foreshadowing", "VisualPun", "ThePardon", "BrickJoke", "OnlySaneMan", "RealLife", "CrazyPrepared", "AnAesop", "BodyHorror", "TheCameo", "UnusuallyUninterestingSight", "VillainProtagonist", "TropeNamer"]
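
The shared-trope list is just the intersection of the two vectors' keys. A minimal Python sketch of that step, assuming the same `{trope: 1}` dictionaries the loader caches (`shared_tropes` is a hypothetical helper, not the Rust program's code):

```python
def shared_tropes(a, b):
    """Return the tropes present in both {trope: 1} vectors, sorted by name."""
    # dict.keys() returns a set-like view, so & gives the intersection.
    return sorted(a.keys() & b.keys())

# Same contents as the fake JSON files from the quick test.
foo = {"Foo": 1, "Bar": 1}
foo_two = {"Bar": 1, "Baz": 1}
print(shared_tropes(foo, foo_two))  # ['Bar']
```

Sorting is only there to make the output reproducible; the list printed by the Rust program above is not in alphabetical order.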


### La La Land versus Moonlight

Two film buff favorites.

In [17]:
load_tvtropes_for_media("Film/LaLaLand")

In [18]:
load_tvtropes_for_media("Film/Moonlight2016")

In [19]:
!cargo run Film/LaLaLand Film/Moonlight2016

[K[0m[0m[1m[32m    Finished[0m dev [unoptimized + debuginfo] target(s) in 0.00s         ] 2/3
[0m[0m[1m[32m     Running[0m `target/debug/cosine-similarity Film/LaLaLand Film/Moonlight2016`
Similarity score: 0.050043459373697946
These works both feature the following tropes: ["Homage", "BabiesEverAfter", "SpiritualAntithesis", "NiceGuy", "Irony", "AmicableExes"]


### Django Unchained versus The Hateful Eight

Both are Quentin Tarantino Westerns, so these should overlap quite a bit. And they do!

In [20]:
load_tvtropes_for_media("Film/DjangoUnchained")

In [21]:
load_tvtropes_for_media("Film/TheHatefulEight")

In [22]:
!cargo run Film/DjangoUnchained Film/TheHatefulEight

[K[0m[0m[1m[32m    Finished[0m dev [unoptimized + debuginfo] target(s) in 0.00s         ] 2/3
[0m[0m[1m[32m     Running[0m `target/debug/cosine-similarity Film/DjangoUnchained Film/TheHatefulEight`
Similarity score: 0.21731656198624275
These works both feature the following tropes: ["LampshadeHanging", "ButtMonkey", "LargeHam", "PunctuatedForEmphasis", "GroinAttack", "GunTwirling", "MoodWhiplash", "CruelAndUnusualDeath", "AssholeVictim", "SpannerInTheWorks", "InUniverse", "HighPressureBlood", "TakeThat", "DragonAscendant", "TheWestern", "AlternateHistory", "EvenEvilHasLovedOnes", "MoralEventHorizon", "PlayedWith", "BountyHunter", "BittersweetEnding", "FauxAffablyEvil", "HypocriticalHumor", "RunningGag", "GenreThrowback", "CrazyPrepared", "GoryDiscretionShot", "NiceHat", "BadassLongcoat", "EvilIsHammy", "MeaningfulName", "ColdBloodedTorture", "RapeAsDrama", "KarmicDeath", "MaleFrontalNudity", "InstantDeathBullet", "RealityEnsues", "SmashCut", "AskAStupidQuestion", "Alliterati

### The Terminator versus Terminator 2: Judgment Day

Direct sequels should be very close.

In [23]:
load_tvtropes_for_media("Film/TheTerminator")

In [24]:
load_tvtropes_for_media("Film/Terminator2JudgmentDay")

In [25]:
!cargo run Film/TheTerminator Film/Terminator2JudgmentDay

[K[0m[0m[1m[32m    Finished[0m dev [unoptimized + debuginfo] target(s) in 0.00s         ] 2/3
[0m[0m[1m[32m     Running[0m `target/debug/cosine-similarity Film/TheTerminator Film/Terminator2JudgmentDay`
Similarity score: 0.22974188043864677
These works both feature the following tropes: ["VillainsBlendInBetter", "WrongGenreSavvy", "MonsterThreatExpiration", "NeverendingTerror", "OnlyAFleshWound", "SequelHook", "ContrivedCoincidence", "InvokedTrope", "CurbstompCushion", "StealthPun", "BadFuture", "Foreshadowing", "EvilDetectingDog", "CantTakeAnythingWithYou", "TrashcanBonfire", "InsistentTerminology", "ScrewDestiny", "AIIsACrapshoot", "TropeCodifier", "ShotgunsAreJustBetter", "FailedASpotCheck", "EnemyRisingBehind", "GunPorn", "TookALevelInBadass", "TheCameo", "SuddenlyShouting", "TimeTravel", "BilingualBonus", "EyeScream", "UncannyValley", "MoodWhiplash", "BittersweetEnding", "LaResistance", "MoreDakka", "NakedOnArrival", "HeroicSacrifice", "AvertedTrope", "ChekhovsGun", "Bo

### Moonrise Kingdom versus The Grand Budapest Hotel

Trey: "every Wes Anderson movie is the same".

In [26]:
load_tvtropes_for_media("Film/MoonriseKingdom")

In [27]:
load_tvtropes_for_media("Film/TheGrandBudapestHotel")

In [28]:
!cargo run Film/MoonriseKingdom Film/TheGrandBudapestHotel

[K[0m[0m[1m[32m    Finished[0m dev [unoptimized + debuginfo] target(s) in 0.00s         ] 2/3
[0m[0m[1m[32m     Running[0m `target/debug/cosine-similarity Film/MoonriseKingdom Film/TheGrandBudapestHotel`
Similarity score: 0.06993363769976844
These works both feature the following tropes: ["WordOfGod", "ShoutOut", "EveryoneCallsHimBarkeep", "DistinguishedGentlemansPipe", "MeaningfulName", "SecretTestOfCharacter", "AllStarCast", "JerkWithAHeartOfGold", "ReasonableAuthorityFigure"]
