# Data Analytics: Exploring Specific Tags

Cara Marta Messina <br/>
Northeastern University<br/>
messina [dot] c [at] husky [dot] neu [dot] edu

This notebook takes data collected from <em>Archive of Our Own</em>, a popular fanfiction repository, and sets it up to be analyzed. The data was collected using [this AO3 python scraper](https://github.com/radiolarian/AO3Scraper). The corpus consists of <em>The Legend of Korra</em> and <em>Game of Thrones</em> fanfics, from the first one published on AO3 to 2019.

<em>This notebook is part of the Critical Fan Toolkit, Cara Marta Messina's public + digital dissertation</em>

In [2]:
#pandas for working with dataframes
import pandas as pd

#regular expression library
import re

#numpy specifically works with numbers
import numpy as np

from nltk import word_tokenize

import string
punctuations = list(string.punctuation)

#has the nice counter feature for counting tags
import collections
from collections import Counter 

#for making a string of elements separated by commas into a list
from nltk.tokenize.punkt import PunktSentenceTokenizer, PunktLanguageVars 

#visualizations
#import plotly as py
#import plotly.express as px
#import plotly.graph_objs as go
#from plotly.subplots import make_subplots

#calling my plotly thing
#import chart_studio
#chart_studio.tools.set_credentials_file(username='caramessina', api_key='YOUR_API_KEY')

## Reading and Prepping the Data

In [94]:
korra_all = pd.read_csv('./data/allkorra.csv')
korra_all.head(3)

Unnamed: 0.1,Unnamed: 0,work_id,title,rating,category,fandom,relationship,character,additional tags,language,...,status,status date,words,chapters,comments,kudos,bookmarks,hits,body,month
0,0,6388009,A More Perfect Union,General Audiences,Gen,Avatar: Legend of Korra,,"Noatak (Avatar), Tarrlok (Avatar), Amon (Avatar)",Alternate Universe,English,...,Updated,2018-03-14,8139,4/?,11.0,27.0,4.0,286.0,He's forgotten how to be warm. The thought wou...,2016-03
1,1,13974048,let go,Teen And Up Audiences,F/F,Avatar: Legend of Korra,Lin Beifong/Korra,"Korra (Avatar), Lin Beifong","just a quick one-shot i never posted properly,...",English,...,Completed,2018-03-14,996,1/1,,,,41.0,"""Korra."" Somewhere distant. Someone holding h...",2018-03
2,2,13720947,Is This Your Dog?,General Audiences,F/F,Avatar: Legend of Korra,Korra/Asami Sato,"Korra (Avatar), Asami Sato, Tonraq (Avatar), M...","Alternate Universe - College/University, Alter...",English,...,Updated,2018-03-14,7585,5/?,111.0,318.0,18.0,3637.0,Korra's father kneeled down in front of her so...,2018-02


In [3]:
#reading in multiple csv files, since one large one breaks my kernels 

got0 = pd.read_csv(r'./data/got_data_clean/got0.csv')
got1 = pd.read_csv(r'./data/got_data_clean/got1.csv')
got2 = pd.read_csv(r'./data/got_data_clean/got2.csv')
got3 = pd.read_csv(r'./data/got_data_clean/got3.csv')

merged_got = pd.concat([got0, got1, got2, got3])
merged_got.head(3)

Unnamed: 0.1,Unnamed: 0,work_id,title,published,rating,character,relationship,additional tags,category,body,month
0,0,19289563,"game of thrones,",2019-06-20,"explicit,","arya stark, bella, gendry baratheon - characte...","gendry baratheon/arya stark, gendry baratheon/...",,"multi,",authors note: this is really short but i promi...,2019-06
1,1,17179712,"game of thrones,",2018-12-27,"teen and up audiences,","jon snow | aegon targaryen, arya stark, sansa ...","jon snow/daenerys targaryen, arya stark/gendry...","armies and allies, war, romance, eventual happ...","f/m,",arya's chambers still felt different.\n\n \n\n...,2018-12
2,2,2352779,"game of thrones,",2014-09-24,"teen and up audiences,",,"xander harris/spike,","established relationship, drabble,","m/m,","""it's asking again"". \n\n""then press the blood...",2014-09


In [4]:
merged_got.count()

Unnamed: 0         29897
work_id            29897
title              29897
published          29897
rating             29896
character          28913
relationship       27733
additional tags    27118
category           28472
body               29897
month              29897
dtype: int64

## Creating the Functions: Extracting Metadata

I created a function that takes the tags in a metadata column (which are stored as comma-separated strings, such as characterA/characterB, characterA/characterB, etc.) and counts how often each tag is used, so that the most common tags, such as the most common relationship tags, can be output.

In [5]:
def column_to_list(df,columnName):
    '''
    this function takes all the information from a specific column and joins it into a single string.
    input: the dataframe and the column name
    output: one string containing every value in the column, separated by spaces
    note: empty (NaN) values in the column are replaced with empty strings in the dataframe itself
    '''
    df[columnName] = df[columnName].replace(np.nan,'',regex=True) 
    string = ' '.join(df[columnName].tolist())
    return string

In [6]:
def clean_tokens(string):    
    stopwords = ['i', 'me', 'my', 'myself', "“", "”", 'we', 'our', '’', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 'same', 'so', 'than', 'too', 'very', 's', 't', 'can', 'will', 'just', 'don', "don't", 'should', "should've", 'now', 'd', 'll', 'm', 'o', 're', 've', 'y', 'ain', 'aren', "aren't", 'couldn', "couldn't", 'didn', "didn't", 'doesn', "doesn't", 'hadn', "hadn't", 'hasn', "hasn't", 'haven', "haven't", 'isn', "isn't", 'ma', 'mightn', "mightn't", 'mustn', "mustn't", 'needn', "needn't", 'shan', "shan't", 'shouldn', "shouldn't", 'wasn', "wasn't", 'weren', "weren't", "would", "could", 'won', "won't", 'wouldn', "wouldn't"]
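    #tokenize first if a plain string is passed in; looping over a string directly would yield single characters
    tokens = word_tokenize(string) if isinstance(string, str) else string
    text_lc = [word.lower() for word in tokens]
    text_tokens_clean = [word for word in text_lc if word not in stopwords]
    text_tokens_clean = [word for word in text_tokens_clean if word not in punctuations]
    return text_tokens_clean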
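
To see what this cleaning step is meant to do, here is a minimal sketch on a toy sentence. It assumes a simple regular-expression tokenizer in place of NLTK's `word_tokenize` (so it runs without downloading tokenizer data) and a tiny stop word set, `toy_stopwords`. Both are stand-ins for illustration, as is the name `clean_tokens_sketch`; none of them are the notebook's actual settings.

```python
import re
import string

# hypothetical, shortened stop word list for illustration only
toy_stopwords = {'the', 'a', 'and', 'of', 'to', 'is'}
punctuations = set(string.punctuation)

def clean_tokens_sketch(text):
    # split into runs of word characters (keeping apostrophes) or single punctuation marks
    tokens = re.findall(r"[\w']+|[^\w\s]", text.lower())
    # drop stop words and punctuation
    return [t for t in tokens if t not in toy_stopwords and t not in punctuations]

print(clean_tokens_sketch("Korra is the Avatar, and Asami builds a Satomobile."))
# ['korra', 'avatar', 'asami', 'builds', 'satomobile']
```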

In [7]:
def TagsAnalyzer(df, columnName):
    '''
    input: a dataframe and the name of a metadata column, such as 'additional tags'
    output: a dictionary in which the keys are the tags and the values are the number of times each tag is used
    '''
    
    #replace empty values & join the whole column into one string
    string = column_to_list(df, columnName)
    
    #the class that tells the tokenizer to split on commas, so each tag becomes an element in a list
    class CommaPoint(PunktLanguageVars):
        sent_end_chars = (',',)
    tokenizer = PunktSentenceTokenizer(lang_vars = CommaPoint())
    
    #tokenizing the string based on the COMMA, not the white space (as seen in the CommaPoint above)
    ListOfTags = tokenizer.tokenize(string)
    
    #the "Counter" function is from the collections library
    allCounter = collections.Counter(ListOfTags)
    
    #returning a dictionary in which the keys are all the tags, and the values are the counts
    return dict(allCounter)
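
The comma-based counting can also be sketched without NLTK. The version below is a simplified stand-in, not the function above: it assumes that no tag contains a comma, it strips whitespace and lowercases each tag (which the Punkt-based tokenizer does not do), and the names `count_tags_sketch` and `toy_rows` are hypothetical.

```python
from collections import Counter

def count_tags_sketch(rows):
    """Count comma-separated tags across rows; empty or missing rows are skipped."""
    counts = Counter()
    for row in rows:
        if not row:
            continue
        for tag in row.split(','):
            tag = tag.strip().lower()
            if tag:
                counts[tag] += 1
    return counts

# toy stand-in for an 'additional tags' column
toy_rows = [
    "Alternate Universe, Fluff",
    "fluff, Genderqueer Korra",
    None,
    "Genderqueer Korra",
]
tag_counts = count_tags_sketch(toy_rows)
print(dict(tag_counts))
# {'alternate universe': 1, 'fluff': 2, 'genderqueer korra': 2}
```

Splitting row by row keeps the last tag of one work from running into the first tag of the next, which can happen when a whole column is joined into one string before tokenizing.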

In [8]:
def extract_tags(dictionary,search_term,count):
    '''
    input: a dictionary to be searched, the term you want to search, and the total number of tags or whatever you count
    output: all the items in which your specific term appears and the ratio of times each one appears
    '''
    for tag, number in dictionary.items():    # for name, age in dictionary.iteritems():  (for Python 2.x)
        if search_term in tag:
            print(tag, number, (number/count)*100)
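
As a rough illustration of the search, here is a variant that returns its matches instead of printing them, run on a toy dictionary. The name `extract_tags_sketch` and the toy counts are assumptions made for this example; the total used here is the number of tag uses, which is only one possible choice of denominator.

```python
def extract_tags_sketch(tag_counts, search_term, total):
    """Return (tag, count, percent of total) for every tag containing search_term."""
    return [
        (tag, number, number / total * 100)
        for tag, number in tag_counts.items()
        if search_term in tag
    ]

# hypothetical counts for illustration only
toy_counts = {'genderqueer korra': 1, 'fluff': 2, 'queer': 1}
toy_total = sum(toy_counts.values())  # 4 tag uses in this toy example

for tag, number, percent in extract_tags_sketch(toy_counts, 'queer', toy_total):
    print(tag, number, percent)
# genderqueer korra 1 25.0
# queer 1 25.0
```

Because the match is a plain substring test, searching for 'queer' also returns 'genderqueer korra'.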

## The Legend of Korra: Exploring Additional Tags

I've already done some of this work, but I want to add more potential writers to the list. What is really interesting about these tags, though, is that there are far fewer of them. I wonder whether, because Korra/Asami is the "norm," there is less need to label difference. For example, Korrasami/KorraxKuvira is used much more than other relationships in this fandom, so being explicitly open about its queerness isn't important. Tags don't need to be used to define race/queerness because these identities are already so embedded in both the show and the fanfics.

For example, feminism and pointing to feminism does <em>not</em> really happen.

### Race
- canon character of color
- male character of color
- character of color
- lgbtq female characters of color
- 'of color' is a tag to look at

### Queer/Lesbian/Gay/Gender
- socially awkward queer girls
- genderqueer korra
- okay it would be smut but i started writing a lesbian adventure romcom
- kya being a horribly flirtatious lesbian
- nonbinary lesbians
- gaycation all i ever wanted
- everyone's gay/everyone is gay
- idk how else to describe this but gay gay gay
- older gays giving advice to younger gays
- drag queen/drag queens
- trans
- trans female character
- trans characters
- trans!korraxasami
- pre-transition
- for trans!korra to help dysphoria the other way around
- genderqueer 
- gender dysphoria/euphoria
- gender queer characters

In [None]:
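#note: .str.len() gives the number of characters in each string, so this is the total character count of the column, not the number of tags
tlok_AT_count = korra_all['additional tags'].str.len().sum()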
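
Since `.str.len()` measures the length of each string, the sum above is a character total rather than a tag total. The sketch below shows the difference on a toy list in plain Python; `toy_tags`, `char_total`, and `tag_total` are hypothetical names, and which denominator is appropriate depends on the question being asked.

```python
# hypothetical 'additional tags' values for three works
toy_tags = ["fluff, angst", "queer", ""]

# what summing string lengths measures: the total number of characters
char_total = sum(len(s) for s in toy_tags)

# an alternative denominator: the number of individual tags
tag_total = sum(len([t for t in s.split(',') if t.strip()]) for s in toy_tags)

print(char_total, tag_total)
# 17 3
```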

In [96]:
#All of the Additional Tags used in The Legend of Korra fandom
tlok_AT = TagsAnalyzer(korra_all,'additional tags')

In [99]:
extract_tags(tlok_AT,'color',tlok_AT_count)
extract_tags(tlok_AT,'race',tlok_AT_count) #...master race?
extract_tags(tlok_AT,'queer',tlok_AT_count)
extract_tags(tlok_AT,'lesbian',tlok_AT_count)
extract_tags(tlok_AT,'gay',tlok_AT_count)
extract_tags(tlok_AT,'drag',tlok_AT_count)
extract_tags(tlok_AT,'bisex',tlok_AT_count)
extract_tags(tlok_AT,'trans',tlok_AT_count)
extract_tags(tlok_AT,'feminis',tlok_AT_count) #fewer feminist tags coming up–FASCINATING
extract_tags(tlok_AT,'gender',tlok_AT_count)

seeing your soul mate makes you see in color, 1 0.00019161823515773054
Waterbending master race, 2 0.00038323647031546107
water tribe master race, 1 0.00019161823515773054
Watertribe master race Mako - Freeform, 1 0.00019161823515773054
Korracentric howrra, 1 0.00019161823515773054
Korracentric  Embarrassment, 1 0.00019161823515773054
Katara the queer uniter, 1 0.00019161823515773054
useless queers, 1 0.00019161823515773054
useless queers Yuri, 1 0.00019161823515773054
queer  Alternate Universe - Canon Divergence, 1 0.00019161823515773054
queer, 4 0.0007664729406309221
Genderqueer, 5 0.0009580911757886527
Genderqueer Korra, 2 0.00038323647031546107
other queer characters, 1 0.00019161823515773054
Surfing AU Genderqueer Korra, 1 0.00019161823515773054
Genderqueer bolin Red Lotus Korra, 1 0.00019161823515773054
gender queer character, 1 0.00019161823515773054
queer dorks, 1 0.00019161823515773054
Genderqueer Character Pre-Canon, 1 0.00019161823515773054
Genderqueer Character  Fluff, 1 0.

## Game of Thrones: Exploring Additional Tags

In order to better discover <em>Game of Thrones</em> fanfics that are thinking about race, sexuality, and gender in nuanced ways, I will first search for particular tags and list them here. I will then use this list to choose the particular fics I want to read.

### Race
- female character of color
- character of color
- <strong>pov character of color</strong>
- <strong>bisexual female character of color</strong>
- <strong>canon character of color</strong>
- <strong>canon queer character of color</strong>
- lesbian character of color
- lgbtq character(s) of color
- <strong>male character of color</strong>
- characters of color
- lgbtq character of color
- <strong>race relations</strong>
- <strong>racebending</strong>
- race
- alternate universe - race changes
- <strong>reference to race relations</strong>
- different races (not interesting)

### Racism
- fantastic racism
- racism
- period-typical racism
- magical racism (not interesting)
- racism allegory
- just a little bit of valyrian racism

### Gender
- gender issues
- genderfuck
- <strong>genderfluid character</strong>
- gender roles
- gender equality
- gender confusion
- transgender
- gender identity
- arya is genderqueer
- gender neutrality
- gender dysphoria
- <strong>oberyn gives zero fucks about your gendered shit</strong>
- gender-neutral pronouns
- genderqueer jon
- <strong>masculine presenting gender neutral jon</strong>
- transgender teens
- <strong>fuck gender roles</strong>

### Queer
- queer character
- <strong>original character is an autistic queer woman of colour</strong>
- cersei and jaime are genderqueer
- queer
- queer themes
- <strong>canon queer character of color</strong>
- canon queer character
- pov queer character
- everyone is queer
- arya is queer
- <strong>almost everyone is queer</strong>
- queer characters
- queer youth
- queer identity
- queer friendships
- canon queer relationship
- queer queens
- bisexual jaime is here and queer
- queer daenerys
- queer culture
- <strong>trans character</strong>
- transphobic language
- canon lgbtq female character
- <strong>jon is a trans man please stop telling me to tag this 'female jon snow'</strong>
- jon snow is trans
- <strong>alternate universe - trans</strong>
- trans! margaery
- trans!jon snow
- <strong>trans!tywin lannister</strong>
- trans daenerys
- i edited it to make it trans inclusive
- trans female character
- <strong>trans!arya stark</strong>
- trans loras

### Lesbian
There are a lot of results here. I think just keeping "lesbian" and "queer" as basic search terms already pulls up what I need it to.

### Fem/Masc (and feminism)
- <strong>feminist themes</strong>
- female sexuality
- female-centric
- strong female characters
- bisexual female character
- trans female character
- lack of female rights
- <strong>calling the show out on its overuse of female nudity</strong>
- <strong>there's not enough femslash here</strong>
- female jon snow is also autistic
- feminist wives everywhere
- <strong>the idealization of violence makes for shitty feminism</strong>
- feminism
- <strong>my response to 'female characters can only have growth through physical pain</strong>
- feminism. idk.
- <strong>because fuck d&d and their offending representation of female relationships</strong>
- <strong>au in which female characters have agency</strong>
- <strong>tyrion lannister is a feminist icon</strong>
- <strong>robb stark doesn't have time for your hypermasculine bullshit</strong>
- toxic masculinity
- <strong>tormund is my feminist</strong>

### Consent
- <strong>explicit description of consent because that's important y'all</strong>
- <strong>explicit consent</strong>
- consensual non-consent
- enthusiastic consent
- so much consent
- dubious consent due to identity issues
- consentual sex
- consenting adults
- consent is v important folks
- consent is sexy
- <strong>look d&d you can't consent when you're that drunk</strong>
- <strong>jaime has a consent kink</strong>
- consent is sexy kids
- but everyone consents
- comment on consent (hmm what is this one)

### Loras
- loras' pov
- fem!loras
- <strong>loras is king of coming out parties</strong>
- pov loras tyrell
- <strong>loras doesn't get the recognition he deserves</strong>

### Missandei/Grey Worm
- <strong>missandei deserved better</strong>
- <strong>bisexual missandei</strong>
- daenerys targaryen/missandei
- <strong>inspired by grey worm and missandei goodbye scene</strong>
- <strong>dany's army now waiting for missandei's word</strong>
- <strong>i haven't coped with missandei's death</strong>
- <strong>a fix it fic because its what missandei deserves goddamnit</strong>
- missandei is an angel
- missandei is a precious angel who must be protected
- daenerys and missandei have to say goodbye
- brotp arya & missandei (instead of OTP, it's a friendship pairing)
- <strong>grey worm deserved better</strong>
- <strong>y'all need to write stories about the trauma grey worm is going through</strong>

### Misc
- <strong>slavery abolition</strong>
- non-romanticized slavery – not GOT
- there is no canon to diverge from anymore-freedom is now (no idea what this means, but interested)

In [91]:
merged_got.count()

Unnamed: 0         29897
work_id            29897
title              29897
published          29897
rating             29896
relationship       27733
additional tags    29897
category           28472
body               29897
month              29897
dtype: int64

In [12]:
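#note: .str.len() gives the number of characters in each string, so this is the total character count of the column, not the number of tags
got_AT_count = merged_got['additional tags'].str.len().sum()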

In [10]:
#All of the Additional Tags used in the Game of Thrones fandom
got_AT = TagsAnalyzer(merged_got,'additional tags')

In [14]:
extract_tags(got_AT,'color',got_AT_count)
extract_tags(got_AT,'race',got_AT_count)
extract_tags(got_AT,'racism',got_AT_count)
extract_tags(got_AT,'black people',got_AT_count)
extract_tags(got_AT,'gender',got_AT_count)
extract_tags(got_AT,'queer',got_AT_count)
extract_tags(got_AT,'lesbian',got_AT_count)
extract_tags(got_AT,'gay',got_AT_count)
extract_tags(got_AT,'nonbinary',got_AT_count)
extract_tags(got_AT,'genderqueer',got_AT_count)
extract_tags(got_AT,'missandei',got_AT_count)
extract_tags(got_AT,'grey worm',got_AT_count)
extract_tags(got_AT,'dothraki',got_AT_count)
extract_tags(got_AT,'khal',got_AT_count)
extract_tags(got_AT,'dysphoria',got_AT_count)
extract_tags(got_AT,'slave',got_AT_count)
extract_tags(got_AT,'drag ',got_AT_count)
extract_tags(got_AT,'abolition',got_AT_count)
extract_tags(got_AT,'freedom',got_AT_count)
extract_tags(got_AT,'revolution',got_AT_count)
extract_tags(got_AT,'rebellion',got_AT_count)
extract_tags(got_AT,"y'all",got_AT_count)
extract_tags(got_AT,"consent",got_AT_count)
extract_tags(got_AT,"trans",got_AT_count)
extract_tags(got_AT,"loras",got_AT_count)
extract_tags(got_AT,"fem",got_AT_count)
extract_tags(got_AT,"masc",got_AT_count)
extract_tags(got_AT,"feminis",got_AT_count)
extract_tags(got_AT,"heterono",got_AT_count)
extract_tags(got_AT,"autistic",got_AT_count)

canon character of color, 6 0.00014151741100500712
canon queer character of color, 2 4.717247033500237e-05
lgbtq character of color, 3 7.075870550250356e-05
character(s) of color, 5 0.00011793117583750594
female character of color, 11 0.000259448586842513
pov character of color, 4 9.434494067000474e-05
eye color, 1 2.3586235167501186e-05
colorblind soulmate au, 1 2.3586235167501186e-05
color of blood, 1 2.3586235167501186e-05
rose colored glasses fic, 1 2.3586235167501186e-05
sandor's colorful language, 1 2.3586235167501186e-05
soulmate seeing color au, 1 2.3586235167501186e-05
white is the color of the day, 1 2.3586235167501186e-05
true colors, 1 2.3586235167501186e-05
color soulmate au, 1 2.3586235167501186e-05
seeing color au, 1 2.3586235167501186e-05
lesbian character of color, 2 4.717247033500237e-05
character of color, 1 2.3586235167501186e-05
her true colors will be revealed by the end, 1 2.3586235167501186e-05
setting: late 1800's colorado, 1 2.3586235167501186e-05
male charact

gay jon, 1 2.3586235167501186e-05
gay marriage in the middle ages, 1 2.3586235167501186e-05
this is gay, 1 2.3586235167501186e-05
im gay, 1 2.3586235167501186e-05
everything is gay, 1 2.3586235167501186e-05
because i love soft gays, 1 2.3586235167501186e-05
hey you're my prisoner but you intimidated me and i'm gay so let's bang, 1 2.3586235167501186e-05
lommy is forever fun & gay, 1 2.3586235167501186e-05
all the gay, 1 2.3586235167501186e-05
(but elia will probably feature later because i'm so gay for her), 1 2.3586235167501186e-05
loras tyrell is a gay crab who's in love with a gay merman renly, 1 2.3586235167501186e-05
it's gay, 1 2.3586235167501186e-05
gay porn, 1 2.3586235167501186e-05
arya is a little bit gay for margaery, 1 2.3586235167501186e-05
uhhh idk this is gonna be really gay ok, 1 2.3586235167501186e-05
knights sitting about doing nothing except being gay, 1 2.3586235167501186e-05
gay bros theon and robb, 1 2.3586235167501186e-05
basically everyone is a big gay mess, 1 2

drag queens, 2 4.717247033500237e-05
she's going off to drag hot pie back, 1 2.3586235167501186e-05
slight drag queen elements, 1 2.3586235167501186e-05
slavery abolition, 1 2.3586235167501186e-05
freedom, 7 0.0001651036461725083
sexual freedom at camp lannister, 1 2.3586235167501186e-05
sansa + freedom, 1 2.3586235167501186e-05
freedom fighters, 2 4.717247033500237e-05
there is no canon to diverge from anymore-freedom is now., 1 2.3586235167501186e-05
revolutions, 1 2.3586235167501186e-05
french revolution, 3 7.075870550250356e-05
revolution, 11 0.000259448586842513
revolutionaries, 1 2.3586235167501186e-05
american revolution au, 1 2.3586235167501186e-05
industrial revolution, 1 2.3586235167501186e-05
american revolution, 1 2.3586235167501186e-05
mexican revolution background, 1 2.3586235167501186e-05
vive la revolution, 1 2.3586235167501186e-05
revolutionist daenerys, 1 2.3586235167501186e-05
rebellion, 19 0.0004481384681825226
robert's rebellion, 85 0.002004829989237601
no robert's

ugly duckling!loras, 1 2.3586235167501186e-05
loras is put on trial, 1 2.3586235167501186e-05
loras is loud, 1 2.3586235167501186e-05
petulant loras is petulant, 1 2.3586235167501186e-05
loras is really just a lost puppy, 1 2.3586235167501186e-05
loras in mourning, 1 2.3586235167501186e-05
loras is a brat, 1 2.3586235167501186e-05
(renloras is more a side pairing), 1 2.3586235167501186e-05
loras tyrell mentioned, 1 2.3586235167501186e-05
loras gets annoyed by girls, 1 2.3586235167501186e-05
unexplained past loras/renly breakup, 1 2.3586235167501186e-05
ballet dancer loras tyrell, 1 2.3586235167501186e-05
trans loras, 1 2.3586235167501186e-05
periods are tough on young men huh loras, 1 2.3586235167501186e-05
renly and loras are bastards, 1 2.3586235167501186e-05
loras survived the sept, 1 2.3586235167501186e-05
loras is mistaken for a girl (easy mistake to make), 1 2.3586235167501186e-05
loras & theon are disaster gays, 1 2.3586235167501186e-05
margaery and loras pine together, 1 2.3586

middle age feminism, 1 2.3586235167501186e-05
everyone hates heteronormativity, 1 2.3586235167501186e-05
autistic headcanons, 1 2.3586235167501186e-05
autistic arya, 1 2.3586235167501186e-05
autistic ned, 1 2.3586235167501186e-05
autistic sansa too, 1 2.3586235167501186e-05
autistic character, 2 4.717247033500237e-05
autistic sansa implied...a fic should be coming soon !!, 1 2.3586235167501186e-05
autistic lila barton, 1 2.3586235167501186e-05
autistic sansa stark, 1 2.3586235167501186e-05
autistic!stannis, 1 2.3586235167501186e-05
autistic!ned, 1 2.3586235167501186e-05
original character is an autistic queer woman of colour, 1 2.3586235167501186e-05
autistic!melisandre, 1 2.3586235167501186e-05
female jon snow is also autistic, 1 2.3586235167501186e-05
jaime is autistic, 1 2.3586235167501186e-05
sansa is autistic, 1 2.3586235167501186e-05
autistic arthur dayne, 1 2.3586235167501186e-05


## Extracting Fics that Use These Tags

Using the tag list above, I will extract fics that use these tags to determine if I want to read the fics and reach out to the authors!

Instead of putting all this information in the Jupyter Notebook, though, I will be putting it in a Google Sheet, where I will keep track of it.

These are the tags I am choosing to extract:
- Missandei
- Grey Worm
- feminis (feminism, feminist)
    - the idealization of violence makes for shitty feminism
    - feminist themes
    - feminism
- trans!
- nonbinary
    - nonbinary character
    - nonbinary arya
- transge
- femal
    - my response to 'female characters can only have growth through physical pain
    - female character in command
    - female friendships yo
- gay
- bisex
- LGBTQ
- dothraki
- heteronormativity
- lesbian
- dysphoria
    - body dysphoria
- gender
    - genderqueer
    - genderqueer jon
    - genderqueer character
    - genderfluid arya
    - gender role expectations
    - oberyn gives zero fucks about your gendered shit
    - gender confusion
    - genderbending
    - misgenderin
- abolition
    - slavery abolition
- of color
    - canon character of color
    - character(s) of color
    - lgbtq character(s) of color
- of colour
- masc
- queer
- ' race' (put the space before race so it's not a word like 'trace')
    - racebend
- drag (specifically "drag " with a space at the end to eliminate dragons)
    - drag queen
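
The padding-with-a-space trick for 'race' and 'drag' can be compared with regular-expression word boundaries. The sketch below uses Python's `re` module on toy strings (pandas `str.contains` also treats its pattern as a regular expression by default). The sample strings are hypothetical, and the word-boundary patterns are an alternative to consider, not the search used in this notebook.

```python
import re

# hypothetical tag strings for illustration only
toy_tags = [
    "race relations, angst",      # 'race' at the very start of the string
    "fluff, racebending",
    "embrace, trace of smoke",    # should not match
    "dragons, fire",              # should not match
    "slight drag queen elements",
    "humor, drag",                # 'drag' at the end of the string, no trailing space
]

space_race = [t for t in toy_tags if re.search(r" race", t)]
boundary_race = [t for t in toy_tags if re.search(r"\brace", t)]
space_drag = [t for t in toy_tags if re.search(r"drag ", t)]
boundary_drag = [t for t in toy_tags if re.search(r"\bdrag\b", t)]

print(len(space_race), len(boundary_race))  # 1 2
print(len(space_drag), len(boundary_drag))  # 1 2
```

In this toy data the space-padded patterns miss a tag that starts the string ('race relations') and one that ends it ('drag'), while the word-boundary versions catch both and still skip 'trace', 'embrace', and 'dragons'.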

Decided not to include:
- racism, as this is usually a justification to include racism, claiming it is "canonical" or part of the world
- consent, because most tags that mention consent are 'dubious consent.' Some tags emphasize the need for consent, but these are much rarer.
- slash, as often slash is centered around two male characters, both erasing women and fetishizing gay men


In [151]:
def find_text(search_term):
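    '''
    Simple function so I don't need to copy-and-paste over and over
    '''
    #search the merged Game of Thrones dataframe; na=False skips works with no additional tags
    show = merged_got[merged_got['additional tags'].str.contains(search_term, na=False)]
    print(show)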

In [115]:
Got_LGBTQ = merged_got[merged_got['additional tags'].str.contains("slash|lgbtq|feminis|drag |nonbinary|bisex|heteronormativity|trans!|transg|dysphoria|gay| bi |lesbian|genderqu|gender iden|gender neu|masc|queer", na=False)]
Got_LGBTQ.count()

Unnamed: 0         1011
work_id            1011
title              1011
published          1011
rating             1011
relationship        982
additional tags    1011
category            986
body               1011
month              1011
dtype: int64

In [142]:
GoT_race_AT = merged_got[merged_got['additional tags'].str.contains("missandei|grey worm|dothraki|abolition|of color|of colour| race|naath|meereen|oberyn|ellaria|dorne|coloni", na=False)]
GoT_race_AT.count()

Unnamed: 0         348
work_id            348
title              348
published          348
rating             348
relationship       325
additional tags    348
category           323
body               348
month              348
dtype: int64

In [189]:
def character_count(charactername,df):
    df_new = df[df['character'].str.contains(charactername, na=False)]
    print(charactername, df_new['work_id'].count())

character_count('jon snow', merged_got)
character_count('tyrion', merged_got)
character_count('arya', merged_got)
character_count('sansa stark', merged_got)
character_count('sandor', merged_got)
character_count('baelish', merged_got)
character_count('daenerys targaryen', merged_got)
character_count('brienne', merged_got)
character_count('margaery', merged_got)
character_count('jaime', merged_got)
character_count('loras', merged_got)
character_count('oberyn', merged_got)
character_count('missandei', merged_got)
character_count('grey worm', merged_got)
character_count('ellaria', merged_got)
character_count('bran', merged_got)
character_count('drogo', merged_got)

jon snow 12190
tyrion 4393
arya 8354
sansa stark 13740
sandor 2959
baelish 2167
daenerys targaryen 5550
brienne 5013
margaery 3173
jaime 6351
loras 1336
oberyn 1080
missandei 1018
grey worm 621
ellaria 395
bran 4286
drogo 731
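
One thing to keep in mind when reading these counts is that a substring search matches anywhere inside a name. The sketch below illustrates this on toy data with Python's `re` module; the rows and both helper functions (`count_substring`, `count_whole_word`) are hypothetical, and whether whole-word matching is preferable depends on how names are tagged in the real data.

```python
import re

# hypothetical 'character' values for three works
toy_characters = [
    "bran stark, arya stark",
    "brandon stark, ned stark",
    "jon snow",
]

def count_substring(name, rows):
    # counts rows where `name` appears anywhere in the string
    return sum(1 for row in rows if re.search(name, row))

def count_whole_word(name, rows):
    # counts rows where `name` appears as a whole word
    pattern = r"\b" + re.escape(name) + r"\b"
    return sum(1 for row in rows if re.search(pattern, row))

print(count_substring('bran', toy_characters))   # 2
print(count_whole_word('bran', toy_characters))  # 1
```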


In [129]:
GoT_race_REL = merged_got[merged_got['relationship'].str.contains("missandei|grey worm|khal|jhiqui|irri|qotho|rakharo|sallador saan|xaro|oberyn|ellaria", na=False)]
GoT_race_REL.count()

Unnamed: 0         1113
work_id            1113
title              1113
published          1113
rating             1112
relationship       1113
additional tags    1113
category           1073
body               1113
month              1113
dtype: int64

In [11]:
GoT_race_REL = merged_got[~merged_got['relationship'].str.contains("missandei|grey worm|khal|jhiqui|irri|qotho|rakharo|sallador saan|xaro|oberyn|ellaria", na=False)]
GoT_race_REL.count()

Unnamed: 0         28784
work_id            28784
title              28784
published          28784
rating             28784
character          27833
relationship       26620
additional tags    26096
category           27399
body               28784
month              28784
dtype: int64

In [9]:
GoT_chr_REL = merged_got[merged_got['character'].str.contains("missandei|grey worm|khal|jhiqui|irri|qotho|rakharo|sallador saan|xaro|oberyn|ellaria", na=False)]
GoT_chr_REL.count()

Unnamed: 0         2508
work_id            2508
title              2508
published          2508
rating             2507
character          2508
relationship       2368
additional tags    2302
category           2386
body               2508
month              2508
dtype: int64

In [146]:
#did not include slavery because the term is sometimes used with a sexual connotation rather than in its historical, political, and violent sense

GOT_woke = merged_got[merged_got['additional tags'].str.contains("lgbtq|missandei|grey worm|femal|feminis|drag |bisex|heteronormativity|trans!|transg|dysphoria|consent|gay|nonbinary|dothraki|lesbian|gender|abolition|of color|of colour|naath|meereen|oberyn|ellaria|dorne|coloni|masc|queer| race", na=False)]
GOT_woke

Unnamed: 0.1,Unnamed: 0,work_id,title,published,rating,relationship,additional tags,category,body,month
8,8,5387489,"game of thrones fusion,",2015-12-09,"explicit,","castiel/dean winchester, castiel/john winchest...","wipadoptions, work up for adoption, alternate ...","m/m,",i did some very necessary research on misha co...,2015-12
11,11,15928970,"game of thrones stories,",2018-09-08,"teen and up audiences,","various/reader, robb stark/reader, sansa stark...","fluff, angst, smut, sexual activity, rough sex...","f/f, f/m, m/m, multi,","(y/n) lannister is the twin to tyrion, but unl...",2018-09
38,38,10423077,"otherwhen: game of thrones,",2017-03-24,"general audiences,","talisa maegyr/robb stark, catelyn stark/ned st...","what-if, other additional tags to be added, ta...","gen, f/m,",robb gets a raven from riverrun letting him kn...,2017-03
57,57,18996928,"game of thrones femslash ficlets,",2019-05-28,"explicit,","sansa stark/margaery tyrell, myrcella baratheo...","alternate universe - canon divergence, alterna...","f/f, multi,",femslash fictlets involving characters in game...,2019-05
73,73,19479925,"game of heroes: my thrones university,",2019-07-05,"mature,","jon snow/robb stark,","lots of characters, my hero academia verse, al...","m/m,",so i only started watching game of thrones thi...,2019-07
...,...,...,...,...,...,...,...,...,...,...
1866,1866,11841501,"boys of snow,",2017-08-18,"mature,","jon snow/robb stark, jon snow/jojen reed, jon ...","other additional tags to be added, smut, i wri...","m/m,","""you're such a tease."" robb's voice broke the ...",2017-08
1868,1868,19463911,"bits of stuff ii,",2019-07-03,"explicit,","tormund giantsbane/jon snow, derek hale/pack, ...","alternate universe - modern setting, establish...","m/m, multi, other,","\n""oi,"" tormund hollered after slamming the fr...",2019-07
1873,1873,17779256,"zenith,",2019-02-20,"general audiences,",,"action, action/adventure, adventure, affection...","gen,",zenith\n\nauthor's note: titled after the zeni...,2019-02
1874,1874,3509540,"alpha/omega one-shots,",2015-03-09,"teen and up audiences,","dean winchester/reader, sherlock holmes/ reade...","ofc - freeform, omega verse, alpha dean, alpha...","f/m,",something was wrong.\n\nwell...not wrong but d...,2015-03


In [144]:
GOT_maybeWokeIDK = merged_got[merged_got['additional tags'].str.contains("lgbtq|missandei|grey worm|femal|feminis|drag |bisex|heteronormativity|trans!|transg|dysphoria|consent|gay|dothraki|lesbian|gender|abolition|of color|of colour|naath|meereen|oberyn|ellaria|dorne|coloni|masc|queer| race")==False]
GOT_maybeWokeIDK.count()

Unnamed: 0         27885
work_id            27885
title              27885
published          27885
rating             27884
relationship       25821
additional tags    27885
category           26546
body               27885
month              27885
dtype: int64

In [104]:
korra_all[korra_all['additional tags'].str.contains("rans!",  na=False)]

Unnamed: 0.1,Unnamed: 0,work_id,title,rating,category,fandom,relationship,character,additional tags,language,...,status,status date,words,chapters,comments,kudos,bookmarks,hits,body,month
193,193,3979267,The Prince of the South,Mature,"F/F, F/M, Other",Avatar: Legend of Korra,"Korrasami, Korra/Asami Sato","Korra (Avatar), Asami Sato, Tonraq, Senna, Hir...","Trans!KorraXAsami, Korrasami - Freeform, Korra...",English,...,Updated,2018-02-03,99119,13/?,180.0,574.0,58.0,8519.0,\nThe Prince of the South- \n \n \n \nAvatar ...,2015-05
204,204,13544514,Promises Kept,Explicit,F/F,Avatar: Legend of Korra,Korra/Asami Sato,"Korra (Avatar), Korra, Asami Sato","Omegaverse, Alpha!Korra, Omega!Asami, Sex, Pen...",English,...,Completed,2018-02-01,3710,1/1,46.0,494.0,29.0,8561.0,\nKorra tore down the long drive to the Sato ...,2018-02
2421,2421,5680369,Helpless to the Bass and the Fadiing Light,Explicit,"F/F, F/M",Avatar: Legend of Korra,"Korra/Asami Sato, Bolin/Opal (Avatar)","Korra (Avatar), Asami Sato, Bolin (Avatar), Op...","New Year's Eve, Clubbing, Alcohol, Trans Male ...",English,...,Completed,2016-01-09,51,1/1,4.0,12.0,1.0,1405.0,Opal sighed for the millionth time as they fin...,2016-01
2791,2791,5003569,Who Am I?,Explicit,F/M,Avatar: Legend of Korra,Korra/Asami Sato,"Korra (Avatar), Asami Sato, Mako (Avatar)","Trans!Korra, Transphobia, Angst, Hurt/Comfort,...",English,...,Completed,2015-10-15,8544,1/1,19.0,316.0,38.0,6853.0,It was six weeks after they've returned from t...,2015-10
