# Ao3 Tags on "Fix-it Fics" as an Indicator of Satisfying Endings #

Are there common themes among fix-it fics that reveal what audiences crave but the original works lack? To answer this, I will scrape data from Archive of Our Own (Ao3) to compare the tags used on “fix-it” fanfictions, meaning fanfictions that aim to give a story a “better” ending, across Voltron, Supernatural, and Star Wars: The Sequel Trilogy. The answer could suggest which themes audiences want to see in media in general, and could serve as a recommendation to media writers on how to create “satisfying” endings.


## Data Collection ##

To do this, I will use the following packages.

In [None]:
import pandas as pd
from bs4 import BeautifulSoup as bs
import json
import requests
import time

I request data from Ao3 using the requests library. Ao3 has no API and little documentation on how its urls are constructed, and I was unsure how to generate them myself. Instead, I ran each search on the actual Ao3 site and pasted the resulting url into this notebook.
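For reference, the pasted url is an ordinary query string, so it could in principle be assembled with the standard library. The sketch below is a minimal, untested-against-the-site illustration: `build_search_url` is a hypothetical helper name, the parameter names and values are copied from the url used in this notebook, and leaving out the empty parameters is an assumption about how Ao3 treats them.

```python
from urllib.parse import urlencode

BASE_URL = "https://archiveofourown.org/tags/"


def build_search_url(fandom_path, page_num, tag="Fix-It"):
    """Assemble a works-search url like the one pasted from the Ao3 site.

    fandom_path is the already-encoded fandom name as it appears in the url,
    for example "Voltron:%20Legendary%20Defender".
    """
    params = {
        "commit": "Sort and Filter",
        # 13 is the rating id excluded in the pasted url
        "exclude_work_search[rating_ids][]": 13,
        "page": page_num,
        "utf8": "\u2713",  # the check mark that Ao3 puts in its urls
        "work_search[other_tag_names]": tag,
        "work_search[sort_column]": "revised_at",
    }
    # urlencode percent-encodes the brackets and turns spaces into "+"
    return BASE_URL + fandom_path + "/works?" + urlencode(params)


print(build_search_url("Voltron:%20Legendary%20Defender", 1))
```

Comparing the printed url against one copied from the site is a quick way to check whether the shortened parameter list still returns the same results.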



### Playing with the Data ###

I wanted to see how to scrape Ao3, so I used Voltron as a test. I obtained the url to scrape by doing a manual search on Ao3's website. Ao3 lists 20 fanfics per page, so this request returns 20 fanfics' worth of tags. Ao3 does not limit the number of tags, so how many appear depends entirely on what each author decided to list when publishing the fic. I also decided to exclude any fanfictions rated Explicit, since this is a school project and I have to present this data. This fits the overall mission as well, since the source material for all of the fandoms I chose is limited in how graphic it can be by its TV-14/PG-13 rating.

In [None]:
page = requests.request("GET", 'https://archiveofourown.org/tags/Voltron:%20Legendary%20Defender/works?commit=Sort+and+Filter&exclude_work_search%5Brating_ids%5D%5B%5D=13&page=1&utf8=%E2%9C%93&work_search%5Bcomplete%5D=&work_search%5Bcrossover%5D=&work_search%5Bdate_from%5D=&work_search%5Bdate_to%5D=&work_search%5Bexcluded_tag_names%5D=&work_search%5Blanguage_id%5D=&work_search%5Bother_tag_names%5D=Fix-It&work_search%5Bquery%5D=&work_search%5Bsort_column%5D=revised_at&work_search%5Bwords_from%5D=&work_search%5Bwords_to%5D=')

In [None]:
page

In [None]:
soup = bs(page.text, "html.parser")

I looked through the soup to find which html elements contain the "tag" items that I will need for my analysis. In this case, they are link elements (`<a>` with an `href`) that have the class "tag".

In [None]:
soup

In [None]:
tags = soup.find_all('a', class_='tag')

With this, I can clean up the list so it just contains what is inside the tags.

In [None]:
tags

In [None]:
tagList = [x.text for x in soup.find_all('a', class_='tag')]

In [None]:
tagList

And with that, I have a list of tags for the first 20 fix-it fanfics from Voltron in Ao3. 
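To make the extraction step easier to see without hitting the site, here is a small offline sketch. It is a stand-in, not the method used above: it uses the standard library's `html.parser` instead of BeautifulSoup, `TagLinkParser` is a hypothetical class name, and the html fragment is made up for illustration rather than copied from Ao3.

```python
from html.parser import HTMLParser


class TagLinkParser(HTMLParser):
    """Collect the text of every <a> element whose class list includes "tag"."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self._buffer = None  # holds text pieces while inside a matching <a>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "tag" in classes:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._buffer is not None:
            self.tags.append("".join(self._buffer))
            self._buffer = None


# Made-up fragment for illustration only
sample_html = """
<ul>
  <li><a class="tag" href="/tags/Fix-It/works">Fix-It</a></li>
  <li><a class="tag" href="/tags/Example/works">Example Tag</a></li>
  <li><a href="/works/0">Not a tag link</a></li>
</ul>
"""

parser = TagLinkParser()
parser.feed(sample_html)
parser.close()
print(parser.tags)
```

The link without the "tag" class is skipped, which is the same filtering that `soup.find_all('a', class_='tag')` performs on the real page.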


### Function Creation ###
Now that I have learned how to scrape the tags from Ao3, I will create a function that takes the fandom name, a data frame, and the number of pages to iterate through as parameters and returns a data frame of tags for that fandom. The function works as long as you use the exact phrase that Ao3 uses for the fandom. If it is your first time using the function, pass in an empty data frame. To keep appending onto that data frame, feed the same data frame back into the function along with a different fandom title. The function will return a data frame with two columns: one with the tags and the other with the name of the fandom each tag came from.

The first part of the function changes the showName so that spaces are replaced with "%20", as that is how the url represents spaces. The function then iterates through each page requested, scraping the tags from each page. It also fills in the "Fandom" column with the fandom each tag came from.
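As an aside, the space replacement described above can also be done with `urllib.parse.quote` from the standard library. This is a minimal sketch rather than what the function below does: `encode_fandom` is a hypothetical helper name, and passing `safe=":"` is a choice made here so the result matches the url pasted from Ao3, where the colon is left as-is.

```python
from urllib.parse import quote


def encode_fandom(show_name):
    """Percent-encode a fandom name for use in the url path."""
    # safe=":" keeps the colon literal, as in the url copied from the site
    return quote(show_name, safe=":")


print(encode_fandom("Voltron: Legendary Defender"))
```

For the Voltron name this gives the same path segment that appears in the pasted url, with no trailing "%20".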

In [None]:
def getTags(showName, df, getPages):
    # Encode spaces as "%20", which is how the url represents them
    showNameEncoded = showName.replace(" ", "%20")
    print(showNameEncoded)
    rows = []
    # Ao3 pages are numbered from 1, so include page number getPages itself
    for pageNum in range(1, getPages + 1):
        page = requests.request("GET", "https://archiveofourown.org/tags/" + showNameEncoded + "/works?commit=Sort+and+Filter&exclude_work_search%5Brating_ids%5D%5B%5D=13&page=" + str(pageNum) + "&utf8=%E2%9C%93&work_search%5Bcomplete%5D=&work_search%5Bcrossover%5D=&work_search%5Bdate_from%5D=&work_search%5Bdate_to%5D=&work_search%5Bexcluded_tag_names%5D=&work_search%5Blanguage_id%5D=&work_search%5Bother_tag_names%5D=Fix-It&work_search%5Bquery%5D=&work_search%5Bsort_column%5D=revised_at&work_search%5Bwords_from%5D=&work_search%5Bwords_to%5D=")
        soup = bs(page.text, "html.parser")
        # Pair each tag with its fandom so rows already in df keep their own labels
        rows.extend({"Tag": x.text, "Fandom": showName} for x in soup.find_all('a', class_='tag'))
    # DataFrame.append was removed in pandas 2.0, so build the new rows and concatenate
    newTags = pd.DataFrame(rows, columns=["Tag", "Fandom"])
    return pd.concat([df, newTags], ignore_index=True)
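
The function sends its page requests back to back, and the `time` import at the top of this notebook is never used. A hedged sketch of pacing the requests follows. `fetch_pages` is a hypothetical helper, the 5 second default is an arbitrary choice rather than a documented Ao3 limit, and the `get` argument is meant to be something like `requests.get`.

```python
import time


def fetch_pages(get, urls, pause_seconds=5.0, sleep=time.sleep):
    """Fetch each url in order, pausing between requests.

    get is any callable that returns an object with .status_code and .text,
    such as requests.get. Stops with an error on the first non-200 response.
    """
    pages = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(pause_seconds)  # wait before every request except the first
        response = get(url)
        if response.status_code != 200:
            raise RuntimeError(f"Request for {url} returned {response.status_code}")
        pages.append(response.text)
    return pages


# Possible usage, assuming urls is a list of Ao3 search urls:
# pages = fetch_pages(requests.get, urls)
```

Each returned string can then be passed to `bs(..., "html.parser")` in the same way as `page.text`.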

### Voltron ###

I chose Voltron for this project because its ending is notoriously disliked by fans. I will use my function to obtain the tags from the first 5 pages.

In [None]:
tag_df = pd.DataFrame()
final_tagList = getTags("Voltron: Legendary Defender", tag_df, 5)

In [None]:
final_tagList.head(30)

### Supernatural ###

Now that I have several pages' worth of tags for Voltron, I will repeat this process for the TV show Supernatural, chosen for its popularity as a subject of fanfiction (over 240,000 fics!) and for its bad ending.

Since I have already worked out what tags I need and how I can get them from this site, I will be condensing my process to fewer cells with less explanation.

In [None]:
final_tagList = getTags("Supernatural", final_tagList, 5)

### Star Wars: The Sequel Trilogy ###

This fandom differs from the previous two: it is much bigger than Voltron or even Supernatural, since the franchise spans many different movies and TV shows. I chose Star Wars because I personally did not like the ending of Star Wars Episode IX: The Rise of Skywalker. I did a preliminary check and found fix-it fanfics for that movie. However, when choosing which fandom page to filter and scrape, I found that there were 11 categories for the Star Wars media franchise. Rather than look through the category that covers all of the media, I stayed true to what made me pick Star Wars in the first place and chose the "Star Wars: The Sequel Trilogy" fandom, since it includes The Rise of Skywalker.

Despite this being a movie trilogy rather than a tv show, I do not expect this media difference to affect tag usage and preferences among fix-it fanfics. 

In [None]:
final_tagList = getTags("Star Wars Sequel Trilogy", final_tagList, 5)

In [None]:
final_tagList
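
With tags from all three fandoms in one place, one possible way to start comparing them is to count the most frequent tags per fandom. The sketch below is only an illustration of that idea: `top_tags_by_fandom` is a hypothetical helper, it works on plain (tag, fandom) pairs rather than on the data frame itself, and the sample rows are placeholders, not scraped results.

```python
from collections import Counter, defaultdict


def top_tags_by_fandom(rows, n=3):
    """Return the n most common tags for each fandom.

    rows is an iterable of (tag, fandom) pairs.
    """
    counts = defaultdict(Counter)
    for tag, fandom in rows:
        counts[fandom][tag] += 1
    return {fandom: counter.most_common(n) for fandom, counter in counts.items()}


# Placeholder rows for illustration only, not real data
sample_rows = [
    ("Fix-It", "Show A"),
    ("Tag X", "Show A"),
    ("Fix-It", "Show A"),
    ("Fix-It", "Show B"),
    ("Fix-It", "Show B"),
    ("Tag Y", "Show B"),
]
print(top_tags_by_fandom(sample_rows, n=1))
```

If the data frame holds the tag in its first column and the fandom in its second, the pairs could come from `final_tagList.itertuples(index=False, name=None)`.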