# Background Information and Data Collection

## Background Information

### Origins and Dragons

The genesis of role-playing games (RPGs) can be traced to the late 1960s and early 1970s, rooted in tabletop wargaming traditions. A pivotal moment came in 1969 with "Braunstein," a game devised by David Wesely in St. Paul, Minnesota. Unlike traditional wargames, which focused solely on military tactics, Braunstein gave players roles such as university chancellors and student revolutionaries, emphasizing individual character actions and open-ended narratives. This approach laid the groundwork for modern role-playing games.


Building on this foundation, Dave Arneson and Gary Gygax created "Dungeons & Dragons" (D&D), first published in 1974. D&D combined fantasy storytelling with strategic gameplay, letting players assume individual character roles within a structured rule system. Its release marked the commercial birth of RPGs, fostering a dedicated following and inspiring numerous other games.



### The Old School Renaissance (OSR) Movement

In the early 2000s, the RPG community witnessed the emergence of the Old School Renaissance (OSR). This movement sought to revive the gameplay styles and design philosophies of early RPGs, particularly the original editions of D&D. OSR enthusiasts emphasized simplicity, creativity, and a do-it-yourself (DIY) ethos, often favoring "rulings over rules" to grant game masters greater flexibility and players a heightened sense of adventure. 


The OSR led to the development of "retro-clones," games that emulate the mechanics and feel of early RPGs. Notable examples include "Labyrinth Lord" (2007), "Swords & Wizardry" (2008), and "Old School Essentials" (2019). These games allowed new generations to experience classic RPG gameplay and encouraged the creation of original content within established frameworks.


### Growth of the RPG Community and Industry

From a financial perspective, the RPG industry has grown significantly since its inception. Exact monetary figures are hard to pin down because the market is diverse and fragmented, but several indicators point to this expansion. One is Wizards of the Coast (WotC), the publisher of D&D and part of Hasbro since 1999, which "has experienced a decade of growth that saw their annual profits grow 400%. The company doubled in size in the five years between 2012 to 2017, and Hasbro’s CEO Brian Goldner said today that Wizards was on track to double once again in the following five year period from 2018 to 2023."
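For scale, doubling in size over five years corresponds to a constant compound growth rate of 2^(1/5) - 1, roughly 14.9% per year. The small helper below (`implied_annual_growth` is a hypothetical name, not from any source) makes that arithmetic explicit; it assumes growth compounds evenly every year, which real company growth rarely does.

```python
def implied_annual_growth(multiple, years):
    """Constant yearly rate that compounds to `multiple` over `years` years.

    Assumes perfectly even compounding; actual year-to-year growth varies.
    """
    return multiple ** (1 / years) - 1


# "Doubled in size in the five years between 2012 to 2017"
rate = implied_annual_growth(2, 5)
print(f"{rate:.1%}")  # 14.9%
```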

In 2000, Wizards of the Coast introduced the Open Game License (OGL), which allowed third-party publishers to create content compatible with D&D. This initiative spurred a proliferation of supplementary materials and alternative RPG systems, diversifying the market and contributing to its financial growth.


In recent years, crowdfunding platforms such as Kickstarter have further fueled the industry's expansion. The "Shadowdark: The Western Reaches" campaign, for example, raised over $1.5 million. Successes like this underscore the community's enthusiasm and the financial viability of new RPG projects.

Additionally, the rise of actual-play web series, podcasts, and streams such as "Critical Role," "The Adventure Zone," and "Dimension 20" has introduced RPGs to broader audiences, driving sales of core rulebooks and related merchandise. By 2021, D&D had attracted approximately 85 million fans worldwide, engaging through both tabletop and digital formats.


[Tim Harford’s epic, 40-year Dungeons & Dragons odyssey, Financial Times](https://www.ft.com/content/40dbd2a9-d651-497b-8a02-30dbf520f154)

[History of Role-Playing Games](https://ogres.fandom.com/wiki/History_of_Role-playing_Games)

[What’s an OSR RPG? A Guide to Old School Renaissance](https://blog.worldanvil.com/dm-tips-advice/osr-rpg-guide-to-old-school-renaissance/)

[Wizards of the Coast is Now a Division of Hasbro, Will Lead Digital Licensing Initiatives](https://www.hipstersofthecoast.com/2021/02/wizards-of-the-coast-is-now-a-division-of-hasbro-will-lead-digital-licensing-initiatives/)

[The D&D Open Game License controversy, explained](https://www.washingtonpost.com/video-games/2023/01/19/dungeons-and-dragons-open-game-license-wizards-of-the-coast-explained/)

[D&D co-creator Gary Gygax is having a big week in crowdfunding](https://www.polygon.com/dnd-dungeons-dragons/541862/gary-gygax-castle-zygag-shadowdark-crowdfunding-campaigns)

[Why Dungeons & Dragons is still winning at 50](https://www.axios.com/local/seattle/2025/01/29/dnd-wizards-2025-roadmap-dungeons-dragons-50-years)

## Data Collection

We'll use [PRAW](https://praw.readthedocs.io/en/stable/index.html) to collect posts from the r/osr and r/rpg subreddits. To avoid sending too many requests to Reddit at once, we run each query in its own cell rather than in a loop. This also lets us inspect each batch of results and refine our search terms before running the next query.
* First, we define helper functions to simplify collection. `praw_search` sends a search request to the Reddit API through PRAW for a given subreddit and search string.
* Next, we store the results in a dictionary keyed by post ID, so a post returned more than once by the same search is kept only once. For each post we save the title, selftext (the body of the post), comments, subreddit name, and UTC creation time.
* Finally, we convert the dictionary to a DataFrame and save it as a CSV whose filename records the index range of the posts it contains.
  

In [1]:
import praw
import pandas as pd

In [8]:
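def praw_search(subreddit, search_str, **extra_kwargs):
    # praw.Reddit() with no arguments reads credentials from praw.ini or environment variables
    reddit = praw.Reddit()
    # Returns a lazy listing; requests are sent as the results are iterated
    return reddit.subreddit(subreddit).search(search_str, **extra_kwargs)

def create_praw_dict(posts):
    # Key by post id so a post returned twice is stored only once
    subred_dict = {}
    for post in posts:
        # post.comments is a CommentForest object, not text: drop the
        # "load more comments" placeholders (no extra requests), then keep comment bodies
        post.comments.replace_more(limit=0)
        subred_dict[post.id] = {
            "title": post.title,
            "selftext": post.selftext,
            "comments": [comment.body for comment in post.comments.list()],
            "subreddit": post.subreddit.display_name,
            "created_utc": post.created_utc,
        }
    return subred_dict

def save_csv(start_index, posts_dict, subreddit):
    # Make a DataFrame with one row per post (post ids become the index)
    df = pd.DataFrame(posts_dict).T

    # Build a filename that records the range of posts in this batch
    num_posts = len(posts_dict)
    stop_index = start_index + num_posts
    filename = f"../data/{subreddit}_{start_index}-{stop_index}.csv"
    # Save the DataFrame (assumes the ../data folder already exists)
    df.to_csv(filename)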
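Because the helpers only read a few attributes from each post, the dictionary-and-filename logic can be checked offline with stand-in objects before spending any requests. The sketch below is a simplified, hypothetical copy of that logic (`posts_to_frame` and `batch_filename` are illustrative names, and `types.SimpleNamespace` objects stand in for PRAW submissions); it shows a repeated post ID collapsing into one row and the batch filename that results.

```python
from types import SimpleNamespace

import pandas as pd


def posts_to_frame(posts):
    """Hypothetical simplified version of create_praw_dict plus the DataFrame step."""
    rows = {}
    for post in posts:
        # Keying by id means a repeated post overwrites its earlier entry
        rows[post.id] = {
            "title": post.title,
            "selftext": post.selftext,
            "created_utc": post.created_utc,
        }
    # Outer keys (post ids) become columns, so transpose to get one row per post
    return pd.DataFrame(rows).T


def batch_filename(subreddit, start_index, num_posts):
    """Same naming scheme as save_csv: <prefix>_<start>-<stop>.csv."""
    return f"../data/{subreddit}_{start_index}-{start_index + num_posts}.csv"


# Stand-ins for PRAW submissions; "abc" appears twice on purpose
fake_posts = [
    SimpleNamespace(id="abc", title="First", selftext="body", created_utc=1.0),
    SimpleNamespace(id="def", title="Second", selftext="", created_utc=2.0),
    SimpleNamespace(id="abc", title="First", selftext="body", created_utc=1.0),
]

df = posts_to_frame(fake_posts)
print(len(df))  # 2
print(batch_filename("osr", 0, len(df)))  # ../data/osr_0-2.csv
```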

### OSR Data Collection

In [6]:
start_index = 0
subreddit = "osr"

In [9]:
# First we look at explicit OSR reviews

posts = praw_search('osr', "flair:review", limit=500)
osr_dict = create_praw_dict(posts)
save_csv(start_index, osr_dict, subreddit)
start_index += len(osr_dict)

In [None]:
# Newest text (self) posts
posts = praw_search("osr", "self:true", sort="new", limit=500)
osr_dict = create_praw_dict(posts)
save_csv(start_index, osr_dict, subreddit)
start_index += len(osr_dict)

In [None]:
# Text (self) posts sorted by highest comment count over all time
# (intended to reach older posts than a "new" sort returns)

posts = praw_search("osr", "self:true", sort="comments", time_filter="all", limit=500)
osr_dict = create_praw_dict(posts)
save_csv(start_index, osr_dict, subreddit)
start_index += len(osr_dict)

In [None]:
# Text posts sorted by highest comment count, restricted to recent posts
# (the one-month window is an assumption; "week" or "year" are also valid)

posts = praw_search("osr", "self:true", sort="comments", time_filter="month", limit=200)
osr_dict = create_praw_dict(posts)
save_csv(start_index, osr_dict, subreddit)
start_index += len(osr_dict)

In [None]:
# Text posts that mention "rpg", sorted by relevance

posts = praw_search("osr", "rpg self:true", sort="relevance", limit=300)
osr_dict = create_praw_dict(posts)
save_csv(start_index, osr_dict, subreddit)
start_index += len(osr_dict)

### RPG Data Collection

In [None]:
start_index = 0
subreddit = "rpg1"  # prefix for the saved CSV filenames

In [None]:
# Newest 150 posts from r/rpg (a plain listing, not a search)

posts = praw.Reddit().subreddit("rpg").new(limit=150)
rpg_dict = create_praw_dict(posts)
save_csv(start_index, rpg_dict, subreddit)
start_index += len(rpg_dict)

In [None]:
# Top 150 posts of all time from r/rpg (.top() defaults to time_filter="all")
posts = praw.Reddit().subreddit("rpg").top(limit=150)
rpg_dict = create_praw_dict(posts)
save_csv(start_index, rpg_dict, subreddit)
start_index += len(rpg_dict)

In [None]:
# Text posts that mention "rpg", sorted by highest comment count

posts = praw_search("rpg", "rpg self:true", sort="comments", limit=500)
rpg_dict = create_praw_dict(posts)
save_csv(start_index, rpg_dict, subreddit)
start_index += len(rpg_dict)
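Reddit's search only understands a fixed set of sort orders and time windows, and an unsupported string may not produce an obvious error, so results could come back in a different order than intended. A minimal guard, assuming the values listed in PRAW's `Subreddit.search` documentation (`check_search_kwargs` is a hypothetical helper, not part of PRAW), could look like this:

```python
# Accepted values as listed in PRAW's Subreddit.search() documentation (assumption:
# this list matches the PRAW version in use)
VALID_SORTS = {"relevance", "hot", "top", "new", "comments"}
VALID_TIME_FILTERS = {"all", "day", "hour", "month", "week", "year"}


def check_search_kwargs(sort="relevance", time_filter="all"):
    """Return search keyword arguments, raising ValueError for unknown values."""
    if sort not in VALID_SORTS:
        raise ValueError(f"unknown sort {sort!r}; expected one of {sorted(VALID_SORTS)}")
    if time_filter not in VALID_TIME_FILTERS:
        raise ValueError(
            f"unknown time_filter {time_filter!r}; "
            f"expected one of {sorted(VALID_TIME_FILTERS)}"
        )
    return {"sort": sort, "time_filter": time_filter}
```

For example, `praw_search("rpg", "rpg self:true", limit=500, **check_search_kwargs(sort="comments"))` would pass the checked values through to PRAW, while a typo such as `sort="comment count"` would fail before any request is sent.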

In [None]:
# Text posts that mention "rpg", sorted by relevance

posts = praw_search("rpg", "rpg self:true", sort="relevance", limit=300)
rpg_dict = create_praw_dict(posts)
save_csv(start_index, rpg_dict, subreddit)
start_index += len(rpg_dict)
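Each call to `create_praw_dict` removes duplicates only within a single search, so overlapping queries (for example, "new" and "relevance" on the same subreddit) can still save the same post in two CSVs. A minimal sketch of recombining the batches, assuming the CSVs were written by `save_csv` with the post ID as the first (index) column; `load_batches` is a hypothetical helper:

```python
import glob

import pandas as pd


def load_batches(pattern):
    """Read every CSV matching `pattern` and keep one row per post id."""
    paths = sorted(glob.glob(pattern))
    if not paths:
        return pd.DataFrame()
    # save_csv wrote the post id as the first (index) column
    frames = [pd.read_csv(path, index_col=0) for path in paths]
    combined = pd.concat(frames)
    # Keep the first copy of any post that appears in more than one batch
    return combined[~combined.index.duplicated(keep="first")]
```

For instance, `load_batches("../data/osr_*.csv")` would return one de-duplicated DataFrame for the OSR batches. Files are read in lexicographic filename order, which only decides which copy of a duplicated post is kept.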

### Now that our data is saved as CSVs in the data folder, we'll move on to [cleaning our data and performing EDA](./cleaning_and_eda.ipynb)