![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fcallysto%2Fdata-viz-of-the-week&branch=main&subPath=dungeons-dragons/dungeons-and-dragons.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# Callysto’s Weekly Data Visualization

## Dungeons and Dragons

### Recommended Grade levels: 9-12
<br>

<center> <img src='https://github.com/callysto/data-files/raw/main/data-viz-of-the-week/dungeons-dragons/img/DND_Art7.png' width=400></center>

### Instructions
#### “Run” the cells to see the graphs
Click “Cell” and select “Run All”.<br> This will import the data and run all the code, so you can see this week's data visualization. Scroll to the top after you’ve run the cells.<br> 

![instructions](https://github.com/callysto/data-viz-of-the-week/blob/main/images/instructions.png?raw=true)

**You don’t need to do any coding to view the visualizations**.
The plots generated in this notebook are interactive. You can hover over and click on elements to see more information. 

Email contact@callysto.ca if you experience issues.

### About this Notebook

Callysto's Weekly Data Visualization is a learning resource that aims to develop data literacy skills. We provide Grades 5-12 teachers and students with a data visualization, like a graph, to interpret. This companion resource walks learners through how the data visualization is created and interpreted by a data scientist. 

The steps of the data analysis process are listed below and applied to each weekly topic.

1. Question - What are we trying to answer? 
2. Gather - Find the data source(s) you will need. 
3. Organize - Arrange the data, so that you can easily explore it. 
4. Explore - Examine the data to look for evidence to answer the question. This includes creating visualizations. 
5. Interpret - Describe what's happening in the data visualization. 
6. Communicate - Explain how the evidence answers the question.

### Acknowledgement

This notebook is unofficial Fan Content permitted under the Fan Content Policy. Not approved/endorsed by Wizards. Portions of the materials used are property of Wizards of the Coast. ©Wizards of the Coast LLC.

For more information about the game, check out the official D&D website or the resources linked at the end of the notebook.

# Question

What are the most popular types of Dungeons and Dragons characters to play?

### Background

Dungeons and Dragons (D&D) is an incredibly popular table-top roleplaying game (TTRPG), first published in 1974 after being developed by Gary Gygax and Dave Arneson. Over the years, its popularity and sales have grown considerably as newer editions are released and the game finds wider adoption. As of writing, the current version is 5th Edition, which was released in late 2014, though a newer edition is set for release in 2024.

The gameplay consists of one player, the Dungeon Master (or DM), describing the setting and interpreting the rules, as other players act out their characters within that world. Often D&D is played without any board, but sometimes groups will use maps and miniature figurines to keep track of the chaos. In either case, every player will have their character sheet and a set of dice.

The game presents an opportunity for players to act out the gameplay, taking on the persona of their character and interacting with the world according to their character's abilities. For a lot of D&D players, creating their character is the most important and most creative aspect of the game, so we can explore what's common (and what isn't) when it comes to character creation.

**If you're new to D&D**, skip to the resources at the end of the notebook, as it's a very easy game to learn and Wizards of the Coast has made the game rules free to access!


### Goal
Can we determine some of the most popular choices in character building in D&D, and develop a character ourselves?

# Gather

### Code:
The code below will import the Python programming libraries we need to gather and organize the data to answer our question.

In [None]:
## import libraries
%pip install -q pyodide_http plotly nbformat
import pyodide_http
pyodide_http.patch_all()
import pandas as pd
import plotly.express as px
import random

### Data:

For this notebook, the data came from a user who had developed an app to print character sheets. As part of the printing process, the user was able to collect data on the characters, and pulled it together into one large dataset that they've shared on their [GitHub profile here](https://github.com/oganm/dnddata).

### Import the data

In [None]:
## import data
df = pd.read_csv('https://raw.githubusercontent.com/callysto/data-files/main/data-viz-of-the-week/dungeons-dragons/data/dnd_chars_data.csv')
display(df)

### Comment on the data
As we can see from the numbers below the dataframe, we have 10,894 characters in this dataset, where each character has 36 attributes for us to consider. We probably won't need all of these, so at the next step we can remove unneeded columns.

<center> <img src='https://github.com/callysto/data-files/raw/main/data-viz-of-the-week/dungeons-dragons/img/DND_Art2.png' width=400></center>

# Organize

The code below will arrange the data cleanly so that we can analyze it. This is a quality control step: we examine the data for anything odd (e.g. structure, missing values), fix those oddities, and check that the fixes worked.

The first few columns and the final column contain information that might be important to the creator of the app that collected the data, but they don't hold any meaning for us, so we can get rid of them. Similarly, several columns throughout the dataset contain information on alignment, but that data is spotty, so we'll remove those columns as well.
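To put a number on "spotty", one option is to measure what share of each column is empty. In pandas that is typically `df.isna().mean()`; the plain-Python sketch below does the same calculation on a few made-up rows (the rows and values are hypothetical, not taken from the real dataset).

```python
# Hypothetical sample rows; None marks a missing value
rows = [
    {"race": "Elf", "justClass": "Wizard", "alignment": None},
    {"race": "Human", "justClass": "Fighter", "alignment": "Lawful Good"},
    {"race": "Dwarf", "justClass": "Cleric", "alignment": None},
    {"race": "Human", "justClass": "Rogue", "alignment": None},
]

def missing_fraction(rows):
    """Return {column: share of rows where that column is None}."""
    columns = rows[0].keys()
    return {col: sum(row[col] is None for row in rows) / len(rows) for col in columns}

print(missing_fraction(rows))
```

A column where most values are missing, like `alignment` here, gives us very little to analyze.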

In [None]:
# See what the column names are
df.columns

In [None]:
df_cleaned = df.drop(['Unnamed: 0',
                      'ip',
                      'finger',
                      'hash',
                      'name',
                      'alias',
                      'alignment',
                      'processedAlignment',
                      'good',
                      'lawful'],
                    axis=1)
df_cleaned

One of the most important aspects of a D&D character is its class, as that has the greatest effect on how the character interacts with the world. There is information in the dataset about class, but because in D&D 5E you have the ability to *multiclass* (or have more than one class), storing this information in a single column can be kind of muddy.
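One way to untangle a combined class column is to split each entry into its individual classes. The sketch below assumes multiclass entries separate classes with a `|` character (for example `'Fighter|Rogue'`); that separator is an assumption, so check the real `justClass` values before relying on it. Counting the pieces also gives a direct way to sort by number of classes, rather than by the length of the text, which is what `key=len` in the sorting cell below does.

```python
# Assumption: multiclass entries list their classes separated by "|",
# e.g. "Fighter|Rogue". Check the real column before relying on this.
entries = ["Wizard", "Fighter|Rogue", "Paladin|Sorcerer|Warlock", "Cleric"]

def split_classes(entry, sep="|"):
    """Split one class entry into a list of individual class names."""
    return entry.split(sep)

# Sort by how many classes each entry combines, most classes first
# (sorted() is stable, so ties keep their original order)
by_class_count = sorted(entries, key=lambda e: len(split_classes(e)), reverse=True)
print(by_class_count)
```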

Let's see what classes (or multiclass combinations) exist in this data. The cell below sorts them by the length of their names, longest first, which puts most of the multiclass combinations at the top:

In [None]:
sortedClasses = sorted(df_cleaned['justClass'].unique(),key=len)
sortedClasses[::-1]

Wow, there are some really creative characters in here! It's not uncommon to multiclass to take advantage of the abilities of more than one class, but it's very unusual to combine more than 2 or 3 classes. Imagine keeping track of all those abilities and attributes!

But as we see below, multiclassing isn't very common overall, and most of the 20 most common classes are single classes:

In [None]:
classCounts = df_cleaned['justClass'].value_counts().head(20) # find 20 most common values for 'justClass' and their frequency
classCounts

All the same, we'll keep only these 20 most common classes (each of which appears more than once) and remove everything else, including the one-of-a-kind combinations, as they're not really representative of how most people play the game:

In [None]:
repeatedClass = set(classCounts[classCounts > 1].index.values) # of the 20 most common classes, keep those that appear more than once
repeatedClass

In [None]:
df_cleaned = df_cleaned[df_cleaned['justClass'].isin(repeatedClass)] # filter the data to only include the above classes
df_cleaned

By removing those uncommon entries, our dataset went from 10,894 characters to 10,345, so we still retain $(10345 / 10894) \approx 95\%$ of our data. We'll be removing more in the next steps, but it's important to consider the effect that data cleaning has on our data.
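The filter cell above keeps classes from `classCounts`, which only holds the 20 most common classes, so the `> 1` check is applied within that top 20. As a sketch of the more general rule ("keep any class that appears more than once"), here is a plain-Python version on a small hypothetical column, along with the percentage-retained calculation from the paragraph above.

```python
from collections import Counter

def keep_repeated(values, min_count=2):
    """Keep only values whose category appears at least min_count times."""
    counts = Counter(values)
    return [v for v in values if counts[v] >= min_count]

def percent_retained(kept, original):
    """Share of the original rows that survive cleaning, as a percentage."""
    return 100 * kept / original

# Hypothetical class column with one one-of-a-kind multiclass entry
classes = ["Fighter", "Wizard", "Fighter", "Rogue", "Wizard", "Rogue", "Bard|Cleric|Druid"]
kept = keep_repeated(classes)
print(len(kept), "of", len(classes), "rows kept")

# The notebook's own numbers: 10,345 of 10,894 characters kept
print(round(percent_retained(10345, 10894), 1))  # 95.0
```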

While we're here, we'll also do something similar with `race`, as there are likely many custom (or *homebrew*) races in this dataset as well. For this part of cleaning, we'll include any race that makes an appearance more than 50 times in the dataset, as the subraces pollute this attribute quite a bit:

In [None]:
raceCounts = df_cleaned['race'].value_counts().head(45)
repeatedRace = set(raceCounts[raceCounts > 50].index.values)
df_cleaned = df_cleaned[df_cleaned['race'].isin(repeatedRace)].reset_index(drop=True)

<center> <img src='https://github.com/callysto/data-files/raw/main/data-viz-of-the-week/dungeons-dragons/img/DND_Art1.png' width=400></center>

# Explore

The code below will be used to help us look for evidence to answer our question, to see which races and classes are the most common in D&D. This can involve looking at data in table format, applying math and statistics, and creating different types of visualizations to represent our data.

The first thing we should look at is which classes and races are the most popular. This is easy to do with a bar chart:

In [None]:
px.histogram(df_cleaned, x='justClass', 
             category_orders=dict(justClass=classCounts.index),
             title='Most Popular D&D Classes',
             labels={'justClass':'Class'})

In [None]:
px.histogram(df_cleaned, x='race', 
             category_orders=dict(race=raceCounts.index),
             title='Most Popular D&D Races',
             labels={'race':'Race'})
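Because `justClass` and `race` hold categories rather than numbers, `px.histogram` draws one bar per category, and each bar's height is the number of rows with that value (the same numbers `value_counts()` reports). A text-only sketch of that counting, using a made-up list of races:

```python
from collections import Counter

# Hypothetical race column; each entry is one character
races = ["Human", "Elf", "Human", "Dwarf", "Elf", "Human", "Tiefling"]

# For a categorical x-axis, each bar's height is how many rows share that value
counts = Counter(races)
for race, n in counts.most_common():
    print(f"{race:<10}{'#' * n} ({n})")
```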

### Interpret
Let's take a second to look at the visualizations and see what we can learn from them.

1. What are the most common classes? Is there a reason for the clear drop-off after the top 12 answers? (hint: check the basic rules or Player's Handbook)
1. What is the most common race? Why do you think that might be?

## Bringing it Together

Some combinations of race and class are more common than others. Using the heatmap below, we can see if the trends we saw above with the bar charts hold when we start combining the data. Hover over the squares for more specific data on each combination:

In [None]:
px.density_heatmap(df_cleaned, x='race', y='justClass',
                   height=600, width=1000,
                   title='Heatmap of common D&D race and class combinations',
                   labels={'justClass':'Class', 'race':'Race'})
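With two categorical axes, `px.density_heatmap` shades each square by how many rows share that (race, class) pair, which is roughly the table `pd.crosstab(df_cleaned['justClass'], df_cleaned['race'])` would give. A plain-Python sketch of that pair counting on hypothetical data:

```python
from collections import Counter

# Hypothetical (race, class) pairs, one per character
characters = [
    ("Human", "Fighter"), ("Elf", "Wizard"), ("Human", "Fighter"),
    ("Elf", "Ranger"), ("Human", "Wizard"), ("Elf", "Wizard"),
]

pair_counts = Counter(characters)
races = sorted({race for race, _ in characters})
classes = sorted({cls for _, cls in characters})

# Print a small grid: rows are classes, columns are races (like the heatmap's axes)
print("".ljust(9) + "".join(r.ljust(7) for r in races))
for cls in classes:
    print(cls.ljust(9) + "".join(str(pair_counts[(r, cls)]).ljust(7) for r in races))
```

Pairs that never occur simply count as 0, which shows up as the emptiest squares on the heatmap.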

### Interpret

1. What can you learn from this that the previous two visualizations didn't show?
1. Is there a build that's obviously the most common?

## Ability scores

Though choosing your character's race (or *species* in more recent versions of the game) and class is important from a roleplaying perspective, even more important is how those choices affect your character's main stats: ability scores.

In D&D 5E there are 6 ability scores:
- Strength (STR)
- Dexterity (DEX)
- Constitution (CON)
- Intelligence (INT)
- Wisdom (WIS)
- Charisma (CHA)

Both racial traits and class traits can affect these scores, so building a character means weighing how a given race and class combination shapes the final ability scores. Below we can look at the distribution of each ability score for each race and class combination.

However, because the tool this data came from lets players enter their own values, some characters have scores that fall outside the rules of the game. [According to the Player's Handbook](https://www.dndbeyond.com/sources/basic-rules/using-ability-scores#AbilityScoresandModifiers), even after bonuses are applied, an ability score can't exceed 30.
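As a rough illustration of the rule, the sketch below checks whether a score is in range and computes its *ability modifier*, the bonus or penalty the game derives from each score: subtract 10, divide by 2, and round down. The notebook's own filter only checks the upper limit of 30; the lower limit of 1 used here is based on the rules' modifier table, where scores run from 1 to 30.

```python
MIN_SCORE, MAX_SCORE = 1, 30  # range covered by the rules' modifier table

def is_valid_score(score):
    """True if an ability score falls within the 1-30 range."""
    return MIN_SCORE <= score <= MAX_SCORE

def ability_modifier(score):
    """5E modifier: subtract 10, divide by 2, round down (// floors)."""
    return (score - 10) // 2

for score in [8, 15, 30, 35]:
    if is_valid_score(score):
        print(score, "->", f"{ability_modifier(score):+d}")
    else:
        print(score, "-> outside the rules, would be dropped")
```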

<center> <img src='https://github.com/callysto/data-files/raw/main/data-viz-of-the-week/dungeons-dragons/img/DND_PHCover.png' width=400> </center>

Below we'll remove the offending characters:

In [None]:
# Flag any character with at least one ability score above 30, then drop those rows
abilityCols = ['Str', 'Dex', 'Con', 'Int', 'Wis', 'Cha']
tooHigh = (df_cleaned[abilityCols] > 30).any(axis=1)
df_cleaned = df_cleaned[~tooHigh]

And then we can plot the data to look at what the typical ranges are for each of the ability scores for each race / class combination! With this many races and classes, the data can be kind of messy, but feel free to click on the classes in the legend to enable or disable them in the plot.

### Box plots

Box plots are an excellent way to visualize the distribution of numbers within a category. In the plot below, you'll see for each race and class a shape that tells you how the numbers are distributed. 

The box itself represents several statistical parameters: the bottom edge of the box is the 1st quartile (the value below which 25% of the data falls when it's sorted from lowest to highest), the top edge is the 3rd quartile (the value below which 75% of the data falls), and the line inside the box is the 2nd quartile, or *median* (the middle value, with 50% of the data below it).

Projecting out of the box in either direction are lines, often called *whiskers*. They reach the lowest and highest values in the data once outliers have been excluded. An *outlier* is a value so much higher or lower than the rest of the data that it's treated separately; in box plots, outliers are usually drawn as dots beyond the whiskers.
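To make those definitions concrete, here is a sketch that computes the box and whiskers for a small, made-up group of scores. It uses linear-interpolation quartiles and the common rule that points more than 1.5 times the IQR (the height of the box) beyond the box are outliers; we believe this matches Plotly's defaults, but Plotly's exact quartile method may give slightly different values on small groups.

```python
import statistics

# Hypothetical STR scores for one race/class group
scores = [9, 10, 10, 12, 13, 14, 15, 16, 25]

# Quartiles by linear interpolation ("inclusive" method)
q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: the height of the box

# Common convention: points more than 1.5 * IQR beyond the box are outliers
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
inside = [s for s in scores if low_fence <= s <= high_fence]
outliers = [s for s in scores if s < low_fence or s > high_fence]

print(f"Q1={q1}, median={median}, Q3={q3}")
print(f"whiskers: {min(inside)} to {max(inside)}, outliers: {outliers}")
```

Here the box runs from 10 to 15 with the median at 13, the whiskers stop at 9 and 16, and the score of 25 would be drawn as a dot.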

In [None]:
# Change the below value to one of: 'Str', 'Dex', 'Con', 'Int', 'Wis', 'Cha'
abScore = 'Str'

fig = px.box(df_cleaned, x='race', y=abScore, color='justClass',
             height=800, title=f'Distribution of {abScore.upper()} ability score by race and class',
            labels={abScore:abScore.upper(), 'race':'Race','justClass':'Class'})
fig.update_layout({'plot_bgcolor': 'gainsboro'})
fig.show()

### Interpret

1. Do certain races tend to have higher or lower scores in specific abilities? What about classes?
1. How do different classes compare across the same race?
1. Compare the different spellcasting classes (Bard, Druid, Cleric, Warlock, Sorcerer, Wizard). Is there a clearer pattern to the races used for these classes? (hint: consider the [ability used for spellcasting](https://nwn2.fandom.com/wiki/Spellcasting_class#Spellcasting_ability) for each of these classes)

## Build your own character!

Now that we've explored some common themes in character creation, use the tool below to randomly generate a character! Each time you run this cell, it will randomly generate a new character.

In [None]:
# The D&D 5E "standard array" of base ability scores, shuffled so each ability gets a random one
stdArray = [15, 14, 13, 12, 10, 8]
random.shuffle(stdArray)

# Pick a random race and class from those left in the cleaned data
print(f"Your character:\n\nA {random.choice(df_cleaned['race'].unique())} {random.choice(df_cleaned['justClass'].unique())}, with the following base ability scores: \nSTR: {stdArray[0]}\nDEX: {stdArray[1]}\nCON: {stdArray[2]}\nINT: {stdArray[3]}\nWIS: {stdArray[4]}\nCHA: {stdArray[5]}")
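The generator above gives every race and class the same chance, including rare ones. If you'd like random characters that reflect what players actually choose, one variation (our own, not part of the original notebook) is to weight each option by how often it appears. Python's `random.choices` accepts a `weights` list for this; the class list below is hypothetical and stands in for `df_cleaned['justClass']`.

```python
import random
from collections import Counter

# Hypothetical class column; stands in for df_cleaned['justClass']
classes = ["Fighter"] * 5 + ["Wizard"] * 3 + ["Bard"] * 2

counts = Counter(classes)
options = list(counts)                  # each distinct class once
weights = [counts[c] for c in options]  # how often each class appears

# random.choices draws with probability proportional to each weight,
# so popular classes come up more often than rare ones
pick = random.choices(options, weights=weights, k=1)[0]
print("Your class:", pick)
```

In the notebook itself, the counts could come from `df_cleaned['justClass'].value_counts()`, passing its index as the options and its values as the weights.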

For more tips on how to build a character, check out the resources below:

<center><img src='https://github.com/callysto/data-files/raw/main/data-viz-of-the-week/dungeons-dragons/img/DND_Art9.png' width=400></center>

# Further Resources

1. [D&D Website](https://dnd.wizards.com/)
1. [Basic Rules](https://dnd.wizards.com/what-is-dnd/basic-rules)  (free to download!)
1. [D&D Resources for Educators](https://dnd.wizards.com/resources/educators)
1. [Fantasy Name Generator](https://www.fantasynamegenerators.com/)

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)