![Callysto.ca Banner](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-top.jpg?raw=true)

# Callysto’s Weekly Data Visualization

## Dungeons and Dragons

### Recommended Grade levels: 6-12
<br>

### Instructions
#### “Run” the cells to see the graphs
Click “Cell” and select “Run All”.<br> This will import the data and run all the code, so you can see this week's data visualization. Scroll to the top after you’ve run the cells.<br> 

![instructions](https://github.com/callysto/data-viz-of-the-week/blob/main/images/instructions.png?raw=true)

**You don’t need to do any coding to view the visualizations**.
The plots generated in this notebook are interactive. You can hover over and click on elements to see more information. 

Email contact@callysto.ca if you experience issues.

### About this Notebook

Callysto's Weekly Data Visualization is a learning resource that aims to develop data literacy skills. We provide Grades 5-12 teachers and students with a data visualization, like a graph, to interpret. This companion resource walks learners through how the data visualization is created and interpreted by a data scientist. 

The steps of the data analysis process are listed below and applied to each weekly topic.

1. Question - What are we trying to answer? 
2. Gather - Find the data source(s) you will need. 
3. Organize - Arrange the data, so that you can easily explore it. 
4. Explore - Examine the data to look for evidence to answer the question. This includes creating visualizations. 
5. Interpret - Describe what's happening in the data visualization. 
6. Communicate - Explain how the evidence answers the question.

### Acknowledgement

Dungeons and Dragons is a property of [Wizards of the Coast](https://dnd.wizards.com/). For more information about the game, check out their website, or some of the resources linked at the end of the notebook.

# Question

What are the most popular types of Dungeons and Dragons characters to play?

### Background

Dungeons and Dragons (D&D) is an incredibly popular table-top roleplaying game (TTRPG), developed by Gary Gygax and Dave Arneson and first published in 1974. Over the years, its popularity and sales have grown substantially as newer editions have been released and the game has found wider adoption. As of writing, the current version is 5th Edition, which was released in late 2014, though a newer edition is set for release in 2024.

The gameplay consists of one player, the Dungeon Master (or DM), describing the setting and interpreting the rules, as other players act out their characters within that world. Often D&D is played without any board, but sometimes groups will use maps and miniature figurines to keep track of the chaos. In either case, every player will have their character sheet and a set of dice.

The game presents an opportunity for players to act out the gameplay, taking on the persona of their character and interacting with the world according to their character's abilities. For a lot of D&D players, creating their character is the most important and most creative aspect of the game, so we can explore what's common (and what isn't) when it comes to character creation.


### Goal
Can we determine some of the most popular choices in character building in D&D, and develop a character ourselves?

# Gather

### Code:
The code below will import the Python programming libraries we need to gather and organize the data to answer our question.

In [None]:
## import libraries
import pandas as pd
import plotly.express as px
import random

### Data:

For this notebook, the data came from a user who had developed an app to print character sheets. As part of the printing process, the user was able to collect data on the characters, and pulled it together into one large dataset that they've shared on their [GitHub profile here](https://github.com/oganm/dnddata).

### Import the data

In [None]:
## import data
df = pd.read_csv('data/dnd_chars_data.csv')
df

### Comment on the data
As we can see from the numbers below the dataframe, we have 10,894 characters in this dataset, where each character has 36 attributes for us to consider. We probably won't need all of these, so at the next step we can remove unneeded columns.
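
The row and column counts can also be read directly from a dataframe's `shape` attribute. Here is a minimal sketch on a tiny made-up table (the `toy` dataframe and its rows are hypothetical, not taken from the dataset):

```python
import pandas as pd

# Hypothetical mini-table standing in for the real data
toy = pd.DataFrame({
    'race': ['Human', 'Elf', 'Dwarf'],
    'justClass': ['Fighter', 'Wizard', 'Cleric'],
})

# shape is a (rows, columns) pair
n_rows, n_cols = toy.shape
print(f"{n_rows} characters, {n_cols} attributes")
```

Running `df.shape` on the real dataframe should report the same two numbers that appear below the table.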

# Organize

The code below will arrange the data cleanly so that we can do analysis on it. This is a quality control step for our data and involves examining the data to detect anything odd with the data (e.g. structure, missing values), fixing the oddities, and checking if the fixes worked.

The first few columns and the final column contain information that might be important for the creator of the app that collected the data, but they don't hold any meaning for us, so we can get rid of them. Similarly, there are several columns throughout the dataset that contain information on alignment, but the data is spotty, so it might be best to remove them as well.

In [None]:
# See what the column names are
df.columns

In [None]:
df_cleaned = df.drop(['Unnamed: 0',
                      'ip',
                      'finger',
                      'hash',
                      'name',
                      'alias',
                      'alignment',
                      'processedAlignment',
                      'good',
                      'lawful'],
                    axis=1)
df_cleaned
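
One way to judge how "spotty" a column is would be to measure the share of missing values in it. The sketch below does this on a tiny made-up table (the `toy` dataframe and its values are hypothetical, chosen only to illustrate the check):

```python
import pandas as pd

# Hypothetical mini-table: 'alignment' is mostly empty, 'race' is complete
toy = pd.DataFrame({
    'race': ['Human', 'Elf', 'Dwarf', 'Human'],
    'alignment': ['CG', None, None, None],
})

# Fraction of missing values per column (0.0 = complete, 1.0 = empty)
missing_share = toy.isna().mean()
print(missing_share)
```

A column with a high missing share is a reasonable candidate for removal, which is the reasoning applied to the alignment columns.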

One of the most important aspects of a D&D character is its class, as that has the greatest effect on how the character interacts with the world. There is information in the dataset about class, but because in D&D 5E you have the ability to *multiclass* (or have more than one class), storing this information in a single column can be kind of muddy.

Let's see what classes (or multiclass combinations) exist in this data, starting with the longest entries, which tend to be the ones with the most classes:

In [None]:
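# Sort the distinct class entries by text length (longer entries usually list more classes)
sortedClasses = sorted(df_cleaned['justClass'].unique(),key=len)
# Reverse the list so the longest entries come first
sortedClasses[::-1]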
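
Sorting with `key=len` orders the entries by the number of characters in the text, which is only a rough stand-in for the number of classes. Here is a minimal sketch of counting the classes directly, assuming that multiclass entries join class names with a `|` separator (check the output above to confirm this; the example strings are made up):

```python
# Hypothetical 'justClass' values; the '|' separator is an assumption
example_classes = ['Fighter', 'Rogue|Wizard', 'Bard|Cleric|Druid', 'Barbarian']

def count_classes(entry, sep='|'):
    """Return how many class names are joined together in one entry."""
    return len(entry.split(sep))

# Sort so that entries with the most classes come first
by_class_count = sorted(example_classes, key=count_classes, reverse=True)
print(by_class_count)
```

With this key, a short two-class entry ranks above a long single-class name, which text length alone would get wrong.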

Wow, there are really creative characters in here! It's not uncommon to multiclass so that you can take advantage of many attributes, but it's very unusual to combine more than 2 or 3 classes. Imagine keeping track of all those abilities and attributes!

But as we see below, it's really not very common, and most of the top 20 most common classes are single classes:

In [None]:
classCounts = df_cleaned['justClass'].value_counts().head(20) # find 20 most common values for 'justClass' and their frequency
classCounts
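
To make the counting step concrete, here is a rough plain-Python equivalent of "tally each value and keep the most common ones", using the standard library's `Counter`. The `entries` list is hypothetical, and the cut-off of 2 is chosen only to keep the example small:

```python
from collections import Counter

# Hypothetical class entries, one per character
entries = ['Fighter', 'Rogue', 'Fighter', 'Wizard', 'Rogue', 'Fighter', 'Rogue|Wizard']

# Tally each distinct value, then keep the 2 most common
top_two = Counter(entries).most_common(2)
print(top_two)  # [('Fighter', 3), ('Rogue', 2)]
```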

All the same, we'll keep only the characters whose class is among these 20 most common values, as the rarer combinations aren't really representative of the data:

In [None]:
repeatedClass = set(classCounts[classCounts > 1].index.values) # of the 20 most common 'justClass' values, keep those that appear more than once
repeatedClass

In [None]:
df_cleaned = df_cleaned[df_cleaned['justClass'].isin(repeatedClass)]
df_cleaned

By removing those uncommon entries, our dataset went from 10,894 characters to 10,345, so we still retain $(10345 / 10894) \approx 95\%$ of our data!
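
The retention figure can be checked with a couple of lines of arithmetic, using the two row counts quoted above:

```python
characters_before = 10894  # rows before filtering (from the text above)
characters_after = 10345   # rows after filtering (from the text above)

retained = characters_after / characters_before   # share of rows kept
removed = characters_before - characters_after    # number of rows dropped
print(f"Kept {retained:.1%} of characters; removed {removed}")
```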

While we're here, we'll also do something similar with `race`, as there are likely many homebrew races in this dataset as well. For this part of cleaning, we'll take any race that makes an appearance more than 50 times in the dataset, as the subraces pollute this attribute quite a bit:

In [None]:
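# Count how often each race appears and keep the 45 most common
raceCounts = df_cleaned['race'].value_counts().head(45)
# Of those, keep the races that appear more than 50 times
repeatedRace = set(raceCounts[raceCounts > 50].index.values)
# Keep only the characters whose race is in that set
df_cleaned = df_cleaned[df_cleaned['race'].isin(repeatedRace)]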
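
The same count-then-filter pattern is easier to follow on a tiny made-up table. This sketch uses a hypothetical threshold of 1 instead of 50 so the effect is visible on six rows. The `toy` dataframe, `min_count`, and the race names are all illustrative assumptions:

```python
import pandas as pd

# Hypothetical mini-table with one rare (homebrew-style) race
toy = pd.DataFrame({
    'race': ['Human', 'Human', 'Elf', 'Elf', 'Elf', 'Homebrew Catfolk'],
})

min_count = 1  # hypothetical threshold; the notebook uses 50

# Count each race, then collect the ones above the threshold
counts = toy['race'].value_counts()
common = set(counts[counts > min_count].index)

# Keep only the rows whose race is in the common set
filtered = toy[toy['race'].isin(common)]
print(filtered['race'].value_counts())
```

Raising or lowering the threshold trades off how much data is kept against how many rare categories clutter the charts.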

# Explore

The code below will be used to help us look for evidence to answer our question. This can involve looking at data in table format, applying math and statistics, and creating different types of visualizations to represent our data.

The first thing we should look at is which classes and races are the most popular. This is easy to do with a bar chart of counts, which `px.histogram` draws when the x-axis holds categories:

In [None]:
px.histogram(df_cleaned, x='justClass', 
             category_orders=dict(justClass=classCounts.index),
             title='Most Popular D&D Classes',
             labels={'justClass':'Class'})

In [None]:
px.histogram(df_cleaned, x='race', 
             category_orders=dict(race=raceCounts.index),
             title='Most Popular D&D Races',
             labels={'race':'Race'})
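
Popularity counts like these can also feed the second half of our goal, developing a character ourselves. Here is a minimal sketch of one possible approach: pick a race and a class at random, weighted by how often each appears. The counts below are hypothetical placeholders, not values from the dataset, and `pick_weighted` is a helper made up for this example:

```python
import random

# Hypothetical counts for illustration only; not taken from the dataset
class_counts = {'Fighter': 50, 'Rogue': 40, 'Wizard': 30}
race_counts = {'Human': 60, 'Elf': 35, 'Dwarf': 25}

def pick_weighted(counts, rng):
    """Pick one key, with probability proportional to its count."""
    names = list(counts)
    weights = list(counts.values())
    return rng.choices(names, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the result is repeatable
character = (pick_weighted(race_counts, rng), pick_weighted(class_counts, rng))
print(character)
```

Popular options come up more often with this approach, but rare ones still appear occasionally.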

# Interpret
(Cycle between Explore and Interpret)<br>
(Describe what’s happening in the data visualization (graph). What do you notice? For example, big or small values, or trends.)

# Communicate
Below we will discuss the results of the data exploration.
(How does our key evidence help answer our question?)

[![Callysto.ca License](https://github.com/callysto/curriculum-notebooks/blob/master/callysto-notebook-banner-bottom.jpg?raw=true)](https://github.com/callysto/curriculum-notebooks/blob/master/LICENSE.md)