# D&D Race and Class Breakdown
- Here is an updated count of the various Race/Class combinations on DnDBeyond.com.
- What we will do here is a brief bit of data exploration, followed by answering some questions.


## Data Exploration
In this section we will:
- Take a look at the data
- Make a crosstab to better see the data
- Make some basic visualizations
- Lay down the framework for our questions

### Import, Load and Look

In [1]:
import pandas as pd
import matplotlib.pyplot as plt  # pyplot is the module conventionally aliased as plt
import numpy as np
%config IPCompleter.greedy=True
%matplotlib inline
import seaborn as sns
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
import cufflinks as cf


In [2]:
init_notebook_mode(connected=True)

In [3]:
cf.go_offline()

In [4]:
pwd

'C:\\Users\\Hakuj\\Documents\\DnD Project'

In [5]:
df = pd.read_csv('DnDBeyond.csv')

In [6]:
df.head()

Unnamed: 0,RACE,Class,n_characters
0,Human,Wizard,21665
1,Half-Elf,Warlock,19173
2,Human,Fighter,18920
3,Half-Orc,Fighter,13922
4,Half-Elf,Bard,12903


In [7]:
df.shape

(555, 3)

In [8]:
df = df.rename(columns={'n_characters':'Total', 'RACE':'Race'})

### Break up the data
In this section let's:
- Make a pivot table to better utilize the 'Total' column
- Make a list of the Races and Classes
- Make a Data Frame grouped by Race
- And one by Class!

#### Here's that table


In [9]:
RaceClass = df.pivot_table(index='Race', columns='Class', values='Total', margins=True, fill_value=0)

In [10]:
RaceClass
# The 'All' margins are floats because pivot_table defaults to aggfunc='mean', so they hold mean counts, not totals

Race,Artificer,Barbarian,Bard,Blood Hunter,Cleric,Druid,Fighter,Monk,Paladin,Ranger,Rogue,Sorcerer,Warlock,Wizard,All
Aarakocra,610,9169,1834,869,6175,1698,10603,5272,3764,3822,3006,1810,1807,2901,3810.0
Aasimar,105,113,1637,120,963,247,1079,308,964,336,1280,1886,2159,1502,907.071429
Bugbear,76,3215,303,266,361,325,3086,1080,585,843,2344,188,363,215,946.428571
Centaur,43,973,136,187,858,771,994,454,415,702,199,110,159,114,436.785714
Centaur-UA,0,363,74,109,244,231,514,190,189,320,65,46,68,39,188.615385
Changeling,393,243,4806,370,1030,782,1231,935,698,658,6374,3004,4786,1671,1927.214286
Dragonborn,247,5198,3479,1274,4445,2024,9187,2973,11973,3276,5393,9072,7216,3342,4935.642857
Dwarf,29,233,255,162,1671,1563,347,809,133,315,405,585,311,237,503.928571
Elf,228,2459,687,449,11401,2437,2571,1941,1101,1104,916,1009,676,860,1988.5
Feral Tiefling,76,166,93,212,126,89,663,603,70,370,1108,115,287,458,316.857143
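
The `All` column above holds means because `pivot_table` aggregates with `'mean'` unless told otherwise. A minimal sketch, assuming a small made-up `sample` frame (not the real D&D Beyond data) with the same `Race`, `Class` and `Total` columns, of how passing `aggfunc='sum'` turns the margins into true totals:

```python
import pandas as pd

# Tiny made-up sample in the same shape as the renamed Data Frame
sample = pd.DataFrame({
    'Race': ['Human', 'Human', 'Elf', 'Elf'],
    'Class': ['Wizard', 'Fighter', 'Wizard', 'Fighter'],
    'Total': [10, 6, 4, 2],
})

# Default aggfunc ('mean'): the 'All' margin is an average of the counts
mean_table = sample.pivot_table(index='Race', columns='Class', values='Total',
                                margins=True, fill_value=0)

# aggfunc='sum': the 'All' margin is the total of the counts
sum_table = sample.pivot_table(index='Race', columns='Class', values='Total',
                               aggfunc='sum', margins=True, fill_value=0)

print(mean_table)
print(sum_table)
```

Since each Race/Class pair appears once in the data, the inner cells are the same either way; only the margins differ.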


#### Those lists:

In [11]:
races = list(df['Race'].unique())

In [12]:
classes = list(df['Class'].unique())

In [13]:
free = 'AARAKOCRA DRAGONBORN DWARF ELF GENASI GNOME GOLIATH HALF-ELF HALF-ORC HALFLING HUMAN TIEFLING  AASIMAR'.title().split()

In [14]:
free

['Aarakocra',
 'Dragonborn',
 'Dwarf',
 'Elf',
 'Genasi',
 'Gnome',
 'Goliath',
 'Half-Elf',
 'Half-Orc',
 'Halfling',
 'Human',
 'Tiefling',
 'Aasimar']
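
The list above is built by title-casing one long string and splitting it on whitespace. A small sketch with a shortened, illustrative string (not the full list) shows why hyphenated names and the stray double space are handled, and where the trick would stop working:

```python
# A shortened, illustrative version of the string used above
raw = 'HUMAN HALF-ELF  DWARF'

# str.title() capitalises the letter after any non-letter, so 'HALF-ELF' becomes 'Half-Elf';
# str.split() with no arguments collapses runs of whitespace, so the double space is harmless
names = raw.title().split()
print(names)

# Caveat: a multi-word name would be split into separate entries
split_name = 'FERAL TIEFLING'.title().split()
print(split_name)
```

For a list that includes multi-word names, splitting on a delimiter such as a comma would be the safer choice.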

#### And the Data Frames.
##### (We'll add to these later)

In [15]:
# Sum only 'Total' so the text 'Class' column is not aggregated on newer pandas versions
grouped_race = df.groupby('Race')['Total'].sum().sort_values().reset_index()

In [16]:
# Sum only 'Total' so the text 'Race' column is not aggregated on newer pandas versions
grouped_class = df.groupby('Class')['Total'].sum().sort_values().reset_index()

In [17]:
grouped_free_race = grouped_race[grouped_race['Race'].isin(free)]

In [18]:
grouped_premium_race = grouped_race[~grouped_race['Race'].isin(free)]

In [19]:
grouped_free_class = df[df['Race'].isin(free)].groupby('Class')['Total'].sum().sort_values().reset_index()

In [20]:
grouped_premium_class = df[~df['Race'].isin(free)].groupby('Class')['Total'].sum().sort_values().reset_index()
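
The free/premium split relies on a boolean mask from `isin` and its negation with `~`. Here is a rough sketch of that pattern on made-up numbers; `sample` and `free_races` are hypothetical stand-ins for `df` and `free`:

```python
import pandas as pd

# Made-up counts; 'Human' plays the free race and 'Tabaxi' the premium one
sample = pd.DataFrame({
    'Race': ['Human', 'Human', 'Tabaxi', 'Tabaxi'],
    'Class': ['Wizard', 'Rogue', 'Wizard', 'Rogue'],
    'Total': [10, 6, 3, 5],
})
free_races = ['Human']  # hypothetical stand-in for the `free` list

# True for rows whose race is in the free list
is_free = sample['Race'].isin(free_races)

# Per-class totals for each side of the split, smallest first
free_by_class = (sample[is_free].groupby('Class')['Total'].sum()
                 .sort_values().reset_index())
premium_by_class = (sample[~is_free].groupby('Class')['Total'].sum()
                    .sort_values().reset_index())

print(free_by_class)
print(premium_by_class)
```

Because the mask and its negation partition the rows, the two groups' totals add back up to the overall total.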

In [21]:
grouped_class

Unnamed: 0,Class,Total
0,Artificer,7843
1,Blood Hunter,16979
2,Druid,52138
3,Ranger,52678
4,Bard,54427
5,Monk,62305
6,Paladin,64533
7,Wizard,64688
8,Sorcerer,64934
9,Barbarian,74541


In [22]:
grouped_premium_class

Unnamed: 0,Class,Total
0,Artificer,4240
1,Blood Hunter,6330
2,Paladin,15679
3,Ranger,18346
4,Bard,19640
5,Sorcerer,20722
6,Wizard,21317
7,Cleric,24872
8,Druid,24964
9,Barbarian,25240


In [23]:
grouped_free_class

Unnamed: 0,Class,Total
0,Artificer,3603
1,Blood Hunter,10649
2,Druid,27174
3,Ranger,34332
4,Bard,34787
5,Monk,35756
6,Wizard,43371
7,Sorcerer,44212
8,Rogue,46272
9,Paladin,48854


### Quick Visuals
- Let's just get a peek at some simple visuals to show basic relationships

In [24]:
grouped_race.iplot(kind='bar', x='Race')

In [25]:
grouped_class.iplot(kind='bar', x='Class')

In [26]:
grouped_free_race.iplot(kind='bar', x='Race')

In [27]:
grouped_premium_race.iplot(kind='bar', x='Race')

In [28]:
RaceClass.iplot(kind='box')

# Answer the Questions!
What questions, you ask? That's a great question!

Here, we will answer some basic questions, such as:
- What's the most stereotypical class for each race?
- What are the least stereotypical combinations?
- Can we see what percentage of the population each combination makes up?

## Percentages:


### Ugly:

#### CrossTabs
We're going to make some crosstabs showing
- The percentage of the population that each combination makes up
- The percentage of the class each combination represents
- The percentage of the race each combination represents
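
The three percentages in this list correspond to the three `normalize` modes of `pd.crosstab`. A minimal sketch on a made-up `sample` frame (the helper name `share_table` is just for illustration):

```python
import pandas as pd

# Made-up counts in the same shape as the real data
sample = pd.DataFrame({
    'Race': ['Human', 'Human', 'Elf', 'Elf'],
    'Class': ['Wizard', 'Fighter', 'Wizard', 'Fighter'],
    'Total': [10, 6, 4, 2],
})

def share_table(normalize):
    """Crosstab of counts, normalized over everything, rows, or columns."""
    return pd.crosstab(sample['Race'], sample['Class'], values=sample['Total'],
                       aggfunc='sum', normalize=normalize)

of_pop = share_table(True)         # every cell divided by the grand total
of_race = share_table('index')     # every cell divided by its row (race) total
of_class = share_table('columns')  # every cell divided by its column (class) total

print(of_pop, of_race, of_class, sep='\n\n')
```

A quick sanity check is that `of_pop` sums to 1 overall, each row of `of_race` sums to 1, and each column of `of_class` sums to 1.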

In [29]:
# Here we get the percentage of the population each combo makes up
total_percent  = pd.crosstab(df['Race'], df['Class'],values=df['Total'],
                           aggfunc='mean', normalize=True)

In [30]:
total_percent

Race,Artificer,Barbarian,Bard,Blood Hunter,Cleric,Druid,Fighter,Monk,Paladin,Ranger,Rogue,Sorcerer,Warlock,Wizard
Aarakocra,0.000699,0.010502,0.002101,0.000995,0.007073,0.001945,0.012144,0.006038,0.004311,0.004378,0.003443,0.002073,0.00207,0.003323
Aasimar,0.00012,0.000129,0.001875,0.000137,0.001103,0.000283,0.001236,0.000353,0.001104,0.000385,0.001466,0.00216,0.002473,0.00172
Bugbear,8.7e-05,0.003682,0.000347,0.000305,0.000413,0.000372,0.003535,0.001237,0.00067,0.000966,0.002685,0.000215,0.000416,0.000246
Centaur,4.9e-05,0.001114,0.000156,0.000214,0.000983,0.000883,0.001138,0.00052,0.000475,0.000804,0.000228,0.000126,0.000182,0.000131
Centaur-UA,0.0,0.000416,8.5e-05,0.000125,0.000279,0.000265,0.000589,0.000218,0.000216,0.000367,7.4e-05,5.3e-05,7.8e-05,4.5e-05
Changeling,0.00045,0.000278,0.005505,0.000424,0.00118,0.000896,0.00141,0.001071,0.000799,0.000754,0.007301,0.003441,0.005482,0.001914
Dragonborn,0.000283,0.005954,0.003985,0.001459,0.005091,0.002318,0.010522,0.003405,0.013713,0.003752,0.006177,0.010391,0.008265,0.003828
Dwarf,3.3e-05,0.000267,0.000292,0.000186,0.001914,0.00179,0.000397,0.000927,0.000152,0.000361,0.000464,0.00067,0.000356,0.000271
Elf,0.000261,0.002816,0.000787,0.000514,0.013058,0.002791,0.002945,0.002223,0.001261,0.001264,0.001049,0.001156,0.000774,0.000985
Feral Tiefling,8.7e-05,0.00019,0.000107,0.000243,0.000144,0.000102,0.000759,0.000691,8e-05,0.000424,0.001269,0.000132,0.000329,0.000525


In [31]:
# Here we get the percentage of its race that each combo represents
percent_of_race  = pd.crosstab(df['Race'], df['Class'],values=df['Total'],
                           aggfunc='mean', margins=True, normalize='index')

In [32]:
percent_of_race

Race,Artificer,Barbarian,Bard,Blood Hunter,Cleric,Druid,Fighter,Monk,Paladin,Ranger,Rogue,Sorcerer,Warlock,Wizard
Aarakocra,0.011436,0.171897,0.034383,0.016292,0.115767,0.031834,0.198781,0.098838,0.070566,0.071654,0.056355,0.033933,0.033877,0.054387
Aasimar,0.008268,0.008898,0.128908,0.00945,0.075833,0.01945,0.084967,0.024254,0.075911,0.026459,0.100795,0.148516,0.170013,0.118277
Bugbear,0.005736,0.242642,0.022868,0.020075,0.027245,0.024528,0.232906,0.081509,0.044151,0.063623,0.176906,0.014189,0.027396,0.016226
Centaur,0.007032,0.159117,0.02224,0.030581,0.140311,0.126083,0.162551,0.074244,0.067866,0.1148,0.032543,0.017989,0.026002,0.018643
Centaur-UA,0.0,0.148042,0.030179,0.044454,0.099511,0.094209,0.209625,0.077488,0.07708,0.130506,0.026509,0.01876,0.027732,0.015905
Changeling,0.014566,0.009006,0.178125,0.013713,0.038175,0.028983,0.045625,0.034654,0.02587,0.024388,0.23624,0.111338,0.177384,0.061932
Dragonborn,0.003575,0.075225,0.050348,0.018437,0.064328,0.029291,0.132954,0.043025,0.173273,0.04741,0.078047,0.13129,0.10443,0.048365
Dwarf,0.004111,0.033026,0.036145,0.022962,0.236853,0.221545,0.049185,0.11467,0.018852,0.044649,0.057406,0.08292,0.044082,0.033593
Elf,0.00819,0.088329,0.024678,0.016128,0.409533,0.087539,0.092352,0.069722,0.039549,0.039657,0.032903,0.036244,0.024282,0.030892
Feral Tiefling,0.017133,0.037421,0.020965,0.047791,0.028404,0.020063,0.149459,0.135933,0.01578,0.083408,0.249775,0.025924,0.064698,0.103246


In [33]:
# This shows the percentage of its class that each combo represents
percent_of_class = pd.crosstab(df['Race'], df['Class'],values=df['Total'],
                           aggfunc='mean', normalize='columns', margins=True)

In [34]:
percent_of_class

Race,Artificer,Barbarian,Bard,Blood Hunter,Cleric,Druid,Fighter,Monk,Paladin,Ranger,Rogue,Sorcerer,Warlock,Wizard,All
Aarakocra,0.077776,0.123006,0.033697,0.051181,0.076891,0.032567,0.098946,0.084616,0.058327,0.072554,0.034599,0.027874,0.021597,0.044846,0.061052
Aasimar,0.013388,0.001516,0.030077,0.007068,0.011991,0.004737,0.010069,0.004943,0.014938,0.006378,0.014733,0.029045,0.025804,0.023219,0.014535
Bugbear,0.00969,0.043131,0.005567,0.015666,0.004495,0.006233,0.028798,0.017334,0.009065,0.016003,0.02698,0.002895,0.004339,0.003324,0.015166
Centaur,0.005483,0.013053,0.002499,0.011014,0.010684,0.014788,0.009276,0.007287,0.006431,0.013326,0.002291,0.001694,0.0019,0.001762,0.006999
Centaur-UA,0.0,0.00487,0.00136,0.00642,0.003038,0.004431,0.004797,0.00305,0.002929,0.006075,0.000748,0.000708,0.000813,0.000603,0.003022
Changeling,0.050108,0.00326,0.088302,0.021792,0.012825,0.014999,0.011488,0.015007,0.010816,0.012491,0.073366,0.046262,0.057202,0.025832,0.030882
Dragonborn,0.031493,0.069733,0.06392,0.075034,0.055349,0.03882,0.085732,0.047717,0.185533,0.062189,0.062074,0.139711,0.086245,0.051663,0.079089
Dwarf,0.003698,0.003126,0.004685,0.009541,0.020807,0.029978,0.003238,0.012985,0.002061,0.00598,0.004662,0.009009,0.003717,0.003664,0.008075
Elf,0.029071,0.032989,0.012622,0.026444,0.141964,0.046741,0.023992,0.031153,0.017061,0.020958,0.010543,0.015539,0.008079,0.013295,0.031864
Feral Tiefling,0.00969,0.002227,0.001709,0.012486,0.001569,0.001707,0.006187,0.009678,0.001085,0.007024,0.012753,0.001771,0.00343,0.00708,0.005077


#### DataFrames!
In this section, we merge the crosstabs we made earlier into a single Pandas Data Frame.

Our final Data Frame should show the count, %_of_pop, %_of_class, and %_of_race for each possible combination!

I don't know that this is entirely necessary, but we'll have it!
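
The merge works by reshaping each wide crosstab into long form (one row per Race/Class pair) and joining it back onto the counts. A small sketch of that reshape-and-join on a made-up `sample` frame, with `var_name`, `value_name` and the join keys spelled out explicitly for clarity:

```python
import pandas as pd

# Made-up counts in the same shape as the real data
sample = pd.DataFrame({
    'Race': ['Human', 'Human', 'Elf', 'Elf'],
    'Class': ['Wizard', 'Fighter', 'Wizard', 'Fighter'],
    'Total': [10, 6, 4, 2],
})

# Wide table: one row per race, one column per class
wide = pd.crosstab(sample['Race'], sample['Class'], values=sample['Total'],
                   aggfunc='sum', normalize='index')

# Long table: one row per (Race, Class) pair
long = wide.reset_index().melt(id_vars='Race', var_name='Class',
                               value_name='Percent_of_Race')

# Join the percentages back onto the original counts
merged_sample = sample.merge(long, on=['Race', 'Class'], how='left')
print(merged_sample)
```

Naming the value column inside `melt` avoids the separate `rename` step, and listing the keys in `on=` makes it explicit which columns the join uses.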

In [35]:
merged = pd.merge(df,percent_of_race.reset_index().melt(['Race'])
                 ).rename(columns={'value': 'Percent_of_Race'})


In [36]:
merged

Unnamed: 0,Race,Class,Total,Percent_of_Race
0,Human,Wizard,21665,0.196699
1,Half-Elf,Warlock,19173,0.206077
2,Human,Fighter,18920,0.171777
3,Half-Orc,Fighter,13922,0.170780
4,Half-Elf,Bard,12903,0.138685
5,Half-Orc,Barbarian,12377,0.151828
6,Dragonborn,Paladin,11973,0.173273
7,Goliath,Barbarian,11938,0.287989
8,Half-Elf,Sorcerer,11706,0.125820
9,Half-Orc,Ranger,11509,0.141180


In [37]:
almost_complete = pd.merge(merged, total_percent.reset_index().melt('Race')
                          ).rename(columns={'value':'Percent_of_Pop'})

In [38]:
completed = pd.merge(almost_complete,
                     pd.crosstab(df['Race'], df['Class'],values=df['Total'],
                           aggfunc='mean', normalize='columns')
                     .reset_index().melt('Race') #This is the same as 'percent_of_class'
        ).rename(columns={'value':'Percent_of_Class'})

In [39]:
completed

Unnamed: 0,Race,Class,Total,Percent_of_Race,Percent_of_Pop,Percent_of_Class
0,Human,Wizard,21665,0.196699,0.024814,0.334915
1,Half-Elf,Warlock,19173,0.206077,0.021960,0.229153
2,Human,Fighter,18920,0.171777,0.021670,0.176560
3,Half-Orc,Fighter,13922,0.170780,0.015946,0.129919
4,Half-Elf,Bard,12903,0.138685,0.014779,0.237070
5,Half-Orc,Barbarian,12377,0.151828,0.014176,0.166043
6,Dragonborn,Paladin,11973,0.173273,0.013713,0.185533
7,Goliath,Barbarian,11938,0.287989,0.013673,0.160153
8,Half-Elf,Sorcerer,11706,0.125820,0.013408,0.180275
9,Half-Orc,Ranger,11509,0.141180,0.013182,0.218478


In [40]:
completed.iplot(kind='heatmap', x='Class', y='Race', z='Percent_of_Race')

##### Free and Premium Grouping
- Break out free and premium then recalculate %'s for group
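
One possible way to recalculate the three percentages for any subgroup without repeated crosstab merges is `groupby(...).transform('sum')`. This is only a sketch: `add_percentages` is a hypothetical helper, and it assumes each Race/Class pair appears at most once in the group, as it does in this data.

```python
import pandas as pd

def add_percentages(group):
    """Return a copy of `group` with share-of-group, share-of-race and share-of-class columns."""
    out = group.copy()
    out['Percent_of_Group'] = out['Total'] / out['Total'].sum()
    # transform('sum') repeats each race's (or class's) total on every one of its rows
    out['Percent_of_Race'] = out['Total'] / out.groupby('Race')['Total'].transform('sum')
    out['Percent_of_Class'] = out['Total'] / out.groupby('Class')['Total'].transform('sum')
    return out

# Made-up counts standing in for one of the groups
sample = pd.DataFrame({
    'Race': ['Human', 'Human', 'Elf', 'Elf'],
    'Class': ['Wizard', 'Fighter', 'Wizard', 'Fighter'],
    'Total': [10, 6, 4, 2],
})

sample_with_shares = add_percentages(sample)
print(sample_with_shares)
```

The same helper could be applied to the free rows and the premium rows in turn, so both groups are guaranteed to be computed the same way.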

In [43]:
free_group = df[df['Race'].isin(free)]
premium = df[~df['Race'].isin(free)]

In [44]:
free_group = pd.merge(free_group,
                     pd.crosstab(free_group['Race'], free_group['Class'],values=free_group['Total'],
                           aggfunc='mean', normalize='columns')
                     .reset_index().melt('Race') #This is the same as 'percent_of_class'
        ).rename(columns={'value':'Percent_of_Class'})

In [45]:
free_group = pd.merge(free_group,
         pd.crosstab(free_group['Race'], free_group['Class'],values=free_group['Total'],
                           aggfunc='mean', normalize=True)
                     .reset_index().melt('Race') #This is the same as 'percent_of_pop'
        ).rename(columns={'value':'Percent_of_Group'})

In [46]:
free_group= pd.merge(free_group,
                     pd.crosstab(free_group['Race'], free_group['Class'],values=free_group['Total'],
                           aggfunc='mean', normalize='index')
                     .reset_index().melt('Race') #This is the same as 'percent_of_race'
        ).rename(columns={'value':'Percent_of_Race'})

In [47]:
free_group

Unnamed: 0,Race,Class,Total,Percent_of_Class,Percent_of_Group,Percent_of_Race
0,Human,Wizard,21665,0.499527,0.038371,0.196699
1,Half-Elf,Warlock,19173,0.340829,0.033958,0.206077
2,Human,Fighter,18920,0.253568,0.033509,0.171777
3,Half-Orc,Fighter,13922,0.186584,0.024657,0.170780
4,Half-Elf,Bard,12903,0.370914,0.022853,0.138685
5,Half-Orc,Barbarian,12377,0.251050,0.021921,0.151828
6,Dragonborn,Paladin,11973,0.245077,0.021206,0.173273
7,Goliath,Barbarian,11938,0.242145,0.021144,0.287989
8,Half-Elf,Sorcerer,11706,0.264770,0.020733,0.125820
9,Half-Orc,Ranger,11509,0.335227,0.020384,0.141180


In [48]:
premium = pd.merge(premium,
                     pd.crosstab(premium['Race'], premium['Class'],values=premium['Total'],
                           aggfunc='mean', normalize='columns')
                     .reset_index().melt('Race') #This is the same as 'percent_of_class'
        ).rename(columns={'value':'Percent_of_Class'})

In [49]:
premium = pd.merge(premium,
         pd.crosstab(premium['Race'], premium['Class'],values=premium['Total'],
                           aggfunc='mean', normalize=True)
                     .reset_index().melt('Race') #This is the same as 'percent_of_pop'
        ).rename(columns={'value':'Percent_of_Group'})

In [50]:
premium= pd.merge(premium,
                     pd.crosstab(premium['Race'], premium['Class'],values=premium['Total'],
                           aggfunc='mean', normalize='index')
                     .reset_index().melt('Race') #This is the same as 'percent_of_race'
        ).rename(columns={'value':'Percent_of_Race'})

In [51]:
premium

Unnamed: 0,Race,Class,Total,Percent_of_Class,Percent_of_Group,Percent_of_Race
0,Tabaxi,Rogue,10508,0.258767,0.034065,0.240871
1,Firbolg,Druid,8443,0.338207,0.027371,0.253103
2,Tabaxi,Monk,6569,0.247429,0.021296,0.150579
3,Changeling,Rogue,6374,0.156964,0.020664,0.236240
4,Firbolg,Cleric,5431,0.218358,0.017606,0.162810
5,Kenku,Rogue,5183,0.127635,0.016803,0.270865
6,Changeling,Bard,4806,0.244705,0.015580,0.178125
7,Changeling,Warlock,4786,0.174576,0.015515,0.177384
8,Tabaxi,Fighter,4506,0.138459,0.014608,0.103289
9,Firbolg,Warlock,4279,0.156082,0.013872,0.128275


#### Other Percentages!
In this section we find the percentage that:
- Each race represents of the pop
- Each class represents of the pop

In [52]:
# Finds the percentage of the population each class represents,
# then adds that data to 'grouped_class' as a 'Percent' column
grouped_class['Percent'] = grouped_class['Total'] / grouped_class['Total'].sum()



In [53]:
# Finds the percentage of the population each race represents,
# then adds that data to 'grouped_race' as a 'Percent' column
grouped_race['Percent'] = grouped_race['Total'] / grouped_race['Total'].sum()



In [54]:
grouped_race

Unnamed: 0,Race,Total,Percent
0,Viashino-UA,577,0.000661
1,Simic Hybrid-UA,588,0.000673
2,Vedalken-UA,787,0.000901
3,Loxodon-UA,1390,0.001592
4,Verdan,1414,0.00162
5,Centaur-UA,2452,0.002808
6,Minotaur-UA,2590,0.002966
7,Feral Tiefling,4436,0.005081
8,Simic Hybrid,5221,0.00598
9,Vedalken,5934,0.006797


#### Pretty:

#### Wordy

### Most Stereotypical:

#### Ugly:

In [None]:
# Make a dictionary of the most popular class for each race: [class name, share of the race]
pop_x_race = {i:[
    percent_of_race.loc[i].idxmax(),
    percent_of_race.loc[i].max().round(4)
]for i in races}

In [None]:
# Make a dictionary with a verbose explanation
pop_text = {i: f'The most popular class for {i} is {pop_x_race[i][0]}, '
            f'which accounts for {(pop_x_race[i][1]*100).round(2)}% of the {i} race.' for i in pop_x_race}

In [None]:
pop_text['Human']
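
As an alternative to looping over races, `idxmax(axis=1)` and `max(axis=1)` can pick the top class and its share for every row at once. A minimal sketch on a made-up `shares` table standing in for `percent_of_race`:

```python
import pandas as pd

# Made-up row-normalized shares (each row sums to 1)
shares = pd.DataFrame(
    {'Fighter': [0.375, 0.2], 'Wizard': [0.625, 0.8]},
    index=pd.Index(['Human', 'Elf'], name='Race'),
)

top_class = shares.idxmax(axis=1)  # column label of the largest value in each row
top_share = shares.max(axis=1)     # the largest value in each row

summary = {
    race: f'The most popular class for {race} is {top_class[race]}, '
          f'which accounts for {top_share[race] * 100:.2f}% of the {race} race.'
    for race in shares.index
}
print(summary['Human'])
```

Swapping in `idxmin` and `min` gives the least popular class in the same way.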

#### Pretty:

#### Wordy

### Least Stereotypical:

#### Ugly:

In [None]:
# Make a dictionary of the least popular class for each race: [class name, share of the race]
least_pop_x_race = {i:[
    percent_of_race.loc[i].idxmin(),
    percent_of_race.loc[i].min().round(4)
] for i in races}

In [None]:
# Make a dictionary with a verbose explanation
least_pop_text = {i: f'The least popular class for {i} is {least_pop_x_race[i][0]}, '
            f'which only accounts for {(least_pop_x_race[i][1]*100).round(2)}'
                  f'% of the {i} race.' for i in least_pop_x_race}

In [None]:
least_pop_text['Human']

#### Pretty:

#### Wordy

### So, who inhabits the world?