# Part 1: Clean the files and combine them into one final DataFrame.

This dataframe should have the following columns:
- Hero (Just the name of the Hero)
- Publisher
- Gender
- Eye color
- Race
- Hair color
- Height (numeric)
- Skin color
- Alignment
- Weight (numeric)
- Plus, one-hot-encoded columns for every power that appears in the dataset. E.g.:
    - Agility
    - Flight
    - Superspeed
    - etc.

Hint: There is a space in "100 kg" or "52.5 cm"
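
The hint matters because each measurement is a number and a unit separated by a single space. A minimal sketch of how such a string could be taken apart in plain Python (the helper name `parse_measurement` is hypothetical and not part of the assignment):

```python
def parse_measurement(text):
    """Split a string such as '100 kg' into a float value and its unit."""
    # Split once on the space that separates the number from the unit
    value, unit = text.split(' ', maxsplit=1)
    return float(value), unit


print(parse_measurement('100 kg'))
print(parse_measurement('52.5 cm'))
```

The notebook below takes the pandas route instead and removes the space and the unit with `.str.replace` before converting to float.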

## Import Libraries

In [1]:
## Standard Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
## Importing the OS and JSON Modules
import os
import json

## Read in the Data

In [2]:
df = pd.read_csv('Data/superhero_info.csv')
df.info()
df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 463 entries, 0 to 462
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Hero|Publisher  463 non-null    object
 1   Gender          463 non-null    object
 2   Race            463 non-null    object
 3   Alignment       463 non-null    object
 4   Hair color      463 non-null    object
 5   Eye color       463 non-null    object
 6   Skin color      463 non-null    object
 7   Measurements    463 non-null    object
dtypes: object(8)
memory usage: 29.1+ KB


Unnamed: 0,Hero|Publisher,Gender,Race,Alignment,Hair color,Eye color,Skin color,Measurements
0,A-Bomb|Marvel Comics,Male,Human,good,No Hair,yellow,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}"
1,Abe Sapien|Dark Horse Comics,Male,Icthyo Sapien,good,No Hair,blue,blue,"{'Height': '191.0 cm', 'Weight': '65.0 kg'}"
2,Abin Sur|DC Comics,Male,Ungaran,good,No Hair,blue,red,"{'Height': '185.0 cm', 'Weight': '90.0 kg'}"
3,Abomination|Marvel Comics,Male,Human / Radiation,bad,No Hair,green,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}"
4,Absorbing Man|Marvel Comics,Male,Human,bad,No Hair,blue,Unknown,"{'Height': '193.0 cm', 'Weight': '122.0 kg'}"


In [3]:
df2 = pd.read_csv('Data/superhero_powers.csv')
df2.info()
df2.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 667 entries, 0 to 666
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   hero_names  667 non-null    object
 1   Powers      667 non-null    object
dtypes: object(2)
memory usage: 10.5+ KB


Unnamed: 0,hero_names,Powers
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed"
1,A-Bomb,"Accelerated Healing,Durability,Longevity,Super..."
2,Abe Sapien,"Agility,Accelerated Healing,Cold Resistance,Du..."
3,Abin Sur,Lantern Power Ring
4,Abomination,"Accelerated Healing,Intelligence,Super Strengt..."


## Split Hero|Publisher into two columns

In [4]:
# Exploring existing format with a few examples
df['Hero|Publisher'].head(2)

0            A-Bomb|Marvel Comics
1    Abe Sapien|Dark Horse Comics
Name: Hero|Publisher, dtype: object

In [5]:
## adding expand=True
df['Hero|Publisher'].str.split('|',expand=True)

Unnamed: 0,0,1
0,A-Bomb,Marvel Comics
1,Abe Sapien,Dark Horse Comics
2,Abin Sur,DC Comics
3,Abomination,Marvel Comics
4,Absorbing Man,Marvel Comics
...,...,...
458,Yellowjacket,Marvel Comics
459,Yellowjacket II,Marvel Comics
460,Yoda,George Lucas
461,Zatanna,DC Comics


In [6]:
## save the 2 new columns into the dataframe
df[['Hero','Publisher']] = df['Hero|Publisher'].str.split('|',expand=True)
df.head(2)

Unnamed: 0,Hero|Publisher,Gender,Race,Alignment,Hair color,Eye color,Skin color,Measurements,Hero,Publisher
0,A-Bomb|Marvel Comics,Male,Human,good,No Hair,yellow,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}",A-Bomb,Marvel Comics
1,Abe Sapien|Dark Horse Comics,Male,Icthyo Sapien,good,No Hair,blue,blue,"{'Height': '191.0 cm', 'Weight': '65.0 kg'}",Abe Sapien,Dark Horse Comics


In [7]:
## drop the original column 
df = df.drop(columns=['Hero|Publisher'])
df.head(2)

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Measurements,Hero,Publisher
0,Male,Human,good,No Hair,yellow,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}",A-Bomb,Marvel Comics
1,Male,Icthyo Sapien,good,No Hair,blue,blue,"{'Height': '191.0 cm', 'Weight': '65.0 kg'}",Abe Sapien,Dark Horse Comics


## Split Measurements Column

In [8]:
df['Measurements'].head(3)

0    {'Height': '203.0 cm', 'Weight': '441.0 kg'}
1     {'Height': '191.0 cm', 'Weight': '65.0 kg'}
2     {'Height': '185.0 cm', 'Weight': '90.0 kg'}
Name: Measurements, dtype: object

In [9]:
#check data type
type(df['Measurements'][0])

str

In [10]:
# replacing single quotes with double quotes
df['Measurements'] = df['Measurements'].str.replace("'", '"')
df['Measurements']

0      {"Height": "203.0 cm", "Weight": "441.0 kg"}
1       {"Height": "191.0 cm", "Weight": "65.0 kg"}
2       {"Height": "185.0 cm", "Weight": "90.0 kg"}
3      {"Height": "203.0 cm", "Weight": "441.0 kg"}
4      {"Height": "193.0 cm", "Weight": "122.0 kg"}
                           ...                     
458     {"Height": "183.0 cm", "Weight": "83.0 kg"}
459     {"Height": "165.0 cm", "Weight": "52.0 kg"}
460      {"Height": "66.0 cm", "Weight": "17.0 kg"}
461     {"Height": "170.0 cm", "Weight": "57.0 kg"}
462     {"Height": "185.0 cm", "Weight": "81.0 kg"}
Name: Measurements, Length: 463, dtype: object

- JSON only accepts double-quoted strings, so now that the single quotes are replaced we can parse each value with json.loads

In [11]:
df['Measurements'] = df['Measurements'].apply(json.loads)
df['Measurements'].head(3)

0    {'Height': '203.0 cm', 'Weight': '441.0 kg'}
1     {'Height': '191.0 cm', 'Weight': '65.0 kg'}
2     {'Height': '185.0 cm', 'Weight': '90.0 kg'}
Name: Measurements, dtype: object

In [12]:
#check data type
type(df['Measurements'][0])

dict
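
The quote replacement works here because none of the values contains an apostrophe. As an alternative sketch, assuming the strings are valid Python dict literals (which the rows shown above are), the standard library's `ast.literal_eval` can parse them without touching the quotes:

```python
import ast

# One raw value in the same shape as the Measurements column
raw = "{'Height': '203.0 cm', 'Weight': '441.0 kg'}"

# literal_eval evaluates Python literals only, so single quotes are accepted
parsed = ast.literal_eval(raw)

print(type(parsed))
print(parsed['Height'])
```

Applied to the whole column, this would look like `df['Measurements'].apply(ast.literal_eval)` in place of the replace-then-`json.loads` pair.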

In [13]:
#Convert into two different columns 
height_weight = df['Measurements'].apply(pd.Series)
height_weight

Unnamed: 0,Height,Weight
0,203.0 cm,441.0 kg
1,191.0 cm,65.0 kg
2,185.0 cm,90.0 kg
3,203.0 cm,441.0 kg
4,193.0 cm,122.0 kg
...,...,...
458,183.0 cm,83.0 kg
459,165.0 cm,52.0 kg
460,66.0 cm,17.0 kg
461,170.0 cm,57.0 kg
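
`.apply(pd.Series)` builds a new Series for every row, which can become slow on larger data. A possible alternative, sketched on two rows copied from the output above, is to hand the list of dicts straight to the `pd.DataFrame` constructor:

```python
import pandas as pd

# Two parsed rows in the same shape as df['Measurements']
measurements = pd.Series([
    {'Height': '203.0 cm', 'Weight': '441.0 kg'},
    {'Height': '191.0 cm', 'Weight': '65.0 kg'},
])

# Build the two columns in one step, keeping the original row index
height_weight = pd.DataFrame(measurements.tolist(), index=measurements.index)
print(height_weight)
```

Reusing the original index keeps the later `pd.concat(..., axis=1)` aligned row by row.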


In [14]:
# concat height_weight with original dataframe
df = pd.concat((df, height_weight), axis = 1)
df.head(2)

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Measurements,Hero,Publisher,Height,Weight
0,Male,Human,good,No Hair,yellow,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}",A-Bomb,Marvel Comics,203.0 cm,441.0 kg
1,Male,Icthyo Sapien,good,No Hair,blue,blue,"{'Height': '191.0 cm', 'Weight': '65.0 kg'}",Abe Sapien,Dark Horse Comics,191.0 cm,65.0 kg


In [15]:
# Drop the original column now
df = df.drop(columns=['Measurements'])

In [16]:
df.head(3)

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Hero,Publisher,Height,Weight
0,Male,Human,good,No Hair,yellow,Unknown,A-Bomb,Marvel Comics,203.0 cm,441.0 kg
1,Male,Icthyo Sapien,good,No Hair,blue,blue,Abe Sapien,Dark Horse Comics,191.0 cm,65.0 kg
2,Male,Ungaran,good,No Hair,blue,red,Abin Sur,DC Comics,185.0 cm,90.0 kg


## Convert Height and Weight to Numeric

In [17]:
# Check current data types
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 463 entries, 0 to 462
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Gender      463 non-null    object
 1   Race        463 non-null    object
 2   Alignment   463 non-null    object
 3   Hair color  463 non-null    object
 4   Eye color   463 non-null    object
 5   Skin color  463 non-null    object
 6   Hero        463 non-null    object
 7   Publisher   463 non-null    object
 8   Height      463 non-null    object
 9   Weight      463 non-null    object
dtypes: object(10)
memory usage: 36.3+ KB


### Height Column

In [18]:
# Rename both columns so the units live in the column names
df.rename(columns={'Height': 'Height (cm)', 'Weight': 'Weight (kg)'},
          inplace=True)
df.head(2)

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Hero,Publisher,Height (cm),Weight (kg)
0,Male,Human,good,No Hair,yellow,Unknown,A-Bomb,Marvel Comics,203.0 cm,441.0 kg
1,Male,Icthyo Sapien,good,No Hair,blue,blue,Abe Sapien,Dark Horse Comics,191.0 cm,65.0 kg


In [19]:
type(df['Height (cm)'][0])

str

In [20]:
# remove empty space and 'cm' from 'Height (cm)'

to_replace = [' ', 'cm']

for char in to_replace:
    df['Height (cm)'] = df['Height (cm)'].str.replace(char, '', regex=False)
df.head(2)

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Hero,Publisher,Height (cm),Weight (kg)
0,Male,Human,good,No Hair,yellow,Unknown,A-Bomb,Marvel Comics,203.0,441.0 kg
1,Male,Icthyo Sapien,good,No Hair,blue,blue,Abe Sapien,Dark Horse Comics,191.0,65.0 kg


In [21]:
#Change data type to float 
df['Height (cm)'] = df['Height (cm)'].astype('float')
type(df['Height (cm)'][0])

numpy.float64

In [22]:
#Check data type again
type(df['Height (cm)'][0])

numpy.float64

### Weight Column

In [23]:
#Check data type 
type(df['Weight (kg)'][0])

str

In [24]:
# remove empty space and 'kg' from 'Weight (kg)'

to_replace = [' ', 'kg']

for char in to_replace:
    df['Weight (kg)'] = df['Weight (kg)'].str.replace(char, '', regex=False)
df.head(2)

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Hero,Publisher,Height (cm),Weight (kg)
0,Male,Human,good,No Hair,yellow,Unknown,A-Bomb,Marvel Comics,203.0,441.0
1,Male,Icthyo Sapien,good,No Hair,blue,blue,Abe Sapien,Dark Horse Comics,191.0,65.0


In [25]:
#Change data type to float 
df['Weight (kg)'] = df['Weight (kg)'].astype('float')
type(df['Weight (kg)'][0])

numpy.float64

In [26]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 463 entries, 0 to 462
Data columns (total 10 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Gender       463 non-null    object 
 1   Race         463 non-null    object 
 2   Alignment    463 non-null    object 
 3   Hair color   463 non-null    object 
 4   Eye color    463 non-null    object 
 5   Skin color   463 non-null    object 
 6   Hero         463 non-null    object 
 7   Publisher    463 non-null    object 
 8   Height (cm)  463 non-null    float64
 9   Weight (kg)  463 non-null    float64
dtypes: float64(2), object(8)
memory usage: 36.3+ KB
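
The Height and Weight cleanup follow the same two steps (strip the unit, cast to float), so they could share one helper. A minimal sketch on toy rows, where the function name `strip_unit` is an assumption made for illustration:

```python
import pandas as pd


def strip_unit(series, unit):
    """Remove a trailing ' <unit>' from each string and convert to a number."""
    cleaned = series.str.replace(f' {unit}', '', regex=False)
    return pd.to_numeric(cleaned)


demo = pd.DataFrame({
    'Height': ['203.0 cm', '191.0 cm'],
    'Weight': ['441.0 kg', '65.0 kg'],
})

demo['Height (cm)'] = strip_unit(demo['Height'], 'cm')
demo['Weight (kg)'] = strip_unit(demo['Weight'], 'kg')
print(demo.dtypes)
```

`pd.to_numeric` raises on any value it cannot parse, which surfaces unexpected strings early instead of hiding them.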


## One Hot Encode Powers Column

In [27]:
#Checking data type
print(type(df2['Powers'][0]))
df2['Powers'][0]

<class 'str'>


'Agility,Super Strength,Stamina,Super Speed'

In [28]:
#Convert Powers column to list
df2['Powers_Split'] = df2['Powers'].str.split(',')

In [29]:
#checking data type 
print(type(df2['Powers_Split'][0]))
df2['Powers_Split']

<class 'list'>


0        [Agility, Super Strength, Stamina, Super Speed]
1      [Accelerated Healing, Durability, Longevity, S...
2      [Agility, Accelerated Healing, Cold Resistance...
3                                   [Lantern Power Ring]
4      [Accelerated Healing, Intelligence, Super Stre...
                             ...                        
662               [Flight, Energy Blasts, Size Changing]
663    [Cold Resistance, Durability, Longevity, Super...
664    [Agility, Stealth, Danger Sense, Marksmanship,...
665    [Cryokinesis, Telepathy, Magic, Fire Control, ...
666    [Super Speed, Intangibility, Time Travel, Time...
Name: Powers_Split, Length: 667, dtype: object

In [30]:
df2['Powers_Split'].value_counts()

[Intelligence]                                               8
[Durability, Super Strength]                                 5
[Agility, Stealth, Marksmanship, Weapons Master, Stamina]    4
[Marksmanship]

In [31]:
## exploding the column of lists
exploded = df2.explode('Powers_Split')
exploded.head(5)

Unnamed: 0,hero_names,Powers,Powers_Split
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed",Agility
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed",Super Strength
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed",Stamina
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed",Super Speed
1,A-Bomb,"Accelerated Healing,Durability,Longevity,Super...",Accelerated Healing


Now we will take the unique values from this exploded column and use them as the names of the columns to create in our original, non-exploded dataframe.

We do not want NaN to end up in that list, so we call .dropna() right before .unique().

In [32]:
## saving the unique values from the exploded column
cols_to_make = exploded['Powers_Split'].dropna().unique()
cols_to_make

array(['Agility', 'Super Strength', 'Stamina', 'Super Speed',
       'Accelerated Healing', 'Durability', 'Longevity', 'Camouflage',
       'Self-Sustenance', 'Cold Resistance', 'Underwater breathing',
       'Marksmanship', 'Weapons Master', 'Intelligence', 'Telepathy',
       'Immortality', 'Reflexes', 'Enhanced Sight', 'Sub-Mariner',
       'Lantern Power Ring', 'Invulnerability', 'Animation',
       'Super Breath', 'Dimensional Awareness', 'Flight', 'Size Changing',
       'Teleportation', 'Magic', 'Dimensional Travel',
       'Molecular Manipulation', 'Energy Manipulation', 'Power Cosmic',
       'Energy Absorption', 'Elemental Transmogrification',
       'Fire Resistance', 'Natural Armor', 'Heat Resistance',
       'Matter Absorption', 'Regeneration', 'Stealth', 'Power Suit',
       'Energy Blasts', 'Energy Beams', 'Heat Generation', 'Danger Sense',
       'Phasing', 'Force Fields', 'Hypnokinesis', 'Invisibility',
       'Enhanced Senses', 'Jump', 'Shapeshifting', 'Elasticity',
 

### Using a For Loop and .str.contains to create the new columns

In [33]:
for col in cols_to_make:
    # regex=False treats the power name as literal text (still a substring match)
    df2[col] = df2['Powers'].str.contains(col, regex=False)
df2.head()


Unnamed: 0,hero_names,Powers,Powers_Split,Agility,Super Strength,Stamina,Super Speed,Accelerated Healing,Durability,Longevity,...,Weather Control,Omnipresent,Omniscient,Hair Manipulation,Nova Force,Odin Force,Phoenix Force,Intuitive aptitude,Melting,Changing Armor
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed","[Agility, Super Strength, Stamina, Super Speed]",True,True,True,True,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1,A-Bomb,"Accelerated Healing,Durability,Longevity,Super...","[Accelerated Healing, Durability, Longevity, S...",False,True,True,False,True,True,True,...,False,False,False,False,False,False,False,False,False,False
2,Abe Sapien,"Agility,Accelerated Healing,Cold Resistance,Du...","[Agility, Accelerated Healing, Cold Resistance...",True,True,True,False,True,True,True,...,False,False,False,False,False,False,False,False,False,False
3,Abin Sur,Lantern Power Ring,[Lantern Power Ring],False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,Abomination,"Accelerated Healing,Intelligence,Super Strengt...","[Accelerated Healing, Intelligence, Super Stre...",False,True,True,True,True,False,False,...,False,False,False,False,False,False,False,False,False,False
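
One caveat with `.str.contains` is that it matches substrings, so a power whose name is contained in a longer power name would be flagged for both. The sketch below uses invented toy rows (the hero names and power combinations are hypothetical) to show the effect, and `Series.str.get_dummies` as one way to match whole comma-separated entries instead:

```python
import pandas as pd

# Toy data invented for illustration only
toy = pd.DataFrame({
    'hero_names': ['Hero A', 'Hero B'],
    'Powers': ['Magic Resistance,Flight', 'Magic'],
})

# Substring matching: 'Magic' is also found inside 'Magic Resistance'
substring_match = toy['Powers'].str.contains('Magic', regex=False)
print(substring_match.tolist())

# Exact matching: split on commas and build one column per distinct power
dummies = toy['Powers'].str.get_dummies(sep=',').astype(bool)
print(dummies)
```

Whether this matters for a given dataset depends on whether any power name is a prefix or fragment of another, which is worth checking against `cols_to_make`.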


Let's drop the original Powers column and our Powers_Split column now that we have confirmed our new columns were created correctly.

In [35]:
# drop Powers and Powers_Split columns
df2.drop(columns=['Powers', 'Powers_Split'], inplace=True)
df2.head(3)

Unnamed: 0,hero_names,Agility,Super Strength,Stamina,Super Speed,Accelerated Healing,Durability,Longevity,Camouflage,Self-Sustenance,...,Weather Control,Omnipresent,Omniscient,Hair Manipulation,Nova Force,Odin Force,Phoenix Force,Intuitive aptitude,Melting,Changing Armor
0,3-D Man,True,True,True,True,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1,A-Bomb,False,True,True,False,True,True,True,True,True,...,False,False,False,False,False,False,False,False,False,False
2,Abe Sapien,True,True,True,False,True,True,True,False,False,...,False,False,False,False,False,False,False,False,False,False


## Combining both Dataframes into one

In [36]:
df_combined = pd.merge(df, df2, left_on='Hero', right_on='hero_names')
df_combined.head(2)

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Hero,Publisher,Height (cm),Weight (kg),...,Weather Control,Omnipresent,Omniscient,Hair Manipulation,Nova Force,Odin Force,Phoenix Force,Intuitive aptitude,Melting,Changing Armor
0,Male,Human,good,No Hair,yellow,Unknown,A-Bomb,Marvel Comics,203.0,441.0,...,False,False,False,False,False,False,False,False,False,False
1,Male,Icthyo Sapien,good,No Hair,blue,blue,Abe Sapien,Dark Horse Comics,191.0,65.0,...,False,False,False,False,False,False,False,False,False,False
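
`pd.merge` performs an inner join by default, so any hero without a match in the other table is dropped without notice. A hedged sketch of how unmatched rows could be listed before committing to the join, using toy frames with invented names:

```python
import pandas as pd

# Toy frames invented for illustration only
info = pd.DataFrame({'Hero': ['Hero A', 'Hero B', 'Hero C']})
powers = pd.DataFrame({
    'hero_names': ['Hero A', 'Hero B', 'Hero D'],
    'Agility': [True, False, True],
})

# A left join with indicator=True adds a '_merge' column describing each row
checked = pd.merge(info, powers, left_on='Hero', right_on='hero_names',
                   how='left', indicator=True)

# Rows marked 'left_only' had no partner in the powers table
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Hero'].tolist()
print(unmatched)
```

An empty list here would mean the inner join keeps every row of the left table.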


In [38]:
print(df_combined.info(verbose=True))

<class 'pandas.core.frame.DataFrame'>
Int64Index: 463 entries, 0 to 462
Data columns (total 178 columns):
 #    Column                        Dtype  
---   ------                        -----  
 0    Gender                        object 
 1    Race                          object 
 2    Alignment                     object 
 3    Hair color                    object 
 4    Eye color                     object 
 5    Skin color                    object 
 6    Hero                          object 
 7    Publisher                     object 
 8    Height (cm)                   float64
 9    Weight (kg)                   float64
 10   hero_names                    object 
 11   Agility                       bool   
 12   Super Strength                bool   
 13   Stamina                       bool   
 14   Super Speed                   bool   
 15   Accelerated Healing           bool   
 16   Durability                    bool   
 17   Longevity                     bool   
 18   Camoufla

In [43]:
#drop hero_names since it's a duplicate column of "Hero"
df_combined = df_combined.drop(columns=['hero_names'])

In [44]:
print(df_combined.info(verbose=True))

<class 'pandas.core.frame.DataFrame'>
Int64Index: 463 entries, 0 to 462
Data columns (total 177 columns):
 #    Column                        Dtype  
---   ------                        -----  
 0    Gender                        object 
 1    Race                          object 
 2    Alignment                     object 
 3    Hair color                    object 
 4    Eye color                     object 
 5    Skin color                    object 
 6    Hero                          object 
 7    Publisher                     object 
 8    Height (cm)                   float64
 9    Weight (kg)                   float64
 10   Agility                       bool   
 11   Super Strength                bool   
 12   Stamina                       bool   
 13   Super Speed                   bool   
 14   Accelerated Healing           bool   
 15   Durability                    bool   
 16   Longevity                     bool   
 17   Camouflage                    bool   
 18   Self-Sus

# Part 2: Use your combined DataFrame to answer the following questions.

## 1. Compare the average weight of superheroes who have Super Speed to those who do not.


In [45]:
np.where(cols_to_make == 'Super Speed')

(array([3], dtype=int64),)

In [51]:
heroes_speed = df_combined['Super Speed']
df_speed = df_combined[heroes_speed]
df_speed

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Hero,Publisher,Height (cm),Weight (kg),...,Weather Control,Omnipresent,Omniscient,Hair Manipulation,Nova Force,Odin Force,Phoenix Force,Intuitive aptitude,Melting,Changing Armor
3,Male,Human / Radiation,bad,No Hair,green,Unknown,Abomination,Marvel Comics,203.0,441.0,...,False,False,False,False,False,False,False,False,False,False
5,Male,Human,good,Blond,blue,Unknown,Adam Strange,DC Comics,185.0,88.0,...,False,False,False,False,False,False,False,False,False,False
8,Male,Unknown,bad,White,blue,Unknown,Air-Walker,Marvel Comics,188.0,108.0,...,False,False,False,False,False,False,False,False,False,False
9,Male,Cyborg,bad,Black,brown,Unknown,Ajax,Marvel Comics,193.0,90.0,...,False,False,False,False,False,False,False,False,False,False
10,Male,Unknown,good,Blond,blue,Unknown,Alan Scott,DC Comics,180.0,90.0,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
453,Female,Demi-God,good,Blond,blue,Unknown,Wonder Girl,DC Comics,165.0,51.0,...,False,False,False,False,False,False,False,False,False,False
454,Male,Unknown,good,Black,red,Unknown,Wonder Man,Marvel Comics,188.0,171.0,...,False,False,False,False,False,False,False,False,False,False
455,Female,Amazon,good,Black,blue,Unknown,Wonder Woman,DC Comics,183.0,74.0,...,False,False,False,False,False,False,False,False,False,False
460,Male,Yoda's species,good,White,brown,green,Yoda,George Lucas,66.0,17.0,...,False,False,False,False,False,False,False,False,False,False


In [53]:
df_not_speed = df_combined[~heroes_speed]
df_not_speed

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Hero,Publisher,Height (cm),Weight (kg),...,Weather Control,Omnipresent,Omniscient,Hair Manipulation,Nova Force,Odin Force,Phoenix Force,Intuitive aptitude,Melting,Changing Armor
0,Male,Human,good,No Hair,yellow,Unknown,A-Bomb,Marvel Comics,203.0,441.0,...,False,False,False,False,False,False,False,False,False,False
1,Male,Icthyo Sapien,good,No Hair,blue,blue,Abe Sapien,Dark Horse Comics,191.0,65.0,...,False,False,False,False,False,False,False,False,False,False
2,Male,Ungaran,good,No Hair,blue,red,Abin Sur,DC Comics,185.0,90.0,...,False,False,False,False,False,False,False,False,False,False
4,Male,Human,bad,No Hair,blue,Unknown,Absorbing Man,Marvel Comics,193.0,122.0,...,False,False,False,False,False,False,False,False,False,False
6,Male,Human,good,Brown,brown,Unknown,Agent Bob,Marvel Comics,178.0,81.0,...,False,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
456,Female,Mutant / Clone,good,Black,green,Unknown,X-23,Marvel Comics,155.0,50.0,...,False,False,False,False,False,False,False,False,False,False
457,Male,Unknown,good,Brown,blue,Unknown,X-Man,Marvel Comics,175.0,61.0,...,False,False,False,False,False,False,False,False,False,False
458,Male,Human,good,Blond,blue,Unknown,Yellowjacket,Marvel Comics,183.0,83.0,...,False,False,False,False,False,False,False,False,False,False
459,Female,Human,good,Strawberry Blond,blue,Unknown,Yellowjacket II,Marvel Comics,165.0,52.0,...,False,False,False,False,False,False,False,False,False,False


In [58]:
avg_speed_weight = df_speed['Weight (kg)'].mean().round(2)
avg_not_speed_weight = df_not_speed['Weight (kg)'].mean().round(2)
print(f'The average weight (kg) for Super Heroes that have "Super Speed" is {avg_speed_weight}')
print(f'The average weight (kg) for Super Heroes that do not have "Super Speed" is {avg_not_speed_weight}')

The average weight (kg) for Super Heroes that have "Super Speed" is 129.4
The average weight (kg) for Super Heroes that do not have "Super Speed" is 101.77
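
The same comparison can be written as a single `groupby` on the boolean column instead of two filtered DataFrames. A minimal sketch on toy numbers (the weights below are invented, not taken from the dataset):

```python
import pandas as pd

# Toy data invented for illustration only
toy = pd.DataFrame({
    'Super Speed': [True, True, False, False],
    'Weight (kg)': [100.0, 120.0, 60.0, 80.0],
})

# One mean per group: False = no Super Speed, True = has Super Speed
avg_weight = toy.groupby('Super Speed')['Weight (kg)'].mean().round(2).to_dict()
print(avg_weight)
```

On the real data the equivalent call would be `df_combined.groupby('Super Speed')['Weight (kg)'].mean()`.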


## 2. What is the average height of heroes for each publisher?

In [59]:
publisher_height = df_combined.groupby('Publisher')['Height (cm)'].mean().sort_values(ascending=False)
publisher_height

Publisher
Image Comics         211.000000
Marvel Comics        191.546128
DC Comics            181.923913
Star Trek            181.500000
Team Epic TV         180.750000
Unknown              178.000000
Dark Horse Comics    176.909091
Shueisha             171.500000
George Lucas         159.600000
Name: Height (cm), dtype: float64
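
A publisher's average can rest on very few heroes, so it may help to report the group size next to the mean. A sketch on toy rows (publisher names and heights are invented):

```python
import pandas as pd

# Toy data invented for illustration only
toy = pd.DataFrame({
    'Publisher': ['Pub A', 'Pub A', 'Pub B'],
    'Height (cm)': [180.0, 190.0, 211.0],
})

# Mean height and number of heroes per publisher, tallest average first
summary = (
    toy.groupby('Publisher')['Height (cm)']
       .agg(['mean', 'count'])
       .sort_values('mean', ascending=False)
)
print(summary)
```

A small `count` next to a large `mean` signals that the average reflects only a handful of characters.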