# **Applying Advanced Transformations (Core)**

- Yvon Bilodeau
- May 2022

## **Source of Data**

You will be working with a heavily modified version of the Superheroes dataset from Kaggle.

The dataset includes two CSV files:

- [superhero_info.csv](https://docs.google.com/spreadsheets/d/e/2PACX-1vS1ZstYLwFgwhZnqDsPjtnlHYhJp_cmW55J8JD5mym0seRsaem3px7QBtuFF0LiI7z1PLCkVKAkdO7J/pub?output=csv)
  - Contains name, publisher, demographic info, and body measurements.
- [superhero_powers.csv](https://docs.google.com/spreadsheets/d/e/2PACX-1vSzdWOBaXOoz52vPmCFV5idNlDBohLY1Lsbc1IfZIZQ7cV_aNB2wYBfhF49uE1TaO1B5MQCGWiNrFfd/pub?output=csv)
  - Contains hero name and list of powers.

[Source](https://www.kaggle.com/datasets/claudiodavi/superhero-set)

## **The Task**

**I. Clean the files and combine them into one final DataFrame.**

This dataframe should have the following columns:

- Hero (Just the name of the Hero)
- Publisher
- Gender
- Eye color
- Race
- Hair color
- Height (numeric)
- Skin color
- Alignment
- Weight (numeric)

Plus, one-hot-encoded columns for every power that appears in the dataset. E.g.:
- Agility
- Flight
- Superspeed
- etc.

Hint: There is a space in "100 kg" or "52.5 cm"

**II. Use your combined DataFrame to answer the following questions.**

1. Compare the average weight of superheroes who have Super Speed to those who do not.
2. What is the average height of heroes for each publisher?

## **Import Libraries**

In [1]:
# Standard Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import json

## **Load the Data**

In [2]:
df_1 = pd.read_csv('https://docs.google.com/spreadsheets/d/e/2PACX-1vS1ZstYLwFgwhZnqDsPjtnlHYhJp_cmW55J8JD5mym0seRsaem3px7QBtuFF0LiI7z1PLCkVKAkdO7J/pub?output=csv')
df_2 = pd.read_csv('https://docs.google.com/spreadsheets/d/e/2PACX-1vSzdWOBaXOoz52vPmCFV5idNlDBohLY1Lsbc1IfZIZQ7cV_aNB2wYBfhF49uE1TaO1B5MQCGWiNrFfd/pub?output=csv')

## **Inspect the Data**

### **Superhero_info**

#### **Display First (10) Rows**

In [3]:
# Display the first (10) rows of the dataframe
df_1.head(10)

Unnamed: 0,Hero|Publisher,Gender,Race,Alignment,Hair color,Eye color,Skin color,Measurements
0,A-Bomb|Marvel Comics,Male,Human,good,No Hair,yellow,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}"
1,Abe Sapien|Dark Horse Comics,Male,Icthyo Sapien,good,No Hair,blue,blue,"{'Height': '191.0 cm', 'Weight': '65.0 kg'}"
2,Abin Sur|DC Comics,Male,Ungaran,good,No Hair,blue,red,"{'Height': '185.0 cm', 'Weight': '90.0 kg'}"
3,Abomination|Marvel Comics,Male,Human / Radiation,bad,No Hair,green,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}"
4,Absorbing Man|Marvel Comics,Male,Human,bad,No Hair,blue,Unknown,"{'Height': '193.0 cm', 'Weight': '122.0 kg'}"
5,Adam Strange|DC Comics,Male,Human,good,Blond,blue,Unknown,"{'Height': '185.0 cm', 'Weight': '88.0 kg'}"
6,Agent Bob|Marvel Comics,Male,Human,good,Brown,brown,Unknown,"{'Height': '178.0 cm', 'Weight': '81.0 kg'}"
7,Agent Zero|Marvel Comics,Male,Unknown,good,Unknown,Unknown,Unknown,"{'Height': '191.0 cm', 'Weight': '104.0 kg'}"
8,Air-Walker|Marvel Comics,Male,Unknown,bad,White,blue,Unknown,"{'Height': '188.0 cm', 'Weight': '108.0 kg'}"
9,Ajax|Marvel Comics,Male,Cyborg,bad,Black,brown,Unknown,"{'Height': '193.0 cm', 'Weight': '90.0 kg'}"


#### **Display the Row and Column Count**

In [4]:
# Display the number of rows and columns for the dataframe
print(f'There are {df_1.shape[0]} rows, and {df_1.shape[1]} columns.')

There are 463 rows, and 8 columns.


#### **Display Data Types**

In [5]:
# Display the column names and datatypes for each column
# String columns (and columns with mixed types) are reported as the object dtype
df_1.dtypes

Hero|Publisher    object
Gender            object
Race              object
Alignment         object
Hair color        object
Eye color         object
Skin color        object
Measurements      object
dtype: object

#### **Display Column Names, Count of Non-Null Values, and Data Types**

In [6]:
# Display the column names, count of non-null values, and their datatypes
df_1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 463 entries, 0 to 462
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Hero|Publisher  463 non-null    object
 1   Gender          463 non-null    object
 2   Race            463 non-null    object
 3   Alignment       463 non-null    object
 4   Hair color      463 non-null    object
 5   Eye color       463 non-null    object
 6   Skin color      463 non-null    object
 7   Measurements    463 non-null    object
dtypes: object(8)
memory usage: 29.1+ KB


### **Superhero_powers**

#### **Display First (10) Rows**

In [7]:
# Display the first (10) rows of the dataframe
df_2.head(10)

Unnamed: 0,hero_names,Powers
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed"
1,A-Bomb,"Accelerated Healing,Durability,Longevity,Super..."
2,Abe Sapien,"Agility,Accelerated Healing,Cold Resistance,Du..."
3,Abin Sur,Lantern Power Ring
4,Abomination,"Accelerated Healing,Intelligence,Super Strengt..."
5,Abraxas,"Dimensional Awareness,Flight,Intelligence,Supe..."
6,Absorbing Man,"Cold Resistance,Durability,Energy Absorption,S..."
7,Adam Monroe,"Accelerated Healing,Immortality,Regeneration"
8,Adam Strange,"Durability,Stealth,Flight,Marksmanship,Weapons..."
9,Agent Bob,Stealth


#### **Display the Row and Column Count**

In [8]:
# Display the number of rows and columns for the dataframe
print(f'There are {df_2.shape[0]} rows, and {df_2.shape[1]} columns.')

There are 667 rows, and 2 columns.


#### **Display Data Types**

In [9]:
# Display the column names and datatypes for each column
# String columns (and columns with mixed types) are reported as the object dtype
df_2.dtypes

hero_names    object
Powers        object
dtype: object

#### **Display Column Names, Count of Non-Null Values, and Data Types**

In [10]:
# Display the column names, count of non-null values, and their datatypes
df_2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 667 entries, 0 to 666
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   hero_names  667 non-null    object
 1   Powers      667 non-null    object
dtypes: object(2)
memory usage: 10.5+ KB


## **Clean the Data**

### **Superhero_info**

#### **Remove Unnecessary Rows**


In [11]:
# Display the number of duplicate rows in the dataset
print(f'There are {df_1.duplicated().sum()} duplicate rows.')

There are 0 duplicate rows.


#### **Missing Values**

- There are no missing values.

### **Superhero_powers**

#### **Remove Unnecessary Rows**

In [12]:
# Display the number of duplicate rows in the dataset
print(f'There are {df_2.duplicated().sum()} duplicate rows.')

There are 0 duplicate rows.


#### **Missing Values**

- There are no missing values.
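The "no missing values" conclusion for both dataframes follows from the non-null counts reported by `.info()`. A minimal sketch of a more explicit check is shown here on a small, made-up frame (the `toy` data is illustrative, not the real dataset). It also counts `"Unknown"` placeholder strings, which `isna()` does not treat as missing.

```python
import pandas as pd

# Toy frame standing in for the superhero data (values are illustrative only)
toy = pd.DataFrame({
    "Name": ["A-Bomb", "Agent Zero", "Abe Sapien"],
    "Race": ["Human", "Unknown", "Icthyo Sapien"],
    "Skin color": ["Unknown", "Unknown", "blue"],
})

# Count true missing values (NaN/None) in each column
missing_per_column = toy.isna().sum()

# Count "Unknown" placeholder strings, which isna() does not flag
unknown_per_column = (toy == "Unknown").sum()

print(missing_per_column)
print(unknown_per_column)
```

Whether placeholders such as `"Unknown"` should be treated as missing depends on the analysis; this notebook leaves them as they are.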

## **Transformations**

### **Superhero_info**

In [13]:
# Display the first (2) rows of the dataframe
df_1.head(2)

Unnamed: 0,Hero|Publisher,Gender,Race,Alignment,Hair color,Eye color,Skin color,Measurements
0,A-Bomb|Marvel Comics,Male,Human,good,No Hair,yellow,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}"
1,Abe Sapien|Dark Horse Comics,Male,Icthyo Sapien,good,No Hair,blue,blue,"{'Height': '191.0 cm', 'Weight': '65.0 kg'}"


#### **Hero|Publisher column**

In [14]:
# Split on '|'; expand=True returns the pieces as separate columns
df_1['Hero|Publisher'].str.split('|',expand=True)

Unnamed: 0,0,1
0,A-Bomb,Marvel Comics
1,Abe Sapien,Dark Horse Comics
2,Abin Sur,DC Comics
3,Abomination,Marvel Comics
4,Absorbing Man,Marvel Comics
...,...,...
458,Yellowjacket,Marvel Comics
459,Yellowjacket II,Marvel Comics
460,Yoda,George Lucas
461,Zatanna,DC Comics


In [15]:
## Save the 2 new columns into the dataframe
df_1[['Name','Publisher']] = df_1['Hero|Publisher'].str.split('|', expand=True)

In [16]:
## drop the original column 
df_1 = df_1.drop(columns=['Hero|Publisher'])

In [17]:
# Display the first (2) rows of the dataframe
df_1.head(2)

Unnamed: 0,Gender,Race,Alignment,Hair color,Eye color,Skin color,Measurements,Name,Publisher
0,Male,Human,good,No Hair,yellow,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}",A-Bomb,Marvel Comics
1,Male,Icthyo Sapien,good,No Hair,blue,blue,"{'Height': '191.0 cm', 'Weight': '65.0 kg'}",Abe Sapien,Dark Horse Comics


In [18]:
# Pop Name column
first_column = df_1.pop('Name')

In [19]:
# Insert Name column as the first column
df_1.insert(0, 'Name', first_column)

In [20]:
# Display the first (2) rows of the dataframe
df_1.head(2)

Unnamed: 0,Name,Gender,Race,Alignment,Hair color,Eye color,Skin color,Measurements,Publisher
0,A-Bomb,Male,Human,good,No Hair,yellow,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}",Marvel Comics
1,Abe Sapien,Male,Icthyo Sapien,good,No Hair,blue,blue,"{'Height': '191.0 cm', 'Weight': '65.0 kg'}",Dark Horse Comics
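The split, drop, and reorder steps above can also be written more compactly. The sketch below uses a two-row toy frame (illustrative values only) and assumes the hero name itself never contains a `|`; `n=1` limits the split to the first separator, so any further `|` would stay in the publisher part.

```python
import pandas as pd

# Toy frame with the same combined column as the real data
toy = pd.DataFrame({
    "Hero|Publisher": ["A-Bomb|Marvel Comics", "Yoda|George Lucas"],
    "Gender": ["Male", "Male"],
})

# Split on the first '|' only and store the two pieces as new columns
toy[["Name", "Publisher"]] = toy["Hero|Publisher"].str.split("|", n=1, expand=True)

# Drop the combined column now that it has been split
toy = toy.drop(columns=["Hero|Publisher"])

# Move Name to the front by rebuilding the column order
toy = toy[["Name"] + [c for c in toy.columns if c != "Name"]]

print(toy)
```

Reordering with a column list avoids the separate `pop` and `insert` calls, at the cost of creating a new dataframe.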


#### **Measurements column**

##### **Convert column sample from string to dictionary**

In [21]:
# Create a sample value
test_msm = df_1.loc[0, 'Measurements']

In [22]:
# Display the sample value's datatype and the sample value
print(type(test_msm))
test_msm

<class 'str'>


"{'Height': '203.0 cm', 'Weight': '441.0 kg'}"

In [23]:
# Replace single quotation with the double quotation marks
test_msm = test_msm.replace("'",'"')
test_msm

'{"Height": "203.0 cm", "Weight": "441.0 kg"}'

In [24]:
# Convert string to dictionary
test_msm = json.loads(test_msm)
print(type(test_msm))
test_msm

<class 'dict'>


{'Height': '203.0 cm', 'Weight': '441.0 kg'}

##### **Convert entire column from string to dictionary**

In [25]:
# Use .str.replace to replace all single quotes
df_1['Measurements'] = df_1['Measurements'].str.replace("'",'"')
# Apply the json.loads to the full column
df_1['Measurements'] = df_1['Measurements'].apply(json.loads)
df_1['Measurements'].head()

0    {'Height': '203.0 cm', 'Weight': '441.0 kg'}
1     {'Height': '191.0 cm', 'Weight': '65.0 kg'}
2     {'Height': '185.0 cm', 'Weight': '90.0 kg'}
3    {'Height': '203.0 cm', 'Weight': '441.0 kg'}
4    {'Height': '193.0 cm', 'Weight': '122.0 kg'}
Name: Measurements, dtype: object

In [26]:
# Check a sample value after transformation
test_msm = df_1.loc[0, 'Measurements']
# Display the sample's datatype and the sample
print(type(test_msm))
test_msm

<class 'dict'>


{'Height': '203.0 cm', 'Weight': '441.0 kg'}
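Replacing single quotes before calling `json.loads` works here because none of the sample values contain an apostrophe. As a possible alternative (not the approach used in this notebook), `ast.literal_eval` from the standard library can parse Python-style dictionary strings directly. The sketch below runs on two sample strings copied from the output above.

```python
import ast

import pandas as pd

# Two raw values in the same format as the Measurements column
raw = pd.Series([
    "{'Height': '203.0 cm', 'Weight': '441.0 kg'}",
    "{'Height': '191.0 cm', 'Weight': '65.0 kg'}",
], name="Measurements")

# literal_eval safely evaluates Python literals, so no quote replacement is needed
parsed = raw.apply(ast.literal_eval)

print(type(parsed[0]))
print(parsed[1])
```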

##### **Unpack Measurements column of dictionaries into separate columns**

In [27]:
# Unpack the Measurements dictionaries into a DataFrame with Height and Weight columns
height_weight = df_1['Measurements'].apply(pd.Series)
# Display the first rows of the new DataFrame
height_weight.head()

Unnamed: 0,Height,Weight
0,203.0 cm,441.0 kg
1,191.0 cm,65.0 kg
2,185.0 cm,90.0 kg
3,203.0 cm,441.0 kg
4,193.0 cm,122.0 kg


In [28]:
# Concat Height and Weight columns to the dataframe
df_1 = pd.concat((df_1, height_weight), axis = 1)
df_1.head()

Unnamed: 0,Name,Gender,Race,Alignment,Hair color,Eye color,Skin color,Measurements,Publisher,Height,Weight
0,A-Bomb,Male,Human,good,No Hair,yellow,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}",Marvel Comics,203.0 cm,441.0 kg
1,Abe Sapien,Male,Icthyo Sapien,good,No Hair,blue,blue,"{'Height': '191.0 cm', 'Weight': '65.0 kg'}",Dark Horse Comics,191.0 cm,65.0 kg
2,Abin Sur,Male,Ungaran,good,No Hair,blue,red,"{'Height': '185.0 cm', 'Weight': '90.0 kg'}",DC Comics,185.0 cm,90.0 kg
3,Abomination,Male,Human / Radiation,bad,No Hair,green,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}",Marvel Comics,203.0 cm,441.0 kg
4,Absorbing Man,Male,Human,bad,No Hair,blue,Unknown,"{'Height': '193.0 cm', 'Weight': '122.0 kg'}",Marvel Comics,193.0 cm,122.0 kg


In [29]:
# Drop Measurements column which has been replaced by Height and Weight
df_1 = df_1.drop(columns=['Measurements'])

In [30]:
# Confirm Measurements column has been removed
df_1.head()

Unnamed: 0,Name,Gender,Race,Alignment,Hair color,Eye color,Skin color,Publisher,Height,Weight
0,A-Bomb,Male,Human,good,No Hair,yellow,Unknown,Marvel Comics,203.0 cm,441.0 kg
1,Abe Sapien,Male,Icthyo Sapien,good,No Hair,blue,blue,Dark Horse Comics,191.0 cm,65.0 kg
2,Abin Sur,Male,Ungaran,good,No Hair,blue,red,DC Comics,185.0 cm,90.0 kg
3,Abomination,Male,Human / Radiation,bad,No Hair,green,Unknown,Marvel Comics,203.0 cm,441.0 kg
4,Absorbing Man,Male,Human,bad,No Hair,blue,Unknown,Marvel Comics,193.0 cm,122.0 kg
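`apply(pd.Series)` builds one Series per row, which can be slow on large frames. A sketch of an alternative is to build the DataFrame from the list of dictionaries in one call. The example uses two sample dictionaries and is an illustration, not the method used above.

```python
import pandas as pd

# A column of dictionaries, as produced by the conversion step
measurements = pd.Series([
    {"Height": "203.0 cm", "Weight": "441.0 kg"},
    {"Height": "191.0 cm", "Weight": "65.0 kg"},
], name="Measurements")

# Build the frame in one call and keep the original index so that concat aligns rows
height_weight = pd.DataFrame(measurements.tolist(), index=measurements.index)

print(height_weight)
```

Passing `index=measurements.index` matters when the source dataframe does not have a default `0..n-1` index, because `pd.concat(..., axis=1)` aligns on index labels.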


#### **Height column**

In [31]:
# Remove spaces and units of measure
to_replace = [' ', 'cm']
for char in to_replace:
    df_1['Height'] = df_1['Height'].str.replace(char, '', regex=False)

In [32]:
# Confirm spaces and units of measure have been removed
df_1['Height'].head()

0    203.0
1    191.0
2    185.0
3    203.0
4    193.0
Name: Height, dtype: object

In [33]:
# Convert Height column to integer datatype
df_1["Height"] = pd.to_numeric(df_1["Height"]).astype(int)

In [34]:
# Confirm column datatype has been updated to an integer datatype
df_1["Height"].dtype

dtype('int32')

#### **Weight column**

In [35]:
# Remove spaces and units of measure
to_replace = [' ', 'kg']
for char in to_replace:
    df_1['Weight'] = df_1['Weight'].str.replace(char, '', regex=False)

In [36]:
# Confirm spaces and units of measure have been removed
df_1['Weight'].head()

0    441.0
1     65.0
2     90.0
3    441.0
4    122.0
Name: Weight, dtype: object

In [37]:
# Convert Weight column to integer datatype
df_1["Weight"] = pd.to_numeric(df_1["Weight"]).astype(int)

In [38]:
# Confirm column datatype has been updated to an integer datatype
df_1["Weight"].dtype

dtype('int32')
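The Height and Weight steps follow the same pattern, so they can share one helper. The sketch below defines a hypothetical `strip_unit` function (not part of the original notebook) that uses the hint about the space between number and unit. It keeps the values as floats, because `.astype(int)` would truncate a value such as `52.5` to `52`.

```python
import pandas as pd


def strip_unit(column: pd.Series) -> pd.Series:
    """Keep the text before the first space and convert it to a number."""
    return pd.to_numeric(column.str.split(" ").str[0])


# Toy values in the same "<number> <unit>" format as the real columns
toy = pd.DataFrame({
    "Height": ["203.0 cm", "52.5 cm"],
    "Weight": ["441.0 kg", "100 kg"],
})

toy["Height"] = strip_unit(toy["Height"])
toy["Weight"] = strip_unit(toy["Weight"])

print(toy.dtypes)
print(toy)
```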

#### **Confirm dataframe has been transformed**

In [39]:
# Display transformed dataframe
df_1.head()

Unnamed: 0,Name,Gender,Race,Alignment,Hair color,Eye color,Skin color,Publisher,Height,Weight
0,A-Bomb,Male,Human,good,No Hair,yellow,Unknown,Marvel Comics,203,441
1,Abe Sapien,Male,Icthyo Sapien,good,No Hair,blue,blue,Dark Horse Comics,191,65
2,Abin Sur,Male,Ungaran,good,No Hair,blue,red,DC Comics,185,90
3,Abomination,Male,Human / Radiation,bad,No Hair,green,Unknown,Marvel Comics,203,441
4,Absorbing Man,Male,Human,bad,No Hair,blue,Unknown,Marvel Comics,193,122


### **Superhero_powers**

#### **hero_names column**

In [40]:
# Display the first (5) rows of the dataframe
df_2.head()

Unnamed: 0,hero_names,Powers
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed"
1,A-Bomb,"Accelerated Healing,Durability,Longevity,Super..."
2,Abe Sapien,"Agility,Accelerated Healing,Cold Resistance,Du..."
3,Abin Sur,Lantern Power Ring
4,Abomination,"Accelerated Healing,Intelligence,Super Strengt..."


In [41]:
# Rename column to align with df_1 column name
df_2 = df_2.rename(columns = {"hero_names": "Name"})

In [42]:
# Confirm name has been updated
df_2.head()

Unnamed: 0,Name,Powers
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed"
1,A-Bomb,"Accelerated Healing,Durability,Longevity,Super..."
2,Abe Sapien,"Agility,Accelerated Healing,Cold Resistance,Du..."
3,Abin Sur,Lantern Power Ring
4,Abomination,"Accelerated Healing,Intelligence,Super Strengt..."


#### **Powers column**

In [43]:
# Display a sample of Powers
df_2.loc[0,'Powers']

'Agility,Super Strength,Stamina,Super Speed'

In [44]:
# Display the data type for the sample of Powers
print(type(df_2.loc[0,'Powers']))

<class 'str'>


In [45]:
# Create a series of lists by splitting each string on commas
new_powers = df_2["Powers"].str.split(",")

In [46]:
# Display the data type for the series
print(type(new_powers))

<class 'pandas.core.series.Series'>


In [47]:
# Display the series
new_powers

0        [Agility, Super Strength, Stamina, Super Speed]
1      [Accelerated Healing, Durability, Longevity, S...
2      [Agility, Accelerated Healing, Cold Resistance...
3                                   [Lantern Power Ring]
4      [Accelerated Healing, Intelligence, Super Stre...
                             ...                        
662               [Flight, Energy Blasts, Size Changing]
663    [Cold Resistance, Durability, Longevity, Super...
664    [Agility, Stealth, Danger Sense, Marksmanship,...
665    [Cryokinesis, Telepathy, Magic, Fire Control, ...
666    [Super Speed, Intangibility, Time Travel, Time...
Name: Powers, Length: 667, dtype: object

In [48]:
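# Explode the series into separate rows, one power per row
# ignore_index=True renumbers the rows from 0 instead of repeating the original index
new_powers.explode(ignore_index=True)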

0                   Agility
1            Super Strength
2                   Stamina
3               Super Speed
4       Accelerated Healing
               ...         
5869        Weather Control
5870            Super Speed
5871          Intangibility
5872            Time Travel
5873      Time Manipulation
Name: Powers, Length: 5874, dtype: object

In [49]:
# Create a sorted array of the unique powers to use as new column names
cols_to_make = new_powers.explode().dropna().sort_values().unique()

In [50]:
# Display the array of new columns to make
cols_to_make

array(['Accelerated Healing', 'Adaptation', 'Agility',
       'Animal Attributes', 'Animal Control', 'Animal Oriented Powers',
       'Animation', 'Anti-Gravity', 'Astral Projection', 'Astral Travel',
       'Audio Control', 'Banish', 'Biokinesis', 'Camouflage',
       'Changing Armor', 'Clairvoyance', 'Cloaking', 'Cold Resistance',
       'Cryokinesis', 'Danger Sense', 'Darkforce Manipulation',
       'Death Touch', 'Density Control', 'Dexterity',
       'Dimensional Awareness', 'Dimensional Travel', 'Duplication',
       'Durability', 'Echolocation', 'Elasticity', 'Electrical Transport',
       'Electrokinesis', 'Element Control',
       'Elemental Transmogrification', 'Empathy', 'Energy Absorption',
       'Energy Armor', 'Energy Beams', 'Energy Blasts',
       'Energy Constructs', 'Energy Manipulation', 'Energy Resistance',
       'Enhanced Hearing', 'Enhanced Memory', 'Enhanced Senses',
       'Enhanced Sight', 'Enhanced Smell', 'Enhanced Touch',
       'Fire Control', 'Fire Resis

In [51]:
# Loop through the array of powers and create a new column for each
# Each column is True when the power's name appears in the hero's Powers string
# regex=False matches the name literally instead of treating it as a pattern
for col in cols_to_make:
    df_2[col] = df_2['Powers'].str.contains(col, regex=False)


In [52]:
# Confirm columns have been added
df_2.head()

Unnamed: 0,Name,Powers,Accelerated Healing,Adaptation,Agility,Animal Attributes,Animal Control,Animal Oriented Powers,Animation,Anti-Gravity,...,Vision - Thermal,Vision - X-Ray,Vitakinesis,Wallcrawling,Water Control,Weapon-based Powers,Weapons Master,Weather Control,Web Creation,Wind Control
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed",False,False,True,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1,A-Bomb,"Accelerated Healing,Durability,Longevity,Super...",True,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,Abe Sapien,"Agility,Accelerated Healing,Cold Resistance,Du...",True,False,True,False,False,False,False,False,...,False,False,False,False,False,False,True,False,False,False
3,Abin Sur,Lantern Power Ring,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,Abomination,"Accelerated Healing,Intelligence,Super Strengt...",True,False,False,False,False,False,True,False,...,False,False,False,False,False,False,False,False,False,False


In [53]:
# Drop Powers column which has been replaced by added columns
df_2 = df_2.drop(columns=['Powers'])

In [54]:
# Confirm Powers column has been dropped
df_2.head()

Unnamed: 0,Name,Accelerated Healing,Adaptation,Agility,Animal Attributes,Animal Control,Animal Oriented Powers,Animation,Anti-Gravity,Astral Projection,...,Vision - Thermal,Vision - X-Ray,Vitakinesis,Wallcrawling,Water Control,Weapon-based Powers,Weapons Master,Weather Control,Web Creation,Wind Control
0,3-D Man,False,False,True,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
1,A-Bomb,True,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
2,Abe Sapien,True,False,True,False,False,False,False,False,False,...,False,False,False,False,False,False,True,False,False,False
3,Abin Sur,False,False,False,False,False,False,False,False,False,...,False,False,False,False,False,False,False,False,False,False
4,Abomination,True,False,False,False,False,False,True,False,False,...,False,False,False,False,False,False,False,False,False,False
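One caveat with `str.contains` is that it matches substrings, so a power whose name is contained in a longer power name would also be flagged. The sketch below uses made-up heroes and a hypothetical pair of overlapping names (whether such pairs occur in the real dataset is an assumption) to show the effect. It also shows `Series.str.get_dummies`, which splits on the separator and matches whole entries only.

```python
import pandas as pd

# Made-up heroes; "Telepathy" is a substring of "Telepathy Resistance"
toy = pd.DataFrame({
    "Name": ["Hero A", "Hero B", "Hero C"],
    "Powers": ["Telepathy Resistance,Flight", "Telepathy", "Flight"],
})

# Substring matching also flags Hero A, whose list does not include plain Telepathy
substring_flags = toy["Powers"].str.contains("Telepathy", regex=False)

# get_dummies splits on ',' and creates one 0/1 column per distinct entry
one_hot = toy["Powers"].str.get_dummies(sep=",").astype(bool)

# Attach the one-hot columns to the hero names
encoded = pd.concat([toy[["Name"]], one_hot], axis=1)

print(substring_flags.tolist())
print(encoded)
```

`get_dummies` also replaces the explicit loop, which avoids adding several hundred columns to the dataframe one at a time.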


## **Merge Dataframes**

In [55]:
# Merge dataframes on the Name column
df_3 = pd.merge(df_1, df_2, on='Name')

In [56]:
# Display merged dataframe
df_3.head()

Unnamed: 0,Name,Gender,Race,Alignment,Hair color,Eye color,Skin color,Publisher,Height,Weight,...,Vision - Thermal,Vision - X-Ray,Vitakinesis,Wallcrawling,Water Control,Weapon-based Powers,Weapons Master,Weather Control,Web Creation,Wind Control
0,A-Bomb,Male,Human,good,No Hair,yellow,Unknown,Marvel Comics,203,441,...,False,False,False,False,False,False,False,False,False,False
1,Abe Sapien,Male,Icthyo Sapien,good,No Hair,blue,blue,Dark Horse Comics,191,65,...,False,False,False,False,False,False,True,False,False,False
2,Abin Sur,Male,Ungaran,good,No Hair,blue,red,DC Comics,185,90,...,False,False,False,False,False,False,False,False,False,False
3,Abomination,Male,Human / Radiation,bad,No Hair,green,Unknown,Marvel Comics,203,441,...,False,False,False,False,False,False,False,False,False,False
4,Absorbing Man,Male,Human,bad,No Hair,blue,Unknown,Marvel Comics,193,122,...,False,False,False,False,False,False,False,False,False,False
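`pd.merge` performs an inner join by default, so heroes that appear in only one of the two dataframes are dropped without any message. Since the two files have different row counts (463 and 667), it can be useful to see how many names match. The sketch below uses small made-up frames (`info` and `powers` are illustrative names) with the `indicator` and `validate` options of `pd.merge`.

```python
import pandas as pd

# Made-up stand-ins for the cleaned info and powers dataframes
info = pd.DataFrame({"Name": ["A-Bomb", "Abin Sur", "Hero X"], "Weight": [441, 90, 70]})
powers = pd.DataFrame({"Name": ["3-D Man", "A-Bomb", "Abin Sur"], "Flight": [False, False, True]})

# An outer merge with indicator=True labels each row as both, left_only, or right_only
check = pd.merge(info, powers, on="Name", how="outer", indicator=True)
counts = check["_merge"].value_counts()

# validate raises an error if a name is duplicated on either side
merged = pd.merge(info, powers, on="Name", how="inner", validate="one_to_one")

print(counts)
print(merged)
```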


## **Questions**

### **Compare the average weight of superheroes who have Super Speed to those who do not.**

In [57]:
# Group Weight by Super Speed
groupby_speed = df_3.groupby("Super Speed")
groupby_speed["Weight"].mean().round(3).sort_values(ascending=False)

Super Speed
True     129.404
False    101.774
Name: Weight, dtype: float64
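The same comparison can be written with a boolean mask, which makes it easy to report the difference between the two means. This is a sketch on four made-up rows; the numbers are illustrative and unrelated to the results above.

```python
import pandas as pd

# Made-up weights and Super Speed flags
toy = pd.DataFrame({
    "Weight": [100, 140, 80, 120],
    "Super Speed": [True, True, False, False],
})

# The boolean column selects the heroes with the power; ~ selects the rest
has_speed = toy["Super Speed"]
mean_with = toy.loc[has_speed, "Weight"].mean()
mean_without = toy.loc[~has_speed, "Weight"].mean()
difference = mean_with - mean_without

print(f"With: {mean_with}, without: {mean_without}, difference: {difference}")
```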

### **What is the average height of heroes for each publisher?**

In [58]:
# Group Height by Publisher
groupby_publisher = df_3.groupby("Publisher")
groupby_publisher["Height"].mean().round(3).sort_values(ascending=False)

Publisher
Image Comics         211.000
Marvel Comics        191.545
DC Comics            181.920
Star Trek            181.500
Team Epic TV         180.750
Unknown              178.000
Dark Horse Comics    176.909
Shueisha             171.500
George Lucas         159.600
Name: Height, dtype: float64
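A group mean is easier to interpret next to the number of heroes it is based on, since a publisher with very few heroes can land at either end of the ranking. The sketch below uses named aggregation on a made-up frame to report both; the heights and publisher assignments are illustrative only.

```python
import pandas as pd

# Made-up heights for three publishers
toy = pd.DataFrame({
    "Publisher": ["Marvel Comics", "Marvel Comics", "DC Comics", "Image Comics"],
    "Height": [203, 193, 185, 211],
})

# Named aggregation returns the mean height and the group size side by side
height_by_publisher = (
    toy.groupby("Publisher")["Height"]
    .agg(mean_height="mean", n_heroes="count")
    .sort_values("mean_height", ascending=False)
)

print(height_by_publisher)
```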