# Applying Advanced Transformations

## The Task

Your task is two-fold:

I. Clean the files and combine them into one final DataFrame.

- This DataFrame should have the following columns:
  - Hero (just the name of the hero)
  - Publisher
  - Gender
  - Eye color
  - Race
  - Hair color
  - Height (numeric)
  - Skin color
  - Alignment
  - Weight (numeric)
  - Plus, one-hot-encoded columns for every power that appears in the dataset, e.g.:
    - Agility
    - Flight
    - Super Speed
    - etc.

II. Use your combined DataFrame to answer the following questions:

- Compare the average weight of superheroes who have Super Speed to those who do not.
- What is the average height of heroes for each publisher?
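Once the combined DataFrame exists, both questions reduce to a `groupby` followed by `mean()`. The sketch below runs that on a tiny made-up DataFrame: the hero rows and numbers are hypothetical, and the column names `Height(cm)`, `Weight(kg)` and `Super Speed` are assumptions based on the names used in the cleaning steps and the power spelling in the powers file.

```python
import pandas as pd

# Hypothetical stand-in for the final combined DataFrame (made-up values):
# numeric height/weight columns plus one one-hot power column.
combined = pd.DataFrame({
    "Hero": ["Hero A", "Hero B", "Hero C", "Hero D"],
    "Publisher": ["Marvel Comics", "Marvel Comics", "DC Comics", "DC Comics"],
    "Height(cm)": [180.0, 190.0, 170.0, 210.0],
    "Weight(kg)": [80.0, 100.0, 60.0, 120.0],
    "Super Speed": [1, 0, 1, 0],
})

# Question 1: mean weight for heroes without (0) and with (1) Super Speed
weight_by_speed = combined.groupby("Super Speed")["Weight(kg)"].mean()

# Question 2: mean height per publisher
height_by_publisher = combined.groupby("Publisher")["Height(cm)"].mean()

print(weight_by_speed)
print(height_by_publisher)
```

Grouping on the 0/1 column keeps both groups in one Series, so the comparison is a single lookup rather than two separate boolean filters.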

## Import libraries & Load data

```python
# Standard imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Importing the os and json modules
import os, json
```

```python
info = pd.read_csv('/Users/alicia/Documents/DataEnrichment/Wk14 JSON/Advance Transformations Wk14 Core #3/Advanced-Transformations/Data/superhero_info - superhero_info.csv')
info.head()
```

|   | Hero\|Publisher | Gender | Race | Alignment | Hair color | Eye color | Skin color | Measurements |
|---|---|---|---|---|---|---|---|---|
| 0 | A-Bomb\|Marvel Comics | Male | Human | good | No Hair | yellow | Unknown | {'Height': '203.0 cm', 'Weight': '441.0 kg'} |
| 1 | Abe Sapien\|Dark Horse Comics | Male | Icthyo Sapien | good | No Hair | blue | blue | {'Height': '191.0 cm', 'Weight': '65.0 kg'} |
| 2 | Abin Sur\|DC Comics | Male | Ungaran | good | No Hair | blue | red | {'Height': '185.0 cm', 'Weight': '90.0 kg'} |
| 3 | Abomination\|Marvel Comics | Male | Human / Radiation | bad | No Hair | green | Unknown | {'Height': '203.0 cm', 'Weight': '441.0 kg'} |
| 4 | Absorbing Man\|Marvel Comics | Male | Human | bad | No Hair | blue | Unknown | {'Height': '193.0 cm', 'Weight': '122.0 kg'} |


```python
powers = pd.read_csv('/Users/alicia/Documents/DataEnrichment/Wk14 JSON/Advance Transformations Wk14 Core #3/Advanced-Transformations/Data/superhero_powers - superhero_powers.csv')
powers.head()
```

|   | hero_names | Powers |
|---|---|---|
| 0 | 3-D Man | Agility,Super Strength,Stamina,Super Speed |
| 1 | A-Bomb | Accelerated Healing,Durability,Longevity,Super... |
| 2 | Abe Sapien | Agility,Accelerated Healing,Cold Resistance,Du... |
| 3 | Abin Sur | Lantern Power Ring |
| 4 | Abomination | Accelerated Healing,Intelligence,Super Strengt... |
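Each `Powers` entry is a single comma-separated string. One way to produce the one-hot power columns the task asks for is pandas' `Series.str.get_dummies(sep=',')`, which splits every string on the separator and creates one 0/1 column per distinct value. The sketch below applies it to two of the fully visible rows above; it assumes powers are separated by a bare comma with no surrounding spaces, as in those rows.

```python
import pandas as pd

# Two rows copied from the powers output above (the ones not truncated).
toy_powers = pd.DataFrame({
    "hero_names": ["3-D Man", "Abin Sur"],
    "Powers": ["Agility,Super Strength,Stamina,Super Speed", "Lantern Power Ring"],
})

# One 0/1 column per distinct power; a hero lacking a power gets 0 in that column.
dummies = toy_powers["Powers"].str.get_dummies(sep=",")

# Put the hero name back next to the indicator columns.
powers_ohe = pd.concat([toy_powers[["hero_names"]], dummies], axis=1)
print(powers_ohe)
```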


## Info DF Preprocessing

```python
info.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 463 entries, 0 to 462
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Hero|Publisher  463 non-null    object
 1   Gender          463 non-null    object
 2   Race            463 non-null    object
 3   Alignment       463 non-null    object
 4   Hair color      463 non-null    object
 5   Eye color       463 non-null    object
 6   Skin color      463 non-null    object
 7   Measurements    463 non-null    object
dtypes: object(8)
memory usage: 29.1+ KB
```


```python
# Split hero and publisher into two columns
info['Hero|Publisher'].str.split('|', expand=True)
```

|   | 0 | 1 |
|---|---|---|
| 0 | A-Bomb | Marvel Comics |
| 1 | Abe Sapien | Dark Horse Comics |
| 2 | Abin Sur | DC Comics |
| 3 | Abomination | Marvel Comics |
| 4 | Absorbing Man | Marvel Comics |
| ... | ... | ... |
| 458 | Yellowjacket | Marvel Comics |
| 459 | Yellowjacket II | Marvel Comics |
| 460 | Yoda | George Lucas |
| 461 | Zatanna | DC Comics |


```python
info[['Hero', 'Publisher']] = info['Hero|Publisher'].str.split('|', expand=True)
info.head(2)
```

|   | Hero\|Publisher | Gender | Race | Alignment | Hair color | Eye color | Skin color | Measurements | Hero | Publisher |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A-Bomb\|Marvel Comics | Male | Human | good | No Hair | yellow | Unknown | {'Height': '203.0 cm', 'Weight': '441.0 kg'} | A-Bomb | Marvel Comics |
| 1 | Abe Sapien\|Dark Horse Comics | Male | Icthyo Sapien | good | No Hair | blue | blue | {'Height': '191.0 cm', 'Weight': '65.0 kg'} | Abe Sapien | Dark Horse Comics |

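Assigning the split result to two column names only works if every row splits into exactly two pieces. A small defensive variant, sketched here on two rows copied from the output above, passes `n=1` so the string is split only at the first `|`; a single-character pattern like `|` is treated as a literal, not a regex. Whether any real row contains an extra `|` is not shown in the data, so this is a precaution rather than a required fix.

```python
import pandas as pd

# Two rows copied from the output above.
toy = pd.DataFrame({"Hero|Publisher": ["A-Bomb|Marvel Comics", "Yoda|George Lucas"]})

# n=1 splits only at the first '|', so expand=True always yields at most
# two columns and the two-name assignment below cannot get a third column.
toy[["Hero", "Publisher"]] = toy["Hero|Publisher"].str.split("|", n=1, expand=True)
print(toy)
```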

```python
# Drop unwanted columns
info = info.drop(columns=['Hero|Publisher'])
info.head()
```

|   | Gender | Race | Alignment | Hair color | Eye color | Skin color | Measurements | Hero | Publisher |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Male | Human | good | No Hair | yellow | Unknown | {'Height': '203.0 cm', 'Weight': '441.0 kg'} | A-Bomb | Marvel Comics |
| 1 | Male | Icthyo Sapien | good | No Hair | blue | blue | {'Height': '191.0 cm', 'Weight': '65.0 kg'} | Abe Sapien | Dark Horse Comics |
| 2 | Male | Ungaran | good | No Hair | blue | red | {'Height': '185.0 cm', 'Weight': '90.0 kg'} | Abin Sur | DC Comics |
| 3 | Male | Human / Radiation | bad | No Hair | green | Unknown | {'Height': '203.0 cm', 'Weight': '441.0 kg'} | Abomination | Marvel Comics |
| 4 | Male | Human | bad | No Hair | blue | Unknown | {'Height': '193.0 cm', 'Weight': '122.0 kg'} | Absorbing Man | Marvel Comics |
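Each `Measurements` value is the string form of a Python dict. As an alternative to the character replacement and splitting done in the next cells, the string can be parsed with the standard library's `ast.literal_eval` and expanded into columns directly. This sketch uses two values copied from the output above and assumes every row in the full file is a well-formed dict literal with the same `'Height'`/`'Weight'` keys and `' cm'`/`' kg'` suffixes; rows not shown here may differ.

```python
import ast

import pandas as pd

# Two Measurements strings copied from the output above.
toy = pd.DataFrame({"Measurements": [
    "{'Height': '203.0 cm', 'Weight': '441.0 kg'}",
    "{'Height': '191.0 cm', 'Weight': '65.0 kg'}",
]})

# literal_eval only evaluates Python literals (safer than eval);
# json.loads would reject these strings because they use single quotes.
parsed = toy["Measurements"].apply(ast.literal_eval)

# Turn the list of dicts into columns, then strip the unit and cast to float.
measures = pd.DataFrame(parsed.tolist(), index=toy.index)
toy["Height(cm)"] = measures["Height"].str.replace(" cm", "", regex=False).astype(float)
toy["Weight(kg)"] = measures["Weight"].str.replace(" kg", "", regex=False).astype(float)
print(toy[["Height(cm)", "Weight(kg)"]])
```

Parsing the structure avoids relying on the exact positions of `:` and `,` inside the string.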


```python
# Replace the comma between the Height and Weight entries with ':' so that
# a single split on ':' separates every piece of the string
info['Measurements'] = info['Measurements'].str.replace(',', ':', regex=False)
```

```python
info['Measurements'].str.split(':', expand=True)
```

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | {'Height' | '203.0 cm' | 'Weight' | '441.0 kg'} |
| 1 | {'Height' | '191.0 cm' | 'Weight' | '65.0 kg'} |
| 2 | {'Height' | '185.0 cm' | 'Weight' | '90.0 kg'} |
| 3 | {'Height' | '203.0 cm' | 'Weight' | '441.0 kg'} |
| 4 | {'Height' | '193.0 cm' | 'Weight' | '122.0 kg'} |
| ... | ... | ... | ... | ... |
| 458 | {'Height' | '183.0 cm' | 'Weight' | '83.0 kg'} |
| 459 | {'Height' | '165.0 cm' | 'Weight' | '52.0 kg'} |
| 460 | {'Height' | '66.0 cm' | 'Weight' | '17.0 kg'} |
| 461 | {'Height' | '170.0 cm' | 'Weight' | '57.0 kg'} |


```python
info[['0', 'Height(cm)', '2', 'Weight(kg)']] = info['Measurements'].str.split(':', expand=True)
info.head(2)
```

|   | Gender | Race | Alignment | Hair color | Eye color | Skin color | Measurements | Hero | Publisher | 0 | Height(cm) | 2 | Weight(kg) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Male | Human | good | No Hair | yellow | Unknown | {'Height': '203.0 cm': 'Weight': '441.0 kg'} | A-Bomb | Marvel Comics | {'Height' | '203.0 cm' | 'Weight' | '441.0 kg'} |
| 1 | Male | Icthyo Sapien | good | No Hair | blue | blue | {'Height': '191.0 cm': 'Weight': '65.0 kg'} | Abe Sapien | Dark Horse Comics | {'Height' | '191.0 cm' | 'Weight' | '65.0 kg'} |


```python
# Drop unnecessary columns
info = info.drop(columns=['Measurements', '0', '2'])
info.head(2)
```

|   | Gender | Race | Alignment | Hair color | Eye color | Skin color | Hero | Publisher | Height(cm) | Weight(kg) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Male | Human | good | No Hair | yellow | Unknown | A-Bomb | Marvel Comics | '203.0 cm' | '441.0 kg'} |
| 1 | Male | Icthyo Sapien | good | No Hair | blue | blue | Abe Sapien | Dark Horse Comics | '191.0 cm' | '65.0 kg'} |


```python
# Clean up the height and weight columns a little more so they're just numbers
replace = ["'", ' cm', ' kg', '}', ' ']
for i in replace:
    info['Height(cm)'] = info['Height(cm)'].str.replace(i, '', regex=False)
```

```python
replace = ["'", 'cm', 'kg', '}', ' ']
for i in replace:
    info['Weight(kg)'] = info['Weight(kg)'].str.replace(i, '', regex=False)
```
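After these replacements the two columns contain only digits, but they are still strings (object dtype), while the task asks for numeric Height and Weight. A compact sketch, on sample values shaped like the split output above, that strips the quotes, brace, whitespace and units with one regex and then converts with `pd.to_numeric`. Using `errors="coerce"` is a design choice: any value that still is not a number becomes `NaN` instead of raising, so it can be found with `isna()`.

```python
import pandas as pd

# Sample values shaped like the split output above (leading space, quotes, brace).
raw = pd.DataFrame({
    "Height(cm)": [" '203.0 cm'", " '66.0 cm'"],
    "Weight(kg)": [" '441.0 kg'}", " '17.0 kg'}"],
})

for col in ["Height(cm)", "Weight(kg)"]:
    # Remove quotes, closing braces, whitespace and the unit text in one pass.
    cleaned = raw[col].str.replace(r"['}\s]|cm|kg", "", regex=True)
    # Convert to float; unparseable leftovers become NaN rather than an error.
    raw[col] = pd.to_numeric(cleaned, errors="coerce")

print(raw.dtypes)
```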

```python
# Check that the unwanted characters are gone from both columns
info.head()
```

|   | Gender | Race | Alignment | Hair color | Eye color | Skin color | Hero | Publisher | Height(cm) | Weight(kg) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Male | Human | good | No Hair | yellow | Unknown | A-Bomb | Marvel Comics | 203.0 | 441.0 |
| 1 | Male | Icthyo Sapien | good | No Hair | blue | blue | Abe Sapien | Dark Horse Comics | 191.0 | 65.0 |
| 2 | Male | Ungaran | good | No Hair | blue | red | Abin Sur | DC Comics | 185.0 | 90.0 |
| 3 | Male | Human / Radiation | bad | No Hair | green | Unknown | Abomination | Marvel Comics | 203.0 | 441.0 |
| 4 | Male | Human | bad | No Hair | blue | Unknown | Absorbing Man | Marvel Comics | 193.0 | 122.0 |


## Powers DF Preprocessing

```python
powers.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 667 entries, 0 to 666
Data columns (total 2 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   hero_names  667 non-null    object
 1   Powers      667 non-null    object
dtypes: object(2)
memory usage: 10.5+ KB
```
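`info` has 463 rows while `powers` has 667, so not every hero will find a match when the two tables are combined on the hero name. A minimal sketch, using hypothetical miniature versions of the cleaned tables, of checking the overlap with `merge(..., indicator=True)` before doing the inner join. Matching `Hero` to `hero_names` by exact string is an assumption; real names may need extra normalization (case, whitespace) before they line up.

```python
import pandas as pd

# Hypothetical miniature versions of the cleaned tables.
info_small = pd.DataFrame({
    "Hero": ["A-Bomb", "Abe Sapien"],
    "Publisher": ["Marvel Comics", "Dark Horse Comics"],
})
powers_small = pd.DataFrame({
    "hero_names": ["3-D Man", "A-Bomb", "Abe Sapien"],
    "Agility": [1, 0, 1],
})

# indicator=True adds a "_merge" column ("both", "left_only", "right_only"),
# which shows how many heroes exist in only one of the two tables.
check = info_small.merge(powers_small, left_on="Hero", right_on="hero_names",
                         how="outer", indicator=True)
print(check["_merge"].value_counts())

# An inner join keeps only heroes present in both tables; the duplicate
# name column from the powers table is then dropped.
combined = info_small.merge(powers_small, left_on="Hero", right_on="hero_names",
                            how="inner").drop(columns="hero_names")
print(combined)
```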
