## <span style="color:#ee2e22">Super Transformations in Python</span>


In [1]:
# Import Libraries
import pandas as pd

# Read superhero_info.csv
superhero_info_df = pd.read_csv('superhero_info - superhero_info (1).csv')

# Read superhero_powers.csv
superhero_powers_df = pd.read_csv('superhero_powers - superhero_powers.csv')
print("Imports and csv readings done.")

Imports and csv readings done.


- **<span style="color:#7c1618">Understand:</span>**

In [2]:
superhero_info_df.head()

Unnamed: 0,Hero|Publisher,Gender,Race,Alignment,Hair color,Eye color,Skin color,Measurements
0,A-Bomb|Marvel Comics,Male,Human,good,No Hair,yellow,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}"
1,Abe Sapien|Dark Horse Comics,Male,Icthyo Sapien,good,No Hair,blue,blue,"{'Height': '191.0 cm', 'Weight': '65.0 kg'}"
2,Abin Sur|DC Comics,Male,Ungaran,good,No Hair,blue,red,"{'Height': '185.0 cm', 'Weight': '90.0 kg'}"
3,Abomination|Marvel Comics,Male,Human / Radiation,bad,No Hair,green,Unknown,"{'Height': '203.0 cm', 'Weight': '441.0 kg'}"
4,Absorbing Man|Marvel Comics,Male,Human,bad,No Hair,blue,Unknown,"{'Height': '193.0 cm', 'Weight': '122.0 kg'}"


In [3]:
superhero_powers_df.head()

Unnamed: 0,hero_names,Powers
0,3-D Man,"Agility,Super Strength,Stamina,Super Speed"
1,A-Bomb,"Accelerated Healing,Durability,Longevity,Super..."
2,Abe Sapien,"Agility,Accelerated Healing,Cold Resistance,Du..."
3,Abin Sur,Lantern Power Ring
4,Abomination,"Accelerated Healing,Intelligence,Super Strengt..."


### <span style="color:#7c1618">First, let's clean and preprocess the "superhero_info_df":</span>

- **Read the superhero_info.csv and superhero_powers.csv files:**

In [4]:
df_info = pd.read_csv('superhero_info - superhero_info (1).csv')
df_powers = pd.read_csv('superhero_powers - superhero_powers.csv')

- **Clean and preprocess the superhero_info DataFrame:**

In [5]:
# Split the combined 'Hero|Publisher' column into separate 'Hero' and 'Publisher' columns
df_info[['Hero', 'Publisher']] = df_info['Hero|Publisher'].str.split('|', n=1, expand=True)

# Extract height and weight from the 'Measurements' column
df_info['Height'] = df_info['Measurements'].str.extract(r"'Height': '([\d.]+) cm'", expand=False).astype(float)
df_info['Weight'] = df_info['Measurements'].str.extract(r"'Weight': '([\d.]+) kg'", expand=False).astype(float)

# Drop the combined column now that it has been split
df_info = df_info.drop(columns=['Hero|Publisher'])
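
The Measurements strings are Python-style dict literals, so an alternative to regular expressions is to parse them with ast.literal_eval and expand the resulting dicts into columns. The following is a minimal sketch, using two rows copied from the head() output above; the names `sample` and `parsed` are made up for illustration and are not part of the notebook.

```python
import ast
import pandas as pd

# Two 'Measurements' values copied from the head() output above
sample = pd.Series(
    ["{'Height': '203.0 cm', 'Weight': '441.0 kg'}",
     "{'Height': '191.0 cm', 'Weight': '65.0 kg'}"],
    name='Measurements',
)

# Parse each string into a real dict, then expand the dicts into columns
parsed = pd.DataFrame(sample.apply(ast.literal_eval).tolist(), index=sample.index)

# Keep the number before the unit and convert it to float
parsed['Height'] = parsed['Height'].str.split().str[0].astype(float)
parsed['Weight'] = parsed['Weight'].str.split().str[0].astype(float)
print(parsed)
```

This approach assumes every value is a well-formed dict literal; a malformed row would raise an error in ast.literal_eval rather than silently producing a missing value.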

- **Clean and preprocess the superhero_powers DataFrame:**

In [6]:
# Convert 'Powers' column to string type
df_powers['Powers'] = df_powers['Powers'].astype(str)

# Split the powers column into separate rows
df_powers['Powers'] = df_powers['Powers'].str.split(',')

# Create a set of all unique powers
unique_powers = set(power for powers in df_powers['Powers'] if isinstance(powers, list) for power in powers)

# Initialize a dictionary to store the one-hot-encoded powers
powers_dict = {power: [] for power in unique_powers}

# Encode the powers as one-hot vectors
for powers in df_powers['Powers']:
    if isinstance(powers, list):
        power_vector = [1 if power in powers else 0 for power in unique_powers]
    else:
        power_vector = [0] * len(unique_powers)
    for power, value in zip(unique_powers, power_vector):
        powers_dict[power].append(value)

# Create a DataFrame from the powers dictionary
df_powers_encoded = pd.DataFrame(powers_dict)

# Concatenate the hero names with the encoded powers DataFrame
df_powers_combined = pd.concat([df_powers['hero_names'], df_powers_encoded], axis=1)
print("Finally it worked")
print(df_powers_combined.head())

Finally it worked
    hero_names  Radar Sense  Audio Control  Dimensional Awareness  Omnipotent  \
0      3-D Man            0              0                      0           0   
1       A-Bomb            0              0                      0           0   
2   Abe Sapien            0              0                      0           0   
3     Abin Sur            0              0                      0           0   
4  Abomination            0              0                      0           0   

   Intangibility  Enhanced Hearing  Time Travel  Self-Sustenance  \
0              0                 0            0                0   
1              0                 0            0                1   
2              0                 0            0                0   
3              0                 0            0                0   
4              0                 0            0                0   

   Shapeshifting  ...  Substance Secretion  Natural Armor  Weapons Master  \
0        
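
The manual loop above can probably be replaced by the pandas string method str.get_dummies, which builds one indicator column per distinct value in a delimited string. The sketch below runs on a small made-up frame (`toy_powers` is hypothetical, with two rows borrowed from the head() output and one invented hero without powers). Unlike the loop, it leaves missing values as all-zero rows instead of turning them into a power named "nan", and it returns the columns in alphabetical order.

```python
import pandas as pd

# Toy stand-in for df_powers; the last row is invented to show a missing value
toy_powers = pd.DataFrame({
    'hero_names': ['3-D Man', 'Abin Sur', 'Hero Without Powers'],
    'Powers': ['Agility,Super Strength,Stamina,Super Speed', 'Lantern Power Ring', None],
})

# One 0/1 column per distinct power; a missing value becomes a row of zeros
encoded = toy_powers['Powers'].str.get_dummies(sep=',')

# Attach the hero names, mirroring df_powers_combined
toy_combined = pd.concat([toy_powers['hero_names'], encoded], axis=1)
print(toy_combined)
```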

- **Grouping the data by hero names:** To calculate the total power level for each hero, we group the rows by hero name and sum the values in each power column. The groupby function in pandas does this.

In [7]:
df_grouped = df_powers_combined.groupby('hero_names').sum().reset_index()
print(df_grouped)

          hero_names  Radar Sense  Audio Control  Dimensional Awareness  \
0            3-D Man            0              0                      0   
1             A-Bomb            0              0                      0   
2         Abe Sapien            0              0                      0   
3           Abin Sur            0              0                      0   
4        Abomination            0              0                      0   
..               ...          ...            ...                    ...   
662  Yellowjacket II            0              0                      0   
663             Ymir            0              0                      0   
664             Yoda            0              0                      0   
665          Zatanna            0              0                      0   
666             Zoom            0              0                      0   

     Omnipotent  Intangibility  Enhanced Hearing  Time Travel  \
0             0              0    
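
One thing to keep in mind when grouping: if a hero name appears on more than one row, sum() adds the indicators together, so a power listed on both rows counts twice. The sketch below shows the difference between sum() and max() on a made-up frame (`toy_combined` and the hero names are hypothetical); whether duplicates exist in the real data is not checked here.

```python
import pandas as pd

# 'Hero A' appears twice and has 'Flight' on both rows
toy_combined = pd.DataFrame({
    'hero_names': ['Hero A', 'Hero A', 'Hero B'],
    'Flight': [1, 1, 0],
    'Agility': [0, 1, 1],
})

# sum() counts a repeated power once per row it appears on
summed = toy_combined.groupby('hero_names').sum().reset_index()

# max() keeps the columns as 0/1 flags even when names repeat
flags = toy_combined.groupby('hero_names').max().reset_index()

print(summed)
print(flags)
```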

- **Calculating the total power level:** We can now calculate the total power level for each hero by summing the values across all power columns, and store the result in a new column called 'Total Power Level'.

In [8]:
# Calculate the total power level from the power columns only (exclude the text column 'hero_names')
df_grouped['Total Power Level'] = df_grouped.drop(columns='hero_names').sum(axis=1)
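
A row-wise sum only makes sense over the numeric indicator columns, so it helps to set the text column aside first; depending on the pandas version, summing across a row that still contains hero_names may warn or fail. A minimal sketch on made-up data (`toy_grouped` and `power_columns` are hypothetical names):

```python
import pandas as pd

# Toy stand-in for df_grouped
toy_grouped = pd.DataFrame({
    'hero_names': ['Hero A', 'Hero B', 'Hero C'],
    'Flight': [1, 0, 1],
    'Agility': [1, 1, 0],
    'Stamina': [1, 0, 0],
})

# Select only the indicator columns before summing across each row
power_columns = toy_grouped.columns.drop('hero_names')
toy_grouped['Total Power Level'] = toy_grouped[power_columns].sum(axis=1)
print(toy_grouped[['hero_names', 'Total Power Level']])
```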


- **Sorting the data by total power level:** Finally, we sort the data in descending order of total power level to find the most powerful heroes.

In [9]:
# Sort the data by total power level
df_sorted = df_grouped.sort_values(by='Total Power Level', ascending=False)

- **Getting the top N heroes:** df_sorted now contains the heroes sorted in descending order of their total power level. You can access the top N most powerful heroes with the head() function. For example, to get the top 10:

In [10]:
top_10_powerful_heroes = df_sorted.head(10)
top_10_powerful_heroes

Unnamed: 0,hero_names,Radar Sense,Audio Control,Dimensional Awareness,Omnipotent,Intangibility,Enhanced Hearing,Time Travel,Self-Sustenance,Shapeshifting,...,Natural Armor,Weapons Master,Seismic Power,Flight,Power Suit,Natural Weapons,Enhanced Touch,Enhanced Smell,Time Manipulation,Total Power Level
563,Spectre,0,0,1,0,0,0,1,0,0,...,0,0,0,1,0,0,0,0,1,49
18,Amazo,0,0,0,0,0,1,1,0,1,...,0,0,0,1,0,0,0,1,0,44
394,Martian Manhunter,0,0,0,0,1,1,0,1,1,...,0,0,0,1,0,0,0,0,0,35
370,Living Tribunal,0,0,1,1,0,0,1,0,0,...,0,0,0,1,0,0,0,0,1,35
388,Man of Miracles,0,0,1,1,0,1,1,0,0,...,0,0,0,0,0,0,0,1,1,34
139,Captain Marvel,0,0,0,0,0,1,0,1,0,...,0,0,0,1,0,0,0,0,0,33
597,T-X,0,0,0,0,0,1,0,0,1,...,0,1,0,0,0,1,1,1,0,33
246,Galactus,0,0,1,0,0,0,0,0,0,...,0,0,0,1,0,0,0,0,0,32
594,T-1000,0,0,0,0,0,1,0,0,1,...,0,1,0,0,0,1,1,1,0,32
455,One-Above-All,0,0,1,1,0,0,1,1,0,...,0,0,0,1,0,0,0,0,1,31
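
When only the top few rows are needed, DataFrame.nlargest offers a one-step alternative to sorting and then calling head(). A small sketch on made-up totals (`toy_totals` and the hero names are hypothetical):

```python
import pandas as pd

# Toy stand-in for df_grouped with the total already computed
toy_totals = pd.DataFrame({
    'hero_names': ['Hero A', 'Hero B', 'Hero C', 'Hero D'],
    'Total Power Level': [3, 7, 5, 1],
})

# Return the 2 rows with the highest totals, highest first
top_2 = toy_totals.nlargest(2, 'Total Power Level')
print(top_2)
```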


In [17]:
df_powers_combined['Total Power'] = df_powers_combined.iloc[:, 1:].sum(axis=1)
df_sorted = df_powers_combined.sort_values('Total Power', ascending=False)
most_powerful_hero = df_sorted.iloc[0]['hero_names']
print("The hero with the most powers is:", most_powerful_hero)

The hero with the most powers is: Spectre


- **Find the most common height for each publisher:**

In [22]:
import ast
# Split the 'Hero|Publisher' column into separate 'Hero' and 'Publisher' columns
superhero_info_df[['Hero', 'Publisher']] = superhero_info_df['Hero|Publisher'].str.split('|', expand=True)

# Parse the 'Measurements' dict string and convert the height to a numeric value
superhero_info_df['Height'] = superhero_info_df['Measurements'].apply(lambda x: float(ast.literal_eval(x)['Height'].split()[0]) if 'Height' in x else None)

# Group the data by 'Publisher' and calculate the mode of 'Height' for each group
mode_height_by_publisher = superhero_info_df.groupby('Publisher')['Height'].apply(lambda x: x.mode().iloc[0] if len(x.mode()) > 0 else None)

# Print the most common height for heroes from each publisher
print("Most common height of heroes for each publisher:")
print(mode_height_by_publisher)

Most common height of heroes for each publisher:
Publisher
DC Comics            183.0
Dark Horse Comics     71.0
George Lucas         183.0
Image Comics         211.0
Marvel Comics        183.0
Shueisha             168.0
Star Trek            178.0
Team Epic TV         175.0
Unknown              178.0
Name: Height, dtype: float64
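
The cell above reports the mode (the most frequent height), which is not the same as the average. If both are of interest, they can be computed side by side with a named aggregation. The sketch below uses made-up publishers and heights (`toy_heights`, `most_common`, and `height_summary` are hypothetical names), so the numbers say nothing about the real dataset.

```python
import pandas as pd

# Made-up data; the missing height shows how NaN values are handled
toy_heights = pd.DataFrame({
    'Publisher': ['Pub X', 'Pub X', 'Pub X', 'Pub Y', 'Pub Y'],
    'Height': [180.0, 180.0, 210.0, 170.0, None],
})

def most_common(values):
    """Return the smallest mode, or None if the group has no non-missing values."""
    modes = values.mode()
    return modes.iloc[0] if not modes.empty else None

# Mean and mode per publisher; both skip missing heights
height_summary = toy_heights.groupby('Publisher')['Height'].agg(
    mean_height='mean',
    most_common_height=most_common,
)
print(height_summary)
```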
