# Avengers 

#### Let's take a closer look at the Avengers heroes that everyone watches and follows so closely. This data analysis is mostly visual.

<div style="width:100%;text-align: center;"> <img align=middle src="https://gifimage.net/wp-content/uploads/2017/07/avengers-gif-1-1.gif" alt="Heat beating" style="height:300px;margin-top:3rem;"> </div>

# Libraries

In [None]:
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns
import os
from wordcloud import WordCloud
import nltk
from PIL import Image
%matplotlib inline

# CONCEPT OF DATA

`avengers.csv` details the deaths of Marvel comic book characters between the time they joined the Avengers and April 30, 2015, the week before Secret Wars #1.

Header | Definition
---|---------
`URL`| The URL of the comic character on the Marvel Wikia
`Name/Alias` | The full name or alias of the character
`Appearances` | The number of comic books that character appeared in as of April 30, 2015
`Current?` | Is the member currently active on an Avengers-affiliated team?
`Gender` | The recorded gender of the character
`Probationary` | The date the character was given probationary status as an Avenger, if that happened
`Full/Reserve` | The month and year the character was introduced as a full or reserve member of the Avengers
`Year` | The year the character was introduced as a full or reserve member of the Avengers
`Years since joining` | 2015 minus the year
`Honorary` | The status of the Avenger: "Honorary" if they were given honorary status, "Academy" if they are simply in the Academy, or "Full" otherwise
`Death1` | Yes if the Avenger died, No if not. 
`Return1` | Yes if the Avenger returned from their first death, No if they did not, blank if not applicable
`Death2` | Yes if the Avenger died a second time after their revival, No if they did not, blank if not applicable
`Return2` | Yes if the Avenger returned from their second death, No if they did not, blank if not applicable
`Death3` | Yes if the Avenger died a third time after their second revival, No if they did not, blank if not applicable
`Return3` | Yes if the Avenger returned from their third death, No if they did not, blank if not applicable
`Death4` | Yes if the Avenger died a fourth time after their third revival, No if they did not, blank if not applicable
`Return4` | Yes if the Avenger returned from their fourth death, No if they did not, blank if not applicable
`Death5` | Yes if the Avenger died a fifth time after their fourth revival, No if they did not, blank if not applicable
`Return5` | Yes if the Avenger returned from their fifth death, No if they did not, blank if not applicable
`Notes` | Descriptions of deaths and resurrections. 


#### Now that we know what the data contains, we can start cleaning it

<div style="width:100%;text-align: center;"> <img align=middle src="https://comicvine.gamespot.com/a/uploads/original/11140/111403694/7623687-7476845086-be17d.gif" alt="Heat beating" style="height:300px;margin-top:3rem;"> </div>


### LOAD DATA

In [None]:
data = pd.read_csv("../input/avengers/avengers.csv",encoding = "latin-1")
data.drop("URL",axis=1,inplace=True)

In [None]:
data.head(3)

In [None]:
# Checking null values: print the frame summary, then return the count and ratio of missing values per column
def about_data(df):
    df.info()
    total_missing_values = df.isnull().sum().reset_index()
    total_missing_values = total_missing_values.rename(columns={'index':'columns',0:'total missing'})
    total_missing_values['ratio of missing'] = total_missing_values['total missing']/len(df)
    return total_missing_values

In [None]:
about_data(data)

#### As we can see, the `Probationary Introl` column is mostly null, so it is a good idea to drop it. As a rule of thumb, a column is worth keeping only if its ratio of missing values is below 0.25. Some columns are below that threshold but still are not needed. The Death and Return columns will be filled in below.
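The threshold rule described above can be sketched on a small made-up frame. This is only an illustration: the `toy` data, the `THRESHOLD` name, and the column values are assumptions, not part of `avengers.csv`.

```python
import pandas as pd

# Toy frame standing in for the Avengers data (all values are made up).
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Probationary": [None, None, None, "Jan-90"],
    "Death1": ["YES", "NO", "NO", "YES"],
    "Return1": ["YES", None, None, "NO"],
})

# isnull().mean() gives the fraction of missing values in each column.
missing_ratio = toy.isnull().mean()

# Hypothetical constant holding the 0.25 rule of thumb from the text.
THRESHOLD = 0.25

# Columns whose missing ratio exceeds the threshold are candidates to drop or fill.
too_sparse = missing_ratio[missing_ratio > THRESHOLD].index.tolist()
print(too_sparse)
```

A list like `too_sparse` is only a starting point: some of these columns are better filled than dropped, as with the Death and Return columns in this notebook.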

In [None]:
data.drop('Probationary Introl',axis=1,inplace=True)

# Null Values

In [None]:
# Fill missing Death/Return entries; assigning the result back avoids chained in-place fillna,
# which newer pandas versions warn about
death_return_cols = [f'{kind}{i}' for kind in ('Death', 'Return') for i in range(1, 6)]
data[death_return_cols] = data[death_return_cols].fillna('Never Happen')
data["Full/Reserve Avengers Intro"] = data["Full/Reserve Avengers Intro"].fillna('Unknown')

#### The Notes column is hard to read as it is, so we replace its underscores with spaces to make it easier to visualize later.

In [None]:
data.Notes = data.Notes.str.replace("_"," ")

In [None]:
data.head()

As the plot below shows, most Avengers of both genders have Full status.

In [None]:
sns.set(rc={'figure.figsize':(10,7)})
sns.countplot(data = data, x="Gender",hue="Honorary",palette='Set1')

#### Gender breakdown of heroes who are and are not currently active team members

In [None]:
sns.set(rc={'figure.figsize':(10,7)})
sns.countplot(data = data, x="Current?",hue="Gender",palette='Paired')
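The counts behind a grouped count plot like the one above can also be read off as a table. The sketch below uses `pd.crosstab` on a small made-up frame; the `toy` values are assumptions and only the column names mirror the dataset.

```python
import pandas as pd

# Made-up rows that reuse the notebook's column names.
toy = pd.DataFrame({
    "Current?": ["YES", "YES", "NO", "NO", "YES"],
    "Gender": ["MALE", "FEMALE", "MALE", "MALE", "FEMALE"],
})

# Raw counts: one row per activity status, one column per gender.
counts = pd.crosstab(toy["Current?"], toy["Gender"])

# Row-normalized shares: the gender split within each activity status.
shares = pd.crosstab(toy["Current?"], toy["Gender"], normalize="index")

print(counts)
print(shares.round(2))
```

The normalized table helps when the groups differ a lot in size, since raw bar heights alone can hide the proportions.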

In [None]:
def distribution(x,title):
    plt.figure(figsize=(10,8))
    # histplot replaces the deprecated distplot(kde=False)
    ax = sns.histplot(x, bins=30)
    # color each bar according to its height
    values = np.array([rec.get_height() for rec in ax.patches])
    norm = plt.Normalize(values.min(), values.max())
    colors = plt.cm.jet(norm(values))
    for rec, col in zip(ax.patches,colors):
        rec.set_color(col)
    plt.title(title, size=20, color='black')

In [None]:
distribution(data.Appearances,"Distribution of Appearances")

#### Years since joining, split by gender, for heroes who are and are not currently active.

In [None]:
plt.figure(figsize=(12,6),dpi=110)
sns.boxplot(y='Years since joining',x='Current?',hue="Gender",data=data)

In [None]:
plt.figure(figsize=(12,6),dpi=110)
sns.violinplot(y='Appearances',x='Current?',data=data,palette='Set2')

In [None]:
# jointplot creates its own figure, so its size is set with height instead of plt.figure
sns.jointplot(x='Appearances',y='Years since joining',data=data,hue="Gender",height=8)

> ### There are no honorary female Avengers who are currently active on an Avengers-affiliated team

In [None]:
is_honorary = data[data['Honorary']=='Honorary']
plt.figure(figsize=(10,4),dpi=100)
sns.boxplot(x='Years since joining',y='Current?',data=is_honorary,orient='h',hue="Gender")
plt.title("Honorary Avengers by Gender")

**Let's look at which characters are still alive and which have died**

In [None]:
still_alive = data[data['Notes'].isnull()]
death = data[data['Notes'].notnull()]
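The split above uses the presence of a note as a proxy for a death. As a cross-check, the same split could be sketched from the `Death1` column directly. This is a hedged alternative, not the method used in the rest of this notebook: the `toy` rows are made up, and the assumption that deaths are recorded as `"YES"` should be verified against the real data.

```python
import pandas as pd

# Made-up rows; only the column names come from the dataset.
toy = pd.DataFrame({
    "Name/Alias": ["Hero A", "Hero B", "Hero C"],
    "Death1": ["YES", "NO", "YES"],
    "Notes": ["Died in issue X.", "Seriously injured, but survived.", None],
})

# Assumption: a first death is recorded as "YES" (case-insensitive here).
died_mask = toy["Death1"].str.upper().eq("YES")
died = toy[died_mask]
never_died = toy[~died_mask]

# Rows where the Death1-based split and the Notes-based split disagree.
notes_mask = toy["Notes"].notnull()
disagree = toy[died_mask != notes_mask]
print(disagree["Name/Alias"].tolist())
```

If `disagree` comes back empty on the real data, the two approaches are interchangeable; otherwise those rows are worth reading by hand.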

# Death

In [None]:
death.head()

> #### Among the heroes who died, there are more male heroes than female heroes. Is it riskier to be a male hero in the Avengers?

In [None]:
sns.set(rc={'figure.figsize':(10,7)})
sns.countplot(data = death, x="Gender",palette='coolwarm')

> #### Among the heroes who died, the male heroes have appeared in more comic books than the female heroes.

In [None]:
plt.figure(figsize=(12,6),dpi=110)
sns.boxplot(y='Appearances',x='Gender',data=death,palette='Pastel2')

# Still Alive

In [None]:
still_alive

> #### The heroes who are still alive show the same pattern, but this also tells us that the Avengers have more male heroes than female heroes overall

In [None]:
sns.set(rc={'figure.figsize':(10,7)})
sns.countplot(data = still_alive, x="Gender",palette='Paired')

In [None]:
plt.figure(figsize=(12,6),dpi=110)
sns.boxplot(y='Appearances',x='Gender',data=still_alive,palette='coolwarm')

# WordCloud

> ##### What about visualizing the notes that describe how the heroes died? Let's use WordCloud for an awesome visualization. 

In [None]:
# Map 0 values to 255 (white) and leave every other value unchanged, so those pixels are treated as background by the mask
def transform_format(val):
    if val == 0:
        return 255
    else:
        return val

> How about creating a masked word cloud?

In [None]:
mask = np.array(Image.open('../input/1qaxffbfxxs/png-clipart-avengers-logo-logo-avengers-marvel-cinematic-universe-burning-letter-a-text-superhero.png'))
transformed = np.ndarray((mask.shape[0],mask.shape[1]), np.int32)

for i in range(len(mask)):
    transformed[i] = list(map(transform_format,mask[i]))
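The row-by-row loop above can also be written as a single vectorized call. The sketch below compares both on a tiny made-up array; `toy_mask` is an assumption standing in for a single-channel mask image, and the real PNG may have a different shape or mode.

```python
import numpy as np

# Tiny stand-in for a single-channel mask image (values are made up).
toy_mask = np.array([[0, 120, 255],
                     [0,   0,  30]], dtype=np.uint8)

def transform_format(val):
    # Same rule as in the notebook: 0 becomes 255, anything else is kept.
    return 255 if val == 0 else val

# Loop version, mirroring the notebook cell.
looped = np.ndarray(toy_mask.shape, np.int32)
for i in range(len(toy_mask)):
    looped[i] = list(map(transform_format, toy_mask[i]))

# Vectorized version: np.where applies the rule to every pixel at once.
vectorized = np.where(toy_mask == 0, 255, toy_mask).astype(np.int32)

print(np.array_equal(looped, vectorized))
```

On large images the `np.where` form is usually much faster, because it avoids a Python-level call per pixel.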

> ### **Word Cloud with death notes**

In [None]:
cloud_data = ' '.join(death.Notes)
# English stop words from NLTK (the 'stopwords' corpus must be available)
stopword = set(nltk.corpus.stopwords.words('english'))

# plot the WordCloud image, passing the stop words so they are actually filtered out
wc = WordCloud(background_color='white', max_words=2000, mask=transformed,
               colormap='Set1', stopwords=stopword)
wc.generate(cloud_data)
wc.to_file('word_cloud.png')
wc.to_image()

In [None]:
mask = np.array(Image.open('../input/ironnnn/png-clipart-iron-man-head-iron-man-stencil-star-lord-carving-pumpkin-skin-marvel-avengers-assemble-white.png'))
transformed = np.ndarray((mask.shape[0],mask.shape[1]), np.int32)

for i in range(len(mask)):
    transformed[i] = list(map(transform_format,mask[i]))

> ### **Word Cloud with the heroes' names**
> Remember the legend, Tony Stark 🕶✊

In [None]:
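# Keep every row that has a name; a bare dropna() would also drop rows with missing Notes
names = data.dropna(subset=['Name/Alias'])
cloud_data = ' '.join(names['Name/Alias'])
# English stop words from NLTK (the 'stopwords' corpus must be available)
stopword = set(nltk.corpus.stopwords.words('english'))

# plot the WordCloud image, passing the stop words so they are actually filtered out
wc = WordCloud(background_color='white', max_words=2000, mask=transformed,
               colormap='twilight_shifted_r', stopwords=stopword)
wc.generate(cloud_data)
wc.to_file('word_cloud_ironman.png')
wc.to_image()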

# END

<div style="width:100%;text-align: center;"> <img align=middle src="https://www.icegif.com/wp-content/uploads/icegif-843.gif" alt="Heat beating" style="height:300px;margin-top:3rem;"> </div>


## I hope you enjoyed it ✌🏻