## Get Key

In [None]:
publickey = '6fd8165e564c853efc4f2e0f6f2f92a0'
privatekey = '5de3aefb015991bdb3ca31a29df72110d19a5976'

## Make first request

Here we will only make requests to the public endpoints. The whole list is available at the [docs](https://developer.marvel.com/docs). We'll begin by directing our request to the ```characters``` endpoint in order to retrieve information about a single character of your choice.

In addition to the query parameters for each endpoint, the API expects developers to fill in the values for three parameters in <b>all</b> requests:

- **apikey**. This parameter takes the *public* key.
- **ts**. This parameter takes a timestamp in string form or any other long string which can change on a request-by-request basis.
- **hash**. This parameter takes an MD5 hash of ts+privatekey+publickey.

#### Generating a timestamp


In [None]:
import time
ts = str(time.time())

#### Generating an MD5 hash

In [None]:
import hashlib
code = ts+privatekey+publickey
md5hash = hashlib.md5(code.encode('utf-8')).hexdigest()

<div class="alert alert-danger">Note that the code needs to have the key and hash elements updated each time it is executed. Otherwise, the solution will be based on expired values.</div>
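Because the timestamp and hash go stale, one option is to wrap their generation in a small helper that is called right before every request. The sketch below is only one way to do this: `build_auth_params` is a hypothetical helper name (not part of the Marvel API or of `requests`), and the keys in the usage line are placeholders.

```python
import hashlib
import time


def build_auth_params(public_key, private_key):
    """Return fresh apikey/ts/hash parameters (hypothetical helper)."""
    ts = str(time.time())
    # The hash is the MD5 digest of ts + private key + public key, in that order.
    digest = hashlib.md5((ts + private_key + public_key).encode('utf-8')).hexdigest()
    return {'apikey': public_key, 'ts': ts, 'hash': digest}


# Placeholder keys, for illustration only.
auth = build_auth_params('my-public-key', 'my-private-key')
```

The returned dictionary can be merged with endpoint-specific parameters, for example `{'name': character_name, **build_auth_params(publickey, privatekey)}`.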

#### Building requests

The ```characters``` endpoint retrieves information about characters based on different parameters, including their name, the comics or series where they appear, etc.

<img src='https://www.dropbox.com/s/phrm53wa066cnvu/characters.png?raw=1' width=500>

Here we will begin by retrieving information about a single character by ```name```.

In [None]:
character_name = 'Black Widow'

In [None]:
import requests

base_url = 'https://gateway.marvel.com'
character_endpoint = '/v1/public/characters'

url = base_url + character_endpoint

params = {'name': character_name, 'apikey': publickey, 'ts': ts, 'hash': md5hash}
response = requests.get(url, params=params)

Identify the full url address for the character's <i>wiki</i> and format it so that it only includes the <i>regular</i> url.

In [None]:
# YOUR CODE HERE
r = response.json()

url_wiki = r['data']['results'][0]['urls'][1]['url'].split('?')[0]
url_wiki

'http://marvel.com/universe/Black_Widow_(Natasha_Romanova)'
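Indexing `urls[1]` assumes that the wiki entry is always the second element. Since each entry in `urls` carries a `type` field, looking the entry up by type may be more robust. A minimal sketch, where `find_wiki_url` is a hypothetical helper and `sample_character` is made-up data shaped like one item of `r['data']['results']`:

```python
def find_wiki_url(character):
    """Return the character's wiki URL without its query string, or None."""
    for entry in character.get('urls', []):
        if entry.get('type') == 'wiki':
            # Drop everything after '?' to keep only the regular URL.
            return entry['url'].split('?')[0]
    return None


# Made-up data for illustration only; real responses carry more fields.
sample_character = {
    'name': 'Example Hero',
    'urls': [
        {'type': 'detail', 'url': 'http://example.com/detail?utm=x'},
        {'type': 'wiki', 'url': 'http://example.com/wiki/Example_Hero?utm=x'},
    ],
}
print(find_wiki_url(sample_character))  # http://example.com/wiki/Example_Hero
```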

Take a look at the webpage

<img src='http://marvel.com/universe/Black_Widow_(Natasha_Romanova)' width=1000>

We are interested in retrieving part of the information contained in the ```IN COMICS FULL REPORT``` tab. 

In [None]:
import bs4

def get_soup(url):
    response = requests.get(url)
    soup = bs4.BeautifulSoup(response.text, 'html.parser')
    
    return soup

In [None]:
soup_wiki = get_soup(url_wiki)
soup_wiki

<!DOCTYPE html>
<html lang="en"><head><meta charset="utf-8" class="next-head next-head"/><title class="next-head">Black Widow (Natasha Romanoff) | Characters | Marvel</title><meta class="next-head" content="Black Widow (Natasha Romanoff) | Characters | Marvel" name="title"/><meta class="next-head" content="Marvel Entertainment" property="og:site_name"/><meta class="next-head" content="@Marvel" name="twitter:creator"/><meta class="next-head" content="@Marvel" name="twitter:site"/><meta class="next-head" content="website" property="og:type"/><meta class="next-head" content="summary_large_image" name="twitter:card"/><meta class="next-head" content="Natasha Romanoff, separated from the now-fractured Avengers, confronts the dark path she took to becoming a spy and assassin, as well as events that followed." name="description"/><meta class="next-head" content="Natasha Romanoff, separated from the now-fractured Avengers, confronts the dark path she took to becoming a spy and assassin, as well

Identify the tag that corresponds to the ```IN COMICS FULL REPORT``` and write the code to retrieve the corresponding link.

In [None]:
if 'page__contents character-page' == 'page__contents character-page':
    print(True)

True


In [None]:
def get_comics_full_report(wiki):
    
    soup = get_soup(wiki)
    character_base = 'https://www.marvel.com' 
    
    try:
        url_in_comic = character_base + soup.find_all('a', {'class': 'masthead__tabs__link'})[-1]['href']
        return url_in_comic
    except:
        if soup.find('div', {'class': 'page__contents character-page'}):
            return wiki
        elif soup.find('div', {'class': 'page__contents detail-character-default-page'}):
            return wiki
        else:
            return None
    
get_comics_full_report(url_wiki)


'https://www.marvel.com/characters/black-widow-natasha-romanova/in-comics'

In [None]:
get_comics_full_report('http://marvel.com/universe/Moonstone_%28Karla_Sofen%29')

'http://marvel.com/universe/Moonstone_%28Karla_Sofen%29'

Extract information about different attributes, including the height, the weight, the gender, etc. 

Note: return the <b>height</b> (in cm), and the <b>weight</b> (in kg) for the considered character in <i>float</i> form as output, respectively. Assume that 1 lbs = 0.453592 kg, 1 foot = 30.48 cm and 1 inch = 2.54 cm. In cases where no such information is given or not in either feet/inches or lbs, return <i>None</i>.
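The conversion itself can be separated from the scraping. The sketch below applies the factors stated above to plain strings; `height_text_to_cm` and `weight_text_to_kg` are hypothetical helper names. It assumes the text is already in feet/inches or lbs and contains plain integers (no thousands separators), and it does not try to detect other units.

```python
import re

FOOT_CM = 30.48
INCH_CM = 2.54
LB_KG = 0.453592


def height_text_to_cm(text):
    """Convert text such as 6'8" to centimetres; None if no number is found."""
    numbers = re.findall(r'\d+', text)
    if not numbers:
        return None
    feet = int(numbers[0])
    # A second number, when present, is taken to be the inches part.
    inches = int(numbers[1]) if len(numbers) > 1 else 0
    return feet * FOOT_CM + inches * INCH_CM


def weight_text_to_kg(text):
    """Convert text such as '220 lbs' to kilograms; None if no number is found."""
    numbers = re.findall(r'\d+', text)
    if not numbers:
        return None
    return int(numbers[0]) * LB_KG
```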

In [None]:
import re

def get_height(soup_in_comic):
    try:
        if 'height' in str(soup_in_comic.find_all('div', {'class': "bioheader__charInfo"})).lower():
            height = re.findall(r'\d+', soup_in_comic.find_all('p', {'class': 'bioheader__stat'})[0].text)
            if len(height) ==1:
                height_cm = float(int(height[0])*30.48)
                return height_cm
            else:
                height_cm = float(int(height[0])*30.48 + int(height[1])*2.54)
                return height_cm
        else:
            return None
    except:
        return None

In [None]:
get_height(get_soup(get_comics_full_report('http://marvel.com/universe/Abomination')))

203.2

In [None]:
def get_weight(soup_in_comic):
    
    if 'weight' in str(soup_in_comic.find_all('div', {'class': "bioheader__charInfo"})).lower():
        try:
            weight = re.findall(r'\d+', soup_in_comic.find_all('p', {'class': 'bioheader__stat'})[1].text)
            weight_kg = float(int(weight[0]))*0.453592
            return weight_kg
        except:
            return None
    else:
        return None

In [None]:
get_weight(get_soup(get_comics_full_report('https://www.marvel.com/characters/firelord-pyreus-kril')))

99.79024

Retrieve information about the <b>gender</b>, the <b>eye color</b> and the <b>hair color</b>. For the eye and hair colors, return the whole string of information without extra whitespace, e.g. "White (formerly black)".
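One common way to remove extra whitespace is to split the string on any run of whitespace and join the pieces with single spaces. A small sketch, with `normalize_spaces` as a hypothetical helper name:

```python
def normalize_spaces(text):
    """Collapse runs of whitespace and trim both ends."""
    # str.split() with no arguments splits on any whitespace run
    # and discards leading/trailing whitespace.
    return ' '.join(text.split())


print(normalize_spaces('  White   (formerly  black) '))  # White (formerly black)
```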

In [None]:
def get_gender(soup_in_comic):
    if 'gender' in str(soup_in_comic.find_all('div', {'class': "bioheader__charInfo"})).lower():
        gender = soup_in_comic.find_all('p', {'class': 'bioheader__stat'})[2].text.split(',')[0]
        return str(gender)
    else:
        return None

In [None]:
get_gender(get_soup(get_comics_full_report(url_wiki)))

'Female'

In [None]:
def get_eyes(soup_in_comic):
    if 'eyes' in str(soup_in_comic.find_all('div', {'class': "bioheader__charInfo"})).lower():
        eye = soup_in_comic.find_all('p', {'class': 'bioheader__stat'})[3].text.split(',')[0]
        # Collapse repeated whitespace and trim the ends, keeping any
        # parenthesised detail such as "White (formerly black)" intact.
        return ' '.join(eye.split())
    else:
        return None

In [None]:
get_eyes(get_soup(get_comics_full_report(url_wiki)))

'Blue'

In [None]:
def get_hair(soup_in_comic):
    if 'hair' in str(soup_in_comic.find_all('div', {'class': "bioheader__charInfo"})).lower():
        hair = soup_in_comic.find_all('p', {'class': 'bioheader__stat'})[-1].text.split(',')[0]
        # Collapse repeated whitespace and trim the ends, keeping any
        # parenthesised detail such as "White (formerly black)" intact.
        return ' '.join(hair.split())
    else:
        return None

In [None]:
get_hair(get_soup(get_comics_full_report(url_wiki)))

'Red-auburn'

Extract information regarding the place of origin for a given character.

In [None]:
def get_place_of_origin(soup_in_comic):
    if 'place of origin' in str(soup_in_comic.find_all('ul',class_="railBioInfo")).lower():
        place = soup_in_comic.find_all('ul', {'class': 'railBioLinks'})[3].text
        return str(place)
    else:
        return None

In [None]:
get_place_of_origin(get_soup(get_comics_full_report(url_wiki)))

'Stalingrad, Former U.S.S.R'

Extract a list of relevant powers. In particular, we are interested in knowing whether the character has the powers of *flight*, *hypnosis*, *telepathy* and *teleportation*.
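Splitting text on spaces alone leaves punctuation attached to words (for instance `flight,` is not equal to `flight`), so extracting alphabetic tokens may be a safer basis for the check. A sketch under that assumption; `detect_powers` is a hypothetical helper and the sample string is invented:

```python
import re

FOUR_POWERS = ('flight', 'hypnosis', 'telepathy', 'teleportation')


def detect_powers(text, wanted=FOUR_POWERS):
    """Return one boolean per wanted power, in the order given."""
    # Keep only runs of letters so commas and other punctuation are ignored.
    words = set(re.findall(r'[a-z]+', text.lower()))
    return [power in words for power in wanted]


print(detect_powers('Flight, Super Strength, Telepathy'))  # [True, False, True, False]
```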

In [None]:
four_powers = ['flight', 'hypnosis', 'telepathy', 'teleportation']

def get_powers(soup_in_comic):
    power_result = []
    if 'powers' in str(soup_in_comic.find_all('ul',class_="railBioInfo")).lower():
        powers = soup_in_comic.find_all('ul', {'class': 'railBioLinks'})[-2].text.split(' ')
        powers = [x.lower() for x in powers]
        for power in four_powers:
            if power not in powers:
                power_result.append(False)
            else:
                power_result.append(True)
        return power_result
    else:
        power_result = [None, None, None, None] 
        return power_result

In [None]:
get_powers(get_soup(get_comics_full_report(url_wiki)))

[False, False, False, False]

## Extracting the data

Choose a series from the list of all the Marvel series in this [link](https://www.marvel.com/comics/series).

In [None]:
series_name = 'Avengers'

Extract information about the different characters that appear in your chosen series.

<div class="alert alert-warning">Note that Marvel's API returns information in batches of 100 characters at most. Make a single request, so that if the number of characters in the chosen series is larger than 100, only retrieve the first 100. If the number of characters in the chosen series is smaller than 100, retrieve them all in a single request.</div>
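The exercise asks for a single request, so the following is background only. The response's `data` block reports `offset`, `limit`, `total` and `count`; assuming the endpoint also accepts an `offset` query parameter alongside `limit` (an assumption to verify against the docs), the offsets needed to cover a full series could be computed as below. `page_offsets` is a hypothetical helper name.

```python
def page_offsets(total, limit=100):
    """Return the starting offset of each batch needed to cover `total` items."""
    return list(range(0, total, limit))


print(page_offsets(121))  # [0, 100]
```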

In [None]:
series_endpoint = '/v1/public/series/'

series_url = base_url + series_endpoint[:-1]

params = {'title': series_name, 'apikey': publickey, 'ts': ts, 'hash': md5hash}
response_series = requests.get(series_url, params=params).json()

In [None]:
seriesId = response_series['data']['results'][0]['id']
seriesId

354

In [None]:
ser_cha_url = base_url + series_endpoint + str(seriesId) + '/characters'

params = {'seriesId': seriesId, 'apikey': publickey, 'limit': 100, 'ts': ts, 'hash': md5hash}
response_characters = requests.get(ser_cha_url, params=params).json()
response_characters

{'code': 200,
 'status': 'Ok',
 'copyright': '© 2021 MARVEL',
 'attributionText': 'Data provided by Marvel. © 2021 MARVEL',
 'attributionHTML': '<a href="http://marvel.com">Data provided by Marvel. © 2021 MARVEL</a>',
 'etag': '843b0d9bf1b684a1dfd2d746f0355d32921aa49e',
 'data': {'offset': 0,
  'limit': 100,
  'total': 121,
  'count': 100,
  'results': [{'id': 1009144,
    'name': 'A.I.M.',
    'description': 'AIM is a terrorist organization bent on destroying the world.',
    'modified': '2013-10-17T14:41:30-0400',
    'thumbnail': {'path': 'http://i.annihil.us/u/prod/marvel/i/mg/6/20/52602f21f29ec',
     'extension': 'jpg'},
    'resourceURI': 'http://gateway.marvel.com/v1/public/characters/1009144',
    'comics': {'available': 52,
     'collectionURI': 'http://gateway.marvel.com/v1/public/characters/1009144/comics',
     'items': [{'resourceURI': 'http://gateway.marvel.com/v1/public/comics/36763',
       'name': 'Ant-Man & the Wasp (2010) #3'},
      {'resourceURI': 'http://gateway.

In [None]:
len(response_characters['data']['results'])

100

Retrieve information about each of the characters in the series above separately. For that purpose, first identify their names and wiki urls.

In [None]:
names = []
url_wikis = []

for i in response_characters['data']['results']:
    for url in i['urls']:
        if url['type'] == 'wiki':
            url_wikis.append(url['url'].split('?')[0])
            names.append(i['name'])

print(names)
print(url_wikis)

['A.I.M.', 'Abomination (Emil Blonsky)', 'Ant-Man (Scott Lang)', 'Archangel', 'Ares', 'Atlas (Team)', 'Attuma', 'Avengers', 'Beast', 'Black Knight (Sir Percy of Scandia)', 'Black Panther', 'Black Widow', 'Bulldozer', 'Captain America', 'Captain Britain', 'Captain Marvel (Carol Danvers)', 'Colossus', 'Count Nefaria', 'Crystal', 'Daredevil', 'Darkhawk', 'Darkstar', 'Diablo', 'Doc Samson', 'Doctor Doom', 'Edwin Jarvis', 'Ego', 'Falcon', 'Firebird', 'Firestar', 'Gambit', 'Ghost Rider (Johnny Blaze)', 'Grim Reaper', 'Hank Pym', 'Hawkeye', 'Hellcat (Patsy Walker)', 'Hercules', 'Hulk', 'Human Torch', 'Hyperion (Earth-712)', 'Iceman', 'In-Betweener', 'Invaders', 'Invisible Woman', 'Iron Man', 'Jane Foster', 'Juggernaut', 'Justice', 'Kang', 'Kitty Pryde', 'Kulan Gath', 'Lilandra', 'Living Lightning', 'Machine Man', 'Marrow', 'Micromax', 'Moon Knight', 'Moondragon', 'Moonstone', 'Morgan Le Fay', 'Mr. Fantastic', 'Namor', 'Namorita', 'Nick Fury', 'Night Thrasher', 'Nightcrawler', 'Nova', 'Penance

In [None]:
print(len(names))
print(len(url_wikis))

92
92


Retrieve the name, height, weight, gender, eyecolor, haircolor, place of origin and the powers of each of the characters in the chosen series. Store this information in the <b>marvel</b> DataFrame.

<div class="alert alert-warning">Some characters do not have a wiki website. When this happens, fill in all the values in the corresponding row with <i>None</i>.</div>
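One way to keep rows aligned is to build one dictionary per character and start from an all-<i>None</i> row whenever no page is available. The sketch below is illustrative only: `empty_row` and `COLUMNS` are hypothetical names, with the column labels taken from the attributes listed above.

```python
COLUMNS = ['height', 'weight', 'gender', 'eyes', 'hair', 'place_of_origin',
           'flight', 'hypnosis', 'telepathy', 'teleportation']


def empty_row(name):
    """Return a row holding only the character's name; every attribute is None."""
    row = {'names': name}
    row.update({column: None for column in COLUMNS})
    return row


records = [empty_row('A.I.M.')]
```

A list of such dictionaries can be passed directly to `pd.DataFrame(records)` to obtain one row per character.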

In [None]:
url_in_comic = []
for url in url_wikis:
    
    try:
        print("URL",url)
        temp = get_comics_full_report(url)
        url_in_comic.append(temp)
        print("TEMP",temp)
        
    except:
        print(url)
        url_in_comic.append(None)
    
url_in_comic

URL http://marvel.com/universe/A.I.M.
TEMP None
URL http://marvel.com/universe/Abomination
TEMP http://marvel.com/universe/Abomination
URL http://marvel.com/universe/Ant-Man_(Scott_Lang)
TEMP None
URL http://marvel.com/universe/Angel_(Warren_Worthington_III)
TEMP http://marvel.com/universe/Angel_(Warren_Worthington_III)
URL http://marvel.com/universe/Ares
TEMP http://marvel.com/universe/Ares
URL http://marvel.com/universe/Atlas_(Team)
TEMP None
URL http://marvel.com/universe/Attuma
TEMP http://marvel.com/universe/Attuma
URL http://marvel.com/universe/Avengers
TEMP https://www.marvel.com/teams-and-groups/avengers/in-comics
URL http://marvel.com/universe/Beast_(Henry_McCoy)
TEMP http://marvel.com/universe/Beast_(Henry_McCoy)
URL http://marvel.com/universe/Black_Knight_(Sir_Percy_of_Scandia)
TEMP http://marvel.com/universe/Black_Knight_(Sir_Percy_of_Scandia)
URL http://marvel.com/universe/Black_Panther_(T%27Challa)
TEMP None
URL http://marvel.com/universe/Black_Widow_(Natasha_Romanova)
TE

TEMP None
URL http://marvel.com/universe/Starfox
TEMP http://marvel.com/universe/Starfox
URL http://marvel.com/universe/Stingray_%28Walter_Newell%29
TEMP http://marvel.com/universe/Stingray_%28Walter_Newell%29


[None,
 'http://marvel.com/universe/Abomination',
 None,
 'http://marvel.com/universe/Angel_(Warren_Worthington_III)',
 'http://marvel.com/universe/Ares',
 None,
 'http://marvel.com/universe/Attuma',
 'https://www.marvel.com/teams-and-groups/avengers/in-comics',
 'http://marvel.com/universe/Beast_(Henry_McCoy)',
 'http://marvel.com/universe/Black_Knight_(Sir_Percy_of_Scandia)',
 None,
 'https://www.marvel.com/characters/black-widow-natasha-romanova/in-comics',
 'http://marvel.com/universe/Bulldozer_(Henry_Camp)',
 'https://www.marvel.com/characters/captain-america-steve-rogers/in-comics',
 'http://marvel.com/universe/Captain_Britain_(Brian_Braddock)',
 'https://www.marvel.com/characters/captain-marvel-carol-danvers/in-comics',
 'http://marvel.com/universe/Colossus_(Piotr_Rasputin)',
 'http://marvel.com/universe/Count_Nefaria',
 'https://www.marvel.com/characters/crystal/in-comics',
 'https://www.marvel.com/characters/daredevil-matthew-murdock/in-comics',
 'http://marvel.com/universe/Da

In [None]:
soup_in_comic = []

for i in url_in_comic:
    try:
        if i != None:
            soup_in_comic.append(get_soup(i))
        else:
            soup_in_comic.append(None)
    except:
        soup_in_comic.append(None)

In [None]:
name = []
height = []
weight = []
gender = []
eyes = []
hair = []
place_of_origin = []
flight = []
hypnosis = []
telepathy = []
teleportation = []

In [None]:
for i in soup_in_comic:
    if i is not None:
        height.append(get_height(i))
        weight.append(get_weight(i))
        gender.append(get_gender(i))
        eyes.append(get_eyes(i))
        hair.append(get_hair(i))
        place_of_origin.append(get_place_of_origin(i))
        # Parse the powers once and reuse the result for all four columns
        powers = get_powers(i)
        flight.append(powers[0])
        hypnosis.append(powers[1])
        telepathy.append(powers[2])
        teleportation.append(powers[3])
    else:
        height.append(None)
        weight.append(None)
        gender.append(None)
        eyes.append(None)
        hair.append(None)
        place_of_origin.append(None)
        flight.append(None)
        hypnosis.append(None)
        telepathy.append(None)
        teleportation.append(None)


In [None]:
import pandas as pd

marvel = pd.DataFrame()
marvel['names'] = names
marvel['height'] = height
marvel['weight'] = weight
marvel['gender'] = gender
marvel['eyes'] = eyes
marvel['hair'] = hair
marvel['place_of_origin'] = place_of_origin
marvel['flight'] = flight
marvel['hypnosis'] = hypnosis
marvel['telepathy'] = telepathy
marvel['teleportation'] = teleportation
marvel

Unnamed: 0,names,height,weight,gender,eyes,hair,place_of_origin,flight,hypnosis,telepathy,teleportation
0,A.I.M.,,,,,,,,,,
1,Abomination (Emil Blonsky),203.20,444.52016,,(Abomination) None; (Blonsky) Blond,(Abomination) None; (Blonsky) Blond,"Zagreb, Yugoslavia",,,,
2,Ant-Man (Scott Lang),,,,,,,,,,
3,Archangel,182.88,68.03880,,Blond,Blond,"Centerport, Long Island, New York",,,,
4,Ares,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...
87,Songbird,,,,,,,,,,
88,Spider-Man (Peter Parker),,,,,,,,,,
89,Squadron Supreme (Earth-712),,,,,,,,,,
90,Starfox,185.42,86.18248,,Red,Red,"Titan, moon of Saturn",,,,
