# Team Assembler

![header](../images/The_Marvel_Universe.png)

In this first notebook we extract the members of the different Marvel hero and villain teams to build the graph that is used throughout the project.

In [1]:
import json
import re
import urllib.parse
import urllib.request

import numpy as np
import pandas as pd
from tqdm.notebook import tqdm

# Register `progress_apply` on pandas objects.
tqdm.pandas()

To find out who belongs to which team, we first need the list of all teams. We take it from `Category:Earth-616/Teams`, and the function below also queries `Category:Earth-616/Organizations`.
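As a rough illustration of the request behind this step, the sketch below assembles a `list=categorymembers` query URL with `urllib.parse.urlencode`. The helper name `build_category_url` and its parameters are hypothetical and not part of the notebook; only the endpoint and the category title come from the text above.

```python
from urllib.parse import urlencode

API_URL = "https://marvel.fandom.com/api.php"


def build_category_url(category, cmcontinue=""):
    """Return the URL for one page of category members (hypothetical helper)."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "max",
        "format": "json",
    }
    # Only send a continuation token when resuming from a previous page.
    if cmcontinue:
        params["cmcontinue"] = cmcontinue
    # urlencode percent-encodes characters such as ":" and "/" in the values.
    return API_URL + "?" + urlencode(params)


url = build_category_url("Category:Earth-616/Teams")
print(url)
```

Letting `urlencode` handle the escaping avoids quoting each value by hand, which matters once titles or continuation tokens contain characters that are not URL-safe.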

In [2]:
def get_category_members(category):
  """Return every page title in `category`, following the API's continuation tokens."""
  baseurl = "https://marvel.fandom.com/api.php?"
  action = "action=query&list=categorymembers"
  q_title = "cmtitle={}".format(category)
  content = "prop=revisions&rvprop=content&rvslots=*"
  dataformat = "format=json"

  titles = []
  cmcontinue_text = ""
  first_time = True

  while cmcontinue_text or first_time:
    first_time = False

    # Quote the token, since it may contain characters that are not URL-safe.
    cmcontinue = "cmlimit=max&cmcontinue={}".format(urllib.parse.quote_plus(cmcontinue_text))

    query = "{}{}&{}&{}&{}&{}".format(baseurl, action, q_title, content, dataformat, cmcontinue)
    with urllib.request.urlopen(query) as wikiresponse:
      wiki_json = json.loads(wikiresponse.read().decode("utf-8"))

    # Skip sub-categories; only actual pages are wanted.
    titles += [page["title"] for page in wiki_json["query"]["categorymembers"]
               if not page["title"].startswith("Category:")]

    # A "continue" block is present only while more results remain.
    cmcontinue_text = wiki_json.get("continue", {}).get("cmcontinue", "")

  return titles


def get_teams():
  """Collect the titles listed under both the Teams and the Organizations categories."""
  team_list = []
  for category in ("Category:Earth-616/Teams", "Category:Earth-616/Organizations"):
    team_list += get_category_members(category)
  return team_list

In [3]:
teams = list(set(get_teams()))

In [4]:
len(teams)

3051

This gives us 3051 distinct teams and organizations. Now we need the members of each one. Luckily, every team has a dedicated category page that lists them: `Category:{team_name}/Members`.
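The member lookup can be sketched independently of the network as a loop that keeps requesting pages until the response no longer carries a continuation token. In this minimal sketch, `collect_members`, `fetch_page`, and the canned responses are all hypothetical stand-ins used for illustration; only the `Category:{team_name}/Members` pattern and the shape of the JSON keys mirror the notebook's code.

```python
def collect_members(team, fetch_page):
    """Gather every title in `Category:{team}/Members`.

    `fetch_page(category, token)` is a hypothetical callable that returns
    the decoded JSON of a single API response.
    """
    category = "Category:{}/Members".format(team)
    members = []
    token = ""
    while True:
        page = fetch_page(category, token)
        members += [m["title"] for m in page["query"]["categorymembers"]]
        # A "continue" block appears only when more results remain.
        token = page.get("continue", {}).get("cmcontinue", "")
        if not token:
            break
    return members


# Canned two-page response standing in for the real API (illustrative data only).
fake_pages = {
    "": {"query": {"categorymembers": [{"title": "Member A"}]},
         "continue": {"cmcontinue": "page|2"}},
    "page|2": {"query": {"categorymembers": [{"title": "Member B"}]}},
}


def fake_fetch(category, token):
    return fake_pages[token]


result = collect_members("Example Team (Earth-616)", fake_fetch)
print(result)
```

Passing the fetch function in as an argument keeps the pagination logic testable without calling the wiki.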

In [5]:
def getMembers(team):
  """Return the titles of all pages in `Category:{team}/Members`."""
  cmcontinue_text = ""
  first_time = True

  member_list = []

  while cmcontinue_text or first_time:

    first_time = False

    baseurl = "https://marvel.fandom.com/api.php?"
    action = "action=query&list=categorymembers"
    # Team names contain spaces, parentheses and other characters that must be URL-quoted.
    q_title = "cmtitle=Category:{}/Members".format(urllib.parse.quote_plus(team.replace(" ", "_")))

    content = "prop=revisions&rvprop=content&rvslots=*"
    dataformat = "format=json"
    # Quote the token, since it may contain characters that are not URL-safe.
    cmcontinue = "cmlimit=max&cmcontinue={}".format(urllib.parse.quote_plus(cmcontinue_text))

    query = "{}{}&{}&{}&{}&{}".format(baseurl, action, q_title, content, dataformat, cmcontinue)
    with urllib.request.urlopen(query) as wikiresponse:
      wiki_json = json.loads(wikiresponse.read().decode("utf-8"))

    member_list += [member["title"] for member in wiki_json["query"]["categorymembers"]]

    # A "continue" block is present only while more results remain.
    cmcontinue_text = wiki_json.get("continue", {}).get("cmcontinue", "")

  return member_list

In [6]:
dataset = []
for team in tqdm(teams):
  dataset.append([team, getMembers(team)])


In [7]:
df = pd.DataFrame(dataset, columns=["team_name", "members"])
df

| | team_name | members |
|---|---|---|
| 0 | Oracle Inc. (Earth-616) | [Anita Savvy (Earth-616), Caleb Alexander (Ear... |
| 1 | Metal Mobsters (Earth-616) | [] |
| 2 | Knights of the Atomic Round Table (Earth-616) | [Anthony Stark (Earth-616), Bruce Banner (Eart... |
| 3 | Yoruba | [] |
| 4 | Emissaries of Evil (Electro) (Earth-616) | [Manuel Eloganto (Earth-616), Maxwell Dillon (... |
| ... | ... | ... |
| 3046 | Junkyard Dogs (Earth-616) | [Rashid Hammer Jones (Earth-616)] |
| 3047 | S.H.I.E.L.D. Psi Division (Earth-616) | [Agent Locke (Earth-616), Daniel Fricks (Earth... |
| 3048 | Crusade (Earth-616) | [Ezra Keith (Earth-616)] |
| 3049 | Fraternity of Raptors (Warp World) (Earth-616) | [] |


In [8]:
df.to_csv("../data/marvel_teams.csv", index=False)
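One thing to keep in mind when this file is loaded again: a CSV cell cannot hold a Python list, so the `members` column is written as the string representation of each list. The sketch below reproduces that round trip with the standard library only, using made-up rows, and parses the string back with `ast.literal_eval`.

```python
import ast
import csv
import io

# Write one illustrative row whose second field is a list.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["team_name", "members"])
writer.writerow(["Example Team", ["Member A", "Member B"]])

# Read it back: the list comes out as a plain string.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
raw = rows[0]["members"]

# literal_eval safely turns the string back into a Python list.
members = ast.literal_eval(raw)
print(type(raw).__name__, members)
```

With pandas, the same conversion can be applied at load time by passing `converters={"members": ast.literal_eval}` to `pd.read_csv`.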