# Mapping controversies script 2: Make two different networks based on all links found on a Wikipedia page

In the script "MCTutorial2_Wikipedia_InText_reference_Network_final" we looked at all the links found in the main text of a Wikipedia article. By doing so, we excluded links that have been assigned to the article by a template. As Wikipedia puts it: _"Templates are pages that are embedded (transcluded) into other pages to allow for the repetition of information"_ ([Wikipedia templates](https://en.wikipedia.org/wiki/Wikipedia:Templates)). The template is usually found at the bottom of a Wikipedia page:

<img src="https://res.cloudinary.com/dra3btd6p/image/upload/v1549631130/Mapping%20controversies%202019/Template.jpg" title="Category:circumcision" style="width: 700px;" /> 

In this tutorial, we will include all "internal" links to other Wikipedia pages found on a page (i.e. the links found in the templates and in the main text). 

This script takes as input a file with category members from Wikipedia (e.g. "cat_members_circumcision_depth_2.json") and builds two networks: one in which only the category members are connected by the links between them, and one that contains the category members plus all the pages they point to.
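
To make the difference between the two networks concrete, here is a minimal sketch on made-up data. The page titles and links are invented for illustration, and the sketch assumes, as the script further down does, that each category member is a record with a `title` and a `level`.

```python
# Toy data (hypothetical): two category members and the pages they link to
cat_members = [
    {"title": "Page A", "level": 0},
    {"title": "Page B", "level": 1},
]
links = {
    "Page A": ["Page B", "Page C"],  # Page C is not a category member
    "Page B": ["Page A", "Page D"],  # Page D is not a category member
}

members = {member["title"] for member in cat_members}

# Second network: every link from a member page, whatever it points to
all_edges = [(source, target) for source in links for target in links[source]]

# First network: only the links whose target is also a category member
membersonly_edges = [edge for edge in all_edges if edge[1] in members]

print(membersonly_edges)
print(all_edges)
```

In this toy example the members-only network keeps the two links between Page A and Page B, while the full network also contains the links to Page C and Page D.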


## Step 1: Installing the right libraries
Libraries can be understood as preprogrammed pieces of code. This means that instead of writing many lines of code in order to, for example, connect to Wikipedia, you can do it with a single command.


__Note: in this workbook we will be using the wikipediaapi and networkx libraries. If you have already installed them once, there is no need to do it again. You may simply skip to step 2.__
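
If you only want to see which of the two libraries are already installed, without installing anything, a minimal sketch using Python's standard `importlib` module could look like this. The helper name `is_installed` is made up for this example and is not part of the tutorial scripts.

```python
import importlib.util

def is_installed(module_name):
    """Return True if Python can find a module with this name."""
    return importlib.util.find_spec(module_name) is not None

# Check the two libraries used in this workbook
for name in ["wikipediaapi", "networkx"]:
    if is_installed(name):
        print(name + " is installed")
    else:
        print(name + " is not installed yet")
```

If both libraries are reported as installed, you can skip the installation cell.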

In [1]:
# In this cell Jupyter checks whether you have the right libraries installed 

import sys

try: # First, Jupyter tries to import the library
    import wikipediaapi
    print("wikipediaapi library has been imported")
except ImportError: # If the import fails, it tries to install the library
    print("wikipediaapi library not found. Installing...")
    # sys.executable makes sure pip installs into the Python environment this notebook is running in
    !{sys.executable} -m pip install wikipedia-api
    try: # ... and then tries to import it again
        import wikipediaapi
    except ImportError: # If the import still fails, an error message is printed
        print("Something went wrong in the installation of the wikipediaapi library. Please check your internet connection and consult the output from the installation above")
try:
    import networkx
    print("NetworkX library has been imported")
except ImportError:
    print("NetworkX library not found. Installing...")
    !{sys.executable} -m pip install networkx

    try:
        import networkx
    except ImportError:
        print("Something went wrong in the installation of the NetworkX library. Please check your internet connection and consult the output from the installation above")

        

wikipediaapi library has been imported
NetworkX library has been imported


## Step 2: Make the networks of all links

The next step is to make the networks. Here, you need to input the path to the JSON files you got from the MCTutorial1_Wikipedia_HarvestCatMembers_final script.

If the JSON files are in the same directory as the scripts, you only need to input the relative path (i.e. the name of the JSON file, e.g. cat_members_circumcision_depth_2).

<img src="https://res.cloudinary.com/dra3btd6p/image/upload/v1549444568/Mapping%20controversies%202019/Script_json_same_folder_in_text.jpg" title="Folder" style="width: 800px;" /> 
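
The script accepts the file name with or without the `.json` ending, and several names separated by commas. A minimal sketch of that handling is shown here; the helper `to_json_paths` is hypothetical and not part of the script. Stripping spaces and skipping empty entries are conveniences of this sketch.

```python
def to_json_paths(user_input):
    """Split a comma-separated input and make sure each name ends with .json."""
    paths = []
    for name in user_input.split(","):
        name = name.strip()  # ignore spaces around the commas
        if not name:
            continue  # skip empty entries such as a trailing comma
        if not name.endswith(".json"):
            name = name + ".json"
        paths.append(name)
    return paths

print(to_json_paths("cat_members_circumcision_depth_2"))
print(to_json_paths("first_file, second_file.json"))
```

The names `first_file` and `second_file` are placeholders; replace them with the names of your own files.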

In order to run the script, click on the cell below and press "Run" in the menu.

In [None]:
import wikipediaapi
import networkx as nx
import json

cat_members_all=[]
print("Enter the name of the category members JSON file you wish to build the networks from (e.g. cat_members_circumcision_depth_2). If you have multiple files, separate them with a comma")
filename= input()
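if "," in filename:

    for each in filename.split(","):
        each=each.strip() # remove spaces around the file names

        if not each.endswith(".json"):
            path=each+".json"
        else: 
            path=each
        with open(path) as jsonfile: # the file is closed automatically when the with-block ends
            cat_members = json.load(jsonfile)
        for every in cat_members:
            cat_members_all.append(every)
else:
    print(" ")

    filename=filename.strip() # remove spaces around the file name
    if not filename.endswith(".json"):
        path=filename+".json"
    else: 
        path=filename
        filename=filename[:-len(".json")] # drop the ".json" ending so it is not repeated in the output file names
    with open(path) as jsonfile: # the file is closed automatically when the with-block ends
        cat_members_all = json.load(jsonfile)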

    
    
print('Enter the desired language version of Wikipedia (e.g. "en","da","fr",etc.) or leave blank to use the default (English):')

input_lan = input()
if not input_lan:
    lan="en"
else:
    lan=input_lan
print(" ")
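# Note: newer releases of the wikipedia-api library (0.6.0 and later, as far as we know) expect a user agent text
# as the first argument, e.g. wikipediaapi.Wikipedia("MyProjectName (my@email.com)", lan).
# Adjust the line below in that way if it raises an error.
wiki_wiki = wikipediaapi.Wikipedia(lan)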


seen = []
network = {}
print("Harvesting all links from "+str(len(cat_members_all))+" wikipedia pages. This might take a while...")
print("")

count=1
for each in cat_members_all:
    title=each["title"]
    if count % 50 == 0:
        print("All links harvested from "+str(count)+" pages out of "+str(len(cat_members_all))+". Continuing harvest...")
    if title not in seen: # skip pages that have already been harvested
        seen.append(title)
        try:
            page=wiki_wiki.page(title)
            # page.links maps the title of every linked Wikipedia page to a page object
            text_links = sorted(page.links.keys())
            network[title]=text_links

        except Exception as error: # e.g. a lost connection; report the page and carry on
            print('SKIPPED: '+title+' ('+str(error)+')')
            print("")
    count=count+1
    
print("All pages harvested...")
new_cat_members={}
for each in cat_members_all:
    new_cat_members[each["title"]]={"level":each["level"]}
    
membersonly_edges = []
all_edges = []
members = network.keys()
print("Calculating networks...")
print("")
for source in network:
    for target in network[source]:
        edge = (source,target)
        all_edges.append(edge)
        if target in members:
            membersonly_edges.append(edge)
print("Saving networks...")
print("")
G = nx.DiGraph()
G.add_edges_from(membersonly_edges)
nx.write_gexf(G,'MCTutorial2_2_'+ filename+'_AllLinksNet_membersonly.gexf')

G = nx.DiGraph()
G.add_edges_from(all_edges)
for each in G.nodes:
    if each in members:
        G.nodes[each]['member_level'] = 'Level '+str(new_cat_members[each]["level"])
    else:
        G.nodes[each]['member_level'] = 'Not a member'
nx.write_gexf(G, 'MCTutorial2_2_'+filename+'_AllLinksNet_allpages.gexf')
print("The script is done. You can find your network files by following these paths: ")
print("")
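import os # used to find the current folder in a way that also works on Windows
locale=os.getcwd()
print(os.path.join(locale,'MCTutorial2_2_'+filename+'_AllLinksNet_membersonly.gexf'))
print("")
print(os.path.join(locale,'MCTutorial2_2_'+filename+'_AllLinksNet_allpages.gexf'))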
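
A GEXF file is plain XML, so the saved networks can be given a quick sanity check with Python's standard library alone. The sketch below counts the nodes and edges in a small hand-written GEXF text; the helper name `count_nodes_and_edges` and the example data are hypothetical, and the exact layout of the files written by NetworkX may contain more attributes than shown here.

```python
import xml.etree.ElementTree as ET

def count_nodes_and_edges(root):
    """Count <node> and <edge> elements below root, ignoring the XML namespace."""
    nodes = [el for el in root.iter() if el.tag == "node" or el.tag.endswith("}node")]
    edges = [el for el in root.iter() if el.tag == "edge" or el.tag.endswith("}edge")]
    return len(nodes), len(edges)

# A tiny hand-written GEXF text (made-up data) standing in for a saved file
example = """<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
  <graph defaultedgetype="directed">
    <nodes>
      <node id="Page A" label="Page A" />
      <node id="Page B" label="Page B" />
    </nodes>
    <edges>
      <edge id="0" source="Page A" target="Page B" />
    </edges>
  </graph>
</gexf>"""

example_root = ET.fromstring(example)
print(count_nodes_and_edges(example_root))
```

For a saved file, `ET.parse(path).getroot()` gives the same kind of root element, where `path` is one of the two paths printed by the script.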