# Mapping controversies script 5: Keyword search in articles  

In this script you can specify a list of Wikipedia articles, either individually or from category members JSON files, and search them for keywords. The script outputs a CSV file with the number of times each keyword occurs in each article.
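The core idea can be sketched offline in a few lines. This is a simplified illustration, not the script itself. It counts keywords in a dictionary of hypothetical article texts using regular expressions with `\b` word boundaries. It then writes the same kind of CSV: an `id` column plus one column per keyword. The function names `count_keywords` and `write_counts_csv` are made up for this example.

```python
import csv
import os
import re
import tempfile

def count_keywords(text, keywords, wildcard_end=True):
    """Count case-insensitive occurrences of each keyword in text.

    With wildcard_end=True a keyword also matches longer words that begin
    with it ("adult" -> "adults", "adulthood"); otherwise only whole words count.
    """
    text = text.lower()
    counts = {}
    for keyword in keywords:
        keyword = keyword.strip().lower()
        # \b = word boundary; the closing \b is left out for the wildcard case
        pattern = r"\b" + re.escape(keyword) + ("" if wildcard_end else r"\b")
        counts[keyword] = len(re.findall(pattern, text))
    return counts

def write_counts_csv(path, articles, keywords, wildcard_end=True):
    """Write a header row ("id" + keywords) and one row of counts per article."""
    keywords = [k.strip().lower() for k in keywords]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id"] + keywords)
        for title, text in articles.items():
            counts = count_keywords(text, keywords, wildcard_end)
            writer.writerow([title] + [counts[k] for k in keywords])

# Hypothetical article texts standing in for text fetched from Wikipedia
articles = {
    "Article A": "Adults and adulthood. An adult is not a child.",
    "Article B": "Children grow into adults.",
}
print(count_keywords(articles["Article A"], ["adult", "child"]))
# {'adult': 3, 'child': 1}
print(count_keywords(articles["Article A"], ["adult", "child"], wildcard_end=False))
# {'adult': 1, 'child': 1}

out_path = os.path.join(tempfile.gettempdir(), "adult_child_KeywordSearchInArticles.csv")
write_counts_csv(out_path, articles, ["adult", "child"])
```

Word boundaries also catch a keyword at the very start of the text, after a line break, or right before punctuation. Back-to-back occurrences such as "hiv hiv" are each counted.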


## Step 1: Installing the right libraries
Libraries can be understood as prewritten pieces of code that you can reuse in your own scripts. Instead of writing many lines of code to, e.g., connect to Wikipedia and fetch an article, you can do it with a single command.
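The sketch below gives an idea of what such a single command saves you. It only builds the web address of a request to Wikipedia's MediaWiki Action API asking for the plain text of one article. Sending it, decoding the JSON reply and handling errors would take more code still. The query parameters belong to the public MediaWiki API; `prop=extracts` is provided by its TextExtracts extension. `extract_request_url` is a hypothetical helper, and this is not necessarily the exact request the library sends.

```python
from urllib.parse import urlencode

def extract_request_url(title, language="en"):
    """Build a MediaWiki Action API URL asking for the plain-text extract of one page."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",   # page text, provided by the TextExtracts extension
        "explaintext": 1,     # plain text instead of HTML
        "titles": title,
    }
    return "https://" + language + ".wikipedia.org/w/api.php?" + urlencode(params)

print(extract_request_url("Differential privacy"))
# https://en.wikipedia.org/w/api.php?action=query&format=json&prop=extracts&explaintext=1&titles=Differential+privacy
```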


__Note: in this workbook we will be using the Wikipedia-API library (installed as `wikipedia-api`, imported as `wikipediaapi`). If you have already installed it once, there is no need to do it again. You may simply skip to step 2.__

In [1]:
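```python
# Try to import the Wikipedia-API library and install it if it is missing
try:
    import wikipediaapi
    print("Wikipedia api library has been imported")
except ImportError:
    print("wikipedia api library not found. Installing...")
    # %pip installs into the environment of the running Jupyter kernel
    %pip install wikipedia-api

    try:
        import wikipediaapi
        print("Wikipedia api library has been installed and imported")
    except ImportError:
        print("Something went wrong in the installation of the wikipedia api library. Please check your internet connection and consult the output from the installation above")
```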


Wikipedia api library has been imported
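The same "import, or install and then import" pattern can also be written in plain Python without notebook-specific shell commands, which is handy outside Jupyter. The following is a sketch that assumes pip is available for the running interpreter. `ensure_package` is a hypothetical helper name.

```python
import importlib
import subprocess
import sys

def ensure_package(module_name, pip_name=None):
    """Import module_name; if that fails, pip-install pip_name and try again."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # sys.executable is the interpreter running this code, so pip installs
        # into the same environment that will import the package
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", pip_name or module_name]
        )
        importlib.invalidate_caches()  # let the import system see the new package
        return importlib.import_module(module_name)

# "json" ships with Python, so this only imports it and installs nothing
print(ensure_package("json").__name__)  # json
```

In this notebook the call would be `wikipediaapi = ensure_package("wikipediaapi", "wikipedia-api")`. The first name is what you import; the second is the name pip installs.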


## Step 2: Make the queries 
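Before running the script, it can help to see how the list of pages is put together. Below is a minimal standalone sketch of that step. It assumes, as the script does, that a category members file holds a JSON list of objects with a `title` key. `load_titles` and the example file are hypothetical. It also strips stray spaces around comma-separated names and removes duplicate titles, so a page listed in two category files is only counted once.

```python
import json
import os
import tempfile

def load_titles(json_names="", manual_titles=""):
    """Collect page titles from category members JSON files and/or a manual list.

    Both arguments are comma-separated strings, as typed at the prompts.
    File names may be given with or without the .json extension.
    """
    titles = []
    for name in json_names.split(","):
        name = name.strip()
        if not name:
            continue
        path = name if name.endswith(".json") else name + ".json"
        with open(path, encoding="utf-8") as f:
            # Each category member is assumed to be a dict with a "title" key
            titles.extend(member["title"] for member in json.load(f))
    titles.extend(t.strip() for t in manual_titles.split(",") if t.strip())
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys(titles))

# Hypothetical category members file written to a temporary folder
tmp_dir = tempfile.mkdtemp()
example = os.path.join(tmp_dir, "cat_members_example")
with open(example + ".json", "w", encoding="utf-8") as f:
    json.dump([{"title": "Circumcision"}, {"title": "Foreskin"}], f)

print(load_titles(example, "Foreskin, Female genital mutilation"))
# ['Circumcision', 'Foreskin', 'Female genital mutilation']
```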

In order to run the script, click on the cell below and press "Run" in the menu.

In [1]:
```python
import csv
import json
import os
import re

import wikipediaapi

# --- Choose the pages to search ---------------------------------------------
print("How do you want to input the pages for the keyword search?")
print("Enter '1' if you want to use a category members json file.")
print("Enter '2' if you want to enter the pages manually.")
print("Enter '0' if you want to use category members json file AND enter pages manually.")
pages = []
input_style = input().strip()

if input_style in ("0", "1"):
    print("Enter the name of the category members json file you wish to use for keyword search (e.g.cat_members_circumcision_depth_2). If you have multiple files separate them with a comma")
    for name in input().split(","):
        name = name.strip()
        if not name:
            continue
        # Accept file names with or without the .json extension
        path = name if name.endswith(".json") else name + ".json"
        with open(path) as jsonfile:
            cat_members = json.load(jsonfile)
        for member in cat_members:
            pages.append(member["title"])

if input_style in ("0", "2"):
    print("Enter the names of the pages you wish to use for keyword search. If multiple pages use comma separation (e.g. circumcision,Female genital mutilation etc)")
    for title in input().split(","):
        if title.strip():
            pages.append(title.strip())

# Drop duplicate titles (e.g. a page that appears in two category files) while
# keeping the order, so each page is fetched and counted only once
pages = list(dict.fromkeys(pages))

# --- Connect to the chosen language version of Wikipedia --------------------
print('Enter the desired language version of wikipedia (e.g. "en","da","fr",etc.) or leave blank to use default (english):')
lan = input().strip() or "en"
# Wikipedia-API 0.6.0 and later require a user agent that identifies your
# project; the string below is a placeholder, replace it with your own.
wiki_wiki = wikipediaapi.Wikipedia(
    user_agent="MappingControversiesKeywordSearch (your-email@example.com)",
    language=lan,
    extract_format=wikipediaapi.ExtractFormat.WIKI,
)

# --- Read the keywords -------------------------------------------------------
print("Enter the keyword(s) you would like to query for. If more than one, use comma separation. Note, that the script will not differentiate between lower and capital letters.")
keywords = input()

print("Do you want to use wild card in the end of the keyword (y/n)? (e.g. keyword adult will return adult, adults, adulthood etc.)")
wildcard_end = input().strip().lower()

# Lower-case the keywords and drop empty entries and duplicates
keyword_list = list(dict.fromkeys(k.strip().lower() for k in keywords.split(",") if k.strip()))
filename = "_".join(keyword_list) + "_KeywordSearchInArticles.csv"

# One regular expression per keyword. \b is a word boundary, so matches are
# also found at the start of the text, after line breaks and next to any
# punctuation, and back-to-back occurrences are all counted.
patterns = {}
for keyword in keyword_list:
    if wildcard_end in ("n", "no"):
        # Whole words only: "adult" does not match "adults"
        patterns[keyword] = re.compile(r"\b" + re.escape(keyword) + r"\b")
    else:
        # Wildcard at the end: "adult" also matches "adults", "adulthood", ...
        patterns[keyword] = re.compile(r"\b" + re.escape(keyword))

# --- Fetch the pages and count the keywords ----------------------------------
page_dict = {}
keyword_dict = {keyword: 0 for keyword in keyword_list}
print("Collecting and analyzing text from " + str(len(pages)) + " pages...")
for page in pages:
    page_text = wiki_wiki.page(page).text.lower()
    if not page_text:
        # e.g. a misspelled title or a page missing in this language version
        print("Warning: no text found for the page '" + page + "'.")
    page_dict[page] = {"keywords": {}}
    for keyword, pattern in patterns.items():
        keyword_count = len(pattern.findall(page_text))
        keyword_dict[keyword] += keyword_count
        page_dict[page]["keywords"][keyword] = keyword_count
print("")
print("Your search is over. ")

for keyword in keyword_dict:
    print("The keyword " + keyword + " appeared " + str(keyword_dict[keyword]) + " times in total.")
    print("")

# --- Save the results ---------------------------------------------------------
print("Saving CSV...")
headers = ["id"] + keyword_list
# Open the file once and write the header row plus one row per page
with open(filename, "w", newline="", encoding="utf-8") as f:
    wr = csv.writer(f, delimiter=",")
    wr.writerow(headers)
    for page in page_dict:
        wr.writerow([page] + [page_dict[page]["keywords"][k] for k in keyword_list])
print("CSV file saved. You can find it by following this path: ")
# os.path.abspath works in any Python environment, unlike the !pwd shell magic
print(os.path.abspath(filename))
```

How do you want to input the pages for the keyword search?
Enter '1' if you want to use a category members json file.
Enter '2' if you want to enter the pages manually.
Enter '0' if you want to use category members json file AND enter pages manually.
1
Enter the name of the category members json file you wish to use for keyword search (e.g.cat_members_circumcision_depth_2). If you have multiple files separate them with a comma
category_members_Theory_of_cryptography_depth_2
 
Enter the desired language version of wikipedia (e.g. "en","da","fr",etc.) or leave blank to use default (english):
en
Enter the keyword(s) you would like to query for. If more than one, use comma separation. Note, that the script will not differentiate between lower and capital letters.
differential privacy
Do you want to use wild card in the end of the keyword (y/n)? (e.g. keyword adult will return adult, adults, adulthood etc.)
y
Collecting and analyzing text from 95 pages...

Your search is over. 
The keyword 