## This tutorial is mainly based on the paper "ArgumenText: Searching for Arguments in Heterogeneous Sources"

C. Stab et al., 2018

* **Tutors**:
    * Nesara Gurunatha 
    * Deepak Garg

### Task Description

The goal of this tutorial is to show how to index and retrieve documents and how to distinguish pro arguments from con arguments.

The implementation is divided into two parts:

* Part 1: Importing and indexing the data (TODO)
* Part 2: Retrieving the data (TODO)

#### Required libraries

<code>pip install whoosh</code><br>

### Part 1

#### Import the data, which is in JSON format, and prepare it for indexing.
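
The indexing cell reads three keys under `MetaData` from each file: `Title`, `URL` and `Context`. The following is a minimal sketch of that layout, inferred only from the keys the code accesses. The file name `sample.json`, the temporary directory and all field values are hypothetical placeholders, and the real dataset files may contain additional keys.

```python
import json
import os
import tempfile

# Hypothetical document; only the key names are taken from the indexing code.
sample_doc = {
    "MetaData": {
        "Title": "Example debate title",
        "URL": "http://example.org/debates/example",
        "Context": "Example text of the debate page",
    }
}

# Write the sample into a temporary directory that could serve as `root`.
sample_root = tempfile.mkdtemp()
sample_path = os.path.join(sample_root, "sample.json")
with open(sample_path, "w", encoding="utf8") as fp:
    json.dump(sample_doc, fp)

# Read it back the same way the indexing cell does.
with open(sample_path, "r", encoding="utf8") as fp:
    loaded = json.load(fp)

print(loaded["MetaData"]["Title"])
```

A directory of files shaped like this one is what the indexing function expects as its `root` argument.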

In [1]:
import os
import json
from whoosh.index import create_in
from whoosh.fields import Schema, TEXT, ID, KEYWORD


def index_data(root):
    '''
    Index every JSON file found in the directory `root`.

    Schema: Set of all possible fields in a document.
    For example, a schema for indexing emails might
    contain fields like from_addr, to_addr, subject,
    body and attachments.
    '''
    # Note: KEYWORD splits Context on whitespace and does not lowercase by default
    schema = Schema(Title=TEXT(stored=True), URL=ID(stored=True),
                    Context=KEYWORD(stored=True, scorable=True))

    # Create the directory where the index is stored, if not present
    if not os.path.exists("indir"):
        os.mkdir("indir")

    try:
        # Create an index writer to add documents as per the schema
        index = create_in("indir", schema)
        writer = index.writer()
        filepaths = [os.path.join(root, i) for i in os.listdir(root)]
        for path in filepaths:
            with open(path, "r", encoding='utf8') as fp:
                data = json.load(fp)
            meta = data['MetaData']
            writer.add_document(Title=meta['Title'], URL=meta['URL'], Context=meta['Context'])
        writer.commit()

    except Exception as e:
        print(e)


root = "json_dataset"
index_data(root)
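
Conceptually, indexing builds a mapping from each term to the documents that contain it, so that a query does not have to scan every file. The sketch below illustrates that idea in plain Python. It is a simplified illustration, not Whoosh's internal data structure, and the function name `build_inverted_index` and the two sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased, whitespace-separated term to the ids of documents containing it."""
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            inverted[term].add(doc_id)
    return inverted


# Hypothetical documents, keyed by an id
toy_docs = {
    "doc1": "Western relations with North Korea",
    "doc2": "Trade relations and food aid",
}
inverted = build_inverted_index(toy_docs)

# A lookup touches only the entry for the query term
print(sorted(inverted["relations"]))
print(sorted(inverted["korea"]))
```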

### Part 2

#### After the data has been indexed, the next step is to retrieve it (search and score the documents).

In [2]:
from whoosh import qparser
from whoosh.index import open_dir
from whoosh.scoring import BM25F

input_search = input("enter an input to be searched: ").lower()
rank_func = 'BM25' # Ranking function to estimate relevance of documents for a search query

directory_name = "indir"
index = open_dir(directory_name) # Open the directory that holds the index
qp = qparser.MultifieldParser(['Context', 'Title', 'URL'], index.schema) # Parse the query against several fields at once
query = qp.parse(input_search)

enter an input to be searched: north korea
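
The ranking step uses `BM25F` with `B=0.75` and `K1=1.5`. As a rough intuition for what these two parameters do, the sketch below computes a textbook single-field BM25 score in plain Python. It is a simplification under stated assumptions, not Whoosh's implementation: the function `bm25_score`, the smoothed IDF variant and the toy corpus are all illustrative choices.

```python
import math


def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Textbook BM25 score of one tokenized document for a list of query terms."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    score = 0.0
    for term in query_terms:
        tf = doc.count(term)                      # term frequency in this document
        if tf == 0:
            continue
        df = sum(1 for d in corpus if term in d)  # number of documents containing the term
        # Smoothed IDF (one common variant; always positive)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # b controls length normalization, k1 controls term-frequency saturation
        norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
        score += idf * tf * (k1 + 1) / norm
    return score


# Hypothetical toy corpus of tokenized documents
toy_corpus = [
    "western relations with north korea".split(),
    "trade relations and food aid".split(),
    "north sea fishing rights".split(),
]
toy_query = "north korea".split()
toy_scores = [bm25_score(toy_query, doc, toy_corpus) for doc in toy_corpus]
print(toy_scores)
```

In this toy corpus the first document matches both query terms, the third matches only one, and the second matches none, so the scores fall in that order.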


In [6]:
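# Choose the ranking function used to score the documents: BM25 here, or another one in the "else" part
if rank_func == "BM25":
    w = BM25F(B=0.75, K1=1.5)
else:
    # Placeholder for another weighting model; fail early instead of leaving w undefined
    raise NotImplementedError("No ranking function configured for " + rank_func)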

# Search the index
with index.searcher(weighting=w) as searcher: # The searcher runs the query against the index
    result = searcher.search(query, terms=True)
    num_of_doc = result.scored_length() # Returns the number of scored documents
    # run_time = result.runtime
    
    if num_of_doc == 0:
        top_output = str(num_of_doc) + " results found" 
        print(top_output)
        
    else:
        top_output = "Top " + str(num_of_doc) + " search results found for the input " + input_search + " using "+rank_func
        print(top_output)
        print()
      
    if result:
        for hit in result:
            URL = hit["URL"]
            Context = hit["Context"]
            # Show the URL together with the relevance score of this hit
            url_with_score = URL + "   (Score: " + str(hit.score) + ")"
            print(str(Context), '\n', url_with_score, '\n')
            
           

Top 1 search results found for the input north korea using BM25

URL: http://debatewise.org/debates/2562-western-relations-with-north-korea/
Crawl date: 2016/10/03 20:49:56
Title: Western Relations with North Korea - DebateWise
Number of arguments: 5


Claim For: One of the primary reasons that is often cited to support the establishment of diplomatic and trade ...

Evidence Pro: One of the primary reasons that is often cited to support the establishment of diplomatic and trade relations with North Korea is that of humanitarian concerns. It is an unassailable fact that scores of North Korean citizens die from poverty and starvation annually. At present there remain many barriers in place that ought to be eliminated in order to encourage the flow of more aid, especially food aid to the country.

Evidence Con: It is foolhardy to presume that no aid is received by North Korea at present due to the various barriers that exist. Arguably, the existing barriers do not significantly hinder the