# This tutorial is based on the paper "ArgumenText: Searching for Arguments in Heterogeneous Sources"
Stab et al.

* **Tutors**:
    * Deepak Garg 
    * Nesara Gurunatha

This tutorial shows how to index and retrieve documents with Elasticsearch and how to separate the retrieved arguments into pros and cons.

The implementation is divided into four parts:

    Part 1: Importing and structuring the dataset
    Part 2: Indexing
    Part 3: Retrieval
    Part 4: Identification and stance recognition

#### Library installation cells. Uncomment and run them; skip this step if the libraries are already installed.

In [None]:
#!pip install pandas

In [None]:
#!pip install elasticsearch

In [None]:
#!pip install textblob

In [None]:
#!pip install requests

In [1]:
import csv, json, io
import pandas as pd
import requests
from elasticsearch import Elasticsearch
from textblob import TextBlob

# Part 1.
## Importing and structuring the Dataset of IBM Debater

#### Importing the document

In [2]:
# Open the JSON data file read-only; buildIndex below iterates over it line by line
jsonFile = open("../ibm_data/data.json", "r", encoding="utf8")
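
The cell above only opens the file; nothing is parsed yet. As a minimal sketch, assuming the file is in JSON Lines form (one JSON object per line) with the `topic` and `sentence` fields that the retrieval step reads later, each line could be turned into a dictionary as follows. The helper name `parse_json_lines` and the sample records are hypothetical and stand in for the real data.

```python
import io
import json

def parse_json_lines(handle):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in handle:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)

# Hypothetical sample standing in for ../ibm_data/data.json
sample = io.StringIO(
    '{"topic": "tower blocks", "sentence": "First example sentence."}\n'
    '\n'
    '{"topic": "tower blocks", "sentence": "Second example sentence."}\n'
)
docs = list(parse_json_lines(sample))
print(len(docs), docs[0]["sentence"])
```

Parsing up front makes malformed lines fail early, before anything is sent to the index.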

# Part 2.
## Indexing


### TASK : Indexing the document


#### Getting the neighbouring sentences for each source and target pair

In [3]:
# make sure ES is up and running
res = requests.get('http://localhost:9200')
print(res.content)
# Connect to the elastic cluster
es=Elasticsearch([{'host':'localhost','port':9200}])
es

b'{\n  "name" : "LAPTOP-MMVTV8MH",\n  "cluster_name" : "elasticsearch",\n  "cluster_uuid" : "ZEt-eehDQMCSTUf2nAmN0Q",\n  "version" : {\n    "number" : "7.2.0",\n    "build_flavor" : "unknown",\n    "build_type" : "unknown",\n    "build_hash" : "508c38a",\n    "build_date" : "2019-06-20T15:54:18.811730Z",\n    "build_snapshot" : false,\n    "lucene_version" : "8.0.0",\n    "minimum_wire_compatibility_version" : "6.8.0",\n    "minimum_index_compatibility_version" : "6.0.0-beta1"\n  },\n  "tagline" : "You Know, for Search"\n}\n'


<Elasticsearch([{'host': 'localhost', 'port': 9200}])>
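
The reply printed above is JSON, so the server version can be read from it programmatically rather than by eye. The sketch below works on an abbreviated copy of that reply; with a live server, `res.content` could be passed instead. The helper name `elasticsearch_version` is hypothetical.

```python
import json

def elasticsearch_version(raw_content):
    """Return the version number string from the root endpoint's JSON reply."""
    info = json.loads(raw_content)  # json.loads accepts bytes as well as str
    return info["version"]["number"]

# Abbreviated copy of the reply printed above
reply = (
    b'{"name": "LAPTOP-MMVTV8MH", "cluster_name": "elasticsearch", '
    b'"version": {"number": "7.2.0", "lucene_version": "8.0.0"}, '
    b'"tagline": "You Know, for Search"}'
)
version = elasticsearch_version(reply)
major = int(version.split(".")[0])
print(version, major)
```

Knowing the major version is useful because some client arguments used in this tutorial, such as `doc_type`, belong to the 7.x generation of the API.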

In [4]:
# Build the index from an iterable of documents
def buildIndex(index_name, data):
    # Do nothing if the index already exists, to avoid indexing the data twice
    if es.indices.exists(index=index_name):
        print("Index {} already exists... Aborting!".format(index_name))
        return

    for idx, doc in enumerate(data):
        # doc_type is accepted (though deprecated) by the Elasticsearch 7.x client used here
        es.index(index=index_name, doc_type='text', id=idx, body=doc)

In [5]:
# Name of the index to build
indexName = "data1"
buildIndex(indexName, jsonFile)
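
`buildIndex` sends one request per document. As a sketch of an alternative (not what this tutorial does), the Python client's `elasticsearch.helpers.bulk` accepts an iterable of action dictionaries and sends them in batches. The block below only builds the actions, so it runs without a cluster; the helper name `make_bulk_actions` and the sample documents are hypothetical.

```python
def make_bulk_actions(index_name, docs):
    """Yield one bulk action per document, using its position as the id."""
    for idx, doc in enumerate(docs):
        yield {"_index": index_name, "_id": idx, "_source": doc}

# Hypothetical documents with the fields used later in retrieval
docs = [
    {"topic": "tower blocks", "sentence": "First example sentence."},
    {"topic": "tower blocks", "sentence": "Second example sentence."},
]
actions = list(make_bulk_actions("data1", docs))
print(actions[0])

# With a live cluster, the actions could be sent in one call (not executed here):
# from elasticsearch import helpers
# helpers.bulk(es, actions)
```

For a small dataset the per-document loop is fine; batching mainly matters when indexing many thousands of sentences.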

# Part 3.
## Retrieval : Retrieving using elastic search

In [6]:
# Retrieve the sentences whose "topic" field matches the query
def searchIndex(indexname, search_query):
    res = es.search(index=indexname, body={"query": {"match": {"topic": search_query}}}, size=500)
    # print("%d documents found:" % res['hits']['total']['value'])
    output_sentence = []
    output_score = []
    for doc in res['hits']['hits']:
        output_sentence.append(doc['_source']["sentence"])
        output_score.append(doc['_score'])
    # Parallel lists: the i-th score belongs to the i-th sentence
    return {'sentence_indexed': output_sentence, 'index_score': output_score}
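
To make the shape of the search response easier to follow, the sketch below builds the same `match` query body and extracts sentence and score pairs from a hand-written response. The response is a hypothetical stand-in that mimics the Elasticsearch 7.x layout (`hits.hits[*]._source` and `_score`); the scores and sentences are made up, and the helper names are not part of the tutorial.

```python
def build_match_query(field, text):
    """Return a request body for a full-text match on one field."""
    return {"query": {"match": {field: text}}}

def extract_hits(response):
    """Return (sentence, score) pairs in the order the hits were ranked."""
    return [
        (hit["_source"]["sentence"], hit["_score"])
        for hit in response["hits"]["hits"]
    ]

# Hypothetical response with made-up scores, shaped like a 7.x search reply
fake_response = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_score": 4.2, "_source": {"topic": "tower blocks", "sentence": "Sentence A."}},
            {"_score": 1.7, "_source": {"topic": "tower blocks", "sentence": "Sentence B."}},
        ],
    }
}

body = build_match_query("topic", "Tower blocks are advantageous")
pairs = extract_hits(fake_response)
print(body)
print(pairs)
```

Keeping sentence and score together in one pair avoids having to keep two parallel lists aligned.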

#### Sample queries for retrieval

In [7]:
query1 = "Tower blocks are advantageous"
search_output = searchIndex(indexName, query1)

In [8]:
# search_output

In [9]:
# query2 = "We should abandon Youtube"
# search_output = searchIndex(indexName, query2)

In [10]:
search_output

{'sentence_indexed': ['Highland St. is mostly commercial, with a collection of high-rise low-to-moderate residential buildings, as well as Crichton College.',
  'A 13-story high-rise building was constructed in 1966 and served as housing for the elderly [REF].',
  'Upon completion of the high-rise, the development had a total of 1002 units.',
  'It was also the last conventional public housing development constructed in New Orleans and originally consisted of a 13-floor high-rise and fourteen 3-floor units [REF].',
  'A sniper sat on top of the high-rise building, preventing the police and other from entering.',
  'These plans included expanding the site to 73acre by acquiring adjacent properties, phased demolition of the high-rise and low-rise housing units, and construction of at least 640 new housing units.',
  'Replacing the highrise were numerous low-income houses.',
  'Toll readers were located on gantries at the east highrise,[REF] but additional gantries on the east mainland we

# Part 4.
## Identification and Stance Recognition

In [11]:
# Only two columns are needed in our case
columns = ['sentence_index','score_index']

In [12]:
df_data = pd.DataFrame(columns = columns)
df_data['sentence_index'] = search_output['sentence_indexed']
df_data['score_index'] = search_output['index_score']  # kept for display only

In [13]:
# df_data

#### Checking whether sentences are pro or con

In [17]:
pro = []
con = []
for index, row in df_data.iterrows():
    # TextBlob polarity lies in [-1, 1]; it is used here as a rough proxy for stance
    analysis = TextBlob(row['sentence_index'])
    # Neutral sentences (polarity == 0) are counted as pro
    if analysis.sentiment.polarity >= 0:
        pro.append(row['sentence_index'])
    else:
        con.append(row['sentence_index'])
# print(len(pro))
# print(len(con))
# One single-column DataFrame per stance
df_column_pro = pd.DataFrame({'pros': pro})
df_column_cons = pd.DataFrame({'cons': con})
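
The loop above treats every sentence with polarity 0 as a pro, and 0 is what TextBlob typically returns for sentences without sentiment-bearing words. One possible variation, shown here only as a sketch and not as the tutorial's method, is to keep a neutral band around zero. The function name, the threshold of 0.1, and the polarity values in the sample are all hypothetical; in the notebook the polarities would come from `TextBlob(sentence).sentiment.polarity`.

```python
def split_by_polarity(scored_sentences, threshold=0.1):
    """Split (sentence, polarity) pairs into pros, cons, and neutral lists."""
    pros, cons, neutral = [], [], []
    for sentence, polarity in scored_sentences:
        if polarity > threshold:
            pros.append(sentence)
        elif polarity < -threshold:
            cons.append(sentence)
        else:
            neutral.append(sentence)
    return pros, cons, neutral

# Hypothetical polarity values, not real TextBlob output
scored = [
    ("Sentence with positive wording.", 0.6),
    ("Purely factual sentence.", 0.0),
    ("Sentence with negative wording.", -0.4),
]
pros, cons, neutral = split_by_polarity(scored)
print(len(pros), len(cons), len(neutral))
```

Note that sentiment polarity measures the tone of a sentence, not whether it supports the query topic, so either split remains an approximation of stance.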

In [18]:
# Optionally save the results to files
df_column_pro.to_json('FinalResult_pros.json')
df_column_cons.to_json('FinalResult_cons.json')
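
With its default `orient`, `DataFrame.to_json` writes each column as a mapping from row label to value, so the saved files can be read back without pandas. The sketch below assumes that default layout and uses a hand-written string in place of `FinalResult_pros.json`; the helper name `column_from_pandas_json` and the sample sentences are hypothetical.

```python
import json

def column_from_pandas_json(text, column):
    """Return one column's values as a list, ordered by integer row label."""
    table = json.loads(text)
    cells = table[column]  # maps row labels such as "0", "1" to values
    return [cells[key] for key in sorted(cells, key=int)]

# Hand-written stand-in for the contents of FinalResult_pros.json
saved = '{"pros": {"0": "First sentence.", "1": "Second sentence.", "2": "Third sentence."}}'
sentences = column_from_pandas_json(saved, "pros")
print(sentences)
```

Sorting by the integer value of the label keeps row 10 after row 9, which plain string sorting would not.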

In [19]:
print('pros')
print(df_column_pro)

pros
                                                 pros
0   Highland St. is mostly commercial, with a coll...
1   A 13-story high-rise building was constructed ...
2   Upon completion of the high-rise, the developm...
3   It was also the last conventional public housi...
4   A sniper sat on top of the high-rise building,...
5   Replacing the highrise were numerous low-incom...
6   Toll readers were located on gantries at the e...
7   From 1970 to 1990, All My Children was recorde...
8   There is also council housing in the form of m...
9   Entering the southwestern corner of Jefferson ...
10  The Original Towers, nine-story high-rise resi...
11  Apartment blocks have been re-purposed as acco...
12  It was planned as a satellite city: a string o...
13  In 1981 a new construction zone plan was desig...
14  In the central business district, the lifting ...
15  It comprises of two high-rise apartment buildi...
16  With the lifting of height restrictions in the...
17  The house was demol

In [20]:
print('cons')
print(df_column_cons)

cons
                                                cons
0  These plans included expanding the site to 73a...
1  The bleak economic climate of the early 1930s ...
2  The diverse suburb has been the subject of gen...
3  Two of these tower blocks, Moncreiffe House an...
4  At their base, one of the brick kilns has been...
5  The tallest building approved for the area is ...
6  8. "Loft Cube", Mobile Home Unit · Modular liv...
7  These will be replaced by a 110 metre tall tow...
8  The area is mainly a council housing estate wi...


### Optional task: how to delete an index

In [22]:
es.indices.delete(index='data1')

{'acknowledged': True}