# Project PRI

The goal of this project is to implement an Information Search & Extraction System for
the analysis of political discourse. Your system will have access to a large set of documents containing the electoral manifestos
of several political parties from different countries in the world. Using this data,
the system should be able to provide the following functionalities.

a) Ad hoc search on the collection of documents

Given a query, represented by a set of keywords, the system should return all manifestos containing such keywords, ordered according to their relevance to the query.

1- Create an inverted index (dictionary) over all documents in the collection.

2- Read the query from the command line and apply the same transformations used on the documents, so that it can be compared against the index and every matching document retrieved.

3- Rank the retrieved documents by their relevance to the query.


We can also suppress common words (stop words), treat the different conjugations of the same verb as a single term (stemming), and so on.
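
The three steps above can be sketched without any library. This is a minimal illustration under stated assumptions, not the method used in the rest of the notebook: the toy documents, the stop-word list and the crude plural stripping are all made up for the example, and the ranking uses a plain TF-IDF sum.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy collection; the real documents come from en_docs_clean.csv.
DOCS = {
    "d1": "Schools need more teachers and primary schools need funding",
    "d2": "The world economy and world trade",
    "d3": "Primary care and hospitals",
}

# Illustrative stop-word list (an assumption, not Whoosh's list).
STOP_WORDS = {"the", "and", "a", "of", "need", "more"}


def tokenize(text):
    """Lowercase, split on non-letters, drop stop words, strip a plural 's'."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w[:-1] if w.endswith("s") and len(w) > 3 else w
            for w in words if w not in STOP_WORDS]


def build_index(docs):
    """Step 1: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(query, index, n_docs):
    """Steps 2 and 3: transform the query, then rank matches by TF-IDF."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


index = build_index(DOCS)
ranking = search("world school primary", index, len(DOCS))
print(ranking)
```

The query goes through the same `tokenize` function as the documents, which is what lets "school" in the query match "schools" in a document.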

In [13]:
#read the csv file
import pandas as pd

data = pd.read_csv("en_docs_clean.csv")

#creates a DataFrame with columns text, manifesto_id, party, date and title
print(data.shape[0])    #number of rows
print(data.iloc[16725]) #last row
print(data.iloc[21])    #row at position 21 (data.iloc[0] would be the first row)

#should rows that share the same manifesto_id be merged into a single text?
#https://whoosh.readthedocs.io/en/latest/indexing.html#indexing-documents

16726
text            Table 1 (continued) All figures in £bn TAX AND...
manifesto_id                                         51951_201505
party                           United Kingdom Independence Party
date                                                       201505
title                     Believe in Britain. UKIP Manifesto 2015
Name: 16725, dtype: object
text            Our aim: To make Britain the world's foremost ...
manifesto_id                                         51421_199705
party                                           Liberal Democrats
date                                                       199705
title                                         Make the Difference
Name: 21, dtype: object
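
The comment in the cell above asks whether rows sharing a `manifesto_id` should be merged into one text. A minimal sketch of such a merge follows, assuming each row is one passage of a manifesto. The rows and their texts are hypothetical, and `merge_by_manifesto` is a name introduced only for this example.

```python
# Hypothetical rows mimicking the CSV columns; the texts are made up.
rows = [
    {"manifesto_id": "51421_199705", "party": "Liberal Democrats",
     "text": "First passage."},
    {"manifesto_id": "51421_199705", "party": "Liberal Democrats",
     "text": "Second passage."},
    {"manifesto_id": "51951_201505", "party": "United Kingdom Independence Party",
     "text": "Another passage."},
]


def merge_by_manifesto(rows):
    """Concatenate the texts of all rows that share a manifesto_id."""
    grouped = {}
    for row in rows:
        entry = grouped.setdefault(
            row["manifesto_id"], {"party": row["party"], "texts": []}
        )
        entry["texts"].append(row["text"])
    # Join the passages in their original order, separated by a space.
    return {
        manifesto_id: {"party": entry["party"], "text": " ".join(entry["texts"])}
        for manifesto_id, entry in grouped.items()
    }


merged = merge_by_manifesto(rows)
print(merged["51421_199705"]["text"])
```

With the real DataFrame, something like `data.groupby("manifesto_id")["text"].apply(" ".join)` should give the same grouping. Merging makes each manifesto a single document, so a search returns each manifesto at most once, at the cost of no longer knowing which passage matched.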


In [38]:
#now we want to create an index in order to ease the access and search
import nltk
import os.path
import shutil
from whoosh.fields import Schema, TEXT, ID, NUMERIC
from whoosh.index import create_in, open_dir
from whoosh.query import Every
from whoosh.qparser import MultifieldParser, OrGroup
from whoosh.analysis import StemmingAnalyzer
from whoosh.formats import Frequency
from whoosh import scoring

#define the index's schema, which lists the fields in the index

#a field is a piece of information for each document in the index,
#such as its title or text content. It can be indexed (searchable) and/or
#stored (meaning its value is returned with the results)

#a field defined with stored=True in the schema has its value returned
#with the results; the other fields can be searched but are not returned

#in our data, the fields are text, manifesto_id, party, date and title

def createIndexComplete(data):
    
    #StemmingAnalyzer composes a RegexTokenizer (a customizable, regular-expression-based tokenizer that extracts words
    #and ignores whitespace and punctuation) + LowercaseFilter + StopFilter + StemFilter (every word, not only verbs, is reduced to its stem)
    analyzer = StemmingAnalyzer()
    
    vector_format = Frequency() #stores the number of times each term appears in each document
    
    schema = Schema(
        text=TEXT(analyzer=analyzer, vector=vector_format),
        manifesto_id=ID(stored=True),
        party=TEXT(stored=True, vector=vector_format),
        date=NUMERIC,
        title=TEXT(stored=True, analyzer=analyzer, vector=vector_format),
    )
    print(schema)
    
    #always start from an empty "index" directory
    if os.path.isdir("index"):
        shutil.rmtree("index")
    os.mkdir("index")
    
    index = create_in("index", schema)
    #The main index is an inverted index. It maps terms to the documents they appear in.
    
    #create an index writer to add documents
    writer = index.writer()
    
    #each row of the DataFrame becomes one document in the index
    for row in data.itertuples(index=False):
        writer.add_document(text=row.text, manifesto_id=row.manifesto_id, party=row.party, date=row.date, title=row.title)
    print("Going to commit")
    writer.commit()
    return index
    
    
index = createIndexComplete(data)


<Schema: ['date', 'manifesto_id', 'party', 'text', 'title']>
Going to commit
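
The difference between an indexed field and a stored field can be shown with a toy stand-in. This is only a sketch: `ToyIndex` is a hypothetical class written for the example, not part of Whoosh, and the document text is made up.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical index that separates indexed fields from stored fields."""

    def __init__(self, indexed, stored):
        self.indexed = set(indexed)
        self.stored = set(stored)
        self.postings = defaultdict(set)  # (field, term) -> document numbers
        self.stored_values = []           # document number -> stored fields only

    def add_document(self, **fields):
        docnum = len(self.stored_values)
        for name, value in fields.items():
            if name in self.indexed:
                for term in str(value).lower().split():
                    self.postings[(name, term)].add(docnum)
        # Only the stored fields are kept for display in the results.
        self.stored_values.append(
            {name: value for name, value in fields.items() if name in self.stored}
        )
        return docnum

    def search(self, field, term):
        docnums = self.postings.get((field, term.lower()), set())
        return [self.stored_values[d] for d in sorted(docnums)]


toy = ToyIndex(indexed=["text", "title", "party"],
               stored=["manifesto_id", "party", "title"])
toy.add_document(text="Our aim is better schools",  # made-up text
                 manifesto_id="51421_199705",
                 party="Liberal Democrats",
                 title="Make the Difference")
hits = toy.search("text", "schools")
print(hits)
```

The document is found through its `text` field, but the hit carries only `manifesto_id`, `party` and `title`, mirroring the schema above where `text` is indexed without `stored=True`.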


In [36]:
#By default, Whoosh scores results with BM25F. Every() matches all documents regardless of relevance.

def showIndex(index):
    with index.searcher() as searcher:
        # Match any documents with something in the "text" field
        q = Every("text")
        results = searcher.search(q, limit=None)
        print("Number of total documents:", searcher.doc_count())
        for result in results:
            print ("Id: %s Party: %s" % (result['manifesto_id'], result['party']))
            print ("Text:")
            print (result) #a Hit shows only the stored fields; "text" is indexed but not stored
            #print("Score:", result.score)
        
        #freq = searcher.frequency("text", "wobble")
        
showIndex(index)

Number of total documents: 16726
Id: 51320_196410 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_196410', 'party': 'Labour Party', 'title': 'Let’s Go With Labour for the New Britain'}>
Id: 51620_196410 Party: Conservative Party
Text:
<Hit {'manifesto_id': '51620_196410', 'party': 'Conservative Party', 'title': '‘Prosperity with a Purpose’, Conservative and Unionist Party’s Policy'}>
Id: 51320_196603 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_196603', 'party': 'Labour Party', 'title': 'Time for Decision'}>
Id: 51620_196603 Party: Conservative Party
Text:
<Hit {'manifesto_id': '51620_196603', 'party': 'Conservative Party', 'title': 'Action not Words: New Conservative Programme'}>
Id: 51320_197006 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_197006', 'party': 'Labour Party', 'title': 'Now Britain’s Strong - Let’s Make it Great to Live In'}>
Id: 51620_197006 Party: Conservative Party
Text:
<Hit {'manifesto_id': '51620_197006', 'party': 'Conservative Party', 't

Id: 51902_199705 Party: Scottish National Party
Text:
<Hit {'manifesto_id': '51902_199705', 'party': 'Scottish National Party', 'title': 'Yes we can win the best for Scotland'}>
Id: 51902_199705 Party: Scottish National Party
Text:
<Hit {'manifesto_id': '51902_199705', 'party': 'Scottish National Party', 'title': 'Yes we can win the best for Scotland'}>
Id: 51902_199705 Party: Scottish National Party
Text:
<Hit {'manifesto_id': '51902_199705', 'party': 'Scottish National Party', 'title': 'Yes we can win the best for Scotland'}>
Id: 51902_199705 Party: Scottish National Party
Text:
<Hit {'manifesto_id': '51902_199705', 'party': 'Scottish National Party', 'title': 'Yes we can win the best for Scotland'}>
Id: 51902_199705 Party: Scottish National Party
Text:
<Hit {'manifesto_id': '51902_199705', 'party': 'Scottish National Party', 'title': 'Yes we can win the best for Scotland'}>
Id: 51902_199705 Party: Scottish National Party
Text:
<Hit {'manifesto_id': '51902_199705', 'party': 'Scottish

Text:
<Hit {'manifesto_id': '51320_200106', 'party': 'Labour Party', 'title': 'Ambitions for Britain'}>
Id: 51320_200106 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_200106', 'party': 'Labour Party', 'title': 'Ambitions for Britain'}>
Id: 51320_200106 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_200106', 'party': 'Labour Party', 'title': 'Ambitions for Britain'}>
Id: 51320_200106 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_200106', 'party': 'Labour Party', 'title': 'Ambitions for Britain'}>
Id: 51320_200106 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_200106', 'party': 'Labour Party', 'title': 'Ambitions for Britain'}>
Id: 51320_200106 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_200106', 'party': 'Labour Party', 'title': 'Ambitions for Britain'}>
Id: 51320_200106 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_200106', 'party': 'Labour Party', 'title': 'Ambitions for Britain'}>
Id: 51320_200106 Party: Labour Party
Text:
<Hit {'

<Hit {'manifesto_id': '51320_200106', 'party': 'Labour Party', 'title': 'Ambitions for Britain'}>
Id: 51320_200106 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_200106', 'party': 'Labour Party', 'title': 'Ambitions for Britain'}>
Id: 51320_200106 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_200106', 'party': 'Labour Party', 'title': 'Ambitions for Britain'}>
Id: 51320_200106 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_200106', 'party': 'Labour Party', 'title': 'Ambitions for Britain'}>
Id: 51320_200106 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_200106', 'party': 'Labour Party', 'title': 'Ambitions for Britain'}>
Id: 51320_200106 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_200106', 'party': 'Labour Party', 'title': 'Ambitions for Britain'}>
Id: 51320_200106 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_200106', 'party': 'Labour Party', 'title': 'Ambitions for Britain'}>
Id: 51320_200106 Party: Labour Party
Text:
<Hit {'manife

Id: 51902_200106 Party: Scottish National Party
Text:
<Hit {'manifesto_id': '51902_200106', 'party': 'Scottish National Party', 'title': 'Heart of the manifesto 2001'}>
Id: 51902_200106 Party: Scottish National Party
Text:
<Hit {'manifesto_id': '51902_200106', 'party': 'Scottish National Party', 'title': 'Heart of the manifesto 2001'}>
Id: 51902_200106 Party: Scottish National Party
Text:
<Hit {'manifesto_id': '51902_200106', 'party': 'Scottish National Party', 'title': 'Heart of the manifesto 2001'}>
Id: 51902_200106 Party: Scottish National Party
Text:
<Hit {'manifesto_id': '51902_200106', 'party': 'Scottish National Party', 'title': 'Heart of the manifesto 2001'}>
Id: 51902_200106 Party: Scottish National Party
Text:
<Hit {'manifesto_id': '51902_200106', 'party': 'Scottish National Party', 'title': 'Heart of the manifesto 2001'}>
Id: 51902_200106 Party: Scottish National Party
Text:
<Hit {'manifesto_id': '51902_200106', 'party': 'Scottish National Party', 'title': 'Heart of the mani

Id: 51110_201505 Party: Green Party of England and Wales
Text:
<Hit {'manifesto_id': '51110_201505', 'party': 'Green Party of England and Wales', 'title': 'For the common good. General election manifesto 2015'}>
Id: 51110_201505 Party: Green Party of England and Wales
Text:
<Hit {'manifesto_id': '51110_201505', 'party': 'Green Party of England and Wales', 'title': 'For the common good. General election manifesto 2015'}>
Id: 51110_201505 Party: Green Party of England and Wales
Text:
<Hit {'manifesto_id': '51110_201505', 'party': 'Green Party of England and Wales', 'title': 'For the common good. General election manifesto 2015'}>
Id: 51110_201505 Party: Green Party of England and Wales
Text:
<Hit {'manifesto_id': '51110_201505', 'party': 'Green Party of England and Wales', 'title': 'For the common good. General election manifesto 2015'}>
Id: 51110_201505 Party: Green Party of England and Wales
Text:
<Hit {'manifesto_id': '51110_201505', 'party': 'Green Party of England and Wales', 'title

Text:
<Hit {'manifesto_id': '51110_201505', 'party': 'Green Party of England and Wales', 'title': 'For the common good. General election manifesto 2015'}>
Id: 51110_201505 Party: Green Party of England and Wales
Text:
<Hit {'manifesto_id': '51110_201505', 'party': 'Green Party of England and Wales', 'title': 'For the common good. General election manifesto 2015'}>
Id: 51110_201505 Party: Green Party of England and Wales
Text:
<Hit {'manifesto_id': '51110_201505', 'party': 'Green Party of England and Wales', 'title': 'For the common good. General election manifesto 2015'}>
Id: 51110_201505 Party: Green Party of England and Wales
Text:
<Hit {'manifesto_id': '51110_201505', 'party': 'Green Party of England and Wales', 'title': 'For the common good. General election manifesto 2015'}>
Id: 51110_201505 Party: Green Party of England and Wales
Text:
<Hit {'manifesto_id': '51110_201505', 'party': 'Green Party of England and Wales', 'title': 'For the common good. General election manifesto 2015'

Id: 51210_201505 Party: We Ourselves
Text:
<Hit {'manifesto_id': '51210_201505', 'party': 'We Ourselves', 'title': 'Equality not austerity Comhionannas ní déine. Sinn Féin 2015 Westminster election manifesto'}>
Id: 51320_201505 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_201505', 'party': 'Labour Party', 'title': 'Britain can be better. The Labour Party Manifesto 2015'}>
Id: 51320_201505 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_201505', 'party': 'Labour Party', 'title': 'Britain can be better. The Labour Party Manifesto 2015'}>
Id: 51320_201505 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_201505', 'party': 'Labour Party', 'title': 'Britain can be better. The Labour Party Manifesto 2015'}>
Id: 51320_201505 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_201505', 'party': 'Labour Party', 'title': 'Britain can be better. The Labour Party Manifesto 2015'}>
Id: 51320_201505 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_201505', 'party': 'La

Id: 51320_201505 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_201505', 'party': 'Labour Party', 'title': 'Britain can be better. The Labour Party Manifesto 2015'}>
Id: 51320_201505 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_201505', 'party': 'Labour Party', 'title': 'Britain can be better. The Labour Party Manifesto 2015'}>
Id: 51320_201505 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_201505', 'party': 'Labour Party', 'title': 'Britain can be better. The Labour Party Manifesto 2015'}>
Id: 51320_201505 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_201505', 'party': 'Labour Party', 'title': 'Britain can be better. The Labour Party Manifesto 2015'}>
Id: 51320_201505 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_201505', 'party': 'Labour Party', 'title': 'Britain can be better. The Labour Party Manifesto 2015'}>
Id: 51320_201505 Party: Labour Party
Text:
<Hit {'manifesto_id': '51320_201505', 'party': 'Labour Party', 'title': 'Britain can be

Id: 51340_201505 Party: Social Democratic and Labour Party
Text:
<Hit {'manifesto_id': '51340_201505', 'party': 'Social Democratic and Labour Party', 'title': 'Prosperity not austerity. SDLP Westminster Manifesto 2015'}>
Id: 51340_201505 Party: Social Democratic and Labour Party
Text:
<Hit {'manifesto_id': '51340_201505', 'party': 'Social Democratic and Labour Party', 'title': 'Prosperity not austerity. SDLP Westminster Manifesto 2015'}>
Id: 51340_201505 Party: Social Democratic and Labour Party
Text:
<Hit {'manifesto_id': '51340_201505', 'party': 'Social Democratic and Labour Party', 'title': 'Prosperity not austerity. SDLP Westminster Manifesto 2015'}>
Id: 51340_201505 Party: Social Democratic and Labour Party
Text:
<Hit {'manifesto_id': '51340_201505', 'party': 'Social Democratic and Labour Party', 'title': 'Prosperity not austerity. SDLP Westminster Manifesto 2015'}>
Id: 51340_201505 Party: Social Democratic and Labour Party
Text:
<Hit {'manifesto_id': '51340_201505', 'party': 'Soc

Id: 51421_201505 Party: Liberal Democrats
Text:
<Hit {'manifesto_id': '51421_201505', 'party': 'Liberal Democrats', 'title': 'Manifesto 2015. Stronger Economy. Fairer Society. Opportunity for Everyone.'}>
Id: 51421_201505 Party: Liberal Democrats
Text:
<Hit {'manifesto_id': '51421_201505', 'party': 'Liberal Democrats', 'title': 'Manifesto 2015. Stronger Economy. Fairer Society. Opportunity for Everyone.'}>
Id: 51421_201505 Party: Liberal Democrats
Text:
<Hit {'manifesto_id': '51421_201505', 'party': 'Liberal Democrats', 'title': 'Manifesto 2015. Stronger Economy. Fairer Society. Opportunity for Everyone.'}>
Id: 51421_201505 Party: Liberal Democrats
Text:
<Hit {'manifesto_id': '51421_201505', 'party': 'Liberal Democrats', 'title': 'Manifesto 2015. Stronger Economy. Fairer Society. Opportunity for Everyone.'}>
Id: 51421_201505 Party: Liberal Democrats
Text:
<Hit {'manifesto_id': '51421_201505', 'party': 'Liberal Democrats', 'title': 'Manifesto 2015. Stronger Economy. Fairer Society. Oppo

Id: 51620_201505 Party: Conservative Party
Text:
<Hit {'manifesto_id': '51620_201505', 'party': 'Conservative Party', 'title': 'Strong leadership. A clear economic plan. A brighter, more secure future. The Conservative Party Manifesto 2015'}>
Id: 51620_201505 Party: Conservative Party
Text:
<Hit {'manifesto_id': '51620_201505', 'party': 'Conservative Party', 'title': 'Strong leadership. A clear economic plan. A brighter, more secure future. The Conservative Party Manifesto 2015'}>
Id: 51620_201505 Party: Conservative Party
Text:
<Hit {'manifesto_id': '51620_201505', 'party': 'Conservative Party', 'title': 'Strong leadership. A clear economic plan. A brighter, more secure future. The Conservative Party Manifesto 2015'}>
Id: 51620_201505 Party: Conservative Party
Text:
<Hit {'manifesto_id': '51620_201505', 'party': 'Conservative Party', 'title': 'Strong leadership. A clear economic plan. A brighter, more secure future. The Conservative Party Manifesto 2015'}>
Id: 51620_201505 Party: Cons

Text:
<Hit {'manifesto_id': '51620_201505', 'party': 'Conservative Party', 'title': 'Strong leadership. A clear economic plan. A brighter, more secure future. The Conservative Party Manifesto 2015'}>
Id: 51620_201505 Party: Conservative Party
Text:
<Hit {'manifesto_id': '51620_201505', 'party': 'Conservative Party', 'title': 'Strong leadership. A clear economic plan. A brighter, more secure future. The Conservative Party Manifesto 2015'}>
Id: 51620_201505 Party: Conservative Party
Text:
<Hit {'manifesto_id': '51620_201505', 'party': 'Conservative Party', 'title': 'Strong leadership. A clear economic plan. A brighter, more secure future. The Conservative Party Manifesto 2015'}>
Id: 51620_201505 Party: Conservative Party
Text:
<Hit {'manifesto_id': '51620_201505', 'party': 'Conservative Party', 'title': 'Strong leadership. A clear economic plan. A brighter, more secure future. The Conservative Party Manifesto 2015'}>
Id: 51620_201505 Party: Conservative Party
Text:
<Hit {'manifesto_id': 

Id: 51901_201505 Party: The Party of Wales
Text:
<Hit {'manifesto_id': '51901_201505', 'party': 'The Party of Wales', 'title': 'Working for Wales. 2015 Westminster Election Manifesto'}>
Id: 51901_201505 Party: The Party of Wales
Text:
<Hit {'manifesto_id': '51901_201505', 'party': 'The Party of Wales', 'title': 'Working for Wales. 2015 Westminster Election Manifesto'}>
Id: 51901_201505 Party: The Party of Wales
Text:
<Hit {'manifesto_id': '51901_201505', 'party': 'The Party of Wales', 'title': 'Working for Wales. 2015 Westminster Election Manifesto'}>
Id: 51901_201505 Party: The Party of Wales
Text:
<Hit {'manifesto_id': '51901_201505', 'party': 'The Party of Wales', 'title': 'Working for Wales. 2015 Westminster Election Manifesto'}>
Id: 51901_201505 Party: The Party of Wales
Text:
<Hit {'manifesto_id': '51901_201505', 'party': 'The Party of Wales', 'title': 'Working for Wales. 2015 Westminster Election Manifesto'}>
Id: 51901_201505 Party: The Party of Wales
Text:
<Hit {'manifesto_id':

Text:
<Hit {'manifesto_id': '51902_201505', 'party': 'Scottish National Party', 'title': 'Stronger for Scotland. SNP Manifesto 2015'}>
Id: 51902_201505 Party: Scottish National Party
Text:
<Hit {'manifesto_id': '51902_201505', 'party': 'Scottish National Party', 'title': 'Stronger for Scotland. SNP Manifesto 2015'}>
Id: 51902_201505 Party: Scottish National Party
Text:
<Hit {'manifesto_id': '51902_201505', 'party': 'Scottish National Party', 'title': 'Stronger for Scotland. SNP Manifesto 2015'}>
Id: 51902_201505 Party: Scottish National Party
Text:
<Hit {'manifesto_id': '51902_201505', 'party': 'Scottish National Party', 'title': 'Stronger for Scotland. SNP Manifesto 2015'}>
Id: 51902_201505 Party: Scottish National Party
Text:
<Hit {'manifesto_id': '51902_201505', 'party': 'Scottish National Party', 'title': 'Stronger for Scotland. SNP Manifesto 2015'}>
Id: 51902_201505 Party: Scottish National Party
Text:
<Hit {'manifesto_id': '51902_201505', 'party': 'Scottish National Party', 'titl

Id: 51951_201505 Party: United Kingdom Independence Party
Text:
<Hit {'manifesto_id': '51951_201505', 'party': 'United Kingdom Independence Party', 'title': 'Believe in Britain. UKIP Manifesto 2015'}>
Id: 51951_201505 Party: United Kingdom Independence Party
Text:
<Hit {'manifesto_id': '51951_201505', 'party': 'United Kingdom Independence Party', 'title': 'Believe in Britain. UKIP Manifesto 2015'}>
Id: 51951_201505 Party: United Kingdom Independence Party
Text:
<Hit {'manifesto_id': '51951_201505', 'party': 'United Kingdom Independence Party', 'title': 'Believe in Britain. UKIP Manifesto 2015'}>
Id: 51951_201505 Party: United Kingdom Independence Party
Text:
<Hit {'manifesto_id': '51951_201505', 'party': 'United Kingdom Independence Party', 'title': 'Believe in Britain. UKIP Manifesto 2015'}>
Id: 51951_201505 Party: United Kingdom Independence Party
Text:
<Hit {'manifesto_id': '51951_201505', 'party': 'United Kingdom Independence Party', 'title': 'Believe in Britain. UKIP Manifesto 201

In [48]:
#try several scoring criteria for the ranking

def searchQuery(arg, index, w):
    #arg: query string; index: Whoosh index to search; w: weighting (scoring) model
    with index.searcher(weighting = w) as searcher:
        print("Using the scoring criteria:", w)
        #search the title, text and party fields;
        #OrGroup -> a document matches if any of the query terms is present
        query_parser = MultifieldParser(["title","text", "party"], schema=index.schema, group=OrGroup)
        query = query_parser.parse(arg)
        #search() takes a query object and returns a results object;
        #by default it holds at most the first 10 matching documents, limit=None returns all
        results = searcher.search(query, limit=None)
        print ("Number of results:", results.scored_length())
        #results come back ordered by descending score under the weighting w.
        #BM25F considers not only the term frequency and inverse document
        #frequency heuristics, but also the document length as a
        #normalization factor for the term frequency
        docs = [result.docnum for result in results]
        print("Documents that match with query ordered by score:", docs)
        
        
#content_B sets B only for a field named "content", which this schema does not have, so it has no effect here
w = scoring.BM25F(B=0.75, content_B=1.0, K1=1.5) #K1 raised above Whoosh's default of 1.2
searchQuery("world school primary", index, w)
w = scoring.BM25F(B=0.75, content_B=1.0, K1=1.2) #B=0.75 and K1=1.2 are Whoosh's defaults for BM25F
searchQuery("world school primary", index, w)
w = scoring.TF_IDF()
searchQuery("world school primary", index, w)
w = scoring.Frequency()
searchQuery("world school primary", index, w)

#each weighting may return the matching documents in a different order

Using the scoring criteria: <whoosh.scoring.BM25F object at 0x7efedb4bc4a8>
Number of results: 972
Documents that match with query ordered by score: [2481, 9308, 1922, 14421, 7692, 2007, 2482, 9316, 9333, 11626, 2434, 3778, 74, 1887, 1889, 10716, 11327, 11356, 13307, 7932, 9383, 11361, 2500, 15761, 2630, 15704, 2486, 3930, 11338, 2016, 3097, 11364, 11359, 5260, 9361, 11368, 11627, 14658, 14795, 32, 76, 5226, 1670, 1373, 12819, 2461, 5496, 15758, 15752, 15741, 3753, 3919, 12648, 12810, 3421, 1873, 11375, 3240, 5459, 16358, 86, 8166, 99, 1654, 11562, 8464, 9270, 4212, 2460, 7665, 3435, 2539, 2012, 7651, 8357, 15707, 15411, 2496, 9356, 10715, 11403, 12868, 3758, 3765, 5580, 6829, 2549, 9431, 3396, 8724, 3406, 4598, 2488, 2489, 2595, 3784, 5473, 5508, 8488, 11522, 14782, 15885, 16359, 2011, 2568, 9348, 9357, 11370, 11407, 13939, 69, 2490, 4304, 15769, 7660, 910, 1678, 1711, 1741, 1763, 1774, 2191, 2274, 3285, 4617, 6363, 7343, 10695, 14509, 2661, 3179, 8360, 8388, 5615, 9441, 11398, 13346,
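
To see what the `K1` and `B` parameters control, here is a sketch of the textbook Okapi BM25 score for a single query term. It is an assumption that this matches the general shape of what Whoosh computes; it is not taken from Whoosh's source, and the function name and all the numbers are illustrative.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 contribution of one term to one document's score."""
    # Rare terms (low doc_freq) get a higher inverse document frequency.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # b controls how strongly longer-than-average documents are penalised.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # k1 controls how quickly repeated occurrences stop adding to the score.
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


# Illustrative numbers only: 16726 documents, term found in 300 of them.
common = dict(avg_doc_len=100, n_docs=16726, doc_freq=300)

short_doc = bm25_term_score(tf=2, doc_len=50, **common)
long_doc = bm25_term_score(tf=2, doc_len=400, **common)
once = bm25_term_score(tf=1, doc_len=100, **common)
twice = bm25_term_score(tf=2, doc_len=100, **common)

print("short vs long document:", short_doc, long_doc)
print("tf=1 vs tf=2:", once, twice)
```

With the same term frequency the shorter document scores higher, and doubling the term frequency less than doubles the score. Setting `b=0` removes the length penalty entirely, which is one reason different parameter choices can reorder the results.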

In [None]:
#For each party, how many manifestos are in the results returned

#TODO
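
One possible way to approach this TODO, sketched on plain dictionaries. The `hits` list is hypothetical and stands in for the stored fields of the search results; `manifestos_per_party` is a name introduced for the example. Distinct `manifesto_id` values are counted, because the same manifesto can appear in many hits.

```python
from collections import defaultdict

# Hypothetical hits holding the stored fields of each result.
hits = [
    {"manifesto_id": "51320_201505", "party": "Labour Party"},
    {"manifesto_id": "51320_201505", "party": "Labour Party"},
    {"manifesto_id": "51320_200106", "party": "Labour Party"},
    {"manifesto_id": "51421_201505", "party": "Liberal Democrats"},
]


def manifestos_per_party(hits):
    """Count the distinct manifestos of each party among the hits."""
    seen = defaultdict(set)
    for hit in hits:
        seen[hit["party"]].add(hit["manifesto_id"])
    return {party: len(ids) for party, ids in seen.items()}


counts = manifestos_per_party(hits)
print(counts)
```

Since the search results can be read with `result['party']` and `result['manifesto_id']`, the same loop should work directly on them inside the `with` block of the searcher.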

In [None]:
#How many times each party mentions each keyword

#TODO
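
A possible starting point for this TODO, again on hypothetical data. The rows and their texts are made up, `keyword_mentions` is a name introduced for the example, and matching is on exact lowercase words with no stemming, so "schools" does not count as a mention of "school".

```python
import re
from collections import Counter, defaultdict

# Hypothetical rows with made-up texts.
rows = [
    {"party": "Labour Party",
     "text": "The world needs schools. A school for every child in the world."},
    {"party": "Liberal Democrats",
     "text": "Primary education and primary care."},
]


def keyword_mentions(rows, keywords):
    """Return {party: Counter({keyword: number of mentions})}."""
    wanted = {keyword.lower() for keyword in keywords}
    counts = defaultdict(Counter)
    for row in rows:
        for word in re.findall(r"[a-z]+", row["text"].lower()):
            if word in wanted:
                counts[row["party"]][word] += 1
    return counts


mentions = keyword_mentions(rows, ["world", "school", "primary"])
for party, counter in mentions.items():
    print(party, dict(counter))
```

To count stemmed forms as well, the words and keywords would have to go through the same analyzer that was used to build the index.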