# Introduction

Product reviews have become important part in decision making when considering online purchasing. People look at product descriptions and specifications to determine whether a product is a good fit to their requirement. However, in most cases, people have some specific question in their mind, e.g. “will this baby seat fit in the overhead compartment of 747?” which cannot be answered by product description as they contain generic information, and are not comprehensive i.e. missing crucial information that user needs. Consumer reviews, on the other hand, have a rich set of diversified information, based on each consumer’s individual experience. However, there can be many reviews associated with a product, which makes looking at every review time consuming

Automatic answering of user query through plethora of consumer reviews is the hot topic of research in the Information Retrieval community. Can we design a system to output relevant reviews based on the user need?  We need a system to automatically answer product related binary (yes/No) queries from an existing dataset of consumer reviews. We plan to implement this by determining the relevance of a review with respect to a query [1] based on existing product reviews and then finding user sentiment for top relevant reviews to conclude the answer as yes or no.

### For instance,

**Product:** BRAVEN BRV-1 Wireless Bluetooth Speaker

**Query:** "I want to use this with my iPad air while taking a jacuzzi bath. Will the volume be loud enough over the bath jets?"

**Customer opinions ranked by relevance and vote:**

* The sound quality is great, especially for the size, and if you place the speaker on a hard surface it acts as a sound board, and the bass really kicks up: **yes** 
* If you are looking for a water resistant blue tooth speaker you will be very pleased with this product: **yes** 
* However, if you are looking for something to throw a small party this just doesn’t have the sound output: **No**
etc. etc.

In the above scenario, we ranked the reviews based on their relevance score and used user sentiments to classify each review into yes/no. We use voting to conclude the answer as yes/no.


# Problem statement

Given a query $Q$ and set of reviews $R$ corresponding to a product $P$. We need to answer the query with the help of reviews as yes/no.

# Related Work

Our project aims to be at the interface of question-answering and opinion mining. Some previous work in relevance ranking includes word-level similarity approaches like Okapi BM25[2] and TF-IDF [3], grammatical rules and phrase-level approaches like ROGUE [4], and classical probabilistic language model [5]. They focused on learning the relevance or importance of some document, although the information learnt had not been used to answer queries. The work in the project is mainly focused on being as agnostic as possible in evaluating the relevance of an opinion, and thus a review in our case, to a query; and thus, training to learn this notion automatically from the review data accumulated. It is also differentiated from the existing Q/A techniques like Heterogeneous Model to decompose questions [6], and Identify expert and high quality answers [7,8,9] which are not centered around finding answers primarily, but rather to provide a relevance function that will help users navigate through a huge corpus of reviews and effectively find their answers based on most relevant subjective viewpoints and individual experiences.

# Approach / Methods

To address our goal, we’ll need a system with two components: (1) A relevance function, to determine which reviews contain information relevant to a query, and (2) a prediction function, allowing relevant reviews to 'vote' on the correct answer. Relevant reviews can act as expert to give opinion on the given query and we use prediction function to vote for yes/no.

1. Relevance Function:
    The reviews are characterized based on their expertise to provide a relevant answer to the corresponding binary question asked. The scoring function used to evaluate the relevance of a review can technically be any of the general similarity measures available to us; i.e. cosine similarity, ROGUE, BM25+ etc, or a mixture of these measures weighted through a voting scheme.
    
    The original paper mentions the relevance function as a parameterized scoring function:

    $s(r,q) = \phi (r,q).\theta  + \psi (q)M\psi (r)^{T}$

    Where, the first term $∅(r,q).θ$ is the pairwise similarity score taken by one of the general similarity measures, and $ψ(q)M〖ψ(r)〗^T$ is the bilinear scoring function.
    
    In our implementation we are using variance of state-of-the-art like TF-IDF method called BM25+, RougueL for finding logest common sequence and Bilinear model for considering synonyms.
    
2. Prediction function: 
    Given the relevance scores, we now need to have a voting scheme by which each review casts a Yes/No vote for the given question. Each question-review pair has a score associated with it, and these scores are weighted based on their relevance to the question. Each review gives a vote to the question, and the final score is a vote in favor of either a positive or negative answer. Here we are finding polarity of the review based on the setiments expressed in the review text. 

3. Mixture of experts
    The mixture of experts model treats each review as an expert that votes either in favor or opposition of a response of Yes/No. It considers both the relevance of a review, or it’s confidence in its expertise, a review is considered to be an 'expert' in our case; and the vote that each such 'expert' casts.
    The final classification can then be expressed as:
$$P('Yes'|q) = \sum_{r\in R} P(r,q).P('Yes'|r,q)$$
Where $P(r,q)$ = confidence in the ability of each review $r$ to classify question $q$ and $P(Yes|r,q)$ = the prediction of review $r$ to classify the response to question $q$ as $Yes$

4. Non Binary answers: 
    User may need non-binary answers for a query. Finding relevant reviews that satisfy user need is important. LDA (Latent Dirichlet Allocation) is a way to automatically discover topics for any sentence. It can be used to divide question and all reviews into topics. Based on the question’s topic distribution, we can rank reviews where top reviews will have similar topic distribution as question.
    LSA (Latent Semantic Analysis) can be used to group questions and reviews into concept(lower dimensional space).  Based on the question’s concept distribution, we can rank reviews where top reviews will have similar concept distribution as question.

### Data set collections

We used data from Amazon consisting of questions and reviews. Dataset has 1.4 million questions (and answers) on 191 thousand products, about which we have over 13 million customer reviews.

Our formatted review dataset is in the following format:

A3AEL89BFOJIU2 B005HKEJO6 4.0 1387670400 33 very nice and relatively easy to install . very fair price for what it is . led lighting is cool . a little loud . . . but that’s just being picky . Very nice and relatively easy to install. Very fair price for what it is. LED lighting is cool. A little loud...but that's just being picky.

<img src="img_1.png">

It should be noted here that the ‘Review Summary’ contains the actual review twice, once in raw text, and again in the form formatted for our project – with casefolding implemented. The count of total no of words in the review gives us the count of words to be extracted.

The formatted question-answer dataset is in the following format:
Question:
B005HKEJO6 Q YN 6011 6 does this vent to the outside does this vent to the outside

<img src="img_2.png">

Answer:
B005HKEJO6 A Y 6011 7 yes in my case through the ceiling Yes, in my case through the ceiling

<img src="img_3.png">




### Technical contributions

1. We implemented relevance function to find out relevant reviews per question. These reviews will act as experts to vote for negative or positive answer. We used combination of BM25+, Rogue, Cosine and Bilinear to train the model. 
2.	We used voting function to find user sentiments, we classify the review into positive or negative sentiments and use top relevant reviews as experts to conclude the answer as yes or no.
3.	To find top 5% of relevant reviews for non binary answer questions we are using LDA and LSI model which plots reviews and questions into same topics and concepts. 

### Challenges
1. Implementing bilinear relevance function described in the paper was difficult due to lack of information on paper and scarcity of material available on internet.
2. Finding out a different voting scheme to use relevant reviews as experts to answer the question as Yes/No.
3. Finding the appropiate model to train reviews to answer open ended questions.
4. Handling duplicates in questions and reviews dataset.
5. Handling questions with no review datasets.

We have used sentiments analysis to vote for positive or negative answer whereas paper uses bilinear model for same.

# Evaluation.

### Quantitative

In our model we are dividing datasets into 90% training and 10% testing. In the training we are applying weight measure on pairwise similarity and bilinear model. 

1)      We use accuracy as a metric to evaluate our performance to predict yes/no answer to a query. We considered 5 datasets to train our model and tested it against ground truth.  Average accuracy indicates how good our system is in predicting the answer to binary query. We are getting good accuracy which shows that we are successful in predicting yes/no answer to a query.

 

2)      We measured our performance with respect to measurements given in the paper. We can see both results are comparable.

 

3)      We also compared our results with existing baseliners mentioned in the paper and found that our model is performing well as compared to a combination of weighted BM25+ and rogue implementation.


### Qualitative (Shining part)

To model non binary question answers LSI and LSA is used. We have used ad-hoc method to evaluate these two methods. We predict following question for product "InSinkErator Badger 1, 1/3 HP Household Food Waste Disposer" by both model using 200 reviews and returned top 10 most relevant reviews.

From LSI:
1. review 1 & 2 talks about the duration customer is using thias product, thus justifying question.
2. review 3 is positive review but does not talk about durability issue.
3. review 4 & 5 also talks about the durability.
Thus, we can say LSI return majority of relevant reviews.

From LDA:
All are relevant as they all talk about duration this product lasted but indirectly.

Thus, it can be said that both LDA and LSI model can be used to sort reviews according to the relevance with question.

LSI: "Is it long lasting?"

Answers:
1. this unit came with my new constructed house . it finally went out after 12 . 5 years . i hope my new one last as long . 

2.  this model disposal was in my home when we first moved in and lasted us 9 years before it went on the blink and started to leak from the bottom . worked very well but a little noisy . replaced it with insinkerator badger 5 12 hp that has more power and is quieter hope its as reliable . 

3.  works great . great price 

4.  bought this disposal to replace an old badger 1 that id used for over 10 years . it came with the house so it could be much older . this one lasted less than a year and a half . i hope i just got a bad one but after looking at the other reviews i doubt it . im thinking their quality has gone downhill over the years at least with the badger line . just bought an evolution excel

5.  this was an exact replacement for what i already had which was in the house when i purchased it over 12 years ago . i didnt have to change any of the drain mounting hardware . hopefully this one will last as long . 

LDA: "Is it long lasting?"

Answer:
1. after buying 3 of these in 15 years ive started to see a pattern . . my latest badger 1 started leaking through the various screwcircuit breaker holes in the bottom . on the previous ones owned i had a permanent leak on one and an unclearablejam on the other . all of these cheaper badger insinkerators have galvanized steel rotators and stainless blades . . . what that means is in about a month or two when you look into the unit it will seem horribly rusted because.........6 oz compared to 26 oz for the badger 1 more hp and a 4 year warranty . if i get 10 years out of this one it would be the same cost as two badger 1s lasting 5 years each . so you can decide for yourself . if you live in a softerwater area or dont demand much from your disposal or are a landlord this one may be for you . but based on my experience and apparently that of several other people in the reviews here you do seem to be giving up some longerterm reliability with the lessexpensive models . 

2.  i was cooking and grinding up some chicken bones when i slipped on the floor in a huge puddle of water . the water had filled my entire under sink area and flowed out onto the floor ruining my cabinet bases . after cleaning up the mess i took my bader apart for find the inside a rusted and crusty mess . the nut holding the grinding plate desinagrated when i tried to remove it with a socket wrench . . . . my thoughts of repairing the unit were a joke . the housing under the grinder plate was cracked and rusted beyond belief . a very sad pos is what the badger disposal is . . . . not a deal at any price . 

3.  worked well for about 6 years . then it silently damaged the underthesink cabinet before we realized it was leaking from the bottom screws . must be made with substandard materials . no where in the installation manual does it warn about this potential problem . company has no integrety . my suggestion to who ever built this thing would be to put in a more durable bottom . and this would not require educated engineers . just someone with common sense . 

4.  personally used this for about 3 12 years in my condo that was built in late 06 install date rusted through the bottom casing last night with two holes about the diameter of pencil lead and two more very soon to come . disposal was always noisy but seemed to have enough power for the basic tasks i would need it for . i was satisfied w the disposal but i would not buy again . 

5.  i thought the leaking was caused by worn out seals . no its caused by rusted out metal . the bottom of the end bell assembly which seperates the motor from the shredder is completely rusted . there is one hole already . i can poke through the iorn sheet at several other places . in other words the disposal can at most last about 4 years under light use . 





# Conclusion 

The problem of question-answering, and more specifically using community sourced data to answer user questions has fascinated the researchers of Information Retrieval for a long time. Answering questions has advanced from evaluating simple probabilistic models such as Naïve Bayes on a bag-of-words model, to using advanced concepts from word embeddings to deep learning. We approached our problem statement with the intention of learning and implementing various information retrieval techniques, gaining exposure on how different relevance functions vary as per features of the dataset, the data size, nature etc. We implemented mixture-of-experts model on our binary question – answer dataset, and found that not only implementing a parameterized scoring function can be trained to learn the relevance of reviews, and subsequently cast vote with proportional confidence; but also saw that by using a bilinear model to transform two different word feature spaces to a common dimensional space improves our scoring function. We used LDA/LSA to divide the documents into lower dimensional space and learnt about dimensionality reduction. 

We took the help of various IR concepts that we learnt in class. We used weighted tf-idf scores to represent the question and review feature vectors. We used cosine similarity as a relevance score metric for binary question-answers; and used LSI and LDA to extract topic information from questions and reviews.

Finally, in our evaluation we used accuracy metric to baseline and evaluate our classification model.

There are some areas which can be used to extend the project. We are currently using bilinear transformation to depict similarity between the word features, we can use other techniques such as word2vec, word embeddings etc. These are known to be efficient for huge datasets, and for datasets which have a lot of associated sentences. These techniques can also be useful in dimensionality reduction, which will consequently be useful in improving both the time complexity and space complexity.


# Code Walkthrough


This model is divided into two part, binary question and non binary question.

### Binary questions

In [77]:
from collections import defaultdict
import random
import math
import operator
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import nltk

### Create Corpus

We extracted relevant information from the question and review dataset. Relevant information includes Product ID, question ID, review ID, cleaned questions and review text. Dataset have both open ended and binary question. Here we are considering only binary questions.

In [78]:
class corpus:
    def __init__(self, quesfile, reviewfile):
        self.question = dict()
        self.review = dict()
        self.data = dict()
        self.count = 0
        self.test_question = dict()
        self.num_lines = 0
        self.reviewfile = reviewfile
        self.quesfile = quesfile

    def create_dict(self):
        num_lines = sum(1 for line in open(self.quesfile))
        q = questions(self.quesfile, (0.90*self.num_lines)/2, self.data)
        # taking care of questions from Amazon data
        self.count, self.data, self.question, self.test_question= q.make_ques()
        r = reviews(self.reviewfile, self.question, self.data, self.count)
        # taking care of reviews from Amazon data
        self.review, self.data = r.make_review()

        unwanted = list()
        for item in self.question:
            if item not in self.review:
                unwanted.append(item)

        for item in unwanted:
            self.question.pop(item, 0)
            if item in self.test_question:
                self.test_question.pop(item, 0)


In [79]:
import itertools

class questions:
    def __init__(self, filename, ques_count, data):
        self.ques = set()
        self.tuning = 5000
        self.question = dict()
        self.test_question = dict()
        self.file = filename
        self.count = 0
        self.ques_count = ques_count
        self.data = data
    
        # Handling Question/answers here
    def make_ques(self):
        c = 0.0
        # opening questions file from amazon data
        with open(self.file) as fp:
            for line in itertools.islice(fp, 1, self.tuning):
                words = line.split()
                if words[1] != "A":
                    c += 1
                # Not taking open ended question for now, only taking YN(binary) questions
                if len(words) == 0 or words[2] == "O" or words[2] == "?":
                    continue
                if words[3] in self.ques:
                    continue
    
                self.ques.add(words[3])
    
                # product id and looking for questions
                if words[0] not in self.question and words[1] != "A":
                    self.question[words[0]] = list()
                if words[0] not in self.test_question and words[1] != "A" and c > self.ques_count:
                    self.test_question[words[0]] = list()
                # Handling anwers here
                # looking for answer value, default answer is stored as False, making it True
                if words[1] == "A" and words[2] == "?":
                    if words[0] not in self.question:
                        continue
                    self.question[words[0]].pop()
                    if c > self.ques_count and words[0] in self.test_question and len(self.test_question[words[0]]) !=0 :
                        self.test_question[words[0]].pop()
                    continue

                if words[1] == "A" and words[2] == "N":
                    # In case there is no question corresponsing to answer
                    if words[0] not in self.question:
                        continue
                    # Storing answer as True for "Y" answers
                    self.question[words[0]][-1]["A"] = False
                    if c > self.ques_count and words[0] in self.test_question and len(self.test_question[words[0]]) !=0 :
                        self.test_question[words[0]][-1]["A"] = False
                    continue
                # Handling questions here
                if words[1] == "Q" and words[2] == "YN":
    
                    # for each question in an item
                    freq = dict()
                    q = list()
                    # answer is False by default
                    freq["A"] = True
                    # Storing all words correpsonding to questions in a dictionary
                    freq["words"] = dict()
                    # To store sentence Id for Rogue implementation
                    freq["sentence"] = list()
                    # Question ID
                    freq["ID"] = words[3]
                    # traversing questions word by word
                    for i in range(5, 5 + int(words[4])):
                        if words[i] == ".":
                            continue
                        # Each word is represented by an ID
                        if words[i] not in self.data:
                            self.data[words[i]] = self.count
                            self.count += 1
                        id = self.data[words[i]]
                        # Increasing frequency of each word, initial 1 when first appear for first time in question
                        if id not in freq["words"]:
                            freq["words"][id] = 1
                        else:
                            freq["words"][id] += 1
                        # for Rogue
                        freq["sentence"].append(id)
                    # appending freq for each question corresponding to item
                    self.question[words[0]].append(freq)
                    if c > self.ques_count:
                        self.test_question[words[0]].append(freq)
        
        return self.count, self.data, self.question, self.test_question
        

In [80]:
# To populate dictionary
import itertools

class reviews:
    def __init__(self, filename, question, data, count):
        self.rev = set()
        self.review = dict()
        self.file = filename
        self.count = count
        self.learn = 50000
        self.data = data
        self.question = question
    
    # handling review part here
    def make_review(self):
        # opening Review data file from amazon data
        with open(self.file) as fp:
            for line in itertools.islice(fp, 1, self.learn):
                words = line.split()
                # if there is no questions for product, not considering reviews
                if len(words) == 0 or words[1] not in self.question:
                    continue
                if words[0] in self.rev:
                    continue
    
                self.rev.add(words[0])
                # If product Id is not present in review
                if words[1] not in self.review:
                    self.review[words[1]] = list()
                # considering each review
                freq = dict()
                # words in review
                freq["words"] = dict()
                # Id correpsond to all words in sequence in a review for Rogue
                freq["sentence"] = list()
                freq["actual_sen"] = ""
                # Review id
                freq["ID"] = words[0]
                # traversing all the words in a review
                for i in range(5, 5 + int(words[4])):
                    if words[i] == ".":
                        continue
                    # Assigning id to each word in whole review
                    if words[i] not in self.data:
                        self.data[words[i]] = self.count
                        self.count += 1
                    id = self.data[words[i]]
                    # storing freq of each word
                    if id not in freq["words"]:
                        freq["words"][id] = 1
                    else:
                        freq["words"][id] += 1
                    # sentence will have id for each word
                    freq["sentence"].append(id)
                    freq["actual_sen"] += words[i] + " "
                # appending to the list of review per product
                self.review[words[1]].append(freq)
                
        return self.review, self.data

### Relevant reviews

Here we are finding revelance between query and review. We want our scoring function to be parameterized so that we can learn from training data what constitutes a 'relevant' review. Thus we define a parameterized scoring functin by calculating following:

Cosine is used as feature for bilinear model.

In [81]:
class similarity_fact:
    
    def __init__(self, review, question):
        self.review_data = review
        self.question_data = question

    def evaluate_cosine_similarity(self):
        tf_scores_questions = self.evaluate_tf_questions(self.question_data)
        tf_scores_reviews = self.evaluate_tf_reviews(self.question_data, self.review_data)
        idf_scores = self.evaluate_idf(self.question_data, self.review_data)
        tf_idf_questions = self.evaluate_tf_idf_qstns(tf_scores_questions, idf_scores)
        tf_idf_reviews = self.evaluate_tf_idf_reviews(tf_scores_questions, tf_scores_reviews, idf_scores)
        cosine = self.evaluate_cosine(tf_idf_questions, tf_idf_reviews)
        return cosine,tf_idf_questions,tf_idf_reviews
    
    def evaluate_cosine(self, tf_idf_questions, tf_idf_reviews):
        cosine_score = defaultdict()
        for prod_key in tf_idf_questions:
            if prod_key in tf_idf_reviews:
                cosine_score[prod_key] = defaultdict(lambda: defaultdict())
                for question_id,each_question in tf_idf_questions[prod_key].iteritems():
                    question_length = 0
                    temp_tf_idf_multiplied = defaultdict(lambda: 0)
                    for key1, question_tf_idf in each_question.iteritems():
                        question_length += math.pow(question_tf_idf,2)
                    question_length = math.sqrt(question_length)
                    for review_id,each_review in tf_idf_reviews[prod_key].iteritems():
                        review_length = 0
                        score = 0
                        for key2,question_tf_idf in each_question.iteritems():
                            if key2 not in each_review:
                                review_tf_idf = 0
                            else:
                                review_tf_idf = each_review[key2]
                            score += question_tf_idf*review_tf_idf
                        for key3, review_tf_idf in each_review.iteritems():
                            review_length += math.pow(review_tf_idf,2)
                        review_length = math.sqrt(review_length)
                        if review_length == 0 or question_length == 0:
                            temp_tf_idf_multiplied[review_id] = 0
                        else:
                            temp_tf_idf_multiplied[review_id] = score/(review_length*question_length)
                    cosine_score[prod_key][question_id] = temp_tf_idf_multiplied
        return cosine_score
    
    
    def evaluate_tf_idf_reviews(self, tf_scores_questions, tf_scores_reviews, idf_scores):
        tf_idf_score = defaultdict()
        for prod_key in tf_scores_questions:
            if prod_key in tf_scores_reviews:
                tf_idf_score[prod_key] = defaultdict()
                review_id = -1
                for r_id,each_review in tf_scores_reviews[prod_key].iteritems():
                    review_id += 1
                    scores = defaultdict(lambda: 0)
                    for key, value in each_review.iteritems():
                        if prod_key in idf_scores:
                            idf_value = idf_scores[prod_key][key]
                        else:
                            idf_value = 0
                            idf_scores[prod_key][key] = idf_value
                        tf_idf = value * idf_value
                        scores[key] = tf_idf
                    # tf_idf_score[prod_key].append(scores)
                    tf_idf_score[prod_key][r_id] = scores
        return tf_idf_score
    
    
    def evaluate_tf_idf_qstns(self, tf_scores_questions, idf_scores):
        tf_idf_score = defaultdict()
        for prod_key in tf_scores_questions:
            if prod_key not in tf_idf_score:
                tf_idf_score[prod_key] = defaultdict()
            question_id = -1
            for q_id,each_question in tf_scores_questions[prod_key].iteritems():
                question_id += 1
                scores = defaultdict(lambda: 0)
                for key, value in each_question.iteritems():
                    if prod_key in idf_scores and key in idf_scores[prod_key]:
                        idf_value = idf_scores[prod_key][key]
                    else:
                        idf_value = 0
                        # idf_scores[prod_key][key] = idf_value
                    tf_idf = value * idf_value
                    scores[key] = tf_idf
                tf_idf_score[prod_key][q_id] = scores
                # tf_idf_score[prod_key].append(scores)
        return tf_idf_score
    
    
    def evaluate_tf_reviews(self,question_data, review_data):
        tf_scores = defaultdict()
        for prod_key, question in question_data.iteritems():
            if prod_key in review_data:
                if prod_key not in tf_scores:
                    tf_scores[prod_key] = defaultdict()
                for review in review_data[prod_key]:
                    term_count = defaultdict(lambda: 0)
                    words = review['words']
                    r_id = review['ID']
                    for key, count in words.iteritems():
                        term_count[key] = 1 + math.log(float(count))
                    tf_scores[prod_key][r_id] = term_count
                    # tf_scores[prod_key] = term_count
        return tf_scores
    
    
    def evaluate_tf_questions(self,question_data):
        tf_scores = defaultdict()
        for prod_key, question in question_data.iteritems():
            if prod_key not in tf_scores:
                tf_scores[prod_key] = defaultdict()
            for each_question in question:
                term_count = defaultdict(lambda: 0)
                words = each_question['words']
                q_id = each_question['ID']
                for key, count in words.iteritems():
                    term_count[key] = 1 + math.log(float(count))
                tf_scores[prod_key][q_id] = term_count
        return tf_scores
    
    
    def evaluate_idf(self,question_data, review_data):
        collection_freq = defaultdict()
        for prod_key, question in question_data.iteritems():
            term_count = defaultdict(lambda: 0)
            review_count = 0
            if prod_key in review_data:
                for review in review_data[prod_key]:
                    review_count += 1
                    words = review['words']
                    for key, count in words.iteritems():
                        term_count[key] += 1
                term_count['review_count'] = review_count
                collection_freq[prod_key] = term_count
    
        idf_scores = defaultdict()
        for prod in collection_freq:
            idf = defaultdict(lambda: 0)
            term_count = collection_freq[prod]
            for key, count in term_count.iteritems():
                idf_value = math.log(float(term_count['review_count']) / float(count))
                idf[key] = idf_value
            idf_scores[prod] = idf
        return idf_scores

Bilinear Model: To give same weightage to two words who are synonym of each other, we make use of biliear transformation model.
    It makes use of matrix where each cell[i][j] represents relationship between ith and jth word of the vocabulary.

In [82]:
def evaluate_bilinear(cosine, tf_idf_questions, tf_idf_reviews):
    bilinear_score = defaultdict(lambda: defaultdict())
    for prod_key in cosine:
        bilinear_score_per_product = evaluate_bilinear_per_product(cosine[prod_key], tf_idf_questions[prod_key], tf_idf_reviews[prod_key])
        bilinear_score[prod_key] = bilinear_score_per_product
    return bilinear_score

def evaluate_bilinear_per_product(cosine_product, tf_idf_questions_product, tf_idf_reviews_product):
    weights_questions, weights_reviews, d, dx, dy = preprocessing_per_product(cosine_product, \
                                        tf_idf_questions_product, tf_idf_reviews_product)
    beta = gamma = 0.01
    theta_x = theta_y = 1
    L_x, L_y = RMLS_per_product(weights_questions,weights_reviews,d,beta,gamma,theta_x, theta_y)
    bilinear_per_prod = calculate_score(tf_idf_questions_product,L_x,L_y,tf_idf_reviews_product,d,dx,dy)
    return bilinear_per_prod

def calculate_score(tf_idf_questions,L_x, L_y,tf_idf_reviews,d,dx,dy):
    final_score_list = defaultdict(lambda :defaultdict())
    for each_question in tf_idf_questions:
        score_sum = 0
        prod1 = [0] * d
        for each_question_word in tf_idf_questions[each_question].keys():
            for i in range(0,d):
                prod1[i]+= tf_idf_questions[each_question][each_question_word]*L_x[each_question_word][i]
        prod2 = defaultdict(lambda :0)
        for each_review_word in L_y.keys():
            for j in range(0,d):
                prod2[each_review_word] += L_y[each_review_word][j]*prod1[j]
        score_list = defaultdict(lambda :0)
        for each_review in tf_idf_reviews:
            score = 0
            for each_review_word in tf_idf_reviews[each_review]:
                score += prod2[each_review_word]*tf_idf_reviews[each_review][each_review_word]
            score_list[each_review] = score
            score_sum += math.pow(score,2)
        for each_score in score_list:
            if score_sum != 0:
                score_list[each_score] = score_list[each_score]/math.sqrt(score_sum)
        final_score_list[each_question]=score_list
        return final_score_list



def RMLS_per_product(weights_questions, weights_reviews, d, beta, gamma, theta_x, theta_y):
    # initialize L_x
    L_x = defaultdict()
    L_y = defaultdict()
    for question_word in weights_questions:
        l_xu = [random.random() for _ in range(d)]
        L_x[question_word] = l_xu
    for review_word in weights_reviews:
        l_yv = [random.random() for _ in range(d)]
        L_y[review_word] = l_yv

    T = 10 #setting the convergence limit
    t = 0
    while t<=T:
        t += 1
        for question_word in weights_questions:
            omega_u = [0] * d
            for review_word in weights_reviews:
                col = L_y[review_word]
                for i in range(0,len(col)):
                    omega_u[i] += col[i]*weights_questions[question_word][review_word]
            l_xu = L_x[question_word]
            non_zero = False
            for i in range(0,len(l_xu)):
                if l_xu[i] != 0:
                    non_zero = True
                    break
            if non_zero == False:
                l_xu = [0] * d
            else:
                length = 0
                for z in range (0,d):
                    positive = False
                    if omega_u[z] > 0:
                        positive = True
                    z_th_element = abs(omega_u[z])
                    max_value = max(z_th_element-beta,0)
                    if positive:
                        pass
                    else:
                        max_value = -max_value
                    l_xu[z] = max_value
                    length += math.pow(max_value,2)
                length = math.sqrt(length)
                for z in range(0,d):
                    if length == 0:
                        pass
                    else:
                        l_xu[z] = l_xu[z] * (theta_x/length)

            L_x[question_word] = l_xu

        for review_word in weights_reviews:
            eta_v = [0] * d
            for question_word in weights_questions:
                col = L_x[question_word]
                for i in range(0, len(col)):
                    eta_v[i] += col[i] * weights_reviews[review_word][question_word]
            l_yv = L_y[review_word]
            non_zero = False
            for i in range(0, len(l_yv)):
                if l_yv[i] != 0:
                    non_zero = True
                    break
            if non_zero == False:
                l_yv = [0] * d
            else:
                length = 0
                for z in range(0, d):
                    positive = False
                    if eta_v[z] > 0:
                        positive = True
                    z_th_element = abs(eta_v[z])
                    max_value = max(z_th_element - gamma, 0)
                    if positive:
                        pass
                    else:
                        max_value = -max_value
                    l_yv[z] = max_value
                    length += math.pow(max_value, 2)
                length = math.sqrt(length)
                for z in range(0, d):
                    if length == 0:
                        pass
                    else:
                        l_yv[z] = l_yv[z] * (theta_y / length)

            L_y[review_word] = l_yv

    return L_x, L_y


def preprocessing_per_product(cosine_product, tf_idf_questions_product, tf_idf_reviews_product):
    question_vocab = set()
    review_vocab = set()
    # calculate dx and dy (size of vocab for all questions, all reviews respectively)
    for question_id in tf_idf_questions_product:
        question_vocab = question_vocab.union(tf_idf_questions_product[question_id].keys())
    for review_id in tf_idf_reviews_product:
        review_vocab = review_vocab.union(tf_idf_reviews_product[review_id].keys())
    dx = len(question_vocab)  # size of question vocab
    dy = len(review_vocab)  # size of review vocab
    question_vocab_list = list(question_vocab)
    review_vocab_list = list(review_vocab)
    weights_questions = defaultdict(lambda: defaultdict)
    weights_reviews = defaultdict(lambda: defaultdict)
    # Initializing everything to 0
    for u in range(0, dx):
        weights_xu = defaultdict(lambda: 0)
        for v in range(0, dy):
            weights_xu[review_vocab_list[v]] = 0
        weights_questions[question_vocab_list[u]] = weights_xu
    for v in range(0, dy):
        weights_yv = defaultdict(lambda: 0)
        for u in range(0, dx):
            weights_yv[question_vocab_list[u]] = 0
        weights_reviews[review_vocab_list[v]] = weights_yv
    # Precalculation
    nx = len(tf_idf_questions_product)  # num questions
    ny = len(tf_idf_reviews_product)  # num reviews
    for key in weights_questions:
        for question_id in cosine_product:
            for review_id in cosine_product[question_id]:
                for review_word in review_vocab_list:
                    num = float(1) / float(nx * ny)
                    try:
                        num *= tf_idf_questions_product[question_id][key]
                    except KeyError:
                        num = 0
                    num *= cosine_product[question_id][review_id]
                    try:
                        num *= tf_idf_reviews_product[review_id][review_word]
                    except KeyError:
                        num = 0
                    weights_questions[key][review_word] += num

    for key in weights_reviews:
        for question_id in cosine_product:
            for review_id in cosine_product[question_id]:
                for question_word in question_vocab_list:
                    num = float(1) / float(nx * ny)
                    try:
                        num *= tf_idf_reviews_product[review_id][key]
                    except KeyError:
                        num = 0
                    num *= cosine_product[question_id][review_id]
                    try:
                        num *= tf_idf_questions_product[question_id][question_word]
                    except KeyError:
                        num = 0
                    weights_reviews[key][question_word] += num

    d = (min(dx,dy)/3) + 1

    return weights_questions, weights_reviews, d, dx, dy

There can be common sequences between review and query which can be used to calculate their relavance score.
Rogue-L calculates the length of longest common subsequence in the query and review

In [83]:
class roguel:

    def __init__(self, review, question):
        self.pairscore = dict()
        self.review = review
        self.question = question

    # Dynamic Programming implementation of LCS problem
    def lcs(self, X, Y):
        # find the length of the strings
        m = len(X)
        n = len(Y)
    
        # declaring the array for storing the dp values
        L = [[None] * (n + 1) for i in xrange(m + 1)]
    
        """Following steps build L[m+1][n+1] in bottom up fashion
        Note: L[i][j] contains length of LCS of X[0..i-1]
        and Y[0..j-1]"""
        for i in range(m + 1):
            for j in range(n + 1):
                if i == 0 or j == 0:
                    L[i][j] = 0
                elif X[i - 1] == Y[j - 1]:
                    L[i][j] = L[i - 1][j - 1] + 1
                else:
                    L[i][j] = max(L[i - 1][j], L[i][j - 1])
    
        # L[m][n] contains the length of LCS of X[0..n-1] & Y[0..m-1]
        return L[m][n]
    # end of function lcs



    def rogueL(self):
        # r = dict()
        for item in self.question:
            # r[item] = list()
            num = 0.0
            for q in self.question[item]:
                # res = list()
                if item not in self.review:
                    continue
                if item not in self.pairscore:
                    self.pairscore[item] = dict()
                if q["ID"] not in self.pairscore[item]:
                    self.pairscore[item][q["ID"]] = dict()
                for val in self.review[item]:
                    if val["ID"] not in self.pairscore[item][q["ID"]]:
                        self.pairscore[item][q["ID"]][val["ID"]] = dict()
                    self.pairscore[item][q["ID"]][val["ID"]]["ROGUE"] = self.lcs(q["sentence"], val["sentence"])
                    num += self.pairscore[item][q["ID"]][val["ID"]]["ROGUE"]
                for val in self.review[item]:
                    if num != 0:
                        self.pairscore[item][q["ID"]][val["ID"]]["ROGUE_NORM"] = self.pairscore[item][q["ID"]][val["ID"]]["ROGUE"]/num
                    else:
                        self.pairscore[item][q["ID"]][val["ID"]]["ROGUE_NORM"] = 0
                    #print "one review done"
                    # r[item].append(res)
        
        return self.pairscore


Frequency of words and document frequency plays an important role to calculate relevance score. BM25+ is "TF-IDF" model which make use of words frequencies in the query and reviews.

In [84]:
class bm25:
    def __init__(self, review, question, pairscore):
        self.review = review
        self.question = question
        self.pairscore = pairscore
        
    def getIDF(self, ques, review_l):
        count = 0
        for r in review_l:
            if ques in r["words"]:
                count += 1
    
        num = len(review_l)-count+0.5
        den = count+0.5
        idf_val = math.log(num) - math.log(den)
        return idf_val

    def get_avgdl(self, review_l):
        length = 0
        for r in review_l:
            length += len(r["sentence"])
        avg = length/len(review_l)
        #nneed to fix this
        if avg == 0:
            avg == 1
        return avg

    def okapi(self):
        k1, b = (1.2+2)/2, 0.75
    
        for item in self.question:
            if item not in self.review:
                continue
            if item not in self.pairscore:
                self.pairscore[item] = dict()
            for q in self.question[item]:
                if q["ID"] not in self.pairscore[item]:
                    self.pairscore[item][q["ID"]] = dict()
                # calculate avgdl
                avgdl = self.get_avgdl(self.review[item])
                norm_bm, norm_bmm = 0,0
                #find term frequency
                for r in self.review[item]:
                    if r["ID"] not in self.pairscore[item][q["ID"]]:
                        self.pairscore[item][q["ID"]][r["ID"]] = dict()
                    score = 0
                    index = 0
                    score1 = 0
                    for word in q["words"]:
                        # taking review as doc get idf for each word in query
                        idf = self.getIDF(word, self.review[item])
                        if word not in r["words"]:
                            w = 0
                        else:
                            w = r["words"][word]
                        num = (k1 +1)*w
                        den = w + k1 *(1-b +b *(len(r["sentence"])/avgdl))
                        score += idf*(num/den)
                        score1+=idf
                    self.pairscore[item][q["ID"]][r["ID"]]["BM25"] = score
                    self.pairscore[item][q["ID"]][r["ID"]]["BM25+"] = score1
                    norm_bm += score
                    norm_bmm += score1
                for r in self.review[item]:
                    if norm_bm != 0:
                        self.pairscore[item][q["ID"]][r["ID"]]["BM25"] /= norm_bm
                    if norm_bmm !=0:
                        self.pairscore[item][q["ID"]][r["ID"]]["BM25+"] /= norm_bmm
                    self.pairscore[item][q["ID"]][r["ID"]]["BM25+"] += self.pairscore[item][q["ID"]][r["ID"]]["BM25"]
                    self.pairscore[item][q["ID"]][r["ID"]].pop("BM25", 0)
                    #print q["ID"], r["ID"]
        return self.pairscore

Weight training is done so as to determine how they should be combined in order to achieve the best ranking. 

In [85]:
def evaluate_model(pairscore,bilinear_score,question):
    eta = 0.01
    T = 500
    t = 0
    final_weight = defaultdict(lambda: defaultdict())
    for prod_key in bilinear_score:
        for each_question in bilinear_score[prod_key]:
            question_weight = defaultdict(lambda: list())
            w = [random.random() for _ in range(3)]
            sum_weights = 0
            for i in range(0,3):
                sum_weights+=w[i]
            if (sum_weights != 0.0):
                for i in range(0,3):
                    w[i]=w[i]/sum_weights
            for each_review in bilinear_score[prod_key][each_question]:
                t = 0
                while t < T:
                    t += 1
                    sum = pairscore[prod_key][each_question][each_review]['ROGUE']+\
                        pairscore[prod_key][each_question][each_review]['BM25+']+\
                        bilinear_score[prod_key][each_question][each_review]
                    scores = [pairscore[prod_key][each_question][each_review]['ROGUE']/sum, \
                                pairscore[prod_key][each_question][each_review]['BM25+']/sum, \
                                bilinear_score[prod_key][each_question][each_review]/sum]
                    total_sum = w[0] * pairscore[prod_key][each_question][each_review]['ROGUE'] + \
                        w[1] * pairscore[prod_key][each_question][each_review]['BM25+'] + \
                        w[2] * bilinear_score[prod_key][each_question][each_review]
                    if total_sum > 0.5:
                        if question[prod_key][0]['A'] == False:
                            w[0] -= eta
                            w[1] -= eta
                            w[2] -= eta
                    else:
                        if question[prod_key][0]['A'] == True:
                            w[0] += eta
                            w[1] += eta
                            w[2] += eta
                    sum_weights = 0
                    for i in range(0,3):
                        sum_weights+=w[i]
                    if (sum_weights != 0.0):
                        for i in range(0,3):
                            w[i]=w[i]/sum_weights
                question_weight[each_question] = w
        final_weight[prod_key] = question_weight
    return final_weight

### Voting score

Once the review is ranked in the order of their relevant score, top 5 reviews are taken and sentiment analysis is done on the text to categorize as positive or negative response.

In [86]:
def classification(predicted_model, test_pairscore, test_bilinear_score, test_question, question, review):
    total_no = 0
    correct = 0
    incorrect = 0
    c , i = 0, 0
    relevant_reviews = defaultdict(lambda: defaultdict())
    for prod_key in test_bilinear_score:
        question_review_score = defaultdict(lambda: defaultdict())
        for each_question in test_bilinear_score[prod_key]:
            # total_no += 5
            review_score = defaultdict()
            for each_review in test_bilinear_score[prod_key][each_question]:
                score = defaultdict()
                weight_list = predicted_model[prod_key][each_question]
                sum = weight_list[0] * test_pairscore[prod_key][each_question][each_review]['ROGUE'] + \
                      weight_list[1] * test_pairscore[prod_key][each_question][each_review]['BM25+'] + \
                      weight_list[2] * test_bilinear_score[prod_key][each_question][each_review]
                abs_sum = abs(sum)
                score['score'] = sum
                score['abs_score'] = abs_sum
                for x in review[prod_key]:
                    if x['ID'] == each_review:
                        score['actual_sen'] = x['actual_sen']
                        break
                review_score[each_review] = score


            sorted_list = list(reversed(sorted(review_score.iteritems(), key=operator.itemgetter(1))))
            i =0
            #nltk.download()
            yes, no = 0, 0
            sid = SentimentIntensityAnalyzer()
            for x in sorted_list:
                sentence = x[1]['actual_sen']
                #print sentence
                ss = sid.polarity_scores(sentence)
                if ss["compound"] < 0:
                    no += 1
                else:
                    yes += 1

                i += 1
                if i == 5:
                    break
            if yes >= no:
                if test_question[prod_key][0]['A'] == True:
                    c += 1
                else:
                    i += 1
            else:
                if test_question[prod_key][0]['A'] == False:
                    c += 1
                else:
                    i +=  1
    #accuracy =  float(c) / float(c + i)
    #print ("accuracy: %s" %accuracy)
    return c, i

In [87]:
class mixture_of_experts:
    
    def __init__(self, corpus):
        self.rel_sim = dict()
        self.rel_syn = dict()
        self.voting = dict()
        self.corpus = corpus
        
    def get_relevance(self):
        r = roguel(self.corpus.review, self.corpus.question)
        self.rel_sim = r.rogueL()
        b = bm25(self.corpus.review, self.corpus.question, self.rel_sim)
        self.rel_sim = b.okapi()
        
        s = similarity_fact(self.corpus.review, self.corpus.question)
        cosine,tf_idf_questions,tf_idf_reviews = s.evaluate_cosine_similarity()

        bilinear_score = evaluate_bilinear(cosine,tf_idf_questions,tf_idf_reviews)
        #print("bilinear done")
        predicted_model = evaluate_model(self.rel_sim,bilinear_score,self.corpus.question)
        
        return predicted_model

    def test_relevance(self):
        r = roguel(self.corpus.review, self.corpus.test_question)
        self.rel_sim = r.rogueL()
        b = bm25(self.corpus.review, self.corpus.test_question, self.rel_sim)
        self.rel_sim = b.okapi()
        
        s = similarity_fact(self.corpus.review, self.corpus.test_question)
        cosine,tf_idf_questions,tf_idf_reviews = s.evaluate_cosine_similarity()
        bilinear_score = evaluate_bilinear(cosine,tf_idf_questions,tf_idf_reviews)
        return self.rel_sim, bilinear_score
        
    def get_voting(self, predicted_model,test_pairscore,test_bilinear_score,test_question, question, review):
        correct, incorrect = classification(predicted_model,test_pairscore,test_bilinear_score,test_question, question, review)
        return correct, incorrect


Here we first train the parameterized weights of the model and then test it on 10% dataset as test.

In [88]:
def main(question_file_name, review_file_name):
    
    corp = corpus(question_file_name,review_file_name)
    corp.create_dict()
    print "Dictionary formation done"

    mod = mixture_of_experts(corp)
    predicted_model = mod.get_relevance()
    print "training done"

    #testing
    modtest = mixture_of_experts(corp)
    test_pairscore,test_bilinear_score= modtest.test_relevance()
    print "testing done"

    c, i = mod.get_voting(predicted_model,test_pairscore,test_bilinear_score,corp.test_question, corp.question, corp.review)
    print "voting done"
    
    accuracy =  float(c) / float(c + i)
    print ("accuracy: %s" %accuracy)

if __name__ == "__main__":
    question_file_name = 'questions_Cell_Phones_and_Accessories.txt'
    review_file_name = 'reviews_Cell_Phones_and_Accessories.txt'
    main(question_file_name, review_file_name)

Dictionary formation done
training done
testing done
voting done
accuracy: 0.8


### Evaluation plots

We tested our model for binary QA against the following datasets:

<img src="plot_1.png"/>

Below is the quantitative visualization of our evaluation results against the baseline evalaution of the Moqa model:

<img src="plot_both.png"/>

Below is the quantitative visualization of our evaluation results against the baseline evalaution of the ro-L model, which is the learned weighted ROGUE + BM25+ model:

<img src="plot_3.png">

To get list of reviews relevant to the question which can help user to find answer to their query. We use LSI and LDA model for content and topic identification. 

In [70]:

from stop_words import get_stop_words
from nltk.stem.porter import PorterStemmer
from gensim import corpora, models
import gensim
from operator import itemgetter, attrgetter
import math


q_file = "question_2.txt"
r_file = "reviews_2.txt"


def create_dict(question, review):
    make_ques(question)
    make_review(review, question)


def make_ques(question):
    en_stop = get_stop_words('en')
    p_stemmer = PorterStemmer()
    with open(q_file) as fp:
        for line in fp:
            words = line.split()
            #not taking open ended question for now
            if len(words) == 0 or words[1] == "A":
                continue
            #product id
            if words[0] not in question:
                question[words[0]] = list()

            if words[1] == "Q":
                freq = list()
                line = ""
               # freq["words"] = dict()
               # freq["sentence"] = list()
               # freq["ID"] = words[3]
                for i in range(5, 5+int(words[4])):
                    line += " " + words[i]
                    if words[i] == ".":
                        continue
                    freq.append(words[i])
                # remove stop words from tokens
                freq = [i for i in freq if not i in en_stop]
                # stem tokens
                freq = [p_stemmer.stem(i) for i in freq]
                freq.append(line)

                question[words[0]].append(freq)

def make_review(review, question):
    en_stop = get_stop_words('en')
    p_stemmer = PorterStemmer()

    with open(r_file) as fp:
        for line in fp:
            words = line.split()
            if len(words) == 0 or words[1] not in question:
                continue
            if words[1] not in review:
                review[words[1]] = list()
            freq = list()
            line = ""
            for i in range(5, 5+int(words[4])):
                line += " "+words[i]
                if words[i] == ".":
                    continue
                freq.append(words[i])
            # remove stop words from tokens
            freq = [i for i in freq if not i in en_stop]
            # stem tokens
            freq = [p_stemmer.stem(i) for i in freq]
            freq.append(line)

            review[words[1]].append(freq)



def find(question, review):
    for item in question:
        val = list()
        r_review = list()
        if item not in review:
            continue
        for x in review[item]:
            r_review.append(x[-1])
            val.append(x[:len(x)-1])
        qr = corpora.Dictionary(val)
        qr.save('auto_review.dict')
        corpus = [qr.doc2bow(text) for text in val]
        tfidf = models.TfidfModel(corpus)
        corpus_tfidf = tfidf[corpus]
        lda_model = models.LsiModel(corpus_tfidf, id2word=qr, num_topics=5)
        corpus_lsi = lda_model[corpus_tfidf]
        #for i in range(0, len(corpus_lsi)):
         #   print i+1, r_review[i], corpus_lsi[i]#sorted(, key=itemgetter(1), reverse=True)

        c_list = dict()
        result = sorted(corpus_lsi[-1], key=lambda x: math.fabs(x[1]), reverse=True)
        #print result
        for i in range(0, len(corpus_lsi) - 1):
            for x in corpus_lsi[i]:
                if x[0] == result[0][0]:
                    c_list[i] = x[1]
                    break

       # for key in c_list:
        #    print r_review[key]
        c_list = {x: math.fabs(c_list[x]) for (x, _) in c_list.iteritems()}
        #print c_list
        sorted_x = sorted(c_list.items(), key=itemgetter(1), reverse=True)
        i = 0
        print "Question:", r_review[-1], "?", "\n"
        print "Answers:\n"
        index = 1
        for key in sorted_x:
            if i > 5:
                break
            print index, r_review[key[0]], "\n"
            i += 1
            index += 1



question, review = dict(), dict()
create_dict(question, review)
find(question, review)

Question:  Is it long lasting ? 

Answers:

1  this unit came with my new constructed house . it finally went out after 12 . 5 years . i hope my new one last as long . 

2  this model disposal was in my home when we first moved in and lasted us 9 years before it went on the blink and started to leak from the bottom . worked very well but a little noisy . replaced it with insinkerator badger 5 12 hp that has more power and is quieter hope its as reliable . 

3  works great . great price 

4  this was an exact replacement for what i already had which was in the house when i purchased it over 12 years ago . i didnt have to change any of the drain mounting hardware . hopefully this one will last as long . 

5  bought this disposal to replace an old badger 1 that id used for over 10 years . it came with the house so it could be much older . this one lasted less than a year and a half . i hope i just got a bad one but after looking at the other reviews i doubt it . im thinking their quality 

In [69]:
from stop_words import get_stop_words
from nltk.stem.porter import PorterStemmer
from gensim import corpora, models
import gensim
from operator import itemgetter, attrgetter
import math

q_file = "question_2.txt"
r_file = "reviews_2.txt"

#########################################
# create dictionary
########################################
def create_dict(question, review):
    make_ques(question)

    make_review(review, question)

    find(question, review)
    
#########################################
# create dictionary for question
########################################
def make_ques(question):
    #remove stop words and do stemming
    en_stop = get_stop_words('en')
    p_stemmer = PorterStemmer()
    
    #traverse each line of the file
    with open(q_file) as fp:
        for line in fp:
            words = line.split()
            
            #ignoring answers
            if len(words) == 0 or words[1] == "A":
                continue
                
            #product id
            if words[0] not in question:
                question[words[0]] = list()
            
            if words[1] == "Q":
                freq = list()
                line = ""
               
                for i in range(5, 5+int(words[4])):
                    line += " " + words[i]
                    if words[i] == ".":
                        continue
                    freq.append(words[i])
                # remove stop words from tokens
                freq = [i for i in freq if not i in en_stop]
                # stem tokens
                freq = [p_stemmer.stem(i) for i in freq]
                freq.append(line)
                    
                question[words[0]].append(freq)
                

##########################################
# create dictionary for review list
###########################################
def make_review(review, question):
    #stop words and stemming
    en_stop = get_stop_words('en')
    p_stemmer = PorterStemmer()

    #traverse each line of the file
    with open(r_file) as fp:
        for line in fp:
            words = line.split()
            
            #ignoring reviews not in questions
            if len(words) == 0 or words[1] not in question:
                continue
                
            if words[1] not in review:
                review[words[1]] = list()
            freq = list()
            line = ""
            for i in range(5, 5+int(words[4])):
                line += " "+words[i]
                if words[i] == ".":
                    continue
                freq.append(words[i])
            # remove stop words from tokens
            freq = [i for i in freq if not i in en_stop]
            # stem tokens
            freq = [p_stemmer.stem(i) for i in freq]
            freq.append(line)

            review[words[1]].append(freq)


############################################
#find relevant review
##########################################
def find(question, review):
    #for each question train model to 
    #get comparable topic
    for item in question:
        val = list()
        r_review = list()
        if item not in review:
            continue
        for x in review[item]:
            r_review.append(x[-1])
            val.append(x[:len(x)-1])
        #create corpus with each word have id
        qr = corpora.Dictionary(val)
        qr.save('auto_review.dict')
        corpus = [qr.doc2bow(text) for text in val]
        #learn lda model for given query
        lda_model = gensim.models.LdaModel(corpus, alpha='auto', num_topics=5, id2word=qr, passes=100)
        corpus_lsi = lda_model[corpus]
        
        #sort review with maximum probable topic
        c_list = dict()
        result = sorted(corpus_lsi[-1], key=lambda x: math.fabs(x[1]), reverse=True)
        c_list = dict()
        for i in range(0, len(corpus_lsi) - 1):
            for x in corpus_lsi[i]:
                if x[0] == result[0][0]:
                    c_list[i] = x[1]
                    break
        #create answer
        c_list = {x: math.fabs(c_list[x]) for (x, _) in c_list.iteritems()}
        # print c_list
        sorted_x = sorted(c_list.items(), key=itemgetter(1), reverse=True)
        i = 0
        print "Question:", r_review[-1], "?", "\n"
        print "Answers:\n"
        index = 1
        for key in sorted_x:
            if i > 5:
                break
            print index, r_review[key[0]], "\n"
            i += 1
            index += 1


create_dict(question, review)

Question:  Is it long lasting ? 

Answers:

1  before renting out my dads house we updated it including installing a badger 1 . the unit lasted 5 years but wait the house is only rented 3 months out of the year making the badgers life only 15 months . it locked up when the house was vacant and when i turned it with the provided wrench i could hear the grinding of rusty metal . to date it has yet to leak but you know it is coming when it locks up in one week of not being used . our previous ise not a badger model in our own home lasted nearly 20 years so i replaced it with another ise . if it does not last as long as the first one i will chalk it up as another example of hidden inflation cheaper made products but selling at near original prices . update manufacturers responseout of nowhere i received an email from a company rep . he had specific questions on the behavior and lifespan of the product which enabled him to diagnose the problem . after doing so he said the product should not