Fake-Product-Review-Monitoring

Online marketplaces and e-commerce platforms keep growing in scope, and many people now buy products through them. As a result, detailed product feedback is available for users to evaluate a product before buying it. This can also work against users, because the review section can be flooded with extreme opinions that unfairly favor or harm a product. This needs to be monitored, because such reviews can be posted either by a merchant to inflate the value of their product or by a user to drag down the product's ratings.

Features Used:

Sentiment Analysis
Content Similarity
Latent Semantic Analysis (LSA)

Sentiment Analysis

Sentiment analysis is the contextual mining of text: it identifies and extracts subjective information from source material, helping a business understand the social sentiment around its brand, product, or service while monitoring online conversations.

from nltk.corpus import stopwords  # requires the NLTK "stopwords" data
from sklearn.feature_extraction.text import CountVectorizer

# Creating the bag-of-words (BOW) model; corpus is the list of preprocessed review texts
vectorizer = CountVectorizer(max_features=2000, min_df=3, max_df=0.6, stop_words=stopwords.words("english"))
X = vectorizer.fit_transform(corpus).toarray()

# Creating TF-IDF from BOW
from sklearn.feature_extraction.text import TfidfTransformer

transformer = TfidfTransformer()
X = transformer.fit_transform(X).toarray()

# All of the data is used for training only, so no test split is made.
# (Recent scikit-learn versions reject train_test_split(..., test_size=0).)
text_train, sent_train = X, y

# Training our classifier
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier.fit(text_train, sent_train)
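
As a hedged illustration of how a classifier of this kind can be applied to unseen reviews, the sketch below trains on a tiny made-up corpus and predicts the sentiment of two new reviews. The toy reviews, the labels (1 = positive, 0 = negative), and the use of `make_pipeline` are assumptions for the example and are not taken from the project's notebooks.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: 1 = positive sentiment, 0 = negative sentiment.
toy_reviews = [
    "great product, works perfectly",
    "excellent quality and fast delivery",
    "love it, highly recommended",
    "terrible product, stopped working",
    "poor quality, waste of money",
    "awful experience, do not buy",
]
toy_labels = [1, 1, 1, 0, 0, 0]

# Chaining the vectorizer and the classifier ensures that new text is
# transformed with the vocabulary learned during training.
sentiment_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
sentiment_model.fit(toy_reviews, toy_labels)

new_reviews = ["excellent product, love it", "terrible quality, waste of money"]
predicted = sentiment_model.predict(new_reviews)            # one label per review
probabilities = sentiment_model.predict_proba(new_reviews)  # one row per review
```

Keeping the vectorizer and the classifier in one pipeline object also means a single pickle file is enough to restore the whole model later.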

Latent Semantic Analysis

Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, for analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. A matrix of word counts per paragraph (rows represent unique words and columns represent paragraphs) is constructed from a large piece of text, and a mathematical technique called singular value decomposition (SVD) is used to reduce the number of rows while preserving the similarity structure among columns. Paragraphs are then compared by the cosine similarity between their columns: values close to 1 represent very similar paragraphs, while values close to 0 represent very dissimilar ones.

import nltk  # word_tokenize requires the NLTK punkt tokenizer data
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Latent semantic analysis
# It analyses all reviews and scores how strongly each one belongs to the same concept
def LSA(text):
    # text is a list of reviews of the same product

    # Create the TF-IDF model
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(text)

    # Create the SVD (singular value decomposition) with a single concept
    lsa = TruncatedSVD(n_components=1, n_iter=100)
    lsa.fit(X)

    # get_feature_names() was removed in scikit-learn 1.2
    terms = vectorizer.get_feature_names_out()

    # Keep the 10 highest-weighted terms of each concept
    concept_words = {}
    for j, comp in enumerate(lsa.components_):
        component_terms = zip(terms, comp)
        sorted_terms = sorted(component_terms, key=lambda x: x[1], reverse=True)
        concept_words[str(j)] = sorted_terms[:10]

    # Score each review by summing the weights of the concept terms it contains
    sentence_scores = []
    for key in concept_words:
        for sentence in text:
            # Lowercase to match the vectorizer's lowercased vocabulary
            words = nltk.word_tokenize(sentence.lower())
            score = 0
            for word in words:
                for term, weight in concept_words[key]:
                    if word == term:
                        score += weight
            sentence_scores.append(score)
    return sentence_scores
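
To make the SVD step described above more concrete, here is a minimal sketch on a hand-made term-document matrix. The four terms, the counts, and the choice of `k = 2` concepts are assumptions picked for illustration; the sketch uses NumPy's full SVD rather than the `TruncatedSVD` class used in the project code.

```python
import numpy as np

# Hypothetical term-document count matrix: rows are terms, columns are reviews.
terms = ["battery", "charge", "screen", "display"]
counts = np.array([
    [2, 1, 0, 0],   # battery
    [1, 2, 0, 0],   # charge
    [0, 0, 2, 1],   # screen
    [0, 0, 1, 2],   # display
], dtype=float)

# Singular value decomposition: counts = U @ diag(S) @ Vt
U, S, Vt = np.linalg.svd(counts, full_matrices=False)

# Keep only the k largest singular values (the "concepts").
k = 2
doc_vectors = (np.diag(S[:k]) @ Vt[:k]).T   # one k-dimensional row per review

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Reviews 0 and 1 share the battery/charge vocabulary; review 2 does not.
same_topic = cosine(doc_vectors[0], doc_vectors[1])
different_topic = cosine(doc_vectors[0], doc_vectors[2])
```

In this toy matrix the two battery-related reviews end up with a similarity close to 1 in the reduced space, while a battery review and a screen review end up close to 0.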

Content Similarity

To use cosine similarity, each review is first converted into a vector. The angle between two vectors then determines how similar the two reviews are: the smaller the angle, the more similar the reviews.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# dataset is assumed to be a pandas DataFrame with a default integer index
remove_reviews = []  # ids of reviews flagged as near-duplicates

for i in range(len(dataset) - 1):

    # Review i followed by all later reviews (assumed from the i+k indexing below)
    reviews = [str(body) for body in dataset["review_body"][i:]]

    # Creates the TF-IDF model
    tfidf_vectorizer = TfidfVectorizer()
    tfidf_matrix = tfidf_vectorizer.fit_transform(reviews)

    # Creates a matrix based on document similarity (review i against the rest)
    tfidf_list = cosine_similarity(tfidf_matrix[0:1], tfidf_matrix).tolist()

    # To check the similarity between 2 reviews
    for k in range(1, len(tfidf_list[0])):

        if tfidf_list[0][k] > 0.6:
            # 0.6 is the defined similarity threshold

            remove_reviews.append(dataset["review_id"][i + k])
            # i+k gives the id of the matching review
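
The angle-based comparison described above can be illustrated with the standard library alone. This is a simplified sketch: it uses raw word counts instead of TF-IDF weights, and the function name `cosine_similarity_counts` and the sample sentences are made up for the example.

```python
import math
from collections import Counter

def cosine_similarity_counts(text_a, text_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Only words present in both texts contribute to the dot product.
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)

identical = cosine_similarity_counts("good phone good price", "good phone good price")
unrelated = cosine_similarity_counts("good phone", "slow delivery")
partial = cosine_similarity_counts("good phone", "good camera")
```

Identical texts score approximately 1, texts with no shared words score 0, and the partially overlapping pair scores approximately 0.5, which would fall below a 0.6 threshold.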

Methods used to determine Fake Reviews

Reviews that have a dual view
Reviews in which the same user promotes or demotes a particular brand
Reviews in which a person from the same IP address promotes or demotes a particular brand
Floods of reviews posted by the same user that are all positive or all negative
Floods of reviews posted by the same person from the same IP address
Similar reviews posted in the same time interval
Reviews in which the reviewer uses an alarming tone to push readers to buy the product
Reviews in which the reviewer writes their own story
Meaningless text in reviews
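
One of the checks listed above, a flood of same-sentiment reviews from one user for one brand, could be sketched roughly as follows. The record fields (`user_id`, `brand`, `sentiment`), the function name, and the `min_reviews` threshold are all hypothetical and are not taken from the project's notebooks.

```python
from collections import defaultdict

def find_flooding_users(reviews, min_reviews=3):
    """Return ids of users who posted at least min_reviews reviews
    for one brand, all with the same sentiment."""
    grouped = defaultdict(list)
    for review in reviews:
        grouped[(review["user_id"], review["brand"])].append(review["sentiment"])

    flagged = set()
    for (user_id, _brand), sentiments in grouped.items():
        # A flood: many reviews, and every one of them has the same polarity.
        if len(sentiments) >= min_reviews and len(set(sentiments)) == 1:
            flagged.add(user_id)
    return flagged

# Hypothetical sample records.
sample_reviews = [
    {"user_id": "u1", "brand": "Acme", "sentiment": "positive"},
    {"user_id": "u1", "brand": "Acme", "sentiment": "positive"},
    {"user_id": "u1", "brand": "Acme", "sentiment": "positive"},
    {"user_id": "u2", "brand": "Acme", "sentiment": "positive"},
    {"user_id": "u2", "brand": "Acme", "sentiment": "negative"},
    {"user_id": "u2", "brand": "Acme", "sentiment": "positive"},
]
flooding_users = find_flooding_users(sample_reviews)
```

The same grouping approach would work for the IP-based checks by keying on an IP address field instead of the user id, assuming that field is available in the data.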

Future Scope

Finding opinion spam in huge amounts of unstructured data has become an important research problem. Business organizations, specialists, and academics are now putting forward their efforts and ideas to find the best system for opinion spam analysis. Although some of the algorithms used in opinion spam analysis give good results, no algorithm yet resolves all of the challenges and difficulties involved. More work and knowledge are needed to further improve the performance of opinion spam analysis. In the future, we will investigate different kinds of features to make more accurate predictions.

Prerequisite for this Project

The required pickle files can be found here: https://github.com/anubhavs11/Sentimental-Analysis-using-Logistic-Regression/tree/master/preserved%20files
