<a href="https://colab.research.google.com/github/OmarMeriwani/Fake-Financial-News-Detection/blob/master/News_Sources_Analysis_Objectivity_Check.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# News Sources Analysis - Objectivity Check
This notebook combines the ["who said" feature](https://github.com/OmarMeriwani/Fake-Financial-News-Detection/blob/master/News_Sources_Analysis_Who_Said.ipynb) with two other features to predict fake news using the proposed objectivity check method.
The libraries imported below are the ones required to run this notebook; they cover preprocessing, prediction, and evaluation.

In [0]:
import pandas as pd
import pickle
from sklearn.neural_network import MLPClassifier
from nltk.stem.porter import PorterStemmer
from sklearn.model_selection import train_test_split
import os
from stanfordcorenlp import StanfordCoreNLP
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score
from nltk.corpus import wordnet

The cells below set up the Stanford CoreNLP client and define the two helper functions for verb-related features (explained previously in ["Who Said"](https://github.com/OmarMeriwani/Fake-Financial-News-Detection/blob/master/News_Sources_Analysis_Who_Said.ipynb)).

In [0]:
java_path = "C:/Program Files/Java/jdk1.8.0_161/bin/java.exe"
os.environ['JAVAHOME'] = java_path
host = 'http://localhost'
port = 9000
scnlp = StanfordCoreNLP(host, port=port, lang='en', timeout=30000)
stemmer = PorterStemmer()


In [0]:
def getVerbs(verb):
    synonyms = []
    for syn in wordnet.synsets(verb):
        for l in syn.lemmas():
            synonyms.append(l.name())
    return set(synonyms)


In [0]:
def WhoSaid (sent, verb):
    result = []
    deps = scnlp.dependency_parse(sent)
    tags = scnlp.pos_tag(sent)
    ners = scnlp.ner(sent)
    verbindex = []
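    # Dependency indices are 1-based, so token i in `tags` maps to index i + 1.
    # Start at 0 so that a verb in the first position is not skipped.
    for i in range(len(tags)):
        if tags[i][0] == verb:
            verbindex.append(i + 1)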
    for i in deps:
        if i[1] in verbindex and i[0] == 'nsubj':
            result.append([tags[i[2] - 1][0], tags[i[2] - 1][1], ners[i[2] - 1][1] ])
    return result
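
To make the index arithmetic in `WhoSaid` concrete, the sketch below runs the same lookup on hand-written parser output, so no CoreNLP server is needed. The `tags`, `ners`, and `deps` lists are illustrative assumptions that follow the tuple layout the `stanfordcorenlp` wrapper returns (`(word, tag)` pairs and `(relation, governor, dependent)` triples with 1-based token indices); they are not real server output. `who_said_from_parse` is a hypothetical helper name, not part of the original code.

```python
def who_said_from_parse(tags, ners, deps, verb):
    """Return [word, POS tag, NER tag] for every subject of `verb`."""
    # Dependency indices are 1-based, so token i in `tags` has index i + 1.
    verb_indices = {i + 1 for i, (word, _) in enumerate(tags) if word == verb}
    result = []
    for relation, governor, dependent in deps:
        if relation == 'nsubj' and governor in verb_indices:
            word, pos = tags[dependent - 1]
            result.append([word, pos, ners[dependent - 1][1]])
    return result


# Hand-written parse of "Apple said profits rose" (assumed, not server output).
tags = [('Apple', 'NNP'), ('said', 'VBD'), ('profits', 'NNS'), ('rose', 'VBD')]
ners = [('Apple', 'ORGANIZATION'), ('said', 'O'), ('profits', 'O'), ('rose', 'O')]
deps = [('ROOT', 0, 2), ('nsubj', 2, 1), ('ccomp', 2, 4), ('nsubj', 4, 3)]

subjects = who_said_from_parse(tags, ners, deps, 'said')
print(subjects)
```

Only the subject attached to "said" is returned; the subject of "rose" is ignored because its governor is a different verb.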


The same features explained [previously](https://github.com/OmarMeriwani/Fake-Financial-News-Detection/blob/master/News_Sources_Analysis_Who_Said.ipynb) are extracted here, but on a different dataset used for evaluation. New features based on named entities and time expressions are added afterwards.

In [0]:

features = []
df = pd.read_csv('../Sentiments/FakeNewsSA.csv')
for i in range(0,len(df)):
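    # Use positional indexing: column 0 holds the claim, column 1 the label.
    claim = df.iloc[i, 0]
    label = df.iloc[i, 1]
    lemTags = scnlp.pos_tag(claim)
    colonAvailable = 1 if ':' in claim else 0
    # Reuse the tags already fetched instead of querying the server twice.
    tags = lemTags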
    tagsarr = []
    sayverbs = getVerbs('say')
    isSayVerb = 0
    isNPPSaid = 0
    isNERSaid = 0
    isQuestion = 0
    nnpfound = 0
    if '?' in claim:
        isQuestion = 1
    nnp_followed_by_colon = 0
    mid = int((len(tags) - 1) / 2)
    for t in lemTags:
        verb = stemmer.stem(str(t[0]).lower())
        if 'V' in t[1]:
            for j in sayverbs:

                if verb == str(j).lower():
                    whosaid = WhoSaid(claim, str(t[0]))
                    if whosaid != []:
                        for w in whosaid:
                            if w[1] == 'NNP' and isNPPSaid == 0:
                                isNPPSaid = 1
                            if w[2] != 'O' and isNERSaid == 0:
                                isNERSaid = 1
                        print('Whosaid', whosaid)
                    isSayVerb = 1
                    break
    for i in range(0, mid):
        word = tags[i][1]
        if nnpfound == 1 and word == ':':
            nnp_followed_by_colon = 1
            break
        if word == 'NNP':
            nnpfound = 1
        else:
            nnpfound = 0
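    # Reset the flag so that state from the forward scan does not leak
    # into the backward scan.
    nnpfound = 0
    nnp_preceeded_by_colon = 0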
    for i in range(0, mid):
        word = tags[len(tags) - 1 - i][1]
        word2 = tags[len(tags) - 1 - i][0]
        if word == 'NNP':
            nnpfound = 1
        if nnpfound == 1 and word == ':':
            nnp_preceeded_by_colon = 1
            break
        if word != 'NNP':
            nnpfound = 0
    # One row per claim, in the feature order the stored WhoSaid model expects.
    features.append([colonAvailable, nnp_followed_by_colon, nnp_preceeded_by_colon, isNPPSaid, isNERSaid, isQuestion])
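
The two colon features are the least obvious part of the loop above, so here is a compact sketch of the same idea on hand-tagged tokens. It assumes that CoreNLP tags a colon with the POS tag `:` and a proper noun with `NNP`; the example tag sequences are written by hand for illustration, and `colon_features` is a hypothetical helper, not a function from the original code.

```python
def colon_features(tags):
    """Return (nnp_followed_by_colon, nnp_preceded_by_colon) for (word, POS) pairs."""
    pos = [tag for _, tag in tags]
    mid = (len(pos) - 1) // 2
    head = pos[:mid]              # first half: "Source: claim" headlines
    tail = pos[len(pos) - mid:]   # second half: "claim: Source" headlines
    followed = any(a == 'NNP' and b == ':' for a, b in zip(head, head[1:]))
    preceded = any(a == ':' and b == 'NNP' for a, b in zip(tail, tail[1:]))
    return int(followed), int(preceded)


# "Reuters: oil prices fall sharply this week" (hand-tagged, assumed)
source_first = [('Reuters', 'NNP'), (':', ':'), ('oil', 'NN'), ('prices', 'NNS'),
                ('fall', 'VBP'), ('sharply', 'RB'), ('this', 'DT'), ('week', 'NN')]

# "Oil prices will fall sharply this year: Goldman" (hand-tagged, assumed)
source_last = [('Oil', 'NN'), ('prices', 'NNS'), ('will', 'MD'), ('fall', 'VB'),
               ('sharply', 'RB'), ('this', 'DT'), ('year', 'NN'), (':', ':'),
               ('Goldman', 'NNP')]

print(colon_features(source_first))
print(colon_features(source_last))
```

Splitting the token list at its midpoint keeps the two patterns apart: a proper noun before a colon only counts near the start of the claim, and a proper noun after a colon only counts near the end.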


The stored WhoSaid model is loaded and used to predict the referral feature for each claim (kept as `Cited` in the code below).

In [0]:
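# Open the pickled model in a context manager so the file handle is closed.
with open('WhoSaid.pkl', 'rb') as model_file:
    mlp = pickle.load(model_file)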
y= mlp.predict(features)
features2 = []
labels = []
for i in range(0, len(features)):
    claim = df.iloc[i, 0]
    Cited = y[i]
    label = df.iloc[i, 1]
    numberOfNER = 0
    usingTimeExpressions = 0
    # New addition, not present in the Who Said source code:
    # run named entity recognition on the claim and check whether it
    # contains time expressions and/or named entities.

    for tag in scnlp.ner(claim):
        if tag[1] != 'O':
            numberOfNER += 1
        if tag[1] == 'DATE' and usingTimeExpressions == 0:
            usingTimeExpressions = 1
    print(claim, numberOfNER, usingTimeExpressions, label)
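    # Include the predicted referral feature so that the "who said" signal is
    # combined with the two new features, as described in the introduction.
    features2.append([Cited, numberOfNER, usingTimeExpressions])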
    labels.append(label)
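
As a quick illustration of the two new features, the sketch below applies the same counting logic to a hand-written list of `(token, NER tag)` pairs, so it runs without a CoreNLP server. The tagged sentence is an assumption made for the example, and `entity_features` is a hypothetical helper name.

```python
def entity_features(ner_tags):
    """Return [number_of_entity_tokens, uses_time_expression] from (token, tag) pairs."""
    # Every token whose tag is not 'O' is counted, so a two-word entity counts twice.
    number_of_ner = sum(1 for _, tag in ner_tags if tag != 'O')
    # As in the loop above, only the DATE tag is treated as a time expression.
    uses_time = int(any(tag == 'DATE' for _, tag in ner_tags))
    return [number_of_ner, uses_time]


# "Tesla shares fell on Monday" (hand-tagged, assumed)
ner_tags = [('Tesla', 'ORGANIZATION'), ('shares', 'O'), ('fell', 'O'),
            ('on', 'O'), ('Monday', 'DATE')]

print(entity_features(ner_tags))
```

Note that the count is per token rather than per entity mention, which matches the loop above.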


Evaluation using accuracy, precision, recall, and F1 score.

In [0]:
xtrain, xtest, ytrain, ytest = train_test_split(features2, labels)
mlp2 = MLPClassifier()
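# Avoid shadowing the built-in max().
best_score = 0
# Each fit() call retrains from a fresh random initialisation (warm_start is
# False by default), so this loop reports the best of 100 runs on the test split.
for i in range(0, 100):
    mlp2.fit(xtrain, ytrain)
    score = mlp2.score(xtest, ytest)
    if score > best_score:
        best_score = score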
        print('Accuracy: ',score)
        yhat_classes = mlp2.predict(xtest)
        #yhat_classes = yhat_classes[:, 0]
        # precision tp / (tp + fp)
        precision = precision_score(ytest, yhat_classes)
        print('Precision: %f' % precision)
        # recall: tp / (tp + fn)
        recall = recall_score(ytest, yhat_classes)
        print('Recall: %f' % recall)
        # f1: 2 tp / (2 tp + fp + fn)
        f1 = f1_score(ytest, yhat_classes)
        print('F1 score: %f' % f1)
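
The formulas in the comments above can be checked by hand. The following sketch computes precision, recall, and F1 directly from true-positive, false-positive, and false-negative counts in plain Python. It assumes binary labels with `1` as the positive class; the label lists are made up for illustration, and `binary_metrics` is a hypothetical helper rather than a scikit-learn function.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0       # tp / (tp + fp)
    recall = tp / (tp + fn) if tp + fn else 0.0          # tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1


# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]

precision, recall, f1 = binary_metrics(y_true, y_pred)
print('Precision: %f' % precision)
print('Recall: %f' % recall)
print('F1 score: %f' % f1)
```

With these counts all three metrics come out to 2/3, which is a convenient sanity check against the values that `precision_score`, `recall_score`, and `f1_score` would report for the same lists.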