## Filipino Fake News Detector

This project was made by:
- Justin Clyde Frongoso
- Medwin Devilleres
- Rae Gabriel Samonte
- Alquen Antonio Sarmiento

This project is a Chrome extension that helps identify whether an article contains fake content, whether in a paragraph, a phrase, or a sentence. It uses a Multinomial Naive Bayes model to predict the validity of Filipino news articles. This Jupyter notebook is for documentation and demonstration only.

The steps for the implementation can be seen below:

### 1. Import Required Libraries

First, we import the necessary libraries.

In [90]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics
from sklearn.feature_extraction.text import TfidfVectorizer

import stops  # module that provides the stop_words collection used in step 4

### 2. Import Dataset

Next, we load the dataset and separate the feature (here, only the article text) from the target label.

In [65]:
path = 'full.csv'
data = pd.read_csv(path)
data.head()

| Unnamed: 0 | label | article |
|---|---|---|
| 0 | 0 | Ayon sa TheWrap.com, naghain ng kaso si Krupa,... |
| 1 | 0 | Kilala rin ang singer sa pagkumpas ng kanyang ... |
| 2 | 0 | BLANTYRE, Malawi (AP) -- Bumiyahe patungong Ma... |
| 3 | 0 | Kasama sa programa ang pananalangin, bulaklak ... |
| 4 | 0 | Linisin ang Friendship Department dahil dadala... |
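
The `Unnamed: 0` column in the preview above usually appears when a CSV was saved together with its row index. A minimal sketch of how `index_col=0` avoids it is shown below; the two-row CSV text is a made-up stand-in for `full.csv`, not the real data.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for full.csv. The empty first header cell is
# what DataFrame.to_csv() writes when the index is saved along with the data.
csv_text = ',label,article\n0,0,"Unang artikulo."\n1,1,"Ikalawang artikulo."\n'

# Default read: the saved index comes back as an "Unnamed: 0" column.
raw = pd.read_csv(io.StringIO(csv_text))

# index_col=0 tells pandas to use the first column as the row index instead.
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(raw.columns))    # ['Unnamed: 0', 'label', 'article']
print(list(clean.columns))  # ['label', 'article']
```

Since the notebook only reads `data['article']` and `data['label']`, the extra column is harmless either way.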


In [66]:
data.label.value_counts()

0    1603
1    1603
Name: label, dtype: int64

In [67]:
X = data['article']
y = data['label']

### 3. Splitting the Dataset (for training and testing)

The data is now split into a training set and a test set. Since the dataset has only 3,206 rows, we use 80% for training and 20% for testing.

In [85]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20)

print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(2564,)
(642,)
(2564,)
(642,)
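
The split above is random, so the exact rows in each set (and the accuracy reported later) can change between runs. As a sketch of one possible variation, the example below fixes a seed and stratifies on the label so both sets keep the same class balance. The toy data and the seed value `42` are assumptions for illustration only.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the notebook's X and y: 10 texts, 5 per class.
X_demo = pd.Series([f"artikulo {i}" for i in range(10)])
y_demo = pd.Series([0, 1] * 5)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo,
    y_demo,
    test_size=0.20,
    random_state=42,   # arbitrary seed so the split is repeatable
    stratify=y_demo,   # keep the 50/50 class ratio in both sets
)

print(X_tr.shape, X_te.shape)         # (8,) (2,)
print(sorted(y_te.tolist()))          # [0, 1]
```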


### 4. Vectorizing the Dataset

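Next, we convert the articles into a document-term matrix of token counts so that the model can process them. The vectorizer is fitted on the training set only and then applied to the test set.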

In [86]:
vect = CountVectorizer(stop_words=list(stops.stop_words))
X_train_dtm = vect.fit_transform(X_train)
X_test_dtm = vect.transform(X_test)
X_test_dtm

<642x34717 sparse matrix of type '<class 'numpy.int64'>'
	with 51794 stored elements in Compressed Sparse Row format>
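
To make the vectorizing step more concrete, here is a small sketch on made-up sentences. The three-word stop list is a hypothetical stand-in for `stops.stop_words`, whose actual contents are not shown in this notebook.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stop list; the real one comes from stops.stop_words.
demo_stop_words = ["ang", "ng", "sa"]

train_docs = ["ang balita sa radyo", "balita ng bayan"]
test_docs = ["ang bagong balita"]

demo_vect = CountVectorizer(stop_words=demo_stop_words)

# fit_transform learns the vocabulary from the training documents only.
train_dtm = demo_vect.fit_transform(train_docs)

# transform reuses that vocabulary; unseen words such as "bagong" are dropped.
test_dtm = demo_vect.transform(test_docs)

print(sorted(demo_vect.vocabulary_))  # ['balita', 'bayan', 'radyo']
print(test_dtm.toarray())             # [[1 0 0]]
```

This is also why the test matrix above has the same number of columns as the training vocabulary: words that appear only in the test set do not get a column.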

### 5. Building the Model

We use the Multinomial Naive Bayes classifier because it is suited to classifying discrete features such as word counts.

In [87]:
nb = MultinomialNB()
nb.fit(X_train_dtm, y_train)

MultinomialNB()
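
As a rough illustration of what the classifier learns from word counts, the sketch below fits `MultinomialNB` on a tiny hand-made count matrix. The counts, the three-word vocabulary, and the labels are all invented for the example.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Rows are documents, columns are counts for a hypothetical 3-word vocabulary.
counts = np.array([
    [3, 0, 1],
    [2, 0, 0],
    [0, 4, 1],
    [0, 3, 0],
])
labels = np.array([0, 0, 1, 1])

# alpha=1.0 (the default) applies Laplace smoothing, so a word never seen
# in a class still gets a small non-zero probability.
demo_nb = MultinomialNB()
demo_nb.fit(counts, labels)

# A new document that mostly uses the first word, which only class 0 used.
new_doc = np.array([[2, 0, 1]])
print(demo_nb.predict(new_doc))        # [0]
print(demo_nb.predict_proba(new_doc))  # one probability per class, summing to 1
```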

In [88]:
y_pred_class = nb.predict(X_test_dtm)

In [89]:
metrics.accuracy_score(y_test, y_pred_class)

0.9330218068535826
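
Accuracy alone does not show which kind of error the model makes. One possible follow-up, sketched here on made-up labels rather than the notebook's actual predictions, is to look at the confusion matrix and the per-class report from `sklearn.metrics`.

```python
from sklearn import metrics

# Invented true and predicted labels, for illustration only.
y_true_demo = [0, 0, 0, 1, 1, 1]
y_pred_demo = [0, 0, 1, 1, 1, 0]

# Rows are true classes, columns are predicted classes.
cm = metrics.confusion_matrix(y_true_demo, y_pred_demo)
print(cm)
# [[2 1]
#  [1 2]]

# Precision, recall, and F1 for each class.
print(metrics.classification_report(y_true_demo, y_pred_demo))
```

In the notebook, the same two calls could be run with `y_test` and `y_pred_class` in place of the demo lists.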

### 6. Predicting Text

Now that the model is trained, we can use it to predict labels for new pieces of text.

In [None]:
# not yet implemented
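
The prediction cell is still empty, so the following is only a sketch of how it might look, not the authors' implementation. The helper `predict_texts` is a hypothetical name, and the tiny training set (including which label goes with which text) is invented so that the example runs on its own.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Invented toy training data; labels 0 and 1 are assigned arbitrarily here.
train_texts = [
    "opisyal na ulat ng ahensya",
    "ulat ng ahensya sa panahon",
    "nakakagulat na lihim ibinunyag",
    "lihim na gamot ibinunyag",
]
train_labels = [0, 0, 1, 1]

demo_vect = CountVectorizer()
demo_nb = MultinomialNB().fit(demo_vect.fit_transform(train_texts), train_labels)


def predict_texts(texts, vectorizer, model):
    """Return (predicted label, probability of that label) for each text."""
    # Reuse the fitted vocabulary; never refit the vectorizer on new text.
    dtm = vectorizer.transform(texts)
    labels = model.predict(dtm)
    probs = model.predict_proba(dtm).max(axis=1)
    return list(zip(labels.tolist(), probs.tolist()))


results = predict_texts(["ulat ng ahensya"], demo_vect, demo_nb)
print(results)
```

With the objects already fitted in steps 4 and 5, the equivalent call would be `predict_texts(["..."], vect, nb)`, where the input is a list of paragraphs, phrases, or sentences taken from an article.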