# Assignment 5: Fake News Detection using LSTM + BiLSTM
## **CSS432 Natural Language Processing and Information Retrieval**
### Section 1

---

## **Objective:**


The goal of this project is to build and evaluate LSTM and BiLSTM models for fake news detection. Students gain hands-on experience with natural language processing (NLP) and deep learning, including text preprocessing, model architecture design, hyperparameter tuning, and model evaluation. By running experiments and analyzing the results, students investigate how different model configurations and hyperparameters affect detection accuracy, and deepen their understanding of applying deep learning to text classification.



---

## **Download Dataset**

> [download link](https://gitlab.com/atlonxp/siit-nlp/-/raw/main/dl-rnn/fake-new-dataset.zip?ref_type=heads&inline=false)



In [None]:
# Download
!curl "https://gitlab.com/atlonxp/siit-nlp/-/raw/main/dl-rnn/fake-new-dataset.zip?ref_type=heads&inline=false" --output "fake-news.zip"

# Extract
import zipfile
with zipfile.ZipFile('fake-news.zip','r') as zip_ref:
  zip_ref.extractall()

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 90.8M  100 90.8M    0     0  95.3M      0 --:--:-- --:--:-- --:--:-- 95.3M




---


# LSTM

## Import Libraries

In [None]:
import tensorflow as tf
import numpy as np
import pandas as pd
import csv

## Preprocessing

Convert csv file into list

In [None]:
filename = "fake-news-dataset.csv"

# Raise the field size limit before reading; some article texts are very long
csv.field_size_limit(1000000)

# Read all rows into a list; the with-block closes the file afterwards
with open(filename, 'r', newline='') as dataset:
    dataset_list = list(csv.reader(dataset))

#Remove index column
dataset_list = [row[1:] for row in dataset_list]
print(dataset_list[0])

['title', 'text', 'label']


See the dataset

In [None]:
df = pd.DataFrame(dataset_list[1:6], columns=dataset_list[0])
df.style

Unnamed: 0,title,text,label
0,LAW ENFORCEMENT ON HIGH ALERT Following Threats Against Cops And Whites On 9-11By #BlackLivesMatter And #FYF911 Terrorists [VIDEO],"No comment is expected from Barack Obama Members of the #FYF911 or #FukYoFlag and #BlackLivesMatter movements called for the lynching and hanging of white people and cops. They encouraged others on a radio show Tuesday night to turn the tide and kill white people and cops to send a message about the killing of black people in America.One of the F***YoFlag organizers is called Sunshine. She has a radio blog show hosted from Texas called, Sunshine s F***ing Opinion Radio Show. A snapshot of her #FYF911 @LOLatWhiteFear Twitter page at 9:53 p.m. shows that she was urging supporters to Call now!! #fyf911 tonight we continue to dismantle the illusion of white Below is a SNAPSHOT Twitter Radio Call Invite #FYF911The radio show aired at 10:00 p.m. eastern standard time.During the show, callers clearly call for lynching and killing of white people.A 2:39 minute clip from the radio show can be heard here. It was provided to Breitbart Texas by someone who would like to be referred to as Hannibal. He has already received death threats as a result of interrupting #FYF911 conference calls.An unidentified black man said when those mother f**kers are by themselves, that s when when we should start f***ing them up. Like they do us, when a bunch of them ni**ers takin one of us out, that s how we should roll up. He said, Cause we already roll up in gangs anyway. There should be six or seven black mother f**ckers, see that white person, and then lynch their ass. Let s turn the tables. They conspired that if cops started losing people, then there will be a state of emergency. He speculated that one of two things would happen, a big-ass [R s?????] war, or ni**ers, they are going to start backin up. We are already getting killed out here so what the f**k we got to lose? Sunshine could be heard saying, Yep, that s true. That s so f**king true. 
He said, We need to turn the tables on them. Our kids are getting shot out here. Somebody needs to become a sacrifice on their side.He said, Everybody ain t down for that s**t, or whatever, but like I say, everybody has a different position of war. He continued, Because they don t give a f**k anyway. He said again, We might as well utilized them for that s**t and turn the tables on these n**ers. He said, that way we can start lookin like we ain t havin that many casualties, and there can be more causalities on their side instead of ours. They are out their killing black people, black lives don t matter, that s what those mother f**kers so we got to make it matter to them. Find a mother f**ker that is alone. Snap his ass, and then f***in hang him from a damn tree. Take a picture of it and then send it to the mother f**kers. We just need one example, and then people will start watchin . This will turn the tables on s**t, he said. He said this will start a trickle-down effect. He said that when one white person is hung and then they are just flat-hanging, that will start the trickle-down effect. He continued, Black people are good at starting trends. He said that was how to get the upper-hand. Another black man spoke up saying they needed to kill cops that are killing us. The first black male said, That will be the best method right there. Breitbart Texas previously reported how Sunshine was upset when racist white people infiltrated and disrupted one of her conference calls. She subsequently released the phone number of one of the infiltrators. The veteran immediately started receiving threatening calls.One of the #F***YoFlag movement supporters allegedly told a veteran who infiltrated their publicly posted conference call, We are going to rape and gut your pregnant wife, and your f***ing piece of sh*t unborn creature will be hung from a tree. 
Breitbart Texas previously encountered Sunshine at a Sandra Bland protest at the Waller County Jail in Texas, where she said all white people should be killed. She told journalists and photographers, You see this nappy-ass hair on my head? That means I am one of those more militant Negroes. She said she was at the protest because these redneck mother-f**kers murdered Sandra Bland because she had nappy hair like me. #FYF911 black radicals say they will be holding the imperial powers that are actually responsible for the terrorist attacks on September 11th accountable on that day, as reported by Breitbart Texas. There are several websites and Twitter handles for the movement. Palmetto Star describes himself as one of the head organizers. He said in a YouTube video that supporters will be burning their symbols of the illusion of their superiority, their false white supremacy, like the American flag, the British flag, police uniforms, and Ku Klux Klan hoods.Sierra McGrone or Nocturnus Libertus posted, you too can help a young Afrikan clean their a** with the rag of oppression. She posted two photos, one that appears to be herself, and a photo of a black man, wiping their naked butts with the American flag.For entire story: Breitbart News",1
1,,Did they post their votes for Hillary already?,1
2,UNBELIEVABLE! OBAMA’S ATTORNEY GENERAL SAYS MOST CHARLOTTE RIOTERS WERE “PEACEFUL” PROTESTERS…In Her Home State Of North Carolina [VIDEO],"Now, most of the demonstrators gathered last night were exercising their constitutional and protected right to peaceful protest in order to raise issues and create change. Loretta Lynch aka Eric Holder in a skirt",1
3,"Bobby Jindal, raised Hindu, uses story of Christian conversion to woo evangelicals for potential 2016 bid","A dozen politically active pastors came here for a private dinner Friday night to hear a conversion story unique in the context of presidential politics: how Louisiana Gov. Bobby Jindal traveled from Hinduism to Protestant Christianity and, ultimately, became what he calls an “evangelical Catholic.” Over two hours, Jindal, 42, recalled talking with a girl in high school who wanted to “save my soul,” reading the Bible in a closet so his parents would not see him and feeling a stir while watching a movie during his senior year that depicted Jesus on the cross. “I was struck, and struck hard,” Jindal told the pastors. “This was the Son of God, and He had died for our sins.” Jindal’s session with the Christian clergy, who lead congregations in the early presidential battleground states of Iowa and South Carolina, was part of a behind-the-scenes effort by the Louisiana governor to find a political base that could help propel him into the top tier of Republican candidates seeking to run for the White House in 2016. Known in GOP circles mostly for his mastery of policy issues such as health care, Jindal, a Rhodes Scholar and graduate of the Ivy League’s Brown University, does not have an obvious pool of activist supporters to help drive excitement outside his home state. So he is harnessing his religious experience in a way that has begun to appeal to parts of the GOP’s influential core of religious conservatives, many of whom have yet to find a favorite among the Republicans eyeing the presidential race. Other potential 2016 GOP candidates are wooing the evangelical base, including Sens. Rand Paul (Ky.) and Ted Cruz (Tex.) and Indiana Gov. Mike Pence. But over the weekend in Lynchburg — a mecca of sorts for evangelicals as the home of Liberty University, founded in the 1970s by the Rev. Jerry Falwell — Jindal appeared to make progress. 
In addition to his dinner with the pastors, he delivered a well-received “call to action” address to 40,000 Christian conservatives gathered for Liberty’s commencement ceremony, talking again about his faith while assailing what he said was President Obama’s record of attacking religious liberty. The pastors who came to meet Jindal said his intimate descriptions of his experiences stood out. “He has the convictions, and he has what it takes to communicate them,” said Brad Sherman of Solid Rock Christian Church in Coralville, Iowa. Sherman helped former Arkansas governor Mike Huckabee in his winning 2008 campaign for delegates in Iowa. Another Huckabee admirer, the Rev. C. Mitchell Brooks of Second Baptist Church in Belton, S.C., said Jindal’s commitment to Christian values and his compelling story put him “on a par” with Huckabee, who was a Baptist preacher before entering politics. The visiting pastors flew to Lynchburg over the weekend at the invitation of the American Renewal Project, a well-funded nonprofit group that encourages evangelical Christians to engage in the civic arena with voter guides, get-out-the-vote drives and programs to train pastors in grass-roots activism. The group’s founder, David Lane, has built a pastor network in politically important states such as Iowa, Missouri, Ohio and South Carolina and has led trips to Israel with Paul and others seeking to make inroads with evangelical activists. The group that Lane invited to Lynchburg included Donald Wild­mon, a retired minister and founder of the American Family Association, a prominent evangelical activist group that has influence through its network of more than 140 Christian radio stations. Most of the pastors that Lane’s organization brought to Lynchburg had not met Jindal. 
But they said he captured their interest recently when he stepped forward to defend Phil Robertson, patriarch of the “Duck Dynasty” television-show family, amid a controversy over disparaging remarks he made about gays in an interview with GQ magazine. Throughout his Lynchburg visit, Jindal presented himself as a willing culture warrior. During his commencement address Saturday, he took up the cause of twin brothers whose HGTV reality series about renovating and reselling houses, “Flip It Forward,” was canceled last week after a Web site revealed that they had protested against same-sex marriage at the 2012 Democratic National Convention in Charlotte. The siblings, Jason and David Benham, both Liberty graduates, attended the graduation and a private lunch with Jindal, who called the action against them “another demonstration of intolerance from the entertainment industry.” “If these guys had protested at the Republican Party convention, instead of canceling their show, HGTV would probably have given them a raise,” Jindal said as the Liberty crowd applauded. He cited the Hobby Lobby craft store chain, which faced a legal challenge after refusing to provide employees with insurance coverage for contraceptives as required under the Affordable Care Act. Members of the family that owns Hobby Lobby, who have become heroes to many religious conservatives, have said that they are morally opposed to the use of certain types of birth control and that they considered the requirement a violation of their First Amendment right to religious freedom. The family was “committed to honor the Lord by being generous employers, paying well above minimum wage and increasing salaries four years in a row even in the midst of the enduring recession,” Jindal told the Liberty graduates. “None of this matters to the Obama administration.” But for the pastors who came to see Jindal in action, the governor’s own story was the highlight of the weekend. 
And in many ways, he was unlike any other aspiring president these activists had met. Piyush Jindal was born in 1971, four months after his parents arrived in Baton Rouge, La., from their native India. He changed his name to Bobby as a young boy, adopting the name of a character on a favorite television show, “The Brady Bunch.” His decision to become a Christian, he told the pastors, did not come in one moment of lightning epiphany. Instead, he said, it happened in phases, growing from small seeds planted over time. Jindal recalled that his closest friend from grade school gave him a Bible with his name emblazoned in gold on the cover as a Christmas present. It struck him initially as an unimpressive gift, Jindal told the pastors. “Who in the world would spend good money for a Bible when everyone knows you can get one free in any hotel?” he recalled thinking at the time. “And the gold lettering meant I couldn’t give it away or return it.” His religious education reached a higher plane during his junior year in high school, he told his dinner audience. He wanted to ask a pretty girl on a date during a hallway conversation, and she started talking about her faith in God and her opposition to abortion. The girl invited him to visit her church. Jindal said he was skeptical and set out to “investigate all these fanciful claims” made by the girl and other friends. He started reading the Bible in his closet at home. “I was unsure how my parents would react,” he said. After the stirring moment when he saw Christ depicted on the cross during the religious movie, the Bible and his very existence suddenly seemed clearer to him, Jindal told the pastors. Jindal did not dwell on his subsequent conversion to Catholicism just a few years later in college, where he said he immersed himself in the traditions of the church. 
He touched on it briefly during the commencement address, noting in passing that “I am best described as an evangelical Catholic.” Mostly, he sought to showcase the ways in which he shares values with other Christian conservatives. “I read the words of Jesus Christ, and I realized that they were true,” Jindal told the graduates Saturday, offering a less detailed accounting of his conversion than he had done the night before with the pastors. “I used to think that I had found God, but I believe it is more accurate to say that He found me.”",0
4,SATAN 2: Russia unvelis an image of its terrifying new ‘SUPERNUKE’ – Western world takes notice,"The RS-28 Sarmat missile, dubbed Satan 2, will replace the SS-18 Flies at 4.3 miles (7km) per sec and with a range of 6,213 miles (10,000km) The weapons are perceived as part of an increasingly aggressive Russia It could deliver a warhead of 40 megatons – 2,000 times as powerful as the atom bombs dropped on Hiroshima and Nagasaki in 1945 By LIBBY PLUMMER and GARETH DAVIE S Russia has unveiled chilling pictures of its largest ever nuclear missile, capable of destroying an area the size of France. The RS-28 Sarmat missile, dubbed Satan 2 by Nato, has a top speed of 4.3 miles (7km) per second and has been designed to outfox anti-missile shield systems. The new Sarmat missile could deliver warheads of 40 megatons – 2,000 times as powerful as the atom bombs dropped on Hiroshima and Nagasaki in 1945. Scroll down for video Russian President Vladimir Putin is reportedly planning to replace the country’s older SS-18 Satan weapons with the new missiles amid a string of recent disagreements with the West. The Kremlin has stepped up the rhetoric against the West and carried a series of manoeuvres that has infuriated politicians in the US and UK. The pictures were revealed online by chief designers from the Makeyev Rocket Design Bureau. A message posted alongside the picture said: ‘In accordance with the Decree of the Russian Government ‘On the State Defense Order for 2010 and the planning period 2012-2013’, the Makeyev Rocket Design Bureau was instructed to start design and development work on the Sarmat. ‘ The RS-28 Sarmat missile is said to contain 16 nuclear warheads and is capable of destroying an area the size of France or Texas, according to Russian news network Zvezda, which is owned by Russia’s ministry of defence. The weapon is also able to evade radar. 
It is expected to have a range of 6,213 miles (10,000 km), which would allow Moscow to attack London and FOR ENTIRE ARTICLE CLICK LINK",1


Since the text is very long, only the title will be used.

In [None]:
#Remove text column
dataset_list = [[row[0]] + [row[-1]] for row in dataset_list]
df = pd.DataFrame(dataset_list[1:6], columns=dataset_list[0])
df.style

Unnamed: 0,title,label
0,LAW ENFORCEMENT ON HIGH ALERT Following Threats Against Cops And Whites On 9-11By #BlackLivesMatter And #FYF911 Terrorists [VIDEO],1
1,,1
2,UNBELIEVABLE! OBAMA’S ATTORNEY GENERAL SAYS MOST CHARLOTTE RIOTERS WERE “PEACEFUL” PROTESTERS…In Her Home State Of North Carolina [VIDEO],1
3,"Bobby Jindal, raised Hindu, uses story of Christian conversion to woo evangelicals for potential 2016 bid",0
4,SATAN 2: Russia unvelis an image of its terrifying new ‘SUPERNUKE’ – Western world takes notice,1


Remove entries with an empty title

In [None]:
#Remove empty title
dataset_list = [row for row in dataset_list if row[0]]
df = pd.DataFrame(dataset_list[1:6], columns=dataset_list[0])
df.style

Unnamed: 0,title,label
0,LAW ENFORCEMENT ON HIGH ALERT Following Threats Against Cops And Whites On 9-11By #BlackLivesMatter And #FYF911 Terrorists [VIDEO],1
1,UNBELIEVABLE! OBAMA’S ATTORNEY GENERAL SAYS MOST CHARLOTTE RIOTERS WERE “PEACEFUL” PROTESTERS…In Her Home State Of North Carolina [VIDEO],1
2,"Bobby Jindal, raised Hindu, uses story of Christian conversion to woo evangelicals for potential 2016 bid",0
3,SATAN 2: Russia unvelis an image of its terrifying new ‘SUPERNUKE’ – Western world takes notice,1
4,About Time! Christian Group Sues Amazon and SPLC for Designation as Hate Group,1


In [None]:
print(len(dataset_list))

71577


Remove entries with a non-numeric label (the header row is dropped here too, since `'label'` is not numeric)

In [None]:
dataset_list = [row for row in dataset_list if row[1].isnumeric()]
print(len(dataset_list))

71576


Preprocess Text

In [None]:
import re
import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download('stopwords')
nltk.download('punkt')

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [None]:
#Adapted from assignment 1
def process_title(title):
    """Clean a news title.
    Input:
        title: a string containing a title
    Output:
        title_clean: a string of cleaned title

    """
    stemmer = PorterStemmer()
    stopwords_english = stopwords.words('english')
    # keep alphabetic characters only; everything else becomes a space
    title = re.sub(r'[^a-zA-Z]', ' ', title)
    # lower case
    title = title.lower()
    # tokenize
    title_token = word_tokenize(title, language='english', preserve_line=False)

    title_clean = ""
    for word in title_token:
        if (word not in stopwords_english): # remove stopwords
            stem_word = stemmer.stem(word)  # stemming word
            title_clean =  title_clean + " " + stem_word

    return title_clean

Test process_title

In [None]:
print(process_title(dataset_list[1][0]))
print("\n" + dataset_list[1][0])

 unbeliev obama attorney gener say charlott rioter peac protest home state north carolina video

UNBELIEVABLE! OBAMA’S ATTORNEY GENERAL SAYS MOST CHARLOTTE RIOTERS WERE “PEACEFUL” PROTESTERS…In Her Home State Of North Carolina [VIDEO]


Preprocess the dataset

In [None]:
for row in dataset_list:
  row[0] = process_title(row[0])

See preprocessed titles

In [None]:
df = pd.DataFrame(dataset_list[:20], columns=["title", "label"])
df.style

Unnamed: 0,title,label
0,law enforc high alert follow threat cop white blacklivesmatt fyf terrorist video,1
1,unbeliev obama attorney gener say charlott rioter peac protest home state north carolina video,1
2,bobbi jindal rais hindu use stori christian convers woo evangel potenti bid,0
3,satan russia unv imag terrifi new supernuk western world take notic,1
4,time christian group sue amazon splc design hate group,1
5,dr ben carson target ir never audit spoke nation prayer breakfast,1
6,hous intel chair trump russia fake stori evid anyth video,1
7,sport bar owner ban nfl game show true american sport like speak rural america video,1
8,latest pipelin leak underscor danger dakota access pipelin,1
9,gop senat smack punchabl alt right nazi internet,1




---
## Define model

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import TextVectorization
from tensorflow.keras.layers import Embedding
from tensorflow.keras.utils import pad_sequences
from tensorflow.keras.callbacks import ModelCheckpoint
from tensorflow.keras.utils import to_categorical

Split into corpus and labels

In [None]:
corpus = [row[0] for row in dataset_list]
label = [int(row[1]) for row in dataset_list]

In [None]:
for i in range(20):
  print(corpus[i], label[i])

 law enforc high alert follow threat cop white blacklivesmatt fyf terrorist video 1
 unbeliev obama attorney gener say charlott rioter peac protest home state north carolina video 1
 bobbi jindal rais hindu use stori christian convers woo evangel potenti bid 0
 satan russia unv imag terrifi new supernuk western world take notic 1
 time christian group sue amazon splc design hate group 1
 dr ben carson target ir never audit spoke nation prayer breakfast 1
 hous intel chair trump russia fake stori evid anyth video 1
 sport bar owner ban nfl game show true american sport like speak rural america video 1
 latest pipelin leak underscor danger dakota access pipelin 1
 gop senat smack punchabl alt right nazi internet 1
 may brexit offer would hurt cost eu citizen eu parliament 0
 schumer call trump appoint offici overse puerto rico relief 0
 watch hilari ad call question health age clinton crime famili boss 1
 chang expect espn polit agenda despit huge subscrib declin breitbart 0
 billionair 

Split into training and test set.

In [None]:
dataset_size = len(label)
split_size = int(dataset_size * 0.8)
x_train = np.array(corpus[:split_size])
y_train =  np.array(label[:split_size])
x_test =  np.array(corpus[split_size:])
y_test =  np.array(label[split_size:])
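
The split above takes the first 80% of rows for training and the rest for testing, in file order. If the rows were grouped by label or source, a shuffled split would be safer. The following is a minimal sketch of that alternative using only the standard library; the function name `shuffled_split`, the seed, and the toy data are assumptions for illustration and are not part of the original notebook.

```python
import random

def shuffled_split(samples, labels, train_fraction=0.8, seed=42):
    """Shuffle (sample, label) pairs together, then cut at train_fraction."""
    pairs = list(zip(samples, labels))
    random.Random(seed).shuffle(pairs)  # fixed seed keeps the split reproducible
    cut = int(len(pairs) * train_fraction)
    train, test = pairs[:cut], pairs[cut:]
    x_tr = [s for s, _ in train]
    y_tr = [l for _, l in train]
    x_te = [s for s, _ in test]
    y_te = [l for _, l in test]
    return x_tr, y_tr, x_te, y_te

# Hypothetical toy data, not taken from the dataset
toy_corpus = [f"title {i}" for i in range(10)]
toy_labels = [i % 2 for i in range(10)]
x_tr, y_tr, x_te, y_te = shuffled_split(toy_corpus, toy_labels)
print(len(x_tr), len(x_te))
```

Shuffling the pairs together keeps every title attached to its own label, which is the main thing to get right when shuffling by hand.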

Check label distribution

In [None]:
train_sum = 0
test_sum = 0
for i in y_train:
  train_sum += i
for i in y_test:
  test_sum += i

print("training set:", train_sum/split_size)
print("test set:", test_sum/(dataset_size-split_size))

training set: 0.5104785190359763
test set: 0.5111763062307907
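
Because the labels are 0 or 1, dividing the sum by the set size gives the fraction of samples labelled 1, so both sets are roughly balanced at about 51%. The same check can be sketched with `collections.Counter`, which also works when there are more than two classes. The helper name `label_fractions` and the toy labels are assumptions for illustration.

```python
from collections import Counter

def label_fractions(labels):
    """Return {label: fraction of samples carrying that label}."""
    counts = Counter(labels)
    total = len(labels)
    return {lab: counts[lab] / total for lab in sorted(counts)}

# Hypothetical toy labels, not taken from the dataset
toy_labels = [1, 0, 1, 1, 0, 1, 0, 1]
fractions = label_fractions(toy_labels)
print(fractions)
```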


Text Vectorization

In [None]:
text_vectorizer = TextVectorization(output_mode="int")
text_vectorizer.adapt(x_train)

In [None]:
for i in range(5):
  print(text_vectorizer(x_train[200+i]).numpy())
  print(x_train[200+i])

[ 191  390 3204  316  650 3213 6900  898  426   39  417]
 famili arm robberi suspect outrag pizza hut employe shot kill son
[  59    2  553 2769    2   29  185  409  428 2129    3]
 anti trump radic taunt trump support isi flag photo behead video
[ 279 3819 4414  704   85  146   32  683  166  277    3]
 wow atlanta gym owner ban cop make apolog polici sign video
[ 699   12  340 2978  277  152 1112 3911    4    6    5]
 turkish presid return istanbul sign militari coup falter new york time
[  112  4743   650  4826  4893    18 13611   678  6562  1001   421 12036]
 comment sjw outrag leonardo dicaprio white rumi role unfound iranian explain ztech
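
With `output_mode="int"`, `TextVectorization` replaces each token by an integer id: 0 is reserved for padding, 1 for out-of-vocabulary tokens, and the remaining ids are assigned by descending frequency. That is consistent with the output above, where `trump` maps to 2 and `video` to 3. The pure-Python sketch below imitates the idea on toy titles. The helper names and the alphabetical tie-breaking are assumptions, not the layer's actual implementation.

```python
from collections import Counter

def build_vocab(texts):
    """Map tokens to ids: 0 = padding, 1 = unknown, then by descending frequency."""
    counts = Counter(tok for text in texts for tok in text.split())
    # Ties are broken alphabetically here; this is an assumption of the sketch
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    vocab = {"": 0, "[UNK]": 1}
    for tok in ordered:
        vocab[tok] = len(vocab)
    return vocab

def vectorize(text, vocab):
    """Split on whitespace and look up each token, using 1 for unknown tokens."""
    return [vocab.get(tok, 1) for tok in text.split()]

# Hypothetical toy titles, not taken from the dataset
toy_titles = ["trump video", "trump new york", "video trump"]
vocab = build_vocab(toy_titles)
ids = vectorize(" trump video breitbart", vocab)
print(ids)
```

The token `breitbart` never appears in the toy titles, so it falls back to the unknown id. The same happens to test-set words that were not seen when the vectorizer was adapted on `x_train`.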


Define sequential model

Use binary_crossentropy because the output is 0 or 1
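
For a true label y in {0, 1} and a predicted probability p, binary cross-entropy is the mean of -[y log(p) + (1 - y) log(1 - p)]. The sketch below computes it by hand on toy values, to show that confident wrong predictions are penalized far more than confident correct ones. The clipping constant `eps` and the toy inputs are assumptions for illustration.

```python
import math

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Hypothetical predictions for one fake (1) and one real (0) title
confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1, 0], [0.1, 0.9])
print(confident_right, confident_wrong)
```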

In [None]:
model = Sequential()
model.add(Embedding(input_dim=text_vectorizer.vocabulary_size(), output_dim=32))
model.add(LSTM(128, return_sequences=True))  # input shape is inferred from the Embedding output
model.add(Dropout(0.2))
model.add(LSTM(128))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_3 (Embedding)     (None, None, 32)          583808    
                                                                 
 lstm_6 (LSTM)               (None, None, 128)         82432     
                                                                 
 dropout_6 (Dropout)         (None, None, 128)         0         
                                                                 
 lstm_7 (LSTM)               (None, 128)               131584    
                                                                 
 dropout_7 (Dropout)         (None, 128)               0         
                                                                 
 dense_3 (Dense)             (None, 1)                 129       
                                                                 
Total params: 797953 (3.04 MB)
Trainable params: 79795
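
The parameter counts in the summary can be reproduced by hand. An LSTM layer has four gates, each with input weights, recurrent weights, and a bias, and the vocabulary size follows from the embedding row (583808 / 32). The helper names below are assumptions for illustration.

```python
def embedding_params(vocab_size, dim):
    """One dim-sized vector per vocabulary entry."""
    return vocab_size * dim

def lstm_params(input_dim, units):
    """4 gates, each with input weights, recurrent weights and a bias."""
    return 4 * ((input_dim + units + 1) * units)

def dense_params(input_dim, units):
    """Weights plus one bias per output unit."""
    return (input_dim + 1) * units

vocab_size = 583808 // 32  # implied by the embedding row of the summary
counts = [
    embedding_params(vocab_size, 32),  # embedding
    lstm_params(32, 128),              # first LSTM, fed by 32-dim embeddings
    lstm_params(128, 128),             # second LSTM, fed by 128-dim outputs
    dense_params(128, 1),              # sigmoid output
]
total = sum(counts)
print(counts, total)
```

Most of the parameters sit in the embedding table, so the vocabulary size has a larger effect on model size than the LSTM width does.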



---

## Define checkpoint

In [None]:
filepath = "LSTM-model2-weights-improvement-{epoch:02d}-{loss:.4f}-bigger.keras"
checkpoint_LSTM = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list_LSTM = [checkpoint_LSTM]
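
With `monitor='loss'`, `save_best_only=True`, and `mode='min'`, a checkpoint is written whenever the training loss drops below the best value so far. Training loss usually keeps falling, so nearly every epoch is saved. Passing `monitor='val_loss'` instead would save only when the validation loss improves. The sketch below imitates the rule in plain Python; the function name and both loss curves are hypothetical.

```python
def epochs_saved(losses):
    """Return the 1-based epochs where a 'save best only, mode=min' rule saves."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:  # strictly better than the best value so far
            best = loss
            saved.append(epoch)
    return saved

# Hypothetical loss curves, for illustration only
train_loss = [0.42, 0.22, 0.18, 0.16, 0.14]
val_loss = [0.30, 0.27, 0.26, 0.29, 0.33]
saved_on_train = epochs_saved(train_loss)
saved_on_val = epochs_saved(val_loss)
print(saved_on_train, saved_on_val)
```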



---
## Fit model


Vectorize the training and test sets

In [None]:
x_train_v = [text_vectorizer(text) for text in x_train]
x_test_v = [text_vectorizer(text) for text in x_test]

See vectorized titles

In [None]:
for i in range(20):
  print(x_train_v[1000+i])

tf.Tensor([ 251 6666  373 2579 4714  692  421 1086 1380  542], shape=(10,), dtype=int64)
tf.Tensor([ 182 1326  202 2361  243  291   75], shape=(7,), dtype=int64)
tf.Tensor([ 423   57   32 1798 2155 7956], shape=(6,), dtype=int64)
tf.Tensor([  164    72  3052 11140   514  9335], shape=(6,), dtype=int64)
tf.Tensor([ 352 1391 3576  877  309 3352 7119   22 1842 7120 2000  133   14], shape=(13,), dtype=int64)
tf.Tensor([1933  154  139 1066 1423  400  211  653], shape=(8,), dtype=int64)
tf.Tensor([ 221  535  775 2078 2806    2  379 1140  768], shape=(9,), dtype=int64)
tf.Tensor([    2   179    41   564  2793    96 16854    44  2331], shape=(9,), dtype=int64)
tf.Tensor([  18   13  893 1391 1373  388    2   44  427  300    2 3206], shape=(12,), dtype=int64)
tf.Tensor([   2    8   63  791  111  118 2729  264], shape=(8,), dtype=int64)
tf.Tensor([ 25 643 966 177 248  72  70], shape=(7,), dtype=int64)
tf.Tensor([ 400    8  123 2641  340  653  137], shape=(7,), dtype=int64)
tf.Tensor([  41 1629 11

In [None]:
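# Save the vectorized titles, one row of token ids per title
with open('x_train_v.csv', 'w', newline='') as f:
    csv.writer(f).writerows(seq.numpy().tolist() for seq in x_train_v)
with open('x_test_v.csv', 'w', newline='') as f:
    csv.writer(f).writerows(seq.numpy().tolist() for seq in x_test_v)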

Pad sequences to equal length

In [None]:
x_train_vp = pad_sequences(x_train_v, padding="post")
x_test_vp = pad_sequences(x_test_v, padding="post")
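
`pad_sequences(..., padding="post")` appends zeros to the end of each sequence until all of them match the longest one in that call. A minimal pure-Python sketch of the same idea follows; the helper name `pad_post` and the toy sequences are assumptions for illustration.

```python
def pad_post(sequences, value=0):
    """Right-pad every sequence with `value` up to the longest length."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [value] * (max_len - len(seq)) for seq in sequences]

# Hypothetical token-id sequences of different lengths
toy = [[182, 1326, 202], [423, 57], [2]]
padded = pad_post(toy)
print(padded)
```

Since the training and test sets are padded in separate calls, their widths may differ. The model accepts this because its time dimension is `None` in the summary. Note also that the `Embedding` layer is created without `mask_zero=True`, so the padding id 0 is treated like any other token.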

Fit model

In [None]:
model.fit(x_train_vp, y_train, epochs=10, batch_size=64, validation_split=0.2, callbacks=callbacks_list_LSTM)

Epoch 1/10
Epoch 1: loss improved from inf to 0.41825, saving model to LSTM-model2-weights-improvement-01-0.4183-bigger.keras
Epoch 2/10
Epoch 2: loss improved from 0.41825 to 0.21879, saving model to LSTM-model2-weights-improvement-02-0.2188-bigger.keras
Epoch 3/10
Epoch 3: loss improved from 0.21879 to 0.18299, saving model to LSTM-model2-weights-improvement-03-0.1830-bigger.keras
Epoch 4/10
Epoch 4: loss improved from 0.18299 to 0.15773, saving model to LSTM-model2-weights-improvement-04-0.1577-bigger.keras
Epoch 5/10
Epoch 5: loss improved from 0.15773 to 0.13833, saving model to LSTM-model2-weights-improvement-05-0.1383-bigger.keras
Epoch 6/10
Epoch 6: loss improved from 0.13833 to 0.12330, saving model to LSTM-model2-weights-improvement-06-0.1233-bigger.keras
Epoch 7/10
Epoch 7: loss improved from 0.12330 to 0.10413, saving model to LSTM-model2-weights-improvement-07-0.1041-bigger.keras
Epoch 8/10
Epoch 8: loss improved from 0.10413 to 0.08782, saving model to LSTM-model2-weights

<keras.src.callbacks.History at 0x7d7e7c4e17e0>



---

## Test the model and report metrics such as accuracy, precision, recall, and F1-score

Compare the checkpoint with the lowest training loss against the one with the lowest validation loss

In [None]:
model_loss = Sequential()
model_loss.add(Embedding(input_dim=text_vectorizer.vocabulary_size(), output_dim=32))
model_loss.add(LSTM(128, return_sequences=True))
model_loss.add(Dropout(0.2))
model_loss.add(LSTM(128))
model_loss.add(Dropout(0.2))
model_loss.add(Dense(1, activation='sigmoid'))
model_loss.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model_loss.summary()

model_val_loss = Sequential()
model_val_loss.add(Embedding(input_dim=text_vectorizer.vocabulary_size(), output_dim=32))
model_val_loss.add(LSTM(128, return_sequences=True))
model_val_loss.add(Dropout(0.2))
model_val_loss.add(LSTM(128))
model_val_loss.add(Dropout(0.2))
model_val_loss.add(Dense(1, activation='sigmoid'))
model_val_loss.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model_val_loss.summary()

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_4 (Embedding)     (None, None, 32)          583808    
                                                                 
 lstm_8 (LSTM)               (None, None, 128)         82432     
                                                                 
 dropout_8 (Dropout)         (None, None, 128)         0         
                                                                 
 lstm_9 (LSTM)               (None, 128)               131584    
                                                                 
 dropout_9 (Dropout)         (None, 128)               0         
                                                                 
 dense_4 (Dense)             (None, 1)                 129       
                                                                 
Total params: 797953 (3.04 MB)
Trainable params: 79795

In [None]:
#least training loss - Epoch 10
filename_loss = "/content/LSTM-model2-weights-improvement-10-0.0668-bigger.keras"
model_loss.load_weights(filename_loss)
model_loss.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])



In [None]:
#least validation loss - Epoch 3
filename_val_loss = "/content/LSTM-model2-weights-improvement-03-0.1830-bigger.keras"
model_val_loss.load_weights(filename_val_loss)
model_val_loss.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])



Predict on test set

In [None]:
y_predict_loss = model_loss.predict(x_test_vp)
y_predict_val_loss = model_val_loss.predict(x_test_vp)



In [None]:
y_predict_loss = y_predict_loss.round()
y_predict_val_loss = y_predict_val_loss.round()
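
The `round()` calls above turn sigmoid probabilities into 0/1 labels. The sketch below uses made-up probabilities and a hypothetical helper, `to_labels`, to make the 0.5 cutoff explicit. Both Python's built-in `round` and NumPy's `round` send exact halves to the nearest even integer, so a probability of exactly 0.5 becomes 0 in either case.

```python
# Hypothetical sigmoid outputs, flattened to a plain list for illustration
probabilities = [0.03, 0.50, 0.51, 0.97]

def to_labels(probs, threshold=0.5):
    """Return 1 for probabilities strictly above the threshold, else 0."""
    return [1 if p > threshold else 0 for p in probs]

labels = to_labels(probabilities)

# Built-in round sends exact halves to the even integer, so 0.50 -> 0
rounded = [round(p) for p in probabilities]

print(labels)
print(rounded)
```

An explicit threshold also makes it easy to try a different cutoff (for example 0.6) if one class should be favored.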

Metrics

In [None]:
from sklearn.metrics import classification_report, accuracy_score

#y_test is label
#y_predict is prediction
print("Least training loss:\n")
print("Accuracy:", accuracy_score(y_test, y_predict_loss))
print( classification_report(y_test, y_predict_loss))
print("\nLeast validation loss:\n")
print("Accuracy:", accuracy_score(y_test, y_predict_val_loss))
print( classification_report(y_test, y_predict_val_loss))

Least training loss:

Accuracy: 0.8640681754680078
              precision    recall  f1-score   support

           0       0.87      0.85      0.86      6998
           1       0.86      0.87      0.87      7318

    accuracy                           0.86     14316
   macro avg       0.86      0.86      0.86     14316
weighted avg       0.86      0.86      0.86     14316


Least validation loss:

Accuracy: 0.877689298686784
              precision    recall  f1-score   support

           0       0.92      0.82      0.87      6998
           1       0.84      0.93      0.89      7318

    accuracy                           0.88     14316
   macro avg       0.88      0.88      0.88     14316
weighted avg       0.88      0.88      0.88     14316
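
The precision, recall, and f1-score columns in the reports above all derive from confusion counts. As a rough sketch, the hypothetical helper `binary_report` below recomputes them for one class on a small invented label set (not the real test data).

```python
def binary_report(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Invented labels for illustration only
y_true_demo = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred_demo = [1, 1, 0, 0, 1, 0, 1, 0]

precision, recall, f1 = binary_report(y_true_demo, y_pred_demo)
accuracy = sum(t == p for t, p in zip(y_true_demo, y_pred_demo)) / len(y_true_demo)
print(precision, recall, f1, accuracy)
```

The `support` column is simply the number of true examples of each class, so the two support values add up to the size of the test set.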





---
# BiLSTM



---
## Define model


In [None]:
from tensorflow.keras.layers import Bidirectional

Define sequential model

Using two Bidirectional layers

In [None]:
model_bi = Sequential()
model_bi.add(Embedding(input_dim=text_vectorizer.vocabulary_size(), output_dim=32))
model_bi.add(Bidirectional(LSTM(64, input_shape=(1,32), return_sequences=True)))
model_bi.add(Dropout(0.2))
model_bi.add(Bidirectional(LSTM(64)))
model_bi.add(Dropout(0.2))
model_bi.add(Dense(1, activation='sigmoid'))
model_bi.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model_bi.summary()

Model: "sequential_11"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_11 (Embedding)    (None, None, 32)          583808    
                                                                 
 bidirectional_2 (Bidirecti  (None, None, 128)         49664     
 onal)                                                           
                                                                 
 dropout_22 (Dropout)        (None, None, 128)         0         
                                                                 
 bidirectional_3 (Bidirecti  (None, 128)               98816     
 onal)                                                           
                                                                 
 dropout_23 (Dropout)        (None, 128)               0         
                                                                 
 dense_11 (Dense)            (None, 1)               
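
The `Param #` column in the summaries can be checked by hand. The sketch below uses hypothetical helper functions and the standard LSTM count of four gates, each with an input kernel, a recurrent kernel, and a bias. A `Bidirectional` wrapper holds one forward and one backward LSTM, so its count doubles. With Keras's default concatenation of the two directions, the second Bidirectional layer sees 128 input features (64 per direction).

```python
def lstm_params(input_dim, units):
    """Four gates, each with input kernel, recurrent kernel, and bias."""
    return 4 * (units * (input_dim + units) + units)

def bilstm_params(input_dim, units):
    """Forward and backward LSTMs have separate weights."""
    return 2 * lstm_params(input_dim, units)

def dense_params(input_dim, units):
    """Weight matrix plus one bias per unit."""
    return input_dim * units + units

embedding = 583808  # taken from the Embedding row of the summaries

# Stacked LSTM(128) model
lstm_total = embedding + lstm_params(32, 128) + lstm_params(128, 128) + dense_params(128, 1)
print(lstm_params(32, 128), lstm_params(128, 128), lstm_total)

# Bidirectional LSTM(64) layers
print(bilstm_params(32, 64), bilstm_params(128, 64))
```

The same formula explains why a plain `LSTM(64)` layer (24832 parameters on a 32-dimensional input) holds half as many weights as its Bidirectional counterpart, so the two are not interchangeable when loading weights.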



---
## Define checkpoint

In [None]:
filepath_BiLSTM = "BiLSTM-model2-weights-improvement-{epoch:02d}-{loss:.4f}-bigger.keras"
checkpoint_BiLSTM = ModelCheckpoint(filepath_BiLSTM, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list_BiLSTM = [checkpoint_BiLSTM]
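
The checkpoint above monitors the training loss in `min` mode with `save_best_only=True`, so a file is written only when the loss drops below the best value seen so far. The plain-Python sketch below imitates that rule and the filename template. The loss values and the helper `simulate_checkpoints` are illustrative assumptions, not Keras internals.

```python
filepath = "BiLSTM-model2-weights-improvement-{epoch:02d}-{loss:.4f}-bigger.keras"

def simulate_checkpoints(losses, template):
    """Return the filenames a min-mode, save-best-only checkpoint would write."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:  # improvement: remember it and save
            best = loss
            saved.append(template.format(epoch=epoch, loss=loss))
    return saved

# Illustrative per-epoch training losses; epoch 3 does not improve
demo_losses = [0.28393, 0.16783, 0.17000, 0.12671]

saved_files = simulate_checkpoints(demo_losses, filepath)
for name in saved_files:
    print(name)
```

Monitoring `'val_loss'` instead of `'loss'` would apply the same rule to the validation loss, which is usually the better signal for choosing a checkpoint to evaluate.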



---
## Fit model


In [None]:
model_bi.fit(x_train_vp, y_train, epochs=10, batch_size=64, validation_split=0.2, callbacks=callbacks_list_BiLSTM)

Epoch 1/10
Epoch 1: loss improved from inf to 0.28393, saving model to BiLSTM-model2-weights-improvement-01-0.2839-bigger.keras
Epoch 2/10
Epoch 2: loss improved from 0.28393 to 0.16783, saving model to BiLSTM-model2-weights-improvement-02-0.1678-bigger.keras
Epoch 3/10
Epoch 3: loss improved from 0.16783 to 0.12671, saving model to BiLSTM-model2-weights-improvement-03-0.1267-bigger.keras
Epoch 4/10
Epoch 4: loss improved from 0.12671 to 0.09713, saving model to BiLSTM-model2-weights-improvement-04-0.0971-bigger.keras
Epoch 5/10
Epoch 5: loss improved from 0.09713 to 0.07910, saving model to BiLSTM-model2-weights-improvement-05-0.0791-bigger.keras
Epoch 6/10
Epoch 6: loss improved from 0.07910 to 0.06022, saving model to BiLSTM-model2-weights-improvement-06-0.0602-bigger.keras
Epoch 7/10
Epoch 7: loss improved from 0.06022 to 0.04802, saving model to BiLSTM-model2-weights-improvement-07-0.0480-bigger.keras
Epoch 8/10
Epoch 8: loss improved from 0.04802 to 0.03850, saving model to BiLST

<keras.src.callbacks.History at 0x7d7e874ef6a0>
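
`fit` returns a `History` object whose `history` attribute is a dictionary of per-epoch metric lists. Assuming the return value had been stored (for example `history = model_bi.fit(...)`, which the cell above does not do), the epochs with the lowest training and validation loss could be located programmatically instead of by reading the log. The dictionary and the helper `best_epoch` below are hypothetical stand-ins.

```python
def best_epoch(values):
    """Return the 1-based epoch with the smallest value."""
    index = min(range(len(values)), key=values.__getitem__)
    return index + 1

# Hypothetical stand-in for history.history; the numbers are invented
demo_history = {
    "loss":     [0.28, 0.17, 0.13, 0.10, 0.08],
    "val_loss": [0.25, 0.21, 0.19, 0.22, 0.26],
}

best_train = best_epoch(demo_history["loss"])
best_val = best_epoch(demo_history["val_loss"])
print("least training loss at epoch", best_train)
print("least validation loss at epoch", best_val)
```

When training loss keeps falling while validation loss turns upward, as in this invented example, the later checkpoints are likely overfitting.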



---

## Test the model and metrics

In [None]:
# Rebuild the same architecture as model_bi (Bidirectional LSTM layers);
# plain LSTM layers cannot load the BiLSTM checkpoint weights.
model_bi_loss = Sequential()
model_bi_loss.add(Embedding(input_dim=text_vectorizer.vocabulary_size(), output_dim=32))
model_bi_loss.add(Bidirectional(LSTM(64, input_shape=(1,32), return_sequences=True)))
model_bi_loss.add(Dropout(0.2))
model_bi_loss.add(Bidirectional(LSTM(64)))
model_bi_loss.add(Dropout(0.2))
model_bi_loss.add(Dense(1, activation='sigmoid'))
model_bi_loss.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model_bi_loss.summary()

model_bi_val_loss = Sequential()
model_bi_val_loss.add(Embedding(input_dim=text_vectorizer.vocabulary_size(), output_dim=32))
model_bi_val_loss.add(Bidirectional(LSTM(64, input_shape=(1,32), return_sequences=True)))
model_bi_val_loss.add(Dropout(0.2))
model_bi_val_loss.add(Bidirectional(LSTM(64)))
model_bi_val_loss.add(Dropout(0.2))
model_bi_val_loss.add(Dense(1, activation='sigmoid'))
model_bi_val_loss.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model_bi_val_loss.summary()

Model: "sequential_12"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_12 (Embedding)    (None, None, 32)          583808    
                                                                 
 lstm_24 (LSTM)              (None, None, 64)          24832     
                                                                 
 dropout_24 (Dropout)        (None, None, 64)          0         
                                                                 
 lstm_25 (LSTM)              (None, 64)                33024     
                                                                 
 dropout_25 (Dropout)        (None, 64)                0         
                                                                 
 dense_12 (Dense)            (None, 1)                 65        
                                                                 
Total params: 641729 (2.45 MB)
Trainable params: 6417

In [None]:
#least training loss - Epoch 9
filename_loss = "/content/BiLSTM-model2-weights-improvement-09-0.0337-bigger.keras"
model_bi_loss.load_weights(filename_loss)
model_bi_loss.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])



ValueError: Layer 'lstm_cell' expected 3 variables, but received 0 variables during loading. Expected: ['lstm_24/lstm_cell/kernel:0', 'lstm_24/lstm_cell/recurrent_kernel:0', 'lstm_24/lstm_cell/bias:0']

In [None]:
#least validation loss - Epoch 5
filename_val_loss = "/content/BiLSTM-model2-weights-improvement-05-0.0791-bigger.keras"
model_bi_val_loss.load_weights(filename_val_loss)
model_bi_val_loss.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])



ValueError: Layer 'lstm_cell' expected 3 variables, but received 0 variables during loading. Expected: ['lstm_26/lstm_cell/kernel:0', 'lstm_26/lstm_cell/recurrent_kernel:0', 'lstm_26/lstm_cell/bias:0']

In [None]:
# model_bi is the in-memory model as left after the final training epoch
y_predict_bi_loss = model_bi.predict(x_test_vp)
y_predict_bi_val_loss = model_bi_val_loss.predict(x_test_vp)



In [None]:
y_predict_bi_loss = y_predict_bi_loss.round()
y_predict_bi_val_loss = y_predict_bi_val_loss.round()

In [None]:
from sklearn.metrics import classification_report, accuracy_score
#y_test is label
#y_predict is prediction
print("Least training loss:\n")
print("Accuracy:", accuracy_score(y_test, y_predict_bi_loss))
print( classification_report(y_test, y_predict_bi_loss))
print("\nLeast validation loss:\n")
print("Accuracy:", accuracy_score(y_test, y_predict_bi_val_loss))
# Report on the BiLSTM predictions (not y_predict_val_loss from the LSTM section)
print( classification_report(y_test, y_predict_bi_val_loss))

Least training loss:

Accuracy: 0.896968426934898
              precision    recall  f1-score   support

           0       0.91      0.88      0.89      6998
           1       0.89      0.92      0.90      7318

    accuracy                           0.90     14316
   macro avg       0.90      0.90      0.90     14316
weighted avg       0.90      0.90      0.90     14316


Least validation loss:

Accuracy: 0.5007683710533669
              precision    recall  f1-score   support

           0       0.92      0.82      0.87      6998
           1       0.84      0.93      0.89      7318

    accuracy                           0.88     14316
   macro avg       0.88      0.88      0.88     14316
weighted avg       0.88      0.88      0.88     14316



# Members of Group 13
```
Jessada Peetapa	     6422770915	 6422770915@g.siit.tu.ac.th
Matas Thanamee	      6422771251	 6422771251@g.siit.tu.ac.th
Woramate Simrum	     6422771400	 6422771400@g.siit.tu.ac.th
Napat Khowyabud	     6422772051	 6422772051@g.siit.tu.ac.th
Pavida Pipatanagovit    6422772069	 6422772069@g.siit.tu.ac.th
```