# **PROJECT: FAKE NEWS DETECTION**

### INTRODUCTION
In this project we try to detect fake news (binary classification), with the labels encoded as follows:

*   0: real news
*   1: fake news

The idea was to try out different models (different hyperparameters and parameters, hidden layers, optimizers, loss functions...). Since the data is text, the initial plan was to compare feed-forward and recurrent neural networks. It later turned out that convolutional neural networks (1D convolution) are well suited to this problem, while recurrent neural networks were very demanding to train with my resources.

The dataset is taken from Kaggle and is called WELFake_Dataset. It consists of 35028 real and 37106 fake news items, so the two classes are roughly balanced.






### REQUIRED LIBRARIES

In [1]:
import zipfile
import pandas as pd
import os

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences


### LOADING AND PREPARING THE DATA

First we load the data and look at what it contains.

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
%cd /content/drive/MyDrive/PROJEKT-ANM

/content/drive/MyDrive/PROJEKT-ANM


In [4]:
zip_file_path = 'WELFake_Dataset.csv.zip'

# Path to the directory into which the zip file is extracted.
# Note: 'Drive' (capital D) differs from the mount point '/content/drive',
# so this is a separate directory on the Colab VM, not on Google Drive.
output_directory = '/content/Drive/MyDrive/PROJEKT-ANM'

# Check that the file is present
if os.path.exists(zip_file_path):
    # Create the destination directory if it does not exist
    os.makedirs(output_directory, exist_ok=True)

    # Extract the contents of the zip file
    with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
        zip_ref.extractall(output_directory)

    # List the contents of the destination directory
    extracted_files = os.listdir(output_directory)
    print(extracted_files)
else:
    print(f"File {zip_file_path} not found.")

['WELFake_Dataset.csv']


In [5]:
csv_file_path = os.path.join(output_directory, 'WELFake_Dataset.csv')

# Load the CSV file into a pandas DataFrame
podaci = pd.read_csv(csv_file_path)

In [6]:
podaci

Unnamed: 0.1,Unnamed: 0,title,text,label
0,0,LAW ENFORCEMENT ON HIGH ALERT Following Threat...,No comment is expected from Barack Obama Membe...,1
1,1,,Did they post their votes for Hillary already?,1
2,2,UNBELIEVABLE! OBAMA’S ATTORNEY GENERAL SAYS MO...,"Now, most of the demonstrators gathered last ...",1
3,3,"Bobby Jindal, raised Hindu, uses story of Chri...",A dozen politically active pastors came here f...,0
4,4,SATAN 2: Russia unvelis an image of its terrif...,"The RS-28 Sarmat missile, dubbed Satan 2, will...",1
...,...,...,...,...
72129,72129,Russians steal research on Trump in hack of U....,WASHINGTON (Reuters) - Hackers believed to be ...,0
72130,72130,WATCH: Giuliani Demands That Democrats Apolog...,"You know, because in fantasyland Republicans n...",1
72131,72131,Migrants Refuse To Leave Train At Refugee Camp...,Migrants Refuse To Leave Train At Refugee Camp...,0
72132,72132,Trump tussle gives unpopular Mexican leader mu...,MEXICO CITY (Reuters) - Donald Trump’s combati...,0


In [7]:
podaci.shape

(72134, 4)

In [8]:
podaci.iloc[0]  # print the first row

Unnamed: 0                                                    0
title         LAW ENFORCEMENT ON HIGH ALERT Following Threat...
text          No comment is expected from Barack Obama Membe...
label                                                         1
Name: 0, dtype: object

In [9]:
podaci[podaci['label']==0]

Unnamed: 0.1,Unnamed: 0,title,text,label
3,3,"Bobby Jindal, raised Hindu, uses story of Chri...",A dozen politically active pastors came here f...,0
11,11,"May Brexit offer would hurt, cost EU citizens ...",BRUSSELS (Reuters) - British Prime Minister Th...,0
12,12,Schumer calls on Trump to appoint official to ...,"WASHINGTON (Reuters) - Charles Schumer, the to...",0
14,14,No Change Expected for ESPN Political Agenda D...,As more and more sports fans turn off ESPN to ...,0
15,15,Billionaire Odebrecht in Brazil scandal releas...,RIO DE JANEIRO/SAO PAULO (Reuters) - Billionai...,0
...,...,...,...,...
72124,72124,An Unlikely Contender Rises in France as the A...,"PARIS — In the age of Donald J. Trump, “Bre...",0
72126,72126,Determined to kill: Can tough gun laws end mas...,The flag at Desert Hot Springs' Condor Gun Sho...,0
72129,72129,Russians steal research on Trump in hack of U....,WASHINGTON (Reuters) - Hackers believed to be ...,0
72131,72131,Migrants Refuse To Leave Train At Refugee Camp...,Migrants Refuse To Leave Train At Refugee Camp...,0


There are 35028 real news items.

In [10]:
podaci[podaci['label']==1]

Unnamed: 0.1,Unnamed: 0,title,text,label
0,0,LAW ENFORCEMENT ON HIGH ALERT Following Threat...,No comment is expected from Barack Obama Membe...,1
1,1,,Did they post their votes for Hillary already?,1
2,2,UNBELIEVABLE! OBAMA’S ATTORNEY GENERAL SAYS MO...,"Now, most of the demonstrators gathered last ...",1
4,4,SATAN 2: Russia unvelis an image of its terrif...,"The RS-28 Sarmat missile, dubbed Satan 2, will...",1
5,5,About Time! Christian Group Sues Amazon and SP...,All we can say on this one is it s about time ...,1
...,...,...,...,...
72125,72125,WOW! JILL STEIN’S ‘FIRESIDE CHAT’ Exposes Her ...,,1
72127,72127,WIKILEAKS EMAIL SHOWS CLINTON FOUNDATION FUNDS...,An email released by WikiLeaks on Sunday appea...,1
72128,72128,JUDGE JEANINE SOUNDS FREE SPEECH ALARM: “They ...,Judge Jeanine lets it rip! She s concerned wit...,1
72130,72130,WATCH: Giuliani Demands That Democrats Apolog...,"You know, because in fantasyland Republicans n...",1


There are 37106 fake news items.

### CHECKING FOR NULL VALUES

---



In [11]:
null_count = podaci.isnull().sum()


In [12]:
null_count

Unnamed: 0      0
title         558
text           39
label           0
dtype: int64

Only a small number of values are null (558 titles and 39 texts), which is negligible relative to the total number of rows, so we drop all rows that contain null values.

In [13]:
podaci=podaci.dropna()

In [14]:
podaci

Unnamed: 0.1,Unnamed: 0,title,text,label
0,0,LAW ENFORCEMENT ON HIGH ALERT Following Threat...,No comment is expected from Barack Obama Membe...,1
2,2,UNBELIEVABLE! OBAMA’S ATTORNEY GENERAL SAYS MO...,"Now, most of the demonstrators gathered last ...",1
3,3,"Bobby Jindal, raised Hindu, uses story of Chri...",A dozen politically active pastors came here f...,0
4,4,SATAN 2: Russia unvelis an image of its terrif...,"The RS-28 Sarmat missile, dubbed Satan 2, will...",1
5,5,About Time! Christian Group Sues Amazon and SP...,All we can say on this one is it s about time ...,1
...,...,...,...,...
72129,72129,Russians steal research on Trump in hack of U....,WASHINGTON (Reuters) - Hackers believed to be ...,0
72130,72130,WATCH: Giuliani Demands That Democrats Apolog...,"You know, because in fantasyland Republicans n...",1
72131,72131,Migrants Refuse To Leave Train At Refugee Camp...,Migrants Refuse To Leave Train At Refugee Camp...,0
72132,72132,Trump tussle gives unpopular Mexican leader mu...,MEXICO CITY (Reuters) - Donald Trump’s combati...,0


### DROPPING THE FIRST COLUMN

In [15]:
podaci = podaci.drop(podaci.columns[:1], axis=1)


In [16]:
podaci

Unnamed: 0,title,text,label
0,LAW ENFORCEMENT ON HIGH ALERT Following Threat...,No comment is expected from Barack Obama Membe...,1
2,UNBELIEVABLE! OBAMA’S ATTORNEY GENERAL SAYS MO...,"Now, most of the demonstrators gathered last ...",1
3,"Bobby Jindal, raised Hindu, uses story of Chri...",A dozen politically active pastors came here f...,0
4,SATAN 2: Russia unvelis an image of its terrif...,"The RS-28 Sarmat missile, dubbed Satan 2, will...",1
5,About Time! Christian Group Sues Amazon and SP...,All we can say on this one is it s about time ...,1
...,...,...,...
72129,Russians steal research on Trump in hack of U....,WASHINGTON (Reuters) - Hackers believed to be ...,0
72130,WATCH: Giuliani Demands That Democrats Apolog...,"You know, because in fantasyland Republicans n...",1
72131,Migrants Refuse To Leave Train At Refugee Camp...,Migrants Refuse To Leave Train At Refugee Camp...,0
72132,Trump tussle gives unpopular Mexican leader mu...,MEXICO CITY (Reuters) - Donald Trump’s combati...,0


In [17]:
podaci['title']

0        LAW ENFORCEMENT ON HIGH ALERT Following Threat...
2        UNBELIEVABLE! OBAMA’S ATTORNEY GENERAL SAYS MO...
3        Bobby Jindal, raised Hindu, uses story of Chri...
4        SATAN 2: Russia unvelis an image of its terrif...
5        About Time! Christian Group Sues Amazon and SP...
                               ...                        
72129    Russians steal research on Trump in hack of U....
72130     WATCH: Giuliani Demands That Democrats Apolog...
72131    Migrants Refuse To Leave Train At Refugee Camp...
72132    Trump tussle gives unpopular Mexican leader mu...
72133    Goldman Sachs Endorses Hillary Clinton For Pre...
Name: title, Length: 71537, dtype: object

We reduce the amount of data because tokenization is very memory-intensive.

In [18]:
podaci_labela_0 = podaci[podaci['label'] == 0]
podaci_labela_1 = podaci[podaci['label'] == 1]

# Take 17500 rows (roughly half) from each class
uzorak_labela_0 = podaci_labela_0.sample(n=17500, random_state=42)
uzorak_labela_1 = podaci_labela_1.sample(n=17500, random_state=42)

# Combine the samples into a new dataset
podaci = pd.concat([uzorak_labela_0, uzorak_labela_1])
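
The concatenated frame holds all label-0 rows followed by all label-1 rows. As a minimal sketch of a one-call alternative that draws the same number of rows per class and also shuffles them, assuming a toy frame `df` in place of `podaci` (the names `df`, `n_per_class` and `balanced` are illustrative and not part of the notebook):

```python
import pandas as pd

# Toy stand-in for `podaci`: 6 real (0) and 8 fake (1) items.
df = pd.DataFrame({
    "text": [f"item {i}" for i in range(14)],
    "label": [0] * 6 + [1] * 8,
})

# Draw the same number of rows from each class, then shuffle the result
# so that the classes are interleaved rather than stored back to back.
n_per_class = 4
balanced = (
    df.groupby("label")
      .sample(n=n_per_class, random_state=42)
      .sample(frac=1, random_state=42)
      .reset_index(drop=True)
)

print(balanced["label"].value_counts().to_dict())
```

Since `train_test_split` shuffles by default, the row order does not affect the later split; the shuffle here mainly keeps the frame convenient to inspect.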

### TOKENIZATION, STOP-WORD REMOVAL, AND REMOVAL OF UNNECESSARY CHARACTERS

In [19]:
import nltk

nltk.download('punkt')


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [20]:
import re
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

# Load the stop words from NLTK and scikit-learn
nltk.download('stopwords')
stop_words_nltk = set(stopwords.words('english'))
stop_words_sklearn = set(ENGLISH_STOP_WORDS)

def preprocess_text(text):
    # Tokenization
    tokens = nltk.word_tokenize(text)

    # Lowercase each token and strip every character except letters and digits
    tokens = [re.sub(r'[^a-zA-Z0-9]', '', token.lower()) for token in tokens]

    # Remove stop words
    tokens = [token for token in tokens if token not in stop_words_nltk and token not in stop_words_sklearn]

    # Reassemble the text from the tokens
    preprocessed_text = ' '.join(tokens)

    return preprocessed_text

# Apply the function to the "text" and "title" columns of the DataFrame podaci

podaci['text'] = podaci['text'].apply(preprocess_text)
podaci['title'] = podaci['title'].apply(preprocess_text)
print(podaci)

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.


                                                   title  \
42025  people die eating rawmilk cheese new york stat...   
56304  dup blames sinn fein northern ireland talks co...   
23174  exclusive  brazil prosecutor says successor  p...   
49262  intelligence chief says russia involvement 201...   
19935  macri coalition poised win key argentina midte...   
...                                                  ...   
71185  intelligence says details russian dossier expo...   
16072  justice department  firing workers expired pap...   
7810   clinton loses heckler fingerpointing rant   de...   
42619  trump lied  spontaneous  taiwan  planning talk...   
63277  sarah palin loses sh  rage whines hillary char...   

                                                    text  label  
42025  people died following outbreak listeria linked...      0  
56304  belfast  reuters   senior member northern irel...      0  
23174  brasilia  reuters   senior brazilian law enfor...      0  
49262  washingt

In [21]:
podaci

Unnamed: 0,title,text,label
42025,people die eating rawmilk cheese new york stat...,people died following outbreak listeria linked...,0
56304,dup blames sinn fein northern ireland talks co...,belfast reuters senior member northern irel...,0
23174,exclusive brazil prosecutor says successor p...,brasilia reuters senior brazilian law enfor...,0
49262,intelligence chief says russia involvement 201...,washington reuters director national intell...,0
19935,macri coalition poised win key argentina midte...,buenos aires reuters argentine president ma...,0
...,...,...,...
71185,intelligence says details russian dossier expo...,bombshell dossier exposing donald trump shady ...,1
16072,justice department firing workers expired pap...,justice department control 6 months extra let ...,1
7810,clinton loses heckler fingerpointing rant de...,,1
42619,trump lied spontaneous taiwan planning talk...,according washington post sources close taiwa...,1
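
The doubled spaces in the printed output of `preprocess_text` come from tokens such as `(` or `-`, which become empty strings once punctuation is stripped and are then joined back in. This is harmless for the TF-IDF step, but a small variant can drop them. The sketch below is illustrative only: `clean_tokens` and the tiny `STOP_WORDS` set are hypothetical stand-ins, and the input is assumed to be already tokenized.

```python
import re

# Tiny illustrative stop-word list; the notebook uses the NLTK and
# scikit-learn lists instead.
STOP_WORDS = {"the", "a", "an", "of", "is", "to", "s"}

def clean_tokens(tokens):
    """Lowercase tokens, strip non-alphanumerics, drop empties and stop words."""
    cleaned = (re.sub(r"[^a-z0-9]", "", tok.lower()) for tok in tokens)
    return [tok for tok in cleaned if tok and tok not in STOP_WORDS]

tokens = ["BRASILIA", "(", "Reuters", ")", "-", "The", "prosecutor", "'s", "office"]
print(" ".join(clean_tokens(tokens)))
```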


### VECTORIZING THE DATA

In [22]:
from sklearn.feature_extraction.text import TfidfVectorizer
import scipy.sparse


podaci['combined_text'] = podaci['text'] + ' ' + podaci['title']

# TF-IDF vectorization of the combined text
tfidf_vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(1, 2), min_df=5, max_df=0.9, sublinear_tf=True)

X_tfidf = tfidf_vectorizer.fit_transform(podaci['combined_text'])

y = podaci['label']
X_tfidf_dense = X_tfidf.toarray() if scipy.sparse.issparse(X_tfidf) else X_tfidf


In [23]:
X_tfidf

<35000x5000 sparse matrix of type '<class 'numpy.float64'>'
	with 5198472 stored elements in Compressed Sparse Row format>

In [24]:
X_tfidf[0]

<1x5000 sparse matrix of type '<class 'numpy.float64'>'
	with 204 stored elements in Compressed Sparse Row format>
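
Converting this matrix with `toarray()` is the memory-heavy step of the pipeline. A back-of-the-envelope sketch using the shape and stored-element count reported above; the 4-byte index size is an assumption about the CSR index dtype:

```python
# Shape and number of stored elements as reported for X_tfidf above.
n_rows, n_cols, nnz = 35000, 5000, 5198472

FLOAT64_BYTES = 8   # dtype of the matrix values
INT32_BYTES = 4     # assumed index dtype of the CSR arrays

dense_bytes = n_rows * n_cols * FLOAT64_BYTES
# CSR stores the values, one column index per value, and n_rows + 1 row pointers.
sparse_bytes = nnz * (FLOAT64_BYTES + INT32_BYTES) + (n_rows + 1) * INT32_BYTES

print(f"dense:  {dense_bytes / 1e6:.1f} MB")
print(f"sparse: {sparse_bytes / 1e6:.1f} MB")
print(f"density: {nnz / (n_rows * n_cols):.2%}")
```

Under these assumptions the dense array needs about 1.4 GB, while the sparse representation needs roughly 63 MB, because only about 3% of the entries are non-zero.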

In [None]:
del X_tfidf
del podaci

### SPLITTING INTO TRAINING AND TEST SETS

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X_tfidf_dense, y, test_size=0.2, random_state=42)


In [None]:
X_train.shape

(28000, 5000)
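
In this notebook the vectorizer is fitted on all 35000 documents before the split, so the vocabulary and IDF weights also reflect the test documents. A minimal sketch of the alternative order (split first, then fit on the training part only), using a toy corpus; the texts and variable names are illustrative:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

texts = [
    "senate passes budget bill", "court rules on appeal",
    "minister announces new policy", "central bank holds rates",
    "shocking secret they hide", "you will not believe this",
    "miracle cure doctors hate", "unbelievable scandal exposed",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Split the raw texts first; stratify keeps the class ratio in both parts.
texts_train, texts_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

# Fit the vocabulary and IDF weights on the training texts only,
# then reuse them unchanged for the test texts.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(texts_train)
X_test = vectorizer.transform(texts_test)

print(X_train.shape, X_test.shape)
```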

### **FEED-FORWARD NEURAL NETWORKS**

We start by building feed-forward neural network models. We try various configurations: different numbers of neurons, hidden layers, and epochs, different batch sizes, and different optimizers and loss functions, to see which final model works best.
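
The attempts in this section repeat the same model-building code with small variations. As a sketch of how that repetition could be factored out, the helper below builds the same kind of architecture from a few arguments. `hidden_layer_sizes` and `build_mlp` are hypothetical names, not part of the notebook, and the halving layer widths mirror the 128/64, 256/128/64 and 512/256/128/64 stacks used here.

```python
def hidden_layer_sizes(n_hidden, smallest=64):
    """Return layer widths that halve down to `smallest`, e.g. [256, 128, 64]."""
    return [smallest * 2 ** i for i in reversed(range(n_hidden))]

def build_mlp(input_dim, n_hidden=2, dropout=0.5, activation="relu", optimizer="adam"):
    """Build and compile a feed-forward binary classifier (hypothetical helper)."""
    # Imported here so that the sizing helper above works without TensorFlow.
    import tensorflow as tf

    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(input_dim,)))
    for units in hidden_layer_sizes(n_hidden):
        model.add(tf.keras.layers.Dense(units, activation=activation))
        if dropout:
            model.add(tf.keras.layers.Dropout(dropout))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=["accuracy"])
    return model

print(hidden_layer_sizes(2), hidden_layer_sizes(4))
```

With such a helper, a configuration like attempt 7 would read `build_mlp(5000, n_hidden=3, optimizer="sgd")`.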


### **ATTEMPT 1**: 2 hidden layers, 10 epochs, batch_size=32, no regularization

In [None]:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(128,input_shape=(X_train.shape[1],),activation='relu'))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

In [None]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])


In [None]:
povijest=model.fit(X_train, y_train, epochs=10,batch_size=32)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [None]:
model.evaluate(X_test, y_test)




[0.264142781496048, 0.954714298248291]

We obtain a high test accuracy of about 95.5%.
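
To put the size of this first network in context, the number of trainable parameters can be computed directly from the layer widths. A small sketch (the helper `dense_param_count` is illustrative and not part of the notebook):

```python
def dense_param_count(layer_sizes):
    """Number of weights and biases in a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# 5000 TF-IDF features -> 128 -> 64 -> 1, as in attempt 1.
sizes = [5000, 128, 64, 1]
print(dense_param_count(sizes))
```

Of the 648449 parameters, 640128 sit in the first layer, which maps the 5000 TF-IDF features to 128 units.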

### **ATTEMPT 2**: 2 hidden layers, 10 epochs, batch_size=32, regularization

In [None]:
from tensorflow.keras.layers import Dropout

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(128,input_shape=(X_train.shape[1],),activation='relu'))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

In [None]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])


In [None]:
povijest=model.fit(X_train, y_train, epochs=10,batch_size=32)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [None]:
model.evaluate(X_test, y_test)




[0.22476930916309357, 0.9562857151031494]

Accuracy is slightly better and the loss is lower. From here on we use regularization.
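
The regularization used here is dropout with rate 0.5. As a rough NumPy illustration of what a dropout layer does during training (each activation is zeroed with probability `rate` and the survivors are rescaled so that the expected value is unchanged), assuming an all-ones activation matrix for clarity:

```python
import numpy as np

def inverted_dropout(activations, rate, rng):
    """Zero each activation with probability `rate` and rescale the rest."""
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
activations = np.ones((1000, 128))
dropped = inverted_dropout(activations, rate=0.5, rng=rng)

print(f"fraction kept: {np.mean(dropped > 0):.3f}")
print(f"mean activation: {dropped.mean():.3f}")
```

At evaluation time dropout is switched off, so `model.evaluate` uses all units.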

### **ATTEMPT 3**: 2 hidden layers, 10 epochs, batch_size=32, sgd optimizer


In [None]:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(128,input_shape=(X_train.shape[1],),activation='relu'))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

In [None]:
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])


In [None]:
povijest=model.fit(X_train, y_train, epochs=10,batch_size=32)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [None]:
model.evaluate(X_test, y_test)




[0.1893695890903473, 0.928857147693634]

Accuracy is still high but somewhat lower than with the adam optimizer. On the other hand, the loss is lower.
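
Accuracy and loss can move in opposite directions because cross-entropy also penalizes how confident a wrong prediction is. The toy example below shows two sets of predictions with identical accuracy but different loss. Whether this is what happens in the adam and sgd runs here is an assumption that the notebook does not verify.

```python
import numpy as np

def binary_cross_entropy(y_true, p):
    """Mean binary cross-entropy for labels in {0, 1} and probabilities p."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def accuracy(y_true, p):
    """Fraction of examples whose thresholded prediction matches the label."""
    return float(np.mean((p > 0.5) == (y_true == 1)))

y_true = np.array([1, 1, 1, 0, 0])

# Both sets of predictions misclassify only the last example.
cautious = np.array([0.80, 0.80, 0.80, 0.20, 0.60])
overconfident = np.array([0.99, 0.99, 0.99, 0.01, 0.99])

for name, p in [("cautious", cautious), ("overconfident", overconfident)]:
    print(name, accuracy(y_true, p), round(binary_cross_entropy(y_true, p), 3))
```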

### **ATTEMPT 4**: 2 hidden layers, batch_size=32, 20 epochs, adam optimizer

In [None]:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(128,input_shape=(X_train.shape[1],),activation='relu'))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

In [None]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])


In [None]:
povijest=model.fit(X_train, y_train, epochs=20,batch_size=32)


Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [None]:
model.evaluate(X_test, y_test)




[0.24530182778835297, 0.9591428637504578]

Accuracy is slightly higher than with 10 epochs and the adam optimizer, but the loss is also slightly higher, so we will continue with 10 epochs to save time, given the negligible difference in accuracy (after first checking the sgd case).

### **ATTEMPT 5**: 2 hidden layers, batch_size=32, 20 epochs, sgd optimizer

In [None]:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(128,input_shape=(X_train.shape[1],),activation='relu'))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

In [None]:
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])


In [None]:
povijest=model.fit(X_train, y_train, epochs=20,batch_size=32)


Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [None]:
model.evaluate(X_test, y_test)




[0.12028557807207108, 0.9544285535812378]

Accuracy is marginally lower, but the loss is very low.

### **ATTEMPT 6**: 2 hidden layers, batch_size=64, 20 epochs, sgd optimizer

In [None]:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(128,input_shape=(X_train.shape[1],),activation='relu'))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

In [None]:
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])


In [None]:
povijest=model.fit(X_train, y_train, epochs=20,batch_size=64)


Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [None]:
model.evaluate(X_test, y_test)




[0.17902934551239014, 0.9330000281333923]

Accuracy dropped somewhat, so batch_size=32 is the better choice.
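
One possible factor, stated here as an assumption rather than something the notebook measures, is that doubling the batch size halves the number of gradient updates performed in the same number of epochs. The arithmetic for the 28000 training rows (the helper `total_updates` is illustrative):

```python
import math

def total_updates(n_samples, batch_size, epochs):
    """Number of optimizer steps: batches per epoch times number of epochs."""
    return math.ceil(n_samples / batch_size) * epochs

n_train = 28000  # rows in X_train
for batch_size in (32, 64):
    print(batch_size, total_updates(n_train, batch_size, epochs=20))
```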

### **ATTEMPT 7**: 3 hidden layers, batch_size=32, 20 epochs, sgd optimizer



In [None]:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(256,input_shape=(X_train.shape[1],),activation='relu'))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(64,activation='relu'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

In [None]:
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])


In [None]:
povijest=model.fit(X_train, y_train, epochs=20,batch_size=32)


Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [None]:
model.evaluate(X_test, y_test)




[0.11384072154760361, 0.9595714211463928]

One additional hidden layer improved the model's accuracy (compared with the models in which only that parameter differs). The loss decreased further.

### **ATTEMPT 8**: 3 hidden layers, batch_size=32, 20 epochs, adam optimizer


In [None]:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(256,input_shape=(X_train.shape[1],),activation='relu'))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(64,activation='relu'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

In [None]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])


In [None]:
povijest=model.fit(X_train, y_train, epochs=20,batch_size=32)


Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [None]:
model.evaluate(X_test, y_test)




[0.273084431886673, 0.961571455001831]

This model gives the highest accuracy so far, but the loss is higher.

### **ATTEMPT 9**: 3 hidden layers, batch_size=32, 20 epochs, sgd optimizer, softmax activation


In [None]:
from tensorflow.keras.layers import Dropout


model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(256,input_shape=(X_train.shape[1],),activation='relu'))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(64,activation='relu'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='softmax'))

In [None]:
model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy'])


In [None]:
povijest=model.fit(X_train, y_train, epochs=20,batch_size=32)


Epoch 1/20


  return dispatch_target(*args, **kwargs)


Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [None]:
model.evaluate(X_test, y_test)


  return dispatch_target(*args, **kwargs)




[nan, 0.49871429800987244]

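Accuracy is very poor in this case (about 50%). A softmax over a single output unit always returns 1.0, so the model predicts the same class for every input, and `sparse_categorical_crossentropy` expects one output unit per class, which is the likely reason for the `nan` loss.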
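
A short NumPy check of the softmax behaviour behind this result; the logits are made-up values, not outputs of the model:

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Three samples with one logit each, as produced by Dense(1).
one_unit = np.array([[-3.2], [0.0], [7.5]])
# Three samples with two logits each, as produced by Dense(2).
two_units = np.array([[1.0, -1.0], [0.0, 0.0], [-2.0, 3.0]])

print(softmax(one_unit).ravel())   # the same value for every input
print(softmax(two_units))          # one probability per class, rows sum to 1
```

In Keras, the usual pairing for `sparse_categorical_crossentropy` with two classes is an output layer `Dense(2, activation='softmax')`; a single sigmoid unit with `binary_crossentropy`, as in the earlier attempts, is the equivalent binary formulation.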

### **ATTEMPT 10**: 3 hidden layers, batch_size=32, 20 epochs, adam optimizer, softmax activation, categorical cross-entropy loss (to try)

In [None]:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(256,input_shape=(X_train.shape[1],),activation='relu'))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(64,activation='relu'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='softmax'))

In [None]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])


In [None]:
povijest=model.fit(X_train, y_train, epochs=20,batch_size=32)


Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [None]:
model.evaluate(X_test, y_test)




[0.0, 0.50128573179245]

We see that categorical cross-entropy with a single-unit softmax output does not work in this example; binary cross-entropy with a sigmoid output is the appropriate choice.

### **ATTEMPT 11**: 3 hidden layers, batch_size=32, 20 epochs, sgd optimizer, sigmoid output activation, tanh in the hidden layers

In [None]:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(256,input_shape=(X_train.shape[1],),activation='tanh'))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(128, activation='tanh'))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(64,activation='tanh'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

In [None]:
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])


In [None]:
povijest=model.fit(X_train, y_train, epochs=20,batch_size=32)


Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [None]:
model.evaluate(X_test, y_test)




[0.1516234278678894, 0.9462857246398926]

Accuracy is high and the loss is low, but both are worse than with relu in the hidden layers (attempt 7: loss 0.114, accuracy 0.960).

### **ATTEMPT 12**: 3 hidden layers, batch_size=32, 20 epochs, adam optimizer, sigmoid output activation, tanh in the hidden layers

In [None]:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(256,input_shape=(X_train.shape[1],),activation='tanh'))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(128, activation='tanh'))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(64,activation='tanh'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

In [None]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])


In [None]:
povijest=model.fit(X_train, y_train, epochs=20,batch_size=32)


Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [None]:
model.evaluate(X_test, y_test)




[0.2892393171787262, 0.9430000185966492]

Similar to the previous example, but with a considerably higher loss.

### **ATTEMPT 13**: 4 hidden layers, batch_size=32, 20 epochs, sgd optimizer, sigmoid output activation, relu in the hidden layers

In [None]:
from tensorflow.keras.layers import Dropout


model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(512,input_shape=(X_train.shape[1],),activation='relu'))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(64,activation='relu'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

In [None]:
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])


In [None]:
povijest=model.fit(X_train, y_train, epochs=20,batch_size=32)


Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [None]:
model.evaluate(X_test, y_test)




[0.12315348535776138, 0.9611428380012512]

In [None]:
from sklearn.metrics import f1_score

y_pred = model.predict(X_test)
y_pred_classes = tf.round(y_pred)

# Compute the F1 score
f1 = f1_score(y_test, y_pred_classes)



In [None]:
f1

0.961560203504805
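
For reference, the F1 score is the harmonic mean of precision and recall for the positive class (here label 1, fake news). A plain-Python sketch on made-up labels; `f1_from_labels` is an illustrative helper, while the notebook itself relies on `sklearn.metrics.f1_score`:

```python
def f1_from_labels(y_true, y_pred):
    """F1 score of the positive class (label 1) from two label sequences."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(f1_from_labels(y_true, y_pred))
```

In this toy case there are 3 true positives, 1 false positive and 1 false negative, so precision and recall are both 0.75.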

The best model so far!

### **ATTEMPT 14**: 4 hidden layers, batch_size=32, 20 epochs, adam optimizer, sigmoid output activation, relu in the hidden layers

In [None]:
from tensorflow.keras.layers import Dropout


model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(512,input_shape=(X_train.shape[1],),activation='relu'))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(128,activation='relu'))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(64,activation='relu'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

In [None]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])


In [None]:
povijest=model.fit(X_train, y_train, epochs=20,batch_size=32)


Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20
Epoch 11/20
Epoch 12/20
Epoch 13/20
Epoch 14/20
Epoch 15/20
Epoch 16/20
Epoch 17/20
Epoch 18/20
Epoch 19/20
Epoch 20/20


In [None]:
model.evaluate(X_test, y_test)




[0.3548876643180847, 0.9578571319580078]

We therefore conclude that attempt 13 gave the best results: 4 hidden layers, relu activation in the hidden layers and sigmoid in the output layer, the sgd optimizer, and binary cross-entropy loss.
More hidden layers might give even better results, but accuracy is already very high and the loss fairly low, so I am satisfied with the results obtained.
I also computed the F1 score for this model afterwards, since it is a good measure of model quality, and obtained 0.9616, which is an excellent result.

### **RECURRENT NEURAL NETWORKS**

We start with a simple architecture, similar to the one from the exercises. Training is noticeably slower than for feed-forward networks, so we begin with fewer epochs and a larger batch_size than before.

### **ATTEMPT 1**: 1 embedding layer, 1 recurrent layer, sigmoid, binary cross-entropy, adam optimizer

In [None]:
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Embedding(input_dim=5000,
                                    output_dim=128
                                    ))
model.add(tf.keras.layers.SimpleRNN(128))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
povijest=model.fit(X_train, y_train, epochs=5,batch_size=128)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [None]:
model.evaluate(X_test, y_test)




[0.6946445107460022, 0.50128573179245]

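This model gives very low accuracy (about 50%, which is chance level) and also took a long time to run. Note that `X_train` still holds TF-IDF weights, whereas the `Embedding` layer expects integer word indices, which likely contributes to this result. We will try some regularization techniques.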
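
A minimal sketch of the integer-encoded, padded input that an `Embedding` layer expects, assuming whitespace-separated, already preprocessed text. The helpers `build_vocab` and `encode_and_pad` are hypothetical and written in plain NumPy for illustration; the `Tokenizer` and `pad_sequences` utilities imported at the top of the notebook serve the same purpose in Keras.

```python
import numpy as np
from collections import Counter

def build_vocab(texts, max_words):
    """Map the most frequent words to indices 1..max_words-1 (0 is padding)."""
    counts = Counter(word for text in texts for word in text.split())
    most_common = counts.most_common(max_words - 1)
    return {word: i for i, (word, _) in enumerate(most_common, start=1)}

def encode_and_pad(texts, vocab, max_len):
    """Turn each text into a fixed-length row of word indices, zero-padded on the left."""
    batch = np.zeros((len(texts), max_len), dtype=np.int64)
    for row, text in enumerate(texts):
        # Keep at most the last max_len known words of each text.
        ids = [vocab[w] for w in text.split() if w in vocab][-max_len:]
        if ids:
            batch[row, -len(ids):] = ids
    return batch

texts = ["senate passes budget bill", "shocking secret bill exposed today now"]
vocab = build_vocab(texts, max_words=5000)
X_seq = encode_and_pad(texts, vocab, max_len=5)
print(X_seq)
```

A matrix like `X_seq` has one column per time step, so its width (the sequence length) can be chosen far smaller than the 5000 TF-IDF columns, which would also shorten training.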

### **ATTEMPT 2**: 1 embedding layer, 1 recurrent layer, regularization, sigmoid, binary cross-entropy, adam optimizer

In [None]:
from tensorflow.keras.layers import Dropout

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Embedding(input_dim=5000,
                                    output_dim=128
                                    ))
model.add(tf.keras.layers.SimpleRNN(128))
model.add(Dropout(0.2))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
povijest=model.fit(X_train, y_train, epochs=5,batch_size=128)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


Poor accuracy again.

### **ATTEMPT 3**: 1 embedding layer, 1 LSTM layer, regularization, sigmoid, binary cross-entropy, adam optimizer, one extra parameter (input_length)

In [None]:
from tensorflow.keras.layers import Dropout

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Embedding(input_dim=5000,
                                    output_dim=128,input_length=5000
                                    ))
model.add(tf.keras.layers.LSTM(128))
model.add(Dropout(0.2))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])


In [None]:
model.fit(X_train, y_train, epochs=5, batch_size=128)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x7fa629e99810>

In [None]:
model.evaluate(X_test, y_test)




[0.6933560371398926, 0.49871429800987244]

Accuracy is poor again.

### **ATTEMPT 4**: 1 embedding layer, multiple LSTM layers, regularization, sigmoid, binary cross-entropy, adam optimizer, one extra parameter (input_length)

In [None]:
from tensorflow.keras.layers import Dropout

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Embedding(input_dim=5000,
                                    output_dim=128,input_length=5000
                                    ))
model.add(tf.keras.layers.LSTM(64,return_sequences=True))
model.add(Dropout(0.2))
model.add(tf.keras.layers.LSTM(64))
model.add(Dropout(0.2))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
model.fit(X_train, y_train, epochs=5, batch_size=128)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.src.callbacks.History at 0x7fa610455a50>

In [None]:
model.evaluate(X_test, y_test)




[0.6931518316268921, 0.50128573179245]

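Still poor, but marginally better (0.5013 vs. 0.4987); both values are essentially chance level for a roughly balanced dataset.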
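
To put accuracies near 0.50 in context, it helps to compare them with a trivial classifier that always predicts the most frequent label. The sketch below computes that baseline. The label list is made up for illustration and is not the project's `y_test`, and the helper name `majority_baseline_accuracy` is hypothetical.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent label."""
    counts = Counter(labels)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(labels)

# Hypothetical labels (0 = true news, 1 = fake news), not the real test set.
example_labels = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
baseline = majority_baseline_accuracy(example_labels)
print(f"Majority-class baseline accuracy: {baseline:.2f}")
```

If a model's test accuracy matches this baseline, it may be predicting a single class for every input, which would be worth checking before tuning further.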
###**ATTEMPT 5**: 1 embedding, multiple LSTM layers, regularization, sigmoid, binary cross-entropy, Adam optimizer, one more parameter, smaller batch_size and more epochs

In [None]:
from tensorflow.keras.layers import Dropout

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Embedding(input_dim=5000,
                                    output_dim=128,input_length=5000
                                    ))
model.add(tf.keras.layers.LSTM(64,return_sequences=True))
model.add(Dropout(0.2))
model.add(tf.keras.layers.LSTM(64))
model.add(Dropout(0.2))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
model.fit(X_train, y_train, epochs=10, batch_size=32)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7fa6108f3c70>

In [None]:
model.evaluate(X_test, y_test)




[0.693145215511322, 0.50128573179245]

###**ATTEMPT 6**: 1 embedding, multiple GRU layers, regularization, sigmoid, binary cross-entropy, Adam optimizer, one more parameter, smaller batch_size and more epochs

In [None]:
from tensorflow.keras.layers import Dropout

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Embedding(input_dim=5000,
                                    output_dim=128,input_length=5000
                                    ))
model.add(tf.keras.layers.GRU(64,return_sequences=True))
model.add(Dropout(0.2))
model.add(tf.keras.layers.GRU(64))
model.add(Dropout(0.2))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
model.fit(X_train, y_train, epochs=10, batch_size=32)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7be081fefcd0>

In [None]:
model.evaluate(X_test, y_test)




[0.6931454539299011, 0.50128573179245]

###**ATTEMPT 8**: 1 embedding, multiple LSTM layers, regularization, sigmoid, binary cross-entropy, SGD optimizer, one more parameter, smaller batch_size and more epochs

In [None]:
from tensorflow.keras.layers import Dropout

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Embedding(input_dim=5000,
                                    output_dim=128,input_length=5000
                                    ))
model.add(tf.keras.layers.LSTM(64,return_sequences=True))
model.add(Dropout(0.2))
model.add(tf.keras.layers.LSTM(64))
model.add(Dropout(0.2))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

In [None]:
model.fit(X_train, y_train, epochs=10, batch_size=32)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10

###**ATTEMPT 7**: 1 embedding, multiple bidirectional RNN layers, regularization, sigmoid, binary cross-entropy, Adam optimizer, one more parameter, smaller batch_size and more epochs

In [None]:
from tensorflow.keras.layers import Dropout

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Embedding(input_dim=5000,
                                    output_dim=128,input_length=5000
                                    ))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.SimpleRNN(64, return_sequences=True)))
model.add(Dropout(0.2))
model.add(tf.keras.layers.Bidirectional(tf.keras.layers.SimpleRNN(64)))
model.add(Dropout(0.2))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
model.fit(X_train, y_train, epochs=10, batch_size=32)


Epoch 1/10
 42/875 [>.............................] - ETA: 4:34:24 - loss: 0.7259 - accuracy: 0.4836

KeyboardInterrupt: 

In [None]:
model.evaluate(X_test, y_test)




[0.7071397304534912, 0.49871429800987244]

###**ATTEMPT 9**: 1 embedding, multiple RNN layers, sigmoid, binary cross-entropy, Adam optimizer, one more parameter, smaller batch_size and more epochs

In [None]:
from tensorflow.keras.layers import Dropout

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Embedding(input_dim=5000,
                                    output_dim=256
                                    ))
model.add(tf.keras.layers.SimpleRNN(256,return_sequences=True))
model.add(Dropout(0.5))
model.add(tf.keras.layers.SimpleRNN(128,return_sequences=True))
model.add(Dropout(0.5))
model.add(tf.keras.layers.SimpleRNN(64,return_sequences=True))
model.add(Dropout(0.5))
model.add(tf.keras.layers.SimpleRNN(32))
model.add(Dropout(0.5))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
model.fit(X_train, y_train, epochs=10, batch_size=128)


Epoch 1/10
 46/219 [=====>........................] - ETA: 1:14:08 - loss: 0.8823 - accuracy: 0.4988

###**ATTEMPT 10**: 1 embedding, sigmoid, binary cross-entropy, Adam optimizer, one more parameter, smaller batch_size and more epochs, early stopping

In [None]:
from tensorflow.keras.callbacks import EarlyStopping  # same Keras namespace as the layers below

from tensorflow.keras.layers import Dropout

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Embedding(input_dim=5000,
                                    output_dim=128
                                    ))
model.add(tf.keras.layers.SimpleRNN(128))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)


In [None]:
# EarlyStopping monitors val_loss, so fit() needs validation data
model.fit(X_train, y_train, epochs=5, validation_split=0.2, callbacks=[early_stopping])


Epoch 1/5



Epoch 2/5



Epoch 3/5



Epoch 4/5



Epoch 5/5

KeyboardInterrupt: 

In [None]:
model.evaluate(X_test, y_test)




[0.6936230063438416, 0.50128573179245]
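
The `EarlyStopping(monitor='val_loss', patience=5)` callback used in this attempt stops training once the monitored value has failed to improve for `patience` consecutive epochs. The sketch below mimics that rule in plain Python. The loss values are made up for illustration, and `early_stopping_epoch` is a hypothetical helper, not a Keras function.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None if it never does."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss      # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1        # no improvement in this epoch
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses, for illustration only.
val_losses = [0.70, 0.65, 0.66, 0.67, 0.68, 0.69, 0.64]
stop_short_patience = early_stopping_epoch(val_losses, patience=2)
stop_five_epochs = early_stopping_epoch(val_losses[:5], patience=5)
print(stop_short_patience, stop_five_epochs)
```

Under this rule, a run limited to 5 epochs with `patience=5` can never be stopped early, so a smaller patience or more epochs would be needed for the callback to have any effect.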

###**ATTEMPT 11**: multiple LSTM layers, sigmoid, binary cross-entropy, Adam optimizer, one more parameter, smaller batch_size and more epochs, early stopping, batch normalization

In [None]:
from tensorflow.keras.callbacks import EarlyStopping  # same Keras namespace as the layers below
from tensorflow.keras.layers import Dropout,BatchNormalization,Dense



model = tf.keras.models.Sequential()
model.add(tf.keras.layers.LSTM(64,input_shape=(X_train.shape[1],1), return_sequences=True))
model.add(Dropout(0.2))
model.add(BatchNormalization())

model.add(tf.keras.layers.LSTM(64))  # last recurrent layer returns only the final state so the output shape matches y
model.add(Dropout(0.2))
model.add(BatchNormalization())

model.add(Dense(units=32, activation='relu'))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

early_stopping = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)

In [None]:
history = model.fit(X_train, y_train, epochs=5, batch_size=32, validation_split=0.2, callbacks=[early_stopping])


NameError: name 'model' is not defined

We see that none of these models improved accuracy significantly. Transfer learning or fine-tuning might have given better results, but in the meantime I saw that convolutional neural networks reach high accuracy, so I focused on those.

#**CONVOLUTIONAL NEURAL NETWORKS**
Since the recurrent neural networks did not give good results, we will see how convolutional neural networks perform. Although they are mostly used for images, they may work well in our case too (1D convolutions are well suited to text).
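
As a reminder of what a 1D convolution computes, here is a minimal sketch in plain Python: a kernel slides along a sequence and produces one weighted sum per position. The input, the kernel, and the helper name `conv1d_valid` are hypothetical and chosen only for illustration. Like `Conv1D`, the sketch does not flip the kernel.

```python
def conv1d_valid(sequence, kernel):
    """Slide `kernel` over `sequence` without padding (stride 1)."""
    k = len(kernel)
    return [
        sum(sequence[i + j] * kernel[j] for j in range(k))
        for i in range(len(sequence) - k + 1)
    ]

# Hypothetical toy input and kernel, chosen only for illustration.
signal = [1, 2, 3, 4, 5, 6]
kernel = [1, 0, -1]
feature_map = conv1d_valid(signal, kernel)
print(feature_map)  # 6 - 3 + 1 = 4 output positions
```

A `Conv1D` layer learns many such kernels at once (32 or 64 in the models below) and applies an activation to each resulting feature map.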

###**ATTEMPT 1**: simple model, 5 epochs, Adam, binary cross-entropy (to start with)

In [None]:
from keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dropout,BatchNormalization,Dense



model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv1D(32, kernel_size=5,input_shape=(X_train.shape[1],1), activation='relu'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))  # pool_size: size of the pooling window; strides defaults to pool_size
model.add(tf.keras.layers.Conv1D(64, kernel_size=5, activation='relu'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))

model.add(tf.keras.layers.Flatten())

model.add(tf.keras.layers.Dense(16, activation = 'relu'))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
history = model.fit(X_train, y_train, epochs=5)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [None]:
model.evaluate(X_test, y_test)




[0.1799173802137375, 0.9474285840988159]

We see that the very first attempt already gives high accuracy. We will now try to raise it further; regularization is the logical next step because accuracy dropped on the test set.
###**ATTEMPT 2**: simple model, 5 epochs, Adam, binary cross-entropy (to start with), regularization

In [None]:
from keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dropout,BatchNormalization,Dense
from tensorflow.keras import regularizers



model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv1D(32, kernel_size=5,input_shape=(X_train.shape[1],1),kernel_regularizer=regularizers.l2(0.01), activation='relu'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))  # pool_size: size of the pooling window; strides defaults to pool_size
model.add(tf.keras.layers.Conv1D(64, kernel_size=5, activation='relu',kernel_regularizer=regularizers.l2(0.01)))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))

model.add(tf.keras.layers.Flatten())

model.add(tf.keras.layers.Dense(16, activation = 'relu'))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
history = model.fit(X_train, y_train, epochs=5)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [None]:
model.evaluate(X_test, y_test)




[0.17536437511444092, 0.9457142949104309]

Accuracy is actually slightly lower (0.9457 vs. 0.9474) and the loss slightly lower (0.1754 vs. 0.1799), but the numbers are still excellent.
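
For reference, `regularizers.l2(0.01)` adds a penalty of 0.01 times the sum of squared kernel weights to the training objective. The sketch below shows that arithmetic on made-up numbers; the weights, the data loss, and the helper name `l2_penalty` are hypothetical.

```python
def l2_penalty(weights, lam):
    """L2 regularization term: lam * sum of squared weights."""
    return lam * sum(w * w for w in weights)

# Hypothetical kernel weights and data loss, for illustration only.
weights = [0.5, -1.0, 2.0]
data_loss = 0.18
lam = 0.01  # same coefficient as regularizers.l2(0.01)

penalty = l2_penalty(weights, lam)
total_loss = data_loss + penalty
print(f"penalty={penalty:.4f}, total={total_loss:.4f}")
```

Keras normally includes such penalties in the loss it reports, so loss values of regularized and unregularized models are not measured on exactly the same scale; accuracy is the safer number to compare.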

###**ATTEMPT 3**: simple model, 5 epochs, SGD, binary cross-entropy (to start with), no regularization

In [None]:
from keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dropout,BatchNormalization,Dense



model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv1D(32, kernel_size=5,input_shape=(X_train.shape[1],1), activation='relu'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))  # pool_size: size of the pooling window; strides defaults to pool_size
model.add(tf.keras.layers.Conv1D(64, kernel_size=5, activation='relu'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))

model.add(tf.keras.layers.Flatten())

model.add(tf.keras.layers.Dense(16, activation = 'relu'))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

In [None]:
history = model.fit(X_train, y_train, epochs=5)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [None]:
model.evaluate(X_test, y_test)




[0.31809699535369873, 0.8658571243286133]

We see that this optimizer gives worse results.

###**ATTEMPT 4**: simple model, 5 epochs, Adam, binary cross-entropy, regularization (dropout this time)

In [None]:
from keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dropout,BatchNormalization,Dense



model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv1D(32, kernel_size=5,input_shape=(X_train.shape[1],1), activation='relu'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))  # pool_size: size of the pooling window; strides defaults to pool_size
model.add(tf.keras.layers.Conv1D(64, kernel_size=5, activation='relu'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))
model.add(Dropout(0.25))

model.add(tf.keras.layers.Flatten())

model.add(tf.keras.layers.Dense(16, activation = 'relu'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
history = model.fit(X_train, y_train, epochs=5)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [None]:
model.evaluate(X_test, y_test)




[0.12654374539852142, 0.9557142853736877]

This way we increased accuracy further and reduced the loss.
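
The `Dropout` layers used above randomly zero a fraction of activations during training and rescale the survivors so that the expected sum stays the same. A minimal sketch of this "inverted dropout" idea in plain Python follows. The activations are made up, and this `dropout` helper is a hypothetical illustration rather than the Keras implementation.

```python
import random

def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]

rng = random.Random(0)      # fixed seed so the run is repeatable
activations = [1.0] * 8     # hypothetical activations, for illustration
dropped = dropout(activations, rate=0.5, rng=rng)
print(dropped)              # each entry is either 0.0 or 2.0
```

At inference time (for example in `model.evaluate`) dropout is switched off, so all activations pass through unchanged.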

###**ATTEMPT 5**: simple model, 10 epochs, Adam, binary cross-entropy (to start with), regularization (dropout this time)

In [None]:
from keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dropout,BatchNormalization,Dense



model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv1D(32, kernel_size=5,input_shape=(X_train.shape[1],1), activation='relu'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))  # pool_size: size of the pooling window; strides defaults to pool_size
model.add(tf.keras.layers.Conv1D(64, kernel_size=5, activation='relu'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))
model.add(Dropout(0.25))

model.add(tf.keras.layers.Flatten())

model.add(tf.keras.layers.Dense(16, activation = 'relu'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
history = model.fit(X_train, y_train, epochs=10)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [None]:
model.evaluate(X_test, y_test)




[0.15399624407291412, 0.9577142596244812]

We see slightly higher accuracy, but the loss is also higher than in the previous attempt.

###**ATTEMPT 6**: simple model, 5 epochs, Adam, binary cross-entropy (to start with), regularization (dropout this time), variable learning rate

In [None]:
from keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dropout,BatchNormalization,Dense
from tensorflow.keras.callbacks import LearningRateScheduler

def exponential_decay(epoch):
    initial_lr = 0.1
    k = 0.1
    lr = initial_lr * np.exp(-k * epoch)
    return lr


model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv1D(32, kernel_size=5,input_shape=(X_train.shape[1],1), activation='relu'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))  # pool_size: size of the pooling window; strides defaults to pool_size
model.add(tf.keras.layers.Conv1D(64, kernel_size=5, activation='relu'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))
model.add(Dropout(0.25))

model.add(tf.keras.layers.Flatten())

model.add(tf.keras.layers.Dense(16, activation = 'relu'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

lr_scheduler = LearningRateScheduler(exponential_decay)


In [None]:
history = model.fit(X_train, y_train, epochs=5, callbacks=[lr_scheduler])


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [None]:
model.evaluate(X_test, y_test)




[0.6961305141448975, 0.49871429800987244]

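We see that the results are quite poor. Adam already adapts the step size for each parameter, so an external schedule adds little here. More importantly, this schedule starts at 0.1, which is 100 times Adam's default learning rate of 0.001, and that is a likely reason why training failed.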
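
To see how large the scheduled learning rates are, the sketch below evaluates the same formula as `exponential_decay` for the 5 epochs used in this attempt and compares each value with Adam's default of 0.001. The constant name `ADAM_DEFAULT_LR` is introduced here only for illustration.

```python
import math

ADAM_DEFAULT_LR = 0.001  # default learning_rate of tf.keras.optimizers.Adam

def exponential_decay(epoch, initial_lr=0.1, k=0.1):
    """Same formula as the scheduler above: initial_lr * exp(-k * epoch)."""
    return initial_lr * math.exp(-k * epoch)

schedule = [exponential_decay(epoch) for epoch in range(5)]
for epoch, lr in enumerate(schedule):
    ratio = lr / ADAM_DEFAULT_LR
    print(f"epoch {epoch}: lr={lr:.4f} ({ratio:.0f}x Adam's default)")
```

Even in the last epoch the rate stays above 0.06, so a fairer test of scheduling would probably start from a much smaller `initial_lr`.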

###**ATTEMPT 7**: simple model, 5 epochs, Adadelta, binary cross-entropy (to start with), regularization (dropout this time), variable learning rate

In [None]:
from tensorflow.keras.optimizers import Adadelta
from keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dropout,BatchNormalization,Dense



model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv1D(32, kernel_size=5,input_shape=(X_train.shape[1],1), activation='relu'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))  # pool_size: size of the pooling window; strides defaults to pool_size
model.add(tf.keras.layers.Conv1D(64, kernel_size=5, activation='relu'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))
model.add(Dropout(0.25))

model.add(tf.keras.layers.Flatten())

model.add(tf.keras.layers.Dense(16, activation = 'relu'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

optimizer = Adadelta()

model.compile(loss='binary_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])


In [None]:
history = model.fit(X_train, y_train, epochs=5)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [None]:
model.evaluate(X_test, y_test)




[0.69020015001297, 0.6664285659790039]

This optimizer gives worse results (accuracy 0.6664).

###**ATTEMPT 8**: simple model, 5 epochs, RMSprop, binary cross-entropy (to start with), regularization (dropout this time), variable learning rate

In [None]:
from tensorflow.keras.optimizers import RMSprop
from keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dropout,BatchNormalization,Dense



model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv1D(32, kernel_size=5,input_shape=(X_train.shape[1],1), activation='relu'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))  # pool_size: size of the pooling window; strides defaults to pool_size
model.add(tf.keras.layers.Conv1D(64, kernel_size=5, activation='relu'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))
model.add(Dropout(0.25))

model.add(tf.keras.layers.Flatten())

model.add(tf.keras.layers.Dense(16, activation = 'relu'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

optimizer = RMSprop()

model.compile(loss='binary_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])

In [None]:
history = model.fit(X_train, y_train, epochs=5)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [None]:
model.evaluate(X_test, y_test)




[0.1446104198694229, 0.9520000219345093]

We see fairly good results, but slightly worse than with the Adam optimizer.

###**ATTEMPT 9**: simple model, 5 epochs, Adam, binary cross-entropy (to start with), regularization (dropout and batch normalization), variable learning rate

In [None]:
from keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dropout,BatchNormalization,Dense



model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv1D(32, kernel_size=5,input_shape=(X_train.shape[1],1), activation='relu'))
model.add(BatchNormalization())
model.add(tf.keras.layers.MaxPool1D(pool_size=2))  # pool_size: size of the pooling window; strides defaults to pool_size
model.add(tf.keras.layers.Conv1D(64, kernel_size=5, activation='relu'))
model.add(BatchNormalization())

model.add(tf.keras.layers.MaxPool1D(pool_size=2))
model.add(Dropout(0.25))

model.add(tf.keras.layers.Flatten())

model.add(tf.keras.layers.Dense(16, activation = 'relu'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
history = model.fit(X_train, y_train, epochs=5)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [None]:
model.evaluate(X_test, y_test)




[0.4334709346294403, 0.8894285559654236]

We see that batch normalization gives worse results here (accuracy 0.8894, loss 0.4335).

###**ATTEMPT 10**: simple model, 5 epochs, Adam, binary cross-entropy (to start with), regularization (dropout), more filters per layer

In [None]:
from tensorflow.keras.optimizers import RMSprop
from keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dropout,BatchNormalization,Dense



model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv1D(64, kernel_size=5,input_shape=(X_train.shape[1],1), activation='relu'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))  # pool_size: size of the pooling window; strides defaults to pool_size
model.add(tf.keras.layers.Conv1D(128, kernel_size=5, activation='relu'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))
model.add(Dropout(0.25))

model.add(tf.keras.layers.Flatten())

model.add(tf.keras.layers.Dense(32, activation = 'relu'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))



model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
history = model.fit(X_train, y_train, epochs=5)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [None]:
model.evaluate(X_test, y_test)




[0.1261330544948578, 0.9568571448326111]

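This is the lowest loss so far (0.1261) and the highest accuracy among the 5-epoch runs (0.9569); only the 10-epoch run in attempt 5 scored slightly higher (0.9577).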

###**ATTEMPT 11**: simple model, 5 epochs, Adam, binary cross-entropy (to start with), regularization (dropout), more filters per layer, one more convolutional layer

In [None]:
from tensorflow.keras.optimizers import RMSprop
from keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dropout,BatchNormalization,Dense



model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv1D(64, kernel_size=5,input_shape=(X_train.shape[1],1), activation='relu'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))  # pool_size: size of the pooling window; strides defaults to pool_size
model.add(tf.keras.layers.Conv1D(128, kernel_size=5, activation='relu'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))
model.add(tf.keras.layers.Conv1D(256, kernel_size=5, activation='relu'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))
model.add(Dropout(0.25))

model.add(tf.keras.layers.Flatten())

model.add(tf.keras.layers.Dense(32, activation = 'relu'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))



model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
history = model.fit(X_train, y_train, epochs=5)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [None]:
model.evaluate(X_test, y_test)




[0.1568034291267395, 0.9455714225769043]

We see slightly worse results.

###**ATTEMPT 12**: simple model, 5 epochs, Adam, binary cross-entropy (to start with), regularization (dropout), more filters per layer, average pooling

In [None]:
from tensorflow.keras.optimizers import RMSprop
from keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dropout,BatchNormalization,Dense,AveragePooling1D



model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv1D(64, kernel_size=5,input_shape=(X_train.shape[1],1), activation='relu'))
model.add(tf.keras.layers.AveragePooling1D(pool_size=2))  # pool_size: size of the pooling window; strides defaults to pool_size
model.add(tf.keras.layers.Conv1D(128, kernel_size=5, activation='relu'))
model.add(tf.keras.layers.AveragePooling1D(pool_size=2))
model.add(Dropout(0.25))

model.add(tf.keras.layers.Flatten())

model.add(tf.keras.layers.Dense(32, activation = 'relu'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))



model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
history = model.fit(X_train, y_train, epochs=5)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [None]:
model.evaluate(X_test, y_test)




[0.12849631905555725, 0.9548571705818176]

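Fairly good results: accuracy (0.9549) is slightly below the best so far. The loss (0.1285) is lower than in attempt 5, which had the highest accuracy, though marginally higher than in attempt 10 (0.1261).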
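
The only change in this attempt is the pooling operation. The sketch below contrasts max pooling and average pooling on a made-up feature map with `pool_size=2`. The helper names `pool1d` and `mean` are hypothetical and only illustrate the idea.

```python
def pool1d(values, pool_size, reduce_fn):
    """Non-overlapping 1D pooling (stride equal to pool_size, no padding)."""
    return [
        reduce_fn(values[i:i + pool_size])
        for i in range(0, len(values) - pool_size + 1, pool_size)
    ]

def mean(window):
    return sum(window) / len(window)

# Hypothetical feature map, for illustration only.
feature_map = [1, 5, 2, 2, 0, 8]
max_pooled = pool1d(feature_map, 2, max)
avg_pooled = pool1d(feature_map, 2, mean)
print(max_pooled, avg_pooled)
```

Max pooling keeps only the strongest response in each window, while average pooling smooths the responses, which may explain why the two variants end up so close on this task.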

###**ATTEMPT 13**: simple model, 5 epochs, Adam, binary cross-entropy (to start with), regularization (dropout), more filters per layer, kernel size=3

In [None]:
from tensorflow.keras.optimizers import RMSprop
from keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dropout,BatchNormalization,Dense



model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv1D(64, kernel_size=3,input_shape=(X_train.shape[1],1), activation='relu'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))  # pool_size: size of the pooling window; strides defaults to pool_size
model.add(tf.keras.layers.Conv1D(128, kernel_size=3, activation='relu'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))
model.add(Dropout(0.25))

model.add(tf.keras.layers.Flatten())

model.add(tf.keras.layers.Dense(32, activation = 'relu'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))



model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
history = model.fit(X_train, y_train, epochs=5)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [None]:
model.evaluate(X_test, y_test)




[0.14169591665267944, 0.956428587436676]

Accuracy is slightly lower than with kernel size 5 in attempt 10 (0.9564 vs. 0.9569).

###**ATTEMPT 14**: simple model, 5 epochs, Adam, binary cross-entropy (to start with), regularization (dropout), more filters per layer, padding=same

In [None]:
from tensorflow.keras.optimizers import RMSprop
from keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dropout,BatchNormalization,Dense



model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv1D(64, kernel_size=5,input_shape=(X_train.shape[1],1), activation='relu',padding='same'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))  # pool_size: size of the pooling window; strides defaults to pool_size
model.add(tf.keras.layers.Conv1D(128, kernel_size=5, activation='relu',padding='same'))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))
model.add(Dropout(0.25))

model.add(tf.keras.layers.Flatten())

model.add(tf.keras.layers.Dense(32, activation = 'relu'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))



model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
history = model.fit(X_train, y_train, epochs=5)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [None]:
model.evaluate(X_test, y_test)




[0.1342094987630844, 0.9599999785423279]

This is the best accuracy so far (0.9600). The loss (0.1342) is low but not the lowest; attempt 10 reached 0.1261.

###**ATTEMPT 15**: simple model, 5 epochs, Adam, binary cross-entropy (to start with), regularization (dropout), more filters per layer, padding=same, strides=2

In [None]:
from tensorflow.keras.optimizers import RMSprop
from keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dropout,BatchNormalization,Dense



model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Conv1D(64, kernel_size=5,input_shape=(X_train.shape[1],1), activation='relu',padding='same',strides=2))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))  # pool_size: size of the pooling window; strides defaults to pool_size
model.add(tf.keras.layers.Conv1D(128, kernel_size=5, activation='relu',padding='same',strides=2))
model.add(tf.keras.layers.MaxPool1D(pool_size=2))
model.add(Dropout(0.25))

model.add(tf.keras.layers.Flatten())

model.add(tf.keras.layers.Dense(32, activation = 'relu'))
model.add(Dropout(0.5))

model.add(tf.keras.layers.Dense(1, activation='sigmoid'))



model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

In [None]:
history = model.fit(X_train, y_train, epochs=5)


Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


In [None]:
model.evaluate(X_test, y_test)




[0.17190752923488617, 0.9518571496009827]

We see that accuracy was better in the previous attempt.
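
The last two attempts change `padding` and `strides`, which mainly affect how long each feature map is. The sketch below computes the output length of a 1D convolution for both padding modes, assuming no dilation. The sequence length of 100 and the helper name `conv1d_output_length` are hypothetical, not taken from the project.

```python
import math

def conv1d_output_length(n, kernel_size, padding="valid", strides=1):
    """Output length of a 1D convolution for 'valid' or 'same' padding."""
    if padding == "same":
        return math.ceil(n / strides)
    if padding == "valid":
        return (n - kernel_size) // strides + 1
    raise ValueError(f"unsupported padding: {padding}")

n = 100  # hypothetical sequence length, not the project's X_train.shape[1]
for padding, strides in [("valid", 1), ("same", 1), ("same", 2)]:
    print(padding, strides, conv1d_output_length(n, 5, padding, strides))
```

With `strides=2` each convolution halves the sequence before pooling halves it again, so the network sees a much coarser representation, which is one possible reason for the lower accuracy in attempt 15.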

The best accuracy was achieved in attempt 14. We will now compute that model's f1_score so we can compare it with the one obtained for the feedforward neural networks and decide which model is best.

In [None]:
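from sklearn.metrics import f1_score

# Note: `model` is whichever model was built last; re-run the attempt 14 cells first
# so that the score below describes that model.
y_pred = model.predict(X_test)
y_pred_classes = (y_pred > 0.5).astype(int).ravel()  # threshold the sigmoid outputs at 0.5

# Compute the F1 score
f1 = f1_score(y_test, y_pred_classes)
f1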



0.9601933466022179
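
For reference, the F1 score is the harmonic mean of precision and recall for the positive class (here label 1, fake news). The sketch below computes it by hand from confusion counts. The labels and predictions are made up and are not the project's test data, and `f1_from_labels` is a hypothetical helper.

```python
def f1_from_labels(true_labels, predicted_labels):
    """F1 for the positive class (label 1) computed from confusion counts."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels and predictions, not the project's test data.
true_labels = [1, 1, 1, 0, 0, 1, 0, 1]
predicted_labels = [1, 0, 1, 0, 1, 1, 0, 1]
score = f1_from_labels(true_labels, predicted_labels)
print(f"F1 = {score:.3f}")
```

Because the classes in this dataset are roughly balanced, F1 and accuracy are expected to be close, which matches the values reported above.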

So we again obtain a very high F1 score and conclude that this model is also very successful for our problem. The feedforward model may be slightly more successful, but the differences are very small, so I think either model would be a reasonable choice.