## <center>Credibility of the News - Training Models</center>

### Data Manipulation

In [5]:
import pandas as pd

#### Data loading

In [6]:
df=pd.read_csv("train.csv")

#### Data preview, summary statistics and column information

In [7]:
df.head(10)

Unnamed: 0,id,title,author,text,label
0,0,House Dem Aide: We Didn’t Even See Comey’s Let...,Darrell Lucus,House Dem Aide: We Didn’t Even See Comey’s Let...,1
1,1,"FLYNN: Hillary Clinton, Big Woman on Campus - ...",Daniel J. Flynn,Ever get the feeling your life circles the rou...,0
2,2,Why the Truth Might Get You Fired,Consortiumnews.com,"Why the Truth Might Get You Fired October 29, ...",1
3,3,15 Civilians Killed In Single US Airstrike Hav...,Jessica Purkiss,Videos 15 Civilians Killed In Single US Airstr...,1
4,4,Iranian woman jailed for fictional unpublished...,Howard Portnoy,Print \nAn Iranian woman has been sentenced to...,1
5,5,Jackie Mason: Hollywood Would Love Trump if He...,Daniel Nussbaum,"In these trying times, Jackie Mason is the Voi...",0
6,6,Life: Life Of Luxury: Elton John’s 6 Favorite ...,,Ever wonder how Britain’s most iconic pop pian...,1
7,7,Benoît Hamon Wins French Socialist Party’s Pre...,Alissa J. Rubin,"PARIS — France chose an idealistic, traditi...",0
8,8,Excerpts From a Draft Script for Donald Trump’...,,Donald J. Trump is scheduled to make a highly ...,0
9,9,"A Back-Channel Plan for Ukraine and Russia, Co...",Megan Twohey and Scott Shane,A week before Michael T. Flynn resigned as nat...,0


In [8]:
df.describe()

Unnamed: 0,id,label
count,20800.0,20800.0
mean,10399.5,0.500625
std,6004.587135,0.500012
min,0.0,0.0
25%,5199.75,0.0
50%,10399.5,1.0
75%,15599.25,1.0
max,20799.0,1.0


In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20800 entries, 0 to 20799
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   id      20800 non-null  int64 
 1   title   20242 non-null  object
 2   author  18843 non-null  object
 3   text    20761 non-null  object
 4   label   20800 non-null  int64 
dtypes: int64(2), object(3)
memory usage: 812.6+ KB


#### Checking the dataset

__Check columns for missing data__

In [10]:
df.isnull().sum()

id           0
title      558
author    1957
text        39
label        0
dtype: int64

__Fill in missing data values__

In [11]:
df=df.fillna('')

In [12]:
df.isnull().sum()

id        0
title     0
author    0
text      0
label     0
dtype: int64

__Dropping the unused "id", "title" and "author" columns__

In [13]:
df.columns

Index(['id', 'title', 'author', 'text', 'label'], dtype='object')

In [14]:
df=df.drop(['id', 'title', 'author'], axis=1)

In [15]:
df.head()

Unnamed: 0,text,label
0,House Dem Aide: We Didn’t Even See Comey’s Let...,1
1,Ever get the feeling your life circles the rou...,0
2,"Why the Truth Might Get You Fired October 29, ...",1
3,Videos 15 Civilians Killed In Single US Airstr...,1
4,Print \nAn Iranian woman has been sentenced to...,1


### Preparation of the text

In [16]:
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
import re
import nltk

__Initialising the stemmer__

In [17]:
port_stem=PorterStemmer()

In [18]:
port_stem

<PorterStemmer>

#### Text cleaning and stemming function

In [19]:
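def stemming(content):
    # Keep letters only, lowercase the text and split it into tokens
    con=re.sub('[^a-zA-Z]', ' ', content)
    con=con.lower()
    con=con.split()
    # Drop English stopwords and reduce the remaining words to their Porter stems
    con=[port_stem.stem(word) for word in con if word not in stopwords.words('english')]
    con=' '.join(con)
    return con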

__Performing stemming on all elements of the "text" column.__

In [20]:
from tqdm import tqdm
tqdm.pandas()

In [21]:
df['text']= df['text'].progress_apply(stemming)

100%|██████████| 20800/20800 [57:13<00:00,  6.06it/s]
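
The run above took roughly 57 minutes for 20,800 texts. A likely contributor is that `stopwords.words('english')` is evaluated again for every token and returns a list, so each membership test is a linear scan. The sketch below shows the usual remedy of building the stopword set once. It uses hypothetical stand-ins (`TOY_STOPWORDS`, `toy_stem`, `make_cleaner`) so that it runs without NLTK data; in the notebook the set would presumably be `set(stopwords.words('english'))` and the stemmer `port_stem.stem`.

```python
import re

# Hypothetical stand-ins so the sketch runs on its own; in the notebook these
# would be set(stopwords.words('english')) and port_stem.stem.
TOY_STOPWORDS = frozenset({"the", "is", "a", "of", "and", "in"})


def toy_stem(word):
    # Crude suffix stripping, NOT the Porter algorithm; illustration only.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def make_cleaner(stop_words, stem):
    """Return a cleaning function that reuses one prebuilt stopword set."""
    stop_set = frozenset(stop_words)         # built once, constant-time lookups
    non_letters = re.compile(r"[^a-zA-Z]")   # compiled once as well

    def clean(content):
        tokens = non_letters.sub(" ", content).lower().split()
        return " ".join(stem(t) for t in tokens if t not in stop_set)

    return clean


clean = make_cleaner(TOY_STOPWORDS, toy_stem)
print(clean("The senator is reading the leaked reports in 2016!"))
```

The cleaning logic is the same as in `stemming`; only the place where the stopword collection is built changes.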


In [22]:
x=df['text']

In [23]:
y=df['label']

In [24]:
y.shape

(20800,)

### Import of libraries for ML

In [36]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.tree import DecisionTreeClassifier
import pickle

__Division of data into training and test sets__

In [26]:
x_train , x_test , y_train, y_test = train_test_split(x, y, test_size=0.20)
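
The split above is not seeded, so the accuracy reported further down will differ slightly from run to run. `train_test_split` accepts `random_state` and `stratify` arguments for this purpose. The plain-Python sketch below only illustrates what a seeded, stratified split does; `stratified_split` is a hypothetical helper, not part of scikit-learn.

```python
import random
from collections import defaultdict


def stratified_split(texts, labels, test_size=0.2, seed=42):
    """Seeded split that keeps the label ratio similar in both parts."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)                      # deterministic for a fixed seed
        n_test = round(len(indices) * test_size)  # per-class test share
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    def pick(idxs):
        return [texts[i] for i in idxs], [labels[i] for i in idxs]

    x_tr, y_tr = pick(train_idx)
    x_te, y_te = pick(test_idx)
    return x_tr, x_te, y_tr, y_te


texts = [f"article {i}" for i in range(10)]
labels = [0, 1] * 5
x_tr, x_te, y_tr, y_te = stratified_split(texts, labels)
print(len(x_tr), len(x_te), sorted(y_te))
```

With the notebook's data, the equivalent call would likely be `train_test_split(x, y, test_size=0.20, random_state=42, stratify=y)`; the seed value 42 is an arbitrary choice.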

__Preparation of text features using TF-IDF__

In [27]:
vect=TfidfVectorizer()

In [28]:
x_train=vect.fit_transform(x_train)
x_test=vect.transform(x_test)
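
The vectorizer is fitted on the training texts only and then reused for the test texts, so the vocabulary and IDF weights never see test data. The pure-Python sketch below illustrates the weighting under the formula documented for scikit-learn's default settings (smoothed IDF `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalisation). The helper names `fit_idf` and `transform` are hypothetical, and tokenisation is simplified to whitespace splitting.

```python
import math
from collections import Counter


def fit_idf(train_docs):
    """Learn the vocabulary and smoothed IDF weights from training docs only."""
    n_docs = len(train_docs)
    doc_freq = Counter()
    for doc in train_docs:
        doc_freq.update(set(doc.split()))  # count each term once per document
    return {term: math.log((1 + n_docs) / (1 + df)) + 1
            for term, df in doc_freq.items()}


def transform(doc, idf):
    """Weight term counts by IDF, ignore unseen terms, then L2-normalise."""
    counts = Counter(t for t in doc.split() if t in idf)
    weights = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


train_docs = ["trump elect vote", "vote fraud claim", "elect result vote"]
idf = fit_idf(train_docs)

# "alien" never occurred in training, so it is silently dropped.
vec = transform("vote fraud alien", idf)
print(sorted(vec))
print(vec["fraud"] > vec["vote"])
```

A term that appears in every training document ("vote") gets the minimum IDF of 1, while a rarer term ("fraud") receives a larger weight.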

__Checking the shape of the vectorised test data__

In [29]:
x_test.shape

(4160, 98388)

__Creation and training of a decision tree classifier model__

In [30]:
model=DecisionTreeClassifier()

In [31]:
model.fit(x_train, y_train)
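
With its default settings, `DecisionTreeClassifier` scores candidate splits by Gini impurity. The sketch below works through that quantity on made-up labels; `gini` and `split_impurity` are hypothetical helpers, and the example split is invented for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted impurity of the two child nodes produced by a split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


parent = [0, 0, 0, 0, 1, 1, 1, 1]
# Invented split, e.g. "is the TF-IDF weight of some term above a threshold?"
left, right = [0, 0, 0, 1], [0, 1, 1, 1]

print(gini(parent))                 # impurity before the split
print(split_impurity(left, right))  # impurity after the split
```

A split is useful when the weighted impurity of the children is lower than that of the parent, as it is here.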

__Predicting labels for the test data__

In [32]:
prediction=model.predict(x_test)

In [33]:
prediction

array([1, 1, 0, ..., 0, 1, 0], dtype=int64)

__Assessment of model accuracy__

In [34]:
model.score(x_test, y_test)

0.8790865384615385
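
`model.score` returns plain accuracy. The classes here are close to balanced (the label mean is about 0.50), so accuracy is a reasonable summary, but precision and recall show which kind of error dominates. The sketch below computes them on made-up labels; `binary_report` is a hypothetical helper, and label 1 is treated as the positive (not credible) class, following the result cells where 0 is printed as credible.

```python
def binary_report(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up labels for illustration; not taken from the notebook's test set.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
report = binary_report(y_true, y_pred)
print(report)
```

On the real data the same numbers could be obtained from `y_test` and `prediction`.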

__Saving the vectorizer and model to files__

In [37]:
with open('vector.pkl', 'wb') as f:
    pickle.dump(vect, f)

In [38]:
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

__Loading the vectorizer and model from files__

In [39]:
with open('vector.pkl', 'rb') as f:
    vector_form=pickle.load(f)

In [40]:
with open('model.pkl', 'rb') as f:
    load_model=pickle.load(f)
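
The vectorizer and the model only make sense as a pair, because the model expects features in the column order the vectorizer learned. One option is to store both in a single file. The sketch below shows that round trip with plain dictionaries standing in for the fitted objects; the file name `artifacts.pkl` is hypothetical.

```python
import os
import pickle
import tempfile

# Plain dictionaries stand in for the fitted TfidfVectorizer and
# DecisionTreeClassifier so the sketch runs without scikit-learn.
artifacts = {
    "vectorizer": {"vocabulary": {"vote": 0, "fraud": 1}},
    "model": {"kind": "decision-tree-stand-in"},
}

# Hypothetical file name, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "artifacts.pkl")

with open(path, "wb") as f:   # the file is closed even if dumping fails
    pickle.dump(artifacts, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == artifacts)
```

Pickle files can execute arbitrary code when loaded, so only files from a trusted source should be unpickled.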

### Test

In [74]:
def check_news(news):
    news=stemming(news)
    input_data=[news]
    vector_form1=vector_form.transform(input_data)
    prediction = load_model.predict(vector_form1)
    return prediction
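
`check_news` returns a one-element array of labels. The sketch below shows one way to turn that into a readable verdict; `LABEL_NAMES` and `verdict` are hypothetical names, and the mapping assumes that 0 is the credible class, as in the result cells.

```python
# Assumed mapping: 0 is the credible class, 1 the not credible class.
LABEL_NAMES = {0: "Credible", 1: "Not credible"}


def verdict(prediction):
    """Map a one-element prediction (list or array) to a readable label."""
    if len(prediction) != 1:
        raise ValueError("expected exactly one prediction")
    return LABEL_NAMES[int(prediction[0])]


print(verdict([0]))
print(verdict([1]))
```

In the notebook this would be called as `verdict(check_news(text))`.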

__Reading in the text to be checked__

Demonstrating the model on an article from https://www.bbc.com/news/world-middle-east-68766592

Expected result for Article 1: credible.

In [76]:
val=check_news("""US President Joe Biden has said he believes that Israel's Prime Minister Benjamin Netanyahu is making a "mistake" in his handling of Gaza.

"I think what he's doing is a mistake. I don't agree with his approach," he said in an interview.

He said Gaza should have "total access to all food and medicine" for the next six to eight weeks.

Last week he warned ongoing US support for the war depended on Israel allowing in more food and medicine.

Israel has denied impeding the entry of aid or its distribution inside Gaza, and has accused UN agencies on the ground of failing to get the aid that is allowed in to the people who need it.

Weeks of talks have failed to produce a ceasefire agreement but international pressure is growing.

0:35
Watch: Biden says Netanyahu making 'a mistake' in Gaza
The hour-long interview was recorded last Wednesday - days after Israeli military strikes killed seven aid workers with World Central Kitchen - and it aired on Tuesday night on US Spanish-language network Univision.

Mr Biden said it was "outrageous" how the aid organisation's vehicles had been "hit by drones and taken out on a highway".

US pressure on Israel not enough, say dissenting officials
What we know about Israeli strike on aid convoy
Six months on, how close is Israel to eliminating Hamas?
The Israel Defense Forces have since said "grave mistakes" led to the fatal targeting of the workers. An inquiry led to two senior officers being dismissed.

In the interview Mr Biden said: "What I'm calling for is for the Israelis to just call for a ceasefire, allow for the next six, eight weeks, total access to all food and medicine going into the country."

The president has previously said Hamas must agree to a pause and release remaining hostages.

Israel said recently that it would open a crossing to northern Gaza and a deep water port, to allow more aid to flow into the area. It has not yet detailed when or how these routes will operate.

Mr Biden is facing domestic pressure over Israel. Over the past weeks he has sharpened his rhetoric, including towards Mr Netanyahu, over the conduct of the war which has now lasted six months.

Meanwhile, military supplies including bombs, missiles and ammunition have continued to flow from the US to Israel uninterrupted.

Hamas-led gunmen attacked southern Israeli border communities on 7 October, killing 1,200 people and taking more than 250 hostage.

Israel says that of 130 hostages still in Gaza, at least 34 are dead.

More than 33,000 Gazans, the majority of them civilians, have been killed during Israel's offensive in Gaza since the October attack, the Hamas-run health ministry says.""")

Article 2 was generated by Gemini.google.com.app from examples of fake news.

Expected result for Article 2: not credible.

In [78]:
val=check_news("""The "Delta Plus" variant is more contagious and deadly for children than previous variants. Vaccines are not effective against this variant!

Symptoms include high fever, cough, difficulty breathing, chest pain, and bruising on the skin. If your child has any of these symptoms, contact a doctor immediately!

The government recommends that all children under 12 stay home and avoid contact with others. Schools and kindergartens will be closed until further notice.

This is a very serious situation! We must protect our children! Share this message with everyone you know!

Here's what you can do to protect your children: frequent handwashing, social distancing, and wearing masks when absolutely necessary. When unsure about any symptoms, it's always better to err on the side of caution and seek medical attention.

Talk to your children about the importance of hygiene and staying healthy during this time. Stock up on essential supplies like pain relievers, thermometers, and kid-friendly masks (if appropriate for your child's age).

Let your children know you're there for them and answer any questions they may have in a calm and reassuring way. Consider creating a fun and safe indoor activity schedule to keep your children entertained while they're at home.

Check in with friends, family, and neighbors who have children, especially those who may need extra support. Many local organizations offer online resources and activities for children stuck at home. Explore these options!

If you work from home, consider creating a dedicated workspace to minimize disruptions for both you and your children. Don't be afraid to ask for help! Childcare resources are available for those who qualify.

Remember, you're not alone in this. Many parents are facing similar challenges during this pandemic. Stay informed! Follow updates from trusted health organizations for the latest information on the Delta Plus variant.

If you suspect your child has been exposed to the virus, isolate them at home and monitor for symptoms. Be patient with yourself and your children. This is a stressful time for everyone.

Take care of your own mental and physical health so you can best care for your children. Practice relaxation techniques like deep breathing or meditation to manage stress and anxiety.

Let's work together to keep our communities safe and healthy. Social distancing is key! We can overcome this challenge by following recommended guidelines and supporting each other.

Stay positive and focus on the things you can control. This too shall pass. In the meantime, cherish this extra time spent with your children. Create lasting memories!

Let's use this opportunity to bond as a family and teach our children valuable life lessons about resilience. Show your children extra love and affection during this uncertain time.

Let them know how much you love them and how strong they are. Together, we will get through this! Share this message with everyone you know so we can protect our children and ensure a brighter future for them.

""")

### Result

Article 1

In [77]:
if val[0]==0:
    print('Credible')
else:
    print('Not credible')

Credible


Article 2

In [79]:
if val[0]==0:
    print('Credible')
else:
    print('Not credible')

Not credible
