Detecting Fake News

The Problem

Fake news, the intentional spread of misinformation, has been part of our society since the days of the printing press. Technologies such as social media and news-writing bots have increased both the reach and the volume of intentionally misleading articles. Telling whether an article is genuine is a difficult task for many of my Facebook friends; maybe machine learning and NLP can help.

Reporters with various forms of "fake news" from an 1894 illustration by Frederick Burr Opper

Raw Data

The dataset was sourced from Kaggle. It comes as two separate CSV files, one containing Fake news articles and the other True news articles, each with ~20,000 articles, so the classes are fairly evenly split at 51/49. Because of this near-even split, I will use accuracy as my scoring metric.
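Accuracy is easy to read on a split this close to even, because always guessing the larger class only gets about 51% right. Here is a minimal sketch of that baseline. The counts are placeholders chosen to give a 51/49 split, not the real sizes of the Kaggle files, and `majority_baseline` is a name made up for this example.

```python
def majority_baseline(class_counts):
    """Accuracy of always predicting the most common class."""
    total = sum(class_counts.values())
    if total == 0:
        raise ValueError("class_counts must contain at least one article")
    return max(class_counts.values()) / total


# Placeholder counts chosen to give a 51/49 split; not the real file sizes.
example_counts = {"fake": 5100, "true": 4900}
baseline = majority_baseline(example_counts)
print(f"Majority-class baseline accuracy: {baseline:.2f}")
```

Any model worth keeping should beat this number comfortably.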

Fake News

title text subject date
0 Donald Trump Sends Out Embarrassing New Year’s Eve Message; This is Disturbing Donald Trump just couldn t wish all Americans a Happy New Year and leave it at that. Instead, he had to give a shout out to his enemies, haters and the very dishonest fake news media. The former reality show star had just one job to do and he couldn t do it. As our Country rapidly grows stronger and smarter, I want to wish all of my friends, supporters, enemies, haters, and even the very dishonest Fake News Media, a Happy and Healthy New Year, President Angry Pants tweeted. 2018 will be a great year for America! As our Country rapidly grows stronger and smarter, I want to wish all of my friends, supporters, enemies, haters, and even the very dishonest Fake News Media, a Happy and Healthy New Year. 2018 will be a great year for America! Donald J. Trump (@realDonaldTrump) December 31, 2017Trump s tweet went down about as welll as you d expect.What kind of president sends a New Year s greeting like this despicable, petty, infantile gibberish? Only Trump! His lack of decency won t even allow him to rise above the gutter long enough to wish the American citizens a happy new year! Bishop Talbert Swan (@TalbertSwan) December 31, 2017no one likes you Calvin (@calvinstowell) December 31, 2017Your impeachment would make 2018 a great year for America, but I ll also accept regaining control of Congress. Miranda Yaver (@mirandayaver) December 31, 2017Do you hear yourself talk? When you have to include that many people that hate you you have to wonder? Why do the they all hate me? Alan Sandoval (@AlanSandoval13) December 31, 2017Who uses the word Haters in a New Years wish?? Marlene (@marlene399) December 31, 2017You can t just say happy new year? Koren pollitt (@Korencarpenter) December 31, 2017Here s Trump s New Year s Eve tweet from 2016.Happy New Year to all, including to my many enemies and those who have fought me and lost so badly they just don t know what to do. Love! Donald J. 
Trump (@realDonaldTrump) December 31, 2016This is nothing new for Trump. He s been doing this for years.Trump has directed messages to his enemies and haters for New Year s, Easter, Thanksgiving, and the anniversary of 9/11. Daniel Dale (@ddale8) December 31, 2017Trump s holiday tweets are clearly not presidential.How long did he work at Hallmark before becoming President? Steven Goodine (@SGoodine) December 31, 2017He s always been like this . . . the only difference is that in the last few years, his filter has been breaking down. Roy Schulze (@thbthttt) December 31, 2017Who, apart from a teenager uses the term haters? Wendy (@WendyWhistles) December 31, 2017he s a fucking 5 year old Who Knows (@rainyday80) December 31, 2017So, to all the people who voted for this a hole thinking he would change once he got into power, you were wrong! 70-year-old men don t change and now he s a year older.Photo by Andrew Burton/Getty Images. News December 31, 2017
1 Drunk Bragging Trump Staffer Started Russian Collusion Investigation House Intelligence Committee Chairman Devin Nunes is going to have a bad day. He s been under the assumption, like many of us, that the Christopher Steele-dossier was what prompted the Russia investigation so he s been lashing out at the Department of Justice and the FBI in order to protect Trump. As it happens, the dossier is not what started the investigation, according to documents obtained by the New York Times.Former Trump campaign adviser George Papadopoulos was drunk in a wine bar when he revealed knowledge of Russian opposition research on Hillary Clinton.On top of that, Papadopoulos wasn t just a covfefe boy for Trump, as his administration has alleged. He had a much larger role, but none so damning as being a drunken fool in a wine bar. Coffee boys don t help to arrange a New York meeting between Trump and President Abdel Fattah el-Sisi of Egypt two months before the election. It was known before that the former aide set up meetings with world leaders for Trump, but team Trump ran with him being merely a coffee boy.In May 2016, Papadopoulos revealed to Australian diplomat Alexander Downer that Russian officials were shopping around possible dirt on then-Democratic presidential nominee Hillary Clinton. Exactly how much Mr. Papadopoulos said that night at the Kensington Wine Rooms with the Australian, Alexander Downer, is unclear, the report states. But two months later, when leaked Democratic emails began appearing online, Australian officials passed the information about Mr. Papadopoulos to their American counterparts, according to four current and former American and foreign officials with direct knowledge of the Australians role. Papadopoulos pleaded guilty to lying to the F.B.I. and is now a cooperating witness with Special Counsel Robert Mueller s team.This isn t a presidency. It s a badly scripted reality TV show.Photo by Win McNamee/Getty Images. News December 31, 2017


True News

title text subject date
0 As U.S. budget fight looms, Republicans flip their fiscal script WASHINGTON (Reuters) - The head of a conservative Republican faction in the U.S. Congress, who voted this month for a huge expansion of the national debt to pay for tax cuts, called himself a “fiscal conservative” on Sunday and urged budget restraint in 2018. In keeping with a sharp pivot under way among Republicans, U.S. Representative Mark Meadows, speaking on CBS’ “Face the Nation,” drew a hard line on federal spending, which lawmakers are bracing to do battle over in January. When they return from the holidays on Wednesday, lawmakers will begin trying to pass a federal budget in a fight likely to be linked to other issues, such as immigration policy, even as the November congressional election campaigns approach in which Republicans will seek to keep control of Congress. President Donald Trump and his Republicans want a big budget increase in military spending, while Democrats also want proportional increases for non-defense “discretionary” spending on programs that support education, scientific research, infrastructure, public health and environmental protection. “The (Trump) administration has already been willing to say: ‘We’re going to increase non-defense discretionary spending ... by about 7 percent,’” Meadows, chairman of the small but influential House Freedom Caucus, said on the program. “Now, Democrats are saying that’s not enough, we need to give the government a pay raise of 10 to 11 percent. For a fiscal conservative, I don’t see where the rationale is. ... Eventually you run out of other people’s money,” he said. Meadows was among Republicans who voted in late December for their party’s debt-financed tax overhaul, which is expected to balloon the federal budget deficit and add about $1.5 trillion over 10 years to the $20 trillion national debt. “It’s interesting to hear Mark talk about fiscal responsibility,” Democratic U.S. Representative Joseph Crowley said on CBS. 
Crowley said the Republican tax bill would require the United States to borrow $1.5 trillion, to be paid off by future generations, to finance tax cuts for corporations and the rich. “This is one of the least ... fiscally responsible bills we’ve ever seen passed in the history of the House of Representatives. I think we’re going to be paying for this for many, many years to come,” Crowley said. Republicans insist the tax package, the biggest U.S. tax overhaul in more than 30 years, will boost the economy and job growth. House Speaker Paul Ryan, who also supported the tax bill, recently went further than Meadows, making clear in a radio interview that welfare or “entitlement reform,” as the party often calls it, would be a top Republican priority in 2018. In Republican parlance, “entitlement” programs mean food stamps, housing assistance, Medicare and Medicaid health insurance for the elderly, poor and disabled, as well as other programs created by Washington to assist the needy. Democrats seized on Ryan’s early December remarks, saying they showed Republicans would try to pay for their tax overhaul by seeking spending cuts for social programs. But the goals of House Republicans may have to take a back seat to the Senate, where the votes of some Democrats will be needed to approve a budget and prevent a government shutdown. Democrats will use their leverage in the Senate, which Republicans narrowly control, to defend both discretionary non-defense programs and social spending, while tackling the issue of the “Dreamers,” people brought illegally to the country as children. Trump in September put a March 2018 expiration date on the Deferred Action for Childhood Arrivals, or DACA, program, which protects the young immigrants from deportation and provides them with work permits. The president has said in recent Twitter messages he wants funding for his proposed Mexican border wall and other immigration law changes in exchange for agreeing to help the Dreamers. 
Representative Debbie Dingell told CBS she did not favor linking that issue to other policy objectives, such as wall funding. “We need to do DACA clean,” she said. On Wednesday, Trump aides will meet with congressional leaders to discuss those issues. That will be followed by a weekend of strategy sessions for Trump and Republican leaders on Jan. 6 and 7, the White House said. Trump was also scheduled to meet on Sunday with Florida Republican Governor Rick Scott, who wants more emergency aid. The House has passed an $81 billion aid package after hurricanes in Florida, Texas and Puerto Rico, and wildfires in California. The package far exceeded the $44 billion requested by the Trump administration. The Senate has not yet voted on the aid. politicsNews December 31, 2017
1 U.S. military to accept transgender recruits on Monday: Pentagon WASHINGTON (Reuters) - Transgender people will be allowed for the first time to enlist in the U.S. military starting on Monday as ordered by federal courts, the Pentagon said on Friday, after President Donald Trump’s administration decided not to appeal rulings that blocked his transgender ban. Two federal appeals courts, one in Washington and one in Virginia, last week rejected the administration’s request to put on hold orders by lower court judges requiring the military to begin accepting transgender recruits on Jan. 1. A Justice Department official said the administration will not challenge those rulings. “The Department of Defense has announced that it will be releasing an independent study of these issues in the coming weeks. So rather than litigate this interim appeal before that occurs, the administration has decided to wait for DOD’s study and will continue to defend the president’s lawful authority in District Court in the meantime,” the official said, speaking on condition of anonymity. In September, the Pentagon said it had created a panel of senior officials to study how to implement a directive by Trump to prohibit transgender individuals from serving. The Defense Department has until Feb. 21 to submit a plan to Trump. Lawyers representing currently-serving transgender service members and aspiring recruits said they had expected the administration to appeal the rulings to the conservative-majority Supreme Court, but were hoping that would not happen. Pentagon spokeswoman Heather Babb said in a statement: “As mandated by court order, the Department of Defense is prepared to begin accessing transgender applicants for military service Jan. 1. 
All applicants must meet all accession standards.” Jennifer Levi, a lawyer with gay, lesbian and transgender advocacy group GLAD, called the decision not to appeal “great news.” “I’m hoping it means the government has come to see that there is no way to justify a ban and that it’s not good for the military or our country,” Levi said. Both GLAD and the American Civil Liberties Union represent plaintiffs in the lawsuits filed against the administration. In a move that appealed to his hard-line conservative supporters, Trump announced in July that he would prohibit transgender people from serving in the military, reversing Democratic President Barack Obama’s policy of accepting them. Trump said on Twitter at the time that the military “cannot be burdened with the tremendous medical costs and disruption that transgender in the military would entail.” Four federal judges - in Baltimore, Washington, D.C., Seattle and Riverside, California - have issued rulings blocking Trump’s ban while legal challenges to the Republican president’s policy proceed. The judges said the ban would likely violate the right under the U.S. Constitution to equal protection under the law. The Pentagon on Dec. 8 issued guidelines to recruitment personnel in order to enlist transgender applicants by Jan. 1. The memo outlined medical requirements and specified how the applicants’ sex would be identified and even which undergarments they would wear. The Trump administration previously said in legal papers that the armed forces were not prepared to train thousands of personnel on the medical standards needed to process transgender applicants and might have to accept “some individuals who are not medically fit for service.” The Obama administration had set a deadline of July 1, 2017, to begin accepting transgender recruits. But Trump’s defense secretary, James Mattis, postponed that date to Jan. 1, 2018, which the president’s ban then put off indefinitely. 
Trump has taken other steps aimed at rolling back transgender rights. In October, his administration said a federal law banning gender-based workplace discrimination does not protect transgender employees, reversing another Obama-era position. In February, Trump rescinded guidance issued by the Obama administration saying that public schools should allow transgender students to use the restroom that corresponds to their gender identity. politicsNews December 29, 2017

Word Clouds & EDA

Starting with word clouds

Fake news word cloud from title

True news word cloud from title

Exploring the text with VADER sentiment analysis

To start, I explored using VADER sentiment analysis to derive features for my model to train on. VADER is a lexicon- and rule-based sentiment analysis tool that, given a piece of text, outputs four scores: compound, positivity, negativity, and neutrality. I expected True news to score higher on neutrality because news reports should be unbiased.
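As a rough sketch of how the four scores can become model features, the helpers below turn each text into a `[compound, pos, neg, neu]` row. It assumes the `vaderSentiment` package (NLTK also ships a VADER implementation), which may not be the one used for this project. The function names are made up for illustration.

```python
FEATURE_KEYS = ("compound", "pos", "neg", "neu")


def build_vader():
    """Create a VADER analyzer (assumes `pip install vaderSentiment`)."""
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer()


def sentiment_features(analyzer, text):
    """Return [compound, pos, neg, neu] for one piece of text."""
    scores = analyzer.polarity_scores(text)
    return [scores[key] for key in FEATURE_KEYS]


def featurize(analyzer, texts):
    """Score every text; each row is one article's feature vector."""
    return [sentiment_features(analyzer, text) for text in texts]
```

Calling `featurize(build_vader(), texts)` on a list of article bodies or titles would give the feature matrix to train on.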

Performing sentiment analysis on the full text yields some interesting results! True news receives a compound score of -1 far less often than Fake news, and its neutrality score is higher on average!

First Model

Training a RandomForestClassifier on only the sentiment scores for the full text of the article produces an accuracy of 0.692! This is well above the roughly 0.5 we would expect from random guessing on a 51/49 split.
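The accuracy figures in this post come from scoring a model on articles it was not trained on. Below is a minimal sketch of that hold-out evaluation, written for any object with `fit` and `predict` methods (scikit-learn's `RandomForestClassifier` has this interface). The 25% test fraction, the seed, and the name `holdout_accuracy` are assumptions for this example, not the project's actual settings.

```python
import random


def holdout_accuracy(model, X, y, test_fraction=0.25, seed=0):
    """Fit on a random training split and return accuracy on the rest."""
    if len(X) < 2:
        raise ValueError("need at least two samples to split")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    # Train only on the training rows, then predict the held-out rows.
    model.fit([X[i] for i in train_idx], [y[i] for i in train_idx])
    predictions = model.predict([X[i] for i in test_idx])

    correct = sum(int(p == y[i]) for p, i in zip(predictions, test_idx))
    return correct / n_test
```

With scikit-learn installed, `holdout_accuracy(RandomForestClassifier(), X, y)` would score a forest on the sentiment rows. The library's own `train_test_split` and `accuracy_score` do the same job.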

Sentiment analysis of Titles

It's a little hard to see the distributions because of the large spikes at the zero mark. However, far more True news articles have a compound score of 0, and True news also scores higher on neutrality, as anticipated. On negativity, Fake news scores higher on average, with far fewer articles scoring zero.

Removing Zeros:

With the large spikes removed, the underlying distributions are easier to see: Fake news has far more compound scores in the negative range, while True news still has higher neutrality scores, scores slightly more positive, and has fewer non-zero negative scores.
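One simple way to do this kind of filtering is to measure how large the zero spike is and then drop the exact zeros before plotting. The scores below are made up, and the helper names are hypothetical. The original plots may have been filtered differently.

```python
def zero_fraction(scores):
    """Share of scores that are exactly zero."""
    return sum(1 for s in scores if s == 0) / len(scores)


def drop_zeros(scores):
    """Keep only the non-zero scores, e.g. before plotting a histogram."""
    return [s for s in scores if s != 0]


# Made-up compound scores for four titles, for illustration only.
title_compound = [0.0, -0.5, 0.0, 0.3]
print(zero_fraction(title_compound))
print(drop_zeros(title_compound))
```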

Second Model

Again training a RandomForestClassifier, but this time on the sentiment scores of just the article's title, results in an impressive 0.763 accuracy!

Combining the sentiment of both the title and the full text of the article produces a 0.801 accuracy!


Training models on just the sentiment analysis of the articles is clearly effective, but in doing so we throw out the largest source of data in the corpus: the actual words.


Using tf-idf we can convert words to numbers. I chose it because it captures not just how often a word is used in a single document but also how distinctive that word is relative to the rest of the documents in the corpus.
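To make the idea concrete, here is a small hand-rolled sketch of one common textbook form of tf-idf: term frequency within the document times the log of the inverse document frequency. The two documents are made up. Library implementations such as scikit-learn's `TfidfVectorizer` smooth the idf term and normalize each row by default, so their numbers will differ from these.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for doc in documents for term in set(doc))

    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Two made-up documents, already split into tokens.
docs = [
    "trump said the senate will vote".split(),
    "trump tweeted the image".split(),
]
scores = tf_idf(docs)
print(scores[0]["trump"], scores[0]["senate"])
```

Words that appear in every document ("trump", "the") get a weight of zero, while words unique to one document ("senate", "image") get a positive weight.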

Stop words added: 'reuters', '21st', 'century', 'wire', '21wire', 'www', 'https', 'com', 'pic'

I chose these stop words because they were either part of links embedded into the article or a source of data leakage.
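A quick sketch of what removing these tokens looks like. The True news samples shown earlier open with a "(Reuters)" dateline, so leaving that token in would let a model identify the source rather than the content. The tokenizer here is a simplified stand-in made up for this example, not the one used in the project, and the input sentence is invented.

```python
import re

# The extra stop words listed above.
EXTRA_STOP_WORDS = {
    "reuters", "21st", "century", "wire", "21wire",
    "www", "https", "com", "pic",
}


def tokenize(text, stop_words=EXTRA_STOP_WORDS):
    """Lowercase, split on non-alphanumerics, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [tok for tok in tokens if tok not in stop_words]


print(tokenize("WASHINGTON (Reuters) - The Senate has not yet voted"))
# ['washington', 'the', 'senate', 'has', 'not', 'yet', 'voted']
```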

With the stop words out of the way it was time to vectorize the corpus.

Naive Bayes

Training a Naive Bayes model on the tf-idf vectors and returning the top 12 features for each class, we can see some interesting trends. Four of the top six features for Fake news are people's names (Trump, Clinton, Obama, Hillary), while True news seems to focus on larger groups of people (state, house, government).

Word Rank | Fake News | True News
1         | Trump     | said
2         | Clinton   | Trump
3         | Obama     | president
4         | people    | state
5         | president | house
6         | Hillary   | government
7         | just      | Washington
8         | said      | republican
9         | like      | united
10        | Donald    | states
11        | twitter   | north
12        | news      | new

The Naive Bayes model performed surprisingly well with:

0.930 Accuracy

Bi-grams (two words)

Changing the model's hyper-parameters to look at groups of two words together, people's full names appear. We can also see that True news draws on more credible sourcing, such as actual statements and quotes, while Fake news leans on images and YouTube for its sources.

Word Rank | Fake News       | True News
1         | Donald Trump    | United States
2         | Hillary Clinton | North Korea
3         | featured image  | white house
4         | white house     | Donald Trump
5         | President Trump | President Donald
6         | President Obama | Prime minister
7         | United States   | said statement
8         | getty images    | Islamic state
9         | fox news        | told reporters
10        | New York        | Trump said
11        | year old        | New York
12        | youtube watch   | Washington President

0.953 Accuracy
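For readers new to n-grams, this small sketch shows how bi-grams (and the tri-grams used in the next section) are formed from a token list. The `ngrams` helper is written just for this illustration. In scikit-learn the same effect comes from the vectorizer's `ngram_range` parameter, for example `(2, 2)` for bi-grams only. The exact setting used in this project is not stated, so treat that value as an assumption.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "president donald trump said".split()
print(ngrams(tokens, 2))
# ['president donald', 'donald trump', 'trump said']
print(ngrams(tokens, 3))
# ['president donald trump', 'donald trump said']
```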

Tri-Grams (three sequential words)

Exploring tri-grams we see an interesting shift: True news features far more people's names than Fake news, which again relies on screenshots and videos and, at the top of the list, a Twitter handle.

Word Rank | Fake News                    | True News
1         | Donald Trump realdonaldtrump | President Donald Trump
2         | black lives matter           | President Barack Obama
3         | New York Times               | Washington President Donald
4         | president United States      | white house said
5         | featured image video         | president elect Donald
6         | video screen capture         | elect Donald Trump
7         | image video screen           | President Vladimir Putin
8         | President Barack Obama       | Prime Minister Theresa
9         | featured image screenshot    | state Rex Tillerson
10        | President Donald Trump       | secretary state Rex
11        | New York City                | Donald Trump said
12        | image screen capture         | Russian President Vladimir

0.971 Accuracy

Random Forest

Exploring a random forest with n-gram = 1 (single words), we see 'said' as the most important feature when deciding between Fake and Real.

Feature    | Feature Importance
said       | 0.042
washington | 0.012
featured   | 0.010
image      | 0.010
minister   | 0.007

This RandomForest produced an incredible
0.979 Accuracy
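A ranking like the feature table above can be produced by pairing each feature name with its importance and sorting. The values below are copied from that table, and `rank_features` is a hypothetical helper name. With scikit-learn, the two lists would typically come from the vectorizer's `get_feature_names_out()` and the fitted forest's `feature_importances_` attribute.

```python
def rank_features(names, importances, k=5):
    """Pair feature names with importances and return the k largest."""
    pairs = sorted(zip(names, importances), key=lambda p: p[1], reverse=True)
    return pairs[:k]


# Values copied from the table above, deliberately shuffled.
names = ["image", "minister", "said", "featured", "washington"]
importances = [0.010, 0.007, 0.042, 0.010, 0.012]
top = rank_features(names, importances, k=3)
for name, score in top:
    print(f"{name:<12}{score:.3f}")
```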


There are trends in both fake and true news that make them distinguishable from each other: fake news relies on Twitter, images and videos far more often than actual news, while true news relies far more on facts and actual statements. Fake news only exists to divide us as people.

Further work

I would like to explore this dataset further and perform topic modeling to see what topics Fake and Real news each talk about.

I would also like to explore how well four-grams (groups of four words) would perform and what effect changing the threshold would have.

I feel as though this dataset could be cleaned more, and that data leakage may have been the culprit behind such high accuracies, so further cleaning may be necessary.
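One cheap way to probe for leakage is to test how well a single surface marker separates the classes with no model at all. If a rule like "contains 'reuters'" scores close to the trained models, they may be learning the source rather than the content. This is only a sketch: the texts below are made up, and `rule_accuracy` is a hypothetical helper, not part of the project.

```python
def rule_accuracy(texts, labels, marker, positive_label="true"):
    """Accuracy of the rule: 'contains marker' means positive_label."""
    correct = 0
    for text, label in zip(texts, labels):
        predicted_positive = marker.lower() in text.lower()
        correct += int(predicted_positive == (label == positive_label))
    return correct / len(labels)


# Made-up snippets, for illustration only.
texts = [
    "WASHINGTON (Reuters) - The Senate voted on Friday",
    "Trump tweeted again. Featured image via screenshot",
    "(Reuters) - Lawmakers met on Wednesday",
    "Watch the video below",
]
labels = ["true", "fake", "true", "fake"]
print(rule_accuracy(texts, labels, "reuters"))
```

Running the same check for other markers (a byline format, a photo credit, a URL fragment) would show which tokens still need cleaning.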


