Fake News Generator

Explanation

The code contained in this repo is an attempt to use modern techniques in artificial intelligence and machine learning to create fake news headlines. The essence of this process is a dataset of a million real headlines[1] and a Long Short Term Memory (LSTM) neural network trained at a character level on two subsets of this text[2]. There is an additional sentiment analysis component to speed up training time for the network and allow for some control over what kind of headlines are produced. The Vader[3] pre-trained sentiment analyzer was used for the purpose of seperating headlines based on positive or negative sentiment and seperate LSTM networks were trained on each of these sets. Producing a new headline starts with a randomly selected seed text from the existing headlines, and a generation of n characters based on that seed text. A diversity parameter controls the 'randomness' of next character chosen, based on conditional probability given the already generated text.

Dependencies

Python Packages:

Reportlab
Tensorflow
Keras
Numpy
Pandas
Sklearn

Implementation

See Natural Langauge Generation.ipynb and main.ipynb

Usage Tutorial

THIS PROGRAM BUILT AND TESTED WITH PYTHON 3 - MAC (OS X El Capitan)

https://youtu.be/TFCXgz-q3N0

In case the above process doesn't work, and you are familiar enough with Python to have installed the dependencies above, you can simply run quick_generate.py from terminal.

Included Functions

clean_sentence(sentence, shortstop=False)

INPUTS

-sentence = string -shortstop = boolean

OUTPUT

-string

This function cleans the input by removing numbers, non-english words/characters. If is True, then this function truncates the sentence after the last english stopword, this is done to try to reduce incomplete clauses caused by setting the length argument in get_headline too small.

get_headline(seed='The', sentiment = 'positive', lenght=50, diversity=0.2)

INPUTS

-sentence = string -shortstop = boolean

OUTPUT

-string

This is the central function of this program, it uses positive or negative_model.json and negative_model.h5 to recreate the trained neural network and then uses sample() to generate new text (character-by-character). The specifies the desired length (in characters) of the generated text, while controls whether sampling should be done from the model trained on negative headlines or positive headlines, and controls the degree of deviation from the estimated distribution (essentially, the randomness) in the generated text.

BingImageSearch(search)

INPUTS

-search = string

OUTPUT

-HTTPObject

This function queries the Bing Image API and returns a HTTPObject from the first result for the string.

save_image(http_response, filepath='image_result.jpg')

INPUTS

-http_response = HTTPObject -filepath = string (valid file location)

OUTPUT

-.jpg file

This function operates on the output of BingImageSearch() to convert the returned HTTPObject into an image file at

generate_headline_document(headline_text, headline_image, filename='new_headline.pdf')

INPUTS

-headline_text = string -headline_image = string (valid file location) -filename = string (valid file location)

OUTPUT

-.pdf file

This function uses the Reportlab library to generate a .pdf file with the <headline_text> string and <headline_image> image file at . This is mainly a helper function for generate_fake_news().

generate_fake_news(sentiment)

INPUTS

-sentiment = string ('positive','negative')

OUTPUT

-.pdf file

This function uses all the functions above to generate a .pdf file of a fake news report at new_headline.pdf (unless the default above is changed). A sample output can be seen in this repo at new_headline.pdf.

Files

generate_fake_news.ipynb:

Main notebook used to produce fake news headlines. Depends on the existence of all other files in the repo except for main.ipynb. Final output is a .pdf file with generated headline and corresponding image queried from Bing Image search. Includes script to install python dependencies in first cell of notebook.

positive_headlines_mini.csv

Data file for headlines with a Vader aggregate sentiment score >= 0. Data is comma-seperated with dimensions (554712,2). The 'mini' addition at the end is to indicate this is ~90% of the actual positive headline data used to train the model, this is to stay under the 25mb github upload limit. This does not affect prediction and since the model is not trained nearly to max performance, should minimally affect model training. In the future, would like to host all training data on Amazon S3, however a current attempt at this introduces too many dependencies.

negative_headlines.csv

Data file for headlines with a Vader aggregate sentiment score < 0. Data is comma-seperated with dimensions (). This is the full set of headlines with negative sentiment score that the model was trained on.

negative_model.json

This file contains the architecture (number of layers, cell types, layer sizes etc.) of the neural network trained on negative_headlines.csv. Can be used to initiate a model through the Keras package in Python

negative_model.h5

This file contains the actual connection weights for the neural network architecture in the corresponding .json file

positive_model.json

See above; architecture of neural network trained on positive_headlines.csv

positive_model.h5

See above; connection weights of neural network described in above .json

headline_image.jpg

Image returned from querying Bing for the generated headline

new_headline.pdf

Sample output generated from the negative_model

Profile_Picture.jpg

Placeholder image in case API is experiancing an intermittent failure

quick_generate.py

This file is meant to be self-contained and run from terminal. Has same output as main program and easier to run if you don't want to go through Jupyter notebook

References

[1] https://www.kaggle.com/therohk/million-headlines

[2] https://github.com/keras-team/keras/blob/master/examples/lstm_text_generation.py

[3] https://github.com/cjhutto/vaderSentiment

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
Natural Langauge Generation.ipynb		Natural Langauge Generation.ipynb
Profile_Picture.jpg		Profile_Picture.jpg
README.md		README.md
generate_fake_news.ipynb		generate_fake_news.ipynb
headline_image.jpg		headline_image.jpg
main.ipynb		main.ipynb
negative_headlines.csv		negative_headlines.csv
negative_model.h5		negative_model.h5
negative_model.json		negative_model.json
neural_network_dense.png		neural_network_dense.png
neural_network_single.png		neural_network_single.png
new_headline.pdf		new_headline.pdf
positive_headlines_mini .csv		positive_headlines_mini .csv
positive_model.h5		positive_model.h5
positive_model.json		positive_model.json
quick_generate.py		quick_generate.py
xy_plot.jpg		xy_plot.jpg
xy_plot2.jpeg		xy_plot2.jpeg

ShairozS/Fake_News_Generator

Folders and files

Latest commit

History

Repository files navigation

Fake News Generator

Explanation

Dependencies

Implementation

Usage Tutorial

Included Functions

Files

References

About

Resources

Stars

Watchers

Forks

Languages