Repository for the article "Deep Learning Methods for Quotable Text", published on my blog.
The base directory contains the model and the code required to train it.
data/ contains the quotes and the GloVe embeddings.
scraping/ contains all the code for acquiring the relevant dataset.
reference-papers/ contains copies of the papers and the additional reading material referenced in the post.
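As a rough illustration of how the GloVe embeddings in data/ can be read, here is a minimal sketch. It assumes the standard GloVe text format (one token per line followed by space-separated floats). The function name `load_glove` and the file path in the usage note are hypothetical and are not taken from this repository's code.

```python
def load_glove(path, vocab=None):
    """Parse a GloVe-style text file into a {word: [float, ...]} dict.

    Assumption: each line is "<token> <v1> <v2> ... <vN>", separated by
    single spaces. If `vocab` is given, only those words are kept, which
    keeps memory use down for large embedding files.
    """
    embeddings = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip().split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            word, values = parts[0], parts[1:]
            if vocab is not None and word not in vocab:
                continue
            embeddings[word] = [float(value) for value in values]
    return embeddings
```

For example, `load_glove("data/glove.txt", vocab={"love", "time"})` would return vectors for just those two words, where `data/glove.txt` stands in for whichever embedding file you placed in data/.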
- LitQuotes.com - Over 2800 Literary Quotes website
- QuotationsPage.com - Quotes and Famous Sayings
- "You had me at hello: How phrasing affects memorability" by Cristian Danescu-Niculescu-Mizil, Justin Cheng, Jon Kleinberg and Lillian Lee. Proceedings of ACL (2012).
- "Echoes of Persuasion: The Effect of Euphony in Persuasive Communication" by Guerini, M., Gozde, O., & Strapparava, C. HLT-NAACL, pages 1483-1493. The Association for Computational Linguistics (2015).
- News Aggregator Dataset
- Project Gutenberg
keras
tensorflow (CUDA and tensorflow-gpu for GPU training)
nltk/spacy (for dataset preprocessing and sentence pairing)
BeautifulSoup (for scraping the websites for quotes)
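A quick way to check that the packages listed above are importable before running anything is sketched below. The helper name `missing_packages` is hypothetical, and the import name `bs4` assumes BeautifulSoup 4. The list above gives nltk and spacy as alternatives, so a missing one of the two may be acceptable.

```python
from importlib.util import find_spec

# Display name -> import name. "bs4" assumes BeautifulSoup 4 is the
# version in use; adjust if your environment differs.
REQUIRED_MODULES = {
    "keras": "keras",
    "tensorflow": "tensorflow",
    "nltk": "nltk",
    "spacy": "spacy",
    "BeautifulSoup": "bs4",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the display names of packages that cannot be found."""
    return [
        name
        for name, module in required.items()
        if find_spec(module) is None  # None means the module is not installed
    ]
```

Calling `missing_packages()` returns an empty list when everything is installed, or the names of whatever still needs to be installed.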
python litquotes_scraper.py
python quotationspage_scraper.py
cd litquotes/
copy /b *.txt ..\quotes.txt
cd ../quotationspage
copy /b ..\quotes.txt + *.txt ..\quotes.txt
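The `copy /b` commands above are specific to the Windows command prompt. A cross-platform alternative might look like the sketch below, which concatenates every .txt file from the two scraper output folders into a single quotes.txt. The function name `merge_quotes` is hypothetical, UTF-8 encoding is an assumption, and unlike a binary copy it adds a newline after any file that does not end with one.

```python
from pathlib import Path


def merge_quotes(source_dirs, output_path):
    """Concatenate all .txt files in `source_dirs` into `output_path`.

    Returns the number of files merged. Files are read as UTF-8 (an
    assumption about how the scrapers write their output).
    """
    output_path = Path(output_path)
    merged = 0
    with output_path.open("w", encoding="utf-8") as out:
        for directory in source_dirs:
            # Sort for a deterministic order across platforms.
            for txt_file in sorted(Path(directory).glob("*.txt")):
                if txt_file.resolve() == output_path.resolve():
                    continue  # never read the file we are writing
                text = txt_file.read_text(encoding="utf-8")
                out.write(text)
                if text and not text.endswith("\n"):
                    out.write("\n")  # keep quotes from different files apart
                merged += 1
    return merged
```

Run from the directory that contains both folders, `merge_quotes(["litquotes", "quotationspage"], "quotes.txt")` writes the combined file in one pass, so quotes from both sites end up in the same output.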
Configure the parameters in configuration.py, then run:
python main.py