AugDocImg

Augmenting text documents with images

Goal

The goal of this project is to automatically provide illustrative images for a text document (an article, for instance).

The project has two steps: first, we find the topic of the text; then, we find relevant images in the ImageNet database.
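As a rough illustration of the first step, topic words can be ranked with tf-idf against a reference corpus. The sketch below is a minimal, self-contained version and not the code in `tfidf.py`: the function names `tokenize` and `top_keywords`, the tiny stoplist, and the smoothed idf formula are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stoplist (an assumption for this sketch);
# the repository ships fuller ones such as FoxStoplist.txt and SmartStoplist.txt.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "is", "are", "to"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_keywords(document, corpus, k=3):
    """Rank the words of `document` by tf-idf against a reference `corpus`."""
    doc_tokens = tokenize(document)
    tf = Counter(doc_tokens)
    corpus_tokens = [set(tokenize(d)) for d in corpus]
    n_docs = len(corpus)
    scores = {}
    for word, count in tf.items():
        # Document frequency: number of corpus texts containing the word.
        df = sum(1 for tokens in corpus_tokens if word in tokens)
        # Smoothed idf, so that words unseen in the corpus do not divide by zero.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        scores[word] = (count / len(doc_tokens)) * idf
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

The returned keywords could then be matched against synset names to pick candidate image categories; how that lookup is done in this project is not shown here.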

Improve the tf-idf: word frequency
Improve the tf-idf: plurals
Get feedback on the images: select the good synsets, then fetch images from their hyponyms
Clustering on colors: ResNet (convolutional network) => output vector of size 2028 => feed the vector into PCA (or t-SNE)
Present the results in a Jupyter notebook
Requirements
VirtualEnv
Default values

(transfer learning => dimensionality reduction => clustering => recommendation system)
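The dimensionality-reduction and clustering stages of this pipeline can be sketched with NumPy alone. This is only an illustration under stated assumptions: the feature vectors are taken as given (in the project they would come from a pretrained ResNet), PCA is computed through an SVD, and k-means is chosen here as the clustering method because the notes do not name one. The helper names `pca_reduce` and `kmeans` are hypothetical.

```python
import numpy as np


def pca_reduce(features, n_components=2):
    """Project row vectors onto their top principal components (via SVD)."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = np.random.default_rng(seed)
    # Initialise the centers on k distinct data points.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every center, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            # Leave a center where it is if no point was assigned to it.
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels
```

Given an array `features` with one row per image, `kmeans(pca_reduce(features, 2), k)` yields a cluster label per image; images sharing a label with one the user liked could then serve as recommendations.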

RAKE
texts
tfidfData
.gitignore
FoxStoplist.txt
README.md
Report_P3A (1).pdf
Report_P3A (2).pdf
SmartStoplist.txt
count_words.txt
creation_resnet_rpz.py
explore.py
imgdownloader.py
main.py
requirements.txt
showImages.py
synset_list.txt
tfidf.py
tfidf_word_freq.py
urlFinder.py