Analysing Document Content Flow using Kohonen Networks

Intra-document content flow can be seen as the way in which the paragraphs of a document are linked: whether they are sufficiently related to one another and whether they appear in the appropriate part of the document. Current research largely deals with finding related documents in a corpus but pays little attention to the quality of the individual documents. Additional document-led methodologies may provide more granular insight into the documents within a collection and could have use cases in education or on the web. This dissertation uses Kohonen networks, a type of neural network in which each node's position within the feature space is given by its weight vector, which is adjusted as the network is trained.

Often termed topographical feature maps, Kohonen networks can be visualised in two-dimensional space even when the feature space has far more dimensions. The dissertation also reviews current state-of-the-art natural language processing techniques and current use cases of Kohonen networks.
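To make the idea concrete, the following is a minimal NumPy sketch of how a Kohonen network can be trained on a rectangular two-dimensional grid. It is not the code in `RSOM.py` or `DocumentRSOM.py`; the function names `train_som` and `best_matching_unit`, the grid size, and the linearly decaying learning rate and neighbourhood width are all assumptions chosen for illustration.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    dists = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dists), dists.shape)


def train_som(data, rows=4, cols=4, epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small Kohonen map; returns weights of shape (rows, cols, dim).

    Hypothetical helper: hyperparameters and decay schedule are illustrative.
    """
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    weights = rng.random((rows, cols, dim))  # random start in [0, 1)

    # Grid coordinates of every node, shape (rows, cols, 2).
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )

    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                 # learning rate decays to 0
            sigma = sigma0 * (1.0 - frac) + 1e-3    # neighbourhood shrinks
            bmu = best_matching_unit(weights, x)
            # Gaussian neighbourhood measured on the 2-D grid, not in feature space.
            grid_dist2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Pull each node's weight vector towards x, scaled by its neighbourhood value.
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights
```

In a document setting, each row of `data` would be a feature vector for one paragraph (for example a TF-IDF or embedding vector, which is an assumption here), and calling `best_matching_unit` on each paragraph in reading order would give its position on the two-dimensional map.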

This project combines three elements: web interfaces to display the content flow detected by the Kohonen network; GPU computing, in an attempt to reduce the training time of the Kohonen networks; and desktop computing to evaluate measures that may suit future work on intra-document content flow, such as the mean distances, kurtosis and number of node jumps.
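The measures named above are not defined in this README, so the sketch below shows only one plausible reading of them. It assumes each paragraph has already been mapped to a grid position on the trained network, takes "distance" to be the Euclidean grid distance between consecutive paragraphs, counts a "node jump" whenever consecutive paragraphs land on different nodes, and uses excess kurtosis of those distances. The function name `flow_measures` is hypothetical.

```python
import numpy as np


def flow_measures(bmu_path):
    """Summarise a path of (row, col) grid positions, one per paragraph in reading order.

    Assumed definitions (not taken from the project code):
      mean_distance - mean grid distance between consecutive paragraphs
      node_jumps    - number of consecutive pairs mapped to different nodes
      kurtosis      - excess kurtosis of the step distances (population form)
    Requires at least two paragraphs.
    """
    path = np.asarray(bmu_path, dtype=float)
    # Euclidean distance on the grid between each paragraph and the next.
    steps = np.linalg.norm(np.diff(path, axis=0), axis=1)

    mean_distance = float(steps.mean())
    node_jumps = int(np.count_nonzero(steps))

    centred = steps - steps.mean()
    var = float(np.mean(centred ** 2))
    # Kurtosis is undefined when every step has the same length.
    kurtosis = float(np.mean(centred ** 4) / var ** 2 - 3.0) if var > 0 else float("nan")

    return {
        "mean_distance": mean_distance,
        "node_jumps": node_jumps,
        "kurtosis": kurtosis,
    }
```

Under these assumptions, a document whose paragraphs stay on neighbouring nodes gives a small mean distance and few jumps, while a document that repeatedly moves across the map gives larger values.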
