arimbr/theseus


Helsinki Metropolia University of Applied Sciences

Information Technology

Bachelor Thesis

Ari Bajo Rouvinen

Data Mining Thesis Topics in Finland

The Theseus open repository contains metadata on more than 100,000 theses published by the universities of applied sciences in Finland. In this project, data mining techniques were applied to the Theseus dataset to build a web application for exploring thesis topics and degree programmes, using Python and JavaScript libraries. Thesis topics were extracted from keywords assigned manually by the authors and from subjects curated by librarians. During the project, the suitability of these keywords and subjects for representing thesis topics was evaluated, and several data quality issues were identified. The data mining techniques covered collecting, exploring, cleaning, analysing, modelling and visualizing the data.

Particular attention was given to comparing dimensionality reduction and clustering techniques for visualizing similar degrees based on their topics. t-SNE proved to be a powerful method for visualizing degrees on a two-dimensional interactive map, and hierarchical clustering was found to be the most flexible technique for obtaining clusterings at multiple levels.
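
To make the idea of clusterings at multiple levels concrete, the following is a minimal pure-Python sketch of average-linkage agglomerative clustering over degrees described by topic sets. It is not the thesis implementation: the degree names, topic sets, the Jaccard distance and the helper names `jaccard_distance`, `agglomerate` and `cut` are all assumptions chosen for illustration, and a real pipeline would more likely rely on a library such as SciPy.

```python
from itertools import combinations

# Hypothetical degrees and topic sets, invented for illustration only.
degrees = {
    "Nursing": {"health care", "patients", "nursing"},
    "Physiotherapy": {"health care", "rehabilitation", "patients"},
    "Information Technology": {"software", "web applications", "programming"},
    "Media Engineering": {"software", "web applications", "design"},
}


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two topic sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def agglomerate(items):
    """Average-linkage agglomerative clustering.

    Returns one partition per level, from len(items) clusters down to 1.
    """
    def linkage(pair):
        # Mean pairwise distance between the members of two clusters.
        c1, c2 = pair
        dists = [jaccard_distance(items[x], items[y]) for x in c1 for y in c2]
        return sum(dists) / len(dists)

    clusters = [frozenset([name]) for name in items]
    levels = [list(clusters)]
    while len(clusters) > 1:
        # Merge the two closest clusters and record the new partition.
        c1, c2 = min(combinations(clusters, 2), key=linkage)
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        levels.append(list(clusters))
    return levels


def cut(levels, k):
    """Return the partition that has exactly k clusters."""
    return next(level for level in levels if len(level) == k)


levels = agglomerate(degrees)
for k in (4, 2, 1):
    print(k, [sorted(cluster) for cluster in cut(levels, k)])
```

Because every merge is recorded, a single run yields a partition for each number of clusters, which is the flexibility attributed to hierarchical clustering above. With this toy data, cutting at two clusters groups the two health-related degrees together and the two software-related degrees together.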

The application lets users discover popular topics for a degree or university and popular degrees for a set of topics, and explore related topics and related degrees. The work also serves as a foundation for future study of how topic popularity evolves over time and of how trending topics can be detected.
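
The queries described above can be sketched with simple counting. The example below is an assumption-laden illustration, not the application's code: the `theses` records, their field names and the functions `popular_topics`, `popular_degrees` and `related_topics` are hypothetical, and relatedness is approximated here by topic co-occurrence within the same thesis.

```python
from collections import Counter

# Hypothetical thesis records, invented for illustration only.
theses = [
    {"degree": "Nursing", "topics": ["patients", "health care", "elderly"]},
    {"degree": "Nursing", "topics": ["patients", "health care"]},
    {"degree": "Nursing", "topics": ["patients", "mental health"]},
    {"degree": "Information Technology", "topics": ["web applications", "data mining"]},
    {"degree": "Information Technology", "topics": ["data mining", "visualization"]},
]


def popular_topics(theses, degree, n=3):
    """Most frequent topics among the theses of one degree."""
    counts = Counter(
        topic
        for thesis in theses
        if thesis["degree"] == degree
        for topic in thesis["topics"]
    )
    return counts.most_common(n)


def popular_degrees(theses, topics, n=3):
    """Degrees with the most theses mentioning any of the given topics."""
    wanted = set(topics)
    counts = Counter(
        thesis["degree"] for thesis in theses if wanted & set(thesis["topics"])
    )
    return counts.most_common(n)


def related_topics(theses, topic, n=3):
    """Topics that most often appear in the same thesis as `topic`."""
    counts = Counter()
    for thesis in theses:
        topics = set(thesis["topics"])
        if topic in topics:
            counts.update(topics - {topic})
    return counts.most_common(n)


print(popular_topics(theses, "Nursing", n=1))
print(popular_degrees(theses, ["data mining"], n=1))
print(related_topics(theses, "patients", n=1))
```

Adding a publication year to each record and grouping the counts by year would be one way to approach the topic-popularity-over-time question raised as future work.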
