bbrause/subrosa

SUB ROSA

subtitle-based film similarities

Do similar films use similar language? SUB ROSA addresses this question by letting users examine movies for speech-related features. These features are extracted from subtitle data using methods from natural language processing, stylometry, and information retrieval.
For detailed information about these methods, please read this paper.
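To give a rough feel for what comparing films by their language can mean, here is a minimal sketch that scores two subtitle texts by the cosine similarity of their word counts. It is not SUB ROSA's actual feature extraction, which is described in the paper; the function names `tokenize` and `cosine_similarity` and the three subtitle snippets are invented for illustration, and only the Python standard library is used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only words that occur in both texts contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical subtitle snippets, invented for illustration only.
subtitles = {
    "film_a": "We have to get to the ship before the engines fail.",
    "film_b": "Get to the ship! The engines will fail any minute.",
    "film_c": "I never told you how much I loved that summer by the lake.",
}

for other in ("film_b", "film_c"):
    score = cosine_similarity(subtitles["film_a"], subtitles[other])
    print(f"film_a vs {other}: {score:.2f}")
```

In this toy data, film_a shares far more vocabulary with film_b than with film_c, so the first score comes out higher. Raw word counts are only the simplest possible feature; a real comparison would likely weight or normalize them.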

This project was developed by Jan Luhmann as part of the course “Drama Mining und Film-Analyse” (summer semester 2019) under the supervision of Manuel Burghardt and Jochen Tiepmar at the University of Leipzig.

Subtitle data was kindly provided by the team of OpenSubtitles.


Installation

  1. Make sure you have Python 3 installed, then install the dependencies using pip:
     pip install Flask numpy scikit-learn
  2. Clone this repository:
     git clone https://github.com/bbrause/subrosa.git
  3. Move to the repository folder and start the app:
     cd subrosa
     python3 app/app.py
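
If the app fails to start, it may help to confirm that the dependencies are visible to the Python interpreter you are using. The following is a small optional check and not part of the repository; the helper name `missing_modules` is made up for this sketch. It relies only on the standard library and on the import names of the three pip packages listed above.

```python
import importlib.util

# Import names for the pip packages listed above; note that the
# scikit-learn package is imported as "sklearn".
REQUIRED_MODULES = ["flask", "numpy", "sklearn"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All dependencies are importable.")
```

Run it with the same `python3` that you use to start the app, since packages installed for a different interpreter or virtual environment will not be found.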