The aim of this paper is to present a possible solution to the Sexual Predator Identification problem, developed as part of the Text Analysis and Retrieval course at the Faculty of Electrical Engineering and Computing in Zagreb. The problem was originally presented at the PAN 2012 competition, where the task was divided into two parts:
- identifying the predators among all the users, and
- identifying the most distinctive features of the predators’ bad behavior by singling out the incriminating lines.
We approached both tasks using machine learning, specifically ensemble learning. Our ensemble model uses lexical and behavioral features extracted from a data set of online chat conversations.
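To make the idea of combining lexical and behavioral features in an ensemble concrete, the following is a minimal sketch rather than the model used in this work. It assumes a hard-voting (majority-vote) ensemble; the feature choices, the keyword set, the thresholds, and all function names (`lexical_features`, `behavioral_features`, `majority_vote`, `predict`) are hypothetical and chosen only for illustration. In practice each voter would be a classifier trained on labeled conversations, not a hand-written rule.

```python
import re
from collections import Counter


def lexical_features(lines):
    """Bag-of-words counts over all lines written by one user."""
    tokens = re.findall(r"[a-z']+", " ".join(lines).lower())
    return Counter(tokens)


def behavioral_features(lines):
    """Per-user behavior statistics (illustrative choices, not from the paper)."""
    n = len(lines)
    if n == 0:
        return {"num_lines": 0, "avg_line_length": 0.0, "question_ratio": 0.0}
    return {
        "num_lines": n,
        "avg_line_length": sum(len(line.split()) for line in lines) / n,
        "question_ratio": sum(line.strip().endswith("?") for line in lines) / n,
    }


def majority_vote(votes):
    """Hard-voting ensemble: predict 1 if more than half of the voters say 1."""
    return int(2 * sum(votes) > len(votes))


# Hypothetical stand-ins for trained base classifiers: each voter maps the
# two feature views of a user to a 0/1 vote. Keywords and thresholds are
# assumptions made for this sketch.
VOTERS = [
    lambda lex, beh: int(lex["secret"] + lex["alone"] >= 1),
    lambda lex, beh: int(beh["question_ratio"] > 0.5),
    lambda lex, beh: int(beh["num_lines"] >= 3),
]


def predict(lines, voters=VOTERS):
    """Label one user (1 = flagged, 0 = not flagged) by majority vote."""
    lex = lexical_features(lines)
    beh = behavioral_features(lines)
    return majority_vote([voter(lex, beh) for voter in voters])
```

Calling `predict` with the list of lines written by a single user returns that user's label. Keeping the two feature extractors separate from the voters means a base classifier can be swapped or added without changing how the votes are combined.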