Development of the process used in the Pundits Review website to recognise Premier League player & club entities in news articles - https://www.punditsreview.com/
Pundits Review scrapes and processes news articles about the Premier League in order to give players and teams a review score each week. Each Monday, the project collects articles, divides them into phrases, identifies the player or club being referred to and then predicts the sentiment of the phrase. See more on how it works here!
This repository shows the progression of the methods used to recognise the players being written about in a phrase within a football news article. The methods use a spaCy model to identify the syntactic dependencies of the words in a phrase. My final solution matches words that: a) appear as a proper noun, b) are longer than 3 characters, c) appear in key syntactic dependency positions, and d) are listed as a player or club identifier.
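As a rough illustration of the four matching rules, here is a minimal sketch in plain Python. It is not the code from the notebooks: the `Token` class stands in for a spaCy token (whose `pos_` and `dep_` attributes carry the same kind of labels), and the `KEY_DEPS` set and `IDENTIFIERS` lookup are hypothetical values chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Stand-in for a parsed token: surface text, part of speech, dependency label."""
    text: str
    pos: str
    dep: str


# Assumption: which dependency positions count as "key" is illustrative only.
KEY_DEPS = {"nsubj", "nsubjpass", "dobj", "pobj"}

# Hypothetical identifier lookup: lower-cased word -> player or club name.
IDENTIFIERS = {
    "salah": "Mohamed Salah",
    "arsenal": "Arsenal",
}


def match_entities(tokens):
    """Return the entities whose tokens satisfy all four rules."""
    matches = []
    for tok in tokens:
        if tok.pos != "PROPN":        # a) must be a proper noun
            continue
        if len(tok.text) <= 3:        # b) must be longer than 3 characters
            continue
        if tok.dep not in KEY_DEPS:   # c) must sit in a key dependency position
            continue
        entity = IDENTIFIERS.get(tok.text.lower())
        if entity is not None:        # d) must be a listed identifier
            matches.append(entity)
    return matches


# Hand-tagged example phrase: "Salah scored against Arsenal"
phrase = [
    Token("Salah", "PROPN", "nsubj"),
    Token("scored", "VERB", "ROOT"),
    Token("against", "ADP", "prep"),
    Token("Arsenal", "PROPN", "pobj"),
]
found = match_entities(phrase)
```

With a real spaCy pipeline, the same checks would read `token.pos_` and `token.dep_` from the parsed document instead of the hand-tagged tokens used here.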
Attempt 1 explores different ways to recognise a player entity using a spaCy model.
This notebook compares 10 different approaches to recognising player/club entities. The independent variable is the key syntactic dependency position.
Manually annotated (sentiment & player target) set of 500 rows of sample data taken from The Mirror - Match Reports
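A comparison of this kind, with the dependency position as the only thing that varies, could be scored against the annotated rows roughly as follows. This is a sketch under assumptions: the two approaches, the rows, and the `IDENTIFIERS` lookup are made up for illustration and are not taken from the notebook or the sample data.

```python
def predict(tokens, key_deps, identifiers):
    """Return the first listed entity found in one of the key dependency positions."""
    for text, dep in tokens:
        if dep in key_deps and text.lower() in identifiers:
            return identifiers[text.lower()]
    return None


def accuracy(rows, key_deps, identifiers):
    """Fraction of annotated rows where the predicted target equals the label."""
    correct = sum(
        predict(tokens, key_deps, identifiers) == target
        for tokens, target in rows
    )
    return correct / len(rows)


# Hypothetical identifier lookup and toy annotated rows: (tokens, target).
IDENTIFIERS = {"salah": "Mohamed Salah", "kane": "Harry Kane"}
ROWS = [
    ([("Salah", "nsubj"), ("scored", "ROOT")], "Mohamed Salah"),
    ([("pass", "nsubj"), ("found", "ROOT"), ("Kane", "dobj")], "Harry Kane"),
]

# Each approach differs only in which dependency positions it accepts.
APPROACHES = {
    "subject only": {"nsubj"},
    "subject or object": {"nsubj", "dobj"},
}

scores = {
    name: accuracy(ROWS, deps, IDENTIFIERS)
    for name, deps in APPROACHES.items()
}
```

Holding the rows and the identifier list fixed means any difference in `scores` comes from the choice of dependency positions alone.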