A repository for our Python course project, in which we build a model that forecasts Premier League match results using the Python programming language.
Our data come from three sources:
- For tweets related to PL teams, we use the Twitter module of snscrape, a library for scraping social media;
- For match information, we collect data from FBref using standard HTTP requests;
- To collect the SPI and NSxG indexes, we scrape FiveThirtyEight with a web driver, because the site is built in JavaScript and its content only appears once the page is rendered. At the end of this step, we store our DataFrames in CSV files and also in a SQL Server database.
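The storage step at the end of the data collection can be sketched as follows. This is a minimal illustration, not the project's code: the helper `store_dataframe`, the table name `spi_ratings` and the placeholder rows are hypothetical, and an in-memory SQLite database stands in for SQL Server so the example runs without a server.

```python
import sqlite3
import tempfile
from pathlib import Path

import pandas as pd


def store_dataframe(df: pd.DataFrame, name: str, out_dir: Path,
                    conn: sqlite3.Connection) -> Path:
    """Save df as <out_dir>/<name>.csv and as a SQL table called <name> (hypothetical helper)."""
    csv_path = out_dir / f"{name}.csv"
    df.to_csv(csv_path, index=False)
    # "replace" drops and recreates the table, so re-running the scraper overwrites old data
    df.to_sql(name, conn, if_exists="replace", index=False)
    return csv_path


# Placeholder rows with made-up values, standing in for the scraped SPI/NSxG table
ratings = pd.DataFrame({
    "team": ["Team A", "Team B"],
    "spi": [75.0, 70.0],
    "nsxg": [1.8, 1.5],
})

out_dir = Path(tempfile.mkdtemp())
conn = sqlite3.connect(":memory:")  # in-memory SQLite as a stand-in for SQL Server
csv_path = store_dataframe(ratings, "spi_ratings", out_dir, conn)
stored = pd.read_sql("SELECT * FROM spi_ratings", conn)
print(stored)
```

For SQL Server, `to_sql` would typically receive an SQLAlchemy engine (for example one built with `sqlalchemy.create_engine` and an `mssql+pyodbc` connection URL) instead of the SQLite connection.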
For data manipulation, our main libraries were pandas and numpy, which we used to adapt the datasets for visualization and estimation.
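As an illustration of this kind of preparation, the sketch below turns a per-team match log into model-ready features with pandas and numpy. The column names (`team`, `date`, `venue`, `result`, `gf`, `ga`), the rolling-average features and the 3-match window are assumptions for the example, not necessarily the features the project used; the rows are placeholder data.

```python
import numpy as np
import pandas as pd


def add_features(df: pd.DataFrame, window: int = 3) -> pd.DataFrame:
    """Add a win/no-win target and rolling averages of each team's previous matches."""
    df = df.sort_values(["team", "date"]).copy()
    # Binary target: 1 for a win, 0 for a draw or a loss
    df["target"] = (df["result"] == "W").astype(int)
    # League points earned in each match (3 for a win, 1 for a draw, 0 for a loss)
    df["points"] = np.select([df["result"] == "W", df["result"] == "D"], [3, 1], default=0)
    # Encode the venue as an integer code (Away=0, Home=1, alphabetical categories)
    df["venue_code"] = df["venue"].astype("category").cat.codes
    rolling_cols = []
    for col in ["gf", "ga", "points"]:
        new_col = f"{col}_rolling"
        # shift(1) keeps only matches played before the current one, avoiding leakage
        df[new_col] = df.groupby("team")[col].transform(
            lambda s: s.shift(1).rolling(window, min_periods=1).mean()
        )
        rolling_cols.append(new_col)
    # A team's first match has no history, so its rolling features are NaN
    return df.dropna(subset=rolling_cols)


# Placeholder match log: one row per team per match
matches = pd.DataFrame({
    "date": pd.to_datetime(["2022-08-05", "2022-08-13", "2022-08-20", "2022-08-27"] * 2),
    "team": ["Team A"] * 4 + ["Team B"] * 4,
    "venue": ["Away", "Home", "Away", "Home", "Home", "Away", "Home", "Away"],
    "result": ["W", "W", "D", "L", "W", "D", "L", "W"],
    "gf": [2, 4, 3, 2, 1, 2, 0, 2],
    "ga": [0, 2, 3, 3, 0, 2, 3, 1],
})

features = add_features(matches)
print(features[["team", "date", "gf_rolling", "points_rolling", "target"]])
```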
For visualization, the main libraries used were matplotlib.pyplot and seaborn. Below we present the resulting visualizations:
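A minimal sketch of one such chart, assuming a Series of `"W"`/`"D"`/`"L"` results (placeholder values here, not the project's data) and using matplotlib alone; the function name `plot_result_distribution` and the output file name are hypothetical.

```python
import tempfile
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # off-screen backend, so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def plot_result_distribution(results: pd.Series, path: Path) -> pd.Series:
    """Save a bar chart of win/draw/loss counts to `path` and return the counts."""
    counts = results.value_counts().reindex(["W", "D", "L"], fill_value=0)
    fig, ax = plt.subplots(figsize=(5, 3))
    ax.bar(list(counts.index), counts.to_numpy(), color=["tab:green", "tab:gray", "tab:red"])
    ax.set_xlabel("Result")
    ax.set_ylabel("Number of matches")
    ax.set_title("Distribution of match results")
    fig.tight_layout()
    fig.savefig(path, dpi=100)
    plt.close(fig)  # free the figure when many plots are generated in a loop
    return counts


# Placeholder results; in the project these would come from the FBref match data
results = pd.Series(["W", "W", "D", "L", "W", "D", "L", "W"])
out_path = Path(tempfile.mkdtemp()) / "result_distribution.png"
counts = plot_result_distribution(results, out_path)
print(counts)
```

seaborn plotting functions such as `sns.countplot` accept an `ax=` argument, so the same figure could be drawn with seaborn on the Axes created above.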
To train and test our model, we use the random forest classifier (`RandomForestClassifier`) from the scikit-learn library. Our best result was an accuracy of 67.90%.
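The training and evaluation loop could look roughly like the sketch below. It runs on synthetic placeholder data, so its accuracy says nothing about the 67.90% reported above; the feature names (`venue_code`, `spi_diff`, `gf_rolling`), the hyperparameters and the 80/20 chronological split are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


def train_and_evaluate(df: pd.DataFrame, features: list[str], target: str = "target",
                       train_frac: float = 0.8, seed: int = 1):
    """Fit a random forest on the earliest rows and report accuracy on the latest ones.

    Assumes df is already sorted by match date, so the split is chronological.
    """
    split = int(len(df) * train_frac)
    train, test = df.iloc[:split], df.iloc[split:]
    model = RandomForestClassifier(n_estimators=100, min_samples_split=10, random_state=seed)
    model.fit(train[features], train[target])
    preds = model.predict(test[features])
    return model, accuracy_score(test[target], preds)


# Synthetic placeholder data: rows are treated as if ordered by date
rng = np.random.default_rng(0)
n = 400
data = pd.DataFrame({
    "venue_code": rng.integers(0, 2, n),
    "spi_diff": rng.normal(0, 10, n),  # hypothetical: team SPI minus opponent SPI
    "gf_rolling": rng.normal(1.5, 0.5, n),
})
# The label depends noisily on spi_diff, so there is a signal to learn
data["target"] = (data["spi_diff"] + rng.normal(0, 5, n) > 0).astype(int)

model, acc = train_and_evaluate(data, ["venue_code", "spi_diff", "gf_rolling"])
print(f"Accuracy: {acc:.2%}")
```

A chronological split, rather than a random `train_test_split`, keeps future matches out of the training set, which matters when features are built from past results.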