A Jupyter Notebook implementing a parser for analysing HTML Articles
The parser was developed as a requirement for applying to the MSc in Information Studies programme at the University of Amsterdam.
The goal was to parse a given collection of articles and plot its Word Count Distribution.
It is developed almost entirely with modules from the Python standard library; the only third-party package is matplotlib, which is used for plotting.
The parser is built on Python's HTMLParser class.
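As an illustration of how a parser can be built on HTMLParser, here is a minimal sketch and not the notebook's actual code. It assumes Python 3 (where the class lives in html.parser) and that only visible text is wanted; the names ArticleTextParser, SKIPPED_TAGS and extract_text are hypothetical.

```python
from html.parser import HTMLParser


class ArticleTextParser(HTMLParser):
    """Hypothetical parser that collects the visible text of an HTML article."""

    # Assumption: the contents of these tags are not article text.
    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []       # text fragments in document order
        self._skip_depth = 0   # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside <script> or <style>.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

    def text(self):
        return " ".join(self.chunks)


def extract_text(html):
    """Return the visible text of one HTML document as a single string."""
    parser = ArticleTextParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()
```

Calling extract_text on the contents of each article file would then give one plain-text string per article, ready for word counting.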
The Counter class (from the collections module) is used for counting words, the string module for string manipulation, and matplotlib for creating the histogram.
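The way these three pieces fit together might look like the sketch below. It is an assumption-laden example rather than the notebook's implementation: it takes "word count distribution" to mean a histogram of the number of words per article, and the helper names tokenize, word_frequencies, article_lengths and plot_lengths are hypothetical.

```python
import string
from collections import Counter


def tokenize(text):
    """Lowercase the text, strip ASCII punctuation, and split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def word_frequencies(text):
    """Count how often each word occurs in one article."""
    return Counter(tokenize(text))


def article_lengths(texts):
    """Return the number of words in each article."""
    return [len(tokenize(text)) for text in texts]


def plot_lengths(lengths, bins=10):
    """Plot a histogram of per-article word counts (requires matplotlib)."""
    import matplotlib.pyplot as plt  # third-party; imported only when plotting

    plt.hist(lengths, bins=bins)
    plt.xlabel("Words per article")
    plt.ylabel("Number of articles")
    plt.title("Word count distribution")
    plt.show()
```

Note that removing punctuation this way also drops apostrophes, so "don't" is counted as "dont"; a different tokenisation rule may suit some collections better.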