This repository contains the following files related to my submission for the Data Mining assignment:

  • The "Report.pdf" file is the assignment report. It describes the work done in the different stages of this project.
  • The "Notebook" folder contains a Jupyter notebook that covers the most important steps of the project, with code, results, and explanations. The "Other Notebooks" folder contains Jupyter notebooks with additional analysis of the data.
  • The "Web crawlers" folder contains the Python scripts used for web crawling.
  • The "Stoage-in-database scripts" folder contains the Python scripts used to store crawled data in a MySQL database.


To run the scripts in this repository or the Jupyter notebook provided, you need some data files. These files were uploaded to Dropbox because some of them are larger than GitHub allows.

So to be able to run the scripts and the notebook, download the data from this link:

and then make sure that the file paths in the scripts and in the notebook point to where the files are located on your system. After that, you can run all scripts without a problem.

All scripts and the notebook were run and tested on macOS. They should run normally on Linux, but some modifications might be needed to run them on Windows.
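
As a minimal sketch of one way to keep the data file paths in a single place and portable across macOS, Linux, and Windows, the downloaded folder can be resolved once with `pathlib`. The environment variable `DM_DATA_DIR`, the default folder name `data-mining-data`, and the helper `data_path` are hypothetical and are not part of the scripts in this repository:

```python
import os
from pathlib import Path

# Hypothetical setting: where the downloaded data folder lives.
# Neither the variable name nor the default folder comes from this repository.
DATA_DIR = Path(os.environ.get("DM_DATA_DIR", Path.home() / "data-mining-data"))


def data_path(filename, base_dir=DATA_DIR):
    """Return the full path of a data file, failing early if it is missing."""
    path = Path(base_dir) / filename
    if not path.exists():
        raise FileNotFoundError(f"Expected data file not found: {path}")
    return path
```

A script or notebook cell could then call `data_path("some_file.csv")` (a placeholder file name) instead of hard-coding an absolute path. Because `pathlib` builds the path with the separator of the current operating system, the same call works unchanged on Windows.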

If you encounter any error when running the scripts, please contact me by email:

The YouTube Video of the Assignment

You can watch the presentation video of this assignment by following this link:


Ammar Alyousfi's submission for the Data Mining assignment (UM, June 2019)





