This repository collects text data from Reddit to train a text classification model that predicts whether a Reddit post is cybersecurity-related.
The system searches for the following keywords in particular:
- 'vulnerability'
- 'cybersecurity'
- 'cyber-crime'
- 'cybercrime'
- 'cyber crime'
- 'CVE'
- 'CVEs'
- 'CVE-'
- 'cyber attack'
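
As a rough illustration of how the keyword list above could be applied to a post, here is a minimal sketch. The function names `matched_keywords` and `contains_keyword` are hypothetical, and case-insensitive substring matching is an assumption; the repository's actual matching logic may differ.

```python
# Keywords taken from the list above.
KEYWORDS = [
    "vulnerability",
    "cybersecurity",
    "cyber-crime",
    "cybercrime",
    "cyber crime",
    "CVE",
    "CVEs",
    "CVE-",
    "cyber attack",
]


def matched_keywords(text):
    """Return the keywords found in `text`, in list order.

    Assumption: matching is a case-insensitive substring search.
    """
    lowered = text.lower()
    return [kw for kw in KEYWORDS if kw.lower() in lowered]


def contains_keyword(text):
    """Return True if at least one keyword occurs in `text`."""
    return bool(matched_keywords(text))


if __name__ == "__main__":
    print(matched_keywords("A cyber attack exploited a vulnerability"))
    print(contains_keyword("I like cooking pasta"))
```

Substring matching is simple but can produce false positives (for example, 'CVE' also matches inside 'CVEs' and 'CVE-'), so a word-boundary match may be preferable depending on how the labels are used.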
The system uses the Pushshift API to download historical Reddit data. The API documentation can be found in the Pushshift GitHub repo.
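
To show what a keyword query against Pushshift might look like, the sketch below only builds the request URL and does not send it. The endpoint `https://api.pushshift.io/reddit/search/submission/` and the parameters `q`, `size`, `after`, and `before` are assumptions based on the publicly documented Pushshift API; check the Pushshift documentation for current availability and access rules. The helper name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed Pushshift submission search endpoint; verify against the API docs.
BASE_URL = "https://api.pushshift.io/reddit/search/submission/"


def build_search_url(keyword, size=100, after=None, before=None):
    """Build a search URL for one keyword.

    `after` and `before` are optional epoch timestamps (assumed parameter
    names) that bound the time window of the returned posts.
    """
    params = {"q": keyword, "size": size}
    if after is not None:
        params["after"] = after
    if before is not None:
        params["before"] = before
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Print the URL for one of the keywords; no network request is made.
    print(build_search_url("cyber attack", size=50))
```

The resulting URL can then be fetched with any HTTP client, paging through history by moving the `before` timestamp back after each batch.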