cascadePy is a Python toolkit developed by the Centre for the Analysis of Social Media (CASM) Technology LLP in collaboration with the Global Initiative against Transnational Organised Crime (GITOC).
cascadePy combines a number of NLP, information-extraction and web-collection methods to provide a set of tools primarily for use in open-source intelligence (OSINT) efforts against the illicit online wildlife trade.
The intended use of cascadePy is to discover, characterise and expand the vernacular used by those complicit in the illicit wildlife trade, and identify the places they advertise on the web.
This work expands on the original work, which can be found here.
The installation instructions for the toolkit can be found below and a brief summary of each module can be found in the accompanying Wiki.
If you intend to use this toolkit, please use the following citation:
Pay, Jack Frederick, 2020. The Corpus Expansion Toolkit: finding what we want on the web (Doctoral thesis, University of Sussex).
In bibtex:
@phdthesis{pay2020corpusexpansion,
title = {The Corpus Expansion Toolkit: finding what we want on the web},
author = {Jack Frederick Pay},
year = {2020},
school = {University of Sussex},
url = {http://sro.sussex.ac.uk/id/eprint/93062/},
}
- It is recommended that your Python environment is >=3.8
- It is also recommended to use a data-science-focused Python environment, such as Anaconda.
- Clone the repository to your local machine
- Run:
python setup.py install
- Install any spaCy models you require. For example, for English run the following command:
python -m spacy download en
(From spaCy v3 onwards the en shortcut is no longer available; use a full model name such as en_core_web_sm instead.)
- Follow the instructions below to install the Surprising Phrase Detector (SFPD).
The SFPD is described in: Robertson, Andrew David, 2019. Characterising semantically coherent classes of text through feature discovery (Doctoral thesis, University of Sussex).
- Clone the repository found here (citing where necessary).
- Follow the necessary installation instructions.
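As a quick sanity check after installation, the sketch below reports whether the running Python meets the recommended 3.8 and whether the named packages can be found. check_environment and its defaults are hypothetical helpers, not part of cascadePy. spaCy models installed under their full name are importable packages, so you can also pass a model name such as en_core_web_sm.

```python
import importlib.util
import sys

# Hypothetical helper: the minimum version mirrors the ">=3.8" recommendation above.
MIN_PYTHON = (3, 8)


def check_environment(required_packages=("spacy",)):
    """Return a list of human-readable problems with the current environment."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        problems.append(
            "Python %d.%d+ recommended, found %d.%d"
            % (MIN_PYTHON + tuple(sys.version_info[:2]))
        )
    for name in required_packages:
        # find_spec returns None when the top-level package cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append("missing package: %s" % name)
    return problems


if __name__ == "__main__":
    issues = check_environment(("spacy", "en_core_web_sm"))
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Environment looks OK")
```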
The toolkit is primarily a library (programming API) that others can use to build their own corpus expansion pipelines and methodologies. A brief breakdown of each module can be found in the accompanying Wiki.
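To illustrate the kind of pipeline the toolkit is meant to support, here is a minimal, hypothetical sketch of an iterative corpus expansion loop in plain Python. It does not use cascadePy's API and is not the SFPD algorithm. As a stand-in scoring method it ranks terms by an add-one-smoothed log ratio of their relative frequency in the documents matching the current terms versus a reference corpus. All names (tokenize, select_documents, surprising_terms, expand, STOPWORDS) and the toy data are invented for illustration.

```python
import math
import re
from collections import Counter
from typing import AbstractSet, Iterable, List, Set

# Hypothetical, deliberately tiny stopword list; a real pipeline would use a proper one.
STOPWORDS = frozenset({"a", "an", "and", "for", "in", "is", "of", "the", "to"})
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text: str) -> List[str]:
    """Lower-case, keep alphabetic runs and drop stopwords (naive on purpose)."""
    return [t for t in TOKEN_RE.findall(text.lower()) if t not in STOPWORDS]


def select_documents(docs: Iterable[str], terms: AbstractSet[str]) -> List[str]:
    """Keep the documents that mention at least one current term."""
    return [d for d in docs if terms & set(tokenize(d))]


def surprising_terms(target: List[str], reference: List[str], k: int = 5,
                     exclude: AbstractSet[str] = frozenset()) -> List[str]:
    """Rank target terms by add-one-smoothed log ratio of relative frequency
    (target vs reference); ties are broken alphabetically."""
    t_counts = Counter(tok for d in target for tok in tokenize(d))
    r_counts = Counter(tok for d in reference for tok in tokenize(d))
    vocab = len(set(t_counts) | set(r_counts)) or 1
    t_total = sum(t_counts.values())
    r_total = sum(r_counts.values())

    def score(term: str) -> float:
        p_target = (t_counts[term] + 1) / (t_total + vocab)
        p_ref = (r_counts[term] + 1) / (r_total + vocab)
        return math.log(p_target / p_ref)

    candidates = [w for w in t_counts if w not in exclude]
    return sorted(candidates, key=lambda w: (-score(w), w))[:k]


def expand(docs: List[str], reference: List[str], seeds: Set[str],
           rounds: int = 2, k: int = 3) -> Set[str]:
    """Grow the seed set by up to k new terms per round."""
    terms = set(seeds)
    for _ in range(rounds):
        target = select_documents(docs, terms)
        if not target:
            break  # no document mentions any current term
        new_terms = surprising_terms(target, reference, k=k, exclude=terms)
        if not new_terms:
            break  # no unseen candidate terms left
        terms.update(new_terms)
    return terms


# Toy usage with invented data.
docs = [
    "ivory carving for sale white gold",
    "white gold bangle ivory shipping worldwide",
    "weather today is sunny",
]
reference = ["weather today is sunny", "shipping costs today", "gold prices today"]
print(sorted(expand(docs, reference, {"ivory"})))
# ['bangle', 'carving', 'gold', 'ivory', 'sale', 'white', 'worldwide']
```

The frequency-ratio score is only a placeholder chosen for brevity; in a real pipeline this step is where a more principled term-discovery method would plug in.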
Please feel free to raise any issues you find when using this toolkit, open pull requests or start discussion threads.
Neither CASM LLP nor GITOC accepts any liability for misuse of any of the tools provided in this library.
