Moneypenny is a library for normalising and handling lists of URLs. It was originally built to clean and generate disavow files for use with Google.
For example, you may have a file containing a list of URLs, or a mix of URLs and 'domain:' entries (i.e. a disavow file). If it was aggregated from various sources, you may want to remove duplicates and superfluous entries.
First convert the file to a string and parse out the URL and 'domain:' entries, then work on the resulting 'urls' or 'domains' list as you see fit.
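The parse-and-dedupe step described above could look roughly like the sketch below. It is a standalone illustration and does not call Moneypenny itself: `split_entries` is a hypothetical helper, and the rules it applies (skip blank lines and '#' comments, lower-case domains, keep first occurrences) are assumptions.

```python
def split_entries(text):
    """Split disavow-style text into de-duplicated URL and domain lists.

    Hypothetical helper for illustration; not part of Moneypenny's API.
    """
    urls, domains = [], []
    seen = set()
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and '#' comments.
        if not line or line.startswith("#"):
            continue
        if line.lower().startswith("domain:"):
            # Assumption: domains are compared case-insensitively.
            entry = line[len("domain:"):].strip().lower()
            target = domains
        else:
            entry = line
            target = urls
        key = (target is domains, entry)
        if key not in seen:  # keep only the first occurrence
            seen.add(key)
            target.append(entry)
    return urls, domains


sample = """# collected from two tools
http://spam.example/page-1
domain:Bad.example
http://spam.example/page-1
domain:bad.example
"""
urls, domains = split_entries(sample)
print(urls)
print(domains)
```

Keeping the two lists separate makes it easy to post-process URLs and domains independently.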
Moneypenny currently handles two tasks: creating or modifying a disavow file (keeping comments in their original place), and testing a disavow file against a separate list of URLs to show which of them would or would not be disavowed if the file were applied.
Simply install with pip:
pip install moneypenny
To create a disavow file, or to prepare the entries for modifying an existing one, use extract_file_contents() to convert your file to a string, then pass that string to disavow_file_to_dict(). This accepts an optional domain_limit argument, for when you want to disavow all links originating from any domain whose link count exceeds your limit.
The output gives summary statistics on the number of links and domains entered and disavowed, along with 'domain_entries' (the new domains produced by applying a domain limit, plus the domains from the original disavow file) and 'link_entries' (the individual links to be disavowed).
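To make the idea of a domain limit concrete, here is a minimal standalone sketch. It is not Moneypenny's implementation: `apply_domain_limit` is a hypothetical helper, and it assumes that links are grouped by hostname and that "exceeds" means strictly more links than the limit.

```python
from collections import Counter
from urllib.parse import urlparse


def apply_domain_limit(links, domain_limit):
    """Promote hosts with more than `domain_limit` links to domain entries.

    Hypothetical helper for illustration; not part of Moneypenny's API.
    """
    hosts = [urlparse(link).hostname or "" for link in links]
    counts = Counter(hosts)
    # Assumption: a host is promoted only when its count is strictly above the limit.
    over_limit = {host for host, n in counts.items() if host and n > domain_limit}
    return {
        "domain_entries": sorted(over_limit),
        # Links from promoted hosts are covered by the domain entry, so drop them.
        "link_entries": [
            link for link, host in zip(links, hosts) if host not in over_limit
        ],
    }


links = [
    "http://spam.example/a",
    "http://spam.example/b",
    "http://spam.example/c",
    "http://ok.example/post",
]
result = apply_domain_limit(links, domain_limit=2)
print(result["domain_entries"])
print(result["link_entries"])
```

The result reuses the 'domain_entries' and 'link_entries' key names from the description above purely for readability; the real output also carries summary statistics.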
To modify your existing file, pass your original file to extract_file_contents() and use the resulting string as the first parameter of the function that rewrites the file, with the dictionary output of disavow_file_to_dict() as the second parameter. This function maintains the order (and comments) of your original disavow file.
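One way to picture "maintaining the order and comments" is the sketch below. It is only an assumption about the behaviour, not Moneypenny's code: `rewrite_preserving_layout` is a hypothetical helper that keeps comments and surviving entries where they were, drops entries that are no longer wanted, and appends new entries at the end.

```python
def rewrite_preserving_layout(original_text, entries):
    """Rebuild a disavow file, leaving comments and kept entries in place.

    Hypothetical helper for illustration; not part of Moneypenny's API.
    `entries` is assumed to have 'domain_entries' and 'link_entries' lists.
    """
    wanted = ["domain:" + d for d in entries["domain_entries"]]
    wanted += list(entries["link_entries"])
    remaining = list(wanted)
    out = []
    for raw in original_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            out.append(raw)          # comments and blank lines stay put
        elif line in remaining:
            out.append(line)         # entry still wanted: keep its position
            remaining.remove(line)
        # Otherwise the entry is superseded or a duplicate, so it is dropped.
    out.extend(remaining)            # new entries go at the end
    return "\n".join(out) + "\n"


original = (
    "# spam network\n"
    "http://spam.example/a\n"
    "http://spam.example/b\n"
    "# one-off\n"
    "http://ok.example/post\n"
)
entries = {
    "domain_entries": ["spam.example"],
    "link_entries": ["http://ok.example/post"],
}
rewritten = rewrite_preserving_layout(original, entries)
print(rewritten)
```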
To test an existing disavow file against a file containing a list of URLs, call the testing function with your disavow file as the first parameter and your URLs file as the second. The output is a dictionary whose most relevant keys are 'disavowed' and 'non_disavowed'; the remaining keys are statistics summarising the input and output files.
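The check described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not Moneypenny's implementation: `check_urls` is a hypothetical helper, a 'domain:' entry is assumed to cover the host and its subdomains, and individual links are assumed to match by exact string.

```python
from urllib.parse import urlparse


def check_urls(disavow_text, urls):
    """Sort `urls` into those a disavow file would and would not cover.

    Hypothetical helper for illustration; not part of Moneypenny's API.
    """
    domains, links = set(), set()
    for raw in disavow_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        if line.lower().startswith("domain:"):
            domains.add(line[len("domain:"):].strip().lower())
        else:
            links.add(line)

    def is_disavowed(url):
        host = (urlparse(url).hostname or "").lower()
        # Assumption: a domain entry also covers its subdomains.
        covered = any(host == d or host.endswith("." + d) for d in domains)
        return covered or url in links

    result = {"disavowed": [], "non_disavowed": []}
    for url in urls:
        key = "disavowed" if is_disavowed(url) else "non_disavowed"
        result[key].append(url)
    return result


disavow = "# bad neighbourhood\ndomain:spam.example\nhttp://mixed.example/bad-page\n"
to_test = [
    "http://www.spam.example/x",
    "http://mixed.example/bad-page",
    "http://mixed.example/good-page",
]
report = check_urls(disavow, to_test)
print(report["disavowed"])
print(report["non_disavowed"])
```

Only the 'disavowed' and 'non_disavowed' keys are reproduced here; the summary statistics mentioned above are left out of the sketch.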
- To do: port over functionality from our older code for parsing files from various sources (Majestic, Kerboo, LinkResearchTools).
Moneypenny disavows secret agents; we disavow links … geddit?
See CONTRIBUTING file.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License.
See LICENSE file.