Disavowing links for Queen and Country
Python
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
moneypenny
.gitignore
CONTRIBUTING
LICENSE
NOTICE
README.md
requirements.txt
setup.cfg
setup.py

README.md

Introduction

Moneypenny is a library for normalising and handling lists of URLs. It was originally built for the purposes of cleaning and generating disavow files for use with Google.

For example, you may have a file containing a list of URLs or a mix of URLs and 'domain:' entries (i.e. a disavow file), but having been aggregated from various sources you may want to remove duplicates and superfluous entries:

First convert it to a string and parse out the URL and 'domain:' entries using:

import_from_file('<your_filename>.txt')

Then call:

normalize_and_dedupe(<import_from_file_output>) 

On the 'urls' or 'domains' list output as you see fit.

Moneypenny currently handles the creation/modification of a disavow file (including maintaining comments in their original place) and the testing of a disavow file against a separate list of URLs, showing which of these would be disavowed or not, were the disavow file to be applied.

Using Moneypenny

Simply install with pip:

pip install moneypenny

To create / modify an existing disavow file,

First call:

extract_file_contents('<your_filename>.txt')

To convert your file to a string, then pass that to:

disavow_file_to_dict(<extract_file_contents_output>)

With an optional argument for domain_limit, in case you want to disavow all links originating from a certain domain that exceeds your limit.

For example, ‘www.example.com/a/spam’, ‘www.example.com/#spam’ and ‘www.example.com/a/c/?spam’ can be replaced with a single 'domain:www.example.com' entry.

The output gives some summary statistics to do with the number of links/domains entered/disavowed along with 'domain_entries' - which contain the new domains from applying a domain limit and the domains from the original disavow file, and 'link_entries' - the individual links to be disavowed.

To modify your existing file, pass your original file to extract_file_contents(), and use this as the first parameter to:

combine_with_original_disavow('<your_filename>.txt', 
<disavow_file_to_dict_output>)

With the dictionary output of disavow_file_to_dict() as the second parameter. This function will maintain the order (and comments) of your original disavow file.

For testing an existing disavow file against a file containing a list of URLs, simply call:

apply_disavow_files(<your_disavow_filename>.txt,
    <your_urls_to_test_filename>.txt)

With your disavow file as the first parameter, and your URLs file to test as the second. The output is a dictionary, the most relevant keys of which are 'disavowed' and 'non_disavowed'; the rest are statistics summarising the input files and output files.

To Do

  • Port in functionality to parse files from various sources (Majestic, Kerboo, LinkResearchTools) from our older code.

Why Moneypenny?

Moneypenny disavows secret agents, we are disavowing links … geddit?

Contributing

See CONTRIBUTING file.

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License

See LICENSE file.