Ways to improve the results of your current backlinks checker

1. Change the Google query to "domain" -site:domain, for example "quantamixsolutions.com" -site:quantamixsolutions.com
2. Then analyse the pages found with this script to check whether each one actually contains a backlink
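
As a minimal sketch of step 1, the query string can be built from either a bare domain or a full URL. The helper name `build_backlink_query` is hypothetical and not part of the original script.

```python
from urllib.parse import urlparse


def build_backlink_query(site: str) -> str:
    """Return a search query for pages that mention `site` but are not hosted on it."""
    # Accept either a bare domain or a full URL.
    parsed = urlparse(site if "//" in site else "//" + site)
    domain = parsed.netloc.lower()
    # The quoted domain asks for an exact mention; -site: excludes the domain's own pages.
    return f'"{domain}" -site:{domain}'


print(build_backlink_query("quantamixsolutions.com"))
```

Note that there must be no space between the minus sign, `site:` and the domain, otherwise the exclusion operator is not recognised as such.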



**Metrics Info**


> Metrics are contained in a dictionary


* 'AlexaRankCountry': name of the country the site ranks in, e.g. 'United States'
* 'AlexaRank': site rank within that country, e.g. '8'
* 'AlexaRankCountryCode': ISO code of the country the site ranks in, e.g. 'US'
* 'AlexaReach': Alexa reach metric, e.g. '11'
* 'AlexaTrafficRank': Alexa global rank, e.g. '13'
* 'anchor_text': anchor text of the link, e.g. 'Here’s a table'
* 'anchor_type': either 'text' or 'image', e.g. 'text'
* 'country': name of the country the site is located in, e.g. 'France'
* 'countryCode': ISO code of the country the site is located in, e.g. 'FR'
* 'externalLinks': number of external links on the backlink page, e.g. 30
* 'from_spam_heavy_country': whether the backlink site is from a spam-heavy country, e.g. False
* 'hasManyBacklinks': whether the backlink page has many external links (more than 200), e.g. False
* 'ip': IPv4 address of the backlink's domain, e.g. '151.80.39.61'
* 'is_doFollow': whether this is a dofollow backlink, e.g. True
* 'is_exist': whether the backlink is actually present on the page, e.g. True
* 'pagerank': page rank of the domain, computed on the Common Crawl web graph, e.g. 9.54
* 'position': position of the link in the HTML page: 'body', 'footer' or 'header'
* 'rank': rank of the backlink's domain in the Common Crawl web graph, e.g. '16'
* 'status_code': HTTP status code returned for the backlink page, e.g. 200
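
The shape of this dictionary can be written down as a typed sketch. The class name `BacklinkMetrics` is hypothetical, and the value types are inferred from the sample output in this notebook rather than from any API contract.

```python
from typing import Optional, TypedDict


class BacklinkMetrics(TypedDict, total=False):
    # total=False: keys may be absent, e.g. 'position' when no link is found.
    status_code: int
    is_exist: bool
    is_doFollow: bool
    position: str                # 'body', 'footer' or 'header'
    anchor_text: Optional[str]
    anchor_type: str             # 'text' or 'image'
    externalLinks: int
    hasManyBacklinks: bool
    ip: str
    country: Optional[str]
    countryCode: Optional[str]
    from_spam_heavy_country: bool
    rank: str                    # returned as a string in the sample response
    pagerank: float
    AlexaRank: str
    AlexaRankCountry: str
    AlexaRankCountryCode: str
    AlexaReach: str
    AlexaTrafficRank: str


# A partially filled dictionary is still valid because no key is required.
example: BacklinkMetrics = {"status_code": 200, "is_exist": True, "externalLinks": 30}
```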




In [19]:
# prerequisites
!pip install requests
!pip install beautifulsoup4
!pip install geoip2
##################
import requests
from bs4 import BeautifulSoup
import socket
from urllib.parse import urlparse
import geoip2.database
import json

backlink = "https://ahrefs.com/blog/nofollow-links/"
targetLink = "https://en.wikipedia.org/wiki/Nofollow#Interpretation_by_the_individual_search_engines"

domain = urlparse(targetLink).netloc
backlinkDomain = urlparse(backlink).netloc

page = requests.get(backlink, timeout=10)  # fail fast instead of hanging on an unresponsive host
soup = BeautifulSoup(page.text, 'html.parser')

# backlink metrics dictionary
metrics = {}
metrics['status_code'] = page.status_code




In [20]:
# Do follow vs no follow
# check for X-Robots-Tag and robots meta tag
xrobots = page.headers.get('X-Robots-Tag')
metaRobots = soup.find('meta',  attrs={'name':'robots', 'content':True})
default_follow = True
if xrobots:  
  if "nofollow" in xrobots or "none" in xrobots :
    default_follow = False
if metaRobots:
  if "nofollow" in metaRobots["content"] or "none" in metaRobots["content"]:
    default_follow = False

metrics['is_doFollow'] = default_follow

# loop through links
metrics['is_exist'] = False
externalCounter = 0
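for link in soup.find_all('a'):
  href = link.get("href")
  if not href:
    # skip anchors with a missing or empty href
    continue
  # count external links: absolute URLs that do not point at the backlink's own domain
  # (relative hrefs are internal, so they are not counted)
  if href.startswith(("http://", "https://", "//")) and backlinkDomain not in href:
    externalCounter += 1
  # does this link point at the target domain?
  if domain in href:
    metrics['is_exist'] = True
    metrics['position'] = 'body'
    # get_text() also works when the anchor wraps nested tags, where .string is None
    metrics['anchor_text'] = link.get_text(strip=True)
    metrics['anchor_type'] = "text"
    if link.find('img'):
      metrics['anchor_type'] = "image"
    # rel is a multi-valued attribute, so BeautifulSoup returns a list of tokens
    if "nofollow" in (link.get("rel") or []):
      metrics['is_doFollow'] = False
    if link.find_parent('footer'):
      metrics['position'] = 'footer'
    if link.find_parent('header'):
      metrics['position'] = 'header'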
# add external links count
metrics['externalLinks'] = externalCounter
metrics['hasManyBacklinks'] = externalCounter > 200
# get Ipv4 address
metrics['ip'] = socket.gethostbyname(backlinkDomain)
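
The link loop above compares domains with a substring test. A stricter variant, sketched here under a few assumptions, resolves each href against the page URL and compares hosts exactly. It uses the standard-library `HTMLParser` instead of BeautifulSoup only so that the example runs without extra packages; `LinkScanner` and its attribute names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkScanner(HTMLParser):
    """Count external links and record every link that points at target_domain."""

    def __init__(self, page_url: str, target_domain: str):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlparse(page_url).netloc
        self.target_domain = target_domain
        self.external_links = 0
        self.matches = []  # one dict per link to the target domain

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return  # missing or empty href
        # Resolve relative hrefs against the page URL before comparing hosts.
        host = urlparse(urljoin(self.page_url, href)).netloc
        if not host:
            return  # mailto:, javascript: and similar schemes have no host
        if host != self.page_host:
            self.external_links += 1
        if host == self.target_domain:
            # rel holds space-separated tokens, e.g. "noopener nofollow".
            rel_tokens = (attrs.get("rel") or "").lower().split()
            self.matches.append({"href": href, "nofollow": "nofollow" in rel_tokens})


html = (
    '<a href="/pricing">Pricing</a> '
    '<a rel="noopener nofollow" href="https://en.wikipedia.org/wiki/Nofollow">table</a> '
    '<a href="https://example.com/">other site</a> '
    '<a href="mailto:hi@ahrefs.com">mail</a>'
)
scanner = LinkScanner("https://ahrefs.com/blog/nofollow-links/", "en.wikipedia.org")
scanner.feed(html)
print(scanner.external_links, scanner.matches)
```

Exact host comparison treats `www.example.com` and `example.com` as different hosts, so subdomains may need normalising first depending on how the target domain is written.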


In [21]:
# get the country from the IP using MaxMind geoip2 @ https://github.com/maxmind/GeoIP2-python

# Note: you have to download the GeoLite2-Country database (GeoLite2-Country.mmdb) first.

# This creates a Reader object. Reuse the same object
# across multiple requests, as creating it is expensive.
reader = geoip2.database.Reader('GeoLite2-Country.mmdb')
# country() is the lookup method that matches the Country database.
response = reader.country(metrics['ip'])
metrics['countryCode'] = response.country.iso_code
metrics['country'] = response.country.name

# is the IP address from a spam-heavy country such as Russia, China or India?
metrics['from_spam_heavy_country'] = metrics['country'] in ("Russia", "China", "India")

In [22]:
# Is it from a high-authority site? We can use either the Open PageRank or the Mozscape API.
# Note: this looks up `domain` (the target site); pass backlinkDomain instead to score the linking site.
openprAPIKey = "so8gc48ccg8kkw04cwo04s4gcck4g8s0ck4s000k"
openPR = requests.get("https://openpagerank.com/api/v1.0/getPageRank", params={'domains[]': domain}, headers={"API-OPR": openprAPIKey})

# convert the response 
pr = json.loads(openPR.text)

metrics['rank'] = pr['response'][0]['rank']
metrics['pagerank'] = pr['response'][0]['page_rank_decimal']

In [23]:
# alexa and traffic estimation
alexa = requests.get("http://data.alexa.com/data?cli=10&url="+domain)
soup = BeautifulSoup(alexa.text,'xml')
popularity = soup.find("POPULARITY")
reach = soup.find("REACH")
rank = soup.find("COUNTRY")
if popularity:
  metrics['AlexaTrafficRank'] =  popularity.get("TEXT",None)
if reach:
  metrics['AlexaReach'] = reach.get("RANK",None)
if rank:
  metrics['AlexaRank'] = rank.get("RANK",None)
  metrics['AlexaRankCountry'] = rank.get("NAME",None)
  metrics['AlexaRankCountryCode'] = rank.get("CODE",None)  

Alexa API response

In [24]:
alexa.text

'<?xml version="1.0" encoding="UTF-8"?>\r\n\r\n<!-- Need more Alexa data?  Find our APIs here: https://aws.amazon.com/alexa/ -->\r\n<ALEXA VER="0.9" URL="en.wikipedia.org/" HOME="0" AID="=" IDN="en.wikipedia.org/">\r\n<SD><POPULARITY URL="wikipedia.org/" TEXT="13" SOURCE="panel"/><REACH RANK="11"/><RANK DELTA="+0"/><COUNTRY CODE="US" NAME="United States" RANK="8"/></SD></ALEXA>'
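
The response shown above can also be parsed offline with the standard library, which is handy for testing the extraction logic without calling the endpoint. This is a minimal sketch; `parse_alexa` is a hypothetical helper, and the sample string is the response printed in this notebook.

```python
import xml.etree.ElementTree as ET

SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\r\n\r\n'
    '<!-- Need more Alexa data?  Find our APIs here: https://aws.amazon.com/alexa/ -->\r\n'
    '<ALEXA VER="0.9" URL="en.wikipedia.org/" HOME="0" AID="=" IDN="en.wikipedia.org/">\r\n'
    '<SD><POPULARITY URL="wikipedia.org/" TEXT="13" SOURCE="panel"/><REACH RANK="11"/>'
    '<RANK DELTA="+0"/><COUNTRY CODE="US" NAME="United States" RANK="8"/></SD></ALEXA>'
)


def parse_alexa(xml_text: str) -> dict:
    """Extract the Alexa fields used in the metrics dictionary."""
    # Parse from bytes so the encoding declaration in the prolog is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    out = {}
    popularity = root.find(".//POPULARITY")
    reach = root.find(".//REACH")
    country = root.find(".//COUNTRY")
    # Compare with None: an Element without children is falsy in ElementTree.
    if popularity is not None:
        out["AlexaTrafficRank"] = popularity.get("TEXT")
    if reach is not None:
        out["AlexaReach"] = reach.get("RANK")
    if country is not None:
        out["AlexaRank"] = country.get("RANK")
        out["AlexaRankCountry"] = country.get("NAME")
        out["AlexaRankCountryCode"] = country.get("CODE")
    return out


print(parse_alexa(SAMPLE))
```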

Domcop API response

In [25]:
pr

{'last_updated': '18th Jun 2020',
 'response': [{'domain': 'en.wikipedia.org',
   'error': '',
   'page_rank_decimal': 9.54,
   'page_rank_integer': 10,
   'rank': '16',
   'status_code': 200}],
 'status_code': 200}
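
Indexing `pr['response'][0]` directly raises an exception if the list is empty or the lookup failed. A defensive sketch follows; `extract_page_rank` is a hypothetical helper, and treating a per-domain `status_code` other than 200 as a failure is an assumption based only on the sample response above.

```python
def extract_page_rank(payload: dict) -> dict:
    """Pull rank and page_rank_decimal from a payload shaped like the sample response."""
    results = payload.get("response") or []
    # Assumption: a per-domain status_code of 200 marks a successful lookup.
    if not results or results[0].get("status_code") != 200:
        return {}
    first = results[0]
    return {"rank": first.get("rank"), "pagerank": first.get("page_rank_decimal")}


sample = {
    "last_updated": "18th Jun 2020",
    "response": [{
        "domain": "en.wikipedia.org",
        "error": "",
        "page_rank_decimal": 9.54,
        "page_rank_integer": 10,
        "rank": "16",
        "status_code": 200,
    }],
    "status_code": 200,
}
print(extract_page_rank(sample))
```

The returned dictionary can be merged into the metrics with `metrics.update(...)`, which leaves the keys absent when the lookup fails.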

**Final output**

In [26]:
metrics

{'AlexaRank': '8',
 'AlexaRankCountry': 'United States',
 'AlexaRankCountryCode': 'US',
 'AlexaReach': '11',
 'AlexaTrafficRank': '13',
 'anchor_text': 'Here’s a table',
 'anchor_type': 'text',
 'country': 'France',
 'countryCode': 'FR',
 'externalLinks': 30,
 'from_spam_heavy_country': False,
 'hasManyBacklinks': False,
 'ip': '151.80.39.61',
 'is_doFollow': True,
 'is_exist': True,
 'pagerank': 9.54,
 'position': 'body',
 'rank': '16',
 'status_code': 200}

**TODO**


*   Is it a unique domain? (Conducted on profile level)
*   Do a lot of your links come from the same IP address? (Conducted on profile level)
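
Both profile-level checks can be sketched once the per-backlink metrics have been collected. This is one possible approach rather than a finished implementation: `profile_summary` and its output keys are hypothetical, the input is assumed to be a list of (backlink URL, metrics dictionary) pairs, and the URLs and second IP in the usage example are placeholders.

```python
from collections import Counter
from urllib.parse import urlparse


def profile_summary(backlinks: list) -> dict:
    """Summarise a backlink profile given (backlink_url, metrics_dict) pairs."""
    # How many distinct linking domains are there?
    domains = Counter(urlparse(url).netloc for url, _ in backlinks)
    # How concentrated are the links on a single IP address?
    ips = Counter(m["ip"] for _, m in backlinks if m.get("ip"))
    total = len(backlinks)
    top_ip, top_count = ips.most_common(1)[0] if ips else (None, 0)
    return {
        "unique_domains": len(domains),
        "unique_ips": len(ips),
        "top_ip": top_ip,
        "top_ip_share": top_count / total if total else 0.0,
    }


profile = [
    ("https://a.example/post", {"ip": "151.80.39.61"}),
    ("https://a.example/other", {"ip": "151.80.39.61"}),
    ("https://b.example/", {"ip": "203.0.113.5"}),
]
summary = profile_summary(profile)
print(summary)
```

A high `top_ip_share` suggests that many links come from the same host; the threshold at which that becomes a concern is left open here.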
