<a href="https://colab.research.google.com/github/BritneyMuller/colab-notebooks/blob/master/Quick_%26_Dirty_Internal_Link_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Internal Link Analysis

Made by [![Follow](https://img.shields.io/twitter/follow/BritneyMuller?style=social)](https://twitter.com/BritneyMuller)

Questions? Email britneymuller@gmail.com with the subject line [Colab Link Analysis].

Explore [github.com/BritneyMuller/colab-notebooks](https://github.com/BritneyMuller/colab-notebooks) for more notebook examples.

# Export internal link data from Screaming Frog 🐸 and upload it


![Export internal link csv to your local computer](https://i.ibb.co/5W2crVy/Screen-Shot-2020-02-26-at-12-14-13-AM.png)

---
Before you start, click:

> 'Edit' -> 'Notebook Settings' and change 'Hardware Accelerator' to GPU.

---



Note: [Shift + Return] is the shortcut to run a single cell. 

Try running the code below by clicking into the cell and pressing [Shift + Return].


In [1]:
import csv
import json
import requests
import pandas as pd
import numpy as np
import re
from IPython.display import display

In [2]:
from google.colab import files
uploaded = files.upload()

Saving lp-inlinks.csv to lp-inlinks (1).csv


Load the uploaded CSV into a pandas DataFrame named `df`

In [3]:
df = pd.read_csv("lp-inlinks (1).csv")
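Colab renames a file that is uploaded a second time (here `lp-inlinks.csv` was saved as `lp-inlinks (1).csv`), so a hard-coded filename can break on the next run. A minimal sketch of one workaround, assuming `files.upload()` returns a dict that maps each saved filename to the file's bytes: read the first entry straight from that dict. The helper `read_first_upload` and the `uploaded_demo` dict are hypothetical stand-ins so the snippet also runs outside Colab.

```python
import io

import pandas as pd


def read_first_upload(uploaded):
    """Return the first uploaded CSV as a DataFrame, whatever name it was saved under."""
    name = next(iter(uploaded))  # first (usually only) uploaded filename
    return pd.read_csv(io.BytesIO(uploaded[name]))


# Hypothetical stand-in for the dict returned by files.upload(): {saved filename: file bytes}
uploaded_demo = {
    "lp-inlinks (1).csv": b"Type,From,To\nAHREF,https://example.com/,https://example.com/a/\n"
}

df_demo = read_first_upload(uploaded_demo)
print(df_demo)
```

In the notebook itself, `read_first_upload(uploaded)` would take the place of the hard-coded `pd.read_csv(...)` call.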

Look at the first 5 rows (the column header is shown above them)


In [4]:
df.head()

Unnamed: 0,Type,From,To,Anchor Text,Alt Text,Follow,Link Attributes
0,HTTP Redirect,http://launchpadbemidji.com/,https://launchpadbemidji.com/,,,True,
1,AHREF,https://launchpadbemidji.com/,https://launchpadbemidji.com/,,LaunchPad,True,
2,AHREF,https://launchpadbemidji.com/,https://launchpadbemidji.com/,Home,,True,
3,AHREF,https://launchpadbemidji.com/,https://launchpadbemidji.com/,Home,,True,
4,AHREF,https://launchpadbemidji.com/,https://launchpadbemidji.com/,Home,,True,


In [5]:
df.tail()

Unnamed: 0,Type,From,To,Anchor Text,Alt Text,Follow,Link Attributes
992,AHREF,https://launchpadbemidji.com/photo-galleries/,https://launchpadbemidji.com/wp-content/upload...,,,True,
993,AHREF,https://launchpadbemidji.com/photo-galleries/,https://launchpadbemidji.com/wp-content/upload...,,,True,
994,AHREF,https://launchpadbemidji.com/photo-galleries/,https://launchpadbemidji.com/wp-content/upload...,,,True,
995,AHREF,https://launchpadbemidji.com/photo-galleries/,https://launchpadbemidji.com/wp-content/upload...,,,True,
996,AHREF,https://launchpadbemidji.com/photo-galleries/,https://launchpadbemidji.com/wp-content/upload...,,,True,


## Evaluate internal link counts

In [6]:
df['To'].value_counts()

https://launchpadbemidji.com/                                                       46
https://launchpadbemidji.com/wp-content/themes/amax/img/e.gif                       43
https://launchpadbemidji.com/entrepreneur-meet-up/                                  35
https://launchpadbemidji.com/membership/                                            35
https://launchpadbemidji.com/about-us/                                              35
                                                                                    ..
https://launchpadbemidji.com/wp-content/uploads/2017/02/3.jpg                        1
https://launchpadbemidji.com/wp-content/uploads/2019/06/1-2-3-Start-Up-Guide.pdf     1
https://launchpadbemidji.com/wp-content/uploads/2016/03/8.jpg                        1
https://launchpadbemidji.com/wp-content/uploads/2017/04/IMG_1308.jpg                 1
https://launchpadbemidji.com/wp-content/uploads/2019/04/den.jpg                      1
Name: To, Length: 116, dtype: int64
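The counts above mix pages with assets such as `e.gif` and uploaded images. As a rough sketch, assuming you only care about links to pages, you could filter out asset URLs by file extension before counting. The extension list in `asset_pattern` is an assumption to adjust for your own site, and the `links` frame is a small hand-made sample rather than the real export.

```python
import pandas as pd

# Hand-made sample that mirrors the "To" column of the Screaming Frog export
links = pd.DataFrame({
    "To": [
        "https://launchpadbemidji.com/",
        "https://launchpadbemidji.com/",
        "https://launchpadbemidji.com/membership/",
        "https://launchpadbemidji.com/wp-content/themes/amax/img/e.gif",
        "https://launchpadbemidji.com/wp-content/uploads/2017/02/3.jpg",
    ]
})

# Assumed list of asset extensions; extend or shrink it as needed
asset_pattern = r"\.(?:gif|jpe?g|png|svg|pdf)$"
is_asset = links["To"].str.contains(asset_pattern, case=False, regex=True, na=False)

# Count inlinks for page URLs only
page_counts = links.loc[~is_asset, "To"].value_counts()
print(page_counts)
```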

## Evaluate internal anchor text counts

In [7]:
df['Anchor Text'].value_counts()

Contact Us                   33
LaunchPad Events             33
Entrepreneurs Meetup         33
LaunchPad Staff              33
Membership                   33
Current LaunchPad Members    33
Videos                       33
Resources                    33
COVID-19 Updates             33
Photo Galleries              33
Home                         33
About Us                     33
MEMBERSHIP                    1
ENTREPRENEURS MEETUP          1
ABOUT US                      1
Name: Anchor Text, dtype: int64
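Note that `value_counts()` treats `Membership` and `MEMBERSHIP` as different anchors. If you would rather count them together, one possible approach (a sketch, not part of the original analysis) is to strip whitespace and lowercase the text first. The `anchors` Series below is a hand-made sample.

```python
import pandas as pd

# Hand-made sample of anchor text, including case variants and a missing value
anchors = pd.Series(
    ["Membership", "MEMBERSHIP", "About Us", "ABOUT US", "Home", None],
    name="Anchor Text",
)

# Drop empty anchors, then normalize whitespace and case before counting
normalized = anchors.dropna().str.strip().str.lower()
anchor_counts = normalized.value_counts()
print(anchor_counts)
```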

## Find all links to 'X' Page

In [8]:
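# Keep only the links whose target URL contains the page path (plain substring match)
df_filtered = df[df['To'].str.contains("/entrepreneur-meet-up/", regex=False, na=False)]
df_filtered.head(100)
# The "From" URLs link to this page with targeted anchor text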

Unnamed: 0,Type,From,To,Anchor Text,Alt Text,Follow,Link Attributes
212,AHREF,https://launchpadbemidji.com/,https://launchpadbemidji.com/entrepreneur-meet...,Entrepreneurs Meetup,,True,
213,AHREF,https://launchpadbemidji.com/,https://launchpadbemidji.com/entrepreneur-meet...,Entrepreneurs Meetup,,True,
214,AHREF,https://launchpadbemidji.com/,https://launchpadbemidji.com/entrepreneur-meet...,ENTREPRENEURS MEETUP,,True,
215,AHREF,https://launchpadbemidji.com/,https://launchpadbemidji.com/entrepreneur-meet...,Entrepreneurs Meetup,,True,
216,AHREF,https://launchpadbemidji.com/photo-galleries/,https://launchpadbemidji.com/entrepreneur-meet...,Entrepreneurs Meetup,,True,
217,AHREF,https://launchpadbemidji.com/photo-galleries/,https://launchpadbemidji.com/entrepreneur-meet...,Entrepreneurs Meetup,,True,
218,AHREF,https://launchpadbemidji.com/photo-galleries/,https://launchpadbemidji.com/entrepreneur-meet...,Entrepreneurs Meetup,,True,
219,AHREF,https://launchpadbemidji.com/entrepreneur-meet...,https://launchpadbemidji.com/entrepreneur-meet...,Entrepreneurs Meetup,,True,
220,AHREF,https://launchpadbemidji.com/entrepreneur-meet...,https://launchpadbemidji.com/entrepreneur-meet...,Entrepreneurs Meetup,,True,
221,AHREF,https://launchpadbemidji.com/entrepreneur-meet...,https://launchpadbemidji.com/entrepreneur-meet...,Entrepreneurs Meetup,,True,
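Some of the rows above are the page linking to itself (the same URL in "From" and "To"), which typically comes from site-wide navigation. A small sketch, assuming you want to see only links from other pages: drop the self-referencing rows, then summarize how many links and how many distinct anchors each source page uses. The `links` frame is a hand-made sample and `target` is the same path used in the filter above.

```python
import pandas as pd

page = "https://launchpadbemidji.com/entrepreneur-meet-up/"

# Hand-made sample with the same columns as the export
links = pd.DataFrame({
    "From": [
        "https://launchpadbemidji.com/",
        "https://launchpadbemidji.com/photo-galleries/",
        page,  # the page linking to itself
    ],
    "To": [page, page, page],
    "Anchor Text": ["Entrepreneurs Meetup"] * 3,
})

target = "/entrepreneur-meet-up/"
to_target = links[links["To"].str.contains(target, regex=False, na=False)]

# Remove rows where the page links to itself
from_other_pages = to_target[to_target["From"] != to_target["To"]]

# Per source page: number of links and number of distinct anchor texts
summary = from_other_pages.groupby("From")["Anchor Text"].agg(["count", "nunique"])
print(summary)
```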


# Load Filterable Table

In [9]:
%load_ext google.colab.data_table
df

Unnamed: 0,Type,From,To,Anchor Text,Alt Text,Follow,Link Attributes
0,HTTP Redirect,http://launchpadbemidji.com/,https://launchpadbemidji.com/,,,True,
1,AHREF,https://launchpadbemidji.com/,https://launchpadbemidji.com/,,LaunchPad,True,
2,AHREF,https://launchpadbemidji.com/,https://launchpadbemidji.com/,Home,,True,
3,AHREF,https://launchpadbemidji.com/,https://launchpadbemidji.com/,Home,,True,
4,AHREF,https://launchpadbemidji.com/,https://launchpadbemidji.com/,Home,,True,
...,...,...,...,...,...,...,...
992,AHREF,https://launchpadbemidji.com/photo-galleries/,https://launchpadbemidji.com/wp-content/upload...,,,True,
993,AHREF,https://launchpadbemidji.com/photo-galleries/,https://launchpadbemidji.com/wp-content/upload...,,,True,
994,AHREF,https://launchpadbemidji.com/photo-galleries/,https://launchpadbemidji.com/wp-content/upload...,,,True,
995,AHREF,https://launchpadbemidji.com/photo-galleries/,https://launchpadbemidji.com/wp-content/upload...,,,True,


In [10]:
df.head()

Unnamed: 0,Type,From,To,Anchor Text,Alt Text,Follow,Link Attributes
0,HTTP Redirect,http://launchpadbemidji.com/,https://launchpadbemidji.com/,,,True,
1,AHREF,https://launchpadbemidji.com/,https://launchpadbemidji.com/,,LaunchPad,True,
2,AHREF,https://launchpadbemidji.com/,https://launchpadbemidji.com/,Home,,True,
3,AHREF,https://launchpadbemidji.com/,https://launchpadbemidji.com/,Home,,True,
4,AHREF,https://launchpadbemidji.com/,https://launchpadbemidji.com/,Home,,True,


# Disable Filterable Table

In [11]:
#Disable Table
%unload_ext google.colab.data_table
df

# Optional Next Steps


# Now that your internal links are cleaned and organized in a DataFrame, let's bring in your keyword data!

The following example uses Moz's Ranking Keywords for Domain (berkeyfilters.com) export. 

You could also pull in Google Search Console (GSC) data, although additional data cleanup might be required.

In [None]:
from google.colab import files
uploaded = files.upload()

Saving berkey-cannibalization.csv to berkey-cannibalization.csv


In [None]:
df2 = pd.read_csv("berkey-cannibalization.csv")

In [None]:
df2.head()

Unnamed: 0,Keyword,Position,Volume,URL,Cannibalization
0,alexapure vs berkey,12,250,https://www.berkeyfilters.com/pages/berkey-vs-...,na
1,aquatru vs berkey,2,90,https://www.berkeyfilters.com/pages/berkey-vs-...,na
2,barkley water filter,9,10,https://www.berkeyfilters.com/,na
3,berke water filter,1,80,https://www.berkeyfilters.com/,na
4,berke water filter,1,80,https://www.berkeyfilters.com/collections/berk...,na


In [None]:
# Data cleanup: keep only the columns needed for the analysis
df2 = df2[['Keyword', 'Position', 'Volume', 'URL']]

In [None]:
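# Flag rows whose Keyword already appeared earlier in the table.
# Note: by default duplicated() leaves the first occurrence of each keyword unflagged.
duplicateRowsDF = df2[df2.duplicated(['Keyword'])]

print("Duplicate Keywords based on a single column are:", duplicateRowsDF, sep='\n')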

Duplicate Keywords based on a single column are:
                               Keyword  ...                                                URL
4                   berke water filter  ...  https://www.berkeyfilters.com/collections/berk...
5                   berke water filter  ...  https://www.berkeyfilters.com/products/big-berkey
7                  berkee water filter  ...  https://www.berkeyfilters.com/pages/bundle-and...
8                  berkee water filter  ...  https://www.berkeyfilters.com/berkey-water-fil...
9                  berkee water filter  ...  https://www.berkeyfilters.com/collections/berk...
...                                ...  ...                                                ...
1074  what is in a berkey water filter  ...  https://www.berkeyfilters.com/collections/berk...
1075  what is in a berkey water filter  ...  https://www.berkeyfilters.com/products/big-berkey
1077     when to replace berkey filter  ...  https://www.berkeyfilters.com/pages/black-berk...

# All Cannibalization Keywords Here:

---



In [None]:
duplicateRowsDF.head(30)

Unnamed: 0,Keyword,Position,Volume,URL
4,berke water filter,1,80,https://www.berkeyfilters.com/collections/berk...
5,berke water filter,1,80,https://www.berkeyfilters.com/products/big-berkey
7,berkee water filter,1,80,https://www.berkeyfilters.com/pages/bundle-and...
8,berkee water filter,1,80,https://www.berkeyfilters.com/berkey-water-fil...
9,berkee water filter,1,80,https://www.berkeyfilters.com/collections/berk...
10,berkee water filter,1,80,https://www.berkeyfilters.com/products/big-berkey
11,berkee water filter,1,80,https://www.berkeyfilters.com/products/royal-b...
12,berkee water filter,1,80,https://www.berkeyfilters.com/products/travel-...
14,berkeley filter,1,80,https://www.berkeyfilters.com/products/big-berkey
15,berkeley filter,1,80,https://www.berkeyfilters.com/berkey-water-fil...


In [None]:
duplicateRowsDF.count()

Keyword     704
Position    704
Volume      704
URL         704
dtype: int64
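One caveat about the numbers above: `duplicated(['Keyword'])` skips the first row of each repeated keyword, so the 704 rows leave out one ranking URL per keyword. A hedged sketch of an alternative, assuming you want every competing URL listed: pass `keep=False`, which flags all rows of a repeated keyword. The `keywords` frame uses made-up `example.com` URLs for illustration.

```python
import pandas as pd

# Hand-made sample; the URLs are placeholders, not taken from the real export
keywords = pd.DataFrame({
    "Keyword": [
        "barkley water filter",
        "berke water filter",
        "berke water filter",
        "berke water filter",
    ],
    "Position": [9, 1, 1, 1],
    "URL": [
        "https://example.com/",
        "https://example.com/",
        "https://example.com/collections/filters",
        "https://example.com/products/big-filter",
    ],
})

# Default behavior: the first occurrence of each keyword is NOT flagged
default_dupes = keywords[keywords.duplicated(["Keyword"])]

# keep=False: every row of a repeated keyword is flagged
all_dupes = keywords[keywords.duplicated(["Keyword"], keep=False)]

# How many distinct URLs compete for each repeated keyword
urls_per_keyword = all_dupes.groupby("Keyword")["URL"].nunique()
print(urls_per_keyword)
```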

# Download Dataframe to CSV:


In [None]:
!mkdir -p data-output
df.to_csv('data-output/my-data.csv')  # to_csv fails if the folder does not exist, hence the mkdir above

Join internal link data with keyword data

In [None]:
#working on how to do this. Hamlet Batista, where you at? :) Can't get any fancy melts or joins to do this + asked a dozen people about this. {shrug}
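One possible way to do the join, offered as a sketch rather than a finished solution: first collapse the link export to one row per target URL, then left-merge that summary onto the keyword table by URL. This assumes both exports cover the same site and that URLs are written identically in both (protocol, `www`, trailing slash), which is not the case for the two sample files used in this notebook. All data and the names `link_summary` and `joined` below are hypothetical.

```python
import pandas as pd

# Hypothetical link export (same columns as the Screaming Frog file)
links = pd.DataFrame({
    "From": ["https://example.com/", "https://example.com/about/", "https://example.com/"],
    "To": ["https://example.com/filters/", "https://example.com/filters/", "https://example.com/about/"],
    "Anchor Text": ["Water Filters", "Filters", "About Us"],
})

# Hypothetical keyword export (same columns as the cleaned df2)
keywords = pd.DataFrame({
    "Keyword": ["water filter", "about example", "filter reviews"],
    "Position": [3, 1, 12],
    "Volume": [500, 20, 90],
    "URL": [
        "https://example.com/filters/",
        "https://example.com/about/",
        "https://example.com/reviews/",
    ],
})

# One row per target URL: total inlinks and number of distinct linking pages
link_summary = (
    links.groupby("To")
    .agg(inlinks=("From", "size"), linking_pages=("From", "nunique"))
    .reset_index()
)

# Left merge keeps every keyword row, even if the URL has no inlinks in the crawl
joined = keywords.merge(link_summary, how="left", left_on="URL", right_on="To").drop(columns="To")
joined[["inlinks", "linking_pages"]] = joined[["inlinks", "linking_pages"]].fillna(0).astype(int)
print(joined)
```

Ranking URLs that end up with zero or very few inlinks are natural candidates for more internal links. If the two exports format URLs differently, normalize them before merging.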