# SplunkDevURLs

Testing links from docs.splunk.com pages to dev.splunk.com!

First, import [os](https://docs.python.org/3/library/os.html) to set the current working dir to the folder where 'Dev links from Ponydocs - Sheet1.csv' exists locally.

In [17]:
import os
os.chdir('/Users/pking/Downloads')
print(os.getcwd())

/Users/pking/Downloads


Import [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/) (once installed) and [urllib](https://docs.python.org/3/library/urllib.html) to extract links from HTML. Use this cell to test any single page ad hoc by assigning its address to `url`. The csv processing further down does not call this cell, but it does rely on the imports made here.

In [18]:
from bs4 import BeautifulSoup
import urllib.request

url = 'https://docs.splunk.com/Documentation:TA-ueba:User:Sourcetypes:1.2.0'
conn = urllib.request.urlopen(url)
html = conn.read()

soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly so bs4 does not have to guess (and warn)

links = soup.find_all('a')

for tag in links:
    link = tag.get('href', None)
    if link is not None:
        print(link)

/Documentation
http://www.splunk.com/en_us/products.html
http://www.splunk.com/en_us/products.html
http://www.splunk.com/en_us/products/splunk-enterprise.html
http://www.splunk.com/en_us/products/splunk-cloud.html
http://www.splunk.com/en_us/products/splunk-enterprise.html
http://www.splunk.com/en_us/products/premium-solutions/splunk-enterprise-security.html
http://www.splunk.com/en_us/products/premium-solutions/it-service-intelligence.html
http://www.splunk.com/en_us/products/premium-solutions/user-behavior-analytics.html
http://www.splunk.com/en_us/products/apps-and-add-ons.html
http://www.splunk.com/en_us/products/pricing.html
http://www.co-store.com/splunk
http://www.splunk.com/en_us/solutions
http://www.splunk.com/en_us/solutions/solution-areas.html
http://www.splunk.com/en_us/solutions/solution-areas/application-delivery.html
http://www.splunk.com/en_us/solutions/solution-areas/big-data.html
http://www.splunk.com/en_us/solutions/solution-areas/business-analytics.html
http://www.s

The testlink function tests the link passed to it:

- If an http error is thrown, write it to the csv.

- If not, but the link ends up on the landing page of dev.splunk.com anyway, call it a 'not mapped' link, unless the link pointed at the landing page in the first place. Not sure if this is a meaningful test...

- Otherwise, the link landed on a mapped destination URL, or intentionally on the landing page itself. So that's OK! However, whether the destination is the *correct* or *intended* topic is not tested; someone with a brain beyond this script must still ponder the appropriateness of the 'Ok' result.

In [19]:
def testlink(link, page, out_writer):
    # The landing page can be written with or without a trailing slash, over http or https.
    landing_variants = ('https://dev.splunk.com/', 'https://dev.splunk.com',
                        'http://dev.splunk.com/', 'http://dev.splunk.com')
    try:
        test = urllib.request.urlopen(link)
    except urllib.error.HTTPError as e:
        # The link itself is broken: record the HTTP status code.
        print('    HTTPError: {} ({})'.format(e.code, link))
        out_writer.writerow([page, '{} ({})'.format(e.code, link)])
    else:
        # geturl() returns the final address after any redirects.
        if test.geturl() == 'https://dev.splunk.com/' and link not in landing_variants:
            print('    Not mapped (' + link + ')')
            out_writer.writerow([page, 'Not mapped (' + link + ')'])
        else:
            out_writer.writerow([page, 'Ok (' + link + ')'])
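
The 'Not mapped' versus 'Ok' decision described above can also be looked at in isolation, without any network access. The following is only a sketch: `classify_link`, `LANDING_PAGE`, and `LANDING_VARIANTS` are hypothetical names that are not part of this notebook, and the example links are made up for illustration.

```python
# Hypothetical helper: the same comparison testlink makes, as a pure function.
LANDING_PAGE = 'https://dev.splunk.com/'
LANDING_VARIANTS = {
    'https://dev.splunk.com/',
    'https://dev.splunk.com',
    'http://dev.splunk.com/',
    'http://dev.splunk.com',
}


def classify_link(link, final_url):
    """Classify a link that opened without an HTTP error.

    link      -- the href as written in the source page
    final_url -- where the request ended up after redirects
    """
    # Ending up on the landing page is only suspicious when the link
    # was not aimed at the landing page to begin with.
    if final_url == LANDING_PAGE and link not in LANDING_VARIANTS:
        return 'Not mapped'
    return 'Ok'


# Made-up inputs, for illustration only.
print(classify_link('http://dev.splunk.com/some/old/topic', 'https://dev.splunk.com/'))
print(classify_link('http://dev.splunk.com', 'https://dev.splunk.com/'))
```

Keeping the decision separate from the request makes it possible to check the edge cases (trailing slash, http versus https) by hand before running the full csv.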

The getlinks function pulls the `href` attribute of every anchor tag in the HTML of the given page and tests those that contain 'dev.splunk.com'.

If the given source page itself throws an HTTP error, then no links are tested; the error code is just reported in the output.

In [20]:
def getlinks(page, out_writer):
    try:
        # The with block closes the connection even if reading fails.
        with urllib.request.urlopen(page) as conn:
            html = conn.read()
        soup = BeautifulSoup(html, 'html.parser')  # explicit parser, so bs4 does not guess
        for tag in soup.find_all('a'):
            link = tag.get('href', None)
            # Only test links that point at dev.splunk.com.
            if link is not None and 'dev.splunk.com' in link:
                testlink(link, page, out_writer)
    except urllib.error.HTTPError as e:
        # The source page itself could not be opened.
        print('    PonyDocs HTTPError: {}'.format(e.code))
        out_writer.writerow([page, '{} On opening Ponydocs source'.format(e.code)])
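
The link filtering can be tried on a small HTML string before pointing it at real pages. The sketch below is a variation, not the notebook's method: it swaps BeautifulSoup for the standard library's `html.parser` so that it runs without any install, and it compares the parsed hostname instead of searching for a substring, which is stricter (a link that merely mentions dev.splunk.com in its query string is skipped). `HrefCollector`, `dev_links`, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.hrefs.append(value)


def dev_links(html_text, host='dev.splunk.com'):
    """Return the hrefs in html_text whose hostname is exactly `host`."""
    collector = HrefCollector()
    collector.feed(html_text)
    return [h for h in collector.hrefs if urlparse(h).hostname == host]


# Made-up sample page, for illustration only.
sample = '''
<a href="/Documentation">Docs home</a>
<a href="http://dev.splunk.com/sdks">SDKs</a>
<a href="https://example.com/?ref=dev.splunk.com">Elsewhere</a>
<a name="anchor-without-href">No link</a>
'''
print(dev_links(sample))  # prints ['http://dev.splunk.com/sdks']
```

Whether the stricter hostname match is wanted depends on the source pages; the substring test in getlinks also picks up links where dev.splunk.com appears anywhere in the address.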

Process the specified csv export from Google Sheets, one row at a time. The export csv must be in the current working dir (see top of this notebook).

First, import [csv](https://docs.python.org/3/library/csv.html) and set up a writable output file. **Note:** running this cell truncates your prior output csv. To keep it, make a copy before running this cell!

Then, get and test all the links from the ponydocs sources listed in sheet_exported.

Finally, close files.

In [21]:
import csv

out_file = open('out_file.csv', mode='w+', newline='')   # truncate existing out_file content; newline='' is what the csv docs recommend
out_writer = csv.writer(out_file, delimiter=',')
out_writer.writerow(['Topic', 'Status'])     # start it with the same header as source list, for import beauty

sheet_exported = 'Dev links from Ponydocs - Sheet1.csv'

with open(sheet_exported) as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            line_count += 1
        else:
            ponypage = row[0]   # first column holds the Ponydocs page URL
            print('Row ' + str(line_count) + ' Page: ' + ponypage)
            getlinks(ponypage, out_writer)
            line_count += 1

print('Rows processed: ' + str(line_count))
out_file.close()   # csv_file was already closed by the with block above

Row 1 Page: https://docs.splunk.com/Documentation:AddOns:Akamai:Configuresharedtokens:previous
Row 2 Page: https://docs.splunk.com/Documentation:AddOns:Akamai:Configuresharedtokens:unreleased
Row 3 Page: https://docs.splunk.com/Documentation:AddOns:Akamai:Hardwareandsoftwarerequirements:previous
Row 4 Page: https://docs.splunk.com/Documentation:AddOns:Akamai:Hardwareandsoftwarerequirements:unreleased
Row 5 Page: https://docs.splunk.com/Documentation:AddOns:Akamai:Install:previous
Row 6 Page: https://docs.splunk.com/Documentation:AddOns:Akamai:Install:unreleased
Row 7 Page: https://docs.splunk.com/Documentation:AddOns:Firehose:ConfigureHECdistributed:previous
Row 8 Page: https://docs.splunk.com/Documentation:AddOns:Firehose:ConfigureHECdistributed:released
    HTTPError: 404 (http://dev.splunk.com/view/event-collector/SP-CAAAE73#setup)
Row 9 Page: https://docs.splunk.com/Documentation:AddOns:Firehose:ConfigureHECdistributed:staging
Row 10 Page: https://docs.splunk.com/Documentation:AddO

Row 91 Page: https://docs.splunk.com/Documentation:ES:RN:Enhancements:Caterham
Row 92 Page: https://docs.splunk.com/Documentation:ES:RN:Enhancements:Dino
Row 93 Page: https://docs.splunk.com/Documentation:ES:User:Audit:6.0.0
Row 94 Page: https://docs.splunk.com/Documentation:ES:User:Audit:F1
Row 95 Page: https://docs.splunk.com/Documentation:ES:User:Audit:Interceptor
Row 96 Page: https://docs.splunk.com/Documentation:ES:User:Audit:drafts
Row 97 Page: https://docs.splunk.com/Documentation:ES:User:Configureblocklists:Dino
Row 98 Page: https://docs.splunk.com/Documentation:ES:User:NotableEvents:4.5.0
Row 99 Page: https://docs.splunk.com/Documentation:ES:User:RiskScoring:4.7.0
Row 100 Page: https://docs.splunk.com/Documentation:ES:User:RiskScoring:drafts
Row 101 Page: https://docs.splunk.com/Documentation:ES:User:SetupAdaptiveResponse:4.1.0
Row 102 Page: https://docs.splunk.com/Documentation:ES:dev:Planningyourintegration:1.1
Row 103 Page: https://docs.splunk.com/Documentation:Hunk:Hunk:In

Row 175 Page: https://docs.splunk.com/Documentation:Splunk:CloudUG2:AdddatausingHTTPeventcollector1:1
Row 176 Page: https://docs.splunk.com/Documentation:Splunk:CloudUG2:OptionsforaddingdatatoSplunkCloud:1
Row 177 Page: https://docs.splunk.com/Documentation:Splunk:CloudUG:AdddatausingHTTPprotocol:1
Row 178 Page: https://docs.splunk.com/Documentation:Splunk:CloudUG:GettingStartedWithSplunkCloud:1
Row 179 Page: https://docs.splunk.com/Documentation:Splunk:CloudUG:OptionsforgettingdataintoSplunkCloud:1
Row 180 Page: https://docs.splunk.com/Documentation:Splunk:DMC:Inputdashboards:6.5.0
Row 181 Page: https://docs.splunk.com/Documentation:Splunk:DMC:Usefeaturemonitoring:8.0.0
    HTTPError: 404 (http://dev.splunk.com/sdks)
Row 182 Page: https://docs.splunk.com/Documentation:Splunk:Data:AboutHEC:NightLight
Row 183 Page: https://docs.splunk.com/Documentation:Splunk:Data:UsetheHTTPEventCollector:7.1.0
Row 184 Page: https://docs.splunk.com/Documentation:Splunk:DataOB:Bestpracticesforindexingdat

Row 258 Page: https://docs.splunk.com/Documentation:Splunk:RESTAPI:RESTlicense:6.0beta
Row 259 Page: https://docs.splunk.com/Documentation:Splunk:RESTAPI:RESToutput:4.2.2
Row 260 Page: https://docs.splunk.com/Documentation:Splunk:RESTAPI:RESToutput:6.0beta
Row 261 Page: https://docs.splunk.com/Documentation:Splunk:RESTAPI:RESTsearch:4.2.2
Row 262 Page: https://docs.splunk.com/Documentation:Splunk:RESTAPI:RESTsearch:6.0beta
Row 263 Page: https://docs.splunk.com/Documentation:Splunk:RESTAPI:RESTsystem:4.2.2
Row 264 Page: https://docs.splunk.com/Documentation:Splunk:RESTAPI:RESTsystem:6.1beta
Row 265 Page: https://docs.splunk.com/Documentation:Splunk:RESTAPI:RESTusing:4.2.2
Row 266 Page: https://docs.splunk.com/Documentation:Splunk:RESTAPI:RESTusing:6.0beta
Row 267 Page: https://docs.splunk.com/Documentation:Splunk:RESTREF:RESTaccess:7.2.0
Row 268 Page: https://docs.splunk.com/Documentation:Splunk:RESTREF:RESTaccessExamples:6.3.1511
Row 269 Page: https://docs.splunk.com/Documentation:Splu

    HTTPError: 404 (http://dev.splunk.com/view/SP-CAAAEM9)
Row 341 Page: https://docs.splunk.com/Documentation:Splunkbase:Splunkbase:SubmitcontenttoSplunkbase:drafts
Row 342 Page: https://docs.splunk.com/Documentation:Splunkbase:Splunkbase:SubmitcontenttoSplunkbase:drafts
Row 343 Page: https://docs.splunk.com/Documentation:Splunkbase:Splunkbase:SubmitcontenttoSplunkbase:splunkbase
Row 344 Page: https://docs.splunk.com/Documentation:Splunkbase:Splunkbase:SubmitcontenttoSplunkbase:splunkbase
Row 345 Page: https://docs.splunk.com/Documentation:Splunkbase:Splunkbase:Versioncompatibility:splunkbase
    HTTPError: 404 (http://dev.splunk.com/view/app-cert/SP-CAAAE3H)
Row 346 Page: https://docs.splunk.com/Documentation:StyleGuide:StyleGuide:Quickreference:current
    HTTPError: 404 (https://dev.splunk.com/view/webframework-djangobindings/SP-CAAAEVE)
Row 347 Page: https://docs.splunk.com/Documentation:StyleGuide:StyleGuide:Quickreference:drafts
Row 348 Page: https://docs.splunk.com/Documentatio
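
Once out_file.csv has been written, a quick tally of the Status column shows how many links fell into each outcome. This is a minimal sketch under stated assumptions: `tally_statuses` is a hypothetical helper, it assumes the 'Topic', 'Status' layout written above, and the sample rows are made up rather than taken from a real run.

```python
import csv
import io
from collections import Counter


def tally_statuses(csv_file):
    """Count rows per outcome, ignoring the link in parentheses."""
    reader = csv.reader(csv_file)
    next(reader, None)  # skip the 'Topic', 'Status' header row
    counts = Counter()
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        # 'Ok (link)' -> 'Ok', '404 (link)' -> '404', and so on.
        outcome = row[1].split(' (', 1)[0]
        counts[outcome] += 1
    return counts


# Made-up rows in the same shape as out_file.csv, for illustration only.
sample = io.StringIO(
    'Topic,Status\n'
    'https://docs.example.com/page-a,Ok (https://dev.splunk.com/)\n'
    'https://docs.example.com/page-a,404 (http://dev.splunk.com/sdks)\n'
    'https://docs.example.com/page-b,Not mapped (http://dev.splunk.com/old)\n'
    'https://docs.example.com/page-c,404 On opening Ponydocs source\n'
    'https://docs.example.com/page-d,Ok (https://dev.splunk.com/)\n'
)
for outcome, count in tally_statuses(sample).items():
    print(outcome, count)
```

To run it against the real output, open the file with `open('out_file.csv', newline='')` and pass the file object to `tally_statuses`.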