If you've read much penetration testing literature, you've probably come across Peter Kim's pentesting walkthrough, The Hacker Playbook: Practical Guide To Penetration Testing.
In the book, Peter (Mr. Kim) discusses a short script he uses to scrape XSS payloads from r/xss, a subreddit devoted to publishing and discussing XSS vulnerabilities. Here's the script, for reference.
```python
#!/usr/bin/env python
# Reddit XSS
# Author: Cheetz
import re
import urllib2


class Lookup:
    def run(self, url):
        # Fetch the page, spoofing a browser User-Agent so Reddit serves it
        request = urllib2.Request(url)
        request.add_header('User-Agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 (.NET CLR 3.5.30729)')
        response = urllib2.urlopen(request)
        resolve_response = response.read()
        self.regex(resolve_response)

    def regex(self, resolve_response):
        output = open("output_xss.txt", 'a')
        # Grab every outbound href on the page...
        n = re.compile(r'href=\"http.*?>', re.IGNORECASE)
        result = n.findall(resolve_response)
        for a in result:
            # ...keeping only the links that point somewhere other than Reddit
            if "reddit" not in a:
                a = a.replace('href="', "")
                a = a.split('"')[0]
                output.write(a + '\n')
        output.close()
        # Pull the count/after parameters for the next page and recurse
        # until there are no more pages
        p = re.compile(r'count=(\d+)&after=(.*?)\"', re.IGNORECASE)
        link = p.findall(resolve_response)
        if link:
            next_string = "http://www.reddit.com/r/xss/?count=" + link[0][0] + "&after=" + link[0][1]
            self.run(next_string)


if __name__ == '__main__':
    url = "http://www.reddit.com/r/xss"
    app = Lookup()
    app.run(url)
```

And here's a snippet of what the script outputs if you run it: a pretty substantial list of XSS payloads downloaded to a file named `output_xss.txt`.
```
http://h30499.www3.hp.com/t5/Fortify-Application-Security/XSS-and-App-Security-through-HTML5-s-PostMessage/ba-p/6515002
http://nahamsec.com/2014/05/single-vulnerability-to-cause-stored-xss-in-yahoo-flickr-google-twitter-amazon-youtube/
http://h30499.www3.hp.com/t5/Fortify-Application-Security/XSS-Beyond-the-Alert-Box/ba-p/6491366
http://john.com/login.php?id=%27;alert%28String.fromCharCode%2871,117,105,100,111,90,32,88,83,83%29%29//\%27;alert%28String.fromCharCode%2871,117,105,100,111,90,32,88,83,83%29%29//%22;alert%28String.fromCharCode%2871,117,105,100,111,90,32,88,83,83%29%29//\%22;alert%28String.fromCharCode%2871,117,105,100,111,90,32,88,83,83%29%29//--%3E%3C/SCRIPT%3E%22%3E%27%3E%3CSCRIPT%3Ealert%28String.fromCharCode%2871,117,105,100,111,90,32,88,83,83%29%29%3C/SCRIPT%3E
http://www.usatoday.com/story/tech/2014/05/01/microsoft-issues-internet-explorer-security-fix/8562737/
```
There's a lot of support in `python-shell` for doing things like fiddling with the script's input and output via stdout, changing how the data is encoded and transmitted, and so on (a rough sketch of that route follows below), but again, I'm so very tired. Is there any solution for a hardworkin' bit-jockey like myself, one that doesn't require a persistent database layer or any other extra complexity?
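For the record, the `python-shell` route would look roughly like the sketch below. It leans on the package's documented `PythonShell.run()` helper, so treat the option names and callback shape as an approximation rather than gospel:

```js
var PythonShell = require('python-shell');

// Spawn reddit.py as a child process and collect its stdout line by line.
PythonShell.run('reddit.py', { mode: 'text' }, function (err, results) {
  if (err) throw err;
  // results is an array of the lines the script printed while running
  console.log(results);
});
```

Workable, but it means spawning and babysitting a Python child process every single time the bot needs a payload.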
Actually, we already have a sort of database: the `output_xss.txt` file the `reddit.py` script creates and writes to every time it runs. If we can schedule when it runs, making sure the file is there before the Twitterbot attempts to draw from it, then we can read from and write to it directly. That sounds much more doable.
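The scheduling half of that plan can stay dirt simple. Here's one rough sketch, and it is only a sketch: shelling out to the scraper on a timer via `child_process`, with the interval picked out of thin air:

```js
var exec = require('child_process').exec;

// Re-run the scraper periodically so output_xss.txt exists (and is fresh)
// before the bot ever tries to read from it.
setInterval(function () {
  exec('python reddit.py', function (err, stdout, stderr) {
    if (err) {
      return console.error('scrape failed:', err);
    }
    // At this point output_xss.txt has been updated with the latest links.
  });
}, 60 * 60 * 1000); // hourly, but any interval works
```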
We already have the code to fetch the links and build the `output_xss.txt` file, courtesy of Special K; now we just need to read the file, tweet its contents, and schedule the whole mess.
Reading the file is simple with Node's built-in `fs` module. Looking through the Node API documentation, we see that something along these lines will open our file and log its contents:
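```js
var fs = require('fs');

// Read the whole payload list into memory as a string and log it.
fs.readFile('output_xss.txt', 'utf8', function (err, data) {
  if (err) throw err;
  console.log(data);
});
```

Note that `readFile` is asynchronous, so anything the bot eventually does with the payload list (picking a line, tweeting it) has to happen inside, or downstream of, that callback.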