No more argsparse. Re-did argument parsing and removed interactive choosing.
commit 83ec5d1ba1c5a31180100415a063d3d605d89aef 1 parent afb2024
@KenanY authored
Showing with 146 additions and 46 deletions.
  1. +1 −1  .travis.yml
  2. +31 −12 README.md
  3. +114 −33 domainitter.py
2  .travis.yml
@@ -2,6 +2,6 @@ language: python
python:
- 2.7
install: pip install pep8 --use-mirrors
-script: python domainitter.py 20 --keyboard
+script: python domainitter.py -c 1 -n 20
notifications:
email: false
43 README.md
@@ -10,27 +10,46 @@ Pastebin may **BAN YOUR IP ADDRESS** for using this script!
## About
-So, for whatever reason, Pastebin has something called [Domains Center][2]. You can use it to quickly look up a comprehensive report on a domain. This is pretty neat, but when you come to an outdated report, you need to click something to update it. This is a simple Python script which uses the list of [top one million sites according to Alexa.com][1] and submits them to [Pastebin's domains center][2]. It parses through the list and sends GET requests to `www.pastebin.com/domain_update.php` in order to update the the archive of domains automatically.
+So, for whatever reason, Pastebin has something called [Domains Center][2]. You
+can use it to quickly look up a comprehensive report on a domain. This is pretty
+neat, but when you come to an outdated report, you need to click a link to
+update it. To make this process a little easier, I made this simple
+Python script which uses the list of
+[top one million sites according to Alexa.com][1] and submits them to
+[Pastebin's domains center][2]. It parses through the list and sends GET
+requests to `pastebin.com/domain_update.php` in order to update the archive of
+domains automatically.
## Usage
-`domainitter.py` relies on `argparse` for parsing arguments. As such, you must use Python **2.7**. **This does not work with Python 3**... yet.
-
-1. Clone [the repository][4] to get the files
+ 1. Clone [the repository][4] to get the files
`git clone git://github.com/KenanY/domainitter.git`
-2. _Completely optional and virtually useless_: get an updated list of the top one million websites by downloading [Alexa.com's list][3] (~9.60 MB). You'll need to unzip the archive, then convert the list to a plaintext file (don't change its name, or you'll break stuff).
-3. Run `domainitter.py`, optionally with an integer as an argument. It will be treated as the maximum number of domains you want to submit. Example: `domainitty.py 20` will only submit 20 domains
-4. Make a choice of whether you want the script to choose websites randomly to submit, or start from the top sites then work its way down
-5. ????
-6. PROFIT!!! <small>(not guaranteed)</small>
+ 2. _Completely optional and virtually useless_: get an updated list of the top one million websites by downloading [Alexa.com's list][3] (~9.60 MB). You'll need to unzip the archive, then convert the list to a plaintext file (don't change its name, or you'll break stuff).
+ 3. Run `domainitter.py`, optionally with arguments (see below)
+ 4. ????
+ 5. PROFIT!!!
+
+## Arguments
+
+ * `-n <int>`: Pass an integer with this flag to limit the number of domains to submit before stopping.
+ * `-c <int>`: Pass `1` or `2` with this flag to choose between (`1`) submitting domains in random order or (`2`) submitting them in ranked order, starting from the most popular site.
## Contributing
-This is a really small project. You can easily hack on the `master` branch directly, so no need for complicated branches. Before you code a major change/feature (if there's such a thing possible for this script), open an issue in the [Issue Tracker][] so that we can discuss whether I'd be willing to accept such a change. This ensures that you don't go off wasting your time coding a feature I don't accept.
+This is a really small project. You can easily hack on the `master` branch
+directly, so no need for complicated branches. Before you code a major
+change/feature (if there's such a thing possible for this script), open an issue
+in the [Issue Tracker][] so that we can discuss whether I'd be willing to accept
+such a change. This ensures that you don't go off wasting your time coding a
+feature I don't accept.
-Also, please keep `domainitter.py` as [PEP 8][] compliant as possible. Use the [pep8 tool][] to check your code.
+Also, please keep `domainitter.py` as [PEP 8][] compliant as possible. Use the
+[pep8 tool][] to check your code.
-Tests are run on [Travis CI][]. Pretty much the only test is simply running the script itself, except with the argument `--keyboard` in order to skip the user input portion of the script. The current build status is displayed as a little image near the top of this README. You can check out all the builds [here][5].
+Tests are run on [Travis CI][]. Pretty much the only test is simply running the
+script itself with the arguments `-c 1 -n 20`, so that it submits 20 randomly
+chosen domains and then stops. The current build status is displayed as a little
+image near the top of this README. You can check out all the builds [here][5].
[1]: http://www.alexa.com/topsites
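For readers following the new command-line handling, here is a minimal standalone sketch of flag grouping, using the `-c 1 -n 20` invocation from `.travis.yml` as input. It is an illustration only, not the code in this commit: `group_flags` and `first_value` are hypothetical helper names, and the sketch is written for Python 3.

```python
from collections import OrderedDict


def group_flags(argv):
    """Group values under the flag that precedes them (hypothetical helper)."""
    groups = OrderedDict()
    groups['_'] = []              # values that appear before any flag
    current = None
    for arg in argv:
        if arg.startswith('-'):
            current = arg
            groups.setdefault(arg, [])
        elif current is not None:
            groups[current].append(arg)
        else:
            groups['_'].append(arg)
    return groups


def first_value(groups, flag, default=None):
    """Return the first value given for a flag, or a default if none was given."""
    values = groups.get(flag) or []
    return values[0] if values else default


groups = group_flags(['-c', '1', '-n', '20'])
limit = int(first_value(groups, '-n', 1000000))   # same default limit as the script
mode = first_value(groups, '-c', '1')             # '1' = random, '2' = ordered
print('limit=%d mode=%s' % (limit, mode))
```

Falling back to a default when a flag is present without a value keeps a bare `-n` from raising an `IndexError`.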
147 domainitter.py
@@ -2,20 +2,111 @@
# -*- coding: utf-8 -*-
"""
-* Domainitter - A Pastebin site auto-submitter
+* Domainitter A Pastebin Domains center auto-updater
*
* Licensed under the MIT:
* http://www.opensource.org/licenses/mit-license.php
*
* Copyright (c) 2011–2012, Kenan Yildirim
+*
+*
+*
+* ArgsList parsing
+* ================
+*
+* Copyright (c) 2012, Kenneth Reitz
+* All rights reserved.
+*
+* Redistribution and use in source and binary forms, with or without
+* modification, are permitted provided that the following conditions are met:
+*
+* Redistributions of source code must retain the above copyright notice, this
+* list of conditions and the following disclaimer.
+* Redistributions in binary form must reproduce the above copyright notice,
+* this list of conditions and the following disclaimer in the documentation
+* and/or other materials provided with the distribution.
+* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+* POSSIBILITY OF SUCH DAMAGE.
"""
-import argparse
-import os.path
+import os
import random
import re
+import sys
import urllib
from sys import exit
+from sys import argv
+from collections import OrderedDict
+
+
+if sys.version_info[0] == 3:
+ string_type = str
+else:
+ string_type = basestring
+
+
+def _is_collection(obj):
+ """Tests if an object is a collection. Strings don't count."""
+
+ if isinstance(obj, string_type):
+ return False
+
+ return hasattr(obj, '__getitem__')
+
+
+class ArgsList(object):
+ """CLI Argument management."""
+
+ def __init__(self, args=None, no_argv=False):
+ if not args:
+ if not no_argv:
+ self._args = argv[1:]
+ else:
+ self._args = []
+ else:
+ self._args = args
+
+ def __getitem__(self, i):
+ try:
+ return self.all[i]
+ except IndexError:
+ return None
+
+ @property
+ def grouped(self):
+ """Extracts --flag groups from argument list.
+ Returns {format: Args, ...}
+ """
+
+ collection = OrderedDict(_=ArgsList(no_argv=True))
+ _current_group = None
+
+ for arg in self.all:
+ if arg.startswith('-'):
+ _current_group = arg
+ collection.setdefault(arg, ArgsList(no_argv=True))
+ else:
+ if _current_group:
+ collection[_current_group]._args.append(arg)
+ else:
+ collection['_']._args.append(arg)
+
+ return collection
+
+ @property
+ def all(self):
+ """Returns all arguments."""
+
+ return self._args
def split_thousands(s, sep=','):
@@ -32,20 +123,9 @@ def submit_site(query):
def main():
- entry = None
- answer = None
- parser = argparse.ArgumentParser(
- description='''Updates domain records on
- Pastebin''',
- epilog='''Copyright (c) 2011–2012,
- Kenan Yildirim''')
- parser.add_argument('scraps', type=int,
- help='integer of how many domains you want to update',
- nargs='?', default=1000000)
- parser.add_argument('-k', '--keyboard',
- help='run the script without needing keyboard input.',
- action='store_true', default=False)
- args = parser.parse_args()
+ args = ArgsList()
+ gr = args.grouped
+ answer = '1'
print 'Parsing website list...'
if os.path.isfile('top-1m.txt'):
@@ -55,22 +135,23 @@ def main():
print 'Error: could not find website list!'
exit()
- if args.keyboard:
- answer = '1'
+ print 'Parsing arguments...'
+ if '-n' in gr:
+ maxScraps = gr['-n'][:1][0]
+ print 'Found custom limit: ' + maxScraps
else:
- print '''Done!\n
- Time to make a choice:
- [1] Randomly select which websites to submit
- [2] Start from the most popular site, then work downwards'''
- answer = raw_input("> ")
+ maxScraps = 1000000
+
+ if '-c' in gr:
+ answer = str(gr['-c'][:1][0])
- print '\n'
+ print 'Done parsing stuff.'
- if '1' in answer:
- print 'Randomly selecting sites!'
- print '=========='
+ if answer == '1':
+ print 'Random'
+ print '======'
scraps = 1
- while scraps <= int(args.scraps):
+ while scraps <= int(maxScraps):
entryLine = re.split("[\W]?",
entries[random.randint(1, 1000000)],
1)
@@ -80,12 +161,12 @@ def main():
split_thousands(entryLine[0]))
scraps += 1
exit()
- elif '2' in answer:
- print 'Starting from the top sites!'
- print '=========='
+ elif answer == '2':
+ print 'Ordered'
+ print '======='
for entry in entries:
entryLine = re.split("[\W]?", entry, 1)
- if int(entryLine[0]) <= int(args.scraps):
+ if int(entryLine[0]) <= int(maxScraps):
submit_site(entryLine[1])
print "[%s] %s" % (str(split_thousands(entryLine[0])),
entryLine[1])
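The selection logic in `main()` can also be sketched as a small standalone function. This is a hedged illustration rather than the author's implementation: `parse_entry` and `pick_domains` are hypothetical names, the `rank,domain` line format is an assumption about `top-1m.txt`, and the code targets Python 3. It compares the mode with `==` (string equality rather than identity) and uses `random.choice` instead of indexing with `random.randint(1, 1000000)`, whose upper bound is inclusive and can therefore fall one past the end of a 1,000,000-entry list.

```python
import random


def parse_entry(line):
    """Split an assumed 'rank,domain' line into (int rank, str domain)."""
    rank, domain = line.strip().split(',', 1)
    return int(rank), domain


def pick_domains(lines, limit, mode='1', rng=random):
    """Return the domains that would be submitted (hypothetical helper).

    mode '1': `limit` domains drawn at random, repeats allowed.
    mode '2': every domain whose rank is <= limit, in file order.
    """
    entries = [parse_entry(line) for line in lines if line.strip()]
    limit = int(limit)
    if mode == '1':
        return [rng.choice(entries)[1] for _ in range(limit)]
    if mode == '2':
        return [domain for rank, domain in entries if rank <= limit]
    raise ValueError('mode must be "1" or "2", got %r' % (mode,))


sample = ['1,example.com', '2,example.org', '3,example.net']
print(pick_domains(sample, 2, mode='2'))
```

Returning a list instead of submitting inside the loop keeps the network call (`submit_site`) separate, which makes the selection easy to test without contacting Pastebin.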