# Proxied auto voting using requests #

The Florida chapter of the American Institute of Architects is holding a 'People's Choice Awards' for buildings in Florida:

https://floridapeopleschoice.org/

One example building is here:

https://floridapeopleschoice.org/building.cfm?idsPCBuilding=263

Voting is via a simple 'I like this' checkbox on the page for each building.

**Q: can we auto vote for one of these buildings?**

Yes! Using Python requests!

## Contents ##

This notebook demonstrates how to make GET requests to vote in this simple competition via a rotating list of free HTTP proxies. 

It also incorporates a couple of additional obfuscation techniques: spoofing the User-Agent header, and using a geoip lookup so that votes are only sent via proxies in countries where it is currently a sensible hour of the day.

## Homework ##

If you're looking to extend this beyond just cleaning it up, how about adding Tor support:

- all requests, both HTTP and HTTPS, should be proxied via Tor:

https://medium.com/@jasonrigden/using-tor-with-the-python-request-library-79015b2606cb

- rather than rotating proxies, change the host's Tor identity between requests using stem or TorCtl:

https://stackoverflow.com/questions/9887505/how-to-change-tor-identity-in-python

## Import required modules ##

In [1]:
import requests
from random import randint
from time import sleep
from lxml.html import fromstring
from itertools import cycle
from geoip import geolite2
# note you will also need to install python-geoip-geolite2 to get the database itself
from datetime import datetime
import pytz

# initialise user agent database without using caching (this may take some time)
from fake_useragent import UserAgent
ua = UserAgent(cache=False)

## Get some proxies ##

When voting multiple times like this, one simple way to be detected is via your IP address. A single IP address is highly unlikely to cast many legitimate votes, so repeated votes from it are easy to spot and remove.

To counter this, we send our requests through multiple proxy servers. This makes it look like the votes are coming from all over the world.

Rather than pay for some high quality proxies, we can retrieve a list of free ones from https://free-proxy-list.net/ 

The quality of these varies greatly: most do not support HTTPS and very few are classified as 'elite' (i.e. actually anonymous), but they are fine for the purposes of a demonstration.

We can scrape the IPs and ports for the proxies using XPath expressions:

In [2]:
# function to retrieve free proxy info from free-proxy-list.net
def get_proxies():
    # define proxy list url
    url = 'https://free-proxy-list.net/'
    
    # initialise output list
    proxies = []
    
    # retrieve random user agent based on browser stats
    headers = {'User-Agent': ua.random}

    # scrape the proxy data and parse
    # (headers must be passed by keyword; the second positional argument of
    # requests.get is params, i.e. the query string)
    response = requests.get(url, headers=headers)
    parser = fromstring(response.text)
    # locate the proxy data table in the HTML
    for row in parser.xpath('//tbody/tr'):
        # fields we want: ip, port, https flag, type (transparent, anonymous or elite)
        # these are in columns (td) 1, 2, 7 and 5 respectively
        ip = row.xpath('.//td[1]/text()')[0]
        port = row.xpath('.//td[2]/text()')[0]
        proxies.append({"ip": ip,
                        "port": port,
                        "ip_port": ":".join([ip, port]),
                        "https": row.xpath('.//td[7]/text()')[0],
                        "type": row.xpath('.//td[5]/text()')[0]})
    # return the list of proxy dicts
    return proxies
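
To see how the positional `td[n]` lookups map onto table columns, here is a minimal offline sketch. It is an illustration only: the table snippet is invented (the addresses come from the documentation-only range 192.0.2.0/24), `parse_rows` is a hypothetical helper, and it uses the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset, in place of `lxml` so that it runs without a network connection.

```python
import xml.etree.ElementTree as ET

# Invented sample markup shaped like a proxy table (not real proxies).
SAMPLE = """
<table>
  <tbody>
    <tr><td>192.0.2.10</td><td>8080</td><td>XX</td><td>Nowhere</td>
        <td>anonymous</td><td>no</td><td>yes</td><td>1 minute ago</td></tr>
    <tr><td>192.0.2.20</td><td>3128</td><td>YY</td><td>Elsewhere</td>
        <td>transparent</td><td>no</td><td>no</td><td>2 minutes ago</td></tr>
  </tbody>
</table>
"""

def parse_rows(markup):
    """Return one dict per table row, mirroring the fields used in this notebook."""
    root = ET.fromstring(markup)
    rows = []
    for tr in root.findall('.//tbody/tr'):
        # XPath positions are 1-based: td[1] is the first cell in the row
        ip = tr.find('td[1]').text
        port = tr.find('td[2]').text
        rows.append({"ip": ip,
                     "port": port,
                     "ip_port": ip + ":" + port,
                     "https": tr.find('td[7]').text,
                     "type": tr.find('td[5]').text})
    return rows

rows = parse_rows(SAMPLE)
print(rows[0]["ip_port"], rows[0]["type"], rows[0]["https"])
```

Real-world HTML is rarely well-formed XML, which is why the notebook itself parses the page with `lxml.html`.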

In [3]:
# retrieve raw proxy data
proxies = get_proxies()

Voting from all over the world is great, but if we're proxying votes via countries where it is the middle of the night, that will also be easily detected. 

To counter this, we first enrich our proxy list by identifying which timezone each proxy resides in, using a geoip (IP -> Location) lookup:

In [4]:
# add timezones to each proxy where possible using geoip
for proxy in proxies:
    # look up geoip info for each proxy using MaxMind's free Geolite DB
    match = geolite2.lookup(proxy['ip'])
    # if we are able to identify a timezone for the proxy add it
    # (guard against both a missing value and the literal string 'None')
    if match and match.timezone and match.timezone != 'None':
        proxy.update({"tz": match.timezone})
    # otherwise, flag as missing
    else:
        proxy.update({"tz": None})

In [5]:
# check the output
proxies[0]

{'https': 'no',
 'ip': '210.16.84.102',
 'ip_port': '210.16.84.102:81',
 'port': '81',
 'type': 'transparent',
 'tz': None}

In [6]:
# create a new proxy list only containing those with a resolved timezone
tz_proxies = [x for x in proxies if x['tz']]

## Set proxy anonymity ##

Before we start, time to set some important parameters. First, just how anonymous do we want our proxies to be?

Not all proxies are created equal - they come in 3 flavours:

- **transparent** proxies send the HTTP_X_FORWARDED_FOR and HTTP_VIA headers, meaning they forward on your actual IP address and flag the request as coming from a proxy
- **anonymous** proxies still send HTTP_VIA, alerting the recipient you are using a proxy, but do not send your real IP address with requests
- **elite** proxies only send the REMOTE_ADDR header with everything else being blank. You appear to be a user in the same country as the proxy
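
As a rough illustration of the three flavours above, this is how a receiving server might tell them apart from the request headers alone. It is a sketch under assumptions: `classify_proxy` is a hypothetical helper, the header names `X-Forwarded-For` and `Via` are the wire-level forms of the CGI-style `HTTP_X_FORWARDED_FOR` and `HTTP_VIA` names, and real proxies may add other revealing headers that this does not check.

```python
def classify_proxy(headers):
    """Return 'transparent', 'anonymous' or 'elite' for a dict of request headers."""
    # HTTP header names are case-insensitive, so normalise before looking them up
    names = {name.lower() for name in headers}
    if 'x-forwarded-for' in names:
        # the client's real IP address is being forwarded
        return 'transparent'
    if 'via' in names:
        # the request is flagged as proxied, but the real IP is withheld
        return 'anonymous'
    # nothing in the headers reveals that a proxy is involved
    return 'elite'

print(classify_proxy({'Via': '1.1 example-proxy', 'X-Forwarded-For': '203.0.113.7'}))
print(classify_proxy({'Via': '1.1 example-proxy'}))
print(classify_proxy({'User-Agent': 'Mozilla/5.0'}))
```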

Transparent proxies are plentiful but largely useless for anonymity purposes, whereas there are very few free elite proxies around, so we need to balance the need for privacy against the number of locations we can send from.

Only some proxies support sending requests over HTTPS, so this is an additional restriction to consider:

In [7]:
# minimum proxy privacy level (transparent, anonymous or elite)
min_proxy_level = 'anonymous'

# require HTTPS?
require_https = False

In [8]:
# process proxies and check numbers
if min_proxy_level == 'elite':
    priv_proxies = [x for x in tz_proxies if x['type'] == 'elite']
elif min_proxy_level == 'anonymous':
    priv_proxies = [x for x in tz_proxies if x['type'] == 'elite' 
                                          or x['type'] == 'anonymous']
else:
    priv_proxies = tz_proxies
    
if require_https:
    # the 'https' field is the string 'yes' or 'no', so compare explicitly
    # (a bare truthiness test would accept both values)
    final_proxies = [x for x in priv_proxies if x['https'] == 'yes']
else:
    final_proxies = priv_proxies

# IMPORTANT
# check you are happy with the final number of proxies matching your requirements!
print('Number of available proxies matching requirements = %d' % len(final_proxies))

Number of available proxies matching requirements = 15
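
The privacy-level filter can also be written with an explicit ranking instead of an if/elif chain. This is only a sketch of one alternative: `LEVEL_RANK` and `filter_proxies` are hypothetical names, and the sample records are invented. One behavioural difference is that an unrecognised `type` string is always dropped here, even at the `transparent` level.

```python
# Higher rank means more private.
LEVEL_RANK = {'transparent': 0, 'anonymous': 1, 'elite': 2}

def filter_proxies(proxies, min_level='anonymous', require_https=False):
    """Keep proxies at or above min_level, optionally requiring HTTPS support."""
    min_rank = LEVEL_RANK[min_level]
    kept = []
    for proxy in proxies:
        # unknown type strings rank below everything and are dropped
        if LEVEL_RANK.get(proxy['type'], -1) < min_rank:
            continue
        # the 'https' field is the string 'yes' or 'no'
        if require_https and proxy['https'] != 'yes':
            continue
        kept.append(proxy)
    return kept

sample = [
    {'ip_port': '192.0.2.1:80', 'type': 'transparent', 'https': 'no'},
    {'ip_port': '192.0.2.2:8080', 'type': 'anonymous', 'https': 'no'},
    {'ip_port': '192.0.2.3:3128', 'type': 'elite', 'https': 'yes'},
]

print(len(filter_proxies(sample, min_level='anonymous')))
print(len(filter_proxies(sample, min_level='anonymous', require_https=True)))
```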


In [9]:
# set up round robin proxy list (HTTP for now) using cycle
proxy_pool = cycle(final_proxies)
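
For reference, a small sketch of how `itertools.cycle` behaves as a round-robin pool. The pool entries here are placeholder strings rather than real proxy records.

```python
from itertools import cycle, islice

# cycle() yields the items in order and starts again from the first one
# once the list is exhausted
pool = cycle(['proxy-a', 'proxy-b', 'proxy-c'])
first_five = list(islice(pool, 5))
print(first_five)  # ['proxy-a', 'proxy-b', 'proxy-c', 'proxy-a', 'proxy-b']
```

Note that calling `next()` on a cycle built from an empty list raises `StopIteration`, which is one more reason to check the number of matching proxies before continuing.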

## Set voting hours and delay time ##

In [10]:
# if the current time at the proxy is outside these hours, we'll skip it for realism
# here we use 10am to 8pm
vote_start = 10
vote_end = 20

# between votes we wait a random amount of time, from 1 to max_delay seconds
max_delay = 10
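
A minimal sketch of the voting-hours check, using only the standard library so the result is reproducible. The assumptions are: `in_voting_hours` and `local_hour` are hypothetical helpers, the window is treated as half-open (10:00 up to but not including 20:00, which is one reading of "10am to 8pm"), and fixed UTC offsets stand in for the named timezones that `pytz` resolves, so daylight saving time is ignored. It relies on `datetime.timezone`, which requires Python 3.

```python
from datetime import datetime, timedelta, timezone

def in_voting_hours(hour, start=10, end=20):
    """True if start <= hour < end."""
    return start <= hour < end

def local_hour(utc_moment, utc_offset_hours):
    """Hour of day at a fixed UTC offset for a timezone-aware datetime."""
    local = utc_moment.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return local.hour

# a fixed reference moment: 12:00 UTC
noon_utc = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)

for offset in (9, -5, 1):
    hour = local_hour(noon_utc, offset)
    print(offset, hour, in_voting_hours(hour))
# 9 21 False
# -5 7 False
# 1 13 True
```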

## Do some voting! ##

The fun part! Here we use a simple while loop to make the GET requests we need to trigger a vote.

A quick summary of what is going to happen here:
- Before each vote we cycle to the next proxy in our list
- We check to make sure the local time where the proxy is based falls within the voting hours we have set up
- We pick a random User-Agent to use, based on browser popularity stats
- We add HTTP (and HTTPS if supported) proxy info to the request
- We vote via the proxy using a GET request
- If successful, we wait a random interval before moving onto the next proxy and trying again
- If not, we wait 30 seconds and try again

**Let's go!**

In [11]:
# voting is via a GET request being made to here:
url = "https://floridapeopleschoice.org/_vote.cfm?idsPCBuilding=263"

In [None]:
# set up counters
success_count = 0
fail_count = 0

# loop until we fail to vote 10 times
while fail_count < 10:
    # set up proxy
    proxy = next(proxy_pool)
    # check proxy has a timezone, skip if not 
    # TODO: MOVE THIS TO THE ENRICHMENT STAGE AND DISCARD NONES
    if proxy['tz']:
        tz = pytz.timezone(proxy['tz'])
        proxy_hour = datetime.now(tz).hour
        # check the time at the proxy location is within voting hours
        # (vote_end is exclusive, so with vote_end = 20 we stop at 8pm)
        if proxy_hour < vote_start or proxy_hour >= vote_end:
            print('Skipping proxy %s for now as it is %d in %s' % (proxy['ip'], proxy_hour, tz))
            continue
    else:
        continue
    # set up proxy for request (recent versions of requests expect a scheme)
    p = {"http": "http://" + proxy['ip_port']}
    # if https is supported, add it
    # requests picks the proxy entry by URL scheme, so an https:// URL is
    # only proxied when an "https" entry is present
    if proxy['https'] == 'yes':
        p.update({"https": "http://" + proxy['ip_port']})
    # set random user agent based on browser usage stats
    headers = {'User-Agent': ua.random}
    # make voting request; free proxies often hang or refuse connections,
    # so use a timeout and treat any request error as a failed vote
    try:
        response = requests.get(url, headers=headers, proxies=p, timeout=30)
        voted = response.status_code == 200
    except requests.exceptions.RequestException:
        voted = False
    # check whether the vote went through
    if voted:
        success_count += 1
        # add random delay before next vote
        delay = randint(1, max_delay)
        print("Success using %s in %s! That's %d votes so far - waiting %d seconds" % (proxy['ip_port'],
                                                                                       proxy['tz'],
                                                                                       success_count,
                                                                                       delay))
        sleep(delay)
    else:
        # count the failure so the loop can eventually terminate
        fail_count += 1
        print("Failed, that's %d times...waiting 30 seconds" % fail_count)
        sleep(30)

print("Reached 10 fails, stopping...for now")