# Insider Trading & Campaign Contributions

An exploration of possible relationships between the NY State Campaign Contribution [dataset](http://www1.nyc.gov/nyctp/download.htm) and the SEC Insider Trading [dataset](http://www.secform4.com/sec-filings.htm).

**Note** - *The SEC Insider Trading data was scraped from the web using [XML-to-csv.py](http://localhost:8888/edit/XML-to-csv.py).*

## Load data

In [1]:
import pandas as pd

### SEC Insider Trading

In [10]:
SEC_data = pd.read_csv('data/SEC_filings.csv')

In [13]:
print("Total records: {tot} Unique companies: {uni}".format(tot=len(SEC_data), uni=len(SEC_data.Company.unique())))

Total records: 155 Unique companies: 117


### Campaign Contribution

In [15]:
Campaign_data1 = pd.read_csv('data/dwd_FMS_Transactions_25000.csv')
Campaign_data2 = pd.read_csv('data/dwd_FMS_Transactions_5000_25000.csv')
Campaign_data3 = pd.read_csv('data/dwd_FMS_Transactions_5000.csv')

In [22]:
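# Check that all three files share the same columns, then concatenate them.
# Index.equals() compares the column labels; the original `.all` (without parentheses) was always truthy.
assert Campaign_data1.columns.equals(Campaign_data2.columns) and Campaign_data2.columns.equals(Campaign_data3.columns)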
Campaign_data = pd.concat([Campaign_data1, Campaign_data2, Campaign_data3])

In [23]:
print("Total records: {tot} Unique companies: {uni}".format(tot=len(Campaign_data), uni=len(Campaign_data.Org_Name.unique())))

Total records: 54587 Unique companies: 132


## First exploration

In [27]:
_check = 0
for c in SEC_data.Company.unique():
    for cc in Campaign_data.Org_Name.unique():
        if c in cc:
            print("SEC: {sec} <-> Campaign: {camp}".format(sec=c, camp=cc))
        else:
            _check += 1

In [29]:
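# Sanity check: all 117 x 132 (SEC company, campaign organization) pairs
# fell through to the else branch, so the substring test found no matches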
assert _check == 132 * 117

## Automate comparison
Let's automate this process:

In [103]:
# Generator that yields a (SEC name, campaign name) tuple for every match
def run_compare(comparator, datasets=None, verbose=False):
    _matches = 0
    
    # If no datasets were passed, use unique company names
    if not datasets:
        datasets = (SEC_data.Company.unique(), Campaign_data.Org_Name.unique())
    
    
    for c in datasets[0]:
        for cc in datasets[1]:
            
            is_match, token = comparator(c,cc)
            
            if is_match:
                _matches += 1
                if verbose:
                    print("MATCH [{tk}]: {c1} <-> {c2}".format(tk=token, c1=c, c2=cc))
                yield (c, cc)
    
    
    if _matches == 0:
        print("No matches found...")


Let's rerun the same simple comparison with our new `run_compare()`:

In [105]:
simple_comp = lambda x, y: (x in y, x if x in y else "")

comp = run_compare(simple_comp, verbose=True)
list(comp)

No matches found...


[]
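One likely reason for the empty result is letter case: judging by the match output further down, the SEC names are title-cased (`Iheartmedia Inc.`) while the campaign names are upper-cased (`AR KROPP LLC`), so a case-sensitive `in` test has little chance of matching. The sketch below illustrates the difference on a tiny sample. `ci_substring_comp` is a hypothetical helper, and the SEC name `Dell` is made up for illustration.

```python
def ci_substring_comp(x, y):
    """Hypothetical comparator: case-insensitive substring test."""
    x_low, y_low = x.lower(), y.lower()
    if x_low in y_low:
        return (True, x_low)
    return (False, "")


# Illustrative sample only: "Dell" is an invented SEC-style name
sec_names = ["Pc Connection Inc", "Dell"]
campaign_names = ["DELL MARKETING LP", "AR KROPP LLC"]

# Case-sensitive substring test, as in the first exploration
case_sensitive = [(s, c) for s in sec_names for c in campaign_names if s in c]

# Same pairs, compared after lowercasing both sides
case_insensitive = [(s, c) for s in sec_names for c in campaign_names
                    if ci_substring_comp(s, c)[0]]

print(case_sensitive)
print(case_insensitive)
```

Because it returns an `(is_match, token)` pair, a helper like this could be passed to `run_compare` in place of `simple_comp`.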

## Comparators

The comparator must return a two-element tuple or list: the first element is `True` or `False`, and the second is the matched token (the empty string `""` when there is no match).

Example:
`comparator("Abracadabra inc.","Abra LLC")` -> `(True, "abra")`
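As a minimal sketch of that contract, the hypothetical `first_word_comp` below compares only the lowercased first word of each name and reproduces the example above. It is an illustration of the return shape, not the comparator used in this notebook.

```python
def first_word_comp(term1, term2):
    """Hypothetical comparator: match when one first word contains the other."""
    words1 = term1.lower().split()
    words2 = term2.lower().split()

    # Empty names cannot match anything
    if not words1 or not words2:
        return (False, "")

    w1, w2 = words1[0], words2[0]
    if w2 in w1:
        return (True, w2)
    if w1 in w2:
        return (True, w1)

    # No match: False plus the empty-string token
    return (False, "")


result = first_word_comp("Abracadabra inc.", "Abra LLC")
no_match = first_word_comp("Netsol Technologies Inc", "AR KROPP LLC")
print(result)
print(no_match)
```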

In [112]:
def word_comp(term1, term2):
    # Common terms not to be considered matches
    common_terms = ['inc', 'inc.', 'corp', 'co', 'systems', 'fund', 'ltd']
    
    # Lowercase each name and split it into words, dropping common terms and single characters
    words1 = [w for w in term1.lower().split() if w not in common_terms and len(w) > 1]
    words2 = [w for w in term2.lower().split() if w not in common_terms and len(w) > 1]

    # A word from term1 that is a substring of any word in term2 counts as a match
    for w in words1:
        if any(w in x for x in words2):
            return (True, w)

    # Same check in the other direction
    for w in words2:
        if any(w in x for x in words1):
            return (True, w)
    
    return (False, "")

## Split words + case-insensitive compare

In [113]:
comp = run_compare(word_comp, verbose=True)
_comp = list(comp)

print(len(_comp))
# _comp

MATCH [ar]: Iheartmedia Inc. <-> AR KROPP LLC
MATCH [technologies]: Netsol Technologies Inc <-> DERIVE TECHNOLOGIES LLC
MATCH [net]: Netsol Technologies Inc <-> G NET CONSTRUCTION CORP ALLIANCE & SON CONSTRUCTIO
MATCH [technologies]: Netsol Technologies Inc <-> ACCELERATED TECHNOLOGIES OF NEW YORK, INC.
MATCH [on]: Consolidated Edison Inc <-> ARC ON 4TH STREET INC
MATCH [son]: Consolidated Edison Inc <-> G NET CONSTRUCTION CORP ALLIANCE & SON CONSTRUCTIO
MATCH [ar]: Village Super Market Inc <-> AR KROPP LLC
MATCH [super]: Village Super Market Inc <-> SUPERSTRUCTURES ENGINEERING ARCHITECTURE, PLLC
MATCH [la]: Village Super Market Inc <-> LA CHIANA REALTY INC
MATCH [market]: Village Super Market Inc <-> DELL MARKETING LP
MATCH [market]: Village Super Market Inc <-> F & H SUPPLY CO. U.S.MARKETING SERVICES CO.
MATCH [lage]: Village Super Market Inc <-> DE LAGE LANDEN OPERATIONAL SERVICES LLC
MATCH [pc]: Pc Connection Inc <-> IPC NEW YORK PROPERTIES LLC
MATCH [on]: Pc Connection Inc <-> ARC