# Query by Donor #

## Overview ##

Explore the FEC data by specifying SQL predicates that identify **Donors**: sets of Individual (`indiv` table) records deemed (e.g. conjectured or asserted) to represent the same real-world person.  The advantage of querying by Donor rather than by Individual is that contributions from distinct people can be told apart within a result set (to the degree that the Donor mappings are accurate).
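As a rough illustration of the difference between the two domains, the sketch below aggregates a handful of made-up contribution records first by Individual and then by Donor. The record layout, names, IDs, and amounts are all hypothetical and are not taken from the FEC data.

```python
from collections import defaultdict

# Hypothetical records: (indiv_id, name, donor_id, amount).
# Three name variants are assumed to map to a single Donor (101).
contribs = [
    (101, "DOE, JANE", 101, 500.0),
    (102, "DOE, JANE A", 101, 250.0),
    (103, "DOE, JANE MS.", 101, 1000.0),
    (201, "DOE, JOHN", 201, 300.0),
]

by_indiv = defaultdict(float)
by_donor = defaultdict(float)
for indiv_id, _name, donor_id, amount in contribs:
    by_indiv[indiv_id] += amount   # one bucket per Individual record
    by_donor[donor_id] += amount   # one bucket per real-world person

print(len(by_indiv), len(by_donor))  # 4 2
print(by_donor[101])                 # 1750.0
```

Grouping by Individual yields four separate totals, while grouping by Donor yields one total per person, which is the view of the data the rest of this notebook works toward.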

This approach will create the following query contexts:

* `ctx_donor`
* `ctx_indiv`
* `ctx_contrib`

## Notebook Setup ##

* Configure database connection information and options
* Clear any potentially interfering context views (PostgreSQL doesn't let `create or replace view` change a view's existing column names, so we drop the views first)
* Set styling for notebook

In [1]:
sqlconnect = "postgresql+psycopg2://crash@localhost/fecdb"

%load_ext sql
%config SqlMagic.autopandas=True
%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'
%sql $sqlconnect

'Connected: crash@fecdb'

In [2]:
%sql drop view if exists ctx_contrib cascade
%sql drop view if exists ctx_indiv cascade
%sql drop view if exists ctx_donor cascade

 * postgresql+psycopg2://crash@localhost/fecdb
Done.
 * postgresql+psycopg2://crash@localhost/fecdb
Done.
 * postgresql+psycopg2://crash@localhost/fecdb
Done.


In [3]:
%%html
<style>
  tr, th, td {
    text-align: left !important;
  }
</style>

## Create Donor Identities ##

Since the single-donor case is pretty straightforward, let's go with a multi-donor example here.  We'll create Donor identities for each of the people we have identified in the household examined in `el_queries1.sql` and `el_queries3.sql`.

First, we identify the primary donor in the household.

In [4]:
%%sql result <<
with indiv_set as (
    select i.*
      from indiv i
     where i.name like 'SANDELL, SCOTT%'
       and i.zip_code ~ '9402[58]'
       and i.name !~ 'MRS\.'
)
select set_donor_indiv('indiv', array_agg(id)) as donor_indiv_id
  from indiv_set

 * postgresql+psycopg2://crash@localhost/fecdb
1 rows affected.
Returning data to local variable result


In [5]:
donor_indiv_id1 = int(result.iloc[0, 0])

10527429

Next, we identify the other donor (or remaining donors) in the household by exclusion.

In [6]:
%%sql result <<
with indiv_set as (
    select i.*
      from indiv i
     where i.name like 'SANDELL, %'
       and i.zip_code ~ '9402[58]'
       and coalesce(i.donor_indiv_id, 0) != :donor_indiv_id1
)
select set_donor_indiv('indiv', array_agg(id)) as donor_indiv_id
  from indiv_set

 * postgresql+psycopg2://crash@localhost/fecdb
1 rows affected.
Returning data to local variable result


In [7]:
donor_indiv_id2 = int(result.iloc[0, 0])

10527363
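
The `coalesce(i.donor_indiv_id, 0)` in the exclusion predicate is worth a note: in SQL, comparing NULL with `!=` yields NULL rather than true, so rows with no Donor assignment would be silently dropped without it. The sketch below demonstrates this using Python's built-in `sqlite3` module as an in-memory stand-in for PostgreSQL (the NULL comparison rules are the same). The table contents are hypothetical, and the sketch assumes that unassigned Individual records carry a NULL `donor_indiv_id` and that 0 is never a real ID.

```python
import sqlite3

# Hypothetical, simplified stand-in for the `indiv` table
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table indiv (id integer primary key, name text, donor_indiv_id integer)"
)
conn.executemany(
    "insert into indiv values (?, ?, ?)",
    [
        (1, "DOE, JOHN", 1),         # already mapped to donor 1
        (2, "DOE, JOHN MR.", 1),     # already mapped to donor 1
        (3, "DOE, JANE", None),      # not yet mapped to any donor
        (4, "DOE, JANE MS.", None),  # not yet mapped to any donor
    ],
)
params = {"donor_id": 1}

# Without coalesce: NULL != 1 evaluates to NULL, so unmapped rows are filtered out
naive = conn.execute(
    "select id from indiv where donor_indiv_id != :donor_id order by id", params
).fetchall()

# With coalesce: NULL becomes 0, and 0 != 1 is true, so unmapped rows are kept
guarded = conn.execute(
    "select id from indiv where coalesce(donor_indiv_id, 0) != :donor_id order by id",
    params,
).fetchall()

print(naive)    # []
print(guarded)  # [(3,), (4,)]
```

In PostgreSQL, `i.donor_indiv_id is distinct from :donor_indiv_id1` would express the same intent without relying on a sentinel value.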

## Create Context Views ##

### Create `ctx_donor` ###

Now we set the query context to be the combination of the two Donor identities just created (identified by `donor_indiv_id`).

In [8]:
%%sql
create or replace view ctx_donor as
select d.*
  from donor_indiv d
 where d.id in (:donor_indiv_id1, :donor_indiv_id2)

 * postgresql+psycopg2://crash@localhost/fecdb
Done.


Let's take a quick look at the context before proceeding.  Note: even though these records are coming from the `indiv` table, we really consider them to be coming from the `donor_indiv` view, thus in the **Donor** domain (and not the **Individual** domain).

In [9]:
%%sql
select id,
       name,
       city,
       state,
       zip_code,
       elect_cycles
  from ctx_donor

 * postgresql+psycopg2://crash@localhost/fecdb
2 rows affected.


Unnamed: 0,id,name,city,state,zip_code,elect_cycles
0,10527363,"SANDELL, JENNIFER",MENLO PARK,CA,94025,"[2004, 2006, 2008, 2010]"
1,10527429,"SANDELL, SCOTT",MENLO PARK,CA,94025,"[2000, 2008, 2010, 2012, 2016]"


### Create `ctx_indiv` ###

In [10]:
%%sql
create or replace view ctx_indiv as
select i.*
  from ctx_donor dx
  join indiv i on i.donor_indiv_id = dx.id

 * postgresql+psycopg2://crash@localhost/fecdb
Done.


And visually inspect...

In [11]:
%%sql
select id,
       name,
       city,
       state,
       zip_code,
       elect_cycles,
       donor_indiv_id
  from ctx_indiv
 order by donor_indiv_id

 * postgresql+psycopg2://crash@localhost/fecdb
27 rows affected.


Unnamed: 0,id,name,city,state,zip_code,elect_cycles,donor_indiv_id
0,10527365,"SANDELL, JENNIFER",PORTOLA VALLEY,CA,94028,[2018],10527363
1,10527366,"SANDELL, JENNIFER",PORTOLA VALLEY,CA,940287608,"[2016, 2018, 2020]",10527363
2,10527364,"SANDELL, JENNIFER",MENLO PARK,CA,940250,[2004],10527363
3,10527370,"SANDELL, JENNIFER AYER",MENLO PARK,CA,94025,"[2004, 2010]",10527363
4,10527371,"SANDELL, JENNIFER MS.",MENLO PARK,CA,94025,[2004],10527363
5,10527368,"SANDELL, JENNIFER A",MENLO PARK,CA,94025,"[2006, 2008]",10527363
6,10527363,"SANDELL, JENNIFER",MENLO PARK,CA,94025,"[2004, 2006, 2008, 2010]",10527363
7,10527369,"SANDELL, JENNIFER A MS.",MENLO PARK,CA,94025,[2004],10527363
8,10527447,"SANDELL, SCOTT MRS.",MENLO PARK,CA,94025,[2004],10527363
9,10527442,"SANDELL, SCOTT MR.",MENLO PARK,CA,94025,"[2000, 2002, 2004, 2006]",10527429


### Create `ctx_contrib` ###

Note that we are adding `donor_indiv_id` to this view (on top of the `indiv_contrib` columns) so that queries using this context view are able to join to and/or group by the underlying Donor record (and not just the Individual associated with the contribution record), as shown in the second validation query below.

In [12]:
%%sql
create or replace view ctx_contrib as
select ic.*,
       ix.donor_indiv_id
  from ctx_indiv ix
  join indiv_contrib ic on ic.indiv_id = ix.id

 * postgresql+psycopg2://crash@localhost/fecdb
Done.
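
To make the layering of the three context views easier to see in isolation, here is a minimal, self-contained sketch using Python's built-in `sqlite3` module with an in-memory database. The table definitions and data are hypothetical and heavily simplified, and `donor_indiv` is assumed here to be a view of the `indiv` rows whose `id` equals their own `donor_indiv_id`; the actual definition in the FEC schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified versions of the underlying tables
    create table indiv (id integer primary key, name text, donor_indiv_id integer);
    create table indiv_contrib (indiv_id integer, elect_cycle integer, transaction_amt real);

    insert into indiv values
        (1, 'DOE, JOHN', 1), (2, 'DOE, JOHN MR.', 1),
        (3, 'DOE, JANE', 3), (4, 'SMITH, ANN', null);
    insert into indiv_contrib values
        (1, 2016, 500), (2, 2016, 250), (2, 2018, 1000),
        (3, 2018, 100), (4, 2018, 75);

    -- Assumed definition: a Donor is the Individual row that maps to itself
    create view donor_indiv as
        select i.* from indiv i where i.id = i.donor_indiv_id;

    -- Layer 1: the selected Donors
    create view ctx_donor as
        select d.* from donor_indiv d where d.id in (1, 3);

    -- Layer 2: every Individual record mapped to those Donors
    create view ctx_indiv as
        select i.* from ctx_donor dx join indiv i on i.donor_indiv_id = dx.id;

    -- Layer 3: their contributions, tagged with the Donor id
    create view ctx_contrib as
        select ic.*, ix.donor_indiv_id
          from ctx_indiv ix join indiv_contrib ic on ic.indiv_id = ix.id;
""")

rows = conn.execute("""
    select d.name, count(*), sum(cx.transaction_amt)
      from ctx_contrib cx
      join donor_indiv d on d.id = cx.donor_indiv_id
     group by d.id, d.name
     order by d.id
""").fetchall()
print(rows)  # [('DOE, JOHN', 3, 1750.0), ('DOE, JANE', 1, 100.0)]
```

Because `donor_indiv_id` travels with each contribution row, the final query can roll up the two `DOE, JOHN` name variants under a single Donor without rejoining through `indiv`.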


In [13]:
%%sql
select count(*)             as contribs,
       sum(transaction_amt) as total_amt,
       array_agg(distinct elect_cycle) as elect_cycles
  from ctx_contrib

 * postgresql+psycopg2://crash@localhost/fecdb
1 rows affected.


Unnamed: 0,contribs,total_amt,elect_cycles
0,101,264450.0,"[2000, 2002, 2004, 2006, 2008, 2010, 2012, 201..."


In [14]:
%%sql
select d.id                 as donor_id,
       d.name               as donor_name,
       count(*)             as contribs,
       sum(transaction_amt) as total_amt,
       array_agg(distinct elect_cycle) as elect_cycles
  from ctx_contrib cx
  join donor_indiv d on d.id = cx.donor_indiv_id
 group by 1, 2

 * postgresql+psycopg2://crash@localhost/fecdb
2 rows affected.


Unnamed: 0,donor_id,donor_name,contribs,total_amt,elect_cycles
0,10527363,"SANDELL, JENNIFER",28,37200.0,"[2004, 2006, 2008, 2010, 2016, 2018, 2020]"
1,10527429,"SANDELL, SCOTT",73,227250.0,"[2000, 2002, 2004, 2006, 2008, 2010, 2012, 201..."


## Query Based on Context ##

### Query using `ctx_donor` ###

In [15]:
%%sql
select ic.elect_cycle,
       count(*) cycle_contribs,
       sum(ic.transaction_amt) cycle_amount,
       round(avg(ic.transaction_amt), 2) avg_amount,
       min(ic.transaction_amt) min_amount,
       max(ic.transaction_amt) max_amount
  from ctx_donor dx
  join indiv i on i.donor_indiv_id = dx.id
  join indiv_contrib ic on ic.indiv_id = i.id
 group by 1
 order by 1

 * postgresql+psycopg2://crash@localhost/fecdb
11 rows affected.


Unnamed: 0,elect_cycle,cycle_contribs,cycle_amount,avg_amount,min_amount,max_amount
0,2000,4,2000.0,500.0,250.0,1000.0
1,2002,3,5800.0,1933.33,1400.0,2500.0
2,2004,15,17400.0,1160.0,250.0,2500.0
3,2006,6,9350.0,1558.33,1000.0,2500.0
4,2008,17,17200.0,1011.76,-2300.0,2300.0
5,2010,11,20750.0,1886.36,1000.0,5000.0
6,2012,4,3650.0,912.5,500.0,1175.0
7,2014,1,2500.0,2500.0,2500.0,2500.0
8,2016,24,88200.0,3675.0,-2500.0,20000.0
9,2018,12,86000.0,7166.67,2500.0,20000.0


### Query using `ctx_indiv` ###

In [16]:
%%sql
select ic.elect_cycle,
       count(*) cycle_contribs,
       sum(ic.transaction_amt) cycle_amount,
       round(avg(ic.transaction_amt), 2) avg_amount,
       min(ic.transaction_amt) min_amount,
       max(ic.transaction_amt) max_amount
  from ctx_indiv ix
  join indiv_contrib ic on ic.indiv_id = ix.id
 group by 1
 order by 1

 * postgresql+psycopg2://crash@localhost/fecdb
11 rows affected.


Unnamed: 0,elect_cycle,cycle_contribs,cycle_amount,avg_amount,min_amount,max_amount
0,2000,4,2000.0,500.0,250.0,1000.0
1,2002,3,5800.0,1933.33,1400.0,2500.0
2,2004,15,17400.0,1160.0,250.0,2500.0
3,2006,6,9350.0,1558.33,1000.0,2500.0
4,2008,17,17200.0,1011.76,-2300.0,2300.0
5,2010,11,20750.0,1886.36,1000.0,5000.0
6,2012,4,3650.0,912.5,500.0,1175.0
7,2014,1,2500.0,2500.0,2500.0,2500.0
8,2016,24,88200.0,3675.0,-2500.0,20000.0
9,2018,12,86000.0,7166.67,2500.0,20000.0


### Query using `ctx_contrib` ###

In [17]:
%%sql
select cx.elect_cycle,
       count(*) cycle_contribs,
       sum(cx.transaction_amt) cycle_amount,
       round(avg(cx.transaction_amt), 2) avg_amount,
       min(cx.transaction_amt) min_amount,
       max(cx.transaction_amt) max_amount
  from ctx_contrib cx
 group by 1
 order by 1

 * postgresql+psycopg2://crash@localhost/fecdb
11 rows affected.


Unnamed: 0,elect_cycle,cycle_contribs,cycle_amount,avg_amount,min_amount,max_amount
0,2000,4,2000.0,500.0,250.0,1000.0
1,2002,3,5800.0,1933.33,1400.0,2500.0
2,2004,15,17400.0,1160.0,250.0,2500.0
3,2006,6,9350.0,1558.33,1000.0,2500.0
4,2008,17,17200.0,1011.76,-2300.0,2300.0
5,2010,11,20750.0,1886.36,1000.0,5000.0
6,2012,4,3650.0,912.5,500.0,1175.0
7,2014,1,2500.0,2500.0,2500.0,2500.0
8,2016,24,88200.0,3675.0,-2500.0,20000.0
9,2018,12,86000.0,7166.67,2500.0,20000.0


In [18]:
%%sql
select d.id as donor_id,
       d.name as donor_name,
       cx.elect_cycle,
       count(*) cycle_contribs,
       sum(cx.transaction_amt) cycle_amount,
       round(avg(cx.transaction_amt), 2) avg_amount,
       min(cx.transaction_amt) min_amount,
       max(cx.transaction_amt) max_amount
  from ctx_contrib cx
  join donor_indiv d on d.id = cx.donor_indiv_id
 group by 1, 2, 3
 order by 5 desc

 * postgresql+psycopg2://crash@localhost/fecdb
18 rows affected.


Unnamed: 0,donor_id,donor_name,elect_cycle,cycle_contribs,cycle_amount,avg_amount,min_amount,max_amount
0,10527429,"SANDELL, SCOTT",2016,21,83500.0,3976.19,-2500.0,20000.0
1,10527429,"SANDELL, SCOTT",2018,10,80600.0,8060.0,2500.0,20000.0
2,10527429,"SANDELL, SCOTT",2010,9,15950.0,1772.22,1000.0,5000.0
3,10527429,"SANDELL, SCOTT",2008,10,10650.0,1065.0,-2300.0,2300.0
4,10527429,"SANDELL, SCOTT",2020,3,10600.0,3533.33,2800.0,5000.0
5,10527363,"SANDELL, JENNIFER",2004,10,10250.0,1025.0,250.0,2500.0
6,10527429,"SANDELL, SCOTT",2004,5,7150.0,1430.0,500.0,2000.0
7,10527363,"SANDELL, JENNIFER",2008,7,6550.0,935.71,250.0,2300.0
8,10527429,"SANDELL, SCOTT",2002,3,5800.0,1933.33,1400.0,2500.0
9,10527363,"SANDELL, JENNIFER",2018,2,5400.0,2700.0,2700.0,2700.0
