# Define Donor Context &ndash; Multi-Person Use Case #

## Overview ##

Explore the FEC data by specifying SQL predicates that identify **Donors**.  A Donor is a set of Individual (`indiv` table) records deemed (e.g. conjectured or asserted) to represent the same real-world person.  The advantage of working with Donors rather than Individuals is that contributions from distinct people can be told apart within a result set (to the degree that the Donor mappings are accurate).
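
To make the Donor/Individual distinction concrete, here is a minimal pure-Python sketch.  The records, ids, and amounts are invented for illustration (they are not FEC data), and the only thing borrowed from this notebook is the idea of a `donor_indiv_id` that points each Individual record at its Donor.

```python
from collections import defaultdict

# Hypothetical Individual records: (indiv_id, name, donor_indiv_id)
indivs = [
    (101, "DOE, JANE", 101),
    (102, "DOE, JANE A", 101),
    (103, "DOE, JANE MS.", 101),
    (201, "DOE, JOHN", 201),
    (202, "DOE, JOHN MR.", 201),
]
# Hypothetical contributions: (indiv_id, amount)
contribs = [(101, 500.0), (102, 250.0), (103, 1000.0), (201, 2700.0), (202, 300.0)]

# Map each Individual record to the Donor it is deemed to represent
donor_of = {indiv_id: donor_id for indiv_id, _, donor_id in indivs}

by_indiv = defaultdict(float)
by_donor = defaultdict(float)
for indiv_id, amount in contribs:
    by_indiv[indiv_id] += amount
    by_donor[donor_of[indiv_id]] += amount

print(len(by_indiv))   # 5 name variants, which says little about how many people gave
print(dict(by_donor))  # {101: 1750.0, 201: 3000.0}, i.e. two people
```

Grouping by Individual yields five rows that mostly reflect spelling variants; grouping by Donor yields one row per person, which is the view of the data this notebook sets up.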

This approach will create the following query contexts:

**Principal Context View**

* `ctx_donor`

**Dependent Context Views**

* `ctx_indiv`
* `ctx_indiv_contrib`
* `ctx_donor_contrib`

## Notebook Setup ##

### Configure database connect info/options ###

Note: the database connect string can be specified on the initial `%sql` command:

```python
database_url = "postgresql+psycopg2://user@localhost/fecdb"
%sql $database_url

```

Otherwise, if no connect string is given to `%sql`, it is taken from the `DATABASE_URL` environment variable:

```python
%sql

```
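
The precedence between the two options can be pictured with the sketch below.  `resolve_database_url` is a hypothetical helper written for this note, not part of the `%sql` magic; it only illustrates the rule "explicit connect string first, `DATABASE_URL` second", and the URLs are placeholders.

```python
import os

def resolve_database_url(explicit_url=None, env_var="DATABASE_URL"):
    """Return the explicit URL if given, else the value of the environment variable."""
    if explicit_url:
        return explicit_url
    url = os.environ.get(env_var)
    if url is None:
        raise RuntimeError(f"no connect string given and {env_var} is not set")
    return url

# For illustration only: set the variable in-process, then resolve both ways
os.environ["DATABASE_URL"] = "postgresql+psycopg2://user@localhost/fecdb"
print(resolve_database_url())                                       # taken from the environment
print(resolve_database_url("postgresql+psycopg2://other@localhost/fecdb"))  # explicit value wins
```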

In [1]:
%load_ext sql
%config SqlMagic.autopandas=True
%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'
# connect string taken from DATABASE_URL environment variable
%sql

'Connected: crash@fecdb'

### Clear context ###

Note that we drop *all* context views so that no inconsistencies remain after this notebook is run.  After defining `ctx_donor` below, we will define all dependent views (see Overview, above), and leave any higher-order or orthogonal views undefined.

In [2]:
%sql drop view if exists ctx_dseg_memb     cascade
%sql drop view if exists ctx_dseg          cascade
%sql drop view if exists ctx_donor_contrib cascade
%sql drop view if exists ctx_donor         cascade
%sql drop view if exists ctx_household     cascade
%sql drop view if exists ctx_iseg_memb     cascade
%sql drop view if exists ctx_iseg          cascade
%sql drop view if exists ctx_indiv_contrib cascade
%sql drop view if exists ctx_indiv         cascade

 * postgresql+psycopg2://crash@localhost/fecdb
Done.
 * postgresql+psycopg2://crash@localhost/fecdb
Done.
 * postgresql+psycopg2://crash@localhost/fecdb
Done.
 * postgresql+psycopg2://crash@localhost/fecdb
Done.
 * postgresql+psycopg2://crash@localhost/fecdb
Done.
 * postgresql+psycopg2://crash@localhost/fecdb
Done.
 * postgresql+psycopg2://crash@localhost/fecdb
Done.
 * postgresql+psycopg2://crash@localhost/fecdb
Done.
 * postgresql+psycopg2://crash@localhost/fecdb
Done.
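
If the list of context views grows, the same statements can be generated rather than typed out.  This is only a sketch: the `CONTEXT_VIEWS` list and the `drop_statements` helper are names made up here, and the views are kept in the same order as the cell above.

```python
# Context view names, in the same order as the drop statements above
CONTEXT_VIEWS = [
    "ctx_dseg_memb", "ctx_dseg", "ctx_donor_contrib", "ctx_donor", "ctx_household",
    "ctx_iseg_memb", "ctx_iseg", "ctx_indiv_contrib", "ctx_indiv",
]

def drop_statements(views):
    """Build one 'drop view if exists ... cascade' statement per view name."""
    return [f"drop view if exists {view} cascade" for view in views]

for stmt in drop_statements(CONTEXT_VIEWS):
    print(stmt)
```

Inside the notebook, each generated statement could then be passed to the SQL magic, for example with `get_ipython().run_line_magic("sql", stmt)`.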


### Set styling ###

In [3]:
%%html
<style>
  tr, th, td {
    text-align: left !important;
  }
</style>

## Create Donor Identities ##

Since the single-donor case is pretty straightforward, let's go with a multi-donor example here.  We'll create Donor identities for each of the people we have identified in the household examined in `el_queries1.sql` and `el_queries3.sql`.

First, we identify the primary donor in the household.

In [4]:
%%sql result <<
with indiv_set as (
    select i.*
      from indiv i
     where i.name like 'SANDELL, SCOTT%'
       and i.zip_code ~ '9402[58]'
       and i.name !~ 'MRS\.'
)
select set_donor_indiv('indiv', array_agg(id)) as donor_indiv_id
  from indiv_set

 * postgresql+psycopg2://crash@localhost/fecdb
1 rows affected.
Returning data to local variable result


In [5]:
# first row, first column of the result: the id returned by set_donor_indiv()
donor_indiv_id1 = int(result.iloc[0, 0])

10527429

Next, we identify the other donor (or remaining donors) in the household by exclusion.

In [6]:
%%sql result <<
with indiv_set as (
    select i.*
      from indiv i
     where i.name like 'SANDELL, %'
       and i.zip_code ~ '9402[58]'
       and coalesce(i.donor_indiv_id, 0) != :donor_indiv_id1
)
select set_donor_indiv('indiv', array_agg(id)) as donor_indiv_id
  from indiv_set

 * postgresql+psycopg2://crash@localhost/fecdb
1 rows affected.
Returning data to local variable result


In [7]:
# first row, first column of the result: the id returned by set_donor_indiv()
donor_indiv_id2 = int(result.iloc[0, 0])

10527363
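
The `coalesce(i.donor_indiv_id, 0)` in the exclusion predicate matters because, in SQL, comparing `NULL` with `!=` yields `NULL` rather than true, so records not yet assigned to any Donor would be silently filtered out.  The sketch below reproduces the effect with the standard-library `sqlite3` module and a toy `indiv` table; the rows are invented, and it assumes (as the notebook's query does) that `0` is never a real donor id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table indiv (id integer primary key, name text, donor_indiv_id integer)")
conn.executemany(
    "insert into indiv (id, name, donor_indiv_id) values (?, ?, ?)",
    [
        (1, "DOE, JOHN", 1),          # already assigned to donor 1
        (2, "DOE, JOHN MR.", 1),      # already assigned to donor 1
        (3, "DOE, JANE", None),       # not yet assigned to any donor
        (4, "DOE, JOHN MRS.", None),  # not yet assigned to any donor
    ],
)
params = {"donor_indiv_id1": 1}

# Without coalesce: NULL != 1 evaluates to NULL, so unassigned rows are dropped
naive = conn.execute(
    "select id from indiv where donor_indiv_id != :donor_indiv_id1 order by id", params
).fetchall()

# With coalesce: NULL is replaced by 0 first, so unassigned rows pass the test
safe = conn.execute(
    "select id from indiv where coalesce(donor_indiv_id, 0) != :donor_indiv_id1 order by id",
    params,
).fetchall()

print(naive)  # []
print(safe)   # [(3,), (4,)]
```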

## Create Principal View (`ctx_donor`) ##

Now we set the query context to be the combination of the two Donor identities just created (identified by `donor_indiv_id`).

In [8]:
%%sql
create or replace view ctx_donor as
select d.*
  from donor_indiv d
 where d.id in (:donor_indiv_id1, :donor_indiv_id2)

 * postgresql+psycopg2://crash@localhost/fecdb
Done.


Let's take a quick look at the context before proceeding.  Note: even though these records are coming from the `indiv` table, we really consider them to be coming from the `donor_indiv` view, thus in the **Donor** domain (and not the **Individual** domain).

In [9]:
%%sql
select id,
       name,
       city,
       state,
       zip_code,
       elect_cycles
  from ctx_donor

 * postgresql+psycopg2://crash@localhost/fecdb
2 rows affected.


|   | id | name | city | state | zip_code | elect_cycles |
|---|---|---|---|---|---|---|
| 0 | 10527363 | SANDELL, JENNIFER | MENLO PARK | CA | 94025 | [2004, 2006, 2008, 2010] |
| 1 | 10527429 | SANDELL, SCOTT | MENLO PARK | CA | 94025 | [2000, 2008, 2010, 2012, 2016] |


## Create Dependent Views ##

### Create `ctx_indiv` ###

In [10]:
%%sql
create or replace view ctx_indiv as
select i.*
  from ctx_donor dx
  join indiv i on i.donor_indiv_id = dx.id

 * postgresql+psycopg2://crash@localhost/fecdb
Done.


And visually inspect...

In [11]:
%%sql
select id,
       name,
       city,
       state,
       zip_code,
       elect_cycles,
       donor_indiv_id
  from ctx_indiv
 order by donor_indiv_id

 * postgresql+psycopg2://crash@localhost/fecdb
27 rows affected.


|   | id | name | city | state | zip_code | elect_cycles | donor_indiv_id |
|---|---|---|---|---|---|---|---|
| 0 | 10527365 | SANDELL, JENNIFER | PORTOLA VALLEY | CA | 94028 | [2018] | 10527363 |
| 1 | 10527366 | SANDELL, JENNIFER | PORTOLA VALLEY | CA | 940287608 | [2016, 2018, 2020] | 10527363 |
| 2 | 10527364 | SANDELL, JENNIFER | MENLO PARK | CA | 940250 | [2004] | 10527363 |
| 3 | 10527370 | SANDELL, JENNIFER AYER | MENLO PARK | CA | 94025 | [2004, 2010] | 10527363 |
| 4 | 10527371 | SANDELL, JENNIFER MS. | MENLO PARK | CA | 94025 | [2004] | 10527363 |
| 5 | 10527368 | SANDELL, JENNIFER A | MENLO PARK | CA | 94025 | [2006, 2008] | 10527363 |
| 6 | 10527363 | SANDELL, JENNIFER | MENLO PARK | CA | 94025 | [2004, 2006, 2008, 2010] | 10527363 |
| 7 | 10527369 | SANDELL, JENNIFER A MS. | MENLO PARK | CA | 94025 | [2004] | 10527363 |
| 8 | 10527447 | SANDELL, SCOTT MRS. | MENLO PARK | CA | 94025 | [2004] | 10527363 |
| 9 | 10527442 | SANDELL, SCOTT MR. | MENLO PARK | CA | 94025 | [2000, 2002, 2004, 2006] | 10527429 |


### Create `ctx_indiv_contrib` ###

In [12]:
%%sql
create or replace view ctx_indiv_contrib as
select ic.*
  from ctx_indiv ix
  join indiv_contrib ic on ic.indiv_id = ix.id

 * postgresql+psycopg2://crash@localhost/fecdb
Done.


And validate...

In [13]:
%%sql
select count(*)             as contribs,
       sum(transaction_amt) as total_amt,
       array_agg(distinct elect_cycle) as elect_cycles
  from ctx_indiv_contrib

 * postgresql+psycopg2://crash@localhost/fecdb
1 rows affected.


|   | contribs | total_amt | elect_cycles |
|---|---|---|---|
| 0 | 101 | 264450.0 | [2000, 2002, 2004, 2006, 2008, 2010, 2012, 201... |


### Create `ctx_donor_contrib` ###

This is really the same as `ctx_indiv_contrib`, except that we are adding `donor_indiv_id` on top of the `indiv_contrib` columns so that queries using this context view are able to join to (and/or group by) the underlying Donor record (and not just the Individual associated with the contribution record).

In [14]:
%%sql
create or replace view ctx_donor_contrib as
select ic.*,
       ix.donor_indiv_id
  from ctx_indiv ix
  join indiv_contrib ic on ic.indiv_id = ix.id

 * postgresql+psycopg2://crash@localhost/fecdb
Done.


In [15]:
%%sql
select d.id                 as donor_id,
       d.name               as donor_name,
       count(*)             as contribs,
       sum(transaction_amt) as total_amt,
       array_agg(distinct elect_cycle) as elect_cycles
  from ctx_donor_contrib cx
  join donor_indiv d on d.id = cx.donor_indiv_id
 group by 1, 2

 * postgresql+psycopg2://crash@localhost/fecdb
2 rows affected.


|   | donor_id | donor_name | contribs | total_amt | elect_cycles |
|---|---|---|---|---|---|
| 0 | 10527363 | SANDELL, JENNIFER | 28 | 37200.0 | [2004, 2006, 2008, 2010, 2016, 2018, 2020] |
| 1 | 10527429 | SANDELL, SCOTT | 73 | 227250.0 | [2000, 2002, 2004, 2006, 2008, 2010, 2012, 201... |
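
For readers without access to the FEC database, the layering of context views can be tried out on toy data.  The sketch below uses the standard-library `sqlite3` module and invented rows; the table layouts are pared down to a few columns, and the `donor_indiv` stand-in (the Individual record whose `id` equals its own `donor_indiv_id`) is an assumption made for this example, not the actual definition in the FEC schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table indiv (id integer primary key, name text, donor_indiv_id integer);
create table indiv_contrib (id integer primary key, indiv_id integer, transaction_amt real);

insert into indiv values
    (101, 'DOE, JANE',     101),
    (102, 'DOE, JANE A',   101),
    (201, 'DOE, JOHN',     201),
    (202, 'DOE, JOHN MR.', 201),
    (301, 'ROE, RICHARD',  301);
insert into indiv_contrib values
    (1, 101, 500.0), (2, 102, 250.0), (3, 201, 2700.0), (4, 202, 300.0), (5, 301, 100.0);

-- assumed stand-in for donor_indiv: the record that carries the Donor identity
create view donor_indiv as
select * from indiv where id = donor_indiv_id;

-- principal context view: the two Donors of interest
create view ctx_donor as
select d.* from donor_indiv d where d.id in (101, 201);

-- dependent context views, each defined on top of the previous one
create view ctx_indiv as
select i.* from ctx_donor dx join indiv i on i.donor_indiv_id = dx.id;

create view ctx_donor_contrib as
select ic.*, ix.donor_indiv_id
  from ctx_indiv ix join indiv_contrib ic on ic.indiv_id = ix.id;
""")

# Per-Donor rollup, analogous to the final query in this notebook
rows = conn.execute("""
select d.id, d.name, count(*), sum(cx.transaction_amt)
  from ctx_donor_contrib cx
  join donor_indiv d on d.id = cx.donor_indiv_id
 group by 1, 2
 order by 1
""").fetchall()
print(rows)  # [(101, 'DOE, JANE', 2, 750.0), (201, 'DOE, JOHN', 2, 3000.0)]
```

Because every dependent view is defined in terms of `ctx_donor`, redefining only the principal view (for example, with a different id list) changes what all downstream queries see; the unrelated donor 301 never appears in the rollup.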
