# Define Individual Context &ndash; Multi-Person Use Case #

## Overview ##

Explore the FEC data by specifying SQL predicates that identify **Individuals**, which are person identities extracted (and somewhat cleansed) from the [Individual Contributions](https://www.fec.gov/campaign-finance-data/contributions-individuals-file-description/) file.  Individual records (stored in the `indiv` table) are distinct combinations of name and address information (city, state, zip code) that have not been aggressively deduplicated.  Thus, a real-world person will have multiple records if the identifying information on their contribution records contains variants, typos, or deliberate deceptions.
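To make the "distinct combinations" idea concrete, here is a minimal Python sketch. It assumes no cleansing at all (the real extraction does some, which is not shown here), and both the toy contribution tuples and the `extract_individuals` helper are hypothetical illustrations, not part of the fecdb loader.

```python
# Toy contribution rows (name, city, state, zip_code); illustrative only, not FEC data.
contributions = [
    ("DOE, JANE", "SPRINGFIELD", "IL", "62701"),
    ("DOE, JANE", "SPRINGFIELD", "IL", "62701"),    # exact repeat: same identity
    ("DOE, JANE A", "SPRINGFIELD", "IL", "62701"),  # added middle initial: new identity
    ("DOE, JANE", "SPRINGFIELD", "IL", "627010"),   # zip code typo: new identity
]


def extract_individuals(rows):
    """Return the distinct (name, city, state, zip_code) combinations, in first-seen order.

    No fuzzy matching or normalization is applied (hypothetical simplification),
    so spelling variants, typos, and added initials each yield a separate identity.
    """
    return list(dict.fromkeys(rows))


individuals = extract_individuals(contributions)
print(len(individuals))  # 3 identities for what is likely one real person
```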

Querying by Individual can target all of the `indiv` records (and the associated contribution data in `indiv_contrib`) for a single person, or for a set of people to be explored collectively.  This notebook presents an example of the second usage (the first is covered in the preceding `dc1` notebook).  One limitation of Querying by Individual is that it is difficult to distinguish the contributions of the distinct people within a result set.

Note that this approach will create the following query contexts (each of which may be used in formulating specific queries for investigation or reporting):

**Principal Context View**

* `ctx_indiv`

**Dependent Context Views**

* `ctx_indiv_contrib`

## Notebook Setup ##

### Configure database connect info/options ###

Note: the database connect string can be specified on the initial `%sql` command:

```python
database_url = "postgresql+psycopg2://user@localhost/fecdb"
%sql $database_url

```

Or, if no connect string is specified for `%sql`, it is taken from the `DATABASE_URL` environment variable:

```python
%sql

```
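The fallback rule described above can be summarized in a few lines of Python. `resolve_database_url` is a hypothetical helper written only for illustration; it is not part of ipython-sql, whose actual resolution logic may differ in its details.

```python
import os


def resolve_database_url(explicit_url=None):
    """Return the explicit connect string if given, else fall back to DATABASE_URL.

    Hypothetical helper mirroring the rule described above; not ipython-sql code.
    """
    if explicit_url:
        return explicit_url
    url = os.environ.get("DATABASE_URL")
    if not url:
        raise RuntimeError("no connect string given and DATABASE_URL is not set")
    return url


# Example: set the variable, then resolve with and without an explicit URL.
os.environ["DATABASE_URL"] = "postgresql+psycopg2://user@localhost/fecdb"
print(resolve_database_url())  # falls back to the environment variable
print(resolve_database_url("postgresql+psycopg2://other@localhost/fecdb"))  # explicit wins
```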

```python
%load_ext sql
%config SqlMagic.autopandas=True
%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'
# connect string taken from DATABASE_URL environment variable
%sql
```

```
'Connected: crash@fecdb'
```

### Clear context ###

Note that we drop *all* context views so that no stale or inconsistent views remain after this notebook is run.  After defining `ctx_indiv` below, we will define all of its dependent views (see Overview, above) and leave any higher-order or orthogonal views undefined.

```python
%sql drop view if exists ctx_dseg_memb     cascade
%sql drop view if exists ctx_dseg          cascade
%sql drop view if exists ctx_donor_contrib cascade
%sql drop view if exists ctx_donor         cascade
%sql drop view if exists ctx_household     cascade
%sql drop view if exists ctx_iseg_memb     cascade
%sql drop view if exists ctx_iseg          cascade
%sql drop view if exists ctx_indiv_contrib cascade
%sql drop view if exists ctx_indiv         cascade
```

```
 * postgresql+psycopg2://crash@localhost/fecdb
Done.
 * postgresql+psycopg2://crash@localhost/fecdb
Done.
 * postgresql+psycopg2://crash@localhost/fecdb
Done.
 * postgresql+psycopg2://crash@localhost/fecdb
Done.
 * postgresql+psycopg2://crash@localhost/fecdb
Done.
 * postgresql+psycopg2://crash@localhost/fecdb
Done.
 * postgresql+psycopg2://crash@localhost/fecdb
Done.
 * postgresql+psycopg2://crash@localhost/fecdb
Done.
 * postgresql+psycopg2://crash@localhost/fecdb
Done.
```
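The views are dropped with dependents listed before the views they select from. The sketch below derives such an order from a dependency map using the standard-library `graphlib` module (Python 3.9+). The `depends_on` mapping and `drop_statements` helper are hypothetical; only the dependency of `ctx_indiv_contrib` on `ctx_indiv` is taken from this notebook, and the other context views would need their own (assumed) edges. Because each statement uses `cascade`, PostgreSQL also drops any views that depend on the one being dropped, so the order is not strictly required; listing dependents first simply keeps every statement meaningful.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# view -> set of views it selects from (its prerequisites).
# Only this one edge is known from the notebook; extend it for other context views.
depends_on = {
    "ctx_indiv_contrib": {"ctx_indiv"},
    "ctx_indiv": set(),
}


def drop_statements(deps):
    """Return DROP VIEW statements with each dependent ahead of the views it uses."""
    # static_order() yields prerequisites first, so reverse it to drop dependents first.
    order = list(TopologicalSorter(deps).static_order())
    return [f"drop view if exists {view} cascade" for view in reversed(order)]


for stmt in drop_statements(depends_on):
    print(stmt)
```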


### Set styling ###

```python
%%html
<style>
  tr, th, td {
    text-align: left !important;
  }
</style>
```

## Create Principal View (`ctx_indiv`) ##

For this use case, we'll identify the `indiv` records associated with the household (multiple people) previously queried in `el_queries1.sql` and `el_queries3.sql`.

```python
%%sql
create or replace view ctx_indiv as
select *
  from indiv
 where name like 'SANDELL, %'
   and zip_code ~ '9402[58]'
```

```
 * postgresql+psycopg2://crash@localhost/fecdb
Done.
```
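The two predicates behave like the Python checks below, shown as an illustrative sketch (`matches_ctx_indiv` is hypothetical and says nothing about how PostgreSQL evaluates them internally). Note that PostgreSQL's `~` operator performs an unanchored POSIX regular-expression match, much like `re.search`, so `9402[58]` matches those digits anywhere in `zip_code`; a pattern such as `'^9402[58]'` would restrict the match to the leading digits, if that is the intent.

```python
import re

# Unanchored pattern, mirroring zip_code ~ '9402[58]'.
ZIP_PATTERN = re.compile(r"9402[58]")


def matches_ctx_indiv(name, zip_code):
    # name like 'SANDELL, %': case-sensitive prefix match ('%' matches any suffix)
    name_ok = name.startswith("SANDELL, ")
    # zip_code ~ '9402[58]': the pattern may occur anywhere, as with re.search
    zip_ok = ZIP_PATTERN.search(zip_code) is not None
    return name_ok and zip_ok


print(matches_ctx_indiv("SANDELL, JENNIFER", "940287608"))  # True: ZIP+4 form
print(matches_ctx_indiv("SANDELL, SCOTT D", "94025"))       # True
print(matches_ctx_indiv("SANDELLA, JOHN", "94025"))         # False: no ", " after SANDELL
print(matches_ctx_indiv("SANDELL, JENNIFER", "10094025"))   # True: digits found mid-string
```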


Before proceeding, let's take a quick look at the context we just set, to validate it.

```python
%%sql
select id,
       name,
       city,
       state,
       zip_code,
       elect_cycles
  from ctx_indiv
```

```
 * postgresql+psycopg2://crash@localhost/fecdb
27 rows affected.
```

|   | id | name | city | state | zip_code | elect_cycles |
|---|----|------|------|-------|----------|--------------|
| 0 | 10527369 | SANDELL, JENNIFER A MS. | MENLO PARK | CA | 94025 | [2004] |
| 1 | 10527363 | SANDELL, JENNIFER | MENLO PARK | CA | 94025 | [2004, 2006, 2008, 2010] |
| 2 | 10527371 | SANDELL, JENNIFER MS. | MENLO PARK | CA | 94025 | [2004] |
| 3 | 10527368 | SANDELL, JENNIFER A | MENLO PARK | CA | 94025 | [2006, 2008] |
| 4 | 10527370 | SANDELL, JENNIFER AYER | MENLO PARK | CA | 94025 | [2004, 2010] |
| 5 | 10527364 | SANDELL, JENNIFER | MENLO PARK | CA | 940250 | [2004] |
| 6 | 10527366 | SANDELL, JENNIFER | PORTOLA VALLEY | CA | 940287608 | [2016, 2018, 2020] |
| 7 | 10527365 | SANDELL, JENNIFER | PORTOLA VALLEY | CA | 94028 | [2018] |
| 8 | 10527440 | SANDELL, SCOTT D. | PORTOLA VALLEY | CA | 94028 | [2016] |
| 9 | 10527433 | SANDELL, SCOTT D | MENLO PARK | CA | 94025 | [2004, 2006, 2008, 2010] |
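These rows also illustrate the limitation noted in the Overview: the result set mixes several identity records for at least two people (JENNIFER and SCOTT). One rough way to separate them, sketched below under the assumption that names follow a `LAST, FIRST ...` layout, is to group on surname plus first given name. `person_key` is a hypothetical heuristic, the sketch uses only the 10 rows displayed above (not all 27), and it would wrongly merge different people who share a first name or split someone who sometimes uses a nickname.

```python
from collections import defaultdict

# (id, name) pairs copied from the ctx_indiv output above (10 of the 27 rows).
rows = [
    (10527369, "SANDELL, JENNIFER A MS."),
    (10527363, "SANDELL, JENNIFER"),
    (10527371, "SANDELL, JENNIFER MS."),
    (10527368, "SANDELL, JENNIFER A"),
    (10527370, "SANDELL, JENNIFER AYER"),
    (10527364, "SANDELL, JENNIFER"),
    (10527366, "SANDELL, JENNIFER"),
    (10527365, "SANDELL, JENNIFER"),
    (10527440, "SANDELL, SCOTT D."),
    (10527433, "SANDELL, SCOTT D"),
]


def person_key(name):
    """Hypothetical heuristic: reduce 'LAST, FIRST ...' to (LAST, FIRST)."""
    last, _, rest = name.partition(",")
    tokens = rest.split()
    return last.strip(), tokens[0] if tokens else ""


# Collect indiv ids under each (surname, first given name) key.
groups = defaultdict(list)
for indiv_id, name in rows:
    groups[person_key(name)].append(indiv_id)

for key, ids in groups.items():
    print(key, len(ids))
```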


## Create Dependent Views ##

### Create `ctx_indiv_contrib` ###

Now we'll create the context view for the contributions from the targeted "Individual" records.

```python
%%sql
create or replace view ctx_indiv_contrib as
select ic.*
  from ctx_indiv ix
  join indiv_contrib ic on ic.indiv_id = ix.id
```

```
 * postgresql+psycopg2://crash@localhost/fecdb
Done.
```
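The principal/dependent relationship can be exercised end to end with Python's built-in `sqlite3` module standing in for PostgreSQL. This is a sketch under simplifying assumptions: the tables carry only the columns the join needs, the rows are toy values rather than FEC data, and the `~` ZIP filter is omitted because SQLite has no such operator (SQLite's `LIKE` is also case-insensitive for ASCII, unlike PostgreSQL's).

```python
import sqlite3

# In-memory stand-in for the fecdb schema (reduced columns, toy rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table indiv (id integer primary key, name text, zip_code text);
    create table indiv_contrib (id integer primary key,
                                indiv_id integer references indiv (id),
                                transaction_amt real);
    insert into indiv values (1, 'SANDELL, A', '94025'), (2, 'OTHER, B', '10001');
    insert into indiv_contrib values (10, 1, 100.0), (11, 1, 250.0), (12, 2, 75.0);

    -- principal context (name filter only; SQLite lacks the '~' operator)
    create view ctx_indiv as
    select * from indiv where name like 'SANDELL, %';

    -- dependent context: contributions for whatever ctx_indiv currently selects
    create view ctx_indiv_contrib as
    select ic.* from ctx_indiv ix join indiv_contrib ic on ic.indiv_id = ix.id;
""")

count, total = conn.execute(
    "select count(*), sum(transaction_amt) from ctx_indiv_contrib"
).fetchone()
print(count, total)  # 2 350.0
```

Because `ctx_indiv_contrib` selects from `ctx_indiv` rather than from `indiv` directly, redefining the principal view changes what the dependent view returns without touching the dependent view's definition.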


Next, some quick validation of the view:

```python
%%sql
select count(*)             as contribs,
       sum(transaction_amt) as total_amt,
       array_agg(distinct elect_cycle) as elect_cycles
  from ctx_indiv_contrib
```

```
 * postgresql+psycopg2://crash@localhost/fecdb
1 rows affected.
```

|   | contribs | total_amt | elect_cycles |
|---|----------|-----------|--------------|
| 0 | 101 | 264450.0 | [2000, 2002, 2004, 2006, 2008, 2010, 2012, 201... |
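For reference, the validation query's aggregates correspond roughly to the plain-Python computation below. `summarize` is a hypothetical helper run over toy rows, and it assumes `elect_cycle` is never NULL. Like SQL `sum()`, it skips NULL (`None`) amounts and returns NULL for an empty input. PostgreSQL typically returns `array_agg(distinct ...)` results in sorted order, but writing `array_agg(distinct elect_cycle order by elect_cycle)` makes that order explicit.

```python
def summarize(contribs):
    """Mirror count(*), sum(transaction_amt), array_agg(distinct elect_cycle)."""
    # SQL sum() ignores NULL amounts and yields NULL when nothing is summed.
    amounts = [c["transaction_amt"] for c in contribs if c["transaction_amt"] is not None]
    # array_agg over zero rows yields NULL; otherwise return distinct cycles, sorted.
    cycles = {c["elect_cycle"] for c in contribs}
    return {
        "contribs": len(contribs),
        "total_amt": sum(amounts) if amounts else None,
        "elect_cycles": sorted(cycles) if cycles else None,
    }


# Toy rows, not FEC data.
sample = [
    {"transaction_amt": 500.0, "elect_cycle": 2004},
    {"transaction_amt": 1000.0, "elect_cycle": 2008},
    {"transaction_amt": None, "elect_cycle": 2004},
]
print(summarize(sample))
# {'contribs': 3, 'total_amt': 1500.0, 'elect_cycles': [2004, 2008]}
```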
