# Query by Household #

## Overview ##

Explore the FEC data by specifying SQL predicates that identify "households" (groups of `indiv` records conjectured to represent real-world people residing at the same physical address).

This approach will create the following query contexts:

* `ctx_household`
* `ctx_indiv`
* `ctx_contrib`

## Notebook Setup ##

### Configure database connect info/options ###

Note: the database connect string can be specified on the initial `%sql` command:

```python
database_url = "postgresql+psycopg2://user@localhost/fecdb"
%sql $database_url

```

Otherwise, if no connect string is given to `%sql`, it is taken from the `DATABASE_URL` environment variable:

```python
%sql

```
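The fallback order described above (explicit connect string first, then `DATABASE_URL`) can be sketched in plain Python. The helper name `resolve_database_url` and its parameters are hypothetical and only illustrate the lookup order; they are not part of the `%sql` magic itself.

```python
import os


def resolve_database_url(explicit=None, environ=None):
    """Return the explicit connect string if given, else DATABASE_URL.

    Hypothetical helper for illustration only; `environ` defaults to the
    real process environment but can be replaced by a dict for testing.
    """
    environ = os.environ if environ is None else environ
    url = explicit or environ.get("DATABASE_URL")
    if not url:
        raise RuntimeError(
            "no connect string given and DATABASE_URL is not set"
        )
    return url
```

Failing loudly when neither source is set makes a misconfigured environment visible before any query cell runs.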

In [1]:
%load_ext sql
%config SqlMagic.autopandas=True
%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'
# connect string taken from DATABASE_URL environment variable
%sql

'Connected: crash@fecdb'

### Set styling ###

In [2]:
%%html
<style>
  tr, th, td {
    text-align: left !important;
  }
</style>

## Validate Context ##

In [3]:
%%sql
select count(*)
  from ctx_household

 * postgresql+psycopg2://crash@localhost/fecdb
1 rows affected.


|   | count |
|---|---|
| 0 | 1 |


## Queries / Use Cases ##

### Demographic Summary by State ###

In [4]:
%%sql
select hx.state,
       count(*)
  from ctx_household hx
 group by 1
 order by 2 desc

 * postgresql+psycopg2://crash@localhost/fecdb
1 rows affected.


|   | state | count |
|---|---|---|
| 0 | CA | 1 |


### Top Contributing Households across Election Cycles ###

Note that we could simplify this query by introducing a `ctx_household_contrib` context view (analogous to `ctx_donor_contrib`). *\[It is a somewhat deliberate design choice not to extend all of the Donor constructs to Household, even though the two entities have identical underlying structures; we may complete and maintain the analogy later if analysis and reporting by Household become more important or interesting.\]*

In [5]:
%%sql
select hx.id as hh_id,
       hx.name as hh_name,
       count(*) contribs,
       sum(ic.transaction_amt) total_amount,
       round(avg(ic.transaction_amt), 2) avg_amount,
       max(ic.transaction_amt) max_amount,
       array_agg(distinct ic.elect_cycle) as elect_cycles,
       round(sum(ic.transaction_amt) / count(distinct ic.elect_cycle), 2) avg_cycle_amount
  from ctx_household hx
  join indiv i on i.hh_indiv_id = hx.id
  join indiv_contrib ic on ic.indiv_id = i.id
 group by 1, 2
 order by 4 desc
 limit 50

 * postgresql+psycopg2://crash@localhost/fecdb
1 rows affected.


|   | hh_id | hh_name | contribs | total_amount | avg_amount | max_amount | elect_cycles | avg_cycle_amount |
|---|---|---|---|---|---|---|---|---|
| 0 | 10527363 | SANDELL, JENNIFER | 101 | 264450.0 | 2618.32 | 20000.0 | [2000, 2002, 2004, 2006, 2008, 2010, 2012, 201... | 24040.91 |
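
To make the aggregate columns of the query above concrete, here is a minimal sketch that recomputes them in plain Python. The rows are made-up toy data (not FEC records), and the `summarize` function is hypothetical. The point it illustrates is that `avg_amount` divides the total by the number of contributions, while `avg_cycle_amount` divides it by the number of distinct election cycles.

```python
from collections import defaultdict

# Hypothetical (household id, election cycle, amount) rows; not real FEC data.
rows = [
    (1, 2016, 500.0),
    (1, 2016, 1500.0),
    (1, 2018, 1000.0),
    (2, 2018, 250.0),
]


def summarize(rows):
    """Group contributions by household and compute the summary columns."""
    by_household = defaultdict(list)
    for hh_id, cycle, amount in rows:
        by_household[hh_id].append((cycle, amount))

    summary = {}
    for hh_id, contribs in by_household.items():
        amounts = [amount for _, amount in contribs]
        cycles = sorted({cycle for cycle, _ in contribs})
        total = sum(amounts)
        summary[hh_id] = {
            "contribs": len(amounts),
            "total_amount": total,
            # Per-contribution average.
            "avg_amount": round(total / len(amounts), 2),
            "max_amount": max(amounts),
            "elect_cycles": cycles,
            # Per-cycle average: total over the number of distinct cycles.
            "avg_cycle_amount": round(total / len(cycles), 2),
        }
    return summary


summary = summarize(rows)
```

Python's `round` on floats may treat exact ties differently from the database's `round` on numeric values, so this sketch shows the arithmetic rather than reproducing results digit for digit.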
