# Query by Individual #

## Overview ##

Explore the FEC data by specifying SQL predicates that identify **Individuals**: people identities extracted (and somewhat cleansed) from the [Individual Contributions](https://www.fec.gov/campaign-finance-data/contributions-individuals-file-description/) file.  Individual records (stored in the `indiv` table) are essentially distinct combinations of name and address information (city, state, zipcode) that have not been aggressively deduplicated.  Thus, a real-world person will have multiple records if the identifying information on their contribution records contains variants (or typos, or deception).

Querying by Individual can be used to target all of the `indiv` records (and associated contribution data in `indiv_contrib`) for a single person, or for a set of people to be explored collectively.  Examples of both usages will be presented here.

Note that this approach will create the following query contexts (each of which may be used in formulating specific queries for investigation or reporting):

* `ctx_indiv`
* `ctx_contrib`

One limitation of Querying by Individual is that it is difficult to distinguish between the contributions of distinct people identities within a result set.

## Notebook Setup ##

### Configure database connect info/options ###

Note: the database connect string can be specified on the initial `%sql` command:

```python
database_url = "postgresql+psycopg2://user@localhost/fecdb"
%sql $database_url

```

Otherwise, the connect string is taken from the `DATABASE_URL` environment variable (if not specified for `%sql`):

```python
%sql

```
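
If the `DATABASE_URL` environment variable is not already set in the shell that launched the notebook, it could be populated from Python before the bare `%sql` is run. The following is a minimal sketch, not part of the original notebook: `build_database_url` is a hypothetical helper, and the user, host, and database names are just the placeholders from the example above.

```python
import os


def build_database_url(user, host, dbname, driver="postgresql+psycopg2"):
    """Assemble a SQLAlchemy-style connect string (hypothetical helper)."""
    return f"{driver}://{user}@{host}/{dbname}"


# Only set the variable if the environment does not already provide one,
# so a value exported in the shell still takes precedence.
os.environ.setdefault(
    "DATABASE_URL", build_database_url("user", "localhost", "fecdb")
)
```

Using `setdefault` here is a design choice: it leaves an existing setting untouched, so the notebook behaves the same whether the variable comes from the shell or from this cell.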

In [1]:
%load_ext sql
%config SqlMagic.autopandas=True
%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'
# connect string taken from DATABASE_URL environment variable
%sql

'Connected: crash@fecdb'

### Set styling ###

In [2]:
%%html
<style>
  tr, th, td {
    text-align: left !important;
  }
</style>

## Validate Context ##

In [3]:
%%sql
select count(*)
  from ctx_indiv

 * postgresql+psycopg2://crash@localhost/fecdb
1 rows affected.


|   | count |
|---|-------|
| 0 | 54    |


## Queries / Use Cases ##

### Demographic Summary by State ###

Note: this is not a great example, since the Individual records are not de-duplicated, so `ctx_indiv` may very well contain multiple records representing the same real-world person (i.e. "Donor").  I think this highlights the fact that an Individual Context may be generally less useful than either Individual Contribution or Donor (and Donor Contribution) Contexts.

In [4]:
%%sql
select ix.state,
       count(*)
  from ctx_indiv ix
 group by 1
 order by 2 desc

 * postgresql+psycopg2://crash@localhost/fecdb
14 rows affected.


|   | state | count |
|---|-------|-------|
| 0 | CA    | 20    |
| 1 | NY    | 11    |
| 2 | MA    | 6     |
| 3 | IL    | 4     |
| 4 | PA    | 2     |
| 5 | MT    | 2     |
| 6 | SC    | 2     |
| 7 | WA    | 1     |
| 8 | CT    | 1     |
| 9 | DC    | 1     |
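
To make the de-duplication caveat for this summary concrete, here is a small sketch using made-up records (not FEC data). It compares a per-state count of raw Individual records with a count of crudely collapsed identities. The `crude_person_key` function and the sample tuples are assumptions for illustration only, and are not how the database identifies Donors.

```python
from collections import Counter

# Hypothetical Individual records: (name, city, state, zipcode).
records = [
    ("SMITH, JOHN", "SAN FRANCISCO", "CA", "94110"),
    ("SMITH, JOHN A", "SAN FRANCISCO", "CA", "94110"),
    ("SMITH, JOHN", "SAN FRANCISCO", "CA", "941101234"),
    ("DOE, JANE", "NEW YORK", "NY", "10001"),
]


def crude_person_key(name, city, state, zipcode):
    """Rough identity key: last name, first given-name token, state, 5-digit zip."""
    last, _, given = name.partition(",")
    given_tokens = given.split()
    first = given_tokens[0] if given_tokens else ""
    return (last.strip(), first, state, zipcode[:5])


# Count of raw records per state (what the query on `ctx_indiv` reports).
records_by_state = Counter(state for _, _, state, _ in records)

# Count of distinct crude identities per state.
distinct_keys = {crude_person_key(*rec) for rec in records}
people_by_state = Counter(key[2] for key in distinct_keys)

print(records_by_state["CA"], people_by_state["CA"])  # prints: 3 1
```

In this toy data, three `CA` records collapse to a single identity, which is the kind of inflation the per-state record counts may contain.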
