# 1. Introduction

This document is an initial exploration of the fields and values in Form 990-EZ. The goal is to identify variables relevant to a detailed analysis of philanthropic giving in the environmental and social justice sectors. We also intend to assess the transparency and accountability of the filing organizations, using the Form 990-EZ data to inform that inquiry.

Form 990 is the detailed annual information return filed by larger tax-exempt organizations, those with gross receipts of $200,000 or more or total assets of $500,000 or more. It provides comprehensive insight into their finances, governance, and operations. Form 990-EZ is a shorter, simpler return for smaller organizations with gross receipts below $200,000 and total assets below $500,000, and it requires less detail. The file used here was pulled from https://www.irs.gov/statistics/soi-tax-stats-annual-extract-of-tax-exempt-organization-financial-data.
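
As a rough illustration of the size thresholds described above, the sketch below picks a form from gross receipts and year-end total assets. The function name `which_form` is hypothetical, and the rule is deliberately simplified: it ignores other filing options and special cases that the IRS instructions cover.

```python
def which_form(gross_receipts: float, total_assets: float) -> str:
    """Return the form suggested by the simplified size thresholds."""
    # Form 990-EZ requires BOTH figures to be under their thresholds.
    if gross_receipts < 200_000 and total_assets < 500_000:
        return "990-EZ"
    # Otherwise the full Form 990 applies under this simplified rule.
    return "990"


print(which_form(150_000, 300_000))  # 990-EZ
print(which_form(150_000, 600_000))  # 990
```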


# 2. First Glance

## 2.1. General Summary

In [3]:
# Libraries for data manipulation.
import pandas as pd
import numpy as np

# Libraries for data visualisation.
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.io as pio

# Libraries for Quarto rendering.
from IPython.display import Markdown, display
from tabulate import tabulate

# Remove warnings.
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

# Read in data.
form_990_2022ez = pd.read_csv('../../data/22eoextractez.csv')

# Print data dimensions.
shape_caption = "Data Dimensions:"
shape_df = pd.DataFrame({
        'Dimension': ['Rows','Columns'],
        'Count': [form_990_2022ez.shape[0], form_990_2022ez.shape[1]]
    })
shape_df['Count'] = shape_df['Count'].apply(lambda x: f"{x:,}")
shape_markdown = shape_caption + "\n\n" + shape_df.to_markdown(index=False)
display(Markdown(shape_markdown))

# Print a sample of the data.
first_five_rows_caption = "First Five Rows of Data:"
first_five_rows_markdown = first_five_rows_caption + "\n\n" + form_990_2022ez.head().to_markdown(index=False)
display(Markdown(first_five_rows_markdown))

# Print metadata.
metadata_caption = "Metadata:"
column_metadata = []

for col in form_990_2022ez.columns:
    # Gather metadata for each col.
    col_metadata = {
        'Column Name': col,
        'Data Type': str(form_990_2022ez[col].dtype),
        'Unique Values': form_990_2022ez[col].nunique(),
        'Missing Values': form_990_2022ez[col].isnull().sum()
    }
    # Append metadata to list.
    column_metadata.append(col_metadata)

# Convert list to pd df and then markdown table.
metadata_df = pd.DataFrame(column_metadata)
metadata_df['Unique Values'] = metadata_df['Unique Values'].apply(lambda x: f"{x:,}")
metadata_df['Missing Values'] = metadata_df['Missing Values'].apply(lambda x: f"{x:,}")
metadata_markdown = metadata_caption + "\n\n" + metadata_df.to_markdown(index=False)
display(Markdown(metadata_markdown))

Data Dimensions:

| Dimension   | Count   |
|:------------|:--------|
| Rows        | 231,289 |
| Columns     | 72      |

First Five Rows of Data:

| efile   |      ein |   taxpd |   subseccd |   totcntrbs |   prgmservrev |   duesassesmnts |   othrinvstinc |   grsamtsalesastothr |   basisalesexpnsothr |   gnsaleofastothr |   grsincgaming |   grsrevnuefndrsng |   direxpns |   netincfndrsng |   grsalesminusret |   costgoodsold |   grsprft |   othrevnue |   totrevnue |   totexpns |   totexcessyr |   othrchgsnetassetfnd |   networthend |   totassetsend |   totliabend |   totnetassetsend | actvtynotprevrptcd   | chngsinorgcd   | unrelbusincd   | filedf990tcd   | contractioncd   |   politicalexpend | filedf1120polcd   | loanstoofficerscd   |   loanstoofficers |   initiationfee |   grspublicrcpts | s4958excessbenefcd   | prohibtdtxshltrcd   |   nonpfrea |   totnoforgscnt |   totsupport |   gftgrntrcvd170 |   txrevnuelevied170 |   srvcsval170 |   pubsuppsubtot170 |   excds2pct170 |   pubsupplesspct170 |   samepubsuppsubtot170 |   grsinc170 |   netincunrelatd170 |   othrinc170 |   totsupport170 |   grsrcptsrelatd170 |   totgftgrntrcvd509 |   grsrcptsadmiss509 |   grsrcptsactvts509 |   txrevnuelevied509 |   srvcsval509 |   pubsuppsubtot509 |   rcvdfrmdisqualsub509 |   excds1pct509 |   subtotpub509 |   pubsupplesssub509 |   samepubsuppsubtot509 |   grsinc509 |   unreltxincls511tx509 |   subtotsuppinc509 |   netincunreltd509 |   othrinc509 |   totsupp509 |
|:--------|---------:|--------:|-----------:|------------:|--------------:|----------------:|---------------:|---------------------:|---------------------:|------------------:|---------------:|-------------------:|-----------:|----------------:|------------------:|---------------:|----------:|------------:|------------:|-----------:|--------------:|----------------------:|--------------:|---------------:|-------------:|------------------:|:---------------------|:---------------|:---------------|:---------------|:----------------|------------------:|:------------------|:--------------------|------------------:|----------------:|-----------------:|:---------------------|:--------------------|-----------:|----------------:|-------------:|-----------------:|--------------------:|--------------:|-------------------:|---------------:|--------------------:|-----------------------:|------------:|--------------------:|-------------:|----------------:|--------------------:|--------------------:|--------------------:|--------------------:|--------------------:|--------------:|-------------------:|-----------------------:|---------------:|---------------:|--------------------:|-----------------------:|------------:|-----------------------:|-------------------:|-------------------:|-------------:|-------------:|
| P       | 10011694 |  201609 |          3 |           0 |          4677 |            3866 |             36 |                    0 |                    0 |                 0 |              0 |                  0 |          0 |               0 |              2754 |            820 |      1934 |           0 |       10513 |       6528 |          3985 |                     0 |         29135 |          29135 |            0 |             29135 | N                    | N              | N              | N              | N               |                 0 | N                 | N                   |                 0 |               0 |                0 | N                    | N                   |         09 |               0 |            0 |                0 |                   0 |             0 |                  0 |              0 |                   0 |                      0 |           0 |                   0 |            0 |               0 |                   0 |               18214 |               31347 |                   0 |                   0 |             0 |              49561 |                  12520 |              0 |          12520 |               37041 |                  49561 |         286 |                      0 |                286 |                  0 |            0 |        49847 |
| P       | 10011694 |  201709 |          3 |           0 |          3089 |            2060 |             18 |                    0 |                    0 |                 0 |              0 |                  0 |          0 |               0 |              2659 |            782 |      1877 |           0 |        7044 |       8505 |         -1461 |                     0 |         27674 |          27674 |            0 |             27674 | N                    | N              | N              | N              | N               |                 0 | N                 | N                   |                 0 |               0 |                0 | N                    | N                   |         09 |               0 |            0 |                0 |                   0 |             0 |                  0 |              0 |                   0 |                      0 |           0 |                   0 |            0 |               0 |                   0 |               16429 |               30873 |                   0 |                   0 |             0 |              47302 |                  12548 |              0 |          12548 |               34754 |                  47302 |         280 |                      0 |                280 |                  0 |            0 |        47582 |
| P       | 10011694 |  201809 |          3 |           0 |          4041 |            2540 |             18 |                    0 |                    0 |                 0 |              0 |                  0 |          0 |               0 |              2359 |            685 |      1674 |           0 |        8273 |       6813 |          1460 |                     0 |         29134 |          29134 |            0 |             29134 | N                    | N              | N              | N              | N               |                 0 | N                 | N                   |                 0 |               0 |                0 | N                    | N                   |         09 |               0 |            0 |                0 |                   0 |             0 |                  0 |              0 |                   0 |                      0 |           0 |                   0 |            0 |               0 |                   0 |               15626 |               31483 |                   0 |                   0 |             0 |              47109 |                  12764 |              0 |          12764 |               34345 |                  47109 |         169 |                      0 |                169 |                  0 |            0 |        47278 |
| P       | 10011694 |  201909 |          3 |           0 |          3182 |            2205 |             19 |                    0 |                    0 |                 0 |              0 |                  0 |          0 |               0 |              4303 |           1282 |      3021 |           0 |        8427 |       6292 |          2135 |                     0 |         31269 |          31269 |            0 |             31269 | N                    | N              | N              | N              | N               |                 0 | N                 | N                   |                 0 |               0 |                0 | N                    | N                   |         09 |               0 |            0 |                0 |                   0 |             0 |                  0 |              0 |                   0 |                      0 |           0 |                   0 |            0 |               0 |                   0 |               14301 |               31988 |                   0 |                   0 |             0 |              46289 |                  12980 |              0 |          12980 |               33309 |                  46289 |         133 |                      0 |                133 |                  0 |            0 |        46422 |
| P       | 10011694 |  202009 |          3 |           0 |          2795 |            3670 |             20 |                    0 |                    0 |                 0 |              0 |                  0 |          0 |               0 |              2668 |            821 |      1847 |           0 |        8332 |       6410 |          1922 |                     0 |         33191 |          33191 |            0 |             33191 | N                    | N              | N              | N              | N               |                 0 | N                 | N                   |                 0 |               0 |                0 | N                    | N                   |         09 |               0 |            0 |                0 |                   0 |             0 |                  0 |              0 |                   0 |                      0 |           0 |                   0 |            0 |               0 |                   0 |               14341 |               32527 |                   0 |                   0 |             0 |              46868 |                  13196 |              0 |          13196 |               33672 |                  46868 |         111 |                      0 |              46979 |                  0 |            0 |        93847 |

Metadata:

| Column Name          | Data Type   | Unique Values   |   Missing Values |
|:---------------------|:------------|:----------------|-----------------:|
| efile                | object      | 2               |                0 |
| ein                  | int64       | 193,368         |                0 |
| taxpd                | int64       | 135             |                0 |
| subseccd             | int64       | 21              |                0 |
| totcntrbs            | int64       | 81,399          |                0 |
| prgmservrev          | int64       | 46,383          |                0 |
| duesassesmnts        | int64       | 35,288          |                0 |
| othrinvstinc         | int64       | 10,793          |                0 |
| grsamtsalesastothr   | int64       | 5,385           |                0 |
| basisalesexpnsothr   | int64       | 4,367           |                0 |
| gnsaleofastothr      | int64       | 5,664           |                0 |
| grsincgaming         | int64       | 5,552           |                0 |
| grsrevnuefndrsng     | int64       | 32,208          |                0 |
| direxpns             | int64       | 24,630          |                0 |
| netincfndrsng        | int64       | 31,952          |                0 |
| grsalesminusret      | int64       | 14,448          |                0 |
| costgoodsold         | int64       | 12,260          |                0 |
| grsprft              | int64       | 13,607          |                0 |
| othrevnue            | int64       | 16,269          |                0 |
| totrevnue            | int64       | 116,060         |                0 |
| totexpns             | int64       | 109,593         |                0 |
| totexcessyr          | int64       | 79,479          |                0 |
| othrchgsnetassetfnd  | int64       | 17,822          |                0 |
| networthend          | int64       | 137,153         |                0 |
| totassetsend         | int64       | 134,036         |                0 |
| totliabend           | int64       | 27,815          |                0 |
| totnetassetsend      | int64       | 135,989         |                0 |
| actvtynotprevrptcd   | object      | 2               |                0 |
| chngsinorgcd         | object      | 2               |                0 |
| unrelbusincd         | object      | 2               |                0 |
| filedf990tcd         | object      | 2               |                0 |
| contractioncd        | object      | 2               |                0 |
| politicalexpend      | int64       | 262             |                0 |
| filedf1120polcd      | object      | 2               |                0 |
| loanstoofficerscd    | object      | 2               |                0 |
| loanstoofficers      | int64       | 2,094           |                0 |
| initiationfee        | int64       | 1,029           |                0 |
| grspublicrcpts       | int64       | 578             |                0 |
| s4958excessbenefcd   | object      | 2               |                0 |
| prohibtdtxshltrcd    | object      | 2               |                0 |
| nonpfrea             | object      | 17              |                0 |
| totnoforgscnt        | int64       | 33              |                0 |
| totsupport           | int64       | 1,835           |                0 |
| gftgrntrcvd170       | int64       | 54,437          |                0 |
| txrevnuelevied170    | int64       | 885             |                0 |
| srvcsval170          | int64       | 694             |                0 |
| pubsuppsubtot170     | int64       | 54,584          |                0 |
| excds2pct170         | int64       | 12,624          |                0 |
| pubsupplesspct170    | int64       | 53,976          |                0 |
| samepubsuppsubtot170 | int64       | 54,520          |                0 |
| grsinc170            | int64       | 7,407           |                0 |
| netincunrelatd170    | int64       | 1,460           |                0 |
| othrinc170           | int64       | 7,046           |                0 |
| totsupport170        | int64       | 55,217          |                0 |
| grsrcptsrelatd170    | int64       | 12,105          |                0 |
| totgftgrntrcvd509    | int64       | 78,532          |                0 |
| grsrcptsadmiss509    | int64       | 47,665          |                0 |
| grsrcptsactvts509    | int64       | 9,524           |                0 |
| txrevnuelevied509    | int64       | 544             |                0 |
| srvcsval509          | int64       | 443             |                0 |
| pubsuppsubtot509     | int64       | 88,070          |                0 |
| rcvdfrmdisqualsub509 | int64       | 4,257           |                0 |
| excds1pct509         | int64       | 2,061           |                0 |
| subtotpub509         | int64       | 5,791           |                0 |
| pubsupplesssub509    | int64       | 87,340          |                0 |
| samepubsuppsubtot509 | int64       | 87,962          |                0 |
| grsinc509            | int64       | 9,159           |                0 |
| unreltxincls511tx509 | int64       | 399             |                0 |
| subtotsuppinc509     | int64       | 9,270           |                0 |
| netincunreltd509     | int64       | 1,443           |                0 |
| othrinc509           | int64       | 6,167           |                0 |
| totsupp509           | int64       | 88,360          |                0 |
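
The per-column loop used to build the metadata table can also be written with vectorized pandas calls. The sketch below shows one way to do that on a small stand-in frame; `toy` and its values are illustrative only and are not taken from the extract.

```python
import pandas as pd

# Toy frame standing in for form_990_2022ez (values are illustrative only).
toy = pd.DataFrame({
    "efile": ["P", "P", "E"],
    "ein": [10011694, 10011694, 900401808],
    "totcntrbs": [0, 0, 40038],
})

# Each Series below is indexed by column name, so they align automatically.
metadata = (
    pd.DataFrame({
        "Data Type": toy.dtypes.astype(str),
        "Unique Values": toy.nunique(),
        "Missing Values": toy.isna().sum(),
    })
    .rename_axis("Column Name")
    .reset_index()
)

print(metadata)
```

Because every column in the real extract reports zero missing values, it may be worth checking whether zeros are standing in for blanks before relying on the "Missing Values" count.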

# 3. Data Preparation

In this section, we detail the initial steps taken to prepare the IRS Form 990-EZ extract for analysis. Our goals are to standardize column names, handle missing values appropriately, and convert the data into types suited to our analysis. Please click the drop-down arrow for details on the code used.

In [4]:
# Standardize column names.
form_990_2022ez.columns = [x.lower() for x in form_990_2022ez.columns]

# Replace zeros with NaN for appropriate columns (not yet implemented).

# Replace NaN with appropriate values accordingly (not yet implemented).

# Convert columns to appropriate data types.
date_cols = ['taxpd']
for col in date_cols:
    # Raw string avoids an invalid-escape warning for the regex.
    form_990_2022ez[col] = pd.to_datetime(form_990_2022ez[col].astype(str).str.replace(r'\.0$', '', regex=True), format='%Y%m', errors='coerce')

# Store EIN as a string identifier rather than a number.
form_990_2022ez['ein'] = form_990_2022ez['ein'].astype(str).str.replace(r'\.0$', '', regex=True)

# Drop duplicates, keeping each EIN's most recent tax period.
print(form_990_2022ez.shape)
form_990_2022ez = form_990_2022ez.sort_values('taxpd').drop_duplicates('ein', keep='last')
print(form_990_2022ez.shape)

head_caption = "Cleaned data sample view:"
head_df = form_990_2022ez.head().copy()
display(head_caption, head_df)


(231289, 72)
(193368, 72)


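Cleaned data sample view:

| | efile | ein | taxpd | subseccd | totcntrbs | prgmservrev | duesassesmnts | othrinvstinc | grsamtsalesastothr | basisalesexpnsothr | ... | excds1pct509 | subtotpub509 | pubsupplesssub509 | samepubsuppsubtot509 | grsinc509 | unreltxincls511tx509 | subtotsuppinc509 | netincunreltd509 | othrinc509 | totsupp509 |
|-------:|:------|----------:|:-----------|---:|------:|-------:|-------:|-----:|---:|---:|:---|---:|---:|-------:|-------:|---:|---:|---:|---:|---:|-------:|
| 217119 | P | 900401808 | 2000-05-01 | 3 | 40038 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 62095 | P | 300358343 | 2002-03-01 | 3 | 27687 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 70027 | P | 331204761 | 2006-06-01 | 3 | 1925 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 156604 | 156604 | 0 | 0 | 0 | 0 | 0 | 156604 |
| 139749 | P | 581645905 | 2008-12-01 | 4 | 0 | 0 | 123725 | 557 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 162698 | P | 760694653 | 2008-12-01 | 12 | 0 | 185788 | 900 | 7698 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |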
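
One detail worth watching in this step: `ein` is read as `int64`, so any leading zero is lost (the first raw rows show `10011694`, eight digits, while EINs have nine). The sketch below pads EINs back to nine characters before deduplicating. It runs on a small stand-in frame, and the choice to zero-pad is an assumption about what downstream joins with other IRS files will need.

```python
import pandas as pd

# Stand-in for the raw extract; only the two columns needed here.
toy = pd.DataFrame({
    "ein": [10011694, 10011694, 900401808],
    "taxpd": [201609, 202009, 200005],
})

# Pad EINs to nine characters so leading zeros survive as strings.
toy["ein"] = toy["ein"].astype(str).str.zfill(9)

# Parse YYYYMM tax periods; values that do not parse become NaT.
toy["taxpd"] = pd.to_datetime(toy["taxpd"].astype(str), format="%Y%m", errors="coerce")

# Keep only the most recent filing per EIN.
latest = toy.sort_values("taxpd").drop_duplicates("ein", keep="last")

print(latest)
```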


# 4. Analysis
Objective: Determine whether Form 990-EZ data can support a comprehensive analysis of existing philanthropic giving in environmental and social justice, and whether it can be used to assess the level of transparency and accountability in current giving practices.


## 4.1. Identifying Relevant Organizations
Form 990-EZ records can be filtered by classification codes that reflect each organization's primary mission, which makes it possible to identify nonprofits focused on environmental protection, social justice, advocacy, and related activities. This step is crucial for building a focused dataset of organizations relevant to the objective above.

The next transformation step should filter the Form 990-EZ data in the same manner as the Exempt Organizations Business Master File and Form 990.

**Action item**: Review with team to determine which column makes the most sense to use to filter relevant orgs. Options include:
* Subsection and Classification codes.
* National Taxonomy of Exempt Entities (NTEE) codes (many are missing unfortunately).
* Foundation codes.
* Activity codes (most likely not useful since becoming obsolete with the adoption of the NTEE coding system in January 1995).
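
To make two of these options concrete, the sketch below filters a stand-in frame first by subsection code and then by NTEE major group. Everything in it is illustrative: the EINs and NTEE codes are invented, the `bmf` frame and its column names (`ein`, `ntee_cd`) are assumptions about how a Business Master File slice might be loaded, and the choice of subsection codes 3 and 4 and NTEE major groups C and R is only a starting point for the team discussion.

```python
import pandas as pd

# Stand-in for the cleaned 990-EZ data (EINs are made up).
ez = pd.DataFrame({
    "ein": ["000000001", "000000002", "000000003"],
    "subseccd": [3, 4, 12],
})

# Hypothetical Business Master File slice; NTEE codes are invented.
bmf = pd.DataFrame({
    "ein": ["000000001", "000000002"],
    "ntee_cd": ["C30", "R20"],
})

# Option 1: keep filers whose subsection code is 3 or 4.
by_subsection = ez[ez["subseccd"].isin([3, 4])]

# Option 2: attach NTEE codes, then keep major groups C and R.
# Rows with no NTEE code get NaN and are excluded by isin().
merged = ez.merge(bmf, on="ein", how="left")
major_group = merged["ntee_cd"].str[0]
by_ntee = merged[major_group.isin(["C", "R"])]

print(by_subsection)
print(by_ntee)
```

A left merge keeps every 990-EZ filer, so the share of rows with a missing NTEE code can be measured before deciding which option to rely on.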

In [None]:
# Insert code here for appropriate filtering if necessary.

## 4.2. Philanthropic Giving Analysis

The topics below can be drawn from several fields in the data to further our understanding of philanthropic giving.

* Financial contributions and grants: Indicate the volume of contributions an organization receives and the overall scale of philanthropic giving it is involved in.
* Revenue from related activities: Provide insight into how these activities contribute to the organization's overall funding and its mission focus.
* Expenses and allocations: Show how much the organization spends in total and on special events.

In [6]:
def display_head(df, columns, new_column_names, caption):
    """Render the first five rows of the selected columns as a captioned Markdown table."""
    head_df = df[columns].head().rename(columns=new_column_names)
    head_markdown = f"{caption}\n\n{head_df.to_markdown(index=False)}"
    display(Markdown(head_markdown))


def display_unique_values(df, columns, new_column_names, caption):
    """Render value counts for the selected columns as a captioned Markdown table."""
    unique_val_df = df[columns].value_counts().reset_index()
    unique_val_df = unique_val_df.rename(columns=new_column_names)
    uni_markdown = f"{caption}\n\n{unique_val_df.to_markdown(index=False)}"
    display(Markdown(uni_markdown))

### 4.2.1. Financial contributions and grants

Related data fields and their descriptions:

* gftgrntrcvd170: Gifts grants membership fees received (170)
* totcntrbs: Contributions, gifts, grants, etc received

In [7]:
head_columns = ['ein','gftgrntrcvd170','totcntrbs']
head_new_names = {'ein': 'EIN', 'gftgrntrcvd170': 'Gifts grants membership fees received (170)','totcntrbs':'Contributions, gifts, grants, etc received'}
display_head(form_990_2022ez, head_columns, head_new_names, "Example of financial contributions and grants columns:")



Example of financial contributions and grants columns:

|       EIN |   Gifts grants membership fees received (170) |   Contributions, gifts, grants, etc received |
|----------:|----------------------------------------------:|---------------------------------------------:|
| 900401808 |                                        449308 |                                        40038 |
| 300358343 |                                        117583 |                                        27687 |
| 331204761 |                                             0 |                                         1925 |
| 581645905 |                                             0 |                                            0 |
| 760694653 |                                             0 |                                            0 |

### 4.2.2. Revenue from related activities

Related data fields and their descriptions:

* prgmservrev: Program service revenue
* grspublicrcpts: Gross receipts for public use of club facilities

In [9]:
head_columns = ['ein','prgmservrev','grspublicrcpts']
head_new_names = {'ein': 'EIN', 'prgmservrev': 'Program service revenue', 'grspublicrcpts': 'Gross receipts for public use of club facilities'}
display_head(form_990_2022ez, head_columns, head_new_names, "Example of revenue from related activities data:")



Example of revenue from related activities data:

|       EIN |   Program service revenue |   Gross receipts for public use of club facilities |
|----------:|--------------------------:|---------------------------------------------------:|
| 900401808 |                         0 |                                                  0 |
| 300358343 |                         0 |                                                  0 |
| 331204761 |                         0 |                                                  0 |
| 581645905 |                         0 |                                                  0 |
| 760694653 |                    185788 |                                                  0 |
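
To compare how much of an organization's funding comes from contributions versus program services, each field can be expressed as a share of total revenue. The sketch below does this on a stand-in frame with made-up EINs and values; the column names `contrib_share` and `program_share` are hypothetical, and treating zero total revenue as undefined is an assumption.

```python
import numpy as np
import pandas as pd

# Stand-in rows (EINs and amounts are illustrative only).
toy = pd.DataFrame({
    "ein": ["000000001", "000000002", "000000003"],
    "totcntrbs": [40_000, 0, 0],
    "prgmservrev": [0, 5_000, 0],
    "totrevnue": [50_000, 10_000, 0],
})

# Treat zero total revenue as undefined rather than dividing by zero.
revenue = toy["totrevnue"].replace(0, np.nan)

toy["contrib_share"] = toy["totcntrbs"] / revenue
toy["program_share"] = toy["prgmservrev"] / revenue

print(toy[["ein", "contrib_share", "program_share"]])
```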

### 4.2.3. Expenses and allocations

Related data fields and their descriptions:

* totexpns: Total expenses
* direxpns: Special events direct expenses

In [11]:
head_columns = ['ein','totexpns','direxpns']
head_new_names = {'ein': 'EIN', 'totexpns': 'Total expenses','direxpns':'Special events direct expenses'}
display_head(form_990_2022ez, head_columns, head_new_names, "Example of expenses and allocations data:")



Example of expenses and allocations data:

|       EIN |   Total expenses |   Special events direct expenses |
|----------:|-----------------:|---------------------------------:|
| 900401808 |            39875 |                                0 |
| 300358343 |                0 |                                0 |
| 331204761 |             4293 |                            25788 |
| 581645905 |           121887 |                                0 |
| 760694653 |           111156 |                                0 |
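
A simple internal-consistency check is one way these fields could feed the transparency assessment. The sketch below uses the revenue, expense, and excess values from the five sample rows shown in Section 2.1 and checks whether `totexcessyr` equals `totrevnue` minus `totexpns`. The `excess_matches` column name is hypothetical, and whether this identity should hold for every filing is an assumption to confirm against the form instructions.

```python
import pandas as pd

# Values copied from the first five rows of the raw extract (EIN 10011694).
sample = pd.DataFrame({
    "taxpd": [201609, 201709, 201809, 201909, 202009],
    "totrevnue": [10513, 7044, 8273, 8427, 8332],
    "totexpns": [6528, 8505, 6813, 6292, 6410],
    "totexcessyr": [3985, -1461, 1460, 2135, 1922],
})

# Reported excess (or deficit) should equal revenue minus expenses.
expected_excess = sample["totrevnue"] - sample["totexpns"]
sample["excess_matches"] = sample["totexcessyr"] == expected_excess

print(sample[["taxpd", "totexcessyr", "excess_matches"]])
print(sample["excess_matches"].all())  # True for these five rows
```

Rows that fail a check like this would be candidates for closer review rather than proof of a reporting problem.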