# 1. Introduction

The Split-Interest Trust data are collected from Form 5227, the Split-Interest Trust Information Return, and describe a specific type of trust known as a split-interest trust. In these trusts, the benefits are divided (or "split") between two types of beneficiaries. Typically, a non-charitable beneficiary (such as an individual or a family member) receives one set of benefits for a specified period, and a charitable organization receives the other set, either for a different period or in perpetuity.

Split-interest trusts use IRS Form 5227 to report income, deductions, and credits to the IRS. The form helps ensure that these trusts comply with tax laws and regulations, and it lets the IRS monitor their activities and financial health. This file was pulled from https://www.irs.gov/charities-non-profits/exempt-organizations-business-master-file-extract-eo-bmf


# 2. First Glance

## 2.1. General Summary

In [12]:
# Libraries for data manipulation.
import pandas as pd
import numpy as np

# Libraries for data visualisation.
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.io as pio

# Libraries for Quarto rendering.
from IPython.display import Markdown, display
from tabulate import tabulate

# Remove warnings.
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

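# Read in data. low_memory=False has pandas infer each column's dtype from the
# whole file rather than chunk by chunk, which avoids the mixed-type DtypeWarning.
split_interest = pd.read_csv('../../data/sit-2020.csv', low_memory=False)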

# Print data dimensions.
shape_caption = "Data Dimensions:"
shape_df = pd.DataFrame({
        'Dimension': ['Rows','Columns'],
        'Count': [split_interest.shape[0], split_interest.shape[1]]
    })
shape_df['Count'] = shape_df['Count'].apply(lambda x: f"{x:,}")
shape_markdown = shape_caption + "\n\n" + shape_df.to_markdown(index=False)
display(Markdown(shape_markdown))

# Print a sample of the data.
first_five_rows_caption = "First Five Rows of Data:"
first_five_rows_markdown = first_five_rows_caption + "\n\n" + split_interest.head().to_markdown(index=False)
display(Markdown(first_five_rows_markdown))

# Print metadata.
metadata_caption = "Metadata:"
column_metadata = []

for col in split_interest.columns:
    # Gather metadata for each col.
    col_metadata = {
        'Column Name': col,
        'Data Type': str(split_interest[col].dtype),
        'Unique Values': split_interest[col].nunique(),
        'Missing Values': split_interest[col].isnull().sum()
    }
    # Append metadata to list.
    column_metadata.append(col_metadata)

# Convert list to pd df and then markdown table.
metadata_df = pd.DataFrame(column_metadata)
metadata_df['Unique Values'] = metadata_df['Unique Values'].apply(lambda x: f"{x:,}")
metadata_df['Missing Values'] = metadata_df['Missing Values'].apply(lambda x: f"{x:,}")
metadata_markdown = metadata_caption + "\n\n" + metadata_df.to_markdown(index=False)
display(Markdown(metadata_markdown))
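
The per-column metadata loop can also be written without an explicit loop. The following is a minimal sketch, not the notebook's own method: `summarize_columns` is a hypothetical helper name, and the small `demo` frame is made-up data standing in for `split_interest`.

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with dtype, unique count, and missing count.

    Hypothetical helper (not part of the original notebook). It builds the
    same four fields as the loop, using vectorized pandas calls.
    """
    return pd.DataFrame({
        'Column Name': list(df.columns),
        'Data Type': df.dtypes.astype(str).to_numpy(),
        'Unique Values': df.nunique().to_numpy(),        # NaN is not counted
        'Missing Values': df.isnull().sum().to_numpy(),
    })


# Made-up stand-in for split_interest, for illustration only.
demo = pd.DataFrame({
    'EIN': [2001555, 2032809, 2126212],
    'PRIMARY_STATE_CD': ['ME', 'NY', None],
})
summary = summarize_columns(demo)
print(summary.to_string(index=False))
```

With 315 columns the speed difference is small. The main benefit is that the summary becomes a reusable function that could be applied to other extracts.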



Data Dimensions:

| Dimension   | Count   |
|:------------|:--------|
| Rows        | 92,186  |
| Columns     | 315     |

First Five Rows of Data:

|   PRIMARY_TIN_TYP_CD |     EIN |   ESTABLISHED_PRD | CHECK_DIGIT_CD   | NAME_CNTRL_CD   |   COLL_LOC_CD |   ULC_CD |   MF_AO_CD |   SMALL_BUS_AO_CD |   PRIMARY_ZIP_CD |   FR_941_CD |   FR_1120_CD |   FR_720_CD |   FR_1041_CD |   FR_1065_CD |   FR_1066_CD |   FR_8804_CD |   FR_CT1_CD |   FR_940_CD |   FR_943_CD |   FR_1042_CD |   FR_944_CD |   FR_8752_CD |   FR_945_CD |   FR_990T_CD |   FR_1041A_CD |   FR_5227_CD |   FR_3520A_CD |   FR_990PF_CD |   FR_4720_CD |   FR_709_CD |   FR_706_CD |   FR_2290_CD |   FR_11C_CD |   FR_730_CD |   FR_990_CD |   FR_3520_CD |   FR_706GS_T_CD |   FR_706GS_D_CD |   UNIFIED_CUM_AMT |   CAF_CNT |   NAICS_CD |   NAICS_VAL_CD |   NAICS_YR |   RRB_NUM |   NUM_PARTNERS_CNT |   F1065_PTR_TAX_PRD |   SOLE_PROP_SSN_NUM |   OFFICER_TIN_NUM |   PARENT_EIN |   LATEST_709_PRD |   TEFRA_MFT_CD |   TEFRA_PRD |   ADR_CHANGE_PRD |   PETITION_DT |   SEHI_CREDIT_45R_1ST_YR |   SEHI_CREDIT_45R_2ND_YR |   TC520_CC6X_DT |   BUSINESS_PRD |   BUSINESS_CLOSE_DT |   WAGES_PRD |   WAGES_LAST_DT |   CFOL_UPDT_PRD |   FTD_DEPOSIT_YR |   FTD_HIS_DEPOSIT_YR |   LEVY_HEARING_DT |   TC971_AC954_YR |   MEMO_SW_CD |   TDA_TDI_IND |   TDA_CD |   TDI_IND |   SW941_CD |   TC52X_CD |   SW910_CD |   IND637_CD |   TC74X_CD |   C_CORP_CD |   INVALID_SSN_CD |   REVAL_SSN_CD |   OFFSET_CD |   SW918_IND |   TC148_CD |   SW914_CD |   ACT_CD |   SW1120_CD |   BANKRUPT_FLC_CD |   SW53_CD |   TC530_UNREV_CD |   TC530_UNRV_CC19_CD |   NON_53_TDA_CD |   DOC_87_CD |   SW720_CD |   IDRS_CD |   TC130_CD |   TC844_CD |   IDRS_ACT_CD |   FCIC_CD |   TC59X_CD |   FTD_ALERT_CD |   OVERFLOW_CD |   EO_STAT_CD |   CLOSE_53_CD |   IDRS_JJ_CD |   UPC_359_IND |   AIMS_CD |   HARDSHIP_CD |   IMF_FILING_CD |   IND_2032A_CD |   CSED_CD |   EOMF_CD |   TAX_SHELTER_IND |   AUDIT_HISTORY_CD |   CMS_CD |   DMF_CD |   PMF_CD |   FM_8123_CD |   FM_CT1_CD |   SW1042_CD |   SW940_CD |   SW943_CD |   OIC_YR |   TC06X_CD |   OPENING_DO_CD |   SC_JURISDICTION_CD |   CC84_CD |   BAL_DUE_IND |   ES_PENALTY_CD | 
  PDT_IND |   SW597_IND |   NANNY_TAX_CD |   TCMP_CD |   F944_CD |   LRA136_CD |   LARGE_CORP_CD |   TCMP_CYCLE_CD |   TC09X_NUM |   FISCAL_PRD |   PR_FY_MON_NUM |   DELQ_HISTORY_CD |   ERIS_CD |   STATUS_58_CD |   LPS_ACT_IND |   BANKRUPT_LOC_CD |   STATUS_26_CD |   STATUS_02_CD |   RTN_1_CD |   RTN_2_CD |   RTN_3_CD |   RTN_4_CD |   RTN_5_CD |   MRS_CD |   BMFOL_CYCLE_CD |   BMFOL_ACTIVE_CD |   CEP_IND |   DISASTER_CD |   F990T_EO_ORG_CD |   CORP_990T_IND |   GATT_CD |   VENDOR_CD |   DMF_FRZ_CD |   TC076_CD | MF_BOD_CD   | BOD_CLIENT_CD   |   MAN_BOD_CLIENT_IND |   FM941_TDA_TDI_CD |   LEVY_SW_CD |   EFTPS_84_CD |   EFTPS_85_CD |   EMPLOYMENT_CD |   CD_48_IND |   TC520_CC6X_IND |   COLL_LOC_IND |   TC08X_CP_IND |   NY_PENT_TERR_CD |   TOT_1065_1120_CD |   FRZ_916_CD |   ACT_RET_PENSION_CD |   ERO_IND |   EFTP_FTD_ABATE_IND |   IND527_CD |   F8872_TYPE_CD |   F8872_PR_CD |   FINAL_8872_CD |   KATRINA_CD |   TEL_EX_TAX_REF_CD |   F944_BS6_CD |   TETR_424_CD |   LIEN_CD |   F944_BYPS_CD |   SWCP148_CD |   F990_PF_REP_IND |   LLC_IND |   FOREIGN_CNTRY_CD |   COBRA_ASSIST_CD |   UNDEL_ADR_CD |   UNDEL_CD |   CORP_FTP_CD |   APPL_LG_EMPLR_CD |   DOJ_CD |   IDENTITY_THEFT_CD |   PRIVATE_DEBT_COLLECTION_CD |   DAILY_DELQ_PNLTY_501C_CD |   PRIV_DEBT_COLL_TP_AUTH_NUM |   INDIV_ESTATE_IND |   AC754_IND |   HIST_FIDO_CD |   HIST_TDA_PRM_DO_CD |   HIST_PRIM_DO |   PRIOR_SOLE_PROP_SSN_NUM |   SOLE_PROP_SCH_PRD |   PRIOR_PRIN_OWNER_TRUST_TIN |   PRIN_TIN_ORD |   F944_FRC_CYMN8_CD |   F944_FRC_CYMN7_CD |   F944_FRC_CYMN6_CD |   F944_FRC_CYMN5_CD |   F944_FRC_CYMN4_CD |   F944_FRC_CYMN3_CD |   F944_FRC_CYMN2_CD |   F944_FRC_CYMN1_CD |   F944_FRC_CY_CD |   F944_FRC_CYPL1_CD |   RAF_FL940_CD |   RAF_FL941_CD |   RAF_FL943_CD |   RAF_FL944_CD |   RAF_FL945_CD |   RAF_FLCT1_CD |   RAF_FL1042_CD |   EFTPS_940_CD |   EFTPS_941_CD |   EFTPS_943_CD |   EFTPS_944_CD |   EFTPS_945_CD |   EFTPS_CT1_CD |   EFTPS_720_CD |   EFTPS_1042_CD |   EFTPS_FCORP_CD |   GEOGRAPHIC_CD | OLD_NM_CNTRL_CD 
  | PRIMARY_STATE_CD   | PRIMARY_CITY_NM   | PRIMARY_ADR   | PRIMARY_NM                        | PRIMARY_CONT_NM                |   SORT_NM | CARE_OF_NM                       |   LOCATION_ADR |   LOCATION_CONT_ADR |   FOREIGN_ADR |   AO_CD |   GRP_EXEMPT_NUM |   CURR_SUBSECT_CD |   PRIOR_SUBSECT_CD |   AFFILIATION_CD |   CLASSIFICATION_CD |   ORGANIZING_DT |   REGISTRATION_DT |   RULING_PRD |   DEDUCT_CD |   DEDUCT_YR |   CURR_FOUNDATION_CD |   PR_FOUNDATION_CD |   BMF_ACTIVITY_CD |   ORGANIZATION_CD |   FILE_FOLDER_NUM |   PENSION_PLAN_CD |   ADV_RULING_PRD |   TEAM_EXPRG_CASE_CD |   NTEE_CD |   HOSP_CD |   LOBBY_CD |   LOBBY_ELECT_YR |   ASSET_CD |   INCOME_CD |   RTN_TAX_YR |   CURR_STATUS_CD |   CURR_STATUS_PRD |   PRIOR_STATUS_CD |   PRIOR_STATUS_PRD |   ZIP_9_CD |   ZIP_5_CD |   ZIP_3_CD | REC_LOAD_DT   |   CHAR_TIN |   ASSET_AMT |   INCOME_AMT |   Name |   ICO |   Sort_Name |   Street |   rx |   matchpos |   matchpos2 |   matchpos3 |   matchpos4 |   Old_Name |   SSNFMT_NAME |   Old_ICO |   SSNFMT_ICO |   Old_Sort_Name |   SSNFMT_SORTNAME |   Old_Street |   SSNFMT_STREET |   Strng_Length |   Nmbr_Before |   Nmbr_After |   Name_9Dig |   ICO_9Dig |   Sort_Name_9Dig |   Street_9Dig |   DIG9_NAME |   DIG9_ICO |   DIG9_SORTNAME |   DIG9_STREET |
|---------------------:|--------:|------------------:|:-----------------|:----------------|--------------:|---------:|-----------:|------------------:|-----------------:|------------:|-------------:|------------:|-------------:|-------------:|-------------:|-------------:|------------:|------------:|------------:|-------------:|------------:|-------------:|------------:|-------------:|--------------:|-------------:|--------------:|--------------:|-------------:|------------:|------------:|-------------:|------------:|------------:|------------:|-------------:|----------------:|----------------:|------------------:|----------:|-----------:|---------------:|-----------:|----------:|-------------------:|--------------------:|--------------------:|------------------:|-------------:|-----------------:|---------------:|------------:|-----------------:|--------------:|-------------------------:|-------------------------:|----------------:|---------------:|--------------------:|------------:|----------------:|----------------:|-----------------:|---------------------:|------------------:|-----------------:|-------------:|--------------:|---------:|----------:|-----------:|-----------:|-----------:|------------:|-----------:|------------:|-----------------:|---------------:|------------:|------------:|-----------:|-----------:|---------:|------------:|------------------:|----------:|-----------------:|---------------------:|----------------:|------------:|-----------:|----------:|-----------:|-----------:|--------------:|----------:|-----------:|---------------:|--------------:|-------------:|--------------:|-------------:|--------------:|----------:|--------------:|----------------:|---------------:|----------:|----------:|------------------:|-------------------:|---------:|---------:|---------:|-------------:|------------:|------------:|-----------:|-----------:|---------:|-----------:|----------------:|---------------------:|----------:|--------------:|----------------:|-
---------:|------------:|---------------:|----------:|----------:|------------:|----------------:|----------------:|------------:|-------------:|----------------:|------------------:|----------:|---------------:|--------------:|------------------:|---------------:|---------------:|-----------:|-----------:|-----------:|-----------:|-----------:|---------:|-----------------:|------------------:|----------:|--------------:|------------------:|----------------:|----------:|------------:|-------------:|-----------:|:------------|:----------------|---------------------:|-------------------:|-------------:|--------------:|--------------:|----------------:|------------:|-----------------:|---------------:|---------------:|------------------:|-------------------:|-------------:|---------------------:|----------:|---------------------:|------------:|----------------:|--------------:|----------------:|-------------:|--------------------:|--------------:|--------------:|----------:|---------------:|-------------:|------------------:|----------:|-------------------:|------------------:|---------------:|-----------:|--------------:|-------------------:|---------:|--------------------:|-----------------------------:|---------------------------:|-----------------------------:|-------------------:|------------:|---------------:|---------------------:|---------------:|--------------------------:|--------------------:|-----------------------------:|---------------:|--------------------:|--------------------:|--------------------:|--------------------:|--------------------:|--------------------:|--------------------:|--------------------:|-----------------:|--------------------:|---------------:|---------------:|---------------:|---------------:|---------------:|---------------:|----------------:|---------------:|---------------:|---------------:|---------------:|---------------:|---------------:|---------------:|----------------:|-----------------:|----------------:|:----------------
--|:-------------------|:------------------|:--------------|:----------------------------------|:-------------------------------|----------:|:---------------------------------|---------------:|--------------------:|--------------:|--------:|-----------------:|------------------:|-------------------:|-----------------:|--------------------:|----------------:|------------------:|-------------:|------------:|------------:|---------------------:|-------------------:|------------------:|------------------:|------------------:|------------------:|-----------------:|---------------------:|----------:|----------:|-----------:|-----------------:|-----------:|------------:|-------------:|-----------------:|------------------:|------------------:|-------------------:|-----------:|-----------:|-----------:|:--------------|-----------:|------------:|-------------:|-------:|------:|------------:|---------:|-----:|-----------:|------------:|------------:|------------:|-----------:|--------------:|----------:|-------------:|----------------:|------------------:|-------------:|----------------:|---------------:|--------------:|-------------:|------------:|-----------:|-----------------:|--------------:|------------:|-----------:|----------------:|--------------:|
|                    2 | 2001555 |            197112 | UI               | OAKE            |             0 |        1 |        nan |                21 |      44015108025 |           0 |            0 |           0 |            0 |            0 |            0 |            0 |           0 |           0 |           0 |            0 |           0 |            0 |           0 |            0 |             0 |            0 |             0 |             0 |            0 |           0 |           0 |            0 |           0 |           0 |           0 |            0 |               0 |               0 |                 0 |         0 |     813000 |              5 |       1988 |         0 |                  0 |                   0 |                   0 |                 0 |            0 |                0 |              0 |           0 |           200640 |           nan |                        0 |                        0 |             nan |              0 |                 nan |           0 |             nan |          202203 |                0 |                    0 |               nan |                0 |            0 |             0 |        0 |         0 |          0 |          0 |          0 |           0 |          0 |           0 |                0 |              0 |           0 |           0 |          0 |          0 |        1 |           0 |                 0 |         0 |                0 |                    0 |               0 |           0 |          0 |         0 |          0 |          0 |             0 |         0 |          0 |              0 |             0 |            2 |             0 |            0 |             0 |         0 |             0 |               0 |              0 |         0 |         0 |                 0 |                  0 |        0 |        0 |        0 |            0 |           0 |           0 |          0 |          0 |        0 |          0 |               1 |                    0 |         0 |             0 |               0 | 
        0 |           0 |              0 |         0 |         0 |           0 |               0 |               0 |           0 |       198504 |               0 |                 0 |         0 |              0 |             0 |                 0 |              0 |              0 |          0 |          0 |          0 |          0 |          0 |        0 |                0 |                 0 |         0 |             4 |                 0 |               0 |         0 |           0 |            0 |          0 | TE          | B               |                    0 |                  0 |            0 |             0 |             0 |             nan |           0 |                0 |              0 |              0 |                 0 |                  0 |            0 |                    0 |         0 |                    0 |           0 |               0 |             0 |               0 |            0 |                   0 |             0 |             0 |         0 |              0 |            0 |                 0 |       nan |                nan |                 0 |              0 |          0 |             0 |                  0 |        0 |                   0 |                            0 |                          0 |                            0 |                  0 |           0 |              1 |                    0 |              4 |                         0 |                   0 |                            0 |              0 |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |              nan |                 nan |              0 |              0 |              0 |              0 |              0 |              0 |               0 |              0 |              0 |              0 |              0 |              0 |              0 |              0 |               0 |                0 |               0 | nan             
  | ME                 | BANGOR            | 2 HAMMOND ST  | MYRTICE OAKES TR UW NECT          | nan                            |       nan | % MERRILL TR CO TTEE             |            nan |                 nan |           nan |       1 |                0 |                90 |                  0 |                0 |                1000 |             nan |               nan |       197012 |           0 |           0 |                    0 |                  0 |         909000000 |                 2 |          40000000 |                 2 |                0 |                    0 |       nan |         0 |          0 |                0 |          0 |           0 |         1970 |               12 |            197305 |               nan |                  0 |   44015108 |       4401 |         44 | 2022-01-21    | 2002001555 |         nan |          nan |    nan |   nan |         nan |      nan |  nan |        nan |         nan |         nan |         nan |        nan |           nan |       nan |          nan |             nan |               nan |          nan |             nan |            nan |           nan |          nan |         nan |        nan |              nan |           nan |         nan |        nan |             nan |           nan |
|                    2 | 2032809 |            197606 | SJ               | HISP            |             0 |       13 |        nan |                21 |     100192701991 |           0 |            0 |           0 |            0 |            0 |            0 |            0 |           0 |           0 |           0 |            0 |           0 |            0 |           0 |            0 |             0 |            0 |             0 |             0 |            0 |           0 |           0 |            0 |           0 |           0 |           0 |            0 |               0 |               0 |                 0 |         0 |     813000 |              5 |       1981 |         0 |                  0 |                   0 |                   0 |                 0 |            0 |                0 |              0 |           0 |           200639 |           nan |                        0 |                        0 |             nan |              0 |                 nan |           0 |             nan |          202206 |                0 |                    0 |               nan |                0 |            0 |             0 |        0 |         0 |          0 |          0 |          0 |           0 |          0 |           0 |                0 |              0 |           0 |           0 |          0 |          0 |        1 |           0 |                 0 |         0 |                0 |                    0 |               0 |           0 |          0 |         0 |          0 |          0 |             0 |         0 |          0 |              0 |             0 |            2 |             0 |            0 |             0 |         0 |             0 |               0 |              0 |         0 |         0 |                 0 |                  0 |        0 |        0 |        0 |            0 |           0 |           0 |          0 |          0 |        0 |          0 |              13 |                    0 |         0 |             0 |               0 | 
        0 |           0 |              0 |         0 |         0 |           0 |               0 |               0 |           0 |       198412 |               0 |                 0 |         0 |              0 |             0 |                 0 |              0 |              0 |          0 |          0 |          0 |          0 |          0 |        0 |                0 |                 0 |         0 |             4 |                 0 |               0 |         0 |           0 |            0 |          0 | TE          | B               |                    0 |                  0 |            0 |             0 |             0 |             nan |           0 |                0 |              0 |              0 |                 1 |                  0 |            0 |                    0 |         0 |                    0 |           0 |               0 |             0 |               0 |            0 |                   0 |             0 |             0 |         0 |              0 |            0 |                 0 |       nan |                nan |                 0 |              0 |          0 |             0 |                  0 |        0 |                   0 |                            0 |                          0 |                            0 |                  0 |           0 |             13 |                    0 |             13 |                         0 |                   0 |                            0 |              0 |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |              nan |                 nan |              0 |              0 |              0 |              0 |              0 |              0 |               0 |              0 |              0 |              0 |              0 |              0 |              0 |              0 |               0 |                0 |               0 | nan             
  | NY                 | NEW YORK          | 9 W 57TH ST   | HISPANIC SOCIETY OF AMERICA TRUST | NECT                           |       nan | % MORGAN GUARANTY TRUST COMPANY  |            nan |                 nan |           nan |       1 |                0 |                90 |                  0 |                0 |                1000 |             nan |               nan |       197605 |           0 |           0 |                    0 |                  0 |         909000000 |                 2 |         130000000 |                 2 |                0 |                    0 |       nan |         0 |          0 |                0 |          0 |           0 |            0 |               12 |            197605 |               nan |                  0 |  100192701 |      10019 |        100 | 2022-02-11    | 2002032809 |         nan |          nan |    nan |   nan |         nan |      nan |  nan |        nan |         nan |         nan |         nan |        nan |           nan |       nan |          nan |             nan |               nan |          nan |             nan |            nan |           nan |          nan |         nan |        nan |              nan |           nan |         nan |        nan |             nan |           nan |
|                    2 | 2126212 |            197710 | KT               | BONT            |             0 |       35 |        nan |                24 |     468020000000 |           0 |            0 |           0 |            0 |            0 |            0 |            0 |           0 |           0 |           0 |            0 |           0 |            0 |           0 |            0 |             0 |            0 |             0 |             0 |            0 |           0 |           0 |            0 |           0 |           0 |           0 |            0 |               0 |               0 |                 0 |         0 |     813000 |              5 |       1982 |         0 |                  0 |                   0 |                   0 |                 0 |            0 |                0 |              0 |           0 |           200637 |           nan |                        0 |                        0 |             nan |              0 |                 nan |           0 |             nan |          202203 |                0 |                    0 |               nan |                0 |            0 |             0 |        0 |         0 |          0 |          0 |          0 |           0 |          0 |           0 |                0 |              0 |           0 |           0 |          0 |          0 |        1 |           1 |                 0 |         0 |                0 |                    0 |               0 |           0 |          0 |         0 |          0 |          0 |             0 |         0 |          0 |              0 |             0 |            2 |             0 |            0 |             0 |         0 |             0 |               0 |              0 |         0 |         0 |                 0 |                  0 |        0 |        0 |        0 |            0 |           0 |           0 |          0 |          0 |        0 |          0 |              35 |                    0 |         0 |             0 |               0 | 
        0 |           0 |              0 |         0 |         0 |           0 |               0 |               0 |           0 |       198609 |               0 |                 0 |         0 |              0 |             0 |                 0 |              0 |              0 |          0 |          0 |          0 |          0 |          0 |        0 |                0 |                 0 |         0 |             4 |                 0 |               0 |         0 |           0 |            0 |          0 | TE          | B               |                    0 |                  0 |            0 |             0 |             0 |             nan |           0 |                0 |              0 |              0 |                 0 |                  0 |            0 |                    0 |         0 |                    0 |           0 |               0 |             0 |               0 |            0 |                   0 |             0 |             0 |         0 |              0 |            0 |                 0 |       nan |                nan |                 0 |              0 |          0 |             0 |                  0 |        0 |                   0 |                            0 |                          0 |                            0 |                  0 |           0 |             35 |                    0 |             35 |                         0 |                   0 |                            0 |              0 |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |              nan |                 nan |              0 |              0 |              0 |              0 |              0 |              0 |               0 |              0 |              0 |              0 |              0 |              0 |              0 |              0 |               0 |                0 |               0 | CAST            
  | IN                 | FORT WAYNE        | LOCAL         | LOUISE BONTER UNITRUST FBO JESSIE | FRANCES CASTLE                 |       nan | % FORT WAYNE NATIONAL BANK       |            nan |                 nan |           nan |       3 |                0 |                90 |                  0 |                0 |                1000 |             nan |               nan |       197710 |           0 |           0 |                    0 |                  0 |         928000000 |                 2 |         310000000 |                 2 |                0 |                    0 |       nan |         0 |          0 |                0 |          4 |           3 |         1981 |               12 |            197710 |               nan |                  0 |  468020000 |      46802 |        468 | 2022-01-21    | 2002126212 |         nan |          nan |    nan |   nan |         nan |      nan |  nan |        nan |         nan |         nan |         nan |        nan |           nan |       nan |          nan |             nan |               nan |          nan |             nan |            nan |           nan |          nan |         nan |        nan |              nan |           nan |         nan |        nan |             nan |           nan |
|                    2 | 2190092 |            197112 | CU               | JARV            |             0 |       13 |        nan |                21 |     100202302992 |           0 |            0 |           0 |            0 |            0 |            0 |            0 |           0 |           0 |           0 |            0 |           0 |            0 |           0 |            0 |             0 |            0 |             0 |             0 |            0 |           0 |           0 |            0 |           0 |           0 |           0 |            0 |               0 |               0 |                 0 |         0 |     813000 |              5 |       1985 |         0 |                  0 |                   0 |                   0 |                 0 |            0 |                0 |              0 |           0 |           200637 |           nan |                        0 |                        0 |             nan |              0 |                 nan |           0 |             nan |          202206 |                0 |                    0 |               nan |                0 |            0 |             0 |        0 |         0 |          0 |          0 |          0 |           0 |          0 |           0 |                0 |              0 |           0 |           0 |          0 |          0 |        1 |           1 |                 0 |         0 |                0 |                    0 |               0 |           0 |          0 |         0 |          0 |          0 |             0 |         0 |          0 |              0 |             0 |            2 |             0 |            0 |             0 |         0 |             0 |               0 |              0 |         0 |         0 |                 0 |                  0 |        0 |        0 |        0 |            0 |           0 |           0 |          0 |          0 |        0 |          0 |              13 |                    0 |         0 |             0 |               0 | 
        0 |           0 |              0 |         0 |         0 |           0 |               0 |               0 |           0 |       198512 |               0 |                 0 |         0 |              0 |             0 |                 0 |              0 |              0 |          0 |          0 |          0 |          0 |          0 |        0 |                0 |                 0 |         0 |             4 |                 0 |               0 |         0 |           0 |            0 |          0 | TE          | B               |                    0 |                  0 |            0 |             0 |             0 |             nan |           0 |                0 |              0 |              0 |                 1 |                  0 |            0 |                    0 |         0 |                    0 |           0 |               0 |             0 |               0 |            0 |                   0 |             0 |             0 |         0 |              0 |            0 |                 0 |       nan |                nan |                 0 |              0 |          0 |             0 |                  0 |        0 |                   0 |                            0 |                          0 |                            0 |                  0 |           0 |             13 |                    0 |             13 |                         0 |                   0 |                            0 |              0 |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |              nan |                 nan |              0 |              0 |              0 |              0 |              0 |              0 |               0 |              0 |              0 |              0 |              0 |              0 |              0 |              0 |               0 |                0 |               0 | JAVI            
  | NY                 | NEW YORK          | 600 FIFTH AVE | JAMES N JARVIE TRUST UW PAR 14 FB | JERUSALEM YMCA BUILDING ISRAEL |       nan | % MANUFACTURERS HANOVER TRUST CO |            nan |                 nan |           nan |       1 |                0 |                90 |                  0 |                0 |                1000 |             nan |               nan |       197012 |           0 |           0 |                    0 |                  0 |         909000000 |                 2 |         130000000 |                 2 |                0 |                    0 |       nan |         0 |          0 |                0 |          0 |           0 |         1970 |               12 |            197308 |               nan |                  0 |  100202302 |      10020 |        100 | 2022-02-11    | 2002190092 |         nan |          nan |    nan |   nan |         nan |      nan |  nan |        nan |         nan |         nan |         nan |        nan |           nan |       nan |          nan |             nan |               nan |          nan |             nan |            nan |           nan |          nan |         nan |        nan |              nan |           nan |         nan |        nan |             nan |           nan |
|                    2 | 2209021 |            197903 | WQ               | HAMM            |             0 |       56 |        nan |                23 |     273490000000 |           0 |            0 |           0 |            0 |            0 |            0 |            0 |           0 |           0 |           0 |            0 |           0 |            0 |           0 |            0 |             0 |            0 |             0 |             0 |            0 |           0 |           0 |            0 |           0 |           0 |           0 |            0 |               0 |               0 |                 0 |         0 |     999000 |              5 |       1983 |         0 |                  0 |                   0 |                   0 |                 0 |            0 |                0 |              0 |           0 |           200636 |           nan |                        0 |                        0 |             nan |              0 |                 nan |           0 |             nan |          202241 |                0 |                    0 |               nan |                0 |            0 |             0 |        0 |         0 |          0 |          0 |          0 |           0 |          0 |           0 |                0 |              0 |           0 |           0 |          0 |          0 |        1 |           0 |                 0 |         0 |                0 |                    0 |               0 |           0 |          0 |         0 |          0 |          0 |             0 |         0 |          0 |              0 |             0 |            2 |             0 |            0 |             0 |         0 |             0 |               0 |              0 |         0 |         0 |                 0 |                  0 |        0 |        0 |        0 |            0 |           0 |           0 |          0 |          0 |        0 |          0 |              56 |                    0 |         0 |             0 |               0 | 
        0 |           0 |              0 |         0 |         0 |           0 |               0 |               0 |           0 |       198312 |               0 |                 0 |         0 |              0 |             0 |                 0 |              0 |              0 |          0 |          0 |          0 |          0 |          0 |        0 |                0 |                 0 |         0 |             5 |                 0 |               0 |         0 |           0 |            0 |          0 | TE          | B               |                    0 |                  0 |            0 |             0 |             0 |             nan |           0 |                0 |              0 |              0 |                 0 |                  0 |            0 |                    0 |         0 |                    0 |           0 |               0 |             0 |               0 |            0 |                   0 |             0 |             0 |         0 |              0 |            0 |                 0 |       nan |                nan |                 0 |              0 |          0 |             0 |                  0 |        0 |                   0 |                            0 |                          0 |                            0 |                  0 |           0 |             56 |                    0 |             56 |                         0 |                   0 |                            0 |              0 |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |              nan |                 nan |              0 |              0 |              0 |              0 |              0 |              0 |               0 |              0 |              0 |              0 |              0 |              0 |              0 |              0 |               0 |                0 |               0 | nan             
  | NC                 | SNOW CAMP         | LOCAL         | HAMMER TRUST                      | nan                            |       nan | % A W MOON JR                    |            nan |                 nan |           nan |       2 |                0 |                90 |                  0 |                0 |                1000 |             nan |               nan |            0 |           0 |           0 |                    0 |                  0 |                 0 |                 5 |         580001038 |                 0 |                0 |                    0 |       nan |         0 |          0 |                0 |          5 |           4 |         1977 |               12 |            198207 |                 4 |             198201 |  273490000 |      27349 |        273 | 2022-10-14    | 2002209021 |         nan |          nan |    nan |   nan |         nan |      nan |  nan |        nan |         nan |         nan |         nan |        nan |           nan |       nan |          nan |             nan |               nan |          nan |             nan |            nan |           nan |          nan |         nan |        nan |              nan |           nan |         nan |        nan |             nan |           nan |

Metadata:

| Column Name                | Data Type   | Unique Values   | Missing Values   |
|:---------------------------|:------------|:----------------|:-----------------|
| PRIMARY_TIN_TYP_CD         | int64       | 1               | 0                |
| EIN                        | int64       | 92,186          | 0                |
| ESTABLISHED_PRD            | int64       | 632             | 0                |
| CHECK_DIGIT_CD             | object      | 528             | 188              |
| NAME_CNTRL_CD              | object      | 15,678          | 0                |
| COLL_LOC_CD                | int64       | 9               | 0                |
| ULC_CD                     | int64       | 66              | 0                |
| MF_AO_CD                   | float64     | 0               | 92,186           |
| SMALL_BUS_AO_CD            | int64       | 8               | 0                |
| PRIMARY_ZIP_CD             | int64       | 42,253          | 0                |
| FR_941_CD                  | int64       | 5               | 0                |
| FR_1120_CD                 | int64       | 4               | 0                |
| FR_720_CD                  | int64       | 5               | 0                |
| FR_1041_CD                 | int64       | 5               | 0                |
| FR_1065_CD                 | int64       | 2               | 0                |
| FR_1066_CD                 | int64       | 2               | 0                |
| FR_8804_CD                 | int64       | 2               | 0                |
| FR_CT1_CD                  | int64       | 3               | 0                |
| FR_940_CD                  | int64       | 3               | 0                |
| FR_943_CD                  | int64       | 3               | 0                |
| FR_1042_CD                 | int64       | 3               | 0                |
| FR_944_CD                  | int64       | 3               | 0                |
| FR_8752_CD                 | int64       | 2               | 0                |
| FR_945_CD                  | int64       | 3               | 0                |
| FR_990T_CD                 | int64       | 3               | 0                |
| FR_1041A_CD                | int64       | 3               | 0                |
| FR_5227_CD                 | int64       | 4               | 0                |
| FR_3520A_CD                | int64       | 3               | 0                |
| FR_990PF_CD                | int64       | 2               | 0                |
| FR_4720_CD                 | int64       | 3               | 0                |
| FR_709_CD                  | int64       | 2               | 0                |
| FR_706_CD                  | int64       | 2               | 0                |
| FR_2290_CD                 | int64       | 3               | 0                |
| FR_11C_CD                  | int64       | 2               | 0                |
| FR_730_CD                  | int64       | 2               | 0                |
| FR_990_CD                  | int64       | 3               | 0                |
| FR_3520_CD                 | int64       | 3               | 0                |
| FR_706GS_T_CD              | int64       | 3               | 0                |
| FR_706GS_D_CD              | int64       | 3               | 0                |
| UNIFIED_CUM_AMT            | int64       | 1               | 0                |
| CAF_CNT                    | int64       | 100             | 0                |
| NAICS_CD                   | int64       | 86              | 0                |
| NAICS_VAL_CD               | int64       | 7               | 0                |
| NAICS_YR                   | int64       | 42              | 0                |
| RRB_NUM                    | int64       | 1               | 0                |
| NUM_PARTNERS_CNT           | int64       | 3               | 0                |
| F1065_PTR_TAX_PRD          | int64       | 3               | 0                |
| SOLE_PROP_SSN_NUM          | int64       | 1               | 0                |
| OFFICER_TIN_NUM            | int64       | 33,765          | 0                |
| PARENT_EIN                 | int64       | 13              | 0                |
| LATEST_709_PRD             | int64       | 1               | 0                |
| TEFRA_MFT_CD               | int64       | 2               | 0                |
| TEFRA_PRD                  | int64       | 2               | 0                |
| ADR_CHANGE_PRD             | int64       | 1,004           | 0                |
| PETITION_DT                | float64     | 1               | 92,185           |
| SEHI_CREDIT_45R_1ST_YR     | int64       | 1               | 0                |
| SEHI_CREDIT_45R_2ND_YR     | int64       | 1               | 0                |
| TC520_CC6X_DT              | float64     | 0               | 92,186           |
| BUSINESS_PRD               | int64       | 448             | 0                |
| BUSINESS_CLOSE_DT          | float64     | 1               | 92,168           |
| WAGES_PRD                  | int64       | 119             | 0                |
| WAGES_LAST_DT              | float64     | 1               | 92,184           |
| CFOL_UPDT_PRD              | int64       | 347             | 0                |
| FTD_DEPOSIT_YR             | int64       | 10              | 0                |
| FTD_HIS_DEPOSIT_YR         | int64       | 4               | 0                |
| LEVY_HEARING_DT            | float64     | 1               | 92,185           |
| TC971_AC954_YR             | int64       | 1               | 0                |
| MEMO_SW_CD                 | int64       | 2               | 0                |
| TDA_TDI_IND                | int64       | 2               | 0                |
| TDA_CD                     | int64       | 2               | 0                |
| TDI_IND                    | int64       | 2               | 0                |
| SW941_CD                   | int64       | 2               | 0                |
| TC52X_CD                   | int64       | 1               | 0                |
| SW910_CD                   | int64       | 1               | 0                |
| IND637_CD                  | int64       | 2               | 0                |
| TC74X_CD                   | int64       | 1               | 0                |
| C_CORP_CD                  | int64       | 2               | 0                |
| INVALID_SSN_CD             | int64       | 1               | 0                |
| REVAL_SSN_CD               | int64       | 1               | 0                |
| OFFSET_CD                  | int64       | 1               | 0                |
| SW918_IND                  | int64       | 1               | 0                |
| TC148_CD                   | int64       | 1               | 0                |
| SW914_CD                   | int64       | 1               | 0                |
| ACT_CD                     | int64       | 2               | 0                |
| SW1120_CD                  | int64       | 31              | 0                |
| BANKRUPT_FLC_CD            | int64       | 2               | 0                |
| SW53_CD                    | int64       | 2               | 0                |
| TC530_UNREV_CD             | int64       | 1               | 0                |
| TC530_UNRV_CC19_CD         | int64       | 1               | 0                |
| NON_53_TDA_CD              | int64       | 2               | 0                |
| DOC_87_CD                  | int64       | 2               | 0                |
| SW720_CD                   | int64       | 2               | 0                |
| IDRS_CD                    | int64       | 1               | 0                |
| TC130_CD                   | int64       | 5               | 0                |
| TC844_CD                   | int64       | 3               | 0                |
| IDRS_ACT_CD                | int64       | 1               | 0                |
| FCIC_CD                    | int64       | 1               | 0                |
| TC59X_CD                   | int64       | 2               | 0                |
| FTD_ALERT_CD               | int64       | 1               | 0                |
| OVERFLOW_CD                | int64       | 1               | 0                |
| EO_STAT_CD                 | int64       | 4               | 0                |
| CLOSE_53_CD                | float64     | 4               | 1                |
| IDRS_JJ_CD                 | int64       | 1               | 0                |
| UPC_359_IND                | int64       | 2               | 0                |
| AIMS_CD                    | float64     | 2               | 16               |
| HARDSHIP_CD                | int64       | 1               | 0                |
| IMF_FILING_CD              | int64       | 1               | 0                |
| IND_2032A_CD               | int64       | 1               | 0                |
| CSED_CD                    | int64       | 1               | 0                |
| EOMF_CD                    | int64       | 1               | 0                |
| TAX_SHELTER_IND            | int64       | 1               | 0                |
| AUDIT_HISTORY_CD           | int64       | 1               | 0                |
| CMS_CD                     | int64       | 1               | 0                |
| DMF_CD                     | int64       | 1               | 0                |
| PMF_CD                     | int64       | 2               | 0                |
| FM_8123_CD                 | int64       | 1               | 0                |
| FM_CT1_CD                  | int64       | 1               | 0                |
| SW1042_CD                  | int64       | 2               | 0                |
| SW940_CD                   | int64       | 4               | 0                |
| SW943_CD                   | int64       | 2               | 0                |
| OIC_YR                     | int64       | 2               | 0                |
| TC06X_CD                   | int64       | 1               | 0                |
| OPENING_DO_CD              | int64       | 66              | 0                |
| SC_JURISDICTION_CD         | int64       | 1               | 0                |
| CC84_CD                    | int64       | 1               | 0                |
| BAL_DUE_IND                | int64       | 2               | 0                |
| ES_PENALTY_CD              | float64     | 27              | 57               |
| PDT_IND                    | int64       | 1               | 0                |
| SW597_IND                  | int64       | 2               | 0                |
| NANNY_TAX_CD               | int64       | 1               | 0                |
| TCMP_CD                    | int64       | 1               | 0                |
| F944_CD                    | int64       | 2               | 0                |
| LRA136_CD                  | int64       | 2               | 0                |
| LARGE_CORP_CD              | int64       | 1               | 0                |
| TCMP_CYCLE_CD              | int64       | 1               | 0                |
| TC09X_NUM                  | int64       | 1               | 0                |
| FISCAL_PRD                 | int64       | 172             | 0                |
| PR_FY_MON_NUM              | int64       | 13              | 0                |
| DELQ_HISTORY_CD            | int64       | 2               | 0                |
| ERIS_CD                    | int64       | 2               | 0                |
| STATUS_58_CD               | int64       | 2               | 0                |
| LPS_ACT_IND                | int64       | 2               | 0                |
| BANKRUPT_LOC_CD            | int64       | 1               | 0                |
| STATUS_26_CD               | int64       | 2               | 0                |
| STATUS_02_CD               | int64       | 2               | 0                |
| RTN_1_CD                   | float64     | 13              | 1                |
| RTN_2_CD                   | int64       | 12              | 0                |
| RTN_3_CD                   | int64       | 7               | 0                |
| RTN_4_CD                   | int64       | 3               | 0                |
| RTN_5_CD                   | int64       | 1               | 0                |
| MRS_CD                     | int64       | 1               | 0                |
| BMFOL_CYCLE_CD             | int64       | 1               | 0                |
| BMFOL_ACTIVE_CD            | int64       | 1               | 0                |
| CEP_IND                    | int64       | 1               | 0                |
| DISASTER_CD                | int64       | 5               | 0                |
| F990T_EO_ORG_CD            | int64       | 4               | 0                |
| CORP_990T_IND              | int64       | 2               | 0                |
| GATT_CD                    | int64       | 2               | 0                |
| VENDOR_CD                  | int64       | 1               | 0                |
| DMF_FRZ_CD                 | int64       | 1               | 0                |
| TC076_CD                   | int64       | 3               | 0                |
| MF_BOD_CD                  | object      | 3               | 0                |
| BOD_CLIENT_CD              | object      | 9               | 0                |
| MAN_BOD_CLIENT_IND         | int64       | 1               | 0                |
| FM941_TDA_TDI_CD           | int64       | 1               | 0                |
| LEVY_SW_CD                 | int64       | 2               | 0                |
| EFTPS_84_CD                | int64       | 2               | 0                |
| EFTPS_85_CD                | int64       | 2               | 0                |
| EMPLOYMENT_CD              | object      | 2               | 92,168           |
| CD_48_IND                  | int64       | 1               | 0                |
| TC520_CC6X_IND             | int64       | 1               | 0                |
| COLL_LOC_IND               | int64       | 2               | 0                |
| TC08X_CP_IND               | int64       | 1               | 0                |
| NY_PENT_TERR_CD            | int64       | 4               | 0                |
| TOT_1065_1120_CD           | int64       | 2               | 0                |
| FRZ_916_CD                 | int64       | 1               | 0                |
| ACT_RET_PENSION_CD         | int64       | 2               | 0                |
| ERO_IND                    | int64       | 2               | 0                |
| EFTP_FTD_ABATE_IND         | int64       | 2               | 0                |
| IND527_CD                  | int64       | 2               | 0                |
| F8872_TYPE_CD              | int64       | 1               | 0                |
| F8872_PR_CD                | int64       | 1               | 0                |
| FINAL_8872_CD              | int64       | 1               | 0                |
| KATRINA_CD                 | int64       | 1               | 0                |
| TEL_EX_TAX_REF_CD          | int64       | 2               | 0                |
| F944_BS6_CD                | int64       | 1               | 0                |
| TETR_424_CD                | int64       | 1               | 0                |
| LIEN_CD                    | int64       | 2               | 0                |
| F944_BYPS_CD               | int64       | 2               | 0                |
| SWCP148_CD                 | int64       | 1               | 0                |
| F990_PF_REP_IND            | int64       | 1               | 0                |
| LLC_IND                    | object      | 2               | 92,181           |
| FOREIGN_CNTRY_CD           | object      | 15              | 92,143           |
| COBRA_ASSIST_CD            | int64       | 1               | 0                |
| UNDEL_ADR_CD               | int64       | 2               | 0                |
| UNDEL_CD                   | int64       | 2               | 0                |
| CORP_FTP_CD                | int64       | 1               | 0                |
| APPL_LG_EMPLR_CD           | int64       | 3               | 0                |
| DOJ_CD                     | int64       | 1               | 0                |
| IDENTITY_THEFT_CD          | int64       | 2               | 0                |
| PRIVATE_DEBT_COLLECTION_CD | int64       | 1               | 0                |
| DAILY_DELQ_PNLTY_501C_CD   | int64       | 1               | 0                |
| PRIV_DEBT_COLL_TP_AUTH_NUM | int64       | 1               | 0                |
| INDIV_ESTATE_IND           | int64       | 1               | 0                |
| AC754_IND                  | int64       | 1               | 0                |
| HIST_FIDO_CD               | int64       | 66              | 0                |
| HIST_TDA_PRM_DO_CD         | int64       | 12              | 0                |
| HIST_PRIM_DO               | int64       | 36              | 0                |
| PRIOR_SOLE_PROP_SSN_NUM    | int64       | 1               | 0                |
| SOLE_PROP_SCH_PRD          | int64       | 1               | 0                |
| PRIOR_PRIN_OWNER_TRUST_TIN | int64       | 9               | 0                |
| PRIN_TIN_ORD               | int64       | 22              | 0                |
| F944_FRC_CYMN8_CD          | float64     | 1               | 92,181           |
| F944_FRC_CYMN7_CD          | float64     | 1               | 92,182           |
| F944_FRC_CYMN6_CD          | float64     | 1               | 92,181           |
| F944_FRC_CYMN5_CD          | float64     | 1               | 92,181           |
| F944_FRC_CYMN4_CD          | float64     | 1               | 92,180           |
| F944_FRC_CYMN3_CD          | float64     | 1               | 92,181           |
| F944_FRC_CYMN2_CD          | float64     | 1               | 92,180           |
| F944_FRC_CYMN1_CD          | float64     | 1               | 92,180           |
| F944_FRC_CY_CD             | float64     | 1               | 92,180           |
| F944_FRC_CYPL1_CD          | float64     | 0               | 92,186           |
| RAF_FL940_CD               | int64       | 2               | 0                |
| RAF_FL941_CD               | int64       | 2               | 0                |
| RAF_FL943_CD               | int64       | 1               | 0                |
| RAF_FL944_CD               | int64       | 2               | 0                |
| RAF_FL945_CD               | int64       | 1               | 0                |
| RAF_FLCT1_CD               | int64       | 1               | 0                |
| RAF_FL1042_CD              | int64       | 2               | 0                |
| EFTPS_940_CD               | int64       | 2               | 0                |
| EFTPS_941_CD               | int64       | 2               | 0                |
| EFTPS_943_CD               | int64       | 1               | 0                |
| EFTPS_944_CD               | int64       | 2               | 0                |
| EFTPS_945_CD               | int64       | 1               | 0                |
| EFTPS_CT1_CD               | int64       | 1               | 0                |
| EFTPS_720_CD               | int64       | 1               | 0                |
| EFTPS_1042_CD              | int64       | 2               | 0                |
| EFTPS_FCORP_CD             | int64       | 2               | 0                |
| GEOGRAPHIC_CD              | int64       | 597             | 0                |
| OLD_NM_CNTRL_CD            | object      | 3,202           | 84,212           |
| PRIMARY_STATE_CD           | object      | 54              | 44               |
| PRIMARY_CITY_NM            | object      | 6,154           | 0                |
| PRIMARY_ADR                | object      | 42,477          | 0                |
| PRIMARY_NM                 | object      | 87,453          | 0                |
| PRIMARY_CONT_NM            | object      | 28,496          | 21,066           |
| SORT_NM                    | object      | 46,978          | 37,868           |
| CARE_OF_NM                 | object      | 19,185          | 40,290           |
| LOCATION_ADR               | object      | 4,544           | 82,595           |
| LOCATION_CONT_ADR          | object      | 2,969           | 82,616           |
| FOREIGN_ADR                | object      | 24              | 92,142           |
| AO_CD                      | int64       | 6               | 0                |
| GRP_EXEMPT_NUM             | int64       | 1               | 0                |
| CURR_SUBSECT_CD            | int64       | 1               | 0                |
| PRIOR_SUBSECT_CD           | int64       | 8               | 0                |
| AFFILIATION_CD             | int64       | 3               | 0                |
| CLASSIFICATION_CD          | int64       | 2               | 0                |
| ORGANIZING_DT              | float64     | 0               | 92,186           |
| REGISTRATION_DT            | float64     | 0               | 92,186           |
| RULING_PRD                 | int64       | 115             | 0                |
| DEDUCT_CD                  | int64       | 3               | 0                |
| DEDUCT_YR                  | int64       | 9               | 0                |
| CURR_FOUNDATION_CD         | int64       | 4               | 0                |
| PR_FOUNDATION_CD           | int64       | 6               | 0                |
| BMF_ACTIVITY_CD            | int64       | 8               | 0                |
| ORGANIZATION_CD            | int64       | 5               | 0                |
| FILE_FOLDER_NUM            | int64       | 150             | 0                |
| PENSION_PLAN_CD            | int64       | 3               | 0                |
| ADV_RULING_PRD             | int64       | 1               | 0                |
| TEAM_EXPRG_CASE_CD         | int64       | 2               | 0                |
| NTEE_CD                    | object      | 2               | 92,183           |
| HOSP_CD                    | float64     | 1               | 2,667            |
| LOBBY_CD                   | int64       | 1               | 0                |
| LOBBY_ELECT_YR             | int64       | 1               | 0                |
| ASSET_CD                   | int64       | 10              | 0                |
| INCOME_CD                  | int64       | 10              | 0                |
| RTN_TAX_YR                 | int64       | 54              | 0                |
| CURR_STATUS_CD             | int64       | 1               | 0                |
| CURR_STATUS_PRD            | int64       | 616             | 0                |
| PRIOR_STATUS_CD            | float64     | 4               | 89,585           |
| PRIOR_STATUS_PRD           | int64       | 408             | 0                |
| ZIP_9_CD                   | int64       | 41,744          | 0                |
| ZIP_5_CD                   | int64       | 11,241          | 0                |
| ZIP_3_CD                   | int64       | 871             | 0                |
| REC_LOAD_DT                | object      | 51              | 0                |
| CHAR_TIN                   | int64       | 92,186          | 0                |
| ASSET_AMT                  | float64     | 74,396          | 12,927           |
| INCOME_AMT                 | float64     | 48,825          | 12,927           |
| Name                       | float64     | 0               | 92,186           |
| ICO                        | float64     | 0               | 92,186           |
| Sort_Name                  | float64     | 0               | 92,186           |
| Street                     | float64     | 0               | 92,186           |
| rx                         | float64     | 0               | 92,186           |
| matchpos                   | float64     | 0               | 92,186           |
| matchpos2                  | float64     | 0               | 92,186           |
| matchpos3                  | float64     | 0               | 92,186           |
| matchpos4                  | float64     | 0               | 92,186           |
| Old_Name                   | float64     | 0               | 92,186           |
| SSNFMT_NAME                | float64     | 0               | 92,186           |
| Old_ICO                    | float64     | 0               | 92,186           |
| SSNFMT_ICO                 | float64     | 0               | 92,186           |
| Old_Sort_Name              | float64     | 0               | 92,186           |
| SSNFMT_SORTNAME            | float64     | 0               | 92,186           |
| Old_Street                 | float64     | 0               | 92,186           |
| SSNFMT_STREET              | float64     | 0               | 92,186           |
| Strng_Length               | float64     | 0               | 92,186           |
| Nmbr_Before                | float64     | 0               | 92,186           |
| Nmbr_After                 | float64     | 0               | 92,186           |
| Name_9Dig                  | float64     | 0               | 92,186           |
| ICO_9Dig                   | float64     | 0               | 92,186           |
| Sort_Name_9Dig             | float64     | 0               | 92,186           |
| Street_9Dig                | float64     | 0               | 92,186           |
| DIG9_NAME                  | float64     | 0               | 92,186           |
| DIG9_ICO                   | float64     | 0               | 92,186           |
| DIG9_SORTNAME              | float64     | 0               | 92,186           |
| DIG9_STREET                | float64     | 0               | 92,186           |

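The columns from `Name` through `DIG9_STREET` at the end of the table hold no values at all in any of the 92,186 rows. The following is a minimal sketch of how such columns could be detected and, if desired, dropped. It runs on a toy frame with hypothetical values, not on the IRS file, and dropping these columns is an optional step rather than something the cells in this document do.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values; `name` and `ico` mimic the all-empty columns above.
toy = pd.DataFrame({
    "char_tin": [101, 102, 103],
    "asset_amt": [np.nan, 1500.0, 0.0],
    "name": [np.nan, np.nan, np.nan],
    "ico": [np.nan, np.nan, np.nan],
})

# Columns in which every value is missing.
empty_cols = toy.columns[toy.isna().all()].tolist()
print(empty_cols)

# Drop them; toy.dropna(axis=1, how="all") would be an equivalent one-liner.
trimmed = toy.drop(columns=empty_cols)
print(trimmed.shape)
```

A partially populated column such as `asset_amt` is kept, because only columns that are missing in every row are selected.
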
# 3. Data Preparation

In this section, we detail the initial steps taken to prepare the IRS split-interest trust data for analysis. Our goals are to make column names consistent, handle missing values appropriately, and convert columns to data types suited to our analysis. Click the drop-down arrow to see the code used to achieve this.
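
As a minimal sketch of the type conversions described above, the toy frame below (hypothetical values, not rows from the IRS file) shows how a `YYYYMM` period stored as a float can be parsed into a datetime, and how a numeric identifier can be kept as text. Padding the identifier to nine digits with `zfill(9)` is an assumption about the desired EIN format and is not part of the notebook cell that follows.

```python
import pandas as pd

# Toy data with hypothetical values; column names mirror the lowercase names used in this section.
toy = pd.DataFrame({
    "ein": [2001555.0, 10203040.0],
    "ruling_prd": [197012.0, 0.0],  # 0 stands in for a missing period
})

# Strip the trailing ".0" produced when a float is cast to string, then parse YYYYMM.
period_text = toy["ruling_prd"].astype(str).str.replace(r"\.0$", "", regex=True)
toy["ruling_prd"] = pd.to_datetime(period_text, format="%Y%m", errors="coerce")

# Keep the identifier as text; zfill(9) restores leading zeros (assumed format).
toy["ein"] = (
    toy["ein"].astype(str).str.replace(r"\.0$", "", regex=True).str.zfill(9)
)
print(toy)
```

With `errors="coerce"`, a value that does not match `%Y%m`, such as `0`, becomes `NaT` instead of raising an error.
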

In [13]:
# Standardize column names.
split_interest.columns = [x.lower() for x in split_interest.columns]

# Replace zeros with NaN for appropriate columns (placeholder; not applied in this cell).

# Replace NaN with appropriate values accordingly (placeholder; not applied in this cell).

# Convert YYYYMM period columns to datetimes; unparseable values become NaT.
date_cols = ['established_prd','ruling_prd','curr_status_prd']
for col in date_cols:
    # Strip a trailing ".0" (present if the column was read as float) before parsing.
    split_interest[col] = split_interest[col].astype(str).str.replace(r'\.0$', '', regex=True)
    split_interest[col] = pd.to_datetime(split_interest[col], format='%Y%m', errors='coerce')

# Drop duplicates (placeholder; not applied in this cell).

# Store EIN as a string identifier, stripping any trailing ".0".
split_interest['ein'] = split_interest['ein'].astype(str).str.replace(r'\.0$', '', regex=True)

# Show cleaned data.
head_caption = "Cleaned data sample view:"
head_df = split_interest.head().copy()
head_markdown = head_caption + "\n\n" + head_df.to_markdown(index=False)
display(Markdown(head_markdown))

Cleaned data sample view:

|   primary_tin_typ_cd |     ein | established_prd     | check_digit_cd   | name_cntrl_cd   |   coll_loc_cd |   ulc_cd |   mf_ao_cd |   small_bus_ao_cd |   primary_zip_cd |   fr_941_cd |   fr_1120_cd |   fr_720_cd |   fr_1041_cd |   fr_1065_cd |   fr_1066_cd |   fr_8804_cd |   fr_ct1_cd |   fr_940_cd |   fr_943_cd |   fr_1042_cd |   fr_944_cd |   fr_8752_cd |   fr_945_cd |   fr_990t_cd |   fr_1041a_cd |   fr_5227_cd |   fr_3520a_cd |   fr_990pf_cd |   fr_4720_cd |   fr_709_cd |   fr_706_cd |   fr_2290_cd |   fr_11c_cd |   fr_730_cd |   fr_990_cd |   fr_3520_cd |   fr_706gs_t_cd |   fr_706gs_d_cd |   unified_cum_amt |   caf_cnt |   naics_cd |   naics_val_cd |   naics_yr |   rrb_num |   num_partners_cnt |   f1065_ptr_tax_prd |   sole_prop_ssn_num |   officer_tin_num |   parent_ein |   latest_709_prd |   tefra_mft_cd |   tefra_prd |   adr_change_prd |   petition_dt |   sehi_credit_45r_1st_yr |   sehi_credit_45r_2nd_yr |   tc520_cc6x_dt |   business_prd |   business_close_dt |   wages_prd |   wages_last_dt |   cfol_updt_prd |   ftd_deposit_yr |   ftd_his_deposit_yr |   levy_hearing_dt |   tc971_ac954_yr |   memo_sw_cd |   tda_tdi_ind |   tda_cd |   tdi_ind |   sw941_cd |   tc52x_cd |   sw910_cd |   ind637_cd |   tc74x_cd |   c_corp_cd |   invalid_ssn_cd |   reval_ssn_cd |   offset_cd |   sw918_ind |   tc148_cd |   sw914_cd |   act_cd |   sw1120_cd |   bankrupt_flc_cd |   sw53_cd |   tc530_unrev_cd |   tc530_unrv_cc19_cd |   non_53_tda_cd |   doc_87_cd |   sw720_cd |   idrs_cd |   tc130_cd |   tc844_cd |   idrs_act_cd |   fcic_cd |   tc59x_cd |   ftd_alert_cd |   overflow_cd |   eo_stat_cd |   close_53_cd |   idrs_jj_cd |   upc_359_ind |   aims_cd |   hardship_cd |   imf_filing_cd |   ind_2032a_cd |   csed_cd |   eomf_cd |   tax_shelter_ind |   audit_history_cd |   cms_cd |   dmf_cd |   pmf_cd |   fm_8123_cd |   fm_ct1_cd |   sw1042_cd |   sw940_cd |   sw943_cd |   oic_yr |   tc06x_cd |   opening_do_cd |   sc_jurisdiction_cd |   cc84_cd |   bal_due_ind |   es_penalty_cd 
|   pdt_ind |   sw597_ind |   nanny_tax_cd |   tcmp_cd |   f944_cd |   lra136_cd |   large_corp_cd |   tcmp_cycle_cd |   tc09x_num |   fiscal_prd |   pr_fy_mon_num |   delq_history_cd |   eris_cd |   status_58_cd |   lps_act_ind |   bankrupt_loc_cd |   status_26_cd |   status_02_cd |   rtn_1_cd |   rtn_2_cd |   rtn_3_cd |   rtn_4_cd |   rtn_5_cd |   mrs_cd |   bmfol_cycle_cd |   bmfol_active_cd |   cep_ind |   disaster_cd |   f990t_eo_org_cd |   corp_990t_ind |   gatt_cd |   vendor_cd |   dmf_frz_cd |   tc076_cd | mf_bod_cd   | bod_client_cd   |   man_bod_client_ind |   fm941_tda_tdi_cd |   levy_sw_cd |   eftps_84_cd |   eftps_85_cd |   employment_cd |   cd_48_ind |   tc520_cc6x_ind |   coll_loc_ind |   tc08x_cp_ind |   ny_pent_terr_cd |   tot_1065_1120_cd |   frz_916_cd |   act_ret_pension_cd |   ero_ind |   eftp_ftd_abate_ind |   ind527_cd |   f8872_type_cd |   f8872_pr_cd |   final_8872_cd |   katrina_cd |   tel_ex_tax_ref_cd |   f944_bs6_cd |   tetr_424_cd |   lien_cd |   f944_byps_cd |   swcp148_cd |   f990_pf_rep_ind |   llc_ind |   foreign_cntry_cd |   cobra_assist_cd |   undel_adr_cd |   undel_cd |   corp_ftp_cd |   appl_lg_emplr_cd |   doj_cd |   identity_theft_cd |   private_debt_collection_cd |   daily_delq_pnlty_501c_cd |   priv_debt_coll_tp_auth_num |   indiv_estate_ind |   ac754_ind |   hist_fido_cd |   hist_tda_prm_do_cd |   hist_prim_do |   prior_sole_prop_ssn_num |   sole_prop_sch_prd |   prior_prin_owner_trust_tin |   prin_tin_ord |   f944_frc_cymn8_cd |   f944_frc_cymn7_cd |   f944_frc_cymn6_cd |   f944_frc_cymn5_cd |   f944_frc_cymn4_cd |   f944_frc_cymn3_cd |   f944_frc_cymn2_cd |   f944_frc_cymn1_cd |   f944_frc_cy_cd |   f944_frc_cypl1_cd |   raf_fl940_cd |   raf_fl941_cd |   raf_fl943_cd |   raf_fl944_cd |   raf_fl945_cd |   raf_flct1_cd |   raf_fl1042_cd |   eftps_940_cd |   eftps_941_cd |   eftps_943_cd |   eftps_944_cd |   eftps_945_cd |   eftps_ct1_cd |   eftps_720_cd |   eftps_1042_cd |   eftps_fcorp_cd |   geographic_cd | 
old_nm_cntrl_cd   | primary_state_cd   | primary_city_nm   | primary_adr   | primary_nm                        | primary_cont_nm                |   sort_nm | care_of_nm                       |   location_adr |   location_cont_adr |   foreign_adr |   ao_cd |   grp_exempt_num |   curr_subsect_cd |   prior_subsect_cd |   affiliation_cd |   classification_cd |   organizing_dt |   registration_dt | ruling_prd          |   deduct_cd |   deduct_yr |   curr_foundation_cd |   pr_foundation_cd |   bmf_activity_cd |   organization_cd |   file_folder_num |   pension_plan_cd |   adv_ruling_prd |   team_exprg_case_cd |   ntee_cd |   hosp_cd |   lobby_cd |   lobby_elect_yr |   asset_cd |   income_cd |   rtn_tax_yr |   curr_status_cd | curr_status_prd     |   prior_status_cd |   prior_status_prd |   zip_9_cd |   zip_5_cd |   zip_3_cd | rec_load_dt   |   char_tin |   asset_amt |   income_amt |   name |   ico |   sort_name |   street |   rx |   matchpos |   matchpos2 |   matchpos3 |   matchpos4 |   old_name |   ssnfmt_name |   old_ico |   ssnfmt_ico |   old_sort_name |   ssnfmt_sortname |   old_street |   ssnfmt_street |   strng_length |   nmbr_before |   nmbr_after |   name_9dig |   ico_9dig |   sort_name_9dig |   street_9dig |   dig9_name |   dig9_ico |   dig9_sortname |   dig9_street |
|---------------------:|--------:|:--------------------|:-----------------|:----------------|--------------:|---------:|-----------:|------------------:|-----------------:|------------:|-------------:|------------:|-------------:|-------------:|-------------:|-------------:|------------:|------------:|------------:|-------------:|------------:|-------------:|------------:|-------------:|--------------:|-------------:|--------------:|--------------:|-------------:|------------:|------------:|-------------:|------------:|------------:|------------:|-------------:|----------------:|----------------:|------------------:|----------:|-----------:|---------------:|-----------:|----------:|-------------------:|--------------------:|--------------------:|------------------:|-------------:|-----------------:|---------------:|------------:|-----------------:|--------------:|-------------------------:|-------------------------:|----------------:|---------------:|--------------------:|------------:|----------------:|----------------:|-----------------:|---------------------:|------------------:|-----------------:|-------------:|--------------:|---------:|----------:|-----------:|-----------:|-----------:|------------:|-----------:|------------:|-----------------:|---------------:|------------:|------------:|-----------:|-----------:|---------:|------------:|------------------:|----------:|-----------------:|---------------------:|----------------:|------------:|-----------:|----------:|-----------:|-----------:|--------------:|----------:|-----------:|---------------:|--------------:|-------------:|--------------:|-------------:|--------------:|----------:|--------------:|----------------:|---------------:|----------:|----------:|------------------:|-------------------:|---------:|---------:|---------:|-------------:|------------:|------------:|-----------:|-----------:|---------:|-----------:|----------------:|---------------------:|----------:|--------------:|----------------:
|----------:|------------:|---------------:|----------:|----------:|------------:|----------------:|----------------:|------------:|-------------:|----------------:|------------------:|----------:|---------------:|--------------:|------------------:|---------------:|---------------:|-----------:|-----------:|-----------:|-----------:|-----------:|---------:|-----------------:|------------------:|----------:|--------------:|------------------:|----------------:|----------:|------------:|-------------:|-----------:|:------------|:----------------|---------------------:|-------------------:|-------------:|--------------:|--------------:|----------------:|------------:|-----------------:|---------------:|---------------:|------------------:|-------------------:|-------------:|---------------------:|----------:|---------------------:|------------:|----------------:|--------------:|----------------:|-------------:|--------------------:|--------------:|--------------:|----------:|---------------:|-------------:|------------------:|----------:|-------------------:|------------------:|---------------:|-----------:|--------------:|-------------------:|---------:|--------------------:|-----------------------------:|---------------------------:|-----------------------------:|-------------------:|------------:|---------------:|---------------------:|---------------:|--------------------------:|--------------------:|-----------------------------:|---------------:|--------------------:|--------------------:|--------------------:|--------------------:|--------------------:|--------------------:|--------------------:|--------------------:|-----------------:|--------------------:|---------------:|---------------:|---------------:|---------------:|---------------:|---------------:|----------------:|---------------:|---------------:|---------------:|---------------:|---------------:|---------------:|---------------:|----------------:|-----------------:|----------------:|:--------------
----|:-------------------|:------------------|:--------------|:----------------------------------|:-------------------------------|----------:|:---------------------------------|---------------:|--------------------:|--------------:|--------:|-----------------:|------------------:|-------------------:|-----------------:|--------------------:|----------------:|------------------:|:--------------------|------------:|------------:|---------------------:|-------------------:|------------------:|------------------:|------------------:|------------------:|-----------------:|---------------------:|----------:|----------:|-----------:|-----------------:|-----------:|------------:|-------------:|-----------------:|:--------------------|------------------:|-------------------:|-----------:|-----------:|-----------:|:--------------|-----------:|------------:|-------------:|-------:|------:|------------:|---------:|-----:|-----------:|------------:|------------:|------------:|-----------:|--------------:|----------:|-------------:|----------------:|------------------:|-------------:|----------------:|---------------:|--------------:|-------------:|------------:|-----------:|-----------------:|--------------:|------------:|-----------:|----------------:|--------------:|
|                    2 | 2001555 | 1971-12-01 00:00:00 | UI               | OAKE            |             0 |        1 |        nan |                21 |      44015108025 |           0 |            0 |           0 |            0 |            0 |            0 |            0 |           0 |           0 |           0 |            0 |           0 |            0 |           0 |            0 |             0 |            0 |             0 |             0 |            0 |           0 |           0 |            0 |           0 |           0 |           0 |            0 |               0 |               0 |                 0 |         0 |     813000 |              5 |       1988 |         0 |                  0 |                   0 |                   0 |                 0 |            0 |                0 |              0 |           0 |           200640 |           nan |                        0 |                        0 |             nan |              0 |                 nan |           0 |             nan |          202203 |                0 |                    0 |               nan |                0 |            0 |             0 |        0 |         0 |          0 |          0 |          0 |           0 |          0 |           0 |                0 |              0 |           0 |           0 |          0 |          0 |        1 |           0 |                 0 |         0 |                0 |                    0 |               0 |           0 |          0 |         0 |          0 |          0 |             0 |         0 |          0 |              0 |             0 |            2 |             0 |            0 |             0 |         0 |             0 |               0 |              0 |         0 |         0 |                 0 |                  0 |        0 |        0 |        0 |            0 |           0 |           0 |          0 |          0 |        0 |          0 |               1 |                    0 |         0 |             0 |               0 
|         0 |           0 |              0 |         0 |         0 |           0 |               0 |               0 |           0 |       198504 |               0 |                 0 |         0 |              0 |             0 |                 0 |              0 |              0 |          0 |          0 |          0 |          0 |          0 |        0 |                0 |                 0 |         0 |             4 |                 0 |               0 |         0 |           0 |            0 |          0 | TE          | B               |                    0 |                  0 |            0 |             0 |             0 |             nan |           0 |                0 |              0 |              0 |                 0 |                  0 |            0 |                    0 |         0 |                    0 |           0 |               0 |             0 |               0 |            0 |                   0 |             0 |             0 |         0 |              0 |            0 |                 0 |       nan |                nan |                 0 |              0 |          0 |             0 |                  0 |        0 |                   0 |                            0 |                          0 |                            0 |                  0 |           0 |              1 |                    0 |              4 |                         0 |                   0 |                            0 |              0 |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |              nan |                 nan |              0 |              0 |              0 |              0 |              0 |              0 |               0 |              0 |              0 |              0 |              0 |              0 |              0 |              0 |               0 |                0 |               0 | nan           
    | ME                 | BANGOR            | 2 HAMMOND ST  | MYRTICE OAKES TR UW NECT          | nan                            |       nan | % MERRILL TR CO TTEE             |            nan |                 nan |           nan |       1 |                0 |                90 |                  0 |                0 |                1000 |             nan |               nan | 1970-12-01 00:00:00 |           0 |           0 |                    0 |                  0 |         909000000 |                 2 |          40000000 |                 2 |                0 |                    0 |       nan |         0 |          0 |                0 |          0 |           0 |         1970 |               12 | 1973-05-01 00:00:00 |               nan |                  0 |   44015108 |       4401 |         44 | 2022-01-21    | 2002001555 |         nan |          nan |    nan |   nan |         nan |      nan |  nan |        nan |         nan |         nan |         nan |        nan |           nan |       nan |          nan |             nan |               nan |          nan |             nan |            nan |           nan |          nan |         nan |        nan |              nan |           nan |         nan |        nan |             nan |           nan |
|                    2 | 2032809 | 1976-06-01 00:00:00 | SJ               | HISP            |             0 |       13 |        nan |                21 |     100192701991 |           0 |            0 |           0 |            0 |            0 |            0 |            0 |           0 |           0 |           0 |            0 |           0 |            0 |           0 |            0 |             0 |            0 |             0 |             0 |            0 |           0 |           0 |            0 |           0 |           0 |           0 |            0 |               0 |               0 |                 0 |         0 |     813000 |              5 |       1981 |         0 |                  0 |                   0 |                   0 |                 0 |            0 |                0 |              0 |           0 |           200639 |           nan |                        0 |                        0 |             nan |              0 |                 nan |           0 |             nan |          202206 |                0 |                    0 |               nan |                0 |            0 |             0 |        0 |         0 |          0 |          0 |          0 |           0 |          0 |           0 |                0 |              0 |           0 |           0 |          0 |          0 |        1 |           0 |                 0 |         0 |                0 |                    0 |               0 |           0 |          0 |         0 |          0 |          0 |             0 |         0 |          0 |              0 |             0 |            2 |             0 |            0 |             0 |         0 |             0 |               0 |              0 |         0 |         0 |                 0 |                  0 |        0 |        0 |        0 |            0 |           0 |           0 |          0 |          0 |        0 |          0 |              13 |                    0 |         0 |             0 |               0 
|         0 |           0 |              0 |         0 |         0 |           0 |               0 |               0 |           0 |       198412 |               0 |                 0 |         0 |              0 |             0 |                 0 |              0 |              0 |          0 |          0 |          0 |          0 |          0 |        0 |                0 |                 0 |         0 |             4 |                 0 |               0 |         0 |           0 |            0 |          0 | TE          | B               |                    0 |                  0 |            0 |             0 |             0 |             nan |           0 |                0 |              0 |              0 |                 1 |                  0 |            0 |                    0 |         0 |                    0 |           0 |               0 |             0 |               0 |            0 |                   0 |             0 |             0 |         0 |              0 |            0 |                 0 |       nan |                nan |                 0 |              0 |          0 |             0 |                  0 |        0 |                   0 |                            0 |                          0 |                            0 |                  0 |           0 |             13 |                    0 |             13 |                         0 |                   0 |                            0 |              0 |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |              nan |                 nan |              0 |              0 |              0 |              0 |              0 |              0 |               0 |              0 |              0 |              0 |              0 |              0 |              0 |              0 |               0 |                0 |               0 | nan           
    | NY                 | NEW YORK          | 9 W 57TH ST   | HISPANIC SOCIETY OF AMERICA TRUST | NECT                           |       nan | % MORGAN GUARANTY TRUST COMPANY  |            nan |                 nan |           nan |       1 |                0 |                90 |                  0 |                0 |                1000 |             nan |               nan | 1976-05-01 00:00:00 |           0 |           0 |                    0 |                  0 |         909000000 |                 2 |         130000000 |                 2 |                0 |                    0 |       nan |         0 |          0 |                0 |          0 |           0 |            0 |               12 | 1976-05-01 00:00:00 |               nan |                  0 |  100192701 |      10019 |        100 | 2022-02-11    | 2002032809 |         nan |          nan |    nan |   nan |         nan |      nan |  nan |        nan |         nan |         nan |         nan |        nan |           nan |       nan |          nan |             nan |               nan |          nan |             nan |            nan |           nan |          nan |         nan |        nan |              nan |           nan |         nan |        nan |             nan |           nan |
|                    2 | 2126212 | 1977-10-01 00:00:00 | KT               | BONT            |             0 |       35 |        nan |                24 |     468020000000 |           0 |            0 |           0 |            0 |            0 |            0 |            0 |           0 |           0 |           0 |            0 |           0 |            0 |           0 |            0 |             0 |            0 |             0 |             0 |            0 |           0 |           0 |            0 |           0 |           0 |           0 |            0 |               0 |               0 |                 0 |         0 |     813000 |              5 |       1982 |         0 |                  0 |                   0 |                   0 |                 0 |            0 |                0 |              0 |           0 |           200637 |           nan |                        0 |                        0 |             nan |              0 |                 nan |           0 |             nan |          202203 |                0 |                    0 |               nan |                0 |            0 |             0 |        0 |         0 |          0 |          0 |          0 |           0 |          0 |           0 |                0 |              0 |           0 |           0 |          0 |          0 |        1 |           1 |                 0 |         0 |                0 |                    0 |               0 |           0 |          0 |         0 |          0 |          0 |             0 |         0 |          0 |              0 |             0 |            2 |             0 |            0 |             0 |         0 |             0 |               0 |              0 |         0 |         0 |                 0 |                  0 |        0 |        0 |        0 |            0 |           0 |           0 |          0 |          0 |        0 |          0 |              35 |                    0 |         0 |             0 |               0 
|         0 |           0 |              0 |         0 |         0 |           0 |               0 |               0 |           0 |       198609 |               0 |                 0 |         0 |              0 |             0 |                 0 |              0 |              0 |          0 |          0 |          0 |          0 |          0 |        0 |                0 |                 0 |         0 |             4 |                 0 |               0 |         0 |           0 |            0 |          0 | TE          | B               |                    0 |                  0 |            0 |             0 |             0 |             nan |           0 |                0 |              0 |              0 |                 0 |                  0 |            0 |                    0 |         0 |                    0 |           0 |               0 |             0 |               0 |            0 |                   0 |             0 |             0 |         0 |              0 |            0 |                 0 |       nan |                nan |                 0 |              0 |          0 |             0 |                  0 |        0 |                   0 |                            0 |                          0 |                            0 |                  0 |           0 |             35 |                    0 |             35 |                         0 |                   0 |                            0 |              0 |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |              nan |                 nan |              0 |              0 |              0 |              0 |              0 |              0 |               0 |              0 |              0 |              0 |              0 |              0 |              0 |              0 |               0 |                0 |               0 | CAST          
    | IN                 | FORT WAYNE        | LOCAL         | LOUISE BONTER UNITRUST FBO JESSIE | FRANCES CASTLE                 |       nan | % FORT WAYNE NATIONAL BANK       |            nan |                 nan |           nan |       3 |                0 |                90 |                  0 |                0 |                1000 |             nan |               nan | 1977-10-01 00:00:00 |           0 |           0 |                    0 |                  0 |         928000000 |                 2 |         310000000 |                 2 |                0 |                    0 |       nan |         0 |          0 |                0 |          4 |           3 |         1981 |               12 | 1977-10-01 00:00:00 |               nan |                  0 |  468020000 |      46802 |        468 | 2022-01-21    | 2002126212 |         nan |          nan |    nan |   nan |         nan |      nan |  nan |        nan |         nan |         nan |         nan |        nan |           nan |       nan |          nan |             nan |               nan |          nan |             nan |            nan |           nan |          nan |         nan |        nan |              nan |           nan |         nan |        nan |             nan |           nan |
|                    2 | 2190092 | 1971-12-01 00:00:00 | CU               | JARV            |             0 |       13 |        nan |                21 |     100202302992 |           0 |            0 |           0 |            0 |            0 |            0 |            0 |           0 |           0 |           0 |            0 |           0 |            0 |           0 |            0 |             0 |            0 |             0 |             0 |            0 |           0 |           0 |            0 |           0 |           0 |           0 |            0 |               0 |               0 |                 0 |         0 |     813000 |              5 |       1985 |         0 |                  0 |                   0 |                   0 |                 0 |            0 |                0 |              0 |           0 |           200637 |           nan |                        0 |                        0 |             nan |              0 |                 nan |           0 |             nan |          202206 |                0 |                    0 |               nan |                0 |            0 |             0 |        0 |         0 |          0 |          0 |          0 |           0 |          0 |           0 |                0 |              0 |           0 |           0 |          0 |          0 |        1 |           1 |                 0 |         0 |                0 |                    0 |               0 |           0 |          0 |         0 |          0 |          0 |             0 |         0 |          0 |              0 |             0 |            2 |             0 |            0 |             0 |         0 |             0 |               0 |              0 |         0 |         0 |                 0 |                  0 |        0 |        0 |        0 |            0 |           0 |           0 |          0 |          0 |        0 |          0 |              13 |                    0 |         0 |             0 |               0 
|         0 |           0 |              0 |         0 |         0 |           0 |               0 |               0 |           0 |       198512 |               0 |                 0 |         0 |              0 |             0 |                 0 |              0 |              0 |          0 |          0 |          0 |          0 |          0 |        0 |                0 |                 0 |         0 |             4 |                 0 |               0 |         0 |           0 |            0 |          0 | TE          | B               |                    0 |                  0 |            0 |             0 |             0 |             nan |           0 |                0 |              0 |              0 |                 1 |                  0 |            0 |                    0 |         0 |                    0 |           0 |               0 |             0 |               0 |            0 |                   0 |             0 |             0 |         0 |              0 |            0 |                 0 |       nan |                nan |                 0 |              0 |          0 |             0 |                  0 |        0 |                   0 |                            0 |                          0 |                            0 |                  0 |           0 |             13 |                    0 |             13 |                         0 |                   0 |                            0 |              0 |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |              nan |                 nan |              0 |              0 |              0 |              0 |              0 |              0 |               0 |              0 |              0 |              0 |              0 |              0 |              0 |              0 |               0 |                0 |               0 | JAVI          
    | NY                 | NEW YORK          | 600 FIFTH AVE | JAMES N JARVIE TRUST UW PAR 14 FB | JERUSALEM YMCA BUILDING ISRAEL |       nan | % MANUFACTURERS HANOVER TRUST CO |            nan |                 nan |           nan |       1 |                0 |                90 |                  0 |                0 |                1000 |             nan |               nan | 1970-12-01 00:00:00 |           0 |           0 |                    0 |                  0 |         909000000 |                 2 |         130000000 |                 2 |                0 |                    0 |       nan |         0 |          0 |                0 |          0 |           0 |         1970 |               12 | 1973-08-01 00:00:00 |               nan |                  0 |  100202302 |      10020 |        100 | 2022-02-11    | 2002190092 |         nan |          nan |    nan |   nan |         nan |      nan |  nan |        nan |         nan |         nan |         nan |        nan |           nan |       nan |          nan |             nan |               nan |          nan |             nan |            nan |           nan |          nan |         nan |        nan |              nan |           nan |         nan |        nan |             nan |           nan |
|                    2 | 2209021 | 1979-03-01 00:00:00 | WQ               | HAMM            |             0 |       56 |        nan |                23 |     273490000000 |           0 |            0 |           0 |            0 |            0 |            0 |            0 |           0 |           0 |           0 |            0 |           0 |            0 |           0 |            0 |             0 |            0 |             0 |             0 |            0 |           0 |           0 |            0 |           0 |           0 |           0 |            0 |               0 |               0 |                 0 |         0 |     999000 |              5 |       1983 |         0 |                  0 |                   0 |                   0 |                 0 |            0 |                0 |              0 |           0 |           200636 |           nan |                        0 |                        0 |             nan |              0 |                 nan |           0 |             nan |          202241 |                0 |                    0 |               nan |                0 |            0 |             0 |        0 |         0 |          0 |          0 |          0 |           0 |          0 |           0 |                0 |              0 |           0 |           0 |          0 |          0 |        1 |           0 |                 0 |         0 |                0 |                    0 |               0 |           0 |          0 |         0 |          0 |          0 |             0 |         0 |          0 |              0 |             0 |            2 |             0 |            0 |             0 |         0 |             0 |               0 |              0 |         0 |         0 |                 0 |                  0 |        0 |        0 |        0 |            0 |           0 |           0 |          0 |          0 |        0 |          0 |              56 |                    0 |         0 |             0 |               0 
|         0 |           0 |              0 |         0 |         0 |           0 |               0 |               0 |           0 |       198312 |               0 |                 0 |         0 |              0 |             0 |                 0 |              0 |              0 |          0 |          0 |          0 |          0 |          0 |        0 |                0 |                 0 |         0 |             5 |                 0 |               0 |         0 |           0 |            0 |          0 | TE          | B               |                    0 |                  0 |            0 |             0 |             0 |             nan |           0 |                0 |              0 |              0 |                 0 |                  0 |            0 |                    0 |         0 |                    0 |           0 |               0 |             0 |               0 |            0 |                   0 |             0 |             0 |         0 |              0 |            0 |                 0 |       nan |                nan |                 0 |              0 |          0 |             0 |                  0 |        0 |                   0 |                            0 |                          0 |                            0 |                  0 |           0 |             56 |                    0 |             56 |                         0 |                   0 |                            0 |              0 |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |                 nan |              nan |                 nan |              0 |              0 |              0 |              0 |              0 |              0 |               0 |              0 |              0 |              0 |              0 |              0 |              0 |              0 |               0 |                0 |               0 | nan           
    | NC                 | SNOW CAMP         | LOCAL         | HAMMER TRUST                      | nan                            |       nan | % A W MOON JR                    |            nan |                 nan |           nan |       2 |                0 |                90 |                  0 |                0 |                1000 |             nan |               nan | NaT                 |           0 |           0 |                    0 |                  0 |                 0 |                 5 |         580001038 |                 0 |                0 |                    0 |       nan |         0 |          0 |                0 |          5 |           4 |         1977 |               12 | 1982-07-01 00:00:00 |                 4 |             198201 |  273490000 |      27349 |        273 | 2022-10-14    | 2002209021 |         nan |          nan |    nan |   nan |         nan |      nan |  nan |        nan |         nan |         nan |         nan |        nan |           nan |       nan |          nan |             nan |               nan |          nan |             nan |            nan |           nan |          nan |         nan |        nan |              nan |           nan |         nan |        nan |             nan |           nan |

# 4. Analysis
Objective: Determine whether the split interest trust data can support a comprehensive analysis of existing philanthropic giving to environmental and social justice causes, and whether it can be used to assess the transparency and accountability of current giving practices.


## 4.1. Identifying Relevant Organizations

The split interest trust data can be filtered by the codes that reflect each organization's primary mission, which makes it possible to identify nonprofits focused on environmental protection, social justice, advocacy, and related activities. This step is crucial for building a focused dataset of organizations relevant to the objective above.

The next data transformation step should filter the split interest trust data in the same way as the Exempt Organizations Business Master File and the 990 forms.

**Action item**: Review with the team to determine which column makes the most sense for filtering relevant orgs. Options include:
* Subsection and classification codes.
* National Taxonomy of Exempt Entities (NTEE) codes (unfortunately, many are missing).
* Foundation codes.
* Activity codes (most likely not useful, since they became obsolete when the NTEE coding system was adopted in January 1995).

In [None]:
# Insert code here for appropriate filtering if necessary.
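
One of the options listed above, filtering on NTEE codes, could be sketched as follows. This is a minimal illustration rather than the team's chosen method: the column name `ntee_cd` is an assumption that should be checked against the file's header, and the choice of NTEE major groups C (Environment) and R (Civil Rights, Social Action & Advocacy) is only a starting point. Rows with a missing code are excluded, which matters given how many NTEE codes are missing.

```python
import pandas as pd

# Hypothetical column name; confirm it against the actual file header.
NTEE_COL = "ntee_cd"

# NTEE major groups used as a starting point (an assumption, to be reviewed):
# C = Environment, R = Civil Rights, Social Action & Advocacy.
RELEVANT_MAJOR_GROUPS = ["C", "R"]


def filter_by_ntee(df, ntee_col=NTEE_COL, groups=RELEVANT_MAJOR_GROUPS):
    """Return rows whose NTEE code starts with one of the given major-group letters."""
    # Missing codes become empty strings, so they never match a group letter.
    codes = df[ntee_col].fillna("").astype(str).str.strip().str.upper()
    mask = codes.str[:1].isin(groups)
    return df[mask]


# Toy rows for illustration only; these are not taken from the real file.
demo = pd.DataFrame({"ein": [1, 2, 3, 4], "ntee_cd": ["C30", "r20", None, "B11"]})
relevant = filter_by_ntee(demo)
```

Calling `filter_by_ntee(split_interest)` would apply the same filter to the full dataset, provided the column exists under that name.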

In [4]:
# Render the first rows of the selected columns as a Markdown table.
def display_head(df, columns, caption):
    head_df = df[columns].head()
    head_markdown = f"{caption}\n\n{head_df.to_markdown(index=False)}"
    display(Markdown(head_markdown))

# Render the frequency of each distinct value (or combination of values).
def display_unique_values(df, columns):
    unique_val_df = df[columns].value_counts().reset_index()
    uni_markdown = unique_val_df.to_markdown(index=False)
    display(Markdown(uni_markdown))

# Render the number of missing values per column.
def display_missing_cts(df, columns):
    missing_values = df[columns].isna().sum().reset_index()
    missing_values = missing_values.rename(columns={'index': 'column name', 0: 'missing data points'})
    display(Markdown(missing_values.to_markdown(index=False)))

# Render describe() statistics; float columns get thousands separators.
def display_stats(df, columns):
    stats = df[columns].describe().reset_index()
    for col in stats.columns[1:]:
        if pd.api.types.is_float_dtype(df[col]):
            stats[col] = stats[col].astype(float).apply(lambda x: f"{x:,.0f}")
    stats_markdown = stats.rename(columns={'index': 'Statistics'}).to_markdown(index=False)
    display(Markdown(stats_markdown))

## 4.2. Financial and Charitable Contributions
* fr_5227_cd: A code tied to the filing of Form 5227. Its values in this file range from 0 to 88, so it is a coded field rather than a simple yes/no flag. Because it relates directly to split-interest trusts, it provides a starting point for identifying relevant entities.
* asset_amt: The amount of assets the trust holds. This information can help gauge the size and potential impact of the trust.
* income_amt: The trust's income, which offers insight into its financial capacity and activity level within a given period.

In [6]:
unique_columns = ['fr_5227_cd','asset_amt','income_amt']
display_stats(split_interest, unique_columns)

display_missing_cts(split_interest, unique_columns)

| Statistics   |   fr_5227_cd | asset_amt       | income_amt     |
|:-------------|-------------:|:----------------|:---------------|
| count        |  92186       | 79,259          | 79,259         |
| mean         |      2.18666 | 20,150,374      | 353,715        |
| std          |      9.30574 | 2,495,732,782   | 37,669,447     |
| min          |      0       | -5,607,001      | -1,446,332     |
| 25%          |      1       | 111,156         | 6,068          |
| 50%          |      1       | 297,205         | 17,998         |
| 75%          |      1       | 803,410         | 53,902         |
| max          |     88       | 481,223,000,000 | 10,357,423,300 |

| column name   |   missing data points |
|:--------------|----------------------:|
| fr_5227_cd    |                     0 |
| asset_amt     |                 12927 |
| income_amt    |                 12927 |
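
The statistics above show a mean asset_amt (20,150,374) far above the median (297,205), negative minimums, and 12,927 missing values in each amount column. A minimal sketch of a more robust summary for such skewed columns follows. The helper name `summarize_amount` is hypothetical, and the toy values are for illustration only, not drawn from the file.

```python
import pandas as pd


def summarize_amount(series):
    """Summarise a heavily skewed amount column with outlier-resistant statistics."""
    s = pd.to_numeric(series, errors="coerce")
    return {
        "missing_share": float(s.isna().mean()),   # fraction of rows with no value
        "negative_count": int((s < 0).sum()),      # negative amounts worth reviewing
        "median": float(s.median()),               # less sensitive to outliers than the mean
        "p99": float(s.quantile(0.99)),            # upper tail without the single maximum
        "max": float(s.max()),
    }


# Toy values only; these are not taken from the real file.
toy = pd.Series([100.0, 250.0, -5.0, None, 1_000_000.0])
summary = summarize_amount(toy)
```

On the real data this would be called as `summarize_amount(split_interest['asset_amt'])`, and likewise for `income_amt`.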

## 4.3. Trust and Beneficiary Information
* ein, name, primary_adr, primary_city_nm, primary_state_cd, zip_5_cd: These columns identify the trust and its location. They can be used to assess geographical trends in giving and, potentially, to correlate trusts with local environmental and social justice initiatives. Note that name is empty (nan) in the five rows shown below, and zip_5_cd is stored as a number, so leading zeros are dropped (for example, 4401 for Bangor, ME).

In [9]:
columns = ['ein','name','primary_adr','primary_city_nm','primary_state_cd','zip_5_cd']
display_head(split_interest, columns, "Example of trust and beneficiary data:")


Example of trust and beneficiary data:

|     ein |   name | primary_adr   | primary_city_nm   | primary_state_cd   |   zip_5_cd |
|--------:|-------:|:--------------|:------------------|:-------------------|-----------:|
| 2001555 |    nan | 2 HAMMOND ST  | BANGOR            | ME                 |       4401 |
| 2032809 |    nan | 9 W 57TH ST   | NEW YORK          | NY                 |      10019 |
| 2126212 |    nan | LOCAL         | FORT WAYNE        | IN                 |      46802 |
| 2190092 |    nan | 600 FIFTH AVE | NEW YORK          | NY                 |      10020 |
| 2209021 |    nan | LOCAL         | SNOW CAMP         | NC                 |      27349 |
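
Because zip_5_cd is read as a number, ZIP codes such as Bangor's 04401 appear as 4401 in the table above. A small sketch of restoring the zero padding and counting trusts per state follows. The helper `pad_zip` and the new column `zip_5_str` are hypothetical names, and the toy frame reuses three rows from the sample output.

```python
import pandas as pd


def pad_zip(series, width=5):
    """Convert numeric ZIP codes back to zero-padded strings; missing values stay missing."""
    numeric = pd.to_numeric(series, errors="coerce")
    # The nullable Int64 dtype avoids a trailing ".0" when the column contains NaN.
    return numeric.astype("Int64").astype("string").str.zfill(width)


# Three rows copied from the sample output above.
toy = pd.DataFrame({
    "primary_state_cd": ["ME", "NY", "NY"],
    "zip_5_cd": [4401, 10019, 10020],
})
toy["zip_5_str"] = pad_zip(toy["zip_5_cd"])

# Number of trusts per state, as a first look at geographic concentration.
state_counts = toy["primary_state_cd"].value_counts()
```

An alternative is to pass `dtype={'zip_5_cd': str}` to `pd.read_csv` so the leading zeros are never lost, assuming the CSV itself stores them.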

## 4.4. Operational and Compliance Indicators
* established_prd: Knowing when the trust was established can help identify long-standing contributors versus newer entities in the philanthropic landscape.
* ruling_prd: This could indicate when the trust was granted tax-exempt status, which might correlate with strategic shifts in philanthropic giving.
* curr_status_cd and curr_status_prd: These columns indicate the current status of the trust and the period of this status, which can be vital for understanding operational standing and compliance.
* audit_history_cd: An indicator of whether the trust has been audited, which could imply levels of scrutiny and compliance.

In [14]:
columns = ['ein','established_prd','ruling_prd','curr_status_cd','curr_status_prd','audit_history_cd']
display_head(split_interest, columns, "Example of operational and compliance data:")

Example of operational and compliance data:

|     ein | established_prd     | ruling_prd          |   curr_status_cd | curr_status_prd     |   audit_history_cd |
|--------:|:--------------------|:--------------------|-----------------:|:--------------------|-------------------:|
| 2001555 | 1971-12-01 00:00:00 | 1970-12-01 00:00:00 |               12 | 1973-05-01 00:00:00 |                  0 |
| 2032809 | 1976-06-01 00:00:00 | 1976-05-01 00:00:00 |               12 | 1976-05-01 00:00:00 |                  0 |
| 2126212 | 1977-10-01 00:00:00 | 1977-10-01 00:00:00 |               12 | 1977-10-01 00:00:00 |                  0 |
| 2190092 | 1971-12-01 00:00:00 | 1970-12-01 00:00:00 |               12 | 1973-08-01 00:00:00 |                  0 |
| 2209021 | 1979-03-01 00:00:00 | NaT                 |               12 | 1982-07-01 00:00:00 |                  0 |
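
The period columns lend themselves to simple derived features, such as a trust's age or a flag for rows where ruling_prd precedes established_prd (as in the first row above). The sketch below is a hedged illustration: the function name, the derived column names, and the reference date of 2020-12-31 are all assumptions, the last one based only on the file name.

```python
import pandas as pd

PERIOD_COLS = ("established_prd", "ruling_prd", "curr_status_prd")


def add_period_features(df, as_of="2020-12-31"):
    """Parse the period columns and add age and ordering features (hypothetical names)."""
    out = df.copy()
    for col in PERIOD_COLS:
        # Unparseable or missing values become NaT instead of raising.
        out[col] = pd.to_datetime(out[col], errors="coerce")
    as_of_ts = pd.Timestamp(as_of)  # assumed reference date
    out["years_since_established"] = (as_of_ts - out["established_prd"]).dt.days / 365.25
    # Comparisons involving NaT evaluate to False, so missing rulings are not flagged.
    out["ruling_before_established"] = out["ruling_prd"] < out["established_prd"]
    return out


# Two rows copied from the sample output above.
toy = pd.DataFrame({
    "ein": [2001555, 2209021],
    "established_prd": ["1971-12-01 00:00:00", "1979-03-01 00:00:00"],
    "ruling_prd": ["1970-12-01 00:00:00", None],
    "curr_status_prd": ["1973-05-01 00:00:00", "1982-07-01 00:00:00"],
})
features = add_period_features(toy)
```

Rows flagged by `ruling_before_established` may simply reflect how the periods are recorded, so they are better treated as items to review than as errors.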

## 4.5. Giving Patterns and Trends
* deduct_cd and deduct_yr: Information on deductions claimed can offer insight into a trust's giving patterns and the timing of its contributions.

In [10]:
columns = ['ein','deduct_cd','deduct_yr']
display_head(split_interest, columns, "Example of giving patterns and trends data:")

Example of giving patterns and trends data:

|     ein |   deduct_cd |   deduct_yr |
|--------:|------------:|------------:|
| 2001555 |           0 |           0 |
| 2032809 |           0 |           0 |
| 2126212 |           0 |           0 |
| 2190092 |           0 |           0 |
| 2209021 |           0 |           0 |
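
All five sample rows above hold 0 in both columns, so it is worth checking how often these fields carry any information before relying on them. A minimal sketch, with a hypothetical helper name and toy values that are not taken from the file:

```python
import pandas as pd


def nonzero_share(df, columns):
    """Share of rows, per column, that hold a non-missing, non-zero value."""
    shares = {}
    for col in columns:
        values = pd.to_numeric(df[col], errors="coerce")
        shares[col] = float((values.notna() & (values != 0)).mean())
    return shares


# Toy values only; these are not taken from the real file.
toy = pd.DataFrame({
    "deduct_cd": [0, 0, 1, None],
    "deduct_yr": [0, 0, 1999, 0],
})
shares = nonzero_share(toy, ["deduct_cd", "deduct_yr"])
```

If the shares turn out to be close to zero on the real data, these columns are unlikely to support an analysis of giving patterns on their own.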

## 4.6. General Compliance and Filing Information
* rec_load_dt: The record load date might provide a timeline for data analysis, showing when information was last updated.
* char_tin: Though not directly related to financial data, the trust's identification number could be useful for cross-referencing other databases or verifying information.

In [11]:
columns = ['ein','rec_load_dt','char_tin']
display_head(split_interest, columns, "Example of general compliance and filing info data:")

Example of general compliance and filing info data:

|     ein | rec_load_dt   |   char_tin |
|--------:|:--------------|-----------:|
| 2001555 | 2022-01-21    | 2002001555 |
| 2032809 | 2022-02-11    | 2002032809 |
| 2126212 | 2022-01-21    | 2002126212 |
| 2190092 | 2022-02-11    | 2002190092 |
| 2209021 | 2022-10-14    | 2002209021 |
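
Cross-referencing with other databases usually requires a consistent key. The sketch below joins on the EIN, zero-padded to the standard nine digits, and uses the merge indicator to show which trusts found a match. The second frame, its `EIN` and `ntee_cd` columns, and its values are hypothetical stand-ins for another dataset such as the Business Master File.

```python
import pandas as pd


def normalize_ein(series):
    """Format EINs as nine-character, zero-padded strings; missing values become None."""
    numeric = pd.to_numeric(series, errors="coerce")
    return numeric.map(lambda v: f"{int(v):09d}" if pd.notna(v) else None)


# EINs copied from the sample output above.
trusts = pd.DataFrame({"ein": [2001555, 2032809]})

# Hypothetical external dataset keyed by a string EIN.
other = pd.DataFrame({"EIN": ["002001555"], "ntee_cd": ["C30"]})

trusts["ein_str"] = normalize_ein(trusts["ein"])
merged = trusts.merge(
    other, how="left", left_on="ein_str", right_on="EIN", indicator=True
)
```

The `_merge` column added by `indicator=True` marks each row as `both` or `left_only`, which gives a quick count of trusts that could not be matched.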