# Charity classification analysis

Analysis using a new dataset that classifies and tags all active and inactive charities according to their activity/sector. This analysis explores how the number of charities in specific activities has changed, whether specific sectors were more "trendy" at some point, and whether others have died out.

For this analysis we need a number of different modules. 

In [92]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import date
import os
import ast

%matplotlib inline

## Import data

We will define the root of the repository, so that we can import files from the folder more easily. 

In [4]:
root = os.path.abspath(os.path.join(os.getcwd(), ".."))

First we need to [import data](https://charityclassification.org.uk/data/data-downloads/) from the Charity Classification project. We will focus on the ICNPTSO classifications for now. 

In [6]:
# pass path components separately so the paths also work outside Windows
active_icnptso = pd.read_csv(os.path.join(root, "data", "raw", "charities_active-icnptso.csv"))
inactive_icnptso = pd.read_csv(os.path.join(root, "data", "raw", "charities_inactive-icnptso.csv"))
icnptso = pd.read_csv(os.path.join(root, "data", "raw", "icnptso.csv"))

Then we import data from the Charity Commission register using [FindThatCharity](https://findthatcharity.uk/orgid/type/registered-charity). This includes all active and removed charities registered in the UK as of November 2021. 

In [7]:
charities = pd.read_csv(os.path.join(root, "data", "raw", "registered-charity.csv"))
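
Register extracts often contain identifier columns that mix digits and text, and pandas then has to guess a type for them on load. The sketch below shows one way to read such columns as text. It assumes that `charityNumber` and `companyNumber` are best treated as strings, and the two-row inline sample is made up for illustration; it stands in for the real file.

```python
import io

import pandas as pd

# Hypothetical sample standing in for registered-charity.csv (values are invented)
sample_csv = io.StringIO(
    "id,name,charityNumber,companyNumber\n"
    "X-1,EXAMPLE TRUST,1000000,\n"
    "X-2,EXAMPLE FOUNDATION,1000001,01234567\n"
)

# Reading identifiers as strings keeps leading zeros and avoids mixed types
sample = pd.read_csv(
    sample_csv,
    dtype={"charityNumber": str, "companyNumber": str},
)

print(sample.dtypes)
```

The same `dtype` argument could be passed to the `pd.read_csv` call for the full register if the identifier columns turn out to be mixed.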



## Cleaning

### Check

First we want to get a feel for how big the different dataframes are and what variables they contain.

In [15]:
dataframes = [active_icnptso, inactive_icnptso, icnptso, charities]

for df in dataframes:
    print(len(df))
    print(df.head(2))
    print("")
    print("")

202222
           org_id icnptso_code  icnptso_code_probability icnptso_code_source
0  GB-CHC-1000000          B32                     0.358            ml_model
1  GB-CHC-1000001          A11                     0.836            ml_model


191438
           org_id icnptso_code  icnptso_code_probability icnptso_code_source
0  GB-CHC-1000004          A11                     1.000            ml_model
1  GB-CHC-1000006          D13                     0.697            ml_model


127
  Section Group Sub-group                                             Title  \
0       A   NaN       NaN  Culture, communication and recreation activities   
1       A   A10       NaN                                  Culture and arts   

           Notes  
0  ICNPO Group 1  
1            NaN  


394035
               id                         name charityNumber companyNumber  \
0  GB-CHC-1000000  THE ROYAL ANNIVERSARY TRUST       1000000           NaN   
1  GB-CHC-1000001          THE ARTS FOUNDATION       100
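
If the printed previews get hard to compare, the sizes and column names can be collected into one small table instead. This is only a sketch: `summarise` is a hypothetical helper, not part of the original analysis, and the toy dataframes are invented so the example runs on its own.

```python
import pandas as pd


def summarise(frames):
    """Return one row per dataframe with its row count, column count and column names."""
    rows = []
    for name, frame in frames.items():
        rows.append(
            {
                "name": name,
                "rows": len(frame),
                "n_columns": frame.shape[1],
                "columns": ", ".join(map(str, frame.columns)),
            }
        )
    return pd.DataFrame(rows)


# Toy dataframes standing in for the real ones (values are invented)
toy_frames = {
    "codes": pd.DataFrame({"org_id": ["X-1", "X-2"], "icnptso_code": ["B32", "A11"]}),
    "lookup": pd.DataFrame({"Section": ["A"], "Title": ["Example section"], "Notes": [None]}),
}

summary = summarise(toy_frames)
print(summary)
```

With the real data, the dictionary would map names such as `"charities"` to the dataframes loaded earlier.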

### Merging

Now we are going to merge classifications for active and inactive into one dataframe.

In [71]:
# ignore_index gives the combined dataframe a fresh 0..n-1 index
all_icnptso = pd.concat([active_icnptso, inactive_icnptso], ignore_index=True)
all_icnptso.head(2)

           org_id icnptso_code  icnptso_code_probability icnptso_code_source
0  GB-CHC-1000000          B32                     0.358            ml_model
1  GB-CHC-1000001          A11                     0.836            ml_model
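
Before relying on the combined table, it may be worth checking whether any organisation appears in both the active and the inactive file. The sketch below uses invented toy data and only illustrates the check; whether such overlaps exist in the real files is not something this notebook has established.

```python
import pandas as pd

# Toy stand-ins for the active and inactive classification files (values are invented)
active = pd.DataFrame({"org_id": ["X-1", "X-2"], "icnptso_code": ["B32", "A11"]})
inactive = pd.DataFrame({"org_id": ["X-3", "X-2"], "icnptso_code": ["D13", "A11"]})

# Stack the two tables and renumber the index from zero
combined = pd.concat([active, inactive], ignore_index=True)

# Identifiers that occur more than once after stacking
duplicated_ids = combined.loc[combined["org_id"].duplicated(keep=False), "org_id"].unique()

print(duplicated_ids)
```

An empty result on the real data would mean each charity has exactly one classification row.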


Then we want to merge the classifications to data from the Charity Commission register and start slowly building our dataframe (df) for analysis.

In [88]:
# select only the columns we are interested in
df = charities[['id', 'name', 'charityNumber', 'dateRegistered', 'dateRemoved', 'active']]

# join classification data on the organisation identifier
df = pd.merge(df, all_icnptso, left_on="id", right_on="org_id", how="left")

# remove the duplicate key column
df = df.drop(columns=["org_id"])

df.head(2)

               id                         name  charityNumber dateRegistered  \
0  GB-CHC-1000000  THE ROYAL ANNIVERSARY TRUST        1000000     1990-08-03   
1  GB-CHC-1000001          THE ARTS FOUNDATION        1000001     1990-08-02   

  dateRemoved  active icnptso_code  icnptso_code_probability icnptso_code_source  
0         NaN    True          B32                     0.358            ml_model  
1         NaN    True          A11                     0.836            ml_model  
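
A left join keeps every charity from the register, so it is useful to know how many of them actually received a classification. A minimal sketch with invented toy data, assuming there is at most one classification row per charity (that assumption is what `validate="one_to_one"` checks):

```python
import pandas as pd

# Toy stand-ins for the register and the classification table (values are invented)
register = pd.DataFrame({"id": ["X-1", "X-2", "X-3"], "name": ["A", "B", "C"]})
classes = pd.DataFrame({"org_id": ["X-1", "X-2"], "icnptso_code": ["B32", "A11"]})

merged = pd.merge(
    register,
    classes,
    left_on="id",
    right_on="org_id",
    how="left",
    validate="one_to_one",  # raises MergeError if either key column has duplicates
    indicator=True,         # adds a _merge column: both / left_only / right_only
)

# Count how many register rows found a classification
match_counts = merged["_merge"].value_counts()
print(match_counts)
```

Rows marked `left_only` are charities in the register without a classification; their `icnptso_code` is missing after the join.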


Now we want to clean up the ICNPTSO categories a bit more and merge in the titles.

In [89]:
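# get the section letter (first character of the code)
df["icnptso_group"] = df["icnptso_code"].str[0]

# get title for the section; keep only section-level rows so the merge cannot duplicate charities
sections = icnptso.loc[icnptso["Group"].isna() & icnptso["Sub-group"].isna(), ["Section", "Title"]]
df = pd.merge(df, sections, left_on="icnptso_group", right_on="Section", how="left")
df = df.drop(columns=["Section"]).rename(columns={"Title": "icnptso_group_title"})

# get title for subgroup; drop lookup rows without a Sub-group so missing codes do not match on NaN
subgroups = icnptso.dropna(subset=["Sub-group"])[["Sub-group", "Title"]]
df = pd.merge(df, subgroups, left_on="icnptso_code", right_on="Sub-group", how="left")
df = df.drop(columns=["Sub-group"]).rename(columns={"Title": "icnptso_title"})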
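
Merging against a lookup table can silently change the number of rows, because pandas matches missing keys with each other. The toy example below (invented values, with a lookup shaped like the `icnptso` table) shows the effect and one way to guard against it; it is a sketch of the pitfall, not a statement about what the real data contains.

```python
import numpy as np
import pandas as pd

# Toy charities: the second one has no classification code
orgs = pd.DataFrame({"id": ["X-1", "X-2"], "icnptso_code": ["A11", np.nan]})

# Toy lookup with section, group and sub-group rows (titles are invented)
lookup = pd.DataFrame(
    {
        "Section": ["A", "A", "A"],
        "Group": [np.nan, "A10", "A10"],
        "Sub-group": [np.nan, np.nan, "A11"],
        "Title": ["Section title", "Group title", "Sub-group title"],
    }
)

# Naive merge: the missing code matches every lookup row whose Sub-group is missing
naive = pd.merge(
    orgs, lookup[["Sub-group", "Title"]],
    left_on="icnptso_code", right_on="Sub-group", how="left",
)

# Safer merge: drop lookup rows without a Sub-group first
safe = pd.merge(
    orgs, lookup.dropna(subset=["Sub-group"])[["Sub-group", "Title"]],
    left_on="icnptso_code", right_on="Sub-group", how="left",
)

print(len(orgs), len(naive), len(safe))
```

Comparing `len(df)` before and after each merge is a cheap check that no rows were duplicated.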

### Dates

In [103]:
#convert columns to pandas datetime
df["dateRemoved"] = pd.to_datetime(df["dateRemoved"])
df["dateRegistered"] = pd.to_datetime(df["dateRegistered"])

#get years from date
df["yearRemoved"] = df["dateRemoved"].dt.year
df["yearRegistered"] = df["dateRegistered"].dt.year
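
Because active charities have no removal date, `.dt.year` returns floats with missing values for that column. The sketch below, on invented toy data, shows one option: casting to pandas' nullable `Int64` type so years display as whole numbers, then counting registrations per year. Whether the later analysis uses this exact approach is an assumption.

```python
import pandas as pd

# Toy data with the same date columns as the register (values are invented)
toy = pd.DataFrame(
    {
        "dateRegistered": ["1990-08-03", "1990-08-02", "2001-01-15"],
        "dateRemoved": [None, "2005-06-30", None],
    }
)

# Convert to datetime; missing removal dates become NaT
toy["dateRegistered"] = pd.to_datetime(toy["dateRegistered"])
toy["dateRemoved"] = pd.to_datetime(toy["dateRemoved"])

# Nullable integers keep missing years as <NA> instead of turning years into floats
toy["yearRegistered"] = toy["dateRegistered"].dt.year.astype("Int64")
toy["yearRemoved"] = toy["dateRemoved"].dt.year.astype("Int64")

# Number of registrations per year, in chronological order
registrations_per_year = toy["yearRegistered"].value_counts().sort_index()
print(registrations_per_year)
```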

## Analysis