# Charity classification analysis

Analysis using a new dataset that classifies and tags all active and inactive charities according to their activity/sector. This analysis explores how the number of charities in specific activities has changed, whether specific sectors were more "trendy" at some point, and whether others have died out.

For this analysis we need a number of different modules. 

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import date
import os
import ast

%matplotlib inline

## Import data

We will define the root of the repository, so that we can import files from the folder more easily. 

In [2]:
root = os.path.abspath(os.path.join(os.getcwd(), ".."))

First we need to [import data](https://charityclassification.org.uk/data/data-downloads/) from the Charity Classification project. We will focus on the ICNPTSO classifications for now. 

In [3]:
#pass each path component separately so the paths also work outside Windows
active_icnptso = pd.read_csv(os.path.join(root, "data", "raw", "charities_active-icnptso.csv"))
inactive_icnptso = pd.read_csv(os.path.join(root, "data", "raw", "charities_inactive-icnptso.csv"))
icnptso = pd.read_csv(os.path.join(root, "data", "raw", "icnptso.csv"))

Then we import data from the Charity Commission register using [FindThatCharity](https://findthatcharity.uk/orgid/type/registered-charity). This includes all active and removed charities registered in the UK as of November 2021. 

In [4]:
charities = pd.read_csv(os.path.join(root, "data", "raw", "registered-charity.csv"))


## Cleaning

### Check

First we want to get a feel for how big the different dataframes are and what variables they contain.

In [5]:
dataframes = [active_icnptso, inactive_icnptso, icnptso, charities]

for df in dataframes:
    print(len(df))
    print(df.head(2))
    print("")
    print("")

202222
           org_id icnptso_code  icnptso_code_probability icnptso_code_source
0  GB-CHC-1000000          B32                     0.358            ml_model
1  GB-CHC-1000001          A11                     0.836            ml_model


191438
           org_id icnptso_code  icnptso_code_probability icnptso_code_source
0  GB-CHC-1000004          A11                     1.000            ml_model
1  GB-CHC-1000006          D13                     0.697            ml_model


127
  Section Group Sub-group                                             Title  \
0       A   NaN       NaN  Culture, communication and recreation activities   
1       A   A10       NaN                                  Culture and arts   

           Notes  
0  ICNPO Group 1  
1            NaN  


394035
               id                         name charityNumber companyNumber  \
0  GB-CHC-1000000  THE ROYAL ANNIVERSARY TRUST       1000000           NaN   
1  GB-CHC-1000001          THE ARTS FOUNDATION       100

### Merging

Now we combine the classifications for active and inactive charities into one dataframe.

In [6]:
all_icnptso = pd.concat([active_icnptso,inactive_icnptso])
all_icnptso.head(2)

Unnamed: 0,org_id,icnptso_code,icnptso_code_probability,icnptso_code_source
0,GB-CHC-1000000,B32,0.358,ml_model
1,GB-CHC-1000001,A11,0.836,ml_model
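
Since the two files are simply stacked, it may be worth confirming that no `org_id` appears in both of them. The following is a minimal sketch on toy frames; the values are made up and only the column names `org_id` and `icnptso_code` come from the data above.

```python
import pandas as pd

# Toy stand-ins for active_icnptso and inactive_icnptso (values are hypothetical)
active = pd.DataFrame({"org_id": ["GB-CHC-1", "GB-CHC-2"], "icnptso_code": ["B32", "A11"]})
inactive = pd.DataFrame({"org_id": ["GB-CHC-3", "GB-CHC-2"], "icnptso_code": ["D13", "A11"]})

# ignore_index=True gives the combined frame a fresh 0..n-1 index
combined = pd.concat([active, inactive], ignore_index=True)

# keep=False flags every row whose org_id occurs more than once
is_duplicate = combined["org_id"].duplicated(keep=False)
duplicated_ids = combined.loc[is_duplicate, "org_id"].unique()

print(list(duplicated_ids))
```

An empty list on the real data would suggest that each charity is classified in only one of the two files.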


Then we want to merge the classifications with data from the Charity Commission register and start slowly building our dataframe (df) for analysis.

In [30]:
#select only columns we are interested in
df = charities[['id', 'name', 'charityNumber', 'dateRegistered', 'dateRemoved', 'active']]

#join classification data (the left join keeps every charity on the register)
df = pd.merge(df, all_icnptso, left_on="id", right_on="org_id", how='left')

#remove columns we don't need
df = df.drop(columns=["org_id"])

len(df)

394035
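
The row count matching the register suggests the join did not duplicate any charities. A more explicit check could use the `validate` and `indicator` parameters of `pd.merge`. This is only a sketch on toy data: the frames `register` and `codes` are hypothetical, and `one_to_one` assumes each charity has at most one classification row.

```python
import pandas as pd

# Hypothetical miniature versions of the register and the classification data
register = pd.DataFrame({"id": ["GB-CHC-1", "GB-CHC-2", "GB-CHC-3"],
                         "name": ["A", "B", "C"]})
codes = pd.DataFrame({"org_id": ["GB-CHC-1", "GB-CHC-3"],
                      "icnptso_code": ["B32", "A11"]})

# validate raises pandas.errors.MergeError if a key is repeated on either side;
# indicator adds a "_merge" column recording where each row found a match
merged = pd.merge(register, codes, left_on="id", right_on="org_id",
                  how="left", validate="one_to_one", indicator=True)

# charities on the register that have no classification
unmatched = int((merged["_merge"] == "left_only").sum())
print(len(merged), unmatched)
```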

Now we want to clean up the icnptso categories a bit more and merge in the titles.

In [31]:
#get group (the section letter, i.e. the first character of the code)
df["icnptso_group"] = df["icnptso_code"].str[0]

#get title for group (rows without a Group value are the section-level rows)
df_join = icnptso.loc[icnptso["Group"].isna(), ["Section", "Title"]]
df = pd.merge(df, df_join, left_on="icnptso_group", right_on="Section", how='left')
df = df.drop(columns=["Section"]).rename(columns={"Title": "icnptso_group_title"})

len(df)

394035

In [32]:
df.head()

Unnamed: 0,id,name,charityNumber,dateRegistered,dateRemoved,active,icnptso_code,icnptso_code_probability,icnptso_code_source,icnptso_group,icnptso_group_title
0,GB-CHC-1000000,THE ROYAL ANNIVERSARY TRUST,1000000,1990-08-03,,True,B32,0.358,ml_model,B,Education services
1,GB-CHC-1000001,THE ARTS FOUNDATION,1000001,1990-08-02,,True,A11,0.836,ml_model,A,"Culture, communication and recreation activities"
2,GB-CHC-1000002,THE SPENSER-MORRIS CHARITABLE FOUNDATION LIMITED,1000002,1990-08-02,,True,H10,0.993,ml_model,H,Philanthropic intermediaries and voluntarism p...
3,GB-CHC-1000003,SOUTHERN AFRICA RESOURCES CENTRE,1000003,1990-08-02,,True,G20,,manual,G,"Civic, advocacy, political and international a..."
4,GB-CHC-1000004,THE MICHAEL VYNER TRUST,1000004,1990-08-01,2004-10-20,False,A11,1.0,ml_model,A,"Culture, communication and recreation activities"


### Dates

In [45]:
#convert columns to pandas datetime
df["dateRemoved"] = pd.to_datetime(df["dateRemoved"])
df["dateRegistered"] = pd.to_datetime(df["dateRegistered"])

#get years from date
df["yearRemoved"] = df["dateRemoved"].dt.year
df["yearRegistered"] = df["dateRegistered"].dt.year

#combine years before 1960 (as E&W database only starts then)
df["yearRemovedCombined"] = df["yearRemoved"]
df.loc[df["yearRemoved"] < 1960, "yearRemovedCombined"] = 1960 
df["yearRegisteredCombined"] = df["yearRegistered"]
df.loc[df["yearRegistered"] < 1960, "yearRegisteredCombined"] = 1960 
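
The same "floor at 1960" step could also be written with `Series.clip`, which leaves missing years untouched. A small sketch on a made-up series:

```python
import numpy as np
import pandas as pd

# Hypothetical year values, including one before 1960 and one missing
years = pd.Series([1955.0, 1960.0, 1987.0, np.nan])

# clip(lower=1960) raises anything below 1960 to 1960 and keeps NaN as NaN
years_combined = years.clip(lower=1960)

print(years_combined.tolist())
```

Applied to the dataframe above, this would read `df["yearRegistered"].clip(lower=1960)`, assuming the year columns stay numeric.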

## Analysis

### Number of registrations by year and group

In [46]:
table = df.groupby(["yearRegisteredCombined", "icnptso_group_title"])["id"].count()
table = pd.DataFrame(table).unstack()

#export table
table.to_csv(os.path.join(root, "data", "processed", "registrations-by-year-combined-icnptso.csv"))

table

Unnamed: 0_level_0,id,id,id,id,id,id,id,id,id,id,id,id
icnptso_group_title,"Business, professional and labour organizations","Civic, advocacy, political and international activities","Community and economic development, and housing activities","Culture, communication and recreation activities",Education services,Environmental protection and animal welfare activities,Human health services,Other activities,Philanthropic intermediaries and voluntarism promotion,"Professional, scientific, accounting and administrative services",Religious congregations and associations,Social services
yearRegisteredCombined,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2
1960,11.0,147.0,157.0,207.0,117.0,7.0,31.0,,627.0,20.0,1422.0,219.0
1961,24.0,347.0,368.0,281.0,479.0,10.0,68.0,3.0,1401.0,24.0,266.0,203.0
1962,121.0,989.0,1774.0,1000.0,1292.0,82.0,250.0,8.0,5740.0,88.0,1198.0,762.0
1963,98.0,2236.0,2727.0,1477.0,1694.0,83.0,252.0,6.0,4925.0,85.0,1282.0,805.0
1964,62.0,1190.0,1610.0,1081.0,1407.0,37.0,311.0,8.0,6355.0,74.0,1403.0,791.0
...,...,...,...,...,...,...,...,...,...,...,...,...
2017,18.0,871.0,777.0,1149.0,701.0,65.0,434.0,14.0,983.0,92.0,1255.0,885.0
2018,11.0,601.0,592.0,972.0,675.0,52.0,351.0,25.0,866.0,89.0,1020.0,820.0
2019,18.0,644.0,689.0,1053.0,798.0,57.0,440.0,18.0,906.0,95.0,1130.0,893.0
2020,11.0,698.0,703.0,1003.0,647.0,53.0,493.0,16.0,942.0,96.0,1209.0,1017.0
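
The `NaN` cells in the table are year and group combinations with no registrations. If zeros are preferred, one option is `size()` followed by `unstack(fill_value=0)`, which also avoids the extra `id` column level. The sketch below uses a made-up frame with the same column names; note that `size()` counts rows, whereas `count()` on `id` skips rows with a missing `id`.

```python
import pandas as pd

# Hypothetical rows with the same column names as df above
toy = pd.DataFrame({
    "yearRegisteredCombined": [1960, 1960, 1961, 1961, 1961],
    "icnptso_group_title": ["Education services", "Social services",
                            "Education services", "Education services",
                            "Education services"],
    "id": ["a", "b", "c", "d", "e"],
})

# one row per year, one column per group, 0 where a combination never occurs
counts = (
    toy.groupby(["yearRegisteredCombined", "icnptso_group_title"])
    .size()
    .unstack(fill_value=0)
)

print(counts)
```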


### Removals per year and group

In [47]:
table = df.groupby(["yearRemovedCombined", "icnptso_group_title"])["id"].count()
table = pd.DataFrame(table).unstack()

#export table
table.to_csv(os.path.join(root, "data", "processed", "removals-by-year-combined-icnptso.csv"))

table

Unnamed: 0_level_0,id,id,id,id,id,id,id,id,id,id,id,id
icnptso_group_title,"Business, professional and labour organizations","Civic, advocacy, political and international activities","Community and economic development, and housing activities","Culture, communication and recreation activities",Education services,Environmental protection and animal welfare activities,Human health services,Other activities,Philanthropic intermediaries and voluntarism promotion,"Professional, scientific, accounting and administrative services",Religious congregations and associations,Social services
yearRemovedCombined,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2,Unnamed: 9_level_2,Unnamed: 10_level_2,Unnamed: 11_level_2,Unnamed: 12_level_2
1961.0,2.0,25.0,24.0,45.0,21.0,2.0,14.0,1.0,87.0,8.0,22.0,44.0
1962.0,4.0,9.0,9.0,9.0,7.0,2.0,7.0,,27.0,4.0,3.0,11.0
1963.0,,14.0,17.0,11.0,5.0,2.0,6.0,,19.0,,9.0,6.0
1964.0,1.0,13.0,16.0,11.0,8.0,1.0,4.0,,43.0,1.0,12.0,6.0
1965.0,1.0,7.0,16.0,7.0,7.0,2.0,3.0,,30.0,,15.0,8.0
...,...,...,...,...,...,...,...,...,...,...,...,...
2017.0,16.0,467.0,393.0,555.0,704.0,25.0,284.0,12.0,809.0,76.0,588.0,961.0
2018.0,18.0,576.0,539.0,616.0,782.0,37.0,334.0,20.0,860.0,79.0,727.0,1207.0
2019.0,17.0,631.0,573.0,750.0,819.0,25.0,328.0,15.0,1069.0,126.0,896.0,1129.0
2020.0,11.0,400.0,433.0,481.0,611.0,18.0,261.0,13.0,647.0,60.0,594.0,759.0


### Charities active by year

In [51]:
group = list(df["icnptso_group_title"].unique())

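In [61]:
#charities registered in or before 1960 that were removed after 1960 or never removed
#(each comparison needs its own parentheses because | binds more tightly than >)
sum((df["yearRegistered"] <= 1960) & ((df["yearRemoved"] > 1960) | df["yearRemoved"].isna()))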

In [None]:
sum(df["yearRegistered"]<= 1960 )
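
One way the per-year count of active charities might be built up is sketched below. It assumes a charity counts as active in a given year if it was registered in or before that year and was either never removed or removed in a later year. The helper `active_in_year` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical charities: registration year and removal year (NaN = still registered)
toy = pd.DataFrame({
    "yearRegistered": [1958, 1960, 1961, 1961],
    "yearRemoved": [1961, np.nan, 1962, np.nan],
})

def active_in_year(frame, year):
    """Count rows registered by `year` and not yet removed in `year`."""
    registered = frame["yearRegistered"] <= year
    # parentheses around the comparison matter: | binds more tightly than >
    not_removed = (frame["yearRemoved"] > year) | frame["yearRemoved"].isna()
    return int((registered & not_removed).sum())

active_by_year = {year: active_in_year(toy, year) for year in range(1960, 1963)}
print(active_by_year)
```

Filtering the frame to one `icnptso_group_title` before calling the helper would give the same series per group.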
