## 1. START
Load the data and take a first quick look.


In [6]:
import pandas as pd
import matplotlib.pyplot as plt 
import seaborn as sns 
import numpy as np 

# load the CSV file with the dataset
df = pd.read_csv('C:/MyFiles/Study/Charity_Grant_Analysis/data/360giving_grants.csv')


In [8]:
# first quick look at the data
# size of the DataFrame
print(f' We have rows and columns {df.shape}')


 We have rows and columns (449414, 14)


In [10]:
#display first 5 rows
df.head()

Unnamed: 0,Title,Description,Amount Awarded,Award Date,Recipient Org:Name,Funding Org:Name,Funding Org: Org Type (additional data),Recipient Region (additional data),Best Available Region (additional data),Type of Recipient,Grant Programme:Title,Recipient Org: Org Type (additional data),License,Note See http://grantnav.threesixtygiving.org/datasets/ for further license information.
0,Grant to Youth Learning Network Ltd,Our Routes2Success (R2S) programme will be wor...,3011.14,2024-10-19,Youth Learning Network Ltd,Co-operative Group,Grantmaking Organisation,London,London,Organisation,Co-op Local Community Fund,Registered Charity,https://creativecommons.org/licenses/by-sa/4.0/,
1,A grant to Rights & Security International,"A restricted grant of £25,000 towards UK work ...",25000.0,2025-03-17,Rights & Security International,A B Charitable Trust,Grantmaking Organisation,London,London,Organisation,The Human Rights Framework,Registered Charity,https://creativecommons.org/licenses/by/4.0/,
2,A grant to Hillingdon Law Centre,"An unrestricted grant of £25,000 (This grant i...",25000.0,2025-03-17,Hillingdon Law Centre,A B Charitable Trust,Grantmaking Organisation,London,London,Organisation,Access to Justice,Registered Charity,https://creativecommons.org/licenses/by/4.0/,
3,A grant to ECPAT UK,"An unrestricted grant of £25,000 (This grant i...",25000.0,2025-03-17,ECPAT UK,A B Charitable Trust,Grantmaking Organisation,London,London,Organisation,Migrants and Refugees,Registered Charity,https://creativecommons.org/licenses/by/4.0/,
4,A grant to Paddington Law Centre,"An unrestricted grant of £20,000 (This grant i...",20000.0,2025-03-17,Paddington Law Centre,A B Charitable Trust,Grantmaking Organisation,London,London,Organisation,Access to Justice,Registered Charity,https://creativecommons.org/licenses/by/4.0/,


In [13]:
# display data types 
print(f'Data types in the dataframe:\n {df.dtypes}')

Data types in the dataframe:
 Title                                                                                        object
Description                                                                                  object
Amount Awarded                                                                              float64
Award Date                                                                                   object
Recipient Org:Name                                                                           object
Funding Org:Name                                                                             object
Funding Org: Org Type (additional data)                                                      object
Recipient Region (additional data)                                                           object
Best Available Region (additional data)                                                      object
Type of Recipient                                                     

In [14]:
# Do we have missing values?
df.isna().sum()



Title                                                                                            8
Description                                                                                      4
Amount Awarded                                                                                   0
Award Date                                                                                       0
Recipient Org:Name                                                                          138106
Funding Org:Name                                                                                 0
Funding Org: Org Type (additional data)                                                          0
Recipient Region (additional data)                                                               0
Best Available Region (additional data)                                                          0
Type of Recipient                                                                                0
Grant Prog

In [16]:
# quick analysis of numbers columns
df['Amount Awarded'].describe()

count    4.494140e+05
mean     1.750841e+05
std      7.051984e+06
min     -4.586625e+07
25%      4.553800e+02
50%      5.000000e+03
75%      2.496900e+04
max      2.351584e+09
Name: Amount Awarded, dtype: float64

### Results

- The dataset contains 449,414 rows and 14 columns.
- Most columns are of type `object` (including the date column).
- Most of the columns shown have no missing values. Only 8 values are missing in `Title` and 4 in `Description`, which is not critical.
- There are 138,106 missing values in `Recipient Org:Name`. `Grant Programme:Title` (34,441) and `Recipient Org: Org Type (additional data)` (277,395) also have missing values.
- The missing recipient names are likely due to individual recipients; I will inspect these columns after removing individuals.
- From a quick analysis, the mean of `Amount Awarded` is about 175,084, while the median is only 5,000. Negative values (the minimum is about -45.9 million) represent returned funds.
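
One way to test the idea that missing recipient names come from individual recipients is to count missing names within each recipient type. The sketch below uses a small made-up frame (`toy`) in place of the real data; only the column names are taken from the dataset.

```python
import pandas as pd

# Toy data standing in for the real grants table (values are made up).
toy = pd.DataFrame({
    "Type of Recipient": ["Organisation", "Individual", "Individual", "Organisation"],
    "Recipient Org:Name": ["Org A", None, None, None],
})

# Count missing recipient names within each recipient type.
missing_by_type = (
    toy["Recipient Org:Name"].isna()
    .groupby(toy["Type of Recipient"])
    .sum()
)
print(missing_by_type)
```

If nearly all missing names fall under `Individual` on the real data, that would support the explanation above.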

Now we need to create a cleaned dataset that will help answer our questions.

## 2. Manipulations

To answer our questions we will perform the following steps:

1. Remove records where `Amount Awarded` is not positive (negative or zero).
2. Filter out individual recipients and focus only on grants to organisations.
3. Drop unnecessary columns (for example, `Best Available Region` and `License`).
4. Convert column data types where needed (e.g., parse dates).
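
The four steps can be sketched as one small function in which each step works on the result of the previous one. This is only an illustration on made-up data: `clean_grants` and `toy` are hypothetical names, and the `> 0` filter mirrors the one used in this notebook.

```python
import pandas as pd

def clean_grants(raw: pd.DataFrame, drop_cols: list[str]) -> pd.DataFrame:
    """Hypothetical helper: apply the four cleaning steps in order."""
    out = raw[raw["Amount Awarded"] > 0]                     # 1. keep positive amounts
    out = out[out["Type of Recipient"] == "Organisation"]    # 2. keep organisations only
    out = out.drop(columns=drop_cols)                        # 3. drop unneeded columns
    out = out.assign(**{"Award Date": pd.to_datetime(out["Award Date"])})  # 4. parse dates
    return out.reset_index(drop=True)

# Toy data (made up) with one row that survives every step.
toy = pd.DataFrame({
    "Amount Awarded": [100.0, -50.0, 0.0, 2500.0],
    "Award Date": ["2024-10-19", "2025-03-17", "2025-03-17", "2025-01-05"],
    "Type of Recipient": ["Organisation", "Organisation", "Individual", "Individual"],
    "License": ["x", "x", "x", "x"],
})

cleaned = clean_grants(toy, drop_cols=["License"])
print(cleaned)
```

Because every step reassigns `out` from the previous result, a filter cannot be lost by accidentally starting again from an earlier DataFrame.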

In [17]:
# Keep only grants with a positive amount (drops negative and zero values)
df_new = df[df['Amount Awarded']>0]
df_new.head(3)

Unnamed: 0,Title,Description,Amount Awarded,Award Date,Recipient Org:Name,Funding Org:Name,Funding Org: Org Type (additional data),Recipient Region (additional data),Best Available Region (additional data),Type of Recipient,Grant Programme:Title,Recipient Org: Org Type (additional data),License,Note See http://grantnav.threesixtygiving.org/datasets/ for further license information.
0,Grant to Youth Learning Network Ltd,Our Routes2Success (R2S) programme will be wor...,3011.14,2024-10-19,Youth Learning Network Ltd,Co-operative Group,Grantmaking Organisation,London,London,Organisation,Co-op Local Community Fund,Registered Charity,https://creativecommons.org/licenses/by-sa/4.0/,
1,A grant to Rights & Security International,"A restricted grant of £25,000 towards UK work ...",25000.0,2025-03-17,Rights & Security International,A B Charitable Trust,Grantmaking Organisation,London,London,Organisation,The Human Rights Framework,Registered Charity,https://creativecommons.org/licenses/by/4.0/,
2,A grant to Hillingdon Law Centre,"An unrestricted grant of £25,000 (This grant i...",25000.0,2025-03-17,Hillingdon Law Centre,A B Charitable Trust,Grantmaking Organisation,London,London,Organisation,Access to Justice,Registered Charity,https://creativecommons.org/licenses/by/4.0/,


In [63]:
df_new.shape

(444998, 14)

In [37]:
# Filter the dataset and remove individuals. We focus only on grants to organisations
# Let's see what we have in the 'Type of Recipient' column
df_new_1 = df_new['Type of Recipient'].value_counts()
display(df_new_1)


Type of Recipient
Organisation    306893
Individual      138105
Name: count, dtype: int64

In [40]:
# count grants to individuals
df_individuals = df_new[df_new['Type of Recipient'] == 'Individual']
df_individuals_len = len(df_individuals)

print(f'we have {df_individuals_len} individual grants and this is {round(df_individuals_len/len(df_new)*100, 2)} %')



we have 138105 individual grants and this is 31.03 %


In [41]:
# count organisation grants

df_organisation = df_new[df_new['Type of Recipient'] == 'Organisation']
df_organisation_len = len(df_organisation)

print(f'we have {df_organisation_len} organisation grants and this is {round(df_organisation_len/len(df_new)*100, 2)} %')



we have 306893 organisation grants and this is 68.97 %
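
The two shares above can also be read off in one step with `value_counts(normalize=True)`. A minimal sketch on a made-up Series (`toy` is a stand-in for `df_new['Type of Recipient']`):

```python
import pandas as pd

# Toy stand-in for the 'Type of Recipient' column (values are made up).
toy = pd.Series(
    ["Organisation", "Organisation", "Individual", "Organisation"],
    name="Type of Recipient",
)

# Share of each recipient type, as a percentage rounded to 2 decimals.
shares = toy.value_counts(normalize=True).mul(100).round(2)
print(shares)
```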


In [42]:
# let's leave only Organisation type because this is our focus
df_charity = df_new[df_new['Type of Recipient'] == 'Organisation']

In [64]:
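# now let's drop the extra columns
# note: this starts from df_new again, so the 'Organisation' filter from the previous cell is not carried into df_charity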
df_charity = df_new.drop(columns=['Best Available Region (additional data)', 'License', 'Note See http://grantnav.threesixtygiving.org/datasets/ for further license information.'])

In [59]:
# double check
df_charity.columns

Index(['Title', 'Description', 'Amount Awarded', 'Award Date',
       'Recipient Org:Name', 'Funding Org:Name',
       'Funding Org: Org Type (additional data)',
       'Recipient Region (additional data)', 'Type of Recipient',
       'Grant Programme:Title', 'Recipient Org: Org Type (additional data)'],
      dtype='object')

In [65]:
# convert the date column to datetime
df_charity['Award Date'] = pd.to_datetime(df_charity['Award Date'])
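
If some dates might not parse, one option is `errors="coerce"`, which turns unparseable values into `NaT` so they can be counted instead of raising an error. A minimal sketch on made-up strings (the real column may have no bad values at all):

```python
import pandas as pd

# Made-up date strings, one of which is deliberately invalid.
dates = pd.Series(["2024-10-19", "2025-03-17", "not a date"])

# Unparseable entries become NaT instead of raising an error.
parsed = pd.to_datetime(dates, errors="coerce")
n_bad = parsed.isna().sum()
print(f"unparseable dates: {n_bad}")
```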

In [66]:
# let's check our new DataFrame
df_charity.shape 


(444998, 11)

In [69]:
print(f'our dataset decreased by {len(df)-len(df_charity)} grants')

our dataset decreased by 4416 grants


In [72]:
#check type of columns
df_charity.dtypes

Title                                                object
Description                                          object
Amount Awarded                                      float64
Award Date                                   datetime64[ns]
Recipient Org:Name                                   object
Funding Org:Name                                     object
Funding Org: Org Type (additional data)              object
Recipient Region (additional data)                   object
Type of Recipient                                    object
Grant Programme:Title                                object
Recipient Org: Org Type (additional data)            object
dtype: object

In [77]:
# basic info about Amount Awarded
df_charity['Amount Awarded'].describe()

count    4.449980e+05
mean     1.780255e+05
std      7.085478e+06
min      1.000000e-02
25%      4.970000e+02
50%      5.000000e+03
75%      2.500000e+04
max      2.351584e+09
Name: Amount Awarded, dtype: float64

In [78]:
#check missing values now
df_charity.isna().sum()

Title                                             8
Description                                       4
Amount Awarded                                    0
Award Date                                        0
Recipient Org:Name                           138106
Funding Org:Name                                  0
Funding Org: Org Type (additional data)           0
Recipient Region (additional data)                0
Type of Recipient                                 0
Grant Programme:Title                         34438
Recipient Org: Org Type (additional data)    274297
dtype: int64

In [82]:
# let's dig deeper
df_charity['Recipient Org:Name'].value_counts().sort_values(ascending=False)

Recipient Org:Name
University College London                       1450
Imperial College London                         1200
University of Oxford                             987
Individual Recipient                             928
University of Cambridge                          782
                                                ... 
Brass                                              1
ACES Bridlington Club and Friends by the Sea       1
Weighton Wildlife Group                            1
Rotary Club of Malton and Norton                   1
Decadent Drawing CIC                               1
Name: count, Length: 134156, dtype: int64

### Results

- We reduced the dataset; `df_charity` is our working dataset now.
- The dataset decreased by 4,416 grants.
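- The mean `Amount Awarded` remains similar (about 178,000 versus about 175,000 before cleaning) but is far above the median of 5,000, indicating a strongly right-skewed distribution.
- We still have 138,106 missing values in `Recipient Org:Name`, about 31% of the data. This nearly matches the number of individual recipients (138,105), which suggests the missing names belong to grants to individuals. We can work with this because recipient names are not our main focus.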
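
The skew noted above can be quantified by comparing the mean with the median and looking at the sample skewness. A minimal sketch on made-up amounts (`amounts` is illustrative, not taken from the dataset):

```python
import pandas as pd

# Made-up grant amounts: mostly small, with one very large grant.
amounts = pd.Series([500.0, 5_000.0, 5_000.0, 25_000.0, 2_000_000.0])

summary = {
    "mean": amounts.mean(),      # pulled upwards by the large grant
    "median": amounts.median(),  # robust to the large grant
    "skew": amounts.skew(),      # positive for a right-skewed distribution
}
print(summary)
```

A mean far above the median together with positive skewness is the same pattern seen in the `describe()` output, which is why the median is often a better summary of a typical grant here.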