# Visualising the network of 360 Giving grants

## Preparing the data

This notebook takes you through the process of preparing grants data
to produce a network diagram showing relationships between funders.
In this notebook I will:

1. Fetch grants data from [GrantNav](http://grantnav.threesixtygiving.org/)
2. Transform it into tables of links and nodes
3. Export those tables to the right format

To create a network diagram you need two parts:

1. *Points* (aka *Nodes*) represent the organisations, both recipients and funders
2. *Links* (aka *Relations* or *Edges*) represent the relationship between points - in this case that one has funded the other.
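
As a rough sketch of how these two parts relate, the snippet below builds a tiny links table and derives the points from it. The organisation names and the `source`/`target` column names are invented for illustration and are not taken from the GrantNav data.

```python
import pandas as pd

# Hypothetical funding relationships, invented purely for illustration.
links_example = pd.DataFrame({
    "source": ["Funder A", "Funder A", "Funder B"],
    "target": ["Charity X", "Charity Y", "Charity X"],
})

# Every organisation that appears at either end of a link is a point.
all_names = pd.concat([links_example["source"], links_example["target"]])
points_example = pd.DataFrame({"name": all_names.unique()})
print(points_example)
```

Three links between two funders and two recipients give four points in total.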

The first step is to import the [pandas](https://pandas.pydata.org/) data 
analysis library. I'll be using this to fetch and transform the data.

In [1]:
import pandas as pd

### 1. Fetch grants data

First I make a variable containing a URL to a GrantNav CSV file.
This link was found by performing a search on GrantNav and then
right clicking on the "CSV" download button in the top right, 
and selecting "Copy link location" (the exact way of copying the
link may vary in different browsers).

In [2]:
grantnav_url = 'http://grantnav.threesixtygiving.org/search.csv?json_query=%7B%22aggs%22%3A+%7B%22fundingOrganization%22%3A+%7B%22terms%22%3A+%7B%22field%22%3A+%22fundingOrganization.id_and_name%22%2C+%22size%22%3A+3%7D%7D%2C+%22recipientRegionName%22%3A+%7B%22terms%22%3A+%7B%22field%22%3A+%22recipientRegionName%22%2C+%22size%22%3A+3%7D%7D%2C+%22currency%22%3A+%7B%22terms%22%3A+%7B%22field%22%3A+%22currency%22%2C+%22size%22%3A+3%7D%7D%2C+%22recipientDistrictName%22%3A+%7B%22terms%22%3A+%7B%22field%22%3A+%22recipientDistrictName%22%2C+%22size%22%3A+3%7D%7D%2C+%22recipientOrganization%22%3A+%7B%22terms%22%3A+%7B%22field%22%3A+%22recipientOrganization.id_and_name%22%2C+%22size%22%3A+3%7D%7D%7D%2C+%22query%22%3A+%7B%22bool%22%3A+%7B%22filter%22%3A+%5B%7B%22bool%22%3A+%7B%22should%22%3A+%5B%5D%7D%7D%2C+%7B%22bool%22%3A+%7B%22should%22%3A+%5B%5D%7D%7D%2C+%7B%22bool%22%3A+%7B%22should%22%3A+%5B%5D%2C+%22must%22%3A+%7B%7D%7D%7D%2C+%7B%22bool%22%3A+%7B%22should%22%3A+%7B%22range%22%3A+%7B%22amountAwarded%22%3A+%7B%7D%7D%7D%2C+%22must%22%3A+%7B%7D%7D%7D%2C+%7B%22bool%22%3A+%7B%22should%22%3A+%5B%5D%7D%7D%2C+%7B%22bool%22%3A+%7B%22should%22%3A+%5B%5D%7D%7D%2C+%7B%22bool%22%3A+%7B%22should%22%3A+%5B%5D%7D%7D%2C+%7B%22bool%22%3A+%7B%22should%22%3A+%5B%5D%7D%7D%5D%2C+%22must%22%3A+%7B%22query_string%22%3A+%7B%22default_field%22%3A+%22_all%22%2C+%22query%22%3A+%22awardDate%3A%5B2016-01-01+TO+2017-12-31%5D%22%7D%7D%7D%7D%2C+%22extra_context%22%3A+%7B%22amountAwardedFixed_facet_size%22%3A+3%2C+%22awardYear_facet_size%22%3A+3%7D%2C+%22sort%22%3A+%7B%22_score%22%3A+%7B%22order%22%3A+%22desc%22%7D%7D%7D'

Next I make a list of the columns I want to use in the data. Only a small number are relevant for this exercise.

In [3]:
columns = [
    "Identifier", "Currency", "Amount Awarded", "Award Date",
    "Recipient Org:Identifier", "Recipient Org:Name", 
    "Recipient Org:Charity Number", "Recipient Org:Company Number",
    "Funding Org:Identifier", "Funding Org:Name"
]

Then I use pandas to fetch the data. The `read_csv` function accepts an
`index_col` parameter which tells it which column to use as an index, while
passing our columns to `usecols` means only those columns will be returned.

In [4]:
grants = pd.read_csv(grantnav_url, index_col='Identifier', usecols=columns)
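
To see what `index_col` and `usecols` do without downloading anything, here is a minimal sketch that reads a small in-memory CSV. The rows and the extra `Title` column are made up for the example; only the parameter names come from the cell above.

```python
import io
import pandas as pd

# A tiny stand-in for the GrantNav download; the rows are invented.
csv_text = """Identifier,Title,Currency,Amount Awarded
grant-001,Example grant one,GBP,1000
grant-002,Example grant two,GBP,2500
"""

example = pd.read_csv(
    io.StringIO(csv_text),
    index_col="Identifier",  # becomes the row index
    usecols=["Identifier", "Currency", "Amount Awarded"],  # "Title" is skipped
)
print(example)
```

Note that the index column has to be listed in `usecols` as well, which is why `Identifier` appears in the `columns` list above.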

I've added a dummy `Grants` variable with a value of 1 for each row. This
will help later when I want to count the number of grants.

In [5]:
grants.loc[:, "Grants"] = 1

In [6]:
grants.loc[:, "Funding Org:Name"] = grants["Funding Org:Name"].str.strip()
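
The cell above has no commentary in the notebook. A plausible reading is that it removes stray whitespace from funder names, so that the same funder is not treated as several different ones later on. A small sketch with an invented name shows the effect:

```python
import pandas as pd

# The same hypothetical funder name with stray leading/trailing spaces.
names = pd.Series(["Example Trust", "Example Trust ", " Example Trust"])

distinct_before = names.nunique()
distinct_after = names.str.strip().nunique()
print(distinct_before, distinct_after)
```

Before stripping pandas sees three distinct strings, afterwards only one.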

Let's take a look at the resulting data - first see how many rows there are:

In [7]:
len(grants)

62369

Then preview the table itself:

In [8]:
grants

Identifier,Currency,Amount Awarded,Award Date,Recipient Org:Identifier,Recipient Org:Name,Recipient Org:Charity Number,Recipient Org:Company Number,Funding Org:Identifier,Funding Org:Name,Grants
360G-LankellyChase-2016-17-004,GBP,10000.0,2017-03-13T00:00:00+00:00,GB-COH-05836950,The Barrow Cadbury Trust,1115476,05836950,GB-CHC-1107583,Lankelly Chase Foundation,1
360G-LankellyChase-2016-17-008,GBP,5250.0,2016-11-14T00:00:00+00:00,GB-COH-07556168,Centre for Criminal Appeals,1144162,07556168,GB-CHC-1107583,Lankelly Chase Foundation,1
360G-LankellyChase-2016-17-011,GBP,20000.0,2016-06-08T00:00:00+00:00,GB-COH-05137036,The Centre for Social Justice,,05137036,GB-CHC-1107583,Lankelly Chase Foundation,1
360G-LankellyChase-2016-17-012,GBP,100000.0,2016-11-14T00:00:00+00:00,GB-COH-08259430,Collaborate CIC,0,08259430,GB-CHC-1107583,Lankelly Chase Foundation,1
360G-LankellyChase-2016-17-020,GBP,20000.0,2016-05-10T00:00:00+00:00,GB-COH-02959712,The Forum for the Future,1040519,02959712,GB-CHC-1107583,Lankelly Chase Foundation,1
360G-LankellyChase-2016-17-034,GBP,160000.0,2016-06-08T00:00:00+00:00,GB-COH-01792921,Local Solutions,515060,01792921,GB-CHC-1107583,Lankelly Chase Foundation,1
360G-LankellyChase-2016-17-040,GBP,65000.0,2016-10-10T00:00:00+00:00,GB-COH-09042558,Our Sorority CIC,1162882,09042558,GB-CHC-1107583,Lankelly Chase Foundation,1
360G-LankellyChase-2016-17-048,GBP,71960.0,2016-09-12T00:00:00+00:00,GB-COH-06402143,Social Finance Ltd,,06402143,GB-CHC-1107583,Lankelly Chase Foundation,1
360G-LankellyChase-2016-17-049,GBP,10000.0,2016-12-12T00:00:00+00:00,GB-COH-08364475,Social Innovation Exchange/SIX,1155570,08364475,GB-CHC-1107583,Lankelly Chase Foundation,1
360G-LankellyChase-2016-17-052,GBP,80000.0,2016-10-10T00:00:00+00:00,GB-COH-07992556,Transforming Choice CIC,,07992556,GB-CHC-1107583,Lankelly Chase Foundation,1


### 2. Transform into a new format

I might want to filter the data before plotting it, so it's useful to copy to
a new variable to keep the original `grants` variable as is. 

In this case I've filtered to only include grants in 2017.

In [9]:
to_use = grants[grants["Award Date"].str.startswith("2017")]
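
The filter above relies on the award dates being ISO-formatted strings that begin with the year. As an alternative sketch (not what the notebook does), the dates could be parsed with `pd.to_datetime` and filtered on the year. The sample dates below are copied from the format shown in the preview, and both approaches pick out the same rows:

```python
import pandas as pd

dates = pd.Series([
    "2016-11-14T00:00:00+00:00",
    "2017-03-13T00:00:00+00:00",
    "2017-12-01T00:00:00+00:00",
])

# String prefix test, as used in the notebook.
by_prefix = dates.str.startswith("2017")

# Alternative: parse the strings to timestamps and compare the year.
by_year = pd.to_datetime(dates).dt.year == 2017

print(by_prefix.tolist())
print(by_year.tolist())
```

Parsing is slower on large tables but makes it easier to filter on ranges that do not line up with calendar years.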

#### Create the links table

First I create the links table. This has one row per funding
relationship, from Funder > Recipient. I've added some columns
showing the number of grants made by each funder to the recipient
and the value of those grants.

To do this I use the `groupby` pandas function to group, and then
`sum` to pick up the amount and number of grants. I've reset the index
to make it easier to use the table later on.

In [25]:
# Select the numeric columns before summing so text columns are not aggregated
links = to_use.groupby(
    ["Recipient Org:Identifier", "Funding Org:Identifier"]
)[["Amount Awarded", "Grants"]].sum().reset_index()
links

,Recipient Org:Identifier,Funding Org:Identifier,Amount Awarded,Grants
0,360G-ArcadiaFund:ORG-University-of-Hamburg,360G-ArcadiaFund,2000000.00,1
1,360G-BarnetCouncil-ORG:1st-3rd-New-Barnet-Scou...,GB-LAE-BNE,5000.00,1
2,360G-BarnetCouncil-ORG:Barnet-Bowls-Club,GB-LAE-BNE,5000.00,1
3,360G-BarnetCouncil-ORG:Finchley-Horticultural-...,GB-LAE-BNE,4950.00,1
4,360G-BarnetCouncil-ORG:Mill-Hill-Neighbourhood...,GB-LAE-BNE,750.00,1
5,360G-BarnetCouncil-ORG:Stonegrove-Estates-Yout...,GB-LAE-BNE,2405.00,1
6,360G-BarnetCouncil-ORG:The-Hope-Of-Childs-Hill,GB-LAE-BNE,5000.00,2
7,360G-BirminghamCC-acocks_green_nhood_forum,GB-LAE-BIR,800.00,1
8,360G-BirminghamCC-bham_neighbourhood_forum,GB-LAE-BIR,610.00,2
9,360G-BirminghamCC-boldmere_neighbourhood_forum,GB-LAE-BIR,700.00,1
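
To make the role of the dummy `Grants` column concrete, here is a small sketch on invented data. Summing a column of ones within each group gives the number of grants, while summing `Amount Awarded` gives their total value. The identifiers `R1`, `F1` and so on are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    "Recipient Org:Identifier": ["R1", "R1", "R1", "R2"],
    "Funding Org:Identifier": ["F1", "F1", "F2", "F1"],
    "Amount Awarded": [100.0, 50.0, 20.0, 10.0],
    "Grants": 1,  # dummy column: one per grant
})

toy_links = (
    toy.groupby(["Recipient Org:Identifier", "Funding Org:Identifier"])
    [["Amount Awarded", "Grants"]]
    .sum()
    .reset_index()
)
print(toy_links)
```

The two grants from `F1` to `R1` collapse into a single link with a value of 150 and a grant count of 2.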


I only want to use links including recipients that have received
grants from several funders (three or more in the code below), to
limit the size of the network diagram. 

To do this I've used the `value_counts()` function to get a list
of unique recipients and how many relationships they occur in.

In [26]:
link_recipients = links["Recipient Org:Identifier"].value_counts()

Then I filter the links table to include only those links where the
recipient appears three or more times.

In [27]:
links = links[links["Recipient Org:Identifier"].isin(link_recipients[link_recipients>=3].index)]
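
The same counting-and-filtering pattern can be tried on a toy links table. Because each row of the links table is one recipient/funder pair, the number of times a recipient appears equals its number of funders. The identifiers and the `min_funders` variable name are invented for this sketch.

```python
import pandas as pd

toy_links = pd.DataFrame({
    "Recipient Org:Identifier": ["R1", "R1", "R1", "R2", "R2", "R3"],
    "Funding Org:Identifier": ["F1", "F2", "F3", "F1", "F2", "F1"],
})

min_funders = 3  # same threshold as the notebook cell above

# How many links (that is, funders) each recipient has.
counts = toy_links["Recipient Org:Identifier"].value_counts()

# Keep only links whose recipient meets the threshold.
keep = counts[counts >= min_funders].index
filtered = toy_links[toy_links["Recipient Org:Identifier"].isin(keep)]
print(filtered)
```

Lowering `min_funders` to 2 would also keep `R2`, giving a larger diagram.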

The resulting table can then be saved as a CSV file.

In [28]:
links.to_csv("links.csv", index=False)

#### Create the points table

Next I need a table with every recipient and funder who appears in 
the links table. This is done by creating two tables with the same
columns, one for recipients and one for funders, and then
concatenating them.

I've gone back to the grants table (filtered into `to_use`) to get
this data, so that I can pick up the name of each organisation. The
names can vary between grants, so I've just used the name given the
first time each organisation appears.

In [29]:
recipients = to_use.groupby("Recipient Org:Identifier").agg({
    "Recipient Org:Name": "first",
    "Grants": "sum",
    "Amount Awarded": "sum",
}).rename(columns={"Recipient Org:Name": "name"})
funders = to_use.groupby("Funding Org:Identifier").agg({
    "Funding Org:Name": "first",
    "Grants": "sum",
    "Amount Awarded": "sum",
}).rename(columns={"Funding Org:Name": "name"})
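
Here is a small sketch of how `agg` applies a different aggregation to each column, using made-up rows in which one recipient appears under two spellings of its name:

```python
import pandas as pd

toy = pd.DataFrame({
    "Recipient Org:Identifier": ["R1", "R1", "R2"],
    "Recipient Org:Name": ["Example Charity", "Example Charity Ltd", "Another Org"],
    "Grants": [1, 1, 1],
    "Amount Awarded": [100.0, 50.0, 10.0],
})

toy_recipients = toy.groupby("Recipient Org:Identifier").agg({
    "Recipient Org:Name": "first",  # keep the first name seen
    "Grants": "sum",                # number of grants
    "Amount Awarded": "sum",        # total value
}).rename(columns={"Recipient Org:Name": "name"})
print(toy_recipients)
```

`R1` ends up with the name from its first row, two grants and a total of 150.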

Next I filter it so it only includes the organisations that are shown in 
the links table.

In [30]:
recipients = recipients[recipients.index.isin(links["Recipient Org:Identifier"].unique())]
funders = funders[funders.index.isin(links["Funding Org:Identifier"].unique())]

The recipient and funder tables are then concatenated to produce one table,
and saved as a CSV file.

In [31]:
points = pd.concat({"recipients": recipients, "funders": funders}, names=["group", "identifier"])
points.to_csv("points.csv")
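
Passing a dictionary to `pd.concat` uses its keys as an extra index level, and `names` labels the levels. A minimal sketch with two invented one-row tables shows the shape of the result:

```python
import pandas as pd

toy_recipients = pd.DataFrame({"name": ["Example Charity"]}, index=["R1"])
toy_funders = pd.DataFrame({"name": ["Example Funder"]}, index=["F1"])

toy_points = pd.concat(
    {"recipients": toy_recipients, "funders": toy_funders},
    names=["group", "identifier"],
)

# reset_index turns both index levels back into ordinary columns.
flat = toy_points.reset_index()
print(flat)
```

The `group` level records which table each row came from, which is what later becomes the `Type` column for Onodo.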

### Create output for Onodo

The [Onodo](https://onodo.org/) tool for creating network diagrams needs the data in a particular format, in a single spreadsheet.

In [79]:
points_onodo = points.reset_index()[["identifier", "group", "name"]].rename(columns={
    "identifier": "Name",
    "group": "Type",
    "name": "Description"
})
points_onodo.loc[:, "Visible"] = 1
points_onodo

,Name,Type,Description,Visible
0,360G-blf,funders,The Big Lottery Fund,1
1,GB-COH-IP00525R,funders,Co-operative Group,1
2,GB-CHC-230260,funders,Garfield Weston Foundation,1
3,GB-COH-RC000766,funders,Sport England,1
4,GB-CHC-1080418,funders,Quartet Community Foundation,1
5,GB-SC-SC002970,funders,The Robertson Trust,1
6,GB-CHC-1045304,funders,Heart Of England Community Foundation,1
7,GB-CHC-1105580,funders,The Tudor Trust,1
8,GB-CHC-327114,funders,Lloyds Bank Foundation for England and Wales,1
9,GB-COH-02273708,funders,Community Foundation serving Tyne & Wear and N...,1


In [80]:
links_onodo = links.rename(columns={"Funding Org:Identifier": "Source", "Recipient Org:Identifier": "Target"})
links_onodo.loc[:, "Type"] = "funded"
links_onodo.loc[:, "Directed"] = 1
links_onodo = links_onodo[["Source", "Type", "Target", "Directed"]]
links_onodo

,Source,Type,Target,Directed
7985,GB-COH-04831118,funded,360G-trafford-theatre_of_the_senses,1
7986,GB-LAE-TRF,funded,360G-trafford-theatre_of_the_senses,1
8067,360G-blf,funded,GB-CHC-1000011,1
8068,GB-CHC-226446,funded,GB-CHC-1000011,1
8075,360G-blf,funded,GB-CHC-1000340,1
8076,GB-CHC-230260,funded,GB-CHC-1000340,1
8077,360G-blf,funded,GB-CHC-1000351,1
8078,GB-COH-IP00525R,funded,GB-CHC-1000351,1
8092,360G-blf,funded,GB-CHC-1000714,1
8093,GB-CHC-230260,funded,GB-CHC-1000714,1


In [81]:
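# Use the writer as a context manager so the file is saved on exit
# (ExcelWriter.save() was removed in pandas 2.0)
with pd.ExcelWriter('360_onodo.xlsx', engine='xlsxwriter') as writer:
    points_onodo.to_excel(writer, sheet_name='Nodes', index=False)
    links_onodo.to_excel(writer, sheet_name='Relations', index=False)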

### Create Chord output

In [152]:
l = links.join(grants.groupby("Funding Org:Identifier").first()["Funding Org:Name"], on="Funding Org:Identifier")

In [153]:
funders = l["Funding Org:Name"].unique()

In [154]:
funder_rels = {}
for f in funders:
    # Recipients funded by this funder
    funded = l.loc[l["Funding Org:Name"] == f, "Recipient Org:Identifier"].unique()
    # Count the other funders linked to those same recipients
    funder_rels[f] = pd.DataFrame(l.loc[
        l["Recipient Org:Identifier"].isin(funded) & (l["Funding Org:Name"] != f),
        "Funding Org:Name"
    ].value_counts().rename("grants"))

In [155]:
funder_rels = pd.concat(funder_rels, names=["Funder from", "Funder to"])

In [157]:
funder_rels.to_csv("chord.csv")
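
The notebook gives no commentary for this section. Reading the code, the chord table appears to count, for each funder, how many of its recipient links are shared with each other funder. The sketch below repeats the same logic on a toy table with invented funders and recipients, so the shape of the output is easier to see.

```python
import pandas as pd

# One row per funder/recipient link, as in the links table (invented data).
toy = pd.DataFrame({
    "Recipient Org:Identifier": ["R1", "R1", "R2", "R2", "R2"],
    "Funding Org:Name": ["Funder A", "Funder B", "Funder A", "Funder B", "Funder C"],
})

rels = {}
for funder in toy["Funding Org:Name"].unique():
    # Recipients this funder has supported.
    shared = toy.loc[toy["Funding Org:Name"] == funder, "Recipient Org:Identifier"]
    # Other funders linked to those same recipients, counted once per link.
    others = toy.loc[
        toy["Recipient Org:Identifier"].isin(shared)
        & (toy["Funding Org:Name"] != funder),
        "Funding Org:Name",
    ]
    rels[funder] = others.value_counts().rename("grants").to_frame()

toy_chord = pd.concat(rels, names=["Funder from", "Funder to"])
pairs = toy_chord["grants"].to_dict()
print(toy_chord)
```

Funder A and Funder B share two recipients, so each appears with a count of 2 against the other, while Funder C shares only `R2` with each of them. The table is symmetric, which suits a chord diagram where each pair of funders is joined by one ribbon.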