JohnHBrock edited this page Sep 9, 2013 · 23 revisions

Gathering and merging structured data about nonprofits

GivingGraph lets you aggregate and merge data on nonprofit organizations from structured sources such as GuideStar and CharityNavigator (given the appropriate API permissions).

This step provides information such as each nonprofit's name, financials, and categorization code. It gives us our canonical list of the nonprofits that exist, which is the necessary first step to linking them to companies and to each other.
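As a rough illustration of the merge step, the sketch below combines records from two sources into one record per nonprofit. It is not GivingGraph's actual code: the `merge_records` function, the field names, the sample records, and the choice of an `ein` field as the join key are all assumptions made for the example.

```python
from typing import Dict, Iterable, List


def merge_records(sources: Iterable[List[dict]], key: str = "ein") -> Dict[str, dict]:
    """Merge per-source records into one record per nonprofit.

    `key` is a hypothetical shared identifier; records that share it are
    treated as describing the same organization.
    """
    merged: Dict[str, dict] = {}
    for records in sources:
        for record in records:
            target = merged.setdefault(record[key], {})
            for field, value in record.items():
                # Keep the first non-empty value seen for each field, so
                # earlier sources take precedence over later ones.
                if value not in (None, "") and field not in target:
                    target[field] = value
    return merged


# Made-up sample data standing in for two structured sources.
source_a = [{"ein": "12-3456789", "name": "Example Food Bank", "revenue": 1200000}]
source_b = [{"ein": "12-3456789", "name": "Example Food Bank Inc.", "category": "K31"}]

canonical = merge_records([source_a, source_b])
print(canonical)
```

Here the earlier source wins on conflicting fields (the name), while fields that only one source provides (revenue, category) are carried over unchanged.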

Crawling websites and mining unstructured data about nonprofits and companies

GivingGraph collects information connecting nonprofits and companies from the following sources:

  • News stories: GivingGraph uses Yahoo News to gather news stories that mention nonprofits, companies, and causes. The results are filtered to reduce false matches.

  • Webpages: The data from GuideStar and CharityNavigator often include each nonprofit's web address. The tool can crawl these pages and extract company names by matching the page text against a combination of company name lists.
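The name-matching idea used for both sources can be sketched as follows. This is a minimal example, assuming the page or article text has already been fetched; `find_company_mentions` and the sample names are hypothetical, and the word-boundary check is only one possible way to cut down on false matches.

```python
import re
from typing import Iterable, Set


def find_company_mentions(text: str, company_names: Iterable[str]) -> Set[str]:
    """Return the known company names that appear in `text`."""
    found = set()
    for name in company_names:
        # Require a non-word character (or the text edge) on both sides,
        # so "Target" does not match inside "targeted".
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(name)
    return found


page_text = "Our sponsors include Acme Corp and Globex. Thanks to targeted giving campaigns!"
known_companies = ["Acme Corp", "Globex", "Target"]

mentions = find_company_mentions(page_text, known_companies)
print(sorted(mentions))
```

A plain substring search would also report "Target" for this text, which is the kind of false match that filtering has to remove.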

This step provides information such as the causes and nonprofits a company supports.

Pulling and analyzing social data

For each nonprofit, GivingGraph attempts to identify its Twitter account(s) and analyze its followers and activity to distill information such as:

  • number of followers, retweeters, and favorites
  • most frequent hashtags
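Counting hashtags, for instance, needs little more than a regular expression and a counter. The sketch below assumes the tweet texts have already been downloaded as plain strings; `top_hashtags` and the sample tweets are made up for illustration.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def top_hashtags(tweets, n=3):
    """Return the `n` most frequent hashtags, ignoring case."""
    counts = Counter(
        tag.lower() for tweet in tweets for tag in HASHTAG.findall(tweet)
    )
    return counts.most_common(n)


sample_tweets = [
    "Join us for #GivingTuesday!",
    "Thanks to our volunteers #givingtuesday #hunger",
    "New report on #hunger and #health",
]

hashtag_counts = dict(top_hashtags(sample_tweets, n=3))
print(hashtag_counts)
```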

Building and analyzing the giving graph

With these sources of information, we can then build a weighted graph connecting nonprofits, causes, companies, and people. Community detection algorithms can then be used to detect related nonprofits and recommend partnerships.
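To make the graph idea concrete, here is a small standard-library sketch. It builds an undirected weighted graph from (node, node, weight) triples and then groups nodes that are linked by sufficiently heavy edges. The grouping step is plain connected components over strong edges, a much cruder stand-in for a real community detection algorithm; the function names, the weights, and the threshold are all assumptions for the example.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted graph; repeated edges add up."""
    graph = defaultdict(lambda: defaultdict(float))
    for a, b, weight in edges:
        graph[a][b] += weight
        graph[b][a] += weight
    return graph


def strong_components(graph, min_weight):
    """Group nodes connected by edges of at least `min_weight`."""
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            # Follow only edges that are heavy enough.
            stack.extend(
                other for other, weight in graph[node].items()
                if weight >= min_weight and other not in seen
            )
        groups.append(group)
    return groups


# Made-up edges between nonprofits, causes, and a company.
edges = [
    ("Food Bank A", "hunger", 3),
    ("Food Bank B", "hunger", 2),
    ("Food Bank A", "Acme Corp", 1),
    ("Clinic C", "health", 2),
    ("Acme Corp", "health", 0.5),
]

graph = build_graph(edges)
groups = strong_components(graph, min_weight=2)
print(groups)
```

In this toy data the two food banks end up in the same group through their shared cause, which is the kind of relationship a partnership recommendation could start from.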

Tweet Similarity

We calculate similarity between nonprofits based on the text of tweets and the text of mission statements. These similarity calculations are performed in givinggraph/analysis/similarity.py using the gensim library.
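The underlying idea can be shown without gensim. The sketch below is not the code in similarity.py: it uses only the standard library, represents each text as raw word counts, and compares two texts by cosine similarity. The function names and sample texts are assumptions for the example.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Turn a text into a bag-of-words vector of lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Made-up mission statements.
food_bank_a = term_counts("We fight hunger by distributing food to families in need")
food_bank_b = term_counts("Distributing food to end hunger for local families")
clinic = term_counts("Free medical care and health screenings for uninsured adults")

close = cosine_similarity(food_bank_a, food_bank_b)
far = cosine_similarity(food_bank_a, clinic)
print(close, far)
```

Raw counts treat every word as equally informative; weighting schemes such as TF-IDF, which gensim provides, reduce the influence of very common words.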
