JohnHBrock edited this page Sep 9, 2013 · 9 revisions

GivingGraph collects data from several sources and stores them in a MySQL DB.

  1. GuideStar: (Requires API credentials) Given an EIN, we query GuideStar for NTEE code, mission statement, annual revenue, and year founded. Results saved to the nonprofits table.
  2. Charity News: Given a nonprofit name, we search the Yahoo News API for related news articles. (See searcher.py.) Results saved to the news_articles table.
  3. Company Mentions: For each news article collected above, search for mentions of companies. (See parser.py.) Results saved to the news_articles_companies_rel table.
  4. Twitter Handle: Yahoo is queried for the Twitter handle of this nonprofit. Result saved to the nonprofits table.
  5. Tweets: The most recent tweets from this nonprofit are collected and stored in nonprofits_tweets.
  6. Followers: The list of Twitter users who follow this nonprofit is collected and stored in nonprofits_followers.
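
As an illustration of step 3, here is a minimal sketch of how company mentions might be detected in an article. It is not the code in parser.py: the function name `find_company_mentions`, the plain list of company names, and the case-sensitive whole-word matching are all assumptions made for this example.

```python
import re


def find_company_mentions(article_text, company_names):
    """Return the names from company_names that appear in article_text.

    Hypothetical helper for illustration only; parser.py may work differently.
    """
    mentioned = []
    for name in company_names:
        # Lookarounds act as word boundaries, so "Globex" does not match
        # inside "Globexion". They also work for names ending in punctuation.
        pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
        if re.search(pattern, article_text):
            mentioned.append(name)
    return mentioned


article = "The food bank received a grant from Acme Corp and volunteers from Globex."
print(find_company_mentions(article, ["Acme Corp", "Globex", "Initech"]))
```

Each (article, company) pair found this way would correspond to one row in a relation table such as news_articles_companies_rel.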

After these data are collected, a similarity score is computed for each pair of nonprofits based on each data source:

  • tweet similarity is stored in nonprofits_similarity_by_tweets.
  • description similarity is stored in nonprofits_similarity_by_description.
  • follower similarity is stored in nonprofits_similarity_by_tweet_ids.
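
This page does not say which similarity measures are used. As a rough sketch, one common choice is Jaccard similarity for follower lists and cosine similarity over word counts for tweets and descriptions. Both functions below are illustrative assumptions, not the project's actual implementation.

```python
import math
import re
from collections import Counter


def jaccard_similarity(followers_a, followers_b):
    """Share of followers two nonprofits have in common (assumed metric)."""
    a, b = set(followers_a), set(followers_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words counts of two texts (assumed metric)."""
    words_a = Counter(re.findall(r"[a-z0-9']+", text_a.lower()))
    words_b = Counter(re.findall(r"[a-z0-9']+", text_b.lower()))
    # Counter returns 0 for words missing from words_b.
    dot = sum(count * words_b[word] for word, count in words_a.items())
    norm_a = math.sqrt(sum(c * c for c in words_a.values()))
    norm_b = math.sqrt(sum(c * c for c in words_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


print(jaccard_similarity({1, 2, 3}, {2, 3, 4}))
print(cosine_similarity("We feed hungry families", "Feeding families in need"))
```

Both scores fall between 0 and 1, which makes it easy to store them in the same kind of pairwise table regardless of the data source.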

Once these similarity scores have been computed, nonprofits are clustered into communities, with the results stored in nonprofits_communities_by_description, nonprofits_communities_by_tweets, and nonprofits_communities_by_tweet_words.
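
The clustering algorithm is not described on this page. The sketch below shows one simple possibility, assuming that two nonprofits belong to the same community whenever a chain of sufficiently similar pairs connects them (connected components found with union-find). The function name `cluster_by_threshold` and the threshold value are hypothetical, and the project may use a different community-detection method.

```python
def cluster_by_threshold(similarities, threshold):
    """Group ids into communities linked by pairs scoring >= threshold.

    similarities maps (id_a, id_b) -> score. Hypothetical helper for illustration.
    """
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for (a, b), score in similarities.items():
        root_a, root_b = find(a), find(b)
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b

    communities = {}
    for node in list(parent):
        communities.setdefault(find(node), []).append(node)
    return sorted(sorted(members) for members in communities.values())


scores = {(1, 2): 0.9, (2, 3): 0.8, (3, 4): 0.1, (4, 5): 0.7}
print(cluster_by_threshold(scores, threshold=0.5))
```

With these made-up scores, nonprofits 1, 2, and 3 form one community and 4 and 5 form another, because the only link between the two groups (3, 4) falls below the threshold.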

The results of our analysis are exposed through a REST API.
