This repository has been archived by the owner on Mar 11, 2020. It is now read-only.



Cure Alzheimer's Fund Data Collection (CAFDC)

README (technical)


CAFDC collects and distributes data from Google Scholar and NIH RePORTER for the Cure Alzheimer's Fund (CAF). It is written in Python 2, distributed with Django, and hosted on [...]. The data is downloadable as a CSV file.

Google Scholar

CAFDC uses [...] to scrape Google Scholar for entries containing the exact phrase "Cure Alzheimer's Fund" (CAF later removes false positives manually). Entries are then converted to Django objects and stored in a local MySQL database.

Every so often, Google Scholar will present a CAPTCHA. When this or any other error occurs, the current progress is saved, and a cron job later attempts to resume the scrape. As a result, the full process can take several days, but it only needs to run to completion once.
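The save-and-resume behaviour described above can be sketched roughly as follows. This is a minimal illustration in Python 3, not the project's actual code: `CaptchaError`, `fetch_page`, `store`, the JSON checkpoint file, and the `page_size` parameter are all hypothetical names introduced for the example.

```python
import json
import os


class CaptchaError(Exception):
    """Hypothetical error raised when the scraper hits a CAPTCHA."""


def load_checkpoint(path):
    """Return the saved result offset, or 0 if no checkpoint exists yet."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_offset"]


def save_checkpoint(path, next_offset):
    """Write to a temp file, then rename, so a crash cannot leave a half-written file."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"next_offset": next_offset}, f)
    os.replace(tmp_path, path)


def run_scrape(fetch_page, store, checkpoint_path, page_size=10):
    """Scrape from the last checkpoint until finished or until the first error.

    fetch_page(offset, page_size) and store(entries) are assumed callables.
    Returns True if the scrape finished, False if it should be retried later.
    """
    offset = load_checkpoint(checkpoint_path)
    while True:
        try:
            entries = fetch_page(offset, page_size)
        except Exception:
            # CAPTCHA or any other error: record where we stopped and give up for now.
            save_checkpoint(checkpoint_path, offset)
            return False
        if not entries:
            # An empty page is taken to mean there are no more results.
            save_checkpoint(checkpoint_path, offset)
            return True
        store(entries)
        offset += len(entries)
        save_checkpoint(checkpoint_path, offset)
```

A cron job could then call `run_scrape` on a schedule and stop rescheduling once it returns `True`.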

The format of the Google Scholar CSV is:

  1. URL
  2. title
  3. number of citations
  4. journal edition info
  5. journal name
  6. journal volume
  7. journal issue
  8. year
  9. authors
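As an illustration of the column order listed above, a row could be produced along these lines. This is only a sketch in Python 3: the snake_case field names, the `scholar_rows_to_csv` helper, and the absence of a header row are assumptions, not details confirmed by the project.

```python
import csv
import io

# Column order follows the list above; the field names themselves are assumed.
SCHOLAR_COLUMNS = [
    "url",
    "title",
    "num_citations",
    "journal_edition_info",
    "journal_name",
    "journal_volume",
    "journal_issue",
    "year",
    "authors",
]


def scholar_rows_to_csv(entries):
    """Serialize a list of dicts to CSV text, one row per paper, no header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for entry in entries:
        # Missing fields are written as empty cells so the column count stays fixed.
        writer.writerow([entry.get(col, "") for col in SCHOLAR_COLUMNS])
    return buf.getvalue()
```

Using the `csv` module rather than joining on commas means that titles or author lists containing commas are quoted correctly.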


NIH RePORTER

Once all CAF-related papers have been collected from Google Scholar, CAFDC uses NIH ExPORTER to download the NIH RePORTER database as CSV files, one per year. See [...] for the locations of these files.

The format of the NIH CSV is:

  1. URL
  2. title
  3. funded researcher
  4. funding amount
  5. year
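A consumer of the NIH CSV might read it back roughly like this. The sketch is in Python 3 and rests on several assumptions: the file has no header row, the funding amount is a whole number of dollars, and the `NihGrant` record and `parse_nih_csv` helper are hypothetical names.

```python
import csv
import io
from collections import namedtuple

# Field order follows the list above; the field names are assumed.
NihGrant = namedtuple("NihGrant", ["url", "title", "researcher", "amount", "year"])


def parse_nih_csv(text):
    """Parse headerless CSV text into a list of NihGrant records."""
    grants = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        url, title, researcher, amount, year = row
        # Amount and year are converted to int; this assumes whole-dollar amounts.
        grants.append(NihGrant(url, title, researcher, int(amount), int(year)))
    return grants
```

A row with the wrong number of columns raises `ValueError` here, which seems preferable to silently shifting fields.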


  • Matthew Pfeiffer
  • Hunter Gatewood
  • Srinivas Kaza

