### What to do
Scrape the websites in order to extract the following information:
- Name 
- Logo
- Location 
- Themes (e.g. children, homelessness, medicine...)
- Description
- URL of their website
- Year of foundation (sometimes only implicit: it can be computed from the number of years of activity)
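
The year of foundation can often be recovered from the description text. Below is a minimal sketch, assuming the text either states the year explicitly ("Founded in 2010") or gives a number of years of activity. The helper `infer_founding_year`, its regular expressions, and the `reference_year` argument are illustrative assumptions, not part of any of the libraries listed in this notebook; `reference_year` should be the year the text was published, which may not be known exactly.

```python
import re


def infer_founding_year(text, reference_year):
    """Return a founding year guessed from free text, or None (hypothetical helper)."""
    # Explicit case: "founded in 2010", "established 1998", "since 1975"
    explicit = re.search(
        r"\b(?:founded|established|since)\s+(?:in\s+)?(\d{4})\b",
        text,
        re.IGNORECASE,
    )
    if explicit:
        return int(explicit.group(1))

    # Implicit case: "15 years of activity" -> reference_year - 15
    implicit = re.search(
        r"\b(\d{1,3})\s+years?\s+of\s+(?:activity|service|experience)\b",
        text,
        re.IGNORECASE,
    )
    if implicit:
        return reference_year - int(implicit.group(1))

    return None


explicit_year = infer_founding_year("Founded in 2010 by Chef José Andrés.", 2024)
implicit_year = infer_founding_year("Over 15 years of activity in rural clinics.", 2024)
unknown_year = infer_founding_year("We provide fresh meals.", 2024)
print(explicit_year, implicit_year, unknown_year)
```

The implicit case is only as accurate as `reference_year`, so results derived that way are best flagged as estimates.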

### What to use
- Requests
- BeautifulSoup
- Scrapy

### Websites to scrape
urls = [
    "https://www.charitynavigator.org",
    "https://www.globalgiving.org",
    "https://www.guidestar.org",
]

### Charity Navigator API
https://charity-navigator.stellate.io

In [13]:
import os
from dotenv import load_dotenv
import requests
import pandas as pd

# Load environment variables from .env file
load_dotenv()

# Access the variables
charity_navigator_key = os.getenv('CHARITY_NAVIGATOR')

CHARITY_NAVIGATOR_ENDPOINT = 'https://data.charitynavigator.org/'

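# Fetch one page of search results from Charity Navigator.
# Note: `count` is the offset of the first result (sent as `from`), not a page size.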
def fetch_charity_data(count):
    QUERY = """
    query {
        publicSearchFaceted(term: "", from: %d) {
            size
            from
            term
            result_count
            results {
                ein
                name
                mission
                organization_url
                charity_navigator_url
                encompass_score
                encompass_star_rating
                encompass_publication_date
                cause
                street
                street2
                city
                state
                zip
                country
                highest_level_advisory
                encompass_rating_id
            }
        }
    }
    """ % count

    headers = {
        "Stellate-Api-Token": charity_navigator_key,
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    }
    response = requests.post(
        CHARITY_NAVIGATOR_ENDPOINT,
        headers=headers,
        json={"query": QUERY}
    )
    # Raise an error if the request fails
    response.raise_for_status()
    
    # Return the JSON response
    return response.json()

results = []

for i in range(0, 10001, 10):
    data = fetch_charity_data(i)
    print(data)
    results.append(data)

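# Flatten the per-page responses into one row per charity, then save to CSV.
# (Passing `results` directly would produce a single 'data' column of nested dicts.)
rows = [
    charity
    for page in results
    for charity in page['data']['publicSearchFaceted']['results']
]
df = pd.DataFrame(rows)
df.to_csv('charity_navigator_data.csv', index=False)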

{'data': {'publicSearchFaceted': {'size': 10, 'from': 0, 'term': '', 'result_count': 10000, 'results': [{'ein': '273521132', 'name': 'World Central Kitchen Incorporated', 'mission': 'Founded in 2010 by Chef José Andrés, World Central Kitchen (WCK) is a nonprofit organization that is first to the frontlines providing fresh meals in response to crises, while working to build resilient food systems with locally led solutions. Applying our model of quick action, leveraging local resources, and adapting in real time, WCK has served more than 250 million nourishing meals around the world. ', 'organization_url': 'WWW.WCK.ORG', 'charity_navigator_url': 'https://charitynavigator.org/ein/273521132', 'encompass_score': '100', 'encompass_star_rating': '4', 'encompass_publication_date': '2024-11-12T00:03:04.302Z', 'cause': 'Food aid', 'street': '200 Massachusetts Ave NW 7TH Floor', 'street2': '7th Floor', 'city': 'Washington', 'state': 'DC', 'zip': '20001-0000', 'country': 'USA', 'highest_level_adv
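
The response fields can be mapped onto the target information from "What to do". The sketch below is one possible mapping, not a definitive one: `to_record` and `normalize_url` are hypothetical helpers, the choice of `https://` for scheme-less URLs is an assumption, and the sample values are copied from the output above with the mission shortened. The query shown returns no logo or founding-year field, so those two items are left out here.

```python
from urllib.parse import urlsplit


def normalize_url(raw):
    """Add a scheme if missing and lowercase the host (assumes https)."""
    if not raw:
        return None
    url = raw.strip()
    if not url.lower().startswith(("http://", "https://")):
        url = "https://" + url
    parts = urlsplit(url)
    return parts._replace(netloc=parts.netloc.lower()).geturl()


def to_record(charity):
    """Map one API result onto the fields this project wants to collect."""
    location_parts = [charity.get("city"), charity.get("state"), charity.get("country")]
    return {
        "name": charity.get("name"),
        "location": ", ".join(part for part in location_parts if part),
        "themes": charity.get("cause"),
        "description": (charity.get("mission") or "").strip(),
        "url": normalize_url(charity.get("organization_url")),
    }


# Sample values taken from the first result printed above (mission shortened)
sample = {
    "name": "World Central Kitchen Incorporated",
    "mission": "Founded in 2010 by Chef José Andrés, World Central Kitchen (WCK) is a nonprofit organization. ",
    "organization_url": "WWW.WCK.ORG",
    "cause": "Food aid",
    "city": "Washington",
    "state": "DC",
    "country": "USA",
}

record = to_record(sample)
print(record)
```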

In [15]:
print(len(results))

1001
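
The count is 1001 rather than 1000 because `range(0, 10001, 10)` includes offset 10000, which equals the reported `result_count`. One way to avoid hard-coding the range is to stop when a page comes back empty or the offset reaches `result_count`. The sketch below assumes the response shape printed above; `fetch_all` and `fake_fetch` are hypothetical names, and `fake_fetch` is only a stand-in so the example runs without an API key. In the notebook, `fetch_charity_data` would be passed instead.

```python
def fetch_all(fetch_page, page_size=10):
    """Collect charity records page by page until no more are reported."""
    records = []
    offset = 0
    while True:
        payload = fetch_page(offset)["data"]["publicSearchFaceted"]
        page = payload["results"]
        if not page:
            break  # empty page: nothing left to fetch
        records.extend(page)
        offset += page_size
        if offset >= int(payload["result_count"]):
            break  # reached the total reported by the API
    return records


def fake_fetch(offset, total=25, page_size=10):
    """Stand-in for fetch_charity_data that mimics the response shape."""
    results = [
        {"ein": str(i)} for i in range(offset, min(offset + page_size, total))
    ]
    return {
        "data": {
            "publicSearchFaceted": {
                "size": page_size,
                "from": offset,
                "result_count": total,
                "results": results,
            }
        }
    }


records = fetch_all(fake_fetch)
print(len(records))
```

Because `fetch_all` returns one flat list of charities, it can be passed straight to `pd.DataFrame`.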
