# Award Recipients

### Objective: Fetch all award recipients from USASpending.
#### Filter by amount greater than X.
- Currently X = $0.00
- This dataset includes contractors and other entities that have been awarded funds from the federal government.
- The dataset is ~176K rows as of 11/20/2024
- Note: Of the recipient levels (P, C, R), child (C) records are filtered out to prevent duplicates; only parent (P) and standalone (R) records are saved.
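
The child-record filter described in the note can be sketched as follows. The sample records are made up for illustration; only the `recipient_level` and `amount` field names come from the dataset.

```python
# Hypothetical sample records; real rows come from the USASpending recipient API.
sample_records = [
    {"name": "EXAMPLE PARENT CO", "recipient_level": "P", "amount": 300.0},
    {"name": "EXAMPLE PARENT CO", "recipient_level": "C", "amount": 300.0},
    {"name": "EXAMPLE STANDALONE LLC", "recipient_level": "R", "amount": 50.0},
]


def drop_child_records(records):
    """Keep parent (P) and standalone (R) records; drop child (C) records."""
    return [r for r in records if r.get("recipient_level") != "C"]


kept = drop_child_records(sample_records)
total = sum(r["amount"] for r in kept)
print(len(kept), total)  # 2 350.0
```

Without the filter, the child row would be counted alongside its parent and the total in this sample would be 650.0 instead of 350.0.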
#### Dataset Details
- Output .csv file for portability
- Columns (the identifiers can be passed to other APIs that require an id to return more detail):
    - id, UUID of the recipient at the given recipient_level
    - duns, DUNS id (Data Universal Numbering System); blank/null if not provided
    - uei, Recipient's UEI (Unique Entity Identifier); null if not provided
        - Note: can be used to validate data in the source system via website search
    - name, Name of the recipient; null if not provided
    - recipient_level, Enum letter representing the level. P=Parent, C=Child, R=Neither P nor C
    - amount, Sum of the monetary value of all transactions associated with the recipient over the trailing 12 months
    - date_fetched, Date the records were retrieved from the API; added to each record by the fetch function
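
As a rough illustration of how the output file might be consumed, the sketch below reads rows in the column layout listed above using only the standard library. The two sample rows, their ids, and the helper name `load_recipients` are hypothetical; a real run would open the generated .csv file instead of the in-memory sample.

```python
import csv
import io

# Hypothetical two-row sample in the column layout listed above.
sample_csv = io.StringIO(
    "id,duns,uei,name,recipient_level,amount,date_fetched\n"
    "11111111-1111-1111-1111-111111111111,,ABC123DEF456,EXAMPLE PARENT CO,P,300.0,2024-11-20\n"
    "22222222-2222-2222-2222-222222222222,123456789,,EXAMPLE STANDALONE LLC,R,50.0,2024-11-20\n"
)


def load_recipients(handle):
    """Read rows, turning blank identifiers into None and amount into a float."""
    rows = []
    for row in csv.DictReader(handle):
        for key in ("duns", "uei", "name"):
            row[key] = row[key] or None  # blank cells were null in the API response
        row["amount"] = float(row["amount"])
        rows.append(row)
    return rows


recipients = load_recipients(sample_csv)

# Index by UEI so records can be cross-checked against the source system.
uei_lookup = {r["uei"]: r["name"] for r in recipients if r["uei"]}
print(uei_lookup)  # {'ABC123DEF456': 'EXAMPLE PARENT CO'}
```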

## Technical Details

### Remote APIs
- Data is fetched from USA Spending APIs
- Documentation can be found at 
    - ```https://api.usaspending.gov/```
    - Technical documentation resources
        - ```https://api.usaspending.gov/docs/endpoints```
        - ```https://github.com/fedspendingtransparency/usaspending-api```
        - Documentation for the API in use
            - ```https://github.com/fedspendingtransparency/usaspending-api/blob/master/usaspending_api/api_contracts/contracts/v2/recipient.md```
- The data is public; no API key is required.
    - The code still sends a key; for now it can be ```DEMO_KEY```.
    - If the API changes and requires a key, supply a valid key value.
- API in use
    - ```https://api.usaspending.gov/api/v2/recipient/```
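
A minimal sketch of how a single-page request to this endpoint could be assembled, mirroring the headers and pagination fields used in this notebook. The helper name `build_recipient_request` is hypothetical, and the block only builds the arguments; it does not contact the API.

```python
API_URL = "https://api.usaspending.gov/api/v2/recipient/"


def build_recipient_request(page, limit, api_key="DEMO_KEY"):
    """Return keyword arguments suitable for requests.post(**kwargs)."""
    return {
        "url": API_URL,
        # Pagination fields are sent in the JSON body of the POST
        "json": {"page": page, "limit": limit},
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    }


request_kwargs = build_recipient_request(page=1, limit=100)
print(request_kwargs["json"])  # {'page': 1, 'limit': 100}
```

To send the request, pass the result to `requests.post(**request_kwargs)`; the notebook's fetch function then reads the `results` list from the JSON response.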

### Functions & Parameter Notes
- The ```fetch_paginated_data``` function appends data to a CSV file.
- The function uses pagination and a retry loop to deal with rate limits.
- Default values
    - ```fetch_paginated_data(api_url='NA', params={}, api_key='DEMO_KEY', limit=100, max_pages=10, output_file="federal_government_awardees.csv", page_amount_min=0.0, retries=5, delay=5)```
- Example execution
    - ```fetch_paginated_data(api_url, params, api_key, 1000, 1000)```
- Notes
    - Logic
        - Processing stops when the summed amount for a page of results is at or below the minimum (```page_amount_min```, default $0.00).
        - The last page written may include a few records with a $0.00 amount, depending on page size.
        - e.g. When pulling 1000 records/page, the last x records can be $0.00.
    - Execution Time
        - A timer tracks how long execution takes.
        - In testing this took about 5 minutes; your run time may vary.
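
The stop rule in the notes above can be simulated without calling the API. This is a sketch under the assumption that results arrive largest amount first, as the sample output in this notebook suggests; the function name `simulate_stop_rule` and the page contents are hypothetical.

```python
def simulate_stop_rule(pages, page_amount_min=0.0):
    """Return the amounts that would be written before the stop rule fires.

    `pages` is a list of pages, each a list of record amounts.
    """
    written = []
    for amounts in pages:
        if sum(amounts) <= page_amount_min:
            break  # a page at or below the minimum ends pagination
        written.extend(amounts)  # the whole page is written, zeros included
    return written


# Hypothetical pages of 3 records each, largest amounts first.
pages = [[900.0, 400.0, 200.0], [50.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
written = simulate_stop_rule(pages)
print(written)  # [900.0, 400.0, 200.0, 50.0, 0.0, 0.0]
```

The second page is kept because its sum is above the minimum, which is why a few $0.00 records can appear at the end of the file.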

# Imports

In [1]:
import csv
import os
import pandas as pd
import time
from datetime import datetime

import requests
from requests.exceptions import RequestException, Timeout

# Functions

In [2]:
def fetch_paginated_data(api_url = 'NA', params = {}, api_key = 'DEMO_KEY', limit=100, max_pages=10, output_file="federal_government_awardees.csv", page_amount_min = 0.0, retries=5, delay=5):
    """
    Fetch data from the USASpending API with pagination and write to a CSV file incrementally.
    Includes API key for authentication in the headers.
    
    :param api_url: The API endpoint URL (e.g., 'https://api.usaspending.gov/api/v2/recipient/')
    :param params: A dictionary containing filters or additional query parameters.
    :param api_key: Your USASpending API key to use for authentication.
    :param limit: Number of records to fetch per page (default is 100).
    :param max_pages: Maximum number of pages to fetch (default is 10).
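    :param output_file: Name of the CSV file to save the results (default is 'federal_government_awardees.csv').
    :param page_amount_min: Minimum summed 'amount' a page must exceed to be written; pagination stops at the first page at or below it (default is 0.0).
    :param retries: Maximum number of attempts per page when a request fails (default is 5).
    :param delay: Base delay in seconds between retries; it is multiplied by the attempt number (default is 5).
    :return: None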
    """
    # Record the start time of the process
    start_time = time.time()
    
    #all_results = []
    total_amount = 0.0
    page = 1  # Start at the first page
    
    # Get the current directory where the script is running
    current_directory = os.getcwd()
    
    # Define the full path for the output file
    output_path = os.path.join(current_directory, output_file)
    
    # Define headers with API key for authentication
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    # Initialize sum requirement variable to min amount
    # ensure we run through the loop one time.
    page_amount_sum = page_amount_min
    
    # Get the current date
    current_date = datetime.now().strftime('%Y-%m-%d')
    
    while page <= max_pages:

        attempt = 0
        while attempt < retries:

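            # Build the request body from a copy so the caller's dict
            # (and the shared default argument) is never mutated
            payload = {**params, "limit": limit, "page": page}

            # Make the API request with headers; the timeout (30 seconds is an
            # assumed value) keeps a stalled connection from hanging the loop
            try:
                response = requests.post(api_url, json=payload, headers=headers, timeout=30)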
                response.raise_for_status()  # Raise an exception for HTTP errors
                
                # Parse the JSON response
                data = response.json()
                results = data.get("results", [])
                
                # If there are no results, stop paginating
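                if not results:
                    print(f"No more results on page {page}. Ending pagination.")
                    # Reset the page sum so the pagination loop's exit check also fires;
                    # a bare break only leaves the retry loop and would refetch this page forever
                    page_amount_sum = page_amount_min
                    break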
                
                # Add the current date to each record
                for record in results:
                    record["date_fetched"] = current_date
                
                # Filter out all 'C' records (those with recipient_level = 'C')
                filtered_results = [result for result in results if result.get('recipient_level') != 'C']

                # Sum the 'amount' field for the current page
                page_amount_sum = sum(result.get('amount', 0.0) for result in filtered_results)
                total_amount += page_amount_sum
                
                print(f"Page {page}: Sum of 'amount' for this page: {page_amount_sum}")
                
                if page_amount_sum > page_amount_min:
                    # Open the CSV file in append mode to add data incrementally
                    with open(output_path, mode='a', newline='', encoding='utf-8') as file:
                        # Create a CSV DictWriter object
                        writer = csv.DictWriter(file, fieldnames=filtered_results[0].keys())
                        
                        # Write the header only if the file is empty (on the first page)
                        if file.tell() == 0:
                            writer.writeheader()
                        
                        # Write the data (current page results)
                        writer.writerows(filtered_results)
                    
                    print(f"Fetched and wrote page {page} with {len(filtered_results)} records.")
                else:
                    print(f"Page {page}: Sum of 'amount' for this page: {page_amount_sum} "
                          f"does not exceed the minimum amount requirement of {page_amount_min}"
                          ", skipping csv write.")
                
                page += 1  # Increment to fetch the next page
                   
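            except (RequestException, Timeout) as e:
                attempt += 1
                print(f"Error occurred on attempt {attempt} of {retries} for page {page}: {e}")
                if attempt < retries:
                    # Linear backoff: wait longer after each failed attempt
                    sleep_time = delay * attempt
                    print(f"Retrying in {sleep_time} seconds...")
                    time.sleep(sleep_time)  # Wait before retrying
                    # Skip the exit checks below so a failure on the first page is still retried
                    continue
                else:
                    print(f"Max retries reached for page {page}. Moving to next page.")
                    page += 1  # Skip the failing page so it is not retried forever
                    break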
            
            # Retry loop exit cases
            if (page_amount_sum <= page_amount_min):
                break

            if page >= max_pages + 1:
                break
        
        # Pagination loop exit cases
        if (page_amount_sum <= page_amount_min):
            break

        if page >= max_pages + 1:
            break
        
    print(f"Total Amount is {total_amount}.")                
    print(f"Results are being saved to {output_path}")
    
    # Record the end time and calculate the elapsed time
    end_time = time.time()
    elapsed_time = end_time - start_time
    print(f"Process completed in {elapsed_time:.2f} seconds")
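
The retry behaviour in the function above can be isolated into a small helper for experimentation. This is a sketch, not the notebook's implementation: `call_with_retries` and the `flaky` test function are hypothetical, and the `sleep` argument is injectable only so the example runs without real waiting.

```python
import time


def call_with_retries(func, retries=5, delay=5, sleep=time.sleep):
    """Call func() until it succeeds or `retries` attempts have failed.

    Waits delay * attempt seconds between attempts (linear backoff).
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return func()
        except Exception as error:  # narrow this to RequestException in real use
            last_error = error
            if attempt < retries:
                sleep(delay * attempt)
    raise last_error


# Hypothetical flaky call: fails twice, then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


waits = []  # records the requested waits instead of sleeping
result = call_with_retries(flaky, retries=5, delay=5, sleep=waits.append)
print(result, waits)  # ok [5, 10]
```

Keeping the retry budget per call means one bad page does not consume the attempts available to later pages.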


# Execution Example

In [None]:
# Parameters 
api_url = "https://api.usaspending.gov/api/v2/recipient/"
params = {
   # "filters": {"some api parameter": ["the value"]}  # Example filter (optional)
      # filters are not used but are available see api documentation.
      # page number and limit are in use as parameters.
}
api_key = "DEMO_KEY"  # Replace with your actual API key when needed
fetch_paginated_data(api_url, params, api_key, 1000, 1000, "federal_government_awardees.csv", 0.0)

Page 1: Sum of 'amount' for this page: 1544592320628.671
Fetched and wrote page 1 with 640 records.
Page 2: Sum of 'amount' for this page: 129365480322.77994
Fetched and wrote page 2 with 604 records.
Page 3: Sum of 'amount' for this page: 66280520424.809975
Fetched and wrote page 3 with 603 records.
Page 4: Sum of 'amount' for this page: 43199333046.79999
Fetched and wrote page 4 with 598 records.
Page 5: Sum of 'amount' for this page: 31153705095.54998
Fetched and wrote page 5 with 585 records.
Page 6: Sum of 'amount' for this page: 24846870969.98999
Fetched and wrote page 6 with 592 records.
Page 7: Sum of 'amount' for this page: 20642142348.429993
Fetched and wrote page 7 with 601 records.
Page 8: Sum of 'amount' for this page: 17461534679.700024
Fetched and wrote page 8 with 599 records.
Page 9: Sum of 'amount' for this page: 14911428799.960018
Fetched and wrote page 9 with 593 records.
Page 10: Sum of 'amount' for this page: 13271292134.290005
Fetched and wrote page 10 with 607 r