# Green Purchasing Behavior Cube - NoSQL Project

## Project Overview
This project implements a custom JSON parser and NoSQL database operations to analyze the relationship between consumer spending on sustainable foods and economic factors like income and jobs.

## Team: Individual Project (Shamik Basu)


## Part 1: Extended JSON Parser

Extending the sample code to handle:
- Arrays
- Nested objects and arrays
- Boolean and null values
- Complex JSON structures


In [7]:
# Extended JSON Parser Implementation
# Based on sample code, extended to handle arrays, nested structures, booleans, and null

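# JSON's single-character escape sequences and the characters they decode to
SIMPLE_ESCAPES = {'"': '"', '\\': '\\', '/': '/', 'b': '\b',
                  'f': '\f', 'n': '\n', 'r': '\r', 't': '\t'}

def parse_string(str):
    """Parse a string value from JSON, decoding escape sequences"""
    str = str.lstrip()
    assert(str[:1] == '"'), f"Expected '\"' but found '{str[:1]}'"
    str = str[1:]  # skip the start quote
    
    mystr = ""
    i = 0
    while i < len(str):
        if str[i] == '\\':
            esc = str[i + 1:i + 2]  # character after the backslash ('' at end of input)
            if esc in SIMPLE_ESCAPES:
                mystr += SIMPLE_ESCAPES[esc]
                i += 2
            elif esc == 'u' and i + 6 <= len(str):
                # \uXXXX: four hex digits name a UTF-16 code unit
                # (surrogate pairs are not combined into one character)
                mystr += chr(int(str[i + 2:i + 6], 16))
                i += 6
            else:
                raise ValueError(f'Invalid escape sequence: \\{esc}')
        elif str[i] == '"':
            # End of string: return the decoded text and the unparsed remainder
            return mystr, str[i + 1:]
        else:
            mystr += str[i]
            i += 1
    
    raise ValueError('Unterminated string')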

def parse_number(str):
    """Parse a number (int or float) from JSON"""
    str = str.lstrip()
    
    chs = ''
    is_float = False
    for ch in str:
        if ch in '0123456789.-+eE':
            # A decimal point or an exponent makes the value a float (e.g. 2.5, 1e5)
            if ch in '.eE':
                is_float = True
            chs += ch
        else:
            break
    
    if len(chs) == 0:
        raise ValueError('Expected number but found nothing')
    
    str = str[len(chs):]
    value = float(chs) if is_float else int(chs)
    return value, str

def parse_boolean(str):
    """Parse boolean values (true/false)"""
    str = str.lstrip()
    if str.startswith('true'):
        return True, str[4:]
    elif str.startswith('false'):
        return False, str[5:]
    else:
        raise ValueError('Expected boolean but found something else')

def parse_null(str):
    """Parse null value"""
    str = str.lstrip()
    if str.startswith('null'):
        return None, str[4:]
    else:
        raise ValueError('Expected null but found something else')

def parse_colon(str):
    """Consume a colon ':'"""
    str = str.lstrip()
    assert(str[0] == ':'), f"Expected ':' but found '{str[0]}'"
    return str[1:]

def parse_value(str):
    """Parse any JSON value (object, array, string, number, boolean, null)"""
    str = str.lstrip()
    
    if len(str) == 0:
        raise ValueError('Unexpected end of string')
    
    if str[0] == '{':
        return parse_object(str)
    elif str[0] == '[':
        return parse_array(str)
    elif str[0] == '"':
        return parse_string(str)
    elif str[0] == '-' or str[0].isdigit():
        return parse_number(str)
    elif str.startswith('true') or str.startswith('false'):
        return parse_boolean(str)
    elif str.startswith('null'):
        return parse_null(str)
    else:
        raise ValueError(f'Unexpected character: {str[0]}')

def parse_object(str):
    """Parse a JSON object (dictionary) - extended to handle nested structures"""
    str = str.lstrip()
    assert(str[:1] == '{'), f"Expected '{{' but found '{str[:1]}'"
    str = str[1:].lstrip()  # skip {
    
    obj = {}
    if str[:1] == '}':  # empty object
        return obj, str[1:]
    
    while True:
        key, str = parse_string(str)
        str = parse_colon(str)  # skip colon
        value, str = parse_value(str)  # parse any type of value
        obj[key] = value
        
        # After a member, expect ',' (another member follows) or '}' (end of object)
        str = str.lstrip()
        if len(str) == 0:
            raise ValueError('Expecting "}" but reached the end of string!')
        elif str[0] == ',':
            str = str[1:]
        elif str[0] == '}':
            return obj, str[1:]
        else:
            raise ValueError(f"Expected ',' or '}}' but found '{str[0]}'")

def parse_array(str):
    """Parse a JSON array (list) - handles nested structures"""
    str = str.lstrip()
    assert(str[:1] == '['), f"Expected '[' but found '{str[:1]}'"
    str = str[1:].lstrip()  # skip [
    
    arr = []
    if str[:1] == ']':  # empty array
        return arr, str[1:]
    
    while True:
        value, str = parse_value(str)  # parse any type of value
        arr.append(value)
        
        # After an element, expect ',' (another element follows) or ']' (end of array)
        str = str.lstrip()
        if len(str) == 0:
            raise ValueError('Expecting "]" but reached the end of string!')
        elif str[0] == ',':
            str = str[1:]
        elif str[0] == ']':
            return arr, str[1:]
        else:
            raise ValueError(f"Expected ',' or ']' but found '{str[0]}'")

def json_load(json_str):
    """Main function to load JSON string into Python object"""
    json_str = json_str.strip()
    value, rest = parse_value(json_str)
    rest = rest.strip()
    if len(rest) > 0:
        raise ValueError(f'Unexpected content after JSON: {rest[:20]}')
    return value

# Test the extended parser
print("Testing Extended JSON Parser:")
print("=" * 50)

# Test 1: Simple object
test1 = '{"name": "john", "age": 25.3, "gender": "male"}'
result1 = json_load(test1)
print("Test 1 - Simple object:", result1)

# Test 2: Nested object
test2 = '{"person": {"name": "john", "age": 25}, "city": "LA"}'
result2 = json_load(test2)
print("Test 2 - Nested object:", result2)

# Test 3: Array
test3 = '[1, 2, 3, "hello", true, null]'
result3 = json_load(test3)
print("Test 3 - Array:", result3)

# Test 4: Array of objects
test4 = '[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]'
result4 = json_load(test4)
print("Test 4 - Array of objects:", result4)

# Test 5: Complex nested structure
test5 = '{"data": [{"county": "LA", "spend": 1000}, {"county": "NY", "spend": 2000}], "year": 2023}'
result5 = json_load(test5)
print("Test 5 - Complex nested:", result5)


Testing Extended JSON Parser:
Test 1 - Simple object: {'name': 'john', 'age': 25.3, 'gender': 'male'}
Test 2 - Nested object: {'person': {'name': 'john', 'age': 25}, 'city': 'LA'}
Test 3 - Array: [1, 2, 3, 'hello', True, None]
Test 4 - Array of objects: [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]
Test 5 - Complex nested: {'data': [{'county': 'LA', 'spend': 1000}, {'county': 'NY', 'spend': 2000}], 'year': 2023}
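parse_number collects any run of digits, signs, dots, and exponent markers and leaves validation to Python's int/float conversion, so a literal like 1.2.3 fails only at conversion time and 01 is accepted even though JSON forbids leading zeros. A stricter alternative, sketched here under the assumption that the RFC 8259 number grammar is the target, matches the number with a regular expression first; `scan_json_number` is a hypothetical helper, not part of the project code.

```python
import re

# JSON number grammar (RFC 8259): -? int frac? exp?
#   int  = "0", or a nonzero digit followed by digits (no leading zeros)
#   frac = "." followed by one or more digits
#   exp  = "e" or "E", an optional sign, then one or more digits
JSON_NUMBER = re.compile(r'-?(?:0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?')

def scan_json_number(text):
    """Hypothetical helper: parse the longest valid JSON number at the start of text.

    Returns (value, rest); value is an int unless a fraction or exponent is present.
    """
    text = text.lstrip()
    match = JSON_NUMBER.match(text)
    if match is None:
        raise ValueError(f'Expected number at: {text[:20]!r}')
    literal = match.group(0)
    is_float = match.group(1) is not None or match.group(2) is not None
    value = float(literal) if is_float else int(literal)
    return value, text[match.end():]

print(scan_json_number('25.3, "x"'))  # (25.3, ', "x"')
print(scan_json_number('1e5]'))       # (100000.0, ']')
print(scan_json_number('-0 '))        # (0, ' ')
```

For input such as 01 the pattern consumes only the 0 and leaves the 1 for the enclosing object or array parser to reject, which is how a strict tokenizer handles it.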


## Part 2: Collection/DataFrame Structure

Implementing a collection structure to store JSON documents (similar to MongoDB collections)


In [8]:
# Collection class to store JSON documents (similar to MongoDB collections)

class Collection:
    """A collection class to store and manipulate JSON documents"""
    
    def __init__(self, name):
        self.name = name
        self.documents = []  # List of dictionaries (JSON objects)
    
    def insert(self, document):
        """Insert a document (dictionary) into the collection"""
        if isinstance(document, dict):
            self.documents.append(document)
        else:
            raise TypeError("Document must be a dictionary")
    
    def insert_many(self, documents):
        """Insert multiple documents into the collection"""
        for doc in documents:
            self.insert(doc)
    
    def __len__(self):
        return len(self.documents)
    
    def __getitem__(self, index):
        return self.documents[index]
    
    def __iter__(self):
        return iter(self.documents)
    
    def __repr__(self):
        return f"Collection(name='{self.name}', documents={len(self.documents)})"
    
    def to_list(self):
        """Return all documents as a list"""
        return self.documents.copy()

# Test Collection
print("Testing Collection Class:")
print("=" * 50)

collection = Collection("test_collection")
collection.insert({"name": "Alice", "age": 30})
collection.insert({"name": "Bob", "age": 25})
print(f"Collection: {collection}")
print(f"Documents: {collection.to_list()}")


Testing Collection Class:
Collection: Collection(name='test_collection', documents=2)
Documents: [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]
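Because the collection is modeled on MongoDB collections, documents could also be selected with a query document rather than a Python predicate. The sketch below assumes only equality matching plus dotted paths into nested objects, a small subset of MongoDB's query language; `get_path` and `matches` are hypothetical helpers, not part of the project.

```python
_MISSING = object()  # sentinel distinguishing "field absent" from a stored None

def get_path(doc, path):
    """Follow a dotted path such as 'person.name' through nested dicts."""
    current = doc
    for part in path.split('.'):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current

def matches(doc, query):
    """True if every field in the query equals the document's value at that path."""
    return all(get_path(doc, path) == expected for path, expected in query.items())

# Usage: the query {"person.name": "john"} selects the nested document from Test 2
docs = [
    {"person": {"name": "john", "age": 25}, "city": "LA"},
    {"person": {"name": "mary", "age": 31}, "city": "NY"},
]
selected = [d for d in docs if matches(d, {"person.name": "john"})]
print(selected)  # [{'person': {'name': 'john', 'age': 25}, 'city': 'LA'}]
```

A predicate built as `lambda doc: matches(doc, query)` fits the predicate-based filter_collection defined in Part 3. Unlike MongoDB, a query value of None here does not match a missing field.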


## Part 3: Core Operations

Implementing filtering, projection, group by, aggregation, and join operations


In [9]:
# Operation 1: Filtering
def filter_collection(collection, condition_func):
    """
    Filter documents in a collection based on a condition function
    
    Args:
        collection: Collection object
        condition_func: Function that takes a document and returns True/False
    
    Returns:
        New Collection with filtered documents
    """
    filtered = Collection(f"{collection.name}_filtered")
    for doc in collection:
        if condition_func(doc):
            filtered.insert(doc.copy())
    return filtered

# Operation 2: Projection
def project_collection(collection, fields):
    """
    Project (select) specific fields from documents
    
    Args:
        collection: Collection object
        fields: List of field names to select
    
    Returns:
        New Collection with projected documents
    """
    projected = Collection(f"{collection.name}_projected")
    for doc in collection:
        new_doc = {}
        for field in fields:
            if field in doc:
                new_doc[field] = doc[field]
        projected.insert(new_doc)
    return projected

# Operation 3: Group By
def group_by(collection, group_key):
    """
    Group documents by a key (documents without the key are skipped)
    
    Args:
        collection: Collection object
        group_key: Field name to group by
    
    Returns:
        Dictionary where keys are group values and values are lists of documents
    """
    groups = {}
    for doc in collection:
        if group_key in doc:
            key_value = doc[group_key]
            if key_value not in groups:
                groups[key_value] = []
            groups[key_value].append(doc)
    return groups

# Operation 4: Aggregation
def aggregate(collection, group_key, agg_field, agg_func):
    """
    Group by a key and apply an aggregation function to a field
    
    Args:
        collection: Collection object
        group_key: Field name to group by
        agg_field: Field name to aggregate
        agg_func: Aggregation function (e.g., 'sum', 'avg', 'max', 'min', 'count')
    
    Returns:
        List of dictionaries with group_key and aggregated value
    """
    groups = group_by(collection, group_key)
    results = []
    
    for key_value, docs in groups.items():
        values = [doc[agg_field] for doc in docs if agg_field in doc]
        
        if len(values) == 0:
            continue
            
        if agg_func == 'sum':
            agg_value = sum(values)
        elif agg_func == 'avg':
            agg_value = sum(values) / len(values)
        elif agg_func == 'max':
            agg_value = max(values)
        elif agg_func == 'min':
            agg_value = min(values)
        elif agg_func == 'count':
            agg_value = len(values)
        else:
            raise ValueError(f"Unknown aggregation function: {agg_func}")
        
        results.append({group_key: key_value, f"{agg_func}({agg_field})": agg_value})
    
    return results

# Operation 5: Join
def join_collections(collection1, collection2, key1, key2):
    """
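    Inner join of two collections on specified keys (hash join)
    
    Args:
        collection1: First Collection object
        collection2: Second Collection object
        key1: Key in collection1 to join on
        key2: Key in collection2 to join on
    
    Returns:
        New Collection with one merged document per matching pair; key2 is
        dropped, and collection2 fields that clash with collection1 fields are
        renamed to "<collection2.name>_<field>"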
    """
    joined = Collection(f"{collection1.name}_join_{collection2.name}")
    
    # Build index on collection2 for faster lookup
    index = {}
    for doc2 in collection2:
        if key2 in doc2:
            key_value = doc2[key2]
            if key_value not in index:
                index[key_value] = []
            index[key_value].append(doc2)
    
    # Perform join
    for doc1 in collection1:
        if key1 in doc1:
            key_value = doc1[key1]
            if key_value in index:
                for doc2 in index[key_value]:
                    # Merge documents
                    merged = doc1.copy()
                    # Add fields from doc2, avoiding conflicts by prefixing
                    for k, v in doc2.items():
                        if k != key2:  # Don't duplicate the join key
                            if k in merged:
                                merged[f"{collection2.name}_{k}"] = v
                            else:
                                merged[k] = v
                    joined.insert(merged)
    
    return joined

# Test operations
print("Testing Core Operations:")
print("=" * 50)

# Create test collection
test_coll = Collection("test")
test_coll.insert_many([
    {"county": "LA", "spend": 1000, "year": 2023},
    {"county": "NY", "spend": 2000, "year": 2023},
    {"county": "LA", "spend": 1500, "year": 2024},
    {"county": "NY", "spend": 2500, "year": 2024},
])

print("Original collection:")
for doc in test_coll:
    print(f"  {doc}")

# Test filtering
print("\n1. Filtering (spend > 1500):")
filtered = filter_collection(test_coll, lambda doc: doc.get("spend", 0) > 1500)
for doc in filtered:
    print(f"  {doc}")

# Test projection
print("\n2. Projection (county, spend):")
projected = project_collection(test_coll, ["county", "spend"])
for doc in projected:
    print(f"  {doc}")

# Test group by
print("\n3. Group by county:")
groups = group_by(test_coll, "county")
for key, docs in groups.items():
    print(f"  {key}: {len(docs)} documents")

# Test aggregation
print("\n4. Aggregation (sum of spend by county):")
agg_result = aggregate(test_coll, "county", "spend", "sum")
for result in agg_result:
    print(f"  {result}")

# Test join
print("\n5. Join:")
coll1 = Collection("coll1")
coll1.insert_many([
    {"county": "LA", "population": 10000000},
    {"county": "NY", "population": 8000000},
])

coll2 = Collection("coll2")
coll2.insert_many([
    {"county_code": "LA", "unemployment": 5.2},
    {"county_code": "NY", "unemployment": 4.8},
])

joined = join_collections(coll1, coll2, "county", "county_code")
for doc in joined:
    print(f"  {doc}")


Testing Core Operations:
Original collection:
  {'county': 'LA', 'spend': 1000, 'year': 2023}
  {'county': 'NY', 'spend': 2000, 'year': 2023}
  {'county': 'LA', 'spend': 1500, 'year': 2024}
  {'county': 'NY', 'spend': 2500, 'year': 2024}

1. Filtering (spend > 1500):
  {'county': 'NY', 'spend': 2000, 'year': 2023}
  {'county': 'NY', 'spend': 2500, 'year': 2024}

2. Projection (county, spend):
  {'county': 'LA', 'spend': 1000}
  {'county': 'NY', 'spend': 2000}
  {'county': 'LA', 'spend': 1500}
  {'county': 'NY', 'spend': 2500}

3. Group by county:
  LA: 2 documents
  NY: 2 documents

4. Aggregation (sum of spend by county):
  {'county': 'LA', 'sum(spend)': 2500}
  {'county': 'NY', 'sum(spend)': 4500}

5. Join:
  {'county': 'LA', 'population': 10000000, 'unemployment': 5.2}
  {'county': 'NY', 'population': 8000000, 'unemployment': 4.8}
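join_collections matches on a single field. In Part 6 the spending, jobs, and unemployment collections are joined on county alone, so each spending record is paired with every year's job and unemployment records, and the clashing year field is kept under a prefixed name. A hash join on a tuple of fields such as ("county", "year") avoids that; the sketch below works on plain lists of dicts, and `join_on_fields` and its parameters are hypothetical, not part of the project.

```python
def join_on_fields(left, right, keys, right_prefix):
    """Inner hash join of two lists of dicts on several fields at once.

    keys: field names that must be present and equal in both documents.
    right_prefix: prefix for right-hand fields whose names clash with left-hand fields.
    """
    # Build phase: index the right side by its composite key (a tuple)
    index = {}
    for doc in right:
        if all(k in doc for k in keys):
            index.setdefault(tuple(doc[k] for k in keys), []).append(doc)

    # Probe phase: look up each left document's composite key
    joined = []
    for doc in left:
        if not all(k in doc for k in keys):
            continue
        for match in index.get(tuple(doc[k] for k in keys), []):
            merged = dict(doc)
            for field, value in match.items():
                if field in keys:
                    continue  # join keys are already present from the left side
                name = f"{right_prefix}_{field}" if field in merged else field
                merged[name] = value
            joined.append(merged)
    return joined

# Usage with rows taken from the Cook county sample data
spending = [
    {"county": "Cook", "spend": 950.00, "year": 2023},
    {"county": "Cook", "spend": 1050.25, "year": 2024},
]
unemployment = [
    {"county": "Cook", "rate": 5.8, "year": 2023},
    {"county": "Cook", "rate": 5.5, "year": 2024},
]
for row in join_on_fields(spending, unemployment, ["county", "year"], "unemployment"):
    print(row)
# {'county': 'Cook', 'spend': 950.0, 'year': 2023, 'rate': 5.8}
# {'county': 'Cook', 'spend': 1050.25, 'year': 2024, 'rate': 5.5}
```

Joining these same rows on ["county"] alone yields four rows, each spending record paired with both years' rates.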


## Part 4: JSON File Loading

Function to load a JSON file (an array of objects or a single object) into a collection


In [10]:
# Function to load JSON file into a collection
def load_json_file(filename):
    """
    Load a JSON file (array of objects) into a Collection
    
    Args:
        filename: Path to JSON file
    
    Returns:
        Collection object
    """
    # Derive the collection name from the file name (handles Windows and Unix paths)
    collection_name = filename.replace('\\', '/').split('/')[-1].split('.')[0]
    
    try:
        with open(filename, 'r', encoding='utf-8') as f:
            content = f.read().strip()
        
        # Parse JSON
        data = json_load(content)
        collection = Collection(collection_name)
        
        # Handle both array of objects and single object
        if isinstance(data, list):
            for doc in data:
                if isinstance(doc, dict):
                    collection.insert(doc)
        elif isinstance(data, dict):
            collection.insert(data)
        else:
            raise ValueError("JSON file must contain an object or array of objects")
        
        return collection
    except FileNotFoundError:
        print(f"File {filename} not found. Creating empty collection.")
        return Collection(collection_name)
    except Exception as e:
        print(f"Error loading {filename}: {e}")
        return Collection(collection_name)

# Test file loading (the sample files are created in Part 5; a missing file yields an empty collection)
print("Testing JSON File Loading:")
print("=" * 50)


Testing JSON File Loading:
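load_json_file names each collection by keeping the text before the first dot of the file name, so a file such as food_spending.2024.json would become food_spending. If only the final extension should be dropped, the standard library's pathlib can derive the name instead; this is an alternative design, not what the loader above does, and `collection_name_from_path` is a hypothetical helper. PureWindowsPath is used because it accepts both / and \ as separators on any platform.

```python
from pathlib import PureWindowsPath

def collection_name_from_path(filename):
    """Return the file name without its final extension.

    PureWindowsPath treats both '\\' and '/' as separators, matching the
    loader's handling of Windows and Unix paths.
    """
    return PureWindowsPath(filename).stem

print(collection_name_from_path('food_spending.json'))            # food_spending
print(collection_name_from_path('data\\jobs.json'))               # jobs
print(collection_name_from_path('data/food_spending.2024.json'))  # food_spending.2024
```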


## Part 5: Sample Data Generation

Creating sample datasets for Green Purchasing Behavior analysis


In [11]:
# Generate sample datasets for Green Purchasing Behavior project

def create_sample_data():
    """Create sample JSON data files for the project"""
    
    # Sample food spending data
    food_spending_data = [
        {"county": "Los Angeles", "category": "organic_fruits", "spend": 1250.50, "year": 2023},
        {"county": "Los Angeles", "category": "organic_vegetables", "spend": 980.25, "year": 2023},
        {"county": "Los Angeles", "category": "organic_fruits", "spend": 1350.75, "year": 2024},
        {"county": "Los Angeles", "category": "organic_vegetables", "spend": 1100.00, "year": 2024},
        {"county": "New York", "category": "organic_fruits", "spend": 1450.00, "year": 2023},
        {"county": "New York", "category": "organic_vegetables", "spend": 1200.50, "year": 2023},
        {"county": "New York", "category": "organic_fruits", "spend": 1550.25, "year": 2024},
        {"county": "New York", "category": "organic_vegetables", "spend": 1300.75, "year": 2024},
        {"county": "Cook", "category": "organic_fruits", "spend": 950.00, "year": 2023},
        {"county": "Cook", "category": "organic_vegetables", "spend": 750.50, "year": 2023},
        {"county": "Cook", "category": "organic_fruits", "spend": 1050.25, "year": 2024},
        {"county": "Cook", "category": "organic_vegetables", "spend": 850.00, "year": 2024},
        {"county": "Harris", "category": "organic_fruits", "spend": 800.00, "year": 2023},
        {"county": "Harris", "category": "organic_vegetables", "spend": 650.25, "year": 2023},
        {"county": "Harris", "category": "organic_fruits", "spend": 900.50, "year": 2024},
        {"county": "Harris", "category": "organic_vegetables", "spend": 700.00, "year": 2024},
    ]
    
    # Sample jobs data
    jobs_data = [
        {"county": "Los Angeles", "occupation": "agriculture", "median_income": 45000, "year": 2023},
        {"county": "Los Angeles", "occupation": "retail", "median_income": 38000, "year": 2023},
        {"county": "Los Angeles", "occupation": "agriculture", "median_income": 47000, "year": 2024},
        {"county": "Los Angeles", "occupation": "retail", "median_income": 39000, "year": 2024},
        {"county": "New York", "occupation": "agriculture", "median_income": 48000, "year": 2023},
        {"county": "New York", "occupation": "retail", "median_income": 40000, "year": 2023},
        {"county": "New York", "occupation": "agriculture", "median_income": 50000, "year": 2024},
        {"county": "New York", "occupation": "retail", "median_income": 41000, "year": 2024},
        {"county": "Cook", "occupation": "agriculture", "median_income": 42000, "year": 2023},
        {"county": "Cook", "occupation": "retail", "median_income": 35000, "year": 2023},
        {"county": "Cook", "occupation": "agriculture", "median_income": 44000, "year": 2024},
        {"county": "Cook", "occupation": "retail", "median_income": 36000, "year": 2024},
        {"county": "Harris", "occupation": "agriculture", "median_income": 40000, "year": 2023},
        {"county": "Harris", "occupation": "retail", "median_income": 33000, "year": 2023},
        {"county": "Harris", "occupation": "agriculture", "median_income": 42000, "year": 2024},
        {"county": "Harris", "occupation": "retail", "median_income": 34000, "year": 2024},
    ]
    
    # Sample unemployment data
    unemployment_data = [
        {"county": "Los Angeles", "rate": 5.2, "year": 2023},
        {"county": "Los Angeles", "rate": 4.8, "year": 2024},
        {"county": "New York", "rate": 4.5, "year": 2023},
        {"county": "New York", "rate": 4.2, "year": 2024},
        {"county": "Cook", "rate": 5.8, "year": 2023},
        {"county": "Cook", "rate": 5.5, "year": 2024},
        {"county": "Harris", "rate": 6.2, "year": 2023},
        {"county": "Harris", "rate": 5.9, "year": 2024},
    ]
    
    # Convert to JSON strings and save
    def dict_to_json_str(obj):
        """Convert a Python dict/list/scalar to a JSON string"""
        # bool must be tested before int/float, because bool is a subclass of int
        if isinstance(obj, bool):
            return 'true' if obj else 'false'
        elif obj is None:
            return 'null'
        elif isinstance(obj, (int, float)):
            return str(obj)
        elif isinstance(obj, str):
            # Escape backslashes first, then double quotes
            return '"' + obj.replace('\\', '\\\\').replace('"', '\\"') + '"'
        elif isinstance(obj, dict):
            items = [f'{dict_to_json_str(str(k))}: {dict_to_json_str(v)}'
                     for k, v in obj.items()]
            return '{' + ', '.join(items) + '}'
        elif isinstance(obj, list):
            return '[' + ', '.join(dict_to_json_str(item) for item in obj) + ']'
        else:
            # Fall back to the string form for unsupported types
            return dict_to_json_str(str(obj))
    
    # Write files (UTF-8, the encoding load_json_file reads with)
    with open('food_spending.json', 'w', encoding='utf-8') as f:
        f.write(dict_to_json_str(food_spending_data))
    
    with open('jobs.json', 'w', encoding='utf-8') as f:
        f.write(dict_to_json_str(jobs_data))
    
    with open('unemployment.json', 'w', encoding='utf-8') as f:
        f.write(dict_to_json_str(unemployment_data))
    
    print("Sample data files created:")
    print("  - food_spending.json")
    print("  - jobs.json")
    print("  - unemployment.json")

# Create sample data
create_sample_data()


Sample data files created:
  - food_spending.json
  - jobs.json
  - unemployment.json
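Because the Part 6 joins match on county only, their result sizes follow from the per-county record counts: an inner equi-join produces, for each key value, the product of the number of matching records on each side. The sketch below computes that sum with collections.Counter from county columns shaped like the sample files (four counties; 4 spending rows, 4 job rows, and 2 unemployment rows per county); `equi_join_size` is a hypothetical helper.

```python
from collections import Counter

def equi_join_size(left_keys, right_keys):
    """Rows produced by an inner equi-join: sum over keys of count_left * count_right."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    return sum(n * right_counts[key] for key, n in left_counts.items())

counties = ["Los Angeles", "New York", "Cook", "Harris"]
spending_counties = [c for c in counties for _ in range(4)]      # 16 spending records
jobs_counties = [c for c in counties for _ in range(4)]          # 16 job records
unemployment_counties = [c for c in counties for _ in range(2)]  # 8 unemployment records

print(equi_join_size(spending_counties, jobs_counties))          # 64
print(equi_join_size(spending_counties, unemployment_counties))  # 32
```

Keying on (county, year) instead would pair each spending record with 2 job records (one per occupation) and 1 unemployment record, giving 32 and 16 rows.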


## Part 6: Green Purchasing Behavior Application

Application that uses all implemented functions to analyze sustainable food purchasing behavior


In [12]:
# Green Purchasing Behavior Analysis Application

print("=" * 70)
print("GREEN PURCHASING BEHAVIOR ANALYSIS APPLICATION")
print("=" * 70)

# Load data from JSON files
print("\n1. Loading Data from JSON Files:")
print("-" * 70)
food_spending = load_json_file('food_spending.json')
jobs = load_json_file('jobs.json')
unemployment = load_json_file('unemployment.json')

print(f"Loaded {len(food_spending)} food spending records")
print(f"Loaded {len(jobs)} job records")
print(f"Loaded {len(unemployment)} unemployment records")

# Application Question 1: Who is buying sustainable food? (Demographics, Geography)
print("\n2. Question 1: Who is buying sustainable food? (Geography Analysis)")
print("-" * 70)

# Filter: high-spending records (spend > 1000)
print("\n2a. Filtering: Counties with spending > $1000")
high_spending = filter_collection(food_spending, lambda doc: doc.get("spend", 0) > 1000)
print(f"Found {len(high_spending)} records with spending > $1000")

# Projection: County and spend
print("\n2b. Projection: County and spending amounts")
county_spending = project_collection(high_spending, ["county", "spend"])
for doc in county_spending:
    print(f"  {doc['county']}: ${doc['spend']:.2f}")

# Group by: County
print("\n2c. Group By: Spending by county")
county_groups = group_by(food_spending, "county")
for county, docs in county_groups.items():
    total = sum(doc.get("spend", 0) for doc in docs)
    print(f"  {county}: ${total:.2f} total spending ({len(docs)} records)")

# Aggregation: Total spending by county
print("\n2d. Aggregation: Total spending by county")
total_by_county = aggregate(food_spending, "county", "spend", "sum")
for result in total_by_county:
    print(f"  {result['county']}: ${result['sum(spend)']:.2f}")

# Application Question 2: How income influences sustainable purchasing
print("\n3. Question 2: How income influences sustainable purchasing behavior")
print("-" * 70)

# Join: Food spending with jobs data
print("\n3a. Join: Food spending with jobs data (on county)")
spending_jobs = join_collections(food_spending, jobs, "county", "county")
print(f"Joined collection has {len(spending_jobs)} records")

# Filter: High income areas (median_income > 40000)
print("\n3b. Filtering: High income areas (median_income > $40,000)")
high_income_spending = filter_collection(spending_jobs, 
                                        lambda doc: doc.get("median_income", 0) > 40000)
print(f"Found {len(high_income_spending)} records in high-income areas")

# Aggregation: Average spending by income level
print("\n3c. Aggregation: Average spending by income level")
# First, categorize income levels
def categorize_income(doc):
    income = doc.get("median_income", 0)
    if income >= 45000:
        return "high"
    elif income >= 40000:
        return "medium"
    else:
        return "low"

# Add income category to documents
spending_with_category = Collection("spending_categorized")
for doc in spending_jobs:
    new_doc = doc.copy()
    new_doc["income_category"] = categorize_income(doc)
    spending_with_category.insert(new_doc)

avg_by_income = aggregate(spending_with_category, "income_category", "spend", "avg")
print("Average spending by income category:")
for result in avg_by_income:
    print(f"  {result['income_category']}: ${result['avg(spend)']:.2f}")

# Application Question 3: Economic shocks and spending habits
print("\n4. Question 3: Do economic shocks (unemployment) change spending habits?")
print("-" * 70)

# Join: Food spending with unemployment data
print("\n4a. Join: Food spending with unemployment data")
spending_unemployment = join_collections(food_spending, unemployment, "county", "county")
print(f"Joined collection has {len(spending_unemployment)} records")

# Filter: High unemployment areas (rate > 5.5%)
print("\n4b. Filtering: High unemployment areas (rate > 5.5%)")
high_unemployment = filter_collection(spending_unemployment, 
                                     lambda doc: doc.get("rate", 0) > 5.5)
print(f"Found {len(high_unemployment)} records in high-unemployment areas")

# Aggregation: Average spending by unemployment level
print("\n4c. Aggregation: Spending comparison by unemployment level")
def categorize_unemployment(doc):
    rate = doc.get("rate", 0)
    if rate >= 6.0:
        return "very_high"
    elif rate >= 5.5:
        return "high"
    elif rate >= 4.5:
        return "medium"
    else:
        return "low"

spending_with_unemp_cat = Collection("spending_unemp_categorized")
for doc in spending_unemployment:
    new_doc = doc.copy()
    new_doc["unemployment_category"] = categorize_unemployment(doc)
    spending_with_unemp_cat.insert(new_doc)

avg_by_unemployment = aggregate(spending_with_unemp_cat, "unemployment_category", "spend", "avg")
print("Average spending by unemployment category:")
for result in avg_by_unemployment:
    print(f"  {result['unemployment_category']}: ${result['avg(spend)']:.2f}")

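# Year-over-year analysis. The county-only join pairs each spending record with
# every year's rate, so keep only pairs whose unemployment year matches the
# spending year (the join renamed the clashing field to "unemployment_year")
print("\n4d. Year-over-year spending change by unemployment")
same_year = filter_collection(spending_unemployment,
                              lambda doc: doc.get("unemployment_year") == doc.get("year"))
year_groups = group_by(same_year, "year")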
for year, docs in sorted(year_groups.items()):
    avg_spend = sum(doc.get("spend", 0) for doc in docs) / len(docs) if docs else 0
    avg_unemp = sum(doc.get("rate", 0) for doc in docs) / len(docs) if docs else 0
    print(f"  {year}: Avg spending=${avg_spend:.2f}, Avg unemployment={avg_unemp:.2f}%")

# Application Question 4: Category analysis
print("\n5. Question 4: Spending patterns by food category")
print("-" * 70)

# Aggregation: Total spending by category
print("\n5a. Aggregation: Total spending by category")
total_by_category = aggregate(food_spending, "category", "spend", "sum")
for result in total_by_category:
    print(f"  {result['category']}: ${result['sum(spend)']:.2f}")

# Aggregation: Average spending by category
print("\n5b. Aggregation: Average spending by category")
avg_by_category = aggregate(food_spending, "category", "spend", "avg")
for result in avg_by_category:
    print(f"  {result['category']}: ${result['avg(spend)']:.2f}")

# Complex query: Category spending by county
print("\n5c. Complex Query: Category spending by county (using group by and aggregation)")
county_category_groups = {}
for doc in food_spending:
    county = doc.get("county")
    category = doc.get("category")
    key = (county, category)
    if key not in county_category_groups:
        county_category_groups[key] = []
    county_category_groups[key].append(doc)

print("Top spending combinations:")
sorted_combos = sorted(county_category_groups.items(), 
                      key=lambda x: sum(d.get("spend", 0) for d in x[1]), 
                      reverse=True)
for (county, category), docs in sorted_combos[:5]:
    total = sum(d.get("spend", 0) for d in docs)
    print(f"  {county} - {category}: ${total:.2f}")

print("\n" + "=" * 70)
print("APPLICATION ANALYSIS COMPLETE")
print("=" * 70)


GREEN PURCHASING BEHAVIOR ANALYSIS APPLICATION

1. Loading Data from JSON Files:
----------------------------------------------------------------------
Loaded 16 food spending records
Loaded 16 job records
Loaded 8 unemployment records

2. Question 1: Who is buying sustainable food? (Geography Analysis)
----------------------------------------------------------------------

2a. Filtering: Counties with spending > $1000
Found 8 records with spending > $1000

2b. Projection: County and spending amounts
  Los Angeles: $1250.50
  Los Angeles: $1350.75
  Los Angeles: $1100.00
  New York: $1450.00
  New York: $1200.50
  New York: $1550.25
  New York: $1300.75
  Cook: $1050.25

2c. Group By: Spending by county
  Los Angeles: $4681.50 total spending (4 records)
  New York: $5501.50 total spending (4 records)
  Cook: $3600.75 total spending (4 records)
  Harris: $3050.75 total spending (4 records)

2d. Aggregation: Total spending by county
  Los Angeles: $4681.50
  New York: $5501.50
  Cook: $3
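Query 5c groups by the pair (county, category) by hand, because group_by and aggregate accept a single field. A single-pass hash aggregation over several grouping fields can keep running sum, count, min, and max per group instead of storing the grouped documents. The sketch below is an assumption-level illustration: `aggregate_multi` is hypothetical, and its output field names only imitate the sum(spend) style used above.

```python
def aggregate_multi(docs, group_keys, agg_field):
    """Group docs by several fields and compute sum/avg/min/max/count of agg_field in one pass."""
    acc = {}  # composite key (tuple) -> running statistics
    for doc in docs:
        if agg_field not in doc or not all(k in doc for k in group_keys):
            continue  # skip documents missing a grouping or aggregated field
        key = tuple(doc[k] for k in group_keys)
        value = doc[agg_field]
        stats = acc.get(key)
        if stats is None:
            acc[key] = {"sum": value, "count": 1, "min": value, "max": value}
        else:
            stats["sum"] += value
            stats["count"] += 1
            stats["min"] = min(stats["min"], value)
            stats["max"] = max(stats["max"], value)

    results = []
    for key, stats in acc.items():
        row = dict(zip(group_keys, key))
        row[f"sum({agg_field})"] = stats["sum"]
        row[f"avg({agg_field})"] = stats["sum"] / stats["count"]
        row[f"min({agg_field})"] = stats["min"]
        row[f"max({agg_field})"] = stats["max"]
        row[f"count({agg_field})"] = stats["count"]
        results.append(row)
    return results

# Usage with the Harris rows of the sample spending data
harris = [
    {"county": "Harris", "category": "organic_fruits", "spend": 800.00, "year": 2023},
    {"county": "Harris", "category": "organic_vegetables", "spend": 650.25, "year": 2023},
    {"county": "Harris", "category": "organic_fruits", "spend": 900.50, "year": 2024},
    {"county": "Harris", "category": "organic_vegetables", "spend": 700.00, "year": 2024},
]
for row in aggregate_multi(harris, ["county", "category"], "spend"):
    print(row)
```

Memory grows with the number of groups rather than the number of documents, which is the usual reason to prefer running accumulators over materialized groups.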

## Summary

This project implements:

1. **Extended JSON Parser**: Handles objects, arrays, nested structures, booleans, and null values
2. **Collection Class**: Stores and manages JSON documents (similar to MongoDB collections)
3. **Core Operations**:
   - **Filtering**: Select documents based on conditions
   - **Projection**: Select specific fields from documents
   - **Group By**: Group documents by a key
   - **Aggregation**: Compute aggregates (sum, avg, max, min, count) on grouped data
   - **Join**: Join two collections on specified keys

4. **Application**: Green Purchasing Behavior analysis that:
   - Analyzes who buys sustainable food (geography)
   - Examines income influence on purchasing behavior
   - Studies the impact of economic shocks (unemployment) on spending
   - Analyzes spending patterns by food category

All operations are implemented from scratch without using pandas, json, or csv libraries.
