# Berkeley Open Data Analysis Pipeline
## Integration with Datasette, Pandas, NumPy, TensorFlow, Plotly, and Seaborn

**Created:** 2025-01-13  
**Purpose:** Fetch, analyze, and visualize City of Berkeley Open Data

### Workflow:
1. Connect to Berkeley Open Data API (Socrata)
2. Fetch Business Licenses and other datasets
3. Clean and process data
4. Export to CSV, JSON, GeoJSON
5. Load into Datasette (SQLite)
6. Analyze with Pandas/NumPy
7. Visualize with Plotly/Seaborn
8. Optional: ML with TensorFlow

In [None]:
# CELL 1: Install Required Packages
# Run this cell once to install all dependencies

%pip install pandas numpy geopandas sodapy datasette plotly seaborn tensorflow folium requests --break-system-packages

print("✅ Installation complete!")

Note: you may need to restart the kernel to use updated packages.
✅ Installation complete!


In [None]:
# CELL 2: Import Libraries

import pandas as pd
import numpy as np
import geopandas as gpd
from sodapy import Socrata
import json
import sqlite3
from pathlib import Path
import plotly.express as px
import plotly.graph_objects as go
import seaborn as sns
import matplotlib.pyplot as plt
from datetime import datetime
import warnings
warnings.filterwarnings('ignore')

# Set display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 100)

print("✅ All libraries imported successfully")
print(f"Pandas version: {pd.__version__}")
print(f"NumPy version: {np.__version__}")

✅ All libraries imported successfully
Pandas version: 2.2.3
NumPy version: 2.1.3


In [None]:
# CELL 3: Configure Berkeley Open Data API

# 🔑 IMPORTANT: You Need a Free App Token!
# 
# Berkeley's API requires authentication. Get your FREE token here:
# https://data.cityofberkeley.info/profile/edit/developer_settings
# 
# It takes 2 minutes and is completely free!
# 
# Quick Setup:
# 1. Create account at https://data.cityofberkeley.info
# 2. Go to Developer Settings
# 3. Create New App Token
# 4. Copy your token
# 5. Either:
#    - Create .env file with: BERKELEY_APP_TOKEN=your-token
#    - OR paste token directly below
# 
# See TOKEN_SETUP_INSTRUCTIONS.md for detailed help!

import os

# Try to load from environment variable first
try:
    from dotenv import load_dotenv
    load_dotenv()
    print("✅ Loaded .env file")
except ImportError:
    print("ℹ️  python-dotenv not installed (optional)")

# Berkeley Open Data Portal Configuration
BERKELEY_DOMAIN = "data.cityofberkeley.info"

# 🔑 SET YOUR APP TOKEN HERE
# Get your free token from: https://data.cityofberkeley.info/profile/edit/developer_settings

# Option 1: From environment variable (recommended - keeps token private)
APP_TOKEN = os.environ.get('BERKELEY_APP_TOKEN')

# Option 2: Paste directly here (quick, but less secure if sharing)
# Uncomment the line below and add your token. Leaving it commented out
# keeps Option 1 in effect and keeps the token out of the shared notebook:
# APP_TOKEN = "your-token-here"  # Replace with your actual token

# Check if token is set
if APP_TOKEN:
    print(f"✅ App token loaded: {APP_TOKEN[:8]}...")
else:
    print("\n" + "="*70)
    print("⚠️  WARNING: No app token found!")
    print("="*70)
    print("\nYou need a FREE app token to access Berkeley's data.")
    print("\n📝 Quick Setup (2 minutes):")
    print("   1. Go to: https://data.cityofberkeley.info")
    print("   2. Sign up (free)")
    print("   3. Developer Settings → Create App Token")
    print("   4. Copy token and either:")
    print("      - Create .env file with: BERKELEY_APP_TOKEN=your-token")
    print("      - Or paste directly in this cell (see Option 2 above)")
    print("\n📖 See TOKEN_SETUP_INSTRUCTIONS.md for detailed help")
    print("="*70)
    print("\nℹ️  Continuing without token - may get 403 errors...")

# CORRECTED Dataset IDs
DATASETS = {
    'business_licenses': 'rwnf-bu3w',  # ✅ CORRECT ID
    'crime_incidents': 'k2nh-s5h5',
    'restaurant_inspections': 'b47j-kakm',
    'building_permits': 'ydr8-5enu',
}

# Initialize Socrata client
from sodapy import Socrata

client = Socrata(BERKELEY_DOMAIN, APP_TOKEN)

print(f"\n✅ Connected to {BERKELEY_DOMAIN}")
if APP_TOKEN:
    print("   Using app token (10,000 requests/hour)")
else:
    print("   ⚠️  No token - limited to 1,000 requests/hour and may be blocked")
print(f"\n📊 Available datasets: {list(DATASETS.keys())}")

✅ Loaded .env file
✅ App token loaded: sKTwtTUl...

✅ Connected to data.cityofberkeley.info
   Using app token (10,000 requests/hour)

📊 Available datasets: ['business_licenses', 'crime_incidents', 'restaurant_inspections', 'building_permits']


In [None]:
# CELL 4: Functions for Data Fetching

def fetch_berkeley_data(dataset_name, limit=10000, filters=None):
    """
    Fetch data from Berkeley Open Data Portal
    
    Parameters:
    -----------
    dataset_name : str
        Name of dataset from DATASETS dict
    limit : int
        Maximum number of records to fetch
    filters : dict
        Optional filters (e.g., {'city': 'Berkeley'})
    
    Returns:
    --------
    pandas.DataFrame
    """
    try:
        dataset_id = DATASETS.get(dataset_name)
        if not dataset_id:
            raise ValueError(f"Unknown dataset: {dataset_name}")
        
        print(f"📥 Fetching {dataset_name} from Berkeley Open Data...")
        
        # Build query parameters
        params = {"$limit": limit}
        if filters:
            # Convert filters to a SoQL WHERE clause; single quotes inside
            # values are doubled so they cannot terminate the string literal
            where_clauses = [
                "{}='{}'".format(k, str(v).replace("'", "''"))
                for k, v in filters.items()
            ]
            params["$where"] = " AND ".join(where_clauses)
        
        # Fetch data
        results = client.get(dataset_id, **params)
        
        # Convert to DataFrame
        df = pd.DataFrame.from_records(results)
        
        print(f"✅ Fetched {len(df)} records")
        return df
        
    except Exception as e:
        print(f"❌ Error fetching data: {e}")
        return None

print("✅ Functions defined")

✅ Functions defined
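
`fetch_berkeley_data` sends a single request capped at `limit`, so a dataset larger than the cap would be silently truncated. One common way around this is to page through results with `$limit` and `$offset`. The sketch below is a minimal illustration, not part of the original notebook: `fetch_all_pages`, `fetch_page`, and `fake_fetch_page` are hypothetical names, and the fake data source stands in for the real API so the example runs offline.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def fetch_all_pages(
    fetch_page: Callable[[int, int], List[Record]],
    page_size: int = 1000,
    max_records: int = 100_000,
) -> List[Record]:
    """Collect records page by page until a short page or max_records is reached."""
    records: List[Record] = []
    offset = 0
    while len(records) < max_records:
        page = fetch_page(page_size, offset)
        records.extend(page)
        if len(page) < page_size:
            break  # a short (or empty) page means the source is exhausted
        offset += page_size
    return records[:max_records]


# Hypothetical stand-in for the real API so the sketch runs offline: 2,500 fake rows.
_FAKE_ROWS = [{"recordid": f"BL-{i:06d}"} for i in range(2500)]


def fake_fetch_page(limit: int, offset: int) -> List[Record]:
    return _FAKE_ROWS[offset:offset + limit]


all_rows = fetch_all_pages(fake_fetch_page, page_size=1000)
print(len(all_rows))
```

In the notebook, `fetch_page` could plausibly wrap `client.get(dataset_id, limit=limit, offset=offset, order=":id")`; a stable sort order is generally advisable when paging so that rows are neither repeated nor skipped between requests.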


In [None]:
# CELL 5: Fetch Business Licenses Data 

# Fetch all business licenses
business_licenses = fetch_berkeley_data('business_licenses', limit=50000)

# Verify BL-005071 is there
if business_licenses is not None:
    bl_005071 = business_licenses[business_licenses['recordid'] == 'BL-005071']
    if len(bl_005071) > 0:
        print("✅ BL-005071 found!")
        print(bl_005071)
    else:
        print("❌ BL-005071 not in data")

if business_licenses is not None:
    # Display basic info
    print("\n📊 Dataset Info:")
    print(f"Shape: {business_licenses.shape}")
    print(f"\nColumns: {business_licenses.columns.tolist()}")
    print(f"\nFirst few records:")
    display(business_licenses.head())
    
    # Check for location data
    if 'location' in business_licenses.columns:
        print("\n✅ Location data available for mapping")
    else:
        print("\n⚠️ No location column found")

📥 Fetching business_licenses from Berkeley Open Data...
✅ Fetched 13004 records
✅ BL-005071 found!
              apn   recordid        busdesc b1_per_sub_type              dba  \
68  052 157301400  BL-005071  OPTICAL STORE    Retail Trade  FOCAL POINT INC   

                            naics tax_code employee_num bus_own_type  \
68  446130 - Optical Goods Stores        R            6  Corporation   

   b1_business_name          b1_address1   b1_city b1_state     b1_zip  \
68  FOCAL POINT INC  2700 RYDIN RD STE A  RICHMOND       CA  948045800   

   b1_contact_type b1_full_address b1_situs_city b1_situs_state b1_situs_zip  \
68  Business Owner  2638 ASHBY AVE      BERKELEY             CA        94705   

   b1_address2  
68         NaN  

📊 Dataset Info:
Shape: (13004, 20)

Columns: ['apn', 'recordid', 'busdesc', 'b1_per_sub_type', 'dba', 'naics', 'tax_code', 'employee_num', 'bus_own_type', 'b1_business_name', 'b1_address1', 'b1_city', 'b1_state', 'b1_zip', 'b1_contact_type'

Unnamed: 0,apn,recordid,busdesc,b1_per_sub_type,dba,naics,tax_code,employee_num,bus_own_type,b1_business_name,b1_address1,b1_city,b1_state,b1_zip,b1_contact_type,b1_full_address,b1_situs_city,b1_situs_state,b1_situs_zip,b1_address2
0,ZZZZZZZZZZZZZ,BL-022624,PAINTING CONTRACTOR,Construction or Contractor,ACCEL PAINTING,238320 - Painting and Wall Covering Contractors,C,2,Corporation,ACCEL PAINTING,106 CORTES CT,HERCULES,CA,94547,Business Owner,0 VARIOUS,BERKELEY,CA,94704,
1,059 226001900,BL-026980,FLORIST-OPEN AIR,Retail Trade,EMILA,453110 - Florists,R,4,Sole Ownership,EMILA,8 KERR AVE,BERKELEY,CA,94707,Business Owner,1527 SHATTUCK Ave,BERKELEY,CA,94709,
2,057 208501700,BL-010435,AUTO COLLISION REPAIRS,Business Personal Repair Svs,PREMIER AUTOBODY,811111 - General Automotive Repair,B,3,Sole Ownership,PREMIER AUTOBODY,1911 SAN PABLO AVE,BERKELEY,CA,94702,Business Owner,1911 SAN PABLO AVE,BERKELEY,CA,94702,
3,058 218600700,BL-022111,RES RENTAL - 15 UNITS,Rental of Real Property,FINGADO PAMELA,531110 - Lessors of Residential Buildings and ...,L,0,Sole Ownership,FINGADO PAMELA,851 SEA VIEW DR,EL CERRITO,CA,94530,Business Owner,2355 HILGARD Ave,BERKELEY,CA,94709,
4,053 170104902,BL-015919,RECORDING & PROD. SVCS,Entertainment Recreation,BOP CITY MUSIC,512240 - Sound Recording Studios,E,0,LLC,BOP CITY MUSIC,2827 RUSSELL ST,BERKELEY,CA,94705-2345,Business Owner,2827 RUSSELL St,BERKELEY,CA,94705,

⚠️ No location column found


In [None]:
# CELL 6: Data Cleaning & Processing

def clean_business_data(df):
    """Clean and process business license data"""
    df_clean = df.copy()
    
    # Normalize APN for joining with parcels
    if 'apn' in df_clean.columns:
        df_clean['apn_normalized'] = df_clean['apn'].astype(str).str.strip()
    
    # Parse business_location if it exists (it's a Location type with coordinates)
    if 'business_location' in df_clean.columns:
        try:
            # business_location is a dict with latitude/longitude
            import json
            
            def extract_coords(loc):
                if pd.isna(loc):
                    return None, None
                try:
                    if isinstance(loc, str):
                        loc = json.loads(loc)
                    if isinstance(loc, dict):
                        return loc.get('latitude'), loc.get('longitude')
                except (TypeError, ValueError):
                    # Malformed JSON or an unexpected type: treat as missing
                    pass
                return None, None
            
            df_clean[['latitude', 'longitude']] = df_clean['business_location'].apply(
                lambda x: pd.Series(extract_coords(x))
            )
            
            coord_count = df_clean['latitude'].notna().sum()
            print(f"✅ Extracted coordinates from business_location: {coord_count} records")
        except Exception as e:
            print(f"⚠️  Could not parse business_location: {e}")
    
    # Convert date columns to datetime
    for col in df_clean.columns:
        if 'date' in col.lower():
            try:
                df_clean[col] = pd.to_datetime(df_clean[col])
            except (ValueError, TypeError):
                # Leave the column unchanged if it does not parse as dates
                pass
    
    print(f"✅ Data cleaned. Shape: {df_clean.shape}")
    
    return df_clean

# Clean the data
if business_licenses is not None:
    business_licenses_clean = clean_business_data(business_licenses)
    # Inspect the cleaned data
    print(f"Cleaned shape: {business_licenses_clean.shape}")
    print(f"\nColumns: {business_licenses_clean.columns.tolist()}")

    # Show sample with correct column names
    display_cols = ['recordid', 'b1_business_name', 'b1_full_address', 'busdesc', 'dba']
    available_cols = [col for col in display_cols if col in business_licenses_clean.columns]
    
    print("\n📋 Sample cleaned data:")
    display(business_licenses_clean[available_cols].head())

✅ Data cleaned. Shape: (13004, 21)

📋 Sample cleaned data:


Unnamed: 0,recordid,b1_business_name,b1_full_address,busdesc
0,BL-022624,ACCEL PAINTING,0 VARIOUS,PAINTING CONTRACTOR
1,BL-026980,EMILA,1527 SHATTUCK Ave,FLORIST-OPEN AIR
2,BL-010435,PREMIER AUTOBODY,1911 SAN PABLO AVE,AUTO COLLISION REPAIRS
3,BL-022111,FINGADO PAMELA,2355 HILGARD Ave,RES RENTAL - 15 UNITS
4,BL-015919,BOP CITY MUSIC,2827 RUSSELL St,RECORDING & PROD. SVCS
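
The sample rows shown earlier contain ZIP codes in at least three shapes: plain five-digit (`94547`), ZIP+4 with a hyphen (`94705-2345`), and nine digits run together (`948045800`). Grouping on the raw `b1_zip` column therefore splits one ZIP code across several buckets. The sketch below shows one possible normalization to the leading five digits; `normalize_zip` is a hypothetical helper, not part of the original cleaning function, and it uses only the standard library so it can be tried in isolation.

```python
import re
from typing import Optional

# Five leading digits, allowing for stray leading whitespace
_ZIP5 = re.compile(r"^\s*(\d{5})")


def normalize_zip(value: object) -> Optional[str]:
    """Return the leading five digits of a ZIP-like value, or None if absent."""
    if value is None:
        return None
    match = _ZIP5.match(str(value))
    return match.group(1) if match else None


for raw in ["94547", "948045800", "94705-2345", None, "N/A"]:
    print(repr(raw), "->", normalize_zip(raw))
```

Inside `clean_business_data` this could be applied as `df_clean['zip5'] = df_clean['b1_zip'].map(normalize_zip)`, where `zip5` is an assumed column name; missing values (`NaN`) also map to `None` because their string form contains no digits.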


In [8]:
# CELL 7: Export to Multiple Formats

if business_licenses_clean is not None:
    # Create output directory
    output_dir = Path('/Users/johngage/berkeley-data')
    output_dir.mkdir(parents=True, exist_ok=True)
    
    timestamp = datetime.now().strftime("%Y%m%d")
    
    # Export to CSV
    csv_path = output_dir / f'business_licenses_{timestamp}.csv'
    business_licenses_clean.to_csv(csv_path, index=False)
    print(f"✅ CSV exported: {csv_path}")
    
    # Export to JSON
    json_path = output_dir / f'business_licenses_{timestamp}.json'
    business_licenses_clean.to_json(json_path, orient='records', indent=2)
    print(f"✅ JSON exported: {json_path}")
    
    # Export to GeoJSON if coordinates exist
    if 'latitude' in business_licenses_clean.columns and 'longitude' in business_licenses_clean.columns:
        geo_df = business_licenses_clean.dropna(subset=['latitude', 'longitude'])
        
        if len(geo_df) > 0:
            import geopandas as gpd
            gdf = gpd.GeoDataFrame(
                geo_df,
                geometry=gpd.points_from_xy(geo_df.longitude, geo_df.latitude),
                crs='EPSG:4326'
            )
            
            geojson_path = output_dir / f'business_licenses_{timestamp}.geojson'
            gdf.to_file(geojson_path, driver='GeoJSON')
            print(f"✅ GeoJSON exported: {geojson_path} ({len(gdf)} records)")
        else:
            print("⚠️  No coordinates available for GeoJSON export")
    
    print(f"\n📊 Exported {len(business_licenses_clean)} business licenses")

✅ CSV exported: /Users/johngage/berkeley-data/business_licenses_20251115.csv
✅ JSON exported: /Users/johngage/berkeley-data/business_licenses_20251115.json

📊 Exported 13004 business licenses


In [11]:
# CELL 8: Load Data into Datasette (SQLite)

def load_to_datasette(df, table_name, db_path='./berkeley.db'):
    """Load DataFrame into SQLite database for Datasette"""
    import sqlite3
    
    # Prepare for SQLite
    df_for_db = df.copy()
    
    # Convert datetime to string
    for col in df_for_db.select_dtypes(include=['datetime64']).columns:
        df_for_db[col] = df_for_db[col].astype(str)
    
    # Drop complex objects (business_location if it's still a dict)
    if 'business_location' in df_for_db.columns:
        try:
            # Try to convert to string if it's a dict/json
            df_for_db['business_location'] = df_for_db['business_location'].astype(str)
        except Exception:
            df_for_db = df_for_db.drop('business_location', axis=1)
    
    # Connect to SQLite
    conn = sqlite3.connect(db_path)
    
    # Load data
    df_for_db.to_sql(table_name, conn, if_exists='replace', index=False)
    
    # Create indexes on key columns (using correct API field names)
    cursor = conn.cursor()
    
    if 'b1_business_name' in df_for_db.columns:
        cursor.execute(f"CREATE INDEX IF NOT EXISTS idx_business_name ON {table_name}(b1_business_name)")
    
    if 'apn_normalized' in df_for_db.columns:
        cursor.execute(f"CREATE INDEX IF NOT EXISTS idx_apn ON {table_name}(apn_normalized)")
    elif 'apn' in df_for_db.columns:
        cursor.execute(f"CREATE INDEX IF NOT EXISTS idx_apn ON {table_name}(apn)")
    
    if 'recordid' in df_for_db.columns:
        cursor.execute(f"CREATE INDEX IF NOT EXISTS idx_recordid ON {table_name}(recordid)")
    
    if 'b1_zip' in df_for_db.columns:
        cursor.execute(f"CREATE INDEX IF NOT EXISTS idx_zip ON {table_name}(b1_zip)")
    
    if 'latitude' in df_for_db.columns:
        cursor.execute(f"CREATE INDEX IF NOT EXISTS idx_lat ON {table_name}(latitude)")
    
    if 'longitude' in df_for_db.columns:
        cursor.execute(f"CREATE INDEX IF NOT EXISTS idx_lon ON {table_name}(longitude)")
    
    conn.commit()
    conn.close()
    
    print(f"✅ Loaded {len(df_for_db)} records into table '{table_name}'")
    print(f"   Database: {db_path}")
    
    # Show which indexes were created
    index_cols = []
    if 'recordid' in df_for_db.columns: index_cols.append('recordid')
    if 'b1_business_name' in df_for_db.columns: index_cols.append('b1_business_name')
    if 'apn_normalized' in df_for_db.columns or 'apn' in df_for_db.columns: index_cols.append('apn')
    if 'latitude' in df_for_db.columns: index_cols.append('latitude')
    
    print(f"✅ Created indexes on: {', '.join(index_cols)}")

if business_licenses_clean is not None:
    load_to_datasette(business_licenses_clean, 'licenses')

✅ Loaded 13004 records into table 'licenses'
   Database: ./berkeley.db
✅ Created indexes on: recordid, b1_business_name, apn
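
The "Created indexes on" line is rebuilt by hand from column checks, so it can drift from what SQLite actually holds: the function also creates `idx_zip` when `b1_zip` is present, yet the summary above does not list it. A more reliable report can be read back from the database itself. The sketch below assumes a hypothetical helper `list_indexes` and uses an in-memory database as a stand-in for `berkeley.db`.

```python
import sqlite3
from typing import List


def list_indexes(conn: sqlite3.Connection, table_name: str) -> List[str]:
    """Return the names of all indexes SQLite reports for a table."""
    # PRAGMA index_list rows are (seq, name, unique, origin, partial)
    rows = conn.execute(f"PRAGMA index_list('{table_name}')").fetchall()
    return sorted(row[1] for row in rows)


# In-memory stand-in for berkeley.db with two of the notebook's indexes
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE licenses (recordid TEXT, b1_zip TEXT)")
conn.execute("CREATE INDEX idx_recordid ON licenses(recordid)")
conn.execute("CREATE INDEX idx_zip ON licenses(b1_zip)")

index_names = list_indexes(conn, "licenses")
print(index_names)
conn.close()
```

Calling such a helper just before `conn.close()` in `load_to_datasette` would let the summary print the real index names instead of a parallel list that has to be kept in sync.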


In [12]:
# CELL 9: Query Data with SQL

def query_datasette_db(query, db_path='./berkeley.db'):
    """Execute SQL query on Datasette database"""
    conn = sqlite3.connect(db_path)
    try:
        return pd.read_sql_query(query, conn)
    finally:
        # Close the connection even when the query raises
        conn.close()

print("📊 Running SQL Queries:\n")

# Query 1: Count by business type
query1 = """
SELECT busdesc as business_type, COUNT(*) as count
FROM licenses
GROUP BY busdesc
ORDER BY count DESC
LIMIT 10
"""
result1 = query_datasette_db(query1)
print("Top 10 Business Types:")
display(result1)

print("\n")

# Query 2: Businesses by ZIP code
query2 = """
SELECT b1_zip as zip_code, COUNT(*) as count
FROM licenses
WHERE b1_zip IS NOT NULL
GROUP BY b1_zip
ORDER BY count DESC
LIMIT 10
"""
result2 = query_datasette_db(query2)
print("Top 10 ZIP Codes:")
display(result2)

print("\n")

# Query 3: Check how many have coordinates
query3 = """
SELECT 
    COUNT(*) as total_licenses,
    COUNT(latitude) as with_coordinates,
    ROUND(100.0 * COUNT(latitude) / COUNT(*), 1) as percent_mapped
FROM licenses
"""
result3 = query_datasette_db(query3)
print("Coordinate Coverage:")
display(result3)

📊 Running SQL Queries:

Top 10 Business Types:


Unnamed: 0,business_type,count
0,GENERAL CONTRACTOR,609
1,COMMERCIAL RENTAL,586
2,RENTAL PROPERTY,333
3,ELECTRICAL CONTRACTOR,290
4,RES. RENTAL - 4 UNITS,202
5,ROOFING CONTRACTOR,146
6,RES. RENTAL - 3 UNITS,140
7,RESTAURANT,131
8,PLUMBING CONTRACTOR,124
9,PSYCHOTHERAPY,120




Top 10 ZIP Codes:


Unnamed: 0,zip_code,count
0,94703,393
1,94705,391
2,94710,381
3,94702,336
4,94704,302
5,94709,278
6,94707,208
7,94706,77
8,94708,77
9,94530,65






DatabaseError: Execution failed on sql '
SELECT 
    COUNT(*) as total_licenses,
    COUNT(latitude) as with_coordinates,
    ROUND(100.0 * COUNT(latitude) / COUNT(*), 1) as percent_mapped
FROM licenses
': no such column: latitude
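
Query 3 fails because the `licenses` table has no `latitude` column: the fetched dataset contained no `business_location` field, so the cleaning step never created coordinates. One way to keep the cell from aborting is to inspect the table schema before running a column-dependent query. The sketch below is an assumption-laden illustration: `table_columns` and `coordinate_coverage` are hypothetical helpers, and an in-memory table stands in for `berkeley.db`.

```python
import sqlite3
from typing import Optional, Set, Tuple


def table_columns(conn: sqlite3.Connection, table_name: str) -> Set[str]:
    """Return the column names SQLite reports for a table."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk)
    rows = conn.execute(f"PRAGMA table_info('{table_name}')").fetchall()
    return {row[1] for row in rows}


def coordinate_coverage(
    conn: sqlite3.Connection, table_name: str = "licenses"
) -> Tuple[int, Optional[int]]:
    """Return (total_rows, rows_with_latitude); the second item is None
    when the table has no latitude column at all."""
    total = conn.execute(f"SELECT COUNT(*) FROM {table_name}").fetchone()[0]
    if "latitude" not in table_columns(conn, table_name):
        return total, None
    with_coords = conn.execute(
        f"SELECT COUNT(latitude) FROM {table_name}"
    ).fetchone()[0]
    return total, with_coords


# In-memory stand-in mirroring the situation above: no latitude column
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE licenses (recordid TEXT)")
conn.executemany(
    "INSERT INTO licenses VALUES (?)", [("BL-005071",), ("BL-022624",)]
)
coverage = coordinate_coverage(conn)
print(coverage)
conn.close()
```

Returning `None` rather than `0` keeps "no coordinate column exists" distinct from "the column exists but every value is null", which matters when deciding whether geocoding is needed.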

In [None]:
# CELL 10: Data Analysis with Pandas & NumPy

if business_licenses_clean is not None:
    print("📊 Data Analysis:\n")
    
    # 1. Business types
    if 'busdesc' in business_licenses_clean.columns:
        print("1. Top Business Types:")
        print(business_licenses_clean['busdesc'].value_counts().head(10))
        print("\n" + "="*60 + "\n")
    
    # 2. Distribution by city
    if 'b1_situs_city' in business_licenses_clean.columns:
        print("2. Businesses by Physical City:")
        print(business_licenses_clean['b1_situs_city'].value_counts())
        print("\n" + "="*60 + "\n")
    
    # 3. Coordinates coverage
    if 'latitude' in business_licenses_clean.columns:
        total = len(business_licenses_clean)
        with_coords = business_licenses_clean['latitude'].notna().sum()
        print(f"3. Location Coverage:")
        print(f"   Total businesses: {total:,}")
        print(f"   With coordinates: {with_coords:,} ({100*with_coords/total:.1f}%)")
        print("\n" + "="*60 + "\n")
    
    # 4. Missing data analysis
    print("4. Missing Data:")
    missing = business_licenses_clean.isnull().sum()
    missing_pct = (missing / len(business_licenses_clean)) * 100
    missing_df = pd.DataFrame({
        'Missing Count': missing,
        'Percentage': missing_pct
    }).sort_values('Missing Count', ascending=False)
    display(missing_df[missing_df['Missing Count'] > 0].head(10))

In [None]:
# CELL 11: Visualization with Seaborn

# The dataset's business-type column is 'busdesc' (there is no 'business_type' column)
if business_licenses_clean is not None and 'busdesc' in business_licenses_clean.columns:
    sns.set_style("whitegrid")
    plt.figure(figsize=(12, 6))
    
    top_types = business_licenses_clean['busdesc'].value_counts().head(15)
    sns.barplot(x=top_types.values, y=top_types.index, palette='viridis')
    plt.title('Top 15 Business Types in Berkeley', fontsize=14, fontweight='bold')
    plt.xlabel('Count')
    plt.ylabel('Business Type')
    plt.tight_layout()
    
    # Save figure
    plt.savefig('/mnt/user-data/outputs/business_types.png', dpi=300, bbox_inches='tight')
    plt.show()
    
    print("✅ Visualization saved")

In [None]:
# CELL 12: Interactive Visualization with Plotly

if business_licenses_clean is not None:
    # Interactive bar chart
    if 'busdesc' in business_licenses_clean.columns:
        top_types = business_licenses_clean['busdesc'].value_counts().head(20)
        
        fig = px.bar(
            x=top_types.values,
            y=top_types.index,
            orientation='h',
            title='Top 20 Business Types in Berkeley',
            labels={'x': 'Number of Businesses', 'y': 'Business Type'},
            color=top_types.values,
            color_continuous_scale='Viridis'
        )
        fig.update_layout(height=600, showlegend=False)
        fig.show()
    
    # Interactive map
    if 'latitude' in business_licenses_clean.columns and 'longitude' in business_licenses_clean.columns:
        map_df = business_licenses_clean.dropna(subset=['latitude', 'longitude']).head(1000)
        
        if len(map_df) > 0:
            fig = px.scatter_mapbox(
                map_df,
                lat='latitude',
                lon='longitude',
                hover_name='b1_business_name' if 'b1_business_name' in map_df.columns else None,
                zoom=12,
                height=600,
                title='Berkeley Business Locations (Sample)'
            )
            fig.update_layout(mapbox_style="open-street-map")
            fig.show()

In [None]:
# CELL 13: Launch Datasette

print("""
🚀 To launch Datasette server:

In a terminal, run:
  datasette ./berkeley.db --metadata /mnt/user-data/outputs/metadata.json --host 0.0.0.0 --port 8001

Then access at: http://localhost:8001
""")

# Create metadata file
# Keys match what Cell 8 created: database file berkeley.db, table 'licenses'
metadata = {
    "title": "Berkeley Open Data",
    "description": "City of Berkeley business licenses and related data",
    "databases": {
        "berkeley": {
            "tables": {
                "licenses": {
                    "title": "Business Licenses",
                    "description": "Active business licenses in Berkeley, CA"
                }
            }
        }
    }
}

with open('/mnt/user-data/outputs/metadata.json', 'w') as f:
    json.dump(metadata, f, indent=2)

print("✅ Metadata file created")

## Summary

This notebook demonstrates:
- ✅ Fetching data from Berkeley Open Data API
- ✅ Data cleaning and processing with Pandas
- ✅ Exporting to CSV, JSON, and GeoJSON
- ✅ Loading into SQLite for Datasette
- ✅ SQL queries for analysis
- ✅ Visualizations with Seaborn and Plotly
- ✅ Interactive maps

### Next Steps:
1. Add more datasets from Berkeley Open Data
2. Create custom dashboards
3. Schedule automated updates
4. Integrate with OSMnx for spatial analysis
5. Build predictive models with TensorFlow