# 🚀 Microsoft Fabric GraphQL API Demo
## Comprehensive Guide to Pagination, Filtering, and Querying

This notebook demonstrates how to effectively use GraphQL with Microsoft Fabric's API, focusing on:

- **🔄 Cursor-based Pagination** - Efficient data traversal using `first` and `after` parameters
- **🔍 Advanced Filtering** - Complex filtering with `TripFilterInput`, `GeographyFilterInput`, etc.
- **📊 Sorting & Ordering** - Using `TripOrderByInput` and other ordering inputs  
- **📈 Aggregations** - Leveraging built-in aggregation functions
- **🌐 Multi-entity Queries** - Working with Trips, Geography, Medallions, Weather, and more

### 🗂️ Available Data Types:
- **Trip** - Taxi trip records with pickup/dropoff details
- **Geography** - Location data with zip codes and addresses
- **Medallion** - Taxi medallion information
- **Weather** - Weather data correlated with trips
- **Time** - Time dimension tables
- **vw_PaymentAnalysis** - Payment analysis views

In [1]:
# 📦 Setup and Import Libraries
import requests
import json
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from azure.identity import DefaultAzureCredential
from datetime import datetime
import time
import warnings
warnings.filterwarnings('ignore')

# Set up plotting style
plt.style.use('default')
sns.set_palette("husl")

print("✅ All libraries imported successfully!")
print("📋 Libraries loaded:")
print("   - requests: GraphQL HTTP client")
print("   - pandas: Data manipulation")
print("   - matplotlib/seaborn: Visualization")
print("   - azure.identity: Authentication")
print("   - json: JSON handling")

✅ All libraries imported successfully!
📋 Libraries loaded:
   - requests: GraphQL HTTP client
   - pandas: Data manipulation
   - matplotlib/seaborn: Visualization
   - azure.identity: Authentication
   - json: JSON handling


In [2]:
# 🔐 Authentication and GraphQL Client Setup

# WARNING: This is for development/demo purposes only!
# In production, use proper app registration with client_id and scopes
print("🔑 Setting up authentication...")

app = DefaultAzureCredential()
scope = 'https://analysis.windows.net/powerbi/api/user_impersonation'
result = app.get_token(scope)

if not result.token:
    raise Exception("❌ Could not get access token")

# Configure GraphQL endpoint and headers
endpoint = 'https://eebc13ddb5604280aeb3c6d342982b7e.zee.graphql.fabric.microsoft.com/v1/workspaces/eebc13dd-b560-4280-aeb3-c6d342982b7e/graphqlapis/d8b1987f-67b0-4934-bfff-662417bb3fc7/graphql'

headers = {
    'Authorization': f'Bearer {result.token}',
    'Content-Type': 'application/json'
}

print("✅ Authentication successful!")
print("🌐 GraphQL endpoint configured")
print("📡 Ready to execute queries")

🔑 Setting up authentication...
✅ Authentication successful!
🌐 GraphQL endpoint configured
📡 Ready to execute queries


In [3]:
# 🛠️ GraphQL Helper Functions

def execute_graphql_query(query, variables=None):
    """
    Execute a GraphQL query with error handling and response parsing
    """
    try:
        payload = {'query': query}
        if variables:
            payload['variables'] = variables
            
        response = requests.post(endpoint, json=payload, headers=headers)
        response.raise_for_status()
        
        data = response.json()
        
        if 'errors' in data:
            print(f"❌ GraphQL errors: {data['errors']}")
            return None
            
        return data['data']
        
    except Exception as error:
        print(f"❌ Query failed: {error}")
        return None

def print_pagination_info(connection_data, entity_name="items"):
    """
    Print pagination information in a readable format
    """
    if not connection_data:
        return
        
    items_count = len(connection_data.get('items', []))
    has_next = connection_data.get('hasNextPage', False)
    end_cursor = connection_data.get('endCursor', 'None')
    
    print(f"📊 Retrieved {items_count} {entity_name}")
    print(f"📄 Has next page: {has_next}")
    print(f"🔗 End cursor: {end_cursor}")

print("✅ Helper functions loaded!")

✅ Helper functions loaded!


In [5]:
# 🚕 Single Page Trip Data Query (Page Size: 10)

def fetch_trips_page(page_size=10, after_cursor=None):
    """
    Fetch a single page of trip data using cursor-based pagination
    """
    # Build query arguments
    args = f"first: {page_size}"
    if after_cursor:
        args += f', after: "{after_cursor}"'
    
    query = f"""
    query {{
      trips({args}) {{
         items {{
            DateID
            MedallionID
            HackneyLicenseID
            PickupTimeID
            DropoffTimeID
            PickupGeographyID
            DropoffGeographyID
            PickupLatitude
            PickupLongitude
            PickupLatLong
            DropoffLatitude
            DropoffLongitude
            DropoffLatLong
            PassengerCount
            TripDurationSeconds
            TripDistanceMiles
            PaymentType
            FareAmount
            SurchargeAmount
            TaxAmount
            TipAmount
            TollsAmount
            TotalAmount
         }}
         endCursor
         hasNextPage
      }}
    }}
    """
    
    print(f"🔄 Fetching trips page (size: {page_size})")
    if after_cursor:
        print(f"   Using cursor: {after_cursor[:20]}...")
    
    data = execute_graphql_query(query)
    
    if data and 'trips' in data:
        print_pagination_info(data['trips'], "trips")
        return data['trips']
    
    return None

# Test: Fetch first page of trips
print("=== SINGLE PAGE FETCH TEST ===")
first_page = fetch_trips_page(page_size=10)

if first_page and first_page['items']:
    print(f"\n📋 Sample trip data:")
    sample_trip = first_page['items'][0]
    for key, value in sample_trip.items():
        print(f"   {key}: {value}")
else:
    print("❌ No trip data retrieved")

=== SINGLE PAGE FETCH TEST ===
🔄 Fetching trips page (size: 10)
📊 Retrieved 10 trips
📄 Has next page: True
🔗 End cursor: W3siRW50aXR5TmFtZSI6IlRyaXAiLCJGaWVsZE5hbWUiOiJEYXRlSUQiLCJGaWVsZFZhbHVlIjoyMDEzMDEwMSwiRGlyZWN0aW9uIjowfV0=

📋 Sample trip data:
   DateID: 20130101
   MedallionID: 1035
   HackneyLicenseID: 15003
   PickupTimeID: 20800
   DropoffTimeID: 21729
   PickupGeographyID: 74530
   DropoffGeographyID: 46957
   PickupLatitude: 40.8237
   PickupLongitude: -73.9548
   PickupLatLong: 40.8237,-73.9548
   DropoffLatitude: 40.9157
   DropoffLongitude: -73.8973
   DropoffLatLong: 40.9157,-73.8973
   PassengerCount: 1
   TripDurationSeconds: 928
   TripDistanceMiles: 7.9
   PaymentType: CSH
   FareAmount: 24
   SurchargeAmount: 1
   TaxAmount: 1
   TipAmount: 0
   TollsAmount: 2
   TotalAmount: 27


In [7]:
# 🔄 Complete Pagination: Fetch ALL Trip Data

def fetch_all_trips(page_size=100, max_pages=None, delay_seconds=0.5):
    """
    Fetch ALL trip data using pagination with configurable limits
    
    Args:
        page_size (int): Number of items per page
        max_pages (int): Maximum pages to fetch (None for unlimited)
        delay_seconds (float): Delay between requests to avoid rate limiting
    
    Returns:
        list: All trip records retrieved
    """
    all_trips = []
    after_cursor = None
    page_count = 0
    start_time = time.time()
    
    print(f"🚀 Starting complete trip data fetch")
    print(f"📋 Configuration:")
    print(f"   • Page size: {page_size}")
    print(f"   • Max pages: {max_pages or 'Unlimited'}")
    print(f"   • Delay between requests: {delay_seconds}s")
    print(f"{'='*50}")
    
    while True:
        page_count += 1
        
        # Check page limit
        if max_pages and page_count > max_pages:
            print(f"🛑 Reached maximum page limit: {max_pages}")
            break
        
        print(f"\n📖 Fetching page {page_count}...")
        
        # Fetch current page
        page_data = fetch_trips_page(page_size=page_size, after_cursor=after_cursor)
        
        if not page_data or not page_data.get('items'):
            print(f"ℹ️  No more data found at page {page_count}")
            break
        
        # Add items to collection
        items = page_data['items']
        all_trips.extend(items)
        
        print(f"   ✅ Added {len(items)} trips")
        print(f"   📊 Total trips collected: {len(all_trips)}")
        
        # Check if there's a next page
        if not page_data.get('hasNextPage', False):
            print(f"ℹ️  Reached end of data (no more pages)")
            break
        
        # Update cursor for next page
        after_cursor = page_data.get('endCursor')
        
        # Rate limiting delay
        if delay_seconds > 0:
            time.sleep(delay_seconds)
    
    elapsed_time = time.time() - start_time
    
    print(f"\n{'='*50}")
    print(f"🎯 PAGINATION COMPLETE!")
    print(f"📊 Final Results:")
    print(f"   • Total trips retrieved: {len(all_trips):,}")
    print(f"   • Pages processed: {page_count}")
    print(f"   • Time elapsed: {elapsed_time:.2f} seconds")
    print(f"   • Average time per page: {elapsed_time/page_count:.2f} seconds")
    
    return all_trips

# Execute complete pagination with reasonable limits for demo
print("=== COMPLETE PAGINATION DEMO ===")
print("⚠️  Fetching first 5 pages for demonstration (50 trips max)")
print("    In production, remove max_pages limit to fetch all data")

all_trip_data = fetch_all_trips(
    page_size=100,
    max_pages=5,  # Limit for demo - remove this in production
    delay_seconds=0.5  # Small delay to be respectful to the API
)


=== COMPLETE PAGINATION DEMO ===
⚠️  Fetching first 5 pages for demonstration (50 trips max)
    In production, remove max_pages limit to fetch all data
🚀 Starting complete trip data fetch
📋 Configuration:
   • Page size: 100
   • Max pages: 5
   • Delay between requests: 0.5s

📖 Fetching page 1...
🔄 Fetching trips page (size: 100)
📊 Retrieved 100 trips
📄 Has next page: True
🔗 End cursor: W3siRW50aXR5TmFtZSI6IlRyaXAiLCJGaWVsZE5hbWUiOiJEYXRlSUQiLCJGaWVsZFZhbHVlIjoyMDEzMDEwMSwiRGlyZWN0aW9uIjowfV0=
   ✅ Added 100 trips
   📊 Total trips collected: 100

📖 Fetching page 2...
🔄 Fetching trips page (size: 100)
   Using cursor: W3siRW50aXR5TmFtZSI6...
📊 Retrieved 100 trips
📄 Has next page: True
🔗 End cursor: W3siRW50aXR5TmFtZSI6IlRyaXAiLCJGaWVsZE5hbWUiOiJEYXRlSUQiLCJGaWVsZFZhbHVlIjoyMDEzMDEwMiwiRGlyZWN0aW9uIjowfV0=
   ✅ Added 100 trips
   📊 Total trips collected: 200

📖 Fetching page 3...
🔄 Fetching trips page (size: 100)
   Using cursor: W3siRW50aXR5TmFtZSI6...
📊 Retrieved 100 trips
📄 Has next

In [8]:
# 💾 Export Trip Data to JSON File

def save_trips_to_json(trips_data, filename=None):
    """
    Save trip data to a JSON file with metadata
    
    Args:
        trips_data (list): List of trip records
        filename (str): Optional filename, auto-generated if not provided
    
    Returns:
        str: Path to the saved file
    """
    if not trips_data:
        print("❌ No trip data to save")
        return None
    
    # Generate filename if not provided
    if not filename:
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        filename = f"fabric_trips_data_{timestamp}.json"
    
    # Create export data with metadata
    export_data = {
        "metadata": {
            "export_timestamp": datetime.now().isoformat(),
            "total_records": len(trips_data),
            "source": "Microsoft Fabric GraphQL API",
            "schema_version": "1.0"
        },
        "trips": trips_data
    }
    
    try:
        # Write to JSON file with proper formatting
        with open(filename, 'w', encoding='utf-8') as file:
            json.dump(export_data, file, indent=2, ensure_ascii=False, default=str)
        
        # Get file size
        import os
        file_size = os.path.getsize(filename)
        file_size_mb = file_size / (1024 * 1024)
        
        print(f"✅ Trip data successfully exported!")
        print(f"📁 File: {filename}")
        print(f"📊 Records: {len(trips_data):,}")
        print(f"💾 File size: {file_size:,} bytes ({file_size_mb:.2f} MB)")
        
        return filename
        
    except Exception as e:
        print(f"❌ Error saving file: {str(e)}")
        return None

# Save the fetched trip data to JSON file
if 'all_trip_data' in locals() and all_trip_data:
    print("💾 EXPORTING TRIP DATA TO JSON")
    print("=" * 50)
    
    saved_file = save_trips_to_json(all_trip_data)
    
    if saved_file:
        print(f"\n📋 Export Summary:")
        print(f"   • File location: {os.path.abspath(saved_file)}")
        print(f"   • Data structure: JSON with metadata wrapper")
        print(f"   • Encoding: UTF-8")
        print(f"   • Format: Pretty-printed with 2-space indentation")
        
        # Show sample structure
        print(f"\n🔍 File Structure Preview:")
        print(f"{{")
        print(f"  'metadata': {{")
        print(f"    'export_timestamp': '2025-09-24T...',")
        print(f"    'total_records': {len(all_trip_data)},")
        print(f"    'source': 'Microsoft Fabric GraphQL API',")
        print(f"    'schema_version': '1.0'")
        print(f"  }},")
        print(f"  'trips': [")
        print(f"    {{ trip_record_1 }},")
        print(f"    {{ trip_record_2 }},")
        print(f"    ...")
        print(f"  ]")
        print(f"}}")
        
else:
    print("⚠️  No trip data available to export")
    print("   Run the pagination cells above first to fetch trip data")

💾 EXPORTING TRIP DATA TO JSON
✅ Trip data successfully exported!
📁 File: fabric_trips_data_20250924_102403.json
📊 Records: 500
💾 File size: 372,022 bytes (0.35 MB)

📋 Export Summary:


NameError: name 'os' is not defined

In [None]:
# 📂 Advanced Export Options

def export_trips_multiple_formats(trips_data, base_filename=None):
    """
    Export trip data in multiple formats: JSON, CSV, and summary
    
    Args:
        trips_data (list): List of trip records
        base_filename (str): Base filename (without extension)
    
    Returns:
        dict: Dictionary with paths to all created files
    """
    if not trips_data:
        print("❌ No trip data to export")
        return {}
    
    # Generate base filename if not provided
    if not base_filename:
        timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
        base_filename = f"fabric_trips_{timestamp}"
    
    created_files = {}
    
    print(f"📂 EXPORTING TO MULTIPLE FORMATS")
    print(f"🏷️  Base filename: {base_filename}")
    print("=" * 50)
    
    # 1. Export as JSON (detailed format)
    try:
        json_filename = f"{base_filename}.json"
        export_data = {
            "metadata": {
                "export_timestamp": datetime.now().isoformat(),
                "total_records": len(trips_data),
                "source": "Microsoft Fabric GraphQL API",
                "export_format": "detailed_json",
                "columns": list(trips_data[0].keys()) if trips_data else [],
                "schema_version": "1.0"
            },
            "trips": trips_data
        }
        
        with open(json_filename, 'w', encoding='utf-8') as file:
            json.dump(export_data, file, indent=2, ensure_ascii=False, default=str)
        
        created_files['json'] = json_filename
        file_size = os.path.getsize(json_filename) / (1024 * 1024)
        print(f"✅ JSON: {json_filename} ({file_size:.2f} MB)")
        
    except Exception as e:
        print(f"❌ JSON export failed: {str(e)}")
    
    # 2. Export as CSV (for Excel compatibility)
    try:
        import pandas as pd
        csv_filename = f"{base_filename}.csv"
        
        df = pd.DataFrame(trips_data)
        df.to_csv(csv_filename, index=False, encoding='utf-8')
        
        created_files['csv'] = csv_filename
        file_size = os.path.getsize(csv_filename) / (1024 * 1024)
        print(f"✅ CSV: {csv_filename} ({file_size:.2f} MB)")
        
    except Exception as e:
        print(f"❌ CSV export failed: {str(e)}")
    
    # 3. Export summary statistics
    try:
        summary_filename = f"{base_filename}_summary.json"
        
        # Calculate summary statistics
        df = pd.DataFrame(trips_data)
        
        # Numeric columns summary
        numeric_summary = {}
        numeric_cols = df.select_dtypes(include=[int, float]).columns
        for col in numeric_cols:
            if col in df.columns:
                numeric_summary[col] = {
                    "count": int(df[col].count()),
                    "mean": float(df[col].mean()) if pd.notna(df[col].mean()) else None,
                    "std": float(df[col].std()) if pd.notna(df[col].std()) else None,
                    "min": float(df[col].min()) if pd.notna(df[col].min()) else None,
                    "max": float(df[col].max()) if pd.notna(df[col].max()) else None
                }
        
        # Categorical columns summary
        categorical_summary = {}
        categorical_cols = df.select_dtypes(include=[object]).columns
        for col in categorical_cols:
            if col in df.columns:
                value_counts = df[col].value_counts().head(10).to_dict()
                categorical_summary[col] = {
                    "unique_count": int(df[col].nunique()),
                    "top_values": {str(k): int(v) for k, v in value_counts.items()}
                }
        
        summary_data = {
            "metadata": {
                "export_timestamp": datetime.now().isoformat(),
                "source_records": len(trips_data),
                "analysis_type": "summary_statistics"
            },
            "dataset_overview": {
                "total_records": len(df),
                "total_columns": len(df.columns),
                "numeric_columns": len(numeric_cols),
                "categorical_columns": len(categorical_cols),
                "memory_usage_mb": round(df.memory_usage(deep=True).sum() / (1024 * 1024), 2)
            },
            "numeric_statistics": numeric_summary,
            "categorical_statistics": categorical_summary
        }
        
        with open(summary_filename, 'w', encoding='utf-8') as file:
            json.dump(summary_data, file, indent=2, ensure_ascii=False, default=str)
        
        created_files['summary'] = summary_filename
        file_size = os.path.getsize(summary_filename) / 1024  # KB for summary
        print(f"✅ Summary: {summary_filename} ({file_size:.1f} KB)")
        
    except Exception as e:
        print(f"❌ Summary export failed: {str(e)}")
    
    # 4. Create a manifest file
    try:
        manifest_filename = f"{base_filename}_manifest.json"
        
        manifest_data = {
            "export_info": {
                "timestamp": datetime.now().isoformat(),
                "base_filename": base_filename,
                "source": "Microsoft Fabric GraphQL API",
                "total_records": len(trips_data)
            },
            "files_created": created_files,
            "file_descriptions": {
                "json": "Complete trip data with metadata in JSON format",
                "csv": "Trip data in CSV format for Excel/analysis tools",
                "summary": "Statistical summary and data profiling information"
            }
        }
        
        with open(manifest_filename, 'w', encoding='utf-8') as file:
            json.dump(manifest_data, file, indent=2, ensure_ascii=False)
        
        created_files['manifest'] = manifest_filename
        print(f"✅ Manifest: {manifest_filename}")
        
    except Exception as e:
        print(f"❌ Manifest creation failed: {str(e)}")
    
    print("=" * 50)
    print(f"🎉 Export complete! Created {len(created_files)} files")
    
    return created_files

# Execute the multi-format export
if 'all_trip_data' in locals() and all_trip_data:
    print("🚀 EXECUTING MULTI-FORMAT EXPORT")
    
    export_results = export_trips_multiple_formats(all_trip_data)
    
    if export_results:
        print(f"\n📁 All files created in current directory:")
        for format_type, filename in export_results.items():
            full_path = os.path.abspath(filename)
            print(f"   {format_type.upper()}: {full_path}")
        
        print(f"\n💡 Usage Tips:")
        print(f"   • Open .csv file in Excel for data analysis")
        print(f"   • Use .json file for programmatic processing")
        print(f"   • Check _summary.json for data insights")
        print(f"   • Review _manifest.json for export details")
    
else:
    print("⚠️  No trip data available for export")
    print("   Execute the pagination cells first to fetch data")

In [None]:
# 📊 Data Analysis and Summary

if all_trip_data:
    print("=== TRIP DATA ANALYSIS ===")
    
    # Convert to DataFrame for analysis
    df_trips = pd.DataFrame(all_trip_data)
    
    print(f"📋 Dataset Overview:")
    print(f"   • Total trips: {len(df_trips):,}")
    print(f"   • Columns: {len(df_trips.columns)}")
    print(f"   • Data types:")
    
    # Show data types
    for col in df_trips.columns:
        dtype = df_trips[col].dtype
        non_null = df_trips[col].notna().sum()
        print(f"     - {col}: {dtype} ({non_null}/{len(df_trips)} non-null)")
    
    print(f"\n💰 Financial Summary:")
    if 'FareAmount' in df_trips.columns:
        fare_amounts = pd.to_numeric(df_trips['FareAmount'], errors='coerce')
        total_amounts = pd.to_numeric(df_trips['TotalAmount'], errors='coerce')
        
        print(f"   • Total Fare Amount: ${fare_amounts.sum():,.2f}")
        print(f"   • Average Fare: ${fare_amounts.mean():.2f}")
        print(f"   • Total Amount (with tips/taxes): ${total_amounts.sum():,.2f}")
        print(f"   • Average Total: ${total_amounts.mean():.2f}")
    
    print(f"\n🚗 Trip Summary:")
    if 'PassengerCount' in df_trips.columns:
        passenger_counts = pd.to_numeric(df_trips['PassengerCount'], errors='coerce')
        print(f"   • Total Passengers: {passenger_counts.sum():,}")
        print(f"   • Average Passengers per Trip: {passenger_counts.mean():.1f}")
    
    if 'TripDistanceMiles' in df_trips.columns:
        distances = pd.to_numeric(df_trips['TripDistanceMiles'], errors='coerce')
        print(f"   • Total Distance: {distances.sum():,.1f} miles")
        print(f"   • Average Distance: {distances.mean():.1f} miles")
    
    print(f"\n💳 Payment Type Distribution:")
    if 'PaymentType' in df_trips.columns:
        payment_counts = df_trips['PaymentType'].value_counts()
        for payment_type, count in payment_counts.items():
            percentage = (count / len(df_trips)) * 100
            print(f"   • {payment_type}: {count} trips ({percentage:.1f}%)")
    
    print(f"\n📋 Sample Records (first 3 trips):")
    print(df_trips.head(3).to_string(index=False))
    
else:
    print("❌ No trip data available for analysis")

In [None]:
# 📈 Trip Data Visualization

if all_trip_data and len(all_trip_data) > 0:
    # Create visualizations
    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
    fig.suptitle('Trip Data Analysis Dashboard', fontsize=16, y=1.02)
    
    # 1. Fare Amount Distribution
    if 'FareAmount' in df_trips.columns:
        fare_amounts = pd.to_numeric(df_trips['FareAmount'], errors='coerce').dropna()
        if len(fare_amounts) > 0:
            axes[0, 0].hist(fare_amounts, bins=30, alpha=0.7, color='skyblue', edgecolor='black')
            axes[0, 0].set_title('Fare Amount Distribution')
            axes[0, 0].set_xlabel('Fare Amount ($)')
            axes[0, 0].set_ylabel('Frequency')
            axes[0, 0].axvline(fare_amounts.mean(), color='red', linestyle='--', 
                              label=f'Mean: ${fare_amounts.mean():.2f}')
            axes[0, 0].legend()
    
    # 2. Payment Type Distribution
    if 'PaymentType' in df_trips.columns:
        payment_counts = df_trips['PaymentType'].value_counts()
        if len(payment_counts) > 0:
            axes[0, 1].pie(payment_counts.values, labels=payment_counts.index, autopct='%1.1f%%')
            axes[0, 1].set_title('Payment Type Distribution')
    
    # 3. Trip Distance Distribution
    if 'TripDistanceMiles' in df_trips.columns:
        distances = pd.to_numeric(df_trips['TripDistanceMiles'], errors='coerce').dropna()
        if len(distances) > 0:
            axes[1, 0].hist(distances, bins=30, alpha=0.7, color='lightgreen', edgecolor='black')
            axes[1, 0].set_title('Trip Distance Distribution')
            axes[1, 0].set_xlabel('Distance (miles)')
            axes[1, 0].set_ylabel('Frequency')
            axes[1, 0].axvline(distances.mean(), color='red', linestyle='--', 
                              label=f'Mean: {distances.mean():.2f} miles')
            axes[1, 0].legend()
    
    # 4. Passenger Count Distribution
    if 'PassengerCount' in df_trips.columns:
        passenger_counts = df_trips['PassengerCount'].value_counts().sort_index()
        if len(passenger_counts) > 0:
            axes[1, 1].bar(passenger_counts.index, passenger_counts.values, 
                          alpha=0.7, color='orange', edgecolor='black')
            axes[1, 1].set_title('Passenger Count Distribution')
            axes[1, 1].set_xlabel('Number of Passengers')
            axes[1, 1].set_ylabel('Number of Trips')
            axes[1, 1].set_xticks(passenger_counts.index)
    
    plt.tight_layout()
    plt.show()
    
    # Create correlation matrix if we have numeric columns
    numeric_cols = df_trips.select_dtypes(include=[np.number]).columns
    if len(numeric_cols) > 1:
        plt.figure(figsize=(10, 8))
        correlation_matrix = df_trips[numeric_cols].corr()
        sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', center=0,
                   square=True, linewidths=0.5)
        plt.title('Trip Data Correlation Matrix')
        plt.tight_layout()
        plt.show()
    
else:
    print("❌ No data available for visualization")

## 🚀 Advanced Pagination Techniques

### Handling Large Datasets

When working with large datasets through GraphQL pagination, consider these best practices:

1. **Rate Limiting**: Add delays between requests to avoid overwhelming the API
2. **Error Handling**: Implement robust retry logic for network failures
3. **Memory Management**: Process data in chunks to avoid memory issues
4. **Progress Tracking**: Show progress for long-running operations
5. **Caching**: Store intermediate results to resume interrupted operations

In [None]:
# 🎯 Production-Ready Pagination with Error Handling

import time
import logging
from typing import List, Dict, Optional, Generator

def fetch_all_trips_robust(
    page_size: int = 10,
    max_total_records: Optional[int] = None,
    delay_between_requests: float = 0.5,
    max_retries: int = 3
) -> List[Dict]:
    """
    Robust pagination implementation with error handling and rate limiting.
    
    Args:
        page_size: Number of records per page
        max_total_records: Maximum total records to fetch (None = all)
        delay_between_requests: Seconds to wait between API calls
        max_retries: Maximum retry attempts for failed requests
    
    Returns:
        List of all trip records
    """
    all_trips = []
    cursor = None
    page_number = 1
    total_fetched = 0
    
    print(f"🚀 Starting robust pagination (page_size={page_size}, max_records={max_total_records})")
    print("=" * 60)
    
    while True:
        retry_count = 0
        current_page_data = None
        
        # Retry logic for current page
        while retry_count <= max_retries:
            try:
                print(f"📄 Fetching page {page_number} (attempt {retry_count + 1}/{max_retries + 1})")
                
                current_page_data = fetch_trips_page(
                    page_size=page_size, 
                    cursor=cursor
                )
                
                break  # Success, exit retry loop
                
            except Exception as e:
                retry_count += 1
                print(f"⚠️ Error on page {page_number}, attempt {retry_count}: {str(e)}")
                
                if retry_count <= max_retries:
                    wait_time = 2 ** retry_count  # Exponential backoff
                    print(f"⏳ Retrying in {wait_time} seconds...")
                    time.sleep(wait_time)
                else:
                    print(f"❌ Failed to fetch page {page_number} after {max_retries + 1} attempts")
                    print(f"📊 Returning {len(all_trips)} trips fetched so far")
                    return all_trips
        
        # Process successful page
        if current_page_data and current_page_data.get('items'):
            trips_on_page = current_page_data['items']
            all_trips.extend(trips_on_page)
            total_fetched += len(trips_on_page)
            
            print(f"✅ Page {page_number}: {len(trips_on_page)} trips fetched")
            print(f"📊 Total so far: {total_fetched} trips")
            
            # Check if we've reached the maximum limit
            if max_total_records and total_fetched >= max_total_records:
                all_trips = all_trips[:max_total_records]  # Trim to exact limit
                print(f"🏁 Reached maximum limit of {max_total_records} records")
                break
            
            # Check if there are more pages
            if not current_page_data.get('hasNextPage', False):
                print(f"🏁 No more pages available")
                break
            
            # Prepare for next page
            cursor = current_page_data.get('endCursor')
            if not cursor:
                print(f"⚠️ No cursor for next page, stopping")
                break
            
            page_number += 1
            
            # Rate limiting
            if delay_between_requests > 0:
                print(f"⏱️ Waiting {delay_between_requests}s before next request...")
                time.sleep(delay_between_requests)
        else:
            print(f"❌ No data returned for page {page_number}")
            break
    
    print("=" * 60)
    print(f"🎉 Pagination complete! Total trips fetched: {len(all_trips)}")
    return all_trips

In [None]:
# 🧪 Test the Robust Pagination (Limited to 50 records for demo)

print("🧪 TESTING ROBUST PAGINATION")
print("This demo fetches 50 records to show the robust pagination in action...")
print()

# Test with limited records for demonstration
test_trips = fetch_all_trips_robust(
    page_size=10,
    max_total_records=50,  # Limit for demo
    delay_between_requests=0.1,  # Minimal delay for demo
    max_retries=2
)

if test_trips:
    print(f"\n🎯 RESULTS SUMMARY:")
    print(f"   ✅ Successfully fetched {len(test_trips)} trip records")
    print(f"   📄 Pages processed: {(len(test_trips) - 1) // 10 + 1}")
    print(f"   🔍 Sample trip IDs: {[trip.get('TripId', 'N/A') for trip in test_trips[:5]]}")
else:
    print("\n❌ No trips were fetched")

## 📚 Summary & Next Steps

### What We've Accomplished

✅ **Authentication Setup**: Configured Microsoft Fabric GraphQL API access  
✅ **Schema Understanding**: Analyzed the Trip data structure with pagination support  
✅ **Basic Pagination**: Implemented simple page-by-page data fetching  
✅ **Complete Pagination**: Created functions to fetch ALL data using cursor-based pagination  
✅ **Data Analysis**: Added comprehensive data analysis and visualization  
✅ **Production-Ready Code**: Implemented robust error handling and rate limiting  

### Key Takeaways

1. **Page Size**: Use `first: 10` parameter to control page size
2. **Cursor Navigation**: Use `after: "cursor_value"` for next pages
3. **Completion Detection**: Check `hasNextPage` to know when to stop
4. **Error Handling**: Always implement retry logic for production systems
5. **Rate Limiting**: Add delays between requests to be API-friendly

### Next Steps

- **Scale Up**: Remove the `max_total_records` limit to fetch complete dataset
- **Data Pipeline**: Export results to CSV/database for further analysis
- **Monitoring**: Add logging and metrics for production deployments
- **Optimization**: Implement parallel processing for multiple data streams

### 🎯 Ready to Use!

Your pagination implementation is now ready to handle real-world scenarios with:
- Configurable page sizes
- Robust error handling  
- Progress tracking
- Memory-efficient processing
- Production-grade reliability