# Simulated Visit Generator

The function below generates a simulated visit to a website. Each visit contains a collection of events of three types: page view (`page_view`), add item to cart (`add_item_to_cart`), and purchase (`purchase`).

All visits have page view events. Some visits also have add-to-cart events, and some of the visits with add-to-cart events end with a purchase.

The OpenAPI schema for a Visit is shown below. 

```yaml
openapi: 3.0.0
info:
  title: Visit Schema API
  version: 1.0.0
  description: Schema for representing a visit to a website, including page views, adding items to a cart, and purchases.
paths: {}
components:
  schemas:
    Visit:
      type: object
      properties:
        session_id:
          type: string
          example: "SID-1234"
          description: "A unique identifier for the user's session."
        user_id:
          type: string
          example: "UID-5678"
          description: "A unique identifier for the user visiting the website."
        device_type:
          type: string
          enum: [desktop, mobile, tablet]
          example: "desktop"
          description: "The type of device used by the user."
        geolocation:
          type: string
          example: "37.7749,-122.4194"
          description: "The geolocation of the user in latitude,longitude format."
        user_agent:
          type: string
          example: "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
          description: "The user agent string of the browser/device used by the user."
        events:
          type: array
          items:
            $ref: '#/components/schemas/Event'
          description: "List of events during the user's visit."

    Event:
      type: object
      properties:
        event_type:
          type: string
          enum: [page_view, add_item_to_cart, purchase]
          example: "page_view"
          description: "The type of event that occurred."
        timestamp:
          type: string
          format: date-time
          example: "2023-08-10T12:34:56Z"
          description: "The exact time when the event occurred."
        details:
          type: object
          oneOf:
            - $ref: '#/components/schemas/PageViewDetails'
            - $ref: '#/components/schemas/AddItemToCartDetails'
            - $ref: '#/components/schemas/PurchaseDetails'
          description: "Specific details of the event based on its type."

    PageViewDetails:
      type: object
      properties:
        page_url:
          type: string
          example: "https://example.com/products"
          description: "The URL of the webpage that was viewed."
        referrer_url:
          type: string
          nullable: true
          example: "https://google.com"
          description: "The URL of the referrer page that led to this page view, or null if none."

    AddItemToCartDetails:
      type: object
      properties:
        product_id:
          type: string
          example: "HDW-001"
          description: "The unique identifier of the product added to the cart."
        product_name:
          type: string
          example: "Laptop X200"
          description: "The name of the product added to the cart."
        category:
          type: string
          enum: [hardware, software, peripherals]
          example: "hardware"
          description: "The category of the product added to the cart."
        price:
          type: number
          format: float
          example: 999.99
          description: "The price of the product added to the cart."
        quantity:
          type: integer
          example: 2
          description: "The quantity of the product added to the cart."

    PurchaseDetails:
      type: object
      properties:
        order_id:
          type: string
          example: "ORD-4321"
          description: "A unique identifier for the order."
        amount:
          type: number
          format: float
          example: 1999.98
          description: "The total amount of the purchase."
        currency:
          type: string
          example: "USD"
          description: "The currency used for the purchase."
        items:
          type: array
          items:
            $ref: '#/components/schemas/PurchaseItem'
          description: "A list of items purchased in this order."

    PurchaseItem:
      type: object
      properties:
        product_id:
          type: string
          example: "HDW-001"
          description: "The unique identifier of the product purchased."
        product_name:
          type: string
          example: "Laptop X200"
          description: "The name of the product purchased."
        category:
          type: string
          enum: [hardware, software, peripherals]
          example: "hardware"
          description: "The category of the product purchased."
        price:
          type: number
          format: float
          example: 999.99
          description: "The price of the product purchased."
        quantity:
          type: integer
          example: 2
          description: "The quantity of the product purchased."

```
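
A generated visit can be checked against the shape this schema describes with a few lines of plain Python. The sketch below is a hand-written structural check, not a full OpenAPI validator. The helper name `check_visit` is hypothetical, and because the schema lists no `required` fields, treating every listed property as mandatory is an assumption. The generator later in this document nests each event under an `event` key, so the helper unwraps that key when it is present.

```python
# Hypothetical helper: a structural check based on the schema's enums and property lists.
DEVICE_TYPES = {"desktop", "mobile", "tablet"}
EVENT_TYPES = {"page_view", "add_item_to_cart", "purchase"}

# Assumption: every property listed for a details object is treated as mandatory.
DETAIL_KEYS = {
    "page_view": {"page_url", "referrer_url"},
    "add_item_to_cart": {"product_id", "product_name", "category", "price", "quantity"},
    "purchase": {"order_id", "amount", "currency", "items"},
}


def check_visit(visit):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for key in ("session_id", "user_id", "device_type", "geolocation", "user_agent", "events"):
        if key not in visit:
            problems.append(f"missing top-level key: {key}")
    if "device_type" in visit and visit["device_type"] not in DEVICE_TYPES:
        problems.append(f"unexpected device_type: {visit['device_type']!r}")
    for i, event in enumerate(visit.get("events", [])):
        # Tolerate an outer {"event": {...}} wrapper if one is present.
        event = event.get("event", event)
        event_type = event.get("event_type")
        if event_type not in EVENT_TYPES:
            problems.append(f"events[{i}]: unexpected event_type {event_type!r}")
            continue
        missing = DETAIL_KEYS[event_type] - set(event.get("details", {}))
        if missing:
            problems.append(f"events[{i}]: details missing {sorted(missing)}")
    return problems


# A hand-written visit built from the example values in the schema.
sample = {
    "session_id": "SID-1234",
    "user_id": "UID-5678",
    "device_type": "desktop",
    "geolocation": "37.7749,-122.4194",
    "user_agent": "Mozilla/5.0",
    "events": [{
        "event_type": "page_view",
        "timestamp": "2023-08-10T12:34:56Z",
        "details": {"page_url": "https://example.com/products", "referrer_url": None},
    }],
}
print(check_visit(sample))
```

Returning a list of problems instead of raising on the first one makes it easier to see every mismatch in a record at once.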

In [1]:
import random
from datetime import datetime
import json

def generate_visit():
    # Sample products categorized by type with hard-coded product IDs
    products = {
        "hardware": [
            {"product_id": "HDW-001", "name": "Laptop X200", "price": 999.99},
            {"product_id": "HDW-002", "name": "Desktop Z500", "price": 1299.99},
            {"product_id": "HDW-003", "name": "Gaming PC Y900", "price": 1899.99},
            {"product_id": "HDW-004", "name": "Ultrabook A400", "price": 1199.99},
            {"product_id": "HDW-005", "name": "Workstation Pro 9000", "price": 2599.99},
            {"product_id": "HDW-006", "name": "Mini PC Cube", "price": 699.99}
        ],
        "software": [
            {"product_id": "SFT-001", "name": "Office Suite Pro", "price": 199.99},
            {"product_id": "SFT-002", "name": "Antivirus Shield", "price": 49.99},
            {"product_id": "SFT-003", "name": "Photo Editor Pro", "price": 79.99},
            {"product_id": "SFT-004", "name": "Project Manager Plus", "price": 299.99},
            {"product_id": "SFT-005", "name": "Video Editor Pro", "price": 149.99},
            {"product_id": "SFT-006", "name": "Music Studio 2024", "price": 89.99}
        ],
        "peripherals": [
            {"product_id": "PER-001", "name": "Wireless Mouse", "price": 29.99},
            {"product_id": "PER-002", "name": "Mechanical Keyboard", "price": 89.99},
            {"product_id": "PER-003", "name": "27\" 4K Monitor", "price": 399.99},
            {"product_id": "PER-004", "name": "USB-C Docking Station", "price": 129.99},
            {"product_id": "PER-005", "name": "Noise Cancelling Headphones", "price": 199.99},
            {"product_id": "PER-006", "name": "Webcam HD 1080p", "price": 49.99}
        ]
    }

    # Helper function to generate a timestamp
    def generate_timestamp():
        return datetime.now().isoformat()

    # Helper function to select a random product from a category
    def select_random_product():
        category = random.choice(list(products.keys()))
        product = random.choice(products[category])
        return product, category

    # Generating the base session details
    session = {
        "session_id": f"SID-{random.randint(1000, 9999)}",
        "user_id": f"UID-{random.randint(1000, 9999)}",
        "device_type": random.choice(["desktop", "mobile", "tablet"]),
        "geolocation": f"{random.uniform(-90, 90):.6f},{random.uniform(-180, 180):.6f}",
        "user_agent": random.choice([
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36",
            "Mozilla/5.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1",
            "Mozilla/5.0 (Linux; Android 10; SM-G973F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Mobile Safari/537.36"
        ]),
        "events": []
    }

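    # Adding between 2 and 6 page_view events.
    # Note: each list entry is wrapped as {"event": {...}}, one level deeper than
    # the Event items described in the schema above.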
    num_page_views = random.randint(2, 6)
    for _ in range(num_page_views):
        page_view_event = {
            "event": {
                "event_type": "page_view",
                "timestamp": generate_timestamp(),
                "details": {
                    "page_url": random.choice([
                        "https://example.com/home",
                        "https://example.com/products",
                        "https://example.com/about",
                        "https://example.com/contact",
                        "https://example.com/cart"
                    ]),
                    "referrer_url": random.choice([
                        "https://google.com",
                        "https://example.com/home",
                        None  # No referrer (for example, direct navigation)
                    ])
                }
            }
        }
        session["events"].append(page_view_event)

    # Determine whether to add add_item_to_cart events
    added_items = []
    if random.random() < 0.5:  # 50% chance to add items to the cart
        num_items_to_add = random.randint(1, 3)
        for _ in range(num_items_to_add):
            product, category = select_random_product()
            add_item_to_cart_event = {
                "event": {
                    "event_type": "add_item_to_cart",
                    "timestamp": generate_timestamp(),
                    "details": {
                        "product_id": product["product_id"],
                        "product_name": product["name"],
                        "category": category,
                        "price": product["price"],
                        "quantity": random.randint(1, 5)
                    }
                }
            }
            session["events"].append(add_item_to_cart_event)
            added_items.append(add_item_to_cart_event)

    # Determine whether to add a purchase event
    if added_items and random.random() < 0.5:  # Only add purchase if items were added to cart
        total_amount = sum(
            item["event"]["details"]["price"] * item["event"]["details"]["quantity"]
            for item in added_items
        )
        purchase_event = {
            "event": {
                "event_type": "purchase",
                "timestamp": generate_timestamp(),
                "details": {
                    "order_id": f"ORD-{random.randint(1000, 9999)}",
                    "amount": round(total_amount, 2),  # avoid float artifacts such as 1999.9799999
                    "currency": "USD",
                    "items": [
                        {
                            "product_id": item["event"]["details"]["product_id"],
                            "product_name": item["event"]["details"]["product_name"],
                            "category": item["event"]["details"]["category"],
                            "price": item["event"]["details"]["price"],
                            "quantity": item["event"]["details"]["quantity"]
                        }
                        for item in added_items
                    ]
                }
            }
        }
        session["events"].append(purchase_event)

    return session

# Example usage
visit = generate_visit()
visit_json = json.dumps(visit, indent=4)
print(visit_json)


{
    "session_id": "SID-2491",
    "user_id": "UID-8492",
    "device_type": "desktop",
    "geolocation": "58.819722,-47.734615",
    "user_agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 14_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1",
    "events": [
        {
            "event": {
                "event_type": "page_view",
                "timestamp": "2024-08-11T17:49:09.861623",
                "details": {
                    "page_url": "https://example.com/contact",
                    "referrer_url": "https://example.com/home"
                }
            }
        },
        {
            "event": {
                "event_type": "page_view",
                "timestamp": "2024-08-11T17:49:09.861639",
                "details": {
                    "page_url": "https://example.com/contact",
                    "referrer_url": "https://example.com/home"
                }
            }
        },
        {
            "event"

# Generate a file with sample visits

The function below writes a configurable number of visits to a JSON Lines (`.jsonl`) file, one visit per line.

In [8]:
import json
import os

# Assumes generate_visit() from the earlier cell is already defined

def generate_visits_to_jsonl(num_visits, file_name):
    with open(file_name, 'w') as f:
        for _ in range(num_visits):
            visit = generate_visit()
            visit_json = json.dumps(visit)
            f.write(visit_json + '\n')

# Example usage:

FILENAME = 'visits-1000.jsonl'
# Export the file name so shell commands in later cells can read it as $FILENAME
os.environ['FILENAME'] = FILENAME

COUNT = 1000


generate_visits_to_jsonl(COUNT, FILENAME)

print("Done")

Done
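
As a sanity check on the generated file, the lines can be read back and tallied by how far each visit progressed. With the generator's two 50% branches, roughly half of the visits should contain an add-to-cart event and roughly a quarter should contain a purchase. The sketch below uses a hypothetical helper, `funnel_counts`, and assumes each event is nested under an `event` key, as in the sample output above.

```python
import json
from collections import Counter


def funnel_counts(lines):
    """Count visits that reached each stage, given an iterable of JSON Lines strings."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        visit = json.loads(line)
        # Assumption: each entry looks like {"event": {"event_type": ...}}.
        types = {entry["event"]["event_type"] for entry in visit["events"]}
        counts["visits"] += 1
        for stage in ("page_view", "add_item_to_cart", "purchase"):
            if stage in types:
                counts[stage] += 1
    return counts


# Two hand-written records standing in for lines of the generated file.
sample_lines = [
    '{"events": [{"event": {"event_type": "page_view"}}]}',
    '{"events": [{"event": {"event_type": "page_view"}}, '
    '{"event": {"event_type": "add_item_to_cart"}}]}',
]
print(dict(funnel_counts(sample_lines)))
```

To run it against the real file, pass the open file object, which iterates line by line: `with open(FILENAME) as f: print(funnel_counts(f))`.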


# Display the first few records of the generated file

In [9]:
! head $FILENAME

{"session_id": "SID-3547", "user_id": "UID-3208", "device_type": "tablet", "geolocation": "61.139804,-30.017778", "user_agent": "Mozilla/5.0 (Linux; Android 10; SM-G973F) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Mobile Safari/537.36", "events": [{"event": {"event_type": "page_view", "timestamp": "2024-08-11T17:55:31.834052", "details": {"page_url": "https://example.com/cart", "referrer_url": null}}}, {"event": {"event_type": "page_view", "timestamp": "2024-08-11T17:55:31.834080", "details": {"page_url": "https://example.com/contact", "referrer_url": "https://google.com"}}}, {"event": {"event_type": "page_view", "timestamp": "2024-08-11T17:55:31.834088", "details": {"page_url": "https://example.com/about", "referrer_url": null}}}, {"event": {"event_type": "page_view", "timestamp": "2024-08-11T17:55:31.834093", "details": {"page_url": "https://example.com/about", "referrer_url": "https://example.com/home"}}}]}
{"session_id": "SID-9220", "user_id": "UID-1936", "device_t