### Extracting data from research papers into a specific JSON schema

In [0]:
import os
import json
from openai import OpenAI

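# Read the access token from the environment instead of hard-coding it in the notebook.
DATABRICKS_TOKEN = os.environ.get("DATABRICKS_TOKEN", "")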
DATABRICKS_BASE_URL = "https://e2-demo-field-eng.cloud.databricks.com/serving-endpoints"

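# Databricks serving endpoints expose an OpenAI-compatible API,
# so the standard OpenAI client works with a custom base_url.
client = OpenAI(
    api_key=DATABRICKS_TOKEN,
    base_url=DATABRICKS_BASE_URL,
)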

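# JSON schema the model output must follow. Listing every field under
# "required" keeps the model from omitting any of them.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "research_paper_extraction",
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "authors": {
                    "type": "array",
                    "items": {"type": "string"},
                },
                "abstract": {"type": "string"},
                "keywords": {
                    "type": "array",
                    "items": {"type": "string"},
                },
            },
            "required": ["title", "authors", "abstract", "keywords"],
        },
        "strict": True,
    },
}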
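Hand-writing the schema dict gets error-prone as fields are added. As a sketch, assuming Python 3.9+ and only `str` and `list[str]` fields, an equivalent `response_format` can be derived from a dataclass so the schema and a typed container stay in sync. `ResearchPaper`, `schema_from_dataclass`, and `generated_response_format` are hypothetical names, not part of the OpenAI or Databricks APIs.

```python
from dataclasses import dataclass, fields
from typing import get_args, get_origin, get_type_hints


@dataclass
class ResearchPaper:
    """Hypothetical typed mirror of the research_paper_extraction schema."""
    title: str
    authors: list[str]
    abstract: str
    keywords: list[str]


def _json_type(tp) -> dict:
    # Map the small set of annotations used here to JSON Schema fragments.
    if tp is str:
        return {"type": "string"}
    if get_origin(tp) is list:
        (item_tp,) = get_args(tp)
        return {"type": "array", "items": _json_type(item_tp)}
    raise TypeError(f"unsupported annotation: {tp!r}")


def schema_from_dataclass(cls, name: str) -> dict:
    # get_type_hints resolves annotations even when they are stored as strings.
    hints = get_type_hints(cls)
    props = {f.name: _json_type(hints[f.name]) for f in fields(cls)}
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "schema": {
                "type": "object",
                "properties": props,
                # Every dataclass field is treated as required.
                "required": list(props),
            },
            "strict": True,
        },
    }


# Kept under a separate name so the hand-written response_format is not overwritten.
generated_response_format = schema_from_dataclass(ResearchPaper, "research_paper_extraction")
```

Field order in `required` follows the dataclass field order, so the generated dict matches the hand-written one above.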

In [0]:
# Unstructured research paper content (messy, real-world format)
unstructured_paper_text = """
Proceedings of the International Conference on Machine Learning and Data Science 2024

Real-Time Anomaly Detection in IoT Networks using Deep Learning: A Hybrid CNN-LSTM Approach

Sarah Chen¹, Michael Rodriguez², Dr. Priya Patel¹, James Thompson³

¹University of California, Berkeley - Department of Computer Science
²Stanford Research Institute  
³Microsoft Research Labs

Received: March 15, 2024 | Accepted: August 22, 2024 | Published: September 10, 2024

INTRODUCTION AND BACKGROUND

The Internet of Things (IoT) has revolutionized how we interact with technology, with an estimated 75 billion connected devices expected by 2025. However, this exponential growth brings unprecedented security challenges. Traditional signature-based intrusion detection systems are inadequate for the dynamic and heterogeneous nature of IoT environments.

METHODOLOGY AND APPROACH

In this work, we propose a novel hybrid architecture that combines the spatial feature extraction capabilities of Convolutional Neural Networks with the temporal modeling strengths of Long Short-Term Memory networks. Our approach processes network traffic data in real-time, analyzing packet headers, payload characteristics, and temporal patterns to identify anomalous behavior.

EXPERIMENTAL SETUP

We collected network traffic data from smart home environments including smart thermostats, security cameras, voice assistants, and lighting systems. The dataset comprised 2.5 million network packets gathered over a six-month period from January to June 2024. Data preprocessing involved feature normalization, sequence padding, and splitting into training (70%), validation (15%), and testing (15%) sets.

RESULTS AND FINDINGS

Our hybrid CNN-LSTM model achieved remarkable performance metrics: 94.7% detection accuracy, 2.1% false positive rate, and 15 milliseconds average response time. When compared to traditional rule-based systems, our approach showed 23% improvement in accuracy and 67% reduction in false alarms. The model successfully detected various attack types including DDoS, man-in-the-middle attacks, and device hijacking attempts.

CONCLUSION

This research demonstrates that deep learning techniques, specifically the combination of CNN and LSTM architectures, provide a robust solution for real-time IoT anomaly detection. The system's low latency and high accuracy make it suitable for deployment in production environments where immediate threat response is critical.

Related research areas include: deep learning applications, anomaly detection algorithms, IoT security frameworks, neural network architectures, real-time processing systems, cybersecurity solutions, machine learning for network security, and network traffic analysis techniques.

© 2024 International Conference on Machine Learning and Data Science. All rights reserved.
DOI: 10.1234/icmlds.2024.5678
Page 142-158
"""

In [0]:
messages = [
    {
        "role": "system",
        "content": "You are an expert at structured data extraction. You will be given unstructured text from a research paper and should convert it into the given structure.",
    },
    {
        "role": "user",
        "content": unstructured_paper_text,
    },
]

response = client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",
    messages=messages,
    response_format=response_format
)

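# message.content is a JSON string; parse it first so json.dumps pretty-prints
# the object instead of re-encoding the string with escaped quotes.
extracted = json.loads(response.choices[0].message.content)
print(json.dumps(extracted, indent=2))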

In [0]:
print(response.choices[0].message.content)
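Strict mode should already enforce the schema, but a defensive parse keeps a malformed or truncated response from flowing silently into downstream code. A minimal standard-library sketch; `ResearchPaper` and `parse_paper` are hypothetical helpers, not library functions:

```python
import json
from dataclasses import dataclass


@dataclass
class ResearchPaper:
    """Hypothetical typed container for the extracted fields."""
    title: str
    authors: list[str]
    abstract: str
    keywords: list[str]


def parse_paper(raw: str) -> ResearchPaper:
    """Parse the model's JSON string and check field types before use."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {"title", "authors", "abstract", "keywords"}
    missing = expected - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    for key in ("title", "abstract"):
        if not isinstance(data[key], str):
            raise ValueError(f"{key} must be a string")
    for key in ("authors", "keywords"):
        value = data[key]
        if not (isinstance(value, list) and all(isinstance(x, str) for x in value)):
            raise ValueError(f"{key} must be a list of strings")
    # Extra keys are ignored rather than treated as errors.
    return ResearchPaper(**{k: data[k] for k in expected})
```

In this notebook it would be called as `paper = parse_paper(response.choices[0].message.content)`.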

In [0]:
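# JSON object mode: the response is constrained to valid JSON, but no schema
# is enforced, so the prompt itself must name the fields to extract.
# Reuses `json` and `client` from the first cell.
response_format = {"type": "json_object"}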

messages = [
    {
        "role": "user",
        "content": "Extract the name, size, price, and color from this product description as a JSON object:\n<description>\nThe SmartHome Mini is a compact smart home assistant available in black or white for only $49.99. It's 5 inches wide.\n</description>",
    },
]

response = client.chat.completions.create(
    model="databricks-meta-llama-3-3-70b-instruct",
    messages=messages,
    response_format=response_format
)

# Parse the JSON string before pretty-printing it.
extracted = json.loads(response.choices[0].message.content)
print(json.dumps(extracted, indent=2))

In [0]:
print(response.choices[0].message.content)
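Unlike the schema-constrained call, `json_object` mode only guarantees syntactically valid JSON: the model chooses the key names and value types (for example, `price` could come back as a number or as a string such as `"$49.99"`). A small check like the hypothetical `check_product` below, a sketch rather than a library feature, makes missing fields visible:

```python
import json


def check_product(raw: str, expected=("name", "size", "price", "color")) -> tuple[dict, list[str]]:
    """Parse a json_object response and report which requested keys are absent."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Compare case-insensitively, since nothing pins the exact key spelling.
    present = {k.lower() for k in data}
    missing = [k for k in expected if k not in present]
    return data, missing
```

Applied to this cell's output, it would be `data, missing = check_product(response.choices[0].message.content)`; a non-empty `missing` list is a signal to tighten the prompt or switch to a `json_schema` response format.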

In [0]:
print(response.choices[0].message.model_dump()['content'])