In [1]:
# !pip install PyPDF2

Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl.metadata (6.8 kB)
Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
Installing collected packages: PyPDF2
Successfully installed PyPDF2-3.0.1


In [1]:
# Imports
import os
import requests
from dotenv import load_dotenv
from PyPDF2 import PdfReader
from IPython.display import Markdown, display
from openai import OpenAI

In [2]:
# Load environment variables from a file called .env
load_dotenv()
api_key = os.getenv('OPENAI_API_KEY')

In [3]:
# Check the key (whitespace is tested before the prefix, so a padded key gets the right message)
if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif api_key.strip() != api_key:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start with sk-proj-; please check you're using the right key - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")

API key found and looks good so far!
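The same checks can be wrapped in a small function so they are easy to exercise without touching the real environment. This is a minimal sketch: the name `check_api_key` and its return strings are hypothetical, and the sample keys are made-up placeholders, not real credentials.

```python
def check_api_key(api_key):
    """Return a short status string for an API key (or None)."""
    if not api_key:
        return "missing"
    if api_key.strip() != api_key:
        return "whitespace"
    if not api_key.startswith("sk-proj-"):
        return "unexpected prefix"
    return "ok"


# Example calls with placeholder values (not real keys)
for candidate in (None, " sk-proj-abc", "sk-abc", "sk-proj-abc"):
    print(repr(candidate), "->", check_api_key(candidate))
```

Returning a status instead of printing keeps the check reusable, for instance to stop early before any API call is attempted.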


In [4]:
# Initialize OpenAI API
openai = OpenAI()

In [5]:
# System prompt
system_prompt = (
    "You are an assistant that analyzes PDF documents and provides detailed summaries. "
    "Your response should be concise yet comprehensive, covering the following elements: "
    "problem, method, result, uniqueness, relevance, and conclusion. "
    "Use a professional and formal tone throughout your response."
)

In [6]:
# Function to download and save the PDF
def download_pdf(url, save_path):
    # A timeout keeps the cell from hanging indefinitely on an unresponsive server
    response = requests.get(url, timeout=30)
    if response.status_code == 200:
        with open(save_path, 'wb') as file:
            file.write(response.content)
        print(f"PDF downloaded successfully: {save_path}")
    else:
        raise Exception(f"Failed to download PDF. Status code: {response.status_code}")
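`download_pdf` saves whatever the server returns with status 200, which may be an HTML page (a login or consent screen, for example) rather than a PDF. One possible guard, sketched here under the assumption that a strict header check is acceptable, is to test for the `%PDF-` marker that PDF files begin with. The helper name `looks_like_pdf` is hypothetical.

```python
PDF_MAGIC = b"%PDF-"  # PDF files begin with this byte sequence


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes start with the PDF header marker."""
    return data.startswith(PDF_MAGIC)


# Example with stand-in byte strings instead of a real download
print(looks_like_pdf(b"%PDF-1.7\n..."))          # header present
print(looks_like_pdf(b"<!DOCTYPE html><html>"))  # an HTML page, not a PDF
```

Inside `download_pdf`, this could be called as `looks_like_pdf(response.content)` before writing the file. Some real-world files carry a few stray bytes before the header, which this strict check would reject.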


In [7]:
# Function to extract text from PDF
def extract_text_from_pdf(pdf_path):
    reader = PdfReader(pdf_path)
    # Guard against pages with no extractable text (e.g. scanned images)
    return "".join(page.extract_text() or "" for page in reader.pages)
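The full extracted text is later placed into a single prompt, and a long PDF can produce more text than a model accepts in one request. A minimal sketch of one way to cap the input follows. The helper `truncate_text` is hypothetical, and the default of 20,000 characters is an arbitrary illustrative budget, not a documented model limit.

```python
def truncate_text(text: str, max_chars: int = 20000) -> str:
    """Cut text to at most max_chars characters, preferring to break at a space."""
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    last_space = cut.rfind(" ")
    if last_space > 0:
        # Drop the partial word at the end of the cut
        cut = cut[:last_space]
    return cut


# Small demonstration with a tiny budget
print(truncate_text("hello world", max_chars=8))
```

Truncation simply discards the end of the document, so conclusions near the end of a paper may be lost. Summarizing in chunks is a common alternative but is beyond this sketch.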


In [8]:
# Function to generate user prompt
def user_prompt_for(pdf_text):
    return (
        "You are analyzing a PDF document. Please summarize the document's content, "
        "including the problem addressed, the methods used, the results obtained, "
        "its uniqueness, relevance to the field, and the overall conclusion. "
        f"Here is the text of the document:\n\n{pdf_text}"
    )

In [9]:
# Summarize the PDF
def summarize_pdf(url):
    pdf_path = "temp.pdf"  # Temporary file to save the PDF
    try:
        # Step 1: Download the PDF
        download_pdf(url, pdf_path)

        # Step 2: Extract text from the PDF
        pdf_text = extract_text_from_pdf(pdf_path)

        # Step 3: Generate user prompt
        user_prompt = user_prompt_for(pdf_text)

        # Step 4: Call OpenAI API
        response = openai.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}
            ]
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"Error summarizing PDF: {e}"
    finally:
        # Remove the temporary file so it does not linger between runs
        if os.path.exists(pdf_path):
            os.remove(pdf_path)
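The message list passed in Step 4 can be built and inspected separately, which makes it possible to check prompt sizes without spending an API call. This is a sketch only: `build_messages` is a hypothetical helper, and the prompt strings below are short stand-ins for the real `system_prompt` and `user_prompt_for(pdf_text)` values.

```python
def build_messages(system_prompt: str, user_prompt: str) -> list:
    """Assemble the system and user messages in the order the chat call expects."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


# Stand-in prompts; in the notebook these come from the cells above
messages = build_messages("You summarize PDFs.", "Here is the text of the document: ...")
for message in messages:
    print(message["role"], len(message["content"]))
```

The resulting list can be passed unchanged as the `messages` argument of `openai.chat.completions.create`.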

In [10]:
# Display the PDF summary
def display_pdf_summary(url):
    summary = summarize_pdf(url)
    display(Markdown(f"## Summary for PDF at {url}\n\n{summary}"))

In [11]:
# Test the function with the given PDF link
pdf_url = "https://www.nature.com/articles/s12276-023-01050-9.pdf"
display_pdf_summary(pdf_url)

PDF downloaded successfully: temp.pdf


## Summary for PDF at https://www.nature.com/articles/s12276-023-01050-9.pdf

**Summary of the Document: "MicroRNA: Trends in Clinical Trials of Cancer Diagnosis and Therapy Strategies"**

**Problem Addressed:**  
The review article addresses the significant role of microRNAs (miRNAs) in cancer, emphasizing their potential as biomarkers for diagnosis and therapeutic targets while highlighting the need for updated clinical data on their use.

**Method:**  
The authors conducted a comprehensive literature review, collating information from clinical trials and preclinical studies involving miRNAs in cancer settings. Strategies involving miRNA mimics and inhibitors were analyzed, alongside their methods of delivery and clinical application.

**Results Obtained:**  
The review provided insights into various miRNAs implicated in cancer, such as miR-34a, miR-16, miR-155, and miR-193a-3p, highlighting specific clinical trials associated with these miRNAs. For instance, the miRNA mimic MRX34 faced challenges with safety, leading to its discontinuation, whereas other trials, such as those for MRG-106 and INT-1B3, showed promising therapeutic potential.

**Uniqueness:**  
The document offers a detailed synthesis of both the existing and ongoing clinical trials, which is a crucial aspect often underrepresented in literature regarding miRNAs in cancer. It highlights the dynamic nature of miRNA functions, the evolution of therapeutic approaches, and the specificity of miRNA interactions within the cancer landscape.

**Relevance to the Field:**  
This review is highly relevant as it aligns with the growing interest in personalized medicine and the need for reliable biomarkers and therapeutic strategies in oncology. The calls for more extensive clinical trials and refined strategies for miRNA-based therapies reflect ongoing challenges and opportunities in cancer treatment.

**Conclusion:**  
The authors conclude that despite the challenges faced in early clinical trials, the substantial potential of miRNAs as both diagnostic markers and therapeutic targets in cancer remains. Advancements in synthetic RNA technologies and improved delivery systems could enhance the safety and efficacy of miRNA therapies, indicating that miRNA-focused research will likely play a pivotal role in the development of next-generation cancer treatments.