## Notebook Description

This notebook automates the first screening step of the PRISMA flow process: checking PubMed articles against a user-specified PICO (Population, Intervention, Comparison, Outcome) statement. Using **Vertex AI** and **Gemini 1.5 Flash**, it decides from each article's title and abstract whether the article matches the PICO criteria, which reduces the manual screening effort in a systematic review.

## Objectives

1. **PICO Validation**: Check whether each article aligns with the user-provided PICO statement, using the model as the validator.
2. **PRISMA Flow Step 1**: Automate the qualification of articles for inclusion in the PRISMA flow process based on PICO relevance.
3. **Efficient Filtering**: Reduce the manual review workload by programmatically keeping only the articles that match the review's scope and objectives.

## Process Overview

1. **Input Data**:
   - Retrieve article **title** and **abstract** from PubMed.
   - Accept a **PICO statement** input from the user.
  
2. **Validation with Vertex AI (Gemini 1.5 Flash)**:
   - Send the title, abstract, and PICO statement to Vertex AI.
   - Use **Gemini 1.5 Flash** to evaluate whether the article matches the criteria defined in the PICO statement.
  
3. **Output**:
   - Return a validation result indicating whether the article aligns with the PICO statement.
   - Generate a structured output for qualifying articles to proceed in the PRISMA Flow.
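
The three steps above could be wired together roughly as follows. This is a minimal sketch, not the notebook's actual implementation: the function names (`parse_pubmed_xml`, `build_prompt`, `parse_verdict`, `validate_article`, `vertex_generate`), the prompt wording, and the JSON reply format are all assumptions made for illustration. It assumes the article XML was already fetched from PubMed (for example via the E-utilities `efetch` endpoint) and that the model call is passed in as a plain function, so the logic can be exercised without cloud access.

```python
import json
import xml.etree.ElementTree as ET
from typing import Callable


def parse_pubmed_xml(xml_text: str) -> list[dict]:
    """Extract PMID, title and abstract from a PubMed efetch XML response."""
    articles = []
    for node in ET.fromstring(xml_text).iter("PubmedArticle"):
        title_el = node.find(".//ArticleTitle")
        # Structured abstracts have several AbstractText sections; join them.
        abstract = " ".join(
            "".join(part.itertext()).strip()
            for part in node.findall(".//Abstract/AbstractText")
        )
        articles.append({
            "pmid": node.findtext(".//PMID", default=""),
            "title": "".join(title_el.itertext()).strip() if title_el is not None else "",
            "abstract": abstract,
        })
    return articles


def build_prompt(title: str, abstract: str, pico: str) -> str:
    """Assemble the screening prompt (wording is an assumption)."""
    return (
        "You are screening articles for a systematic review.\n"
        f"PICO statement:\n{pico}\n\n"
        f"Title: {title}\nAbstract: {abstract}\n\n"
        'Reply with JSON only: {"match": true or false, "reason": "<one sentence>"}'
    )


def parse_verdict(raw: str) -> dict:
    """Parse the model reply; anything unparseable is flagged for manual review."""
    text = raw.strip()
    # Models often wrap JSON in a markdown fence, so keep only the outermost {...}.
    start, end = text.find("{"), text.rfind("}")
    try:
        data = json.loads(text[start:end + 1])
        match = data["match"]
        if not isinstance(match, bool):
            raise ValueError("match must be a JSON boolean")
        return {"match": match, "reason": str(data.get("reason", ""))}
    except (ValueError, KeyError):
        return {"match": None, "reason": "unparseable model reply"}


def validate_article(article: dict, pico: str, generate: Callable[[str], str]) -> dict:
    """Return a structured record: PMID plus the model's verdict."""
    prompt = build_prompt(article["title"], article["abstract"], pico)
    return {"pmid": article["pmid"], **parse_verdict(generate(prompt))}


def vertex_generate(prompt: str) -> str:
    """Assumed Vertex AI wiring; requires google-cloud-aiplatform and a prior
    vertexai.init(project=..., location=...) call with your own settings."""
    from vertexai.generative_models import GenerativeModel

    return GenerativeModel("gemini-1.5-flash").generate_content(prompt).text
```

With credentials configured, `validate_article(article, pico, vertex_generate)` would yield one record per article, and keeping the records where `match` is `True` gives the set that proceeds in the PRISMA flow. Records with `match` set to `None` are deliberately not dropped, so that a malformed model reply sends the article to manual review instead of silently excluding it.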


In [9]:
# Merge the first 10 PDFs (in alphabetical order) from the
# pdfs/debd9b3c-4531-462c-b2c2-983b2710fe81 folder into a single sample file.

import os
from contextlib import ExitStack

import PyPDF2

PDF_DIR = 'pdfs/debd9b3c-4531-462c-b2c2-983b2710fe81'

# Collect the PDF filenames in case-insensitive alphabetical order
pdf_files = sorted(
    (name for name in os.listdir(PDF_DIR) if name.endswith('.pdf')),
    key=str.lower,
)

pdf_writer = PyPDF2.PdfWriter()

# Keep every source file open until the merged PDF has been written,
# because the writer may still read page data from them at write time.
with ExitStack() as stack:
    # Only the first 10 files go into this sample
    for filename in pdf_files[:10]:
        pdf_file_obj = stack.enter_context(open(os.path.join(PDF_DIR, filename), 'rb'))
        pdf_reader = PyPDF2.PdfReader(pdf_file_obj)

        # Add every page of this file, including the first
        for page_obj in pdf_reader.pages:
            pdf_writer.add_page(page_obj)

    # Save the merged PDF; the with-block closes the output file
    with open('pdfs/question3_sample.pdf', 'wb') as pdf_output:
        pdf_writer.write(pdf_output)