In [35]:
import fitz  # PyMuPDF

def extract_text_from_pdf(pdf_path):
    text = ""
    with fitz.open(pdf_path) as doc:
        for page in doc:
            text += page.get_text()
    return text


In [36]:
import google.generativeai as genai

# Configure the Gemini API client (the key is left blank here; supply your own)
genai.configure(api_key="")

def summarize_text(text):
    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content(f"Summarize the following text:\n\n{text}")
    return response.text
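
The cell above passes the API key as a string literal. A minimal sketch of reading it from the environment instead is shown below. The helper `load_api_key` and the variable name `GEMINI_API_KEY` are assumptions made for this illustration and are not part of the original notebook.

```python
import os


def load_api_key(env_var="GEMINI_API_KEY"):
    """Return the API key stored in an environment variable.

    Both this helper and the default variable name are hypothetical
    choices for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(
            f"Set the {env_var} environment variable before calling the API."
        )
    return key
```

With this helper in place, the configuration line could read `genai.configure(api_key=load_api_key())`, which keeps the key out of the saved notebook.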


In [38]:
def summarize_pdf(pdf_path):
    # Extract text from PDF
    text = extract_text_from_pdf(pdf_path)
    
    # Summarize the extracted text
    summary = summarize_text(text)
    
    return summary

# Example usage
pdf_path = r"C:\Users\Nilanjan\Desktop\hern.pdf"
summary = summarize_pdf(pdf_path)
print(summary)


This systematic review analyzes recent (2016-2021) approaches to generating synthetic tabular data for healthcare records, focusing on the use of Generative Adversarial Networks (GANs) to protect patient privacy.  The authors reviewed 34 publications and classified the methods into three categories: classical approaches (baseline methods, statistical models, and machine learning models), deep learning approaches (autoencoders, GANs, and ensembles), and other approaches (CoMSER, Aten Framework, SynSys, Synthea, Prophet).  While GAN-based methods showed promising results in generating usable and private data resembling real data, the study found a lack of universally accepted metrics for evaluating resemblance, utility, and privacy.  The authors conclude that further research is needed to develop standardized evaluation methods and improve the generalizability of GANs for diverse tabular healthcare datasets.



In [39]:
def chunk_text(text, max_tokens=2048):
    # Split text into chunks of at most max_tokens whitespace-separated words
    # (word counts only approximate the model's token counts)
    words = text.split()
    for i in range(0, len(words), max_tokens):
        yield ' '.join(words[i:i + max_tokens])

def summarize_large_pdf(pdf_path):
    text = extract_text_from_pdf(pdf_path)
    summaries = []
    for chunk in chunk_text(text):
        summary = summarize_text(chunk)
        summaries.append(summary)
    return ' '.join(summaries)
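
`summarize_large_pdf` joins the per-chunk summaries with spaces, so the result can read as several disconnected summaries. One possible variation, sketched here under stated assumptions, is to let neighbouring chunks share a few words and then run a second pass over the partial summaries. The names `chunk_words` and `summarize_in_two_passes`, the `overlap` parameter, and passing the summarizer in as a callable are all hypothetical additions for illustration.

```python
def chunk_words(text, max_words=2048, overlap=0):
    """Yield chunks of at most max_words words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears whole in one of them.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    step = max_words - overlap
    for start in range(0, len(words), step):
        yield " ".join(words[start:start + max_words])
        # Stop once the current chunk already reaches the end of the text
        if start + max_words >= len(words):
            break


def summarize_in_two_passes(text, summarize, max_words=2048, overlap=0):
    """Summarize each chunk, then summarize the joined partial summaries.

    `summarize` is any callable that maps a string to a string, for
    example the notebook's summarize_text.
    """
    partials = [summarize(chunk) for chunk in chunk_words(text, max_words, overlap)]
    if not partials:
        return ""
    if len(partials) == 1:
        return partials[0]
    # Second pass: condense the partial summaries into a single summary
    return summarize("\n\n".join(partials))
```

A call such as `summarize_in_two_passes(text, summarize_text, overlap=50)` would reuse the existing summarizer. The second pass costs one extra request per document.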


In [43]:
if __name__ == "__main__":
    # Ask the user for the PDF file path
    pdf_path = input("Enter the path to the PDF file: ").strip()

    try:
        # Summarize the PDF content
        summary = summarize_large_pdf(pdf_path)

        # Output the summary
        print("\nSummary of the PDF:")
        print(summary)
    except FileNotFoundError:
        print(f"Error: The file '{pdf_path}' was not found.")
    except Exception as e:
        print(f"An error occurred: {e}")
        


Enter the path to the PDF file: C:\Users\Nilanjan\Desktop\CTGAN first.pdf

Summary of the PDF:
This paper introduces CTGAN, a Conditional Tabular GAN, designed to generate realistic synthetic tabular data.  Existing methods struggle with the complexities of tabular data, including mixed data types (continuous and discrete), non-Gaussian and multimodal distributions in continuous columns, imbalanced categorical columns, and the sparsity of one-hot encoded vectors.  CTGAN addresses these challenges using several novel techniques: mode-specific normalization to handle non-Gaussian and multimodal continuous data, a conditional generator and training-by-sampling to mitigate imbalanced categorical columns.  Benchmarked against Bayesian network baselines and other GANs on a suite of simulated and real datasets, CTGAN significantly outperforms existing methods in terms of both likelihood fitness and machine learning efficacy of the generated data.  The authors also provide an open-source bench

In [44]:
def ask_question_about_pdf(pdf_text, question):
    # Combine PDF content with the question for context
    context = f"Text: {pdf_text}\n\nQuestion: {question}"
    
    model = genai.GenerativeModel("gemini-1.5-flash")
    response = model.generate_content(f"{context}\nAnswer the question.")
    return response.text

# Function to load the PDF and initiate the chatbot loop
def chatbot_interaction(pdf_path):
    # Extract text from the provided PDF
    pdf_text = extract_text_from_pdf(pdf_path)

    print("PDF loaded successfully! You can now ask questions.")
    
    # Interactive chatbot loop for real-time questions
    while True:
        # Get user input (question)
        question = input("\nWhat would you like to ask about the PDF? (Type 'exit' to quit): ")

        # Exit condition (ignore surrounding whitespace and letter case)
        if question.strip().lower() == "exit":
            print("Goodbye!")
            break

        # Get the answer from the model
        answer = ask_question_about_pdf(pdf_text, question)
        
        # Print the model's answer
        print("\nAnswer:", answer)

# Interact with the model in a Jupyter cell
pdf_path = input("Enter the path to the PDF file: ").strip()

# Start the chatbot interaction directly within the notebook
chatbot_interaction(pdf_path)

Enter the path to the PDF file: C:\Users\Nilanjan\Desktop\CTGAN first.pdf
PDF loaded successfully! You can now ask questions.

What would you like to ask about the PDF? (Type 'exit' to quit): what is it about?

Answer: This paper introduces CTGAN, a Conditional Tabular Generative Adversarial Network, designed to generate realistic synthetic tabular data.  The authors address challenges faced by existing GANs and other methods when dealing with the complexities of tabular data, such as mixed data types (continuous and discrete), non-Gaussian and multimodal distributions, imbalanced categorical columns, and sparse one-hot encodings.  CTGAN incorporates several novel techniques, including mode-specific normalization, a conditional generator, and training-by-sampling, to overcome these challenges.  The paper presents a comprehensive benchmark using simulated and real-world datasets, demonstrating that CTGAN significantly outperforms existing methods, including Bayesian network baselines an
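
`ask_question_about_pdf` sends the full PDF text with every question. A minimal sketch of a prompt builder that caps the amount of text and rejects empty questions is shown below. The function name `build_qa_prompt`, the prompt wording, and the `max_chars` budget are assumptions for illustration, and the default value is arbitrary rather than a limit taken from the Gemini API.

```python
def build_qa_prompt(pdf_text, question, max_chars=20000):
    """Return a prompt that pairs (possibly truncated) PDF text with a question.

    max_chars is an arbitrary illustrative budget, not an API limit.
    """
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    # Keep only the first max_chars characters of the document
    context = pdf_text[:max_chars]
    note = "\n[Text truncated]" if len(pdf_text) > max_chars else ""
    return (
        "Answer the question using only the text below.\n\n"
        f"Text: {context}{note}\n\n"
        f"Question: {question}"
    )
```

The returned string could be passed to `model.generate_content` in place of the inline f-string. Truncating from the front is the simplest choice and will drop later sections of long papers.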