**----------HECTOR_________HNM**

**Install Required Libraries**


In [1]:
!pip install openai PyMuPDF tiktoken


Collecting PyMuPDF
  Downloading pymupdf-1.25.3-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (3.4 kB)
Collecting tiktoken
  Downloading tiktoken-0.9.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Downloading pymupdf-1.25.3-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (20.0 MB)
Downloading tiktoken-0.9.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.2 MB)
Installing collected packages: PyMuPDF, tiktoken
Successfully installed PyMuPDF-1.25.3 tiktoken-0.9.0



**Extract Text from the PDF**

Use the PyMuPDF library to extract text from the PDF:

In [2]:
import fitz  # PyMuPDF

def extract_text_from_pdf(pdf_path):
    """Extract the text of every page in a PDF file."""
    # The with-block closes the document once extraction is done
    with fitz.open(pdf_path) as doc:
        # Iterating a Document yields its pages in order
        return "".join(page.get_text() for page in doc)

# Example usage:
pdf_path = '/content/sample-research-proposal.pdf'
pdf_text = extract_text_from_pdf(pdf_path)
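`page.get_text()` keeps the PDF's visual line breaks, so sentences and hyphenated words are often split across lines. A minimal stdlib sketch of a cleanup pass, assuming English text where a line-final hyphen followed by a lowercase letter marks a word split by the layout (`clean_pdf_text` is a hypothetical helper, not part of PyMuPDF):

```python
import re

def clean_pdf_text(text: str) -> str:
    """Hypothetical cleanup for text returned by page.get_text()."""
    # Re-join words split across lines: "summar-\nization" -> "summarization"
    # (assumption: a line-final hyphen before a lowercase letter is a layout break)
    text = re.sub(r"(\w)-\n(?=[a-z])", r"\1", text)
    # Blank lines separate paragraphs; keep those boundaries
    paragraphs = re.split(r"\n\s*\n", text)
    # Inside a paragraph, line breaks are layout only: collapse all whitespace to one space
    cleaned = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)
```

Usage: `pdf_text = clean_pdf_text(extract_text_from_pdf(pdf_path))`. Running headers and page numbers from the PDF are not removed by this sketch.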



**Split Text into Manageable Chunks**

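GPT-4 can only process a limited number of tokens per request: the base `gpt-4` model has an 8,192-token context window shared by the prompt and the reply. The text must therefore be split into chunks that fit within that limit. We'll use the tiktoken library to count tokens: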

In [3]:
import tiktoken

def split_text_into_chunks(text, max_tokens=2000, model_name="gpt-4"):
    """Split text into chunks of roughly max_tokens tokens, breaking only between words."""
    tokenizer = tiktoken.encoding_for_model(model_name)
    words = text.split()
    chunks = []
    current_chunk = []
    current_length = 0

    for word in words:
        # Count the word with its leading space, as it appears mid-text;
        # summing per-word counts approximates the joined chunk's token count
        token_length = len(tokenizer.encode(" " + word))
        # Start a new chunk when this word would exceed the budget (never emit an empty chunk)
        if current_chunk and current_length + token_length > max_tokens:
            chunks.append(" ".join(current_chunk))
            current_chunk = [word]
            current_length = token_length
        else:
            current_chunk.append(word)
            current_length += token_length

    if current_chunk:
        chunks.append(" ".join(current_chunk))

    return chunks

# Split the extracted text into chunks
chunks = split_text_into_chunks(pdf_text)
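`split_text_into_chunks` cuts wherever the budget runs out, so a sentence at a chunk boundary loses its context. A common refinement is to let consecutive chunks share a few words. A minimal sketch, assuming a pluggable `count_tokens` callable; `split_with_overlap` and `overlap_words` are hypothetical names, not part of tiktoken:

```python
from typing import Callable, List

def split_with_overlap(text: str, count_tokens: Callable[[str], int],
                       max_tokens: int = 2000, overlap_words: int = 50) -> List[str]:
    """Hypothetical variant of split_text_into_chunks whose chunks share overlap_words words."""
    words = text.split()
    # Token cost of each word, counted with the leading space it has mid-text
    costs = [count_tokens(" " + w) for w in words]
    chunks: List[str] = []
    start = 0
    while start < len(words):
        end, total = start, 0
        # Take words until the next one would exceed the budget (always take at least one)
        while end < len(words) and (end == start or total + costs[end] <= max_tokens):
            total += costs[end]
            end += 1
        chunks.append(" ".join(words[start:end]))
        if end == len(words):
            break
        # Step back overlap_words words for the next chunk, but always move forward
        start = max(end - overlap_words, start + 1)
    return chunks
```

With the notebook's tokenizer: `enc = tiktoken.encoding_for_model("gpt-4")`, then `chunks = split_with_overlap(pdf_text, lambda s: len(enc.encode(s)))`. The overlap costs a few extra tokens per call in exchange for context at each boundary.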


**Set Up OpenAI API**


Configure the OpenAI API with your API key:

In [4]:
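import os
import openai

# Read the key from the OPENAI_API_KEY environment variable if it is set;
# otherwise replace the placeholder with your key (avoid sharing notebooks that contain real keys)
openai.api_key = os.environ.get("OPENAI_API_KEY", "API_KEY_Open_AI")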


**Define the Custom Prompt**


Write a comprehensive prompt that guides GPT-4 to produce detailed, structured summaries with chain-of-thought reasoning:


In [5]:
custom_prompt = """
You are an expert summarizer with a deep understanding of complex documents. Given the following passage, provide a clear, concise, and well-structured summary. Ensure the summary includes:

- **Objectives:** The main goals or purposes outlined.
- **Methodology:** The approaches and methods employed.
- **Key Findings:** The primary results or discoveries.
- **Conclusions:** The interpretations and implications.
- **Recommendations:** Suggested actions or next steps.

Additionally, incorporate chain-of-thought reasoning to elucidate the logical flow and connections between different sections of the document.

Passage:
"""

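The prompt ends with `Passage:` and the chunk is appended directly after it, so instructions and document text run together. One option is to wrap the passage in explicit delimiters so the model can tell exactly where it starts and ends. A minimal sketch; `build_messages` and the triple-quote delimiter are assumptions, and the returned list has the same shape as the `messages` argument of `client.chat.completions.create()`:

```python
def build_messages(prompt: str, chunk: str,
                   system: str = "You are a helpful assistant.") -> list[dict]:
    """Hypothetical helper: pair the instructions with a clearly delimited passage."""
    # Triple quotes mark the start and end of the document text (an assumed convention)
    user_content = f'{prompt.rstrip()}\n"""\n{chunk.strip()}\n"""'
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_content},
    ]
```

Inside `summarize_chunk`, `messages=build_messages(prompt, chunk)` would replace the inline list.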


**Summarize Each Chunk**

Define a helper that sends the custom prompt plus one chunk to GPT-4 and returns the model's summary:

In [9]:
def summarize_chunk(chunk, prompt):
    """Summarize a text chunk using OpenAI's GPT-4."""
    # Create a client with the key configured in the setup cell (openai>=1.0 API)
    client = openai.OpenAI(api_key=openai.api_key)

    # Send the instructions followed by the chunk as a single user message
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt + chunk}
        ],
        max_tokens=500,   # upper bound on the length of each reply
        temperature=0.7
    )
    # The message is an object in openai>=1.0, so use .content rather than ['content']
    return response.choices[0].message.content.strip()

**Combine Chunk Summaries into a Final Summary**

Summarize every chunk, then join the individual summaries into one text for the final pass:

In [15]:
# Initialize an empty list to store chunk summaries
chunk_summaries = []

# Iterate over the chunks and generate a summary for each one
for chunk in chunks:
    summary = summarize_chunk(chunk, custom_prompt)
    chunk_summaries.append(summary)

# Combine all chunk summaries
combined_summary_text = "\n\n".join(chunk_summaries)

In [16]:

# Generate a final cohesive summary
final_summary_prompt = """
You have been provided with summaries of different sections of a document. Your task is to integrate these summaries into a single, coherent, and well-structured summary. Ensure that the final summary maintains logical flow and effectively synthesizes the information from all sections.

Section Summaries:
"""

final_summary = summarize_chunk(combined_summary_text, final_summary_prompt)

# Display the final summary
print("Final Document Summary:\n")
print(final_summary)


Final Document Summary:

The integrated summary of the sections is as follows:

The various research proposals aim to elucidate the experiences and perceptions of different societal groups, focusing on working-class fathers post-divorce/separation, the changing perception of 'the people' in social welfare, and the shift in policing towards a consumer-based approach.

In the first two studies, the main goal is to explore the experiences of fathers after divorce/separation and their negotiation of fatherhood roles and work-life balance. The methodologies in these studies are primarily qualitative, employing methods such as semi-structured interviews, participant observation, and reflexive interviewing. The studies anticipate providing a nuanced understanding of post-divorce/separation fatherhood and its broader social and political significance, as well as informing more appropriate and egalitarian social policy. Recommendations include further research into these experiences and deeper 
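The final call sends every chunk summary in one request, which works only while the prompt plus `combined_summary_text` fits in the context window; with enough chunks it will not. A common workaround is to merge the summaries in batches and repeat until one remains. A minimal sketch, assuming a `summarize(text) -> str` callable (for example `lambda t: summarize_chunk(t, final_summary_prompt)`) and a `count_tokens` callable; `reduce_summaries` and the 6,000-token `budget` (chosen to leave room for the prompt and a 500-token reply in gpt-4's 8,192-token window) are hypothetical:

```python
from typing import Callable, List

def reduce_summaries(summaries: List[str],
                     summarize: Callable[[str], str],
                     count_tokens: Callable[[str], int],
                     budget: int = 6000) -> str:
    """Hypothetical map-reduce merge: combine summaries in batches until one remains."""
    if not summaries:
        return ""
    while len(summaries) > 1:
        # Greedily pack consecutive summaries into batches within the token budget
        # (the "\n\n" separators are not counted, so keep some slack in the budget)
        batches: List[List[str]] = [[]]
        used = 0
        for s in summaries:
            cost = count_tokens(s)
            if batches[-1] and used + cost > budget:
                batches.append([])
                used = 0
            batches[-1].append(s)
            used += cost
        # If no two summaries fit together, another round would not shrink the list
        if len(batches) == len(summaries):
            raise ValueError("budget too small to merge any two summaries")
        # Summarize each batch; the results become the input for the next round
        summaries = [summarize("\n\n".join(batch)) for batch in batches]
    return summaries[0]
```

Each round strictly reduces the number of summaries, so the loop ends; a single summary is returned unchanged.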

**Notes:**


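**Token Management**: Two limits apply. The chunk size (`max_tokens=2000` in `split_text_into_chunks`) bounds each input, and `max_tokens=500` in the API call caps each reply; the prompt, the chunk, and the reply together must fit in the model's context window. Because the final pass reuses `summarize_chunk`, it is also capped at 500 tokens, which is the likely reason the final summary above stops mid-sentence; raise the cap for that call if you need a longer summary.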


**Prompt Customization:** Tailor the custom_prompt to align with the specific content and desired focus of your documents.

**API Costs**: OpenAI bills per input and output token, and this pipeline makes one API call per chunk plus one for the final summary, so cost grows roughly in proportion to document length.
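The token-management note reduces to one inequality: prompt tokens + chunk tokens + reply tokens (`max_tokens`) must not exceed the context window. A minimal pre-flight check, assuming an 8,192-token window (the base `gpt-4` model) and a pluggable `count_tokens` callable; `fits_context` is hypothetical and `overhead` is an assumed safety margin for the system message and chat formatting, not an exact figure:

```python
from typing import Callable

def fits_context(prompt: str, chunk: str, count_tokens: Callable[[str], int],
                 max_reply_tokens: int = 500, context_window: int = 8192,
                 overhead: int = 20) -> bool:
    """Hypothetical check that one summarize_chunk call fits the model's window."""
    # The user message is prompt + chunk, exactly as summarize_chunk builds it
    needed = count_tokens(prompt + chunk) + max_reply_tokens + overhead
    return needed <= context_window
```

With tiktoken, `enc = tiktoken.encoding_for_model("gpt-4")` and `count_tokens=lambda s: len(enc.encode(s))` give the token counts.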