## 1. Set Up the Environment (Python 3.11+)

In [8]:
%pip install openai pdfplumber pandas python-dotenv

Collecting python-dotenv
  Using cached python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv
Successfully installed python-dotenv-1.0.1

[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip available: [0m[31;49m22.3.1[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [18]:
# Import libraries
import pdfplumber
from openai import OpenAI
import re
import pandas as pd
from dotenv import load_dotenv
import os

## 2. Load the PDF and Extract Text

In [10]:
# Step 1: Extract text from PDF
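def extract_text_from_pdf(pdf_path):
    """Return the text of all pages in the PDF, separated by newlines."""
    with pdfplumber.open(pdf_path) as pdf:
        # extract_text() may return None for a page with no text layer
        # (depending on the pdfplumber version), so fall back to "".
        # Joining with "\n" keeps the last word of one page from fusing
        # with the first word of the next.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)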

# Extract text from "The Lottery"
pdf_path = "Lottery.pdf"
raw_text = extract_text_from_pdf(pdf_path)

# Step 2: Preprocess text
def preprocess_text(text):
    # Collapse every run of whitespace (spaces, tabs, line breaks) into a
    # single space. Note: this does not remove page headers or footers.
    return re.sub(r"\s+", " ", text).strip()

clean_text = preprocess_text(raw_text)

# print(clean_text)
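Plain `+=` concatenation fuses the last word of one page with the first word of the next, and in some pdfplumber versions `extract_text()` returns `None` for a page without a text layer, which makes `text += None` raise a `TypeError`. The sketch below shows both fixes plus the whitespace cleanup. It uses hypothetical helpers `join_pages` and `normalize_whitespace` and a stub page class, so it runs without a PDF; it is an illustration, not pdfplumber's own API.

```python
import re
from typing import Iterable, Optional, Protocol


class PageLike(Protocol):
    """Anything with an extract_text() method, such as a pdfplumber page."""

    def extract_text(self) -> Optional[str]: ...


def join_pages(pages: Iterable[PageLike]) -> str:
    """Join page texts with newlines, treating a None result as an empty page."""
    return "\n".join(page.extract_text() or "" for page in pages)


def normalize_whitespace(text: str) -> str:
    """Collapse each run of whitespace into a single space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


class StubPage:
    """Hypothetical stand-in for a pdfplumber page so the demo needs no PDF."""

    def __init__(self, text: Optional[str]) -> None:
        self._text = text

    def extract_text(self) -> Optional[str]:
        return self._text


pages = [StubPage("first page ends here"), StubPage(None), StubPage("second  page\tstarts here")]
print(normalize_whitespace(join_pages(pages)))
# first page ends here second page starts here
```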

## 3. Extract Information with the OpenAI API

In [19]:
load_dotenv()
api_key = os.getenv("OPENAI_API_KEY")
client = OpenAI(
    api_key=api_key
)
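`os.getenv` returns `None` when the variable is missing, and in recent versions of the `openai` package the client constructor then raises. A blank entry such as `OPENAI_API_KEY=` in `.env`, however, produces an empty string, and the problem typically surfaces only as an authentication error on the first request. A minimal sketch of a hypothetical `require_env` helper that fails early with a clear message:

```python
import os


def require_env(name: str) -> str:
    """Return a non-blank environment variable, or raise with a clear message."""
    value = (os.getenv(name) or "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is missing or blank; set it in .env or the shell environment."
        )
    return value


# Usage in the notebook (hypothetical):
# client = OpenAI(api_key=require_env("OPENAI_API_KEY"))
```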

In [20]:
# Step 3: Analyze content with OpenAI
def query_openai(prompt):
    """Send a prompt to OpenAI and return the response."""
    response = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model="gpt-4o",  # Replace with the model available to you
    )
    return response.choices[0].message.content

# Define helper functions for specific tasks
def get_characters(text):
    prompt = f"List all characters in the following story with descriptions: {text[:1500]}"
    return query_openai(prompt)
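Each helper sends only `text[:1500]`, the first 1,500 characters, which is why the model's answers describe "this excerpt" rather than the whole story. One way to cover the full text is to split it into overlapping windows and query each one. The sketch below assumes a hypothetical `chunk_text` helper; the chunk size and overlap are illustrative defaults, not tuned values.

```python
def chunk_text(text: str, size: int = 1500, overlap: int = 200) -> list[str]:
    """Split text into windows of at most `size` characters.

    Consecutive windows share `overlap` characters, so text near a
    boundary appears in two chunks instead of being cut off.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window reached the end
            break
    return chunks


print(chunk_text("abcdefghij", size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Each chunk could then be passed to a helper such as `get_characters`, with the partial answers merged in a final prompt; this trades extra API calls for coverage of the whole story.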


In [23]:
characters = get_characters(clean_text)

In [22]:
# Step 4: Describe the settings and generate a script adaptation
def get_settings(text):
    prompt = f"Describe the settings in this story: {text[:1500]}"
    return query_openai(prompt)


def generate_script(text):
    prompt = f"Convert this excerpt into a script format with dialogues and stage directions: {text[:1500]}"
    return query_openai(prompt)


settings = get_settings(clean_text)
script = generate_script(clean_text)

Based on the provided excerpt from "The Lottery" by Shirley Jackson, here are the characters mentioned along with brief descriptions derived from the text:

1. **Bobby Martin**: A young boy who dodges his mother's grasp and runs back to a pile of stones. He is later called back by his father and takes his place between him and his oldest brother.
2. **Bobby Martin's Father**: He speaks up sharply to call Bobby back when he runs off. He stands with his sons, indicating his presence is one of authority and order.
3. **Bobby Martin's Oldest Brother**: Mentioned as standing next to Bobby and their father. He is part of the family unit attending the lottery.
4. **Mr. Summers**: Described as a round-faced, jovial man. He has considerable time and energy to devote to civic activities and runs the coal business. People feel sorry for him because he has no children and his wife is described as a scold. He is responsible for conducting the lottery and other village events.

This excerpt doesn't provide exhaustive character details, as the focus is mainly on the gathering event of the lottery and a glimpse into the atmosphere and a few villagers' interactions. For a complete list of characters and more detailed descriptions, refer to the entire short story.
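The three helpers (`get_characters`, `get_settings`, `generate_script`) differ only in their instruction text. A table-driven sketch, assuming a hypothetical `run_tasks` function and `TASKS` mapping, builds every prompt the same way and returns a dict keyed like the `results` dictionary used for the CSV export. The query function is passed in, so the logic can be exercised with a stub instead of a live API call; in the notebook you would pass `query_openai`.

```python
from typing import Callable

# Instruction prefixes copied from the notebook's three helper functions.
TASKS: dict[str, str] = {
    "Characters": "List all characters in the following story with descriptions:",
    "Settings": "Describe the settings in this story:",
    "Script": "Convert this excerpt into a script format with dialogues and stage directions:",
}


def run_tasks(text: str, query: Callable[[str], str], limit: int = 1500) -> dict[str, str]:
    """Send each task's prompt (instruction + first `limit` characters) to `query`."""
    excerpt = text[:limit]
    return {name: query(f"{instruction} {excerpt}") for name, instruction in TASKS.items()}


def fake_query(prompt: str) -> str:
    """Stub in place of query_openai, so no network call is made."""
    return f"received {len(prompt)} characters"


print(list(run_tasks("A short story.", fake_query)))  # ['Characters', 'Settings', 'Script']
```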

In [24]:
# Step 5: Output Results
print("Characters:\n", characters)
print("\nSettings:\n", settings)
print("\nScript:\n", script)

Characters:
 Given that "The Lottery" by Shirley Jackson is a short story, I'll describe some of the key characters based on what you've provided and additional context:

1. **Mr. Summers**: He is responsible for conducting the lottery. Described as a round-faced, jovial man, Mr. Summers is dedicated to civic activities and also runs a coal business. Despite his cheerful demeanor, townspeople pity him due to his childless marriage and his wife, described as a scold.

2. **Bobby Martin**: A young boy who, like other children, is reluctant to participate in the organized gathering. He is playful and mischievous, initially ducking away from his mother’s grasp to run and play near the pile of stones.

3. **Mr. Martin**: Bobby Martin's father. He is seen as authoritative, as he speaks sharply to Bobby, ensuring his son takes his place between him and Bobby’s older brother.

4. **Mrs. Martin**: Bobby's mother, who tries to keep her son in line, indicating a typical family dynamic of discipli

In [25]:
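# Optional: Save the outputs to a CSV file via a DataFrame
results = {
    "Characters": characters,
    "Settings": settings,
    "Script": script
}

# orient="index" stores the section names in the index, so write the
# index (the default) to keep them as the first CSV column.
df = pd.DataFrame.from_dict(results, orient='index', columns=["Details"])
df.to_csv("extracted_story_details.csv", index_label="Section")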

print("Results saved to 'extracted_story_details.csv'.")

Results saved to 'extracted_story_details.csv'.
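Because `from_dict(..., orient="index")` stores the section names in the DataFrame index, the CSV keeps them only if the index is written; with `index=False` the file holds an unlabeled `Details` column. A round-trip sketch using an in-memory buffer shows that the labels, and multi-line responses containing commas, survive when read back with `index_col`. The placeholder strings and the `Section` column name are assumptions for the demo.

```python
import io

import pandas as pd

# Placeholder strings stand in for the model responses.
results = {
    "Characters": "c-text",
    "Settings": "s-text",
    "Script": "line one,\nline two",  # commas and newlines get quoted by to_csv
}
df = pd.DataFrame.from_dict(results, orient="index", columns=["Details"])

buffer = io.StringIO()
df.to_csv(buffer, index_label="Section")  # index=True is the default

buffer.seek(0)
restored = pd.read_csv(buffer, index_col="Section")
print(restored.loc["Settings", "Details"])  # s-text
```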
