
# Import Libraries and Configure API Key
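In this cell, we import the necessary libraries and configure the API key used to access OpenAI's services.

- json: Used to handle data in JSON format.
- os: Used to read the API key from the `OPENAI_API_KEY` environment variable, so the key is not hard-coded in the notebook.
- openai: Library for interacting with OpenAI's API.

```python
import json
import os

import openai

# Read the key from the environment rather than storing it in the notebook.
openai.api_key = os.environ["OPENAI_API_KEY"]
```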


This function reads a JSON file containing a list of movies and keeps only the `title`, `genre` and `description` fields of each entry.

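```python
def extract_movie_data(json_file):
    """Return a list of {title, genre, description} dicts read from a JSON array of movies.

    Returns an empty list on error so that later cells do not crash on None.
    """
    try:
        with open(json_file, 'r', encoding='utf-8') as file:
            movies = json.load(file)

        # Keep only the fields the prompt needs; .get() yields None for missing keys.
        extracted_data = []
        for movie in movies:
            extracted_data.append({
                "title": movie.get("title"),
                "genre": movie.get("genre"),
                "description": movie.get("description")
            })

        return extracted_data

    except FileNotFoundError:
        print(f"File not found: {json_file}")
    except json.JSONDecodeError as e:
        print(f"Failed to decode JSON: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")

    # Reached only when one of the errors above occurred.
    return []
```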
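The function expects `movies.json` to hold a JSON array of objects. The sketch below assumes that shape (inferred from the keys the function reads and from the list-valued `genre` in the output at the end of this notebook) and uses made-up sample records written to a temporary file. The helper `select_fields` is hypothetical and mirrors only the field selection, to show that extra keys are dropped and missing keys become `None`.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample records: the real movies.json is not shown in this notebook.
SAMPLE_MOVIES = [
    {"title": "Movie A", "genre": ["Drama"], "description": "A short plot summary.", "year": 2001},
    {"title": "Movie B", "genre": ["Sci-Fi", "Adventure"]},  # no "description" key
]


def select_fields(path):
    """Same field selection as extract_movie_data, without its error handling."""
    with open(path, "r", encoding="utf-8") as file:
        movies = json.load(file)
    # .get() returns None for a missing key instead of raising KeyError.
    return [{key: movie.get(key) for key in ("title", "genre", "description")} for movie in movies]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "movies.json"
    path.write_text(json.dumps(SAMPLE_MOVIES), encoding="utf-8")
    sample_extracted = select_fields(path)

print(sample_extracted[0])  # {'title': 'Movie A', 'genre': ['Drama'], 'description': 'A short plot summary.'}
print(sample_extracted[1])  # {'title': 'Movie B', 'genre': ['Sci-Fi', 'Adventure'], 'description': None}
```

Any field other than these three (such as the hypothetical `year`) never reaches the prompt.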


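This function builds the input text (prompt) sent to the GPT model. The prompt gives the model its role, asks for at most two subcategories (topics) per movie, lists the title, genres and description of every movie in the batch, and specifies the JSON structure expected back.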

```python
def create_prompt(batch):
    # Role and task instructions shared by every batch.
    base_prompt = (
        "You are an expert in semantic analysis and audiovisual content categorization. "
        "Your task is to generate subcategories (max 2 topics) that describe specific aspects of movies based on their genres and descriptions. "
        "Here are the movies to analyze:\n\n"
    )
    # One block of text per movie in the batch.
    for film in batch:
        base_prompt += (
            f"Title: '{film['title']}'\n"
            f"Genres: {film['genre']}\n"
            f"Description: {film['description']}\n\n"
        )
    # Expected output: a JSON array with one object per movie.
    base_prompt += (
        "For each movie, return the output in the following format:\n"
        "```\n"
        "[\n"
        "  {\n"
        "    \"title\": \"<Movie Title>\",\n"
        "    \"genre\": [<Genres>],\n"
        "    \"topics\": [<Generated Subcategories>]\n"
        "  }\n"
        "]\n"
        "```"
    )
    return base_prompt
```
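Because `create_prompt` interpolates `film['genre']` directly, a list of genres appears in the prompt as its Python repr, quotes and brackets included. The model can most likely read either form; the hypothetical `render_movie` variant below simply joins list-valued genres with commas so Python syntax stays out of the prompt. The sample movie is made up, and treating `genre` as a list is an assumption based on the output at the end of this notebook.

```python
def render_movie(film):
    """Render one movie like the loop in create_prompt, but join list-valued genres.

    Hypothetical variant: create_prompt interpolates film['genre'] as-is, so a
    list appears in the prompt as its Python repr, e.g. ['Drama', 'Sci-Fi'].
    """
    genre = film["genre"]
    if isinstance(genre, list):
        genre = ", ".join(genre)
    return (
        f"Title: '{film['title']}'\n"
        f"Genres: {genre}\n"
        f"Description: {film['description']}\n\n"
    )


sample_film = {"title": "Movie A", "genre": ["Drama", "Sci-Fi"], "description": "A short plot summary."}
print(f"Genres: {sample_film['genre']}")  # as create_prompt renders it: Genres: ['Drama', 'Sci-Fi']
print(render_movie(sample_film), end="")  # second line here: Genres: Drama, Sci-Fi
```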


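This function sends the prompt to the chosen model, removes any Markdown code fences from the reply, and parses the remaining text into a Python list of results. If the request or the parsing fails, it prints the error and returns `None`.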

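```python
def call_gpt(prompt, model):
    """Send `prompt` to `model` and return the parsed list of results, or None on failure."""
    try:
        # openai>=1.0 interface; `openai.ChatCompletion` was removed in that release.
        response = openai.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.7,
            max_tokens=1000
        )
        content = response.choices[0].message.content

        # The model usually wraps its answer in a ```json ... ``` block; drop the fences.
        content_cleaned = content.replace("```json", "").replace("```", "").strip()

        # json.loads instead of eval(): never execute model output as Python code.
        return json.loads(content_cleaned)
    except Exception as e:
        print(f"Failed to use model: {e}")
        return None
```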
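The parsing step in `call_gpt` only succeeds when the reply contains nothing but the (optionally fenced) array; a sentence before or after it makes the whole batch fail. A more tolerant approach, sketched below under the assumption that the reply contains a single JSON array, is to take the first fenced block if there is one and then parse the span from the first `[` to the last `]`. `parse_model_reply` and `FENCE_RE` are hypothetical names, not part of the OpenAI library.

```python
import json
import re

# An optional ```json (or bare ```) fence around the payload; DOTALL lets . span lines.
FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)\s*```", re.DOTALL)


def parse_model_reply(content):
    """Return the JSON array found in a chat reply, or None if there is none.

    Hypothetical helper: prefer the first fenced block, then parse the span
    from the first "[" to the last "]" so surrounding prose is ignored.
    """
    match = FENCE_RE.search(content)
    text = match.group(1) if match else content
    start, end = text.find("["), text.rfind("]")
    if start == -1 or end < start:
        return None
    try:
        data = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, list) else None


reply = 'Here you go:\n```json\n[{"title": "Movie A", "topics": ["X", "Y"]}]\n```\nEnjoy!'
print(parse_model_reply(reply))                                    # [{'title': 'Movie A', 'topics': ['X', 'Y']}]
print(parse_model_reply('[{"title": "Movie B", "topics": null}]'))  # [{'title': 'Movie B', 'topics': None}]
print(parse_model_reply("Sorry, I cannot help with that."))         # None
```

Inside `call_gpt`, the two cleaning lines could be swapped for `return parse_model_reply(content)`; a `None` result is then skipped by `generate_topics` exactly like a failed request.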


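This function coordinates topic generation in batches: it splits the movie list into chunks of `batch_size` movies (10 by default), sends one `gpt-4o-mini` request per chunk, and concatenates the parsed results. A batch whose request or parsing fails is skipped, so its movies are missing from the final output.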

```python
def generate_topics(films, batch_size=10):
    results = []

    for i in range(0, len(films), batch_size):
        batch = films[i:i + batch_size]
        prompt = create_prompt(batch)
        response = call_gpt(prompt, "gpt-4o-mini")
        if response:
            results.extend(response)

    return results
```
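The loop above sends one request per `batch_size` movies, with a shorter final batch when the count does not divide evenly. The hypothetical helper below lists the slice bounds the loop visits, for an assumed collection of 23 movies.

```python
def batch_bounds(n_items, batch_size=10):
    """Return the (start, stop) slice bounds that generate_topics' loop visits.

    range(0, n_items, batch_size) yields the start indices; slicing clamps the
    final stop index to n_items, so the last batch may be shorter.
    """
    return [(i, min(i + batch_size, n_items)) for i in range(0, n_items, batch_size)]


bounds = batch_bounds(23)
print(bounds)       # [(0, 10), (10, 20), (20, 23)]
print(len(bounds))  # 3 requests to the API
```

A larger `batch_size` means fewer requests but a longer reply, which must still fit within the `max_tokens=1000` limit set in `call_gpt`; a truncated reply will not parse, and the whole batch is then dropped.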


```python
json_file = "movies.json"

films = extract_movie_data(json_file)

results = generate_topics(films)

print(json.dumps(results, indent=4, ensure_ascii=False))
```


```json
[
    {
        "title": "The Lord of the Rings: The Fellowship of the Ring",
        "genre": [
            "Action",
            "Adventure",
            "Drama"
        ],
        "topics": [
            "Epic Quest",
            "Fantasy Worldbuilding"
        ]
    },
    {
        "title": "Interstellar",
        "genre": [
            "Adventure",
            "Drama",
            "Sci-Fi"
        ],
        "topics": [
            "Space Exploration",
            "Human Survival"
        ]
    },
    {
        "title": "The Martian",
        "genre": [
            "Adventure",
            "Drama",
            "Sci-Fi"
        ],
        "topics": [
            "Survival Drama",
            "Ingenuity in Isolation"
        ]
    },
    {
        "title": "Star Wars",
        "genre": [
            "Action",
            "Adventure",
            "Fantasy"
        ],
        "topics": [
            "Galactic Conflict",
            "Hero's Journey"
        ]
    }
]
```
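Since `generate_topics` silently skips a batch whose request or parsing fails, and nothing in the code enforces the "max 2 topics" instruction, it can be useful to compare `results` with `films` after a run. The sketch below is a hypothetical check; it assumes titles are unique and that the model echoes them exactly (a reworded title shows up as both missing and unexpected). The sample data is made up.

```python
def check_results(films, results, max_topics=2):
    """Compare generated topics with the input movies.

    Hypothetical helper. Assumes titles are unique and that the model echoes
    them exactly; a reworded title shows up as both missing and unexpected.
    """
    expected = {film["title"] for film in films}
    returned = {item.get("title") for item in results}
    bad_topics = [
        item.get("title")
        for item in results
        # Flag entries whose "topics" is absent, not a list, or longer than allowed.
        if not isinstance(item.get("topics"), list) or len(item["topics"]) > max_topics
    ]
    return {
        "missing": sorted(expected - returned),  # e.g. dropped with a failed batch
        "unexpected": sorted(t for t in returned - expected if t is not None),
        "bad_topics": bad_topics,
    }


sample_films = [{"title": "Movie A"}, {"title": "Movie B"}]
sample_results = [{"title": "Movie A", "topics": ["X", "Y", "Z"]}]
print(check_results(sample_films, sample_results))
# {'missing': ['Movie B'], 'unexpected': [], 'bad_topics': ['Movie A']}
```

Running `check_results(films, results)` after the main cell would list any movies whose batch was dropped.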
