## Integration of a Large Language Model (LLM)

To enrich the analysis and support interpretation, we integrated an **OpenAI GPT model** through the official `openai` Python package.

The model can be used to:
- generate explanations and summaries of results,
- assist with code documentation and interpretation,
- or even answer natural language queries about the dataset.
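
All three uses come down to sending a prompt and reading the reply text. A minimal sketch of that pattern is given here, assuming version 1 or later of the `openai` package; `ask_gpt` is a hypothetical helper name, not something defined elsewhere in this notebook, and the stub client only mimics the response shape so that the example runs offline.

```python
from types import SimpleNamespace


def ask_gpt(client, prompt, model="gpt-3.5-turbo",
            system="You are a helpful data analyst."):
    """Send one prompt and return the reply text (hypothetical helper)."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content


# Offline stand-in that echoes the user message back.
# Assumption: swap in a real `OpenAI()` client for actual requests.
def _fake_create(model, messages):
    reply = SimpleNamespace(content=f"echo: {messages[-1]['content']}")
    return SimpleNamespace(choices=[SimpleNamespace(message=reply)])


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)

reply_text = ask_gpt(fake_client, "hello")
print(reply_text)
```

Passing the client in as an argument keeps the helper easy to test without network access or an API key.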

The API key is loaded from a `.env` file rather than hard-coded, so the credentials stay out of the notebook.

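```python
import os
from openai import OpenAI

# Assumes OPENAI_API_KEY is already in the environment (see the setup cell below)
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
```
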
## OpenAI API Setup

We load the OpenAI API key from a secure `.env` file using the `python-dotenv` package.  
This allows us to query GPT models directly from the notebook.


In [19]:
import openai
from dotenv import load_dotenv
import os

# Load API key from .env file
load_dotenv("../.env")
print("Key found:", os.getenv("OPENAI_API_KEY") is not None)
openai.api_key = os.getenv("OPENAI_API_KEY")


Key found: True
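
The check above only prints whether the key exists. If stopping early is preferred, one possible approach is a small guard that raises when the variable is missing. This is a sketch only; `require_api_key` is a hypothetical helper and not part of `openai` or `python-dotenv`.

```python
import os


def require_api_key(name="OPENAI_API_KEY"):
    """Return the value of the environment variable `name`, or raise if unset."""
    key = os.getenv(name)
    if not key:
        # An empty or missing key usually means the .env path is wrong
        raise RuntimeError(f"{name} is not set; check the .env file path.")
    return key
```

Calling `require_api_key()` right after `load_dotenv(...)` would surface a wrong `.env` path before any request is sent.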


## Prompt Construction

We create a natural language prompt based on the current DataFrame column names.  
This prompt is sent to GPT to generate ideas for analysis and visualization.

In [20]:
# Generate a prompt based on the DataFrame columns
columns = df.columns.tolist()
prompt = f"""
I have a DataFrame with the following columns: {columns}.
Please suggest 3 meaningful analyses, visualizations, or statistical tests I could perform.
Explain each briefly.
"""

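The prompt construction can also be wrapped in a function so that it is reusable for other DataFrames. The sketch below assumes only a plain list of column names; `build_prompt` and the example column names are hypothetical and chosen for illustration.

```python
def build_prompt(columns, n_suggestions=3):
    """Build the analysis-suggestion prompt from a list of column names."""
    column_list = ", ".join(str(c) for c in columns)
    return (
        f"I have a DataFrame with the following columns: {column_list}.\n"
        f"Please suggest {n_suggestions} meaningful analyses, visualizations, "
        "or statistical tests I could perform.\n"
        "Explain each briefly."
    )


# Hypothetical column names, for illustration only
example_prompt = build_prompt(["title", "year", "rating"])
print(example_prompt)
```

With a real DataFrame, the call would be `build_prompt(df.columns.tolist())`.
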
## 🤖 GPT Analysis Suggestions

We send the prompt to the OpenAI GPT model (`gpt-3.5-turbo`) and print its response.  
The model suggests possible directions for analysis based on our dataset structure.

In [22]:
# Query GPT-3.5 for suggestions (openai>=1.0 interface; uses the key set on openai.api_key above)
response = openai.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful data analyst."},
        {"role": "user", "content": prompt},
    ],
)

# Display the suggestions
print(response.choices[0].message.content)

1. Genre Analysis:
Perform an analysis to identify the most popular genres in terms of average rating and number of votes. This can be presented as a bar chart or a heatmap, showing the average rating and number of votes for each genre. This analysis can help understand audience preferences and the overall quality of content in different genres.

2. Yearly Trends Analysis:
Create a line chart or time series plot to visualize the trends in average ratings over the years. This analysis can help identify any patterns or changes in the quality of content over time. It could also identify any trends in viewer engagement based on the number of votes received each year.

3. Rating vs. Votes Relationship:
Conduct a correlation analysis between average rating and the number of votes to determine if there is any relationship between the two. Scatter plots or a correlation matrix can be used to visualize this relationship. Understanding how rating and votes are related can provide insights into v