# Identifying Conflict Trends in Yemen
## By: Veronica Lopez-Romero

## Introduction
OpenAI's o1 reasoning model is designed for complex problem-solving, data analysis, and image interpretation by simulating a multi-step thought process before generating responses. Unlike traditional GPT models, which produce output in a single pass, reasoning models use internal **reasoning tokens** to explore multiple approaches before finalizing an answer.
<p align="center">
    <img src="https://cdn.openai.com/API/images/guides/reasoning_tokens.png" alt="Reasoning Tokens" width="600">
</p>  

*Source: [OpenAI Reasoning Models Guide](https://platform.openai.com/docs/guides/reasoning)*

**Key Differences: o1 Reasoning Model vs. GPT**
- Multi-step reasoning: o1 evaluates different solutions before selecting the best response.
- Deeper analytical capabilities: Optimized for complex data interpretation tasks.
- Context-aware image analysis: Provides more structured and insightful image descriptions.
- Reasoning effort control: Users can adjust the depth of reasoning (`low`, `medium`, `high`).


For more details, refer to the [OpenAI Reasoning Models Guide](https://platform.openai.com/docs/guides/reasoning).


## Purchase and Store API Key

**Purchase** an [OpenAI](https://openai.com/) API key and store it securely, for example in **AWS Secrets Manager**, using the following names:

- **Key Name:** `api_key`  
- **Key Value:** `<your OpenAI API key>`  
- **Secret Name:** `openai`  
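
The `get_secret()` helper defined later in this notebook parses the stored secret as JSON and reads its `api_key` field, so the secret value needs to be a JSON object with that key. The following is a minimal sketch of building and checking such a payload with the standard library only. `build_secret_payload` is a hypothetical helper, and the key shown is a placeholder, not a real credential.

```python
import json


def build_secret_payload(api_key: str) -> str:
    """Return the JSON string to store as the value of the `openai` secret."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    return json.dumps({"api_key": api_key})


# Placeholder value for illustration only; never commit a real key.
payload = build_secret_payload("sk-placeholder")

# Round-trip check: this mirrors how the notebook reads the key back.
assert json.loads(payload)["api_key"] == "sk-placeholder"
print(payload)
```

Entering the key/value pair listed above in the AWS console should produce an equivalent JSON string, so this step is only needed when the secret is created programmatically.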

## Install Python Libraries

- **openai**: Used to call `o1` and `GPT` models for data analysis and image interpretation.

In [1]:
pip install openai -q

Note: you may need to restart the kernel to use updated packages.


## Import Required Libraries

The following libraries are used in this notebook:

- **boto3**: AWS SDK for Python, used to interact with AWS services.
- **json**: Standard Python library for handling JSON data.
- **IPython.display**: Provides tools to display images, Markdown content, and other rich media in Jupyter Notebook.
- **openai**: Used to call `o1` and `GPT` models for data analysis and image interpretation.
- **pandas**: A powerful library for data manipulation and analysis.
- **pprint**: Pretty prints data structures for better readability.

In [2]:
import boto3
import json
from IPython.display import display, Image, Markdown
from openai import OpenAI
import pandas as pd
from pprint import pprint

## Retrieve API Keys Securely from AWS Secrets Manager

The following function, `get_secret()`, retrieves a secret from **AWS Secrets Manager**. This is a secure way to store and access sensitive credentials, such as API keys, without hardcoding them into the script.

In [3]:
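from botocore.exceptions import ClientError  # raised by boto3 when the lookup fails


def get_secret(secret_name):
    """Fetch a secret from AWS Secrets Manager and parse its JSON payload."""
    region_name = "us-east-1"

    # Create a Secrets Manager client
    session = boto3.session.Session()
    client = session.client(
        service_name='secretsmanager',
        region_name=region_name
    )

    try:
        get_secret_value_response = client.get_secret_value(
            SecretId=secret_name
        )
    except ClientError as e:
        # Surface AWS errors (missing secret, denied access) to the caller
        raise e

    # The secret is stored as a JSON string, e.g. {"api_key": "..."}
    secret = get_secret_value_response['SecretString']

    return json.loads(secret)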

## Initialize OpenAI Client

The following code initializes the OpenAI client using a securely stored API key retrieved from AWS Secrets Manager.

In [4]:
client = OpenAI(api_key=get_secret('openai')['api_key'])

## Load and Analyze the Yemen Dataset

In [5]:
df = pd.read_excel('Yemen Data.xlsx')
data_json = df.to_json(orient="records")
df.head()

Unnamed: 0,Country,Admin1,Admin2,Admin1 Pcode,Month,Year,Events,Fatalities
0,Yemen,Abyan,Ahwar,YE12,January,2022,2,1
1,Yemen,Ad Dali,Ad Dhalee,YE30,January,2022,2,3
2,Yemen,Aden,Khur Maksar,YE24,January,2022,4,2
3,Yemen,Al Bayda,Az Zahir,YE14,January,2022,1,1
4,Yemen,Al Bayda,Naman,YE14,January,2022,14,10
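
Sending every row as JSON makes the prompt large (the usage printouts later in this notebook report roughly 70,900 prompt tokens). One possible way to shrink it is to aggregate rows before building the prompt. The sketch below sums `Events` and `Fatalities` per governorate and month using only the standard library, on the five preview rows shown above. `summarize_by_governorate` is a hypothetical helper, not part of the original notebook, and aggregating this way drops the district-level (`Admin2`) detail.

```python
import json
from collections import defaultdict

# Five rows copied from the df.head() preview; in the notebook the full list
# could come from df.to_dict(orient="records") (assumption).
records = [
    {"Admin1": "Abyan", "Admin2": "Ahwar", "Month": "January", "Year": 2022, "Events": 2, "Fatalities": 1},
    {"Admin1": "Ad Dali", "Admin2": "Ad Dhalee", "Month": "January", "Year": 2022, "Events": 2, "Fatalities": 3},
    {"Admin1": "Aden", "Admin2": "Khur Maksar", "Month": "January", "Year": 2022, "Events": 4, "Fatalities": 2},
    {"Admin1": "Al Bayda", "Admin2": "Az Zahir", "Month": "January", "Year": 2022, "Events": 1, "Fatalities": 1},
    {"Admin1": "Al Bayda", "Admin2": "Naman", "Month": "January", "Year": 2022, "Events": 14, "Fatalities": 10},
]


def summarize_by_governorate(rows):
    """Sum Events and Fatalities per (Admin1, Year, Month)."""
    totals = defaultdict(lambda: {"Events": 0, "Fatalities": 0})
    for row in rows:
        key = (row["Admin1"], row["Year"], row["Month"])
        totals[key]["Events"] += row["Events"]
        totals[key]["Fatalities"] += row["Fatalities"]
    return [
        {"Admin1": admin1, "Year": year, "Month": month, **counts}
        for (admin1, year, month), counts in totals.items()
    ]


summary = summarize_by_governorate(records)

# Compact separators remove the whitespace json.dumps adds by default.
compact_json = json.dumps(summary, separators=(",", ":"))
print(len(json.dumps(records)), "->", len(compact_json), "characters")
```

Whether the lost district detail is acceptable depends on the question being asked; a summary like this trades resolution for a smaller prompt.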


## Generate Data Analysis Prompt for OpenAI Model

To investigate which **patterns** and **anomalies** appear in the 2022-2024 data, we build a prompt for the OpenAI model that embeds the full dataset as JSON. The model is asked to analyze the dataset and return insights, including **Python code for visualizations**.


In [6]:
data_prompt = f"Based on the provided dataset, what patterns or anomalies do you notice and what might explain any spikes or regional differences? Provide Python-generated charts to support your conclusion. Data: {data_json}"
# print(data_prompt)

## Define a Function to Get Assistance from OpenAI GPT-4o

The following function, `openai_gpt_help()`, sends a prompt to OpenAI's **GPT-4o model** and returns a response. It also prints the number of tokens used in the request.

In [7]:
def openai_gpt_help(prompt):
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model='gpt-4o',
        messages=messages,
        temperature=0
    )
    token_usage = response.usage
    
    pprint(f"Tokens used: {token_usage}")

    return response.choices[0].message.content

In [8]:
gpt_result = openai_gpt_help(prompt=data_prompt)

('Tokens used: CompletionUsage(completion_tokens=835, prompt_tokens=70916, '
 'total_tokens=71751, '
 'completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, '
 'audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), '
 'prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))')


In [9]:
display(Markdown(gpt_result))

To analyze the dataset and identify patterns or anomalies, we can use Python libraries such as pandas for data manipulation and matplotlib or seaborn for visualization. Let's start by loading the data into a pandas DataFrame and then create some visualizations to explore the data.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the data into a pandas DataFrame
data = [
    {"Country": "Yemen", "Admin1": "Abyan", "Admin2": "Ahwar", "Admin1 Pcode": "YE12", "Month": "January", "Year": 2022, "Events": 2, "Fatalities": 1},
    # ... (other data points)
    {"Country": "Yemen", "Admin1": "Lahij", "Admin2": "Al Qabbaytah", "Admin1 Pcode": "YE25", "Month": "December", "Year": 2024, "Events": 2, "Fatalities": 17}
]

df = pd.DataFrame(data)

# Convert Month to a datetime object for easier plotting
df['Date'] = pd.to_datetime(df['Year'].astype(str) + '-' + df['Month'], format='%Y-%B')

# Group by Date and sum the Events and Fatalities
monthly_data = df.groupby('Date').sum().reset_index()

# Plot the number of events over time
plt.figure(figsize=(14, 6))
sns.lineplot(data=monthly_data, x='Date', y='Events', marker='o')
plt.title('Number of Events Over Time')
plt.xlabel('Date')
plt.ylabel('Number of Events')
plt.xticks(rotation=45)
plt.grid(True)
plt.show()

# Plot the number of fatalities over time
plt.figure(figsize=(14, 6))
sns.lineplot(data=monthly_data, x='Date', y='Fatalities', marker='o', color='red')
plt.title('Number of Fatalities Over Time')
plt.xlabel('Date')
plt.ylabel('Number of Fatalities')
plt.xticks(rotation=45)
plt.grid(True)
plt.show()

# Group by Admin1 and sum the Events and Fatalities
regional_data = df.groupby('Admin1').sum().reset_index()

# Plot the number of events by region
plt.figure(figsize=(14, 6))
sns.barplot(data=regional_data, x='Admin1', y='Events')
plt.title('Number of Events by Region')
plt.xlabel('Region')
plt.ylabel('Number of Events')
plt.xticks(rotation=45)
plt.grid(True)
plt.show()

# Plot the number of fatalities by region
plt.figure(figsize=(14, 6))
sns.barplot(data=regional_data, x='Admin1', y='Fatalities', color='red')
plt.title('Number of Fatalities by Region')
plt.xlabel('Region')
plt.ylabel('Number of Fatalities')
plt.xticks(rotation=45)
plt.grid(True)
plt.show()
```

### Observations:

1. **Temporal Patterns**:
   - The line plots for events and fatalities over time can help identify any spikes or trends. Look for months with unusually high numbers of events or fatalities, which might indicate periods of increased conflict or specific incidents.

2. **Regional Differences**:
   - The bar plots for events and fatalities by region (Admin1) can highlight which regions are most affected. Regions with consistently high numbers of events or fatalities might be conflict hotspots.

3. **Anomalies**:
   - Any sudden spikes in events or fatalities could indicate specific incidents or escalations in conflict. These should be investigated further to understand the underlying causes.

4. **Potential Explanations**:
   - Spikes in events or fatalities could be due to military operations, political events, or other socio-political factors. Regional differences might be explained by the presence of active conflict zones or political tensions.

By analyzing these visualizations, you can gain insights into the patterns and anomalies in the dataset, which can help in understanding the conflict dynamics in Yemen.

## Define a Function to Get Assistance from OpenAI o1 Model  

The following function, `openai_o_help()`, sends a prompt to OpenAI's **o1 reasoning model** and returns a response.  

### Key Differences Between o1 and GPT Models:
- **Reasoning Effort**: The o1 model allows users to control reasoning depth using `reasoning_effort` (`low`, `medium`, `high`).  
- **No Temperature Parameter**: Unlike GPT models, **o1 does not support `temperature`**.  
- **Developer Messages Replace System Messages**:  
  - Starting with `o1-2024-12-17`, **developer messages** replace **system messages** to align with chain-of-command behavior.  

### Best Practices for Prompting o1  
- **Keep prompts simple and direct.**  
- **Avoid chain-of-thought prompts.** o1 reasons internally, so step-by-step instructions aren't needed.  
- **Use delimiters for clarity.** Use Markdown, XML tags, or section titles.  
- **Try zero-shot first.** If needed, add few-shot examples that closely match your goal.  
- **Be explicit.** Clearly define success criteria and constraints.  
- **Markdown is disabled by default.** To re-enable it, include the string `Formatting re-enabled` on the first line of the developer message.  

Source: [OpenAI Reasoning Models Best Practices Guide](https://platform.openai.com/docs/guides/reasoning-best-practices).  
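
Several of these practices can be combined when assembling the message list. The sketch below is illustrative only: `build_o1_messages` is a hypothetical helper, the instruction text and tag names are assumptions, and whether the `developer` role is accepted depends on the model snapshot in use.

```python
def build_o1_messages(question: str, data_json: str, want_markdown: bool = True) -> list:
    """Build a chat message list for a reasoning model (hypothetical helper)."""
    instructions = "You are analyzing monthly conflict event counts for Yemen."
    if want_markdown:
        # The marker goes on the first line of the developer message.
        instructions = "Formatting re-enabled\n" + instructions

    # XML-style tags act as delimiters between the question and the raw data.
    user_content = (
        f"<question>\n{question}\n</question>\n"
        f"<data>\n{data_json}\n</data>"
    )
    return [
        {"role": "developer", "content": instructions},
        {"role": "user", "content": user_content},
    ]


messages = build_o1_messages(
    question="What patterns or anomalies do you notice?",
    data_json='[{"Admin1": "Abyan", "Events": 2, "Fatalities": 1}]',
)
for message in messages:
    print(message["role"], "->", message["content"].splitlines()[0])
```

The resulting list could be passed as the `messages` argument of `client.chat.completions.create(...)` in place of the single user message used in this notebook.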


In [10]:
def openai_o_help(prompt):
    messages = [ {"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model='o1',
        reasoning_effort="high", # low, medium or high
        messages=messages,

    )
    token_usage = response.usage
    
    pprint(f"Tokens used: {token_usage}")

    return response.choices[0].message.content

In [11]:
o1_result = openai_o_help(prompt=data_prompt)

('Tokens used: CompletionUsage(completion_tokens=3265, prompt_tokens=70915, '
 'total_tokens=74180, '
 'completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, '
 'audio_tokens=0, reasoning_tokens=1984, rejected_prediction_tokens=0), '
 'prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))')
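
The two usage printouts can be compared directly. Reasoning tokens are counted inside `completion_tokens` but are not part of the returned text, so the visible output is the difference between the two. The short calculation below uses the numbers printed in this notebook; the helper names are hypothetical.

```python
# Values copied from the two usage printouts in this notebook.
gpt4o_usage = {"prompt_tokens": 70916, "completion_tokens": 835, "reasoning_tokens": 0}
o1_usage = {"prompt_tokens": 70915, "completion_tokens": 3265, "reasoning_tokens": 1984}


def visible_output_tokens(usage: dict) -> int:
    """Completion tokens that end up in the returned message text."""
    return usage["completion_tokens"] - usage["reasoning_tokens"]


def reasoning_share(usage: dict) -> float:
    """Fraction of completion tokens spent on hidden reasoning."""
    return usage["reasoning_tokens"] / usage["completion_tokens"]


for name, usage in [("gpt-4o", gpt4o_usage), ("o1", o1_usage)]:
    print(name, visible_output_tokens(usage), f"{reasoning_share(usage):.1%}")
# gpt-4o 835 0.0%
# o1 1281 60.8%
```

In this run, about 61% of the o1 completion tokens went to reasoning, leaving 1,281 tokens of visible output compared with 835 for GPT-4o.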


In [12]:
print(o1_result)

Below is an illustrative example of how one might load this dataset into Python, produce a few summary charts, and interpret the major trends. In short:

• Marib (particularly Harib, Al Jubah, Marib City), Shabwah (e.g., Ain, Bayhan, Ataq), and Hajjah (e.g., Haradh, Abs) stand out in early 2022 with large spikes in both the number of conflict events and fatalities.  
• These spikes coincide with well‐known front lines of fighting in Yemen where offensives were especially intense.  
• Through 2022–2023, many governorates (e.g., Taizz, Al Bayda, Al Jawf) continue to see persistent conflict, creating smaller waves of violence.  
• The data suggests conflict often surges where front lines shift or major operations occur—e.g., in Marib and Shabwah during January–February 2022, or in Hajjah (Haradh, Abs) in February 2022.  
• Regional differences often reflect changes in allegiances, local ceasefires, or major offensives, causing sudden spikes in events and fatalities in certain months.

───

## Model Evaluation

**How the model's reasoning supported my analysis:**

The model helped me find meaningful patterns in the Yemen conflict data by identifying key spikes in violence and connecting them to likely causes, such as front-line offensives and regional shifts in control. It also visualized trends over time and by region, which supported my understanding of how conflict intensity changed across months and districts.

**Whether this approach could be applied to real-world intelligence workflows:**

Yes, I believe this approach could support real-world intelligence work by quickly identifying high-risk areas, explaining anomalies, and tracking the effectiveness of ceasefires or peace efforts. Structured reasoning like this helps analysts connect data trends to on-the-ground events and generate useful hypotheses.

**Any limitations or ethical concerns you encountered:**

One limitation I faced was that the model couldn't handle large datasets, which could introduce errors that carry through into the analysis. Another limitation is that the model can only analyze the data it is given. It doesn't know the full real-world context, so its explanations should not be assumed to be accurate.

## References  
- **OpenAI Reasoning Models Guide**: [OpenAI](https://platform.openai.com/docs/guides/reasoning)  
- **OpenAI Reasoning Models Best Practices Guide**: [OpenAI](https://platform.openai.com/docs/guides/reasoning-best-practices)  
- **Colin Jarvis. “Reasoning with O1.” DeepLearning.AI.** Accessed February 14, 2025. [DeepLearning.AI](https://www.deeplearning.ai/short-courses/reasoning-with-o1/)  