<img src="https://bigdata.com/assets/notebooks/bigdata-by-ravenpack-logo.png" width="300" align="center">
<br>
<br>

# **Daily Top Trending Topics for Crude Oil**

This Jupyter notebook implements an **agentic workflow** built on content retrieved from the Bigdata API to **identify, verify, reindex, and summarize** the specialized news items that form the **trending topics** in the crude oil market.

The workflow is structured as follows:

**Step 1- Generation of the Lexicon**: Identify the industry-specific jargon relevant to the crude oil market to ensure high recall in content retrieval.

**Step 2- Content Retrieval Based on Bigdata**: Run a keyword search over news content with the Bigdata API to retrieve documents, splitting the search into daily time windows and multi-threading the per-keyword queries for speed.

**Step 3- Topic Clustering and Selection**: Use a large language model to verify and cluster the news. A summarization pass then selects the top trending news for crude oil and derives analytics that quantify each topic's trendiness (based on news volume), novelty (based on daily changes in summaries), and impact and magnitude (based on financial materiality for crude oil prices).

**Step 4- Customized Report Generation**: Customize how the summarized topics are ranked by trendiness, novelty, and financial materiality for crude oil prices, and display a daily market update. For verification, the reports are backed by the underlying granular news and sources.

**Output**

1. **Daily Market Reports**: A detailed and visually appealing report summarizing the top trending topics for crude oil, with a customizable ranking system to reindex the news.
2. **Actionable Dataframe**: A timestamped dataframe containing the granular news clustered into relevant topics, together with trendiness, novelty, impact, and magnitude scores that can be used for backtesting.

**Requirements**

- Credentials for the Bigdata API to perform keyword and document searches on news content.
- Credentials for the OpenAI API used in the notebook. Any other LLM provider could be substituted.
- A `src` folder in the same directory as this notebook, containing the modules imported below (`lexicon_generator.py`, `search_topics.py`, and `topics_extractor.py`).
- A `requirements.txt` file listing all the necessary Python libraries and dependencies. We recommend installing these packages in a virtual environment.

# Set-Up

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from src.lexicon_generator import LexiconGenerator

In [3]:
import os
from dotenv import load_dotenv

load_dotenv(os.path.abspath("/home/abouchs/.python_env_var/.env"))

True

In [4]:
BIGDATA_USERNAME = os.getenv("BIGDATA_USERNAME")
BIGDATA_PASSWORD = os.getenv("BIGDATA_PASSWORD")
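
Since every later step depends on these credentials, a quick check can fail early with a readable message. The snippet below is a minimal sketch; the helper `missing_credentials` is hypothetical and not part of the notebook's `src` package. It assumes the three variable names used in this notebook.

```python
import os

# Variable names read elsewhere in this notebook.
REQUIRED_VARS = ("BIGDATA_USERNAME", "BIGDATA_PASSWORD", "OPENAI_API_KEY")


def missing_credentials(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


missing = missing_credentials()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
```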

In [5]:
from bigdata_client import Bigdata
bigdata_cred = Bigdata(BIGDATA_USERNAME, BIGDATA_PASSWORD)

In [6]:
output_dir = "/home/abouchs/shared/OutputData/abouchs/Bigdata_cookbook/trending_topics/"

# Step 1- Generation of the Lexicon

In this step, we identify the industry-specific jargon relevant to the crude oil market to ensure high recall in content retrieval.

In [7]:
main_theme = "Crude Oil"
system_prompt = (
    f"""You are an expert tasked with generating a lexicon of the most important and relevant keywords specific to the given main theme and its related market.

    Your goal is to compile a list of terms that are critical for understanding and analyzing the main theme's market. This lexicon should include only the most essential keywords, phrases, and abbreviations that are directly associated with trading, analysis, logistics, and industry reporting related to the main theme.

    Guidelines:

    1. **Focus on relevance:** Include only the most important and commonly used keywords that are uniquely tied to main theme and its market. These should reflect key concepts, market mechanisms, pricing benchmarks, logistical aspects, and industry-specific terminology.
    2. **Avoid redundancy:** Do not repeat the word of the main theme or its components, such as "Crude" or "Oil" in multiple phrases. Include the main theme only as a standalone term, and focus on other specific terms without redundant repetition.
    3. **Strict exclusion of generic terms:** Exclude any terms that are generic or broadly used in other markets, such as "Arbitrage," "Hedge," "Liquidity," "Spot Price," "Futures Contract," "Backwardation," or "Contango," even if they have a specific meaning in the main theme market. Only include terms that are uniquely relevant to the main theme market and cannot be applied broadly.
    4. **Include specific variations:** Where applicable, provide both the full form and common abbreviations as SEPARATE keywords (e.g., "West Texas Intermediate" and "WTI" or variations like "Brent" and "Brent Crude").
    5. **Ensure clarity:** Each keyword should be concise, clear, and directly relevant to the main theme's market, avoiding any ambiguity.
    6. **Select only the most critical:** There is no need to reach a specific number of keywords. Focus solely on the most crucial terms without padding the list. If fewer keywords meet the criteria, that is acceptable.

    The output should be a lexicon of only the most critical and uniquely relevant keywords related to the main theme market, formatted as a JSON list.
    """
)


In [8]:
try:
    import asyncio
    asyncio.get_running_loop()
    import nest_asyncio; nest_asyncio.apply()
    print("✅ nest_asyncio applied")
except (RuntimeError, ImportError):
    print("✅ nest_asyncio not needed")

✅ nest_asyncio applied


In [9]:
# Use a separate name for the instance so the imported class is not overwritten
lexicon_generator = LexiconGenerator(openai_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o", seeds=[123, 123456, 123456789, 456789, 789])

In [10]:
keywords_lex = lexicon_generator.generate(theme=main_theme, system_prompt=system_prompt)

[LexiconGenerator] Using seeds: [123, 123456, 123456789, 456789, 789]
[LexiconGenerator] Raw response for seed 123: {
    "keywords": [
        "Brent",
        "WTI",
        "OPEC",
        "API Gravity",
        "Sweet Crude",
        "Sour Crude",
        "Crack Spread",
        "Refinery Throughput",
        "Strategic Petroleum Reserve",
        "Barrel",
        "Tanker Rates",
        "Seaborne Trade",
        "Pipeline Capacity",
        "Upstream",
        "Downstream",
        "Midstream",
        "EIA",
        "IEA",
        "Rigs Count",
        "Fracking",
        "Shale Oil",
        "Non-OPEC",
        "Oil Sands",
        "Offshore Drilling",
        "Oilfield Services",
        "Production Sharing Agreement",
        "Exploration and Production",
        "Hydrocarbon Reserves",
        "Crude Assay",
        "Spot Charter"
    ]
}
[LexiconGenerator] Parsed JSON for seed 123: {'keywords': ['Brent', 'WTI', 'OPEC', 'API Gravity', 'Sweet Crude', 'Sour Crude', 'Crack Spre
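
The raw response above is a JSON object with a single `keywords` list. How `LexiconGenerator` parses it is not shown here, so the following is only a sketch of a defensive parser. The function `parse_keywords` is hypothetical, and the code-fence stripping assumes the model may occasionally wrap its JSON in a markdown fence.

```python
import json
import re

FENCE = "`" * 3  # markdown code-fence marker


def parse_keywords(raw: str) -> list[str]:
    """Extract a clean list of keywords from a raw model response."""
    # Drop a surrounding markdown fence (optionally tagged json) if present.
    cleaned = re.sub(rf"^{FENCE}(?:json)?\s*|\s*{FENCE}$", "", raw.strip())
    payload = json.loads(cleaned)
    # Accept either {"keywords": [...]} or a bare JSON list.
    items = payload.get("keywords", []) if isinstance(payload, dict) else payload
    return [kw.strip() for kw in items if isinstance(kw, str) and kw.strip()]


print(parse_keywords('{"keywords": ["Brent", "WTI", " OPEC "]}'))
```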

In [11]:
keywords_lex

['Brent',
 'WTI',
 'OPEC',
 'API Gravity',
 'Sweet Crude',
 'Sour Crude',
 'Crack Spread',
 'Refinery Throughput',
 'Strategic Petroleum Reserve',
 'Barrel',
 'Tanker Rates',
 'Seaborne Trade',
 'Pipeline Capacity',
 'Upstream',
 'Downstream',
 'Midstream',
 'EIA',
 'IEA',
 'Rigs Count',
 'Fracking',
 'Shale Oil',
 'Non-OPEC',
 'Oil Sands',
 'Offshore Drilling',
 'Oilfield Services',
 'Production Sharing Agreement',
 'Exploration and Production',
 'Hydrocarbon Reserves',
 'Crude Assay',
 'Spot Charter',
 'Refinery Margin',
 'Light Crude',
 'Heavy Crude',
 'NYMEX',
 'ICE',
 'Petrochemicals',
 'Onshore Drilling',
 'Oil Rig',
 'Pipeline',
 'Floating Storage',
 'Distillation',
 'Hydrocracking',
 'Desulfurization',
 'Oil Benchmark',
 'Price Differential',
 'Production Cut',
 'Geopolitical Risk',
 'Brent Crude',
 'West Texas Intermediate',
 'OPEC+',
 'Refinery Utilization',
 'SPR',
 'Bbl',
 'Energy Information Administration',
 'International Energy Agency',
 'Seaborne Crude',
 'Bitumen',
 '

In [12]:
len(keywords_lex)

81
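
The final list starts with the 30 keywords returned for seed 123 and then continues with additional terms, which is consistent with a union of the per-seed responses in first-seen order. The sketch below illustrates that kind of merge. `merge_lexicons` is a hypothetical helper, and the case-insensitive deduplication is an assumption rather than the documented behaviour of `LexiconGenerator`.

```python
def merge_lexicons(per_seed_keywords):
    """Union keyword lists in first-seen order, ignoring case for duplicates."""
    seen = set()
    merged = []
    for keywords in per_seed_keywords:
        for kw in keywords:
            key = kw.casefold()
            if key not in seen:
                seen.add(key)
                merged.append(kw)
    return merged


# Toy per-seed outputs, not the actual model responses.
runs = [
    ["Brent", "WTI", "OPEC"],
    ["WTI", "Brent Crude", "OPEC+"],
    ["brent", "SPR"],
]
print(merge_lexicons(runs))
# ['Brent', 'WTI', 'OPEC', 'Brent Crude', 'OPEC+', 'SPR']
```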

# Step 2- Content Retrieval Based on Bigdata

In this section, we run a keyword search over news content with the Bigdata API to retrieve documents, splitting the search into daily time windows and multi-threading the per-keyword queries for speed. The user can set the time range below to generate daily reports between the start and end dates.

In [13]:
start_query = '2025-06-21'
end_query = '2025-06-28'

In [14]:
from src.search_topics import search_by_keywords
results, daily_keyword_count = search_by_keywords(
    keywords=keywords_lex,
    start_date=start_query,
    end_date=end_query,
    freq='D',
    document_limit=10)

About to run 648 queries
Example Query: Keyword('Brent') over date range: AbsoluteDateRange('2025-06-21T00:00:00', '2025-06-21T23:59:59')


Querying Bigdata...: 100%|██████████| 648/648 [02:15<00:00,  4.77it/s]
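
The 648 queries reported above match 81 keywords multiplied by 8 daily windows (June 21 to June 28 inclusive). The sketch below reproduces that query plan and fans it out over a thread pool. It does not call the Bigdata API: `fake_search` is a hypothetical stand-in for a single keyword search, and the pool size is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from itertools import product


def daily_windows(start: str, end: str):
    """Yield (day_start, day_end) timestamps for each day in [start, end]."""
    day, last = date.fromisoformat(start), date.fromisoformat(end)
    while day <= last:
        yield f"{day}T00:00:00", f"{day}T23:59:59"
        day += timedelta(days=1)


def build_queries(keywords, start, end):
    """Pair every keyword with every daily window."""
    return list(product(keywords, daily_windows(start, end)))


def fake_search(query):
    """Hypothetical stand-in for one API call; returns an empty hit list."""
    keyword, (window_start, window_end) = query
    return {"keyword": keyword, "from": window_start, "to": window_end, "hits": []}


def run_queries(queries, max_workers=8):
    """Run the queries concurrently, preserving input order in the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_search, queries))


queries = build_queries([f"kw{i}" for i in range(81)], "2025-06-21", "2025-06-28")
print(len(queries))  # 81 keywords x 8 days = 648
```

Threads suit this workload because each query spends most of its time waiting on the network rather than on the CPU.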


In [15]:
results

Unnamed: 0,timestamp,rp_document_id,headline,chunk_number,sentence_id,source_id,source_name,text,keyword,date
0,2025-06-21 00:00:00+00:00,96EC510A80FAD37155CD1543681E8CC1,"Morning Briefing: June 21, 2025",9.0,96EC510A80FAD37155CD1543681E8CC1-9,CE1ADC,Anadolu Agency,- Top European diplomats emphasized the urgenc...,Geopolitical Risk,2025-06-21
1,2025-06-21 00:00:00+00:00,0B4844B85EA092ED4AB50AB3534AB360,BW Energy: Update on Fixed Income Investor Mee...,3.0,0B4844B85EA092ED4AB50AB3534AB360-3,923B93,Financial Services Monitor Worldwide,BW Energy is a growth E&P company with a diffe...,E&P,2025-06-21
2,2025-06-21 00:00:00+00:00,45A5868267C8A445F6B696C495AA8673,Aduro Clean Technologies Announces Closing of ...,7.0,45A5868267C8A445F6B696C495AA8673-7,D051D6,Global Data Point,About Aduro Clean Technologies\nAduro Clean Te...,Bitumen,2025-06-21
3,2025-06-21 00:00:00+00:00,45A5868267C8A445F6B696C495AA8673,Aduro Clean Technologies Announces Closing of ...,1.0,45A5868267C8A445F6B696C495AA8673-1,D051D6,Global Data Point,(GlobeNewswire) - Aduro Clean Technologies Inc...,Bitumen,2025-06-21
4,2025-06-21 00:00:00+00:00,96EC510A80FAD37155CD1543681E8CC1,"Morning Briefing: June 21, 2025",11.0,96EC510A80FAD37155CD1543681E8CC1-11,CE1ADC,Anadolu Agency,China is the world's largest oil importer. In ...,Energy Information Administration,2025-06-21
...,...,...,...,...,...,...,...,...,...,...
9432,2025-06-28 23:50:36+00:00,DF667772C27122819087C82C1D54C3DD,The Strategic Empire: Debt & the Dollar,55.0,DF667772C27122819087C82C1D54C3DD-55,EC0C87,Michael Hudson,"Yes, someday the United States cannot get a fr...",OPEC+,2025-06-28
9433,2025-06-28 23:50:36+00:00,DF667772C27122819087C82C1D54C3DD,The Strategic Empire: Debt & the Dollar,46.0,DF667772C27122819087C82C1D54C3DD-46,EC0C87,Michael Hudson,The United States is unwilling to annul Global...,OPEC+,2025-06-28
9434,2025-06-28 23:50:36+00:00,DF667772C27122819087C82C1D54C3DD,The Strategic Empire: Debt & the Dollar,47.0,DF667772C27122819087C82C1D54C3DD-47,EC0C87,Michael Hudson,"However, the condition for letting OPEC countr...",OPEC+,2025-06-28
9435,2025-06-28 23:54:30+00:00,83B358B6C8498F0C6A4F19F9F88A07A6,An animated example of ice-core drilling in An...,1.0,83B358B6C8498F0C6A4F19F9F88A07A6-1,905C54,ABC Online,An animated example of ice-core drilling in An...,ICE,2025-06-28
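
The structure of `daily_keyword_count` is not shown in this notebook. One plausible shape is a tally of retrieved chunks per day and keyword, sketched below in plain Python on a few toy rows that mirror the `date` and `keyword` columns above. Such a tally also helps spot noisy keywords: in the last row of the output, the keyword `ICE` matched an article about ice-core drilling, which is why a verification layer follows in Step 3.

```python
from collections import Counter

# Toy rows mirroring the `date` and `keyword` columns of `results`.
rows = [
    {"date": "2025-06-21", "keyword": "Bitumen"},
    {"date": "2025-06-21", "keyword": "Bitumen"},
    {"date": "2025-06-21", "keyword": "E&P"},
    {"date": "2025-06-28", "keyword": "OPEC+"},
    {"date": "2025-06-28", "keyword": "ICE"},
]


def count_by_day_and_keyword(rows):
    """Count retrieved chunks for each (date, keyword) pair."""
    return Counter((row["date"], row["keyword"]) for row in rows)


counts = count_by_day_and_keyword(rows)
for (day, keyword), n in sorted(counts.items()):
    print(day, keyword, n)
```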


# Step 3- Topic Clustering and Selection

In this step, we use a large language model to verify and cluster the news. A summarization pass then selects the top trending news for crude oil and derives analytics that quantify each topic's trendiness (based on news volume), novelty (based on daily changes in summaries), and impact and magnitude (based on financial materiality for crude oil prices).

Before clustering, we apply a verification layer that removes news items that are not relevant to the oil market.

In [16]:
model = "gpt-4o-mini" 
api_key = os.getenv("OPENAI_API_KEY")

In [17]:
from src.topics_extractor import process_all_reports
semaphore_size = 1000  # Upper bound on concurrent LLM requests (assumed from the parameter name)

# `results` is the DataFrame of retrieved chunks from Step 2
filtered_reports = process_all_reports(results, model, api_key, main_theme, semaphore_size)

Filtering News:   0%|          | 0/9437 [00:00<?, ?it/s]
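
The `semaphore_size` argument suggests that the filter issues many LLM requests concurrently while capping how many are in flight. The internals of `process_all_reports` are not shown, so the following is only a sketch of that pattern with `asyncio.Semaphore`. The function `is_relevant` is a hypothetical stand-in that uses a crude substring match instead of an LLM call.

```python
import asyncio


async def is_relevant(text: str, theme: str) -> bool:
    """Hypothetical stand-in for an LLM relevance check (substring heuristic)."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return any(word in text.lower() for word in theme.lower().split())


async def filter_chunks(texts, theme, semaphore_size=1000):
    """Keep the relevant texts, with at most `semaphore_size` checks in flight."""
    semaphore = asyncio.Semaphore(semaphore_size)

    async def guarded(text):
        async with semaphore:
            return await is_relevant(text, theme)

    flags = await asyncio.gather(*(guarded(t) for t in texts))
    return [t for t, keep in zip(texts, flags) if keep]


sample = [
    "Brent crude rose after OPEC+ signalled an output hike.",
    "An animated example of ice-core drilling in Antarctica.",
    "Oil prices fell as the US delayed its decision.",
]
kept = asyncio.run(filter_chunks(sample, "Crude Oil"))
print(len(kept))  # 2
```

Inside Jupyter, `asyncio.run` works here only because `nest_asyncio` was applied earlier in the notebook; in a plain script it works as written.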

Next, we use an LLM to perform topic modeling, identifying and clustering the key topics in the filtered news reports.

In [18]:
filtered_reports

Unnamed: 0,timestamp,rp_document_id,headline,chunk_number,sentence_id,source_id,source_name,text,keyword,date
0,2025-06-21 00:00:00+00:00,45A5868267C8A445F6B696C495AA8673,Aduro Clean Technologies Announces Closing of ...,7.0,45A5868267C8A445F6B696C495AA8673-7,D051D6,Global Data Point,About Aduro Clean Technologies\nAduro Clean Te...,Bitumen,2025-06-21
1,2025-06-21 00:00:00+00:00,45A5868267C8A445F6B696C495AA8673,Aduro Clean Technologies Announces Closing of ...,1.0,45A5868267C8A445F6B696C495AA8673-1,D051D6,Global Data Point,(GlobeNewswire) - Aduro Clean Technologies Inc...,Bitumen,2025-06-21
2,2025-06-21 00:00:00+00:00,E2D405BFF0426528C258C62EC5773E36,Nine creates new executive roles as it combine...,6.0,E2D405BFF0426528C258C62EC5773E36-6,346656,Executive Appointments Worldwide,The company is recruiting for new positions in...,Brent,2025-06-21
3,2025-06-21 00:00:00+00:00,C47A2AB4D8E9C02F3720F56FC500DE76,Nine Announces New Leadership and Plans to Gro...,4.0,C47A2AB4D8E9C02F3720F56FC500DE76-4,346656,Executive Appointments Worldwide,Sport is still a key part of Nine and the bigg...,Brent,2025-06-21
4,2025-06-21 00:00:00+00:00,5DB7156ACE3425F4F61B3BE3E882AF0D,Premium Alcohol Market Forecasts Report 2025-2...,3.0,5DB7156ACE3425F4F61B3BE3E882AF0D-3,923B93,Financial Services Monitor Worldwide,Market Trends Growing Disposable Income and Ur...,Energy Information Administration,2025-06-21
...,...,...,...,...,...,...,...,...,...,...
3749,2025-06-28 23:33:44+00:00,4C596AE14BED3F44AB5D9442B44F1F5F,Better Dividend Stock: Kinder Morgan vs. Enter...,1.0,4C596AE14BED3F44AB5D9442B44F1F5F-1,648085,AOL.com,If you are looking at Kinder Morgan (NYSE: KMI...,Midstream,2025-06-28
3750,2025-06-28 23:37:13+00:00,99083749AA42E83A4352A4002BC9479C,"3 days after Kullu flash flood, missing teen's...",2.0,99083749AA42E83A4352A4002BC9479C-2,80FC03,The Times Of India,They were swept away from a hydropower project...,Downstream,2025-06-28
3751,2025-06-28 23:42:59+00:00,EF9DB0616299C83C5CF64CE5B32D2744,Iran signals openness to transfers of enriched...,3.0,EF9DB0616299C83C5CF64CE5B32D2744-3,7018D1,Charlotte Observer,"However, he stressed that Iran would not renou...",Liftings,2025-06-28
3752,2025-06-28 23:48:49+00:00,456DB9D74C8F0B01626EE845F8FF4CA6,Caught on camera: Car literally drives through...,2.0,456DB9D74C8F0B01626EE845F8FF4CA6-2,E54C73,ABC News,"""This is like a movie or something,"" Patel sai...",Barrel,2025-06-28


In [19]:
from src.topics_extractor import run_process_all_trending_topics
flattened_trending_topics_df = run_process_all_trending_topics(
    unique_reports=filtered_reports,
    model=model,
    start_query=start_query,
    end_query=end_query,
    api_key=os.environ['OPENAI_API_KEY'],
    main_theme = main_theme,
    batches = 20
)


Extracting Topics for 2025-06-21:   0%|          | 0/19 [00:00<?, ?it/s]

Extracting Topics for 2025-06-22:   0%|          | 0/20 [00:00<?, ?it/s]

Extracting Topics for 2025-06-23:   0%|          | 0/20 [00:00<?, ?it/s]

Extracting Topics for 2025-06-24:   0%|          | 0/20 [00:00<?, ?it/s]

Extracting Topics for 2025-06-25:   0%|          | 0/20 [00:00<?, ?it/s]

Extracting Topics for 2025-06-26:   0%|          | 0/20 [00:00<?, ?it/s]

Extracting Topics for 2025-06-27:   0%|          | 0/20 [00:00<?, ?it/s]

Extracting Topics for 2025-06-28:   0%|          | 0/20 [00:00<?, ?it/s]

Consolidating topics...


Consolidating topic batches: 100%|██████████| 9/9 [00:19<00:00,  2.12s/it]


{'Geopolitical Tensions Impacting Oil Prices': ['topic_1', 'topic_5', 'topic_14', 'topic_40', 'topic_50', 'topic_265', 'topic_277', 'topic_278', 'topic_287', 'topic_309', 'topic_312', 'topic_322', 'topic_330', 'topic_335', 'topic_339'], 'Geopolitical Risks Impacting Oil Supply': ['topic_4', 'topic_15', 'topic_48'], 'Iran-Israel Conflict and Geopolitical Risks': ['topic_7', 'topic_20', 'topic_39'], 'OPEC+ Production Decisions': ['topic_3', 'topic_13', 'topic_28', 'topic_22', 'topic_215', 'topic_218', 'topic_222', 'topic_238', 'topic_246'], 'Impact of Sanctions on Oil Markets': ['topic_8'], 'Market Recovery and Oil Price Trends': ['topic_6', 'topic_16', 'topic_23', 'topic_49'], 'Climate Change and Fossil Fuel Exploration': ['topic_2'], "Brazil's Fossil Fuel Development Plans": ['topic_9'], "OPEC's Role in Global Oil Markets": ['topic_10', 'topic_30'], 'Challenges Facing Major Refiners': ['topic_12', 'topic_34', 'topic_46'], 'U.S. Military Involvement in Middle East Conflicts': ['topic_26

Summarizing topics: 100%|██████████| 31/31 [00:06<00:00,  5.15it/s]
Generating titles: 100%|██████████| 32/32 [00:05<00:00,  5.53it/s]


Generating Day in Review summaries...
Adding one-line summaries to DataFrame...


Generating text summaries: 100%|██████████| 1274/1274 [00:13<00:00, 91.05it/s] 
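
The consolidation step printed above maps each consolidated label to the raw topic ids it absorbs. To label individual rows, the inverse lookup (raw topic id to consolidated label) is usually more convenient. The sketch below inverts a small excerpt of that mapping. `invert_mapping` is a hypothetical helper, not a function from `src.topics_extractor`.

```python
# Excerpt of the consolidation mapping printed above.
consolidated = {
    "OPEC+ Production Decisions": ["topic_3", "topic_13", "topic_28"],
    "Impact of Sanctions on Oil Markets": ["topic_8"],
    "OPEC's Role in Global Oil Markets": ["topic_10", "topic_30"],
}


def invert_mapping(label_to_topics):
    """Map each raw topic id to its consolidated label."""
    topic_to_label = {}
    for label, topic_ids in label_to_topics.items():
        for topic_id in topic_ids:
            # Keep the first label if an id were ever listed twice.
            topic_to_label.setdefault(topic_id, label)
    return topic_to_label


lookup = invert_mapping(consolidated)
print(lookup["topic_13"])  # OPEC+ Production Decisions
```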


In [22]:
flattened_trending_topics_df

Unnamed: 0,Date,Day_in_Review,Topic,Summary,Source,Headline,Text,Volume_Score,Topic_labels,Text_Summary
0,2025-06-21,- **Geopolitical Tensions**: Escalating confli...,Israel-Iran Tensions Drive Oil Price Volatilit...,"Geopolitical tensions, particularly between Is...",Livemint,Russia's Top Oil Executive Says OPEC Was Astu...,It's been a turbulent week in the global oil m...,1,Geopolitical Tensions Impacting Oil Prices,Recent military exchanges between Israel and I...
1,2025-06-21,- **Geopolitical Tensions**: Escalating confli...,OPEC+ Adjusts Production Strategies Amid Clima...,OPEC+ production decisions are increasingly in...,InsideClimate News,Scientists' Letter Urges Brazil's President Lu...,"""Our two main tasks are to eliminate fossil fu...",2,OPEC+ Production Decisions,A call to halt fossil fuel exploration and use...
2,2025-06-21,- **Geopolitical Tensions**: Escalating confli...,OPEC+ Adjusts Production Strategies Amid Clima...,OPEC+ production decisions are increasingly in...,InsideClimate News,Scientists' Letter Urges Brazil's President Lu...,"""I took the letter to the head of the Brazilia...",2,OPEC+ Production Decisions,Australian climate scientist Bill Hare engages...
3,2025-06-21,- **Geopolitical Tensions**: Escalating confli...,"OPEC+ Boosts Oil Production by 411,000 Barrels...",The impact of sanctions on oil markets is sign...,Yahoo! Finance,Russia's Top Oil Executive Says OPEC+ Was Astu...,"""The decision by OPEC+ leaders to raise produc...",2,Impact of Sanctions on Oil Markets,OPEC+ leaders' decision to increase oil produc...
4,2025-06-21,- **Geopolitical Tensions**: Escalating confli...,"OPEC+ Boosts Oil Production by 411,000 Barrels...",The impact of sanctions on oil markets is sign...,Yahoo! Finance,Russia's Top Oil Executive Says OPEC+ Was Astu...,(Bloomberg) -- Steps taken by the OPEC+ group ...,2,Impact of Sanctions on Oil Markets,OPEC+'s decision to increase oil production by...
...,...,...,...,...,...,...,...,...,...,...
1132,2025-06-27,"- OPEC+ increased oil production by 411,000 ba...","OPEC+ Boosts Oil Production by 411,000 Barrels...",The impact of sanctions on oil markets is sign...,Business Times (Singapore),Oil steadies after report of planned Opec+ Aug...,"""The report about an Opec increase came out an...",3,Impact of Sanctions on Oil Markets,OPEC+'s decision to increase oil production by...
1142,2025-06-27,"- OPEC+ increased oil production by 411,000 ba...",OPEC+ Output Hikes Drive Oil Market Volatility...,CME Group plays a pivotal role in oil futures ...,Investing.com via Web,Oil steadies after report of planned OPEC+ Aug...,Oil steadies after report of planned OPEC+ Aug...,2,CME Group's Role in Oil Futures Trading,OPEC+'s planned output hike for August contrib...
1143,2025-06-27,"- OPEC+ increased oil production by 411,000 ba...",OPEC+ Output Hikes Drive Oil Market Volatility...,CME Group plays a pivotal role in oil futures ...,MT Newswires,Oil Rig Count Falls by 6; Crude Set for Sharp ...,"""This supply expansion comes at a time when th...",2,CME Group's Role in Oil Futures Trading,OPEC+ output hikes amid a challenging global m...
1225,2025-06-28,"- OPEC+ increased oil production by 411,000 ba...","OPEC+ Boosts Oil Production by 411,000 Barrels...",The impact of sanctions on oil markets is sign...,FOX Business,Why summer gas prices are at a four-year low d...,"Since there was no disruption to oil supply, o...",2,Impact of Sanctions on Oil Markets,OPEC+'s decision to boost oil production by 41...


**Trendiness and Novelty Scores**: We derive analytics related to the trendiness of the topic based on the news volume, and the novelty of the topic based on the changes in daily summaries, evaluating the uniqueness and freshness of each topic.

In [23]:
from src.topics_extractor import run_add_advanced_novelty_scores
# Calculate trendiness and novelty scores, assessing the uniqueness and freshness of each topic
flattened_trending_topics_df = run_add_advanced_novelty_scores(flattened_trending_topics_df, api_key = os.environ['OPENAI_API_KEY'], main_theme = main_theme)

Calculating Novelty Scores:   0%|          | 0/56 [00:00<?, ?it/s]
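
`run_add_advanced_novelty_scores` takes an API key, so it presumably relies on an LLM to compare daily summaries; its internals are not shown here. As a simplified, non-LLM stand-in for the same idea, the sketch below labels a topic `Old` when today's summary is textually close to the previous day's and `New` otherwise. The function `novelty_label` and the 0.8 threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher


def novelty_label(today_summary, previous_summary, threshold=0.8):
    """Return 'New' or 'Old' from the textual similarity of two summaries."""
    if not previous_summary:
        return "New"  # no earlier summary exists for this topic
    similarity = SequenceMatcher(
        None, today_summary.lower(), previous_summary.lower()
    ).ratio()
    return "Old" if similarity >= threshold else "New"


yesterday = "OPEC+ raises output by 411,000 barrels per day."
print(novelty_label(yesterday, None))       # New
print(novelty_label(yesterday, yesterday))  # Old
```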

In [24]:
flattened_trending_topics_df

Unnamed: 0,Date,Day_in_Review,Topic,Summary,Source,Headline,Text,Volume_Score,Topic_labels,Text_Summary,Novelty_Score
0,2025-06-21,- **Geopolitical Tensions**: Escalating confli...,Israel-Iran Tensions Drive Oil Price Volatilit...,"Geopolitical tensions, particularly between Is...",Livemint,Russia's Top Oil Executive Says OPEC Was Astu...,It's been a turbulent week in the global oil m...,1,Geopolitical Tensions Impacting Oil Prices,Recent military exchanges between Israel and I...,New
79,2025-06-21,- **Geopolitical Tensions**: Escalating confli...,Israel-Iran Tensions Drive Oil Price Volatilit...,"Geopolitical tensions, particularly between Is...",The Economic Times,Oil prices fall as US delays decision on direc...,Oil prices fell on Friday after the White Hous...,2,Geopolitical Tensions Impacting Oil Prices,Oil prices declined due to the White House's p...,New
78,2025-06-21,- **Geopolitical Tensions**: Escalating confli...,Israel-Iran Tensions Drive Oil Price Volatilit...,"Geopolitical tensions, particularly between Is...",The Economic Times,Oil prices fall as US delays decision on direc...,Iran is OPEC's third-largest producer.\nBrent ...,2,Geopolitical Tensions Impacting Oil Prices,Iran's position as OPEC's third-largest produc...,New
77,2025-06-21,- **Geopolitical Tensions**: Escalating confli...,OPEC+ Output Hikes Drive Oil Market Volatility...,CME Group plays a pivotal role in oil futures ...,Sharenet,Russia's Sechin: OPEC+ oil output rise justifi...,Russia's Sechin: OPEC+ oil output rise justifi...,3,CME Group's Role in Oil Futures Trading,Russia's Sechin defends OPEC+ output increases...,New
76,2025-06-21,- **Geopolitical Tensions**: Escalating confli...,OPEC+ Output Hikes Drive Oil Market Volatility...,CME Group plays a pivotal role in oil futures ...,The Economic Times,Oil prices fall as US delays decision on direc...,Iran is OPEC's third-largest producer.\nBrent ...,3,CME Group's Role in Oil Futures Trading,Iran's position as OPEC's third-largest produc...,New
...,...,...,...,...,...,...,...,...,...,...,...
1142,2025-06-27,"- OPEC+ increased oil production by 411,000 ba...",OPEC+ Output Hikes Drive Oil Market Volatility...,CME Group plays a pivotal role in oil futures ...,Investing.com via Web,Oil steadies after report of planned OPEC+ Aug...,Oil steadies after report of planned OPEC+ Aug...,2,CME Group's Role in Oil Futures Trading,OPEC+'s planned output hike for August contrib...,New
1027,2025-06-27,"- OPEC+ increased oil production by 411,000 ba...",Geopolitical Tensions Drive Oil Price Volatili...,"The recent geopolitical tensions, particularly...",EconoTimes.com,Oil Prices Steady as Markets See Low Risk of S...,"Despite the initial oil price surge, traders a...",5,Technological Innovations in Oil Recovery,Traders' confidence in stable global oil suppl...,Old
1143,2025-06-27,"- OPEC+ increased oil production by 411,000 ba...",OPEC+ Output Hikes Drive Oil Market Volatility...,CME Group plays a pivotal role in oil futures ...,MT Newswires,Oil Rig Count Falls by 6; Crude Set for Sharp ...,"""This supply expansion comes at a time when th...",2,CME Group's Role in Oil Futures Trading,OPEC+ output hikes amid a challenging global m...,New
1225,2025-06-28,"- OPEC+ increased oil production by 411,000 ba...","OPEC+ Boosts Oil Production by 411,000 Barrels...",The impact of sanctions on oil markets is sign...,FOX Business,Why summer gas prices are at a four-year low d...,"Since there was no disruption to oil supply, o...",2,Impact of Sanctions on Oil Markets,OPEC+'s decision to boost oil production by 41...,New


**Financial Materiality**: We derive the impact (Positive, Negative) and magnitude (High, Medium, Low) of each topic, inferring its likely effect on crude oil prices. The inference is based on price mechanisms such as supply and demand dynamics and geopolitical factors.

In [25]:
from src.topics_extractor import add_market_impact_to_df

point_of_view = "a crude oil trader, where price is influenced by supply-demand dynamics, geopolitical events, and market sentiment. \
As a trader, you focus on changes in production, inventories, and economic indicators from key markets."


flattened_trending_topics_df = add_market_impact_to_df(flattened_trending_topics_df, api_key = os.environ['OPENAI_API_KEY'], main_theme = main_theme, point_of_view = point_of_view)

The table below shows the results of topic modeling and summarization. The **Topic** column holds the themes inferred by LLM-based topic clustering, which groups news articles by content and underlying theme; each topic is then rephrased into a concise title based on its summary. The **Summary** column synthesizes all news articles within the same topic, giving a high-level view of the key messages of each cluster. The **Text_Summary** column summarizes each individual chunk, capturing its core message.

In [26]:
flattened_trending_topics_df.head(5)

Unnamed: 0,Date,Day_in_Review,Topic,Summary,Source,Headline,Text,Volume_Score,Text_Summary,Volume_Score.1,Novelty_Score,Impact_Score,Magnitude_Score
0,2025-06-21,- **Geopolitical Tensions**: Escalating confli...,Israel-Iran Tensions Drive Oil Price Volatilit...,"Geopolitical tensions, particularly between Is...",Livemint,Russia's Top Oil Executive Says OPEC Was Astu...,It's been a turbulent week in the global oil m...,1,Recent military exchanges between Israel and I...,1,New,Positive,High
1,2025-06-21,- **Geopolitical Tensions**: Escalating confli...,OPEC+ Adjusts Production Strategies Amid Clima...,OPEC+ production decisions are increasingly in...,InsideClimate News,Scientists' Letter Urges Brazil's President Lu...,"""Our two main tasks are to eliminate fossil fu...",2,A call to halt fossil fuel exploration and use...,2,New,Negative,High
2,2025-06-21,- **Geopolitical Tensions**: Escalating confli...,OPEC+ Adjusts Production Strategies Amid Clima...,OPEC+ production decisions are increasingly in...,InsideClimate News,Scientists' Letter Urges Brazil's President Lu...,"""I took the letter to the head of the Brazilia...",2,Australian climate scientist Bill Hare engages...,2,New,Negative,High
3,2025-06-21,- **Geopolitical Tensions**: Escalating confli...,"OPEC+ Boosts Oil Production by 411,000 Barrels...",The impact of sanctions on oil markets is sign...,Yahoo! Finance,Russia's Top Oil Executive Says OPEC+ Was Astu...,"""The decision by OPEC+ leaders to raise produc...",2,OPEC+ leaders' decision to increase oil produc...,2,New,Negative,High
4,2025-06-21,- **Geopolitical Tensions**: Escalating confli...,"OPEC+ Boosts Oil Production by 411,000 Barrels...",The impact of sanctions on oil markets is sign...,Yahoo! Finance,Russia's Top Oil Executive Says OPEC+ Was Astu...,(Bloomberg) -- Steps taken by the OPEC+ group ...,2,OPEC+'s decision to increase oil production by...,2,New,Negative,High


This timestamped dataframe contains the granular news clustered into relevant topics, which supports verification, along with the trendiness, novelty, impact, and magnitude scores, which can be used for backtesting.
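
For backtesting, the categorical `Impact_Score` and `Magnitude_Score` labels need a numeric form. The sketch below shows one possible encoding: a signed score per topic, averaged per day over distinct topics. The numeric weights and the helpers `signed_score` and `daily_signal` are illustrative assumptions, not part of the notebook's pipeline.

```python
from collections import defaultdict

# Assumed numeric encodings of the categorical labels.
DIRECTION = {"Positive": 1, "Negative": -1}
MAGNITUDE = {"Low": 1, "Medium": 2, "High": 3}


def signed_score(impact, magnitude):
    """Combine direction and magnitude into one signed number (0 if unknown)."""
    return DIRECTION.get(impact, 0) * MAGNITUDE.get(magnitude, 0)


def daily_signal(rows):
    """Average the signed score per date, counting each topic once."""
    per_day = defaultdict(dict)
    for row in rows:
        per_day[row["Date"]][row["Topic"]] = signed_score(
            row["Impact_Score"], row["Magnitude_Score"]
        )
    return {day: sum(s.values()) / len(s) for day, s in per_day.items()}


# Toy rows using the column names of the dataframe above.
rows = [
    {"Date": "2025-06-21", "Topic": "Israel-Iran Tensions", "Impact_Score": "Positive", "Magnitude_Score": "High"},
    {"Date": "2025-06-21", "Topic": "OPEC+ Boosts Oil Production", "Impact_Score": "Negative", "Magnitude_Score": "High"},
    {"Date": "2025-06-21", "Topic": "OPEC+ Boosts Oil Production", "Impact_Score": "Negative", "Magnitude_Score": "High"},
]
print(daily_signal(rows))  # {'2025-06-21': 0.0}
```

Counting each topic once per day keeps topics with many chunks from dominating the average; weighting by `Volume_Score` would be an alternative design.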

# Step 4- Customized Report Generation

In this step, we rank the topics and let the user customize the ranking by trendiness, novelty, and financial materiality for crude oil prices. We then display a daily market update, backed by the corresponding granular news and sources for verification.

The user selects the report date and customizes the ranking to prioritize topics by volume (trendiness and media attention), novelty (the emergence of new daily news), impact direction (positive or negative), and magnitude (financial materiality).

The order of the criteria in `user_selected_ranking` sets their priority: the first criterion has the highest priority, followed by the second, and then the third. The user can list one, two, or three criteria depending on their needs.
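
Priority-ordered ranking of this kind can be expressed as a sort on a tuple of keys, one per criterion, in the order the user lists them. The sketch below is an illustration only: `rank_topics` is hypothetical, and it assumes that a higher `Volume_Score` means more trending, that `New` ranks before `Old`, and that `High` ranks before `Medium` and `Low`. The actual logic of `prepare_data_for_report` may differ.

```python
# Lower rank values sort first.
NOVELTY_RANK = {"New": 0, "Old": 1}
MAGNITUDE_RANK = {"High": 0, "Medium": 1, "Low": 2}

SORT_KEYS = {
    "novelty": lambda t: NOVELTY_RANK.get(t["Novelty_Score"], 2),
    "magnitude": lambda t: MAGNITUDE_RANK.get(t["Magnitude_Score"], 3),
    "volume": lambda t: -t["Volume_Score"],  # larger volume first (assumption)
}


def rank_topics(topics, user_selected_ranking):
    """Sort topics by the listed criteria, earlier criteria taking priority."""
    return sorted(
        topics,
        key=lambda t: tuple(SORT_KEYS[c](t) for c in user_selected_ranking),
    )


# Toy topics, not taken from the notebook's output.
topics = [
    {"Topic": "A", "Novelty_Score": "Old", "Volume_Score": 5, "Magnitude_Score": "High"},
    {"Topic": "B", "Novelty_Score": "New", "Volume_Score": 2, "Magnitude_Score": "Medium"},
    {"Topic": "C", "Novelty_Score": "New", "Volume_Score": 2, "Magnitude_Score": "High"},
    {"Topic": "D", "Novelty_Score": "New", "Volume_Score": 3, "Magnitude_Score": "Low"},
]
ranked = rank_topics(topics, ["novelty", "volume", "magnitude"])
print([t["Topic"] for t in ranked])  # ['D', 'C', 'B', 'A']
```

Because Python compares tuples element by element, later criteria only break ties left by earlier ones, which matches the priority order described above.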

In [71]:
from IPython.display import HTML, display
from src.topics_extractor import clean_text, generate_html_report, prepare_data_for_report

specific_date = '2025-06-23'  # Example date, can be modified as needed

# Clean the text columns before rendering them in the HTML report
for column in ['Summary', 'Day_in_Review', 'Text_Summary', 'Topic']:
    flattened_trending_topics_df[column] = flattened_trending_topics_df[column].apply(clean_text)

# Criteria are applied in list order; edit the list to change the ranking priority
user_selected_ranking = ['novelty', 'volume', 'magnitude']

# Optionally restrict the report by impact direction, e.g. impact_filter = 'positive_impact'
prepared_reports = prepare_data_for_report(flattened_trending_topics_df, user_selected_ranking, impact_filter=None, report_date=specific_date)

# Generate and display the HTML report for each prepared date
for report in prepared_reports:
    html_content = generate_html_report(
        report['date'],
        report['day_in_review'],
        report['topics'],
        'Daily crude oil market update'  # Report title
    )
    display(HTML(html_content))
    print("\n\n")




