<img src="https://bigdata.com/assets/notebooks/bigdata-by-ravenpack-logo.png" width="300" align="center">
<br>
<br>

# **Daily Top Trending Topics for Crude Oil**

This Jupyter notebook implements an **agentic workflow** built on content retrieved from the Bigdata API to **identify, verify, reindex, and summarize** the specialized news items that are **trending topics** for the crude oil market.

The workflow is structured as follows:

**Step 1- Generation of the Lexicon**: Identify the specialized, industry-specific jargon relevant to the crude oil market to ensure high recall in content retrieval.

**Step 2- Content Retrieval Based on Bigdata**: Perform a keyword search on news content with the Bigdata API to retrieve documents, splitting the search into daily time windows and multi-threading the search across individual keywords for speed.

**Step 3- Topic Clustering and Selection**: Perform topic modelling using a large language model to verify and cluster the news. A summarization step then selects the top trending news for crude oil and derives analytics that quantify each topic's trendiness (based on news volume), novelty (based on daily changes in summaries), and impact and magnitude (based on financial materiality for crude oil prices).

**Step 4- Customized Report Generation**: Customize the ranking of the summarized topics by trendiness, novelty, and financial materiality for crude oil prices, and display a daily market update. For verification purposes, the reports are backed by the underlying granular news and sources.

**Output**

1. **Daily Market Reports**: A detailed and visually appealing report summarizing the top trending topics for crude oil, with a customizable ranking system to reindex the news.
2. **Actionable Dataframe**: A timestamped dataframe containing the granular news clustered into relevant topics, together with trendiness, novelty, impact, and magnitude scores that can potentially be used for backtesting.

**Requirements**

- Credentials for the Bigdata API to perform keyword and document searches on news content.
- Credentials for the OpenAI API used in the notebook; any other LLM provider could be substituted.
- A `src` folder in the same directory as this notebook, containing the helper modules imported below (`lexicon_generator.py`, `search_topics.py`, `topics_extractor.py`, and `generic_utils_reports.py`).
- A `requirements.txt` file listing all the necessary Python libraries and dependencies. We recommend installing these packages in a virtual environment.

# Set-Up

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
from src.lexicon_generator import LexiconGenerator

In [3]:
import os
from dotenv import load_dotenv

load_dotenv(os.path.abspath("/home/abouchs/.python_env_var/.env"))

True

In [4]:
BIGDATA_USERNAME = os.getenv("BIGDATA_USERNAME")
BIGDATA_PASSWORD = os.getenv("BIGDATA_PASSWORD")
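
Because `os.getenv` silently returns `None` for an unset variable, a missing credential only surfaces later as an authentication failure. A minimal sketch of a fail-fast check is shown below; `require_env` is a hypothetical helper introduced here for illustration and is not part of the notebook's modules.

```python
import os


def require_env(*names):
    """Return the values of the named environment variables.

    Raises RuntimeError listing every variable that is unset or empty.
    (Hypothetical helper, not part of the notebook's `src` package.)
    """
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return tuple(os.environ[name] for name in names)


# Possible usage with the variable names read above:
# BIGDATA_USERNAME, BIGDATA_PASSWORD = require_env("BIGDATA_USERNAME", "BIGDATA_PASSWORD")
```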

In [5]:
from bigdata_client import Bigdata
bigdata_cred = Bigdata(BIGDATA_USERNAME, BIGDATA_PASSWORD)

In [6]:
output_dir = "/home/abouchs/shared/OutputData/abouchs/Bigdata_cookbook/trending_topics/"

# Step 1- Generation of the Lexicon

In this step, we identify the specialized, industry-specific jargon relevant to the crude oil market to ensure high recall in content retrieval.

In [7]:
main_theme = "Crude Oil"
system_prompt = (
    f"""You are an expert tasked with generating a lexicon of the most important and relevant keywords specific to the given main theme and its related market.

    Your goal is to compile a list of terms that are critical for understanding and analyzing the main theme's market. This lexicon should include only the most essential keywords, phrases, and abbreviations that are directly associated with trading, analysis, logistics, and industry reporting related to the main theme.

    Guidelines:

    1. **Focus on relevance:** Include only the most important and commonly used keywords that are uniquely tied to main theme and its market. These should reflect key concepts, market mechanisms, pricing benchmarks, logistical aspects, and industry-specific terminology.
    2. **Avoid redundancy:** Do not repeat the word of the main theme or its components, such as "Crude" or "Oil" in multiple phrases. Include the main theme only as a standalone term, and focus on other specific terms without redundant repetition.
    3. **Strict exclusion of generic terms:** Exclude any terms that are generic or broadly used in other markets, such as "Arbitrage," "Hedge," "Liquidity," "Spot Price," "Futures Contract," "Backwardation," or "Contango," even if they have a specific meaning in the main theme market. Only include terms that are uniquely relevant to the main theme market and cannot be applied broadly.
    4. **Include specific variations:** Where applicable, provide both the full form and common abbreviations as SEPARATE keywords (e.g., "West Texas Intermediate" and "WTI" or variations like "Brent" and "Brent Crude").
    5. **Ensure clarity:** Each keyword should be concise, clear, and directly relevant to the main theme's market, avoiding any ambiguity.
    6. **Select only the most critical:** There is no need to reach a specific number of keywords. Focus solely on the most crucial terms without padding the list. If fewer keywords meet the criteria, that is acceptable.

    The output should be a lexicon of only the most critical and uniquely relevant keywords related to the main theme market, formatted as a JSON list.
    """
)


In [8]:
try:
    import asyncio
    asyncio.get_running_loop()
    import nest_asyncio; nest_asyncio.apply()
    print("✅ nest_asyncio applied")
except (RuntimeError, ImportError):
    print("✅ nest_asyncio not needed")

✅ nest_asyncio applied


In [9]:
# Use a distinct instance name so the LexiconGenerator class is not shadowed when the cell is re-run
lexicon_generator = LexiconGenerator(openai_key=os.getenv("OPENAI_API_KEY"), model="gpt-4o", seeds=[123, 123456, 123456789, 456789, 789])

In [10]:
keywords_lex = lexicon_generator.generate(theme=main_theme, system_prompt=system_prompt)

[LexiconGenerator] Using seeds: [123, 123456, 123456789, 456789, 789]
[LexiconGenerator] Raw response for seed 123: {
    "keywords": [
        "Brent",
        "Brent Crude",
        "WTI",
        "West Texas Intermediate",
        "OPEC",
        "Organization of the Petroleum Exporting Countries",
        "API Gravity",
        "Sweet Crude",
        "Sour Crude",
        "Crack Spread",
        "Refinery Margin",
        "Barrel",
        "Light Crude",
        "Heavy Crude",
        "ULSD",
        "Ultra-Low Sulfur Diesel",
        "EIA",
        "Energy Information Administration",
        "DOE",
        "Department of Energy",
        "Cushing",
        "Strategic Petroleum Reserve",
        "Bakken",
        "Permian Basin",
        "North Sea Oil",
        "Middle East Crude",
        "Asian Crude",
        "Crude Assay",
        "Tanker Rates",
        "Pipeline Tariff",
        "Floating Storage",
        "Seaborne Crude",
        "Crude Export Ban",
        "Shale Oil",
 

In [11]:
keywords_lex

['Brent',
 'Brent Crude',
 'WTI',
 'West Texas Intermediate',
 'OPEC',
 'Organization of the Petroleum Exporting Countries',
 'API Gravity',
 'Sweet Crude',
 'Sour Crude',
 'Crack Spread',
 'Refinery Margin',
 'Barrel',
 'Light Crude',
 'Heavy Crude',
 'ULSD',
 'Ultra-Low Sulfur Diesel',
 'EIA',
 'Energy Information Administration',
 'DOE',
 'Department of Energy',
 'Cushing',
 'Strategic Petroleum Reserve',
 'Bakken',
 'Permian Basin',
 'North Sea Oil',
 'Middle East Crude',
 'Asian Crude',
 'Crude Assay',
 'Tanker Rates',
 'Pipeline Tariff',
 'Floating Storage',
 'Seaborne Crude',
 'Crude Export Ban',
 'Shale Oil',
 'Bitumen',
 'Oil Sands',
 'Fracking',
 'Hydraulic Fracturing',
 'Oil Rig Count',
 'Drilling Rig',
 'Offshore Drilling',
 'Onshore Drilling',
 'Oilfield Services',
 'Upstream',
 'Midstream',
 'Downstream',
 'Petrochemicals',
 'Refinery Capacity',
 'Oil Benchmark',
 'Crude Oil Inventory',
 'OPEC+',
 'Sulfur Content',
 'Crude Oil Benchmarks',
 'Crude Oil Refining',
 'Crude O

In [12]:
len(keywords_lex)

116
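
The generator is called with five seeds and returns a single list, so the per-seed responses have to be combined somewhere. The sketch below shows one plausible way to do that, assuming each raw response is a JSON object with a `keywords` list (the shape printed above) and that duplicates are dropped case-insensitively. `merge_lexicons` is a hypothetical name; the actual logic inside `LexiconGenerator` may differ.

```python
import json


def merge_lexicons(raw_responses):
    """Merge per-seed JSON responses into one ordered, de-duplicated keyword list.

    Assumption: each response looks like {"keywords": ["Brent", "WTI", ...]}.
    Responses that are not valid JSON are skipped.
    """
    seen = set()
    merged = []
    for raw in raw_responses:
        try:
            payload = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed responses instead of failing the whole run
        keywords = payload.get("keywords", []) if isinstance(payload, dict) else []
        for keyword in keywords:
            cleaned = keyword.strip()
            key = cleaned.casefold()  # case-insensitive comparison key
            if key and key not in seen:
                seen.add(key)
                merged.append(cleaned)
    return merged


responses = [
    '{"keywords": ["Brent", "WTI"]}',
    '{"keywords": ["wti", "OPEC"]}',
    "not valid json",
]
print(merge_lexicons(responses))
```

Keeping the first spelling encountered preserves the casing returned by the earliest seed.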

# Step 2- Content Retrieval Based on Bigdata

In this section, we perform a keyword search on news content with the Bigdata API to retrieve documents, splitting the search into daily time windows and multi-threading the search across individual keywords for speed. The user can define the time range below to generate daily reports between the start and end dates.
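
To make the query fan-out concrete, the sketch below builds one query per (keyword, day) pair and runs them on a thread pool. It uses only the standard library, and `fake_search` is a hypothetical stand-in for a single API call; the real `search_by_keywords` implementation is not shown in this notebook and may be organized differently.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import date, datetime, time, timedelta


def daily_windows(start_date, end_date):
    """Return one (start, end) datetime pair per calendar day, both dates inclusive."""
    first = date.fromisoformat(start_date)
    last = date.fromisoformat(end_date)
    windows = []
    for offset in range((last - first).days + 1):
        day = first + timedelta(days=offset)
        windows.append((datetime.combine(day, time.min), datetime.combine(day, time(23, 59, 59))))
    return windows


def fake_search(keyword, window):
    """Hypothetical stand-in for one API call; returns a small record instead of documents."""
    start, _end = window
    return {"keyword": keyword, "date": start.date().isoformat()}


def run_queries(keywords, start_date, end_date, max_workers=8):
    """Run one query per (day, keyword) pair concurrently, preserving task order."""
    tasks = [(kw, window) for window in daily_windows(start_date, end_date) for kw in keywords]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda task: fake_search(*task), tasks))


hits = run_queries(["Brent", "WTI"], "2025-06-21", "2025-06-28")
print(len(hits))  # 2 keywords x 8 days = 16 queries
```

With the 116 keywords generated in Step 1 and the eight days between the start and end dates, the same grid yields 116 x 8 = 928 queries.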

In [13]:
start_query = '2025-06-21'
end_query = '2025-06-28'

In [None]:
from src.search_topics import search_by_keywords
results, daily_keyword_count = search_by_keywords(
    keywords=keywords_lex,
    start_date=start_query,
    end_date=end_query,
    freq='D',
    document_limit=10)

About to run 928 queries
Example Query: Keyword('Brent') over date range: AbsoluteDateRange('2025-06-21T00:00:00', '2025-06-21T23:59:59')


Querying Bigdata...:  78%|███████▊  | 724/928 [02:36<00:36,  5.55it/s]

In [None]:
results

Unnamed: 0,timestamp,rp_document_id,headline,chunk_number,sentence_id,source_id,source_name,text,keyword,date
0,2025-06-21 00:00:00+00:00,B795BAAD72F92F10BA5DB89D56DBD7DD,Former Fulton County Deputy Sheriff Charged wi...,5.0,B795BAAD72F92F10BA5DB89D56DBD7DD-5,BC923D,Legal Monitor Worldwide,She was indicted by a federal grand jury seate...,Brent,2025-06-21
1,2025-06-21 00:00:00+00:00,86E659AE53391BE3F217C0EC3B031F97,Peter Dey Announces Retirement from Gran Tierr...,3.0,86E659AE53391BE3F217C0EC3B031F97-3,346656,Executive Appointments Worldwide,About Gran Tierra Energy Inc.\nGran Tierra Ene...,Exploration and Production,2025-06-21
2,2025-06-21 00:00:00+00:00,9E72438557B0F58D96E6DAA5279BD0EF,Trafigura Backs Euro Sun's Rovina Valley with ...,9.0,9E72438557B0F58D96E6DAA5279BD0EF-9,923B93,Financial Services Monitor Worldwide,Shares in Euro Sun have tripled this year to C...,Upstream,2025-06-21
3,2025-06-21 00:00:00+00:00,E2D405BFF0426528C258C62EC5773E36,Nine creates new executive roles as it combine...,6.0,E2D405BFF0426528C258C62EC5773E36-6,346656,Executive Appointments Worldwide,The company is recruiting for new positions in...,Brent,2025-06-21
4,2025-06-21 00:00:00+00:00,C47A2AB4D8E9C02F3720F56FC500DE76,Nine Announces New Leadership and Plans to Gro...,4.0,C47A2AB4D8E9C02F3720F56FC500DE76-4,346656,Executive Appointments Worldwide,Sport is still a key part of Nine and the bigg...,Brent,2025-06-21
...,...,...,...,...,...,...,...,...,...,...
9055,2025-06-28 23:37:13+00:00,99083749AA42E83A4352A4002BC9479C,"3 days after Kullu flash flood, missing teen's...",2.0,99083749AA42E83A4352A4002BC9479C-2,80FC03,The Times Of India,They were swept away from a hydropower project...,Downstream,2025-06-28
9057,2025-06-28 23:48:49+00:00,456DB9D74C8F0B01626EE845F8FF4CA6,Caught on camera: Car literally drives through...,2.0,456DB9D74C8F0B01626EE845F8FF4CA6-2,E54C73,ABC News,"""This is like a movie or something,"" Patel sai...",Barrel,2025-06-28
9058,2025-06-28 23:50:36+00:00,DF667772C27122819087C82C1D54C3DD,The Strategic Empire: Debt & the Dollar,46.0,DF667772C27122819087C82C1D54C3DD-46,EC0C87,Michael Hudson,The United States is unwilling to annul Global...,OPEC+,2025-06-28
9059,2025-06-28 23:50:36+00:00,DF667772C27122819087C82C1D54C3DD,The Strategic Empire: Debt & the Dollar,55.0,DF667772C27122819087C82C1D54C3DD-55,EC0C87,Michael Hudson,"Yes, someday the United States cannot get a fr...",OPEC+,2025-06-28


# Step 3- Topic Clustering and Selection

In this step, we perform topic modelling using a large language model to verify and cluster the news. A summarization step then selects the top trending news for crude oil and derives analytics that quantify each topic's trendiness (based on news volume), novelty (based on daily changes in summaries), and impact and magnitude (based on financial materiality for crude oil prices).

Before performing the topic clustering, we apply a verification layer to remove news items that are not relevant to the oil market.

In [None]:
model = "gpt-4o-mini" 
api_key = os.getenv("OPENAI_API_KEY")

In [None]:
from src.topics_extractor import process_all_reports
semaphore_size = 1000

# `results` is the DataFrame returned by the keyword search in Step 2
filtered_reports = process_all_reports(results, model, api_key, main_theme, semaphore_size)

Filtering News:   0%|          | 0/7478 [00:00<?, ?it/s]
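
The `semaphore_size` argument suggests that the relevance checks run concurrently with a cap on in-flight requests. A minimal sketch of that pattern with `asyncio.Semaphore` follows; `is_relevant` is a hypothetical stand-in that uses a naive substring test where the notebook would call an LLM, and the internals of `process_all_reports` are an assumption.

```python
import asyncio


async def is_relevant(text, theme):
    """Hypothetical stand-in for an LLM relevance call (naive substring test)."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return theme.split()[-1].lower() in text.lower()


async def filter_relevant(texts, theme, semaphore_size=10):
    """Keep only relevant texts, running at most `semaphore_size` checks at once."""
    semaphore = asyncio.Semaphore(semaphore_size)

    async def guarded(text):
        async with semaphore:  # blocks while `semaphore_size` checks are in flight
            return await is_relevant(text, theme)

    flags = await asyncio.gather(*(guarded(text) for text in texts))
    return [text for text, keep in zip(texts, flags) if keep]


sample = [
    "Brent oil futures rose on supply fears",
    "Nine creates new executive roles",
]
kept = asyncio.run(filter_relevant(sample, "Crude Oil"))
print(kept)
```

Inside a notebook, `asyncio.run` requires the `nest_asyncio` patch applied earlier, since Jupyter already runs an event loop.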

Next, we leverage an LLM to perform topic modeling, identifying and clustering the key topics from the filtered news reports.

In [None]:
filtered_reports

Unnamed: 0,timestamp,rp_document_id,headline,chunk_number,sentence_id,source_id,source_name,text,keyword,date
0,2025-06-21 00:00:00+00:00,B795BAAD72F92F10BA5DB89D56DBD7DD,Former Fulton County Deputy Sheriff Charged wi...,5.0,B795BAAD72F92F10BA5DB89D56DBD7DD-5,BC923D,Legal Monitor Worldwide,She was indicted by a federal grand jury seate...,Brent,2025-06-21
1,2025-06-21 00:00:00+00:00,86E659AE53391BE3F217C0EC3B031F97,Peter Dey Announces Retirement from Gran Tierr...,3.0,86E659AE53391BE3F217C0EC3B031F97-3,346656,Executive Appointments Worldwide,About Gran Tierra Energy Inc.\nGran Tierra Ene...,Exploration and Production,2025-06-21
2,2025-06-21 00:00:00+00:00,9E72438557B0F58D96E6DAA5279BD0EF,Trafigura Backs Euro Sun's Rovina Valley with ...,9.0,9E72438557B0F58D96E6DAA5279BD0EF-9,923B93,Financial Services Monitor Worldwide,Shares in Euro Sun have tripled this year to C...,Upstream,2025-06-21
3,2025-06-21 00:00:00+00:00,E2D405BFF0426528C258C62EC5773E36,Nine creates new executive roles as it combine...,6.0,E2D405BFF0426528C258C62EC5773E36-6,346656,Executive Appointments Worldwide,The company is recruiting for new positions in...,Brent,2025-06-21
4,2025-06-21 00:00:00+00:00,C47A2AB4D8E9C02F3720F56FC500DE76,Nine Announces New Leadership and Plans to Gro...,4.0,C47A2AB4D8E9C02F3720F56FC500DE76-4,346656,Executive Appointments Worldwide,Sport is still a key part of Nine and the bigg...,Brent,2025-06-21
...,...,...,...,...,...,...,...,...,...,...
3847,2025-06-28 23:33:44+00:00,42F7600A56C081D0A03B719DAF6A339A,3 Magnificent S&P 500 Dividend Stocks Down 25%...,5.0,42F7600A56C081D0A03B719DAF6A339A-5,648085,AOL.com,A lot of fuel to continue growing\nOneok's sto...,Midstream,2025-06-28
3848,2025-06-28 23:33:44+00:00,42F7600A56C081D0A03B719DAF6A339A,3 Magnificent S&P 500 Dividend Stocks Down 25%...,6.0,42F7600A56C081D0A03B719DAF6A339A-6,648085,AOL.com,Oneok's durable midstream business model has e...,Midstream,2025-06-28
3849,2025-06-28 23:33:44+00:00,4C596AE14BED3F44AB5D9442B44F1F5F,Better Dividend Stock: Kinder Morgan vs. Enter...,1.0,4C596AE14BED3F44AB5D9442B44F1F5F-1,648085,AOL.com,If you are looking at Kinder Morgan (NYSE: KMI...,Midstream,2025-06-28
3850,2025-06-28 23:37:13+00:00,99083749AA42E83A4352A4002BC9479C,"3 days after Kullu flash flood, missing teen's...",2.0,99083749AA42E83A4352A4002BC9479C-2,80FC03,The Times Of India,They were swept away from a hydropower project...,Downstream,2025-06-28


In [None]:
from src.topics_extractor import run_process_all_trending_topics

flattened_trending_topics_df = run_process_all_trending_topics(
    unique_reports=filtered_reports,
    model=model,
    start_query=start_query,
    end_query=end_query,
    api_key=os.environ['OPENAI_API_KEY'],
    main_theme=main_theme,
    batches=20,
)


Extracting Topics for 2025-06-21:   0%|          | 0/20 [00:00<?, ?it/s]

Extracting Topics for 2025-06-22:   0%|          | 0/20 [00:00<?, ?it/s]

Consolidating topic batches:   0%|          | 0/9 [25:21<?, ?it/s]
Task exception was never retrieved
future: <Task finished name='Task-22580' coro=<tqdm_asyncio.gather.<locals>.wrap_awaitable() done, defined at /opt/anaconda/envs/cookbook_legacy/lib/python3.9/site-packages/tqdm/asyncio.py:75> exception=UnboundLocalError("local variable 'json' referenced before assignment")>
Traceback (most recent call last):
  File "/opt/anaconda/envs/cookbook_legacy/lib/python3.9/asyncio/tasks.py", line 256, in __step
    result = coro.send(None)
  File "/opt/anaconda/envs/cookbook_legacy/lib/python3.9/site-packages/tqdm/asyncio.py", line 76, in wrap_awaitable
    return i, await f
  File "/home/abouchs/git/bigdata-cookbook/Trending_Topics_Crude_Oil/src/topics_extractor.py", line 631, in consolidate_trending_topics
    }
UnboundLocalError: local variable 'json' referenced before assignment
Task exception was never retrieved
future: <Task finished name='Task-22581' coro=<tqdm_asyncio.gather.<locals>.w

Extracting Topics for 2025-06-23:   0%|          | 0/20 [00:00<?, ?it/s]

Extracting Topics for 2025-06-24:   0%|          | 0/20 [00:00<?, ?it/s]

Extracting Topics for 2025-06-25:   0%|          | 0/20 [00:00<?, ?it/s]

Extracting Topics for 2025-06-26:   0%|          | 0/20 [00:00<?, ?it/s]

Extracting Topics for 2025-06-27:   0%|          | 0/20 [00:00<?, ?it/s]

Extracting Topics for 2025-06-28:   0%|          | 0/20 [00:00<?, ?it/s]

Consolidating topics...




JSON decode error: string indices must be integers
Raw content: {
    "consolidated_topics": {
        "Global Oil Demand Trends": [
            "topic_351",
            "topic_358",
            "topic_386",
            "topic_390",
            "topic_377"
        ],
        "Geopolitical Influence on Oil Supply": [
            "topic_352",
            "topic_370",
            "topic_363",
            "topic_376",
            "topic_378",
            "topic_400"
        ],
        "Oil Price Trends and Market Dynamics": [
            "topic_367",
            "topic_372",
            "topic_395",
            "topic_353"
        ],
        "US Oil Inventory and Production Trends": [
            "topic_361",
            "topic_374",
            "topic_388",
            "topic_369",
            "topic_381"
        ],
        "US Strategic Petroleum Reserve Issues": [
            "topic_356",
            "topic_396"
        ],
        "Technological Advancements in Oil Production": [
      

TypeError: string indices must be integers

JSON decode error: string indices must be integers
Raw content: {
    "consolidated_topics": {
        "Geopolitical Tensions Impacting Oil": [
            "topic_301",
            "topic_310",
            "topic_319",
            "topic_330",
            "topic_336"
        ],
        "OPEC Production Strategies": [
            "topic_303",
            "topic_324",
            "topic_328",
            "topic_343"
        ],
        "US Crude Oil Inventory Dynamics": [
            "topic_304",
            "topic_309",
            "topic_313",
            "topic_331",
            "topic_337"
        ],
        "China's Crude Oil Demand and Supply": [
            "topic_306",
            "topic_318",
            "topic_333"
        ],
        "Impact of Sanctions on Oil Exports": [
            "topic_317",
            "topic_320",
            "topic_338"
        ],
        "Market Reactions to Oil Price Movements": [
            "topic_325",
            "topic_342"
        ],
        "US

In [None]:
flattened_trending_topics_df

Unnamed: 0,Date,Day_in_Review,Topic,Summary,Source,Headline,Text,Volume_Score,Topic_labels,Text_Summary
0,2025-06-21,"- Geopolitical tensions, especially between Ir...",Geopolitical Tensions Drive Oil Prices Toward ...,"The ongoing geopolitical tensions, particularl...",Philippine Daily Inquirer via Web,Next week's oil price hike seen exceeding P5 p...,Iran has previously threatened to close the st...,2,consolidated_topics,Escalating tensions between Iran and Israel ha...
1,2025-06-21,"- Geopolitical tensions, especially between Ir...",Geopolitical Tensions Drive Oil Prices Toward ...,"The ongoing geopolitical tensions, particularl...",Charlotte Observer,"Sheltering in bunker, Iran's supreme leader na...",Iran appears to have overcome its initial shoc...,2,consolidated_topics,Iran's escalating military actions against Isr...
7,2025-06-21,"- Geopolitical tensions, especially between Ir...",Geopolitical Tensions Drive Oil Prices Toward ...,"The ongoing geopolitical tensions, particularl...",MDPI,Price Forecasting of Crude Oil Using Hybrid Ma...,These future directions aim to refine the crud...,1,consolidated_topics,Enhanced crude oil forecasting models integrat...
8,2025-06-21,"- Geopolitical tensions, especially between Ir...",Geopolitical Tensions Drive Oil Prices Toward ...,"The ongoing geopolitical tensions, particularl...",Indian Express,World's biggest banks increased fossil fuel fi...,"""This growth in fossil fuel finance is troubli...",1,consolidated_topics,Increased fossil fuel financing poses a long-t...
10,2025-06-21,"- Geopolitical tensions, especially between Ir...",Geopolitical Tensions Drive Oil Prices Toward ...,"The ongoing geopolitical tensions, particularl...",Yahoo! Finance,The Weekend: Markets on edge as Trump ponders ...,"Away from the Middle East bloodshed, investors...",2,consolidated_topics,Central banks maintain rates amid geopolitical...
...,...,...,...,...,...,...,...,...,...,...
1637,2025-06-28,"- Geopolitical tensions, especially between Ir...",Geopolitical Tensions Drive Oil Prices Toward ...,"The ongoing geopolitical tensions, particularl...",EconoTimes.com,"California Weighs Fuel Import Boost, Pauses Re...",The commission also urged Governor Gavin Newso...,1,consolidated_topics,"California's declining crude oil production, f..."
1638,2025-06-28,"- Geopolitical tensions, especially between Ir...",Geopolitical Tensions Drive Oil Prices Toward ...,"The ongoing geopolitical tensions, particularl...",Polymerupdate via Web,Crude oil stabilizes amid diminishing geopolit...,"Meanwhile, OPEC+ members have pledged to fully...",3,consolidated_topics,"OPEC+ plans to increase oil production by 411,..."
1639,2025-06-28,"- Geopolitical tensions, especially between Ir...",Geopolitical Tensions Drive Oil Prices Toward ...,"The ongoing geopolitical tensions, particularl...",Polymerupdate via Web,Crude oil stabilizes amid diminishing geopolit...,Steady demand increase\nIn its latest monthly ...,3,consolidated_topics,OPEC+ forecasts a steady increase in global cr...
1640,2025-06-28,"- Geopolitical tensions, especially between Ir...",Geopolitical Tensions Drive Oil Prices Toward ...,"The ongoing geopolitical tensions, particularl...",Polymerupdate via Web,Crude oil stabilizes amid diminishing geopolit...,Fundamental shift\nOil traders are gradually s...,3,consolidated_topics,Oil traders are shifting focus to fundamentals...
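
The raw content printed during consolidation wraps the clusters in a top-level `consolidated_topics` key, and every row of the `Topic_labels` column above carries that wrapper key rather than a cluster name. One possible way to consume a payload of that shape is sketched below; `invert_consolidated_topics` is a hypothetical helper, not the code in `topics_extractor.py`.

```python
import json


def invert_consolidated_topics(raw):
    """Map each topic id to its consolidated cluster name.

    Assumed payload shape (taken from the raw content printed above):
    {"consolidated_topics": {"<cluster name>": ["topic_1", "topic_2", ...]}}
    """
    payload = json.loads(raw)
    mapping = payload.get("consolidated_topics") if isinstance(payload, dict) else None
    if not isinstance(mapping, dict):
        raise ValueError("expected a 'consolidated_topics' object mapping names to id lists")

    topic_to_cluster = {}
    for cluster_name, topic_ids in mapping.items():
        if isinstance(topic_ids, str):
            topic_ids = [topic_ids]  # tolerate a single id given as a bare string
        for topic_id in topic_ids:
            topic_to_cluster.setdefault(topic_id, cluster_name)  # first assignment wins
    return topic_to_cluster


raw = """{
    "consolidated_topics": {
        "OPEC Production Strategies": ["topic_303", "topic_324"],
        "US Crude Oil Inventory Dynamics": ["topic_304"]
    }
}"""
print(invert_consolidated_topics(raw))
```

Validating the shape before indexing turns an opaque `string indices must be integers` failure into an explicit error message.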


**Trendiness and Novelty Scores**: We derive analytics related to the trendiness of the topic based on the news volume, and the novelty of the topic based on the changes in daily summaries, evaluating the uniqueness and freshness of each topic.
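
As a rough illustration of scoring novelty from day-over-day changes in summaries, the sketch below labels a topic's summary as new when it shares few words with the previous day's summary. This is a deliberately simple token-overlap heuristic, not the LLM-based method used in this notebook. The `"Ongoing"` label and the 0.5 threshold are assumptions chosen for the example.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity between two strings, in [0, 1]."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def novelty_labels(daily_summaries, threshold=0.5):
    """Label each day's summary "New" or "Ongoing" by comparing it with the previous day.

    `daily_summaries` maps ISO dates to that day's summary for one topic.
    The first day is always "New" because there is nothing to compare against.
    """
    labels = {}
    previous = None
    for day in sorted(daily_summaries):
        summary = daily_summaries[day]
        is_new = previous is None or jaccard(summary, previous) < threshold
        labels[day] = "New" if is_new else "Ongoing"
        previous = summary
    return labels


summaries = {
    "2025-06-21": "iran israel tensions lift brent",
    "2025-06-22": "iran israel tensions lift brent again",
    "2025-06-23": "opec+ agrees output increase",
}
print(novelty_labels(summaries))
```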

In [23]:
from src.generic_utils_reports import run_process_all_trending_topics_legacy
flattened_trending_topics_df_legacy = run_process_all_trending_topics_legacy(
    unique_reports=filtered_reports,
    model=model,
    start_query=start_query,
    end_query=end_query,
    api_key=os.environ['OPENAI_API_KEY'],
    main_theme = main_theme,
    batchs = 20
)

Creating date ranges from 2025-06-21 00:00:00 to 2025-06-28 00:00:00 with frequency 'D'


Extracting Topics for 2025-06-21:   0%|          | 0/20 [00:00<?, ?it/s]

Extracting Topics for 2025-06-22:   0%|          | 0/20 [00:00<?, ?it/s]

Extracting Topics for 2025-06-23:   0%|          | 0/20 [00:00<?, ?it/s]

Extracting Topics for 2025-06-24:   0%|          | 0/20 [00:00<?, ?it/s]

Extracting Topics for 2025-06-25:   0%|          | 0/20 [00:00<?, ?it/s]

Extracting Topics for 2025-06-26:   0%|          | 0/20 [00:00<?, ?it/s]

Extracting Topics for 2025-06-27:   0%|          | 0/20 [00:00<?, ?it/s]

Extracting Topics for 2025-06-28:   0%|          | 0/20 [00:00<?, ?it/s]

Consolidating topics...


SyntaxError: EOL while scanning string literal (<string>, line 2401)

In [None]:
# Calculate trendiness and novelty scores, assessing the uniqueness and freshness of each topic.
# NOTE: `run_add_advanced_novelty_scores` is not imported in any cell above; import it before running this cell.
flattened_trending_topics_df = run_add_advanced_novelty_scores(
    flattened_trending_topics_df, api_key=os.environ['OPENAI_API_KEY'], main_theme=main_theme)

Calculating Novelty Scores:   0%|          | 0/180 [00:00<?, ?it/s]

**Financial Materiality**: We derive analytics related to the impact (Positive, Negative) and magnitude (High, Medium, Low) of the topics, inferring their market impact on crude oil prices. The inference is based on price mechanisms such as supply and demand dynamics and geopolitical factors, among others.

In [None]:
# NOTE: `add_market_impact_to_df` is not imported in any cell above; import it before running this cell.
point_of_view = (
    "a crude oil trader, where price is influenced by supply-demand dynamics, geopolitical events, and market sentiment. "
    "As a trader, you focus on changes in production, inventories, and economic indicators from key markets."
)

flattened_trending_topics_df = add_market_impact_to_df(
    flattened_trending_topics_df, api_key=os.environ['OPENAI_API_KEY'], main_theme=main_theme, point_of_view=point_of_view
)

We display the results of topic modeling and summarization. The **Topic** column represents the themes inferred through topic clustering using an LLM, which groups the news articles based on their content and underlying themes. The **Summary** provides a synthesized overview of all news articles within the same topic, offering a high-level view of the key messages for each cluster. The **Topic** is then rephrased into a concise form based on the summary. The **Text_Summary** provides a detailed summary of each individual chunk, capturing its core message.

In [None]:
flattened_trending_topics_df.head(5)

Unnamed: 0,Date,Day_in_Review,Topic,Summary,Source,Headline,Text,Volume_Score,Text_Summary,Volume_Score.1,Novelty_Score,Impact_Score,Magnitude_Score
0,2024-07-25,- **U.S. Crude Inventories Decline**: U.S. cru...,U.S. Crude Oil Inventories Plunge 3.7 Million ...,Recent reports from the U.S. Energy Informatio...,Klse I3investor.com,PublicInvest Research Headlines - 25 Jul 2024,Workers are now having a harder time finding j...,5,U.S. crude oil inventories unexpectedly droppe...,5,New,Negative,High
3,2024-07-25,- **U.S. Crude Inventories Decline**: U.S. cru...,U.S. Crude Oil Inventories Plunge 3.7 Million ...,Recent reports from the U.S. Energy Informatio...,Financial Express via Web,"Will Nifty hold 24,200 as markets see time & p...","The US Dollar Index (DXY), which measures the ...",5,A slight decline in the US Dollar Index coinci...,5,New,Negative,High
4,2024-07-25,- **U.S. Crude Inventories Decline**: U.S. cru...,U.S. Crude Oil Inventories Plunge 3.7 Million ...,Recent reports from the U.S. Energy Informatio...,RTTNews via Web,Taiwan Shares Tipped To Open In The Red,"In economic news, the Commerce Department unex...",5,U.S. crude oil prices rose following a surpris...,5,New,Negative,High
5,2024-07-25,- **U.S. Crude Inventories Decline**: U.S. cru...,US Crude Inventory Decline Fuels Fluctuations ...,Brent and West Texas Intermediate (WTI) crude ...,Livemint,Indian stock market: 7 key things that changed...,Oil Prices Crude oil prices traded lower. Bren...,4,Recent declines in Brent and WTI crude oil pri...,4,New,Positive,High
6,2024-07-25,- **U.S. Crude Inventories Decline**: U.S. cru...,US Crude Inventory Decline Fuels Fluctuations ...,Brent and West Texas Intermediate (WTI) crude ...,Business Standard via Web,Market outlook July 25: Global sell-off hints ...,The US 10-year bond yield quoted around 4.266 ...,4,Recent fluctuations in Brent and WTI oil price...,4,New,Positive,High


For verification purposes, this actionable, timestamped dataframe contains the granular news clustered into relevant topics, along with trendiness, novelty, impact, and magnitude scores that can potentially be used for backtesting.
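
For backtesting, the categorical scores usually need to be turned into numbers. The sketch below shows one possible encoding: each distinct (date, topic) pair contributes a signed score, and the scores are summed per day. The magnitude weights of 1, 2, and 3 and the helper name `daily_signal` are assumptions made for illustration, not part of the notebook's pipeline.

```python
from collections import defaultdict

# Assumed numeric encodings for the categorical labels shown in the dataframe above.
IMPACT_SIGN = {"Positive": 1, "Negative": -1}
MAGNITUDE_WEIGHT = {"Low": 1, "Medium": 2, "High": 3}


def daily_signal(rows):
    """Sum signed topic scores per date, counting each (date, topic) pair once.

    The dataframe holds one row per news chunk, so the same topic appears on
    many rows; de-duplicating avoids weighting a topic by its chunk count.
    """
    seen = set()
    totals = defaultdict(float)
    for row in rows:
        key = (row["Date"], row["Topic"])
        if key in seen:
            continue
        seen.add(key)
        sign = IMPACT_SIGN.get(row["Impact_Score"], 0)
        weight = MAGNITUDE_WEIGHT.get(row["Magnitude_Score"], 0)
        totals[row["Date"]] += sign * weight
    return dict(totals)


rows = [
    {"Date": "2024-07-25", "Topic": "T1", "Impact_Score": "Negative", "Magnitude_Score": "High"},
    {"Date": "2024-07-25", "Topic": "T1", "Impact_Score": "Negative", "Magnitude_Score": "High"},
    {"Date": "2024-07-25", "Topic": "T2", "Impact_Score": "Positive", "Magnitude_Score": "High"},
    {"Date": "2024-07-26", "Topic": "T3", "Impact_Score": "Positive", "Magnitude_Score": "Medium"},
]
print(daily_signal(rows))
```

With a pandas dataframe, the same rows can be obtained through `df.to_dict("records")`.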

# Step 4- Customized Report Generation

In this step, we rank the topics, allowing the user to customize the ranking system and reindex the news by trendiness, novelty, and financial materiality for crude oil prices. We then display a daily market update, supported by the corresponding granular news and sources for verification purposes.

The user selects the date for the report summarizing the top trending topics and customizes the ranking system to prioritize topics by volume (trendiness and media attention), novelty (the emergence of new daily news), impact direction (positive or negative), and magnitude (financial materiality). The ranking system applies the criteria in the order specified by the user, allowing a tailored focus on the most relevant aspects of the data.

The order in which the criteria are listed in `user_selected_ranking` determines their priority when ranking the topics within the report: the first criterion has the highest priority, followed by the second, and then the third. The user can prioritize impact direction (positive or negative), novelty, magnitude, or volume, and can select one, two, or three criteria as needed.
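
Priority-ordered ranking of this kind can be expressed as a sort on a tuple of keys, built in the order the user lists the criteria. The sketch below illustrates the idea on plain dictionaries. The label orderings, the `"Ongoing"` label, and the simplified `impact` filter are assumptions, and `prepare_data_for_report` may implement the ranking differently.

```python
# Lower sort keys rank first. The label orderings below are assumptions for illustration.
NOVELTY_RANK = {"New": 0}  # any other label ranks after "New"
MAGNITUDE_RANK = {"High": 0, "Medium": 1, "Low": 2}

SORT_KEYS = {
    "novelty": lambda t: NOVELTY_RANK.get(t["Novelty_Score"], 1),
    "volume": lambda t: -t["Volume_Score"],  # higher volume first
    "magnitude": lambda t: MAGNITUDE_RANK.get(t["Magnitude_Score"], 3),
}


def rank_topics(topics, user_selected_ranking, impact=None):
    """Sort topics by the listed criteria, first criterion having highest priority."""
    if impact is not None:
        topics = [t for t in topics if t["Impact_Score"] == impact]
    return sorted(topics, key=lambda t: tuple(SORT_KEYS[c](t) for c in user_selected_ranking))


topics = [
    {"Topic": "A", "Volume_Score": 5, "Novelty_Score": "New", "Impact_Score": "Negative", "Magnitude_Score": "High"},
    {"Topic": "B", "Volume_Score": 4, "Novelty_Score": "New", "Impact_Score": "Positive", "Magnitude_Score": "High"},
    {"Topic": "C", "Volume_Score": 5, "Novelty_Score": "Ongoing", "Impact_Score": "Positive", "Magnitude_Score": "Low"},
    {"Topic": "D", "Volume_Score": 3, "Novelty_Score": "New", "Impact_Score": "Positive", "Magnitude_Score": "Medium"},
]

by_novelty = [t["Topic"] for t in rank_topics(topics, ["novelty", "volume", "magnitude"])]
by_volume = [t["Topic"] for t in rank_topics(topics, ["volume", "magnitude"])]
print(by_novelty)  # ['A', 'B', 'D', 'C']
print(by_volume)   # ['A', 'C', 'B', 'D']
```

Because tuples compare element by element, a later criterion only breaks ties left by the earlier ones, which matches the priority behaviour described above.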

In [None]:
from IPython.display import HTML, display

# NOTE: `clean_text`, `prepare_data_for_report`, and `generate_html_report` are project helpers that are
# not imported in any cell above; import them before running this cell.

# Report date; it must be a date present in flattened_trending_topics_df['Date']
specific_date = '2024-07-25'

# Clean the text columns before rendering
for column in ['Summary', 'Day_in_Review', 'Text_Summary', 'Topic']:
    flattened_trending_topics_df[column] = flattened_trending_topics_df[column].apply(clean_text)

# Ranking criteria in priority order; the user can reorder or shorten this list
user_selected_ranking = ['novelty', 'volume', 'magnitude']

# impact_filter = 'positive_impact'  # Optionally restrict the report to one impact direction

prepared_reports = prepare_data_for_report(
    flattened_trending_topics_df,
    user_selected_ranking,
    impact_filter=None,
    report_date=specific_date,
)

# Generate and display the HTML report for each date
for report in prepared_reports:
    html_content = generate_html_report(
        report['date'],
        report['day_in_review'],
        report['topics'],
        'Daily crude oil market update'  # Report title
    )
    display(HTML(html_content))
    print("")
    print("")
    print("")




