## Setup and Imports

## Async Compatibility Setup

**Run the async setup cell first.** It is required in Google Colab, Jupyter Notebook, and VS Code with the Jupyter extension.

### Why is this needed?

Interactive environments (Colab, Jupyter) already have an asyncio event loop running. When `bigdata-research-tools` makes async API calls (for example, to OpenAI), the call fails with the following error unless `nest_asyncio` has been applied:

```
RuntimeError: asyncio.run() cannot be called from a running event loop
```

Calling `nest_asyncio.apply()` patches asyncio so that a new event loop run can be started while another loop is already running (nested event loops).
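The error can be reproduced with the standard library alone. The sketch below is illustrative and is meant to be run as a plain Python script, not in a notebook where `nest_asyncio` has already been applied. The coroutine names `inner` and `outer` are hypothetical; `outer` stands in for a notebook cell, which already executes inside a running loop.

```python
import asyncio


async def inner():
    # Stand-in for an async API call made by a library.
    return 42


async def outer():
    # Like a notebook cell, this coroutine already runs inside an event loop.
    coro = inner()
    try:
        # Starting a second loop from inside a running one is rejected.
        return asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return f"RuntimeError: {exc}"


result = asyncio.run(outer())
print(result)
```

Run as a script, the nested `asyncio.run()` call raises and `result` holds the error message rather than `42`.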

💡 **Tip**: If you're unsure which environment you're in, just run the cell below - it won't hurt in any environment!

In [1]:
import datetime
start = datetime.datetime.now()

In [2]:
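import asyncio

try:
    asyncio.get_running_loop()  # raises RuntimeError when no loop is running
    import nest_asyncio
    nest_asyncio.apply()
    print("✅ nest_asyncio applied")
except RuntimeError:
    # No running event loop, so asyncio.run() works without patching
    print("✅ nest_asyncio not needed")
except ImportError:
    print("❌ nest_asyncio is not installed; run `pip install nest_asyncio`")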

✅ nest_asyncio applied


## Environment Setup

The following cell adds the current working directory to `sys.path` so that the local `src` modules can be imported.

In [3]:
import os
import sys

current_dir = os.getcwd()
if current_dir not in sys.path:
    sys.path.append(current_dir)
print("✅ Local environment setup complete")

✅ Local environment setup complete


## Load Credentials

In [4]:
from dotenv import load_dotenv
from pathlib import Path

# script_dir = Path(__file__).parent if '__file__' in globals() else Path.cwd()
# load_dotenv(script_dir / '.env')

# Adjust this path to the location of your own .env file
load_dotenv('/home/abouchs/.python_env_var/.env')

BIGDATA_USERNAME = os.getenv('BIGDATA_USERNAME')
BIGDATA_PASSWORD = os.getenv('BIGDATA_PASSWORD')
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')

if not all([BIGDATA_USERNAME, BIGDATA_PASSWORD, OPENAI_API_KEY]):
    print("❌ Missing required environment variables")
    raise ValueError("Missing required environment variables. Check your .env file.")
else:
    print("✅ Credentials loaded from .env file")

✅ Credentials loaded from .env file
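When the check above fails, it helps to know which variable is missing. The following is a minimal sketch using only the standard library; the helper name `missing_env_vars` is hypothetical and not part of `bigdata-research-tools` or `python-dotenv`.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names in `names` that are unset or empty in `environ`."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


required = ["BIGDATA_USERNAME", "BIGDATA_PASSWORD", "OPENAI_API_KEY"]

# Example with an explicit mapping instead of the real environment.
example_env = {"BIGDATA_USERNAME": "user", "BIGDATA_PASSWORD": ""}
print(missing_env_vars(required, example_env))
# → ['BIGDATA_PASSWORD', 'OPENAI_API_KEY']
```

Calling `missing_env_vars(required)` without a mapping checks the real process environment, so it could be used after `load_dotenv(...)` to build a more specific error message.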


## Import Required Libraries

Below is the Python code required for setting up our environment and importing necessary libraries.

In [30]:
%load_ext autoreload
%autoreload 2

import pandas as pd

from src.knowledge_graph_manager import *
from src.search_enhanced import *
from src.visuals import *
from src.summary_generator import *
from src.feature_extractor import *


from bigdata_client import Bigdata
from bigdata_client.models.search import DocumentType, SortBy
from bigdata_research_tools.search import run_search
from bigdata_research_tools.excel import ExcelManager

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Optional: Plotly Display Configuration

For better visualization rendering, you can also set the Plotly renderer:

In [19]:
import plotly.io as pio
import plotly.graph_objects as go

try:
    import os
    if 'JUPYTERHUB_SERVICE_PREFIX' in os.environ or 'JPY_SESSION_NAME' in os.environ:
        pio.renderers.default = 'jupyterlab'
        print("✅ Plotly configured for JupyterLab")
    else:
        pio.renderers.default = 'plotly_mimetype+notebook'
        print("✅ Plotly configured for Jupyter/VS Code")
except Exception:
    pio.renderers.default = 'notebook'
    print("✅ Plotly configured with fallback renderer")

interactive_plots = True  # Set to False to generate static plots

✅ Plotly configured for Jupyter/VS Code


## Define Output Paths

We define the output paths for the credit ratings monitoring results.

In [20]:
output_dir = "output"
os.makedirs(output_dir, exist_ok=True)

export_path = f"{output_dir}/credit_ratings_monitor.xlsx"

## Connecting to Bigdata

Create a Bigdata object with your credentials.

In [21]:
bigdata = Bigdata(BIGDATA_USERNAME, BIGDATA_PASSWORD)

## Parameters

In [35]:
company_names = ['Boeing Co', 'Verizon Communications']
rating_agencies_names = ['S&P Global', 'Fitch Ratings Inc', "Moody's Corp"]

# Query parameters
document_limit = 100  # Maximum number of retrieved documents for each day
batch_size = 1  # Number of companies to process in each batch
sortby = SortBy.RELEVANCE  # Parameter to rank the search results
document_type = DocumentType.NEWS  # Scope of search
start_date_query = '2021-10-01'  # Start date
end_date_query = '2024-11-12'  # End date

# Keywords related to credit ratings
keywords = ['credit rating']

## 1. Portfolio Selection

Define your watchlist starting from company names. For this example, we select **Boeing Co.** and **Verizon Communications** and retrieve their entity IDs from Bigdata.com's Knowledge Graph.

In [33]:
companies = get_entity_ids(company_names)

In [34]:
companies

['55438C', '8A8E41']

Select the list of Credit Rating Agencies (CRAs), such as **S&P**, **Moody's**, and **Fitch**.

In [36]:
rating_agencies = get_entity_ids(rating_agencies_names)

## 2. Hybrid Content Search

Retrieve content by combining entity search and keyword search. The search returns sentences from news documents that mention a company from the watchlist, at least one credit rating agency, and the selected keyword, which allows targeted analysis of relevant content. The results are stored in `contextualized_chunks`, ready for further exploration and analysis.

In [39]:
from src.search_enhanced import search_enhanced

In [None]:
contextualized_chunks = search_enhanced(
    companies=companies,
    keywords=keywords,
    sentences=None,
    control_entities=rating_agencies,
    start_date=start_date_query,
    end_date=end_date_query,
    scope=document_type,
    freq='D',
    document_limit=document_limit,
    batch_size=batch_size,
    enhance_search=True,
)

About to run 2 queries
Example Query: And(Entity('55438C'), Entity('CFF97C', '65A2CE', '3461CF'), Keyword('credit rating')) over date range: AbsoluteDateRange('2021-10-01T00:00:00', '2021-10-01T23:59:59')


Querying Bigdata...:   9%|▊         | 199/2278 [00:35<02:26, 14.21it/s]
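
With `freq='D'`, the date range appears to be split into one-day windows, as in the example query above. Assuming one query per company per day with both the start and end dates inclusive (an assumption about `search_enhanced`, whose implementation is not shown here), the number of windows can be sketched with the standard library. `daily_windows` is a hypothetical helper, not part of the project code.

```python
from datetime import date, datetime, time, timedelta


def daily_windows(start, end):
    """Yield (window_start, window_end) ISO strings for each day in [start, end]."""
    first = date.fromisoformat(start)
    last = date.fromisoformat(end)
    for offset in range((last - first).days + 1):
        day = first + timedelta(days=offset)
        window_start = datetime.combine(day, time(0, 0, 0))
        window_end = datetime.combine(day, time(23, 59, 59))
        yield window_start.isoformat(), window_end.isoformat()


windows = list(daily_windows("2021-10-01", "2024-11-12"))
n_companies = 2  # Boeing Co and Verizon Communications

print(windows[0])
print(len(windows), len(windows) * n_companies)
```

Under these assumptions the sketch yields 1139 daily windows, or 2278 queries for two companies, which matches the total shown in the progress bar above.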