In [1]:
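from functools import lru_cache

from transformers import pipeline


@lru_cache(maxsize=1)
def get_ner_pipeline():
    """Build the NER pipeline once so repeated calls do not reload the model."""
    # 'simple' aggregation merges B-ORG and I-ORG tags into one entity span
    return pipeline(
        "ner",
        model="dslim/bert-base-NER",
        aggregation_strategy="simple"
    )


def extract_companies(text):
    """
    Uses a Hugging Face NER pipeline to find organizations in text.
    """
    # 1. Get the NER pipeline (loaded on the first call, then reused)
    ner_pipe = get_ner_pipeline()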

    # 2. Run the model on your news text
    results = ner_pipe(text)

    # 3. Filter for 'ORG' (Organizations) and clean the output
    found_companies = []
    for entity in results:
        if entity['entity_group'] == 'ORG':
            found_companies.append({
                "company": entity['word'],
                "confidence": round(float(entity['score']), 4),
                "start": entity['start'],
                "end": entity['end']
            })

    return found_companies

# --- Example Usage ---
news_headline = """
Microsoft and Alphabet are seeing massive gains in the AI sector,
while Tesla faces production hurdles in Berlin.
Meanwhile, JPMorgan is advising caution on tech stocks.
"""

extracted = extract_companies(news_headline)

print("--- Extracted Companies for Portfolio Linking ---")
for item in extracted:
    print(f"Entity: {item['company']:<15} | Confidence: {item['confidence']}")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Some weights of the model checkpoint at dslim/bert-base-NER were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


Device set to use cpu


--- Extracted Companies for Portfolio Linking ---
Entity: Microsoft       | Confidence: 0.9989
Entity: Alphabet        | Confidence: 0.999
Entity: Tesla           | Confidence: 0.9949
Entity: JPMorgan        | Confidence: 0.9988
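
Before linking these entities to a portfolio, it may help to drop low-confidence hits and collapse repeated mentions of the same name. The sketch below is one possible post-processing step, not part of the original cell: `filter_companies` and the `min_confidence` threshold of 0.998 are hypothetical choices for illustration, and the sample list reuses only the names and scores printed above (the `start`/`end` offsets are omitted because they were not printed).

```python
def filter_companies(entities, min_confidence=0.99):
    """Keep the best mention of each company at or above min_confidence.

    `entities` is a list of dicts shaped like the output of
    extract_companies(): each has at least "company" and "confidence".
    """
    best = {}
    for item in entities:
        # Skip mentions the model was not confident about
        if item["confidence"] < min_confidence:
            continue
        name = item["company"].strip()
        # Keep only the highest-confidence mention of each name
        if name not in best or item["confidence"] > best[name]["confidence"]:
            best[name] = item
    # Highest confidence first
    return sorted(best.values(), key=lambda e: e["confidence"], reverse=True)


# Sample input: names and scores copied from the printed output above
sample = [
    {"company": "Microsoft", "confidence": 0.9989},
    {"company": "Alphabet", "confidence": 0.999},
    {"company": "Tesla", "confidence": 0.9949},
    {"company": "JPMorgan", "confidence": 0.9988},
]

# 0.998 is an arbitrary illustrative threshold; tune it on your own data
for item in filter_companies(sample, min_confidence=0.998):
    print(f"Entity: {item['company']:<15} | Confidence: {item['confidence']}")
```

With this threshold, Tesla (0.9949) is filtered out and the remaining three entities are listed from highest to lowest score. In practice the cutoff would need to be chosen against labeled headlines rather than fixed in advance.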
