#NER stands for Named Entity Recognition.

It is a Natural Language Processing (NLP) technique used to automatically identify and classify named entities in text into predefined categories such as:

1. Person names (e.g., Albert Einstein)
2. Organizations (e.g., Google)
3. Locations (e.g., Paris)
4. Dates (e.g., 6th October 2025)
5. Miscellaneous entities (e.g., currency, percentages, product names)

##Example:
Input: "Apple was founded by Steve Jobs in California in 1976."

###NER Output:

    Apple → Organization
    Steve Jobs → Person
    California → Location
    1976 → Date

In short, NER helps machines “understand” specific entities in text, which is crucial for tasks like information extraction, question answering, and chatbots.
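
To make the input/output mapping above concrete, here is a toy sketch that tags the example sentence with a hand-written lookup table. The `GAZETTEER` dictionary and the `tag_entities` function are illustrative assumptions, not part of any library; a real NER system such as spaCy relies on a trained statistical model rather than a fixed list.

```python
import re

# Hypothetical lookup table that covers only the example sentence.
GAZETTEER = {
    "Apple": "Organization",
    "Steve Jobs": "Person",
    "California": "Location",
}

def tag_entities(text):
    """Return (entity, label) pairs in order of appearance."""
    found = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(rf"\b{re.escape(name)}\b", text):
            found.append((match.start(), name, label))
    # Crude assumption: any standalone four-digit number is a date.
    for match in re.finditer(r"\b\d{4}\b", text):
        found.append((match.start(), match.group(), "Date"))
    # Sort by position in the text, then drop the position.
    return [(name, label) for _, name, label in sorted(found)]

sentence = "Apple was founded by Steve Jobs in California in 1976."
for entity, label in tag_entities(sentence):
    print(f"{entity} -> {label}")
```

A lookup like this fails on any name it has not seen, which is the gap trained NER models are meant to close.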

#NER spaCy OpenAI Integration

This workflow combines spaCy’s fast Named Entity Recognition (NER) with OpenAI’s GPT models to detect, classify, and extract entities from text. spaCy supplies structured, predefined entity labels, while OpenAI adds flexible, custom, context-aware recognition, all in a single workflow.

In short: spaCy for speed and structure, OpenAI for context and customization.
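
One way to picture the single workflow is as a merge of two entity lists. The sketch below assumes both sources yield `(text, label)` tuples; the hard-coded lists are placeholders standing in for real spaCy and OpenAI output, and `merge_entities` is a hypothetical helper, not an API of either library.

```python
def merge_entities(spacy_entities, llm_entities):
    """Combine two entity lists, keeping the first label seen for each text."""
    merged = {}
    for text, label in list(spacy_entities) + list(llm_entities):
        key = text.lower()  # compare entity texts case-insensitively
        if key not in merged:
            merged[key] = (text, label)
    return list(merged.values())

# Placeholder data: neither list was produced by a real model run.
spacy_entities = [("Apple", "ORG"), ("1976", "DATE")]
llm_entities = [("apple", "COMPANY"), ("iPhone", "PRODUCT")]

print(merge_entities(spacy_entities, llm_entities))
```

Giving the spaCy list precedence is an arbitrary choice in this sketch; swapping the argument order would prefer the LLM labels instead.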

In [1]:
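!pip install openai==0.28 # Install a pinned version of the openai library (0.28), which still provides openai.ChatCompletion. Run this before the imports.
import openai # Import the openai library for interacting with the OpenAI API.
import spacy # Import the spacy library for natural language processing tasks, specifically Named Entity Recognition (NER).
# Note: an API key must be configured before any API call, for example via the OPENAI_API_KEY environment variable.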



In [2]:
# Load Spacy NER model
nlp = spacy.load("en_core_web_sm") # Load the English small model for spaCy, which includes a NER component.


# Predefined list of stock market-related entities (extendable)
stock_entities = ["Apple", "Google", "Microsoft", "Tesla", "NASDAQ", "Dow Jones", "S&P 500", "Bitcoin", "Ethereum"] # Define a list of entities related to the stock market to be recognized.

In [3]:
def analyze_sentiment(review, category):
    # Construct a prompt for the OpenAI API to perform sentiment analysis.
    # Adjacent string literals are joined, so no stray indentation ends up inside the prompt.
    prompt = (
        f"Analyze the sentiment of the following {category} statement in the context of stock market performance. "
        "Classify it as Positive (bullish), Negative (bearish), or Neutral:\n\n"
        f"Statement: {review}"
    )

    # Make a call to the OpenAI Chat Completion API to get the sentiment.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo", # Specify the model to use for the API call.
        messages=[
            {"role": "system", "content": "You are a sentiment analysis assistant specialized in financial markets."}, # Define the role and behavior of the AI assistant.
            {"role": "user", "content": prompt}, # Provide the user's input (the prompt).
        ]
    )

    # Extract the sentiment classification from the API response.
    sentiment = response['choices'][0]['message']['content']
    return sentiment.strip() # Return the extracted sentiment, removing leading/trailing whitespace.
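
The model's reply is free-form text, so downstream code may want a single label instead of a full sentence. A minimal sketch of one way to normalize it, assuming the reply mentions one of the three class names requested in the prompt; `normalize_sentiment` is a hypothetical helper, not part of the openai library.

```python
import re

LABELS = ("Positive", "Negative", "Neutral")

def normalize_sentiment(reply):
    """Return the label that appears earliest in the reply, or "Unknown"."""
    hits = []
    for label in LABELS:
        match = re.search(rf"\b{label}\b", reply, flags=re.IGNORECASE)
        if match:
            hits.append((match.start(), label))
    # min() picks the hit with the smallest start position.
    return min(hits)[1] if hits else "Unknown"

reply = "This statement can be classified as Positive (bullish) in the context of stock market performance."
print(normalize_sentiment(reply))
```

This keyword match is deliberately naive: a reply such as "not negative" would still be read as Negative, so asking the model for a one-word answer in the prompt is likely the more robust option.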

In [4]:
# Function to perform Named Entity Recognition (NER)
def extract_entities(review):
    doc = nlp(review) # Process the input text using the loaded spaCy model to create a Doc object.
    entities = [(ent.text, ent.label_) for ent in doc.ents] # Extract entities and their labels from the Doc object.

    # Additional filtering for financial entities
    matched_stocks = [ent for ent in entities if ent[0] in stock_entities] # Filter the extracted entities to find those present in the predefined stock_entities list.
    return matched_stocks if matched_stocks else entities  # Return the matched stock entities if any are found, otherwise return all extracted entities.
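
Note that the filter above only checks entities spaCy has already detected, so a name from `stock_entities` that the model does not tag is never returned. A minimal sketch of a complementary check that scans the raw text directly; `find_listed_entities` and the `MARKET` label are assumptions made for illustration, and the list is repeated only to keep the block self-contained.

```python
import re

stock_entities = ["Apple", "Google", "Microsoft", "Tesla", "NASDAQ",
                  "Dow Jones", "S&P 500", "Bitcoin", "Ethereum"]

def find_listed_entities(text, names=stock_entities):
    """Return (name, "MARKET") for each listed name found as a whole term."""
    found = []
    for name in names:
        # Lookarounds instead of \b so names with symbols such as "S&P 500" match.
        pattern = rf"(?<!\w){re.escape(name)}(?!\w)"
        if re.search(pattern, text):
            found.append((name, "MARKET"))  # "MARKET" is a made-up label
    return found

text = "Bitcoin dipped slightly to $123,745 after hitting an all-time high."
print(find_listed_entities(text))
```

The results could be appended to the spaCy entities before returning. spaCy also ships rule-based components for this kind of list matching, which may be a better fit inside an existing pipeline.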

In [12]:
# Main function to perform sentiment analysis and NER
def main():
    # Get input from user
    category = input("Enter the category (e.g., Stock, Index, Crypto, Economy, Other): ").capitalize() # Prompt the user to enter a category and capitalize the input.
    review = input(f"Enter your market statement related to {category.lower()}: ") # Prompt the user to enter a market statement related to the chosen category.

    if review: # Check if the user entered a statement.
        print("\nPerforming Stock Market Sentiment Analysis...\n") # Inform the user that sentiment analysis is being performed.
        sentiment_with_contributions = analyze_sentiment(review, category) # Call the analyze_sentiment function to get the sentiment.
        print(f"Sentiment Analysis Result: {sentiment_with_contributions}") # Print the sentiment analysis result.

        print("\nPerforming Named Entity Recognition (NER)...\n") # Inform the user that NER is being performed.
        entities = extract_entities(review) # Call the extract_entities function to get the entities.
        print(f"Identified Market Entities: {entities}") # Print the identified entities.
    else:
        print("Please enter a valid statement.") # Prompt the user to enter a valid statement if the input was empty.

# Run the main function
main() # Execute the main function.

Enter the category (e.g., Stock, Index, Crypto, Economy, Other): Crypto
Enter your market statement related to crypto: Bitcoin dipped slightly to $123,745 after hitting an all-time high, yet maintained strong weekly gains. Robust ETF inflows and institutional demand fuel this rally, with experts anticipating further upward movement. Despite minor corrections, the overall crypto market remains bullish, driven by rising confidence.

Performing Stock Market Sentiment Analysis...

Sentiment Analysis Result: This statement can be classified as Positive (bullish) in the context of stock market performance. The mention of Bitcoin maintaining strong weekly gains, robust ETF inflows, institutional demand, experts anticipating further upward movement, and rising confidence in the overall crypto market all point to a positive outlook for the market.

Performing Named Entity Recognition (NER)...

Identified Market Entities: [('123,745', 'MONEY'), ('weekly', 'DATE')]


In [None]:
#############################################################################################################################################################################################

In [13]:
!pip install PyPDF2 # Install the PyPDF2 library for working with PDF files.

Collecting PyPDF2
  Downloading pypdf2-3.0.1-py3-none-any.whl.metadata (6.8 kB)
Downloading pypdf2-3.0.1-py3-none-any.whl (232 kB)
Installing collected packages: PyPDF2
Successfully installed PyPDF2-3.0.1


In [None]:
# ====================================================
# 🔧 Install Dependencies
# ====================================================
!pip install -q openai==0.28 PyPDF2 # Install openai (pinned to 0.28, because openai.ChatCompletion was removed in 1.0) and PyPDF2 silently.

# ====================================================
# 📚 Import Libraries
# ====================================================
import openai # Import the openai library.
import PyPDF2 # Import the PyPDF2 library.
from google.colab import files # Import the files module from google.colab for file uploads.
from google.colab import userdata # Import userdata to access secrets

# ====================================================
# 🔑 Set your OpenAI API key from Secrets Manager
# ====================================================
# Access the API key stored in Colab's Secrets Manager
# Make sure you have added a secret named 'OPENAI_API_KEY'
try:
    openai.api_key = userdata.get('OPENAI_API_KEY') # Attempt to retrieve the OpenAI API key from Colab's Secrets Manager.
    if not openai.api_key: # Check if the API key was successfully retrieved.
        raise ValueError("OpenAI API key not found in Colab Secrets. Please add it using the 🔑 icon.") # Raise a ValueError if the API key is not found.
except Exception as e: # Catch any exceptions that occur during the process.
    print(f"Error accessing Secrets Manager: {e}") # Print an error message.
    # You might want to handle this error more gracefully,
    # e.g., prompt the user to add the key.
    raise # Re-raise the exception after printing the message


# ====================================================
# 📂 Function to upload PDF file
# ====================================================
def upload_pdf():
    print("📁 Please upload your research PDF file:") # Prompt the user to upload a PDF file.
    uploaded = files.upload() # Use google.colab.files.upload() to handle the file upload.
    file_name = next(iter(uploaded)) # Get the name of the uploaded file.
    print(f"✅ Uploaded: {file_name}") # Confirm the file upload with the file name.
    return file_name # Return the name of the uploaded file.

# ====================================================
# 📄 Function to extract text from PDF
# ====================================================
def extract_text_from_pdf(file_path):
    text = "" # Initialize an empty string to store the extracted text.
    with open(file_path, "rb") as pdf_file: # Open the PDF file in binary read mode.
        reader = PyPDF2.PdfReader(pdf_file) # Create a PdfReader object.
        for page in reader.pages: # Iterate through each page in the PDF.
            text += page.extract_text() or "" # Extract text from the current page and append it to the text string. Use or "" to handle pages with no text.
    return text.strip() # Return the extracted text, removing leading/trailing whitespace.

# ====================================================
# 🤖 Function to extract author name using OpenAI
# ====================================================
def extract_author_from_text(research_text):
    # Define the prompt for the OpenAI API to extract the author name.
    prompt = f"""
    You are a research paper analyzer.
    Extract only the author name(s) from the text below.
    If not found, respond with "Author not found".

    Research text:
    \"\"\"{research_text[:5000]}\"\"\"
    """
    # Make a call to the OpenAI Chat Completion API to get the author name.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo", # Specify the model to use.
        messages=[{"role": "user", "content": prompt}], # Provide the user's input (the prompt).
        temperature=0 # Set the temperature to 0 for deterministic output.
    )
    return response.choices[0].message["content"].strip() # Extract and return the author name from the API response.

# ====================================================
# 📘 Extract Title and Abstract
# ====================================================
def extract_title_abstract_from_text(research_text):
    # Define the prompt for the OpenAI API to extract the title and abstract.
    prompt = f"""
    You are a research paper analyzer.
    From the following research text, extract:
    - The title of the paper
    - The abstract of the paper

    Return the output in a structured JSON format with keys: title, abstract.
    If a title or abstract is not found, use "Not found" for that key.

    Research text:
    \"\"\"{research_text[:5000]}\"\"\"
    """
    # Make a call to the OpenAI Chat Completion API to get the title and abstract.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo", # Specify the model to use.
        messages=[{"role": "user", "content": prompt}], # Provide the user's input (the prompt).
        temperature=0 # Set the temperature to 0 for deterministic output.
    )
    return response.choices[0].message["content"].strip() # Extract and return the title and abstract in JSON format from the API response.

# ====================================================
# 🚀 Main Function
# ====================================================
def main():
    file_path = upload_pdf() # Call the upload_pdf function to get the file path of the uploaded PDF.
    print("\n📖 Extracting text from PDF...\n") # Inform the user that text extraction is in progress.
    research_text = extract_text_from_pdf(file_path) # Call the extract_text_from_pdf function to get the text from the PDF.

    print("🔍 Extracting author details using OpenAI...\n") # Inform the user that author extraction is in progress.
    author_info = extract_author_from_text(research_text) # Call the extract_author_from_text function to get the author information.
    print(f"🧾 Extracted Author(s): {author_info}") # Print the extracted author information.

    print("\n🔍 Extracting title and abstract using OpenAI...\n") # Inform the user that title and abstract extraction is in progress.
    title_abstract_info = extract_title_abstract_from_text(research_text) # Call the extract_title_abstract_from_text function to get the title and abstract.
    print(f"📄 Extracted Title and Abstract:\n{title_abstract_info}") # Print the extracted title and abstract.

# ====================================================
# ▶️ Run
# ====================================================
main() # Execute the main function to start the process.
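
`extract_title_abstract_from_text` returns the model's reply as a raw string. If the caller needs the fields separately, the reply can be parsed. The sketch below assumes the reply contains a single JSON object, possibly surrounded by extra text, which the model does not guarantee. `parse_title_abstract` is a hypothetical helper and the sample reply is invented for illustration.

```python
import json
import re

def parse_title_abstract(reply):
    """Parse a JSON reply into a dict with "title" and "abstract" keys."""
    # Grab the outermost {...} span so surrounding prose is ignored.
    match = re.search(r"\{.*\}", reply, flags=re.DOTALL)
    try:
        data = json.loads(match.group()) if match else {}
    except json.JSONDecodeError:
        data = {}  # fall back to defaults when the reply is not valid JSON
    return {
        "title": data.get("title", "Not found"),
        "abstract": data.get("abstract", "Not found"),
    }

# Invented sample reply, not real model output.
sample_reply = '{"title": "An example title", "abstract": "An example abstract."}'
parsed = parse_title_abstract(sample_reply)
print(parsed["title"])
print(parsed["abstract"])
```

Falling back to "Not found" mirrors the wording the prompt already asks the model to use for missing fields.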

📁 Please upload your research PDF file:


Saving Research paper.pdf to Research paper (9).pdf
✅ Uploaded: Research paper (9).pdf

📖 Extracting text from PDF...

🔍 Extracting author details using OpenAI...

🧾 Extracted Author(s): Author: Mariusz Kruk

🔍 Extracting title and abstract using OpenAI...

📄 Extracted Title and Abstract:
{
    "title": "A look at advanced learners’ use of mobile devices for English language study: Insights from interview data",
    "abstract": "The paper discusses the results of a study which explored advanced learners of English engagement with their mobile devices to develop learning experiences that meet their needs and goals as foreign language learners. The data were collected from 20 students by means of a semi-structured interview. The gathered data were subjected to qualitative and quantitative analysis. The results of the study demonstrated that, on the one hand, some subjects manifested heightened awareness relating to the advantageous role of mobile devices in their learning endeavors, thei