______
# Introduction: Natural Language Processing in Finance
______
When building an NLP pipeline for finance, high-quality textual data such as financial news, earnings call transcripts, and analyst commentary is required to feed the models. The accessible sources below are a starting point:

## Free Earnings Call Transcripts
- MarketBeat: free transcripts for S&P 500 earnings calls, available publicly via their website
- Seeking Alpha: comprehensive transcripts including slides and audio (may require sign up)
- Quartr: mobile app offering live and recorded earnings call transcripts with global coverage

## Financial News and Commentary APIs
- MarketAux: free tier API for real-time financial news across 5,000 sources
- Finnhub: free real-time news, along with market data such as fundamentals and forex
- FinancialNewsAPI: aggregates millions of financial news articles with free access
- Financial Modeling Prep: free plan for its stock news API
- StockNewsAPI.com: free endpoint for aggregated stock market news

## Datasets for Research and Model Training
- NIFTY Financial News Headlines Dataset: contains 15.7M time-aligned headlines and stock prices for 4,775 S&P 500 firms from 1999-2023, available on Hugging Face
- FNSPID: dataset with 29.7M stock prices and 15.7M financial news records, well suited to sentiment analysis and forecasting
- ECTSum: earnings call transcripts paired with expert bullet-point summaries, intended for summarisation tasks
______
For sentiment analysis or finance-specific natural language extraction, we would employ readily available NLP tools: pre-trained domain models or enterprise-grade NLP pipelines.

## NLP Tools for Finance
- FinBERT: specialised BERT model fine-tuned for financial sentiment analysis, built on top of Hugging Face's Transformers library. It outputs positive, negative, and neutral labels with corresponding probabilities.
- Spark NLP & Spark OCR: enterprise-grade NLP library optimized for scalability on Spark. Includes pipelines for tokenization, NER, sentiment, document classification, plus OCR integration
- deepset Haystack: open-source framework for building search, QA, summarization, and RAG pipelines. Integrates with large language models via Hugging Face, OpenAI, Cohere
- Contextual AI: enterprise-focused platform for Retrieval-Augmented Generation (RAG), enabling specialised agents for banking and finance
- Cohere: large language model APIs for classification, summarization, and financial applications, now with a secure “North for Banking” platform
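
As a rough illustration of the FinBERT entry in the list above, the sketch below shows how per-label probabilities could be obtained and reduced to a single label. It assumes the `transformers` package (with a PyTorch backend) is installed and that the publicly hosted `ProsusAI/finbert` checkpoint is the model meant here; neither is stated in this notebook. The helper names `top_label` and `classify_headlines` are hypothetical.

```python
from typing import Dict, List

# Assumption: the FinBERT checkpoint hosted on Hugging Face under this id.
MODEL_ID = "ProsusAI/finbert"


def top_label(scores: List[Dict]) -> Dict:
    """Return the entry with the highest probability from one prediction."""
    return max(scores, key=lambda item: item["score"])


def classify_headlines(texts: List[str]) -> List[List[Dict]]:
    """Score each text with FinBERT; the model is downloaded on first use."""
    # Imported lazily so top_label works even without transformers installed.
    from transformers import pipeline

    # top_k=None asks the pipeline for every label's score, not only the best one.
    classifier = pipeline("text-classification", model=MODEL_ID, top_k=None)
    return classifier(texts)


if __name__ == "__main__":
    # Hand-written scores in the pipeline's output shape, for illustration only.
    example = [
        {"label": "positive", "score": 0.10},
        {"label": "negative", "score": 0.75},
        {"label": "neutral", "score": 0.15},
    ]
    print(top_label(example))
```

Keeping all three probabilities, rather than only the winning label, makes it possible to filter out low-confidence predictions later.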
______
For the purpose of experimentation, we will first use **MarketAux** to fetch fresh financial news via its API, and **FinBERT** to classify the sentiment of that news.
______

# MarketAux
A free market and financial news API. It gives access to global stock market and finance news, including funds and crypto.

Link to website and documentation: https://www.marketaux.com/

In [None]:
# import necessary libraries
import os
from dotenv import load_dotenv

# import custom helper for fetching news
from utils import get_news

# load in environment variables
load_dotenv()

True

In [None]:
# load in API key
api_key = os.environ.get("MARKET_AUX_API_TOKEN")
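
Note that `os.environ.get` returns `None` when the variable is missing, so a misconfigured `.env` file would only surface later as a failed API request. A minimal sketch of a stricter lookup follows; `require_env` is a hypothetical helper, not part of `dotenv` or of this notebook's `utils` module.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or raise if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set; check your .env file."
        )
    return value
```

With this helper, the lookup above could be written as `api_key = require_env("MARKET_AUX_API_TOKEN")`, which fails immediately with a readable message when the token is absent.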

In [None]:
import requests
from typing import Union


def get_news(symbols: list[str], api_key: str) -> tuple[int, Union[dict, list]]:
    """
    Fetch news articles from MarketAux for the given stock symbols.

    :param symbols: a list of stock symbols to search for
    :param api_key: MarketAux API token
    :return: a tuple of the HTTP status code and the parsed JSON response body
    """

    # convert symbols list into a comma-separated string
    tickers = ",".join(symbols)

    # create url for request
    url = f"https://api.marketaux.com/v1/news/all?symbols={tickers}&filter_entities=true&language=en&api_token={api_key}"

    # call marketaux for news; the timeout stops the call from hanging indefinitely
    response = requests.get(url, timeout=30)

    # response.json is a method, so it must be called to get the parsed body
    return response.status_code, response.json()
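
Before the news can be passed to a sentiment model, the article text has to be pulled out of the parsed response. A minimal sketch, assuming the payload has a top-level `data` list whose items carry `title` and `description` fields, as in the sample response printed in this section; `extract_texts` is a hypothetical helper and the sample payload is hand-made.

```python
from typing import Dict, List


def extract_texts(payload: Dict) -> List[str]:
    """Join the title and description of each article into one string."""
    texts = []
    for article in payload.get("data", []):
        parts = [article.get("title"), article.get("description")]
        # Skip missing or empty fields so no stray separators remain.
        text = ". ".join(part.strip() for part in parts if part)
        if text:
            texts.append(text)
    return texts


# Hand-made payload that mirrors only the field names, not real API data.
sample_payload = {
    "data": [
        {"title": "Example headline", "description": "Example description"},
        {"title": "Headline without description", "description": None},
    ]
}
sample_texts = extract_texts(sample_payload)
```

Using `.get` throughout keeps the helper tolerant of error responses that lack a `data` key, in which case it returns an empty list.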

Status Code: 200
Response Text:
{"meta":{"found":155352,"returned":3,"limit":3,"page":1},"data":[{"uuid":"4e0d1f77-3fe6-4302-a6ab-264f53d0ca2b","title":"Trump Turns On 'Nice' Carney With Canadian Tech Tax Ultimatum","description":"It had all been going so well. Since Mark Carney replaced Justin Trudeau as prime minister, Canada’s relationship with the White House appeared to be functional again, on the mend after President Donald Trump’s tariffs and overtures about making it the “51st state.”","keywords":"donald trump, Bloomberg, Mark Carney, Trade Talks","snippet":"On Friday, Trump said he’d terminate trade talks with Canada in response to a 3% digital services tax. The levy, which would mainly hit American companies, is...","url":"https:\/\/www.ndtvprofit.com\/business\/trump-turns-on-nice-carney-with-canadian-tech-tax-ultimatum","image_url":"https:\/\/media.assettype.com\/bloombergquint%2F2025-06-28%2Fb7t4opya%2F435607393.jpg?w=1200&auto=format%2Ccompress&ogImage=true","language":"e