# Financial News Sentiment Analysis

This notebook performs sentiment classification (bullish, neutral, bearish) on financial news using:
1. DeepSeek-V3.2 (called through the Hugging Face Inference API) as baseline
2. Two pre-trained transformer models for comparison
3. Fine-tuning the best model
4. Re-evaluation after fine-tuning


In [1]:
# Install required packages
!uv add datasets transformers torch requests pandas scikit-learn tqdm -q

In [None]:
import os

HF_TOKEN = os.getenv("HF_TOKEN")

DATASET_NAME = "ArthurMrv/EDGAR-CORPUS-Financial-Summarization-Labeled"

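`os.getenv` returns `None` when the variable is missing, and that only surfaces later as an authentication error. A minimal sketch of a fail-fast check, assuming a hypothetical helper named `require_env` (not part of the original notebook):

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# In this notebook the call would be: HF_TOKEN = require_env("HF_TOKEN")
```
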
In [3]:
from tqdm.notebook import tqdm

## 1. Load Dataset


In [4]:
# Authenticate with Hugging Face Hub
from huggingface_hub import login

login(token=HF_TOKEN)

print("\nSuccessfully logged in to Hugging Face Hub!")



Successfully logged in to Hugging Face Hub!


In [5]:
from datasets import load_dataset
import pandas as pd

# Load the "refined" split (downloaded in full, not streamed)
ds = load_dataset(DATASET_NAME, split="refined")
ds


README.md: 0.00B [00:00, ?B/s]

data/refined-00000-of-00002.parquet:   0%|          | 0.00/161M [00:00<?, ?B/s]

Generating refined split:   0%|          | 0/10313 [00:00<?, ? examples/s]

Dataset({
    features: ['input_hash', 'input', 'summary', 'model', 'llm_sentiment_class', 'llm_sentiment_model', 'llm_sentiment_rationale'],
    num_rows: 10313
})

In [7]:
df = ds.to_pandas()
df.head()

Unnamed: 0,input_hash,input,summary,model,llm_sentiment_class,llm_sentiment_model,llm_sentiment_rationale
0,5e86b75638298802943dcc20093fae925be0c5146cbbf5...,FINANCIAL STATEMENTS AND SUPPLEMENTARY DATA IN...,Here's a summary of the financial statement:\n...,Claude,-1,DeepSeek-V3.2,"the text explicitly states the company is ""ope..."
1,c5b4129b8c9c94491f949c87ac452f30af0f11b8a68e5d...,FINANCIAL STATEMENTS AND SUPPLEMENTARY DATA. ...,"Based on the provided excerpt, here's a summar...",Claude,-2,DeepSeek-V3.2,"the text explicitly describes ""ongoing financi..."
2,a488c022380bcf59d4d3c99127ed2554fa67f86854119e...,. Report of Independent Registered Public Acco...,This appears to be a partial financial stateme...,Claude,0,DeepSeek-V3.2,the text is purely descriptive of accounting p...
3,52db0b9f1bfe2db5807d09128d511e2176c0226e0558c7...,Index to Consolidated Financial Statements All...,This appears to be a partial financial stateme...,Claude,0,DeepSeek-V3.2,the text is a purely factual description of ac...
4,a5d562bb3069324a9facf4ecb6fdf52aee051460a2cecf...,. ACCOUNTING FIRM To the Board of Directors a...,Here's a summary of the financial statement:\n...,Claude,0,DeepSeek-V3.2,"the text is a purely factual, neutral descript..."


## 2. Setup API for Classification


In [8]:
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    api_key=HF_TOKEN,
)

In [9]:
def request_sentiment(text, prompt_template):
    prompt = prompt_template.format(input_text=text)
    completion = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V3.2",
        messages=[
            {"role": "user", "content": prompt}
        ],
    )
    response = completion.choices[0].message
    # Return the raw reply, lowercased; it contains both the "reasoning:" and "score:" lines
    return response.content.strip().lower()

def get_llm_sentiment(df, input_column, output_column, prompt_template):
    """
    Given a DataFrame, input column, and output column name,
    calls the DeepSeek LLM via Hugging Face Inference API to get the sentiment for each row.
    The result will be stored in a new column (output_column).
    """

    tqdm.pandas(desc="Classifying sentiment")
    df[output_column] = df[input_column].progress_apply(lambda x: request_sentiment(x, prompt_template))
    return df

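Remote inference calls can fail transiently, so it may help to retry a single request rather than a whole group of rows. The following is a sketch only: `call_with_retries` and its parameters are hypothetical names, and exponential backoff is one possible policy, not something the notebook itself uses.

```python
import time


def call_with_retries(fn, *args, max_attempts=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying on any exception with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts.
    The last exception is re-raised once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

With this helper, a single classification could be issued as `call_with_retries(request_sentiment, text, prompt_template)`. The `sleep` parameter is injectable so the helper can be exercised without real waiting.
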
In [10]:
prompt_template = """
You are a Financial News Sentiment Classifier.
Your task is to classify the sentiment EXPLICITLY expressed in the text regarding stocks.

### Scoring Rubric:
+2 (Highly Bullish): Text explicitly states prices are soaring, skyrocketing, or reports massive success/breakthroughs.
+1 (Bullish): Text describes price increases, positive outlooks, or favorable conditions.
 0 (Neutral): Text is purely factual with no emotional charge, or describes flat price movement.
-1 (Bearish): Text describes price drops, fears, negative outlooks, or unfavorable conditions.
-2 (Highly Bearish): Text explicitly states prices are crashing, plummeting, or reports crisis/panic.

### Critical Rules:
1. **React to the text, not the market.** If the text says "stocks are down," the score MUST be negative, even if the reason is generic (like the FED).
2. Do not assume news is "priced in." Analyze the immediate emotional and factual content of the snippet.
3. Ignore external context. Only use the provided text.

### Output Format:
Reasoning: [1 sentence identifying the specific keywords or claims in the text that justify the score]
Score: [Integer between -2 and 2]

---
Article: {input_text}
"""
input_column = "summary"
output_column = "llm_output"

In [11]:
input_txt = """
Here's a summary of the financial statement: Financial Health Overview: - The company (Digerati) is operating at a loss with a working capital deficit - There are substantial concerns about the company's ability to continue operations - Management has addressed these concerns in Note 2 Revenue Streams: 1. Global VoIP Services - Provides VoIP services to U.S. and foreign telecommunications companies - Focuses on markets in Mexico, Asia, the Middle East, and Latin America 2. Cloud Communication Services - Offers hosted IP/PBX services to resellers and enterprise customers - Includes various features like call center applications, prepaid services, and customized IP/PBX features Key Expenses: - Transmission and termination charges from suppliers - Infrastructure and network costs - Internet bandwidth charges - Licensing and co-location charges - Installation costs Financial Risk Factors: - Credit risk from trade receivables - Potential exposure from bank deposits exceeding federally insured limits - Customer concentration risk (four customers comprise 20% of revenue) Revenue Recognition: - Based on evidence of arrangement, service delivery, fixed pricing, and collectability - Company acts as primary obligor with pricing authority and credit risk responsibility This statement indicates a company with established revenue streams but facing significant financial challenges and operational risks."""
request_sentiment(input_txt, prompt_template)

'reasoning: the text explicitly describes the company as "operating at a loss with a working capital deficit" and highlights "substantial concerns about the company\'s ability to continue operations," which are highly negative and unfavorable conditions.  \nscore: -2'

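The reply above is free text with a `reasoning:` line followed by a `score:` line, so it has to be parsed before it can be used as a label. A small sketch of one way to do that with a regular expression; `parse_response` and `SCORE_RE` are illustrative names, and restricting the score to a single digit between -2 and 2 is an assumption taken from the rubric in the prompt.

```python
import re
from typing import Optional, Tuple

# Matches "score: -2", "Score: +1", "score:0"; rejects values outside -2..2
SCORE_RE = re.compile(r"score:\s*([+-]?[0-2])\b", re.IGNORECASE)


def parse_response(raw: str) -> Optional[Tuple[str, int]]:
    """Split a 'reasoning: ... / score: N' reply into (reasoning, score).

    Returns None when no score in the range -2..2 can be found.
    """
    match = None
    for match in SCORE_RE.finditer(raw):
        pass  # keep the last match, in case the reasoning itself mentions "score:"
    if match is None:
        return None
    reasoning = raw[: match.start()].strip()
    reasoning = re.sub(r"^reasoning:\s*", "", reasoning, flags=re.IGNORECASE)
    return reasoning, int(match.group(1))
```

Returning an `int` (or `None`) makes malformed replies easy to count and filter before they reach the dataset.
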
In [12]:
# Rows that do not yet have a sentiment label
df_subset_llm = df.loc[df["llm_sentiment_class"].isna(), ["input_hash", "summary"]].copy()
print(f"Number of rows with no sentiment class: {len(df_subset_llm)}")

df_subset_llm.head()

Number of rows with no sentiment class: 6034


Unnamed: 0,input_hash,summary
2683,93fed708cdd40a6d913f9f66c5a72d54c356d0aa16669d...,"I apologize, but the text you've provided appe..."
4215,949e9c240eb4e75485e6c19ba7628f244323ee3fc6a827...,The financial statement provided includes info...
4281,c5b418da48cbefe7d4cca8a74eb87b28a0df20b504d913...,The financial statement provided includes info...
4282,b44b7a4f86a4130c03963f4b207253c15926fd45c7a86b...,Here's a summary of the financial statement:\n...
4283,c758f909196698d45e8303d6795214c8fd43926caabc9f...,The financial statement provided includes info...


In [13]:
# Smoke test on 3 rows; .copy() avoids pandas' SettingWithCopyWarning when the new column is added
test_df_subset_llm = df_subset_llm.head(3).copy()
test_output_df_subset_llm = get_llm_sentiment(test_df_subset_llm, input_column, output_column, prompt_template)
test_output_df_subset_llm.head()

Classifying sentiment:   0%|          | 0/3 [00:00<?, ?it/s]


Unnamed: 0,input_hash,summary,llm_output
2683,93fed708cdd40a6d913f9f66c5a72d54c356d0aa16669d...,"I apologize, but the text you've provided appe...",reasoning: the text is a meta-statement about ...
4215,949e9c240eb4e75485e6c19ba7628f244323ee3fc6a827...,The financial statement provided includes info...,reasoning: the text is a purely factual descri...
4281,c5b418da48cbefe7d4cca8a74eb87b28a0df20b504d913...,The financial statement provided includes info...,reasoning: the text explicitly states the comp...


In [14]:
import math
import os
import time

batch_size = 100
n_rows = len(df_subset_llm)
n_batches = math.ceil(n_rows / batch_size)

# Local JSON Lines file that accumulates results batch by batch
temp_file = "processed_data.jsonl"

# Start from a clean file so rows from a previous run are not duplicated
if os.path.exists(temp_file):
    os.remove(temp_file)

max_attempts = 3
failed_batches = []  # 1-based numbers of batches that never succeeded

for i in range(n_batches):
    for attempt in range(max_attempts):
        try:
            batch_df = df_subset_llm.iloc[i * batch_size : (i + 1) * batch_size].copy()

            # Classify every row in the batch
            batch_df = get_llm_sentiment(batch_df, input_column, output_column, prompt_template)
            batch_df = batch_df[["input_hash", output_column]]

            # Append to the local JSONL file; mode='a' requires orient='records' and lines=True
            batch_df.to_json(temp_file, orient='records', lines=True, mode='a')

            print(f"Saved batch {i+1}/{n_batches} locally.")
            break

        except Exception as e:
            if attempt + 1 >= max_attempts:
                # Record the batch instead of dropping it silently
                failed_batches.append(i + 1)
                print(f"Failed to process batch {i+1}/{n_batches}: {e}")
            else:
                print(f"Attempt {attempt+1} failed for batch {i+1}/{n_batches}, sleeping for 10 seconds")
                print(f"Error: {e}")
                time.sleep(10)

if failed_batches:
    print(f"Batches that still need processing: {failed_batches}")


Saved batch 1/61 locally.
Attempt 1 failed for batch 2/61, sleeping for 10 seconds
Error: 504 Server Error: Gateway Time-out for url: https://router.huggingface.co/novita/v3/openai/chat/completions
Attempt 2 failed for batch 2/61, sleeping for 10 seconds
Error: 504 Server Error: Gateway Time-out for url: https://router.huggingface.co/novita/v3/openai/chat/completions
Saved batch 2/61 locally.
Saved batch 3/61 locally.
Saved batch 4/61 locally.
Saved batch 5/61 locally.
Saved batch 6/61 locally.
Saved batch 7/61 locally.
Saved batch 8/61 locally.
Saved batch 9/61 locally.
Attempt 1 failed for batch 10/61, sleeping for 10 seconds
Error: 504 Server Error: Gateway Time-out for url: https://router.huggingface.co/novita/v3/openai/chat/completions
Saved batch 10/61 locally.
Saved batch 11/61 locally.
Saved batch 12/61 locally.
Saved batch 13/61 locally.
Saved batch 14/61 locally.
Saved batch 15/61 locally.
Attempt 1 failed for batch 16/61, sleeping for 10 seconds
Error: 504 Server Error: Gateway Time-out for url: https://router.huggingface.co/novita/v3/openai/chat/completions
Saved batch 16/61 locally.
Saved batch 17/61 locally.
Saved batch 18/61 locally.
Saved batch 19/61 locally.
Saved batch 20/61 locally.
Saved batch 21/61 locally.
Saved batch 22/61 locally.
Saved batch 23/61 locally.
Saved batch 24/61 locally.
Saved batch 25/61 locally.
Saved batch 26/61 locally.
Saved batch 27/61 locally.
Saved batch 28/61 locally.
Saved batch 29/61 locally.
Saved batch 30/61 locally.
Saved batch 31/61 locally.
Saved batch 32/61 locally.
Saved batch 33/61 locally.
Saved batch 34/61 locally.
Saved batch 35/61 locally.
Saved batch 36/61 locally.
Saved batch 37/61 locally.
Saved batch 38/61 locally.
Saved batch 39/61 locally.
Saved batch 40/61 locally.
Saved batch 41/61 locally.
Saved batch 42/61 locally.
Saved batch 43/61 locally.
Saved batch 44/61 locally.
Saved batch 45/61 locally.
Saved batch 46/61 locally.
Saved batch 47/61 locally.
Saved batch 48/61 locally.
Saved batch 49/61 locally.
Saved batch 50/61 locally.
Saved batch 51/61 locally.
Saved batch 52/61 locally.
Saved batch 53/61 locally.
Saved batch 54/61 locally.
Saved batch 55/61 locally.
Saved batch 56/61 locally.
Saved batch 57/61 locally.
Saved batch 58/61 locally.
Saved batch 59/61 locally.
Saved batch 60/61 locally.
Saved batch 61/61 locally.

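Because results are appended to a local JSON Lines file, an interrupted run could in principle resume instead of starting over. The sketch below assumes each line is a JSON object with an `input_hash` key, as written by the loop above; `load_done_hashes` and `pending` are hypothetical helpers, and resuming would also require skipping the step that deletes the file.

```python
import json
import os
from typing import Iterable, List, Set


def load_done_hashes(path: str) -> Set[str]:
    """Return the input_hash values already present in a JSON Lines file."""
    if not os.path.exists(path):
        return set()
    done = set()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines
                done.add(json.loads(line)["input_hash"])
    return done


def pending(hashes: Iterable[str], done: Set[str]) -> List[str]:
    """Keep only the hashes that still need to be classified, preserving order."""
    return [h for h in hashes if h not in done]
```

In the notebook, the same filter could be expressed as `df_subset_llm[~df_subset_llm["input_hash"].isin(load_done_hashes(temp_file))]`.
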

In [15]:
full_df = pd.read_json(temp_file, orient='records', lines=True)
full_df.head()

Unnamed: 0,input_hash,llm_output
0,93fed708cdd40a6d913f9f66c5a72d54c356d0aa16669d...,reasoning: the text is a purely factual descri...
1,949e9c240eb4e75485e6c19ba7628f244323ee3fc6a827...,reasoning: the text explicitly states the corp...
2,c5b418da48cbefe7d4cca8a74eb87b28a0df20b504d913...,reasoning: the text explicitly mentions the co...
3,b44b7a4f86a4130c03963f4b207253c15926fd45c7a86b...,"reasoning: the text is a purely factual, descr..."
4,c758f909196698d45e8303d6795214c8fd43926caabc9f...,reasoning: the text is a purely factual descri...


In [19]:
# Split llm_output at the last '\nscore:' marker and clean the reasoning text
import re
def extract_reasoning_and_score(input_hash):
    """
    Given an input_hash, find the corresponding llm_output in full_df,
    and return a dictionary: {'reasoning': ..., 'score': ...}.
    Returns None if the hash is unknown or the output has no score line.
    """
    # Select the matching row
    row = full_df.loc[full_df['input_hash'] == input_hash]
    if row.empty:
        return None

    llm_output = row.iloc[0]['llm_output']

    if '\nscore:' not in llm_output:
        print(f"No score found for input_hash: {input_hash}")
        return None

    # Split at the last occurrence of '\nscore:' so the reasoning text stays intact
    reasoning, score = llm_output.rsplit('\nscore:', 1)

    # Drop the leading 'reasoning:' label only when it is actually present
    reasoning = reasoning.strip()
    if reasoning.startswith('reasoning:'):
        reasoning = reasoning[len('reasoning:'):].strip()

    return {"reasoning": reasoning, "score": score.strip()}

# print(extract_reasoning_and_score("5e86b75638298802943dcc20093fae925be0c5146cbbf551a839cf47987d1e53"))

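The function above filters `full_df` with a boolean mask on every call, which scans the whole frame each time. A possible alternative, sketched here with plain dictionaries, is to build a hash-to-output index once; `build_output_index` is a hypothetical helper, and "first occurrence wins" is chosen to mirror the `row.iloc[0]` behavior.

```python
from typing import Dict, Iterable, Mapping


def build_output_index(records: Iterable[Mapping[str, str]]) -> Dict[str, str]:
    """Map each input_hash to its llm_output; the first occurrence of a hash wins."""
    index: Dict[str, str] = {}
    for rec in records:
        index.setdefault(rec["input_hash"], rec["llm_output"])
    return index
```

With pandas, the records could come from `full_df.to_dict("records")`, after which each lookup is a single `index.get(input_hash)`.
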
In [20]:
# Enrich df with llm_sentiment_class and llm_sentiment_rationale using extract_reasoning_and_score

def classify_sentiment(row, model_name = "DeepSeek-V3.2"):
    # Keep labels that already exist; pd.notna handles both None and NaN
    if pd.notna(row['llm_sentiment_class']):
        return pd.Series({
            "llm_sentiment_class": row['llm_sentiment_class'], 
            "llm_sentiment_rationale": row['llm_sentiment_rationale'],
            "llm_sentiment_model": model_name
        })

    extraction = extract_reasoning_and_score(row['input_hash'])
    if extraction is None:
        # No usable LLM output for this row: leave all three columns empty
        return pd.Series({
            "llm_sentiment_class": None,
            "llm_sentiment_rationale": None,
            "llm_sentiment_model": None
        })
    # The score is stored as returned by the model (a string on the -2..2 scale)
    return pd.Series({
        "llm_sentiment_class": extraction["score"],
        "llm_sentiment_rationale": extraction["reasoning"],
        "llm_sentiment_model": model_name
    })

# Apply classification with tqdm progress bar
results = []
for idx, row in tqdm(df.iterrows(), total=len(df), desc="Classifying sentiment"):
    results.append(classify_sentiment(row))
result_df = pd.DataFrame(results, index=df.index)
df[["llm_sentiment_class", "llm_sentiment_rationale", "llm_sentiment_model"]] = result_df
df.head()

Classifying sentiment:   0%|          | 0/10313 [00:00<?, ?it/s]

No score found for input_hash: 8e2fe9208bfb30f706a06ebc7accfc5bf03c227ae1f9d9101b1a4fcc35c43ef1
No score found for input_hash: ba8aeb8f926c40f48e6cd31e8577057823a168d49e090be89602996b2926b653
No score found for input_hash: a5695b34374a74e7267ec65ec97b35332857bcbe4e75c30988c40d9407a23fea
No score found for input_hash: 5a2a4faac0f702b1a0688d3be867669a003a9037c965777cac6fc4b560bb181c
No score found for input_hash: f98e409e42f5dc47579e44ed88fab7f89a1d825b077db7b56e00f654cec150e8
No score found for input_hash: 25dafd9c79c017dafc37388a5565390113f2cb015f561198b12d457a4177d02a
No score found for input_hash: ed8eb23dbb480732dff173b530f1a80101ce4fdc0e52fc6ba200ee1376c712d0
No score found for input_hash: c604ad98101055267c01d240cab5f93524ca2067c851932b5e8f2e24a93f75fa
No score found for input_hash: 7e592af8b7e182946522bcb00c7d1ba952b0fb25e00026b770fd2e462579b447
No score found for input_hash: 2b445d8701a7bc24238cf51c3fee93a6063601e3debf4233245768b59469360c
No score found for input_hash: d0918aa48

Unnamed: 0,input_hash,input,summary,model,llm_sentiment_class,llm_sentiment_model,llm_sentiment_rationale
0,5e86b75638298802943dcc20093fae925be0c5146cbbf5...,FINANCIAL STATEMENTS AND SUPPLEMENTARY DATA IN...,Here's a summary of the financial statement:\n...,Claude,-1,DeepSeek-V3.2,"the text explicitly states the company is ""ope..."
1,c5b4129b8c9c94491f949c87ac452f30af0f11b8a68e5d...,FINANCIAL STATEMENTS AND SUPPLEMENTARY DATA. ...,"Based on the provided excerpt, here's a summar...",Claude,-2,DeepSeek-V3.2,"the text explicitly describes ""ongoing financi..."
2,a488c022380bcf59d4d3c99127ed2554fa67f86854119e...,. Report of Independent Registered Public Acco...,This appears to be a partial financial stateme...,Claude,0,DeepSeek-V3.2,the text is purely descriptive of accounting p...
3,52db0b9f1bfe2db5807d09128d511e2176c0226e0558c7...,Index to Consolidated Financial Statements All...,This appears to be a partial financial stateme...,Claude,0,DeepSeek-V3.2,the text is a purely factual description of ac...
4,a5d562bb3069324a9facf4ecb6fdf52aee051460a2cecf...,. ACCOUNTING FIRM To the Board of Directors a...,Here's a summary of the financial statement:\n...,Claude,0,DeepSeek-V3.2,"the text is a purely factual, neutral descript..."

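The rubric scores on a five-point scale, while the notebook's stated goal is a three-way bullish / neutral / bearish label. One way to bridge the two, shown purely as an assumption (the helper `to_three_class` and the sign-based mapping are not taken from the notebook), is to collapse the score by its sign:

```python
def to_three_class(score) -> str:
    """Collapse a -2..2 score (int or integer-like string) to bearish/neutral/bullish."""
    value = int(str(score).strip())
    if not -2 <= value <= 2:
        raise ValueError(f"score out of range: {score!r}")
    if value < 0:
        return "bearish"
    if value > 0:
        return "bullish"
    return "neutral"
```

Applied to the column, this could look like `df["llm_sentiment_class"].dropna().map(to_three_class)`; rows whose score is not an integer would raise and can be inspected separately.
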

## 3. Save the Generated Dataset to the Hugging Face Hub

In [21]:
from datasets import Dataset

# Build a Dataset from the enriched df
hf_dataset = Dataset.from_pandas(df, preserve_index=False)

# Overwrites the "refined" split of the source dataset repository
hf_dataset.push_to_hub(DATASET_NAME, split="refined")



Uploading the dataset shards:   0%|          | 0/2 [00:00<?, ? shards/s]

Creating parquet from Arrow format:   0%|          | 0/4 [00:00<?, ?ba/s]

Processing Files (0 / 0): |          |  0.00B /  0.00B            

New Data Upload: |          |  0.00B /  0.00B            

Creating parquet from Arrow format:   0%|          | 0/4 [00:00<?, ?ba/s]

Processing Files (0 / 0): |          |  0.00B /  0.00B            

New Data Upload: |          |  0.00B /  0.00B            

CommitInfo(commit_url='https://huggingface.co/datasets/ArthurMrv/EDGAR-CORPUS-Financial-Summarization-Labeled/commit/8ba7ba69157e714265f063c15a88d69342f23122', commit_message='Upload dataset', commit_description='', oid='8ba7ba69157e714265f063c15a88d69342f23122', pr_url=None, repo_url=RepoUrl('https://huggingface.co/datasets/ArthurMrv/EDGAR-CORPUS-Financial-Summarization-Labeled', endpoint='https://huggingface.co', repo_type='dataset', repo_id='ArthurMrv/EDGAR-CORPUS-Financial-Summarization-Labeled'), pr_revision=None, pr_num=None)