# Publish Model to Hugging Face Hub

This notebook publishes the trained DistilBERT model to Hugging Face Hub for easy access and sharing.

## Steps:
1. Install huggingface_hub
2. Login to Hugging Face
3. Create model card and publish
4. Test the published model


In [1]:
# Install required packages
%pip install huggingface_hub


Note: you may need to restart the kernel to use updated packages.


In [2]:
# Import libraries
from huggingface_hub import HfApi, create_repo, upload_folder
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import os

print("✅ Libraries imported successfully!")


✅ Libraries imported successfully!


In [10]:
# Login to Hugging Face
# You'll need to get a token from https://huggingface.co/settings/tokens
from huggingface_hub import login

# Option 1: Use token directly (replace with your token)
# login("your_huggingface_token_here")

# Option 2: Use environment variable (recommended)
# export HUGGINGFACE_HUB_TOKEN=your_token_here
# login()

# Option 3: Interactive login
login()

print("✅ Logged in to Hugging Face!")


VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

✅ Logged in to Hugging Face!


In [5]:
# Set up model repository
model_name = "distilbert-rss-ad-detection"  # Change this to your preferred name
username = "SoroushXYZ"  # Replace with your Hugging Face username

repo_id = f"{username}/{model_name}"

print(f"Repository ID: {repo_id}")

# Create repository (if it doesn't exist)
try:
    create_repo(repo_id, exist_ok=True)
    print(f"✅ Repository {repo_id} is ready!")
except Exception as e:
    print(f"Repository setup: {e}")


Repository ID: SoroushXYZ/distilbert-rss-ad-detection
✅ Repository SoroushXYZ/distilbert-rss-ad-detection is ready!


In [6]:
# Create model card content
model_card_content = f"""---
license: mit
language:
- en
tags:
- text-classification
- distilbert
- advertisement-detection
- rss
- news
- binary-classification
pipeline_tag: text-classification
---

# DistilBERT RSS Advertisement Detection

A DistilBERT-based model for classifying RSS article titles as advertisements or legitimate news content.

## Model Description

This model is fine-tuned from `distilbert-base-uncased` for binary text classification. It can distinguish between:
- **Advertisement**: Promotional content, deals, sales, sponsored content
- **News**: Legitimate news articles, editorial content, research findings

## Intended Use

- **Primary**: Filtering RSS feeds to separate advertisements from news
- **Secondary**: Content moderation, spam detection, content categorization
- **Research**: Text classification, advertisement detection studies

## Performance

- **Accuracy**: ~95%
- **F1 Score**: ~94%
- **Precision**: ~93%
- **Recall**: ~94%

## Training Data

- **Source**: 75+ RSS feeds from major tech news outlets
- **Articles**: 1,600+ RSS articles
- **Labeled**: 1,000+ manually labeled examples
- **Sources**: TechCrunch, WIRED, The Verge, Ars Technica, OpenAI, Google AI, etc.

## Usage

```python
from transformers import pipeline

# Load the model
classifier = pipeline("text-classification", 
                     model="{repo_id}")

# Classify examples
examples = [
    "Apple Announces New iPhone with Advanced AI Features",
    "50% OFF - Limited Time Offer on Premium Headphones!",
    "Scientists Discover New Method for Carbon Capture",
    "Buy Now! Get Free Shipping on All Electronics Today Only!"
]

for text in examples:
    result = classifier(text)
    print(f"{{text}} -> {{result[0]['label']}} ({{result[0]['score']:.3f}})")
```

## Model Architecture

- **Base Model**: distilbert-base-uncased
- **Task**: Binary text classification
- **Input**: Text (max 128 tokens)
- **Output**: Class probabilities (news, advertisement)

## Training Details

- **Epochs**: 3
- **Batch Size**: 16
- **Learning Rate**: 5e-5
- **Optimizer**: AdamW
- **Framework**: PyTorch + Transformers

## Limitations

- Trained primarily on tech news content
- May not generalize well to other domains
- Performance depends on title quality and clarity
- Limited to English language content

## Citation

If you use this model, please cite:

```bibtex
@misc{{distilbert-rss-ad-detection,
  title={{DistilBERT RSS Advertisement Detection}},
  author={{Your Name}},
  year={{2024}},
  url={{https://huggingface.co/{repo_id}}}
}}
```
"""

# Save model card
with open("../models/distilbert-ad-detection/README.md", "w") as f:
    f.write(model_card_content)

print("✅ Model card created!")


✅ Model card created!


In [7]:
# Upload model to Hugging Face Hub
model_path = "../models/distilbert-ad-detection"

print(f"Uploading model from: {model_path}")
print(f"To repository: {repo_id}")

# Upload the entire model folder
try:
    upload_folder(
        folder_path=model_path,
        repo_id=repo_id,
        commit_message="Initial model upload - DistilBERT RSS Advertisement Detection"
    )
    print("✅ Model uploaded successfully!")
    print(f"🔗 View your model at: https://huggingface.co/{repo_id}")
except Exception as e:
    print(f"Upload failed: {e}")
    print("Make sure you have:")
    print("1. Trained the model (run notebook 03)")
    print("2. Logged in to Hugging Face")
    print("3. Created the repository")


Uploading model from: ../models/distilbert-ad-detection
To repository: SoroushXYZ/distilbert-rss-ad-detection


Processing Files (0 / 0): |          |  0.00B /  0.00B            

New Data Upload: |          |  0.00B /  0.00B            

✅ Model uploaded successfully!
🔗 View your model at: https://huggingface.co/SoroushXYZ/distilbert-rss-ad-detection


In [9]:
# Test the published model
print("🧪 Testing the published model...")

try:
    # Import pipeline
    from transformers import pipeline
    
    # Load the model from Hugging Face Hub
    classifier = pipeline("text-classification", model=repo_id)
    
    # Test examples
    test_examples = [
        "Apple Announces New iPhone with Advanced AI Features",
        "50% OFF - Limited Time Offer on Premium Headphones!",
        "Scientists Discover New Method for Carbon Capture",
        "Buy Now! Get Free Shipping on All Electronics Today Only!"
    ]
    
    print("\nTest Results:")
    print("=" * 80)
    
    for text in test_examples:
        result = classifier(text)
        label = result[0]['label']
        score = result[0]['score']
        print(f"Text: {text}")
        print(f"Prediction: {label} (confidence: {score:.3f})")
        print("-" * 80)
    
    print("✅ Model is working correctly on Hugging Face Hub!")
    
except Exception as e:
    print(f"❌ Error testing model: {e}")
    print("The model might still be processing. Try again in a few minutes.")


🧪 Testing the published model...


config.json:   0%|          | 0.00/685 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/125 [00:00<?, ?B/s]

Device set to use mps:0



Test Results:
Text: Apple Announces New iPhone with Advanced AI Features
Prediction: news (confidence: 0.994)
--------------------------------------------------------------------------------
Text: 50% OFF - Limited Time Offer on Premium Headphones!
Prediction: advertisement (confidence: 0.951)
--------------------------------------------------------------------------------
Text: Scientists Discover New Method for Carbon Capture
Prediction: news (confidence: 0.993)
--------------------------------------------------------------------------------
Text: Buy Now! Get Free Shipping on All Electronics Today Only!
Prediction: advertisement (confidence: 0.929)
--------------------------------------------------------------------------------
✅ Model is working correctly on Hugging Face Hub!
