A powerful web application built with FastAPI that intelligently removes citation markers from HTML files while preserving the complete HTML structure, formatting, and LaTeX content. Perfect for cleaning academic content, research papers, and educational materials generated by AI tools like Gemini.
- 🧹 Smart Citation Removal - Removes `[cite: numbers]` patterns, including comma-separated lists and dash-separated ranges (e.g., `[cite: 124, 125]`, `[cite: 105-107]`)
- Empty Tag Cleanup - Removes tags containing only `[cite_start]` or other cite markers
- LaTeX Support - Preserves and renders mathematical formulas with MathJax
- 🎯 Dual Mode - Paste HTML code OR upload files
- Real-time Statistics - Live count of removed citations
- 🎨 Modern Glass Morphism Design - Beautiful gradient backgrounds with glass effects
- 📱 Fully Responsive - Works perfectly on desktop, tablet, and mobile
- Live Preview - See rendered HTML before and after cleaning with LaTeX rendering
- 💻 Code Comparison - Syntax-highlighted before/after code view
- 6 Copy Methods - Clipboard, Rich Text, Formatted, Select All, Download, Reset
- 🎬 Smooth Animations - Engaging transitions and micro-interactions
- Interactive Demo - Live example with real academic content
- ⚡ Auto-Update Preview - Real-time preview updates as you type (500 ms debounce)
- Render: https://html-checker-1.onrender.com (primary)
- Status: Running on free tier (may have cold starts after 15 min inactivity)
- Response Time: < 200ms (warm) | ~50s (cold start)
⚠️ Note: This instance runs on Render's free tier. See RENDER_TROUBLESHOOTING.md if you encounter issues.
This application can be easily deployed to various cloud platforms including Render, Azure, DigitalOcean, Railway, Heroku, and more. See the Deployment section below for detailed instructions.
- Python 3.11 or higher
- pip (Python package manager)
- Clone the repository: `git clone https://github.com/algsoch/html-checker.git`, then `cd html-checker`
- Install dependencies: `pip install -r requirements.txt`
- Run the application: `python main.py`
- Open your browser at http://localhost:8000
- Navigate to the Paste Mode tab
- Paste your HTML code into the textarea
- Click "✨ Clean HTML Code"
- View the before/after comparison and live preview
- Copy or download the cleaned HTML
- Navigate to the Upload Mode tab
- Drag & drop your HTML file OR click "Browse Files"
- Click "Clean & Download"
- Your cleaned file will be downloaded automatically
- Code View: Syntax-highlighted before/after comparison
- Rendered View: See actual HTML output with LaTeX formulas
- Statistics: Total citations removed, breakdown by type
- 6 Copy Options: Choose your preferred copy method
html-citation-cleaner/
├── main.py # FastAPI backend with cleaning logic
├── requirements.txt # Python dependencies
├── render.yaml # Render deployment config
├── startup.sh # Azure startup script
├── Procfile # Heroku deployment config
├── .github/
│   └── workflows/
│       ├── main_html-checker.yml # Azure GitHub Actions CI/CD
│       ├── azure-deploy.yml # Alternative Azure deployment
│       └── keep-alive.yml # Render keep-alive (optional)
├── templates/
│   ├── index.html # Main UI with live preview
│   └── public/
│       └── images/
│           └── filter.png # App favicon
├── static/
│   ├── style.css # Advanced styling (glass morphism)
│   └── script.js # Frontend logic + MathJax integration
├── outputs/ # Cleaned files directory
├── sample.html # Sample citation-filled HTML
└── sample1.html # Demo content (Chemistry Question)
The application uses a two-step intelligent cleaning algorithm:
Step 1: Removes entire HTML tags that contain ONLY cite markers (no other text):
| Before | After |
|---|---|
| `<p>[cite_start]</p>` | (removed) |
| `<p>[cite: 123]</p>` | (removed) |
| `<div>[cite_start]</div>` | (removed) |
| `<span>[cite: 456, 789]</span>` | (removed) |
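The empty-tag pass can be approximated with a regular expression. This is a sketch for illustration, not the code in `main.py`: the `CITE` marker grammar, the `CITE_ONLY_TAG` pattern, and the `remove_cite_only_tags` name are assumptions. It only matches a single element whose direct content is markers and whitespace, so a wrapper left empty by the removal (for example a `<div>` around a removed `<p>`) is not cleaned up.

```python
import re

# Assumed marker grammar: [cite_start], or [cite: N] with comma lists / dash ranges.
CITE = r"\[cite_start\]|\[cite:\s*\d+(?:\s*[-,]\s*\d+)*\s*\]"

# One element (same open/close tag name) whose content is only markers + whitespace.
CITE_ONLY_TAG = re.compile(
    rf"<(?P<tag>[A-Za-z][\w-]*)(?:\s[^>]*)?>\s*(?:(?:{CITE})\s*)+</(?P=tag)>"
)


def remove_cite_only_tags(html: str) -> tuple[str, int]:
    """Step 1 sketch: delete elements containing nothing but cite markers.

    Returns the new HTML and the number of elements removed.
    """
    return CITE_ONLY_TAG.subn("", html)


cleaned, removed = remove_cite_only_tags(
    "<p>[cite_start]</p><p>Keep me [cite: 1]</p><span>[cite: 456, 789]</span>"
)
print(cleaned, removed)  # <p>Keep me [cite: 1]</p> 2
```

Tags that still contain real text are left for the second, in-place pass.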
Step 2: Removes cite markers from tags that contain other content:
| Before | After |
|---|---|
| `<p>Some text [cite: 123]</p>` | `<p>Some text </p>` |
| `[cite_start]Other text` | `Other text` |
| `Text [cite: 124, 125] more` | `Text more` |
| `Formula \(C_6H_{14}\) [cite: 130]` | `Formula \(C_6H_{14}\)` |
The HTML structure and LaTeX formulas remain completely intact!
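Because the in-place pass deletes only the marker text, surrounding tags and `\(...\)` formulas are never touched. A minimal sketch, assuming the same marker grammar as the features list; `strip_inline_cites` and the per-type counts (mirroring the "breakdown by type" statistic) are illustrative, not the app's actual API. Unlike the table above, it leaves adjacent spaces as they are, so `Text [cite: 124, 125] more` becomes `Text  more`, which browsers render with a single space.

```python
import re

# Assumed marker patterns; the actual ones in main.py may differ.
CITE_START = re.compile(r"\[cite_start\]")
CITE_NUMBERS = re.compile(r"\[cite:\s*\d+(?:\s*[-,]\s*\d+)*\s*\]")


def strip_inline_cites(html: str) -> tuple[str, dict[str, int]]:
    """Step 2 sketch: remove markers in place, leaving tags and LaTeX alone.

    Returns the cleaned HTML plus a count of removed markers per type.
    """
    html, n_start = CITE_START.subn("", html)
    html, n_numbers = CITE_NUMBERS.subn("", html)
    return html, {"cite_start": n_start, "cite": n_numbers}


text, counts = strip_inline_cites(
    r"<p>[cite_start]Formula \(C_6H_{14}\) [cite: 130]</p>"
)
print(text)    # <p>Formula \(C_6H_{14}\) </p>
print(counts)  # {'cite_start': 1, 'cite': 1}
```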
This application can be deployed to multiple cloud platforms. Choose the one that best fits your needs:
Render offers a generous free tier and simple deployment process.
Deployment Steps:
- Fork this repository
- Go to Render Dashboard
- Click "New +" → "Web Service"
- Connect your GitHub repo
- Render will auto-detect the `render.yaml` configuration
- Click "Create Web Service"
Configuration (already in render.yaml):
plan: free # or 'starter' ($7/month) for always-on
startCommand: gunicorn main:app --workers 1 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:$PORT --timeout 120
Free Tier Features:
- 750 hours/month
- Service sleeps after 15 minutes of inactivity
- ~50 second cold start on first request
- Perfect for personal projects and learning
Upgrade to Starter ($7/month) for:
- Always-on service (no sleep)
- No cold starts
- Unlimited hours
Keep-Alive Option:
- The repository includes `.github/workflows/keep-alive.yml`, which pings your service every 5 minutes
- Warning: This uses ~360 hours/month of your free tier limit
- Recommended to disable for free tier users (see RENDER_TROUBLESHOOTING.md)
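Whether a keep-alive schedule fits the free tier is simple arithmetic: awake hours per day times days in the month, compared with the 750-hour allowance. The helpers below are hypothetical (not part of the repository). For reference, the ~360 hours figure corresponds to about 12 awake hours a day over a 30-day month, while keeping the service awake around the clock uses 720 to 744 hours.

```python
# Render's free-tier allowance quoted in this README (instance hours per month).
FREE_TIER_HOURS = 750


def monthly_hours(awake_hours_per_day: float, days_in_month: int = 30) -> float:
    """Instance hours used if the service stays awake this long every day."""
    if not 0 <= awake_hours_per_day <= 24:
        raise ValueError("awake_hours_per_day must be between 0 and 24")
    return awake_hours_per_day * days_in_month


def remaining_free_hours(used_hours: float) -> float:
    """Free-tier hours left this month; a negative value means over budget."""
    return FREE_TIER_HOURS - used_hours


always_on = monthly_hours(24, days_in_month=31)
print(always_on, remaining_free_hours(always_on))  # 744 6
```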
Troubleshooting: See RENDER_TROUBLESHOOTING.md for detailed solutions to common issues.
Azure offers reliable hosting with good free tier options and easy GitHub integration.
The repository includes a pre-configured workflow (.github/workflows/main_html-checker.yml) that automatically deploys to Azure on every push to the main branch.
Setup:
- Create an Azure App Service with Python runtime
- Configure deployment credentials in GitHub repository secrets
- Push to main branch - auto-deploys!
# Login to Azure
az login
# Create resource group
az group create --name html-checker-rg --location eastus
# Create App Service Plan (choose tier based on needs)
# Free F1: Good for testing ($0/month, has limitations)
# Basic B1: Recommended for production ($13/month)
az appservice plan create --name html-checker-plan --resource-group html-checker-rg --sku B1 --is-linux
# Create Web App
az webapp create --resource-group html-checker-rg --plan html-checker-plan --name your-unique-app-name --runtime "PYTHON:3.11"
# Configure startup command
az webapp config set --resource-group html-checker-rg --name your-unique-app-name --startup-file "gunicorn main:app --workers 2 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8000 --timeout 120"
# Deploy code
az webapp up --name your-unique-app-name --resource-group html-checker-rg
Azure Pricing:
- Free F1: $0/month (60 CPU minutes/day, 1GB RAM) - Good for testing
- Basic B1: ~$13/month (Unlimited, 1.75GB RAM) - Recommended for production
- Standard S1: ~$70/month (Better performance, auto-scaling)
DigitalOcean offers simple deployment with competitive pricing.
Deployment Steps:
- Go to DigitalOcean App Platform
- Click "Create App" → Connect your GitHub repository
- Configure:
  - Build Command: `pip install -r requirements.txt`
  - Run Command: `gunicorn main:app --workers 2 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:$PORT --timeout 120`
  - HTTP Port: 8080 (or use the `$PORT` environment variable)
- Choose plan and deploy
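Every platform in this guide repeats the same Gunicorn flags, differing mainly in the port. One way to keep them in one place is a `gunicorn.conf.py`, which Gunicorn loads from the working directory by default. The file below is a hypothetical addition (the repository does not ship one), and the `PORT`/`WEB_CONCURRENCY` fallbacks are assumptions about how you want it to behave locally.

```python
# gunicorn.conf.py: hypothetical, not part of this repository.
# Gunicorn reads ./gunicorn.conf.py automatically, so a start command
# could shrink to: gunicorn main:app
import os

# Most PaaS platforms inject PORT; fall back to 8000 for local runs.
bind = f"0.0.0.0:{os.environ.get('PORT', '8000')}"

# WEB_CONCURRENCY lets each platform tune the worker count without code changes.
workers = int(os.environ.get("WEB_CONCURRENCY", "2"))

# FastAPI is an ASGI app, so run it under Uvicorn's Gunicorn worker class.
worker_class = "uvicorn.workers.UvicornWorker"

# Same 120-second timeout as the start commands in this guide.
timeout = 120
```

With this file in the repository root, the Run Command above could be reduced to `gunicorn main:app`.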
DigitalOcean Pricing:
- Basic: $5/month (512MB RAM, 1 vCPU)
- Professional: $12/month (1GB RAM, 1 vCPU)
- Pro+: $24/month (2GB RAM, 2 vCPU)
Railway provides a modern deployment experience with a generous free tier.
Deployment Steps:
- Go to Railway
- Click "New Project" β "Deploy from GitHub repo"
- Select your repository
- Railway auto-detects Python and installs dependencies
- Add the start command in settings: `gunicorn main:app --workers 2 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:$PORT --timeout 120`
Railway Pricing:
- Hobby: $5/month (512MB RAM, shared CPU)
- Pro: Starting at $20/month (more resources)
Note: Heroku discontinued free tier in November 2022.
Deployment Steps:
- Create a `Procfile` in your repository (already included): `web: gunicorn main:app --workers 2 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:$PORT --timeout 120`
- Install the Heroku CLI
- Deploy: `heroku login`, then `heroku create your-app-name`, then `git push heroku main`
Heroku Pricing:
- Basic: $7/month per dyno
- Standard: $25-50/month per dyno
Serverless deployment with pay-per-use pricing.
Deployment Steps:
- Create a `Dockerfile` (if it does not exist):
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
CMD ["gunicorn", "main:app", "--workers", "2", "--worker-class", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:8080", "--timeout", "120"]
- Deploy: `gcloud run deploy html-checker --source . --platform managed --region us-central1 --allow-unauthenticated`
Google Cloud Run Pricing:
- Pay per use (free tier: 2 million requests/month)
- ~$0.24 per million requests after free tier
Modern platform with global deployment.
Deployment Steps:
- Install Fly CLI
- Deploy: `fly launch`, then `fly deploy`
Fly.io Pricing:
- Free: 3 shared-cpu-1x VMs with 256MB RAM
- Paid: Starting at $1.94/month per VM
| Platform | Entry Price | Free Tier | Auto-Sleep | Best For |
|---|---|---|---|---|
| Render | $7/month (Starter) | 750 hrs/mo | After 15 min | Free tier, easy setup |
| Azure | $13/month (B1) | Limited F1 | No | Enterprise, Microsoft ecosystem |
| DigitalOcean | $5/month | No | No | Simple, predictable pricing |
| Railway | $5/month | 500 hours/mo | No | Modern development |
| Heroku | $7/month | No (removed) | No | Quick deployment |
| Google Cloud Run | Pay-per-use | 2M req/month | Yes | Serverless, variable traffic |
| Fly.io | $1.94/month | Limited | No | Global deployment |
For Students/Learning:
- Render: Best free tier (750 hours/month), easy setup
- Railway: Good trial, modern platform
- Azure F1: Free tier available (with limitations)
- Fly.io: Generous free tier
For Production:
- Render Starter: $7/month, always-on, no cold starts
- DigitalOcean: Simple pricing, $5-12/month
- Azure B1: Reliable, good performance, $13/month
- Railway: Modern platform, starting at $5/month
For Variable Traffic:
- Google Cloud Run: Pay only for what you use
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/` | Main web interface |
| `POST` | `/upload` | Upload and clean HTML file |
| `GET` | `/download/{filename}` | Download cleaned file |
| `GET` | `/static/*` | Serve static assets (CSS/JS) |
| `GET` | `/templates/public/*` | Serve public assets (images) |
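For scripted use, `POST /upload` can be called without the browser. The client below uses only the standard library. It assumes the endpoint accepts `multipart/form-data` with the file in a form field named `file`; neither detail is documented in this README, so check the route in `main.py`. `build_multipart` and `upload_html` are hypothetical helpers.

```python
import uuid
import urllib.request
from pathlib import Path


def build_multipart(field: str, filename: str, content: bytes) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: text/html\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def upload_html(base_url: str, path: str, field: str = "file") -> bytes:
    """POST an HTML file to /upload and return the raw response body.

    The default field name "file" is an assumption, not a documented contract.
    """
    file_path = Path(path)
    body, content_type = build_multipart(field, file_path.name, file_path.read_bytes())
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/upload",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return response.read()
```

Against a local instance this would be called as `upload_html("http://localhost:8000", "sample.html")`; the timeout matches the 120-second server timeout used in the deployment commands.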
Input HTML:
<p>[cite_start]</p>
<div>The Lewis electron-dot diagrams [cite: 124, 125] show molecular structure.</div>
<p>\(HClO_{3}\) is the stronger acid [cite: 130] due to electronegativity.</p>
Output HTML:
<div>The Lewis electron-dot diagrams show molecular structure.</div>
<p>\(HClO_{3}\) is the stronger acid due to electronegativity.</p>
✨ Notice: LaTeX \(HClO_{3}\) is perfectly preserved!
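To confirm on your own files that cleaning left formulas untouched, you can compare the math spans before and after. This is a hypothetical check, not part of the app; it assumes formulas use MathJax 3's default delimiters and would need extending if you enable others such as single `$`.

```python
import re

# MathJax 3 default delimiters: inline \( ... \), display \[ ... \] and $$ ... $$.
MATH = re.compile(r"\\\(.+?\\\)|\\\[.+?\\\]|\$\$.+?\$\$", re.DOTALL)


def latex_preserved(before: str, after: str) -> bool:
    """True if both documents contain the same math spans in the same order."""
    return MATH.findall(before) == MATH.findall(after)


before = r"""<p>[cite_start]</p>
<div>The Lewis electron-dot diagrams [cite: 124, 125] show molecular structure.</div>
<p>\(HClO_{3}\) is the stronger acid [cite: 130] due to electronegativity.</p>"""

after = r"""<div>The Lewis electron-dot diagrams show molecular structure.</div>
<p>\(HClO_{3}\) is the stronger acid due to electronegativity.</p>"""

print(latex_preserved(before, after))  # True
```

The `[cite: ...]` markers never match because they lack the backslash that MathJax's `\[` delimiter requires.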
- Backend: FastAPI (Python 3.11+)
- ASGI Server: Uvicorn with Gunicorn workers
- Frontend: HTML5, CSS3 (Glass Morphism), Vanilla JavaScript
- Math Rendering: MathJax 3
- Deployment: Render, Azure, DigitalOcean, Railway, Heroku, Google Cloud Run, Fly.io
- CI/CD: GitHub Actions
- Version Control: Git/GitHub
- ⚡ Fast Processing: Handles large HTML files (100KB+) in milliseconds
- 🎯 Accuracy: Removes every citation marker matching the supported patterns without damaging the HTML structure
- 📱 Responsive: Works on devices from 320px to 4K displays
- Auto-Deploy: GitHub push → live in about 2 minutes
- 💾 Memory Efficient: Minimal server resource usage
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
vicky kumar
- GitHub: @algsoch
- LinkedIn: Connect with me
- FastAPI for the amazing web framework
- MathJax for LaTeX rendering support
- Render, Azure, and other cloud platforms for making deployment accessible
If you found this project helpful, please give it a ⭐️!
For issues or questions, please open an issue.