**Advanced Academic Paraphrasing Tool**
=======================================

üìã¬†**Project Documentation**
----------------------------

**1\. Project Overview**
------------------------

**Title**
---------

Humanized LLM-Powered Academic Paraphrasing System with Quality Assessment

**Description**
---------------

An advanced paraphrasing tool designed for academic research that combines large language models (Llama 3.3 70B) with humanization post-processing to generate high-quality, plagiarism-free paraphrases while preserving original meaning and technical accuracy.

**Version**
-----------

1.0.0 (February 2026)

**Author**
----------

Argha Mukherjee - Computer Science & Machine Learning

**Repository Type**
-------------------

Kaggle Notebook (GPU-accelerated)

**2\. Objective**
-----------------

**Primary Objectives**
----------------------

1.  **Preserve Original Meaning**: Generate paraphrases that maintain 78-92% semantic similarity to source text
    
2.  **Avoid Plagiarism**: Ensure 30-55% lexical overlap (40%+ word variation from original)
    
3.  **Humanize Output**: Remove AI-generated artifacts and robotic phrasing to produce natural, human-like text
    
4.  **Multi-Style Generation**: Provide three distinct paraphrasing styles (Academic, Concise, Technical)
    
5.  **Quality Assessment**: Real-time metrics for lexical overlap, semantic similarity, and humanness scores
    

**Secondary Objectives**
------------------------

*   Protect technical elements (equations, citations, URLs) during paraphrasing
    
*   Provide batch processing capabilities for multiple paragraphs
    
*   Generate reproducible results with source citation reminders
    
*   Enable offline/online hybrid operation (local embeddings + API models)

**üéì Citation**
---------------

If you use this tool in your research or academic work, please cite:

**BibTeX**
----------

@software{Personal ROBUST Paraphrasing Tool\_2026,

author = {\[Argha Mukherjee\]},

title = {{Humanized LLM-Powered Academic Paraphrasing System with Quality Assessment}},

year = {2026},

month = feb,

publisher = {Kaggle},

institution = {Jadavpur University \\&
School of Education Technology, Kolkata, India},

howpublished = {\\url{[https://www.kaggle.com/code/arghamukherjee1998/personal-robust-paraphrasing-tool/](https://www.kaggle.com/code/arghamukherjee1998/personal-robust-paraphrasing-tool/)}},

note = {Version 1.0.0. Developed as part of Post Graduate

research in deep learning and NLP and for personal use.}

}

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

**‚ö†Ô∏è Disclaimer**
-----------------

This tool is designed to¬†**assist**¬†with academic writing, not replace critical thinking or original research. Users are responsible for:

*   Verifying factual accuracy of paraphrased content
    
*   Properly citing all sources
    
*   Following institutional academic integrity policies
    
*   Ensuring compliance with journal/conference guidelines
    
*   Understanding the original material before paraphrasing
    

**The authors assume no liability for misuse or academic misconduct.**

In [2]:
# CELL 1
# CELL 1: Install Required Packages
!pip install -q groq sentence-transformers torch nltk

# Download WordNet for language processing
import nltk
nltk.download('wordnet', quiet=True)
nltk.download('omw-1.4', quiet=True)

print("‚úÖ All dependencies installed!")
print("üìå Next: Get your free Groq API key from https://console.groq.com/keys")


[2K   [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m138.3/138.3 kB[0m [31m4.1 MB/s[0m eta [36m0:00:00[0m
[?25h‚úÖ All dependencies installed!
üìå Next: Get your free Groq API key from https://console.groq.com/keys



**Dependencies**
----------------

`   textgroq==0.4.2  sentence-transformers==3.3.1  torch==2.3.1  nltk==3.8.1  transformers==4.44.2  numpy==2.0.2  pandas==2.1.4   `

In [None]:
# CELL 2: Import Libraries and Configure Groq API

import os
from groq import Groq
import re
import torch
from typing import List, Dict
from sentence_transformers import SentenceTransformer, util
import warnings
warnings.filterwarnings('ignore')

# ============================================================================
# üîë GROQ API KEY
# ============================================================================
GROQ_API_KEY = "PASTE_YOUR_OWN_KEY_HERE"
# ============================================================================

# Validate and initialize
if GROQ_API_KEY == "PASTE_YOUR_NEW_KEY_HERE":
    print("‚ùå ERROR: Please replace PASTE_YOUR_NEW_KEY_HERE with your actual Groq API key!")
    print("üìå Get it from: https://console.groq.com/keys")
else:
    os.environ["GROQ_API_KEY"] = GROQ_API_KEY
    
    # Initialize Groq client
    groq_client = Groq(api_key=GROQ_API_KEY)
    
    # Load embedding model for quality analysis
    print("‚è≥ Loading semantic similarity model...")
    embedding_model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
    
    print("‚úÖ All libraries imported!")
    print("‚úÖ Groq API configured with Llama 3.3 70B!")
    print(f"‚úÖ Semantic analyzer loaded!")
    print(f"üñ•Ô∏è  Device: {'CUDA' if torch.cuda.is_available() else 'CPU'}")


2026-02-03 22:38:42.375218: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1770158322.571630      55 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1770158322.625166      55 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1770158323.070367      55 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1770158323.070402      55 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1770158323.070405      55 computation_placer.cc:177] computation placer alr

‚è≥ Loading semantic similarity model...


modules.json:   0%|          | 0.00/349 [00:00<?, ?B/s]

config_sentence_transformers.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

README.md: 0.00B [00:00, ?B/s]

sentence_bert_config.json:   0%|          | 0.00/53.0 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/438M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

special_tokens_map.json:   0%|          | 0.00/239 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/190 [00:00<?, ?B/s]

‚úÖ All libraries imported!
‚úÖ Groq API configured with Llama 3.3 70B!
‚úÖ Semantic analyzer loaded!
üñ•Ô∏è  Device: CUDA


In [4]:
# CELL 3: Advanced Token Protector

class AdvancedTokenProtector:
    """Enhanced protection for scientific text elements"""
    
    def __init__(self):
        self.patterns = {
            'math_inline': re.compile(r'\$([^\$]+)\$'),
            'math_display': re.compile(r'\\\[(.+?)\\\]', re.DOTALL),
            'latex_env': re.compile(r'\\begin\{(\w+)\}(.*?)\\end\{\1\}', re.DOTALL),
            'code_inline': re.compile(r'`([^`]+)`'),
            'code_block': re.compile(r'```[\s\S]*?```'),
            'citation_latex': re.compile(r'\\cite\{[^}]+\}'),
            'citation_numeric': re.compile(r'\[(\d+(?:,\s*\d+)*)\]'),
            'citation_author': re.compile(r'\([A-Z][a-z]+(?:\s+et al\.?)?,?\s+\d{4}\)'),
            'url': re.compile(r'https?://[^\s]+'),
            'doi': re.compile(r'doi:\s*[^\s]+', re.IGNORECASE),
            'number_unit': re.compile(r'\b\d+\.?\d*\s*(?:mm|cm|m|km|mg|g|kg|ml|l|¬∞C|K|Hz|kHz|MHz|GHz|V|mV|A|mA|W|Pa|MPa|mol|%)\b'),
        }
        self.placeholder_map = {}
        self.counter = {}
    
    def protect(self, text: str):
        """Protect special tokens with placeholders"""
        protected = text
        self.placeholder_map = {}
        self.counter = {}
        
        for name in ['latex_env', 'math_display', 'code_block', 'math_inline', 
                     'code_inline', 'citation_latex', 'citation_author', 
                     'citation_numeric', 'doi', 'url', 'number_unit']:
            pattern = self.patterns[name]
            protected = self._replace(protected, pattern, name.upper())
        
        return protected, self.placeholder_map
    
    def _replace(self, text, pattern, token_type):
        """Replace matches with placeholders"""
        def repl(match):
            if token_type not in self.counter:
                self.counter[token_type] = 0
            self.counter[token_type] += 1
            placeholder = f"<{token_type}_{self.counter[token_type]}>"
            self.placeholder_map[placeholder] = match.group(0)
            return placeholder
        return pattern.sub(repl, text)
    
    def restore(self, text, placeholder_map):
        """Restore protected tokens"""
        for placeholder, original in placeholder_map.items():
            text = text.replace(placeholder, original)
        return text

protector = AdvancedTokenProtector()
print("‚úÖ Advanced token protector loaded!")


‚úÖ Advanced token protector loaded!


In [5]:
# CELL 4: Quality Analyzer

class QualityAnalyzer:
    """Analyze paraphrase quality with detailed metrics"""
    
    def __init__(self, embedding_model):
        self.embedding_model = embedding_model
    
    def compute_lexical_overlap(self, text1: str, text2: str) -> float:
        """Calculate word overlap percentage"""
        tokens1 = set(re.findall(r'\w+', text1.lower()))
        tokens2 = set(re.findall(r'\w+', text2.lower()))
        if not tokens1:
            return 0.0
        return len(tokens1 & tokens2) / len(tokens1)
    
    def compute_semantic_similarity(self, text1: str, text2: str) -> float:
        """Calculate meaning similarity using AI embeddings"""
        emb = self.embedding_model.encode([text1, text2], convert_to_tensor=True)
        return util.cos_sim(emb[0], emb[1]).item()
    
    def analyze(self, original: str, paraphrased: str) -> Dict:
        """Complete quality analysis"""
        lex = self.compute_lexical_overlap(original, paraphrased)
        sem = self.compute_semantic_similarity(original, paraphrased)
        
        return {
            'lexical_overlap': lex,
            'semantic_similarity': sem,
            'is_excellent': (0.30 <= lex <= 0.55 and 0.78 <= sem <= 0.92),
            'status': self._get_status(lex, sem)
        }
    
    def _get_status(self, lex, sem):
        """Determine quality status"""
        if 0.30 <= lex <= 0.55 and 0.78 <= sem <= 0.92:
            return "‚úÖ EXCELLENT"
        elif lex > 0.65:
            return "‚ö†Ô∏è TOO_SIMILAR"
        elif sem < 0.70:
            return "‚ö†Ô∏è MEANING_DRIFT"
        elif sem > 0.95:
            return "‚ö†Ô∏è NEARLY_IDENTICAL"
        else:
            return "‚úÖ GOOD"

analyzer = QualityAnalyzer(embedding_model)
print("‚úÖ Quality analyzer loaded!")


‚úÖ Quality analyzer loaded!


In [6]:
# CELL 5: Large Language Model Paraphraser (Llama 3.3 70B)

class LLMParaphraser:
    """Advanced paraphrasing using Groq's Llama 3.3 70B"""
    
    def __init__(self, groq_client, protector, analyzer):
        self.client = groq_client
        self.protector = protector
        self.analyzer = analyzer
        self.model = "llama-3.3-70b-versatile"  # 70 billion parameter model
    
    def paraphrase(self, text: str, sources: List[str] = None) -> Dict:
        """Generate 3 high-quality paraphrases"""
        
        print("="*90)
        print("üéì ADVANCED PARAPHRASING TOOL - POWERED BY LLAMA 3.3 70B")
        print("="*90)
        print(f"\nüìÑ ORIGINAL TEXT ({len(text)} characters):")
        print("-"*90)
        print(text)
        print("-"*90)
        
        if sources:
            print(f"\nüìö SOURCES ({len(sources)}):")
            for i, src in enumerate(sources, 1):
                print(f"   [{i}] {src}")
        else:
            print("\n‚ö†Ô∏è  Remember to cite your sources!")
        
        print("\n" + "="*90)
        print("üîÑ GENERATING 3 PARAPHRASES (This may take 10-15 seconds)...")
        print("="*90)
        
        # Protect technical elements
        protected, placeholder_map = self.protector.protect(text)
        
        # Define 3 prompting strategies
        prompts = {
            "Option 1 - Academic Restructure": self._create_prompt(protected, "academic"),
            "Option 2 - Concise Professional": self._create_prompt(protected, "concise"),
            "Option 3 - Technical Elaboration": self._create_prompt(protected, "technical")
        }
        
        results = {}
        
        for i, (style, prompt) in enumerate(prompts.items(), 1):
            print(f"\n‚è≥ {i}/3 - Generating {style}...")
            
            try:
                # Call Groq API
                response = self.client.chat.completions.create(
                    model=self.model,
                    messages=[
                        {
                            "role": "system", 
                            "content": "You are an expert academic writer. Paraphrase scientific text while preserving exact meaning and technical accuracy. Output ONLY the paraphrased text, no explanations."
                        },
                        {"role": "user", "content": prompt}
                    ],
                    temperature=0.7,
                    max_tokens=1024,
                    top_p=0.9
                )
                
                # Extract and clean
                paraphrased = response.choices[0].message.content.strip()
                style_type = "academic" if "Academic" in style else "concise" if "Concise" in style else "technical"
                paraphrased = self._clean_output(paraphrased, style_type)
                restored = self.protector.restore(paraphrased, placeholder_map)
                
                # Analyze quality
                quality = self.analyzer.analyze(text, restored)
                # Assess humanness
                humanness = humanizer.assess_humanness(restored)
                results[f"Option {i}"] = {
                    'style': style,
                    'text': restored,
                    'lexical_overlap': quality['lexical_overlap'],
                    'semantic_similarity': quality['semantic_similarity'],
                    'quality_status': quality['status'],
                    'humanness_score': humanness['humanness_score'],
                    'humanness_status': humanness['status']
                }
                
                print(f"   ‚úÖ Done!")
                
            except Exception as e:
                print(f"   ‚ùå Error: {e}")
                results[f"Option {i}"] = {
                    'style': style,
                    'text': f"Error: {str(e)}",
                    'lexical_overlap': 0.0,
                    'semantic_similarity': 0.0,
                    'quality_status': "ERROR"
                }
        
        # Display results
        self._display_results(results)
        
        return results
    
    def _create_prompt(self, text: str, style: str) -> str:
        """Create style-specific prompts"""
        
        base = """Paraphrase this text while:
1. Preserving EXACT original meaning
2. Maintaining technical accuracy
3. Keeping ALL placeholders unchanged (like <CITATION_1>, <URL_1>)
4. Using different sentence structures and vocabulary

"""
        
        styles = {
            "academic": "Style: Academic restructure - Rearrange sentences, scholarly language, formal tone.",
            "concise": "Style: Concise professional - Reduce wordiness, direct statements, clear and brief.",
            "technical": "Style: Technical elaboration - Emphasize technical details, add clarity, maintain precision."
        }
        
        return base + styles[style] + f"\n\nText:\n{text}\n\nParaphrase:"
    
    def _clean_output(self, text: str, style: str = "academic") -> str:
        
        """Clean and HUMANIZE LLM output"""
        # Basic cleaning
        text = re.sub(r'^(Paraphrase:|Here is|Here\'s|The paraphrased).*?:\s*', '', text, flags=re.IGNORECASE)
        text = text.strip()
    
        # APPLY HUMANIZATION (NEW!)
        text = humanizer.humanize(text, style)
    
        return text

    
    def _display_results(self, results: Dict):
        """Display formatted results"""
        print("\n" + "="*90)
        print("‚úÖ PARAPHRASING COMPLETE!")
        print("="*90)
        
        for option, data in results.items():
            print(f"\n{'='*90}")
            print(f"üìù {option.upper()}: {data['style']}")
            print('='*90)
            print(data['text'])
            print(f"\nüìä Quality Metrics:")
            print(f"   ‚Ä¢ Lexical Overlap: {data['lexical_overlap']:.1%}")
            print(f"   ‚Ä¢ Semantic Similarity: {data['semantic_similarity']:.1%}")
            print(f"   ‚Ä¢ Quality Status: {data['quality_status']}")
            print(f"   ‚Ä¢ Humanness Score: {data['humanness_score']:.0f}/100 ({data['humanness_status']})")
            print()
        
        print("="*90)
        print("üí° REMEMBER: Always cite your sources in the manuscript!")
        print("="*90)

# Initialize the paraphraser
llm_paraphraser = LLMParaphraser(groq_client, protector, analyzer)
print("‚úÖ LLM Paraphraser ready with Llama 3.3 70B (70 billion parameters)!")


‚úÖ LLM Paraphraser ready with Llama 3.3 70B (70 billion parameters)!


In [7]:
# CELL 5.5: Humanization Processor

class HumanizationProcessor:
    """Advanced humanization to make AI text more natural"""
    
    def __init__(self):
        # Robotic phrases to replace
        self.robotic_replacements = {
            # Formal ‚Üí Natural
            'It is important to note that': 'Notably,',
            'It should be noted that': 'Note that',
            'It is worth mentioning that': 'Interestingly,',
            'In order to': 'To',
            'Due to the fact that': 'Because',
            'In light of the fact that': 'Since',
            'For the purpose of': 'To',
            'With regard to': 'Regarding',
            'In the event that': 'If',
            'At this point in time': 'Now',
            
            # Academic stiffness ‚Üí Flow
            'It is evident that': 'Clearly,',
            'It can be observed that': 'Observations show',
            'It has been shown that': 'Research shows',
            'It is well established that': 'Studies confirm',
            
            # Passive ‚Üí Active voice improvements
            'are characterized by': 'exhibit',
            'are composed of': 'consist of',
            'are associated with': 'relate to',
            'are utilized': 'use',
            'is utilized': 'uses',
            
            # Wordiness ‚Üí Conciseness
            'a majority of': 'most',
            'a number of': 'several',
            'at the present time': 'currently',
            'during the course of': 'during',
            'in the absence of': 'without',
        }
        
        # AI-like phrases to avoid
        self.ai_tells = [
            'as an AI',
            'I cannot',
            'I am not able to',
            'based on my training',
            'here is the paraphrase',
            'here\'s the paraphrase',
            'the paraphrased version',
        ]
    
    def humanize(self, text: str, style: str = "academic") -> str:
        """Apply humanization transformations"""
        
        # Step 1: Remove AI artifacts
        text = self._remove_ai_artifacts(text)
        
        # Step 2: Replace robotic phrases
        text = self._replace_robotic_phrases(text)
        
        # Step 3: Improve flow
        text = self._improve_flow(text)
        
        # Step 4: Style-specific adjustments
        text = self._apply_style_adjustments(text, style)
        
        # Step 5: Final polish
        text = self._final_polish(text)
        
        return text
    
    def _remove_ai_artifacts(self, text: str) -> str:
        """Remove obvious AI-generated artifacts"""
        for phrase in self.ai_tells:
            text = re.sub(phrase, '', text, flags=re.IGNORECASE)
        
        # Remove meta-commentary
        text = re.sub(r'^(Paraphrase:|Here is|Here\'s|The following|Below is).*?:\s*', 
                     '', text, flags=re.IGNORECASE)
        
        return text.strip()
    
    def _replace_robotic_phrases(self, text: str) -> str:
        """Replace stiff academic phrases"""
        for robotic, natural in self.robotic_replacements.items():
            # Case-insensitive replacement
            pattern = re.compile(re.escape(robotic), re.IGNORECASE)
            text = pattern.sub(natural, text)
        
        return text
    
    def _improve_flow(self, text: str) -> str:
        """Improve sentence flow and transitions"""
        
        # Add variety to sentence starts
        sentences = text.split('. ')
        
        # Check for repetitive starts
        starts = [s.split()[0] if s.split() else '' for s in sentences]
        
        # If too many sentences start with "The", vary them
        if starts.count('The') > len(sentences) * 0.4:
            for i, s in enumerate(sentences):
                if s.startswith('The ') and i > 0:
                    # Add transition words occasionally
                    transitions = ['Additionally,', 'Furthermore,', 'Moreover,', '']
                    if i % 2 == 0:
                        sentences[i] = transitions[i % len(transitions)] + ' ' + s if transitions[i % len(transitions)] else s
        
        text = '. '.join(sentences)
        
        # Remove double spaces
        text = re.sub(r'\s+', ' ', text)
        
        return text
    
    def _apply_style_adjustments(self, text: str, style: str) -> str:
        """Apply style-specific humanization"""
        
        if style == "concise":
            # Remove excessive qualifiers
            text = re.sub(r'\b(very|extremely|highly|quite|rather)\s+', '', text)
            
            # Simplify complex structures
            text = text.replace('in which', 'where')
            text = text.replace('that which', 'what')
            
        elif style == "technical":
            # Ensure technical terms are precise
            # Keep technical language but make connectors natural
            text = text.replace('thus,', 'therefore,')
            text = text.replace('hence,', 'consequently,')
            
        elif style == "academic":
            # Balance formality with readability
            # Keep scholarly tone but improve flow
            pass
        
        return text
    
    def _final_polish(self, text: str) -> str:
        """Final polish for natural output"""
        
        # Ensure proper capitalization
        if text and text[0].islower():
            text = text[0].upper() + text[1:]
        
        # Ensure proper ending punctuation
        if text and text[-1] not in '.!?':
            text += '.'
        
        # Fix spacing around punctuation
        text = re.sub(r'\s+([.,;:!?])', r'\1', text)
        text = re.sub(r'([.,;:!?])(\w)', r'\1 \2', text)
        
        # Remove multiple punctuation
        text = re.sub(r'([.!?]){2,}', r'\1', text)
        
        # Remove extra spaces
        text = ' '.join(text.split())
        
        return text
    
    def assess_humanness(self, text: str) -> Dict:
        """Assess how human-like the text is"""
        
        # Check for AI tells
        ai_markers = sum(1 for phrase in self.ai_tells if phrase.lower() in text.lower())
        
        # Check for robotic phrases
        robotic_count = sum(1 for phrase in self.robotic_replacements.keys() 
                           if phrase.lower() in text.lower())
        
        # Check sentence variety
        sentences = text.split('.')
        sentence_starts = [s.strip().split()[0] if s.strip().split() else '' 
                          for s in sentences if s.strip()]
        
        variety_score = len(set(sentence_starts)) / max(len(sentence_starts), 1)
        
        # Calculate humanness score (0-100)
        humanness = 100
        humanness -= ai_markers * 20  # -20 per AI tell
        humanness -= robotic_count * 5  # -5 per robotic phrase
        humanness += variety_score * 10  # +10 for variety
        
        humanness = max(0, min(100, humanness))
        
        return {
            'humanness_score': humanness,
            'ai_markers': ai_markers,
            'robotic_phrases': robotic_count,
            'sentence_variety': variety_score,
            'status': 'EXCELLENT' if humanness >= 80 else 'GOOD' if humanness >= 60 else 'NEEDS_WORK'
        }

humanizer = HumanizationProcessor()
print("‚úÖ Humanization processor loaded!")
print("   ‚Ä¢ Removes robotic phrases")
print("   ‚Ä¢ Improves sentence flow")
print("   ‚Ä¢ Enhances naturalness")


‚úÖ Humanization processor loaded!
   ‚Ä¢ Removes robotic phrases
   ‚Ä¢ Improves sentence flow
   ‚Ä¢ Enhances naturalness


In [8]:
# CELL 6: Simple Interface Function

def paraphrase(text, sources=None):
    """
    Simple paraphrasing interface
    
    Args:
        text (str): Text to paraphrase
        sources (list): List of citation sources (optional but recommended)
    
    Returns:
        dict: Dictionary with 3 paraphrased options
    
    Example:
        results = paraphrase("Your text here", ["Source 1", "Source 2"])
        option1 = results["Option 1"]['text']
    """
    return llm_paraphraser.paraphrase(text, sources)

print("‚úÖ Simple interface loaded!")
print("\n" + "="*70)
print("üìñ USAGE INSTRUCTIONS:")
print("="*70)
print('1. results = paraphrase("Your text...", ["Source"])')
print('2. option1 = results["Option 1"][\'text\']')
print('3. option2 = results["Option 2"][\'text\']')
print('4. option3 = results["Option 3"][\'text\']')
print("="*70)


‚úÖ Simple interface loaded!

üìñ USAGE INSTRUCTIONS:
1. results = paraphrase("Your text...", ["Source"])
2. option1 = results["Option 1"]['text']
3. option2 = results["Option 2"]['text']
4. option3 = results["Option 3"]['text']


In [9]:
# CELL 6.5: Compare Before/After Humanization

def compare_humanization(original_text: str, show_details: bool = True):
    """
    Compare text before and after humanization
    
    Usage:
        compare_humanization("Your text here")
    """
    
    print("="*80)
    print("üî¨ HUMANIZATION COMPARISON TOOL")
    print("="*80)
    
    print("\nüìÑ ORIGINAL TEXT:")
    print("-"*80)
    print(original_text)
    
    # Assess original
    original_score = humanizer.assess_humanness(original_text)
    
    print(f"\nüìä Original Humanness: {original_score['humanness_score']:.0f}/100")
    print(f"   Status: {original_score['status']}")
    
    if show_details:
        print(f"   ‚Ä¢ AI Markers: {original_score['ai_markers']}")
        print(f"   ‚Ä¢ Robotic Phrases: {original_score['robotic_phrases']}")
        print(f"   ‚Ä¢ Sentence Variety: {original_score['sentence_variety']:.2f}")
    
    # Humanize
    humanized = humanizer.humanize(original_text, "academic")
    
    print("\n‚ú® HUMANIZED TEXT:")
    print("-"*80)
    print(humanized)
    
    # Assess humanized
    humanized_score = humanizer.assess_humanness(humanized)
    
    print(f"\nüìä Humanized Score: {humanized_score['humanness_score']:.0f}/100")
    print(f"   Status: {humanized_score['status']}")
    
    if show_details:
        print(f"   ‚Ä¢ AI Markers: {humanized_score['ai_markers']}")
        print(f"   ‚Ä¢ Robotic Phrases: {humanized_score['robotic_phrases']}")
        print(f"   ‚Ä¢ Sentence Variety: {humanized_score['sentence_variety']:.2f}")
    
    improvement = humanized_score['humanness_score'] - original_score['humanness_score']
    
    print("\nüìà IMPROVEMENT:")
    print(f"   {'+' if improvement >= 0 else ''}{improvement:.0f} points")
    
    print("="*80)
    
    return {
        'original': original_text,
        'humanized': humanized,
        'original_score': original_score,
        'humanized_score': humanized_score,
        'improvement': improvement
    }

print("‚úÖ Humanization comparison tool loaded!")
print("\nUsage: compare_humanization(\"Your text...\")")


‚úÖ Humanization comparison tool loaded!

Usage: compare_humanization("Your text...")


In [10]:
# CELL 7: Run Paraphrasing

# ============================================================================
# YOUR TEXT HERE - Replace with any paragraph you want to paraphrase
# ============================================================================

my_text = """DL has revolutionized cancer genomics by enhancing diagnostic accuracy and enabling personalized medicine through the development of advanced computational models. These systems integrate genomic data with other diagnostic tools, such as radiomics and pathology imaging, to create a more comprehensive framework for cancer detection, thereby improving clinical decision making. One key challenge in genomic analysis is the presence of imbalanced data sets, which can lead to biased predictions. To address this, methods like SMOTE-Tomek resampling help balance training data, making DL models more robust and generalizable across patient populations."""

# ============================================================================
# YOUR SOURCES HERE - Replace with your actual citations
# ============================================================================

my_sources = [
   
]

# ============================================================================
# GENERATE PARAPHRASES
# ============================================================================

results = paraphrase(my_text, my_sources)

# ============================================================================
# ACCESS INDIVIDUAL PARAPHRASES
# ============================================================================

option1_text = results["Option 1"]['text']
option2_text = results["Option 2"]['text']
option3_text = results["Option 3"]['text']

print("\n" + "="*90)
print("üíæ PARAPHRASES SAVED TO VARIABLES")
print("="*90)
print("‚Ä¢ option1_text - Academic Restructure")
print("‚Ä¢ option2_text - Concise Professional")
print("‚Ä¢ option3_text - Technical Elaboration")
print("="*90)
print("\nüí° TIP: Copy the best paraphrase for your manuscript, then CITE YOUR SOURCES!")


üéì ADVANCED PARAPHRASING TOOL - POWERED BY LLAMA 3.3 70B

üìÑ ORIGINAL TEXT (650 characters):
------------------------------------------------------------------------------------------
DL has revolutionized cancer genomics by enhancing diagnostic accuracy and enabling personalized medicine through the development of advanced computational models. These systems integrate genomic data with other diagnostic tools, such as radiomics and pathology imaging, to create a more comprehensive framework for cancer detection, thereby improving clinical decision making. One key challenge in genomic analysis is the presence of imbalanced data sets, which can lead to biased predictions. To address this, methods like SMOTE-Tomek resampling help balance training data, making DL models more robust and generalizable across patient populations.
------------------------------------------------------------------------------------------

‚ö†Ô∏è  Remember to cite your sources!

üîÑ GENERATING 3 PARAPHRASES

**STRICT RESTRICTED LICENSE (SRL-1.0)**

**Copyright (c) 2026 Argha Mukherjee**

**Permission Notice**

1.  **Grant of Rights:** 1.1 Subject to the terms below, the copyright holder **Argha Mukherjee** (‚ÄúCopyright Holder‚Äù) grants to any person or entity a non-exclusive, worldwide license to use, reproduce, and distribute the Software (defined below). 1.2 The license granted under Section 1.1 is conditional. Additional obligations apply if the recipient modifies the Software or uses the Software for commercial purposes as set out in Sections 2 and 3.
    
2.  **Definitions:**
    
    *   ‚ÄúSoftware‚Äù means the source code, object code, documentation, examples, and other material distributed by the Copyright Holder under this license.
        
    *   ‚ÄúModify‚Äù or ‚ÄúModification‚Äù means any change, adaptation, translation, translation into another programming language, compilation, patch, removal, or derivative work based on the Software.
        
    *   ‚ÄúCommercial Use‚Äù means any use of the Software that results in revenue, including but not limited to sale, licensing, sublicensing, renting, subscription services, hosting services, inclusion in paid products or services, or any activity intended for monetary gain.
        
    *   ‚ÄúGross Revenue‚Äù means the total amounts received by the licensee (and its affiliates) from any Commercial Use directly attributable to the Software, before any deductions for costs, taxes, refunds, credits, or expenses.
        
3.  **Modifications and Attribution (required):** 3.1 Modifications are permitted only on the following conditions:a) Full Credit ‚Äî Any modified version, derivative work, or work that contains portions of the Software must include a conspicuous attribution crediting the Copyright Holder as follows:‚ÄúPortions ¬© 2026 Argha Mukherjee. Original Software licensed under SRL-1.0.‚ÄùThe attribution must be included in:- a NOTICE or README file distributed with the software,- any about or credits screen for user-facing products,- prominent documentation and product websites where the product is described.b) Modification Notice ‚Äî All modified source files must contain a header comment describing what was changed, the author of the modification, and the date of modification.c) No Removal ‚Äî The original copyright notice and this license text must be preserved in all copies and distributed forms.
    
4.  **Commercial Use, Payment, and Reporting (mandatory):** 4.1 Commercial Permission ‚Äî Commercial Use of the Software is permitted only after (a) providing written notice to the Copyright Holder at the contact email below, and (b) complying with the payment and reporting obligations in this Section 4.4.2 Revenue Share ‚Äî The licensee must pay the Copyright Holder **a minimum of fifty percent (50%) of Gross Revenue** derived from any Commercial Use that incorporates, is based on, or benefits from the Software.4.3 Payment Schedule and Reports:a) Payment Frequency ‚Äî Payments of the revenue share are due quarterly within thirty (30) days after the end of each calendar quarter.b) Reporting ‚Äî With each payment the licensee must deliver a written report that shows how Gross Revenue was calculated, the relevant sales/transaction records, and the computation supporting the payment.4.4 Audit Right ‚Äî The Copyright Holder (or an independent auditor chosen by the Copyright Holder) may, upon reasonable notice and no more than once each calendar year, inspect relevant financial records of the licensee to verify Gross Revenue. If an audit reveals underpayment by more than five percent (5%), the licensee will reimburse the cost of the audit.4.5 Interest and Remedies ‚Äî Late payments will accrue interest at the lesser of (a) 1.5% per month, or (b) the maximum rate permitted by applicable law. Nonpayment or material breach of these payment terms entitles the Copyright Holder to injunctive relief, termination of this license, and recovery of damages.
    
5.  **Redistribution:** 5.1 Unmodified redistribution of the Software (source or binary) is permitted provided this license and the copyright and attribution notices are preserved and no fee is charged for the Software itself.5.2 Redistribution that constitutes Commercial Use (see Section 3 and 4) triggers the payment obligations in Section 4.
    
6.  **Warranties, Liability, and Indemnity:** 6.1 THE SOFTWARE IS PROVIDED ‚ÄúAS IS‚Äù, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.6.2 IN NO EVENT SHALL THE COPYRIGHT HOLDER BE LIABLE FOR ANY INDIRECT, INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES.6.3 The licensee agrees to indemnify and hold harmless the Copyright Holder from third-party claims arising from the licensee‚Äôs Commercial Use or modifications of the Software.
    
7.  **Termination:** 7.1 This license terminates automatically if the licensee fails to comply with any material term of the license (including payment obligations and attribution requirements).7.2 Termination does not relieve the licensee of any accrued payment obligations or liabilities that arose prior to termination.
    
8.  **Governing Law and Jurisdiction:** 8.1 This license is governed by the laws of India. The parties submit to the exclusive jurisdiction of the courts located in Kolkata, India for resolution of any disputes, unless otherwise agreed in writing.
    
9.  **Severability:** 9.1 If any provision of this license is found invalid or unenforceable, the remainder of the license remains in force to the fullest extent permitted by law.
    
10.  **Contact** for Permissions, Reporting, and Payment
    
*   **Email: (arghamukherjee1998@gmail.com)**
    

11.  **Acceptance:**
    11.1 By using, modifying, or distributing the Software, the licensee agrees to be bound by the terms of this license.