## Importing Necessary Libraries

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup
import nltk
from nltk.corpus import stopwords
import string
from nltk.tokenize import sent_tokenize
from nltk.tokenize import word_tokenize
from textblob import TextBlob
from nltk.corpus import cmudict 

nltk.download('punkt')
nltk.download('stopwords')
nltk.download('cmudict')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\abine\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\abine\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package cmudict to
[nltk_data]     C:\Users\abine\AppData\Roaming\nltk_data...
[nltk_data]   Package cmudict is already up-to-date!


True

* pandas (`pd`) is a library for data manipulation and analysis in Python, built around tabular structures such as the DataFrame.

* requests sends HTTP requests to a given URL and returns the response, which is useful for web scraping and for interacting with APIs.

* BeautifulSoup parses HTML and XML documents into a parse tree from which data can be extracted, which is useful for web scraping.

* NLTK (Natural Language Toolkit) is a library for natural language processing (NLP) in Python. It provides tools for working with human language data, including tokenization, tagging, parsing, and semantic reasoning.

* `stopwords` is NLTK's stopword corpus reader. It provides lists of common words that are usually filtered out in text processing, such as 'the', 'is', and 'in'.

* `string` is a standard-library module of string constants and helpers. Its `string.punctuation` constant is useful for removing punctuation from text.

* `sent_tokenize` (from `nltk.tokenize`) splits text into sentences.

* `word_tokenize` (from `nltk.tokenize`) splits text into word and punctuation tokens.

* TextBlob is a library for processing textual data. It provides a simple API for common NLP tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, and classification.

* `cmudict` is NLTK's reader for the CMU Pronouncing Dictionary. Its phoneme transcriptions can be used to count the syllables in a word, among other phonetic information.

* `nltk.download('punkt')` downloads the pre-trained Punkt sentence tokenizer models, which both `sent_tokenize` and `word_tokenize` rely on.

* `nltk.download('stopwords')` downloads the stopwords corpus, which contains lists of common stopwords for various languages.

* `nltk.download('cmudict')` downloads the Carnegie Mellon University Pronouncing Dictionary, useful for phonetic and syllable analysis of words.
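
As a small illustration of how two of these pieces fit together, the sketch below strips punctuation with `string.punctuation` and then drops stopwords. To stay runnable without any NLTK downloads, it uses `str.split` and a tiny hand-picked stopword set. `SAMPLE_STOPWORDS` and `clean_tokens` are hypothetical names introduced for this example and are not part of the notebook; in the notebook itself, `word_tokenize` and `stopwords.words('english')` would presumably play those roles.

```python
import string

# Tiny stand-in for an NLTK stopword list (an assumption for this sketch).
SAMPLE_STOPWORDS = {"the", "is", "in", "a", "of"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tokens(text, stop_words=SAMPLE_STOPWORDS):
    """Lowercase the text, remove punctuation, and drop stopwords."""
    no_punct = text.lower().translate(_PUNCT_TABLE)
    # A whitespace split stands in for a real tokenizer here.
    return [tok for tok in no_punct.split() if tok not in stop_words]


print(clean_tokens("The data, in short, is clean!"))
# ['data', 'short', 'clean']
```

Note that `string.punctuation` covers ASCII punctuation only, so typographic quotes and dashes in scraped text would need separate handling.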

## Importing the Input Data and Assigning it to a DataFrame

In [2]:
input_file = 'C:/Users/abine/Desktop/Jupyter Notebook/BlackCoffer Project/Input.xlsx'

# Read the workbook; the first row (header=0) holds the column names
df = pd.read_excel(input_file, header=0)

df

| | URL_ID | URL |
|---|---|---|
| 0 | bctech2011 | https://insights.blackcoffer.com/ml-and-ai-bas... |
| 1 | bctech2012 | https://insights.blackcoffer.com/streamlined-i... |
| 2 | bctech2013 | https://insights.blackcoffer.com/efficient-dat... |
| 3 | bctech2014 | https://insights.blackcoffer.com/effective-man... |
| 4 | bctech2015 | https://insights.blackcoffer.com/streamlined-t... |
| ... | ... | ... |
| 142 | bctech2153 | https://insights.blackcoffer.com/population-an... |
| 143 | bctech2154 | https://insights.blackcoffer.com/google-lsa-ap... |
| 144 | bctech2155 | https://insights.blackcoffer.com/healthcare-da... |
| 145 | bctech2156 | https://insights.blackcoffer.com/budget-sales-... |


## Defining Functions for Text Analysis

In [3]:
# Function to extract article title from URL
def extract_article_title(url):
    try:
        # Use requests to fetch webpage content.
        # The 10-second timeout is an arbitrary choice so one slow URL cannot hang the run.
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')

        # Extract article title (empty string if the page has no <title> tag)
        title_tag = soup.find('title')
        article_title = title_tag.text.strip() if title_tag else ''

        # Remove '- Blackcoffer Insights' from the end of the title only;
        # slicing leaves any earlier occurrence of the same text untouched
        suffix = '- Blackcoffer Insights'
        if article_title.endswith(suffix):
            article_title = article_title[:-len(suffix)].strip()

        return article_title

    except requests.exceptions.RequestException as e:
        # Network failures, timeouts, and HTTP error statuses
        print(f"Error fetching URL: {url}. Exception: {e}")
        return None

    except Exception as e:
        # Anything else that goes wrong while parsing the page
        print(f"Error extracting article title from URL: {url}. Exception: {e}")
        return None

The function `extract_article_title(url)` fetches a webpage and returns its cleaned-up title.

* Fetch Webpage Content:
Uses the requests library to get the HTML content of the webpage, and raises an exception if the server returns an HTTP error status.

* Parse HTML:
Uses BeautifulSoup to parse the HTML content of the webpage.

* Extract Title:
Finds the `<title>` tag, extracts its text, and strips any leading or trailing whitespace. If the page has no `<title>` tag, the title is an empty string.

* Remove Specific Suffix:
If the title ends with '- Blackcoffer Insights', this suffix is removed.

* Error Handling:
Catches HTTP request errors and any other exception raised during extraction, and prints a message that includes the URL.
Returns None in case of an error.
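
The suffix-removal step can be looked at in isolation. The sketch below is one way to trim only a trailing occurrence of the suffix, using slicing rather than `str.replace`, which would also delete the same text if it appeared earlier in the title. `strip_site_suffix` and `SITE_SUFFIX` are hypothetical names for this example, and the sample titles are made up for illustration.

```python
SITE_SUFFIX = "- Blackcoffer Insights"


def strip_site_suffix(title, suffix=SITE_SUFFIX):
    """Return the title without a trailing site suffix, if present."""
    title = title.strip()
    if suffix and title.endswith(suffix):
        # Slice off only the final occurrence, then tidy the whitespace.
        return title[: -len(suffix)].strip()
    return title


print(strip_site_suffix("Example Article Title - Blackcoffer Insights"))
# Titles without the suffix come back unchanged apart from outer whitespace.
print(strip_site_suffix("  Another Title  "))
```

On Python 3.9 or later, `str.removesuffix` does the same trimming in a single call.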

## Creating a List of Positive and Negative Words

### Positive Dictionary

In [4]:
# Sample Positive Dictionary
positive_dictionary = ["a+", "abound", "abounds", "abundance", "abundant", "accessable", "accessible", 
    "acclaim", "acclaimed", "acclamation", "accolade", "accolades", "accommodative", 
    "accomodative", "accomplish", "accomplished", "accomplishment", "accomplishments", 
    "accurate", "accurately", "achievable", "achievement", "achievements", "achievible", 
    "acumen", "adaptable", "adaptive", "adequate", "adjustable", "admirable", "admirably", 
    "admiration", "admire", "admired", "admires", "admiring", "admiringly", "adorable", 
    "adore", "adored", "adorer", "adoring", "adoringly", "adroit", "adroitly", "adulate", 
    "adulation", "adulatory", "advanced", "advantage", "advantageous", "advantageously", 
    "advantages", "adventuresome", "adventurous", "advocate", "advocated", "advocates", 
    "affability", "affable", "affably", "affectation", "affection", "affectionate", 
    "affinity", "affirm", "affirmation", "affirmative", "affluence", "affluent", 
    "afford", "affordable", "affordably", "afordable", "agile", "agilely", "agility", 
    "agreeable", "agreeableness", "agreeably", "all-around", "alluring", "alluringly", 
    "altruistic", "altruistically", "amaze", "amazed", "amazement", "amazes", "amazing", 
    "amazingly", "ambitious", "ambitiously", "ameliorate", "amenable", "amenity", 
    "amiability", "amiabily", "amiable", "amicability", "amicable", "amicably", 
    "amity", "ample", "amply", "amuse", "amusing", "amusingly", "angel", "angelic", 
    "apotheosis", "appeal", "appealing", "applaud", "appreciable", "appreciate", 
    "appreciated", "appreciates", "appreciative", "appreciatively", "appropriate", 
    "approval", "approve", "ardent", "ardently", "ardor", "articulate", "aspiration", 
    "aspirations", "aspire", "assurance", "assurances", "assure", "assuredly", 
    "assuring", "astonish", "astonished", "astonishing", "astonishingly", "astonishment", 
    "astound", "astounded", "astounding", "astoundingly", "astutely", "attentive", 
    "attraction", "attractive", "attractively", "attune", "audible", "audibly", 
    "auspicious", "authentic", "authoritative", "autonomous", "available", "aver", 
    "avid", "avidly", "award", "awarded", "awards", "awe", "awed", "awesome", "awesomely", 
    "awesomeness", "awestruck", "awsome", "backbone", "balanced", "bargain", "beauteous", 
    "beautiful", "beautifullly", "beautifully", "beautify", "beauty", "beckon", "beckoned", 
    "beckoning", "beckons", "believable", "believeable", "beloved", "benefactor", 
    "beneficent", "beneficial", "beneficially", "beneficiary", "benefit", "benefits", 
    "benevolence", "benevolent", "benifits", "best", "best-known", "best-performing", 
    "best-selling", "better", "better-known", "better-than-expected", "beutifully", 
    "blameless", "bless", "blessing", "bliss", "blissful", "blissfully", "blithe", 
    "blockbuster", "bloom", "blossom", "bolster", "bonny", "bonus", "bonuses", "boom", 
    "booming", "boost", "boundless", "bountiful", "brainiest", "brainy", "brand-new", 
    "brave", "bravery", "bravo", "breakthrough", "breakthroughs", "breathlessness", 
    "breathtaking", "breathtakingly", "breeze", "bright", "brighten", "brighter", 
    "brightest", "brilliance", "brilliances", "brilliant", "brilliantly", "brisk", 
    "brotherly", "bullish", "buoyant", "cajole", "calm", "calming", "calmness", 
    "capability", "capable", "capably", "captivate", "captivating", "carefree", 
    "cashback", "cashbacks", "catchy", "celebrate", "celebrated", "celebration", 
    "celebratory", "champ", "champion", "charisma", "charismatic", "charitable", 
    "charm", "charming", "charmingly", "chaste", "cheaper", "cheapest", "cheer", 
    "cheerful", "cheery", "cherish", "cherished", "cherub", "chic", "chivalrous", 
    "chivalry", "civility", "civilize", "clarity", "classic", "classy", "clean", 
    "cleaner", "cleanest", "cleanliness", "cleanly", "clear", "clear-cut", "cleared", 
    "clearer", "clearly", "clears", "clever", "cleverly", "cohere", "coherence", 
    "coherent", "cohesive", "colorful", "comely", "comfort", "comfortable", "comfortably", 
    "comforting", "comfy", "commend", "commendable", "commendably", "commitment", 
    "commodious", "compact", "compactly", "compassion", "compassionate", "compatible", 
    "competitive", "complement", "complementary", "complemented", "complements", 
    "compliant", "compliment", "complimentary", "comprehensive", "conciliate", 
    "conciliatory", "concise", "confidence", "confident", "congenial", "congratulate", 
    "congratulation", "congratulations", "congratulatory", "conscientious", "considerate", 
    "consistent", "consistently", "constructive", "consummate", "contentment", "continuity", 
    "contrasty", "contribution", "convenience", "convenient", "conveniently", "convience", 
    "convienient", "convient", "convincing", "convincingly", "cool", "coolest", "cooperative", 
    "cooperatively", "cornerstone", "correct", "correctly", "cost-effective", "cost-saving", 
    "counter-attack", "counter-attacks", "courage", "courageous", "courageously", "courageousness", 
    "courteous", "courtly", "covenant", "cozy", "creative", "credence", "credible", "crisp", 
    "crisper", "cure", "cure-all", "cushy", "cute", "cuteness", "danke", "danken", "daring", 
    "daringly", "darling", "dashing", "dauntless", "dawn", "dazzle", "dazzled", "dazzling", 
    "dead-cheap", "dead-on", "decency", "decent", "decisive", "decisiveness", "dedicated", 
    "defeat", "defeated", "defeating", "defeats", "defender", "deference", "deft", "deginified", 
    "delectable", "delicacy", "delicate", "delicious", "delight", "delighted", "delightful", 
    "delightfully", "delightfulness", "dependable", "dependably", "deservedly", "deserving", 
    "desirable", "desiring", "desirous", "destiny", "detachable", "devout", "dexterous", 
    "dexterously", "dextrous", "dignified", "dignify", "dignity", "diligence", "diligent", 
    "diligently", "diplomatic", "dirt-cheap", "distinction", "distinctive", "distinguished", 
    "diversified", "divine", "divinely", "dominate", "dominated", "dominates", "dote", "dotingly", 
    "doubtless", "dreamland", "dumbfounded", "dumbfounding", "dummy-proof", "durable", "dynamic", 
    "eager", "eagerly", "eagerness", "earnest", "earnestly", "earnestness", "ease", "eased", 
    "eases", "easier", "easiest", "easiness", "easing", "easy", "easy-to-use", "easygoing", 
    "ebullience", "ebullient", "ebulliently", "ecenomical", "economical", "ecstasies", 
    "ecstasy", "ecstatic", "ecstatically", "edify", "educated", "effective", "effectively", 
    "effectiveness", "effectual", "efficacious", "efficient", "efficiently", "effortless", 
    "effortlessly", "effusion", "effusive", "effusively", "effusiveness", "elan", "elate", 
    "elated", "elatedly", "elation", "electrify", "elegance", "elegant", "elegantly", 
    "elevate", "elite", "eloquence", "eloquent", "eloquently", "embolden", "eminence", 
    "eminent", "empathize", "empathy", "empower", "empowerment", "enchant", "enchanted", 
    "enchanting", "enchantingly", "encourage", "encouragement", "encouraging", 
    "encouragingly", "endear", "endearing", "endorse", "endorsed", "endorsement", 
    "endorses", "endorsing", "energetic", "energize", "energy-efficient", "energy-saving", 
    "engaging", "engrossing", "enhance", "enhanced", "enhancement", "enhances", "enjoy", 
    "enjoyable", "enjoyably", "enjoyed", "enjoying", "enjoyment", "enjoys", "enlighten", 
    "enlightenment", "enliven", "ennoble", "enough", "enrapt", "enrapture", "enraptured", 
    "enrich", "enrichment", "enterprising", "entertain", "entertaining", "entertains", 
    "enthral", "enthrall", "enthralled", "enthuse", "enthusiasm", "enthusiast", 
    "enthusiastic", "enthusiastically", "entice", "enticed", "enticing", "enticingly", 
    "entranced", "entrancing", "entrust", "enviable", "enviably", "envious", "enviously", 
    "enviousness", "envy", "equitable", "ergonomical", "err-free", "erudite", "ethical", 
    "eulogize", "euphoria", "euphoric", "euphorically", "evaluative", "evenly", "eventful", 
    "everlasting", "evocative", "exalt", "exaltation", "exalted", "exaltedly", "exalting", 
    "exaltingly", "examplar", "examplary", "exceed", "exceeded", "exceeding", "exceedingly", 
    "exceeds", "excel", "exceled", "excelent", "excellant", "excelled", "excellence", 
    "excellency", "excellent", "excellently", "excels", "exceptional", "exceptionally", 
    "excite", "excited", "excitedly", "excitedness", "excitement", "excites", "exciting", 
    "excitingly", "exellent", "exemplar", "exemplary", "exhilarate", "exhilarating", 
    "exhilaratingly", "exhilaration", "exonerate", "expansive", "expeditiously", "expertly", 
    "exquisite", "exquisitely", "extol", "extoll", "extraordinarily", "extraordinary", 
    "exuberance", "exuberant", "exuberantly", "exult", "exultant", "exultation", "exultingly", 
    "eye-catch", "eye-catching", "eyecatch", "eyecatching", "fabulous", "fabulously", "facilitate", 
    "fair", "fairly", "fairness", "faith", "faithful", "faithfully", "faithfulness", "fame", "famed", 
    "famous", "famously", "fancier", "fancinating", "fancy", "fanfare", "fans", "fantastic", 
    "fantastically", "fascinate", "fascinating", "fascinatingly", "fascination", "fashionable", 
    "fashionably", "fast", "fast-growing", "fast-paced", "faster", "fastest", "fastest-growing", 
    "faultless", "fav", "fave", "favor", "favorable", "favored", "favorite", "favorited", "favour", 
    "fearless", "fearlessly", "feasible", "feasibly", "feat", "feature-rich", "fecilitous", "feisty", 
    "felicitate", "felicitous", "felicity", "fertile", "fervent", "fervently", "fervid", "fervidly", 
    "fervor", "festive", "fidelity", "fiery", "fine", "fine-looking", "finely", "finer", "finest", 
    "firmer", "first-class", "first-in-class", "first-rate", "flashy", "flatter", "flattering", 
    "flatteringly", "flawless", "flawlessly", "flexibility", "flexible", "flourish", "flourishing", 
    "fluent", "flutter", "fond", "fondly", "fondness", "foolproof", "foremost", "foresight", 
    "formidable", "fortitude", "fortuitous", "fortuitously", "fortunate", "fortunately", 
    "fortune", "fragrant", "free", "freed", "freedom", "freedoms", "fresh", "fresher", "freshest", 
    "friendliness", "friendly", "frolic", "frugal", "fruitful", "ftw", "fulfillment", "fun", 
    "futurestic", "futuristic", "gaiety", "gaily", "gain", "gained", "gainful", "gainfully", 
    "gaining", "gains", "gallant", "gallantly", "galore", "geekier", "geeky", "gem", "gems", 
    "generosity", "generous", "generously", "genial", "genius", "gentle", "gentlest", 
    "genuine", "gifted", "glad", "gladden", "gladly", "gladness", "glamorous", "glee", 
    "gleeful", "gleefully", "glimmer", "glimmering", "glisten", "glistening", "glitter", 
    "glitz", "glorify", "glorious", "gloriously", "glory", "glow", "glowing", "glowingly", 
    "god-given", "god-send", "godlike", "godsend", "gold", "golden", "good", "goodly", 
    "goodness", "goodwill", "goood", "gooood", "gorgeous", "gorgeously", "grace", 
    "graceful", "gracefully", "gracious", "graciously", "graciousness", "grand", "grandeur", 
    "grateful", "gratefully", "gratification", "gratified", "gratifies", "gratify", 
    "gratifying", "gratifyingly", "gratitude", "great", "greatest", "greatness", "grin", 
    "groundbreaking", "guarantee", "guidance", "guiltless", "gumption", "gush", "gusto", 
    "gutsy", "hail", "halcyon", "hale", "hallmark", "hallmarks", "hallowed", "handier", 
    "handily", "hands-down", "handsome", "handsomely", "handy", "happier", "happily", 
    "happiness", "happy", "hard-working", "hardier", "hardy", "harmless", "harmonious", 
    "harmoniously", "harmonize", "harmony", "headway", "heal", "healthful", "healthy", 
    "hearten", "heartening", "heartfelt", "heartily", "heartwarming", "heaven", "heavenly", 
    "helped", "helpful", "helping", "hero", "heroic", "heroically", "heroine", "heroize", 
    "heros", "high-quality", "high-spirited", "hilarious", "holy", "homage", "honest", 
    "honesty", "honor", "honorable", "honored", "honoring", "hooray", "hopeful", 
    "hopefully", "hopefulness", "hopes", "hoping", "hospitable", "hot", "hotcake", 
    "hotcakes", "hottest", "hug", "humane", "humble", "humility", "humor", "humorous", 
    "humorously", "humour", "humourous", "ideal", "idealize", "ideally", "idol", 
    "idolize", "idolized", "idyllic", "illuminate", "illuminati", "illuminating", 
    "illumine", "illustrious", "ilu", "imaculate", "imaginative", "immaculate", 
    "immaculately", "immense", "impartial", "impartiality", "impartially", "impassioned", 
    "impeccable", "impeccably", "important", "impress", "impressed", "impresses", 
    "impressive", "impressively", "impressiveness", "improve", "improved", "improvement", 
    "improvements", "improves", "improving", "incredible", "incredibly", "indebted", 
    "individualized", "indulgence", "indulgent", "industrious", "inestimable", "inestimably", 
    "inexpensive", "infallibility", "infallible", "infallibly", "influential", "ingenious", 
    "ingeniously", "ingenuity", "ingenuous", "ingenuously", "innocuous", "innovation", 
    "innovative", "inpressed", "insightful", "insightfully", "inspiration", "inspirational", 
    "inspire", "inspiring", "instantly", "instructive", "instrumental", "integral", 
    "integrated", "intelligence", "intelligent", "intelligible", "interesting", 
    "interests", "intimacy", "intimate", "intricate", "intrigue", "intriguing", 
    "intriguingly", "intuitive", "invaluable", "invaluablely", "inventive", "invigorate", 
    "invigorating", "invincibility", "invincible", "inviolable", "inviolate", "invulnerable", 
    "irreplaceable", "irreproachable", "irresistible", "irresistibly", "issue-free", "jaw-droping", 
    "jaw-dropping", "jollify", "jolly", "jovial", "joy", "joyful", "joyfully", "joyous", "joyously", 
    "jubilant", "jubilantly", "jubilate", "jubilation", "jubiliant", "judicious", "justly", "keen", 
    "keenly", "keenness", "kid-friendly", "kindliness", "kindly", "kindness", "knowledgeable", 
    "kudos", "large-capacity", "laud", "laudable", "laudably", "lavish", "lavishly", "law-abiding", 
    "lawful", "lawfully", "lead", "leading", "leads", "lean", "led", "legendary", "leverage", 
    "levity", "liberate", "liberation", "liberty", "lifesaver", "light-hearted", "lighter", 
    "likable", "like", "liked", "likes", "liking", "lionhearted", "lively", "logical", "long-lasting", 
    "lovable", "lovably", "love", "loved", "loveliness", "lovely", "lover", "loves", "loving", 
    "low-cost", "low-price", "low-priced", "low-risk", "lower-priced", "loyal", "loyalty", "lucid", 
    "lucidly", "luck", "luckier", "luckiest", "luckiness", "lucky", "lucrative", "luminous", 
    "lush", "luster", "lustrous", "luxuriant", "luxuriate", "luxurious", "luxuriously", 
    "luxury", "lyrical", "magic", "magical", "magnanimous", "magnanimously", "magnetic", 
    "magnificence", "magnificent", "magnificently", "majestic", "majesty", "manageable", 
    "manifest", "manly", "mannerly", "marvel", "marveled", "marvelled", "marvellous", 
    "marvelous", "marvelously", "marvelousness", "marvels", "master", "masterful", 
    "masterfully", "masterpiece", "masterpieces", "masters", "mastery", "matchless", 
    "mature", "maturely", "maturity", "meaningful", "memorable", "merciful", "mercifully", 
    "mercy", "merit", "meritorious", "merrily", "merriment", "merriness", "merry", "mesmerize", 
    "mesmerized", "mesmerizes", "mesmerizing", "mesmerizingly", "meticulous", "meticulously", 
    "mightily", "mighty", "mind-blowing", "miracle", "miracles", "miraculous", "miraculously", 
    "miraculousness", "modern", "modest", "modesty", "momentous", "monumental", "monumentally", 
    "morality", "motivated", "multi-purpose", "navigable", "neat", "neatest", "neatly", "nice", 
    "nicely", "nicer", "nicest", "nifty", "nimble", "noble", "nobly", "noiseless", "non-violence", 
    "non-violent", "notably", "noteworthy", "nourish", "nourishing", "nourishment", "novelty", 
    "nurturing", "oasis", "obsession", "obsessions", "obtainable", "openly", "openness", "optimal", 
    "optimism", "optimistic", "opulent", "orderly", "originality", "outdo", "outdone", "outperform", 
    "outperformed", "outperforming", "outperforms", "outstanding", "outstandingly", "outshine", 
    "outshone", "outsmart", "outstanding", "ovation", "overjoyed", "overtake", "overtaken", 
    "overtakes", "overtaking", "overtook", "overture", "pain-free", "painless", "painlessly", 
    "palatial", "pamper", "pampered", "pamperedly", "pamperedness", "pampers", "panoramic", 
    "paradise", "paramount", "pardon", "passion", "passionate", "passionately", "patience", 
    "patient", "patiently", "patriot", "patriotic", "peace", "peaceable", "peaceful", "peacefully", 
    "peacekeepers", "peach", "peerless", "pep", "pepped", "pepping", "peppy", "peps", "perfect", 
    "perfection", "perfectly", "permissible", "perseverance", "persevere", "personages", 
    "personalized", "phenomenal", "phenomenally", "picturesque", "piety", "pinnacle", "playful", 
    "playfully", "pleasant", "pleasantly", "pleased", "pleases", "pleasing", "pleasingly", 
    "pleasurable", "pleasurably", "pleasure", "plentiful", "pluses", "plush", "plusses", 
    "poetic", "poeticize", "poignant", "poise", "poised", "polished", "polite", "politely", 
    "popular", "portable", "posh", "positive", "positively", "positives", "powerful", "powerfully", 
    "praise", "praiseworthy", "praising", "pre-eminent", "precious", "precise", "precisely", 
    "preeminent", "prefer", "preferable", "preferably", "prefered", "preferes", "preferring", 
    "prefers", "premier", "prestige", "prestigious", "prettily", "pretty", "priceless", "pride", 
    "principled", "privilege", "privileged", "prize", "proactive", "problem-free", "problem-solver", 
    "prodigious", "prodigiously", "prodigy", "productive", "productively", "proficient", "proficiently", 
    "profound", "profoundly", "profuse", "profusion", "progress", "progressive", "prolific", "prominent", 
    "prominence", "promise", "promised", "promises", "promising", "promoter", "prompt", "promptly", 
    "proper", "properly", "propitious", "propitiously", "pros", "prosper", "prosperity", "prosperous", 
    "prospros", "protect", "protection", "protective", "proud", "proven", "proves", "providence", 
    "proving", "prowess", "prudence", "prudent", "prudently", "punctual", "pure", "purify", "purposeful", 
    "quaint", "qualified", "qualify", "quicker", "quiet", "quieter", "radiance", "radiant", "rapid", 
    "rapport", "rapt", "rapture", "raptureous", "raptureously", "rapturous", "rapturously", "rational", 
    "razor-sharp", "reachable", "readable", "readily", "ready", "reaffirm", "reaffirmation", "realistic", 
    "realizable", "reasonable", "reasonably", "reasoned", "reassurance", "reassure", "receptive", 
    "reclaim", "recommend", "recommendation", "recommendations", "recommended", "reconcile", 
    "record-setting", "recover", "recovery", "rectification", "rectify", "rectifying", "redeem", 
    "redeeming", "redemption", "refine", "refined", "refinement", "reform", "reformed", "reforming", 
    "reforms", "refresh", "refreshed", "refreshing", "refund", "refunded", "regal", "regally", 
    "regard", "rejoice", "rejoicing", "rejoicingly", "rejuvenate", "rejuvenated", "rejuvenating", 
    "relaxed", "relent", "reliable", "reliably", "relief", "relish", "remarkable", "remarkably", 
    "remedy", "remission", "remunerate", "renaissance", "renewed", "renown", "renowned", "replaceable", 
    "reputable", "rescue", "rescued", "resilient", "resolute", "resound", "resounding", "resourceful", 
    "resourcefulness", "respect", "respectable", "respectful", "respectfully", "respite", "resplendent", 
    "responsibly", "responsive", "restful", "restored", "restructure", "restructured", "restructuring", 
    "retractable", "revel", "revelation", "revere", "revered", "reverence", "reverent", "reverently", 
    "revitalize", "revival", "revive", "revives", "revolutionize", "revolutionized", "revolutionizes", 
    "reward", "rewarding", "rewardingly", "rich", "richer", "richly", "richness", "right", "righten", 
    "righteous", "righteously", "righteousness", "rightful", "rightfully", "rightly", "rightness", 
    "risk-free", "robust", "rock-star", "rock-stars", "rockstar", "rockstars", "romantic", "romantically", 
    "romanticize", "roomier", "roomy", "rosy", "safe", "safely", "sagacity", "sagely", "saint", "saintliness", 
    "saintly", "salient", "salvation", "sane", "sanity", "satisfaction", "satisfactorily", "satisfactory", 
    "satisfied", "satisfies", "satisfy", "satisfying", "satisified", "saver", "savings", "savior", "savvy", 
    "scenic", "seamless", "seasoned", "secure", "securely", "selective", "self-determination", 
    "self-respect", "self-satisfaction", "self-sufficiency", "self-sufficient", "sensation", 
    "sensational", "sensationally", "sensations", "sensible", "sensibly", "sensitive", 
    "serene", "serenity", "sexy", "sharp", "sharper", "sharpest", "shimmering", "shimmeringly", 
    "shine", "shiny", "significant", "silent", "simpler", "simplest", "simplified", 
    "simplifies", "simplify", "simplifying", "sincere", "sincerely", "sincerity", "skill", 
    "skilled", "skillful", "skillfully", "sleek", "slick", "smart", "smarter", "smartest", 
    "smartly", "smile", "smiles", "smiling", "smilingly", "smitten", "smooth", "smoother", 
    "smoothes", "smoothest", "smoothly", "snappy", "snazzy", "sociable", "soft", "softer", 
    "solace", "solicitous", "solicitously", "solid", "solidarity", "soothe", "soothingly", 
    "sophisticated", "soulful", "soundly", "soundness", "spacious", "sparkle", "sparkling", 
    "spectacular", "spectacularly", "speedily", "speedy", "spellbind", "spellbinding", 
    "spellbindingly", "spellbound", "spirited", "spiritual", "splendid", "splendidly", 
    "splendor", "spontaneous", "sporty", "spotless", "sprightly", "stability", "stabilize", 
    "stable", "stainless", "standout", "state-of-the-art", "stately", "statuesque", 
    "staunch", "staunchly", "staunchness", "steadfast", "steadfastly", "steadfastness", 
    "steadiest", "steadiness", "steady", "stellar", "stellarly", "stimulate", "stimulates", 
    "stimulating", "stimulative", "stirringly", "straighten", "straightforward", "streamlined", 
    "striking", "strikingly", "striving", "strong", "stronger", "strongest", "stunned", "stunning", 
    "stunningly", "stupendous", "stupendously", "sturdier", "sturdy", "stylish", "stylishly", 
    "stylized", "suave", "suavely", "sublime", "subsidize", "subsidized", "subsidizes", 
    "subsidizing", "substantive", "succeed", "succeeded", "succeeding", "succeeds", "succes", 
    "success", "successes", "successful", "successfully", "suffice", "sufficed", "suffices", 
    "sufficient", "sufficiently", "suitable", "sumptuous", "sumptuously", "sumptuousness", 
    "super", "superb", "superbly", "superior", "superiority", "supple", "support", "supported", 
    "supporter", "supporting", "supportive", "supports", "supremacy", "supreme", "supremely", 
    "supurb", "supurbly", "surmount", "surpass", "surreal", "survival", "survivor", "sustainability", 
    "sustainable", "swank", "swankier", "swankiest", "swanky", "sweeping", "sweet", "sweeten", 
    "sweetheart", "sweetly", "sweetness", "swift", "swiftness", "talent", "talented", "talents", 
    "tantalize", "tantalizing", "tantalizingly", "tempt", "tempting", "temptingly", "tenacious", 
    "tenaciously", "tenacity", "tender", "tenderly", "terrific", "terrifically", "thank", "thankful", 
    "thinner", "thoughtful", "thoughtfully", "thoughtfulness", "thrift", "thrifty", "thrill", 
    "thrilled", "thrilling", "thrillingly", "thrills", "thrive", "thriving", "thumb-up", "thumbs-up", 
    "tickle", "tidy", "time-honored", "timely", "tingle", "titillate", "titillating", "titillatingly", 
    "toast", "togetherness", "tolerable", "toll-free", "top", "top-notch", "top-quality", "topnotch", 
    "tops", "tough", "tougher", "toughest", "tranquil", "tranquility", "transparent", "treasure", 
    "tremendously", "trendy", "triumph", "triumphal", "triumphant", "triumphantly", "trivially", 
    "trophy", "trouble-free", "trump", "trumpet", "trust", "trusted", "trusting", "trustingly", "trustworthiness", 
    "trustworthy", "trusty", "truthful", "truthfully", "truthfulness", "twinkly", "ultra-crisp", 
    "unabashed", "unabashedly", "unaffected", "unassailable", "unbeatable", "unbiased", "unbound", 
    "uncomplicated", "unconditional", "undamaged", "undaunted", "understandable", "undisputable", 
    "undisputably", "undisputed", "unencumbered", "unequivocal", "unequivocally", "unfazed", "unfettered", 
    "unforgettable", "unity", "unlimited", "unmatched", "unparalleled", "unquestionable", "unquestionably", 
    "unreal", "unrestricted", "unrivaled", "unselfish", "unwavering", "upbeat", "upgradable", "upgradeable", 
    "upgraded", "upheld", "uphold", "uplift", "uplifting", "upliftingly", "upliftment", "upscale", "usable", 
    "useful", "user-friendly", "user-replaceable", "valiant", "valiantly", "valor", "valuable", "variety", 
    "venerate", "verifiable", "veritable", "versatile", "versatility", "vibrant", "vibrantly", "victorious", 
    "victory", "viewable", "vigilance", "vigilant", "virtue", "virtuous", "virtuously", "visionary", 
    "vivacious", "vivid", "voluntarily", "voluntary", "vulnerability", "vulnerable", "warm", "warmer", 
    "warmhearted", "warmly", "warmth", "wealthy", "welcome", "well", "well-backlit", "well-balanced", 
    "well-behaved", "well-being", "well-bred", "well-connected", "well-educated", "well-established", 
    "well-informed", "well-intentioned", "well-known", "well-made", "well-managed", "well-mannered", 
    "well-positioned", "well-publicized", "well-regarded", "well-rounded", "well-run", "well-wishers", 
    "wellbeing", "whoa", "wholeheartedly", "wholesome", "whooa", "whoooa", "wieldy", "willing", "willingly", 
    "willingness", "win", "windfall", "winnable", "winner", "winners", "winning", "wins", "wisdom", "wise", 
    "wisely", "witty", "won", "wonder", "wonderful", "wonderfully", "wonderous", "wonderously", "wonders", 
    "wondrous", "woo", "work", "workable", "worked", "works", "world-famous", "worth", "worth-while", 
    "worthiness", "worthwhile", "worthy", "wow", "wowed", "wowing", "wows", "yay", "youthful", "zeal", 
    "zenith", "zest", "zippy", "glamorous", "helpful", "light-hearted", "lovely", "loyal", "magic", 
    "miracle", "miraculous", "nifty", "nice", "originality", "paradise", "phenomenal", "pleasant", 
    "pleasantly", "pleased", "pleasing", "positive", "praise", "praiseworthy", "precious", "premium", 
    "remarkable", "satisfied", "satisfaction", "satisfy", "smart", "smooth", "splendid", "stellar", 
    "superb", "superior", "terrific", "thank", "thorough", "thrilled", "top", "top-notch", "treasure", 
    "triumph", "trust", "unbeatable", "upbeat", "vibrant", "victory", "well", "willing", "win", "winner", 
    "winning", "wow"]

### Negative Dictionary

In [5]:
# Sample Negative Dictionary
negative_dictionary = [ "abandon", "aberration", "abhor", "abject", "abnormal", "abolish", "abominable", 
    "abomination", "abrade", "abrasive", "abrupt", "abscond", "absence", "absurd", 
    "absurdity", "abuse", "abysmal", "abyss", "accidental", "accursed", "accusation", 
    "accuse", "acerbic", "achy", "acrid", "adamant", "addict", "admonish", "adulterate", 
    "adversary", "adversity", "afflict", "afraid", "aggravate", "aggression", "aggressive", 
    "agonize", "agony", "ail", "ailment", "alarm", "alienate", "allergic", "aloof", 
    "amiss", "amputate", "anger", "angry", "anguish", "annihilate", "annoy", "annoyance", 
    "anomalous", "antagonistic", "antagonize", "anxiety", "anxious", "apathetic", 
    "apathy", "appalling", "apprehension", "apprehensive", "arbitrary", "archaic", 
    "argumentative", "arrogance", "arrogant", "ashamed", "asinine", "aspersion", 
    "assassin", "assault", "astray", "atrocious", "atrocity", "attack", "audacious", 
    "austere", "authenticity", "avoid", "awful", "awkward", "backwards", "bad", "bane", 
    "barbaric", "barbarous", "barren", "baseless", "bashful", "battered", "battle", 
    "belligerent", "bemoan", "beneath", "berserk", "betray", "betrayal", "bewildered", 
    "bias", "bicker", "bitter", "bizarre", "blackmail", "blah", "blame", "blasted", 
    "blatant", "bleak", "bleed", "blemish", "blindside", "blister", "block", "blockade", 
    "blunt", "blur", "blurt", "boastful", "bombard", "bombastic", "bondage", "bother", 
    "bothersome", "brash", "bravado", "breach", "break", "breakdown", "broken", 
    "brutal", "brutality", "brute", "burden", "burn", "burning", "busy", "butt", 
    "bypass", "calamity", "callous", "calumny", "cancer", "candid", "cannibal", 
    "capricious", "careless", "cataclysm", "catastrophe", "caustic", "cave", 
    "cease", "cessation", "chafe", "challenge", "chaos", "chaotic", "charge", 
    "charlatan", "cheat", "cheater", "cheesy", "chide", "chilling", "choke", 
    "chronic", "clash", "clumsy", "coarse", "collapse", "collide", "collision", 
    "complain", "complicate", "compulsion", "conceal", "conceited", "concern", 
    "confine", "conflict", "confound", "confusion", "congested", "conquer", 
    "conspiracy", "contagion", "contempt", "contemptible", "contend", 
    "contentious", "contort", "contradict", "contradiction", "contrary", 
    "contravene", "contrite", "controversial", "conundrum", "convict", 
    "conviction", "corrosive", "corrupt", "corruption", "costly", 
    "counterfeit", "counterproductive", "coward", "crabby", "crack", 
    "cramped", "cranky", "crap", "crash", "craven", "craze", "crazy", 
    "creep", "cripple", "crisis", "critic", "critical", "criticism", 
    "criticize", "crooked", "crude", "cruel", "cruelty", "crumble", 
    "crummy", "crush", "cryptic", "culpable", "cumbersome", "curse", 
    "cursed", "cynical", "damage", "damaging", "dampen", "danger", 
    "dangerous", "dark", "dastardly", "deadlock", "deadly", "deafening", 
    "death", "debase", "debatable", "deceit", "deceitful", "deceive", 
    "deception", "decimate", "decay", "deceptive", "decline", "defame", 
    "defect", "defective", "defend", "defender", "defensive", "defer", 
    "defiance", "defiant", "deficient", "defile", "defraud", "defunct", 
    "degenerate", "degradation", "degrade", "dejected", "delay", "deliberate", 
    "delusion", "delusional", "demanding", "demean", "demented", "demolish", 
    "demonic", "demonize", "denial", "denounce", "dense", "denunciation", 
    "deny", "deplete", "deplorable", "deplorably", "depraved", "depress", 
    "depressed", "depressing", "deprivation", "deride", "derision", 
    "derogatory", "desert", "desertion", "desolate", "despair", "desperate", 
    "desperation", "despise", "despondent", "destroy", "destruction", 
    "destructive", "detach", "detachment", "deter", "detest", "detestable", 
    "detour", "detract", "detriment", "devastate", "devastation", "deviate", 
    "devil", "devilish", "devious", "devour", "diabolic", "diabolical", 
    "dialect", "dictator", "difficult", "diffidence", "diminish", "dirt", 
    "dirty", "disable", "disadvantage", "disagree", "disagreeable", 
    "disappear", "disappointment", "disapprove", "disarm", "disaster", 
    "disastrous", "disavow", "disbelief", "discard", "discern", "discomfort", 
    "discompose", "disconcert", "discontent", "discontented", "discord", 
    "discourage", "discouragement", "discouraging", "discredit", "discreet", 
    "discrepancy", "disdain", "disdainful", "disease", "disfavor", "disgust", 
    "disgusting", "dishonest", "dishonor", "disillusion", "disinclined", 
    "disjointed", "dislike", "disloyal", "dismal", "dismay", "dismiss", 
    "disobey", "disorder", "disorganized", "disparage", "disparity", 
    "dispassionate", "displeased", "displeasure", "disposable", "disposal", 
    "disproportionate", "disprove", "dispute", "disquiet", "disregard", 
    "disrespect", "disrespectful", "disrupt", "disruption", "dissatisfaction", 
    "dissatisfied", "dissent", "disservice", "dissimilar", "dissipate", 
    "dissolve", "distant", "distaste", "distasteful", "distort", "distortion", 
    "distract", "distress", "distressing", "distrust", "disturb", "disturbed", 
    "divergent", "divide", "divisive", "dizzy", "dodgy", "dogged", "domineer", 
    "domineering", "doubt", "doubtful", "doubtfully", "down", "downcast", 
    "downfall", "downgrade", "downhearted", "downhill", "downside", "downturn", 
    "drab", "drag", "drain", "dread", "dreadful", "dreary", "drench", "dripping", 
    "drone", "droop", "drop", "drought", "drown", "drunk", "dry", "dubious", 
    "dud", "dull", "dumb", "dump", "dunce", "dupe", "dust", "dusty", "dwindle", 
    "dying", "earsplitting", "eccentric", "edgy", "egotistic", "egregious", 
    "eject", "elusive", "embarrass", "embarrassing", "embitter", "embittered", 
    "embrace", "embroil", "emphatic", "empty", "encroach", "endanger", "enemies", 
    "enemy", "enrage", "enraged", "enslave", "entangle", "entrap", "envious", 
    "erratic", "erroneous", "error", "eruption", "escape", "evade", "evict", 
    "evil", "exaggerate", "exasperate", "excruciating", "excuse", "exile", 
    "exorbitant", "expel", "expensive", "expire", "explode", "exploit", "expose", 
    "expulsion", "extinct", "extinguish", "extortion", "extraneous", "extravagant", 
    "exude", "exult", "eye-sore", "fail", "failure", "fake", "fall", "fallacy", 
    "fallen", "false", "falter", "fatal", "fatigue", "fault", "faulty", "fear", 
    "fearful", "feeble", "feeble-minded", "feign", "fell", "fiasco", "fickle", 
    "fiction", "fidget", "fiend", "filthy", "finicky", "fissure", "flag", "flagrant", 
    "flake", "flaky", "flaw", "flawed", "flee", "fleeting", "flimsy", "flirt", "flop", 
    "flout", "fluster", "foe", "fool", "foolish", "forbid", "forbidding", "force", 
    "forceful", "foreboding", "forgetful", "forgettable", "forsake", "foul", "fragmented", 
    "frantic", "fraud", "fraudulent", "freak", "freakish", "freeze", "fret", "friction", 
    "frivolous", "frown", "frustrate", "frustrating", "frustration", "fuck", "fudge", 
    "fugitive", "full", "fumble", "fume", "fundamental", "funeral", "funky", "funny", 
    "furious", "futility", "fuzzy", "gag", "gaffe", "gainsay", "gall", "garbage", 
    "gaudy", "gawk", "geeky", "generic", "ghastly", "ghostly", "gibberish", "gibe", 
    "giddy", "gimmick", "give-up", "glare", "glitch", "glower", "glum", "gnaw", "goofy", 
    "grave", "greed", "greedy", "grief", "grieve", "grievous", "grim", "grime", "grind", 
    "gripe", "gross", "grotesque", "grouch", "grouchy", "groundless", "growl", "grudge", 
    "gruesome", "grumble", "grumpy", "guilt", "guilty", "gullible", "gulp", "gutless", 
    "hack", "haggard", "haggle", "halt", "hamper", "hamstring", "handicap", "hang", 
    "haphazard", "harangue", "harass", "harassment", "harbinger", "harsh", "hassle", 
    "haste", "hasty", "hateful", "hate", "haughty", "haunt", "headache", "heartache", 
    "heartbreak", "heartless", "heat", "heinous", "hell", "hideous", "hideousness", 
    "hinder", "hindrance", "hoard", "hoax", "hobble", "hog", "hollow", "hopeless", 
    "horde", "horrible", "horrid", "horrific", "hostile", "hothead", "hubris", 
    "huckster", "hugely", "humble", "humiliate", "humiliation", "hurt", "hurtful", 
    "hustle", "hysteria", "hysterical", "icky", "idiocy", "idiot", "idiotic", "idle", 
    "ignoble", "ignominious", "ignorant", "ignore", "ill", "illegal", "illegitimate", 
    "illiterate", "illness", "illogical", "illusive", "illusory", "immaterial", 
    "immature", "imminence", "imminent", "immoral", "impair", "impairment", "impasse", 
    "impatient", "impeach", "impede", "impediment", "impending", "imperfect", "imperil", 
    "impersonal", "impetuous", "implicate", "implication", "implode", "implore", "imply", 
    "importunate", "importune", "impose", "imposition", "impossible", "impotent", 
    "impractical", "imprecise", "imprison", "imprisonment", "improbability", 
    "improbable", "improper", "improve", "improvement", "imprudent", "impudent", 
    "impugn", "impulse", "impulsive", "impunity", "impure", "inability", "inaccurate", 
    "inaction", "inactive", "inadequate", "inappropriate", "inarticulate", 
    "inattentive"]

### Positive Score

In [6]:
# Function to calculate Positive Score
def calculate_positive_score(title, positive_dictionary):
    words = word_tokenize(title.lower())
    positive_score = sum(1 for word in words if word in positive_dictionary)
    return positive_score

* Tokenize the Title:

word_tokenize(title.lower()): Converts the title to lowercase and tokenizes it into individual words.

* Calculate Positive Score:

sum(1 for word in words if word in positive_dictionary): Iterates through the tokenized words and counts how many of those words are present in the positive_dictionary.
For each word in the tokenized list that matches a word in the positive dictionary, it adds 1 to the positive_score.

### Negative Score

In [7]:
# Function to calculate Negative Score
def calculate_negative_score(title, negative_dictionary):
    words = word_tokenize(title.lower())
    negative_score = sum(1 for word in words if word in negative_dictionary)
    return negative_score

* Tokenize the Title:

word_tokenize(title.lower()): Converts the title to lowercase and tokenizes it into individual words.

* Calculate Negative Score:

sum(1 for word in words if word in negative_dictionary): Iterates through the tokenized words and counts how many of those words are present in the negative_dictionary.
For each word in the tokenized list that matches a word in the negative dictionary, it adds 1 to the negative_score.
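
The positive and negative scores rely on the same counting logic, and each `word in ...` test scans a Python list from the start. The following is a minimal sketch of that shared idea using a `set` for faster membership tests. The function name `count_lexicon_hits`, the tiny lexicons, the sample title, and the regex tokenizer standing in for `word_tokenize` are all illustrative assumptions, not part of the notebook.

```python
import re

# Tiny illustrative lexicons (assumed for this sketch, not the full dictionaries)
positive_words = {"useful", "user-friendly", "trusted"}
negative_words = {"error", "failure", "costly"}

def count_lexicon_hits(text, lexicon):
    """Count the tokens of `text` that appear in `lexicon` (a set of lowercase words)."""
    # Stand-in tokenizer: lowercase words, keeping hyphenated words together
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return sum(1 for token in tokens if token in lexicon)

title = "A Useful and User-Friendly Dashboard"  # hypothetical title
print(count_lexicon_hits(title, positive_words))  # 2
print(count_lexicon_hits(title, negative_words))  # 0
```

Converting the dictionaries once with `set(positive_dictionary)` would leave the scores unchanged, because duplicates in the list do not affect a membership test.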

### Polarity Score

In [8]:
# Function to calculate Polarity Score
def calculate_polarity_score(positive_score, negative_score):
    if positive_score == 0 and negative_score == 0:
        return 0
    return (positive_score - negative_score) / ((positive_score + negative_score) + 0.000001)

* Check for Zero Division:

if positive_score == 0 and negative_score == 0:: Checks if both positive_score and negative_score are zero. If so, returns a polarity score of 0 to avoid division by zero errors.

* Calculate Polarity Score:

(positive_score - negative_score): Computes the difference between positive_score and negative_score.
((positive_score + negative_score) + 0.000001): Divides that difference by the total number of matched words, plus a small constant (0.000001) that guards against division by zero.
Returns the polarity score, which lies between -1 (only negative matches) and 1 (only positive matches); the added constant keeps it marginally inside those bounds.
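
A short worked example may help show the effect of the 0.000001 term. The sketch below repeats the formula under the assumed name `polarity` and evaluates it for a few illustrative score pairs; it explains why a title with a single positive match is reported as 0.999999 while a title with two positive matches rounds to 1.000000.

```python
def polarity(positive_score, negative_score):
    """Same formula as calculate_polarity_score, repeated so the example is self-contained."""
    if positive_score == 0 and negative_score == 0:
        return 0
    return (positive_score - negative_score) / ((positive_score + negative_score) + 0.000001)

# (positive, negative) pairs chosen for illustration
print(f"{polarity(1, 0):.6f}")  # 0.999999
print(f"{polarity(2, 0):.6f}")  # 1.000000
print(f"{polarity(1, 1):.6f}")  # 0.000000
```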

### Subjectivity Score

In [9]:
# Function to calculate Subjectivity Score
def calculate_subjectivity_score(positive_score, negative_score, total_words):
    if total_words == 0:
        return 0
    return (positive_score + negative_score) / (total_words + 0.000001)

* Check for Zero Division:

if total_words == 0:: Checks if total_words is zero. If so, returns a subjectivity score of 0 to avoid division by zero errors.

* Calculate Subjectivity Score:

(positive_score + negative_score): Computes the sum of positive_score and negative_score.
(total_words + 0.000001): Divides that sum by total_words plus a small value (0.000001), which guards against division by zero.
Returns the calculated subjectivity score, which represents the ratio of positive and negative words relative to the total number of words in the text.

### Average Sentence Length

In [10]:
# Function to calculate Average Sentence Length
def calculate_avg_sentence_length(title):
    sentences = title.split('.')
    total_sentences = len(sentences)
    total_words = len(word_tokenize(title))
    if total_sentences == 0:
        return 0
    return total_words / total_sentences

* Split Text into Sentences:

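sentences = title.split('.'): Splits the title into sentences on periods ('.').
This assumes every sentence ends with a period, so abbreviations, decimal numbers, and sentences ending in '?' or '!' are not handled correctly. A trailing period also produces an empty final piece, which is still counted as a sentence.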

* Count Sentences and Words:

total_sentences = len(sentences): Counts the total number of sentences obtained from splitting the text.
total_words = len(word_tokenize(title)): Tokenizes the entire title into words and counts the total number of words.

* Check for Zero Division:

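if total_sentences == 0:: Guards against division by zero by returning an average sentence length of 0. Note that str.split always returns at least one element, even for an empty string, so with the current splitting logic this branch is never reached.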

* Calculate Average Sentence Length:

Returns the ratio of total_words to total_sentences, which gives the average number of words per sentence.
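
The behaviour of `split('.')` on edge cases is easy to check directly. The sketch below compares the counting used above with a hypothetical variant that drops empty pieces; both helper names are assumptions made for this example.

```python
def count_sentences_naive(text):
    # Mirrors the notebook: every '.'-separated piece counts, including empty ones
    return len(text.split('.'))

def count_sentences_filtered(text):
    # Hypothetical variant: ignore pieces that are empty or whitespace-only
    return len([piece for piece in text.split('.') if piece.strip()])

print(count_sentences_naive(""))              # 1
print(count_sentences_naive("One. Two."))     # 3
print(count_sentences_filtered("One. Two."))  # 2
print(count_sentences_filtered(""))           # 0
```

For titles that contain no period, both versions return 1, so the average sentence length equals the token count of the title.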

### Syllable Count

In [19]:
# Function to count syllables in a word
def calculate_syllable_count(word):
    d = cmudict.dict()
    if word.lower() in d:
        return [len(list(y for y in x if y[-1].isdigit())) for x in d[word.lower()]][0]
    else:
        vowels = "aeiou"
        count = 0
        word = word.lower().strip(".:;?!")
        if len(word) == 0:
            return 0
        if word[0] in vowels:
            count += 1
        for index in range(1, len(word)):
            if word[index] in vowels and word[index - 1] not in vowels:
                count += 1
        if word.endswith("e"):
            count -= 1
        if word.endswith("le"):
            count += 1
        if count == 0:
            count += 1
        return count

* Check CMU Dictionary:

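d = cmudict.dict(): Loads the CMU Pronouncing Dictionary as a Python dict that maps each word to a list of pronunciations. Because this line sits inside the function, the dictionary is rebuilt on every call.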

* Lookup Word in Dictionary:

if word.lower() in d: Checks if the lowercase version of word exists in the CMU dictionary.
If found, counts the phonemes that end in a stress digit (the vowel sounds) in the first listed pronunciation and returns that number as the syllable count.

* Heuristic Syllable Counting:

If the word is not found in the dictionary, it uses a heuristic approach to count syllables:

Initializes a count variable.
Strips common punctuation from the word.
Checks if the word starts with a vowel and increments the count.
Iterates through the word and counts every vowel (a, e, i, o, u) that follows a non-vowel, so a run of consecutive vowels counts once; 'y' is not treated as a vowel.
Adjusts the count for common English patterns: a final 'e' reduces the count by one, and a final 'le' adds one back.
Ensures the minimum syllable count is 1.
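
To see what the fallback produces, the sketch below copies only the heuristic branch into a standalone function (the name `heuristic_syllables` is an assumption) so it runs without the CMU dictionary. The last call passes a multi-word string, which the heuristic treats as one long word, spaces included.

```python
def heuristic_syllables(text):
    """Copy of the fallback branch of calculate_syllable_count (no CMU lookup)."""
    vowels = "aeiou"
    count = 0
    text = text.lower().strip(".:;?!")
    if len(text) == 0:
        return 0
    if text[0] in vowels:
        count += 1
    for index in range(1, len(text)):
        # A vowel that follows a non-vowel starts a new vowel group
        if text[index] in vowels and text[index - 1] not in vowels:
            count += 1
    if text.endswith("e"):
        count -= 1
    if text.endswith("le"):
        count += 1
    if count == 0:
        count += 1
    return count

print(heuristic_syllables("table"))                     # 2
print(heuristic_syllables("the"))                       # 1
print(heuristic_syllables("Healthcare Data Analysis"))  # 8
```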

### Count of Words, Count of Complex Words and Percentage of Complex Words

In [27]:
# Function to calculate Count of Words, Count of Complex Words and Percentage of Complex Words
def calculate_complex_word_count(title):
    words = word_tokenize(title.lower())
    stop_words = set(stopwords.words('english'))
    punctuations = set(string.punctuation)
    complex_word_count = 0
    total_words = 0
    
    cmu_entries = cmudict.dict()  # load the dictionary once, not on every lookup

    def syllable_count(word):
        return sum(len([y for y in x if y[-1].isdigit()]) for x in cmu_entries.get(word.lower(), []))
    
    for word in words:
        if word not in stop_words and word not in punctuations:
            total_words += 1
            if syllable_count(word) > 2:
                complex_word_count += 1
    if total_words == 0:
        return 0, 0, 0 
    percentage_complex_words = (complex_word_count / total_words) * 100
    return complex_word_count, percentage_complex_words, total_words

* Tokenize and Preprocess Text:

words = word_tokenize(title.lower()): Tokenizes the lowercase version of title into individual words.

* Initialize Variables:

stop_words = set(stopwords.words('english')): Retrieves a set of English stopwords (common words like 'the', 'is', etc.) using NLTK.
punctuations = set(string.punctuation): Retrieves a set of punctuation marks.
Initializes complex_word_count and total_words counters.

* Define Syllable Count Function:

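syllable_count(word): Defines an inner function that looks the word up in the CMU dictionary (cmudict) and counts the phonemes ending in a stress digit. The counts are summed over every pronunciation listed for the word, so a word with several pronunciations is over-counted. If the word is not found in the dictionary, it returns 0.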

* Iterate Through Words:

Iterates through each word in words.
Checks if the word is not a stopword and not a punctuation mark.
Increments total_words for each valid word encountered.
Checks if syllable_count(word) > 2 to determine if the word is complex (having more than 2 syllables). If true, increments complex_word_count.

* Handle Edge Case:

Checks if total_words is 0 to avoid division by zero. If true, returns 0 for all counts (complex_word_count, percentage_complex_words, total_words).

* Calculate Percentage of Complex Words:

Computes percentage_complex_words as (complex_word_count / total_words) * 100.

* Return Results:

Returns a tuple containing complex_word_count, percentage_complex_words, and total_words.
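
The inner `syllable_count` adds up the syllables of every pronunciation listed for a word. The sketch below illustrates what that means on a small hand-written mapping in the CMU format (each word maps to a list of pronunciations, and vowel phonemes end in a stress digit). The mapping, the helper names, and the "first pronunciation only" variant are assumptions for illustration, not the notebook's method.

```python
# Hand-written stand-in for CMU-style entries (assumed values, for illustration only)
sample_pronunciations = {
    "data": [["D", "EY1", "T", "AH0"], ["D", "AE1", "T", "AH0"]],
    "analysis": [["AH0", "N", "AE1", "L", "AH0", "S", "AH0", "S"]],
}

def syllables_in(pronunciation):
    # Vowel phonemes carry a trailing stress digit (0, 1 or 2)
    return sum(1 for phoneme in pronunciation if phoneme[-1].isdigit())

def syllables_summed(word):
    # Mirrors the inner syllable_count: adds up every listed pronunciation
    return sum(syllables_in(p) for p in sample_pronunciations.get(word, []))

def syllables_first(word):
    # Hypothetical variant: use only the first listed pronunciation
    entries = sample_pronunciations.get(word, [])
    return syllables_in(entries[0]) if entries else 0

for word in ["data", "analysis", "dashboarding"]:
    print(word, syllables_summed(word), syllables_first(word))
# data 4 2
# analysis 4 4
# dashboarding 0 0
```

With the summed count, a two-syllable word that has two pronunciations passes the `> 2` test and is classed as complex, while a word missing from the mapping scores 0 and is never complex.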

### Fog Index

In [28]:
# Function to calculate Fog Index
def calculate_fog_index(avg_sentence_length, percentage_complex_words):
    return 0.4 * (avg_sentence_length + percentage_complex_words)

* Compute Fog Index:

Calculates the Fog Index using the formula 0.4 * (avg_sentence_length + percentage_complex_words).
The Fog Index formula combines the average sentence length and the percentage of complex words to estimate the readability level of the text.
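
As a quick arithmetic check, the sketch below evaluates the same formula (under the assumed name `fog_index`) for two input pairs. The inputs match the average sentence length and complex-word percentage shown for the first and last rows of the output table at the end of the notebook.

```python
def fog_index(avg_sentence_length, percentage_complex_words):
    return 0.4 * (avg_sentence_length + percentage_complex_words)

# 16 tokens in one sentence, 60% complex words: 0.4 * 76
print(round(fog_index(16.0, 60.0), 6))  # 30.4
# 8 tokens in one sentence, no complex words: 0.4 * 8
print(round(fog_index(8.0, 0.0), 6))    # 3.2
```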

### Average Word Length

In [29]:
# Function to calculate Average Word Length
def calculate_avg_word_length(title):
    words = word_tokenize(title.lower())
    total_words = len(words)
    if total_words == 0:
        return 0
    total_length = sum(len(word) for word in words)
    return total_length / total_words

* Tokenize the Text:

words = word_tokenize(title.lower()): Tokenizes the lowercase version of title into individual words.

* Calculate Total Words:

total_words = len(words): Counts the total number of words in the tokenized list words.

* Check for Zero Division:

if total_words == 0:: Checks if total_words is zero. If so, returns an average word length of 0 to avoid division by zero errors.

* Calculate Total Length of Words:

total_length = sum(len(word) for word in words): Computes the total length of all words by summing the length of each word in words.

* Compute Average Word Length:

Returns the average word length by dividing total_length by total_words.
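
Because the tokens include punctuation marks, each comma or colon counts as a one-character "word" and lowers the average. The sketch below illustrates this on one of the titles from the output table; the regex tokenizers are an assumed stand-in for `word_tokenize`, which should split this particular title the same way.

```python
import re

def avg_length(tokens):
    return sum(len(t) for t in tokens) / len(tokens) if tokens else 0

title = "Budget, Sales KPI Dashboard using Power BI"

# Words plus single punctuation marks (approximates word_tokenize for this title)
with_punct = re.findall(r"\w+|[^\w\s]", title.lower())
# Words only, punctuation dropped
words_only = re.findall(r"\w+", title.lower())

print(avg_length(with_punct))  # 4.5
print(avg_length(words_only))  # 5.0
```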

### Personal Pronouns

In [30]:
# Function to count personal pronouns
def count_personal_pronouns(title):
    words = word_tokenize(title.lower())
    personal_pronouns = ['i', 'me', 'my', 'mine', 'myself', 'we', 'us', 'our', 'ours', 'ourselves',
                    'you', 'your', 'yours', 'yourself', 'yourselves',
                    'he', 'him', 'his', 'himself', 'she', 'her', 'hers', 'herself',
                    'it', 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves']
    return sum(1 for word in words if word in personal_pronouns)

* Tokenize the Text:

words = word_tokenize(title.lower()): Tokenizes the lowercase version of title into individual words.

* Define Personal Pronouns List:

personal_pronouns: Contains a predefined list of personal pronouns commonly used in English.

* Count Personal Pronouns:

Uses a generator expression within the sum() function to iterate through each word in words.
Checks if each word exists in the personal_pronouns list.
Increments the count for each word found in the list.
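
One side effect of lowercasing before matching is that the abbreviation "US" becomes indistinguishable from the pronoun "us". The sketch below illustrates this with an abbreviated pronoun set and a regex tokenizer; the helper names, the sample sentence, and the case-aware variant are assumptions for illustration only.

```python
import re

PRONOUNS = {"i", "we", "my", "ours", "us"}  # abbreviated set for illustration

def count_pronouns_lowercased(text):
    # Mirrors the notebook's approach: lowercase first, then match
    return sum(1 for w in re.findall(r"[A-Za-z]+", text.lower()) if w in PRONOUNS)

def count_pronouns_case_aware(text):
    # Hypothetical variant: skip the all-caps token "US" before lowercasing
    words = re.findall(r"[A-Za-z]+", text)
    return sum(1 for w in words if w != "US" and w.lower() in PRONOUNS)

text = "We track US sales"  # hypothetical sentence
print(count_pronouns_lowercased(text))  # 2
print(count_pronouns_case_aware(text))  # 1
```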

### Average Words Per Sentence

In [31]:
# Function to calculate Average Words per Sentence
def calculate_avg_words_per_sentence(title):
    words = word_tokenize(title)
    total_words = len(words)
    # sent_tokenize handles abbreviations and '?'/'!' better than splitting on '.'
    total_sentences = len(sent_tokenize(title))
    if total_sentences == 0:
        return 0
    return total_words / total_sentences

* Tokenize Text:

words = word_tokenize(title): Tokenizes the title into words using NLTK's word_tokenize function.

* Count Words and Sentences:

total_words = len(words): Counts the total number of words in the tokenized title.
total_sentences = len(sent_tokenize(title)): Counts the total number of sentences in the title using NLTK's sent_tokenize function.

* Check for Zero Division:

if total_sentences == 0: Checks if total_sentences is zero to avoid division by zero error.

* Calculate Average Words per Sentence:

Returns the average number of words per sentence by dividing total_words by total_sentences.

### Creating Lists to Store Metrics

In [32]:
# Create lists to store the data
titles_list = []
positive_scores = []
negative_scores = []
polarity_scores = []
subjectivity_scores = []
avg_sentence_lengths = []
percentage_complex_words_list = []
fog_index_list = []
complex_word_count_list = []
total_words_list = []
syllable_count_list = []
personal_pronouns_list = []
avg_word_length_list = []

### Iterating Through DataFrame and Processing Each URL

In [33]:
# Iterate through the DataFrame and process each URL
for index, row in df.iterrows():
    title = extract_article_title(row['URL'])
    
    if title:
        positive_score = calculate_positive_score(title, positive_dictionary)
        negative_score = calculate_negative_score(title, negative_dictionary)
        polarity_score = calculate_polarity_score(positive_score, negative_score)
        total_words_count = len(word_tokenize(title))
        subjectivity_score = calculate_subjectivity_score(positive_score, negative_score, total_words_count)
        avg_sentence_length = calculate_avg_sentence_length(title)
        complex_word_count, percentage_complex_words, total_words = calculate_complex_word_count(title)
        fog_index = calculate_fog_index(avg_sentence_length, percentage_complex_words)
        syllable_count = calculate_syllable_count(title)
        personal_pronouns = count_personal_pronouns(title)
        avg_word_length = calculate_avg_word_length(title)
    else:
        positive_score = negative_score = polarity_score = subjectivity_score = None
        avg_sentence_length = percentage_complex_words = fog_index = 0
        complex_word_count = total_words = syllable_count = personal_pronouns = avg_word_length = 0
    
    # Append values to respective lists
    titles_list.append(title)
    positive_scores.append(positive_score)
    negative_scores.append(negative_score)
    polarity_scores.append(polarity_score)
    subjectivity_scores.append(subjectivity_score)
    avg_sentence_lengths.append(avg_sentence_length)
    percentage_complex_words_list.append(percentage_complex_words)
    fog_index_list.append(fog_index)
    complex_word_count_list.append(complex_word_count)
    total_words_list.append(total_words)
    syllable_count_list.append(syllable_count)
    personal_pronouns_list.append(personal_pronouns)
    avg_word_length_list.append(avg_word_length)

Error fetching URL: https://insights.blackcoffer.com/dashboard-to-track-the-analytics-of-the-website-using-google-analytics-and-google-tag-manager/. Exception: 502 Server Error: Bad Gateway for url: https://insights.blackcoffer.com/dashboard-to-track-the-analytics-of-the-website-using-google-analytics-and-google-tag-manager/


* DataFrame Iteration:

for index, row in df.iterrows():: Iterates through each row (row) in the DataFrame (df). index represents the index of the row.

* Extracting Article Title:

title = extract_article_title(row['URL']): Calls the extract_article_title function to fetch and clean the title from the URL provided in the current row (row['URL']).

Calculating Metrics:

* Sentiment Analysis:

positive_score, negative_score, polarity_score: Calculate positive and negative scores using predefined dictionaries (positive_dictionary, negative_dictionary), and then compute the polarity score based on these scores.

* Text Complexity:

total_words_count: Counts the total number of words in the title using len(word_tokenize(title)).
subjectivity_score: Computes the subjectivity score based on positive score, negative score, and total words count.
avg_sentence_length: Calculates the average sentence length in terms of words.
complex_word_count, percentage_complex_words, total_words: Calculates the count of complex words, percentage of complex words, and total words count in the title.
fog_index: Computes the Fog Index based on average sentence length and percentage of complex words.
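syllable_count: Passes the whole title to calculate_syllable_count. A multi-word title is not a key in the CMU dictionary, so the heuristic branch runs over the entire string and the result is an approximate syllable total for the title, not a per-word figure.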
personal_pronouns: Counts personal pronouns (like 'I', 'you', 'he', etc.) in the title.
avg_word_length: Calculates the average word length in the title.

* Handling Errors:

If the title extraction fails (the else branch), the sentiment scores are set to None and the remaining metrics to 0, so every list still receives exactly one entry per row.

* Storing Results:

Appends all calculated metrics (title, positive_score, negative_score, polarity_score, etc.) to their respective lists (titles_list, positive_scores, negative_scores, etc.).

### Adding Results to DataFrame and Saving to CSV

In [34]:
# Add the results to the DataFrame
df['TITLE'] = titles_list
df['POSITIVE SCORE'] = positive_scores
df['NEGATIVE SCORE'] = negative_scores
df['POLARITY SCORE'] = polarity_scores
df['SUBJECTIVITY SCORE'] = subjectivity_scores
df['AVG SENTENCE LENGTH'] = avg_sentence_lengths
df['PERCENTAGE OF COMPLEX WORDS'] = percentage_complex_words_list
df['FOG INDEX'] = fog_index_list
df['COMPLEX WORD COUNT'] = complex_word_count_list
df['WORD COUNT'] = total_words_list
df['SYLLABLE PER WORD'] = syllable_count_list
df['PERSONAL PRONOUNS'] = personal_pronouns_list
df['AVG WORD LENGTH'] = avg_word_length_list

# Save the DataFrame to a new CSV file
output_file_path = 'C:/Users/abine/Desktop/Jupyter Notebook/BlackCoffer Project/Output.csv'
df.to_csv(output_file_path, index=False)

print("Article titles and scores extracted and saved successfully.")

Article titles and scores extracted and saved successfully.


* Adding Columns to DataFrame:

Each list (titles_list, positive_scores, negative_scores, etc.) containing calculated metrics is assigned to a new column in the DataFrame (df). This step aligns each metric with its corresponding article title.

* Output File Path:

output_file_path: Specifies the path where the CSV file ('Output.csv') will be saved. Adjust this path according to your local file system structure.

* Saving to CSV:

df.to_csv(output_file_path, index=False): Saves the updated DataFrame to a CSV file without including the index column.

### Output

In [37]:
df

Unnamed: 0,URL_ID,URL,TITLE,POSITIVE SCORE,NEGATIVE SCORE,POLARITY SCORE,SUBJECTIVITY SCORE,AVG SENTENCE LENGTH,PERCENTAGE OF COMPLEX WORDS,FOG INDEX,COMPLEX WORD COUNT,WORD COUNT,SYLLABLE PER WORD,PERSONAL PRONOUNS,AVG WORD LENGTH
0,bctech2011,https://insights.blackcoffer.com/ml-and-ai-bas...,ML and AI-based insurance premium model to pre...,2.0,0.0,1.000000,0.125000,16.0,60.000000,30.400000,6,10,28,0,5.125000
1,bctech2012,https://insights.blackcoffer.com/streamlined-i...,Streamlined Integration: Interactive Brokers A...,1.0,0.0,0.999999,0.083333,12.0,33.333333,18.133333,3,9,27,0,6.833333
2,bctech2013,https://insights.blackcoffer.com/efficient-dat...,Efficient Data Integration and User-Friendly I...,2.0,0.0,1.000000,0.142857,14.0,81.818182,38.327273,9,11,37,0,7.642857
3,bctech2014,https://insights.blackcoffer.com/effective-man...,Effective Management of Social Media Data Extr...,1.0,0.0,0.999999,0.062500,16.0,90.000000,42.400000,9,10,35,0,6.125000
4,bctech2015,https://insights.blackcoffer.com/streamlined-t...,Streamlined Trading Operations Interface for M...,2.0,0.0,1.000000,0.153846,13.0,60.000000,29.200000,6,10,34,0,7.230769
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
142,bctech2153,https://insights.blackcoffer.com/population-an...,Population and Community Survey of America,0.0,0.0,0.000000,0.000000,6.0,100.000000,42.400000,4,4,15,0,6.166667
143,bctech2154,https://insights.blackcoffer.com/google-lsa-ap...,Google LSA API Data Automation and Dashboarding,0.0,0.0,0.000000,0.000000,7.0,33.333333,16.133333,2,6,15,0,5.857143
144,bctech2155,https://insights.blackcoffer.com/healthcare-da...,Healthcare Data Analysis,0.0,0.0,0.000000,0.000000,3.0,66.666667,27.866667,2,3,8,0,7.333333
145,bctech2156,https://insights.blackcoffer.com/budget-sales-...,"Budget, Sales KPI Dashboard using Power BI",0.0,0.0,0.000000,0.000000,8.0,0.000000,3.200000,0,7,12,0,4.500000
