A powerful Python-based tool for comprehensive text analysis, sentiment analysis, and readability assessment of web articles. Extract, process, and analyze content with ease.
🌐 Web Scraping
- Extracts article content from any URL
- Handles both static and JavaScript-heavy websites
- Built-in fallback mechanisms for robust content extraction
📊 Sentiment Analysis
- Positive/Negative sentiment scoring
- Polarity and subjectivity metrics
- Emotion detection
📝 Text Complexity Analysis
- Flesch Reading Ease score
- Average sentence and word length
- Percentage of complex words
- Fog Index calculation
- Syllable analysis
- Personal pronoun counting
💾 Output
- Clean CSV export
- Structured data format
- Easy integration with data analysis tools
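Because the export is a flat CSV, results drop straight into standard tooling. A minimal sketch with the standard library (column names follow the output metrics listed later in this README; the values below are illustrative only):

```python
import csv
import io

# Stand-in for the exported CSV (real runs read the generated file instead).
exported = io.StringIO(
    "URL_ID,POSITIVE SCORE,NEGATIVE SCORE\n"
    "id1,12,4\n"
    "id2,7,9\n"
)
rows = list(csv.DictReader(exported))

# Example downstream computation: net sentiment per URL.
net = [int(r["POSITIVE SCORE"]) - int(r["NEGATIVE SCORE"]) for r in rows]
print(net)  # [8, -2]
```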
- Python 3.8 or higher
- Chrome WebDriver (for Selenium)
- UV (Ultra-fast Python package installer and resolver)
Install UV if you haven't already:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Clone and set up the project:

```bash
git clone https://github.com/codeMaestro78/Text-Analysis-and-Sentiment-Analysis-Tool.git
cd blackcoffer_assignment/assignment
```

Install dependencies with UV:

```bash
uv pip install -r requirements.txt
```
Download required NLTK data:

```bash
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords')"
```

To run the analysis using UV:

```bash
uv run main.py
```

Or for verbose output:

```bash
uv run main.py --verbose
```

```
assignment/
├── Input.csv                   # Input file with URLs to analyze
├── Output_Data_Structure.csv   # Template for output data
├── main.py                     # Main application script
├── requirements.txt            # Python dependencies
├── README.md                   # This documentation
├── MasterDictionary/           # Sentiment analysis word lists
│   ├── positive-words.txt      # Positive sentiment words
│   └── negative-words.txt      # Negative sentiment words
└── StopWords/                  # Text processing stop words
    └── *.txt                   # Various stop word lists
```
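The word lists under MasterDictionary/ are plain newline-delimited files. A minimal parser might look like the sketch below; treating `;`-prefixed lines as comments is an assumption (it matches the widely used Hu & Liu sentiment lists, but check the bundled files):

```python
def parse_word_list(text: str) -> set[str]:
    """Parse a newline-delimited word list into a lowercase set.
    Blank lines and ';'-prefixed comment lines (an assumption about
    these files) are skipped."""
    return {
        line.strip().lower()
        for line in text.splitlines()
        if line.strip() and not line.startswith(";")
    }

# Usage with the files from the tree above, e.g.:
#   positive = parse_word_list(
#       Path("MasterDictionary/positive-words.txt").read_text(encoding="latin-1"))
sample = "; header comment\ngood\nGREAT\n\n"
print(sorted(parse_word_list(sample)))  # ['good', 'great']
```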
The analysis generates comprehensive metrics for each URL, including:
| Metric | Description |
|---|---|
| POSITIVE SCORE | Score indicating positive sentiment |
| NEGATIVE SCORE | Score indicating negative sentiment |
| POLARITY SCORE | Overall sentiment polarity (-1 to 1) |
| SUBJECTIVITY SCORE | How subjective the text is (0 to 1) |
| AVG SENTENCE LENGTH | Average number of words per sentence |
| PERCENTAGE OF COMPLEX WORDS | Percentage of complex words in the text |
| FOG INDEX | Readability metric |
| AVG NUMBER OF WORDS PER SENTENCE | Average word count per sentence |
| COMPLEX WORD COUNT | Number of complex words |
| WORD COUNT | Total number of words |
| SYLLABLE PER WORD | Average syllables per word |
| PERSONAL PRONOUNS | Count of personal pronouns |
| AVG WORD LENGTH | Average length of words in characters |
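The exact formulas live in main.py; as a rough sketch, the headline scores are commonly computed as below. The small epsilon guard and the Gunning Fog constant 0.4 are the conventional choices and are assumed here, not confirmed from the code:

```python
def polarity(pos: float, neg: float) -> float:
    # Ranges over -1..1; epsilon guards against division by zero.
    return (pos - neg) / ((pos + neg) + 1e-6)

def subjectivity(pos: float, neg: float, total_words: int) -> float:
    # Ranges over 0..1: fraction of words carrying sentiment.
    return (pos + neg) / (total_words + 1e-6)

def fog_index(avg_sentence_len: float, pct_complex: float) -> float:
    # Gunning Fog: 0.4 * (avg sentence length + % complex words).
    return 0.4 * (avg_sentence_len + pct_complex)

print(round(polarity(12, 4), 3))        # 0.5
print(round(fog_index(18.0, 22.0), 1))  # 16.0
```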
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
Create an Input.csv file in the project root with the following format:

```csv
URL_ID,URL
id1,https://example.com/article1
id2,https://example.com/article2
```
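It can be worth sanity-checking input rows before a long run. A small validator for the format above (`valid_row` is a hypothetical helper, not part of main.py):

```python
from urllib.parse import urlparse

def valid_row(row: dict) -> bool:
    """Check one Input.csv row: non-empty URL_ID and an http(s) URL."""
    u = urlparse(row.get("URL", ""))
    return bool(row.get("URL_ID")) and u.scheme in ("http", "https") and bool(u.netloc)

print(valid_row({"URL_ID": "id1", "URL": "https://example.com/article1"}))  # True
print(valid_row({"URL_ID": "id2", "URL": "not-a-url"}))                     # False
```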
Basic Usage:

```bash
uv run main.py
```

With Custom Input File:

```bash
uv run main.py --input custom_input.csv
```

Verbose Mode (for debugging):

```bash
uv run main.py --verbose
```
The script will:
- Read URLs from the input file
- Extract and clean article content
- Perform comprehensive text and sentiment analysis
- Generate output files in the project directory
- Display progress and summary statistics
The analysis generates the following metrics for each URL:
- POSITIVE SCORE: Score indicating positive sentiment
- NEGATIVE SCORE: Score indicating negative sentiment
- POLARITY SCORE: Overall sentiment polarity (-1 to 1)
- SUBJECTIVITY SCORE: How subjective the text is (0 to 1)
- AVG SENTENCE LENGTH: Average number of words per sentence
- PERCENTAGE OF COMPLEX WORDS: Percentage of complex words in the text
- FOG INDEX: Readability metric
- AVG NUMBER OF WORDS PER SENTENCE: Average word count per sentence
- COMPLEX WORD COUNT: Number of complex words
- WORD COUNT: Total number of words
- SYLLABLE PER WORD: Average syllables per word
- PERSONAL PRONOUNS: Count of personal pronouns
- AVG WORD LENGTH: Average length of words in characters
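Two of the less obvious metrics above can be sketched in a few lines. The exact pronoun list, the case-sensitive exclusion of "US" (the country), and the vowel-group syllable rule are assumptions about this implementation, not confirmed from main.py:

```python
import re

def personal_pronouns(text: str) -> int:
    # Count I, we, my, ours, us as whole words; drop uppercase "US",
    # which is almost always the country rather than a pronoun.
    matches = re.findall(r"\b(I|we|my|ours|us)\b", text, flags=re.IGNORECASE)
    return sum(1 for m in matches if m != "US")

def syllables(word: str) -> int:
    # Heuristic: count vowel groups, with a silent-e adjustment.
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)

print(personal_pronouns("We took my notes to the US office."))  # 2
print(syllables("readable"))                                    # 3
```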
You can modify the following in the code if needed:
use_seleniuminWebScraperclass to toggle between Selenium and requests- Timeout values for web requests
- Output file names and locations
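As a hypothetical sketch of how such a toggle is typically wired (the class and attribute names follow the list above, but the body and helper names are illustrative, not the actual main.py code):

```python
class WebScraper:
    """Illustrative shape of the scraper configuration described above."""

    def __init__(self, use_selenium: bool = False, timeout: int = 10):
        # use_selenium=True routes fetches through a browser driver for
        # JavaScript-heavy pages; timeout applies to plain HTTP requests.
        self.use_selenium = use_selenium
        self.timeout = timeout

    def fetch(self, url: str) -> str:
        if self.use_selenium:
            return self._fetch_with_selenium(url)  # hypothetical helper
        return self._fetch_with_requests(url)      # hypothetical helper
```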
- SSL Certificate Errors: If you encounter SSL errors, try running with `verify=False` in the requests calls (not recommended for production)
- WebDriver Issues: Ensure Chrome and ChromeDriver versions are compatible
- Missing Dependencies: Make sure all required packages are installed
- NLTK for natural language processing
- TextStat for readability calculations
- BeautifulSoup and Selenium for web scraping