Web Mining and Applied NLP (44-620)

Final Project: Article Summarizer

Student Name: Naiema Elsaadi

The project aims to assist users in quickly understanding the sentiment of articles and identifying key information without having to read the entire text. It can be useful for researchers, journalists, and anyone interested in understanding the emotional content of written text.

Introduction

The project is a Python-based text analysis tool that performs sentiment analysis on articles retrieved from online sources. It utilizes natural language processing (NLP) techniques to analyze the sentiment of articles and provides insights into the emotional tone of the text.

Key features of the project include:

Retrieving articles from specified URLs.
Parsing the HTML content of articles and extracting the main text.
Analyzing the sentiment of the text using both token-based and lemma-based approaches.
Generating histograms to visualize the distribution of sentiment scores.
Summarizing articles based on their sentiment scores, providing concise summaries of the main points.
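
The scoring and summarizing steps above can be sketched in a few lines. This is a minimal illustration, not the project's actual implementation: the README does not say how a sentence's score is computed, so the sketch assumes a frequency-based score (the share of a sentence's words that are among the article's most common words). The stop-word set, the regex tokenizer, and the function names are all hypothetical stand-ins; in the project, spaCy tokens or lemmas would take their place.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only;
# spaCy's token.is_stop would normally replace it.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "it", "on"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def frequent_words(text, n=5):
    """Return the n most common words that are not stop words."""
    counts = Counter(w for w in tokenize(text) if w not in STOP_WORDS)
    return {word for word, _ in counts.most_common(n)}


def score_sentence(sentence, keywords):
    """Fraction of the sentence's words that are keywords (0.0 if empty)."""
    words = tokenize(sentence)
    if not words:
        return 0.0
    return sum(w in keywords for w in words) / len(words)


def summarize(text, n_keywords=5, cutoff=0.2):
    """Keep, in original order, the sentences that score above the cutoff."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    keywords = frequent_words(text, n_keywords)
    return [s for s in sentences if score_sentence(s, keywords) > cutoff]
```

The cutoff is a tuning knob: plotting the scores as a histogram first makes it easier to pick a value that keeps only the highest-scoring sentences.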

Objectives

The primary objective of this project is to apply web mining and natural language processing (NLP) skills to develop a text analysis tool. The project aims to achieve the following goals:

API Integration: Explore options for integrating with web APIs to fetch articles from online sources dynamically. This allows users to analyze content from a variety of sources, enhancing the versatility and usability of the tool.
Sentiment Analysis: Utilize NLP techniques to analyze the sentiment of articles and quantify the emotional tone of the text. This involves determining polarity scores that indicate whether the sentiment of the text is positive, negative, or neutral.
Text Corpus Analysis: Gather text corpus data from online sources and analyze the frequency of words to identify common themes or topics. This involves processing large volumes of text data to extract meaningful insights and trends.
Visualization: Implement visualization techniques to present the analysis results in a clear and intuitive manner. This includes generating histograms, word clouds, and other visualizations to help users understand the data more effectively.
Article Summarization: Implement algorithms to automatically generate concise summaries of articles retrieved from online sources. This involves extracting key information and significant insights from the text to provide users with a condensed version of the content.
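
As a small illustration of the sentiment objective, the sketch below turns a polarity score into a positive, negative, or neutral label. TextBlob reports polarity in the range -1.0 to 1.0; the tolerance band around zero and the function names are assumptions made for this example and do not come from the project.

```python
from collections import Counter


def label_polarity(score, tolerance=0.05):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label.

    Scores within +/- tolerance of zero count as neutral. The default
    tolerance is an arbitrary choice for illustration.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"polarity out of range: {score}")
    if score > tolerance:
        return "positive"
    if score < -tolerance:
        return "negative"
    return "neutral"


def count_labels(scores):
    """Count how many scores fall under each sentiment label."""
    return Counter(label_polarity(s) for s in scores)
```

For example, count_labels applied to a list of per-sentence polarity scores gives a quick tally of how the article's sentences split across the three labels.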

By achieving these objectives, the project aims to provide a valuable tool for data analysts, researchers, journalists, and content creators to analyze and extract insights from textual data efficiently and effectively. Additionally, the project serves as a unique addition to the developer's online portfolio, showcasing their skills and capabilities in web mining and NLP.

Prerequisites

Before running the project, ensure you have the following prerequisites:

Git
GitHub
Python 3.10+ installed
Visual Studio Code
JupyterLab
Anaconda Prompt (Miniconda3) or Windows PowerShell
Required Python libraries (e.g., numpy) installed in your active environment

Getting Started

Before we begin, make sure you have the necessary dependencies installed in your Python environment.
If you haven't already, install Beautiful Soup, html5lib, Requests, spaCy, and spacytextblob using the following commands:
python -m pip install beautifulsoup4
python -m pip install html5lib
python -m pip install requests
python -m pip install spacy
python -m pip install spacytextblob

Before we start

Create and activate a Python virtual environment.
Before starting the project, run the following imports in a notebook cell. If any import fails, install the missing package into your active Python environment and run the cell again.
from collections import Counter
import pickle
import requests
import spacy
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt
!pip list
print('All prereqs installed.')
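
If you would rather see every missing package at once instead of stopping at the first failed import, a check along these lines may help. It is only a sketch: the REQUIRED list and the function name are assumptions for this example, and bs4 is the import name of the beautifulsoup4 package.

```python
from importlib.util import find_spec

# Top-level import names to check (an assumed list; adjust to your setup).
REQUIRED = ["requests", "spacy", "bs4", "matplotlib"]


def missing_packages(names):
    """Return the names that cannot be found in the active environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All prereqs installed.")
```

find_spec only locates a package without importing it, so the check stays fast even for large libraries such as spaCy.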

Modules

The project imports the following modules:

import requests
from bs4 import BeautifulSoup
import pickle
import spacy
from spacytextblob.spacytextblob import SpacyTextBlob
from textblob import TextBlob
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
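
Of these modules, pickle is the one whose role may be least obvious. A common use in a project like this is to save the downloaded article HTML to disk so the page is fetched only once; that purpose is an assumption here, and the function names and file path below are hypothetical.

```python
import pickle


def save_article(html, path):
    """Write the article HTML string to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(html, f)


def load_article(path):
    """Read the article HTML string back from a pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Only load pickle files that you created yourself, since unpickling data from an untrusted source can execute arbitrary code.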

Installation

To use the notebooks in this repository, follow these steps:

Clone the repository to your local machine: git clone Repo (where Repo is the repository URL)
Navigate to the project directory: cd file (where file is the cloned folder)
Create a virtual environment (optional but recommended): python -m venv venv
Activate the virtual environment (Windows): venv\Scripts\activate
Install the required packages: pip install -r requirements.txt
Launch JupyterLab: jupyter lab
Open the desired notebook and execute the code cells.

Data Source

The source repository for this project is available at: GitHub Repository

Contact

For questions, feedback, or inquiries, you can reach me through my GitHub repository.

Acknowledgments

Special thanks to the Web Mining and Applied NLP (44-620) course instructor Dr. Case for her guidance and support throughout the project.

Reference

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#quick-start
https://github.com/denisecase/620-mod6-web-scraping
https://github.com/denisecase/620-mod6-web-scraping/blob/main/web_scraping.ipynb
