NewsScrape is a Python script that processes and beautifies news data from a JSON file. It extracts content from URLs, removes special characters, and rephrases the content for a cleaner presentation.
Make sure you have Python installed on your machine. You can download it from python.org.
From the root of the cloned repository, install the required Python packages using:
pip install -r requirements.txt
- Clone the Repository:
git clone https://github.com/Aayush518/NewsScrape.git
cd NewsScrape
- Prepare Your News Data:
Update the updated_news_data.json file with your news data. Each news item should have "headline", "content", and "summary" fields.
- Run the Script:
python generate_single.py
This will process each news item, create individual text files, and store them in the project directory.
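Before running the script, it can help to confirm that every news item carries the three required fields. The check below is a minimal sketch, not part of the repository: it assumes that updated_news_data.json holds a top-level list of objects, and the names REQUIRED_FIELDS, find_incomplete_items, and check_file are hypothetical.

```python
import json

# Field names taken from the data preparation step above.
REQUIRED_FIELDS = ("headline", "content", "summary")


def find_incomplete_items(items):
    """Return (position, missing_fields) pairs for items lacking a required field.

    Positions start at 1 so they line up with the news_1.txt, news_2.txt naming.
    """
    problems = []
    for position, item in enumerate(items, start=1):
        missing = [field for field in REQUIRED_FIELDS if field not in item]
        if missing:
            problems.append((position, missing))
    return problems


def check_file(path="updated_news_data.json"):
    """Load the JSON file and report incomplete items.

    Assumption: the file contains a top-level JSON list of news objects.
    """
    with open(path, encoding="utf-8") as handle:
        items = json.load(handle)
    return find_incomplete_items(items)
```

Calling check_file() from the project directory returns an empty list when every item is complete. Otherwise each entry names the position of an item and the fields it is missing.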
generate_single.py: The main script to process news data.
rephrase.py: Additional script for rephrasing content (optional).
updated_news_data.json: JSON file containing news data.
The script will create individual text files (e.g., news_1.txt, news_2.txt) for each news item with the original title and rephrased content.
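The output naming described above can be sketched as follows. This is an illustration under assumptions, not the code in generate_single.py: the layout of each file (headline, blank line, content) is a guess, the rephrasing step is left out so the "content" field stands in for the rephrased text, and build_news_files and write_news_files are hypothetical names.

```python
from pathlib import Path


def build_news_files(items):
    """Map output file names (news_1.txt, news_2.txt, ...) to their text.

    Assumption: each file holds the headline, a blank line, then the content.
    """
    files = {}
    for number, item in enumerate(items, start=1):
        files[f"news_{number}.txt"] = f"{item['headline']}\n\n{item['content']}\n"
    return files


def write_news_files(items, output_dir="."):
    """Write one text file per news item into output_dir."""
    target = Path(output_dir)
    target.mkdir(parents=True, exist_ok=True)
    for name, text in build_news_files(items).items():
        (target / name).write_text(text, encoding="utf-8")
```

Keeping the file naming in a function that touches no files makes it easy to inspect what would be written before anything reaches the project directory.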