InsightBot is a news/article aggregation, processing, and exploration platform. It allows you to fetch, clean, analyze, and visualize articles from a wide range of sources, and provides a Windows XP–inspired web interface for browsing and searching articles.
Blog: https://insightbot-techwiz.blogspot.com/2025/09/insightbot-making-daily-news-simple.html
insightbot/
│
├── app.py                         # Main Flask web app (browse/search articles)
├── fetch_process_upload.py        # Fetch/process/upload articles from any website (user input)
├── upload_to_mongodb.py           # Upload preprocessed articles to MongoDB
├── preprocess_articles.py         # Clean/process the raw dataset
├── insightbot_dataset_builder.py  # Build raw dataset from 40+ news sites
├── requirements.txt               # Python dependencies
└── data/
    ├── extracted/
    │   └── articles_YYYYMMDD.json         # Raw dataset (output of dataset_builder)
    └── preprocessed/
        └── articles_preprocessed.json     # Cleaned dataset (output of preprocess_articles)
git clone <your-repo-url>
cd insightbot
python -m venv venv
# On Windows:
venv\Scripts\activate
# On Linux/Mac:
source venv/bin/activate
pip install -r requirements.txt
Fetches articles from 40+ news sites and saves them as a raw JSON file.
python insightbot_dataset_builder.py
- Output: data/extracted/articles_YYYYMMDD.json
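The dated output name can be sketched in a few lines. This is a minimal illustration, not the builder's actual code: the helper names `raw_dataset_path` and `latest_raw_dataset` are hypothetical, and it assumes that `YYYYMMDD` is the date of the run.

```python
from datetime import date
from pathlib import Path

# Directory taken from the project tree; everything else here is illustrative.
EXTRACTED_DIR = Path("data") / "extracted"


def raw_dataset_path(day: date) -> Path:
    """Return the raw-dataset path for a given day (hypothetical helper)."""
    return EXTRACTED_DIR / f"articles_{day.strftime('%Y%m%d')}.json"


def latest_raw_dataset(directory: Path = EXTRACTED_DIR):
    """Return the newest articles_*.json file in `directory`, or None if there is none."""
    # Zero-padded YYYYMMDD names sort chronologically as plain strings.
    candidates = sorted(directory.glob("articles_*.json"))
    return candidates[-1] if candidates else None
```

A helper like `latest_raw_dataset` would let a later step pick up the most recent raw file without hard-coding the date.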
Cleans, normalizes, and analyzes the raw dataset.
python preprocess_articles.py
- Output: data/preprocessed/articles_preprocessed.json
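To give a rough idea of what cleaning and normalizing can involve, here is a minimal sketch. The field names (`title`, `text`, `url`, `source`) and the added `word_count` field are assumptions for illustration; the actual script may use a different schema and different steps.

```python
import re


def clean_article(raw: dict) -> dict:
    """Normalize one raw article record (field names are assumed, not confirmed)."""
    # Collapse runs of whitespace (newlines, tabs, double spaces) into single spaces.
    title = re.sub(r"\s+", " ", raw.get("title") or "").strip()
    text = re.sub(r"\s+", " ", raw.get("text") or "").strip()
    return {
        "title": title,
        "text": text,
        "url": (raw.get("url") or "").strip(),
        "source": (raw.get("source") or "").strip().lower(),
        "word_count": len(text.split()),  # a simple example of an "analysis" field
    }


def preprocess(raw_articles: list) -> list:
    """Clean every record, dropping empty bodies and duplicate URLs."""
    seen_urls = set()
    cleaned = []
    for raw in raw_articles:
        article = clean_article(raw)
        if not article["text"] or article["url"] in seen_urls:
            continue
        seen_urls.add(article["url"])
        cleaned.append(article)
    return cleaned
```

Deduplicating by URL keeps the first copy of an article when the same link appears more than once in the raw dataset.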
Uploads the preprocessed articles to your local MongoDB database.
python upload_to_mongodb.py
You can fetch, process, and upload articles from any website (not just the built-in sources):
python fetch_process_upload.py
- Enter the website URL when prompted.
- The script will process and upload new articles to MongoDB.
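Because the URL comes from user input, it helps to validate it before fetching anything. The sketch below shows one way to do that with the standard library; `normalize_site_url` is a hypothetical helper, and defaulting to `https://` when no scheme is given is an assumption, not documented behavior of the script.

```python
from urllib.parse import urlparse


def normalize_site_url(user_input: str) -> str:
    """Return a cleaned http(s) URL, or raise ValueError if the input is unusable."""
    candidate = user_input.strip()
    # Assumption: a bare host such as "example.com" means "https://example.com".
    if "://" not in candidate:
        candidate = "https://" + candidate
    parsed = urlparse(candidate)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Not a valid website URL: {user_input!r}")
    return candidate
```

Rejecting anything that is not `http` or `https` keeps the fetch step from being pointed at local files or other schemes.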
The main interface is a Flask app that lets you browse, search, and fetch new articles.
python app.py
or (if you prefer Flask CLI):
# On Windows:
set FLASK_APP=app.py
# On Linux/Mac:
export FLASK_APP=app.py
flask run
- Browse all articles in the database (with "Show More" pagination).
- Filter by website/source using the dropdown.
- Search/fetch new articles from any website (just enter the URL).
- Responsive, Windows XP–inspired UI.
- Modal window for reading articles in-app, with a link to the original source.
- Live updates: When fetching new articles, the UI polls for new content and displays it as soon as it's ready.
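The "Show More" pagination above can be modeled as offset-based paging. The following is a minimal sketch over an in-memory list; the function name `show_more` and the page size of 20 are assumptions, and the app itself may page differently (for example with `skip` and `limit` on the MongoDB query).

```python
from typing import List, Sequence, Tuple


def show_more(articles: Sequence[dict], offset: int = 0,
              page_size: int = 20) -> Tuple[List[dict], int, bool]:
    """Return the next batch, the offset to request next, and whether more remain."""
    batch = list(articles[offset:offset + page_size])
    next_offset = offset + len(batch)
    has_more = next_offset < len(articles)
    return batch, next_offset, has_more
```

The client would keep `next_offset` and send it with the next "Show More" click, hiding the button once `has_more` is false.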
- Make sure MongoDB is running locally on mongodb://localhost:27017.
- The database used is insightbot, and the collection is articles.
- You can change these in the config section of each script if needed.
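As an illustration of what such a config section might look like, here is a small sketch. The defaults are the values stated above; the `load_config` function and the `INSIGHTBOT_*` environment variable names are hypothetical, and the scripts are not known to read them.

```python
import os
from typing import Mapping, Optional

# Defaults as documented: local MongoDB, database "insightbot", collection "articles".
DEFAULTS = {
    "INSIGHTBOT_MONGO_URI": "mongodb://localhost:27017",
    "INSIGHTBOT_DB": "insightbot",
    "INSIGHTBOT_COLLECTION": "articles",
}


def load_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return connection settings, letting (hypothetical) env vars override the defaults."""
    env = os.environ if env is None else env
    return {
        "mongo_uri": env.get("INSIGHTBOT_MONGO_URI", DEFAULTS["INSIGHTBOT_MONGO_URI"]),
        "db_name": env.get("INSIGHTBOT_DB", DEFAULTS["INSIGHTBOT_DB"]),
        "collection_name": env.get("INSIGHTBOT_COLLECTION", DEFAULTS["INSIGHTBOT_COLLECTION"]),
    }
```

Keeping the three values in one place would make it easier to point every script at a remote MongoDB instance without editing each file.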
- insightbot_dataset_builder.py: Fetches and creates the raw dataset (articles_YYYYMMDD.json) from 40+ news sites.
- preprocess_articles.py: Cleans and processes the raw dataset, producing articles_preprocessed.json.
- upload_to_mongodb.py: Uploads the preprocessed articles to the MongoDB database.
- fetch_process_upload.py: Fetches, processes, and uploads articles from any user-supplied website.
- app.py: Main Flask web app. Shows all articles from the database, allows filtering, and lets users search for new articles from any website.
- Python 3.8+
- MongoDB (local or remote)
- See requirements.txt for all Python dependencies.
- Build the dataset:
python insightbot_dataset_builder.py
- Preprocess the dataset:
python preprocess_articles.py
- Upload to MongoDB:
python upload_to_mongodb.py
- Run the web app:
python app.py
- (Optional) Fetch articles from a new website:
python fetch_process_upload.py