
News Aggregator API

A Django-based REST API that aggregates news articles from RSS feeds, scrapes full article content, and provides music search via the iTunes API.

Features

  • News Aggregation: Fetches articles from multiple RSS feeds across categories (World, Technology, Sports, Entertainment, Nigeria)
  • Article Scraping: Automatically scrapes full article content from whitelisted domains (ESPN, TechCrunch, Al Jazeera)
  • Periodic Updates: Background scraper that runs every 5 minutes to update articles and fetch new content
  • Music Search: Integration with iTunes API for music search and details
  • Caching: Built-in caching for improved performance
  • CORS Support: Configured for frontend integration
  • RESTful API: JSON-based API with pagination support

Installation

Prerequisites

  • Python 3.8+
  • pip
  • Virtual environment (recommended)

Setup

  1. Clone the repository (if applicable) or navigate to the project directory

  2. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate

  3. Install dependencies:

    pip install -r requirements.txt

  4. Set up environment variables: create a .env file in the project root with the following variables:

    DJANGO_SECRET_KEY=your-secret-key-here
    DJANGO_DEBUG=True
    DJANGO_ALLOWED_HOSTS=127.0.0.1,localhost,your-domain.com
    CORS_ALLOW_ALL=True
    CORS_ALLOWED_ORIGINS=http://127.0.0.1:3000,https://your-frontend-domain.com
    CSRF_TRUSTED_ORIGINS=https://your-frontend-domain.com

  5. Run database migrations:

    python manage.py migrate

  6. Create a superuser (optional, for the Django admin):

    python manage.py createsuperuser

Usage

Running the Development Server

python manage.py runserver

The API will be available at http://127.0.0.1:8000/api/

Running the Periodic Scraper

Once started, the periodic scraper runs every 5 minutes to:

  • Scrape unscraped articles
  • Update existing articles
  • Fetch new articles from RSS feeds

Automatic Startup

To start the scraper automatically when the Django application starts, set the environment variable:

START_PERIODIC_SCRAPER=True

Note: This is recommended for production deployments where you want continuous background scraping. In development, you may prefer to run it manually.
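A minimal sketch of how this startup hook might look, assuming the scraper lives in a news app and is launched from its AppConfig.ready(); the app name and helper function are illustrative, not necessarily what this repo uses:

# news/apps.py -- illustrative sketch; app and helper names are assumptions
import os
import threading

from django.apps import AppConfig


class NewsConfig(AppConfig):
    name = "news"

    def ready(self):
        # Only start the background loop when explicitly enabled via env var.
        if os.environ.get("START_PERIODIC_SCRAPER", "False") == "True":
            from news.scraper import run_periodic_scraper  # hypothetical helper

            # Daemon thread so the scraper never blocks process shutdown.
            thread = threading.Thread(target=run_periodic_scraper, daemon=True)
            thread.start()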

Manual Execution

To run it manually:

python manage.py scrape_periodically

To customize the interval:

python manage.py scrape_periodically --interval=600  # 10 minutes
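For reference, a management command with this interface could be structured roughly as follows. This is a sketch only; the actual implementation in the repo may differ, and the scrape helper is hypothetical:

# news/management/commands/scrape_periodically.py -- illustrative sketch
import time

from django.core.management.base import BaseCommand

from news.scraper import run_scrape_cycle  # hypothetical helper


class Command(BaseCommand):
    help = "Continuously scrape and refresh articles at a fixed interval."

    def add_arguments(self, parser):
        parser.add_argument(
            "--interval",
            type=int,
            default=300,  # seconds; 5 minutes by default
            help="Seconds to sleep between scrape cycles.",
        )

    def handle(self, *args, **options):
        interval = options["interval"]
        while True:
            # One cycle: scrape pending articles, refresh stale ones,
            # then pull new items from the configured RSS feeds.
            self.stdout.write("Starting scrape cycle")
            run_scrape_cycle()
            time.sleep(interval)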

Running with Gunicorn (Production)

gunicorn core.wsgi:application --bind 0.0.0.0:8000

API Endpoints

News Endpoints

  • GET /api/news - List news articles

    • Query parameters:
      • category: Filter by category (world, technology, sports, entertainment, nigeria, all)
      • limit: Number of articles to return (default: 12)
      • offset: Pagination offset (default: 0)
  • GET /api/news/{item_id} - Get detailed article information

    • Includes full scraped content if available
  • GET /api/trending - Get trending articles

Music Endpoints

  • GET /api/music - Search music tracks

    • Query parameters:
      • term: Search term
      • limit: Number of results (default: 24)
      • offset: Pagination offset (default: 0)
      • country: Country code (default: US)
  • GET /api/music/{track_id} - Get detailed track information
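As a quick way to exercise these endpoints, here is a short Python script using the third-party requests library; the base URL assumes the local development server, and the music response is printed raw since its shape is not documented above:

import requests  # third-party: pip install requests

BASE = "http://127.0.0.1:8000/api"

# Fetch the first page of technology articles.
news = requests.get(f"{BASE}/news", params={"category": "technology", "limit": 12}).json()
for article in news["results"]:
    print(article["title"], "-", article["source"])

# Search the iTunes catalog for tracks.
music = requests.get(f"{BASE}/music", params={"term": "jazz", "limit": 5, "country": "US"}).json()
print(music)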

Example API Responses

News List

{
  "results": [
    {
      "id": "encoded-url-id",
      "title": "Article Title",
      "excerpt": "Article excerpt...",
      "image": "https://example.com/image.jpg",
      "source": "BBC News",
      "publishedAt": "2024-01-15T10:30:00Z",
      "trending": true,
      "author": "John Doe",
      "category": "World",
      "readTime": "5 min read",
      "link": "https://example.com/article"
    }
  ]
}

News Detail

{
  "id": "encoded-url-id",
  "link": "https://example.com/article",
  "title": "Article Title",
  "content": "Full article content...",
  "scraped": true,
  "author": "John Doe",
  "publishedAt": "2024-01-15T10:30:00Z",
  "source": "BBC News",
  "category": "World",
  "readTime": "5 min read",
  "image": "https://example.com/image.jpg"
}

Environment Variables

Variable                 Description                                                  Default
-----------------------  -----------------------------------------------------------  ------------------------
DJANGO_SECRET_KEY        Django secret key                                            dev-secret-key-change-me
DJANGO_DEBUG             Enable/disable debug mode                                    True
DJANGO_ALLOWED_HOSTS     Comma-separated list of allowed hosts                        *
CORS_ALLOW_ALL           Allow all CORS origins                                       True
CORS_ALLOWED_ORIGINS     Comma-separated list of allowed CORS origins                 (empty)
CSRF_TRUSTED_ORIGINS     Comma-separated list of trusted CSRF origins                 (empty)
START_PERIODIC_SCRAPER   Automatically start the periodic scraper on app startup      False
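These variables are typically read in core/settings.py along these lines; this is a sketch, and the exact parsing in this project may differ:

# core/settings.py (excerpt) -- illustrative sketch
import os

SECRET_KEY = os.environ.get("DJANGO_SECRET_KEY", "dev-secret-key-change-me")
DEBUG = os.environ.get("DJANGO_DEBUG", "True") == "True"

# Comma-separated values are split into Python lists; empty strings are dropped.
ALLOWED_HOSTS = os.environ.get("DJANGO_ALLOWED_HOSTS", "*").split(",")
CORS_ALLOW_ALL_ORIGINS = os.environ.get("CORS_ALLOW_ALL", "True") == "True"
CORS_ALLOWED_ORIGINS = [
    o for o in os.environ.get("CORS_ALLOWED_ORIGINS", "").split(",") if o
]
CSRF_TRUSTED_ORIGINS = [
    o for o in os.environ.get("CSRF_TRUSTED_ORIGINS", "").split(",") if o
]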

Database

The project uses SQLite by default. To use a different database, modify the DATABASES setting in core/settings.py.
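For example, switching to PostgreSQL would mean replacing the default setting with something like the following (requires the psycopg2 driver; database name and credentials here are placeholders):

# core/settings.py -- example PostgreSQL configuration (placeholder values)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "news_aggregator",
        "USER": "dbuser",
        "PASSWORD": "dbpassword",
        "HOST": "localhost",
        "PORT": "5432",
    }
}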

Deployment

Local Development

Follow the installation steps above.

Production Deployment

  1. Set DJANGO_DEBUG=False
  2. Use a production WSGI server like Gunicorn
  3. Configure a reverse proxy (nginx/Apache)
  4. Set up proper environment variables
  5. Run database migrations
  6. Collect static files: python manage.py collectstatic

Running the Periodic Scraper on Shared Hosting (cPanel)

Shared hosting environments like cPanel have limitations that make running long-running processes like the periodic scraper challenging:

Limitations:

  • No persistent processes: cPanel typically terminates long-running scripts after a few minutes
  • No cron job support: Some shared hosts don't provide cron job access
  • Resource restrictions: Limited CPU/memory for background tasks
  • Timeout issues: Scripts may timeout before completing

Recommended Solutions:

  1. Use a VPS or Cloud Platform:

    • Deploy to Heroku, Railway, Render, or DigitalOcean App Platform
    • These platforms support persistent background workers
  2. External Cron Service:

    • Use services like Cron-Job.org, EasyCron, or GitHub Actions
    • Set up webhooks to trigger the scraper via HTTP requests
  3. If Cron is Available in cPanel:

    • Some cPanel hosts provide cron job access
    • Set up a cron job to run the scraper periodically:
      */5 * * * * /usr/bin/python3 /path/to/your/project/manage.py scrape_periodically --interval=300

    • Note: Adjust the Python and project paths to match your hosting setup. Because scrape_periodically loops indefinitely, a cron entry this frequent can stack overlapping processes; guard it with a lock file or run a single-pass variant if the command supports one
  4. Webhook-Based Approach:

    • Create a simple endpoint that triggers the scraper (see the sketch after this list)
    • Use an external cron service to call this endpoint
    • Example: https://yourdomain.com/api/trigger-scrape
  5. Manual Execution:

    • Run the scraper manually when needed via SSH or hosting control panel
    • Not ideal for automated updates but works for occasional use
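A trigger endpoint like the one described in option 4 could be sketched as follows. This view does not exist in the project; the token check, env variable, and scrape helper are all assumptions for illustration:

# news/views.py -- hypothetical webhook trigger, not part of this repo
import os
import threading

from django.http import JsonResponse


def trigger_scrape(request):
    # Guard the endpoint with a shared secret so only the cron service can call it.
    token = request.GET.get("token", "")
    if token != os.environ.get("SCRAPE_TRIGGER_TOKEN", ""):
        return JsonResponse({"error": "forbidden"}, status=403)

    # Run one scrape cycle in the background so the HTTP request returns quickly.
    from news.scraper import run_scrape_cycle  # hypothetical helper
    threading.Thread(target=run_scrape_cycle, daemon=True).start()
    return JsonResponse({"status": "scrape started"})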

Best Practice for cPanel:

Given the constraints, consider deploying the main API to cPanel and running the scraper on a separate platform that supports background processes.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests (if any)
  5. Submit a pull request

License

This project is licensed under the MIT License.
