Article Scraper

Description

An article scraper extracts the essential information (title, body text, author, publication date, and other metadata) from online news and articles. The motivation for this project came from repeatedly being blocked by unnecessary paywalls and sign-up/sign-in banners while trying to read content online. With this scraper, one can extract the content of digital news and articles directly, bypassing these pop-ups and getting straight to the content we want to read.

Input & Output:

Input: { 'url': string }

Output: {
    'article_title': string | None,
    'description': string | None,
    'article_content': string | None,
    'author': string | None,
    'publish_date': string | None,
    'article_url': string | None,
    'canonical_url': string | None,
    'publisher_name': string | None,
    'image': string | None,
    'keywords': string | None,
    'video_url': string | None,
    'audio_url': string | None
}
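The README lists the output fields but not how they are extracted. Below is a minimal sketch of one common approach: read Open Graph and standard `<meta>` tags, the `<link rel="canonical">` element, and `<p>` text. To stay dependency-free it uses the standard library's `html.parser` instead of Beautiful Soup. The tag-to-field mapping in `META_FIELDS`, the helper names, and the fallbacks are assumptions for illustration, not the project's actual logic.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

# Assumed mapping from output fields to <meta> keys (Open Graph plus common
# article tags); the first key present on the page wins.
META_FIELDS = {
    "article_title": ["og:title"],
    "description": ["og:description", "description"],
    "author": ["author", "article:author"],
    "publish_date": ["article:published_time"],
    "article_url": ["og:url"],
    "publisher_name": ["og:site_name"],
    "image": ["og:image"],
    "keywords": ["keywords", "news_keywords"],
    "video_url": ["og:video:url", "og:video"],
    "audio_url": ["og:audio:url", "og:audio"],
}


class _ArticleParser(HTMLParser):
    """Collects <meta> content, the canonical link, <title> text and <p> text."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}
        self.canonical: Optional[str] = None
        self.title: Optional[str] = None
        self.paragraphs: List[str] = []
        self._capturing: Optional[str] = None  # "title" or "p" while inside one
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta":
            key = (attr.get("property") or attr.get("name") or "").lower()
            content = attr.get("content")
            if key and content and key not in self.meta:
                self.meta[key] = content.strip()
        elif tag == "link" and "canonical" in (attr.get("rel") or "").lower().split():
            self.canonical = attr.get("href")
        elif tag in ("title", "p") and self._capturing is None:
            self._capturing, self._buffer = tag, []

    def handle_data(self, data):
        if self._capturing:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._capturing:
            return
        text = " ".join("".join(self._buffer).split())  # collapse whitespace
        if tag == "title":
            self.title = text or None
        elif text:
            self.paragraphs.append(text)
        self._capturing = None


def extract_article(html: str, url: str) -> Dict[str, Optional[str]]:
    """Return a dict with the twelve documented output keys (None when not found)."""
    parser = _ArticleParser()
    parser.feed(html)
    parser.close()
    result: Dict[str, Optional[str]] = {
        field: next((parser.meta[k] for k in keys if k in parser.meta), None)
        for field, keys in META_FIELDS.items()
    }
    # Fallbacks (assumptions): <title> for the headline, requested URL for article_url.
    result["article_title"] = result["article_title"] or parser.title
    result["article_url"] = result["article_url"] or url
    result["canonical_url"] = parser.canonical
    result["article_content"] = "\n\n".join(parser.paragraphs) or None
    return result
```

Content that a site injects with JavaScript after page load is not visible to a parser that only reads the downloaded HTML.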

Tech Stack:

Python 3.7, Beautiful Soup, Docker, AWS Lambda, AWS ECR, and GitHub with GitHub Actions for CI/CD

Deployment:

This project is deployed as an AWS Lambda function that runs from a container image stored in AWS ECR. AWS Lambda is a cost-effective choice for personal projects with little traffic, since it is billed per request and per unit of compute time rather than for an always-on server. GitHub Actions handles CI/CD: every git push triggers a pipeline that builds the updated Docker image and pushes it to AWS ECR.

Try It Out:

The front-end interface is built with Streamlit and deployed on Hugging Face Spaces. Link to the Streamlit app: https://huggingface.co/spaces/rahulNenavath305/article-scraper
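The README does not describe how the Streamlit app calls the scraper. Assuming the Lambda function is exposed over HTTPS (for example through a Lambda function URL) and accepts the `{ 'url': string }` input, a client could look like the sketch below. `SCRAPER_ENDPOINT` and both function names are hypothetical; the project's real endpoint is not published here.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the deployed function's real URL.
SCRAPER_ENDPOINT = "https://scraper.example.com/"


def build_scrape_request(article_url: str, endpoint: str = SCRAPER_ENDPOINT) -> urllib.request.Request:
    """Build a POST request whose JSON body matches the documented input."""
    payload = json.dumps({"url": article_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(article_url: str, endpoint: str = SCRAPER_ENDPOINT, timeout: float = 30.0) -> dict:
    """Call the endpoint and decode the JSON response into the output dict.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    request = build_scrape_request(article_url, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```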

GitHub repository: RahulNenavath/Article-Scraper