coderJT/News-Scraper

News scraper app built with JavaScript and Python, with NLP features

Screenshot
This project consists of two main parts: a scraper and a dashboard.
The scraper is built with Selenium and BeautifulSoup and has been tested on the news website https://thestar.com.my.
The dashboard is built with React and styled with Material UI.
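
As an illustration of how Selenium and BeautifulSoup are commonly combined, the sketch below loads a page in a headless browser and then parses the rendered HTML. It is not the project's actual scraper: the `Article` class, the function names, and the choice of `<h1>` and `<p>` as selectors are assumptions made for this example, since the real selectors for the target site are not listed here.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """One scraped article (hypothetical shape, loosely based on the dashboard fields)."""
    title: str
    url: str
    content: str


def fetch_rendered_html(url: str) -> str:
    """Load a page in headless Chrome and return the rendered HTML."""
    # Imported lazily so the parsing helper works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always release the browser, even if loading fails


def parse_article(html: str, url: str) -> Article:
    """Extract a title and body text from article HTML.

    Assumption: the title is the first <h1> and the body is every <p>.
    A real scraper would use selectors specific to the target site.
    """
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    heading = soup.find("h1")
    title = heading.get_text(strip=True) if heading else ""
    paragraphs = (p.get_text(" ", strip=True) for p in soup.find_all("p"))
    content = " ".join(text for text in paragraphs if text)
    return Article(title=title, url=url, content=content)
```

Keeping the browser step and the parsing step in separate functions means the parser can be exercised on saved HTML without launching a browser.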
NLP methods:

  1. Sentiment Analysis - Built with the NLTK library
  2. Summarizer - Built with Sumy's LsaSummarizer
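
The two NLP methods above might look roughly like the following sketch. It assumes the sentiment step uses NLTK's VADER analyzer (`SentimentIntensityAnalyzer`), which this README does not confirm; the function names, the 0.05 threshold, and the default of three summary sentences are also assumptions chosen for illustration.

```python
def label_from_compound(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def analyze_sentiment(text: str) -> dict:
    """Score text with NLTK's VADER analyzer (assumed, not confirmed by the README)."""
    # Imported lazily so this module loads even without NLTK installed.
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # skipped once the data is cached
    scores = SentimentIntensityAnalyzer().polarity_scores(text)
    return {"label": label_from_compound(scores["compound"]), **scores}


def summarize(text: str, sentence_count: int = 3) -> str:
    """Return an extractive summary using Sumy's LsaSummarizer."""
    import nltk
    from sumy.nlp.tokenizers import Tokenizer
    from sumy.parsers.plaintext import PlaintextParser
    from sumy.summarizers.lsa import LsaSummarizer

    # Sumy's tokenizer relies on NLTK sentence data; newer NLTK releases
    # look for "punkt_tab", older ones for "punkt".
    nltk.download("punkt", quiet=True)
    nltk.download("punkt_tab", quiet=True)

    parser = PlaintextParser.from_string(text, Tokenizer("english"))
    sentences = LsaSummarizer()(parser.document, sentence_count)
    return " ".join(str(sentence) for sentence in sentences)
```

The NLTK downloads in this sketch happen on first use, which is one plausible reason a first run would be slower than later ones.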

Dashboard overview

  1. A table with the fields No, Title, Date, Tag/Category and Content is displayed on the left side of the screen once data has been fetched or scraped.

  2. The following actions are available through buttons in the middle of the screen:

    • Fetch news = Fetch previously scraped news from the database (MongoDB) (~1-6 seconds)
    • Scrape news = Create an instance of the scraper to scrape news from the source (~20-30 seconds)
    • Reset news = Clear all data from the database (~1-3 seconds)
    • Sentiment analysis = Perform sentiment analysis on the selected news article (~1-3 seconds)
    • Summarize = Summarize the selected news article (~1-3 seconds)
    • Fetch news (With Tags) = Fetch previously scraped news from the database, filtered by tags
    • Scrape news (With Tags) = Scrape news from the source, filtered by tags
  3. A details column on the right side of the screen displays the following information when a news article is selected:

    • Title
    • URL
    • Sentiment Analysis Result
    • Summary
    • Content
  4. A limited selection of themes is available through the sidebar.
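
The database-backed actions listed above (fetch, fetch with tags, reset) could be sketched with PyMongo as follows. This is a minimal example under stated assumptions, not the project's backend: the connection URI, the `news` database, the `articles` collection, and the `tag` field name are all hypothetical.

```python
from typing import Optional, Sequence


def build_tag_query(tags: Optional[Sequence[str]] = None) -> dict:
    """Build a MongoDB filter: empty for all articles, $in for a tag list."""
    if not tags:
        return {}
    # "tag" is a hypothetical field name for the Tag/Category column.
    return {"tag": {"$in": list(tags)}}


def get_collection(uri: str = "mongodb://localhost:27017",
                   db_name: str = "news",
                   collection_name: str = "articles"):
    """Open the articles collection (URI and names are placeholders)."""
    # Imported lazily so the query helper works without PyMongo installed.
    from pymongo import MongoClient

    return MongoClient(uri)[db_name][collection_name]


def fetch_news(collection, tags: Optional[Sequence[str]] = None) -> list:
    """Fetch news, optionally restricted to the given tags."""
    # The projection drops MongoDB's internal _id so results serialize to JSON.
    return list(collection.find(build_tag_query(tags), {"_id": 0}))


def reset_news(collection) -> int:
    """Reset news: delete every stored article and return the count removed."""
    return collection.delete_many({}).deleted_count
```

With a running MongoDB instance, `fetch_news(get_collection(), ["Nation"])` would return only articles whose `tag` equals the example value `"Nation"`.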

To start:

  • npm install
  • npm start (the first run may take longer while the NLTK data files are downloaded)

About

News scraper (The Star) and analysis, all in one dashboard.
