Scraper

Web Scraper API

Scraper Screenshot
Scrape any website's H1, H2, H3 and link tags.

Getting Started with Scraper

These instructions will get you a copy of the project up and running on your local machine for development.

Prerequisites

Things you need to install beforehand:

  • Rails - Ruby Framework

Installing

Open a terminal and run the following commands to clone and start the project.

$ git clone https://github.com/SeeYouSpaceCowboy/api-scraper.git
$ cd api-scraper
$ bundle install
$ rails s

This project should now be running locally on port 3000.

About

This section documents how the API behaves.

Example calls are made using JavaScript's axios npm package.
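As a minimal sketch (the `API_BASE` constant and `endpoint` helper are illustrative, not part of the project), the base URL for a locally running server can be kept in one place:

```javascript
// Base URL for the locally running Rails server (port 3000 by default).
// The `endpoint` helper name is illustrative, not part of the API.
const API_BASE = 'http://localhost:3000/v1';

function endpoint(path) {
  return `${API_BASE}/${path}`;
}

// endpoint('urls') builds 'http://localhost:3000/v1/urls', the URL
// used by the sample calls below.
```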

Scrape Given URL

Saves H1, H2, H3 and links from a given URL to the database.

| URL Endpoint | Method | URL Params             | Success Response |
|--------------|--------|------------------------|------------------|
| /v1/urls     | POST   | Required: url=[string] | 200              |

Sample Call:

  axios.post('http://localhost:3000/v1/urls', { url: 'http://dailynews.com' })
    .then(response => response.data)
    .catch(error => error)

Content:

[
  {
    "link": "http://dailynews.com",
    "h1": [
      {
          "content": "Passenger dies after car crash in North Hollywood",
          "link": "http://www.dailynews.com/2017/10/27/1-in-critical-condition-after-car-crash-in-north-hollywood/"
      },
      ...
    ],
    "h2": [
      {
          "content": "LA Metro security guards attacked near Watts station; one shot at with his own gun",
          "link": "http://www.dailynews.com/2017/10/27/la-metro-security-guards-attacked-near-watts-station-one-shot-at-with-his-own-gun/"
      },
      ...
    ],
    "h3": [ ... ],
    "a": [ ... ]
  },
  ...
]
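The nested response can be post-processed on the client. A hedged sketch (the `headingTexts` helper is illustrative and assumes the response shape shown above):

```javascript
// Collect the text content of a given heading tag ('h1', 'h2', 'h3')
// across every scraped page in a response array.
function headingTexts(pages, tag) {
  return pages.flatMap(page => (page[tag] || []).map(entry => entry.content));
}

// A trimmed-down response in the shape documented above.
const sample = [
  {
    link: 'http://dailynews.com',
    h1: [
      {
        content: 'Passenger dies after car crash in North Hollywood',
        link: 'http://www.dailynews.com/2017/10/27/1-in-critical-condition-after-car-crash-in-north-hollywood/'
      }
    ],
    h2: [],
    h3: [],
    a: []
  }
];

headingTexts(sample, 'h1');
// → ['Passenger dies after car crash in North Hollywood']
```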

Get All H1, H2, H3 and Links

Get back the H1, H2, H3 and link tags of every previously saved URL from the database.

| URL Endpoint | Method | URL Params | Success Response |
|--------------|--------|------------|------------------|
| /v1/urls     | GET    | N/A        | 200              |

Sample Call:

  axios.get('http://localhost:3000/v1/urls')
    .then(response => response.data)
    .catch(error => error)

Content:

[
  {
    "link": "http://dailynews.com",
    "h1": [
      {
          "content": "Passenger dies after car crash in North Hollywood",
          "link": "http://www.dailynews.com/2017/10/27/1-in-critical-condition-after-car-crash-in-north-hollywood/"
      },
      ...
    ],
    "h2": [
      {
          "content": "LA Metro security guards attacked near Watts station; one shot at with his own gun",
          "link": "http://www.dailynews.com/2017/10/27/la-metro-security-guards-attacked-near-watts-station-one-shot-at-with-his-own-gun/"
      },
      ...
    ],
    "h3": [ ... ],
    "a": [ ... ]
  },
  ...
]
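Since this endpoint returns every saved URL, a client may want a lookup keyed by link. A small sketch (the `indexByLink` name is illustrative, assuming the array-of-pages shape shown above):

```javascript
// Build an object mapping each saved URL to its scraped tags,
// assuming the array-of-pages response shape shown above.
function indexByLink(pages) {
  const byLink = {};
  for (const page of pages) {
    byLink[page.link] = page;
  }
  return byLink;
}

const saved = [{ link: 'http://dailynews.com', h1: [], h2: [], h3: [], a: [] }];
const byLink = indexByLink(saved);
// byLink['http://dailynews.com'] holds the scraped tags for that page.
```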

Contributors

Scraper was built by Mohammed Chisti.
