Home
Welcome to the web scraping demo repository. In this repository, I have built a backend server that can scrape web pages and store the scraped data in a database.
I have used different frameworks for the backend API and the Supabase Edge function.
For developing the backend API, I have used Node.js with the Node Package Manager (NPM).
I have created API endpoints for fetching, creating, and deleting scrapers. There are also endpoints for starting a scraper (which triggers the edge function that performs the web scraping) and for fetching the output data linked to a scraper. In addition, I have added the standard auth endpoints (sign up, sign in, sign out), which use Supabase Auth for authentication.
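To give an idea of how these endpoints hang together, here is a minimal sketch of the route layout. It assumes an Express app, the @supabase/supabase-js client, a "scrapers" table, and an edge function named "scrape"; the actual route paths, table names, and handler code in the repository may differ.

```typescript
// Hypothetical sketch of the API routes, not the repository's exact code.
import express from "express";
import { createClient } from "@supabase/supabase-js";

const app = express();
app.use(express.json());

// Assumes SUPABASE_URL and SUPABASE_ANON_KEY are provided via the environment.
const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_ANON_KEY!);

// Scraper CRUD endpoints (table name "scrapers" is an assumption).
app.get("/scrapers", async (_req, res) => {
  const { data, error } = await supabase.from("scrapers").select("*");
  if (error) return res.status(500).json({ error: error.message });
  res.json(data);
});

app.post("/scrapers", async (req, res) => {
  const { data, error } = await supabase.from("scrapers").insert(req.body).select();
  if (error) return res.status(500).json({ error: error.message });
  res.status(201).json(data);
});

app.delete("/scrapers/:id", async (req, res) => {
  const { error } = await supabase.from("scrapers").delete().eq("id", req.params.id);
  if (error) return res.status(500).json({ error: error.message });
  res.status(204).end();
});

// Starting a scraper invokes the Supabase Edge Function (function name is assumed).
app.post("/scrapers/:id/start", async (req, res) => {
  const { data, error } = await supabase.functions.invoke("scrape", {
    body: { scraperId: req.params.id },
  });
  if (error) return res.status(500).json({ error: error.message });
  res.json(data);
});

// Auth endpoint backed by Supabase Auth (sign-in and sign-out follow the same pattern).
app.post("/auth/signup", async (req, res) => {
  const { data, error } = await supabase.auth.signUp(req.body);
  if (error) return res.status(400).json({ error: error.message });
  res.json(data);
});

app.listen(3000);
```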
For the Supabase edge function, I have used TypeScript, which is essentially JavaScript with a type system added on top. Supabase Edge Functions are written in TypeScript and run on the Deno runtime, so instead of downloading packages with npm, dependencies are referenced directly by URL in the import statements.
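The sketch below shows what such an edge function roughly looks like, with URL-based imports. It is a simplified outline under assumed names (a "scrapers" table and a JSON body containing a scraperId); the real scraping logic, which applies the configured selectors and writes the result to storage, is omitted.

```typescript
// Hypothetical outline of the edge function; the real parsing and storage logic differs.
import { serve } from "https://deno.land/std@0.177.0/http/server.ts";
import { createClient } from "https://esm.sh/@supabase/supabase-js@2";

serve(async (req) => {
  const { scraperId } = await req.json();

  // Supabase injects these environment variables when the function is deployed.
  const supabase = createClient(
    Deno.env.get("SUPABASE_URL")!,
    Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!,
  );

  // Load the scraper configuration (table name "scrapers" is assumed).
  const { data: scraper, error } = await supabase
    .from("scrapers")
    .select("*")
    .eq("id", scraperId)
    .single();
  if (error) {
    return new Response(JSON.stringify({ error: error.message }), { status: 500 });
  }

  // Fetch the target page; a full implementation would parse it with the stored selectors
  // and upload the result to the storage bucket.
  const html = await (await fetch(scraper.url)).text();

  return new Response(JSON.stringify({ length: html.length }), {
    headers: { "Content-Type": "application/json" },
  });
});
```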
The setup for the Supabase functions was a bit involved. I first had to install the Supabase CLI to create and edit a function locally, and then install Docker to be able to package the function for testing and deployment.
I have used Supabase as the database to store the scraper configuration and output. The reason behind this was that Supabase had everything I needed. Also, unlike AWS, the Supabase free tier isn't limited to just one year. That limit was a bit concerning, because AWS also provides EC2 instances to run the web server on a public domain, which means I could have stuck to a single service (AWS) had that limitation not been an issue.
In Supabase, I set up two tables: one for storing the scraper configuration (name, uid, url, selectors), and another for storing the bucket link to the output data, the time at which it was generated, and the scraper it belongs to (the scraper id is a foreign key referencing the corresponding row in the scraper configuration table). I then set up email and password authentication and created a storage bucket for the scraper outputs.
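For reference, the two row shapes could be modelled roughly as below. The column names are assumptions based on the description above, not the exact schema used in the project.

```typescript
// Hypothetical row types mirroring the two tables described above.
interface ScraperConfig {
  id: string;           // uid of the scraper
  name: string;
  url: string;          // page to scrape
  selectors: string[];  // selectors used to extract data from the page
}

interface ScraperOutput {
  id: string;
  scraper_id: string;   // foreign key -> ScraperConfig.id
  bucket_link: string;  // link to the output file in the storage bucket
  generated_at: string; // timestamp at which the output was generated
}
```

After each run, recording an output amounts to inserting a ScraperOutput row whose scraper_id points back at the configuration row that produced it.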
Developed by Aadi Umrani