Basic Public APIs List Crawler

This is an API crawler that fetches public API data from Postman, with support for rate limiting and pagination.

For further reference, the Public APIs GitHub repo (https://github.com/public-apis/public-apis) is a collective list of free APIs for use in software and web development.

Steps to run the code

  • To fetch all API data into the database, run python3 main.py from the source directory.
    • This command fetches all data into the database files.
  • To run queries on the database, run python3 queryDB.py (see the query sketch after this list).
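As an illustration of the second step, here is a minimal query sketch. It assumes the crawler writes to a SQLite file named apis.db with an apis table that has a category column; the actual file name and schema are defined in the repo.

```python
import sqlite3

# Minimal sketch of querying the crawled data.
# Assumes a SQLite file "apis.db" with an "apis" table and a "category"
# column; adjust names to match the repo's actual schema.
conn = sqlite3.connect("apis.db")
cur = conn.cursor()

# Example: count crawled entries per category.
cur.execute(
    "SELECT category, COUNT(*) FROM apis "
    "GROUP BY category ORDER BY COUNT(*) DESC"
)
for category, count in cur.fetchall():
    print(f"{category}: {count}")

conn.close()
```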

Database Schema

[DB schema diagram]
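In case the diagram does not render, here is a rough, hypothetical sketch of a schema for this kind of data, based on the fields the public-apis dataset exposes (name, description, auth, HTTPS, CORS, link, category). It is illustrative only, not the repo's exact DDL.

```python
import sqlite3

# Hypothetical schema sketch; the authoritative schema is the diagram above.
conn = sqlite3.connect("apis.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS categories (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS apis (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    description TEXT,
    auth        TEXT,
    https       INTEGER,        -- stored as 0/1
    cors        TEXT,
    link        TEXT,
    category_id INTEGER REFERENCES categories(id)
);
""")
conn.close()
```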

Points Achieved

  • The code follows OOP concepts
  • Support for authentication & token expiration
  • Support for pagination to fetch all data
  • Support for rate limiting
  • Crawled all API entries for all categories and stored them in a database (see the crawl-loop sketch after this list)
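Putting the three mechanisms together, here is a minimal sketch of how such a crawl loop could look. The base URL, the endpoints (/token, /apis/entry), and the response fields (token, expires_in, entries, hasNext) are all hypothetical placeholders, not the actual API this crawler targets.

```python
import time
import requests

BASE_URL = "https://example.com/api"  # hypothetical; substitute the real base URL


class ApiCrawler:
    """Sketch of a crawl loop combining token refresh, rate limiting, and pagination."""

    def __init__(self, requests_per_minute=10):
        self.min_interval = 60.0 / requests_per_minute
        self.last_request = 0.0
        self.token = None
        self.token_expiry = 0.0

    def _refresh_token(self):
        # Fetch a new token only once the current one has expired,
        # which avoids the unnecessary token calls noted in the TODO list.
        if self.token and time.time() < self.token_expiry:
            return
        resp = requests.get(f"{BASE_URL}/token")  # hypothetical endpoint
        resp.raise_for_status()
        data = resp.json()
        self.token = data["token"]
        self.token_expiry = time.time() + data.get("expires_in", 300)

    def _throttle(self):
        # Basic client-side rate limiting: wait until min_interval has elapsed.
        wait = self.min_interval - (time.time() - self.last_request)
        if wait > 0:
            time.sleep(wait)
        self.last_request = time.time()

    def fetch_category(self, category):
        # Walk pages until the (assumed) hasNext flag goes false.
        page = 1
        while True:
            self._refresh_token()
            self._throttle()
            resp = requests.get(
                f"{BASE_URL}/apis/entry",  # hypothetical endpoint
                params={"page": page, "category": category},
                headers={"Authorization": f"Bearer {self.token}"},
            )
            resp.raise_for_status()
            payload = resp.json()
            yield from payload.get("entries", [])
            if not payload.get("hasNext"):
                break
            page += 1
```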

Things to Improve with Time / TODO List

  • The database does not scale to a large number of entries; switch to a more scalable database framework.
  • Implement checks while writing to the database to avoid duplicates (see the dedupe sketch after this list).
  • A few unnecessary token calls are being made and can be avoided.
  • Implement better async code to speed up processing.
  • Use Docker to run the code.
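For the duplicate-avoidance item, one common approach is a UNIQUE constraint combined with INSERT OR IGNORE, which makes repeated crawls idempotent. A minimal sketch, with illustrative table and column names:

```python
import sqlite3

# Sketch of the duplicate-avoidance TODO: a UNIQUE constraint plus
# INSERT OR IGNORE makes repeated crawls idempotent. Table and column
# names are illustrative and should match the real schema.
conn = sqlite3.connect("apis.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS apis "
    "(name TEXT, link TEXT, category TEXT, UNIQUE(name, link))"
)

def save_entry(entry):
    # Rows whose (name, link) pair already exists are silently skipped.
    conn.execute(
        "INSERT OR IGNORE INTO apis (name, link, category) VALUES (?, ?, ?)",
        (entry["name"], entry["link"], entry["category"]),
    )
    conn.commit()
```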
