GitHub Profile Scraper #7

@FarazzShaikh

First and Last Name

Faraz Shaikh

Email

farazzshaikh@gmail.com
frzskh@hw.ac.uk

Company/Organization (Ex: Heriot-Watt)

Heriot-Watt

Job Title (Ex: Student)

Student

Project Title

GitHub profile scraper (will think of something more creative later)

Briefly describe the project

See below

What kind of machines and how many do you expect to use?

None

What operating system and networking are you planning to use?

None?

Any other relevant details we should know about?

See below

Additional context


GitHub profile scraper

A self-hosted GitHub profile scraper. It can be used as a middleman between your site and GitHub's API.

The problem

The official GitHub API rate-limits unauthenticated clients to about 60 requests an hour for core endpoints and 20 for search. Furthermore, some data takes a fair bit of API gymnastics to retrieve.
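To see how much of that budget is left, GitHub's REST responses carry `x-ratelimit-remaining` and `x-ratelimit-reset` headers. Here is a minimal sketch of reading them, assuming the header names have already been lower-cased into a plain object. `parseRateLimit` and `RateLimitInfo` are illustrative names, not part of any library.

```typescript
// Hypothetical helper: the names and shapes here are illustrative only.
interface RateLimitInfo {
  remaining: number; // requests left in the current window
  resetAt: Date;     // when the current window resets
}

// Reads GitHub's rate-limit headers from a lower-cased header map.
// Returns null when either header is missing or not numeric.
function parseRateLimit(headers: Record<string, string>): RateLimitInfo | null {
  const remaining = Number(headers["x-ratelimit-remaining"]);
  // GitHub reports the reset time as UTC epoch seconds.
  const resetEpochSeconds = Number(headers["x-ratelimit-reset"]);
  if (!Number.isFinite(remaining) || !Number.isFinite(resetEpochSeconds)) {
    return null;
  }
  return { remaining, resetAt: new Date(resetEpochSeconds * 1000) };
}
```

A scraper could skip a run when `remaining` is low and try again after `resetAt`.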

Yes, the GraphQL API does exist and is better, but do you really want to set up GraphQL for static sites? I don't. Besides, it's a cool little side project to spend a week on.

The Solution

This will use Firebase Cloud Functions to run a function every couple of hours (or whatever interval) and scrape the contents of a GitHub profile via either good ol' web scraping or the GitHub API itself. After that, it will store all the data as one or two documents in Firebase Realtime Database.
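The "one or two documents" step could look roughly like the sketch below. It assumes the scrape goes through the GitHub REST API's repository listing, and it leaves out the scheduling and database calls entirely. `RawRepo` lists a few fields that listing returns; `ProfileSnapshot` and `buildSnapshot` are hypothetical names made up for this example.

```typescript
// A subset of the fields GitHub's REST API returns for each repository.
interface RawRepo {
  name: string;
  stargazers_count: number;
  language: string | null; // primary language, if GitHub detected one
  created_at: string;      // ISO 8601 timestamp
  pushed_at: string;       // ISO 8601 timestamp
}

// Hypothetical shape of the single document kept in the database.
interface RepoSummary {
  name: string;
  stars: number;
  language: string | null;
  createdAt: string;
  pushedAt: string;
}

interface ProfileSnapshot {
  scrapedAt: string; // when this snapshot was taken
  repos: RepoSummary[];
}

// Condenses the raw API response into one compact document.
function buildSnapshot(repos: RawRepo[], now: Date): ProfileSnapshot {
  return {
    scrapedAt: now.toISOString(),
    repos: repos.map((repo) => ({
      name: repo.name,
      stars: repo.stargazers_count,
      language: repo.language,
      createdAt: repo.created_at,
      pushedAt: repo.pushed_at,
    })),
  };
}
```

Keeping the transformation pure means the scheduled function only has to fetch, call `buildSnapshot`, and write the result.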

The user can then run another Cloud Function to fetch the data from the database. Something like this:

[Image: Group 1 (3)]
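The read side could be sketched as follows, assuming the stored document records when it was scraped. The function only turns whatever is in the database into a response, so it never touches GitHub's rate limit. `StoredSnapshot`, `ReadResponse`, `toReadResponse`, and the `stale` flag are all hypothetical; the actual Cloud Function wiring is omitted.

```typescript
// Hypothetical shape of the document read back from the database.
interface StoredSnapshot {
  scrapedAt: string; // ISO 8601 timestamp of the last scrape
  repos: unknown[];
}

// Hypothetical response shape handed back to the HTTP layer.
interface ReadResponse {
  status: number;
  body: string; // JSON-encoded payload
}

// Builds the response for a read request from the stored snapshot.
// `maxAgeMs` is an assumed freshness threshold chosen by the caller.
function toReadResponse(stored: StoredSnapshot | null, now: Date, maxAgeMs: number): ReadResponse {
  if (stored === null) {
    // Nothing has been scraped yet.
    return { status: 503, body: JSON.stringify({ error: "no snapshot yet" }) };
  }
  const ageMs = now.getTime() - new Date(stored.scrapedAt).getTime();
  return {
    status: 200,
    body: JSON.stringify({ stale: ageMs > maxAgeMs, data: stored }),
  };
}
```

The `stale` flag lets the site decide whether to show the numbers as-is or hint that they may be out of date.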

The Use

You can use this to include "real time" GitHub stats in your whatever. Personally, I will use this to do the same in my portfolio site.

Useful data would include things like:

  • Latest Commit
  • Latest Repository
  • The languages for a repository
  • Stars on a repository
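As one example from the list above, the languages for a repository: GitHub's REST API has a per-repository languages endpoint that returns a map of language name to bytes of code. A small sketch of turning that map into percentages for display, where `languageShares` is a hypothetical helper name:

```typescript
// Converts a language -> bytes map into language -> percentage,
// rounded to one decimal place. Returns an empty object for an empty repo.
function languageShares(bytesByLanguage: Record<string, number>): Record<string, number> {
  const total = Object.values(bytesByLanguage).reduce((sum, bytes) => sum + bytes, 0);
  const shares: Record<string, number> = {};
  if (total === 0) {
    return shares;
  }
  for (const [language, bytes] of Object.entries(bytesByLanguage)) {
    shares[language] = Math.round((bytes / total) * 1000) / 10;
  }
  return shares;
}
```

For instance, `languageShares({ TypeScript: 7500, CSS: 2500 })` gives `{ TypeScript: 75, CSS: 25 }`.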
