A web scraping / data mining script for extracting beginner-friendly GitHub repos from Y Combinator's company database: https://www.ycombinator.com/companies/
Watch this YouTube video for a detailed explanation, and more!
This scraper is designed with Microverse students and graduates in mind. It extracts the most recent and active Y Combinator GitHub repos written in JavaScript and Ruby, the two languages Micronauts use the most!
The resulting list as of July 2022, for YC batches S21, W21, S22 and W22, is in 05_final_repos.csv. Be sure to check it out to find beginner-friendly repos written in JS and Ruby.
- Clone the repository.
$ cd ycombinator_githubs
$ npm install
- Generate your GitHub Personal Access Token as detailed in .env.sample
- Change the name of .env.sample to simply .env
$ ./scrape.sh
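The .env step above can also be done from the shell. The following is only a sketch, not part of the repository: setup_env is a hypothetical helper name, and it copies .env.sample instead of renaming it, so the sample stays around as a reference. It never overwrites an existing .env, which protects a token you have already pasted in.

```shell
# Hypothetical helper (not part of this repo): create .env from .env.sample.
# Usage: setup_env [directory]   (defaults to the current directory)
setup_env() {
  dir="${1:-.}"
  if [ -f "$dir/.env" ]; then
    # Never overwrite an existing .env, it may already hold a token.
    echo ".env already exists, leaving it untouched"
  elif [ -f "$dir/.env.sample" ]; then
    # Copy rather than rename so .env.sample remains as documentation.
    cp "$dir/.env.sample" "$dir/.env"
    echo "Created .env from .env.sample; now add your token to it"
  else
    echo "No .env.sample found in $dir" >&2
    return 1
  fi
}
```

Run setup_env from inside the cloned ycombinator_githubs directory before ./scrape.sh, then edit .env to add your token as described in .env.sample.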
For details on how the scraping works, or what the .csv files contain, read scrape.sh.