This repository contains a collection of scrapers designed to extract open source dependencies from various websites and platforms. The goal of the project is to provide a tool for analyzing the dependencies of different software projects, helping developers and researchers gain insights into the usage of open source libraries and frameworks.
The scrapers combine web scraping and data extraction techniques. They extract information such as package names and versions from websites and platforms commonly used for software development.
By utilizing these scrapers, users can retrieve valuable data about the dependencies of specific companies, packages, or projects. This information can be used for various purposes, including identifying common dependencies among companies, tracking the usage of specific packages across different projects, and exploring the relationships between different open source libraries.
The following scrapers are available:
- Slack
- Spotify
- Cisco
- Samsung
- Porsche
- Discord
- Broadcom
- Confluent
- Adlock
- Apple Maps
- Bocada
- Bosch
- Genesis
- Veertu
- Smartsheet
- Spark
- Camunda
- Oracle Fusion Platform
- Box
- Parasoft Enterprise
- Clue.io
- Cognition
- Nvidia
- Shoott
- Meta Ray Ban
- Panopto
- Giphy
- Flexera
- Parsec
- Spaceti
Before running the project, make sure you have the following prerequisites installed:
- Python 3: You should have Python 3 installed in order to run the project.
- Docker: Docker is required to run the project. You can download and install Docker from the official website.
- Clone the repo
  git clone https://github.com/FOSSRIT/FOCUSED-Web-Scraper.git
- Run docker compose
  cd FOCUSED-Web-Scraper
  docker-compose build
  docker-compose up
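After `docker-compose up`, the containers may need a moment before the API accepts connections. The following is a minimal sketch of a readiness check, assuming the API listens on 127.0.0.1:4000 as in the URLs used in this README. The function name `wait_for_port` and the timeout values are illustrative and not part of the project.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows that something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("127.0.0.1", 4000)` returns `True` as soon as something accepts connections on that port. It does not verify that the scrapers have finished collecting data.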
The API exposes four GET endpoints for retrieving the data:
To get all the dependencies
http://127.0.0.1:4000/dependencies
To get all the dependencies of a company along with shared authors of a dependency
http://127.0.0.1:4000/dependencies/company/<CompanyName>
# example
http://127.0.0.1:4000/dependencies/company/slack
To get the list of companies which share that particular dependency
http://127.0.0.1:4000/dependencies/<Company>/<Package>
# example
http://127.0.0.1:4000/dependencies/slack/buffer
To get the list of companies that share a particular dependency
http://127.0.0.1:4000/dependencies/package/<Package>
# example
http://127.0.0.1:4000/dependencies/package/react
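The endpoints above can also be called from a script. The sketch below builds the four URLs and fetches one of them using only the Python standard library. The helper names and `BASE_URL` are hypothetical, and two things are assumed rather than documented: that responses are JSON, and that the server accepts percent-encoded path segments.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Host and port as they appear in the URLs in this README.
BASE_URL = "http://127.0.0.1:4000"


def all_dependencies_url(base: str = BASE_URL) -> str:
    """URL for all dependencies."""
    return f"{base}/dependencies"


def company_url(company: str, base: str = BASE_URL) -> str:
    """URL for the dependencies of one company."""
    return f"{base}/dependencies/company/{quote(company, safe='')}"


def company_package_url(company: str, package: str, base: str = BASE_URL) -> str:
    """URL for the <Company>/<Package> endpoint."""
    return f"{base}/dependencies/{quote(company, safe='')}/{quote(package, safe='')}"


def package_url(package: str, base: str = BASE_URL) -> str:
    """URL for the companies sharing one package."""
    return f"{base}/dependencies/package/{quote(package, safe='')}"


def fetch_json(url: str, timeout: float = 10.0):
    """GET the URL and decode the body as JSON (JSON responses are an assumption)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the containers running, `fetch_json(company_url("slack"))` requests the same URL as the company example above. The structure of the returned data is not specified here, so inspect the result before relying on particular keys.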
http://localhost:3000/
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License.
If you have any questions, suggestions, or encounter any issues while using the scrapers, please don't hesitate to contact us.
Team at Open@RIT