JumpsuitWizard/FOCUSED-Web-Scraper

Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Contributing
  5. License
  6. Contact

About The Project

This repository contains a collection of scrapers designed to extract open source dependencies from various websites and platforms. The goal of the project is to provide a tool for analyzing the dependencies of different software projects, helping developers and researchers gain insights into the usage of open source libraries and frameworks.

The scrapers combine web scraping with data extraction methods. They extract information such as package names and versions from websites and platforms commonly used for software development.
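
As a rough illustration of that extraction step, the sketch below pulls package names and versions out of a small HTML fragment. The markup (`<li>name version</li>`), the sample data, and the names `ListItemCollector` and `extract_dependencies` are all hypothetical and are not taken from this repository. The project lists Beautiful Soup under Built With; this sketch uses the standard library `html.parser` instead so that it runs without extra packages.

```python
from html.parser import HTMLParser


class ListItemCollector(HTMLParser):
    """Collect the text content of every <li> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_item = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_item = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_item:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "li" and self._in_item:
            self.items.append("".join(self._buffer).strip())
            self._in_item = False


def extract_dependencies(html):
    """Return a list of {"name", "version"} dicts from the assumed markup."""
    collector = ListItemCollector()
    collector.feed(html)
    collector.close()
    dependencies = []
    for text in collector.items:
        # Assumed format: "<package name> <version>", where the version is optional.
        parts = text.rsplit(" ", 1)
        if len(parts) == 2 and parts[1][:1].isdigit():
            name, version = parts
        else:
            name, version = text, None
        dependencies.append({"name": name, "version": version})
    return dependencies


# Made-up sample page, for illustration only.
SAMPLE_HTML = """
<ul>
  <li>react 18.2.0</li>
  <li>buffer 6.0.3</li>
  <li>lodash</li>
</ul>
"""

if __name__ == "__main__":
    for dependency in extract_dependencies(SAMPLE_HTML):
        print(dependency)
```

Real attribution pages differ from site to site, so each scraper would need its own parsing rules.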

By utilizing these scrapers, users can retrieve valuable data about the dependencies of specific companies, packages, or projects. This information can be used for various purposes, including identifying common dependencies among companies, tracking the usage of specific packages across different projects, and exploring the relationships between different open source libraries.
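
One way to identify common dependencies among companies is to invert a company-to-packages mapping into a package-to-companies index. The sketch below assumes the scraped data has already been loaded into a plain dictionary. The sample data and the function names `companies_by_package` and `shared_packages` are made up for illustration and do not reflect the project's actual storage format.

```python
from collections import defaultdict


def companies_by_package(dependencies):
    """Map each package name to the set of companies that depend on it."""
    index = defaultdict(set)
    for company, packages in dependencies.items():
        for package in packages:
            index[package].add(company)
    return index


def shared_packages(dependencies, min_companies=2):
    """Return packages used by at least `min_companies` companies."""
    index = companies_by_package(dependencies)
    return {
        package: sorted(companies)
        for package, companies in index.items()
        if len(companies) >= min_companies
    }


# Illustrative data only; not real scraper output.
SAMPLE = {
    "slack": {"react", "buffer", "lodash"},
    "spotify": {"react", "lodash"},
    "discord": {"react", "electron"},
}

if __name__ == "__main__":
    for package, companies in sorted(shared_packages(SAMPLE).items()):
        print(package, companies)
```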

Scrapers are available for the following companies and products:

  1. Slack
  2. Spotify
  3. Cisco
  4. Samsung
  5. Porsche
  6. Discord
  7. Broadcom
  8. Confluent
  9. Adlock
  10. Apple Maps
  11. Bocada
  12. Bosch
  13. Genesis
  14. Veertu
  15. Smartsheet
  16. Spark
  17. Camunda
  18. Oracle Fusion Platform
  19. Box
  20. Parasoft Enterprise
  21. Clue.io
  22. Cognition
  23. Nvidia
  24. Shoott
  25. Meta Ray Ban
  26. Panapto
  27. Giphy
  28. Flexera
  29. Parsec
  30. Spaceti

Built With

  • Python
  • Beautiful Soup
  • Flask
  • Docker
  • React

Getting Started

Prerequisites

Before running the project, make sure the following are installed:

  • Python 3
  • Docker: the project runs in Docker containers. You can download and install Docker from the official website.

Installation

  1. Clone the repo

    git clone https://github.com/FOSSRIT/FOCUSED-Web-Scraper.git
  2. Build and start the containers with Docker Compose

    cd FOCUSED-Web-Scraper
    docker-compose build
    docker-compose up

Usage

There are four GET endpoints for retrieving the data:

To get all the dependencies
 http://127.0.0.1:4000/dependencies
To get all the dependencies of a company, along with the shared authors of each dependency
 http://127.0.0.1:4000/dependencies/company/<CompanyName>
 # example
 http://127.0.0.1:4000/dependencies/company/slack
To get the list of companies which share a particular dependency of a given company
 http://127.0.0.1:4000/dependencies/<Company>/<Package>
 # example
 http://127.0.0.1:4000/dependencies/slack/buffer
To get the list of companies which share a particular package
 http://127.0.0.1:4000/dependencies/package/<Package>
 # example
 http://127.0.0.1:4000/dependencies/package/react
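
The endpoints above can also be called from a script. The following is a minimal sketch using only the Python standard library. It assumes the service is running locally on port 4000 and that the responses are JSON, which this README does not state. The helper names (`all_dependencies_url`, `company_url`, `company_package_url`, `package_url`, `fetch_json`) are hypothetical and not part of the project.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:4000"


def all_dependencies_url(base_url=BASE_URL):
    """URL that lists all dependencies."""
    return f"{base_url}/dependencies"


def company_url(company, base_url=BASE_URL):
    """URL for all dependencies of one company."""
    return f"{base_url}/dependencies/company/{quote(company, safe='')}"


def company_package_url(company, package, base_url=BASE_URL):
    """URL for the /dependencies/<Company>/<Package> endpoint."""
    return (
        f"{base_url}/dependencies/"
        f"{quote(company, safe='')}/{quote(package, safe='')}"
    )


def package_url(package, base_url=BASE_URL):
    """URL for the companies that share one package."""
    return f"{base_url}/dependencies/package/{quote(package, safe='')}"


def fetch_json(url, timeout=10):
    """GET a URL and decode the body as JSON (assumed response format)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the containers from the Installation step to be running.
    print(fetch_json(company_url("slack")))
```

Names are percent-encoded so that characters such as `/` in scoped package names do not alter the path. Whether the server accepts such encoded names has not been verified here.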

To open the React application, visit:

 http://localhost:3000/

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

Distributed under the MIT License.

Contact

If you have any questions, suggestions, or encounter any issues while using the scrapers, please don't hesitate to contact us.

Team at Open@RIT
