
This Python web scraping project utilizes Selenium and BeautifulSoup to extract data from the Google Summer of Code (GSoC) portal, providing information on organizations and their projects for a specific year.


bishal-Samanta/gosc-web-scraping-project


Python Web Scraping Project

Project Overview

This project was developed for learning and exploration, with no malicious intent. It is a practical exercise in web scraping with Selenium and BeautifulSoup, focused on extracting data from the Google Summer of Code (GSoC) portal.

Click here to view sample data

Org Data

  • org_name
  • org_description
  • technology
  • topics
  • official_link
  • gsoc_link

Project Data

  • Organization Name
  • Official Link
  • GSOC Link
  • Project Details Link
  • Technology Used
  • Project Topics
  • Project Details Description
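The two record shapes listed above could be modeled as follows. This is only a sketch: the field and column names come from the lists, but the types, the `OrgRecord` class, the `project_row` helper, and the idea that a project row inherits its technology and topics from its organization are assumptions, not the project's actual code. The sample values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OrgRecord:
    # Field names follow the "Org Data" list; the types are assumed.
    org_name: str
    org_description: str
    technology: List[str] = field(default_factory=list)
    topics: List[str] = field(default_factory=list)
    official_link: str = ""
    gsoc_link: str = ""


def project_row(org: OrgRecord, details_link: str, description: str) -> Dict[str, str]:
    """Build one row keyed by the "Project Data" column names (hypothetical helper)."""
    return {
        "Organization Name": org.org_name,
        "Official Link": org.official_link,
        "GSOC Link": org.gsoc_link,
        "Project Details Link": details_link,
        # Assumption: list values are flattened to comma-separated text.
        "Technology Used": ", ".join(org.technology),
        "Project Topics": ", ".join(org.topics),
        "Project Details Description": description,
    }


# Placeholder values for illustration only, not real scraped data.
example_org = OrgRecord(
    org_name="Example Org",
    org_description="A placeholder organization.",
    technology=["python", "django"],
    topics=["web", "education"],
    official_link="https://example.org",
    gsoc_link="https://example.org/gsoc",
)
example_row = project_row(example_org, "https://example.org/project", "A placeholder project.")
```

Keeping the project row as a plain dictionary with the human-readable column names makes it easy to write straight into a spreadsheet-style output.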

Tech Stack:

  • Programming Language: Python
  • Core Tools: Selenium, BeautifulSoup
  • Database / Visualization: Google Sheets and JSON

Features:

  • Scrape Data: The script scrapes the GSoC portal for a given program year.
  • Data Properties: For that year, it generates data for every organization and its projects, including organization names, descriptions, project details, and relevant links.
  • Data Output: It writes the results in both JSON and CSV formats for data analysis and visualization.
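The JSON and CSV output step might look roughly like the sketch below, which uses only the Python standard library. The function names `to_json` and `to_csv`, the `"; "` separator for list values, and the sample record are assumptions made for illustration; the project's own export code may differ.

```python
import csv
import io
import json
from typing import Dict, List


def to_json(records: List[Dict]) -> str:
    """Serialize the scraped records as pretty-printed JSON text."""
    return json.dumps(records, indent=2, ensure_ascii=False)


def to_csv(records: List[Dict]) -> str:
    """Serialize the records as CSV text, taking the header from the first record."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    for record in records:
        # Assumption: list values are joined so each fits in one CSV cell.
        writer.writerow({
            key: "; ".join(value) if isinstance(value, list) else value
            for key, value in record.items()
        })
    return buffer.getvalue()


# Placeholder record for illustration only, not real scraped data.
sample_records = [{"org_name": "Example Org", "technology": ["python", "django"]}]
json_text = to_json(sample_records)
csv_text = to_csv(sample_records)
```

Returning strings rather than writing files directly keeps the sketch easy to test; in practice the text would be written to a `.json` or `.csv` file, and the CSV could then be imported into a spreadsheet.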

Disclaimer

This project was created solely for learning and exploration and is not intended for any malicious activity. Use the scraping tool responsibly and in compliance with the website's terms of service and applicable laws.
