# Top Repositories For GitHub Topics

### Introduction

- I am going to collect and parse raw data from the GitHub website.
- In this project I will use Python and libraries such as requests, BeautifulSoup and pandas.

- GitHub is a code hosting platform for version control and collaboration. It lets you and others work together on projects 
  from anywhere.
- A repository contains all of your project's files and each file's revision history. You can discuss and manage your project's
  work within the repository.

### Project Outline
- We're going to scrape https://github.com/topics
- We will get a list of topics. For each topic, we'll get the topic title, topic page URL and topic description
- For each topic, we'll get the top repositories listed on its topic page (30 per page in the run shown here)
- For each repository, we'll grab the repo name, username, stars and repo URL
- **For each topic, we'll create a CSV file in the following format:**
   
   *Repo Name,Username,Stars,Repo URL*
   
   *three.js,mrdoob,69700,https://github.com/mrdoob/three.js*
  
   *libgdx,libgdx,18300,https://github.com/libgdx/libgdx*
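
As a rough sketch of the target format, the snippet below writes rows in that column order with Python's built-in `csv` module. The `write_repos_csv` helper and the in-memory buffer are illustrative assumptions, not part of the project, which writes its files with pandas.

```python
import csv
import io

# Column order taken from the format described above.
CSV_HEADER = ["Repo Name", "Username", "Stars", "Repo URL"]

def write_repos_csv(rows, file_obj):
    """Write the header and one line per repository to an open text file."""
    writer = csv.writer(file_obj, lineterminator="\n")
    writer.writerow(CSV_HEADER)
    writer.writerows(rows)

# Hypothetical usage with the two sample rows from the outline.
sample_rows = [
    ("three.js", "mrdoob", 69700, "https://github.com/mrdoob/three.js"),
    ("libgdx", "libgdx", 18300, "https://github.com/libgdx/libgdx"),
]
buffer = io.StringIO()
write_repos_csv(sample_rows, buffer)
print(buffer.getvalue())
```

Using a writer rather than joining strings by hand means that a value containing a comma would be quoted automatically.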

*We import the requests library to download the page.*

In [1]:
import requests

In [2]:
# URL of the page we want to scrape

topics_url='https://github.com/topics'

In [3]:
# download the page; requests.get returns a Response object

response=requests.get(topics_url)

In [4]:
#checking the status of the response object

response.status_code

200
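
The status check above can also be folded into a small download helper. The sketch below is one possible way to do it, assuming a 10-second timeout is acceptable; the name `download_page` and the injectable `get` parameter are hypothetical and only there to make the helper easy to try out without network access.

```python
import requests

def download_page(url, get=requests.get, timeout=10):
    """Return the page text, raising an exception on HTTP error statuses."""
    # A timeout keeps the call from hanging indefinitely on a slow server.
    response = get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses.
    response.raise_for_status()
    return response.text

# Hypothetical usage (performs a real request when uncommented):
# page_contents = download_page('https://github.com/topics')
```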

In [5]:
# checking the length of the response text (number of characters)

len(response.text)

140923

In [6]:
#storing contents of response object in another variable

page_contents=response.text

In [7]:
# viewing few contents from the page

page_contents[0:1000]

'\n\n<!DOCTYPE html>\n<html lang="en" data-color-mode="auto" data-light-theme="light" data-dark-theme="dark" data-a11y-animated-images="system">\n  <head>\n    <meta charset="utf-8">\n  <link rel="dns-prefetch" href="https://github.githubassets.com">\n  <link rel="dns-prefetch" href="https://avatars.githubusercontent.com">\n  <link rel="dns-prefetch" href="https://github-cloud.s3.amazonaws.com">\n  <link rel="dns-prefetch" href="https://user-images.githubusercontent.com/">\n  <link rel="preconnect" href="https://github.githubassets.com" crossorigin>\n  <link rel="preconnect" href="https://avatars.githubusercontent.com">\n\n\n\n  <link crossorigin="anonymous" media="all" integrity="sha512-ksfTgQOOnE+FFXf+yNfVjKSlEckJAdufFIYGK7ZjRhWcZgzAGcmZqqArTgMLpu90FwthqcCX4ldDgKXbmVMeuQ==" rel="stylesheet" href="https://github.githubassets.com/assets/light-92c7d381038e.css" /><link crossorigin="anonymous" media="all" integrity="sha512-1KkMNn8M/al/dtzBLupRwkIOgnA9MWkm8oxS+solP87jByEvY/g4BmoxLihRogKcX

In [8]:
#save the page contents in a file

with open('webpage.html','w',newline='',encoding='UTF8') as f:
    f.write(page_contents)

### Use BeautifulSoup to Parse and Extract Information

In [9]:
#importing BeautifulSoup

from bs4 import BeautifulSoup

In [10]:
# Parsing the page contents using BeautifulSoup and storing it in a variable

doc=BeautifulSoup(page_contents,'html.parser')

In [11]:
#Checking the datatype of doc

type(doc)

bs4.BeautifulSoup

In [12]:
# Finding all the p tags in the page

p_tags=doc.find_all('p')

In [13]:
#checking the lengths of the p tags

len(p_tags)

67

In [14]:
#viewing first five p tags

p_tags[0:5]

[<p class="f4 color-fg-muted col-md-6 mx-auto">Browse popular topics on GitHub.</p>,
 <p class="f3 lh-condensed text-center Link--primary mb-0 mt-1">
         Web Components
       </p>,
 <p class="f5 color-fg-muted text-center mb-0 mt-1">Web Components are a set of web platform APIs developers can use to create custom HTML tags.</p>,
 <p class="f3 lh-condensed text-center Link--primary mb-0 mt-1">
         GitHub API
       </p>,
 <p class="f5 color-fg-muted text-center mb-0 mt-1">The GitHub API allows you to build applications that integrate with GitHub.</p>]

In [15]:
# getting all the topic title tags from the topics page
# (a multi-class string like this must match the tag's class attribute exactly)

selection_class='f3 lh-condensed mb-0 mt-1 Link--primary'
topic_title_tags=doc.find_all('p',{'class':selection_class})

In [16]:
#checking the length of topic titles (total no. of titles)

len(topic_title_tags)

30

In [17]:
# Viewing first five titles

topic_title_tags[:5]

[<p class="f3 lh-condensed mb-0 mt-1 Link--primary">3D</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Ajax</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Algorithm</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Amp</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Android</p>]

In [18]:
#viewing first topic title

topic_title_tag0=topic_title_tags[0]
topic_title_tag0

<p class="f3 lh-condensed mb-0 mt-1 Link--primary">3D</p>

In [19]:
# viewing the parent of the topic title tag, i.e. the enclosing a tag

parent_tag = topic_title_tag0.parent

In [20]:
#getting description of all the topics from the page

desc_selector='f5 color-fg-muted mb-0 mt-1'
topic_desc_tags=doc.find_all('p',{'class':desc_selector})
topic_descs=[]

for tag in topic_desc_tags:
    topic_descs.append(tag.text.strip())
topic_descs    

['3D modeling is the process of virtually developing the surface and structure of a 3D object.',
 'Ajax is a technique for creating interactive web applications.',
 'Algorithms are self-contained sequences that carry out a variety of tasks.',
 'Amp is a non-blocking concurrency library for PHP.',
 'Android is an operating system built by Google designed for mobile devices.',
 'Angular is an open source web application platform.',
 'Ansible is a simple and powerful automation engine.',
 'An API (Application Programming Interface) is a collection of protocols and subroutines for building software.',
 'Arduino is an open source hardware and software company and maker community.',
 'ASP.NET is a web framework for building modern web apps and services.',
 'Atom is a open source text editor built with web technologies.',
 'An awesome list is a list of awesome things curated by the community.',
 'Amazon Web Services provides on-demand cloud computing platforms on a subscription basis.',
 'Azu

In [21]:
#viewing description of first five topics

topic_desc_tags[:5]

[<p class="f5 color-fg-muted mb-0 mt-1">
           3D modeling is the process of virtually developing the surface and structure of a 3D object.
         </p>,
 <p class="f5 color-fg-muted mb-0 mt-1">
           Ajax is a technique for creating interactive web applications.
         </p>,
 <p class="f5 color-fg-muted mb-0 mt-1">
           Algorithms are self-contained sequences that carry out a variety of tasks.
         </p>,
 <p class="f5 color-fg-muted mb-0 mt-1">
           Amp is a non-blocking concurrency library for PHP.
         </p>,
 <p class="f5 color-fg-muted mb-0 mt-1">
           Android is an operating system built by Google designed for mobile devices.
         </p>]

In [22]:
# getting the link tag of each topic

topic_link_tags=doc.find_all('a',{'class':'no-underline flex-1 d-flex flex-column'})

In [23]:
#viewing the total no of links we get 

len(topic_link_tags)

30

In [24]:
#viewing the tag of link of the first topic

topic_link_tags[0]

<a class="no-underline flex-1 d-flex flex-column" href="/topics/3d">
<p class="f3 lh-condensed mb-0 mt-1 Link--primary">3D</p>
<p class="f5 color-fg-muted mb-0 mt-1">
          3D modeling is the process of virtually developing the surface and structure of a 3D object.
        </p>
</a>
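
Because each link tag wraps both the title and the description, the three values can also be read from one tag at a time, which keeps them aligned even if a topic is missing one of them. The sketch below uses BeautifulSoup's CSS selectors, which match classes regardless of their order in the attribute. The `extract_topics` helper is hypothetical, and `SAMPLE_HTML` is an abridged copy of the output above with the second entry's class order shuffled on purpose.

```python
from bs4 import BeautifulSoup

# Abridged sample modelled on the link tag shown above (an assumption, not live data).
SAMPLE_HTML = """
<a class="no-underline flex-1 d-flex flex-column" href="/topics/3d">
  <p class="f3 lh-condensed mb-0 mt-1 Link--primary">3D</p>
  <p class="f5 color-fg-muted mb-0 mt-1">
    3D modeling is the process of virtually developing the surface and structure of a 3D object.
  </p>
</a>
<a class="flex-column d-flex no-underline flex-1" href="/topics/ajax">
  <p class="Link--primary f3 lh-condensed mb-0 mt-1">Ajax</p>
  <p class="f5 color-fg-muted mb-0 mt-1">Ajax is a technique for creating interactive web applications.</p>
</a>
"""

def extract_topics(doc):
    """Return one dict per topic link, reading title and description from inside it."""
    topics = []
    for link in doc.select("a.no-underline.d-flex.flex-column"):
        title_tag = link.select_one("p.f3")
        desc_tag = link.select_one("p.f5")
        if title_tag is None or desc_tag is None:
            continue  # skip links that do not look like topic cards
        topics.append({
            "title": title_tag.get_text(strip=True),
            "description": desc_tag.get_text(strip=True),
            "href": link["href"],
        })
    return topics

sample_doc = BeautifulSoup(SAMPLE_HTML, "html.parser")
for topic in extract_topics(sample_doc):
    print(topic["title"], topic["href"])
```

The notebook's approach of three parallel `find_all` calls works as long as every list has the same length; reading from the shared parent is simply a more defensive alternative.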

In [25]:
#getting the link of first topic

topic_link_tags[0]['href']

'/topics/3d'

In [26]:
#getting the complete url of the first topic

topic0_url="https://github.com"+topic_link_tags[0]['href']
print(topic0_url)

https://github.com/topics/3d
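
String concatenation works here because every `href` starts with a slash. As a slightly more forgiving alternative, the sketch below uses `urljoin` from the standard library, which also leaves already-absolute links untouched. The helper name `absolute_url` is an assumption made for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "https://github.com"

def absolute_url(href, base=BASE_URL):
    """Resolve a relative link such as '/topics/3d' against the site root."""
    return urljoin(base, href)

print(absolute_url("/topics/3d"))  # https://github.com/topics/3d
```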


In [27]:
#getting text (topic name) from the topic title tag 

topic_title_tags[0].text

'3D'

In [28]:
#storing all the topic title in a list

topic_titles=[]

for tag in topic_title_tags:
    topic_titles.append(tag.text)

topic_titles    

['3D',
 'Ajax',
 'Algorithm',
 'Amp',
 'Android',
 'Angular',
 'Ansible',
 'API',
 'Arduino',
 'ASP.NET',
 'Atom',
 'Awesome Lists',
 'Amazon Web Services',
 'Azure',
 'Babel',
 'Bash',
 'Bitcoin',
 'Bootstrap',
 'Bot',
 'C',
 'Chrome',
 'Chrome extension',
 'Command line interface',
 'Clojure',
 'Code quality',
 'Code review',
 'Compiler',
 'Continuous integration',
 'COVID-19',
 'C++']

In [29]:
#storing all the topic description in a list

topic_descs=[]

for tag in topic_desc_tags:
    topic_descs.append(tag.text.strip())
    
topic_descs[:5]   

['3D modeling is the process of virtually developing the surface and structure of a 3D object.',
 'Ajax is a technique for creating interactive web applications.',
 'Algorithms are self-contained sequences that carry out a variety of tasks.',
 'Amp is a non-blocking concurrency library for PHP.',
 'Android is an operating system built by Google designed for mobile devices.']

In [30]:
#storing all the topic complete url in a list

topic_urls=[]
base_url="https://github.com"

for tag in topic_link_tags:
    topic_urls.append(base_url + tag['href'])
    
topic_urls    

['https://github.com/topics/3d',
 'https://github.com/topics/ajax',
 'https://github.com/topics/algorithm',
 'https://github.com/topics/amphp',
 'https://github.com/topics/android',
 'https://github.com/topics/angular',
 'https://github.com/topics/ansible',
 'https://github.com/topics/api',
 'https://github.com/topics/arduino',
 'https://github.com/topics/aspnet',
 'https://github.com/topics/atom',
 'https://github.com/topics/awesome',
 'https://github.com/topics/aws',
 'https://github.com/topics/azure',
 'https://github.com/topics/babel',
 'https://github.com/topics/bash',
 'https://github.com/topics/bitcoin',
 'https://github.com/topics/bootstrap',
 'https://github.com/topics/bot',
 'https://github.com/topics/c',
 'https://github.com/topics/chrome',
 'https://github.com/topics/chrome-extension',
 'https://github.com/topics/cli',
 'https://github.com/topics/clojure',
 'https://github.com/topics/code-quality',
 'https://github.com/topics/code-review',
 'https://github.com/topics/compil

In [31]:
# importing pandas to convert the information into a DataFrame

import pandas as pd

In [32]:
#storing topic title, description and url in a dictionary

topics_dict={'title':topic_titles,'description':topic_descs,'url':topic_urls}

In [33]:
# forming a table of topic title, description and url using pandas dataframe

topics_df=pd.DataFrame(topics_dict)
topics_df

|   | title | description | url |
|---|-------|-------------|-----|
| 0 | 3D | 3D modeling is the process of virtually develo... | https://github.com/topics/3d |
| 1 | Ajax | Ajax is a technique for creating interactive w... | https://github.com/topics/ajax |
| 2 | Algorithm | Algorithms are self-contained sequences that c... | https://github.com/topics/algorithm |
| 3 | Amp | Amp is a non-blocking concurrency library for ... | https://github.com/topics/amphp |
| 4 | Android | Android is an operating system built by Google... | https://github.com/topics/android |
| 5 | Angular | Angular is an open source web application plat... | https://github.com/topics/angular |
| 6 | Ansible | Ansible is a simple and powerful automation en... | https://github.com/topics/ansible |
| 7 | API | An API (Application Programming Interface) is ... | https://github.com/topics/api |
| 8 | Arduino | Arduino is an open source hardware and softwar... | https://github.com/topics/arduino |
| 9 | ASP.NET | ASP.NET is a web framework for building modern... | https://github.com/topics/aspnet |


#### Create CSV file(s) with the extracted information

In [34]:
topics_df.to_csv('topics.csv',index=None)

### Getting information out of a topic page

- Here we scrape information about the top repositories of a topic.
- For each repository we collect the username, the repository name, its URL and its number of stars.
- We first scrape the first topic, then repeat the process for all topics.

In [35]:
topic_page_url=topic_urls[0]
topic_page_url

'https://github.com/topics/3d'

In [36]:
response=requests.get(topic_page_url)

In [37]:
response.status_code

200

In [38]:
len(response.text)

641679

In [39]:
# Extracting and parsing the info of the first topic using BeautifulSoup

topic_doc=BeautifulSoup(response.text,'html.parser')

In [40]:
# getting the repositories tag for the topics

h3_selection_class='f3 color-fg-muted text-normal lh-condensed'
repo_tags=topic_doc.find_all('h3',{'class':h3_selection_class})

Let's inspect the repo tags and the 'a' tags they contain.

In [41]:
len(repo_tags)

30

In [42]:
repo_tags[0]

<h3 class="f3 color-fg-muted text-normal lh-condensed">
<a data-ga-click="Explore, go to repository owner, location:explore feed" data-hydro-click='{"event_type":"explore.click","payload":{"click_context":"REPOSITORY_CARD","click_target":"OWNER","click_visual_representation":"REPOSITORY_OWNER_HEADING","actor_id":null,"record_id":97088,"originating_url":"https://github.com/topics/3d","user_id":null}}' data-hydro-click-hmac="4bdbc49d3c05ae7f70b531fbce709a384200b0768554e0172950286a8db30940" data-turbo="false" data-view-component="true" href="/mrdoob">
            mrdoob
</a>
          /
          <a class="text-bold wb-break-word" data-ga-click="Explore, go to repository, location:explore feed" data-hydro-click='{"event_type":"explore.click","payload":{"click_context":"REPOSITORY_CARD","click_target":"REPOSITORY","click_visual_representation":"REPOSITORY_NAME_HEADING","actor_id":null,"record_id":576201,"originating_url":"https://github.com/topics/3d","user_id":null}}' data-hydro-click-hma

In [43]:
a_tags=repo_tags[0].find_all('a')

In [44]:
a_tags[0]

<a data-ga-click="Explore, go to repository owner, location:explore feed" data-hydro-click='{"event_type":"explore.click","payload":{"click_context":"REPOSITORY_CARD","click_target":"OWNER","click_visual_representation":"REPOSITORY_OWNER_HEADING","actor_id":null,"record_id":97088,"originating_url":"https://github.com/topics/3d","user_id":null}}' data-hydro-click-hmac="4bdbc49d3c05ae7f70b531fbce709a384200b0768554e0172950286a8db30940" data-turbo="false" data-view-component="true" href="/mrdoob">
            mrdoob
</a>

In [45]:
a_tags[0].text.strip()

'mrdoob'

In [46]:
a_tags[1].text.strip()

'three.js'

In [47]:
a_tags[1]

<a class="text-bold wb-break-word" data-ga-click="Explore, go to repository, location:explore feed" data-hydro-click='{"event_type":"explore.click","payload":{"click_context":"REPOSITORY_CARD","click_target":"REPOSITORY","click_visual_representation":"REPOSITORY_NAME_HEADING","actor_id":null,"record_id":576201,"originating_url":"https://github.com/topics/3d","user_id":null}}' data-hydro-click-hmac="517d3d5cb9d89752156923904a4238816bc9b51ab7772f3e3644ce897d8dd4e5" data-turbo="false" data-view-component="true" href="/mrdoob/three.js">
            three.js
</a>

In [48]:
a_tags[1]['href']

'/mrdoob/three.js'

In [49]:
repo_url=base_url + a_tags[1]['href']
print(repo_url)

https://github.com/mrdoob/three.js


Getting the star tags of repositories of the topic.

In [50]:
star_tags=topic_doc.find_all('span',{'class':'Counter js-social-count'})

Viewing some information about the star tags.

In [51]:
len(star_tags)

30

In [52]:
star_tags[0]

<span aria-label="82525 users starred this repository" class="Counter js-social-count" data-pjax-replace="true" data-plural-suffix="users starred this repository" data-singular-suffix="user starred this repository" data-view-component="true" id="repo-stars-counter-star" title="82,525">82.5k</span>

In [53]:
star_tags[1].text.strip()

'20.1k'

Defining a function to convert the star count text (such as '82.5k') into an integer.

In [54]:
def parse_star_count(stars_str):
    stars_str=stars_str.strip()
    if stars_str[-1] == 'k':
        return int(float(stars_str[:-1])*1000)
    return int(stars_str)

In [55]:
parse_star_count(star_tags[0].text.strip())

82500
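
The abbreviated text loses precision: the tag shown above also carries the exact count in its `title` attribute (`82,525`). The sketch below prefers that attribute when it is present and falls back to the short text otherwise. The name `parse_star_count_exact` is hypothetical, and it is an assumption that every star tag exposes a `title` in this form.

```python
def parse_star_count_exact(text, title=None):
    """Prefer the exact count from the title attribute; fall back to the short text."""
    if title:
        digits = title.replace(",", "").strip()
        if digits.isdigit():
            return int(digits)
    text = text.strip().lower()
    if text.endswith("k"):
        # round() avoids truncating a product that lands just below a whole number.
        return round(float(text[:-1]) * 1000)
    return int(text.replace(",", ""))

print(parse_star_count_exact("82.5k", "82,525"))  # 82525
print(parse_star_count_exact("82.5k"))            # 82500
```

In the notebook this could be called as `parse_star_count_exact(star_tags[0].text, star_tags[0].get('title'))`.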

Defining a function to get all the information about a repository: username, repository name, URL and number of stars.

In [56]:
def get_repo_info(h3_tag,star_tag):
    #returns all the required info about a repository
    a_tags=h3_tag.find_all('a')
    username=a_tags[0].text.strip()
    repo_name= a_tags[1].text.strip()
    repo_url= base_url + a_tags[1]['href']
    stars= parse_star_count(star_tag.text.strip())
    return username, repo_name, stars, repo_url
    

In [57]:
get_repo_info(repo_tags[0],star_tags[0])

('mrdoob', 'three.js', 82500, 'https://github.com/mrdoob/three.js')

Storing all the information of repositories of a topic in the dictionary form.

In [58]:
topic_repos_dict={'username':[],'repo_name':[],'stars':[],'repo_url':[]}

for i in range(len(repo_tags)):
    repo_info=get_repo_info(repo_tags[i],star_tags[i])
    topic_repos_dict['username'].append(repo_info[0])
    topic_repos_dict['repo_name'].append(repo_info[1])
    topic_repos_dict['stars'].append(repo_info[2])
    topic_repos_dict['repo_url'].append(repo_info[3])

In [59]:
topic_repos_dict

{'username': ['mrdoob',
  'libgdx',
  'pmndrs',
  'BabylonJS',
  'aframevr',
  'ssloy',
  'lettier',
  'FreeCAD',
  'metafizzy',
  'CesiumGS',
  'timzhang642',
  'a1studmuffin',
  'isl-org',
  'blender',
  'domlysz',
  'spritejs',
  'openscad',
  'tensorspace-team',
  'jagenjo',
  'YadiraF',
  'AaronJackson',
  'google',
  'ssloy',
  'mosra',
  'FyroxEngine',
  'gfxfundamentals',
  'tengbao',
  'cleardusk',
  'jasonlong',
  'cnr-isti-vclab'],
 'repo_name': ['three.js',
  'libgdx',
  'react-three-fiber',
  'Babylon.js',
  'aframe',
  'tinyrenderer',
  '3d-game-shaders-for-beginners',
  'FreeCAD',
  'zdog',
  'cesium',
  '3D-Machine-Learning',
  'SpaceshipGenerator',
  'Open3D',
  'blender',
  'BlenderGIS',
  'spritejs',
  'openscad',
  'tensorspace',
  'webglstudio.js',
  'PRNet',
  'vrn',
  'model-viewer',
  'tinyraytracer',
  'magnum',
  'Fyrox',
  'webgl-fundamentals',
  'vanta',
  '3DDFA',
  'isometric-contributions',
  'meshlab'],
 'stars': [82500,
  20100,
  18300,
  17500,
  1420

Converting the info dictionary into dataframe.

In [60]:
topic_repos_df=pd.DataFrame(topic_repos_dict)
topic_repos_df

|   | username | repo_name | stars | repo_url |
|---|----------|-----------|-------|----------|
| 0 | mrdoob | three.js | 82500 | https://github.com/mrdoob/three.js |
| 1 | libgdx | libgdx | 20100 | https://github.com/libgdx/libgdx |
| 2 | pmndrs | react-three-fiber | 18300 | https://github.com/pmndrs/react-three-fiber |
| 3 | BabylonJS | Babylon.js | 17500 | https://github.com/BabylonJS/Babylon.js |
| 4 | aframevr | aframe | 14200 | https://github.com/aframevr/aframe |
| 5 | ssloy | tinyrenderer | 13800 | https://github.com/ssloy/tinyrenderer |
| 6 | lettier | 3d-game-shaders-for-beginners | 13100 | https://github.com/lettier/3d-game-shaders-for... |
| 7 | FreeCAD | FreeCAD | 11400 | https://github.com/FreeCAD/FreeCAD |
| 8 | metafizzy | zdog | 9200 | https://github.com/metafizzy/zdog |
| 9 | CesiumGS | cesium | 8700 | https://github.com/CesiumGS/cesium |


### Final Code

Write a single function to:
1. Get the list of topics from the topics page
2. Get the list of top repos from the individual topic pages
3. For each topic, create a csv of the top repos for the topic

In [61]:
import os

def get_topic_page(topic_url):

    #Download the page
    response=requests.get(topic_url)
    
    #Check successful response
    if response.status_code !=200:
        raise Exception("Failed To Load Page {}".format(topic_url))
    
    #Parse using BeautifulSoup
    topic_doc = BeautifulSoup(response.text,'html.parser')
    
    return topic_doc

def get_repo_info(h3_tag,star_tag):
    #returns all the required info about a repository
    a_tags=h3_tag.find_all('a')
    username=a_tags[0].text.strip()
    repo_name= a_tags[1].text.strip()
    repo_url= base_url + a_tags[1]['href']
    stars= parse_star_count(star_tag.text.strip())
    return username, repo_name, stars, repo_url

def get_topic_repos(topic_doc):
    
    #Get the h3 tag containing repo title, repo URL and Username
    h3_selection_class='f3 color-fg-muted text-normal lh-condensed'
    repo_tags=topic_doc.find_all('h3',{'class':h3_selection_class})
    
    #Get star tags
    star_tags=topic_doc.find_all('span',{'class':'Counter js-social-count'})
    
    topic_repos_dict={'username':[],'repo_name':[],'stars':[],'repo_url':[] }
    
    #Get repo info
    for i in range(len(repo_tags)):
        repo_info=get_repo_info(repo_tags[i],star_tags[i])
        topic_repos_dict['username'].append(repo_info[0])
        topic_repos_dict['repo_name'].append(repo_info[1])
        topic_repos_dict['stars'].append(repo_info[2])
        topic_repos_dict['repo_url'].append(repo_info[3])
    
    return pd.DataFrame(topic_repos_dict)

def scrape_topic(topic_url, path):
    if os.path.exists(path):
        print("The file {} already exists. Skipping...".format(path))
        return
    topic_df = get_topic_repos(get_topic_page(topic_url))
    topic_df.to_csv(path, index=None)    

- The first function returns the title of each topic,
- the second returns the description of each topic,
- the third returns the URL of each topic page, and
- the last one builds and returns a DataFrame with all of this topic information.

In [62]:
def get_topic_titles(doc):
    selection_class = 'f3 lh-condensed mb-0 mt-1 Link--primary'
    topic_title_tags = doc.find_all('p', {'class': selection_class})
    topic_titles = []
    for tag in topic_title_tags:
        topic_titles.append(tag.text)
    return topic_titles

def get_topic_descs(doc):
    desc_selector='f5 color-fg-muted mb-0 mt-1'
    topic_desc_tags=doc.find_all('p',{'class':desc_selector})
    topic_descs=[]

    for tag in topic_desc_tags:
        topic_descs.append(tag.text.strip())
    return topic_descs    

def get_topic_urls(doc):
    topic_link_tags = doc.find_all('a', {'class': 'no-underline flex-1 d-flex flex-column'})
    topic_urls = []
    base_url = 'https://github.com'
    for tag in topic_link_tags:
        topic_urls.append(base_url + tag['href'])
    return topic_urls
    

def scrape_topics():
    topics_url = 'https://github.com/topics'
    response = requests.get(topics_url)
    if response.status_code != 200:
        raise Exception('Failed to load page {}'.format(topics_url))
    doc= BeautifulSoup(response.text,'html.parser')    
    topics_dict = {
        'title': get_topic_titles(doc),
        'description': get_topic_descs(doc),
        'url': get_topic_urls(doc)
    }
    return pd.DataFrame(topics_dict)

- The function below creates a folder named 'data' using the makedirs function of the os module.
- This folder stores one CSV file per topic.
- Each CSV file holds the information about the topic's top repositories.
- A CSV file that was generated earlier is skipped rather than regenerated.

In [63]:
def scrape_topics_repos():
    print('Scraping list of topics')
    topics_df = scrape_topics()
    
    os.makedirs('data', exist_ok=True)
    for index, row in topics_df.iterrows():
        print('Scraping top repositories for "{}"'.format(row['title']))
        scrape_topic(row['url'], 'data/{}.csv'.format(row['title']))
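
The topic title is used directly as the file name. None of the titles shown in this notebook cause trouble, but a title containing a path separator would be read as a directory. A minimal sketch of a guard is given below; the helper name `safe_filename`, the chosen character set and the `'untitled'` fallback are all assumptions.

```python
import re

def safe_filename(title):
    """Replace characters that are awkward in file names with underscores."""
    cleaned = re.sub(r'[\\/:*?"<>|]+', "_", title.strip())
    # Fall back to a placeholder if nothing usable is left.
    return cleaned or "untitled"

print(safe_filename("ASP.NET"))  # ASP.NET
print(safe_filename("CI/CD"))    # CI_CD (hypothetical title)
```

It could be applied when building the path, for example `'data/{}.csv'.format(safe_filename(row['title']))`.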

In [64]:
scrape_topics_repos()

Scraping list of topics
Scraping top repositories for "3D"
The file data/3D.csv already exists. Skipping...
Scraping top repositories for "Ajax"
The file data/Ajax.csv already exists. Skipping...
Scraping top repositories for "Algorithm"
The file data/Algorithm.csv already exists. Skipping...
Scraping top repositories for "Amp"
The file data/Amp.csv already exists. Skipping...
Scraping top repositories for "Android"
The file data/Android.csv already exists. Skipping...
Scraping top repositories for "Angular"
The file data/Angular.csv already exists. Skipping...
Scraping top repositories for "Ansible"
The file data/Ansible.csv already exists. Skipping...
Scraping top repositories for "API"
The file data/API.csv already exists. Skipping...
Scraping top repositories for "Arduino"
The file data/Arduino.csv already exists. Skipping...
Scraping top repositories for "ASP.NET"
The file data/ASP.NET.csv already exists. Skipping...
Scraping top repositories for "Atom"
The file data/Atom.csv alre

## Summary and Future Work

**Summary**
- In this project we first scraped the title, description and URL of each topic.
- Then, for each topic, we scraped the username, repository name, number of stars and URL of its top repositories.
- Finally, we stored this information in a folder containing one CSV file of repository information per topic.

**Future Work**
- So far we have scraped only the first page of topics and the first page of each topic's repositories; more pages remain to be scraped.
- In the coming days I will try to scrape the remaining topic pages and further pages of each topic's repositories.
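
One possible starting point for that work is to generate the URLs of the additional pages up front. The sketch below assumes that GitHub serves further results through a `page` query parameter (for example `?page=2`); that parameter, and the helper name `topic_page_urls`, are assumptions that should be checked against the live site before relying on them.

```python
from urllib.parse import urlencode

def topic_page_urls(topic_url, num_pages):
    """Build URLs for pages 1..num_pages of a topic (assumes a ?page=N parameter)."""
    return ["{}?{}".format(topic_url, urlencode({"page": page}))
            for page in range(1, num_pages + 1)]

for url in topic_page_urls("https://github.com/topics/3d", 2):
    print(url)
```

Each generated URL could then be passed to the existing page download and parsing functions, ideally with a short pause between requests.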

**References**
- The GitHub topics page scraped in this project: https://github.com/topics