### Top Repositories for GitHub Topics

### Objectives:

    A: Browse the GitHub Topics site and select the top topics to scrape.
    B: Identify the information to scrape from the site and decide the format of the output CSV file.
    C: Summarize the project idea and outline the strategy in a Jupyter notebook.


### Project Outline

- The site to scrape: https://github.com/topics
- Extract a list of topics from the site. For each topic, I'll extract the topic title, topic page URL and topic description.
- For each topic, I'll get the top 25 repositories in the topic from the topic page.
- For each repository, I'll grab the repo name, username, stars and repo URL.
- For each topic, I'll create a CSV file in the following format:

   Repo Name,Username,Stars,Repo URL 
   three.js,mrdoob,69700,https://github.com/mrdoob/three.js 
   libgdx,libgdx,18300,https://github.com/libgdx/libgdx
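
As a rough illustration of the output format above, the rows could be written with Python's built-in `csv` module. This is only a sketch: the `rows` list reuses the two sample lines from the outline, and the in-memory buffer stands in for a real file.

```python
import csv
import io

# Sample rows copied from the format outlined above
header = ['Repo Name', 'Username', 'Stars', 'Repo URL']
rows = [
    ['three.js', 'mrdoob', 69700, 'https://github.com/mrdoob/three.js'],
    ['libgdx', 'libgdx', 18300, 'https://github.com/libgdx/libgdx'],
]

# Write to an in-memory buffer; a real run would open a .csv file instead
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(header)
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Using a CSV writer rather than joining strings by hand means that a value containing a comma would be quoted automatically.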


### Using the requests library to download web pages

In [6]:
!pip install requests --upgrade --quiet




In [1]:
import requests

In [2]:
topics_url = 'https://github.com/topics'

In [3]:
response = requests.get(topics_url)

In [4]:
# Verifying the HTTP status code of the response
response.status_code

200

In [5]:
#getting the length of the webpage content
len(response.text)

185626

In [6]:
page_contents = response.text

In [7]:
# Viewing the first 1000 characters of the page content
page_contents[:1000]

'\n\n<!DOCTYPE html>\n<html\n  lang="en"\n  \n  data-color-mode="auto" data-light-theme="light" data-dark-theme="dark"\n  data-a11y-animated-images="system" data-a11y-link-underlines="true"\n  >\n\n\n\n\n  <head>\n    <meta charset="utf-8">\n  <link rel="dns-prefetch" href="https://github.githubassets.com">\n  <link rel="dns-prefetch" href="https://avatars.githubusercontent.com">\n  <link rel="dns-prefetch" href="https://github-cloud.s3.amazonaws.com">\n  <link rel="dns-prefetch" href="https://user-images.githubusercontent.com/">\n  <link rel="preconnect" href="https://github.githubassets.com" crossorigin>\n  <link rel="preconnect" href="https://avatars.githubusercontent.com">\n\n  \n\n  <link crossorigin="anonymous" media="all" rel="stylesheet" href="https://github.githubassets.com/assets/light-0eace2597ca3.css" /><link crossorigin="anonymous" media="all" rel="stylesheet" href="https://github.githubassets.com/assets/dark-a167e256da9c.css" /><link data-color-theme="dark_dimmed" crossor

In [29]:
# Saving the full page contents to an HTML file
with open('webpage.html', 'w', encoding='utf-8') as f:
    f.write(page_contents)

### Using Beautiful Soup to parse and extract information

In [30]:
!pip install beautifulsoup4 --upgrade --quiet




In [8]:
from bs4 import BeautifulSoup

In [9]:
doc = BeautifulSoup(page_contents, 'html.parser')

In [10]:
type(doc)

bs4.BeautifulSoup

In [11]:
# The topic titles sit inside <p> tags,
# so start by searching for all <p> tags

topic_title_tags = doc.find_all('p')

In [12]:
len(topic_title_tags)

69

In [13]:
#slicing the first 5 p_tags
topic_title_tags[:5]

[<p>We read every piece of feedback, and take your input very seriously.</p>,
 <p class="text-small color-fg-muted">
             To see all available qualifiers, see our <a class="Link--inTextBlock" href="https://docs.github.com/search-github/github-code-search/understanding-github-code-search-syntax">documentation</a>.
           </p>,
 <p class="f4 color-fg-muted col-md-6 mx-auto">Browse popular topics on GitHub.</p>,
 <p class="f3 lh-condensed text-center Link--primary mb-0 mt-1">
         Firefox
       </p>,
 <p class="f5 color-fg-muted text-center mb-0 mt-1">Firefox is an open source web browser from Mozilla.</p>]

In [14]:
selection_class = 'f3 lh-condensed mb-0 mt-1 Link--primary'
topic_title_tags = doc.find_all('p',{'class':selection_class})

In [15]:
len(topic_title_tags)

30

In [16]:
topic_title_tags[:5]

[<p class="f3 lh-condensed mb-0 mt-1 Link--primary">3D</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Ajax</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Algorithm</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Amp</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Android</p>]
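
Passing the whole class string to `find_all` matches the `class` attribute as one exact string. That explains why the Firefox tag seen earlier, which carries an extra `text-center` class in a different position, is not among the 30 results. The sketch below contrasts this with a CSS selector, which matches on individual classes in any order. The HTML snippet is a hypothetical two-tag sample, not the real page.

```python
from bs4 import BeautifulSoup

# Two sample <p> tags that share several classes but differ in order and extras
sample_html = '''
<p class="f3 lh-condensed mb-0 mt-1 Link--primary">3D</p>
<p class="f3 lh-condensed text-center Link--primary mb-0 mt-1">Firefox</p>
'''
sample_doc = BeautifulSoup(sample_html, 'html.parser')

# Exact class-string match: only the first tag has this exact attribute value
exact = sample_doc.find_all('p', {'class': 'f3 lh-condensed mb-0 mt-1 Link--primary'})

# CSS selector: matches any <p> that has all three listed classes, in any order
by_selector = sample_doc.select('p.f3.lh-condensed.Link--primary')

print([tag.text.strip() for tag in exact])
print([tag.text.strip() for tag in by_selector])
```

Which behaviour is preferable depends on the goal: the exact string conveniently excludes the featured topic here, while a selector is less sensitive to class reordering in the page markup.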

##### Extracting the topic descriptions from the topic description tags

In [17]:
#Extracting description
desc_selector = 'f5 color-fg-muted mb-0 mt-1'
topic_desc_tags = doc.find_all('p', {'class':desc_selector})

In [18]:
topic_desc_tags[:5]

[<p class="f5 color-fg-muted mb-0 mt-1">
           3D refers to the use of three-dimensional graphics, modeling, and animation in various industries.
         </p>,
 <p class="f5 color-fg-muted mb-0 mt-1">
           Ajax is a technique for creating interactive web applications.
         </p>,
 <p class="f5 color-fg-muted mb-0 mt-1">
           Algorithms are self-contained sequences that carry out a variety of tasks.
         </p>,
 <p class="f5 color-fg-muted mb-0 mt-1">
           Amp is a non-blocking concurrency library for PHP.
         </p>,
 <p class="f5 color-fg-muted mb-0 mt-1">
           Android is an operating system built by Google designed for mobile devices.
         </p>]

In [19]:
# Extracting a list of the topic descriptions from topic_desc_tags using a for loop

topic_descs = []

for tag in topic_desc_tags:
    topic_descs.append(tag.text.strip())  # .text returns the tag's text content; .strip() removes surrounding whitespace
print(topic_descs[:5])

['3D refers to the use of three-dimensional graphics, modeling, and animation in various industries.', 'Ajax is a technique for creating interactive web applications.', 'Algorithms are self-contained sequences that carry out a variety of tasks.', 'Amp is a non-blocking concurrency library for PHP.', 'Android is an operating system built by Google designed for mobile devices.']


In [20]:
topic_descs[:5]

['3D refers to the use of three-dimensional graphics, modeling, and animation in various industries.',
 'Ajax is a technique for creating interactive web applications.',
 'Algorithms are self-contained sequences that carry out a variety of tasks.',
 'Amp is a non-blocking concurrency library for PHP.',
 'Android is an operating system built by Google designed for mobile devices.']

##### Extracting the href links to the topic pages
###### Step 1: Finding the topic link tags

In [21]:
topic_link_tags = doc.find_all('a', {'class':'no-underline flex-grow-0'})

In [22]:
len(topic_link_tags)

30

In [23]:
topic_link_tags[0]['href']

'/topics/3d'

###### Step 2: Prepending the GitHub base URL to the href to create the full topic URL.

In [24]:
topic0_url = "https://github.com" + topic_link_tags[0]['href']
print(topic0_url)

https://github.com/topics/3d


###### Step 3: Creating a list of topic URLs from the topic_link_tags using a for loop. This generates the full URL for each topic.

In [25]:
topic_urls = []
base_url = "https://github.com"

for tag in topic_link_tags:
    topic_urls.append(base_url + tag['href'])
print(topic_urls)
    

['https://github.com/topics/3d', 'https://github.com/topics/ajax', 'https://github.com/topics/algorithm', 'https://github.com/topics/amphp', 'https://github.com/topics/android', 'https://github.com/topics/angular', 'https://github.com/topics/ansible', 'https://github.com/topics/api', 'https://github.com/topics/arduino', 'https://github.com/topics/aspnet', 'https://github.com/topics/atom', 'https://github.com/topics/awesome', 'https://github.com/topics/aws', 'https://github.com/topics/azure', 'https://github.com/topics/babel', 'https://github.com/topics/bash', 'https://github.com/topics/bitcoin', 'https://github.com/topics/bootstrap', 'https://github.com/topics/bot', 'https://github.com/topics/c', 'https://github.com/topics/chrome', 'https://github.com/topics/chrome-extension', 'https://github.com/topics/cli', 'https://github.com/topics/clojure', 'https://github.com/topics/code-quality', 'https://github.com/topics/code-review', 'https://github.com/topics/compiler', 'https://github.com/t
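
String concatenation works here because every href starts with a slash. As an alternative sketch, the standard library's `urljoin` resolves a relative href against a base URL. The `hrefs` list below is a short sample taken from the output above, and `topic_urls_sketch` is a hypothetical name used only for this example.

```python
from urllib.parse import urljoin

base_url = 'https://github.com'
hrefs = ['/topics/3d', '/topics/ajax']  # sample hrefs as seen in the output above

# urljoin resolves each relative href against the base URL
topic_urls_sketch = [urljoin(base_url, href) for href in hrefs]
print(topic_urls_sketch)
```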

##### Extracting topic titles from the topic_title_tags 

In [26]:
topic_title_tags[:5]

[<p class="f3 lh-condensed mb-0 mt-1 Link--primary">3D</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Ajax</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Algorithm</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Amp</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Android</p>]

In [27]:
#extracting a list of the topic titles from the topic_title_tags using a for loop 
topic_titles = []

for tag in topic_title_tags:
    topic_titles.append(tag.text)
    
print(topic_titles) 

['3D', 'Ajax', 'Algorithm', 'Amp', 'Android', 'Angular', 'Ansible', 'API', 'Arduino', 'ASP.NET', 'Atom', 'Awesome Lists', 'Amazon Web Services', 'Azure', 'Babel', 'Bash', 'Bitcoin', 'Bootstrap', 'Bot', 'C', 'Chrome', 'Chrome extension', 'Command line interface', 'Clojure', 'Code quality', 'Code review', 'Compiler', 'Continuous integration', 'COVID-19', 'C++']
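
Because titles, descriptions and URLs are collected by three separate searches, they only line up if each search returns the same number of items in the same order. A small sanity check before combining them might look like the sketch below. The sample lists are shortened copies of the outputs above, and `combine_topics` is a hypothetical helper, not part of the notebook.

```python
def combine_topics(titles, descs, urls):
    """Zip three parallel lists into one list of records, after checking lengths."""
    if not (len(titles) == len(descs) == len(urls)):
        raise ValueError('Mismatched lengths: {} titles, {} descriptions, {} URLs'.format(
            len(titles), len(descs), len(urls)))
    return [
        {'Title': t, 'Description': d, 'Link': u}
        for t, d, u in zip(titles, descs, urls)
    ]

sample_titles = ['3D', 'Ajax']
sample_descs = [
    '3D refers to the use of three-dimensional graphics, modeling, and animation in various industries.',
    'Ajax is a technique for creating interactive web applications.',
]
sample_urls = ['https://github.com/topics/3d', 'https://github.com/topics/ajax']

records = combine_topics(sample_titles, sample_descs, sample_urls)
print(records[0])
```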


### Using Pandas to Create a DataFrame from the Extracted Data and Saving it to a CSV File

In [28]:
import pandas as pd

In [29]:
topics_dict = {
    'Title':topic_titles,
    'Description':topic_descs,
    'Link':topic_urls,
    
}

In [30]:
topics_df = pd.DataFrame(topics_dict)

In [159]:
topics_df.head()

Unnamed: 0,Title,Description,Link
0,3D,3D refers to the use of three-dimensional grap...,https://github.com/topics/3d
1,Ajax,Ajax is a technique for creating interactive w...,https://github.com/topics/ajax
2,Algorithm,Algorithms are self-contained sequences that c...,https://github.com/topics/algorithm
3,Amp,Amp is a non-blocking concurrency library for ...,https://github.com/topics/amphp
4,Android,Android is an operating system built by Google...,https://github.com/topics/android


In [141]:
# Saving the created dataframe to a csv file
topics_df.to_csv('topics.csv', index=None)

### Getting information out of a topic page

###### Task 1: Scraping the username and repository URL from the first topic page

In [32]:
topic_page_url = topic_urls[0]

In [33]:
topic_page_url

'https://github.com/topics/3d'

In [34]:
response = requests.get(topic_page_url)

In [35]:
#checking if the response is successful
response.status_code

200

In [36]:
len(response.text)

503593

In [37]:
topic_doc = BeautifulSoup(response.text, 'html.parser')

In [38]:
h3_selection_class = 'f3 color-fg-muted text-normal lh-condensed'
repo_tags = topic_doc.find_all('h3', {'class':h3_selection_class} )

In [39]:
repo_tags[0]

<h3 class="f3 color-fg-muted text-normal lh-condensed">
<a class="Link" data-hydro-click='{"event_type":"explore.click","payload":{"click_context":"REPOSITORY_CARD","click_target":"OWNER","click_visual_representation":"REPOSITORY_OWNER_HEADING","actor_id":null,"record_id":97088,"originating_url":"https://github.com/topics/3d","user_id":null}}' data-hydro-click-hmac="c72fbd5c69a8ee7c9c53a4e65de2b93c8fc7552dd793945819639bc165c0f0ba" data-turbo="false" data-view-component="true" href="/mrdoob">
            mrdoob
</a>          /
          <a class="Link text-bold wb-break-word" data-hydro-click='{"event_type":"explore.click","payload":{"click_context":"REPOSITORY_CARD","click_target":"REPOSITORY","click_visual_representation":"REPOSITORY_NAME_HEADING","actor_id":null,"record_id":576201,"originating_url":"https://github.com/topics/3d","user_id":null}}' data-hydro-click-hmac="4a2667db3d63a1739c412e059e5da95afe419df83f70949b5d59dc3478f5c79a" data-turbo="false" data-view-component="true" href

In [40]:
len(repo_tags)

20

In [41]:
a_tags = repo_tags[0].find_all('a')
a_tags 

[<a class="Link" data-hydro-click='{"event_type":"explore.click","payload":{"click_context":"REPOSITORY_CARD","click_target":"OWNER","click_visual_representation":"REPOSITORY_OWNER_HEADING","actor_id":null,"record_id":97088,"originating_url":"https://github.com/topics/3d","user_id":null}}' data-hydro-click-hmac="c72fbd5c69a8ee7c9c53a4e65de2b93c8fc7552dd793945819639bc165c0f0ba" data-turbo="false" data-view-component="true" href="/mrdoob">
             mrdoob
 </a>,
 <a class="Link text-bold wb-break-word" data-hydro-click='{"event_type":"explore.click","payload":{"click_context":"REPOSITORY_CARD","click_target":"REPOSITORY","click_visual_representation":"REPOSITORY_NAME_HEADING","actor_id":null,"record_id":576201,"originating_url":"https://github.com/topics/3d","user_id":null}}' data-hydro-click-hmac="4a2667db3d63a1739c412e059e5da95afe419df83f70949b5d59dc3478f5c79a" data-turbo="false" data-view-component="true" href="/mrdoob/three.js">
             three.js
 </a>]

In [42]:
a_tags[0]

<a class="Link" data-hydro-click='{"event_type":"explore.click","payload":{"click_context":"REPOSITORY_CARD","click_target":"OWNER","click_visual_representation":"REPOSITORY_OWNER_HEADING","actor_id":null,"record_id":97088,"originating_url":"https://github.com/topics/3d","user_id":null}}' data-hydro-click-hmac="c72fbd5c69a8ee7c9c53a4e65de2b93c8fc7552dd793945819639bc165c0f0ba" data-turbo="false" data-view-component="true" href="/mrdoob">
            mrdoob
</a>

In [43]:
#Scraping the username
a_tags[0].text.strip()

'mrdoob'

In [44]:
a_tags[1]

<a class="Link text-bold wb-break-word" data-hydro-click='{"event_type":"explore.click","payload":{"click_context":"REPOSITORY_CARD","click_target":"REPOSITORY","click_visual_representation":"REPOSITORY_NAME_HEADING","actor_id":null,"record_id":576201,"originating_url":"https://github.com/topics/3d","user_id":null}}' data-hydro-click-hmac="4a2667db3d63a1739c412e059e5da95afe419df83f70949b5d59dc3478f5c79a" data-turbo="false" data-view-component="true" href="/mrdoob/three.js">
            three.js
</a>

In [45]:
#scraping the repository name
a_tags[1].text.strip()

'three.js'

In [46]:
#scraping the href
a_tags[1]['href']

'/mrdoob/three.js'

In [47]:
#creating the repo url from the above scraped data
base_url = "https://github.com"
repo_url = base_url + a_tags[1]['href']
print(repo_url)

https://github.com/mrdoob/three.js


##### Scraping the number of stars for the repository

In [48]:
star_tags = topic_doc.find_all('span', id='repo-stars-counter-star')

In [49]:
len(star_tags)

20

In [73]:
star_tags[0].text

'97.6k'

In [74]:
# Converting the star count string (e.g. '97.6k') to an integer
def parse_star_count(stars_str):
    stars_str = stars_str.strip()
    if stars_str[-1] == 'k':  # a trailing 'k' means thousands
        return int(float(stars_str[:-1]) * 1000)  # drop the 'k' with stars_str[:-1], then scale by 1000
    return int(stars_str)
        

In [75]:
print(parse_star_count(star_tags[0].text))

97600
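
The parser above handles the `k` suffix seen on this page. A slightly more defensive variant is sketched below. It assumes that counts might also appear with an `m` suffix or with thousands separators; neither format appears in the output above, so both are assumptions. `parse_star_count_sketch` is a hypothetical name.

```python
def parse_star_count_sketch(stars_str):
    """Convert strings like '97.6k', '1.2m' or '1,234' to an integer."""
    # Assumed formats: optional thousands separators, optional k/m suffix
    cleaned = stars_str.strip().lower().replace(',', '')
    multipliers = {'k': 1_000, 'm': 1_000_000}
    suffix = cleaned[-1]
    if suffix in multipliers:
        # round() guards against floating-point results landing just below a whole number
        return int(round(float(cleaned[:-1]) * multipliers[suffix]))
    return int(cleaned)

print(parse_star_count_sketch('97.6k'))
```

An empty string would still raise an `IndexError` here, just as in the original function, so callers would need to handle missing star counts separately.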


##### Creating a function that returns all required data for a specific repo.

In [103]:
def get_rep_info(h3_tag, star_tag):
    #returns all the required information about a repository
    a_tags = h3_tag.find_all('a')
    username = a_tags[0].text.strip()
    repo_name = a_tags[1].text.strip()
    repo_url = base_url + a_tags[1]['href']
    stars = parse_star_count(star_tag.text.strip())
    return username, repo_name, stars, repo_url

In [99]:
get_rep_info(repo_tags[0], star_tags[0])

('mrdoob', 'three.js', 97600, 'https://github.com/mrdoob/three.js')

##### Returning Scraped Data for all top Repos for the topic 3D using a for loop 

In [100]:
topic_repos_dict = {
    'username':[],
    'repo_name':[],
    'stars':[],
    'repo_url':[]
}


for i in range (len(repo_tags)):
    repo_info = get_rep_info(repo_tags[i], star_tags[i])
    topic_repos_dict['username'].append(repo_info[0])
    topic_repos_dict['repo_name'].append(repo_info[1])
    topic_repos_dict['stars'].append(repo_info[2])
    topic_repos_dict['repo_url'].append(repo_info[3])

###### Converting to a DataFrame

In [155]:
topic_repos_df = pd.DataFrame(topic_repos_dict)
topic_repos_df.head(11)

Unnamed: 0,username,repo_name,stars,repo_url
0,mrdoob,three.js,97600,https://github.com/mrdoob/three.js
1,pmndrs,react-three-fiber,25300,https://github.com/pmndrs/react-three-fiber
2,libgdx,libgdx,22500,https://github.com/libgdx/libgdx
3,BabylonJS,Babylon.js,22100,https://github.com/BabylonJS/Babylon.js
4,ssloy,tinyrenderer,18900,https://github.com/ssloy/tinyrenderer
5,FreeCAD,FreeCAD,16900,https://github.com/FreeCAD/FreeCAD
6,lettier,3d-game-shaders-for-beginners,16800,https://github.com/lettier/3d-game-shaders-for...
7,aframevr,aframe,16000,https://github.com/aframevr/aframe
8,CesiumGS,cesium,11600,https://github.com/CesiumGS/cesium
9,blender,blender,10900,https://github.com/blender/blender


#### General Code for Extracting Data for all Topics (Combined code).

In [204]:
import os

def get_topic_page(topic_url):
    # download the page
    response = requests.get(topic_url)
    # Checking the status of the page (response)
    if response.status_code != 200:
        raise Exception('Failed to load page {}'.format(topic_url))
    #parse using beautifulSoup
    topic_doc = BeautifulSoup(response.text, 'html.parser')
    return topic_doc

def get_rep_info(h3_tag, star_tag):
    #returns all the required information about a repository
    a_tags = h3_tag.find_all('a')
    username = a_tags[0].text.strip()
    repo_name = a_tags[1].text.strip()
    repo_url = base_url + a_tags[1]['href']
    stars = parse_star_count(star_tag.text.strip())
    return username, repo_name, stars, repo_url



def get_topic_repos(topic_doc):
    # Get the h3 tags containing repo title, repo URL and username
    h3_selection_class = 'f3 color-fg-muted text-normal lh-condensed'
    repo_tags = topic_doc.find_all('h3', {'class':h3_selection_class} )
     # Get star tags
    star_tags = topic_doc.find_all('span', id='repo-stars-counter-star')
    
    topic_repos_dict = {
        'username':[],
        'repo_name':[],
        'stars':[],
        'repo_url':[]
        }
    
    # Get repo info
    for i in range (len(repo_tags)):
        repo_info = get_rep_info(repo_tags[i], star_tags[i])
        topic_repos_dict['username'].append(repo_info[0])
        topic_repos_dict['repo_name'].append(repo_info[1])
        topic_repos_dict['stars'].append(repo_info[2])
        topic_repos_dict['repo_url'].append(repo_info[3])
    
    return pd.DataFrame(topic_repos_dict)
    
    
    
def scrape_topic(topic_url, path):
    if os.path.exists(path): # Checking if a file exists so that it can be skipped and not be re-downloaded 
        print("The file {} already exists. Skipping...".format(path))
        return
    topic_df = get_topic_repos(get_topic_page(topic_url))
    topic_df.to_csv(path, index=None)
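
The skip-if-exists check in `scrape_topic` can be tried out without touching the network. The sketch below isolates that pattern in a hypothetical helper, `save_once`, and runs it in a temporary directory. The file name and contents are placeholders.

```python
import os
import tempfile

def save_once(path, text):
    """Write text to path unless the file already exists; return True if written."""
    if os.path.exists(path):
        print('The file {} already exists. Skipping...'.format(path))
        return False
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    return True

# Demonstrate in a throwaway directory so nothing real is overwritten
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, '3D.csv')
    first = save_once(demo_path, 'username,repo_name,stars,repo_url\n')
    second = save_once(demo_path, 'this text is never written\n')

print(first, second)
```

One consequence of this design is that stale files are never refreshed; deleting the file (or the whole folder) is the way to force a new download.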

##### Getting Data for the fifth topic (index 4, Android) using the code above

In [173]:
url4 = topic_urls[4]

In [174]:
url4

'https://github.com/topics/android'

In [175]:
topic4_doc = get_topic_page(url4)

In [176]:
topic4_repos = get_topic_repos(topic4_doc)

In [177]:
topic4_repos.head()

Unnamed: 0,username,repo_name,stars,repo_url
0,flutter,flutter,160000,https://github.com/flutter/flutter
1,facebook,react-native,115000,https://github.com/facebook/react-native
2,justjavac,free-programming-books-zh_CN,108000,https://github.com/justjavac/free-programming-...
3,Genymobile,scrcpy,98600,https://github.com/Genymobile/scrcpy
4,Hack-with-Github,Awesome-Hacking,75000,https://github.com/Hack-with-Github/Awesome-Ha...


#### Chaining the two functions in a single expression to generate the same table as above

In [178]:
get_topic_repos(get_topic_page(topic_urls[4])).head()

Unnamed: 0,username,repo_name,stars,repo_url
0,flutter,flutter,160000,https://github.com/flutter/flutter
1,facebook,react-native,115000,https://github.com/facebook/react-native
2,justjavac,free-programming-books-zh_CN,108000,https://github.com/justjavac/free-programming-...
3,Genymobile,scrcpy,98600,https://github.com/Genymobile/scrcpy
4,Hack-with-Github,Awesome-Hacking,75000,https://github.com/Hack-with-Github/Awesome-Ha...


In [179]:
topic_urls[5]

'https://github.com/topics/angular'

In [180]:
get_topic_repos(get_topic_page(topic_urls[5])).head()

Unnamed: 0,username,repo_name,stars,repo_url
0,justjavac,free-programming-books-zh_CN,108000,https://github.com/justjavac/free-programming-...
1,angular,angular,93700,https://github.com/angular/angular
2,storybookjs,storybook,82100,https://github.com/storybookjs/storybook
3,leonardomso,33-js-concepts,61300,https://github.com/leonardomso/33-js-concepts
4,ionic-team,ionic-framework,50200,https://github.com/ionic-team/ionic-framework



Write a single function to:
1. Get the list of topics from the topics page
2. Get the list of top repos from the individual topic pages
3. For each topic, create a CSV of the top repos for the topic


In [188]:
def get_topic_titles(doc):
    selection_class = 'f3 lh-condensed mb-0 mt-1 Link--primary'
    topic_title_tags = doc.find_all('p', {'class': selection_class})
    topic_titles = []
    for tag in topic_title_tags:
        topic_titles.append(tag.text)
    return topic_titles

def get_topic_descs(doc):
    desc_selector = 'f5 color-fg-muted mb-0 mt-1'
    topic_desc_tags = doc.find_all('p', {'class': desc_selector})
    topic_descs = []
    for tag in topic_desc_tags:
        topic_descs.append(tag.text.strip())
    return topic_descs

def get_topic_urls(doc):
    topic_link_tags = doc.find_all('a', {'class': 'no-underline flex-grow-0'})
    topic_urls = []
    base_url = 'https://github.com'
    for tag in topic_link_tags:
        topic_urls.append(base_url + tag['href'])
    return topic_urls
    

def scrape_topics():
    topics_url = 'https://github.com/topics'
    response = requests.get(topics_url)
    if response.status_code != 200:
        raise Exception('Failed to load page {}'.format(topics_url))
    # Parse the freshly downloaded page instead of relying on a global `doc`
    doc = BeautifulSoup(response.text, 'html.parser')
    topics_dict = {
        'title': get_topic_titles(doc),
        'description': get_topic_descs(doc),
        'url': get_topic_urls(doc)
    }
    topics_df = pd.DataFrame(topics_dict)
    return topics_df


In [189]:
titles = get_topic_titles(doc)
descriptions = get_topic_descs(doc)
urls = get_topic_urls(doc)

print(len(titles), len(descriptions), len(urls))


30 30 30


In [203]:
# Scrape the list of topics, then scrape the top repositories for each topic.
def scrape_topics_repos():
    print('Scraping list of topics')
    topics_df = scrape_topics()
    #Creating a folder / directory named 'data' to save the scraped files
    os.makedirs('data', exist_ok=True)
    for index, row in topics_df.iterrows(): # Iterating over rows 
        print('Scraping top repositories for "{}"'.format(row['title']))
        scrape_topic(row['url'], 'data/{}.csv'.format(row['title']))
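
File names are built from topic titles, which can contain spaces or punctuation (for example 'Command line interface' or 'C++'). One possible alternative, sketched below, is to name each file after the last segment of the topic URL. `csv_path_for_topic` is a hypothetical helper and the default `folder` value is an assumption that mirrors the `data` directory used above.

```python
from urllib.parse import urlparse

def csv_path_for_topic(topic_url, folder='data'):
    """Build a CSV path from the last segment of a topic URL."""
    # Take the URL path, drop any trailing slash, and keep the final segment
    slug = urlparse(topic_url).path.rstrip('/').rsplit('/', 1)[-1]
    return '{}/{}.csv'.format(folder, slug)

print(csv_path_for_topic('https://github.com/topics/chrome-extension'))
```

Since every topic page has its own URL, the resulting names stay distinct, whereas stripping punctuation from titles could map two topics to the same file.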

In [202]:
scrape_topics_repos()

Scraping list of topics
Scraping top repositories for "3D"
The file data/3D.csv already exists. Skipping...
Scraping top repositories for "Ajax"
The file data/Ajax.csv already exists. Skipping...
Scraping top repositories for "Algorithm"
The file data/Algorithm.csv already exists. Skipping...
Scraping top repositories for "Amp"
The file data/Amp.csv already exists. Skipping...
Scraping top repositories for "Android"
The file data/Android.csv already exists. Skipping...
Scraping top repositories for "Angular"
The file data/Angular.csv already exists. Skipping...
Scraping top repositories for "Ansible"
The file data/Ansible.csv already exists. Skipping...
Scraping top repositories for "API"
The file data/API.csv already exists. Skipping...
Scraping top repositories for "Arduino"
The file data/Arduino.csv already exists. Skipping...
Scraping top repositories for "ASP.NET"
The file data/ASP.NET.csv already exists. Skipping...
Scraping top repositories for "Atom"
The file data/Atom.csv alre

### Document and Share the Work