# TOPIC - Top repositories for GitHub trending topics

## PROJECT OUTLINE


##### We're going to scrape https://github.com/topics.
##### We'll get a list of topics. For each topic, we'll get the topic title, the topic page URL, and the topic description.

In [None]:
!pip install requests --upgrade --quiet

In [2]:
import requests

In [3]:
topics_url = 'https://github.com/topics'

In [4]:
response = requests.get(topics_url)

In [5]:
response.status_code

200

In [6]:
page_contents = response.text
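The status code above is checked by eye. As a minimal sketch, the same fetch can be wrapped in a function that fails loudly when the response is not 200. The name `fetch_page`, the injectable `get` parameter, and the 10-second timeout are illustrative choices, not part of the original notebook.

```python
import requests


def fetch_page(url, get=requests.get, timeout=10):
    """Return the HTML text at `url`, raising if the status code is not 200.

    `get` defaults to requests.get; it is a parameter only so that the
    function can be exercised without network access (hypothetical design).
    """
    response = get(url, timeout=timeout)
    if response.status_code != 200:
        raise RuntimeError(f'Failed to load {url}: status {response.status_code}')
    return response.text
```

With this helper, `page_contents = fetch_page(topics_url)` would replace the separate request, status check, and `.text` steps.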

### USE BEAUTIFULSOUP TO PARSE AND EXTRACT INFORMATION

In [7]:
!pip install beautifulsoup4 --upgrade --quiet

In [8]:
from bs4 import BeautifulSoup

In [9]:
doc = BeautifulSoup(page_contents, 'html.parser')

### EXTRACTING TOPIC TITLE TAGS

In [10]:
selection_class = 'f3 lh-condensed mb-0 mt-1 Link--primary'

In [11]:
topic_title_tags = doc.find_all('p', {'class': selection_class})

In [12]:
len(topic_title_tags)

30

In [13]:
topic_title_tags[:4]

[<p class="f3 lh-condensed mb-0 mt-1 Link--primary">3D</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Ajax</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Algorithm</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Amp</p>]
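Passing the full class string to `find_all` matches only tags whose `class` attribute is exactly that string, so the lookup can break if the page lists the same classes in a different order. A small sketch of the difference, using made-up sample markup rather than the real page, and a CSS selector via `select` as one possible alternative:

```python
from bs4 import BeautifulSoup

# Made-up sample markup: the same three classes, written in two different orders.
sample_html = '''
<p class="f3 lh-condensed mb-0">3D</p>
<p class="lh-condensed f3 mb-0">Ajax</p>
'''
sample_doc = BeautifulSoup(sample_html, 'html.parser')

# The full class string matches only the tag whose class attribute is exactly that string.
exact_matches = sample_doc.find_all('p', {'class': 'f3 lh-condensed mb-0'})

# A CSS selector matches any <p> carrying all of the listed classes, in any order.
css_matches = sample_doc.select('p.f3.lh-condensed')

print(len(exact_matches), len(css_matches))
```

Which classes are stable enough to select on is an assumption that has to be checked against the live page.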

### EXTRACTING TOPIC DESCRIPTION

In [14]:
description_selector = 'f5 color-fg-muted mb-0 mt-1'

In [42]:
topic_desc_tags = doc.find_all('p', {'class': description_selector})


In [46]:
description_texts = [tag.text.strip() for tag in topic_desc_tags]

In [47]:
len(description_texts)

30

In [48]:
description_texts[:4]

['3D refers to the use of three-dimensional graphics, modeling, and animation in various industries.',
 'Ajax is a technique for creating interactive web applications.',
 'Algorithms are self-contained sequences that carry out a variety of tasks.',
 'Amp is a non-blocking concurrency library for PHP.']
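The titles and descriptions are matched up purely by position, which is why both lists are checked to have 30 entries. A minimal sketch of making that check explicit is shown here; `pair_topics` is a hypothetical helper name, not something from the notebook.

```python
def pair_topics(titles, descriptions):
    """Pair titles with descriptions by position, refusing mismatched lengths."""
    if len(titles) != len(descriptions):
        raise ValueError(
            f'{len(titles)} titles but {len(descriptions)} descriptions'
        )
    return list(zip(titles, descriptions))


# Two entries taken from the outputs above, used as sample input.
sample_pairs = pair_topics(
    ['Ajax', 'Amp'],
    ['Ajax is a technique for creating interactive web applications.',
     'Amp is a non-blocking concurrency library for PHP.'],
)
print(sample_pairs[0])
```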

### EXTRACTING TOPIC LINKS
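##### We don't extract the links from the page directly. Every topic URL starts with the same base, "https://github.com/topics/", and only the last part differs from topic to topic.

##### Here we use the topic title as that last part, so each URL is the base URL followed by the title. This assumes the title matches the end of the real URL, which may not hold for titles with spaces or symbols such as 'Awesome Lists' or 'C++'.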

In [23]:
# Collect the visible text of every title tag
topic_titles = [tag.text.strip() for tag in topic_title_tags]

print(topic_titles)

['3D', 'Ajax', 'Algorithm', 'Amp', 'Android', 'Angular', 'Ansible', 'API', 'Arduino', 'ASP.NET', 'Atom', 'Awesome Lists', 'Amazon Web Services', 'Azure', 'Babel', 'Bash', 'Bitcoin', 'Bootstrap', 'Bot', 'C', 'Chrome', 'Chrome extension', 'Command line interface', 'Clojure', 'Code quality', 'Code review', 'Compiler', 'Continuous integration', 'COVID-19', 'C++']


In [31]:
topic_urls = []
base_url = 'https://github.com/topics/'

for title in topic_titles:
    topic_urls.append(base_url + title)

print(topic_urls)

['https://github.com/topics/3D', 'https://github.com/topics/Ajax', 'https://github.com/topics/Algorithm', 'https://github.com/topics/Amp', 'https://github.com/topics/Android', 'https://github.com/topics/Angular', 'https://github.com/topics/Ansible', 'https://github.com/topics/API', 'https://github.com/topics/Arduino', 'https://github.com/topics/ASP.NET', 'https://github.com/topics/Atom', 'https://github.com/topics/Awesome Lists', 'https://github.com/topics/Amazon Web Services', 'https://github.com/topics/Azure', 'https://github.com/topics/Babel', 'https://github.com/topics/Bash', 'https://github.com/topics/Bitcoin', 'https://github.com/topics/Bootstrap', 'https://github.com/topics/Bot', 'https://github.com/topics/C', 'https://github.com/topics/Chrome', 'https://github.com/topics/Chrome extension', 'https://github.com/topics/Command line interface', 'https://github.com/topics/Clojure', 'https://github.com/topics/Code quality', 'https://github.com/topics/Code review', 'https://github.com
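Several of the URLs built from titles contain spaces, and a display title is not guaranteed to match the last part of the topic's real URL. One alternative is to read the `href` of the link that encloses each title. The sketch below assumes each title `<p>` sits inside an `<a>` tag pointing at the topic page; that structure, the sample markup, and the paths in it are illustrative assumptions, not verified against the live page.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Made-up markup for illustration; the real page structure is an assumption.
sample_html = '''
<a href="/topics/3d"><p class="f3 lh-condensed mb-0 mt-1 Link--primary">3D</p></a>
<a href="/topics/awesome"><p class="f3 lh-condensed mb-0 mt-1 Link--primary">Awesome Lists</p></a>
'''
sample_doc = BeautifulSoup(sample_html, 'html.parser')
title_tags = sample_doc.find_all(
    'p', {'class': 'f3 lh-condensed mb-0 mt-1 Link--primary'}
)

sample_urls = []
for tag in title_tags:
    link_tag = tag.find_parent('a')  # nearest enclosing <a>, or None if there is none
    if link_tag is not None and link_tag.get('href'):
        # Turn the relative path into an absolute URL
        sample_urls.append(urljoin('https://github.com', link_tag['href']))

print(sample_urls)
```

Reading the link from the page avoids guessing how a title maps to a URL, at the cost of depending on the page's markup.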

### MAKING A DATAFRAME

In [32]:
!pip install pandas --quiet

In [33]:
import pandas as pd

In [49]:
topics_dictionary = {
    'TITLE': topic_titles,  # plain title strings, not the <p> tag objects
    'DESCRIPTION': description_texts,
    'URL': topic_urls
}

In [50]:
topics_df = pd.DataFrame(topics_dictionary)

In [51]:
topics_df

| | TITLE | DESCRIPTION | URL |
|---|---|---|---|
| 0 | 3D | 3D refers to the use of three-dimensional grap... | https://github.com/topics/3D |
| 1 | Ajax | Ajax is a technique for creating interactive w... | https://github.com/topics/Ajax |
| 2 | Algorithm | Algorithms are self-contained sequences that c... | https://github.com/topics/Algorithm |
| 3 | Amp | Amp is a non-blocking concurrency library for ... | https://github.com/topics/Amp |
| 4 | Android | Android is an operating system built by Google... | https://github.com/topics/Android |
| 5 | Angular | Angular is an open source web application plat... | https://github.com/topics/Angular |
| 6 | Ansible | Ansible is a simple and powerful automation en... | https://github.com/topics/Ansible |
| 7 | API | An API (Application Programming Interface) is ... | https://github.com/topics/API |
| 8 | Arduino | Arduino is an open source platform for buildin... | https://github.com/topics/Arduino |
| 9 | ASP.NET | ASP.NET is a web framework for building modern... | https://github.com/topics/ASP.NET |


In [54]:
topics_df.to_csv('topics.csv', index=False)
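Turning off the index when saving keeps the row numbers out of the file. A small sketch of the difference, using a two-row sample frame and in-memory buffers instead of a real file (both are illustrative stand-ins):

```python
import io

import pandas as pd

sample_df = pd.DataFrame({
    'TITLE': ['3D', 'Ajax'],
    'URL': ['https://github.com/topics/3D', 'https://github.com/topics/Ajax'],
})

# Default behaviour: the row index is written as an extra, unnamed first column.
with_index = io.StringIO()
sample_df.to_csv(with_index)

# index=False: only the real columns are written.
without_index = io.StringIO()
sample_df.to_csv(without_index, index=False)

# Rewind both buffers and read them back to compare the column names.
with_index.seek(0)
without_index.seek(0)
cols_with = list(pd.read_csv(with_index).columns)
cols_without = list(pd.read_csv(without_index).columns)

print(cols_with)
print(cols_without)
```

Reading the first buffer back yields an extra `Unnamed: 0` column, while the second round-trips with just `TITLE` and `URL`.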