<center>
<h3 name="illness" style="border-color: black;  
           border-width: 3px;
           color: white;
           font-size:25px;
           background-color: Green;
           border-style: solid;  
           border-radius: 5px 5px; 
           padding: 8px">
    Web Scraping GitHub Top Repositories Project
</h3>
</center>



### Pick a website and describe your objective

- Browse through different sites and pick one to scrape. Check the "Project Ideas" section for inspiration.
- Identify the information you'd like to scrape from the site. Decide the format of the output CSV file.
- Summarize your project idea and outline your strategy in a Jupyter Notebook.

### Project Outline


- We're going to scrape https://github.com/topics.
- We'll get a list of topics. For each topic, we'll get the topic title, topic page URL, and topic description.
- For each topic, we'll get the top 30 repositories from the topic page.
- For each topic, we'll create a CSV file in the following format:


```
Repo Name,User Name,Stars,Repo URL
three.js,mrdoob,90100,https://github.com/mrdoob/three.js
libgdx,libgdx,21300,https://github.com/libgdx/libgdx
```

### Use the requests library to download web pages

In [10]:
!pip install requests --upgrade --quiet

In [33]:
import requests

In [34]:
URL = 'https://github.com/topics'

In [35]:
response = requests.get(URL)

A status code of 200 means the request succeeded. See https://httpstatuses.io/ for a reference of HTTP status codes.

In [36]:
response.status_code

200
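
Rather than inspecting the status code by eye, the check can be done in code before the page text is used. The following is a minimal sketch, not part of the original notebook: the helper name `ensure_success` is hypothetical, and it assumes the object passed in exposes `status_code` and `text` the way a `requests` response does.

```python
def ensure_success(response):
    """Return the page text if the status code is in the 2xx range, else raise.

    `response` is assumed to expose `.status_code` and `.text`,
    as the object returned by requests.get() does.
    """
    if not 200 <= response.status_code < 300:
        raise RuntimeError(f"Failed to load page: status {response.status_code}")
    return response.text
```

In the notebook this could be used as `page_contents = ensure_success(requests.get(URL))`. The `requests` library also provides `response.raise_for_status()`, which raises an exception for 4xx and 5xx responses and may be a simpler choice.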

In [37]:
len(response.text)

153049

In [38]:
page_contents = response.text

In [45]:
page_contents[:100]

'\n\n<!DOCTYPE html>\n<html lang="en" data-color-mode="auto" data-light-theme="light" data-dark-theme="d'

In [47]:
with open('webpage.html','w',encoding="utf-8") as f:
    f.write(page_contents)

### Use Beautiful Soup to parse and extract information

In [49]:
!pip install beautifulsoup4 --upgrade --quiet

In [50]:
from bs4 import BeautifulSoup

In [53]:
doc = BeautifulSoup(page_contents,'html.parser')

In [54]:
type(doc)

bs4.BeautifulSoup

In [60]:
p_tags = doc.find_all('p')
p_tags[:10]

[<p class="f4 color-fg-muted col-md-6 mx-auto">Browse popular topics on GitHub.</p>,
 <p class="f3 lh-condensed text-center Link--primary mb-0 mt-1">
         Python
       </p>,
 <p class="f5 color-fg-muted text-center mb-0 mt-1">Python is a dynamically typed programming language.</p>,
 <p class="f3 lh-condensed text-center Link--primary mb-0 mt-1">
         Clojure
       </p>,
 <p class="f5 color-fg-muted text-center mb-0 mt-1">Clojure is a dynamic, general-purpose programming language.</p>,
 <p class="f3 lh-condensed text-center Link--primary mb-0 mt-1">
         Atom
       </p>,
 <p class="f5 color-fg-muted text-center mb-0 mt-1">Atom is a open source text editor built with web technologies.</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">3D</p>,
 <p class="f5 color-fg-muted mb-0 mt-1">
           3D refers to the use of three-dimensional graphics, modeling, and animation in various industries.
         </p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Ajax</p>]

#### Topic Title Selector

In [63]:
selection_class = "f3 lh-condensed mb-0 mt-1 Link--primary"

In [67]:
topic_title_tags = doc.find_all('p',{'class': selection_class})

In [68]:
len(topic_title_tags)

30

In [69]:
topic_title_tags

[<p class="f3 lh-condensed mb-0 mt-1 Link--primary">3D</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Ajax</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Algorithm</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Amp</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Android</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Angular</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Ansible</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">API</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Arduino</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">ASP.NET</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Atom</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Awesome Lists</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Amazon Web Services</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Azure</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Babel</p>,
 <p class="f3 lh-condensed m

In [75]:
topic_title_tags[:5]

[<p class="f3 lh-condensed mb-0 mt-1 Link--primary">3D</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Ajax</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Algorithm</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Amp</p>,
 <p class="f3 lh-condensed mb-0 mt-1 Link--primary">Android</p>]
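
The tags above still carry their HTML markup. To get just the topic names, read each tag's `.text` and strip the surrounding whitespace. The sketch below illustrates this on a small hand-written HTML sample (an assumption standing in for the real page, so it runs without a network request); the class string is the one used in the cells above.

```python
from bs4 import BeautifulSoup

# A tiny stand-in for the topics page, reusing the classes seen in the output above.
sample_html = """
<p class="f3 lh-condensed mb-0 mt-1 Link--primary">3D</p>
<p class="f3 lh-condensed mb-0 mt-1 Link--primary">Ajax</p>
<p class="f5 color-fg-muted mb-0 mt-1">
    Ajax is a technique for creating interactive web applications.
</p>
"""

sample_doc = BeautifulSoup(sample_html, "html.parser")
sample_title_tags = sample_doc.find_all(
    "p", {"class": "f3 lh-condensed mb-0 mt-1 Link--primary"}
)

# .text returns the text inside the tag; strip() removes surrounding whitespace.
topic_titles = [tag.text.strip() for tag in sample_title_tags]
print(topic_titles)  # ['3D', 'Ajax']
```

The same list comprehension should work on `topic_title_tags` from the real page.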

#### Description Selector

In [78]:
desc_selector = "f5 color-fg-muted mb-0 mt-1"
topic_desc_tags = doc.find_all('p',{'class': desc_selector})

In [80]:
topic_desc_tags[:5]

[<p class="f5 color-fg-muted mb-0 mt-1">
           3D refers to the use of three-dimensional graphics, modeling, and animation in various industries.
         </p>,
 <p class="f5 color-fg-muted mb-0 mt-1">
           Ajax is a technique for creating interactive web applications.
         </p>,
 <p class="f5 color-fg-muted mb-0 mt-1">
           Algorithms are self-contained sequences that carry out a variety of tasks.
         </p>,
 <p class="f5 color-fg-muted mb-0 mt-1">
           Amp is a non-blocking concurrency library for PHP.
         </p>,
 <p class="f5 color-fg-muted mb-0 mt-1">
           Android is an operating system built by Google designed for mobile devices.
         </p>]
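
With both tag lists available, titles and descriptions can be paired by position. This is a sketch under the assumption that the two lists are in the same order and have the same length, which the outputs above suggest but do not guarantee. The function name `build_topic_records` and the dictionary keys are hypothetical.

```python
def build_topic_records(title_tags, desc_tags):
    """Pair each topic title with its description, matching by position.

    Both arguments are lists of objects with a `.text` attribute,
    such as the tags returned by BeautifulSoup's find_all().
    """
    if len(title_tags) != len(desc_tags):
        # A mismatch would silently misalign titles and descriptions.
        raise ValueError(
            f"Got {len(title_tags)} titles but {len(desc_tags)} descriptions"
        )
    return [
        {"title": title.text.strip(), "description": desc.text.strip()}
        for title, desc in zip(title_tags, desc_tags)
    ]
```

Calling `build_topic_records(topic_title_tags, topic_desc_tags)` would then give one dictionary per topic. The description text spans several lines in the HTML, so `strip()` is what removes the leading and trailing newlines and indentation.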

### Create CSV (File) with the extracted information
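
One possible way to produce the CSV layout described in the project outline is Python's built-in `csv` module. The sketch below is an assumption about how this step could look, not the notebook's actual implementation. The function name `write_repos_csv` and the output file name are hypothetical, and the sample row is the one from the outline.

```python
import csv

# Column names taken from the CSV format in the project outline.
FIELDNAMES = ["Repo Name", "User Name", "Stars", "Repo URL"]


def write_repos_csv(path, repos):
    """Write a list of repository dictionaries to a CSV file at `path`."""
    # newline="" is what the csv module expects when writing files.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(repos)


sample_repos = [
    {
        "Repo Name": "three.js",
        "User Name": "mrdoob",
        "Stars": 90100,
        "Repo URL": "https://github.com/mrdoob/three.js",
    },
]
write_repos_csv("3d.csv", sample_repos)
```

Using `csv.DictWriter` rather than joining strings by hand means values that contain commas or quotes are escaped correctly.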

### Document and Share your work