# Top GitHub Repositories by Topic

### Pick a website and describe your objective
- Browse through different sites and pick one to scrape.
- Identify the information you'd like to scrape from the site. Decide the format of the output CSV file.
- Summarize your project idea and outline your strategy in a Jupyter notebook.

#### Project Outline:
- We are going to scrape https://github.com/topics
- Get a list of topics. For each topic, we will get the topic title, topic page URL (which includes the topic 'id'), and topic description
- For each topic, we will get the top 25 repositories
- For each repo, we will get the repo name, username, stars, and repo URL
- Each topic will have a CSV file with the following format:
```
Repo Name,Username,Stars,Repo URL
free-programming-books-zh_CN,justjavac,102000,https://github.com/justjavac/free-programming-books-zh_CN
kotlin,JetBrains,44600,https://github.com/JetBrains/kotlin
```
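
The `Stars` column above holds plain integers, while GitHub pages often display star counts in an abbreviated form such as `44.6k`. A minimal sketch of one way to convert such text, assuming that abbreviated format (the helper name `parse_star_count` is hypothetical, not part of the original project):

```python
def parse_star_count(text):
    """Convert a star count such as '44.6k' or '512' to an integer.

    Assumption: counts are either plain digits (optionally with commas)
    or a decimal number followed by 'k' for thousands.
    """
    text = text.strip().lower().replace(',', '')
    if text.endswith('k'):
        # round() avoids floating-point truncation errors.
        return round(float(text[:-1]) * 1000)
    return int(text)
```

With this helper, `'44.6k'` would be stored as `44600` and `'102k'` as `102000`, matching the sample rows.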

### Use the requests library to download web pages
- Inspect the website's HTML source and identify the right URLs to download.
- Download and save web pages locally using the requests library.
- Create a function to automate downloading for different topics/search queries.

In [12]:
! python3 -m pip install --upgrade pip --user
! python3 -m pip install requests --upgrade

In [13]:
import requests

In [20]:
topics_url = 'https://github.com/topics'
response = requests.get(topics_url)
# A status code in the 200-299 range means the request succeeded.
# See https://developer.mozilla.org/en-US/docs/Web/HTTP/Status for more on status codes.
response.status_code

200

In [22]:
page_contents = response.text
with open('topics-page.html', 'w', encoding='utf-8') as f:
    f.write(page_contents)
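
The download and save steps above could be wrapped in a reusable function, as the checklist suggests. The following is a minimal sketch rather than a definitive implementation: the names `topic_page_url` and `download_page`, the 30-second timeout, and the assumption that a topic page lives at `https://github.com/topics/<id>` are illustrative choices.

```python
import requests

TOPICS_BASE_URL = 'https://github.com/topics'


def topic_page_url(topic_id=None):
    """Return the topics index URL, or the page URL for one topic id.

    Assumption: a topic page is the base URL followed by the topic id.
    """
    if topic_id is None:
        return TOPICS_BASE_URL
    return f'{TOPICS_BASE_URL}/{topic_id}'


def download_page(url, filename=None, timeout=30):
    """Download url, optionally save it to filename, and return the HTML."""
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of
    # silently saving an error page.
    response.raise_for_status()
    if filename is not None:
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(response.text)
    return response.text
```

For example, `download_page(topic_page_url('python'), 'python.html')` would fetch one topic page and keep a local copy, so later parsing does not need to hit the site again.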

### Use Beautiful Soup to parse and extract information
- Parse and explore the structure of downloaded web pages using Beautiful Soup.
- Use the right properties and methods to extract the required information.
- Create functions to extract from the page into lists and dictionaries.
- (Optional) Use a REST API to acquire additional information if required.


In [23]:
! python3 -m pip install beautifulsoup4 --upgrade
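
Once Beautiful Soup is installed, the extraction step might look like the sketch below. It parses a small hypothetical HTML snippet instead of the real page: the tag layout and the class names (`topic-card`, `topic-title`, `topic-desc`) are assumptions for illustration, and the actual selectors have to be found by inspecting `topics-page.html`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the saved topics page.
# The class names are assumptions, not GitHub's real ones.
SAMPLE_HTML = """
<div class="topic-card">
  <a href="/topics/3d">
    <p class="topic-title">3D</p>
    <p class="topic-desc">Placeholder description for 3D.</p>
  </a>
</div>
<div class="topic-card">
  <a href="/topics/ajax">
    <p class="topic-title">Ajax</p>
    <p class="topic-desc">Placeholder description for Ajax.</p>
  </a>
</div>
"""


def parse_topics(html, base_url='https://github.com'):
    """Return a list of dicts with each topic's title, description, and URL."""
    soup = BeautifulSoup(html, 'html.parser')
    topics = []
    for card in soup.find_all('div', class_='topic-card'):
        topics.append({
            'title': card.find('p', class_='topic-title').text.strip(),
            'description': card.find('p', class_='topic-desc').text.strip(),
            # The href is relative, so prepend the site's base URL.
            'url': base_url + card.find('a')['href'],
        })
    return topics
```

Returning a list of dictionaries keeps each topic's fields together and maps directly onto CSV rows later on.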

### Create CSV file(s) with the extracted information
- Create functions for the end-to-end process of downloading, parsing, and saving CSVs.
- Execute the function with different inputs to create a dataset of CSV files.
- Verify the information in the CSV files by reading them back using Pandas.
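
A minimal sketch of the save-and-verify steps, assuming each repository is held as a dictionary keyed by the CSV column names from the project outline. The function names `write_repos_csv` and `read_repos_csv` are hypothetical; writing uses the standard `csv` module, and pandas is only needed for the read-back check.

```python
import csv

FIELDNAMES = ['Repo Name', 'Username', 'Stars', 'Repo URL']


def write_repos_csv(repos, path):
    """Write a list of repo dicts to path using the outline's column order."""
    # newline='' lets the csv module control line endings itself.
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(repos)


def read_repos_csv(path):
    """Read the CSV back into a pandas DataFrame for verification."""
    # Imported here so that writing CSVs does not require pandas.
    import pandas as pd
    return pd.read_csv(path)
```

For instance, calling `write_repos_csv(repos, 'kotlin.csv')` and then `read_repos_csv('kotlin.csv').head()` would show whether the columns and star counts came through as expected.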

### Document and share your work
- Add proper headings and documentation in your Jupyter notebook.
- Publish your Jupyter notebook to your GitHub profile.
- (Optional) Write a blog post about your project and share it online.