Web Scraping:

-    Use this website: [GitHub Topics](https://github.com/topics)
-    Write a Python script using the requests library to fetch the HTML content of the chosen website.
-    Print the response's `.status_code` to confirm the request succeeded; it should be 200.
-    Print the first 100 characters of the HTML content to verify the response.
-    Save the HTML content to a file named webpage.html. Ensure you handle the text encoding correctly.
-    Use BeautifulSoup to parse the saved HTML content.
-    Identify two distinct pieces of information on the webpage to extract (e.g., titles of the topics and their descriptions).
-    Write code to extract these pieces of information. Ensure you identify the correct HTML tags and classes used for these elements on the webpage.
-    Print the length and content of each extracted list to verify the extraction process.
-    Create a Python dictionary to structure the extracted data, with keys representing the type of information (e.g., `title` and `description`).
-    Convert this dictionary into a pandas DataFrame.
-    Print the DataFrame to confirm its structure and contents.
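
The encoding step in the list above can be checked in isolation. The following is a minimal sketch, not part of the original exercise: the helper names `save_html` and `load_html` and the sample string are hypothetical, and a temporary directory stands in for the working directory. It writes text with an explicit `encoding="utf-8"` and reads it back the same way, so non-ASCII characters survive regardless of the platform's default encoding.

```python
from pathlib import Path
import tempfile


def save_html(text, path):
    """Write HTML text to disk with an explicit UTF-8 encoding."""
    Path(path).write_text(text, encoding="utf-8")


def load_html(path):
    """Read HTML text back using the same explicit encoding."""
    return Path(path).read_text(encoding="utf-8")


# Hypothetical sample containing non-ASCII characters.
sample = "<p>Café · naïve · 日本語</p>"

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "webpage.html"
    save_html(sample, target)
    restored = load_html(target)

# True when the text survived the round trip unchanged.
print(restored == sample)
```

Omitting the `encoding` argument makes Python fall back to a platform-dependent default, which is a common source of `UnicodeEncodeError` when saving scraped pages on some systems.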


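Extracting titles and descriptions with two independent selectors only works if both lists come out the same length and in the same order; otherwise building the DataFrame fails or rows get misaligned. One way to guard against that is to walk each topic's container and read both fields from it. The sketch below assumes a simplified, hypothetical markup (`div.topic` wrapping `p.f3` and `p.f5`); the real page's tags and classes may differ and can change over time, so inspect the saved HTML before relying on any selector.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, loosely modeled on a topics listing.
# The second topic deliberately has no description.
SAMPLE_HTML = """
<div class="topic">
  <p class="f3">Python</p>
  <p class="f5">Python is a dynamically typed programming language.</p>
</div>
<div class="topic">
  <p class="f3">Rust</p>
</div>
"""


def extract_topics(html):
    """Return one {'title', 'description'} record per topic container."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for block in soup.select("div.topic"):  # assumed container selector
        title = block.select_one("p.f3")
        desc = block.select_one("p.f5")
        if title is None:
            continue  # skip containers without a title
        records.append({
            "title": title.get_text(strip=True),
            "description": desc.get_text(strip=True) if desc else None,
        })
    return records


records = extract_topics(SAMPLE_HTML)

# Column-oriented dictionary; both lists always have the same length.
data = {
    "title": [r["title"] for r in records],
    "description": [r["description"] for r in records],
}
print(data)
```

Because every record carries both keys, the resulting dictionary can be passed to `pd.DataFrame` without a length mismatch, and a missing description shows up as an empty value instead of shifting later rows.
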
In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [2]:
# fetch the content
url = "https://github.com/topics"
response = requests.get(url, timeout=10)  # timeout avoids hanging indefinitely on a stalled connection

# print the status code; 200 means the request succeeded
print(f"status code: {response.status_code}")

# print the first 100 characters of the HTML content to verify the response
print(f"first 100 char of the html cont: {response.text[:100]}")

# save the html content to a file
with open("webpage.html", "w", encoding="utf-8") as f:
    f.write(response.text)

# parse the html content
with open("webpage.html", "r", encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")

# extract topic titles & descriptions
titles = [title.get_text(strip=True) for title in soup.select('p.f3')]
descriptions = [desc.get_text(strip=True) for desc in soup.select('p.f5')]

# print length and content of extracted lists
print(f"\nNumber of titles extracted: {len(titles)}")
print("Titles:", titles)
print(f"\nNumber of descriptions extracted: {len(descriptions)}")
print("Descriptions:", descriptions)

# build a dict of columns and convert it to a DataFrame (both lists must have the same length)
data = {"title": titles, "description": descriptions}
df = pd.DataFrame(data)

# print df
print("\nDataFrame:")
print(df)

status code: 200
first 100 char of the html cont: 

<!DOCTYPE html>
<html
  lang="en"
  
  data-color-mode="auto" data-light-theme="light" data-dark-t

Number of titles extracted: 33
Titles: ['Minecraft', 'Unity', 'Clojure', '3D', 'Ajax', 'Algorithm', 'Amp', 'Android', 'Angular', 'Ansible', 'API', 'Arduino', 'ASP.NET', 'Awesome Lists', 'Amazon Web Services', 'Azure', 'Babel', 'Bash', 'Bitcoin', 'Bootstrap', 'Bot', 'C', 'Chrome', 'Chrome extension', 'Command-line interface', 'Clojure', 'Code quality', 'Code review', 'Compiler', 'Continuous integration', 'C++', 'Cryptocurrency', 'Crystal']

Number of descriptions extracted: 33
Descriptions: ['Minecraft is a sandbox video game.', 'Unity is a game engine used to create 2D/3D video games, and simulations for computers, consoles, and mobile devices.', 'Clojure is a dynamic, general-purpose programming language.', '3D refers to the use of three-dimensional graphics, modeling, and animation in various industries.', 'Ajax is a technique for cre