
## Instructions

1. Use this website: [GitHub Topics](https://github.com/topics).

2. **Write a Python script** using the `requests` library to:
    - Fetch the HTML content of the chosen website.
    - Print the response's `.status_code` to confirm the request succeeded; it should be `200`.
    - Print the first 100 characters of the HTML content to verify the response.
    - Save the HTML content to a file named `webpage.html`. Ensure you handle the text encoding correctly.

3. **Parse the HTML content** using `BeautifulSoup`:
    - Load the saved HTML file and parse it.

4. **Identify two distinct pieces of information** on the webpage to extract:
    - Titles of the topics.
    - Descriptions of the topics.

5. **Write code to extract** these pieces of information:
    - Ensure you identify the correct HTML tags and classes used for these elements on the webpage.

6. **Print the extracted data**:
    - Print the length and content of each extracted list to verify the extraction process.

7. **Structure the extracted data**:
    - Create a Python dictionary with keys representing the type of information (e.g., `title` and `description`).

8. **Convert the dictionary** into a `pandas` DataFrame:
    - Print the DataFrame to confirm its structure and contents.
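Step 5 depends on matching class names correctly. In `BeautifulSoup`, passing a space-separated string to `class_` matches only a `class` attribute with exactly that string (same classes, same order). A CSS selector given to `select()` matches any element that carries every listed class, in any order. The fixture below is made up for illustration and is not GitHub's markup:

```python
from bs4 import BeautifulSoup

# Made-up fixture: two <p> tags with the same classes in a different order.
html = """
<div>
  <p class="f3 lh-condensed mb-0 mt-1 Link--primary">ios</p>
  <p class="Link--primary f3 lh-condensed mb-0 mt-1">swift</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A multi-class string in class_ is compared with the exact attribute value,
# so only the first <p> matches.
exact = soup.find_all("p", class_="f3 lh-condensed mb-0 mt-1 Link--primary")

# A CSS selector only requires each class to be present, in any order.
css = soup.select("p.f3.Link--primary")

print(len(exact), [p.get_text() for p in exact])
print(len(css), [p.get_text() for p in css])
```

Selecting on two or three distinctive classes is usually more robust than copying the full class string from the browser's inspector.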


In [21]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://github.com/topics/ios'
# A timeout keeps the request from hanging indefinitely
response = requests.get(url, timeout=10)
print(response.status_code)  # expect 200

ios_soup = BeautifulSoup(response.content, 'html.parser')
print(response.text[:100])  # preview the first 100 characters


200


<!DOCTYPE html>
<html
  lang="en"
  
  data-color-mode="auto" data-light-theme="light" data-dark-t


In [22]:
with open('webpage.html', 'w', encoding='utf-8') as file:
    file.write(response.text)
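The cell above writes `response.text` with an explicit `encoding='utf-8'`, and reading the file back needs the same encoding. The round trip can be checked without the network. This is a minimal sketch, assuming the page text is already a `str` (as `response.text` is); `save_html`, `load_html`, `preview`, and the sample string are hypothetical:

```python
import os
import tempfile


def save_html(text: str, path: str) -> None:
    """Write HTML text to disk as UTF-8 (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as file:
        file.write(text)


def load_html(path: str) -> str:
    """Read the file back with the same explicit encoding."""
    with open(path, "r", encoding="utf-8") as file:
        return file.read()


def preview(text: str, n: int = 100) -> str:
    """Return the first n characters, as step 2 asks to print."""
    return text[:n]


# Sample text with non-ASCII characters, standing in for response.text.
sample = '<!DOCTYPE html><html lang="en"><body><p>free-programming-books-zh_CN · 编程</p></body></html>'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "webpage.html")
    save_html(sample, path)
    restored = load_html(path)

print(preview(restored))
print(restored == sample)
```

Without the explicit `encoding`, `open()` falls back to the platform's default encoding, which is not UTF-8 on every system and can fail or garble non-ASCII characters.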

In [28]:
with open('webpage.html', 'r', encoding='utf-8') as file:
    html_content = file.read()

# Parse the HTML content with BeautifulSoup
ios_soup = BeautifulSoup(html_content, 'html.parser')

# Example: Print the title of the webpage
print("Webpage Title:", ios_soup.title.string)


# Print the text of every h2, h3 and h4 heading on the page
for heading_level in ['h2', 'h3', 'h4']:
    headings = [heading.text.strip() for heading in ios_soup.find_all(heading_level)]
    print(f"{heading_level.upper()} Headings:")
    for heading in headings:
        print("-", heading)
    print()

Webpage Title: ios · GitHub Topics · GitHub
H2 Headings:
- Navigation Menu
- Use saved searches to filter your results more quickly
- Here are
    48,137 public repositories
    matching this topic...
- Footer

H3 Headings:
- flutter          /
          flutter
- facebook          /
          react-native
- justjavac          /
          free-programming-books-zh_CN
- Solido          /
          awesome-flutter
- ultralytics          /
          yolov5
- FiloSottile          /
          mkcert
- ionic-team          /
          ionic-framework
- google          /
          material-design-icons
- vsouza          /
          awesome-ios
- appwrite          /
          appwrite
- dkhamsing          /
          open-source-ios-apps
- dcloudio          /
          uni-app
- fastlane          /
          fastlane
- expo          /
          expo
- SheetJS          /
          sheetjs
- xitu          /
          gold-miner
- bilibili          /
          ijkplayer
- utmapp          /
       

In [24]:
# Collect the distinct tag names used in the parsed document
unique_tags = {tag.name for tag in ios_soup.find_all()}

# Print the unique tags
print("Unique Tags in the HTML:")
for tag in sorted(unique_tags):
    print("-", tag)

Unique Tags in the HTML:
- a
- article
- auto-check
- body
- button
- clipboard-copy
- cookie-consent-link
- custom-scopes
- dd
- details
- details-dialog
- details-menu
- dialog
- dialog-helper
- div
- dl
- dt
- footer
- form
- ghcc-consent
- h1
- h2
- h3
- head
- header
- html
- i
- img
- input
- label
- li
- link
- main
- meta
- modal-dialog
- nav
- p
- path
- qbsearch-input
- query-builder
- react-partial
- relative-time
- script
- scrollable-region
- span
- summary
- svg
- template
- textarea
- title
- tool-tip
- topic-feeds-toast-trigger
- ul
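The tag inventory shows which element names exist, but not which classes they carry, and step 5 needs classes. One way to narrow this down is to count the class combinations used on a given tag: a combination that repeats once per listed item is a candidate selector. The sketch below runs on a made-up fixture, not GitHub's real markup:

```python
from collections import Counter

from bs4 import BeautifulSoup

# Made-up fixture standing in for the saved page.
html = """
<article><h3>owner / repo-a</h3><p class="color-fg-muted mb-0">First description</p></article>
<article><h3>owner / repo-b</h3><p class="color-fg-muted mb-0">Second description</p></article>
<footer><p class="footer-note">Footer text</p></footer>
"""
soup = BeautifulSoup(html, "html.parser")

# Count each distinct class combination used on <p> tags.
class_counts = Counter(
    " ".join(p.get("class", [])) for p in soup.find_all("p")
)

for classes, count in class_counts.most_common():
    print(f"{count:>3}  {classes or '(no class)'}")
```

On the real page, replacing `soup` with `ios_soup` gives the same kind of table for the downloaded HTML.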


In [25]:
# Look for topic titles and descriptions by their CSS classes
titles = [title.text.strip() for title in ios_soup.find_all('p', class_='f3 lh-condensed mb-0 mt-1 Link--primary')]
details = [detail.text.strip() for detail in ios_soup.find_all('p', class_='f5 color-fg-muted mb-0 mt-1')]

# Print the length and content of each list
print("Titles:")
print("Number of Titles:", len(titles))
print("Titles Content:", titles)

print("\nDetails:")
print("Number of Details:", len(details))
print("Details Content:", details)

Titles:
Number of Titles: 0
Titles Content: []

Details:
Number of Details: 0
Details Content: []
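Both lists come back empty. One plausible explanation is a URL mismatch: these classes appear to belong to the topic cards on the topics index (https://github.com/topics) named in step 1, whereas the notebook fetched https://github.com/topics/ios, which lists repositories rather than topics. The sketch below assumes that each topic's title and description are `<p>` elements with the classes the notebook already tried, sharing a parent element. That structure is unverified, GitHub's live markup may differ or change, and `extract_topics` and the fixture are hypothetical:

```python
from bs4 import BeautifulSoup

# Classes taken from the notebook's selectors; not verified against the live page.
TITLE_SELECTOR = "p.f3.lh-condensed.Link--primary"
DESCRIPTION_SELECTOR = "p.f5.color-fg-muted"


def extract_topics(soup: BeautifulSoup) -> list[dict]:
    """Return one {'title', 'description'} dict per topic title found.

    Assumes each description sits inside the same parent element as its
    title; a missing description is recorded as None so rows stay aligned.
    """
    topics = []
    for title_tag in soup.select(TITLE_SELECTOR):
        description_tag = title_tag.parent.select_one(DESCRIPTION_SELECTOR)
        topics.append({
            "title": title_tag.get_text(strip=True),
            "description": description_tag.get_text(strip=True) if description_tag else None,
        })
    return topics


# Made-up fixture shaped like the assumed card markup.
html = """
<div><p class="f3 lh-condensed mb-0 mt-1 Link--primary">ios</p>
     <p class="f5 color-fg-muted mb-0 mt-1">Placeholder description A</p></div>
<div><p class="f3 lh-condensed mb-0 mt-1 Link--primary">swift</p></div>
"""
topics = extract_topics(BeautifulSoup(html, "html.parser"))
print(topics)
```

On real data, the same function would be applied to HTML fetched from https://github.com/topics and parsed with `BeautifulSoup(..., 'html.parser')`.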


In [26]:
titles = [title.get_text() for title in ios_soup.find_all('h3')]
details = [detail.get_text() for detail in ios_soup.find_all('p')]

# Create a dictionary with the extracted data
data = {
    'Title': titles,
    'Description': details
}

# Print the dictionary to verify its structure
print("Extracted Data Dictionary:")
for key, value in data.items():
    print(f"{key}: {value}")

# Convert the dictionary into a pandas DataFrame
df = pd.DataFrame(data)

# Print the DataFrame to confirm its structure and contents
print("\nDataFrame:")
print(df)

Extracted Data Dictionary:
Title: ['\nflutter          /\n          flutter ', '\nfacebook          /\n          react-native ', '\njustjavac          /\n          free-programming-books-zh_CN ', '\nSolido          /\n          awesome-flutter ', '\nultralytics          /\n          yolov5 ', '\nFiloSottile          /\n          mkcert ', '\nionic-team          /\n          ionic-framework ', '\ngoogle          /\n          material-design-icons ', '\nvsouza          /\n          awesome-ios ', '\nappwrite          /\n          appwrite ', '\ndkhamsing          /\n          open-source-ios-apps ', '\ndcloudio          /\n          uni-app ', '\nfastlane          /\n          fastlane ', '\nexpo          /\n          expo ', '\nSheetJS          /\n          sheetjs ', '\nxitu          /\n          gold-miner ', '\nbilibili          /\n          ijkplayer ', '\nutmapp          /\n          UTM ', '\nAvaloniaUI          /\n          Avalonia ', '\nquasarframework          /\n          qua

ValueError: All arrays must be of the same length
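`pd.DataFrame` raises this error because the two lists have different lengths: `find_all('h3')` returns one entry per repository heading, while `find_all('p')` returns every paragraph on the page, including ones unrelated to the repositories. A minimal fix is to pull both fields from the same container, so each row gets exactly one title and at most one description. This sketch assumes each repository is wrapped in an `<article>` element (the tag inventory above does list `article`) and that the first `<p>` inside it is the description. That structure is an assumption, and `extract_rows` and the fixture HTML are hypothetical:

```python
import pandas as pd
from bs4 import BeautifulSoup


def normalize(text: str) -> str:
    """Collapse internal whitespace, e.g. 'owner   /\n  repo' -> 'owner / repo'."""
    return " ".join(text.split())


def extract_rows(soup: BeautifulSoup) -> dict:
    """Build equal-length 'Title'/'Description' lists, one entry per <article>."""
    data = {"Title": [], "Description": []}
    for article in soup.find_all("article"):
        heading = article.find("h3")
        if heading is None:
            continue  # skip articles that are not repository cards
        paragraph = article.find("p")
        data["Title"].append(normalize(heading.get_text()))
        # Record None instead of skipping, so the two lists stay the same length
        data["Description"].append(normalize(paragraph.get_text()) if paragraph else None)
    return data


# Made-up fixture: one card with a description, one without, plus a stray <p>.
html = """
<article><h3>flutter          /
          flutter </h3><p> Placeholder description </p></article>
<article><h3>facebook / react-native</h3></article>
<p>Unrelated paragraph elsewhere on the page</p>
"""
data = extract_rows(BeautifulSoup(html, "html.parser"))
df = pd.DataFrame(data)
print(df)
```

In the notebook, `extract_rows(ios_soup)` would replace the two `find_all` calls. Because both columns are filled in the same loop, their lengths always match, and `normalize` also removes the padded whitespace visible in the `Title` output above.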