## Scraping all the questions and other details from stackoverflow.

In [1]:
import pandas as pd

In [2]:

import requests
from bs4 import BeautifulSoup

# Load the URL
base_url = 'https://stackoverflow.com/questions/tagged/'
tag = 'python'
query_filter = 'Votes'
url = f"{base_url}{tag}?tab={query_filter}"

# Send a GET request to the URL
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Parse the HTML content
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # Find all elements with the class 's-post-summary' or 'js-post-summary'
    post_summaries = soup.find_all(class_=['s-post-summary'])
    
    # Iterate over each post summary
    for post_summary in post_summaries:
        # Split the text of post summary by newline character
        lines = post_summary.text.split("\n")
        
        # Filter out empty lines
        lines = [line.strip() for line in lines if line.strip()]
        
        # Print each line
        for line in lines:
            print(line)
        
        # Add a separator for better readability
        print('-' * 50)
else:
    print("Failed to fetch the page")


12878
votes
50
answers
3.3m
views
What does the "yield" keyword do in Python?
What functionality does the 'yield' keyword in Python provide?
For example, I'm trying to understand this code1:
def _get_child_candidates(self, distance, min_dist, max_dist):
if self._leftchild ...
pythoniteratorgeneratoryield
Alex. S.
145k
asked Oct 23, 2008 at 22:21
--------------------------------------------------
8196
votes
46
answers
4.6m
views
What does if __name__ == "__main__": do?
What does this do, and why should one include the if statement?
if __name__ == "__main__":
print("Hello, World!")
If you are trying to close a question where someone should be ...
pythonnamespacesprogram-entry-pointpython-moduleidioms
Devoted
180k
asked Jan 7, 2009 at 4:11
--------------------------------------------------
7901
votes
32
answers
2.9m
views
Does Python have a ternary conditional operator?
Is there a ternary conditional operator in Python?
pythonoperatorsconditional-operator
Devoted
180k
asked Dec 27, 2008 a

## Scraping only needed data

In [3]:
import requests
from bs4 import BeautifulSoup

# Load the URL
base_url = 'https://stackoverflow.com/questions/tagged/'
tag = 'python'
query_filter = 'Votes'
url = f"{base_url}{tag}?tab={query_filter}"

# Send a GET request to the URL
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Parse the HTML content
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # Find all elements with the class 's-post-summary' or 'js-post-summary'
    post_summaries = soup.find_all(class_=['s-post-summary'])
    
    # Initialize data list
    data = []
    
    # Iterate over each post summary
    for post_summary in post_summaries:
        # Extract the question title
        question_title = post_summary.find(class_='s-post-summary--content-title').text.strip()
        
        # Extract the number of votes
        votes = post_summary.find(class_='s-post-summary--stats-item-number').text.strip()

        # Extract tags
        tags = post_summary.find(class_='tags').text.strip()
        
        post_data = {
            'question_title': question_title,
            'votes': votes,
            'tags': tags
        }

        data.append(post_data)
        
    # Print the extracted data
    for item in data:
        print("Question:", item['question_title'])
        print("Votes:", item['votes'])
        print("Tags:", item['tags'])
        print('-' * 50)


Question: What does the "yield" keyword do in Python?
Votes: 12878
Tags: pythoniteratorgeneratoryield
--------------------------------------------------
Question: What does if __name__ == "__main__": do?
Votes: 8196
Tags: pythonnamespacesprogram-entry-pointpython-moduleidioms
--------------------------------------------------
Question: Does Python have a ternary conditional operator?
Votes: 7901
Tags: pythonoperatorsconditional-operator
--------------------------------------------------
Question: What are metaclasses in Python?
Votes: 7364
Tags: pythonoopmetaclasspython-classpython-datamodel
--------------------------------------------------
Question: How do I check whether a file exists without exceptions?
Votes: 7123
Tags: pythonfilefile-exists
--------------------------------------------------
Question: How do I merge two dictionaries in a single expression in Python?
Votes: 6917
Tags: pythondictionarymerge
--------------------------------------------------
Question: How do I execut

In [4]:
# data[0]

## Changing tags and scraping

In [5]:
import requests
from bs4 import BeautifulSoup

# Load the URL
base_url = 'https://stackoverflow.com/questions/tagged/'
tag = 'javascript'
query_filter = 'Votes'
url = f"{base_url}{tag}?tab={query_filter}"

# Send a GET request to the URL
response = requests.get(url)

# Check if the request was successful
if response.status_code == 200:
    # Parse the HTML content
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # Find all elements with the class 's-post-summary' or 'js-post-summary'
    post_summaries = soup.find_all(class_=['s-post-summary'])
    
    # Initialize data list
    data1 = []
    
    # Iterate over each post summary
    for post_summary in post_summaries:
        # Extract the question title
        question_title = post_summary.find(class_='s-post-summary--content-title').text.strip()
        
        # Extract the number of votes
        votes = post_summary.find(class_='s-post-summary--stats-item-number').text.strip()

        # Extract tags
        tags = post_summary.find(class_='tags').text.strip()
        
        post_data = {
            'question_title': question_title,
            'votes': votes,
            'tags': tags
        }

        data1.append(post_data)
        
    # Print the extracted data
    for item in data1:
        print("Question:", item['question_title'])
        print("Votes:", item['votes'])
        print("Tags:", item['tags'])
        print('-' * 50)


Question: How can I remove a specific item from an array in JavaScript?
Votes: 11875
Tags: javascriptarrays
--------------------------------------------------
Question: How do I check if an element is hidden in jQuery?
Votes: 8672
Tags: javascriptjquerydomvisibilitydisplay
--------------------------------------------------
Question: What does "use strict" do in JavaScript, and what is the reasoning behind it?
Votes: 8475
Tags: javascriptsyntaxjslintuse-strict
--------------------------------------------------
Question: How do I redirect to another webpage?
Votes: 7701
Tags: javascriptjqueryredirect
--------------------------------------------------
Question: var functionName = function() {} vs function functionName() {}
Votes: 7627
Tags: javascriptfunctionmethodssyntax
--------------------------------------------------
Question: How do JavaScript closures work?
Votes: 7619
Tags: javascriptfunctionvariablesscopeclosures
--------------------------------------------------
Question: How do

In [6]:
# data1[0]

## Scraping multiple pages

In [18]:
import requests
from bs4 import BeautifulSoup

# Load the URL
base_url = 'https://stackoverflow.com/questions/tagged/'
tag = 'python'
query_filter = 'Votes'
url = f"{base_url}{tag}?tab={query_filter}"

# Function to scrape a single page
def scrape_page(url):
    # Send a GET request to the URL
    response = requests.get(url)

    # Check if the request was successful
    if response.status_code == 200:
        # Parse the HTML content
        soup = BeautifulSoup(response.text, 'html.parser')
        
        # Find all elements with the class 's-post-summary' or 'js-post-summary'
        post_summaries = soup.find_all(class_=['s-post-summary'])
        
        # Initialize data list
        data2 = []
        
        # Iterate over each post summary
        for post_summary in post_summaries:
            # Extract the question title
            question_title = post_summary.find(class_='s-post-summary--content-title').text.strip()
            
            # Extract the number of votes
            votes = post_summary.find(class_='s-post-summary--stats-item-number').text.strip()

            # Extract tags
            tags = post_summary.find(class_='tags').text.strip()
            
            post_data = {
                'question_title': question_title,
                'votes': votes,
                'tags': tags
            }

            data2.append(post_data)
        
        return data2

# Scrape multiple pages
total_pages = 5  # Number of pages to scrape
for page_num in range(1, total_pages + 1):
    url = url.format(base_url, tag, query_filter, page_num)
    page_data = scrape_page(url)
    
    # Print the extracted data for the current page
    print(f"Page {page_num}:")
    for item in page_data:
        print("Question:", item['question_title'])
        print("Votes:", item['votes'])
        print("Tags:", item['tags'])
        print('-' * 50)


Page 1:
Question: What does the "yield" keyword do in Python?
Votes: 12878
Tags: pythoniteratorgeneratoryield
--------------------------------------------------
Question: What does if __name__ == "__main__": do?
Votes: 8196
Tags: pythonnamespacesprogram-entry-pointpython-moduleidioms
--------------------------------------------------
Question: Does Python have a ternary conditional operator?
Votes: 7901
Tags: pythonoperatorsconditional-operator
--------------------------------------------------
Question: What are metaclasses in Python?
Votes: 7364
Tags: pythonoopmetaclasspython-classpython-datamodel
--------------------------------------------------
Question: How do I check whether a file exists without exceptions?
Votes: 7123
Tags: pythonfilefile-exists
--------------------------------------------------
Question: How do I merge two dictionaries in a single expression in Python?
Votes: 6917
Tags: pythondictionarymerge
--------------------------------------------------
Question: How do 

In [19]:
df = pd.DataFrame(page_data)
df.head(112)

Unnamed: 0,question_title,votes,tags
0,"What does the ""yield"" keyword do in Python?",12878,pythoniteratorgeneratoryield
1,"What does if __name__ == ""__main__"": do?",8196,pythonnamespacesprogram-entry-pointpython-modu...
2,Does Python have a ternary conditional operator?,7901,pythonoperatorsconditional-operator
3,What are metaclasses in Python?,7364,pythonoopmetaclasspython-classpython-datamodel
4,How do I check whether a file exists without e...,7123,pythonfilefile-exists
5,How do I merge two dictionaries in a single ex...,6917,pythondictionarymerge
6,How do I execute a program or call a system co...,6111,pythonshellterminalsubprocesscommand
7,"How do I create a directory, and any missing p...",5643,pythonexceptionpathdirectoryoperating-system
8,How to access the index value in a 'for' loop?,5421,pythonloopslist
9,How do I make a flat list out of a list of lists?,5317,pythonlistmultidimensional-arrayflatten


In [20]:
df.shape

(50, 3)

In [21]:
df.to_csv("python.csv", index=False)