# Special Category Question Scraper

Uses BeautifulSoup to scrape Quizlet study sets for a given category and save their terms and definitions as candidate questions.

### Importing Libraries

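`requests` fetches each page, BeautifulSoup parses its HTML, and the extracted questions are written to .txt files. Parsing with `'html5lib'` requires the `html5lib` package to be installed alongside `beautifulsoup4`.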

In [1]:
import requests
from bs4 import BeautifulSoup

### Get Links for Quizlet Sets

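Given a category keyword, `get_links` builds the Quizlet subject URL (`https://quizlet.com/subject/<category>/`), fetches that page, and returns every absolute link on it that points to quizlet.com. These links are treated as the candidate study sets for the category.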

In [2]:
def get_links(category):

    # List to hold the set URLs found on the subject page
    link_strings = []

    # Define the URL of the Quizlet subject page
    URL = "https://quizlet.com/subject/" + category + "/"

    # Make a request for the URL (the timeout keeps a stalled request from hanging)
    r = requests.get(URL, timeout=10)
    r.raise_for_status()

    # Make a BeautifulSoup object to parse the HTML
    soup = BeautifulSoup(r.content, 'html5lib')

    # Parse through all of the links that actually have an href attribute
    for link in soup.find_all('a', href=True):

        # Get the string of a link
        link_string = link['href']

        # Keep only absolute links to quizlet.com (relative links start with "/")
        if not link_string.startswith("/") and "quizlet.com/" in link_string:

            # Append to overall link strings, skipping duplicates
            if link_string not in link_strings:
                link_strings.append(link_string)

    return link_strings
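The filtering rule in `get_links` can be exercised offline. The sketch below pulls the rule into a hypothetical helper, `is_set_link`, and runs it over a few made-up `href` values. The sample URLs are illustrative assumptions, not real Quizlet sets. Note that the rule keeps any absolute quizlet.com link, so navigation links on the subject page may slip through alongside the study sets.

```python
def is_set_link(href):
    """Hypothetical helper mirroring the filter in get_links:
    keep absolute links (not starting with "/") that mention quizlet.com/."""
    if not href:  # missing or empty href attributes are skipped
        return False
    return not href.startswith("/") and "quizlet.com/" in href


# Made-up hrefs for illustration only
sample_hrefs = [
    "/subject/physics/",                         # relative link: rejected
    "https://quizlet.com/123456/example-set/",   # absolute quizlet.com link: kept
    "https://example.com/other",                 # other site: rejected
    "",                                          # empty href: rejected
    None,                                        # anchor without href: rejected
]

kept = [h for h in sample_hrefs if is_set_link(h)]
print(kept)  # ['https://quizlet.com/123456/example-set/']
```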

### Get Questions from the Links

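Visits each set link, extracts every term and its definition, and writes them to `questions/<topic>.txt`. The file is opened in write mode, so an existing file for the same topic is overwritten rather than appended to.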

In [3]:
import os

def create_questions(topic, link_strings):

    # Make sure the output folder exists, then open a file named after the topic
    os.makedirs("questions", exist_ok=True)
    with open(os.path.join("questions", topic + ".txt"), "w", encoding="utf-8") as file1:

        # For each link in the list of links
        for link in link_strings:

            # Make a new request for the link, skipping sets that fail to load
            r = requests.get(link, timeout=10)
            if not r.ok:
                continue

            # Make a BeautifulSoup object to parse the HTML
            soup = BeautifulSoup(r.content, 'html5lib')

            # Parse through all of the a tags on the set page
            for tag in soup.find_all('a'):

                # Get the tag's classes (an empty list if it has none)
                tag_classes = tag.get('class') or []

                # If the tag is a term, write its visible text to the file
                # (get_text drops the wrapping <span> and joins text split by <br/>)
                if "SetPageTerm-wordText" in tag_classes:
                    term = tag.get_text(separator=" ", strip=True)
                    file1.write("Term: " + term + "\n")

                # If the tag is a definition, write its visible text to the file
                elif "SetPageTerm-definitionText" in tag_classes:
                    definition = tag.get_text(separator=" ", strip=True)
                    file1.write("Definition: " + definition + "\n\n")

    # The with-block closes the file automatically
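Before scraping live pages, the term/definition extraction can be checked on a static snippet. The sketch below assumes set pages mark terms and definitions with the `SetPageTerm-wordText` and `SetPageTerm-definitionText` classes used above. The HTML snippet and the `extract_pairs` helper are hypothetical, and Quizlet's real markup may differ or change. It uses BeautifulSoup's `get_text`, which drops the wrapping `<span>` and joins text split by `<br/>` with a space, and the built-in `"html.parser"` so that no extra parser package is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the class names used in create_questions
SAMPLE_HTML = """
<div>
  <a class="SetPageTerm-wordText"><span class="TermText">Newton's first law</span></a>
  <a class="SetPageTerm-definitionText"><span class="TermText">An object at rest<br/>stays at rest</span></a>
  <a class="nav-link" href="/home">Home</a>
</div>
"""


def extract_pairs(html):
    """Return (kind, text) entries for term and definition tags, in page order."""
    soup = BeautifulSoup(html, "html.parser")
    entries = []
    for tag in soup.find_all("a"):
        classes = tag.get("class") or []  # tags without a class attribute give []
        if "SetPageTerm-wordText" in classes:
            entries.append(("Term", tag.get_text(separator=" ", strip=True)))
        elif "SetPageTerm-definitionText" in classes:
            entries.append(("Definition", tag.get_text(separator=" ", strip=True)))
    return entries


for kind, text in extract_pairs(SAMPLE_HTML):
    print(kind + ": " + text)
```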

### Test Program

Runs both functions for a few sample categories, producing one question file per category in the `questions/` folder.

In [4]:
special_categories = ["star-wars", "world-war-ii", "shakespeare's-works", "physics"]

for category in special_categories:
    
    # Get all of the set links corresponding to a certain category
    link_strings = get_links(category)
    
    # Create the questions for the category
    create_questions(category, link_strings)
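Each output file alternates `Term:` and `Definition:` lines, with a blank line after every definition. The sketch below shows one way to read such a file back into question/answer pairs. `read_questions` is a hypothetical helper, and it assumes every entry fits on one line and every term is immediately followed by its definition. That holds only if each set page lists them in that order.

```python
def read_questions(text):
    """Parse 'Term: ...' / 'Definition: ...' lines into (term, definition) pairs."""
    pairs = []
    term = None
    for line in text.splitlines():
        if line.startswith("Term: "):
            term = line[len("Term: "):]
        elif line.startswith("Definition: ") and term is not None:
            pairs.append((term, line[len("Definition: "):]))
            term = None  # wait for the next term before pairing again
    return pairs


# Example content in the same format the scraper writes
sample = "Term: Photon\nDefinition: A quantum of light\n\nTerm: Joule\nDefinition: SI unit of energy\n\n"

# With a real file, e.g. questions/physics.txt:
# with open("questions/physics.txt", encoding="utf-8") as f:
#     pairs = read_questions(f.read())
for term, definition in read_questions(sample):
    print(term, "->", definition)
```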