# Stack Overflow
Stack Overflow is one of the most popular question-and-answer websites for both professional and enthusiast programmers. Founded in 2008, it has grown to become the largest, most trusted online community for developers to learn, share their knowledge, and build their careers.

Stack Overflow is now part of the Stack Exchange network of Q&A sites. Its stated mission is to build, with the help of its community, a library of detailed answers to every question about programming.

It takes its name from the common programming bug called "stack overflow", in which a program tries to use more memory than is available on the stack. The website's tagline is "Where Developers Learn, Share, & Build Careers".

While Stack Overflow is a great place to learn and share knowledge, it is criticized for being somewhat hostile to some of its users, especially new developers, and for its very aggressive moderation policy.

Despite this, Stack Overflow is a name that nearly every developer knows and uses.

## Using Stack Overflow's API
Stack Overflow has a public API that gives access to the data on the website: questions, answers, users, tags, and more. The API is free to use, but anonymous requests are limited to 300 per day. To raise this limit, you need to register and get an API key, which allows 10,000 requests per day.  
Still, the API can be slow for retrieving large amounts of data: it returns at most 100 items per request and can impose a backoff of 10 seconds between requests. This means that retrieving 1000 items takes 10 requests, each one waiting 10 seconds before the next. This is very time consuming for large datasets.  
If larger amounts of data are needed, you can use the Stack Exchange Data Explorer instead, which lets you write SQL queries against the data.
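
The paging arithmetic above can be checked with a few lines of Python. This is only a rough sketch: the helper name `estimate_paging` is hypothetical, and it assumes the worst case of a fixed 10-second wait between consecutive requests, which the API does not necessarily impose on every call.

```python
import math
from typing import Tuple

def estimate_paging(n_items: int, page_size: int = 100, backoff_s: int = 10) -> Tuple[int, int]:
    """Return (number of requests, total seconds spent waiting between them)."""
    # Round up so that a partially filled last page still counts as a request
    n_requests = math.ceil(n_items / page_size)
    # Assumption: one fixed wait between each pair of consecutive requests
    wait_s = max(n_requests - 1, 0) * backoff_s
    return n_requests, wait_s

print(estimate_paging(1000))  # (10, 90)
print(estimate_paging(2500))  # (25, 240)
```

Rounding up matters when the number of items is not a multiple of the page size: 450 items still need 5 requests.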


### Using the API
The API is a standard REST API, which means that you can use any HTTP client to make requests to it.  
We will be using the Python requests library to make requests to the API.

In [1]:
# Import necessary libraries
import requests
import json
import time

Now define a function that sends a request to the API. It takes the base URL, the method (endpoint), and the query parameters, and returns the JSON response as a dictionary.

In [2]:
def request_to(site : str = "https://api.stackexchange.com/2.3", 
               method : str = "questions", 
               parameters : dict = {"order" : "desc", 
                                    "sort" : "activity", 
                                    "site" : "stackoverflow"}) -> dict:
    # Make the request
    response = requests.get(f"{site}/{method}", params=parameters)
    # Return the response as a dictionary
    return response.json()
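
To see what such a request looks like on the wire, the sketch below builds the URL by hand with the standard library instead of sending it. The helper `build_url` and the placeholder `YOUR_KEY` are hypothetical; the sketch assumes the API key is passed as a `key` query parameter.

```python
from typing import Optional
from urllib.parse import urlencode

API_ROOT = "https://api.stackexchange.com/2.3"

def build_url(method: str, parameters: dict, key: Optional[str] = None) -> str:
    """Build the full request URL without sending anything."""
    # Copy so the caller's dictionary is left untouched
    params = dict(parameters)
    if key is not None:
        # Assumption: the application key travels as the "key" query parameter
        params["key"] = key
    return f"{API_ROOT}/{method}?{urlencode(params)}"

defaults = {"order": "desc", "sort": "activity", "site": "stackoverflow"}
print(build_url("questions", defaults))
print(build_url("questions", defaults, key="YOUR_KEY"))
```

The first URL corresponds to the default arguments of `request_to`; the second shows where a key would go for the higher daily quota.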

Also define a helper function that allows us to print the JSON response in a more readable way.

In [3]:
def print_json(json_dict : dict) -> None:
    # Print the json dictionary in a readable format
    print(json.dumps(json_dict, indent=4))

Let's test what we have:

In [4]:
data = request_to()
print_json(data)

{
    "items": [
        {
            "tags": [
                "reactjs"
            ],
            "owner": {
                "account_id": 20288629,
                "reputation": 1791,
                "user_id": 14880787,
                "user_type": "registered",
                "profile_image": "https://lh4.googleusercontent.com/-YSt5K9vGiTc/AAAAAAAAAAI/AAAAAAAAAAA/AMZuucm51GVw9vFMGAgI58cIhJqivOUHIA/s96-c/photo.jpg?sz=256",
                "display_name": "Manishkumar Adesara",
                "link": "https://stackoverflow.com/users/14880787/manishkumar-adesara"
            },
            "is_answered": true,
            "view_count": 156999,
            "answer_count": 21,
            "score": 159,
            "last_activity_date": 1668424365,
            "creation_date": 1639593476,
            "last_edit_date": 1647330693,
            "question_id": 70368760,
            "content_license": "CC BY-SA 4.0",
            "link": "https://stackoverflow.com/questions/70368760/react

As we can see, the response is a JSON object with the information about the questions.
The response contains:
- items: the results of the query (in our case, the questions)
- has_more: a boolean value that indicates if there are more results to retrieve. We can use pagination to retrieve more results.
- quota_max: the maximum number of requests you can make per day. As specified before, this is 300 for anonymous requests and 10,000 for requests made with an API key.
- quota_remaining: the number of requests you can still make today.

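
The fields listed above can be pulled out of a response with a small helper. This is a minimal sketch: `summarize_wrapper` is a hypothetical name, and `sample` is a hand-written dictionary shaped like a response, not real API output.

```python
def summarize_wrapper(response: dict) -> dict:
    """Collect the paging and quota fields of a response in one place."""
    return {
        "n_items": len(response.get("items", [])),
        "has_more": response.get("has_more", False),
        "quota_max": response.get("quota_max"),
        "quota_remaining": response.get("quota_remaining"),
        # "backoff" is only present when the API asks the client to wait
        "backoff": response.get("backoff", 0),
    }

# Hand-written stand-in for a real response
sample = {
    "items": [{"question_id": 1}, {"question_id": 2}],
    "has_more": True,
    "quota_max": 300,
    "quota_remaining": 299,
}
print(summarize_wrapper(sample))
```

Using `.get` with defaults keeps the helper from failing when a field such as `backoff` is absent.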
Now let's try to use pagination. We will request the first 500 questions that are tagged with python.

In [10]:
data = request_to(method="questions", parameters={"order" : "desc", "site" : "stackoverflow", "tagged" : "python;"})
print_json(data)

{
    "items": [
        {
            "tags": [
                "javascript",
                "python",
                "html",
                "django"
            ],
            "owner": {
                "account_id": 26620676,
                "reputation": 1,
                "user_id": 20239504,
                "user_type": "registered",
                "profile_image": "https://www.gravatar.com/avatar/0c9e7412e0b5bade5c42040794b24097?s=256&d=identicon&r=PG",
                "display_name": "dragline",
                "link": "https://stackoverflow.com/users/20239504/dragline"
            },
            "is_answered": false,
            "view_count": 2,
            "answer_count": 0,
            "score": 0,
            "last_activity_date": 1668424727,
            "creation_date": 1668424727,
            "question_id": 74430807,
            "content_license": "CC BY-SA 4.0",
            "link": "https://stackoverflow.com/questions/74430807/creating-a-dynamic-input-field-with-djang

Let's check how many questions were returned:

In [11]:
print(len(data["items"]))

30


This is not what we expected: we wanted 100 results, but the API returned only 30 per page. We need to set the pagesize parameter to get 100 results per page.

In [12]:
data = request_to(method="questions", parameters={"order" : "desc", "site" : "stackoverflow", "tagged" : "python;", "pagesize" : 100})
print(len(data["items"]))

100


Now we can finally retrieve the first 500 questions.

In [23]:
data = []
ELEMS = 500
ELEMPERPAGE = 100

def page_request(method : str, i : int, parameters : dict, page_size : int = 100) -> dict:
    # Work on a copy so the caller's dictionary is not modified between calls
    parameters = dict(parameters)
    # For safety, notify the user if page / pagesize was already present in the parameter list
    if "page" in parameters:
        print("Page already present in parameters. Overwriting.")
    if "pagesize" in parameters:
        print("Page size already present in parameters. Overwriting.")
    parameters["page"] = i
    parameters["pagesize"] = page_size
    # Forward the requested method to the API
    return request_to(method=method, parameters=parameters)

for i in range(1, ((ELEMS + ELEMPERPAGE) //ELEMPERPAGE) ):
    data.append(page_request("questions", i, parameters={"order" : "desc", "site" : "stackoverflow", "tagged" : "python;"}))
    if "backoff" in data[-1]:
        time.sleep(data[-1]["backoff"])

In [24]:
print([len(data[i]) for i in range(len(data))])
print([len(data[i]["items"]) for i in range(len(data))])

[4, 4, 4, 4, 4]
[100, 100, 100, 100, 100]
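
The loop above fixes the number of pages in advance. An alternative, sketched below, keeps requesting pages until `has_more` is false or enough items have been collected, and honours `backoff` along the way. The names `iter_items` and `fake_fetch` are hypothetical, and `fake_fetch` is a stand-in for the real API so that the example runs offline.

```python
import time
from typing import Callable, Iterator

def iter_items(fetch_page: Callable[[int], dict], max_items: int,
               sleep: Callable[[float], None] = time.sleep) -> Iterator[dict]:
    """Yield up to max_items items, requesting pages 1, 2, ... as needed."""
    page = 1
    yielded = 0
    while yielded < max_items:
        response = fetch_page(page)
        items = response.get("items", [])
        # Take only as many items as are still needed
        batch = items[: max_items - yielded]
        yield from batch
        yielded += len(batch)
        # Stop on an empty page or when the API reports no further pages
        if not items or not response.get("has_more", False):
            return
        # Wait only if another request will actually follow
        if yielded < max_items and "backoff" in response:
            sleep(response["backoff"])
        page += 1

def fake_fetch(page: int) -> dict:
    # Stand-in for the API: three pages with two items each
    start = (page - 1) * 2
    return {"items": [{"question_id": q} for q in range(start, start + 2)],
            "has_more": page < 3}

print([item["question_id"] for item in iter_items(fake_fetch, max_items=5)])  # [0, 1, 2, 3, 4]
```

With the notebook's own helper, `fetch_page` could be something like `lambda p: page_request("questions", p, {"order": "desc", "site": "stackoverflow"})`.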


Now that we know how to access the data, let's try to do something more interesting.  
First, retrieve the first 2500 questions that are unanswered, sorted by the number of votes (descending).

In [None]:
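data = []
ELEMS = 2500
# Sort by votes in descending order, as described above
parameters = {"order" : "desc", "sort" : "votes", "site" : "stackoverflow", "unanswered" : "true"}
for i in range(1, ((ELEMS + ELEMPERPAGE) //ELEMPERPAGE) ):
    data.append(page_request("questions", i, parameters))
    # Respect the backoff requested by the API before the next call
    if "backoff" in data[-1]:
        time.sleep(data[-1]["backoff"])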

In [34]:
print(len(data))
# Check how many requests we still have
print(data[24]["quota_remaining"])


100
246
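
Since `data` is a list of page responses, a natural next step is to merge the pages into one flat list of questions before analysing them. The sketch below uses the hypothetical helpers `flatten_pages` and `top_by_score`, and `sample_pages` is hand-written data shaped like the responses above, not real API output.

```python
from typing import List

def flatten_pages(pages: List[dict]) -> List[dict]:
    """Merge the "items" lists of several page responses into one list."""
    return [item for page in pages for item in page.get("items", [])]

def top_by_score(items: List[dict], n: int = 3) -> List[dict]:
    """Return the n items with the highest score."""
    return sorted(items, key=lambda q: q.get("score", 0), reverse=True)[:n]

# Hand-written stand-in for the list of page responses
sample_pages = [
    {"items": [{"question_id": 1, "score": 5}, {"question_id": 2, "score": 12}]},
    {"items": [{"question_id": 3, "score": 7}]},
]
questions = flatten_pages(sample_pages)
print(len(questions))  # 3
print([q["question_id"] for q in top_by_score(questions, n=2)])  # [2, 3]
```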
