# Web 2: Flask

In [1]:
import requests
import time
import urllib.robotparser

### Rate-limited webpage parsing


- `requests` module:
    - `resp = requests.get(<URL>)`: sends an HTTP GET request and returns a response object
    - `resp.status_code`: status code of the response
    - `resp.text`: `str` text content of the response
    - `resp.headers`: `dict`-like mapping of response headers (header names are case-insensitive)
    
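- `@` syntax applies a "decorator" to the function defined right below it (ex: `@app.route("/")` registers the function as the handler for a URL path)
- `flask.Response`: enables us to create a response object instance
    - Arguments: a `str` containing the response body, a `headers` dict of response metadata, and a `status` giving the status code.
    - Examples: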
    ```python
    flask.Response("<b>go away</b>",
                   status=429,
                   headers={"Retry-After": "3"})
    ```
    
    ```python
    flask.Response("""User-Agent: *
    Disallow: /never
    """, headers={"Content-Type": "text/plain"})
    ```

- `flask.request.remote_addr`: the IP address of the client that sent the request; lets the server take per-client action, such as rate limiting

- 429 Too Many Requests: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429
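
The pieces above can be combined into a small server-side sketch. This is not the actual code behind the course server. It is a minimal illustration that assumes a per-IP limit of one request every 3 seconds, and the names `app`, `last_visit`, and `MIN_INTERVAL` are hypothetical.

```python
import time
import flask

app = flask.Flask(__name__)

MIN_INTERVAL = 3   # hypothetical: seconds a client must wait between requests
last_visit = {}    # hypothetical: client IP -> time of last accepted request

@app.route("/slow")   # decorator: registers slow() as the handler for /slow
def slow():
    ip = flask.request.remote_addr
    now = time.time()
    # Reject the request if this client was served too recently.
    if now - last_visit.get(ip, 0) < MIN_INTERVAL:
        return flask.Response("<b>go away</b>",
                              status=429,
                              headers={"Retry-After": str(MIN_INTERVAL)})
    last_visit[ip] = now
    return "welcome!"

@app.route("/robots.txt")
def robots():
    # Serve the crawling rules as plain text rather than HTML.
    return flask.Response("User-Agent: *\nDisallow: /never\n",
                          headers={"Content-Type": "text/plain"})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

With this sketch, a second request to `/slow` from the same IP within 3 seconds gets a 429 response with a `Retry-After` header, which is the case a polite client should handle by sleeping and retrying.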

In [2]:
base_url = "http://35.226.223.87:5000/"

In [3]:
def friendly_get(url):
    while True:
        resp = requests.get(url)
        if resp.status_code == 429:
            seconds = int(resp.headers.get("Retry-After", 1))
            print(f"sleep {seconds}")
            time.sleep(seconds)
            continue
        resp.raise_for_status() # raise an exception for any 4xx/5xx status
        return resp
    
friendly_get(base_url + "slow").text

'welcome!'

### `urllib.robotparser`

- Documentation: https://docs.python.org/3/library/urllib.robotparser.html
- A few websites with a `robots.txt` file:
    - https://en.wikipedia.org/robots.txt
    - https://www.reddit.com/robots.txt
    - https://cs320.cs.wisc.edu/su24/robots.txt
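
As a minimal sketch of how `urllib.robotparser` answers "may I fetch this URL?", the example below parses the same rules used in the `flask.Response` example earlier, supplied as an in-memory list of lines so that it runs without network access. The paths `never` and `slow` are used for illustration only.

```python
import urllib.robotparser

# Rules supplied directly as lines of text (no network request needed).
rules = """User-Agent: *
Disallow: /never
""".splitlines()

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules)

base_url = "http://35.226.223.87:5000/"
print(rp.can_fetch("*", base_url + "never"))  # False: path is disallowed
print(rp.can_fetch("*", base_url + "slow"))   # True: no rule blocks it
```

To check a live site instead, call `rp.set_url(<URL of robots.txt>)` followed by `rp.read()` in place of `rp.parse(...)`, then use `can_fetch` the same way.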