## Focus on time.sleep

Some websites block you when you send too many requests in a short period of time. To avoid this, you need to:
* send your requests with a `User-Agent` header
* if that is not enough, use ```time.sleep``` to "act human"
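
The two points above can be combined in one small helper. This is only a sketch: `fetch_politely`, its `get` parameter and the User-Agent string are hypothetical names chosen for illustration, and `get` stands for any function that can be called like `requests.get(url, headers=...)`.

```python
import time

# Hypothetical User-Agent string: identify your scraper and give a contact.
DEFAULT_HEADERS = {"User-Agent": "my-scraper/0.1 (contact: me@example.com)"}


def fetch_politely(urls, get, delay=1.0, headers=None, sleep=time.sleep):
    """Call get(url, headers=...) for each URL, pausing `delay` seconds in between."""
    headers = headers or DEFAULT_HEADERS
    responses = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)  # pause before every request except the first
        responses.append(get(url, headers=headers))
    return responses
```

With the `requests` library, a call would look like `fetch_politely(urls, requests.get, delay=2)`.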

## Why use time.sleep when scraping?

### Be polite to the server (rate limiting)
If you send many requests too fast, the server may:
* block your IP
* return 429 Too Many Requests
* silently throttle or ban you

```time.sleep()``` pauses your script between requests, so your traffic looks more like a human visitor than a bot.
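
When a server answers 429, it may also send a `Retry-After` header saying how long to wait. The sketch below shows one possible way to turn a response into a pause length. `seconds_to_wait` is a hypothetical helper, and the "ten times the default" fallback is an arbitrary assumption, not a standard.

```python
def seconds_to_wait(status_code, headers, default_delay=1.0):
    """Return how long to sleep before sending the next request."""
    if status_code != 429:
        return default_delay
    retry_after = headers.get("Retry-After", "")
    # Retry-After may also be an HTTP date; this sketch only handles plain seconds.
    if retry_after.strip().isdigit():
        return max(float(retry_after), default_delay)
    # Assumed fallback: back off harder when the server gives no hint.
    return default_delay * 10
```

Inside a scraping loop this could be used as `time.sleep(seconds_to_wait(r.status_code, r.headers))`.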

### Avoid breaking fragile websites
Some sites:
* can’t handle rapid repeated requests
* crash or return incomplete responses

Sleeping between requests makes scraping more stable.

### Respect website rules
Many sites state the request frequency they accept in:
* robots.txt
* terms of service

Using sleep is the bare minimum of “ethical scraping”.
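
As an illustration, Python's standard library can read a `robots.txt` file and report the `Crawl-delay` it requests, if any. The `robots.txt` content and the `my-scraper` name below are made up so that the example runs without network access.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content, inlined so the example needs no network access.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

user_agent = "my-scraper"  # hypothetical scraper name
delay = parser.crawl_delay(user_agent) or 1  # fall back to 1 s if no Crawl-delay is set
allowed = parser.can_fetch(user_agent, "https://example.com/public/page")
blocked = parser.can_fetch(user_agent, "https://example.com/private/page")

print(delay, allowed, blocked)  # prints: 5 True False
```

Against a real site, `parser.set_url(...)` followed by `parser.read()` downloads the actual file, and `delay` can then be passed to ```time.sleep```.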

### Wait for content to be ready
In some workflows:
* page 2 depends on page 1
* backend needs time to generate data
* server-side caching kicks in

Without a pause, later requests may fail or return empty data.
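
One way to handle this situation is to retry until the content is there, sleeping between attempts. The helper below is a minimal sketch: `wait_until_ready`, `fetch` and `is_ready` are hypothetical names, where `fetch` is any function that performs the request and `is_ready` decides whether the result is usable.

```python
import time


def wait_until_ready(fetch, is_ready, delay=2.0, max_attempts=5, sleep=time.sleep):
    """Call fetch() until is_ready(result) is true, sleeping between attempts."""
    for attempt in range(1, max_attempts + 1):
        result = fetch()
        if is_ready(result):
            return result
        if attempt < max_attempts:
            sleep(delay)  # give the backend some time before asking again
    raise TimeoutError(f"content not ready after {max_attempts} attempts")
```

Capping the number of attempts keeps the script from looping forever when the data never shows up.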

## How to use it in practice?

In [1]:
import requests

In [2]:
import requests

url = "https://api.github.com/users/octocat"

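# Send 70 requests back to back, without any pause
for i in range(70):
    r = requests.get(url)
    print(i, r.status_code)

# Inspect the rate-limit headers returned by GitHub
r = requests.get(url)
print(r.headers.get("X-RateLimit-Limit"))      # requests allowed per window
print(r.headers.get("X-RateLimit-Remaining"))  # requests left in the current window
print(r.headers.get("X-RateLimit-Reset"))      # window reset time (Unix timestamp)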

0 200
1 200
2 200
3 200
4 200
5 200
6 200
7 200
8 200
9 200
10 200
11 200
12 200
13 200
14 200
15 200
16 200
17 200
18 200
19 200
20 200
21 200
22 200
23 200
24 200
25 200
26 200
27 200
28 200
29 200
30 200
31 200
32 200
33 200
34 200
35 200
36 200
37 200
38 200
39 200
40 200
41 200
42 200
43 200
44 200
45 200
46 200
47 200
48 200
49 200
50 200
51 200
52 200
53 200
54 200
55 200
56 200
57 200
58 200
59 200
60 403
61 403
62 403
63 403
64 403
65 403
66 403
67 403
68 403
69 403
60
0
1769467544


In [3]:
import requests
import time

for i in range(10):
    # httpbin.org/status/429 always answers 429: this cell only demonstrates the pacing
    r = requests.get("https://httpbin.org/status/429")
    print(r.status_code)
    time.sleep(2)  # wait 2 seconds between requests


429
429
429
429
429
429
429
429
429
429


In [4]:
import requests
import time

url = "https://api.github.com/users/octocat"

for i in range(5):
    r = requests.get(url)
    print(i, r.status_code)
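    # Pause 60 s between requests. The quota was already used up by the
    # earlier cell, so GitHub keeps answering 403 until the reset time.
    time.sleep(60)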


0 403
1 403
2 403
3 403


KeyboardInterrupt: 
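
In the run above, a fixed 60-second pause is not enough because the quota only comes back at the reset time. GitHub reports that time in the `X-RateLimit-Reset` header (the Unix timestamp printed by the earlier cell), so a possible alternative is to compute the pause from the headers. `seconds_until_reset` is a hypothetical helper, shown as a sketch only.

```python
import time


def seconds_until_reset(headers, now=None):
    """Seconds to wait until the time given by X-RateLimit-Reset (epoch seconds)."""
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0  # quota left, no need to wait
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    now = time.time() if now is None else now
    return max(0.0, reset_at - now)
```

In a loop, `time.sleep(seconds_until_reset(r.headers))` would then wait just long enough for the next window to open.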

In [5]:
import time
import random

# Randomize the pause (between 1 and 3 seconds) so requests are not evenly spaced
time.sleep(random.uniform(1, 3))
