# Practice NB Using ZenRows Web-Scraping Tutorial
---
## Douglas Krouth
## [TUTORIAL URL](https://www.zenrows.com/blog/stealth-web-scraping-in-python-avoid-blocking-like-a-ninja#ip-rate-limit)

In [2]:
# IP Rate Limit
# We can't tell in advance whether the target site enforces per-IP rate limits.
# The workaround is to send requests through rotating proxies so the source IP keeps changing.
import requests

# httpbin echoes back the IP address that the request came from.
response = requests.get('http://httpbin.org/ip')
print(response.json()['origin'])

73.62.199.225


Free proxy list site: [URL](https://free-proxy-list.net/)
---
This gives us a list of *unreliable* proxies that we can practice with.

In [4]:
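# Only plain-HTTP traffic is routed through the proxy; add an 'https' key to cover HTTPS URLs.
proxies = {'http': 'http://105.242.158.92:3129'}
# Free proxies often hang, so set a timeout instead of waiting indefinitely.
response = requests.get('http://httpbin.org/ip', proxies=proxies, timeout=10)
print(response.json()['origin'])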

105.242.158.92
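
A single free proxy tends to die quickly, so the rotation idea can be sketched as a round-robin over a small pool with a retry on failure. This is a minimal sketch rather than the tutorial's code: `proxy_cycle`, `fetch_with_rotation`, and the `PROXY_POOL` addresses (reserved documentation IPs) are hypothetical names and placeholders, and the `get` parameter exists only so the helper can be tried with a stub instead of a live proxy.

```python
import itertools

# Placeholder addresses from a reserved documentation range; swap in live proxies.
PROXY_POOL = ['203.0.113.10:3128', '203.0.113.11:8080']


def proxy_cycle(addresses):
    """Yield requests-style proxy dicts forever, in round-robin order."""
    for address in itertools.cycle(addresses):
        proxy_url = f'http://{address}'
        yield {'http': proxy_url, 'https': proxy_url}


def fetch_with_rotation(url, addresses, get=None, max_attempts=3, timeout=5):
    """Try the URL through successive proxies until one attempt succeeds."""
    if get is None:
        import requests  # imported here so a stub `get` works without the library
        get = requests.get
    proxies_iter = proxy_cycle(addresses)
    last_error = None
    for _ in range(max_attempts):
        proxies = next(proxies_iter)
        try:
            response = get(url, proxies=proxies, timeout=timeout)
            response.raise_for_status()
            return response
        except OSError as error:  # requests.RequestException subclasses OSError
            last_error = error
    raise RuntimeError(f'all {max_attempts} attempts failed') from last_error
```

In the notebook, `fetch_with_rotation('http://httpbin.org/ip', PROXY_POOL).json()['origin']` should report the address of whichever proxy answered first, assuming at least one entry in the pool is alive.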


## User-Agent Header
Check the request headers to see whether we are sending a suspicious User-Agent (UA).

In [5]:
response = requests.get('http://httpbin.org/headers') 
print(response.json()['headers']['User-Agent']) 
# The default is python-requests/<installed version>, which openly identifies the client as a script.

python-requests/2.28.2


In [6]:
# Fake User-Agent header passed with requests
headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36"} 
response = requests.get('http://httpbin.org/headers', headers=headers) 
print(response.json()['headers']['User-Agent']) # Mozilla/5.0 ...

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36


Mitigate detection by rotating User-Agent headers at random. As with rotating IP addresses, this makes the traffic look as if it comes from many different users.

Online resources list real browser User-Agent strings that can serve as a source. One challenge with simulating UA data is that browser updates are likely to change the format of the UA header, so the list has to be kept current.

[USER AGENT DATABASE](https://explore.whatismybrowser.com/useragents/explore/)
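
One way to keep a UA list current is to drop entries whose browser version has fallen too far behind. The sketch below is an illustration only: it assumes Chrome-style strings containing a `Chrome/<major>.` token, the cutoff version is an arbitrary value supplied by the caller, and `chrome_major` and `drop_stale` are hypothetical helper names.

```python
import re

# Matches the major version in a Chrome-style token such as "Chrome/91.0.4472.124".
CHROME_VERSION = re.compile(r'Chrome/(\d+)\.')


def chrome_major(user_agent):
    """Return the Chrome major version as an int, or None if no Chrome token is present."""
    match = CHROME_VERSION.search(user_agent)
    return int(match.group(1)) if match else None


def drop_stale(user_agents, min_chrome_major):
    """Keep non-Chrome strings and Chrome strings at or above the cutoff."""
    kept = []
    for user_agent in user_agents:
        major = chrome_major(user_agent)
        if major is None or major >= min_chrome_major:
            kept.append(user_agent)
    return kept
```

Strings without a Chrome token (for example an iPhone WebKit UA) pass through untouched, so they would need their own freshness check.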

In [7]:
import random 
 
user_agents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36',
    'Mozilla/5.0 (iPhone; CPU iPhone OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148',
    'Mozilla/5.0 (Linux; Android 11; SM-G960U) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.72 Mobile Safari/537.36'
]
# Pick one User-Agent at random for this request.
user_agent = random.choice(user_agents)
headers = {'User-Agent': user_agent}
response = requests.get('https://httpbin.org/headers', headers=headers)
print(response.json()['headers']['User-Agent'])
# The output varies from run to run, e.g. Mozilla/5.0 (iPhone; CPU iPhone OS 12_2 like Mac OS X) ...

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36
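
To rotate on every request rather than once per cell, the choice can be made per URL. The sketch below only builds the (URL, headers) pairs and makes no network calls; `request_plan` and its `seed` parameter are hypothetical helpers, not part of the tutorial or of `requests`.

```python
import random


def request_plan(urls, user_agents, seed=None):
    """Pair each URL with headers carrying a randomly chosen User-Agent."""
    rng = random.Random(seed)  # a fixed seed makes the plan repeatable for debugging
    return [(url, {'User-Agent': rng.choice(user_agents)}) for url in urls]
```

Each pair can then be used in a loop as `requests.get(url, headers=headers, timeout=10)`, so consecutive requests may present different browsers.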
