# User-Agent Scraper

A small scraper that builds a list of user-agent strings for later use (mainly for scraping, to reduce the chance of being blocked). The user-agent strings are scraped from https://udger.com/resources/ua-list. The program uses BeautifulSoup 4 for parsing and pandas for the dataframe. The workflow is as follows:
  
1. Get a list of available browsers and define a list of the ones we want to use.
2. Open the user-agent page of each of these browsers.
3. Scrape all the available user-agents and save the information as a triple (name, url, user-agent).
4. Write the triples into a dataframe.
  
Optionally, implement a helper method that loads the saved file and returns a dataframe containing the user-agents, so that they can easily be used in other programs.


In [1]:
# Constants
UDGER_URL = "https://udger.com"
BROWSER_LIST_LINK = "https://udger.com/resources/ua-list"
BROWSER_NAMES = []
BROWSER_LINKS = []
TRIPLE = []

In [2]:
import requests
from bs4 import BeautifulSoup
from tqdm.notebook import tqdm
import pandas as pd
tqdm.pandas()  # register progress_apply on pandas objects without creating a stray progress bar



Get a list of all available browsers with their corresponding links, so it is easy to loop through them and pick only the ones we want (or all of them).

In [3]:
# Fetch the overview page and collect the table cells that link to a browser detail page
page = requests.get(BROWSER_LIST_LINK)
soup = BeautifulSoup(page.content, 'lxml')
table = soup.find_all("table")[0]
tds = table.find_all("td")
browser_tds = []
for td in tqdm(tds):
    # Keep at most 200 browsers
    if str(td).startswith("<td><a href=\"/resources/ua-list/browser-detail?browser=") and len(browser_tds) < 200:
        browser_tds.append(td)

print(f"{len(browser_tds)} Browsers found")


200 Browsers found
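
Matching on `str(td)` depends on how BeautifulSoup serialises the tag. A minimal sketch of an alternative that inspects the `href` attribute directly is shown below. The function name `extract_browser_links`, the constant `DETAIL_PREFIX` and the sample HTML are assumptions made for illustration, and the built-in `html.parser` is used instead of `lxml` only to keep the example self-contained. Unlike the cell above, it considers every link in the table, not only links that open a cell.

```python
from bs4 import BeautifulSoup

# Prefix of the detail-page links, taken from the filter used in the notebook
DETAIL_PREFIX = "/resources/ua-list/browser-detail?browser="

# Hypothetical stand-in for the overview page, used only for illustration
SAMPLE_HTML = """
<table>
  <tr><td><a href="/resources/ua-list/browser-detail?browser=115 Browser">115 Browser</a></td></tr>
  <tr><td><a href="/resources/ua-list/crawlers">Crawlers</a></td></tr>
  <tr><td><a href="/resources/ua-list/browser-detail?browser=Arora">Arora</a></td></tr>
</table>
"""


def extract_browser_links(html, limit=200):
    """Return up to `limit` (name, href) tuples for browser detail links."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    results = []
    for anchor in table.find_all("a", href=True):
        # Compare the attribute value instead of the serialised tag
        if anchor["href"].startswith(DETAIL_PREFIX):
            results.append((anchor.get_text(strip=True), anchor["href"]))
            if len(results) >= limit:
                break
    return results


browser_links = extract_browser_links(SAMPLE_HTML)
```

Returning tuples keeps each name next to its link, which would make the two separate lists `BROWSER_NAMES` and `BROWSER_LINKS` unnecessary.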


In [4]:
# Load the browser names and build their detail-page URLs
for browser_td in tqdm(browser_tds):
    name = browser_td.get_text()
    # Spaces in browser names must be percent-encoded in the URL
    url = UDGER_URL + browser_td.find("a")["href"].replace(" ", "%20")
    BROWSER_NAMES.append(name)
    BROWSER_LINKS.append(url)

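
Replacing spaces by hand covers the browser names seen here, but other characters that are not valid in a URL would slip through. As a hedged sketch, the standard library's `urllib.parse.quote` and `urljoin` can do the encoding and joining instead. The function name `build_browser_url` is hypothetical, and the `safe` set is an assumption chosen so that the resulting URLs look like the ones produced by the cell above.

```python
from urllib.parse import quote, urljoin

UDGER_URL = "https://udger.com"


def build_browser_url(href):
    """Percent-encode a relative detail link and join it with the site root."""
    # Keep URL delimiters, parentheses and existing percent-escapes untouched;
    # everything else that is not URL-safe (such as spaces) is encoded.
    encoded = quote(href, safe="/?=&()%")
    return urljoin(UDGER_URL, encoded)


url = build_browser_url("/resources/ua-list/browser-detail?browser=115 Browser")
print(url)
# → https://udger.com/resources/ua-list/browser-detail?browser=115%20Browser
```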




In [5]:
# Visit every browser page, collect its user-agents and store them as (name, url, user-agent) triples
for name, url in tqdm(zip(BROWSER_NAMES, BROWSER_LINKS), total=len(BROWSER_NAMES)):
    page = requests.get(url)
    soup = BeautifulSoup(page.content, 'lxml')
    ua_links = soup.find_all("a", href=True)
    for link in ua_links:
        # User-agent entries link to the online parser; skip the generic "Online parser" link
        if "/resources/online-parser" in str(link) and link.get_text() != "Online parser":
            ua = link.get_text()
            information_set = (name, url, ua)
            print(information_set)
            TRIPLE.append(information_set)


('115 Browser', 'https://udger.com/resources/ua-list/browser-detail?browser=115%20Browser', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.59 Safari/537.36 115Browser/8.6.2')
('2345 Explorer', 'https://udger.com/resources/ua-list/browser-detail?browser=2345%20Explorer', 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; 2345Explorer 5.0.0.14136)')
('2345 Explorer', 'https://udger.com/resources/ua-list/browser-detail?browser=2345%20Explorer', 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.108 Safari/537.36 2345Explorer/7.1.0.12633')
('360 browser', 'https://udger.com/resources/ua-list/browser-detail?browser=360%20browser', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1 QIHU 360SE')
('360 browser', 'https://udger.com/resources/ua-list/browser-detail?browser=360%20browser', 'Mozilla/5.0 (Windows NT 6.1) AppleWebK

('AOL Shield', 'https://udger.com/resources/ua-list/browser-detail?browser=AOL%20Shield', 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2841.00 Safari/537.36 AOLShield/54.0.2848.0')
('AOL Shield', 'https://udger.com/resources/ua-list/browser-detail?browser=AOL%20Shield', 'Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 AOLShield/52.4.2')
('Aplix', 'https://udger.com/resources/ua-list/browser-detail?browser=Aplix', 'Aplix_SEGASATURN_browser/1.x (Japanese)')
('Arachne', 'https://udger.com/resources/ua-list/browser-detail?browser=Arachne', 'xChaos_Arachne/5.1.89;GPL,386+')
('Arctic Fox', 'https://udger.com/resources/ua-list/browser-detail?browser=Arctic%20Fox', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:52.9) Gecko/20100101 Goanna/3.4 Firefox/52.9 ArcticFox/27.9.15')
('Arora', 'https://udger.com/resources/ua-list/browser-detail?browser=Arora', 'Mozilla/5.0 (X11; U; Linux; C -) AppleWebKit/523.15 (KHTML, like Gecko, Safari/419.3) Arora/0.5')
('

('Blackbird', 'https://udger.com/resources/ua-list/browser-detail?browser=Blackbird', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/2008120120 Blackbird/0.9991')
('BlackHawk', 'https://udger.com/resources/ua-list/browser-detail?browser=BlackHawk', 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.3 (KHTML, like Gecko) BlackHawk/1.0.195.0 Chrome/127.0.0.1 Safari/62439616.534')
('BlackHawk', 'https://udger.com/resources/ua-list/browser-detail?browser=BlackHawk', 'Mozilla/5.0 (Windows NT 6.1; rv:25.3) Gecko/20150425 BlackHawk/25.3.1')
('Bolt', 'https://udger.com/resources/ua-list/browser-detail?browser=Bolt', 'Mozilla/5.0 (X11; 78; CentOS; US-en) AppleWebKit/527+ (KHTML, like Gecko) Bolt/0.862 Version/3.0 Safari/523.15')
('Bolt', 'https://udger.com/resources/ua-list/browser-detail?browser=Bolt', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; BOLT/2.514) AppleWebKit/534.6 (KHTML, like Gecko) Version/5.0 Safari/534.6.3')
('Brave', 'https://udger.com/resourc

('Chrome frozen UA', 'https://udger.com/resources/ua-list/browser-detail?browser=Chrome%20frozen%20UA', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3764.0 Safari/537.36')
('Chrome Headless', 'https://udger.com/resources/ua-list/browser-detail?browser=Chrome%20Headless', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome Safari/537.36')
('Chromium', 'https://udger.com/resources/ua-list/browser-detail?browser=Chromium', 'Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Ubuntu/10.10 Chromium/8.0.552.237 Chrome/8.0.552.237 Safari/534.10')
('Chromium', 'https://udger.com/resources/ua-list/browser-detail?browser=Chromium', 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/535.7 (KHTML, like Gecko) Ubuntu/11.10 Chromium/16.0.912.21 Chrome/16.0.912.21 Safari/535.7')
('Chromium', 'https://udger.com/resources/ua-list/browser-detail?browser=Chromium', 'Mozilla/5.0 (X11; Linux x86_64) App

('Cuam', 'https://udger.com/resources/ua-list/browser-detail?browser=Cuam', 'Cuam Ver0.050bx  ')
('Cunaguaro', 'https://udger.com/resources/ua-list/browser-detail?browser=Cunaguaro', 'Mozilla/5.0 (X11; Linux i686; rv:8.0) Gecko/20100101 Firefox/8.0 Cunaguaro/8.0')
('Cunaguaro', 'https://udger.com/resources/ua-list/browser-detail?browser=Cunaguaro', 'Mozilla/5.0 (X11; Linux i686; rv:27.0) Gecko/20100101 Firefox/27.0 Cunaguaro/27.0')
('Cyberdog', 'https://udger.com/resources/ua-list/browser-detail?browser=Cyberdog', 'Cyberdog/2.0 (Macintosh; PPC)')
('Cyberfox', 'https://udger.com/resources/ua-list/browser-detail?browser=Cyberfox', 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:28.0) Gecko/20100101 Firefox/28.0 Cyberfox/28.0.1')
('Cyberfox', 'https://udger.com/resources/ua-list/browser-detail?browser=Cyberfox', 'Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:30.0) Gecko/20100101 Firefox/30.0 Cyberfox/30.0')
('Cyberfox', 'https://udger.com/resources/ua-list/browser-detail?browser=Cyberfox', 'Mo

('Firefox (BonEcho)', 'https://udger.com/resources/ua-list/browser-detail?browser=Firefox%20(BonEcho)', 'Mozilla/5.0 (BeOS; U; Haiku BePC; en-US; rv:1.8.1.21pre) Gecko/20090227 BonEcho/2.0.0.21pre')
('Firefox (BonEcho)', 'https://udger.com/resources/ua-list/browser-detail?browser=Firefox%20(BonEcho)', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.9) Gecko/20071113 BonEcho/2.0.0.9')
('Firefox (BonEcho)', 'https://udger.com/resources/ua-list/browser-detail?browser=Firefox%20(BonEcho)', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1) Gecko/20061026 BonEcho/2.0')
('Firefox (GranParadiso)', 'https://udger.com/resources/ua-list/browser-detail?browser=Firefox%20(GranParadiso)', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko/2009033017 GranParadiso/3.0.8')
('Firefox (Lorentz)', 'https://udger.com/resources/ua-list/browser-detail?browser=Firefox%20(Lorentz)', 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100411 Lorentz/3.6.3 GTB7.0')
('Firefox (Min

('IBM WebExplorer', 'https://udger.com/resources/ua-list/browser-detail?browser=IBM%20WebExplorer', 'IBM WebExplorer /v0.94')
('IBM WebExplorer', 'https://udger.com/resources/ua-list/browser-detail?browser=IBM%20WebExplorer', 'IBM WebExplorer /v1.02c')
('IBrowse', 'https://udger.com/resources/ua-list/browser-detail?browser=IBrowse', 'IBrowse/2.3 (AmigaOS 3.9)')
('IBrowse', 'https://udger.com/resources/ua-list/browser-detail?browser=IBrowse', 'Mozilla/5.0 (compatible; IBrowse 3.0; AmigaOS4.0)')
('iCab', 'https://udger.com/resources/ua-list/browser-detail?browser=iCab', 'Mozilla/4.5 (compatible; iCab 2.9.1; Macintosh; U; PPC)')
('iCab', 'https://udger.com/resources/ua-list/browser-detail?browser=iCab', 'iCab/3.0.2 (Macintosh; U; PPC Mac OS X)')
('iCab', 'https://udger.com/resources/ua-list/browser-detail?browser=iCab', 'iCab/4.0 (Macintosh; U; Intel Mac OS X)')
('iCab', 'https://udger.com/resources/ua-list/browser-detail?browser=iCab', 'iCab/4.7 (Macintosh; U; PPC Mac OS X)')
('iCab', 'h

('InternetSurfboard', 'https://udger.com/resources/ua-list/browser-detail?browser=InternetSurfboard', 'Mozilla/5.0 (Windows; U; Windows NT 6.1; cs-CZ) AppleWebKit/533.3 (KHTML, like Gecko) InternetSurfboard/0.002 Safari/533.3')
('iRider', 'https://udger.com/resources/ua-list/browser-detail?browser=iRider', 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; iRider 2.21.1108; FDM)')
('iRider', 'https://udger.com/resources/ua-list/browser-detail?browser=iRider', 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; Trident/4.0; iRider 2.60.0008; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0)')
('Iridium', 'https://udger.com/resources/ua-list/browser-detail?browser=Iridium', 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Iridium/48.2 Safari/537.36 Chrome/48.0.2564.116')
('Iridium', 'https://udger.com/resources/ua-list/browser-detail?browser=Iridium', 'Mozilla/5.0 (X11; OpenBSD amd64) AppleWebKit/537.36 (KHTML, like G

('Kylo', 'https://udger.com/resources/ua-list/browser-detail?browser=Kylo', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100222 Firefox/3.6 Kylo/0.8.4.74873')
('Kylo', 'https://udger.com/resources/ua-list/browser-detail?browser=Kylo', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2) Gecko/20100222 Firefox/3.6 Kylo/0.6.1.70394')
('LBrowser', 'https://udger.com/resources/ua-list/browser-detail?browser=LBrowser', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20071115 Firefox/2.0.0.6 LBrowser/2.0.0.6')
('LG Web Browser', 'https://udger.com/resources/ua-list/browser-detail?browser=LG%20Web%20Browser', 'Mozilla/5.0 (DirectFB; U; Linux mips; en) AppleWebKit/531.2+ (KHTML, like Gecko) Safari/531.2+ LG Browser/4.0.10(+SCREEN+TUNER; LGE; 42LE5500-SA; 04.02.02; 0x00000001;); LG NetCast.TV-2010')
('LG Web Browser', 'https://udger.com/resources/ua-list/browser-detail?browser=LG%20Web%20Browser', 'Mozilla/5.0 (Unknown; Linux armv7l) AppleWebKit/537.1+ (KH

('Mercury', 'https://udger.com/resources/ua-list/browser-detail?browser=Mercury', 'Mozilla/5.0 (iPhone; CPU iPhone OS 6_0_1 like Mac OS X) AppleWebKit/536.26 (KHTML, like Gecko) Mercury/3.4 Mobile/10A523 Safari/8536.25')
('Microsoft Edge', 'https://udger.com/resources/ua-list/browser-detail?browser=Microsoft%20Edge', 'Mozilla/5.0 (Windows NT 6.4; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.143 Safari/537.36 Edge/12.0')
('Microsoft Edge', 'https://udger.com/resources/ua-list/browser-detail?browser=Microsoft%20Edge', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.9600')
('Microsoft Edge', 'https://udger.com/resources/ua-list/browser-detail?browser=Microsoft%20Edge', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.10240')
('Microsoft Edge', 'https://udger.com/resources/ua-list/browser-detail?browser=Microsoft%20Edge',

('Netscape Navigator', 'https://udger.com/resources/ua-list/browser-detail?browser=Netscape%20Navigator', 'Mozilla/5.0 (Windows; U; Win 9x 4.90; de-DE; rv:0.9.2) Gecko/20010726 Netscape6/6.1')
('Netscape Navigator', 'https://udger.com/resources/ua-list/browser-detail?browser=Netscape%20Navigator', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02')
('Netscape Navigator', 'https://udger.com/resources/ua-list/browser-detail?browser=Netscape%20Navigator', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.5) Gecko/20060127 Netscape/8.1')
('Netscape Navigator', 'https://udger.com/resources/ua-list/browser-detail?browser=Netscape%20Navigator', 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.12) Gecko/20080219 Firefox/2.0.0.12 Navigator/9.0.0.6')
('Netscape Navigator', 'https://udger.com/resources/ua-list/browser-detail?browser=Netscape%20Navigator', 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1.12) Gecko/20080219 Firefox/2.0

('Oregano', 'https://udger.com/resources/ua-list/browser-detail?browser=Oregano', 'Mozilla/1.10 [en] (Compatible; RISC OS 3.70; Oregano 1.10)')
('Otter', 'https://udger.com/resources/ua-list/browser-detail?browser=Otter', 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/538.1 (KHTML, like Gecko) Otter/0.1.01 Safari/538.1')
('Otter', 'https://udger.com/resources/ua-list/browser-detail?browser=Otter', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) Otter/0.0.01 Safari/538.1')
('Otter', 'https://udger.com/resources/ua-list/browser-detail?browser=Otter', 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/538.1 (KHTML, like Gecko) Otter/0.2.01-dev Safari/538.1')
('Otter', 'https://udger.com/resources/ua-list/browser-detail?browser=Otter', 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/538.1 (KHTML, like Gecko) Otter/0.3.01-dev Safari/538.1')
('Otter', 'https://udger.com/resources/ua-list/browser-detail?browser=Otter', 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/538.1 (KHTM

('QupZilla', 'https://udger.com/resources/ua-list/browser-detail?browser=QupZilla', 'Mozilla/5.0 (Windows; U; Windows NT 6.1; cs-CZ) AppleWebKit/533.3 (KHTML, like Gecko) QupZilla/1.1.5 Safari/533.3')
('QupZilla', 'https://udger.com/resources/ua-list/browser-detail?browser=QupZilla', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; cs-CZ) AppleWebKit/533.3 (KHTML, like Gecko) QupZilla/1.1.5 Safari/533.3')
('QupZilla', 'https://udger.com/resources/ua-list/browser-detail?browser=QupZilla', 'Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en-US) AppleWebKit/533.3 (KHTML, like Gecko) QupZilla/1.1.5 Safari/533.3')
('QupZilla', 'https://udger.com/resources/ua-list/browser-detail?browser=QupZilla', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES) AppleWebKit/533.3 (KHTML, like Gecko) QupZilla/1.1.0 Safari/533.3')
('QupZilla', 'https://udger.com/resources/ua-list/browser-detail?browser=QupZilla', 'Mozilla/5.0 (X11; U; Linux i686; pl-PL) AppleWebKit/533.3 (KHTML, like Gecko) QupZilla/1.0.0-rc1 Safari/533.3')
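
The link filter used in the scraping loop can be isolated into a small function, which makes it possible to try it on a saved page without sending requests. The sketch below is one way to do that. The name `extract_user_agents` and the sample HTML (including the `?ua=placeholder` query string) are hypothetical, and the check is made against the `href` attribute instead of the serialised tag. It also strips surrounding whitespace, which the loop above keeps (see the `Cuam` entry in the output).

```python
from bs4 import BeautifulSoup

# Hypothetical excerpt of a browser detail page, used only for illustration
SAMPLE_HTML = """
<a href="/resources/online-parser">Online parser</a>
<a href="/resources/online-parser?ua=placeholder">xChaos_Arachne/5.1.89;GPL,386+  </a>
<a href="/resources/ua-list">UA list</a>
"""


def extract_user_agents(html):
    """Return the text of every link that points to the online parser."""
    soup = BeautifulSoup(html, "html.parser")
    agents = []
    for link in soup.find_all("a", href=True):
        text = link.get_text(strip=True)
        # The generic "Online parser" link has the same target but is not a user-agent
        if "/resources/online-parser" in link["href"] and text != "Online parser":
            agents.append(text)
    return agents


agents = extract_user_agents(SAMPLE_HTML)
```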

In [7]:
# Build a dataframe from the collected triples and save it
df = pd.DataFrame(TRIPLE, columns=["browser_name", "browser_url", "user-agent"])
# User-agent strings contain commas and semicolons, so a rare character is used as separator
df.to_csv("User_agent_table.csv", sep="µ", index=False)
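
The optional helper mentioned in the introduction could look roughly like the sketch below. The function names `load_user_agents` and `random_user_agent` are hypothetical. The file name, separator and column names are taken from the cell above. The `python` engine is requested explicitly because pandas' default C parser does not support a separator that is longer than one byte when encoded, and `µ` takes two bytes in UTF-8.

```python
import pandas as pd

CSV_PATH = "User_agent_table.csv"   # file written by the notebook
CSV_SEPARATOR = "µ"                 # separator used when saving


def load_user_agents(path=CSV_PATH):
    """Load the saved table into a dataframe."""
    return pd.read_csv(path, sep=CSV_SEPARATOR, engine="python", encoding="utf-8")


def random_user_agent(df, seed=None):
    """Pick one user-agent string at random from the dataframe."""
    return df["user-agent"].sample(n=1, random_state=seed).iloc[0]
```

In another program, the result of `random_user_agent(load_user_agents())` could then be passed to `requests.get` through `headers={"User-Agent": ...}`.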