# <strong>BeautifulSoup:</strong> Web Scraping - TRLE (Tomb Raider Level Editor)

**Name:** Arsalan Ali<br>
**Email:** arslanchaos@gmail.com

---

### **Table of Contents**
* Website to Scrape: "Tomb Raider Level Editor (TRLE)"
* Site URL: https://www.trle.net/pFind.php
* Import Libraries
* Set URL and Headers
* Fetch Webpage
* Parse Webpage Data into BeautifulSoup
* Testing
* Web Scraping of Multiple Pages
* Saving DataFrame as a CSV Dataset


**Note:** Columns to extract:
*   nickname
*   level name
*   difficulty
*   rating
*   size
*   downloads
*   date

---

### Import Libraries

In [1]:
import pandas as pd
import requests
import re
from bs4 import BeautifulSoup

### Set URL and Headers

In [2]:
url = 'https://www.trle.net/pFind.php?atype=&idx=1'

# A browser-like User-Agent makes the request look like it comes from a regular browser;
# some sites reject or throttle the default "python-requests" agent
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.162 Safari/537.36'}

# If you have a proxy, you can route requests through it as well
# proxies = {'https': 'http://62.60.160.34:3129'}
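Rather than passing `headers` to every call, a `requests.Session` can hold them as defaults and reuse the connection across the many page requests made later in this notebook. A minimal sketch, assuming the same `User-Agent` string; `build_page_url` and `make_session` are hypothetical helper names, and the page URL is rebuilt from the `atype`/`idx` query parameters of the URL above.

```python
from typing import Optional
from urllib.parse import urlencode

import requests

BASE_URL = "https://www.trle.net/pFind.php"
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/80.0.3987.162 Safari/537.36"
)


def build_page_url(idx: int) -> str:
    """Hypothetical helper: the listing URL for a given idx value."""
    return f"{BASE_URL}?{urlencode({'atype': '', 'idx': idx})}"


def make_session(proxies: Optional[dict] = None) -> requests.Session:
    """Hypothetical helper: a session that sends the browser-like User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": USER_AGENT})
    if proxies:
        # e.g. {'https': 'http://host:port'}; used for requests made through this session
        session.proxies.update(proxies)
    return session


session = make_session()
print(build_page_url(1))  # https://www.trle.net/pFind.php?atype=&idx=1
```

`session.get(build_page_url(idx), timeout=30).text` could then stand in for `requests.get(url, headers=headers).text` in the later cells.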

### Fetch Webpage

In [3]:
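# Download the page; the timeout stops the request from hanging indefinitely
webpage = requests.get(url, headers=headers, timeout=30).text

# To go through the proxy, pass it with the proxies parameter
# webpage = requests.get(url, proxies=proxies, headers=headers, timeout=30).text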
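`.text` returns whatever body the server sends, including the HTML of an error page, so a failed request would only show up later as a confusing parsing error. A slightly more defensive sketch checks the HTTP status before returning the HTML; `fetch_html` is a hypothetical helper name and the 30-second timeout is an arbitrary assumption.

```python
import requests


def fetch_html(url: str, headers: dict, timeout: float = 30.0) -> str:
    """Download a page and return its HTML, failing loudly on HTTP errors."""
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx responses instead of returning an error page
    response.raise_for_status()
    return response.text


# Usage (performs a real network request):
# webpage = fetch_html(url, headers)
```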

### Parse Webpage Data into BeautifulSoup

In [4]:
# "lxml" is a fast third-party parser and must be installed separately (pip install lxml)
soup = BeautifulSoup(webpage, "lxml")
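If `lxml` is not installed, BeautifulSoup raises `bs4.FeatureNotFound` for the `"lxml"` parser. A small sketch that falls back to Python's built-in `"html.parser"` in that case; `make_soup` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html: str) -> BeautifulSoup:
    """Parse HTML with lxml when available, otherwise with the stdlib parser."""
    try:
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        return BeautifulSoup(html, "html.parser")


sample = "<html><head><title>trle.net</title></head><body></body></html>"
print(make_soup(sample).title.get_text())  # trle.net
```

The two parsers can repair malformed markup differently, so row or cell counts may differ slightly between them on messy pages.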

### Testing

In [5]:
soup

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<!-- DW6 -->
<head>
<title>trle.net</title>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type"/>
<link href="trle.css" rel="stylesheet" type="text/css"/>
<script language="JavaScript" type="text/javascript"></script>
<style type="text/css">
</style>
</head>
<body bgcolor="#E1EEB2">
<!-- php functions -->
<!-- Main Code -->
<!-- Header html -->
<table border="0" cellpadding="0" cellspacing="0" width="100%">
<tr bgcolor="#E1EEB2">
<td rowspan="100%" width="5%"> </td>
<td rowspan="2"><a href="https://www.trle.net"><img alt="trle.net" height="95" longdesc="https://www.trle.net" src="trle_head.gif" width="169"/></a> </td>
<td align="center" class="largeGText" height="50" id="logo" nowrap="nowrap" valign="bottom" width="90%"><strong>Tomb Raider Level Editor </strong></td>
<td rowspan="100%" width="5%"> </td>
</tr

In [57]:
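# The level listing is the first table with class "FindTable"
table = soup.find_all("table", {"class": "FindTable"})[0]
# Row 0 is skipped (it appears to hold the column headings), so row 1 is the first level
user_row = table.find_all("tr")[1]
# Inspect cell 14 of that row (the release date); append .text.strip() to get only the text
user_row.find_all("td")[14]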

<td align="right" class="bodyText" nowrap="">
		    01-Sep-2022<img border="0" height="1" src="trle_spacer.gif" width="3"/> </td>
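Finding which cell index holds which column (0, 5, 6, 9, 11, 12, 14 in the loops below) is easier if every cell of a row is printed next to its position. A sketch run against a tiny hand-written table; the markup is a hypothetical, simplified stand-in for the real TRLE page, which has more cells per row.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the real "FindTable" markup
html = """
<table class="FindTable">
  <tr><th>author</th><th>level</th><th>date</th></tr>
  <tr><td> Sabatu </td><td> Sabatu's Tomb Raider 4 (Demo) </td><td> 01-Sep-2022 </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="FindTable")
first_level_row = table.find_all("tr")[1]  # row 0 is treated as the heading row

# Pair every cell with its position so each column's index can be read off directly
cells = [(i, td.get_text().strip()) for i, td in enumerate(first_level_row.find_all("td"))]
for index, text in cells:
    print(index, text)
```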

In [62]:
author, level_name, difficulty, size, rating, download, date = ([] for _ in range(7))

# Locate the listing table once, then read data rows 1-20 (row 0 is skipped as the heading row)
table = soup.find_all("table", {"class": "FindTable"})[0]
for user_row in table.find_all("tr")[1:21]:
    cells = user_row.find_all("td")
    author.append(cells[0].text.strip())
    level_name.append(cells[5].text.strip())
    difficulty.append(cells[6].text.strip())
    size.append(cells[9].text.strip())
    rating.append(cells[11].text.strip())
    download.append(cells[12].text.strip())
    date.append(cells[14].text.strip())
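The same extraction can be driven by a single column-to-index map, which keeps the magic numbers in one place and skips rows with too few cells instead of raising `IndexError`. A sketch assuming the cell positions used in the loop above; `COLUMNS` and `parse_rows` are hypothetical names, and the test markup is synthetic.

```python
from typing import Dict, List

from bs4 import BeautifulSoup

# Cell positions taken from the loop above; they depend on the current page layout
COLUMNS = {
    "Author": 0,
    "Level Name": 5,
    "Difficulty": 6,
    "Size (MBs)": 9,
    "Ratings": 11,
    "Downloads": 12,
    "Release Date": 14,
}


def parse_rows(soup: BeautifulSoup, max_rows: int = 20) -> List[Dict[str, str]]:
    """Return one dict per level row of the first FindTable on the page."""
    table = soup.find("table", class_="FindTable")
    if table is None:
        return []
    records = []
    # Skip row 0 (headings) and read at most max_rows data rows
    for row in table.find_all("tr")[1 : max_rows + 1]:
        cells = row.find_all("td")
        if len(cells) <= max(COLUMNS.values()):
            continue  # not a complete level row
        records.append({name: cells[i].get_text().strip() for name, i in COLUMNS.items()})
    return records


# Synthetic page: one heading row plus one row of 15 cells named c0..c14
fake_cells = "".join(f"<td>c{i}</td>" for i in range(15))
fake_html = f'<table class="FindTable"><tr><th>h</th></tr><tr>{fake_cells}</tr></table>'
records = parse_rows(BeautifulSoup(fake_html, "html.parser"))
print(records)
```

A list of such dicts can be passed straight to `pd.DataFrame(records)`, which builds the same columns as the dictionary of lists.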

In [63]:
# Creating a dictionary that'll act as columns and rows for the DataFrame
dataframe = {
    "Author": author,
    "Level Name": level_name,
    "Difficulty": difficulty,
    "Ratings": rating,
    "Size (MBs)": size,
    "Downloads":download,
    "Release Date": date
}

# Creating the DataFrame by feeding it the dictionary
TRLE_data = pd.DataFrame(dataframe)

# Viewing the DataFrame
TRLE_data

| | Author | Level Name | Difficulty | Ratings | Size (MBs) | Downloads | Release Date |
|---|---|---|---|---|---|---|---|
| 0 | Sabatu | Sabatu's Tomb Raider 4 (Demo) | medium | 8.54 | 42 | 620 | 01-Sep-2022 |
| 1 | TombExplorer | Siberian Expedition | medium | 9.61 | 307 | 758 | 30-Aug-2022 |
| 2 | trtimes | Temple of Thoth | easy | 7.9 | 73 | 505 | 29-Aug-2022 |
| 3 | Sabatu | Aperama Competition - City of Khamoon | challenging | 9.3 | 136 | 453 | 27-Aug-2022 |
| 4 | Neltharion | Ex Oblivione (Demo) | medium | 7.56 | 69 | 514 | 21-Aug-2022 |
| 5 | cumulonimbus48 | The Guardian of the Talion returns | medium | 7.34 | 145 | 590 | 19-Aug-2022 |
| 6 | Feder | Oasis | medium | 9.39 | 167 | 899 | 13-Aug-2022 |
| 7 | Kubsy | The Abandoned Library | medium | 9.13 | 88 | 498 | 13-Aug-2022 |
| 8 | Igor Gois | Cultist's Secret - Level 01 (Demo) | medium | 7.86 | 110 | 412 | 13-Aug-2022 |
| 9 | Denny | Mystery of the Sunken Submarines | medium | 7.64 | 101 | 474 | 06-Aug-2022 |
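Every column collected with `.text.strip()` is a string, so sorting by rating or downloads would compare text rather than numbers. A sketch of converting the numeric and date columns, using two rows copied from the outputs in this notebook and assuming every date follows the `01-Sep-2022` pattern; `errors="coerce"` turns any value that does not parse into a missing value instead of raising, and an empty difficulty (as for *Atlantis 1*) is treated as missing.

```python
import pandas as pd

# Two rows copied from the notebook output, still stored as strings
df = pd.DataFrame({
    "Author": ["Sabatu", "Baddy"],
    "Difficulty": ["medium", ""],
    "Ratings": ["8.54", "7.88"],
    "Size (MBs)": ["42", "26"],
    "Downloads": ["620", "1776"],
    "Release Date": ["01-Sep-2022", "15-Jun-2003"],
})

for column in ["Ratings", "Size (MBs)", "Downloads"]:
    # Unparsable values become NaN instead of raising
    df[column] = pd.to_numeric(df[column], errors="coerce")

# %d-%b-%Y matches day, English month abbreviation, and four-digit year
df["Release Date"] = pd.to_datetime(df["Release Date"], format="%d-%b-%Y", errors="coerce")
# Treat an empty difficulty cell as missing
df["Difficulty"] = df["Difficulty"].replace("", pd.NA)

print(df.dtypes)
```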


### Web Scraping of Multiple Pages

In [123]:
author, level_name, difficulty, size, rating, download, date = ([] for _ in range(7))
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.162 Safari/537.36'}

# idx appears to be the offset of the first level shown on the page;
# stepping by 19 and keeping 19 rows per page should cover every level once
for idx in range(0, 2500, 19):
    url = f'https://www.trle.net/pFind.php?atype=&idx={idx}'
    webpage = requests.get(url, headers=headers, timeout=30).text
    soup = BeautifulSoup(webpage, "lxml")
    # Find the listing table once per page instead of once per row
    table = soup.find_all("table", {"class": "FindTable"})[0]
    # Rows 1-19 are levels (row 0 is skipped as the heading row)
    for user_row in table.find_all("tr")[1:20]:
        cells = user_row.find_all("td")
        author.append(cells[0].text.strip())
        level_name.append(cells[5].text.strip())
        difficulty.append(cells[6].text.strip())
        size.append(cells[9].text.strip())
        rating.append(cells[11].text.strip())
        download.append(cells[12].text.strip())
        date.append(cells[14].text.strip())

dataframe = {
    "Author": author,
    "Level Name": level_name,
    "Difficulty": difficulty,
    "Ratings": rating,
    "Size (MBs)": size,
    "Downloads":download,
    "Release Date": date
}

TRLE_dataset = pd.DataFrame(dataframe)

TRLE_dataset

| | Author | Level Name | Difficulty | Ratings | Size (MBs) | Downloads | Release Date |
|---|---|---|---|---|---|---|---|
| 0 | alan | The M16 Curse | medium | 6.19 | 53 | 300 | 22-Sep-2022 |
| 1 | Sabatu | Sabatu's Tomb Raider 4 (Demo) | medium | 8.54 | 42 | 622 | 01-Sep-2022 |
| 2 | TombExplorer | Siberian Expedition | medium | 9.61 | 307 | 758 | 30-Aug-2022 |
| 3 | trtimes | Temple of Thoth | easy | 7.90 | 73 | 505 | 29-Aug-2022 |
| 4 | Sabatu | Aperama Competition - City of Khamoon | challenging | 9.30 | 136 | 454 | 27-Aug-2022 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2503 | Baddy | Atlantis 1 | | 7.88 | 26 | 1776 | 15-Jun-2003 |
| 2504 | Agnes | Khal | challenging | 6.18 | 23 | 1457 | 15-Jun-2003 |
| 2505 | MichaelP | Losing your Marbles | challenging | 8.78 | 31 | 1779 | 15-Jun-2003 |
| 2506 | deskj | Home Sweet Home | easy | 5.02 | 17 | 1368 | 13-Jun-2003 |
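The loop above hard-codes 2,500 as its upper bound, so it would miss levels once the archive grows past that. A sketch of an alternative that keeps the same step of 19 but stops at the first short or empty page, and pauses between requests to go easy on the server. It assumes the site returns fewer rows once `idx` runs past the end; `rows_from_html`, `scrape_all`, `fetch`, `delay`, and `max_pages` are hypothetical names, and the page source is injected as a `fetch(idx)` function so the loop can be exercised offline.

```python
import time
from typing import Callable, List

from bs4 import BeautifulSoup


def rows_from_html(html: str, max_rows: int) -> List[List[str]]:
    """Return the cell texts of up to max_rows data rows of the FindTable."""
    table = BeautifulSoup(html, "html.parser").find("table", class_="FindTable")
    if table is None:
        return []
    records = []
    for row in table.find_all("tr")[1 : max_rows + 1]:  # row 0 is the heading row
        cells = [td.get_text().strip() for td in row.find_all("td")]
        if cells:
            records.append(cells)
    return records


def scrape_all(fetch: Callable[[int], str], step: int = 19,
               delay: float = 1.0, max_pages: int = 1000) -> List[List[str]]:
    """Request idx = 0, step, 2*step, ... until a page yields fewer than step rows."""
    collected: List[List[str]] = []
    for page in range(max_pages):  # max_pages guards against an endless loop
        rows = rows_from_html(fetch(page * step), max_rows=step)
        collected.extend(rows)
        if len(rows) < step:
            break  # a short or empty page is taken as the end of the listing
        time.sleep(delay)  # be polite between requests
    return collected


# Offline demo: 45 synthetic levels served 20 per page from an offset-based listing
levels = [f"level{i}" for i in range(45)]


def fake_fetch(idx: int) -> str:
    body = "".join(f"<tr><td>{name}</td></tr>" for name in levels[idx : idx + 20])
    return f'<table class="FindTable"><tr><th>heading</th></tr>{body}</table>'


result = scrape_all(fake_fetch, step=19, delay=0)
print(len(result))  # 45
```

Against the live site, `fetch` could be something like `lambda idx: requests.get(f'https://www.trle.net/pFind.php?atype=&idx={idx}', headers=headers, timeout=30).text`, with `headers` defined as earlier in the notebook.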


### Saving DataFrame as a CSV Dataset

In [124]:
# index=False keeps pandas' row numbers out of the file
TRLE_dataset.to_csv("TRLE_dataset.csv", index=False)
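By default `to_csv` also writes the DataFrame index as a nameless first column, so reading the file back with `pd.read_csv` produces an extra `Unnamed: 0` column. A small round-trip check using in-memory buffers instead of a real file shows the difference `index=False` makes:

```python
import io

import pandas as pd

df = pd.DataFrame({"Author": ["Sabatu", "Feder"], "Level Name": ["Sabatu's Tomb Raider 4 (Demo)", "Oasis"]})

with_index = io.StringIO()
df.to_csv(with_index)  # index written as a nameless first column
with_index.seek(0)
print(pd.read_csv(with_index).columns.tolist())
# ['Unnamed: 0', 'Author', 'Level Name']

without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
print(pd.read_csv(without_index).columns.tolist())
# ['Author', 'Level Name']
```

For a file that was already saved with its index, `pd.read_csv("TRLE_dataset.csv", index_col=0)` reads that first column back as the index instead.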