# &#9678; WebScraping Project - Dark Souls Enemies

** In this project we are going to retrieve data from the [Dark Souls Remastered Wiki Page](https://darksouls.wiki.fextralife.com/Dark+Souls+Wiki), specifically from its [Enemies Section](https://darksouls.wiki.fextralife.com/Enemies). For each enemy we'll take its name, its stats (HP, Physical Defense, Slash Defense, Poison Resistance, Bleed Resistance, etc.) and its multiple variations, in order to build a DataFrame containing all of this information! **

In [167]:
import requests  # Web Request
import pandas as pd # For Dataframes!
from bs4 import BeautifulSoup # HTML Parser

** Let's begin with a single enemy as an example **

In [168]:
url = 'https://darksouls.wiki.fextralife.com/Torch+Hollows'

html = requests.get(url)

** Creating our BeautifulSoup object: **

In [169]:
bs_obj = BeautifulSoup(html.text, "lxml")

** Now that we have the raw HTML of the [Torch Hollows Enemy Page](https://darksouls.wiki.fextralife.com/Torch+Hollows), we can start locating where our desired information lives. On the right side of the page there is a box holding some of the information we need about the enemy: its defenses and immunities.**

![infobox](infobox_TH.png)

** However, if we scroll down to the 'Locations' section, we see that there are multiple variations of the enemy depending on the point (location) of the game where you encounter it, and each version has its own HP and Souls values.**

![locations](locations_TH.png)

** To get around this, we will create two separate tables. The first is the Enemy table, which contains a single record per enemy with its defenses and immunities; its primary key is the enemy's name, since no name repeats in this DataFrame. The second table holds the multiple variations of each enemy; here the enemy's name is a foreign key, since it repeats once per variation. At the end, we will join both tables into a <u>single</u> frame. **
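The two-table design can be sketched with toy DataFrames before any scraping happens. This is a minimal illustration, assuming only the column names used later in this notebook; the values are copied from the Wheel Skeleton outputs further down, with just two stat columns kept for brevity. `validate="one_to_many"` asks pandas to enforce the primary-key / foreign-key relationship described above.

```python
import pandas as pd

# Enemy table: one row per enemy, so 'Name' can serve as the primary key
enemies = pd.DataFrame({
    "Name": ["Wheel Skeleton"],
    "Physical Defense": ["239 ~ 348"],
    "Magic Defense": ["183 ~ 267"],
})

# Variations table: one row per location, so 'Name' repeats (foreign key)
variations = pd.DataFrame({
    "Name": ["Wheel Skeleton", "Wheel Skeleton"],
    "Location": ["The Catacombs", "Painted World of Ariamis"],
    "HP": ["134", "196"],
})

# validate="one_to_many" makes pandas raise pandas.errors.MergeError
# if 'Name' is not unique in the left (enemies) table
joined = pd.merge(enemies, variations, on="Name", how="inner", validate="one_to_many")
print(joined)
```

Each variation row inherits the single set of stats of its enemy, which is exactly what the final merged frame should look like.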

# Let's get to it!

*"Chosen Undead. Your fate is… to succeed the Great Lord Gwyn. So that you may link the Fire, cast away the Dark, and undo the curse of the Undead."*

----

##  &#9678; 1. Enemies Table

** Let's locate the stats table inside our html... **

In [170]:
wiki_table = bs_obj.find("div", {"id": "infobox"}).find_next().find_next() # Currently on  'Wiki_table'

** Now inside the table, let's proceed by locating our defenses and resistances stats: **

In [171]:
rows = wiki_table.find_all("tr")  # look the rows up once
Defenses_raw = rows[3]     # row index 3 holds the defense values

Resistances_raw = rows[5]  # row index 5 holds the resistance values

In [172]:
Defenses_raw

<tr>
<td>77 ~ 159</td>
<td>77 ~ 159</td>
<td>77 ~ 159</td>
<td>77 ~ 159</td>
<td>58 ~ 119</td>
</tr>

In [173]:
Defenses = [i.text for i in Defenses_raw.find_all('td')]
Defenses

['77 ~ 159', '77 ~ 159', '77 ~ 159', '77 ~ 159', '58 ~ 119']

In [174]:
Resistances = [i.text for i in Resistances_raw.find_all('td')]
Resistances

['54 ~ 111', '58 ~ 119', 'E', 'E', 'E']

** Let's also get the enemy's name: **

In [175]:
Name = url.split('/')[-1].replace('+',' ')
Name

'Torch Hollows'
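The `split('/')` plus `replace('+', ' ')` approach works for the URLs used here. A slightly more defensive sketch uses the standard library's `urllib.parse`, which also decodes percent-escapes (for example `%27` for an apostrophe) and ignores any query string or fragment. `name_from_url` is a hypothetical helper name, and whether the wiki actually uses such escapes in enemy URLs is an assumption.

```python
from urllib.parse import urlsplit, unquote_plus


def name_from_url(url: str) -> str:
    """Return the enemy name stored in the last path segment of a wiki URL."""
    # urlsplit drops any ?query or #fragment before we look at the path
    last_segment = urlsplit(url).path.rstrip("/").rsplit("/", 1)[-1]
    # unquote_plus turns '+' into spaces and decodes %XX escapes such as %27 (')
    return unquote_plus(last_segment)


print(name_from_url("https://darksouls.wiki.fextralife.com/Torch+Hollows"))  # Torch Hollows
```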

____

## Ok! Now that we've successfully retrieved data from a single enemy, let's wrap the previous commands in a function so we can generalize to other enemies later on!

In [176]:
def GetStats(url):
    """Return (Name, Defenses, Resistances) scraped from an enemy's wiki page."""
    html = requests.get(url)
    bs_obj = BeautifulSoup(html.text, "lxml")  # creating BS object
    # Locating our wiki table
    wiki_table = bs_obj.find("div", {"id": "infobox"}).find_next().find_next() # Currently on  'Wiki_table'
    # Getting the stats: row index 3 holds the defenses, row index 5 the resistances
    rows = wiki_table.find_all("tr")
    Defenses_raw = rows[3]
    Resistances_raw = rows[5]
    # strip() trims surrounding whitespace; u'\xa0' (non-breaking space) becomes a normal space
    Defenses = [i.text.strip().replace(u'\xa0', u' ') for i in Defenses_raw.find_all('td')]
    Resistances = [i.text.strip().replace(u'\xa0', u' ') for i in Resistances_raw.find_all('td')]
    Name = url.split('/')[-1].replace('+',' ')
    return Name, Defenses, Resistances
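The `.strip().replace(u'\xa0', u' ')` step in `GetStats` can be checked in isolation. This is a minimal sketch with a hypothetical helper `clean_cell` and made-up cell strings; that the wiki emits non-breaking spaces (`&nbsp;`, i.e. `'\xa0'`) inside values is inferred from the replace call rather than verified here.

```python
def clean_cell(text: str) -> str:
    """Normalise one scraped table cell the way GetStats does."""
    # strip() removes surrounding whitespace, including non-breaking spaces;
    # replace() then turns any '\xa0' left inside the value into a normal space
    return text.strip().replace("\xa0", " ")


raw_cells = ["\n77\xa0~\xa0159 ", "\xa0E\xa0"]  # made-up examples
print([clean_cell(c) for c in raw_cells])  # ['77 ~ 159', 'E']
```

Without the `replace`, a value such as `'77\xa0~\xa0159'` would print like `'77 ~ 159'` but would not compare equal to it, which matters once the tables are filtered or merged.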

** Alright, now we have a function that takes the enemy's URL and returns a tuple with the enemy's stats (Name, Defenses, Resistances). Let's make this even better by building a function that returns a DataFrame instead. **

In [177]:
# Unpack the returned tuple in one call (a single HTTP request instead of three)
Name, Defenses, Resistances = GetStats('https://darksouls.wiki.fextralife.com/Torch+Hollows')

** Let's create two lists, one containing the enemy stats (attributes) and another containing the respective labels for these attributes **

In [178]:
Enemy_Attr = [Name] +Defenses + Resistances 

In [179]:
Enemy_Attr_Labels =['Name','Physical Defense','Strike Defense','Slash Defense','Thrust Defense','Magic Defense',
 'Fire Defense','Lightning Defense','Poison Resistance','Toxic Resistance','Bleed Resistance']

** Now, let's pair up the two lists into a dictionary object. **

In [180]:
dictionary = dict(zip(Enemy_Attr_Labels, Enemy_Attr))

** Finally, we pass our dictionary to the DataFrame constructor! **

In [181]:
pd.DataFrame(dictionary,index=[0])

|   | Name | Physical Defense | Strike Defense | Slash Defense | Thrust Defense | Magic Defense | Fire Defense | Lightning Defense | Poison Resistance | Toxic Resistance | Bleed Resistance |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Torch Hollows | 77 ~ 159 | 77 ~ 159 | 77 ~ 159 | 77 ~ 159 | 58 ~ 119 | 54 ~ 111 | 58 ~ 119 | E | E | E |
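Why `index=[0]`? Every value in our dictionary is a single scalar, so pandas has no way to infer how many rows to build. The sketch below, using a two-column slice of the Torch Hollows row, shows the error raised without an index and an equivalent alternative (a list holding one dict per row); the alternative is offered as an option, not as what the notebook uses.

```python
import pandas as pd

row = {"Name": "Torch Hollows", "Physical Defense": "77 ~ 159"}

# With only scalar values, pandas cannot tell how many rows to create
try:
    pd.DataFrame(row)
except ValueError as err:
    print("ValueError:", err)

# Passing index=[0] (as in the cell above) builds exactly one row...
one_row = pd.DataFrame(row, index=[0])
# ...and wrapping the dict in a list (one dict per row) is an equivalent alternative
also_one_row = pd.DataFrame([row])
print(one_row.to_dict("records") == also_one_row.to_dict("records"))  # True
```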


____

## Once again, let's combine the above lines of code into a single function!

In [182]:
def GetEnemy(url):
    Name, Defenses, Resistances = GetStats(url)  # a single request per enemy
    Enemy_Attr = [Name] + Defenses + Resistances
    Enemy_Attr_Labels =['Name','Physical Defense','Strike Defense','Slash Defense','Thrust Defense','Magic Defense',
     'Fire Defense','Lightning Defense','Poison Resistance','Toxic Resistance','Bleed Resistance']
    dictionary = dict(zip(Enemy_Attr_Labels, Enemy_Attr))
    return pd.DataFrame(dictionary,index=[0])

In [183]:
#Example:
GetEnemy('https://darksouls.wiki.fextralife.com/Torch+Hollows')

|   | Name | Physical Defense | Strike Defense | Slash Defense | Thrust Defense | Magic Defense | Fire Defense | Lightning Defense | Poison Resistance | Toxic Resistance | Bleed Resistance |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Torch Hollows | 77 ~ 159 | 77 ~ 159 | 77 ~ 159 | 77 ~ 159 | 58 ~ 119 | 54 ~ 111 | 58 ~ 119 | E | E | E |


In [184]:
#Example:
GetEnemy('https://darksouls.wiki.fextralife.com/Moonlight+Butterfly')

|   | Name | Physical Defense | Strike Defense | Slash Defense | Thrust Defense | Magic Defense | Fire Defense | Lightning Defense | Poison Resistance | Toxic Resistance | Bleed Resistance |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Moonlight Butterfly | 184 ~ 306 | 184 ~306 | 184 ~306 | 184 ~306 | 230 ~ 384 | 110 ~183 | 148 ~246 | S | S | S |


___

** Cool, we've got ourselves a function that takes the URL of an enemy page and spits out a DataFrame of its stats! **

** &#9672; We've lit our first bonfire! ** <img src="https://thumbs.gfycat.com/EarnestActiveAkitainu-size_restricted.gif"  alt="drawing" width="360">

*"If you require rest, now is the time. That is, after all, what the bonfire is for."*

____

** We now need to gather all the enemies' links into a single list, so we can iterate over it with our function... **

In [185]:
url = 'https://darksouls.wiki.fextralife.com/Enemies'  ## This is the page containing all the links we'll need
html = requests.get(url)
bs_obj = BeautifulSoup(html.text,'lxml')

In [186]:
bs_obj.find_all('h3')[1:9]  ## These are the enemy category sections on that page

[<h3 id="toc7">Hollows</h3>,
 <h3 id="toc63">Giants</h3>,
 <h3 id="toc101">AOTA Enemies</h3>,
 <h3 id="toc26">Skeletons</h3>,
 <h3>Demons</h3>,
 <h3> Other</h3>,
 <h3>  Wildlife</h3>,
 <h3 id="toc73">Special/Non-respawning</h3>]

** Let's take a single section as example: **

In [187]:
example = bs_obj.find_all('h3')[1:9][7] ## Special/Non-Respawning Section

In [188]:
links_h= ['https://darksouls.wiki.fextralife.com' +e['href'] for e in example.find_next().find_all('a')]
links_h  ## Here are the links for each enemy of this section:

['https://darksouls.wiki.fextralife.com/Crystal+Lizard',
 'https://darksouls.wiki.fextralife.com/Parasitic+Wall+Hugger',
 'https://darksouls.wiki.fextralife.com/Oscar+of+Astora',
 'https://darksouls.wiki.fextralife.com/The+Black+Knight',
 'https://darksouls.wiki.fextralife.com/Blowdart+Sniper',
 'https://darksouls.wiki.fextralife.com/Darkmoon+Soldier',
 'https://darksouls.wiki.fextralife.com/Possessed+Tree',
 'https://darksouls.wiki.fextralife.com/Giant+Cat',
 'https://darksouls.wiki.fextralife.com/Mimic',
 'https://darksouls.wiki.fextralife.com/The+Channeler',
 'https://darksouls.wiki.fextralife.com/Vagrants']

____

## Once again, let's combine the above lines of code into a single function!

In [189]:
def Get_All_Links(url):
    html = requests.get(url)
    bs_obj = BeautifulSoup(html.text,'lxml')
    links = []
    h3 = bs_obj.find_all('h3')[1:9]
    for i in h3:
        links += ['https://darksouls.wiki.fextralife.com' +j['href'] for j in i.find_next().find_all('a')]
    return links
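`Get_All_Links` prefixes the site root to every `href`, which assumes each one is root-relative, and it keeps repeats if an enemy were linked from two sections. Whether either case actually occurs on the live page is not verified here. As a sketch of a guard, the standard library's `urljoin` resolves both relative and absolute hrefs, and `dict.fromkeys` removes duplicates while preserving order; `absolute_unique` and the sample hrefs are hypothetical.

```python
from urllib.parse import urljoin

ENEMIES_PAGE = "https://darksouls.wiki.fextralife.com/Enemies"


def absolute_unique(hrefs):
    """Resolve each href against the Enemies page and drop repeats, keeping order."""
    # urljoin handles root-relative hrefs ('/Mimic') and already-absolute URLs alike
    absolute = (urljoin(ENEMIES_PAGE, href) for href in hrefs)
    # dict.fromkeys keeps the first occurrence of each key, in insertion order
    return list(dict.fromkeys(absolute))


sample = ["/Mimic", "/Giant+Cat", "/Mimic", "https://darksouls.wiki.fextralife.com/Vagrants"]
print(absolute_unique(sample))
```

Inside `Get_All_Links`, one could collect the raw `j['href']` values and return `absolute_unique(...)` of them; duplicate links would otherwise turn into duplicate rows in the final merge.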

___

## Ok! Let's take a look at the functions we have so far

** Function that takes the enemy URL and retrieves its stats **

In [190]:
def GetStats(url):
    html = requests.get(url)
    bs_obj = BeautifulSoup(html.text, "lxml")
    wiki_table = bs_obj.find("div", {"id": "infobox"}).find_next().find_next() # Currently on  'Wiki_table'
    rows = wiki_table.find_all("tr")
    Defenses_raw = rows[3]
    Resistances_raw = rows[5]
    Defenses = [i.text.strip().replace(u'\xa0', u' ') for i in Defenses_raw.find_all('td')]
    Resistances = [i.text.strip().replace(u'\xa0', u' ') for i in Resistances_raw.find_all('td')]
    Name = url.split('/')[-1].replace('+',' ')
    return Name, Defenses, Resistances

** Function that takes enemy URL and spits out a DataFrame with its Stats! **

In [191]:
def GetEnemy(url):
    Name, Defenses, Resistances = GetStats(url)  # a single request per enemy
    Enemy_Attr = [Name] + Defenses + Resistances
    Enemy_Attr_Labels =['Name','Physical Defense','Strike Defense','Slash Defense','Thrust Defense','Magic Defense',
     'Fire Defense','Lightning Defense','Poison Resistance','Toxic Resistance','Bleed Resistance']
    dictionary = dict(zip(Enemy_Attr_Labels, Enemy_Attr))
    return pd.DataFrame(dictionary,index=[0])

** Function that builds a list of all enemy links on the [Enemies Section Page](https://darksouls.wiki.fextralife.com/Enemies). **

In [192]:
def Get_All_Links(url):
    html = requests.get(url)
    bs_obj = BeautifulSoup(html.text,'lxml')
    links = []
    h3 = bs_obj.find_all('h3')[1:9]
    for i in h3:
        links += ['https://darksouls.wiki.fextralife.com' +j['href'] for j in i.find_next().find_all('a')]
    return links

____

*"Bonfire ahead"*

## Let the Scraping begin!

In [193]:
url = "https://darksouls.wiki.fextralife.com/Enemies"
links = Get_All_Links(url)

In [194]:
# Build each enemy's one-row DataFrame, then concatenate them all at once
enemies = pd.concat([GetEnemy(l) for l in links], sort=False, ignore_index=True)
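The loop above stops at the first page that does not match the expected layout: `find()` returns `None` when there is no infobox (so `.find_next()` raises `AttributeError`), and the hard-coded positions `[3]`, `[5]` and `[-2]` raise `IndexError` on shorter tables. The sketch below collects successes and failures separately and can pause between requests. `scrape_all` and `fake_scrape` are hypothetical names; `fake_scrape` stands in for `GetEnemy` so the example runs without network access.

```python
import time
from typing import Callable, Iterable, List, Tuple


def scrape_all(links: Iterable[str], scrape_one: Callable[[str], object],
               delay: float = 0.0) -> Tuple[list, List[Tuple[str, str]]]:
    """Apply scrape_one to every link, keeping successes and failures apart."""
    results, failures = [], []
    for link in links:
        try:
            results.append(scrape_one(link))
        except (AttributeError, IndexError) as err:
            # AttributeError: find() returned None (e.g. the page has no infobox)
            # IndexError: the page has fewer rows/tables than the hard-coded positions
            failures.append((link, repr(err)))
        time.sleep(delay)  # optional pause between requests, to go easy on the server
    return results, failures


def fake_scrape(link: str) -> str:
    """Hypothetical stand-in for GetEnemy, so the sketch runs offline."""
    if link.endswith("Broken"):
        raise AttributeError("'NoneType' object has no attribute 'find_next'")
    return link.upper()


ok, failed = scrape_all(["a", "b/Broken", "c"], fake_scrape)
print(ok)  # ['A', 'C']
print(failed)
```

With the real functions, something like `frames, failed = scrape_all(links, GetEnemy, delay=1.0)` followed by `pd.concat(frames, ignore_index=True, sort=False)` would build the same table while listing the pages that could not be parsed (the one-second delay is an arbitrary choice). Network errors raised by `requests` are deliberately not caught here.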

In [195]:
enemies.tail(20)

|   | Name | Physical Defense | Strike Defense | Slash Defense | Thrust Defense | Magic Defense | Fire Defense | Lightning Defense | Poison Resistance | Toxic Resistance | Bleed Resistance |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 68 | Giant Leech | 128 | 128 | 128 | 128 | 90 | 101 | 97 | A | A | D |
| 69 | Giant Mosquito | 70 | 70 | 70 | 70 | 53 | 48 | 53 | A | A | D |
| 70 | Cragspider | 216 | 216 | 216 | 216 | 167 | 1188 | 178 | C | C | C |
| 71 | Chaos Bug | 134 | 134 | 134 | 134 | 80 | 672 | 134 | D | D | D |
| 72 | Man Serpent | 359 ~ 462 | 359 ~ 462 | 359 ~ 462 | 359 ~ 462 | 191 ~ 246 | 161 ~ 207 | 154 ~ 198 | B | B | D |
| 73 | Serpent Mage | 340 ~ 438 | 340 ~438 | 340 ~ 438 | 340 ~ 438 | 172 ~ 222 | 151 ~ 195 | 144 ~ 186 | B | B | D |
| 74 | Burrowing Rockworm | 418 | 418 | 522 | 418 | 292 | 835 | 315 | D | D | A |
| 75 | Slimes | 1500 | 1500 | 1500 | 1500 | 488 | 75 | 488 | S | S | S |
| 76 | Moonlight Butterfly | 184 ~ 306 | 184 ~306 | 184 ~306 | 184 ~306 | 230 ~ 384 | 110 ~183 | 148 ~246 | S | S | S |
| 77 | Crystal Lizard | 184 | 184 | 230 | 184 | 162 | 125 | 162 | S | S | S |


____

** &#9672; By the Gods! We've kindled our second bonfire! ** <img src="https://thumbs.gfycat.com/EarnestActiveAkitainu-size_restricted.gif"  alt="drawing" width="360">

*"I see there is at least one legend among ye humans"*

## &#9678; 2. Variations Table (HP, Souls and Location)

** As mentioned before, let's continue by getting all the variations of the enemies in a secondary table **

** As before, let's start with a single enemy to illustrate what we're doing: **

In [196]:
url = 'https://darksouls.wiki.fextralife.com/Wheel+Skeleton'  # a single enemy

html = requests.get(url)

In [197]:
bs_obj = BeautifulSoup(html.text, "lxml")

In [198]:
locations_table_raw = bs_obj.find_all("table", {"class": "wiki_table"})[-2].find_all('td')  # second-to-last wiki_table: 'Locations'
locations_table_raw # Here are the attributes we're looking for:

[<td style="width: 473px;"><a class="wiki_link" href="/The+catacombs">The Catacombs</a></td>,
 <td style="width: 157.628px;">134</td>,
 <td style="width: 35.3715px;">273</td>,
 <td style="width: 136px;">400</td>,
 <td style="width: 136px;">1600</td>,
 <td style="width: 473px;"><a class="wiki_link" href="/Painted+World+of+Ariamis">Painted World of Ariamis</a></td>,
 <td style="width: 157.628px;">196</td>,
 <td style="width: 35.3715px;">310</td>,
 <td style="width: 136px;">800</td>,
 <td style="width: 136px;">1600</td>]

In [199]:
locations_table = [i.text.strip() for i in locations_table_raw]
locations_table

['The Catacombs',
 '134',
 '273',
 '400',
 '1600',
 'Painted World of Ariamis',
 '196',
 '310',
 '800',
 '1600']

** Depending on the enemy page, the 'Locations' table may be missing some columns, so the number of columns varies from page to page. You can spot this difference by comparing [this page](https://darksouls.wiki.fextralife.com/Skeleton) with [that one](https://darksouls.wiki.fextralife.com/Wheel+Skeleton). Did you notice that the 'Respawn' column is missing from the second link's table? **

** That's why we're going to create a variable for the number of columns, read from the last row of the table: **

In [200]:
num_col = len(bs_obj.find_all("table", {"class": "wiki_table"})[-2].find_all('tr')[-1].find_all('td'))
num_col

5

** Next, let's break our locations_table list into sublists, one per variation (location) of the enemy: **

In [201]:
groups = [locations_table[i:i + num_col] for i in range(0, len(locations_table), num_col)]
groups

[['The Catacombs', '134', '273', '400', '1600'],
 ['Painted World of Ariamis', '196', '310', '800', '1600']]

** Now, let's transpose these groups so that each sublist gathers one attribute (Location, HP, NG+, ...) across all variations, in the same order: **

In [202]:
lista = list(range(num_col))
for i in range(num_col):
    lista[i] = [k[i] for k in groups]

In [203]:
lista

[['The Catacombs', 'Painted World of Ariamis'],
 ['134', '196'],
 ['273', '310'],
 ['400', '800'],
 ['1600', '1600']]
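The index loop above is a matrix transpose. Python's built-in `zip(*groups)` does the same in one line; this sketch reuses the Wheel Skeleton cells shown above. One behavioral difference: if the cell count were not a multiple of `num_col`, the loop would raise an `IndexError` on the short last group, whereas `zip` would silently stop at the shortest group and drop the trailing attributes, so the sketch adds an explicit check first.

```python
locations_table = ["The Catacombs", "134", "273", "400", "1600",
                   "Painted World of Ariamis", "196", "310", "800", "1600"]
num_col = 5

# Guard: a leftover cell means the table layout differs from what we assume
if len(locations_table) % num_col:
    raise ValueError("cell count is not a multiple of num_col")

# One sublist per variation, exactly like `groups` above
groups = [locations_table[i:i + num_col] for i in range(0, len(locations_table), num_col)]

# zip(*groups) transposes rows into columns in one step
columns = [list(col) for col in zip(*groups)]
print(columns)
```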

** We should not forget the character's name... **

In [204]:
Name = url.split('/')[-1].strip().replace('+',' ')

In [205]:
lista.insert(0,Name)

In [206]:
lista

['Wheel Skeleton',
 ['The Catacombs', 'Painted World of Ariamis'],
 ['134', '196'],
 ['273', '310'],
 ['400', '800'],
 ['1600', '1600']]

** Here are the attribute labels this time around:**

In [207]:
attr_labels = ['Name','Location','HP','NG+','Souls','NG+_s','Respawn']

In [208]:
dictionary = dict(zip(attr_labels, lista))

** Finally, our dictionary for each enemy will look something like this: **

In [209]:
dictionary

{'Name': 'Wheel Skeleton',
 'Location': ['The Catacombs', 'Painted World of Ariamis'],
 'HP': ['134', '196'],
 'NG+': ['273', '310'],
 'Souls': ['400', '800'],
 'NG+_s': ['1600', '1600']}
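Notice that 'Respawn' is absent from the dictionary even though it is in `attr_labels`: `zip` stops at the shorter of its inputs, so the extra label is silently dropped. The sketch below reproduces this with the Wheel Skeleton values and makes the dropped labels visible. It also highlights an assumption of this position-based mapping: it is only correct when the column that can be absent is the *last* one; a missing middle column would shift every later label.

```python
attr_labels = ["Name", "Location", "HP", "NG+", "Souls", "NG+_s", "Respawn"]
lista = ["Wheel Skeleton",
         ["The Catacombs", "Painted World of Ariamis"],
         ["134", "196"], ["273", "310"], ["400", "800"], ["1600", "1600"]]

# zip() stops at the shorter input, so the unmatched 'Respawn' label is dropped
dictionary = dict(zip(attr_labels, lista))
print(list(dictionary))

# Making the mismatch visible instead of silent
missing = attr_labels[len(lista):]
print("labels without data:", missing)  # ['Respawn']
```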

In [210]:
pd.DataFrame(dictionary)

|   | Name | Location | HP | NG+ | Souls | NG+_s |
|---|---|---|---|---|---|---|
| 0 | Wheel Skeleton | The Catacombs | 134 | 273 | 400 | 1600 |
| 1 | Wheel Skeleton | Painted World of Ariamis | 196 | 310 | 800 | 1600 |
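Unlike the one-row Enemy table, no `index` is needed here: as soon as at least one value is a list, pandas uses its length for the number of rows and repeats the scalar `Name` on every row. The sketch below shows that broadcasting with a subset of the Wheel Skeleton columns, plus the `ValueError` pandas raises if two lists disagree in length (the mismatched "Souls" list is made up for illustration).

```python
import pandas as pd

# 'Name' is a single string; the other values are lists of the same length
dictionary = {
    "Name": "Wheel Skeleton",
    "Location": ["The Catacombs", "Painted World of Ariamis"],
    "HP": ["134", "196"],
}
df = pd.DataFrame(dictionary)  # the scalar 'Name' is repeated on every row
print(df)

# Lists of different lengths cannot be aligned and raise a ValueError instead
try:
    pd.DataFrame({"Name": "Wheel Skeleton", "HP": ["134", "196"], "Souls": ["400"]})
except ValueError as err:
    print("ValueError:", err)
```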


# We're getting there! Let's consolidate our progress by creating a function

In [211]:
def EnemyVariations(url):
    html = requests.get(url)
    bs_obj = BeautifulSoup(html.text, "lxml")
    # The second-to-last wiki_table on the page holds the 'Locations' data
    locations_el = bs_obj.find_all("table", {"class": "wiki_table"})[-2]
    locations_table_raw = locations_el.find_all('td')
    num_col = len(locations_el.find_all('tr')[-1].find_all('td'))
    locations_table = [i.text.strip() for i in locations_table_raw]
    # One sublist per variation (location)
    groups = [locations_table[i:i + num_col] for i in range(0, len(locations_table), num_col)]
    # Transpose: one sublist per attribute
    lista = list(range(num_col))
    for i in range(num_col):
        lista[i] = [k[i] for k in groups]
    Name = url.split('/')[-1].strip().replace('+',' ')
    lista.insert(0, Name)
    # zip() stops at the shorter list, so 'Respawn' is dropped when that column is absent
    attr_labels = ['Name','Location','HP','NG+','Souls','NG+_s','Respawn']
    dictionary = dict(zip(attr_labels, lista))
    return pd.DataFrame(dictionary)

In [None]:
EnemyVariations('https://darksouls.wiki.fextralife.com/Torch+Hollows')

____

In [None]:
url ='https://darksouls.wiki.fextralife.com/Enemies'

In [None]:
links = Get_All_Links(url)

In [None]:
# Concatenate every enemy's variations (each frame keeps its own 0-based index)
variations = pd.concat([EnemyVariations(l) for l in links], sort=False)

In [218]:
variations.tail(20)

|   | Name | Location | HP | NG+ | Souls | NG+_s | Respawn |
|---|---|---|---|---|---|---|---|
| 1 | The Black Knight | Undead Parish | 532 | 1219 | 800 | 3200 | No |
| 2 | The Black Knight | Darkroot Basin | 603 | 1225 | 1800 | 7200 | No |
| 3 | The Black Knight | The Catacombs | 710 | 1299 | 1200 | 3600 | No |
| 4 | The Black Knight | Northern Undead Asylum | 745 | 1304 | 1200 | 3600 | No |
| 5 | The Black Knight | Tomb of Giants | 887 | 1393 | 1200 | 2400 | No |
| 6 | The Black Knight | Kiln of the First Flame | 958 | 1476 | 2000 | 4000 | Yes |
| 0 | Blowdart Sniper | Blighttown | 93 | 179 | 500 | 1500 | No |
| 0 | Darkmoon Soldier | Anor Londo | 767 (Balder Knight) | 1342 | 5000 | 15000 | Yes |
| 1 | Darkmoon Soldier | Anor Londo | 821 (Berenike Knight) | 1436 | 5000 | 15000 | Yes |
| 0 | Possessed Tree | Darkwood Garden | 540 | 1236 | 180 | 720 | No |


----

## &#9678; 3. Merging Datasets

In [214]:
complete = pd.merge(enemies, variations, on='Name', how='inner')  # 'Name' is the only column the two tables share
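An inner merge silently drops any enemy whose name appears in only one of the two tables, for example if a page failed to parse for one table but not the other. A quick audit, sketched below, is an outer merge with `indicator=True`, which labels each row as `'both'`, `'left_only'` or `'right_only'`. The toy tables reuse names from the outputs above, but which rows are present in each table is made up for illustration.

```python
import pandas as pd

# Toy tables: names come from the outputs above, but which rows are present is made up
enemies = pd.DataFrame({"Name": ["Bone Tower", "Skeleton Baby", "Mimic"]})
variations = pd.DataFrame({
    "Name": ["Bone Tower", "Skeleton Baby", "Wheel Skeleton"],
    "Location": ["Tomb of Giants", "Tomb of Giants", "The Catacombs"],
})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
audit = pd.merge(enemies, variations, on="Name", how="outer", indicator=True)
unmatched = audit.loc[audit["_merge"] != "both", ["Name", "_merge"]]
print(unmatched)
```

Running the same audit on the real `enemies` and `variations` frames would list exactly the names that the inner merge leaves out.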

** Reordering the columns for better visualization: **

In [215]:
cols = complete.columns.tolist()

In [216]:
complete = complete[cols[0:1] + cols[-6:] + cols[1:-6]]  # Name, then the 6 variation columns, then the stats
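The positional slice `cols[-6:]` relies on the variations table contributing exactly six columns after `Name` (Location, HP, NG+, Souls, NG+_s, Respawn). If no scraped page had a 'Respawn' column there would only be five, and the slice would also pull a stat column to the front. Selecting by name avoids that; `reorder` and `FRONT` are hypothetical names for this sketch.

```python
import pandas as pd

FRONT = ["Name", "Location", "HP", "NG+", "Souls", "NG+_s", "Respawn"]


def reorder(df: pd.DataFrame, first: list) -> pd.DataFrame:
    """Move the listed columns that exist to the front; keep the rest in order."""
    leading = [c for c in first if c in df.columns]
    trailing = [c for c in df.columns if c not in leading]
    return df[leading + trailing]


# Toy frame with only some variation columns (no 'Respawn')
toy = pd.DataFrame(columns=["Name", "Physical Defense", "Bleed Resistance", "Location", "HP"])
print(list(reorder(toy, FRONT).columns))
```

Applied to the real data, `complete = reorder(complete, FRONT)` would give the same order as the slice whenever all six variation columns are present.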

In [217]:
# Example:
complete[80:90]

|   | Name | Location | HP | NG+ | Souls | NG+_s | Respawn | Physical Defense | Strike Defense | Slash Defense | Thrust Defense | Magic Defense | Fire Defense | Lightning Defense | Poison Resistance | Toxic Resistance | Bleed Resistance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 80 | Wheel Skeleton | The Catacombs | 134 | 273 | 400 | 1600 |  | 239 ~ 348 | 167 ~ 244 | 239 ~ 348 | 299 ~ 435 | 183 ~ 267 | 163 ~ 237 | 251 ~ 366 | S | S | S |
| 81 | Wheel Skeleton | Painted World of Ariamis | 196 | 310 | 800 | 1600 |  | 239 ~ 348 | 167 ~ 244 | 239 ~ 348 | 299 ~ 435 | 183 ~ 267 | 163 ~ 237 | 251 ~ 366 | S | S | S |
| 82 | Bone Tower | Tomb of Giants | 291 | 479 | 250 | 500 | Yes | 371 | 260 | 371 | 464 | 287 | 251 | 410 | S | S | S |
| 83 | Skeleton Baby | Tomb of Giants | 36 | 56 | 100 | 200 | Yes (Constant) | 207 | 145 | 207 | 259 | 161 | 142 | 262 | S | S | S |
| 84 | Bat Wing Demon | Anor Londo | 419 | 734 | 500 | 1500 | Yes | 226 | 226 | 226 | 226 | 216 | 167 | 512 | C | C | C |
| 85 | Minor Taurus Demon | Demon Ruins | 900 | 1440 | 1200 | 2400 | Yes | 103 | 103 | 103 | 103 | 88 | 77 | 67 | B | B | B |
| 86 | Minor Capra Demon | Demon Ruins | 589 | 996 | 800 | 2400 | Yes | 159 | 159 | 159 | 159 | 127 | 119 | 67 | A | A | A |
| 87 | Stone Demon | Demon Ruins | 196 | 314 | 300 | 600 | Yes | 435 ~ 456 | 435 ~ 456 | 435 ~ 456 | 435 ~ 456 | 283 ~ 297 | 869 ~ 912 | 869 ~ 912 | S | S | S |
| 88 | Stone Demon | Lost Izalith | 205 | 321 | 300 | 600 | Yes | 435 ~ 456 | 435 ~ 456 | 435 ~ 456 | 435 ~ 456 | 283 ~ 297 | 869 ~ 912 | 869 ~ 912 | S | S | S |
| 89 | Crow Demon | Painted World of Ariamis | 930 | 1460 | 1000 | 2000 | Yes | 306 | 306 | 230 | 306 | 231 | 183 | 231 | C | C | D |


** &#9672; Praise the Sun! You have linked the Fire! **

*"The flow of time itself is convoluted; with heroes centuries old phasing in and out"*


<table>
<tr>
<td> <img src="https://thumbs.gfycat.com/HollowElectricHammerheadshark-size_restricted.gif" alt="Drawing" style="width: 450px;"/> </td>
<td> <img src="https://66.media.tumblr.com/3e0446ba78ba444f64450d00cf1efb0d/tumblr_o8izfsEPwf1rf5ar6o1_250.gif" alt="Drawing" style="width: 250px;"/> </td>
</tr>
</table>

_____

*"Well, I'm going to have myself a little nap. The only thing to do, really, after a nice toast. You are a true friend. Best of luck with your duty."*
