# Capstone Project - League of Legends Champion Recommender

> Author: Ryan Yong

**Summary:**   
- Develop a recommender system that suggests champions to users based on their account mastery points.
- Training data: Account & Champion Data

There are a total of 7 notebooks for this project:  
 1. `01a_data_scrape.ipynb`   
 2. `01b_wiki_scrape_fail.ipynb`   
 3. `02_champion_dataset_EDA.ipynb`
 4. `03_account_dataset_EDA.ipynb`
 5. `04_intial_recommender_system.ipynb`
 6. `05_final_hybrid_system.ipynb`
 7. `06_implementation.ipynb`

---
**This Notebook**
- Attempts to scrape champion data from the [League of Legends Wiki on Fandom](https://leagueoflegends.fandom.com/wiki/League_of_Legends_Wiki)
- Attempts to create `champion_data.csv` from the scraped data

## Attempt 1

In [None]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [4]:
# Function to scrape champion information from a given URL
def scrape_champion_info(url):
    response = requests.get(url, timeout=10)  # timeout so a stalled request cannot hang the cell
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # Extract champion name
    champion_name = soup.find('h1', class_='page-header__title').text.strip()
    
    # Initialize variables for champion information
    champion_class = champion_position = champion_resource = champion_range_type = champion_adaptive_type = champion_ratings = "N/A"
    
    # Extract champion information if available
    champion_info_table = soup.find('table', class_='article-table')
    if champion_info_table:
        rows = champion_info_table.find_all('tr')
        for row in rows:
            cols = row.find_all('td')
            if len(cols) == 2:
                attribute = cols[0].text.strip()
                value = cols[1].text.strip()
                if attribute == "Class":
                    champion_class = value
                elif attribute == "Position":
                    champion_position = value
                elif attribute == "Resource":
                    champion_resource = value
                elif attribute == "Attack range":
                    champion_range_type = value
                elif attribute == "Adaptive type":
                    champion_adaptive_type = value
    
    # Extract ratings
    ratings_element = soup.find('span', class_='hero-overview__rating-value')
    if ratings_element:
        champion_ratings = ratings_element.text.strip()
    
    return {
        'Champion': champion_name,
        'Class': champion_class,
        'Position': champion_position,
        'Resource': champion_resource,
        'Range Type': champion_range_type,
        'Adaptive Type': champion_adaptive_type,
        'Ratings': champion_ratings
    }

# List of champion URLs
champion_urls = [
    'https://leagueoflegends.fandom.com/wiki/Aatrox/LoL',
    # Add other champion URLs here
]

# List to store champion information
champion_data = []

# Scrape champion information from each URL
for url in champion_urls:
    champion_info = scrape_champion_info(url)
    champion_data.append(champion_info)

# Create DataFrame from champion data
champion_df = pd.DataFrame(champion_data)

# Display the DataFrame
print(champion_df)




                     Champion Class Position Resource Range Type  \
0  Aatrox (League of Legends)   N/A      N/A      N/A        N/A   

  Adaptive Type Ratings  
0           N/A     N/A  


As shown above, the DataFrame has the expected columns, but every field other than the champion name is `N/A`. To troubleshoot, the `soup` variable was inspected to see what HTML the request actually returned.

In [2]:
soup

<!DOCTYPE html>

<html class="client-nojs sse-other" dir="ltr" lang="en">
<head>
<meta charset="utf-8"/>
<title>Aatrox (League of Legends) | League of Legends Wiki | Fandom</title>
<script>document.documentElement.className="client-js sse-other";RLCONF={"wgBreakFrames":false,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgRequestId":"4a5b06736a3212d0eb39861f603c278d","wgCSPNonce":false,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":false,"wgNamespaceNumber":0,"wgPageName":"Aatrox/LoL","wgTitle":"Aatrox/LoL","wgCurRevisionId":3704745,"wgRevisionId":3704745,"wgArticleId":1452145,"wgIsArticle":true,"wgIsRedirect":false,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Pages using DynamicPageList3 parser function","Pages using DynamicPageList3 dplreplace parser function","Champion

After inspecting the HTML, a second attempt was made to scrape the data.
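
Reading a full page dump by eye is slow, so a quicker first check is to count how often each class name used by the scraper appears in the raw response text. The sketch below is only an illustration: `class_name_hits` is a hypothetical helper, and `sample_html` is a made-up stand-in for `response.text`, not the real page content.

```python
def class_name_hits(html, class_names):
    """Count raw occurrences of each class name in the HTML text.

    This is a plain substring count, so a name such as 'article-table'
    would also be counted inside a longer name like 'article-table-wrap'.
    """
    return {name: html.count(name) for name in class_names}


# Made-up stand-in for response.text (assumption, not the real page).
sample_html = """
<html><body>
  <h1 class="page-header__title">Aatrox (League of Legends)</h1>
  <div class="some-other-box">Juggernaut</div>
</body></html>
"""

# Class names taken from the selectors used in Attempt 1.
hits = class_name_hits(
    sample_html,
    ["page-header__title", "article-table", "hero-overview__rating-value"],
)
for name, count in hits.items():
    print(f"{name}: {count}")
```

A count of zero means the corresponding `soup.find(...)` call can never match that response. It does not say why: the class name may differ on the live page, or the content may not be part of the served HTML at all.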

## Attempt 2

In [7]:
# Function to scrape champion information from a given URL
def scrape_champion_info(url):
    response = requests.get(url, timeout=10)  # timeout so a stalled request cannot hang the cell
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # Extract champion name
    champion_name = soup.find('h1', class_='page-header__title').text.strip()
    
    # Initialize variables for champion information
    champion_class = champion_position = champion_resource = champion_range_type = champion_adaptive_type = "N/A"
    ratings = {
        'Damage': 'N/A',
        'Toughness': 'N/A',
        'Control': 'N/A',
        'Utility': 'N/A',
        'Style': 'N/A',
        'Difficulty': 'N/A'
    }
    
    # Find the champion infobox table
    infobox = soup.find('aside', class_='portable-infobox')
    
    # Extract class, position, resource, range type, and adaptive type
    if infobox:
        rows = infobox.find_all('div', class_='pi-item')
        for row in rows:
            label = row.find('h3', class_='pi-data-label')
            value = row.find('div', class_='pi-data-value')
            if label and value:
                label_text = label.text.strip()
                value_text = value.text.strip()
                if label_text == "Class":
                    champion_class = value_text
                elif label_text == "Position":
                    champion_position = value_text
                elif label_text == "Resource":
                    champion_resource = value_text
                elif label_text == "Range type":
                    champion_range_type = value_text
                elif label_text == "Adaptive type":
                    champion_adaptive_type = value_text
    
    # Extract ratings (skipped when no infobox was found, to avoid calling .find on None)
    ratings_table = infobox.find('table', class_='article-table') if infobox else None
    if ratings_table:
        rows = ratings_table.find_all('tr')
        for row in rows:
            cells = row.find_all('td')
            if len(cells) == 2:
                rating_label = cells[0].text.strip()
                rating_value = cells[1].text.strip()
                if rating_label in ratings:
                    ratings[rating_label] = rating_value
    
    return {
        'Champion': champion_name,
        'Class': champion_class,
        'Position': champion_position,
        'Resource': champion_resource,
        'Range Type': champion_range_type,
        'Adaptive Type': champion_adaptive_type,
        **ratings
    }

# URL of the champion page
champion_url = 'https://leagueoflegends.fandom.com/wiki/Aatrox/LoL'

# Scrape champion information from the URL
champion_info = scrape_champion_info(champion_url)

# Print the champion information
print(champion_info)


{'Champion': 'Aatrox (League of Legends)', 'Class': 'N/A', 'Position': 'N/A', 'Resource': 'N/A', 'Range Type': 'N/A', 'Adaptive Type': 'N/A', 'Damage': 'N/A', 'Toughness': 'N/A', 'Control': 'N/A', 'Utility': 'N/A', 'Style': 'N/A', 'Difficulty': 'N/A'}
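
When scraping many pages, a silent `N/A` is easy to miss. One possible safeguard, sketched here with a hypothetical helper named `unresolved_fields`, is to list the fields that still hold the placeholder so a failed scrape is flagged immediately. The `record` below is copied from the output above.

```python
def unresolved_fields(record, placeholder="N/A"):
    """Return the keys whose value is still the placeholder."""
    return [key for key, value in record.items() if value == placeholder]


# Output of the scrape above, copied verbatim.
record = {
    'Champion': 'Aatrox (League of Legends)', 'Class': 'N/A',
    'Position': 'N/A', 'Resource': 'N/A', 'Range Type': 'N/A',
    'Adaptive Type': 'N/A', 'Damage': 'N/A', 'Toughness': 'N/A',
    'Control': 'N/A', 'Utility': 'N/A', 'Style': 'N/A',
    'Difficulty': 'N/A',
}

missing = unresolved_fields(record)
print(f"{len(missing)} of {len(record)} fields unresolved: {missing}")
```

For this record the check reports 11 of 12 fields unresolved, with only `Champion` filled in.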


As seen above, a dictionary was returned, but once again every field except the champion name is `N/A`. Due to time constraints, the dataset was compiled manually instead. Automating this step is left as future work. It is not a priority at the moment, because the manually compiled dataset is universally applicable and only needs updating when new champions are released or if Riot overhauls its champion classification system.
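
Until the scrape is automated, a small check can show when the manual dataset needs updating. The sketch below assumes the CSV has a `Champion` column (mirroring the key used by the scraper) and that a list of current champion names is available from somewhere. The helper `missing_champions`, the inline CSV rows, and the `current` list are all illustrative.

```python
import csv
import io


def missing_champions(csv_file, current_names, name_column="Champion"):
    """Return current champion names that are absent from the manual CSV."""
    reader = csv.DictReader(csv_file)
    known = {row[name_column].strip() for row in reader}
    return sorted(set(current_names) - known)


# Illustrative stand-in for open("champion_data.csv", newline="").
manual_csv = io.StringIO("Champion,Class\nAatrox,Juggernaut\nAhri,Burst\n")

# Hypothetical list of champions currently in the game.
current = ["Aatrox", "Ahri", "Akali"]

print(missing_champions(manual_csv, current))
```

Any name in the result would need a new row added by hand. An empty list means the manual dataset already covers every name in `current`.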