# Mini Web Scraper - Scraping Animal Crossing Portal for the villager popularity list

### Introduction:
For this web scraping project, we are going to scrape https://www.animalcrossingportal.com/, a fan-made Animal Crossing community website, specifically the [Animal Crossing: New Horizons villager popularity list page](https://www.animalcrossingportal.com/games/new-horizons/guides/villager-popularity-list.php#/).

The goal of this project is to obtain the full popularity ranking of villagers in Animal Crossing: New Horizons (ACNH).
The page contains 6 tiers that classify villagers from most popular to least popular as we go down the page. Within each tier, villagers are also ranked from most popular to least popular.

- The "Highest Popularity" tier contains 15 villagers
- The "Very Popular" tier contains 25 villagers
- The "Fairly Popular" tier contains 30 villagers
- The "Middle Ground" tier contains 60 villagers
- The "Less Popular" tier contains 120 villagers
- The "Least Popular" tier contains 163 villagers
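
These counts add up to 413 villagers, which gives a simple sanity check for the scraped data. The following is a minimal sketch, assuming the tier sizes listed above are still current on the page; `EXPECTED_TIER_SIZES` and `check_tier_sizes` are names made up for illustration and are not part of the notebook.

```python
# Tier sizes as listed above (assumption: the live page may have changed since).
EXPECTED_TIER_SIZES = {1: 15, 2: 25, 3: 30, 4: 60, 5: 120, 6: 163}

def check_tier_sizes(scraped_sizes):
    """Return {tier: (expected, scraped)} for every tier whose size does not match."""
    return {
        tier: (expected, scraped_sizes.get(tier, 0))
        for tier, expected in EXPECTED_TIER_SIZES.items()
        if scraped_sizes.get(tier, 0) != expected
    }

total_expected = sum(EXPECTED_TIER_SIZES.values())
print(total_expected)  # 413
```

An empty result from `check_tier_sizes` means every scraped tier has the expected number of villagers.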

We will attempt to scrape the tier of each villager, as well as their rank within that tier, from the website.
For this project, we will be using Python, Beautiful Soup, pathlib, pandas, and Selenium WebDriver.

### Brief Description
From this project, I hope to obtain information about the popularity of each villager. I will combine the popularity data with another dataset of villager characteristics to see whether any characteristics correspond to higher villager popularity.



### Outline: How are we going to extract the data?

- Obtain the page https://www.animalcrossingportal.com/games/new-horizons/guides/villager-popularity-list.php#/
- Obtain villager information (rank, name, and tier) from every tier, and store it in a dictionary per tier
- Concatenate the tier dictionaries to form a DataFrame of all tiers
- Convert the DataFrame to a CSV file and save it in the current working directory

In [1]:
!pip install selenium --quiet

#### Current Path:

In [13]:
from pathlib import Path
cwd = Path().resolve()
str(cwd)

'/Users/there/Desktop/acnh-scrapper'

### Initialize and Obtain page html
- The page HTML is loaded and stored in `acnh_doc`.
- The function returns `acnh_doc` as a BeautifulSoup object.

In [14]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
from pathlib import Path

def get_acnhpage(path):
    # Selenium 4 deprecates the executable_path argument;
    # the driver path is passed through a Service object instead
    driver_path = path+"/chromedriver"
    service = Service(driver_path)
    options = webdriver.ChromeOptions()
    driver = webdriver.Chrome(service=service, options=options)

    url = "https://www.animalcrossingportal.com/games/new-horizons/guides/villager-popularity-list.php#/"
    try:
        driver.get(url)
        # parse the rendered page source with Beautiful Soup
        acnh_doc = BeautifulSoup(driver.page_source, 'html.parser')
    finally:
        # always close the browser, even if loading the page fails
        driver.quit()

    return acnh_doc

`get_acnhpage(path)` takes the path of the directory that contains chromedriver, loads the Animal Crossing villager popularity page in Chrome, and returns its HTML as a BeautifulSoup object.
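
Because the driver path is built by appending `/chromedriver` to the directory, a missing binary only surfaces once Selenium tries to start. The sketch below shows one possible pre-check, assuming the binary is named `chromedriver` and sits directly in the given directory as in the code above; `find_chromedriver` is a hypothetical helper, not part of Selenium or the notebook.

```python
from pathlib import Path

def find_chromedriver(base_dir):
    """Return the chromedriver path inside base_dir as a string, or raise if it is missing."""
    driver_path = Path(base_dir) / "chromedriver"
    if not driver_path.is_file():
        raise FileNotFoundError(f"chromedriver not found at {driver_path}")
    return str(driver_path)
```

The returned string could then be handed to `Service(...)` in place of the concatenated path.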

### Parses information from every tier
- Takes in the `acnh_doc` document and selects all tier tags.
- Iterates through each tier (6 in total) and obtains the ranks and names of the villagers in that tier.
- Creates a dictionary for each tier that stores the villager ranks and names.
- Returns a list of tier dictionaries.

In [15]:
# returns a list of dictionaries, one per tier
def get_tiers(acnh_doc):
    tier_select = "c-tier"
    # the last 3 "c-tier" blocks on the page are not related to ACNH;
    # dropping them leaves the 6 ACNH tiers
    tier_tags = acnh_doc.find_all('div', {"class": tier_select})[:-3]

    dict_list = []
    # pass each tier tag to get_villagers() and store what it returns;
    # tiers are numbered from 1 in page order
    for tier_number, tier_tag in enumerate(tier_tags, start=1):
        rank, name = get_villagers(tier_tag)
        dict_list.append({"rank": rank,
                          "name": name,
                          "tier": tier_number})
    return dict_list

`get_tiers(acnh_doc)` takes the HTML doc, iterates through each tier tag, calls `get_villagers()` on it, and returns a list of dictionaries containing the villager data from each tier.
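
The `[:-3]` slice in `get_tiers` yields six tiers only while the page has exactly nine `c-tier` blocks. The sketch below uses plain strings as stand-ins for tags to compare `[:-3]` with `[:6]` when that count changes. The nine-block layout is inferred from the code above and has not been checked against the live page.

```python
# Stand-ins for the tier tags: 6 ACNH tiers followed by 3 unrelated blocks (assumed layout).
blocks = [f"acnh-tier-{i}" for i in range(1, 7)] + ["other-a", "other-b", "other-c"]

# With exactly nine blocks, both slices select the same six tiers.
assert blocks[:-3] == blocks[:6]

# If the page gained one more unrelated block, [:-3] would let one through.
blocks_plus_one = blocks + ["other-d"]
drop_last_three = blocks_plus_one[:-3]
keep_first_six = blocks_plus_one[:6]

print(len(drop_last_three))  # 7
print(len(keep_first_six))   # 6
```

Which slice is safer depends on whether unrelated blocks are more likely to be added before or after the ACNH tiers, so checking the number of tiers returned is a reasonable guard either way.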

### Parses villager info from each tier
- Takes in a tier tag and selects all villager tags in the tier.
- Parses and returns the ranks and names of the villagers in the tier.

In [16]:
# takes the tag of ONE tier and returns the ranks and names of its villagers
def get_villagers(tier):
    villager_select = "c-villager"
    villager_tag = tier.find_all("div", {"class": villager_select})
    
    rank = []
    name = []
    
    
    for villager in villager_tag:
        name_rank = villager.find_all("p")
        rank.append(name_rank[0].text)
        name.append(name_rank[1].text)
    
    return rank, name

`get_villagers` takes in a single tier tag and returns the ranks and names of the villagers in that tier.
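
To see what `get_villagers` does without opening a browser, the same lookups can be run on a small hand-written fragment. The markup below is an assumption that mirrors only what the code relies on (a `c-villager` div whose first two `<p>` tags hold the rank and the name); the real page contains more markup than this.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for one tier; not copied from the real page.
sample_html = """
<div class="c-tier">
  <div class="c-villager"><p>1</p><p>Raymond</p></div>
  <div class="c-villager"><p>2</p><p>Marshal</p></div>
</div>
"""

tier = BeautifulSoup(sample_html, "html.parser").find("div", {"class": "c-tier"})

rank, name = [], []
for villager in tier.find_all("div", {"class": "c-villager"}):
    paragraphs = villager.find_all("p")
    rank.append(paragraphs[0].text)  # rank is kept as a string, as in get_villagers
    name.append(paragraphs[1].text)

print(rank)  # ['1', '2']
print(name)  # ['Raymond', 'Marshal']
```

Note that the ranks come back as strings; converting them with `int()` would be needed before any numeric comparison.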

### Helper function: combines all tier dicts into one big df

In [17]:
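import pandas as pd

# takes the list of tier dictionaries and returns one combined df of all tiers
# each tier keeps its own 0-based index, so index values repeat across tiers
def concat_tiers(dict_list):
    tier_dfs = [pd.DataFrame(tier) for tier in dict_list]
    # a single concat avoids re-copying the growing frame on every iteration
    return pd.concat(tier_dfs) if tier_dfs else pd.DataFrame()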

`concat_tiers` concatenates all tier dictionaries into a single DataFrame.
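
Because each tier becomes its own DataFrame before concatenation, every tier keeps a 0-based index, so index values repeat across tiers in the combined frame. The sketch below uses made-up rows (the tier 2 name is a placeholder, not scraped data) to show this, alongside the `ignore_index=True` option of `pd.concat`, which renumbers the rows instead.

```python
import pandas as pd

# Two tiny tier dictionaries in the same shape get_tiers returns; values are illustrative.
tiers = [
    {"rank": ["1", "2"], "name": ["Raymond", "Marshal"], "tier": 1},
    {"rank": ["1"], "name": ["Placeholder"], "tier": 2},
]
frames = [pd.DataFrame(t) for t in tiers]

per_tier_index = pd.concat(frames)                   # index restarts in each tier
global_index = pd.concat(frames, ignore_index=True)  # index runs 0..n-1

print(list(per_tier_index.index))  # [0, 1, 0]
print(list(global_index.index))    # [0, 1, 2]
```

Either form works for writing a CSV without the index; the renumbered form is mainly useful when rows are later selected by label.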

### Putting it all together!
- Call `get_acnhpage(path)` to obtain the HTML doc.
- Pass the HTML doc into `get_tiers` to obtain the villager data from each tier.
- Combine all tier data to form a full DataFrame of all villager data.
- Store the DataFrame in a CSV file.

In [18]:
def scrape_acnh():
    # resolve the working directory; chromedriver is expected to be there
    path = str(Path().resolve())
    acnh_doc = get_acnhpage(path)

    # obtain data from all tiers as a list of dictionaries
    tiers_dict_list = get_tiers(acnh_doc)

    # combine the tiers into one DataFrame
    acnh_villagers_df = concat_tiers(tiers_dict_list)

    # index=False leaves the repeated per-tier index out of the CSV
    acnh_villagers_df.to_csv(path+"/acnh_villager_data.csv", index=False)
    return acnh_villagers_df

In [19]:
scrape_acnh()

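|     | rank | name    | tier |
|-----|------|---------|------|
| 0   | 1    | Raymond | 1    |
| 1   | 2    | Marshal | 1    |
| 2   | 3    | Shino   | 1    |
| 3   | 4    | Sherb   | 1    |
| 4   | 5    | Sasha   | 1    |
| ... | ...  | ...     | ...  |
| 158 | 159  | Rocco   | 6    |
| 159 | 160  | Bettina | 6    |
| 160 | 161  | Boris   | 6    |
| 161 | 162  | Bitty   | 6    |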


## Reference and Summary
- Summary: we parsed information from the [ACNH villager popularity page](https://www.animalcrossingportal.com/games/new-horizons/guides/villager-popularity-list.php#/) and stored the villager info in a CSV file.
- Ideas for future projects: the follow-up to this project is an EDA I will be performing on this dataset in combination with another Animal Crossing dataset.

### Reflection:
This was a relatively small and easy web scraping project. It was also my introduction to Selenium and its many capabilities. Unfortunately, none of its more advanced features came into use in this mini-project. In future projects I hope to make more use of Selenium and become familiar with it.


In [24]:
!pip install jovian --quiet

In [25]:
import jovian
jovian.commit(filename='acnh-scrapper.ipynb', outputs=["acnh_villager_data.csv"])

[jovian] Updating notebook "ampiiere/acnh-scrapper" on https://jovian.ai/
[jovian] Uploading additional outputs...
[jovian] Committed successfully! https://jovian.ai/ampiiere/acnh-scrapper


'https://jovian.ai/ampiiere/acnh-scrapper'