# Parser
ACNH Flowers by crow625

In Animal Crossing: New Horizons, each flower has three genes (four for roses), and each gene takes a value of 0, 1, or 2. The combination of these genes determines the color of the flower. When two flowers breed, each gene is passed down the way real genes are, which can be modeled with a Punnett square. A 0 represents a homozygous recessive gene (two recessive alleles), a 1 represents a heterozygous gene (one dominant and one recessive allele), and a 2 represents a homozygous dominant gene (two dominant alleles).
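As a minimal sketch of the Punnett-square model described above, assuming a gene value simply counts dominant alleles (0, 1, or 2), the following computes the offspring distribution for a single gene. The names `alleles` and `offspring_distribution` are hypothetical and not part of this notebook.

```python
from fractions import Fraction
from itertools import product

def alleles(value: int) -> list[int]:
    """Split a 0/1/2 gene value into its two alleles (1 = dominant, 0 = recessive)."""
    if value not in (0, 1, 2):
        raise ValueError(f"gene value must be 0, 1, or 2, got {value}")
    return [1] * value + [0] * (2 - value)

def offspring_distribution(parent_a: int, parent_b: int) -> dict[int, Fraction]:
    """Return P(child gene value) for one gene via a 2x2 Punnett square."""
    counts = {0: 0, 1: 0, 2: 0}
    # Each of the four cells pairs one allele from each parent.
    for a, b in product(alleles(parent_a), alleles(parent_b)):
        counts[a + b] += 1
    # Drop outcomes that cannot occur; each cell has probability 1/4.
    return {value: Fraction(n, 4) for value, n in counts.items() if n}
```

For example, crossing two heterozygous parents, `offspring_distribution(1, 1)`, gives 1/4 for 0, 1/2 for 1, and 1/4 for 2. Under this model the genes are inherited independently, so the probability of a full genotype is the product of the per-gene probabilities.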

This notebook scrapes the Animal Crossing Fandom wiki to retrieve the phenotype (color) of each flower for every combination of genes. For example, a rose with genes R1Y0W2B2 (red 1, yellow 0, white 2, brightness 2) is purple. The scraped data is stored in CSV files ('flowers.csv' and 'seeds.csv') for use in the ACNH Flowers web app.
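Gene strings such as R1Y0W2B2 pack one letter and one digit per gene. A hypothetical helper `parse_genes` (not part of this notebook) that turns such a string back into a dictionary could look like this, assuming the letters always appear in the order R, Y, W, with an optional trailing B for roses:

```python
import re

# Matches strings like "R1Y0W2" with an optional B gene; each digit must be 0, 1, or 2.
GENE_PATTERN = re.compile(r"R([0-2])Y([0-2])W([0-2])(?:B([0-2]))?")

def parse_genes(gene_string: str) -> dict[str, int]:
    """Convert e.g. 'R1Y0W2B2' into {'R': 1, 'Y': 0, 'W': 2, 'B': 2}."""
    match = GENE_PATTERN.fullmatch(gene_string)
    if match is None:
        raise ValueError(f"not a valid gene string: {gene_string!r}")
    genes = dict(zip("RYW", (int(g) for g in match.groups()[:3])))
    if match.group(4) is not None:  # only roses carry a B gene
        genes["B"] = int(match.group(4))
    return genes
```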

Thank you to the contributors on animalcrossing.fandom.com.

This cell imports the required packages and fetches the flower mechanics page from the wiki.

In [1]:
import pandas as pd
import requests
import bs4

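url = "https://animalcrossing.fandom.com/wiki/Flowers/New_Horizons_mechanics"
response = requests.get(url, timeout=30)
# Fail early if the wiki returns an error page instead of the article
response.raise_for_status()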

This cell locates the two desired tables, the rose table and the 3-gene flower table, and saves them as variables.

In [2]:
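soup = bs4.BeautifulSoup(response.text, 'html.parser')
# A space-separated class_ string matches the exact value of the class attribute,
# so only elements with class="roundy mw-collapsible" are returned
tables = soup.find_all(class_="roundy mw-collapsible")
rose_table = tables[0]    # the rose table is expected to come first on the page
flower_table = tables[1]  # followed by the 3-gene flower table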

The following two cells trim the tables down to only the data rows, dropping the header and footer rows.

In [3]:
rose_table = rose_table.find_all("tr")
flower_table = flower_table.find_all("tr")

In [4]:
rose_table = rose_table[2:-1]
flower_table = flower_table[3:-1]
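Because `zip` stops at the shorter of its inputs, a mismatch between the two trimmed tables would silently drop rows. Each table should hold one row per combination of the R, Y, and W genes, i.e. 3 × 3 × 3 = 27 rows, which allows a cheap sanity check. The sketch below assumes the gene values have already been read from each row as integer (R, Y, W) tuples; `check_gene_rows` is a hypothetical name.

```python
from itertools import product

def check_gene_rows(rows: list[tuple[int, int, int]]) -> None:
    """Raise ValueError unless rows are exactly the 27 RYW combinations, each once."""
    expected = set(product(range(3), repeat=3))  # all 27 (R, Y, W) triples
    if len(rows) != len(expected) or set(rows) != expected:
        missing = sorted(expected - set(rows))
        raise ValueError(f"expected 27 unique RYW rows, got {len(rows)}; missing {missing}")
```

A simpler check directly after trimming, assuming the page layout matches, is `assert len(rose_table) == len(flower_table) == 27`.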

This cell defines a function to get the color of a flower from a string. The strings passed in are the alt text of the flower images, because the tables on the Fandom wiki represent flower colors with images rather than text.

In [5]:
def get_color(img_name):
    for color in ["white", "red", "yellow", "pink", "orange", "blue", "purple", "black", "green", "gold"]:
        # return the first listed color that appears anywhere in the alt text
        if color in img_name:
            return color
    return "unknown"
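`get_color` performs a case-sensitive substring search, so it relies on the alt text containing lowercase color names. If the alt text were capitalized or used underscores (for example a hypothetical "Red_Rose_NH_Icon.png"; the real alt strings on the wiki may differ), a token-based variant would be less fragile. A minimal sketch, with the hypothetical name `get_color_tokens`:

```python
import re

COLORS = ["white", "red", "yellow", "pink", "orange", "blue", "purple", "black", "green", "gold"]

def get_color_tokens(img_name: str) -> str:
    """Return a known color word found in img_name (checked in COLORS order), ignoring case."""
    # Split on anything that is not a letter, so "Red_Rose.png" -> ["red", "rose", "png"].
    tokens = set(re.split(r"[^a-z]+", img_name.lower()))
    for color in COLORS:
        if color in tokens:
            return color
    return "unknown"
```

Matching whole words also avoids false hits such as "gold" inside "goldfish", at the cost of missing colors glued to other words without a separator.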

This cell defines a function to generate a flower's color string given its genes. For example, a flower with 0 red, 1 yellow, and 1 white will have color string R0Y1W1.

In [25]:
def color_string(red, yellow, white, b=None):
    # roses carry a fourth (B) gene; other flowers use only R, Y, W
    if b is not None:
        return f"R{red}Y{yellow}W{white}B{b}"
    return f"R{red}Y{yellow}W{white}"
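Since each gene takes one of three values, a three-gene flower has 3³ = 27 possible gene strings and a rose has 3⁴ = 81. As a sketch, every string can be enumerated with `itertools.product`; the `gene_string` function below is a standalone copy of the same output format, not the notebook's own function.

```python
from itertools import product

def gene_string(red, yellow, white, b=None):
    """Same format as color_string: R<r>Y<y>W<w>, plus B<b> for roses."""
    base = f"R{red}Y{yellow}W{white}"
    return base if b is None else f"{base}B{b}"

# product varies the last gene fastest: R0Y0W0, R0Y0W1, R0Y0W2, R0Y1W0, ...
three_gene = [gene_string(r, y, w) for r, y, w in product(range(3), repeat=3)]
rose = [gene_string(r, y, w, b) for r, y, w, b in product(range(3), repeat=4)]
```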

This cell initializes containers for the data, then iterates over the rows of both tables in parallel. In each iteration, it records the R, Y, and W gene values and appends each flower's color to its list. Roses get three columns, one for each B value.

Additionally, if the flower is a seed flower, its gene string is stored in the seeds table under its color and flower type.
An exception is made for windflowers: there is no yellow windflower, so in place of a seed yellow there is a seed orange. The genes of this seed orange are stored in the seeds table as if it were a seed yellow, so that every flower type fits the same three seed colors (red, yellow, white).
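The rose columns B0, B1, and B2 form a wide layout: one row per RYW combination, with the B gene spread across columns. A minimal sketch of how one such row maps to three full rose genotypes, together with the windflower exception for seeds, follows. Both `expand_rose_row` and `seed_slot` are hypothetical helpers written for illustration, not code from the notebook.

```python
def expand_rose_row(row: dict[str, str]) -> dict[str, str]:
    """Map one wide rose row to {'R..Y..W..B<b>': color} for b in 0, 1, 2."""
    base = f"R{row['red']}Y{row['yellow']}W{row['white']}"
    # The B value is not a cell of its own; it comes from the column name.
    return {f"{base}B{b}": row[f"B{b}"] for b in range(3)}

def seed_slot(flower: str, color: str) -> str:
    """Seed-color column a seed flower is filed under.

    Windflowers have no seed yellow; their seed orange is filed as yellow.
    """
    if flower == "windflowers" and color == "orange":
        return "yellow"
    return color
```

For instance, a row with red 1, yellow 0, white 2 and "purple" in the B2 column yields the entry R1Y0W2B2 → purple, matching the example in the introduction.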

In [35]:
data = {
    "red": [],
    "yellow": [],
    "white": [],
    "B0": [],
    "B1": [],
    "B2": [],
    "tulips": [],
    "pansies": [],
    "cosmos": [],
    "lilies": [],
    "hyacinths": [],
    "windflowers": [],
    "mums": []
}

seeds = {
    "red": {},
    "yellow": {},
    "white": {}
}
    
for flower_row in zip(rose_table, flower_table):
    roses = flower_row[0].find_all("td")
    flowers = flower_row[1].find_all("td")
    
    row = {}

    row["red"] = roses[0].text.strip()
    row["yellow"] = roses[1].text.strip()
    row["white"] = roses[2].text.strip()
    row["B0"] = roses[3]
    row["B1"] = roses[4]
    row["B2"] = roses[5]
    row["tulips"] = flowers[3]
    row["pansies"] = flowers[4]
    row["cosmos"] = flowers[5]
    row["lilies"] = flowers[6]
    row["hyacinths"] = flowers[7]
    row["windflowers"] = flowers[8]
    row["mums"] = flowers[9]
    
    for key in row.keys():
        # don't do any additional processing on the genes
        if key in ("red", "yellow", "white"):
            data[key].append(row[key])
        else:
            # fetch the flower's color from the image alt text and add it to the table
            color = get_color(row[key].find("img")["alt"])
            data[key].append(color)
            # if this is a seed flower, its background color will be #AED6F1
            if "#AED6F1" in row[key]["style"]:
                # if a rose, get the color string with a B value (taken from the column name)
                if key in ("B0", "B1", "B2"):
                    c_string = color_string(data["red"][-1], data["yellow"][-1], data["white"][-1], key[-1])
                    seeds[color]["roses"] = c_string
                # if not a rose, get the 3-gene color string
                else:
                    c_string = color_string(data["red"][-1], data["yellow"][-1], data["white"][-1])
                    # There is no seed yellow windflower (instead, there is a seed orange),
                    # so the seed orange is stored as yellow
                    if key == "windflowers" and color == "orange":
                        seeds["yellow"][key] = c_string
                    else:
                        seeds[color][key] = c_string
                

This cell converts the flower color data from the previous cell to a pandas DataFrame and writes it to 'flowers.csv'.

In [70]:
df = pd.DataFrame(data)
df.to_csv('flowers.csv', index=False)
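In 'flowers.csv', each row holds the gene columns red, yellow, and white, followed by one color column per rose B value and one per flower type. A hedged sketch of how a consumer such as the web app might turn this into per-flower lookups with the standard csv module follows; `load_lookup` is a hypothetical name, and the sample row is placeholder content in the notebook's column layout, not scraped data (only R1Y0W2B2 → purple comes from the introduction).

```python
import csv
import io

def load_lookup(csv_text: str) -> dict[str, dict[str, str]]:
    """Map each flower type to {gene string: color}, reading flowers.csv text."""
    lookup: dict[str, dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        ryw = f"R{row['red']}Y{row['yellow']}W{row['white']}"
        for column, color in row.items():
            if column in ("red", "yellow", "white"):
                continue  # gene columns, not colors
            if column in ("B0", "B1", "B2"):
                # Rose colors: the B value comes from the column name.
                lookup.setdefault("roses", {})[f"{ryw}{column}"] = color
            else:
                lookup.setdefault(column, {})[ryw] = color
    return lookup

# Placeholder row in the notebook's column layout (single letters are stand-ins).
SAMPLE = """red,yellow,white,B0,B1,B2,tulips,pansies,cosmos,lilies,hyacinths,windflowers,mums
1,0,2,x,y,purple,t,p,c,l,h,w,m
"""
```

With the real file, the text could be read via `open('flowers.csv', newline='')` and passed to `load_lookup`.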

This cell converts the flower seed data from the previous cell to a pandas DataFrame and writes it to 'seeds.csv'.

In [38]:
df = pd.DataFrame(seeds)
df.to_csv('seeds.csv', index=True)
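Because the DataFrame is built from the nested seeds dictionary, its columns are the seed colors (red, yellow, white) and its index is the flower types. With `index=True` and an unnamed index, pandas writes an empty first header cell, so the file starts with ",red,yellow,white". Reading it back with the standard library therefore means keying on the empty column name. A minimal sketch, with the hypothetical name `load_seeds` and placeholder gene strings in the expected format rather than scraped values:

```python
import csv
import io

def load_seeds(csv_text: str) -> dict[str, dict[str, str]]:
    """Return {flower type: {seed color: gene string}} from seeds.csv text."""
    seeds: dict[str, dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        flower = row.pop("")  # the unnamed index column written by pandas
        seeds[flower] = row
    return seeds

# Placeholder content in the expected layout; not the scraped values.
SAMPLE = """,red,yellow,white
roses,R2Y0W0B1,R0Y2W0B0,R0Y0W1B0
tulips,R2Y0W0,R0Y2W0,R0Y0W1
"""
```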