Rarity Analysis will look at the items listed in the catalogs and measure how difficult each one is to acquire.

This notebook lays the groundwork for that analysis by doing the following:
1. **Define items** (by exploring the available catalogs).
2. Define the criteria used to **determine rarity** (by exploring the columns in each catalog; any dimension that indicates availability, restrictions, etc. could factor into the rarity analysis).
3. Identify **cleaning requirements** (What can be removed? What should be updated?)
4. Determine the **rarity scale** (Based on the values in the relevant columns, which items are rare and which are common?)

In [1]:
import os
import sys
import pandas as pd
from pathlib import Path

In [2]:
notebook_dir = Path.cwd() #current working directory; Jupyter usually starts in the notebook's own folder
src_dir = notebook_dir / '..' / 'src' #relative path to the project's 'src' directory
if src_dir.exists():
    sys.path.append(str(src_dir)) #sys.path entries should be strings
    print(f"Added {src_dir} to sys.path")
else:
    print(f"Error: {src_dir} does not exist")

Added C:\Users\Code Lou\Documents\ACNH_Aesthetic_Rarity_Guide\notebooks\..\src to sys.path


<h4>Load Data</h4>

In [3]:
from initial_load import load_data
data = load_data(data_folder='data')

Loaded accessories with 222 rows and 22 columns.
Loaded achievements with 84 rows and 21 columns.
Loaded art with 70 rows and 26 columns.
Loaded bags with 96 rows and 20 columns.
Loaded bottoms with 726 rows and 20 columns.
Loaded construction with 236 rows and 7 columns.
Loaded dress-up with 913 rows and 22 columns.
Loaded fencing with 19 rows and 11 columns.
Loaded fish with 80 rows and 41 columns.
Loaded floors with 176 rows and 19 columns.
Loaded fossils with 73 rows and 14 columns.
Loaded headwear with 698 rows and 22 columns.
Loaded housewares with 3275 rows and 32 columns.
Loaded insects with 80 rows and 38 columns.
Loaded miscellaneous with 1307 rows and 31 columns.
Loaded music with 98 rows and 13 columns.
Loaded other with 353 rows and 15 columns.
Loaded photos with 3128 rows and 20 columns.
Loaded posters with 452 rows and 13 columns.
Loaded reactions with 44 rows and 5 columns.
Loaded recipes with 595 rows and 24 columns.
Loaded rugs with 132 rows and 19 columns.
Loaded sho
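The implementation of `load_data` lives in `src/initial_load.py` and is not shown here. A minimal sketch of a loader that would produce a dictionary and messages like the ones above, assuming (hypothetically) that the folder holds one CSV per catalog named after it (e.g. `dress-up.csv`); `load_csv_folder` is an illustrative name, not the project's function:

```python
from pathlib import Path

import pandas as pd


def load_csv_folder(folder):
    """Read every CSV in `folder` into a dict keyed by file name without extension.

    Hypothetical sketch: assumes one CSV file per catalog.
    """
    data = {}
    for csv_path in sorted(Path(folder).glob("*.csv")):
        df = pd.read_csv(csv_path)
        data[csv_path.stem] = df  # e.g. 'dress-up.csv' -> key 'dress-up'
        print(f"Loaded {csv_path.stem} with {len(df)} rows and {len(df.columns)} columns.")
    return data
```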

For the sake of simplicity and relevance, *items* include:
**accessories**, **art**, **bags**, **bottoms**, **dress-up**, **headwear**, **housewares**, **rugs**, **shoes**, **socks**, **tools**, **tops**, **umbrellas**
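The scope above can be captured once in code so later steps reuse the same list. A small sketch; `ITEM_CATALOGS` and `select_item_catalogs` are illustrative names, and the keys are the ones used to index `data` in this notebook:

```python
# The 13 item catalogs chosen above.
ITEM_CATALOGS = [
    "accessories", "art", "bags", "bottoms", "dress-up", "headwear", "housewares",
    "rugs", "shoes", "socks", "tools", "tops", "umbrellas",
]


def select_item_catalogs(data, names=ITEM_CATALOGS):
    """Return a new dict holding only the named catalogs; fail loudly if one is missing."""
    missing = [name for name in names if name not in data]
    if missing:
        raise KeyError(f"Catalogs not found in data: {missing}")
    return {name: data[name] for name in names}
```

Usage: `items = select_item_catalogs(data)` drops non-item catalogs such as fish, photos, or reactions.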

<h4>Explore item dimensions (columns)</h4>

In [4]:
#Accessories
accessories_df = data['accessories'] #access the Accessories data from the dictionary
print(accessories_df.columns) #retrieve the column names
#print(accessories_df.head()) #access the first 5 values in each column


Index(['Name', 'Variation', 'DIY', 'Buy', 'Sell', 'Color 1', 'Color 2', 'Size',
       'Miles Price', 'Source', 'Source Notes', 'Seasonal Availability',
       'Mannequin Piece', 'Version', 'Style', 'Label Themes', 'Type',
       'Villager Equippable', 'Catalog', 'Filename', 'Internal ID',
       'Unique Entry ID'],
      dtype='object')


In [5]:
#Art
art_df = data['art'] #access the item's data from the dictionary
print(art_df.columns) #retrieve the column names
#print(art_df.head()) #access the first 5 values in each column

Index(['Name', 'Genuine', 'Category', 'Buy', 'Sell', 'Color 1', 'Color 2',
       'Size', 'Real Artwork Title', 'Artist', 'Museum Description', 'Source',
       'Source Notes', 'Version', 'HHA Concept 1', 'HHA Concept 2',
       'HHA Series', 'HHA Set', 'Interact', 'Tag', 'Speaker Type',
       'Lighting Type', 'Catalog', 'Filename', 'Internal ID',
       'Unique Entry ID'],
      dtype='object')


In [6]:
#Bags
bags_df = data['bags'] #access the item's data from the dictionary
print(bags_df.columns) #retrieve the column names
#print(bags_df.head()) #access the first 5 values in each column

Index(['Name', 'Variation', 'DIY', 'Buy', 'Sell', 'Color 1', 'Color 2', 'Size',
       'Miles Price', 'Source', 'Source Notes', 'Seasonal Availability',
       'Version', 'Style', 'Label Themes', 'Villager Equippable', 'Catalog',
       'Filename', 'Internal ID', 'Unique Entry ID'],
      dtype='object')


In [7]:
#Bottoms
bottoms_df = data['bottoms'] #access the item's data from the dictionary
print(bottoms_df.columns) #retrieve the column names
#print(bottoms_df.head()) #access the first 5 values in each column

Index(['Name', 'Variation', 'DIY', 'Buy', 'Sell', 'Color 1', 'Color 2', 'Size',
       'Source', 'Source Notes', 'Seasonal Availability', 'Mannequin Piece',
       'Version', 'Style', 'Label Themes', 'Villager Equippable', 'Catalog',
       'Filename', 'Internal ID', 'Unique Entry ID'],
      dtype='object')


In [8]:
#dress-up
dressup_df = data['dress-up'] #access the item's data from the dictionary
print(dressup_df.columns) #retrieve the column names
#print(dressup_df.head()) #access the first 5 values in each column

Index(['Name', 'Variation', 'DIY', 'Buy', 'Sell', 'Color 1', 'Color 2', 'Size',
       'Source', 'Source Notes', 'Seasonal Availability', 'Mannequin Piece',
       'Version', 'Style', 'Label Themes', 'Villager Equippable', 'Catalog',
       'Primary Shape', 'Secondary Shape', 'Filename', 'Internal ID',
       'Unique Entry ID'],
      dtype='object')


In [10]:
#headwear
headwear_df = data['headwear'] #access the item's data from the dictionary
print(headwear_df.columns) #retrieve the column names
#print(headwear_df.head()) #access the first 5 values in each column

Index(['Name', 'Variation', 'DIY', 'Buy', 'Sell', 'Color 1', 'Color 2', 'Size',
       'Miles Price', 'Source', 'Source Notes', 'Seasonal Availability',
       'Mannequin Piece', 'Version', 'Style', 'Label Themes', 'Type',
       'Villager Equippable', 'Catalog', 'Filename', 'Internal ID',
       'Unique Entry ID'],
      dtype='object')


In [11]:
#housewares
housewares_df = data['housewares'] #access the item's data from the dictionary
print(housewares_df.columns) #retrieve the column names
#print(housewares_df.head()) #access the first 5 values in each column

Index(['Name', 'Variation', 'Body Title', 'Pattern', 'Pattern Title', 'DIY',
       'Body Customize', 'Pattern Customize', 'Kit Cost', 'Buy', 'Sell',
       'Color 1', 'Color 2', 'Size', 'Miles Price', 'Source', 'Source Notes',
       'Version', 'HHA Concept 1', 'HHA Concept 2', 'HHA Series', 'HHA Set',
       'Interact', 'Tag', 'Outdoor', 'Speaker Type', 'Lighting Type',
       'Catalog', 'Filename', 'Variant ID', 'Internal ID', 'Unique Entry ID'],
      dtype='object')


In [13]:
#rugs
rugs_df = data['rugs'] #access the item's data from the dictionary
print(rugs_df.columns) #retrieve the column names
#print(rugs_df.head()) #access the first 5 values in each column

Index(['Name', 'DIY', 'Buy', 'Sell', 'Color 1', 'Color 2', 'Size',
       'Miles Price', 'Source', 'Source Notes', 'Version', 'HHA Concept 1',
       'HHA Concept 2', 'HHA Series', 'Tag', 'Catalog', 'Filename',
       'Internal ID', 'Unique Entry ID'],
      dtype='object')


In [14]:
#shoes
shoes_df = data['shoes'] #access the item's data from the dictionary
print(shoes_df.columns) #retrieve the column names
#print(shoes_df.head()) #access the first 5 values in each column

Index(['Name', 'Variation', 'DIY', 'Buy', 'Sell', 'Color 1', 'Color 2', 'Size',
       'Miles Price', 'Source', 'Source Notes', 'Seasonal Availability',
       'Mannequin Piece', 'Version', 'Style', 'Label Themes',
       'Villager Equippable', 'Catalog', 'Filename', 'Internal ID',
       'Unique Entry ID'],
      dtype='object')


In [15]:
#socks
socks_df = data['socks'] #access the item's data from the dictionary
print(socks_df.columns) #retrieve the column names
#print(socks_df.head()) #access the first 5 values in each column

Index(['Name', 'Variation', 'DIY', 'Buy', 'Sell', 'Color 1', 'Color 2', 'Size',
       'Miles Price', 'Source', 'Source Notes', 'Seasonal Availability',
       'Mannequin Piece', 'Version', 'Style', 'Label Themes',
       'Villager Equippable', 'Catalog', 'Filename', 'Internal ID',
       'Unique Entry ID'],
      dtype='object')


In [29]:
#tools
tools_df = data['tools'] #access the item's data from the dictionary
print(tools_df.columns) #retrieve the column names
#print(tools_df.head()) #access the first 5 values in each column

Index(['Name', 'Variation', 'Body Title', 'DIY', 'Customize', 'Kit Cost',
       'Uses', 'Stack Size', 'Buy', 'Sell', 'Color 1', 'Color 2', 'Size',
       'Set', 'Miles Price', 'Source', 'Source Notes', 'Version', 'Filename',
       'Variant ID', 'Internal ID', 'Unique Entry ID'],
      dtype='object')


In [16]:
#tops
tops_df = data['tops'] #access the item's data from the dictionary
print(tops_df.columns) #retrieve the column names
#print(tops_df.head()) #access the first 5 values in each column

Index(['Name', 'Variation', 'DIY', 'Buy', 'Sell', 'Color 1', 'Color 2', 'Size',
       'Miles Price', 'Source', 'Source Notes', 'Seasonal Availability',
       'Mannequin Piece', 'Version', 'Style', 'Label Themes',
       'Villager Equippable', 'Catalog', 'Filename', 'Internal ID',
       'Unique Entry ID'],
      dtype='object')


In [17]:
#umbrellas
umbrellas_df = data['umbrellas'] #access the item's data from the dictionary
print(umbrellas_df.columns) #retrieve the column names
#print(umbrellas_df.head()) #access the first 5 values in each column

Index(['Name', 'DIY', 'Buy', 'Sell', 'Color 1', 'Color 2', 'Size',
       'Miles Price', 'Source', 'Source Notes', 'Version',
       'Villager Equippable', 'Catalog', 'Filename', 'Internal ID',
       'Unique Entry ID'],
      dtype='object')
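The cells above print each catalog's columns one at a time. A minimal sketch that lines them up in one table instead, assuming each value in the dictionary is a pandas DataFrame; `column_presence` is a hypothetical helper:

```python
import pandas as pd


def column_presence(frames):
    """Return a catalog-by-column table of True/False flags for every column seen."""
    all_columns = sorted(set().union(*(df.columns for df in frames.values())))
    table = pd.DataFrame(
        {name: [col in df.columns for col in all_columns] for name, df in frames.items()},
        index=all_columns,
    )
    return table.T  # one row per catalog, one column per column name
```

Usage: `presence = column_presence({name: data[name] for name in ['accessories', 'art', 'tools']})`; `presence.columns[presence.all().to_numpy()]` then lists the columns shared by every catalog passed in.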



<h4>Identify dataframes that contain relevant dimensions</h4>

**Seasonal Availability**, **Source**, **Source Notes**, and **Catalog** are all relevant to the Rarity Analysis. 
1. **Seasonal Availability**: indicates restrictions on **when** an item is available
2. **Source**: indicates restrictions on **where** an item is available
3. **Source Notes**: indicates restrictions on **where** an item is available
4. **Catalog**: indicates restrictions on **how** an item is available


In [21]:
#Define a reusable helper, then find all dataframes containing the Seasonal Availability column
def get_dfs_with_column(data, column_name):
    """Return the names (dict keys) of the dataframes in `data` that contain `column_name`."""
    return [name for name, df in data.items() if column_name in df.columns]
#call the function
results = get_dfs_with_column(data, 'Seasonal Availability')
print('Dataframes with "Seasonal Availability" Column: ', results)

Dataframes with "Seasonal Availability" Column:  ['accessories', 'bags', 'bottoms', 'dress-up', 'headwear', 'shoes', 'socks', 'tops']


In [22]:
#Get all dataframes containing Source column (get_dfs_with_column is defined in the previous cell)
#call the function
results = get_dfs_with_column(data, 'Source')
print('Dataframes with "Source" Column: ', results)

Dataframes with "Source" Column:  ['accessories', 'art', 'bags', 'bottoms', 'construction', 'dress-up', 'fencing', 'floors', 'fossils', 'headwear', 'housewares', 'miscellaneous', 'music', 'other', 'photos', 'posters', 'reactions', 'recipes', 'rugs', 'shoes', 'socks', 'tools', 'tops', 'umbrellas', 'wall-mounted', 'wallpaper']


In [23]:
#Get all dataframes containing Source Notes column (get_dfs_with_column is defined in an earlier cell)
#call the function
results = get_dfs_with_column(data, 'Source Notes')
print('Dataframes with "Source Notes" Column: ', results)

Dataframes with "Source Notes" Column:  ['accessories', 'art', 'bags', 'bottoms', 'dress-up', 'fencing', 'floors', 'headwear', 'housewares', 'miscellaneous', 'music', 'other', 'posters', 'reactions', 'recipes', 'rugs', 'shoes', 'socks', 'tools', 'tops', 'umbrellas', 'wall-mounted', 'wallpaper']


In [24]:
#Get all dataframes containing Catalog column (get_dfs_with_column is defined in an earlier cell)
#call the function
results = get_dfs_with_column(data, 'Catalog')
print('Dataframes with "Catalog" column: ', results)

Dataframes with "Catalog" column:  ['accessories', 'art', 'bags', 'bottoms', 'dress-up', 'floors', 'fossils', 'headwear', 'housewares', 'miscellaneous', 'music', 'photos', 'posters', 'rugs', 'shoes', 'socks', 'tops', 'umbrellas', 'wall-mounted', 'wallpaper']
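The searches above list which catalogs *have* each column; for cleaning, the inverse (which item catalogs *lack* it) is often more useful. A small sketch; `missing_column_report` is a hypothetical helper:

```python
def missing_column_report(frames, columns):
    """For each column, list the dataframes in `frames` that do NOT contain it."""
    return {
        col: [name for name, df in frames.items() if col not in df.columns]
        for col in columns
    }
```

Going by the printed lists above, running it over the 13 item catalogs should flag art, housewares, rugs, tools, and umbrellas as lacking Seasonal Availability, and tools as lacking Catalog, which is why the cleaning step below has to fill in seasonality from other columns.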


**TO DO**: Combine relevant dataframes into a single rarity_df - *see clean_catalogs.py*

<h4>Cleaning To Do's</h4>

**Narrow the scope of items we're evaluating.**

1. Concatenate the item dataframes (key = Unique Entry ID).
2. Remove irrelevant columns (keep Seasonal Availability, Source, Source Notes, and Catalog).
3. Add a column that records each row's source dataframe (new column name = Item Type).
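The real cleaning code is in `clean_catalogs.py`, which is not shown here. A minimal sketch of these three steps with `pandas.concat`, assuming (as an illustration) that `Name` is kept alongside the ID and the four rarity columns; `combine_item_catalogs` and `KEEP_COLUMNS` are hypothetical names:

```python
import pandas as pd

# Assumption: keep 'Unique Entry ID' and 'Name' so rows stay identifiable.
KEEP_COLUMNS = ["Unique Entry ID", "Name", "Seasonal Availability",
                "Source", "Source Notes", "Catalog"]


def combine_item_catalogs(data, item_catalogs):
    """Stack the chosen catalogs into one DataFrame with an 'Item Type' column."""
    frames = []
    for item_type in item_catalogs:
        # reindex keeps only KEEP_COLUMNS; a column a catalog lacks
        # (e.g. 'Seasonal Availability' in art) is added as all-NaN.
        subset = data[item_type].reindex(columns=KEEP_COLUMNS)
        frames.append(subset.assign(**{"Item Type": item_type}))
    combined = pd.concat(frames, ignore_index=True)
    # Flag repeated IDs instead of silently dropping them.
    dupes = combined["Unique Entry ID"].duplicated(keep=False)
    if dupes.any():
        print(f"Warning: {dupes.sum()} rows share a Unique Entry ID")
    return combined
```

Using `reindex` rather than plain column selection means catalogs without a Seasonal Availability column still concatenate cleanly instead of raising a `KeyError`.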

**Resolve seasonal categorization errors.**

Item dataframes sometimes use the "Source" and "Source Notes" columns to indicate seasonal restrictions instead of "Seasonal Availability".

Some dataframes *did not contain* a "Seasonal Availability" column to start with.

This should be updated as follows:
1. If Source Notes = "Only available during Fall" update Seasonal Availability to say "Fall"
2. If Source Notes = "Only available during Spring" update Seasonal Availability to say "Spring"
3. If Source Notes = "Only available during Winter" update Seasonal Availability to say "Winter"
4. If Source Notes = "Only available during Summer" update Seasonal Availability to say "Summer"
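These four exact-match rules amount to a lookup table. A minimal sketch using `Series.map` with a dict, assuming the dataframe has a `Source Notes` column; `fill_season_from_notes` is a hypothetical name, not the function in `clean_catalogs.py`:

```python
import pandas as pd

# Exact Source Notes text -> season, copied from the rules above.
SEASON_NOTES = {
    "Only available during Fall": "Fall",
    "Only available during Spring": "Spring",
    "Only available during Winter": "Winter",
    "Only available during Summer": "Summer",
}


def fill_season_from_notes(df):
    """Return a copy of df with Seasonal Availability filled from exact Source Notes matches."""
    df = df.copy()
    if "Seasonal Availability" not in df.columns:
        df["Seasonal Availability"] = None  # some catalogs lack the column entirely
    # object dtype so string labels can be written even into an all-NaN column
    df["Seasonal Availability"] = df["Seasonal Availability"].astype(object)
    # Series.map with a dict returns NaN for notes that are not an exact key.
    season = df["Source Notes"].map(SEASON_NOTES)
    matched = season.notna()
    df.loc[matched, "Seasonal Availability"] = season[matched]
    return df
```

Rows whose notes do not match exactly keep their existing value, so the function is safe to run on catalogs that already have a correct Seasonal Availability.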

Some items have *holiday seasonality* vs. *meteorological seasonality*.

This should be updated as follows:
1. If **Source** or **Source Notes** contains "Festive Season", update Seasonal Availability to say "Winter"
2. If **Source** or **Source Notes** contains "Cherry-Blossom Season", update Seasonal Availability to say "Spring"
3. If **Source** or **Source Notes** contains "Maple Leaf Season", update Seasonal Availability to say "Fall"
4. If **Source** or **Source Notes** contains "Mushroom Season", update Seasonal Availability to say "Fall"
5. If **Source** or **Source Notes** contains "Wedding Season", update Seasonal Availability to say "Summer"
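Unlike the exact-match rules, these are substring checks across two columns. A minimal sketch using `Series.str.contains` with `regex=False`; matching is case-insensitive here as a design choice, and `fill_season_from_holidays` is a hypothetical name:

```python
import pandas as pd

# Holiday keyword -> meteorological season, copied from the rules above.
HOLIDAY_SEASONS = {
    "Festive Season": "Winter",
    "Cherry-Blossom Season": "Spring",
    "Maple Leaf Season": "Fall",
    "Mushroom Season": "Fall",
    "Wedding Season": "Summer",
}


def fill_season_from_holidays(df):
    """Return a copy of df with Seasonal Availability set from holiday keywords."""
    df = df.copy()
    if "Seasonal Availability" not in df.columns:
        df["Seasonal Availability"] = None
    df["Seasonal Availability"] = df["Seasonal Availability"].astype(object)
    # Join Source and Source Notes into one searchable string per row;
    # a missing column or value contributes an empty string.
    text = pd.Series("", index=df.index, dtype="string")
    for col in ("Source", "Source Notes"):
        if col in df.columns:
            text = text + " " + df[col].astype("string").fillna("")
    for keyword, season in HOLIDAY_SEASONS.items():
        # Literal, case-insensitive substring match (no regex).
        hit = text.str.contains(keyword, case=False, regex=False)
        df.loc[hit, "Seasonal Availability"] = season
    return df
```

If a row mentions more than one holiday, the keyword checked last wins; that ordering is arbitrary and worth reviewing against the data.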

*See clean_catalogs.py* for cleaning scripts.

<h4>Plan for categorizing rarity</h4>

Item availability can be restricted by **when**, **where**, and **how** the player can obtain it:

1. Seasonal Availability: indicates restrictions on **when** an item is available
2. Source: indicates restrictions on **where** an item is available
3. Source Notes: indicates restrictions on **where** an item is available
4. Catalog: indicates restrictions on **how** an item is available

**Rarity Scale**
1. Common
2. Normal - neither scarce nor common
3. Scarce
4. Rare
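Because the scale is ordered, storing it as an ordered pandas categorical lets rarity labels sort and compare in scale order rather than alphabetically. A small sketch; `RARITY_SCALE` is an illustrative name:

```python
import pandas as pd

# Ordered from most to least obtainable, matching the scale above.
RARITY_SCALE = pd.CategoricalDtype(["Common", "Normal", "Scarce", "Rare"], ordered=True)

labels = pd.Series(["Rare", "Common", "Scarce", "Normal"], dtype=RARITY_SCALE)
print(labels.sort_values().tolist())  # ['Common', 'Normal', 'Scarce', 'Rare']
print((labels >= "Scarce").tolist())  # [True, False, True, False]
```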

<h5>Rarity assignments</h5>

Think through the rarity assignments to classify items consistently based on their dimensions and how players obtain them.

In [4]:
from clean_catalogs import load_cleaned_data
print("import successful")
rarity_df = load_cleaned_data()

import successful


<h5>Define rarity rules</h5>

**Common items**
1. Can be purchased year-round.
2. Come from accessible sources (e.g., Nook's Cranny).
3. Can be crafted from DIY recipes.

**Normal items**
1. Are obtained during the game's main storyline.
2. Are easily obtained from villagers or during regular gameplay.

**Scarce items**
1. Are left up to chance.
2. Are only available during specific seasonal events.
3. Are only available when an NPC visits the island.

**Rare items**
1. Become available after the island earns a 5-star rating.
2. Require high levels of friendship.
3. Require the player to link a Nintendo Account and redeem a 16-digit code.
4. Are obtained by helping Gulliver after he gets stranded on the island.
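One way to apply these rules consistently is an ordered rule check: test the rarest tier first and return the first tier that matches, defaulting to Normal. A minimal sketch under loudly stated assumptions: the keyword tuples are placeholders (not taken from the data) that need checking against the real Source values; year-round items are assumed to be labelled "All Year" or left blank; a `DIY` column with "Yes"/"No" is assumed only if it survives cleaning. Rules such as "left up to chance" or friendship levels are not encoded. `assign_rarity` is a hypothetical name:

```python
import pandas as pd

# Placeholder keywords (assumptions): verify against actual Source / Source Notes values.
RARE_KEYWORDS = ("gulliver", "5-star")
SCARCE_KEYWORDS = ("label", "kicks", "redd")
COMMON_KEYWORDS = ("nook's cranny",)


def _text(value):
    """Normalize a cell to lowercase text; missing values become ''."""
    return "" if pd.isna(value) else str(value).lower()


def assign_rarity(row):
    """Classify one row, checking tiers from rarest to most common; first match wins."""
    source = f"{_text(row.get('Source'))} {_text(row.get('Source Notes'))}"
    season = _text(row.get("Seasonal Availability"))
    if any(k in source for k in RARE_KEYWORDS):
        return "Rare"
    # Assumption: year-round items are labelled "All Year" (or left blank).
    if season not in ("", "all year") or any(k in source for k in SCARCE_KEYWORDS):
        return "Scarce"
    # Assumption: a 'DIY' column with "Yes"/"No" values, if it was kept.
    if _text(row.get("DIY")) == "yes" or any(k in source for k in COMMON_KEYWORDS):
        return "Common"
    return "Normal"
```

Usage: `rarity_df['Rarity'] = rarity_df.apply(assign_rarity, axis=1)`. Checking the rarest tier first means an item sold at Nook's Cranny only during a seasonal event lands in Scarce rather than Common.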