The Rarity Analysis will look at the items listed in the catalogs and measure how difficult each one is to acquire.

This notebook lays the groundwork for that analysis by doing the following:
1. **Defines items** by exploring the available catalogs. For example, a "Reaction" is not an item, but a "bag" is; only items will be evaluated for rarity.
2. Defines the criteria used to **determine rarity** by exploring the columns in each catalog. Any dimension that indicates availability, restrictions, etc. could factor into the rarity calculation.
3. Determines the **rarity calculation**: based on the values within the relevant columns, which items are rare and which are common?

In [25]:
import os
import sys
import pandas as pd
os.chdir('..') #change the working directory to the parent folder
sys.path.append(os.getcwd()) #add that directory to sys.path so local modules (e.g. load_python) can be imported

<h4>Load Data</h4>

In [2]:
from load_python import load_data
data = load_data(data_folder='data')

Loaded accessories with 222 rows and 22 columns.
Loaded achievements with 84 rows and 21 columns.
Loaded art with 70 rows and 26 columns.
Loaded bags with 96 rows and 20 columns.
Loaded bottoms with 726 rows and 20 columns.
Loaded construction with 236 rows and 7 columns.
Loaded dress-up with 913 rows and 22 columns.
Loaded fencing with 19 rows and 11 columns.
Loaded fish with 80 rows and 41 columns.
Loaded floors with 176 rows and 19 columns.
Loaded fossils with 73 rows and 14 columns.
Loaded headwear with 698 rows and 22 columns.
Loaded housewares with 3275 rows and 32 columns.
Loaded insects with 80 rows and 38 columns.
Loaded miscellaneous with 1307 rows and 31 columns.
Loaded music with 98 rows and 13 columns.
Loaded other with 353 rows and 15 columns.
Loaded photos with 3128 rows and 20 columns.
Loaded posters with 452 rows and 13 columns.
Loaded reactions with 44 rows and 5 columns.
Loaded recipes with 595 rows and 24 columns.
Loaded rugs with 132 rows and 19 columns.
Loaded sho

**Items** (the catalogs that will be evaluated for rarity): **accessories**, **art**, **bags**, **dress-up**, **floors**, **headwear**, **bottoms**, **housewares**, **posters**, **socks**, **shoes**, **rugs**, **tops**, **umbrellas**, **wall-mounted**, **wallpaper**.

<h4>Explore item dimensions (columns)</h4>

In [3]:
#Accessories
accessories_df = data['accessories'] #access the Accessories data from the dictionary
print(accessories_df.columns) #retrieve the column names


Index(['Name', 'Variation', 'DIY', 'Buy', 'Sell', 'Color 1', 'Color 2', 'Size',
       'Miles Price', 'Source', 'Source Notes', 'Seasonal Availability',
       'Mannequin Piece', 'Version', 'Style', 'Label Themes', 'Type',
       'Villager Equippable', 'Catalog', 'Filename', 'Internal ID',
       'Unique Entry ID'],
      dtype='object')


In [4]:
#Art
art_df = data['art'] #access the item's data from the dictionary
print(art_df.columns) #retrieve the column names


Index(['Name', 'Genuine', 'Category', 'Buy', 'Sell', 'Color 1', 'Color 2',
       'Size', 'Real Artwork Title', 'Artist', 'Museum Description', 'Source',
       'Source Notes', 'Version', 'HHA Concept 1', 'HHA Concept 2',
       'HHA Series', 'HHA Set', 'Interact', 'Tag', 'Speaker Type',
       'Lighting Type', 'Catalog', 'Filename', 'Internal ID',
       'Unique Entry ID'],
      dtype='object')


In [5]:
#Bags
bags_df = data['bags'] #access the item's data from the dictionary
print(bags_df.columns) #retrieve the column names


Index(['Name', 'Variation', 'DIY', 'Buy', 'Sell', 'Color 1', 'Color 2', 'Size',
       'Miles Price', 'Source', 'Source Notes', 'Seasonal Availability',
       'Version', 'Style', 'Label Themes', 'Villager Equippable', 'Catalog',
       'Filename', 'Internal ID', 'Unique Entry ID'],
      dtype='object')


In [6]:
#Bottoms
bottoms_df = data['bottoms'] #access the item's data from the dictionary
print(bottoms_df.columns) #retrieve the column names


Index(['Name', 'Variation', 'DIY', 'Buy', 'Sell', 'Color 1', 'Color 2', 'Size',
       'Source', 'Source Notes', 'Seasonal Availability', 'Mannequin Piece',
       'Version', 'Style', 'Label Themes', 'Villager Equippable', 'Catalog',
       'Filename', 'Internal ID', 'Unique Entry ID'],
      dtype='object')


In [7]:
#dress-up
dressup_df = data['dress-up'] #access the item's data from the dictionary
print(dressup_df.columns) #retrieve the column names


Index(['Name', 'Variation', 'DIY', 'Buy', 'Sell', 'Color 1', 'Color 2', 'Size',
       'Source', 'Source Notes', 'Seasonal Availability', 'Mannequin Piece',
       'Version', 'Style', 'Label Themes', 'Villager Equippable', 'Catalog',
       'Primary Shape', 'Secondary Shape', 'Filename', 'Internal ID',
       'Unique Entry ID'],
      dtype='object')


In [8]:
#floors
floors_df = data['floors'] #access the item's data from the dictionary
print(floors_df.columns) #retrieve the column names


Index(['Name', 'VFX', 'DIY', 'Buy', 'Sell', 'Color 1', 'Color 2',
       'Miles Price', 'Source', 'Source Notes', 'Version', 'HHA Concept 1',
       'HHA Concept 2', 'HHA Series', 'Tag', 'Catalog', 'Filename',
       'Internal ID', 'Unique Entry ID'],
      dtype='object')


In [9]:
#headwear
headwear_df = data['headwear'] #access the item's data from the dictionary
print(headwear_df.columns) #retrieve the column names


Index(['Name', 'Variation', 'DIY', 'Buy', 'Sell', 'Color 1', 'Color 2', 'Size',
       'Miles Price', 'Source', 'Source Notes', 'Seasonal Availability',
       'Mannequin Piece', 'Version', 'Style', 'Label Themes', 'Type',
       'Villager Equippable', 'Catalog', 'Filename', 'Internal ID',
       'Unique Entry ID'],
      dtype='object')


In [10]:
#housewares
housewares_df = data['housewares'] #access the item's data from the dictionary
print(housewares_df.columns) #retrieve the column names


Index(['Name', 'Variation', 'Body Title', 'Pattern', 'Pattern Title', 'DIY',
       'Body Customize', 'Pattern Customize', 'Kit Cost', 'Buy', 'Sell',
       'Color 1', 'Color 2', 'Size', 'Miles Price', 'Source', 'Source Notes',
       'Version', 'HHA Concept 1', 'HHA Concept 2', 'HHA Series', 'HHA Set',
       'Interact', 'Tag', 'Outdoor', 'Speaker Type', 'Lighting Type',
       'Catalog', 'Filename', 'Variant ID', 'Internal ID', 'Unique Entry ID'],
      dtype='object')


In [11]:
#posters
posters_df = data['posters'] #access the item's data from the dictionary
print(posters_df.columns) #retrieve the column names


Index(['Name', 'Buy', 'Sell', 'Color 1', 'Color 2', 'Size', 'Source',
       'Source Notes', 'Version', 'Catalog', 'Filename', 'Internal ID',
       'Unique Entry ID'],
      dtype='object')


In [12]:
#rugs
rugs_df = data['rugs'] #access the item's data from the dictionary
print(rugs_df.columns) #retrieve the column names


Index(['Name', 'DIY', 'Buy', 'Sell', 'Color 1', 'Color 2', 'Size',
       'Miles Price', 'Source', 'Source Notes', 'Version', 'HHA Concept 1',
       'HHA Concept 2', 'HHA Series', 'Tag', 'Catalog', 'Filename',
       'Internal ID', 'Unique Entry ID'],
      dtype='object')


In [13]:
#shoes
shoes_df = data['shoes'] #access the item's data from the dictionary
print(shoes_df.columns) #retrieve the column names


Index(['Name', 'Variation', 'DIY', 'Buy', 'Sell', 'Color 1', 'Color 2', 'Size',
       'Miles Price', 'Source', 'Source Notes', 'Seasonal Availability',
       'Mannequin Piece', 'Version', 'Style', 'Label Themes',
       'Villager Equippable', 'Catalog', 'Filename', 'Internal ID',
       'Unique Entry ID'],
      dtype='object')


In [14]:
#socks
socks_df = data['socks'] #access the item's data from the dictionary
print(socks_df.columns) #retrieve the column names


Index(['Name', 'Variation', 'DIY', 'Buy', 'Sell', 'Color 1', 'Color 2', 'Size',
       'Miles Price', 'Source', 'Source Notes', 'Seasonal Availability',
       'Mannequin Piece', 'Version', 'Style', 'Label Themes',
       'Villager Equippable', 'Catalog', 'Filename', 'Internal ID',
       'Unique Entry ID'],
      dtype='object')


In [15]:
#tops
tops_df = data['tops'] #access the item's data from the dictionary
print(tops_df.columns) #retrieve the column names


Index(['Name', 'Variation', 'DIY', 'Buy', 'Sell', 'Color 1', 'Color 2', 'Size',
       'Miles Price', 'Source', 'Source Notes', 'Seasonal Availability',
       'Mannequin Piece', 'Version', 'Style', 'Label Themes',
       'Villager Equippable', 'Catalog', 'Filename', 'Internal ID',
       'Unique Entry ID'],
      dtype='object')


In [16]:
#umbrellas
umbrellas_df = data['umbrellas'] #access the item's data from the dictionary
print(umbrellas_df.columns) #retrieve the column names


Index(['Name', 'DIY', 'Buy', 'Sell', 'Color 1', 'Color 2', 'Size',
       'Miles Price', 'Source', 'Source Notes', 'Version',
       'Villager Equippable', 'Catalog', 'Filename', 'Internal ID',
       'Unique Entry ID'],
      dtype='object')


In [17]:
#Wall-mounted
wall_mounted_df = data['wall-mounted'] #access the item's data from the dictionary
print(wall_mounted_df.columns) #retrieve the column names


Index(['Name', 'Variation', 'Body Title', 'Pattern', 'Pattern Title', 'DIY',
       'Body Customize', 'Pattern Customize', 'Kit Cost', 'Buy', 'Sell',
       'Color 1', 'Color 2', 'Size', 'Source', 'Source Notes', 'Version',
       'HHA Concept 1', 'HHA Concept 2', 'HHA Series', 'HHA Set', 'Interact',
       'Tag', 'Outdoor', 'Lighting Type', 'Door Deco', 'Catalog', 'Filename',
       'Variant ID', 'Internal ID', 'Unique Entry ID'],
      dtype='object')


In [18]:
#Wallpaper
wallpaper_df = data['wallpaper'] #access the item's data from the dictionary
print(wallpaper_df.columns) #retrieve the column names


Index(['Name', 'VFX', 'VFX Type', 'DIY', 'Buy', 'Sell', 'Color 1', 'Color 2',
       'Miles Price', 'Source', 'Source Notes', 'Catalog', 'Window Type',
       'Window Color', 'Pane Type', 'Curtain Type', 'Curtain Color',
       'Ceiling Type', 'HHA Concept 1', 'HHA Concept 2', 'HHA Series', 'Tag',
       'Version', 'Filename', 'Internal ID', 'Unique Entry ID'],
      dtype='object')
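The sixteen near-identical cells above can be collapsed into a loop. This is a minimal sketch, assuming `data` is the dictionary returned by `load_data` (catalog name mapped to a DataFrame); `ITEM_TYPES`, `columns_by_catalog` and `shared_columns` are hypothetical names introduced here, not part of `load_python`.

```python
import pandas as pd

# Hypothetical constant: the item catalogs listed above as "items".
ITEM_TYPES = [
    "accessories", "art", "bags", "dress-up", "floors", "headwear",
    "bottoms", "housewares", "posters", "socks", "shoes", "rugs",
    "tops", "umbrellas", "wall-mounted", "wallpaper",
]


def columns_by_catalog(data, item_types=ITEM_TYPES):
    """Map each item catalog present in `data` to its list of column names."""
    return {name: list(data[name].columns) for name in item_types if name in data}


def shared_columns(data, item_types=ITEM_TYPES):
    """Return the column names that every item catalog present in `data` has in common."""
    column_sets = [set(data[name].columns) for name in item_types if name in data]
    if not column_sets:
        return []
    return sorted(set.intersection(*column_sets))


# Usage in the notebook (requires the `data` dictionary loaded earlier):
# for name, cols in columns_by_catalog(data).items():
#     print(name, cols)
# print(shared_columns(data))
```

`shared_columns` is a quick way to see which dimensions can be compared across every item catalog without special handling.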



<h4>Identify dataframes that contain relevant dimensions.</h4>

**Seasonal Availability**, **Source**, **Source Notes**, and **Catalog** are all relevant to the Rarity Analysis. 
1. **Seasonal Availability**: indicates restrictions on **when** item is available
2. **Source**: indicates restrictions on **where** item is available
3. **Source Notes**: indicates restrictions on **where** item is available
4. **Catalog**: indicates restrictions on **how** item is available

In [19]:
#Get all dataframes containing x column
#def get_dfs_with_column(data, column_name): #create the function that'll return relevant dataframes
#    items = []
#    for name, df in data.items():
#        if column_name in df.columns:
#            items.append(name)
#    return items

In [20]:
#Define a helper that returns the names of all dataframes containing a given column
def get_dfs_with_column(data, column_name):
    """Return the names of the dataframes in `data` that have a column called `column_name`."""
    items = []
    for name, df in data.items():
        if column_name in df.columns:
            items.append(name)
    return items

#Get all dataframes containing the Seasonal Availability column
results = get_dfs_with_column(data, 'Seasonal Availability')
print('Dataframes with "Seasonal Availability" Column: ', results)

Dataframes with "Seasonal Availability" Column:  ['accessories', 'bags', 'bottoms', 'dress-up', 'headwear', 'shoes', 'socks', 'tops']


In [21]:
#Get all dataframes containing the Source column (get_dfs_with_column is defined in the previous cell)
results = get_dfs_with_column(data, 'Source')
print('Dataframes with "Source" Column: ', results)

Dataframes with "Source" Column:  ['accessories', 'art', 'bags', 'bottoms', 'construction', 'dress-up', 'fencing', 'floors', 'fossils', 'headwear', 'housewares', 'miscellaneous', 'music', 'other', 'photos', 'posters', 'reactions', 'recipes', 'rugs', 'shoes', 'socks', 'tools', 'tops', 'umbrellas', 'wall-mounted', 'wallpaper']


In [22]:
#Get all dataframes containing the Source Notes column (get_dfs_with_column is defined above)
results = get_dfs_with_column(data, 'Source Notes')
print('Dataframes with "Source Notes" Column: ', results)

Dataframes with "Source Notes" Column:  ['accessories', 'art', 'bags', 'bottoms', 'dress-up', 'fencing', 'floors', 'headwear', 'housewares', 'miscellaneous', 'music', 'other', 'posters', 'reactions', 'recipes', 'rugs', 'shoes', 'socks', 'tools', 'tops', 'umbrellas', 'wall-mounted', 'wallpaper']


In [23]:
#Get all dataframes containing the Catalog column (get_dfs_with_column is defined above)
results = get_dfs_with_column(data, 'Catalog')
print('Dataframes with "Catalog" column: ', results)

Dataframes with "Catalog" column:  ['accessories', 'art', 'bags', 'bottoms', 'dress-up', 'floors', 'fossils', 'headwear', 'housewares', 'miscellaneous', 'music', 'photos', 'posters', 'rugs', 'shoes', 'socks', 'tops', 'umbrellas', 'wall-mounted', 'wallpaper']
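The four lookups above can also be combined into one presence table (one row per dataframe, one column per relevant dimension), which makes gaps such as a missing "Seasonal Availability" column easy to spot. A minimal sketch; `column_presence` and `RELEVANT_COLUMNS` are hypothetical names.

```python
import pandas as pd

# The four dimensions identified as relevant to the Rarity Analysis.
RELEVANT_COLUMNS = ["Seasonal Availability", "Source", "Source Notes", "Catalog"]


def column_presence(data, columns=RELEVANT_COLUMNS):
    """Return a boolean DataFrame with one row per dataframe in `data` and
    one column per requested column name, True where that column exists."""
    rows = {name: [col in df.columns for col in columns] for name, df in data.items()}
    return pd.DataFrame.from_dict(rows, orient="index", columns=columns)


# Usage in the notebook:
# presence = column_presence(data)
# print(presence)
# print(presence[~presence["Seasonal Availability"]].index.tolist())
```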


Combine relevant dataframes into a single rarity_df.

<h4>Explore unique values within the relevant dimension</h4>

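To explore the unique values within a relevant dimension, the values of that column can be pooled across every dataframe that has it and counted. This is a sketch assuming the `data` dictionary from `load_data`; `unique_values` is a hypothetical helper, and missing values are counted too because they matter for later cleaning.

```python
import pandas as pd


def unique_values(data, column, item_types=None):
    """Count each distinct value of `column` across the dataframes in `data`.

    Only dataframes that actually contain `column` are included. If
    `item_types` is given, only those catalogs are considered. Missing
    values (NaN/None) are counted as well.
    """
    names = item_types if item_types is not None else list(data)
    series = [data[n][column] for n in names if n in data and column in data[n].columns]
    if not series:
        return pd.Series(dtype="int64")
    return pd.concat(series, ignore_index=True).value_counts(dropna=False)


# Usage in the notebook:
# for col in ["Seasonal Availability", "Source", "Source Notes", "Catalog"]:
#     print(unique_values(data, col))
```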

<h4>Plan for calculation</h4>

Item availability can be restricted based on **when**, **where**, and **how** it can be obtained by the player.

Establish rarity scale:
1. Common (no restrictions)
2. Normal - neither scarce nor common (1-2 restrictions)
3. Scarce (3-4 restrictions)
4. Rare (5+ restrictions)
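The scale above can be written as a small lookup function. This sketch assumes each item's restrictions have already been counted into a hypothetical `Restriction Count` column; how restrictions are counted is not yet defined in this plan, and `rarity_tier` is an illustrative name.

```python
def rarity_tier(n_restrictions: int) -> str:
    """Map a count of availability restrictions to the rarity scale.

    0 -> Common, 1-2 -> Normal, 3-4 -> Scarce, 5 or more -> Rare.
    """
    if n_restrictions < 0:
        raise ValueError("restriction count cannot be negative")
    if n_restrictions == 0:
        return "Common"
    if n_restrictions <= 2:
        return "Normal"
    if n_restrictions <= 4:
        return "Scarce"
    return "Rare"


# Applied to a hypothetical "Restriction Count" column:
# rarity_df["Rarity"] = rarity_df["Restriction Count"].apply(rarity_tier)
```

For a whole column, `pd.cut(counts, bins=[-1, 0, 2, 4, float("inf")], labels=["Common", "Normal", "Scarce", "Rare"])` should give the same tiers for non-negative integer counts, since `pd.cut` bins are right-inclusive by default.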

<h4>Cleaning To-Dos</h4>

**Round 1:  *Narrow the scope of items we're evaluating.***

1. Concat the item dataframes (key value = Unique Entry ID)
2. Remove irrelevant columns (keep = Seasonal Availability, Source, Source Notes, and Catalog)
3. Add a column that indicates the source dataframe (new column name = Item Type)
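Round 1 can be sketched with `DataFrame.reindex` and `pd.concat`. This assumes the `data` dictionary from `load_data`; `build_rarity_df` and `KEEP_COLUMNS` are hypothetical names, and "Unique Entry ID" is kept alongside the four relevant dimensions because it is the key value. `reindex` fills any missing column with NaN, so catalogs without "Seasonal Availability" still stack cleanly.

```python
import pandas as pd

# Hypothetical default: the key column plus the four dimensions relevant to rarity.
KEEP_COLUMNS = ["Unique Entry ID", "Seasonal Availability", "Source", "Source Notes", "Catalog"]


def build_rarity_df(data, item_types, keep=KEEP_COLUMNS):
    """Stack the item dataframes into one frame for the rarity analysis.

    - keeps only the `keep` columns; a catalog lacking one of them gets NaN there
    - adds an "Item Type" column recording the source dataframe name
    """
    frames = []
    for item_type in item_types:
        df = data[item_type].reindex(columns=keep)  # drop other columns, add missing ones as NaN
        frames.append(df.assign(**{"Item Type": item_type}))
    return pd.concat(frames, ignore_index=True)


# Usage (item_types = the list of item catalogs defined earlier in this notebook):
# rarity_df = build_rarity_df(data, item_types)
# rarity_df["Unique Entry ID"].is_unique  # sanity check that the key really is unique
```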

**Round 2:  *Resolve seasonal categorization errors.***

Some item dataframes use "Source Notes" to indicate seasonal restrictions instead of "Seasonal Availability".

Some dataframes *did not contain* a "Seasonal Availability" column to start with.

This should be updated as follows:
1. If Source Notes = "Only available during Fall" update Seasonal Availability to say "Fall"
2. If Source Notes = "Only available during Spring" update Seasonal Availability to say "Spring"
3. If Source Notes = "Only available during Winter" update Seasonal Availability to say "Winter"
4. If Source Notes = "Only available during Summer" update Seasonal Availability to say "Summer"
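The four exact-match rules above can be sketched as a dictionary lookup on "Source Notes". This assumes the note text matches these phrases exactly, as written in the plan; `SEASON_NOTES` and `fill_season_from_notes` are hypothetical names. A matching note overrides any existing "Seasonal Availability" value, and the column is created first for dataframes that never had it.

```python
import pandas as pd

# Exact Source Notes phrases from the rules above, mapped to the season they imply.
SEASON_NOTES = {
    "Only available during Fall": "Fall",
    "Only available during Spring": "Spring",
    "Only available during Winter": "Winter",
    "Only available during Summer": "Summer",
}


def fill_season_from_notes(df):
    """Return a copy of `df` whose "Seasonal Availability" is set from
    "Source Notes" wherever the note exactly matches a SEASON_NOTES phrase.
    Rows without a matching note keep their existing value."""
    out = df.copy()
    if "Seasonal Availability" not in out.columns:
        out["Seasonal Availability"] = None  # dataframes that never had the column
    season_from_notes = out["Source Notes"].map(SEASON_NOTES)  # NaN where no exact match
    # Prefer the season implied by the note; fall back to the existing value.
    out["Seasonal Availability"] = season_from_notes.combine_first(out["Seasonal Availability"])
    return out
```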

Some items have *holiday seasonality* rather than *meteorological seasonality*.

This should be updated as follows:
1. If Source Notes contains "Festive Season" update Seasonal Availability to say "Winter"
2. If Source Notes contains "Cherry-Blossom Season" update Seasonal Availability to say "Spring"
3. If Source Notes contains "Maple Leaf Season" update Seasonal Availability to say "Fall"
4. If Source Notes contains "Mushroom Season" update Seasonal Availability to say "Fall"
5. If Source Notes contains "Wedding Season" update Seasonal Availability to say "Summer"
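Because these rules use "contains" rather than an exact match, a substring search fits better here. A minimal sketch, assuming the phrases appear verbatim (case-sensitive) in "Source Notes"; `HOLIDAY_SEASONS` and `fill_season_from_holidays` are hypothetical names. If a note contains more than one phrase, the last matching rule in the dictionary wins.

```python
import pandas as pd

# Holiday phrases from the rules above and the meteorological season each maps to.
HOLIDAY_SEASONS = {
    "Festive Season": "Winter",
    "Cherry-Blossom Season": "Spring",
    "Maple Leaf Season": "Fall",
    "Mushroom Season": "Fall",
    "Wedding Season": "Summer",
}


def fill_season_from_holidays(df):
    """Return a copy of `df` with "Seasonal Availability" set wherever
    "Source Notes" contains one of the HOLIDAY_SEASONS phrases
    (plain substring match, case-sensitive)."""
    out = df.copy()
    if "Seasonal Availability" not in out.columns:
        out["Seasonal Availability"] = None
    # Object dtype so season strings can be written into any existing column.
    out["Seasonal Availability"] = out["Seasonal Availability"].astype(object)
    # Treat missing notes as empty strings so .str.contains works on every row.
    notes = out["Source Notes"].fillna("").astype(str)
    for phrase, season in HOLIDAY_SEASONS.items():
        mask = notes.str.contains(phrase, regex=False)
        out.loc[mask, "Seasonal Availability"] = season
    return out
```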