# Thinking too big: create an "app" or browser extension that will read the materials and sustainability measurements to give a metric on how sustainable it is, while also providing some "warnings" about "hidden" potential impacts of certain clothes - e.g. synthetic has less impact up-front than many other materials but there may be an issue with microfibers being released into the environment

# NEW IDEA: Marry the company-level Good On You rating with item-level data (e.g. material, cost, etc.); this is how we separate two t-shirts that are the same color and material.

# Lifespan by Material

From "Does Use Matter? Comparison of Environmental Impacts of Clothing Based on Fiber Type"...

 - Shirts/blouses/tops (casual/everyday)
   - Cotton: 3.8 years
   - Synthetic/Man Made: 6.2 years
   - Wool and Blends: 6.0 years
   - Silk: 8.5 years
   
The UK Waste and Resources Action Programme (WRAP) has prepared a clothing longevity protocol with the aim of improving the sustainability of clothing across its life cycle. It estimates per-year use at ***25 wears for t-shirts and 16 wears for shirts.***
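As a rough illustration of how the two sources could be combined, the sketch below multiplies the average lifespan per fiber by WRAP's wears-per-year figure for t-shirts. Pairing the two numbers this way is our own assumption (neither source does it), and `lifetime_wears` is a hypothetical helper.

```python
# Average lifespans (years) for casual shirts/blouses/tops, from the list above.
LIFESPAN_YEARS = {'cotton': 3.8, 'synthetic': 6.2, 'wool': 6.0, 'silk': 8.5}

# WRAP's estimated wears per year for t-shirts.
TSHIRT_WEARS_PER_YEAR = 25


def lifetime_wears(material, wears_per_year=TSHIRT_WEARS_PER_YEAR):
    """Estimated total wears over a garment's life (assumes constant yearly use)."""
    return LIFESPAN_YEARS[material] * wears_per_year


for material in LIFESPAN_YEARS:
    print(material, lifetime_wears(material))
```

Dividing a garment's production impact by this number would give a per-wear figure, which is one way to let longer-lived fibers benefit from their lifespan.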


"For example, wool requires less energy and chemicals to be kept clean, compared to cotton. Cotton requires a more powerful wash, and often also uses energy for drying and wrinkle removal. Synthetic fabrics become dirty faster and are washed more frequently"

![https://www.mdpi.com/2071-1050/11/14/3846/htm#B28-sustainability-11-03846](figs/higg_life.png "a title")
![https://www.mdpi.com/2071-1050/11/14/3846/htm#B28-sustainability-11-03846](figs/higg_corrected.png "a title")

# Fiber content reuse (donate to charity, donate to Family/Friends, or sell)

 - "The 2012 Nielsen survey also asked where the respondents were going to dispose of garments when they stopped using them"
   - Cotton/Blends: 42%
   - Wool/Blends: 50%
   - Synthetics: 44%
   - Silk: 47%
   
  
![https://www.mdpi.com/2071-1050/11/14/3846/htm#B28-sustainability-11-03846](figs/reuse.png "a title")


## Sustainable End-of-life as (% Donation + % Sell + % "Recycle"), the shares become:

 - Cotton/Blends: 56% (+14)
 - Wool/Blends: 60% (+10)
 - Synthetics: 52% (+8)
 - Silk: 56% (+9)
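The totals above can be reproduced with a few lines of Python. This is only a sketch: it assumes the numbers in parentheses are the "recycle" shares added on top of the reuse percentages from the previous section.

```python
# Reuse shares (% donated or sold), from the fiber content reuse list.
REUSE_PCT = {'cotton/blends': 42, 'wool/blends': 50, 'synthetics': 44, 'silk': 47}

# Assumed "recycle" shares (%): the increments shown in parentheses.
RECYCLE_PCT = {'cotton/blends': 14, 'wool/blends': 10, 'synthetics': 8, 'silk': 9}

# Sustainable end-of-life = reuse + recycle, per fiber group.
sustainable_eol = {fiber: REUSE_PCT[fiber] + RECYCLE_PCT[fiber] for fiber in REUSE_PCT}

print(sustainable_eol)
```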



# Consider an LCA that includes Co-products?  This would be "consequential modeling"

Many natural fiber materials used for clothing will have "co-products", which are other useful materials that are not directly used in the production of the clothing item.  These co-products may offset the initial impact of some materials.

On the other hand, synthetics have few co-products, so most of their impact can be attributed to producing the fiber itself.

---

Cotton harvesting:
 - Lint/Fiber (the main product used to make clothes, but only 47% of the total mass)
 - Cottonseed (~53% of the total mass, not useful for clothing, but can be useful...)
   - Cottonseed oil (15–22% of seed mass)
   - Cottonseed meal (17% by mass)
     
Wool harvesting:
 - Wool Fiber (the main product)
 - Meat (can displace the impacts of red meat from animals raised only for meat rather than for high-quality fiber)
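Before going as far as a consequential model, a simple mass-based split shows how much co-products could change the picture. The sketch below is an assumption on our part: mass allocation is only one option (economic allocation and system expansion are alternatives), `allocate_by_mass` is a hypothetical helper, and the total impact of `10.0` is a made-up placeholder.

```python
def allocate_by_mass(total_impact, mass_shares):
    """Split a total impact across co-products in proportion to their mass."""
    total_mass = sum(mass_shares.values())
    return {
        product: total_impact * share / total_mass
        for product, share in mass_shares.items()
    }


# Mass shares of harvested cotton, from the list above.
cotton_shares = {'lint': 0.47, 'cottonseed': 0.53}

# Placeholder impact value (arbitrary units) for the whole harvest.
allocated = allocate_by_mass(10.0, cotton_shares)
print(allocated)
```

Under this split, less than half of the harvest impact would be charged to the fiber that ends up in clothing.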

---

# Overall, Combined Model Possibility:

**START**

***Academic/standards background***
 - 1a) Higg MSI
   - will need to be careful/transparent with our use of this.  It is a widely used tool; however, nearly every academic article I've read is HIGHLY critical of it when it comes to certain things (hopefully we can address the gaps)
 - 1b) Include garment lifetime and washing/drying impacts (for three major types: cotton, polyester, wool) - from Watson et al. [2019]
   - These two combined make up most of the cradle-to-grave rating, while being agnostic to the particular practices of a manufacturer or impacts due to their location.
   - It will include the cost of manufacturing as well as the average lifespan - might include a min and max lifespan to give potential range based on consumer's use
   - incorporate the average T-shirt lifespan into the fiber type lifespan
   
***Good On You Index***
 - 2.) Incorporate the brand-specific ratings to further specify sustainability - not just the item material but practices of that brand.  
   - This will include price if not already obtained from item level.
   - Need to take an even closer look at their rating system 
   
***Further Options***
 - 3a) Micro-plastic leaching
   - This will be a later consideration, since not all synthetics are made equal and not all of them will have this problem.
   - Also, many of the issues can be reduced by educating buyers about the "risks" and how to care for the garment properly, so that it does not release large amounts of micro-plastics into waterways and the wider environment.
 - 3b) Carbon footprint of shipping from plant to seller (most likely the US in general, or one of the coasts)
 - 3c) 3R Rates (reduce, reuse, recycle) for certain materials - from Laitala et al. [2018]
 

# OUR RANKING WILL BE A TYPE OF RANKING THAT IS BASED ON COMPENSATION

from "Evaluating alternative environmental decision support matrices":

...decision analysis and aggregation theory is “compensation” (Munda and Nardo 2009; Rowley et al. 2012). Compensation is the characteristic of a method that enables poor and good performance to make up for one another indefinitely.

The property of compensation is not flawed, per se, as it is applicable in cases where poor and good performances should offset each other, such as with economic criteria (profit and loss). However, in an environmental and/or sustainability context the practice of compensatory aggregation methods for a composite index is deemed unsuitable and instead partially compensatory/non-linear methods are recommended instead (Munda and Nardo 2009; Pollesch and Dale 2015, 2016; Munda 2016).
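To make the distinction concrete, the sketch below compares a fully compensatory aggregate (weighted arithmetic mean) with a weighted geometric mean, which is one commonly cited partially compensatory option. It is not taken from the quoted paper. The scores are hypothetical, strictly positive, and "higher is better"; Higg values are "lower is better" and would need rescaling first.

```python
import numpy as np


def weighted_arithmetic_mean(scores, weights):
    """Fully compensatory: a high score offsets a low one linearly."""
    return float(np.average(scores, weights=weights))


def weighted_geometric_mean(scores, weights):
    """Partially compensatory: low scores drag the result down more strongly.

    Requires strictly positive scores.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float) / np.sum(weights)
    return float(np.exp(np.sum(weights * np.log(scores))))


weights = [0.25, 0.25, 0.25, 0.25]
balanced = [5, 5, 5, 5]   # hypothetical item, average on every criterion
lopsided = [9, 9, 1, 1]   # hypothetical item, great on two criteria, poor on two

for name, scores in [('balanced', balanced), ('lopsided', lopsided)]:
    print(name,
          weighted_arithmetic_mean(scores, weights),
          weighted_geometric_mean(scores, weights))
```

Both items tie under the arithmetic mean, while the geometric mean ranks the lopsided item lower because its poor criteria cannot be fully bought back by its strong ones.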

In [2]:
import pandas as pd
import numpy as np
import re
from collections import defaultdict 


In [3]:
higg_df = pd.read_json('../data/HIGG_MSI_data')

#path='./data/E-Weaver_data.csv'
folder_path = '/Users/cmurph53/Documents/GitHub/DSCI591-Fall21-RecommendationSystem/data'
image_path = folder_path+'/image_data/'
item_df = pd.read_csv(folder_path + '/E-Weaver_data.csv', index_col=0)

In [4]:
#higg_df = higg_df[higg_df['MAT_FAMILY'].isin(['LEATHER','TEXTILES','SYNTHETIC LEATHER'])]
regex_pat = re.compile('(fabric|fiber)', flags = re.IGNORECASE)
higg_df.loc[:,'MATERIAL'] = higg_df['MATERIAL'].str.replace(regex_pat, '', regex=True).str.lower().str.strip()




#ONLY ONE POLYSTYRENE INSTANCE - MIGHT DROP IT BECAUSE WE DON'T HAVE IT IN THE HIGG
#(df[df['polystyrene']!=0])

# correcting names and replacing columns:

In [5]:
#appending the "polystyrene"
#higg_df = higg_df.append(higg_df.iloc[33])
higg_df.replace('Polysterene (PS) plastic', 'polystyrene', inplace=True)


equiv_names_col = {'flax':'linen', 'viscose/rayon':'rayon, cupro', 'elastane/spandex':'spandex', 'goat leather':'lambskin', 'acetate, triacetate':'triacetate'}
higg_df.insert(loc = 2, column='EQUIV_NAME', value = [equiv_names_col[i] if i in equiv_names_col else 'nan' for i in higg_df['MATERIAL']])

#replace viscose/rayon with viscose
#replace elastane/spandex with elastane
higg_df.replace({'viscose/rayon': 'viscose', 'elastane/spandex':'elastane', 'acetate, triacetate':'acetate'}, inplace=True)

# NEXT: WRITE FUNCTION TO CONNECT THE MAT NAME FROM SCRAPED DF TO THE VALUES IN HIGG - MAKING SURE THE NAME CAN BE FOUND

In [6]:
def item_to_higg(item_df, higg_df):
    """Look up the Higg MSI impact values for every material in every item.

    Returns a DataFrame indexed like item_df with two columns: 'mat_higg'
    (one array of impact values per material) and 'ratio' (the matching
    material fractions).
    """
    mat_dict = defaultdict(dict)

    index_names = ['Global Warming','Eutrophication','Water Scarcity','Resource Depletion, Fossil Fuels']
    mat_names = ['linen', 'nylon', 'cotton', 'lyocell', 'lambskin',
       'elastane', 'wool', 'viscose', 'polyamide', 'silk', 'acrylic',
       'polyester', 'modal', 'polystyrene', 'rayon', 'spandex', 'cupro',
       'hemp']

    #iterate through the items
    for item_index, row in item_df[mat_names].iterrows():
        #grab only the non-zero instances - name and the fraction
        row_i = row[row != 0]

        mat_dict[item_index] = {'mat_higg':[], 'ratio':[]}
        #iterate through each material and search the higg_index
        for mat_i, ratio_i in row_i.items():
            mat_dict[item_index]['ratio'].append(ratio_i)

            #check if name is in MATERIAL
            matches = higg_df[higg_df['MATERIAL'].str.contains(mat_i)]
            #if there is more than 1 match, prefer the textile entries (when any exist)
            textiles = matches[matches['MAT_FAMILY'] == 'TEXTILES']
            if len(matches) > 1 and len(textiles) > 0:
                matches = textiles
            #then check EQUIV_NAME names
            if len(matches) == 0:
                matches = higg_df[higg_df['EQUIV_NAME'].str.contains(mat_i)]

            #'nan' marks a material that is not in the Higg index
            tar_higg = matches[index_names].values[0] if len(matches) > 0 else 'nan'
            mat_dict[item_index]['mat_higg'].append(tar_higg)
    return pd.DataFrame(mat_dict).T

In [7]:
item_higg_df = item_to_higg(item_df, higg_df)

In [10]:
item_higg_df.iloc[63]['mat_higg']

[array([8.63, 6.72, 5.57, 9.22]), array([ 9.66,  3.29,  1.32, 12.6 ])]
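For a blended item like this one, a natural next step is to collapse the per-material arrays into a single impact vector. The sketch below assumes the blend's impact is the ratio-weighted average of its materials; the 60/40 split is hypothetical (the item's real `ratio` values are not shown above), and `blend_impacts` is an illustrative helper that does not handle the `'nan'` placeholder for unmatched materials.

```python
import numpy as np


def blend_impacts(mat_higg, ratio):
    """Ratio-weighted average of per-material impact vectors."""
    return np.average(np.vstack(mat_higg), axis=0, weights=ratio)


# The two impact arrays shown in the output above.
mat_higg = [np.array([8.63, 6.72, 5.57, 9.22]),
            np.array([9.66, 3.29, 1.32, 12.6])]

# Hypothetical material fractions for the two materials.
ratio = [0.6, 0.4]

print(blend_impacts(mat_higg, ratio))
```

The result keeps the column order of `index_names`: Global Warming, Eutrophication, Water Scarcity, Resource Depletion (Fossil Fuels).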

## Users input their ranking of concerns to weight each of the metrics

In [11]:
user_concern_preference = {'global_warming': 1, 'ocean': 4, 'water':3, 'resource_depletion':2}

In [12]:
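def normal_user_pref(user_pref):
    """Convert the user's concern ranking (1 = most important) into weights that sum to 1.

    The values of user_pref must be in the same order as df_cols.
    """
    #columns
    df_cols = ['Global Warming', 'Eutrophication', 'Water Scarcity', 'Resource Depletion, Fossil Fuels']

    #convert ranking to fractions (inverse rank, rounded to one decimal)
    vals = 1/np.array(list(user_pref.values()))
    vals = np.round(vals, decimals=1)
    normal_vals = vals/sum(vals)

    return dict(zip(df_cols, normal_vals))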

In [None]:
norms = normal_user_pref(user_concern_preference)
#multiply each impact column by its weight (aligned on column name)
pref_weighted_textiles = higg_df[list(norms)]*pd.Series(norms)
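The weighting step can be tried on a small stand-in table. This is a sketch under stated assumptions: `rank_to_weights` is a hypothetical variant of `normal_user_pref` that skips the rounding and keys the ranks directly by column name, and `toy_higg` is a made-up two-row replacement for `higg_df`.

```python
import pandas as pd


def rank_to_weights(ranks):
    """Map {column: rank} (1 = most important) to weights that sum to 1."""
    inverse = {name: 1.0 / rank for name, rank in ranks.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


# Ranks keyed directly by the Higg column they apply to.
ranks = {
    'Global Warming': 1,
    'Eutrophication': 4,
    'Water Scarcity': 3,
    'Resource Depletion, Fossil Fuels': 2,
}
weights = pd.Series(rank_to_weights(ranks))

# Toy stand-in for higg_df: two hypothetical materials.
toy_higg = pd.DataFrame(
    [[8.63, 6.72, 5.57, 9.22], [9.66, 3.29, 1.32, 12.6]],
    index=['material_a', 'material_b'],
    columns=list(ranks),
)

# DataFrame * Series aligns the Series index with the column names.
weighted = toy_higg * weights
scores = weighted.sum(axis=1)
print(scores)
```

Keying the weights by column name means the result does not depend on the order in which the user's preferences were entered.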

In [31]:
# normalization values (note the inverse units):
# 1.00E+00 kg^-1 CO2 eq
# 1.00E+03 kg^-1 PO eq
# 3.31E+01 (m3)^-1
# 7.59E-02 MJ^-1

norm_values = dict(zip(['global_warming', 'ocean', 'water', 'resource_depletion'], [1, 1000, 33.1, 0.0759]))

In [32]:
norm_values

{'global_warming': 1,
 'ocean': 1000,
 'water': 33.1,
 'resource_depletion': 0.0759}

In [34]:
tst_item = item_higg_df.iloc[0]['mat_higg'][0]

#NR: normalization reference values
NR = np.array(list(norm_values.values()))

#NF: normalization factors (the inverse of the normalization reference values)
NF = 1/NR

#single score: sum of the normalized impacts
higg_tst = np.sum(tst_item*NF)

higg_tst

98.6849680188195
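The same calculation can be wrapped in a small function. This is a minimal sketch that reuses the four values from `norm_values` as reference values and the impact vector `[8.8, 17.6, 57.1, 6.69]` that this notebook prints for the first item; `normalized_score` is a hypothetical name.

```python
import numpy as np

# Reference values in column order: Global Warming, Eutrophication,
# Water Scarcity, Resource Depletion (Fossil Fuels).
NORMALIZATION_REFERENCES = np.array([1, 1000, 33.1, 0.0759])


def normalized_score(impacts, references=NORMALIZATION_REFERENCES):
    """Divide each impact by its reference value and sum the results."""
    impacts = np.asarray(impacts, dtype=float)
    return float(np.sum(impacts / references))


score = normalized_score([8.8, 17.6, 57.1, 6.69])
print(score)
```

Note that this plain sum is fully compensatory, and that the resource depletion term dominates the total because its reference value is so small.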

In [None]:
#multiply by the values in norm_values directly (instead of their inverse)
tst_item*np.array(list(norm_values.values()))

array([8.80000e+00, 1.76000e+04, 1.89001e+03, 5.07771e-01])

In [20]:
tst_item

array([ 8.8 , 17.6 , 57.1 ,  6.69])