<a href="https://colab.research.google.com/github/afortuny/SustainableFashionAI/blob/main/CircularityAnalysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Analyzing product reviews to understand circularity

We will leverage the data from https://www.trailrunningreview.com/, a leading company in product analysis, and we will evaluate each trail running shoe from SS22 along the following dimensions:

Circularity:

*   Durability: Is the product made to last?
*   Versatility: Can the product be used in multiple conditions or situations?
*   Sustainable materials: Is the product made with organic, recyclable or vegan materials?

Desirability:

*   Function: Is the product's construction appropriate for its purpose?
*   Innovation: Is the product disrupting the market in some sense?
*   Price: Is the product affordable?



# Understanding the analytical problem at hand

Our dataset contains long product reviews from which we should be able to extract all of the aspects above, with the exception of price, which is already part of the metadata. For price, our plan is simply to cluster products by the similarity of their full reviews and calculate each product's deviation from the average price of its cluster. For the other features we will use unsupervised aspect-based sentiment analysis. To do that, we need to follow these steps:



1.   Use a pretrained model in the language of the corpus, in our case Spanish.
2.   Detect the aspects in the text and map them to our key dimensions: durability, versatility, sustainability, functionality and innovation.
3.   Extract the parts of the text related to each aspect.
4.   Perform sentiment analysis on the aspect-related chunks.
5.   Provide a score per aspect based on the intensity of the sentiment.
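
Steps 3 to 5 can be illustrated with a minimal, dependency-free sketch. The seed keywords, the tiny sentiment lexicon, the sample text, and the helper names `split_sentences`, `aspect_chunks` and `aspect_score` are all hypothetical and exist only for illustration; the actual workflow would rely on a pretrained Spanish model rather than hand-written word lists.

```python
import re

# Hypothetical seed words per aspect (an assumption, not taken from the dataset).
ASPECT_SEEDS = {
    "durabilidad": {"durabilidad", "resistente", "desgaste"},
    "polivalencia": {"polivalente", "versátil", "terrenos"},
}

# Hypothetical toy sentiment lexicon: word -> polarity in [-1, 1].
SENTIMENT = {"excelente": 1.0, "buena": 0.5, "mala": -0.5, "pésima": -1.0}


def split_sentences(text):
    """Split after sentence-ending punctuation (a crude stand-in for a real splitter)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and return its word tokens."""
    return re.findall(r"\w+", sentence.lower())


def aspect_chunks(text, aspect):
    """Step 3: keep only the sentences that mention a seed word of the aspect."""
    seeds = ASPECT_SEEDS[aspect]
    return [s for s in split_sentences(text) if seeds & set(tokenize(s))]


def aspect_score(text, aspect):
    """Steps 4-5: average the polarity of lexicon words found in the aspect's chunks."""
    hits = [
        SENTIMENT[word]
        for sentence in aspect_chunks(text, aspect)
        for word in tokenize(sentence)
        if word in SENTIMENT
    ]
    return sum(hits) / len(hits) if hits else None


# Made-up sample text, not a real review.
sample = "La durabilidad es excelente. El precio es alto. Es polivalente pero la tracción es mala."
scores = {aspect: aspect_score(sample, aspect) for aspect in ASPECT_SEEDS}
print(scores)
```

Matching whole sentences against seed words is the simplest possible chunking rule; a sentence that mentions two aspects contributes its sentiment to both, which a real aspect-based model would try to avoid.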

We will try this workflow on a single review to validate our process before we do the large-scale data parsing and fine-tune the language model for the domain we are working on.
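
The price comparison described at the start of this section could look like the sketch below. The prices are those of the three sample products displayed further down in this notebook; the `cluster` labels are hypothetical placeholders for the output of a review-similarity clustering step that is not implemented here, and the column names `price_eur` and `price_deviation` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Model": ["MAFATE SPEED 4", "AGILITY PEAK 4 GORE-TEX", "REACT PEGASUS TRAIL 4"],
    "Price": ["175,00 €", "140,00 €", "129,99 €"],
    "cluster": [0, 0, 1],  # hypothetical cluster labels (assumption)
})

# Convert strings such as "175,00 €" to floats; assumes no thousands separator.
df["price_eur"] = (
    df["Price"]
    .str.replace("€", "", regex=False)
    .str.strip()
    .str.replace(",", ".", regex=False)
    .astype(float)
)

# Deviation of each product from the mean price of its own cluster.
cluster_mean = df.groupby("cluster")["price_eur"].transform("mean")
df["price_deviation"] = df["price_eur"] - cluster_mean

print(df[["Model", "cluster", "price_eur", "price_deviation"]])
```

A positive deviation marks a product that is more expensive than comparable shoes in its cluster, and a negative one marks a cheaper product.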







# Testing the workflow on a single review

In [3]:
import chardet

# Read the raw bytes and let chardet guess the file encoding.
path = '/content/drive/MyDrive/Sustainability Fashion AI/SampleReview.csv'
with open(path, 'rb') as f:
    rawdata = f.read()
result = chardet.detect(rawdata)
charenc = result['encoding']
print(charenc)

Windows-1252
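
To see why the detected encoding matters, the sketch below encodes a Spanish word as Windows-1252 and then decodes the bytes with two different codecs. The word appears in one of the sample reviews; the variable names are illustrative.

```python
word = "Podríamos"
raw = word.encode("cp1252")  # cp1252 is Python's codec name for Windows-1252

# Decoding with the matching codec round-trips the text.
decoded_ok = raw.decode("cp1252")

# Decoding the same bytes as UTF-8 fails: the byte for "í" (0xED) opens a
# multi-byte UTF-8 sequence that the following bytes do not complete.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

print(decoded_ok, utf8_failed)
```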


In [148]:
import pandas as pd

# Load the sample reviews using the encoding detected above.
review = pd.read_csv('/content/drive/MyDrive/Sustainability Fashion AI/SampleReview.csv', encoding='Windows-1252')

# Add one empty column per aspect; the scores are filled in later.
review['durabilidad'] = ''
review['funcionalidad'] = ''
review['innovacion'] = ''
review['polivalencia'] = ''
review['sostenibilidad'] = ''

In [8]:
review_txt = review['Review'].astype(str)

## Detect the list of potential aspects and map them to our key terms based on similarity

In [None]:
!python -m spacy download es_core_news_sm

In [19]:
import spacy
import pandas as pd
nlp = spacy.load("es_core_news_sm")

In [114]:
aspects_p = nlp("durabilidad sostenibilidad polivalencia funcionalidad innovacion")
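
spaCy's `similarity` methods default to the cosine similarity between vectors. The sketch below reproduces that computation with NumPy on made-up three-dimensional vectors; the vectors and the helper `cosine_similarity` are illustrative and do not come from `es_core_news_sm`.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; returns 0.0 if either vector is all zeros."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


aspect_vec = [1.0, 0.0, 1.0]       # made-up vector for an aspect word
same_direction = [2.0, 0.0, 2.0]   # points the same way, different length
orthogonal = [0.0, 3.0, 0.0]       # shares no component with aspect_vec

print(cosine_similarity(aspect_vec, same_direction))  # close to 1
print(cosine_similarity(aspect_vec, orthogonal))      # 0.0
```

Note that the small `_sm` pipelines do not ship static word vectors, so the similarity values they return tend to be less reliable than those of the `_md` or `_lg` pipelines.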

In [149]:
review

Unnamed: 0,Model,Brand,Weight,Price,Drop,Runner weight,Use,Terrain,Review,durabilidad,funcionalidad,innovacion,polivalencia,sostenibilidad
0,MAFATE SPEED 4,HOKA,241 / 295 g,"175,00 €",4 mm,Medio,Polivalente,Larga distancia,Sin duda la considerable media suela de las Ho...,,,,,
1,AGILITY PEAK 4 GORE-TEX,MERRELL,264 / 320 g,"140,00 €",6 mm,Pesado,Polivalente,Larga distancia,"Ahora que en breve viene el frío y mal tiempo,...",,,,,
2,REACT PEGASUS TRAIL 4,NIKE,291 g,"129,99 €",9 mm,Ligero,Compacto,Larga distancia,Podríamos decir que estas Nike React Pegasus T...,,,,,


In [166]:
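aspect_cols = ['durabilidad', 'funcionalidad', 'innovacion', 'polivalencia', 'sostenibilidad']

# Only the first two reviews are processed while validating the workflow.
# To score every review, use: for i in range(len(review.index)):
for i in range(0, 2):
  # Parse the review text with the Spanish pipeline.
  review_p = nlp(review.loc[i, "Review"])
  # Compare every named entity found in the review with every aspect word.
  scores = [(aspect.text, ent.text, aspect.similarity(ent)) for ent in review_p.ents for aspect in aspects_p]
  df = pd.DataFrame(scores, columns=['aspect', 'term', 'similarity'])
  # Keep one row per (aspect, term) pair so repeated entities are counted once.
  df = df.drop_duplicates(subset=['aspect', 'term'], keep='last').reset_index(drop=True)
  df_results = df.groupby('aspect').agg({'similarity': ['median', 'min', 'max']})
  df_results.columns = ["median", "min", "max"]
  # The median similarity is used as the score of each aspect.
  df_results['score'] = df_results['median']
  # Write the scores in a fixed column order.
  review.loc[i, aspect_cols] = df_results['score'].reindex(aspect_cols).values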
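
The aggregation inside the loop can be checked on a small hand-made table. The terms and similarity values below are invented for illustration; only the de-duplication and the per-aspect median mirror the cell above.

```python
import pandas as pd

# Invented (aspect, term, similarity) triples, including one repeated pair.
scores = [
    ("durabilidad", "Hoka", 0.10),
    ("durabilidad", "Vibram", 0.30),
    ("durabilidad", "Hoka", 0.10),   # same entity mentioned twice in the review
    ("innovacion", "Hoka", 0.20),
    ("innovacion", "Vibram", 0.60),
    ("innovacion", "Gore-Tex", 0.40),
]
df = pd.DataFrame(scores, columns=["aspect", "term", "similarity"])

# One row per (aspect, term) pair, so repeated mentions do not weigh more.
df = df.drop_duplicates(subset=["aspect", "term"], keep="last").reset_index(drop=True)

# Median, minimum and maximum similarity per aspect.
summary = df.groupby("aspect")["similarity"].agg(["median", "min", "max"])
print(summary)
```

Using the median rather than the mean keeps a single unusually similar or dissimilar entity from dominating the score of an aspect.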



In [167]:
review

Unnamed: 0,Model,Brand,Weight,Price,Drop,Runner weight,Use,Terrain,Review,durabilidad,funcionalidad,innovacion,polivalencia,sostenibilidad
0,MAFATE SPEED 4,HOKA,241 / 295 g,"175,00 €",4 mm,Medio,Polivalente,Larga distancia,Sin duda la considerable media suela de las Ho...,0.08158,0.200202,0.053097,0.087416,0.063019
1,AGILITY PEAK 4 GORE-TEX,MERRELL,264 / 320 g,"140,00 €",6 mm,Pesado,Polivalente,Larga distancia,"Ahora que en breve viene el frío y mal tiempo,...",0.134401,0.317105,0.354117,0.190218,0.135602
2,REACT PEGASUS TRAIL 4,NIKE,291 g,"129,99 €",9 mm,Ligero,Compacto,Larga distancia,Podríamos decir que estas Nike React Pegasus T...,0.533297,0.922819,1.0,0.633271,0.558242


In [168]:
review.to_csv("export.csv")
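
By default `to_csv` also writes the row index, which comes back as an `Unnamed: 0` column when the file is read again. The sketch below shows the difference on a one-row frame, using in-memory buffers instead of real files; the buffer and variable names are illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame({"Model": ["MAFATE SPEED 4"], "Brand": ["HOKA"]})

# Default behaviour: the index is written as an unnamed first column.
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
reloaded = pd.read_csv(with_index)

# index=False leaves the index out of the file.
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
clean = pd.read_csv(without_index)

print(list(reloaded.columns))
print(list(clean.columns))
```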