
## EDA: Reviews Analysis

This notebook focuses on understanding the structure of the **reviews** table and its relevance to customer repurchase behavior.

### Key Observations

* The **nominal grain** of the reviews table is one row per **review_id**, but `review_id` is not unique: 814 rows repeat an earlier `review_id`. A plausible explanation, not verified in this notebook, is that a single review is attached to more than one order.

* `order_id` is not unique either: 551 rows repeat an earlier `order_id`. This is expected, as a single order can receive **multiple reviews** over time.

* Review-related signals, such as review frequency and sentiment proxies, are relevant behavioral inputs when analyzing **repeat purchase likelihood**, because the customer experience captured in reviews can influence whether a customer buys again.
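
Because the table is not unique on `order_id`, review signals usually need to be collapsed to one row per order before they can be joined to other tables. The following is a minimal sketch of that step on a toy DataFrame (illustrative values, not rows from the dataset). The feature names `review_count`, `mean_review_score`, and `has_comment` are hypothetical, and treating `review_score` as a sentiment proxy is an assumption.

```python
import pandas as pd

# Toy stand-in for the reviews table (illustrative values, not real data).
toy = pd.DataFrame({
    "review_id": ["r1", "r2", "r3", "r4"],
    "order_id": ["o1", "o1", "o2", "o3"],
    "review_score": [5, 3, 4, 1],
    "review_comment_message": ["great", None, None, "late delivery"],
})

# Collapse to one row per order so the features can be joined to other tables.
# Feature names are hypothetical; review_score is assumed to proxy sentiment.
order_review_features = (
    toy.groupby("order_id")
    .agg(
        review_count=("review_id", "nunique"),
        mean_review_score=("review_score", "mean"),
        has_comment=("review_comment_message", lambda s: s.notna().any()),
    )
    .reset_index()
)

print(order_review_features)
```

Averaging scores is only one option; keeping the most recent or the lowest score per order would be equally defensible, depending on the question being asked.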


In [None]:
import pandas as pd

# Load the raw reviews table from the source data folder.
reviews = pd.read_csv("../Source Data/olist_order_reviews_dataset.csv")

In [3]:
reviews.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99224 entries, 0 to 99223
Data columns (total 7 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   review_id                99224 non-null  object
 1   order_id                 99224 non-null  object
 2   review_score             99224 non-null  int64 
 3   review_comment_title     11568 non-null  object
 4   review_comment_message   40977 non-null  object
 5   review_creation_date     99224 non-null  object
 6   review_answer_timestamp  99224 non-null  object
dtypes: int64(1), object(6)
memory usage: 5.3+ MB
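
The `info()` output shows two things worth following up: both date columns are loaded as `object` rather than datetimes, and the comment columns are sparse (11568 and 40977 non-null values out of 99224 rows, roughly 12% and 41%). A minimal sketch of how both could be handled, using a toy DataFrame with made-up values in place of the real table:

```python
import pandas as pd

# Toy stand-in with the same column names as the reviews table (made-up values).
toy = pd.DataFrame({
    "review_comment_title": [None, "Recomendo", None, None],
    "review_comment_message": ["ok", None, None, "bom"],
    "review_creation_date": [
        "2018-03-06 00:00:00", "2018-03-20 00:00:00",
        "2018-03-29 00:00:00", "2017-08-25 00:00:00",
    ],
    "review_answer_timestamp": [
        "2018-03-06 19:50:32", "2018-03-21 02:28:23",
        "2018-03-30 00:29:09", "2017-08-29 21:45:57",
    ],
})

# Parse the date columns so they can be sorted and compared as timestamps.
for col in ["review_creation_date", "review_answer_timestamp"]:
    toy[col] = pd.to_datetime(toy[col])

# Share of rows with a non-null value in each comment column.
comment_fill_rate = (
    toy[["review_comment_title", "review_comment_message"]].notna().mean()
)

print(toy.dtypes)
print(comment_fill_rate)
```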


In [13]:
reviews['review_id'].duplicated().sum()

814

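An order can have multiple reviews, and therefore multiple review IDs. That would show up as repeated `order_id` values rather than repeated `review_id` values, so `order_id` is checked next.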

In [5]:
reviews['order_id'].duplicated().sum()

551
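
The two counts above describe different situations: a repeated `review_id` means one review appears on several rows, while a repeated `order_id` means one order has several rows. One way to separate them, sketched here on a toy DataFrame with made-up IDs, is to count distinct values of one key per value of the other. The variable names are hypothetical.

```python
import pandas as pd

# Toy stand-in: review r1 is attached to two orders, order o3 has two reviews.
toy = pd.DataFrame({
    "review_id": ["r1", "r1", "r2", "r3", "r4"],
    "order_id": ["o1", "o2", "o3", "o3", "o4"],
})

# How many distinct orders does each review point to, and vice versa?
orders_per_review = toy.groupby("review_id")["order_id"].nunique()
reviews_per_order = toy.groupby("order_id")["review_id"].nunique()

# Reviews shared across orders, and orders with more than one review.
shared_reviews = orders_per_review[orders_per_review > 1]
orders_with_many_reviews = reviews_per_order[reviews_per_order > 1]

# Rows that repeat the full (review_id, order_id) pair would be true duplicates.
exact_duplicate_rows = toy.duplicated(subset=["review_id", "order_id"]).sum()

print(shared_reviews)
print(orders_with_many_reviews)
print(exact_duplicate_rows)
```

If `exact_duplicate_rows` is zero on the real table, the pair (`review_id`, `order_id`) would be a candidate for the actual grain.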

In [11]:
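# Count the number of review rows per order.
multi_review_orders = (
    reviews.groupby("order_id")
    .size()
    .reset_index(name="review_count")
)

# Keep only orders with more than two reviews (three or more).
multi_review_orders = multi_review_orders[multi_review_orders["review_count"] > 2]

multi_review_orders.head()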


| | order_id | review_count |
|---|---|---|
| 1455 | 03c939fd7fd3b38f8485a0f95798f1f6 | 3 |
| 54489 | 8e17072ec97ce29f0e1f111e598b0c85 | 3 |
| 77319 | c88b1d1b157a9999ce368f218a407141 | 3 |
| 86232 | df56136b8031ecd28e200bb18e6ddb2e | 3 |


In [8]:
reviews.loc[reviews["order_id"].isin(multi_review_orders["order_id"])].sort_values("order_id")

| | review_id | order_id | review_score | review_comment_title | review_comment_message | review_creation_date | review_answer_timestamp |
|---|---|---|---|---|---|---|---|
| 25612 | 89a02c45c340aeeb1354a24e7d4b2c1e | 0035246a40f520710769010f752e7507 | 5 | | | 2017-08-29 00:00:00 | 2017-08-30 01:59:12 |
| 22423 | 2a74b0559eb58fc1ff842ecc999594cb | 0035246a40f520710769010f752e7507 | 5 | | Estou acostumada a comprar produtos pelo barat... | 2017-08-25 00:00:00 | 2017-08-29 21:45:57 |
| 22779 | ab30810c29da5da8045216f0f62652a2 | 013056cfe49763c6f66bda03396c5ee3 | 5 | | | 2018-02-22 00:00:00 | 2018-02-23 12:12:30 |
| 68633 | 73413b847f63e02bc752b364f6d05ee9 | 013056cfe49763c6f66bda03396c5ee3 | 4 | | | 2018-03-04 00:00:00 | 2018-03-05 17:02:00 |
| 854 | 830636803620cdf8b6ffaf1b2f6e92b2 | 0176a6846bcb3b0d3aa3116a9a768597 | 5 | | | 2017-12-30 00:00:00 | 2018-01-02 10:54:06 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 27465 | 5e78482ee783451be6026e5cf0c72de1 | ff763b73e473d03c321bcd5a053316e8 | 3 | | Não sei que haverá acontecido os demais chegaram | 2017-11-18 00:00:00 | 2017-11-18 09:02:48 |
| 41355 | 39de8ad3a1a494fc68cc2d5382f052f4 | ff850ba359507b996e8b2fbb26df8d03 | 5 | | Envio rapido... Produto 100% | 2017-08-16 00:00:00 | 2017-08-17 11:56:55 |
| 18783 | 80f25f32c00540d49d57796fb6658535 | ff850ba359507b996e8b2fbb26df8d03 | 5 | | Envio rapido, produto conforme descrito no anu... | 2017-08-22 00:00:00 | 2017-08-25 11:40:22 |
| 92230 | 870d856a4873d3a67252b0c51d79b950 | ffaabba06c9d293a3c614e0515ddbabc | 3 | | | 2017-12-20 00:00:00 | 2017-12-20 18:50:16 |


In [12]:
reviews.loc[reviews["order_id"] == "03c939fd7fd3b38f8485a0f95798f1f6"]

| | review_id | order_id | review_score | review_comment_title | review_comment_message | review_creation_date | review_answer_timestamp |
|---|---|---|---|---|---|---|---|
| 8273 | b04ed893318da5b863e878cd3d0511df | 03c939fd7fd3b38f8485a0f95798f1f6 | 3 | | Um ponto negativo que achei foi a cobrança de ... | 2018-03-20 00:00:00 | 2018-03-21 02:28:23 |
| 51527 | f4bb9d6dd4fb6dcc2298f0e7b17b8e1e | 03c939fd7fd3b38f8485a0f95798f1f6 | 4 | | | 2018-03-29 00:00:00 | 2018-03-30 00:29:09 |
| 69438 | 405eb2ea45e1dbe2662541ae5b47e2aa | 03c939fd7fd3b38f8485a0f95798f1f6 | 3 | | Seria ótimo se tivesem entregue os 3 (três) pe... | 2018-03-06 00:00:00 | 2018-03-06 19:50:32 |
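
This order has three reviews created on different dates with different scores, so any order-level analysis has to decide which one counts. One possible rule, shown here only as a sketch and not as the approach this notebook adopts, is to keep the most recent review per order. The toy DataFrame uses made-up IDs, and the name `latest_review` is hypothetical.

```python
import pandas as pd

# Toy stand-in: order o1 has three reviews on different dates, o2 has one.
toy = pd.DataFrame({
    "review_id": ["r1", "r2", "r3", "r4"],
    "order_id": ["o1", "o1", "o1", "o2"],
    "review_score": [3, 3, 4, 5],
    "review_creation_date": ["2018-03-06", "2018-03-20", "2018-03-29", "2017-08-25"],
})
toy["review_creation_date"] = pd.to_datetime(toy["review_creation_date"])

# Assumed rule: sort by creation date and keep the last (newest) row per order.
latest_review = (
    toy.sort_values("review_creation_date")
    .drop_duplicates(subset="order_id", keep="last")
    .reset_index(drop=True)
)

print(latest_review)
```

Keeping the latest review discards earlier scores; if the earlier experience matters for repurchase behavior, an aggregate such as the minimum or mean score may be a better fit.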
