---

Milestone - 3

- Name    = Divani Rafitya
- Batch   = FTDS_BSD_006
- [Dataset](https://www.kaggle.com/datasets/iamsouravbanerjee/customer-shopping-trends-dataset/data)

---

**Business Understanding:**
To ensure a business's sustainability and growth, it is essential to understand customer shopping preferences and behavior. However, these aspects remain difficult to comprehend because consumer interactions are complex and diverse. Gathering insight into customer behavior is therefore crucial for a business to understand its customer base. This includes segmenting customers based on their shopping preferences, analyzing purchase patterns and frequency, assessing customer review ratings to improve satisfaction, and optimizing promotional offers and discounts through customer segmentation.

As a consultant, my primary objective is therefore to understand customer preferences and trends. This involves analyzing consumer behavior and purchasing patterns to improve marketing strategies, ultimately leading to increased profitability.

To structure the analysis, the 5W1H approach (Who, What, When, Where, Why, How) is used to frame questions about consumer behavior and purchasing patterns:
- How do overall sales vary across each category based on gender?
- Who are our primary customers based on location?
- What are the most frequently used payment methods?
- Is offering discounts effective in encouraging customers to make purchases?
- Are customers satisfied with their shopping experience based on review ratings? 
- What can be improved to increase customer satisfaction?
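The first three questions reduce to simple aggregations over the cleaned data. A minimal pandas sketch, assuming the column names shown in the `validator.head()` preview of the cleaned CSV; the function name `answer_questions` is hypothetical. Because `customer_id` is unique, each row counts as one customer.

```python
import pandas as pd

def answer_questions(df: pd.DataFrame) -> dict:
    """Hypothetical helper: aggregations for the first three questions."""
    return {
        # How do sales vary across categories by gender? Total USD per cell.
        "sales_by_category_gender": df.pivot_table(
            index="category", columns="gender",
            values="purchase_amount_usd", aggfunc="sum", fill_value=0,
        ),
        # Who are the primary customers by location? Rows (customers) per location.
        "customers_by_location": df["location"].value_counts(),
        # Which payment methods are used most often?
        "payment_method_counts": df["payment_method"].value_counts(),
    }
```

Usage: `answer_questions(pd.read_csv('P2M3_divani_rafitya_data_clean.csv'))`. `value_counts()` sorts in descending order, so the first entries are the top locations and payment methods.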

**Conclusion:**
In conclusion, understanding customer shopping preferences is crucial for sustaining the business and achieving growth. Several key points emerge from the analysis:
- Male customers show higher purchasing power than female customers. However, the large number of dress purchases by male customers either raises questions about data accuracy or suggests that men may be buying dresses as gifts.
- Montana, California, and Idaho are the locations with the largest customer base.
- Customers tend to prefer PayPal and Credit Card payments, likely because of the convenience and security they offer.
- Subscription-based marketing strategies effectively drive customer purchases, as evidenced by 100% of subscribers using the available promo codes.
- The distribution of review ratings indicates a significant number of dissatisfied customers, particularly for clothing products, where the items received do not meet customer expectations.
- Review ratings remain poor, especially in the male clothing category.
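The subscription and rating findings can be re-checked with a short pandas sketch. It assumes the `subscription_status`, `promo_code_used`, `gender`, `category` and `review_rating` columns of the cleaned CSV, with "Yes"/"No" strings as in the preview; `check_conclusions` is a hypothetical helper.

```python
import pandas as pd

def check_conclusions(df: pd.DataFrame) -> dict:
    """Hypothetical helper: promo-code usage by subscription, ratings by segment."""
    # Share of customers with promo_code_used == "Yes", per subscription status
    promo_rate = df["promo_code_used"].eq("Yes").groupby(df["subscription_status"]).mean()
    # Mean review rating for each category (rows) and gender (columns)
    mean_rating = df.pivot_table(
        index="category", columns="gender", values="review_rating", aggfunc="mean"
    )
    return {"promo_rate_by_subscription": promo_rate, "mean_rating": mean_rating}
```

A `promo_rate_by_subscription` value of 1.0 for "Yes" corresponds to the 100% figure above, and the `mean_rating` table shows which gender and category cells score lowest.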

**Recommendation:**
There are several recommendations for the business:
- Conduct periodic customer satisfaction surveys to evaluate the shopping experience and gain insights for further improvements.
- Consider expanding the business by opening new warehouse branches in Montana and California, where the primary customer base is located, to improve shipping efficiency and customer satisfaction.
- Meet customer expectations, especially in the clothing category, by improving product quality and implementing a return policy under applicable terms and conditions.
- Enhance promotional offers by introducing limited-time subscription discounts for non-subscribing customers.

**Data Validation using Great Expectations:**

In [1]:
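# install libraries (pinned below 1.0: the code below uses the 0.x API, e.g. `context.sources`, which was removed in 1.0)
! pip install -q "great-expectations<1.0"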

In [2]:
# import libraries
from great_expectations.data_context import FileDataContext

1. Initialize the Data Context

In [3]:
# create a data context
context = FileDataContext.create(project_root_dir='./')

2. Connect to a Datasource and Data Asset

In [4]:
# give the datasource a name (it must be unique among Datasources)
datasource_name = 'shoppingtrends'
datasource = context.sources.add_pandas(datasource_name)

# give a name to a data asset
asset_name = 'trends'
path_to_data = 'P2M3_divani_rafitya_data_clean.csv'
asset = datasource.add_csv_asset(asset_name, filepath_or_buffer=path_to_data)

# build batch request
batch_request = asset.build_batch_request()

3. Create Expectation Suite

In [5]:
# create an expectation suite
expectation_suite_name = 'expectation-shoppingtrends-dataset'
context.add_or_update_expectation_suite(expectation_suite_name)

# create a validator using above expectation suite
validator = context.get_validator(
    batch_request = batch_request,
    expectation_suite_name = expectation_suite_name)

# check the validator
validator.head()

| Unnamed: 0.1 | Unnamed: 0 | customer_id | age | gender | item_purchased | category | purchase_amount_usd | location | size | color | season | review_rating | subscription_status | shipping_type | discount_applied | promo_code_used | previous_purchases | payment_method | frequency_of_purchases |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1 | 55 | Male | Blouse | Clothing | 53 | Kentucky | L | Gray | Winter | 3.1 | Yes | Express | Yes | Yes | 14 | Venmo | Fortnightly |
| 1 | 1 | 2 | 19 | Male | Sweater | Clothing | 64 | Maine | L | Maroon | Winter | 3.1 | Yes | Express | Yes | Yes | 2 | Cash | Fortnightly |
| 2 | 2 | 3 | 50 | Male | Jeans | Clothing | 73 | Massachusetts | S | Maroon | Spring | 3.1 | Yes | Free Shipping | Yes | Yes | 23 | Credit Card | Weekly |
| 3 | 3 | 4 | 21 | Male | Sandals | Footwear | 90 | Rhode Island | M | Maroon | Spring | 3.5 | Yes | Next Day Air | Yes | Yes | 49 | PayPal | Weekly |
| 4 | 4 | 5 | 45 | Male | Blouse | Clothing | 49 | Oregon | M | Turquoise | Spring | 2.7 | Yes | Free Shipping | Yes | Yes | 31 | PayPal | Annually |
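The two `Unnamed` columns in the preview look like leftover index columns: pandas names a header-less column `Unnamed: N` when reading a CSV that was saved with `to_csv()` and its default `index=True`. A minimal sketch for dropping them, assuming no real column name starts with "Unnamed"; `drop_unnamed_columns` is a hypothetical helper. Writing the cleaned file with `index=False` would avoid these columns in the first place.

```python
import io
import pandas as pd

def drop_unnamed_columns(df: pd.DataFrame) -> pd.DataFrame:
    # Keep only columns whose name does not start with "Unnamed"
    return df.loc[:, ~df.columns.str.startswith("Unnamed")]

# Round trip that reproduces the artifact: to_csv() writes the index by default
original = pd.DataFrame({"customer_id": [1, 2], "age": [55, 19]})
reloaded = pd.read_csv(io.StringIO(original.to_csv()))
print(list(reloaded.columns))                        # ['Unnamed: 0', 'customer_id', 'age']
print(list(drop_unnamed_columns(reloaded).columns))  # ['customer_id', 'age']
```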


4. Expectations

In [6]:
# expectation 1: column `customer_id` must be unique
## each ID must identify exactly one customer, so no two rows may share a value
validator.expect_column_values_to_be_unique(column='customer_id')

{
  "success": true,
  "result": {
    "element_count": 3900,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
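The uniqueness rule can also be cross-checked in plain pandas, outside Great Expectations. This sketch counts every row whose `customer_id` occurs more than once; treating that count as equivalent to GE's `unexpected_count` is an assumption, and `count_duplicate_ids` is a hypothetical helper.

```python
import pandas as pd

def count_duplicate_ids(ids: pd.Series) -> int:
    # keep=False marks every occurrence of a repeated value, not only the later ones
    return int(ids.dropna().duplicated(keep=False).sum())

print(count_duplicate_ids(pd.Series([1, 2, 2, 3])))  # 2
```

On the cleaned CSV's `customer_id` column this would be expected to return 0, in line with the result above.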

In [7]:
# expectation 2: column `age` must be between 10 and 100
## ages must fall in a reasonable range for a customer able to make transactions
validator.expect_column_values_to_be_between(column='age', min_value=10, max_value=100)

{
  "success": true,
  "result": {
    "element_count": 3900,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
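A pandas cross-check of the age range. It assumes inclusive bounds (ages 10 and 100 both pass) and that missing ages are skipped rather than counted as failures; `ages_out_of_range` is a hypothetical helper.

```python
import pandas as pd

def ages_out_of_range(ages: pd.Series, min_value: int = 10, max_value: int = 100) -> pd.Series:
    # Series.between() is inclusive on both ends by default
    ages = ages.dropna()
    return ages[~ages.between(min_value, max_value)]

print(ages_out_of_range(pd.Series([9, 10, 55, 100, 101])).tolist())  # [9, 101]
```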

In [8]:
# expectation 3: column `payment_method` must be in the set of known payment methods
## payment methods: PayPal, Credit Card, Cash, Debit Card, Venmo, Bank Transfer
validator.expect_column_values_to_be_in_set(
    column='payment_method',
    value_set=['PayPal', 'Credit Card', 'Cash', 'Debit Card', 'Venmo', 'Bank Transfer'])

{
  "success": true,
  "result": {
    "element_count": 3900,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
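The set-membership rule in plain pandas. This sketch reports how often each value outside the allowed set appears, which is handy for spotting typos after cleaning; `unexpected_payment_methods` is a hypothetical helper, and missing values are assumed to be handled separately.

```python
import pandas as pd

ALLOWED_PAYMENT_METHODS = {"PayPal", "Credit Card", "Cash", "Debit Card", "Venmo", "Bank Transfer"}

def unexpected_payment_methods(methods: pd.Series) -> pd.Series:
    # Count occurrences of every non-missing value that is not in the allowed set
    methods = methods.dropna()
    return methods[~methods.isin(ALLOWED_PAYMENT_METHODS)].value_counts()
```

An empty result means every payment method is valid, matching `"success": true` above.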

In [9]:
# expectation 4: column `review_rating` must be of a numeric type
## ratings must be stored as integers or floats, not strings
validator.expect_column_values_to_be_in_type_list(column='review_rating', type_list=['int', 'float'])

{
  "success": true,
  "result": {
    "observed_value": "float64"
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
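For comparison, the type rule in plain pandas: when a CSV column contains any non-numeric text, pandas loads it with `object` dtype, so a dtype check catches string ratings. A sketch using `pandas.api.types`; `is_numeric_rating` is a hypothetical helper.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

def is_numeric_rating(ratings: pd.Series) -> bool:
    # True for int or float dtypes (e.g. the float64 observed above), False for object
    return is_integer_dtype(ratings) or is_float_dtype(ratings)

print(is_numeric_rating(pd.Series([3.1, 2.7])))       # True
print(is_numeric_rating(pd.Series(["3.1", "good"])))  # False
```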

In [10]:
# expectation 5: column `category` must match a regular expression that allows only 4 categories
## categories: Clothing, Accessories, Footwear, Outerwear
validator.expect_column_values_to_match_regex(column='category', regex='^(Clothing|Accessories|Footwear|Outerwear)$')

{
  "success": true,
  "result": {
    "element_count": 3900,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
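The regex rule reproduced in pandas. The match is case-sensitive and the `^...$` anchors reject stray whitespace, so values such as "clothing" or "Footwear " (with a trailing space) would be flagged; `non_matching_categories` is a hypothetical helper.

```python
import pandas as pd

CATEGORY_PATTERN = r"^(Clothing|Accessories|Footwear|Outerwear)$"

def non_matching_categories(categories: pd.Series) -> pd.Series:
    # str.match anchors at the start; the trailing $ in the pattern anchors the end
    categories = categories.dropna().astype(str)
    return categories[~categories.str.match(CATEGORY_PATTERN)]

print(non_matching_categories(pd.Series(["Clothing", "clothing", "Footwear ", "Outerwear"])).tolist())
# ['clothing', 'Footwear ']
```

With anchored alternation, this pattern accepts exactly the same four values as a set-membership check on those names would; the regex form is a stylistic choice.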

In [11]:
# expectation 6: column `purchase_amount_usd` must be of integer type
validator.expect_column_values_to_be_of_type(column='purchase_amount_usd', type_='int')

{
  "success": true,
  "result": {
    "observed_value": "int64"
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
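One reason this dtype check is useful: when pandas reads a CSV, a single missing value in an otherwise whole-number column turns the whole column into `float64`, so an observed `int64` also suggests that the column has no gaps. The sketch shows this with two tiny made-up inline CSV strings (not rows from the dataset); `load_amounts` is a hypothetical helper.

```python
import io
import pandas as pd
from pandas.api.types import is_integer_dtype

def load_amounts(csv_text: str) -> pd.Series:
    # read_csv infers int64 only when every value in the column is a whole number
    return pd.read_csv(io.StringIO(csv_text))["purchase_amount_usd"]

complete = load_amounts("customer_id,purchase_amount_usd\n1,53\n2,64\n")
with_gap = load_amounts("customer_id,purchase_amount_usd\n1,53\n2,\n")

print(complete.dtype, is_integer_dtype(complete))  # int64 True
print(with_gap.dtype, is_integer_dtype(with_gap))  # float64 False
```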

In [12]:
# expectation 7: values in column `location` must be between 1 and 15 characters long
validator.expect_column_value_lengths_to_be_between(column='location', min_value=1, max_value=15)

{
  "success": true,
  "result": {
    "element_count": 3900,
    "unexpected_count": 0,
    "unexpected_percent": 0.0,
    "partial_unexpected_list": [],
    "missing_count": 0,
    "missing_percent": 0.0,
    "unexpected_percent_total": 0.0,
    "unexpected_percent_nonmissing": 0.0
  },
  "meta": {},
  "exception_info": {
    "raised_exception": false,
    "exception_traceback": null,
    "exception_message": null
  }
}
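A pandas version of the length rule. If `location` holds US state names, as the preview suggests, a 15-character limit admits all 50 of them (the longest, North Carolina and South Carolina, have 14 characters), while an empty string or a longer name such as "District of Columbia" (20 characters) would be flagged. `location_length_violations` is a hypothetical helper.

```python
import pandas as pd

def location_length_violations(locations: pd.Series, min_len: int = 1, max_len: int = 15) -> pd.Series:
    # str.len() counts characters; values whose length falls outside [min_len, max_len] are returned
    locations = locations.dropna().astype(str)
    lengths = locations.str.len()
    return locations[~lengths.between(min_len, max_len)]

print(location_length_violations(pd.Series(["Kentucky", "", "District of Columbia", "Rhode Island"])).tolist())
# ['', 'District of Columbia']
```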