# Introduction
Welcome to the analysis of the Brazilian e-commerce dataset from Olist Store. This project tests several hypotheses to determine whether patterns observed in the data are statistically significant and likely to hold for the broader population. By examining various aspects of the dataset, we hope to uncover insights that can benefit sellers on the e-commerce platform.

## Hypotheses

1. Influence of Payment Methods on Purchase Frequency:<br>
<br>
    - We hypothesize that certain payment methods are preferred by customers and lead to higher purchase frequency. Sellers can offer and promote these preferred payment methods to increase sales.<br>
<br>
2. Effect of Product Reviews on Sales:<br>
<br>
    - We hypothesize that products with higher review ratings have higher sales. Sellers can focus on improving product quality and encouraging satisfied customers to leave positive reviews to boost sales.<br>
<br>
3. Customer Loyalty and Repurchase Rates:<br>
<br>
   - We hypothesize that customers who leave positive reviews are more likely to make repeat purchases. Sellers can implement loyalty programs and follow-up strategies to encourage repeat business from satisfied customers.


# About the dataset

The dataset is available on Kaggle: [Brazilian e-commerce dataset by Olist](https://www.kaggle.com/datasets/olistbr/brazilian-ecommerce).

This dataset contains public information on orders made at Olist Store, a Brazilian e-commerce platform. It includes details of 100,000 orders from 2016 to 2018 across various marketplaces in Brazil.

While this dataset offers a comprehensive view of sales during that period, it does not cover the entire population of all possible sales data. Therefore, it is considered a **sample dataset**.

**Hypothesis testing** is essentially a method to determine if the observations from a sample can be generalized to the broader population. It helps to assess whether the patterns or effects seen in the sample data are likely to be true for the entire population or if they could have occurred by random chance.
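
To make the "could this have occurred by random chance" idea concrete, here is a minimal sketch of a permutation test for a difference in means. It is not necessarily the test used later in this analysis. The function name `permutation_test` and both toy samples are hypothetical and are not drawn from the Olist data.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the estimated p-value: the share of random relabelings whose
    mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign observations to the two groups
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical toy samples: number of orders per customer in two groups.
orders_group_a = [1, 2, 1, 3, 2, 2, 1, 4, 2, 3]
orders_group_b = [1, 1, 2, 1, 1, 2, 1, 1, 1, 2]

p_value = permutation_test(orders_group_a, orders_group_b)
print(f"p-value: {p_value:.4f}")
```

A small p-value suggests the observed difference would be unusual if group membership had no effect; a large one means the sample gives little evidence of a real difference.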


  
### Accessing the data
The dataset consists of 8 CSV files. I imported these files into [DB Browser for SQLite](https://sqlitebrowser.org/) to query the data needed for each step, and uploaded the files to my GitHub repository for use in Jupyter notebooks.

Please refer to the data schema below:

<div style="text-align: center;">
  <img src="https://i.imgur.com/HRhd2Y0.png" alt="Schema" width="630" height="520" style="margin-left: 20px;">
</div>
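
The CSV-to-SQLite import described in this section was done through the DB Browser GUI, but the same step could also be scripted. The sketch below uses only Python's standard library; the helper `load_csv` and the two-row `SAMPLE_CSV` excerpt are hypothetical stand-ins, and in practice you would open the real CSV file instead.

```python
import csv
import io
import sqlite3

# Hypothetical two-row excerpt standing in for olist_customers_dataset.csv.
SAMPLE_CSV = """customer_id,customer_unique_id,customer_zip_code_prefix,customer_city,customer_state
c1,u1,01151,sao paulo,SP
c2,u2,14409,franca,SP
"""


def load_csv(conn, table, csv_file):
    """Create `table` from a CSV file object, storing every column as TEXT."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # TEXT columns keep values exactly as written, including leading zeros.
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()


conn = sqlite3.connect(":memory:")  # pass a file path instead to persist the database
load_csv(conn, "olist_customers_dataset", io.StringIO(SAMPLE_CSV))

rows = conn.execute(
    "SELECT customer_zip_code_prefix, customer_city "
    "FROM olist_customers_dataset ORDER BY customer_id"
).fetchall()
print(rows)
```

Storing every column as TEXT is a deliberate simplification here; numeric columns can be cast in later queries as needed.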


## For our first hypothesis:
Influence of Payment Methods on Purchase Frequency:

- We hypothesize that certain payment methods are preferred by customers and lead to higher purchase frequency. Sellers can offer and promote these preferred payment methods to increase sales.

#### Data required:
##### from the olist_order_payments_dataset table:
- order_id - unique identifier of an order.<br>
- payment_type - method of payment chosen by the customer.<br>
##### from the olist_customers_dataset table:
- customer_id - key to the orders dataset. Each order has a unique customer_id.<br>
- customer_unique_id - unique identifier of a customer.<br>
- customer_zip_code_prefix - first five digits of customer zip code<br>
- customer_city - customer city name<br>
- customer_state - customer state<br>
##### from the olist_orders_dataset table:
- customer_id - key to the customer dataset. Each order has a unique customer_id.<br>
- order_id - unique identifier of the order.
  
```sql
-- Join each customer to their orders and to each order's payment method.
SELECT c.customer_unique_id, c.customer_id, o.order_id, p.payment_type,
       c.customer_zip_code_prefix, c.customer_city, c.customer_state
  FROM olist_customers_dataset AS c
  LEFT JOIN olist_orders_dataset AS o
    ON c.customer_id = o.customer_id
  LEFT JOIN olist_order_payments_dataset AS p
    ON p.order_id = o.order_id
 LIMIT 5;
```

**Result:**


customer_unique_id | customer_id | order_id | payment_type | customer_zip_code_prefix | customer_city | customer_state
:--:|:--:|:--:|:--:|:--:|:--:|:--:
861eff4711a542e4b93843c6dd7febb0 | 06b8999e2fba1a1fbc88172c00ba8bc7 | 00e7ee1b050b8499577073aeb2a297a1 | credit_card | 14409 | franca | SP
290c77bc529b7ac935b93aa66c333dc3 | 18955e83d337fd6b2def6b18a428ac77 | 29150127e6685892b6eab3eec79f59c7 | credit_card | 9790 | sao bernardo do campo | SP
060e732b5b29e8181a18229c7b0b2b5e | 4e7b3e00288586ebd08712fdd0374a03 | b2059ed67ce144a36e2aa97d2c9e9ad2 | credit_card | 1151 | sao paulo | SP
259dac757896d24d7702b9acbbff3f3c | b2b6027bc5c5109e529d4dc6358b12c3 | 951670f92359f4fe4a63112aa7306eba | credit_card | 8775 | mogi das cruzes | SP
345ecd01c38d18a9036ed96c73b8d066 | 4f2d8ab171c80ec8364f7c12e35b23ad | 6b7d50bd145f6fc7f33cebabd7e49d0f | credit_card | 13056 | campinas | SP
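
As one possible next step toward the hypothesis, the same join can be aggregated to compare order counts per payment type. The sketch below runs against an in-memory SQLite database with a handful of hypothetical rows invented for illustration; only the table and column names come from the query above, and `COUNT(DISTINCT ...)` is used in case an order appears on more than one payment row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE olist_customers_dataset (customer_id TEXT, customer_unique_id TEXT);
CREATE TABLE olist_orders_dataset (order_id TEXT, customer_id TEXT);
CREATE TABLE olist_order_payments_dataset (order_id TEXT, payment_type TEXT);
-- Hypothetical rows, invented for illustration only.
INSERT INTO olist_customers_dataset VALUES ('c1', 'u1'), ('c2', 'u1'), ('c3', 'u2');
INSERT INTO olist_orders_dataset VALUES ('o1', 'c1'), ('o2', 'c2'), ('o3', 'c3');
INSERT INTO olist_order_payments_dataset VALUES
  ('o1', 'credit_card'), ('o2', 'credit_card'), ('o3', 'boleto');
""")

query = """
SELECT p.payment_type,
       COUNT(DISTINCT o.order_id)           AS n_orders,
       COUNT(DISTINCT c.customer_unique_id) AS n_customers
  FROM olist_customers_dataset AS c
  LEFT JOIN olist_orders_dataset AS o
    ON c.customer_id = o.customer_id
  LEFT JOIN olist_order_payments_dataset AS p
    ON p.order_id = o.order_id
 GROUP BY p.payment_type
 ORDER BY p.payment_type;
"""

summary = conn.execute(query).fetchall()
for payment_type, n_orders, n_customers in summary:
    # Orders per unique customer is one simple measure of purchase frequency.
    print(payment_type, n_orders, n_customers, n_orders / n_customers)
```

In this toy data, customer `u1` placed two orders under two different `customer_id` values, which is why grouping by `customer_unique_id` rather than `customer_id` matters when measuring repeat purchases.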

