# Columns review

## BaSalam.reviews.csv

| Column Name          | Description                                      | Data Type | Example Value | Notes |
|----------------------|--------------------------------------------------|-----------|---------------|-------|
| `_id`                | Unique identifier for the review                 | Str       | 661ba8636a6e1c5d7e70ef18| 24-character hex string, so it cannot be stored as Int |
| `productId`          | Product ID                                       | Int       | 7122085       | Links to products.csv     |
| `star`               | Rating (1-5)                                     | Int       | 1             | 78% of values are 5     |
| `user_id`            | User ID                                          | Int       | 651823        | Most frequent value is 4     |
| `isPost`             | Indicates if the data is a post                  | Bool      | False         | All False, because they are comments     |
| `isPublic`           | Indicates whether the review is public or private| Bool      | True          | 98% is True     |
| `Id`                 | Another unique identifier for the review         | Int       | 1870799       | Possibly redundant with `_id` |
| `createdAt`          | Timestamp when the review was created            | Datetime  | 2018-10-17T12:21:26 | Between 2018-09-22 and 2024-04-09    |
| `updatedAt`          | Timestamp when the review was last updated       | Datetime  | 2018-10-17T12:21:26| -     |
| `hashId`             | Hashed version of the ID                         | Str       | eLmjEG             | -     |
| `isPosted`           | Unclear; possibly the same as `isPost`           | Bool      | True          | Some rows are True; 7% is null |
| `isLikedByCurrentUser`| Indicates if the review is liked by the current user       | Bool      | True          | -     |
| `isDislikedByCurrentUser`| Indicates if the review is disliked by the current user | Bool      | False         | -     |
| `likeCount`           | Number of users who liked the review                       | Int       | 221           | 75% is 0     |
| `dislikeCount`        | Number of users who disliked the review                    | Int       | 0             | all 0     |
| `attachments`         | Photos and videos attached to the review                   | Dict      | {'photos': [], 'video': None}              | 97% are empty     |
| `history_count`       | Unclear; grouped under rich review content on Kaggle       | Int       | 1             | Meaning not confirmed      |
| `name_of_user`        | User name                                                  | Str       | دوست باسلامی  | 40% is null     |
| `hash_id_of_user`     | Hashed user ID                                             | Str       | Ye0Ky         | -     |
| `photo_of_user`       | User photo                                                 | Dict(urls)| {'LARGE': '', 'MEDIUM': '', 'SMALL': '', 'EXTRA_SMALL': ''}      | download link - 96% is null     |
| `description`         | The text or content of the review                          | Str       | عالی       | 41% is null     |
| `reason_ids[0]` to `reason_ids[7]` | Detailed reasons for the review (per the Kaggle description) | Float     | 155.0    | Meaning not confirmed; 93% to 99% is null     |
| `variation_metadata`  | Unclear; Kaggle groups it with the detailed reasons for reviews | Dict | { } | Meaning not confirmed; 95% is null |
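
The percentages in the Notes column can be recomputed with a small profiling helper. The sketch below is one possible way to do it with pandas, not the method actually used for this table; `profile_columns` and the toy data are hypothetical and only illustrate the idea.

```python
import pandas as pd


def profile_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize each column: dtype, % null, most frequent value and its share."""
    rows = []
    for col in df.columns:
        series = df[col]
        # Stringify non-null values so unhashable cells (dicts, lists) can be counted.
        counts = series.dropna().astype(str).value_counts()
        has_values = len(counts) > 0
        rows.append(
            {
                "column": col,
                "dtype": str(series.dtype),
                "pct_null": round(100 * series.isna().mean(), 1),
                "top_value": counts.index[0] if has_values else None,
                # Share of ALL rows (nulls included) taken by the top value.
                "pct_top": round(100 * counts.iloc[0] / len(series), 1) if has_values else 0.0,
            }
        )
    return pd.DataFrame(rows).set_index("column")


# Toy stand-in for BaSalam.reviews.csv (values are made up for illustration).
toy_reviews = pd.DataFrame(
    {
        "star": [5, 5, 5, 4, 1],
        "isPublic": [True, True, True, True, False],
        "description": ["عالی", None, "خوب", None, None],
    }
)
profile = profile_columns(toy_reviews)
print(profile)
```

On the real file, the same helper could be applied to `pd.read_csv("BaSalam.reviews.csv")`, assuming the file fits in memory.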

## BaSalam.products.csv

| Column Name            | Description                                              | Data Type | Example Value | Notes                         |
|------------------------|---------------------------------------------------------|------|------------|------------------------------|
| `_id`                  | ID of each product                             | Int64   | 13310448        | -       |
| `_score`               | Score assigned by the site (listed under basic identifiers and metrics); not unique | float64 | 83.333336       | 1500 unique values   |
| `sales_count_week`     | Sales performance (weekly sales count)         | int64   | 0      | 96% is 0 |
| `name`                 | Product name                                   | Object (str) | ریش تراش برقی پرو مکس    | Contains duplicate names; less than 1% is nan  |
| `price`                | Product price                                  | float64 | 1000      | less than 1% is nan  |
| `status_id`            | Availability status ID                         | float64 | 2976.0    | 99% is 2976.0      |
| `status_title`         | Availability status title                      | Object (str) | دردسترس | less than 1% is nan|
| `stock`                | Stock                                          | float64 | 1.0          | less than 1% is nan |
| `photo_SMALL`          | Product photo                                  | url     | link | less than 1% is nan |
| `photo_MEDIUM`         | Product photo                                  | url     | link | less than 1% is nan |
| `rating_average`       | Average customer rating for the product (0 to 5) | float64 | 3.6  | less than 1% is nan |
| `rating_count`         | Number of ratings for the product              | float64 | 2.0  | less than 1% is nan |
| `rating_signals`       | Same as `rating_count`                         | float64 | 3    | less than 1% is nan |
| `primaryPrice`         | Primary price of the product                   | float64 | 100.0| less than 1% is nan |
| `preparationDays`      | Number of days needed to prepare the product   | float64 | 30   | less than 1% is nan |
| `weight`               | Product weight                                 | float64 | 3    | less than 1% is nan |
| `categoryId`           | Category ID (old)                              | float64 | 78.0 | less than 10% is nan; 80% is 0 |
| `categoryTitle`        | Category title (new)                           | str     | ارده | less than 1% is nan |
| `new_categoryId`       | Category ID (new)                              | float64 | 35   | less than 1% is nan |
| `has_delivery`         | Whether the product has delivery               | bool    | True | 60% is True |
| `has_variation`        | Unclear; presumably whether the product has variations | bool    | False| 95% is False |
| `navigation_id`        | Navigation-related ID                          | float64 |4003.0| 98% is nan |
| `vendor_name`          | Name of the product's vendor                   | str     | تیپاکس| less than 1% is nan |
| `vendor_identifier`    | Vendor identifier                              | str     | vendor_identifier   | less than 1% is nan |
| `vendor_statusId`      | Vendor status ID                               | float64 | 2988.0             | less than 1% is nan; integer-valued but stored as float |
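| `vendor_freeShippingToIran` | Whether the vendor offers free shipping to Iran       | Float   | 20000000.0 | 30% is nan; the numeric values suggest a threshold amount rather than a flag |
| `vendor_freeShippingToSameCity` | Whether the vendor offers free shipping to the same city | Float   | 6200000.0  | 30% is nan; the numeric values suggest a threshold amount rather than a flag |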
| `vendor_cityId`        | Vendor city ID                           | float64 | 2052.0     | less than 1% is nan |
| `vendor_provinceId`    | Vendor province ID                       | float64 | 32.0      | 32 provinces; less than 1% is nan |
| `vendor_has_delivery`  | Whether the vendor offers delivery services | Bool    | true      | less than 1% is nan |
| `vendor_score`         | Vendor score                             | 0/1     | 1    | 50% is nan |
| `vendor_id`            | Vendor ID                                | Float   | 19    | less than 1% is nan |
| `vendor_status_id`     | Vendor status ID                         | Float   | 2987.0 | less than 1% is nan |
| `vendor_status_title`  | Vendor status title                      | Str     | فعال  | less than 1% is nan; 99% is فعال (active) |
| `vendor_owner_city`    | City where the vendor is located         | Str    | یزد   | less than 1% is nan |
| `vendor_owner_id`      | Unique ID of the vendor's owner          | float64     | 11  | less than 1% is nan; different from `vendor_id` |
| `isFreeShipping`       | Whether the product has free shipping (minimum basket size) | bool  | 3 | less than 1% is nan; the example value is not a boolean |
| `IsAvailable`          | Whether the product is available (in stock) | Bool    | true  | less than 1% is nan |
| `IsSaleable`           | Whether the product can be sold          | Bool    | false | less than 1% is nan |
| `mainAttribute`        | Main attribute of the product (listed under additional attributes and miscellaneous details) | str     | گرم    | 91% is nan |
| `published`            | Whether the product is published   | bool    | 2341306    | 97% is nan; 99% is True; the example value is not a boolean |
| `video_ORIGINAL`       | Product video URL | url     | link    | 93% is nan   |
| `promotions`           | Discount information                | dict    | dict    | 99% is nan |
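
Many ID columns above are `float64` only because they contain `nan`. A minimal sketch of one way to handle this is to cast them to pandas' nullable `Int64` dtype. The helper `to_nullable_int` and the `ID_COLUMNS` selection are hypothetical choices for illustration, not part of the dataset or of any existing pipeline.

```python
import pandas as pd

# Hypothetical selection of ID-like columns taken from the table above.
ID_COLUMNS = [
    "status_id",
    "new_categoryId",
    "vendor_cityId",
    "vendor_provinceId",
    "vendor_id",
    "vendor_owner_id",
]


def to_nullable_int(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a copy where the listed float columns use the nullable Int64 dtype.

    Missing values become <NA>. The cast raises if a column holds
    non-integer floats such as 2.5, which is a useful sanity check for IDs.
    """
    out = df.copy()
    for col in columns:
        if col in out.columns:
            out[col] = out[col].astype("Int64")
    return out


# Toy stand-in for BaSalam.products.csv (values are made up for illustration).
toy_products = pd.DataFrame(
    {
        "_id": [1, 2, 3],
        "vendor_id": [19.0, None, 7.0],
        "vendor_provinceId": [32.0, 8.0, None],
        "price": [1000.0, 2500.0, None],
    }
)
clean = to_nullable_int(toy_products, ID_COLUMNS)
print(clean.dtypes)
```

Columns that are not listed, such as `price`, keep their original dtype, and the input frame is left unchanged.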

# Exploration

- ***There are reviews that are positive, but have a low star rating and vice versa.***
- ***1,684,121 of the 2,411,358 products (70%) have no reviews.***
- ***Of the products that do have reviews, approximately 647,351 (90%) have fewer than 10 reviews.***
![Reviews per product](img/reviews_per_products.png)
- ***There are many distinct category titles.***
- ***There is a big difference between the per-category average when it is computed (1) over only the products that have reviews and (2) over all products.***
- ***1,634,181 products (68%) have 0 ratings.***
- ***Some products have a high number of ratings but no reviews.***
- ***90% of products with a non-zero rating count have fewer than 10 ratings.***
- ***Top five provinces by number of vendors: 1. Tehran (23,529), 2. Esfahan (8,851), 3. Razavi Khorasan (8,446), 4. Qom (5,002), 5. Fars (4,149); the other provinces have fewer.***
- ***The 100 best and worst vendors have the same results.***
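
The review-coverage figures above could be reproduced along the following lines. This is a sketch only, and it assumes that `productId` in the reviews file refers to `_id` in the products file; the function name `review_coverage` and the toy data are hypothetical.

```python
import pandas as pd


def review_coverage(products: pd.DataFrame, reviews: pd.DataFrame, threshold: int = 10) -> dict:
    """Count products without reviews and reviewed products below a threshold."""
    # Assumption: reviews["productId"] refers to products["_id"].
    per_product = reviews.groupby("productId").size()
    # Products that never appear in the reviews get a count of 0.
    counts = products["_id"].map(per_product).fillna(0).astype(int)

    n_products = len(products)
    n_without = int((counts == 0).sum())
    reviewed = counts[counts > 0]
    n_few = int((reviewed < threshold).sum())
    return {
        "n_products": n_products,
        "n_without_reviews": n_without,
        "pct_without_reviews": round(100 * n_without / n_products, 1) if n_products else 0.0,
        "n_reviewed_below_threshold": n_few,
        "pct_reviewed_below_threshold": round(100 * n_few / len(reviewed), 1) if len(reviewed) else 0.0,
    }


# Toy data: product 1 has 12 reviews, product 2 has 2, product 3 has 1,
# and products 4 and 5 have none.
toy_products = pd.DataFrame({"_id": [1, 2, 3, 4, 5]})
toy_reviews = pd.DataFrame({"productId": [1] * 12 + [2, 2, 3]})
coverage = review_coverage(toy_products, toy_reviews)
print(coverage)
```

The same per-product counts could also feed the histogram of reviews per product shown in the image above.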