# Lazada Product Review Data In Brief
----
## Description
The first step of data exploration: understanding the data and exploring the macro-characteristics of the data we have to deal with.

## Author
+ [Christian Wibisono](https://www.kaggle.com/christianwbsn)

In [1]:
import pandas as pd
import numpy as np

In [2]:
raw = pd.read_csv("../data/raw/lazada_reviews.csv")

In [3]:
raw.head()

|   | rating | review |
|---|---|---|
| 0 | 1 | pengiriman melalui NINJA sangattttt lamaaaa. j... |
| 1 | 1 | pesananku pada no order ini terkirim dgn baik.... |
| 2 | 5 | ga sia sia susah payah ikutan flashsale akhirn... |
| 3 | 5 | Setelah 7 kali gagal flashsale, akhirnya dapat... |
| 4 | 5 | saya kurang setuju dengan FS, memang sih untuk... |


As we can see, the data contains 2 columns: *rating* and *review*.

In [4]:
raw.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 280803 entries, 0 to 280802
Data columns (total 2 columns):
rating    280803 non-null int64
review    220233 non-null object
dtypes: int64(1), object(1)
memory usage: 4.3+ MB


1. rating: numerical (1-5)
2. review: text
----
We have 280803 data points, but 60570 of them have a missing value in the review column.
Since our objective is to do **sentiment analysis** on text data, and a missing review tells us nothing about sentiment, those rows will be dropped later.
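A minimal sketch of that drop step. `drop_missing_reviews` is a hypothetical helper and the toy frame stands in for `raw`; on the real data, `raw.dropna(subset=['review'])` should keep exactly the 220233 non-null rows reported by `raw.info()`.

```python
import pandas as pd


def drop_missing_reviews(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without the rows whose 'review' is missing."""
    # subset=["review"] restricts the NaN check to the review column,
    # so a missing rating alone would not remove a row.
    cleaned = df.dropna(subset=["review"])
    # Give the surviving rows a contiguous 0..n-1 index again.
    return cleaned.reset_index(drop=True)


# Toy frame standing in for `raw` (the real CSV is not loaded here).
toy = pd.DataFrame({
    "rating": [1, 5, 4],
    "review": ["pengiriman lama", None, "barang bagus"],
})
reviews = drop_missing_reviews(toy)
print(len(reviews))  # 2
```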

## Characteristics of Marketplace Product Reviews in Bahasa Indonesia

In [5]:
raw['review'][:10]

0    pengiriman melalui NINJA sangattttt lamaaaa. j...
1    pesananku pada no order ini terkirim dgn baik....
2    ga sia sia susah payah ikutan flashsale akhirn...
3    Setelah 7 kali gagal flashsale, akhirnya dapat...
4    saya kurang setuju dengan FS, memang sih untuk...
5    kurir ninja express leleeeeetttt\nkecewa,,\nke...
6    Pengiriman ke kota Depok membutuhkan 11 hari t...
7    Barang udah nyampe...cuma lama banget mending ...
8    saran tolong pengitiman paket saya jng melalui...
9    Hp xiomi emng OK..cepat nyampe,kurir ramah..te...
Name: review, dtype: object

## 1. Informal Words

In [6]:
print(raw['review'][2])
print("---------------")
print(raw['review'][7])

ga sia sia susah payah ikutan flashsale akhirnya dapet juga,walaupun harganya naik... turunin lg dong garganya... overall... sip mantab,thank lazada thank xiaomi redmi 5a!!!
---------------
Barang udah nyampe...cuma lama banget mending pake expedisi yg bisa dipercaya...pake LEX ID jgn pake Ninjavanid I'd lama


The data contains slang words such as:
+ *ga* --> tidak
+ *mantab* --> mantap
+ *nyampe* --> sampai
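One common way to handle such words is a lookup table from slang to standard form. The sketch below assumes a tiny hypothetical lexicon `SLANG` built only from the three examples above; it replaces whole word tokens and keeps punctuation and spacing as they were.

```python
import re

# Hypothetical slang lexicon built from the examples above; a real project
# would need a much larger, curated dictionary.
SLANG = {
    "ga": "tidak",
    "mantab": "mantap",
    "nyampe": "sampai",
}


def normalize_slang(text: str, lexicon: dict[str, str] = SLANG) -> str:
    """Replace whole word tokens found in `lexicon`, leaving everything else intact."""
    # Split into word runs and the non-word runs between them, so spacing
    # and punctuation survive the round trip unchanged.
    parts = re.findall(r"\w+|\W+", text)
    # Lookup is case-insensitive; tokens not in the lexicon are kept as-is.
    return "".join(lexicon.get(part.lower(), part) for part in parts)


print(normalize_slang("ga sia sia, sip mantab"))
# tidak sia sia, sip mantap
```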

## 2. Abbreviation

In [7]:
print(raw['review'][10])
print("---------------")
print(raw['review'][11])

beli flash sale xiaomi redmi 5A + powerbank, knp yg dateng cuma hpnya aja? pdahal bayar sepaket sama powerbank.
---------------
Barangnya sudah sampai secara cepat dan tepat, tp sayang saya pesan 2 item yg 1 item kok dibatalkan sepihak oleh lazada ☹️


The data contains abbreviations, e.g.:
+ *knp* --> kenapa
+ *yg* --> yang
+ *tp* --> tapi
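Building an abbreviation table by hand is easier with a list of candidates. A rough heuristic, assumed here rather than taken from this notebook: many Indonesian chat abbreviations drop their vowels (*knp*, *yg*, *tp*), so counting consonant-only tokens surfaces likely candidates. `abbreviation_candidates` is a hypothetical helper.

```python
import re
from collections import Counter

VOWELS = set("aeiou")


def abbreviation_candidates(reviews, min_len: int = 2) -> Counter:
    """Count alphabetic tokens that contain no vowels (e.g. 'yg', 'knp')."""
    counts = Counter()
    for text in reviews:
        for token in re.findall(r"[a-z]+", text.lower()):
            # Consonant-only tokens are rarely standard Indonesian words,
            # so they make a cheap first list to review by hand.
            if len(token) >= min_len and not VOWELS & set(token):
                counts[token] += 1
    return counts


sample = [
    "knp yg dateng cuma hpnya aja? pdahal bayar sepaket sama powerbank.",
    "tp sayang saya pesan 2 item yg 1 item kok dibatalkan",
]
print(abbreviation_candidates(sample).most_common())
# [('yg', 2), ('knp', 1), ('tp', 1)]
```

On the real data this could be run as `abbreviation_candidates(raw['review'].dropna()).most_common(50)`. Abbreviations that keep a vowel, such as *pdahal* (padahal), slip through, so the output is a starting point for manual review, not a finished dictionary.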

## 3. Foreign Word

In [8]:
print(raw['review'][30])
print("---------------")
print(raw['review'][19])

Barang bagus sesuai dengan gambar, packing rapi, kualitas bagus sesuai harga, pengiriman sesuai jadwal, recomended seller.
---------------
Flash Sale TERBAIK!!!!
pesan kemarin, hari ini langsung diterima..
Terimakasih Lazada, happy birthday!!!


The data contains foreign-language (English) words, e.g.:
+ *seller* --> penjual
+ *packing* --> pengemasan
+ *recommended* --> direkomendasikan
+ *happy birthday* --> selamat ulang tahun
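Because one of these items is a two-word phrase (*happy birthday*), a token-by-token lookup is not enough. A sketch that matches whole phrases instead, using a hypothetical translation table `FOREIGN` built from the list above:

```python
import re

# Hypothetical English-to-Indonesian table built from the examples above.
FOREIGN = {
    "seller": "penjual",
    "packing": "pengemasan",
    "recommended": "direkomendasikan",
    "happy birthday": "selamat ulang tahun",
}

# Python's re tries alternatives left to right, so sorting the keys longest
# first makes a longer phrase win when two keys share a prefix.
_keys = sorted(FOREIGN, key=len, reverse=True)
_FOREIGN_RE = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in _keys) + r")\b", re.IGNORECASE
)


def translate_foreign(text: str) -> str:
    """Replace known foreign words/phrases (whole-word, case-insensitive)."""
    return _FOREIGN_RE.sub(lambda m: FOREIGN[m.group(1).lower()], text)


print(translate_foreign("Terimakasih Lazada, happy birthday!!!"))
# Terimakasih Lazada, selamat ulang tahun!!!
```

Note that review 30 actually spells the word *recomended*, so an exact-match table misses it; that is the typo problem described in section 8.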

## 4. Domain-Specific Words

In [9]:
print(raw['review'][10])
print("---------------")
print(raw['review'][14])

beli flash sale xiaomi redmi 5A + powerbank, knp yg dateng cuma hpnya aja? pdahal bayar sepaket sama powerbank.
---------------
Bulan lalu dpet Redmi 5 plus gold dan item, skrng dapet pagi Redmi 5A, ga ragu lagi belanja di Lazada


Our domain is marketplace product reviews, so the data contains domain-specific words, e.g.:
+ *flash sale*
+ *xiaomi*
+ *lazada*
+ *redmi*
+ *powerbank*
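To gauge how much of the corpus such terms cover, one could measure the share of reviews mentioning each of them. A sketch, where `domain_term_share` and the term list are illustrative assumptions; on the real data it would be called as `domain_term_share(raw['review'])`.

```python
import pandas as pd

DOMAIN_TERMS = ["flash sale", "xiaomi", "lazada", "redmi", "powerbank"]


def domain_term_share(reviews: pd.Series, terms=DOMAIN_TERMS) -> pd.Series:
    """Fraction of reviews that mention each term (case-insensitive substring)."""
    shares = {
        # regex=False treats the term as a literal substring;
        # na=False counts a missing review as "does not mention the term".
        term: reviews.str.contains(term, case=False, regex=False, na=False).mean()
        for term in terms
    }
    return pd.Series(shares).sort_values(ascending=False)


toy = pd.Series([
    "beli flash sale xiaomi redmi 5A",
    "Flash Sale TERBAIK!!!!",
    None,
    "terima kasih LAZADA",
])
print(domain_term_share(toy))
```

Because this is a substring match, *lazada* also counts misspellings such as *lazadaa*, which is convenient here but would overcount for short terms.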

## 5. Emoji

In [10]:
print(raw['review'][12])
print("---------------")
print(raw['review'][31])

proses waktu pengiriman setelah pemesanan sangat cepat, kurang dari 2 hari xiaomi redmi 5A sudah on hand,

packing rapih tidak ada cacat barang saat diterima, terima kasih LAZADA 😀
---------------
kecewa baru dapat ,krna harga naik 😃😃..pengiriman super cepat sehari nyampe..pakai kurir tiki..semoga barang awet


# 😃

This is an example of the emoji found in our dataset.
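To see which emoji occur and how often, a sketch that pulls them out with a regular expression over a few Unicode ranges. The ranges are an approximation chosen for this example: the pattern matches single code points, so multi-code-point emoji (skin-tone sequences, ZWJ sequences, flags) are split or missed, and the variation selector in ☹️ is dropped.

```python
import re
from collections import Counter

# Approximate emoji pattern (an assumption, not a complete definition):
# the main pictograph blocks U+1F300-U+1FAFF plus Misc Symbols and
# Dingbats U+2600-U+27BF, matched one code point at a time.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def extract_emoji(text: str) -> list[str]:
    """Return every emoji code point in `text`, in order of appearance."""
    return EMOJI_RE.findall(text)


def emoji_counts(reviews) -> Counter:
    """Tally emoji across an iterable of review strings."""
    counts = Counter()
    for text in reviews:
        counts.update(extract_emoji(text))
    return counts


print(extract_emoji("terima kasih LAZADA 😀"))  # ['😀']
```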

## 6. Emoticon

In [11]:
print(raw['review'][82])
print("---------------")
print(raw['review'][280323])

Barang bagus, pengiriman cepat.. hanya dpt di flash sale harus cepat2 standby di layar kmputer or di Hp... hehhe... :)
---------------
so far baik2 aja sih kayaknya, tapi karena yang 2 mengecewakan jadi saya kecewa :(


## :( sad or happy :)
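Emoticons carry sentiment but are usually lost when punctuation is stripped. One option, sketched here with hypothetical token names `EMO_POS`/`EMO_NEG`, is to replace them with word-like tokens before tokenizing. Only a few common emoticons are covered, and patterns this small can misfire on ordinary punctuation (e.g. `harga:Diskon`).

```python
import re

# Hypothetical replacement tokens for a few common emoticons. Turning them
# into ordinary words lets a later tokenizer keep them.
EMOTICONS = [
    (re.compile(r"[:;=]-?[)D]"), " EMO_POS "),  # :) ;) =) :-) :D
    (re.compile(r"[:;=]-?\("), " EMO_NEG "),    # :( ;( =( :-(
]


def replace_emoticons(text: str) -> str:
    """Swap known emoticons for sentiment tokens and normalize whitespace."""
    for pattern, token in EMOTICONS:
        text = pattern.sub(token, text)
    # split/join removes the padding added above (and also collapses newlines).
    return " ".join(text.split())


print(replace_emoticons("hanya dpt di flash sale... hehhe... :)"))
# hanya dpt di flash sale... hehhe... EMO_POS
```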

## 7. Interjection

In [12]:
print(raw['review'][12983])
print("---------------")
print(raw['review'][222677])

Barangnya bagus melebihi redmi 4A. meskipun saya dpt nya bukan yg flash sale. kwkwk
---------------
Saya sudah order sekitar 10 menit yg lalu,sampai sekarang barang belum datang juga




Hahahaha


The data contains interjections such as:
+ **wkwkwk**
+ **Hahahaha**
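Written laughter comes in many spellings (*wkwkwk*, *kwkwk*, *Hahahaha*, *hehhe*). A heuristic sketch that maps them all to one placeholder; the patterns and the token `LAUGH` are assumptions for illustration, not an exhaustive list.

```python
import re

# Heuristic laughter patterns (an assumption, not an exhaustive list):
#   - tokens made only of "w"/"k" that contain "wk": wkwk, wkwkwk, kwkwk
#   - two or more "h + a/e" syllables: haha, Hahahaha, hehe, hehhe
LAUGH_RE = re.compile(r"\b(?:[wk]*wk[wk]*|(?:h+[ae]+){2,}h*)\b", re.IGNORECASE)


def normalize_laughter(text: str, token: str = "LAUGH") -> str:
    """Replace every laughter-like token with a single placeholder token."""
    return LAUGH_RE.sub(token, text)


print(normalize_laughter("bukan yg flash sale. kwkwk"))
# bukan yg flash sale. LAUGH
```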

## 8. Typographical Error

In [13]:
print(raw['review'][15])
print("---------------")
print(raw['review'][8])

lazadaa ituuu kadangh lamaa pengirimannya kadang cepeett...lazada ituuu rawan salah harga dan rawan cansel
---------------
saran tolong pengitiman paket saya jng melaluin jasa penitiman ninja expres lg. paket lama sampai nya.


There are **many** typographical errors in our dataset, and a single review can contain more than one typo:
+ kadang**h** --> kadang (addition)
+ can**s**el --> cancel (replacement)
+ pen**it**iman --> pengiriman (deletion of *g*, replacement of *r* with *t*)
+ melalui**n** --> melalui (addition)
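Counting edit operations like the ones annotated above is what the Levenshtein distance does: the minimum number of single-character insertions, deletions and substitutions between two strings. A sketch, with `closest_word` and its vocabulary as hypothetical helpers; note that plain Levenshtein counts a transposition of adjacent letters as two edits (the Damerau-Levenshtein variant counts it as one).

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # prev[j] holds the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def closest_word(word: str, vocabulary, max_distance: int = 2):
    """Return the vocabulary word nearest to `word`, or None if none is close enough."""
    vocabulary = list(vocabulary)
    if not vocabulary:
        return None
    best = min(vocabulary, key=lambda v: levenshtein(word, v))
    return best if levenshtein(word, best) <= max_distance else None


VOCAB = ["kadang", "cancel", "pengiriman", "melalui", "lama"]  # hypothetical
print(levenshtein("penitiman", "pengiriman"))  # 2
print(closest_word("cansel", VOCAB))           # cancel
```

A real corrector would need a large vocabulary and a faster search than this linear scan, since every call compares the word against every entry.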

## 9. Irrelevant Text

In [14]:
print(raw['review'][1639])
print("---------------")
print(raw['review'][224544])
print("---------------")
print(raw['review'][80])

#2019gantipresiden
---------------
follow my instagram :
@nahlaa05
@nahlaasafaa
@nahlaam_
nanti bakal di follback kok 
 - oh ya btw squishy nya bagus
---------------
om telolet om Om telolet om Om telolet om Om telolet om Om telolet om Om telolet om Om telolet om Om telolet om Om telolet om Om telolet om Om telolet om Om telolet om Om telolet om Om telolet om Om telolet
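
None of these examples says anything about the product. A few cheap signals can flag such reviews for inspection; the sketch below is an assumed heuristic, not a method from this notebook, and the repetition threshold `min_unique_ratio=0.2` is arbitrary.

```python
import re


def irrelevance_flags(text: str, min_unique_ratio: float = 0.2) -> dict[str, bool]:
    """Cheap, heuristic signals that a review may not be about the product."""
    tokens = re.findall(r"\w+", text.lower())
    unique_ratio = len(set(tokens)) / len(tokens) if tokens else 0.0
    return {
        # Hashtags such as "#2019gantipresiden".
        "hashtag": bool(re.search(r"#\w+", text)),
        # Social-media handles such as "@nahlaa05" (email addresses match too).
        "mention": bool(re.search(r"@\w+", text)),
        # A long review with very few distinct tokens is heavily repetitive.
        "repetitive": len(tokens) >= 10 and unique_ratio < min_unique_ratio,
    }


print(irrelevance_flags("#2019gantipresiden"))
# {'hashtag': True, 'mention': False, 'repetitive': False}
```

Hashtags and mentions alone do not make a review irrelevant (the instagram example still ends with *squishy nya bagus*), so these flags are better used to sample reviews for manual checking than to drop them automatically.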


## 10. Completely Not in Bahasa

In [15]:
print(raw['review'][75])
print("---------------")
print(raw['review'][551])

the only downside about this transaction was the difficult to get it in flash sale..
but it paid off when you success.. and courier man Mas Suryadi was very helpful
---------------
Over all good.
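
Since many Indonesian reviews also contain a few English words (section 3), a yes/no rule would misfire; a share of English tokens is more useful. A sketch with a tiny hypothetical hint list `ENGLISH_HINTS`; a real filter would rather use a dedicated language-identification tool or a full word list, and any cut-off on the share is an assumption to tune.

```python
import re

# Tiny, hypothetical list of common English words for illustration only.
ENGLISH_HINTS = {
    "the", "only", "about", "this", "was", "to", "it", "in", "but",
    "when", "you", "and", "very", "over", "all", "good", "so", "far",
}


def english_share(text: str) -> float:
    """Fraction of alphabetic tokens that appear in ENGLISH_HINTS."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in ENGLISH_HINTS for t in tokens) / len(tokens)


print(english_share("Over all good."))  # 1.0
```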


## 11. Creative, Isn't It?

In [16]:
print(raw['review'][38])
print("---------------")
print(raw['review'][21677])

harga naik 100.000 tapi tetap....
╔══╦═╦═╦══╦═╦═╦╦╦╗
║║║║║║║╠╗╔╣║║║║║║║
║║║║╦║║║║║║╦║╔╬╬╬╣
╚╩╩╩╩╩╩╝╚╝╚╩╩╝╚╩╩╝
---------------
.╔══╗═════╔╗═════.
.╚╗╔╬═╦═╦═╣╠╦╦╦╦╗.
.═║║║╩╣║║╬║═╣║║║║.
.═╚╝╚═╩╩╬╗╠╩╬╗╠═╝.
.═══════╚═╝═╚═╝══.
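
Text art like this is drawn with characters from the Unicode Box Drawing block (U+2500 to U+257F), so the share of such characters is a simple way to spot it. A sketch; `box_drawing_share` is a hypothetical helper and the cut-off for flagging a review (say 0.5) would be an assumption to tune.

```python
def box_drawing_share(text: str) -> float:
    """Fraction of non-whitespace characters from the Box Drawing block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    # Box Drawing occupies code points U+2500 through U+257F.
    return sum("\u2500" <= c <= "\u257f" for c in chars) / len(chars)


print(box_drawing_share("╔══╦═╦═╦══╦═╦═╦╦╦╗"))                 # 1.0
print(box_drawing_share("harga naik 100.000 tapi tetap...."))  # 0.0
```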


## 12. Local *'Slang'* Words

In [17]:
print(raw['review'][17719])
print("---------------")
print(raw['review'][17025])

Josh gondos top markotop mantap surantap
---------------
SIP MARKUSIP .....


## 13. Unstandardized Words

In [18]:
print(raw['review'][0])
print("---------------")
print(raw['review'][5])

pengiriman melalui NINJA sangattttt lamaaaa. jauh berbeda dengan kurir internal lazada. lebih baik kasih opsi ke pelanggan agar bisa memilih kurir.
---------------
kurir ninja express leleeeeetttt
kecewa,,
kenapa sih Lazada masih mau bekerja sama dengan kurir siput ituuuuu
🐌🐌🐌🐌🐌


Some words have letters repeated for emphasis:
+ sangattttt --> sangat
+ lamaaaa --> lama
+ leleeeeetttt --> lelet
+ ituuuuu --> itu
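A common normalization for this is to collapse any run of three or more identical letters into one. A minimal sketch; the threshold of three is an assumption chosen so that legitimate double letters (e.g. *saat*) survive, and only letters are targeted so prices like *100.000* are untouched.

```python
import re

# Three or more repeats of the same letter collapse to a single letter.
# Digits are deliberately excluded so "100.000" is left alone.
_ELONGATED = re.compile(r"([a-zA-Z])\1{2,}")


def collapse_elongation(text: str) -> str:
    """Shorten emphatic letter repetitions such as 'lamaaaa' -> 'lama'."""
    return _ELONGATED.sub(r"\1", text)


print(collapse_elongation("sangattttt lamaaaa"))  # sangat lama
print(collapse_elongation("leleeeeetttt"))        # lelet
```

Two limitations remain: doubled stretches such as *cepeett* are not touched, and a word whose standard form has a double letter, stretched further (*saaaat*), collapses to *sat*, so a dictionary check afterwards still helps.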

## 14. Meaningless Words

In [19]:
print(raw['review'][224432])
print("---------------")
print(raw['review'][279396])
print("---------------")
print(raw['review'][279404])
print("---------------")
print(raw['review'][279403])

Yb km yg v yyo hcy
---------------
Vjcvcv
---------------
E
---------------
Qa


Know the meaning of those words? Email me: [christian.wibisono7@gmail.com](mailto:christian.wibisono7@gmail.com)

## 15. Inconsistent Rating and Review

In [20]:
print("Rating " + str(raw['rating'][6970]) + ":" + raw['review'][6970])
print("---------------")
print("Rating " + str(raw['rating'][15908]) + ":" + raw['review'][15908])

Rating 4:DUIT REFUND 2.5 JT DI TILEP EMBAT LAZADA. ANJG
---------------
Rating 1:hp sangat keren.. setiap minggu saya pasti beli.. terimakasih lazada..


Both conditions appear in our dataset:
+ Positive review with negative rating
+ Negative review with positive rating

These conditions can lead to incorrectly labelled data if the ratings are used as sentiment labels.
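A crude way to surface such rows for review is to compare the rating with a lexicon-based polarity score. The sketch assumes tiny hypothetical lexicons `POSITIVE`/`NEGATIVE` (partly drawn from the two examples above) and treats a rating of 3 as neutral; both choices are assumptions, and a real check would need a proper Indonesian sentiment lexicon or model.

```python
import re

# Hypothetical mini lexicons for illustration only.
POSITIVE = {"bagus", "keren", "mantap", "terimakasih", "cepat", "puas"}
NEGATIVE = {"kecewa", "lama", "lelet", "tilep", "embat", "anjg", "rusak"}


def lexicon_polarity(text: str) -> int:
    """+1, -1 or 0 from the count of positive vs negative lexicon hits."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return (score > 0) - (score < 0)


def looks_inconsistent(rating: int, review: str) -> bool:
    """Flag a high rating with a negative review, or a low rating with a positive one."""
    polarity = lexicon_polarity(review)
    return (rating >= 4 and polarity < 0) or (rating <= 2 and polarity > 0)


print(looks_inconsistent(4, "DUIT REFUND 2.5 JT DI TILEP EMBAT LAZADA. ANJG"))  # True
```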

## 16. Reversed Words

In [21]:
print(raw['review'][228587])

Pilihan mantap buat para gadgeters!
Kuy


+ Kuy --> Yuk (reversed letter by letter)
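
One way to catch such words, assuming a vocabulary of standard words is available (the set below is hypothetical): reverse any unknown token and keep the reversed form only if that form is a known word.

```python
def unreverse(token: str, vocabulary: set[str]) -> str:
    """Return the reversed token if only its reversed form is a known word."""
    lower = token.lower()
    flipped = lower[::-1]
    # Known words (including palindromes) are kept; unknown ones are
    # replaced only when reading them backwards gives a known word.
    if lower not in vocabulary and flipped in vocabulary:
        return flipped
    return token


VOCAB = {"yuk", "mantap", "ada"}  # hypothetical vocabulary
print(unreverse("Kuy", VOCAB))     # yuk
```

Short tokens are risky: a random two- or three-letter string may happen to be a word when reversed, so a minimum length or a manual whitelist would likely be needed.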