# DATA LOADING 
***

![Data Loading](../images/pexels-leeloothefirst-5561923.jpg)


# Online Retail Data Set

### Source:
Dr. Daqing Chen, Director: Public Analytics group, chend@lsbu.ac.uk,  
School of Engineering, London South Bank University, London SE1 0AA, UK.

---

### Data Set Information:
This is a **transnational data set** that contains all transactions occurring between **1 December 2010 and 9 December 2011** for a **UK-based non-store online retail company**.  
The company primarily sells **unique, all-occasion gifts**. Many customers of the company are **wholesalers**.

---

### Attribute Information:
- **InvoiceNo**: Invoice number. A 6-digit integral number uniquely assigned to each transaction. If this code starts with letter 'C', it indicates a **cancellation**.
- **StockCode**: Product (item) code. A 5-digit integral number uniquely assigned to each distinct product; some codes in the data carry a trailing letter (e.g. `85123A`, `84406B`).
- **Description**: Product (item) name.
- **Quantity**: The quantity of each product (item) per transaction.
- **InvoiceDate**: The date and time when each transaction was generated.
- **UnitPrice**: Product price per unit in sterling.
- **CustomerID**: A 5-digit integral number uniquely assigned to each customer.
- **Country**: The name of the country where each customer resides.
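
The attribute rules above can be turned into derived columns. The following is a minimal sketch, assuming the column names listed above; the helper `add_derived_columns`, the `IsCancellation` and `LineTotal` column names, and the toy rows are all hypothetical and not part of the original notebook.

```python
import pandas as pd

# Toy rows in the documented schema (made-up values, not taken from the real file).
sample = pd.DataFrame(
    {
        "InvoiceNo": ["536365", "C536379", 536381],
        "Quantity": [6, -1, 2],
        "UnitPrice": [2.55, 27.50, 3.39],
    }
)


def add_derived_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a cancellation flag and a per-line monetary value."""
    out = frame.copy()
    # InvoiceNo can load as a mix of integers and strings, so cast to str
    # before testing for the 'C' prefix that marks a cancellation.
    out["IsCancellation"] = out["InvoiceNo"].astype(str).str.startswith("C")
    # Assumption: the value of a line is quantity times unit price (sterling).
    out["LineTotal"] = out["Quantity"] * out["UnitPrice"]
    return out


enriched = add_derived_columns(sample)
print(enriched[["InvoiceNo", "IsCancellation", "LineTotal"]])
```

Casting `InvoiceNo` to string first is a defensive choice: a spreadsheet column that mixes plain numbers with `C`-prefixed codes usually arrives as an object column of mixed types.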

---

### Relevant Papers:
1. **The evolution of direct, data and digital marketing**  
   Richard Webber, Journal of Direct, Data and Digital Marketing Practice (2013), 14, 291–309.
   
2. **Clustering Experiments on Big Transaction Data for Market Segmentation**  
   Ashishkumar Singh, Grace Rumantir, Annie South, Blair Bethwaite,  
   Proceedings of the 2014 International Conference on Big Data Science and Computing.
   
3. **A decision-making framework for precision marketing**  
   Zhen You, Yain-Whar Si, Defu Zhang, XiangXiang Zeng, Stephen C.H. Leung, Tao Li,  
   Expert Systems with Applications, 42 (2015), 3357–3367.

---

### Citation Request:
Daqing Chen, Sai Liang Sain, and Kun Guo, **"Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining,"**  
*Journal of Database Marketing and Customer Strategy Management*, Vol. 19, No. 3, pp. 197–208, 2012.  
Published online before print: 27 August 2012. doi: [10.1057/dbm.2012.17](https://doi.org/10.1057/dbm.2012.17).

---

### Source:
[UCI Machine Learning Repository - Online Retail Data Set](http://archive.ics.uci.edu/ml/datasets/Online+Retail)


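In [2]:
```python
import pandas as pd

# Load the workbook from its data.world download URL (read_excel needs an Excel engine such as openpyxl).
df = pd.read_excel('https://query.data.world/s/tx7q23fjzke2ptxeotdq53gyhwf4aa?dws=00000')
```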

In [3]:
```python
df
```

| | InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
|---|---|---|---|---|---|---|---|---|
| 0 | 536365 | 85123A | WHITE HANGING HEART T-LIGHT HOLDER | 6 | 2010-12-01 08:26:00 | 2.55 | 17850.0 | United Kingdom |
| 1 | 536365 | 71053 | WHITE METAL LANTERN | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
| 2 | 536365 | 84406B | CREAM CUPID HEARTS COAT HANGER | 8 | 2010-12-01 08:26:00 | 2.75 | 17850.0 | United Kingdom |
| 3 | 536365 | 84029G | KNITTED UNION FLAG HOT WATER BOTTLE | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
| 4 | 536365 | 84029E | RED WOOLLY HOTTIE WHITE HEART. | 6 | 2010-12-01 08:26:00 | 3.39 | 17850.0 | United Kingdom |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 541904 | 581587 | 22613 | PACK OF 20 SPACEBOY NAPKINS | 12 | 2011-12-09 12:50:00 | 0.85 | 12680.0 | France |
| 541905 | 581587 | 22899 | CHILDREN'S APRON DOLLY GIRL | 6 | 2011-12-09 12:50:00 | 2.10 | 12680.0 | France |
| 541906 | 581587 | 23254 | CHILDRENS CUTLERY DOLLY GIRL | 4 | 2011-12-09 12:50:00 | 4.15 | 12680.0 | France |
| 541907 | 581587 | 23255 | CHILDRENS CUTLERY CIRCUS PARADE | 4 | 2011-12-09 12:50:00 | 4.15 | 12680.0 | France |
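
A few quick checks after loading can confirm that the frame matches the data set description, such as the stated date range. The sketch below is illustrative only: `summarize_load` is a hypothetical helper, and it runs on a two-row toy frame rather than the real data. The float display of `CustomerID` (e.g. `17850.0`) often indicates missing values in a column, although the preview alone does not confirm that, so the helper counts them.

```python
import pandas as pd


def summarize_load(frame: pd.DataFrame) -> dict:
    """Collect basic sanity-check figures for a freshly loaded frame."""
    # Parse defensively in case InvoiceDate arrived as text instead of datetimes.
    dates = pd.to_datetime(frame["InvoiceDate"])
    return {
        "rows": len(frame),
        "first_invoice": dates.min(),
        "last_invoice": dates.max(),
        "missing_customer_ids": int(frame["CustomerID"].isna().sum()),
    }


# Toy frame for illustration; the second CustomerID is deliberately missing.
toy = pd.DataFrame(
    {
        "InvoiceDate": ["2010-12-01 08:26:00", "2011-12-09 12:50:00"],
        "CustomerID": [17850.0, None],
    }
)

summary = summarize_load(toy)
print(summary)
```

On the real data, `summarize_load(df)` would report the earliest and latest invoice timestamps, which can be compared against the period given in the data set information.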


In [6]:
```python
# Save the dataset as an Excel file
file_name = '../data/raw/ecommerce_data_analysis.xlsx'
df.to_excel(file_name, index=False)

print(f"Data has been saved as {file_name}")
```

Data has been saved as ../data/raw/ecommerce_data_analysis.xlsx
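
The save step assumes that the target directory already exists; `to_excel` does not create missing folders. A minimal sketch of one way to guard against that is shown below. The helper `ensure_parent_dir` is hypothetical, and the demonstration uses a temporary directory so that it does not touch the project tree.

```python
from pathlib import Path
import tempfile


def ensure_parent_dir(file_name: str) -> Path:
    """Create the parent directory of file_name if needed and return the path."""
    path = Path(file_name)
    # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


# Demonstration inside a throwaway directory (the real notebook path is relative).
with tempfile.TemporaryDirectory() as tmp:
    target = ensure_parent_dir(f"{tmp}/data/raw/ecommerce_data_analysis.xlsx")
    parent_exists = target.parent.is_dir()
    print(parent_exists)
```

In the notebook, this could be called as `ensure_parent_dir(file_name)` immediately before `df.to_excel(file_name, index=False)`.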
