### Dataset collection

MySQL query to load all account items for accounts with `variant = 3`:

```mysql
select
    product_id, count, price, total, account.date
from account_item
inner join account on account_item.account_id = account.id
where account.variant = 3;
```

MySQL query to load product data (name and group) for the same products:

```mysql
select
    product_id,
    product.name as product_name,      -- aliased: both tables have a `name` column
    product_group.name as group_name
from product_groups
inner join product_group on product_group.id = group_id
inner join product on product.id = product_groups.product_id
where product_id in (select product_id
                     from account_item
                     inner join account on account_item.account_id = account.id
                     where account.variant = 3);
```

MySQL query to load product crosses, keeping only pairs where both products appear in the selected account items:

```mysql
select
    product_id, cross_product_id
from product_cross
where product_id in (select product_id
                     from account_item
                     inner join account on account_item.account_id = account.id
                     where account.variant = 3)
and cross_product_id in (select product_id
                         from account_item
                         inner join account on account_item.account_id = account.id
                         where account.variant = 3);
```
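
The notebook cells that follow read `account_items.csv`, `product_information.csv`, and `crosses.csv`, so each query result was presumably exported to a CSV file first. The export step itself is not shown, so the following is only a sketch of one way to do it with pandas. The helper name `export_query` is hypothetical, and an in-memory SQLite database with a few made-up rows stands in for the MySQL server so the example runs on its own; with a real server you would pass your own connection instead.

```python
import sqlite3

import pandas as pd

ACCOUNT_ITEMS_QUERY = """
select product_id, count, price, total, account.date
from account_item
inner join account on account_item.account_id = account.id
where account.variant = 3
"""


def export_query(conn, query, csv_path=None):
    """Run `query` on `conn`, optionally save the result as CSV, and return it."""
    frame = pd.read_sql_query(query, conn)
    if csv_path is not None:
        # index=False keeps the row index out of the file, so reading it back
        # does not produce an extra "Unnamed: 0" column.
        frame.to_csv(csv_path, index=False)
    return frame


# Stand-in database (assumption: SQLite instead of MySQL, illustrative rows only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table account (id integer primary key, variant integer, date text);
create table account_item (account_id integer, product_id integer,
                           count integer, price real, total real);
insert into account values (1, 3, '2019-02-08 08:51:41');
insert into account values (2, 1, '2019-02-09 10:00:00');
insert into account_item values (1, 1474225, 1, 7.66, 7.66);
insert into account_item values (2, 999, 2, 1.0, 2.0);
""")

# Only the item that belongs to the variant-3 account is returned.
account_items = export_query(conn, ACCOUNT_ITEMS_QUERY)
print(account_items)
```

Passing `csv_path="account_items.csv"` would write the file that the notebook loads later.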

In [4]:
import pandas as pd

In [16]:
account_items_df = pd.read_csv("account_items.csv")
product_info_df = pd.read_csv("product_information.csv")
crosses_info_df = pd.read_csv("crosses.csv")
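
The `Unnamed: 0` column in the previews is what pandas produces when a CSV that was written together with its row index is read back without `index_col`. A minimal sketch of a loader that avoids the extra column, assuming the first column of each file really is such a saved index. The helper name `load_indexed_csv` is hypothetical, and the sample text mirrors the first two rows of the account items preview.

```python
import io

import pandas as pd


def load_indexed_csv(source, date_columns=None):
    """Read a CSV whose first column is a saved row index."""
    return pd.read_csv(
        source,
        index_col=0,  # use the first column as the index instead of "Unnamed: 0"
        parse_dates=date_columns if date_columns else False,
    )


# A file written with its index has an empty first header field.
sample = io.StringIO(
    ",product_id,count,price,total,date\n"
    "0,1474225,1,7.66,7.66,2019-02-08 08:51:41\n"
    "1,42219302,1,3.05,3.05,2019-02-08 08:52:45\n"
)

items = load_indexed_csv(sample, date_columns=["date"])
print(items.dtypes)
```

Parsing `date` at load time also makes the column usable for time-based grouping without a later conversion.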

In [9]:
account_items_df.head()

| Unnamed: 0 | product_id | count | price | total | date |
|---|---|---|---|---|---|
| 0 | 1474225 | 1 | 7.66 | 7.66 | 2019-02-08 08:51:41 |
| 1 | 42219302 | 1 | 3.05 | 3.05 | 2019-02-08 08:52:45 |
| 2 | 1396497 | 1 | 1.8 | 1.8 | 2019-02-08 08:53:30 |
| 3 | 42219306 | 1 | 13.05 | 13.05 | 2019-02-08 08:53:30 |
| 4 | 44571735 | 1 | 2.02 | 2.02 | 2019-02-08 08:53:30 |


In [10]:
product_info_df.head()

| Unnamed: 0 | product_id | product_name | group_name |
|---|---|---|---|
| 0 | 16632 | Central slave cylinder, clutch system | Vehicle transmission parts |
| 1 | 17290 | Clutch bearing | Vehicle transmission parts |
| 2 | 17359 | Clutch pump | Vehicle transmission parts |
| 3 | 19334 | Clutch (kit) + bearing | Vehicle transmission parts |
| 4 | 20346 | Clutch (kit) + bearing | Vehicle transmission parts |


In [11]:
crosses_info_df.head()

| Unnamed: 0 | product_id | cross_product_id |
|---|---|---|
| 0 | 42219306 | 44560075 |
| 1 | 2283784 | 5871020 |
| 2 | 36925441 | 37037941 |
| 3 | 1203948 | 51156853 |
| 4 | 2380323 | 40980404 |
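
The crosses query filters both `product_id` and `cross_product_id` to products that occur in the selected account items, so the loaded frames should satisfy the same property. A sketch of a sanity check for that, under the assumption that the CSV files were exported from those queries unchanged. The helper name `crosses_outside_sales` is hypothetical, and the toy `items` frame is deliberately incomplete so that one pair is flagged.

```python
import pandas as pd


def crosses_outside_sales(crosses, items):
    """Return cross rows where either product never appears in `items`."""
    sold = set(items["product_id"])
    both_sold = (
        crosses["product_id"].isin(sold)
        & crosses["cross_product_id"].isin(sold)
    )
    return crosses[~both_sold]


# Toy frames reusing a few IDs from the previews (illustrative only).
items = pd.DataFrame({"product_id": [42219306, 44560075, 1474225]})
crosses = pd.DataFrame({
    "product_id": [42219306, 2283784],
    "cross_product_id": [44560075, 5871020],
})

unexpected = crosses_outside_sales(crosses, items)
print(unexpected)
```

On the real data, `crosses_outside_sales(crosses_info_df, account_items_df)` should come back empty if the export matches the queries.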


In [21]:
df = pd.merge(left=account_items_df, right=product_info_df, on="product_id")
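
`pd.merge` defaults to an inner join, so account items whose `product_id` has no row in `product_info_df` are dropped without notice, and a product that appears more than once in the product table (for example, if it belongs to several groups) multiplies its account items. Both input frames also carry an `Unnamed: 0` column, which the merge renames to `Unnamed: 0_x` and `Unnamed: 0_y`. The following is a sketch of a more defensive variant, not the merge used above. The helper name `merge_with_report` and the toy rows are illustrative.

```python
import pandas as pd


def merge_with_report(items, products):
    """Left-join product info onto items and count items without a match."""
    merged = pd.merge(
        left=items,
        right=products,
        on="product_id",
        how="left",              # keep every account item
        validate="many_to_one",  # raise if a product_id repeats in `products`
        indicator=True,          # adds `_merge`: "both" or "left_only"
    )
    unmatched = int((merged["_merge"] == "left_only").sum())
    return merged.drop(columns="_merge"), unmatched


# Illustrative rows: product 1474225 has no entry in the product table.
items = pd.DataFrame({"product_id": [17290, 17290, 1474225], "count": [1, 2, 1]})
products = pd.DataFrame({
    "product_id": [17290],
    "product_name": ["Clutch bearing"],
})

merged, unmatched = merge_with_report(items, products)
print(len(merged), unmatched)
```

If `validate="many_to_one"` raises a `MergeError` on the real data, the product table has repeated `product_id` values and needs deduplicating or aggregating before the join.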

In [23]:
df.to_csv("dataset.csv", index=False)