### Analyze the time intervals between consecutive purchases for each customer in the purchases dataset: write code that computes the difference in days between each customer's current purchase and their previous purchase. Store the result in a new column, days_between_purchases. How many NaN values are in the days_between_purchases column?

In [2]:
import pandas as pd

#### Step 1. Loading and inspecting the data

In [3]:
try:
    df = pd.read_parquet("data/couriers_orders.parquet")
    print(df.head())
except FileNotFoundError as e:
    print(e)

        date  courier_id  order_id  distance  travel_time
0 2021-07-12          10         1      1.90        36.17
1 2021-07-02           3         2      3.98        21.34
2 2021-04-15           6         3      3.98        43.33
3 2021-07-16          10         4      2.85        14.01
4 2021-06-11          10         5      4.89        32.09


#### Step 2. Sorting data

In [8]:
# Sort by courier, then by date, so each row follows that courier's previous order
df = df.sort_values(by=["courier_id", "date"])
print(df)

           date  courier_id  order_id  distance  travel_time
1330 2021-04-03           1      1331      1.20        39.68
1302 2021-04-04           1      1303      1.23        49.07
346  2021-04-05           1       347      2.32        42.44
277  2021-04-06           1       278      2.23        57.29
1637 2021-04-08           1      1638      2.21        42.41
...         ...         ...       ...       ...          ...
459  2021-08-29          10       460      4.71        28.85
746  2021-08-29          10       747      4.98        58.31
1663 2021-08-29          10      1664      0.77        53.74
573  2021-08-30          10       574      1.64        55.48
1346 2021-08-31          10      1347      3.29        55.35

[1666 rows x 5 columns]


#### Step 3. Finding the previous order

In [41]:
# Within each courier's group, shift dates down by one row; the first order gets NaT
df["prev_order_date"] = df.groupby("courier_id")["date"].shift(1)
print(df.head())

           date  courier_id  order_id  distance  travel_time prev_order_date  days_between_purchases
1330 2021-04-03           1      1331      1.20        39.68             NaT                     NaN
1302 2021-04-04           1      1303      1.23        49.07      2021-04-03                     1.0
346  2021-04-05           1       347      2.32        42.44      2021-04-04                     1.0
277  2021-04-06           1       278      2.23        57.29      2021-04-05                     1.0
1637 2021-04-08           1      1638      2.21        42.41      2021-04-06                     2.0


#### Step 4. Calculating the difference in days

In [13]:
time_diff_calc = df["date"] - df["prev_order_date"]
print(time_diff_calc)

1330      NaT
1302   1 days
346    1 days
277    1 days
1637   2 days
        ...  
459    0 days
746    0 days
1663   0 days
573    1 days
1346   1 days
Length: 1666, dtype: timedelta64[us]


In [17]:
pd.set_option("display.width", 500)

In [18]:
# Convert the timedeltas to whole days; NaT becomes NaN, so the column is float
df["days_between_purchases"] = time_diff_calc.dt.days
print(df)

           date  courier_id  order_id  distance  travel_time prev_order_date  days_between_purchases
1330 2021-04-03           1      1331      1.20        39.68             NaT                     NaN
1302 2021-04-04           1      1303      1.23        49.07      2021-04-03                     1.0
346  2021-04-05           1       347      2.32        42.44      2021-04-04                     1.0
277  2021-04-06           1       278      2.23        57.29      2021-04-05                     1.0
1637 2021-04-08           1      1638      2.21        42.41      2021-04-06                     2.0
...         ...         ...       ...       ...          ...             ...                     ...
459  2021-08-29          10       460      4.71        28.85      2021-08-29                     0.0
746  2021-08-29          10       747      4.98        58.31      2021-08-29                     0.0
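1663 2021-08-29          10      1664      0.77        53.74      2021-08-29                     0.0
573  2021-08-30          10       574      1.64        55.48      2021-08-29                     1.0
1346 2021-08-31          10      1347      3.29        55.35      2021-08-30                     1.0

[1666 rows x 7 columns]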
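
The shift-then-subtract approach in Steps 3 and 4 can probably be collapsed into a single call to `diff()` on the grouped column. The sketch below is a minimal illustration, not the notebook's own method: the `toy` DataFrame and its values are hypothetical stand-ins for the real orders table, and the `two_step` variable exists only to compare the two approaches.

```python
import pandas as pd

# Hypothetical toy data standing in for the orders table (not the real dataset)
toy = pd.DataFrame(
    {
        "courier_id": [1, 1, 1, 2, 2],
        "date": pd.to_datetime(
            ["2021-04-03", "2021-04-04", "2021-04-08", "2021-05-01", "2021-05-01"]
        ),
    }
)

# Sorting is still required: diff() compares each row with the one before it
toy = toy.sort_values(by=["courier_id", "date"])

# Two-step approach: shift within each group, then subtract
two_step = (toy["date"] - toy.groupby("courier_id")["date"].shift(1)).dt.days

# One-step alternative: diff() within each courier's group
toy["days_between_purchases"] = toy.groupby("courier_id")["date"].diff().dt.days

print(toy)
```

Both variants should give the same column. The two-step version has the side benefit of keeping `prev_order_date` available for inspection.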

#### Step 5. NaN count

In [37]:
nan_count = df["days_between_purchases"].isna().sum()

total_couriers = df["courier_id"].nunique()

print("=" * 45)
print(f"    NaN count in days between purchases: {nan_count}")
print(f"    Unique couriers: {total_couriers}")
print("=" * 45)

    NaN count in days between purchases: 10
    Unique couriers: 10
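
The NaN count matches the number of unique couriers because each courier's first order has no previous order to subtract. A minimal sketch of how this could be checked explicitly follows. It assumes the `date` column has no missing values, and the `orders` DataFrame is hypothetical toy data rather than the real dataset.

```python
import pandas as pd

# Hypothetical toy data (not the real dataset); dates are assumed non-missing
orders = pd.DataFrame(
    {
        "courier_id": [2, 1, 1, 2, 3, 1],
        "date": pd.to_datetime(
            [
                "2021-05-02",
                "2021-04-03",
                "2021-04-05",
                "2021-05-01",
                "2021-06-10",
                "2021-04-04",
            ]
        ),
    }
)

orders = orders.sort_values(by=["courier_id", "date"])
orders["days_between_purchases"] = (
    orders.groupby("courier_id")["date"].diff().dt.days
)

# cumcount() numbers rows within each group from 0, so 0 marks the first order
is_first_order = orders.groupby("courier_id").cumcount() == 0
nan_mask = orders["days_between_purchases"].isna()

# NaN should appear on first orders and nowhere else
nan_only_on_first_orders = bool((nan_mask == is_first_order).all())
counts_match = int(nan_mask.sum()) == orders["courier_id"].nunique()

print(nan_only_on_first_orders, counts_match)
```

If either check came out false on the real data, that would point to missing dates or to a frame that was not sorted before the difference was taken.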
