# Task 4.6 – Combining & Exporting Data

In this task, I will:
- Merge cleaned orders with prior order-product data
- Export the combined dataset as a `.pkl` file
- Import the `.pkl` in another notebook
- Merge with the cleaned product data
- Export the final full dataset for future analysis

In [1]:
# Import libraries

import pandas as pd
import numpy as np
import os

In [2]:
path = r'/Users/canancengel/A4_Instacart Basket Analysis/02_Data'

## Step 1: Load cleaned orders and prior order products
I begin by loading the cleaned orders and products datasets, along with the original prior order-product table.

In [3]:
# Build each file path from the project folder stored in `path`
products_cleaned = pd.read_csv(os.path.join(path, 'Prepared Data', 'products_cleaned.csv'))
orders_cleaned = pd.read_csv(os.path.join(path, 'Prepared Data', 'orders_cleaned.csv'))
orders_prior = pd.read_csv(os.path.join(path, 'Original Data', 'orders_products_prior.csv'))

In [4]:
print(products_cleaned.shape)
print(orders_cleaned.shape)
print(orders_prior.shape)

(49672, 5)
(3421083, 7)
(32434489, 4)


In [5]:
# Load cleaned orders and prior product data
df_ords = pd.read_csv(os.path.join(path, 'Prepared Data', 'orders_cleaned.csv'))
df_ords_prior = pd.read_csv(os.path.join(path, 'Original Data', 'orders_products_prior.csv'))

## Step 2: Merge orders with prior order-product data

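I merge the datasets on `order_id` using an inner join and add the `_merge` indicator column. Because an inner join keeps only `order_id` values present in both tables, `_merge` can only read `both`. The more telling check is that the 32,434,489 merged rows equal the row count of `orders_prior`, which, with `order_id` unique in the orders table, means every prior order-product row found a matching order.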
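A minimal sketch of this point, on small invented frames standing in for `orders_cleaned` and `orders_prior`: under an inner join the indicator can only report `both`, while an outer join labels the unmatched rows. It also uses pandas' `validate` argument, which raises `pandas.errors.MergeError` if `order_id` is not unique on the orders side.

```python
import pandas as pd

# Toy stand-ins for orders_cleaned and orders_prior (hypothetical data)
orders = pd.DataFrame({"order_id": [1, 2, 3], "user_id": [10, 10, 20]})
order_products = pd.DataFrame({
    "order_id": [1, 1, 2, 4],            # order 4 has no matching order row
    "product_id": [196, 14084, 196, 12427],
})

# Inner join: unmatched rows are discarded, so _merge can only be "both"
inner = orders.merge(order_products, on="order_id", how="inner", indicator=True)
inner_counts = inner["_merge"].value_counts()

# Outer join: unmatched rows are kept and labelled left_only / right_only
outer = orders.merge(order_products, on="order_id", how="outer", indicator=True)
outer_counts = outer["_merge"].value_counts()

# validate="one_to_many" raises MergeError if order_id is duplicated in `orders`
checked = orders.merge(order_products, on="order_id", how="inner", validate="one_to_many")

print(inner_counts)
print(outer_counts)
```

Here order 3 shows up as `left_only` and order 4 as `right_only` only in the outer join; the inner join reports both categories as 0.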

In [22]:
orders_products_combined = orders_cleaned.merge(
    orders_prior, on='order_id', how='inner', indicator=True
)
print(orders_products_combined['_merge'].value_counts())

_merge
both          32434489
left_only            0
right_only           0
Name: count, dtype: int64


## Step 3: Export merged dataframe to `.pkl`

I export the merged dataframe in Pickle format (`.pkl`), which stores the dataframe together with its column data types and reloads faster than re-parsing a CSV file.
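To illustrate what Pickle preserves, here is a minimal sketch, using a small hypothetical frame and a temporary directory, that writes the same data to `.pkl` and `.csv` and compares the dtypes after reloading. The categorical `_merge` column survives the Pickle round trip but comes back from CSV as plain strings.

```python
import os
import tempfile

import pandas as pd

# Small hypothetical frame with a categorical column, as produced by indicator=True
df = pd.DataFrame({
    "order_id": [2539329, 2539329],
    "_merge": pd.Categorical(["both", "both"], categories=["left_only", "right_only", "both"]),
    "new_customer": [True, True],
})

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "demo.pkl")
    csv_path = os.path.join(tmp, "demo.csv")

    df.to_pickle(pkl_path)
    df.to_csv(csv_path, index=False)

    # Both frames are read fully into memory before the folder is deleted
    from_pkl = pd.read_pickle(pkl_path)
    from_csv = pd.read_csv(csv_path)

# Pickle restores the categorical dtype; CSV stores only the text
print(from_pkl.dtypes)
print(from_csv.dtypes)
```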

In [23]:
# Save the merged dataframe from Step 2
orders_products_combined.to_pickle(os.path.join(path, 'Prepared Data', 'orders_products_combined.pkl'))

## Step 4: Import the combined dataframe from `.pkl`

I simulate opening a new notebook by importing the `.pkl` file, then check its shape to confirm that it saved and loaded correctly.
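A shape check confirms the row and column counts but not the contents. When the original dataframe is still in memory, as it is here, `pandas.testing.assert_frame_equal` also compares values, dtypes and labels. A minimal sketch on a hypothetical three-row frame (on the full 32-million-row data this check takes noticeably longer than comparing shapes):

```python
import os
import tempfile

import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical stand-in for orders_products_combined
original = pd.DataFrame({
    "order_id": [2539329, 2539329, 2398795],
    "product_id": [196, 14084, 196],
    "reordered": [0, 0, 1],
})

with tempfile.TemporaryDirectory() as tmp:
    pkl_path = os.path.join(tmp, "orders_products_combined.pkl")
    original.to_pickle(pkl_path)
    reloaded = pd.read_pickle(pkl_path)

# Cheap check, as in the notebook: same number of rows and columns
assert reloaded.shape == original.shape

# Stricter check: raises AssertionError if any value, dtype or label differs
assert_frame_equal(original, reloaded)
print("round trip OK:", reloaded.shape)
```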

In [24]:
df_combined = pd.read_pickle(os.path.join(path, 'Prepared Data', 'orders_products_combined.pkl'))
print(df_combined.shape)

(32434489, 11)


## Step 5: Merge combined data with cleaned product data

I merge the combined order-product data with the cleaned product details on the `product_id` column, again using an inner join.
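Because an inner join silently discards order-product rows whose `product_id` is missing from the product table, a left join with `indicator=True` is one way to count and list those rows before deciding whether to drop them. A minimal sketch on invented data (`99999` is a made-up ID with no product entry):

```python
import pandas as pd

# Hypothetical miniature versions of df_combined and df_prods
combined = pd.DataFrame({
    "order_id": [1, 1, 2, 3],
    "product_id": [196, 14084, 196, 99999],  # 99999 is missing from the product table
})
prods = pd.DataFrame({
    "product_id": [196, 14084],
    "product_name": ["Soda", "Organic Unsweetened Vanilla Almond Milk"],
})

# Left join keeps every order-product row and labels the unmatched ones
check = combined.merge(prods, on="product_id", how="left", indicator=True)
unmatched = check.loc[check["_merge"] == "left_only", "product_id"]

# The inner join drops those rows without reporting them
inner = combined.merge(prods, on="product_id", how="inner")

print("rows dropped by inner join:", len(combined) - len(inner))
print("unmatched product_ids:", unmatched.unique().tolist())
```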

In [25]:
# Load cleaned product data
df_prods = pd.read_csv(os.path.join(path, 'Prepared Data', 'products_cleaned.csv'))

In [26]:
# Merge with product data, using a custom merge flag
df_final = df_combined.merge(df_prods, on='product_id', how='inner', indicator='_merge_flag')

In [27]:
# Check merge results
df_final['_merge_flag'].value_counts()

_merge_flag
both          32404859
left_only            0
right_only           0
Name: count, dtype: int64

In [28]:
df_final = df_final.drop(columns=['_merge_flag'])

## Step 6: Export final merged dataframe to `.pkl`

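After combining the cleaned orders with the prior order-product data, I merged the result with the cleaned products dataframe. Because this was an inner join, the final dataset has 32,404,859 rows, 29,630 fewer than `df_combined`: the dropped rows reference `product_id` values that are missing from the cleaned products table. The `_merge_flag` counts still show 0 `left_only` rows because an inner join never keeps unmatched rows. I export this final dataset for further analysis.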
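The `df_final.head()` output shows that the `_merge` column from the first merge is still present, since only `_merge_flag` was dropped. If it is not needed for analysis, here is a minimal sketch of removing every indicator column before export. The toy frame is hypothetical, and the `startswith("_merge")` rule assumes no real data column begins with that prefix.

```python
import pandas as pd

# Hypothetical frame carrying indicator columns from two merges
df = pd.DataFrame({
    "order_id": [2539329, 2539329],
    "product_id": [196, 14084],
    "_merge": pd.Categorical(["both", "both"]),
    "_merge_flag": pd.Categorical(["both", "both"]),
})

# Collect and drop every helper column created by merge(..., indicator=...)
indicator_cols = [col for col in df.columns if col.startswith("_merge")]
df_clean = df.drop(columns=indicator_cols)

print(df_clean.columns.tolist())  # ['order_id', 'product_id']
```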

In [29]:
df_final.to_pickle(os.path.join(path, 'Prepared Data', 'ords_prods_merge.pkl'))

In [30]:
df_final.shape

(32404859, 15)

In [31]:
df_final.head()

| | order_id | user_id | order_number | order_dow | order_hour_of_day | days_since_prior_order | new_customer | product_id | add_to_cart_order | reordered | _merge | product_name | aisle_id | department_id | prices |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2539329 | 1 | 1 | 2 | 8 | NaN | True | 196 | 1 | 0 | both | Soda | 77 | 7 | 9.0 |
| 1 | 2539329 | 1 | 1 | 2 | 8 | NaN | True | 14084 | 2 | 0 | both | Organic Unsweetened Vanilla Almond Milk | 91 | 16 | 12.5 |
| 2 | 2539329 | 1 | 1 | 2 | 8 | NaN | True | 12427 | 3 | 0 | both | Original Beef Jerky | 23 | 19 | 4.4 |
| 3 | 2539329 | 1 | 1 | 2 | 8 | NaN | True | 26088 | 4 | 0 | both | Aged White Cheddar Popcorn | 23 | 19 | 4.7 |
| 4 | 2539329 | 1 | 1 | 2 | 8 | NaN | True | 26405 | 5 | 0 | both | XL Pick-A-Size Paper Towel Rolls | 54 | 17 | 1.0 |
