# Combining & Exporting Data (Continued)

# Contents
1. Import libraries
2. Import datasets
3. Check imported data
4. Merge dataframes
5. Export dataframe

# 1. Import libraries

In [5]:
#Import libraries
import pandas as pd
import numpy as np
import os

# 2. Import datasets

In [7]:
#Define the project path as a string and assign to a variable called path
path = r'/Users/davidgriesel/Documents/0 - CareerFoundry/02 - Data Analytics Immersion/04 - Python Fundamentals for Data Analysts/Instacart Basket Analysis - IC 202409'

## 2.1. Orders and products combined

In [9]:
#Import dataset
df_orders_products_combined = pd.read_pickle(os.path.join(path, '02 - Data', 'Prepared Data', '06_orders_products_combined.pkl'))

## 2.2. Products cleaned

In [11]:
#Import dataset.
df_products_cleaned = pd.read_csv(os.path.join(path, '02 - Data', 'Prepared Data', '05_products_cleaned.csv'), index_col = 0)
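The CSV is read with `index_col = 0`, presumably because it was saved with its index (the default for `to_csv`). The shape checked below, (49672, 5), is consistent with that first column becoming the index rather than an extra data column. A minimal sketch of the difference on a hypothetical toy table, using an in-memory buffer instead of the real file:

```python
import io

import pandas as pd

# Hypothetical toy stand-in for the cleaned products table
products = pd.DataFrame({"product_id": [1, 2, 3], "prices": [5.8, 9.3, 4.5]})

# to_csv writes the index as an unnamed first column by default
buffer = io.StringIO()
products.to_csv(buffer)
csv_text = buffer.getvalue()

# Without index_col, that column comes back as an extra "Unnamed: 0" column
without_index = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column is used as the index instead
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(list(without_index.columns))  # ['Unnamed: 0', 'product_id', 'prices']
print(list(with_index.columns))     # ['product_id', 'prices']
```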

# 3. Check imported data

## 3.1. Orders and products combined 

In [14]:
#Get the dimensions of the dataframe
df_orders_products_combined.shape

(32434489, 10)

In [15]:
#Display the first 5 rows of the dataframe
df_orders_products_combined.head()

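|   | order_id | user_id | order_number | orders_day_of_week | order_hour_of_day | days_since_prior_order | first_order | product_id | add_to_cart_order | reordered |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2539329 | 1 | 1 | 2 | 8 | NaN | True | 196 | 1 | 0 |
| 1 | 2539329 | 1 | 1 | 2 | 8 | NaN | True | 14084 | 2 | 0 |
| 2 | 2539329 | 1 | 1 | 2 | 8 | NaN | True | 12427 | 3 | 0 |
| 3 | 2539329 | 1 | 1 | 2 | 8 | NaN | True | 26088 | 4 | 0 |
| 4 | 2539329 | 1 | 1 | 2 | 8 | NaN | True | 26405 | 5 | 0 |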


## 3.2. Products cleaned

In [17]:
#Get the dimensions of the dataframe
df_products_cleaned.shape

(49672, 5)
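Since the task calls for a deduped products table, it is worth confirming that `product_id` is unique before merging; a repeated key would duplicate every matching order row. A minimal sketch on a hypothetical toy table (on the real data the equivalent check would be `df_products_cleaned['product_id'].is_unique`):

```python
import pandas as pd

# Hypothetical toy products table with one duplicated product_id
products = pd.DataFrame(
    {"product_id": [1, 2, 2, 3], "product_name": ["A", "B", "B", "C"]}
)

# is_unique is False when any product_id appears more than once
print(products["product_id"].is_unique)  # False

# duplicated(keep=False) flags every row that shares a product_id
dupes = products[products["product_id"].duplicated(keep=False)]
print(len(dupes))  # 2

# After drop_duplicates on the key, each product_id appears once
deduped = products.drop_duplicates(subset="product_id")
print(deduped["product_id"].is_unique)  # True
```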

In [18]:
#Display the first 5 rows of the dataframe
df_products_cleaned.head()

|   | product_id | product_name | aisle_id | department_id | prices |
|---|---|---|---|---|---|
| 0 | 1 | Chocolate Sandwich Cookies | 61 | 19 | 5.8 |
| 1 | 2 | All-Seasons Salt | 104 | 13 | 9.3 |
| 2 | 3 | Robust Golden Unsweetened Oolong Tea | 94 | 7 | 4.5 |
| 3 | 4 | Smart Ones Classic Favorites Mini Rigatoni Wit... | 38 | 1 | 10.5 |
| 4 | 5 | Green Chile Anytime Sauce | 5 | 13 | 4.3 |


# 4. Merge dataframes

In [20]:
#Merge on product_id using an inner join (the default) while adding a merge flag
df_orders_products_merged = df_orders_products_combined.merge(df_products_cleaned, on = 'product_id', indicator = True)
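Each order line should match at most one product, so the merge can also be asked to enforce that relationship with the `validate` parameter of `DataFrame.merge`. A minimal sketch on hypothetical toy tables; on the real data this would mean adding `validate = 'many_to_one'` to the call above:

```python
import pandas as pd

# Hypothetical toy tables: many order lines can share one product
orders = pd.DataFrame({"order_id": [10, 10, 11], "product_id": [1, 2, 1]})
products = pd.DataFrame({"product_id": [1, 2], "prices": [5.8, 9.3]})

# validate="many_to_one" raises MergeError if product_id repeats in products
merged = orders.merge(products, on="product_id", validate="many_to_one", indicator=True)
print(merged.shape)  # (3, 4)

# A duplicated key on the right-hand side makes the same merge fail
products_dup = pd.concat([products, products.iloc[[0]]], ignore_index=True)
raised = False
try:
    orders.merge(products_dup, on="product_id", validate="many_to_one")
except pd.errors.MergeError as err:
    raised = True
    print("MergeError:", err)
```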

In [21]:
#Check merge flag counts (with an inner join, left_only and right_only are always 0)
df_orders_products_merged['_merge'].value_counts()

_merge
both          32404859
left_only            0
right_only           0
Name: count, dtype: int64
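With the default inner join, unmatched rows are discarded before the flag is computed, so the counts above cannot show them. The row counts do: 32,434,489 rows went in and 32,404,859 came out, so 29,630 order rows found no matching `product_id`. A minimal sketch on hypothetical toy data of how `how='outer'` lets the flag surface unmatched rows on either side (a `how='left'` merge would surface only the unmatched order rows):

```python
import pandas as pd

# Hypothetical toy tables: product 99 has no row in products,
# and product 3 is never ordered
orders = pd.DataFrame({"order_id": [10, 10, 11], "product_id": [1, 99, 1]})
products = pd.DataFrame({"product_id": [1, 3], "prices": [5.8, 4.5]})

# Inner join (the default): unmatched rows are dropped, so only "both" is non-zero
inner = orders.merge(products, on="product_id", indicator=True)
print(inner["_merge"].value_counts())

# Outer join: unmatched rows are kept and labelled left_only / right_only
outer = orders.merge(products, on="product_id", how="outer", indicator=True)
print(outer["_merge"].value_counts())
```

Whether the unmatched rows should then be dropped or investigated is a judgment call for the analysis; the flag only makes them visible.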

In [22]:
#Get the dimensions of the dataframe
df_orders_products_merged.shape

(32404859, 15)

In [23]:
#Display the first 5 rows of the dataframe
df_orders_products_merged.head()

|   | order_id | user_id | order_number | orders_day_of_week | order_hour_of_day | days_since_prior_order | first_order | product_id | add_to_cart_order | reordered | product_name | aisle_id | department_id | prices | _merge |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2539329 | 1 | 1 | 2 | 8 | NaN | True | 196 | 1 | 0 | Soda | 77 | 7 | 9.0 | both |
| 1 | 2539329 | 1 | 1 | 2 | 8 | NaN | True | 14084 | 2 | 0 | Organic Unsweetened Vanilla Almond Milk | 91 | 16 | 12.5 | both |
| 2 | 2539329 | 1 | 1 | 2 | 8 | NaN | True | 12427 | 3 | 0 | Original Beef Jerky | 23 | 19 | 4.4 | both |
| 3 | 2539329 | 1 | 1 | 2 | 8 | NaN | True | 26088 | 4 | 0 | Aged White Cheddar Popcorn | 23 | 19 | 4.7 | both |
| 4 | 2539329 | 1 | 1 | 2 | 8 | NaN | True | 26405 | 5 | 0 | XL Pick-A-Size Paper Towel Rolls | 54 | 17 | 1.0 | both |


# 5. Export dataframe

In [25]:
#Drop merge flag
df_orders_products_merged = df_orders_products_merged.drop(columns = ['_merge'])

In [26]:
#Get the dimensions of the merged dataframe
df_orders_products_merged.shape

(32404859, 14)

In [27]:
#Export merged dataframe
df_orders_products_merged.to_pickle(os.path.join(path, '02 - Data','Prepared Data', '06_orders_products_merged.pkl'))
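Pickle is a reasonable format at this size (about 32.4 million rows) because it stores the dataframe with its dtypes and is typically faster to write and reload than CSV. The task also asks that the reimported frame match the exported one. A minimal round-trip sketch, assuming a throwaway temporary folder and a hypothetical toy frame built from the first rows shown above:

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy frame standing in for the merged data
df = pd.DataFrame(
    {
        "order_id": [2539329, 2539329],
        "product_id": [196, 14084],
        "first_order": [True, True],
        "days_since_prior_order": [float("nan"), float("nan")],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "example.pkl")
    df.to_pickle(file_path)  # write the dataframe, dtypes included
    reloaded = pd.read_pickle(file_path)

# The reloaded frame should match the exported one exactly
print(reloaded.shape == df.shape)            # True
print((reloaded.dtypes == df.dtypes).all())  # True
pd.testing.assert_frame_equal(reloaded, df)  # raises AssertionError on any mismatch
```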

# Task (continued)

## 3. Import datasets
In a new notebook, import the orders_products_combined dataframe from the pickle file you just saved.

##### Refer: 2. Import datasets

## 4. Check data
Check the shape of the imported dataframe (it should be the same as the one you exported; always check!).

##### Refer: 3. Check imported data

## 5. Merge datasets
Determine a suitable way to combine the orders_products_combined dataframe with your products data set. Make sure you’re using your wrangled, cleaned, and deduped products data set stored in your “Prepared Data” folder from the previous Exercise’s task.

##### Refer: 4. Merge dataframes

## 6. Confirm results
Confirm the results of the merge using the merge flag.

##### Refer: 4. Merge dataframes

## 7. Export dataframe
Export the newly created dataframe as ords_prods_merge in a suitable format (taking into consideration the size).

##### Refer: 5. Export dataframe

## 8. Check notebook
Ensure your notebooks and Instacart project folder are organized and that comments and section headings have been used throughout your code. All your exported data files should be effectively labeled and stored in your “Data” folder.

##### Checked notebook, included section headings and code comments

## 9. Save & submit
Save the two notebooks and send them to your tutor along with a screenshot of the exported data sets in your Instacart project folder.

##### Notebook saved and submitted