Find the percentage of shippable orders

Find the percentage of shippable orders.
Consider an order shippable if the customer's address is known.

In [1]:
import pandas as pd
import numpy as np

In [4]:
orders = pd.read_csv("../CSV/orders.csv")
orders = orders.iloc[:, :5]
orders.head()

Unnamed: 0,id,cust_id,order_date,order_details,total_order_cost
0,1,3,2019-03-04,Coat,100
1,2,3,2019-03-01,Shoes,80
2,3,3,2019-03-07,Skirt,30
3,4,7,2019-02-01,Coat,25
4,5,7,2019-03-10,Shoes,80


In [7]:
customers = pd.read_csv("../CSV/customers.csv")
columns_to_drop = ['Unnamed: 6']
customers.drop(columns=columns_to_drop, inplace=True)
customers.head()

Unnamed: 0,id,first_name,last_name,city,address,phone_number
0,8,John,Joseph,San Francisco,,928-386-8164
1,7,Jill,Michael,Austin,,813-297-0692
2,4,William,Daniel,Denver,,813-368-1200
3,5,Henry,Jackson,Miami,,808-601-7513
4,13,Emma,Isaac,Miami,,808-690-5201


In [8]:
merged_df = pd.merge(orders,customers,left_on='cust_id',right_on='id')
merged_df.head()

Unnamed: 0,id_x,cust_id,order_date,order_details,total_order_cost,id_y,first_name,last_name,city,address,phone_number
0,1,3,2019-03-04,Coat,100,3,Farida,Joseph,San Francisco,3153 Rhapsody Street,813-368-1200
1,2,3,2019-03-01,Shoes,80,3,Farida,Joseph,San Francisco,3153 Rhapsody Street,813-368-1200
2,3,3,2019-03-07,Skirt,30,3,Farida,Joseph,San Francisco,3153 Rhapsody Street,813-368-1200
3,4,7,2019-02-01,Coat,25,7,Jill,Michael,Austin,,813-297-0692
4,5,7,2019-03-10,Shoes,80,7,Jill,Michael,Austin,,813-297-0692


In [9]:
merged_df['is_shipable'] = (merged_df.address.notnull()).astype(int)
merged_df

Unnamed: 0,id_x,cust_id,order_date,order_details,total_order_cost,id_y,first_name,last_name,city,address,phone_number,is_shipable
0,1,3,2019-03-04,Coat,100,3,Farida,Joseph,San Francisco,3153 Rhapsody Street,813-368-1200,1
1,2,3,2019-03-01,Shoes,80,3,Farida,Joseph,San Francisco,3153 Rhapsody Street,813-368-1200,1
2,3,3,2019-03-07,Skirt,30,3,Farida,Joseph,San Francisco,3153 Rhapsody Street,813-368-1200,1
3,4,7,2019-02-01,Coat,25,7,Jill,Michael,Austin,,813-297-0692,0
4,5,7,2019-03-10,Shoes,80,7,Jill,Michael,Austin,,813-297-0692,0
5,6,15,2019-02-01,Boats,100,15,Mia,Owen,Miami,,808-640-5201,0
6,7,15,2019-01-11,Shirts,60,15,Mia,Owen,Miami,,808-640-5201,0
7,8,15,2019-03-11,Slipper,20,15,Mia,Owen,Miami,,808-640-5201,0
8,9,15,2019-03-01,Jeans,80,15,Mia,Owen,Miami,,808-640-5201,0
9,10,15,2019-03-09,Shirts,50,15,Mia,Owen,Miami,,808-640-5201,0


In [10]:
result = 100 * (merged_df['is_shipable'].sum()/len(merged_df))
result

28.000000000000004

Solution Walkthrough
In this walkthrough, we go through the code step by step to find the percentage of shippable orders. Along the way we use the merge function in Pandas to combine two dataframes, derive a new column from a condition, and turn that column into a percentage.

Understanding The Data
The code assumes the presence of two dataframes: orders and customers. The orders dataframe contains information about various orders, including the cust_id column that represents the customer ID associated with each order. The customers dataframe contains customer information, including the id column which matches the cust_id column in the orders dataframe. The address column in the customers dataframe represents the customer's address.

The Problem Statement
The task is to find the percentage of shippable orders. An order is considered shippable if the customer's address is known, so we need the share of orders for which the customers dataframe has a non-missing address.

Breaking Down The Code
Let's break down the given code into smaller sections and understand what each line does:

import pandas as pd
import numpy as np
The code starts by importing the pandas and numpy libraries. We use pandas for handling the dataframes. numpy is imported but never called in this solution, so that import is optional.

merged_df = pd.merge(
    orders, customers, left_on="cust_id", right_on="id"
)
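This statement combines the orders and customers dataframes using the merge function from pandas. The left_on and right_on parameters name the key column on each side: cust_id in orders and id in customers. Because no how argument is given, pandas performs an inner join, so an order whose cust_id has no matching customer would be dropped from the result. Both inputs have an id column, so pandas renames them id_x (from orders) and id_y (from customers). The resulting dataframe is assigned to the variable merged_df.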
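To make the join behaviour concrete, here is a minimal sketch on made-up data (the rows below are hypothetical and are not taken from the CSV files). It shows the default inner join dropping an order with no matching customer, and the id_x / id_y suffixes that appear in the merged output.

```python
import pandas as pd

# Hypothetical toy data, not the notebook's CSV files.
orders = pd.DataFrame({
    "id": [1, 2, 3],
    "cust_id": [3, 7, 99],   # customer 99 does not exist below
    "order_details": ["Coat", "Shoes", "Skirt"],
})
customers = pd.DataFrame({
    "id": [3, 7],
    "address": ["3153 Rhapsody Street", None],
})

# Default how="inner": only orders with a matching customer survive.
merged = pd.merge(orders, customers, left_on="cust_id", right_on="id")

print(len(merged))            # 2
print(list(merged.columns))   # ['id_x', 'cust_id', 'order_details', 'id_y', 'address']
```

If unmatched orders should count as non-shippable rather than disappear, passing how="left" to pd.merge would keep them with a missing address. Whether that is the intended reading of the problem is an assumption the original solution does not make.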

merged_df["is_shipable"] = (merged_df.address.notnull()).astype(int)
This line adds a new column called is_shipable to the merged_df dataframe. notnull() returns True where the address column holds a value (i.e. the customer's address is known) and False where it is missing, and astype(int) converts those booleans to 1 and 0.
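The flagging step can be tried in isolation. The sketch below uses a hypothetical address column, with None standing in for a missing value, to show the boolean mask and its integer form.

```python
import pandas as pd

# Hypothetical addresses; None stands in for a missing value.
address = pd.Series(["3153 Rhapsody Street", None, None, "12 Example Road"])

known = address.notnull()    # boolean mask: True where an address exists
flags = known.astype(int)    # True -> 1, False -> 0

print(known.tolist())   # [True, False, False, True]
print(flags.tolist())   # [1, 0, 0, 1]
```

Note that notnull() only detects missing values. An empty string in memory would count as a known address, although pd.read_csv by default loads empty fields as missing, which is why the blank addresses in the CSV are flagged 0.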

result = 100 * (merged_df["is_shipable"].sum() / len(merged_df))
This line calculates the percentage of shippable orders by summing the values in the is_shipable column, which counts the shippable orders, and dividing that count by the total number of rows in merged_df. Multiplying by 100 gives the percentage, which is assigned to the variable result. In the notebook this evaluates to 28.000000000000004, i.e. 28% up to floating-point rounding.
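Since the column holds only 0s and 1s, its mean is the same fraction as sum divided by length. The sketch below checks this on hypothetical flags (the counts are invented so that the ratio is 28%; they are not read from the data) and rounds the result to hide the floating-point tail.

```python
import pandas as pd

# Hypothetical flags: 7 ones and 18 zeros, chosen only to give a 28% ratio.
is_shipable = pd.Series([1] * 7 + [0] * 18)

pct_sum = 100 * (is_shipable.sum() / len(is_shipable))   # as in the notebook
pct_mean = 100 * is_shipable.mean()                      # equivalent shortcut

print(round(pct_sum, 2))    # 28.0
print(round(pct_mean, 2))   # 28.0
```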

Bringing It All Together
The complete code combines the orders and customers dataframes using the merge function, adds the is_shipable column based on the presence of the customer's address, and then calculates the percentage of shippable orders. It assumes that orders and customers have already been loaded, as in the notebook cells above.

import pandas as pd
import numpy as np

merged_df = pd.merge(
    orders, customers, left_on="cust_id", right_on="id"
)
merged_df["is_shipable"] = (merged_df.address.notnull()).astype(int)
result = 100 * (merged_df["is_shipable"].sum() / len(merged_df))

Conclusion
The code merges the orders and customers dataframes on the customer ID, flags each order by whether the customer's address is known, and reports the share of flagged orders as the percentage of shippable orders.
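For experimenting without the CSV files, the same steps can be wrapped in a small function. This is only a sketch: the function name shippable_percentage, the sample rows, and the choice to return 0.0 for an empty merge are assumptions, not part of the original notebook.

```python
import pandas as pd


def shippable_percentage(orders: pd.DataFrame, customers: pd.DataFrame) -> float:
    """Return the percentage of orders whose customer has a known address.

    Hypothetical helper; the name is not part of the original notebook.
    """
    merged = pd.merge(orders, customers, left_on="cust_id", right_on="id")
    if merged.empty:
        return 0.0  # assumption: report 0% when no order matches a customer
    return float(100 * merged["address"].notnull().mean())


# Made-up rows for illustration only.
orders = pd.DataFrame({"id": [1, 2, 3, 4], "cust_id": [3, 3, 7, 15]})
customers = pd.DataFrame({
    "id": [3, 7, 15],
    "address": ["3153 Rhapsody Street", None, None],
})

print(shippable_percentage(orders, customers))  # 50.0
```

Two of the four sample orders belong to the one customer with an address, so the function returns 50.0.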