# Joining Datasets Together

As in SQL, we often need to append datasets (stack rows on top of one another) and join them together on a shared key.



## The Dataset

The data used in these examples is dummy data.

It was built from a combination of Wikipedia pages and randomly generated numbers.

Wikipedia pages:

- https://en.wikipedia.org/wiki/List_of_culinary_fruits
- https://en.wikipedia.org/wiki/List_of_vegetables
- https://en.wikipedia.org/wiki/List_of_streets_in_Perth
- https://en.wikipedia.org/wiki/List_of_Sydney_suburbs

In [1]:
# Import the dependencies

import pandas as pd
import numpy as np

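# Suppress all warnings (on older pandas versions this also hides the
# FutureWarning raised by DataFrame.append)
import warnings
warnings.simplefilter('ignore')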

In [2]:
# Import the Sales Data

sales_data = pd.read_excel(r'../Data/SalesDataset.xlsx')

# Import historical Sales Data

historical_sales_data = pd.read_excel(r'../Data/SalesDataset_2.xlsx')

# Import Store Mapping

store_mapping = pd.read_excel(r'../Data/Store_Table.xlsx')

## Appending / Concatenating 

In [3]:
# Append/concat two datasets together

# Check the original size of the dataset

print(f"There are {len(sales_data)} rows of data")

There are 13752 rows of data


In [4]:
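# Append
# Note: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0.
# On newer versions, use pd.concat (see the next cell) instead.

append_sales_data = sales_data.append(historical_sales_data)

print(f"There are now {len(append_sales_data)} rows of data")

There are now 27504 rows of data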


In [5]:
# Concat

concat_sales_data = pd.concat([sales_data, historical_sales_data])

print(f"There are now {len(concat_sales_data)} rows of data")

There are now 27504 rows of data
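
One detail the row counts above do not show is what happens to the index. The sketch below uses two small, hypothetical frames (`current` and `historical`, standing in for `sales_data` and `historical_sales_data`) to illustrate that `pd.concat` keeps each frame's original index labels unless `ignore_index=True` is passed.

```python
import pandas as pd

# Hypothetical toy frames standing in for sales_data and historical_sales_data
current = pd.DataFrame({"Store_ID": [1009, 1024], "Units": [10, 20]})
historical = pd.DataFrame({"Store_ID": [1009, 1022], "Units": [5, 15]})

# Default behaviour: each frame keeps its own index, so labels repeat
stacked = pd.concat([current, historical])
print(stacked.index.tolist())        # [0, 1, 0, 1]

# ignore_index=True builds a fresh 0..n-1 index for the combined frame
stacked_reset = pd.concat([current, historical], ignore_index=True)
print(stacked_reset.index.tolist())  # [0, 1, 2, 3]
```

A fresh index is usually the safer choice when rows are later selected by label with `.loc`, since repeated labels would return more than one row.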


## Joining

### Left Joins

In [6]:
sales_data.head()

Unnamed: 0,Date,Campaign_ID,Customer_Group,Store_ID,Store_Name,Product_Category,Product_Group,Product,Product_ID,Units,Gross_Sales,Discount
0,31/12/2019,1000000.0,A Market That's Super,2001,Berowra Creek,Fruit,Tropical Fruit,Hydnora abyssinica,1100057,990,29.7,0.5
1,30/04/2020,,Super Super Market,1012,Bardia,Fruit,Tropical Fruit,Salak,1100094,630,0.0,0.498927
2,31/07/2020,,Market,3000,Blackett,Fruit,Tropical Fruit,Kola nut,1100062,671,1241.35,0.494303
3,31/10/2020,,A Market That's Super,2011,Bilgola Beach,Fruit,Tropical Fruit,Jackfruit,1100060,611,1283.1,0.493447
4,31/10/2020,,A Market That's Super,2006,Beverly Hills,Fruit,Tropical Fruit,Terap,1100107,684,1026.0,0.492293


In [7]:
store_mapping.head()

Unnamed: 0,Store_ID,Store_Name,Square_Metres,Address,Phone_Number,Premium
0,1009,Bankstown Aerodrome,448,80 Cantle Street,90993944,Yes
1,1024,Bellevue Hill,354,35 Carr Street,90993945,Yes
2,1022,Belfield,505,95 Cathedral Avenue,90993949,Yes
3,1008,Bankstown,290,87 Caroline Street,90993959,Yes
4,1018,Beacon Hill,996,94 Causeway Bridge,90993966,Yes


In [8]:
# Get more information about stores in the Sales dataset - join on Store ID

sales_data_store_data = pd.merge(sales_data, store_mapping, how='left', on='Store_ID')

sales_data_store_data.head()

Unnamed: 0,Date,Campaign_ID,Customer_Group,Store_ID,Store_Name_x,Product_Category,Product_Group,Product,Product_ID,Units,Gross_Sales,Discount,Store_Name_y,Square_Metres,Address,Phone_Number,Premium
0,31/12/2019,1000000.0,A Market That's Super,2001,Berowra Creek,Fruit,Tropical Fruit,Hydnora abyssinica,1100057,990,29.7,0.5,Berowra Creek,939,78 Forbes Lane,90994145,No
1,30/04/2020,,Super Super Market,1012,Bardia,Fruit,Tropical Fruit,Salak,1100094,630,0.0,0.498927,Bardia,650,96 Dyer Street,90994075,No
2,31/07/2020,,Market,3000,Blackett,Fruit,Tropical Fruit,Kola nut,1100062,671,1241.35,0.494303,Blackett,468,63 Cliff Street,90994036,No
3,31/10/2020,,A Market That's Super,2011,Bilgola Beach,Fruit,Tropical Fruit,Jackfruit,1100060,611,1283.1,0.493447,Bilgola Beach,804,50 Ellen Street,90994102,No
4,31/10/2020,,A Market That's Super,2006,Beverly Hills,Fruit,Tropical Fruit,Terap,1100107,684,1026.0,0.492293,Beverly Hills,561,65 Chapman Street,90993975,Yes


In [9]:
# The left join keeps every row of sales_data, so the row count is unchanged
len(sales_data_store_data)

13752
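
The joined output above contains both `Store_Name_x` and `Store_Name_y`, because `Store_Name` exists in both tables and is not part of the join key. A minimal sketch of one way to avoid this, assuming the store name in the sales table is the one to keep: drop the overlapping column from the right-hand table before merging. The toy frames `sales` and `stores` are hypothetical stand-ins, and `validate="many_to_one"` is an optional extra check that `Store_ID` is unique in the mapping, which is what keeps the row count from growing.

```python
import pandas as pd

# Hypothetical toy frames standing in for sales_data and store_mapping
sales = pd.DataFrame({
    "Store_ID": [2001, 1012, 2001],
    "Store_Name": ["Berowra Creek", "Bardia", "Berowra Creek"],
    "Units": [990, 630, 438],
})
stores = pd.DataFrame({
    "Store_ID": [2001, 1012],
    "Store_Name": ["Berowra Creek", "Bardia"],
    "Square_Metres": [939, 650],
})

# Drop the overlapping non-key column on the right so no _x/_y suffixes appear
joined = pd.merge(
    sales,
    stores.drop(columns="Store_Name"),
    how="left",
    on="Store_ID",
    validate="many_to_one",  # raises MergeError if Store_ID repeats in stores
)

print(joined.columns.tolist())
# ['Store_ID', 'Store_Name', 'Units', 'Square_Metres']
print(len(joined) == len(sales))  # True
```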

### Inner Joins

In [10]:
# Filter the store mapping

filtered_store_map = store_mapping[store_mapping["Store_Name"] == "Bankstown Aerodrome"]

filtered_store_map

Unnamed: 0,Store_ID,Store_Name,Square_Metres,Address,Phone_Number,Premium
0,1009,Bankstown Aerodrome,448,80 Cantle Street,90993944,Yes


In [11]:
inner_sales_data_store_data = pd.merge(sales_data, filtered_store_map,
                                       how='inner',
                                       on='Store_ID')

inner_sales_data_store_data.head()

Unnamed: 0,Date,Campaign_ID,Customer_Group,Store_ID,Store_Name_x,Product_Category,Product_Group,Product,Product_ID,Units,Gross_Sales,Discount,Store_Name_y,Square_Metres,Address,Phone_Number,Premium
0,31/12/2019,1000000.0,Super Super Market,1009,Bankstown Aerodrome,Fruit,Tropical Fruit,Ice-cream bean,1100058,998,0.0,0.5,Bankstown Aerodrome,448,80 Cantle Street,90993944,Yes
1,31/07/2019,1000000.0,Super Super Market,1009,Bankstown Aerodrome,Fruit,Tropical Fruit,Jackfruit,1100060,905,1900.5,0.5,Bankstown Aerodrome,448,80 Cantle Street,90993944,Yes
2,31/10/2020,,Super Super Market,1009,Bankstown Aerodrome,Fruit,Tropical Fruit,Noni,1100081,683,2629.55,0.442315,Bankstown Aerodrome,448,80 Cantle Street,90993944,Yes
3,30/11/2020,,Super Super Market,1009,Bankstown Aerodrome,Fruit,Tropical Fruit,South american sapote,1100100,722,7075.6,0.409453,Bankstown Aerodrome,448,80 Cantle Street,90993944,Yes
4,31/10/2020,,Super Super Market,1009,Bankstown Aerodrome,Fruit,Tropical Fruit,Jícara,1100061,560,1870.4,0.376879,Bankstown Aerodrome,448,80 Cantle Street,90993944,Yes


In [12]:
# Only the sales rows whose Store_ID appears in filtered_store_map remain
len(inner_sales_data_store_data)

278
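
The smaller row count follows from how an inner join works: only rows whose key appears in both tables survive. As a rough check, assuming the key is unique in the right-hand table, the length of the inner join should equal the number of left-hand rows whose key is found on the right. The toy frames `sales` and `one_store` below are hypothetical.

```python
import pandas as pd

# Hypothetical toy frames: four sales rows, and a mapping filtered to one store
sales = pd.DataFrame({
    "Store_ID": [1009, 2001, 1009, 1012],
    "Units": [998, 990, 905, 630],
})
one_store = pd.DataFrame({
    "Store_ID": [1009],
    "Store_Name": ["Bankstown Aerodrome"],
})

inner = pd.merge(sales, one_store, how="inner", on="Store_ID")

# Count the sales rows whose key exists in the filtered mapping
matching_rows = int(sales["Store_ID"].isin(one_store["Store_ID"]).sum())

print(len(inner), matching_rows)  # 2 2
```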

### Outer Joins

In [13]:
outer_sales_data_store_data = pd.merge(sales_data, filtered_store_map,
                                       how='outer',
                                       on='Store_ID')

outer_sales_data_store_data.head(100)

Unnamed: 0,Date,Campaign_ID,Customer_Group,Store_ID,Store_Name_x,Product_Category,Product_Group,Product,Product_ID,Units,Gross_Sales,Discount,Store_Name_y,Square_Metres,Address,Phone_Number,Premium
0,31/12/2019,1000000.0,A Market That's Super,2001,Berowra Creek,Fruit,Tropical Fruit,Hydnora abyssinica,1100057,990,29.70,0.500000,,,,,
1,30/04/2020,,A Market That's Super,2001,Berowra Creek,Fruit,Tropical Fruit,Ice-cream bean,1100058,438,0.00,0.352215,,,,,
2,31/07/2019,1000000.0,A Market That's Super,2001,Berowra Creek,Fruit,Tropical Fruit,Mammee,1100069,940,3656.60,0.500000,,,,,
3,31/12/2019,1000000.0,A Market That's Super,2001,Berowra Creek,Fruit,Tropical Fruit,Menteng,1100072,983,6536.95,0.500000,,,,,
4,31/08/2019,1000000.0,A Market That's Super,2001,Berowra Creek,Fruit,Tropical Fruit,Menteng,1100072,889,5911.85,0.500000,,,,,
5,30/11/2019,1000000.0,A Market That's Super,2001,Berowra Creek,Fruit,Tropical Fruit,Monkey fruit,1100073,940,2284.20,0.500000,,,,,
6,31/03/2019,1000000.0,A Market That's Super,2001,Berowra Creek,Fruit,Tropical Fruit,Noni,1100081,825,3176.25,0.500000,,,,,
7,31/01/2020,,A Market That's Super,2001,Berowra Creek,Fruit,Tropical Fruit,Maypop,1100071,62,0.00,0.194802,,,,,
8,31/10/2019,1000000.0,A Market That's Super,2001,Berowra Creek,Fruit,Tropical Fruit,Persimmon,1100088,904,7060.24,0.500000,,,,,
9,30/11/2020,,A Market That's Super,2001,Berowra Creek,Fruit,Tropical Fruit,Lardizabala,1100065,98,91.14,0.127550,,,,,


In [14]:
# The outer join keeps all rows from both tables; unmatched rows are filled with NaN
len(outer_sales_data_store_data)

13752
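
The outer join returns the same number of rows as `sales_data`, which is consistent with the single store in `filtered_store_map` also appearing in the sales data, so no right-only rows are added. To see where each row of an outer join came from, `pd.merge` accepts `indicator=True`, which adds a `_merge` column. The sketch below uses hypothetical toy frames (`sales` and `stores`) in which one store has no sales, to show all three possible values.

```python
import pandas as pd

# Hypothetical toy frames: store 2001 is missing from the mapping,
# and store 1024 has no sales rows
sales = pd.DataFrame({"Store_ID": [1009, 2001, 1009], "Units": [998, 990, 905]})
stores = pd.DataFrame({
    "Store_ID": [1009, 1024],
    "Store_Name": ["Bankstown Aerodrome", "Bellevue Hill"],
})

# indicator=True adds a _merge column: 'both', 'left_only' or 'right_only'
outer = pd.merge(sales, stores, how="outer", on="Store_ID", indicator=True)

# Tally how many rows fall into each category
counts = outer["_merge"].astype(str).value_counts().to_dict()
print(counts)
```

Here two rows match on both sides, one row exists only in `sales`, and one only in `stores`, so the result has four rows in total.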