# 3.1.4 - Applying concat() and merge()

Arjun Khatri dreams of becoming a data analyst at a biotechnology organisation. However, as a recent graduate with limited real-world experience, he is currently working for Mr Zachariou at the All-You-Need convenience store. The store is very busy and doesn’t have a reliable inventory management system in place.

Mr Zachariou has asked Arjun to use historical inventory data to help inform the development of a new system that will be used to track sales and stock availability. Arjun sees the project as a great opportunity to gain experience and develop his skills.

In [1]:
# Understand the data sets 

# Import pandas.
import pandas as pd

# Read the CSV file from the current working directory.
transactions_2010 = pd.read_csv('transactions_2010.csv')

# View the DataFrame.
print(transactions_2010.shape)
transactions_2010.head()

(26850, 6)


,InvoiceNo,StockCode,Quantity,InvoiceDate,UnitPrice,CustomerID
0,536365,85123A,6,2010-12-01 8:26,2.55,17850
1,536365,71053,6,2010-12-01 8:26,3.39,17850
2,536365,84406B,8,2010-12-01 8:26,2.75,17850
3,536365,84029G,6,2010-12-01 8:26,3.39,17850
4,536365,84029E,6,2010-12-01 8:26,3.39,17850


In [2]:
# Determine the length (number of rows) of the DataFrame using the len() function.

print(len(transactions_2010))

# Determine how many rows using the shape attribute (rows, columns).
print(f"{transactions_2010.shape[0]} rows for transactions_2010")

26850
26850 rows for transactions_2010
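Both lines above report the row count: len() returns the number of rows, while shape is a (rows, columns) tuple, so shape[0] is the same number and shape[1] is the column count. A minimal sketch on a small hypothetical DataFrame (the values are illustrative, not read from transactions_2010.csv):

```python
import pandas as pd

# Hypothetical sample with the same six columns as the store data (values are illustrative).
sample = pd.DataFrame({
    "InvoiceNo": [536365, 536365, 536366],
    "StockCode": ["85123A", "71053", "22633"],
    "Quantity": [6, 6, 6],
    "InvoiceDate": ["2010-12-01 8:26", "2010-12-01 8:26", "2010-12-01 8:28"],
    "UnitPrice": [2.55, 3.39, 1.85],
    "CustomerID": [17850, 17850, 17850],
})

n_rows = len(sample)        # len() counts rows only
rows, cols = sample.shape   # shape is a (rows, columns) tuple; it is an attribute, so no ()

print(n_rows, rows, cols)   # → 3 3 6
```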


In [3]:
# Apply the same steps to transactions_2011.

# Read the CSV file from the current working directory.
transactions_2011 = pd.read_csv('transactions_2011.csv')

# View the DataFrame.
print(transactions_2011.shape)
transactions_2011.head()

(379979, 6)


,InvoiceNo,StockCode,Quantity,InvoiceDate,UnitPrice,CustomerID
0,539993,22386,10,2011-01-04 10:00,1.95,13313
1,539993,21499,25,2011-01-04 10:00,0.42,13313
2,539993,21498,25,2011-01-04 10:00,0.42,13313
3,539993,22379,5,2011-01-04 10:00,2.1,13313
4,539993,20718,10,2011-01-04 10:00,1.25,13313


In [5]:
# Determine the length (number of rows) of the DataFrame using the len() function.

print(len(transactions_2011))

# Determine how many rows using the shape attribute (rows, columns).
print(f"{transactions_2011.shape[0]} rows for transactions_2011")

379979
379979 rows for transactions_2011


In [7]:
# Combine the two DataFrames into a single transactions DataFrame.

""" Follow these steps:
*Create a new DataFrame with the name transactions.

*Use the pd.concat() function and pass the two DataFrames, transactions_2010 and transactions_2011,
as a list inside square ([]) brackets. pd.concat() stacks the DataFrames in the list into one DataFrame.

*Specify that the DataFrames are stacked by rows (axis=0), so the 2011 rows are appended below the 2010 rows.

*Use the shape attribute to sense-check the DataFrame: the row count should equal 26850 + 379979 = 406829."""

# Combine the two DataFrames with the concat() function.
# The two DataFrames are transactions_2010 and transactions_2011.
transactions = pd.concat([transactions_2010, transactions_2011], axis=0)

# Determine how many rows.
print(f"{transactions.shape[0]} rows for all transactions")

# View the shape of the DataFrame (rows, columns).
transactions.shape

406829 rows for all transactions


(406829, 6)
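By default, pd.concat() keeps each input's original index, so after stacking the two years the labels 0, 1, 2, … appear twice. A minimal sketch with two small hypothetical DataFrames (t2010 and t2011 are illustrative stand-ins, not the real files) shows two optional parameters: ignore_index=True renumbers the rows, and keys= records which year each row came from. The final assertion mirrors the sense-check above.

```python
import pandas as pd

# Hypothetical stand-ins for transactions_2010 and transactions_2011.
t2010 = pd.DataFrame({"InvoiceNo": [536365, 536366], "Quantity": [6, 8]})
t2011 = pd.DataFrame({"InvoiceNo": [539993, 539994, 539995], "Quantity": [10, 25, 5]})

# Default behaviour: the original index labels are kept, so 0 and 1 repeat.
stacked = pd.concat([t2010, t2011], axis=0)
print(stacked.index.tolist())        # → [0, 1, 0, 1, 2]

# ignore_index=True builds a fresh 0..n-1 index instead.
renumbered = pd.concat([t2010, t2011], axis=0, ignore_index=True)
print(renumbered.index.tolist())     # → [0, 1, 2, 3, 4]

# keys= adds an outer index level that labels each row's source year.
by_year = pd.concat([t2010, t2011], keys=[2010, 2011])
print(by_year.loc[2011].shape)       # → (3, 2)

# Sense-check: stacking by rows adds the row counts together.
assert len(stacked) == len(t2010) + len(t2011)
```

The repeated labels do no harm in this section, because merging on columns (as the cells below do) ignores the input index and returns a fresh 0-based index.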

In [10]:
# Join the transactions and products DataFrames.

"""Follow these steps:
*Decide which tables to join and whether to join on a column or on the row index.
In this example, you combine the products.csv file with the transactions DataFrame you created earlier.
Both tables contain a StockCode column, so join on StockCode; the merge adds the Description column
from products.csv to each transaction.

*Create a new DataFrame with the name products and import the products.csv file with the pd.read_csv() function.

*Create another DataFrame (e.g. transactions_description) with pd.merge(), passing transactions and products.
Inside the brackets, specify the column to join on (on='StockCode') and the type of join (how='left').
The method form transactions.merge(products, on='StockCode', how='left') is equivalent.

*View the DataFrame with the head() function."""

# Read the CSV file from the current working directory.
products = pd.read_csv('products.csv')

# Use the left join to merge the two DataFrames.
transactions_description = pd.merge(transactions, products, on='StockCode', how='left')

# View the new DataFrame.
transactions_description.head()

,InvoiceNo,StockCode,Quantity,InvoiceDate,UnitPrice,CustomerID,Description
0,536365,85123A,6,2010-12-01 8:26,2.55,17850,CREAM HANGING HEART T-LIGHT HOLDER
1,536365,71053,6,2010-12-01 8:26,3.39,17850,WHITE METAL LANTERN
2,536365,84406B,8,2010-12-01 8:26,2.75,17850,CREAM CUPID HEARTS COAT HANGER
3,536365,84029G,6,2010-12-01 8:26,3.39,17850,KNITTED UNION FLAG HOT WATER BOTTLE
4,536365,84029E,6,2010-12-01 8:26,3.39,17850,RED WOOLLY HOTTIE WHITE HEART.
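A left join keeps every transaction, even when its StockCode has no match in products.csv; those rows get NaN in Description. Two optional pd.merge() parameters help check this: indicator=True adds a _merge column showing whether each row matched ('both') or not ('left_only'), and validate='many_to_one' raises a MergeError if a StockCode appears more than once in the products table, which would otherwise silently duplicate transactions. A minimal sketch with hypothetical data (the sample_ DataFrames and the StockCode "99999" are illustrative):

```python
import pandas as pd

# Hypothetical stand-ins: StockCode "99999" has no product entry.
sample_transactions = pd.DataFrame({
    "InvoiceNo": [536365, 536365, 536366],
    "StockCode": ["85123A", "71053", "99999"],
})
sample_products = pd.DataFrame({
    "StockCode": ["85123A", "71053"],
    "Description": ["CREAM HANGING HEART T-LIGHT HOLDER", "WHITE METAL LANTERN"],
})

merged = pd.merge(
    sample_transactions, sample_products,
    on="StockCode",
    how="left",
    indicator=True,            # adds a _merge column: 'both' or 'left_only'
    validate="many_to_one",    # each StockCode must be unique in the products table
)

# Rows whose StockCode was not found in the products table.
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched["StockCode"].tolist())   # → ['99999']

# A duplicated StockCode in the products table breaks the many-to-one assumption.
dup_products = pd.concat([sample_products, sample_products.iloc[[0]]], ignore_index=True)
try:
    pd.merge(sample_transactions, dup_products, on="StockCode", how="left",
             validate="many_to_one")
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
print(duplicate_detected)                # → True
```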


In [13]:
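# Join the transactions_description and customers DataFrames.

""" Follow these steps:
*Create a new DataFrame with the name customers and import the customers.csv file with the pd.read_csv() function.

*Create a new DataFrame with the name transactions_description_country.

*Merge transactions_description with customers using pd.merge().
Inside the brackets, specify the two DataFrames (transactions_description and customers),
the column to join on (on='CustomerID') and the type of join (how='left').
The method form transactions_description.merge(customers, on='CustomerID', how='left') is equivalent.

*View the DataFrame with the head() function."""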

# Read the CSV file from the current working directory.
customers = pd.read_csv('customers.csv')

# Use the left join to merge the two DataFrames.
transactions_description_country = pd.merge(transactions_description, customers,
                                            on='CustomerID', how='left')

# View the new DataFrame.
transactions_description_country.head()

,InvoiceNo,StockCode,Quantity,InvoiceDate,UnitPrice,CustomerID,Description,Country
0,536365,85123A,6,2010-12-01 8:26,2.55,17850,CREAM HANGING HEART T-LIGHT HOLDER,United Kingdom
1,536365,71053,6,2010-12-01 8:26,3.39,17850,WHITE METAL LANTERN,United Kingdom
2,536365,84406B,8,2010-12-01 8:26,2.75,17850,CREAM CUPID HEARTS COAT HANGER,United Kingdom
3,536365,84029G,6,2010-12-01 8:26,3.39,17850,KNITTED UNION FLAG HOT WATER BOTTLE,United Kingdom
4,536365,84029E,6,2010-12-01 8:26,3.39,17850,RED WOOLLY HOTTIE WHITE HEART.,United Kingdom
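The two merge cells call the function pd.merge(), but the same joins can be written with the DataFrame.merge() method and chained in one expression. A minimal sketch with hypothetical data (the sample_ DataFrames and CustomerID 12345 are illustrative) checks that the chained and function forms agree, that the left joins keep the row count, and that a transaction whose CustomerID has no customer record gets NaN in Country:

```python
import pandas as pd

# Hypothetical stand-ins; CustomerID 12345 has no entry in sample_customers.
sample_transactions = pd.DataFrame({
    "InvoiceNo": [536365, 536366, 536367],
    "StockCode": ["85123A", "71053", "85123A"],
    "CustomerID": [17850, 13047, 12345],
})
sample_products = pd.DataFrame({
    "StockCode": ["85123A", "71053"],
    "Description": ["CREAM HANGING HEART T-LIGHT HOLDER", "WHITE METAL LANTERN"],
})
sample_customers = pd.DataFrame({
    "CustomerID": [17850, 13047],
    "Country": ["United Kingdom", "United Kingdom"],
})

# Method form: chain both left joins.
combined = (
    sample_transactions
    .merge(sample_products, on="StockCode", how="left")
    .merge(sample_customers, on="CustomerID", how="left")
)

# Function form, as in the cells above.
step1 = pd.merge(sample_transactions, sample_products, on="StockCode", how="left")
same = pd.merge(step1, sample_customers, on="CustomerID", how="left")

print(combined.equals(same))               # → True
print(combined["Country"].isna().sum())    # → 1
```

Counting NaN values in the added column after each left join is a quick way to see how many rows found no match.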
