# Reporting using Pandas - Going Beyond Basics 

# Topics in this notebook

**1. Aggregating statistics grouped by category**

    1. Reading a .csv File - Online Store Sales Data
    2. Grouping the Data on the basis of Product Category
        1. Returning all the groups and row indexes
        2. Get unique group keys
        3. Filter data on the basis of group keys
        4. Returning first row, last row and nth row for each group
    3. Grouping the Data Based on Product Category and Sub-Category
        1. Returning all the groups and row indexes
        2. Get unique group keys
        3. Filter data on the basis of group keys
        4. Returning first row, last row and nth row for each group
    4. split-apply-combine
    5. Aggregation
        1. Built-in Aggregation Methods
        2. Aggregation with User-Defined Functions
        3. Applying different aggregation functions to DataFrame columns
    6. Filtration
        1. Built-in Filtration
        2. Filtration with User-Defined Functions
    7. Transformation
        1. Built-in Transformation
        2. Transformation with User-Defined Functions
**2. Solving a Case Study using groupby()**

    1. Reading a .csv File - Online Store Sales Data
    2. What are the different customer segments?
    3. How many sales records do we have in the dataset?
    4. What are the different product categories?
    5. How many days on average does it take for the products to get shipped?
    6. Are there more orders placed on weekends?
    7. What is the minimum order amount and maximum order amount?
    8. What is the revenue generated in the year 2017?
    9. Which customer contributed to the maximum revenue in 2017 and how much?
    10. Who is the customer with customer_id == TC-20980 ?
    11. Which region recorded maximum sales count?
    12. Which product category is doing best? (revenue and count)
    
**3. Analysing and Summarizing using pivot_table()**

    1. What is the region-wise revenue?
    2. What is the region-wise count of sales?
    3. What is the region-wise count and sum of sales?
    4. What is the region-wise revenue generated of each product category?
    5. What is the region-wise revenue generated of each product sub-category under product category?

# Aggregating statistics grouped by category

**Question: How to calculate summary statistics?**

**Answer:** Basic statistics (mean, median, min, max, counts…) are easy to calculate. These or custom aggregations can be applied to the entire data set, to a sliding window of the data, or to groups defined by categories. The last of these is also known as the split-apply-combine approach.
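As a minimal sketch of the three options named above, the cell below computes a mean over a whole column, over a sliding window, and per category. The `toy` DataFrame and its values are hypothetical and are not taken from the sales file used in this notebook.

```python
import pandas as pd

# Hypothetical toy data (assumption), not the notebook's sales file
toy = pd.DataFrame({
    "category": ["A", "A", "B", "B", "B"],
    "sales": [10.0, 20.0, 5.0, 15.0, 40.0],
})

# 1. Statistic over the entire data set
overall_mean = toy["sales"].mean()

# 2. Statistic over a sliding window of 2 consecutive rows
#    (the first row has no complete window, so it becomes NaN)
rolling_mean = toy["sales"].rolling(window=2).mean()

# 3. Statistic per category (split-apply-combine)
mean_by_category = toy.groupby("category")["sales"].mean()

print(overall_mean)
print(rolling_mean)
print(mean_by_category)
```

The grouped version is the one explored in the rest of this section.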

**Important Note**
    **groupby() and pivot_table()** are both very powerful for analysing and summarizing data. pivot_table() is especially useful when applying complex aggregation operations.

# Reading a .csv File - Online Store Sales Data

In [1]:
import numpy as np
import pandas as pd

In [4]:
df = pd.read_csv("online_store_sales (1).csv")

In [5]:
df.head()

,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country,City,State,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales
0,1,CA-2017-152156,08/11/2017,11/11/2017,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96
1,2,CA-2017-152156,08/11/2017,11/11/2017,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94
2,3,CA-2017-138688,12/06/2017,16/06/2017,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,California,90036.0,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62
3,4,US-2016-108966,11/10/2016,18/10/2016,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775
4,5,US-2016-108966,11/10/2016,18/10/2016,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.368


In [6]:
df.tail()

,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country,City,State,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales
9795,9796,CA-2017-125920,21/05/2017,28/05/2017,Standard Class,SH-19975,Sally Hughsby,Corporate,United States,Chicago,Illinois,60610.0,Central,OFF-BI-10003429,Office Supplies,Binders,"Cardinal HOLDit! Binder Insert Strips,Extra St...",3.798
9796,9797,CA-2016-128608,12/01/2016,17/01/2016,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,OFF-AR-10001374,Office Supplies,Art,"BIC Brite Liner Highlighters, Chisel Tip",10.368
9797,9798,CA-2016-128608,12/01/2016,17/01/2016,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,TEC-PH-10004977,Technology,Phones,GE 30524EE4,235.188
9798,9799,CA-2016-128608,12/01/2016,17/01/2016,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,TEC-PH-10000912,Technology,Phones,Anker 24W Portable Micro USB Car Charger,26.376
9799,9800,CA-2016-128608,12/01/2016,17/01/2016,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,TEC-AC-10000487,Technology,Accessories,SanDisk Cruzer 4 GB USB Flash Drive,10.384


In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9800 entries, 0 to 9799
Data columns (total 18 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Row ID         9800 non-null   int64  
 1   Order ID       9800 non-null   object 
 2   Order Date     9800 non-null   object 
 3   Ship Date      9800 non-null   object 
 4   Ship Mode      9800 non-null   object 
 5   Customer ID    9800 non-null   object 
 6   Customer Name  9800 non-null   object 
 7   Segment        9800 non-null   object 
 8   Country        9800 non-null   object 
 9   City           9800 non-null   object 
 10  State          9800 non-null   object 
 11  Postal Code    9789 non-null   float64
 12  Region         9800 non-null   object 
 13  Product ID     9800 non-null   object 
 14  Category       9800 non-null   object 
 15  Sub-Category   9800 non-null   object 
 16  Product Name   9800 non-null   object 
 17  Sales          9800 non-null   float64
dtypes: float

In [8]:
col_names = [ col.strip().lower().replace(' ', '_').replace('-', '_') for col in df.columns ]

df.columns = col_names

df.columns

Index(['row_id', 'order_id', 'order_date', 'ship_date', 'ship_mode',
       'customer_id', 'customer_name', 'segment', 'country', 'city', 'state',
       'postal_code', 'region', 'product_id', 'category', 'sub_category',
       'product_name', 'sales'],
      dtype='object')

# Grouping the Data on the basis of Product Category

In [9]:
grouped_df = df.groupby('category')

**Returning all the groups and row indexes**

The **groups** attribute is a dictionary whose keys are the computed unique groups and corresponding values are the axis labels belonging to each group.

In [10]:
grouped_df.groups

{'Furniture': [0, 1, 3, 5, 10, 23, 24, 27, 29, 36, 38, 39, 51, 52, 57, 65, 66, 72, 73, 76, 78, 85, 93, 96, 104, 110, 117, 119, 124, 125, 128, 129, 139, 140, 146, 149, 157, 167, 173, 177, 189, 192, 201, 204, 213, 222, 226, 228, 229, 231, 232, 234, 238, 239, 241, 242, 244, 249, 254, 272, 282, 292, 293, 294, 295, 301, 303, 304, 309, 310, 311, 313, 317, 325, 328, 338, 354, 362, 364, 369, 377, 384, 387, 399, 408, 412, 413, 415, 417, 422, 424, 425, 439, 440, 444, 446, 453, 456, 457, 462, ...], 'Office Supplies': [2, 4, 6, 8, 9, 12, 13, 14, 15, 16, 17, 18, 20, 21, 22, 25, 28, 30, 31, 32, 33, 34, 37, 42, 43, 45, 46, 49, 50, 53, 55, 56, 58, 60, 61, 63, 64, 67, 69, 70, 71, 74, 75, 77, 79, 80, 81, 82, 83, 84, 87, 88, 89, 91, 92, 94, 95, 97, 98, 99, 101, 102, 105, 108, 111, 112, 113, 114, 115, 116, 118, 120, 121, 122, 126, 127, 131, 132, 133, 134, 135, 136, 137, 138, 141, 142, 143, 144, 145, 150, 151, 153, 154, 155, 156, 158, 160, 162, 163, 164, ...], 'Technology': [7, 11, 19, 26, 35, 40, 41, 44, 

**Get unique group keys**

In [11]:
grouped_df.groups.keys()

dict_keys(['Furniture', 'Office Supplies', 'Technology'])
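The `groups` mapping is easier to inspect on a small frame. The following is a sketch on hypothetical toy data (the `toy` DataFrame is an assumption, not part of the sales file); it shows that each key maps to the row labels of that group, and that `ngroups` and `size()` summarize the same split.

```python
import pandas as pd

# Hypothetical toy data (assumption), small enough to read the full mapping
toy = pd.DataFrame({
    "category": ["Furniture", "Technology", "Furniture", "Technology"],
    "sales": [100.0, 250.0, 40.0, 60.0],
})
grouped = toy.groupby("category")

# Each group key maps to the row labels that belong to that group
group_rows = {key: list(labels) for key, labels in grouped.groups.items()}

# Number of distinct groups and number of rows in each group
n_groups = grouped.ngroups
group_sizes = grouped.size()

print(group_rows)
print(n_groups)
print(group_sizes)
```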

**Filter data on the basis of group keys**

In [12]:
# Selecting a group

grouped_df.get_group("Technology")

,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,postal_code,region,product_id,category,sub_category,product_name,sales
7,8,CA-2015-115812,09/06/2015,14/06/2015,Standard Class,BH-11710,Brosina Hoffman,Consumer,United States,Los Angeles,California,90032.0,West,TEC-PH-10002275,Technology,Phones,Mitel 5320 IP Phone VoIP phone,907.152
11,12,CA-2015-115812,09/06/2015,14/06/2015,Standard Class,BH-11710,Brosina Hoffman,Consumer,United States,Los Angeles,California,90032.0,West,TEC-PH-10002033,Technology,Phones,Konftel 250 Conference phone - Charcoal black,911.424
19,20,CA-2015-143336,27/08/2015,01/09/2015,Second Class,ZD-21925,Zuschuss Donatelli,Consumer,United States,San Francisco,California,94109.0,West,TEC-PH-10001949,Technology,Phones,Cisco SPA 501G IP Phone,213.480
26,27,CA-2017-121755,16/01/2017,20/01/2017,Second Class,EH-13945,Eric Hoffmann,Consumer,United States,Los Angeles,California,90049.0,West,TEC-AC-10003027,Technology,Accessories,Imation 8GB Mini TravelDrive USB 2.0 Flash Drive,90.570
35,36,CA-2017-117590,08/12/2017,10/12/2017,First Class,GH-14485,Gene Hale,Corporate,United States,Richardson,Texas,75080.0,Central,TEC-PH-10004977,Technology,Phones,GE 30524EE4,1097.544
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9780,9781,CA-2017-153178,14/09/2017,18/09/2017,Standard Class,CL-12565,Clay Ludtke,Consumer,United States,Long Beach,New York,11561.0,East,TEC-PH-10001944,Technology,Phones,Wi-Ex zBoost YX540 Cellular Phone Signal Booster,437.850
9789,9790,CA-2018-144491,27/03/2018,01/04/2018,Standard Class,CJ-12010,Caroline Jumper,Consumer,United States,Houston,Texas,77070.0,Central,TEC-AC-10004901,Technology,Accessories,Kensington SlimBlade Notebook Wireless Mouse w...,39.992
9797,9798,CA-2016-128608,12/01/2016,17/01/2016,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,TEC-PH-10004977,Technology,Phones,GE 30524EE4,235.188
9798,9799,CA-2016-128608,12/01/2016,17/01/2016,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,TEC-PH-10000912,Technology,Phones,Anker 24W Portable Micro USB Car Charger,26.376


**Returning first row, last row and nth row for each group**

In [13]:
grouped_df.first()

category,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,postal_code,region,product_id,sub_category,product_name,sales
Furniture,1,CA-2017-152156,08/11/2017,11/11/2017,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-BO-10001798,Bookcases,Bush Somerset Collection Bookcase,261.96
Office Supplies,3,CA-2017-138688,12/06/2017,16/06/2017,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,California,90036.0,West,OFF-LA-10000240,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62
Technology,8,CA-2015-115812,09/06/2015,14/06/2015,Standard Class,BH-11710,Brosina Hoffman,Consumer,United States,Los Angeles,California,90032.0,West,TEC-PH-10002275,Phones,Mitel 5320 IP Phone VoIP phone,907.152


In [14]:
grouped_df.last()

category,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,postal_code,region,product_id,sub_category,product_name,sales
Furniture,9793,CA-2015-127166,21/05/2015,23/05/2015,Second Class,KH-16360,Katherine Hughes,Consumer,United States,Houston,Texas,77070.0,Central,FUR-CH-10003396,Chairs,Global Deluxe Steno Chair,107.772
Office Supplies,9797,CA-2016-128608,12/01/2016,17/01/2016,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,OFF-AR-10001374,Art,"BIC Brite Liner Highlighters, Chisel Tip",10.368
Technology,9800,CA-2016-128608,12/01/2016,17/01/2016,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,TEC-AC-10000487,Accessories,SanDisk Cruzer 4 GB USB Flash Drive,10.384


In [15]:
grouped_df.nth(10)

,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,postal_code,region,product_id,category,sub_category,product_name,sales
17,18,CA-2015-167164,13/05/2015,15/05/2015,Second Class,AG-10270,Alejandro Grove,Consumer,United States,West Jordan,Utah,84084.0,West,OFF-ST-10000107,Office Supplies,Storage,Fellowes Super Stor/Drawer,55.5
38,39,CA-2016-117415,27/12/2016,31/12/2016,Standard Class,SN-20710,Steve Nguyen,Home Office,United States,Houston,Texas,77041.0,Central,FUR-BO-10002545,Furniture,Bookcases,"Atlantic Metals Mobile 3-Shelf Bookcases, Cust...",532.3992
54,55,CA-2017-105816,11/12/2017,17/12/2017,Standard Class,JM-15265,Janet Molinari,Corporate,United States,New York City,New York,10024.0,East,TEC-PH-10002447,Technology,Phones,AT&T CL83451 4-Handset Telephone,1029.95


# Grouping the Data Based on Product Category and Sub-Category

In [16]:
# Grouping based on category first and then sub_category

grouped_df = df.groupby(['category', 'sub_category'])

**Returning all the groups and row indexes**

In [17]:
# Returning each group and the row ids associated with the group

grouped_df.groups

{('Furniture', 'Bookcases'): [0, 27, 38, 189, 192, 213, 292, 354, 369, 399, 412, 468, 472, 485, 688, 708, 736, 783, 841, 906, 954, 1042, 1114, 1211, 1247, 1302, 1369, 1386, 1534, 1539, 1545, 1594, 1610, 1714, 1723, 1760, 1762, 1860, 1875, 1932, 2007, 2025, 2115, 2122, 2225, 2262, 2281, 2305, 2326, 2353, 2403, 2415, 2471, 2543, 2546, 2558, 2603, 2650, 2654, 2737, 2777, 2796, 2808, 2825, 2860, 3023, 3030, 3074, 3098, 3100, 3102, 3175, 3351, 3365, 3368, 3466, 3507, 3512, 3762, 3820, 3845, 3910, 3928, 3985, 3994, 3999, 4023, 4071, 4088, 4110, 4184, 4217, 4223, 4266, 4284, 4383, 4385, 4389, 4423, 4453, ...], ('Furniture', 'Chairs'): [1, 23, 39, 52, 57, 66, 72, 85, 124, 128, 149, 157, 167, 173, 177, 228, 229, 244, 249, 294, 310, 317, 328, 362, 413, 415, 417, 424, 439, 444, 456, 457, 498, 502, 526, 531, 539, 551, 569, 586, 622, 635, 657, 730, 769, 777, 787, 791, 799, 819, 829, 847, 880, 916, 960, 980, 983, 990, 1021, 1030, 1045, 1060, 1067, 1081, 1126, 1158, 1177, 1190, 1198, 1200, 1202, 1212

**Filter data on the basis of group keys**

In [18]:
grouped_df.get_group(('Technology', 'Phones'))

,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,postal_code,region,product_id,category,sub_category,product_name,sales
7,8,CA-2015-115812,09/06/2015,14/06/2015,Standard Class,BH-11710,Brosina Hoffman,Consumer,United States,Los Angeles,California,90032.0,West,TEC-PH-10002275,Technology,Phones,Mitel 5320 IP Phone VoIP phone,907.152
11,12,CA-2015-115812,09/06/2015,14/06/2015,Standard Class,BH-11710,Brosina Hoffman,Consumer,United States,Los Angeles,California,90032.0,West,TEC-PH-10002033,Technology,Phones,Konftel 250 Conference phone - Charcoal black,911.424
19,20,CA-2015-143336,27/08/2015,01/09/2015,Second Class,ZD-21925,Zuschuss Donatelli,Consumer,United States,San Francisco,California,94109.0,West,TEC-PH-10001949,Technology,Phones,Cisco SPA 501G IP Phone,213.480
35,36,CA-2017-117590,08/12/2017,10/12/2017,First Class,GH-14485,Gene Hale,Corporate,United States,Richardson,Texas,75080.0,Central,TEC-PH-10004977,Technology,Phones,GE 30524EE4,1097.544
40,41,CA-2016-117415,27/12/2016,31/12/2016,Standard Class,SN-20710,Steve Nguyen,Home Office,United States,Houston,Texas,77041.0,Central,TEC-PH-10000486,Technology,Phones,Plantronics HL10 Handset Lifter,371.168
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9764,9765,CA-2015-123855,18/06/2015,23/06/2015,Standard Class,MC-18100,Mick Crebagga,Consumer,United States,Los Angeles,California,90036.0,West,TEC-PH-10000215,Technology,Phones,Plantronics Cordless Phone Headset with In-lin...,139.800
9773,9774,CA-2017-160234,26/06/2017,03/07/2017,Standard Class,PF-19225,Phillip Flathmann,Consumer,United States,Atlanta,Georgia,30318.0,South,TEC-PH-10004434,Technology,Phones,Cisco IP Phone 7961G VoIP phone - Dark gray,135.950
9780,9781,CA-2017-153178,14/09/2017,18/09/2017,Standard Class,CL-12565,Clay Ludtke,Consumer,United States,Long Beach,New York,11561.0,East,TEC-PH-10001944,Technology,Phones,Wi-Ex zBoost YX540 Cellular Phone Signal Booster,437.850
9797,9798,CA-2016-128608,12/01/2016,17/01/2016,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,TEC-PH-10004977,Technology,Phones,GE 30524EE4,235.188
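When grouping by more than one column, each group key is a tuple with one entry per grouping column, which is why `get_group` receives a tuple above. A small sketch on hypothetical toy data (the `toy` DataFrame is an assumption) makes the key structure visible:

```python
import pandas as pd

# Hypothetical toy data (assumption) with two grouping columns
toy = pd.DataFrame({
    "category": ["Furniture", "Furniture", "Technology", "Technology"],
    "sub_category": ["Chairs", "Tables", "Phones", "Phones"],
    "sales": [100.0, 40.0, 250.0, 60.0],
})
grouped = toy.groupby(["category", "sub_category"])

# Keys are (category, sub_category) tuples
keys = sorted(grouped.groups.keys())

# Selecting one group requires the full tuple
phones = grouped.get_group(("Technology", "Phones"))

print(keys)
print(phones)
```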


**Returning first row, last row and nth row for each group**

In [19]:
grouped_df.first()

category,sub_category,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,postal_code,region,product_id,product_name,sales
Furniture,Bookcases,1,CA-2017-152156,08/11/2017,11/11/2017,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-BO-10001798,Bush Somerset Collection Bookcase,261.96
Furniture,Chairs,2,CA-2017-152156,08/11/2017,11/11/2017,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-CH-10000454,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94
Furniture,Furnishings,6,CA-2015-115812,09/06/2015,14/06/2015,Standard Class,BH-11710,Brosina Hoffman,Consumer,United States,Los Angeles,California,90032.0,West,FUR-FU-10001487,Eldon Expressions Wood and Plastic Desk Access...,48.86
Furniture,Tables,4,US-2016-108966,11/10/2016,18/10/2016,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,FUR-TA-10000577,Bretford CR4500 Series Slim Rectangular Table,957.5775
Office Supplies,Appliances,10,CA-2015-115812,09/06/2015,14/06/2015,Standard Class,BH-11710,Brosina Hoffman,Consumer,United States,Los Angeles,California,90032.0,West,OFF-AP-10002892,Belkin F5C206VTEL 6 Outlet Surge,114.9
Office Supplies,Art,7,CA-2015-115812,09/06/2015,14/06/2015,Standard Class,BH-11710,Brosina Hoffman,Consumer,United States,Los Angeles,California,90032.0,West,OFF-AR-10002833,Newell 322,7.28
Office Supplies,Binders,9,CA-2015-115812,09/06/2015,14/06/2015,Standard Class,BH-11710,Brosina Hoffman,Consumer,United States,Los Angeles,California,90032.0,West,OFF-BI-10003910,DXL Angle-View Binders with Locking Rings by S...,18.504
Office Supplies,Envelopes,31,US-2016-150630,17/09/2016,21/09/2016,Standard Class,TB-21520,Tracy Blumstein,Consumer,United States,Philadelphia,Pennsylvania,19140.0,East,OFF-EN-10001509,Poly String Tie Envelopes,3.264
Office Supplies,Fasteners,54,CA-2017-105816,11/12/2017,17/12/2017,Standard Class,JM-15265,Janet Molinari,Corporate,United States,New York City,New York,10024.0,East,OFF-FA-10000304,Advantus Push Pins,15.26
Office Supplies,Labels,3,CA-2017-138688,12/06/2017,16/06/2017,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,California,90036.0,West,OFF-LA-10000240,Self-Adhesive Address Labels for Typewriters b...,14.62


In [20]:
grouped_df.last()

category,sub_category,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,postal_code,region,product_id,product_name,sales
Furniture,Bookcases,9788,CA-2018-144491,27/03/2018,01/04/2018,Standard Class,CJ-12010,Caroline Jumper,Consumer,United States,Houston,Texas,77070.0,Central,FUR-BO-10001811,"Atlantic Metals Mobile 5-Shelf Bookcases, Cust...",1023.332
Furniture,Chairs,9793,CA-2015-127166,21/05/2015,23/05/2015,Second Class,KH-16360,Katherine Hughes,Consumer,United States,Houston,Texas,77070.0,Central,FUR-CH-10003396,Global Deluxe Steno Chair,107.772
Furniture,Furnishings,9785,CA-2016-149748,31/05/2016,02/06/2016,Second Class,EM-13825,Elizabeth Moffitt,Corporate,United States,Paterson,New Jersey,7501.0,East,FUR-FU-10001847,Eldon Image Series Black Desk Accessories,8.28
Furniture,Tables,9757,CA-2018-113705,27/03/2018,29/03/2018,Second Class,LC-16870,Lena Cacioppo,Consumer,United States,Richmond,Virginia,23223.0,South,FUR-TA-10002533,BPI Conference Tables,292.1
Office Supplies,Appliances,9780,CA-2015-169019,26/07/2015,30/07/2015,Standard Class,LF-17185,Luke Foster,Consumer,United States,San Antonio,Texas,78207.0,Central,OFF-AP-10003281,Acco 6 Outlet Guardian Standard Surge Suppressor,4.836
Office Supplies,Art,9797,CA-2016-128608,12/01/2016,17/01/2016,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,OFF-AR-10001374,"BIC Brite Liner Highlighters, Chisel Tip",10.368
Office Supplies,Binders,9796,CA-2017-125920,21/05/2017,28/05/2017,Standard Class,SH-19975,Sally Hughsby,Corporate,United States,Chicago,Illinois,60610.0,Central,OFF-BI-10003429,"Cardinal HOLDit! Binder Insert Strips,Extra St...",3.798
Office Supplies,Envelopes,9792,CA-2015-127166,21/05/2015,23/05/2015,Second Class,KH-16360,Katherine Hughes,Consumer,United States,Houston,Texas,77070.0,Central,OFF-EN-10003134,Staple envelope,56.064
Office Supplies,Fasteners,9702,CA-2017-105291,30/10/2017,04/11/2017,Standard Class,SP-20920,Susan Pistek,Consumer,United States,San Luis Obispo,California,93405.0,West,OFF-FA-10003059,Assorted Color Push Pins,3.62
Office Supplies,Labels,9754,CA-2018-113705,27/03/2018,29/03/2018,Second Class,LC-16870,Lena Cacioppo,Consumer,United States,Richmond,Virginia,23223.0,South,OFF-LA-10000476,Avery 05222 Permanent Self-Adhesive File Folde...,8.26


In [21]:
grouped_df.nth(10)

,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,postal_code,region,product_id,category,sub_category,product_name,sales
63,64,CA-2016-135545,24/11/2016,30/11/2016,Standard Class,KM-16720,Kunst Miller,Consumer,United States,Los Angeles,California,90004.0,West,OFF-BI-10001078,Office Supplies,Binders,"Acco PRESSTEX Data Binder with Storage Hooks, ...",25.824
84,85,US-2018-119662,13/11/2018,16/11/2018,First Class,CS-12400,Christopher Schild,Home Office,United States,Chicago,Illinois,60623.0,Central,OFF-ST-10003656,Office Supplies,Storage,Safco Industrial Wire Shelving,230.376
102,103,CA-2017-129903,01/12/2017,04/12/2017,Second Class,GZ-14470,Gary Zandusky,Consumer,United States,Rochester,Minnesota,55901.0,Central,OFF-PA-10004040,Office Supplies,Paper,Universal Premium White Copier/Laser Paper (20...,23.92
104,105,US-2016-156867,13/11/2016,17/11/2016,Standard Class,LC-16870,Lena Cacioppo,Consumer,United States,Aurora,Colorado,80013.0,West,FUR-FU-10004006,Furniture,Furnishings,"Deflect-o DuraMat Lighweight, Studded, Beveled...",102.36
107,108,CA-2018-119004,23/11/2018,28/11/2018,Standard Class,JM-15250,Janet Martin,Consumer,United States,Charlotte,North Carolina,28205.0,South,TEC-PH-10002844,Technology,Phones,Speck Products Candyshell Flip Case,27.992
111,112,CA-2017-128867,03/11/2017,10/11/2017,Standard Class,CL-12565,Clay Ludtke,Consumer,United States,Urbandale,Iowa,50322.0,Central,OFF-AR-10000380,Office Supplies,Art,"Hunt PowerHouse Electric Pencil Sharpener, Blue",75.96
149,150,CA-2017-114489,05/12/2017,09/12/2017,Standard Class,JE-16165,Justin Ellison,Corporate,United States,Franklin,Wisconsin,53132.0,Central,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",1951.84
161,162,CA-2016-119697,28/12/2016,31/12/2016,Second Class,EM-13960,Eric Murdock,Consumer,United States,Philadelphia,Pennsylvania,19134.0,East,TEC-AC-10003657,Technology,Accessories,Lenovo 17-Key USB Numeric Keypad,54.384
202,203,CA-2015-133690,03/08/2015,05/08/2015,First Class,BS-11755,Bruce Stewart,Consumer,United States,Denver,Colorado,80219.0,West,OFF-AP-10003622,Office Supplies,Appliances,"Bravo II Megaboss 12-Amp Hard Body Upright, Re...",2.6
269,270,US-2018-145366,09/12/2018,13/12/2018,Standard Class,CA-12310,Christine Abelman,Corporate,United States,Cincinnati,Ohio,45231.0,East,OFF-EN-10004386,Office Supplies,Envelopes,Recycled Interoffice Envelopes with String and...,57.576


**split-apply-combine**

Calculating a given statistic (e.g. mean sales) for each category in a column (e.g. each value in the category column) is a common pattern. The groupby method supports this type of operation. It fits the more general split-apply-combine pattern:

**Split** the data into groups

**Apply** a function to each group independently

**Combine** the results into a data structure

In the apply step, we might wish to do one of the following:

**Aggregation:** compute a summary statistic (or statistics) for each group. Some examples:

    Compute group sums or means.
    Compute group sizes / counts.

**Filtration:** discard some groups, according to a group-wise computation that evaluates to True or False. Some examples:

    Discard data that belong to groups with only a few members.
    Filter out data based on the group sum or mean.

**Transformation:** perform some group-specific computations and return a like-indexed object. Some examples:

    Standardize data (zscore) within a group.
    Fill NAs within groups with a value derived from each group.
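The three steps and the three kinds of apply step can be sketched on a small frame. The cell below first performs split-apply-combine by hand, then shows one aggregation, one filtration and one transformation using `groupby`. The `toy` DataFrame and the chosen thresholds are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical toy data (assumption)
toy = pd.DataFrame({
    "category": ["A", "A", "B", "B", "B"],
    "sales": [10.0, 20.0, 5.0, 15.0, 40.0],
})

# Manual split-apply-combine:
# split by iterating over the groups, apply sum() to each piece,
# combine the per-group results into a Series
pieces = {key: part["sales"].sum() for key, part in toy.groupby("category")}
manual_totals = pd.Series(pieces, name="sales")

grouped = toy.groupby("category")

# Aggregation: one summary value per group
totals = grouped["sales"].sum()

# Filtration: keep only the rows of groups with more than two members
big_groups = grouped.filter(lambda g: len(g) > 2)

# Transformation: subtract the group mean, result has the same index as toy
centered = grouped["sales"].transform(lambda s: s - s.mean())

print(manual_totals)
print(totals)
print(big_groups)
print(centered)
```

Aggregation returns one row per group, filtration returns a subset of the original rows, and transformation returns one value per original row.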

# Aggregation

**Built-in Aggregation Methods**

Many common aggregations are built into GroupBy objects as methods. Of the methods listed below, those marked with a * do not have a Cython-optimized implementation.

| Method | Description |
|---|---|
| **any()** | Compute whether any of the values in the groups are truthy |
| **all()** | Compute whether all of the values in the groups are truthy |
| **count()** | Compute the number of non-NA values in the groups |
| **cov()** \* | Compute the covariance of the groups |
| **first()** | Compute the first occurring value in each group |
| **idxmax()** \* | Compute the index of the maximum value in each group |
| **idxmin()** \* | Compute the index of the minimum value in each group |
| **last()** | Compute the last occurring value in each group |
| **max()** | Compute the maximum value in each group |
| **mean()** | Compute the mean of each group |
| **median()** | Compute the median of each group |
| **min()** | Compute the minimum value in each group |
| **nunique()** | Compute the number of unique values in each group |
| **prod()** | Compute the product of the values in each group |
| **quantile()** | Compute a given quantile of the values in each group |
| **sem()** | Compute the standard error of the mean of the values in each group |
| **size()** | Compute the number of values in each group |
| **skew()** \* | Compute the skew of the values in each group |
| **std()** | Compute the standard deviation of the values in each group |
| **sum()** | Compute the sum of the values in each group |
| **var()** | Compute the variance of the values in each group |
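Two entries in this list are easy to confuse: `count()` counts only non-NA values, while `size()` counts every row in the group. The sketch below shows the difference on hypothetical toy data with one missing value; the `toy` DataFrame is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (assumption); group "A" contains one missing sale
toy = pd.DataFrame({
    "category": ["A", "A", "B"],
    "sales": [10.0, np.nan, 5.0],
})
grouped_sales = toy.groupby("category")["sales"]

# size(): number of rows per group, missing values included
sizes = grouped_sales.size()

# count(): number of non-NA values per group
counts = grouped_sales.count()

print(sizes)
print(counts)
```

On a column without missing values, the two methods return the same numbers.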

In [22]:
grouped_df = df.groupby('category')

In [23]:
grouped_df['category'].count()

category
Furniture          2078
Office Supplies    5909
Technology         1813
Name: category, dtype: int64

In [27]:
grouped_df['sales'].min()

category
Furniture          1.892
Office Supplies    0.444
Technology         0.990
Name: sales, dtype: float64

In [28]:
grouped_df['sales'].max()

category
Furniture           4416.174
Office Supplies     9892.740
Technology         22638.480
Name: sales, dtype: float64

In [29]:
grouped_df['sales'].mean()

category
Furniture          350.653790
Office Supplies    119.381001
Technology         456.401474
Name: sales, dtype: float64

In [30]:
grouped_df = df.groupby(['category', 'sub_category'])

In [31]:
grouped_df['sub_category'].count()

category         sub_category
Furniture        Bookcases        226
                 Chairs           607
                 Furnishings      931
                 Tables           314
Office Supplies  Appliances       459
                 Art              785
                 Binders         1492
                 Envelopes        248
                 Fasteners        214
                 Labels           357
                 Paper           1338
                 Storage          832
                 Supplies         184
Technology       Accessories      756
                 Copiers           66
                 Machines         115
                 Phones           876
Name: sub_category, dtype: int64

In [32]:
grouped_df['sales'].min()

category         sub_category
Furniture        Bookcases        35.490
                 Chairs           26.640
                 Furnishings       1.892
                 Tables           24.368
Office Supplies  Appliances        0.444
                 Art               1.344
                 Binders           0.556
                 Envelopes         1.632
                 Fasteners         1.240
                 Labels            2.088
                 Paper             3.380
                 Storage           4.464
                 Supplies          1.744
Technology       Accessories       0.990
                 Copiers         299.990
                 Machines         11.560
                 Phones            2.970
Name: sales, dtype: float64

In [33]:
grouped_df['sales'].max()

category         sub_category
Furniture        Bookcases        4404.900
                 Chairs           4416.174
                 Furnishings      1336.440
                 Tables           4297.644
Office Supplies  Appliances       2625.120
                 Art              1113.024
                 Binders          9892.740
                 Envelopes         604.656
                 Fasteners          93.360
                 Labels            786.480
                 Paper             733.950
                 Storage          2934.330
                 Supplies         8187.650
Technology       Accessories      3347.370
                 Copiers         17499.950
                 Machines        22638.480
                 Phones           4548.810
Name: sales, dtype: float64

In [34]:
grouped_df['sales'].mean()

category         sub_category
Furniture        Bookcases        503.598224
                 Chairs           531.833165
                 Furnishings       95.823865
                 Tables           645.893720
Office Supplies  Appliances       227.926804
                 Art               34.019631
                 Binders          134.067550
                 Envelopes         65.032444
                 Fasteners         14.027850
                 Labels            34.587468
                 Paper             57.420257
                 Storage          263.633885
                 Supplies         252.284283
Technology       Accessories      217.178175
                 Copiers         2215.880212
                 Machines        1645.553313
                 Phones           374.180877
Name: sales, dtype: float64

In [35]:
grouped_df['sales'].idxmax()

category         sub_category
Furniture        Bookcases       9741
                 Chairs          7243
                 Furnishings     7387
                 Tables          9639
Office Supplies  Appliances      7579
                 Art               67
                 Binders         9039
                 Envelopes       2516
                 Fasteners       8006
                 Labels          1621
                 Paper           3262
                 Storage         3070
                 Supplies        2505
Technology       Accessories      251
                 Copiers         6826
                 Machines        2697
                 Phones          2492
Name: sales, dtype: int64

In [36]:
df.loc[2492]

row_id                      2493
order_id          CA-2015-144624
order_date            19/11/2015
ship_date             23/11/2015
ship_mode         Standard Class
customer_id             JM-15865
customer_name        John Murray
segment                 Consumer
country            United States
city                   Jamestown
state                   New York
postal_code              14701.0
region                      East
product_id       TEC-PH-10002885
category              Technology
sub_category              Phones
product_name      Apple iPhone 5
sales                    4548.81
Name: 2492, dtype: object

In [37]:
df.loc[2697]

row_id                                                        2698
order_id                                            CA-2015-145317
order_date                                              18/03/2015
ship_date                                               23/03/2015
ship_mode                                           Standard Class
customer_id                                               SM-20320
customer_name                                          Sean Miller
segment                                                Home Office
country                                              United States
city                                                  Jacksonville
state                                                      Florida
postal_code                                                32216.0
region                                                       South
product_id                                         TEC-MA-10002412
category                                                Techno

**Aggregation with User-Defined Functions**

In [38]:
grouped_df['sales'].agg(["min", "mean", "max"])

category,sub_category,min,mean,max
Furniture,Bookcases,35.49,503.598224,4404.9
Furniture,Chairs,26.64,531.833165,4416.174
Furniture,Furnishings,1.892,95.823865,1336.44
Furniture,Tables,24.368,645.89372,4297.644
Office Supplies,Appliances,0.444,227.926804,2625.12
Office Supplies,Art,1.344,34.019631,1113.024
Office Supplies,Binders,0.556,134.06755,9892.74
Office Supplies,Envelopes,1.632,65.032444,604.656
Office Supplies,Fasteners,1.24,14.02785,93.36
Office Supplies,Labels,2.088,34.587468,786.48


In [39]:
grouped_df['sales'].agg(lambda values : min(values))

category         sub_category
Furniture        Bookcases        35.490
                 Chairs           26.640
                 Furnishings       1.892
                 Tables           24.368
Office Supplies  Appliances        0.444
                 Art               1.344
                 Binders           0.556
                 Envelopes         1.632
                 Fasteners         1.240
                 Labels            2.088
                 Paper             3.380
                 Storage           4.464
                 Supplies          1.744
Technology       Accessories       0.990
                 Copiers         299.990
                 Machines         11.560
                 Phones            2.970
Name: sales, dtype: float64
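
The lambda above only wraps the built-in min, so it does not show what a UDF adds. As a minimal sketch, assuming a small made-up frame (`toy`) and a hypothetical helper `sales_range` that are not part of the notebook, a named function can compute something the built-ins do not offer directly:

```python
import pandas as pd

# Hypothetical miniature of the sales data, for illustration only.
toy = pd.DataFrame({
    "category": ["Furniture", "Furniture", "Technology", "Technology"],
    "sub_category": ["Chairs", "Chairs", "Phones", "Phones"],
    "sales": [26.64, 531.83, 2.97, 374.18],
})

def sales_range(values):
    """Spread between the largest and smallest sale in one group."""
    return values.max() - values.min()

toy_grouped = toy.groupby(["category", "sub_category"])

# agg() calls the UDF once per group and collects one scalar per group.
spread = toy_grouped["sales"].agg(sales_range)
print(spread)
```

The same function can sit next to string aliases in a list, for example `agg(["min", "max", sales_range])`.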

**Applying different aggregation functions to DataFrame columns**

In [40]:
grouped_df.agg({'order_date' : ['min', 'max'], 'sales': ['mean', 'std']})

category,sub_category,order_date (min),order_date (max),sales (mean),sales (std)
Furniture,Bookcases,01/03/2016,31/12/2015,503.598224,641.41928
Furniture,Chairs,01/01/2018,31/12/2015,531.833165,551.180296
Furniture,Furnishings,01/01/2018,31/12/2016,95.823865,148.42149
Furniture,Tables,01/03/2015,31/10/2015,645.89372,598.584981
Office Supplies,Appliances,01/01/2018,31/08/2018,227.926804,378.006735
Office Supplies,Art,01/01/2018,31/12/2017,34.019631,60.301752
Office Supplies,Binders,01/01/2018,31/12/2017,134.06755,568.09997
Office Supplies,Envelopes,01/04/2018,31/12/2015,65.032444,85.170691
Office Supplies,Fasteners,01/03/2015,31/12/2016,14.02785,12.466864
Office Supplies,Labels,01/04/2015,31/12/2015,34.587468,74.802711
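
In the output above, order_date appears to be stored as text, so min and max compare the strings character by character. That is why a "min" of 01/03/2016 can sit next to a "max" of 31/12/2015. A minimal sketch of the difference, using a made-up three-row frame (`toy`) rather than the real dataset:

```python
import pandas as pd

# Hypothetical rows, for illustration only.
toy = pd.DataFrame({
    "category": ["Furniture", "Furniture", "Furniture"],
    "order_date": ["01/03/2016", "31/12/2015", "15/06/2017"],
})

# As strings, min/max are lexicographic, not chronological.
as_text = toy.groupby("category")["order_date"].agg(["min", "max"])

# Parsed as day-first dates, min/max become chronological.
toy["order_date"] = pd.to_datetime(toy["order_date"], format="%d/%m/%Y")
as_dates = toy.groupby("category")["order_date"].agg(["min", "max"])

print(as_text)
print(as_dates)
```

Parsing the date columns before grouping (for example with `parse_dates` in `read_csv`, as done later in the case study) avoids the problem.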


# Filtration

A filtration is a GroupBy operation that subsets the original grouping object. It may filter out entire groups, parts of groups, or both. Filtrations return a filtered version of the calling object, including the grouping columns when provided. In the examples below, the grouping column category is included in the result.

**Built-in Filtration**

**Method	Description**

**head()**	Select the top row(s) of each group

**nth()**	Select the nth row(s) of each group

**tail()**	Select the bottom row(s) of each group
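
As a rough sketch of the three methods in the table, assuming a made-up five-row frame (`toy`) that is not part of the notebook:

```python
import pandas as pd

# Hypothetical rows, for illustration only.
toy = pd.DataFrame({
    "category": ["Furniture", "Technology", "Furniture", "Technology", "Furniture"],
    "sales": [10.0, 20.0, 30.0, 40.0, 50.0],
})

toy_grouped = toy.groupby("category")

first_rows = toy_grouped.head(1)   # first row of each group, original row order kept
last_rows = toy_grouped.tail(1)    # last row of each group, original row order kept
second_rows = toy_grouped.nth(1)   # second row (position 1) of each group

print(first_rows)
print(last_rows)
print(second_rows)
```

Note that head() and tail() keep the original row labels. The index returned by nth() differs between pandas versions, so it is safer not to rely on it.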

**Filtration with User-Defined Functions**

The **filter** method takes a User-Defined Function (UDF) that, when applied to an entire group, returns either **True or False.** The result of the **filter** method is then the subset of groups for which the UDF returned True.

In [41]:
grouped_df = df.groupby('category')

In [42]:
grouped_df['category'].count()

category
Furniture          2078
Office Supplies    5909
Technology         1813
Name: category, dtype: int64

In [43]:
grouped_df['sales'].mean()

category
Furniture          350.653790
Office Supplies    119.381001
Technology         456.401474
Name: sales, dtype: float64

In [44]:
grouped_df.filter(lambda group: group['sales'].mean() > 200)

Unnamed: 0,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,postal_code,region,product_id,category,sub_category,product_name,sales
0,1,CA-2017-152156,08/11/2017,11/11/2017,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.9600
1,2,CA-2017-152156,08/11/2017,11/11/2017,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.9400
3,4,US-2016-108966,11/10/2016,18/10/2016,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775
5,6,CA-2015-115812,09/06/2015,14/06/2015,Standard Class,BH-11710,Brosina Hoffman,Consumer,United States,Los Angeles,California,90032.0,West,FUR-FU-10001487,Furniture,Furnishings,Eldon Expressions Wood and Plastic Desk Access...,48.8600
7,8,CA-2015-115812,09/06/2015,14/06/2015,Standard Class,BH-11710,Brosina Hoffman,Consumer,United States,Los Angeles,California,90032.0,West,TEC-PH-10002275,Technology,Phones,Mitel 5320 IP Phone VoIP phone,907.1520
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9790,9791,CA-2018-144491,27/03/2018,01/04/2018,Standard Class,CJ-12010,Caroline Jumper,Consumer,United States,Houston,Texas,77070.0,Central,FUR-CH-10001714,Furniture,Chairs,"Global Leather & Oak Executive Chair, Burgundy",211.2460
9792,9793,CA-2015-127166,21/05/2015,23/05/2015,Second Class,KH-16360,Katherine Hughes,Consumer,United States,Houston,Texas,77070.0,Central,FUR-CH-10003396,Furniture,Chairs,Global Deluxe Steno Chair,107.7720
9797,9798,CA-2016-128608,12/01/2016,17/01/2016,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,TEC-PH-10004977,Technology,Phones,GE 30524EE4,235.1880
9798,9799,CA-2016-128608,12/01/2016,17/01/2016,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,TEC-PH-10000912,Technology,Phones,Anker 24W Portable Micro USB Car Charger,26.3760


# Transformation

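A transformation is a GroupBy operation whose result is indexed the same as the object being grouped. Unlike aggregations, the groupings that are used to split the original object are not included in the result.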

In [45]:
df.head()

Unnamed: 0,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,postal_code,region,product_id,category,sub_category,product_name,sales
0,1,CA-2017-152156,08/11/2017,11/11/2017,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96
1,2,CA-2017-152156,08/11/2017,11/11/2017,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94
2,3,CA-2017-138688,12/06/2017,16/06/2017,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,California,90036.0,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62
3,4,US-2016-108966,11/10/2016,18/10/2016,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775
4,5,US-2016-108966,11/10/2016,18/10/2016,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.368


In [48]:
grouped_df = df.groupby('category')

In [49]:
grouped_df.cumsum()

NotImplementedError: function is not implemented for this dtype: [how->cumsum,dtype->object]
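
The error above comes from asking cumsum() to run over every remaining column, including the text (object) ones. One way around it, sketched here on a made-up frame (`toy`) rather than the real data, is to select the numeric column before calling the transformation:

```python
import pandas as pd

# Hypothetical rows, for illustration only.
toy = pd.DataFrame({
    "category": ["Furniture", "Technology", "Furniture", "Technology"],
    "product_name": ["Chair", "Phone", "Table", "Copier"],
    "sales": [100.0, 50.0, 25.0, 10.0],
})

# Select the numeric column first, then compute the running total per category.
running_total = toy.groupby("category")["sales"].cumsum()
print(running_total)
```

The result has one value per original row and keeps the original index, which is what distinguishes a transformation from an aggregation.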

**Built-in Transformation**

**Method** --> **Description**

**bfill()** --> Back fill NA values within each group

**cumcount()** --> Compute the cumulative count within each group

**cummax()** --> Compute the cumulative max within each group

**cummin()** --> Compute the cumulative min within each group

**cumprod()** --> Compute the cumulative product within each group

**cumsum()** --> Compute the cumulative sum within each group

**diff()** --> Compute the difference between adjacent values within each group

**ffill()** --> Forward fill NA values within each group

**fillna()** --> Fill NA values within each group

**pct_change()** --> Compute the percent change between adjacent values within each group

**rank()** --> Compute the rank of each value within each group

**shift()** --> Shift values up or down within each group

**Transformation with User-Defined Functions**

Similar to the aggregation method, the **transform()** method can accept string aliases to the built-in transformation methods in the previous section. It can also accept string aliases to the built-in aggregation methods. When an aggregation method is provided, the result will be broadcast across the group.

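In addition to string aliases, the transform() method can also accept User-Defined Functions (UDFs). The UDF must return a result that is either the same size as the group chunk or broadcastable to that size (for example, a scalar), must operate column by column on the group chunk, and must not modify the group chunk in place.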

**Note:** Transforming by supplying **transform** with a UDF is often less performant than using the built-in methods on GroupBy. Consider breaking up a complex operation into a chain of operations that utilize the built-in methods.
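
A small sketch of both styles, assuming a made-up frame (`toy`) and illustrative column names that are not in the notebook. The string alias broadcasts an aggregate back to every row, and the UDF computes each row's share of its category total:

```python
import pandas as pd

# Hypothetical rows, for illustration only.
toy = pd.DataFrame({
    "category": ["Furniture", "Furniture", "Technology", "Technology"],
    "sales": [100.0, 300.0, 50.0, 150.0],
})

sales_by_category = toy.groupby("category")["sales"]

# String alias of an aggregation: the group mean is broadcast across the group.
toy["category_mean"] = sales_by_category.transform("mean")

# UDF: returns a result the same size as the group chunk.
toy["share_udf"] = sales_by_category.transform(lambda s: s / s.sum())

# Same result as a chain of built-ins, which is usually faster.
toy["share_builtin"] = toy["sales"] / sales_by_category.transform("sum")

print(toy)
```

The last two columns hold the same numbers, which illustrates the advice in the note: prefer a chain of built-in methods when one exists.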

# Solving a Case Study using Groupby

**Reading a .csv File - Online Store Sales Data**

In [51]:
df = pd.read_csv('online_store_sales (1).csv', parse_dates=["Order Date", "Ship Date"], dayfirst=True)

df.head()

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country,City,State,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales
0,1,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96
1,2,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94
2,3,CA-2017-138688,2017-06-12,2017-06-16,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,California,90036.0,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62
3,4,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775
4,5,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.368


**What comes to my mind immediately after looking at the dataset?**

    What are the different customer segments?
    How many sales records do we have in the dataset?
    What are the different product categories?
    How many days on average does it take for the products to get shipped?
    Are there more orders placed on weekends?
    What is the minimum order amount and maximum order amount?
    Which customer contributed to the maximum revenue in 2017 and how much?
    What is the revenue generated in the year 2017?
    Which region recorded maximum sales count?
    Which product category is doing best? (revenue and count)

**Let's try to answer all the questions.**

In [52]:
col_names = [ col.strip().lower().replace(' ', '_').replace('-', '_') for col in df.columns ]

df.columns = col_names

df.columns

Index(['row_id', 'order_id', 'order_date', 'ship_date', 'ship_mode',
       'customer_id', 'customer_name', 'segment', 'country', 'city', 'state',
       'postal_code', 'region', 'product_id', 'category', 'sub_category',
       'product_name', 'sales'],
      dtype='object')

**What are the different customer segments?**

In [53]:
print("Customer Segments:\n", df['segment'].unique())

Customer Segments:
 ['Consumer' 'Corporate' 'Home Office']


In [25]:
df["segment"].nunique()

3

**How many sales records do we have in the dataset?**

In [26]:
df.shape

(9800, 18)

In [27]:
df.shape[0]

9800

**What are the different product categories?**

In [14]:
df["category"].unique()

array(['Furniture', 'Office Supplies', 'Technology'], dtype=object)

**How many days on average does it take for the products to get shipped?**

In [28]:
df["shipped_days"] = df["ship_date"] - df["order_date"]

In [47]:
df["shipped_days"] = df["shipped_days"].dt.days

In [48]:
df["shipped_days"].mean()

3.9611224489795918
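
The two cells above can also be collapsed into one step. A minimal sketch on a made-up two-row frame (`toy`), assuming both date columns are already parsed as datetimes:

```python
import pandas as pd

# Hypothetical rows, for illustration only.
toy = pd.DataFrame({
    "order_date": pd.to_datetime(["2017-11-08", "2016-10-11"]),
    "ship_date": pd.to_datetime(["2017-11-11", "2016-10-18"]),
})

# Subtracting datetimes gives a Timedelta column; .dt.days extracts whole days.
toy["shipped_days"] = (toy["ship_date"] - toy["order_date"]).dt.days

average_days = toy["shipped_days"].mean()
print(average_days)
```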

**What is the revenue generated in the year 2017?**

In [52]:
df["ship_year"] = df["ship_date"].dt.year
group_by = df.groupby("ship_year")

In [53]:
group_by["sales"].sum()

ship_year
2015    467041.8731
2016    467302.3884
2017    602306.3450
2018    719726.4794
2019      5159.6968
Name: sales, dtype: float64

In [70]:
group_by['ship_date'].count()

order_year
2015    1953
2016    2055
2017    2534
2018    3258
Name: ship_date, dtype: int64

In [56]:
df["order_year"] = df["order_date"].dt.year
group_by1 = df.groupby("order_year")

In [57]:
group_by1["sales"].sum()

order_year
2015    479856.2081
2016    459436.0054
2017    600192.5500
2018    722052.0192
Name: sales, dtype: float64

In [74]:
group_by1["order_date"].count()

order_year
2015    1953
2016    2055
2017    2534
2018    3258
Name: order_date, dtype: int64
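
To read off a single year and the customer who contributed most to it, one possible approach is to filter first and then group. The sketch below uses a made-up frame (`toy`) with placeholder customer names, and it assumes that "revenue in 2017" means orders placed in 2017 (by order_date), which is one of the two readings computed above:

```python
import pandas as pd

# Hypothetical rows and customer names, for illustration only.
toy = pd.DataFrame({
    "order_date": pd.to_datetime(["2017-01-05", "2017-06-12", "2016-10-11", "2017-11-08"]),
    "customer_name": ["Customer A", "Customer B", "Customer A", "Customer B"],
    "sales": [100.0, 40.0, 999.0, 70.0],
})

# Keep only the rows ordered in 2017.
orders_2017 = toy[toy["order_date"].dt.year == 2017]
revenue_2017 = orders_2017["sales"].sum()

# Revenue per customer within 2017, then the largest contributor.
revenue_by_customer = orders_2017.groupby("customer_name")["sales"].sum()
top_customer = revenue_by_customer.idxmax()
top_revenue = revenue_by_customer.max()

print(revenue_2017, top_customer, top_revenue)
```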

**Which weekday generated the maximum revenue?**

In [29]:
df["day_names"] = df["order_date"].dt.day_name()

In [30]:
groupby_df = df.groupby("day_names")

In [37]:
groupby_df["sales"].sum().sort_values(ascending= False)

day_names
Saturday     420901.4763
Tuesday      420535.9243
Sunday       377868.7779
Monday       348791.5516
Wednesday    315888.9722
Friday       234710.8402
Thursday     142839.2402
Name: sales, dtype: float64

In [41]:
df.pivot_table(values ="sales",
               index="day_names",
               aggfunc=["sum"])

day_names,sales (sum)
Friday,234710.8402
Monday,348791.5516
Saturday,420901.4763
Sunday,377868.7779
Thursday,142839.2402
Tuesday,420535.9243
Wednesday,315888.9722
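
The revenue per weekday does not directly say whether more orders are placed on weekends. A hedged sketch of one way to count that, using a made-up frame (`toy`) and an illustrative `day_type` column that is not in the notebook; distinct order IDs are counted so that an order with several line items is counted once:

```python
import pandas as pd

# Hypothetical rows, for illustration only.
# 2017-11-11 is a Saturday, 2017-11-12 a Sunday, 2017-11-08 a Wednesday.
toy = pd.DataFrame({
    "order_id": ["O1", "O1", "O2", "O3"],
    "order_date": pd.to_datetime(["2017-11-11", "2017-11-11", "2017-11-08", "2017-11-12"]),
})

# dayofweek is 0 for Monday through 6 for Sunday, so 5 and 6 are the weekend.
toy["day_type"] = toy["order_date"].dt.dayofweek.map(
    lambda day: "weekend" if day >= 5 else "weekday"
)

# Count distinct orders, not rows.
orders_by_day_type = toy.groupby("day_type")["order_id"].nunique()
print(orders_by_day_type)
```

Because a week has five weekdays and two weekend days, comparing the average number of orders per day in each bucket may be fairer than comparing the raw counts.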


**What is the minimum order amount and maximum order amount?**

In [81]:
gp_df = df.groupby("order_id")

In [83]:
gp_df["sales"].min()

order_id
CA-2015-100006    377.970
CA-2015-100090    196.704
CA-2015-100293     91.056
CA-2015-100328      3.928
CA-2015-100363      2.368
                   ...   
US-2018-168802     18.368
US-2018-169320     11.680
US-2018-169488     16.900
US-2018-169502     21.810
US-2018-169551     13.392
Name: sales, Length: 4922, dtype: float64

In [84]:
gp_df["sales"].max()

order_id
CA-2015-100006    377.970
CA-2015-100090    502.488
CA-2015-100293     91.056
CA-2015-100328      3.928
CA-2015-100363     19.008
                   ...   
US-2018-168802     18.368
US-2018-169320    159.750
US-2018-169488     39.960
US-2018-169502     91.600
US-2018-169551    683.988
Name: sales, Length: 4922, dtype: float64



What just happened? 🤯

This is not what I expected. 😥

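Always remember the basics: groupby splits the data, the aggregation is applied to each group, and the results are combined. Here min() and max() returned the smallest and largest line item within each order, not the smallest and largest order amount. To get order amounts, first sum the sales within each order, then take the min and max of those totals.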


In [85]:
gp_df["sales"].sum()

order_id
CA-2015-100006     377.970
CA-2015-100090     699.192
CA-2015-100293      91.056
CA-2015-100328       3.928
CA-2015-100363      21.376
                    ...   
US-2018-168802      18.368
US-2018-169320     171.430
US-2018-169488      56.860
US-2018-169502     113.410
US-2018-169551    1344.838
Name: sales, Length: 4922, dtype: float64

In [86]:
order_id_sum = gp_df["sales"].sum()

In [92]:
order_id_sum.head()

order_id
CA-2015-100006    377.970
CA-2015-100090    699.192
CA-2015-100293     91.056
CA-2015-100328      3.928
CA-2015-100363     21.376
Name: sales, dtype: float64

In [93]:
order_id_sum = order_id_sum.reset_index()

In [95]:
order_id_sum

Unnamed: 0,order_id,sales
0,CA-2015-100006,377.970
1,CA-2015-100090,699.192
2,CA-2015-100293,91.056
3,CA-2015-100328,3.928
4,CA-2015-100363,21.376
...,...,...
4917,US-2018-168802,18.368
4918,US-2018-169320,171.430
4919,US-2018-169488,56.860
4920,US-2018-169502,113.410


In [96]:
order_id_sum["sales"].min()

0.556

In [97]:
order_id_sum["sales"].max()

23661.228
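
The same two-step logic (total per order, then min and max across orders) can be written as one chain. A minimal sketch on a made-up frame (`toy`), not the real dataset:

```python
import pandas as pd

# Hypothetical rows, for illustration only: three orders, five line items.
toy = pd.DataFrame({
    "order_id": ["O1", "O1", "O2", "O3", "O3"],
    "sales": [10.0, 5.0, 2.0, 100.0, 50.0],
})

# Step 1: one total per order. Step 2: min and max across those totals.
order_totals = toy.groupby("order_id")["sales"].sum()
order_amounts = order_totals.agg(["min", "max"])

print(order_amounts)
```

Here the largest single line item is 100.0, but the largest order amount is 150.0, which is the distinction the cells above ran into.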

# Analysing and Summarizing using pivot_table()

**What is the region-wise revenue and count of sales?**

In [43]:
df.pivot_table(values = 'sales',
               index =["region"],
               aggfunc=["count"])

region,sales (count)
Central,2277
East,2785
South,1598
West,3140


In [54]:
df.pivot_table(values="sales", 
               index=["region"],
               margins=True, 
               aggfunc="sum").round(2)

region,sales
Central,492646.91
East,669518.73
South,389151.46
West,710219.68
All,2261536.78


In [42]:
df["region"].value_counts()

region
West       3140
East       2785
Central    2277
South      1598
Name: count, dtype: int64

In [45]:
df.pivot_table(values = 'sales',
               index =["region"],
               aggfunc= "sum")

region,sales
Central,492646.9132
East,669518.726
South,389151.459
West,710219.6845


In [55]:
df.pivot_table(values="sales", 
               index=["region"], 
               aggfunc="sum").apply(lambda values: values*100/sum(values))

region,sales
Central,21.783723
East,29.604591
South,17.20739
West,31.404295


**What is the region-wise revenue generated by each product category?**

In [45]:
df.pivot_table(values = 'sales',
               index =["region"],
               columns = ["category"],
               aggfunc=["sum"])

region,Furniture (sum),Office Supplies (sum),Technology (sum)
Central,160317.4622,163590.243,168739.208
East,206461.388,199940.811,263116.527
South,116531.48,124424.771,148195.208
West,245348.2455,217466.509,247404.93


In [47]:
gp = df.groupby(["category","region"])
gp["sales"].sum()

category         region 
Furniture        Central    160317.4622
                 East       206461.3880
                 South      116531.4800
                 West       245348.2455
Office Supplies  Central    163590.2430
                 East       199940.8110
                 South      124424.7710
                 West       217466.5090
Technology       Central    168739.2080
                 East       263116.5270
                 South      148195.2080
                 West       247404.9300
Name: sales, dtype: float64
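
The pivot table and the groupby above hold the same numbers in different shapes. As a sketch of that relationship on a made-up frame (`toy`), unstacking one level of the grouped result gives the same grid as pivot_table:

```python
import pandas as pd

# Hypothetical rows, for illustration only.
toy = pd.DataFrame({
    "region": ["East", "East", "West", "West", "East"],
    "category": ["Furniture", "Technology", "Furniture", "Technology", "Furniture"],
    "sales": [10.0, 20.0, 30.0, 40.0, 5.0],
})

# Wide layout straight from pivot_table.
via_pivot = toy.pivot_table(values="sales", index="region",
                            columns="category", aggfunc="sum")

# Long layout from groupby, then moved to wide by unstacking the category level.
via_groupby = toy.groupby(["region", "category"])["sales"].sum().unstack("category")

print(via_pivot)
print(via_groupby)
```

Which form to use is mostly a matter of taste: the long Series is convenient for further chaining, and the wide table is easier to read in a report.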

**What is the region-wise revenue generated by each product sub-category under each product category?**

In [50]:
# Revenue of each sub-category within its category, split by region
df.pivot_table(values="sales",
               index=["category", "sub_category"],
               columns=["region"],
               aggfunc=["sum"])

category,sub_category,Central (sum),East (sum),South (sum),West (sum)
Furniture,Bookcases,23773.7112,43819.334,10899.362,35320.7915
Furniture,Chairs,82372.776,95687.509,44739.246,100023.2
Furniture,Furnishings,15016.004,28145.326,17062.66,28988.028
Furniture,Tables,39154.971,38809.219,43830.212,81016.226
Office Supplies,Appliances,21176.833,34119.078,19525.326,29797.166
Office Supplies,Art,5746.188,7430.974,4510.424,9017.824
Office Supplies,Binders,56865.012,51255.775,36734.365,55173.633
Office Supplies,Envelopes,4537.304,4138.246,3345.556,4106.94
Office Supplies,Fasteners,769.57,819.718,503.316,909.356
Office Supplies,Labels,2435.536,2554.914,2344.18,5013.096


In [48]:
gp = df.groupby(["category","sub_category","region"])
gp["sales"].sum()

category    sub_category  region 
Furniture   Bookcases     Central    23773.7112
                          East       43819.3340
                          South      10899.3620
                          West       35320.7915
            Chairs        Central    82372.7760
                                        ...    
Technology  Machines      West       42444.1220
            Phones        Central    71939.9520
                          East       99884.6620
                          South      58098.3380
                          West       97859.4960
Name: sales, Length: 68, dtype: float64

**What is the region-wise count of sales for each product sub-category under each product category?**

In [56]:
df.pivot_table(values="sales", 
               index=["category", 'sub_category'], 
               columns=["region"], 
               aggfunc="count")

category,sub_category,Central,East,South,West
Furniture,Bookcases,49,70,28,79
Furniture,Chairs,151,167,86,203
Furniture,Furnishings,198,275,162,296
Furniture,Tables,72,79,50,113
Office Supplies,Appliances,122,123,81,133
Office Supplies,Art,175,225,140,245
Office Supplies,Binders,362,427,241,462
Office Supplies,Envelopes,58,70,54,66
Office Supplies,Fasteners,53,61,29,71
Office Supplies,Labels,75,105,64,113


**Built-in Filtration**

In [6]:
df.columns

Index(['row_id', 'order_id', 'order_date', 'ship_date', 'ship_mode',
       'customer_id', 'customer_name', 'segment', 'country', 'city', 'state',
       'postal_code', 'region', 'product_id', 'category', 'sub_category',
       'product_name', 'sales'],
      dtype='object')

In [7]:
grouped_df = df.groupby("category")

In [8]:
grouped_df.groups.keys()

dict_keys(['Furniture', 'Office Supplies', 'Technology'])

In [9]:
grouped_df.head()  # returns the first 5 rows of each group

Unnamed: 0,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,postal_code,region,product_id,category,sub_category,product_name,sales
0,1,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.96
1,2,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.94
2,3,CA-2017-138688,2017-06-12,2017-06-16,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,California,90036.0,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62
3,4,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775
4,5,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.368
5,6,CA-2015-115812,2015-06-09,2015-06-14,Standard Class,BH-11710,Brosina Hoffman,Consumer,United States,Los Angeles,California,90032.0,West,FUR-FU-10001487,Furniture,Furnishings,Eldon Expressions Wood and Plastic Desk Access...,48.86
6,7,CA-2015-115812,2015-06-09,2015-06-14,Standard Class,BH-11710,Brosina Hoffman,Consumer,United States,Los Angeles,California,90032.0,West,OFF-AR-10002833,Office Supplies,Art,Newell 322,7.28
7,8,CA-2015-115812,2015-06-09,2015-06-14,Standard Class,BH-11710,Brosina Hoffman,Consumer,United States,Los Angeles,California,90032.0,West,TEC-PH-10002275,Technology,Phones,Mitel 5320 IP Phone VoIP phone,907.152
8,9,CA-2015-115812,2015-06-09,2015-06-14,Standard Class,BH-11710,Brosina Hoffman,Consumer,United States,Los Angeles,California,90032.0,West,OFF-BI-10003910,Office Supplies,Binders,DXL Angle-View Binders with Locking Rings by S...,18.504
9,10,CA-2015-115812,2015-06-09,2015-06-14,Standard Class,BH-11710,Brosina Hoffman,Consumer,United States,Los Angeles,California,90032.0,West,OFF-AP-10002892,Office Supplies,Appliances,Belkin F5C206VTEL 6 Outlet Surge,114.9


In [10]:
grouped_df.tail()

Unnamed: 0,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,postal_code,region,product_id,category,sub_category,product_name,sales
9780,9781,CA-2017-153178,2017-09-14,2017-09-18,Standard Class,CL-12565,Clay Ludtke,Consumer,United States,Long Beach,New York,11561.0,East,TEC-PH-10001944,Technology,Phones,Wi-Ex zBoost YX540 Cellular Phone Signal Booster,437.85
9786,9787,US-2015-114377,2015-11-05,2015-11-05,Same Day,BG-11035,Barry Gonzalez,Consumer,United States,Hampton,Virginia,23666.0,South,FUR-CH-10004754,Furniture,Chairs,"Global Stack Chair with Arms, Black",149.9
9787,9788,CA-2018-144491,2018-03-27,2018-04-01,Standard Class,CJ-12010,Caroline Jumper,Consumer,United States,Houston,Texas,77070.0,Central,FUR-BO-10001811,Furniture,Bookcases,"Atlantic Metals Mobile 5-Shelf Bookcases, Cust...",1023.332
9788,9789,CA-2018-144491,2018-03-27,2018-04-01,Standard Class,CJ-12010,Caroline Jumper,Consumer,United States,Houston,Texas,77070.0,Central,FUR-CH-10004063,Furniture,Chairs,Global Deluxe High-Back Manager's Chair,600.558
9789,9790,CA-2018-144491,2018-03-27,2018-04-01,Standard Class,CJ-12010,Caroline Jumper,Consumer,United States,Houston,Texas,77070.0,Central,TEC-AC-10004901,Technology,Accessories,Kensington SlimBlade Notebook Wireless Mouse w...,39.992
9790,9791,CA-2018-144491,2018-03-27,2018-04-01,Standard Class,CJ-12010,Caroline Jumper,Consumer,United States,Houston,Texas,77070.0,Central,FUR-CH-10001714,Furniture,Chairs,"Global Leather & Oak Executive Chair, Burgundy",211.246
9791,9792,CA-2015-127166,2015-05-21,2015-05-23,Second Class,KH-16360,Katherine Hughes,Consumer,United States,Houston,Texas,77070.0,Central,OFF-EN-10003134,Office Supplies,Envelopes,Staple envelope,56.064
9792,9793,CA-2015-127166,2015-05-21,2015-05-23,Second Class,KH-16360,Katherine Hughes,Consumer,United States,Houston,Texas,77070.0,Central,FUR-CH-10003396,Furniture,Chairs,Global Deluxe Steno Chair,107.772
9793,9794,CA-2015-127166,2015-05-21,2015-05-23,Second Class,KH-16360,Katherine Hughes,Consumer,United States,Houston,Texas,77070.0,Central,OFF-PA-10001560,Office Supplies,Paper,"Adams Telephone Message Books, 5 1/4” x 11”",4.832
9794,9795,CA-2015-127166,2015-05-21,2015-05-23,Second Class,KH-16360,Katherine Hughes,Consumer,United States,Houston,Texas,77070.0,Central,OFF-BI-10000977,Office Supplies,Binders,Ibico Plastic Spiral Binding Combs,18.24


In [11]:
grouped_df.first()

category,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,postal_code,region,product_id,sub_category,product_name,sales
Furniture,1,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-BO-10001798,Bookcases,Bush Somerset Collection Bookcase,261.96
Office Supplies,3,CA-2017-138688,2017-06-12,2017-06-16,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,California,90036.0,West,OFF-LA-10000240,Labels,Self-Adhesive Address Labels for Typewriters b...,14.62
Technology,8,CA-2015-115812,2015-06-09,2015-06-14,Standard Class,BH-11710,Brosina Hoffman,Consumer,United States,Los Angeles,California,90032.0,West,TEC-PH-10002275,Phones,Mitel 5320 IP Phone VoIP phone,907.152


In [12]:
grouped_df.last()

category,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,postal_code,region,product_id,sub_category,product_name,sales
Furniture,9793,CA-2015-127166,2015-05-21,2015-05-23,Second Class,KH-16360,Katherine Hughes,Consumer,United States,Houston,Texas,77070.0,Central,FUR-CH-10003396,Chairs,Global Deluxe Steno Chair,107.772
Office Supplies,9797,CA-2016-128608,2016-01-12,2016-01-17,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,OFF-AR-10001374,Art,"BIC Brite Liner Highlighters, Chisel Tip",10.368
Technology,9800,CA-2016-128608,2016-01-12,2016-01-17,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,TEC-AC-10000487,Accessories,SanDisk Cruzer 4 GB USB Flash Drive,10.384


**Filtration with User-Defined Functions**

In [13]:
grouped_df.filter(lambda group:group["sales"].mean() > 200)

Unnamed: 0,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,postal_code,region,product_id,category,sub_category,product_name,sales
0,1,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.9600
1,2,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.9400
3,4,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775
5,6,CA-2015-115812,2015-06-09,2015-06-14,Standard Class,BH-11710,Brosina Hoffman,Consumer,United States,Los Angeles,California,90032.0,West,FUR-FU-10001487,Furniture,Furnishings,Eldon Expressions Wood and Plastic Desk Access...,48.8600
7,8,CA-2015-115812,2015-06-09,2015-06-14,Standard Class,BH-11710,Brosina Hoffman,Consumer,United States,Los Angeles,California,90032.0,West,TEC-PH-10002275,Technology,Phones,Mitel 5320 IP Phone VoIP phone,907.1520
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9790,9791,CA-2018-144491,2018-03-27,2018-04-01,Standard Class,CJ-12010,Caroline Jumper,Consumer,United States,Houston,Texas,77070.0,Central,FUR-CH-10001714,Furniture,Chairs,"Global Leather & Oak Executive Chair, Burgundy",211.2460
9792,9793,CA-2015-127166,2015-05-21,2015-05-23,Second Class,KH-16360,Katherine Hughes,Consumer,United States,Houston,Texas,77070.0,Central,FUR-CH-10003396,Furniture,Chairs,Global Deluxe Steno Chair,107.7720
9797,9798,CA-2016-128608,2016-01-12,2016-01-17,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,TEC-PH-10004977,Technology,Phones,GE 30524EE4,235.1880
9798,9799,CA-2016-128608,2016-01-12,2016-01-17,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,TEC-PH-10000912,Technology,Phones,Anker 24W Portable Micro USB Car Charger,26.3760
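
Note that `filter()` on a grouped object evaluates the condition once per group, not once per row. In the output above only Furniture and Technology rows appear, yet individual rows with sales below 200 (for example 48.86) are still kept, because a whole category passes when its mean sales exceed 200. A minimal sketch on toy data (the values are made up for illustration and are not taken from the sales dataset):

```python
import pandas as pd

# Hypothetical toy data, not the online store sales data
toy = pd.DataFrame({
    "category": ["Furniture", "Furniture", "Office Supplies", "Office Supplies"],
    "sales": [400.0, 50.0, 10.0, 30.0],
})

# Group-level filter: the lambda runs once per category and
# keeps or drops all rows of that category together
kept = toy.groupby("category").filter(lambda group: group["sales"].mean() > 200)

# Row-level filter, for comparison: each row is tested on its own
row_level = toy[toy["sales"] > 200]

print(kept)       # both Furniture rows, including the 50.0 one
print(row_level)  # only the single row with sales above 200
```

Use the grouped `filter()` when the question is about groups ("which categories sell well on average?") and a boolean mask when the question is about individual records.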


In [14]:
grouped_df = df.groupby("category")

In [51]:
grouped_df.filter(lambda group: (group["segment"] == "Corporate").any())

,row_id,order_id,order_date,ship_date,ship_mode,customer_id,customer_name,segment,country,city,state,postal_code,region,product_id,category,sub_category,product_name,sales
0,1,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.9600
1,2,CA-2017-152156,2017-11-08,2017-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,Kentucky,42420.0,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.9400
2,3,CA-2017-138688,2017-06-12,2017-06-16,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,California,90036.0,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.6200
3,4,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775
4,5,US-2016-108966,2016-10-11,2016-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,Florida,33311.0,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.3680
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9795,9796,CA-2017-125920,2017-05-21,2017-05-28,Standard Class,SH-19975,Sally Hughsby,Corporate,United States,Chicago,Illinois,60610.0,Central,OFF-BI-10003429,Office Supplies,Binders,"Cardinal HOLDit! Binder Insert Strips,Extra St...",3.7980
9796,9797,CA-2016-128608,2016-01-12,2016-01-17,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,OFF-AR-10001374,Office Supplies,Art,"BIC Brite Liner Highlighters, Chisel Tip",10.3680
9797,9798,CA-2016-128608,2016-01-12,2016-01-17,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,TEC-PH-10004977,Technology,Phones,GE 30524EE4,235.1880
9798,9799,CA-2016-128608,2016-01-12,2016-01-17,Standard Class,CS-12490,Cindy Schnelling,Corporate,United States,Toledo,Ohio,43615.0,East,TEC-PH-10000912,Technology,Phones,Anker 24W Portable Micro USB Car Charger,26.3760
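
Here the condition uses `.any()`, so a category is kept as soon as it contains at least one Corporate order. The output still includes Consumer rows, and all three categories appear in it, which suggests every category has at least one Corporate order. A small sketch on toy data (made-up values, not from the sales dataset) contrasting `.any()` with `.all()`:

```python
import pandas as pd

# Hypothetical toy data, not the online store sales data
toy = pd.DataFrame({
    "category": ["Furniture", "Furniture", "Technology", "Technology"],
    "segment": ["Consumer", "Corporate", "Consumer", "Consumer"],
})
grouped = toy.groupby("category")

# Keep categories that contain at least one Corporate order
has_any = grouped.filter(lambda g: (g["segment"] == "Corporate").any())

# Keep categories in which every order is Corporate
all_corporate = grouped.filter(lambda g: (g["segment"] == "Corporate").all())

print(has_any)        # both Furniture rows (Consumer and Corporate)
print(all_corporate)  # empty: no category is exclusively Corporate
```

To keep only the Corporate rows themselves, a row-level mask such as `df[df["segment"] == "Corporate"]` is the simpler tool.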
