# Filtering a Dataset

Similar to filtering in Excel, Pandas can filter datasets.

The big benefit of filtering in Pandas is that you can 'save' your filtered datasets and use them for modelling.

## The Dataset...

The data used in these examples is dummy data.

It was built from a combination of Wikipedia pages and randomly generated numbers.

Wiki Pages:

#### Products

- https://en.wikipedia.org/wiki/List_of_culinary_fruits
- https://en.wikipedia.org/wiki/List_of_vegetables

#### Store Names (random suburbs in Sydney)

- https://en.wikipedia.org/wiki/List_of_Sydney_suburbs

In [1]:
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

In [2]:
# Import the dependencies

import pandas as pd
import numpy as np

import warnings
warnings.simplefilter('ignore')


In [3]:
# Import the dataset

sales_data = pd.read_excel(r'../Data/SalesDataset.xlsx')

# Quick view of the data "The Head"
sales_data.head()

Unnamed: 0,Date,Campaign_ID,Customer_Group,Store_ID,Store_Name,Product_Category,Product_Group,Product,Product_ID,Units,Gross_Sales,Discount
0,31/12/2019,1000000.0,A Market That's Super,2001,Berowra Creek,Fruit,Tropical Fruit,Hydnora abyssinica,1100057,990,29.7,0.5
1,30/04/2020,,Super Super Market,1012,Bardia,Fruit,Tropical Fruit,Salak,1100094,630,0.0,0.498927
2,31/07/2020,,Market,3000,Blackett,Fruit,Tropical Fruit,Kola nut,1100062,671,1241.35,0.494303
3,31/10/2020,,A Market That's Super,2011,Bilgola Beach,Fruit,Tropical Fruit,Jackfruit,1100060,611,1283.1,0.493447
4,31/10/2020,,A Market That's Super,2006,Beverly Hills,Fruit,Tropical Fruit,Terap,1100107,684,1026.0,0.492293


## Where do we start? What can we filter?


How big is this dataset? How much data is missing?

Ask a series of questions about the columns and the types of data they hold...

- How many different Customer Groups are there?
    - Let's look at the stores inside a group
        - Then the products inside one of those stores

- How many different Products are there?
    - Search by text

- Filtering by columns

- What is the actual date range of this dataset?
    - How can we add date filters to our product and customer filters?
  

In [4]:
# How many rows are we dealing with here??

len(sales_data)

13752

In [5]:
# Which columns are most empty??

sales_data.isnull().sum()

Date                   0
Campaign_ID         7560
Customer_Group         0
Store_ID               0
Store_Name             0
Product_Category       0
Product_Group          0
Product               96
Product_ID             0
Units                  0
Gross_Sales            0
Discount               0
dtype: int64

In [6]:
# So the non-null count per column should equal len(sales_data) minus the null count...

sales_data.notnull().sum()

Date                13752
Campaign_ID          6192
Customer_Group      13752
Store_ID            13752
Store_Name          13752
Product_Category    13752
Product_Group       13752
Product             13656
Product_ID          13752
Units               13752
Gross_Sales         13752
Discount            13752
dtype: int64
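
Raw counts are easier to judge as a share of all rows. In the output above, Campaign_ID is missing in 7560 of 13752 rows, roughly 55%. The sketch below shows one way to get that share directly. It assumes a small hypothetical DataFrame called `toy` that only borrows a few column names from `sales_data`; the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for sales_data (made-up rows, same column names)
toy = pd.DataFrame({
    "Campaign_ID": [1000000.0, np.nan, np.nan, np.nan],
    "Product": ["Salak", None, "Jackfruit", "Terap"],
    "Units": [990, 630, 671, 611],
})

# isnull() gives True/False per cell; the mean of booleans is the share of True
missing_share = toy.isnull().mean().sort_values(ascending=False)
print(missing_share)
```

Sorting the result puts the emptiest columns at the top, which is handy on wide datasets.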

## Let's Filter...

In [7]:
print(sales_data["Customer_Group"].unique())

["A Market That's Super" 'Super Super Market' 'Market'
 'Not So Super Market']


In [8]:
# Let's use a filter to create a new dataset containing only the Super Super Market rows

Super_Super_Market = sales_data[sales_data["Customer_Group"] == "Super Super Market"]

Super_Super_Market.head()

Unnamed: 0,Date,Campaign_ID,Customer_Group,Store_ID,Store_Name,Product_Category,Product_Group,Product,Product_ID,Units,Gross_Sales,Discount
1,30/04/2020,,Super Super Market,1012,Bardia,Fruit,Tropical Fruit,Salak,1100094,630,0.0,0.498927
7,31/08/2020,,Super Super Market,1001,Balgowlah,Fruit,Tropical Fruit,Wild jack,1100112,691,608.08,0.489313
8,31/12/2019,1000000.0,Super Super Market,1009,Bankstown Aerodrome,Fruit,Tropical Fruit,Ice-cream bean,1100058,998,0.0,0.5
10,31/12/2020,,Super Super Market,1011,Barden Ridge,Fruit,Tropical Fruit,Sugar-apple,1100103,762,1013.46,0.485567
13,30/09/2020,,Super Super Market,1018,Beacon Hill,Fruit,Tropical Fruit,Soursop,1100099,628,332.84,0.481963


In [9]:
# How many Stores in the Super Super Market Customer Group?

Super_Super_Market["Store_Name"].nunique()

19

In [10]:
# Names of the stores in Super Super Market?

Super_Super_Market["Store_Name"].unique()

array(['Bardia', 'Balgowlah', 'Bankstown Aerodrome', 'Barden Ridge',
       'Beacon Hill', 'Balmain East', 'Balmain', 'Bardwell Valley',
       'Badgerys Creek', 'Banksia', 'Balgowlah Heights', 'Bardwell Park',
       'Bankstown', 'Barangaroo', 'Baulkham Hills', 'Bass Hill',
       'Bayview', 'Banksmeadow', 'Bangor'], dtype=object)

In [11]:
# In the Customer Group 'Super Super Market', what products does the store "Beacon Hill" sell?

# Use original dataset

SSM_BH_P = sales_data[(sales_data["Customer_Group"] == "Super Super Market") & (sales_data["Store_Name"] == "Beacon Hill")]
print(SSM_BH_P["Product"].unique())
print("----------------------")
print(SSM_BH_P["Product"].nunique())

['Soursop' 'Ilama' 'Marolo' 'Pepino' 'Noni' 'Vanilla' 'Purple mangosteen'
 'Red fruit' 'Malabar plum' 'Star fruit' 'Lúcuma' 'Strangler fig'
 'Monkeypod' 'Blackberry' 'Passiflora platyloba' 'Monkey jackfruit'
 'Soncoya' 'Achacha' 'Wood-apple' 'Bailan melon' 'Persimmon' 'Muskmelon'
 'Santa Claus melon' 'Lingonberry' 'Mangaba' 'Bearberry' 'Bengal currant'
 'Raspberry' 'Salmonberry' 'Conkerberry' 'Purple guava' 'Cudrang'
 'Native cherry' 'North american cantaloupe' 'Calligonum junceum'
 'Roseleaf bramble berry' 'Camu camu' 'Sea grape' 'Black currant'
 'Watermelon' 'Bolwarra' 'Spanish tamarind' 'Wolfberry' 'Hackberry'
 'Hardy kiwi' 'Honeyberry' 'Zig-zag vine fruit' 'Calamondin' 'Clementine'
 'Mora común' 'Kinnow' 'Huckleberry' 'Mandarin orange' 'Jiangsu kumquat'
 'Golden kiwifruit' 'Ichang papeda' 'Citron' 'Yantok' 'Limequat'
 'Bergamot orange' 'Red huckleberry' 'Oval kumquat' 'Native currant'
 'Tangor' 'Acerola' 'Strawberry tree fruit' 'Thimbleberry'
 'Cherry of the Rio Grande' 'Cocoplum' 

In [12]:
# Create a dataset for Hackberries in Beacon Hill in the Customer Group Super Super Market

Hack_Berries_SSM_BH = sales_data[(sales_data["Customer_Group"] == "Super Super Market")
                                 & (sales_data["Store_Name"] == "Beacon Hill")
                                & (sales_data["Product"] == "Hackberry")]

Hack_Berries_SSM_BH

Unnamed: 0,Date,Campaign_ID,Customer_Group,Store_ID,Store_Name,Product_Category,Product_Group,Product,Product_ID,Units,Gross_Sales,Discount
2829,31/10/2019,,Super Super Market,1018,Beacon Hill,Fruit,Berries,Hackberry,1300048,880,1390.4,0.5
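
Each condition in the filter above sits in its own parentheses because `&` binds more tightly than `==` in Python. As a sketch of two equivalent spellings, the example below builds the mask as a named variable and also uses `DataFrame.query`. The `toy` DataFrame is a hypothetical stand-in with made-up rows.

```python
import pandas as pd

# Hypothetical stand-in for sales_data (made-up rows)
toy = pd.DataFrame({
    "Customer_Group": ["Super Super Market", "Super Super Market", "Market"],
    "Store_Name": ["Beacon Hill", "Bardia", "Blackett"],
    "Product": ["Hackberry", "Salak", "Kola nut"],
})

# Option 1: name the boolean mask, one parenthesised condition per line
mask = (
    (toy["Customer_Group"] == "Super Super Market")
    & (toy["Store_Name"] == "Beacon Hill")
    & (toy["Product"] == "Hackberry")
)
by_mask = toy[mask]

# Option 2: the same filter written as a query string
by_query = toy.query(
    'Customer_Group == "Super Super Market" '
    'and Store_Name == "Beacon Hill" '
    'and Product == "Hackberry"'
)

print(by_mask)
```

Both options return the same rows; which one reads better is largely a matter of taste.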


## What's going on with the products in this dataset?

In [13]:
# How many different types of products are there?

sales_data["Product"].nunique()

567

In [14]:
# Any apples in this dataset?

apples = sales_data[sales_data["Product"].str.contains('apple', na=False)]
apples["Product"].unique()

array(['Sugar-apple', 'Pineapple', 'Pond apple', 'Velvet apple',
       'Wood-apple', 'Kei apple', 'Mayapple', 'Purple apple-berry',
       'Sweet apple-berry', 'Emu apple', 'Malay rose apple',
       'Red bush apple', 'Watery rose apple', 'Wax apple', 'Cocky apple',
       "Niedzwetzky's apple", 'Southern crabapple',
       'African custard-apple', 'Black apple', 'Cashew apple',
       'Custard apple', 'Elephant apple'], dtype=object)
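
Note that `str.contains` is case-sensitive by default, so a product spelled with a capital "A" would not appear in the list above. A minimal sketch of the difference, using a hypothetical Series of product names:

```python
import numpy as np
import pandas as pd

# Hypothetical product names, including a missing value
products = pd.Series(["Sugar-apple", "Apple berry", "Pineapple", np.nan, "Salak"])

# Default behaviour: only lower-case "apple" matches
case_sensitive = products[products.str.contains("apple", na=False)]

# case=False ignores capitalisation, so "Apple berry" matches too
case_insensitive = products[products.str.contains("apple", case=False, na=False)]

print(case_sensitive.tolist())
print(case_insensitive.tolist())
```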

In [15]:
# I only want a few of these apple types...

apples_to_consider = ["Sugar-apple", "Elephant apple", "Wax apple"]

cool_apples = sales_data[sales_data["Product"].isin(apples_to_consider)]
cool_apples.head()

Unnamed: 0,Date,Campaign_ID,Customer_Group,Store_ID,Store_Name,Product_Category,Product_Group,Product,Product_ID,Units,Gross_Sales,Discount
10,31/12/2020,,Super Super Market,1011,Barden Ridge,Fruit,Tropical Fruit,Sugar-apple,1100103,762,1013.46,0.485567
20,30/06/2020,,Super Super Market,1003,Balmain,Fruit,Tropical Fruit,Sugar-apple,1100103,653,868.49,0.474087
25,29/02/2020,,Super Super Market,1011,Barden Ridge,Fruit,Tropical Fruit,Sugar-apple,1100103,780,1037.4,0.465146
116,30/04/2020,,Super Super Market,1015,Bass Hill,Fruit,Tropical Fruit,Sugar-apple,1100103,453,602.49,0.379326
208,30/04/2020,,Not So Super Market,1026,Belrose,Fruit,Tropical Fruit,Sugar-apple,1100103,503,668.99,0.311445


In [16]:
# Which stores sell the cool apples?

cool_apples["Store_Name"].unique()

array(['Barden Ridge', 'Balmain', 'Bass Hill', 'Belrose', 'Beecroft',
       'Berowra Heights', 'Balgowlah Heights', 'Birrong', 'Belfield',
       'Bexley North', 'Blacktown', 'Blairmount', 'Bayview',
       'Blair Athol', 'Baulkham Hills', 'Bilgola Plateau',
       'Bardwell Park', 'Bexley', 'Bella Vista', 'Blackett',
       'Beaconsfield', 'Beacon Hill', 'Bardwell Valley', 'Bellevue Hill',
       'Beaumont Hills', 'Barangaroo', 'Bangor', 'Bankstown', 'Bardia',
       'Beverley Park', 'Berala', 'Blakehurst', 'Bickley Vale',
       'Birchgrove', 'Balgowlah', 'Beverly Hills', 'Bilgola Beach',
       'Belmore', 'Berowra Waters', 'Bankstown Aerodrome', 'Banksmeadow',
       'Berowra Creek', 'Bidwill'], dtype=object)
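
The `isin` filter used for `cool_apples` can also be inverted with `~` to keep everything that is not in the list. A short sketch on a hypothetical four-row DataFrame:

```python
import pandas as pd

# Hypothetical stand-in for the Product column of sales_data
products = pd.DataFrame({"Product": ["Sugar-apple", "Wax apple", "Pond apple", "Salak"]})

apples_to_consider = ["Sugar-apple", "Elephant apple", "Wax apple"]

# Rows whose Product is in the list
keep = products[products["Product"].isin(apples_to_consider)]

# ~ flips the boolean mask: rows whose Product is NOT in the list
everything_else = products[~products["Product"].isin(apples_to_consider)]

print(keep["Product"].tolist())
print(everything_else["Product"].tolist())
```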

In [17]:
# Power of programming over Excel - Use Regex

# What products end with the word "apple"
ends_with_apple = sales_data[sales_data["Product"].str.contains(' apple$', na=False)]
ends_with_apple.head()

Unnamed: 0,Date,Campaign_ID,Customer_Group,Store_ID,Store_Name,Product_Category,Product_Group,Product,Product_ID,Units,Gross_Sales,Discount
164,31/05/2020,,Market,3004,Blakehurst,Fruit,Tropical Fruit,Pond apple,1100091,438,236.52,0.346253
201,31/05/2020,,A Market That's Super,2009,Bickley Vale,Fruit,Tropical Fruit,Pond apple,1100091,595,321.3,0.31802
231,31/12/2020,,Super Super Market,1003,Balmain,Fruit,Tropical Fruit,Velvet apple,1100110,416,341.12,0.305937
284,31/05/2020,,Market,3000,Blackett,Fruit,Tropical Fruit,Pond apple,1100091,483,260.82,0.274422
440,30/11/2019,1000000.0,Not So Super Market,1023,Bella Vista,Fruit,Tropical Fruit,Pond apple,1100091,873,471.42,0.5


In [18]:
# What products start with the word "apple"?
# The match is case-sensitive, so a product such as "Apple berry" is not found
starts_with_apple = sales_data[sales_data["Product"].str.contains('^apple', na=False)]
starts_with_apple

Unnamed: 0,Date,Campaign_ID,Customer_Group,Store_ID,Store_Name,Product_Category,Product_Group,Product,Product_ID,Units,Gross_Sales,Discount
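
The result is empty because the pattern `'^apple'` only matches a lower-case "a" at the start of the name. The sketch below, on a hypothetical Series of product names, shows two ways to catch names that start with a capital "A".

```python
import pandas as pd

# Hypothetical product names, including a missing value
products = pd.Series(["Apple berry", "Sugar-apple", "Pond apple", None])

# Case-sensitive regex anchored at the start: nothing matches
regex_lower = products[products.str.contains("^apple", na=False)]

# Same regex, ignoring case
regex_any_case = products[products.str.contains("^apple", case=False, na=False)]

# No regex needed: str.startswith does a plain prefix test
starts = products[products.str.startswith("Apple", na=False)]

print(len(regex_lower))
print(regex_any_case.tolist())
print(starts.tolist())
```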


## What columns do I have or want?

In [19]:
print(sales_data.columns.values)

['Date' 'Campaign_ID' 'Customer_Group' 'Store_ID' 'Store_Name'
 'Product_Category' 'Product_Group' 'Product' 'Product_ID' 'Units'
 'Gross_Sales' 'Discount']


In [20]:
# State the columns you want to see

sales_data[['Date', 'Customer_Group', 'Store_Name', 'Product_Category', 'Product_Group', 'Product', 'Product_ID', 'Units']].head()

Unnamed: 0,Date,Customer_Group,Store_Name,Product_Category,Product_Group,Product,Product_ID,Units
0,31/12/2019,A Market That's Super,Berowra Creek,Fruit,Tropical Fruit,Hydnora abyssinica,1100057,990
1,30/04/2020,Super Super Market,Bardia,Fruit,Tropical Fruit,Salak,1100094,630
2,31/07/2020,Market,Blackett,Fruit,Tropical Fruit,Kola nut,1100062,671
3,31/10/2020,A Market That's Super,Bilgola Beach,Fruit,Tropical Fruit,Jackfruit,1100060,611
4,31/10/2020,A Market That's Super,Beverly Hills,Fruit,Tropical Fruit,Terap,1100107,684


In [21]:
# Store the column names in a variable and use the variable (preferred)

keep_these_columns = ['Date', 'Customer_Group', 'Store_Name', 'Product_Category', 'Product_Group', 'Product', 'Product_ID', 'Units']

refined_sales_dataset = sales_data[keep_these_columns]
refined_sales_dataset.head()

Unnamed: 0,Date,Customer_Group,Store_Name,Product_Category,Product_Group,Product,Product_ID,Units
0,31/12/2019,A Market That's Super,Berowra Creek,Fruit,Tropical Fruit,Hydnora abyssinica,1100057,990
1,30/04/2020,Super Super Market,Bardia,Fruit,Tropical Fruit,Salak,1100094,630
2,31/07/2020,Market,Blackett,Fruit,Tropical Fruit,Kola nut,1100062,671
3,31/10/2020,A Market That's Super,Bilgola Beach,Fruit,Tropical Fruit,Jackfruit,1100060,611
4,31/10/2020,A Market That's Super,Beverly Hills,Fruit,Tropical Fruit,Terap,1100107,684


In [22]:
# Alternatively just drop certain columns

refined_sales_dataset = sales_data.drop(['Campaign_ID', 'Store_ID'], axis=1)
refined_sales_dataset.head()

Unnamed: 0,Date,Customer_Group,Store_Name,Product_Category,Product_Group,Product,Product_ID,Units,Gross_Sales,Discount
0,31/12/2019,A Market That's Super,Berowra Creek,Fruit,Tropical Fruit,Hydnora abyssinica,1100057,990,29.7,0.5
1,30/04/2020,Super Super Market,Bardia,Fruit,Tropical Fruit,Salak,1100094,630,0.0,0.498927
2,31/07/2020,Market,Blackett,Fruit,Tropical Fruit,Kola nut,1100062,671,1241.35,0.494303
3,31/10/2020,A Market That's Super,Bilgola Beach,Fruit,Tropical Fruit,Jackfruit,1100060,611,1283.1,0.493447
4,31/10/2020,A Market That's Super,Beverly Hills,Fruit,Tropical Fruit,Terap,1100107,684,1026.0,0.492293


## Filter by Datatypes?

In [23]:
# Filter sales_data by numbers.. 

only_numbers = sales_data.select_dtypes(include=['float64', 'int64'])
only_numbers.head()

Unnamed: 0,Campaign_ID,Store_ID,Product_ID,Units,Gross_Sales,Discount
0,1000000.0,2001,1100057,990,29.7,0.5
1,,1012,1100094,630,0.0,0.498927
2,,3000,1100062,671,1241.35,0.494303
3,,2011,1100060,611,1283.1,0.493447
4,,2006,1100107,684,1026.0,0.492293


In [24]:
# Filter sales_data by non-numbers.. 

only_objects = sales_data.select_dtypes(include=['object'])
only_objects.head()

Unnamed: 0,Date,Customer_Group,Store_Name,Product_Category,Product_Group,Product
0,31/12/2019,A Market That's Super,Berowra Creek,Fruit,Tropical Fruit,Hydnora abyssinica
1,30/04/2020,Super Super Market,Bardia,Fruit,Tropical Fruit,Salak
2,31/07/2020,Market,Blackett,Fruit,Tropical Fruit,Kola nut
3,31/10/2020,A Market That's Super,Bilgola Beach,Fruit,Tropical Fruit,Jackfruit
4,31/10/2020,A Market That's Super,Beverly Hills,Fruit,Tropical Fruit,Terap
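
Instead of listing exact types such as `'float64'` and `'int64'`, `select_dtypes` also accepts the generic `'number'` selector, and `exclude` gives the complement. A minimal sketch on a hypothetical three-column DataFrame:

```python
import pandas as pd

# Hypothetical stand-in for sales_data (made-up rows)
toy = pd.DataFrame({
    "Store_Name": ["Bardia", "Blackett"],
    "Units": [630, 671],
    "Discount": [0.498927, 0.494303],
})

# 'number' covers every integer and float type
numeric = toy.select_dtypes(include="number")

# exclude returns all the remaining columns
non_numeric = toy.select_dtypes(exclude="number")

print(list(numeric.columns))
print(list(non_numeric.columns))
```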


## Filter by Dates/Time Periods

In [25]:
sales_data = pd.read_excel(r'../Data/SalesDataset.xlsx')

# Give the dataset a second name for this example
# Note: plain assignment does not copy, so changes to sales_data_date also show up in sales_data
sales_data_date = sales_data
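
Plain assignment with `=` binds a second name to the same DataFrame, whereas `.copy()` creates an independent one. A small sketch of the difference, using hypothetical variable names and made-up values:

```python
import pandas as pd

original = pd.DataFrame({"Units": [1, 2, 3]})

alias = original               # same object, second name
independent = original.copy()  # separate object with the same data

# Changing the alias changes the original as well
alias["Units"] = alias["Units"] * 10

print(original["Units"].tolist())     # changed through the alias
print(independent["Units"].tolist())  # untouched
```

This is why the later cells in this notebook show `sales_data` with the Date index: it is the same object as `sales_data_date`.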

In [26]:
# Change the "Date" column into a workable datetime object

sales_data_date["Date"] = pd.to_datetime(sales_data_date["Date"])

sales_data_date["Date"].unique()


array(['2019-12-31T00:00:00.000000000', '2020-04-30T00:00:00.000000000',
       '2020-07-31T00:00:00.000000000', '2020-10-31T00:00:00.000000000',
       '2020-02-29T00:00:00.000000000', '2020-08-31T00:00:00.000000000',
       '2020-11-30T00:00:00.000000000', '2020-12-31T00:00:00.000000000',
       '2020-09-30T00:00:00.000000000', '2019-02-28T00:00:00.000000000',
       '2020-06-30T00:00:00.000000000', '2020-01-31T00:00:00.000000000',
       '2020-03-31T00:00:00.000000000', '2019-09-30T00:00:00.000000000',
       '2019-10-31T00:00:00.000000000', '2019-08-31T00:00:00.000000000',
       '2019-03-31T00:00:00.000000000', '2019-07-31T00:00:00.000000000',
       '2019-11-30T00:00:00.000000000', '2019-01-31T00:00:00.000000000',
       '2019-05-31T00:00:00.000000000', '2019-06-30T00:00:00.000000000',
       '2019-04-30T00:00:00.000000000', '2020-05-31T00:00:00.000000000'],
      dtype='datetime64[ns]')

In [27]:
# Make the index the date
sales_data_date.set_index("Date", inplace=True)

# Sort the columns alphabetically (axis=1 sorts the column labels; the rows keep their original order)
sales_data_date.sort_index(axis=1, inplace=True)

In [28]:
# Slice the DatetimeIndex with date strings written year first (e.g. yyyy/mm/dd or yyyy-mm-dd)

# Filter for a date range

sales_data_date.loc['2020/01/31':'2020/03/01'].head()

Date,Campaign_ID,Customer_Group,Discount,Gross_Sales,Product,Product_Category,Product_Group,Product_ID,Store_ID,Store_Name,Units
2020-02-29,,A Market That's Super,0.491922,938.45,Ooray,Fruit,Tropical Fruit,1100082,2003,Berowra Waters,685
2020-02-29,,Not So Super Market,0.484048,496.8,Monkey jackfruit,Fruit,Tropical Fruit,1100074,1025,Belmore,621
2020-02-29,,Market,0.470821,2144.28,Jícara,Fruit,Tropical Fruit,1100061,3003,Blairmount,642
2020-01-31,,Market,0.4697,0.0,Sandpaper fig,Fruit,Tropical Fruit,1100095,3003,Blairmount,783
2020-02-29,,Super Super Market,0.465146,1037.4,Sugar-apple,Fruit,Tropical Fruit,1100103,1011,Barden Ridge,780
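
Date slicing is most predictable when the rows themselves are sorted by date, which `sort_index()` does with its default `axis=0`. The sketch below shows that, plus a boolean mask with `between` that works without setting the index at all. The `toy` DataFrame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for sales_data (made-up rows, deliberately out of order)
toy = pd.DataFrame({
    "Date": ["2020-02-29", "2019-12-31", "2020-01-31", "2020-04-30"],
    "Units": [685, 990, 783, 630],
})
toy["Date"] = pd.to_datetime(toy["Date"])

# Option 1: date index, rows sorted by date, then slice with .loc
by_date = toy.set_index("Date").sort_index()
in_range = by_date.loc["2020-01-31":"2020-03-01"]

# Option 2: keep Date as a column and filter with a boolean mask
mask = toy["Date"].between("2020-01-31", "2020-03-01")
in_range_mask = toy[mask]

print(in_range)
print(in_range_mask)
```

Both ends of the range are included in either option.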


## Examples of More Advanced Considerations using filtering/sorting techniques

In [29]:
# Any negative sales in the dataset?

# Filter rows based on a value
sales_data_negative = sales_data[sales_data["Gross_Sales"] < 0]
sales_data_negative

Date,Campaign_ID,Customer_Group,Discount,Gross_Sales,Product,Product_Category,Product_Group,Product_ID,Store_ID,Store_Name,Units
2020-02-29,3000000.0,Not So Super Market,0.0,-15.7,Purple apple-berry,Fruit,Berries,1300081,1023,Bella Vista,-10
2019-06-30,,Not So Super Market,0.5,-122.0,Salal,Fruit,Berries,1300089,1021,Beecroft,100
2019-08-31,,Not So Super Market,0.5,-5270.0,Strawberry tree fruit,Fruit,Berries,1300098,1022,Belfield,-1000


In [30]:
# What about above average sales for products?

above_average = sales_data[sales_data["Gross_Sales"] > sales_data["Gross_Sales"].mean()]
above_average = above_average.sort_values("Gross_Sales", ascending=False) # Sort sales highest to lowest
above_average.head()

Date,Campaign_ID,Customer_Group,Discount,Gross_Sales,Product,Product_Category,Product_Group,Product_ID,Store_ID,Store_Name,Units
2019-05-31,,Super Super Market,0.5,9860.45,Biribá,Fruit,Tropical Fruit,1100015,1015,Bass Hill,995
2019-06-30,,Super Super Market,0.5,9850.54,Biribá,Fruit,Tropical Fruit,1100015,1004,Balmain East,994
2020-03-31,,A Market That's Super,0.5,9662.25,Biribá,Fruit,Tropical Fruit,1100015,2001,Berowra Creek,975
2020-08-31,5000000.0,Not So Super Market,0.5,9643.23,Tangerine,Fruit,Citruses,1400051,1025,Belmore,983
2020-10-31,5000000.0,Super Super Market,0.5,9552.0,Jelly palm fruit,Fruit,Drupes,1500049,1018,Beacon Hill,995


In [31]:
# Below average?

below_average = sales_data[sales_data["Gross_Sales"] < sales_data["Gross_Sales"].mean()]
below_average = below_average.sort_values("Gross_Sales", ascending=False) # Sort sales highest to lowest
below_average.head()

Date,Campaign_ID,Customer_Group,Discount,Gross_Sales,Product,Product_Category,Product_Group,Product_ID,Store_ID,Store_Name,Units
2020-09-30,3000000.0,A Market That's Super,0.5,1464.7,Saskatoon,Fruit,Berries,1300091,2005,Beverley Park,970
2020-11-30,4000000.0,Super Super Market,0.5,1464.0,Calamondin,Fruit,Citruses,1400008,1008,Bankstown,915
2020-05-31,3000000.0,A Market That's Super,0.5,1463.19,Saskatoon,Fruit,Berries,1300091,2014,Birrong,969
2019-03-31,,A Market That's Super,0.455817,1462.88,Mustard,Vegetables,Leafy and salad vegetables,2000048,2007,Bexley,656
2020-12-31,5000000.0,Not So Super Market,0.5,1462.24,Lotus root,Vegetables,Bulb and stem vegetables,2100013,1023,Bella Vista,988


In [32]:
# Filter with a function

def any_berries(x):
    # True when the text 'Berries' appears in the value
    return 'Berries' in x

Berries = sales_data[sales_data.Product_Group.apply(any_berries)]
Berries.head()

Date,Campaign_ID,Customer_Group,Discount,Gross_Sales,Product,Product_Category,Product_Group,Product_ID,Store_ID,Store_Name,Units
2020-12-31,,Super Super Market,0.49943,3573.43,Apple berry,Fruit,Berries,1300006,1004,Balmain East,719
2020-09-30,,A Market That's Super,0.497205,575.25,Camu camu,Fruit,Berries,1300026,2008,Bexley North,767
2020-10-31,,A Market That's Super,0.495738,0.0,American black elderberry,Fruit,Berries,1300002,2013,Birchgrove,673
2020-11-30,,Market,0.494029,1312.74,Creeping raspberry,Fruit,Berries,1300036,3004,Blakehurst,663
2020-09-30,,Not So Super Market,0.492558,1449.76,Ceylon gooseberry,Fruit,Berries,1300029,1019,Beaconsfield,697
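
Filtering with a function is flexible, but for a simple text test the built-in string methods usually do the same job without a row-by-row Python call. A minimal sketch comparing the two on a hypothetical DataFrame:

```python
import pandas as pd

# Hypothetical stand-in for sales_data (made-up rows)
toy = pd.DataFrame({
    "Product_Group": ["Berries", "Tropical Fruit", "Berries", "Citruses"],
    "Product": ["Camu camu", "Salak", "Hackberry", "Tangerine"],
})

def any_berries(x):
    # True when the text 'Berries' appears in the value
    return "Berries" in x

# Function-based filter, as in the cell above
via_apply = toy[toy["Product_Group"].apply(any_berries)]

# Equivalent filter with str.contains (regex=False means a plain substring test)
via_contains = toy[toy["Product_Group"].str.contains("Berries", regex=False, na=False)]

print(via_contains)
```

A custom function is still the better fit when the rule is too involved for a single string method.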


## Slice Notation

I rarely use slice notation because I try to be as "explicit" as possible when I code: I like to see the names of the columns/series that I use.

Nevertheless, it can be useful, so here are some quick examples.

In [33]:
# Select rows 0, 1, 2 
sales_data[:3]

Date,Campaign_ID,Customer_Group,Discount,Gross_Sales,Product,Product_Category,Product_Group,Product_ID,Store_ID,Store_Name,Units
2019-12-31,1000000.0,A Market That's Super,0.5,29.7,Hydnora abyssinica,Fruit,Tropical Fruit,1100057,2001,Berowra Creek,990
2020-04-30,,Super Super Market,0.498927,0.0,Salak,Fruit,Tropical Fruit,1100094,1012,Bardia,630
2020-07-31,,Market,0.494303,1241.35,Kola nut,Fruit,Tropical Fruit,1100062,3000,Blackett,671


In [34]:
# Select the last row in the dataframe
sales_data[-1:]

Date,Campaign_ID,Customer_Group,Discount,Gross_Sales,Product,Product_Category,Product_Group,Product_ID,Store_ID,Store_Name,Units
2020-01-31,,Super Super Market,0.000684,0.0,Gấc,Fruit,Tropical Fruit,1100049,1010,Barangaroo,29
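
The same position-based selections can be written with `.iloc`, which makes it explicit that rows are picked by position and also allows choosing columns by position. A short sketch on a hypothetical four-row DataFrame:

```python
import pandas as pd

# Hypothetical stand-in for sales_data (made-up rows)
toy = pd.DataFrame({
    "Product": ["Salak", "Kola nut", "Jackfruit", "Terap"],
    "Units": [630, 671, 611, 684],
})

first_three = toy.iloc[:3]   # rows 0, 1, 2
last_row = toy.iloc[-1:]     # last row, still a DataFrame

# Rows 0 and 1, first column only
first_two_products = toy.iloc[:2, [0]]

print(first_three)
print(last_row)
print(first_two_products)
```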
