In [1]:
# Display plots inline, below the cell
%matplotlib inline
import math
import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt

from collections import defaultdict
from scipy.stats import pearsonr  # scipy.stats.stats is a deprecated private path

In [2]:
df = pd.read_csv('customer_supermarket.csv', sep='\t', index_col=0)

In [3]:
df.head()

Unnamed: 0,BasketID,BasketDate,Sale,CustomerID,CustomerCountry,ProdID,ProdDescr,Qta
0,536365,01/12/10 08:26,255,17850.0,United Kingdom,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6
1,536365,01/12/10 08:26,339,17850.0,United Kingdom,71053,WHITE METAL LANTERN,6
2,536365,01/12/10 08:26,275,17850.0,United Kingdom,84406B,CREAM CUPID HEARTS COAT HANGER,8
3,536365,01/12/10 08:26,339,17850.0,United Kingdom,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6
4,536365,01/12/10 08:26,339,17850.0,United Kingdom,84029E,RED WOOLLY HOTTIE WHITE HEART.,6


The dataset appears to record the purchases made by the customers of a grocery store chain.  
Each row represents a purchased item:  
- BasketID: identifies a batch of items bought together; several rows can share the same BasketID  
- BasketDate: date and time of the purchase; rows with the same BasketID should also share the same BasketDate  
- Sale: the value of the item; we still need to determine whether it is the price of a single unit or the unit price multiplied by the quantity
- CustomerID: identifies a unique customer
- CustomerCountry: the country associated with the customer
- ProdID: identifies a unique product for sale
- ProdDescr: describes the product
- Qta: number of units of the product ProdID that were bought
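
Two of the assumptions in the list above can be checked directly: that a BasketID maps to a single BasketDate, and whether Sale behaves like a unit price. The following is a minimal sketch on a toy frame; the rows and the helper names `baskets_with_multiple_dates` and `products_with_constant_sale` are illustrative assumptions, not part of the original notebook. A Sale that stays constant while Qta changes is only a hint that Sale is a unit price, since prices may also change over time.

```python
import pandas as pd

# Toy rows mimicking the columns above (values are illustrative, not read from the file)
toy = pd.DataFrame({
    "BasketID": ["536365", "536365", "536366", "536366"],
    "BasketDate": ["01/12/10 08:26", "01/12/10 08:26",
                   "01/12/10 08:28", "01/12/10 08:29"],
    "Sale": ["2,55", "3,39", "1,85", "1,85"],
    "ProdID": ["85123A", "71053", "22633", "22633"],
    "Qta": [6, 6, 6, 12],
})


def baskets_with_multiple_dates(frame):
    """Return the BasketIDs that appear with more than one distinct BasketDate."""
    dates_per_basket = frame.groupby("BasketID")["BasketDate"].nunique()
    return dates_per_basket[dates_per_basket > 1].index.tolist()


def products_with_constant_sale(frame):
    """Return the ProdIDs bought in different quantities whose Sale never changes."""
    per_product = frame.groupby("ProdID").agg(
        n_qta=("Qta", "nunique"),
        n_sale=("Sale", "nunique"),
    )
    mask = (per_product["n_qta"] > 1) & (per_product["n_sale"] == 1)
    return per_product[mask].index.tolist()


inconsistent = baskets_with_multiple_dates(toy)
unit_price_like = products_with_constant_sale(toy)
print(inconsistent)
print(unit_price_like)
```

On the real data, the same two functions could be called on df once it is loaded; an empty first list would support the BasketDate assumption.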

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 471910 entries, 0 to 541909
Data columns (total 8 columns):
 #   Column           Non-Null Count   Dtype  
---  ------           --------------   -----  
 0   BasketID         471910 non-null  object 
 1   BasketDate       471910 non-null  object 
 2   Sale             471910 non-null  object 
 3   CustomerID       406830 non-null  float64
 4   CustomerCountry  471910 non-null  object 
 5   ProdID           471910 non-null  object 
 6   ProdDescr        471157 non-null  object 
 7   Qta              471910 non-null  int64  
dtypes: float64(1), int64(1), object(6)
memory usage: 32.4+ MB


Worth noting: BasketID and Sale have dtype object rather than a numeric type, even though both look numeric in the preview above.
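
One way to see why a column ends up as object is to count how many of its values cannot be parsed as numbers. The sketch below uses a toy frame with made-up rows in the style of the dataset; the helper name `count_non_numeric` is an assumption introduced here for illustration.

```python
import pandas as pd

# Toy values in the style of the dataset (illustrative, not read from the file)
toy = pd.DataFrame({
    "BasketID": ["536365", "C536379", "536366"],
    "Sale": ["2,55", "27,5", "1,85"],
})


def count_non_numeric(series):
    """Count the non-null values that pd.to_numeric cannot parse."""
    converted = pd.to_numeric(series, errors="coerce")
    return int((converted.isna() & series.notna()).sum())


non_numeric_counts = {col: count_non_numeric(toy[col]) for col in toy.columns}
print(non_numeric_counts)
```

In this toy example only one BasketID fails to parse (the one with a letter prefix), while every Sale value fails because of the comma.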

In [5]:
df.describe()

Unnamed: 0,CustomerID,Qta
count,406830.0,471910.0
mean,15287.68416,10.716533
std,1713.603074,231.355136
min,12346.0,-80995.0
25%,13953.0,1.0
50%,15152.0,4.0
75%,16791.0,12.0
max,18287.0,80995.0


df.describe() only summarizes CustomerID and Qta because the remaining columns are of type object; we need to convert them to proper types to get a better understanding of the dataset.
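
As a possible fix for Sale, the records printed later in this notebook show values such as 27,5, which suggests a comma is used as the decimal separator. A minimal sketch of a conversion under that assumption, on a few sample strings rather than the real column:

```python
import pandas as pd

# Sample Sale strings in the style of the dataset (comma as decimal separator)
sale_raw = pd.Series(["2,55", "27,5", "5225,03"], name="Sale")

# Swap the comma for a dot, then parse; unparseable values would become NaN
sale_num = pd.to_numeric(
    sale_raw.str.replace(",", ".", regex=False),
    errors="coerce",
)
print(sale_num)
```

An alternative, if the separator assumption holds for the whole file, would be to pass decimal=',' to pd.read_csv so that the column is parsed as a float at load time.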

In [7]:
basketIdList = pd.to_numeric(df.BasketID, errors='coerce').isnull()

In [27]:
# Print every row whose BasketID could not be parsed as a number, then count them
non_numeric_rows = df[basketIdList]
for _, data_v in non_numeric_rows.iterrows():
    print(data_v)
c_entry_count = len(non_numeric_rows)
print(c_entry_count)

BasketID                  C536379
BasketDate         01/12/10 09:41
Sale                         27,5
CustomerID                  14527
CustomerCountry    United Kingdom
ProdID                          D
ProdDescr                Discount
Qta                            -1
Name: 141, dtype: object
BasketID                                   C536383
BasketDate                          01/12/10 09:49
Sale                                          4,65
CustomerID                                   15311
CustomerCountry                     United Kingdom
ProdID                                      35004C
ProdDescr          SET OF 3 COLOURED  FLYING DUCKS
Qta                                             -1
Name: 154, dtype: object
BasketID                                  C536391
BasketDate                         01/12/10 10:24
Sale                                         1,65
CustomerID                                  17548
CustomerCountry                    United Kingdom
ProdID              

BasketID                        C538876
BasketDate               14/12/10 15:20
Sale                               2,95
CustomerID                        16499
CustomerCountry          United Kingdom
ProdID                            22272
ProdDescr          FELTCRAFT DOLL MARIA
Qta                                  -1
Name: 30569, dtype: object
BasketID                                      C538882
BasketDate                             14/12/10 15:56
Sale                                             1,25
CustomerID                                      12797
CustomerCountry                              Portugal
ProdID                                          21791
ProdDescr          VINTAGE HEADS AND TAILS CARD GAME 
Qta                                               -12
Name: 30853, dtype: object
BasketID                                      C538882
BasketDate                             14/12/10 15:56
Sale                                             1,25
CustomerID                      

BasketID                      C540634
BasketDate             10/01/11 12:02
Sale                             6,75
CustomerID                      13672
CustomerCountry        United Kingdom
ProdID                         79323P
ProdDescr          PINK CHERRY LIGHTS
Qta                                -4
Name: 51026, dtype: object
BasketID                            C540643
BasketDate                   10/01/11 14:08
Sale                                  10,95
CustomerID                            12471
CustomerCountry                     Germany
ProdID                                22423
ProdDescr          REGENCY CAKESTAND 3 TIER
Qta                                      -2
Name: 51157, dtype: object
BasketID                               C540652
BasketDate                      10/01/11 15:04
Sale                                     265,5
CustomerID                               17406
CustomerCountry                 United Kingdom
ProdID                                   22655
ProdDesc

Name: 70628, dtype: object
BasketID                                C542089
BasketDate                       25/01/11 12:39
Sale                                       2,55
CustomerID                                12681
CustomerCountry                          France
ProdID                                    22301
ProdDescr          COFFEE MUG CAT + BIRD DESIGN
Qta                                          -3
Name: 70650, dtype: object
BasketID                                C542089
BasketDate                       25/01/11 12:39
Sale                                       2,55
CustomerID                                12681
CustomerCountry                          France
ProdID                                    22300
ProdDescr          COFFEE MUG DOG + BALL DESIGN
Qta                                          -2
Name: 70651, dtype: object
BasketID                                 C542091
BasketDate                        25/01/11 12:41
Sale                                       12,75
Cust

Name: 80219, dtype: object
BasketID                              C543026
BasketDate                     02/02/11 14:44
Sale                                     9,95
CustomerID                              16717
CustomerCountry                United Kingdom
ProdID                                  22768
ProdDescr          FAMILY PHOTO FRAME CORNICE
Qta                                        -2
Name: 80220, dtype: object
BasketID                     C543026
BasketDate            02/02/11 14:44
Sale                            2,95
CustomerID                     16717
CustomerCountry       United Kingdom
ProdID                         21844
ProdDescr          RED RETROSPOT MUG
Qta                               -2
Name: 80221, dtype: object
BasketID                                   C543039
BasketDate                          02/02/11 16:31
Sale                                          1,25
CustomerID                                   16992
CustomerCountry                     United Kingdom


BasketID                                       C544397
BasketDate                              18/02/11 12:21
Sale                                              2,95
CustomerID                                       12462
CustomerCountry                                  Spain
ProdID                                           22063
ProdDescr          CERAMIC BOWL WITH STRAWBERRY DESIGN
Qta                                                 -1
Name: 94590, dtype: object
BasketID                                      C544397
BasketDate                             18/02/11 12:21
Sale                                             9,95
CustomerID                                      12462
CustomerCountry                                 Spain
ProdID                                          37449
ProdDescr          CERAMIC CAKE STAND + HANGING CAKES
Qta                                                -1
Name: 94591, dtype: object
BasketID                                C544413
BasketDate                

Name: 111507, dtype: object
BasketID                            C545724
BasketDate                   07/03/11 11:18
Sale                                  12,75
CustomerID                            14051
CustomerCountry              United Kingdom
ProdID                                22423
ProdDescr          REGENCY CAKESTAND 3 TIER
Qta                                      -8
Name: 111508, dtype: object
BasketID                                      C545728
BasketDate                             07/03/11 11:41
Sale                                              8,5
CustomerID                                      12484
CustomerCountry                                 Spain
ProdID                                          22636
ProdDescr          CHILDS BREAKFAST SET CIRCUS PARADE
Qta                                                -1
Name: 111514, dtype: object
BasketID                            C545728
BasketDate                   07/03/11 11:41
Sale                                  12,75


Name: 124453, dtype: object
BasketID                             C546978
BasketDate                    18/03/11 12:15
Sale                                    1,45
CustomerID                             17857
CustomerCountry               United Kingdom
ProdID                                 21190
ProdDescr          PINK HEARTS PAPER GARLAND
Qta                                      -50
Name: 124475, dtype: object
BasketID                  C546989
BasketDate         18/03/11 12:59
Sale                      5225,03
CustomerID                    NaN
CustomerCountry    United Kingdom
ProdID                  AMAZONFEE
ProdDescr              AMAZON FEE
Qta                            -1
Name: 124787, dtype: object
BasketID                                      C546997
BasketDate                             18/03/11 13:32
Sale                                             0,55
CustomerID                                      12748
CustomerCountry                        United Kingdom
ProdID        

Name: 140675, dtype: object
BasketID                  C548454
BasketDate         31/03/11 11:42
Sale                          5,7
CustomerID                  16422
CustomerCountry    United Kingdom
ProdID                          M
ProdDescr                  Manual
Qta                            -1
Name: 140691, dtype: object
BasketID                  C548454
BasketDate         31/03/11 11:42
Sale                         7,88
CustomerID                  16422
CustomerCountry    United Kingdom
ProdID                          M
ProdDescr                  Manual
Qta                            -1
Name: 140692, dtype: object
BasketID                  C548454
BasketDate         31/03/11 11:42
Sale                         0,22
CustomerID                  16422
CustomerCountry    United Kingdom
ProdID                          M
ProdDescr                  Manual
Qta                            -1
Name: 140693, dtype: object
BasketID                  C548454
BasketDate         31/03/11 11:42
Sale

BasketID                                C549666
BasketDate                       11/04/11 12:17
Sale                                       3,75
CustomerID                                13113
CustomerCountry                  United Kingdom
ProdID                                    21471
ProdDescr          STRAWBERRY RAFFIA FOOD COVER
Qta                                          -1
Name: 152565, dtype: object
BasketID                     C549666
BasketDate            11/04/11 12:17
Sale                            2,95
CustomerID                     13113
CustomerCountry       United Kingdom
ProdID                         22176
ProdDescr          BLUE OWL SOFT TOY
Qta                               -1
Name: 152566, dtype: object
BasketID                              C549666
BasketDate                     11/04/11 12:17
Sale                                     2,08
CustomerID                              13113
CustomerCountry                United Kingdom
ProdID                            

Name: 165749, dtype: object
BasketID                  C550933
BasketDate         21/04/11 15:13
Sale                       678,03
CustomerID                    NaN
CustomerCountry    United Kingdom
ProdID                          M
ProdDescr                  Manual
Qta                            -1
Name: 166516, dtype: object
BasketID                            C550959
BasketDate                   21/04/11 16:38
Sale                                   4,25
CustomerID                            15594
CustomerCountry              United Kingdom
ProdID                                22960
ProdDescr          JAM MAKING SET WITH JARS
Qta                                      -1
Name: 166967, dtype: object
BasketID                                    C550961
BasketDate                           21/04/11 16:40
Sale                                           1,06
CustomerID                                    14428
CustomerCountry                      United Kingdom
ProdID                          

BasketID                               C552704
BasketDate                      10/05/11 16:05
Sale                                      3,39
CustomerID                               13098
CustomerCountry                 United Kingdom
ProdID                                   22729
ProdDescr          ALARM CLOCK BAKELIKE ORANGE
Qta                                         -1
Name: 184781, dtype: object
BasketID                                       C552704
BasketDate                              10/05/11 16:05
Sale                                              2,55
CustomerID                                       13098
CustomerCountry                         United Kingdom
ProdID                                           20914
ProdDescr          SET/5 RED RETROSPOT LID GLASS BOWLS
Qta                                                 -2
Name: 184782, dtype: object
BasketID                       C552704
BasketDate              10/05/11 16:05
Sale                              1,45
CustomerID 

Name: 205195, dtype: object
BasketID                            C554715
BasketDate                   26/05/11 11:02
Sale                                  12,75
CustomerID                            13658
CustomerCountry              United Kingdom
ProdID                                22423
ProdDescr          REGENCY CAKESTAND 3 TIER
Qta                                      -3
Name: 205196, dtype: object
BasketID                                  C554715
BasketDate                         26/05/11 11:02
Sale                                         1,25
CustomerID                                  13658
CustomerCountry                    United Kingdom
ProdID                                      21231
ProdDescr          SWEETHEART CERAMIC TRINKET BOX
Qta                                            -2
Name: 205197, dtype: object
BasketID                               C554716
BasketDate                      26/05/11 11:03
Sale                                      1,65
CustomerID             

Name: 217082, dtype: object
BasketID                            C555880
BasketDate                   07/06/11 15:32
Sale                                   4,15
CustomerID                            15643
CustomerCountry              United Kingdom
ProdID                                23169
ProdDescr          CLASSIC GLASS COOKIE JAR
Qta                                      -1
Name: 217083, dtype: object
BasketID                                  C555881
BasketDate                         07/06/11 15:37
Sale                                         1,65
CustomerID                                  12473
CustomerCountry                           Germany
ProdID                                      22556
ProdDescr          PLASTERS IN TIN CIRCUS PARADE 
Qta                                            -1
Name: 217084, dtype: object
BasketID                                      C555881
BasketDate                             07/06/11 15:37
Sale                                             2,95
Cu

BasketID                                   C557465
BasketDate                          20/06/11 13:07
Sale                                          1,49
CustomerID                                   14796
CustomerCountry                     United Kingdom
ProdID                                       22057
ProdDescr          CERAMIC PLATE STRAWBERRY DESIGN
Qta                                             -3
Name: 233218, dtype: object
BasketID                                     C557467
BasketDate                            20/06/11 13:13
Sale                                            4,15
CustomerID                                     17157
CustomerCountry                       United Kingdom
ProdID                                         23239
ProdDescr          SET OF 4 KNICK KNACK TINS POPPIES
Qta                                               -1
Name: 233302, dtype: object
BasketID                                     C557467
BasketDate                            20/06/11 13:13
Sale  

BasketID                                C558716
BasketDate                       01/07/11 13:22
Sale                                       0,42
CustomerID                                17888
CustomerCountry                  United Kingdom
ProdID                                    22998
ProdDescr          TRAVEL CARD WALLET KEEP CALM
Qta                                         -24
Name: 246743, dtype: object
BasketID                                 C558716
BasketDate                        01/07/11 13:22
Sale                                        0,42
CustomerID                                 17888
CustomerCountry                   United Kingdom
ProdID                                     22997
ProdDescr          TRAVEL CARD WALLET UNION JACK
Qta                                          -24
Name: 246744, dtype: object
BasketID                          C558716
BasketDate                 01/07/11 13:22
Sale                                 1,95
CustomerID                          17888


BasketID                                  C559939
BasketDate                         14/07/11 10:19
Sale                                         1,25
CustomerID                                  14426
CustomerCountry                    United Kingdom
ProdID                                      21232
ProdDescr          STRAWBERRY CERAMIC TRINKET BOX
Qta                                           -12
Name: 263171, dtype: object
BasketID                                   C559939
BasketDate                          14/07/11 10:19
Sale                                          0,55
CustomerID                                   14426
CustomerCountry                     United Kingdom
ProdID                                       21212
ProdDescr          PACK OF 72 RETROSPOT CAKE CASES
Qta                                            -24
Name: 263172, dtype: object
BasketID                                       C559947
BasketDate                              14/07/11 10:27
Sale                      

Name: 274393, dtype: object
BasketID                             C560910
BasketDate                    21/07/11 17:57
Sale                                    4,95
CustomerID                             14733
CustomerCountry               United Kingdom
ProdID                                 21539
ProdDescr          RED RETROSPOT BUTTER DISH
Qta                                       -1
Name: 274394, dtype: object
BasketID                        C560910
BasketDate               21/07/11 17:57
Sale                               3,95
CustomerID                        14733
CustomerCountry          United Kingdom
ProdID                            22461
ProdDescr          SAVOY ART DECO CLOCK
Qta                                  -5
Name: 274395, dtype: object
BasketID                                C560912
BasketDate                       21/07/11 18:05
Sale                                       3,75
CustomerID                                14400
CustomerCountry                  United King

BasketID                                C562582
BasketDate                       07/08/11 13:53
Sale                                       1,25
CustomerID                                15640
CustomerCountry                  United Kingdom
ProdID                                    21671
ProdDescr          RED SPOT CERAMIC DRAWER KNOB
Qta                                          -8
Name: 293154, dtype: object
BasketID                            C562582
BasketDate                   07/08/11 13:53
Sale                                  16,95
CustomerID                            15640
CustomerCountry              United Kingdom
ProdID                                23008
ProdDescr          DOLLY GIRL BABY GIFT SET
Qta                                      -1
Name: 293155, dtype: object
BasketID                                  C562582
BasketDate                         07/08/11 13:53
Sale                                         9,95
CustomerID                                  15640
Customer

Name: 309590, dtype: object
BasketID                                C564131
BasketDate                       23/08/11 10:55
Sale                                      16,95
CustomerID                                14407
CustomerCountry                  United Kingdom
ProdID                                    22946
ProdDescr          WOODEN ADVENT CALENDAR CREAM
Qta                                          -1
Name: 309592, dtype: object
BasketID                                  C564134
BasketDate                         23/08/11 11:02
Sale                                         1,45
CustomerID                                  12456
CustomerCountry                       Switzerland
ProdID                                      23198
ProdDescr          PANTRY MAGNETIC  SHOPPING LIST
Qta                                            -7
Name: 309606, dtype: object
BasketID                                 C564134
BasketDate                        23/08/11 11:02
Sale                              

BasketID                                 C565044
BasketDate                        31/08/11 17:02
Sale                                        1,45
CustomerID                                 12931
CustomerCountry                   United Kingdom
ProdID                                     21731
ProdDescr          RED TOADSTOOL LED NIGHT LIGHT
Qta                                         -126
Name: 320581, dtype: object
BasketID                                 C565044
BasketDate                        31/08/11 17:02
Sale                                        3,39
CustomerID                                 12931
CustomerCountry                   United Kingdom
ProdID                                     21479
ProdDescr          WHITE SKULL HOT WATER BOTTLE 
Qta                                         -144
Name: 320582, dtype: object
BasketID                                    C565044
BasketDate                           31/08/11 17:02
Sale                                           1,69
Cust

Name: 337747, dtype: object
BasketID                                    C566463
BasketDate                           12/09/11 17:50
Sale                                            8,5
CustomerID                                    17449
CustomerCountry                      United Kingdom
ProdID                                        22169
ProdDescr          FAMILY ALBUM WHITE PICTURE FRAME
Qta                                              -1
Name: 337748, dtype: object
BasketID                              C566464
BasketDate                     12/09/11 17:54
Sale                                      2,1
CustomerID                              15764
CustomerCountry                United Kingdom
ProdID                                  85053
ProdDescr          FRENCH ENAMEL CANDLEHOLDER
Qta                                        -1
Name: 337749, dtype: object
BasketID                                C566464
BasketDate                       12/09/11 17:54
Sale                                

Name: 352783, dtype: object
BasketID                           C567690
BasketDate                  21/09/11 17:01
Sale                                  1,65
CustomerID                           15810
CustomerCountry             United Kingdom
ProdID                               22384
ProdDescr          LUNCH BAG PINK POLKADOT
Qta                                    -11
Name: 352784, dtype: object
BasketID                          C567690
BasketDate                 21/09/11 17:01
Sale                                 1,65
CustomerID                          15810
CustomerCountry            United Kingdom
ProdID                              22383
ProdDescr          LUNCH BAG SUKI DESIGN 
Qta                                    -5
Name: 352785, dtype: object
BasketID                      C567690
BasketDate             21/09/11 17:01
Sale                             1,65
CustomerID                      15810
CustomerCountry        United Kingdom
ProdID                          20726
ProdDesc

BasketID                                 C568879
BasketDate                        29/09/11 12:38
Sale                                        1,45
CustomerID                                 14680
CustomerCountry                   United Kingdom
ProdID                                     23208
ProdDescr          LUNCH BAG VINTAGE LEAF DESIGN
Qta                                          -53
Name: 367285, dtype: object
BasketID                             C568879
BasketDate                    29/09/11 12:38
Sale                                    1,45
CustomerID                             14680
CustomerCountry               United Kingdom
ProdID                                 23207
ProdDescr          LUNCH BAG ALPHABET DESIGN
Qta                                      -82
Name: 367286, dtype: object
BasketID                                  C568879
BasketDate                         29/09/11 12:38
Sale                                         0,42
CustomerID                                

Name: 382798, dtype: object
BasketID                           C569954
BasketDate                  06/10/11 18:34
Sale                                  1,95
CustomerID                           14792
CustomerCountry             United Kingdom
ProdID                              47594A
ProdDescr          CAROUSEL DESIGN WASHBAG
Qta                                     -1
Name: 382799, dtype: object
BasketID                                C569954
BasketDate                       06/10/11 18:34
Sale                                       4,25
CustomerID                                14792
CustomerCountry                  United Kingdom
ProdID                                    22371
ProdDescr          AIRLINE BAG VINTAGE TOKYO 78
Qta                                          -1
Name: 382800, dtype: object
BasketID                           C569955
BasketDate                  06/10/11 18:35
Sale                                  5,75
CustomerID                           17644
CustomerCountry 

Name: 390542, dtype: object
BasketID                              C570556
BasketDate                     11/10/11 11:10
Sale                                     1,25
CustomerID                              16029
CustomerCountry                United Kingdom
ProdID                                  22147
ProdDescr          FELTCRAFT BUTTERFLY HEARTS
Qta                                      -840
Name: 390543, dtype: object
BasketID                                    C570556
BasketDate                           11/10/11 11:10
Sale                                           1,06
CustomerID                                    16029
CustomerCountry                      United Kingdom
ProdID                                        20971
ProdDescr          PINK BLUE FELT CRAFT TRINKET BOX
Qta                                           -1296
Name: 390544, dtype: object
BasketID                                     C570556
BasketDate                            11/10/11 11:10
Sale                      

BasketID                                     C571499
BasketDate                            17/10/11 15:07
Sale                                             6,5
CustomerID                                     12454
CustomerCountry                                Spain
ProdID                                         23071
ProdDescr          MARIE ANTOINETTE TRINKET BOX GOLD
Qta                                              -48
Name: 402442, dtype: object
BasketID                          C571499
BasketDate                 17/10/11 15:07
Sale                                41,75
CustomerID                          12454
CustomerCountry                     Spain
ProdID                              23064
ProdDescr          CINDERELLA CHANDELIER 
Qta                                   -10
Name: 402443, dtype: object
BasketID                                    C571499
BasketDate                           17/10/11 15:07
Sale                                           4,15
CustomerID                  

BasketID                                 C572989
BasketDate                        27/10/11 10:53
Sale                                        0,85
CustomerID                                 15786
CustomerCountry                   United Kingdom
ProdID                                     22603
ProdDescr          CHRISTMAS RETROSPOT TREE WOOD
Qta                                           -2
Name: 421483, dtype: object
BasketID                            C572991
BasketDate                   27/10/11 10:56
Sale                                  12,75
CustomerID                            17672
CustomerCountry              United Kingdom
ProdID                                22423
ProdDescr          REGENCY CAKESTAND 3 TIER
Qta                                      -1
Name: 421498, dtype: object
BasketID                                C572991
BasketDate                       27/10/11 10:56
Sale                                       2,55
CustomerID                                17672
Customer

Name: 439970, dtype: object
BasketID                                C574493
BasketDate                       04/11/11 13:09
Sale                                       1,65
CustomerID                                14842
CustomerCountry                  United Kingdom
ProdID                                    23103
ProdDescr          JINGLE BELL HEART DECORATION
Qta                                          -3
Name: 439971, dtype: object
BasketID                               C574493
BasketDate                      04/11/11 13:09
Sale                                      4,95
CustomerID                               14842
CustomerCountry                 United Kingdom
ProdID                                   23117
ProdDescr          POPPY FIELDS CHOPPING BOARD
Qta                                         -1
Name: 439972, dtype: object
BasketID                         C574493
BasketDate                04/11/11 13:09
Sale                                4,95
CustomerID                       

Name: 455288, dtype: object
BasketID                                    C575608
BasketDate                           10/11/11 12:38
Sale                                           4,25
CustomerID                                    16133
CustomerCountry                      United Kingdom
ProdID                                        22139
ProdDescr          RETROSPOT TEA SET CERAMIC 11 PC 
Qta                                              -3
Name: 455289, dtype: object
BasketID                               C575616
BasketDate                      10/11/11 12:51
Sale                                      4,15
CustomerID                               15805
CustomerCountry                 United Kingdom
ProdID                                   23469
ProdDescr          CARD HOLDER LOVE BIRD SMALL
Qta                                         -2
Name: 455407, dtype: object
BasketID                                C575616
BasketDate                       10/11/11 12:51
Sale                        

BasketID                                  C577073
BasketDate                         17/11/11 15:02
Sale                                         2,95
CustomerID                                  14359
CustomerCountry                    United Kingdom
ProdID                                      22698
ProdDescr          PINK REGENCY TEACUP AND SAUCER
Qta                                            -3
Name: 477726, dtype: object
BasketID                                C577075
BasketDate                       17/11/11 15:08
Sale                                       4,95
CustomerID                                14456
CustomerCountry                  United Kingdom
ProdID                                    22507
ProdDescr          MEMO BOARD RETROSPOT  DESIGN
Qta                                          -1
Name: 477744, dtype: object
BasketID                             C577081
BasketDate                    17/11/11 15:24
Sale                                     2,1
CustomerID               

BasketID                                     C578363
BasketDate                            24/11/11 10:40
Sale                                            3,75
CustomerID                                     15482
CustomerCountry                       United Kingdom
ProdID                                         22114
ProdDescr          HOT WATER BOTTLE TEA AND SYMPATHY
Qta                                              -24
Name: 497338, dtype: object
BasketID                              C578363
BasketDate                     24/11/11 10:40
Sale                                     4,25
CustomerID                              15482
CustomerCountry                United Kingdom
ProdID                                  22112
ProdDescr          CHOCOLATE HOT WATER BOTTLE
Qta                                       -24
Name: 497339, dtype: object
BasketID                                     C578363
BasketDate                            24/11/11 10:40
Sale                                          

Name: 516523, dtype: object
BasketID                           C579926
BasketDate                  01/12/11 09:19
Sale                                  1,65
CustomerID                           14389
CustomerCountry             United Kingdom
ProdID                               20727
ProdDescr          LUNCH BAG  BLACK SKULL.
Qta                                     -2
Name: 516524, dtype: object
BasketID                                 C579928
BasketDate                        01/12/11 09:29
Sale                                          25
CustomerID                                 14121
CustomerCountry                   United Kingdom
ProdID                                     23485
ProdDescr          BOTANICAL GARDENS WALL CLOCK 
Qta                                           -1
Name: 516549, dtype: object
BasketID                              C579929
BasketDate                     01/12/11 09:34
Sale                                     9,95
CustomerID                              17

BasketID                            C581228
BasketDate                   08/12/11 10:06
Sale                                  10,95
CustomerID                            16019
CustomerCountry              United Kingdom
ProdID                                22423
ProdDescr          REGENCY CAKESTAND 3 TIER
Qta                                      -6
Name: 536911, dtype: object
BasketID                                    C581228
BasketDate                           08/12/11 10:06
Sale                                           1,25
CustomerID                                    16019
CustomerCountry                      United Kingdom
ProdID                                        23210
ProdDescr          WHITE ROCKING HORSE HAND PAINTED
Qta                                             -12
Name: 536912, dtype: object
BasketID                               C581228
BasketDate                      08/12/11 10:06
Sale                                      2,95
CustomerID                         

In [9]:
basket_list = []
for i in range(basketIdList.size):
    if basketIdList.iat[i]:  # Assumes basketIdList holds booleans, one per row of df
        data_v = df.iloc[i]
        basket_list.append(data_v.BasketID[:3])  # First three characters of the BasketID

print(set(basket_list))  # Unique prefixes

{'C53', 'A56', 'C54', 'C56', 'C58', 'C55', 'C57'}


Notice that a good chunk of the BasketID values start with "C" instead of being purely numeric, while some start with "A".
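
As a rough sketch, the same prefix check can be done without an explicit loop. The frame `toy` below is a hypothetical stand-in modeled on the rows shown in this notebook, not the real `df`, and the names `prefix` and `prefix_counts` are introduced here only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for df: a handful of BasketID values in the same format.
toy = pd.DataFrame({"BasketID": ["536365", "C536379", "C536383", "A563186", "536366"]})

# Leading letter of each BasketID, or NaN when the ID starts with a digit.
prefix = toy["BasketID"].str.extract(r"^([A-Za-z])", expand=False)

# Rows per prefix; dropna=False keeps the purely numeric IDs as a NaN bucket.
prefix_counts = prefix.value_counts(dropna=False)
print(prefix_counts)
```

Counting rows per prefix, rather than only listing the distinct prefixes, also shows how common each kind of BasketID is.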

In [10]:
basket_c_list = []
basket_a_list = []
for i in range(basketIdList.size):
    if basketIdList.iat[i]:  # Assumes basketIdList holds booleans, one per row of df
        data_v = df.iloc[i]
        # Split the flagged rows by the first character of their BasketID
        if data_v.BasketID[0] == "C":
            basket_c_list.append(data_v)
        elif data_v.BasketID[0] == "A":
            basket_a_list.append(data_v)

In [21]:
print("Records starting with 'C':\n")
for entry in basket_c_list:
    print(entry)
    print("\n")

Records starting with 'C':

BasketID                  C536379
BasketDate         01/12/10 09:41
Sale                         27,5
CustomerID                  14527
CustomerCountry    United Kingdom
ProdID                          D
ProdDescr                Discount
Qta                            -1
Name: 141, dtype: object


BasketID                                   C536383
BasketDate                          01/12/10 09:49
Sale                                          4,65
CustomerID                                   15311
CustomerCountry                     United Kingdom
ProdID                                      35004C
ProdDescr          SET OF 3 COLOURED  FLYING DUCKS
Qta                                             -1
Name: 154, dtype: object


BasketID                                  C536391
BasketDate                         01/12/10 10:24
Sale                                         1,65
CustomerID                                  17548
CustomerCountry                    Uni

Name: 27367, dtype: object


BasketID                               C538536
BasketDate                      13/12/10 10:40
Sale                                      0,85
CustomerID                               15881
CustomerCountry                 United Kingdom
ProdID                                   22834
ProdDescr          HAND WARMER BABUSHKA DESIGN
Qta                                        -48
Name: 27389, dtype: object


BasketID                                     C538595
BasketDate                            13/12/10 12:03
Sale                                            2,55
CustomerID                                     15555
CustomerCountry                       United Kingdom
ProdID                                         82482
ProdDescr          WOODEN PICTURE FRAME WHITE FINISH
Qta                                               -2
Name: 27993, dtype: object


BasketID                       C538628
BasketDate              13/12/10 13:09
Sale                              2

Name: 45904, dtype: object


BasketID                            C540307
BasketDate                   06/01/11 12:58
Sale                                   1,25
CustomerID                            15823
CustomerCountry              United Kingdom
ProdID                                22094
ProdDescr          RED RETROSPOT TISSUE BOX
Qta                                     -12
Name: 45905, dtype: object


BasketID                       C540307
BasketDate              06/01/11 12:58
Sale                              1,25
CustomerID                       15823
CustomerCountry         United Kingdom
ProdID                           22093
ProdDescr          MOTORING TISSUE BOX
Qta                                -12
Name: 45906, dtype: object


BasketID                              C540307
BasketDate                     06/01/11 12:58
Sale                                      2,1
CustomerID                              15823
CustomerCountry                United Kingdom
ProdID             

Name: 72728, dtype: object


BasketID                        C542256
BasketDate               26/01/11 17:04
Sale                               1,06
CustomerID                        17722
CustomerCountry          United Kingdom
ProdID                            22095
ProdDescr          LADS ONLY TISSUE BOX
Qta                                  -1
Name: 72729, dtype: object


BasketID                                C542257
BasketDate                       26/01/11 17:06
Sale                                       7,95
CustomerID                                14341
CustomerCountry                  United Kingdom
ProdID                                    22841
ProdDescr          ROUND CAKE TIN VINTAGE GREEN
Qta                                          -2
Name: 72730, dtype: object


BasketID                      C542259
BasketDate             26/01/11 17:09
Sale                              8,5
CustomerID                      16656
CustomerCountry        United Kingdom
ProdID             

Name: 88128, dtype: object


BasketID                             C543772
BasketDate                    11/02/11 15:43
Sale                                    0,85
CustomerID                             13534
CustomerCountry               United Kingdom
ProdID                                 22355
ProdDescr          CHARLOTTE BAG SUKI DESIGN
Qta                                       -1
Name: 88129, dtype: object


BasketID                          C543773
BasketDate                 11/02/11 15:47
Sale                                 3,75
CustomerID                          12410
CustomerCountry               Switzerland
ProdID                              21218
ProdDescr          RED SPOTTY BISCUIT TIN
Qta                                    -1
Name: 88130, dtype: object


BasketID                        C543773
BasketDate               11/02/11 15:47
Sale                                8,5
CustomerID                        12410
CustomerCountry             Switzerland
ProdID           

Name: 116079, dtype: object


BasketID                      C546210
BasketDate             10/03/11 11:31
Sale                             1,25
CustomerID                      14713
CustomerCountry        United Kingdom
ProdID                          21272
ProdDescr          SALLE DE BAIN HOOK
Qta                                -1
Name: 116080, dtype: object


BasketID                         C546211
BasketDate                10/03/11 11:32
Sale                                0,85
CustomerID                         15358
CustomerCountry           United Kingdom
ProdID                             22962
ProdDescr          JAM JAR WITH PINK LID
Qta                                   -3
Name: 116081, dtype: object


BasketID                             C546214
BasketDate                    10/03/11 11:45
Sale                                    5,95
CustomerID                             15724
CustomerCountry               United Kingdom
ProdID                                 22499
ProdDesc

Name: 140796, dtype: object


BasketID                                      C548463
BasketDate                             31/03/11 12:11
Sale                                             5,95
CustomerID                                      13225
CustomerCountry                        United Kingdom
ProdID                                          82483
ProdDescr          WOOD 2 DRAWER CABINET WHITE FINISH
Qta                                               -46
Name: 140797, dtype: object


BasketID                              C548464
BasketDate                     31/03/11 12:14
Sale                                    14,95
CustomerID                              13043
CustomerCountry                United Kingdom
ProdID                                  22846
ProdDescr          BREAD BIN DINER STYLE RED 
Qta                                        -3
Name: 140798, dtype: object


BasketID                               C548465
BasketDate                      31/03/11 12:17
Sale            


BasketID                             C550181
BasketDate                    14/04/11 17:43
Sale                                     2,1
CustomerID                             13069
CustomerCountry               United Kingdom
ProdID                                 22748
ProdDescr          POPPY'S PLAYHOUSE KITCHEN
Qta                                       -1
Name: 156886, dtype: object


BasketID                          C550181
BasketDate                 14/04/11 17:43
Sale                                  2,1
CustomerID                          13069
CustomerCountry            United Kingdom
ProdID                              22381
ProdDescr          TOY TIDY PINK POLKADOT
Qta                                   -10
Name: 156887, dtype: object


BasketID                           C550181
BasketDate                  14/04/11 17:43
Sale                                  2,95
CustomerID                           13069
CustomerCountry             United Kingdom
ProdID                      


BasketID                                C552720
BasketDate                       11/05/11 09:49
Sale                                        2,1
CustomerID                                18272
CustomerCountry                  United Kingdom
ProdID                                    84817
ProdDescr          DANISH ROSE DECORATIVE PLATE
Qta                                          -2
Name: 184924, dtype: object


BasketID                         C552720
BasketDate                11/05/11 09:49
Sale                                2,95
CustomerID                         18272
CustomerCountry           United Kingdom
ProdID                             20932
ProdDescr          PINK POT PLANT CANDLE
Qta                                   -1
Name: 184925, dtype: object


BasketID                                C552720
BasketDate                       11/05/11 09:49
Sale                                       1,45
CustomerID                                18272
CustomerCountry                  Un



BasketID                             C554486
BasketDate                    24/05/11 13:25
Sale                                     2,1
CustomerID                             13168
CustomerCountry               United Kingdom
ProdID                                 82582
ProdDescr          AREA PATROLLED METAL SIGN
Qta                                       -1
Name: 202569, dtype: object


BasketID                             C554493
BasketDate                    24/05/11 13:59
Sale                                   29,95
CustomerID                             16818
CustomerCountry               United Kingdom
ProdID                                 84616
ProdDescr          SILVER ROCCOCO CHANDELIER
Qta                                       -1
Name: 202691, dtype: object


BasketID                  C554516
BasketDate         24/05/11 16:50
Sale                        13,88
CustomerID                  14527
CustomerCountry    United Kingdom
ProdID                          D
ProdDescr     

Name: 224383, dtype: object


BasketID                     C556522
BasketDate            13/06/11 11:21
Sale                            0,55
CustomerID                     16938
CustomerCountry       United Kingdom
ProdID                         22920
ProdDescr          HERB MARKER BASIL
Qta                            -1515
Name: 224419, dtype: object


BasketID                              C556530
BasketDate                     13/06/11 11:42
Sale                                     9,95
CustomerID                              18109
CustomerCountry                United Kingdom
ProdID                                  22501
ProdDescr          PICNIC BASKET WICKER LARGE
Qta                                        -3
Name: 224503, dtype: object


BasketID                      C556532
BasketDate             13/06/11 11:53
Sale                             0,65
CustomerID                      13856
CustomerCountry        United Kingdom
ProdID                          79000
ProdDescr         

BasketID                                C558444
BasketDate                       29/06/11 13:29
Sale                                       0,85
CustomerID                                15311
CustomerCountry                  United Kingdom
ProdID                                   75049L
ProdDescr          LARGE CIRCULAR MIRROR MOBILE
Qta                                          -4
Name: 243320, dtype: object


BasketID                               C558444
BasketDate                      29/06/11 13:29
Sale                                      0,64
CustomerID                               15311
CustomerCountry                 United Kingdom
ProdID                                   21313
ProdDescr          GLASS HEART T-LIGHT HOLDER 
Qta                                         -1
Name: 243321, dtype: object


BasketID                                    C558444
BasketDate                           29/06/11 13:29
Sale                                           1,06
CustomerID              

Name: 268324, dtype: object


BasketID                                 C560409
BasketDate                        18/07/11 14:24
Sale                                        7,95
CustomerID                                 16717
CustomerCountry                   United Kingdom
ProdID                                     22794
ProdDescr          SWEETHEART WIRE MAGAZINE RACK
Qta                                           -2
Name: 268325, dtype: object


BasketID                         C560409
BasketDate                18/07/11 14:24
Sale                                4,95
CustomerID                         16717
CustomerCountry           United Kingdom
ProdID                             22784
ProdDescr          LANTERN CREAM GAZEBO 
Qta                                   -3
Name: 268326, dtype: object


BasketID                               C560409
BasketDate                      18/07/11 14:24
Sale                                      9,95
CustomerID                               16717
Cu

Name: 290365, dtype: object


BasketID                                     C562375
BasketDate                            04/08/11 14:46
Sale                                            2,55
CustomerID                                     14911
CustomerCountry                                 EIRE
ProdID                                         22910
ProdDescr          PAPER CHAIN KIT VINTAGE CHRISTMAS
Qta                                             -200
Name: 290366, dtype: object


BasketID                                       C562375
BasketDate                              04/08/11 14:46
Sale                                              0,72
CustomerID                                       14911
CustomerCountry                                   EIRE
ProdID                                           22909
ProdDescr          SET OF 20 VINTAGE CHRISTMAS NAPKINS
Qta                                               -192
Name: 290367, dtype: object


BasketID                                    C5

Name: 317309, dtype: object


BasketID                                C564759
BasketDate                       30/08/11 10:40
Sale                                       3,75
CustomerID                                14911
CustomerCountry                            EIRE
ProdID                                    21471
ProdDescr          STRAWBERRY RAFFIA FOOD COVER
Qta                                          -2
Name: 317310, dtype: object


BasketID                            C564759
BasketDate                   30/08/11 10:40
Sale                                   4,25
CustomerID                            14911
CustomerCountry                        EIRE
ProdID                                22960
ProdDescr          JAM MAKING SET WITH JARS
Qta                                      -1
Name: 317311, dtype: object


BasketID                   C564763
BasketDate          30/08/11 10:49
Sale                           1,6
CustomerID                   14096
CustomerCountry     United Kingdom

Name: 342611, dtype: object


BasketID                  C566925
BasketDate         15/09/11 15:18
Sale                      1829,84
CustomerID                  12748
CustomerCountry    United Kingdom
ProdID                          M
ProdDescr                  Manual
Qta                            -1
Name: 342996, dtype: object


BasketID                         C566939
BasketDate                15/09/11 16:11
Sale                                4,25
CustomerID                         14111
CustomerCountry           United Kingdom
ProdID                             22784
ProdDescr          LANTERN CREAM GAZEBO 
Qta                                   -1
Name: 343171, dtype: object


BasketID                                  C566940
BasketDate                         15/09/11 16:13
Sale                                         2,95
CustomerID                                  14105
CustomerCountry                    United Kingdom
ProdID                                      22698
ProdDescr 

Name: 367021, dtype: object


BasketID                      C568830
BasketDate             29/09/11 11:34
Sale                             1,65
CustomerID                      13725
CustomerCountry        United Kingdom
ProdID                          20726
ProdDescr          LUNCH BAG WOODLAND
Qta                                -1
Name: 367022, dtype: object


BasketID                         C568832
BasketDate                29/09/11 11:35
Sale                                5,15
CustomerID                         17450
CustomerCountry           United Kingdom
ProdID                             23113
ProdDescr          PANTRY CHOPPING BOARD
Qta                                 -186
Name: 367023, dtype: object


BasketID                           C568879
BasketDate                  29/09/11 12:38
Sale                                  1,45
CustomerID                           14680
CustomerCountry             United Kingdom
ProdID                               20725
ProdDescr          L

Name: 384026, dtype: object


BasketID                                       C570099
BasketDate                              07/10/11 12:11
Sale                                              1,65
CustomerID                                       13798
CustomerCountry                         United Kingdom
ProdID                                           21928
ProdDescr          JUMBO BAG SCANDINAVIAN BLUE PAISLEY
Qta                                                 -1
Name: 384027, dtype: object


BasketID                           C570099
BasketDate                  07/10/11 12:11
Sale                                  1,65
CustomerID                           13798
CustomerCountry             United Kingdom
ProdID                              85099B
ProdDescr          JUMBO BAG RED RETROSPOT
Qta                                     -4
Name: 384028, dtype: object


BasketID                   C570099
BasketDate          07/10/11 12:11
Sale                          1,65
CustomerID           

Name: 400733, dtype: object


BasketID                                C571346
BasketDate                       17/10/11 11:46
Sale                                       1,45
CustomerID                                16101
CustomerCountry                  United Kingdom
ProdID                                    22969
ProdDescr          HOMEMADE JAM SCENTED CANDLES
Qta                                          -1
Name: 400734, dtype: object


BasketID                   C571440
BasketDate          17/10/11 13:31
Sale                        495,98
CustomerID                   14096
CustomerCountry     United Kingdom
ProdID                        CRUK
ProdDescr          CRUK Commission
Qta                             -1
Name: 401767, dtype: object


BasketID                            C571466
BasketDate                   17/10/11 14:31
Sale                                   1,25
CustomerID                            16463
CustomerCountry              United Kingdom
ProdID                    



BasketID                                 C573283
BasketDate                        28/10/11 13:57
Sale                                        9,95
CustomerID                                 18030
CustomerCountry                   United Kingdom
ProdID                                     22776
ProdDescr          SWEETHEART 3 TIER CAKE STAND 
Qta                                           -1
Name: 424728, dtype: object


BasketID                                 C573306
BasketDate                        28/10/11 16:15
Sale                                        9,95
CustomerID                                 17837
CustomerCountry                   United Kingdom
ProdID                                     22796
ProdDescr          PHOTO FRAME 3 CLASSIC HANGING
Qta                                           -1
Name: 425161, dtype: object


BasketID                                      C573306
BasketDate                             28/10/11 16:15
Sale                                          

Name: 455513, dtype: object


BasketID                                     C575626
BasketDate                            10/11/11 13:21
Sale                                            4,95
CustomerID                                     17651
CustomerCountry                       United Kingdom
ProdID                                         22720
ProdDescr          SET OF 3 CAKE TINS PANTRY DESIGN 
Qta                                               -1
Name: 455514, dtype: object


BasketID                                       C575630
BasketDate                              10/11/11 13:35
Sale                                              4,25
CustomerID                                       15830
CustomerCountry                         United Kingdom
ProdID                                          84029G
ProdDescr          KNITTED UNION FLAG HOT WATER BOTTLE
Qta                                                 -2
Name: 455574, dtype: object


BasketID                              C575630


Name: 490508, dtype: object


BasketID                  C578073
BasketDate         22/11/11 16:02
Sale                         0,32
CustomerID                  18139
CustomerCountry    United Kingdom
ProdID                          M
ProdDescr                  Manual
Qta                           -36
Name: 491655, dtype: object


BasketID                  C578073
BasketDate         22/11/11 16:02
Sale                         0,56
CustomerID                  18139
CustomerCountry    United Kingdom
ProdID                          M
ProdDescr                  Manual
Qta                           -36
Name: 491656, dtype: object


BasketID                  C578076
BasketDate         22/11/11 16:18
Sale                         0,56
CustomerID                  18139
CustomerCountry    United Kingdom
ProdID                          M
ProdDescr                  Manual
Qta                           -24
Name: 491886, dtype: object


BasketID                                     C578077
BasketDate 

Name: 529658, dtype: object


BasketID                                 C580740
BasketDate                        06/12/11 09:25
Sale                                        1,25
CustomerID                                 12479
CustomerCountry                          Germany
ProdID                                     20979
ProdDescr          36 PENCILS TUBE RED RETROSPOT
Qta                                          -12
Name: 529659, dtype: object


BasketID                      C580741
BasketDate             06/12/11 09:29
Sale                             2,08
CustomerID                      17659
CustomerCountry        United Kingdom
ProdID                          23084
ProdDescr          RABBIT NIGHT LIGHT
Qta                                -2
Name: 529660, dtype: object


BasketID                                 C580741
BasketDate                        06/12/11 09:29
Sale                                        9,95
CustomerID                                 17659
CustomerCountry   

In [18]:
print("Records starting with 'A':\n")
for entry in basket_a_list:
    print(entry)
    print("\n")

Records starting with 'A':

BasketID                   A563186
BasketDate          12/08/11 14:51
Sale                     -11062,06
CustomerID                     NaN
CustomerCountry     United Kingdom
ProdID                           B
ProdDescr          Adjust bad debt
Qta                              1
Name: 299983, dtype: object


BasketID                   A563187
BasketDate          12/08/11 14:52
Sale                     -11062,06
CustomerID                     NaN
CustomerCountry     United Kingdom
ProdID                           B
ProdDescr          Adjust bad debt
Qta                              1
Name: 299984, dtype: object




There seems to be a strong correlation between a BasketID starting with "C" and a negative quantity; this could indicate a customer who asked for a refund.  

There is also an interesting correlation between a BasketID starting with "A" and a ProdDescr of "Adjust bad debt". Maybe the "A" stands for "adjust": since the CustomerID is NaN in both cases, this could be an operation that concerns only the management of the shop and not the customers (who I assume are our primary focus).  
Or maybe a cashier typed a wrong number into the cash register and corrected it with this new operation.  
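
A minimal sketch of how the "A" observation could be checked programmatically. The frame `toy` is a hypothetical stand-in that copies a few values from the records printed above; the names `a_rows`, `all_anonymous` and `all_adjustments` are assumptions made for this example only.

```python
import pandas as pd

# Hypothetical stand-in for df, using values seen in the printed records.
toy = pd.DataFrame({
    "BasketID": ["A563186", "A563187", "C536379", "536365"],
    "CustomerID": [float("nan"), float("nan"), 14527.0, 17850.0],
    "ProdDescr": ["Adjust bad debt", "Adjust bad debt", "Discount", "WHITE METAL LANTERN"],
})

# Rows whose BasketID starts with "A".
a_rows = toy[toy["BasketID"].str.startswith("A")]

# Do all of them lack a CustomerID, and are all of them bad-debt adjustments?
all_anonymous = bool(a_rows["CustomerID"].isna().all())
all_adjustments = bool(a_rows["ProdDescr"].str.contains("Adjust bad debt", regex=False).all())
print(all_anonymous, all_adjustments)
```

If both flags are true on the real data, the "A" rows could reasonably be treated as accounting entries rather than customer purchases.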

Let's investigate the "C" hypothesis.

In [64]:
neg_qta = df.loc[df['Qta'] < 0]
# Apply the "C" filter to the negative-quantity rows so that the count matches its label
neg_c_qta = neg_qta.loc[neg_qta["BasketID"].str.contains("C")]
print("Total entries with negative Qta: " + str(len(neg_qta)))
print("Total entries with C in BasketID and negative Qta: " + str(len(neg_c_qta)))


Total entries with negative Qta: 9752
Total entries with C in BasketID and negative Qta: 9084


In [69]:
neg_qta_wo_C = neg_qta.drop(neg_c_qta.index)
len(neg_qta_wo_C)

668
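
The overlap between the two conditions can also be summarized in a single table. The following is only a sketch on a hypothetical frame `toy` (the row with BasketID "536589" is invented for illustration), using `pd.crosstab` to count rows by prefix and by sign of Qta, and then isolating the negative rows without a "C" prefix.

```python
import pandas as pd

# Hypothetical stand-in for df; the third row is an invented negative-Qta row without "C".
toy = pd.DataFrame({
    "BasketID": ["C536379", "C536383", "536589", "536365", "A563186"],
    "Qta": [-1, -1, -10, 6, 1],
})

starts_with_c = toy["BasketID"].str.startswith("C")
negative_qta = toy["Qta"] < 0

# Contingency table: BasketID prefix (rows) against sign of Qta (columns).
overlap = pd.crosstab(
    starts_with_c.map({True: "C", False: "other"}),
    negative_qta.map({True: "negative", False: "non-negative"}),
    rownames=["prefix"],
    colnames=["Qta"],
)
print(overlap)

# Negative quantities that are not explained by a "C" BasketID.
leftover = toy[negative_qta & ~starts_with_c]
print(leftover)
```

On the real data, the `leftover` rows would correspond to the 668 entries counted above and are the ones that still need an explanation.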