## Andrew Byrnes: Fetch Rewards Coding Exercise - Data Analyst
### 1-Data_Prep.ipynb

This notebook preps the provided data files and loads them into a SQLite database. It includes an entity relationship diagram of how I've modeled this data.
I chose SQLite for this challenge because it is lightweight and lends itself well to sharing.  
SQLite's flexible typing rules could be a liability for a database that is continually updated and used primarily for analysis. For that use case I would choose something that follows the SQL standard more strictly.
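
To make the typing concern concrete, here is a minimal sketch using an in-memory database. The table `t` and its column `n` are hypothetical and not part of fetch.db. A column declared INTEGER converts text that looks like a number, but still accepts text that does not:

```python
import sqlite3

# in-memory database, so nothing touches fetch.db
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")

# '123' looks like an integer, 'abc' does not
conn.executemany("INSERT INTO t (n) VALUES (?)", [("123",), ("abc",)])

# typeof() reports the storage class SQLite actually used for each value
rows = conn.execute("SELECT n, typeof(n) FROM t ORDER BY rowid").fetchall()
conn.close()
print(rows)
```

The second insert succeeds and is stored as text, which is the kind of silent mismatch a more strictly typed database would reject.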

### Data Sources
- data-modeling.html : coding exercise instructions
- brands.json.gz, receipts.json.gz, users.json.gz : raw data files provided for completion of the challenge

### Changes
- 09-17-2022 : Started project, first look at data, identified transformation tasks 
- 09-18-2022 : cleaned df_brands _id and cpg columns
- 09-19-2022 : wrote function to clean columns with dicts, wrote function to convert epoch time to timestamps
- 09-20-2022 : refactored functions, applied functions cleaning and converting data, explored df_receipts.rewardsReceiptItemList, notes on stakeholder questions
- 09-21-2022 : brainstorming notes on receipt_items, created df_receipt_items dataframe, loaded all 4 dataframes to SQLite fetch.db, drew fetch.db ERD
- 09-22-2022 : added a dupe_barcodes column to the brands data and reloaded, restarted kernel and re-ran everything
- 09-24-2022 : adjusting receipt_items ETL to bring in brandcode, regex to extract brandcodes, add extracted_brand_code to receipt_items

In [2]:
import pandas as pd
from pathlib import Path
import os
from datetime import datetime
import gzip
import json
import sqlite3

### File Locations

In [3]:
today = datetime.today()
print(today)
in_brands = Path.cwd() / "data" / "raw" / "brands.json.gz"
in_receipts = Path.cwd() / "data" / "raw" / "receipts.json.gz"
in_users = Path.cwd() / "data" / "raw" / "users.json.gz"
db_path = Path.cwd() / "data" / "processed" / "fetch.db"

2022-09-24 09:06:58.280927


### Drop database if exists

In [4]:
if os.path.exists(db_path):
    os.remove(db_path)
    print("The db has been removed successfully")
else:
    print("The db does not exist!")

The db has been removed successfully
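
Since db_path is already a `Path` object, the same reset can also be written with pathlib alone. This is a minimal sketch, assuming Python 3.8+ for the `missing_ok` argument; the temporary directory and the `demo_db` name are stand-ins for the real path:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    demo_db = Path(tmp) / "fetch.db"   # hypothetical stand-in for db_path
    demo_db.touch()                     # pretend a database from an earlier run exists

    existed = demo_db.exists()
    demo_db.unlink(missing_ok=True)     # removes the file; no error if it is absent
    demo_db.unlink(missing_ok=True)     # safe to call again when no db exists

    still_there = demo_db.exists()

print(existed, still_there)
```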


### Formatting and options

In [5]:
pd.set_option('display.max_colwidth', None)
# pd.set_option('display.max_rows', None)
pd.reset_option('display.max_rows')
pd.set_option('display.max_columns', None)
# pd.reset_option('display.max_columns')
# suppressing a warning related to renaming columns just prior to loading to sqlite
pd.options.mode.chained_assignment = None

### Load JSON data into pandas DataFrames

In [6]:
df_brands = pd.read_json(in_brands,lines=True,compression='gzip')
df_receipts = pd.read_json(in_receipts,lines=True,compression='gzip')
df_users = pd.read_json(in_users,lines=True,compression='gzip')
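
`lines=True` tells pandas that each line of the file is its own JSON document, and `compression='gzip'` decompresses the file while reading. The sketch below reproduces that reading pattern with only the standard library on a small made-up file (sample.json.gz and its two records are hypothetical), which can be handy for spot-checking a raw file:

```python
import gzip
import json
import tempfile
from pathlib import Path

# two made-up records in the same shape as the brands file
sample = [
    {"_id": {"$oid": "a1"}, "name": "brand one"},
    {"_id": {"$oid": "b2"}, "name": "brand two"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.json.gz"

    # write one JSON document per line, gzip-compressed
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in sample:
            f.write(json.dumps(record) + "\n")

    # read it back line by line, skipping blank lines
    with gzip.open(path, "rt", encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]

print(records[0]["_id"]["$oid"], len(records))
```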

### First look at data

### **brands**  
**to-do**:
- ~extract _ids~
- ~extract cpg ids~

In [7]:
df_brands.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1167 entries, 0 to 1166
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   _id           1167 non-null   object 
 1   barcode       1167 non-null   int64  
 2   category      1012 non-null   object 
 3   categoryCode  517 non-null    object 
 4   cpg           1167 non-null   object 
 5   name          1167 non-null   object 
 6   topBrand      555 non-null    float64
 7   brandCode     933 non-null    object 
dtypes: float64(1), int64(1), object(6)
memory usage: 73.1+ KB


**Brand Data Schema**
- _id: brand uuid
- barcode: the barcode on the item
- brandCode: String that corresponds with the brand column in a partner product file
- category: The category name in which the brand sells products
- categoryCode: The category code that references a BrandCategory
- cpg: reference to CPG collection
- topBrand: Boolean indicator for whether the brand should be featured as a 'top brand'
- name: Brand name

In [8]:
df_brands

Unnamed: 0,_id,barcode,category,categoryCode,cpg,name,topBrand,brandCode
0,{'$oid': '601ac115be37ce2ead437551'},511111019862,Baking,BAKING,"{'$id': {'$oid': '601ac114be37ce2ead437550'}, '$ref': 'Cogs'}",test brand @1612366101024,0.0,
1,{'$oid': '601c5460be37ce2ead43755f'},511111519928,Beverages,BEVERAGES,"{'$id': {'$oid': '5332f5fbe4b03c9a25efd0ba'}, '$ref': 'Cogs'}",Starbucks,0.0,STARBUCKS
2,{'$oid': '601ac142be37ce2ead43755d'},511111819905,Baking,BAKING,"{'$id': {'$oid': '601ac142be37ce2ead437559'}, '$ref': 'Cogs'}",test brand @1612366146176,0.0,TEST BRANDCODE @1612366146176
3,{'$oid': '601ac142be37ce2ead43755a'},511111519874,Baking,BAKING,"{'$id': {'$oid': '601ac142be37ce2ead437559'}, '$ref': 'Cogs'}",test brand @1612366146051,0.0,TEST BRANDCODE @1612366146051
4,{'$oid': '601ac142be37ce2ead43755e'},511111319917,Candy & Sweets,CANDY_AND_SWEETS,"{'$id': {'$oid': '5332fa12e4b03c9a25efd1e7'}, '$ref': 'Cogs'}",test brand @1612366146827,0.0,TEST BRANDCODE @1612366146827
...,...,...,...,...,...,...,...,...
1162,{'$oid': '5f77274dbe37ce6b592e90c0'},511111116752,Baking,BAKING,"{'$ref': 'Cogs', '$id': {'$oid': '5f77274dbe37ce6b592e90bf'}}",test brand @1601644365844,,
1163,{'$oid': '5dc1fca91dda2c0ad7da64ae'},511111706328,Breakfast & Cereal,,"{'$ref': 'Cogs', '$id': {'$oid': '53e10d6368abd3c7065097cc'}}",Dippin Dots® Cereal,,DIPPIN DOTS CEREAL
1164,{'$oid': '5f494c6e04db711dd8fe87e7'},511111416173,Candy & Sweets,CANDY_AND_SWEETS,"{'$ref': 'Cogs', '$id': {'$oid': '5332fa12e4b03c9a25efd1e7'}}",test brand @1598639215217,,TEST BRANDCODE @1598639215217
1165,{'$oid': '5a021611e4b00efe02b02a57'},511111400608,Grocery,,"{'$ref': 'Cogs', '$id': {'$oid': '5332f5f6e4b03c9a25efd0b4'}}",LIPTON TEA Leaves,0.0,LIPTON TEA Leaves


### **receipts**  
**to-do**:
- ~extract _ids~
- ~extract and convert createDate~
- ~extract and convert dateScanned~
- ~extract and convert finishedDate~
- ~extract and convert modifyDate~
- ~extract and convert pointsAwardedDate~
- ~extract and convert purchaseDate~
- create receipt_items table using the rewardsReceiptItemList
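
One way to approach the remaining to-do is to turn each entry of rewardsReceiptItemList into its own row, keyed by the receipt it came from. This is a plain-Python sketch on two made-up receipts, not the ETL used for fetch.db; the `receipt_id` key and the `flatten_items` helper are assumptions for illustration:

```python
def flatten_items(receipts):
    """Return one dict per purchased item, tagged with its receipt id."""
    rows = []
    for receipt in receipts:
        # receipts without an item list contribute no rows
        for item in receipt.get("rewardsReceiptItemList") or []:
            rows.append({"receipt_id": receipt["_id"]["$oid"], **item})
    return rows


receipts = [
    {
        "_id": {"$oid": "r1"},
        "rewardsReceiptItemList": [
            {"barcode": "4011", "quantityPurchased": 5},
            {"barcode": "1234", "quantityPurchased": 3},
        ],
    },
    {"_id": {"$oid": "r2"}, "rewardsReceiptItemList": None},
]

items = flatten_items(receipts)
print(len(items), items[0]["receipt_id"], items[1]["barcode"])
```

In the DataFrame, missing item lists appear as NaN rather than None, so a version that works on df_receipts would need an explicit null check before iterating.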

In [9]:
df_receipts.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1119 entries, 0 to 1118
Data columns (total 15 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   _id                      1119 non-null   object 
 1   bonusPointsEarned        544 non-null    float64
 2   bonusPointsEarnedReason  544 non-null    object 
 3   createDate               1119 non-null   object 
 4   dateScanned              1119 non-null   object 
 5   finishedDate             568 non-null    object 
 6   modifyDate               1119 non-null   object 
 7   pointsAwardedDate        537 non-null    object 
 8   pointsEarned             609 non-null    float64
 9   purchaseDate             671 non-null    object 
 10  purchasedItemCount       635 non-null    float64
 11  rewardsReceiptItemList   679 non-null    object 
 12  rewardsReceiptStatus     1119 non-null   object 
 13  totalSpent               684 non-null    float64
 14  userId                  

**Receipts Data Schema**
- _id: uuid for this receipt
- bonusPointsEarned: Number of bonus points that were awarded upon receipt completion
- bonusPointsEarnedReason: event that triggered bonus points
- createDate: The date that the event was created
- dateScanned: Date that the user scanned their receipt
- finishedDate: Date that the receipt finished processing
- modifyDate: The date the event was modified
- pointsAwardedDate: The date we awarded points for the transaction
- pointsEarned: The number of points earned for the receipt
- purchaseDate: the date of the purchase
- purchasedItemCount: Count of number of items on the receipt
- rewardsReceiptItemList: The items that were purchased on the receipt
- rewardsReceiptStatus: status of the receipt through receipt validation and processing
- totalSpent: The total amount on the receipt
- userId: string id back to the User collection for the user who scanned the receipt

In [10]:
df_receipts

Unnamed: 0,_id,bonusPointsEarned,bonusPointsEarnedReason,createDate,dateScanned,finishedDate,modifyDate,pointsAwardedDate,pointsEarned,purchaseDate,purchasedItemCount,rewardsReceiptItemList,rewardsReceiptStatus,totalSpent,userId
0,{'$oid': '5ff1e1eb0a720f0523000575'},500.0,"Receipt number 2 completed, bonus point schedule DEFAULT (5cefdcacf3693e0b50e83a36)",{'$date': 1609687531000},{'$date': 1609687531000},{'$date': 1609687531000},{'$date': 1609687536000},{'$date': 1609687531000},500.0,{'$date': 1609632000000},5.0,"[{'barcode': '4011', 'description': 'ITEM NOT FOUND', 'finalPrice': '26.00', 'itemPrice': '26.00', 'needsFetchReview': False, 'partnerItemId': '1', 'preventTargetGapPoints': True, 'quantityPurchased': 5, 'userFlaggedBarcode': '4011', 'userFlaggedNewItem': True, 'userFlaggedPrice': '26.00', 'userFlaggedQuantity': 5}]",FINISHED,26.00,5ff1e1eacfcf6c399c274ae6
1,{'$oid': '5ff1e1bb0a720f052300056b'},150.0,"Receipt number 5 completed, bonus point schedule DEFAULT (5cefdcacf3693e0b50e83a36)",{'$date': 1609687483000},{'$date': 1609687483000},{'$date': 1609687483000},{'$date': 1609687488000},{'$date': 1609687483000},150.0,{'$date': 1609601083000},2.0,"[{'barcode': '4011', 'description': 'ITEM NOT FOUND', 'finalPrice': '1', 'itemPrice': '1', 'partnerItemId': '1', 'quantityPurchased': 1}, {'barcode': '028400642255', 'description': 'DORITOS TORTILLA CHIP SPICY SWEET CHILI REDUCED FAT BAG 1 OZ', 'finalPrice': '10.00', 'itemPrice': '10.00', 'needsFetchReview': True, 'needsFetchReviewReason': 'USER_FLAGGED', 'partnerItemId': '2', 'pointsNotAwardedReason': 'Action not allowed for user and CPG', 'pointsPayerId': '5332f5fbe4b03c9a25efd0ba', 'preventTargetGapPoints': True, 'quantityPurchased': 1, 'rewardsGroup': 'DORITOS SPICY SWEET CHILI SINGLE SERVE', 'rewardsProductPartnerId': '5332f5fbe4b03c9a25efd0ba', 'userFlaggedBarcode': '028400642255', 'userFlaggedDescription': 'DORITOS TORTILLA CHIP SPICY SWEET CHILI REDUCED FAT BAG 1 OZ', 'userFlaggedNewItem': True, 'userFlaggedPrice': '10.00', 'userFlaggedQuantity': 1}]",FINISHED,11.00,5ff1e194b6a9d73a3a9f1052
2,{'$oid': '5ff1e1f10a720f052300057a'},5.0,All-receipts receipt bonus,{'$date': 1609687537000},{'$date': 1609687537000},,{'$date': 1609687542000},,5.0,{'$date': 1609632000000},1.0,"[{'needsFetchReview': False, 'partnerItemId': '1', 'preventTargetGapPoints': True, 'userFlaggedBarcode': '4011', 'userFlaggedNewItem': True, 'userFlaggedPrice': '26.00', 'userFlaggedQuantity': 3}]",REJECTED,10.00,5ff1e1f1cfcf6c399c274b0b
3,{'$oid': '5ff1e1ee0a7214ada100056f'},5.0,All-receipts receipt bonus,{'$date': 1609687534000},{'$date': 1609687534000},{'$date': 1609687534000},{'$date': 1609687539000},{'$date': 1609687534000},5.0,{'$date': 1609632000000},4.0,"[{'barcode': '4011', 'description': 'ITEM NOT FOUND', 'finalPrice': '28.00', 'itemPrice': '28.00', 'needsFetchReview': False, 'partnerItemId': '1', 'preventTargetGapPoints': True, 'quantityPurchased': 4, 'userFlaggedBarcode': '4011', 'userFlaggedNewItem': True, 'userFlaggedPrice': '28.00', 'userFlaggedQuantity': 4}]",FINISHED,28.00,5ff1e1eacfcf6c399c274ae6
4,{'$oid': '5ff1e1d20a7214ada1000561'},5.0,All-receipts receipt bonus,{'$date': 1609687506000},{'$date': 1609687506000},{'$date': 1609687511000},{'$date': 1609687511000},{'$date': 1609687506000},5.0,{'$date': 1609601106000},2.0,"[{'barcode': '4011', 'description': 'ITEM NOT FOUND', 'finalPrice': '1', 'itemPrice': '1', 'partnerItemId': '1', 'quantityPurchased': 1}, {'barcode': '1234', 'finalPrice': '2.56', 'itemPrice': '2.56', 'needsFetchReview': True, 'needsFetchReviewReason': 'USER_FLAGGED', 'partnerItemId': '2', 'preventTargetGapPoints': True, 'quantityPurchased': 3, 'userFlaggedBarcode': '1234', 'userFlaggedDescription': '', 'userFlaggedNewItem': True, 'userFlaggedPrice': '2.56', 'userFlaggedQuantity': 3}]",FINISHED,1.00,5ff1e194b6a9d73a3a9f1052
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1114,{'$oid': '603cc0630a720fde100003e6'},25.0,COMPLETE_NONPARTNER_RECEIPT,{'$date': 1614594147000},{'$date': 1614594147000},,{'$date': 1614594148000},,25.0,{'$date': 1597622400000},2.0,"[{'barcode': 'B076FJ92M4', 'description': 'mueller austria hypergrind precision electric spice/coffee grinder millwith large grinding capacity and hd motor also for spices, herbs, nuts,grains, white', 'discountedItemPrice': '22.97', 'finalPrice': '22.97', 'itemPrice': '22.97', 'originalReceiptItemText': 'mueller austria hypergrind precision electric spice/coffee grinder millwith large grinding capacity and hd motor also for spices, herbs, nuts,grains, white', 'partnerItemId': '0', 'priceAfterCoupon': '22.97', 'quantityPurchased': 1}, {'barcode': 'B07BRRLSVC', 'description': 'thindust summer face mask - sun protection neck gaiter for outdooractivities', 'discountedItemPrice': '11.99', 'finalPrice': '11.99', 'itemPrice': '11.99', 'originalReceiptItemText': 'thindust summer face mask - sun protection neck gaiter for outdooractivities', 'partnerItemId': '1', 'priceAfterCoupon': '11.99', 'quantityPurchased': 1}]",REJECTED,34.96,5fc961c3b8cfca11a077dd33
1115,{'$oid': '603d0b710a720fde1000042a'},,,{'$date': 1614613361873},{'$date': 1614613361873},,{'$date': 1614613361873},,,,,,SUBMITTED,,5fc961c3b8cfca11a077dd33
1116,{'$oid': '603cf5290a720fde10000413'},,,{'$date': 1614607657664},{'$date': 1614607657664},,{'$date': 1614607657664},,,,,,SUBMITTED,,5fc961c3b8cfca11a077dd33
1117,{'$oid': '603ce7100a7217c72c000405'},25.0,COMPLETE_NONPARTNER_RECEIPT,{'$date': 1614604048000},{'$date': 1614604048000},,{'$date': 1614604049000},,25.0,{'$date': 1597622400000},2.0,"[{'barcode': 'B076FJ92M4', 'description': 'mueller austria hypergrind precision electric spice/coffee grinder millwith large grinding capacity and hd motor also for spices, herbs, nuts,grains, white', 'discountedItemPrice': '22.97', 'finalPrice': '22.97', 'itemPrice': '22.97', 'originalReceiptItemText': 'mueller austria hypergrind precision electric spice/coffee grinder millwith large grinding capacity and hd motor also for spices, herbs, nuts,grains, white', 'partnerItemId': '0', 'priceAfterCoupon': '22.97', 'quantityPurchased': 1}, {'barcode': 'B07BRRLSVC', 'description': 'thindust summer face mask - sun protection neck gaiter for outdooractivities', 'discountedItemPrice': '11.99', 'finalPrice': '11.99', 'itemPrice': '11.99', 'originalReceiptItemText': 'thindust summer face mask - sun protection neck gaiter for outdooractivities', 'partnerItemId': '1', 'priceAfterCoupon': '11.99', 'quantityPurchased': 1}]",REJECTED,34.96,5fc961c3b8cfca11a077dd33


### **users**  
**to-do**:
- ~extract _ids~
- ~extract and convert createdDate~
- ~extract and convert lastLogin~
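
The `$date` values are epoch timestamps in milliseconds. Here is a minimal sketch of the conversion using only the standard library; the helper name `epoch_ms_to_datetime` is hypothetical, and the result is expressed in UTC:

```python
from datetime import datetime, timezone


def epoch_ms_to_datetime(epoch_ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime; pass None through."""
    if epoch_ms is None:
        return None
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


# first createdDate value shown in the users data
created = epoch_ms_to_datetime(1609687444800)
print(created.isoformat())
```

Passing None through keeps rows with a missing lastLogin from raising during conversion.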

In [11]:
df_users.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 495 entries, 0 to 494
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   _id           495 non-null    object
 1   active        495 non-null    bool  
 2   createdDate   495 non-null    object
 3   lastLogin     433 non-null    object
 4   role          495 non-null    object
 5   signUpSource  447 non-null    object
 6   state         439 non-null    object
dtypes: bool(1), object(6)
memory usage: 23.8+ KB


**Users Data Schema**
- _id: user Id
- state: state abbreviation
- createdDate: when the user created their account
- lastLogin: last time the user was recorded logging in to the app
- role: constant value set to 'CONSUMER'
- active: indicates if the user is active; only Fetch will de-activate an account with this flag

In [12]:
df_users

Unnamed: 0,_id,active,createdDate,lastLogin,role,signUpSource,state
0,{'$oid': '5ff1e194b6a9d73a3a9f1052'},True,{'$date': 1609687444800},{'$date': 1609687537858},consumer,Email,WI
1,{'$oid': '5ff1e194b6a9d73a3a9f1052'},True,{'$date': 1609687444800},{'$date': 1609687537858},consumer,Email,WI
2,{'$oid': '5ff1e194b6a9d73a3a9f1052'},True,{'$date': 1609687444800},{'$date': 1609687537858},consumer,Email,WI
3,{'$oid': '5ff1e1eacfcf6c399c274ae6'},True,{'$date': 1609687530554},{'$date': 1609687530597},consumer,Email,WI
4,{'$oid': '5ff1e194b6a9d73a3a9f1052'},True,{'$date': 1609687444800},{'$date': 1609687537858},consumer,Email,WI
...,...,...,...,...,...,...,...
490,{'$oid': '54943462e4b07e684157a532'},True,{'$date': 1418998882381},{'$date': 1614963143204},fetch-staff,,
491,{'$oid': '54943462e4b07e684157a532'},True,{'$date': 1418998882381},{'$date': 1614963143204},fetch-staff,,
492,{'$oid': '54943462e4b07e684157a532'},True,{'$date': 1418998882381},{'$date': 1614963143204},fetch-staff,,
493,{'$oid': '54943462e4b07e684157a532'},True,{'$date': 1418998882381},{'$date': 1614963143204},fetch-staff,,


## Cleaning the data

### First attempt of extracting the values from columns containing dictionaries
I've chosen to include the following section of Python code to help illustrate the thought process that led to writing the value_from_dict() function. The code includes some notes as comments, but a full explanation of my process is included in the documentation of the resulting function.  
The function should still work if you execute the following code, but if you are stepping through this notebook on a fresh kernel you can skip the cells between **Start** and **End**.

**Start** - you *can* skip executing the code starting here

In [122]:
# confirming the values in _id are being recognized as Python objects, in this case a dictionary
type(df_brands['_id'][0])

dict

In [123]:
df_brands['_id'][0]['$oid']

'601ac115be37ce2ead437551'

In [124]:
# extract the _id column as a series
_id_series = df_brands['_id']
# create a list to collect the values from the dictionary objects in _id
_id_clean = []

# iterate through _id_series, appending its values to _id_clean
for index, value in _id_series.items():
    _id_clean.append(value['$oid'])
    
# confirm no nulls in _id_clean
assert None not in _id_clean, "there is at least one None/null value in _id_clean"
# confirm _id_clean is the same length as the original _id column in df_brands
assert len(_id_clean) == len(df_brands['_id']), "the length of the original column and the cleaned column are not the same"

# add _id_clean to df_brands after the _id column
df_brands.insert(1, '_id_clean', _id_clean)

df_brands

Unnamed: 0,_id,_id_clean,barcode,category,categoryCode,cpg,name,topBrand,brandCode
0,{'$oid': '601ac115be37ce2ead437551'},601ac115be37ce2ead437551,511111019862,Baking,BAKING,"{'$id': {'$oid': '601ac114be37ce2ead437550'}, '$ref': 'Cogs'}",test brand @1612366101024,0.0,
1,{'$oid': '601c5460be37ce2ead43755f'},601c5460be37ce2ead43755f,511111519928,Beverages,BEVERAGES,"{'$id': {'$oid': '5332f5fbe4b03c9a25efd0ba'}, '$ref': 'Cogs'}",Starbucks,0.0,STARBUCKS
2,{'$oid': '601ac142be37ce2ead43755d'},601ac142be37ce2ead43755d,511111819905,Baking,BAKING,"{'$id': {'$oid': '601ac142be37ce2ead437559'}, '$ref': 'Cogs'}",test brand @1612366146176,0.0,TEST BRANDCODE @1612366146176
3,{'$oid': '601ac142be37ce2ead43755a'},601ac142be37ce2ead43755a,511111519874,Baking,BAKING,"{'$id': {'$oid': '601ac142be37ce2ead437559'}, '$ref': 'Cogs'}",test brand @1612366146051,0.0,TEST BRANDCODE @1612366146051
4,{'$oid': '601ac142be37ce2ead43755e'},601ac142be37ce2ead43755e,511111319917,Candy & Sweets,CANDY_AND_SWEETS,"{'$id': {'$oid': '5332fa12e4b03c9a25efd1e7'}, '$ref': 'Cogs'}",test brand @1612366146827,0.0,TEST BRANDCODE @1612366146827
...,...,...,...,...,...,...,...,...,...
1162,{'$oid': '5f77274dbe37ce6b592e90c0'},5f77274dbe37ce6b592e90c0,511111116752,Baking,BAKING,"{'$ref': 'Cogs', '$id': {'$oid': '5f77274dbe37ce6b592e90bf'}}",test brand @1601644365844,,
1163,{'$oid': '5dc1fca91dda2c0ad7da64ae'},5dc1fca91dda2c0ad7da64ae,511111706328,Breakfast & Cereal,,"{'$ref': 'Cogs', '$id': {'$oid': '53e10d6368abd3c7065097cc'}}",Dippin Dots® Cereal,,DIPPIN DOTS CEREAL
1164,{'$oid': '5f494c6e04db711dd8fe87e7'},5f494c6e04db711dd8fe87e7,511111416173,Candy & Sweets,CANDY_AND_SWEETS,"{'$ref': 'Cogs', '$id': {'$oid': '5332fa12e4b03c9a25efd1e7'}}",test brand @1598639215217,,TEST BRANDCODE @1598639215217
1165,{'$oid': '5a021611e4b00efe02b02a57'},5a021611e4b00efe02b02a57,511111400608,Grocery,,"{'$ref': 'Cogs', '$id': {'$oid': '5332f5f6e4b03c9a25efd0b4'}}",LIPTON TEA Leaves,0.0,LIPTON TEA Leaves


In [125]:
# examining the values in cpg
type(df_brands['cpg'])

pandas.core.series.Series

In [126]:
df_brands['cpg'][0]

{'$id': {'$oid': '601ac114be37ce2ead437550'}, '$ref': 'Cogs'}

In [127]:
df_brands['cpg'][0]['$id']['$oid']

'601ac114be37ce2ead437550'

In [128]:
# extract the cpg column as a series
cpg_series = df_brands['cpg']
# create a list to collect the values from the dictionary objects in cpg
cpg_clean = []

# iterate through cpg_series, appending its values to cpg_clean
for index, value in cpg_series.items():
    cpg_clean.append(value['$id']['$oid'])
    
# confirm no nulls in cpg_clean
assert None not in cpg_clean, "there is at least one None/null value in cpg_clean"
# confirm cpg_clean is the same length as the original cpg column in df_brands
assert len(cpg_clean) == len(df_brands['cpg']), "the length of the original column and the cleaned column are not the same"

# add cpg_clean to df_brands after the cpg column
df_brands.insert(6, 'cpg_clean', cpg_clean)

df_brands

Unnamed: 0,_id,_id_clean,barcode,category,categoryCode,cpg,cpg_clean,name,topBrand,brandCode
0,{'$oid': '601ac115be37ce2ead437551'},601ac115be37ce2ead437551,511111019862,Baking,BAKING,"{'$id': {'$oid': '601ac114be37ce2ead437550'}, '$ref': 'Cogs'}",601ac114be37ce2ead437550,test brand @1612366101024,0.0,
1,{'$oid': '601c5460be37ce2ead43755f'},601c5460be37ce2ead43755f,511111519928,Beverages,BEVERAGES,"{'$id': {'$oid': '5332f5fbe4b03c9a25efd0ba'}, '$ref': 'Cogs'}",5332f5fbe4b03c9a25efd0ba,Starbucks,0.0,STARBUCKS
2,{'$oid': '601ac142be37ce2ead43755d'},601ac142be37ce2ead43755d,511111819905,Baking,BAKING,"{'$id': {'$oid': '601ac142be37ce2ead437559'}, '$ref': 'Cogs'}",601ac142be37ce2ead437559,test brand @1612366146176,0.0,TEST BRANDCODE @1612366146176
3,{'$oid': '601ac142be37ce2ead43755a'},601ac142be37ce2ead43755a,511111519874,Baking,BAKING,"{'$id': {'$oid': '601ac142be37ce2ead437559'}, '$ref': 'Cogs'}",601ac142be37ce2ead437559,test brand @1612366146051,0.0,TEST BRANDCODE @1612366146051
4,{'$oid': '601ac142be37ce2ead43755e'},601ac142be37ce2ead43755e,511111319917,Candy & Sweets,CANDY_AND_SWEETS,"{'$id': {'$oid': '5332fa12e4b03c9a25efd1e7'}, '$ref': 'Cogs'}",5332fa12e4b03c9a25efd1e7,test brand @1612366146827,0.0,TEST BRANDCODE @1612366146827
...,...,...,...,...,...,...,...,...,...,...
1162,{'$oid': '5f77274dbe37ce6b592e90c0'},5f77274dbe37ce6b592e90c0,511111116752,Baking,BAKING,"{'$ref': 'Cogs', '$id': {'$oid': '5f77274dbe37ce6b592e90bf'}}",5f77274dbe37ce6b592e90bf,test brand @1601644365844,,
1163,{'$oid': '5dc1fca91dda2c0ad7da64ae'},5dc1fca91dda2c0ad7da64ae,511111706328,Breakfast & Cereal,,"{'$ref': 'Cogs', '$id': {'$oid': '53e10d6368abd3c7065097cc'}}",53e10d6368abd3c7065097cc,Dippin Dots® Cereal,,DIPPIN DOTS CEREAL
1164,{'$oid': '5f494c6e04db711dd8fe87e7'},5f494c6e04db711dd8fe87e7,511111416173,Candy & Sweets,CANDY_AND_SWEETS,"{'$ref': 'Cogs', '$id': {'$oid': '5332fa12e4b03c9a25efd1e7'}}",5332fa12e4b03c9a25efd1e7,test brand @1598639215217,,TEST BRANDCODE @1598639215217
1165,{'$oid': '5a021611e4b00efe02b02a57'},5a021611e4b00efe02b02a57,511111400608,Grocery,,"{'$ref': 'Cogs', '$id': {'$oid': '5332f5f6e4b03c9a25efd0b4'}}",5332f5f6e4b03c9a25efd0b4,LIPTON TEA Leaves,0.0,LIPTON TEA Leaves


In [129]:
dataframe = df_brands
column_to_clean ='cpg'
dirty_series = dataframe[column_to_clean]
dirty_series
insert_at = dataframe.columns.get_loc(column_to_clean) + 1
cleaned_list = []
dict_key = "['$id']['$oid']"
dict_key_value = "value" + dict_key


for index, value in dirty_series.items():
    cleaned_list.append(eval(dict_key_value))

cleaned_list



['601ac114be37ce2ead437550',
 '5332f5fbe4b03c9a25efd0ba',
 '601ac142be37ce2ead437559',
 '601ac142be37ce2ead437559',
 '5332fa12e4b03c9a25efd1e7',
 '601ac142be37ce2ead437559',
 '601ac142be37ce2ead437559',
 '559c2234e4b06aca36af13c6',
 '5a734034e4b0d58f376be874',
 '59ba6f1ce4b092b29c167346',
 '5f4bf556be37ce0b44915549',
 '5332f5f2e4b03c9a25efd0aa',
 '559c2234e4b06aca36af13c6',
 '5d5d4fd16d5f3b23d1bc7905',
 '5332f5fbe4b03c9a25efd0ba',
 '5332f709e4b03c9a25efd0f1',
 '5d9b4f591dda2c6225a284aa',
 '5f358338be37ce443bf9d557',
 '5fb28549be37ce522e165cb4',
 '5332f5f6e4b03c9a25efd0b4',
 '55b62995e4b0d8e685c14213',
 '5d9b4f591dda2c6225a284aa',
 '559c2234e4b06aca36af13c6',
 '53e10d6368abd3c7065097cc',
 '5332f5ebe4b03c9a25efd0a8',
 '5e9f12f5be37ce3e45b6a77e',
 '5332f5f6e4b03c9a25efd0b4',
 '5d5d4fd16d5f3b23d1bc7905',
 '5f493e72be37ce64d0ae36c2',
 '5f4936dcbe37ce52f8314fd8',
 '559c2234e4b06aca36af13c6',
 '5fd2a0aebe37ce49eb72c0ed',
 '53e10d6368abd3c7065097cc',
 '5f494c5d04db711dd8fe87e2',
 '5332f5f3e4b0

In [130]:
dataframe

Unnamed: 0,_id,_id_clean,barcode,category,categoryCode,cpg,cpg_clean,name,topBrand,brandCode
0,{'$oid': '601ac115be37ce2ead437551'},601ac115be37ce2ead437551,511111019862,Baking,BAKING,"{'$id': {'$oid': '601ac114be37ce2ead437550'}, '$ref': 'Cogs'}",601ac114be37ce2ead437550,test brand @1612366101024,0.0,
1,{'$oid': '601c5460be37ce2ead43755f'},601c5460be37ce2ead43755f,511111519928,Beverages,BEVERAGES,"{'$id': {'$oid': '5332f5fbe4b03c9a25efd0ba'}, '$ref': 'Cogs'}",5332f5fbe4b03c9a25efd0ba,Starbucks,0.0,STARBUCKS
2,{'$oid': '601ac142be37ce2ead43755d'},601ac142be37ce2ead43755d,511111819905,Baking,BAKING,"{'$id': {'$oid': '601ac142be37ce2ead437559'}, '$ref': 'Cogs'}",601ac142be37ce2ead437559,test brand @1612366146176,0.0,TEST BRANDCODE @1612366146176
3,{'$oid': '601ac142be37ce2ead43755a'},601ac142be37ce2ead43755a,511111519874,Baking,BAKING,"{'$id': {'$oid': '601ac142be37ce2ead437559'}, '$ref': 'Cogs'}",601ac142be37ce2ead437559,test brand @1612366146051,0.0,TEST BRANDCODE @1612366146051
4,{'$oid': '601ac142be37ce2ead43755e'},601ac142be37ce2ead43755e,511111319917,Candy & Sweets,CANDY_AND_SWEETS,"{'$id': {'$oid': '5332fa12e4b03c9a25efd1e7'}, '$ref': 'Cogs'}",5332fa12e4b03c9a25efd1e7,test brand @1612366146827,0.0,TEST BRANDCODE @1612366146827
...,...,...,...,...,...,...,...,...,...,...
1162,{'$oid': '5f77274dbe37ce6b592e90c0'},5f77274dbe37ce6b592e90c0,511111116752,Baking,BAKING,"{'$ref': 'Cogs', '$id': {'$oid': '5f77274dbe37ce6b592e90bf'}}",5f77274dbe37ce6b592e90bf,test brand @1601644365844,,
1163,{'$oid': '5dc1fca91dda2c0ad7da64ae'},5dc1fca91dda2c0ad7da64ae,511111706328,Breakfast & Cereal,,"{'$ref': 'Cogs', '$id': {'$oid': '53e10d6368abd3c7065097cc'}}",53e10d6368abd3c7065097cc,Dippin Dots® Cereal,,DIPPIN DOTS CEREAL
1164,{'$oid': '5f494c6e04db711dd8fe87e7'},5f494c6e04db711dd8fe87e7,511111416173,Candy & Sweets,CANDY_AND_SWEETS,"{'$ref': 'Cogs', '$id': {'$oid': '5332fa12e4b03c9a25efd1e7'}}",5332fa12e4b03c9a25efd1e7,test brand @1598639215217,,TEST BRANDCODE @1598639215217
1165,{'$oid': '5a021611e4b00efe02b02a57'},5a021611e4b00efe02b02a57,511111400608,Grocery,,"{'$ref': 'Cogs', '$id': {'$oid': '5332f5f6e4b03c9a25efd0b4'}}",5332f5f6e4b03c9a25efd0b4,LIPTON TEA Leaves,0.0,LIPTON TEA Leaves


In [131]:
# reset df_brands to the initial load of raw data - if you executed the code above,
# run the following line to avoid any exceptions with value_from_dict()
df_brands = pd.read_json(in_brands,lines=True,compression='gzip')

**End** You can continue executing the following cells

### Writing a function to extract values from a dataframe column that contains dictionaries and add those values back to the dataframe as a new column

In [13]:
# I realized I'd be doing this multiple times, better to make a function
# define a function that cleans a df column that contains dictionaries by extracting the specified
# values and adding them as a new column in the dataframe
def value_from_dict(dataframe, column_to_clean, dict_key, allow_nulls = False):
    """Returns dataframe with a 'cleaned' column inserted after the column that was cleaned.
    
    :param dataframe: A dataframe with a column containing dictionaries, from which one value is
        to be extracted
    :type dataframe: Pandas DataFrame
    :param column_to_clean: The name of the column containing dictionaries
    :type column_to_clean: str
    :param dict_key: A str containing the key associated with the value we want to extract,
        e.g., "['$id']['$oid']"
    :type dict_key: str
    :param allow_nulls: A boolean value indicating if None/Null/NaN/NaT values should be allowed,
        defaults to False
    :type allow_nulls: bool
    
    :raises AssertionError: 'there is at least one None value in the cleaned data' if param allow_nulls = False
    :raises AssertionError: 'the length of the original column and the cleaned column are not the same'
    :excepts ValueError: If a column has already been cleaned, print message confirming
        no values were added to the dataframe
    
    :rtype: Pandas DataFrame
    :return: the original DataFrame with an additional column containing 'cleaned' values
    """
    # setting variables:
    # extract the column we want to clean from the dataframe as a series
    dirty_series = dataframe[column_to_clean]
    # create a list to store the cleaned values in
    cleaned_list = []
    # name of the column we'll be adding to the DataFrame
    cleaned_column_name = column_to_clean + '_cleaned'
    # location to insert the cleaned column, after the 'dirty' column
    insert_at = dataframe.columns.get_loc(column_to_clean) + 1
    # translate dict_key str into a format useable in the following for loop
    value_dict_key = "value" + dict_key
    
    # iterate through dirty_series appending extracted values to cleaned_list
    for index, value in dirty_series.items():
        # if the value is not a dictionary or the key is missing, append None
        try:
            cleaned_list.append(eval(value_dict_key))
        except Exception:
            cleaned_list.append(None)

    # handle allow_nulls param flag
    if not allow_nulls:
        # confirm no nulls in cleaned_list
        assert None not in cleaned_list, "there is at least one None value in the cleaned data"
    
    # confirm cleaned_list is the same length as dirty_series
    assert len(cleaned_list) == len(dirty_series), "the length of the original column and the cleaned column are not the same"
    
    # add the cleaned_list data to the original dataframe following column_to_clean
    try:
        dataframe.insert(insert_at, cleaned_column_name, cleaned_list)
    except ValueError as error:
        print(f"{str(error)}, {cleaned_column_name} was not added to the dataframe")

    # return the modified dataframe
    return dataframe
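
`eval()` works here because dict_key is written by hand, but it would execute any string passed in. As an alternative sketch (not the approach used in this notebook), the lookup path can be given as a tuple of keys and walked with a loop. The helper name `nested_value` is hypothetical:

```python
def nested_value(value, keys):
    """Follow keys into nested dicts; return None if any step is missing."""
    for key in keys:
        # anything that is not a dict (None, NaN, a plain string) has no keys to follow
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


cpg = {"$id": {"$oid": "601ac114be37ce2ead437550"}, "$ref": "Cogs"}

print(nested_value(cpg, ("$id", "$oid")))           # value found two levels down
print(nested_value(float("nan"), ("$id", "$oid")))  # missing dictionary gives None
```

With this helper, the dict_key argument would be written as `("$id", "$oid")` instead of `"['$id']['$oid']"`, and no try/except is needed around the lookup.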

### Using value_from_dict() to extract values from all the columns containing dictionaries and add them to the dataframe as new columns:

#### df_brands._id :

In [14]:
# look at the first value in df_brands._id
df_brands['_id'][0]

{'$oid': '601ac115be37ce2ead437551'}

In [15]:
# extract the value
df_brands['_id'][0]['$oid']

'601ac115be37ce2ead437551'

In [16]:
# set a variable to use for the dict_key param of value_from_dict() function
brand_id_dict_key = "['$oid']"

In [17]:
# clean df_brands._id and confirm by viewing a sample of the dataframe
value_from_dict(df_brands, "_id", brand_id_dict_key)
df_brands.sample(2)

Unnamed: 0,_id,_id_cleaned,barcode,category,categoryCode,cpg,name,topBrand,brandCode
1095,{'$oid': '58861c7f4e8d0d20bc42c4fa'},58861c7f4e8d0d20bc42c4fa,511111701293,Breakfast & Cereal,,"{'$ref': 'Cogs', '$id': {'$oid': '5332f5fbe4b03c9a25efd0ba'}}",Quaker Granola,0.0,
86,{'$oid': '5aa1b41de4b086c8aad5e096'},5aa1b41de4b086c8aad5e096,511111304265,Condiments & Sauces,,"{'$ref': 'Cogs', '$id': {'$oid': '559c2234e4b06aca36af13c6'}}",MOMOFUKU Sauce,0.0,MOMOFUKU


#### df_brands.cpg:

In [18]:
# look at the first value in df_brands.cpg
df_brands['cpg'][0]

{'$id': {'$oid': '601ac114be37ce2ead437550'}, '$ref': 'Cogs'}

In [19]:
# extract the value
df_brands['cpg'][0]['$id']['$oid']

'601ac114be37ce2ead437550'

In [20]:
# set a variable to use for the dict_key param of value_from_dict() function
brand_cpg_dict_key = "['$id']['$oid']"

In [21]:
# clean df_brands.cpg and confirm by viewing a sample of the dataframe
value_from_dict(df_brands, "cpg", brand_cpg_dict_key)
df_brands.sample(2)

Unnamed: 0,_id,_id_cleaned,barcode,category,categoryCode,cpg,cpg_cleaned,name,topBrand,brandCode
812,{'$oid': '5f38578fbe37ce5178517ad4'},5f38578fbe37ce5178517ad4,511111615422,Baking,BAKING,"{'$ref': 'Cogs', '$id': {'$oid': '5f38578fbe37ce5178517ad3'}}",5f38578fbe37ce5178517ad3,test brand @1597527951382,,TEST BRANDCODE @1597527951382
777,{'$oid': '5f7ba932be37ce2f290fb254'},5f7ba932be37ce2f290fb254,511111616979,Baking,BAKING,"{'$ref': 'Cogs', '$id': {'$oid': '5f7ba932be37ce2f290fb251'}}",5f7ba932be37ce2f290fb251,test brand @1601939762943,,TEST BRANDCODE @1601939762943
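
The `dict_key` strings above (`"['$oid']"`, `"['$id']['$oid']"`) each describe a path into a nested dictionary. As an illustration only, the same lookup can be written with a list of keys. `get_nested` below is a hypothetical helper, not part of this notebook's `value_from_dict()`, and the sample values are copied from the cell outputs above.

```python
from functools import reduce

def get_nested(record, keys):
    """Walk `keys` through nested dicts; return None if any level is missing."""
    def step(current, key):
        # stop descending as soon as we hit something that is not a dict
        return current.get(key) if isinstance(current, dict) else None
    return reduce(step, keys, record)

# sample values copied from the outputs above
brand_id = {'$oid': '601ac115be37ce2ead437551'}
brand_cpg = {'$id': {'$oid': '601ac114be37ce2ead437550'}, '$ref': 'Cogs'}

print(get_nested(brand_id, ['$oid']))          # 601ac115be37ce2ead437551
print(get_nested(brand_cpg, ['$id', '$oid']))  # 601ac114be37ce2ead437550
print(get_nested(None, ['$oid']))              # None
```

A key list avoids building a subscript expression as a string, and a missing cell simply yields `None` rather than raising.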


#### df_receipts._id:

In [22]:
# look at the first value in df_receipts._id
df_receipts['_id'][0]

{'$oid': '5ff1e1eb0a720f0523000575'}

In [23]:
#extract the value
df_receipts['_id'][0]['$oid']

'5ff1e1eb0a720f0523000575'

In [24]:
# set a variable to use for the dict_key param of value_from_dict() function
receipts_id_dict_key = "['$oid']"

In [25]:
# clean df_receipts._id and confirm by viewing a sample of the dataframe
value_from_dict(df_receipts, "_id", receipts_id_dict_key)
df_receipts.sample(2)

Unnamed: 0,_id,_id_cleaned,bonusPointsEarned,bonusPointsEarnedReason,createDate,dateScanned,finishedDate,modifyDate,pointsAwardedDate,pointsEarned,purchaseDate,purchasedItemCount,rewardsReceiptItemList,rewardsReceiptStatus,totalSpent,userId
184,{'$oid': '5ffc9da40a7214adca000050'},5ffc9da40a7214adca000050,5.0,All-receipts receipt bonus,{'$date': 1610390948000},{'$date': 1610390948000},{'$date': 1610390948000},{'$date': 1610390953000},{'$date': 1610390948000},5.0,{'$date': 1610323200000},5.0,"[{'barcode': '4011', 'description': 'ITEM NOT FOUND', 'finalPrice': '21.00', 'itemPrice': '21.00', 'needsFetchReview': False, 'partnerItemId': '1', 'preventTargetGapPoints': True, 'quantityPurchased': 5, 'userFlaggedBarcode': '4011', 'userFlaggedNewItem': True, 'userFlaggedPrice': '21.00', 'userFlaggedQuantity': 5}]",FINISHED,21.0,5ffc9da0b3348b11c9338951
1116,{'$oid': '603cf5290a720fde10000413'},603cf5290a720fde10000413,,,{'$date': 1614607657664},{'$date': 1614607657664},,{'$date': 1614607657664},,,,,,SUBMITTED,,5fc961c3b8cfca11a077dd33


#### df_receipts.createDate:

In [26]:
# look at the first value in df_receipts.createDate
df_receipts['createDate'][0]

{'$date': 1609687531000}

In [27]:
# extract the value
df_receipts['createDate'][0]['$date']

1609687531000

In [28]:
# set a variable to use for the dict_key param of value_from_dict() function
receipts_createDate_dict_key = "['$date']"

In [29]:
# clean df_receipts.createDate and confirm by viewing a sample of the dataframe
value_from_dict(df_receipts, "createDate", receipts_createDate_dict_key)
df_receipts.sample(2)

Unnamed: 0,_id,_id_cleaned,bonusPointsEarned,bonusPointsEarnedReason,createDate,createDate_cleaned,dateScanned,finishedDate,modifyDate,pointsAwardedDate,pointsEarned,purchaseDate,purchasedItemCount,rewardsReceiptItemList,rewardsReceiptStatus,totalSpent,userId
1006,{'$oid': '6038e1020a720fde100000a4'},6038e1020a720fde100000a4,,,{'$date': 1614340354138},1614340354138,{'$date': 1614340354138},,{'$date': 1614340354138},,,,,,SUBMITTED,,5fc961c3b8cfca11a077dd33
189,{'$oid': '5ffca2f30a720f05c5000054'},5ffca2f30a720f05c5000054,5.0,All-receipts receipt bonus,{'$date': 1610356307000},1610356307000,{'$date': 1610356307000},{'$date': 1610374310000},{'$date': 1610374310000},{'$date': 1610374310000},5.0,{'$date': 1610287200000},1.0,"[{'brandCode': 'WINGSTOP', 'description': 'FILL UP MEAL', 'discountedItemPrice': '3.09', 'finalPrice': '3.09', 'itemPrice': '3.09', 'originalReceiptItemText': 'FILL UP MEAL', 'partnerItemId': '1009', 'quantityPurchased': 1}]",FINISHED,3.09,5ffca2f3b3348b11c9338a10


#### df_receipts.dateScanned:
same format as df_receipts.createDate

In [30]:
# set a variable to use for the dict_key param of value_from_dict() function
receipts_dateScanned_dict_key = "['$date']"

In [31]:
# clean df_receipts.dateScanned and confirm by viewing a sample of the dataframe
value_from_dict(df_receipts, "dateScanned", receipts_dateScanned_dict_key)
df_receipts.sample(2)

Unnamed: 0,_id,_id_cleaned,bonusPointsEarned,bonusPointsEarnedReason,createDate,createDate_cleaned,dateScanned,dateScanned_cleaned,finishedDate,modifyDate,pointsAwardedDate,pointsEarned,purchaseDate,purchasedItemCount,rewardsReceiptItemList,rewardsReceiptStatus,totalSpent,userId
743,{'$oid': '601b730f0a7214ad28000243'},601b730f0a7214ad28000243,,,{'$date': 1612411663042},1612411663042,{'$date': 1612411663042},1612411663042,,{'$date': 1612411663042},,,,,,SUBMITTED,,5fc961c3b8cfca11a077dd33
362,{'$oid': '600887560a720f05fa000098'},600887560a720f05fa000098,250.0,"Receipt number 3 completed, bonus point schedule DEFAULT (5cefdcacf3693e0b50e83a36)",{'$date': 1611171670000},1611171670000,{'$date': 1611171670000},1611171670000,{'$date': 1611171671000},{'$date': 1611171671000},{'$date': 1611171671000},250.0,{'$date': 1613850070000},1.0,"[{'barcode': '4011', 'description': 'ITEM NOT FOUND', 'finalPrice': '1', 'itemPrice': '1', 'partnerItemId': '1', 'quantityPurchased': 1}]",FINISHED,1.0,6008873eb6310511daa4e8eb


#### df_receipts.finishedDate:
same format as df_receipts.createDate - can include None/Null/NaN

In [32]:
# set a variable to use for the dict_key param of value_from_dict() function
receipts_finishedDate_dict_key = "['$date']"

In [33]:
# clean df_receipts.finishedDate and confirm by viewing a sample of the dataframe
value_from_dict(df_receipts, "finishedDate", receipts_finishedDate_dict_key)
df_receipts.sample(2)

AssertionError: there is at least one None value in the cleaned data

In [34]:
# clean df_receipts.finishedDate and confirm by viewing a sample of the dataframe
# setting allow_nulls = True
value_from_dict(df_receipts, "finishedDate", receipts_finishedDate_dict_key, allow_nulls=True)
df_receipts.sample(2)

Unnamed: 0,_id,_id_cleaned,bonusPointsEarned,bonusPointsEarnedReason,createDate,createDate_cleaned,dateScanned,dateScanned_cleaned,finishedDate,finishedDate_cleaned,modifyDate,pointsAwardedDate,pointsEarned,purchaseDate,purchasedItemCount,rewardsReceiptItemList,rewardsReceiptStatus,totalSpent,userId
657,{'$oid': '6018643e0a7214ad28000002'},6018643e0a7214ad28000002,,,{'$date': 1612211262000},1612211262000,{'$date': 1612211262000},1612211262000,,,{'$date': 1612211262000},,,,,,SUBMITTED,,5fc961c3b8cfca11a077dd33
1100,{'$oid': '603c88240a7217c72c0003b8'},603c88240a7217c72c0003b8,25.0,COMPLETE_NONPARTNER_RECEIPT,{'$date': 1614579748000},1614579748000,{'$date': 1614579748000},1614579748000,,,{'$date': 1614579749000},,25.0,{'$date': 1597622400000},2.0,"[{'barcode': 'B076FJ92M4', 'description': 'mueller austria hypergrind precision electric spice/coffee grinder millwith large grinding capacity and hd motor also for spices, herbs, nuts,grains, white', 'discountedItemPrice': '22.97', 'finalPrice': '22.97', 'itemPrice': '22.97', 'originalReceiptItemText': 'mueller austria hypergrind precision electric spice/coffee grinder millwith large grinding capacity and hd motor also for spices, herbs, nuts,grains, white', 'partnerItemId': '0', 'priceAfterCoupon': '22.97', 'quantityPurchased': 1}, {'barcode': 'B07BRRLSVC', 'description': 'thindust summer face mask - sun protection neck gaiter for outdooractivities', 'discountedItemPrice': '11.99', 'finalPrice': '11.99', 'itemPrice': '11.99', 'originalReceiptItemText': 'thindust summer face mask - sun protection neck gaiter for outdooractivities', 'partnerItemId': '1', 'priceAfterCoupon': '11.99', 'quantityPurchased': 1}]",REJECTED,34.96,5fc961c3b8cfca11a077dd33


#### df_receipts.modifyDate:
same format as df_receipts.createDate

In [35]:
# set a variable to use for the dict_key param of value_from_dict() function
receipts_modifyDate_dict_key = "['$date']"

In [36]:
# clean df_receipts.modifyDate and confirm by viewing a sample of the dataframe
value_from_dict(df_receipts, "modifyDate", receipts_modifyDate_dict_key)
df_receipts.sample(2)

Unnamed: 0,_id,_id_cleaned,bonusPointsEarned,bonusPointsEarnedReason,createDate,createDate_cleaned,dateScanned,dateScanned_cleaned,finishedDate,finishedDate_cleaned,modifyDate,modifyDate_cleaned,pointsAwardedDate,pointsEarned,purchaseDate,purchasedItemCount,rewardsReceiptItemList,rewardsReceiptStatus,totalSpent,userId
32,{'$oid': '5ff36c750a7214ada100058f'},5ff36c750a7214ada100058f,,,{'$date': 1609788533000},1609788533000,{'$date': 1609788533000},1609788533000,{'$date': 1609788534000},1609789000000.0,{'$date': 1609788534000},1609788534000,{'$date': 1609788534000},500.0,{'$date': 1609702133000},9.0,"[{'barcode': '029000079236', 'description': 'PLANTERSe Cashew Halves & Pieces - 46 oz.', 'finalPrice': '9.99', 'itemPrice': '9.99', 'partnerItemId': '1', 'pointsEarned': '50.0', 'pointsPayerId': '559c2234e4b06aca36af13c6', 'quantityPurchased': 1, 'rewardsGroup': 'PLANTERS CASHEWS', 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6', 'targetPrice': '800'}, {'barcode': '029000079236', 'description': 'PLANTERSe Cashew Halves & Pieces - 46 oz.', 'finalPrice': '9.99', 'itemPrice': '9.99', 'partnerItemId': '2', 'pointsEarned': '50.0', 'pointsPayerId': '559c2234e4b06aca36af13c6', 'quantityPurchased': 1, 'rewardsGroup': 'PLANTERS CASHEWS', 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6', 'targetPrice': '800'}, {'barcode': '029000079236', 'description': 'PLANTERSe Cashew Halves & Pieces - 46 oz.', 'finalPrice': '9.99', 'itemPrice': '9.99', 'partnerItemId': '3', 'pointsEarned': '50.0', 'pointsPayerId': '559c2234e4b06aca36af13c6', 'quantityPurchased': 1, 'rewardsGroup': 'PLANTERS CASHEWS', 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6', 'targetPrice': '800'}, {'barcode': '029000079236', 'description': 'PLANTERSe Cashew Halves & Pieces - 46 oz.', 'finalPrice': '9.99', 'itemPrice': '9.99', 'partnerItemId': '4', 'pointsEarned': '50.0', 'pointsPayerId': '559c2234e4b06aca36af13c6', 'quantityPurchased': 1, 'rewardsGroup': 'PLANTERS CASHEWS', 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6', 'targetPrice': '800'}, {'barcode': '029000079236', 'description': 'PLANTERSe Cashew Halves & Pieces - 46 oz.', 'finalPrice': '9.99', 'itemPrice': '9.99', 'partnerItemId': '5', 'pointsEarned': '50.0', 'pointsPayerId': '559c2234e4b06aca36af13c6', 'quantityPurchased': 1, 'rewardsGroup': 'PLANTERS 
CASHEWS', 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6', 'targetPrice': '800'}, {'barcode': '029000079236', 'description': 'PLANTERSe Cashew Halves & Pieces - 46 oz.', 'finalPrice': '9.99', 'itemPrice': '9.99', 'partnerItemId': '6', 'pointsEarned': '50.0', 'pointsPayerId': '559c2234e4b06aca36af13c6', 'quantityPurchased': 1, 'rewardsGroup': 'PLANTERS CASHEWS', 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6', 'targetPrice': '800'}, {'barcode': '029000079236', 'description': 'PLANTERSe Cashew Halves & Pieces - 46 oz.', 'finalPrice': '9.99', 'itemPrice': '9.99', 'partnerItemId': '7', 'pointsEarned': '50.0', 'pointsPayerId': '559c2234e4b06aca36af13c6', 'quantityPurchased': 1, 'rewardsGroup': 'PLANTERS CASHEWS', 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6', 'targetPrice': '800'}, {'barcode': '029000079236', 'description': 'PLANTERSe Cashew Halves & Pieces - 46 oz.', 'finalPrice': '9.99', 'itemPrice': '9.99', 'partnerItemId': '8', 'pointsEarned': '50.0', 'pointsPayerId': '559c2234e4b06aca36af13c6', 'quantityPurchased': 1, 'rewardsGroup': 'PLANTERS CASHEWS', 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6', 'targetPrice': '800'}, {'barcode': '029000079236', 'description': 'PLANTERSe Cashew Halves & Pieces - 46 oz.', 'finalPrice': '9.99', 'itemPrice': '9.99', 'partnerItemId': '9', 'pointsEarned': '50.0', 'pointsPayerId': '559c2234e4b06aca36af13c6', 'quantityPurchased': 1, 'rewardsGroup': 'PLANTERS CASHEWS', 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6', 'targetPrice': '800'}]",FINISHED,89.91,5ff36be7135e7011bcb856d3
746,{'$oid': '601bc2bd0a7214ad28000262'},601bc2bd0a7214ad28000262,,,{'$date': 1612432061258},1612432061258,{'$date': 1612432061258},1612432061258,,,{'$date': 1612432061258},1612432061258,,,,,,SUBMITTED,,5fc961c3b8cfca11a077dd33


#### df_receipts.pointsAwardedDate:
same format as df_receipts.createDate - can include None/Null/NaN

In [37]:
# set a variable to use for the dict_key param of value_from_dict() function
receipts_pointsAwardedDate_dict_key = "['$date']"

In [38]:
# clean df_receipts.pointsAwardedDate and confirm by viewing a sample of the dataframe
value_from_dict(df_receipts, "pointsAwardedDate", receipts_pointsAwardedDate_dict_key)
df_receipts.sample(2)

AssertionError: there is at least one None value in the cleaned data

In [39]:
# clean df_receipts.pointsAwardedDate and confirm by viewing a sample of the dataframe, allowing nulls
value_from_dict(df_receipts, "pointsAwardedDate", receipts_pointsAwardedDate_dict_key, allow_nulls=True)
df_receipts.sample(2)

Unnamed: 0,_id,_id_cleaned,bonusPointsEarned,bonusPointsEarnedReason,createDate,createDate_cleaned,dateScanned,dateScanned_cleaned,finishedDate,finishedDate_cleaned,modifyDate,modifyDate_cleaned,pointsAwardedDate,pointsAwardedDate_cleaned,pointsEarned,purchaseDate,purchasedItemCount,rewardsReceiptItemList,rewardsReceiptStatus,totalSpent,userId
688,{'$oid': '60189cc10a7214ad28000050'},60189cc10a7214ad28000050,25.0,COMPLETE_NONPARTNER_RECEIPT,{'$date': 1612225729000},1612225729000,{'$date': 1612225729000},1612225729000,{'$date': 1612225730000},1612226000000.0,{'$date': 1612225730000},1612225730000,{'$date': 1612225730000},1612226000000.0,25.0,{'$date': 1611619200000},1.0,"[{'barcode': '070470149394', 'brandCode': 'BRAND', 'description': 'Yoplait Fiber One Non-Fat Yogurt - Peach & Vanilla, 6 Pack', 'finalPrice': '10.00', 'itemPrice': '10.00', 'partnerItemId': '0', 'pointsNotAwardedReason': 'Action not allowed for user and CPG', 'pointsPayerId': '5332f5f3e4b03c9a25efd0ae', 'quantityPurchased': 1, 'rewardsGroup': 'YOPLAIT FIBER ONE YOGURT', 'rewardsProductPartnerId': '5332f5f3e4b03c9a25efd0ae'}]",FINISHED,10.0,60189c74c8b50e11d8454eff
670,{'$oid': '60189ffe0a720f05f4000066'},60189ffe0a720f05f4000066,,,{'$date': 1612226558661},1612226558661,{'$date': 1612226558661},1612226558661,,,{'$date': 1612226558661},1612226558661,,,,,,,SUBMITTED,,5fc961c3b8cfca11a077dd33


#### df_receipts.purchaseDate:
same format as df_receipts.createDate - can include None/Null/NaN

In [40]:
# set a variable to use for the dict_key param of value_from_dict() function
receipts_purchaseDate_dict_key = "['$date']"

In [41]:
# clean df_receipts.purchaseDate and confirm by viewing a sample of the dataframe
value_from_dict(df_receipts, "purchaseDate", receipts_purchaseDate_dict_key)
df_receipts.sample(2)

AssertionError: there is at least one None value in the cleaned data

In [42]:
# clean df_receipts.purchaseDate and confirm by viewing a sample of the dataframe allowing nulls
value_from_dict(df_receipts, "purchaseDate", receipts_purchaseDate_dict_key, allow_nulls=True)
df_receipts.sample(2)

Unnamed: 0,_id,_id_cleaned,bonusPointsEarned,bonusPointsEarnedReason,createDate,createDate_cleaned,dateScanned,dateScanned_cleaned,finishedDate,finishedDate_cleaned,modifyDate,modifyDate_cleaned,pointsAwardedDate,pointsAwardedDate_cleaned,pointsEarned,purchaseDate,purchaseDate_cleaned,purchasedItemCount,rewardsReceiptItemList,rewardsReceiptStatus,totalSpent,userId
1071,{'$oid': '603b0b4a0a720fde1000024e'},603b0b4a0a720fde1000024e,25.0,COMPLETE_NONPARTNER_RECEIPT,{'$date': 1614482250000},1614482250000,{'$date': 1614482250000},1614482250000,,,{'$date': 1614482251000},1614482251000,,,25.0,{'$date': 1597622400000},1597622000000.0,2.0,"[{'barcode': 'B076FJ92M4', 'description': 'mueller austria hypergrind precision electric spice/coffee grinder millwith large grinding capacity and hd motor also for spices, herbs, nuts,grains, white', 'discountedItemPrice': '22.97', 'finalPrice': '22.97', 'itemPrice': '22.97', 'originalReceiptItemText': 'mueller austria hypergrind precision electric spice/coffee grinder millwith large grinding capacity and hd motor also for spices, herbs, nuts,grains, white', 'partnerItemId': '0', 'priceAfterCoupon': '22.97', 'quantityPurchased': 1}, {'barcode': 'B07BRRLSVC', 'description': 'thindust summer face mask - sun protection neck gaiter for outdooractivities', 'discountedItemPrice': '11.99', 'finalPrice': '11.99', 'itemPrice': '11.99', 'originalReceiptItemText': 'thindust summer face mask - sun protection neck gaiter for outdooractivities', 'partnerItemId': '1', 'priceAfterCoupon': '11.99', 'quantityPurchased': 1}]",REJECTED,34.96,5fc961c3b8cfca11a077dd33
1103,{'$oid': '603cd57d0a7217c72c0003f6'},603cd57d0a7217c72c0003f6,25.0,COMPLETE_NONPARTNER_RECEIPT,{'$date': 1614599549000},1614599549000,{'$date': 1614599549000},1614599549000,,,{'$date': 1614599550000},1614599550000,,,25.0,{'$date': 1597622400000},1597622000000.0,2.0,"[{'barcode': 'B076FJ92M4', 'description': 'mueller austria hypergrind precision electric spice/coffee grinder millwith large grinding capacity and hd motor also for spices, herbs, nuts,grains, white', 'discountedItemPrice': '22.97', 'finalPrice': '22.97', 'itemPrice': '22.97', 'originalReceiptItemText': 'mueller austria hypergrind precision electric spice/coffee grinder millwith large grinding capacity and hd motor also for spices, herbs, nuts,grains, white', 'partnerItemId': '0', 'priceAfterCoupon': '22.97', 'quantityPurchased': 1}, {'barcode': 'B07BRRLSVC', 'description': 'thindust summer face mask - sun protection neck gaiter for outdooractivities', 'discountedItemPrice': '11.99', 'finalPrice': '11.99', 'itemPrice': '11.99', 'originalReceiptItemText': 'thindust summer face mask - sun protection neck gaiter for outdooractivities', 'partnerItemId': '1', 'priceAfterCoupon': '11.99', 'quantityPurchased': 1}]",REJECTED,34.96,5fc961c3b8cfca11a077dd33
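
All of the receipt date columns share the `{'$date': ...}` shape, and several of them contain missing cells. The sketch below shows, on a small hand-made frame, how a null-tolerant extraction could be applied to a list of columns in one loop. `extract_date` is a hypothetical stand-in rather than the notebook's `value_from_dict()`, and the two rows are illustrative, not taken from a specific receipt.

```python
import pandas as pd

def extract_date(value):
    """Return the '$date' entry of a dict cell, or None for a missing cell."""
    return value.get('$date') if isinstance(value, dict) else None

# two rows shaped like df_receipts: the second has no finishedDate
df = pd.DataFrame({
    'createDate': [{'$date': 1609687531000}, {'$date': 1614607657664}],
    'finishedDate': [{'$date': 1609687531000}, None],
})

for column in ['createDate', 'finishedDate']:
    cleaned = [extract_date(value) for value in df[column]]
    # place the cleaned column directly after its source column
    df.insert(df.columns.get_loc(column) + 1, f'{column}_cleaned', cleaned)

print(list(df.columns))
print(df['finishedDate_cleaned'].isna().tolist())
```

The cell-by-cell calls used in this notebook have the advantage that each column's null policy is confirmed explicitly, so a loop like this trades that visibility for brevity.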


#### df_users._id:

In [43]:
# look at the first value in df_users._id
df_users['_id'][0]

{'$oid': '5ff1e194b6a9d73a3a9f1052'}

In [44]:
#extract the value
df_users['_id'][0]['$oid']

'5ff1e194b6a9d73a3a9f1052'

In [45]:
# set a variable to use for the dict_key param of value_from_dict() function
users_id_dict_key = "['$oid']"

In [46]:
# clean df_users._id and confirm by viewing a sample of the dataframe
value_from_dict(df_users, "_id", users_id_dict_key)
df_users.sample(2)

Unnamed: 0,_id,_id_cleaned,active,createdDate,lastLogin,role,signUpSource,state
426,{'$oid': '5a43c08fe4b014fd6b6a0612'},5a43c08fe4b014fd6b6a0612,True,{'$date': 1514389647059},{'$date': 1613146957155},consumer,,
476,{'$oid': '54943462e4b07e684157a532'},54943462e4b07e684157a532,True,{'$date': 1418998882381},{'$date': 1614963143204},fetch-staff,,


#### df_users.createdDate:  
same format as df_receipts.createDate

In [47]:
# set a variable to use for the dict_key param of value_from_dict() function
users_createdDate_dict_key = "['$date']"

In [48]:
# clean df_users.createdDate and confirm by viewing a sample of the dataframe
value_from_dict(df_users, "createdDate", users_createdDate_dict_key)
df_users.sample(2)

Unnamed: 0,_id,_id_cleaned,active,createdDate,createdDate_cleaned,lastLogin,role,signUpSource,state
443,{'$oid': '5fc961c3b8cfca11a077dd33'},5fc961c3b8cfca11a077dd33,True,{'$date': 1607033283936},1607033283936,{'$date': 1614379156799},fetch-staff,Email,NH
437,{'$oid': '5fc961c3b8cfca11a077dd33'},5fc961c3b8cfca11a077dd33,True,{'$date': 1607033283936},1607033283936,{'$date': 1614379156799},fetch-staff,Email,NH


#### df_users.lastLogin:  
same format as df_receipts.createDate - can include None/Null/NaN

In [49]:
# set a variable to use for the dict_key param of value_from_dict() function
users_lastLogin_dict_key = "['$date']"

In [50]:
# clean df_users.lastLogin and confirm by viewing a sample of the dataframe
value_from_dict(df_users, "lastLogin", users_lastLogin_dict_key)
df_users.sample(2)

AssertionError: there is at least one None value in the cleaned data

In [51]:
# clean df_users.lastLogin and confirm by viewing a sample of the dataframe, allowing nulls
value_from_dict(df_users, "lastLogin", users_lastLogin_dict_key, allow_nulls=True)
df_users.sample(2)

Unnamed: 0,_id,_id_cleaned,active,createdDate,createdDate_cleaned,lastLogin,lastLogin_cleaned,role,signUpSource,state
84,{'$oid': '5ff7264e8f142f11dd189504'},5ff7264e8f142f11dd189504,True,{'$date': 1610032718596},1610032718596,{'$date': 1610032826821},1610033000000.0,consumer,Email,WI
170,{'$oid': '5e27526d0bdb6a138c32b556'},5e27526d0bdb6a138c32b556,True,{'$date': 1579635309795},1579635309795,,,consumer,Google,WI
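
In the sample above, `lastLogin_cleaned` prints as `1610033000000.0` although the raw value is `1610032826821`. A column that mixes integers with missing values is stored by pandas as `float64`, and the rounded figure is most likely only how the float is displayed, since integers of this magnitude are still represented exactly. If integer output is preferred, pandas' nullable `Int64` dtype is one option. The sketch below is a suggestion under that assumption and not something this notebook does.

```python
import pandas as pd

# one present value and one missing value, as in lastLogin
raw = pd.Series([1610032826821, None])
print(raw.dtype)             # float64

# nullable integer dtype keeps whole numbers and marks gaps as <NA>
as_int = raw.astype('Int64')
print(as_int.dtype)          # Int64
print(as_int[0])             # 1610032826821
```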


### Writing a function to convert date data from epoch to timestamps

In [52]:
def epoch_to_timestamp(dataframe, column_to_convert, allow_nulls = False):
    """Returns dataframe with a new column containing timestamps converted from epoch.
    
    :param dataframe: A dataframe with a column containing epoch milliseconds as ints or floats
    :type dataframe: Pandas DataFrame
    :param column_to_convert: The name of the column containing epoch milliseconds
    :type column_to_convert: str
    :param allow_nulls: A boolean value indicating if None(Null) values should be allowed,
        defaults to False
    :type allow_nulls: bool
    
    :raises AssertionError: 'there is at least one None/Null/NaN/NaT value in the converted timestamp data' 
        if param allow_nulls = False
    :raises AssertionError: 'the length of the original column and the converted column are not the same'
    :excepts ValueError: If a column has already been converted, print message confirming
        no values were added to the dataframe
    
    :rtype: Pandas DataFrame
    :return: the original DataFrame with an additional column containing converted epoch values as timestamps
    """
    #setting variables
    # name of the new column we'll be adding to the dataframe
    converted_column_name = column_to_convert + "_ts"
    # location to insert the converted column, after column_to_convert
    insert_at = dataframe.columns.get_loc(column_to_convert) + 1
    # create a series of timestamps from the epoch time column_to_convert
    # pd.to_datetime() converts a scalar, array-like, Series or DataFrame/dict-like to a pandas datetime object
    # the data in the epoch columns is milliseconds since the epoch start, and we round to 1ms for consistency
    time_stamps = pd.to_datetime(dataframe[column_to_convert], unit='ms').round('1ms')
    
    # handle allow_nulls param flag
    if not allow_nulls:
        # confirm no nulls in time_stamps; isnull() inspects the values,
        # whereas `None not in time_stamps` would only test the index labels
        assert not time_stamps.isnull().values.any(), "there is at least one None/Null/NaN/NaT value in the converted timestamp data"
    
    # confirm time_stamps is the same length as column_to_convert
    assert len(time_stamps) == len(dataframe[column_to_convert]), "the length of the original column and the converted column are not the same"
    
    # add the timestamps data to the original dataframe following column_to_convert
    try:
        dataframe.insert(insert_at, converted_column_name, time_stamps)
    except ValueError as error:
        print(f"{str(error)}, {converted_column_name} was not added to the dataframe")
    
    # return the modified dataframe
    return dataframe
    

### Using epoch_to_timestamp() to convert columns with epoch values to timestamps and add them to the dataframe as a new column:

In [53]:
# convert df_users.createdDate_cleaned to timestamps and confirm by viewing a sample of the dataframe
epoch_to_timestamp(df_users, 'createdDate_cleaned')
df_users.sample(2)

Unnamed: 0,_id,_id_cleaned,active,createdDate,createdDate_cleaned,createdDate_cleaned_ts,lastLogin,lastLogin_cleaned,role,signUpSource,state
200,{'$oid': '6004a5d3fb296c4ef805e256'},6004a5d3fb296c4ef805e256,True,{'$date': 1610917331432},1610917331432,2021-01-17 21:02:11.432,,,consumer,Email,WI
142,{'$oid': '5fff4beedf9ace121f0c17ea'},5fff4beedf9ace121f0c17ea,True,{'$date': 1610566638415},1610566638415,2021-01-13 19:37:18.415,{'$date': 1610566872676},1610567000000.0,consumer,Email,WI
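
As a quick check of the conversion, the two sample rows above can be reproduced by hand. The values are counts of milliseconds since 1970-01-01, and the resulting timestamps carry no timezone. A minimal standalone sketch using only `pd.to_datetime()`:

```python
import pandas as pd

# createdDate_cleaned values copied from the two sample rows above
epoch_ms = pd.Series([1610917331432, 1610566638415])
# unit='ms' tells pandas the integers count milliseconds since 1970-01-01
stamps = pd.to_datetime(epoch_ms, unit='ms')

print(stamps[0])  # 2021-01-17 21:02:11.432000
print(stamps[1])  # 2021-01-13 19:37:18.415000
```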


In [54]:
# convert df_users.lastLogin_cleaned to timestamps and confirm by viewing a sample of the dataframe
epoch_to_timestamp(df_users, 'lastLogin_cleaned')
df_users.sample(2)

AssertionError: there is at least one None/Null/NaN/NaT value in the converted timestamp data

In [55]:
# convert df_users.lastLogin_cleaned to timestamps and confirm by viewing a sample of the dataframe, allowing for Nulls
epoch_to_timestamp(df_users, 'lastLogin_cleaned', allow_nulls=True)
df_users.sample(2)

Unnamed: 0,_id,_id_cleaned,active,createdDate,createdDate_cleaned,createdDate_cleaned_ts,lastLogin,lastLogin_cleaned,lastLogin_cleaned_ts,role,signUpSource,state
350,{'$oid': '60145ff384231211ce796d51'},60145ff384231211ce796d51,True,{'$date': 1611948019722},1611948019722,2021-01-29 19:20:19.722,,,NaT,consumer,Email,
80,{'$oid': '5ff7264e8f142f11dd189504'},5ff7264e8f142f11dd189504,True,{'$date': 1610032718596},1610032718596,2021-01-07 15:18:38.596,{'$date': 1610032826821},1610033000000.0,2021-01-07 15:20:26.821,consumer,Email,WI
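
The null check inside `epoch_to_timestamp()` relies on `isnull()` rather than a membership test. The difference matters because `in` applied to a pandas Series looks at the index labels, not the values. A small demonstration with a two-element series (one real `lastLogin` value from above and one missing value):

```python
import pandas as pd

stamps = pd.to_datetime(pd.Series([1610032826821, None]), unit='ms')

# membership on a Series tests the index (0, 1), so the missing value is not seen
print(None in stamps)                # False
# isnull() inspects the values and finds the NaT
print(stamps.isnull().values.any())  # True
```

For a plain Python list, as used in `value_from_dict()`, `None not in cleaned_list` does check the values, so the two functions need different tests.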


In [56]:
# convert df_receipts.createDate_cleaned to timestamps and confirm by viewing a sample of the dataframe
epoch_to_timestamp(df_receipts, 'createDate_cleaned')
df_receipts.sample(2)

Unnamed: 0,_id,_id_cleaned,bonusPointsEarned,bonusPointsEarnedReason,createDate,createDate_cleaned,createDate_cleaned_ts,dateScanned,dateScanned_cleaned,finishedDate,finishedDate_cleaned,modifyDate,modifyDate_cleaned,pointsAwardedDate,pointsAwardedDate_cleaned,pointsEarned,purchaseDate,purchaseDate_cleaned,purchasedItemCount,rewardsReceiptItemList,rewardsReceiptStatus,totalSpent,userId
145,{'$oid': '5ff726a40a720f05230005fa'},5ff726a40a720f05230005fa,750.0,"Receipt number 1 completed, bonus point schedule DEFAULT (5cefdcacf3693e0b50e83a36)",{'$date': 1610032804000},1610032804000,2021-01-07 15:20:04,{'$date': 1610032804000},1610032804000,,,{'$date': 1610032805000},1610032805000,{'$date': 1610032805000},1610033000000.0,810.0,{'$date': 1609946404000},1609946000000.0,5.0,"[{'barcode': '075925306254', 'competitiveProduct': True, 'finalPrice': '1', 'itemPrice': '1', 'partnerItemId': '1', 'quantityPurchased': 1, 'rewardsGroup': 'SARGENTO NATURAL SHREDDED CHEESE 6OZ OR LARGER', 'rewardsProductPartnerId': '5e7cf838f221c312e698a628'}, {'barcode': '075925306254', 'competitiveProduct': True, 'finalPrice': '10.00', 'itemPrice': '10.00', 'needsFetchReview': True, 'needsFetchReviewReason': 'USER_FLAGGED', 'partnerItemId': '2', 'preventTargetGapPoints': True, 'quantityPurchased': 2, 'rewardsGroup': 'SARGENTO NATURAL SHREDDED CHEESE 6OZ OR LARGER', 'rewardsProductPartnerId': '5e7cf838f221c312e698a628', 'userFlaggedBarcode': '075925306254', 'userFlaggedNewItem': True, 'userFlaggedPrice': '10.00', 'userFlaggedQuantity': 2}, {'barcode': '034100573065', 'description': 'MILLER LITE 24 PACK 12OZ CAN', 'finalPrice': '1.00', 'itemPrice': '1.00', 'needsFetchReview': True, 'needsFetchReviewReason': 'USER_FLAGGED', 'partnerItemId': '3', 'pointsEarned': '30.0', 'pointsPayerId': '5332f709e4b03c9a25efd0f1', 'preventTargetGapPoints': True, 'quantityPurchased': 1, 'rewardsGroup': 'MILLER LITE 24 PACK', 'rewardsProductPartnerId': '5332f709e4b03c9a25efd0f1', 'userFlaggedBarcode': '034100573065', 'userFlaggedDescription': 'MILLER LITE 24 PACK 12OZ CAN', 'userFlaggedNewItem': True, 'userFlaggedPrice': '1.00', 'userFlaggedQuantity': 1}, {'barcode': '034100573065', 'description': 'MILLER LITE 24 PACK 12OZ CAN', 'finalPrice': '1.00', 'itemPrice': '1.00', 'needsFetchReview': True, 'needsFetchReviewReason': 'USER_FLAGGED', 'partnerItemId': '4', 'pointsEarned': '30.0', 'pointsPayerId': 
'5332f709e4b03c9a25efd0f1', 'preventTargetGapPoints': True, 'quantityPurchased': 1, 'rewardsGroup': 'MILLER LITE 24 PACK', 'rewardsProductPartnerId': '5332f709e4b03c9a25efd0f1', 'userFlaggedBarcode': '034100573065', 'userFlaggedDescription': 'MILLER LITE 24 PACK 12OZ CAN', 'userFlaggedNewItem': True, 'userFlaggedPrice': '1.00', 'userFlaggedQuantity': 1}, {'needsFetchReview': True, 'needsFetchReviewReason': 'USER_FLAGGED', 'partnerItemId': '5', 'preventTargetGapPoints': True, 'userFlaggedBarcode': '034100573065', 'userFlaggedDescription': 'MILLER LITE 24 PACK 12OZ CAN', 'userFlaggedNewItem': True, 'userFlaggedPrice': '1.00', 'userFlaggedQuantity': 1}]",FLAGGED,13.0,5ff726a38f142f11dd1895dc
374,{'$oid': '60088d580a7214ad890000eb'},60088d580a7214ad890000eb,750.0,"Receipt number 1 completed, bonus point schedule DEFAULT (5cefdcacf3693e0b50e83a36)",{'$date': 1611173208000},1611173208000,2021-01-20 20:06:48,{'$date': 1611173208000},1611173208000,{'$date': 1611173209000},1611173000000.0,{'$date': 1611173214000},1611173214000,{'$date': 1611173209000},1611173000000.0,9850.0,{'$date': 1611100800000},1611101000000.0,7.0,"[{'barcode': '4011', 'description': 'ITEM NOT FOUND', 'finalPrice': '26.00', 'itemPrice': '26.00', 'needsFetchReview': False, 'partnerItemId': '1', 'preventTargetGapPoints': True, 'quantityPurchased': 7, 'userFlaggedBarcode': '4011', 'userFlaggedNewItem': True, 'userFlaggedPrice': '25.00', 'userFlaggedQuantity': 4}]",FINISHED,26.0,60088d58633aab121bb8e424


In [57]:
# convert df_receipts.dateScanned_cleaned to timestamps and confirm by viewing a sample of the dataframe
epoch_to_timestamp(df_receipts, 'dateScanned_cleaned')
df_receipts.sample(2)

Unnamed: 0,_id,_id_cleaned,bonusPointsEarned,bonusPointsEarnedReason,createDate,createDate_cleaned,createDate_cleaned_ts,dateScanned,dateScanned_cleaned,dateScanned_cleaned_ts,finishedDate,finishedDate_cleaned,modifyDate,modifyDate_cleaned,pointsAwardedDate,pointsAwardedDate_cleaned,pointsEarned,purchaseDate,purchaseDate_cleaned,purchasedItemCount,rewardsReceiptItemList,rewardsReceiptStatus,totalSpent,userId
502,{'$oid': '6010bdef0a720f0535000053'},6010bdef0a720f0535000053,750.0,"Receipt number 1 completed, bonus point schedule DEFAULT (5cefdcacf3693e0b50e83a36)",{'$date': 1611709935000},1611709935000,2021-01-27 01:12:15,{'$date': 1611709935000},1611709935000,2021-01-27 01:12:15,{'$date': 1611709937000},1611710000000.0,{'$date': 1611709942000},1611709942000,{'$date': 1611709937000},1611710000000.0,760.0,{'$date': 1611623535000},1611624000000.0,1.0,"[{'barcode': '079400066619', 'competitiveProduct': True, 'description': 'SUAVE PROFESSIONALS MOISTURIZING SHAMPOO LIQUID PLASTIC BOTTLE RP 12.6 OZ - 0079400066612', 'finalPrice': '1', 'itemPrice': '1', 'needsFetchReview': False, 'originalMetaBriteBarcode': '080878042197', 'partnerItemId': '1', 'pointsEarned': '10.0', 'pointsPayerId': '5332f5f6e4b03c9a25efd0b4', 'preventTargetGapPoints': True, 'quantityPurchased': 1, 'rewardsGroup': 'SUAVE HAIR CARE', 'rewardsProductPartnerId': '5332f5f6e4b03c9a25efd0b4', 'targetPrice': '800', 'userFlaggedBarcode': '079400066619'}]",FINISHED,1.0,6010bddaa4b74c120bd19dfb
37,{'$oid': '5ff36dc40a7214ada100059f'},5ff36dc40a7214ada100059f,5.0,All-receipts receipt bonus,{'$date': 1609788868000},1609788868000,2021-01-04 19:34:28,{'$date': 1609788868000},1609788868000,2021-01-04 19:34:28,{'$date': 1609788868000},1609789000000.0,{'$date': 1609788868000},1609788868000,{'$date': 1609788868000},1609789000000.0,355.0,{'$date': 1609702468000},1609702000000.0,9.0,"[{'barcode': '021000678358', 'description': 'KRAFT Barbecue Roasted Garlic 18 OZ 002100067835', 'finalPrice': '9.99', 'itemPrice': '9.99', 'partnerItemId': '1', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'barcode': '021000678358', 'description': 'KRAFT Barbecue Roasted Garlic 18 OZ 002100067835', 'finalPrice': '9.99', 'itemPrice': '9.99', 'partnerItemId': '2', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'barcode': '021000678358', 'description': 'KRAFT Barbecue Roasted Garlic 18 OZ 002100067835', 'finalPrice': '9.99', 'itemPrice': '9.99', 'partnerItemId': '3', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'barcode': '021000678358', 'description': 'KRAFT Barbecue Roasted Garlic 18 OZ 002100067835', 'finalPrice': '9.99', 'itemPrice': '9.99', 'partnerItemId': '4', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'barcode': '021000678358', 'description': 'KRAFT Barbecue Roasted Garlic 18 OZ 002100067835', 'finalPrice': '9.99', 'itemPrice': '9.99', 'partnerItemId': '5', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'barcode': '021000678358', 'description': 'KRAFT Barbecue Roasted Garlic 18 OZ 002100067835', 'finalPrice': '9.99', 'itemPrice': '9.99', 'partnerItemId': '6', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'barcode': '021000678358', 'description': 'KRAFT Barbecue Roasted Garlic 18 OZ 002100067835', 'finalPrice': '9.99', 'itemPrice': '9.99', 'partnerItemId': '7', 
'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'barcode': '021000678358', 'description': 'KRAFT Barbecue Roasted Garlic 18 OZ 002100067835', 'finalPrice': '9.99', 'itemPrice': '9.99', 'partnerItemId': '8', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'barcode': '021000678358', 'description': 'KRAFT Barbecue Roasted Garlic 18 OZ 002100067835', 'finalPrice': '9.99', 'itemPrice': '9.99', 'partnerItemId': '9', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}]",FINISHED,89.91,5ff36d0362fde912123a5535


In [58]:
# convert df_receipts.finishedDate_cleaned to timestamps and confirm by viewing a sample of the dataframe
epoch_to_timestamp(df_receipts, 'finishedDate_cleaned')
df_receipts.sample(2)

AssertionError: there is at least one None/Null/NaN/NaT value in the converted timestamp data
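
The assertion above fires because at least one receipt has no `finishedDate`, so the converted column contains `NaT`. The notebook's own `epoch_to_timestamp` is defined earlier and is not reproduced here. What follows is only a minimal sketch of how such a helper might behave, assuming it writes a new `<column>_ts` column and guards against nulls unless told otherwise. The name `epoch_to_timestamp_sketch` and the `demo` dataframe are hypothetical.

```python
import pandas as pd


def epoch_to_timestamp_sketch(df, column, allow_nulls=False):
    """Hypothetical stand-in for the notebook's epoch_to_timestamp helper."""
    new_column = f"{column}_ts"
    # Treat the values as milliseconds since the Unix epoch; missing values become NaT.
    df[new_column] = pd.to_datetime(df[column], unit="ms")
    if not allow_nulls:
        # Fail loudly so that missing dates are a deliberate choice, not an accident.
        assert df[new_column].notna().all(), (
            "there is at least one None/Null/NaN/NaT value in the converted timestamp data"
        )
    return df


# Illustrative data only: one real-looking epoch value and one missing value.
demo = pd.DataFrame({"finishedDate_cleaned": [1611709937000, None]})

try:
    epoch_to_timestamp_sketch(demo, "finishedDate_cleaned")
    raised = False
except AssertionError:
    raised = True

# Opting in to nulls lets the conversion complete and keeps NaT for the missing row.
epoch_to_timestamp_sketch(demo, "finishedDate_cleaned", allow_nulls=True)
```

Defaulting to a strict check and requiring an explicit `allow_nulls=True` makes each nullable date column a documented decision in the notebook.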

In [59]:
# finishedDate is missing for some receipts, so rerun the conversion with allow_nulls=True
# and confirm by viewing a sample of the dataframe
epoch_to_timestamp(df_receipts, 'finishedDate_cleaned', allow_nulls=True)
df_receipts.sample(2)

Unnamed: 0,_id,_id_cleaned,bonusPointsEarned,bonusPointsEarnedReason,createDate,createDate_cleaned,createDate_cleaned_ts,dateScanned,dateScanned_cleaned,dateScanned_cleaned_ts,finishedDate,finishedDate_cleaned,finishedDate_cleaned_ts,modifyDate,modifyDate_cleaned,pointsAwardedDate,pointsAwardedDate_cleaned,pointsEarned,purchaseDate,purchaseDate_cleaned,purchasedItemCount,rewardsReceiptItemList,rewardsReceiptStatus,totalSpent,userId
219,{'$oid': '5ffe23560a720f05ac006874'},5ffe23560a720f05ac006874,,,{'$date': 1610490710000},1610490710000,2021-01-12 22:31:50,{'$date': 1610490710000},1610490710000,2021-01-12 22:31:50,{'$date': 1610490710000},1610491000000.0,2021-01-12 22:31:50,{'$date': 1610490710000},1610490710000,,,,{'$date': 1610409600000},1610410000000.0,,"[{'description': 'flipbelt level terrain waist pouch, neon yellow, large/32-35', 'discountedItemPrice': '28.57', 'finalPrice': '28.57', 'itemPrice': '28.57', 'originalReceiptItemText': 'flipbelt level terrain waist pouch, neon yellow, large/32-35', 'partnerItemId': '0', 'priceAfterCoupon': '28.57', 'quantityPurchased': 1}]",PENDING,28.57,59c124bae4b0299e55b0f330
257,{'$oid': '5fff4ca90a720f05f300002a'},5fff4ca90a720f05f300002a,45.0,COMPLETE_PARTNER_RECEIPT,{'$date': 1610566825000},1610566825000,2021-01-13 19:40:25,{'$date': 1610566825000},1610566825000,2021-01-13 19:40:25,{'$date': 1610566826000},1610567000000.0,2021-01-13 19:40:26,{'$date': 1610566826000},1610566826000,{'$date': 1610566826000},1610567000000.0,100.0,{'$date': 1610566825000},1610567000000.0,1.0,"[{'barcode': '021000068159', 'description': 'KRAFT Trios Snackfulls Monterey Jack Cheeses, Dried Apples, Granola Clusters,2.25 oz', 'finalPrice': '0.99', 'itemPrice': '0.99', 'partnerItemId': '1', 'pointsEarned': '5.0', 'pointsPayerId': '559c2234e4b06aca36af13c6', 'quantityPurchased': 1, 'rewardsGroup': 'KRAFT TRIOS SNACKFULLS', 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6', 'targetPrice': '800'}]",FINISHED,0.99,5fff4beedf9ace121f0c17ea
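
As a quick sanity check on the sample above, the `_cleaned` values can be read as milliseconds since the Unix epoch. The standard-library sketch below reproduces two of the displayed timestamps. It assumes the values are interpreted as UTC, which is consistent with the output shown. The function name `epoch_ms_to_utc` is hypothetical and is not part of the notebook.

```python
from datetime import datetime, timezone


def epoch_ms_to_utc(epoch_ms):
    """Convert milliseconds since the Unix epoch to a naive UTC datetime."""
    # Divide by 1000 because datetime.fromtimestamp expects seconds.
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).replace(tzinfo=None)


# createDate_cleaned for receipt 5ffe23560a720f05ac006874
print(epoch_ms_to_utc(1610490710000))  # 2021-01-12 22:31:50

# modifyDate_cleaned for receipt 5fff4ca90a720f05f300002a
print(epoch_ms_to_utc(1610566826000))  # 2021-01-13 19:40:26
```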


In [60]:
# convert df_receipts.modifyDate_cleaned to timestamps and confirm by viewing a sample of the dataframe
epoch_to_timestamp(df_receipts, 'modifyDate_cleaned')
df_receipts.sample(2)

Unnamed: 0,_id,_id_cleaned,bonusPointsEarned,bonusPointsEarnedReason,createDate,createDate_cleaned,createDate_cleaned_ts,dateScanned,dateScanned_cleaned,dateScanned_cleaned_ts,finishedDate,finishedDate_cleaned,finishedDate_cleaned_ts,modifyDate,modifyDate_cleaned,modifyDate_cleaned_ts,pointsAwardedDate,pointsAwardedDate_cleaned,pointsEarned,purchaseDate,purchaseDate_cleaned,purchasedItemCount,rewardsReceiptItemList,rewardsReceiptStatus,totalSpent,userId
327,{'$oid': '6004a99e0a720f05f3000095'},6004a99e0a720f05f3000095,750.0,"Receipt number 1 completed, bonus point schedule DEFAULT (5cefdcacf3693e0b50e83a36)",{'$date': 1610918301000},1610918301000,2021-01-17 21:18:21.000,{'$date': 1610918301000},1610918301000,2021-01-17 21:18:21.000,{'$date': 1610918798000},1610919000000.0,2021-01-17 21:26:38,{'$date': 1610982060000},1610982060000,2021-01-18 15:01:00.000,{'$date': 1610918798000},1610919000000.0,1541.8,{'$date': 1610150400000},1610150000000.0,167.0,"[{'brandCode': 'HY-VEE', 'description': 'HYV GRADE A X LRG EG', 'discountedItemPrice': '1.29', 'finalPrice': '1.29', 'itemPrice': '1.29', 'originalReceiptItemText': 'HYV GRADE A X LRG EG', 'partnerItemId': '1050', 'quantityPurchased': 1}, {'brandCode': 'HY-VEE', 'description': 'HYV LF BLUEBERRY YOG', 'discountedItemPrice': '0.54', 'finalPrice': '0.54', 'itemPrice': '0.54', 'originalReceiptItemText': 'HYV LF BLUEBERRY YOG', 'partnerItemId': '1051', 'quantityPurchased': 1}, {'barcode': '036632011077', 'brandCode': 'LIGHT & FIT GREEK', 'competitiveProduct': True, 'description': 'Light & Fit Greek Crunch Key Lime Pie Yogurt', 'discountedItemPrice': '0.54', 'finalPrice': '0.54', 'itemPrice': '0.54', 'originalReceiptItemText': 'HYV LF KEY LIME PIE', 'partnerItemId': '1052', 'quantityPurchased': 1, 'rewardsGroup': 'YOPLAIT GREEK YOGURT', 'rewardsProductPartnerId': '5332f5f3e4b03c9a25efd0ae'}, {'brandCode': 'HY-VEE', 'description': 'HYV LF PEACH YOGURT', 'discountedItemPrice': '0.54', 'finalPrice': '0.54', 'itemPrice': '0.54', 'originalReceiptItemText': 'HYV LF PEACH YOGURT', 'partnerItemId': '1053', 'quantityPurchased': 1}, {'brandCode': 'HY-VEE', 'description': 'HYV LF VANILLA YOGUR', 'discountedItemPrice': '0.54', 'finalPrice': '0.54', 'itemPrice': '0.54', 'originalReceiptItemText': 'HYV LF VANILLA YOGUR', 'partnerItemId': '1054', 'quantityPurchased': 1}, {'description': '1% Milk', 'discountedItemPrice': '3.99', 'finalPrice': '3.99', 'itemPrice': '3.99', 
'originalReceiptItemText': 'SASSY COW 1% MILK', 'partnerItemId': '1055', 'quantityPurchased': 1}, {'barcode': '075706191031', 'brandCode': 'CONNIE'S PIZZA', 'description': 'CNS CLSC THN SPR BLCK OLV GRN PPR ONN WHL MLK MZRL PRMS RMN 25.49 OZ', 'discountedItemPrice': '5.99', 'finalPrice': '5.99', 'itemPrice': '5.99', 'originalReceiptItemText': 'CONNIES SUPREME PIZZ', 'partnerItemId': '1058', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'brandCode': 'HY-VEE', 'description': 'HV MIKED VEGETABLES', 'discountedItemPrice': '2.29', 'finalPrice': '2.29', 'itemPrice': '2.29', 'originalReceiptItemText': 'HV MIKED VEGETABLES', 'partnerItemId': '1059', 'quantityPurchased': 1}, {'barcode': '019600920106', 'brandCode': 'VAN DE KAMP'S', 'competitorRewardsGroup': 'SMART ONES', 'description': 'Van de Kamp's - Crispy Fish Fillets 19.45-oz', 'discountedItemPrice': '6.99', 'finalPrice': '6.99', 'itemPrice': '6.99', 'originalReceiptItemText': 'VAN DE KAMP JUMBO FI', 'partnerItemId': '1060', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'description': 'GRIMES NSA RED KIDNE', 'discountedItemPrice': '1.09', 'finalPrice': '1.09', 'itemPrice': '1.09', 'originalReceiptItemText': 'GRIMES NSA RED KIDNE', 'partnerItemId': '1063', 'quantityPurchased': 1}, {'barcode': '028189000079', 'brandCode': 'HATCH FARMS', 'description': 'HTCH JLPN PPR SLCD JAR SLCT 12 OZ', 'discountedItemPrice': '1.19', 'finalPrice': '1.19', 'itemPrice': '1.19', 'originalReceiptItemText': 'HATCH NACHO SLCD JAL', 'partnerItemId': '1064', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'barcode': '754500326305', 'brandCode': 'HY-VEE', 'description': 'Hy-Vee Clover Honey Bear', 'discountedItemPrice': '5.99', 'finalPrice': '5.99', 'itemPrice': '5.99', 'originalReceiptItemText': 'HYV CLOVER HNY BEAR', 'partnerItemId': '1066', 'quantityPurchased': 1}, {'brandCode': 'HY-VEE', 'description': 'HYV KETTLE MESQ BBQ', 
'discountedItemPrice': '2.39', 'finalPrice': '2.39', 'itemPrice': '2.39', 'originalReceiptItemText': 'HYV KETTLE MESQ BBQ', 'partnerItemId': '1067', 'quantityPurchased': 1}, {'brandCode': 'KELLOGG'S', 'description': 'KELL ORIG FRSTD MINI', 'discountedItemPrice': '4.29', 'finalPrice': '4.29', 'itemPrice': '4.29', 'originalReceiptItemText': 'KELL ORIG FRSTD MINI', 'partnerItemId': '1068', 'quantityPurchased': 1}, {'description': 'SIMPLY HVY DUTY FOIL', 'discountedItemPrice': '2.99', 'finalPrice': '2.99', 'itemPrice': '2.99', 'originalReceiptItemText': 'SIMPLY HVY DUTY FOIL', 'partnerItemId': '1069', 'quantityPurchased': 1}, {'barcode': '058496723002', 'brandCode': 'TEMPTATIONS', 'description': 'WSTMP SS CTSNK RPGBG CHKN 6.3OZ', 'discountedItemPrice': '1.99', 'finalPrice': '1.99', 'itemPrice': '1.99', 'originalReceiptItemText': 'TEMPT TASTY CHICKEN', 'partnerItemId': '1070', 'quantityPurchased': 1, 'rewardsProductPartnerId': '550b2565e4b001d5e9e4146f'}, {'barcode': '058449450023', 'brandCode': 'NATURE'S PATH ORGANIC', 'description': 'NTRS PTH FLX N OATS INST OAT ORGN BOX HOT CRL 14 OZ', 'discountedItemPrice': '3.99', 'finalPrice': '3.99', 'itemPrice': '3.99', 'originalReceiptItemText': 'NATURES PATH OF OATS', 'partnerItemId': '1073', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'barcode': '075450093070', 'brandCode': 'HY-VEE', 'description': 'Hy-Vee Medium Cheddar Cheese Brick', 'discountedItemPrice': '4.99', 'finalPrice': '4.99', 'itemPrice': '4.99', 'originalReceiptItemText': 'HYV MED CHED BRICK', 'partnerItemId': '1077', 'quantityPurchased': 1}, {'barcode': '048500256763', 'brandCode': 'DOLE', 'description': 'DOLE ORANGE PEACH MANGO JUICE VITAMIN C ENRICHED FROM CONCNTRT PSTRZD CARTON 1 CT 59 OZ', 'discountedItemPrice': '1.31', 'finalPrice': '1.31', 'itemPrice': '1.31', 'metabriteCampaignId': 'DOLE BLENDS MULTI SERVE', 'originalReceiptItemText': 'DOLE BANANAS', 'partnerItemId': '1080', 'pointsNotAwardedReason': 'Action not 
allowed for user and CPG', 'pointsPayerId': '5332f5fbe4b03c9a25efd0ba', 'quantityPurchased': 1, 'rewardsGroup': 'DOLE BLENDS MULTI SERVE', 'rewardsProductPartnerId': '5332f5fbe4b03c9a25efd0ba'}, {'barcode': '4816', 'description': 'Golden Sweet Potato', 'discountedItemPrice': '2.21', 'finalPrice': '2.21', 'itemPrice': '2.21', 'originalReceiptItemText': 'GOLDEN SWT POTATO', 'partnerItemId': '1082', 'quantityPurchased': 1}, {'description': 'GREEN BELL PEPPERS', 'discountedItemPrice': '0.88', 'finalPrice': '0.88', 'itemPrice': '0.88', 'originalReceiptItemText': 'GREEN BELL PEPPERS', 'partnerItemId': '1084', 'quantityPurchased': 1}, {'barcode': '4166', 'description': 'Other Sweet Onions', 'discountedItemPrice': '0.91', 'finalPrice': '0.91', 'itemPrice': '0.91', 'originalReceiptItemText': 'SWEET ONIONS', 'partnerItemId': '1085', 'quantityPurchased': 1}, {'barcode': '075450120400', 'brandCode': 'HY-VEE', 'description': 'Hy-Vee Cage Free Omega-3 Grade A Large Brown Eggs', 'discountedItemPrice': '1.99', 'finalPrice': '1.99', 'itemPrice': '1.99', 'originalReceiptItemText': 'HV CAGE FREE LG WHT', 'partnerItemId': '1089', 'quantityPurchased': 1}, {'description': 'Fat Free Half And Half', 'discountedItemPrice': '1.84', 'finalPrice': '1.84', 'itemPrice': '1.84', 'originalReceiptItemText': 'HYY HALF & HALF', 'partnerItemId': '1090', 'quantityPurchased': 1}, {'barcode': '036632011077', 'brandCode': 'LIGHT & FIT GREEK', 'competitiveProduct': True, 'description': 'Light & Fit Greek Crunch Key Lime Pie Yogurt', 'discountedItemPrice': '0.56', 'finalPrice': '0.56', 'itemPrice': '0.56', 'originalReceiptItemText': 'HYV LF KEY LIME PIE', 'partnerItemId': '1091', 'quantityPurchased': 1, 'rewardsGroup': 'YOPLAIT GREEK YOGURT', 'rewardsProductPartnerId': '5332f5f3e4b03c9a25efd0ae'}, {'barcode': '754500797204', 'brandCode': 'HY-VEE', 'description': 'Hy-Vee Orange Cream Lowfat Yogurt', 'discountedItemPrice': '0.56', 'finalPrice': '0.56', 'itemPrice': '0.56', 'originalReceiptItemText': 'HYV LF 
ORANGE CREAM', 'partnerItemId': '1092', 'quantityPurchased': 1}, {'brandCode': 'HY-VEE', 'description': 'HYV LF VANILLA YOGUR', 'discountedItemPrice': '0.56', 'finalPrice': '0.56', 'itemPrice': '0.56', 'originalReceiptItemText': 'HYV LF VANILLA YOGUR', 'partnerItemId': '1093', 'quantityPurchased': 1}, {'barcode': '075450085150', 'brandCode': 'HY-VEE', 'description': 'Hy-Vee Unsalted Sweet Butter Quarters', 'discountedItemPrice': '3.99', 'finalPrice': '3.99', 'itemPrice': '3.99', 'originalReceiptItemText': 'HYV UNSLTD SWT BTR', 'partnerItemId': '1094', 'quantityPurchased': 1}, {'description': '2% Milk', 'discountedItemPrice': '4.09', 'finalPrice': '4.09', 'itemPrice': '4.09', 'originalReceiptItemText': 'SASSY COW 2% MILK', 'partnerItemId': '1095', 'quantityPurchased': 1}, {'barcode': '071007023095', 'brandCode': 'EL MONTEREY', 'description': 'EL MNT MLD BF BN GRN CHL BRT KP FRZN BAG 32 OZ', 'discountedItemPrice': '4.99', 'finalPrice': '4.99', 'itemPrice': '4.99', 'originalReceiptItemText': 'EL MONT GREEN CHILI', 'partnerItemId': '1098', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'barcode': '072310001459', 'brandCode': 'BIGELOW', 'competitiveProduct': True, 'competitorRewardsGroup': 'LIPTON TEA', 'description': 'Bigelow Green Tea Bags with Lemon', 'discountedItemPrice': '3.49', 'finalPrice': '3.49', 'itemPrice': '3.49', 'originalReceiptItemText': 'BIGELOW GRN TEA W/LE', 'partnerItemId': '1101', 'quantityPurchased': 1, 'rewardsGroup': 'LIPTON TEA', 'rewardsProductPartnerId': '5332f5f6e4b03c9a25efd0b4'}, {'brandCode': 'HY-VEE', 'description': 'HYV KETTLE MESQ BBQ', 'discountedItemPrice': '2.39', 'finalPrice': '2.39', 'itemPrice': '2.39', 'originalReceiptItemText': 'HYV KETTLE MESQ BBQ', 'partnerItemId': '1103', 'quantityPurchased': 1}, {'barcode': '754502285808', 'brandCode': 'HY-VEE', 'description': 'Hy-Vee Kettle Cooked Pub Mustard Flavored Potato Chips', 'discountedItemPrice': '2.39', 'finalPrice': '2.39', 'itemPrice': '2.39', 
'originalReceiptItemText': 'HYV KETTLE PUB MUSTA', 'partnerItemId': '1104', 'quantityPurchased': 1}, {'brandCode': 'HY-VEE SELECT', 'description': 'HYV SELECT WIDE NOOD', 'discountedItemPrice': '2.99', 'finalPrice': '2.99', 'itemPrice': '2.99', 'originalReceiptItemText': 'HYV SELECT WIDE NOOD', 'partnerItemId': '1105', 'quantityPurchased': 1}, {'barcode': '041390024368', 'brandCode': 'KIKKOMAN', 'competitorRewardsGroup': 'FOOD NETWORK KITCHEN INSPIRATIONS COOKING SAUCE', 'description': 'KKMN SWT CHL RFRG AFTR OPNN SC BTL 13 OZ', 'discountedItemPrice': '3.79', 'finalPrice': '3.79', 'itemPrice': '3.79', 'originalReceiptItemText': 'KIKKO THAI CHILI SCE', 'partnerItemId': '1106', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'brandCode': 'SPECIAL K', 'description': 'SPL K ORIGINAL', 'discountedItemPrice': '1.88', 'finalPrice': '1.88', 'itemPrice': '1.88', 'originalReceiptItemText': 'SPL K ORIGINAL', 'partnerItemId': '1108', 'quantityPurchased': 1}, {'barcode': '511111204206', 'brandCode': 'SWANSON', 'competitiveProduct': False, 'description': 'SWANSON', 'discountedItemPrice': '5.58', 'finalPrice': '5.58', 'itemPrice': '5.58', 'originalReceiptItemText': 'SWANSON UNSLTD CKN', 'partnerItemId': '1109', 'pointsEarned': '55.8', 'pointsPayerId': '5a734034e4b0d58f376be874', 'quantityPurchased': 2, 'rewardsProductPartnerId': '5a734034e4b0d58f376be874'}, {'barcode': '043000716021', 'brandCode': 'YUBAN', 'description': 'YUBAN Traditional Medium Roast Ground Coffee 12 oz. 
Canister', 'discountedItemPrice': '5.99', 'finalPrice': '5.99', 'itemPrice': '5.99', 'metabriteCampaignId': 'YUBAN COFFEE', 'originalReceiptItemText': 'YUBAN TRDTNL ROAST', 'partnerItemId': '1111', 'pointsEarned': '30.0', 'pointsPayerId': '559c2234e4b06aca36af13c6', 'quantityPurchased': 1, 'rewardsGroup': 'YUBAN COFFEE', 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'barcode': '044500341720', 'brandCode': 'HILLSHIRE FARM', 'competitorRewardsGroup': 'OSCAR MAYER SAUSAGE LINK', 'description': 'Hillshire Farm - Turkey Polska Kielbasa 13.00-oz', 'discountedItemPrice': '3.99', 'finalPrice': '3.99', 'itemPrice': '3.99', 'originalReceiptItemText': 'HILLS TRKY POLSKA KI', 'partnerItemId': '1114', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'barcode': '754500958803', 'brandCode': 'HY-VEE', 'description': 'Hy-Vee Diced Cooked Ham', 'discountedItemPrice': '2.99', 'finalPrice': '2.99', 'itemPrice': '2.99', 'originalReceiptItemText': 'HYV DICED COOKED HAM', 'partnerItemId': '1115', 'quantityPurchased': 1}, {'brandCode': 'HY-VEE', 'description': 'HYV EX SHARP CHEDDAR', 'discountedItemPrice': '2.79', 'finalPrice': '2.79', 'itemPrice': '2.79', 'originalReceiptItemText': 'HYV EX SHARP CHEDDAR', 'partnerItemId': '1116', 'quantityPurchased': 1}, {'barcode': '024105590051', 'brandCode': 'JUST BARE', 'description': 'JST BR BNLS HND TRMD MNML PRCS SKNL CHCK WHL BRST FLT RFRG MLDD TRY 14 OZ', 'discountedItemPrice': '6.99', 'finalPrice': '6.99', 'itemPrice': '6.99', 'metabriteCampaignId': 'JUST BARE FRESH CHICKEN BREAST FILETS', 'originalReceiptItemText': 'JUST BARE CKN BREAST', 'partnerItemId': '1117', 'quantityPurchased': 1, 'rewardsGroup': 'JUST BARE FRESH CHICKEN BREAST FILETS', 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'barcode': '21251000000', 'brandCode': 'LAURA'S LEAN BEEF', 'description': 'Laura's Lean Beef Ground Beef 93% Lean & Natural', 'discountedItemPrice': '5.99', 'finalPrice': '5.99', 'itemPrice': '5.99', 
'originalReceiptItemText': 'LEAN GRND BF 93% 7%', 'partnerItemId': '1118', 'quantityPurchased': 1}, {'description': 'Blueberries', 'discountedItemPrice': '2.99', 'finalPrice': '2.99', 'itemPrice': '2.99', 'originalReceiptItemText': 'BLUEBERRIES', 'partnerItemId': '1121', 'quantityPurchased': 1}, {'barcode': '4060', 'description': 'Broccoli', 'discountedItemPrice': '2.99', 'finalPrice': '2.99', 'itemPrice': '2.99', 'originalReceiptItemText': 'BROCCOLI', 'partnerItemId': '1122', 'quantityPurchased': 1}, {'brandCode': 'CAL-ORGANIC FARMS', 'description': 'CAL ORG WHOLE CARTS', 'discountedItemPrice': '2.49', 'finalPrice': '2.49', 'itemPrice': '2.49', 'originalReceiptItemText': 'CAL ORG WHOLE CARTS', 'partnerItemId': '1123', 'quantityPurchased': 1}, {'barcode': '048500052020', 'brandCode': 'DOLE CHILLED FRUIT JUICES', 'description': 'DL PNPL ORNG BNN FRZN CNCN CAN 48 OZ JC BLND 12 FL OZ', 'discountedItemPrice': '1.53', 'finalPrice': '1.53', 'itemPrice': '1.53', 'metabriteCampaignId': 'DOLE 100% FRUIT JUICES', 'originalReceiptItemText': 'DOLE BANANAS', 'partnerItemId': '1124', 'quantityPurchased': 1, 'rewardsGroup': 'DOLE 100% FRUIT JUICES', 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'brandCode': 'DOLE', 'description': 'DOLE ITALIAN BLEND', 'discountedItemPrice': '3.49', 'finalPrice': '3.49', 'itemPrice': '3.49', 'originalReceiptItemText': 'DOLE ITALIAN BLEND', 'partnerItemId': '1126', 'quantityPurchased': 1}, {'barcode': '4069', 'description': 'Green Cabbage', 'discountedItemPrice': '2.05', 'finalPrice': '2.05', 'itemPrice': '2.05', 'originalReceiptItemText': 'GREEN CABBAGE', 'partnerItemId': '1127', 'quantityPurchased': 1}, {'description': 'Leek', 'discountedItemPrice': '3.99', 'finalPrice': '3.99', 'itemPrice': '3.99', 'originalReceiptItemText': 'LEEK', 'partnerItemId': '1129', 'quantityPurchased': 1}, {'description': 'LE OW ONRONS', 'discountedItemPrice': '1.82', 'finalPrice': '1.82', 'itemPrice': '1.82', 'originalReceiptItemText': 'LE OW ONRONS', 
'partnerItemId': '1130', 'quantityPurchased': 1}, {'description': 'BAREFOOT PINOT NOIR', 'discountedItemPrice': '19.99', 'finalPrice': '19.99', 'itemPrice': '19.99', 'originalReceiptItemText': 'BAREFOOT PINOT NOIR', 'partnerItemId': '1134', 'quantityPurchased': 1}, {'barcode': '754500973400', 'brandCode': 'HY-VEE', 'description': 'Hy-Vee 4% Large Curd Cottage Cheese', 'discountedItemPrice': '1.47', 'finalPrice': '1.47', 'itemPrice': '1.47', 'originalReceiptItemText': 'HYV 4% LRG CURD COTI', 'partnerItemId': '1137', 'quantityPurchased': 1}, {'barcode': '075450079760', 'brandCode': 'HY-VEE', 'description': 'Hy-Vee Lowfat Blackberry Yogurt', 'discountedItemPrice': '0.56', 'finalPrice': '0.56', 'itemPrice': '0.56', 'originalReceiptItemText': 'HYV LF BLACKBERRY YO', 'partnerItemId': '1138', 'quantityPurchased': 1}, {'barcode': '754500947807', 'brandCode': 'HY-VEE', 'description': 'Hy-Vee Lemon Flavored Lowfat Yogurt', 'discountedItemPrice': '0.56', 'finalPrice': '0.56', 'itemPrice': '0.56', 'originalReceiptItemText': 'HYV LF LEMON YOGURT', 'partnerItemId': '1139', 'quantityPurchased': 1}, {'brandCode': 'HY-VEE', 'description': 'HYV LF PEACH YOGURT', 'discountedItemPrice': '0.56', 'finalPrice': '0.56', 'itemPrice': '0.56', 'originalReceiptItemText': 'HYV LF PEACH YOGURT', 'partnerItemId': '1140', 'quantityPurchased': 1}, {'brandCode': 'HY-VEE', 'description': 'HYV LF VANILLA YOGUR', 'discountedItemPrice': '0.56', 'finalPrice': '0.56', 'itemPrice': '0.56', 'originalReceiptItemText': 'HYV LF VANILLA YOGUR', 'partnerItemId': '1141', 'quantityPurchased': 1}, {'barcode': '075450085150', 'brandCode': 'HY-VEE', 'description': 'Hy-Vee Unsalted Sweet Butter Quarters', 'discountedItemPrice': '2.77', 'finalPrice': '2.77', 'itemPrice': '2.77', 'originalReceiptItemText': 'HYV UNSLTD SWT BTR', 'partnerItemId': '1142', 'quantityPurchased': 1}, {'description': 'LUIGES DELUXE PIZZA', 'discountedItemPrice': '11.98', 'finalPrice': '11.98', 'itemPrice': '11.98', 'originalReceiptItemText': 
'LUIGES DELUXE PIZZA', 'partnerItemId': '1145', 'quantityPurchased': 2}, {'brandCode': 'BUSH'S BEST', 'description': 'BUSHS HOT RED CHILI', 'discountedItemPrice': '1.59', 'finalPrice': '1.59', 'itemPrice': '1.59', 'originalReceiptItemText': 'BUSHS HOT RED CHILI', 'partnerItemId': '1149', 'quantityPurchased': 1}, {'barcode': '039400015031', 'brandCode': 'BUSH'S BEST', 'competitiveProduct': True, 'description': 'Bush's Best Chili Beans - Kidney Beans in Mild Chili Sauce', 'discountedItemPrice': '1.59', 'finalPrice': '1.59', 'itemPrice': '1.59', 'originalReceiptItemText': 'BUSHS MILD KIDNEY CH', 'partnerItemId': '1150', 'quantityPurchased': 1, 'rewardsGroup': 'BIRDS EYE STEAMFRESH PLAIN FROZEN VEGETABLES', 'rewardsProductPartnerId': '5e825d64f221c312e698a62a'}, {'barcode': '025500204215', 'brandCode': 'FOLGERS', 'competitiveProduct': True, 'competitorRewardsGroup': 'MAXWELL HOUSE GROUND COFFEE', 'description': 'Folgers Classic Roast Medium Caffeinated Coffee', 'discountedItemPrice': '14.98', 'finalPrice': '14.98', 'itemPrice': '14.98', 'originalReceiptItemText': 'FOLGERS CLASSIC ROAS', 'partnerItemId': '1158', 'quantityPurchased': 2, 'rewardsGroup': 'SWISS MISS CAFÉ', 'rewardsProductPartnerId': '5e825d64f221c312e698a62a'}, {'barcode': '754501311409', 'brandCode': 'HY-VEE', 'description': 'Hy-Vee Kettle Cooked Buffalo & Blue Cheese Potato Chips', 'discountedItemPrice': '1.67', 'finalPrice': '1.67', 'itemPrice': '1.67', 'originalReceiptItemText': 'HYV KETTLE BUFFALO&B', 'partnerItemId': '1160', 'quantityPurchased': 1}, {'brandCode': 'HY-VEE', 'description': 'HYV KETTLE CHIP ORIG', 'discountedItemPrice': '3.69', 'finalPrice': '3.69', 'itemPrice': '3.69', 'originalReceiptItemText': 'HYV KETTLE CHIP ORIG', 'partnerItemId': '1162', 'quantityPurchased': 1}, {'barcode': '754502285808', 'brandCode': 'HY-VEE', 'description': 'Hy-Vee Kettle Cooked Pub Mustard Flavored Potato Chips', 'discountedItemPrice': '1.67', 'finalPrice': '1.67', 'itemPrice': '1.67', 
'originalReceiptItemText': 'HYV KETTLE PUB MUSTA', 'partnerItemId': '1163', 'quantityPurchased': 1}, {'barcode': '754500355602', 'brandCode': 'HY-VEE', 'description': 'Hy-Vee Seasoned Croutons', 'discountedItemPrice': '1.79', 'finalPrice': '1.79', 'itemPrice': '1.79', 'originalReceiptItemText': 'HYV SEASONED CROUTON', 'partnerItemId': '1165', 'quantityPurchased': 4}, {'barcode': '041000004087', 'brandCode': 'LIPTON', 'description': 'Lipton Recipe Secrets Beefy Onion Recipe Soup & Dip Mix, 2 count, 2.2 oz', 'discountedItemPrice': '1.99', 'finalPrice': '1.99', 'itemPrice': '1.99', 'metabriteCampaignId': 'LIPTON RECIPE SECRETS', 'originalReceiptItemText': 'RED LIPTON BEEFY ONION', 'partnerItemId': '1168', 'pointsEarned': '19.9', 'pointsPayerId': '5332f5f6e4b03c9a25efd0b4', 'quantityPurchased': 1, 'rewardsGroup': 'LIPTON RECIPE SECRETS', 'rewardsProductPartnerId': '5332f5f6e4b03c9a25efd0b4'}, {'barcode': '058449450023', 'brandCode': 'NATURE'S PATH ORGANIC', 'description': 'NTRS PTH FLX N OATS INST OAT ORGN BOX HOT CRL 14 OZ', 'discountedItemPrice': '3.99', 'finalPrice': '3.99', 'itemPrice': '3.99', 'originalReceiptItemText': 'NATURES PATH OF OATS', 'partnerItemId': '1171', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'description': 'FRSH BNLS PORK LOIN', 'discountedItemPrice': '4.56', 'finalPrice': '4.56', 'itemPrice': '4.56', 'originalReceiptItemText': 'FRSH BNLS PORK LOIN', 'partnerItemId': '1175', 'quantityPurchased': 1}, {'barcode': '044500341720', 'brandCode': 'HILLSHIRE FARM', 'competitorRewardsGroup': 'OSCAR MAYER SAUSAGE LINK', 'description': 'Hillshire Farm - Turkey Polska Kielbasa 13.00-oz', 'discountedItemPrice': '3.99', 'finalPrice': '3.99', 'itemPrice': '3.99', 'originalReceiptItemText': 'HILLS TRKY POLSKA KI', 'partnerItemId': '1180', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'brandCode': 'HY-VEE', 'description': 'HYV EXTRA SHARP CHDD', 'discountedItemPrice': '4.99', 'finalPrice': 
'4.99', 'itemPrice': '4.99', 'originalReceiptItemText': 'HYV EXTRA SHARP CHDD', 'partnerItemId': '1181', 'quantityPurchased': 1}, {'barcode': '024105590051', 'brandCode': 'JUST BARE', 'description': 'JST BR BNLS HND TRMD MNML PRCS SKNL CHCK WHL BRST FLT RFRG MLDD TRY 14 OZ', 'discountedItemPrice': '6.99', 'finalPrice': '6.99', 'itemPrice': '6.99', 'metabriteCampaignId': 'JUST BARE FRESH CHICKEN BREAST FILETS', 'originalReceiptItemText': 'JUST BARE CKN BREAST', 'partnerItemId': '1182', 'quantityPurchased': 1, 'rewardsGroup': 'JUST BARE FRESH CHICKEN BREAST FILETS', 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'barcode': '021000604647', 'brandCode': 'KRAFT', 'competitiveProduct': True, 'description': 'KRAFT Cheese - Pasteurized Prepared American Singles', 'discountedItemPrice': '3.29', 'finalPrice': '3.29', 'itemPrice': '3.29', 'metabriteCampaignId': 'KRAFT SINGLES', 'originalReceiptItemText': 'STO KRAFT AMER SINGLES', 'partnerItemId': '1183', 'quantityPurchased': 1, 'rewardsGroup': 'SARGENTO SLICED NATURAL CHEESE 7OZ OR SMALLER', 'rewardsProductPartnerId': '5e7cf838f221c312e698a628'}, {'barcode': '21251000000', 'brandCode': 'LAURA'S LEAN BEEF', 'description': 'Laura's Lean Beef Ground Beef 93% Lean & Natural', 'discountedItemPrice': '5.99', 'finalPrice': '5.99', 'itemPrice': '5.99', 'originalReceiptItemText': 'LEAN GRND BF 93% 7%', 'partnerItemId': '1184', 'quantityPurchased': 1}, {'barcode': '071430010051', 'brandCode': 'DOLE', 'description': 'DL CRT ICBR PEA PD RDSH RD CBG RMN LTC VRY VG THRG WSHD LTC SLD BAG 11.9 OZ', 'discountedItemPrice': '3.49', 'finalPrice': '3.49', 'itemPrice': '3.49', 'originalReceiptItemText': 'DOLE VERY VEGGIE SLD', 'partnerItemId': '1187', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'brandCode': 'GREEN GIANT', 'description': 'GG KLONDIKE GOLDUST', 'discountedItemPrice': '3.99', 'finalPrice': '3.99', 'itemPrice': '3.99', 'originalReceiptItemText': 'GG KLONDIKE GOLDUST', 'partnerItemId': 
'1188', 'quantityPurchased': 1}, {'description': 'Green Bell Peppers', 'discountedItemPrice': '0.77', 'finalPrice': '0.77', 'itemPrice': '0.77', 'originalReceiptItemText': 'LIKE GREEN BELL PEPPERS', 'partnerItemId': '1189', 'quantityPurchased': 1}, {'barcode': '071146002487', 'brandCode': 'HARVEST SNAPS', 'description': 'CLB SNP CRSP BLCK PPR BKD 40 PCT LS FAT HRVS SNPS BAG 3.3 OZ', 'discountedItemPrice': '1.25', 'finalPrice': '1.25', 'itemPrice': '1.25', 'originalReceiptItemText': 'HRVST SNAPS BLACK PE', 'partnerItemId': '1190', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'barcode': '26623200000', 'brandCode': 'CHEESE', 'description': 'Cheese Cheddar & Colby - Signature Mild Cheese', 'discountedItemPrice': '1.59', 'finalPrice': '1.59', 'itemPrice': '1.59', 'originalReceiptItemText': 'OU5HS MILD KIDNEY CH', 'partnerItemId': '1196', 'quantityPurchased': 1}, {'description': 'Cookies', 'discountedItemPrice': '0.71', 'finalPrice': '0.71', 'itemPrice': '0.71', 'originalReceiptItemText': 'COMPETITOR COO 6', 'partnerItemId': '1201', 'quantityPurchased': 1}, {'description': ' PE 9G C6UP9N', 'discountedItemPrice': '1.01', 'finalPrice': '1.01', 'itemPrice': '1.01', 'originalReceiptItemText': ' PE 9G C6UP9N', 'partnerItemId': '1203', 'quantityPurchased': 2}, {'brandCode': 'HY-VEE', 'description': 'HYV KETTLE CHIP ORIG', 'discountedItemPrice': '3.69', 'finalPrice': '3.69', 'itemPrice': '3.69', 'originalReceiptItemText': 'HYV KETTLE CHIP ORIG', 'partnerItemId': '1209', 'quantityPurchased': 1}, {'barcode': '754502285808', 'brandCode': 'HY-VEE', 'description': 'Hy-Vee Kettle Cooked Pub Mustard Flavored Potato Chips', 'discountedItemPrice': '1.67', 'finalPrice': '1.67', 'itemPrice': '1.67', 'originalReceiptItemText': 'HYV KETTLE PUB MUSTA', 'partnerItemId': '1210', 'quantityPurchased': 1}, {'barcode': '754500355602', 'brandCode': 'HY-VEE', 'description': 'Hy-Vee Seasoned Croutons', 'discountedItemPrice': '1.79', 'finalPrice': '1.79', 
'itemPrice': '1.79', 'originalReceiptItemText': 'HYV SEASONED CROUTON', 'partnerItemId': '1212', 'quantityPurchased': 1}, {'brandCode': 'KASHI', 'description': 'KASHI GOFLOW CINN CR', 'discountedItemPrice': '3.77', 'finalPrice': '3.77', 'itemPrice': '3.77', 'originalReceiptItemText': 'KASHI GOFLOW CINN CR', 'partnerItemId': '1213', 'quantityPurchased': 1}, {'barcode': '018627030010', 'brandCode': 'KASHI', 'competitiveProduct': True, 'description': 'Kashi Granola Bars - Chewy Honey Almond Flax', 'discountedItemPrice': '3.77', 'finalPrice': '3.77', 'itemPrice': '3.77', 'originalReceiptItemText': 'KASHI GOPLAY HNY ALM', 'partnerItemId': '1214', 'quantityPurchased': 1, 'rewardsGroup': 'NATURE VALLEY CHEWY BARS', 'rewardsProductPartnerId': '5332f5f3e4b03c9a25efd0ae'}, {'barcode': '041000004087', 'brandCode': 'LIPTON', 'description': 'Lipton Recipe Secrets Beefy Onion Recipe Soup & Dip Mix, 2 count, 2.2 oz', 'discountedItemPrice': '1.99', 'finalPrice': '1.99', 'itemPrice': '1.99', 'metabriteCampaignId': 'LIPTON RECIPE SECRETS', 'originalReceiptItemText': 'LIPTON BEEFY ONION', 'partnerItemId': '1215', 'pointsEarned': '19.9', 'pointsPayerId': '5332f5f6e4b03c9a25efd0b4', 'quantityPurchased': 1, 'rewardsGroup': 'LIPTON RECIPE SECRETS', 'rewardsProductPartnerId': '5332f5f6e4b03c9a25efd0b4'}, {'barcode': '058449450023', 'brandCode': 'NATURE'S PATH ORGANIC', 'description': 'NTRS PTH FLX N OATS INST OAT ORGN BOX HOT CRL 14 OZ', 'discountedItemPrice': '3.99', 'finalPrice': '3.99', 'itemPrice': '3.99', 'originalReceiptItemText': 'NATURES PATH OF OATS', 'partnerItemId': '1218', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'description': 'FRSH BNLS PORK LOIN', 'discountedItemPrice': '4.56', 'finalPrice': '4.56', 'itemPrice': '4.56', 'originalReceiptItemText': 'FRSH BNLS PORK LOIN', 'partnerItemId': '1222', 'quantityPurchased': 1}, {'barcode': '044500341720', 'brandCode': 'HILLSHIRE FARM', 'competitorRewardsGroup': 'OSCAR MAYER SAUSAGE LINK', 
'description': 'Hillshire Farm - Turkey Polska Kielbasa 13.00-oz', 'discountedItemPrice': '3.99', 'finalPrice': '3.99', 'itemPrice': '3.99', 'originalReceiptItemText': 'HILLS TRKY POLSKA KI', 'partnerItemId': '1223', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'brandCode': 'HY-VEE', 'description': 'HYV EXTRA SHARP CHDD', 'discountedItemPrice': '4.99', 'finalPrice': '4.99', 'itemPrice': '4.99', 'originalReceiptItemText': 'HYV EXTRA SHARP CHDD', 'partnerItemId': '1224', 'quantityPurchased': 1}, {'barcode': '024105590051', 'brandCode': 'JUST BARE', 'description': 'JST BR BNLS HND TRMD MNML PRCS SKNL CHCK WHL BRST FLT RFRG MLDD TRY 14 OZ', 'discountedItemPrice': '6.99', 'finalPrice': '6.99', 'itemPrice': '6.99', 'metabriteCampaignId': 'JUST BARE FRESH CHICKEN BREAST FILETS', 'originalReceiptItemText': 'JUST BARE CKN BREAST', 'partnerItemId': '1225', 'quantityPurchased': 1, 'rewardsGroup': 'JUST BARE FRESH CHICKEN BREAST FILETS', 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'barcode': '021000604647', 'brandCode': 'KRAFT', 'competitiveProduct': True, 'description': 'KRAFT Cheese - Pasteurized Prepared American Singles', 'discountedItemPrice': '3.29', 'finalPrice': '3.29', 'itemPrice': '3.29', 'metabriteCampaignId': 'KRAFT SINGLES', 'originalReceiptItemText': 'KRAFT AMER SINGLES', 'partnerItemId': '1226', 'quantityPurchased': 1, 'rewardsGroup': 'SARGENTO SLICED NATURAL CHEESE 7OZ OR SMALLER', 'rewardsProductPartnerId': '5e7cf838f221c312e698a628'}, {'barcode': '21251000000', 'brandCode': 'LAURA'S LEAN BEEF', 'description': 'Laura's Lean Beef Ground Beef 93% Lean & Natural', 'discountedItemPrice': '5.99', 'finalPrice': '5.99', 'itemPrice': '5.99', 'originalReceiptItemText': 'LEAN GRND BF 93% 7%', 'partnerItemId': '1227', 'quantityPurchased': 1}, {'barcode': '071430010051', 'brandCode': 'DOLE', 'description': 'DL CRT ICBR PEA PD RDSH RD CBG RMN LTC VRY VG THRG WSHD LTC SLD BAG 11.9 OZ', 'discountedItemPrice': '3.49', 
'finalPrice': '3.49', 'itemPrice': '3.49', 'originalReceiptItemText': 'DOLE VERY VEGGIE SLD', 'partnerItemId': '1230', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'brandCode': 'GREEN GIANT', 'description': 'GG KLONDIKE GOLDUST', 'discountedItemPrice': '3.99', 'finalPrice': '3.99', 'itemPrice': '3.99', 'originalReceiptItemText': 'GG KLONDIKE GOLDUST', 'partnerItemId': '1231', 'quantityPurchased': 1}, {'description': 'Green Bell Peppers', 'discountedItemPrice': '0.77', 'finalPrice': '0.77', 'itemPrice': '0.77', 'originalReceiptItemText': 'GREEN BELL PEPPERS', 'partnerItemId': '1232', 'quantityPurchased': 1}, {'barcode': '071146002487', 'brandCode': 'HARVEST SNAPS', 'description': 'CLB SNP CRSP BLCK PPR BKD 40 PCT LS FAT HRVS SNPS BAG 3.3 OZ', 'discountedItemPrice': '1.25', 'finalPrice': '1.25', 'itemPrice': '1.25', 'originalReceiptItemText': 'HRVST SNAPS BLACK PE', 'partnerItemId': '1233', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'description': 'Organic Baby Spinach', 'discountedItemPrice': '1.88', 'finalPrice': '1.88', 'itemPrice': '1.88', 'originalReceiptItemText': 'JOSIES ORG BABY SPIN', 'partnerItemId': '1235', 'quantityPurchased': 1}, {'description': 'MONTEREY SLCD BABY B', 'discountedItemPrice': '2.50', 'finalPrice': '2.50', 'itemPrice': '2.50', 'originalReceiptItemText': 'MONTEREY SLCD BABY B', 'partnerItemId': '1236', 'quantityPurchased': 1}, ...]",FINISHED,574.65,6004a965e257124ec6b9a39f
756,{'$oid': '601c80940a720f05f40002d9'},601c80940a720f05f40002d9,,,{'$date': 1612480660291},1612480660291,2021-02-04 23:17:40.291,{'$date': 1612480660291},1612480660291,2021-02-04 23:17:40.291,,,NaT,{'$date': 1612480660291},1612480660291,2021-02-04 23:17:40.291,,,,,,,,SUBMITTED,,5fc961c3b8cfca11a077dd33


In [61]:
# convert df_receipts.pointsAwardedDate_cleaned to timestamps and confirm by viewing a sample of the dataframe
epoch_to_timestamp(df_receipts, 'pointsAwardedDate_cleaned')
df_receipts.sample(2)

AssertionError: there is at least one None/Null/NaN/NaT value in the converted timestamp data

In [62]:
# convert df_receipts.pointsAwardedDate_cleaned to timestamps and confirm by viewing a sample of the dataframe allowing nulls
epoch_to_timestamp(df_receipts, 'pointsAwardedDate_cleaned', allow_nulls=True)
df_receipts.sample(2)

Unnamed: 0,_id,_id_cleaned,bonusPointsEarned,bonusPointsEarnedReason,createDate,createDate_cleaned,createDate_cleaned_ts,dateScanned,dateScanned_cleaned,dateScanned_cleaned_ts,finishedDate,finishedDate_cleaned,finishedDate_cleaned_ts,modifyDate,modifyDate_cleaned,modifyDate_cleaned_ts,pointsAwardedDate,pointsAwardedDate_cleaned,pointsAwardedDate_cleaned_ts,pointsEarned,purchaseDate,purchaseDate_cleaned,purchasedItemCount,rewardsReceiptItemList,rewardsReceiptStatus,totalSpent,userId
582,{'$oid': '60155a1b0a7214ad50000170'},60155a1b0a7214ad50000170,,,{'$date': 1612012059113},1612012059113,2021-01-30 13:07:39.113,{'$date': 1612012059113},1612012059113,2021-01-30 13:07:39.113,,,NaT,{'$date': 1612012059113},1612012059113,2021-01-30 13:07:39.113,,,NaT,,,,,,SUBMITTED,,5fc961c3b8cfca11a077dd33
489,{'$oid': '6011f39c0a720f05350000b4'},6011f39c0a720f05350000b4,5.0,All-receipts receipt bonus,{'$date': 1611789212000},1611789212000,2021-01-27 23:13:32.000,{'$date': 1611789212000},1611789212000,2021-01-27 23:13:32.000,{'$date': 1611789218000},1611789000000.0,2021-01-27 23:13:38,{'$date': 1611789218000},1611789218000,2021-01-27 23:13:38.000,{'$date': 1611789212000},1611789000000.0,2021-01-27 23:13:32,5.0,{'$date': 1611702811000},1611703000000.0,2.0,"[{'barcode': '4011', 'description': 'ITEM NOT FOUND', 'finalPrice': '1', 'itemPrice': '1', 'partnerItemId': '1', 'quantityPurchased': 1}, {'barcode': '1234', 'finalPrice': '2.56', 'itemPrice': '2.56', 'needsFetchReview': True, 'needsFetchReviewReason': 'USER_FLAGGED', 'partnerItemId': '2', 'preventTargetGapPoints': True, 'quantityPurchased': 3, 'userFlaggedBarcode': '1234', 'userFlaggedDescription': '', 'userFlaggedNewItem': True, 'userFlaggedPrice': '2.56', 'userFlaggedQuantity': 3}]",FINISHED,1.0,6011f31ea4b74c18d3a8c476
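
The `epoch_to_timestamp` helper is defined earlier in this notebook and is not repeated here. As a rough illustration of the behaviour seen in the cells above (an assertion on nulls unless `allow_nulls=True`, a new `_ts` column placed next to its source, and a message instead of a failure on re-runs), here is a minimal sketch. The name `epoch_ms_to_ts_sketch` is hypothetical, and the sketch assumes the epoch values are in milliseconds; it is not the notebook's actual implementation.

```python
import pandas as pd


def epoch_ms_to_ts_sketch(df, column, allow_nulls=False):
    """Hypothetical stand-in for epoch_to_timestamp; not the notebook's actual code."""
    new_column = f"{column}_ts"
    if new_column in df.columns:
        # Re-running a cell should not fail or duplicate the column.
        print(f"{new_column} already exists, nothing added")
        return df
    # Missing epochs (None/NaN) become NaT.
    converted = pd.to_datetime(df[column], unit="ms")
    if not allow_nulls:
        assert converted.notna().all(), "null values in the converted timestamp data"
    # Place the new column directly after its source column.
    df.insert(df.columns.get_loc(column) + 1, new_column, converted)
    return df


sample = pd.DataFrame({"purchaseDate_cleaned": [1613088000000, None]})
epoch_ms_to_ts_sketch(sample, "purchaseDate_cleaned", allow_nulls=True)
print(sample)
```

Calling the sketch a second time on `sample` leaves the dataframe unchanged, which mirrors the "already exists" message shown for cell 65.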


In [63]:
# convert df_receipts.purchaseDate_cleaned to timestamps and confirm by viewing a sample of the dataframe 
epoch_to_timestamp(df_receipts, 'purchaseDate_cleaned')
df_receipts.sample(2)

AssertionError: there is at least one None/Null/NaN/NaT value in the converted timestamp data

In [64]:
# convert df_receipts.purchaseDate_cleaned to timestamps and confirm by viewing a sample of the dataframe, allowing nulls
epoch_to_timestamp(df_receipts, 'purchaseDate_cleaned', allow_nulls = True)
df_receipts.sample(2)

Unnamed: 0,_id,_id_cleaned,bonusPointsEarned,bonusPointsEarnedReason,createDate,createDate_cleaned,createDate_cleaned_ts,dateScanned,dateScanned_cleaned,dateScanned_cleaned_ts,finishedDate,finishedDate_cleaned,finishedDate_cleaned_ts,modifyDate,modifyDate_cleaned,modifyDate_cleaned_ts,pointsAwardedDate,pointsAwardedDate_cleaned,pointsAwardedDate_cleaned_ts,pointsEarned,purchaseDate,purchaseDate_cleaned,purchaseDate_cleaned_ts,purchasedItemCount,rewardsReceiptItemList,rewardsReceiptStatus,totalSpent,userId
610,{'$oid': '60171af00a7214ad50000229'},60171af00a7214ad50000229,,,{'$date': 1612126960453},1612126960453,2021-01-31 21:02:40.453,{'$date': 1612126960453},1612126960453,2021-01-31 21:02:40.453,,,NaT,{'$date': 1612126960453},1612126960453,2021-01-31 21:02:40.453,,,NaT,,,,NaT,,,SUBMITTED,,5fc961c3b8cfca11a077dd33
990,{'$oid': '60268c790a7214d8e9000306'},60268c790a7214d8e9000306,750.0,"Receipt number 1 completed, bonus point schedule DEFAULT (5cefdcacf3693e0b50e83a36)",{'$date': 1613139065000},1613139065000,2021-02-12 14:11:05.000,{'$date': 1613139065000},1613139065000,2021-02-12 14:11:05.000,{'$date': 1613139065000},1613139000000.0,2021-02-12 14:11:05,{'$date': 1613139071000},1613139071000,2021-02-12 14:11:11.000,{'$date': 1613139065000},1613139000000.0,2021-02-12 14:11:05,850.0,{'$date': 1613088000000},1613088000000.0,2021-02-12,1.0,"[{'barcode': '4011', 'description': 'ITEM NOT FOUND', 'finalPrice': '25.00', 'itemPrice': '25.00', 'needsFetchReview': False, 'partnerItemId': '1', 'preventTargetGapPoints': True, 'quantityPurchased': 1, 'userFlaggedBarcode': '4011', 'userFlaggedNewItem': True, 'userFlaggedPrice': '25.00', 'userFlaggedQuantity': 1}]",FINISHED,25.0,60268c78efa6011bb151077d


In [65]:
# convert df_receipts.purchaseDate_cleaned to timestamps and confirm by viewing a sample of the dataframe, allowing nulls
epoch_to_timestamp(df_receipts, 'purchaseDate_cleaned', allow_nulls = True)
df_receipts.sample(2)

cannot insert purchaseDate_cleaned_ts, already exists, purchaseDate_cleaned_ts was not added to the dataframe


Unnamed: 0,_id,_id_cleaned,bonusPointsEarned,bonusPointsEarnedReason,createDate,createDate_cleaned,createDate_cleaned_ts,dateScanned,dateScanned_cleaned,dateScanned_cleaned_ts,finishedDate,finishedDate_cleaned,finishedDate_cleaned_ts,modifyDate,modifyDate_cleaned,modifyDate_cleaned_ts,pointsAwardedDate,pointsAwardedDate_cleaned,pointsAwardedDate_cleaned_ts,pointsEarned,purchaseDate,purchaseDate_cleaned,purchaseDate_cleaned_ts,purchasedItemCount,rewardsReceiptItemList,rewardsReceiptStatus,totalSpent,userId
357,{'$oid': '60074b960a7214ad8900002e'},60074b960a7214ad8900002e,5.0,All-receipts receipt bonus,{'$date': 1611090838000},1611090838000,2021-01-19 21:13:58.000,{'$date': 1611090838000},1611090838000,2021-01-19 21:13:58.000,{'$date': 1611090839000},1611091000000.0,2021-01-19 21:13:59,{'$date': 1611090839000},1611090839000,2021-01-19 21:13:59.000,{'$date': 1611090839000},1611091000000.0,2021-01-19 21:13:59,5.0,{'$date': 1611004438000},1611004000000.0,2021-01-18 21:13:58,1.0,"[{'barcode': '021000604906', 'competitiveProduct': True, 'finalPrice': '1', 'itemPrice': '1', 'partnerItemId': '1', 'quantityPurchased': 1, 'rewardsGroup': 'SARGENTO SLICED NATURAL CHEESE 8OZ OR LARGER', 'rewardsProductPartnerId': '5e7cf838f221c312e698a628'}]",FINISHED,1.0,54943462e4b07e684157a532
1037,{'$oid': '603a097e0a720fde10000180'},603a097e0a720fde10000180,,,{'$date': 1614416254020},1614416254020,2021-02-27 08:57:34.020,{'$date': 1614416254020},1614416254020,2021-02-27 08:57:34.020,,,NaT,{'$date': 1614416254020},1614416254020,2021-02-27 08:57:34.020,,,NaT,,,,NaT,,,SUBMITTED,,5fc961c3b8cfca11a077dd33


### Visually check that all the cleaned and converted columns I expect are present

In [66]:
df_brands.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1167 entries, 0 to 1166
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   _id           1167 non-null   object 
 1   _id_cleaned   1167 non-null   object 
 2   barcode       1167 non-null   int64  
 3   category      1012 non-null   object 
 4   categoryCode  517 non-null    object 
 5   cpg           1167 non-null   object 
 6   cpg_cleaned   1167 non-null   object 
 7   name          1167 non-null   object 
 8   topBrand      555 non-null    float64
 9   brandCode     933 non-null    object 
dtypes: float64(1), int64(1), object(8)
memory usage: 91.3+ KB


In [67]:
df_receipts.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1119 entries, 0 to 1118
Data columns (total 28 columns):
 #   Column                        Non-Null Count  Dtype         
---  ------                        --------------  -----         
 0   _id                           1119 non-null   object        
 1   _id_cleaned                   1119 non-null   object        
 2   bonusPointsEarned             544 non-null    float64       
 3   bonusPointsEarnedReason       544 non-null    object        
 4   createDate                    1119 non-null   object        
 5   createDate_cleaned            1119 non-null   int64         
 6   createDate_cleaned_ts         1119 non-null   datetime64[ns]
 7   dateScanned                   1119 non-null   object        
 8   dateScanned_cleaned           1119 non-null   int64         
 9   dateScanned_cleaned_ts        1119 non-null   datetime64[ns]
 10  finishedDate                  568 non-null    object        
 11  finishedDate_cleaned          

In [68]:
df_users.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 495 entries, 0 to 494
Data columns (total 12 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   _id                     495 non-null    object        
 1   _id_cleaned             495 non-null    object        
 2   active                  495 non-null    bool          
 3   createdDate             495 non-null    object        
 4   createdDate_cleaned     495 non-null    int64         
 5   createdDate_cleaned_ts  495 non-null    datetime64[ns]
 6   lastLogin               433 non-null    object        
 7   lastLogin_cleaned       433 non-null    float64       
 8   lastLogin_cleaned_ts    433 non-null    datetime64[ns]
 9   role                    495 non-null    object        
 10  signUpSource            447 non-null    object        
 11  state                   439 non-null    object        
dtypes: bool(1), datetime64[ns](2), float64(1), int64(1

### What might I need to answer the stakeholder questions?  
This collection of cells is representative of some of my brainstorming/planning process. I've attempted to 'think out loud' a bit here, but will more fully document what code is doing in the following section.

- What are the top 5 brands by receipts scanned for most recent month?
  - need to join to brands from receipts, only way there is via barcode: in rewardsReceiptItemList  
  
  
- How does the ranking of the top 5 brands by receipts scanned for the recent month compare to the ranking for the previous month?  
  - same as above, barcode 


- When considering average spend from receipts with 'rewardsReceiptStatus' of 'Accepted' or 'Rejected', which is greater?  
  - this can be answered with df_receipts.totalSpent

- When considering total number of items purchased from receipts with 'rewardsReceiptStatus' of 'Accepted' or 'Rejected', which is greater?  
  - df_receipts.purchasedItemCount


- Which brand has the most spend among users who were created within the past 6 months?  
  - barcode
  - df_users.createdDate_cleaned_ts

- Which brand has the most transactions among users who were created within the past 6 months?
  - barcode
  
  
Questions:
  - 1 receipt = 1 transaction?
  - There is no 'Accepted' value for rewardsReceiptStatus. Should I assume 'FINISHED' means 'Accepted', that anything other than 'REJECTED' counts as accepted, or something else?
  - Re: receipts data - is this data a snapshot in time? If it were taken again, might some statuses change, along with the contents of rewardsReceiptItemList? If so, what are the final statuses - FINISHED and REJECTED?
    - looking at status by date range might give some indication; there are a number of date fields, and modifyDate could serve as a sort of 'updated at' reference
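
To make the two status questions concrete, here is a small pandas sketch of the comparison. It assumes 'FINISHED' stands in for 'Accepted', which is still an open question above, and it uses made-up toy rows rather than the real receipts data.

```python
import pandas as pd

# Toy rows for illustration only; these are not values from receipts.json.gz.
toy_receipts = pd.DataFrame(
    {
        "rewardsReceiptStatus": ["FINISHED", "FINISHED", "REJECTED", "SUBMITTED"],
        "totalSpent": [10.0, 30.0, 5.0, None],
        "purchasedItemCount": [2, 4, 1, None],
    }
)

# Assumption: 'FINISHED' is treated as the 'Accepted' status.
statuses = ["FINISHED", "REJECTED"]
summary = (
    toy_receipts[toy_receipts["rewardsReceiptStatus"].isin(statuses)]
    .groupby("rewardsReceiptStatus")
    .agg(avg_spend=("totalSpent", "mean"), total_items=("purchasedItemCount", "sum"))
)
print(summary)
```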



**to-do:**
- explore what keys are included in a dictionary that includes barcode:, is it a consistent set?
  - it is not a consistent set, it looks like most 'FINISHED' receipts have the best quality of data. I'm curious what status implies 
- decide what else I should include in addition to barcode from rewardsReceiptItemList?
    - With the following I can get to brands via barcode / userFlaggedBarcode. I can sum quantity and prices and provide descriptions - potentially useful for the next level of drill down and easy to grab now.
  - 'barcode':
  - 'userFlaggedBarcode':
  - 'description':
  - 'userFlaggedDescription':
  - 'finalPrice':
  - 'userFlaggedPrice':
  - 'quantityPurchased':
  - 'userFlaggedQuantity':
- create a new data source that will act as a lookup table, receipt_items. Rows to include the original receipt id and the above fields from rewardsReceiptItemList where available. If neither barcode nor userFlaggedBarcode is available, don't include those receipt items


#### Update a few days later (9/23) after initially loading all the data to SQL
- when attempting to answer 'What are the top 5 brands by receipts scanned for most recent month?' I realized that my assumption that brands.barcode would join on coalesce(receipt_items.barcode, receipt_items.userFlaggedBarcode) was a bad one. A quick visual check would have shown next to no matches: almost all brands.barcode values start with 511111 and practically none of the barcode values in receipt_items do.
- Will add the brandcode value from receipts.rewardsReceiptItemList to my derived receipt_items table. 
- For records that don't contain a brandcode in receipts.rewardsReceiptItemList I may be able to extract one from the description, so I will also add description and explore opportunities there. This could lead to an additional extracted brand code.
- I'm going to leave the code I used to troubleshoot this in notebook 2.1-EDA_first_pass.ipynb and will keep the original barcode values in the final tables so that the SQL continues to work.
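
In hindsight, a quick overlap check before committing to the barcode join might have looked something like the sketch below. The helper name `barcode_match_rate` and the toy values are illustrative assumptions; only the 511111 prefix pattern comes from the notes above.

```python
import pandas as pd


def barcode_match_rate(brand_barcodes, item_barcodes):
    """Share of non-null item barcodes that also appear in the brands data."""
    # Compare as strings so int64 brand barcodes can match string item barcodes.
    # Note: integer barcodes have already lost any leading zeros.
    brands = set(brand_barcodes.dropna().astype(str))
    items = item_barcodes.dropna().astype(str)
    if items.empty:
        return 0.0
    return float(items.isin(brands).mean())


# Toy values for illustration only.
toy_brands = pd.Series([511111101451, 511111602118])
toy_items = pd.Series(["012000809965", "511111101451", "4011", None])
print(barcode_match_rate(toy_brands, toy_items))
```

A rate near zero on the real data would have flagged the join problem before the SQL was written.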



In [70]:
# from df_receipts extract _id_cleaned and rewardsReceiptItemList into a new dataframe and look at a few samples
df_receipt_items = df_receipts[['_id_cleaned','rewardsReceiptItemList']]
# df_receipt_items
df_receipt_items.sample(3)

Unnamed: 0,_id_cleaned,rewardsReceiptItemList
1028,6038cabd0a720fde1000009c,
282,6000d4aa0a720f05f3000072,"[{'needsFetchReview': True, 'needsFetchReviewReason': 'USER_FLAGGED', 'partnerItemId': '1', 'preventTargetGapPoints': True, 'userFlaggedBarcode': '4011', 'userFlaggedDescription': '', 'userFlaggedNewItem': True, 'userFlaggedPrice': '22.00', 'userFlaggedQuantity': 4}, {'needsFetchReview': True, 'needsFetchReviewReason': 'USER_FLAGGED', 'partnerItemId': '2', 'preventTargetGapPoints': True, 'userFlaggedBarcode': '034100573065', 'userFlaggedDescription': 'MILLER LITE 24 PACK 12OZ CAN', 'userFlaggedNewItem': True, 'userFlaggedPrice': '28.00', 'userFlaggedQuantity': 3}]"
824,601f81560a720f053c0000f1,


In [71]:
df_receipts.groupby('rewardsReceiptStatus').count()

Unnamed: 0_level_0,_id,_id_cleaned,bonusPointsEarned,bonusPointsEarnedReason,createDate,createDate_cleaned,createDate_cleaned_ts,dateScanned,dateScanned_cleaned,dateScanned_cleaned_ts,finishedDate,finishedDate_cleaned,finishedDate_cleaned_ts,modifyDate,modifyDate_cleaned,modifyDate_cleaned_ts,pointsAwardedDate,pointsAwardedDate_cleaned,pointsAwardedDate_cleaned_ts,pointsEarned,purchaseDate,purchaseDate_cleaned,purchaseDate_cleaned_ts,purchasedItemCount,rewardsReceiptItemList,totalSpent,userId
rewardsReceiptStatus,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1
FINISHED,518,518,456,456,518,518,518,518,518,518,518,518,518,518,518,518,514,514,514,518,518,518,518,518,516,518,518
FLAGGED,46,46,30,30,46,46,46,46,46,46,0,0,0,46,46,46,19,19,19,33,35,35,35,46,46,46,46
PENDING,50,50,0,0,50,50,50,50,50,50,50,50,50,50,50,50,0,0,0,0,49,49,49,0,49,49,50
REJECTED,71,71,58,58,71,71,71,71,71,71,0,0,0,71,71,71,4,4,4,58,69,69,69,71,68,71,71
SUBMITTED,434,434,0,0,434,434,434,434,434,434,0,0,0,434,434,434,0,0,0,0,0,0,0,0,0,0,434


In [72]:
df_receipts[['_id_cleaned','rewardsReceiptStatus','rewardsReceiptItemList']].sample(40)

Unnamed: 0,_id_cleaned,rewardsReceiptStatus,rewardsReceiptItemList
1027,6039575d0a7217c72c0000f7,SUBMITTED,
857,6020c84a0a720f053c000187,SUBMITTED,
11,5ff1e1a10a720f0523000568,FINISHED,"[{'barcode': '013562300631', 'description': 'Annie's Homegrown Organic White Cheddar Macaroni & Cheese Shells, 6 Oz', 'discountedItemPrice': '50.00', 'finalPrice': '50.00', 'itemNumber': '013562300631', 'itemPrice': '50.00', 'needsFetchReview': True, 'needsFetchReviewReason': 'POINTS_GREATER_THAN_THRESHOLD', 'originalMetaBriteQuantityPurchased': 1, 'partnerItemId': '1', 'pointsNotAwardedReason': 'Action not allowed for user and CPG', 'pointsPayerId': '5332f5f3e4b03c9a25efd0ae', 'quantityPurchased': 5, 'rewardsGroup': 'ANNIE'S HOMEGROWN MULTI-SERVING MAC & CHEESE', 'rewardsProductPartnerId': '5332f5f3e4b03c9a25efd0ae'}]"
1024,603946f60a720fde100000ce,SUBMITTED,
191,5ffc8fa40a720f05c5000028,FINISHED,"[{'barcode': '4011', 'description': 'ITEM NOT FOUND', 'finalPrice': '1', 'itemPrice': '1', 'partnerItemId': '1', 'quantityPurchased': 1}, {'barcode': '1234', 'needsFetchReview': True, 'needsFetchReviewReason': 'USER_FLAGGED', 'partnerItemId': '2', 'preventTargetGapPoints': True, 'userFlaggedBarcode': '1234', 'userFlaggedDescription': '', 'userFlaggedNewItem': True}]"
562,601443310a720f05f80000c6,FINISHED,"[{'barcode': '016000898264', 'brandCode': 'BRAND', 'description': 'Honey Nut Cheerios Rte Cereal - Family Size, 2 Pack', 'finalPrice': '10.00', 'itemPrice': '10.00', 'partnerItemId': '0', 'pointsNotAwardedReason': 'Action not allowed for user and CPG', 'pointsPayerId': '5332f5f3e4b03c9a25efd0ae', 'quantityPurchased': 1, 'rewardsGroup': 'HONEY NUT CHEERIOS CEREAL FAMILY SIZE', 'rewardsProductPartnerId': '5332f5f3e4b03c9a25efd0ae'}]"
1,5ff1e1bb0a720f052300056b,FINISHED,"[{'barcode': '4011', 'description': 'ITEM NOT FOUND', 'finalPrice': '1', 'itemPrice': '1', 'partnerItemId': '1', 'quantityPurchased': 1}, {'barcode': '028400642255', 'description': 'DORITOS TORTILLA CHIP SPICY SWEET CHILI REDUCED FAT BAG 1 OZ', 'finalPrice': '10.00', 'itemPrice': '10.00', 'needsFetchReview': True, 'needsFetchReviewReason': 'USER_FLAGGED', 'partnerItemId': '2', 'pointsNotAwardedReason': 'Action not allowed for user and CPG', 'pointsPayerId': '5332f5fbe4b03c9a25efd0ba', 'preventTargetGapPoints': True, 'quantityPurchased': 1, 'rewardsGroup': 'DORITOS SPICY SWEET CHILI SINGLE SERVE', 'rewardsProductPartnerId': '5332f5fbe4b03c9a25efd0ba', 'userFlaggedBarcode': '028400642255', 'userFlaggedDescription': 'DORITOS TORTILLA CHIP SPICY SWEET CHILI REDUCED FAT BAG 1 OZ', 'userFlaggedNewItem': True, 'userFlaggedPrice': '10.00', 'userFlaggedQuantity': 1}]"
524,6012ea2d0a720f05f8000068,FINISHED,"[{'barcode': '036000495737', 'brandCode': 'HUGGIES', 'description': 'HUGGIES SPECIAL DELIVERY NEWBORN DISPOSABLE DIAPER BOX 66 CT', 'discountedItemPrice': '19.99', 'finalPrice': '19.99', 'itemNumber': '036000495737', 'itemPrice': '19.99', 'metabriteCampaignId': 'HUGGIES SPECIAL DELIVERY DIAPERS 60 - 69 COUNT', 'partnerItemId': '1014', 'pointsEarned': '199.9', 'pointsPayerId': '550b2565e4b001d5e9e4146f', 'quantityPurchased': 1, 'rewardsGroup': 'HUGGIES SPECIAL DELIVERY DIAPERS 60 - 69 COUNT', 'rewardsProductPartnerId': '550b2565e4b001d5e9e4146f'}, {'barcode': '036000432190', 'description': 'HUGGIES SIMPLY CLEAN PREMOISTENED FC HND AND BTM WP FRAGRANCE FREE NONFLUSHABLE BAG 72 COUNT', 'discountedItemPrice': '9.99', 'finalPrice': '9.99', 'itemNumber': '036000432190', 'itemPrice': '9.99', 'originalMetaBriteBarcode': '', 'partnerItemId': '1017', 'pointsEarned': '99.9', 'pointsPayerId': '550b2565e4b001d5e9e4146f', 'quantityPurchased': 1, 'rewardsGroup': 'HUGGIES ONE AND DONE SIMPLY CLEAN BABY WIPES 56 - 159 COUNT', 'rewardsProductPartnerId': '550b2565e4b001d5e9e4146f'}]"
55,5ff371240a7214ada10005b3,FINISHED,"[{'barcode': '044700073377', 'description': 'OSCAR MAYER Jumbo Angus Beef Uncured Franks, 15.0 OZ', 'finalPrice': '1', 'itemPrice': '1', 'partnerItemId': '1', 'pointsEarned': '5.0', 'pointsPayerId': '559c2234e4b06aca36af13c6', 'quantityPurchased': 1, 'rewardsGroup': 'OSCAR MAYER HOT DOG - BEEF FRANKS', 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6', 'targetPrice': '800'}]"
65,5ff4ce3d0a7214ada10005d3,FINISHED,"[{'barcode': '4011', 'description': 'ITEM NOT FOUND', 'finalPrice': '29.00', 'itemPrice': '29.00', 'needsFetchReview': False, 'partnerItemId': '1', 'preventTargetGapPoints': True, 'quantityPurchased': 2, 'userFlaggedBarcode': '4011', 'userFlaggedNewItem': True, 'userFlaggedPrice': '29.00', 'userFlaggedQuantity': 2}]"


In [73]:
id_125 = df_receipt_items.loc[df_receipt_items['_id_cleaned'] == '6008ee0e0a7214ad89000125']
id_125

Unnamed: 0,_id_cleaned,rewardsReceiptItemList
392,6008ee0e0a7214ad89000125,"[{'barcode': '012000809965', 'description': 'MTN DEW REVOLUTION SODA WILDBERRY FRUIT FLVR CANS IN BOX 12 CT 144 OZ', 'discountedItemPrice': '8.99', 'finalPrice': '8.99', 'itemNumber': '012000809965', 'itemPrice': '8.99', 'originalMetaBriteBarcode': '', 'originalReceiptItemText': 'ILDBERRY FRUIT FLVR CANS IN BOX 12 C', 'partnerItemId': '1032', 'pointsNotAwardedReason': 'Action not allowed for user and CPG', 'pointsPayerId': '5332f5fbe4b03c9a25efd0ba', 'quantityPurchased': 1, 'rewardsGroup': 'MOUNTAIN DEW 12 OZ 12 PACK', 'rewardsProductPartnerId': '5332f5fbe4b03c9a25efd0ba'}, {'barcode': '511111101451', 'description': 'QUAKER', 'discountedItemPrice': '3.99', 'finalPrice': '3.99', 'itemNumber': '511111101451', 'itemPrice': '3.99', 'originalMetaBriteBarcode': '', 'originalReceiptItemText': '2.99 10 OUAKER OATS Q', 'partnerItemId': '1042', 'pointsNotAwardedReason': 'Action not allowed for user and CPG', 'pointsPayerId': '53e10d6368abd3c7065097cc', 'quantityPurchased': 1, 'rewardsProductPartnerId': '53e10d6368abd3c7065097cc'}, {'barcode': '005111116022', 'description': 'TTER BLUE KRAZY KRITTER BLUE 1', 'discountedItemPrice': '1.49', 'finalPrice': '1.49', 'itemNumber': '005111116022', 'itemPrice': '1.49', 'originalMetaBriteBarcode': '', 'originalReceiptItemText': 'TTER BLUE KRAZY KRITTER BLUE 1', 'partnerItemId': '1048', 'quantityPurchased': 1}, {'barcode': '511111602118', 'description': 'JELL-O', 'discountedItemPrice': '1.99', 'finalPrice': '1.99', 'itemNumber': '511111602118', 'itemPrice': '1.99', 'originalMetaBriteBarcode': '', 'originalReceiptItemText': 'LO JELL-O', 'partnerItemId': '1051', 'pointsEarned': '10.0', 'pointsPayerId': '559c2234e4b06aca36af13c6', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'barcode': '311111536044', 'description': 'LUCKY CHARMS UNICORN CEREAL FAMILY SIZE', 'discountedItemPrice': '6.58', 'finalPrice': '6.58', 'itemNumber': '311111536044', 'itemPrice': '6.58', 
'originalMetaBriteBarcode': '', 'originalReceiptItemText': 'SI HIDDEN VALLEY SALAD DRESSING 21OZ', 'partnerItemId': '1088', 'pointsNotAwardedReason': 'Action not allowed for user and CPG', 'pointsPayerId': '5332f5f3e4b03c9a25efd0ae', 'quantityPurchased': 1, 'rewardsGroup': 'LUCKY CHARMS UNICORN CEREAL FAMILY SIZE', 'rewardsProductPartnerId': '5332f5f3e4b03c9a25efd0ae'}, {'barcode': '074682200294', 'description': 'R W KND FML BT VGTB JC BTL RFRG AFTR OPNN 32 FL OZ', 'discountedItemPrice': '7.89', 'finalPrice': '7.89', 'itemNumber': '074682200294', 'itemPrice': '7.89', 'originalMetaBriteBarcode': '', 'originalReceiptItemText': 'ML BT VGTB JC BTL RFRG AFTR OPNN 32', 'partnerItemId': '1091', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}, {'barcode': '011594404013', 'description': 'HWN ONN RNG SWT M GLDN CRSP BAG 4 OZ', 'discountedItemPrice': '1.49', 'finalPrice': '1.49', 'itemNumber': '011594404013', 'itemPrice': '1.49', 'originalMetaBriteBarcode': '', 'originalReceiptItemText': 'AIIAN SWT HWN ONN RNG SWT M GLDN CRS', 'partnerItemId': '1112', 'quantityPurchased': 1, 'rewardsProductPartnerId': '559c2234e4b06aca36af13c6'}]"


In [74]:
# extract a sample value from rewardsReceiptItemList
receiptlist = id_125.iloc[0]['rewardsReceiptItemList']
receiptlist
len(receiptlist)

for item in receiptlist:
    print(item['barcode'])

012000809965
511111101451
005111116022
511111602118
311111536044
074682200294
011594404013


### Creating df_receipt_items data source

In [75]:
# create a list containing '_id_cleaned','rewardsReceiptItemList' values from df_receipts
list_receipt_items_in = df_receipts[['_id_cleaned', 'rewardsReceiptItemList']].values.tolist()

In [76]:
# create an empty list to store values from list_receipt_items_in, a list of lists
list_receipt_items_expand = []

for _id_cleaned, rewardsReceiptItemList in list_receipt_items_in:
    item_index = 0
    try:
        for item in rewardsReceiptItemList:
            # create the list to add to list_receipt_items_expand
            list_out = [_id_cleaned, item_index, item]
            list_receipt_items_expand.append(list_out)
            item_index += 1
    except TypeError:
        # receipts without an item list hold a null value, which is not iterable; skip them
        pass

# confirm the list is composed as intended, slicing the first 6 items
list_receipt_items_expand[0:6]

[['5ff1e1eb0a720f0523000575',
  0,
  {'barcode': '4011',
   'description': 'ITEM NOT FOUND',
   'finalPrice': '26.00',
   'itemPrice': '26.00',
   'needsFetchReview': False,
   'partnerItemId': '1',
   'preventTargetGapPoints': True,
   'quantityPurchased': 5,
   'userFlaggedBarcode': '4011',
   'userFlaggedNewItem': True,
   'userFlaggedPrice': '26.00',
   'userFlaggedQuantity': 5}],
 ['5ff1e1bb0a720f052300056b',
  0,
  {'barcode': '4011',
   'description': 'ITEM NOT FOUND',
   'finalPrice': '1',
   'itemPrice': '1',
   'partnerItemId': '1',
   'quantityPurchased': 1}],
 ['5ff1e1bb0a720f052300056b',
  1,
  {'barcode': '028400642255',
   'description': 'DORITOS TORTILLA CHIP SPICY SWEET CHILI REDUCED FAT BAG 1 OZ',
   'finalPrice': '10.00',
   'itemPrice': '10.00',
   'needsFetchReview': True,
   'needsFetchReviewReason': 'USER_FLAGGED',
   'partnerItemId': '2',
   'pointsNotAwardedReason': 'Action not allowed for user and CPG',
   'pointsPayerId': '5332f5fbe4b03c9a25efd0ba',
   'preve

In [96]:
# create an empty list to store values from list_receipt_items_expand, another list of lists
# 9/24 - take two - adding brandCode - adjusted the conditions for saving values by adding brandCode and description
# I would drop my barcode and userFlaggedBarcode conditions, but I want to preserve my exploratory code in 2.1-EDA_first_pass.ipynb
list_receipt_items_extract = []

for _id_cleaned, item_index, item in list_receipt_items_expand:
    # only save values to list_receipt_items_extract if there is a 'barcode', 'userFlaggedBarcode',
    # 'description', OR 'brandCode' present 
    if item.get('barcode', None) or item.get('userFlaggedBarcode', None) or item.get('description', None) \
    or item.get('brandCode', None):
        # assign variables values from item dictionaries, if the key doesn't exist default None
        barcode = item.get('barcode', None)
        userFlaggedBarcode = item.get('userFlaggedBarcode', None)
        description = item.get('description', None)
        userFlaggedDescription = item.get('userFlaggedDescription', None)
        finalPrice = item.get('finalPrice', None)
        userFlaggedPrice = item.get('userFlaggedPrice', None)
        quantityPurchased = item.get('quantityPurchased', None)
        userFlaggedQuantity = item.get('userFlaggedQuantity', None)
        brandCode = item.get('brandCode', None)
        #create the list to add to list_receipt_items_extract
        list_out = [
            _id_cleaned, item_index, barcode, userFlaggedBarcode, description, userFlaggedDescription, \
            finalPrice, userFlaggedPrice, quantityPurchased, userFlaggedQuantity, brandCode
            ]
        list_receipt_items_extract.append(list_out)

In [97]:
# inspect the first items of list_receipt_items_extract
list_receipt_items_extract[0:6]

[['5ff1e1eb0a720f0523000575',
  0,
  '4011',
  '4011',
  'ITEM NOT FOUND',
  None,
  '26.00',
  '26.00',
  5,
  5,
  None],
 ['5ff1e1bb0a720f052300056b',
  0,
  '4011',
  None,
  'ITEM NOT FOUND',
  None,
  '1',
  None,
  1,
  None,
  None],
 ['5ff1e1bb0a720f052300056b',
  1,
  '028400642255',
  '028400642255',
  'DORITOS TORTILLA CHIP SPICY SWEET CHILI REDUCED FAT BAG 1 OZ',
  'DORITOS TORTILLA CHIP SPICY SWEET CHILI REDUCED FAT BAG 1 OZ',
  '10.00',
  '10.00',
  1,
  1,
  None],
 ['5ff1e1f10a720f052300057a',
  0,
  None,
  '4011',
  None,
  None,
  None,
  '26.00',
  None,
  3,
  None],
 ['5ff1e1ee0a7214ada100056f',
  0,
  '4011',
  '4011',
  'ITEM NOT FOUND',
  None,
  '28.00',
  '28.00',
  4,
  4,
  None],
 ['5ff1e1d20a7214ada1000561',
  0,
  '4011',
  None,
  'ITEM NOT FOUND',
  None,
  '1',
  None,
  1,
  None,
  None]]
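
The expand and extract loops above could also be expressed with pandas built-ins. The following is a sketch on toy data, not a replacement for the notebook's code: it uses `DataFrame.explode` and `pd.json_normalize`, covers only four of the item keys, and does not apply the "at least one of barcode, userFlaggedBarcode, description, brandCode" filter.

```python
import pandas as pd

toy_receipts = pd.DataFrame(
    {
        "_id_cleaned": ["r1", "r2", "r3"],
        "rewardsReceiptItemList": [
            [{"barcode": "4011", "description": "ITEM NOT FOUND"}, {"brandCode": "KASHI"}],
            None,  # receipts without items carry a null here
            [{"userFlaggedBarcode": "1234"}],
        ],
    }
)

# One row per receipt item; receipts with no item list are dropped.
exploded = toy_receipts.dropna(subset=["rewardsReceiptItemList"]).explode("rewardsReceiptItemList")
# Position of each item within its receipt, like item_index in the loop.
exploded["item_index"] = exploded.groupby("_id_cleaned").cumcount()
exploded = exploded.reset_index(drop=True)

wanted = ["barcode", "userFlaggedBarcode", "description", "brandCode"]
# Expand each item dict into columns; keys missing from an item become NaN.
items = pd.json_normalize(exploded["rewardsReceiptItemList"].tolist()).reindex(columns=wanted)
toy_receipt_items = pd.concat(
    [exploded[["_id_cleaned", "item_index"]].rename(columns={"_id_cleaned": "receipt_id"}), items],
    axis=1,
)
print(toy_receipt_items)
```

The explicit loops are easier to step through while exploring; the vectorized form is shorter once the shape of the data is understood.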

In [98]:
# define the column names for the df_receipt_items dataframe
# 9/24 - take two - adding brandCode
ri_columns = [
                "receipt_id",
                "item_index",
                "barcode",
                "userFlaggedBarcode",
                "description",
                "userFlaggedDescription",
                "finalPrice",
                "userFlaggedPrice",
                "quantityPurchased",
                "userFlaggedQuantity",
                "brandCode"
                ]
# load all the values from list_receipt_items_extract into this dataframe
df_receipt_items = pd.DataFrame(list_receipt_items_extract, columns = ri_columns)

In [100]:
"""9/24 - take two - adding brandCode and adjusting 
Take one output follows:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3240 entries, 0 to 3239
Data columns (total 11 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   receipt_id              3240 non-null   object 
 1   item_index              3240 non-null   int64  
 2   barcode                 3090 non-null   object 
 3   userFlaggedBarcode      337 non-null    object 
 4   description             2859 non-null   object 
 5   userFlaggedDescription  205 non-null    object 
 6   finalPrice              3066 non-null   object 
 7   userFlaggedPrice        299 non-null    object 
 8   quantityPurchased       3066 non-null   float64
 9   userFlaggedQuantity     299 non-null    float64
 10  brandCode               1746 non-null   object 
dtypes: float64(2), int64(1), object(8)
memory usage: 278.6+ KB

Many more records to work with after pivoting to brandCode.
"""
df_receipt_items.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6941 entries, 0 to 6940
Data columns (total 11 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   receipt_id              6941 non-null   object 
 1   item_index              6941 non-null   int64  
 2   barcode                 3090 non-null   object 
 3   userFlaggedBarcode      337 non-null    object 
 4   description             6560 non-null   object 
 5   userFlaggedDescription  205 non-null    object 
 6   finalPrice              6767 non-null   object 
 7   userFlaggedPrice        299 non-null    object 
 8   quantityPurchased       6767 non-null   float64
 9   userFlaggedQuantity     299 non-null    float64
 10  brandCode               2600 non-null   object 
dtypes: float64(2), int64(1), object(8)
memory usage: 596.6+ KB


### Extracting brandCodes from description
Once I realized that the better join between receipt_items and brands would be on brandCode, I thought of applying a process similar to the one I used in my Fetch Catalog Analyst exercise. I wondered: are there items in receipt_items that have a value for description but None for brandCode? In those cases, might I be able to identify a brand by comparing known brands against the content of description?

The following cells capture my exploration of this idea.

In [101]:
# What's the potential set of items that would be eligible for this comparison? i.e., non-null description and null brandCode
df_non_null_description = df_receipt_items
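The idea can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the regex used in this notebook: the function names `build_brand_pattern` and `extract_brand_code` are hypothetical, the brand list is a toy, and the items are made-up examples. It first selects the eligible items (non-null description, null brandCode) and then searches each description for a known brand code.

```python
import re


def build_brand_pattern(brand_codes):
    """Compile one case-insensitive pattern matching any known brand code.

    Assumes at least one non-empty brand code is supplied.
    """
    # Longest codes first so a longer code wins over a shorter one at the same position
    ordered = sorted({c for c in brand_codes if c}, key=len, reverse=True)
    alternation = "|".join(re.escape(code) for code in ordered)
    # Lookarounds require the code to stand alone, so PACE does not match inside SPACE
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def extract_brand_code(description, pattern):
    """Return the first brand code found in description, upper-cased, or None."""
    if not description:
        return None
    match = pattern.search(description)
    return match.group(0).upper() if match else None


# Toy brand list; None stands in for brands with no brandCode
known_codes = ["DORITOS", "PACE", "PIONEER WOMAN", None]
pattern = build_brand_pattern(known_codes)

# Made-up items for illustration
items = [
    {"description": "DORITOS TORTILLA CHIP SPICY SWEET CHILI", "brandCode": None},
    {"description": "ITEM NOT FOUND", "brandCode": None},
    {"description": "the pioneer woman ranch dressing", "brandCode": None},
    {"description": "SPACE HEATER", "brandCode": None},
    {"description": None, "brandCode": None},
    {"description": "PACE SALSA", "brandCode": "PACE"},
]

# Eligible set: has a description but no brandCode yet
eligible = [i for i in items if i["description"] is not None and i["brandCode"] is None]
extracted = [extract_brand_code(i["description"], pattern) for i in eligible]
```

Matching on whole tokens trades recall for precision: it avoids false hits on substrings but will miss descriptions that abbreviate or misspell the brand.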

### Convert data types and load dataframes to SQLite database 

In [199]:
# convert topBrand to dtype boolean, allowing nulls
df_brands.topBrand = df_brands.topBrand.astype("boolean")

In [200]:
df_brands.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1167 entries, 0 to 1166
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   _id           1167 non-null   object 
 1   _id_cleaned   1167 non-null   object 
 2   barcode       1167 non-null   int64  
 3   category      1012 non-null   object 
 4   categoryCode  517 non-null    object 
 5   cpg           1167 non-null   object 
 6   cpg_cleaned   1167 non-null   object 
 7   name          1167 non-null   object 
 8   topBrand      555 non-null    boolean
 9   brandCode     933 non-null    object 
dtypes: boolean(1), int64(1), object(8)
memory usage: 84.5+ KB


In [201]:
# create a new dataframe with only the columns I want to load to sqlite
# .copy() avoids pandas' SettingWithCopyWarning when this dataframe is modified below
df_brands_load = df_brands[[
                            '_id_cleaned', 'barcode', 'category', 'categoryCode', 
                            'cpg_cleaned', 'name', 'topBrand', 'brandCode'
                            ]].copy()
# rename some columns for better formatting when loading into sqlite
df_brands_load.rename(columns={'_id_cleaned': 'id', 'cpg_cleaned': 'cpg'}, inplace=True)
df_brands_load


Unnamed: 0,id,barcode,category,categoryCode,cpg,name,topBrand,brandCode
0,601ac115be37ce2ead437551,511111019862,Baking,BAKING,601ac114be37ce2ead437550,test brand @1612366101024,False,
1,601c5460be37ce2ead43755f,511111519928,Beverages,BEVERAGES,5332f5fbe4b03c9a25efd0ba,Starbucks,False,STARBUCKS
2,601ac142be37ce2ead43755d,511111819905,Baking,BAKING,601ac142be37ce2ead437559,test brand @1612366146176,False,TEST BRANDCODE @1612366146176
3,601ac142be37ce2ead43755a,511111519874,Baking,BAKING,601ac142be37ce2ead437559,test brand @1612366146051,False,TEST BRANDCODE @1612366146051
4,601ac142be37ce2ead43755e,511111319917,Candy & Sweets,CANDY_AND_SWEETS,5332fa12e4b03c9a25efd1e7,test brand @1612366146827,False,TEST BRANDCODE @1612366146827
...,...,...,...,...,...,...,...,...
1162,5f77274dbe37ce6b592e90c0,511111116752,Baking,BAKING,5f77274dbe37ce6b592e90bf,test brand @1601644365844,,
1163,5dc1fca91dda2c0ad7da64ae,511111706328,Breakfast & Cereal,,53e10d6368abd3c7065097cc,Dippin Dots® Cereal,,DIPPIN DOTS CEREAL
1164,5f494c6e04db711dd8fe87e7,511111416173,Candy & Sweets,CANDY_AND_SWEETS,5332fa12e4b03c9a25efd1e7,test brand @1598639215217,,TEST BRANDCODE @1598639215217
1165,5a021611e4b00efe02b02a57,511111400608,Grocery,,5332f5f6e4b03c9a25efd0b4,LIPTON TEA Leaves,False,LIPTON TEA Leaves


In [202]:
# when trying to load, the unique constraint fails on barcode: IntegrityError: UNIQUE constraint failed: brands.barcode
# return all rows where barcodes are repeated ref: https://stackoverflow.com/questions/14657241/how-do-i-get-a-list-of-all-the-duplicate-items-using-pandas-in-python
pd.concat(g for _, g in df_brands_load.groupby("barcode") if len(g) > 1)

Unnamed: 0,id,barcode,category,categoryCode,cpg,name,topBrand,brandCode
467,5c409ab4cd244a3539b84162,511111004790,Baking,,55b62995e4b0d8e685c14213,alexa,True,ALEXA
1071,5cdacd63166eb33eb7ce0fa8,511111004790,Condiments & Sauces,,559c2234e4b06aca36af13c6,Bitten Dressing,,BITTEN
152,5c45f91b87ff3552f950f027,511111204923,Grocery,,5c45f8b087ff3552f950f026,Brand1,True,0987654321
536,5d6027f46d5f3b23d1bc7906,511111204923,Snacks,,5332f5fbe4b03c9a25efd0ba,CHESTER'S,,CHESTERS
20,5c4699f387ff3577e203ea29,511111305125,Baby,,55b62995e4b0d8e685c14213,Chris Image Test,,CHRISIMAGE
651,5d642d65a3a018514994f42d,511111305125,Magazines,,5d5d4fd16d5f3b23d1bc7905,Rachael Ray Everyday,,511111305125
129,5a7e0604e4b0aedb3b84afd3,511111504139,Beverages,,55b62995e4b0d8e685c14213,Chris Brand XYZ,,CHRISXYZ
299,5a8c33f3e4b07f0a2dac8943,511111504139,Grocery,,5a734034e4b0d58f376be874,Pace,False,PACE
9,5c408e8bcd244a1fdb47aee7,511111504788,Baking,,59ba6f1ce4b092b29c167346,test,,TEST
412,5ccb2ece166eb31bbbadccbe,511111504788,Condiments & Sauces,,559c2234e4b06aca36af13c6,The Pioneer Woman,,PIONEER WOMAN


In [203]:
# add a column to flag if the barcode is a duplicate
# start by creating a list of the duplicated barcodes
dupe_barcodes = pd.concat(g for _, g in df_brands_load.groupby("barcode") if len(g) > 1)['barcode']
# remove duplicates
dupe_barcodes = list(set(dupe_barcodes))

# create a list to collect a bool value indicating if df_brands_load['barcode'] is duplicated
dupe_barcode_flag = []

# extract barcodes to evaluate
brand_barcodes = df_brands_load['barcode']

# loop through brand_barcodes evaluating if the barcode is included in dupe_barcodes
for barcode in brand_barcodes:
    if barcode in dupe_barcodes:
        dupe_barcode_flag.append(True)
    else:
        dupe_barcode_flag.append(None)

# add dupe_barcode to df_brands_load after the barcode column
df_brands_load.insert(2, 'dupe_barcode', dupe_barcode_flag)

# convert dupe_barcode to dtype boolean, allowing nulls
df_brands_load.dupe_barcode = df_brands_load.dupe_barcode.astype("boolean")

df_brands_load

Unnamed: 0,id,barcode,dupe_barcode,category,categoryCode,cpg,name,topBrand,brandCode
0,601ac115be37ce2ead437551,511111019862,,Baking,BAKING,601ac114be37ce2ead437550,test brand @1612366101024,False,
1,601c5460be37ce2ead43755f,511111519928,,Beverages,BEVERAGES,5332f5fbe4b03c9a25efd0ba,Starbucks,False,STARBUCKS
2,601ac142be37ce2ead43755d,511111819905,,Baking,BAKING,601ac142be37ce2ead437559,test brand @1612366146176,False,TEST BRANDCODE @1612366146176
3,601ac142be37ce2ead43755a,511111519874,,Baking,BAKING,601ac142be37ce2ead437559,test brand @1612366146051,False,TEST BRANDCODE @1612366146051
4,601ac142be37ce2ead43755e,511111319917,,Candy & Sweets,CANDY_AND_SWEETS,5332fa12e4b03c9a25efd1e7,test brand @1612366146827,False,TEST BRANDCODE @1612366146827
...,...,...,...,...,...,...,...,...,...
1162,5f77274dbe37ce6b592e90c0,511111116752,,Baking,BAKING,5f77274dbe37ce6b592e90bf,test brand @1601644365844,,
1163,5dc1fca91dda2c0ad7da64ae,511111706328,,Breakfast & Cereal,,53e10d6368abd3c7065097cc,Dippin Dots® Cereal,,DIPPIN DOTS CEREAL
1164,5f494c6e04db711dd8fe87e7,511111416173,,Candy & Sweets,CANDY_AND_SWEETS,5332fa12e4b03c9a25efd1e7,test brand @1598639215217,,TEST BRANDCODE @1598639215217
1165,5a021611e4b00efe02b02a57,511111400608,,Grocery,,5332f5f6e4b03c9a25efd0b4,LIPTON TEA Leaves,False,LIPTON TEA Leaves
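The flagging loop above can be condensed by counting each barcode once. A minimal sketch in plain Python, assuming a hypothetical helper name `flag_duplicates` and a toy list of barcodes; like the loop above, it marks non-duplicates as None rather than False.

```python
from collections import Counter


def flag_duplicates(values):
    """Return True where a value occurs more than once, else None."""
    counts = Counter(values)  # one pass to count every value
    return [True if counts[v] > 1 else None for v in values]


# Toy barcodes: the first and third are the same
barcodes = [511111004790, 511111019862, 511111004790, 511111519928]
flags = flag_duplicates(barcodes)
```

In pandas, `Series.duplicated(keep=False)` yields an equivalent True/False mask in a single call, which would also leave the column with no nulls.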


In [204]:
df_brands_load.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1167 entries, 0 to 1166
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   id            1167 non-null   object 
 1   barcode       1167 non-null   int64  
 2   dupe_barcode  14 non-null     boolean
 3   category      1012 non-null   object 
 4   categoryCode  517 non-null    object 
 5   cpg           1167 non-null   object 
 6   name          1167 non-null   object 
 7   topBrand      555 non-null    boolean
 8   brandCode     933 non-null    object 
dtypes: boolean(2), int64(1), object(6)
memory usage: 68.5+ KB


In [205]:
# create brands table
conn = sqlite3.connect(db_path)
c = conn.cursor()

c.execute('DROP TABLE IF EXISTS brands')

c.execute("""CREATE TABLE IF NOT EXISTS brands (
        id uuid PRIMARY KEY,
        -- there are duplicates in barcode, for now we'll load the table without this constraint and note it
        -- barcode numeric UNIQUE,
        barcode numeric,
        dupe_barcode bool,
        category text,
        categoryCode text,
        cpg text,
        name text,
        topBrand bool,
        brandCode text       
    )""")

df_brands_load.to_sql('brands', conn, if_exists='append', index=False)

conn.commit()
conn.close()
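A note on the declared types: `uuid` and `bool` are not storage classes in SQLite. Under SQLite's type affinity rules, a declared type that matches none of the recognized name patterns gets NUMERIC affinity, so text that looks like a number is stored as a number while other text stays text. The sketch below illustrates this on a throwaway in-memory table; the table name `demo` and its values are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same style of declared types as the brands table, plus a text column for comparison
conn.execute("CREATE TABLE demo (id uuid, flag bool, label text)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?)",
    [
        ("601ac115be37ce2ead437551", True, "hex id"),  # not a numeric literal
        ("42", None, "42"),                            # numeric-looking text
    ],
)
# typeof() reports the storage class actually used for each value
storage = conn.execute(
    "SELECT typeof(id), typeof(flag), typeof(label) FROM demo ORDER BY rowid"
).fetchall()
conn.close()
```

Here the hex id stays text, but the string `"42"` is stored as an integer in the `uuid` column and as text in the `text` column. Declaring id columns as `text` would avoid that conversion for any id that happens to look numeric.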

In [206]:
df_users.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 495 entries, 0 to 494
Data columns (total 12 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   _id                     495 non-null    object        
 1   _id_cleaned             495 non-null    object        
 2   active                  495 non-null    bool          
 3   createdDate             495 non-null    object        
 4   createdDate_cleaned     495 non-null    int64         
 5   createdDate_cleaned_ts  495 non-null    datetime64[ns]
 6   lastLogin               433 non-null    object        
 7   lastLogin_cleaned       433 non-null    float64       
 8   lastLogin_cleaned_ts    433 non-null    datetime64[ns]
 9   role                    495 non-null    object        
 10  signUpSource            447 non-null    object        
 11  state                   439 non-null    object        
dtypes: bool(1), datetime64[ns](2), float64(1), int64(1

In [207]:
# create a new dataframe with only the columns I want to load to sqlite
# .copy() avoids pandas' SettingWithCopyWarning when this dataframe is modified below
df_users_load = df_users[[
                            '_id_cleaned', 'active', 'createdDate_cleaned_ts', 'lastLogin_cleaned_ts', 'role', 
                            'signUpSource', 'state'
                            ]].copy()
# rename some columns for better formatting when loading into sqlite
df_users_load.rename(columns={
                                '_id_cleaned': 'id', 'createdDate_cleaned_ts': 'createdDate', 
                                'lastLogin_cleaned_ts': 'lastLogin'
                                }, inplace=True)
df_users_load

Unnamed: 0,id,active,createdDate,lastLogin,role,signUpSource,state
0,5ff1e194b6a9d73a3a9f1052,True,2021-01-03 15:24:04.800,2021-01-03 15:25:37.858,consumer,Email,WI
1,5ff1e194b6a9d73a3a9f1052,True,2021-01-03 15:24:04.800,2021-01-03 15:25:37.858,consumer,Email,WI
2,5ff1e194b6a9d73a3a9f1052,True,2021-01-03 15:24:04.800,2021-01-03 15:25:37.858,consumer,Email,WI
3,5ff1e1eacfcf6c399c274ae6,True,2021-01-03 15:25:30.554,2021-01-03 15:25:30.597,consumer,Email,WI
4,5ff1e194b6a9d73a3a9f1052,True,2021-01-03 15:24:04.800,2021-01-03 15:25:37.858,consumer,Email,WI
...,...,...,...,...,...,...,...
490,54943462e4b07e684157a532,True,2014-12-19 14:21:22.381,2021-03-05 16:52:23.204,fetch-staff,,
491,54943462e4b07e684157a532,True,2014-12-19 14:21:22.381,2021-03-05 16:52:23.204,fetch-staff,,
492,54943462e4b07e684157a532,True,2014-12-19 14:21:22.381,2021-03-05 16:52:23.204,fetch-staff,,
493,54943462e4b07e684157a532,True,2014-12-19 14:21:22.381,2021-03-05 16:52:23.204,fetch-staff,,


In [208]:
# when trying to load, the unique and Primary Key constraint fails on id: IntegrityError: UNIQUE constraint failed: users.id
# return all rows where ids are repeated ref: https://stackoverflow.com/questions/14657241/how-do-i-get-a-list-of-all-the-duplicate-items-using-pandas-in-python
pd.concat(g for _, g in df_users_load.groupby("id") if len(g) > 1)

Unnamed: 0,id,active,createdDate,lastLogin,role,signUpSource,state
475,54943462e4b07e684157a532,True,2014-12-19 14:21:22.381,2021-03-05 16:52:23.204,fetch-staff,,
476,54943462e4b07e684157a532,True,2014-12-19 14:21:22.381,2021-03-05 16:52:23.204,fetch-staff,,
477,54943462e4b07e684157a532,True,2014-12-19 14:21:22.381,2021-03-05 16:52:23.204,fetch-staff,,
478,54943462e4b07e684157a532,True,2014-12-19 14:21:22.381,2021-03-05 16:52:23.204,fetch-staff,,
479,54943462e4b07e684157a532,True,2014-12-19 14:21:22.381,2021-03-05 16:52:23.204,fetch-staff,,
...,...,...,...,...,...,...,...
374,60189c94c8b50e11d8454f6b,True,2021-02-02 00:28:04.020,2021-02-02 00:28:04.073,consumer,Email,WI
385,601c2c05969c0b11f7d0b097,True,2021-02-04 17:16:53.700,2021-02-04 17:20:30.228,consumer,Email,WI
387,601c2c05969c0b11f7d0b097,True,2021-02-04 17:16:53.700,2021-02-04 17:20:30.228,consumer,Email,WI
393,60229990b57b8a12187fe9e0,True,2021-02-09 14:17:52.581,2021-02-09 14:17:52.626,consumer,Email,WI


In [209]:
# drop the duplicate rows (complete rows, may still result in dupe ids), then attempt to load 
# keep a real copy as a backup; plain assignment would only alias the same dataframe
df_backup = df_users_load.copy()
df_users_load = df_users_load.drop_duplicates()
# deduping worked, there were no repeating ids with unique values in any other column
df_users_load.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 212 entries, 0 to 475
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   id            212 non-null    object        
 1   active        212 non-null    bool          
 2   createdDate   212 non-null    datetime64[ns]
 3   lastLogin     172 non-null    datetime64[ns]
 4   role          212 non-null    object        
 5   signUpSource  207 non-null    object        
 6   state         206 non-null    object        
dtypes: bool(1), datetime64[ns](2), object(4)
memory usage: 11.8+ KB


In [210]:
# create users table
conn = sqlite3.connect(db_path)
c = conn.cursor()

c.execute('DROP TABLE IF EXISTS users')

c.execute("""CREATE TABLE IF NOT EXISTS users (
        id uuid PRIMARY KEY,
        active bool,
        createdDate timestamp,
        lastLogin timestamp,
        role text,
        signUpSource text,
        state text     
    )""")

df_users_load.to_sql('users', conn, if_exists='append', index=False)

conn.commit()
conn.close()
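The connect / execute / commit / close pattern repeated in these cells can be made safer with context managers. A minimal sketch, assuming a throwaway database file rather than fetch.db: `contextlib.closing` guarantees the connection is closed, and `with conn:` commits on success or rolls back if a statement raises, such as the IntegrityError seen above.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

# Throwaway database file for the demonstration (not fetch.db)
db_file = os.path.join(tempfile.mkdtemp(), "demo.db")

with closing(sqlite3.connect(db_file)) as conn:
    with conn:  # commits on success, rolls back on exception
        conn.execute("CREATE TABLE users (id text PRIMARY KEY, role text)")
        conn.execute("INSERT INTO users VALUES (?, ?)", ("a1", "consumer"))

duplicate_rejected = False
with closing(sqlite3.connect(db_file)) as conn:
    try:
        with conn:
            conn.execute("INSERT INTO users VALUES (?, ?)", ("b2", "consumer"))
            # Repeats an existing primary key, so this statement raises
            conn.execute("INSERT INTO users VALUES (?, ?)", ("a1", "fetch-staff"))
    except sqlite3.IntegrityError:
        duplicate_rejected = True
    # The failed transaction was rolled back, so "b2" is not in the table either
    row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Note that `with sqlite3.connect(...) as conn:` on its own manages the transaction but does not close the connection, which is why `closing` is used here.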

In [211]:
df_receipts.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1119 entries, 0 to 1118
Data columns (total 28 columns):
 #   Column                        Non-Null Count  Dtype         
---  ------                        --------------  -----         
 0   _id                           1119 non-null   object        
 1   _id_cleaned                   1119 non-null   object        
 2   bonusPointsEarned             544 non-null    float64       
 3   bonusPointsEarnedReason       544 non-null    object        
 4   createDate                    1119 non-null   object        
 5   createDate_cleaned            1119 non-null   int64         
 6   createDate_cleaned_ts         1119 non-null   datetime64[ns]
 7   dateScanned                   1119 non-null   object        
 8   dateScanned_cleaned           1119 non-null   int64         
 9   dateScanned_cleaned_ts        1119 non-null   datetime64[ns]
 10  finishedDate                  568 non-null    object        
 11  finishedDate_cleaned          

In [212]:
# create a new dataframe with only the columns I want to load to sqlite
# .copy() avoids pandas' SettingWithCopyWarning when this dataframe is modified below
df_receipts_load = df_receipts[[
                            '_id_cleaned', 'bonusPointsEarned', 'bonusPointsEarnedReason', 'createDate_cleaned_ts', 
                            'dateScanned_cleaned_ts', 'finishedDate_cleaned_ts', 'modifyDate_cleaned_ts', 
                            'pointsAwardedDate_cleaned_ts', 'pointsEarned', 'purchaseDate_cleaned_ts', 
                            'purchasedItemCount', 'rewardsReceiptItemList', 'rewardsReceiptStatus', 'totalSpent', 
                            'userId'
                            ]].copy()
# rename some columns for better formatting when loading into sqlite
df_receipts_load.rename(columns={
                                '_id_cleaned': 'id', 'createDate_cleaned_ts': 'createDate', 
                                'dateScanned_cleaned_ts': 'dateScanned', 'finishedDate_cleaned_ts': 'finishedDate',
                                'modifyDate_cleaned_ts': 'modifyDate', 'pointsAwardedDate_cleaned_ts': 'pointsAwardedDate',
                                'purchaseDate_cleaned_ts': 'purchaseDate'
                                }, inplace=True)
df_receipts_load

Unnamed: 0,id,bonusPointsEarned,bonusPointsEarnedReason,createDate,dateScanned,finishedDate,modifyDate,pointsAwardedDate,pointsEarned,purchaseDate,purchasedItemCount,rewardsReceiptItemList,rewardsReceiptStatus,totalSpent,userId
0,5ff1e1eb0a720f0523000575,500.0,"Receipt number 2 completed, bonus point schedule DEFAULT (5cefdcacf3693e0b50e83a36)",2021-01-03 15:25:31.000,2021-01-03 15:25:31.000,2021-01-03 15:25:31,2021-01-03 15:25:36.000,2021-01-03 15:25:31,500.0,2021-01-03 00:00:00,5.0,"[{'barcode': '4011', 'description': 'ITEM NOT FOUND', 'finalPrice': '26.00', 'itemPrice': '26.00', 'needsFetchReview': False, 'partnerItemId': '1', 'preventTargetGapPoints': True, 'quantityPurchased': 5, 'userFlaggedBarcode': '4011', 'userFlaggedNewItem': True, 'userFlaggedPrice': '26.00', 'userFlaggedQuantity': 5}]",FINISHED,26.00,5ff1e1eacfcf6c399c274ae6
1,5ff1e1bb0a720f052300056b,150.0,"Receipt number 5 completed, bonus point schedule DEFAULT (5cefdcacf3693e0b50e83a36)",2021-01-03 15:24:43.000,2021-01-03 15:24:43.000,2021-01-03 15:24:43,2021-01-03 15:24:48.000,2021-01-03 15:24:43,150.0,2021-01-02 15:24:43,2.0,"[{'barcode': '4011', 'description': 'ITEM NOT FOUND', 'finalPrice': '1', 'itemPrice': '1', 'partnerItemId': '1', 'quantityPurchased': 1}, {'barcode': '028400642255', 'description': 'DORITOS TORTILLA CHIP SPICY SWEET CHILI REDUCED FAT BAG 1 OZ', 'finalPrice': '10.00', 'itemPrice': '10.00', 'needsFetchReview': True, 'needsFetchReviewReason': 'USER_FLAGGED', 'partnerItemId': '2', 'pointsNotAwardedReason': 'Action not allowed for user and CPG', 'pointsPayerId': '5332f5fbe4b03c9a25efd0ba', 'preventTargetGapPoints': True, 'quantityPurchased': 1, 'rewardsGroup': 'DORITOS SPICY SWEET CHILI SINGLE SERVE', 'rewardsProductPartnerId': '5332f5fbe4b03c9a25efd0ba', 'userFlaggedBarcode': '028400642255', 'userFlaggedDescription': 'DORITOS TORTILLA CHIP SPICY SWEET CHILI REDUCED FAT BAG 1 OZ', 'userFlaggedNewItem': True, 'userFlaggedPrice': '10.00', 'userFlaggedQuantity': 1}]",FINISHED,11.00,5ff1e194b6a9d73a3a9f1052
2,5ff1e1f10a720f052300057a,5.0,All-receipts receipt bonus,2021-01-03 15:25:37.000,2021-01-03 15:25:37.000,NaT,2021-01-03 15:25:42.000,NaT,5.0,2021-01-03 00:00:00,1.0,"[{'needsFetchReview': False, 'partnerItemId': '1', 'preventTargetGapPoints': True, 'userFlaggedBarcode': '4011', 'userFlaggedNewItem': True, 'userFlaggedPrice': '26.00', 'userFlaggedQuantity': 3}]",REJECTED,10.00,5ff1e1f1cfcf6c399c274b0b
3,5ff1e1ee0a7214ada100056f,5.0,All-receipts receipt bonus,2021-01-03 15:25:34.000,2021-01-03 15:25:34.000,2021-01-03 15:25:34,2021-01-03 15:25:39.000,2021-01-03 15:25:34,5.0,2021-01-03 00:00:00,4.0,"[{'barcode': '4011', 'description': 'ITEM NOT FOUND', 'finalPrice': '28.00', 'itemPrice': '28.00', 'needsFetchReview': False, 'partnerItemId': '1', 'preventTargetGapPoints': True, 'quantityPurchased': 4, 'userFlaggedBarcode': '4011', 'userFlaggedNewItem': True, 'userFlaggedPrice': '28.00', 'userFlaggedQuantity': 4}]",FINISHED,28.00,5ff1e1eacfcf6c399c274ae6
4,5ff1e1d20a7214ada1000561,5.0,All-receipts receipt bonus,2021-01-03 15:25:06.000,2021-01-03 15:25:06.000,2021-01-03 15:25:11,2021-01-03 15:25:11.000,2021-01-03 15:25:06,5.0,2021-01-02 15:25:06,2.0,"[{'barcode': '4011', 'description': 'ITEM NOT FOUND', 'finalPrice': '1', 'itemPrice': '1', 'partnerItemId': '1', 'quantityPurchased': 1}, {'barcode': '1234', 'finalPrice': '2.56', 'itemPrice': '2.56', 'needsFetchReview': True, 'needsFetchReviewReason': 'USER_FLAGGED', 'partnerItemId': '2', 'preventTargetGapPoints': True, 'quantityPurchased': 3, 'userFlaggedBarcode': '1234', 'userFlaggedDescription': '', 'userFlaggedNewItem': True, 'userFlaggedPrice': '2.56', 'userFlaggedQuantity': 3}]",FINISHED,1.00,5ff1e194b6a9d73a3a9f1052
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1114,603cc0630a720fde100003e6,25.0,COMPLETE_NONPARTNER_RECEIPT,2021-03-01 10:22:27.000,2021-03-01 10:22:27.000,NaT,2021-03-01 10:22:28.000,NaT,25.0,2020-08-17 00:00:00,2.0,"[{'barcode': 'B076FJ92M4', 'description': 'mueller austria hypergrind precision electric spice/coffee grinder millwith large grinding capacity and hd motor also for spices, herbs, nuts,grains, white', 'discountedItemPrice': '22.97', 'finalPrice': '22.97', 'itemPrice': '22.97', 'originalReceiptItemText': 'mueller austria hypergrind precision electric spice/coffee grinder millwith large grinding capacity and hd motor also for spices, herbs, nuts,grains, white', 'partnerItemId': '0', 'priceAfterCoupon': '22.97', 'quantityPurchased': 1}, {'barcode': 'B07BRRLSVC', 'description': 'thindust summer face mask - sun protection neck gaiter for outdooractivities', 'discountedItemPrice': '11.99', 'finalPrice': '11.99', 'itemPrice': '11.99', 'originalReceiptItemText': 'thindust summer face mask - sun protection neck gaiter for outdooractivities', 'partnerItemId': '1', 'priceAfterCoupon': '11.99', 'quantityPurchased': 1}]",REJECTED,34.96,5fc961c3b8cfca11a077dd33
1115,603d0b710a720fde1000042a,,,2021-03-01 15:42:41.873,2021-03-01 15:42:41.873,NaT,2021-03-01 15:42:41.873,NaT,,NaT,,,SUBMITTED,,5fc961c3b8cfca11a077dd33
1116,603cf5290a720fde10000413,,,2021-03-01 14:07:37.664,2021-03-01 14:07:37.664,NaT,2021-03-01 14:07:37.664,NaT,,NaT,,,SUBMITTED,,5fc961c3b8cfca11a077dd33
1117,603ce7100a7217c72c000405,25.0,COMPLETE_NONPARTNER_RECEIPT,2021-03-01 13:07:28.000,2021-03-01 13:07:28.000,NaT,2021-03-01 13:07:29.000,NaT,25.0,2020-08-17 00:00:00,2.0,"[{'barcode': 'B076FJ92M4', 'description': 'mueller austria hypergrind precision electric spice/coffee grinder millwith large grinding capacity and hd motor also for spices, herbs, nuts,grains, white', 'discountedItemPrice': '22.97', 'finalPrice': '22.97', 'itemPrice': '22.97', 'originalReceiptItemText': 'mueller austria hypergrind precision electric spice/coffee grinder millwith large grinding capacity and hd motor also for spices, herbs, nuts,grains, white', 'partnerItemId': '0', 'priceAfterCoupon': '22.97', 'quantityPurchased': 1}, {'barcode': 'B07BRRLSVC', 'description': 'thindust summer face mask - sun protection neck gaiter for outdooractivities', 'discountedItemPrice': '11.99', 'finalPrice': '11.99', 'itemPrice': '11.99', 'originalReceiptItemText': 'thindust summer face mask - sun protection neck gaiter for outdooractivities', 'partnerItemId': '1', 'priceAfterCoupon': '11.99', 'quantityPurchased': 1}]",REJECTED,34.96,5fc961c3b8cfca11a077dd33


In [220]:
# convert to str - causing an error when loading to SQLite - InterfaceError: Error binding parameter 11 - probably unsupported type.
df_receipts_load.rewardsReceiptItemList = df_receipts_load.rewardsReceiptItemList.astype(str)
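One caveat with `astype(str)`: it stores the Python representation of each list (single quotes, `True`/`False`), which is not valid JSON, and missing values end up as the literal string `nan`. If the column should stay parseable later, serializing with `json.dumps` is an alternative. A minimal sketch, assuming a hypothetical helper name `items_to_json` and a toy item list:

```python
import json
import math


def items_to_json(value):
    """Serialize an item list to JSON text; map missing values to None (SQL NULL)."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return None
    return json.dumps(value)


# Toy item list shaped like one entry of rewardsReceiptItemList
items = [{"barcode": "4011", "needsFetchReview": False, "quantityPurchased": 5}]

as_json = items_to_json(items)  # double quotes and lowercase false: valid JSON
as_repr = str(items)            # Python repr: single quotes and False
round_trip = json.loads(as_json)
```

The helper could be applied to the column with `Series.map` before loading, so receipts without items load as NULL instead of a string.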

In [221]:
# create receipts table
conn = sqlite3.connect(db_path)
c = conn.cursor()

c.execute('DROP TABLE IF EXISTS receipts')

c.execute("""CREATE TABLE IF NOT EXISTS receipts (
        id uuid PRIMARY KEY,
        bonusPointsEarned numeric,
        bonusPointsEarnedReason text,
        createDate timestamp,
        dateScanned timestamp,
        finishedDate timestamp,
        modifyDate timestamp,
        pointsAwardedDate timestamp,
        pointsEarned numeric,
        purchaseDate timestamp,
        purchasedItemCount numeric,
        rewardsReceiptItemList text,
        rewardsReceiptStatus text,
        totalSpent numeric,
        userId text
    )""")

df_receipts_load.to_sql('receipts', conn, if_exists='append', index=False)


conn.commit()
conn.close()

In [222]:
df_receipt_items.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3240 entries, 0 to 3239
Data columns (total 10 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   receipt_id              3240 non-null   object 
 1   item_index              3240 non-null   int64  
 2   barcode                 3090 non-null   object 
 3   userFlaggedBarcode      337 non-null    object 
 4   description             2859 non-null   object 
 5   userFlaggedDescription  205 non-null    object 
 6   finalPrice              3066 non-null   object 
 7   userFlaggedPrice        299 non-null    object 
 8   quantityPurchased       3066 non-null   float64
 9   userFlaggedQuantity     299 non-null    float64
dtypes: float64(2), int64(1), object(7)
memory usage: 253.2+ KB


In [223]:
# df_receipt_items already has all the columns I want
df_receipt_items_load = df_receipt_items

In [216]:
# create receipt_items table
conn = sqlite3.connect(db_path)
c = conn.cursor()

c.execute('DROP TABLE IF EXISTS receipt_items')

c.execute("""CREATE TABLE IF NOT EXISTS receipt_items (
        receipt_id text,
        item_index numeric,
        barcode text,
        userFlaggedBarcode text,
        description text,
        userFlaggedDescription text,
        finalPrice numeric,
        userFlaggedPrice numeric,
        quantityPurchased numeric,
        userFlaggedQuantity numeric,
        -- 9/24 - brandCode was added to df_receipt_items, so the table needs the column too
        brandCode text
    )""")

df_receipt_items_load.to_sql('receipt_items', conn, if_exists='append', index=False)


conn.commit()
conn.close()
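With brandCode available on receipt_items, the join to brands discussed earlier can be sketched as below. This is an illustration on a throwaway in-memory database with toy rows and made-up ids, not a query run against fetch.db. Because brandCode is not guaranteed unique in brands, a join like this could count an item more than once in the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy tables with only the columns needed for the join
conn.executescript(
    """
    CREATE TABLE brands (id text PRIMARY KEY, name text, brandCode text);
    CREATE TABLE receipt_items (receipt_id text, item_index numeric, brandCode text);
    INSERT INTO brands VALUES ('b1', 'Pace', 'PACE'), ('b2', 'Starbucks', 'STARBUCKS');
    INSERT INTO receipt_items VALUES ('r1', 0, 'PACE'), ('r1', 1, NULL), ('r2', 0, 'PACE');
    """
)
query = """
    SELECT b.name, COUNT(*) AS item_count
    FROM receipt_items AS ri
    JOIN brands AS b ON b.brandCode = ri.brandCode
    GROUP BY b.name
    ORDER BY item_count DESC;
"""
brand_counts = conn.execute(query).fetchall()
conn.close()
```

Items with a NULL brandCode drop out of the inner join, which is the gap the description-based extraction is meant to narrow.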

In [217]:
# scratch pad to quickly check that the tables are correct

table_name = 'brands'
# table_name = 'users'
# table_name = 'receipts'
# table_name = 'receipt_items'

# query=f"""
# pragma table_info({table_name})
# """

query=f"""
    select 
        * 
    from 
        {table_name};
"""

# query=f"""
#     select 
#          *
#         -- count(*) 
#     from 
#         {table_name}
#     where
#         finishedDate is Null;
# """

conn = sqlite3.connect(db_path)

# read-only query, so no cursor or commit is needed
df = pd.read_sql_query(query, conn)

conn.close()

df

Unnamed: 0,id,barcode,dupe_barcode,category,categoryCode,cpg,name,topBrand,brandCode
0,601ac115be37ce2ead437551,511111019862,,Baking,BAKING,601ac114be37ce2ead437550,test brand @1612366101024,0.0,
1,601c5460be37ce2ead43755f,511111519928,,Beverages,BEVERAGES,5332f5fbe4b03c9a25efd0ba,Starbucks,0.0,STARBUCKS
2,601ac142be37ce2ead43755d,511111819905,,Baking,BAKING,601ac142be37ce2ead437559,test brand @1612366146176,0.0,TEST BRANDCODE @1612366146176
3,601ac142be37ce2ead43755a,511111519874,,Baking,BAKING,601ac142be37ce2ead437559,test brand @1612366146051,0.0,TEST BRANDCODE @1612366146051
4,601ac142be37ce2ead43755e,511111319917,,Candy & Sweets,CANDY_AND_SWEETS,5332fa12e4b03c9a25efd1e7,test brand @1612366146827,0.0,TEST BRANDCODE @1612366146827
...,...,...,...,...,...,...,...,...,...
1162,5f77274dbe37ce6b592e90c0,511111116752,,Baking,BAKING,5f77274dbe37ce6b592e90bf,test brand @1601644365844,,
1163,5dc1fca91dda2c0ad7da64ae,511111706328,,Breakfast & Cereal,,53e10d6368abd3c7065097cc,Dippin Dots® Cereal,,DIPPIN DOTS CEREAL
1164,5f494c6e04db711dd8fe87e7,511111416173,,Candy & Sweets,CANDY_AND_SWEETS,5332fa12e4b03c9a25efd1e7,test brand @1598639215217,,TEST BRANDCODE @1598639215217
1165,5a021611e4b00efe02b02a57,511111400608,,Grocery,,5332f5f6e4b03c9a25efd0b4,LIPTON TEA Leaves,0.0,LIPTON TEA Leaves
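To check every table's columns in one pass instead of switching `table_name` by hand, the `pragma table_info` query from the scratch pad can be combined with `sqlite_master`. A minimal sketch, assuming a hypothetical helper name `describe_tables` and an in-memory table standing in for fetch.db:

```python
import sqlite3


def describe_tables(conn):
    """Map each table name to its list of (column name, declared type)."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    schema = {}
    for name in names:
        # Quote the identifier; PRAGMA arguments cannot be bound as parameters
        quoted = '"' + name.replace('"', '""') + '"'
        # table_info rows are (cid, name, type, notnull, dflt_value, pk)
        info = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[name] = [(col[1], col[2]) for col in info]
    return schema


# Stand-in table for the demonstration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id uuid PRIMARY KEY, active bool, state text)")
schema = describe_tables(conn)
conn.close()
```

Against fetch.db, the same helper would be called with a connection opened on `db_path`.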


### fetch.db schema

![fetch_db_erd-2.jpg](attachment:fetch_db_erd-2.jpg)