# Shift4Shop - API Scrape of All Products

The code below calls the Shift4Shop (formerly 3dcart) REST API for the Strobes N More website to retrieve every product in the store.

# Pandas

The pandas library is imported here for DataFrame manipulation.

In [1]:
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option("display.max_colwidth", 100)

# API call - First 200 products

This snippet uses the urllib.request module to fetch product data from the API of the 3dcart e-commerce platform.

It sends an HTTP GET request to the API endpoint ('https://apirest.3dcart.com/3dCartWebAPI/v2/Products?limit=200') with headers containing the authentication credentials. It then reads the response, decodes it from bytes to a UTF-8 string, and parses the JSON into Python objects (a list of dictionaries, one per product). Finally, it appends this list of products to the product_lists list.

In [2]:
# Import necessary modules
from urllib.request import Request, urlopen
import json

# Initialize an empty list to store product lists
product_lists = []

# Define headers required for the API request
headers = {
    'Content-Type': 'application/json',
    'Accept': 'application/json',
    'SecureURL': 'https://www.strobesnmore.com',  # Secure URL of the store being queried (not the API endpoint)
    'PrivateKey': 'mykey',  # Private key for authentication
    'Token': 'mytoken'  # Token for authentication
}

# Construct the request object with the specified URL and headers
request = Request('https://apirest.3dcart.com/3dCartWebAPI/v2/Products?limit=200', headers=headers)

# Open the URL and read the response
body_1 = urlopen(request).read()

# Decode the response bytes to a string using UTF-8 encoding
response_str = body_1.decode('utf-8')

# Load the JSON string into a Python list of dictionaries
product_list = json.loads(response_str)

# Append the retrieved product list to the main product_lists list
product_lists.append(product_list)

In [3]:
len(product_lists[0])

200
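The request URL is written out by hand in each cell. As a minimal sketch, the query string could be assembled from parameters instead. The helper name `build_products_request` and the constant `BASE_URL` are hypothetical and not part of the original notebook; only the endpoint and the `limit`/`offset` parameters come from the code in this notebook.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint taken from the notebook; the helper below is a hypothetical convenience
BASE_URL = 'https://apirest.3dcart.com/3dCartWebAPI/v2/Products'


def build_products_request(headers, limit=200, offset=None):
    """Build (but do not send) a GET request for one page of products."""
    params = {'limit': limit}
    # Only add the offset when one is given, mirroring the first request in the notebook
    if offset is not None:
        params['offset'] = offset
    return Request(f'{BASE_URL}?{urlencode(params)}', headers=headers)


request = build_products_request({'Accept': 'application/json'}, offset=400)
print(request.full_url)
```

Nothing is sent over the network here; the returned object can be passed to `urlopen` exactly as in the cells of this notebook.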

This code iterates over a list of offsets for paginated API requests. For each offset, it constructs the URL, sends an HTTP GET request with the same headers, decodes the response from bytes to a UTF-8 string, parses the JSON into a list of product dictionaries, prints the length of that list, and appends it to product_lists.

In [4]:
# Import necessary modules
from urllib.request import Request, urlopen
import json

# Initialize a list of numbers to be used as offsets for paginated API requests
numbered_list = [200, 400, 600, 800, 1000, 1200]

# Define headers required for the API request
headers = {
    'Content-Type': 'application/json',
    'Accept': 'application/json',
    'SecureURL': 'https://www.strobesnmore.com',  # Secure URL of the store being queried (not the API endpoint)
    'PrivateKey': 'mykey',  # Private key for authentication
    'Token': 'mytoken'  # Token for authentication
}

# Iterate over the list of numbers
for number in numbered_list:
    # Construct the request URL with the specified offset
    request = Request(f'https://apirest.3dcart.com/3dCartWebAPI/v2/Products?limit=200&offset={number}', headers=headers)
    
    # Open the URL and read the response
    response_body = urlopen(request).read()
    
    # Decode the response body from bytes to a string using UTF-8 encoding
    response_str = response_body.decode('utf-8')
    
    # Load the JSON string into a Python list of dictionaries
    product_list = json.loads(response_str)
    
    # Print the length of the retrieved product list
    print(f"List_{number} has length: ", len(product_list))
    
    # Append the retrieved product list to the main product_lists list
    product_lists.append(product_list)

List_200 has length:  200
List_400 has length:  200
List_600 has length:  200
List_800 has length:  200
List_1000 has length:  200
List_1200 has length:  93
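
The offsets above were fixed in advance for a catalog of roughly 1,300 products. A minimal sketch of an alternative is to keep requesting pages until one comes back shorter than the limit. The function `fetch_all_pages` and the stand-in `fake_fetch_page` are hypothetical; the real HTTP call is replaced by slicing a fake catalog so the example runs offline. How the real API responds to an offset past the last product is not shown in this notebook, so that case would need checking before relying on this loop.

```python
from typing import Callable, List


def fetch_all_pages(fetch_page: Callable[[int, int], List[dict]], limit: int = 200) -> List[dict]:
    """Collect records page by page until a page comes back short."""
    records: List[dict] = []
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        records.extend(page)
        # A page shorter than `limit` (including an empty one) means there is no more data
        if len(page) < limit:
            break
        offset += limit
    return records


# Stand-in for the real HTTP call: serves slices of a fake 1293-item catalog
fake_catalog = [{"id": i} for i in range(1293)]


def fake_fetch_page(limit: int, offset: int) -> List[dict]:
    return fake_catalog[offset:offset + limit]


all_products = fetch_all_pages(fake_fetch_page, limit=200)
print(len(all_products))  # 1293
```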


## Verification of Data
Printing the length of each product_list verifies that each API request returned the expected number of products: 200 for every full page and 93 for the final page.

In [5]:
for product_list in product_lists:
    print(len(product_list))

200
200
200
200
200
200
93


Description:

- Initialization: The code starts by initializing an empty list named master_list, which will be used to store all the products from the individual product_lists.

- Nested Loop: It then iterates over each product_list in the product_lists list using the outer loop. Within this loop, there is another loop that iterates over each product in the current product_list.

- Combining Products: For each product in each product_list, the code appends the product to the master_list. This effectively combines all the individual product lists into a single list containing all the products.

- Total Count: After iterating over all the product lists and appending all the products to the master_list, the code prints the total number of products in the master_list using the len() function.

- This code snippet essentially flattens the list of lists (product_lists) into a single list (master_list) containing all the products from the individual lists. This can be useful for further processing or analysis where a single list of products is required.

In [6]:
master_list = []

for product_list in product_lists:
    for product in product_list:
        master_list.append(product)
print(len(master_list))

1293
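
The same flattening can be written without explicit nested loops. The sketch below uses `itertools.chain.from_iterable` from the standard library on a small made-up list of pages; the sample data is illustrative only.

```python
from itertools import chain

# Two made-up "pages" standing in for product_lists
pages = [[{"id": 1}, {"id": 2}], [{"id": 3}]]

# chain.from_iterable walks each inner list in order and yields its items
flat = list(chain.from_iterable(pages))
print(len(flat))  # 3
```

A list comprehension, `[product for page in pages for product in page]`, gives the same result.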


# Split Between Option and No Option

Description:

- Initialization: The code initializes two empty lists: Options and No_Options. These lists will be used to categorize products based on whether they have options or not.

- Loop through Products: It then iterates over each product in the master_list, which contains all the products combined from the product_lists.

- Check for Options: Within the loop, it checks whether each product has options or not. This is done by examining the length of the OptionSetList within each product dictionary.

- Categorization: If a product has options (i.e., if the length of its OptionSetList is not zero), it is appended to the Options list. Otherwise, if the product has no options, it is appended to the No_Options list.

- Result: At the end of the loop, the Options list will contain all products that have options, while the No_Options list will contain all products that do not have any options.

In [7]:
# Initialize empty lists to store products with options and products without options
Options = []
No_Options = []

# Iterate over each product in the master list
for product in master_list:
    # Check if the product has any options by inspecting the length of its OptionSetList
    if len(product['OptionSetList']) != 0:
        # If the product has options, append it to the Options list
        Options.append(product)
    else:
        # If the product has no options, append it to the No_Options list
        No_Options.append(product)
        

In [8]:
len(Options)

516

In [9]:
len(No_Options)

777

In [10]:
len(Options) + len(No_Options) == 1293

True
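
The split can also be wrapped in a small function. This is a sketch under one assumption that differs from the cell above: it uses `dict.get`, so a product missing the `OptionSetList` key entirely is treated as having no options rather than raising `KeyError`. The function name `split_by_options` and the sample products are hypothetical.

```python
def split_by_options(products):
    """Partition products into (with_options, without_options)."""
    with_options, without_options = [], []
    for product in products:
        # An empty list, None, or a missing key all count as "no options"
        target = with_options if product.get("OptionSetList") else without_options
        target.append(product)
    return with_options, without_options


sample_products = [
    {"Name": "A", "OptionSetList": [{"OptionSetID": 1}]},
    {"Name": "B", "OptionSetList": []},
    {"Name": "C"},
]
has_options, no_options = split_by_options(sample_products)
print(len(has_options), len(no_options))  # 1 2
```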

## No Options Dataframe

- Initialization: Several empty lists are initialized to store specific attributes of products that do not have options: CatalogID, SKU, Name, the NotForSale flag, and the Hide flag.

- Extraction of Attributes: The code iterates over each product in the No_Options list, extracting the relevant attributes from each product dictionary. These attributes are then appended to their respective lists.

- Combination of Attributes: After extracting the attributes for all products without options, the lists are combined into a dictionary named Products_No_Options. Each key in the dictionary represents an attribute (e.g., 'CatalogID', 'SKU'), and the corresponding value is a list containing the values of that attribute for all products.

- Output: The dictionary is converted into a pandas DataFrame, and its first 100 entries are displayed. The extracted lists can also be printed for inspection by uncommenting the print statements.

In [11]:
Products_No_Options = pd.DataFrame(columns = ['CatalogID','SKU','Name','Hide','NotForSale'])

In [12]:
# Initialize empty lists to store specific attributes of products without options
catalogid_list_noOptions = []
SKU_list_noOptions = []
Name_list_noOptions = []
sale_noOptions = []
hide_noOptions = []

# Iterate over each product in the No_Options list
for product in No_Options:
    # Extract relevant attributes from each product
    forsale = product['NotForSale']  # Retrieve the 'NotForSale' attribute
    hide = product['Hide']  # Retrieve the 'Hide' attribute
    catalogid = product['SKUInfo']['CatalogID']  # Retrieve the CatalogID from the SKUInfo dictionary
    SKU = product['SKUInfo']['SKU']  # Retrieve the SKU from the SKUInfo dictionary
    Name = product['SKUInfo']['Name']  # Retrieve the Name from the SKUInfo dictionary
    
    # Append extracted attributes to respective lists
    catalogid_list_noOptions.append(catalogid)
    SKU_list_noOptions.append(SKU)
    Name_list_noOptions.append(Name)
    sale_noOptions.append(forsale)
    hide_noOptions.append(hide)

# Combine extracted attributes into a dictionary of columns
Products_No_Options = {
    'CatalogID': catalogid_list_noOptions,
    'SKU': SKU_list_noOptions,
    'Name': Name_list_noOptions,
    'Hide': hide_noOptions,
    'NotForSale': sale_noOptions
}
# Convert the dictionary into a DataFrame
Products_No_Options_df = pd.DataFrame(Products_No_Options)

# Display the extracted attributes (optional)
# print(catalogid_list_noOptions)
# print(SKU_list_noOptions)
# print(Name_list_noOptions)

# Display the first 100 entries of the DataFrame
Products_No_Options_df.head(100)

Unnamed: 0,CatalogID,SKU,Name,Hide,NotForSale
0,3465,11.1005SF,Sho Me Universal Strobe Style LED Flasher,False,False
1,3466,11.1032,Sho Me Micro Switch with Built In LED Flasher,False,False
2,3478,295SL100,Whelen Full Function Hands Free Siren and Heavy Duty Microphone,False,False
3,3480,30.0215P,Sho Me Concealed Scoop 100 Watt Speaker,False,False
4,3481,30.2104,Sho Me Four Function Undercover Siren,False,False
5,3491,6ELTUBE,Whelen 6ELTUBE Strobe Tube,False,False
6,3494,7ELTUBE,Whelen 700 Series Linear Strobe Tube/Reflector Assembly,False,False
7,3497,9UTUBE,Whelen Edge 9M/Ultra Corner Tube/Reflector,False,False
8,3499,ALPHA12S,Whelen Alpha Hands Free Siren,False,False
9,3501,AMP-3PK,Standard AMP Pin & Connector Kit Male,False,False


In [13]:
len(Products_No_Options_df)

777
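
An alternative to maintaining five parallel lists is to build one dictionary per product. The sketch below assumes the same keys used in the cell above (`SKUInfo`, `Hide`, `NotForSale`); the helper name `to_row` is hypothetical, and the two sample products reuse values from the output table.

```python
def to_row(product):
    """Pick the five attributes of interest out of one product dictionary."""
    sku_info = product["SKUInfo"]
    return {
        "CatalogID": sku_info["CatalogID"],
        "SKU": sku_info["SKU"],
        "Name": sku_info["Name"],
        "Hide": product["Hide"],
        "NotForSale": product["NotForSale"],
    }


sample_no_options = [
    {"SKUInfo": {"CatalogID": 3465, "SKU": "11.1005SF",
                 "Name": "Sho Me Universal Strobe Style LED Flasher"},
     "Hide": False, "NotForSale": False},
    {"SKUInfo": {"CatalogID": 3466, "SKU": "11.1032",
                 "Name": "Sho Me Micro Switch with Built In LED Flasher"},
     "Hide": False, "NotForSale": False},
]

rows = [to_row(product) for product in sample_no_options]
print(rows[0]["SKU"])  # 11.1005SF
```

A list of row dictionaries like `rows` can be passed directly to `pd.DataFrame(rows)`, which produces one column per key.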

# Options Dataframe

- Initialization: Two empty lists, df_list and df_list_2, are initialized to store DataFrames. df_list collects products that yield at least one option row; df_list_2 collects products from the Options list whose option sets turn out to contain no options.

- Iterating over Products: The code iterates over each product in the Options list, extracting the common attributes: catalog ID, SKU, name, the NotForSale flag, and the Hide flag.

- Option Extraction: For each product, it walks every option set in OptionSetList and collects each option's name, part number, and OptionHide flag along with the set's OptionSetID and OptionSorting, storing the result in a DataFrame options_df.

- Concatenation: If options_df is not empty, the single-row product DataFrame (product_df) is repeated once per option row, joined column-wise with options_df, and appended to df_list. Otherwise, product_df is appended to df_list_2.

- Concatenating DataFrames: After iterating over all products, if df_list is not empty, all DataFrames in df_list are concatenated into a single DataFrame Options_df, and those in df_list_2 into Options_df_2.

- Data Type Conversion: The 'Hidden' column of Options_df is converted to integer type. Options_df_2 has no 'Hidden' column, so one is created from its 'Hide' column.

- Output: The combined Options_df is previewed in the following cell. If df_list is empty, a message is printed indicating there is no data to concatenate.

In [14]:
df_list = []
df_list_2 = []

for product in Options:
    catalog_id = product['SKUInfo']['CatalogID']
    sku = product['SKUInfo']['SKU']
    name = product['SKUInfo']['Name']
    forsale = product['NotForSale']
    not_on_site = product['Hide']
    
    # Build a one-row DataFrame holding the product-level attributes
    product_df = pd.DataFrame({
        'CatalogID': [catalog_id],
        'SKU': [sku],
        'Name': [name],
        'NotForSale': [forsale],
        'Hide': [not_on_site],  # Product-level Hide flag from the API
    })

    option_data = []
    for option in product['OptionSetList']:
        option_set_id = option['OptionSetID']
        option_sorting = option['OptionSorting']
        option_list = option['OptionList']
        # Use a separate name (opt) so the comprehensions do not shadow the outer loop variable
        hidden_options = [opt['OptionHide'] for opt in option_list]
        option_names = [opt['OptionName'] for opt in option_list]
        partnumbers = [opt['OptionPartNumber'] for opt in option_list]

        option_data.extend(list(zip(option_names, partnumbers, [option_set_id] * len(option_list), [option_sorting] * len(option_list), hidden_options)))

    options_df = pd.DataFrame(option_data, columns=['OptionName', 'OptionPartNumber', 'OptionSetID', 'OptionSorting', 'Hidden'])

    if not options_df.empty:
        product_df = pd.concat([product_df] * len(options_df), ignore_index=True)
        result_df = pd.concat([product_df, options_df], axis=1)
        df_list.append(result_df)
    else:
        df_list_2.append(product_df)

# Concatenate all data frames in df_list if it is not empty
if df_list:
    Options_df = pd.concat(df_list, ignore_index=True)
    # Note: pd.concat raises ValueError if df_list_2 is empty
    Options_df_2 = pd.concat(df_list_2, ignore_index=True)
    Options_df['Hidden'] = Options_df['Hidden'].astype(int)
    Options_df_2['Hidden'] = Options_df_2['Hide'].astype(int)  # No option rows here, so derive 'Hidden' from 'Hide'
    # A bare expression inside an if block is not displayed by the notebook,
    # so the preview of Options_df is left to the next cell
else:
    print("No data to concatenate.")


In [15]:
Options_df.head()

Unnamed: 0,CatalogID,SKU,Name,NotForSale,Hide,OptionName,OptionPartNumber,OptionSetID,OptionSorting,Hidden
0,3457,RB6T,Whelen Dual Reflector Rota Beam Beacon,False,False,Blue,BP,2451,0,0
1,3457,RB6T,Whelen Dual Reflector Rota Beam Beacon,False,False,Clear,CP,2451,0,0
2,3457,RB6T,Whelen Dual Reflector Rota Beam Beacon,False,False,Red,RP,2451,0,1
3,3457,RB6T,Whelen Dual Reflector Rota Beam Beacon,False,False,Amber,AP,2451,0,1
4,3474,11.1002,Sho Me Universal On/Off/Flash Switch,False,False,Positive Momentary,.PM,4114,1,0


In [16]:
len(Options_df['CatalogID'].unique())

509

In [17]:
len(Options_df)

4389

In [18]:
Options_df['Hide'].unique()

array([False,  True])
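
The per-product logic above can also be expressed as plain row dictionaries, which avoids building and concatenating many small DataFrames. This is a sketch, not the notebook's method: the helper name `flatten_option_rows` is hypothetical, and the sample assumes `OptionHide` arrives as a boolean, which the notebook does not confirm.

```python
def flatten_option_rows(product):
    """Return one flat dict per option, repeating the product-level fields."""
    sku_info = product["SKUInfo"]
    base = {
        "CatalogID": sku_info["CatalogID"],
        "SKU": sku_info["SKU"],
        "Name": sku_info["Name"],
        "NotForSale": product["NotForSale"],
        "Hide": product["Hide"],
    }
    rows = []
    for option_set in product["OptionSetList"]:
        for opt in option_set["OptionList"]:
            rows.append({
                **base,
                "OptionName": opt["OptionName"],
                "OptionPartNumber": opt["OptionPartNumber"],
                "OptionSetID": option_set["OptionSetID"],
                "OptionSorting": option_set["OptionSorting"],
                "Hidden": int(opt["OptionHide"]),  # assumed boolean in this sample
            })
    return rows


sample_product = {
    "SKUInfo": {"CatalogID": 3457, "SKU": "RB6T",
                "Name": "Whelen Dual Reflector Rota Beam Beacon"},
    "NotForSale": False,
    "Hide": False,
    "OptionSetList": [
        {"OptionSetID": 2451, "OptionSorting": 0, "OptionList": [
            {"OptionName": "Blue", "OptionPartNumber": "BP", "OptionHide": False},
            {"OptionName": "Red", "OptionPartNumber": "RP", "OptionHide": True},
        ]},
    ],
}

option_rows = flatten_option_rows(sample_product)
print(len(option_rows))  # 2
```

A product whose option sets are all empty returns an empty list, which corresponds to the df_list_2 case above.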

# Filtering by products on the website

In [19]:
Products_Options_onWebsite = Options_df[Options_df['Hide'] == False]

In [20]:
len(Products_Options_onWebsite)

3955

In [21]:
Products_Options_onWebsite.head()

Unnamed: 0,CatalogID,SKU,Name,NotForSale,Hide,OptionName,OptionPartNumber,OptionSetID,OptionSorting,Hidden
0,3457,RB6T,Whelen Dual Reflector Rota Beam Beacon,False,False,Blue,BP,2451,0,0
1,3457,RB6T,Whelen Dual Reflector Rota Beam Beacon,False,False,Clear,CP,2451,0,0
2,3457,RB6T,Whelen Dual Reflector Rota Beam Beacon,False,False,Red,RP,2451,0,1
3,3457,RB6T,Whelen Dual Reflector Rota Beam Beacon,False,False,Amber,AP,2451,0,1
4,3474,11.1002,Sho Me Universal On/Off/Flash Switch,False,False,Positive Momentary,.PM,4114,1,0


# No Options Products

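These 6 products were placed in the Options list because their OptionSetList was not empty, yet they yielded no option rows (attributed to string complications in the source data), so they never reached the no-options dataframe. They are added to the master no-options list here.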

The code below added this short list to all the products without options. 

In [22]:
len(Options_df_2['CatalogID'].unique())

6

In [23]:
Options_df_2 = Options_df_2[['CatalogID','SKU','Name','Hide','NotForSale']]

In [24]:
Options_df_2

Unnamed: 0,CatalogID,SKU,Name,Hide,NotForSale
0,3951,JF0BAAAA,Whelen Towman's Justice Lightbar,False,False
1,6985,GARAGE-MB1,Federal Signal MB1 Message Bar,True,True
2,7001,R-4600,"Feniex Reverse Lux 6x4"" LED",False,False
3,7002,R-3700,"Feniex Reverse Lux 7x3"" LED",False,False
4,7003,R-7900,"Feniex Reverse Lux 9x7"" LED",False,False
5,7004,BTT-3XB,Feniex Backup/Brake/Tail Turn/Reverse LUX LED,False,False


In [25]:
AllSKUs_No_Options = pd.concat([Products_No_Options_df,Options_df_2])
AllSKUs_No_Options = AllSKUs_No_Options[['CatalogID','SKU','Name','Hide','NotForSale']]

In [26]:
AllSKUs_No_Options.to_csv('SKUs_No_Options.csv')

In [27]:
SKUs_No_Options = pd.read_csv('SKUs_No_Options.csv')
SKUs_No_Options.head(20)
# Options_df_2 = Options_df_2[['CatalogID','SKU','Name']]

Unnamed: 0.1,Unnamed: 0,CatalogID,SKU,Name,Hide,NotForSale
0,0,3465,11.1005SF,Sho Me Universal Strobe Style LED Flasher,False,False
1,1,3466,11.1032,Sho Me Micro Switch with Built In LED Flasher,False,False
2,2,3478,295SL100,Whelen Full Function Hands Free Siren and Heavy Duty Microphone,False,False
3,3,3480,30.0215P,Sho Me Concealed Scoop 100 Watt Speaker,False,False
4,4,3481,30.2104,Sho Me Four Function Undercover Siren,False,False
5,5,3491,6ELTUBE,Whelen 6ELTUBE Strobe Tube,False,False
6,6,3494,7ELTUBE,Whelen 700 Series Linear Strobe Tube/Reflector Assembly,False,False
7,7,3497,9UTUBE,Whelen Edge 9M/Ultra Corner Tube/Reflector,False,False
8,8,3499,ALPHA12S,Whelen Alpha Hands Free Siren,False,False
9,9,3501,AMP-3PK,Standard AMP Pin & Connector Kit Male,False,False


In [28]:
len(AllSKUs_No_Options['CatalogID'].unique())

783
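
The `Unnamed: 0` columns in the read-back output above come from the DataFrame index: `to_csv` writes the index by default, and `read_csv` then loads it as an unnamed column. The sketch below shows one way to avoid that on toy data, using `ignore_index=True` when concatenating and `index=False` when writing. It also passes `dtype={"SKU": str}` when reading, since a SKU such as `11.1032` would otherwise be parsed as a number. The in-memory buffer stands in for a file on disk.

```python
import io

import pandas as pd

# Toy frames standing in for Products_No_Options_df and Options_df_2
first = pd.DataFrame({"CatalogID": [3465, 3466], "SKU": ["11.1005SF", "11.1032"]})
second = pd.DataFrame({"CatalogID": [3951], "SKU": ["JF0BAAAA"]})

# ignore_index=True renumbers the rows 0..n-1 instead of repeating each frame's own index
combined = pd.concat([first, second], ignore_index=True)

# index=False keeps the index out of the CSV, so no "Unnamed: 0" column appears on reload
buffer = io.StringIO()
combined.to_csv(buffer, index=False)
buffer.seek(0)

# dtype keeps SKUs as text rather than letting "11.1032" become a float
round_trip = pd.read_csv(buffer, dtype={"SKU": str})
print(list(round_trip.columns))  # ['CatalogID', 'SKU']
```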

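# Filtering products dataframe to a max of 8 options

After the initial delivery, the client agreed that only products with up to 8 options should be included. In the code below, this is measured as the number of distinct OptionSorting values per CatalogID.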

In [29]:
import pandas as pd

# Column holding the product identifier
catalogid_column_name = 'CatalogID'
# Column holding the OptionSorting value of each option set
sorting_column_name = 'OptionSorting'

# List of catalogids with up to 8 unique OptionSorting values
catalogids_with_up_to_8_unique_options = []

# Iterate through unique catalogids in the DataFrame
for catalogid in Products_Options_onWebsite[catalogid_column_name].unique():
    # Filter DataFrame for the current catalogid
    catalogid_data = Products_Options_onWebsite[Products_Options_onWebsite[catalogid_column_name] == catalogid]

    # Count the number of unique sorting values
    unique_sorting_count = len(catalogid_data[sorting_column_name].unique())

    # Check if the current catalogid has up to 8 unique values
    if unique_sorting_count <= 8:
        catalogids_with_up_to_8_unique_options.append(catalogid)

# Filter the original DataFrame based on the selected catalogids
filtered_df = Products_Options_onWebsite[Products_Options_onWebsite[catalogid_column_name].isin(catalogids_with_up_to_8_unique_options)]


In [30]:
len(filtered_df)

3734

In [31]:
filtered_df.head(20)

Unnamed: 0,CatalogID,SKU,Name,NotForSale,Hide,OptionName,OptionPartNumber,OptionSetID,OptionSorting,Hidden
0,3457,RB6T,Whelen Dual Reflector Rota Beam Beacon,False,False,Blue,BP,2451,0,0
1,3457,RB6T,Whelen Dual Reflector Rota Beam Beacon,False,False,Clear,CP,2451,0,0
2,3457,RB6T,Whelen Dual Reflector Rota Beam Beacon,False,False,Red,RP,2451,0,1
3,3457,RB6T,Whelen Dual Reflector Rota Beam Beacon,False,False,Amber,AP,2451,0,1
4,3474,11.1002,Sho Me Universal On/Off/Flash Switch,False,False,Positive Momentary,.PM,4114,1,0
5,3474,11.1002,Sho Me Universal On/Off/Flash Switch,False,False,Negative/Ground Momentary,.GSM,4114,1,0
6,3475,11.120,Sho Me Low Profile LED Mini Lightbar,False,False,Magnetic,0.008,2457,0,0
7,3475,11.120,Sho Me Low Profile LED Mini Lightbar,False,False,Permanent,0,2457,0,0
8,3475,11.120,Sho Me Low Profile LED Mini Lightbar,False,False,Blue,-BB,2458,1,0
9,3475,11.120,Sho Me Low Profile LED Mini Lightbar,False,False,Red/Blue,-RB,2458,1,0


## Exporting Options Df to CSV

In [32]:
filtered_df.to_csv('All_Options.csv')

# Filter by more than 8

Here we collect the products on the website with more than 8 options (counted as distinct OptionSorting values per CatalogID) for inventory purposes.

In [33]:
import pandas as pd

# Column holding the product identifier
catalogid_column_name = 'CatalogID'
# Column holding the OptionSorting value of each option set
sorting_column_name = 'OptionSorting'

# List of catalogids with more than 8 unique OptionSorting values
catalogids_with_over_8_unique_options = []

# Iterate through unique catalogids in the DataFrame
for catalogid in Products_Options_onWebsite[catalogid_column_name].unique():
    # Filter DataFrame for the current catalogid
    catalogid_data = Products_Options_onWebsite[Products_Options_onWebsite[catalogid_column_name] == catalogid]

    # Count the number of unique sorting values
    unique_sorting_count = len(catalogid_data[sorting_column_name].unique())

    # Check if the current catalogid has more than 8 unique values
    if unique_sorting_count > 8:
        catalogids_with_over_8_unique_options.append(catalogid)

# Filter the original DataFrame based on the selected catalogids
filtered_over_8 = Products_Options_onWebsite[Products_Options_onWebsite[catalogid_column_name].isin(catalogids_with_over_8_unique_options)]

# Display the first 100 rows of filtered_over_8
filtered_over_8.head(100)

Unnamed: 0,CatalogID,SKU,Name,NotForSale,Hide,OptionName,OptionPartNumber,OptionSetID,OptionSorting,Hidden
311,3817,970L-4908,"Tomar 970L Scorpion 49"" Lightbar",False,False,Blue,-B,2861,1,0
312,3817,970L-4908,"Tomar 970L Scorpion 49"" Lightbar",False,False,Red,-R,2861,1,0
313,3817,970L-4908,"Tomar 970L Scorpion 49"" Lightbar",False,False,Amber,-A,2861,1,0
314,3817,970L-4908,"Tomar 970L Scorpion 49"" Lightbar",False,False,Red,-R,2862,2,0
315,3817,970L-4908,"Tomar 970L Scorpion 49"" Lightbar",False,False,Amber,-A,2862,2,0
316,3817,970L-4908,"Tomar 970L Scorpion 49"" Lightbar",False,False,Blue,-B,2862,2,0
317,3817,970L-4908,"Tomar 970L Scorpion 49"" Lightbar",False,False,Blue,-B,2863,3,0
318,3817,970L-4908,"Tomar 970L Scorpion 49"" Lightbar",False,False,Red/Blue,-RB,2863,3,0
319,3817,970L-4908,"Tomar 970L Scorpion 49"" Lightbar",False,False,White,-W,2863,3,0
320,3817,970L-4908,"Tomar 970L Scorpion 49"" Lightbar",False,False,,-NONE,2863,3,0
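
Both filters (up to 8 and more than 8) loop over every CatalogID and re-filter the DataFrame each time. A minimal sketch of a vectorized alternative uses `groupby(...).nunique()` to count distinct OptionSorting values per product in one pass and returns both halves at once. The function name `split_by_option_set_count` and the toy data are hypothetical. One small difference from the loops: `nunique()` ignores missing values, whereas `len(series.unique())` counts NaN as a value.

```python
import pandas as pd


def split_by_option_set_count(df, max_sets=8, id_col="CatalogID", sort_col="OptionSorting"):
    """Split df into rows of products with <= max_sets distinct sort values, and the rest."""
    # Number of distinct OptionSorting values for each product
    counts = df.groupby(id_col)[sort_col].nunique()
    small_ids = counts[counts <= max_sets].index
    mask = df[id_col].isin(small_ids)
    return df[mask], df[~mask]


# Toy data: product 1 has 2 distinct sort values, product 2 has 9
toy = pd.DataFrame({
    "CatalogID": [1, 1, 1] + [2] * 9,
    "OptionSorting": [0, 0, 1] + list(range(9)),
})

within_limit, over_limit = split_by_option_set_count(toy)
print(len(within_limit), len(over_limit))  # 3 9
```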


In [34]:
filtered_over_8 = filtered_over_8[['CatalogID','SKU','Name','OptionName','OptionPartNumber']]
filtered_over_8.to_csv('SKUs_OverEightOPtions_onWebsite.csv')

# Just in case - No Options Products on Website

Here another CSV is exported listing only the no-option products that are visible on the website (Hide is False). The client requested all products without options; this extra file saves them from checking whether each product is on the website.

In [35]:
AllSKUs_No_Options_onwebsite = AllSKUs_No_Options[AllSKUs_No_Options['Hide'] == False]

In [36]:
AllSKUs_No_Options_onwebsite.to_csv('SKUs_No_Options_OnWebsite.csv')