# Suction Cup Selector Project

This is the notebook for the suction cup selector project.

## Download the project

If you have git installed, you can clone the project from the repository with:

```bash
git clone <url>
```

Once you have cloned the project, you can switch to an existing branch or create your own:

```bash
git checkout <branchname>             # switch to an existing branch
git checkout -b <yourownbranchname>   # create a new branch and switch to it
```

To stage your changes and commit them to the git history, run:

```bash
git add .   # stage all changes
git status  # confirm what is staged
git commit -m "commit message"  # commit the staged changes
```

To push your changes to the remote repository, run:

```bash
git push -u origin <branchname>
```

To submit your changes for review, open a pull request on GitHub.


## Environment setup
For this project I chose Python 3.9 with the pandas library, working in a Jupyter notebook.

In [2]:
import sys
print(sys.version)

3.9.7 (default, Sep 16 2021, 08:50:36) 
[Clang 10.0.0 ]


To isolate the project's dependencies, create a virtual environment. Open a terminal (or cmd) in your project folder and run:

```bash
python3 -m venv myenv
```

To activate the environment, run the following command:
```bash
# for windows
myenv\Scripts\activate

# for Mac/Linux
source myenv/bin/activate
```
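
As a quick sanity check after activation, the interpreter itself can report whether it is running inside a virtual environment. This is a minimal sketch; the helper name `in_virtualenv` is only illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```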

## Data Processing
The first step of the project is to load the flat files. There are several ways to do this; here we use the pandas `read_csv()` function to load each CSV file into a pandas **DataFrame** object.

In [2]:
import pandas as pd

suctionCups = pd.read_csv('SuctionCups.csv')
graspTypes = pd.read_csv('GraspTypes.csv')
items = pd.read_csv('items.csv')
itemConfigs = pd.read_csv('itemConfigs.csv')
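
`read_csv()` infers column types, which is why identifier columns can come back as numbers. If identifiers are wanted as text from the start, the `dtype` parameter can request that. The sketch below uses a small in-memory sample (hypothetical, standing in for `items.csv`) rather than the project files.

```python
import io

import pandas as pd

# Hypothetical two-row sample standing in for items.csv
sample_csv = io.StringIO(
    "item_id,sku_no,weight\n"
    "12810,24287592,2.1\n"
    "19327,,1.47\n"
)

# Ask pandas to keep the identifier columns as strings instead of inferring numbers
sample = pd.read_csv(sample_csv, dtype={"item_id": str, "sku_no": str})

print(sample.dtypes)
print(sample)
```

Missing values in a string column still load as NaN, so null handling is needed either way.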

#### To check the type of an object, use type()


In [3]:
type(items)

pandas.core.frame.DataFrame

#### To inspect the data we just loaded, here are some common methods:
* df.head(): Returns the first few rows of the DataFrame.
* df.tail(): Returns the last few rows of the DataFrame.
* df.shape: Returns the dimensions (rows, columns) of the DataFrame.
* df.info(): Provides information about the DataFrame, including column data types and missing values.
* df.describe(): Generates descriptive statistics of numerical columns, such as count, mean, min, max, etc.
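
The methods listed above can be tried on a tiny DataFrame before running them on the real files. The frame `toy` below is made-up data for illustration only.

```python
import pandas as pd

# Made-up three-row frame with one missing weight
toy = pd.DataFrame(
    {
        "item_id": [1, 2, 3],
        "weight": [0.5, None, 1.2],
        "description": ["TAPE", "GLUE", "PEN"],
    }
)

first_two = toy.head(2)                  # first rows
n_rows, n_cols = toy.shape               # dimensions (rows, columns)
missing_per_column = toy.isna().sum()    # explicit count of missing values
weight_stats = toy["weight"].describe()  # count, mean, std, min, quartiles, max

toy.info()                               # dtypes and non-null counts, printed
print(n_rows, n_cols)
print(missing_per_column)
print(weight_stats)
```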

#### items object overview

In [7]:
items.head()

| | item_id | sku_no | unit_length | unit_width | unit_height | weight | item_description |
|---|---|---|---|---|---|---|---|
| 0 | 12810 | 24287592.0 | 8.4 | 3.7 | 2.3 | 2.1 | SILK PURE ALMOND UNSWT VAN |
| 1 | 19327 | 1266017.0 | 5.5 | 5.5 | 8.4 | 1.47 | NEO-GEL 48PC TUB BLU |
| 2 | 24874 | 24529912.0 | 10.0 | 8.05 | 2.6 | 0.72 | FULL SIZE HOT GLUE GUN |
| 3 | 15205 | 565284.0 | 6.85 | 2.7 | 2.65 | 1.285 | TAPE DISPENSER |
| 4 | 13444 | 2610177.0 | 9.2 | 8.1 | 6.7 | 1.3 | DESKTOP DRAWER SYSTEM SMALL |


In [10]:
items.shape

(7622, 7)

In [12]:
items.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7622 entries, 0 to 7621
Data columns (total 7 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   item_id           7622 non-null   int64  
 1   sku_no            7114 non-null   float64
 2   unit_length       7622 non-null   float64
 3   unit_width        7622 non-null   float64
 4   unit_height       7622 non-null   float64
 5   weight            7622 non-null   float64
 6   item_description  7622 non-null   object 
dtypes: float64(5), int64(1), object(1)
memory usage: 417.0+ KB


In [13]:
items.describe()

| | item_id | sku_no | unit_length | unit_width | unit_height | weight |
|---|---|---|---|---|---|---|
| count | 7622.0 | 7114.0 | 7622.0 | 7622.0 | 7622.0 | 7622.0 |
| mean | 9634.173183 | 6313149.0 | 6.342267 | 4.147844 | 1.943269 | 0.552853 |
| std | 7048.288945 | 9914077.0 | 1.854861 | 1.580922 | 1.478202 | 0.701571 |
| min | 744.0 | 12203.0 | 0.1 | 0.2 | 0.0 | 0.0012 |
| 25% | 4099.5 | 500045.2 | 5.1 | 3.0 | 0.9 | 0.125 |
| 50% | 7782.5 | 831648.5 | 6.1 | 3.8 | 1.4 | 0.3 |
| 75% | 13471.5 | 2735132.0 | 7.6 | 5.0 | 2.6 | 0.7 |
| max | 26945.0 | 24563190.0 | 12.8 | 11.1 | 10.1 | 9.2 |


#### graspTypes object overview

In [14]:
graspTypes.head()

| | id | name | description |
|---|---|---|---|
| 0 | 0 | suction_only | suction only |
| 1 | 1 | default | suction + fingers |
| 2 | 2 | stabilized | stabilized grasp |


In [16]:
graspTypes.shape

(3, 3)

In [17]:
graspTypes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   id           3 non-null      int64 
 1   name         3 non-null      object
 2   description  3 non-null      object
dtypes: int64(1), object(2)
memory usage: 200.0+ bytes


#### suctionCups object overview

In [18]:
suctionCups.head(n=10)

| | id | description | name | minDim | maxDim | maxWeight |
|---|---|---|---|---|---|---|
| 0 | 0 | any | any | 0.0 | 0 | 0.0 |
| 1 | 1 | small-25mm | swappable_vs_25_nr | 0.25 | 5 | 0.8 |
| 2 | 2 | medium | swappable_b3_bgi34 | 2.0 | 1000 | 1.9 |
| 3 | 3 | large | swappable_vsa_63_nr | 3.0 | 1000 | 6.6 |
| 4 | 4 | bag | swappable_bgx_48 | 1.9 | 1000 | 2.42 |
| 5 | 5 | small-18mm | swappable_vs_18_nr | 0.18 | 5 | 0.8 |


In [19]:
suctionCups.shape

(6, 6)

In [20]:
suctionCups.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 6 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   id           6 non-null      int64  
 1   description  6 non-null      object 
 2   name         6 non-null      object 
 3   minDim       6 non-null      float64
 4   maxDim       6 non-null      int64  
 5   maxWeight    6 non-null      float64
dtypes: float64(2), int64(2), object(2)
memory usage: 416.0+ bytes


#### itemConfigs object overview

In [21]:
itemConfigs.head(n=100)

| | item_id | suction_cup_id | name | arm_config | name.1 |
|---|---|---|---|---|---|
| 0 | 823 | 5 | swappable_vs_18_nr | 1 | default |
| 1 | 763 | 0 | any | 1 | default |
| 2 | 7116 | 3 | swappable_vsa_63_nr | 1 | default |
| 3 | 766 | 0 | any | 1 | default |
| 4 | 767 | 0 | any | 0 | suction_only |
| ... | ... | ... | ... | ... | ... |
| 95 | 13424 | 3 | swappable_vsa_63_nr | 1 | default |
| 96 | 3763 | 4 | swappable_bgx_48 | 2 | stabilized |
| 97 | 3926 | 4 | swappable_bgx_48 | 1 | default |
| 98 | 4878 | 4 | swappable_bgx_48 | 2 | stabilized |


In [22]:
itemConfigs.shape

(10819, 5)

In [23]:
itemConfigs.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10819 entries, 0 to 10818
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   item_id         10819 non-null  int64 
 1   suction_cup_id  10819 non-null  int64 
 2   name            10819 non-null  object
 3   arm_config      10819 non-null  int64 
 4   name.1          10819 non-null  object
dtypes: int64(3), object(2)
memory usage: 422.7+ KB


In [26]:
itemConfigs['name'].value_counts()

name
swappable_bgx_48       3775
swappable_vs_25_nr     3696
swappable_vsa_63_nr    1910
any                     812
swappable_vs_18_nr      626
Name: count, dtype: int64

In [33]:
itemConfigs.groupby('item_id').filter(lambda x: len(x)>1).shape

(0, 5)
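
The empty result above means that no `item_id` appears more than once in `itemConfigs`. The same check can be written more directly with `duplicated()` or `is_unique`. The sketch below runs on made-up rows (`toy_configs`) that deliberately contain one repeated ID.

```python
import pandas as pd

# Made-up rows; item 763 appears twice on purpose
toy_configs = pd.DataFrame(
    {
        "item_id": [823, 763, 7116, 763],
        "suction_cup_id": [5, 0, 3, 4],
    }
)

# keep=False marks every row of a repeated ID, not only the later copies
repeated = toy_configs[toy_configs["item_id"].duplicated(keep=False)]

# Single boolean answer: True only if every ID occurs exactly once
is_unique = toy_configs["item_id"].is_unique

print(repeated)
print(is_unique)
```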

## Data Cleaning

A preliminary inspection shows that the dataset contains unsuitable data types and null values. Text fields also frequently contain noisy punctuation, such as quotes and semicolons, which should be removed. Cleaning the data at this stage removes such records and keeps the later steps reliable.

### Item object
* Many item rows don't have a SKU number.
* Item ID should be a string, since we are not going to do numeric manipulation on it.
* SKU number should be a string field without the trailing `.0`.
* `item_description` is a text field. We probably want to take a deeper look at it.
* Length/width/height don't quite fit our purpose. Sorting them into `dim1`, `dim2` and `dim3` in descending order makes more sense.

In [34]:
# drop rows with a missing sku_no
items.dropna(subset=['sku_no'], inplace=True)

In [35]:
items.shape

(7114, 7)

In [45]:
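# reformat sku_no field: convert to string and drop only the trailing ".0"
# (str.rstrip('.0') would also strip trailing zeros that belong to the SKU, e.g. 500040 -> 50004)
items['sku_no'] = items['sku_no'].astype(str).str.replace(r'\.0$', '', regex=True)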
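
One detail worth checking in this kind of reformatting: `str.rstrip` takes a set of characters to strip, not a suffix, so it can also remove trailing zeros that belong to the number. The sketch below compares three approaches on made-up SKU values.

```python
import pandas as pd

# Made-up SKU values stored as floats, as they are after loading
sku = pd.Series([500040.0, 160156.0, 24287592.0])

# rstrip(".0") strips any trailing run of '.' and '0' characters
stripped = sku.astype(str).str.rstrip(".0")

# Alternative 1: cast to integer first (requires no missing values)
via_int = sku.astype("int64").astype(str)

# Alternative 2: remove only a literal ".0" at the end of the string
via_regex = sku.astype(str).str.replace(r"\.0$", "", regex=True)

print(stripped.tolist())   # the first value loses its trailing zero
print(via_int.tolist())
print(via_regex.tolist())
```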

In [50]:
items.sample(5)

| | item_id | sku_no | unit_length | unit_width | unit_height | weight | item_description |
|---|---|---|---|---|---|---|---|
| 1105 | 5528 | 160156 | 9.6 | 2.8 | 1.1 | 0.4 | HP 980 YELLOW INK CARTRIDGE |
| 5860 | 11101 | 24420013 | 9.25 | 5.4 | 1.85 | 0.15 | CW MF WET PAD 18X5 BLUE |
| 6828 | 2371 | 904597 | 9.7 | 3.8 | 0.6 | 0.3 | SIGN TURN OFF CELL PHONE |
| 7330 | 4757 | 886241 | 4.9 | 4.4 | 2.3 | 0.15 | EPSON 124 COLOR 3PK |
| 4415 | 7623 | 387282 | 3.8 | 3.3 | 1.7 | 0.1 | BANDAGE ADHESIVE FABRIC 3/4X3 |


In [57]:
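# check punctuation
import re
import string

print(string.punctuation)
# escape the punctuation so characters such as ] and \ are literal inside the character class
mask = items['item_description'].str.contains(f"[{re.escape(string.punctuation)}]")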

!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~


In [61]:
items[mask].sample(10)

| | item_id | sku_no | unit_length | unit_width | unit_height | weight | item_description |
|---|---|---|---|---|---|---|---|
| 1282 | 2872 | 618658 | 3.1 | 2.2 | 1.2 | 0.28 | JUMBO NON-SKID PAPER CLIPS |
| 2991 | 10528 | 848967 | 7.3 | 4.5 | 0.8 | 0.1 | COMBAT ANT KILLING SYSTEM 6/BX |
| 6012 | 3182 | 710705 | 8.8 | 8.8 | 1.2 | 0.6 | ZEBRA Z-GRIP 0.7MM 24PK |
| 3297 | 5517 | 812056 | 5.6 | 2.7 | 2.6 | 0.7 | TAPE MAGIC 3/4X36 YD 6 PACK |
| 1881 | 1494 | 863061 | 4.5 | 2.4 | 1.7 | 0.125 | HP 61 TRI-COLOR INK |
| 5927 | 10637 | 221101 | 6.0 | 3.0 | 1.3 | 0.3 | ACROBALL HYBRID 1.0MM MED BLU |
| 6043 | 1613 | 894633 | 5.5 | 3.0 | 3.0 | 0.5 | SCOTCH MAGIC TAPE 3/4X800 6PK |
| 2878 | 1800 | 681466 | 3.8 | 3.3 | 2.1 | 0.1 | CANON PG-30 BLACK INK |
| 2729 | 11163 | 91846 | 6.0 | 5.6 | 1.6 | 0.4 | SCENTD OIL TWIN REFILL LAV/CAM |
| 1227 | 11984 | 599993 | 7.7 | 5.0 | 3.5 | 1.1 | RUBBERBAND #64 3.5X1/4IN 1# BX |


In [70]:
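# remove unwanted characters: single quote, double quote and ampersand
unwantedChar = '\'"&'
for c in unwantedChar:
    # regex=False treats each character literally, regardless of pandas version
    items['item_description'] = items['item_description'].str.replace(c, '', regex=False)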
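
The same characters can also be removed in a single pass with one regular-expression character class. This is a sketch of that alternative on made-up descriptions, not a change to the cell above.

```python
import pandas as pd

# Made-up descriptions containing the three unwanted characters
descriptions = pd.Series(['3M "SCOTCH" TAPE', "PB&J SNACK", "KID'S GLUE"])

# One character class matching a single quote, a double quote or an ampersand
cleaned = descriptions.str.replace(r"['\"&]", "", regex=True)

print(cleaned.tolist())
```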

In [75]:
# reformat len/width/hgt to dim1/2/3, ordered from largest to smallest
dimCols = ['unit_length', 'unit_width', 'unit_height']
items['dim1'] = items[dimCols].max(axis=1)     # largest dimension
items['dim2'] = items[dimCols].median(axis=1)  # middle of the three values
items['dim3'] = items[dimCols].min(axis=1)     # smallest dimension
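
Another way to get the three sorted dimensions is to sort each row once with NumPy and then assign the columns. The sketch below uses two rows copied from the `items.head()` output; `toy` and `sorted_dims` are illustrative names.

```python
import numpy as np
import pandas as pd

# Two rows taken from the items.head() output
toy = pd.DataFrame(
    {
        "unit_length": [8.4, 5.5],
        "unit_width": [3.7, 5.5],
        "unit_height": [2.3, 8.4],
    }
)
cols = ["unit_length", "unit_width", "unit_height"]

# Sort each row ascending, then reverse the columns to get descending order
sorted_dims = np.sort(toy[cols].to_numpy(), axis=1)[:, ::-1]

toy["dim1"] = sorted_dims[:, 0]  # largest
toy["dim2"] = sorted_dims[:, 1]  # middle
toy["dim3"] = sorted_dims[:, 2]  # smallest

print(toy[["dim1", "dim2", "dim3"]])
```

Sorting the whole array at once avoids calling a Python function per row, which tends to matter more as the table grows.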

## Suction Cup Selection Logic
In this section, our main focus will be on developing the selection logic. The selection logic consists of a series of conditional statements with expandable rules. To ensure flexibility for future rule additions, we can leverage object-oriented programming (OOP) concepts. By adopting an OOP approach, we can easily incorporate new rules into the existing framework.

### Item Objects
Objects are the foundation of OOP. A pandas DataFrame provides convenient utilities for data manipulation, but it is not designed for OOP. To implement the selection logic, I would like to convert each item into an object that is easier to access later on.

In [81]:
items

| | item_id | sku_no | unit_length | unit_width | unit_height | weight | item_description | dim1 | dim2 | dim3 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 12810 | 24287592 | 8.40 | 3.70 | 2.30 | 2.100 | SILK PURE ALMOND UNSWT VAN | 8.40 | 3.70 | 2.30 |
| 1 | 19327 | 1266017 | 5.50 | 5.50 | 8.40 | 1.470 | NEO-GEL 48PC TUB BLU | 8.40 | 5.50 | 5.50 |
| 2 | 24874 | 24529912 | 10.00 | 8.05 | 2.60 | 0.720 | FULL SIZE HOT GLUE GUN | 10.00 | 8.05 | 2.60 |
| 3 | 15205 | 565284 | 6.85 | 2.70 | 2.65 | 1.285 | TAPE DISPENSER | 6.85 | 2.70 | 2.65 |
| 4 | 13444 | 2610177 | 9.20 | 8.10 | 6.70 | 1.300 | DESKTOP DRAWER SYSTEM SMALL | 9.20 | 8.10 | 6.70 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7617 | 11997 | 49616 | 5.30 | 3.70 | 1.50 | 0.600 | LAMINATING POUCH BADGE SIZE | 5.30 | 3.70 | 1.50 |
| 7618 | 9779 | 664524 | 6.30 | 3.50 | 1.80 | 0.700 | SDFC MULTIPLICATION 0-12 | 6.30 | 3.50 | 1.80 |
| 7619 | 17560 | 478187 | 7.80 | 5.50 | 2.90 | 1.400 | NUTRA GRAIN RASPBERRY-BX | 7.80 | 5.50 | 2.90 |
| 7620 | 10738 | 735767 | 5.00 | 4.90 | 1.40 | 0.300 | MAGIC TAPE 1/2X2592 3IN 2PK | 5.00 | 4.90 | 1.40 |


In [82]:
# an Item object that stores every column of an item row as an attribute
class Item:
    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)
        self.mySelection = []
    
    def __str__(self):
        pstr = ''
        for k, v in vars(self).items():
            pstr += f"{k}:{v};\n"
        return pstr

In [83]:
# Initialize a dict to hold the items.
# The key is the item ID and the value is the Item object: {1: item1, 2: item2, ...}
itemMap = {}

# Loop through the DataFrame and wrap each row in an Item object
for idx, row in items.iterrows():
    itemMap[row['item_id']] = Item(**row.to_dict())

In [84]:
itemMap['12810']

KeyError: '12810'

In [85]:
print(itemMap['12810'])

KeyError: '12810'

In [86]:
itemMap['12810'].minDim

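KeyError: '12810'

The three lookups above fail because `item_id` was never converted to a string, so the keys of `itemMap` are still integers. An integer key such as `itemMap[12810]` is the matching lookup. Note also that `Item` has no `minDim` attribute; the smallest dimension is stored as `dim3`.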
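
String lookups such as `itemMap['12810']` only succeed if the keys are strings when the map is built. One way to arrange that is to cast `item_id` before the loop. This is a minimal sketch on toy data: `toy_items` and `item_map` are hypothetical names, and `SimpleNamespace` stands in for the `Item` class.

```python
from types import SimpleNamespace

import pandas as pd

# Toy rows using values from the items table above
toy_items = pd.DataFrame(
    {
        "item_id": [12810, 19327],
        "weight": [2.1, 1.47],
        "dim3": [2.3, 5.5],
    }
)

# Cast the ID column to string before building the map
toy_items["item_id"] = toy_items["item_id"].astype(str)

# to_dict("records") yields one plain dict per row
item_map = {
    record["item_id"]: SimpleNamespace(**record)
    for record in toy_items.to_dict("records")
}

print(item_map["12810"].dim3)  # string key now works
print(12810 in item_map)       # the integer key no longer matches
```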

### Rule Object
Instead of hardcoding the selection rules, a more flexible approach would be to leverage object-oriented programming (OOP) concepts. This involves encapsulating the rules into separate Rule objects and applying them dynamically during the selection process. Let's compare the two styles:

**Naive if flow**: 
```Python
# rule one
if foo>bar and ....:
    do something here
# rule two 
if foo<bar and ....:
    do other things here
# rule three, four, ...
...
```

In this approach, the selection logic is directly implemented within the code, making it less adaptable to changes in rules or the need for additional rules. Modifying the selection criteria requires manual changes to the code, which can be error-prone and time-consuming.

----

**OOP**:
```Python
rules = [rule1, rule2, rule3, rule4, ...]
for rule in rules:
    rule.apply(item)
...
```

By using OOP concepts, we can encapsulate the selection rules into separate Rule objects. Each Rule object represents a specific selection criterion and can be easily modified or extended without affecting the overall structure of the code. The rules can be organized into a cohesive hierarchy, allowing for better organization and maintainability.

During the selection process, the Rule objects can be dynamically applied based on the desired criteria. This flexibility enables easy addition, modification, or removal of rules, providing a more scalable and adaptable solution.
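
As a rough illustration of this pattern, the sketch below shows that adding a rule only means adding a class and one list entry, while the selection loop stays unchanged. All names here (`ToyItem`, `Rule`, `SmallCupRule`, `HeavyItemRule`) are hypothetical. The `SmallCupRule` limits follow the small-18mm row of the suction cup table; the `HeavyItemRule` threshold and result are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyItem:
    dim1: float  # largest dimension
    dim3: float  # smallest dimension
    weight: float
    selections: list = field(default_factory=list)


class Rule:
    """Base rule: never eligible; subclasses override is_eligible and result."""
    name = "base"
    result = (0, 1)  # (suctionCupID, graspTypeID)

    def is_eligible(self, item: ToyItem) -> bool:
        return False

    def apply(self, item: ToyItem) -> None:
        # Record this rule's result on the item only when the rule matches
        if self.is_eligible(item):
            item.selections.append(self.result)


class SmallCupRule(Rule):
    name = "small_cup"
    result = (5, 1)

    def is_eligible(self, item: ToyItem) -> bool:
        return item.dim3 > 0.18 and item.dim1 < 5 and item.weight < 0.8


class HeavyItemRule(Rule):
    # Hypothetical rule added later; the loop below does not change
    name = "heavy_item"
    result = (3, 2)

    def is_eligible(self, item: ToyItem) -> bool:
        return item.weight >= 1.9


rules = [SmallCupRule(), HeavyItemRule()]
toy_items = [ToyItem(dim1=4.0, dim3=1.0, weight=0.05),
             ToyItem(dim1=9.2, dim3=6.7, weight=2.5)]

for item in toy_items:
    for rule in rules:
        rule.apply(item)

print([item.selections for item in toy_items])
```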


In [100]:
# Example implementation of Rule objects
# Base Rule object. Every Rule object should implement these two methods
class BaseRule:
    def isEligible(self,item: Item):
        '''returns a true/false(boolean) value for the selection logic to use.'''
        return False
    
    def getGraspSuctionTuple(self):
        '''returns a tuple (suctionCupID, graspTypeID) for the selection logic to use.'''
        return (0, 1)

    
# each suction cup should have a base rule to enforce its min/max dim and weight limits
# (the values below match cup 5, small-18mm: minDim 0.18, maxDim 5, maxWeight 0.8)
class miniCupRule(BaseRule):
    def isEligible(self, item: Item):
        return item.dim3 > 0.18 and item.dim1 < 5 and item.weight < 0.8
    
    def getGraspSuctionTuple(self):
        '''returns a tuple (suctionCupID, graspTypeID) for the selection logic to use.'''
        return (5, 1)


# advanced rule inheriting from cup base rule
class MiniCupPreferred1SORule(miniCupRule):
    '''Rule MiniCupPreferred1SO Implementation'''
    def __init__(self):
        super().__init__()
        self.name = 'MiniCupPreferred1SO'
    
    def isEligible(self, item: Item):
        return super().isEligible(item) and item.dim3 <1.1 and item.dim1 < 5.8 and item.weight < 0.088
    
    def getGraspSuctionTuple(self):
        '''returns a tuple (suctionCupID, graspTypeID) for the selection logic to use.'''
        return (5, 0)


In [101]:
# run through the items and apply MiniCupPreferred1SO rule:
rules = [MiniCupPreferred1SORule(),]
itemSample = []
# loop over items and rules
for itemId, item in itemMap.items():
    for rule in rules:
        if rule.isEligible(item):
            print(f"item {itemId} is eligible for rule {rule.name}")
            item.mySelection.append(rule.getGraspSuctionTuple())
            itemSample.append(item)

item 6614 is eligible for rule MiniCupPreferred1SO
item 10668 is eligible for rule MiniCupPreferred1SO
item 6692 is eligible for rule MiniCupPreferred1SO
item 4857 is eligible for rule MiniCupPreferred1SO
item 1517 is eligible for rule MiniCupPreferred1SO
item 864 is eligible for rule MiniCupPreferred1SO
item 2713 is eligible for rule MiniCupPreferred1SO
item 7621 is eligible for rule MiniCupPreferred1SO
item 5293 is eligible for rule MiniCupPreferred1SO
item 889 is eligible for rule MiniCupPreferred1SO
item 4482 is eligible for rule MiniCupPreferred1SO
item 6660 is eligible for rule MiniCupPreferred1SO
item 915 is eligible for rule MiniCupPreferred1SO
item 1211 is eligible for rule MiniCupPreferred1SO
item 2527 is eligible for rule MiniCupPreferred1SO
item 4595 is eligible for rule MiniCupPreferred1SO
item 7921 is eligible for rule MiniCupPreferred1SO
item 1006 is eligible for rule MiniCupPreferred1SO
item 2353 is eligible for rule MiniCupPreferred1SO
item 20570 is eligible for rule M

In [103]:
itemMap[4758].mySelection

[(5, 0)]
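
A selection tuple is easier to read once the IDs are translated back to names. The sketch below hard-codes the ID-to-name mappings from the `suctionCups` and `graspTypes` tables shown earlier; the helper `describe_selection` is a hypothetical name.

```python
# ID-to-name mappings copied from the suctionCups and graspTypes tables
SUCTION_CUP_NAMES = {
    0: "any",
    1: "swappable_vs_25_nr",
    2: "swappable_b3_bgi34",
    3: "swappable_vsa_63_nr",
    4: "swappable_bgx_48",
    5: "swappable_vs_18_nr",
}
GRASP_TYPE_NAMES = {0: "suction_only", 1: "default", 2: "stabilized"}


def describe_selection(selection):
    """Translate a (suctionCupID, graspTypeID) tuple into readable names."""
    cup_id, grasp_id = selection
    return SUCTION_CUP_NAMES[cup_id], GRASP_TYPE_NAMES[grasp_id]


print(describe_selection((5, 0)))  # ('swappable_vs_18_nr', 'suction_only')
```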