## Preparing the dataset

Following the instructons you should have checked out the project and started jupyter notebook in the parent folder.
```
uni-sofia-entity-linking-magellan    <= "jupyter notebook" started here
 |- datasets
 |- notebooks
    |- entity_match_electronics.ipynb
```
 

In [97]:
# Extract the zip file with dataset CSV files (amazon.csv, best_buy.csv )
# -o is to overwrite
!unzip -o ../dataset/dataset_electronics_ID_8.zip -d ../dataset/ 

Archive:  ../dataset/dataset_electronics_ID_8.zip
  inflating: ../dataset/amazon.csv   
  inflating: ../dataset/best_buy.csv  


In [98]:
# Import py_entitymatching package
import py_entitymatching as em
import os
import pandas as pd

In [99]:
# Read the CSV files and set 'ID' as the key attribute
A = em.read_csv_metadata("../dataset/amazon.csv", key='ID')
B = em.read_csv_metadata("../dataset/best_buy.csv", key='ID')

Metadata file is not present in the given path; proceeding to read the csv file.
Metadata file is not present in the given path; proceeding to read the csv file.


Lets have a look at the loaded data frames. Notice that `A`.`Original_Price` can be null.

In [100]:
A.head(3)

Unnamed: 0,ID,Brand,Name,Amazon_Price,Original_Price,Features
0,1,Asus,"ASUS X205TA 11.6 Inch Laptop (Intel Atom, 2 GB, 32GB SSD, Gold) - Free Upgrade to Windows 10",$199.00,,Intel Atom 1.33 GHz Processor. 2 GB DDR3 RAM. 32GB SSD Storage; No Optical Drive. 11.6 inches 13...
1,2,Other,AmazonBasics 11.6-Inch Laptop Sleeve,$9.99,,Form-fitting sleeve with quick top-loading access for Chromebooks and MacBook Air laptops. Preci...
2,3,Lenovo,Lenovo G50 Entertainment Laptop - Black: DOORBUSTER - Intel Core i7-5500U (2.4GHz / 3.0 GHz Turb...,$799.77,$999.99,"5th Generation Intel Core i7-5500U Processor (2.4 GHz Turbo / 3.0 GHz Base, 1600MHz 4MB). 15.6\ ..."


In [101]:
B.head(3)

Unnamed: 0,ID,Brand,Name,Price,Description,Features
0,1,Asus,Asus 11.6 Laptop Intel Atom 2GB Memory 32GB Flash Storage Blue X205TA-SATM0404G,$189.99,"11.6&#34; Laptop - Intel Atom - 2GB Memory - 32GB Flash Storage, Read customer reviews and buy o...","Microsoft Windows 8.1 operating system preinstalled,Intel?? Atom??? processor Z3735F,2GB DDR3L m..."
1,2,HP,HP 15.6 TouchScreen Laptop Intel Core i3 6GB Memory 750GB Hard Drive Black 15-r264dx,$379.99,"15.6&#34; Touch-Screen Laptop - Intel Core i3 - 6GB Memory - 750GB Hard Drive, Read customer rev...","Microsoft Windows 8.1 operating system preinstalled,5th Gen Intel?? Core??? i3-5010U processor,I..."
2,3,Asus,Asus 2in1 13.3 TouchScreen Laptop Intel Core i5 6GB Memory 1TB Hard Drive Black Q302LA-BBI5T19,$749.99,"2-in-1 13.3&#34; Touch-Screen Laptop - Intel Core i5 - 6GB Memory - 1TB Hard Drive, Read custome...","Microsoft Windows 10 operating system,13.3 TFT-LCD touch screen for hands-on control,5th Gen Int..."


In [102]:
print(f"len(A): {len(A)}")
print(f"len(B): {len(B)}")
print(f"len(A) * len(B): {len(A) * len(B)}")

len(A): 4259
len(B): 5001
len(A) * len(B): 21299259


## Creating additional features is important
Here we will try to parse and make additional features

In [103]:
# Tokens with digits are important
# so we introduce new field that will contain only tokens with digits, which might be 
# monitor length, cpu, ram, etc.

import re
def filter_tokens_with_digits(s):
    s = str(s)
    s = re.sub('&#34;|[*\\\()\-\/]', ' ',s)
    s = re.sub(',', ' ',s)
    toks = s.split()
    toks = list(filter(lambda x: bool(re.search(r'\d', x)), toks))
    return ' '.join(toks)

def filter_toks_letters(s):
    s = str(s)
    toks = s.split()
    toks = list(filter(lambda x: bool(re.search(r'[a-zA-Z]', x)), toks))
    return ' '.join(toks)

# Example
print(filter_tokens_with_digits('2-in-1 13.3&#34; Touch,-Screen Laptop - Intel Core i5 - 6GB Memory - 1TB Hard Drive, Read'))
print(filter_toks_letters(filter_tokens_with_digits('2-in-1 13.3&#34; Touch,-Screen Laptop - Intel Core i5 - 6GB Memory - 1TB Hard Drive, Read')))

2 1 13.3 i5 6GB 1TB
i5 6GB 1TB


In [104]:
A['Parameters'] = A['Name'] + " " + A['Features']
A['Parameters'] = A['Parameters'].apply(filter_tokens_with_digits)
A.head(3)

Unnamed: 0,ID,Brand,Name,Amazon_Price,Original_Price,Features,Parameters
0,1,Asus,"ASUS X205TA 11.6 Inch Laptop (Intel Atom, 2 GB, 32GB SSD, Gold) - Free Upgrade to Windows 10",$199.00,,Intel Atom 1.33 GHz Processor. 2 GB DDR3 RAM. 32GB SSD Storage; No Optical Drive. 11.6 inches 13...,X205TA 11.6 2 32GB 10 1.33 2 DDR3 32GB 11.6 1366 768 8.1
1,2,Other,AmazonBasics 11.6-Inch Laptop Sleeve,$9.99,,Form-fitting sleeve with quick top-loading access for Chromebooks and MacBook Air laptops. Preci...,11.6 11.6 11.4 0.4 8.4 12.2 0.8 9
2,3,Lenovo,Lenovo G50 Entertainment Laptop - Black: DOORBUSTER - Intel Core i7-5500U (2.4GHz / 3.0 GHz Turb...,$799.77,$999.99,"5th Generation Intel Core i7-5500U Processor (2.4 GHz Turbo / 3.0 GHz Base, 1600MHz 4MB). 15.6\ ...",G50 i7 5500U 2.4GHz 3.0 8GB 1TB 15.6 1080P USB3.0 8.1 5th i7 5500U 2.4 3.0 1600MHz 4MB 15.6 1080...


In [105]:
B['Parameters'] = B['Name'] + " " + B['Description']+ " " + B['Features']
B['Parameters'] = B['Parameters'].apply(filter_tokens_with_digits)
B.head(3)

Unnamed: 0,ID,Brand,Name,Price,Description,Features,Parameters
0,1,Asus,Asus 11.6 Laptop Intel Atom 2GB Memory 32GB Flash Storage Blue X205TA-SATM0404G,$189.99,"11.6&#34; Laptop - Intel Atom - 2GB Memory - 32GB Flash Storage, Read customer reviews and buy o...","Microsoft Windows 8.1 operating system preinstalled,Intel?? Atom??? processor Z3735F,2GB DDR3L m...",11.6 2GB 32GB X205TA SATM0404G 11.6 2GB 32GB 8.1 Z3735F 2GB DDR3L 11.6 32GB 0.3MP 2 2.0 802.11a ...
1,2,HP,HP 15.6 TouchScreen Laptop Intel Core i3 6GB Memory 750GB Hard Drive Black 15-r264dx,$379.99,"15.6&#34; Touch-Screen Laptop - Intel Core i3 - 6GB Memory - 750GB Hard Drive, Read customer rev...","Microsoft Windows 8.1 operating system preinstalled,5th Gen Intel?? Core??? i3-5010U processor,I...",15.6 i3 6GB 750GB 15 r264dx 15.6 i3 6GB 750GB 8.1 5th i3 5010U i3 6GB DDR3L 15.6 750GB 7200 5500...
2,3,Asus,Asus 2in1 13.3 TouchScreen Laptop Intel Core i5 6GB Memory 1TB Hard Drive Black Q302LA-BBI5T19,$749.99,"2-in-1 13.3&#34; Touch-Screen Laptop - Intel Core i5 - 6GB Memory - 1TB Hard Drive, Read custome...","Microsoft Windows 10 operating system,13.3 TFT-LCD touch screen for hands-on control,5th Gen Int...",2in1 13.3 i5 6GB 1TB Q302LA BBI5T19 2 1 13.3 i5 6GB 1TB 10 13.3 5th i5 5200U 6GB 1TB 3.75 0.87 2...


# Extract terms based on tfidf score

In [106]:
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np
tf = TfidfVectorizer(input='content', analyzer='word', ngram_range=(1,1),
                      min_df = 0, stop_words = 'english', sublinear_tf=True)

In [107]:
# Fit A and B to be one big corpus of terms
tf.fit((A['Brand'] + ' ' + A['Name']).fillna(""))
tf.fit((B['Brand'] + ' ' + B['Name']).fillna(""))

TfidfVectorizer(min_df=0, stop_words='english', sublinear_tf=True)

In [108]:
# TFIDF for set A
tfidf_matrix =  tf.transform((A['Brand'] + ' ' + A['Name']).fillna(""))
feature_array = np.array(tf.get_feature_names())
def extract_top_tfidf(column):
     response = tf.transform([column])
     tfidf_sorting = np.argsort(response.toarray()).flatten()[::-1]
     n = 5
     top_n = feature_array[tfidf_sorting][:n]
     # stringify
     return " ".join(top_n)
A['tfidf'] = ((A['Brand'] + ' ' + A['Name']).fillna("")).apply(extract_top_tfidf)

In [109]:
# TFIDF for set B
tfidf_matrix =  tf.transform((B['Brand'] + ' ' + B['Name']).fillna(""))
feature_array = np.array(tf.get_feature_names())
def extract_top_tfidf(column):
     response = tf.transform([column])
     tfidf_sorting = np.argsort(response.toarray()).flatten()[::-1]
     n = 5
     top_n = feature_array[tfidf_sorting][:n]
     # stringify
     return " ".join(top_n)
B['tfidf'] = ((B['Brand'] + ' ' +B['Name']).fillna("")).apply(extract_top_tfidf)

In [110]:
A.head(2)

Unnamed: 0,ID,Brand,Name,Amazon_Price,Original_Price,Features,Parameters,tfidf
0,1,Asus,"ASUS X205TA 11.6 Inch Laptop (Intel Atom, 2 GB, 32GB SSD, Gold) - Free Upgrade to Windows 10",$199.00,,Intel Atom 1.33 GHz Processor. 2 GB DDR3 RAM. 32GB SSD Storage; No Optical Drive. 11.6 inches 13...,X205TA 11.6 2 32GB 10 1.33 2 DDR3 32GB 11.6 1366 768 8.1,inch upgrade x205ta asus ssd
1,2,Other,AmazonBasics 11.6-Inch Laptop Sleeve,$9.99,,Form-fitting sleeve with quick top-loading access for Chromebooks and MacBook Air laptops. Preci...,11.6 11.6 11.4 0.4 8.4 12.2 0.8 9,inch sleeve 11 laptop zseries


In [111]:
B.head(2)

Unnamed: 0,ID,Brand,Name,Price,Description,Features,Parameters,tfidf
0,1,Asus,Asus 11.6 Laptop Intel Atom 2GB Memory 32GB Flash Storage Blue X205TA-SATM0404G,$189.99,"11.6&#34; Laptop - Intel Atom - 2GB Memory - 32GB Flash Storage, Read customer reviews and buy o...","Microsoft Windows 8.1 operating system preinstalled,Intel?? Atom??? processor Z3735F,2GB DDR3L m...",11.6 2GB 32GB X205TA SATM0404G 11.6 2GB 32GB 8.1 Z3735F 2GB DDR3L 11.6 32GB 0.3MP 2 2.0 802.11a ...,satm0404g x205ta asus atom storage
1,2,HP,HP 15.6 TouchScreen Laptop Intel Core i3 6GB Memory 750GB Hard Drive Black 15-r264dx,$379.99,"15.6&#34; Touch-Screen Laptop - Intel Core i3 - 6GB Memory - 750GB Hard Drive, Read customer rev...","Microsoft Windows 8.1 operating system preinstalled,5th Gen Intel?? Core??? i3-5010U processor,I...",15.6 i3 6GB 750GB 15 r264dx 15.6 i3 6GB 750GB 8.1 5th i3 5010U i3 6GB DDR3L 15.6 750GB 7200 5500...,r264dx 750gb hp 6gb i3


## Persist A and B with new features to be available for the matching


In [112]:
# Read the CSV files and set 'ID' as the key attribute
em.to_csv_metadata(A, file_path="../dataset/amazon_new_features.csv")
em.to_csv_metadata(B, file_path="../dataset/best_buy_new_features.csv")

True

# Block Tables and Make Set of Candidates

Obviously having 21'299'259 records as a cross product between `A` and `B` is quite high value. What we are going to do now is to reduce obviously non-matching pairs. This process is called blocking tables `A` and `B`. We can use 2 of the blocking mechanisms provided by *py_entitymatching* and namely that would be:
 - attribute equivalence
 - overlap
We know that for an electronics to match , it should be the same `Brand`, so this should match. Sometimes it can have error or typo in the brand, so we can use overlap for tokens in the `Name` and `Description`. Here is the blocking plan:

In [113]:
# Blocking plan

# A, B -- AttrEquivalence blocker [Brand]--|
#                                        |---Overlap[Parameters]-Overlap[tfidf]--> candidate set
# A, B -- Overlap blocker [Name]-----------|

In [114]:
# Create attribute equivalence blocker
ab = em.AttrEquivalenceBlocker()
# Block tables using 'year' attribute : same year include in candidate set
C1 = ab.block_tables(A, B, 'Brand', 'Brand', 
                     l_output_attrs=['Brand','Name','Amazon_Price','Original_Price','Features','Parameters','tfidf'],
                     r_output_attrs=['Brand','Name','Price','Description','Features','Parameters','tfidf']
                    )
len(C1)

4439971

In [115]:
C1.head(2)

Unnamed: 0,_id,ltable_ID,rtable_ID,ltable_Brand,ltable_Name,ltable_Amazon_Price,ltable_Original_Price,ltable_Features,ltable_Parameters,ltable_tfidf,rtable_Brand,rtable_Name,rtable_Price,rtable_Description,rtable_Features,rtable_Parameters,rtable_tfidf
0,0,1,1,Asus,"ASUS X205TA 11.6 Inch Laptop (Intel Atom, 2 GB, 32GB SSD, Gold) - Free Upgrade to Windows 10",$199.00,,Intel Atom 1.33 GHz Processor. 2 GB DDR3 RAM. 32GB SSD Storage; No Optical Drive. 11.6 inches 13...,X205TA 11.6 2 32GB 10 1.33 2 DDR3 32GB 11.6 1366 768 8.1,inch upgrade x205ta asus ssd,Asus,Asus 11.6 Laptop Intel Atom 2GB Memory 32GB Flash Storage Blue X205TA-SATM0404G,$189.99,"11.6&#34; Laptop - Intel Atom - 2GB Memory - 32GB Flash Storage, Read customer reviews and buy o...","Microsoft Windows 8.1 operating system preinstalled,Intel?? Atom??? processor Z3735F,2GB DDR3L m...",11.6 2GB 32GB X205TA SATM0404G 11.6 2GB 32GB 8.1 Z3735F 2GB DDR3L 11.6 32GB 0.3MP 2 2.0 802.11a ...,satm0404g x205ta asus atom storage
1,1,1,3,Asus,"ASUS X205TA 11.6 Inch Laptop (Intel Atom, 2 GB, 32GB SSD, Gold) - Free Upgrade to Windows 10",$199.00,,Intel Atom 1.33 GHz Processor. 2 GB DDR3 RAM. 32GB SSD Storage; No Optical Drive. 11.6 inches 13...,X205TA 11.6 2 32GB 10 1.33 2 DDR3 32GB 11.6 1366 768 8.1,inch upgrade x205ta asus ssd,Asus,Asus 2in1 13.3 TouchScreen Laptop Intel Core i5 6GB Memory 1TB Hard Drive Black Q302LA-BBI5T19,$749.99,"2-in-1 13.3&#34; Touch-Screen Laptop - Intel Core i5 - 6GB Memory - 1TB Hard Drive, Read custome...","Microsoft Windows 10 operating system,13.3 TFT-LCD touch screen for hands-on control,5th Gen Int...",2in1 13.3 i5 6GB 1TB Q302LA BBI5T19 2 1 13.3 i5 6GB 1TB 10 13.3 5th i5 5200U 6GB 1TB 3.75 0.87 2...,bbi5t19 q302la asus 6gb 2in1


In [116]:
# Initialize overlap blocker
ob = em.OverlapBlocker()
# Block over title attribute
C2 = ob.block_tables(A, B, 'Name', 'Name', show_progress=True, overlap_size=2)
len(C2)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  l_df[l_dummy_overlap_attr] = l_df[l_overlap_attr]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  r_df[r_dummy_overlap_attr] = r_df[r_overlap_attr]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  table[overlap_attr] = values
  projected_dataframe = dataframe[proj_attrs].dropna(0,
0% [################

3253258

In [117]:
# Combine the outputs from attr. equivalence blocker and overlap blocker
C = em.combine_blocker_outputs_via_union([C1, C2])
len(C)

6855987

### Narrow blocking
Still 3253258 is quite big number, so we will try to block again and narrow the candidate set.


In [118]:
# http://anhaidgroup.github.io/py_entitymatching/v0.1.x/user_manual/create_feats_for_blocking.html#label-create-features-blocking

D = ob.block_candset(C, l_overlap_attr='Parameters', r_overlap_attr='Parameters', \
                     word_level=True,overlap_size=1, \
                     allow_missing=True, n_jobs=1, \
                     show_progress=True)

0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:01:41


In [119]:
print(f"len(D) = {len(D)}")
D.head(2)

len(D) = 2769145


Unnamed: 0,_id,ltable_ID,rtable_ID,ltable_Brand,ltable_Name,ltable_Amazon_Price,ltable_Original_Price,ltable_Features,ltable_Parameters,ltable_tfidf,rtable_Brand,rtable_Name,rtable_Price,rtable_Description,rtable_Features,rtable_Parameters,rtable_tfidf
0,0,1,1,Asus,"ASUS X205TA 11.6 Inch Laptop (Intel Atom, 2 GB, 32GB SSD, Gold) - Free Upgrade to Windows 10",$199.00,,Intel Atom 1.33 GHz Processor. 2 GB DDR3 RAM. 32GB SSD Storage; No Optical Drive. 11.6 inches 13...,X205TA 11.6 2 32GB 10 1.33 2 DDR3 32GB 11.6 1366 768 8.1,inch upgrade x205ta asus ssd,Asus,Asus 11.6 Laptop Intel Atom 2GB Memory 32GB Flash Storage Blue X205TA-SATM0404G,$189.99,"11.6&#34; Laptop - Intel Atom - 2GB Memory - 32GB Flash Storage, Read customer reviews and buy o...","Microsoft Windows 8.1 operating system preinstalled,Intel?? Atom??? processor Z3735F,2GB DDR3L m...",11.6 2GB 32GB X205TA SATM0404G 11.6 2GB 32GB 8.1 Z3735F 2GB DDR3L 11.6 32GB 0.3MP 2 2.0 802.11a ...,satm0404g x205ta asus atom storage
1,1,1,2,Asus,"ASUS X205TA 11.6 Inch Laptop (Intel Atom, 2 GB, 32GB SSD, Gold) - Free Upgrade to Windows 10",$199.00,,Intel Atom 1.33 GHz Processor. 2 GB DDR3 RAM. 32GB SSD Storage; No Optical Drive. 11.6 inches 13...,X205TA 11.6 2 32GB 10 1.33 2 DDR3 32GB 11.6 1366 768 8.1,inch upgrade x205ta asus ssd,HP,HP 15.6 TouchScreen Laptop Intel Core i3 6GB Memory 750GB Hard Drive Black 15-r264dx,$379.99,"15.6&#34; Touch-Screen Laptop - Intel Core i3 - 6GB Memory - 750GB Hard Drive, Read customer rev...","Microsoft Windows 8.1 operating system preinstalled,5th Gen Intel?? Core??? i3-5010U processor,I...",15.6 i3 6GB 750GB 15 r264dx 15.6 i3 6GB 750GB 8.1 5th i3 5010U i3 6GB DDR3L 15.6 750GB 7200 5500...,r264dx 750gb hp 6gb i3


In [120]:
# http://anhaidgroup.github.io/py_entitymatching/v0.1.x/user_manual/create_feats_for_blocking.html#label-create-features-blocking

E = ob.block_candset(D, l_overlap_attr='tfidf', r_overlap_attr='tfidf', \
                     word_level=True,overlap_size=1, \
                     allow_missing=True, n_jobs=1, \
                     show_progress=True)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  table[overlap_attr] = values
0% [##############################] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


In [121]:
print(f"len(E) = {len(E)}")
E.head(2)

len(E) = 282581


Unnamed: 0,_id,ltable_ID,rtable_ID,ltable_Brand,ltable_Name,ltable_Amazon_Price,ltable_Original_Price,ltable_Features,ltable_Parameters,ltable_tfidf,rtable_Brand,rtable_Name,rtable_Price,rtable_Description,rtable_Features,rtable_Parameters,rtable_tfidf
0,0,1,1,Asus,"ASUS X205TA 11.6 Inch Laptop (Intel Atom, 2 GB, 32GB SSD, Gold) - Free Upgrade to Windows 10",$199.00,,Intel Atom 1.33 GHz Processor. 2 GB DDR3 RAM. 32GB SSD Storage; No Optical Drive. 11.6 inches 13...,X205TA 11.6 2 32GB 10 1.33 2 DDR3 32GB 11.6 1366 768 8.1,inch upgrade x205ta asus ssd,Asus,Asus 11.6 Laptop Intel Atom 2GB Memory 32GB Flash Storage Blue X205TA-SATM0404G,$189.99,"11.6&#34; Laptop - Intel Atom - 2GB Memory - 32GB Flash Storage, Read customer reviews and buy o...","Microsoft Windows 8.1 operating system preinstalled,Intel?? Atom??? processor Z3735F,2GB DDR3L m...",11.6 2GB 32GB X205TA SATM0404G 11.6 2GB 32GB 8.1 Z3735F 2GB DDR3L 11.6 32GB 0.3MP 2 2.0 802.11a ...,satm0404g x205ta asus atom storage
2,2,1,3,Asus,"ASUS X205TA 11.6 Inch Laptop (Intel Atom, 2 GB, 32GB SSD, Gold) - Free Upgrade to Windows 10",$199.00,,Intel Atom 1.33 GHz Processor. 2 GB DDR3 RAM. 32GB SSD Storage; No Optical Drive. 11.6 inches 13...,X205TA 11.6 2 32GB 10 1.33 2 DDR3 32GB 11.6 1366 768 8.1,inch upgrade x205ta asus ssd,Asus,Asus 2in1 13.3 TouchScreen Laptop Intel Core i5 6GB Memory 1TB Hard Drive Black Q302LA-BBI5T19,$749.99,"2-in-1 13.3&#34; Touch-Screen Laptop - Intel Core i5 - 6GB Memory - 1TB Hard Drive, Read custome...","Microsoft Windows 10 operating system,13.3 TFT-LCD touch screen for hands-on control,5th Gen Int...",2in1 13.3 i5 6GB 1TB Q302LA BBI5T19 2 1 13.3 i5 6GB 1TB 10 13.3 5th i5 5200U 6GB 1TB 3.75 0.87 2...,bbi5t19 q302la asus 6gb 2in1


In [122]:
# Sample  candidate set
S = em.sample_table(E, 500)


In [123]:
S['label'] = 0

# We save this to manually label offline
S.to_csv('../dataset/sample_blocked_500.csv')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  S['label'] = 0


## Manually label data
Here we must go and manually label the data, but you can use the previously stored file.
In another jupyter notebook we continue from a labeled data from file `sample_blocked_500_labeled.csv`