# Getting started with DeepMatcher

Note: you can run **[this notebook live in Google Colab](https://colab.research.google.com/github/sidharthms/deepmatcher/blob/master/examples/getting_started.ipynb)** and use free GPUs provided by Google.

This tutorial describes how to effortlessly perform entity matching using deep neural networks. Specifically, we will see how to match pairs of tuples (also called data records or table rows) to determine if they refer to the same real world entity. To do so, we will need labeled examples as input, i.e., tuple pairs which have been annotated as matches or non-matches. This will be used to train our neural network using supervised learning. At the end of this tutorial, you will have a trained neural network as output which you can easily apply to unlabeled tuple pairs to make predictions.

As an overview, here are the 4 steps to use `deepmatcher` which we will go through in this tutorial:

<ol start="0">
  <li>Setup</li>
  <li>Process data</li>
  <li>Define neural network model</li>
  <li>Train model</li>
  <li>Apply model to new data</li>
</ol>

Let's begin!

## Step 0. Setup

If you are running this notebook inside Colab, you will first need to install necessary packages by running the code below:

In [9]:
try:
    import torch
except:
    !pip install -q http://download.pytorch.org/whl/cu80/torch-0.3.1-cp36-cp36m-linux_x86_64.whl
    !pip install -q --process-dependency-links git+https://github.com/sidharthms/deepmatcher

Now let's import `deepmatcher` which will do all the heavy lifting to build and train neural network models for entity matching. 

In [10]:
import deepmatcher as dm

We recommend having a GPU available for the training in Step 4. In case a GPU is not available, we will use all available CPU cores. You can run the following command to determine if a GPU is available and will be used for training:

In [11]:
import torch
torch.cuda.is_available()

True

### Download sample data for entity matching

Now let's get some sample data to play with in this tutorial. We will need three sets of labeled data and one set of unlabeled data:

1. **Training Data:** This is used for training our neural network model.
2. **Validation Data:** This is used for determining the configuration (i.e., hyperparameters) of our model in such a way that the model does not overfit to the training set.
3. **Test Data:** This is used to estimate the performance of our trained model on unlabeled data.
4. **Unlabeled Data:** The trained model is applied on this data to obtain predictions, which can then be used for downstream tasks in practical application scenarios.

We download these four data sets to the `sample_data` directory:

In [12]:
!mkdir -p sample_data
!wget -qnc -P sample_data https://raw.githubusercontent.com/sidharthms/deepmatcher/master/examples/sample_data/amz_goog_train.csv
!wget -qnc -P sample_data https://raw.githubusercontent.com/sidharthms/deepmatcher/master/examples/sample_data/amz_goog_validation.csv
!wget -qnc -P sample_data https://raw.githubusercontent.com/sidharthms/deepmatcher/master/examples/sample_data/amz_goog_test.csv
!wget -qnc -P sample_data https://raw.githubusercontent.com/sidharthms/deepmatcher/master/examples/sample_data/amz_goog_unlabeled.csv

To get an idea of how our data looks like, let's take a peek at the training dataset:

In [13]:
import pandas as pd
pd.read_csv('sample_data/amz_goog_train.csv').head()

Unnamed: 0,id,label,left_id,left_title,left_manufacturer,left_price,right_id,right_title,right_manufacturer,right_price
0,0,0,571,microsoft visio standard 2007 version upgrade,microsoft,129.95,946,adobe cs3 design standard upgrade,,413.99
1,1,0,574,microsoft mappoint 2006 with gps,microsoft,349.0,2423,microsoft student with encarta premium 2008 co...,,43.6
2,2,0,250,adobe after effects professional 7.0,adobe,999.0,2839,adobe flash cs3 professional ( mac ),,699.0
3,3,1,1162,motu digital performer 5 digital audio softwar...,motu,395.0,2109,motu digital performer dp5 software music prod...,,319.95
4,4,1,741,illustrator cs3 13 mac ed 1u,adobe-education-box,199.0,358,adobe illustrator cs3 for mac academic,adobe-education-box,199.99


## Step 1. Process data

Before we can use our data for training, `deepmatcher` needs to first load and process it in order to prepare it for neural network training. Currently `deepmatcher` only supports processing CSV files. Each CSV file is assumed to have the following kinds of columns:

* **"Left" attributes (required):** Our goal is to match tuple pairs. "Left" attributes are columns that correspond to the "left" tuple or the first tuple in the tuple pair. These column names are expected to be prefixed with "left_" by default.
* **"Right" attributes (required):** "Right" attributes are columns that correspond to the "right" tuple or the second tuple in the tuple pair. These column names are expected to be prefixed with "right_" by default.
* **Label column (required for train, validation, test):** Column containing the labels (match or non-match) for each tuple pair. Expected to be named "label" by default
* **ID column (required):** Column containing a unique ID for each tuple pair. This is for evaluation convenience.  Expected to be named "id" by default.

More details on what data processing involves and ways to customize it are described in **[this notebook](https://github.com/sidharthms/deepmatcher/tree/master/examples/data_process.ipynb)**. 

### Processing train / validation / test data
In order to process our train, validation and test CSV files we call `dm.process` in the following code snippet which will load and process the CSV files and return three processed `MatchingDataset` objects respectively. These dataset objects will later be used for training and evaluation. The basic parameters to `dm.proecss` are as follows:

* **path (required): ** The path where all data is stored. This includes train, validation and test. `deepmatcher` may create new files in this directory to store information about these data sets. This allows subsequent `dm.process` calls to be much faster.
* **train (required): ** File name of training data in `path` directory.
* **validation (required): ** File name of validation data in `path` directory.
* **test (optional): ** File name of test data in `path` directory.
* **ignore_columns (optional): ** Any columns in the CSV files that you may want to ignore for the purposes of training. These should be included here. 

Note that the train, validation and test CSVs must all share the same schema, i.e., they should have the same columns. Processing data involves several steps and can take several minutes to complete, especially if this is the first time you are running the `deepmatcher` package.

NOTE: If you are running this in Colab, you may get a message saying 'Memory usage is close to the limit.' You can safely ignore it for now. We are working on reducing the memory footprint.

In [14]:
train, validation, test = dm.process(
    path='sample_data',
    train='amz_goog_train.csv',
    validation='amz_goog_validation.csv',
    test='amz_goog_test.csv',
    ignore_columns=('left_id', 'right_id'))


#### Peeking at processed data
Let's take a look at how the processed data looks like. To do this, we get the raw `pandas` table corresponding to the processed training dataset object. 

In [15]:
train_table = train.get_raw_table()
train_table.head()

Unnamed: 0,id,label,left_title,left_manufacturer,left_price,right_title,right_manufacturer,right_price
0,0,0,microsoft visio standard 2007 version upgrade,microsoft,129.95,adobe cs3 design standard upgrade,,413.99
1,1,0,microsoft mappoint 2006 with gps,microsoft,349.0,microsoft student with encarta premium 2008 co...,,43.6
2,2,0,adobe after effects professional 7.0,adobe,999.0,adobe flash cs3 professional ( mac ),,699.0
3,3,1,motu digital performer 5 digital audio softwar...,motu,395.0,motu digital performer dp5 software music prod...,,319.95
4,4,1,illustrator cs3 13 mac ed 1u,adobe-education-box,199.0,adobe illustrator cs3 for mac academic,adobe-education-box,199.99


The processed attribute values have been tokenized and lowercased so they may not look exactly the same as the input training data. These modifications help the neural network generalize better, i.e., perform better on data not trained on. 

### Processing unlabeled data

`dm.process` can also be used to process unlabeled data, as shown in the code snippet below, so that you can perform prediction over it. The basic parameters to use `dm.process` for this case are as follows:

* **path (required): ** The path where unlabeled data is stored.
* **unlabeled (required): ** File name of unlabeled data in `path` directory.
* **ignore_columns (optional): ** Any columns in the CSV file that you may want to ignore for the purposes of training. These should be included here.

Note that the unlabeled CSV file must have the same schema as the train, validation and test CSVs.

In [16]:
unlabeled = dm.process(
    path='sample_data',
    unlabeled='amz_goog_unlabeled.csv',
    ignore_columns=('left_id', 'right_id'))

Load time: 0.7748610926792026
Vocab time: 13.302086600102484
Metadata time: 0.00011100154370069504


## Step 2. Define neural network model

In this step you tell `deepmatcher` what kind of neural network you would like to use for entity matching. The easiest way to do this is to use one of the several kinds of neural network models that comes built-in with `deepmatcher`. To use a built-in network, construct a `dm.MatchingModel` as follows:

`model = dm.MatchingModel(attr_summarizer='<TYPE>')`

where `<TYPE>` is one of `sif`, `rnn`, `attention` or `hybrid`. If you are not familiar with what these mean, we strongly recommend taking a look at either **[slides from our talk on deepmatcher](http://bit.do/deepmatcher-talk)** for a high level overview, or **[our paper](http://pages.cs.wisc.edu/~anhai/papers1/deepmatcher-sigmod18.pdf)** for a more detailed explanation. Here we give briefly describe the intuition behind these four model types:
* **sif:** This model considers the **words** present in each attribute value pair to determine a match or non-match. It does not take word order into account.
* **rnn:** This model considers the **sequences of words** present in each attribute value pair to determine a match or non-match.
* **attention:** This model considers the **alignment of words** present in each attribute value pair to determine a match or non-match. It does not take word order into account.
* **hybrid:** This model considers the **alignment of sequences of words** present in each attribute value pair to determine a match or non-match.

`deepmatcher` is highly customizable and allows you to tune almost every aspect of the neural network model for your application scenario. **[This tutorial](https://github.com/sidharthms/deepmatcher/tree/master/examples/customize_network.ipynb)** discusses the structure of `MatchingModel`s and how they can be customized.

For this tutorial, let's create a `hybrid` model for entity matching:

In [17]:
model = dm.MatchingModel(attr_summarizer='hybrid')

## Step 3. Train model

Next, we train the defined neural network model using the processed training and validation data. To do so, we call the `run_train` method which takes the following basic parameters:

* **train:** The processed training dataset object (of type `MatchingDataset`).
* **validation:** The processed validation dataset object (of type `MatchingDataset`).
* **epochs:** Number of times to go over the entire `train` data for training the model.
* **batch_size:** Number of labeled examples (tuple pairs) to use for each training step. This value may be increased if you have a lot of training data and would like to speed up training. The optimal value is dataset dependent.
* **best_save_path:** Path to save the best model.
* **pos_neg_ratio**: The ratio of the weight of positive examples (matches) to weight of negative examples (non-matches). This value should be increased if you have fewer matches than non-matches in your data. The optimal value is dataset dependent.

Many other aspects of the training algorithm can be customized. For details on this, please refer the API documentation for **[run_train]()**

In [18]:
model.run_train(
    train,
    validation,
    epochs=15,
    batch_size=16,
    best_save_path='hybrid_model.pth',
    pos_weight=1.8)

* Number of trainable parameters: 7133105
===>  TRAIN Epoch 1 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 1 || Run Time:   29.2 | Load Time:    2.5 || F1:  16.84 | Prec:  46.15 | Rec:  10.30 || Ex/s: 216.89

===>  EVAL Epoch 1 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 1 || Run Time:    4.3 | Load Time:    0.8 || F1:  19.58 | Prec:  53.85 | Rec:  11.97 || Ex/s: 445.27

* Best F1: 19.58041958041958
Saving best model...
===>  TRAIN Epoch 2 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 2 || Run Time:   29.4 | Load Time:    2.5 || F1:  57.38 | Prec:  60.22 | Rec:  54.79 || Ex/s: 215.46

===>  EVAL Epoch 2 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 2 || Run Time:    4.3 | Load Time:    0.8 || F1:  56.37 | Prec:  66.09 | Rec:  49.15 || Ex/s: 448.61

* Best F1: 56.37254901960784
Saving best model...
===>  TRAIN Epoch 3 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 3 || Run Time:   29.4 | Load Time:    2.5 || F1:  73.42 | Prec:  70.43 | Rec:  76.68 || Ex/s: 215.51

===>  EVAL Epoch 3 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 3 || Run Time:    4.3 | Load Time:    0.8 || F1:  63.49 | Prec:  61.69 | Rec:  65.38 || Ex/s: 447.22

* Best F1: 63.485477178423245
Saving best model...
===>  TRAIN Epoch 4 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 4 || Run Time:   29.4 | Load Time:    2.5 || F1:  82.04 | Prec:  77.41 | Rec:  87.27 || Ex/s: 215.42

===>  EVAL Epoch 4 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 4 || Run Time:    4.3 | Load Time:    0.8 || F1:  66.39 | Prec:  64.14 | Rec:  68.80 || Ex/s: 445.90

* Best F1: 66.39175257731958
Saving best model...
===>  TRAIN Epoch 5 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 5 || Run Time:   29.3 | Load Time:    2.5 || F1:  88.40 | Prec:  84.96 | Rec:  92.13 || Ex/s: 215.74

===>  EVAL Epoch 5 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 5 || Run Time:    4.3 | Load Time:    0.8 || F1:  59.46 | Prec:  69.94 | Rec:  51.71 || Ex/s: 446.94

===>  TRAIN Epoch 6 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:32


Finished Epoch 6 || Run Time:   29.8 | Load Time:    2.6 || F1:  91.07 | Prec:  88.20 | Rec:  94.13 || Ex/s: 212.36

===>  EVAL Epoch 6 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:05


Finished Epoch 6 || Run Time:    4.3 | Load Time:    0.8 || F1:  58.39 | Prec:  67.80 | Rec:  51.28 || Ex/s: 444.14

===>  TRAIN Epoch 7 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 7 || Run Time:   29.5 | Load Time:    2.5 || F1:  93.18 | Prec:  90.66 | Rec:  95.85 || Ex/s: 214.86

===>  EVAL Epoch 7 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 7 || Run Time:    4.3 | Load Time:    0.8 || F1:  57.56 | Prec:  67.05 | Rec:  50.43 || Ex/s: 446.43

===>  TRAIN Epoch 8 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 8 || Run Time:   29.4 | Load Time:    2.6 || F1:  94.12 | Prec:  92.18 | Rec:  96.14 || Ex/s: 214.95

===>  EVAL Epoch 8 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:05


Finished Epoch 8 || Run Time:    4.3 | Load Time:    0.8 || F1:  61.39 | Prec:  69.95 | Rec:  54.70 || Ex/s: 444.47

===>  TRAIN Epoch 9 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 9 || Run Time:   29.4 | Load Time:    2.5 || F1:  95.51 | Prec:  93.79 | Rec:  97.28 || Ex/s: 215.17

===>  EVAL Epoch 9 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 9 || Run Time:    4.3 | Load Time:    0.8 || F1:  61.76 | Prec:  69.52 | Rec:  55.56 || Ex/s: 447.40

===>  TRAIN Epoch 10 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 10 || Run Time:   29.4 | Load Time:    2.5 || F1:  96.42 | Prec:  94.63 | Rec:  98.28 || Ex/s: 215.56

===>  EVAL Epoch 10 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:05


Finished Epoch 10 || Run Time:    4.3 | Load Time:    0.8 || F1:  62.23 | Prec:  70.05 | Rec:  55.98 || Ex/s: 444.00

===>  TRAIN Epoch 11 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 11 || Run Time:   29.5 | Load Time:    2.5 || F1:  97.17 | Prec:  96.08 | Rec:  98.28 || Ex/s: 214.71

===>  EVAL Epoch 11 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 11 || Run Time:    4.3 | Load Time:    0.8 || F1:  62.59 | Prec:  69.63 | Rec:  56.84 || Ex/s: 445.26

===>  TRAIN Epoch 12 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:31


Finished Epoch 12 || Run Time:   29.4 | Load Time:    2.5 || F1:  97.94 | Prec:  97.05 | Rec:  98.86 || Ex/s: 214.97

===>  EVAL Epoch 12 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:05


Finished Epoch 12 || Run Time:    4.4 | Load Time:    0.8 || F1:  62.94 | Prec:  69.23 | Rec:  57.69 || Ex/s: 441.22

===>  TRAIN Epoch 13 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:32


Finished Epoch 13 || Run Time:   29.6 | Load Time:    2.6 || F1:  98.22 | Prec:  97.60 | Rec:  98.86 || Ex/s: 213.47

===>  EVAL Epoch 13 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:04


Finished Epoch 13 || Run Time:    4.3 | Load Time:    0.8 || F1:  63.49 | Prec:  67.63 | Rec:  59.83 || Ex/s: 447.88

===>  TRAIN Epoch 14 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:32


Finished Epoch 14 || Run Time:   29.6 | Load Time:    2.6 || F1:  98.58 | Prec:  97.88 | Rec:  99.28 || Ex/s: 213.55

===>  EVAL Epoch 14 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:05


Finished Epoch 14 || Run Time:    4.4 | Load Time:    0.9 || F1:  63.84 | Prec:  66.82 | Rec:  61.11 || Ex/s: 437.36

===>  TRAIN Epoch 15 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:33


Finished Epoch 15 || Run Time:   30.6 | Load Time:    2.6 || F1:  99.07 | Prec:  98.72 | Rec:  99.43 || Ex/s: 207.46

===>  EVAL Epoch 15 :


0% [████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:05


Finished Epoch 15 || Run Time:    4.4 | Load Time:    0.8 || F1:  62.75 | Prec:  66.51 | Rec:  59.40 || Ex/s: 440.20

Loading best model...


## Step 4. Apply model to new data

#### Evaluating on test data
Now that we have a trained model for entity matching, we can now evaluate its accuracy on test data, to estimate the performance of the model on unlabeled data.

In [19]:
# Compute F1 on test set
model.run_eval(test)

===>  EVAL Epoch 4 :


0% [██████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:02


Finished Epoch 4 || Run Time:    2.2 | Load Time:    0.8 || F1:  69.86 | Prec:  65.54 | Rec:  74.79 || Ex/s: 760.45



69.86027944111775

#### Getting predictions on unlabeled data

We finally apply the trained model to unlabeled data to get predictions. To do this, we call the `run_prediction` method which takes a processed data set object and returns a `pandas` dataframe containing tuple pair IDs (`id` column) and the corresponding match / non-match predictions (`prediction` column).

In [20]:
predictions = model.run_prediction(unlabeled)
predictions.head()

===>  PREDICT Epoch 4 :


0% [██████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:02


Finished Epoch 4 || Run Time:    2.3 | Load Time:    0.8 || F1:   0.00 | Prec:   0.00 | Rec:   0.00 || Ex/s:   0.00



Unnamed: 0_level_0,match_score
id,Unnamed: 1_level_1
2174,0.87486
1757,0.044076
1235,0.083617
1112,0.664423
1091,0.068327


You may optionally set the `output_attributes` parameter to also include all attributes present in the original input table. As mentioned earlier, the processed attribute values will likely look a bit different from the attribute values in the input CSV files due to modifications such as tokenization and lowercasing.

In [21]:
predictions = model.run_prediction(unlabeled, output_attributes=True)
predictions.head()

===>  PREDICT Epoch 4 :


0% [██████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:02


Finished Epoch 4 || Run Time:    2.2 | Load Time:    0.8 || F1:   0.00 | Prec:   0.00 | Rec:   0.00 || Ex/s:   0.00



Unnamed: 0_level_0,match_score,left_id,left_title,left_manufacturer,left_price,right_id,right_title,right_manufacturer,right_price
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
2174,0.87486,314,zipmagic personal edition,allume systems,19.95,3070,zipmagic personal edition,,8.95
1757,0.044076,1164,g7 kontakt edition,sibelius-software-ltd .,99.99,1103,chatchecker family edition,,29.99
1235,0.083617,1310,ca antivirus 2007,computer associates,39.95,2801,ca anti-spyware 2007,,24.99
1112,0.664423,300,police quest compilation,vivendi games,19.99,2985,police quest compilation,,18.95
1091,0.068327,1323,simple slide show,topics entertainment,19.99,2157,simple movie maker,,12.9


You can then save these predictions to CSV and use them for downstream tasks.

In [22]:
predictions.to_csv('sample_data/unlabeled_predictions.csv')

#### Getting predictions on labeled data

You can also get predictions for labeled data such as validation data. To do so, you can simply call the `run_prediction` method passing the validation data as argument.

In [23]:
valid_predictions = model.run_prediction(train, output_attributes=True)
valid_predictions.head()

===>  PREDICT Epoch 4 :


0% [██████████████████████████████] 100% | ETA: 00:00:00
Total time elapsed: 00:00:10


Finished Epoch 4 || Run Time:    7.7 | Load Time:    3.3 || F1:  85.51 | Prec:  77.75 | Rec:  94.99 || Ex/s: 626.73



Unnamed: 0_level_0,match_score,label,left_id,left_title,left_manufacturer,left_price,right_id,right_title,right_manufacturer,right_price
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
1690,0.0738,0,883,adobe creative suite cs3 production premium,adobe,1699.0,369,adobe cs3 design premium upsell,,1639.99
6749,0.036878,0,1078,3d home architect home v. 8 by encore software,encore software,39.99,2589,encore software 10444 elementary school advant...,,25.97
4606,0.03055,0,176,instant immersion japanese ( audio book ),topics entertainment,,2313,instant immersion italian platinum ( win 95 98...,,129.99
3661,0.040486,0,431,microsoft windows server 2003 client additiona...,microsoft,209.0,2853,microsoft windows xp professional edition ( up...,,199.99
5323,0.977979,1,721,backpack journalist,honest technology,79.99,447,global marketing partners backpack journalist ...,,83.27
