# Downloading datasets


* First, I install the Kaggle CLI.
* Then I download the datasets into the notebook's working directory.

In [1]:
! pip install kaggle


[notice] A new release of pip is available: 25.0 -> 25.0.1
[notice] To update, run: pip install --upgrade pip


In [2]:
! kaggle datasets download -d datafiniti/consumer-reviews-of-amazon-products
! kaggle datasets download -d karkavelrajaj/amazon-sales-dataset

Dataset URL: https://www.kaggle.com/datasets/datafiniti/consumer-reviews-of-amazon-products
License(s): CC-BY-NC-SA-4.0
Downloading consumer-reviews-of-amazon-products.zip to /Users/SaadMakki/Developer/Python/Projects/Healthcare_Data_Project(by senior)
100%|██████████████████████████████████████| 16.3M/16.3M [00:11<00:00, 1.74MB/s]
100%|██████████████████████████████████████| 16.3M/16.3M [00:11<00:00, 1.51MB/s]
Dataset URL: https://www.kaggle.com/datasets/karkavelrajaj/amazon-sales-dataset
License(s): CC-BY-NC-SA-4.0
Downloading amazon-sales-dataset.zip to /Users/SaadMakki/Developer/Python/Projects/Healthcare_Data_Project(by senior)
100%|██████████████████████████████████████| 1.95M/1.95M [00:01<00:00, 1.16MB/s]
100%|██████████████████████████████████████| 1.95M/1.95M [00:01<00:00, 1.08MB/s]


In [3]:
import os
import zipfile as zp

zip_file_to_extract = [file for file in os.listdir() if file.endswith('.zip')]
os.makedirs('dataset', exist_ok=True)

for zipFile in zip_file_to_extract:
    with zp.ZipFile(zipFile, 'r') as file:
        file.extractall('dataset')
    os.remove(zipFile)
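
The extraction step above can also be wrapped in a small function that reports what was unpacked. This is only a sketch: `extract_all_zips` and the demo file names are hypothetical, and it uses nothing beyond the standard library.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_all_zips(src_dir, dest_dir, remove=False):
    """Extract every .zip in src_dir into dest_dir and return the member names."""
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(src.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
            extracted.extend(zf.namelist())
        if remove:
            archive.unlink()  # delete the archive only after it was extracted
    return extracted


# Demo on a throwaway archive (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    with zipfile.ZipFile(Path(tmp) / "demo.zip", "w") as zf:
        zf.writestr("demo.csv", "a,b\n1,2\n")
    names = extract_all_zips(tmp, Path(tmp) / "dataset", remove=True)
    demo_exists = (Path(tmp) / "dataset" / "demo.csv").exists()
    zip_removed = not (Path(tmp) / "demo.zip").exists()

print(names)
```

Returning the member names makes it easy to confirm which CSV files ended up in the `dataset` folder before reading them.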

# Performing tasks on the datasets


I will perform five tasks on the two datasets:
* handling missing values
* merging datasets
* renaming columns
* creating new columns
* type conversion

! pip install pandas

In [4]:
import pandas as pd

# Handling Missing Values


### For Datafiniti_Amazon_Consumer_Reviews_of_Amazon_Products

In [5]:
Amazon_consumer_review = pd.read_csv('dataset/Datafiniti_Amazon_Consumer_Reviews_of_Amazon_Products.csv')
Amazon_consumer_review.isnull().sum()

id                        0
dateAdded                 0
dateUpdated               0
name                      0
asins                     0
brand                     0
categories                0
primaryCategories         0
imageURLs                 0
keys                      0
manufacturer              0
manufacturerNumber        0
reviews.date              0
reviews.dateAdded      3948
reviews.dateSeen          0
reviews.doRecommend       0
reviews.id             4971
reviews.numHelpful        0
reviews.rating            0
reviews.sourceURLs        0
reviews.text              0
reviews.title            13
reviews.username          1
sourceURLs                0
dtype: int64

One value in the reviews.username column was missing, so I filled it with the most common username in that column.

In [6]:
# Assign the result back instead of using a chained inplace fillna,
# which pandas warns will stop working in pandas 3.0.
Amazon_consumer_review['reviews.username'] = Amazon_consumer_review['reviews.username'].fillna(
    Amazon_consumer_review['reviews.username'].value_counts().idxmax()
)
Amazon_consumer_review.isnull().sum()

id                        0
dateAdded                 0
dateUpdated               0
name                      0
asins                     0
brand                     0
categories                0
primaryCategories         0
imageURLs                 0
keys                      0
manufacturer              0
manufacturerNumber        0
reviews.date              0
reviews.dateAdded      3948
reviews.dateSeen          0
reviews.doRecommend       0
reviews.id             4971
reviews.numHelpful        0
reviews.rating            0
reviews.sourceURLs        0
reviews.text              0
reviews.title            13
reviews.username          0
sourceURLs                0
dtype: int64

Now I do the same for the reviews.title column, which has 13 missing values.

In [7]:
# Assign the result back instead of using a chained inplace fillna,
# which pandas warns will stop working in pandas 3.0.
Amazon_consumer_review['reviews.title'] = Amazon_consumer_review['reviews.title'].fillna(
    Amazon_consumer_review['reviews.title'].value_counts().idxmax()
)
Amazon_consumer_review.isnull().sum()

id                        0
dateAdded                 0
dateUpdated               0
name                      0
asins                     0
brand                     0
categories                0
primaryCategories         0
imageURLs                 0
keys                      0
manufacturer              0
manufacturerNumber        0
reviews.date              0
reviews.dateAdded      3948
reviews.dateSeen          0
reviews.doRecommend       0
reviews.id             4971
reviews.numHelpful        0
reviews.rating            0
reviews.sourceURLs        0
reviews.text              0
reviews.title             0
reviews.username          0
sourceURLs                0
dtype: int64
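
The two mode fills above can also be done in one step. The sketch below uses made-up toy values, not the real dataset, and relies on `DataFrame.fillna` accepting a dict that maps column names to fill values.

```python
import pandas as pd

# Toy data standing in for the two review columns (values are made up).
df = pd.DataFrame({
    "reviews.username": ["ann", "bob", "ann", None],
    "reviews.title": ["Great", None, "Great", "Okay"],
})

# Most frequent value per column; value_counts() skips missing values.
fill_values = {col: df[col].value_counts().idxmax() for col in df.columns}

# A dict passed to DataFrame.fillna fills each column with its own value.
df = df.fillna(fill_values)
print(df.isnull().sum())
```

Filling free-text fields such as titles with the mode is a simple choice; a fixed placeholder string would be another option if the most common title should not be over-represented.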

The reviews.dateAdded and reviews.id columns are missing in far more than 5% of the rows (3948 and 4971 of 5000 rows, respectively), so I drop both columns.

In [8]:
Amazon_consumer_review.drop(['reviews.dateAdded','reviews.id'], axis=1, inplace=True)
Amazon_consumer_review.isnull().sum()

id                     0
dateAdded              0
dateUpdated            0
name                   0
asins                  0
brand                  0
categories             0
primaryCategories      0
imageURLs              0
keys                   0
manufacturer           0
manufacturerNumber     0
reviews.date           0
reviews.dateSeen       0
reviews.doRecommend    0
reviews.numHelpful     0
reviews.rating         0
reviews.sourceURLs     0
reviews.text           0
reviews.title          0
reviews.username       0
sourceURLs             0
dtype: int64
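
The 5% rule can be applied programmatically rather than by reading the counts. This is a minimal sketch on toy data; the `threshold` name and the sample values are assumptions made for illustration.

```python
import pandas as pd

# Toy frame: one column is mostly missing, the others are complete.
df = pd.DataFrame({
    "reviews.id": [None, None, None, 1.0],
    "reviews.rating": [5, 4, 3, 1],
    "reviews.title": ["a", "b", "c", "d"],
})

threshold = 0.05  # the 5% cut-off used in this notebook

# isnull().mean() gives the fraction of missing rows per column.
missing_share = df.isnull().mean()
to_drop = missing_share[missing_share > threshold].index.tolist()

cleaned = df.drop(columns=to_drop)
print(missing_share)
print(to_drop)
```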

### For amazon.csv

In [9]:
amazon = pd.read_csv('dataset/amazon.csv')
amazon.info()
amazon.isnull().sum()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1465 entries, 0 to 1464
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   product_id           1465 non-null   object
 1   product_name         1465 non-null   object
 2   category             1465 non-null   object
 3   discounted_price     1465 non-null   object
 4   actual_price         1465 non-null   object
 5   discount_percentage  1465 non-null   object
 6   rating               1465 non-null   object
 7   rating_count         1463 non-null   object
 8   about_product        1465 non-null   object
 9   user_id              1465 non-null   object
 10  user_name            1465 non-null   object
 11  review_id            1465 non-null   object
 12  review_title         1465 non-null   object
 13  review_content       1465 non-null   object
 14  img_link             1465 non-null   object
 15  product_link         1465 non-null   object
dtypes: obj

product_id             0
product_name           0
category               0
discounted_price       0
actual_price           0
discount_percentage    0
rating                 0
rating_count           2
about_product          0
user_id                0
user_name              0
review_id              0
review_title           0
review_content         0
img_link               0
product_link           0
dtype: int64

In the amazon dataset only 2 rows have missing values (both in rating_count), so it is safe to drop them.

In [10]:
amazon.dropna(inplace= True)
amazon.isnull().sum()

product_id             0
product_name           0
category               0
discounted_price       0
actual_price           0
discount_percentage    0
rating                 0
rating_count           0
about_product          0
user_id                0
user_name              0
review_id              0
review_title           0
review_content         0
img_link               0
product_link           0
dtype: int64

# Merging datasets

In [11]:
Amazon_consumer_review.info()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 22 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   id                   5000 non-null   object
 1   dateAdded            5000 non-null   object
 2   dateUpdated          5000 non-null   object
 3   name                 5000 non-null   object
 4   asins                5000 non-null   object
 5   brand                5000 non-null   object
 6   categories           5000 non-null   object
 7   primaryCategories    5000 non-null   object
 8   imageURLs            5000 non-null   object
 9   keys                 5000 non-null   object
 10  manufacturer         5000 non-null   object
 11  manufacturerNumber   5000 non-null   object
 12  reviews.date         5000 non-null   object
 13  reviews.dateSeen     5000 non-null   object
 14  reviews.doRecommend  5000 non-null   bool  
 15  reviews.numHelpful   5000 non-null   int64 
 16  review

In [12]:
amazon.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1463 entries, 0 to 1464
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   product_id           1463 non-null   object
 1   product_name         1463 non-null   object
 2   category             1463 non-null   object
 3   discounted_price     1463 non-null   object
 4   actual_price         1463 non-null   object
 5   discount_percentage  1463 non-null   object
 6   rating               1463 non-null   object
 7   rating_count         1463 non-null   object
 8   about_product        1463 non-null   object
 9   user_id              1463 non-null   object
 10  user_name            1463 non-null   object
 11  review_id            1463 non-null   object
 12  review_title         1463 non-null   object
 13  review_content       1463 non-null   object
 14  img_link             1463 non-null   object
 15  product_link         1463 non-null   object
dtypes: object(1

In [13]:
merge_csv = pd.merge(Amazon_consumer_review, amazon, left_on='id', right_on='product_id', how="left")
merge_csv

Unnamed: 0,id,dateAdded,dateUpdated,name,asins,brand,categories,primaryCategories,imageURLs,keys,...,rating,rating_count,about_product,user_id,user_name,review_id,review_title,review_content,img_link,product_link
0,AVqVGZNvQMlgsOJE6eUY,2017-03-03T16:56:05Z,2018-10-25T16:36:31Z,"Amazon Kindle E-Reader 6"" Wifi (8th Generation...",B00ZV9PXP2,Amazon,"Computers,Electronics Features,Tablets,Electro...",Electronics,https://pisces.bbystatic.com/image2/BestBuy_US...,allnewkindleereaderblack6glarefreetouchscreend...,...,,,,,,,,,,
1,AVqVGZNvQMlgsOJE6eUY,2017-03-03T16:56:05Z,2018-10-25T16:36:31Z,"Amazon Kindle E-Reader 6"" Wifi (8th Generation...",B00ZV9PXP2,Amazon,"Computers,Electronics Features,Tablets,Electro...",Electronics,https://pisces.bbystatic.com/image2/BestBuy_US...,allnewkindleereaderblack6glarefreetouchscreend...,...,,,,,,,,,,
2,AVqVGZNvQMlgsOJE6eUY,2017-03-03T16:56:05Z,2018-10-25T16:36:31Z,"Amazon Kindle E-Reader 6"" Wifi (8th Generation...",B00ZV9PXP2,Amazon,"Computers,Electronics Features,Tablets,Electro...",Electronics,https://pisces.bbystatic.com/image2/BestBuy_US...,allnewkindleereaderblack6glarefreetouchscreend...,...,,,,,,,,,,
3,AVqVGZNvQMlgsOJE6eUY,2017-03-03T16:56:05Z,2018-10-25T16:36:31Z,"Amazon Kindle E-Reader 6"" Wifi (8th Generation...",B00ZV9PXP2,Amazon,"Computers,Electronics Features,Tablets,Electro...",Electronics,https://pisces.bbystatic.com/image2/BestBuy_US...,allnewkindleereaderblack6glarefreetouchscreend...,...,,,,,,,,,,
4,AVqVGZNvQMlgsOJE6eUY,2017-03-03T16:56:05Z,2018-10-25T16:36:31Z,"Amazon Kindle E-Reader 6"" Wifi (8th Generation...",B00ZV9PXP2,Amazon,"Computers,Electronics Features,Tablets,Electro...",Electronics,https://pisces.bbystatic.com/image2/BestBuy_US...,allnewkindleereaderblack6glarefreetouchscreend...,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4995,AVqkIdZiv8e3D1O-leaJ,2017-03-06T14:59:25Z,2017-09-04T11:19:31Z,"Fire Tablet with Alexa, 7"" Display, 16 GB, Mag...",B018Y224PY,Amazon,"Tablets,Fire Tablets,Electronics,iPad & Tablet...",Electronics,https://images-na.ssl-images-amazon.com/images...,"841667103150,0841667103150,firetabletwithalexa...",...,,,,,,,,,,
4996,AVqkIdZiv8e3D1O-leaJ,2017-03-06T14:59:25Z,2017-09-04T11:19:31Z,"Fire Tablet with Alexa, 7"" Display, 16 GB, Mag...",B018Y224PY,Amazon,"Tablets,Fire Tablets,Electronics,iPad & Tablet...",Electronics,https://images-na.ssl-images-amazon.com/images...,"841667103150,0841667103150,firetabletwithalexa...",...,,,,,,,,,,
4997,AVqkIdZiv8e3D1O-leaJ,2017-03-06T14:59:25Z,2017-09-04T11:19:31Z,"Fire Tablet with Alexa, 7"" Display, 16 GB, Mag...",B018Y224PY,Amazon,"Tablets,Fire Tablets,Electronics,iPad & Tablet...",Electronics,https://images-na.ssl-images-amazon.com/images...,"841667103150,0841667103150,firetabletwithalexa...",...,,,,,,,,,,
4998,AVqkIdZiv8e3D1O-leaJ,2017-03-06T14:59:25Z,2017-09-04T11:19:31Z,"Fire Tablet with Alexa, 7"" Display, 16 GB, Mag...",B018Y224PY,Amazon,"Tablets,Fire Tablets,Electronics,iPad & Tablet...",Electronics,https://images-na.ssl-images-amazon.com/images...,"841667103150,0841667103150,firetabletwithalexa...",...,,,,,,,,,,
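
In the rows shown above, every column that comes from amazon is empty, which suggests that the `id` values may not match any `product_id`. One way to check this is the `indicator=True` option of `pd.merge`, which adds a `_merge` column describing where each row came from. The sketch below uses made-up keys, not the real data.

```python
import pandas as pd

# Toy frames with keys that do not overlap (values are made up).
reviews = pd.DataFrame({"id": ["AVq1", "AVq1", "AVq2"], "reviews.rating": [3, 5, 4]})
products = pd.DataFrame({"product_id": ["B001", "B002"], "rating": ["4.2", "3.9"]})

merged = pd.merge(
    reviews, products,
    left_on="id", right_on="product_id",
    how="left", indicator=True,
)

# "both" marks rows that found a partner, "left_only" rows that did not.
matched = int((merged["_merge"] == "both").sum())
unmatched = int((merged["_merge"] == "left_only").sum())

# A direct look at the key overlap gives the same answer.
overlap = set(reviews["id"]) & set(products["product_id"])
print(matched, unmatched, overlap)
```

If `matched` is zero on the real data, the two datasets probably need a different join key, or they may not describe the same products at all.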


In [14]:
merge_csv.to_csv('merged_csv.csv')

# Renaming columns in the datasets


### Renaming the columns in Amazon_consumer_review

In [15]:
column_rename_mapping = {
    'id': 'product_id',
    'dateAdded': 'product_added_date',
    'dateUpdated': 'product_last_updated_date',
    'name': 'product_name',
    'asins': 'amazon_asin',
    'brand': 'brand_name',
    'categories': 'product_categories',
    'primaryCategories': 'primary_category',
    'imageURLs': 'product_image_urls',
    'keys': 'product_keys',
    'manufacturer': 'manufacturer_name',
    'manufacturerNumber': 'manufacturer_part_number',
    'reviews.date': 'review_date',
    'reviews.dateAdded': 'review_added_date',
    'reviews.dateSeen': 'review_last_viewed_date',
    'reviews.doRecommend': 'is_recommended',
    'reviews.id': 'review_id',
    'reviews.numHelpful': 'helpful_votes_count',
    'reviews.rating': 'review_rating',
    'reviews.sourceURLs': 'review_source_urls'
}

In [16]:
Amazon_consumer_review.rename(columns=column_rename_mapping, inplace=True, errors='ignore')
Amazon_consumer_review.info()
Amazon_consumer_review.to_csv('renamed amazon consumer review.csv')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 22 columns):
 #   Column                     Non-Null Count  Dtype 
---  ------                     --------------  ----- 
 0   product_id                 5000 non-null   object
 1   product_added_date         5000 non-null   object
 2   product_last_updated_date  5000 non-null   object
 3   product_name               5000 non-null   object
 4   amazon_asin                5000 non-null   object
 5   brand_name                 5000 non-null   object
 6   product_categories         5000 non-null   object
 7   primary_category           5000 non-null   object
 8   product_image_urls         5000 non-null   object
 9   product_keys               5000 non-null   object
 10  manufacturer_name          5000 non-null   object
 11  manufacturer_part_number   5000 non-null   object
 12  review_date                5000 non-null   object
 13  review_last_viewed_date    5000 non-null   object
 14  is_recom
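
Because `errors='ignore'` silently skips mapping keys that are not in the DataFrame, it can help to list those keys, along with any columns the mapping does not cover. The mapping above, for instance, still contains the two dropped columns, and a later output shows reviews.text, reviews.title, reviews.username and sourceURLs keeping their original names. A minimal sketch on a toy frame with hypothetical column names:

```python
import pandas as pd

# Empty toy frame: only the column names matter here.
df = pd.DataFrame(columns=["id", "reviews.rating", "reviews.text"])
mapping = {
    "id": "product_id",
    "reviews.id": "review_id",        # not present in df
    "reviews.rating": "review_rating",
}

# Mapping keys that would be ignored by rename().
unused_keys = [old for old in mapping if old not in df.columns]
# Columns that keep their original name.
unmapped_columns = [col for col in df.columns if col not in mapping]

renamed = df.rename(columns=mapping)
print(unused_keys, unmapped_columns, list(renamed.columns))
```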

### Renaming the columns in amazon dataset

In [17]:
amazon.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1463 entries, 0 to 1464
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   product_id           1463 non-null   object
 1   product_name         1463 non-null   object
 2   category             1463 non-null   object
 3   discounted_price     1463 non-null   object
 4   actual_price         1463 non-null   object
 5   discount_percentage  1463 non-null   object
 6   rating               1463 non-null   object
 7   rating_count         1463 non-null   object
 8   about_product        1463 non-null   object
 9   user_id              1463 non-null   object
 10  user_name            1463 non-null   object
 11  review_id            1463 non-null   object
 12  review_title         1463 non-null   object
 13  review_content       1463 non-null   object
 14  img_link             1463 non-null   object
 15  product_link         1463 non-null   object
dtypes: object(1

In [18]:
column_rename_mapping = {
    # Product metadata
    'product_id': 'product_id',  # Already good (keep as-is)
    'product_name': 'product_name',  # Already clear
    'category': 'product_category',
    'img_link': 'product_image_url',
    'product_link': 'product_page_url',
    
    # Pricing information
    'discounted_price': 'current_price',
    'actual_price': 'original_price',
    'discount_percentage': 'discount_pct',
    
    # Ratings/Reviews
    'rating': 'average_rating',
    'rating_count': 'total_ratings',
    'about_product': 'product_description',
    
    # User/Review metadata
    'user_id': 'reviewer_id',
    'user_name': 'reviewer_name',
    'review_id': 'review_id',  # Already good
    'review_title': 'review_title',  # Already clear
    'review_content': 'review_text'
}

In [19]:
amazon.rename(columns=column_rename_mapping, inplace=True, errors='ignore')
amazon.info()
# amazon.to_csv('rename amazon.csv')

<class 'pandas.core.frame.DataFrame'>
Index: 1463 entries, 0 to 1464
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   product_id           1463 non-null   object
 1   product_name         1463 non-null   object
 2   product_category     1463 non-null   object
 3   current_price        1463 non-null   object
 4   original_price       1463 non-null   object
 5   discount_pct         1463 non-null   object
 6   average_rating       1463 non-null   object
 7   total_ratings        1463 non-null   object
 8   product_description  1463 non-null   object
 9   reviewer_id          1463 non-null   object
 10  reviewer_name        1463 non-null   object
 11  review_id            1463 non-null   object
 12  review_title         1463 non-null   object
 13  review_text          1463 non-null   object
 14  product_image_url    1463 non-null   object
 15  product_page_url     1463 non-null   object
dtypes: object(1

# Creating New Columns

### First, Amazon_consumer_review

Here I add a boolean column that is True when the manufacturer and the brand are the same (compared case-insensitively).

In [20]:
Amazon_consumer_review['is manufacture same as brand'] = Amazon_consumer_review['brand_name'].str.lower() == Amazon_consumer_review['manufacturer_name'].str.lower()

Next I add an urgent flag that is True when the review rating is 2 or lower.

In [21]:
Amazon_consumer_review['urgent review'] = (Amazon_consumer_review['review_rating']<=2)

Now I save the result as a separate CSV file.

In [22]:
Amazon_consumer_review.to_csv('new columns amazon consumer review.csv')
Amazon_consumer_review

Unnamed: 0,product_id,product_added_date,product_last_updated_date,product_name,amazon_asin,brand_name,product_categories,primary_category,product_image_urls,product_keys,...,is_recommended,helpful_votes_count,review_rating,review_source_urls,reviews.text,reviews.title,reviews.username,sourceURLs,is manufacture same as brand,urgent review
0,AVqVGZNvQMlgsOJE6eUY,2017-03-03T16:56:05Z,2018-10-25T16:36:31Z,"Amazon Kindle E-Reader 6"" Wifi (8th Generation...",B00ZV9PXP2,Amazon,"Computers,Electronics Features,Tablets,Electro...",Electronics,https://pisces.bbystatic.com/image2/BestBuy_US...,allnewkindleereaderblack6glarefreetouchscreend...,...,False,0,3,http://reviews.bestbuy.com/3545/5442403/review...,I thought it would be as big as small paper bu...,Too small,llyyue,https://www.newegg.com/Product/Product.aspx%25...,True,False
1,AVqVGZNvQMlgsOJE6eUY,2017-03-03T16:56:05Z,2018-10-25T16:36:31Z,"Amazon Kindle E-Reader 6"" Wifi (8th Generation...",B00ZV9PXP2,Amazon,"Computers,Electronics Features,Tablets,Electro...",Electronics,https://pisces.bbystatic.com/image2/BestBuy_US...,allnewkindleereaderblack6glarefreetouchscreend...,...,True,0,5,http://reviews.bestbuy.com/3545/5442403/review...,This kindle is light and easy to use especiall...,Great light reader. Easy to use at the beach,Charmi,https://www.newegg.com/Product/Product.aspx%25...,True,False
2,AVqVGZNvQMlgsOJE6eUY,2017-03-03T16:56:05Z,2018-10-25T16:36:31Z,"Amazon Kindle E-Reader 6"" Wifi (8th Generation...",B00ZV9PXP2,Amazon,"Computers,Electronics Features,Tablets,Electro...",Electronics,https://pisces.bbystatic.com/image2/BestBuy_US...,allnewkindleereaderblack6glarefreetouchscreend...,...,True,0,4,https://reviews.bestbuy.com/3545/5442403/revie...,Didnt know how much i'd use a kindle so went f...,Great for the price,johnnyjojojo,https://www.newegg.com/Product/Product.aspx%25...,True,False
3,AVqVGZNvQMlgsOJE6eUY,2017-03-03T16:56:05Z,2018-10-25T16:36:31Z,"Amazon Kindle E-Reader 6"" Wifi (8th Generation...",B00ZV9PXP2,Amazon,"Computers,Electronics Features,Tablets,Electro...",Electronics,https://pisces.bbystatic.com/image2/BestBuy_US...,allnewkindleereaderblack6glarefreetouchscreend...,...,True,3,5,https://redsky.target.com/groot-domain-api/v1/...,I am 100 happy with my purchase. I caught it o...,A Great Buy,Kdperry,https://www.newegg.com/Product/Product.aspx%25...,True,False
4,AVqVGZNvQMlgsOJE6eUY,2017-03-03T16:56:05Z,2018-10-25T16:36:31Z,"Amazon Kindle E-Reader 6"" Wifi (8th Generation...",B00ZV9PXP2,Amazon,"Computers,Electronics Features,Tablets,Electro...",Electronics,https://pisces.bbystatic.com/image2/BestBuy_US...,allnewkindleereaderblack6glarefreetouchscreend...,...,True,0,5,https://reviews.bestbuy.com/3545/5442403/revie...,Solid entry level Kindle. Great for kids. Gift...,Solid entry-level Kindle. Great for kids,Johnnyblack,https://www.newegg.com/Product/Product.aspx%25...,True,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4995,AVqkIdZiv8e3D1O-leaJ,2017-03-06T14:59:25Z,2017-09-04T11:19:31Z,"Fire Tablet with Alexa, 7"" Display, 16 GB, Mag...",B018Y224PY,Amazon,"Tablets,Fire Tablets,Electronics,iPad & Tablet...",Electronics,https://images-na.ssl-images-amazon.com/images...,"841667103150,0841667103150,firetabletwithalexa...",...,True,0,5,http://reviews.bestbuy.com/3545/5025900/review...,This is a great tablet for the price. Amazon i...,Good product,litle,"https://www.barcodable.com/upc/841667103150,ht...",True,False
4996,AVqkIdZiv8e3D1O-leaJ,2017-03-06T14:59:25Z,2017-09-04T11:19:31Z,"Fire Tablet with Alexa, 7"" Display, 16 GB, Mag...",B018Y224PY,Amazon,"Tablets,Fire Tablets,Electronics,iPad & Tablet...",Electronics,https://images-na.ssl-images-amazon.com/images...,"841667103150,0841667103150,firetabletwithalexa...",...,True,1,5,http://reviews.bestbuy.com/3545/5025900/review...,This tablet is the perfect size and so easy to...,Great Tablet,gracie,"https://www.barcodable.com/upc/841667103150,ht...",True,False
4997,AVqkIdZiv8e3D1O-leaJ,2017-03-06T14:59:25Z,2017-09-04T11:19:31Z,"Fire Tablet with Alexa, 7"" Display, 16 GB, Mag...",B018Y224PY,Amazon,"Tablets,Fire Tablets,Electronics,iPad & Tablet...",Electronics,https://images-na.ssl-images-amazon.com/images...,"841667103150,0841667103150,firetabletwithalexa...",...,True,8,4,http://reviews.bestbuy.com/3545/5025900/review...,Purchased this for my son. Has room to upgrade...,Great for kids or smaller needs,Hawk,"https://www.barcodable.com/upc/841667103150,ht...",True,False
4998,AVqkIdZiv8e3D1O-leaJ,2017-03-06T14:59:25Z,2017-09-04T11:19:31Z,"Fire Tablet with Alexa, 7"" Display, 16 GB, Mag...",B018Y224PY,Amazon,"Tablets,Fire Tablets,Electronics,iPad & Tablet...",Electronics,https://images-na.ssl-images-amazon.com/images...,"841667103150,0841667103150,firetabletwithalexa...",...,True,0,5,http://reviews.bestbuy.com/3545/5025900/review...,I had some thoughts about getting this for a 5...,Very sturdy for a 5 year old,Mrbilly,"https://www.barcodable.com/upc/841667103150,ht...",True,False


### Now for amazon

Here I add a column containing the length of the review text, in characters.

In [23]:
amazon['review length'] = amazon['review_text'].str.len()

In [24]:
amazon.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1463 entries, 0 to 1464
Data columns (total 17 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   product_id           1463 non-null   object
 1   product_name         1463 non-null   object
 2   product_category     1463 non-null   object
 3   current_price        1463 non-null   object
 4   original_price       1463 non-null   object
 5   discount_pct         1463 non-null   object
 6   average_rating       1463 non-null   object
 7   total_ratings        1463 non-null   object
 8   product_description  1463 non-null   object
 9   reviewer_id          1463 non-null   object
 10  reviewer_name        1463 non-null   object
 11  review_id            1463 non-null   object
 12  review_title         1463 non-null   object
 13  review_text          1463 non-null   object
 14  product_image_url    1463 non-null   object
 15  product_page_url     1463 non-null   object
 16  review leng

# Type conversion on Columns

### Amazon consumer review

In [25]:
Amazon_consumer_review.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 24 columns):
 #   Column                        Non-Null Count  Dtype 
---  ------                        --------------  ----- 
 0   product_id                    5000 non-null   object
 1   product_added_date            5000 non-null   object
 2   product_last_updated_date     5000 non-null   object
 3   product_name                  5000 non-null   object
 4   amazon_asin                   5000 non-null   object
 5   brand_name                    5000 non-null   object
 6   product_categories            5000 non-null   object
 7   primary_category              5000 non-null   object
 8   product_image_urls            5000 non-null   object
 9   product_keys                  5000 non-null   object
 10  manufacturer_name             5000 non-null   object
 11  manufacturer_part_number      5000 non-null   object
 12  review_date                   5000 non-null   object
 13  review_last_viewed

In [26]:
Amazon_consumer_review['product_id'] = Amazon_consumer_review['product_id'].astype(str)
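
The date columns show up as object dtype in the `info()` output, and the sample rows hold ISO 8601 strings such as `2017-03-03T16:56:05Z`. Converting them with `pd.to_datetime` may be a more useful conversion than `astype(str)`. The sketch below runs on a two-row toy frame built from values visible in the outputs; it is not applied to the full dataset here.

```python
import pandas as pd

df = pd.DataFrame({
    "product_added_date": ["2017-03-03T16:56:05Z", "2017-03-06T14:59:25Z"],
})

# utc=True keeps the trailing "Z" (UTC) information as a timezone-aware dtype.
df["product_added_date"] = pd.to_datetime(df["product_added_date"], utc=True)

# Datetime columns expose components through the .dt accessor.
years = df["product_added_date"].dt.year.tolist()
days = df["product_added_date"].dt.day.tolist()
print(df.dtypes)
print(years, days)
```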

### Amazon

In [27]:
amazon.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1463 entries, 0 to 1464
Data columns (total 17 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   product_id           1463 non-null   object
 1   product_name         1463 non-null   object
 2   product_category     1463 non-null   object
 3   current_price        1463 non-null   object
 4   original_price       1463 non-null   object
 5   discount_pct         1463 non-null   object
 6   average_rating       1463 non-null   object
 7   total_ratings        1463 non-null   object
 8   product_description  1463 non-null   object
 9   reviewer_id          1463 non-null   object
 10  reviewer_name        1463 non-null   object
 11  review_id            1463 non-null   object
 12  review_title         1463 non-null   object
 13  review_text          1463 non-null   object
 14  product_image_url    1463 non-null   object
 15  product_page_url     1463 non-null   object
 16  review leng

In [28]:
amazon['product_id'] = amazon['product_id'].astype(str)

In [29]:
amazon.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1463 entries, 0 to 1464
Data columns (total 17 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   product_id           1463 non-null   object
 1   product_name         1463 non-null   object
 2   product_category     1463 non-null   object
 3   current_price        1463 non-null   object
 4   original_price       1463 non-null   object
 5   discount_pct         1463 non-null   object
 6   average_rating       1463 non-null   object
 7   total_ratings        1463 non-null   object
 8   product_description  1463 non-null   object
 9   reviewer_id          1463 non-null   object
 10  reviewer_name        1463 non-null   object
 11  review_id            1463 non-null   object
 12  review_title         1463 non-null   object
 13  review_text          1463 non-null   object
 14  product_image_url    1463 non-null   object
 15  product_page_url     1463 non-null   object
 16  review leng
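
The price, discount and rating columns are all object dtype in the output above, so numeric work on them would need a conversion first. The notebook does not show their raw values, so the formats below (a currency symbol, thousands separators, a percent sign) are assumptions, and `to_number` is a hypothetical helper.

```python
import pandas as pd

# Made-up values; the real formatting of these columns is an assumption.
df = pd.DataFrame({
    "current_price": ["$399", "$1,299"],
    "discount_pct": ["64%", "43%"],
    "total_ratings": ["24,269", "943"],
})


def to_number(series):
    """Keep only digits and the decimal point, then parse as numbers."""
    cleaned = series.astype(str).str.replace(r"[^0-9.]", "", regex=True)
    # errors="coerce" turns anything unparseable into NaN instead of raising.
    return pd.to_numeric(cleaned, errors="coerce")


for col in ["current_price", "discount_pct", "total_ratings"]:
    df[col] = to_number(df[col])

print(df.dtypes)
```

After a conversion like this, `info()` would report numeric dtypes for those columns instead of object.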