# Data Cleaning and Transformation of the eBay Computer Components Listings Dataset

## Overview

This notebook cleans and transforms a dataset of eBay listings for computer components. The aim is to make the raw scraped data usable for analysis by removing incomplete rows, splitting the combined `info` field into separate columns, and standardizing the brand and price values.

### Importing Necessary Libraries:

In [2]:
import pandas as pd
import numpy as np
import re

### Setting up the Display Options:

In [7]:
pd.set_option('display.max_colwidth', 95)
pd.set_option('display.max_rows', None)

### Loading the Dataset:

In [8]:
file_path = r"C:\Users\Lyagovich\Documents\Portfolio\Ebay Scraper\ebay_listings.csv"
df = pd.read_csv(file_path)
df1 = df.copy()  # work on a copy so the raw data stays untouched

### Displaying Dataset Information and Preview:

In [5]:
df1.info()
df1.head(5)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   title      4803 non-null   object
 1   price      4803 non-null   object
 2   info       4803 non-null   object
 3   link       4803 non-null   object
 4   sold_date  4783 non-null   object
dtypes: object(5)
memory usage: 187.7+ KB


Unnamed: 0,title,price,info,link,sold_date
0,Shop on eBay,$20.00,Brand New,https://ebay.com/itm/123456?hash=item28caef0a3a:g:E3kAAOSwlGJiMikD&amdata=enc%3AAQAHAAAAsJo...,
1,New ListingASUS TUF NVIDIA GeForce RTX 3060 TI 8GB GDDR6 OC Graphics Card,$259.99,Pre-Owned · ASUS · NVIDIA GeForce RTX 3060 Ti · 8 GB,https://www.ebay.com/itm/374775205575?epid=16049131519&hash=item574257cec7%3Ag%3AtS8AAOSwHC...,2023-06-26
2,NVIDIA GeForce RTX 2080 Super GDDR6 Founders Edition Graphics Card - 8GB,$232.50,Pre-Owned · NVIDIA · NVIDIA GeForce RTX 2080 Super · 8GB,https://www.ebay.com/itm/325700341743?epid=13036084099&hash=item4bd540cfef%3Ag%3ASWoAAOSwQ4...,2023-06-26
3,XFX AMD Radeon RX 580 8GB GDDR5 Graphics Card - RX-580P8DFD6,$79.95,Pre-Owned · XFX · AMD Radeon RX 580 · 8 GB,https://www.ebay.com/itm/275915192201?epid=17012571362&hash=item403dd3b789%3Ag%3ATJoAAOSwKp...,2023-06-26
4,EVGA GeForce RTX 3060 Ti XC GAMING 8GB GDDR6 Graphics Card (08G-P5-3663-KL),$225.00,Pre-Owned · EVGA · NVIDIA GeForce RTX 3060 Ti · 8 GB,https://www.ebay.com/itm/266304790412?epid=7048188903&hash=item3e01009f8c%3Ag%3AF-gAAOSwEWl...,2023-06-26


### Splitting the 'info' Column and Handling Missing Data:

In [12]:
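# Split 'info' on the '·' separator into four fields, then strip surrounding whitespace.
# Rows with fewer than four fields get NaN in the missing columns; this assumes no row has more than four.
df1[['condition', 'brand', 'model', 'memory']] = df1['info'].apply(lambda x: pd.Series(str(x).split('·')))
df1[['condition', 'brand', 'model', 'memory']] = df1[['condition', 'brand', 'model', 'memory']].apply(lambda x: x.str.strip())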
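
The same split can also be written with the vectorized `.str.split(..., expand=True)` accessor, which avoids building a `pd.Series` per row. The following is a minimal sketch rather than the notebook's own method; the `sample` frame and its rows are hypothetical stand-ins for `df1`.

```python
import pandas as pd

# Hypothetical sample rows mimicking the 'info' column; not taken from the real file.
sample = pd.DataFrame({
    "info": [
        "Pre-Owned · ASUS · NVIDIA GeForce RTX 3060 Ti · 8 GB",
        "Brand New",  # a row with only one field
    ]
})

cols = ["condition", "brand", "model", "memory"]

# n=3 caps the split at four parts; expand=True returns one column per part.
parts = sample["info"].astype(str).str.split("·", n=3, expand=True)

# Strip whitespace in each column; missing parts stay missing.
parts = parts.apply(lambda s: s.str.strip())

# reindex pads to exactly four columns if every row had fewer parts.
parts = parts.reindex(columns=range(len(cols)))
parts.columns = cols

sample = sample.join(parts)
```

Capping the split with `n=3` means a row with extra separators keeps the remainder in `memory` instead of raising an error on assignment.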

In [13]:
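# Drop rows without a sale date (such as the "Shop on eBay" placeholder row).
df1 = df1.dropna(subset=['sold_date'])
df1['info'] = df1['info'].astype(str)
# Drop rows where any of the fields parsed from 'info' is missing.
df1 = df1.dropna(subset=['condition', 'brand', 'model', 'memory'])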
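
Before dropping rows, it can help to count how many values are missing per column and how many rows the drop removes. A minimal sketch on hypothetical data (the `sample` frame below is illustrative, not part of the dataset):

```python
import pandas as pd

# Hypothetical rows with gaps in 'sold_date' and 'memory'.
sample = pd.DataFrame({
    "sold_date": ["2023-06-26", None, "2023-06-26"],
    "condition": ["Pre-Owned", "Brand New", "Pre-Owned"],
    "memory": ["8 GB", None, None],
})

# Number of missing values in each column.
missing_per_column = sample.isna().sum()

# Drop rows missing any of the listed columns; .copy() avoids
# chained-assignment warnings on later column assignments.
cleaned = sample.dropna(subset=["sold_date", "memory"]).copy()

rows_dropped = len(sample) - len(cleaned)
print(missing_per_column)
print(f"rows dropped: {rows_dropped}")
```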

### Modifying Brands:

In [14]:
brand_counts = df1['brand'].value_counts()  # inspect brand frequencies to spot inconsistent labels
df1 = df1[df1['brand'] != 'HP']  # exclude HP listings
# Collapse combined labels into a single brand name.
df1.loc[df1['brand'] == 'Msi|Nvidia', 'brand'] = 'MSI'
df1.loc[df1['brand'] == 'Graphics|MSI', 'brand'] = 'MSI'
df1.loc[df1['brand'] == 'MSI|Geforce', 'brand'] = 'MSI'
df1.loc[df1['brand'] == 'Nvidia|Pny', 'brand'] = 'PNY'
df1['brand'] = df1['brand'].str.upper()  # unify casing
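
The individual `.loc` assignments above could also be expressed as a single lookup table passed to `Series.replace`. This is a sketch under the assumption that only exact label matches need fixing; the `brands` series and the `brand_fixes` name are hypothetical.

```python
import pandas as pd

# Hypothetical brand values, including the combined labels handled above.
brands = pd.Series(["Msi|Nvidia", "Graphics|MSI", "MSI|Geforce", "Nvidia|Pny", "asus", "HP"])

# Map each combined label to a single brand; unlisted values pass through unchanged.
brand_fixes = {
    "Msi|Nvidia": "MSI",
    "Graphics|MSI": "MSI",
    "MSI|Geforce": "MSI",
    "Nvidia|Pny": "PNY",
}

# Exclude HP, apply the exact-match mapping, then unify casing.
cleaned = brands[brands != "HP"].replace(brand_fixes).str.upper()
print(cleaned.tolist())
```

Keeping the corrections in one dictionary makes it easier to add new labels as they turn up in `brand_counts`.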

### Cleaning 'price' Column:

In [16]:
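# Drop placeholder rows first, then strip '$' and ',' (both via .str) before converting to float.
df1 = df1[df1['price'] != 'Tap item to see current priceSee price']
df1['price'] = df1['price'].str.replace('$', '', regex=False).str.replace(',', '', regex=False).astype(float)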
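
If other unparseable price strings turn up, `pd.to_numeric` with `errors="coerce"` is a more forgiving alternative to `astype(float)`: it turns anything non-numeric into `NaN` instead of raising. A minimal sketch on hypothetical values (the `raw` series is illustrative only):

```python
import pandas as pd

# Hypothetical raw price strings; the last one is the placeholder text filtered above.
raw = pd.Series(["$259.99", "$1,049.00", "Tap item to see current priceSee price"])

# Remove '$' and ',' with the .str accessor so the replacement is applied inside each string.
stripped = raw.str.replace("$", "", regex=False).str.replace(",", "", regex=False)

# Coerce anything that still is not a number to NaN.
prices = pd.to_numeric(stripped, errors="coerce")

# Rows that failed to parse can be inspected before being dropped.
unparsed = raw[prices.isna()]
prices = prices.dropna()
```

Note that `Series.replace(',', '')` without `.str` only replaces values that are exactly `','`, which is why the sketch uses `.str.replace` for both substitutions.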

### Preview of Cleaned Data:

In [17]:
df1.head()

Unnamed: 0,title,price,info,link,sold_date,condition,brand,model,memory
1,New ListingASUS TUF NVIDIA GeForce RTX 3060 TI 8GB GDDR6 OC Graphics Card,,Pre-Owned · ASUS · NVIDIA GeForce RTX 3060 Ti · 8 GB,https://www.ebay.com/itm/374775205575?epid=16049131519&hash=item574257cec7%3Ag%3AtS8AAOSwHC...,2023-06-26,Pre-Owned,ASUS,NVIDIA GeForce RTX 3060 Ti,8 GB
2,NVIDIA GeForce RTX 2080 Super GDDR6 Founders Edition Graphics Card - 8GB,,Pre-Owned · NVIDIA · NVIDIA GeForce RTX 2080 Super · 8GB,https://www.ebay.com/itm/325700341743?epid=13036084099&hash=item4bd540cfef%3Ag%3ASWoAAOSwQ4...,2023-06-26,Pre-Owned,NVIDIA,NVIDIA GeForce RTX 2080 Super,8GB
3,XFX AMD Radeon RX 580 8GB GDDR5 Graphics Card - RX-580P8DFD6,,Pre-Owned · XFX · AMD Radeon RX 580 · 8 GB,https://www.ebay.com/itm/275915192201?epid=17012571362&hash=item403dd3b789%3Ag%3ATJoAAOSwKp...,2023-06-26,Pre-Owned,XFX,AMD Radeon RX 580,8 GB
4,EVGA GeForce RTX 3060 Ti XC GAMING 8GB GDDR6 Graphics Card (08G-P5-3663-KL),,Pre-Owned · EVGA · NVIDIA GeForce RTX 3060 Ti · 8 GB,https://www.ebay.com/itm/266304790412?epid=7048188903&hash=item3e01009f8c%3Ag%3AF-gAAOSwEWl...,2023-06-26,Pre-Owned,EVGA,NVIDIA GeForce RTX 3060 Ti,8 GB
5,Nvidia Quadro P4000 8GB HP GDDR5 Workstation Graphics Card,,Pre-Owned · NVIDIA · NVIDIA Quadro 4000 · 8 GB,https://www.ebay.com/itm/404348964911?hash=item5e2513902f%3Ag%3A5f4AAOSwafJklB5T&amdata=enc...,2023-06-26,Pre-Owned,NVIDIA,NVIDIA Quadro 4000,8 GB


## Conclusion:

This Jupyter Notebook presents a data cleaning and preprocessing routine for a raw eBay dataset of sold graphics cards. The work relies on the pandas library for data manipulation. The raw columns are the product's title, its selling price, additional details, the product link, and the date it was sold; the additional details are split into condition, brand, model, and memory.

This is the second of four notebooks where we're navigating through the complete data pipeline process. Beginning with data collection in the first notebook, we'll progressively clean, organize, transform, analyze, and visualize the data.

This particular notebook focuses on the data cleaning and preprocessing phase of the project. A series of pandas operations handles missing data, splits and reformats columns, and resolves inconsistencies in the dataset. The cleaned and preprocessed data is then stored as a CSV file, which provides a well-structured basis for later analysis and visualization.
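
The export step is not shown in the cells above. A minimal sketch of how the cleaned frame might be written out and read back, assuming a hypothetical output file name and a temporary directory (the notebook's real output path is not given):

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the cleaned DataFrame.
cleaned = pd.DataFrame({
    "title": ["Example card"],
    "price": [259.99],
    "brand": ["ASUS"],
})

# Hypothetical output location; replace with the project's actual path.
out_path = os.path.join(tempfile.mkdtemp(), "ebay_listings_cleaned.csv")

# index=False keeps the row index from being written as an extra unnamed column.
cleaned.to_csv(out_path, index=False)

# Read the file back to confirm the columns survived the round trip.
reloaded = pd.read_csv(out_path)
print(reloaded.columns.tolist())
```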
