# eBayScraper: An Analysis on Scraped Data

The data in [aggregate.csv](./aggregate.csv) are updated periodically with newly scraped data from [ebay.com](https://www.ebay.com/). This dataset mainly focuses on electronic goods.

A few important details about how the data are collected:
* These data are not representative of all sales. They are limited to transactions posted on [ebay.com](https://www.ebay.com/).
* Products were added to the scraper at different points in time. For this reason, items like the Apple AirPods Max have no data earlier than December 15th, 2020, while other items, like the Apple iPhone 8, appear as early as August 9th, 2020.
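
Because coverage differs by product, it can help to check the first and last observed sale date per item before comparing prices over time. The following is a minimal sketch on a toy frame that borrows the `groupC`, `date`, and `price` column names from `aggregate.csv`; the item labels and values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical rows for illustration only; not real scraped data.
toy = pd.DataFrame({
    "groupC": ["apple iphone 8", "apple iphone 8", "apple airpods max"],
    "date": ["2020-08-09", "2020-12-20", "2020-12-15"],
    "price": [150.0, 140.0, 600.0],
})
toy["date"] = pd.to_datetime(toy["date"])

# First and last observed sale date for each item.
coverage = toy.groupby("groupC")["date"].agg(first_seen="min", last_seen="max")
print(coverage)
```

Items with a late `first_seen` date have a shorter history, so any comparison across items should probably be restricted to a window that all compared items share.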

In [1]:
import pandas as pd
import seaborn as sns

In [2]:
aggregate = pd.read_csv("aggregate.csv")
aggregate.head(3)

| Unnamed: 0 | sale_condition | groupA | groupB | groupC | title | price | date |
|---|---|---|---|---|---|---|---|
| 0 | Auction | calculator | calculator | ti-83 plus calculator | Texas Instruments Ti-83 Plus Graphing Calculator | 12.1 | 2020-08-09 |
| 1 | Auction | calculator | calculator | ti-83 plus calculator | TI-83 Plus Graphing Calculator Texas Instruments | 3.99 | 2020-08-09 |
| 2 | Auction | calculator | calculator | ti-83 plus calculator | Texas Instruments TI-83 Graphing Calculator Te... | 10.5 | 2020-08-09 |


In [4]:
# keep only items with more than SIZE_MINIMUM transactions
SIZE_MINIMUM = 1000
aggregate = aggregate.groupby("groupC").filter(lambda df: df.shape[0] > SIZE_MINIMUM)

In [5]:
tabsize = 25
print("Items tracked:\t{:,}".format(len(aggregate["groupC"].unique())).expandtabs(tabsize))
print("Transactions:\t{:,}".format(aggregate.shape[0]).expandtabs(tabsize))

Items tracked:           190
Transactions:            2,482,423


In [6]:
aggregate["groupC"].value_counts()

PlayStation 1                   97394
PlayStation 5                   74350
PlayStation 4                   62279
The Beatles With the Beatles    61902
Nintendo DS                     58711
                                ...  
Nikon D3                         1165
Canon EOS 7D Mark II             1143
Nikon D810                       1107
Leica S                          1103
Fujifilm FinePix X Series        1069
Name: groupC, Length: 190, dtype: int64

## Some questions to investigate

1. Which items lose the most value?
2. Which items lose a fixed percent of their value the most quickly?
3. Which items best retain their value?
4. Which items take the longest to lose a fixed percent of their value?
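
One possible way to make these questions concrete is sketched below. It assumes two metrics that are not defined in the original analysis: the percent change between an item's first and last observed monthly median price (questions 1 and 3), and the number of days until the daily median price first falls to a given fraction below the first day's median (questions 2 and 4). The helper names `percent_change` and `days_to_lose` are hypothetical, and the toy values are made up for illustration.

```python
import pandas as pd

# Toy data using the groupC/date/price column names; values are illustrative only.
toy = pd.DataFrame({
    "groupC": ["item a"] * 4 + ["item b"] * 4,
    "date": pd.to_datetime(["2020-08-09", "2020-08-20", "2020-12-01", "2020-12-15"] * 2),
    "price": [100.0, 100.0, 50.0, 50.0, 200.0, 200.0, 180.0, 180.0],
})

def percent_change(df):
    """Percent change from each item's first to last observed monthly median price."""
    month = df["date"].dt.to_period("M")
    medians = df.groupby(["groupC", month])["price"].median().unstack("date")
    # bfill/ffill skip months in which an item has no sales.
    first = medians.bfill(axis=1).iloc[:, 0]
    last = medians.ffill(axis=1).iloc[:, -1]
    return ((last - first) / first * 100).sort_values()

def days_to_lose(group, fraction):
    """Days until the daily median first drops by `fraction`; NaN if it never does."""
    daily = group.groupby("date")["price"].median().sort_index()
    threshold = daily.iloc[0] * (1 - fraction)
    hit = daily[daily <= threshold]
    if hit.empty:
        return float("nan")
    return (hit.index[0] - daily.index[0]).days

change = percent_change(toy)
days = pd.Series({name: days_to_lose(g, 0.25) for name, g in toy.groupby("groupC")})
print(change)
print(days)
```

Using medians rather than means is a design choice meant to dampen the effect of outlier listings. A single first-day median is still a noisy baseline, so a real analysis would likely average over a longer initial window.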