## Web Crawler to SQLite3 on eBay

### About this project
- This project combines web crawling, data cleaning, and storing the data in a database
- Here, I will be using the following libraries:
    - pandas
    - re
    - beautifulsoup4 (bs4)
    - requests
    - sqlite3
- I will also add comments in the places where you can change the variables to fit your needs

#### Steps
- Create a function that stores your "keyword" (the item you want to search for on eBay)
- Use the search engine on the eBay website to search for your "keyword"
    - Get the first n pages of results
- Extract each item's information
    - title of the item
    - price
    - condition
    - shipping information
    - return policy

|keyword|rank|item_title|price|condition|shipping_info|return|
|:--|:--|:--|:--|:--|:--|:--|
|starwars|1|The Black Series Princess|9.99|brand new|free shipping|null|
|starwars|2|Luke Skywalker & Ysalamiri |19.99|brand new|free shipping|free return|
|starwars|3|The Black Series Luke Skywalker|25.99|brand new|5.99 shipping|null|
|...|...|...|...|...|...|...|
|...|...|...|...|...|...|...|
|lego|1|100p lego |19.99|pre-owned|free shipping|null|

- Store the data above in a SQLite3 database
    - Write SQL queries to analyze the data
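The "search for the keyword, get the first n pages" steps come down to building one search URL per result page. A minimal sketch, assuming eBay still accepts the `_from`, `_nkw` (keyword) and `_pgn` (page number) query parameters used by the crawler in STEP 2; `search_urls` is a hypothetical helper name. Unlike plain string formatting, `urlencode` escapes spaces and symbols in the keyword:

```python
from urllib.parse import urlencode

BASE_URL = "https://www.ebay.com/sch/i.html"

def search_urls(keyword, n_pages):
    """Return the search URLs for result pages 1..n_pages of one keyword.

    Assumption: the query parameters are the ones from the STEP 2 crawler URL.
    """
    urls = []
    for pn in range(1, n_pages + 1):
        params = {"_from": "R40", "_nkw": keyword, "_pgn": pn}
        urls.append(BASE_URL + "?" + urlencode(params))  # "star wars" -> "star+wars"
    return urls

for url in search_urls("star wars", 2):
    print(url)
```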

## Importing the libraries

In [30]:
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
import requests
import sqlite3
import re

## STEP 1
#### Making a list of keywords

In [10]:
keyword = input('Type in words that you want to search: ')
keys = keyword.split()  # split() with no argument ignores repeated spaces instead of producing empty strings

Type in words that you want to search: Type in words that you want


In [11]:
keys

['Type', 'in', 'words', 'that', 'you', 'want']
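Splitting the input on spaces breaks a multi-word keyword such as "star wars" into separate words, and a repeated word leads to a repeated search. A small hypothetical helper, `parse_keywords`, sketches one way to handle both: split on a chosen separator, strip whitespace, and drop empty entries and duplicates:

```python
def parse_keywords(text, sep=None):
    """Split raw user input into a clean list of search keywords.

    sep=None splits on any run of whitespace; pass sep="," to keep
    multi-word keywords such as "star wars" together.
    Empty entries and duplicates are dropped, keeping the first occurrence.
    """
    keywords = []
    for part in text.split(sep):
        word = part.strip()
        if word and word not in keywords:
            keywords.append(word)
    return keywords

print(parse_keywords("starwars  lego starwars"))    # ['starwars', 'lego']
print(parse_keywords("star wars, lego", sep=","))  # ['star wars', 'lego']
```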

## STEP 2
#### Create a web crawler on eBay's search results to extract item information

In [140]:

keywords = ["starwars"]  # change this to search for other items; STEP 3 labels every row with keywords[0], so use one keyword at a time
n_pages = 3              # number of result pages to crawl per keyword

title_box = []
price_box = []
condition_box = []
shipping_box = []
return_box = []

def get_text(elm, tag, cls):
    # Text of the first <tag class="cls"> inside elm, or None if the listing has no such element
    found = elm.find(tag, {'class': cls})
    return found.get_text() if found is not None else None

for k in keywords:
    for pn in range(1, n_pages + 1):
        URL = 'https://www.ebay.com/sch/i.html?_from=R40&_nkw={0}&_pgn={1}'.format(k, pn)
        res = requests.get(URL)
        if res.status_code != 200:
            continue  # skip result pages that fail to load
        res.encoding = 'utf-8'
        soup = BeautifulSoup(res.text, 'html.parser')
        elms = soup.find_all('div', {'class': 's-item__info clearfix'})
        for i, elm in enumerate(elms):
            # Each field is looked up independently, so a missing price or
            # condition no longer skips the fields that come after it
            title = get_text(elm, 'h3', 's-item__title')
            title_box.append({'page': pn, 'index': i, 'title': title})
            price = get_text(elm, 'span', 's-item__price')
            if price is not None:
                price_box.append({'page': pn, 'index': i, 'price': price})
            cond = get_text(elm, 'span', 'SECONDARY_INFO')
            if cond is not None:
                condition_box.append({'page': pn, 'index': i, 'condition': cond})
            ship = get_text(elm, 'span', 's-item__shipping s-item__logisticsCost')
            if ship is not None:
                shipping_box.append({'page': pn, 'index': i, 'shipping': ship})
            r = get_text(elm, 'span', 's-item__free-returns s-item__freeReturnsNoFee')
            if r is not None:
                return_box.append({'page': pn, 'index': i, 'return': r})
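The CSS class names in the crawler can be checked without sending requests to eBay by running the same kind of extraction against a static snippet. A sketch under stated assumptions: `SAMPLE_HTML` is hypothetical, heavily trimmed markup shaped like what the crawler expects (eBay's real class names may differ or change), and `parse_items` with its `FIELDS` table is an illustrative table-driven variant, not the notebook's code:

```python
from bs4 import BeautifulSoup

# Hypothetical markup in the shape the crawler expects
SAMPLE_HTML = """
<div class="s-item__info clearfix">
  <h3 class="s-item__title">Star Wars The Black Series Princess Leia</h3>
  <span class="s-item__price">$9.99</span>
  <span class="s-item__shipping s-item__logisticsCost">Free shipping</span>
</div>
<div class="s-item__info clearfix">
  <h3 class="s-item__title">Star Wars The Black Series Luke Skywalker</h3>
  <span class="s-item__price">$26.36</span>
</div>
"""

# field name -> (tag, exact class string), same selectors as the crawler
FIELDS = {
    "title": ("h3", "s-item__title"),
    "price": ("span", "s-item__price"),
    "condition": ("span", "SECONDARY_INFO"),
    "shipping": ("span", "s-item__shipping s-item__logisticsCost"),
    "return": ("span", "s-item__free-returns s-item__freeReturnsNoFee"),
}

def parse_items(html, page):
    """Return one dict per listing; a missing field becomes None."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for i, elm in enumerate(soup.find_all("div", {"class": "s-item__info clearfix"})):
        item = {"page": page, "index": i}
        for name, (tag, cls) in FIELDS.items():
            found = elm.find(tag, {"class": cls})
            item[name] = found.get_text(strip=True) if found is not None else None
        items.append(item)
    return items

for item in parse_items(SAMPLE_HTML, page=1):
    print(item)
```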
            

## STEP 3
#### Store the lists of dictionaries built above in data frames

In [135]:
# Convert a list of dicts (one dict per item) into a DataFrame.
# pd.DataFrame accepts the list directly, so the row-by-row
# DataFrame.append loop (removed in pandas 2.0) is not needed.
def todf(x):
    return pd.DataFrame(x)

In [141]:
title_df = todf(title_box)
price_df = todf(price_box)
cond_df = todf(condition_box)
ship_df = todf(shipping_box)
r_df = todf(return_box)

In [148]:
DF = pd.merge(title_df, price_df, on=['page', 'index'], how='left')
DF = pd.merge(DF, cond_df, on=['page', 'index'], how='left')
DF = pd.merge(DF, ship_df, on=['page', 'index'], how='left')
DF = pd.merge(DF, r_df, on=['page', 'index'], how='left')
DF['keyword'] = keywords[0]
DF

| |page|index|title|price|condition|shipping|return|keyword|
|:--|:--|:--|:--|:--|:--|:--|:--|:--|
|0|1|0| | | | | |starwars|
|1|1|1|Star Wars The Black Series Princess Leia Organ...|$9.99|Brand New|Free shipping| |starwars|
|2|1|2|Star Wars The Black Series Luke Skywalker & Ys...|$26.36|Brand New| | |starwars|
|3|1|3|Star Wars The Black Series Luke Skywalker (Dag...|$15.99|Brand New|Free shipping| |starwars|
|4|1|4|Star Wars The Vintage Collection The Mandalori...|$59.19|Brand New|Free shipping| |starwars|
|...|...|...|...|...|...|...|...|...|
|196|3|62|Star Wars The Black Series Bad Batch Wrecker 6...|$44.00|Brand New|+$7.75 shipping| |starwars|
|197|3|63|STAR WARS Kid's Mandalorian Baby Yoda The Chil...|$14.99|Brand New|Free shipping| |starwars|
|198|3|64|FUNKO POP STAR WARS MANDALORIAN MANDO FLYING ...|$13.45|Brand New|Free shipping| |starwars|
|199|3|65|One-Stop Disney Infinity Shop! Buy 3 Get 2 FRE...|$4.50 to $40.00|Pre-Owned|Free shipping|Free returns|starwars|
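A minimal sketch of the same build-then-merge pattern on a few hypothetical records copied from the first rows of the output above: `pd.DataFrame` builds each frame straight from a list of dicts, and `functools.reduce` folds the chain of left merges into one expression. The `demo_*` and `merged` names are illustrative and chosen so they do not overwrite the notebook's real variables:

```python
from functools import reduce

import pandas as pd

# Hypothetical records in the same shape as title_box, price_box, shipping_box
demo_titles = [
    {"page": 1, "index": 1, "title": "Star Wars The Black Series Princess Leia"},
    {"page": 1, "index": 2, "title": "Star Wars The Black Series Luke Skywalker"},
]
demo_prices = [
    {"page": 1, "index": 1, "price": "$9.99"},
    {"page": 1, "index": 2, "price": "$26.36"},
]
demo_shipping = [
    {"page": 1, "index": 1, "shipping": "Free shipping"},  # item 2 has no shipping entry
]

frames = [pd.DataFrame(box) for box in (demo_titles, demo_prices, demo_shipping)]

# Left-merge every frame onto the titles, keyed by (page, index);
# items missing a field get NaN in that column
merged = reduce(
    lambda left, right: pd.merge(left, right, on=["page", "index"], how="left"),
    frames,
)
print(merged)
```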


In [None]:
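#### THINGS STILL LEFT TO DO
# Rename index to rank
# Drop the rows where index == 0
# Reorder the columns
# Keep only the numeric amount in price
# Clean up the shipping values
# Make return either NaN or Free
#
# Once everything is done, turn it all into a single function (that takes a list of keywords)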
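Several items in the to-do list are string clean-up. A minimal sketch of the price, shipping, and return steps, assuming the value formats seen in the STEP 3 output (`$9.99`, `$4.50 to $40.00`, `Free shipping`, `+$7.75 shipping`, `Free returns`). The function names are hypothetical, and taking the lower bound of a price range is an arbitrary choice:

```python
import re

# A dollar amount such as "$9.99" or "$1,299.00"; group 1 is the number
_AMOUNT = re.compile(r"\$([\d,]+(?:\.\d+)?)")

def parse_price(text):
    """'$9.99' -> 9.99; for a range like '$4.50 to $40.00', return the lower bound.

    Returns None for missing values (None/NaN) or text without a dollar amount.
    """
    if not isinstance(text, str):
        return None
    match = _AMOUNT.search(text)
    return float(match.group(1).replace(",", "")) if match else None

def parse_shipping(text):
    """'Free shipping' -> 0.0, '+$7.75 shipping' -> 7.75, anything else -> None."""
    if not isinstance(text, str):
        return None
    if text.strip().lower().startswith("free"):
        return 0.0
    return parse_price(text)

def parse_return(text):
    """'Free returns' -> 'Free'; anything else (including missing) -> None."""
    if isinstance(text, str) and text.strip().lower().startswith("free"):
        return "Free"
    return None
```

With pandas, each function can be applied column by column, for example `DF['price'] = DF['price'].map(parse_price)`; the `isinstance` checks let NaN cells pass through as missing values.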



In [None]:
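#### Notes
# With this, you can search eBay for anything you like by entering your own conditions
# Analysis (how many items have free shipping)
    # share of each condition
    # share of items with free returns
    # for a given search word, which product or character names appear most often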
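The analyses listed in these notes map onto SQL aggregates. A sketch, assuming the cleaned table has the columns of the sample table at the top of the notebook, with `return` renamed to the hypothetical `return_policy` and a free return stored as a non-NULL value. The rows are the illustrative ones from that sample table, not crawled results, and the table lives in an in-memory database:

```python
import sqlite3

rows = [
    ("starwars", 1, "The Black Series Princess", 9.99, "brand new", "free shipping", None),
    ("starwars", 2, "Luke Skywalker & Ysalamiri", 19.99, "brand new", "free shipping", "free return"),
    ("starwars", 3, "The Black Series Luke Skywalker", 25.99, "brand new", "5.99 shipping", None),
    ("lego", 1, "100p lego", 19.99, "pre-owned", "free shipping", None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (keyword TEXT, rank INTEGER, item_title TEXT, price REAL,"
    " condition TEXT, shipping_info TEXT, return_policy TEXT)"
)
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Share of items with free shipping and with free returns, per keyword.
# In SQLite a comparison evaluates to 0 or 1, so AVG() gives a fraction.
shares = conn.execute("""
    SELECT
        keyword,
        COUNT(*) AS n_items,
        AVG(shipping_info = 'free shipping') AS free_shipping_share,
        AVG(return_policy IS NOT NULL) AS free_return_share
    FROM items
    GROUP BY keyword
    ORDER BY keyword
""").fetchall()

# Distribution of item conditions, per keyword
conditions = conn.execute("""
    SELECT keyword, condition, COUNT(*) AS n_items
    FROM items
    GROUP BY keyword, condition
    ORDER BY keyword, n_items DESC
""").fetchall()
conn.close()

print(shares)
print(conditions)
```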

## STEP X
#### After making the complete table, import it into a database using sqlite

In [None]:
# After making the complete table, store it in a SQLite database
# (the file name "ebay.db" and table name "ebay_items" are placeholders; change them as needed)

conn = sqlite3.connect("ebay.db")
DF.to_sql('ebay_items', conn, if_exists='append', index=False)
conn.close()


conn = sqlite3.connect("ebay.db")

# Example analysis: number of items per condition for each keyword
first = pd.read_sql('''
        SELECT
            keyword,
            condition,
            COUNT(*) AS n_items
        FROM
            ebay_items
        GROUP BY
            keyword,
            condition
        ORDER BY
            keyword,
            n_items DESC''', con=conn)
conn.close()
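Because the cell above writes with `if_exists='append'`, running the notebook twice stores every item twice. One way around that, sketched here with a hypothetical `ebay_items` schema keyed on `(keyword, page, item_index)` and a hypothetical `save_items` helper, is to declare a primary key and use SQLite's `INSERT OR REPLACE`. `contextlib.closing` makes sure the connection is closed, since a `sqlite3` connection used as a context manager only commits or rolls back:

```python
import sqlite3
from contextlib import closing

# Hypothetical schema: (keyword, page, item_index) identifies one search result
SCHEMA = """
    CREATE TABLE IF NOT EXISTS ebay_items (
        keyword    TEXT    NOT NULL,
        page       INTEGER NOT NULL,
        item_index INTEGER NOT NULL,
        title      TEXT,
        price      TEXT,
        PRIMARY KEY (keyword, page, item_index)
    )
"""

def save_items(conn, records):
    """Insert (keyword, page, item_index, title, price) tuples, replacing rows
    that already exist for the same key."""
    with conn:  # commits on success, rolls back on error (does not close)
        conn.execute(SCHEMA)
        conn.executemany(
            "INSERT OR REPLACE INTO ebay_items VALUES (?, ?, ?, ?, ?)", records
        )

records = [
    ("starwars", 1, 1, "Star Wars The Black Series Princess Leia", "$9.99"),
    ("starwars", 1, 2, "Star Wars The Black Series Luke Skywalker", "$26.36"),
]

with closing(sqlite3.connect(":memory:")) as conn:
    save_items(conn, records)
    save_items(conn, records)  # a second run overwrites instead of duplicating
    count = conn.execute("SELECT COUNT(*) FROM ebay_items").fetchone()[0]

print(count)  # 2
```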