# 의존성 설치

(선택사항) 도커 컴포즈 RUN 명령으로 필요한 의존성 설치

```
# 패키지 설치를 위해 사용자를 루트로 변경
USER root
RUN apt-get update
RUN apt-get install python-mysqldb -y
# 더이상 루트 권한이 필요하지 않으므로 사용자 변경
USER gopher
RUN pip install numpy pandas bs4 BeautifulSoup requests Flask-SQLAlchemy
```

왜 패키지명이랑 실제로 import 하는 이름이랑 틀릴 걸까? sqlalchemy 랑 mysqldb 가 import 명과 달라서 막혔었다.

아래는 본문에서 사용될 파이썬 패키지 임포트

In [3]:
import numpy as np #  배열 생성 및 연산
import pandas as pd # 데이터프레임워크
import pickle # 암호화

import bs4 as bs # 크롤링
import urllib # 웹데이터 읽어오기
import json # json
from bs4 import BeautifulSoup
import re # 정규식 지원
import MySQLdb # mysql 지원
from sqlalchemy import create_engine # 데이터베이스 툴킷

## urlib.request 버전 처리

파이썬 3.x 에는 urlib.request 가 있지만. 2.x 에서는 사용이 불가하다. 따라서 버전별로 처리를 해야함.

```python
In python 2 ,you simply use urllib for example

import urllib
htmlfile=urllib.urlopen("your url")
htmltext=htmlfile.read()
in python 3,you need to use urllib.request

import urllib.request
htmlfile=urllib.request.urlopen("your url")
htmltext=htmlfile.read()
```
from https://stackoverflow.com/users/4750965/niharika-kumar

if else 문으로 처리하면 모듈 없음 오류가 나면서 실행 자체가 중단된다. 그러면 앙됨.

In [2]:
try:
    import urllib.request as urlreq
except ImportError:
    import urllib as urlreq
    # raise ImportError('<any message you want here>')
     

# 크롤링

## 크롤링한 데이터를 담아올 데이터프레임 생성

In [3]:
columns = ['year', 'title','company','subname','mile','photos',
           'video','exterior_color','interior_color','transmission',
           'drivetrain','star','review_no','vendor','price']
df = pd.DataFrame(columns)

df

Unnamed: 0,0
0,year
1,title
2,company
3,subname
4,mile
5,photos
6,video
7,exterior_color
8,interior_color
9,transmission


## 크롤링 전략

- 중고차 판매 사이트인 `cars.com` 에서 차량에 대한 정보를 조회할 수 있는 API 를 제공한다.
- 하지만, 승인된 파트너만 조회할 수 있다.
- 즉, 웹사이트를 크롤링 해서 정보를 조회한다.

> **주의사항** urlib.request 대신에 urlreq 를 사용해준다.

## 크롤링 테스트 

### str 객체 가져오기

음..

In [162]:
def getPageFrom(url):
    sauce = urlreq.urlopen(url).read()
    return sauce

url = 'https://www.cars.com/for-sale/searchresults.action/?page=1&perPage=100&rd=99999&searchSource=PAGINATION&showMore=true&sort=relevance&stkTypId=28881&zc=31216'
page = getPageFrom(url)
print(type(page))
page[0:30]

<type 'str'>


'\n<!doctype html>\n<html lang="e'

### bs 객체로 str 변환

줄바꿈문자가 들어가 있는 html 문자열이 불러와졌다.
(파이썬이 다이나믹 타입 언어인가?, 이게 문자열 타입인지 모르겠다.)

가져온 문자열들을 lxml 로 html 파싱한다.

In [163]:
def parseFrom(page):
    parsed = bs.BeautifulSoup(page, 'lxml')
    return parsed

parsedPage = parseFrom(page)
print(type(parsedPage))

<class 'bs4.BeautifulSoup'>


파싱된 자료에서 자동차 세부사항을 담은 모든 div 클래스를 추출해서 list 에 담는다.

In [129]:
specificSoup = parsedPage.find_all('div', class_='listing-row__details')

### 정보 추출해서 딕셔너리에 담기

`cui-delta listing-row__title` 로 가져온 soup 태그(type 치면 나옴) 로부터 연식, 제조사, 모델명 추출한다.

- tag 객체의 get_text() 로 문자열만 가져올 수 있다.
- text strip() 으로 양쪽 공백을 벗긴다.
- text split(" ") 으로 공백을 기준으로 나눠 리스트 반환

> **주의** 연식, 제조사 다음에 모델명이 있는데 없는 경우부터 세개 이상인 경우가 있다.
이런 경우에 모델명 중 첫번째 부분것만 subname 으로 저장하고, 없을 경우 제조사명으로 저장한다.

> **주의** 타입을 모르니 예측 불가능한 면이 있어서 화가나는데.. 고언어는 파이썬이랑 비슷하면서도 타입이 있어서 좋네. 다트2는 어떨까?

> 계속 변수명으로 카멜케이스를 쓴다. 언더바로 분리하는 게 익숙치 않네.

아래는 콘텐츠를 tag 로부터 추출하는 처리 함수. 거의 공통으로 쓸 수 있도록 다듬었다.

In [432]:
def contentProcess(obj,holder,index, sep):
    if type(index) == list:
        return contentMerge(obj,holder,index, sep)
    else:
        return contentGet(obj,holder,index, sep)

def contentGet(obj,holder,index, sep):
    if type(obj) == unicode:
        content = obj.split(sep)[index]
    else:
        content = obj.get_text().strip().split(sep)[index].strip()
    print(content)
    # if not myString: 
    if content == "":
        content = holder
    return content

def contentMerge(obj,holder,idxList, sep):
    if type(obj) == unicode:
        try:
            content = " ".join(obj.split(sep)[idxList[0]:idxList[1]])
        except:
            content = " ".join(obj.split(sep)[idxList[0]:])
    else:
        try:
            content = " ".join(obj.get_text().strip().split(sep)[idxList[0]:idxList[1]])
        except:
            content = " ".join(obj.get_text().strip().split(sep)[idxList[0]:])
    print(content)
    # if not myString: 
    if content == "":
        content = holder
    return content

In [434]:
 for div in specificSoup:
    price_soup = div.find('span', {'class' : 'listing-row__price'})
        
    price = contentProcess(price_soup,"0",0,"\n")
    break

$82,990


In [384]:
for div in specificSoup:
            
    # <h2 class="cui-delta listing-row__title">\n                            2011 Audi R8 5.2L\n                        </h2>
    row_title_soup = div.find('h2', {'class' :'cui-delta listing-row__title'})
  
    year = contentProcess(row_title_soup,"empty year",0," ")
    company = contentProcess(row_title_soup,"empty company",1," ")
    subname = contentProcess(row_title_soup,company,2," ")
    title = contentProcess(row_title_soup,"no-title",[1]," ")

    break

2011
Audi
R8
Audi R8 5.2L


운행거리를 마일로 추출

In [401]:
for div in specificSoup:
     
    mile_soup = div.find('span', {'class' : 'listing-row__mileage'})
    mile = contentProcess(mile_soup,"no-mile",0," ")
    
    print(mile)
    
    break

30,556
30,556


벤더 추출

In [404]:
for div in specificSoup:
     
    vendor_soup =div.find('div',{'class' : 'listing-row__dealer-name listing-row__dealer-name-mobile'}).div
    vendor = contentProcess(vendor_soup,"no-vendor",[0]," ")
    print(vendor)
    break

Momentum Motorcars
Momentum Motorcars


media 숫자 추출

이건 stripped_strings 를 사용했다.

In [405]:
for div in specificSoup:
    media_soup = div.find('div', {'class' : 'media-count shadowed'})
    media_stripped = [ text for text in media_soup.stripped_strings]
    print(media_stripped)
        
    mediaCounts = map(lambda x: contentProcess(x,0,0," ") ,media_stripped)
    
    photo = mediaCounts[0]
    video = mediaCounts[1]
    
    print(mediaCounts)
    
    break
    

[u'32 Photos', u'1 Video']
32
1
[u'32', u'1']


In [421]:
for div in specificSoup:
    media_soup = div.find('div', {'class' : 'media-count shadowed'})
    
    photoPre = contentProcess(media_soup,0,0,"\n")
    videoPre = contentProcess(media_soup,0,1,"\n")
    photo = contentProcess(photoPre,0,0," ")
    video = contentProcess(videoPre,0,0," ")
    
    break
    

32 Photos
1 Video
32
1


메타 데이터 추출

요건 find_all 을 사용했다. 위랑 구조는 같은데 구현을 다르게 해보았음

In [422]:
for div in specificSoup:
    
    meta_soup = div.find('ul', {'class' : 'listing-row__meta'}).find_all("li")
        
    exterior_color = contentProcess(meta_soup[0],"black",1,":")
    interior_color = contentProcess(meta_soup[1],"black",1,":")
    transmission = contentProcess(meta_soup[2],"6-speed",1,":")
    
    def drivertrainProcess(tag, holder):
        content = contentProcess(tag, holder,1,":")[0].lower()
        print("content",content)
        if content == 'a':
            print("lower a")
            content = '4wd'
        else:
            content = content +'wd'
        return content
            
    drivetrain = drivertrainProcess(meta_soup[3], "fwd")
    print(exterior_color)
    print(interior_color)
    print(transmission)
    print(drivetrain)

    break

Phantom Black Pearl
Black
6-SPEED A/T
All Wheel Drive
('content', u'a')
lower a
Phantom Black Pearl
Black
6-SPEED A/T
4wd


process 메소드들을 하나로 합칠 수 있는데... 
Process 클래스를 만들고 하위에 각각을 넣거나, 아니면 모든 경우를 다 처리할 수 있는 메소드를 만들거나.
지금은 귀찮음.

별의 개수

css 를 찾아보니 filled 는 별 한 개, half 는 반 개, empty 는 0개.

In [303]:
for div in specificSoup:
    
    full_star_soup =div.find('div',{'class' : 'dealer-rating-stars'}).find_all('svg',{'class' : 'icon-image filled'})
    half_star_soup =div.find('div',{'class' : 'dealer-rating-stars'}).find_all('svg',{'class' : 'icon-image half'})
    star = len(full_star_soup) + len(half_star_soup)*0.5
    print(star)
    
    break

0
5.0


리뷰 넘버
음.. 이거는 앞에 걸 가져오네..

In [424]:
for div in specificSoup:
    
    review_soup =div.find('span',{'class' : 'listing-row__review-number'})
    
    
    review_no = contentProcess(review_soup,"0",0,"\n")
    
    print(review_no)
    
    break

83
83


가격

In [435]:
for div in specificSoup:
    print(type(div))
    price_soup = div.find('span', {'class' : 'listing-row__price'})
        
    price = contentProcess(price_soup,"0",0,"\n")
    
    print(price)
    
    break

<class 'bs4.element.Tag'>
$82,990
$82,990


In [111]:
def contentProcess(obj,holder,index, sep):

    if type(index) == list:
        content = contentMerge(obj,holder,index, sep)
    else:
        content = contentGet(obj,holder,index, sep)

    # print("contentProcess",content)

    return toReplaceAndLower(holder,content)

def contentGet(obj,holder,index, sep):
    if type(obj) == unicode or type(obj) == str:
        try:
            content = obj.split(sep)[index]
        except:
            content=""
    else:
        try:
            content = obj.get_text().strip().split(sep)[index].strip()
        except:
            content=""

    return content

def contentMerge(obj,holder,idxList, sep):
    if type(obj) == unicode:
        try:
            content = " ".join(obj.split(sep)[idxList[0]:idxList[1]])
        except:
            content = " ".join(obj.split(sep)[idxList[0]:])
    else:
        try:
            content = " ".join(obj.get_text().strip().split(sep)[idxList[0]:idxList[1]])
        except:
            content = " ".join(obj.get_text().strip().split(sep)[idxList[0]:])

    return content

def toReplaceAndLower( holder, content):
    if content == "":
        content = holder

    if type(content) !=int:
        content.lower()
    return content

def setMetaSubSoup(index):
    soup = div.find('ul', {'class' : 'listing-row__meta'}).find_all("li")
    try:
        sub_soup = soup[index]
    except:
        sub_soup = ""
    return sub_soup

def transmissionProcess(tag,holder):
    content = contentProcess(tag,holder,1,":").lower().split(" ")[0]
    first = content[0]
    if first == int and 1<= first <= 10:
        content = content[0]+"-speed"
    elif content == 'automatic':
        content = "6-speed"
    else:
        content = "x-speed"
    
    print(content)
    return content

def drivertrainProcess( tag,holder):
    content = contentProcess(tag,holder ,1,":").lower()
    import copy
    # not ref just value
    x = copy.deepcopy(content)
    if x == 'four wheel drive' or x == '4wd' or x=='4x4'or x=='awd':
         content = '4wd'
    elif x == '2wd' or x=='f w d': 
         content = 'fwd'
    elif x == 'rwd':
        pass
    else:
        content = '4wd' 
    
    print(content)
    return content

In [112]:
for div in specificSoup:
    exterior_color_soup = setMetaSubSoup(0)
    interior_color_soup = setMetaSubSoup(1)
    transmission_soup = setMetaSubSoup(2)
    drivetrain_soup = setMetaSubSoup(3)
    
    transmission = transmissionProcess(transmission_soup,"x-speed")
    drivetrain = drivertrainProcess(drivetrain_soup,"4wd")
    
    break

6-speed
4wd


## 크롤링 라이브러리화 하기

In [6]:
import numpy as np #  배열 생성 및 연산
import pandas as pd # 데이터프레임워크
import pickle # 암호화

import bs4 as bs # 크롤링
import urllib # 웹데이터 읽어오기
import json # json
from bs4 import BeautifulSoup
import re # 정규식 지원
import MySQLdb # mysql 지원
from sqlalchemy import create_engine # 데이터베이스 툴킷

try:
    import urllib.request as urlreq
except ImportError:
    import urllib as urlreq
    # raise ImportError('<any message you want here>')

In [7]:
# 크로울 다루는 책임을 가진 클래스
class CrawlHandler:
    # constructor take <class 'bs4.element.Tag'>
    def __init__(self, bs4_tag):
        self.tag = bs4_tag
        self.data = {}
        self.setSoup()
        self.setElement()
        
    def setMetaSubSoup(self,index):
        soup = self.meta_soup
        try:
            sub_soup = soup[index]
        except:
            sub_soup = ""
        return sub_soup
    
    def setSoup(self):
        setMetaSubSoup = self.setMetaSubSoup
        self.row_title_soup = div.find('h2', {'class' :'cui-delta listing-row__title'})
        self.mile_soup = div.find('span', {'class' : 'listing-row__mileage'})
        self.vendor_soup =div.find('div',{'class' : 'listing-row__dealer-name listing-row__dealer-name-mobile'}).div
        self.media_soup = div.find('div', {'class' : 'media-count shadowed'})

        self.meta_soup = div.find('ul', {'class' : 'listing-row__meta'}).find_all("li")
        self.exterior_color_soup = setMetaSubSoup(0)
        self.interior_color_soup = setMetaSubSoup(1)
        self.transmission_soup = setMetaSubSoup(2)
        self.drivetrain_soup = setMetaSubSoup(3)
        
        self.full_star_soup =div.find('div',{'class' : 'dealer-rating-stars'}).find_all('svg',{'class' : 'icon-image filled'})
        self.half_star_soup =div.find('div',{'class' : 'dealer-rating-stars'}).find_all('svg',{'class' : 'icon-image half'})
        self.review_soup =div.find('span',{'class' : 'listing-row__review-number'})
        self.price_soup = div.find('span', {'class' : 'listing-row__price'})
        
    def setElement(self):
        contentProcess = self.contentProcess
        drivertrainProcess = self.drivertrainProcess
        transmissionProcess =self.transmissionProcess

        
        row_title_soup = self.row_title_soup
        row_title_soup = self.row_title_soup
        mile_soup = self.mile_soup
        vendor_soup = self.vendor_soup
        media_soup = self.media_soup
        meta_soup = self.meta_soup
        full_star_soup = self.full_star_soup
        half_star_soup = self.half_star_soup
        review_soup = self.review_soup
        
        exterior_color_soup = self.exterior_color_soup
        interior_color_soup  = self.interior_color_soup 
        transmission_soup  = self.transmission_soup 
        drivetrain_soup = self.drivetrain_soup
        
        price_soup = self.price_soup
    
        self.year = contentProcess(row_title_soup,"empty year",0," ")
        self.company = contentProcess(row_title_soup,"empty company",1," ")
        self.subname = contentProcess(row_title_soup,self.company,2," ")
        self.title = contentProcess(row_title_soup,"no-title",[1]," ")
        self.mile = ''.join(contentProcess(mile_soup,"no-mile",0," ").split(','))
        self.vendor = contentProcess(vendor_soup,"no-vendor",[0]," ")
        photoPre = contentProcess(media_soup,0,0,"\n")
        videoPre = contentProcess(media_soup,0,1,"\n")
        self.photos = contentProcess(photoPre,0,0," ")
        self.video = contentProcess(videoPre,0,0," ")
        
        self.exterior_color = contentProcess(exterior_color_soup,"black",1,":")
        self.interior_color = contentProcess(interior_color_soup,"black",1,":")
        self.transmission = transmissionProcess(transmission_soup,"x-speed")
        self.drivertrain = drivertrainProcess(drivetrain_soup,"4wd")
        
        self.star = len(full_star_soup) + len(half_star_soup)*0.5
        self.review_no = contentProcess(review_soup,"0",0,"\n")
        self.price = ''.join(contentProcess(price_soup,"0",0,"\n")[1:].split(','))

        
    def getData(self):
        data = self.data
        
        data['year']=self.year
        data['company']=self.company
        data['subname']=self.subname
        data['title']=self.title
        data['mile']=self.mile
        data['vendor']=self.vendor
        data['photos']=self.photos
        data['video']=self.video
        data['exterior_color']=self.exterior_color
        data['interior_color']=self.interior_color
        data['transmission']=self.transmission
        data['drivertrain']=self.drivertrain
        data['star']=self.star
        data['review_no']=self.review_no
        data['price']=self.price
                
        return data
        

    def contentProcess(self, obj,holder,index, sep):
        
        if type(index) == list:
            content = self.contentMerge(obj,holder,index, sep)
        else:
            content = self.contentGet(obj,holder,index, sep)
        
        # print("contentProcess",content)
        
        return self.toReplaceAndLower(holder,content)

    def contentGet(self,obj,holder,index, sep):
        if type(obj) == unicode or type(obj) == str:
            try:
                content = obj.split(sep)[index]
            except:
                content=""
        else:
            try:
                content = obj.get_text().strip().split(sep)[index].strip()
            except:
                content=""
        
        return content

    def contentMerge(self,obj,holder,idxList, sep):
        if type(obj) == unicode:
            try:
                content = " ".join(obj.split(sep)[idxList[0]:idxList[1]])
            except:
                content = " ".join(obj.split(sep)[idxList[0]:])
        else:
            try:
                content = " ".join(obj.get_text().strip().split(sep)[idxList[0]:idxList[1]])
            except:
                content = " ".join(obj.get_text().strip().split(sep)[idxList[0]:])
        
        return content
    
    def toReplaceAndLower(self, holder, content):
        if content == "":
            content = holder
        
        if type(content) !=int:
            content.lower()
        return content
    
    def transmissionProcess(self, tag,holder):
        contentProcess = self.contentProcess
        content = contentProcess(tag,holder,1,":").lower().split(" ")[0]
        first = content[0]
        numbers = ['1','2','3','4','5','6','7','8','9','10']
        # print(first)
        if type(first) == int and 1<= first <= 10:
            content = content[0]+"-speed"
        elif first in numbers:
            content = first+"-speed"
        elif content == 'automatic':
            content = "6-speed"

        # print(content)
        return content
    
    def drivertrainProcess(self, tag,holder):
        contentProcess = self.contentProcess
        content = self.contentProcess(tag,holder ,1,":").lower()
        import copy
        # not ref just value
        x = copy.deepcopy(content)
        if x == 'four wheel drive' or x == '4wd' or x=='4x4'or x=='awd':
             content = '4wd'
        elif x == '2wd' or x=='f w d': 
             content = 'fwd'
        elif x == 'rwd':
            pass
        else:
            content = '4wd' 

        # print(content)
        return content

## 크롤링 실행

첫번째 것 이용

https://stackoverflow.com/users/5014455/juanpa-arrivillaga

그 외에 많은데, 쓸 필요가 없다.

https://stackoverflow.com/users/1085495/nasser-al-wohaibi

https://stackoverflow.com/users/4960953/mikhail-sam

https://stackoverflow.com/users/4640132/zuku

In [8]:
rows = []
for page in range(1,50):
    url = 'https://www.cars.com/for-sale/searchresults.action/?page='+str(page)+'&perPage=100&rd=99999&searchSource=PAGINATION&showMore=true&sort=relevance&stkTypId=28881&zc=31216'

    sauce = urlreq.urlopen(url).read()
    soup = bs.BeautifulSoup(sauce, 'lxml')

    specificSoup = soup.find_all('div', class_='listing-row__details')
    
    cnt = 0

    print("===",page)
    for div in specificSoup:
        data = CrawlHandler(div).getData()
        # print(data)
        rows.append(data)
        cnt +=1
    break

('===', 1)


# 데이터 프레임에 담기

In [9]:
df = pd.DataFrame(rows)
df

Unnamed: 0,company,drivertrain,exterior_color,interior_color,mile,photos,price,review_no,star,subname,title,transmission,vendor,video,year
0,Subaru,4wd,Desert Khaki,Black,26525,23,17999,43,5.0,Crosstrek,Subaru Crosstrek 2.0i,5-speed,1st Choice Autos,1,2016
1,Mazda,4wd,Silver,Gray,118205,21,3750,43,5.0,Tribute,Mazda Tribute LX V6,4-speed,1st Choice Autos,1,2003
2,Ford,4wd,Race Red,Black,2330,32,53900,244,5.0,F-150,Ford F-150 Raptor,1-speed,Gilbert & Baugh Ford,1,2018
3,Porsche,4wd,Gray,Black,14698,32,89997,10,5.0,Cayman,Porsche Cayman GT4,6-speed,Exclusive Auto Wholesale,1,2016
4,Chevrolet,4wd,Red Hot,Jet Black,1704,32,24000,117,5.0,Camaro,Chevrolet Camaro LT,8-speed,Subaru of Gainesville,1,2017
5,Mercedes-Benz,rwd,Palladium Silver Metallic,Silk Beige / Espresso Brown,121078,32,32991,939,5.0,S,Mercedes-Benz S 550,7-speed,Hendrick BMW,1,2014
6,INFINITI,rwd,Platinum,Black,31736,26,19998,18,5.0,Q50,INFINITI Q50 3.0T Premium,7-speed,Hertz Car Sales Winston-Salem,1,2017
7,Audi,4wd,Silver Metallic,Black,57171,32,14500,201,5.0,TT,Audi TT 3.2 quattro,6-speed,CarLotz Greensboro,1,2008
8,Tesla,4wd,Black,Black,17342,32,104890,96,5.0,Model,Tesla Model X P90D,1-speed,Bayshore Automotive,1,2016
9,BMW,rwd,Silver,Creme Beige,106334,31,9995,77,5.0,645,BMW 645 Ci,6-speed,RDU Auto Sales,1,2004


In [10]:
df["year"] = df["year"].astype('int')
df["mile"] = df["mile"].astype('int')
df["photos"] = df["photos"].astype('int')
df["video"] = df["video"].astype('int')
df["star"] = df["star"].astype('float')
df["review_no"] = df["review_no"].astype('int')
df["price"] = df["price"].astype('int')

# 데이터 베이스 처리

In [None]:
#### create table
# http://docs.sqlalchemy.org/en/latest/orm/tutorial.html
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()

# types 
# http://docs.sqlalchemy.org/en/latest/core/type_basics.html
from sqlalchemy import Column, Integer, String, Float

class CarInfo(Base):
    __tablename__ = 'car_info'

    id = Column(Integer, primary_key=True) 
    year = Column(Integer)
    company = Column(String(16))
    subname = Column(String(16))
    title = Column(String(16))
    mile = Column(Integer)
    vendor = Column(String(16))
    photos = Column(Integer)
    video = Column(Integer)
    exterior_color = Column(String(16))
    interior_color = Column(String(16))
    transmission = Column(String(16))
    drivertrain = Column(String(16))
    star = Column(Float)
    review_no = Column(String(16))
    price = Column(Integer)

    def __repr__(self):
        return rtform % (
                        self.id, self.year, self.company, self.subname, 
                        self.title, self.mile, self.vendor, 
                        self.photos, self.video, self.exterior_color, 
                        self.interior_color, self.transmission, self.drivertrain, 
                        self.star, self.review_no, self.price, )
    
    rtform = "<CarInfo(id = '%s', year = '%s', company = '%s', subname = '%s',title = '%s', mile = '%s', vendor = '%s', photos = '%s', video = '%s', exterior_color = '%s', interior_color = '%s', transmission = '%s', drivertrain = '%s', star = '%s', review_no = '%s', price = '%s)>"

CarInfo.__table__

In [28]:
#pw = pickle.load(open('./Data/pw.p','rb'))

# write local car_info
#engine = create_engine("mysql+mysqldb://root:" + pw.data + "@127.0.0.1/car_info", echo=True)
engine = create_engine("mysql+mysqldb://root:0@mysql/test", echo=True)

In [29]:
Base.metadata.create_all(engine)

2018-04-09 23:36:31,645 INFO sqlalchemy.engine.base.Engine SHOW VARIABLES LIKE 'sql_mode'
2018-04-09 23:36:31,647 INFO sqlalchemy.engine.base.Engine ()
2018-04-09 23:36:31,654 INFO sqlalchemy.engine.base.Engine SELECT DATABASE()
2018-04-09 23:36:31,656 INFO sqlalchemy.engine.base.Engine ()
2018-04-09 23:36:31,665 INFO sqlalchemy.engine.base.Engine show collation where `Charset` = 'utf8' and `Collation` = 'utf8_bin'
2018-04-09 23:36:31,666 INFO sqlalchemy.engine.base.Engine ()
2018-04-09 23:36:31,667 INFO sqlalchemy.engine.base.Engine SELECT CAST('test plain returns' AS CHAR(60)) AS anon_1
2018-04-09 23:36:31,668 INFO sqlalchemy.engine.base.Engine ()
2018-04-09 23:36:31,670 INFO sqlalchemy.engine.base.Engine SELECT CAST('test unicode returns' AS CHAR(60)) AS anon_1
2018-04-09 23:36:31,671 INFO sqlalchemy.engine.base.Engine ()
2018-04-09 23:36:31,674 INFO sqlalchemy.engine.base.Engine SELECT CAST('test collated returns' AS CHAR CHARACTER SET utf8) COLLATE utf8_bin AS anon_1
2018-04-09 23

In [30]:
df.to_sql(name="car_info", con=engine, if_exists='replace')

2018-04-09 23:37:50,407 INFO sqlalchemy.engine.base.Engine DESCRIBE `car_info`
2018-04-09 23:37:50,409 INFO sqlalchemy.engine.base.Engine ()
2018-04-09 23:37:50,413 INFO sqlalchemy.engine.base.Engine DESCRIBE `car_info`
2018-04-09 23:37:50,414 INFO sqlalchemy.engine.base.Engine ()
2018-04-09 23:37:50,419 INFO sqlalchemy.engine.base.Engine SHOW FULL TABLES FROM `test`
2018-04-09 23:37:50,421 INFO sqlalchemy.engine.base.Engine ()
2018-04-09 23:37:50,435 INFO sqlalchemy.engine.base.Engine SHOW CREATE TABLE `car_info`
2018-04-09 23:37:50,438 INFO sqlalchemy.engine.base.Engine ()
2018-04-09 23:37:50,442 INFO sqlalchemy.engine.base.Engine 
DROP TABLE car_info
2018-04-09 23:37:50,445 INFO sqlalchemy.engine.base.Engine ()
2018-04-09 23:37:50,469 INFO sqlalchemy.engine.base.Engine COMMIT
2018-04-09 23:37:50,472 INFO sqlalchemy.engine.base.Engine 
CREATE TABLE car_info (
	`index` BIGINT, 
	company TEXT, 
	drivertrain TEXT, 
	exterior_color TEXT, 
	interior_color TEXT, 
	mile BIGINT, 
	photos BIG