## Examples 
1. Use the class SelScrape to get hourly data from weather.com
2. Use the class SelScrape to get information about a car from Craigslist
3. Use the class CraigAccess to get a fully detailed pandas DataFrame of car info from Craigslist

In [1]:
import sel_scrape as sc
import pandas as pd

___
### 1.0 Use SelScrape class to get weather data from weather.com
___

#### 1.01 Create an instance of SelScrape

In [2]:
scc = sc.SelScrape(headless=False)

#### 1.02 Navigate to the weather.com page for Fairfield, CT, and extract the hourly table element

In [3]:
w = 'https://weather.com/weather/hourbyhour/l/06824:4:US'
scc.goto(w)
p = '//table[@class="twc-table"]/..'
tws_table = scc.findxpath(p)


#### 1.04 Get the HTML for that table, and create a pandas DataFrame from that HTML.
* The ```pd.read_html``` function returns a list of DataFrames, one for each table in the HTML that you pass as the first argument.
* This weather.com page has only one table, so you access the element at index 0.

In [4]:
html_table = tws_table['value'][0].get_attribute('innerHTML')
array_of_df = pd.read_html(html_table)
# only one element in this array, which holds the table
df_hourly = array_of_df[0]


#### 1.05 Fix the columns because the first column is all NaNs

In [5]:
# keep every column label except the last one
cols = df_hourly.columns[:-1]
# drop the first column, which contains only NaNs
df_hourly = df_hourly.iloc[:, 1:].copy()
# reassign the labels so that they line up with the data
df_hourly.columns = cols
# display the DataFrame
df_hourly

| | Time | Description | Temp | Feels | Precip | Humidity | Wind |
|---|---|---|---|---|---|---|---|
| 0 | 6:15 pm Wed | Sunny | 67° | 67° | 0% | 43% | S 8 mph |
| 1 | 7:00 pm Wed | Sunny | 65° | 65° | 0% | 46% | SSE 7 mph |
| 2 | 8:00 pm Wed | Clear | 62° | 62° | 0% | 51% | ESE 5 mph |
| 3 | 9:00 pm Wed | Clear | 60° | 60° | 0% | 56% | E 4 mph |
| 4 | 10:00 pm Wed | Partly Cloudy | 55° | 55° | 0% | 74% | ENE 4 mph |
| 5 | 11:00 pm Wed | Partly Cloudy | 53° | 53° | 0% | 77% | ENE 4 mph |
| 6 | 12:00 am Thu | Partly Cloudy | 52° | 51° | 5% | 72% | NE 4 mph |
| 7 | 1:00 am Thu | Partly Cloudy | 51° | 50° | 5% | 68% | NE 4 mph |
| 8 | 2:00 am Thu | Partly Cloudy | 50° | 47° | 0% | 60% | NE 6 mph |
| 9 | 3:00 am Thu | Partly Cloudy | 49° | 46° | 0% | 57% | NE 6 mph |
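
The column fix in 1.05 can be tried without a browser. The following is a minimal sketch on a made-up two-row DataFrame; the sample values, the `Unused` label, and the helper name `shift_columns_left` are assumptions for illustration and are not part of `sel_scrape`.

```python
import pandas as pd

nan = float("nan")

# Hypothetical miniature of the scraped table: the header labels sit one
# position to the left of the data, so the first column holds only NaNs.
raw = pd.DataFrame(
    [[nan, "6:15 pm", "Sunny", "67°"],
     [nan, "7:00 pm", "Sunny", "65°"]],
    columns=["Time", "Description", "Temp", "Unused"],
)


def shift_columns_left(df):
    """Drop the leading all-NaN column and re-align the header labels."""
    labels = df.columns[:-1]        # every label except the last one
    fixed = df.iloc[:, 1:].copy()   # every column except the first one
    fixed.columns = labels          # labels now line up with the data
    return fixed


fixed = shift_columns_left(raw)
print(fixed)
```

Wrapping the three steps in a function keeps the original DataFrame untouched, which makes it easier to re-run the cell if the page layout changes.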


___
### 2.0 Use SelScrape class to extract info about cars on Craigslist
___

#### 2.01 Create a search URL by combining the "route" and the "parameters" of the url.

In [12]:
craig_url_base = "https://sfbay.craigslist.org/search/cta?"
craig_url_parameters = ["auto_make_model=BMW+328i",
                        "sort=date",
                        "max_auto_year=2010",
                        "auto_transmission=auto_transmission_1",
                        "min_auto_miles=0",
                        "max_auto_miles=500000"
                       ]
craig_url = craig_url_base + craig_url_parameters[0]
for url_param in craig_url_parameters[1:]:
    craig_url += '&' + url_param

print(craig_url)

https://sfbay.craigslist.org/search/cta?auto_make_model=BMW+328i&sort=date&max_auto_year=2010&auto_transmission=auto_transmission_1&min_auto_miles=0&max_auto_miles=500000
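
As an alternative to concatenating the parameters by hand, the same URL can be built with the standard library. This is a sketch that assumes the parameter values shown above; `urllib.parse.urlencode` is not used in the original notebook, and the `params` dictionary is introduced here only for illustration.

```python
from urllib.parse import urlencode

craig_url_base = "https://sfbay.craigslist.org/search/cta?"

# Same search parameters as above, written as key/value pairs.
params = {
    "auto_make_model": "BMW 328i",   # the space is encoded as '+'
    "sort": "date",
    "max_auto_year": 2010,
    "auto_transmission": "auto_transmission_1",
    "min_auto_miles": 0,
    "max_auto_miles": 500000,
}

# urlencode joins the pairs with '&' and escapes special characters.
craig_url = craig_url_base + urlencode(params)
print(craig_url)
```

A dictionary also makes it simple to change one search parameter without editing string fragments.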


#### 2.02 Retrieve this page

In [13]:
scc.goto(craig_url)

#### 2.02 Get the links to all listings that match the search URL above.  _However, only show info from the first match._

In [14]:
# a_link_array  = scc.driver.find_elements_by_xpath("//a[@class='result-title hdrlnk']")
a_link_array  = scc.findxpath("//a[@class='result-title hdrlnk']")['value']
# go to the href link of the first match (index 0)
hr = a_link_array[0].get_attribute("href")
scc.goto(hr)

#### 2.03 Create a dictionary that holds the xpath of every attribute that you want to find on this results page.

In [15]:
dict_things_to_get = {
    'price':"//span[@class='postingtitletext']/span[@class='price']",
    'page_title':"//span[@class='postingtitletext']/span[@id='titletextonly']",
    'auto_condition':'//section[@class="userbody"]/div[@class="mapAndAttrs"]/p[@class="attrgroup"]/span[contains(text(),"condition")]/b',
    'cylinders':'//section[@class="userbody"]/div[@class="mapAndAttrs"]/p[@class="attrgroup"]/span[contains(text(),"cylinders")]/b',
    'drive':'//section[@class="userbody"]/div[@class="mapAndAttrs"]/p[@class="attrgroup"]/span[contains(text(),"drive")]/b',
    'fuel':'//section[@class="userbody"]/div[@class="mapAndAttrs"]/p[@class="attrgroup"]/span[contains(text(),"fuel")]/b',
    'odometer':'//section[@class="userbody"]/div[@class="mapAndAttrs"]/p[@class="attrgroup"]/span[contains(text(),"odometer")]/b',
    'paint_color':'//section[@class="userbody"]/div[@class="mapAndAttrs"]/p[@class="attrgroup"]/span[contains(text(),"paint color")]/b',
    'title_status':'//section[@class="userbody"]/div[@class="mapAndAttrs"]/p[@class="attrgroup"]/span[contains(text(),"title status")]/b',
    'transmission':'//section[@class="userbody"]/div[@class="mapAndAttrs"]/p[@class="attrgroup"]/span[contains(text(),"transmission")]/b',
    'posted_full_text':"//section[@id='postingbody']",
}
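
Most of the xpaths above differ only in the label text inside `contains(text(), ...)`. One possible way to reduce the repetition is sketched below; the names `ATTR_PREFIX`, `attr_xpath`, and `attribute_labels` are hypothetical helpers introduced here, not part of `sel_scrape`.

```python
# Shared prefix of every attribute xpath on the listing page.
ATTR_PREFIX = (
    '//section[@class="userbody"]/div[@class="mapAndAttrs"]'
    '/p[@class="attrgroup"]'
)


def attr_xpath(label):
    """Return the xpath of the <b> value that follows the given label."""
    return f'{ATTR_PREFIX}/span[contains(text(),"{label}")]/b'


# Dictionary key -> label text as it appears on the page.
attribute_labels = {
    "auto_condition": "condition",
    "cylinders": "cylinders",
    "drive": "drive",
    "fuel": "fuel",
    "odometer": "odometer",
    "paint_color": "paint color",
    "title_status": "title status",
    "transmission": "transmission",
}

dict_things_to_get = {
    "price": "//span[@class='postingtitletext']/span[@class='price']",
    "page_title": "//span[@class='postingtitletext']/span[@id='titletextonly']",
    **{key: attr_xpath(label) for key, label in attribute_labels.items()},
    "posted_full_text": "//section[@id='postingbody']",
}

print(dict_things_to_get["paint_color"])
```

The resulting dictionary has the same keys and xpath strings as the hand-written version.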

#### 2.04 Print out the above attributes

In [16]:
for k, xpath in dict_things_to_get.items():
    e = scc.findxpath(xpath)
    # a non-None status means the xpath lookup failed
    if e['status'] is not None:
        thing = 'not found'
    else:
        thing = e['value'][0].text
    # print each attribute exactly once
    print(f"{k}: {thing}")


price: $9600
page_title: 2008 BMW 328i 2DR COUPE
auto_condition: not found
cylinders: not found
drive: not found
fuel: gas
odometer: 72694
paint_color: not found
title_status: clean
transmission: manual
posted_full_text: 2008 bmw 328i 2DR COUPE
CARFORNIA - Call or Text: 408-679-3018 - $9,600

2008 BMW 328i Coupe RWD 72k easy miles on it 2 Owner vehicle, clean history, no accident Always maintained by BMW dealership, Garage kept, paint is in immaculate condition, no dents or dings, Interior is like new with extra clean leather and carpet, non smoking, no pets, In Excellent Cosmetic and Mechanical Condition Must see to appreciate, For sale from Carfornia Please call or text 408-679-3018

CARFORNIA

Year: 2008
Make: BMW
Model: 328I
Series: 2DR COUPE
VIN: WBAWB33598PU88894
Stock #: s548
Condition: Used
Mileage: 72,694
Exterior: Gray
Interior: Black
Body: Coupe
Transmission: Manual
Engine: 3.0L I6
Drive T
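
Instead of printing each attribute, the values can be collected into a dictionary, which is a convenient step toward a DataFrame row. The sketch below assumes, as the loop in 2.04 implies, that `findxpath` returns a dict whose `'status'` is `None` on success and whose `'value'` is a list of elements with a `.text` attribute. `collect_attributes` and `fake_find` are hypothetical names, and `fake_find` is only a stand-in for `scc.findxpath` so that the example runs without a browser.

```python
from types import SimpleNamespace


def collect_attributes(find, xpaths, default="not found"):
    """Return {name: text} for each xpath, using `default` on a failed lookup."""
    row = {}
    for name, xpath in xpaths.items():
        result = find(xpath)
        if result["status"] is None and result["value"]:
            row[name] = result["value"][0].text
        else:
            row[name] = default
    return row


def fake_find(xpath):
    """Stand-in for scc.findxpath (assumed return shape, see lead-in)."""
    page = {"//price": "$9600", "//fuel": "gas"}
    if xpath in page:
        return {"status": None, "value": [SimpleNamespace(text=page[xpath])]}
    return {"status": "not found", "value": None}


row = collect_attributes(
    fake_find,
    {"price": "//price", "fuel": "//fuel", "drive": "//drive"},
)
print(row)
```

With a real session, passing `scc.findxpath` and `dict_things_to_get` in place of the stand-ins should give one dictionary per listing, under the same assumption about the return shape.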

### 3.0 Now use the class CraigAccess, which uses SelScrape to do more complicated auto searches.
**To search through every possible geo in the United States, remove the geos_csv_path argument from the CraigAccess constructor. See below.**
```
    ca_bmw = ca.CraigAccess(make=make,model=model,headless=False)
```

In [17]:
import craig_access as ca

In [20]:
make = 'BMW'
model = '635' # other models to try: 328i, 5 series
ca_bmw = ca.CraigAccess(make=make,model=model,geos_csv_path='./df_geos_subset.csv',headless=False)

In [21]:
df_bmws = ca_bmw.main()

2019-05-08 18:06:34,101 - root - INFO - ca_main starting 0000 -   46 time 2019-05-08 18:06:34.101246
2019-05-08 18:07:15,133 - root - INFO - processing href: https://sfbay.craigslist.org/pen/cto/d/belmont-bmw-635csi-manual-5-speed/6874656572.html
2019-05-08 18:08:00,407 - root - INFO - processing href: https://orlando.craigslist.org/cto/d/eustis-bmw-635-csi-euro-price-reduction/6878353499.html
2019-05-08 18:08:15,443 - root - INFO - processing href: https://orlando.craigslist.org/cto/d/eustis-bmw-635-csi-euro-price-reduction/6878353499.html

In [22]:
df_bmws

| | auto_condition | cylinders | date_posted | date_updated | drive | fuel | geo | href | odometer | page_text | page_title | paint_color | price | title_status | transmission |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | good | 6 cylinders | b'2019-04-25T11:58:43-0700' | | rwd | gas | san_francisco_bay_area | https://sfbay.craigslist.org/pen/cto/d/belmont... | 170000 | Rare manual 5 speed 1986 e24. Well maintained;... | BMW 635csi Manual 5 Speed | | $6900 | clean | manual |
| 1 | good | 6 cylinders | b'2019-04-30T14:20:59-0400' | | | gas | daytona_beach | https://orlando.craigslist.org/cto/d/eustis-bm... | 186000 | 1985 bmw 635 euro. 5 Speed trans. This car has... | bmw 635 csi EURO (Price Reduction) | black | $7500 | clean | manual |
| 2 | good | 6 cylinders | b'2019-04-30T14:20:59-0400' | | | gas | orlando | https://orlando.craigslist.org/cto/d/eustis-bm... | 186000 | 1985 bmw 635 euro. 5 Speed trans. This car has... | bmw 635 csi EURO (Price Reduction) | black | $7500 | clean | manual |
| 3 | good | 6 cylinders | b'2019-04-30T14:20:59-0400' | | | gas | tampa_bay_area | https://orlando.craigslist.org/cto/d/eustis-bm... | 186000 | 1985 bmw 635 euro. 5 Speed trans. This car has... | bmw 635 csi EURO (Price Reduction) | black | $7500 | clean | manual |
| 4 | good | 6 cylinders | b'2019-04-30T14:20:59-0400' | | | gas | treasure_coast | https://orlando.craigslist.org/cto/d/eustis-bm... | 186000 | 1985 bmw 635 euro. 5 Speed trans. This car has... | bmw 635 csi EURO (Price Reduction) | black | $7500 | clean | manual |
| 5 | excellent | 6 cylinders | b'2019-04-29T19:01:51-0400' | | rwd | gas | ann_arbor | https://annarbor.craigslist.org/cto/d/auburn-h... | 106700 | The E24-series 635 represents a high-water mar... | 1985 BMW 635 CSi | custom | $6950 | clean | manual |
| 6 | excellent | 6 cylinders | b'2019-04-29T19:01:51-0400' | | rwd | gas | detroit_metro | https://annarbor.craigslist.org/cto/d/auburn-h... | 106700 | The E24-series 635 represents a high-water mar... | 1985 BMW 635 CSi | custom | $6950 | clean | manual |
