## Examples 
1. Use the class SelScrape to get hourly data from weather.com
3. Use the class SelScrape to get information about a car from Craigslist
3. Use the class CraigAccess to get a fully detailed pandas DataFrame of car info from Craigslist

In [None]:
import sel_scrape as sc
import pandas as pd
import datetime

___
### 1.0 Use SelScrape class to get weather data from weather.com
___

#### 1.01 Create an instance of SelScrape

In [None]:
scc = sc.SelScrape(headless=False)

#### 1.02 Navigate to the weather.com page for Fairfield, CT, and extract the element that contains the hourly table

In [None]:
w = 'https://weather.com/weather/hourbyhour/l/06824:4:US'
scc.goto(w)
p = '//table[@class="twc-table"]/..'
tws_table = scc.findxpath(p)


#### 1.04 Get the HTML for that table, and create a pandas DataFrame from that HTML.
* ```pd.read_html``` returns a list of DataFrames, one for each table found in the HTML that you pass as the first argument.
* This weather.com page has only one table, so you access list index 0.

In [None]:
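# newer pandas versions expect a file-like object rather than a raw HTML string
from io import StringIO

html_table = tws_table['value'][0].get_attribute('innerHTML')
array_of_df = pd.read_html(StringIO(html_table))
# only one element in this list, which holds the table
df_hourly = array_of_df[0]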


#### 1.05 Fix the columns: the first data column is all NaNs, so each header sits one position to the left of its data

In [None]:
# get all column objects but the last one
cols = df_hourly.columns[:-1]
# get rid of the first column of NaN's
df_hourly = df_hourly[df_hourly.columns.values[1:]]
# change the columns so that they now coincide with the data
df_hourly.columns = cols
# display the DataFrame
df_hourly
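
The column fix above can be tried offline on a toy DataFrame. This is a minimal sketch, not the notebook's own code: the column names, the sample rows, and the helper `drop_leading_empty_column` are all made up for illustration, and it assumes the scraped table has the same shape problem (an empty first data column and one surplus trailing label).

```python
import pandas as pd

# Toy stand-in for the scraped table (hypothetical values): the first data
# column is empty, so every label sits one position left of its data.
raw = pd.DataFrame(
    [[None, "1:00 pm", "Sunny", 71],
     [None, "2:00 pm", "Cloudy", 69]],
    columns=["Time", "Description", "Temp", "Unnamed: 3"],
)


def drop_leading_empty_column(df):
    """Drop the first data column and re-attach the labels to the right data."""
    labels = df.columns[:-1]         # every label except the surplus last one
    fixed = df.iloc[:, 1:].copy()    # every data column except the empty first one
    fixed.columns = labels           # labels and data now line up
    return fixed


fixed = drop_leading_empty_column(raw)
print(fixed)
```

Working on a copy keeps the original scraped DataFrame available if the shift needs to be inspected again.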

___
### 2.0 Use SelScrape class to extract info about cars on Craigslist
___

#### 2.01 Create a search URL by combining the "route" and the "parameters" of the url.

In [None]:
craig_url_base = "https://sfbay.craigslist.org/search/cta?"
craig_url_parameters = ["auto_make_model=BMW+328i",
                        "sort=date",
                        "max_auto_year=2010",
                        "auto_transmission=auto_transmission_1",
                        "min_auto_miles=0",
                        "max_auto_miles=500000"
                       ]
# join the parameters with '&' and append them to the route
craig_url = craig_url_base + '&'.join(craig_url_parameters)

print(craig_url)
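
As an alternative to joining strings by hand, the standard library's `urllib.parse.urlencode` can build the query string from a dictionary. The sketch below assumes the same route and parameter values as the cell above; the names `search_route` and `search_params` are made up for illustration. Note that `urlencode` turns the space in `'BMW 328i'` into `+`.

```python
from urllib.parse import urlencode

# Same route as above, without the trailing '?'
search_route = "https://sfbay.craigslist.org/search/cta"

# Same parameters as above, expressed as key/value pairs
search_params = {
    "auto_make_model": "BMW 328i",  # the space is encoded as '+'
    "sort": "date",
    "max_auto_year": 2010,
    "auto_transmission": "auto_transmission_1",
    "min_auto_miles": 0,
    "max_auto_miles": 500000,
}

encoded_url = f"{search_route}?{urlencode(search_params)}"
print(encoded_url)
```

Letting `urlencode` handle the escaping avoids malformed URLs when a make or model contains spaces or other reserved characters.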

#### 2.02 Retrieve this page

In [None]:
scc.goto(craig_url)

#### 2.02 Get the links to all matches for the search URL above.  _However, only show info from the first match._

In [None]:
# a_link_array  = scc.driver.find_elements_by_xpath("//a[@class='result-title hdrlnk']")
a_link_array  = scc.findxpath("//a[@class='result-title hdrlnk']")['value']
# goto the href link on the first match (index 0)
hr = a_link_array[0].get_attribute("href")
scc.goto(hr)

#### 2.03 Create a dictionary that holds the xpath of every attribute that you want to find on this results page.

In [None]:
dict_things_to_get = {
    'price':"//span[@class='postingtitletext']/span[@class='price']",
    'page_title':"//span[@class='postingtitletext']/span[@id='titletextonly']",
    'auto_condition':'//section[@class="userbody"]/div[@class="mapAndAttrs"]/p[@class="attrgroup"]/span[contains(text(),"condition")]/b',
    'cylinders':'//section[@class="userbody"]/div[@class="mapAndAttrs"]/p[@class="attrgroup"]/span[contains(text(),"cylinders")]/b',
    'drive':'//section[@class="userbody"]/div[@class="mapAndAttrs"]/p[@class="attrgroup"]/span[contains(text(),"drive")]/b',
    'fuel':'//section[@class="userbody"]/div[@class="mapAndAttrs"]/p[@class="attrgroup"]/span[contains(text(),"fuel")]/b',
    'odometer':'//section[@class="userbody"]/div[@class="mapAndAttrs"]/p[@class="attrgroup"]/span[contains(text(),"odometer")]/b',
    'paint_color':'//section[@class="userbody"]/div[@class="mapAndAttrs"]/p[@class="attrgroup"]/span[contains(text(),"paint color")]/b',
    'title_status':'//section[@class="userbody"]/div[@class="mapAndAttrs"]/p[@class="attrgroup"]/span[contains(text(),"title status")]/b',
    'transmission':'//section[@class="userbody"]/div[@class="mapAndAttrs"]/p[@class="attrgroup"]/span[contains(text(),"transmission")]/b',
    'posted_full_text':"//section[@id='postingbody']",
}

#### 2.04 Print out the above attributes

In [None]:
for k, xpath in dict_things_to_get.items():
    e = scc.findxpath(xpath)
    if e['status'] is not None:
        # the lookup failed: show the status and fall back to a placeholder
        print(f"{k}: {e['status']}")
        thing = 'not found'
    else:
        thing = e['value'][0].text
    print(f"{k}: {thing}")
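
The status check in the loop above can be wrapped in a small helper. This is a sketch under assumptions: it takes the result to be a dictionary with a `'status'` key (`None` on success) and a `'value'` key holding a list of elements, which is how the cells in this notebook use it. The helper `first_text` and the stand-in class `FakeElement` are hypothetical and not part of `sel_scrape`.

```python
def first_text(result, default="not found"):
    """Return the text of the first matched element, or `default`."""
    # Assumption: a non-None 'status' signals a failed lookup
    if result.get("status") is not None or not result.get("value"):
        return default
    return result["value"][0].text


class FakeElement:
    """Stand-in for a Selenium element; only the .text attribute is needed."""

    def __init__(self, text):
        self.text = text


# Hypothetical results shaped like the ones used in this notebook
found = {"status": None, "value": [FakeElement("$4,500")]}
missing = {"status": "no such element", "value": None}

print(first_text(found))
print(first_text(missing))
```

With such a helper, the attributes could be collected in one step, for example `{k: first_text(scc.findxpath(xp)) for k, xp in dict_things_to_get.items()}`.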


### 3.0 Now use the class CraigAccess, which uses SelScrape to do more complicated auto searches.
**To search through every possible geo in the United States, omit the geos_csv_path argument from the CraigAccess constructor, as in this example:**
```
# import the craig_access module
import craig_access as ca 
# create an instance of the CraigAccess class
ca_bmw_635 = ca.CraigAccess(make='BMW',model='635',headless=False)
# run the search by calling the main method
ca_bmw_635.main()
```
**Below is an example using other possible inputs to the CraigAccess constructor:**
```
# import the craig_access module
import craig_access as ca 
# create an instance of the CraigAccess class
ca_vw_beetle = ca.CraigAccess(
    headless=False, # show the browser
    geos_csv_path=None, # use all geos. See ./df_geos_subset.csv for a smaller set
    make='volkswagen', 
    model='beetle', 
    max_auto_year='1970', 
    max_auto_miles=300000, 
    auto_transmission=1,  # 1 = manual, 2 = automatic 
    )
# run the search by calling the main method
ca_vw_beetle.main()

```

In [None]:
import craig_access as ca

#### 3.01 Search for BMW 635s in the limited set of geos defined in the CSV file df_geos_subset.csv

In [None]:
make = 'BMW'
model = '635' # 328i, 5 series
ca_bmw = ca.CraigAccess(make=make,model=model,geos_csv_path='./df_geos_subset.csv',headless=False)

In [None]:
df_bmws = ca_bmw.main()

In [None]:
df_bmws

#### 3.02 Search all geos for BMW 2002s from before 1970.  This can take a while.

In [None]:
import importlib
importlib.reload(ca)

In [None]:
print(f'start search at {datetime.datetime.now()}')
urls_only = False
ca_bmw_2002 = ca.CraigAccess(headless=False,
        make='BMW',
        model='2002',
        max_auto_year='1969', 
        max_auto_miles=300000,
        urls_only=urls_only)
df_bmw_2002 = ca_bmw_2002.main()
print(f'end search at {datetime.datetime.now()}')
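
The start and end prints above can be folded into a small context manager that also reports the elapsed time. This is a minimal sketch: the name `timed` is made up for illustration, and the `sum(...)` call is only a stand-in for a long-running call such as `ca_bmw_2002.main()`.

```python
import datetime
import time
from contextlib import contextmanager


@contextmanager
def timed(label):
    """Print start and end timestamps and record the elapsed seconds."""
    timing = {}
    print(f"start {label} at {datetime.datetime.now()}")
    start = time.perf_counter()
    try:
        yield timing
    finally:
        # recorded even if the wrapped code raises
        timing["seconds"] = time.perf_counter() - start
        print(f"end {label} at {datetime.datetime.now()} "
              f"({timing['seconds']:.2f} s elapsed)")


with timed("search") as t:
    total = sum(range(100_000))  # stand-in for the real search call
```

After the block exits, `t["seconds"]` holds the duration, which is handy when comparing a subset-geo search with an all-geo search.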

In [None]:
df_bmw_2002