## CarGurus Platform Notes

### get_links() 

- Two search endpoints: one for NEW cars, one for USED cars
- Sorting with `sortType=AGE_IN_DAYS` may make it possible to pick out new postings

`NEW CARS` search listings link structure samples
- https://www.cargurus.com/Cars/new/searchresults.action?zip=48933&inventorySearchWidgetType=NEW_CAR&sortDir=ASC&sourceContext=untrackedExternal_false_0&distance=100&sortType=AGE_IN_DAYS

`USED CARS` search listings link structure samples
- https://www.cargurus.com/Cars/inventorylisting/viewDetailsFilterViewInventoryListing.action?zip=48933&inventorySearchWidgetType=AUTO&sortDir=ASC&sourceContext=untrackedWithinSite_false_0&distance=100&sortType=PRICE

- https://www.cargurus.com/Cars/inventorylisting/viewDetailsFilterViewInventoryListing.action?zip=48933&inventorySearchWidgetType=AUTO&sortDir=ASC&sourceContext=untrackedWithinSite_true_1&distance=200&sortType=AGE_IN_DAYS

- https://www.cargurus.com/Cars/inventorylisting/viewDetailsFilterViewInventoryListing.action?zip=48933&distance=200
- https://www.cargurus.com/Cars/inventorylisting/viewDetailsFilterViewInventoryListing.action?zip=48933&distance=200#resultsPage=2
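The sample links above share one query-string pattern, so a search URL could be assembled from parameters instead of being written by hand. Below is a minimal sketch using only the standard library. `build_search_url` and `BASE_URLS` are hypothetical names, and the sketch assumes the site accepts the reduced parameter set seen in the shorter samples (`zip`, `distance`) together with `sortDir` and `sortType`.

```python
from urllib.parse import urlencode

# Endpoint paths copied from the sample links above.
BASE_URLS = {
    "new": "https://www.cargurus.com/Cars/new/searchresults.action",
    "used": "https://www.cargurus.com/Cars/inventorylisting/"
            "viewDetailsFilterViewInventoryListing.action",
}


def build_search_url(zip_code, radius, car_type, page=1, sort_type="AGE_IN_DAYS"):
    """Return a search listings URL for the given car type and results page."""
    if car_type not in BASE_URLS:
        raise ValueError(f"Invalid car type {car_type!r}; expected 'new' or 'used'")
    # Assumption: the remaining parameters from the long samples are optional.
    params = {
        "zip": zip_code,
        "distance": radius,
        "sortDir": "ASC",
        "sortType": sort_type,
    }
    # The results page is carried in the URL fragment, as in the last sample.
    return f"{BASE_URLS[car_type]}?{urlencode(params)}#resultsPage={page}"


print(build_search_url(48933, 200, "used", page=2))
```

Keeping the page number as an argument means the same helper could produce each results page in turn, if the fragment-based pagination turns out to work with a fresh page load.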

### get_data()
- Two-part URL structure: the search listings URL followed by the car listing fragment
    - `f'{search_listings_url}{car_listing}'`
    
`NEW CARS` listing url sample    
- https://www.cargurus.com/Cars/new/searchresults.action?zip=48933&distance=200#listing=361741807/NEWCAR_FEATURED/DEFAULT
    
`USED CARS` listing url sample
- https://www.cargurus.com/Cars/inventorylisting/viewDetailsFilterViewInventoryListing.action?zip=48933&inventorySearchWidgetType=AUTO&sortDir=ASC&sourceContext=untrackedWithinSite_true_1&distance=200&sortType=AGE_IN_DAYS#listing=362620800/NONE
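The two-part structure above can be sketched as a small helper. `build_listing_url` is a hypothetical name. The sketch assumes any existing fragment on the search URL (for example `#resultsPage=2`) should be dropped before the `#listing=...` fragment is appended, since a URL can carry only one fragment.

```python
from urllib.parse import urldefrag


def build_listing_url(search_listings_url, car_listing):
    """Join a search listings URL with a '#listing=...' fragment."""
    # Drop any fragment already present on the search URL.
    base, _ = urldefrag(search_listings_url)
    # Accept the fragment with or without its leading '#'.
    if not car_listing.startswith("#"):
        car_listing = "#" + car_listing
    return base + car_listing


search_url = (
    "https://www.cargurus.com/Cars/new/searchresults.action"
    "?zip=48933&distance=200#resultsPage=2"
)
print(build_listing_url(search_url, "#listing=361741807/NEWCAR_FEATURED/DEFAULT"))
```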

## Scripts

In [None]:
import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

In [8]:
def get_links(zip_code, radius, car_type):
    
    # url builder
    page = 1
    if car_type == 'new':
        url = f'https://www.cargurus.com/Cars/new/searchresults.action?' \
              f'zip={zip_code}&inventorySearchWidgetType=NEW_CAR&sortDir=ASC' \
              f'&sourceContext=untrackedExternal_false_0&distance={radius}' \
              f'&sortType=AGE_IN_DAYS#resultsPage={page}'
    elif car_type == 'used':
        url = f'https://www.cargurus.com/Cars/inventorylisting/viewDetailsFilterViewInventoryListing.action?'\
              f'zip={zip_code}&inventorySearchWidgetType=AUTO&sortDir=ASC' \
              f'&sourceContext=untrackedWithinSite_true_1&distance={radius}' \
              f'&sortType=AGE_IN_DAYS#resultsPage={page}'
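    else:
        # fail fast; otherwise `url` is undefined when driver.get(url) runs
        raise ValueError(f"Invalid car type {car_type!r}; expected 'new' or 'used'")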
        
    # launch Chrome session 
    driver = uc.Chrome()
    #driver.implicitly_wait(10)
    driver.get(url)
    # brief countdown to give the page time to load
    for remaining in (3, 2, 1):
        print(f"\n {remaining}...")
        time.sleep(1)

In [9]:
zip_code = 48933
radius = 200

get_links(zip_code, radius, car_type='used')
# sliding puzzle challenge: blocked again within 5 seconds, even after solving it manually
# adding implicit waits hits the same sliding puzzle

In [None]:
    # find page elements
    # check AGE_IN_DAYS
#     for i in range(pages):
#         html = driver.page_source
        
#         # find_element_by_class_name was removed in Selenium 4; use a By locator
#         next_page = driver.find_element(By.CLASS_NAME, 'nextPageElement')
#         next_page.click()

### search listings html parsing

In [None]:
links_file = './sample/CarGurus.html'

In [None]:
with open(links_file) as fp:
    soup = BeautifulSoup(fp, 'html.parser')

# collect every anchor href, then keep only the in-page listing fragments
links = [link.get('href') for link in soup.find_all('a')]
car_listings = [link for link in links if link is not None and link.startswith('#listing')]  # skip anchors without an href

In [None]:
#print(soup.prettify())

In [None]:
len(car_listings)

In [None]:
car_listings[:10]
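
The `#listing=...` fragments collected above may repeat if a listing is linked more than once on the page. A minimal sketch of reducing them to unique listing ids follows. `listing_ids` and `LISTING_RE` are hypothetical names, and the sketch assumes every fragment follows the `#listing=<digits>/<suffix>` shape seen in the sample URLs.

```python
import re

# Matches the numeric id at the start of a '#listing=<id>/...' fragment.
LISTING_RE = re.compile(r"^#listing=(\d+)")


def listing_ids(hrefs):
    """Return unique listing ids in first-seen order, skipping other hrefs."""
    seen = {}
    for href in hrefs:
        if href is None:
            continue  # anchors without an href
        match = LISTING_RE.match(href)
        if match:
            # dicts preserve insertion order, so this also deduplicates
            seen.setdefault(match.group(1), href)
    return list(seen)


sample_hrefs = [
    "#listing=362620800/NONE",
    None,
    "#listing=362620800/NONE",
    "#listing=361741807/NEWCAR_FEATURED/DEFAULT",
]
print(listing_ids(sample_hrefs))
```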

### individual listing html parsing

In [11]:
used_listing = './sample/2021ChevroletSilverado.html'
new_listing = './sample/new2023ChevroletTrailblazer.html'

In [None]:
#div class="cardBodyPadding cardBody"

In [13]:
with open(used_listing) as fp:
    soup = BeautifulSoup(fp, 'html.parser')  # name the parser explicitly to avoid bs4's parser-guessing warning

#print(soup.prettify())

## Requirements

Python implementation: CPython

Python version       : 3.11.4

IPython version      : 8.12.0


Compiler    : Clang 14.0.6 

OS          : Darwin

Release     : 22.5.0

Machine     : x86_64

Processor   : i386

CPU cores   : 8

Architecture: 64bit


bs4                    : 4.12.2

selenium               : 4.10.0

undetected_chromedriver: 3.5.0

In [2]:
#%load_ext watermark
#%watermark -v -m -iv