## Python Dealership Scraping Tutorial Part 2: Honda Motor Company

Hello! In this tutorial we will use Python to retrieve dealership data from the [Honda North America](https://www.honda.com/) website!

What you'll need for this part of the tutorial (and all other parts): 
* Python 3.3 or later
* Jupyter Notebooks installed
* A basic understanding of how websites retrieve and load content to users (or a willingness to learn!)
* Patience to experiment!

Because individual websites store, retrieve and load data (and by extension, content) in vastly different ways, this tutorial is not all-encompassing. The rough methodology and experimental approach outlined here will work for any site you come across, but the code you have to write to retrieve the data will be different each time.

Depending on how a site is administered, you might sometimes get lucky. For instance, Chrysler, Dodge, Jeep and Ram all use the same back-end [REST API](https://restfulapi.net/) for their United States dealer services. The only difference is the URL endpoint you send requests to, which changes depending on whether you want Chrysler, Dodge, Jeep or Ram dealer results.
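
To make the "same API, different endpoint" idea concrete, here is a minimal sketch. The base URL and the path layout are placeholders I made up for illustration (they are not the real endpoints for those brands), and `dealer_endpoint` is a hypothetical helper name.

```python
# Placeholder base URL: an assumption for illustration, not a real dealer API.
BASE_URL = "https://example.com/api/dealers"
BRANDS = ("chrysler", "dodge", "jeep", "ram")


def dealer_endpoint(brand):
    """Build a per-brand endpoint from one shared base URL (hypothetical layout)."""
    if brand not in BRANDS:
        raise ValueError("unknown brand: {}".format(brand))
    return "{}/{}".format(BASE_URL, brand)


# One request routine can then be reused for every brand by swapping the URL.
endpoints = {brand: dealer_endpoint(brand) for brand in BRANDS}
print(endpoints["jeep"])
```

When a family of sites really does share a back end like this, the rest of the scraping code can stay identical and only the endpoint changes.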

In most cases, though, you'll find you're repeating the same experimental steps on every site you want to scrape. There will be some similarities that can be generalized, bits of code that can be made into functions and so on, but you'll always want to start with a fresh Jupyter Notebook to get yourself started (in my opinion).

### A Skippable Tangent: Why Jupyter? 

You can skip this section to get right to the "good" stuff if you want. But I'd like to just say I think Jupyter Notebooks, for all their shortcomings as a proper IDE, are the best tool to use for jobs like this. 

The rapid experimentation and manipulation of objects (lists, dictionaries, etc.) that you can achieve in a Jupyter Notebook is unrivaled. Its interactive nature makes it a great scratchpad, from which you can copy and paste your "solution" into the IDE of your choice.

There are some things an IDE like PyCharm will do for you: parameter previewing, underlining errors and proper debugging, to name just a few. But for just getting started, especially if this is your first time writing Python code, Jupyter is tough to beat.

### Loading Libraries

Okay! Let's load in our libraries. For this I'll be using:

* json : the built-in Python JSON parsing/outputting library
* requests : the best Python library for interacting with websites/APIs with [HTTP methods](https://www.w3schools.com/tags/ref_httpmethods.asp) (we'll use GET and POST exclusively)
* mbtools : a custom-made module that has a bunch of useful functions for web scraping

Unfortunately, the name of the author of mbtools has been lost to time. So I can't give them credit in this Notebook for their work. 

Notice that `mbtools` lives in a different directory, `lib`, from the one this notebook is in, `scrape_unsecured_js_file`. Even so, the import is just:

```python
import mbtools
```

This is possible because of the first code block below, which adds the `lib` directory (the one containing `scraping_utils.py` and `mbtools.py`) to `sys.path`, the list of directories Python searches for modules.

In [1]:
import sys

sys.path.insert(0, '../lib')

Now this works as you'd expect since this file can "see" the `lib` directory, even though it's in the `scrape_unsecured_js_file` directory. 
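
To see the mechanism in isolation, here is a small self-contained sketch. It uses a temporary directory and a made-up module called `demo_tools` as stand-ins for `lib` and `mbtools`; neither name is part of this project.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for the `lib` directory (the real one is at '../lib').
lib_dir = Path(tempfile.mkdtemp())

# Hypothetical module playing the role of mbtools.py.
(lib_dir / "demo_tools.py").write_text("def greet():\n    return 'hello from lib'\n")

# Putting the directory at the front of sys.path makes Python search it first.
sys.path.insert(0, str(lib_dir))
importlib.invalidate_caches()

import demo_tools

message = demo_tools.greet()
print(message)
```

Inserting at index 0 (rather than appending) means a module in that directory wins over any same-named module found later on the path.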

In [2]:
import json
import requests
import mbtools
from scraping_utils import show_obj_head, equivalence_checker
import pprint # For pretty printing JSON-like objects

Keep this in mind as you're moving custom module and utility files around: the project structure is very important, and you'll spend a ton of time fixing path issues, relative import issues, etc. if you're not careful.

If you'd like to see the `show_obj_head` and `equivalence_checker` code separately, take a look at the `scraping_utils.py` file located in the `lib` directory. This is where these helper functions are stored. 

### Let's Dive Right In

I'm not going to repeat a bunch of text and movie references here. If you're interested in seeing the proper introduction to this section, I'd advise you to revisit Part One of this tutorial.

Instead, let's just cut to the chase and get rolling. You know the drill: navigate to the dealership search on the Honda website and start playing with it. If you're having trouble finding it or don't feel like searching, [here you go!](https://automobiles.honda.com/tools/dealership-locator)

Same deal as Part One: We want to manipulate the site into showing us how it makes requests to a database, API, .js file, etc. to retrieve information. So, let's do that. 

Here's what the initial site looks like:
INSERT SOMETHING HERE FOR AN IMAGE OF THE INITIAL SITE


And some things to notice right away that we might be able to take advantage of, before even opening up the Inspect Console:
* You can search by ZIP
* You can search by City, State combination
* You can search by name

So we have a few different ways we can attack this site. The intriguing one is the name: an old trick is to send an "empty search" on name and see what comes back. Depending on the site's security and how the back-end is set up, this might return everything or nothing. Definitely worth a try though!

Anyway, let's open up the Inspect Console (`CTRL` + `SHIFT` + `I` if you've forgotten) and start playing!

INSERT SOMETHING HERE TRYING TO DO THE NAME, CITY AND STATE, AND ZIP

It seems like searching on ZIP is the easiest, since that's just one value we have to feed the API instead of two (in the case of City and State). So, let's attack that route and see if we can do one of two things (or a combination of them):

1. Send a request for one arbitrary ZIP and get a sufficiently large number of dealers back from it
2. Send requests for a sufficiently large number of ZIPs and get only a few dealers back from each

The first is easier, but less likely to be supported by the site. Most of these OEM sites limit either the search radius around a ZIP or the number of results that can be returned from a single request. It's likely we'll need to try the second option, but we'll explore both.
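
The second option means the same dealer will come back for several neighboring ZIPs, so the results need de-duplicating. Here is a rough sketch of that idea. The `fetch` argument is a stand-in for whatever function actually makes the request, the key name `DealerNumber` is an assumption about the response (check the real JSON for a unique ID field), and the canned records are invented test data.

```python
def collect_dealers(zip_codes, fetch, key="DealerNumber"):
    """Call fetch(zip_code) for each ZIP and keep one record per dealer key."""
    seen = {}
    for zip_code in zip_codes:
        for dealer in fetch(zip_code):
            # Nearby ZIPs overlap, so keep only the first copy of each dealer.
            seen.setdefault(dealer[key], dealer)
    return list(seen.values())


# Fake fetch function standing in for a real HTTP call (made-up data).
def fake_fetch(zip_code):
    canned = {
        "48371": [{"DealerNumber": "1", "Name": "A"}, {"DealerNumber": "2", "Name": "B"}],
        "48009": [{"DealerNumber": "2", "Name": "B"}, {"DealerNumber": "3", "Name": "C"}],
    }
    return canned.get(zip_code, [])


dealers = collect_dealers(["48371", "48009"], fake_fetch)
print(len(dealers))  # 3 unique dealers out of 4 returned records
```

Passing `fetch` in as a parameter keeps the de-duplication logic testable without touching the network.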

In [35]:
base_url = 'https://automobiles.honda.com/platform/api/v1/dealer'

# Run the same ZIP search twice; only the requested result cap changes.
for max_results in ('100', '500'):
    params = {
        'productDivisionCode': 'A',
        'excludeServiceCenters': 'true',
        'zip': '48371',
        'maxResults': max_results,
    }
    # Let requests build and URL-encode the query string from the dict.
    response = requests.get(base_url, params=params)
    print(len(response.json()['Dealers']))

100
124
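
The two counts printed above (100, then 124 when asking for 500) hint that the number of results stops growing at some point. One way to probe for that is sketched below. `fetch_count` is a hypothetical callable that returns the number of dealers for a given `maxResults` value; here it is faked to mimic the two numbers above rather than calling the site.

```python
def probe_result_cap(fetch_count, candidates):
    """Return (max_results, count) pairs, stopping once the count stops growing."""
    observations = []
    previous = None
    for max_results in candidates:
        count = fetch_count(max_results)
        observations.append((max_results, count))
        if previous is not None and count <= previous:
            break
        previous = count
    return observations


# Stand-in that mimics the counts seen above by capping at 124 (an assumption).
def fake_fetch_count(max_results):
    return min(max_results, 124)


observations = probe_result_cap(fake_fetch_count, [100, 500, 1000])
print(observations)
```

In real use, `fetch_count` would wrap the `requests.get` call and return `len(response.json()['Dealers'])`.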
