# Accessing data from the Web

As we know, part of the reason data science skills have become so important is the amount of information available on the Internet.  Data scientists have the skills to precisely target the information relevant to them and then analyze that data.

### Let's get to it

Now, getting the proper information can be fairly easy, and can even be automated, once we have the right knowledge and skills.

For example, this is all of the code it takes to ask Google for information about a specific book.

```python
import requests
response = requests.get("https://www.googleapis.com/books/v1/volumes?q=tom%20sawyer")
response.json()
```

And Google sends us back a list of matching books. Here is part of the entry for one of them (there is a lot more):

```python
{'kind': 'books#volume',
 'id': 'OR46AQAAIAAJ',
 'etag': '+UMtOjnJUu0',
 'selfLink': 'https://www.googleapis.com/books/v1/volumes/OR46AQAAIAAJ',
 'volumeInfo': {'title': 'The Adventures of Tom Sawyer',
  'authors': ['Mark Twain'],
  'publishedDate': '1920',
  'industryIdentifiers': [{'type': 'OTHER',
    'identifier': 'STANFORD:36105047945816'}],
  'readingModes': {'text': False, 'image': True},
  'pageCount': 290,
  'printType': 'BOOK',
  'categories': ['Sawyer, Tom (Fictitious character)'],
  'averageRating': 4.5,
  'ratingsCount': 3,
  'maturityRating': 'NOT_MATURE',
  'allowAnonLogging': False,
  'contentVersion': '0.2.1.0.full.1',
  'imageLinks': {'smallThumbnail': 'http://books.google.com/books/content?id=OR46AQAAIAAJ&printsec=frontcover&img=1&zoom=5&edge=curl&source=gbs_api'}}}
```

So we really only had to type eight words to get back the data we wanted.  And they weren't even very big words.

```python
import requests
response = requests.get("https://www.googleapis.com/books/v1/volumes?q=tom%20sawyer")
response.json()
```

What's nice about this is that, with the right skills, we can request information and then write a program to pull out just the pieces we care about, such as the page count or the average rating of this book, and do the same for many other books.
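As a minimal sketch of that idea, the function below pulls the title, page count, and average rating out of a parsed Google Books response. It assumes the search results arrive as a list under an `'items'` key, with each book's details under `'volumeInfo'`. The sample data is a trimmed copy of the excerpt shown above, wrapped in that assumed structure so the example runs without a network connection, and `book_summary` is a hypothetical helper name, not part of any library.

```python
def book_summary(data, index=0):
    """Return the title, page count and average rating of one book.

    Assumes `data` is the parsed JSON from a Google Books volumes search,
    with the results stored as a list under the 'items' key (an assumption).
    """
    info = data["items"][index]["volumeInfo"]
    # .get() returns None instead of raising KeyError when a field is missing,
    # since not every book has a rating or a page count
    return {
        "title": info.get("title"),
        "pageCount": info.get("pageCount"),
        "averageRating": info.get("averageRating"),
    }


# Sample data: part of the excerpt above, wrapped in an assumed 'items' list
sample = {
    "items": [
        {
            "kind": "books#volume",
            "id": "OR46AQAAIAAJ",
            "volumeInfo": {
                "title": "The Adventures of Tom Sawyer",
                "authors": ["Mark Twain"],
                "pageCount": 290,
                "averageRating": 4.5,
            },
        }
    ]
}

print(book_summary(sample))
# {'title': 'The Adventures of Tom Sawyer', 'pageCount': 290, 'averageRating': 4.5}
```

With a live request, we would pass `response.json()` in place of `sample`.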

### Welcome to APIs 

What we just did was access information from an API, and that is often the preferred way to get public data on the web.  An API is just a service that we can ask for information, like we did with our code above, and the service gives us back some data.  Don't worry if that's not entirely clear; we'll talk a lot more about APIs later.

The important point is that with just a few keystrokes we can write some code that asks for information, and that information is sent back to us to analyze.
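One way to see the "asking" part is to take the Google Books URL apart. The sketch below uses Python's standard `urllib.parse` module to split it into the server we are talking to, the part of its API we want, and the question itself (the `q` query parameter). The `%20` is just the URL-safe encoding of a space, so changing the text after `q=` should be how we ask about a different book.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://www.googleapis.com/books/v1/volumes?q=tom%20sawyer"
parts = urlsplit(url)

# The server we are asking
print(parts.netloc)  # www.googleapis.com
# Which part of that server's API we want
print(parts.path)    # /books/v1/volumes
# Our question, still URL-encoded
print(parts.query)   # q=tom%20sawyer

# parse_qs decodes the query string into a dict of lists; %20 becomes a space
print(parse_qs(parts.query))  # {'q': ['tom sawyer']}
```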

The other important point is that there are a lot of APIs available to us, and we can ask any of them for information.  For example, if we would like to get information about restaurants near us, we can ask the website Foursquare.  That is also eight words, and almost all of them are the same as before.

```python
import requests
response = requests.get("https://api.foursquare.com/v2/venues/explore/?near=nyc&section=food&client_id=YZQZP1Q2HEJWMD5ZVBMIQD3VSZC1W4BQCCQTVFEPJWNHL0RK&client_secret=ORHPL2VKKHUTB3KTJVDTB4D20AXBRCFKWVL12EPQNJNDFYBX&v=20131124")
response.json()
```
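The Foursquare URL above packs five query parameters (`near`, `section`, `client_id`, `client_secret`, and `v`) into one long string. `near` and `section` carry our actual question; the client ID and secret identify who is asking, and `v` appears to be a date-style version value. A sketch of assembling the same URL from a dictionary with the standard library's `urlencode` makes each piece easier to read and change. The credential values here are placeholders, not working keys.

```python
from urllib.parse import urlencode

base_url = "https://api.foursquare.com/v2/venues/explore/"

# Each query parameter from the original URL as a key/value pair.
# The credentials are placeholders, not real values.
params = {
    "near": "nyc",
    "section": "food",
    "client_id": "YOUR_CLIENT_ID",
    "client_secret": "YOUR_CLIENT_SECRET",
    "v": "20131124",
}

# urlencode joins the pairs as key=value&key=value, escaping special characters
url = base_url + "?" + urlencode(params)
print(url)
# https://api.foursquare.com/v2/venues/explore/?near=nyc&section=food&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET&v=20131124
```

With `requests`, the same dictionary can be passed directly as `requests.get(base_url, params=params)`, and the library builds the query string for us.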

Foursquare politely gives us back information about 246 restaurants. Here is part of its reply:

```python
{'meta': {'code': 200, 'requestId': '5ca7923fdd57977cb31295ff'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'geocode': {'what': '',
   'where': 'nyc',
   'center': {'lat': 40.742185, 'lng': -73.992602},
   'displayString': 'New York, NY, United States',
   'cc': 'US',
   'geometry': {'bounds': {'ne': {'lat': 40.882214, 'lng': -73.907},
     'sw': {'lat': 40.679548, 'lng': -74.047285}}},
   'slug': 'new-york-city-new-york',
   'longId': '72057594043056517'},
  'headerLocation': 'New York',
  'headerFullLocation': 'New York',
  'headerLocationGranularity': 'city',
  'query': 'food',
  'totalResults': 246}}
```
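Those "squiggly lines" are nested Python dictionaries (curly braces) and lists (square brackets). A minimal sketch of navigating them: we chain keys one level at a time, and use a position number when we hit a list. The `data` variable below is a trimmed copy of the excerpt above, standing in for `response.json()` so the example runs offline.

```python
# A trimmed copy of the Foursquare excerpt above, standing in for response.json()
data = {
    "meta": {"code": 200, "requestId": "5ca7923fdd57977cb31295ff"},
    "response": {
        "suggestedFilters": {
            "header": "Tap to show:",
            "filters": [{"name": "Open now", "key": "openNow"}],
        },
        "geocode": {
            "where": "nyc",
            "center": {"lat": 40.742185, "lng": -73.992602},
            "displayString": "New York, NY, United States",
        },
        "query": "food",
        "totalResults": 246,
    },
}

# Each pair of square brackets steps one level deeper into the nesting
status = data["meta"]["code"]
place = data["response"]["geocode"]["displayString"]
center = data["response"]["geocode"]["center"]
total = data["response"]["totalResults"]

# Lists are indexed by position (starting at 0) instead of by key
filter_name = data["response"]["suggestedFilters"]["filters"][0]["name"]

print(status)                        # 200
print(place)                         # New York, NY, United States
print(center["lat"], center["lng"])  # 40.742185 -73.992602
print(total)                         # 246
print(filter_name)                   # Open now
```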

Of course, these are just two of the websites that offer up their data through an API.  Many popular websites have developed an API for us to pull information from: Google, Spotify, Yelp, The New York Times, and Wikipedia, to name just a few.

## Now for our questions

The problem, of course, is that we don't yet understand any of the code above, so if we ever want to change things around and search for information on our own, we would likely get stuck.  We also don't really know what an API is, other than that it can be a nice way to access data from other websites.

All true. So let's take another look at the original code.  We'll use it to generate questions that we need to answer going forward.

```python
import requests
response = requests.get("https://www.googleapis.com/books/v1/volumes?q=tom%20sawyer")
response.json()
```

**Our Questions**

1. What is that `"https://www.googleapis.com/books/v1/volumes?q=tom%20sawyer"`, and how would we know that this has the information we want?
2. What is `requests` and what is `response`?
3. What is `get`, and what is `json`?
4. Finally, how do we make sense of and navigate those squiggly lines and text that Google and Foursquare send back, like the following beauty: `{'meta': {'code': 200, 'requestId': '5ca7923fdd57977cb31295ff'}`?
5. And what again is an API?
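As a small preview of question 3: what a server actually sends back is plain text formatted as JSON, and `response.json()` roughly amounts to running Python's built-in `json.loads` on that text, turning it into dictionaries and lists. The sketch below does that conversion by hand on a short, made-up string modelled on the Foursquare reply. It also explains the `False` values in the Google output: JSON writes them as `false`, and Python converts them.

```python
import json

# Made-up example of the raw text a server might send (modelled on the reply above)
raw_text = '{"meta": {"code": 200}, "response": {"query": "food", "openNow": false}}'

# json.loads parses JSON text into Python objects: JSON objects become dicts,
# arrays become lists, and true/false become True/False
data = json.loads(raw_text)

print(type(raw_text))               # <class 'str'>
print(type(data))                   # <class 'dict'>
print(data["meta"]["code"])         # 200
print(data["response"]["openNow"])  # False
```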



**Why these questions**

These are important questions.  Answering them will make us way faster at getting data from the many websites that make their data available to us.  They will also teach us the fundamentals of how the Internet works, which we will need to know if we want to get good at retrieving information from it.

Once we are able to retrieve that data, we can use it to perform analysis and ultimately draw insights.  Let's get started.

### Resources

[Here are some available APIs, organized by topic](https://github.com/toddmotto/public-apis)