# Socrata API


For this project you will use the `requests` package and the SODA API to access data through https://data.nashville.gov/. We'll start out with something familiar, the [Top 500 Monthly Searches](https://data.nashville.gov/Public-Services/Nashville-gov-Top-500-Monthly-Searches/fuaa-r5cm), then pull in different datasets further on. You will make a separate API request for each question.

Each dataset has its own API endpoint. You can find the endpoint for a dataset by clicking on the `API` button in the top right of the dataset screen, then copying the `API Endpoint`. The default output is `JSON`, which you can leave unchanged:

![api_endpoint](assets/api_endpoint.png)

Each API is different, so it is very important to read the documentation for each API to know how to use it properly. The documentation for the SODA API is [here](https://dev.socrata.com/consumers/getting-started.html). It is **HIGHLY RECOMMENDED** that you read the documentation before making any requests, then dive deeper into specific use cases as the questions require. Note that the examples in the documentation don't use the `requests` package, so you will need to look at them and figure out which parts go in the `url` and which go in the `params`.
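As a rough illustration of that last point, a documentation-style URL can be split into the part that goes in `url` and the part that goes in `params`. The sketch below uses only the standard library. The example URL is assembled from this project's endpoint plus a `$limit` parameter (it is not copied from the documentation), and `split_doc_url` is a hypothetical helper name, not part of SODA or `requests`.

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit

def split_doc_url(doc_url):
    """Split a full URL into (endpoint, params) for requests.get(endpoint, params=params)."""
    parts = urlsplit(doc_url)
    # Everything before the "?" is the endpoint
    endpoint = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # Everything after the "?" becomes key/value pairs for the params dictionary
    params = dict(parse_qsl(parts.query))
    return endpoint, params

# Illustrative URL in the style of the documentation examples (an assumption)
doc_url = "https://data.nashville.gov/resource/fuaa-r5cm.json?year=2016&$limit=5"
endpoint, params = split_doc_url(doc_url)
print(endpoint)  # https://data.nashville.gov/resource/fuaa-r5cm.json
print(params)    # {'year': '2016', '$limit': '5'}
```

The two values can then be passed along as `requests.get(endpoint, params=params)`, which lets `requests` take care of the URL encoding.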






#### Questions  

1. Make an API request that returns the months where "fire" was searched in 2016. Which month had the most searches?  

In [119]:
import requests
import matplotlib.pyplot as plt
import pandas as pd

#### Parameters

Parameters are specific to each API and indicate what information you want back. They can be compared to the various ways you slice a table or DataFrame to get just the subset you want. Some parameters are required, others are optional. Always check the documentation to learn which parameters you should include and what the possible values are for each one. When using parameters for an API call, you can do the following:

* Make an empty dictionary for the `params` variable
* Look at the documentation to know what parameters you should include, add these as **keys** to the dictionary
* Add the appropriate values for each parameter as the **values** for the dictionary

(Look into the `$limit` and `$offset` parameters, and play around with the `year` value to find the right year.)
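On limits and offsets: a SODA endpoint returns a capped number of rows per request (1,000 by default, according to the SODA documentation), so larger results have to be collected page by page. The following is a minimal sketch of that idea, not the only way to do it. `fetch_all` is a hypothetical helper, and its `get` argument is an assumption added so the function can be exercised without a network connection.

```python
import requests

def fetch_all(endpoint, params=None, page_size=1000, get=requests.get):
    """Collect every row from a SODA endpoint by paging with $limit and $offset."""
    rows = []
    offset = 0
    while True:
        page_params = dict(params or {})
        page_params["$limit"] = page_size
        page_params["$offset"] = offset
        # A stable sort order keeps pages from overlapping or skipping rows
        page_params.setdefault("$order", ":id")
        response = get(endpoint, params=page_params)
        response.raise_for_status()
        page = response.json()
        rows.extend(page)
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to fetch
            return rows
        offset += page_size
```

For example, `fetch_all('https://data.nashville.gov/resource/fuaa-r5cm.json', {'year': '2016'})` would gather all 2016 rows, however many pages that takes.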

In [97]:
endpoint = 'https://data.nashville.gov/resource/fuaa-r5cm.json'

In [98]:
# reminder - we're looking for months where "fire" was searched in 2016
# then finding which month had the most searches 

params = {
    'year': '2016',
    'query_text': 'fire',
}

In [99]:
response = requests.get(endpoint, params = params)

In [100]:
response

<Response [200]>

In [101]:
res = response.json()
res

[{'month_name': 'January',
  'year': '2016',
  'query_count': '19',
  'query_text': 'fire'},
 {'month_name': 'February',
  'year': '2016',
  'query_count': '35',
  'query_text': 'fire'},
 {'month_name': 'March',
  'year': '2016',
  'query_count': '32',
  'query_text': 'fire'},
 {'month_name': 'April',
  'year': '2016',
  'query_count': '26',
  'query_text': 'fire'},
 {'month_name': 'May',
  'year': '2016',
  'query_count': '24',
  'query_text': 'fire'},
 {'month_name': 'June',
  'year': '2016',
  'query_count': '31',
  'query_text': 'fire'},
 {'month_name': 'July',
  'year': '2016',
  'query_count': '24',
  'query_text': 'fire'},
 {'month_name': 'August',
  'year': '2016',
  'query_count': '47',
  'query_text': 'fire'},
 {'month_name': 'September',
  'year': '2016',
  'query_count': '36',
  'query_text': 'fire'},
 {'month_name': 'October',
  'year': '2016',
  'query_count': '38',
  'query_text': 'fire'},
 {'month_name': 'November',
  'year': '2016',
  'query_count': '32',
  'query_text
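The counts come back as strings, so they need to be converted to numbers before they can be compared. A minimal sketch for picking out the busiest month follows. `sample` copies a few of the rows shown above and stands in for `res`, so the result only describes those rows.

```python
# A few rows copied from the response above; with the live data, use `res` instead
sample = [
    {'month_name': 'January', 'year': '2016', 'query_count': '19', 'query_text': 'fire'},
    {'month_name': 'August', 'year': '2016', 'query_count': '47', 'query_text': 'fire'},
    {'month_name': 'October', 'year': '2016', 'query_count': '38', 'query_text': 'fire'},
]

# Convert query_count to int so the comparison is numeric rather than alphabetical
busiest = max(sample, key=lambda row: int(row['query_count']))
print(busiest['month_name'], busiest['query_count'])  # August 47
```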


2. Make an API request that returns all the times a query was run more than 100 times in a month. How many times did this occur?  

In [102]:
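endpoint = 'https://data.nashville.gov/resource/fuaa-r5cm.json'
# Pass the SoQL filter through params so that requests handles the URL encoding
params = {'$where': 'query_count > 100'}

In [103]:
response = requests.get(endpoint, params=params)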

In [104]:
res = response.json()

In [105]:
res

[{'month_name': 'March',
  'year': '2014',
  'query_count': '101',
  'query_text': 'permits'},
 {'month_name': 'January',
  'year': '2015',
  'query_count': '101',
  'query_text': 'criminal court clerk'},
 {'month_name': 'September',
  'year': '2015',
  'query_count': '101',
  'query_text': 'codes'},
 {'month_name': 'March',
  'year': '2016',
  'query_count': '101',
  'query_text': 'police'},
 {'month_name': 'March',
  'year': '2016',
  'query_count': '101',
  'query_text': 'civil service'},
 {'month_name': 'November',
  'year': '2016',
  'query_count': '101',
  'query_text': 'jobs'},
 {'month_name': 'November',
  'year': '2017',
  'query_count': '101',
  'query_text': 'metro holidays'},
 {'month_name': 'November',
  'year': '2017',
  'query_count': '101',
  'query_text': 'longevity pay'},
 {'month_name': 'January',
  'year': '2018',
  'query_count': '101',
  'query_text': 'West Nashville Heights Church of Christ'},
 {'month_name': 'January',
  'year': '2018',
  'query_count': '101',
 

How many times did this occur?

In [106]:
endpoint = 'https://data.nashville.gov/resource/fuaa-r5cm.json?$select=count(query_count > 100)'

In [107]:
response = requests.get(endpoint)

In [108]:
res = response.json()
res

[{'count_query_count_100': '43676'}]
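A caution about this result: 43676 is most likely the size of the whole dataset rather than the number of rows above 100. In SoQL, `count(expression)` counts the rows where the expression is not null, and `query_count > 100` evaluates to true or false (never null) for any row that has a count. A sketch of an alternative, assuming the usual SoQL pattern of filtering with `$where` and counting with `count(*)`, is below. The alias `n` and the helper `count_rows` are made up for this example, and the `get` argument exists only so the function can be tried offline.

```python
import requests

def count_rows(endpoint, where, get=requests.get):
    """Ask the server how many rows satisfy a SoQL $where clause."""
    params = {
        '$select': 'count(*) AS n',  # count the matching rows; "n" is an arbitrary alias
        '$where': where,             # the filter decides which rows get counted
    }
    response = get(endpoint, params=params)
    response.raise_for_status()
    # The API returns a one-element list, with the count given as a string
    return int(response.json()[0]['n'])
```

Calling `count_rows('https://data.nashville.gov/resource/fuaa-r5cm.json', 'query_count > 100')` should then return the number of month and query combinations above 100.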


3. Make another API request that returns all the times "codes" was searched more than 100 times in a month. How many times did this occur?  


In [138]:
endpoint = 'https://data.nashville.gov/resource/fuaa-r5cm.json'

In [142]:
# courtesy of Jai
aparams = {
    '$where': 'query_count>=100',
    '$select': 'month_name, year, query_text, query_count',
    '$limit': '5000000',
}

aresponse = requests.get(endpoint, params=aparams)
ares = aresponse.json()
entire = pd.DataFrame(ares)
entire

      month_name  year         query_text  query_count
0        October  2014               maps          100
1          April  2015         employment          100
2          April  2015          Nashville          100
3        October  2016      property maps          100
4           June  2017    building permit          100
...          ...   ...                ...          ...
1282   September  2018  annual enrollment          750
1283     October  2018  annual enrollment          816
1284     January  2019          Nashville         2646
1285   September  2019          directory         5327


In [111]:
# Request every row for "codes"; the count filter is applied afterwards in pandas
response = requests.get(endpoint, params={'query_text': 'codes'})

In [114]:
response

<Response [200]>

In [117]:
res_codes = response.json()
res_codes

[{'month_name': 'January',
  'year': '2014',
  'query_count': '37',
  'query_text': 'codes'},
 {'month_name': 'February',
  'year': '2014',
  'query_count': '75',
  'query_text': 'codes'},
 {'month_name': 'March',
  'year': '2014',
  'query_count': '90',
  'query_text': 'codes'},
 {'month_name': 'April',
  'year': '2014',
  'query_count': '65',
  'query_text': 'codes'},
 {'month_name': 'May',
  'year': '2014',
  'query_count': '84',
  'query_text': 'codes'},
 {'month_name': 'June',
  'year': '2014',
  'query_count': '77',
  'query_text': 'codes'},
 {'month_name': 'July',
  'year': '2014',
  'query_count': '92',
  'query_text': 'codes'},
 {'month_name': 'August',
  'year': '2014',
  'query_count': '70',
  'query_text': 'codes'},
 {'month_name': 'September',
  'year': '2014',
  'query_count': '71',
  'query_text': 'codes'},
 {'month_name': 'October',
  'year': '2014',
  'query_count': '45',
  'query_text': 'codes'},
 {'month_name': 'November',
  'year': '2014',
  'query_count': '53',
  '

In [120]:
res_codes_df = pd.DataFrame(res_codes)

In [128]:
print(res_codes_df)

   month_name  year query_count query_text month
0     January  2014          37      codes   NaN
1    February  2014          75      codes   NaN
2       March  2014          90      codes   NaN
3       April  2014          65      codes   NaN
4         May  2014          84      codes   NaN
..        ...   ...         ...        ...   ...
81      March  2021         134      codes     3
82      April  2021         102      codes     4
83        May  2021         115      codes     5
84       June  2021         138      codes     6
85       July  2021         114      codes     7

[86 rows x 5 columns]


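In [126]:
# query_count arrives as a string, so convert it to a number before comparing;
# string comparison is alphabetical ('37' >= '100' is True)
res_codes_100_df = pd.to_numeric(res_codes_df['query_count']) > 100

In [129]:
# Summing a boolean Series counts the True values
res_codes_100_df.sum()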
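The same question can also be pushed to the server so that only the matching rows come back. The sketch below assumes `query_count` is stored as a number (the `> 100` filter in question 2 suggests it is) and that the text match is exact and case-sensitive, so "Codes" would not match "codes". `build_search_params` is a hypothetical helper, and the `$limit` of 5000 is an arbitrary value chosen to be larger than the expected result.

```python
def build_search_params(query_text, min_count):
    """Build SODA params for the months where one search term exceeded a count."""
    # SoQL string literals use single quotes; an embedded quote is escaped by doubling it
    quoted = query_text.replace("'", "''")
    return {
        '$where': f"query_text = '{quoted}' AND query_count > {int(min_count)}",
        '$select': 'month_name, year, query_count',
        '$limit': 5000,
    }

params = build_search_params('codes', 100)
print(params['$where'])  # query_text = 'codes' AND query_count > 100
```

With these params, `len(requests.get(endpoint, params=params).json())` gives the number of months in which "codes" was searched more than 100 times.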

4. Make an API request that returns the entire Top 500 Monthly Searches dataset. Make a chart that shows the number of times "maps" was searched in a month across the entire time frame.
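One way to approach the chart, sketched with `pandas` and `matplotlib`: combine `month_name` and `year` into a real date so the points sort chronologically, convert `query_count` to a number, then plot. The rows in `sample_rows` are made-up placeholders (not real search counts) standing in for the API response, and `prepare_series` and `plot_series` are hypothetical helper names.

```python
import matplotlib.pyplot as plt
import pandas as pd

def prepare_series(rows, query_text):
    """Return a date-indexed Series of monthly counts for one search term."""
    df = pd.DataFrame(rows)
    df = df[df['query_text'] == query_text].copy()
    # "October 2014" -> Timestamp('2014-10-01'), so months sort in calendar order
    df['date'] = pd.to_datetime(
        df['month_name'] + ' ' + df['year'].astype(str), format='%B %Y'
    )
    # The API returns counts as strings; convert them before plotting
    df['query_count'] = pd.to_numeric(df['query_count'])
    return df.set_index('date')['query_count'].sort_index()

def plot_series(series, query_text):
    """Draw a line chart of monthly search counts and return the Axes."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(series.index, series.values, marker='o')
    ax.set_title(f'Monthly searches for "{query_text}"')
    ax.set_xlabel('Month')
    ax.set_ylabel('Searches')
    fig.autofmt_xdate()
    return ax

# Placeholder rows for illustration only; the counts are invented
sample_rows = [
    {'month_name': 'March', 'year': '2015', 'query_count': '30', 'query_text': 'maps'},
    {'month_name': 'October', 'year': '2014', 'query_count': '10', 'query_text': 'maps'},
    {'month_name': 'March', 'year': '2015', 'query_count': '99', 'query_text': 'fire'},
    {'month_name': 'January', 'year': '2015', 'query_count': '20', 'query_text': 'maps'},
]
maps_series = prepare_series(sample_rows, 'maps')
print(maps_series)
```

With the real data, pass the full list of rows returned by the API to `prepare_series`, then call `plot_series(maps_series, 'maps')` followed by `plt.show()`.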

#### Stretch Questions

5. Make an API request to pull back all the data from [hubNashville (311) Service Requests](https://data.nashville.gov/Public-Services/hubNashville-311-Service-Requests/7qhx-rexh) (check to see how many rows you can return in a single request). Compare it to the Top 500 Monthly Searches dataset. What do you observe? (This is open-ended; there isn't a specific answer for this one.)


6. Find 2 new datasets on data.nashville.gov, make API requests to pull the data, and do an analysis that combines the datasets.

#### Bonus

7. Socrata is used by many cities, states, and federal organizations. Find additional datasets through [Socrata's Open Data Network](http://www.opendatanetwork.com/) and do an analysis comparing them to Nashville or each other.


#### Show and Tell
At the end of the project you will present some general insights, visualizations, or other findings from any part of the project. This will be informal (showing your Jupyter notebook is fine, no need to make a PowerPoint) and should take no more than 5 minutes. If you had challenges making your visualizations, it is fine to discuss your experience working with the API and what you intended to show.