# Dictionaries with APIs

### Introduction

Now that we have better explored lists, strings, and dictionaries, we can explore additional data from Max's Wine Bar.  

In [1]:
import pandas as pd
df = pd.read_json('https://raw.githubusercontent.com/eng-6-22/mod-1-fundamentals/master/restaurant_receipts_2020.json')
restaurant_receipts = df.to_dict('records')

In [2]:
restaurant_receipts[:3]

[{'taxpayer_number': 12727298569,
  'taxpayer_name': 'MWD AUSTIN DOWNTOWN, LLC',
  'taxpayer_address': '7026 OLD KATY RD STE 255',
  'taxpayer_city': 'HOUSTON',
  'taxpayer_state': 'TX',
  'taxpayer_zip': 77024,
  'taxpayer_county': 101,
  'location_number': 1,
  'location_name': "MAX'S WINE DIVE",
  'location_address': '207 SAN JACINTO BLVD STE 200',
  'location_city': 'AUSTIN',
  'location_state': 'TX',
  'location_zip': 78701,
  'location_county': 227,
  'inside_outside_city_limits_code_y_n': 'Y',
  'tabc_permit_number': 'MB944126',
  'responsibility_begin_date_yyyymmdd': '2016-05-13T00:00:00.000',
  'obligation_end_date_yyyymmdd': '2020-08-31T00:00:00.000',
  'liquor_receipts': 0,
  'wine_receipts': 0,
  'beer_receipts': 0,
  'cover_charge_receipts': 0,
  'total_receipts': 0,
  'responsibility_end_date_yyyymmdd': None},
 {'taxpayer_number': 12727298569,
  'taxpayer_name': 'MWD AUSTIN DOWNTOWN, LLC',
  'taxpayer_address': '7026 OLD KATY RD STE 255',
  'taxpayer_city': 'HOUSTON',
  ...}]

We now have a list of dictionaries, where each dictionary contains information about one month's drink revenue at one location.
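To make that shape concrete, here is a minimal sketch that uses the standard `json` module instead of pandas. The two records are abbreviated to a few fields shown in this lesson, and the names `raw_json` and `sample_receipts` are made up for illustration.

```python
import json

# Two abbreviated records, using only fields and values shown in this lesson.
raw_json = """
[
  {"location_name": "MAX'S WINE DIVE",
   "obligation_end_date_yyyymmdd": "2020-08-31T00:00:00.000",
   "total_receipts": 0},
  {"location_name": "MAX'S WINE DIVE",
   "obligation_end_date_yyyymmdd": "2020-07-31T00:00:00.000",
   "total_receipts": 2908}
]
"""

# A JSON array of objects becomes a Python list of dictionaries.
sample_receipts = json.loads(raw_json)

print(type(sample_receipts))                 # <class 'list'>
print(type(sample_receipts[0]))              # <class 'dict'>
print(sample_receipts[1]['total_receipts'])  # 2908
```

Indexing into the list gives one dictionary, and indexing into that dictionary by key gives one value.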

In [3]:
#restaurant_receipts[:2]

So while each dictionary is fairly large, in the end `restaurant_receipts` is just a list of dictionaries, which we have seen before.

```python
restaurant_receipts
# [{receipt1}, {receipt2}]
```

Ok, let's use our knowledge of datatypes to explore and organize this data.

### Starting Broad

Begin by calculating the number of elements in our list of receipts.

In [4]:
len(restaurant_receipts)
# 77

77

So there are 77 restaurant receipts in our list.  Now let's select the first receipt and take a look at it.

> Select the first receipt below and assign it to the variable `receipt`.

In [5]:
receipt = restaurant_receipts[0]
receipt

# {'taxpayer_number': 12727298569,
#  'taxpayer_name': 'MWD AUSTIN DOWNTOWN, LLC',
#  'taxpayer_address': '7026 OLD KATY RD STE 255',
#  'taxpayer_city': 'HOUSTON',
#  'taxpayer_state': 'TX',
#  'taxpayer_zip': 77024,
#  'taxpayer_county': 101,
# ...
# 'cover_charge_receipts': 0,
#  'total_receipts': 0,
#  'responsibility_end_date_yyyymmdd': None}

{'taxpayer_number': 12727298569,
 'taxpayer_name': 'MWD AUSTIN DOWNTOWN, LLC',
 'taxpayer_address': '7026 OLD KATY RD STE 255',
 'taxpayer_city': 'HOUSTON',
 'taxpayer_state': 'TX',
 'taxpayer_zip': 77024,
 'taxpayer_county': 101,
 'location_number': 1,
 'location_name': "MAX'S WINE DIVE",
 'location_address': '207 SAN JACINTO BLVD STE 200',
 'location_city': 'AUSTIN',
 'location_state': 'TX',
 'location_zip': 78701,
 'location_county': 227,
 'inside_outside_city_limits_code_y_n': 'Y',
 'tabc_permit_number': 'MB944126',
 'responsibility_begin_date_yyyymmdd': '2016-05-13T00:00:00.000',
 'obligation_end_date_yyyymmdd': '2020-08-31T00:00:00.000',
 'liquor_receipts': 0,
 'wine_receipts': 0,
 'beer_receipts': 0,
 'cover_charge_receipts': 0,
 'total_receipts': 0,
 'responsibility_end_date_yyyymmdd': None}

Ok, so there's our dictionary.

Now let's see what information is available in our dictionary.  Return all of the keys available, and assign them to the variable `receipt_attributes`.

In [6]:
receipt_attributes = receipt.keys()
receipt_attributes

# dict_keys(['taxpayer_number', 'taxpayer_name', 'taxpayer_address', 'taxpayer_city', 'taxpayer_state', 'taxpayer_zip', 'taxpayer_county', 'location_number', 'location_name', 'location_address', 'location_city', 'location_state', 'location_zip', 'location_county', 'inside_outside_city_limits_code_y_n', 'tabc_permit_number', 'responsibility_begin_date_yyyymmdd', 'obligation_end_date_yyyymmdd', 'liquor_receipts',
#            'wine_receipts', 'beer_receipts', 'cover_charge_receipts', 'total_receipts', 'responsibility_end_date_yyyymmdd'])

dict_keys(['taxpayer_number', 'taxpayer_name', 'taxpayer_address', 'taxpayer_city', 'taxpayer_state', 'taxpayer_zip', 'taxpayer_county', 'location_number', 'location_name', 'location_address', 'location_city', 'location_state', 'location_zip', 'location_county', 'inside_outside_city_limits_code_y_n', 'tabc_permit_number', 'responsibility_begin_date_yyyymmdd', 'obligation_end_date_yyyymmdd', 'liquor_receipts', 'wine_receipts', 'beer_receipts', 'cover_charge_receipts', 'total_receipts', 'responsibility_end_date_yyyymmdd'])

Now, that's a lot of attributes.  Let's see how many.

In [7]:
num_attributes = len(receipt_attributes)

num_attributes
# 24

24
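Calling `len` on the result of `.keys()` works because `.keys()` returns a view of the dictionary's keys rather than a plain list. The sketch below illustrates how a view behaves, using a small stand-in dictionary (`sample_receipt`, a hypothetical name) with values copied from the receipt above.

```python
# A small stand-in for one receipt (values copied from the output above).
sample_receipt = {
    'location_city': 'AUSTIN',
    'location_state': 'TX',
    'total_receipts': 0,
}

keys_view = sample_receipt.keys()
print(len(keys_view))                # 3
print('location_city' in keys_view)  # True

# The view tracks the dictionary: adding a key changes its length.
sample_receipt['location_zip'] = 78701
print(len(keys_view))                # 4

# Convert to a list when you need indexing or a fixed snapshot.
keys_list = list(keys_view)
print(keys_list[0])                  # location_city
```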

And to better organize the data we have, let's sort those attributes alphabetically.

In [8]:
sorted_attrs = sorted(receipt_attributes)

sorted_attrs[:10]

# ['beer_receipts', 'cover_charge_receipts', 'inside_outside_city_limits_code_y_n',
#  'liquor_receipts', 'location_address', 'location_city', 'location_county',
#  'location_name', 'location_number', 'location_state']

['beer_receipts',
 'cover_charge_receipts',
 'inside_outside_city_limits_code_y_n',
 'liquor_receipts',
 'location_address',
 'location_city',
 'location_county',
 'location_name',
 'location_number',
 'location_state']
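Once the names are sorted, related attributes sit next to each other, which suggests grouping them by prefix or suffix. As a rough sketch (the list `attribute_names` below is a hand-picked subset of the real keys, and the grouping itself is only an illustration), string methods and list comprehensions can do this:

```python
# A hand-picked subset of the attribute names shown above.
attribute_names = ['beer_receipts', 'cover_charge_receipts', 'liquor_receipts',
                   'location_address', 'location_city', 'location_county',
                   'taxpayer_city', 'taxpayer_name']

# Keep only the names that describe the location.
location_attrs = [name for name in attribute_names if name.startswith('location_')]

# Keep only the names that hold revenue figures.
receipt_attrs = [name for name in attribute_names if name.endswith('_receipts')]

print(location_attrs)  # ['location_address', 'location_city', 'location_county']
print(receipt_attrs)   # ['beer_receipts', 'cover_charge_receipts', 'liquor_receipts']
```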

Ok, now we can focus on a few of our attributes.  We can reduce the number of attributes by keeping just the ones whose values change from receipt to receipt.  For example, are there multiple locations of Max's Wine Bar?  

First create a list that contains the `location_address` of each dictionary.

> Assign the result to the variable `location_addresses`.

In [9]:
location_addresses = [receipt['location_address'] for receipt in restaurant_receipts]

In [10]:
location_addresses[:3]

# ['207 SAN JACINTO BLVD STE 200',
#  '207 SAN JACINTO BLVD STE 200',
#  '207 SAN JACINTO BLVD STE 200']

['207 SAN JACINTO BLVD STE 200',
 '207 SAN JACINTO BLVD STE 200',
 '207 SAN JACINTO BLVD STE 200']

Then reduce this to the unique locations by converting the list to a set.

In [11]:
unique_locations = set(location_addresses)

unique_locations

# {'3600 MCKINNEY AVE STE 100', '207 SAN JACINTO BLVD STE 200'}

{'207 SAN JACINTO BLVD STE 200', '3600 MCKINNEY AVE STE 100'}
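A set tells us which locations exist, but not how many receipts each one has, and its display order is not guaranteed. The sketch below shows one way to handle both, using a short made-up list (`sample_addresses`) rather than the full data, so the counts here are not the real ones.

```python
from collections import Counter

# A short stand-in list; the real list has one address per receipt.
sample_addresses = [
    '207 SAN JACINTO BLVD STE 200',
    '207 SAN JACINTO BLVD STE 200',
    '3600 MCKINNEY AVE STE 100',
]

# A set drops duplicates, so its length is the number of distinct locations.
unique = set(sample_addresses)
print(len(unique))  # 2

# sorted() turns the set into a list with a predictable order.
ordered = sorted(unique)
print(ordered)  # ['207 SAN JACINTO BLVD STE 200', '3600 MCKINNEY AVE STE 100']

# Counter tallies how many times each address appears.
counts = Counter(sample_addresses)
print(counts['207 SAN JACINTO BLVD STE 200'])  # 2
```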

Ok, so it looks like we have two different locations of Max's Wine Bar.  

Now let's begin to reduce the amount of information in each dictionary.  Let's select just the key-value pairs for `total_receipts`, `obligation_end_date_yyyymmdd`, and `location_address`.

Assign the result to the variable `abridged_receipts`.

In [23]:
abridged_receipts = [{'total_receipts': item['total_receipts'],
                      'obligation_end_date_yyyymmdd' : item['obligation_end_date_yyyymmdd'],
                      'location_address' : item['location_address']
                      } for item in restaurant_receipts]
abridged_receipts[:3]

[{'total_receipts': 0,
  'obligation_end_date_yyyymmdd': '2020-08-31T00:00:00.000',
  'location_address': '207 SAN JACINTO BLVD STE 200'},
 {'total_receipts': 2908,
  'obligation_end_date_yyyymmdd': '2020-07-31T00:00:00.000',
  'location_address': '207 SAN JACINTO BLVD STE 200'},
 {'total_receipts': 9322,
  'obligation_end_date_yyyymmdd': '2020-06-30T00:00:00.000',
  'location_address': '207 SAN JACINTO BLVD STE 200'}]

In [24]:
abridged_receipts[:3]

# [{'total_receipts': 0,
#   'obligation_end_date_yyyymmdd': '2020-08-31T00:00:00.000',
#   'location_address': '207 SAN JACINTO BLVD STE 200'},
#  {'total_receipts': 2908,
#   'obligation_end_date_yyyymmdd': '2020-07-31T00:00:00.000',
#   'location_address': '207 SAN JACINTO BLVD STE 200'},
#  {'total_receipts': 9322,
#   'obligation_end_date_yyyymmdd': '2020-06-30T00:00:00.000',
#   'location_address': '207 SAN JACINTO BLVD STE 200'}]


[{'total_receipts': 0,
  'obligation_end_date_yyyymmdd': '2020-08-31T00:00:00.000',
  'location_address': '207 SAN JACINTO BLVD STE 200'},
 {'total_receipts': 2908,
  'obligation_end_date_yyyymmdd': '2020-07-31T00:00:00.000',
  'location_address': '207 SAN JACINTO BLVD STE 200'},
 {'total_receipts': 9322,
  'obligation_end_date_yyyymmdd': '2020-06-30T00:00:00.000',
  'location_address': '207 SAN JACINTO BLVD STE 200'}]
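Writing out each key by hand is fine for three keys, but the same selection can be expressed with a dictionary comprehension. This is only a sketch of an alternative: `select_keys`, `wanted`, and `sample_receipt` are hypothetical names, and the sample values are copied from the output above.

```python
def select_keys(record, keys):
    """Return a new dict containing only the requested keys from record."""
    # Raises KeyError if a requested key is missing from the record.
    return {key: record[key] for key in keys}


# A stand-in for one full receipt, with an extra field we want to drop.
sample_receipt = {
    'location_address': '207 SAN JACINTO BLVD STE 200',
    'location_city': 'AUSTIN',
    'obligation_end_date_yyyymmdd': '2020-07-31T00:00:00.000',
    'total_receipts': 2908,
}

wanted = ['total_receipts', 'obligation_end_date_yyyymmdd', 'location_address']
abridged = select_keys(sample_receipt, wanted)

print(abridged)
# {'total_receipts': 2908,
#  'obligation_end_date_yyyymmdd': '2020-07-31T00:00:00.000',
#  'location_address': '207 SAN JACINTO BLVD STE 200'}
```

Applied to the whole list, this would read `[select_keys(item, wanted) for item in restaurant_receipts]`.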

Ok, now let's clean up some of our resulting data.  To begin with, instead of `obligation_end_date_yyyymmdd`, let's create separate keys for year and month, and store the respective values.  In addition, let's coerce each `total_receipts`, `year`, and `month` value to an integer, and store the receipts under the shorter key `total`.  Finally, let's only store the first fifteen characters of the location address.

Store the new result in the variable `clean_receipts`.

> We can start from our existing `abridged_receipts` data and keep going from there.

In [40]:
from datetime import datetime

clean_receipts = []

for item in abridged_receipts:
    # Parse the timestamp string, e.g. '2020-08-31T00:00:00.000'
    date = datetime.strptime(item['obligation_end_date_yyyymmdd'], '%Y-%m-%dT%H:%M:%S.%f')

    clean_receipt = {
        'year': date.year,    # datetime attributes are already integers
        'month': date.month,
        'total': int(item['total_receipts']),
        'location_address': item['location_address'][:15],  # first fifteen characters
    }

    clean_receipts.append(clean_receipt)

In [41]:
clean_receipts[:5]

# [{'year': 2020, 'month': 8, 'total': 0, 'location_address': '207 SAN JACINTO'},
#  {'year': 2020,
#   'month': 7,
#   'total': 2908,
#   'location_address': '207 SAN JACINTO'},
#  {'year': 2020,
#   'month': 6,
#   'total': 9322,
#   'location_address': '207 SAN JACINTO'},
#  {'year': 2020,
#   'month': 5,
#   'total': 3421,
#   'location_address': '207 SAN JACINTO'},
#  {'year': 2020,
#   'month': 4,
#   'total': 90,
#   'location_address': '207 SAN JACINTO'}]

[{'year': 2020, 'month': 8, 'total': 0, 'location_address': '207 SAN JACINTO'},
 {'year': 2020,
  'month': 7,
  'total': 2908,
  'location_address': '207 SAN JACINTO'},
 {'year': 2020,
  'month': 6,
  'total': 9322,
  'location_address': '207 SAN JACINTO'},
 {'year': 2020,
  'month': 5,
  'total': 3421,
  'location_address': '207 SAN JACINTO'},
 {'year': 2020,
  'month': 4,
  'total': 90,
  'location_address': '207 SAN JACINTO'}]
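To see the date handling in isolation, the sketch below parses a single timestamp from the data in two ways: with `datetime.strptime`, and with plain string slicing plus `int`. The slicing version assumes every timestamp has exactly this layout, which the lesson does not verify.

```python
from datetime import datetime

timestamp = '2020-08-31T00:00:00.000'

# Option 1: parse the whole string, then read the integer attributes.
parsed = datetime.strptime(timestamp, '%Y-%m-%dT%H:%M:%S.%f')
print(parsed.year, parsed.month)  # 2020 8

# Option 2: slice out the pieces and coerce them to integers.
# Assumes the year is always characters 0-3 and the month characters 5-6.
year = int(timestamp[:4])
month = int(timestamp[5:7])
print(year, month)                # 2020 8
```

Parsing is stricter: `strptime` raises a `ValueError` if a string does not match the format, while slicing would quietly return the wrong characters.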

### Bonus

Notice that we can use the built-in `sorted` function, together with its `key` argument, to sort our data by a specific attribute.  See if you can figure out how this works.

In [42]:
sorted(clean_receipts, key=lambda receipt: receipt['total'])[:5]

[{'year': 2020, 'month': 8, 'total': 0, 'location_address': '207 SAN JACINTO'},
 {'year': 2015, 'month': 8, 'total': 0, 'location_address': '3600 MCKINNEY A'},
 {'year': 2020,
  'month': 4,
  'total': 90,
  'location_address': '207 SAN JACINTO'},
 {'year': 2020,
  'month': 7,
  'total': 2908,
  'location_address': '207 SAN JACINTO'},
 {'year': 2020,
  'month': 5,
  'total': 3421,
  'location_address': '207 SAN JACINTO'}]
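One way to read the `key` argument: `sorted` calls the key function once per element and orders the elements by the values it returns. The sketch below shows that a `lambda` is just a short way of writing such a function; `get_total` and `sample` are hypothetical names, and the totals are taken from the output above.

```python
def get_total(receipt):
    """Return the value that sorted() should compare for this receipt."""
    return receipt['total']


sample = [{'total': 2908}, {'total': 0}, {'total': 90}]

# A named function and an equivalent lambda give the same ordering.
with_def = sorted(sample, key=get_total)
with_lambda = sorted(sample, key=lambda receipt: receipt['total'])

print(with_def == with_lambda)  # True
print(with_def)                 # [{'total': 0}, {'total': 90}, {'total': 2908}]

# sorted() returns a new list; the original order is unchanged.
print(sample[0])                # {'total': 2908}
```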

> See if you can sort the data by year.

In [44]:
receipts_sorted_year = sorted(clean_receipts, key=lambda receipt: receipt['year'])

receipts_sorted_year[:5]

# [{'year': 2015,
#   'month': 9,
#   'total': 66609,
#   'location_address': '3600 MCKINNEY A'},
#  {'year': 2015, 'month': 8, 'total': 0, 'location_address': '3600 MCKINNEY A'},
#  {'year': 2015,
#   'month': 11,
#   'total': 69664,
#   'location_address': '3600 MCKINNEY A'},
#  {'year': 2015,
#   'month': 12,
#   'total': 72238,
#   'location_address': '3600 MCKINNEY A'},
#  {'year': 2015,
#   'month': 10,
#   'total': 67048,
#   'location_address': '3600 MCKINNEY A'}]

[{'year': 2015,
  'month': 9,
  'total': 66609,
  'location_address': '3600 MCKINNEY A'},
 {'year': 2015, 'month': 8, 'total': 0, 'location_address': '3600 MCKINNEY A'},
 {'year': 2015,
  'month': 11,
  'total': 69664,
  'location_address': '3600 MCKINNEY A'},
 {'year': 2015,
  'month': 12,
  'total': 72238,
  'location_address': '3600 MCKINNEY A'},
 {'year': 2015,
  'month': 10,
  'total': 67048,
  'location_address': '3600 MCKINNEY A'}]
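In the output above, the months within 2015 are not in order (9, 8, 11, 12, 10), because the key only looks at `year` and `sorted` leaves ties in their original order. A possible refinement, sketched here on a few records taken from the outputs above (with the address omitted for brevity), is to return a tuple from the key function so that ties on year are broken by month.

```python
sample_receipts = [
    {'year': 2015, 'month': 9, 'total': 66609},
    {'year': 2015, 'month': 8, 'total': 0},
    {'year': 2015, 'month': 11, 'total': 69664},
    {'year': 2020, 'month': 4, 'total': 90},
]

# Tuples compare element by element: first by year, then by month.
by_year_month = sorted(sample_receipts, key=lambda r: (r['year'], r['month']))
print([(r['year'], r['month']) for r in by_year_month])
# [(2015, 8), (2015, 9), (2015, 11), (2020, 4)]

# reverse=True flips the order, e.g. to put the largest total first.
highest_first = sorted(sample_receipts, key=lambda r: r['total'], reverse=True)
print(highest_first[0]['total'])  # 69664
```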

### Summary

In this lesson we started with a messy dataset and made our data more manageable by:

1. Limiting the number of attributes in each dictionary
2. Coercing our data to more appropriate datatypes (integers for dates and revenue, shorter strings).

Once we did so, sorting and exploring our data became easier.

We also used our knowledge of datatypes to explore our data, for example by using a set to find the number of locations whose receipts we are handling.

### Resources

[Mixed Beverage API](https://data.texas.gov/Government-and-Taxes/Mixed-Beverage-Gross-Receipts/naix-2893)