# ETL Lab

### Introduction

In this lab, we ask you to use the techniques learned in this section to work with an API of your choosing.  Developing a sound procedure for reaching correct code is as important as the code itself.  Just as in the preceding lessons, we will follow these procedures:

1. Red, green, refactor
2. Move mess into an object
3. Make small methods by: 
    A. Commenting code
    B. Translating comments into methods
    
Along the way, we will arrive at our pattern of a *Client*, *Adapter*, and *Target*.

### Step 1.  Just get the data

The first step is to go from red to green.  That is, the code starts off with nothing working and our task is simply to get it working.  In this case, this means the following: 

1. Call an API of your choosing
2. Return a list of dictionaries and store it in a variable named `entities`

In [87]:
import requests
response = requests.get("https://data.texas.gov/resource/naix-2893.json?$order=total_receipts DESC&$LIMIT=10")
entities = response.json()

In [88]:
entities

[{'taxpayer_number': '12633932723',
  'taxpayer_name': 'LEGENDS HOSPITALITY, LLC',
  'taxpayer_address': '61 BROADWAY STE 2400',
  'taxpayer_city': 'NEW YORK',
  'taxpayer_state': 'NY',
  'taxpayer_zip': '10006',
  'taxpayer_county': '0',
  'location_number': '1',
  'location_name': 'AT&T STADIUM',
  'location_address': '1 LEGENDS WAY',
  'location_city': 'ARLINGTON',
  'location_state': 'TX',
  'location_zip': '76011',
  'location_county': '220',
  'inside_outside_city_limits_code_y_n': 'Y',
  'tabc_permit_number': 'MB722028',
  'responsibility_begin_date_yyyymmdd': '2009-05-15T00:00:00.000',
  'obligation_end_date_yyyymmdd': '2018-12-31T00:00:00.000',
  'liquor_receipts': '2598869',
  'wine_receipts': '378626',
  'beer_receipts': '3930134',
  'cover_charge_receipts': '0',
  'total_receipts': '6907629'},
 {'taxpayer_number': '12633932723',
  'taxpayer_name': 'LEGENDS HOSPITALITY, LLC',
  'taxpayer_address': '61 BROADWAY STE 2400',
  'taxpayer_city': 'NEW YORK',
  'taxpayer_state': 'NY

In [89]:
type(entities)
# list 

type(entities[0])
# dict

dict

In [109]:
entities[0]

{'taxpayer_number': '12633932723',
 'taxpayer_name': 'LEGENDS HOSPITALITY, LLC',
 'taxpayer_address': '61 BROADWAY STE 2400',
 'taxpayer_city': 'NEW YORK',
 'taxpayer_state': 'NY',
 'taxpayer_zip': '10006',
 'taxpayer_county': '0',
 'location_number': '1',
 'location_name': 'AT&T STADIUM',
 'location_address': '1 LEGENDS WAY',
 'location_city': 'ARLINGTON',
 'location_state': 'TX',
 'location_zip': '76011',
 'location_county': '220',
 'inside_outside_city_limits_code_y_n': 'Y',
 'tabc_permit_number': 'MB722028',
 'responsibility_begin_date_yyyymmdd': '2009-05-15T00:00:00.000',
 'obligation_end_date_yyyymmdd': '2018-12-31T00:00:00.000',
 'liquor_receipts': '2598869',
 'wine_receipts': '378626',
 'beer_receipts': '3930134',
 'cover_charge_receipts': '0',
 'total_receipts': '6907629'}
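
One way to make the request in this step a little more defensive is sketched below. The function name `fetch_entities` and its `http_get` parameter are hypothetical, introduced only for illustration; `http_get` defaults to `requests.get` and exists so the function can be exercised without a network connection. Passing the query as `params` leaves URL encoding to `requests`, and `raise_for_status()` raises an exception on an HTTP error instead of letting an error payload be parsed as data.

```python
def fetch_entities(url, params=None, http_get=None):
    """Return the parsed JSON body of a GET request as a list of dictionaries."""
    if http_get is None:
        # Imported lazily so that a stand-in can be injected when testing offline.
        import requests
        http_get = requests.get
    response = http_get(url, params=params, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx responses
    records = response.json()
    # The rest of the lab assumes a list of dictionaries, so check for that shape.
    if not isinstance(records, list) or not all(isinstance(r, dict) for r in records):
        raise ValueError("expected a JSON array of objects")
    return records

# Possible usage, mirroring the query parameters in the URL above:
# entities = fetch_entities("https://data.texas.gov/resource/naix-2893.json",
#                           {"$order": "total_receipts DESC", "$LIMIT": 10})
```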

### Step 2. Change the dictionaries into objects

The next step is to change the dictionaries received from the API into objects.  We can break this down into a few steps.

1. Create the *target class*.  This is the class the dictionaries will be transformed into.  To do this, choose no more than five attributes to store in each instance.

In [110]:
class Location:
    # please change the name of this class
    def __init__(self, address, city, state, zip):
        self._address = address
        self._city = city
        self._state = state
        self._zip = zip

Check your work by assigning an instance to the variable `target_instance`.

In [111]:
target_instance = Location(entities[0]['location_address'],
                           entities[0]['location_city'],
                           entities[0]['location_state'],
                           entities[0]['location_zip'])
3 < len(target_instance.__dict__.keys()) < 5
# True 

True

2. Reject some of the data

We don't want to pass all of our data into our class, so create a smaller dictionary containing just the attributes we need.

In [112]:
selected_attributes = {'location_address': entities[0]['location_address'],
                       'location_city': entities[0]['location_city'],
                       'location_state': entities[0]['location_state'],
                       'location_zip': entities[0]['location_zip']}

In [113]:
type(selected_attributes)
# dict

dict

In [114]:
len(selected_attributes.keys()) == len(target_instance.__dict__.keys())
# True

True

3. Coerce dictionaries into objects

A. To start, coerce just one dictionary into an object.

In [115]:
first_object = Location(entities[0]['location_address'],
                           entities[0]['location_city'],
                           entities[0]['location_state'],
                           entities[0]['location_zip'])
# change the above line to reference your target class

In [116]:
first_object._address

'1 LEGENDS WAY'

In [117]:
first_object.__dict__.values()

dict_values(['1 LEGENDS WAY', 'ARLINGTON', 'TX', '76011'])

In [118]:
entities[0].values()

dict_values(['12633932723', 'LEGENDS HOSPITALITY, LLC', '61 BROADWAY STE 2400', 'NEW YORK', 'NY', '10006', '0', '1', 'AT&T STADIUM', '1 LEGENDS WAY', 'ARLINGTON', 'TX', '76011', '220', 'Y', 'MB722028', '2009-05-15T00:00:00.000', '2018-12-31T00:00:00.000', '2598869', '378626', '3930134', '0', '6907629'])

In [119]:
list(first_object.__dict__.values()) == list(entities[0].values())
# False: first_object keeps only four of the 23 fields in entities[0]

False

B. Now that you have solved for one, solve for all.  Coerce all of the dictionaries into objects.  Assign the list of objects to a variable `targets`.

In [159]:
targets = []
for location_dict in entities: 
    selected_attr = {'address': location_dict['location_address'], 
                     'city': location_dict['location_city'], 
                     'state': location_dict['location_state'],
                     'zip':  location_dict['location_zip']}
    target = Location(selected_attr['address'], selected_attr['city'],
                      selected_attr['state'], selected_attr['zip'])
    targets.append(target)
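
The loop above can also be written with a mapping from constructor argument to API key, which keeps the field selection in one place. This is a sketch rather than the lab's required solution: `FIELD_MAP`, `select_attributes`, `sample_entities`, and `sample_targets` are hypothetical names introduced here, and `Location` is re-declared so the block runs by itself.

```python
class Location:
    def __init__(self, address, city, state, zip):
        self._address = address
        self._city = city
        self._state = state
        self._zip = zip

# Constructor argument -> key in the API record (illustrative mapping).
FIELD_MAP = {
    'address': 'location_address',
    'city': 'location_city',
    'state': 'location_state',
    'zip': 'location_zip',
}

def select_attributes(record, field_map=FIELD_MAP):
    """Keep only the mapped fields, renamed to the constructor's argument names."""
    return {arg: record[key] for arg, key in field_map.items()}

# A single record trimmed from the API output shown earlier.
sample_entities = [{
    'location_name': 'AT&T STADIUM',
    'location_address': '1 LEGENDS WAY',
    'location_city': 'ARLINGTON',
    'location_state': 'TX',
    'location_zip': '76011',
}]

# Unpack each reduced dictionary straight into the constructor.
sample_targets = [Location(**select_attributes(record)) for record in sample_entities]
print(sample_targets[0].__dict__)
```

Because the reduced dictionary's keys match the constructor's argument names, `**` unpacking replaces the four positional lookups.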

In [160]:
type(targets)

list

In [161]:
targets[2].__dict__

{'_address': '1 LEGENDS WAY',
 '_city': 'ARLINGTON',
 '_state': 'TX',
 '_zip': '76011'}

In [162]:
targets

[<__main__.Location at 0x1ea6c19a1d0>,
 <__main__.Location at 0x1ea6c19a320>,
 <__main__.Location at 0x1ea6c19a160>,
 <__main__.Location at 0x1ea6c19a128>,
 <__main__.Location at 0x1ea6c19a0f0>,
 <__main__.Location at 0x1ea6c19a080>,
 <__main__.Location at 0x1ea6c1b3048>,
 <__main__.Location at 0x1ea6c1b3080>,
 <__main__.Location at 0x1ea6c1b30b8>,
 <__main__.Location at 0x1ea6c1b30f0>]
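
The list above shows only memory addresses because the class does not define `__repr__`. A small sketch of how a readable representation could be added follows; `ReadableLocation` is a hypothetical name, used here to avoid overwriting the lab's class.

```python
class ReadableLocation:
    def __init__(self, address, city, state, zip):
        self._address = address
        self._city = city
        self._state = state
        self._zip = zip

    def __repr__(self):
        # Shown by the notebook and by print() when the object sits inside a list.
        return (f"ReadableLocation(address={self._address!r}, city={self._city!r}, "
                f"state={self._state!r}, zip={self._zip!r})")

stadium = ReadableLocation('1 LEGENDS WAY', 'ARLINGTON', 'TX', '76011')
print([stadium])
# [ReadableLocation(address='1 LEGENDS WAY', city='ARLINGTON', state='TX', zip='76011')]
```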

In [163]:
len(targets) == len(entities)
# True

True

### Step 3. Move the remaining code into an object

At this point, we have successfully transformed a list of dictionaries from an API into a list of objects.  But we need to keep cleaning up our code.  To do this, look at the code that sits outside of a class and move it into a class, in a method named `run`.

In [135]:
class LocationAdapter:
    def __init__(self, location_dicts):
        self._location_dicts = location_dicts

    def run(self):
        self._targets = []
        for location_dict in self._location_dicts:
            selected_attr = {'address': location_dict['location_address'],
                             'city': location_dict['location_city'],
                             'state': location_dict['location_state'],
                             'zip': location_dict['location_zip']}
            target = Location(selected_attr['address'], selected_attr['city'],
                              selected_attr['state'], selected_attr['zip'])
            self._targets.append(target)
        return self._targets

Let's make sure that this works.

In [147]:
location_adapter = LocationAdapter(entities)
# change the above line to reference your adapter
results = location_adapter.run()

In [164]:
len(results) == len(targets)
# True

True

### Step 4. Make the methods smaller

Next, break the `run` method in the adapter class into smaller methods.  Do this by first writing comments in the code, and then moving the commented code into separate methods.  Please leave the comments in your code.  Each method should be no more than five lines long, and each method may contain at most one `if else` statement or one `loop`.  Having both an `if else` and a `loop` in the same method is too complicated, so don't do it.

In [78]:
class FooAdapter:
    # def run():
        # do this
        
        # and this
    pass

In [165]:
class LocationAdapter:
    # initializer: store the dictionaries returned by the API
    def __init__(self, location_dicts):
        self._location_dicts = location_dicts

    # run method: coerce each dictionary into a Location and return the list
    def run(self):
        self._targets = []
        for location_dict in self._location_dicts:
            target = Location(location_dict['location_address'],
                              location_dict['location_city'],
                              location_dict['location_state'],
                              location_dict['location_zip'])
            self._targets.append(target)
        return self._targets

In [167]:
location_adapter = LocationAdapter(entities)
# change the above line to reference your adapter
results = location_adapter.run()
len(results) == len(targets)

True
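
For comparison, here is one way the adapter could be broken into smaller methods in the spirit of this step, with each comment translated into a method. It is a sketch under assumptions: the class name `SmallMethodsLocationAdapter` and the helper names `_select_attributes` and `_build_target` are invented for illustration, and `Location` is re-declared so the block runs by itself.

```python
class Location:
    def __init__(self, address, city, state, zip):
        self._address = address
        self._city = city
        self._state = state
        self._zip = zip

class SmallMethodsLocationAdapter:
    def __init__(self, location_dicts):
        self._location_dicts = location_dicts

    def run(self):
        # coerce every dictionary into a target object
        self._targets = [self._build_target(d) for d in self._location_dicts]
        return self._targets

    def _select_attributes(self, location_dict):
        # keep only the fields the target class needs
        keys = ('address', 'city', 'state', 'zip')
        return {key: location_dict['location_' + key] for key in keys}

    def _build_target(self, location_dict):
        # turn one dictionary into one Location
        return Location(**self._select_attributes(location_dict))
```

Each method holds at most one loop (here, a comprehension) and stays under five lines, and `run` now reads as a summary of the steps.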

### Step 5. Create the client class

Next, move the calls to the API into their own separate class.  This way we can call the API now and later decide to coerce the data into different types of objects than we did above.

In [171]:
import requests
class ClientAPI:
    def run(self):
        url = "https://data.texas.gov/resource/naix-2893.json?$order=total_receipts DESC&$LIMIT=10"
        return requests.get(url).json()

Place the updated Adapter class below.  Check that it still works as it did before.

In [177]:
class RefactoredLocationAdapter:
    # initiation method
    def __init__(self, location_dicts):
        self._location_dicts = location_dicts
    
    # run method to generate result  
    def run(self):
        self._targets = []
        for location_dict in self._location_dicts:
            target = Location(location_dict['location_address'], location_dict['location_city'], location_dict['location_state'], location_dict['location_zip'])
            self._targets.append(target)
        return self._targets

In [178]:
client_api = ClientAPI()
refactored_adapter = RefactoredLocationAdapter(client_api.run())
len(refactored_adapter.run()) == len(location_adapter.run())

True
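
Putting the three roles together, data flows from the client, through the adapter, into targets. The sketch below wires them up with a stand-in client so that it runs offline; `StubClient`, `PipelineAdapter`, `PipelineLocation`, and `run_pipeline` are hypothetical names, and the stub returns a hard-coded record trimmed from the output shown earlier rather than calling the API.

```python
class PipelineLocation:
    """Target: the object each API record is turned into."""
    def __init__(self, address, city, state, zip):
        self._address = address
        self._city = city
        self._state = state
        self._zip = zip

class StubClient:
    """Client stand-in: returns a canned record instead of calling the API."""
    def run(self):
        return [{
            'location_name': 'AT&T STADIUM',
            'location_address': '1 LEGENDS WAY',
            'location_city': 'ARLINGTON',
            'location_state': 'TX',
            'location_zip': '76011',
        }]

class PipelineAdapter:
    """Adapter: coerces the client's dictionaries into targets."""
    def __init__(self, location_dicts):
        self._location_dicts = location_dicts

    def run(self):
        return [self._build_target(d) for d in self._location_dicts]

    def _build_target(self, location_dict):
        return PipelineLocation(location_dict['location_address'],
                                location_dict['location_city'],
                                location_dict['location_state'],
                                location_dict['location_zip'])

def run_pipeline(client):
    # Any object with a run() method that returns a list of dictionaries will do.
    return PipelineAdapter(client.run()).run()

pipeline_targets = run_pipeline(StubClient())
print(len(pipeline_targets), pipeline_targets[0]._city)
# 1 ARLINGTON
```

Because the adapter only sees a list of dictionaries, the client can be swapped for a real API client without touching the adapter or the target.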

### Summary

Great job!  Hopefully, you saw how getting the code working first and then refactoring it in small steps eventually leads to clean code.