# ETL Lab

### Introduction

In this lab, we ask you to use the techniques learned in this section to work with an API of your choosing.  Just as important as arriving at correct code is developing a sound procedure for getting there.  As in the preceding lessons, we will follow these procedures:

1. Red, green, refactor
2. Move mess into an object
3. Make small methods by: 
    A. Commenting code
    B. Translating comments into methods
    
Along the way, we will arrive at our pattern of a *Client*, *Adapter*, and *Target*.

### Step 1.  Just get the data

The first step is to go from red to green.  That is, the code starts off with nothing working and our task is simply to get it working.  In this case, this means the following: 

1. Call an API of your choosing
2. Return a list of dictionaries and store it in a variable named `entities`

In [3]:
# Median Gross Rent - Dataset
# https://dev.socrata.com/foundry/opendata.ramseycounty.us/p5xb-h3xq
import requests
url = "https://opendata.ramseycounty.us/resource/p5xb-h3xq.json"
entities = requests.get(url).json()

In [2]:
print(type(entities))
# list 

print(type(entities[0]))
# dict

<class 'list'>
<class 'dict'>


In [3]:
entities[0]

{'period': '2012-01-01T00:00:00.000',
 'id': '0500000US27003',
 'id2': '27003',
 'geography': 'Anoka County, Minnesota',
 'estimate_median_gross_rent': '937',
 'margin_of_error_median_gross_rent': '16'}
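Before building on `entities`, it can be worth confirming that the response really has the shape the later steps assume.  The following is a minimal sketch, not part of the original lab: `check_entities` is a hypothetical helper, and the sample record is copied from the output above so that the block runs without a network call.

```python
def check_entities(entities):
    """Return True when entities is a non-empty list made up only of dicts."""
    if not isinstance(entities, list) or not entities:
        return False
    return all(isinstance(entity, dict) for entity in entities)


# Sample record copied from the API output shown above.
sample_entities = [{
    'period': '2012-01-01T00:00:00.000',
    'id': '0500000US27003',
    'id2': '27003',
    'geography': 'Anoka County, Minnesota',
    'estimate_median_gross_rent': '937',
    'margin_of_error_median_gross_rent': '16',
}]

print(check_entities(sample_entities))
# True

# A response that is a single JSON object rather than a list fails the check.
print(check_entities({'error': True}))
# False
```

A check like this is most useful right after the request, since every later step indexes into the list and reads keys from each dictionary.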

### Step 2. Change the dictionaries into objects

The next step is to change the dictionaries received back from the API into objects.  We can break this down into a few smaller steps.

1. Create the *target class*.  This is the class the dictionaries will be transformed into.  To do this, choose no more than five attributes to store in each instance.

In [6]:
class Target:
  def __init__(self, id2, address, estimate_median_gross_rent, margin_of_error_median_gross_rent):
    self._id2 = id2
    self._address = address
    self._estimate_median_gross_rent = estimate_median_gross_rent
    self._margin_of_error_median_gross_rent = margin_of_error_median_gross_rent

Check your work by assigning an instance to the variable `target_instance`.

In [5]:
target_instance = Target('27003','Anoka County, Minnesota','937','16')
3 < len(target_instance.__dict__.keys()) < 5
# True 

True

1. Reject some of the data

We don't want to pass all of our data into our class, so create a smaller dictionary with just the attributes we need.

In [6]:
selected_attributes = {'address':None, 'estimate_median_gross_rent':None, 'margin_of_error_median_gross_rent':None}

In [7]:
type(selected_attributes)
# dict

dict

In [8]:
len(selected_attributes.keys()) < len(target_instance.__dict__.keys())
# True

True
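One way to fill the smaller dictionary from a real record is a dictionary comprehension over the keys we want to keep.  This is a sketch under assumptions: `KEPT_KEYS` and `select_attributes` are hypothetical names, the rename of `geography` to `address` follows the `Target` class above, and the record is the one shown in Step 1.

```python
# Maps each key in the API record to the attribute name used by Target.
# (Hypothetical constant; adjust it to the attributes you chose.)
KEPT_KEYS = {
    'geography': 'address',
    'estimate_median_gross_rent': 'estimate_median_gross_rent',
    'margin_of_error_median_gross_rent': 'margin_of_error_median_gross_rent',
}


def select_attributes(entity):
    """Return a new dict holding only the kept keys, renamed for Target."""
    return {new_key: entity[old_key] for old_key, new_key in KEPT_KEYS.items()}


# Record copied from the output in Step 1.
entity = {
    'period': '2012-01-01T00:00:00.000',
    'id': '0500000US27003',
    'id2': '27003',
    'geography': 'Anoka County, Minnesota',
    'estimate_median_gross_rent': '937',
    'margin_of_error_median_gross_rent': '16',
}

selected_attributes = select_attributes(entity)
print(selected_attributes)
```

Keeping the key mapping in one place means that choosing different attributes later only requires editing `KEPT_KEYS`.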

2. Coerce dictionaries into objects

A. To start, coerce just one dictionary into an object.

In [4]:
entities[0]

{'period': '2012-01-01T00:00:00.000',
 'id': '0500000US27003',
 'id2': '27003',
 'geography': 'Anoka County, Minnesota',
 'estimate_median_gross_rent': '937',
 'margin_of_error_median_gross_rent': '16'}

In [7]:
# coerce the first dictionary into a Target instance
# (change the call below to reference your own target class)
entity = entities[0]
first_object = Target(entity['id2'],
                      entity['geography'],
                      entity['estimate_median_gross_rent'],
                      entity['margin_of_error_median_gross_rent'])

In [None]:
def find_by_geography(geography, entity_objects):
    # return the objects whose stored address matches the given geography
    return list(filter(lambda entity_object: entity_object._address == geography, entity_objects))

In [8]:
first_object.__dict__.values()

dict_values(['27003', 'Anoka County, Minnesota', '937', '16'])

In [11]:
list(entities[0].values())

['2012-01-01T00:00:00.000',
 '0500000US27003',
 '27003',
 'Anoka County, Minnesota',
 '937',
 '16']

In [12]:
list(first_object.__dict__.values()) == list(entities[0].values())
# False: the object stores only four of the dictionary's six values

False

B. Now that you have solved for one, solve for all.  Coerce all of the dictionaries into objects.  Assign the list of objects to a variable `targets`.

In [13]:
targets = []

for target_dict in entities: 
    selected_attr = {'id2': target_dict['id2'],
                     'address': target_dict['geography'],
                     'estimate_median_gross_rent': target_dict['estimate_median_gross_rent'],
                     'margin_of_error_median_gross_rent': target_dict['margin_of_error_median_gross_rent']}
    target = Target(selected_attr['id2'],
                    selected_attr['address'],
                    selected_attr['estimate_median_gross_rent'],
                    selected_attr['margin_of_error_median_gross_rent'])
    targets.append(target)

In [14]:
len(targets) == len(entities)
# True

True
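The loop above can also be written as a list comprehension around a small helper.  The sketch below is one possible variant, not the lab's required solution: it redefines `Target` so the block is self-contained, introduces a hypothetical helper `to_target`, and uses a short stand-in list in place of the API response (the second record holds placeholder values only).

```python
class Target:
    def __init__(self, id2, address, estimate_median_gross_rent,
                 margin_of_error_median_gross_rent):
        self._id2 = id2
        self._address = address
        self._estimate_median_gross_rent = estimate_median_gross_rent
        self._margin_of_error_median_gross_rent = margin_of_error_median_gross_rent


def to_target(entity):
    """Coerce one API dictionary into a Target (hypothetical helper)."""
    return Target(entity['id2'],
                  entity['geography'],
                  entity['estimate_median_gross_rent'],
                  entity['margin_of_error_median_gross_rent'])


# Stand-in for the API response.
entities = [
    # Record copied from the output in Step 1.
    {'period': '2012-01-01T00:00:00.000', 'id': '0500000US27003',
     'id2': '27003', 'geography': 'Anoka County, Minnesota',
     'estimate_median_gross_rent': '937',
     'margin_of_error_median_gross_rent': '16'},
    # Placeholder values for illustration only, not real data.
    {'period': 'placeholder', 'id': 'placeholder',
     'id2': '00000', 'geography': 'Placeholder County',
     'estimate_median_gross_rent': '0',
     'margin_of_error_median_gross_rent': '0'},
]

# Solve for one (to_target), then solve for all.
targets = [to_target(entity) for entity in entities]
print(len(targets) == len(entities))
# True
```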

### Step 3. Move the remaining code into an object

At this point, we have successfully transformed a list of dictionaries from an API into a list of objects.  But we need to keep cleaning up our code.  To do this, look at the code that sits outside of a class and move it into a class, inside a method named `run`.

In [16]:
class TargetAdapter:
  def __init__(self, target_dicts):
    self._target_dicts = target_dicts
    
  def run(self):
    self._targets = []
    for target_dict in self._target_dicts: 
      selected_attr = {'id2': target_dict['id2'],
                       'address': target_dict['geography'],
                       'estimate_median_gross_rent': target_dict['estimate_median_gross_rent'],
                       'margin_of_error_median_gross_rent': target_dict['margin_of_error_median_gross_rent']}
      target = Target(selected_attr['id2'], 
                      selected_attr['address'], 
                      selected_attr['estimate_median_gross_rent'], 
                      selected_attr['margin_of_error_median_gross_rent'])
      self._targets.append(target)
    return self._targets

Let's make sure that this works.

In [17]:
target_dicts = entities
target_adapter = TargetAdapter(target_dicts)
results = target_adapter.run()

In [18]:
len(results) == len(targets)
# True

True

### Step 4. Make the methods smaller

Next, break the `run` method in the adapter class into smaller methods.  Do this by first writing comments in the code, and then moving the commented code into separate methods.  Please leave the comments in your code.  Each method should be no longer than five lines, and may contain at most one `if else` statement or one loop.  Having both an `if else` and a loop in the same method is too complicated, so don't do it.

In [19]:
class TargetAdapter:
  def __init__(self, target_dicts):
    self._target_dicts = target_dicts

  def run(self):
    self._targets = []
    for target_dict in self._target_dicts:
      # select the attributes to keep from the dictionary
      selected_attr = {'id2': target_dict['id2'],
                       'address': target_dict['geography'],
                       'estimate_median_gross_rent': target_dict['estimate_median_gross_rent'],
                       'margin_of_error_median_gross_rent': target_dict['margin_of_error_median_gross_rent']}
      # build a Target from the selected attributes
      target = Target(selected_attr['id2'],
                      selected_attr['address'],
                      selected_attr['estimate_median_gross_rent'],
                      selected_attr['margin_of_error_median_gross_rent'])
      # collect the new Target
      self._targets.append(target)
    return self._targets
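To illustrate where this step can end up, here is one possible way to translate comments into methods.  It is a sketch under assumptions rather than the expected answer: the class name `SmallMethodsAdapter` and the helper methods `_select_attributes` and `_build_target` are hypothetical, `Target` is redefined so the block is self-contained, and the sample record is copied from Step 1.

```python
class Target:
    def __init__(self, id2, address, estimate_median_gross_rent,
                 margin_of_error_median_gross_rent):
        self._id2 = id2
        self._address = address
        self._estimate_median_gross_rent = estimate_median_gross_rent
        self._margin_of_error_median_gross_rent = margin_of_error_median_gross_rent


class SmallMethodsAdapter:
    def __init__(self, target_dicts):
        self._target_dicts = target_dicts

    def run(self):
        # coerce each dictionary into a Target and collect the results
        self._targets = []
        for target_dict in self._target_dicts:
            self._targets.append(self._build_target(target_dict))
        return self._targets

    def _build_target(self, target_dict):
        # build one Target from the attributes we decided to keep
        selected_attr = self._select_attributes(target_dict)
        return Target(**selected_attr)

    def _select_attributes(self, target_dict):
        # keep only what Target stores, renaming geography to address
        return {'id2': target_dict['id2'],
                'address': target_dict['geography'],
                'estimate_median_gross_rent': target_dict['estimate_median_gross_rent'],
                'margin_of_error_median_gross_rent': target_dict['margin_of_error_median_gross_rent']}


# Sample record copied from the output in Step 1.
sample_dicts = [{'period': '2012-01-01T00:00:00.000', 'id': '0500000US27003',
                 'id2': '27003', 'geography': 'Anoka County, Minnesota',
                 'estimate_median_gross_rent': '937',
                 'margin_of_error_median_gross_rent': '16'}]

results = SmallMethodsAdapter(sample_dicts).run()
print(results[0].__dict__)
```

Each method keeps the comment it was derived from, and only `run` contains a loop.  `Target(**selected_attr)` works here because the dictionary keys match the parameter names of `Target.__init__`.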

### Step 5. Create the client class

Next, move the calls to the API into their own separate class.  This way we can keep calling the API but later decide to coerce the data into different types of objects than we did above.

In [20]:
import requests
class MedianGrossRentAPI:
  def run(self):
    url = "https://opendata.ramseycounty.us/resource/p5xb-h3xq.json"
    return requests.get(url).json()

Place the updated Adapter class below.  Check that it still works as it did before.

In [21]:
class RefactoredTargetAdapter:
  def __init__(self, target_dicts):
    self._target_dicts = target_dicts
  def run(self):
    self._targets = []
    for target_dict in self._target_dicts: 
      selected_attr = {'id2': target_dict['id2'],
                       'address': target_dict['geography'],
                       'estimate_median_gross_rent': target_dict['estimate_median_gross_rent'],
                       'margin_of_error_median_gross_rent': target_dict['margin_of_error_median_gross_rent']}
      target = Target(selected_attr['id2'], 
                      selected_attr['address'], 
                      selected_attr['estimate_median_gross_rent'], 
                      selected_attr['margin_of_error_median_gross_rent'])
      self._targets.append(target)
    return self._targets

In [22]:
target_dicts = MedianGrossRentAPI().run()
target_adapter = RefactoredTargetAdapter(target_dicts)
results = target_adapter.run()

In [23]:
refactored_adapter = RefactoredTargetAdapter(target_dicts)
len(refactored_adapter.run()) == len(target_adapter.run())

True
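Because the client and the adapter are separate classes, the adapter can also be exercised without calling the API.  The sketch below assumes a hypothetical `FakeMedianGrossRentAPI` that returns a canned record (copied from the output in Step 1) through the same `run` interface, a compact `StandInAdapter` in place of the adapter written above, and a hypothetical helper `extract_and_transform` that wires the two together.

```python
class Target:
    def __init__(self, id2, address, estimate_median_gross_rent,
                 margin_of_error_median_gross_rent):
        self._id2 = id2
        self._address = address
        self._estimate_median_gross_rent = estimate_median_gross_rent
        self._margin_of_error_median_gross_rent = margin_of_error_median_gross_rent


class StandInAdapter:
    """Compact stand-in for the adapter class written in the lab."""
    def __init__(self, target_dicts):
        self._target_dicts = target_dicts

    def run(self):
        return [Target(d['id2'], d['geography'],
                       d['estimate_median_gross_rent'],
                       d['margin_of_error_median_gross_rent'])
                for d in self._target_dicts]


class FakeMedianGrossRentAPI:
    """Hypothetical client: same run() interface, but no network call."""
    def run(self):
        return [{'period': '2012-01-01T00:00:00.000', 'id': '0500000US27003',
                 'id2': '27003', 'geography': 'Anoka County, Minnesota',
                 'estimate_median_gross_rent': '937',
                 'margin_of_error_median_gross_rent': '16'}]


def extract_and_transform(client, adapter_class):
    """Fetch dictionaries from the client and hand them to the adapter."""
    return adapter_class(client.run()).run()


targets = extract_and_transform(FakeMedianGrossRentAPI(), StandInAdapter)
print(targets[0]._address)
# Anoka County, Minnesota
```

Passing `MedianGrossRentAPI()` instead of the fake client would run the same pipeline against the live API, which is the flexibility the client class is meant to provide.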

### Summary

Great job!  Hopefully, you saw how getting the code working first and then refactoring it in small steps eventually leads to clean code.