This repository has been archived by the owner on Mar 1, 2018. It is now read-only.

Document cleansing operations happening in adapter
anaschwendler committed May 4, 2017
1 parent 368c0d8 commit 2558920
Showing 2 changed files with 6 additions and 0 deletions.
2 changes: 2 additions & 0 deletions rosie/chamber_of_deputies/adapter.py
@@ -41,12 +41,14 @@ def rename_columns(self):
        self._dataset.rename(columns=columns, inplace=True)

    def rename_categories(self):
        # There's no documented type for `3`, thus we assume it's an input error
        self._dataset['document_type'].replace({3: None}, inplace=True)
        self._dataset['document_type'] = self._dataset['document_type'].astype(
            'category')
        types = ['bill_of_sale', 'simple_receipt', 'expense_made_abroad']
        self._dataset['document_type'].cat.rename_categories(
            types, inplace=True)
        # Classifiers expect a more broad category name for meals
        self._dataset['category'] = self._dataset['category'].replace(
            {'Congressperson meal': 'Meal'})
        self._dataset['is_party_expense'] = \
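The two cleansing steps documented in this hunk can be tried on a toy frame. The following is only a sketch, not the adapter's code: the helper names `clean_document_type` and `broaden_meal_category` and the sample data are hypothetical, and it assumes the documented type codes are 0, 1 and 2. It uses non-inplace calls because recent pandas releases no longer accept `inplace=True` in `cat.rename_categories`.

```python
import pandas as pd

# Same labels as in the adapter; assumed to correspond to codes 0, 1 and 2
DOCUMENT_TYPES = ['bill_of_sale', 'simple_receipt', 'expense_made_abroad']


def clean_document_type(document_type: pd.Series) -> pd.Series:
    """Hypothetical helper: drop the undocumented code 3 and label the rest."""
    # Treat the undocumented code 3 as a missing value
    cleaned = document_type.mask(document_type == 3)
    # Categories are created in sorted order, so the labels map positionally
    return cleaned.astype('category').cat.rename_categories(DOCUMENT_TYPES)


def broaden_meal_category(category: pd.Series) -> pd.Series:
    """Hypothetical helper: use the broader name the classifiers expect."""
    return category.replace({'Congressperson meal': 'Meal'})


# Made-up sample data for illustration only
sample = pd.DataFrame({
    'document_type': [0, 1, 3, 2],
    'category': ['Congressperson meal', 'Fuel', 'Congressperson meal', 'Fuel'],
})
sample['document_type'] = clean_document_type(sample['document_type'])
sample['category'] = broaden_meal_category(sample['category'])
```

Note that the positional mapping only works when all three documented codes appear in the data; a column missing one of them would need an explicit mapping instead.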
@@ -15,6 +15,10 @@ class MealPriceOutlierClassifier(TransformerMixin):
    applicant_id : string column
        A personal identifier code for every person making expenses.
    category : category column
        Category of the expense. The model will be applied just in rows where
        the value is equal to "Meal".
    net_value : float column
        The value of the expense.
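The docstring says the model is applied only to rows whose `category` equals "Meal". A minimal sketch of that row selection, using made-up data and not the classifier's actual implementation, might look like this:

```python
import pandas as pd

# Made-up expenses using the column names from the docstring
expenses = pd.DataFrame({
    'applicant_id': ['a1', 'a2', 'a3'],
    'category': ['Meal', 'Fuel', 'Meal'],
    'net_value': [35.0, 120.0, 410.0],
})

# Only rows whose category equals "Meal" would be handed to the model
meals = expenses[expenses['category'] == 'Meal']
```

This is why the adapter renames 'Congressperson meal' to 'Meal': without that step, a filter like the one above would select no rows.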