# SERVICE_STANDARDIZE_NAME
Written in Python 3.6 by Erin Ochoa

Jupyter Notebook written by Jerry Shi

In [1]:
from SERVICE_STANDARDIZE_NAME import *

The SERVICE_STANDARDIZE_NAME module builds on the STANDARDIZE_NAME module to further clean and standardize organization names. Data can be run through standardize_name (from STANDARDIZE_NAME), stdname (from SERVICE_STANDARDIZE_NAME), or both.

First, Python's regular-expression library, re, is imported, along with the STANDARDIZE_NAME module (aliased as sname).

In [2]:
import re
import STANDARDIZE_NAME as sname

Next, a new function, stdname, is defined. Its argument, string, is first processed with the standardize_name function of the previous module.
```
def stdname(string):
    
    string = sname.standardize_name(string)
```

After a name has been run through standardize_name, some superfluous detail can remain. For example, University of Chicago Laboratory Schools can be simplified to University of Chicago. To clean the data, first identify organizations whose suborganization names can be removed without affecting the analysis. Salvation Army of North Chicago and Salvation Army of South Side Chicago, for example, can be treated as the same organization. The organizations that can be condensed this way are grouped into two lists: univs (universities) and others.

In [3]:
univs = ['UNIVERSITY OF CHICAGO','NORTHWESTERN UNIVERSITY',
             'DEPAUL UNIVERSITY','RUSH UNIVERSITY','UNIVERSITY OF ILLINOIS',
             'SOUTHERN ILLINOIS UNIVERSITY','ILLINOIS INSTITUTE OF TECHNOLOGY']

others = ['SALVATION ARMY','EL VALOR','ERIE FAMILY HEALTH CENTER',
              'FRESENIUS MEDICAL CARE','FRIEND FAMILY HEALTH CENTER',
              'HUMAN RESOURCES DEVELOPMENT INSTITUTE']

The for loop iterates over the concatenation of univs and others, binding each organization name in turn to item. An if statement replaces names with their simplified form: if the string starts with or ends with item, item is returned rather than the original string.

```
    for item in univs + others:
        if string.startswith(item) or string.endswith(item):
            return item
```

Thus, University of Chicago Laboratory School, when run through the function stdname, returns UNIVERSITY OF CHICAGO.

In [4]:
stdname('University of Chicago Laboratory School')

'UNIVERSITY OF CHICAGO'
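
The prefix/suffix rule can also be tried in isolation. The following is a minimal sketch that does not import the real modules; `collapse` and `KNOWN_ORGS` are hypothetical names introduced only for illustration, and the inputs are assumed to be already upper-cased. It also shows a limitation of the rule: a known organization that appears only in the middle of a name is not matched.

```python
# Hypothetical stand-alone version of the prefix/suffix rule.
# `collapse` and `KNOWN_ORGS` are illustrative names, not part of
# SERVICE_STANDARDIZE_NAME; inputs are assumed to be upper-cased already.
KNOWN_ORGS = ['UNIVERSITY OF CHICAGO', 'SALVATION ARMY']

def collapse(name):
    """Return the known organization that `name` starts or ends with, else `name`."""
    for org in KNOWN_ORGS:
        if name.startswith(org) or name.endswith(org):
            return org
    return name

print(collapse('UNIVERSITY OF CHICAGO LABORATORY SCHOOLS'))    # prefix match
print(collapse('NORTH SIDE SALVATION ARMY'))                   # suffix match
print(collapse('FRIENDS OF UNIVERSITY OF CHICAGO HOSPITALS'))  # middle only: unchanged
```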

Similarly, the various Catholic charitable ventures of Chicago are all mapped to "CATHOLIC CHARITIES OF ARCHDIOCESE OF CHICAGO". A list, cath, holds the different beginnings and endings that identify these ventures (including the variant spelling ARCHDIOSIS). The loop variable c takes each entry of cath in turn: if the string starts with or ends with c, 'CATHOLIC CHARITIES OF ARCHDIOCESE OF CHICAGO' is returned.
```
    cath = ['CATHOLIC CHARITIES','CATHOLIC BISHOP OF CHICAGO',
            'ARCHDIOCESE OF CHICAGO','ARCHDIOSIS OF CHICAGO']
            
    for c in cath:
        if string.startswith(c) or string.endswith(c):
            return 'CATHOLIC CHARITIES OF ARCHDIOCESE OF CHICAGO'
```
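
As an aside, `str.startswith` and `str.endswith` also accept a tuple of candidates, so the same check can be written without an explicit loop. This is only an alternative sketch, not the module's code; `collapse_catholic`, `CATH_VARIANTS`, and `CANONICAL` are hypothetical names, and the input is assumed to be upper-cased already.

```python
# Alternative sketch: pass a tuple to startswith/endswith instead of looping.
# All names here are illustrative and do not appear in SERVICE_STANDARDIZE_NAME.
CATH_VARIANTS = ('CATHOLIC CHARITIES', 'CATHOLIC BISHOP OF CHICAGO',
                 'ARCHDIOCESE OF CHICAGO', 'ARCHDIOSIS OF CHICAGO')
CANONICAL = 'CATHOLIC CHARITIES OF ARCHDIOCESE OF CHICAGO'

def collapse_catholic(name):
    """Map any name that starts or ends with a known variant to CANONICAL."""
    if name.startswith(CATH_VARIANTS) or name.endswith(CATH_VARIANTS):
        return CANONICAL
    return name

print(collapse_catholic('CATHOLIC CHARITIES HOUSING'))
print(collapse_catholic('LUTHERAN SOCIAL SERVICES'))
```

The behavior is the same as the loop above; the tuple form simply tests all variants in one call.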

The last cleaning step handles a select few organizations, and this time the patterns to compare against are stored in a different structure: a dictionary. [According to W3Schools](https://www.w3schools.com/python/python_dictionaries.asp), a dictionary is a changeable, indexed collection; note that as of Python 3.7 dictionaries preserve insertion order, although older references describe them as unordered. In Python, dictionaries are written with curly brackets, and they have keys and values.

In [5]:
    dicto = {'CHURCH OF JSUS CHRIST OF LD STS':'CHURCH OF JESUS CHRIST OF LATTER DAY SAINTS',
             'EASTER SEALS':'EASTER SEALS METROPOLITAN CHICAGO',
             '^UIC ':'UNIVERSITY OF ILLINOIS',
             '^UNIVERSITY ILLINOIS ':'UNIVERSITY OF ILLINOIS'
            }

Thus, a dictionary is defined with several key-value pairs. Each key is a regular-expression pattern; ^ anchors the match to the beginning of the string (recall that the re module was imported in In [2]).

    for key, value in dicto.items():
        if re.findall(key,string):
            return value

Using these key-value pairs, the loop calls a regex function, re.findall, to look for each key in the input string. As soon as a key matches, the function returns the corresponding value, so the whole name (not just the matched portion) is replaced by that value. Unlike the systematic cleaning of whole groups of organizations (Catholic charitable ventures, universities, etc.), this method of standardizing names is better suited to a select few cases.
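
To see how the anchored and unanchored keys behave differently, here is a small self-contained sketch. It uses `re.search` rather than `re.findall` (for a yes/no test the two are interchangeable, since both find a match anywhere in the string); `patterns` and `lookup` are hypothetical names, and the inputs are assumed to be upper-cased already.

```python
import re

# Illustrative subset of the dictionary; `patterns` and `lookup` are
# hypothetical names, not part of SERVICE_STANDARDIZE_NAME.
patterns = {'EASTER SEALS': 'EASTER SEALS METROPOLITAN CHICAGO',  # matches anywhere
            '^UIC ': 'UNIVERSITY OF ILLINOIS'}                    # matches at start only

def lookup(name):
    """Return the value of the first pattern found in `name`, else `name`."""
    for pattern, replacement in patterns.items():
        if re.search(pattern, name):
            return replacement
    return name

print(lookup('UIC MEDICAL CENTER'))       # anchored key matches at the start
print(lookup('FRIENDS OF EASTER SEALS'))  # unanchored key matches mid-string
print(lookup('GREATER UIC AREA CLINIC'))  # 'UIC ' is not at the start: unchanged
```

Because the loop returns on the first match, the order of the keys matters whenever a name could match more than one pattern.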

```
    return string
```

Lastly, if none of the rules above matched, the string as cleaned by standardize_name is returned, and the function is complete.
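
Assembled in one place, the steps above might look like the following sketch. It is not the module's source file: `standardize_name_stub` is a hypothetical stand-in for `sname.standardize_name` that only trims and upper-cases (the real function presumably does more), and `stdname_sketch` is an illustrative name chosen so the block runs without importing STANDARDIZE_NAME.

```python
import re

def standardize_name_stub(string):
    # Hypothetical stand-in for sname.standardize_name: trims and upper-cases only.
    return string.strip().upper()

univs = ['UNIVERSITY OF CHICAGO', 'NORTHWESTERN UNIVERSITY',
         'DEPAUL UNIVERSITY', 'RUSH UNIVERSITY', 'UNIVERSITY OF ILLINOIS',
         'SOUTHERN ILLINOIS UNIVERSITY', 'ILLINOIS INSTITUTE OF TECHNOLOGY']

others = ['SALVATION ARMY', 'EL VALOR', 'ERIE FAMILY HEALTH CENTER',
          'FRESENIUS MEDICAL CARE', 'FRIEND FAMILY HEALTH CENTER',
          'HUMAN RESOURCES DEVELOPMENT INSTITUTE']

cath = ['CATHOLIC CHARITIES', 'CATHOLIC BISHOP OF CHICAGO',
        'ARCHDIOCESE OF CHICAGO', 'ARCHDIOSIS OF CHICAGO']

dicto = {'CHURCH OF JSUS CHRIST OF LD STS': 'CHURCH OF JESUS CHRIST OF LATTER DAY SAINTS',
         'EASTER SEALS': 'EASTER SEALS METROPOLITAN CHICAGO',
         '^UIC ': 'UNIVERSITY OF ILLINOIS',
         '^UNIVERSITY ILLINOIS ': 'UNIVERSITY OF ILLINOIS'}

def stdname_sketch(string):
    string = standardize_name_stub(string)

    # Rule 1: collapse names that start or end with a known organization.
    for item in univs + others:
        if string.startswith(item) or string.endswith(item):
            return item

    # Rule 2: map Catholic charitable ventures to one canonical name.
    for c in cath:
        if string.startswith(c) or string.endswith(c):
            return 'CATHOLIC CHARITIES OF ARCHDIOCESE OF CHICAGO'

    # Rule 3: regex lookups for a handful of special cases.
    for key, value in dicto.items():
        if re.findall(key, string):
            return value

    # No rule matched: return the standardized string unchanged.
    return string

print(stdname_sketch('University of Chicago Laboratory Schools'))
print(stdname_sketch('uic hospital'))
print(stdname_sketch('Chicago Public Library'))
```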