# Jupyter Notebook for converting the Go Pass Data


## 1.0 Load in modules

We use the `pandas` and `numpy` libraries to read and process the data.

Then we use the `rapidfuzz` library to calculate close matches between the strings so that we can join the two datasets.

Other popular fuzzy string-matching libraries, such as `fuzzball`, took over 3 hours to process this data. Every record in one dataset is scored against every record in the other, so the work grows with the product of the two dataset sizes (1,000 records compared against 1,000 records is 1,000,000 comparisons). `rapidfuzz` runs these comparisons in optimized C++ code, which trims the run down to 3 to 4 minutes.
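To make the cost concrete, here is a brute-force sketch that scores every query against every choice and counts the comparisons. It uses the standard library's `difflib.SequenceMatcher` as a stand-in scorer, not rapidfuzz's algorithm, and the function name `best_matches` and the 0.86 cutoff (mirroring the notebook's threshold of 86) are illustrative assumptions.

```python
from difflib import SequenceMatcher


def best_matches(queries, choices, cutoff=0.86):
    """Brute-force fuzzy matching: score every query against every choice.

    Returns (matches, comparisons), where matches maps each query to its
    best-scoring choice, or None if no score reaches `cutoff`.
    """
    matches = {}
    comparisons = 0
    for query in queries:
        best_choice, best_score = None, cutoff
        for choice in choices:
            comparisons += 1
            score = SequenceMatcher(None, query, choice).ratio()  # 0.0 to 1.0
            if score >= best_score:
                best_choice, best_score = choice, score
        matches[query] = best_choice
    return matches, comparisons


queries = ["hawthorne high 4859 west el segundo blvd", "lawndale high 14901 s inglewood avenue"]
choices = [
    "hawthorne high 4859 west el segundo blvd",
    "lawndale high 14901 south inglewood ave",
    "leuzinger high 4118 west rosecrans ave",
]
matches, comparisons = best_matches(queries, choices)
print(comparisons)  # 6 (2 queries x 3 choices)
```

The comparison count is always `len(queries) * len(choices)`, which is why a faster scorer matters more than anything else at this scale.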


```python
import pandas as pd
import numpy as np
from datetime import datetime
from rapidfuzz import process, utils as fuzz_utils

GOOGLE_SHEET_URL = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vSWbwrsqF-c---4lfw0LZWymd-f8sy8sLYkXgzh0OyeGATWwrvv7V1Mq5BcApn7F_-WYKP1KXy5shKw/pub?gid=376323488&single=true&output=csv'
```

## 2.0 Format Go Pass Data

We will be extracting the following fields:

- District
- School name
- Address
- Phone
- Participating (set to `True` for every Go Pass record)

We create a simplified field called `name` from the school name and street address (a variant that appends the district is left commented out). This field is lowercase, with spaces replaced by `-` and periods and double quotes removed, to make matching easier.
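As a quick illustration, here is a plain-Python sketch of the same string steps that the pandas `.str` chain applies to each row. The helper names `simplify` and `make_key` are hypothetical; the notebook itself works on whole columns.

```python
def simplify(text: str) -> str:
    """Lowercase, replace spaces with '-', and drop '.' and '"' characters."""
    return text.lower().replace(" ", "-").replace(".", "").replace('"', "")


def make_key(name: str, address: str) -> str:
    """Build a 'name-address' matching key like the Go Pass `name` field."""
    return simplify(name) + "-" + simplify(address)


print(make_key("Hawthorne High", "4859 West El Segundo Blvd."))
# hawthorne-high-4859-west-el-segundo-blvd
print(make_key("R. K. Lloyde High", "4951 Marine Ave."))
# r-k-lloyde-high-4951-marine-ave
```

Because spaces become dashes before periods are dropped, `"R. K."` turns into `r-k` rather than `r--k`.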

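```python
# Select the Go Pass columns and rename them by name; `usecols` keeps the file's
# column order, so renaming by position could mislabel columns.
go_pass_schools = pd.read_csv(
    GOOGLE_SHEET_URL,
    usecols=['district_x', 'original_name_x', 'address_x', 'phone'],
).rename(columns={'district_x': 'district', 'original_name_x': 'name', 'address_x': 'address'})

go_pass_schools = go_pass_schools.fillna('')
go_pass_schools['original_name'] = go_pass_schools['name']
# Simplify: lowercase, spaces -> "-", drop "." and '"' (literal replacements, not regex)
simple_district = go_pass_schools['district'].str.lower().str.replace(" ", "-", regex=False).str.replace(".", "", regex=False).str.replace('"', '', regex=False)
# go_pass_schools['name'] = go_pass_schools['name'].str.lower().str.replace(" ","-").str.replace(".","").str.replace('"','')+"-"+simple_district
simple_address = go_pass_schools['address'].str.lower().str.replace(" ", "-", regex=False).str.replace(".", "", regex=False).str.replace('"', '', regex=False)
simple_name = go_pass_schools['name'].str.lower().str.replace(" ", "-", regex=False).str.replace(".", "", regex=False).str.replace('"', '', regex=False)
# Only rows that have a school name get the "name-address" matching key
go_pass_schools.loc[go_pass_schools['name'].str.len() > 1, 'name'] = simple_name + "-" + simple_address
go_pass_schools['participating'] = True
go_pass_schools
```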





| | district | name | address | phone | original_name | participating |
|---|---|---|---|---|---|---|
| 0 | Centinela Valley | hawthorne-high-4859-west-el-segundo-blvd | 4859 West El Segundo Blvd. | (310) 263-4400 | Hawthorne High | True |
| 1 | Centinela Valley | lawndale-high-14901-s-inglewood-avenue | 14901 S. Inglewood Avenue | (310) 263-3100 | Lawndale High | True |
| 2 | Centinela Valley | leuzinger-high-4118-west-rosecrans-ave | 4118 West Rosecrans Ave. | (310) 263-2200 | Leuzinger High | True |
| 3 | Centinela Valley | r-k-lloyde-high-4951-marine-ave | 4951 Marine Ave. | (310) 263-3264 | R. K. Lloyde High | True |
| 4 | Charter | alliance-alice-m-baxter-college-ready-high-sch... | 461 9th Street, San Pedro | (310) 221-0430 | Alliance Alice M. Baxter College-Ready High Sc... | True |
| ... | ... | ... | ... | ... | ... | ... |
| 1480 | Private | cathedral-high-school-1253-bishops-road | 1253 Bishops Road | (323) 441-3113 | Cathedral High School | True |
| 1481 | Private | episcopal-school-of-los-angeles-6361-santa-mon... | 6361 Santa Monica Blvd. | (323) 284-7266 | Episcopal School of Los Angeles | True |
| 1482 | Private | northpoint-school-9650-zelzah-ave | 9650 Zelzah Ave. | (818) 739-5231 | Northpoint School | True |
| 1483 | Santa Monica-Malibu | | | | | True |


### 2.1 Format California Schools data

We create a simplified field called `simple_name` from the school name (with the trailing district in parentheses removed) and the street address. It uses the same formatting as the Go Pass `name` field: lowercase, spaces replaced with `-`, and periods and double quotes removed, to make matching easier.
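Here is a per-string sketch of how a California key is built, reusing the notebook's regular expression for the trailing district. The helper names `simplify` and `california_key` are hypothetical; the final `"---"` collapse handles names such as `YouthBuild - Norwalk`, whose `" - "` turns into three dashes.

```python
import re

# The notebook's pattern for a trailing parenthetical district such as "(ABC Unified)"
DISTRICT_SUFFIX = re.compile(r"\([a-zA-Z\s\.\-\/]*\(*[a-zA-Z\s\.\-\/]*\)*[a-zA-Z\s\.\-\/]*\)$")


def simplify(text):
    """Lowercase, replace spaces with '-', and drop '.' and '"' characters."""
    return text.lower().replace(" ", "-").replace(".", "").replace('"', "")


def california_key(school, address):
    """Strip the district suffix, simplify, join with the address, collapse '---'."""
    name = DISTRICT_SUFFIX.sub("", school).strip()
    key = simplify(name) + "-" + simplify(address)
    return key.replace("---", "-")


print(california_key("Ross (Faye) Middle (ABC Unified)", "17707 Elaine Ave."))
# ross-(faye)-middle-17707-elaine-ave
print(california_key("YouthBuild - Norwalk (YouthBuild Charter Schools of Los Angeles)", "12124 Front St."))
# youthbuild-norwalk-12124-front-st
```

Only the last parenthetical is removed, so an inner one such as `(Faye)` survives into the key.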

#### Source for California schools data

https://www.cde.ca.gov/SchoolDirectory/ExportSelect?simpleSearch=N&address=&city=&counties=&districts=&cdscode=&charter=&magnet=&name=&nps=&search=2&zip=&yearround=&status=1%2C2&types=&order=1&multilingual=&qsc=3549&qdc=3549

August 2022


```python
california_schools_data = "../data/TapData.csv"

# Select the columns and rename `street` by name; `usecols` keeps the file's
# column order, so renaming by position could mislabel columns.
california_schools = pd.read_csv(
    california_schools_data,
    usecols=['rowid', 'school', 'district', 'street'],
).rename(columns={'street': 'address'})

la_schools_df = california_schools.fillna('')
# la_schools_df

la_schools_df['original_name'] = la_schools_df['school']
# Remove the trailing district in parentheses, e.g. "Palms Elementary (ABC Unified)" -> "Palms Elementary"
district_suffix = r'\([a-zA-Z\s\.\-\/]*\(*[a-zA-Z\s\.\-\/]*\)*[a-zA-Z\s\.\-\/]*\)$'
la_schools_df['name'] = la_schools_df['school'].str.replace(district_suffix, '', regex=True).str.strip()
# la_schools_df.loc[la_schools_df['district'] == "Los Angeles Unified", 'district'] = "LAUSD"
simple_district = la_schools_df['district'].str.lower().str.replace(" ", "-", regex=False).str.replace(".", "", regex=False).str.replace('"', '', regex=False)
simple_address = la_schools_df['address'].str.lower().str.replace(" ", "-", regex=False).str.replace(".", "", regex=False).str.replace('"', '', regex=False)
simple_name = la_schools_df['name'].str.lower().str.replace(" ", "-", regex=False).str.replace(".", "", regex=False).str.replace('"', '', regex=False)
la_schools_df['simple_name'] = simple_name + "-" + simple_address
# A " - " inside a name (e.g. "YouthBuild - Norwalk") becomes "---"; collapse it to "-"
la_schools_df['simple_name'] = la_schools_df['simple_name'].str.replace("---", "-", regex=False)

la_schools_df
```



| | rowid | school | district | address | original_name | name | simple_name |
|---|---|---|---|---|---|---|---|
| 0 | 1 | Palms Elementary (ABC Unified) | ABC Unified | 12445 East 207th St. | Palms Elementary (ABC Unified) | Palms Elementary | palms-elementary-12445-east-207th-st |
| 1 | 2 | Ross (Faye) Middle (ABC Unified) | ABC Unified | 17707 Elaine Ave. | Ross (Faye) Middle (ABC Unified) | Ross (Faye) Middle | ross-(faye)-middle-17707-elaine-ave |
| 2 | 3 | Stowers(Cecil B.) Elementary (ABC Unified) | ABC Unified | 13350 Beach St. | Stowers(Cecil B.) Elementary (ABC Unified) | Stowers(Cecil B.) Elementary | stowers(cecil-b)-elementary-13350-beach-st |
| 3 | 4 | Tetzlaff (Martin B.) Middle (ABC Unified) | ABC Unified | 12351 East Del Amo Blvd. | Tetzlaff (Martin B.) Middle (ABC Unified) | Tetzlaff (Martin B.) Middle | tetzlaff-(martin-b)-middle-12351-east-del-amo-... |
| 4 | 5 | Tracy (Wilbur) High (Continuation) (ABC Unified) | ABC Unified | 12222 Cuesta Dr. | Tracy (Wilbur) High (Continuation) (ABC Unified) | Tracy (Wilbur) High (Continuation) | tracy-(wilbur)-high-(continuation)-12222-cuest... |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 3146 | 3147 | YouthBuild - Norwalk (YouthBuild Charter Schoo... | YouthBuild Charter Schools of Los Angeles | 12124 Front St. | YouthBuild - Norwalk (YouthBuild Charter Schoo... | YouthBuild - Norwalk | youthbuild-norwalk-12124-front-st |
| 3147 | 3148 | YouthBuild - Palmdale (YouthBuild Charter Scho... | YouthBuild Charter Schools of Los Angeles | 38626 9th St. East | YouthBuild - Palmdale (YouthBuild Charter Scho... | YouthBuild - Palmdale | youthbuild-palmdale-38626-9th-st-east |
| 3148 | 3149 | YouthBuild - Pomona (YouthBuild Charter School... | YouthBuild Charter Schools of Los Angeles | 305 E Arrow Hwy. | YouthBuild - Pomona (YouthBuild Charter School... | YouthBuild - Pomona | youthbuild-pomona-305-e-arrow-hwy |
| 3149 | 3150 | YouthBuild - South LA (YouthBuild Charter Scho... | YouthBuild Charter Schools of Los Angeles | 400 West Washington Blvd. | YouthBuild - South LA (YouthBuild Charter Scho... | YouthBuild - South LA | youthbuild-south-la-400-west-washington-blvd |


```python
# unique_schools = go_pass_schools.drop_duplicates(subset=['original_name'])
# print(go_pass_schools.shape)
# print(unique_schools.shape)

# difference_in_dupes = go_pass_schools.shape[0] - unique_schools.shape[0]
# # these are the number of records with the same names
# print("Number of records with same names: \n "+str(difference_in_dupes))

nina_q = la_schools_df.loc[la_schools_df['simple_name'].str.contains('alliance')]
# nina_q = la_schools_df.query('simple_name.str.contains("alliance")')
# go_pass_schools.loc[]
for a in nina_q['simple_name']:
    print(a)
```

```text
alliance-leichtman-levine-family-foundation-environmental-science-high-2930-fletcher-dr
alliance-marc-&-eva-stern-math-and-science-5151-state-university-dr,-lot-2
alliance-margaret-m-bloomfield-technology-academy-high-7907-santa-fe-ave
alliance-marine-innovation-and-technology-6-12-complex-11933-allegheny-st
alliance-morgan-mckinzie-high-110-south-townsend-ave
alliance-ouchi-o'donovan-6-12-complex-5356-south-fifth-ave
alliance-patti-and-peter-neuwirth-leadership-academy-4610-south-main-st
alliance-piera-barbaglia-shaheen-health-services-academy-8515-kansas-ave
alliance-renee-and-meyer-luskin-academy-high-2941-west-70th-st
alliance-susan-and-eric-smidt-technology-high-211-south-avenue-20
alliance-ted-k-tajima-high-1552-w-rockwood-st
alliance-tennenbaum-family-technology-high-2050-north-san-fernando-rd
alliance-virgil-roberts-leadership-academy-2941-west-70th-st
alliance-cindy-and-bill-simon-technology-academy-high-10720-south-wilmington-ave
alliance-college-ready-middle-academy-12-131-e
```

## 3.0 Joining the `Original dataset` to the `California dataset`

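`fuzzy_merge` first uses rapidfuzz to find, for each Go Pass `name` key, the closest California `simple_name` key scoring at least `threshold` (86 by default), and then merges the two tables on that matched key.

The call below uses an `outer` merge, so every Go Pass record and every California record is kept, matched or not; the columns from the other dataset are left empty when there is no match. Use `left` to keep only the Go Pass records, `right` to keep only the California records, or `inner` to keep only the matched pairs.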

#### See the Pandas merge documentation for more information:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html
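The effect of each `how` option is easiest to see on two tiny frames that share a key column. The frames below are toy stand-ins for the two datasets, not real data; `indicator=True` adds pandas' `_merge` column, which reports whether each row came from the left side, the right side, or both.

```python
import pandas as pd

# Toy stand-ins for the two datasets, already reduced to a shared key
go_pass = pd.DataFrame({"key": ["a", "b", "c"], "phone": ["111", "222", "333"]})
california = pd.DataFrame({"key": ["b", "c", "d"], "rowid": [10, 11, 12]})

row_counts = {}
for how in ["left", "right", "inner", "outer"]:
    merged = go_pass.merge(california, on="key", how=how, indicator=True)
    row_counts[how] = len(merged)
    # `_merge` is one of left_only, right_only, or both for every row
    print(how, merged["_merge"].value_counts().to_dict())

print(row_counts)  # {'left': 3, 'right': 3, 'inner': 2, 'outer': 4}
```

Checking `_merge` on the real merged frame is a quick way to count how many Go Pass records found no California match.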

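```python
def fuzzy_merge(baseFrame, compareFrame, baseKey, compareKey, threshold=86, limit=1, how='left'):
    """Fuzzy-match baseFrame[baseKey] against compareFrame[compareKey], then merge on the best match."""
    # Work on a copy so the caller's DataFrame is not modified
    baseFrame = baseFrame.copy()

    # Map each original compare key to its normalized form
    # (lowercase, non-alphanumeric characters removed, whitespace trimmed)
    s_mapping = {x: fuzz_utils.default_process(x) for x in compareFrame[compareKey]}

    # With a dict of choices, process.extract returns (processed_value, score, original_key) tuples
    m1 = baseFrame[baseKey].apply(lambda x: process.extract(
        fuzz_utils.default_process(x), s_mapping, limit=limit, score_cutoff=threshold, processor=None
    ))
    baseFrame['Match'] = m1

    # Keep the original compare key; rows with no match above the threshold get NaN.
    # Note: the matched key is always written to 'name', which is also the merge key.
    m2 = baseFrame['Match'].apply(lambda x: ', '.join(i[2] for i in x))
    baseFrame['name'] = m2.replace("", np.nan)
    # baseFrame['school'] = m2['school']
    return baseFrame.merge(compareFrame, left_on='name', right_on=compareKey, how=how)

merged_df = fuzzy_merge(go_pass_schools, la_schools_df, 'name', 'simple_name', how='outer')
# merged_df = fuzzy_merge(go_pass_schools, la_schools_df, 'original_name', 'name',how='left')
# merged_df = fuzzy_merge(la_schools_df, go_pass_schools, 'name', 'original_name',how='right')
merged_df
```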


| | district_x | name_x | address_x | phone | original_name_x | participating | Match | rowid | school | district_y | address_y | original_name_y | name_y | simple_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Centinela Valley | hawthorne-high-4859-west-el-segundo-blvd | 4859 West El Segundo Blvd. | (310) 263-4400 | Hawthorne High | True | [(hawthorne high 4859 west el segundo blvd, 10... | 306.0 | Hawthorne High (Centinela Valley Union High) | Centinela Valley Union High | 4859 West El Segundo Blvd. | Hawthorne High (Centinela Valley Union High) | Hawthorne High | hawthorne-high-4859-west-el-segundo-blvd |
| 1 | Centinela Valley | lawndale-high-14901-south-inglewood-ave | 14901 S. Inglewood Avenue | (310) 263-3100 | Lawndale High | True | [(lawndale high 14901 south inglewood ave, 90.... | 307.0 | Lawndale High (Centinela Valley Union High) | Centinela Valley Union High | 14901 South Inglewood Ave. | Lawndale High (Centinela Valley Union High) | Lawndale High | lawndale-high-14901-south-inglewood-ave |
| 2 | Centinela Valley | leuzinger-high-4118-west-rosecrans-ave | 4118 West Rosecrans Ave. | (310) 263-2200 | Leuzinger High | True | [(leuzinger high 4118 west rosecrans ave, 100.... | 308.0 | Leuzinger High (Centinela Valley Union High) | Centinela Valley Union High | 4118 West Rosecrans Ave. | Leuzinger High (Centinela Valley Union High) | Leuzinger High | leuzinger-high-4118-west-rosecrans-ave |
| 3 | Centinela Valley | r-k-lloyde-high-4951-marine-ave | 4951 Marine Ave. | (310) 263-3264 | R. K. Lloyde High | True | [(r k lloyde high 4951 marine ave, 100.0, r-k-... | 310.0 | R. K. Lloyde High (Centinela Valley Union High) | Centinela Valley Union High | 4951 Marine Ave. | R. K. Lloyde High (Centinela Valley Union High) | R. K. Lloyde High | r-k-lloyde-high-4951-marine-ave |
| 4 | Charter | tracy-(wilbur)-high-(continuation)-12222-cuest... | 461 9th Street, San Pedro | (310) 221-0430 | Alliance Alice M. Baxter College-Ready High Sc... | True | [(tracy wilbur high continuation 12222 cue... | 5.0 | Tracy (Wilbur) High (Continuation) (ABC Unified) | ABC Unified | 12222 Cuesta Dr. | Tracy (Wilbur) High (Continuation) (ABC Unified) | Tracy (Wilbur) High (Continuation) | tracy-(wilbur)-high-(continuation)-12222-cuest... |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3668 | | | | | | | | 3139.0 | Monsenor Oscar Romero Charter Middle (Youth Po... | Youth Policy Institute (YPI) Charter Schools | 2670 West 11th St. | Monsenor Oscar Romero Charter Middle (Youth Po... | Monsenor Oscar Romero Charter Middle | monsenor-oscar-romero-charter-middle-2670-west... |
| 3669 | | | | | | | | 3140.0 | YouthBuild - Avalon (YouthBuild Charter School... | YouthBuild Charter Schools of Los Angeles | 920 S. Avalon Blvd. | YouthBuild - Avalon (YouthBuild Charter School... | YouthBuild - Avalon | youthbuild-avalon-920-s-avalon-blvd |
| 3670 | | | | | | | | 3145.0 | YouthBuild - Hollywood (YouthBuild Charter Sch... | YouthBuild Charter Schools of Los Angeles | 5941 Hollywood Boulevard | YouthBuild - Hollywood (YouthBuild Charter Sch... | YouthBuild - Hollywood | youthbuild-hollywood-5941-hollywood-boulevard |
| 3671 | | | | | | | | 3147.0 | YouthBuild - Norwalk (YouthBuild Charter Schoo... | YouthBuild Charter Schools of Los Angeles | 12124 Front St. | YouthBuild - Norwalk (YouthBuild Charter Schoo... | YouthBuild - Norwalk | youthbuild-norwalk-12124-front-st |
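`fuzzy_merge` boils down to two steps: pick the closest key above a cutoff, then run an ordinary exact merge on that key. The sketch below reproduces that shape with the standard library's `difflib.get_close_matches` standing in for rapidfuzz; its 0 to 1 similarity ratio is not the same measure as rapidfuzz's default `WRatio` 0 to 100 score, so the cutoffs are not interchangeable. `fuzzy_merge_difflib` and the `matched_key` column are hypothetical names, and this version writes the match to its own column instead of overwriting `name`.

```python
import difflib

import pandas as pd


def fuzzy_merge_difflib(base, compare, base_key, compare_key, cutoff=0.86, how="left"):
    """Match each base key to its closest compare key, then merge exactly on that match."""
    choices = compare[compare_key].tolist()

    def closest(key):
        # get_close_matches returns up to n choices with ratio >= cutoff, most similar first
        matches = difflib.get_close_matches(key, choices, n=1, cutoff=cutoff)
        return matches[0] if matches else None

    base = base.copy()  # leave the caller's DataFrame untouched
    base["matched_key"] = base[base_key].apply(closest)
    return base.merge(compare, left_on="matched_key", right_on=compare_key, how=how)


# Toy inputs shaped like the notebook's simplified keys
go_pass = pd.DataFrame({
    "name": ["hawthorne-high-4859-west-el-segundo-blvd", "northpoint-school-9650-zelzah-ave"],
    "phone": ["(310) 263-4400", "(818) 739-5231"],
})
california = pd.DataFrame({
    "simple_name": ["hawthorne-high-4859-west-el-segundo-blvd", "leuzinger-high-4118-west-rosecrans-ave"],
    "rowid": [306, 308],
})

merged = fuzzy_merge_difflib(go_pass, california, "name", "simple_name")
print(merged[["name", "matched_key", "rowid"]])
```

Keeping the matched key in a separate column makes it easy to audit questionable matches, such as a match whose address differs from the Go Pass address.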


## 4.0 Cleaning up of Merged Data

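The steps are as follows:
1. For Go Pass records with no California match (their missing `rowid` is first filled with `0`), set `school` to the original name followed by the district in parentheses, and copy the Go Pass district into `district`.
   - Note: you can build the name from the California district names instead by uncommenting the `1b` alternative line.
2. Flag every record whose original name appears more than once (`duped`). California-only rows have no Go Pass name, so they are flagged as well. The commented-out `full_name_some` lines are an alternative naming method that adds district names only to duplicated records.
3. Set the final `phone` column to default to the original data's `telephone` column. This step is currently commented out, because the merged data has no `telephone` or `default_phone` column.
4. Get the match score from rapidfuzz.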
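The cleanup steps above can be sketched on a toy frame (made-up values) with the same column names as `merged_df`. The `full_name_some` logic here is an assumption about how the commented-out lines would behave; it uses `district_x` for simplicity, whereas the notebook's commented-out version uses the California `district_y`.

```python
import numpy as np
import pandas as pd

# Toy merged frame: two Go Pass rows share a name, one has no California match
merged = pd.DataFrame({
    "original_name_x": ["School A", "School A", "School C"],
    "district_x": ["District A", "District B", "District C"],
    "rowid": [12.0, 40.0, np.nan],
    "school": ["School A (A Unified)", "School A (B Unified)", np.nan],
    "Match": [[("school a", 100.0, "key-1")], [("school a", 95.0, "key-2")], []],
})

# Step 1: unmatched rows get rowid 0 and "Original Name (District)" as their school name
merged["rowid"] = merged["rowid"].fillna(0).astype("int64")
unmatched = merged["rowid"] == 0
merged.loc[unmatched, "school"] = merged["original_name_x"] + " (" + merged["district_x"] + ")"

# Step 2: keep=False flags every copy of a repeated name, not just the later ones
merged["duped"] = merged.duplicated(["original_name_x"], keep=False)
# full_name_some-style name: add the district only where the name is duplicated
merged["full_name_some"] = merged["original_name_x"].where(
    ~merged["duped"], merged["original_name_x"] + " (" + merged["district_x"] + ")"
)

# Step 4: the score is the second element of the first (choice, score, key) tuple
merged["score"] = merged["Match"].apply(lambda m: m[0][1] if len(m) > 0 else np.nan)

print(merged[["rowid", "school", "duped", "full_name_some", "score"]])
```

Reading the score from the tuple, rather than splitting the list's string form on commas, keeps it numeric and does not depend on how the list is printed.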

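```python
# Unmatched Go Pass rows have no California rowid; use 0 as a sentinel so the column can be int64.
# Assigning the result back avoids chained `inplace=True`, which newer pandas versions warn about.
merged_df['rowid'] = merged_df['rowid'].fillna(0).astype('int64')
```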

```python
# 1. Default name processing
# merged_df['full_name'] = merged_df['original_name_x'] + ' (' + merged_df['district_x'] + ')'
# Go Pass rows without a California match (rowid == 0) get "Original Name (District)" as the school name
merged_df.loc[merged_df['rowid'] == 0, 'school'] = merged_df['original_name_x'] + ' (' + merged_df['district_x'] + ')'
merged_df.loc[merged_df['rowid'] == 0, 'district'] = merged_df['district_x']
# 1b. Alternative approach: uncomment below to use the California district names instead
# merged_df['full_name']  = merged_df['original_name_x'] + ' (' + merged_df['district_y'] + ')'

# 2. Alternative school name field processing
# Flag duplicated records (keep=False marks every copy, not just the later ones)
merged_df['duped'] = merged_df.duplicated(['original_name_x'], keep=False)

# `full_name_some` is the option where district names are added only to duplicated records.
# merged_df.loc[merged_df['duped'] == False, 'full_name_some'] = merged_df['original_name_x']
# merged_df.loc[merged_df['duped'] == True, 'full_name_some'] = merged_df['original_name_x'] + ' (' + merged_df['district_y'] + ')'

# 3. Phone number processing
# merged_df['phone'] = merged_df['telephone']

# If there are fewer than 4 characters in that column, fall back to California's phone number, i.e. the 'default_phone' column
# merged_df.loc[merged_df['phone'].str.len() < 4, 'phone'] = merged_df['default_phone']

# 4. Match score processing
# `Match` holds a list of (choice, score, key) tuples; it is NaN for California-only rows
merged_df['score'] = merged_df['Match'].apply(
    lambda m: m[0][1] if isinstance(m, list) and len(m) > 0 else np.nan
)

merged_df
```

| | district_x | name_x | address_x | phone | original_name_x | participating | Match | rowid | school | district_y | address_y | original_name_y | name_y | simple_name | district | duped | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Centinela Valley | hawthorne-high-4859-west-el-segundo-blvd | 4859 West El Segundo Blvd. | (310) 263-4400 | Hawthorne High | True | [(hawthorne high 4859 west el segundo blvd, 10... | 306 | Hawthorne High (Centinela Valley Union High) | Centinela Valley Union High | 4859 West El Segundo Blvd. | Hawthorne High (Centinela Valley Union High) | Hawthorne High | hawthorne-high-4859-west-el-segundo-blvd | | False | 100.0 |
| 1 | Centinela Valley | lawndale-high-14901-south-inglewood-ave | 14901 S. Inglewood Avenue | (310) 263-3100 | Lawndale High | True | [(lawndale high 14901 south inglewood ave, 90.... | 307 | Lawndale High (Centinela Valley Union High) | Centinela Valley Union High | 14901 South Inglewood Ave. | Lawndale High (Centinela Valley Union High) | Lawndale High | lawndale-high-14901-south-inglewood-ave | | False | 90.9090909090909 |
| 2 | Centinela Valley | leuzinger-high-4118-west-rosecrans-ave | 4118 West Rosecrans Ave. | (310) 263-2200 | Leuzinger High | True | [(leuzinger high 4118 west rosecrans ave, 100.... | 308 | Leuzinger High (Centinela Valley Union High) | Centinela Valley Union High | 4118 West Rosecrans Ave. | Leuzinger High (Centinela Valley Union High) | Leuzinger High | leuzinger-high-4118-west-rosecrans-ave | | False | 100.0 |
| 3 | Centinela Valley | r-k-lloyde-high-4951-marine-ave | 4951 Marine Ave. | (310) 263-3264 | R. K. Lloyde High | True | [(r k lloyde high 4951 marine ave, 100.0, r-k-... | 310 | R. K. Lloyde High (Centinela Valley Union High) | Centinela Valley Union High | 4951 Marine Ave. | R. K. Lloyde High (Centinela Valley Union High) | R. K. Lloyde High | r-k-lloyde-high-4951-marine-ave | | False | 100.0 |
| 4 | Charter | tracy-(wilbur)-high-(continuation)-12222-cuest... | 461 9th Street, San Pedro | (310) 221-0430 | Alliance Alice M. Baxter College-Ready High Sc... | True | [(tracy wilbur high continuation 12222 cue... | 5 | Tracy (Wilbur) High (Continuation) (ABC Unified) | ABC Unified | 12222 Cuesta Dr. | Tracy (Wilbur) High (Continuation) (ABC Unified) | Tracy (Wilbur) High (Continuation) | tracy-(wilbur)-high-(continuation)-12222-cuest... | | False | 85.5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3668 | | | | | | | | 3139 | Monsenor Oscar Romero Charter Middle (Youth Po... | Youth Policy Institute (YPI) Charter Schools | 2670 West 11th St. | Monsenor Oscar Romero Charter Middle (Youth Po... | Monsenor Oscar Romero Charter Middle | monsenor-oscar-romero-charter-middle-2670-west... | | True | |
| 3669 | | | | | | | | 3140 | YouthBuild - Avalon (YouthBuild Charter School... | YouthBuild Charter Schools of Los Angeles | 920 S. Avalon Blvd. | YouthBuild - Avalon (YouthBuild Charter School... | YouthBuild - Avalon | youthbuild-avalon-920-s-avalon-blvd | | True | |
| 3670 | | | | | | | | 3145 | YouthBuild - Hollywood (YouthBuild Charter Sch... | YouthBuild Charter Schools of Los Angeles | 5941 Hollywood Boulevard | YouthBuild - Hollywood (YouthBuild Charter Sch... | YouthBuild - Hollywood | youthbuild-hollywood-5941-hollywood-boulevard | | True | |
| 3671 | | | | | | | | 3147 | YouthBuild - Norwalk (YouthBuild Charter Schoo... | YouthBuild Charter Schools of Los Angeles | 12124 Front St. | YouthBuild - Norwalk (YouthBuild Charter Schoo... | YouthBuild - Norwalk | youthbuild-norwalk-12124-front-st | | True | |


## 5.0 Final Column Selection
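Here we select the final data columns for our outputs. Columns with an `_x` suffix come from the original Metro dataset and columns with a `_y` suffix come from the California schools dataset.

The selection in the next cell is currently commented out, so `final_df` keeps every column of `merged_df`. The commented-out selection would keep the columns below. Several of them (`oid`, `status`, `closed_date`, `full_name`, `full_name_some`, `city_x`, `city_y`, `email`, `website`, `latitude`, `longitude`, and `last_update`) are not produced by the current merge and would need to be added back before the selection can be re-enabled.

- `oid`
- `district_x`
- `district_y`
- `original_name_x`
- `original_name_y`
- `status`
- `closed_date`
- `full_name`
- `full_name_some`
- `participating`
- `address_x`
- `address_y`
- `city_x`
- `city_y`
- `score`
- `duped`
- `phone`
- `email`
- `website`
- `latitude`
- `longitude`
- `last_update`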
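If the selection is re-enabled before every listed column exists, indexing with `merged_df[[...]]` raises a `KeyError`. One option is a small helper that keeps only the columns that are present and reports the rest. `select_columns` is a hypothetical helper, and the frame below is a toy example.

```python
import pandas as pd


def select_columns(df, wanted, renames=None):
    """Return (selected_frame, missing_columns), keeping only requested columns that exist."""
    wanted = list(dict.fromkeys(wanted))  # drop repeated names, keep first-seen order
    present = [col for col in wanted if col in df.columns]
    missing = [col for col in wanted if col not in df.columns]
    selected = df[present].rename(columns=renames or {})
    return selected, missing


merged = pd.DataFrame({
    "district_x": ["Charter"],
    "original_name_x": ["School A"],
    "address_y": ["12 Example St."],
    "score": [100.0],
})
selected, missing = select_columns(
    merged,
    ["district_x", "original_name_x", "status", "address_y", "address_y", "score"],
    renames={"original_name_x": "school_name"},
)
print(list(selected.columns))  # ['district_x', 'school_name', 'address_y', 'score']
print(missing)                 # ['status']
```

Printing `missing` before writing the outputs makes a silently dropped column visible instead of hidden.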

```python
# final_columns = {
#     "full_name":"school_name",
#     "full_name_some":"school_name_with_some_districts_attached"
# }

# final_df = merged_df[["oid","district_x","district_y","original_name_x","original_name_y","status",'closed_date',
#        "full_name","full_name_some","participating","address_x","address_y","city_x","city_y","score","duped",'phone', 'email',
#        'website', 'latitude', 'longitude', 'last_update']]
final_df = merged_df.copy()  # copy so the index reset below does not modify merged_df

# final_df["address"] = final_df["address_x"]

# final_df.rename(inplace=True, columns=final_columns)
final_df.reset_index(inplace=True)  # keeps the old row index as an `index` column
final_df.index.names = ['id']

final_df
```

| id | index | district_x | name_x | address_x | phone | original_name_x | participating | Match | rowid | school | district_y | address_y | original_name_y | name_y | simple_name | district | duped | score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Centinela Valley | hawthorne-high-4859-west-el-segundo-blvd | 4859 West El Segundo Blvd. | (310) 263-4400 | Hawthorne High | True | [(hawthorne high 4859 west el segundo blvd, 10... | 306 | Hawthorne High (Centinela Valley Union High) | Centinela Valley Union High | 4859 West El Segundo Blvd. | Hawthorne High (Centinela Valley Union High) | Hawthorne High | hawthorne-high-4859-west-el-segundo-blvd | | False | 100.0 |
| 1 | 1 | Centinela Valley | lawndale-high-14901-south-inglewood-ave | 14901 S. Inglewood Avenue | (310) 263-3100 | Lawndale High | True | [(lawndale high 14901 south inglewood ave, 90.... | 307 | Lawndale High (Centinela Valley Union High) | Centinela Valley Union High | 14901 South Inglewood Ave. | Lawndale High (Centinela Valley Union High) | Lawndale High | lawndale-high-14901-south-inglewood-ave | | False | 90.9090909090909 |
| 2 | 2 | Centinela Valley | leuzinger-high-4118-west-rosecrans-ave | 4118 West Rosecrans Ave. | (310) 263-2200 | Leuzinger High | True | [(leuzinger high 4118 west rosecrans ave, 100.... | 308 | Leuzinger High (Centinela Valley Union High) | Centinela Valley Union High | 4118 West Rosecrans Ave. | Leuzinger High (Centinela Valley Union High) | Leuzinger High | leuzinger-high-4118-west-rosecrans-ave | | False | 100.0 |
| 3 | 3 | Centinela Valley | r-k-lloyde-high-4951-marine-ave | 4951 Marine Ave. | (310) 263-3264 | R. K. Lloyde High | True | [(r k lloyde high 4951 marine ave, 100.0, r-k-... | 310 | R. K. Lloyde High (Centinela Valley Union High) | Centinela Valley Union High | 4951 Marine Ave. | R. K. Lloyde High (Centinela Valley Union High) | R. K. Lloyde High | r-k-lloyde-high-4951-marine-ave | | False | 100.0 |
| 4 | 4 | Charter | tracy-(wilbur)-high-(continuation)-12222-cuest... | 461 9th Street, San Pedro | (310) 221-0430 | Alliance Alice M. Baxter College-Ready High Sc... | True | [(tracy wilbur high continuation 12222 cue... | 5 | Tracy (Wilbur) High (Continuation) (ABC Unified) | ABC Unified | 12222 Cuesta Dr. | Tracy (Wilbur) High (Continuation) (ABC Unified) | Tracy (Wilbur) High (Continuation) | tracy-(wilbur)-high-(continuation)-12222-cuest... | | False | 85.5 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3668 | 3668 | | | | | | | | 3139 | Monsenor Oscar Romero Charter Middle (Youth Po... | Youth Policy Institute (YPI) Charter Schools | 2670 West 11th St. | Monsenor Oscar Romero Charter Middle (Youth Po... | Monsenor Oscar Romero Charter Middle | monsenor-oscar-romero-charter-middle-2670-west... | | True | |
| 3669 | 3669 | | | | | | | | 3140 | YouthBuild - Avalon (YouthBuild Charter School... | YouthBuild Charter Schools of Los Angeles | 920 S. Avalon Blvd. | YouthBuild - Avalon (YouthBuild Charter School... | YouthBuild - Avalon | youthbuild-avalon-920-s-avalon-blvd | | True | |
| 3670 | 3670 | | | | | | | | 3145 | YouthBuild - Hollywood (YouthBuild Charter Sch... | YouthBuild Charter Schools of Los Angeles | 5941 Hollywood Boulevard | YouthBuild - Hollywood (YouthBuild Charter Sch... | YouthBuild - Hollywood | youthbuild-hollywood-5941-hollywood-boulevard | | True | |
| 3671 | 3671 | | | | | | | | 3147 | YouthBuild - Norwalk (YouthBuild Charter Schoo... | YouthBuild Charter Schools of Los Angeles | 12124 Front St. | YouthBuild - Norwalk (YouthBuild Charter Schoo... | YouthBuild - Norwalk | youthbuild-norwalk-12124-front-st | | True | |


```python
final_df.columns
```

```text
Index(['index', 'district_x', 'name_x', 'address_x', 'phone',
       'original_name_x', 'participating', 'Match', 'rowid', 'school',
       'district_y', 'address_y', 'original_name_y', 'name_y', 'simple_name',
       'district', 'duped', 'score'],
      dtype='object')
```

## 6.0 Final Output
We write the merged data to a CSV file in the `data` directory, stamping the file name with the current year and month (for example, `2022-08`).

We also write a JSON copy with `to_json(orient='records')`, which produces one JSON object per row. More info here:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html
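A minimal sketch of the two output formats, written to a temporary directory so it does not touch the notebook's real paths. The file names and the toy frame are assumptions; reading the files back shows that the CSV keeps the `id` index as its first column while the records JSON leaves the index out.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"school": ["School A", "School B"], "score": [100.0, 90.5]})
df.index.name = "id"

stamp = datetime.now().strftime("%Y-%m")  # year-month, e.g. "2022-08"
with tempfile.TemporaryDirectory() as out_dir:
    csv_path = Path(out_dir) / f"schools_{stamp}.csv"
    json_path = Path(out_dir) / "schools.json"

    df.to_csv(csv_path)                      # index written as the "id" column
    df.to_json(json_path, orient="records")  # one object per row; index not included

    records = json.loads(json_path.read_text())
    header = csv_path.read_text().splitlines()[0]

print(records)  # [{'school': 'School A', 'score': 100.0}, {'school': 'School B', 'score': 90.5}]
print(header)   # id,school,score
```

Because the records JSON drops the index, any identifier the front end needs has to live in a regular column, which is what the `index` column from `reset_index` provides in the notebook's output.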

```python
today = datetime.now().strftime("%Y-%m")  # year-month stamp, e.g. "2022-08"
outfile_extension = ".csv"
output_file_name = "../data/tap_data_merged_with_metro_california_qc_data_right_join_" + today + outfile_extension

final_df.to_csv(output_file_name)

# Create a JSON file oriented by records (one object per row).
# The records orientation never includes the index, and newer pandas versions
# raise an error if index=True is passed with it, so the flag is omitted.
output_json = "../src/data/schools_right_join.json"
final_df.to_json(output_json, orient='records')
```
