# Jupyter Notebook for converting the Go Pass Data


## 1.0 Load in modules

We are using pandas and the numpy libraries to read and process the data.

Then we will use the `rapidfuzz` library to caclulate close matches to the strings so that we can join the information.

Using other popular fuzzy-string matching libraries like `fuzzball` takes over 3 hours to process as the alogrithim used takes the number of records to the `n`th power!! (i.e. comparing 1000 records processes 1000^1000!!!) `rapidfuzz` has a better algorithm that trims this town to 3 - 4 minutes!


In [62]:
import pandas as pd
import numpy as np
from datetime import datetime
from rapidfuzz import process, utils as fuzz_utils

GOOGLE_SHEET_URL = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vSWbwrsqF-c---4lfw0LZWymd-f8sy8sLYkXgzh0OyeGATWwrvv7V1Mq5BcApn7F_-WYKP1KXy5shKw/pub?gid=376323488&single=true&output=csv'

## 2.0 Format Go Pass Data

We will be extracting the following fields:

- District
- School
- Address
- City
- Participating

We create a simplfied field called `name` based on the `street address`, `school name`, and `district`. This field is formatted to be all lowercase with spaces replaced with `-` to make matching easier.

In [63]:
go_pass_schools = pd.read_csv(GOOGLE_SHEET_URL,
    usecols={'oid','district_x','original_name_x','address_x','phone'})
go_pass_schools.columns = ["california_oid","district","name","address","phone"]

go_pass_schools = go_pass_schools.fillna('')
go_pass_schools['original_name'] = go_pass_schools['name']
simple_district = go_pass_schools['district'].str.lower().str.replace(" ","-").str.replace(".","").str.replace('"','')
simple_address = go_pass_schools['address'].str.lower().str.replace(" ","-").str.replace(".","").str.replace('"','').str.split().str.get(0)
simple_name = go_pass_schools['name'].str.lower().str.replace("[[:punct:]]","-",regex=True).str.replace(" ","-").str.replace(".","").str.replace('"','')

key =  simple_address+"-"+simple_name
go_pass_schools.loc[(go_pass_schools['name'].str.len() > 1) & (go_pass_schools['address'].str.len() > 1),'devons_key'] = key
go_pass_schools.loc[(go_pass_schools['name'].str.len() > 1) & (go_pass_schools['address'].str.len()  == 0),'devons_key'] = simple_name
go_pass_schools.loc[(go_pass_schools['name'].str.len() > 1) & (go_pass_schools['address'].str.len() > 1),'name'] = key
go_pass_schools.loc[(go_pass_schools['name'].str.len() > 1) & (go_pass_schools['address'].str.len()  == 0),'name'] = simple_name

# go_pass_schools['name'].fillna('')

go_pass_schools['california_oid'] = go_pass_schools['california_oid'].astype(str).str.replace(".0","")
go_pass_schools['participating'] = True
go_pass_schools

  simple_district = go_pass_schools['district'].str.lower().str.replace(" ","-").str.replace(".","").str.replace('"','')
  simple_address = go_pass_schools['address'].str.lower().str.replace(" ","-").str.replace(".","").str.replace('"','').str.split().str.get(0)
  simple_name = go_pass_schools['name'].str.lower().str.replace("[[:punct:]]","-",regex=True).str.replace(" ","-").str.replace(".","").str.replace('"','')
  go_pass_schools['california_oid'] = go_pass_schools['california_oid'].astype(str).str.replace(".0","")


Unnamed: 0,california_oid,district,name,address,phone,original_name,devons_key,participating
0,19643521933951,Centinela Valley,4859-west-el-segundo-blvd-hawthorne-high,4859 West El Segundo Blvd.,(310) 263-4400,Hawthorne High,4859-west-el-segundo-blvd-hawthorne-high,True
1,19643521934926,Centinela Valley,14901-s-inglewood-avenue-lawndale-high,14901 S. Inglewood Avenue,(310) 263-3100,Lawndale High,14901-s-inglewood-avenue-lawndale-high,True
2,196435219348,Centinela Valley,4118-west-rosecrans-ave-leuzinger-high,4118 West Rosecrans Ave.,(310) 263-2200,Leuzinger High,4118-west-rosecrans-ave-leuzinger-high,True
3,196435219239,Centinela Valley,4951-marine-ave-r-k-lloyde-high,4951 Marine Ave.,(310) 263-3264,R. K. Lloyde High,4951-marine-ave-r-k-lloyde-high,True
4,,Charter,"461-9th-street,-san-pedro-alliance-alice-m-bax...","461 9th Street, San Pedro",(310) 221-0430,Alliance Alice M. Baxter College-Ready High Sc...,"461-9th-street,-san-pedro-alliance-alice-m-bax...",True
...,...,...,...,...,...,...,...,...
1480,19647336934442,Private,1253-bishops-road-cathedral-high-school,1253 Bishops Road,(323) 441-3113,Cathedral High School,1253-bishops-road-cathedral-high-school,True
1481,19647336145999,Private,6361-santa-monica-blvd-episcopal-school-of-los...,6361 Santa Monica Blvd.,(323) 284-7266,Episcopal School of Los Angeles,6361-santa-monica-blvd-episcopal-school-of-los...,True
1482,,Private,9650-zelzah-ave-northpoint-school,9650 Zelzah Ave.,(818) 739-5231,Northpoint School,9650-zelzah-ave-northpoint-school,True
1483,,Santa Monica-Malibu,,,,,,True


### 2.1 Format California Schools data

We create a simplfied field called `simple name` based on the `street address`, `school name`, and `district`. This field is formatted to be all lowercase with spaces replaced with `-` to make matching easier.

#### Source for California schools data

https://www.cde.ca.gov/SchoolDirectory/ExportSelect?simpleSearch=N&address=&city=&counties=&districts=&cdscode=&charter=&magnet=&name=&nps=&search=2&zip=&yearround=&status=1%2C2&types=&order=1&multilingual=&qsc=3549&qdc=3549

August 2022


In [64]:
california_schools_data = "../data/TapData.csv"

california_schools = pd.read_csv(california_schools_data,
         usecols={'rowid','school','district','street'})
california_schools.columns = ['rowid',"school","district","address"]

la_schools_df = california_schools.fillna('')
# la_schools_df

la_schools_df['original_name'] = la_schools_df['school']
name = la_schools_df['school'].str.replace("[[:punct:]]","-",regex=True).replace({'\([a-zA-Z\s\.\-\/]*\(*[a-zA-Z\s\.\-\/]*\)*[a-zA-Z\s\.\-\/]*\)$'},'',regex=True)
la_schools_df['name'] = name
# la_schools_df.loc[la_schools_df['district'] == "Los Angeles Unified", 'district'] = "LAUSD"
simple_district = la_schools_df['district'].str.lower().str.replace(" ","-").str.replace(".","").str.replace('"','')
simple_address = la_schools_df['address'].str.lower().str.replace(" ","-").str.replace(".","").str.replace('"','')
simple_name = la_schools_df['name'].str.lower().str.replace(" ","-").str.replace(".","").str.replace('"','').str.split().str.get(0)
key = simple_address+"-"+simple_name
clean_key = key.str.replace("---","-")
la_schools_df['tap_key'] = clean_key
la_schools_df['simple_name'] = clean_key

la_schools_df

  simple_district = la_schools_df['district'].str.lower().str.replace(" ","-").str.replace(".","").str.replace('"','')
  simple_address = la_schools_df['address'].str.lower().str.replace(" ","-").str.replace(".","").str.replace('"','')
  simple_name = la_schools_df['name'].str.lower().str.replace(" ","-").str.replace(".","").str.replace('"','').str.split().str.get(0)


Unnamed: 0,rowid,school,district,address,original_name,name,tap_key,simple_name
0,1,Palms Elementary (ABC Unified),ABC Unified,12445 East 207th St.,Palms Elementary (ABC Unified),Palms Elementary,12445-east-207th-st-palms-elementary-,12445-east-207th-st-palms-elementary-
1,2,Ross (Faye) Middle (ABC Unified),ABC Unified,17707 Elaine Ave.,Ross (Faye) Middle (ABC Unified),Ross (Faye) Middle,17707-elaine-ave-ross-(faye)-middle-,17707-elaine-ave-ross-(faye)-middle-
2,3,Stowers(Cecil B.) Elementary (ABC Unified),ABC Unified,13350 Beach St.,Stowers(Cecil B.) Elementary (ABC Unified),Stowers(Cecil B.) Elementary,13350-beach-st-stowers(cecil-b)-elementary-,13350-beach-st-stowers(cecil-b)-elementary-
3,4,Tetzlaff (Martin B.) Middle (ABC Unified),ABC Unified,12351 East Del Amo Blvd.,Tetzlaff (Martin B.) Middle (ABC Unified),Tetzlaff (Martin B.) Middle,12351-east-del-amo-blvd-tetzlaff-(martin-b)-mi...,12351-east-del-amo-blvd-tetzlaff-(martin-b)-mi...
4,5,Tracy (Wilbur) High (Continuation) (ABC Unified),ABC Unified,12222 Cuesta Dr.,Tracy (Wilbur) High (Continuation) (ABC Unified),Tracy (Wilbur) High (Continuation),12222-cuesta-dr-tracy-(wilbur)-high-(continuat...,12222-cuesta-dr-tracy-(wilbur)-high-(continuat...
...,...,...,...,...,...,...,...,...
3146,3147,YouthBuild - Norwalk (YouthBuild Charter Schoo...,YouthBuild Charter Schools of Los Angeles,12124 Front St.,YouthBuild - Norwalk (YouthBuild Charter Schoo...,YouthBuild - Norwalk,12124-front-st-youthbuild-norwalk-,12124-front-st-youthbuild-norwalk-
3147,3148,YouthBuild - Palmdale (YouthBuild Charter Scho...,YouthBuild Charter Schools of Los Angeles,38626 9th St. East,YouthBuild - Palmdale (YouthBuild Charter Scho...,YouthBuild - Palmdale,38626-9th-st-east-youthbuild-palmdale-,38626-9th-st-east-youthbuild-palmdale-
3148,3149,YouthBuild - Pomona (YouthBuild Charter School...,YouthBuild Charter Schools of Los Angeles,305 E Arrow Hwy.,YouthBuild - Pomona (YouthBuild Charter School...,YouthBuild - Pomona,305-e-arrow-hwy-youthbuild-pomona-,305-e-arrow-hwy-youthbuild-pomona-
3149,3150,YouthBuild - South LA (YouthBuild Charter Scho...,YouthBuild Charter Schools of Los Angeles,400 West Washington Blvd.,YouthBuild - South LA (YouthBuild Charter Scho...,YouthBuild - South LA,400-west-washington-blvd-youthbuild-south-la-,400-west-washington-blvd-youthbuild-south-la-


In [65]:
# unique_schools = go_pass_schools.drop_duplicates(subset=['original_name'])
# print(go_pass_schools.shape)
# print(unique_schools.shape)

# difference_in_dupes = go_pass_schools.shape[0] - unique_schools.shape[0]
# # these are the number of records with the same names
# print("Number of records with same names: \n "+str(difference_in_dupes))

# go_pass_schools.loc[]


## 3.0 Joining the `Original dataset` to the `California dataset`

Here we do a blanket `left` merge where the original records get data added to it.

If we want to keep the California data, then we need to switch this merge type to `inner` or `right`.

#### See the Pandas merge documentation for more information:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html

In [66]:
def fuzzy_merge(baseFrame, compareFrame, baseKey, compareKey, threshold=86, limit=1, how='left'):
    s_mapping = {x: fuzz_utils.default_process(x) for x in compareFrame[compareKey]}

    m1 = baseFrame[baseKey].apply(lambda x: process.extract(
      fuzz_utils.default_process(x), s_mapping, limit=limit, score_cutoff=threshold, processor=None
    ))
    baseFrame['Match'] = m1

    m2 = baseFrame['Match'].apply(lambda x: ', '.join(i[2] for i in x))
    baseFrame['name'] = m2.replace("",np.nan)
    # baseFrame['school'] = m2['school']
    return baseFrame.merge(compareFrame, left_on=baseKey, right_on=compareKey, how=how)

merged_df = fuzzy_merge(go_pass_schools, la_schools_df, 'name', 'simple_name',how='outer')
# merged_df = fuzzy_merge(go_pass_schools, la_schools_df, 'original_name', 'name',how='left')
# merged_df = fuzzy_merge(la_schools_df, go_pass_schools, 'name', 'original_name',how='right')
merged_df


Unnamed: 0,california_oid,district_x,name_x,address_x,phone,original_name_x,devons_key,participating,Match,rowid,school,district_y,address_y,original_name_y,name_y,tap_key,simple_name
0,19643521933951,Centinela Valley,4859-west-el-segundo-blvd-hawthorne-high-,4859 West El Segundo Blvd.,(310) 263-4400,Hawthorne High,4859-west-el-segundo-blvd-hawthorne-high,True,"[(4859 west el segundo blvd hawthorne high, 10...",306.0,Hawthorne High (Centinela Valley Union High),Centinela Valley Union High,4859 West El Segundo Blvd.,Hawthorne High (Centinela Valley Union High),Hawthorne High,4859-west-el-segundo-blvd-hawthorne-high-,4859-west-el-segundo-blvd-hawthorne-high-
1,19643521934926,Centinela Valley,14901-south-inglewood-ave-lawndale-high-,14901 S. Inglewood Avenue,(310) 263-3100,Lawndale High,14901-s-inglewood-avenue-lawndale-high,True,"[(14901 south inglewood ave lawndale high, 90....",307.0,Lawndale High (Centinela Valley Union High),Centinela Valley Union High,14901 South Inglewood Ave.,Lawndale High (Centinela Valley Union High),Lawndale High,14901-south-inglewood-ave-lawndale-high-,14901-south-inglewood-ave-lawndale-high-
2,196435219348,Centinela Valley,4118-west-rosecrans-ave-leuzinger-high-,4118 West Rosecrans Ave.,(310) 263-2200,Leuzinger High,4118-west-rosecrans-ave-leuzinger-high,True,"[(4118 west rosecrans ave leuzinger high, 100....",308.0,Leuzinger High (Centinela Valley Union High),Centinela Valley Union High,4118 West Rosecrans Ave.,Leuzinger High (Centinela Valley Union High),Leuzinger High,4118-west-rosecrans-ave-leuzinger-high-,4118-west-rosecrans-ave-leuzinger-high-
3,196435219239,Centinela Valley,4951-marine-ave-r-k-lloyde-high-,4951 Marine Ave.,(310) 263-3264,R. K. Lloyde High,4951-marine-ave-r-k-lloyde-high,True,"[(4951 marine ave r k lloyde high, 100.0, 4951...",310.0,R. K. Lloyde High (Centinela Valley Union High),Centinela Valley Union High,4951 Marine Ave.,R. K. Lloyde High (Centinela Valley Union High),R. K. Lloyde High,4951-marine-ave-r-k-lloyde-high-,4951-marine-ave-r-k-lloyde-high-
4,,Charter,,"461 9th Street, San Pedro",(310) 221-0430,Alliance Alice M. Baxter College-Ready High Sc...,"461-9th-street,-san-pedro-alliance-alice-m-bax...",True,[],,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3703,,,,,,,,,,3138.0,Bert Corona Charter High (Youth Policy Institu...,Youth Policy Institute (YPI) Charter Schools,12513 Gain St.,Bert Corona Charter High (Youth Policy Institu...,Bert Corona Charter High,12513-gain-st-bert-corona-charter-high-,12513-gain-st-bert-corona-charter-high-
3704,,,,,,,,,,3139.0,Monsenor Oscar Romero Charter Middle (Youth Po...,Youth Policy Institute (YPI) Charter Schools,2670 West 11th St.,Monsenor Oscar Romero Charter Middle (Youth Po...,Monsenor Oscar Romero Charter Middle,2670-west-11th-st-monsenor-oscar-romero-charte...,2670-west-11th-st-monsenor-oscar-romero-charte...
3705,,,,,,,,,,3145.0,YouthBuild - Hollywood (YouthBuild Charter Sch...,YouthBuild Charter Schools of Los Angeles,5941 Hollywood Boulevard,YouthBuild - Hollywood (YouthBuild Charter Sch...,YouthBuild - Hollywood,5941-hollywood-boulevard-youthbuild-hollywood-,5941-hollywood-boulevard-youthbuild-hollywood-
3706,,,,,,,,,,3147.0,YouthBuild - Norwalk (YouthBuild Charter Schoo...,YouthBuild Charter Schools of Los Angeles,12124 Front St.,YouthBuild - Norwalk (YouthBuild Charter Schoo...,YouthBuild - Norwalk,12124-front-st-youthbuild-norwalk-,12124-front-st-youthbuild-norwalk-


## 4.0 Cleaning up of Merged Data

The steps are as follows:
1. Assign the full name the original name and the district
   - Note: you can also use the California district names instead by uncommenting the `Altenative` line
2. Process alternative method of naming where district names are only added if there are duplicates
3. Set the final `phone` column to default to the original data's `telephone` column
4. Get the match score from rapidfuzz

In [67]:
merged_df['rowid'].fillna(0, inplace=True)
merged_df['rowid'] = merged_df['rowid'].astype('int64')

In [68]:
# 1. Default name processing
# merged_df['full_name'] = merged_df['original_name_x'] + ' (' + merged_df['district_x'] + ')'
merged_df.loc[merged_df['rowid'] == 0, 'school'] = merged_df['original_name_x'] + ' (' + merged_df['district_x'] + ')'
merged_df.loc[merged_df['rowid'] == 0, 'district'] = merged_df['district_x']
# 1b. alternative approach: uncomment below to use the California district names instead
# merged_df['full_name']  = merged_df['original_name_x'] + ' (' + merged_df['district_y'] + ')'

# 2. Alternative school name field processing
# Flag duplicated records
merged_df['duped'] = merged_df.duplicated(['original_name_x'],keep=False)

# `full_name_some` is the option where there are district names for only duplicated records.
# merged_df.loc[merged_df['duped'] == False, 'full_name_some'] = merged_df['original_name_x']
# merged_df.loc[merged_df['duped'] == True, 'full_name_some'] = merged_df['original_name_x'] + ' (' + merged_df['district_y'] + ')'

# 3. Phone number processing
# merged_df['phone'] = merged_df['telephone']

# if there is less than 4 characters in that column then set it to the California's phone numbers i.e. 'default phone' column
# merged_df.loc[merged_df['phone'].str.len() < 4, 'phone'] = merged_df['default_phone']

# 4. Match score processing
merged_df['score'] = merged_df['Match'].astype('string').str.split(",").str[1]

merged_df

Unnamed: 0,california_oid,district_x,name_x,address_x,phone,original_name_x,devons_key,participating,Match,rowid,school,district_y,address_y,original_name_y,name_y,tap_key,simple_name,district,duped,score
0,19643521933951,Centinela Valley,4859-west-el-segundo-blvd-hawthorne-high-,4859 West El Segundo Blvd.,(310) 263-4400,Hawthorne High,4859-west-el-segundo-blvd-hawthorne-high,True,"[(4859 west el segundo blvd hawthorne high, 10...",306,Hawthorne High (Centinela Valley Union High),Centinela Valley Union High,4859 West El Segundo Blvd.,Hawthorne High (Centinela Valley Union High),Hawthorne High,4859-west-el-segundo-blvd-hawthorne-high-,4859-west-el-segundo-blvd-hawthorne-high-,,False,100.0
1,19643521934926,Centinela Valley,14901-south-inglewood-ave-lawndale-high-,14901 S. Inglewood Avenue,(310) 263-3100,Lawndale High,14901-s-inglewood-avenue-lawndale-high,True,"[(14901 south inglewood ave lawndale high, 90....",307,Lawndale High (Centinela Valley Union High),Centinela Valley Union High,14901 South Inglewood Ave.,Lawndale High (Centinela Valley Union High),Lawndale High,14901-south-inglewood-ave-lawndale-high-,14901-south-inglewood-ave-lawndale-high-,,False,90.9090909090909
2,196435219348,Centinela Valley,4118-west-rosecrans-ave-leuzinger-high-,4118 West Rosecrans Ave.,(310) 263-2200,Leuzinger High,4118-west-rosecrans-ave-leuzinger-high,True,"[(4118 west rosecrans ave leuzinger high, 100....",308,Leuzinger High (Centinela Valley Union High),Centinela Valley Union High,4118 West Rosecrans Ave.,Leuzinger High (Centinela Valley Union High),Leuzinger High,4118-west-rosecrans-ave-leuzinger-high-,4118-west-rosecrans-ave-leuzinger-high-,,False,100.0
3,196435219239,Centinela Valley,4951-marine-ave-r-k-lloyde-high-,4951 Marine Ave.,(310) 263-3264,R. K. Lloyde High,4951-marine-ave-r-k-lloyde-high,True,"[(4951 marine ave r k lloyde high, 100.0, 4951...",310,R. K. Lloyde High (Centinela Valley Union High),Centinela Valley Union High,4951 Marine Ave.,R. K. Lloyde High (Centinela Valley Union High),R. K. Lloyde High,4951-marine-ave-r-k-lloyde-high-,4951-marine-ave-r-k-lloyde-high-,,False,100.0
4,,Charter,,"461 9th Street, San Pedro",(310) 221-0430,Alliance Alice M. Baxter College-Ready High Sc...,"461-9th-street,-san-pedro-alliance-alice-m-bax...",True,[],0,Alliance Alice M. Baxter College-Ready High Sc...,,,,,,,Charter,False,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3703,,,,,,,,,,3138,Bert Corona Charter High (Youth Policy Institu...,Youth Policy Institute (YPI) Charter Schools,12513 Gain St.,Bert Corona Charter High (Youth Policy Institu...,Bert Corona Charter High,12513-gain-st-bert-corona-charter-high-,12513-gain-st-bert-corona-charter-high-,,True,
3704,,,,,,,,,,3139,Monsenor Oscar Romero Charter Middle (Youth Po...,Youth Policy Institute (YPI) Charter Schools,2670 West 11th St.,Monsenor Oscar Romero Charter Middle (Youth Po...,Monsenor Oscar Romero Charter Middle,2670-west-11th-st-monsenor-oscar-romero-charte...,2670-west-11th-st-monsenor-oscar-romero-charte...,,True,
3705,,,,,,,,,,3145,YouthBuild - Hollywood (YouthBuild Charter Sch...,YouthBuild Charter Schools of Los Angeles,5941 Hollywood Boulevard,YouthBuild - Hollywood (YouthBuild Charter Sch...,YouthBuild - Hollywood,5941-hollywood-boulevard-youthbuild-hollywood-,5941-hollywood-boulevard-youthbuild-hollywood-,,True,
3706,,,,,,,,,,3147,YouthBuild - Norwalk (YouthBuild Charter Schoo...,YouthBuild Charter Schools of Los Angeles,12124 Front St.,YouthBuild - Norwalk (YouthBuild Charter Schoo...,YouthBuild - Norwalk,12124-front-st-youthbuild-norwalk-,12124-front-st-youthbuild-norwalk-,,True,


## 5.0 Final Column Selection
Here we select the final data columns for our outputs, with `_x` suffixes representing the original Metro dataset and `_y` suffixes representing the California schools dataset.

- "district_x"
- "district_y"
- "original_name_x"
- "original_name_y"
- "status"
- 'closed_date'
- "full_name"
- "full_name_some"
- "participating"
- "address_x"
- "address_y"
- "city_x"
- "city_y"
- 'address_y'
- 'city_y'
- "score"
- "duped"
- 'phone'
- 'email'
- 'website'
- 'latitude'
- 'longitude'
- 'last_update'

In [69]:
# final_columns = {
#     "full_name":"school_name",
#     "full_name_some":"school_name_with_some_districts_attached"
# }

# final_df = merged_df[["oid","district_x","district_y","original_name_x","original_name_y","status",'closed_date',
#        "full_name","full_name_some","participating","address_x","address_y","city_x","city_y","score","duped",'phone', 'email',
#        'website', 'latitude', 'longitude', 'last_update']]
final_df = merged_df

# final_df["address"] = final_df["address_x"] 


# final_df.rename(inplace=True)
# final_df.reset_index(inplace=True)
# final_df.rename(inplace=True, columns=final_columns)
final_df.reset_index(inplace=True)
# final_df.index.names = ['id']
final_df.index.names = ['id']

final_df

Unnamed: 0_level_0,index,california_oid,district_x,name_x,address_x,phone,original_name_x,devons_key,participating,Match,...,school,district_y,address_y,original_name_y,name_y,tap_key,simple_name,district,duped,score
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,0,19643521933951,Centinela Valley,4859-west-el-segundo-blvd-hawthorne-high-,4859 West El Segundo Blvd.,(310) 263-4400,Hawthorne High,4859-west-el-segundo-blvd-hawthorne-high,True,"[(4859 west el segundo blvd hawthorne high, 10...",...,Hawthorne High (Centinela Valley Union High),Centinela Valley Union High,4859 West El Segundo Blvd.,Hawthorne High (Centinela Valley Union High),Hawthorne High,4859-west-el-segundo-blvd-hawthorne-high-,4859-west-el-segundo-blvd-hawthorne-high-,,False,100.0
1,1,19643521934926,Centinela Valley,14901-south-inglewood-ave-lawndale-high-,14901 S. Inglewood Avenue,(310) 263-3100,Lawndale High,14901-s-inglewood-avenue-lawndale-high,True,"[(14901 south inglewood ave lawndale high, 90....",...,Lawndale High (Centinela Valley Union High),Centinela Valley Union High,14901 South Inglewood Ave.,Lawndale High (Centinela Valley Union High),Lawndale High,14901-south-inglewood-ave-lawndale-high-,14901-south-inglewood-ave-lawndale-high-,,False,90.9090909090909
2,2,196435219348,Centinela Valley,4118-west-rosecrans-ave-leuzinger-high-,4118 West Rosecrans Ave.,(310) 263-2200,Leuzinger High,4118-west-rosecrans-ave-leuzinger-high,True,"[(4118 west rosecrans ave leuzinger high, 100....",...,Leuzinger High (Centinela Valley Union High),Centinela Valley Union High,4118 West Rosecrans Ave.,Leuzinger High (Centinela Valley Union High),Leuzinger High,4118-west-rosecrans-ave-leuzinger-high-,4118-west-rosecrans-ave-leuzinger-high-,,False,100.0
3,3,196435219239,Centinela Valley,4951-marine-ave-r-k-lloyde-high-,4951 Marine Ave.,(310) 263-3264,R. K. Lloyde High,4951-marine-ave-r-k-lloyde-high,True,"[(4951 marine ave r k lloyde high, 100.0, 4951...",...,R. K. Lloyde High (Centinela Valley Union High),Centinela Valley Union High,4951 Marine Ave.,R. K. Lloyde High (Centinela Valley Union High),R. K. Lloyde High,4951-marine-ave-r-k-lloyde-high-,4951-marine-ave-r-k-lloyde-high-,,False,100.0
4,4,,Charter,,"461 9th Street, San Pedro",(310) 221-0430,Alliance Alice M. Baxter College-Ready High Sc...,"461-9th-street,-san-pedro-alliance-alice-m-bax...",True,[],...,Alliance Alice M. Baxter College-Ready High Sc...,,,,,,,Charter,False,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3703,3703,,,,,,,,,,...,Bert Corona Charter High (Youth Policy Institu...,Youth Policy Institute (YPI) Charter Schools,12513 Gain St.,Bert Corona Charter High (Youth Policy Institu...,Bert Corona Charter High,12513-gain-st-bert-corona-charter-high-,12513-gain-st-bert-corona-charter-high-,,True,
3704,3704,,,,,,,,,,...,Monsenor Oscar Romero Charter Middle (Youth Po...,Youth Policy Institute (YPI) Charter Schools,2670 West 11th St.,Monsenor Oscar Romero Charter Middle (Youth Po...,Monsenor Oscar Romero Charter Middle,2670-west-11th-st-monsenor-oscar-romero-charte...,2670-west-11th-st-monsenor-oscar-romero-charte...,,True,
3705,3705,,,,,,,,,,...,YouthBuild - Hollywood (YouthBuild Charter Sch...,YouthBuild Charter Schools of Los Angeles,5941 Hollywood Boulevard,YouthBuild - Hollywood (YouthBuild Charter Sch...,YouthBuild - Hollywood,5941-hollywood-boulevard-youthbuild-hollywood-,5941-hollywood-boulevard-youthbuild-hollywood-,,True,
3706,3706,,,,,,,,,,...,YouthBuild - Norwalk (YouthBuild Charter Schoo...,YouthBuild Charter Schools of Los Angeles,12124 Front St.,YouthBuild - Norwalk (YouthBuild Charter Schoo...,YouthBuild - Norwalk,12124-front-st-youthbuild-norwalk-,12124-front-st-youthbuild-norwalk-,,True,


In [70]:
final_df.columns

Index(['index', 'california_oid', 'district_x', 'name_x', 'address_x', 'phone',
       'original_name_x', 'devons_key', 'participating', 'Match', 'rowid',
       'school', 'district_y', 'address_y', 'original_name_y', 'name_y',
       'tap_key', 'simple_name', 'district', 'duped', 'score'],
      dtype='object')

## 5.0 Final Output
Using today's date and the csv file extension we will output the file to the data directory.

We also split the data using `to_json` in pandas, more info here:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html

In [71]:
today = str(datetime.now().strftime("%Y-%m"))
outfile_extension = ".csv"
output_file_name = "../data/tap_data_merged_with_metro_california_qc_data_right_join_"+today+outfile_extension

final_df.to_csv(output_file_name)

#create JSON file oriented by records
output_json = "../src/data/schools_right_join.json"
json_file = final_df.to_json(orient='records',index=True) 
with open(output_json, 'w') as f:
    f.write(json_file)

In [72]:
# Done!