# Jupyter Notebook for converting the Go Pass Data


## 1.0 Load in modules

We use the `pandas` and `numpy` libraries to read and process the data.

Then we use the `rapidfuzz` library to calculate close matches between strings so that we can join the information.

Other popular fuzzy string-matching libraries such as `fuzzball` took over 3 hours to process this data. A fuzzy join compares every record in one dataset against every record in the other, so the work grows with the product of the two record counts (comparing 1000 records against 1000 records means 1,000,000 comparisons). `rapidfuzz` is much faster and trims this down to 3 - 4 minutes!
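
As a rough illustration of how the matching cost grows, the sketch below counts the comparisons needed when every record in one list is scored against every record in another. The list sizes and record names are made-up round numbers, not the real record counts.

```python
from itertools import product

# Hypothetical record lists, sized only for illustration.
base_records = [f"school-{i}" for i in range(1000)]
candidate_records = [f"candidate-{j}" for j in range(1000)]

# A brute-force fuzzy join scores every (base, candidate) pair once.
pair_count = sum(1 for _ in product(base_records, candidate_records))

print(pair_count)  # 1000 * 1000 = 1000000 pairs
```

Doubling both lists would quadruple the number of pairs, which is why the speed of each individual comparison matters so much.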


In [1]:
import pandas as pd
import numpy as np
from datetime import datetime
from rapidfuzz import process, utils as fuzz_utils

GOOGLE_SHEET_URL = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vSWbwrsqF-c---4lfw0LZWymd-f8sy8sLYkXgzh0OyeGATWwrvv7V1Mq5BcApn7F_-WYKP1KXy5shKw/pub?output=csv'

## 2.0 Format Go Pass Data

We will be extracting the following fields:

- District
- School
- Address
- City
- Telephone
- Participating

We create a simplified field called `name` based on the `street address`, `school name`, and `district`. This field is formatted to be all lowercase, with spaces replaced by `-` and periods and double quotes removed, to make matching easier.
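
The simplification described above can be sketched on a single record. The helper name `simplify` and the toy frame are assumptions made for illustration; the values are copied from the first Go Pass row shown in the output further down.

```python
import pandas as pd

def simplify(series: pd.Series) -> pd.Series:
    """Lowercase, swap spaces for hyphens, and drop periods and double quotes."""
    return (
        series.str.lower()
        .str.replace(" ", "-", regex=False)
        .str.replace(".", "", regex=False)
        .str.replace('"', "", regex=False)
    )

# One toy record (hypothetical frame, real values from the sheet).
toy = pd.DataFrame({
    "district": ["Centinela Valley"],
    "school": ["Hawthorne High"],
    "address": ["4859 West El Segundo Blvd."],
})

# Key layout: address, then school name, then district.
toy["name"] = (
    simplify(toy["address"]) + "-" + simplify(toy["school"]) + "-" + simplify(toy["district"])
)
print(toy.loc[0, "name"])
# 4859-west-el-segundo-blvd-hawthorne-high-centinela-valley
```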

In [2]:
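go_pass_schools = pd.read_csv(GOOGLE_SHEET_URL,
    usecols=['District','School','Address','City','Telephone','Participating'])
# Rename by header rather than by position so the result does not depend on the sheet's column order
go_pass_schools = go_pass_schools.rename(columns={'District':'district','School':'name','Address':'address',
    'City':'city','Telephone':'telephone','Participating':'participating'})

go_pass_schools = go_pass_schools.fillna('')
go_pass_schools['original_name'] = go_pass_schools['name']
# regex=False makes every replacement a literal match (as a regex, "." would mean "any character")
simple_district = go_pass_schools['district'].str.lower().str.replace(" ","-",regex=False).str.replace(".","",regex=False).str.replace('"','',regex=False)
# go_pass_schools['name'] = go_pass_schools['name'].str.lower().str.replace(" ","-").str.replace(".","").str.replace('"','')+"-"+simple_district
simple_address = go_pass_schools['address'].str.lower().str.replace(" ","-",regex=False).str.replace(".","",regex=False).str.replace('"','',regex=False)
# Only build the combined key for rows that actually have a school name
go_pass_schools.loc[(go_pass_schools['name'].str.len() > 1),'name'] = simple_address+"-"+go_pass_schools['name'].str.lower().str.replace(" ","-",regex=False).str.replace(".","",regex=False).str.replace('"','',regex=False)+"-"+simple_district
go_pass_schools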


Unnamed: 0,district,name,address,city,telephone,participating,original_name
0,Centinela Valley,4859-west-el-segundo-blvd-hawthorne-high-centi...,4859 West El Segundo Blvd.,Hawthorne,(310) 263-4400,True,Hawthorne High
1,Centinela Valley,14901-s-inglewood-avenue-lawndale-high-centine...,14901 S. Inglewood Avenue,Lawndale,(310) 263-3100,True,Lawndale High
2,Centinela Valley,4118-west-rosecrans-ave-leuzinger-high-centine...,4118 West Rosecrans Ave.,Lawndale,(310) 263-2200,True,Leuzinger High
3,Centinela Valley,4951-marine-ave-r-k-lloyde-high-centinela-valley,4951 Marine Ave.,Lawndale,(310) 263-3264,True,R. K. Lloyde High
4,Charter,"461-9th-street,-san-pedro-alliance-alice-m-bax...","461 9th Street, San Pedro",San Pedro,(310) 221-0430,True,Alliance Alice M. Baxter College-Ready High Sc...
...,...,...,...,...,...,...,...
1477,Private,1253-bishops-road-cathedral-high-school-private,1253 Bishops Road,Los Angeles,(323) 441-3113,True,Cathedral High School
1478,Private,6361-santa-monica-blvd-episcopal-school-of-los...,6361 Santa Monica Blvd.,Los Angeles,(323) 284-7266,True,Episcopal School of Los Angeles
1479,Private,9650-zelzah-ave-northpoint-school-private,9650 Zelzah Ave.,Northridge,(818) 739-5231,True,Northpoint School
1480,Santa Monica-Malibu,,,,,True,


### 2.1 Format California Schools data

We create a simplified field called `simple_name` based on the `street address`, `school name`, and `district`. This field is formatted to be all lowercase, with spaces replaced by `-` and periods and double quotes removed, to make matching easier.

#### Source for California schools data

https://www.cde.ca.gov/SchoolDirectory/ExportSelect?simpleSearch=N&address=&city=&counties=&districts=&cdscode=&charter=&magnet=&name=&nps=&search=2&zip=&yearround=&status=1%2C2&types=&order=1&multilingual=&qsc=3549&qdc=3549

August 2022


In [3]:
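california_schools_data = "../data/CDESchoolDirectoryExportAugust2022.csv"

california_schools = pd.read_csv(california_schools_data,
    usecols=['County','Status','District','School',"Closed Date","Website","Latitude","Longitude","Last Update",'Street Address','Street City',"Phone","Email"],encoding='latin')
# Rename by header rather than by position so the result does not depend on the file's column order
california_schools = california_schools.rename(columns={'County':'county','District':'district','School':'name','Status':'status',
    'Closed Date':'closed_date','Website':'website','Latitude':'latitude','Longitude':'longitude','Last Update':'last_update',
    'Street Address':'address','Street City':'city','Phone':'default_phone','Email':'email'})

california_schools = california_schools.fillna('')
# .copy() gives an independent frame, so the column assignments below do not raise SettingWithCopyWarning
la_schools_df = california_schools[california_schools["county"] == "Los Angeles"].copy()

la_schools_df['original_name'] = la_schools_df['name']

# Use the same short district label as the Go Pass data
la_schools_df.loc[la_schools_df['district'] == "Los Angeles Unified", 'district'] = "LAUSD"
# regex=False makes every replacement a literal match (as a regex, "." would mean "any character")
simple_district = la_schools_df['district'].str.lower().str.replace(" ","-",regex=False).str.replace(".","",regex=False).str.replace('"','',regex=False)
simple_address = la_schools_df['address'].str.lower().str.replace(" ","-",regex=False).str.replace(".","",regex=False).str.replace('"','',regex=False)

la_schools_df['simple_name'] = simple_address+"-"+la_schools_df['name'].str.lower().str.replace(" ","-",regex=False).str.replace(".","",regex=False).str.replace('"','',regex=False)+"-"+simple_district


la_schools_df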


Unnamed: 0,county,district,name,status,closed_date,website,latitude,longitude,last_update,address,city,default_phone,email,original_name,simple_name
0,Los Angeles,ABC Unified,ABC Adult,Active,No Data,http://abcadultschool.edu,33.878924,-118.071286,7/19/2021,12254 Cuesta Dr.,Cerritos,(562) 229-7960,No Data,ABC Adult,12254-cuesta-dr-abc-adult-abc-unified
1,Los Angeles,ABC Unified,ABC Evening High School,Closed,11/23/1994,No Data,No Data,No Data,6/24/1999,16800 Shoemaker Ave.,Cerritos,No Data,No Data,ABC Evening High School,16800-shoemaker-ave-abc-evening-high-school-ab...
2,Los Angeles,ABC Unified,ABC Secondary (Alternative),Active,No Data,No Data,33.881547,-118.046358,11/5/2021,16534 South Carmenita Rd.,Cerritos,(562) 229-7768,No Data,ABC Secondary (Alternative),16534-south-carmenita-rd-abc-secondary-(altern...
3,Los Angeles,ABC Unified,Accelerated Christian Academy,Closed,5/12/2016,No Data,Information Redacted,Information Redacted,5/12/2016,Information Redacted,Lakewood,Information Redacted,Information Redacted,Accelerated Christian Academy,information-redacted-accelerated-christian-aca...
4,Los Angeles,ABC Unified,Aloha Elementary,Active,No Data,www.alohaes.us,33.835176,-118.083725,11/5/2021,11737 East 214th St.,Lakewood,(562) 229-7825,No Data,Aloha Elementary,11737-east-214th-st-aloha-elementary-abc-unified
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5824,Los Angeles,Wiseburn Unified,Juan De Anza Elementary,Active,No Data,No Data,33.922918,-118.372077,11/5/2021,12110 Hindry Ave.,Hawthorne,(310) 725-2100,No Data,Juan De Anza Elementary,12110-hindry-ave-juan-de-anza-elementary-wiseb...
5825,Los Angeles,Wiseburn Unified,Malaga Cove School - Success Learning Center,Closed,6/30/2016,Information Not Available,33.801682,-118.396452,10/24/2016,300 Paseo Del Mar,Palos Verdes Estates,Information Not Available,Information Not Available,Malaga Cove School - Success Learning Center,300-paseo-del-mar-malaga-cove-school---success...
5826,Los Angeles,Wiseburn Unified,Richard Henry Dana Middle,Active,No Data,No Data,33.909404,-118.376654,11/5/2021,5504 West 135th St.,Hawthorne,(310) 725-4700,No Data,Richard Henry Dana Middle,5504-west-135th-st-richard-henry-dana-middle-w...
5827,Los Angeles,Wiseburn Unified,RISE High,Closed,9/1/2018,Information Not Available,33.908904,-118.377651,11/30/2018,13500 Aviation Blvd.,Hawthorne,Information Not Available,Information Not Available,RISE High,13500-aviation-blvd-rise-high-wiseburn-unified


In [4]:
unique_schools = go_pass_schools.drop_duplicates(subset=['name'])
print(go_pass_schools.shape)
print(unique_schools.shape)

difference_in_dupes = go_pass_schools.shape[0] - unique_schools.shape[0]
# these are the number of records with the same names
print("Number of records with same names: \n "+str(difference_in_dupes))


(1482, 7)
(1477, 7)
Number of records with same names: 
 5


## 3.0 Joining the `Original dataset` to the `California dataset`

Here we do a blanket `left` merge, which keeps every original record and adds the California data to the records that match.

To keep every California record instead, switch the merge type to `right`; use `inner` to keep only the records that appear in both datasets.
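
The difference between the merge types can be seen on a toy example. The two small frames and their values are invented for illustration; only the `how` argument mirrors what the notebook uses.

```python
import pandas as pd

# Toy stand-ins for the two datasets; the rows are invented.
original = pd.DataFrame({"key": ["a", "b", "c"], "participating": [True, True, False]})
california = pd.DataFrame({"key": ["b", "c", "d"], "status": ["Active", "Closed", "Active"]})

left = original.merge(california, on="key", how="left")    # every original row
right = original.merge(california, on="key", how="right")  # every California row
inner = original.merge(california, on="key", how="inner")  # only rows found in both

print(list(left["key"]))   # ['a', 'b', 'c']
print(list(right["key"]))  # ['b', 'c', 'd']
print(list(inner["key"]))  # ['b', 'c']
```

In the `left` result, the row for `a` has no California match, so its `status` is `NaN`.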

#### See the Pandas merge documentation for more information:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge.html

In [5]:
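def fuzzy_merge(baseFrame, compareFrame, baseKey, compareKey, threshold=86, limit=1, how='left'):
    # Work on a copy so the caller's frame is not modified
    baseFrame = baseFrame.copy()

    # Map each original compare key to its preprocessed form (lowercased, non-alphanumerics turned into spaces)
    s_mapping = {x: fuzz_utils.default_process(x) for x in compareFrame[compareKey]}

    # For each base key, collect up to `limit` matches scoring at least `threshold`.
    # With a mapping as choices, each match is a (processed value, score, original key) tuple.
    m1 = baseFrame[baseKey].apply(lambda x: process.extract(
      fuzz_utils.default_process(x), s_mapping, limit=limit, score_cutoff=threshold, processor=None
    ))
    baseFrame['Match'] = m1

    # Replace the base key with the matched compare key(s); records with no match become NaN
    m2 = baseFrame['Match'].apply(lambda x: ', '.join(i[2] for i in x))
    baseFrame[baseKey] = m2.replace("",np.nan)

    return baseFrame.merge(compareFrame, left_on=baseKey, right_on=compareKey, how=how)

merged_df = fuzzy_merge(go_pass_schools, la_schools_df, 'name', 'simple_name',how='left')
# merged_df = fuzzy_merge(la_schools_df, go_pass_schools, 'simple_name', 'name', how='right')
merged_df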


Unnamed: 0,district_x,name_x,address_x,city_x,telephone,participating,original_name_x,Match,county,district_y,...,website,latitude,longitude,last_update,address_y,city_y,default_phone,email,original_name_y,simple_name
0,Centinela Valley,4859-west-el-segundo-blvd-hawthorne-high-centi...,4859 West El Segundo Blvd.,Hawthorne,(310) 263-4400,True,Hawthorne High,[(4859 west el segundo blvd hawthorne high cen...,Los Angeles,Centinela Valley Union High,...,No Data,33.916456,-118.362903,11/5/2021,4859 West El Segundo Blvd.,Hawthorne,(310) 263-4400,No Data,Hawthorne High,4859-west-el-segundo-blvd-hawthorne-high-centi...
1,Centinela Valley,14901-south-inglewood-ave-lawndale-high-centin...,14901 S. Inglewood Avenue,Lawndale,(310) 263-3100,True,Lawndale High,[(14901 south inglewood ave lawndale high cent...,Los Angeles,Centinela Valley Union High,...,No Data,33.896373,-118.361369,6/30/2022,14901 South Inglewood Ave.,Lawndale,(310) 263-3102,No Data,Lawndale High,14901-south-inglewood-ave-lawndale-high-centin...
2,Centinela Valley,4118-west-rosecrans-ave-leuzinger-high-centine...,4118 West Rosecrans Ave.,Lawndale,(310) 263-2200,True,Leuzinger High,[(4118 west rosecrans ave leuzinger high centi...,Los Angeles,Centinela Valley Union High,...,No Data,33.90137,-118.34687,11/5/2021,4118 West Rosecrans Ave.,Lawndale,(310) 263-2208,No Data,Leuzinger High,4118-west-rosecrans-ave-leuzinger-high-centine...
3,Centinela Valley,4951-marine-ave-r-k-lloyde-high-centinela-vall...,4951 Marine Ave.,Lawndale,(310) 263-3264,True,R. K. Lloyde High,[(4951 marine ave r k lloyde high centinela va...,Los Angeles,Centinela Valley Union High,...,No Data,33.895165,-118.365993,11/5/2021,4951 Marine Ave.,Lawndale,(310) 263-3264,No Data,R. K. Lloyde High,4951-marine-ave-r-k-lloyde-high-centinela-vall...
4,Charter,,"461 9th Street, San Pedro",San Pedro,(310) 221-0430,True,Alliance Alice M. Baxter College-Ready High Sc...,[],,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1480,Private,1253-bishops-rd-cathedral-high-school-lausd,1253 Bishops Road,Los Angeles,(323) 441-3113,True,Cathedral High School,"[(1253 bishops rd cathedral high school lausd,...",Los Angeles,LAUSD,...,No Data,34.069854,-118.234286,6/27/2022,1253 Bishops Rd.,Los Angeles,(323) 225-2438,brjohnm@chsla.org,Cathedral High School,1253-bishops-rd-cathedral-high-school-lausd
1481,Private,6361-santa-monica-blvd-the-episcopal-school-of...,6361 Santa Monica Blvd.,Los Angeles,(323) 284-7266,True,Episcopal School of Los Angeles,[(6361 santa monica blvd the episcopal school ...,Los Angeles,LAUSD,...,No Data,34.090998,-118.328139,6/28/2022,6361 Santa Monica Blvd.,Los Angeles,(323) 462-3752,registrar@es-la.com,The Episcopal School of Los Angeles,6361-santa-monica-blvd-the-episcopal-school-of...
1482,Private,,9650 Zelzah Ave.,Northridge,(818) 739-5231,True,Northpoint School,[],,,...,,,,,,,,,,
1483,Santa Monica-Malibu,,,,,True,,[],,,...,,,,,,,,,,


## 4.0 Cleaning up of Merged Data

The steps are as follows:
1. Build the full name from the original name and the district
   - Note: you can also use the California district names instead by uncommenting the `alternative approach` line
2. Build an alternative name where the district is added only if the school name is duplicated
3. Set the final `phone` column to the original data's `telephone` column, falling back to the California `default_phone` when it is missing
4. Get the match score from rapidfuzz
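
Steps 2 and 3 can be sketched on a toy frame. The rows and phone numbers are invented, and this sketch uses `Series.where` as a compact variation on the `.loc` assignments in the cell below, not the notebook's exact code.

```python
import pandas as pd

# Invented rows: two schools share a name, and one has no phone in the original data.
toy = pd.DataFrame({
    "original_name_x": ["Lincoln High", "Lincoln High", "Hawthorne High"],
    "district_y": ["District A", "District B", "District C"],
    "telephone": ["(310) 555-0100", "", "(310) 555-0102"],
    "default_phone": ["(310) 555-0200", "(310) 555-0201", "(310) 555-0202"],
})

# keep=False flags every member of a duplicated group, not just the repeats.
toy["duped"] = toy.duplicated(["original_name_x"], keep=False)

# Keep the bare name where it is unique; otherwise append the district.
toy["full_name_some"] = toy["original_name_x"].where(
    ~toy["duped"], toy["original_name_x"] + " (" + toy["district_y"] + ")"
)

# Keep the original phone where it has at least 4 characters; otherwise fall back.
toy["phone"] = toy["telephone"].where(
    toy["telephone"].str.len() >= 4, toy["default_phone"]
)

print(toy[["full_name_some", "phone"]])
```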

In [6]:
# 1. Default name processing
merged_df['full_name'] = merged_df['original_name_x'] + ' (' + merged_df['district_x'] + ')'

# 1b. alternative approach: uncomment below to use the California district names instead
# merged_df['full_name']  = merged_df['original_name_x'] + ' (' + merged_df['district_y'] + ')'

# 2. Alternative school name field processing
# Flag duplicated records
merged_df['duped'] = merged_df.duplicated(['original_name_x'],keep=False)

# `full_name_some` adds the district name only for duplicated records.
merged_df.loc[~merged_df['duped'], 'full_name_some'] = merged_df['original_name_x']
merged_df.loc[merged_df['duped'], 'full_name_some'] = merged_df['original_name_x'] + ' (' + merged_df['district_y'] + ')'

# 3. Phone number processing
merged_df['phone'] = merged_df['telephone']

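# If the phone has fewer than 4 characters, fall back to California's phone number, i.e. the `default_phone` column
merged_df.loc[merged_df['phone'].str.len() < 4, 'phone'] = merged_df['default_phone']

# 4. Match score processing
# Each `Match` entry is a list of (processed name, score, original name) tuples.
# Read the score directly; splitting the string form on "," breaks when a name contains a comma.
merged_df['score'] = merged_df['Match'].apply(lambda m: m[0][1] if isinstance(m, list) and m else np.nan)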

merged_df

Unnamed: 0,district_x,name_x,address_x,city_x,telephone,participating,original_name_x,Match,county,district_y,...,city_y,default_phone,email,original_name_y,simple_name,full_name,duped,full_name_some,phone,score
0,Centinela Valley,4859-west-el-segundo-blvd-hawthorne-high-centi...,4859 West El Segundo Blvd.,Hawthorne,(310) 263-4400,True,Hawthorne High,[(4859 west el segundo blvd hawthorne high cen...,Los Angeles,Centinela Valley Union High,...,Hawthorne,(310) 263-4400,No Data,Hawthorne High,4859-west-el-segundo-blvd-hawthorne-high-centi...,Hawthorne High (Centinela Valley),False,Hawthorne High,(310) 263-4400,95.0
1,Centinela Valley,14901-south-inglewood-ave-lawndale-high-centin...,14901 S. Inglewood Avenue,Lawndale,(310) 263-3100,True,Lawndale High,[(14901 south inglewood ave lawndale high cent...,Los Angeles,Centinela Valley Union High,...,Lawndale,(310) 263-3102,No Data,Lawndale High,14901-south-inglewood-ave-lawndale-high-centin...,Lawndale High (Centinela Valley),False,Lawndale High,(310) 263-3100,86.53465346534652
2,Centinela Valley,4118-west-rosecrans-ave-leuzinger-high-centine...,4118 West Rosecrans Ave.,Lawndale,(310) 263-2200,True,Leuzinger High,[(4118 west rosecrans ave leuzinger high centi...,Los Angeles,Centinela Valley Union High,...,Lawndale,(310) 263-2208,No Data,Leuzinger High,4118-west-rosecrans-ave-leuzinger-high-centine...,Leuzinger High (Centinela Valley),False,Leuzinger High,(310) 263-2200,95.0
3,Centinela Valley,4951-marine-ave-r-k-lloyde-high-centinela-vall...,4951 Marine Ave.,Lawndale,(310) 263-3264,True,R. K. Lloyde High,[(4951 marine ave r k lloyde high centinela va...,Los Angeles,Centinela Valley Union High,...,Lawndale,(310) 263-3264,No Data,R. K. Lloyde High,4951-marine-ave-r-k-lloyde-high-centinela-vall...,R. K. Lloyde High (Centinela Valley),False,R. K. Lloyde High,(310) 263-3264,95.0
4,Charter,,"461 9th Street, San Pedro",San Pedro,(310) 221-0430,True,Alliance Alice M. Baxter College-Ready High Sc...,[],,,...,,,,,,Alliance Alice M. Baxter College-Ready High Sc...,False,Alliance Alice M. Baxter College-Ready High Sc...,(310) 221-0430,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1480,Private,1253-bishops-rd-cathedral-high-school-lausd,1253 Bishops Road,Los Angeles,(323) 441-3113,True,Cathedral High School,"[(1253 bishops rd cathedral high school lausd,...",Los Angeles,LAUSD,...,Los Angeles,(323) 225-2438,brjohnm@chsla.org,Cathedral High School,1253-bishops-rd-cathedral-high-school-lausd,Cathedral High School (Private),False,Cathedral High School,(323) 441-3113,86.66666666666667
1481,Private,6361-santa-monica-blvd-the-episcopal-school-of...,6361 Santa Monica Blvd.,Los Angeles,(323) 284-7266,True,Episcopal School of Los Angeles,[(6361 santa monica blvd the episcopal school ...,Los Angeles,LAUSD,...,Los Angeles,(323) 462-3752,registrar@es-la.com,The Episcopal School of Los Angeles,6361-santa-monica-blvd-the-episcopal-school-of...,Episcopal School of Los Angeles (Private),False,Episcopal School of Los Angeles,(323) 284-7266,88.88888888888889
1482,Private,,9650 Zelzah Ave.,Northridge,(818) 739-5231,True,Northpoint School,[],,,...,,,,,,Northpoint School (Private),False,Northpoint School,(818) 739-5231,
1483,Santa Monica-Malibu,,,,,True,,[],,,...,,,,,,(Santa Monica-Malibu),True,,,


## 5.0 Final Column Selection
Here we select the final data columns for our outputs, with `_x` suffixes representing the original Metro dataset and `_y` suffixes representing the California schools dataset.

- `district_x`
- `district_y`
- `original_name_x`
- `original_name_y`
- `status`
- `closed_date`
- `full_name`
- `full_name_some`
- `participating`
- `address_x`
- `address_y`
- `city_x`
- `city_y`
- `score`
- `duped`
- `phone`
- `email`
- `website`
- `latitude`
- `longitude`
- `last_update`
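
The `_x` and `_y` suffixes are the defaults pandas applies when both frames in a merge share a column name. A minimal sketch with invented one-row frames; the alternative suffix names are hypothetical and not used in this notebook.

```python
import pandas as pd

# Invented rows; both frames share the column name `district`.
metro = pd.DataFrame({"key": ["k1"], "district": ["Centinela Valley"]})
california = pd.DataFrame({"key": ["k1"], "district": ["Centinela Valley Union High"]})

merged = metro.merge(california, on="key", how="left")
print(list(merged.columns))  # ['key', 'district_x', 'district_y']

# Optional: more descriptive suffixes (hypothetical names, for illustration only).
named = metro.merge(california, on="key", how="left", suffixes=("_metro", "_cde"))
print(list(named.columns))  # ['key', 'district_metro', 'district_cde']
```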

In [21]:
final_columns = {
    "full_name":"school_name",
    "full_name_some":"school_name_with_some_districts_attached",
}

final_df = merged_df[["district_x","district_y","original_name_x","original_name_y","status",'closed_date',
       "full_name","full_name_some","participating","address_x","address_y","city_x","city_y","score","duped",'phone', 'email',
       'website', 'latitude', 'longitude', 'last_update']]
# rename() and reset_index() return new frames, which avoids modifying a slice of merged_df in place
final_df = final_df.rename(columns=final_columns).reset_index()
final_df.index.names = ['id']

final_df


Unnamed: 0,id,district_x,district_y,original_name_x,original_name_y,status,closed_date,school_name,school_name_with_some_districts_attached,participating,...,city_x,city_y,score,duped,phone,email,website,latitude,longitude,last_update
0,0,Centinela Valley,Centinela Valley Union High,Hawthorne High,Hawthorne High,Active,No Data,Hawthorne High (Centinela Valley),Hawthorne High,True,...,Hawthorne,Hawthorne,95.0,False,(310) 263-4400,No Data,No Data,33.916456,-118.362903,11/5/2021
1,1,Centinela Valley,Centinela Valley Union High,Lawndale High,Lawndale High,Active,No Data,Lawndale High (Centinela Valley),Lawndale High,True,...,Lawndale,Lawndale,86.53465346534652,False,(310) 263-3100,No Data,No Data,33.896373,-118.361369,6/30/2022
2,2,Centinela Valley,Centinela Valley Union High,Leuzinger High,Leuzinger High,Active,No Data,Leuzinger High (Centinela Valley),Leuzinger High,True,...,Lawndale,Lawndale,95.0,False,(310) 263-2200,No Data,No Data,33.90137,-118.34687,11/5/2021
3,3,Centinela Valley,Centinela Valley Union High,R. K. Lloyde High,R. K. Lloyde High,Active,No Data,R. K. Lloyde High (Centinela Valley),R. K. Lloyde High,True,...,Lawndale,Lawndale,95.0,False,(310) 263-3264,No Data,No Data,33.895165,-118.365993,11/5/2021
4,4,Charter,,Alliance Alice M. Baxter College-Ready High Sc...,,,,Alliance Alice M. Baxter College-Ready High Sc...,Alliance Alice M. Baxter College-Ready High Sc...,True,...,San Pedro,,,False,(310) 221-0430,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1480,1480,Private,LAUSD,Cathedral High School,Cathedral High School,Active,No Data,Cathedral High School (Private),Cathedral High School,True,...,Los Angeles,Los Angeles,86.66666666666667,False,(323) 441-3113,brjohnm@chsla.org,No Data,34.069854,-118.234286,6/27/2022
1481,1481,Private,LAUSD,Episcopal School of Los Angeles,The Episcopal School of Los Angeles,Active,No Data,Episcopal School of Los Angeles (Private),Episcopal School of Los Angeles,True,...,Los Angeles,Los Angeles,88.88888888888889,False,(323) 284-7266,registrar@es-la.com,No Data,34.090998,-118.328139,6/28/2022
1482,1482,Private,,Northpoint School,,,,Northpoint School (Private),Northpoint School,True,...,Northridge,,,False,(818) 739-5231,,,,,
1483,1483,Santa Monica-Malibu,,,,,,(Santa Monica-Malibu),,True,...,,,,True,,,,,,


In [13]:
final_df.columns

Index(['district_x', 'district_y', 'original_name_x', 'original_name_y',
       'status', 'closed_date', 'school_name',
       'school_name_with_some_districts_attached', 'participating',
       'address_x', 'address_y', 'city_x', 'city_y', 'address_y', 'city_y',
       'score', 'duped', 'phone', 'email', 'website', 'latitude', 'longitude',
       'last_update'],
      dtype='object')

## 6.0 Final Output
Using today's date and the `.csv` file extension, we write the file to the data directory.

We also write two JSON versions with `to_json` in pandas, one with `orient='split'` and one with `orient='records'`. More info here:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html
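
The two orientations produce differently shaped JSON. A small sketch on an invented two-row frame standing in for `final_df`:

```python
import json
import pandas as pd

# Invented two-row frame standing in for `final_df`.
toy = pd.DataFrame({"school_name": ["School A", "School B"], "participating": [True, False]})

# 'split' stores column names, index, and row values in three separate lists.
split_doc = json.loads(toy.to_json(orient="split"))
# 'records' stores one object per row and does not include the index.
records_doc = json.loads(toy.to_json(orient="records"))

print(sorted(split_doc))  # ['columns', 'data', 'index']
print(records_doc[0])     # {'school_name': 'School A', 'participating': True}
```

The `split` form is more compact for wide tables, while `records` is usually easier to loop over in client code.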

In [22]:
today = str(datetime.now().date())
outfile_extension = ".csv"
output_file_name = "../data/go_pass_schools_merged_with_california_dataset_"+today+outfile_extension

final_df.to_csv(output_file_name)

# Create JSON file oriented by split columns
json_file = final_df.to_json(orient='split')
output_json = "../data/go_pass_schools_merged_with_california_dataset_"+today+".json"
with open(output_json, 'w') as f:
    f.write(json_file)

# Create JSON file oriented by records
# (records output never includes the index, and newer pandas versions reject index=True here, so it is left out)
output_json_v2 = "../data/go_pass_schools_merged_with_california_dataset_"+today+"_orient_records.json"
json_file2 = final_df.to_json(orient='records')
with open(output_json_v2, 'w') as f2:
    f2.write(json_file2)