# Converting Take One Documents into JSON

## Documentation

This Jupyter Notebook takes in translations of the Take One brochure and outputs them as JSON files, one per language, for the MyBus tool.

The source data is a Word document (`.docx`).  When the content was transferred into that document, line breaks and spaces were cleaned up, because different languages use spaces differently.

The output file is used on the "All Changes" page of the MyBus tool to display the Take One brochure as an HTML page instead of only as a PDF file.  It contains all the details for all line changes aggregated into a single view.

### Notes

#### Not All Lines

Not all lines are listed in the Take One brochure, only those with major changes.  Some lines not listed in the brochure will still have updated schedules due to minor changes.  For the All Changes page to also act as a central source for updated schedule PDFs, the data had to be extended to cover every line, not only those listed in the brochure.

#### Line Numbers

Lines with sister routes are listed in the brochure as a combined line, for example the 16/17.  To match entries with their corresponding schedule PDFs, an additional field for the line number was added.
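
As a rough illustration of that extra field, the sketch below splits a combined entry into one row per line number. The rows are invented for the example, and the column names `label` and `line` are assumptions, not the notebook's final schema.

```python
import pandas as pd

# Hypothetical brochure entries; "16/17" is a combined entry for sister routes.
brochure = pd.DataFrame({
    "label": ["2", "16/17", "33"],
    "en": ["2 – example text", "16/17 – example text", "33 – example text"],
})

# Split the combined label on "/" and give every line number its own row.
per_line = (
    brochure.assign(line=brochure["label"].str.split("/"))
    .explode("line")
    .reset_index(drop=True)
)

# Integer line numbers can then be matched against schedule file names.
per_line["line"] = per_line["line"].astype(int)

print(per_line[["label", "line"]])
```

Each sister route keeps the shared brochure text but gets its own `line` value, which is what makes a per-line lookup of schedule PDFs possible.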


## Setup 
### 1.1 Import modules

In [1]:
import pandas as pd 
import numpy as np
from docx.api import Document
# import re
# import json

# templates = [["header",1,"Metro is making more service changes.","Metro está haciendo más cambios en sus servicios.","Metro正在進行更多服務調整。","Metro hiện đang thực hiện nhiều thay đổi về dịch vụ.","메트로 서비스가 더욱 새롭게단장하고 있습니다.","メトロのサービスが変更されます。","Metro-ն կրկին փոփոխություններ է իրականացնում ծառայությունների մեջ:","Metro вносит дополнительные изменения в схемы движения."]]
# templates = ["header",1],["summary",1],["details",1],["end",1]
# final_template = pd.DataFrame(templates,columns=["section","order","en","es","zh-TW","vi","ko","ja","hy","ru"])


### 1.2 Read .docx and set final output

In [2]:
document = Document('../data/input/202112shakeup_en_es.docx')
table = document.tables[0]

headers = ["section","order","line","altline","en","es","new-schedule","current-schedule"]
# headers = ["section","order","line","altline","en","es","zh-TW","vi","ko","ja","hy","ru","new-schedule","current-schedule"]

def reset_final_df():
    return pd.DataFrame(columns=headers)

final_df = pd.DataFrame(columns=headers)

### 1.3 Set dataframe to docx table and pre-process data

In [3]:
document = Document('../data/input/202112shakeup_en_es.docx')
table = document.tables[0]
### flatten line breaks, drop double quotes, and trim leading whitespace in every cell
data = [[cell.text.replace("\n"," ").replace('"','').lstrip() for cell in row.cells] for row in table.rows]

df = pd.DataFrame(data)
new_header = df.iloc[0]
df = df[1:] 
df.columns = new_header
# print(df.columns)
# df = df.rename(columns={'English':'en','Spanish':'es','Chinese (Traditional)':'zh-TW','Korean':'ko','Vietnamese':'vi','Japanese':'ja','Russian':'ru','Armenian':'hy'})

df = df.rename(columns={'English':'en','Spanish':'es'})

# df = df.rename(columns=df.iloc[0]).drop(df.index[0]).reset_index(drop=True)

df = df.replace(' +',r' ',regex=True)
df = df.replace('"',r'',regex=True)
# df.to_json('test.json')
# df.to_csv('test.csv')
df.head()

# final_df = pd.DataFrame(columns=["section","order","line","altline","en","es","zh-TW","vi","ko","ja","hy","ru","new-schedule","current-schedule"])
final_df = pd.DataFrame(columns=["section","order","line","altline","en","es","new-schedule","current-schedule"])

## Populating the data

### 2.1 Adding the `header` sections

In [4]:
header1 = df.loc[(df['en'].str.contains('\u2013') == False) & (df['en'].str.contains('Metro is making service'))]
header1 = header1.assign(section='header')
header1 = header1.assign(order='1')

header2 = df.loc[(df['en'].str.contains('\u2013') == False) & (df['en'].str.contains('New schedules start'))]
header2 = header2.assign(section='header')
header2 = header2.assign(order='2')

if not final_df.empty:
    final_df = reset_final_df()

### pd.concat replaces DataFrame.append, which was removed in pandas 2.0
final_df = pd.concat([final_df, header1, header2])

final_df

Unnamed: 0,section,order,line,altline,en,es,new-schedule,current-schedule
32,header,1,,,Metro is making service changes.,Metro está haciendo cambios en el servicio.,,
33,header,2,,,"New schedules start December 10, 2021.",El cobro de las tarifas de autobús de Metro se...,,


### 2.1.1 Populating the `Summary` sections

In [5]:
# th = df[df['en'].str.contains('Starting on'):df['en'].str.contains('We’re ')]
th = df.loc[(df['en'].str.contains('\u2013') == False) & (df.index < 30) & (df['en'].str.contains('We’re modify') == False)]

th = th.assign(section='summary')

### number the summary rows 0..n-1 in document order
th['order'] = list(range(len(th)))

th

final_df = pd.concat([final_df, th])
final_df

Unnamed: 0,section,order,line,altline,en,es,new-schedule,current-schedule
32,header,1,,,Metro is making service changes.,Metro está haciendo cambios en el servicio.,,
33,header,2,,,"New schedules start December 10, 2021.",El cobro de las tarifas de autobús de Metro se...,,
1,summary,0,,,"Starting on Sunday, December 19, 2021, Metro i...","A partir del domingo 19 de diciembre de 2021, ...",,
2,summary,1,,,We're realigning routes for easier access to k...,Estamos cambiando los recorridos para facilita...,,
3,summary,2,,,Some bus stops will also be consolidated to im...,También se consolidarán algunas paradas de aut...,,
4,summary,3,,,The following lines will have extra trips adde...,Las siguientes líneas tendrán más viajes en di...,,
5,summary,4,,,"On WEEKENDS (Saturday/Sunday): 256, 720","Fines de semana (sábado/domingo): 256, 720",,
6,summary,5,,,On SUNDAY only: 94,Solo Domingo: 94,,


In [6]:
df

Unnamed: 0,en,es
1,"Starting on Sunday, December 19, 2021, Metro i...","A partir del domingo 19 de diciembre de 2021, ..."
2,We're realigning routes for easier access to k...,Estamos cambiando los recorridos para facilita...
3,Some bus stops will also be consolidated to im...,También se consolidarán algunas paradas de aut...
4,The following lines will have extra trips adde...,Las siguientes líneas tendrán más viajes en di...
5,"On WEEKENDS (Saturday/Sunday): 256, 720","Fines de semana (sábado/domingo): 256, 720"
6,On SUNDAY only: 94,Solo Domingo: 94
7,We’re modifying service on these bus lines:,Estamos modificando el servicio en las siguien...
8,2 – Lines 2 and 200 merge into new Line 2 betw...,Línea 2: Las líneas 2 y 200 se fusionarán y fo...
9,4 – Line 4 changes route at the north end of d...,Línea 4: Línea 4 cambia su recorrido en el ext...
10,33 – Bus stops are discontinued for both direc...,Línea 33: Se descontinuarán las paradas de aut...


### 2.1.2 Adding Metro Rail Lines in the summary section

In [7]:
### filter out the rail lines
### note: right now this is hard coded... need a list of rail lines..
rail_df = df.loc[(df['en'].str.contains('\u2013')) & (df['en'].str.startswith('A Line (Blue), C Line (Green)') == True)]

### add this to the end of all the lines
end_lines = len(th) +1

### set the properties
rail_df = rail_df.assign(section='summary')
rail_df = rail_df.assign(order=end_lines)

### add to the final data frame
final_df = pd.concat([final_df, rail_df])
final_df

Unnamed: 0,section,order,line,altline,en,es,new-schedule,current-schedule
32,header,1,,,Metro is making service changes.,Metro está haciendo cambios en el servicio.,,
33,header,2,,,"New schedules start December 10, 2021.",El cobro de las tarifas de autobús de Metro se...,,
1,summary,0,,,"Starting on Sunday, December 19, 2021, Metro i...","A partir del domingo 19 de diciembre de 2021, ...",,
2,summary,1,,,We're realigning routes for easier access to k...,Estamos cambiando los recorridos para facilita...,,
3,summary,2,,,Some bus stops will also be consolidated to im...,También se consolidarán algunas paradas de aut...,,
4,summary,3,,,The following lines will have extra trips adde...,Las siguientes líneas tendrán más viajes en di...,,
5,summary,4,,,"On WEEKENDS (Saturday/Sunday): 256, 720","Fines de semana (sábado/domingo): 256, 720",,
6,summary,5,,,On SUNDAY only: 94,Solo Domingo: 94,,
30,summary,7,,,"A Line (Blue), C Line (Green), E Line (Expo), ...","Líneas de tren: A Line (Blue), C Line (Green),...",,


### 2.2 Adding pre-header for `details`

In [8]:
detail_header = df.loc[(df['en'].str.contains('\u2013') == False) & (df.index < 20) & (df['en'].str.contains('We’re modify') == True)]

detail_header = detail_header.assign(section='details')
detail_header = detail_header.assign(order=0)

final_df = pd.concat([final_df, detail_header])
detail_header
# final_df.to_json('final_takeone.json',orient='records')

Unnamed: 0,en,es,section,order
7,We’re modifying service on these bus lines:,Estamos modificando el servicio en las siguien...,details,0


In [9]:
detail_header

Unnamed: 0,en,es,section,order
7,We’re modifying service on these bus lines:,Estamos modificando el servicio en las siguien...,details,0


### 2.3 Adding the `details`/lines section

#### 2.3.1 Process all the lines
First, read every line from the master list of lines, which is stored as a CSV file.

In [10]:
lines_df = pd.read_csv('../data/input/mybus-dec-2021 - Lines.csv', index_col=0)
lines_df['AltLine'] = lines_df.AltLine.fillna(0).astype(int)

### take a copy so that adding a column does not raise SettingWithCopyWarning
all_lines = lines_df[['Line Label',"AltLine"]].copy()

### sort by the index ("Line Number"), then number the rows starting at 1
all_lines = all_lines.sort_values(by="Line Number")
all_lines['order'] = list(range(1, len(all_lines) + 1))
all_lines.reset_index(inplace=True)
all_lines = all_lines.rename(columns={"Line Label":"line_label","Line Number":"line"})
all_lines.head(4)


Unnamed: 0,line,line_label,AltLine,order
0,2,2,0,1
1,4,4,0,2
2,10,10,10,3
3,14,14,14,4


#### 2.3.2 Filter the docx table for the `line details`
 

In [11]:
### filter the lines out based on the en dash (U+2013) and exclude the rail lines
### .copy() makes this an independent frame, so adding a column does not raise SettingWithCopyWarning
lines_takeone_df = df.loc[(df['en'].str.contains('\u2013')) & (df['en'].str.startswith('A Line (Blue), C Line (Green)') == False)].copy()

### create a field called `line` and set it to the text before the first en dash
lines_takeone_df['line'] = lines_takeone_df.en.str.split('–').str[0]

### expand combined entries such as "78/79" into one row per line number (`oid`),
### then collect the repeated rows
lines_takeone_df = lines_takeone_df.assign(oid=lines_takeone_df.line.str.split('/')).explode('oid')
dupes = lines_takeone_df.loc[(lines_takeone_df.duplicated(subset=['line']))]

### remove duplicates
lines_takeone_df = lines_takeone_df.drop_duplicates(subset=['line'])
### remove any lines with the "/" in it
lines_takeone_df = lines_takeone_df[lines_takeone_df["line"].str.contains("/")==False]

lines_takeone_df


Unnamed: 0,en,es,line,oid
8,2 – Lines 2 and 200 merge into new Line 2 betw...,Línea 2: Las líneas 2 y 200 se fusionarán y fo...,2,2
9,4 – Line 4 changes route at the north end of d...,Línea 4: Línea 4 cambia su recorrido en el ext...,4,4
10,33 – Bus stops are discontinued for both direc...,Línea 33: Se descontinuarán las paradas de aut...,33,33
11,51 – Line 51 north terminus moves from Wilshir...,Línea 51: La terminal norte de la Línea 51 se ...,51,51
12,53 – Line 53 changes route to serve the upgrad...,Línea 53: Line 53 cambia su ruta para brindar ...,53,53
14,81 – Bus stops are discontinued at Figueroa/ 7...,Línea 81: Se descontinuarán las paradas de aut...,81,81
15,110 – Line 110 east terminus remains at Bell G...,Línea 110: La terminal este de la Línea 110 pe...,110,110
16,154 – Line 154 changes route to proceed direct...,"Línea 154: Línea 154 cambia su ruta, que irá d...",154,154
17,177 – Line 177 will extend service further nor...,Línea 177: Línea 177 extenderá su servicio hac...,177,177
18,179 – New Line 179 will operate between Rose H...,Línea 179: La New Línea 179 funcionará entre e...,179,179


In [12]:

# dupes2 = dupes.replace({'(\d+([ ]?[/])\d+)': '<br>'}, regex=True)
# dupes2

#### 2.3.3 Re-add duplicates

In [13]:
### give each half of a combined entry (e.g. "78/79") its own row
dupes['line'] = dupes['line'].str.split('/')
dupes = dupes.explode('line')
temp_df2 = pd.DataFrame()

for this_line in dupes['line']:
  line = this_line.strip(" ")
  print(line)
  temp_df = dupes[dupes["line"].str.contains(line)]
  ### English: replace the combined label "78/79" with this line's number
  temp_df = temp_df.replace({r'(\d+([ ]?[/])\d+)': line}, regex=True,limit=1)
  ### Spanish: replace "78 y 79:" with this line's number and use the singular "Línea"
  temp_df = temp_df.replace({r'(\d+)([ ][y][ ])(\d+)[ ]?:': line+":"}, regex=True,limit=1)
  temp_df = temp_df.replace({'Líneas': 'Línea'}, regex=True,limit=1)
  # temp_df = temp_df.replace({r'(\d+([ ]?[-][ ]?)\d+)': line}, regex=True,limit=1)
  ### Armenian: the combined label uses "՝" as its separator
  temp_df = temp_df.replace({r'(\d+([ ][՝][ ])\d+)': line}, regex=True,limit=1)
  temp_df2 = pd.concat([temp_df2, temp_df])

### add the separated entries back to the brochure lines
lines_takeone_df = pd.concat([lines_takeone_df, temp_df2])

temp_df2

78
79


Unnamed: 0,en,es,line,oid
13,78 – Lines 78 & 79 will be separated. Line 78 ...,Línea 78: Las líneas 78 y 79 se separarán. Lín...,78,79
13,79 – Lines 78 & 79 will be separated. Line 78 ...,Línea 79: Las líneas 78 y 79 se separarán. Lín...,79,79


#### 2.3.4 Join pdfs

In [14]:
import os

# collect one record per line number found in the schedule file names
pdfs_list = []

# walk the schedules folder; a file such as "910-950_TT_09-12-21.pdf" covers two lines
for root, dirs, files in os.walk("../files/schedules"):
    for filename in files:
        # drop spaces, keep the part before "_TT", and split combined lines on "-"
        lines = filename.replace(" ","").split("_TT")[0].split("-")
        for line in lines:
            this_schedule = {}
            # strip leading zeros so that "002" matches line "2"
            this_schedule['line'] = line.lstrip("0")
            this_schedule['new-schedule'] = "./files/schedules/"+filename
            pdfs_list.append(this_schedule)

schedule_df = pd.DataFrame(pdfs_list)
schedule_df.tail(10)



Unnamed: 0,line,new-schedule
77,662,./files/schedules/662_TT_09-12-21.pdf
78,690,./files/schedules/690_TT_09-12-21.pdf
79,720,./files/schedules/720_TT_09-12-21.pdf
80,754,./files/schedules/754_TT_09-12-21.pdf
81,761,./files/schedules/761_TT_09-12-21.pdf
82,802,./files/schedules/802_TT_09-12-21.pdf
83,854,./files/schedules/854_TT_09-12-21.pdf
84,901,./files/schedules/901_TT_09-12-21.pdf
85,910,./files/schedules/910-950_TT_09-12-21.pdf
86,950,./files/schedules/910-950_TT_09-12-21.pdf
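
The file name parsing in the cell above can be checked in isolation. The helper below is a minimal sketch that assumes every schedule file follows the `<line>[-<line>]_TT_<date>.pdf` pattern seen in the output; `lines_from_filename` is a hypothetical name, not part of the notebook.

```python
def lines_from_filename(filename):
    """Return the line numbers encoded in a schedule PDF file name."""
    # Drop spaces, keep the part before "_TT", then split combined lines on "-".
    prefix = filename.replace(" ", "").split("_TT")[0]
    # lstrip("0") turns "002" into "2" so it matches the brochure's numbering.
    return [part.lstrip("0") for part in prefix.split("-")]


print(lines_from_filename("002_TT_09-12-21.pdf"))      # ['2']
print(lines_from_filename("910-950_TT_09-12-21.pdf"))  # ['910', '950']
```

A file name that does not contain `_TT` would be returned whole, so unexpected files in the folder are worth filtering out before the later conversion to integers.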


#### 2.3.5 Join `lines docx` data to `all lines` data

We use the pandas method `merge` to join the data on the `line` field and use an `outer` join to make sure to keep all the line data.
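
A small sketch of why the `outer` join matters, using invented data: the master list has three lines, while the brochure only mentions two. The `indicator=True` argument is an addition for illustration and is not used in the notebook.

```python
import pandas as pd

# Hypothetical master list of lines and a brochure that mentions only some of them.
all_lines = pd.DataFrame({"line": [2, 4, 10], "line_label": ["2", "4", "10"]})
brochure = pd.DataFrame({"line": [2, 4], "en": ["2 – example text", "4 – example text"]})

# how="outer" keeps every key from both sides; indicator=True adds a
# "_merge" column that records which side each row came from.
merged = all_lines.merge(brochure, on="line", how="outer", indicator=True)

print(merged)
```

Line 10 survives the merge with an empty `en` value, which is how lines with only minor changes still appear on the All Changes page.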

In [15]:
### convert the unique line field to the same data type, integers 
all_lines['line'] = all_lines['line'].astype(int)
lines_takeone_df['line'] = lines_takeone_df['line'].astype(int)
schedule_df['line'] = schedule_df['line'].astype(int)

### perform the merge 
merged_lines = all_lines.merge(lines_takeone_df, on='line',how='outer')
merged_lines2 = merged_lines.merge(schedule_df, on='line',how='outer')

### assign the "details" section
merged_lines2 = merged_lines2.assign(section='details')
# merged_lines['AltLine'] = all_lines['line'].astype(int)
merged_lines2

Unnamed: 0,line,line_label,AltLine,order,en,es,oid,new-schedule,section
0,2,2,0.0,1,2 – Lines 2 and 200 merge into new Line 2 betw...,Línea 2: Las líneas 2 y 200 se fusionarán y fo...,2,./files/schedules/002_TT_09-12-21.pdf,details
1,4,4,0.0,2,4 – Line 4 changes route at the north end of d...,Línea 4: Línea 4 cambia su recorrido en el ext...,4,./files/schedules/004_TT_09-12-21.pdf,details
2,10,10,10.0,3,,,,./files/schedules/010_TT_09-12-21.pdf,details
3,14,14,14.0,4,,,,./files/schedules/014_TT_09-12-21.pdf,details
4,16,16,0.0,5,,,,./files/schedules/016_TT_09-12-21.pdf,details
...,...,...,...,...,...,...,...,...,...
121,854,854 / L Line (Gold) Shuttle,0.0,122,854 – L Line (Gold) shuttle service frequency ...,Línea 854: La frecuencia del servicio de enlac...,854,./files/schedules/854_TT_09-12-21.pdf,details
122,901,901 / G Line (Orange),0.0,123,,,,./files/schedules/901_TT_09-12-21.pdf,details
123,910,910 / J Line (Silver),910.0,124,,,,./files/schedules/910-950_TT_09-12-21.pdf,details
124,950,950 / J Line (Silver),910.0,125,,,,./files/schedules/910-950_TT_09-12-21.pdf,details


#### 2.3.6 Join the merged lines to the final data frame

In [16]:
final_df = pd.concat([final_df, merged_lines2])

#### 2.3.7 Preview the final data frame

In [17]:
final_df.head(20)

Unnamed: 0,section,order,line,altline,en,es,new-schedule,current-schedule,line_label,AltLine,oid
32,header,1,,,Metro is making service changes.,Metro está haciendo cambios en el servicio.,,,,,
33,header,2,,,"New schedules start December 10, 2021.",El cobro de las tarifas de autobús de Metro se...,,,,,
1,summary,0,,,"Starting on Sunday, December 19, 2021, Metro i...","A partir del domingo 19 de diciembre de 2021, ...",,,,,
2,summary,1,,,We're realigning routes for easier access to k...,Estamos cambiando los recorridos para facilita...,,,,,
3,summary,2,,,Some bus stops will also be consolidated to im...,También se consolidarán algunas paradas de aut...,,,,,
4,summary,3,,,The following lines will have extra trips adde...,Las siguientes líneas tendrán más viajes en di...,,,,,
5,summary,4,,,"On WEEKENDS (Saturday/Sunday): 256, 720","Fines de semana (sábado/domingo): 256, 720",,,,,
6,summary,5,,,On SUNDAY only: 94,Solo Domingo: 94,,,,,
30,summary,7,,,"A Line (Blue), C Line (Green), E Line (Expo), ...","Líneas de tren: A Line (Blue), C Line (Green),...",,,,,
7,details,0,,,We’re modifying service on these bus lines:,Estamos modificando el servicio en las siguien...,,,,,


### 2.4 Add the `end` section

In [18]:
### process the first end section
df = df.replace('metro.net/micro',r'<a href="https://www.metro.net/micro">metro.net/micro</a>',regex=True)
end1 = df.loc[(df['en'].str.contains('For more information '))]

end1 = end1.assign(section='end')
end1 = end1.assign(order=1)

### process the second end section
end2 = df.loc[(df['en'].str.contains('\\* M'))]

end2 = end2.assign(order=2)
end2 = end2.assign(section='end')

### add the second end section to the first
end1 = pd.concat([end1, end2])

### add the end section to the final data frame
final_df = pd.concat([final_df, end1])

### preview the end section
end1

Unnamed: 0,en,es,section,order
31,For more information on Metro service changes ...,Para obtener más información sobre los cam-bio...,end,1


## Final output
### 3.1 Additional edits

In [19]:
# ### Line 55
# final_df.loc[final_df.line==55, ['en']] = '55 – New stop at Compton / 89th St for the southbound Line 55.'
# final_df.loc[final_df.line==55, ['zh-TW']] = '55 – 南行 55 號線康普頓 / 89 街的新站。'
# final_df.loc[final_df.line==55, ['vi']] = '55 – Điểm dừng mới tại Compton / 89th St cho Tuyến 55 về phía nam.'
# final_df.loc[final_df.line==55, ['ko']] = '55 – 콤프턴 / 89th St에서 남쪽으로 향하는 55호선에 대한 새로운 정류장.'
# final_df.loc[final_df.line==55, ['ru']] = '55 – Новая остановка на Compton / 89th St для южной линии 55.'
# final_df.loc[final_df.line==55, ['es']] = 'Línea 55: Nueva parada en Compton / 89th St para la línea 55 en dirección sur.'
# final_df.loc[final_df.line==55, ['hy']] = '55՝ Նոր կանգառ Compton / 89th St for the southbound Line 55.'
# final_df.loc[final_df.line==55, ['ja']] = '55 - 南行きのライン55のための Compton / 89th Stで新しい停留所。'
# # final_df.loc[final_df.line==55, ['es', 'zh-TW', 'vi', 'ko', 'ja', 'hy', 'ru']] = '55 – New stop at Compton / 89th St for the southbound Line 55.'
# final_df.loc[final_df.line==55]


In [20]:
# canceled_message = " The following stops are being canceled: "
# canceled_message_es = " Las siguientes paradas están canceladas: "

# w_and_e = "(westbound and eastbound)"
# w_and_e_es = "(hacia el oeste y hacia el este)"

# canceled_message_owl = " The following Owl stops are being canceled " + w_and_e + ": "
# canceled_message_owl_es = " Las siguientes paradas nocturnas están canceladas " + w_and_e_es + ": "

# canceled_stops_2 = "Sunset / Mapleton " + w_and_e + "."
# canceled_stops_2_es = "Sunset / Mapleton " + w_and_e_es + "."

# canceled_stops_4 = "Santa Monica / 11th, Santa Monica / 17th, Santa Monica / 23rd, Santa Monica / Cloverfield, Santa Monica / Yale, Santa Monica / Berkeley, Santa Monica / Centinela, Santa Monica / Wellesley, Santa Monica / Brockton, Santa Monica / Westgate, Santa Monica / Federal, Santa Monica / Sawtelle."

# canceled_stops_33 = "Venice / Ogden (westbound), Venice / Genesee (eastbound)."
# canceled_stops_33_es = "Venice / Ogden (hacia el oeste), Venice / Genesee (hacia el este)."

# canceled_stops_602 = "Sunset / Rockingham " + w_and_e + "."
# canceled_stops_602_es = "Sunset / Rockingham " + w_and_e_es + "."

# final_df.loc[final_df.line==2, ['en']] = final_df.loc[final_df.line==2, ['en']] + canceled_message + canceled_stops_2
# final_df.loc[final_df.line==2, ['es']] = final_df.loc[final_df.line==2, ['es']] + canceled_message_es + canceled_stops_2_es

# final_df.loc[final_df.line==4, ['en']] = final_df.loc[final_df.line==4, ['en']] + canceled_message_owl + canceled_stops_4
# final_df.loc[final_df.line==4, ['es']] = final_df.loc[final_df.line==4, ['es']] + canceled_message_owl_es + canceled_stops_4

# final_df.loc[final_df.line==33, ['en']] = final_df.loc[final_df.line==33, ['en']] + canceled_message + canceled_stops_33
# final_df.loc[final_df.line==33, ['es']] = final_df.loc[final_df.line==33, ['es']] + canceled_message_es + canceled_stops_33_es

# final_df.loc[final_df.line==602, ['en']] = final_df.loc[final_df.line==602, ['en']] + canceled_message + canceled_stops_602
# final_df.loc[final_df.line==602, ['es']] = final_df.loc[final_df.line==602, ['es']] + canceled_message_es + canceled_stops_602_es


### 3.2 Check the data frame

In [21]:
final_df.head(55)

Unnamed: 0,section,order,line,altline,en,es,new-schedule,current-schedule,line_label,AltLine,oid
32,header,1,,,Metro is making service changes.,Metro está haciendo cambios en el servicio.,,,,,
33,header,2,,,"New schedules start December 10, 2021.",El cobro de las tarifas de autobús de Metro se...,,,,,
1,summary,0,,,"Starting on Sunday, December 19, 2021, Metro i...","A partir del domingo 19 de diciembre de 2021, ...",,,,,
2,summary,1,,,We're realigning routes for easier access to k...,Estamos cambiando los recorridos para facilita...,,,,,
3,summary,2,,,Some bus stops will also be consolidated to im...,También se consolidarán algunas paradas de aut...,,,,,
4,summary,3,,,The following lines will have extra trips adde...,Las siguientes líneas tendrán más viajes en di...,,,,,
5,summary,4,,,"On WEEKENDS (Saturday/Sunday): 256, 720","Fines de semana (sábado/domingo): 256, 720",,,,,
6,summary,5,,,On SUNDAY only: 94,Solo Domingo: 94,,,,,
30,summary,7,,,"A Line (Blue), C Line (Green), E Line (Expo), ...","Líneas de tren: A Line (Blue), C Line (Green),...",,,,,
7,details,0,,,We’re modifying service on these bus lines:,Estamos modificando el servicio en las siguien...,,,,,


### 3.3 Split the final data frame into JSON files depending on the language

In [22]:
languages = ['en','es']
# languages = ['en','es','zh-TW','vi','ko','ja','hy','ru']
DATA_OUTPUT_PATH = "../data/takeones/"
for lang in languages:
    ### keep the shared fields plus this language's text, renamed to `content`
    lang_df = final_df[['section','order', lang,'line', 'new-schedule', 'current-schedule']].copy()
    lang_df = lang_df.rename(columns={lang: 'content'})
    lang_df.to_json(DATA_OUTPUT_PATH + 'takeone-' + lang + '.json',orient='records')
    print('Takeone created for: ' + lang)

Takeone created for: en
Takeone created for: es
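
To see what `orient='records'` produces, the sketch below applies the same selection and rename as the loop above to a two-row stand-in for `final_df` and parses the result. The sample rows are invented, and serialising to a string instead of a file is only for illustration.

```python
import json

import pandas as pd

# Hypothetical two-row stand-in for final_df.
sample = pd.DataFrame({
    "section": ["header", "details"],
    "order": [1, 1],
    "en": ["Metro is making service changes.", "2 – example text"],
    "line": [None, 2],
})

# Without a path argument, to_json returns the JSON text instead of writing a file.
json_text = (
    sample[["section", "order", "en", "line"]]
    .rename(columns={"en": "content"})
    .to_json(orient="records")
)

# orient="records" yields a list with one object per row.
records = json.loads(json_text)
print(records[0])
```

Rows without a line number are written with `line` set to `null`, so a consumer of the file should expect missing values for the header, summary, and end sections.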


## Extra code

In [23]:
### RIP: code to split based on `:`
# th['en'] = th['en'].str.split(':')
# th = th.explode('en')
###