# Converting Take One Documents into JSON

## Documentation

This Jupyter Notebook takes in translations of the Take One brochure and outputs them as a JSON file for the MyBus tool.

The data was originally in a Word document.  In transferring it to a Word document, line breaks and spaces were cleaned up in the content.  Different languages use spaces differently.

The output file is used on the "All Changes" page of the MyBus tool to display the Take One brochure as an HTML page instead of only as a PDF file.  It contains all the details for all line changes aggregated into a single view.
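As a rough illustration of what the page consumes, the sketch below serializes one row as a list of records. It assumes the records orientation used in the notebook's commented-out `to_json(..., orient='records')` call and shows only four of the output columns; the variable names `record` and `payload` are made up for this example.

```python
import json

# One illustrative row; the real output carries every column in `headers`
record = {
    "section": "header",
    "order": "1",
    "en": "Metro is making more service changes.",
    "es": "Metro está haciendo más cambios en sus servicios.",
}

# ensure_ascii=False keeps accented and non-Latin text readable in the file
payload = json.dumps([record], ensure_ascii=False, indent=2)
print(payload)
```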

### Notes

#### Not All Lines

Not all lines are listed in the Take One brochure, only those with major changes.  Some lines not listed in the brochure will still have updated schedules due to minor changes.  For the All Changes page to also act as a central source for updated schedule PDFs, the brochure data had to be supplemented with those unlisted lines.

#### Line Numbers

Lines with sister routes are listed in the brochure as a combined line, for example the 16/17.  To match entries with their corresponding schedule PDFs, an additional field for the line number was added.
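A minimal sketch of how a combined entry could be split into one record per line number, assuming each entry starts with the line label followed by an en dash. The function name `expand_combined_entry` and the sample text are hypothetical and not part of the notebook.

```python
def expand_combined_entry(entry: str) -> list[dict]:
    """Split an entry such as '16/17 – text' into one record per line number."""
    label, _, text = entry.partition("\u2013")  # split on the first en dash
    label = label.strip()
    text = text.strip()
    return [
        {"line": int(number), "label": label, "text": text}
        for number in label.split("/")
    ]

records = expand_combined_entry("16/17 \u2013 Example change text.")
for record in records:
    print(record)
```

Each record keeps the original combined label, so the brochure wording stays traceable while the numeric `line` field can be matched against a schedule PDF.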


## Setup 
### 1.1 Import modules

In [92]:
import pandas as pd 
import numpy as np
from docx.api import Document
# import re
# import json

# templates = [["header",1,"Metro is making more service changes.","Metro está haciendo más cambios en sus servicios.","Metro正在進行更多服務調整。","Metro hiện đang thực hiện nhiều thay đổi về dịch vụ.","메트로 서비스가 더욱 새롭게단장하고 있습니다.","メトロのサービスが変更されます。","Metro-ն կրկին փոփոխություններ է իրականացնում ծառայությունների մեջ:","Metro вносит дополнительные изменения в схемы движения."]]
# templates = ["header",1],["summary",1],["details",1],["end",1]
# final_template = pd.DataFrame(templates,columns=["section","order","en","es","zh-TW","vi","ko","ja","hy","ru"])


### 1.2 Read .docx and set final output

In [93]:
document = Document('../data/input/202112shakeup.docx')
table = document.tables[0]

# headers = ["section","order","line","altline","en","es","new-schedule","current-schedule"]
headers = ["section","order","header","line","altline",'route-changes','other-changes','schedule-changes','stop-cancellations',"en","es","zh-TW","vi","ko","ja","hy","ru","new-schedule","current-schedule"]

def reset_final_df():
    return pd.DataFrame(columns=headers)

final_df = pd.DataFrame(columns=headers)
final_df

Unnamed: 0,section,order,header,line,altline,route-changes,other-changes,schedule-changes,stop-cancellations,en,es,zh-TW,vi,ko,ja,hy,ru,new-schedule,current-schedule


### 1.3 Set dataframe to docx table and pre-process data

In [94]:
document = Document('../data/input/202112shakeup.docx')
table = document.tables[0]
# flatten line breaks, drop double quotes and leading whitespace in every cell
data = [[cell.text.replace("\n"," ").replace('"','').lstrip() for cell in row.cells] for row in table.rows]

df = pd.DataFrame(data)
new_header = df.iloc[0]
df = df[1:] 
df.columns = new_header
# print(df.columns)
df = df.rename(columns={'English':'en','Spanish':'es','Chinese (Traditional)':'zh-TW','Korean':'ko','Vietnamese':'vi','Japanese':'ja','Russian':'ru','Armenian':'hy'})

# df = df.rename(columns={'English':'en','Spanish':'es'})

# df = df.rename(columns=df.iloc[0]).drop(df.index[0]).reset_index(drop=True)

df = df.replace(' +',r' ',regex=True)
df = df.replace('"',r'',regex=True)
# df.to_json('test.json')
# df.to_csv('test.csv')
df.head()

final_df = pd.DataFrame(columns=["section","order","header","line","altline",'route-changes','other-changes','schedule-changes','stop-cancellations',"en","es","zh-TW","vi","ko","ja","hy","ru","new-schedule","current-schedule"])
# final_df = pd.DataFrame(columns=["section","order","header","line","altline","en","es",'route_changes','other_changes','schedule_changes','stop_cancellations',"new-schedule","current-schedule"])
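The per-cell cleanup in this step can be tried on its own. The sketch below is a standalone approximation of it, not the notebook's exact code path; the helper name `clean_cell` is an assumption for illustration.

```python
import re

def clean_cell(text: str) -> str:
    """Normalize one table cell: drop line breaks and double quotes, collapse spaces."""
    text = text.replace("\n", " ").replace('"', "")
    text = re.sub(r" +", " ", text)  # collapses runs of ASCII spaces only
    return text.lstrip()

print(clean_cell('  "Metro is making\nmore   service changes."'))
```

The pattern matches only the ASCII space, so full-width spaces in CJK text pass through unchanged. Whether that is desirable depends on how each translation uses spacing.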

## Populating the data

### 2.1 Adding the `Summary` sections

In [95]:
header1 = df.loc[(df['en'].str.contains('\u2013') == False) & (df['en'].str.contains('Metro is making more'))]
header1 = header1.assign(section='header')
header1 = header1.assign(order='1')

if not final_df.empty:
    final_df = reset_final_df()

# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
final_df = pd.concat([final_df, header1])

final_df

Unnamed: 0,section,order,header,line,altline,route-changes,other-changes,schedule-changes,stop-cancellations,en,es,zh-TW,vi,ko,ja,hy,ru,new-schedule,current-schedule
1,header,1,,,,,,,,Metro is making more service changes.,Metro está haciendo más cambios en sus servicios.,Metro正在進行更多服務調整。,Metro sắp có thêm nhiều thay đổi về dịch vụ.,Metro 서비스가 더욱 새로워지고 있습니다.,Metroからのサービス改訂のお知らせ。,Metro-ն նոր փոփոխություններ է կատարում ծառայու...,Metro вносит дополнительные изменения в схемы ...,,


#### 2.1.1 Populating the `Summary` sections

In [96]:
# Update the `df.index < 11` value when the number of rows in the summary changes

# th = df[df['en'].str.contains('Starting on'):df['en'].str.contains('We’re ')]
th = df.loc[(df['en'].str.contains('\u2013') == False) & (df.index < 11) & (df['en'].str.contains('Metro is making more') == False) & (df['en'].str.contains('We’re modifying service on these bus lines:') == False)]

th = th.assign(section='summary')

# number the summary rows 0..n-1 in document order
th_count = th.shape[0]
th['order'] = range(th_count)

th

final_df = pd.concat([final_df, th])
final_df

Unnamed: 0,section,order,header,line,altline,route-changes,other-changes,schedule-changes,stop-cancellations,en,es,zh-TW,vi,ko,ja,hy,ru,new-schedule,current-schedule
1,header,1,,,,,,,,Metro is making more service changes.,Metro está haciendo más cambios en sus servicios.,Metro正在進行更多服務調整。,Metro sắp có thêm nhiều thay đổi về dịch vụ.,Metro 서비스가 더욱 새로워지고 있습니다.,Metroからのサービス改訂のお知らせ。,Metro-ն նոր փոփոխություններ է կատարում ծառայու...,Metro вносит дополнительные изменения в схемы ...,,
2,summary,0,,,,,,,,"Starting on Sunday, December 19, 2021, metro.n...","A partir del domingo 19 de diciembre de 2021, ...",將於2021年12月19日開始使用，metro.net 為了給您帶來更好的公交體驗，Metr...,"Bắt đầu từ Chủ Nhật, ngày 19 tháng 12 năm 2021...","2021년 12월 19일 일요일부터 시작, metro.net 더 나은 버스 환경을 ...",2021年12月19日より、metro.net Metroは、バスの利便性向上のため、サービ...,"Սկսած կիրակի՝ 2021 թ․ դեկտեմբերի 19-ից, metro....","Начиная с воскресенья, 19 декабря 2021 года, m...",,
3,summary,1,,,,,,,,The following lines will have extra trips in D...,Las siguientes líneas tendrán más viajes en di...,以下路線在12月會增加班次。,Những tuyến sau sẽ được tăng chuyến trong thán...,다음 노선은 12월에 운행이 추가될 예정입니다.,下記のライン路線では、12月に臨時便が運行されます。,Դեկտեմբերին հետևյալ գծերը կիրականացնեն հավելյա...,Дополнительные рейсы будут осуществляться по с...,,
4,summary,2,,,,,,,,On Weekdays:,Entre semana:,工作日：,Vào các Ngày Trong Tuần:,평일:,平日:,Աշխատանքային օրերին՝,В будние дни:,,
5,summary,3,,,,,,,,"10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...","10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...","10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...","10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...","10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...",10、14、16、55、60、66、70、94、108、125、152、165、166、23...,"10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...","10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...",,
6,summary,4,,,,,,,,On Weekends (Saturdays/Sundays):,Fines de semana (sábado/domingo):,週末（星期六/星期日）：,Vào các Ngày Cuối Tuần (Thứ Bảy/Chủ Nhật):,주말(토/일):,週末 (土曜日/日曜日)：,Հանգստյան օրերին (շաբաթ/կիրակի)՝,По выходным (суббота/воскресенье):,,
7,summary,5,,,,,,,,"256, 720","256, 720","256, 720","256, 720","256, 720",256、720,"256, 720","256, 720",,
8,summary,6,,,,,,,,On Sundays:,Solo domingo:,星期日：,Chủ Nhật:,일요일:,日曜日：,Կիրակի՝,Воскресенье:,,
9,summary,7,,,,,,,,94,94,94,94,94,94,94,94,,
10,summary,8,,,,,,,,"Rail: A Line (Blue), C Line (Green), E Line (E...","Líneas de tren: A Line (Blue), C Line (Green),...",軌道服務 – 輕軌線路： A Line（Blue）、C Line（Green）、E Line...,Đường sắt – Các tuyến đường sắt nội thành: A L...,"Rail – 경전철(Light rail line): A Line(Blue), C L...",鉄道 – ライトレールライン： A Line（ブルー）、C Line（グリーン）、E Lin...,Երկաթգիծ – Թեթև երկաթուղային գծերը՝ A Line (Bl...,Железнодорожные линии – Легкорельсовые линии: ...,,
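The hard-coded `df.index < 11` cutoff has to be edited whenever the summary grows or shrinks. One possible alternative, sketched here on a plain list of strings rather than the notebook's DataFrame, is to take the rows that sit between two marker phrases. The function name `rows_between` and the shortened sample rows are assumptions for illustration.

```python
def rows_between(rows: list[str], start_marker: str, end_marker: str) -> list[str]:
    """Return the rows strictly between the first row containing start_marker
    and the first later row containing end_marker."""
    start = next(i for i, row in enumerate(rows) if start_marker in row)
    end = next(i for i, row in enumerate(rows) if i > start and end_marker in row)
    return rows[start + 1:end]

sample = [
    "Metro is making more service changes.",
    "On Weekdays:",
    "256, 720",
    "We\u2019re modifying service on these bus lines:",
    "2 \u2013 Lines 2 and 200 merge",
]
summary = rows_between(sample, "Metro is making more", "We\u2019re modifying")
print(summary)
```

If either marker is missing, `next` raises `StopIteration`, which surfaces a changed brochure layout instead of silently selecting the wrong rows.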


In [97]:
df

Unnamed: 0,en,es,zh-TW,vi,ko,ja,hy,ru
1,Metro is making more service changes.,Metro está haciendo más cambios en sus servicios.,Metro正在進行更多服務調整。,Metro sắp có thêm nhiều thay đổi về dịch vụ.,Metro 서비스가 더욱 새로워지고 있습니다.,Metroからのサービス改訂のお知らせ。,Metro-ն նոր փոփոխություններ է կատարում ծառայու...,Metro вносит дополнительные изменения в схемы ...
2,"Starting on Sunday, December 19, 2021, metro.n...","A partir del domingo 19 de diciembre de 2021, ...",將於2021年12月19日開始使用，metro.net 為了給您帶來更好的公交體驗，Metr...,"Bắt đầu từ Chủ Nhật, ngày 19 tháng 12 năm 2021...","2021년 12월 19일 일요일부터 시작, metro.net 더 나은 버스 환경을 ...",2021年12月19日より、metro.net Metroは、バスの利便性向上のため、サービ...,"Սկսած կիրակի՝ 2021 թ․ դեկտեմբերի 19-ից, metro....","Начиная с воскресенья, 19 декабря 2021 года, m..."
3,The following lines will have extra trips in D...,Las siguientes líneas tendrán más viajes en di...,以下路線在12月會增加班次。,Những tuyến sau sẽ được tăng chuyến trong thán...,다음 노선은 12월에 운행이 추가될 예정입니다.,下記のライン路線では、12月に臨時便が運行されます。,Դեկտեմբերին հետևյալ գծերը կիրականացնեն հավելյա...,Дополнительные рейсы будут осуществляться по с...
4,On Weekdays:,Entre semana:,工作日：,Vào các Ngày Trong Tuần:,평일:,平日:,Աշխատանքային օրերին՝,В будние дни:
5,"10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...","10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...","10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...","10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...","10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...",10、14、16、55、60、66、70、94、108、125、152、165、166、23...,"10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...","10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,..."
6,On Weekends (Saturdays/Sundays):,Fines de semana (sábado/domingo):,週末（星期六/星期日）：,Vào các Ngày Cuối Tuần (Thứ Bảy/Chủ Nhật):,주말(토/일):,週末 (土曜日/日曜日)：,Հանգստյան օրերին (շաբաթ/կիրակի)՝,По выходным (суббота/воскресенье):
7,"256, 720","256, 720","256, 720","256, 720","256, 720",256、720,"256, 720","256, 720"
8,On Sundays:,Solo domingo:,星期日：,Chủ Nhật:,일요일:,日曜日：,Կիրակի՝,Воскресенье:
9,94,94,94,94,94,94,94,94
10,"Rail: A Line (Blue), C Line (Green), E Line (E...","Líneas de tren: A Line (Blue), C Line (Green),...",軌道服務 – 輕軌線路： A Line（Blue）、C Line（Green）、E Line...,Đường sắt – Các tuyến đường sắt nội thành: A L...,"Rail – 경전철(Light rail line): A Line(Blue), C L...",鉄道 – ライトレールライン： A Line（ブルー）、C Line（グリーン）、E Lin...,Երկաթգիծ – Թեթև երկաթուղային գծերը՝ A Line (Bl...,Железнодорожные линии – Легкорельсовые линии: ...


#### 2.1.2 Adding Metro Rail Lines in the summary section

In [98]:
### filter out the rail lines
### note: right now this is hard coded... need a list of rail lines..
rail_df = df.loc[(df['en'].str.contains('\u2013')) & (df['en'].str.startswith('A Line (Blue), C Line (Green)') == True)]

### add this to the end of all the lines
end_lines = len(th) +1

### set the properties
rail_df = rail_df.assign(section='summary')
rail_df = rail_df.assign(order=end_lines)

### add to the final data frame
final_df = pd.concat([final_df, rail_df])
final_df

Unnamed: 0,section,order,header,line,altline,route-changes,other-changes,schedule-changes,stop-cancellations,en,es,zh-TW,vi,ko,ja,hy,ru,new-schedule,current-schedule
1,header,1,,,,,,,,Metro is making more service changes.,Metro está haciendo más cambios en sus servicios.,Metro正在進行更多服務調整。,Metro sắp có thêm nhiều thay đổi về dịch vụ.,Metro 서비스가 더욱 새로워지고 있습니다.,Metroからのサービス改訂のお知らせ。,Metro-ն նոր փոփոխություններ է կատարում ծառայու...,Metro вносит дополнительные изменения в схемы ...,,
2,summary,0,,,,,,,,"Starting on Sunday, December 19, 2021, metro.n...","A partir del domingo 19 de diciembre de 2021, ...",將於2021年12月19日開始使用，metro.net 為了給您帶來更好的公交體驗，Metr...,"Bắt đầu từ Chủ Nhật, ngày 19 tháng 12 năm 2021...","2021년 12월 19일 일요일부터 시작, metro.net 더 나은 버스 환경을 ...",2021年12月19日より、metro.net Metroは、バスの利便性向上のため、サービ...,"Սկսած կիրակի՝ 2021 թ․ դեկտեմբերի 19-ից, metro....","Начиная с воскресенья, 19 декабря 2021 года, m...",,
3,summary,1,,,,,,,,The following lines will have extra trips in D...,Las siguientes líneas tendrán más viajes en di...,以下路線在12月會增加班次。,Những tuyến sau sẽ được tăng chuyến trong thán...,다음 노선은 12월에 운행이 추가될 예정입니다.,下記のライン路線では、12月に臨時便が運行されます。,Դեկտեմբերին հետևյալ գծերը կիրականացնեն հավելյա...,Дополнительные рейсы будут осуществляться по с...,,
4,summary,2,,,,,,,,On Weekdays:,Entre semana:,工作日：,Vào các Ngày Trong Tuần:,평일:,平日:,Աշխատանքային օրերին՝,В будние дни:,,
5,summary,3,,,,,,,,"10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...","10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...","10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...","10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...","10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...",10、14、16、55、60、66、70、94、108、125、152、165、166、23...,"10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...","10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...",,
6,summary,4,,,,,,,,On Weekends (Saturdays/Sundays):,Fines de semana (sábado/domingo):,週末（星期六/星期日）：,Vào các Ngày Cuối Tuần (Thứ Bảy/Chủ Nhật):,주말(토/일):,週末 (土曜日/日曜日)：,Հանգստյան օրերին (շաբաթ/կիրակի)՝,По выходным (суббота/воскресенье):,,
7,summary,5,,,,,,,,"256, 720","256, 720","256, 720","256, 720","256, 720",256、720,"256, 720","256, 720",,
8,summary,6,,,,,,,,On Sundays:,Solo domingo:,星期日：,Chủ Nhật:,일요일:,日曜日：,Կիրակի՝,Воскресенье:,,
9,summary,7,,,,,,,,94,94,94,94,94,94,94,94,,
10,summary,8,,,,,,,,"Rail: A Line (Blue), C Line (Green), E Line (E...","Líneas de tren: A Line (Blue), C Line (Green),...",軌道服務 – 輕軌線路： A Line（Blue）、C Line（Green）、E Line...,Đường sắt – Các tuyến đường sắt nội thành: A L...,"Rail – 경전철(Light rail line): A Line(Blue), C L...",鉄道 – ライトレールライン： A Line（ブルー）、C Line（グリーン）、E Lin...,Երկաթգիծ – Թեթև երկաթուղային գծերը՝ A Line (Bl...,Железнодорожные линии – Легкорельсовые линии: ...,,
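The rail filter in this step is tied to one opening phrase, and its comment asks for a list of rail lines instead. A small sketch of such a list-based check follows. The tuple `RAIL_LINE_NAMES` and the function `is_rail_entry` are hypothetical, and the list is deliberately incomplete: it holds only the two names that appear in full in the brochure text shown here.

```python
# Hypothetical, incomplete list for illustration only
RAIL_LINE_NAMES = ("A Line (Blue)", "C Line (Green)")

def is_rail_entry(text: str) -> bool:
    """True when the text begins with one of the known rail line names."""
    return text.startswith(RAIL_LINE_NAMES)  # str.startswith accepts a tuple

print(is_rail_entry("A Line (Blue), C Line (Green)"))
print(is_rail_entry("2 \u2013 Lines 2 and 200 merge"))
```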


<!-- 2.1.3. Joining summary page headers -->

In [99]:
# GOOGLE_SHEET_URL = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vSENm-oLTxuzcQUX_0tZ9X0Q2_HIudg1hi5p0MMauqWoHCuomsxb6H6AhqOkaeBY-X1ZKBTbFAzDKUM/pub?output=csv'
# line_changes = pd.read_csv(GOOGLE_SHEET_URL,
#     usecols={'Line Number', 'Line Label', 'Line Description','Route changes','Other changes','Schedule Changes','Stop Cancellations', 'Service'})

# # line_changes.columns = ["line-number","line-label","line-description","lines-merged","line-discontinued","details","card-1","card-2","card-3","current-schedule-url"]

# # line_changes = line_changes.fillna('')
# line_changes.head()

### 2.2. Adding pre-header for `details`

In [100]:
detail_header = df.loc[(df['en'].str.contains('\u2013') == False) & (df.index < 20) & (df['en'].str.contains('We’re modify') == True)]

detail_header = detail_header.assign(section='details')
detail_header = detail_header.assign(order=0)

final_df = pd.concat([final_df, detail_header])
detail_header
# final_df.to_json('final_takeone.json',orient='records')

Unnamed: 0,en,es,zh-TW,vi,ko,ja,hy,ru,section,order
11,We’re modifying service on these bus lines:,Estamos modificando el servicio en las siguien...,我們正在修改這些巴士線路的服務：,Chúng tôi hiện đang điều chỉnh dịch vụ ở những...,당사는 이 버스 노선 서비스를 변경하고 있습니다.,下記のバスラインのサービスが変更されます：,Մենք փոփոխում ենք հետևյալ ավտոբուսների գծերի ծ...,Мы вносим изменения в схему движения следующих...,details,0


In [101]:
detail_header

Unnamed: 0,en,es,zh-TW,vi,ko,ja,hy,ru,section,order
11,We’re modifying service on these bus lines:,Estamos modificando el servicio en las siguien...,我們正在修改這些巴士線路的服務：,Chúng tôi hiện đang điều chỉnh dịch vụ ở những...,당사는 이 버스 노선 서비스를 변경하고 있습니다.,下記のバスラインのサービスが変更されます：,Մենք փոփոխում ենք հետևյալ ավտոբուսների գծերի ծ...,Мы вносим изменения в схему движения следующих...,details,0


### 2.3 Adding the `details`/lines section

#### 2.3.1 Process all the lines
First, read every line from the master list of lines.

In [102]:
lines_df = pd.read_csv('../data/input/mybus-dec-2021 - Lines.csv', index_col=0)
lines_df['AltLine'] = lines_df.AltLine.fillna(0).astype(int)
# .copy() makes an independent frame, so adding `order` below does not raise SettingWithCopyWarning
all_lines = lines_df[['Line Label',"AltLine"]].copy()

# number the lines 1..n in ascending line-number order
all_lines = all_lines.sort_values(by="Line Number")
all_lines['order'] = range(1, len(all_lines) + 1)
all_lines.reset_index(inplace=True)
all_lines = all_lines.rename(columns={"Line Label":"line_label","Line Number":"line"})
all_lines.head(4)

Unnamed: 0,line,line_label,AltLine,order
0,2,2,0,1
1,4,4,0,2
2,10,10,10,3
3,14,14,14,4


#### 2.3.2 Filter the docx table for the `line details`
 

In [103]:
### keep the entries that contain an en dash (\u2013) and exclude the rail entry
### .copy() avoids the SettingWithCopyWarning when the `line` column is added below
lines_takeone_df = df.loc[(df['en'].str.contains('\u2013')) & (df['en'].str.startswith('Rail –') == False)].copy()

### create a field called `line` and set it to the text before the first en dash
lines_takeone_df['line'] = lines_takeone_df.en.str.split('–').str[0]

### split combined labels such as "78/79" into one row per line number, then extract the duplicates
lines_takeone_df = lines_takeone_df.assign(oid=lines_takeone_df.line.str.split('/')).explode('oid')
dupes = lines_takeone_df.loc[(lines_takeone_df.duplicated(subset=['line']))]

### remove duplicates
lines_takeone_df = lines_takeone_df.drop_duplicates(subset=['line'])
### remove any lines with the "/" in it
lines_takeone_df = lines_takeone_df[lines_takeone_df["line"].str.contains("/")==False]

# lines_takeone_df
lines_takeone_df


Unnamed: 0,en,es,zh-TW,vi,ko,ja,hy,ru,line,oid
12,2 – Lines 2 and 200 merge into new Line 2 betw...,2: Las líneas 2 y 200 se fusionarán y formarán...,2 – 2號和200號路線合併為新的2號路線始終站為USC和UCLA/西木區，在工作日和週末...,2 – Tuyến 2 và Tuyến 200 sẽ được gộp thành Tuy...,2 - 2번과 200번 노선이 신규 2번 노선으로 병합되어 평일과 주말에 Alvar...,2 – ライン2およびライン200が統合され、平日および週末ともに、Alvarado Stお...,2 – 2 և 200 գծերը միավորվում են՝ դառնալով նոր ...,2 - Маршруты 2 и 200 объединяются в новый Марш...,2,2
13,4 – Line 4 changes route at the north end of d...,4: Línea 4 cambia su recorrido en el extremo n...,4 – 4號路線在LA市區北端的路線有所改變，更靠近Union Station。東行路線經C...,4 – Tuyến 4 sẽ đổi hướng ở phía bắc của trung ...,4 – 4번 노선이 Union Station에 더 근접하여 운행하기 위해 다운타운 ...,4 – ライン4は、Union Stationの近くを通過するようにダウンタウンLAの北端で...,4 – Line 4-ը փոխում է իր ուղղությունը downtown...,4 – Маршрут 4 меняет схему движения в северной...,4,4
14,33 – Bus stops are discontinued for both direc...,33: Se descontinuarán las paradas de autobús e...,33 – 由於客流量較低，且附近有替代站點，Venice Bl上的Glyndon、Butle...,33 – Các trạm dừng xe buýt trên cả hai chiều s...,33 – 버스 정류장이 이용객 감소와 인근 대체 정류장으로 인해 Glyndon의 V...,33 – GlyndonのVenice Bl、Butler/Minerva、Military...,33 – Ավտոբուսի կանգառները կդադարեն գործել երկո...,33 - Автобусные остановки убрали для обоих нап...,33,33
15,51 – Line 51 north terminus moves from Wilshir...,51: La terminal norte de la Línea 51 se trasla...,51 – 5號路線1的北端終點站從Wilshire/Vermont移至Westlake Ma...,51 – Trạm cuối phía bắc của Tuyến 51 sẽ chuyển...,51 – 51번 노선 북부 종착역이 Wilshire/Vermont에서 Westlak...,51 – ライン51の北終点は、Wilshire/VermontからWestlake Mac...,51 – Line 51-ի հյուսիսի վերջին կանգառը տեղափոխ...,51 - Маршрут 51 северная конечная остановка пе...,51,51
16,53 – Line 53 changes route to serve the upgrad...,53: Line 53 cambia su ruta para brindar servic...,53 - 53號路線有所改變，為Willowbrook/Rosa Parks A Line（...,53 – Tuyến 53 sẽ thay đổi lộ trình để phục vụ ...,53 – 53번 노선이 경로를 변경하여 Avalon C Line (Green) St...,53 – ライン53は、Avalon C Line（グリーン）駅の代わりに、Willowbr...,"53 – Line 53-ը փոխում է ուղղությունը, որպեսզի ...",53 – Маршрут 53 меняет схему движения для обсл...,53,53
18,81 – Bus stops are discontinued at Figueroa/74...,81: Se descontinuarán las paradas de autobús e...,81 – 由於客流量較低，且附近有替代站點，Figueroa/74th St和Figuero...,81 – Các trạm dừng xe buýt trên cả hai chiều s...,81 – 버스 정류장이 Figueroa/74th St와Figueroa/Ave 59 ...,81 – Figueroa/74th StおよびFigueroa/Ave 59のバス停は、利...,81 – Ավտոբուսի կանգառները կդադարեն գործել Figu...,81 – Автобусные остановки больше не работают в...,81,81
19,110 – Line 110 east terminus remains at Bell G...,110: La terminal este de la Línea 110 permanec...,110 – 110號路線的東行線終點站仍設在Bell Garden（Granger/Flor...,110 – Trạm cuối phía đông của Tuyến 110 vẫn ở ...,110 – 110번 노선 동부 종점은 Bell Gardens(Granger/Flor...,110 – ライン110の東終点は、Bell Gardens（Granger/Florenc...,110 – Line 110-ի արևելյան վերջին կանգառը մնում...,110 - Восточная конечная остановка Маршрута 11...,110,110
20,154 – Line 154 changes route to proceed direct...,"154: Línea 154 cambia su ruta, que irá directa...",154 - 由於Edison Bl和Oxnard St的客流量較低，154號路線將改變路線，...,154 – Tuyến 154 sẽ đổi lộ trình để chạy thẳng ...,154 – 154번 노선 경로가 Edison Bl와 Oxnard St의 이용객 감소...,154 – ライン154は、Edison BlおよびOxnard Stでの乗降人員が少なく、...,154 – Line 154-ը փոխում է ուղղությունը և շարու...,154 - Маршрут 154 меняет схему движения и буде...,154,154
21,177 – Line 177 will extend service further nor...,177: Línea 177 extenderá su servicio hacia el ...,177 – 177號路線將進一步向北延伸，途經Fair Oaks Av，從Mountain ...,177 – Tuyến 177 sẽ mở rộng phạm vi dịch vụ về ...,177 – 177번 노선은 Fair Oaks Av를 경유하여 북부 방향으로 연장 서...,177 – ライン177は、Pasadenaにお住いの皆様にさらにご利用いただくため、Fai...,177 – Line 177-ը կընդլայնի սպասարկվող տարածքը ...,177 - Маршрут 177 продлит движение дальше на с...,177,177
22,179 – New Line 179 will operate between Rose H...,179: La New Línea 179 funcionará entre el Rose...,179 - 新的179號路線的始終站為RoseHill交通中心（Huntington Dr/...,179 – Tuyến 179 mới sẽ chạy từ Rose Hill Trans...,179 – 신규 179번 노선이 Rose Hill Transit Center(Hun...,179 – 新しい179は、Rose Hill Transit Center（Hunting...,179 – Նոր Line 179-ը կգործի Rose Hill Transit ...,179 - Новый маршрут 179 будет курсировать межд...,179,179


In [104]:

# dupes2 = dupes.replace({'(\d+([ ]?[/])\d+)': '<br>'}, regex=True)
# dupes2

#### 2.3.3 Re-add duplicates

In [105]:
dupes['line'] = dupes['line'].str.split('/')
dupes = dupes.explode('line')
temp_df = dupes
temp_df2 = pd.DataFrame()
# dupes
for this_line in dupes['line']:
    line = this_line.strip(" ")
    print(line)
    temp_df = dupes[dupes["line"].str.contains(line)]
    # collapse a combined label such as "78/79" to this line's number
    temp_df = temp_df.replace({r'(\d+([ ]?[/])\d+)': line}, regex=True,limit=1)
    # labels joined by " y " (Spanish column)
    temp_df = temp_df.replace({r'(\d+)([ ][y][ ])(\d+)[ ]?:': line+":"}, regex=True,limit=1)
    temp_df = temp_df.replace({'Líneas': 'Línea'}, regex=True,limit=1)
    # temp_df = temp_df.replace({'(\d+([ ]?[-][ ]?)\d+)': line}, regex=True,limit=1)
    # labels joined by " ՝ " (Armenian column)
    temp_df = temp_df.replace({r'(\d+([ ][՝][ ])\d+)': line}, regex=True,limit=1)
    # DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
    temp_df2 = pd.concat([temp_df2, temp_df])

# temp_df
# lines_takeone_df = lines_takeone_df.append(dupes)
# this_line
# df_updated = dupes.replace({'|*****|': dupes['line']}, regex=True,limit=1)
  # Print the updated dataframe
# df_updated

lines_takeone_df = pd.concat([lines_takeone_df, temp_df2])
# df_updated

temp_df2
# dupes

78
79


Unnamed: 0,en,es,zh-TW,vi,ko,ja,hy,ru,line,oid
17,78 – Lines 78 & 79 will be separated. Line 78 ...,78: Las líneas 78 y 79 se separarán. Línea 78 ...,78 - 78號和79號路線將被分開。78號路線將繼續運營其常規路線，從南Arcadia至L...,78 – Tuyến 78 và 79 sẽ hoạt động riêng biệt. T...,78 – 78번 및 79번 노선은 분리됩니다. 78번 노선은 South Arcadi...,78 – ライン78 & 79は分割されます。ライン78は、引き続きダウンタウンLAとSou...,78 – 78 և 79 գծերը կառանձնացվեն Line 78-ը կշար...,78 – Маршруты 78 и 79 будут разделены. Маршрут...,78,79
17,79 – Lines 78 & 79 will be separated. Line 78 ...,79: Las líneas 78 y 79 se separarán. Línea 78 ...,79 - 78號和79號路線將被分開。78號路線將繼續運營其常規路線，從南Arcadia至L...,79 – Tuyến 78 và 79 sẽ hoạt động riêng biệt. T...,79 – 78번 및 79번 노선은 분리됩니다. 78번 노선은 South Arcadi...,79 – ライン78 & 79は分割されます。ライン78は、引き続きダウンタウンLAとSou...,79 – 78 և 79 գծերը կառանձնացվեն Line 78-ը կշար...,79 – Маршруты 78 и 79 будут разделены. Маршрут...,79,79
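The relabeling idea in this step can be checked on a single string with the standard `re` module. This is only a sketch of the first substitution (the slash-joined label), not a replacement for the DataFrame code; the names `COMBINED_LABEL` and `relabel` are hypothetical.

```python
import re

COMBINED_LABEL = re.compile(r"\d+ ?/\d+")  # e.g. "78/79" or "78 /79"

def relabel(text: str, line: str) -> str:
    """Replace the first combined label in text with a single line number."""
    return COMBINED_LABEL.sub(line, text, count=1)

original = "78/79 \u2013 Lines 78 & 79 will be separated."
for line in ("78", "79"):
    print(relabel(original, line))
```

Passing `count=1` limits the substitution to the leading label, so any later slash-joined numbers in the description are left alone.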


#### 2.3.4 Join pdfs

In [106]:
# import shutil
import os

# collect one record per line number found in the schedule PDF filenames
pdfs_list = []

# filenames look like "910-950_TT_12-19-21.pdf": the part before "_TT" holds one or more line numbers joined by "-"
for root, dirs, files in os.walk("../files/schedules"):
    for filename in files:
        lines = filename.replace(" ","").split("_TT")[0].split("-")
        for line in lines:
            this_schedule = {}
            this_schedule['line'] = line.lstrip("0")
            this_schedule['new-schedule'] = "./files/schedules/"+filename
            pdfs_list.append(this_schedule)
            # print(line)
# print(pdfs_list)

schedule_df = pd.DataFrame(pdfs_list)
schedule_df.tail(10)



Unnamed: 0,line,new-schedule
62,720,./files/schedules/720_TT_12-19-21.pdf
63,761,./files/schedules/761_TT_12-19-21.pdf
64,801,./files/schedules/801_TT_12-19-21.pdf
65,803,./files/schedules/803_TT_12-19-21.pdf
66,804,./files/schedules/804_TT_12_19_21.pdf
67,806,./files/schedules/806_TT_12-19-21.pdf
68,854,./files/schedules/854_TT_12-19-21.pdf
69,901,./files/schedules/901_TT_12-19-21.pdf
70,910,./files/schedules/910-950_TT_12-19-21.pdf
71,950,./files/schedules/910-950_TT_12-19-21.pdf
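The filename parsing used in this step can be isolated into a small function for testing. The sketch below mirrors the same string operations; the name `lines_from_filename` is an assumption, not something defined in the notebook.

```python
def lines_from_filename(filename: str) -> list[str]:
    """Extract line numbers from a name like '910-950_TT_12-19-21.pdf'."""
    prefix = filename.replace(" ", "").split("_TT")[0]
    # a shared schedule lists several lines joined by "-"; leading zeros are dropped
    return [part.lstrip("0") for part in prefix.split("-")]

print(lines_from_filename("910-950_TT_12-19-21.pdf"))
print(lines_from_filename("002_TT_12-19-21.pdf"))
```

Because only the text before `_TT` is parsed, a date written with underscores, as in `804_TT_12_19_21.pdf`, does not affect the extracted line number.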


#### 2.3.5 Join header data with schedule data

#### 2.3.6 Join `lines docx` data to `all lines` data

We use the pandas method `merge` to join the data on the `line` field and use an `outer` join to make sure to keep all the line data.
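A toy example of the outer join, using made-up frames with the same key column. The frame contents are invented for illustration, and `indicator=True` is an optional extra that this notebook does not use; it adds a `_merge` column showing which side each row came from.

```python
import pandas as pd

all_lines = pd.DataFrame({"line": [2, 4, 10], "order": [1, 2, 3]})
takeone = pd.DataFrame({"line": [2, 4], "en": ["entry for line 2", "entry for line 4"]})

# how="outer" keeps line 10 even though the brochure has no entry for it
merged = all_lines.merge(takeone, on="line", how="outer", indicator=True)
print(merged)
```

Rows found only in the master list end up with missing values in the translation columns, which matches the empty cells for lines such as 10 and 14 in the merged output.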

In [107]:
### convert the unique line field to the same data type, integers
all_lines['line'] = all_lines['line'].astype(int)
# all_lines['line']
# all_lines
lines_takeone_df['line']
lines_takeone_df['line'] = lines_takeone_df['line'].astype(int)
schedule_df['line'] = schedule_df['line'].astype(int)

### perform the merge 
merged_lines = all_lines.merge(lines_takeone_df, on='line',how='outer')
merged_lines2 = merged_lines.merge(schedule_df, on='line',how='outer')

### assign the "details" section
merged_lines2 = merged_lines2.assign(section='details')
# merged_lines['AltLine'] = all_lines['line'].astype(int)
# merged_lines2 = merged_lines2.loc[merged_lines2['line'] == 801]


#### 2.3.7 Joining the `line-changes.json`

The `line-changes.ipynb` notebook needs to be run first to generate `line-changes.json`, which pulls the latest data from the source Google Sheet.

Here we perform another join to add the current-schedule PDFs to the final JSON. We start from `merged_lines2` and create a new output data frame, `merged_lines3`, since this is the third merge.

In [108]:
line_changes_json = pd.read_json('../data/line-changes.json')
line_changes_json_short = line_changes_json[['route-changes','other-changes','schedule-changes','stop-cancellations','line-number',"current-schedule-url"]]
line_changes_json_short.head(10)


Unnamed: 0,route-changes,other-changes,schedule-changes,stop-cancellations,line-number,current-schedule-url
0,False,False,True,False,2,
1,False,True,False,False,4,
2,True,False,False,False,10,
3,True,False,False,False,14,
4,True,False,False,False,16,
5,False,False,False,False,18,https://media.metro.net/documents/ca3c7637-a7f...
6,False,False,False,False,20,
7,False,False,False,False,28,
8,False,False,False,False,30,https://www.metro.net/line-override/line-30/at...
9,False,True,False,False,33,https://media.metro.net/documents/bdca96d4-90d...


In [109]:

line_changes_json_short = line_changes_json_short.rename(columns={"line-number":"line","current-schedule-url":"current-schedule"})
# rename() and astype() each return a new frame, so nothing is assigned on a slice of line_changes_json
line_changes_json_short = line_changes_json_short.astype({'line': int})
merged_lines2['line'] = merged_lines2['line'].astype(int)

merged_lines3 = merged_lines2.merge(line_changes_json_short, on='line',how='outer')
merged_lines3.head(10)


Unnamed: 0,line,line_label,AltLine,order,en,es,zh-TW,vi,ko,ja,hy,ru,oid,new-schedule,section,route-changes,other-changes,schedule-changes,stop-cancellations,current-schedule
0,2,2,0.0,1,2 – Lines 2 and 200 merge into new Line 2 betw...,2: Las líneas 2 y 200 se fusionarán y formarán...,2 – 2號和200號路線合併為新的2號路線始終站為USC和UCLA/西木區，在工作日和週末...,2 – Tuyến 2 và Tuyến 200 sẽ được gộp thành Tuy...,2 - 2번과 200번 노선이 신규 2번 노선으로 병합되어 평일과 주말에 Alvar...,2 – ライン2およびライン200が統合され、平日および週末ともに、Alvarado Stお...,2 – 2 և 200 գծերը միավորվում են՝ դառնալով նոր ...,2 - Маршруты 2 и 200 объединяются в новый Марш...,2.0,./files/schedules/002_TT_12-19-21.pdf,details,False,False,True,False,
1,4,4,0.0,2,4 – Line 4 changes route at the north end of d...,4: Línea 4 cambia su recorrido en el extremo n...,4 – 4號路線在LA市區北端的路線有所改變，更靠近Union Station。東行路線經C...,4 – Tuyến 4 sẽ đổi hướng ở phía bắc của trung ...,4 – 4번 노선이 Union Station에 더 근접하여 운행하기 위해 다운타운 ...,4 – ライン4は、Union Stationの近くを通過するようにダウンタウンLAの北端で...,4 – Line 4-ը փոխում է իր ուղղությունը downtown...,4 – Маршрут 4 меняет схему движения в северной...,4.0,./files/schedules/004_TT_12-19-21.pdf,details,False,True,False,False,
2,10,10,10.0,3,,,,,,,,,,./files/schedules/010_TT_12-19-21.pdf,details,True,False,False,False,
3,14,14,14.0,4,,,,,,,,,,./files/schedules/014_TT_12-19-21.pdf,details,True,False,False,False,
4,16,16,0.0,5,,,,,,,,,,./files/schedules/016_TT_12-19-21.pdf,details,True,False,False,False,
5,18,18,0.0,6,,,,,,,,,,,details,False,False,False,False,https://media.metro.net/documents/ca3c7637-a7f...
6,20,20,0.0,7,,,,,,,,,,./files/schedules/020_TT_12-19-21.pdf,details,False,False,False,False,
7,28,28,0.0,8,,,,,,,,,,./files/schedules/028_TT_12-19-21.pdf,details,False,False,False,False,
8,30,30,0.0,9,,,,,,,,,,,details,False,False,False,False,https://www.metro.net/line-override/line-30/at...
9,33,33,0.0,10,33 – Bus stops are discontinued for both direc...,33: Se descontinuarán las paradas de autobús e...,33 – 由於客流量較低，且附近有替代站點，Venice Bl上的Glyndon、Butle...,33 – Các trạm dừng xe buýt trên cả hai chiều s...,33 – 버스 정류장이 이용객 감소와 인근 대체 정류장으로 인해 Glyndon의 V...,33 – GlyndonのVenice Bl、Butler/Minerva、Military...,33 – Ավտոբուսի կանգառները կդադարեն գործել երկո...,33 - Автобусные остановки убрали для обоих нап...,33.0,,details,False,True,False,False,https://media.metro.net/documents/bdca96d4-90d...


#### 2.3.8 Join the merged lines to the final data frame

In [110]:
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
final_df = pd.concat([final_df, merged_lines3])

#### 2.3.9 Join the rail data at the end of the `details`

In [111]:
final_df.head(20)

Unnamed: 0,section,order,header,line,altline,route-changes,other-changes,schedule-changes,stop-cancellations,en,...,vi,ko,ja,hy,ru,new-schedule,current-schedule,line_label,AltLine,oid
1,header,1,,,,,,,,Metro is making more service changes.,...,Metro sắp có thêm nhiều thay đổi về dịch vụ.,Metro 서비스가 더욱 새로워지고 있습니다.,Metroからのサービス改訂のお知らせ。,Metro-ն նոր փոփոխություններ է կատարում ծառայու...,Metro вносит дополнительные изменения в схемы ...,,,,,
2,summary,0,,,,,,,,"Starting on Sunday, December 19, 2021, metro.n...",...,"Bắt đầu từ Chủ Nhật, ngày 19 tháng 12 năm 2021...","2021년 12월 19일 일요일부터 시작, metro.net 더 나은 버스 환경을 ...",2021年12月19日より、metro.net Metroは、バスの利便性向上のため、サービ...,"Սկսած կիրակի՝ 2021 թ․ դեկտեմբերի 19-ից, metro....","Начиная с воскресенья, 19 декабря 2021 года, m...",,,,,
3,summary,1,,,,,,,,The following lines will have extra trips in D...,...,Những tuyến sau sẽ được tăng chuyến trong thán...,다음 노선은 12월에 운행이 추가될 예정입니다.,下記のライン路線では、12月に臨時便が運行されます。,Դեկտեմբերին հետևյալ գծերը կիրականացնեն հավելյա...,Дополнительные рейсы будут осуществляться по с...,,,,,
4,summary,2,,,,,,,,On Weekdays:,...,Vào các Ngày Trong Tuần:,평일:,平日:,Աշխատանքային օրերին՝,В будние дни:,,,,,
5,summary,3,,,,,,,,"10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...",...,"10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...","10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...",10、14、16、55、60、66、70、94、108、125、152、165、166、23...,"10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...","10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...",,,,,
6,summary,4,,,,,,,,On Weekends (Saturdays/Sundays):,...,Vào các Ngày Cuối Tuần (Thứ Bảy/Chủ Nhật):,주말(토/일):,週末 (土曜日/日曜日)：,Հանգստյան օրերին (շաբաթ/կիրակի)՝,По выходным (суббота/воскресенье):,,,,,
7,summary,5,,,,,,,,"256, 720",...,"256, 720","256, 720",256、720,"256, 720","256, 720",,,,,
8,summary,6,,,,,,,,On Sundays:,...,Chủ Nhật:,일요일:,日曜日：,Կիրակի՝,Воскресенье:,,,,,
9,summary,7,,,,,,,,94,...,94,94,94,94,94,,,,,
10,summary,8,,,,,,,,"Rail: A Line (Blue), C Line (Green), E Line (E...",...,Đường sắt – Các tuyến đường sắt nội thành: A L...,"Rail – 경전철(Light rail line): A Line(Blue), C L...",鉄道 – ライトレールライン： A Line（ブルー）、C Line（グリーン）、E Lin...,Երկաթգիծ – Թեթև երկաթուղային գծերը՝ A Line (Bl...,Железнодорожные линии – Легкорельсовые линии: ...,,,,,


### 2.4 Add the `end` section

In [112]:
### process the first end section: link the Metro Micro URL, then tag the closing paragraph
df = df.replace(r'metro\.net/micro',r'<a href="https://www.metro.net/micro">metro.net/micro</a>',regex=True)
end1 = df.loc[df['en'].str.contains('For more information ', na=False)]

end1 = end1.assign(section='end')
end1 = end1.assign(order=1)

### process the second end section (the row containing "* M")
end2 = df.loc[df['en'].str.contains(r'\* M', na=False)]

end2 = end2.assign(order=2)
end2 = end2.assign(section='end')

### add the second end section to the first
end1 = pd.concat([end1, end2])

### add the end section to the final data frame
final_df = pd.concat([final_df, end1])

### preview the end section
end1

Unnamed: 0,en,es,zh-TW,vi,ko,ja,hy,ru,section,order
35,For more information on Metro service changes ...,Para obtener más información sobre los cambios...,如需了解更多關於Metro服務變化和其他Metro服務的資訊，請與Metro客戶服務中心聯繫...,Để biết thêm thông tin về những thay đổi dịch ...,Metro 서비스 변경과 기타 Metro 서비스에 관한 자세한 정보는 전화(323....,Metroのサービス変更やその他のMetroサービスに関する詳細は、メトロ・カスタマー・サー...,Metro ծառայությունների փոփոխությունների և Metr...,Для получения дополнительной информации об изм...,end,1
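
The cell above picks out the closing paragraphs by matching text in the `en` column and then tags them with `section` and `order`. A minimal sketch of the same pattern on a toy frame follows. The sample sentences, the `toy`, `info`, `footnote`, and `end_rows` names, and the use of literal (non-regex) matching are assumptions for illustration, not the notebook's actual data or code.

```python
import pandas as pd

# Toy stand-in for the brochure table; the sentences are invented.
toy = pd.DataFrame({"en": [
    "33 – Bus stops are discontinued.",
    "For more information, visit metro.net/micro.",
    "* Made-up footnote text.",
]})

# Wrap the bare URL in an anchor tag. regex=False treats the pattern literally.
toy["en"] = toy["en"].str.replace(
    "metro.net/micro",
    '<a href="https://www.metro.net/micro">metro.net/micro</a>',
    regex=False,
)

# Select each closing paragraph by the text it contains, then tag it.
info = toy.loc[toy["en"].str.contains("For more information", regex=False)]
info = info.assign(section="end", order=1)

footnote = toy.loc[toy["en"].str.contains("* M", regex=False)]
footnote = footnote.assign(section="end", order=2)

# Stack the two tagged pieces in display order.
end_rows = pd.concat([info, footnote])
print(end_rows[["section", "order"]])
```

Literal matching avoids having to escape `*` and `.`; the notebook's regex version works the same way as long as those characters are escaped.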


## Final output
### 3.1 Additional edits

In [113]:
# ### Line 55
# final_df.loc[final_df.line==55, ['en']] = '55 – New stop at Compton / 89th St for the southbound Line 55.'
# final_df.loc[final_df.line==55, ['zh-TW']] = '55 – 南行 55 號線康普頓 / 89 街的新站。'
# final_df.loc[final_df.line==55, ['vi']] = '55 – Điểm dừng mới tại Compton / 89th St cho Tuyến 55 về phía nam.'
# final_df.loc[final_df.line==55, ['ko']] = '55 – 콤프턴 / 89th St에서 남쪽으로 향하는 55호선에 대한 새로운 정류장.'
# final_df.loc[final_df.line==55, ['ru']] = '55 – Новая остановка на Compton / 89th St для южной линии 55.'
# final_df.loc[final_df.line==55, ['es']] = 'Línea 55: Nueva parada en Compton / 89th St para la línea 55 en dirección sur.'
# final_df.loc[final_df.line==55, ['hy']] = '55՝ Նոր կանգառ Compton / 89th St for the southbound Line 55.'
# final_df.loc[final_df.line==55, ['ja']] = '55 - 南行きのライン55のための Compton / 89th Stで新しい停留所。'
# # final_df.loc[final_df.line==55, ['es', 'zh-TW', 'vi', 'ko', 'ja', 'hy', 'ru']] = '55 – New stop at Compton / 89th St for the southbound Line 55.'
# final_df.loc[final_df.line==55]


In [114]:
# canceled_message = " The following stops are being canceled: "
# canceled_message_es = " Las siguientes paradas están canceladas: "

# w_and_e = "(westbound and eastbound)"
# w_and_e_es = "(hacia el oeste y hacia el este)"

# canceled_message_owl = " The following Owl stops are being canceled " + w_and_e + ": "
# canceled_message_owl_es = " Las siguientes paradas nocturnas están canceladas " + w_and_e_es + ": "

# canceled_stops_2 = "Sunset / Mapleton " + w_and_e + "."
# canceled_stops_2_es = "Sunset / Mapleton " + w_and_e_es + "."

# canceled_stops_4 = "Santa Monica / 11th, Santa Monica / 17th, Santa Monica / 23rd, Santa Monica / Cloverfield, Santa Monica / Yale, Santa Monica / Berkeley, Santa Monica / Centinela, Santa Monica / Wellesley, Santa Monica / Brockton, Santa Monica / Westgate, Santa Monica / Federal, Santa Monica / Sawtelle."

# canceled_stops_33 = "Venice / Ogden (westbound), Venice / Genesee (eastbound)."
# canceled_stops_33_es = "Venice / Ogden (hacia el oeste), Venice / Genesee (hacia el este)."

# canceled_stops_602 = "Sunset / Rockingham " + w_and_e + "."
# canceled_stops_602_es = "Sunset / Rockingham " + w_and_e_es + "."

# final_df.loc[final_df.line==2, ['en']] = final_df.loc[final_df.line==2, ['en']] + canceled_message + canceled_stops_2
# final_df.loc[final_df.line==2, ['es']] = final_df.loc[final_df.line==2, ['es']] + canceled_message_es + canceled_stops_2_es

# final_df.loc[final_df.line==4, ['en']] = final_df.loc[final_df.line==4, ['en']] + canceled_message_owl + canceled_stops_4
# final_df.loc[final_df.line==4, ['es']] = final_df.loc[final_df.line==4, ['es']] + canceled_message_owl_es + canceled_stops_4

# final_df.loc[final_df.line==33, ['en']] = final_df.loc[final_df.line==33, ['en']] + canceled_message + canceled_stops_33
# final_df.loc[final_df.line==33, ['es']] = final_df.loc[final_df.line==33, ['es']] + canceled_message_es + canceled_stops_33_es

# final_df.loc[final_df.line==602, ['en']] = final_df.loc[final_df.line==602, ['en']] + canceled_message + canceled_stops_602
# final_df.loc[final_df.line==602, ['es']] = final_df.loc[final_df.line==602, ['es']] + canceled_message_es + canceled_stops_602_es


### 3.2 Check the data frame

In [115]:
final_df.head(55)

Unnamed: 0,section,order,header,line,altline,route-changes,other-changes,schedule-changes,stop-cancellations,en,...,vi,ko,ja,hy,ru,new-schedule,current-schedule,line_label,AltLine,oid
1,header,1,,,,,,,,Metro is making more service changes.,...,Metro sắp có thêm nhiều thay đổi về dịch vụ.,Metro 서비스가 더욱 새로워지고 있습니다.,Metroからのサービス改訂のお知らせ。,Metro-ն նոր փոփոխություններ է կատարում ծառայու...,Metro вносит дополнительные изменения в схемы ...,,,,,
2,summary,0,,,,,,,,"Starting on Sunday, December 19, 2021, metro.n...",...,"Bắt đầu từ Chủ Nhật, ngày 19 tháng 12 năm 2021...","2021년 12월 19일 일요일부터 시작, metro.net 더 나은 버스 환경을 ...",2021年12月19日より、metro.net Metroは、バスの利便性向上のため、サービ...,"Սկսած կիրակի՝ 2021 թ․ դեկտեմբերի 19-ից, metro....","Начиная с воскресенья, 19 декабря 2021 года, m...",,,,,
3,summary,1,,,,,,,,The following lines will have extra trips in D...,...,Những tuyến sau sẽ được tăng chuyến trong thán...,다음 노선은 12월에 운행이 추가될 예정입니다.,下記のライン路線では、12月に臨時便が運行されます。,Դեկտեմբերին հետևյալ գծերը կիրականացնեն հավելյա...,Дополнительные рейсы будут осуществляться по с...,,,,,
4,summary,2,,,,,,,,On Weekdays:,...,Vào các Ngày Trong Tuần:,평일:,平日:,Աշխատանքային օրերին՝,В будние дни:,,,,,
5,summary,3,,,,,,,,"10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...",...,"10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...","10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...",10、14、16、55、60、66、70、94、108、125、152、165、166、23...,"10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...","10, 14, 16, 55, 60, 66, 70, 94, 108, 125, 152,...",,,,,
6,summary,4,,,,,,,,On Weekends (Saturdays/Sundays):,...,Vào các Ngày Cuối Tuần (Thứ Bảy/Chủ Nhật):,주말(토/일):,週末 (土曜日/日曜日)：,Հանգստյան օրերին (շաբաթ/կիրակի)՝,По выходным (суббота/воскресенье):,,,,,
7,summary,5,,,,,,,,"256, 720",...,"256, 720","256, 720",256、720,"256, 720","256, 720",,,,,
8,summary,6,,,,,,,,On Sundays:,...,Chủ Nhật:,일요일:,日曜日：,Կիրակի՝,Воскресенье:,,,,,
9,summary,7,,,,,,,,94,...,94,94,94,94,94,,,,,
10,summary,8,,,,,,,,"Rail: A Line (Blue), C Line (Green), E Line (E...",...,Đường sắt – Các tuyến đường sắt nội thành: A L...,"Rail – 경전철(Light rail line): A Line(Blue), C L...",鉄道 – ライトレールライン： A Line（ブルー）、C Line（グリーン）、E Lin...,Երկաթգիծ – Թեթև երկաթուղային գծերը՝ A Line (Bl...,Железнодорожные линии – Легкорельсовые линии: ...,,,,,


### 3.3 Split the final data frame into JSON files depending on the language

In [116]:
languages = ['en','es','zh-TW','vi','ko','ja','hy','ru']
DATA_OUTPUT_PATH = "../data/takeones/"
shared_columns = ['line', 'route-changes','other-changes','schedule-changes','stop-cancellations','new-schedule', 'current-schedule']
for lang in languages:
    # keep the shared columns plus this language's text, exposed as `content`
    lang_df = final_df[['section','order', lang] + shared_columns].copy()
    lang_df = lang_df.rename(columns={lang: 'content'})
    lang_df.to_json(DATA_OUTPUT_PATH + 'takeone-' + lang + '.json',orient='records')
    print('Takeone created for: ' + lang)

Takeone created for: en
Takeone created for: es
Takeone created for: zh-TW
Takeone created for: vi
Takeone created for: ko
Takeone created for: ja
Takeone created for: hy
Takeone created for: ru
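
To see what one of these files holds, the sketch below builds a two-row toy frame, renames one language column to `content`, and serializes it with `orient='records'`. The row values and the `toy`, `per_lang`, and `records` names are made up for illustration, and the JSON is returned as a string rather than written to `../data/takeones/`.

```python
import json

import pandas as pd

# Toy frame with two language columns; the values are invented.
toy = pd.DataFrame({
    "section": ["header", "details"],
    "order": [1, 10],
    "line": [float("nan"), 33.0],  # rows without a line number hold NaN
    "en": ["Metro is making more service changes.", "33 – Bus stops are discontinued."],
    "es": ["Metro está haciendo más cambios.", "33: Se descontinuarán las paradas."],
})

lang = "es"
per_lang = toy[["section", "order", lang, "line"]].rename(columns={lang: "content"})

# With no path argument, to_json returns the JSON text instead of writing a file.
records = json.loads(per_lang.to_json(orient="records"))

# Each row becomes one object; NaN is serialized as null.
print(records[0])
print(records[1]["line"])
```

Every language file shares the same keys, so a consumer can presumably switch languages by loading a different file without changing how it reads each record.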


## Extra code

In [117]:
### RIP: code to split based on `:`
# th['en'] = th['en'].str.split(':')
# th = th.explode('en')
###