[ett] Configuring and scraping Etruscan. #444

Merged: 6 commits, Sep 14, 2021
Changes from 5 commits
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -34,6 +34,7 @@ Unreleased
- Scraped Kashmiri (`kas`). (\#431)
- Added Malayalam (`mal`) scrape. (\#434)
- Configuration and initial scrape for Dhivehi (`div`, Maldivian). (\#437)
- Configuring and scraping Etruscan (`ett`). (\#444)

#### Changed

3 changes: 2 additions & 1 deletion data/scrape/README.md
@@ -86,6 +86,7 @@
| [TSV](tsv/enm_latn_broad.tsv) | enm | Middle English (1100-1500) | Middle English | Latin | | False | Broad | True | 6,855 |
| [TSV](tsv/epo_latn_broad.tsv) | epo | Esperanto | Esperanto | Latin | | False | Broad | True | 14,990 |
| [TSV](tsv/est_latn_broad.tsv) | est | Estonian | Estonian | Latin | | False | Broad | True | 429 |
| [TSV](tsv/ett_ital_broad.tsv) | ett | Etruscan | Etruscan | Old Italic | | False | Broad | False | 130 |
| [TSV](tsv/ewe_latn_broad.tsv) | ewe | Ewe | Ewe | Latin | | False | Broad | True | 120 |
| [TSV](tsv/fao_latn_broad.tsv) | fao | Faroese | Faroese | Latin | | False | Broad | True | 1,740 |
| [TSV](tsv/fao_latn_narrow.tsv) | fao | Faroese | Faroese | Latin | | False | Narrow | True | 1,120 |
@@ -198,7 +199,7 @@
| [TSV](tsv/mah_latn_broad.tsv) | mah | Marshallese | Marshallese | Latin | | False | Broad | True | 900 |
| [TSV](tsv/mah_latn_narrow.tsv) | mah | Marshallese | Marshallese | Latin | | False | Narrow | True | 1,502 |
| [TSV](tsv/mak_latn_narrow.tsv) | mak | Makasar | Makasar | Latin | | False | Narrow | True | 405 |
- | [TSV](tsv/mal_mlym_narrow.tsv) | mal | Malayalam | Malayalam | Malayalam | | False | Narrow | None | 141 |
+ | [TSV](tsv/mal_mlym_narrow.tsv) | mal | Malayalam | Malayalam | Malayalam | | False | Narrow | False | 141 |
| [TSV](tsv/mar_deva_broad.tsv) | mar | Marathi | Marathi | Devanagari | | False | Broad | False | 588 |
| [TSV](tsv/mar_deva_narrow.tsv) | mar | Marathi | Marathi | Devanagari | | False | Narrow | False | 118 |
| [TSV](tsv/may_arab_ara_broad.tsv) | may | Malay (macrolanguage) | Malay | Arabic | | False | Broad | True | 628 |
9 changes: 9 additions & 0 deletions data/scrape/lib/languages.json
@@ -555,6 +555,15 @@
"latn": "Latin"
}
},
"ett": {
"iso639_name": "Etruscan",
"wiktionary_name": "Etruscan",
"wiktionary_code": "ett",
"casefold": false,
"script": {
"ital": "Old Italic"
}
},
"ewe": {
"iso639_name": "Ewe",
"wiktionary_name": "Ewe",
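
The `ett` entry above pairs a language code with a script code (`ital`), and the file added in this PR is named `ett_ital_broad.tsv`. A minimal sketch of how such a name could be derived from the entry, assuming a `<code>_<script>_<level>.tsv` scheme; `tsv_names` is a hypothetical helper, not a function from the repository, and it ignores the dialect component that some other file names carry:

```python
import json

# Entry copied from the diff above; everything else here is illustrative.
ENTRY = json.loads("""
{
    "ett": {
        "iso639_name": "Etruscan",
        "wiktionary_name": "Etruscan",
        "wiktionary_code": "ett",
        "casefold": false,
        "script": {"ital": "Old Italic"}
    }
}
""")


def tsv_names(languages, level="broad"):
    """Yield one assumed TSV file name per (language, script) pair."""
    for code, config in languages.items():
        for script_code in config["script"]:
            yield f"{code}_{script_code}_{level}.tsv"


print(list(tsv_names(ENTRY)))
```

Note that `"casefold": false` parses to Python `False`, which matches the `False` shown in the case-folding column of the README row for `ett`.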
130 changes: 130 additions & 0 deletions data/scrape/tsv/ett_ital_broad.tsv
@@ -0,0 +1,130 @@
𐌀 a
𐌀𐌂𐌀 a c a
𐌀𐌂𐌀𐌅𐌉𐌔𐌄𐌓 a c a v i s e r
𐌀𐌂𐌀𐌆𐌓 a c a z r
𐌀𐌂𐌀𐌋𐌀 a c a l a
𐌀𐌂𐌀𐌋𐌄 a c a l e
𐌀𐌂𐌀𐌋𐌅𐌄 a c a l v e
𐌀𐌂𐌀𐌋𐌉𐌀 a c a l i a
𐌀𐌂𐌀𐌐 a c a p
𐌀𐌂𐌀𐌓𐌉𐌀 a c a r i a
𐌀𐌂𐌀𐌓𐌉𐌀𐌋 a c a r i a l
𐌀𐌂𐌀𐌔 a c a s
𐌀𐌂𐌀𐌔𐌀 a c a s a
𐌀𐌂𐌀𐌔𐌂𐌄 a c a s c e
𐌀𐌂𐌀𐌔𐌓𐌉 a c a s r i
𐌀𐌂𐌄 a c e
𐌀𐌂𐌄𐌉 a t͡ʃ e i̯
𐌀𐌂𐌄𐌉𐌀𐌋 a t͡ʃ e i̯ a̯ l
𐌀𐌂𐌄𐌋 a c e l
𐌀𐌂𐌄𐌋𐌔 a c e l s
𐌀𐌂𐌄𐌔 a c e s
𐌀𐌂𐌉𐌋 a c i l
𐌀𐌂𐌉𐌋𐌀 a c i l a
𐌀𐌂𐌉𐌋𐌖 a c i l u
𐌀𐌂𐌉𐌋𐌖𐌍𐌉𐌀 a c i l u n i a
𐌀𐌂𐌋𐌖𐌔 a c l u s
𐌀𐌂𐌍𐌀𐌉𐌂𐌄 a c n a i c e
𐌀𐌂𐌍𐌀𐌉𐌍𐌄 a c n a i n e
𐌀𐌂𐌍𐌀𐌍𐌀𐌔 a c n a n a s
𐌀𐌂𐌍𐌀𐌑𐌅𐌄𐌓𐌔 a c n a s v e r s
𐌀𐌂𐌍𐌉𐌍𐌀 a c n i n a
𐌀𐌂𐌍𐌔 a c n s
𐌀𐌅𐌄 a f e
𐌀𐌅𐌉𐌋 ɑ w ɪ l
𐌀𐌅𐌉𐌋𐌔 a v i l s
𐌀𐌉𐌈𐌕𐌓𐌀 a i θ r a
𐌀𐌉𐌔 a j s
𐌀𐌉𐌔𐌍𐌀 a j s n a
𐌀𐌉𐌖𐌔 a i u s
𐌀𐌊𐌀 a c a
𐌀𐌋𐌂𐌄 a l c e
𐌀𐌋𐌂𐌖 a l c u
𐌀𐌋𐌄 a l e
𐌀𐌋𐌅 a l f
𐌀𐌋𐌅𐌀 a l f a
𐌀𐌋𐌅𐌀𐌋 a l f a l
𐌀𐌋𐌅𐌀𐌔 a l f a s
𐌀𐌋𐌅𐌉𐌄 a l f i e̯
𐌀𐌋𐌉𐌂𐌄 a l i c e
𐌀𐌋𐌉𐌊𐌄 a l i c e
𐌀𐌋𐌉𐌗𐌀 a l i χ a
𐌀𐌋𐌉𐌗𐌄 a l i χ e
𐌀𐌋𐌑𐌀𐌔𐌄 a l z a s e
𐌀𐌋𐌗𐌖 a l χ u
𐌀𐌍𐌂𐌀𐌓𐌉𐌀 a n c a r i a
𐌀𐌍𐌉𐌀𐌗 a n i a χ
𐌀𐌍𐌉𐌀𐌗𐌄𐌑 a n i a χ e z
𐌀𐌏𐌖𐌌𐌉𐌂𐌀 a θ u m i c a
𐌀𐌏𐌖𐌌𐌉𐌂𐌔 a θ u m i c s
𐌀𐌐𐌀 ɑ p ə
𐌀𐌐𐌉𐌀𐌍𐌀 a p i a n a
𐌀𐌐𐌉𐌓𐌄 a p i r e
𐌀𐌐𐌉𐌓𐌄𐌔 a p i r e s
𐌀𐌓𐌆𐌍𐌀 a r z n a
𐌀𐌓𐌆𐌍𐌄𐌀𐌋 a r z n e a̯ l
𐌀𐌓𐌆𐌍𐌄𐌉 a r z n e i̯
𐌀𐌓𐌆𐌍𐌉 a r z n i
𐌀𐌓𐌆𐌍𐌉𐌔 a r z n i s
𐌀𐌕𐌉 ɑ t ɪ
𐌀𐌕𐌉𐌀𐌋 a t i a̯ l
𐌀𐌕𐌓𐌀𐌍𐌄 a t r a n e
𐌀𐌕𐌓𐌀𐌍𐌄𐌔 a t r a n e s
𐌀𐌕𐌓𐌉𐌖𐌌 a t r i u m
𐌀𐌗𐌉𐌋𐌀𐌔 a χ i l a s
𐌀𐌗𐌉𐌋𐌄 a χ i l e
𐌀𐌗𐌉𐌋𐌄𐌉 a χ i l e i̯
𐌀𐌗𐌋𐌀𐌄 a χ l a e̯
𐌀𐌗𐌋𐌄 a χ l e
𐌀𐌗𐌌𐌄𐌌𐌓𐌖𐌍 a χ m e m r u n
𐌀𐌗𐌍𐌀𐌋 a χ n a l
𐌀𐌗𐌍𐌄𐌉 a χ n e i
𐌀𐌗𐌍𐌉 a χ n i
𐌀𐌗𐌓𐌖𐌌 a χ r u m
𐌀𐌗𐌖 a χ u
𐌀𐌗𐌖𐌀𐌋 a χ u a l
𐌀𐌗𐌖𐌉 a χ u i
𐌀𐌗𐌖𐌋𐌄 a χ u l e
𐌀𐌚𐌓 a f r
𐌀𐌚𐌖𐌓 a f u r
𐌁𐌀𐌋𐌕𐌄𐌀 b a l t e a
𐌁𐌄𐌓𐌂𐌏𐌌𐌔𐌍𐌀 b e r c o m s n a
𐌂𐌀𐌄 c a e
𐌂𐌀𐌄𐌔 c a e s
𐌂𐌀𐌅𐌄 c a v e
𐌂𐌀𐌅𐌄𐌈 c a v e θ
𐌂𐌀𐌅𐌔𐌀 c a v s a
𐌂𐌀𐌍𐌋𐌀𐌔 c a n l a s
𐌂𐌀𐌏𐌍𐌀 c a θ n a
𐌂𐌀𐌏𐌍𐌀𐌋 c a θ n a l
𐌂𐌀𐌏𐌍𐌉 c a θ n i
𐌂𐌀𐌏𐌍𐌉𐌔 c a θ n i s
𐌂𐌀𐌔𐌔𐌉𐌃𐌀 c a s s i d a
𐌂𐌀𐌔𐌔𐌉𐌔 c a s s i s
𐌂𐌀𐌔𐌕𐌂𐌄 c a s t c e
𐌂𐌀𐌕𐌍𐌉𐌔 c a t n i s
𐌂𐌉 k i
𐌄𐌉𐌔 e j s
𐌄𐌉𐌔𐌍𐌀 e j s n a
𐌄𐌔𐌀𐌋 e s a l
𐌅𐌄𐌓𐌔𐌄 v e r s e
𐌅𐌄𐌕𐌖𐌔 v e t u s
𐌅𐌉𐌊𐌖 v i c u
𐌆𐌀𐌋 t͡s a l
𐌆𐌀𐌌𐌀𐌈𐌉 t͡s a m a tʰ i
𐌆𐌀𐌌𐌈𐌉𐌂 t͡s a m tʰ i k
𐌇𐌀𐌋𐌊 h a l k
𐌇𐌀𐌋𐌗𐌆𐌀 h a l χ z a
𐌇𐌀𐌋𐌗𐌆𐌄 h a l χ z a
𐌈𐌖 tʰ u
𐌈𐌖𐌍 tʰ u n
𐌉𐌂𐌀𐌐 i c a p
𐌋𐌖𐌂𐌖𐌌𐌏𐌍𐌄𐌔 l u c u m o n e s
𐌋𐌖𐌂𐌖𐌌𐌖 l u c u m u
𐌋𐌖𐌄𐌀 l u e a
𐌌𐌀𐌙 m a kʰ
𐌍𐌄𐌓𐌉 n e ɾ ɪ
𐌐𐌄𐌓𐌂𐌖𐌌𐌔𐌍𐌀 p e r c u m s n a
𐌑𐌀 ʃ a
𐌗𐌉𐌋𐌀𐌔 χ i l a s
𐌘𐌄𐌓𐌔𐌖 pʰ e r s u
Collaborator


dope-looking script
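
For readers who want to inspect the scraped file, here is a rough sketch of reading it in Python. It assumes each row is a word and a pronunciation separated by a tab, with phones separated by single spaces; the tab is an assumption, since this page shows only whitespace between the columns. `read_pairs` is a hypothetical helper written for this illustration.

```python
import io

# Three rows taken from the diff above. The tab between the two columns is an
# assumption: the rendered page shows only whitespace there.
SAMPLE = "𐌀𐌂𐌀\ta c a\n𐌀𐌂𐌄𐌉\ta t͡ʃ e i̯\n𐌈𐌖𐌍\ttʰ u n\n"


def read_pairs(handle):
    """Return (word, phones) pairs, splitting the pronunciation on spaces."""
    pairs = []
    for line in handle:
        word, pron = line.rstrip("\n").split("\t", 1)
        pairs.append((word, pron.split(" ")))
    return pairs


pairs = read_pairs(io.StringIO(SAMPLE))
for word, phones in pairs:
    # Old Italic letters lie outside the Basic Multilingual Plane, but
    # Python counts each one as a single code point.
    print(word, len(word), phones)
```

Splitting on spaces keeps multi-character segments such as the affricate or the aspirated stop together as one phone, which is the point of the space-delimited format.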

3 changes: 2 additions & 1 deletion data/scrape/tsv_summary.tsv
@@ -84,6 +84,7 @@ eng_latn_us_narrow.tsv eng English English Latin US, General American False Narr
enm_latn_broad.tsv enm Middle English (1100-1500) Middle English Latin False Broad True 6855
epo_latn_broad.tsv epo Esperanto Esperanto Latin False Broad True 14990
est_latn_broad.tsv est Estonian Estonian Latin False Broad True 429
ett_ital_broad.tsv ett Etruscan Etruscan Old Italic False Broad False 130
ewe_latn_broad.tsv ewe Ewe Ewe Latin False Broad True 120
fao_latn_broad.tsv fao Faroese Faroese Latin False Broad True 1740
fao_latn_narrow.tsv fao Faroese Faroese Latin False Narrow True 1120
@@ -196,7 +197,7 @@ mac_cyrl_narrow.tsv mac Macedonian Macedonian Cyrillic False Narrow True 6878
mah_latn_broad.tsv mah Marshallese Marshallese Latin False Broad True 900
mah_latn_narrow.tsv mah Marshallese Marshallese Latin False Narrow True 1502
mak_latn_narrow.tsv mak Makasar Makasar Latin False Narrow True 405
- mal_mlym_narrow.tsv mal Malayalam Malayalam Malayalam False Narrow 141
+ mal_mlym_narrow.tsv mal Malayalam Malayalam Malayalam False Narrow False 141
mar_deva_broad.tsv mar Marathi Marathi Devanagari False Broad False 588
mar_deva_narrow.tsv mar Marathi Marathi Devanagari False Narrow False 118
may_arab_ara_broad.tsv may Malay (macrolanguage) Malay Arabic False Broad True 628
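
The summary rows are easier to read once split into named fields. The sketch below parses the new `ett` row, assuming tab-separated columns with an empty dialect field; the field names are guesses inferred from the row contents, since this diff does not show a header.

```python
import csv
import io

# Field names are assumptions inferred from the row contents; the diff
# does not show a header line for this file.
FIELDS = [
    "file", "iso_code", "iso_name", "wiktionary_name", "script",
    "dialect", "filtered", "level", "casefold", "entries",
]

# The new row from the diff, with assumed tabs and an empty dialect field.
ROW = "ett_ital_broad.tsv\tett\tEtruscan\tEtruscan\tOld Italic\t\tFalse\tBroad\tFalse\t130\n"

reader = csv.DictReader(io.StringIO(ROW), fieldnames=FIELDS, delimiter="\t")
record = next(reader)
record["entries"] = int(record["entries"])

print(record["file"], repr(record["dialect"]), record["casefold"], record["entries"])
```

Under the same assumed layout, the old Malayalam row would have an empty ninth field where the new row has `False`.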
2 changes: 2 additions & 0 deletions wikipron/languagecodes.py
@@ -460,4 +460,6 @@
"dv": "Dhivehi",
"divehi": "Dhivehi",
"maldivian": "Dhivehi",
# Etruscan: ISO 639-3 only.
"ett": "Etruscan",
}
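
The mapping above takes lowercase codes and names to a canonical language name. A minimal sketch of how a lookup over it might behave, using only the keys visible in this hunk; the dictionary name `LANGUAGE_CODES` and the helper `resolve_language` are assumptions made for illustration, not confirmed by the diff:

```python
# Subset of the mapping shown in the diff; the name LANGUAGE_CODES is assumed.
LANGUAGE_CODES = {
    "dv": "Dhivehi",
    "divehi": "Dhivehi",
    "maldivian": "Dhivehi",
    # Etruscan: ISO 639-3 only.
    "ett": "Etruscan",
}


def resolve_language(key):
    """Map a user-supplied code or name to a language name (hypothetical)."""
    try:
        return LANGUAGE_CODES[key.strip().lower()]
    except KeyError:
        raise ValueError(f"Unrecognized language code or name: {key!r}") from None


print(resolve_language("ett"))
print(resolve_language("Maldivian"))
```

In this hunk Etruscan gets only the `ett` key, whereas Dhivehi also has name keys, so with this subset a lookup of `"etruscan"` would raise `ValueError`.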