[ett] Configuring and scraping Etruscan. (#444)
* Adding scrape.

* ISO 639-3 code mapping.

* Language configuration.

* Summaries.

* Updated.

Co-authored-by: Kyle Gorman <kylebgorman@gmail.com>
agutkin and kylebgorman committed Sep 14, 2021
1 parent da40d4c commit b3f1c9b
Showing 6 changed files with 144 additions and 0 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -34,6 +34,7 @@ Unreleased
- Added Kashmiri (`kas`). (\#431)
- Added Malayalam (`mal`). (\#434)
- Added Dhivehi (`div`). (\#437)
- Added Etruscan (`ett`). (\#444)
- Added Gujarati (`guj`). (\#445)
- Added Kannada (`kan`). (\#446)
- Added Karelian (`krl`). (\#447)
1 change: 1 addition & 0 deletions data/scrape/README.md
@@ -86,6 +86,7 @@
| [TSV](tsv/enm_latn_broad.tsv) | enm | Middle English (1100-1500) | Middle English | Latin | | False | Broad | True | 6,855 |
| [TSV](tsv/epo_latn_broad.tsv) | epo | Esperanto | Esperanto | Latin | | False | Broad | True | 14,990 |
| [TSV](tsv/est_latn_broad.tsv) | est | Estonian | Estonian | Latin | | False | Broad | True | 429 |
| [TSV](tsv/ett_ital_broad.tsv) | ett | Etruscan | Etruscan | Old Italic | | False | Broad | False | 130 |
| [TSV](tsv/ewe_latn_broad.tsv) | ewe | Ewe | Ewe | Latin | | False | Broad | True | 120 |
| [TSV](tsv/fao_latn_broad.tsv) | fao | Faroese | Faroese | Latin | | False | Broad | True | 1,740 |
| [TSV](tsv/fao_latn_narrow.tsv) | fao | Faroese | Faroese | Latin | | False | Narrow | True | 1,120 |
9 changes: 9 additions & 0 deletions data/scrape/lib/languages.json
@@ -555,6 +555,15 @@
            "latn": "Latin"
        }
    },
    "ett": {
        "iso639_name": "Etruscan",
        "wiktionary_name": "Etruscan",
        "wiktionary_code": "ett",
        "casefold": false,
        "script": {
            "ital": "Old Italic"
        }
    },
    "ewe": {
        "iso639_name": "Ewe",
        "wiktionary_name": "Ewe",
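
The new `ett` entry pairs an ISO 639-3 key with a script code (`ital`), and the TSV added in this commit is named `ett_ital_broad.tsv`. The following is a minimal sketch of that relationship, assuming the file name is simply the language key, script code, and transcription level joined by underscores (inferred from the file names in this commit, not from the repository's scraping code). `tsv_names` is a hypothetical helper, not part of WikiPron.

```python
import json

# Entry copied from the hunk above; the surrounding wrapper is illustrative.
LANGUAGES_JSON = """
{
    "ett": {
        "iso639_name": "Etruscan",
        "wiktionary_name": "Etruscan",
        "wiktionary_code": "ett",
        "casefold": false,
        "script": {
            "ital": "Old Italic"
        }
    }
}
"""


def tsv_names(languages, transcription="broad"):
    """Hypothetical helper: builds one TSV name per (language, script) pair.

    Assumes the pattern <iso code>_<script code>_<transcription>.tsv and
    ignores dialect-specific names such as eng_latn_us_narrow.tsv.
    """
    names = []
    for iso_code, entry in sorted(languages.items()):
        for script_code in sorted(entry.get("script", {})):
            names.append(f"{iso_code}_{script_code}_{transcription}.tsv")
    return names


languages = json.loads(LANGUAGES_JSON)
print(tsv_names(languages))
```

Note that `casefold` is `false` for this entry, which matches the `False` in the case-folding column of the README and summary rows for Etruscan.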
130 changes: 130 additions & 0 deletions data/scrape/tsv/ett_ital_broad.tsv
@@ -0,0 +1,130 @@
πŒ€ a
πŒ€πŒ‚πŒ€ a c a
πŒ€πŒ‚πŒ€πŒ…πŒ‰πŒ”πŒ„πŒ“ a c a v i s e r
πŒ€πŒ‚πŒ€πŒ†πŒ“ a c a z r
πŒ€πŒ‚πŒ€πŒ‹πŒ€ a c a l a
πŒ€πŒ‚πŒ€πŒ‹πŒ„ a c a l e
πŒ€πŒ‚πŒ€πŒ‹πŒ…πŒ„ a c a l v e
πŒ€πŒ‚πŒ€πŒ‹πŒ‰πŒ€ a c a l i a
πŒ€πŒ‚πŒ€πŒ a c a p
πŒ€πŒ‚πŒ€πŒ“πŒ‰πŒ€ a c a r i a
πŒ€πŒ‚πŒ€πŒ“πŒ‰πŒ€πŒ‹ a c a r i a l
πŒ€πŒ‚πŒ€πŒ” a c a s
πŒ€πŒ‚πŒ€πŒ”πŒ€ a c a s a
πŒ€πŒ‚πŒ€πŒ”πŒ‚πŒ„ a c a s c e
πŒ€πŒ‚πŒ€πŒ”πŒ“πŒ‰ a c a s r i
πŒ€πŒ‚πŒ„ a c e
πŒ€πŒ‚πŒ„πŒ‰ a tΝ‘Κƒ e iΜ―
πŒ€πŒ‚πŒ„πŒ‰πŒ€πŒ‹ a tΝ‘Κƒ e iΜ― aΜ― l
πŒ€πŒ‚πŒ„πŒ‹ a c e l
πŒ€πŒ‚πŒ„πŒ‹πŒ” a c e l s
πŒ€πŒ‚πŒ„πŒ” a c e s
πŒ€πŒ‚πŒ‰πŒ‹ a c i l
πŒ€πŒ‚πŒ‰πŒ‹πŒ€ a c i l a
πŒ€πŒ‚πŒ‰πŒ‹πŒ– a c i l u
πŒ€πŒ‚πŒ‰πŒ‹πŒ–πŒπŒ‰πŒ€ a c i l u n i a
πŒ€πŒ‚πŒ‹πŒ–πŒ” a c l u s
πŒ€πŒ‚πŒπŒ€πŒ‰πŒ‚πŒ„ a c n a i c e
πŒ€πŒ‚πŒπŒ€πŒ‰πŒπŒ„ a c n a i n e
πŒ€πŒ‚πŒπŒ€πŒπŒ€πŒ” a c n a n a s
πŒ€πŒ‚πŒπŒ€πŒ‘πŒ…πŒ„πŒ“πŒ” a c n a s v e r s
πŒ€πŒ‚πŒπŒ‰πŒπŒ€ a c n i n a
πŒ€πŒ‚πŒπŒ” a c n s
πŒ€πŒ…πŒ„ a f e
πŒ€πŒ…πŒ‰πŒ‹ Ι‘ w Ιͺ l
πŒ€πŒ…πŒ‰πŒ‹πŒ” a v i l s
πŒ€πŒ‰πŒˆπŒ•πŒ“πŒ€ a i ΞΈ r a
πŒ€πŒ‰πŒ” a j s
πŒ€πŒ‰πŒ”πŒπŒ€ a j s n a
πŒ€πŒ‰πŒ–πŒ” a i u s
πŒ€πŒŠπŒ€ a c a
πŒ€πŒ‹πŒ‚πŒ„ a l c e
πŒ€πŒ‹πŒ‚πŒ– a l c u
πŒ€πŒ‹πŒ„ a l e
πŒ€πŒ‹πŒ… a l f
πŒ€πŒ‹πŒ…πŒ€ a l f a
πŒ€πŒ‹πŒ…πŒ€πŒ‹ a l f a l
πŒ€πŒ‹πŒ…πŒ€πŒ” a l f a s
πŒ€πŒ‹πŒ…πŒ‰πŒ„ a l f i eΜ―
πŒ€πŒ‹πŒ‰πŒ‚πŒ„ a l i c e
πŒ€πŒ‹πŒ‰πŒŠπŒ„ a l i c e
πŒ€πŒ‹πŒ‰πŒ—πŒ€ a l i Ο‡ a
πŒ€πŒ‹πŒ‰πŒ—πŒ„ a l i Ο‡ e
πŒ€πŒ‹πŒ‘πŒ€πŒ”πŒ„ a l z a s e
πŒ€πŒ‹πŒ—πŒ– a l Ο‡ u
πŒ€πŒπŒ‚πŒ€πŒ“πŒ‰πŒ€ a n c a r i a
πŒ€πŒπŒ‰πŒ€πŒ— a n i a Ο‡
πŒ€πŒπŒ‰πŒ€πŒ—πŒ„πŒ‘ a n i a Ο‡ e z
πŒ€πŒπŒ–πŒŒπŒ‰πŒ‚πŒ€ a ΞΈ u m i c a
πŒ€πŒπŒ–πŒŒπŒ‰πŒ‚πŒ” a ΞΈ u m i c s
πŒ€πŒπŒ€ Ι‘ p Ι™
πŒ€πŒπŒ‰πŒ€πŒπŒ€ a p i a n a
πŒ€πŒπŒ‰πŒ“πŒ„ a p i r e
πŒ€πŒπŒ‰πŒ“πŒ„πŒ” a p i r e s
πŒ€πŒ“πŒ†πŒπŒ€ a r z n a
πŒ€πŒ“πŒ†πŒπŒ„πŒ€πŒ‹ a r z n e aΜ― l
πŒ€πŒ“πŒ†πŒπŒ„πŒ‰ a r z n e iΜ―
πŒ€πŒ“πŒ†πŒπŒ‰ a r z n i
πŒ€πŒ“πŒ†πŒπŒ‰πŒ” a r z n i s
πŒ€πŒ•πŒ‰ Ι‘ t Ιͺ
πŒ€πŒ•πŒ‰πŒ€πŒ‹ a t i aΜ― l
πŒ€πŒ•πŒ“πŒ€πŒπŒ„ a t r a n e
πŒ€πŒ•πŒ“πŒ€πŒπŒ„πŒ” a t r a n e s
πŒ€πŒ•πŒ“πŒ‰πŒ–πŒŒ a t r i u m
πŒ€πŒ—πŒ‰πŒ‹πŒ€πŒ” a Ο‡ i l a s
πŒ€πŒ—πŒ‰πŒ‹πŒ„ a Ο‡ i l e
πŒ€πŒ—πŒ‰πŒ‹πŒ„πŒ‰ a Ο‡ i l e iΜ―
πŒ€πŒ—πŒ‹πŒ€πŒ„ a Ο‡ l a eΜ―
πŒ€πŒ—πŒ‹πŒ„ a Ο‡ l e
πŒ€πŒ—πŒŒπŒ„πŒŒπŒ“πŒ–πŒ a Ο‡ m e m r u n
πŒ€πŒ—πŒπŒ€πŒ‹ a Ο‡ n a l
πŒ€πŒ—πŒπŒ„πŒ‰ a Ο‡ n e i
πŒ€πŒ—πŒπŒ‰ a Ο‡ n i
πŒ€πŒ—πŒ“πŒ–πŒŒ a Ο‡ r u m
πŒ€πŒ—πŒ– a Ο‡ u
πŒ€πŒ—πŒ–πŒ€πŒ‹ a Ο‡ u a l
πŒ€πŒ—πŒ–πŒ‰ a Ο‡ u i
πŒ€πŒ—πŒ–πŒ‹πŒ„ a Ο‡ u l e
πŒ€πŒšπŒ“ a f r
πŒ€πŒšπŒ–πŒ“ a f u r
πŒπŒ€πŒ‹πŒ•πŒ„πŒ€ b a l t e a
πŒπŒ„πŒ“πŒ‚πŒπŒŒπŒ”πŒπŒ€ b e r c o m s n a
πŒ‚πŒ€πŒ„ c a e
πŒ‚πŒ€πŒ„πŒ” c a e s
πŒ‚πŒ€πŒ…πŒ„ c a v e
πŒ‚πŒ€πŒ…πŒ„πŒˆ c a v e ΞΈ
πŒ‚πŒ€πŒ…πŒ”πŒ€ c a v s a
πŒ‚πŒ€πŒπŒ‹πŒ€πŒ” c a n l a s
πŒ‚πŒ€πŒπŒπŒ€ c a ΞΈ n a
πŒ‚πŒ€πŒπŒπŒ€πŒ‹ c a ΞΈ n a l
πŒ‚πŒ€πŒπŒπŒ‰ c a ΞΈ n i
πŒ‚πŒ€πŒπŒπŒ‰πŒ” c a ΞΈ n i s
πŒ‚πŒ€πŒ”πŒ”πŒ‰πŒƒπŒ€ c a s s i d a
πŒ‚πŒ€πŒ”πŒ”πŒ‰πŒ” c a s s i s
πŒ‚πŒ€πŒ”πŒ•πŒ‚πŒ„ c a s t c e
πŒ‚πŒ€πŒ•πŒπŒ‰πŒ” c a t n i s
πŒ‚πŒ‰ k i
πŒ„πŒ‰πŒ” e j s
πŒ„πŒ‰πŒ”πŒπŒ€ e j s n a
πŒ„πŒ”πŒ€πŒ‹ e s a l
πŒ…πŒ„πŒ“πŒ”πŒ„ v e r s e
πŒ…πŒ„πŒ•πŒ–πŒ” v e t u s
πŒ…πŒ‰πŒŠπŒ– v i c u
πŒ†πŒ€πŒ‹ tΝ‘s a l
πŒ†πŒ€πŒŒπŒ€πŒˆπŒ‰ tΝ‘s a m a tΚ° i
πŒ†πŒ€πŒŒπŒˆπŒ‰πŒ‚ tΝ‘s a m tΚ° i k
πŒ‡πŒ€πŒ‹πŒŠ h a l k
πŒ‡πŒ€πŒ‹πŒ—πŒ†πŒ€ h a l Ο‡ z a
πŒ‡πŒ€πŒ‹πŒ—πŒ†πŒ„ h a l Ο‡ z a
πŒˆπŒ– tΚ° u
πŒˆπŒ–πŒ tΚ° u n
πŒ‰πŒ‚πŒ€πŒ i c a p
πŒ‹πŒ–πŒ‚πŒ–πŒŒπŒπŒπŒ„πŒ” l u c u m o n e s
πŒ‹πŒ–πŒ‚πŒ–πŒŒπŒ– l u c u m u
πŒ‹πŒ–πŒ„πŒ€ l u e a
πŒŒπŒ€πŒ™ m a kΚ°
πŒπŒ„πŒ“πŒ‰ n e ΙΎ Ιͺ
πŒπŒ„πŒ“πŒ‚πŒ–πŒŒπŒ”πŒπŒ€ p e r c u m s n a
πŒ‘πŒ€ Κƒ a
πŒ—πŒ‰πŒ‹πŒ€πŒ” Ο‡ i l a s
πŒ˜πŒ„πŒ“πŒ”πŒ– pΚ° e r s u
1 change: 1 addition & 0 deletions data/scrape/tsv_summary.tsv
@@ -84,6 +84,7 @@ eng_latn_us_narrow.tsv eng English English Latin US, General American False Narr
enm_latn_broad.tsv enm Middle English (1100-1500) Middle English Latin False Broad True 6855
epo_latn_broad.tsv epo Esperanto Esperanto Latin False Broad True 14990
est_latn_broad.tsv est Estonian Estonian Latin False Broad True 429
ett_ital_broad.tsv ett Etruscan Etruscan Old Italic False Broad False 130
ewe_latn_broad.tsv ewe Ewe Ewe Latin False Broad True 120
fao_latn_broad.tsv fao Faroese Faroese Latin False Broad True 1740
fao_latn_narrow.tsv fao Faroese Faroese Latin False Narrow True 1120
2 changes: 2 additions & 0 deletions wikipron/languagecodes.py
@@ -460,4 +460,6 @@
    "dv": "Dhivehi",
    "divehi": "Dhivehi",
    "maldivian": "Dhivehi",
    # Etruscan: ISO 639-3 only.
    "ett": "Etruscan",
}
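
The hunk adds a single key for Etruscan, whereas Dhivehi can be reached through a two-letter code and two lowercase names. A small sketch of how a lookup over such a mapping might behave is shown below. The dictionary name `LANGUAGE_CODES` and the function `resolve_language` are assumptions made for illustration, and only the entries visible in the hunk are reproduced.

```python
# Excerpt of the mapping shown in the hunk; the dict name is an assumption.
LANGUAGE_CODES = {
    "dv": "Dhivehi",
    "divehi": "Dhivehi",
    "maldivian": "Dhivehi",
    # Etruscan: ISO 639-3 only.
    "ett": "Etruscan",
}


def resolve_language(key):
    """Hypothetical lookup: case-insensitive, raises ValueError if unknown."""
    try:
        return LANGUAGE_CODES[key.strip().lower()]
    except KeyError:
        raise ValueError(f"Unrecognized language key: {key!r}") from None


print(resolve_language("ett"))
print(resolve_language("Maldivian"))
```

With only the entries from this excerpt, `resolve_language("etruscan")` raises `ValueError`, since the one Etruscan key it contains is the three-letter code.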
