[SEO] Bulgarian Transliteration Improvement - bg.php #3314

Closed
amilenkov opened this issue Oct 13, 2018 · 9 comments

@amilenkov

amilenkov commented Oct 13, 2018

Some time ago I sent you a bg.php file for transliteration of Bulgarian, and you put it in core\includes\transliteration. In that old file, the Latin transliteration is based on one of several different methods used in Bulgaria for transliterating Cyrillic into Latin.

Now I am sending a new bg.php file, which is based on the rules for transliteration of Bulgarian that Google applies. This would help search engines better understand Bulgarian keywords and text written in Latin in URLs, and so improve the SEO of Bulgarian web pages.

I suggest that you replace the old file with the new one in core\includes\transliteration in the next version of the core.

bg.zip


PR by @klonos: backdrop/backdrop#2321

@klonos
Member

klonos commented Oct 13, 2018

Thank you @amilenkov 👍

I have filed a PR with your file ...I see that 0x044D => 'e' was removed as per your comment in #1544 (comment). One minor modification on my end was to add a dot to the comment in order to adhere to coding standards.

PS: may I ask what was your resource for the Google implementation? I would like to do the same for the Greek transliteration file.
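
The `0x044D => 'e'` entry mentioned above is keyed by a Unicode code point. As a quick check outside the PR itself, the following Python sketch shows which character that code point refers to. The 30-letter alphabet string is typed in here for illustration and is not taken from bg.php; the reason for the removal is the one given in #1544, which this sketch does not try to restate.

```python
import unicodedata

# The code point from the removed mapping entry.
ch = chr(0x044D)
print(ch, unicodedata.name(ch))

# Modern Bulgarian alphabet (30 letters), typed in for illustration only.
BULGARIAN = "абвгдежзийклмнопрстуфхцчшщъьюя"

# True if the character is one of the letters listed above.
print(ch in BULGARIAN)
```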

@amilenkov
Author

Thanks @klonos,

In Bulgaria there is an official, legally approved system for transliteration into Latin, but it is not used everywhere; it appears mainly in official documents, road signs, street names, and so on. People use various other systems, including ones that substitute digits for Bulgarian letters that have no Latin equivalent.

For example, for the Bulgarian letter "ш" they write the digit 6 instead of the official "sh", because it is shorter to type and because the Bulgarian word for the number 6 begins with the letter "ш" (шест).

You will see in the documents quoted below that Google has learned to recognize even words transliterated in this way (with digits), because so many people do it.

People are very inventive, especially when using computers or smartphones without a Cyrillic keyboard.

That is why Google does not follow the officially accepted transliteration, but rather the one most common on web pages.

I have tried to find Google sources on how to transliterate correctly, but I could not find any. Perhaps that is because they do not recognize a single correct way of transliterating, but accept several commonly used ones and follow people's actual practice.

My sources for the scheme I propose are recommendations from large Bulgarian companies specializing in SEO and the practice on my own sites. I see in practice that URLs transliterated this way are correctly recognized by Google.

Here are two of the sources I have used (but they are in Bulgarian):

http://www.seo-bg.com/seo-google-transliteration-transliteracia.php

https://ganbox.com/blog/%D1%82%D1%80%D0%B0%D0%BD%D1%81%D0%BB%D0%B8%D1%82%D0%B5%D1%80%D0%B0%D1%86%D0%B8%D1%8F-%D0%B2-google/

There are other sources from SEO professionals, and they recommend the same transliteration rules.

I have used this transliteration system for two years, manually replacing the bg.php file with my version in each new version of Backdrop. I do the same for Drupal sites.

And I see in practice that the pages are indexed more successfully.

@klonos
Member

klonos commented Oct 13, 2018

Thanks for taking the time to respond @amilenkov ...there is a similar situation in Greek, where we have invented a method of input called Greeklish. There are variations of this method, depending on whether people prefer to be phonetically correct (the argument being that phonetic is simpler + when non-natives read words they sound more accurate), or to be orthographically correct (the argument being similar to the reason behind this joke).

I was hoping that there would be some publicly accessible, "Google-approved" set of transliteration lists, but it seems that this has been the product of empirical work 👍 ...oh well.

@amilenkov
Author

It is easy to find out which system works better with Google.

For example, suppose you search in Bulgarian for the phrase "web site development", which in Bulgarian is written "разработка на сайт".

In this transliteration the problematic letter is "й", because it has no Latin equivalent and is transliterated in different ways, such as "j", "y" or "i". Much depends on which foreign language the person has studied and on their level of education.

Those who have studied English would use any of the three variants, which are all equally understandable phonetically. But those who have studied Spanish would never use "j", because in Spanish that letter stands for an entirely different sound, one that an English speaker might write as "dzh" or "h".

So, if you search Google for "development of a site" in Bulgarian:

https://www.google.com/search?q=%D0%B8%D0%B7%D1%80%D0%B0%D0%B1%D0%BE%D1%82%D0%BA%D0%B0+%D0%BD%D0%B0+%D1%81%D0%B0%D0%B9%D1%82&client=firefox-b&ei=IGDCW-6UI4nHrgSg46a4Bw&start=40&sa=N&biw=1649&bih=898

You'll see that most of the resulting pages have "site" transliterated as "sait" or "sayt".

But very few of the pages shown on the first pages of Google results, if any, will have "sajt" in their URL.

This illustrates how, in the course of everyday search engine optimization work, you can find out in practice which transliteration system works.
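
To make the three competing spellings concrete, here is a minimal Python sketch that transliterates "сайт" with each candidate for "й". The `BASE` table and the `transliterate` helper are hypothetical names made up for this illustration; they cover only the letters of this one word and are not the mapping in bg.php.

```python
# Letters of "сайт" that every scheme discussed here agrees on (illustrative subset).
BASE = {"с": "s", "а": "a", "т": "t"}

def transliterate(word, short_i):
    """Transliterate `word`, using `short_i` as the Latin form of "й"."""
    table = {**BASE, "й": short_i}
    # Characters missing from the table are passed through unchanged.
    return "".join(table.get(ch, ch) for ch in word)

# One spelling per candidate for "й": "i", "y", and "j".
variants = [transliterate("сайт", v) for v in ("i", "y", "j")]
print(variants)
```

The three results correspond to the "sait", "sayt" and "sajt" spellings compared in the search results above.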

@jenlampton
Member

This looks like an easy win, adding milestone candidate for the next bug-fix release.

@jenlampton jenlampton changed the title Bulgarian Transliteration Improvment - bg.php [SEO] Bulgarian Transliteration Improvement - bg.php Apr 28, 2019
@amilenkov
Author

That's great, thank you!

Since opening this issue on October 13, 2018, I have had to manually edit the core/includes/transliteration/bg.php file with every core update to get the desired transliteration.

Since then I have carefully observed the transliteration used by other Bulgarian sites, and I am convinced that the proposed transliteration is the most common and the most acceptable for both site visitors and search engines.

I have developed and maintain more than 20 Backdrop CMS sites, and since the beginning of this year I have been working only with this CMS.

@herbdool

I've done a code review. Kind of hard to test but seems safe to include.

@quicksketch
Member

Looks like we have a failing test:

fail първа статия is correctly transliterated to pyrva statija (actual: parva statiya) in bg langcode. transliteration.test:60
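
The failure message suggests that the test still expects the old scheme while the new file produces different output for "ъ" and "я". The Python sketch below reproduces both strings. The `OLD` and `NEW` tables are inferred only from the expected and actual values in the message above; they are an assumption for illustration, not a copy of either bg.php file, and `transliterate` is a hypothetical helper.

```python
# Letters on which the expected and actual strings agree.
COMMON = {"п": "p", "р": "r", "в": "v", "а": "a", "с": "s", "т": "t", "и": "i"}

# Differences inferred from "pyrva statija" (expected) vs "parva statiya" (actual).
OLD = {**COMMON, "ъ": "y", "я": "ja"}
NEW = {**COMMON, "ъ": "a", "я": "ya"}

def transliterate(text, table):
    # Characters missing from the table (such as the space) pass through unchanged.
    return "".join(table.get(ch, ch) for ch in text)

phrase = "първа статия"
expected_by_test = transliterate(phrase, OLD)
produced_by_new_file = transliterate(phrase, NEW)
print(expected_by_test)
print(produced_by_new_file)
```

If that reading is right, the expected string in the test would need to be updated together with the mapping file.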

@jenlampton jenlampton added this to the 1.13.2 milestone May 23, 2019
@klonos klonos modified the milestones: 1.13.2, 1.13.3 Jun 1, 2019
@quicksketch quicksketch modified the milestones: 1.13.3, 1.13.4 Aug 7, 2019
@jenlampton jenlampton modified the milestones: 1.13.4, 1.14.1 Sep 16, 2019
@jenlampton jenlampton modified the milestones: 1.14.1, 1.14.2 Oct 13, 2019
@klonos klonos modified the milestones: 1.14.2, 1.14.3 Dec 18, 2019
@jenlampton jenlampton modified the milestones: 1.14.3, 1.15.1 Jan 15, 2020
@jenlampton jenlampton modified the milestones: 1.15.1, 1.15.2 Mar 19, 2020
@jenlampton jenlampton modified the milestones: 1.15.2, 1.16.1, 1.16.2 May 15, 2020
@jenlampton jenlampton modified the milestones: 1.16.2, 1.16.3 Jun 17, 2020
@jenlampton jenlampton modified the milestones: 1.16.3, 1.17.1 Sep 15, 2020
@jenlampton jenlampton modified the milestones: 1.17.1, 1.17.2 Sep 30, 2020
@quicksketch
Member

It looks like this issue was replaced by #1604? Which states a similar purpose and was also vetted by @amilenkov. Please reopen if I've misunderstood. I've merged the associated PR from that issue. Thanks @amilenkov and @klonos!
