Commit

Merge branch 'stage' into production
moz-rotimib committed Feb 14, 2024
2 parents 77e6418 + 9f939be commit 7d9f91d
Showing 30 changed files with 1,320 additions and 100 deletions.
3 changes: 1 addition & 2 deletions .env-local-docker.example
@@ -1,11 +1,10 @@
CV_DB_ROOT_PASS="voicewebroot"
CV_MYSQLHOST="db"
CV_S3_CONFIG='{"endpoint": "http://s3proxy:80", "accessKeyId": "local-identity", "secretAccessKey": "local-credential", "s3ForcePathStyle": true}'
CV_STORAGE_LOCAL_DEVELOPMENT_ENDPOINT="http://storage:8080"
CV_BULK_SUBMISSION_BUCKET_NAME="common-voice-bulk-submissions"
CV_ENVIRONMENT="local"
CV_PROD="false"
CV_IMPORT_SENTENCES="true"
CV_EMAIL_USERNAME_FROM="commonvoice@mozilla.com"
CV_EMAIL_USERNAME_TO="commonvoice@mozilla.com"
CV_REDIS_URL='redis'
CV_REDIS_URL="redis"
69 changes: 48 additions & 21 deletions docs/DEVELOPMENT.md
@@ -10,6 +10,7 @@ The project is organized into the following directories:
- _scripts_: Some scripts for managing data
- _server_: The server-side app logic, written in [TypeScript](http://www.typescriptlang.org/).
- _web_: The Common Voice website files, written in [TypeScript](http://www.typescriptlang.org/). We use [React](https://reactjs.org/) to build the website.
- _bundler_: The service that creates the dataset release bundles for Common Voice, written in [TypeScript](http://www.typescriptlang.org/).

## Docker

@@ -26,33 +27,59 @@ You can find configurable options, like the port Common Voice is running on, in

If you're using Docker, you should save this file as `.env-local-docker` (see `.env-local-docker.example`) in the root directory of the project, and it will be formatted like unix env values, with each key having a `CV_` prefix. For example:

```
```Dotenv
CV_DB_ROOT_PASS="root"
CV_MYSQLHOST="db"
CV_IMPORT_SENTENCES="true"
```

> You can copy the example with `cp .env-local-docker.example .env-local-docker`.
Copy the example with:

```sh
> cd common-voice
> cp .env-local-docker.example .env-local-docker
```

This will instruct your application to import the sentences located in `server/data/*` on startup. This step is _IMPORTANT_: without it you cannot contribute to specific languages.
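
As a rough illustration of the naming convention (keys carry a `CV_` prefix in `.env-local-docker`, and no prefix in `config.json`), the following TypeScript sketch strips the prefix from environment-style keys. `stripCvPrefix` and `EnvMap` are hypothetical names introduced here; this is not the project's actual config loader.

```typescript
// Hypothetical helper for illustration only; not the project's config loader.
type EnvMap = { [key: string]: string | undefined };

const CV_PREFIX = 'CV_';

// Collect every CV_-prefixed variable and strip the prefix from its key.
// Variables without the prefix are ignored.
function stripCvPrefix(env: EnvMap): { [key: string]: string } {
  const config: { [key: string]: string } = {};
  Object.keys(env).forEach(key => {
    const value = env[key];
    if (key.indexOf(CV_PREFIX) === 0 && value !== undefined) {
      config[key.slice(CV_PREFIX.length)] = value;
    }
  });
  return config;
}

// Example input mirroring the sample values shown above.
const exampleConfig = stripCvPrefix({
  CV_DB_ROOT_PASS: 'root',
  CV_MYSQLHOST: 'db',
  CV_IMPORT_SENTENCES: 'true',
  PATH: '/usr/bin',
});
```

Note that values read this way stay strings (`'true'`, not `true`); a real loader would still need to convert them to the expected types.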

### Setup steps

Run the following commands:

```
```sh
> cd common-voice
> docker-compose up
```

This is going to:

- Launch a mysql instance configured for `common-voice`
- Launch an s3proxy instance to store files locally and avoid going through setting up AWS S3.
- Mount the project using a Docker volume to allow reflecting changes to the codebase directly to the container.
- Launch a fake GCP Cloud Storage instance to store files locally and avoid going through setting up GCP Cloud Storage
- Mount the project using a Docker volume to allow reflecting changes to the codebase directly to the container
- Import sentences from `server/data/*`
- Launch `common-voice` server
- Launch `bundler` service

You can visit the website at [http://localhost:9000](http://localhost:9000).
Once you have imported the sentences for all locales (or just the ones that are of interest to you), open a new terminal and flush the Redis cache:

**Note**: Docker can be a very memory-intensive process. If you notice intermittent failures, or if features like auto-rebuilding crash, try increasing Docker's available memory from within Docker's _Preferences > Resources_ settings.
```sh
> docker exec -it redis redis-cli FLUSHALL
```

This ensures that, on the next restart, the languages you just imported sentences for are available for contribution.

Restart the server and you should be able to visit the website at [http://localhost:9000](http://localhost:9000).

**Notes**:

The _bundler_ service is not strictly needed to run the Common Voice website. To run only the minimal setup, use the following commands:

```sh
> cd common-voice
> docker-compose up web
```

Docker can be a very memory-intensive process. If you notice intermittent failures, or if features like auto-rebuilding crash, try increasing Docker's available memory from within Docker's _Preferences > Resources_ settings.

#### Apple M1 Silicon

@@ -75,14 +102,14 @@ ERROR: Couldn't connect to Docker daemon at http+docker://localhost - is it runn

You may need to build a new image. You can do that by issuing the following commands:

```
```sh
> cd docker/
> docker build .
```

Then after this you can:

```
```sh
> cd ..
> docker-compose up
```
@@ -107,7 +134,7 @@ You can find configurable options, like the port Common Voice is running on, in

If you installed the app manually, create a `/config.json` with the config you want to override in JSON format. The keys will not have a `CV_` prefix. For example:

```
```json
{
"IMPORT_SENTENCES": false,
"MYSQLDBNAME": "voice",
@@ -121,15 +148,15 @@ Once the required components are installed, you need to prepare your database.

You can either create a MySQL superuser that uses the default `DB_ROOT_USER` and `DB_ROOT_PASS` values from `/server/src/config-helper.ts` or create your own config as described above.

### S3 configuration
### Cloud Storage configuration

The Common Voice project uses S3 for voice clip storage. This will be provided for you if you use the Docker installation, but if you are doing local development you will need to set up your own S3 instance. For detaield instructions on how to do that, see [HOWTO_S3.md](./HOWTO_S3.md)
The Common Voice project uses Google Cloud Storage for voice clip storage. This will be provided for you if you use the Docker installation, but if you are doing local development you will need to set up your own Cloud Storage instance. For detailed, although outdated, instructions on how to do that, see [HOWTO_S3.md](./HOWTO_S3.md). The steps to set up buckets on GCP should be similar.

### Setup steps

Make sure your MySQL server is running. Then run the following commands:

```
```sh
> yarn
> yarn start
```
@@ -138,7 +165,7 @@ This will:

1. Install all JavaScript dependencies.
2. Build and serve files located in the `web` folder on localhost.
3. Save uploaded voice clips onto Amazon's S3.
3. Save uploaded voice clips onto Google's Cloud Storage.
4. Lint and rebuild all js files on every change.

You can then access the website at [http://localhost:9000](http://localhost:9000).
@@ -167,7 +194,7 @@ If you want to work with login-related features (Profile, Dashboard, Goals, ...)

For Docker, in `.env-local-docker`:

```env
```Dotenv
CV_AUTH0_DOMAIN="<domain_here>"
CV_AUTH0_CLIENT_ID="<client_id_here>"
CV_AUTH0_CLIENT_SECRET="<client_secret_here>"
@@ -196,14 +223,14 @@ To add a migration run:

At the moment you manually have to change the migration file extension to `.ts`. A migration has to expose the following API:

```typescript
```ts
export const up = async function (db: any): Promise<any> {
return null;
};
return null
}

export const down = async function (): Promise<any> {
return null;
};
return null
}
```

Migrations are always run when the server is started.
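
To make the `up`/`down` contract more concrete, here is a minimal sketch of a migration that creates and drops a table. It rests on several assumptions: the `MigrationDb` interface and its `runSql` method are stand-ins for whatever database handle the migration runner actually passes in, `example_flags` is a made-up table, and `down` is given a `db` parameter here even though the stub above declares none.

```typescript
// Hypothetical subset of the database handle passed to migrations.
// `runSql` is an assumption; check the migration library for the real API.
interface MigrationDb {
  runSql(sql: string): Promise<unknown>;
}

// Apply the change. `example_flags` is a made-up table for illustration.
export const up = async function (db: MigrationDb): Promise<any> {
  return db.runSql(
    'CREATE TABLE example_flags (id INT PRIMARY KEY, name VARCHAR(64))'
  );
};

// Revert the change made by `up`.
export const down = async function (db: MigrationDb): Promise<any> {
  return db.runSql('DROP TABLE example_flags');
};
```

Keeping `down` as the exact inverse of `up` is what lets a migration be rolled back cleanly during development.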
@@ -221,7 +248,7 @@ We're using [Fluent](http://projectfluent.org/) to localize strings. You can fin

To update the list of locales run:

```
```sh
> yarn import-locales
```

20 changes: 10 additions & 10 deletions docs/Sample Bulk Submission - Sheet1.tsv
@@ -1,10 +1,10 @@
Sentence (mandatory) Source (mandatory) Additional rationale for open license (mandatory) Sentence Quality Assurance Feedback (optional) O = satisfactory sentence, X = unsatisfactory sentence Domain (optional)
Six years have passed since I resolved on my present undertaking. Frankenstien, Mary Shelly, 1818, https://www.gutenberg.org/files/42324/42324-h/42324-h.htm My own submission, copyright waived O General
During her illness, many arguments had been urged to persuade my mother to refrain from attending upon her. Frankenstien, Mary Shelly, 1818, https://www.gutenberg.org/files/42324/42324-h/42324-h.htm My own submission, copyright waived O General
She died calmly; and her countenance expressed affection even in death. Frankenstien, Mary Shelly, 1818, https://www.gutenberg.org/files/42324/42324-h/42324-h.htm MCV CC0 waiver process - see legal form O General
My cat is a strange little dude. Jessica Rose (self) MCV CC0 waiver process - see legal form O
I should have brought sunscreen. Jessica Rose (self) More than 100 years since publication O General
Have you read the Doraemon comics yet? Jessica Rose (self) More than 100 years since publication O General
Her don't like pizza. Jane Doe (self) My own submission, copyright waived X
The cat was sitin on the windowsill. Jane Doe (self) My own submission, copyright waived X
The 3 elephants were playing in the mud John Doe (self) My own submission, copyright waived X
Sentence (mandatory) Source (mandatory) Additional rationale for open license (mandatory) Sentence Quality Assurance Feedback: leave blank, for internal use Domain (optional)
Six years have passed since I resolved on my present undertaking. Frankenstien, Mary Shelly, 1818, https://www.gutenberg.org/files/42324/42324-h/42324-h.htm My own submission, copyright waived General
During her illness, many arguments had been urged to persuade my mother to refrain from attending upon her. Frankenstien, Mary Shelly, 1818, https://www.gutenberg.org/files/42324/42324-h/42324-h.htm My own submission, copyright waived General
She died calmly; and her countenance expressed affection even in death. Frankenstien, Mary Shelly, 1818, https://www.gutenberg.org/files/42324/42324-h/42324-h.htm MCV CC0 waiver process - see legal form General
My cat is a strange little dude. Jessica Rose (self) MCV CC0 waiver process - see legal form
I should have brought sunscreen. Jessica Rose (self) More than 100 years since publication General
Have you read the Doraemon comics yet? Jessica Rose (self) More than 100 years since publication General
Her don't like pizza. Jane Doe (self) My own submission, copyright waived
The cat was sitin on the windowsill. Jane Doe (self) My own submission, copyright waived
The 3 elephants were playing in the mud John Doe (self) My own submission, copyright waived
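
For readers who want to check a bulk submission file before uploading it, the sketch below parses one data row into the five columns named in the header. It assumes the columns are tab-separated and appear in the header's order; `BulkSubmissionRow` and `parseBulkRow` are hypothetical names, and this is not the project's own import code.

```typescript
// Hypothetical row shape following the five header columns of the sample file.
interface BulkSubmissionRow {
  sentence: string; // mandatory
  source: string; // mandatory
  rationale: string; // mandatory
  qaFeedback: string; // left blank by submitters, for internal use
  domain: string; // optional
}

// Parse one tab-separated data row; throws if a mandatory column is empty.
function parseBulkRow(line: string): BulkSubmissionRow {
  const cells = line.split('\t');
  const cell = (index: number): string => (cells[index] || '').trim();

  const row: BulkSubmissionRow = {
    sentence: cell(0),
    source: cell(1),
    rationale: cell(2),
    qaFeedback: cell(3),
    domain: cell(4),
  };

  if (!row.sentence || !row.source || !row.rationale) {
    throw new Error('Missing mandatory column in bulk submission row');
  }
  return row;
}
```

A row such as `parseBulkRow('I should have brought sunscreen.\tJessica Rose (self)\tMy own submission, copyright waived\t\tGeneral')` would yield an empty `qaFeedback` and a `domain` of `General`.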
21 changes: 13 additions & 8 deletions server/src/server.ts
@@ -23,6 +23,7 @@ import getCSPHeaderValue from './csp-header-value';
import { ValidationError } from 'express-json-validator-middleware';
import { setupUpdateValidatedSentencesQueue } from './infrastructure/queues/updateValidatedSentencesQueue';
import { setupBulkSubmissionQueue } from './infrastructure/queues/bulkSubmissionQueue';
import { importSentences } from './lib/model/db/import-sentences';

const MAINTENANCE_VERSION_KEY = 'maintenance-version';
const FULL_CLIENT_PATH = path.join(__dirname, '..', '..', 'web');
@@ -245,7 +246,7 @@ export default class Server {
/**
* Perform any scheduled maintenance on the data model.
*/
async performMaintenance(doImport: boolean): Promise<void> {
async performMaintenance(): Promise<void> {
const start = Date.now();
this.print('performing Maintenance');

@@ -254,11 +255,15 @@ export default class Server {
await scrubUserActivity();
await importLocales();

// We do not need to import sentences from files anymore, since users can
// directly add sentences on the CV platform now.
// if (doImport) {
// await importSentences(await this.model.db.mysql.createPool());
// }
// We no longer need to import sentences from files since users can now
// directly add sentences on the CV platform. However, it is still
// valuable to set up a local development environment.
if (
'local' == getConfig().ENVIRONMENT &&
getConfig().IMPORT_SENTENCES
) {
await importSentences(await this.model.db.mysql.createPool());
}

await importTargetSegments();
this.print('Maintenance complete');
@@ -325,7 +330,7 @@ export default class Server {
const { ENVIRONMENT } = getConfig();

if (!ENVIRONMENT || ENVIRONMENT === 'local') {
await this.performMaintenance(options.doImport);
await this.performMaintenance();
return;
}

@@ -349,7 +354,7 @@ export default class Server {
}

try {
await this.performMaintenance(options.doImport);
await this.performMaintenance();
await redis.set(MAINTENANCE_VERSION_KEY, this.version);
} catch (e) {
this.print('error during maintenance', e);
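
The new condition in `performMaintenance` can be read as a small predicate. The sketch below isolates it for clarity; `MaintenanceConfig` and `shouldImportSentences` are hypothetical names, and only the two config fields used by the check are modeled.

```typescript
// Hypothetical config shape; only the two fields used by the check.
interface MaintenanceConfig {
  ENVIRONMENT?: string;
  IMPORT_SENTENCES?: boolean;
}

// Mirrors the condition in the diff: sentences are imported only when the
// environment is explicitly 'local' and IMPORT_SENTENCES is truthy.
function shouldImportSentences(config: MaintenanceConfig): boolean {
  return config.ENVIRONMENT === 'local' && Boolean(config.IMPORT_SENTENCES);
}
```

One consequence worth noting: the startup path treats an unset `ENVIRONMENT` like `'local'` when deciding whether to run maintenance, but this predicate requires the value to be exactly `'local'`, so an unset `ENVIRONMENT` would run maintenance without importing sentences.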
5 changes: 5 additions & 0 deletions web/locales/cy/messages.ftl
@@ -823,6 +823,11 @@ no-information-available = Dim gwybodaeth ar gael
dataset-metadata-sex = Rhyw
# dataset metadata - age of contributor
dataset-metadata-age = Oed
donate-modal-message = Mae eich set ddata yn cael ei lwytho i lawr!
dataset-donate-modal-heading = Oeddech chi'n gwybod…
donate-modal-explanation-1 = Mae’n costio bron i filiwn o ddoleri’r flwyddyn i gynnal y setiau data a gwella’r llwyfan ar gyfer y 100+ o gymunedau iaith sy’n dibynnu ar yr hyn rydym yn ei wneud?
# Text in <bold></bold> will be shown in bold
donate-modal-explanation-2 = <bold>Os ydych yn gwerthfawrogi data agored, cynhwysol - cyfrannwch heddiw!</bold>
## Download Modal

5 changes: 5 additions & 0 deletions web/locales/de/messages.ftl
@@ -806,6 +806,11 @@ no-information-available = Keine Informationen verfügbar
dataset-metadata-sex = Geschlecht
# dataset metadata - age of contributor
dataset-metadata-age = Alter
donate-modal-message = Ihr Datensatz wird heruntergeladen!
dataset-donate-modal-heading = Wussten Sie …
donate-modal-explanation-1 = Es kostet fast eine Million Dollar im Jahr, die Datensätze zu hosten und die Plattform für die über 100 Sprachgemeinschaften zu verbessern, die sich auf unsere Arbeit verlassen.
# Text in <bold></bold> will be shown in bold
donate-modal-explanation-2 = <bold>Wenn Sie Wert auf offene, inklusive Daten legen – spenden Sie heute!</bold>
## Download Modal

44 changes: 42 additions & 2 deletions web/locales/dsb/messages.ftl
@@ -407,8 +407,12 @@ native-language =
profile-form-add-accent = Nowy swójski akcent „{ $inputValue }“ pśidaś
profile-form-submit-save = Składowaś
profile-form-submit-saved = Skłaźony
male = Muski
female = Žeńscyny
male_masculine = Muskecy/Maskulinowy
female_feminine = Žeńscyny/Femininy
intersex = Interseksuelny
transgender = Transgender
non-binary = Njebinarne
do_not_wish_to_say = Njok pódaś
# Gender
other = Druge
why-profile-title = Cogodla profil?
@@ -768,6 +772,8 @@ no-information-available = Žedne informacije k dispoziciji
dataset-metadata-sex = Rod
# dataset metadata - age of contributor
dataset-metadata-age = Starstwo
donate-modal-message = Waša datowa sajźba se ześěgujo!
dataset-donate-modal-heading = Sćo wěźeł…
## Download Modal

@@ -1545,9 +1551,11 @@ localization-select =
partnerships-header = Partnaŕstwa
partnerships-get-in-touch = Kontakt
partnerships-become-a-partner = Buźćo partnaŕ Common Voice
partnerships-community-header = Zgromadnosć, kreatiwne a ciwilna towarišnosć
partnerships-foundations-header = Załožby
partnerships-governments-header = Kněžaŕstwa
partnerships-academia-header = Uniwersity, akademikarje a slěźarje
partnerships-corporates-header = Wjelike korporacije a pśedewześa platformow
partnerships-our-partners = Naše partnarje
# FIRST POST SUBMISSION CTA
first-cta-header-text = Wjeliki źěk, až sćo pósćił waše powědańske klipy!
@@ -1590,6 +1598,7 @@ misreadings-tip-3 = [Wopśimjeśe njewótpowědujo]
background-noise-example-1 = Wjelicke dinosawrierje triasa.
background-noise-tip-2 = [Źěl teksta njejo słyšaś]
background-voices-example-1 = Wjelicke dinosawriery triasa. [cyta se wót jadnogo głosa]
background-voices-tip-1 = Pśiźoš? [głos drugego]
still-have-questions = Maśo hyšći pšašanja?
contact-common-voice = Stajśo z teamom Common Voice do zwiska
public-domain = Zjawnje wužywajobny
@@ -1617,6 +1626,9 @@ adding-sentences-subheader-offensive-content = Njepśistojne wopśimjeśe
reviewing-sentences-explanation-1 = Jolic sada kriterijam górjejce wótpowědujo, klikniśo na tłocašk „Jo“.
reviewing-sentences-explanation-2 = Jolic sada kriterijam górjejce njewótpowědujo, klikniśo na tłocašk „Ně“.
reviewing-sentences-explanation-4 = Jolic wam sady wujdu, pomagajśo nam dalšne sady zběraś.
## WRITE PAGE

sentence =
.label = Sada
sentence-input-value = Zapódajśo how swóju zjawnje wužywajobnu sadu
@@ -1639,19 +1651,47 @@ add-sentence-error = Zmólka pśi pśidawanju sady
required-field = Pšosym wupołniśo toś to pólo.
single-sentence = Jadna sada
bulk-sentences = Wjele sadow
# Sentence Domain dropdown option
agriculture = Rolnikaŕstwo
# Sentence Domain dropdown option
automotive = Awta
# Sentence Domain dropdown option
finance = Finance
# Sentence Domain dropdown option
food_service_retail = Caroba, słužbywugbaśe a pśedań
# Sentence Domain dropdown option
general = Powšykne
# Sentence Domain dropdown option
healthcare = Strowotnistwo
# Sentence Domain dropdown option
history_law_government = Stawizny, pšawnistwo a kněžaŕstwo
# Sentence Domain dropdown option
language_fundamentals = Rěcne zakłady (na pś. cyfry, pismiki, pjenjeze)
# Sentence Domain dropdown option
media_entertainment = Medije a rozwjaselenje
# Sentence Domain dropdown option
nature_environment = Pśiroda a wobswět
# Sentence Domain dropdown option
news_current_affairs = Nowosći a aktualne nastupnosći
# Sentence Domain dropdown option
technology_robotics = Technologija a robotika
## REVIEW PAGE

sc-review-rules-title = Wótpowědujo sada směrnicam?
report-sc-different-language = Druga rěc
report-sc-different-language-detail = Jo w drugej rěcy napisana ako pśeglědujom.
# SENTENCE-COLLECTOR-REDIRECT PAGE
sc-redirect-page-title = Pśewjeźomy někotare změny
sc-redirect-page-subtitle-2 = Stajśo nam pšašanja na <matrixLink>Matrix</matrixLink>, <discourseLink>Discourse</discourseLink> abo z <emailLink>e-mailu</emailLink>.
## BULK SUBMISSION

# <icon></icon> will be replaced with an icon that represents upload
sc-bulk-upload-header = Nagrajśo <icon></icon> zjawnje wužywajobne sady
sc-bulk-upload-instruction = Śěgniśo swóju dataju sem abo <uploadButton>klikniśo za nagrawanje</uploadButton>
sc-bulk-upload-instruction-drop = Pušććo dataju how, aby ju nagrał
try-upload-again-md = Nagraśe znowego wopytaś
select-file = Dataju wubraś
select-file-mobile = Wubjeŕśo dataju za nagraśe
accepted-files = Akceptěrowane datajowe typy: jano .tsv
5 changes: 5 additions & 0 deletions web/locales/el/messages.ftl
@@ -808,6 +808,11 @@ no-information-available = Δεν υπάρχουν διαθέσιμες πληρ
dataset-metadata-sex = Φύλο
# dataset metadata - age of contributor
dataset-metadata-age = Ηλικία
donate-modal-message = Γίνεται λήψη του συνόλου δεδομένων σας!
dataset-donate-modal-heading = Γνωρίζατε ότι…
donate-modal-explanation-1 = Η «φιλοξενία» των συνόλων δεδομένων, καθώς και η βελτίωση της πλατφόρμας για τις 100+ γλωσσικές κοινότητες που βασίζονται στο έργο μας, κοστίζουν σχεδόν ένα εκατομμύριο δολάρια ετησίως;
# Text in <bold></bold> will be shown in bold
donate-modal-explanation-2 = <bold>Εάν εκτιμάτε τα ανοικτά, συμπεριληπτικά δεδομένα, κάντε μια δωρεά σήμερα!</bold>
## Download Modal
