Remote Server Configuration
To train better word embedding models, we need to use huge datasets consisting of millions of words. To minimize the time required for training, we can use a remote server. In this guide, we explain the initial steps to start working on our project easily.
To run Python code on a remote server using PuTTY, follow these steps:
-
Download and install PuTTY from the official website. You can accept all the default options during installation.
-
Launch PuTTY and enter the IP address or hostname of the remote server you want to connect to. You can also specify the port number if it is different from the default SSH port (22). Click on the "Open" button to initiate the connection. Once you have established the connection, you will be prompted to enter your password.
-
Navigate to your working folder and clone the repository:
git clone https://github.com/Turkish-Word-Embeddings/Word-Embeddings-Repository-for-Turkish.git
-
Open the repository and create a folder called corpus. Git ignores this folder because it is listed in .gitignore. Open the folder and download bounwebcorpus.txt with the following commands:
curl https://tulap.cmpe.boun.edu.tr/server/api/core/bitstreams/150f2e37-1dd3-4229-a37e-111f8a365edf/content --output bounwebcorpus.txt.zip
unzip bounwebcorpus.txt.zip
rm bounwebcorpus.txt.zip
-
To get turkish-text-tokenized.txt, we have two options. You can download the rar files from here and then extract them. Unfortunately, neither the unrar command nor an equivalent is available on our remote server, so you would have to extract them with a Python script. Alternatively, you can directly download the txt version from here:
gdown 1PTytZ7yGIl9QvxRxCsfWLlHycU4z1_Vp
-
Install Miniconda with Python 3.9 using the following commands:
wget https://repo.anaconda.com/miniconda/Miniconda3-py39_23.1.0-1-Linux-x86_64.sh
bash <filename>
Follow the installer's instructions to complete the installation. Instead of typing the full filename, you can match it with the pattern Miniconda* (for example bash Miniconda*, provided only one file matches). After installation, restart the terminal and run conda list to make sure Miniconda is successfully installed. If the environment is not activated automatically, activate it with conda activate. Run conda info --envs to list all environments; the active environment is the one marked with an asterisk (*).
-
Using the Python interpreter provided by conda, you can run the .py scripts in the repository.
-
To transfer the word embedding models from the remote server to your local machine, install pscp.exe from here. Then follow the steps provided here. For example, to copy the model word2vec-ep5.model, I used the following command:
pscp karahan.saritas@boun.edu.tr@79.123.177.160:/clusterusers/karahan.saritas@boun.edu.tr/turkish-word-embeddings/word2vec/word2vec-ep5.model C:\Users\karab\Desktop
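The copy command above follows the general pattern pscp user@host:remote_path local_folder. If you need to copy several models, a small Python helper can assemble the command for you. The following is only a minimal sketch: the function names build_pscp_command and download_model and all example values are hypothetical and not part of the repository, and it assumes pscp is on your PATH and that the server uses the default SSH port unless you say otherwise.

```python
import subprocess
from typing import List


def build_pscp_command(user: str, host: str, remote_path: str,
                       local_dir: str, port: int = 22) -> List[str]:
    """Build the pscp argument list for copying one remote file locally.

    Hypothetical helper for illustration; not part of the repository.
    """
    # pscp expects the source as user@host:path. The user name may itself
    # contain an '@' (as in the example above), which is fine here.
    source = f"{user}@{host}:{remote_path}"
    # -P sets the SSH port (22 is the default).
    return ["pscp", "-P", str(port), source, local_dir]


def download_model(user: str, host: str, remote_path: str,
                   local_dir: str, port: int = 22) -> None:
    """Run pscp; raises CalledProcessError if the transfer fails."""
    cmd = build_pscp_command(user, host, remote_path, local_dir, port)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder values; replace them with your own account and paths.
    cmd = build_pscp_command(
        user="your.username",
        host="remote.example.org",
        remote_path="/path/on/server/word2vec-ep5.model",
        local_dir=r"C:\Users\you\Desktop",
    )
    print(" ".join(cmd))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with Windows paths. Calling download_model with the same arguments would start the actual transfer and prompt for your password in the terminal.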