
Remote Server Configuration

Cahid Arda edited this page Mar 14, 2024 · 9 revisions

Quick Start

To train better word embedding models, we need to use huge datasets consisting of millions of words. To minimize the time required for training, we can use a remote server. In this guide, we explain the initial steps to start working on our project easily.

Step-by-Step Guide

To run Python code on a remote server using PuTTY, follow these steps:

  1. Download and install PuTTY from the official website. The installer is straightforward, and you can accept all the default options.

  2. Launch PuTTY and enter the IP address or hostname of the remote server you want to connect to. If the server does not use the default SSH port (22), specify the port number as well. Click the "Open" button to initiate the connection. Once the connection is established, you will be prompted to log in with your username and password.

  3. Navigate to your working folder and clone the repository:

git clone https://github.com/Turkish-Word-Embeddings/Word-Embeddings-Repository-for-Turkish.git
  4. Enter the repository and create a folder called corpus. This folder is listed in .gitignore, so Git will not track it. Enter the folder and download bounwebcorpus.txt with the following commands:
curl https://tulap.cmpe.boun.edu.tr/server/api/core/bitstreams/150f2e37-1dd3-4229-a37e-111f8a365edf/content --output bounwebcorpus.txt.zip
unzip bounwebcorpus.txt.zip
rm bounwebcorpus.txt.zip
  5. There are two ways to obtain turkish-text-tokenized.txt. You can download the rar files from here and then extract them; since the remote server has no unrar command or equivalent, you would need a Python script for the extraction. Alternatively, you can download the txt version directly from here with gdown, a Python package that can be installed with pip install gdown:
gdown 1PTytZ7yGIl9QvxRxCsfWLlHycU4z1_Vp
  6. Install Miniconda with Python 3.9 with the following commands:
wget https://repo.anaconda.com/miniconda/Miniconda3-py39_23.1.0-1-Linux-x86_64.sh
bash <filename>

Follow the installer's prompts to complete the installation. Replace <filename> with the name of the downloaded installer; the shell pattern Miniconda* also matches it. After installation, restart the terminal and type conda list to make sure conda was installed successfully. If the environment is not activated automatically, run conda activate. You can run conda info --envs to list all environments; the active one is marked with an asterisk (*).
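The installation and verification steps above can be collected into a small shell sketch. The helper name find_installer is hypothetical and not part of the repository, and the sketch assumes the installer was downloaded into the directory you pass in (the current directory by default).

```shell
# Sketch only: find_installer is a hypothetical helper, not part of the repository.

# Print the first file in the given directory (default: current directory)
# whose name matches Miniconda*.sh; return non-zero if no such file exists.
find_installer() {
    fi_dir="${1:-.}"
    for fi_candidate in "$fi_dir"/Miniconda*.sh; do
        # If the pattern matched nothing it stays literal, so -f filters it out.
        if [ -f "$fi_candidate" ]; then
            printf '%s\n' "$fi_candidate"
            return 0
        fi
    done
    return 1
}

# Typical use (left commented out so that loading this snippet installs nothing):
#   bash "$(find_installer)"    # run the installer and follow its prompts
#   conda list                  # after restarting the terminal: checks that conda works
#   conda info --envs           # the active environment is marked with *
```

Keeping the lookup in a function means the installer only runs when you explicitly call bash on its result, which is convenient if the snippet is pasted into a longer setup script.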

  7. Using the Python interpreter provided by conda, you can run the .py scripts in the repository.

  8. To transfer the word embedding models from the remote server to your local machine, you have to install pscp.exe from here. Then follow the steps provided here. For example, to copy the model word2vec-ep5.model, I used the following command:

pscp karahan.saritas@boun.edu.tr@79.123.177.160:/clusterusers/karahan.saritas@boun.edu.tr/turkish-word-embeddings/word2vec/word2vec-ep5.model  C:\Users\karab\Desktop
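If your local machine runs Linux or macOS, or otherwise has OpenSSH installed, the same transfer can probably be done with scp instead of pscp. The following is a minimal sketch of that alternative, not the method used above: remote_source and fetch_model are hypothetical helper names, and the sketch has not been tested against this server. It passes the user name through the ssh option User because the user name itself contains an @ character.

```shell
# Sketch only: an OpenSSH (scp) alternative to pscp.
# remote_source and fetch_model are hypothetical helper names.

# Print the "host:path" source argument for a file on the server.
# Arguments: host, remote path.
remote_source() {
    printf '%s:%s\n' "$1" "$2"
}

# Copy one remote file into a local directory (default: current directory).
# Arguments: user, host, remote path, optional local directory.
# The user name goes through -o User=... so that an "@" inside it
# cannot be confused with the user/host separator.
fetch_model() {
    scp -o "User=$1" "$(remote_source "$2" "$3")" "${4:-.}"
}

# Example call (not run here; substitute your own user name and paths):
#   fetch_model 'your.name@boun.edu.tr' 79.123.177.160 \
#       '/clusterusers/your.name@boun.edu.tr/turkish-word-embeddings/word2vec/word2vec-ep5.model' \
#       "$HOME/Desktop"
```

Like pscp, scp asks for your password interactively, so nothing sensitive has to be written into the script.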
