A lightweight Tkinter GUI to speed up PLINK-based population extraction and ROI-to-FASTA conversion
This repository provides a small GUI tool that automates repetitive pre-processing steps for population genetics analyses. The goal is not to replace command-line tooling, but to standardize and speed up the production of population-specific PLINK files and the extraction of regions of interest (ROIs) for downstream conversion to FASTA, and to reduce errors along the way.
(The key imports used in the code are: tkinter/ttk/filedialog for the GUI, shutil and os for filesystem operations, pandas for .fam parsing, and subprocess for calling PLINK/Perl.)
- Python 3.8+ recommended.
Using venv + pip
# create & activate
python3 -m venv .venv
# Linux / macOS
source .venv/bin/activate
# Windows (PowerShell)
.venv\Scripts\Activate.ps1
# upgrade packaging tools
pip install --upgrade pip setuptools wheel
# install runtime deps
pip install pandas
Using Conda (recommended if you already use conda)
conda create -n genomics python=3.10 pandas -y
conda activate genomics
# if tkinter is missing, install it from conda-forge
conda install -c conda-forge tk -y
`tkinter` is included with many Python distributions but may require a platform package:
- Ubuntu / Debian
sudo apt update
sudo apt install python3-tk
- Fedora
sudo dnf install python3-tkinter
- Arch
sudo pacman -S tk
- macOS
  - Official python.org installers usually include `tkinter`.
  - If using Homebrew Python, you may need `tcl-tk` and extra config.
- Windows
  - Official Python installer includes `tkinter` by default.
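To confirm that the interpreter in the active environment can actually import `tkinter` (and `pandas`), a quick check along these lines may help. The helper name `can_import` is illustrative and is not part of the app.

```python
import importlib


def can_import(name: str) -> bool:
    """Return True if the module `name` imports cleanly in this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Report each runtime dependency the GUI relies on.
    for module in ("tkinter", "pandas"):
        status = "found" if can_import(module) else "MISSING"
        print(f"{module}: {status}")
```

If `tkinter` is reported as missing, install the platform package listed above for your OS and run the check again.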
- PLINK (required). Download the appropriate PLINK binary for your OS and either add it to your `PATH` or set the `plink_path` variable in the code to the executable location.
- Perl (required if you plan to use the included `myfile.pl` script for `.ped` → FASTA conversion).
  - Ubuntu: `sudo apt install perl`
  - Windows: use Strawberry Perl
  - macOS: Perl is usually present
- PyInstaller (optional), for creating single-file executables:
  pip install pyinstaller

- Finds `.fam` files in a selected directory and updates family/individual IDs to standardized population short tags.
- Calls PLINK to build updated BFILES, then splits those BFILES by population (using generated per-population `.fam` files and `--keep-fam`).
- Extracts a specified ROI (base-pair start/end) for a chosen chromosome and converts the `.ped` output to FASTA using an external Perl script.
- Displays step-by-step progress and error messages in the GUI.
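The per-population split can be pictured as one PLINK call per population. The sketch below is not the app's actual code: the function names and argument names are hypothetical, and it assumes the per-population `.fam` file is passed directly to `--keep-fam`, which reads family IDs from the first column.

```python
import subprocess
from typing import List, Tuple


def build_keep_fam_cmd(plink_path: str, bfile_prefix: str,
                       keep_file: str, out_prefix: str) -> List[str]:
    """Build the PLINK argument list that keeps only the listed families."""
    return [
        plink_path,
        "--bfile", bfile_prefix,   # input .bed/.bim/.fam prefix
        "--keep-fam", keep_file,   # file whose first column lists family IDs
        "--make-bed",              # write a new binary fileset
        "--out", out_prefix,       # output prefix, e.g. Chr1_AFR
    ]


def run_plink(cmd: List[str]) -> Tuple[int, str]:
    """Run PLINK and return (exit code, combined output) for logging."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr
```

Passing the command as a list rather than a single shell string avoids quoting problems when paths contain spaces.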
- `Application` class inherits from `tk.Tk` and sets up the UI (Input and Progress tabs).
- Input controls include: directory browser, population selection (default or custom), chromosome, ROI start/end and ROI list.
- Core pipeline functions:
  - `update_fam_ids(fam_file)`: parses the `.fam`, maps sample IDs to population short tags and writes `_updated.fam`.
  - `run_first_step(...)`: enumerates `.fam` files, updates IDs, and calls PLINK to create updated BFILES.
  - `run_population_files(...)`: splits the updated `.fam` into population-specific `.fam` files and calls PLINK `--keep-fam` to generate per-population BFILES.
  - `make_ped_map_info(...)`: extracts the ROI with PLINK `--from-bp`/`--to-bp`, produces `.ped`, copies the Perl script into the output folder and runs it to make a FASTA.
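As a rough illustration of the ID-update step, the sketch below rewrites the family ID column of a `.fam` file from a sample-to-tag mapping. It is an assumption-laden stand-in, not the app's `update_fam_ids`: it uses only the standard library instead of pandas so that it runs without extra packages, the name `update_fam_ids_sketch` and the `sample_to_tag` dictionary are hypothetical, and it assumes the short tag replaces the family ID (column 1) while the individual ID (column 2) is left alone.

```python
import os
from typing import Dict


def update_fam_ids_sketch(fam_file: str, sample_to_tag: Dict[str, str]) -> str:
    """Write <prefix>_updated.fam with family IDs replaced by population tags."""
    root, _ = os.path.splitext(fam_file)
    out_path = root + "_updated.fam"
    with open(fam_file) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split()
            if len(fields) != 6:
                continue  # a .fam row has six columns; skip blank or malformed lines
            fid, iid = fields[0], fields[1]
            # Samples without a known tag keep their original family ID.
            fields[0] = sample_to_tag.get(iid, fid)
            dst.write(" ".join(fields) + "\n")
    return out_path
```

For example, `update_fam_ids_sketch("workdir/prefix.fam", {"S1": "AFR"})` would write `workdir/prefix_updated.fam`, where `S1` is a made-up sample ID.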
- Run the GUI:
  python app.py
- In the GUI:
  - Click Browse and select the directory containing your PLINK files (`*.bed`, `*.bim`, `*.fam`).
  - Choose Default Population (built-in mapping) or Custom Population (add your own short IDs / names).
  - Enter Chromosome (e.g., `1`) and add one or more ROI ranges by filling `ROI Start` and `ROI End` and clicking Add ROI.
  - Click Run and monitor the Progress tab for logs and errors.
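The chromosome and ROI values entered in the GUI end up as PLINK range filters. A minimal sketch of that translation, assuming the values arrive as plain text from the entry fields, might look like the following. `parse_roi` and `build_roi_cmd` are hypothetical names, and the validation rules are a guess at sensible defaults rather than the app's exact behaviour.

```python
from typing import List, Tuple


def parse_roi(start_text: str, end_text: str) -> Tuple[int, int]:
    """Convert the ROI Start / ROI End field text to a validated (start, end)."""
    start, end = int(start_text.strip()), int(end_text.strip())
    if start < 0 or end < start:
        raise ValueError(f"invalid ROI range: {start}..{end}")
    return start, end


def build_roi_cmd(plink_path: str, bfile_prefix: str, chrom: str,
                  start: int, end: int, out_prefix: str) -> List[str]:
    """Build the PLINK call that extracts one ROI as .ped/.map text files."""
    return [
        plink_path,
        "--bfile", bfile_prefix,
        "--chr", str(chrom),       # --from-bp/--to-bp need a chromosome
        "--from-bp", str(start),
        "--to-bp", str(end),
        "--recode",                # write .ped/.map
        "--out", out_prefix,
    ]
```

Rejecting a reversed or non-numeric range before calling PLINK keeps the error message in the Progress tab short and specific.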
workdir/
prefix.bed
prefix.bim
prefix.fam
other_prefix.bed
other_prefix.bim
other_prefix.fam
Below is a realistic preview of the files the GUI creates. Paths and names are examples; your actual filenames will depend on your input BFILE prefixes and the ROI ranges you provide.
workdir/                         # directory you selected in GUI
└─ population_bfiles/            # created by the app (per-folder where .fam lives)
   ├─ Chr1_AFR.bed
   ├─ Chr1_AFR.bim
   ├─ Chr1_AFR.fam
   ├─ Chr1_AMR.bed
   ├─ Chr1_AMR.bim
   ├─ Chr1_AMR.fam
   ├─ Chr1_EUR.bed
   ├─ Chr1_EUR.bim
   ├─ Chr1_EUR.fam
   └─ 1000000_to_2000000/        # ROI folder created for start=1,000,000 end=2,000,000
      ├─ AFR_ROI.ped             # PLINK recode output for AFR in that ROI
      ├─ AFR_ROI.map
      ├─ AFR_FASTA.fas           # FASTA produced by running the Perl converter
      ├─ AMR_ROI.ped
      ├─ AMR_FASTA.fas
      ├─ EUR_ROI.ped
      ├─ EUR_FASTA.fas
      └─ myfile.pl               # copied Perl helper (if configured to do so)
Notes:
- For each population and chromosome the app creates a `.bed/.bim/.fam` triplet named `Chr{chrom}_{SHORTTAG}.*`.
- For each ROI a subfolder named `{start}_to_{end}` is created and receives the `.ped/.map` (PLINK --recode) files and the FASTA outputs produced by the Perl script.
- If a population has no matching samples in `_updated.fam`, the corresponding `.fam` and BFILE will not be produced and the app logs this event.
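The naming rules in these notes can be written down as a few small path helpers. This is only a sketch of the conventions described above; the function names are hypothetical and the app may build its paths differently.

```python
from pathlib import Path
from typing import Dict


def population_prefix(out_dir: str, chrom: str, short_tag: str) -> Path:
    """Prefix for a per-population fileset, e.g. Chr1_AFR (.bed/.bim/.fam)."""
    return Path(out_dir) / f"Chr{chrom}_{short_tag}"


def roi_dir(out_dir: str, start: int, end: int) -> Path:
    """Subfolder for one ROI, e.g. 1000000_to_2000000."""
    return Path(out_dir) / f"{start}_to_{end}"


def roi_outputs(out_dir: str, short_tag: str, start: int, end: int) -> Dict[str, Path]:
    """Expected ROI output files for one population."""
    folder = roi_dir(out_dir, start, end)
    return {
        "ped": folder / f"{short_tag}_ROI.ped",
        "map": folder / f"{short_tag}_ROI.map",
        "fasta": folder / f"{short_tag}_FASTA.fas",
    }
```

Keeping the naming in one place makes it easier to check whether an expected output exists before moving on to the next step.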
Instead of attempting to guess or fix environment-specific tools, this app expects you to explicitly provide the locations of external dependencies. Before clicking Run, make sure you have either placed the tools on your PATH or updated the code/settings with the correct paths.
Please provide the following before running:
- PLINK executable path: edit the `plink_path` variable in the script or add PLINK to your system `PATH`. Example absolute paths:
  - Windows: `C:\tools\plink\plink.exe`
  - Linux/macOS: `/usr/local/bin/plink`
- Perl converter script path (`myfile.pl`): set `source_file` in the code to point to the Perl script, or place `myfile.pl` inside the project folder and confirm the GUI is configured to copy it into ROI output folders. Example:
  - `D:\Projects\Summer 2024\myfile.pl` (Windows)
  - `/home/user/scripts/myfile.pl` (Linux/macOS)
- Working directory: pick the directory containing the input BFILES (`.bed`/`.bim`/`.fam`) via the GUI Browse button.
If any of those are not provided or invalid, the GUI will log the missing path and skip the affected steps. The app will not attempt to autodetect PLINK/Perl locations — it requires explicit paths to avoid running the wrong binary on your machine.
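A pre-flight check of the three paths could be sketched as follows. The function name `check_paths` and the message wording are assumptions, not the app's own code. The sketch accepts either an existing file or a command name that resolves on `PATH`, which mirrors the two options described above without searching anywhere else.

```python
import os
import shutil
from typing import List


def check_paths(plink_path: str, perl_script: str, workdir: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    # PLINK: an explicit file path, or a command name found on PATH.
    if not (os.path.isfile(plink_path) or shutil.which(plink_path)):
        problems.append(f"PLINK executable not found: {plink_path}")
    # Perl converter: must be an existing script file.
    if not os.path.isfile(perl_script):
        problems.append(f"Perl converter script not found: {perl_script}")
    # Working directory: must exist and be a directory.
    if not os.path.isdir(workdir):
        problems.append(f"Working directory not found: {workdir}")
    return problems
```

Each returned message can be written to the Progress tab, and the steps that depend on a missing path can then be skipped.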
This project is licensed under the MIT License — see the LICENSE file for details.
Copyright (c) 2025 Rahul Madhav
Created by Rahul Madhav. I welcome bug reports and pull requests — please open an issue first to discuss larger changes.