Tkinter GUI for faster PLINK workflows: manage populations, update .fam IDs, and generate FASTA from .ped

Scriptococcus/GPLINK

Genomics Pipeline GUI

A lightweight Tkinter GUI to speed up PLINK-based population extraction and ROI-to-FASTA conversion


Purpose

This repository provides a small GUI tool that automates repetitive pre-processing steps for population genetics analyses. The goal is not to replace command-line tooling, but to standardize, speed up, and reduce errors when producing population-specific PLINK files and extracting regions of interest (ROIs) for downstream conversion to FASTA.


Key dependencies & installation

The code imports tkinter (with ttk and filedialog) for the GUI, shutil and os for filesystem operations, pandas for .fam parsing, and subprocess for calling PLINK and Perl.

Python

  • Python 3.8+ recommended.

Python packages (install in a virtual environment)

Using venv + pip

# create & activate
python3 -m venv .venv
# Linux / macOS
source .venv/bin/activate
# Windows (PowerShell)
.venv\Scripts\Activate.ps1

# upgrade packaging tools
pip install --upgrade pip setuptools wheel

# install runtime deps
pip install pandas

Using Conda (recommended if you already use conda)

conda create -n genomics python=3.10 pandas -y
conda activate genomics

# if tkinter is missing, install it from conda-forge
conda install -c conda-forge tk -y

tkinter (GUI toolkit)

tkinter is included with many Python distributions but may require a platform package:

  • Ubuntu / Debian
    sudo apt update
    sudo apt install python3-tk
  • Fedora
    sudo dnf install python3-tkinter
  • Arch
    sudo pacman -S tk
  • macOS
    • Official python.org installers usually include tkinter.
    • If using Homebrew Python, you may need tcl-tk and extra config.
  • Windows
    • Official Python installer includes tkinter by default.
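
To confirm that tkinter is usable before launching the GUI, a quick import check is usually enough. The snippet below is a minimal sketch; the function name tkinter_available is hypothetical and not part of this repository. Running `python -m tkinter` from a terminal is an alternative that opens a small test window.

```python
import importlib


def tkinter_available() -> bool:
    """Return True if the tkinter module can be imported by this interpreter."""
    try:
        importlib.import_module("tkinter")
    except ImportError:
        # Raised when the platform package (e.g. python3-tk) is not installed.
        return False
    return True


if __name__ == "__main__":
    if tkinter_available():
        print("tkinter is importable")
    else:
        print("tkinter is missing; install the platform package listed above")
```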

External tools (not pip packages)

  • PLINK — required. Download the appropriate PLINK binary for your OS and either add it to your PATH or set the plink_path variable in the code to the executable location.
  • Perl — required if you plan to use the included myfile.pl script for .ped → FASTA conversion.
    • Ubuntu: sudo apt install perl
    • Windows: use Strawberry Perl
    • macOS: Perl is usually present
  • PyInstaller (optional) — for creating single-file executables:
pip install pyinstaller

What this tool does (short)

  • Finds .fam files in a selected directory and updates family/individual IDs to standardized population short tags.
  • Calls PLINK to build updated BFILES, then splits those BFILES by population (using generated per-population .fam files and --keep-fam).
  • Extracts a specified ROI (base-pair start/end) for a chosen chromosome and converts the .ped output to FASTA using an external Perl script.
  • Displays step-by-step progress and error messages in the GUI.
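
The PLINK calls described in the list above could be assembled roughly as follows. This is a sketch, not the repository's code: the function and argument names are hypothetical, and the exact flag combination the app uses is an assumption. The flags themselves (--bfile, --keep-fam, --make-bed, --chr, --from-bp, --to-bp, --recode, --out) are standard PLINK 1.9 options.

```python
from typing import List


def keep_fam_command(plink_path: str, bfile: str, keep_file: str,
                     out_prefix: str) -> List[str]:
    """Build a PLINK call that keeps only the families listed in keep_file.

    keep_file is a text file whose first column holds family IDs; a
    per-population .fam file satisfies that layout.
    """
    return [plink_path, "--bfile", bfile, "--keep-fam", keep_file,
            "--make-bed", "--out", out_prefix]


def roi_recode_command(plink_path: str, bfile: str, chrom: str,
                       start_bp: int, end_bp: int,
                       out_prefix: str) -> List[str]:
    """Build a PLINK call that extracts a base-pair window as .ped/.map."""
    return [plink_path, "--bfile", bfile,
            "--chr", str(chrom),
            "--from-bp", str(start_bp),
            "--to-bp", str(end_bp),
            "--recode", "--out", out_prefix]
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Passing arguments as a list rather than a single shell string avoids quoting problems with paths that contain spaces.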

How the code is organised (brief explanation)

  • Application class inherits from tk.Tk and sets up the UI (Input and Progress tabs).

  • Input controls include: directory browser, population selection (default or custom), chromosome, ROI start/end and ROI list.

  • Core pipeline functions:

    • update_fam_ids(fam_file): parse .fam, map sample IDs to population short tags and write _updated.fam.
    • run_first_step(...): enumerates .fam files, updates IDs, and calls PLINK to create updated BFILES.
    • run_population_files(...): splits the updated .fam into population-specific .fam and calls PLINK --keep-fam to generate per-population BFILES.
    • make_ped_map_info(...): extracts ROI with PLINK --from-bp/--to-bp, produces .ped, copies the Perl script into the output folder and runs it to make a FASTA.
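
To illustrate the ID-update step, here is a minimal sketch of what a function like update_fam_ids might do. It is not the repository's implementation: the app uses pandas, whereas this sketch uses plain string splitting to stay dependency-free, and the sample_to_tag mapping (individual ID to population short tag) is a hypothetical input. It relies only on the standard .fam layout of six whitespace-separated columns (family ID, individual ID, father, mother, sex, phenotype).

```python
from pathlib import Path
from typing import Dict


def update_fam_ids_sketch(fam_file: str, sample_to_tag: Dict[str, str]) -> Path:
    """Write <prefix>_updated.fam with family IDs replaced by population tags.

    Individuals missing from sample_to_tag keep their original family ID.
    """
    src = Path(fam_file)
    dst = src.with_name(src.stem + "_updated.fam")
    out_lines = []
    for line in src.read_text().splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 6:
            raise ValueError(f"expected 6 columns in .fam line: {line!r}")
        fid, iid = fields[0], fields[1]
        fields[0] = sample_to_tag.get(iid, fid)
        out_lines.append(" ".join(fields))
    dst.write_text("\n".join(out_lines) + "\n")
    return dst
```

Writing to a new `_updated.fam` file rather than overwriting the input keeps the original data intact if a mapping turns out to be wrong.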

Quick usage (GUI walkthrough)

  1. Run the GUI:
python app.py
  2. In the GUI:
    • Click Browse and select the directory containing your PLINK files (*.bed, *.bim, *.fam).
    • Choose Default Population (built-in mapping) or Custom Population (add your own short IDs / names).
    • Enter Chromosome (e.g., 1) and add one or more ROI ranges by filling ROI Start and ROI End and clicking Add ROI.
    • Click Run and monitor the Progress tab for logs and errors.
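
Since ROI Start and ROI End arrive as text from the entry fields, they need validating before they reach PLINK. The helper below is a sketch of one way to do that; parse_roi is a hypothetical name, and the rules it enforces (non-negative integers, start not greater than end) are assumptions rather than the app's documented behaviour.

```python
from typing import Tuple


def parse_roi(start_text: str, end_text: str) -> Tuple[int, int]:
    """Convert ROI entry-field text to a (start, end) pair of base-pair positions."""
    start = int(start_text.strip())  # raises ValueError on non-numeric input
    end = int(end_text.strip())
    if start < 0 or end < 0:
        raise ValueError("ROI positions must be non-negative")
    if start > end:
        raise ValueError("ROI start must not exceed ROI end")
    return start, end
```

For example, `parse_roi("1000000", "2000000")` yields the pair used for the ROI folder name `1000000_to_2000000`.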

Expected directory layout (before running)

workdir/
  prefix.bed
  prefix.bim
  prefix.fam
  other_prefix.bed
  other_prefix.bim
  other_prefix.fam
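
Given the layout above, the complete BFILE sets in a directory can be found by starting from each .fam file and checking for its .bed and .bim siblings. This is a sketch under that assumption; find_bfile_prefixes is a hypothetical helper, not a function from the repository.

```python
from pathlib import Path
from typing import List


def find_bfile_prefixes(workdir: str) -> List[Path]:
    """Return prefixes (paths without extension) that have .bed, .bim and .fam."""
    prefixes = []
    for fam in sorted(Path(workdir).glob("*.fam")):
        # with_suffix swaps only the final extension, so dotted prefixes survive.
        if all(fam.with_suffix(ext).exists() for ext in (".bed", ".bim")):
            prefixes.append(fam.with_suffix(""))
    return prefixes
```

Each returned prefix can be passed as a string to PLINK's --bfile option.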

Preview: what files and folders are created (after running)

The tree below previews the files the GUI creates. Paths and names are examples; actual filenames depend on your input BFILE prefixes and the ROI ranges you provide.

workdir/                 # directory you selected in the GUI
└─ population_bfiles/    # created by the app in each folder that contains a .fam file
   ├─ Chr1_AFR.bed
   ├─ Chr1_AFR.bim
   ├─ Chr1_AFR.fam
   ├─ Chr1_AMR.bed
   ├─ Chr1_AMR.bim
   ├─ Chr1_AMR.fam
   ├─ Chr1_EUR.bed
   ├─ Chr1_EUR.bim
   ├─ Chr1_EUR.fam
   └─ 1000000_to_2000000/    # ROI folder created for start=1,000,000 end=2,000,000
      ├─ AFR_ROI.ped          # PLINK recode output for AFR in that ROI
      ├─ AFR_ROI.map
      ├─ AFR_FASTA.fas        # FASTA produced by running the Perl converter
      ├─ AMR_ROI.ped
      ├─ AMR_ROI.map
      ├─ AMR_FASTA.fas
      ├─ EUR_ROI.ped
      ├─ EUR_ROI.map
      ├─ EUR_FASTA.fas
      └─ myfile.pl            # copied Perl helper (if configured to do so)

Notes:
- For each population and chromosome the app creates a `.bed/.bim/.fam` triplet named `Chr{chrom}_{SHORTTAG}.*`.
- For each ROI a subfolder named `{start}_to_{end}` is created and receives the `.ped/.map` (PLINK --recode) files and the FASTA outputs produced by the Perl script.
- If a population has no matching samples in `_updated.fam`, the corresponding `.fam` and BFILE will not be produced and the app logs this event.
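
The naming rules in the notes above can be expressed as two small path helpers. This is only a sketch: population_prefix and roi_dir are hypothetical names, and the code mirrors the patterns `Chr{chrom}_{SHORTTAG}` and `{start}_to_{end}` as stated here rather than the app's actual implementation.

```python
from pathlib import Path


def population_prefix(out_dir: str, chrom: str, short_tag: str) -> Path:
    """Prefix for a per-population BFILE triplet, e.g. Chr1_AFR."""
    return Path(out_dir) / f"Chr{chrom}_{short_tag}"


def roi_dir(out_dir: str, start_bp: int, end_bp: int) -> Path:
    """Subfolder that receives the .ped/.map and FASTA files for one ROI."""
    return Path(out_dir) / f"{start_bp}_to_{end_bp}"
```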

Troubleshooting

The app does not try to locate or repair environment-specific tools. It expects you to provide the locations of external dependencies explicitly. Before clicking Run, either place the tools on your PATH or update the code/settings with the correct paths.

Please provide the following before running:

  • PLINK executable path — edit the plink_path variable in the script or add PLINK to your system PATH. Example absolute paths:

    • Windows: C:\tools\plink\plink.exe
    • Linux/macOS: /usr/local/bin/plink
  • Perl converter script path (myfile.pl) — set source_file in the code to point to the Perl script, or place myfile.pl inside the project folder and confirm the GUI is configured to copy it into ROI output folders. Example:

    • D:\Projects\Summer 2024\myfile.pl (Windows)
    • /home/user/scripts/myfile.pl (Linux/macOS)
  • Working directory — pick the directory containing the input BFILES (.bed/.bim/.fam) via the GUI Browse button.

If any of those are not provided or invalid, the GUI will log the missing path and skip the affected steps. The app will not attempt to autodetect PLINK/Perl locations — it requires explicit paths to avoid running the wrong binary on your machine.
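
A pre-flight check along these lines can catch a missing path before the pipeline starts. The sketch below is illustrative only: resolve_tool and missing_dependencies are hypothetical helpers, and the messages are not the app's actual log output. It accepts either an explicit file path or a command name that is already on PATH, matching the two options described above.

```python
import os
import shutil
from typing import List, Optional


def resolve_tool(path_or_name: str) -> Optional[str]:
    """Return an absolute path for an explicit file path or a PATH command."""
    if os.path.isfile(path_or_name):
        return os.path.abspath(path_or_name)
    return shutil.which(path_or_name)  # None when not found on PATH


def missing_dependencies(plink_path: str, perl_script: str,
                         perl: str = "perl") -> List[str]:
    """Return human-readable problems; an empty list means all paths resolve."""
    problems = []
    if resolve_tool(plink_path) is None:
        problems.append(f"PLINK executable not found: {plink_path}")
    if resolve_tool(perl) is None:
        problems.append(f"Perl interpreter not found: {perl}")
    if not os.path.isfile(perl_script):
        problems.append(f"Perl converter script not found: {perl_script}")
    return problems
```

Collecting every problem into one list, instead of stopping at the first, lets all missing paths be reported in a single pass.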


License

This project is licensed under the MIT License — see the LICENSE file for details.

Copyright (c) 2025 Rahul Madhav


Contact / Author

Created by Rahul Madhav. I welcome bug reports and pull requests — please open an issue first to discuss larger changes.
