Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNP weights #104

Open
Tannervanorden opened this issue Nov 16, 2023 · 0 comments
Open

SNP weights #104

Tannervanorden opened this issue Nov 16, 2023 · 0 comments

Comments

@Tannervanorden
Copy link

Hello, I am really struggling to get individual weights of SNPs from a pca analysis. I can get the weights from each individual, but I would really like to get the results from each SNP. I am also wondering why of 1.1 million SNPs in my plink file why the pca plot says it can only use 600,000. It says because they are not all on autosomes, but they are. They are in a species with 34 chromosomes. Any help would be appreciated. I am attaching my code. # Load the SNPRelate package
library(SNPRelate)

Set the file paths

bed.fn <- "/Users/tannervanorden/Desktop/Plink/Original_edited_2.bed"
fam.fn <- "/Users/tannervanorden/Desktop/Plink/Original_edited_2.fam"
bim.fn <- "/Users/tannervanorden/Desktop/Plink/Original_edited_2.bim"

Define the output GDS file

Specify a new GDS file name

new_gds_fn <- "/Users/tannervanorden/Desktop/Plink/PCA.gds"

Convert PLINK BED file to GDS file with cvt.chr="char"

snpgdsBED2GDS(bed.fn, fam.fn, bim.fn, new_gds_fn)

Perform PCA

genofile <- snpgdsOpen(new_gds_fn)
pca_result <- snpgdsPCA(genofile)

Extract population labels

Hendrys <- rep("Hendry's Creek", 42)
PineR <- rep('Pine & Ridge Creeks', 29)
Mill <- rep("Mill Creek", 18)
Trout = rep("Trout Creek", 4)
Mountain_Dell = rep("Mountain Dell", 2)
Birch = rep("Birch Creek", 3)
Willard = rep("Willard Creek", 30)

population_labels <- c(Hendrys, PineR, Mill, Trout, Mountain_Dell, Birch, Willard)

Create a data frame for ggplot

pca_plot_data <- data.frame(
LD1 = pca_result$eigenvect[, 2],
LD2 = pca_result$eigenvect[, 1],
Group = population_labels
)

Define population colors

population_colors <- c("darkorange", "blue", 'goldenrod1', 'red', 'chartreuse1', 'purple', 'pink', "cornflowerblue", 'bisque2', "black", "cyan", "brown", "lightgrey", "olivedrab3")

Create ggplot

pca_plot <- ggplot(pca_plot_data, aes(x = LD1, y = LD2, color = Group)) +
geom_point() +
scale_color_manual(values = population_colors) +
labs(x = "LD1", y = "LD2", title = "PCA Plot") + # Adjust title as needed
theme_linedraw()

Display the plot

print(pca_plot)

Close the GDS file

snpgdsClose(genofile)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant