## Practical 3

## Estimating Chromosomal Sex

In this practical, you will estimate the chromosomal sex of your mystery genome based on the ratio of reads aligning to the Y and X chromosomes, using the approach defined in Skoglund et al. 2013 [1].

It will be up to you to determine exactly how to carry out this analysis, but it can be achieved using a few simple functions from `samtools` [2].

### Getting Started

<b>If you haven't already done so, start an interactive session</b>

- Sign in to https://ood.huit.harvard.edu/ 
- Navigate to `Interactive Apps → Jupyter Lab - 115`
- Launch a Jupyter Lab session with the following parameters:
    - Number of hours: 2
    - Number of CPUs: 2
- When the session is ready, click “Connect to Jupyter”

<b>Create a working directory (called "practical_3") from which you will run commands and in which you will store any files that you generate</b>

<b>Copy these practical instructions to your working directory and open them as a Jupyter Notebook</b>

Then navigate to the practical_3 directory on the sidebar and click on Practical3.ipynb to open it as a Jupyter Notebook

### Part 1) Estimating chromosomal sex

In order to estimate your mystery genome's chromosomal sex using the approach from Skoglund et al 2013 [1], you will need to determine the number of reads that align to the X and Y chromosomes. There are a variety of ways to do this and it is up to you to identify an approach that works for you. Regardless of the approach you choose, you should be sure to filter out any reads that have a mapping quality below 30.
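
Once you have the two read counts, the ratio itself is a short calculation. The sketch below assumes that R<sub>Y</sub> is the fraction of Y-aligned reads among all reads aligned to X or Y, with a normal-approximation 95% confidence interval. The function names, the example counts, and the two thresholds (0.016 and 0.075) are assumptions written from memory of Skoglund et al 2013 [1], and the classification is simplified to three outcomes, so check all of them against the paper before relying on them.

```python
import math


def estimate_ry(n_x, n_y, z=1.96):
    """Return (R_Y, lower, upper) for the given X and Y read counts.

    R_Y = n_y / (n_x + n_y); the interval is R_Y +/- z * standard error,
    using the binomial standard error sqrt(R_Y * (1 - R_Y) / (n_x + n_y)).
    """
    total = n_x + n_y
    if total == 0:
        raise ValueError("no reads aligned to X or Y")
    r_y = n_y / total
    se = math.sqrt(r_y * (1 - r_y) / total)
    return r_y, r_y - z * se, r_y + z * se


def classify(lower, upper, xx_max=0.016, xy_min=0.075):
    """Simplified assignment based on the confidence interval bounds.

    The default thresholds are assumptions recalled from Skoglund et al 2013;
    confirm them in the paper.
    """
    if upper < xx_max:
        return "consistent with XX"
    if lower > xy_min:
        return "consistent with XY"
    return "not assigned"


# Made-up counts for illustration only (not from any real genome).
r_y, lower, upper = estimate_ry(n_x=9000, n_y=1000)
print(f"R_Y = {r_y:.4f} (95% CI {lower:.4f} to {upper:.4f}): {classify(lower, upper)}")
```

Reporting the interval alongside the point estimate makes it easier to judge whether a low read count, rather than the biology, is driving an ambiguous result.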

<i>Hint</i> - There are several functions in samtools that will get you the information that you need. If you aren’t sure where to start, try checking the documentation: https://www.htslib.org/doc/samtools.html 
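
As one possible route (not the only one), `samtools idxstats` prints a tab-separated table with one row per reference sequence: name, length, mapped reads, unmapped reads. The sketch below parses that table in Python. It assumes you have already written a MAPQ-filtered, indexed BAM, since `idxstats` does not filter by mapping quality itself. The file names in the comments, the helper function names, and the read counts in the example text are all hypothetical.

```python
# Assumed preparation steps, run in a terminal (file names are placeholders):
#   samtools view -b -q 30 -o filtered.bam input.bam   # keep reads with MAPQ >= 30
#   samtools index filtered.bam
#   samtools idxstats filtered.bam > idxstats.txt


def parse_idxstats(text):
    """Map each reference name to its mapped-read count.

    Expects the four tab-separated columns that `samtools idxstats` prints:
    reference name, sequence length, mapped reads, unmapped reads.
    """
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        counts[name] = int(mapped)
    return counts


def sex_chrom_counts(counts):
    """Return (n_x, n_y), accepting either 'X'/'Y' or 'chrX'/'chrY' naming."""
    def get(label):
        for name in (label, "chr" + label):
            if name in counts:
                return counts[name]
        raise KeyError(f"chromosome {label} not found in idxstats output")
    return get("X"), get("Y")


# Illustrative text in idxstats format; the read counts are made up.
example = (
    "1\t248956422\t120000\t0\n"
    "X\t156040895\t9000\t0\n"
    "Y\t57227415\t1000\t0\n"
    "*\t0\t0\t500\n"
)
n_x, n_y = sex_chrom_counts(parse_idxstats(example))
print(n_x, n_y)
```

In practice you would read the text from your own output file, for example with `open("idxstats.txt").read()`, and the same dictionary gives you the per-chromosome counts for your results table.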

### <i>* Optional *</i> - Part 2) - Estimate chromosomal sex using the approach from Anastasiadou et al 2024 [3]

If you would like an extra challenge, you can try to estimate chromosomal sex using the approach from Anastasiadou et al 2024 [3]. Check out the methods section of the paper to see how they carried out their analysis. 

<i>Keep in mind</i> - The approach from Anastasiadou et al 2024 tends to over-estimate the frequency of individuals with XXY chromosomal arrangements, so be a bit skeptical if you get this result.

## When you are finished

### Be sure to include the following in your report: 
<b>Methods section</b>: <br>
A description of each of the analyses that you performed. Be sure to:
- Indicate the tool(s) you used to filter and assess your data 
- Describe the approach that you used to calculate R<sub>Y</sub>.
- <i>Optionally, if you implemented the approach from Anastasiadou et al 2024 [3], describe how you carried this out as well</i>

<b>Results section</b>: <br>
Be sure to include the following in your results section:
- A table showing the number of reads that mapped to each chromosome (X and Y at a minimum, but include counts for chromosomes 1-22 if you calculated those).
- The sex chromosome ratio that you calculated (R<sub>Y</sub>) and the threshold that you used to determine chromosomal sex based on this ratio. 
- <i>Optionally, if you implemented the approach from Anastasiadou et al 2024 [3], include a table that shows the output of that analysis (i.e. one that includes N<sub>a</sub>, R<sub>x</sub> and R<sub>y</sub> along with the chromosomal sex estimation).</i>

<b>Conclusion section</b>: <br>
Be sure to address the following, based on the results of your analyses: 
- Based on the sex chromosome ratio (and optionally on the result from Anastasiadou et al 2024), what do you think the chromosomal sex of your individual is?
- Are you confident in this chromosomal sex estimation? Why or why not? For this question, focus solely on whether you think you can accurately estimate chromosomal sex for your mystery genome and on what factors might affect your ability to do so, not on the complexities surrounding determining sex and gender more broadly.


### Additional Questions to answer at the end of your report: 
1) What are some reasons why you do not expect the ratio you calculated to exactly equal either 0 or 0.5? Remember - the reasons are different for each value, so be sure to provide separate explanations for both.



## References

1) Skoglund, Pontus, et al. "Accurate sex identification of ancient human remains using DNA shotgun sequencing." Journal of Archaeological Science 40.12 (2013): 4477-4482.
2) Danecek, Petr, et al. "Twelve years of SAMtools and BCFtools." Gigascience 10.2 (2021): giab008. https://doi.org/10.1093/gigascience/giab008
3) Anastasiadou, Kyriaki, et al. "Detection of chromosomal aneuploidy in ancient genomes." Communications Biology 7.1 (2024): 14.