# Day 2

Today, we will start using nf-core pipelines to find differentially abundant genes in our dataset. 
We are using data from the following paper: https://www.nature.com/articles/s41593-023-01350-3#Sec10

1. Please take some time to read through the paper and understand their approach, hypotheses and goals.

What was the objective of the study?

To understand the effects of long-term opioid therapy and addiction in patients with chronic pain, as well as to explore treatment with an HDAC1/HDAC2 inhibitor in individuals with opioid withdrawal.

What do the conditions mean?

oxy: Oxycodone (Mice treated with oxycodone injections)


sal: Saline (Control group)

What do the genotypes mean?

SNI: Spared nerve injury surgery (removal of 1-2 mm of the sciatic nerve in the hind leg)


Sham: Sham mice underwent the same incision to the hind leg, but the nerve was left untouched (control group)

Imagine you are the bioinformatician in the group who conducted this study. They hand you the raw files and ask you to analyze them.

What would you do?

Which groups would you compare to each other?

Please also mention which outcome you would expect to see from each comparison.

I would use differential expression analysis to compare each group (sham-oxy, sni-sal and sni-oxy) to the control (sham-sal), and potentially also sni-oxy vs sni-sal, for a more direct analysis of the effects of oxycodone on individuals with chronic nerve injury.

From the comparison of sni-sal vs sham-sal mice, I would expect symptoms of prolonged nerve damage, such as inflammatory symptoms. Additionally, I would expect a reduction in social behavior due to chronic pain.

From the comparison of sham-oxy vs sham-sal mice, I would expect to find "regular" withdrawal symptoms, potentially manifesting in transcriptional changes. In addition, I might expect to see a reduction in social behavior during the withdrawal period that returns to normal once the physical withdrawal symptoms recede. The same differences may occur in the comparison between sni-oxy and sni-sal mice, although symptoms may be more pronounced due to the additional burden of chronic nerve injury.

From the comparison of sni-oxy vs sham-sal mice, I would expect a combination of the above symptoms. Potentially, additional symptoms may arise from the combination of conditions. As in the two other states, there might be social withdrawal symptoms.
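The comparisons described above can be written down explicitly as a list of contrasts against the shared control. This is only a minimal sketch: the group labels (`sham_sal`, `sni_oxy`, ...) and the helper `build_contrasts` are hypothetical names chosen for illustration, not identifiers from the paper or from any pipeline.

```python
# Hypothetical group labels: surgery (sham/sni) x treatment (sal/oxy).
GROUPS = ["sham_sal", "sham_oxy", "sni_sal", "sni_oxy"]
CONTROL = "sham_sal"


def build_contrasts(groups, control, extra=()):
    """Return (test, reference) pairs: every group vs. the control, plus extras."""
    contrasts = [(group, control) for group in groups if group != control]
    contrasts.extend(extra)
    return contrasts


# The additional direct comparison: oxycodone effect within injured animals.
contrasts = build_contrasts(GROUPS, CONTROL, extra=[("sni_oxy", "sni_sal")])

for test, reference in contrasts:
    print(f"{test} vs {reference}")
```

Keeping the reference group as the second element of each pair makes the direction of any later fold change unambiguous.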



Your group gave you a very suboptimal Excel sheet (conditions_runs_oxy_project.xlsx) to get the information you need for each run they uploaded to the SRA.<br>
So, instead of directly diving into downloading the data and starting the analysis, you first need to sort the lazy table.<br>
Use Python and Pandas to get the table into a more sensible order.<br>
Then, perform some overview analysis and plot the results
1. How many samples do you have per condition?
2. How many samples do you have per genotype?
3. How often do you have each condition per genotype?

1. 8 samples per condition 
2. 8 samples per genotype 
3. 4 samples per condition per genotype
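One way to arrive at counts like these is to collapse the "x"-marked flag columns into one label per factor and then tabulate. The sketch below uses a toy table rather than the real spreadsheet: the run IDs are made up, and the genotype column names (`Genotype: SNI`, `Genotype: Sham`) are assumptions, since only the condition columns are visible in the printed table.

```python
import pandas as pd

# Toy stand-in for the spreadsheet: run IDs and genotype column names are
# made up for illustration; "x" marks the group a run belongs to.
wide = pd.DataFrame(
    {
        "condition: Sal": ["x", None, "x", None],
        "Condition: Oxy": [None, "x", None, "x"],
        "Genotype: SNI": ["x", "x", None, None],
        "Genotype: Sham": [None, None, "x", "x"],
    },
    index=pd.Index(["RUN_A", "RUN_B", "RUN_C", "RUN_D"], name="Run"),
)

# Collapse each pair of flag columns into one categorical column.
tidy = pd.DataFrame(index=wide.index)
tidy["condition"] = wide["Condition: Oxy"].eq("x").map({True: "oxy", False: "sal"})
tidy["genotype"] = wide["Genotype: SNI"].eq("x").map({True: "SNI", False: "Sham"})

per_condition = tidy["condition"].value_counts()
per_genotype = tidy["genotype"].value_counts()
# Rows are genotypes, columns are conditions, cells are sample counts.
per_group = pd.crosstab(tidy["genotype"], tidy["condition"])

print(per_condition, per_genotype, per_group, sep="\n\n")
```

With matplotlib installed, `per_group.plot(kind="bar")` would turn the cross-tabulation into the requested overview plot.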

In [1]:
import csv
import pandas as pd
import openpyxl

In [2]:
base_counts = pd.read_csv("base_counts.csv", index_col = "Run")
conditions_data = pd.read_excel("conditions_runs_oxy_project.xlsx", index_col="Run")

In [3]:
conditions_data["Bases"] = base_counts.Bases

In [4]:
print(conditions_data)

            Patient RNA-seq  DNA-seq condition: Sal Condition: Oxy  \
Run                                                                  
SRR23195505       ?       x      NaN              x            NaN   
SRR23195506       ?       x      NaN            NaN              x   
SRR23195507       ?       x      NaN              x            NaN   
SRR23195508       ?       x      NaN            NaN              x   
SRR23195509       ?       x      NaN            NaN              x   
SRR23195510       ?       x      NaN              x            NaN   
SRR23195511       ?       x      NaN            NaN              x   
SRR23195512       ?       x      NaN              x            NaN   
SRR23195513       ?       x      NaN              x            NaN   
SRR23195514       ?       x      NaN            NaN              x   
SRR23195515       ?       x      NaN              x            NaN   
SRR23195516       ?       x      NaN            NaN              x   
SRR23195517       ? 

In [5]:
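df = conditions_data.copy()
# Empty cells become False and "x" markers become True
df = df.fillna(False)
df = df.replace("x", True)
# sort_values returns a new DataFrame, so the result has to be assigned
df = df.sort_values(by="Bases")
df.to_csv("conditions.csv")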



They were kind enough to also provide the number of bases per run, so that you know how much space the data will take up on your cluster.<br>
Add a new column to your fancy table with this information (base_counts.csv) and sort your dataframe according to this information and the condition.

Then select the 2 smallest runs from your dataset and download them from SRA (maybe an nf-core pipeline can help here?...)
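Picking the smallest runs and preparing the ID list for the download can be sketched with the standard library alone. The base counts below are made up, the two-column `Run,Bases` layout is an assumption about base_counts.csv, and `smallest_runs` and `write_id_list` are hypothetical helper names. The one-accession-per-line layout is the ID-list format that nf-core/fetchngs takes via `--input`.

```python
import csv
import io

# Made-up base counts; the Run,Bases layout is assumed to match base_counts.csv.
BASE_COUNTS_CSV = """Run,Bases
RUN_A,7000
RUN_B,5200
RUN_C,6100
RUN_D,4800
"""


def smallest_runs(csv_text, n=2):
    """Return the IDs of the n runs with the fewest bases, smallest first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda row: int(row["Bases"]))
    return [row["Run"] for row in rows[:n]]


def write_id_list(run_ids, path):
    """Write one accession per line to the given path."""
    with open(path, "w", newline="") as handle:
        handle.write("\n".join(run_ids) + "\n")


selected = smallest_runs(BASE_COUNTS_CSV)
print(selected)
```

Calling `write_id_list(selected, "ids.csv")` would then produce a file that can be passed to the pipeline as `--input ./ids.csv`.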

In [6]:
#Runs selected: SRR23195516 and SRR2319551

!nextflow run nf-core/fetchngs --input ./ids.csv --outdir ./results -profile docker

Nextflow 25.04.8 is available - Please consider updating your version to it

 N E X T F L O W   ~  version 25.04.7

Launching `https://github.com/nf-core/fetchngs` [big_edison] DSL2 - revision: 8ec2d934f9 [master]

WARN: Access to undefined parameter `monochromeLogs` -- Initialise it to a default value eg. `params.monochromeLogs = some_value`


------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
  nf-core/fetchngs v1.12.0-g8ec2d93
------------------------------------------------------
Core Nextflow options
  r

While your files are downloading, get back to the paper and explain how you would try to reproduce the analysis.<br>
When you are done with this, shout so we can discuss the different ideas.

By reading the Methods section and trying to recreate their exact approach. However, the Methods section is not very detailed or specific.