# Jupyter notebook for Unique/Common pulls
This notebook walks you through how to pull out the genes that are common to, or unique to, specific conditions. 
*One note: watch out for paths! I used the location where my files are stored; adjust to yours as needed.*

## Import libraries
The first step is to import the libraries needed to perform the analysis.

In [None]:
import csv, os

## Comparing the samples
The first step will be to compare the samples and identify what is common and unique to each sample. 

### Define functions
You define functions for things you will reuse downstream in the script. Alternatively, you could put them in a separate Python file and import them for use. 

#### Counter function
This function parses a CSV file and pulls out just the list of genes that are significantly differentially expressed (adjusted p-value, `padj`, below 0.05). You can tweak it to include other filter parameters as well, such as log2FC > 1. 

In [None]:
def counter(item):
    genes=[] #Empty list to populate later
    with open(item) as chart:
        reader=csv.DictReader(chart, delimiter=',') #Open the CSV file as a dictionary to parse the columns
        for row in reader: #Go through each row
            if str(row['padj']) != 'NA': #If it has an adjusted p-value
                if float(row['padj']) <0.05: #If it's significant. You can also use, for example: if float(row['padj']) < 0.05 and float(row['log2fc']) > 1:
                    genes.append(row['GeneID']) #Add that significant gene to the list
    #No chart.close() is needed here: the with block closes the file automatically
    return genes #Return the list of significant genes. 
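
The extra filter mentioned above could be sketched roughly as follows. This is only an illustration, not part of the original workflow: `counter_filtered`, its `padj_cutoff` and `min_log2fc` parameters, and the toy rows are all made up for the example, and it assumes your file uses the same `GeneID`, `log2fc` and `padj` column names as `counter`.

```python
import csv
import os
import tempfile

def counter_filtered(path, padj_cutoff=0.05, min_log2fc=None):
    """Hypothetical variant of counter() with an optional fold-change filter."""
    genes = []
    with open(path, newline='') as chart:
        reader = csv.DictReader(chart)
        for row in reader:
            if row['padj'] in ('NA', ''):  # skip genes without an adjusted p-value
                continue
            if float(row['padj']) >= padj_cutoff:  # not significant
                continue
            if min_log2fc is not None and float(row['log2fc']) <= min_log2fc:
                continue  # significant, but the fold change is too small
            genes.append(row['GeneID'])
    return genes

# Made-up rows, only to show the behaviour
rows = [
    {'GeneID': 'geneA', 'log2fc': '2.5', 'padj': '0.001'},
    {'GeneID': 'geneB', 'log2fc': '0.4', 'padj': '0.01'},
    {'GeneID': 'geneC', 'log2fc': '3.0', 'padj': 'NA'},
    {'GeneID': 'geneD', 'log2fc': '1.8', 'padj': '0.2'},
]
tmp_path = os.path.join(tempfile.mkdtemp(), 'toy_sample.csv')
with open(tmp_path, 'w', newline='') as handle:
    writer = csv.DictWriter(handle, fieldnames=['GeneID', 'log2fc', 'padj'])
    writer.writeheader()
    writer.writerows(rows)

padj_only = counter_filtered(tmp_path)
padj_and_fc = counter_filtered(tmp_path, min_log2fc=1)
print(padj_only)    # significant by padj alone
print(padj_and_fc)  # significant and log2fc > 1
```

Note that `log2fc > 1` keeps only up-regulated genes; if you also want strongly down-regulated ones, you would compare the absolute value instead.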

#### Crossing functions
These functions will allow you to cross different gene lists to see what is common between them. 

In [None]:
def crosser2(itemlist):
    common2=[]#Empty list to populate later
    for gene in itemlist[0]: #For each gene that is in the first list
        if gene in itemlist[1]: #If it is in the second list
            common2.append(gene) #Add it to our originally empty list as a common gene!
    return common2 #Returns the list of common genes
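
As a side note, the same comparison can be written with a Python `set`, which makes the "is it in the second list" check much faster on long gene lists. The sketch below is an alternative, not a replacement for `crosser2`; the name `crosser2_sets` and the toy gene names are made up for illustration.

```python
def crosser2_sets(itemlist):
    """Hypothetical set-based variant of crosser2()."""
    second = set(itemlist[1])  # membership checks on a set are fast
    # Walk the first list so the result keeps that list's order
    return [gene for gene in itemlist[0] if gene in second]

list_a = ['geneA', 'geneB', 'geneC']
list_b = ['geneC', 'geneA', 'geneD']
common_ab = crosser2_sets([list_a, list_b])
print(common_ab)
```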

#### Unique functions
These functions will allow you to cross different gene lists to see what is unique for each one. 

In [None]:
def unique2(itemlist):
    unique1=[] #Empty list to populate later
    unique2=[] #Empty list to populate later
    for gene in itemlist[0]: #For each gene in the first list
        if gene not in itemlist[1]: #If it is not also in the second list
            unique1.append(gene) #Awesome, it's unique to the first list, so add it to our originally empty list1
    for gene in itemlist[1]: #For each gene in the second list
        if gene not in itemlist[0]:#If it is not also in the first list
            unique2.append(gene) #Awesome, it's unique to the second list, so add it to our originally empty list2
    return unique1, unique2 #Return both lists of unique genes. 
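
Here is a small sketch of the same idea using sets for the membership checks, plus tuple unpacking so each returned list gets its own name. `unique2_sets` and the toy gene names are hypothetical and only there to show the pattern.

```python
def unique2_sets(itemlist):
    """Hypothetical set-based variant of unique2()."""
    first, second = set(itemlist[0]), set(itemlist[1])
    only_first = [gene for gene in itemlist[0] if gene not in second]
    only_second = [gene for gene in itemlist[1] if gene not in first]
    return only_first, only_second

list_a = ['geneA', 'geneB', 'geneC']
list_b = ['geneC', 'geneA', 'geneD']
# The function returns two lists, so unpack them into two names
only_a, only_b = unique2_sets([list_a, list_b])
print(only_a)
print(only_b)
```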

##### Homework section!
In the box below, write up a function for crosser3 and unique3 to look between three samples at a time. 

In [None]:
def crosser3():
    pass
def unique3():
    pass

##### Scalable solution
Bit of a spoiler, but this is a crosser designed to find both common and unique genes in one shot. It requires a bit more finesse but is a more scalable solution for later. This section is just an FYI.

In [None]:
def crosser(itemlist):
    commons=[]
    dictionary_of_uniques={}
    for i in range(0, len(itemlist)):
        dictionary_of_uniques['unique'+str(i+1)]=[] #Keys are unique1, unique2, ... matching the keys used when appending below
    
    flat_list=[] #List of all genes
    for i in range(0,len(itemlist)):
        for gene in itemlist[i][0]:
            flat_list.append(gene)
    
    for i in range(0,len(itemlist)):
        for gene in itemlist[i][0]:
            num=flat_list.count(gene)
            if num==1:
                dictionary_of_uniques[('unique'+str(i+1))].append(gene)
                
    for gene in itemlist[0][0]:
        num=flat_list.count(gene)
        if num==len(itemlist):
            commons.append(gene)
            
    return commons, dictionary_of_uniques
'''
Of note for this, I restructured to be able to keep order of samples. Each item of itemlist has:
[sample_counted,'Sample Name']
'''
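
To make the `[sample_counted, 'Sample Name']` structure concrete, here is a compact sketch of the same counting idea using `collections.Counter`. It is not the function above: `crosser_counts` is a made-up name, it keys the unique genes by sample name instead of `unique1`, `unique2`, and so on, and the gene lists are toy data.

```python
from collections import Counter

def crosser_counts(itemlist):
    """Hypothetical Counter-based take on the scalable crosser."""
    # Count in how many samples each gene appears
    # (set() guards against an ID repeated within one sample)
    counts = Counter(gene for genes, _name in itemlist for gene in set(genes))
    # Common genes appear in every sample
    commons = [gene for gene in itemlist[0][0] if counts[gene] == len(itemlist)]
    # Unique genes appear in exactly one sample
    uniques = {name: [gene for gene in genes if counts[gene] == 1]
               for genes, name in itemlist}
    return commons, uniques

samples = [
    [['geneA', 'geneB', 'geneC'], 'Sample 3'],
    [['geneA', 'geneC', 'geneD'], 'Sample I'],
    [['geneA', 'geneE'], 'Combo I3'],
]
commons, uniques = crosser_counts(samples)
print(commons)
print(uniques)
```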

### Call your functions
This is where you would start calling your functions to do the work. 
#### Counters
First you must generate the list of significant genes for each sample, as compared to the controls.

In [None]:
#lI will be Sample I,  l3 will be Sample 3, lI3 will be a combination treatment of I and 3. 
#The "l" standing for list of whichever sample
l3= counter('sample3.csv')
print(len(l3))
#This calls just the counter function to pull the list of significant genes in the sample
lI=counter('sampleI.csv')
print(len(lI))
lI3=counter('comboI3.csv')
print(len(lI3))

#### Cross the samples
Let's start by looking at what is common between Sample 3 and Sample I.

In [None]:
common_l3_lI= crosser2([l3,lI])
#This calls the crosser2 function established above to compare the two lists 
#and pull out the gene names that are in both!
print(common_l3_lI)
print(len(common_l3_lI))

##### Homework section!
After you've built a crosser3, cross all three samples in the box below

In [None]:
#common_all=crosser3...

#### Get the unique for each sample
After finding what is common, let's find what is unique to each one! This is very important biologically, and in this case will help us build the story.

In [None]:
unique_l3_lI=unique2([l3,lI])
#This calls the unique2 function established above. 
#Remember: this function returns 2 different lists, so you need to index them accordingly
print(unique_l3_lI[0]) #gets the first item, in this case l3
print(len(unique_l3_lI[0]))
print(unique_l3_lI[1]) #gets the second item, in this case lI
print(len(unique_l3_lI[1]))

##### Homework section!
After you've built a unique3, compare all three samples in the box below

In [None]:
#unique_of_all_l3=unique3...

## Filtering the samples based on comparisons
Now that we know what is common and unique to each sample, let's filter our original files!

### Define new functions
Since we will be doing something new, we are going to need some new functions. 

#### Filtering function
This function takes a list of genes and the file to be filtered. It then searches through the file for those genes and writes a new file containing just those genes. 

In [None]:
def filterer(genelist_for_filtering, original_file, new_file_name):
    with open(original_file) as chart: #first we need to open the file
        reader=csv.DictReader(chart, delimiter=',')
        headers=[]#create an empty list for headers. We need to pull them from the original file first
        heads=reader.fieldnames#pulls the header names
        for item in heads: 
            headers.append(item)#adds each one to the headers list. You may ask why I did it this way. 
            #So if we need to add extra columns while we filter, all we have to do is add another line
        #like : headers.append('NEW COLUMN NAME') for later use
        
        with open(new_file_name, 'w', newline='\n') as output: #This initializes the new file we will be writing to
            wr=csv.writer(output, quoting=csv.QUOTE_ALL, delimiter=',')#and sets up the writer. 
            wr.writerow(headers)#This writes the headers that we built before
            for row in reader:#This is the original file still
                writelist=[]#An empty list again... I must really like those
                for gene in genelist_for_filtering:# for each gene that is in our filter list
                    if row['GeneID']==gene:# if it matches with the row in question. AKA its a gene we want. The column name must match the one used in counter and in your file header
                        for field in reader.fieldnames:#This is like doing row['GeneID'] etc for each column name
                            writelist.append(str(row[field]))#And add it to our writing list
                            #If we wanted to add anything else per row we can do it here
                            #writelist.append(new_column_data)
                            wr.writerow(writelist)#Now we write that row
                    else:#If the gene isnt the same as the geneID
                        pass#To go back to the loop
            output.close()
        chart.close()

*Note:* This won't work quite the way you want it to. There is a specific error in it. See if you can find it in the output files and fix it. 
Hint: It involves writing out your rows. 

### Call the functions
Now let's call the filtering function to create a new file that contains just the genes unique to l3.

In [None]:
filterer(unique_l3_lI[0], 'sample3.csv','Unique_3.csv')
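
Whenever you write a filtered file, it is worth checking its shape before moving on. The helper below is a hypothetical sketch (the name `check_filtered_file` and the toy file are made up): it compares the number of data rows with the number of genes you asked for and counts rows whose width differs from the header. Here it is run on a small, correctly written toy file so you can see what a clean report looks like; you could point it at your own output file instead.

```python
import csv
import os
import tempfile

def check_filtered_file(path, expected_genes):
    """Hypothetical helper: report basic shape problems in a filtered CSV."""
    with open(path, newline='') as handle:
        rows = list(csv.reader(handle))
    header, data = rows[0], rows[1:]
    return {
        'expected_rows': len(set(expected_genes)),
        'rows_written': len(data),
        'rows_with_wrong_width': sum(1 for row in data if len(row) != len(header)),
    }

# Toy, correctly written file just to show a clean report
toy_path = os.path.join(tempfile.mkdtemp(), 'toy_filtered.csv')
with open(toy_path, 'w', newline='') as handle:
    writer = csv.writer(handle, quoting=csv.QUOTE_ALL)
    writer.writerow(['GeneID', 'log2fc', 'padj'])
    writer.writerow(['geneA', '2.5', '0.001'])
    writer.writerow(['geneB', '1.2', '0.01'])

report = check_filtered_file(toy_path, ['geneA', 'geneB'])
print(report)
```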

##### Homework section!
Filter the sheets based on what is biologically relevant and important for the next step of analysis.

In [None]:
#filterer()

## Bonus section!
A huge step in many analyses is visualization. So here is a challenge. 
Using the lists above, make a Venn diagram. To start:

In [None]:
import matplotlib.pyplot as plt
from matplotlib_venn import venn3

##Make a plot!