## Plotting in Python

Matplotlib is a Python 2D plotting library that produces publication-quality figures.

You can generate line plots, histograms, power spectra, bar charts, error charts, scatter plots, and more with just a few lines of code.

See the Matplotlib website for more details: http://matplotlib.org/


-----------------------------------------------------------------------------------------

- Use the following line to display plots inside the Jupyter notebook: ```%matplotlib inline```
- After this, import the ```pyplot``` module from matplotlib:
```python
%matplotlib inline
import matplotlib.pyplot as plt
```
- We can then plot a line by passing a list of x-coordinates and a list of y-coordinates: ```plt.plot([1,2,3,4],[2,4,6,12])```
- Finally, we display the plot with ```plt.show()```

In [None]:
# The following line allows plots to be displayed as part of the Jupyter notebook
%matplotlib inline

# Import matplotlib's pyplot module to make simple plots
import matplotlib.pyplot as plt # Rename the module to plt to avoid typing the long name

plt.plot([1,2,3,4],[2,4,6,12]) 
plt.show()


In [None]:

# To save the plot to a file in the current working directory (usually the notebook's folder), an additional command is needed:

plt.plot([1,2,3,4],[2,4,6,12]) 
plt.savefig('test_plot.png', transparent=False) # Save the figure; call this before plt.show()
plt.show()
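
The save step above can be extended with a few commonly used ```savefig``` options. The sketch below is only an illustration: the file name ```test_plot_hires.png``` and the ```dpi``` value are assumptions chosen for this example, not part of the original exercise. It also checks that the file was actually written.

```python
import os
import matplotlib.pyplot as plt

plt.plot([1, 2, 3, 4], [2, 4, 6, 12])

# Hypothetical output name; a relative path is resolved against the current working directory
out_path = 'test_plot_hires.png'

# dpi sets the resolution; bbox_inches='tight' trims empty margins around the figure
plt.savefig(out_path, dpi=150, bbox_inches='tight')
plt.show()

# Report where the file ended up and whether it exists
print(os.path.abspath(out_path), os.path.exists(out_path))
```

Saving before ```plt.show()``` matters because the figure is typically discarded once it has been shown, so a later ```savefig``` call may write an empty image.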

# Plotting

- There are many other functions you can use to customize the plot
- You can pass the ```label``` keyword argument to the ```plot``` function to name the line; call ```plt.legend()``` to display the label.
    - ```plt.plot([1,2,3,4],[2,4,6,12],label='Series1')```
- You can also give labels to the x and y axis
```python
plt.xlabel("Time")
plt.ylabel("Fluorescence")
```
- You can name the plot with a title
```python
plt.title("Plot")
```
### Check the link for plotting: http://matplotlib.org/users/pyplot_tutorial.html

In [None]:
# Import matplotlib's pyplot module to make simple plots
%matplotlib inline 
import matplotlib.pyplot as plt

# Plot a line
plt.plot([1,2,3,4],[2,4,6,12],label='Series1') 

# Show the label in a legend (without this call the label is not displayed)
plt.legend()

# Add Label for the X axis 
plt.xlabel('X') 

# Add Label for the Y axis 
plt.ylabel('Y') 

# Add Title to the plot 
plt.title('Test Plot') 

plt.show()

In [None]:
# Import matplotlib's pyplot module to make simple plots

import matplotlib.pyplot as plt

# Plot a line
plt.plot([1,2,3,4],[2,4,6,12],label='S1', marker='x', linestyle='-', color='b') 
# You can change several parameters, such as the marker, line style, and color.
plt.plot([1,2,3,4],[1,4,9,16],label='S2', marker='o', linestyle='--', color='r')
plt.plot([1,2,3,4],[3,0,20,9],label='S3', marker='.', linestyle='-.', color='g')
plt.legend()

# Add Label for the X axis 
plt.xlabel('A-axis') 

# Add Label for the Y axis 
plt.ylabel('B-axis') 

# Add Title to the plot 
plt.title('Nice Figure') 

plt.show()
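
The marker, line style, and color can also be given together in a short format string, which Matplotlib accepts as an optional positional argument after the y values. The sketch below compares the keyword form used above with the shorthand form; the variable names ```line_kw``` and ```line_fmt``` are made up for this example.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

x = [1, 2, 3, 4]
y = [2, 4, 6, 12]

# Keyword form, as used in the cell above
line_kw, = plt.plot(x, y, marker='x', linestyle='-', color='b', label='keywords')

# Shorthand form: 'b' = blue, 'x' = x marker, '-' = solid line
line_fmt, = plt.plot(x, [v + 1 for v in y], 'bx-', label='format string')

# Both lines end up with the same marker, line style, and color
print(line_kw.get_marker() == line_fmt.get_marker())
print(line_kw.get_linestyle() == line_fmt.get_linestyle())
print(to_rgba(line_kw.get_color()) == to_rgba(line_fmt.get_color()))

plt.legend()
plt.show()
```

The keyword form is longer but easier to read, which is probably why it is a good default while learning.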

## Line Plots: Exercise

1. Import the ```matplotlib.pyplot``` module
2. Import the ```math``` module
3. Create three new empty lists
    - List for x values
    - List for sin(x) values
    - List for cos(x) values
4. Create a for loop creating 60 data points with 0.1 difference between the values (e.g. 0.1, 0.2, 0.3, 0.4 etc).
5. For each point, add the value to the x list, the sin value to the sin list, and the cosine value to the cos list
6. Plot the x vs cos line and the x vs sin line using the three lists

In [None]:
# 1. Import matplotlib's pyplot module to make simple plots
import matplotlib.pyplot as plt
# 2. Import the math module to access the sine and cosine functions
import math

In [None]:
#3. Create three new empty lists to store x values, sin(x) values and cos(x) values
x_list=[] # Empty list to store x axis values
cos_y=[] # Empty list to store cosine values
sin_y=[] # Empty list to store sine values

In [None]:
#4. Create a for loop creating 60 data points with 0.1 difference between the values (e.g. 0.1, 0.2, 0.3, 0.4 etc). 
x = 0
x_list = []
for i in range(60):
    x += 0.1
    x_list += [x]
print (x_list)

In [None]:
#5 For each point, add the value to the x list, the sin value to the sin list and the cosine value to the cos list
cos_y = []
for i in x_list:
    cos_y += [math.cos(i)] 

sin_y = []
for i in x_list:
    sin_y += [math.sin(i)] 
print (sin_y)

In [None]:
#6. Plot the x vs cos line and the x vs sin line using the three lists
plt.plot(x_list, cos_y, label='cos', marker='s', color='b') 
plt.plot(x_list, sin_y, label='sin', marker='o', linestyle='--', color='r') 
plt.legend() 

plt.xlabel('x') 
plt.ylabel('sin(x), cos(x)') 
plt.title('Sin-Cos Plot') 

plt.show()
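
As a more compact variant of steps 3 to 5, the three lists can be built with list comprehensions. This is a sketch of one possible alternative, not the intended exercise solution. It computes each x value from the loop index instead of repeatedly adding 0.1.

```python
import math

# x = 0.1, 0.2, ..., 6.0; computing each value from the index avoids
# the rounding error that builds up when 0.1 is added repeatedly
x_list = [(i + 1) * 0.1 for i in range(60)]
sin_y = [math.sin(x) for x in x_list]
cos_y = [math.cos(x) for x in x_list]

print(len(x_list), x_list[0], round(x_list[-1], 10))
```

These lists can be passed to ```plt.plot``` exactly as in step 6.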

# Histograms
- Besides lines, we can also create histograms with matplotlib
- A histogram is created with ```plt.hist([1,2,3,4,5,5,5,5,6,7])``` or ```plt.hist([[1,2,2,2,3,3,3,4,5,6,7],[5,6,5,6,5,6,7,8,2]])```
- The input is a single list (or array) of data, or a list of lists to plot several data sets in one panel
- You can vary the number of bins with the ```bins``` argument and see how the shape of the histogram changes
- You can use the same labeling functions as for line plots:
    - ```python
    plt.xlabel("X")
    plt.ylabel("Y")
    plt.title("Normal distribution")
    ```
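
To see what changing the number of bins does, it can help to look at the values that ```plt.hist``` returns: the count in each bin, the bin edges, and the drawn bars. The sketch below draws the same data with two bin settings. The seed, the bin numbers, and the ```results``` dictionary are arbitrary choices made for this example.

```python
import random
import matplotlib.pyplot as plt

random.seed(1)  # fixed seed so the figure is reproducible (arbitrary choice)
data = [random.normalvariate(10, 3) for _ in range(500)]

results = {}
for n_bins in (5, 30):
    plt.figure()
    # plt.hist returns the count per bin, the bin edges, and the drawn bars
    counts, edges, patches = plt.hist(data, bins=n_bins)
    plt.title('bins = ' + str(n_bins))
    results[n_bins] = (counts, edges)
    # Every value falls in exactly one bin, and n bins have n + 1 edges
    print(n_bins, int(sum(counts)), len(edges))

plt.show()
```

With few bins the histogram is coarse; with many bins each bar holds only a few values and the shape becomes noisy.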

In [None]:
import matplotlib.pyplot as plt
import random 

#Create an empty list to store data 
data=[] 

#Generate 500 data values 
for i in range(500): 
    data += [random.normalvariate(10, 3)] 

#Plot a histogram 
plt.hist(data,bins=30) 

#Set X and Y labels and plot title 
plt.xlabel('Values') 
plt.ylabel('Frequency') 
plt.title('Normal Distribution') 

plt.show()

In [None]:
#Create three empty lists to store data 
set1 = []
set2 = []
set3 = []


#Generate 3 sets with 1000 data values 
for i in range(1000): 
    set1 += [random.normalvariate(10, 3)] 
    set2 += [random.normalvariate(11, 1)]   
    set3 += [random.normalvariate(2, 7)] 
    
# When we pass a list of lists, plt.hist plots the three histograms side by side in one panel
moredata=[set1,set2,set3] 

plt.hist(moredata,bins=100) # Try changing bins to a different number
plt.legend(["1","2","3"])
plt.show()

  
## Plotting: Exercise 2 (Assignment type)

1. Import the random and dna_tools modules
2. Create 2 new variables with random DNA nucleotides of length 500.
    - Hint: use the random module and a for loop
3. Use your own dna_tools module to count nucleotide usage (A, T, G, and C) in the two sequences.
4. Make a line plot to display the nucleotide usage.
5. Use different markers and labels for the two different sequences.


In [None]:
#1. Import the random and dna_tools module
import random
from dna_tools import get_counts
import matplotlib.pyplot as plt


In [None]:
#2. Create 2 new variables with random DNA nucleotides of length 500.
random_seq1 = '' 
random_seq2 = '' 

for i in range(500): 
    random_seq1 += random.choice('ATGC') 
    random_seq2 += random.choice('ATGC') 

In [None]:
#3. Use your own dna_tools module to count nucleotide usage (A,T,G, and C) in random sequence 1 and random sequence 2
# also print the result
# get_counts is repeated here for reference; this definition replaces the one imported from dna_tools
def get_counts(sequence):
    nucleotide_counts = {}
    for nucleotide in sequence:
        if nucleotide in nucleotide_counts:
            nucleotide_counts[nucleotide] += 1
        else:
            nucleotide_counts[nucleotide] = 1
    return nucleotide_counts

random_seq1_counts = get_counts(random_seq1)
random_seq2_counts = get_counts(random_seq2)

basepairs = list(random_seq1_counts.keys()) # extract the keys and convert them to a list
basepairs_sorted = sorted(basepairs) # sort the list so the order will always be A, C, G, T
rand1_num = []  # empty list for the random sequence 1 counts
rand2_num = []  # empty list for the random sequence 2 counts
for bp in basepairs_sorted: # loop over the nucleotide letters
    rand1_num += [random_seq1_counts[bp]] # get counts from dictionary
    rand2_num += [random_seq2_counts[bp]]

# Print the sorted letters so that they line up with the counts
print (basepairs_sorted, rand1_num, random_seq1_counts)
print (basepairs_sorted, rand2_num)
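
As an alternative sketch for the counting step, the standard library's ```collections.Counter``` can do the same job as a hand-written counting function. This is not the course's ```dna_tools``` module. The names ```example_seq```, ```example_counts```, ```example_num```, and ```base_order``` are hypothetical and used only here.

```python
import random
from collections import Counter

random.seed(0)  # arbitrary seed, only to make the example reproducible
example_seq = ''.join(random.choice('ATGC') for _ in range(500))

base_order = 'ACGT'  # fixed order used for both counting and plotting
example_counts = Counter(example_seq)

# Counter returns 0 for a missing key, so a base that never occurs does not raise KeyError
example_num = [example_counts[base] for base in base_order]

print(list(base_order), example_num)
```

Using a fixed ```base_order``` instead of the dictionary keys means the counts always line up with the tick labels, even if a base is absent from one sequence.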

In [None]:
#4. Show the nucleotide usage in a line plot
# do the values match those in Q3?


In [None]:
#5. Use different markers and labels for the two different sequences.
plt.xticks([1,2,3,4],basepairs_sorted) 
plt.plot([1,2,3,4], list(random_seq1_counts.values()),linestyle = "dashed", marker='s', label='seq_1dict_not_ok') # Wrong way of plotting: dict values come in insertion order, which need not be A, C, G, T
plt.plot([1,2,3,4], list(random_seq2_counts.values()),linestyle = "dashed", marker='o', label='seq_2dict_not_ok') # Also wrong; compare with the two lines below
plt.plot([1,2,3,4], rand1_num, marker='s', label='seq_1') # Correct: counts follow the sorted order A, C, G, T
plt.plot([1,2,3,4], rand2_num, marker='o', label='seq_2') # Correct: same sorted order
plt.legend()
plt.show()

In [None]:
# Maybe a bar plot is a better way of visualizing this data
plt.xticks([1,2,3,4],basepairs_sorted) # use the sorted letters so the ticks match the order of the counts
plt.bar(x = [0.8,1.8,2.8,3.8], height = rand1_num, width = [0.4]*4, label='seq_1',edgecolor ="magenta",linewidth = 2)
plt.bar(x = [1.2,2.2,3.2,4.2], height = rand2_num, width = [0.4]*4 ,label='seq_2')

plt.legend()
plt.show()
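
The bar positions in the cell above are typed in by hand. A possible sketch for computing them from the bar width is shown below. The counts are made-up illustration values, not results from the exercise, and the names ```centres```, ```left```, and ```right``` are hypothetical.

```python
import matplotlib.pyplot as plt

bases = ['A', 'C', 'G', 'T']
seq1_counts = [130, 120, 128, 122]  # made-up example counts
seq2_counts = [118, 131, 125, 126]  # made-up example counts

width = 0.4
centres = list(range(1, len(bases) + 1))    # tick positions 1, 2, 3, 4
left = [c - width / 2 for c in centres]     # bars for sequence 1
right = [c + width / 2 for c in centres]    # bars for sequence 2

plt.bar(left, seq1_counts, width=width, label='seq_1')
plt.bar(right, seq2_counts, width=width, label='seq_2')
plt.xticks(centres, bases)
plt.legend()
plt.show()
```

Shifting each group by half the bar width keeps the two bars touching and centered on the tick, and the layout adapts if the width is changed.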


## Plotting - Exercise 3 (Assignment type)
1. Import the random and dna_tools modules
2. Generate 100 random DNA sequences of length 500.
    - Store these sequences in a list
3. Plot a histogram for 'A' nucleotide usage in the 100 random DNA sequences.
4. Add histograms of other nucleotide usage in the same histogram.

In [None]:
#1. Import the random and dna_tools modules
import random
from dna_tools import get_counts
import matplotlib.pyplot as plt

In [None]:
#2. Generate 100 random DNA sequences of length 500. Store these sequences in a list
seqs = [None]*100 

for n in range(100):
    seqs[n] = ''
    for i in range(500):
        seqs[n] += random.choice('ATGC')

In [None]:
#3. Plot a histogram for 'A' nucleotide usage in the 100 random DNA sequences. 


In [None]:
#4. Add histograms of other nucleotide usage in the same histogram.
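labels = ['A','C','G','T'] 
colors = ['r','g','b','y'] 
counts = [[],[],[],[]]  # A list of lists lets matplotlib automatically draw the histograms side by side

# Use a for loop to count bases in each sequence
for seq in seqs:
    counts[0].append(seq.count('A'))
    counts[1].append(seq.count('C'))
    counts[2].append(seq.count('G'))
    counts[3].append(seq.count('T'))

# Select a color-blind-friendly style before plotting; a style only affects plots drawn after it is set.
# Since Matplotlib 3.6 this style is named 'seaborn-v0_8-colorblind'; on older versions use 'seaborn-colorblind'.
plt.style.use('seaborn-v0_8-colorblind')

# Plot the results in the form of a histogram
# The labels must be in the same order as the lists in counts (A, C, G, T)
plt.hist(counts, bins=20, label=labels) 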
plt.xlabel('Values') 
plt.ylabel('Frequency') 
plt.title('Nucleotide Counts') 
plt.legend(loc='upper right')
plt.show()

In [None]:
# Alternative

seqs = []
for n in range(100):  # outer loop: one iteration per sequence
    random_seq = ''
    for i in range(500):  # inner loop: one iteration per nucleotide
        random_seq += random.choice("ATGC")
    seqs.append(random_seq)

labels = ['A','C','G','T'] 
colors = ['r','g','b','y'] 

A = []
G = []
T = []
C = []

for seq in seqs:
    counts = get_counts(seq,1)
    A.append(counts['A'])
    G.append(counts['G'])
    T.append(counts['T'])
    C.append(counts['C'])
print (A)
plt.figure()
plt.hist(A, bins=100)

plt.figure()
plt.hist([A,C,G,T], bins=15,label=labels, color=colors)
plt.legend()
plt.show()
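
To judge whether the histogram of 'A' counts looks reasonable, the counts can be compared with what is expected for random sequences. Each position is 'A' with probability 1/4, so the count per sequence follows a binomial distribution with n = 500 and p = 0.25. The sketch below is self-contained and generates its own sequences with an arbitrary seed; the names ```a_counts```, ```expected_mean```, and ```expected_sd``` are introduced only for this example.

```python
import math
import random
import statistics

random.seed(0)  # arbitrary seed for reproducibility
n_seqs, seq_len = 100, 500

a_counts = []
for _ in range(n_seqs):
    seq = ''.join(random.choice('ATGC') for _ in range(seq_len))
    a_counts.append(seq.count('A'))

# Binomial expectation with n = 500 and p = 1/4
expected_mean = seq_len * 0.25
expected_sd = math.sqrt(seq_len * 0.25 * 0.75)

print('expected mean and sd:', expected_mean, round(expected_sd, 2))
print('observed mean and sd:', statistics.mean(a_counts), round(statistics.stdev(a_counts), 2))
```

If the observed values are far from the expected ones, that usually points to a bug in the sequence generation or the counting step.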
