## Counting DNA Nucleotides

### Problem
A string is simply an ordered collection of symbols selected from some alphabet and formed into a word; the length of a string is the number of symbols that it contains.

An example of a length 21 DNA string (whose alphabet contains the symbols 'A', 'C', 'G', and 'T') is "ATGCTTCAGAAAGGTCTTACG".

**Given: A DNA string s of length at most 1000 nt**

**Return: Four integers (separated by spaces) counting the respective number of times that the symbols 'A', 'C', 'G', and 'T' occur in s**

##### Sample Dataset
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC

##### Sample Output
20 12 17 21

#### Solution

In [16]:
codes="AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"
code_dict={'A':0,'C':0,'G':0,'T':0} #initialize the dictionary
for i in codes:
    code_dict[i]+=1 #calculate the frequency of each of the nucleotides

for values in code_dict.values():
    print(values,end=' ') #print the nucleotides in a single line

20 12 17 21 
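
The same tally can also be sketched with `collections.Counter` from the standard library, which removes the need to pre-initialize the dictionary. The helper name `count_nucleotides` is hypothetical and introduced only for this illustration; the sketch assumes the input contains only 'A', 'C', 'G', and 'T'.

```python
from collections import Counter

def count_nucleotides(s):
    # count_nucleotides is a hypothetical helper name, not part of the original notebook
    counts = Counter(s)  # maps each symbol to the number of times it occurs
    # a Counter returns 0 for missing keys, so absent nucleotides are reported as 0
    return [counts[base] for base in "ACGT"]

codes = "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"
print(*count_nucleotides(codes))  # counts in the order A, C, G, T
```

Reading the counts in a fixed "ACGT" order means the output does not depend on the order in which symbols first appear in the string.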

## Transcribing DNA into RNA

### Problem
An RNA string is a string formed from the alphabet containing 'A', 'C', 'G', and 'U'.

Given a DNA string t corresponding to a coding strand, its transcribed RNA string u is formed by replacing all occurrences of 'T' in t with 'U' in u.

**Given: A DNA string t having length at most 1000 nt**

**Return: The transcribed RNA string of t.**

##### Sample Dataset
GATGGAACTTGACTACGTAAATT

##### Sample Output
GAUGGAACUUGACUACGUAAAUU

#### Solution

In [47]:
dna="GATGGAACTTGACTACGTAAATT"
rna=dna.replace('T','U') #str.replace substitutes every 'T' in the whole string in one call
print(rna)

GAUGGAACUUGACUACGUAAAUU


## Complementing a Strand of DNA

### Problem 
In DNA strings, symbols 'A' and 'T' are complements of each other, as are 'C' and 'G'.

The reverse complement of a DNA string s is the string sc formed by reversing the symbols of s, then taking the complement of each symbol (e.g., the reverse complement of "GTCA" is "TGAC").

**Given: A DNA string s of length at most 1000 bp.**

**Return: The reverse complement sc of s.**

##### Sample Dataset
AAAACCCGGT

##### Sample Output
ACCGGGTTTT

#### Solution

In [46]:
sample = "AAAACCCGGT"
complement_template = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}

#the list comprehension below reverses the sample, and returns the complement of each nucleotide
reverse_complement = ''.join([complement_template[code] for code in reversed(sample)])

print(reverse_complement)

ACCGGGTTTT
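
As an alternative sketch, the complement step can be expressed with `str.maketrans` and `str.translate`, which map every symbol in a single pass. The names `COMPLEMENT_TABLE` and `reverse_complement_of` are hypothetical and used only for this illustration; symbols outside 'A', 'C', 'G', 'T' would pass through unchanged.

```python
# translation table mapping each nucleotide to its complement
# (COMPLEMENT_TABLE is a hypothetical name used only in this sketch)
COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")

def reverse_complement_of(s):
    # reverse_complement_of is a hypothetical helper name
    # reverse with slicing, then complement every symbol via the table
    return s[::-1].translate(COMPLEMENT_TABLE)

print(reverse_complement_of("AAAACCCGGT"))
print(reverse_complement_of("GTCA"))
```

Unlike the dictionary lookup, this version does not raise a `KeyError` on unexpected symbols, which may or may not be the behavior you want.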


## Computing GC Content
### Problem
The GC-content of a DNA string is given by the percentage of symbols in the string that are 'C' or 'G'. For example, the GC-content of "AGCTATAG" is 37.5%. Note that the reverse complement of any DNA string has the same GC-content.
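
A minimal sketch of this definition, checking the "AGCTATAG" example and the reverse-complement property. The function names `gc_content` and `reverse_complement_of` are hypothetical, and the sketch assumes a non-empty string over 'A', 'C', 'G', 'T'.

```python
def gc_content(s):
    # percentage of symbols in s that are 'C' or 'G' (hypothetical helper name)
    return 100 * (s.count("C") + s.count("G")) / len(s)

def reverse_complement_of(s):
    # reverse the string, then swap A<->T and C<->G (hypothetical helper name)
    return s[::-1].translate(str.maketrans("ACGT", "TGCA"))

example = "AGCTATAG"
print(gc_content(example))  # 37.5
# complementing swaps each 'C' with a 'G', so the combined count is unchanged
print(gc_content(example) == gc_content(reverse_complement_of(example)))  # True
```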

DNA strings must be labeled when they are consolidated into a database. A commonly used method of string labeling is called FASTA format. In this format, the string is introduced by a line that begins with '>', followed by some labeling information. Subsequent lines contain the string itself; the next line that begins with '>' marks the label of the following string.

In Rosalind's implementation, a string in FASTA format will be labeled by the ID "Rosalind_xxxx", where "xxxx" denotes a four-digit code between 0000 and 9999.
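
The format described above can be read with a few lines of Python. This is a minimal sketch, not a full FASTA parser: `parse_fasta` is a hypothetical helper name, and it assumes that everything after '>' on a header line is the label and that sequence lines contain no embedded whitespace.

```python
def parse_fasta(text):
    # parse_fasta is a hypothetical helper; it returns a dict {label: sequence}
    records = {}
    label = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            label = line[1:]  # the text after '>' is taken as the label
            records[label] = ""
        elif label is not None:
            records[label] += line  # sequence lines are concatenated

    return records

fasta_text = """>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT
"""

records = parse_fasta(fasta_text)
for label, sequence in records.items():
    print(label, len(sequence))
```

Building the dictionary this way avoids pasting each sequence by hand, which becomes error-prone once a sequence spans several lines.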

**Given: At most 10 DNA strings in FASTA format (of length at most 1 kbp each).**

**Return: The ID of the string having the highest GC-content, followed by the GC-content of that string. Rosalind allows for a default error of 0.001 in all decimal answers unless otherwise stated; please see the note on absolute error below.**

##### Sample Dataset
>Rosalind_6404
CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCC
TCCCACTAATAATTCTGAGG
>Rosalind_5959
CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCT
ATATCCATTTGTCAGCAGACACGC
>Rosalind_0808
CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGAC
TGGGAACCTGCGGGCAGTAGGTGGAAT

##### Sample Output
Rosalind_0808
60.919540

#### Solution

In [58]:
sample_dataset={'Rosalind_6404':'CCTGCGGAAGATCGGCACTAGAATAGCCAGAACCGTTTCTCTGAGGCTTCCGGCCTTCCCTCCCACTAATAATTCTGAGG',
                'Rosalind_5959':'CCATCGGTAGCGCATCCTTAGTCCAATTAAGTCCCTATCCAGGCGCTCCGCCGAAGGTCTATATCCATTTGTCAGCAGACACGC',
                'Rosalind_0808':'CCACCCTCGTGGTATGGCTAGGCATTCAGGAACCGGAGAACGCTTCAGACCAGCCCGGACTGGGAACCTGCGGGCAGTAGGTGGAAT'}

def gc_content(seq):
    #percentage of symbols in seq that are 'C' or 'G'
    return 100*(seq.count('C')+seq.count('G'))/len(seq)

#pick the ID whose string has the highest GC-content
best_id=max(sample_dataset,key=lambda k: gc_content(sample_dataset[k]))
print(best_id)
print(f"{gc_content(sample_dataset[best_id]):.6f}")

Rosalind_0808
60.919540



## Mendel's First Law

![Mendel](https://rosalind.info/media/problems/iprb/balls_tree.png)

### Problem
**Given: Three positive integers k, m, and n, representing a population containing k+m+n organisms: k individuals are homozygous dominant for a factor, m are heterozygous, and n are homozygous recessive.**

**Return: The probability that two randomly selected mating organisms will produce an individual possessing a dominant allele (and thus displaying the dominant phenotype). Assume that any two organisms can mate.**

#### Solution

In [1]:
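def probability(k,m,n):
    summation=k+m+n #total population size
    #probability that the first organism drawn has each genotype
    pr_k=k/summation
    pr_m=m/summation
    pr_n=n/summation

    #first parent homozygous dominant: the offspring always carries a dominant allele
    pr_k_k=pr_k * ((k-1)/(summation-1))*1
    pr_k_m=pr_k * ((m)/(summation-1))*1
    pr_k_n=pr_k * ((n)/(summation-1))*1

    #first parent heterozygous: 3/4 with another heterozygote, 1/2 with a homozygous recessive
    pr_m_m=pr_m * ((m-1)/(summation-1))*.75
    pr_m_k=pr_m * ((k)/(summation-1))*1
    pr_m_n=pr_m * ((n)/(summation-1))*.5

    #first parent homozygous recessive: the outcome depends entirely on the second parent
    pr_n_n=pr_n * ((n-1)/(summation-1))*0
    pr_n_m=pr_n * ((m)/(summation-1))*.5
    pr_n_k=pr_n * ((k)/(summation-1))*1

    #sum over all ordered pairings, drawn without replacement
    answer=pr_k_k + pr_k_m + pr_k_n + pr_m_m + pr_m_k + pr_m_n + pr_n_n + pr_n_m + pr_n_k
    return answer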

In [30]:
probability(2,2,2)

0.7833333333333333

#### Test data set

In [35]:
probability(22,19,21)

0.7608408249603387
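
As a cross-check, the same probability can be sketched through the complement rule: compute the chance that the offspring is homozygous recessive and subtract it from 1. Only three pairings can produce a recessive offspring, so the sum is shorter. The name `dominant_probability` is hypothetical, and the sketch assumes a population of at least two organisms.

```python
from fractions import Fraction

def dominant_probability(k, m, n):
    # dominant_probability is a hypothetical name; it uses P(dominant) = 1 - P(recessive)
    total = k + m + n
    pairs = total * (total - 1)  # number of ordered pairs of distinct organisms
    # ordered pairs, each weighted by the chance of a homozygous recessive offspring
    recessive = (
        Fraction(n * (n - 1))        # recessive x recessive: always recessive
        + Fraction(2 * m * n, 2)     # heterozygous x recessive, either order: 1/2
        + Fraction(m * (m - 1), 4)   # heterozygous x heterozygous: 1/4
    )
    return float(1 - recessive / pairs)

print(dominant_probability(2, 2, 2))
print(dominant_probability(22, 19, 21))
```

Exact fractions are used until the final conversion, so the result should agree with the term-by-term sum above up to floating-point rounding.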

In [31]:
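import requests

#download the dataset page for this problem and save it locally
response=requests.get('https://rosalind.info/problems/iprb/dataset/')
response.raise_for_status() #stop early if the request failed
with open('probability.txt','wb') as file:
    file.write(response.content)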