## Incomplete Characters
The modern revolution in genome sequencing has produced a huge amount of genetic data for a wide variety of species. One ultimate goal of possessing all this information is to be able to construct complete phylogenies via direct genome analysis.

For example, say that we have a gene shared by a number of taxa. We could create a character based on whether species are known to possess the gene or not, and then use a huge character table to construct our desired phylogeny. However, the present bottleneck with such a method is that it assumes that we already possess complete genome information for all possible species. The race is on to sequence as many species genomes as possible; for instance, the Genome 10K Project aims to sequence 10,000 species genomes over the next decade. Yet for the time being, possessing a complete genomic picture of all Earth's species remains a dream.

As a result of these practical limitations, we need to be able to work with partial characters, which divide taxa into three separate groups: those possessing the character, those not possessing the character, and those for which we do not yet have conclusive information.

## Problem
A partial split of a set $S$ of $n$ taxa models a partial character and is denoted by $A \mid B$, where $A$ and $B$ are still the two disjoint subsets of taxa divided by the character. Unlike in the case of splits, we do not necessarily require that $A \cup B = S$; $(A \cup B)^{c}$ corresponds to those taxa for which we lack conclusive evidence regarding the character.

We can assemble a collection of partial characters into a generalized partial character table $C$ in which the symbol $x$ is placed in $C_{i,j}$ if we do not have conclusive evidence regarding the $j$th taxon with respect to the $i$th partial character.
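
As a minimal sketch of how one row of such a table maps to a partial split, the helper below sorts taxa into $A$ (entries `1`), $B$ (entries `0`), and the undetermined remainder (entries `x`). The name `row_to_partial_split` and the choice of Python sets are assumptions made for illustration; they are not part of the problem statement.

```python
def row_to_partial_split(taxa, row):
    """Return (A, B, unknown) for one row of a partial character table."""
    if len(taxa) != len(row):
        raise ValueError("row length must match the number of taxa")
    a = {t for t, c in zip(taxa, row) if c == "1"}        # taxa with the character
    b = {t for t, c in zip(taxa, row) if c == "0"}        # taxa without it
    unknown = {t for t, c in zip(taxa, row) if c == "x"}  # no conclusive evidence
    return a, b, unknown


taxa = "cat dog elephant ostrich mouse rabbit robot".split()
a, b, unknown = row_to_partial_split(taxa, "111x00x")
print(sorted(a))        # ['cat', 'dog', 'elephant']
print(sorted(b))        # ['mouse', 'rabbit']
print(sorted(unknown))  # ['ostrich', 'robot']
```

Here `unknown` plays the role of $(A \cup B)^{c}$.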

A quartet is a partial split $A \mid B$ in which both $A$ and $B$ contain precisely two elements. For the sake of simplicity, we often will consider quartets instead of partial characters. We say that a quartet $A \mid B$ is inferred from a partial split $C \mid D$ if $A \subseteq C$ and $B \subseteq D$ (or equivalently $A \subseteq D$ and $B \subseteq C$). For example, $\{1,3\} \mid \{2,4\}$ and $\{3,5\} \mid \{2,4\}$ can be inferred from $\{1,3,5\} \mid \{2,4\}$.
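
The inference rule can be sketched directly: choose every 2-element subset of one side and pair it with every 2-element subset of the other. The function name `inferred_quartets` is hypothetical, and the sketch assumes both sides are given as Python sets of comparable labels.

```python
from itertools import combinations


def inferred_quartets(c, d):
    """All quartets A|B with A a 2-subset of C and B a 2-subset of D."""
    return [(set(a), set(b))
            for a in combinations(sorted(c), 2)
            for b in combinations(sorted(d), 2)]


quartets = inferred_quartets({1, 3, 5}, {2, 4})
for a, b in quartets:
    print(sorted(a), "|", sorted(b))
# [1, 3] | [2, 4]
# [1, 5] | [2, 4]
# [3, 5] | [2, 4]
```

A partial split with $|C| = 3$ and $|D| = 2$ therefore yields $\binom{3}{2}\binom{2}{2} = 3$ quartets, including the two named in the example above.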

**Given**: A partial character table $C$.

**Return**: The collection of all quartets that can be inferred from the splits corresponding to the underlying characters of $C$.

## Sample Dataset
cat dog elephant ostrich mouse rabbit robot
01xxx00
x11xx00
111x00x

## Sample Output
{elephant, dog} {rabbit, robot}
{cat, dog} {mouse, rabbit}
{mouse, rabbit} {cat, elephant}
{dog, elephant} {mouse, rabbit}
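
Before turning to the notebook cells, here is a compact sketch that reproduces the sample output from the sample dataset. It is independent of the cells that follow; the function name `quartets_from_table` and the use of nested `frozenset`s to treat $A \mid B$ and $B \mid A$ as one quartet are assumptions of this sketch.

```python
from itertools import combinations


def quartets_from_table(taxa, rows):
    """Collect the distinct quartets inferred from every row of the table."""
    seen = set()
    for row in rows:
        ones = [t for t, c in zip(taxa, row) if c == "1"]
        zeros = [t for t, c in zip(taxa, row) if c == "0"]
        # Rows with fewer than two '1's or two '0's contribute nothing,
        # because combinations() is then empty on that side.
        for a in combinations(ones, 2):
            for b in combinations(zeros, 2):
                # Unordered pair of unordered pairs: A|B equals B|A
                seen.add(frozenset((frozenset(a), frozenset(b))))
    return seen


taxa = "cat dog elephant ostrich mouse rabbit robot".split()
rows = ["01xxx00", "x11xx00", "111x00x"]
result = quartets_from_table(taxa, rows)
print(len(result))  # 4
for quartet in result:
    a, b = (sorted(side) for side in quartet)
    print("{" + ", ".join(a) + "} {" + ", ".join(b) + "}")
```

The four quartets match the sample output, although the order in which they are printed (and the order of names within each pair) may differ, since sets are unordered.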

In [77]:
# Use r"D:/rosalind_q_test.txt" instead for the small test file
path = r"D:/rosalind_qrt.txt"
with open(path, 'r') as f:
    rf = f.readlines()
print(rf)
# First line: taxa names; each remaining line: one partial character
S = rf[0].split()
PCT = [item.strip() for item in rf[1:] if item.strip()]
print(S)
print(PCT)

['Allactaga_valliceps Ameiva_indica Bubulcus_multifasciata Buthus_rubida Crocodylus_galactonotus Eudrornias_aristotelis Larus_ladogensis Margaritifera_euptilura Minipterus_merganser Monticola_mykiss Panthera_rufinus Porphyrio_vulpes Rangifer_porphyrio Rhacophorus_capreolus Rhynchaspis_crassicauda Rosalia_docilis Spizaetus_collaris\n', '10111x11x01x01x11\n', '0100110101101x100\n', 'xxxxxxxxx1xxxx10x\n', 'x1xx00x000001xx0x\n', 'x0xxx11xxx1xxxxx1\n', 'xxxx1xxxxxx0xxxxx\n', 'x1xxxxx11xxxxx10x\n', 'xx0xxxxxxxxxxxxx1\n', '01011xx101111x1xx\n', '10x100x010110x011\n', 'xx001xxx0xx0000x0\n', '11101xx1xxx01110x\n', '0xx0xxx1x1x0xxxxx\n', '0x0x1xxxxxx1xxxxx\n']
['Allactaga_valliceps', 'Ameiva_indica', 'Bubulcus_multifasciata', 'Buthus_rubida', 'Crocodylus_galactonotus', 'Eudrornias_aristotelis', 'Larus_ladogensis', 'Margaritifera_euptilura', 'Minipterus_merganser', 'Monticola_mykiss', 'Panthera_rufinus', 'Porphyrio_vulpes', 'Rangifer_porphyrio', 'Rhacophorus_capreolus', 'Rhynchaspis_crassicauda',

In [79]:
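# First attempt at grouping the column positions of one row.
# Caveat: 'x' (unknown) entries are not kept separate here; they end up in
# s1 when the row has at most two '1's and in s0 otherwise. makePCT below
# handles them explicitly.
def makePCT0(ps):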
    s0 = []
    s1 = []
    count = ps.count('1')
    for psi in range(len(ps)):
        if count <= 2:
            if ps[psi] == '0':
                s0.append(psi)
            else:
                s1.append(psi)
        else:
            if ps[psi] == '1':
                s1.append(psi)
            else:
                s0.append(psi)
    return s0,s1

S0,S1 = makePCT0(PCT[0])
print(S0)
print(S1)


[1, 5, 8, 9, 11, 12, 14]
[0, 2, 3, 4, 6, 7, 10, 13, 15, 16]


In [82]:
def makePCT(ps):
    # s0 / s1: positions known to be '0' / '1'
    # s00 / s11: the same positions plus the unknown ('x') ones
    s0, s1, s00, s11 = [], [], [], []
    # A row with fewer than two '0's or two '1's cannot yield a quartet
    if ps.count('0') < 2 or ps.count('1') < 2:
        return s0, s1, s00, s11
    for psi, ch in enumerate(ps):
        if ch == '0':
            s0.append(psi)
            s00.append(psi)
        elif ch == '1':
            s1.append(psi)
            s11.append(psi)
        else:
            s00.append(psi)
            s11.append(psi)
    return s0, s1, s00, s11

# Inspect the first, second, and last rows of the table
for row in (PCT[0], PCT[1], PCT[-1]):
    S0, S1, S00, S11 = makePCT(row)
    print(S0)
    print(S1)
    print(S00)
    print(S11)


[1, 9, 12]
[0, 2, 3, 4, 6, 7, 10, 13, 15, 16]
[1, 5, 8, 9, 11, 12, 14]
[0, 2, 3, 4, 5, 6, 7, 8, 10, 11, 13, 14, 15, 16]
[0, 2, 3, 6, 8, 11, 15, 16]
[1, 4, 5, 7, 9, 10, 12, 14]
[0, 2, 3, 6, 8, 11, 13, 15, 16]
[1, 4, 5, 7, 9, 10, 12, 13, 14]
[0, 2]
[4, 11]
[0, 1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16]
[1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]


In [83]:
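# Despite the name, this returns every unordered pair of positions in l;
# a quartet is formed later from one pair on each side of the split.
def makeQuartet(l):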
    qt = []
    for i in range(len(l)):
        for j in range(i+1, len(l)):
            qt.append([l[i], l[j]])
    return qt

QT0 = makeQuartet(S0)
QT1 = makeQuartet(S1)
print(QT0)
print(QT1)

[[0, 2]]
[[4, 11]]


In [84]:
# Same pairing as makeQuartet, but maps each position to its taxon name in s
def makeQuartetS(l,s):
    qtS = []
    for i in range(len(l)):
        for j in range(i+1, len(l)):
            qtS.append([s[l[i]], s[l[j]]])
    return qtS

QTS0 = makeQuartetS(S0,S)
QTS1 = makeQuartetS(S1,S)
print(QTS0)
print(QTS1)

[['Allactaga_valliceps', 'Bubulcus_multifasciata']]
[['Crocodylus_galactonotus', 'Porphyrio_vulpes']]


In [85]:
res = []
for pct in PCT:
    S0,S1,S00,S11 = makePCT(pct)
    print('S00:',S00)
    print('S11:',S11)
    if len(S0) >= 2 and len(S1) >= 2:
        QT0 = makeQuartetS(S0,S)
        QT1 = makeQuartetS(S1,S)
        print('QT0:',QT0)
        print('QT1:',QT1)
        for qt0 in QT0:
            for qt1 in QT1:
                # A|B and B|A are the same quartet, so check both orders
                if [qt0,qt1] not in res and [qt1,qt0] not in res:
                    res.append([qt0,qt1])
        print(res)


S00: [1, 5, 8, 9, 11, 12, 14]
S11: [0, 2, 3, 4, 5, 6, 7, 8, 10, 11, 13, 14, 15, 16]
QT0: [['Ameiva_indica', 'Monticola_mykiss'], ['Ameiva_indica', 'Rangifer_porphyrio'], ['Monticola_mykiss', 'Rangifer_porphyrio']]
QT1: [['Allactaga_valliceps', 'Bubulcus_multifasciata'], ['Allactaga_valliceps', 'Buthus_rubida'], ['Allactaga_valliceps', 'Crocodylus_galactonotus'], ['Allactaga_valliceps', 'Larus_ladogensis'], ['Allactaga_valliceps', 'Margaritifera_euptilura'], ['Allactaga_valliceps', 'Panthera_rufinus'], ['Allactaga_valliceps', 'Rhacophorus_capreolus'], ['Allactaga_valliceps', 'Rosalia_docilis'], ['Allactaga_valliceps', 'Spizaetus_collaris'], ['Bubulcus_multifasciata', 'Buthus_rubida'], ['Bubulcus_multifasciata', 'Crocodylus_galactonotus'], ['Bubulcus_multifasciata', 'Larus_ladogensis'], ['Bubulcus_multifasciata', 'Margaritifera_euptilura'], ['Bubulcus_multifasciata', 'Panthera_rufinus'], ['Bubulcus_multifasciata', 'Rhacophorus_capreolus'], ['Bubulcus_multifasciata', 'Rosalia_docilis'], ['B

In [86]:
for (a, b), (c, d) in res:
    print("{%s, %s} {%s, %s}" % (a, b, c, d))

{Ameiva_indica, Monticola_mykiss} {Allactaga_valliceps, Bubulcus_multifasciata}
{Ameiva_indica, Monticola_mykiss} {Allactaga_valliceps, Buthus_rubida}
{Ameiva_indica, Monticola_mykiss} {Allactaga_valliceps, Crocodylus_galactonotus}
{Ameiva_indica, Monticola_mykiss} {Allactaga_valliceps, Larus_ladogensis}
{Ameiva_indica, Monticola_mykiss} {Allactaga_valliceps, Margaritifera_euptilura}
{Ameiva_indica, Monticola_mykiss} {Allactaga_valliceps, Panthera_rufinus}
{Ameiva_indica, Monticola_mykiss} {Allactaga_valliceps, Rhacophorus_capreolus}
{Ameiva_indica, Monticola_mykiss} {Allactaga_valliceps, Rosalia_docilis}
{Ameiva_indica, Monticola_mykiss} {Allactaga_valliceps, Spizaetus_collaris}
{Ameiva_indica, Monticola_mykiss} {Bubulcus_multifasciata, Buthus_rubida}
{Ameiva_indica, Monticola_mykiss} {Bubulcus_multifasciata, Crocodylus_galactonotus}
{Ameiva_indica, Monticola_mykiss} {Bubulcus_multifasciata, Larus_ladogensis}
{Ameiva_indica, Monticola_mykiss} {Bubulcus_multifasciata, Margaritifera_eup

In [87]:
from itertools import combinations

# Build every unordered pair of taxa names from a list of column positions
def combination(loc, dataset):
    return [[dataset[a], dataset[b]] for a, b in combinations(loc, 2)]

result = []
for k in PCT:
    # Count the '0' and '1' entries in this row
    count0 = k.count('0')
    count1 = k.count('1')
    if count0 < 2 or count1 < 2:
        continue  # fewer than two '0's or two '1's: no quartet can be formed

    # Record the positions of the '1' and '0' entries
    loc1 = [pos for pos, cha in enumerate(k) if cha == '1']
    loc0 = [pos for pos, cha in enumerate(k) if cha == '0']

    # Pair up the positions on each side
    com1 = combination(loc1, S)
    com0 = combination(loc0, S)

    # Combine each pair from one side with each pair from the other to form quartets
    result.append([[p1, p0] for p1 in com1 for p0 in com0])


# Remove duplicates. Some entries differ only in the order of their two sides,
# so tep holds the swapped form and both are checked.
uniresult = []
for i in result:
    for j in i:
        tep = [j[1], j[0]]
        if j not in uniresult and tep not in uniresult:
            uniresult.append(j)

# Optionally write the data to a txt file in the required format:
# with open(r'D:\output.txt', 'a') as f:
#     for j in uniresult:
#         f.write('{%s, %s} {%s, %s}\n' % (j[0][0], j[0][1], j[1][0], j[1][1]))

print(uniresult)


[[['Allactaga_valliceps', 'Bubulcus_multifasciata'], ['Ameiva_indica', 'Monticola_mykiss']], [['Allactaga_valliceps', 'Bubulcus_multifasciata'], ['Ameiva_indica', 'Rangifer_porphyrio']], [['Allactaga_valliceps', 'Bubulcus_multifasciata'], ['Monticola_mykiss', 'Rangifer_porphyrio']], [['Allactaga_valliceps', 'Buthus_rubida'], ['Ameiva_indica', 'Monticola_mykiss']], [['Allactaga_valliceps', 'Buthus_rubida'], ['Ameiva_indica', 'Rangifer_porphyrio']], [['Allactaga_valliceps', 'Buthus_rubida'], ['Monticola_mykiss', 'Rangifer_porphyrio']], [['Allactaga_valliceps', 'Crocodylus_galactonotus'], ['Ameiva_indica', 'Monticola_mykiss']], [['Allactaga_valliceps', 'Crocodylus_galactonotus'], ['Ameiva_indica', 'Rangifer_porphyrio']], [['Allactaga_valliceps', 'Crocodylus_galactonotus'], ['Monticola_mykiss', 'Rangifer_porphyrio']], [['Allactaga_valliceps', 'Larus_ladogensis'], ['Ameiva_indica', 'Monticola_mykiss']], [['Allactaga_valliceps', 'Larus_ladogensis'], ['Ameiva_indica', 'Rangifer_porphyrio']], 



In [88]:
for (a, b), (c, d) in uniresult:
    print("{%s, %s} {%s, %s}" % (a, b, c, d))

{Allactaga_valliceps, Bubulcus_multifasciata} {Ameiva_indica, Monticola_mykiss}
{Allactaga_valliceps, Bubulcus_multifasciata} {Ameiva_indica, Rangifer_porphyrio}
{Allactaga_valliceps, Bubulcus_multifasciata} {Monticola_mykiss, Rangifer_porphyrio}
{Allactaga_valliceps, Buthus_rubida} {Ameiva_indica, Monticola_mykiss}
{Allactaga_valliceps, Buthus_rubida} {Ameiva_indica, Rangifer_porphyrio}
{Allactaga_valliceps, Buthus_rubida} {Monticola_mykiss, Rangifer_porphyrio}
{Allactaga_valliceps, Crocodylus_galactonotus} {Ameiva_indica, Monticola_mykiss}
{Allactaga_valliceps, Crocodylus_galactonotus} {Ameiva_indica, Rangifer_porphyrio}
{Allactaga_valliceps, Crocodylus_galactonotus} {Monticola_mykiss, Rangifer_porphyrio}
{Allactaga_valliceps, Larus_ladogensis} {Ameiva_indica, Monticola_mykiss}
{Allactaga_valliceps, Larus_ladogensis} {Ameiva_indica, Rangifer_porphyrio}
{Allactaga_valliceps, Larus_ladogensis} {Monticola_mykiss, Rangifer_porphyrio}
{Allactaga_valliceps, Margaritifera_euptilura} {Ameiva