# MLPA Data Model

MLPA reports have five data types:

* Tests - varchar(200)
    * Methods - text
* Genes - varchar(200)
* Variations - varchar(200)
* Results - text
* Interpretations - text

A test has one method

A test has many genes

A test has many variations

A variation has many results

A result has many interpretations
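The relationships above can be sketched as a relational schema. This is a minimal sketch using Python's built-in `sqlite3` module; the table names, column names, and the `test_genes` link table are assumptions chosen to mirror the list above and the DataFrames later in this notebook, not a confirmed production layout.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an assumption
# that mirrors the relationships listed above.
SCHEMA = """
CREATE TABLE tests (
    id     INTEGER PRIMARY KEY,
    name   VARCHAR(200) NOT NULL,
    method TEXT
);
CREATE TABLE genes (
    id   INTEGER PRIMARY KEY,
    name VARCHAR(200) NOT NULL
);
-- Link table: a test has many genes.
CREATE TABLE test_genes (
    test_id INTEGER NOT NULL REFERENCES tests(id),
    gene_id INTEGER NOT NULL REFERENCES genes(id),
    PRIMARY KEY (test_id, gene_id)
);
-- A test has many variations.
CREATE TABLE variations (
    id        INTEGER PRIMARY KEY,
    test_id   INTEGER NOT NULL REFERENCES tests(id),
    variation VARCHAR(200) NOT NULL
);
-- A variation has many results.
CREATE TABLE results (
    id           INTEGER PRIMARY KEY,
    variation_id INTEGER NOT NULL REFERENCES variations(id),
    result       TEXT NOT NULL
);
-- A result has many interpretations.
CREATE TABLE interpretations (
    id             INTEGER PRIMARY KEY,
    result_id      INTEGER NOT NULL REFERENCES results(id),
    interpretation TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that were created.
table_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(table_names)
```

SQLite accepts `VARCHAR(200)` but does not enforce the length, so a stricter database would be needed if the 200-character limit matters.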

```
GeneID : {
    variation1 : {
        result : interpretation
    },
    variation2 : {
        result : interpretation
    }
}
```
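As a rough illustration, the nested structure above maps directly onto a Python dictionary. The keys and values below are placeholders, and `find_interpretation` is a hypothetical helper, not part of any existing code.

```python
# Placeholder data shaped like the nested structure above.
report_lookup = {
    "GeneID": {
        "variation1": {"result1": "interpretation1"},
        "variation2": {"result2": "interpretation2"},
    }
}


def find_interpretation(lookup, gene_id, variation, result):
    """Return the interpretation for a gene/variation/result, or None if absent."""
    return lookup.get(gene_id, {}).get(variation, {}).get(result)


print(find_interpretation(report_lookup, "GeneID", "variation2", "result2"))
```

This shape stores a single interpretation per result. Storing a list of strings as the innermost value would be one way to allow several interpretations per result.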

## Example MLPA Test

**Test Name** 

Alpha Thalassemia: HBA1&HBA2 Deletion/Duplication Analysis

**Method**

DNA analysis of the alpha-globin locus (HBA1/HBA2, OMIM 141800/141850, 16pter - 16p13.3) is performed by multiplex ligation-dependent probe amplification (MLPA). This methodology will, in principle, detect all genomic deletions and duplications involving this locus, as well as the Constant Spring point mutation. Any partial exonic deletions and duplications outside of the region of interest cannot be detected. The analysis does not detect other point mutations within the alpha-globin locus. The presence of a rare mutation cannot be entirely ruled out. Any interpretation given here should be clinically correlated with available information about presentation and relevant family history of the patient.

**Gene IDs**

HBA1 and HBA2

**Variation**

"Heterozygous α3.7 deletion" 

**Result**

"Heterozygous α3.7 deletion: One copy of the α3.7 deletion was detected in the analyzed region."  


**Interpretation**

"This result is most consistent with this patient being an unaffected carrier of alpha-thalassemia with a single gene deletion, also called alpha + thalassemia trait. Individuals with alpha + thalassemia trait typically have no clinical findings. A genetic counselor is available to health care providers to discuss this result further."

In [17]:
import pandas as pd

## Example Test

**Name**

Alpha Thalassemia: HBA1&HBA2 Deletion/Duplication Analysis



In [11]:
test_df_records = [
    {
        "id": 1,
        "name": "Alpha Thalassemia: HBA1&HBA2 Deletion/Duplication Analysis",
        "method": """DNA analysis of the alpha-globin locus (HBA1/HBA2, OMIM 141800/141850, 16pter - 16p13.3) is performed by multiplex ligation-dependent probe amplification (MLPA). This methodology will, in principle, detect all genomic deletions and duplications involving this locus, as well as the Constant Spring point mutation. Any partial exonic deletions and duplications outside of the region of interest cannot be detected. The analysis does not detect other point mutations within the alpha-globin locus. The presence of a rare mutation cannot be entirely ruled out. Any interpretation given here should be clinically correlated with available information about presentation and relevant family history of the patient."""
    }
]

In [12]:
test_df = pd.DataFrame.from_records(test_df_records)

In [8]:
test_df

|   | id | name | method |
|---|----|------|--------|
| 0 | 1 | Alpha Thalassemia: HBA1&HBA2 Deletion/Duplicat... | DNA analysis of the alpha-globin locus (HBA1/H... |


## Example Gene

Genes: HBA1 and HBA2

In [14]:
gene_records = [
    {
        "id": 1,
        "name": "HBA1"
    },
    {
        "id": 2,
        "name": "HBA2"
    }
]
gene_df = pd.DataFrame.from_records(gene_records)
gene_df

|   | id | name |
|---|----|------|
| 0 | 1 | HBA1 |
| 1 | 2 | HBA2 |


## Example Variations

In [25]:
variations = """Negative
Heterozygous α3.7 deletion
Heterozygous α4.2 deletion
Heterozygous HbCS variant
Homozygous α3.7 deletion
Homozygous α4.2 deletion
Homozygous HbCS variant
Heterozygous α3.7 deletion & Heterozygous α4.2 deletion
Heterozygous α3.7 deletion & Heterozygous HbCS variant
Heterozygous α4.2 deletion & Heterozygous HbCS variant
Heterozygous --SEA deletion
Heterozygous --MED deletion
Heterozygous --FIL deletion
Heterozygous --THAI deletion
Heterozygous α3.7 deletion & Heterozygous --SEA deletion
Heterozygous α4.2 deletion & Heterozygous --SEA deletion
Heterozygous HbCS variant & Heterozygous --SEA deletion
Heterozygous α3.7 deletion & Heterozygous --MED deletion
Heterozygous α4.2 deletion & Heterozygous --MED deletion
Heterozygous HbCS variant & Heterozygous --MED deletion
Homozygous --SEA deletion
Homozygous --MED deletion
Homozygous --FIL deletion
Homozygous --THAI deletion"""

# "Negative" is listed first so that variation ids line up one-to-one
# with the results defined below.
variations = variations.split("\n")
variation_records = []
for variation_id, variation in enumerate(variations, start=1):
    variation_records.append({
        "id": variation_id,
        "test_id": 1,
        "variation": variation,
    })

variations_df = pd.DataFrame.from_records(variation_records)
variations_df.head(10)

|   | id | test_id | variation |
|---|----|---------|-----------|
| 0 | 1 | 1 | Negative |
| 1 | 2 | 1 | Heterozygous α3.7 deletion |
| 2 | 3 | 1 | Heterozygous α4.2 deletion |
| 3 | 4 | 1 | Heterozygous HbCS variant |
| 4 | 5 | 1 | Homozygous α3.7 deletion |
| 5 | 6 | 1 | Homozygous α4.2 deletion |
| 6 | 7 | 1 | Homozygous HbCS variant |
| 7 | 8 | 1 | Heterozygous α3.7 deletion & Heterozygous α4.2... |
| 8 | 9 | 1 | Heterozygous α3.7 deletion & Heterozygous HbCS... |
| 9 | 10 | 1 | Heterozygous α4.2 deletion & Heterozygous HbCS... |


## Example Test Genes

In [20]:
test_gene_records = [
    {"gene_id": 1, "test_id": 1},
    {"gene_id": 2, "test_id": 1},
]

test_gene_df = pd.DataFrame.from_records(test_gene_records)
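The link table resolves the "a test has many genes" relationship through two merges. The sketch below rebuilds small copies of the three frames so it runs on its own; the renamed columns (`test_name`, `gene_name`) and the `genes_per_test` variable are assumptions made for illustration.

```python
import pandas as pd

# Small stand-ins for the frames built in this notebook.
test_df = pd.DataFrame.from_records([
    {"id": 1, "name": "Alpha Thalassemia: HBA1&HBA2 Deletion/Duplication Analysis"},
])
gene_df = pd.DataFrame.from_records([
    {"id": 1, "name": "HBA1"},
    {"id": 2, "name": "HBA2"},
])
test_gene_df = pd.DataFrame.from_records([
    {"gene_id": 1, "test_id": 1},
    {"gene_id": 2, "test_id": 1},
])

# Rename the generic "id"/"name" columns so the merge keys and labels are unambiguous.
genes_per_test = (
    test_gene_df
    .merge(test_df.rename(columns={"id": "test_id", "name": "test_name"}), on="test_id")
    .merge(gene_df.rename(columns={"id": "gene_id", "name": "gene_name"}), on="gene_id")
)

# Collect the gene names for each test.
gene_names = genes_per_test.groupby("test_name")["gene_name"].apply(list).to_dict()
print(gene_names)
```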

## Example Results

In [27]:
results = """Negative: No deletions were detected within the alpha globin cluster region.
Heterozygous α3.7 deletion: One copy of the α3.7 deletion was detected in the analyzed region.
Heterozygous α4.2 deletion: One copy of the α4.2 deletion was detected in the analyzed region.
Heterozygous HbCS variant: One copy of the Hb constant spring (HbCS) variant (heterozygous) was detected in the analyzed region.
Homozygous α3.7 deletion: Two copies of the α3.7 deletion were detected in the analyzed region.
Homozygous α4.2 deletion: Two copies of the α4.2 deletion were detected in the analyzed region.
Homozygous HbCS variant: Two copies of the Hb constant spring (HbCS) variant (homozygous) were detected in the analyzed region.
Heterozygous α3.7 deletion & Heterozygous α4.2 deletion: One copy of the α3.7 deletion and one copy of the α4.2 deletion were detected in the analyzed region.
Heterozygous α3.7 deletion & Heterozygous HbCS variant: One copy of the α3.7 deletion and one copy of the Hb constant spring (HbCS) variant were detected in the analyzed region.
Heterozygous α4.2 deletion & Heterozygous HbCS variant: One copy of the α4.2 deletion and one copy of the Hb constant spring (HbCS) variant were detected in the analyzed region.
Heterozygous --SEA deletion: One copy of the --SEA deletion was detected in the analyzed region.
Heterozygous --MED deletion: One copy of the --MED deletion was detected in the analyzed region.
Heterozygous --FIL deletion: One copy of the --FIL deletion was detected in the analyzed region.
Heterozygous --THAI deletion: One copy of the --THAI deletion was detected in the analyzed region.
Heterozygous α3.7 deletion & Heterozygous --SEA deletion: One copy of the α3.7 deletion and one copy of the --SEA deletion were detected in the analyzed region.
Heterozygous α4.2 deletion & Heterozygous --SEA deletion: One copy of the α4.2 deletion and one copy of the --SEA deletion were detected in the analyzed region.
Heterozygous HbCS variant & Heterozygous --SEA deletion: One copy of the HbCS variant and one copy of the --SEA deletion were detected in the analyzed region.
Heterozygous α3.7 deletion & Heterozygous --MED deletion: One copy of the α3.7 deletion and one copy of the --MED deletion were detected in the analyzed region.
Heterozygous α4.2 deletion & Heterozygous --MED deletion: One copy of the α4.2 deletion and one copy of the --MED deletion were detected in the analyzed region.
Heterozygous HbCS variant & Heterozygous --MED deletion: One copy of the HbCS variant and one copy of the --MED deletion were detected in the analyzed region.
Homozygous --SEA deletion: Two copies of the --SEA deletion were detected in the analyzed region.
Homozygous --MED deletion: Two copies of the --MED deletion were detected in the analyzed region.
Homozygous --FIL deletion: Two copies of the --FIL deletion were detected in the analyzed region.
Homozygous --THAI deletion: Two copies of the --THAI deletion were detected in the analyzed region."""

results = results.split("\n")
results_records = []

# The 1-based position of each result is used as its variation id.
for variation_id, result in enumerate(results, start=1):
    results_records.append({
        "test_id": 1,
        "variation_id": variation_id,
        "result": result,
    })

results_df = pd.DataFrame.from_records(results_records)
results_df.head(10)

|   | test_id | variation_id | result |
|---|---------|--------------|--------|
| 0 | 1 | 1 | Negative: No deletions were detected within th... |
| 1 | 1 | 2 | Heterozygous α3.7 deletion: One copy of the α3... |
| 2 | 1 | 3 | Heterozygous α4.2 deletion: One copy of the α4... |
| 3 | 1 | 4 | Heterozygous HbCS variant: One copy of the Hb ... |
| 4 | 1 | 5 | Homozygous α3.7 deletion: Two copies of the α3... |
| 5 | 1 | 6 | Homozygous α4.2 deletion: Two copies of the α4... |
| 6 | 1 | 7 | Homozygous HbCS variant: Two copies of the Hb ... |
| 7 | 1 | 8 | Heterozygous α3.7 deletion & Heterozygous α4.2... |
| 8 | 1 | 9 | Heterozygous α3.7 deletion & Heterozygous HbCS... |
| 9 | 1 | 10 | Heterozygous α4.2 deletion & Heterozygous HbCS... |
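Each result string starts with the name of its variation, followed by a colon. That makes it possible to link results to variations by name instead of by list position, which also surfaces any result whose label has no matching variation. The following is a sketch only: `result_label` is a hypothetical helper, the ids are illustrative, and the last sample string is a made-up placeholder used to show an unmatched label.

```python
def result_label(result_text):
    """Return the text before the first colon, with surrounding whitespace removed."""
    return result_text.partition(":")[0].strip()


# Illustrative excerpt; ids here follow list position and are not the real ids.
variations = ["Heterozygous α4.2 deletion", "Homozygous --SEA deletion"]
variation_ids = {name: i for i, name in enumerate(variations, start=1)}

results = [
    "Heterozygous α4.2 deletion: One copy of the α4.2 deletion was detected in the analyzed region.",
    # Irregular spacing around the colon is tolerated by result_label.
    "Homozygous --SEA deletion :Two copies of the --SEA deletion were detected in the analyzed region.",
    # Hypothetical placeholder with no matching variation.
    "Unlisted label: placeholder text",
]

linked = [
    {"variation_id": variation_ids.get(result_label(text)), "result": text}
    for text in results
]
unmatched = [row["result"] for row in linked if row["variation_id"] is None]
print(unmatched)
```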


## Example Interpretation

In [30]:
interpretation = """This is a dummy interpretation that should be created by a real genetic counselor. A genetic counselor is available to health care providers to discuss this result further."""

interpretation_records = [{
    "id": 1,
    # Placeholder result id, used for illustration only
    "result_id": 1,
    "interpretation": interpretation,
}]

interpretation_df = pd.DataFrame.from_records(interpretation_records)
interpretation_df

|   | id | result_id | interpretation |
|---|----|-----------|----------------|
| 0 | 1 | 1 | This is a dummy interpretation that should be ... |
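The tables above hold the pieces of a report like the one in the Example MLPA Test section. One way to assemble them is sketched below. `render_report` is a hypothetical helper, the section labels are taken from that example, and the method and interpretation arguments are placeholders rather than real report text.

```python
def render_report(test_name, method, gene_ids, variation, result, interpretation):
    """Lay out one report using the section labels from the example test."""
    sections = [
        ("Test Name", test_name),
        ("Method", method),
        ("Gene IDs", " and ".join(gene_ids)),
        ("Variation", variation),
        ("Result", result),
        ("Interpretation", interpretation),
    ]
    # Each section is a bold label, a blank line, then the section text.
    return "\n\n".join(f"**{label}**\n\n{text}" for label, text in sections)


report = render_report(
    test_name="Alpha Thalassemia: HBA1&HBA2 Deletion/Duplication Analysis",
    method="(method text goes here)",
    gene_ids=["HBA1", "HBA2"],
    variation="Heterozygous α3.7 deletion",
    result="Heterozygous α3.7 deletion: One copy of the α3.7 deletion was detected in the analyzed region.",
    interpretation="(interpretation text goes here)",
)
print(report)
```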
