<h1 Style="font-size: 3rem" >Targeted quantification of photosynthetic proteins - Whole Cell</h1>

1. [Targeted quantification of photosynthetic proteins](#Targeted-quantification-of-photosynthetic-proteins)
2. [References](#References)<br>
    2.1 [Quantification of target peptides using qConQuantifier](#Quantification-of-target-peptides-using-qConQuantifie)<br>
    2.2 [Label efficiency and quantification correction](#Label-efficiency-and-quantification-correction)<br>
    2.3 [Calculation of protein concentration per cell](#Calculation-of-protein-concentration-per-cell)<br>

## Targeted quantification of photosynthetic proteins

### What will you do?

1. Read in QuantifiedPeptides.txt (the output of QconQuantifier).
2. Use "wholeCellNameMapping" to attach the correct sample information to the evaluated data.<br>
   2.1 You will have to change the keys of the map to fit your filenames.<br>
   2.2 You will have to change the concentrations to fit your real peptide concentrations.<br>
3. Average (mean) the intensities found for each sequence: for "PEPTIDE" we take the average of all intensities over all charges and global modifications, ignoring missing values; <br>
    ```fsharp
    //PeptideSequence   globalMod charge -> intensity
    ADLNVPLDK       False 2 -> missing
    ADLNVPLDK       True  2 -> 1858.40598596182
    ```
    ... will result in:
    ```fsharp
    ADLNVPLDK           -> 1858.40598596182
    ```
4. Add the protein name to each peptide sequence.
5. Calculate the ratios as floats: the dilution ratio (e.g. "1:5" becomes 0.2) and the N14/N15 intensity ratio.
6. Filter out the peptides <code>"AFPDAYVR"; "EVTLGFVDLMR"</code> (Hammel et al.).
7. Filter out all intensities that are missing (nan) or (+/-) infinite.
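
Step 3 can be illustrated without Deedle. The following is a minimal sketch in plain F#, assuming a small hand-written list of rows; the `TFNDALADAK` intensities are invented for illustration, and the notebook itself performs this step with `Frame.applyLevel` and `Stats.mean`.

```fsharp
// Hypothetical rows: (sequence, globalMod, charge, intensity); nan marks a missing value.
let rows =
    [
        ("ADLNVPLDK",  false, 2, nan)
        ("ADLNVPLDK",  true,  2, 1858.40598596182)
        ("TFNDALADAK", true,  2, 1000.0)
        ("TFNDALADAK", true,  3, 3000.0)
    ]

// Average per sequence over all charges and global modifications,
// skipping missing values so that they do not turn the mean into nan.
let meanPerSequence =
    rows
    |> List.groupBy (fun (sequence, _, _, _) -> sequence)
    |> List.map (fun (sequence, entries) ->
        let present =
            entries
            |> List.map (fun (_, _, _, intensity) -> intensity)
            |> List.filter (fun v -> not (System.Double.IsNaN v))
        let mean = if List.isEmpty present then nan else List.average present
        sequence, mean)
    |> Map.ofList
```

A sequence whose entries are all missing keeps `nan` as its mean in this sketch, so it can still be filtered out in step 7.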



In [None]:
#load @"../IfSharp/References.fsx"
#load @"../IfSharp/Paket.Generated.Refs.fsx"
#load @"../IfSharp/FSharp.Plotly.fsx"
#load @"../IfSharp/DeedleAux.fsx"

open Deedle
open FSharpAux
open FSharp.Stats
open FSharp.Plotly
open FSharp.Stats.Fitting.LinearRegression.OrdinaryLeastSquares.Linear

<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
These are the functions and parameters used to style the graphs.
	</div>
</div>

In [2]:
let colorArray = [|"#E2001A"; "#FB6D26"; "#00519E"; "#00e257";|]

let colorForMean = "#366F8E"

let xAxis showGrid title titleSize tickSize = Axis.LinearAxis.init(Title=title,Showgrid=showGrid,Showline=true,Mirror=StyleParam.Mirror.All,Zeroline=false,Tickmode=StyleParam.TickMode.Auto,Ticks= StyleParam.TickOptions.Inside, Tickfont=Font.init(StyleParam.FontFamily.Arial,Size=tickSize),Titlefont=Font.init(StyleParam.FontFamily.Arial,Size=titleSize))
let yAxis showGrid title titleSize tickSize = Axis.LinearAxis.init(Title=title,Showgrid=showGrid,Showline=true,Mirror=StyleParam.Mirror.All,Tickmode=StyleParam.TickMode.Auto,Ticks= StyleParam.TickOptions.Inside,Tickfont=Font.init(StyleParam.FontFamily.Arial,Size=tickSize),Titlefont=Font.init(StyleParam.FontFamily.Arial,Size=titleSize))

let config = Config.init(ToImageButtonOptions = ToImageButtonOptions.init(Format = StyleParam.ImageFormat.SVG, Filename = "praktikumsplot.svg"), EditableAnnotations = [AnnotationEditOptions.LegendPosition])

<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
        Next, we need a <code>map</code> that covers all proteins present in our QconCat proteins and links each peptide sequence to the protein it belongs to.
	</div>
</div>

In [3]:
let peptideProtMapping =
    [
    //PS
    "iRT"   =>  "LGGNEQVTR"
    "LCI5"  =>  "SALPSNWK"
    "LCI5"  =>  "SVLPANWR"
    "rbcL"  =>  "DTDILAAFR"
    "rbcL"  =>  "EVTLGFVDLMR"
    "rbcL"  =>  "FLFVAEAIYK"
    "rbcL"  =>  "LTYYTPDYVVR"
    "RBCS2" =>  "AYVSNESAIR"
    "RBCS2" =>  "LVAFDNQK"
    "RBCS2" =>  "YWTMWK"
    "RBCS2" =>  "AFPDAYVR"
    "RCA1"  =>  "VPLILGIWGGK"
    "RCA1"  =>  "IGQQLVNAR"
    "RCA1"  =>  "SLVDEQENVK"
    "PCY1"  =>  "LGADSGALEFVPK"
    "PCY1"  =>  "DDYLNAPGETYSVK"
    "psaB"  =>  "TPLANLVYWK"
    "psaB"  =>  "ALYGFDFLLSSK"
    "psaB"  =>  "TNFGIGHR"
    "atpB"  =>  "LSIFETGIK"
    "atpB"  =>  "TAPAFVDLDTR"
    "petA"  =>  "IPAGPDLIVK"
    "petA"  =>  "NILVVGPVPGK"
    "petA"  =>  "IVAITALSEK"
    "petA"  =>  "YPIYFGGNR"
    "FNR1"  =>  "LYSIASSR"
    "FNR1"  =>  "LDYALSR"
    "D1"    =>  "VLNTWADIINR"
    "D1"    =>  "EWELSFR"
    "D1"    =>  "NTWADIINR"
    "D1"    =>  "LIFQYASFNNSR"
    "LCI5"  =>  "TALPADWR"
    "psbD"  =>  "LVFPEEVLPR"
    "psbD"  =>  "NILLNEGIR"
    "psbD"  =>  "TWFDDADDWLR"
    //CBC
    "PGK"   =>  "ADLNVPLDK"
    "PGK"   =>  "TFNDALADAK"
    "PGK"   =>  "LSELLGKPVTK"
    "Gap3"  =>  "AVSLVLPSLK"
    "Gap3"  =>  "VLITAPAK"
    "FBA3"  =>  "ALQNTVLK"
    "FBA3"  =>  "VMFEGILLK"
    "FBA3"  =>  "SVVSIPHGPSIIAAR"
    "FBP1"  =>  "VPLFIGSK"
    "FBP1"  =>  "TLLYGGIYGYPGDAK"
    "FBP1"  =>  "IYSFNEGNYGLWDDSVK"
    "SBP"   =>  "LTNITGR"
    "SBP"   =>  "LLFEALK"
    "TRK1"  =>  "FLAIDAINK"
    "TRK1"  =>  "VSTLIGYGSPNK"
    "TRK1"  =>  "NPDFFNR"
    "RPE1"  =>  "FIESQVAK"
    "RPE1"  =>  "GVNPWIEVDGGVTPENAYK"
    "RPE1"  =>  "SDIIVSPSILSADFSR"
    "PRK1"  =>  "IYLDISDDIK"
    "PRK1"  =>  "VAELLDFK"
    "PRK1"  =>  "GHSLESIK"
    "TPI1"  =>  "SLFGESNEVVAK"
    "TPI1"  =>  "LVDELNAGTIPR"
    "RPI1"  =>  "LANLPEVK"
    "RPI1"  =>  "LQNIVGVPTSIR"
    "RPI1"  =>  "TQLSQDELK"
    "DP12"  =>  "SGQPAVDLNK"
    "DP12"  =>  "ASGQPAVDLNK"
    "RMT1"  =>  "AEAALLVR"
    "RMT1"  =>  "SNSTPLGSR"
    "FBA1"  =>  "GILASDESNATTGK"
    "FBA1"  =>  "ALQSSTLK"
    "FBA2"  =>  "VSAADVAR"
    "FBA2"  =>  "ALQASVLK"
    "Cre07.g338451" =>  "VTEAAALASGR"
    "FBP1"  =>  "NLALELVR"
    "CalSciex"  =>  "SAEGLDASASLR"
    ]
    |> List.map (fun (x,y) -> y,x)
    |> Map.ofList
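
<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
As a minimal sketch of how this map is used, the following plain F# snippet inverts a small subset of the pairs and looks up peptides by sequence. The names <code>proteinPeptidePairs</code>, <code>peptideToProtein</code> and <code>describe</code> are hypothetical, and ordinary tuples stand in for the <code>=></code> operator so that the snippet runs without Deedle.
	</div>
</div>

```fsharp
// A small subset of the protein -> peptide pairs from the cell above.
let proteinPeptidePairs =
    [
        ("rbcL",  "DTDILAAFR")
        ("rbcL",  "FLFVAEAIYK")
        ("RBCS2", "AYVSNESAIR")
    ]

// Swap each pair so that the peptide sequence becomes the key.
let peptideToProtein =
    proteinPeptidePairs
    |> List.map (fun (protein, peptide) -> peptide, protein)
    |> Map.ofList

// Look up a peptide and fall back to a message when it is not in the map.
let describe peptide =
    match Map.tryFind peptide peptideToProtein with
    | Some protein -> protein
    | None -> sprintf "%s not found" peptide
```

<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
Keying the map by peptide works because every peptide sequence occurs only once, whereas one protein usually contributes several peptides.
	</div>
</div>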

<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
        With the peptide/protein map in place, we need a <code>map</code> for the files we want to analyze. Each entry pairs a filename with a description of what the file contains (experiment, spiked-in peptide concentration, and strain).
	</div>
</div>

In [4]:
let wholeCellNameMapping = 
    [
    // filename(found in metadata file) => ("WholeCell_ISD", (spiked in peptide concentration, C. reinhardtii strain) ) 
    "20200206MS169msFSSTqp001"      =>  ("WholeCell_ISD",("1:1","CW15") )
    "20200206MS169msFSSTqp002"      =>  ("WholeCell_ISD",("1:5","CW15") )
    "20200206MS169msFSSTqp003"      =>  ("WholeCell_ISD",("1:25","CW15") )
    "20200206MS169msFSSTqp004"      =>  ("WholeCell_ISD",("1:125","CW15"))
    "20200206MS169msFSSTqp007"      =>  ("WholeCell_ISD",("1:1","UVM4") )
    "20200206MS169msFSSTqp008"      =>  ("WholeCell_ISD",("1:5","UVM4") )
    "20200206MS169msFSSTqp009"      =>  ("WholeCell_ISD",("1:25","UVM4") )
    "20200206MS169msFSSTqp010"      =>  ("WholeCell_ISD",("1:125","UVM4"))
    "20200206MS169msFSSTqp013"      =>  ("WholeCell_ISD",("1:1","4A") )
    "20200206MS169msFSSTqp014"      =>  ("WholeCell_ISD",("1:5","4A") )
    "20200206MS169msFSSTqp015"      =>  ("WholeCell_ISD",("1:25","4A") )
    "20200206MS169msFSSTqp016"      =>  ("WholeCell_ISD",("1:125","4A"))
    ]
    |> Map.ofList

<div Style="max-width: 85%">
    <div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
        The following cell defines helper functions and a <code>Record Type</code> that make information easier to access in the more complex functions later on.
    </div>
</div>

In [5]:
/// This function is used to check for missing or infinity values, all of which have to be handled in addition to normal values.
let isBad a =
    nan.Equals(a) || infinity.Equals(a) || (-infinity).Equals(a)
    
let root = __SOURCE_DIRECTORY__

type PepResult1 = 
    {
        Protein     :   string
        Peptide     :   string
        StrainValues:   (float*float) []
        StrainName  :   string
    }

    static member create prot pep vals strainName = {
        Protein      =  prot
        Peptide      =  pep
        StrainValues =  vals
        StrainName   =  strainName
    }
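
<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
The reason <code>isBad</code> calls <code>Equals</code> instead of using <code>=</code> is that <code>nan</code> never compares equal to itself with the operator. The sketch below shows this and an alternative formulation based on the .NET predicates <code>Double.IsNaN</code> and <code>Double.IsInfinity</code>; <code>isBadAlt</code> is a hypothetical name and not part of the notebook.
	</div>
</div>

```fsharp
// With the (=) operator, nan is not equal to anything, including itself.
let nanEqualsItself = (nan = nan)

// Double.Equals, as used in isBad, does treat NaN as equal to NaN.
let nanEqualsViaMethod = nan.Equals(nan)

// Alternative check with the .NET helper predicates (an assumption, not the notebook's code).
let isBadAlt (x: float) =
    System.Double.IsNaN x || System.Double.IsInfinity x

// Keep only usable intensities from a made-up list.
let cleaned =
    [ 1858.4; nan; infinity; -infinity; 27368.5 ]
    |> List.filter (isBadAlt >> not)
```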

<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
This function reads the output of the QconQuantifier and indexes each row by the peptide sequence, the global modification flag (which tells whether the peptide is labeled) and the charge, since this combination is unique for each entry.
	</div>
</div>

In [6]:
let readQConcatResultFrame p : Frame<string*(bool*int),string>=
    let schemaFrame =
        Frame.ReadCsv(path = p,separators="\t")
    let schema =
        schemaFrame.ColumnKeys
        |> Seq.filter (fun x -> not (x = "StringSequence" || x = "GlobalMod" || x = "Charge"))
        |> Seq.map (sprintf "%s=float")
        |> Seq.append ["StringSequence=string";"GlobalMod=bool";"Charge=int"]
        |> String.concat ","
    Frame.ReadCsv(path = p,schema=schema,separators="\t")
    |> Frame.indexRowsUsing (fun os -> (os.GetAs<string>("StringSequence"),((os.GetAs<bool>("GlobalMod"),(os.GetAs<int>("Charge"))))))
    |> Frame.dropCol "StringSequence"
    |> Frame.dropCol "GlobalMod"
    |> Frame.dropCol "Charge"
    |> Frame.sortRowsByKey
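
<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
The schema string is the part of this function that is easiest to get wrong, so here is a minimal sketch of how it is assembled, assuming a hypothetical file header. The column names <code>N14Quant_fileA</code> and <code>N15Quant_fileA</code> are invented for illustration.
	</div>
</div>

```fsharp
// Hypothetical header of a QuantifiedPeptides.txt file.
let columnKeys =
    [ "StringSequence"; "GlobalMod"; "Charge"; "N14Quant_fileA"; "N15Quant_fileA" ]

// The three index columns get fixed types.
let indexColumns = [ "StringSequence=string"; "GlobalMod=bool"; "Charge=int" ]

// Every remaining column is declared as float; the index columns are put in front.
let schema =
    columnKeys
    |> Seq.filter (fun x -> not (x = "StringSequence" || x = "GlobalMod" || x = "Charge"))
    |> Seq.map (sprintf "%s=float")
    |> Seq.append indexColumns
    |> String.concat ","
```

<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
Declaring the types explicitly is why the file is read twice: the first pass only collects the column names, the second pass parses the values with the resulting schema.
	</div>
</div>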

<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
This function takes the path to the result file as a parameter. It reads the data with the <code>readQConcatResultFrame</code> function from above and keeps only the columns that belong to the files listed in <code>wholeCellNameMapping</code>. It then retains only the quantification columns and, for each peptide sequence, averages the values over all charges and global modifications. Finally, each peptide is labeled with its protein from <code>peptideProtMapping</code>.
	</div>
</div>

In [16]:
let getWholeCellResults path : Frame<string*(string*(string*string)),string*string> = 
    readQConcatResultFrame (root + path)//@"\..\AuxFiles\FredQuantifiedPeptides.txt"
    |> Frame.filterCols 
        (fun ck cs ->
            let newCK = Map.tryFindKey (fun key t -> key = (ck.Split('_').[1 ..] |> String.concat "_")) wholeCellNameMapping
            newCK.IsSome
        )
    |> Frame.mapColKeys 
        (fun (ck:string) -> 
            let newCK = Map.find (ck.Split('_').[1 ..] |> String.concat "_") wholeCellNameMapping
            ck.Split('_').[0] , newCK
        )
    |> Frame.sortColsByKey
    |> Frame.filterCols (fun ck _ -> (fst ck).Contains("Quant"))
    |> Frame.applyLevel (fun (sequence,(gmod,charge)) -> sequence) Stats.mean
    |> Frame.transpose
    |> Frame.mapColKeys
        (fun ck ->
            match Map.tryFind ck peptideProtMapping with
            |Some prot  -> prot,ck
            |None       -> (sprintf "%s not found in 'peptideProtMapping'(QConCat)" ck),ck
        )
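// Here we keep only the rbcL entries to reduce the visual overload.
// The path is a verbatim string (@"...") so that the backslashes are not read as escape sequences.
getWholeCellResults @"\..\AuxFiles\FredQuantifiedPeptides.txt"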
|> Frame.filterCols (fun ck cs -> fst ck = "rbcL")
|> fun x -> x.Print()

                                           rbcL                                                                
                                           DTDILAAFR        EVTLGFVDLMR      FLFVAEAIYK       LTYYTPDYVVR      
N14Quant       WholeCell_ISD 1:1   4A   -> <missing>        1822.22143957728 27368.5240234027 26190.2861242702 
                                   CW15 -> <missing>        2180.4744454363  36447.9713544295 37880.2059479232 
                                   UVM4 -> <missing>        1091.94221504379 26540.4587996413 30769.5688508342 
                             1:125 4A   -> <missing>        865.567247430097 8173.99996025073 11004.7543842426 
                                   CW15 -> <missing>        1683.9204860914  24442.6212420372 24599.5015361703 
                                   UVM4 -> <missing>        684.271560056444 5359.13542203211 8934.76926556884 
                             1:25  4A   -> <missing>        646.817206090597 6377.96659828755 86
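
<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
The column relabeling in <code>getWholeCellResults</code> relies on column names of the form <code>quantType_filename</code>. This format is inferred from the split on <code>'_'</code> in the code and is an assumption here. A minimal sketch with the hypothetical helpers <code>splitColumnKey</code> and <code>tryRelabel</code>:
	</div>
</div>

```fsharp
// Two entries in the shape of wholeCellNameMapping, written with plain tuples.
let nameMapping =
    [
        ("20200206MS169msFSSTqp001", ("WholeCell_ISD", ("1:1", "CW15")))
        ("20200206MS169msFSSTqp002", ("WholeCell_ISD", ("1:5", "CW15")))
    ]
    |> Map.ofList

// Everything before the first underscore is the quantification type,
// everything after it is the filename (which may itself contain underscores).
let splitColumnKey (ck: string) =
    let parts = ck.Split('_')
    let quantType = parts.[0]
    let fileName = parts.[1 ..] |> String.concat "_"
    quantType, fileName

// Returns None for columns whose file is not listed, so they can be dropped.
let tryRelabel (ck: string) =
    let quantType, fileName = splitColumnKey ck
    Map.tryFind fileName nameMapping
    |> Option.map (fun description -> quantType, description)
```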

<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
This function returns all information needed to determine linearity over the different QconCat protein concentrations. The result can be restricted to specific proteins, and certain peptides can be excluded.
	</div>
</div>

In [8]:
let getForLinearity (proteinsToShow:string [] option) (peptidesToIgnore:string [] option) (wcResults:Frame<(string*(string*(string*string))),(string*string)>)= 
    
    let prepToShowProteins =
        if proteinsToShow.IsSome then
            proteinsToShow.Value 
            |> Array.map (fun x -> x.ToLower())
        else 
            [||]

    let prepToIgnorePeptides =
        if peptidesToIgnore.IsSome then
            peptidesToIgnore.Value
            |> Array.map (fun x -> x.ToLower())
        else 
            [||]

    wcResults
    |> Frame.mapCols (
        fun _ os -> 
            os.As<float>() 
            |> Series.mapValues (fun (x:float) -> if x > 2000000. || x < 1. then nan else x)
    )
    |> Frame.mapRowKeys 
        (fun (q,(n,(ratio,strain))) ->  
            let ratioSplit = ratio.Split(':')
            let ratio' = (float ratioSplit.[0]) / (float ratioSplit.[1])
            (q,(n,(ratio',strain)))
    
        )
    // Bad Peptides according to Hammel et al
    |> Frame.filterCols (fun ck _ -> not ((snd ck) = "EVTLGFVDLMR") && not ((snd ck) = "AFPDAYVR"))
    |> Frame.filterRows (fun (q,(n,ratio)) _ -> not (q.Contains("Minus")))
    |> Frame.sortRowsByKey
    |> Frame.sortColsByKey
    |> Frame.filterCols (fun ck cs -> 
        let prepProtInFrame = (fst ck).ToLower()
        let prepPeptInFrame = (snd ck).ToLower()
        match proteinsToShow, peptidesToIgnore with
        | Some _, Some _ -> 
            Array.exists (fun x -> x = prepProtInFrame) prepToShowProteins
            && not (Array.exists (fun x -> x = prepPeptInFrame) prepToIgnorePeptides)
        | Some _, None ->
            Array.exists (fun x -> x = prepProtInFrame) prepToShowProteins
        | None, Some _ ->
            Array.exists (fun x -> x = prepPeptInFrame) prepToIgnorePeptides |> not
        | None, None -> true
    )
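
<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
Two small transformations inside <code>getForLinearity</code> can be tried in isolation: converting the dilution label to a number and masking intensities outside the accepted range. The sketch below extracts them into the hypothetical functions <code>parseRatio</code> and <code>maskOutOfRange</code>; the thresholds are the ones used in the cell above.
	</div>
</div>

```fsharp
// "1:125" -> 1.0 / 125.0
let parseRatio (ratio: string) =
    let parts = ratio.Split(':')
    float parts.[0] / float parts.[1]

// Intensities above 2,000,000 or below 1 are replaced by nan,
// so that later steps treat them like missing values.
let maskOutOfRange (x: float) =
    if x > 2000000. || x < 1. then nan else x

let parsedRatios = [ "1:1"; "1:5"; "1:25"; "1:125" ] |> List.map parseRatio
```

<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
Converting the labels to floats also changes the row order after sorting: as strings, "1:125" sorts before "1:25", while as numbers the rows run from the smallest ratio (0.008) to the largest (1).
	</div>
</div>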

<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
These two functions take the result of <code>getForLinearity</code> as input, filter it for the <sup>14</sup>N and <sup>15</sup>N values respectively, and transform it from the Deedle-specific <code>Frame</code> type to a <code>JaggedArray</code>.
	</div>
</div>

In [9]:
let n14Lin (linearityData:Frame<(string*(string*(float*string))),(string*string)>) =
    linearityData
    |> Frame.filterRows (fun rk _ -> not ((fst rk).Contains("N15Quant")))
    |> Frame.transpose
    |> Frame.toArray2D
    |> Array2D.toJaggedArray

let n15Lin (linearityData:Frame<(string*(string*(float*string))),(string*string)>)=
    linearityData
    |> Frame.filterRows (fun rk _ -> not ((fst rk).Contains("N14")) && not ((fst rk).Contains("Corrected")))
    |> Frame.transpose
    |> Frame.toArray2D
    |> Array2D.toJaggedArray

<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
This function compares the peptide ratios (<sup>14</sup>N / <sup>15</sup>N) of the different strains for the given proteins. For each strain, it displays the ratios of every peptide found for that protein together with their mean. It also compares the strains at the protein level. <code>strainNameArr</code> is an array with the names of all the strains used in the experiment. <code>dilutionArr</code> is an array containing the information of the <sup>15</sup>N dilution series.
	</div>
</div>

In [17]:
let plotPeptideISD wcResults (proteinsToShow:string[] option) (peptidesToIgnore:string[] option) strainNameArr dilutionArr =

    let forLinearity = getForLinearity proteinsToShow peptidesToIgnore wcResults
    
    printfn "For Linearity:"
    forLinearity.Print()

    let strainNamesSorted = strainNameArr |> Array.sort // e.g. [|"4A"; "CW15"; "UVM"|]

    let dilutionsSorted = dilutionArr |> Array.sortDescending // e.g. [|125.; 25.; 5.; 1.|]

    let dilutionArrTimesStrains =
        [|for dil in dilutionsSorted do 
            yield! Array.init strainNamesSorted.Length (fun _ -> dil) |]

    let sortedStrainValues =
        Array.map2 (Array.zip) (n14Lin forLinearity) (n15Lin forLinearity)
        |> JaggedArray.map (fun (n14,n15) -> if (isBad n14 || isBad n15) then 0. else n14/n15)
        // dilutions times the number of strains 3 x 125, 3 x 25, 3 x 5 ...
        // zip the following values to the columns
        |> Array.map (fun x -> Array.zip dilutionArrTimesStrains x)
        |> Array.map 
            (fun (values) -> 
                Array.init 
                    strainNamesSorted.Length 
                    (fun ind -> 
                        [|for x in ind .. strainNamesSorted.Length .. values.Length-1 do yield values.[x]|]
                    )
                |> Array.zip strainNamesSorted
            )
        |> Array.zip (Seq.toArray forLinearity.ColumnKeys)
        |> Array.collect (fun ((prot,pept),strainResults) -> 
            strainResults 
            |> Array.map (fun (strainName,values) ->
                PepResult1.create prot pept values strainName
            )  
        )

    let comparePeptidesInStrainCharts =
        sortedStrainValues
        |> Array.groupBy (fun x -> x.StrainName, x.Protein)
        |> Array.map (fun (header,peptInfoArr) -> 
            header,
            peptInfoArr
            |> Array.mapi (fun i peptInfo ->
                Chart.Scatter(peptInfo.StrainValues,mode=StyleParam.Mode.Markers, MarkerSymbol = StyleParam.Symbol.Cross, Color=colorArray.[i])
                |> Chart.withTraceName (sprintf "%s -  %s - %s" peptInfo.StrainName peptInfo.Protein peptInfo.Peptide)
            )
            |> Chart.Combine
            |> Chart.withX_Axis (xAxis false (fst header + "-N14 Sample/N15 QProtein ratio") 20 16)
            |> Chart.withY_Axis (yAxis false "N14/N15 Quantification ratio" 20 16)
        )

    let comparePeptidesInStrainMEANS =
        sortedStrainValues
        |> Array.groupBy (fun x -> x.StrainName, x.Protein)
        |> Array.map (fun (header,peptInfoArr) -> 
            header,
            [|for pept in peptInfoArr do
                yield! pept.StrainValues|]
        )
        |> Array.map (fun (header,values) -> header,values|> (Array.groupBy fst))
        |> Array.map (fun ((strain,prot),values) -> 
            let means =
                values 
                |> Seq.map (fun (xAxis,values) -> 
                    xAxis,values 
                    |> Seq.meanBy snd
                )
            (strain,prot),means
        )

    let compareBetweenStrainsChart =
        comparePeptidesInStrainMEANS
        |> Array.groupBy (fun (header,x) -> snd header)
        |> Array.map (fun (prot,strains) -> 
            ("CompareStrains", prot),
            strains
            |> Array.mapi (fun i ((strain,prot),values) -> 
                Chart.Scatter(values,mode=StyleParam.Mode.Lines_Markers, MarkerSymbol = StyleParam.Symbol.Cross, Color=colorArray.[i])
                |> Chart.withTraceName (sprintf "mean %s -  %s" strain prot)
            )
            |> Chart.Combine
            |> Chart.withX_Axis (xAxis false ("Strain Means-" + "N14 Sample/N15 QProtein ratio") 20 16)
            |> Chart.withY_Axis (yAxis false "N14/N15 Quantification ratio" 20 16)
        )

    let comparePeptidesInStrainMEANSCharts =
        comparePeptidesInStrainMEANS
        |> Array.map (fun ((strain,prot),means) ->
            (strain,prot),
            Chart.Scatter(means,mode=StyleParam.Mode.Lines_Markers, MarkerSymbol = StyleParam.Symbol.Circle, Color=colorForMean, Opacity=0.8)
            |> Chart.withTraceName (sprintf "mean %s - %s" strain prot)
        )

    let alignPeptideAndPeptideMeans =
        Array.zip comparePeptidesInStrainCharts comparePeptidesInStrainMEANSCharts
        |> Array.map (fun ((header,chart),(_,chartMean)) -> header, Chart.Combine [|chart; chartMean|])

    [|yield! compareBetweenStrainsChart; yield! alignPeptideAndPeptideMeans|]
    |> Array.groupBy (fun (header,chart) -> snd header) 
    |> Array.map (fun (prot,chartsWithMeta) -> 
        chartsWithMeta
        |> Array.map (fun x -> 
            snd x 
        )
        |> Chart.Stack(2,Space=0.15)
        |> Chart.withTitle prot
        |> Chart.withSize (1200.,900.)
        |> Chart.withConfig config
        |> Display
    )
    
let wcResults = getWholeCellResults @"\..\AuxFiles\FredQuantifiedPeptides.txt"

let sth = plotPeptideISD wcResults (Some [|"RBCL"; "Rbcs2"|]) None [|"4A"; "UVM";"CW15"|] [|1.;5.;25.;125.|] 

For Linearity:
                                     rbcL                                               RBCS2            
                                     DTDILAAFR        FLFVAEAIYK       LTYYTPDYVVR      AYVSNESAIR       
N14Quant WholeCell_ISD 0.008 4A   -> <missing>        8173.99996025073 11004.7543842426 5021.02965633758 
                             CW15 -> <missing>        24442.6212420372 24599.5015361703 18814.4212019197 
                             UVM4 -> <missing>        5359.13542203211 8934.76926556884 6809.77880994903 
                       0.04  4A   -> <missing>        6377.96659828755 8637.66500601916 6019.95250607964 
                             CW15 -> <missing>        15182.577928782  15356.1663571102 10405.0645345887 
                             UVM4 -> <missing>        3149.47321610006 5058.3332952497  4134.0735263168  
                       0.2   4A   -> <missing>        5229.41171532585 7639.547468136   6816.93459235203 
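
<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
The regrouping inside <code>sortedStrainValues</code> depends on the row order shown above: within each dilution the strains appear in sorted order. A minimal sketch with made-up numbers and two dilutions shows how the strided index picks out the values of one strain; <code>flatValues</code>, <code>labelled</code> and <code>perStrain</code> are hypothetical names.
	</div>
</div>

```fsharp
let strainNamesSorted = [| "4A"; "CW15"; "UVM" |]
let dilutionsSorted = [| 125.; 25. |]

// One value per (dilution, strain) combination, strains cycling fastest (made-up numbers).
let flatValues = [| 1.; 2.; 3.; 10.; 20.; 30. |]

// Repeat every dilution once per strain: 125, 125, 125, 25, 25, 25.
let dilutionArrTimesStrains =
    [| for dil in dilutionsSorted do
        yield! Array.init strainNamesSorted.Length (fun _ -> dil) |]

// Attach the dilution to every value.
let labelled = Array.zip dilutionArrTimesStrains flatValues

// Strain i owns positions i, i + n, i + 2n, ... where n is the number of strains.
let perStrain =
    Array.init strainNamesSorted.Length (fun ind ->
        [| for x in ind .. strainNamesSorted.Length .. labelled.Length - 1 do
            yield labelled.[x] |])
    |> Array.zip strainNamesSorted
```

<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
Because the assignment is purely positional, the strain names passed in must sort into the same order as the strain names in the frame, and every dilution must contain a row for every strain.
	</div>
</div>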
                     

<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
<code>wholeCellPeptideRatios</code> works similarly to <code>plotPeptideISD</code>, but instead of drawing a graph it returns the underlying values to work with.
	</div>
</div>

In [11]:
let wholeCellPeptideRatios wcResults proteinsToShow peptidesToIgnore (strainNameArr:string []) (dilutionArr: float[]) =

    let forLinearity = getForLinearity proteinsToShow peptidesToIgnore wcResults

    let strainNamesSorted = strainNameArr |> Array.sort // e.g. [|"4A"; "CW15"; "UVM"|]

    let dilutionsSorted = dilutionArr |> Array.sortDescending // e.g. [|125.; 25.; 5.; 1.|]

    let dilutionArrTimesStrains =
        [|for dil in dilutionsSorted do 
            yield! Array.init strainNamesSorted.Length (fun _ -> dil) |]

    let sortedStrainValues =
        Array.map2 (Array.zip) (n14Lin forLinearity) (n15Lin forLinearity)
        // JP12_WC_07
        |> JaggedArray.map (fun (n14,n15) -> if (isBad n14 || isBad n15) then 0. else n14/n15)
        // dilutions times the number of strains 3 x 125, 3 x 25, 3 x 5 ...
        // zip the following values to the columns
        |> Array.map (fun x -> Array.zip dilutionArrTimesStrains x)
        |> Array.map 
            (fun (values) -> 
                Array.init 
                    strainNamesSorted.Length 
                    (fun ind -> 
                        [|for x in ind .. strainNamesSorted.Length .. values.Length-1 do yield values.[x]|]
                    )
                |> Array.zip strainNamesSorted
            )
        |> Array.zip (Seq.toArray forLinearity.ColumnKeys)
        |> Array.collect (fun ((prot,pept),strainResults) -> 
            strainResults 
            |> Array.map (fun (strainName,values) ->
                PepResult1.create prot pept values strainName
            )  
        )

    let comparePeptidesInStrainMEANS =
        sortedStrainValues
        |> Array.groupBy (fun x -> x.StrainName, x.Protein)
        |> Array.map (fun (header,peptInfoArr) -> 
            header,
            [|for pept in peptInfoArr do
                yield! pept.StrainValues|]
        )
        |> Array.map (fun (header,values) -> header,values|> (Array.groupBy fst))
        |> Array.map (fun ((strain,prot),values) -> 
            let means =
                values 
                |> Seq.map (fun (xAxis,values) -> 
                    xAxis,values 
                    |> Seq.meanBy snd
                )
            PepResult1.create prot "mean" (Seq.toArray means) strain
        )

    [|yield! comparePeptidesInStrainMEANS; yield! sortedStrainValues|]
    |> Array.groupBy (fun x -> x.StrainName, x.Protein)
    |> Array.map (fun x -> snd x)
    |> Array.map (
        fun x ->
            [for pepRes in x do
                yield
                    ((pepRes.Protein,pepRes.StrainName),pepRes.Peptide) => series pepRes.StrainValues
            ]
            |> frame
        )
    |> Seq.reduce (Frame.join JoinKind.Outer)
    |> Frame.transpose
    |> Frame.sortRowsByKey
    |> Frame.sortColsByKey

let sth2 = wholeCellPeptideRatios wcResults (Some [|"RBCL"; "Rbcs2"|]) (Some [|"DTDILAAFR"|]) [|"4A"; "UVM";"CW15"|] [|1.;5.;25.;125.|]
sth2.Print()

                             1                5                25               125              
(rbcL, 4A)    FLFVAEAIYK  -> 2.06947861097614 7.10997943034996 20.629914263127  30.6582012903869 
              LTYYTPDYVVR -> 2.21921859941653 10.6293124813075 29.0278528418796 37.9822031405613 
              mean        -> 2.14434860519633 8.86964595582872 24.8288835525033 34.3202022154741 
(rbcL, CW15)  FLFVAEAIYK  -> 2.56045015067148 9.47159017125891 36.3060023757454 172.430567046045 
              LTYYTPDYVVR -> 2.99325412324438 12.0517741731554 47.0262981423251 63.88783176643   
              mean        -> 2.77685213695793 10.7616821722072 41.6661502590353 118.159199406238 
(rbcL, UVM)   FLFVAEAIYK  -> 2.5606493932246  9.28728459308256 13.2359352829558 40.5860880186285 
              LTYYTPDYVVR -> 2.98369516600166 12.8101619050306 26.9079052809061 34.0626490169816 
              mean        -> 2.77217227961313 11.0487232490566 20.0719202819309 37.324368517805  
(RBCS2, 4A
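
<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
The "mean" rows are the per-dilution averages over all peptides of one protein in one strain. The following sketch recomputes two of them in plain F# from the rbcL / 4A values printed above; <code>peptideValues</code> and <code>meanPerDilution</code> are hypothetical names.
	</div>
</div>

```fsharp
// (dilution, N14/N15 ratio) pairs for the two rbcL peptides in strain 4A,
// copied from the printed frame for the dilutions 1 and 5.
let peptideValues =
    [|
        ("FLFVAEAIYK",  [| (1., 2.06947861097614); (5., 7.10997943034996) |])
        ("LTYYTPDYVVR", [| (1., 2.21921859941653); (5., 10.6293124813075) |])
    |]

// Pool the pairs of all peptides, group them by dilution and average the ratios.
let meanPerDilution =
    peptideValues
    |> Array.collect snd
    |> Array.groupBy fst
    |> Array.map (fun (dilution, values) -> dilution, values |> Array.averageBy snd)
```

<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
Note that missing ratios are set to 0 before this step in the notebook, so a peptide without a value lowers the mean unless it is listed in <code>peptidesToIgnore</code>.
	</div>
</div>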

<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
<code>rbc_L_vs_S_rbcl_RatiosS_wholeCell</code> performs a quality assessment of the whole-cell sample preparation. For each strain, it fits a linear regression of the relative quantification of each RuBisCO subunit against the overall <sup>14</sup>N/<sup>15</sup>N protein ratio in the whole-cell samples. The coefficient of determination (R<sup>2</sup>) is shown for both models, and the Pearson correlation is printed for each strain.
	</div>
</div>

In [18]:
let rbc_L_vs_S_rbcl_RatiosS_wholeCell prot1Name prot2Name (wcPeptideRatios:Frame<((string*string)*string),float>) =
    
    let prot1_RatiosS_wholeCell =
        wcPeptideRatios
        |> Frame.filterRows (fun ((prot,strain),pep) _ -> pep = "mean")
        |> Frame.transpose
        |> Frame.filterCols (fun ((prot,strain),pep) _ -> prot = prot1Name)
        |> Frame.transpose
        |> Frame.applyLevel fst Stats.mean
    
    let prot2_RatiosSdsIgd_wholeCell =
        wcPeptideRatios
        |> Frame.filterRows (fun ((prot,strain),pep) _ -> pep = "mean")
        |> Frame.transpose
        |> Frame.filterCols (fun ((prot,strain),pep) _ -> prot = prot2Name) //(fun (prot,(pep,_)) _ -> prot = "RBCS2" && (not (pep = "AFPDAYVR" || pep = "EVTLGFVDLMR")))
        |> Frame.transpose
        |> Frame.applyLevel fst Stats.mean

    let dilutionsSorted = 
        wcPeptideRatios.ColumnKeys
        |> Array.ofSeq

    let strainNames = 
        wcPeptideRatios.RowKeys
        |> Seq.map (fun ((x,y),z) -> y)
        |> (Seq.distinct >> Array.ofSeq)

    let prot1 =
        prot1_RatiosS_wholeCell
        |> fun x ->
            printfn "Protein1:"
            x.Print()
            x
        |> Frame.toArray2D
        |> JaggedArray.ofArray2D

    let prot1Coeff,prot1FitVals,prot1Determination =
        prot1
        |> Array.mapi (
            fun i strainVals ->
                // RBCL Regression of relative quantification values
                let RBCLcoeff = Univariable.coefficient (vector dilutionsSorted) (vector strainVals)
                let RBCLfitFunc = Univariable.fit RBCLcoeff
                let RBCLfitVals = dilutionsSorted |> Array.map RBCLfitFunc
                let RBCLdetermination = FSharp.Stats.Fitting.GoodnessOfFit.calculateDeterminationFromValue strainVals RBCLfitVals
                let RBCLpearson = FSharp.Stats.Correlation.Seq.pearson (strainVals) dilutionsSorted
                printfn "%s - Pearson WholeCell RBCL: %f" strainNames.[i] RBCLpearson
                RBCLcoeff, RBCLfitVals, RBCLdetermination
        )
        |> Array.unzip3
                
    let prot2 =
        prot2_RatiosSdsIgd_wholeCell
        |> fun x ->
            printfn "Protein2:"
            x.Print()
            x
        |> Frame.toArray2D
        |> JaggedArray.ofArray2D

    let prot2Coeff,prot2FitVals,prot2Determination =
        prot2
        |> Array.mapi (
            fun i strainVals ->
                let RBCScoeff = Univariable.coefficient (vector dilutionsSorted ) (vector strainVals) 
                let RBCSfitFunc = Univariable.fit RBCScoeff
                let RBCSfitVals = dilutionsSorted  |> Array.map RBCSfitFunc
                let RBCSdetermination = FSharp.Stats.Fitting.GoodnessOfFit.calculateDeterminationFromValue strainVals RBCSfitVals
                let RBCSpearson = FSharp.Stats.Correlation.Seq.pearson (strainVals) dilutionsSorted
                printfn "%s - Pearson WholeCell RBCS: %f" strainNames.[i] RBCSpearson
                RBCScoeff, RBCSfitVals, RBCSdetermination
        )
        |> Array.unzip3

    let chartPearsons prot1 (prot1Coeff:Vector<float>) prot1FitVals prot1Determination prot2 (prot2Coeff:Vector<float>) prot2FitVals prot2Determination strain =
        [
            Chart.Point ((Array.zip dilutionsSorted prot1),Name = sprintf "%s Quantified Ratios" prot1Name)
            |> Chart.withMarkerStyle(Size=10,Symbol = StyleParam.Symbol.Cross)
            Chart.Line(Array.zip dilutionsSorted prot1FitVals,Name = (sprintf "%s linear regression: %.2f x + (%.2f) ; R² = %.4f" prot1Name prot1Coeff.[1] prot1Coeff.[0] prot1Determination))
            |> Chart.withLineStyle(Color="lightblue",Dash=StyleParam.DrawingStyle.DashDot)

            Chart.Point ((Array.zip dilutionsSorted prot2),Name = sprintf "%s Quantified Ratios" prot2Name,MarkerSymbol = StyleParam.Symbol.Cross)
            |> Chart.withMarkerStyle(Size=10,Symbol = StyleParam.Symbol.Cross)
            Chart.Line(Array.zip dilutionsSorted prot2FitVals,Name = (sprintf "%s linear regression: %.2f x + (%.2f) ; R² = %.4f" prot2Name prot2Coeff.[1] prot2Coeff.[0] prot2Determination))
            |> Chart.withLineStyle(Color="LightGreen",Dash=StyleParam.DrawingStyle.DashDot)
        ]
        |> Chart.Combine
        |> Chart.withTitle (sprintf "%s - Whole cell extracts: Stability of %s/%s ratios between samples" strain prot1Name prot2Name)
        |> Chart.withX_Axis (xAxis false "N14 Sample / N15 QProtein ratio" 20 16)
        |> Chart.withY_Axis (yAxis false "relative quantification" 20 16)
        |> Chart.withConfig config
        |> Chart.withSize (900.,500.)
        |> Display

    // Draw one chart per strain instead of assuming exactly three strains.
    for i in 0 .. strainNames.Length - 1 do 
        chartPearsons 
            prot1.[i] prot1Coeff.[i] prot1FitVals.[i] prot1Determination.[i]
            prot2.[i] prot2Coeff.[i] prot2FitVals.[i] prot2Determination.[i]
            strainNames.[i]

rbc_L_vs_S_rbcl_RatiosS_wholeCell "rbcL" "RBCS2" sth2

Protein1:
             1                5                25               125              
rbcL 4A   -> 2.14434860519633 8.86964595582872 24.8288835525033 34.3202022154741 
     CW15 -> 2.77685213695793 10.7616821722072 41.6661502590353 118.159199406238 
     UVM  -> 2.77217227961313 11.0487232490566 20.0719202819309 37.324368517805  

4A - Pearson WholeCell RBCL: 0.864919
CW15 - Pearson WholeCell RBCL: 0.989714
UVM - Pearson WholeCell RBCL: 0.945415
Protein2:
              1                5                25               125              
RBCS2 4A   -> 1.41149255419424 7.4008650938191  17.0155337424956 41.6134857376871 
      CW15 -> 1.65064700829214 8.6351227560191  32.7054111649184 58.4920514782453 
      UVM  -> 1.80823844374316 8.79779553484811 34.6705590524958 24.8398521977448 

4A - Pearson WholeCell RBCS: 0.980164
CW15 - Pearson WholeCell RBCS: 0.935343
UVM - Pearson WholeCell RBCS: 0.490735
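
<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
To make the reported numbers easier to check, the following sketch recomputes the fit for one series in plain F#, without FSharp.Stats. It uses the textbook least-squares formulas, which are an assumption about what the library functions compute; <code>linearFit</code>, <code>rSquared</code> and <code>pearson</code> are hypothetical helper names. The data are the rbcL means of strain 4A printed above.
	</div>
</div>

```fsharp
// Ordinary least squares for y = intercept + slope * x.
let linearFit (xs: float[]) (ys: float[]) =
    let xMean = Array.average xs
    let yMean = Array.average ys
    let sxy = Array.fold2 (fun acc x y -> acc + (x - xMean) * (y - yMean)) 0.0 xs ys
    let sxx = xs |> Array.sumBy (fun x -> (x - xMean) ** 2.0)
    let slope = sxy / sxx
    let intercept = yMean - slope * xMean
    intercept, slope

// Coefficient of determination: 1 - (residual sum of squares / total sum of squares).
let rSquared (ys: float[]) (fitted: float[]) =
    let yMean = Array.average ys
    let ssRes = Array.fold2 (fun acc y f -> acc + (y - f) ** 2.0) 0.0 ys fitted
    let ssTot = ys |> Array.sumBy (fun y -> (y - yMean) ** 2.0)
    1.0 - ssRes / ssTot

// Pearson correlation coefficient.
let pearson (xs: float[]) (ys: float[]) =
    let xMean = Array.average xs
    let yMean = Array.average ys
    let sxy = Array.fold2 (fun acc x y -> acc + (x - xMean) * (y - yMean)) 0.0 xs ys
    let sxx = xs |> Array.sumBy (fun x -> (x - xMean) ** 2.0)
    let syy = ys |> Array.sumBy (fun y -> (y - yMean) ** 2.0)
    sxy / sqrt (sxx * syy)

// rbcL means for strain 4A, taken from the "Protein1" output above.
let dilutions = [| 1.; 5.; 25.; 125. |]
let rbcL4A = [| 2.14434860519633; 8.86964595582872; 24.8288835525033; 34.3202022154741 |]

let intercept, slope = linearFit dilutions rbcL4A
let fitted = dilutions |> Array.map (fun x -> intercept + slope * x)
let r2 = rSquared rbcL4A fitted
let r = pearson dilutions rbcL4A
```

<div Style="max-width: 85%">
	<div Style="text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
For a straight-line fit with an intercept, R<sup>2</sup> equals the squared Pearson correlation, so the two quality measures carry the same information here. The value of <code>r</code> can be compared with the 0.864919 printed for 4A above.
	</div>
</div>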


## References

<ol Style="max-width: 85% ; text-align: justify ; font-size: 1.8rem ; margin-top: 2rem ; line-height: 1.5">
    <li Value ="1" Id="exmp">exmp</li>
</ol>