# IGVF CRISPR Jamboree 2024: Perturb-seq Inference (Python)

Authors: Gene Katsevich and Logan Blaine

Date: February 21, 2024


# Overview

## Perturb-seq inference

The goal of perturb-seq inference is to quantify the extent to which the perturbation of a given genomic element impacts the expression of a given gene. We allow a range of statistical interpretations of this task. In a frequentist framework, this task can be viewed as testing the null hypothesis that the perturbation of the genomic element has no effect on the gene's expression, or as estimating the effect size of the perturbation on the gene's expression. In a Bayesian framework, this task can be viewed as estimating the posterior probability that the effect is nonzero, or as estimating the posterior mean of the effect size.

## Jamboree goals

The goal of the perturb-seq inference portion of the Jamboree is to implement a number of perturb-seq inference methodologies using common input and output formats. Following the Jamboree, these implementations will be added as modules to a Nextflow pipeline, which will then be used to benchmark their statistical and computational performance. This benchmarking effort will suggest best practices for perturb-seq inference, and will be used to inform the development of the IGVF perturb-seq analysis pipeline.

## Data format overview

The primary input to a perturb-seq inference module is a `MuData` object, which contains both the perturb-seq data and a set of element-gene pairs for which the inference is to be performed. The output of each method should be the same `MuData` object, except with an additional table containing one or more measures of association for each element-gene pair. The `MuData` format is an HDF5-based, language-agnostic format that can be imported into both R and Python. Each `MuData` object will contain a minimal set of fields required for inference, and potentially one or more optional fields that provide additional information. For the purposes of this Jamboree, we have provided `MuData` objects for subsets of the Gasperini et al. (2019) and Papalexi et al. (2021) datasets. For each dataset, we have provided a minimal `MuData` object that contains just the required fields, as well as a more fleshed-out object that contains additional optional fields.

## Requested function API, documentation, and demonstration

Please write a function in your language of choice with the following arguments: 

- The first argument should be `mudata_input_fp`, a filepath to a `MuData` object.
- The second argument should be `mudata_output_fp`, a filepath to an output `MuData` object.
- There may be one or more additional arguments specific to your method. 

The function should read the `MuData` object from `mudata_input_fp` (in Python, via `mudata.read_h5mu()`), perform the inference, and write the resulting `MuData` object to `mudata_output_fp` (in Python, via the object's `.write()` method). The function should document any additional arguments. Furthermore, please include a demonstration of your function on at least one of the sample datasets provided, along with a brief discussion of the results.
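
As a rough illustration of this API, here is a skeleton in Python. The function name `run_my_method`, the extra argument `alpha`, and the placeholder `p_value` column are hypothetical; a real module would replace the placeholder with its own inference. The `mudata` import sits inside the function only so that the skeleton can be inspected without `mudata` installed.

```python
import inspect

import numpy as np
import pandas as pd


def run_my_method(mudata_input_fp, mudata_output_fp, alpha=0.05):
    """Hypothetical template for a Jamboree inference module.

    Parameters
    ----------
    mudata_input_fp : str
        Path to the input MuData (.h5mu) file.
    mudata_output_fp : str
        Path where the output MuData (.h5mu) file is written.
    alpha : float
        Placeholder for a method-specific argument (hypothetical, unused here).
    """
    import mudata as md  # imported here so the template can be inspected without mudata

    mdata = md.read_h5mu(mudata_input_fp)
    pairs = pd.DataFrame(mdata.uns["pairs_to_test"])
    test_results = pairs.copy()
    # Placeholder: replace with the association measure(s) your method computes
    test_results["p_value"] = np.nan
    mdata.uns["test_results"] = test_results
    mdata.write(mudata_output_fp)
    return mdata


# The first two arguments follow the required order
print(list(inspect.signature(run_my_method).parameters))
# ['mudata_input_fp', 'mudata_output_fp', 'alpha']
```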

# `MuData` format

Let us walk through the input and output format specifications, from the perspective of Python, using a subset of the Gasperini et al. (2019) dataset as an example.

In [1]:
import mudata as md
import pandas as pd

data_dir = "/Users/ljb80/Projects/igvf-jamboree/sceptre-igvf/data/inference"  # adjust to where the Jamboree data are stored

## Required input fields

We start with an example of the minimal `MuData` object required for perturb-seq inference.

In [2]:
mudata_input_fp = f"{data_dir}/gasperini_inference_input_minimal.h5mu"
input_minimal = md.read_h5mu(mudata_input_fp)
input_minimal



The minimal `MuData` object for perturb-seq inference contains two modalities: `gene` and `guide`. 

### `gene` modality

The `gene` modality just needs to have a `.X` matrix containing the RNA UMI counts. 

In [3]:
input_minimal["gene"].X
input_minimal["gene"][:2, :2].to_df()

| | ENSG00000008853 | ENSG00000104679 |
|---|---|---|
| GCTTGAATCGAATGCT-1_1B_1_SI-GA-F2 | 0.0 | 2.0 |
| AGCTTGATCGAGAGCA-1_1A_2_SI-GA-E3 | 0.0 | 0.0 |


### `guide` modality

The `guide` modality needs to have a `.X` matrix containing the gRNA UMI counts, as well as a `.layers['guide_assignment']` matrix containing the binary gRNA assignments. 

In [4]:
input_minimal["guide"].X
input_minimal["guide"].layers["guide_assignment"]

<9704x55 sparse matrix of type '<class 'numpy.float64'>'
	with 10563 stored elements in Compressed Sparse Row format>

We can view a couple of rows and columns of each (only the last expression in the cell, the assignments, is displayed):

In [5]:
cell_ids = ["GCTTGAATCGAATGCT-1_1B_1_SI-GA-F2", "GGTGAAGCACCAGGCT-1_1A_6_SI-GA-E7"]
grna_ids = ["GCCCTGCTACCCACTTACAG", "ATGTAGAAGGAGACACCGGG"]
pd.DataFrame(
    input_minimal["guide"][cell_ids, grna_ids].X.toarray(),
    index=cell_ids,
    columns=grna_ids,
)

pd.DataFrame(
    input_minimal["guide"][cell_ids, grna_ids].layers["guide_assignment"].toarray(),
    index=cell_ids,
    columns=grna_ids,
)

| | GCCCTGCTACCCACTTACAG | ATGTAGAAGGAGACACCGGG |
|---|---|---|
| GCTTGAATCGAATGCT-1_1B_1_SI-GA-F2 | 1.0 | 0.0 |
| GGTGAAGCACCAGGCT-1_1A_6_SI-GA-E7 | 0.0 | 1.0 |
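
The `guide_assignment` layer is provided with the data, so inference modules do not need to compute it. Purely for intuition, here is a sketch of one simple way a binary assignment could be derived from guide UMI counts: call a guide present in a cell when its UMI count reaches a fixed threshold. The threshold rule, the default of 5 UMIs, and the function name `threshold_assign` are assumptions, not necessarily how the provided assignments were made.

```python
import numpy as np
import scipy.sparse as sp


def threshold_assign(guide_counts, min_umis=5):
    """Hypothetical rule: assign a guide to a cell if its UMI count >= min_umis.

    guide_counts: cells x guides matrix (dense or sparse) of gRNA UMI counts.
    Returns a sparse cells x guides matrix of 0.0/1.0 values.
    """
    if min_umis <= 0:
        raise ValueError("min_umis must be positive to keep the result sparse")
    counts = sp.csr_matrix(guide_counts)
    assigned = counts >= min_umis  # sparse Boolean matrix
    return assigned.astype(np.float64)


counts = sp.csr_matrix(np.array([[9, 0], [1, 12], [0, 0]]))
print(threshold_assign(counts).toarray())
```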


In addition to the guide UMI counts and assignments, the `guide` modality must contain certain metadata. This includes a `.var` data frame containing at least the binary variable `targeting` (`TRUE` if the guide targets a genomic element of interest, `FALSE` if it is safe-targeting or non-targeting) and the string variable `intended_target_name` (the name of the genomic element targeted by the guide).

In [6]:
input_minimal["guide"].var.iloc[[0, 1, 20, 21, 30, 31]]

| | targeting | intended_target_name |
|---|---|---|
| ATGTAGAAGGAGACACCGGG | True | ENSG00000012660 |
| GCGCAGAGGCGGATGTAGAG | True | ENSG00000012660 |
| ACACCCTCATTAGAACCCAG | True | candidate_enh_1 |
| TTAAGAGCCTCGGTTCCCCT | True | candidate_enh_1 |
| GACCTCCTGTGATCAGGTGG | False | non-targeting |
| ATTGGTATCCGTATAAGCAG | False | non-targeting |


Note that the `targeting` column is a string rather than a Boolean due to type compatibility issues involving R, Python, and HDF5. It can be cast to a Boolean if desired. 
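
A minimal sketch of that cast, assuming the column holds values such as `"TRUE"`/`"FALSE"` or `"True"`/`"False"` (the exact spelling can depend on how the file was written, so the comparison is case-insensitive and also accepts real Booleans). The helper name `targeting_as_bool` is hypothetical.

```python
import pandas as pd


def targeting_as_bool(guide_var):
    """Return the `targeting` column of a guide .var table as a Boolean Series.

    Works whether the column holds real Booleans or strings like "TRUE"/"False".
    """
    return guide_var["targeting"].astype(str).str.upper() == "TRUE"


guide_var = pd.DataFrame(
    {
        "targeting": ["TRUE", "True", "FALSE"],
        "intended_target_name": ["ENSG00000012660", "candidate_enh_1", "non-targeting"],
    },
    index=["gA", "gB", "gC"],
)
print(targeting_as_bool(guide_var).tolist())  # [True, True, False]
```

On the sample data this would be called as `targeting_as_bool(input_minimal["guide"].var)`.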

Finally, the `guide` modality must contain `.uns` fields called `moi` (`"low"` or `"high"`) and `capture_method` (`"CROP-seq"` or `"direct capture"`). These are stored as length-one arrays, so we index their first element:

In [7]:
input_minimal["guide"].uns["capture_method"][0]
input_minimal["guide"].uns["moi"][0]

'high'
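
Because these values come back as length-one arrays, a small helper can unwrap them and check them against the allowed values. This is a sketch; the helper name `uns_scalar` is hypothetical, and decoding `bytes` is a precaution in case a value is read back as a byte string.

```python
import numpy as np


def uns_scalar(value, allowed=None):
    """Unwrap a length-one .uns entry (array, list, or scalar) into a Python str."""
    arr = np.asarray(value).ravel()
    if arr.size != 1:
        raise ValueError(f"expected a single value, got {arr.size}")
    item = arr[0]
    if isinstance(item, bytes):  # numpy byte strings subclass bytes
        item = item.decode("utf-8")
    item = str(item)
    if allowed is not None and item not in allowed:
        raise ValueError(f"{item!r} is not one of {sorted(allowed)}")
    return item


print(uns_scalar(np.array(["high"]), allowed={"low", "high"}))  # high
```

For example, `uns_scalar(input_minimal["guide"].uns["moi"], allowed={"low", "high"})`.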

### Global `.uns`

The input `MuData` object is also required to have a global `.uns` field named `pairs_to_test`, which is a data frame containing the pairs of elements (specified via `intended_target_name`) and genes (specified via `gene_id`) for which the inference is to be performed.

In [8]:
pd.DataFrame(input_minimal.uns["pairs_to_test"])

| | gene_id | intended_target_name |
|---|---|---|
| 0 | ENSG00000187109 | ENSG00000187109 |
| 1 | ENSG00000114850 | ENSG00000114850 |
| 2 | ENSG00000134851 | ENSG00000134851 |
| 3 | ENSG00000163866 | ENSG00000163866 |
| 4 | ENSG00000181610 | ENSG00000181610 |
| ... | ... | ... |
| 105 | ENSG00000106789 | candidate_enh_2 |
| 106 | ENSG00000125482 | candidate_enh_3 |
| 107 | ENSG00000095380 | candidate_enh_2 |
| 108 | ENSG00000158941 | candidate_enh_1 |
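
Before testing, it can be worth checking that every pair refers to an element present in the guide `.var` and a gene present in the gene modality. A minimal sketch in plain pandas; the function name `check_pairs` is hypothetical.

```python
import pandas as pd


def check_pairs(pairs_to_test, guide_var, gene_ids):
    """Return the rows of pairs_to_test whose element or gene is missing.

    pairs_to_test: DataFrame with columns gene_id and intended_target_name.
    guide_var: the guide modality's .var (must contain intended_target_name).
    gene_ids: gene IDs of the gene modality (e.g. mdata["gene"].var_names).
    """
    known_targets = set(guide_var["intended_target_name"])
    known_genes = set(gene_ids)
    bad_target = ~pairs_to_test["intended_target_name"].isin(known_targets)
    bad_gene = ~pairs_to_test["gene_id"].isin(known_genes)
    return pairs_to_test[bad_target | bad_gene]


pairs = pd.DataFrame(
    {"gene_id": ["G1", "G2", "G3"], "intended_target_name": ["e1", "e1", "e9"]}
)
guide_var = pd.DataFrame({"intended_target_name": ["e1", "e1", "non-targeting"]})
print(check_pairs(pairs, guide_var, pd.Index(["G1", "G3"])))
```

On the sample data, an empty result from `check_pairs(pd.DataFrame(input_minimal.uns["pairs_to_test"]), input_minimal["guide"].var, input_minimal["gene"].var_names)` would indicate that all pairs can be tested.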


## Optional input fields

Next we consider optional fields that can be included in the input `MuData` object.

In [9]:
mudata_input_fp = f"{data_dir}/gasperini_inference_input.h5mu"
input_optional = md.read_h5mu(mudata_input_fp)
input_optional



### `gene` modality

The `MuData` object may include cellwise covariates for the `gene` modality in `.mod['gene'].obs`, such as the number of genes with nonzero UMI counts (`num_expressed_genes`) and the total number of RNA UMIs (`total_gene_umis`):

In [10]:
input_optional["gene"].obs

| | num_expressed_genes | total_gene_umis |
|---|---|---|
| GCTTGAATCGAATGCT-1_1B_1_SI-GA-F2 | 41 | 280.0 |
| AGCTTGATCGAGAGCA-1_1A_2_SI-GA-E3 | 35 | 192.0 |
| CCCAATCTCCTCAATT-1_1B_1_SI-GA-F2 | 41 | 781.0 |
| CGCGGTACACTGTCGG-1_1A_2_SI-GA-E3 | 37 | 189.0 |
| GGACGTCTCATGTCTT-1_1B_8_SI-GA-F9 | 32 | 262.0 |
| ... | ... | ... |
| CGCTATCTCTATCGCC-1_2A_4_SI-GA-G5 | 23 | 203.0 |
| TCACAAGCAGCCTTGG-1_2A_6_SI-GA-G7 | 30 | 173.0 |
| GCTGCAGGTGAAGGCT-1_2B_6_SI-GA-H7 | 37 | 428.0 |
| GGATTACCATGTTGAC-1_2A_4_SI-GA-G5 | 47 | 658.0 |
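
These covariates are provided with the data, but they are simple functions of a count matrix. Here is a sketch of how they could be recomputed from a cells × features UMI matrix, assuming "expressed" means a nonzero count. The function name `count_covariates` is hypothetical.

```python
import numpy as np
import pandas as pd
import scipy.sparse as sp


def count_covariates(counts, obs_names, prefix="gene"):
    """Per-cell number of features with nonzero UMIs and total UMIs.

    counts: cells x features matrix (dense or sparse) of UMI counts.
    Returns a DataFrame with columns num_expressed_{prefix}s and total_{prefix}_umis.
    """
    counts = sp.csr_matrix(counts)
    num_expressed = np.asarray((counts > 0).sum(axis=1)).ravel()
    total_umis = np.asarray(counts.sum(axis=1)).ravel()
    return pd.DataFrame(
        {f"num_expressed_{prefix}s": num_expressed, f"total_{prefix}_umis": total_umis},
        index=obs_names,
    )


example = count_covariates(np.array([[0, 2, 3], [0, 0, 0], [1, 0, 0]]), ["c1", "c2", "c3"])
print(example)
```

The same function with `prefix="guide"` produces the `num_expressed_guides` and `total_guide_umis` columns used for the `guide` modality, e.g. `count_covariates(input_optional["guide"].X, input_optional["guide"].obs_names, prefix="guide")`.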


The `MuData` object may include per-gene metadata in `.mod['gene'].var`, such as the HGNC gene symbol (`symbol`), the gene's chromosome (`gene_chr`), and its start (`gene_start`) and end (`gene_end`) coordinates:

In [11]:
input_optional["gene"].var

| | symbol | gene_chr | gene_start | gene_end |
|---|---|---|---|---|
| ENSG00000008853 | RHOBTB2 | chr8 | 22844930.0 | 22844931.0 |
| ENSG00000104679 | R3HCC1 | chr8 | 23145421.0 | 23145422.0 |
| ENSG00000104689 | TNFRSF10A | chr8 | 23082573.0 | 23082574.0 |
| ENSG00000120889 | TNFRSF10B | chr8 | 22926533.0 | 22926534.0 |
| ENSG00000120896 | SORBS3 | chr8 | 22409208.0 | 22409209.0 |
| ... | ... | ... | ... | ... |
| ENSG00000114850 | SSR3 | chr3 | 156271913.0 | 156271914.0 |
| ENSG00000072274 | TFRC | chr3 | 195808960.0 | 195808961.0 |
| ENSG00000134851 | TMEM165 | chr4 | 56262124.0 | 56262125.0 |
| ENSG00000198899 | | | | |


### `guide` modality

The `MuData` object may include cellwise covariates for the `guide` modality in `.mod['guide'].obs`, such as the number of guides with nonzero UMI counts (`num_expressed_guides`) and the total number of guide UMIs (`total_guide_umis`):

In [12]:
input_optional["guide"].obs

| | num_expressed_guides | total_guide_umis |
|---|---|---|
| GCTTGAATCGAATGCT-1_1B_1_SI-GA-F2 | 1 | 9.0 |
| AGCTTGATCGAGAGCA-1_1A_2_SI-GA-E3 | 1 | 18.0 |
| CCCAATCTCCTCAATT-1_1B_1_SI-GA-F2 | 1 | 24.0 |
| CGCGGTACACTGTCGG-1_1A_2_SI-GA-E3 | 1 | 26.0 |
| GGACGTCTCATGTCTT-1_1B_8_SI-GA-F9 | 1 | 12.0 |
| ... | ... | ... |
| CGCTATCTCTATCGCC-1_2A_4_SI-GA-G5 | 1 | 5.0 |
| TCACAAGCAGCCTTGG-1_2A_6_SI-GA-G7 | 1 | 39.0 |
| GCTGCAGGTGAAGGCT-1_2B_6_SI-GA-H7 | 1 | 21.0 |
| GGATTACCATGTTGAC-1_2A_4_SI-GA-G5 | 1 | 73.0 |


The `MuData` object may include per-guide metadata in `.mod['guide'].var` in addition to the required `targeting` and `intended_target_name` fields, such as the chromosome (`intended_target_chr`), start (`intended_target_start`), and end (`intended_target_end`) of the targeted element:

In [13]:
input_optional["guide"].var.iloc[[0, 1, 20, 21, 30, 31]]

| | targeting | intended_target_name | intended_target_chr | intended_target_start | intended_target_end |
|---|---|---|---|---|---|
| ATGTAGAAGGAGACACCGGG | True | ENSG00000012660 | chr6 | 53213723.0 | 53213738.0 |
| GCGCAGAGGCGGATGTAGAG | True | ENSG00000012660 | chr6 | 53213738.0 | 53213754.0 |
| ACACCCTCATTAGAACCCAG | True | candidate_enh_1 | chr8 | 23366136.0 | 23366564.0 |
| TTAAGAGCCTCGGTTCCCCT | True | candidate_enh_1 | chr8 | 23366564.0 | 23366992.0 |
| GACCTCCTGTGATCAGGTGG | False | non-targeting | | -9.0 | -9.0 |
| ATTGGTATCCGTATAAGCAG | False | non-targeting | | -9.0 | -9.0 |


### Global `.obs`

Optionally, the `MuData` input object can contain a global `obs` field containing cell-level information that is not specific to modality, such as batch information. Here is what it looks like for the Gasperini data:

In [14]:
input_optional.obs[["prep_batch", "within_batch_chip", "within_chip_lane"]]

| | prep_batch | within_batch_chip | within_chip_lane |
|---|---|---|---|
| GCTTGAATCGAATGCT-1_1B_1_SI-GA-F2 | prep_batch_1 | within_batch_chip_B | within_chip_lane_1 |
| AGCTTGATCGAGAGCA-1_1A_2_SI-GA-E3 | prep_batch_1 | within_batch_chip_A | within_chip_lane_2 |
| CCCAATCTCCTCAATT-1_1B_1_SI-GA-F2 | prep_batch_1 | within_batch_chip_B | within_chip_lane_1 |
| CGCGGTACACTGTCGG-1_1A_2_SI-GA-E3 | prep_batch_1 | within_batch_chip_A | within_chip_lane_2 |
| GGACGTCTCATGTCTT-1_1B_8_SI-GA-F9 | prep_batch_1 | within_batch_chip_B | within_chip_lane_8 |
| ... | ... | ... | ... |
| CGCTATCTCTATCGCC-1_2A_4_SI-GA-G5 | prep_batch_2 | within_batch_chip_A | within_chip_lane_4 |
| TCACAAGCAGCCTTGG-1_2A_6_SI-GA-G7 | prep_batch_2 | within_batch_chip_A | within_chip_lane_6 |
| GCTGCAGGTGAAGGCT-1_2B_6_SI-GA-H7 | prep_batch_2 | within_batch_chip_B | within_chip_lane_6 |
| GGATTACCATGTTGAC-1_2A_4_SI-GA-G5 | prep_batch_2 | within_batch_chip_A | within_chip_lane_4 |
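
Methods that adjust for batch typically need these categorical columns encoded numerically. One common choice, sketched here as an assumption rather than a requirement of the format, is a one-hot (dummy) encoding that drops one reference level per column. The helper name `batch_design` is hypothetical.

```python
import pandas as pd


def batch_design(obs, columns):
    """One-hot encode the given categorical obs columns, dropping one level each."""
    return pd.get_dummies(obs[columns], drop_first=True, dtype=float)


obs = pd.DataFrame(
    {
        "prep_batch": ["prep_batch_1", "prep_batch_2", "prep_batch_1"],
        "within_batch_chip": ["within_batch_chip_A", "within_batch_chip_B", "within_batch_chip_B"],
    },
    index=["c1", "c2", "c3"],
)
print(batch_design(obs, ["prep_batch", "within_batch_chip"]))
```

On the sample data, `batch_design(input_optional.obs, ["prep_batch"])` would give a single indicator column for the second preparation batch.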


### Pairs to test 

Optionally, `.uns['pairs_to_test']` can have a third column, `pair_type`:

In [15]:
pd.DataFrame(input_optional.uns["pairs_to_test"])

| | gene_id | intended_target_name | pair_type |
|---|---|---|---|
| 0 | ENSG00000187109 | ENSG00000187109 | positive_control |
| 1 | ENSG00000114850 | ENSG00000114850 | positive_control |
| 2 | ENSG00000134851 | ENSG00000134851 | positive_control |
| 3 | ENSG00000163866 | ENSG00000163866 | positive_control |
| 4 | ENSG00000181610 | ENSG00000181610 | positive_control |
| ... | ... | ... | ... |
| 105 | ENSG00000106789 | candidate_enh_2 | discovery |
| 106 | ENSG00000125482 | candidate_enh_3 | discovery |
| 107 | ENSG00000095380 | candidate_enh_2 | discovery |
| 108 | ENSG00000158941 | candidate_enh_1 | discovery |


This optional column classifies pairs based on whether they are intended to be positive controls (an association is known to exist), negative controls (an association is known not to exist), or discovery pairs (pairs where it is unknown whether an association exists). This information need not be used by the inference module, but it is useful for downstream analysis.

## Output fields

The output should be the same `MuData` object as the input, with the addition of a `test_results` field to the global `.uns`:

In [16]:
mudata_output_fp = f"{data_dir}/gasperini_inference_output.h5mu"
output_optional = md.read_h5mu(mudata_output_fp)
output_optional



In [17]:
pd.DataFrame(output_optional.uns["test_results"])

| | gene_id | intended_target_name | log2_fc | p_value | pair_type |
|---|---|---|---|---|---|
| 0 | ENSG00000187109 | ENSG00000187109 | -0.774367 | 3.217223e-85 | positive_control |
| 1 | ENSG00000114850 | ENSG00000114850 | -1.849572 | 2.414163e-79 | positive_control |
| 2 | ENSG00000134851 | ENSG00000134851 | -0.893860 | 4.309833e-50 | positive_control |
| 3 | ENSG00000163866 | ENSG00000163866 | -1.223700 | 4.704066e-49 | positive_control |
| 4 | ENSG00000181610 | ENSG00000181610 | -1.314285 | 3.766690e-42 | positive_control |
| ... | ... | ... | ... | ... | ... |
| 105 | ENSG00000106789 | candidate_enh_2 | 0.079632 | 6.660000e-01 | discovery |
| 106 | ENSG00000125482 | candidate_enh_3 | 0.144014 | 8.900000e-01 | discovery |
| 107 | ENSG00000095380 | candidate_enh_2 | -0.165492 | 3.400000e-02 | discovery |
| 108 | ENSG00000158941 | candidate_enh_1 | 0.117617 | 7.980000e-01 | discovery |


This is a data frame containing the same columns as the `pairs_to_test` data frame, plus at least one column containing a measure of the association for each pair. These columns can be `p_value`, `log2_fc`, `posterior_probability`, or any other measure of association.
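
A minimal sketch of assembling this table, assuming a method has produced per-pair statistics in a separate data frame keyed by `gene_id` and `intended_target_name` (the `stats` values below are made up for illustration, and `build_test_results` is a hypothetical name). A right merge on the two key columns keeps every requested pair, together with its `pair_type` if present; pairs the method skipped get missing values.

```python
import pandas as pd

KEYS = ["gene_id", "intended_target_name"]


def build_test_results(pairs_to_test, stats):
    """Attach per-pair statistics to pairs_to_test, keeping every requested pair."""
    results = stats.merge(pairs_to_test, on=KEYS, how="right")
    extra = [c for c in results.columns if c not in pairs_to_test.columns]
    if not extra:
        raise ValueError("test_results needs at least one association column")
    return results


pairs = pd.DataFrame(
    {"gene_id": ["G1", "G2"], "intended_target_name": ["e1", "e1"], "pair_type": ["discovery", "discovery"]}
)
stats = pd.DataFrame({"gene_id": ["G1"], "intended_target_name": ["e1"], "p_value": [0.01]})
print(build_test_results(pairs, stats))
```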

# Sample submission

Here we present a sample Jamboree submission that uses scanpy's differential expression tests (`sc.tl.rank_genes_groups`) to perform a frequentist analysis.

In [18]:
import scanpy as sc
import numpy as np


## Function

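Here is a helper function that, for a single targeted element, compares gene expression in cells assigned a guide against that element with expression in all other cells, and returns a p-value and log2 fold change for each gene: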


In [19]:
def run_scanpy_test_for_target(
    mdata, intended_target_name, gene_ids=None, method="wilcoxon"
):
    """Compare expression between cells with and without a guide against
    `intended_target_name`, using sc.tl.rank_genes_groups with the given method."""
    # Guides (columns of the guide modality) that target this element
    targeting_guides = (
        mdata["guide"].var["intended_target_name"] == intended_target_name
    ).to_numpy()
    # Number of assigned targeting guides per cell, as a flat 1-D array
    targeted_cells = np.asarray(
        mdata["guide"][:, targeting_guides].layers["guide_assignment"].sum(axis=1)
    ).ravel()
    if gene_ids is None:
        adata = mdata["gene"].copy()
    else:
        adata = mdata["gene"][:, list(gene_ids)].copy()
    adata.obs = adata.obs.assign(
        guide_status=pd.Categorical(
            np.where(targeted_cells > 0, "present", "not_present")
        )
    )
    # rank_genes_groups expects log-normalized data; the results shown here were
    # computed on raw UMI counts. To normalize (ideally before subsetting genes):
    # sc.pp.normalize_total(adata)
    # sc.pp.log1p(adata)
    # sc.pp.regress_out(adata, "prep_batch")  # requires prep_batch in adata.obs
    sc.tl.rank_genes_groups(
        adata,
        groupby="guide_status",
        groups=["present"],
        reference="rest",
        method=method,
    )
    test_results_obj = adata.uns["rank_genes_groups"]
    test_results_df = pd.DataFrame(
        {
            "gene_id": test_results_obj["names"]["present"],
            "intended_target_name": intended_target_name,
            "p_value": test_results_obj["pvals"]["present"],
            "log2_fc": test_results_obj["logfoldchanges"]["present"],
        }
    )
    return test_results_df

In [20]:
run_scanpy_test_for_target(input_minimal, "candidate_enh_1")



| | gene_id | intended_target_name | p_value | log2_fc |
|---|---|---|---|---|
| 0 | ENSG00000148303 | candidate_enh_1 | 3.059994e-03 | 5.771828 |
| 1 | ENSG00000114850 | candidate_enh_1 | 4.001935e-03 | 0.482913 |
| 2 | ENSG00000175854 | candidate_enh_1 | 5.329449e-03 | 0.473417 |
| 3 | ENSG00000095380 | candidate_enh_1 | 5.936656e-03 | 0.377006 |
| 4 | ENSG00000167112 | candidate_enh_1 | 7.017157e-03 | 0.337333 |
| ... | ... | ... | ... | ... |
| 107 | ENSG00000148288 | candidate_enh_1 | 5.890089e-01 | -0.074178 |
| 108 | ENSG00000228253 | candidate_enh_1 | 4.983945e-01 | -0.181934 |
| 109 | ENSG00000106785 | candidate_enh_1 | 4.730550e-01 | -0.209303 |
| 110 | ENSG00000165698 | candidate_enh_1 | 4.635750e-01 | -0.542389 |
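
With `method="wilcoxon"`, `rank_genes_groups` performs a Wilcoxon rank-sum test of each gene between the `present` and `not_present` groups. For intuition, a comparable test for a single gene can be run with `scipy.stats.mannwhitneyu` (the Mann-Whitney U test, which is equivalent to the rank-sum test). This is a swapped-in implementation, so its p-values need not match scanpy's exactly (the two differ in details such as tie handling); the simulated counts and the helper name `rank_sum_test` are made up for illustration.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def rank_sum_test(expr, targeted):
    """Two-sided rank-sum test of one gene's expression, targeted vs. other cells.

    expr: 1-D array of the gene's expression across cells.
    targeted: 1-D Boolean array marking cells that received a targeting guide.
    """
    expr = np.asarray(expr, dtype=float)
    targeted = np.asarray(targeted, dtype=bool)
    result = mannwhitneyu(expr[targeted], expr[~targeted], alternative="two-sided")
    return result.pvalue


rng = np.random.default_rng(0)
targeted = np.zeros(400, dtype=bool)
targeted[:100] = True
# Simulated counts: targeted cells have lower mean expression (a knockdown)
expr = np.where(targeted, rng.poisson(2.0, 400), rng.poisson(6.0, 400))
print(rank_sum_test(expr, targeted))
```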


In [33]:
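def run_scanpy_test_for_pairs(mdata, method="wilcoxon"):
    """Test every element in .uns["pairs_to_test"] against its paired genes and
    store the results in .uns["test_results"] (modifies mdata in place)."""
    pairs_to_test = pd.DataFrame(mdata.uns["pairs_to_test"])
    targets = pairs_to_test["intended_target_name"].unique()
    test_results_list = []
    for target in targets:
        # Genes paired with this element
        tested_genes = pairs_to_test.loc[
            pairs_to_test["intended_target_name"] == target, "gene_id"
        ]
        test_results_target = run_scanpy_test_for_target(
            mdata, target, tested_genes, method=method
        )
        test_results_list.append(test_results_target)
    # Right merge keeps exactly the requested pairs (and pair_type, if present)
    mdata.uns["test_results"] = pd.concat(test_results_list).merge(
        pairs_to_test, on=["gene_id", "intended_target_name"], how="right"
    )
    return mdata


mdata_output = run_scanpy_test_for_pairs(input_optional)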






In [34]:
mdata_output.uns["test_results"].sort_values("p_value").head(20)

| | gene_id | intended_target_name | p_value | log2_fc | pair_type |
|---|---|---|---|---|---|
| 1 | ENSG00000114850 | ENSG00000114850 | 4.611026e-125 | -4.046391 | positive_control |
| 0 | ENSG00000187109 | ENSG00000187109 | 8.623445e-72 | -8.093996 | positive_control |
| 3 | ENSG00000163866 | ENSG00000163866 | 1.686899e-64 | -2.530240 | positive_control |
| 4 | ENSG00000181610 | ENSG00000181610 | 3.738848e-63 | -3.105063 | positive_control |
| 5 | ENSG00000113552 | ENSG00000113552 | 1.520147e-55 | -2.536467 | positive_control |
| 2 | ENSG00000134851 | ENSG00000134851 | 6.193244e-47 | -1.858057 | positive_control |
| 6 | ENSG00000012660 | ENSG00000012660 | 6.921034e-34 | -1.996503 | positive_control |
| 15 | ENSG00000147454 | candidate_enh_1 | 1.439739e-24 | -3.986566 | discovery |
| 7 | ENSG00000143321 | ENSG00000143321 | 4.435129e-11 | -1.215734 | positive_control |
| 12 | ENSG00000165702 | candidate_enh_4 | 4.671225e-10 | -1.052339 | discovery |
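
When many discovery pairs are tested, a common downstream step is to control the false discovery rate. Here is a sketch of the Benjamini-Hochberg adjustment in numpy; it is an illustrative addition rather than part of the required output, the helper name `bh_adjust` is hypothetical, and it assumes there are no missing p-values.

```python
import numpy as np


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p_values, dtype=float)  # assumes no NaNs
    n = p.size
    order = np.argsort(p)
    # p_(i) * n / i for the sorted p-values, i = 1..n
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p-value downward, then cap at 1
    adjusted_sorted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    adjusted = np.empty(n)
    adjusted[order] = adjusted_sorted
    return adjusted


print(bh_adjust([0.01, 0.04, 0.03, 0.5]))
```

For example, the discovery pairs could be adjusted with `res = mdata_output.uns["test_results"]`, `is_disc = res["pair_type"] == "discovery"`, and `res.loc[is_disc, "p_adj"] = bh_adjust(res.loc[is_disc, "p_value"])`.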


In [35]:
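def run_scanpy_test(mudata_input_fp, mudata_output_fp, method="wilcoxon"):
    """Read MuData from mudata_input_fp, add .uns["test_results"], write to mudata_output_fp.
    method: test passed to sc.tl.rank_genes_groups ("wilcoxon", "t-test", "t-test_overestim_var")."""
    mdata = md.read_h5mu(mudata_input_fp)
    mdata_out = run_scanpy_test_for_pairs(mdata, method=method)
    mdata_out.write(mudata_output_fp)
    return mdata_out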


## Demonstration

Here is a demonstration of this function applied to the Gasperini data, using three of the test methods available in `sc.tl.rank_genes_groups`:

In [24]:
mudata_input_fp = f"{data_dir}/gasperini_inference_input.h5mu"

output_scanpy = dict()
for method in ["t-test", "wilcoxon", "t-test_overestim_var"]:
    mudata_output_fp = f"gasperini_inference_output_scanpy_{method}.h5mu"
    output_scanpy[method] = run_scanpy_test(mudata_input_fp, mudata_output_fp, method=method)
output_scanpy






{'t-test': MuData object with n_obs × n_vars = 9704 × 167
   obs:	'prep_batch', 'within_batch_chip', 'within_chip_lane'
   uns:	'pairs_to_test', 'test_results'
   2 modalities
     gene:	9704 x 112
       obs:	'num_expressed_genes', 'total_gene_umis'
       var:	'symbol', 'gene_chr', 'gene_start', 'gene_end'
     guide:	9704 x 55
       obs:	'num_expressed_guides', 'total_guide_umis'
       var:	'targeting', 'intended_target_name', 'intended_target_chr', 'intended_target_start', 'intended_target_end'
       uns:	'capture_method', 'moi'
       layers:	'guide_assignment',
 'wilcoxon': MuData object with n_obs × n_vars = 9704 × 167
   obs:	'prep_batch', 'within_batch_chip', 'within_chip_lane'
   uns:	'pairs_to_test', 'test_results'
   2 modalities
     gene:	9704 x 112
       obs:	'num_expressed_genes', 'total_gene_umis'
       var:	'symbol', 'gene_chr', 'gene_start', 'gene_end'
     guide:	9704 x 55
       obs:	'num_expressed_guides', 'total_guide_umis'
       var:	'targeting', 'intended

In [25]:
for method, mdata in output_scanpy.items():
    print(method)
    print(mdata.uns["test_results"].sort_values("p_value").head(20))

t-test
     index          gene_id intended_target_name        p_value   log2_fc  \
1        1  ENSG00000114850      ENSG00000114850  1.680092e-121 -4.046391   
5        5  ENSG00000113552      ENSG00000113552   4.495481e-71 -2.536467   
3        3  ENSG00000163866      ENSG00000163866   1.913215e-67 -2.530240   
0        0  ENSG00000187109      ENSG00000187109   1.268217e-60 -8.093996   
4        4  ENSG00000181610      ENSG00000181610   4.334996e-51 -3.105063   
2        2  ENSG00000134851      ENSG00000134851   6.910590e-48 -1.858057   
6        6  ENSG00000012660      ENSG00000012660   4.269067e-38 -1.996503   
109    109  ENSG00000147454      candidate_enh_1   1.019083e-27 -3.986566   
68      68  ENSG00000135046      candidate_enh_5   4.361029e-19 -2.456676   
85      85  ENSG00000165702      candidate_enh_4   8.165343e-19 -1.052339   
7        7  ENSG00000143321      ENSG00000143321   6.059363e-12 -1.215734   
66      66  ENSG00000106992      candidate_enh_3   1.718332e-09 -0.80