# Basics of Using `Corpus`

One of the easiest ways to get started with allofplos is to use the `Corpus` class. 

But, you ask:

> Why use the `Corpus` class?

It is a straightforward way to get back `Article` objects from your corpus without needing to instantiate them one by one.

It also has handy utilities for more specific tasks, which we won't get into here.

> How do I use it? 

Eager, are we‽ I thought you'd never ask!

## Import Corpus

First, we need to import `Corpus`. 

We're also going to import the `starterdir` corpus directory to use the corpus that comes with `allofplos`.

```python
from allofplos import Corpus, starterdir
```

## Instantiate Corpus

Second, we need to instantiate the `Corpus` object.

In this case, we pass in `starterdir` so that `corpus` uses allofplos' starter corpus.

```python
corpus = Corpus(starterdir)
```

## Use Corpus

Now you're ready to use `corpus`!

### See how many articles are in the corpus

You can use `len(corpus)` to get the number of articles in the corpus.

```python
len(corpus)
```

122
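`len(corpus)` works because Python's built-in `len()` calls an object's `__len__` method. The sketch below shows that protocol on a tiny, hypothetical `ToyCorpus` that just wraps a list of DOI strings; it is an illustration of the Python mechanism, not how allofplos implements `Corpus`.

```python
class ToyCorpus:
    """Hypothetical stand-in for a corpus: holds DOI strings only."""

    def __init__(self, dois):
        self._dois = list(dois)

    def __len__(self):
        # Python translates len(toy) into toy.__len__()
        return len(self._dois)


toy = ToyCorpus(["10.1371/journal.pone.0126470", "10.1371/journal.pmed.1001080"])
print(len(toy))  # 2
```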

### Display a random article 

To get a single random article, use `corpus.random_article`.

A new article is sampled each time you access it, so repeated accesses will usually return different articles.

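```python
# display() is available automatically in Jupyter; import it when running elsewhere
from IPython.display import display

display(corpus.random_article)
```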

DOI: 10.1371/journal.pmed.1001186
Title: Guidance for Evidence-Informed Policies about Health Systems: Linking Guidance Development to Policy Development
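Because `random_article` is accessed without parentheses yet can return a different article on each access, it behaves like a Python `property` that draws a fresh sample every time. The hypothetical `ToyCorpus` below (not allofplos code, and holding DOI strings instead of `Article` objects) sketches that pattern. The practical takeaway: if you need the same random article more than once, store it in a variable first.

```python
import random


class ToyCorpus:
    """Hypothetical stand-in for a corpus: holds DOI strings only."""

    def __init__(self, dois):
        self._dois = list(dois)

    @property
    def random_article(self):
        # A new random choice is made on every attribute access.
        return random.choice(self._dois)


toy = ToyCorpus(["10.1371/journal.pmed.1001186", "10.1371/journal.pcbi.1004141"])

article = toy.random_article       # sample once...
same_twice = [article, article]    # ...and reuse the stored value
maybe_different = [toy.random_article, toy.random_article]  # two separate draws
```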

### Display an article with a specific DOI

If you already know the DOI of the article you're interested in, you can look the article up by its DOI, just as you would look up a key in a dictionary: `corpus[your_doi]`.

```python
corpus['10.1371/journal.pcbi.1004141']
```

DOI: 10.1371/journal.pcbi.1004141
Title: The Equivalence of Information-Theoretic and Likelihood-Based Methods for Neural Dimensionality Reduction
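Square-bracket lookup like `corpus[doi]` is Python's `__getitem__` protocol. Here is a minimal sketch using a hypothetical `ToyCorpus` that maps DOIs to titles in a plain dict. allofplos returns `Article` objects instead, and how it handles an unknown DOI isn't covered in this tutorial; in this sketch an unknown DOI simply raises `KeyError`.

```python
class ToyCorpus:
    """Hypothetical stand-in: maps DOI strings to article titles."""

    def __init__(self, titles_by_doi):
        self._titles = dict(titles_by_doi)

    def __getitem__(self, doi):
        # toy[doi] becomes toy.__getitem__(doi); unknown DOIs raise KeyError here.
        return self._titles[doi]


toy = ToyCorpus({
    "10.1371/journal.pcbi.1004141": (
        "The Equivalence of Information-Theoretic and Likelihood-Based "
        "Methods for Neural Dimensionality Reduction"
    ),
})
title = toy["10.1371/journal.pcbi.1004141"]
```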

### Access every article in the corpus

You can use Python's `for article in corpus:` syntax to do something with each article in your corpus.

The articles come back in a new random order each time you iterate over the corpus.

```python
for article in corpus:
    print("doi:", article.doi, "journal:", article.journal)
```

```
doi: 10.1371/journal.pmed.1001080 journal: PLOS Medicine
doi: 10.1371/journal.pbio.1001636 journal: PLOS Biology
doi: 10.1371/journal.pone.0126470 journal: PLOS ONE
doi: 10.1371/journal.pone.0108198 journal: PLOS ONE
doi: 10.1371/journal.pone.0147124 journal: PLOS ONE
doi: 10.1371/journal.pone.0040259 journal: PLOS ONE
doi: 10.1371/journal.pbio.1001044 journal: PLOS Biology
doi: 10.1371/journal.pone.0070598 journal: PLOS ONE
doi: 10.1371/journal.pcbi.1001051 journal: PLOS Computational Biology
doi: 10.1371/journal.pone.0118238 journal: PLOS ONE
doi: 10.1371/journal.pone.0146913 journal: PLOS ONE
doi: 10.1371/journal.pone.0160653 journal: PLOS ONE
doi: 10.1371/journal.pmed.0030445 journal: PLOS Medicine
doi: 10.1371/journal.pone.0118342 journal: PLOS ONE
doi: 10.1371/journal.ppat.1002735 journal: PLOS Pathogens
doi: 10.1371/journal.pcbi.1002484 journal: PLOS Computational Biology
doi: 10.1371/journal.pone.0116201 journal: PLOS ONE
doi: 10.1371/journal.pcbi.1000112 journal: PLOS Computat
```
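A `for` loop calls the object's `__iter__` method, and a fresh random order on every loop is what you get if `__iter__` shuffles a copy of the items each time it is called. The sketch below shows that idea on a hypothetical `ToyCorpus` holding DOI strings; it illustrates the behavior described above and is not allofplos' implementation.

```python
import random


class ToyCorpus:
    """Hypothetical stand-in for a corpus: holds DOI strings only."""

    def __init__(self, dois):
        self._dois = list(dois)

    def __iter__(self):
        # Shuffle a copy so the stored order is untouched and each loop
        # gets its own independent random order.
        order = self._dois[:]
        random.shuffle(order)
        return iter(order)


toy = ToyCorpus([
    "10.1371/journal.pmed.1001080",
    "10.1371/journal.pbio.1001636",
    "10.1371/journal.pone.0126470",
])
first_pass = [doi for doi in toy]
second_pass = [doi for doi in toy]  # same DOIs, possibly in a different order
```

If you need a reproducible order with the real corpus, sort the results yourself, for example `sorted(article.doi for article in corpus)`.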

### Access a random sample of articles

You can use the `corpus.random_sample()` method to get a random sample of articles from the corpus. 

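The best way to use it is to iterate through the sample: `for article in corpus.random_sample(x):`, where `x` is the number of articles you want.

**NB**: It returns a generator (not a list), so the sampled articles are produced one at a time as you loop instead of all being held in memory at once.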

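```python
# display() is available automatically in Jupyter; import it when running elsewhere
from IPython.display import display

for article in corpus.random_sample(50):
    display(article)
```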

DOI: 10.1371/journal.ppat.1002247
Title: Vaccinia Virus Protein C6 Is a Virulence Factor that Binds TBK-1 Adaptor Proteins and Inhibits Activation of IRF3 and IRF7

DOI: 10.1371/journal.pone.0068090
Title: Abnormal Contextual Modulation of Visual Contour Detection in Patients with Schizophrenia

DOI: 10.1371/journal.pcbi.1004113
Title: HPV Clearance and the Neglected Role of Stochasticity

DOI: 10.1371/journal.pone.0055490
Title: Genetic Testing for TMEM154 Mutations Associated with Lentivirus Susceptibility in Sheep

DOI: 10.1371/journal.pone.0117688
Title: Iterative Most-Likely Point Registration (IMLP): A Robust Algorithm for Computing Optimal Shape Alignment

DOI: 10.1371/journal.pone.0121226
Title: AIB-OR: Improving Onion Routing Circuit Construction Using Anonymous Identity-Based Cryptosystems

DOI: 10.1371/journal.pone.0153170
Title: Renal Transplant Recipients Treated with Calcineurin-Inhibitors Lack Circulating Immature Transitional CD19+CD24hiCD38hi Regulatory B-Lymphocytes

DOI: 10.1371/journal.pone.0046041
Title: Potential Role of M. tuberculosis Specific IFN-γ and IL-2 ELISPOT Assays in Discriminating Children with Active or Latent Tuberculosis

DOI: 10.1371/journal.pone.0100977
Title: Identification of a Major Phosphopeptide in Human Tristetraprolin by Phosphopeptide Mapping and Mass Spectrometry

DOI: 10.1371/journal.pbio.1001199
Title: Interplay between BRCA1 and RHAMM Regulates Epithelial Apicobasal Polarization and May Influence Risk of Breast Cancer

DOI: 10.1371/journal.pmed.1001473
Title: Uncovering Treatment Burden as a Key Concept for Stroke Care: A Systematic Review of Qualitative Research

DOI: 10.1371/journal.ppat.1002735
Title: Synergistic Parasite-Pathogen Interactions Mediated by Host Immunity Can Drive the Collapse of Honeybee Colonies

DOI: 10.1371/journal.pmed.1000097
Title: Preferred Reporting Items for Systematic Reviews and Meta-Analyses: The PRISMA Statement

DOI: 10.1371/journal.pmed.0020124
Title: Why Most Published Research Findings Are False

DOI: 10.1371/journal.ppat.1000166
Title: The Pseudomonas Quinolone Signal (PQS) Balances Life and Death in Pseudomonas aeruginosa Populations

DOI: 10.1371/journal.pcbi.1000112
Title: Evolution of Evolvability in Gene Regulatory Networks

DOI: 10.1371/journal.pone.0026358
Title: Benefit from B-Lymphocyte Depletion Using the Anti-CD20 Antibody Rituximab in Chronic Fatigue Syndrome. A Double-Blind and Placebo-Controlled Study

DOI: 10.1371/journal.pone.0087236
Title: New Material of Beelzebufo, a Hyperossified Frog (Amphibia: Anura) from the Late Cretaceous of Madagascar

DOI: 10.1371/journal.pcbi.1004089
Title: Delayed Response and Biosonar Perception Explain Movement Coordination in Trawling Bats

DOI: 10.1371/journal.pone.0005723
Title: Complete Primate Skeleton from the Middle Eocene of Messel in Germany: Morphology and Paleobiology

DOI: 10.1371/journal.pone.0120049
Title: Effects of Acute Exposure to Increased Plasma Branched-Chain Amino Acid Concentrations on Insulin-Mediated Plasma Glucose Turnover in Healthy Young Subjects

DOI: 10.1371/journal.pone.0116201
Title: Relation between Financial Market Structure and the Real Economy: Comparison between Clustering Methods

DOI: 10.1371/journal.pone.0080518
Title: Ecoinformatics Can Reveal Yield Gaps Associated with Crop-Pest Interactions: A Proof-of-Concept

DOI: 10.1371/journal.pone.0069640
Title: Alternative Immunomodulatory Strategies for Xenotransplantation: CD80/CD86-CTLA4 Pathway-Modified Immature Dendritic Cells Promote Xenograft Survival

DOI: 10.1371/journal.pone.0108198
Title: Correction: Macrophage Control of Phagocytosed Mycobacteria Is Increased by Factors Secreted by Alveolar Epithelial Cells through Nitric Oxide Independent Mechanisms

DOI: 10.1371/journal.ppat.0020025
Title: Identification of a Novel Gammaretrovirus in Prostate Tumors of Patients Homozygous for R462Q RNASEL Variant

DOI: 10.1371/journal.pmed.1001518
Title: Acupuncture and Counselling for Depression in Primary Care: A Randomised Controlled Trial

DOI: 10.1371/journal.pmed.0030205
Title: Mischievous Odds Ratios

DOI: 10.1371/journal.pone.0147124
Title: A Microarray-Based Analysis Reveals that a Short Photoperiod Promotes Hair Growth in the Arbas Cashmere Goat

DOI: 10.1371/journal.pone.0028031
Title: Polymorphisms in Genes Involved in the NF-κB Signalling Pathway Are Associated with Bone Mineral Density, Geometry and Turnover in Men

DOI: 10.1371/journal.pbio.1000359
Title: The Light-Driven Proton Pump Proteorhodopsin Enhances Bacterial Survival during Tough Times

DOI: 10.1371/journal.pmed.0030445
Title: Social Medicine in the Twenty-First Century

DOI: 10.1371/journal.pone.0078761
Title: Additive Partitioning of Coral Reef Fish Diversity across Hierarchical Spatial Scales throughout the Caribbean

DOI: 10.1371/journal.pbio.1001636
Title: Lost Branches on the Tree of Life

DOI: 10.1371/journal.pone.0117949
Title: Exact Solutions of Linear Reaction-Diffusion Processes on a Uniformly Growing Domain: Criteria for Successful Colonization

DOI: 10.1371/journal.ppat.1000105
Title: Anti-Fungal Innate Immunity in C. elegans Is Enhanced by Evolutionary Diversification of Antimicrobial Peptides

DOI: 10.1371/journal.pbio.1001289
Title: Neuroscience, Ethics, and National Security: The State of the Art

DOI: 10.1371/journal.ppat.1003133
Title: Schmallenberg Virus Pathogenesis, Tropism and Interaction with the Innate Immune System of the Host

DOI: 10.1371/journal.pbio.0030408
Title: Stimulating the Brain Makes the Fingers More Sensitive

DOI: 10.1371/journal.pmed.0030132
Title: Bigger and Better: How Pfizer Redefined Erectile Dysfunction

DOI: 10.1371/journal.pmed.0020402
Title: Tackling Inherited Blindness

DOI: 10.1371/journal.pmed.1001300
Title: Multidrug Resistant Pulmonary Tuberculosis Treatment Regimens and Patient Outcomes: An Individual Patient Data Meta-analysis of 9,153 Patients

DOI: 10.1371/journal.ppat.1005207
Title: Retraction: Extreme Resistance as a Host Counter-counter Defense against Viral Suppression of RNA Silencing

DOI: 10.1371/journal.pntd.0001041
Title: A Phase Two Randomised Controlled Double Blind Trial of High Dose Intravenous Methylprednisolone and Oral Prednisolone versus Intravenous Normal Saline and Oral Prednisolone in Individuals with Leprosy Type 1 Reactions and/or Nerve Function Impairment

DOI: 10.1371/journal.pcbi.1003292
Title: Reconstructing the Genomic Content of Microbiome Taxa through Shotgun Metagenomic Deconvolution

DOI: 10.1371/journal.pone.0138823
Title: Structure-Activity Relationship of Indole-Tethered Pyrimidine Derivatives that Concurrently Inhibit Epidermal Growth Factor Receptor and Other Angiokinases

DOI: 10.1371/journal.pcbi.1000589
Title: A Quick Guide for Developing Effective Bioinformatics Programming Skills

DOI: 10.1371/journal.pone.0078921
Title: Serum Based Diagnosis of Asthma Using Raman Spectroscopy: An Early Phase Pilot Study

DOI: 10.1371/journal.pmed.0020007
Title: Educating the Brain to Avoid Dementia: Can Mental Exercise Prevent Alzheimer Disease?

DOI: 10.1371/journal.pone.0074790
Title: Identification and Characterization of a Novel Plasmodium falciparum Adhesin Involved in Erythrocyte Invasion
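The **NB** about `random_sample()` returning a generator has a practical consequence: a generator can be consumed only once. The sketch below shows the pattern with a hypothetical `toy_random_sample` function over made-up DOI strings (an assumption for illustration, not allofplos' code): `random.sample` picks distinct items and `yield` hands them out one at a time.

```python
import random


def toy_random_sample(dois, count):
    """Hypothetical sketch: yield `count` distinct DOIs, chosen at random.

    Only the chosen DOI strings are held in memory; a real corpus could build
    each (heavier) article object inside the loop, one at a time.
    """
    for doi in random.sample(dois, count):
        yield doi


dois = [f"10.1371/journal.pone.{n:07d}" for n in range(1, 123)]  # 122 fake DOIs

sample = toy_random_sample(dois, 5)
first_pass = list(sample)   # consumes the generator
second_pass = list(sample)  # already exhausted, so this is []

# To reuse a sample, materialize it once into a list.
reusable = list(toy_random_sample(dois, 5))
```

With allofplos, the equivalent would be `articles = list(corpus.random_sample(x))`, at the cost of holding all `x` articles in memory at once.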

## Now you know!

Now you know the basics of using the `Corpus` class.

- You can create a corpus from any corpus directory on your file system with `corpus = Corpus(directory)`.
- You can see how many articles are in your corpus with `len(corpus)`.
- You can get one random article with `corpus.random_article`.
- You can get the article with a specific DOI with `corpus[doi]`.
- You can access all of the articles in a corpus iteratively with `for article in corpus:`.
- You can access `x` random articles from the corpus with `for article in corpus.random_sample(x):`.

Now it's time to check out the Article tutorial. Once it exists, we'll definitely link to it here.