-
Notifications
You must be signed in to change notification settings - Fork 50
/
mapping-file.txt
116 lines (116 loc) · 327 KB
/
mapping-file.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
#SampleID BarcodeSequence LinkerPrimerSequence center_name center_project_name emp_status experiment_design_description key_seq library_construction_protocol linker platform region run_center run_date run_prefix samp_size sample_center sequencing_meth study_center target_gene target_subfragment age age_unit altitude anonymized_name assigned_from_geo body_habitat body_product body_site collection_timestamp country depth dna_extracted elevation env_biome env_feature env_matter has_physical_specimen host_subject_id host_taxid latitude longitude physical_specimen_remaining project_name required_sample_info_status sample_type sex taxon_id title Description
232.F10Space217 ATCGCTCGAGGA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 F10Space217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True F1 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Space_bar female 408169 Forensic_identification_using_skin_bacterial_communities Space_bar
232.F11Space217 ATCTACTACACG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 F11Space217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True F1 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Space_bar female 408169 Forensic_identification_using_skin_bacterial_communities Space_bar
232.F12Space217 ATCTGGTGCTAT CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 F12Space217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True F1 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Space_bar unknown 408169 Forensic_identification_using_skin_bacterial_communities Space_bar
232.L1Space217 ATGCAGCTCAGT CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 L1Space217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True L1 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Space_bar unknown 408169 Forensic_identification_using_skin_bacterial_communities Space_bar
232.L3Space217 ATGCGTAGTGCG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 L3Space217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True L3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Space_bar unknown 408169 Forensic_identification_using_skin_bacterial_communities Space_bar
232.M10Space217 ATCGCGGACGAT CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M10Space217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M1 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Space_bar male 408169 Forensic_identification_using_skin_bacterial_communities Space_bar
232.M11Space217 ATCGTACAACTC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M11Space217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M1 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Space_bar male 408169 Forensic_identification_using_skin_bacterial_communities Space_bar
232.M2Akey217 ACATGATCGTTC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Akey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Akey male 408169 Forensic_identification_using_skin_bacterial_communities Akey
232.M2Bkey217 ACGCGATACTGG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Bkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Bkey male 408169 Forensic_identification_using_skin_bacterial_communities Bkey
232.M2Ckey217 ACGATGCGACCA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Ckey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Ckey male 408169 Forensic_identification_using_skin_bacterial_communities Ckey
232.M2Dkey217 ACATTCAGCGCA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Dkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Dkey male 408169 Forensic_identification_using_skin_bacterial_communities Dkey
232.M2Ekey217 ACACTGTTCATG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Ekey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Ekey male 408169 Forensic_identification_using_skin_bacterial_communities Ekey
232.M2Enter217 ACGGTGAGTGTC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Enter217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Ente male 408169 Forensic_identification_using_skin_bacterial_communities Ente
232.M2Fkey217 ACCACATACATC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Fkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Fkey male 408169 Forensic_identification_using_skin_bacterial_communities Fkey
232.M2Gkey217 ACCAGACGATGC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Gkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Gkey male 408169 Forensic_identification_using_skin_bacterial_communities Gkey
232.M2Hkey217 ACCAGCGACTAG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Hkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Hkey male 408169 Forensic_identification_using_skin_bacterial_communities Hkey
232.M2Ikey217 ACAGTGCTTCAT CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Ikey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Ikey male 408169 Forensic_identification_using_skin_bacterial_communities Ikey
232.M2Indl217 AACTCGTCGATG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 36 years 0 M2Indl217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M2 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M2Indr217 AATCGTGACTCG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 36 years 0 M2Indr217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M2 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M2Jkey217 ACCGCAGAGTCA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Jkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Jkey male 408169 Forensic_identification_using_skin_bacterial_communities Jkey
232.M2Kkey217 ACCTCGATCAGA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Kkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Kkey male 408169 Forensic_identification_using_skin_bacterial_communities Kkey
232.M2Lkey217 ACCTGTCTCTCT CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Lkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Lkey male 408169 Forensic_identification_using_skin_bacterial_communities Lkey
232.M2Lsft217 ACGGATCGTCAG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Lsft217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Left_shift male 408169 Forensic_identification_using_skin_bacterial_communities Left_shift
232.M2Midl217 AACTGTGCGTAC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 36 years 0 M2Midl217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M2 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M2Midr217 ACACACTATGGC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 36 years 0 M2Midr217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M2 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M2Mkey217 ACGCTATCTGGA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Mkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Mkey male 408169 Forensic_identification_using_skin_bacterial_communities Mkey
232.M2Nkey217 ACGCGCAGATAC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Nkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Nkey male 408169 Forensic_identification_using_skin_bacterial_communities Nkey
232.M2Okey217 ACAGTTGCGCGA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Okey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Okey male 408169 Forensic_identification_using_skin_bacterial_communities Okey
232.M2Pinl217 AAGCTGCAGTCG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 36 years 0 M2Pinl217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M2 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M2Pinr217 ACACGAGCCACA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 36 years 0 M2Pinr217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M2 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M2Pkey217 ACATCACTTAGC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Pkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Pkey male 408169 Forensic_identification_using_skin_bacterial_communities Pkey
232.M2Qkey217 ACACGGTGTCTA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Qkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Qkey male 408169 Forensic_identification_using_skin_bacterial_communities Qkey
232.M2Rinl217 AAGAGATGTCGA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 36 years 0 M2Rinl217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M2 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M2Rinr217 ACACATGTCTAC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 36 years 0 M2Rinr217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M2 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M2Rkey217 ACAGACCACTCA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Rkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Rkey male 408169 Forensic_identification_using_skin_bacterial_communities Rkey
232.M2Rsft217 ACGCTCATGGAT CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Rsft217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Right_shift male 408169 Forensic_identification_using_skin_bacterial_communities Right_shift
232.M2Skey217 ACATGTCACGTG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Skey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Skey male 408169 Forensic_identification_using_skin_bacterial_communities Skey
232.M2Space217 ACGTACTCAGTG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Space217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Space_bar male 408169 Forensic_identification_using_skin_bacterial_communities Space_bar
232.M2Thml217 AACGCACGCTAG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 36 years 0 M2Thml217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M2 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M2Thmr217 AATCAGTCTCGT CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 36 years 0 M2Thmr217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M2 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M2Tkey217 ACAGAGTCGGCT CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Tkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Tkey male 408169 Forensic_identification_using_skin_bacterial_communities Tkey
232.M2Ukey217 ACAGCTAGCTTG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Ukey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Ukey male 408169 Forensic_identification_using_skin_bacterial_communities Ukey
232.M2Vkey217 ACGCAACTGCTA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Vkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Vkey male 408169 Forensic_identification_using_skin_bacterial_communities Vkey
232.M2Wkey217 ACACTAGATCCG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Wkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Wkey male 408169 Forensic_identification_using_skin_bacterial_communities Wkey
232.M2Xkey217 ACGAGTGCTATC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Xkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Xkey male 408169 Forensic_identification_using_skin_bacterial_communities Xkey
232.M2Ykey217 ACAGCAGTGGTC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Ykey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Ykey male 408169 Forensic_identification_using_skin_bacterial_communities Ykey
232.M2Zkey217 ACGACGTCTTAG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M2Zkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Zkey male 408169 Forensic_identification_using_skin_bacterial_communities Zkey
232.M3Akey217 AGTGTTCGATCG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M3Akey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Akey male 408169 Forensic_identification_using_skin_bacterial_communities Akey
232.M3Bkey217 ATATGCCAGTGC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M3Bkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Bkey male 408169 Forensic_identification_using_skin_bacterial_communities Bkey
232.M3Ckey217 ATAGGCGATCTC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M3Ckey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Ckey male 408169 Forensic_identification_using_skin_bacterial_communities Ckey
232.M3Ekey217 AGTCACATCACT CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M3Ekey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Ekey male 408169 Forensic_identification_using_skin_bacterial_communities Ekey
232.M3Gkey217 ATAATCTCGTCG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M3Gkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Gkey male 408169 Forensic_identification_using_skin_bacterial_communities Gkey
232.M3Hkey217 ATACACGTGGCG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M3Hkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Hkey male 408169 Forensic_identification_using_skin_bacterial_communities Hkey
232.M3Indl217 AGCTATCCACGA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 33 years 0 M3Indl217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M3 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M3Indr217 AGGACGCACTGT CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 33 years 0 M3Indr217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M3 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M3Jkey217 ATACAGAGCTCC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M3Jkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Jkey male 408169 Forensic_identification_using_skin_bacterial_communities Jkey
232.M3Kkey217 ATACGTCTTCGA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M3Kkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Kkey male 408169 Forensic_identification_using_skin_bacterial_communities Kkey
232.M3Lkey217 ATACTATTGCGC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M3Lkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Lkey male 408169 Forensic_identification_using_skin_bacterial_communities Lkey
232.M3Lsft217 ATCAGGCGTGTG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M3Lsft217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Left_Shift male 408169 Forensic_identification_using_skin_bacterial_communities Left_Shift
232.M3Midl217 AGCTCCATACAG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 33 years 0 M3Midl217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M3 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M3Midr217 AGGCTACACGAC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 33 years 0 M3Midr217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M3 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M3Mkey217 ATCACTAGTCAC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M3Mkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Mkey male 408169 Forensic_identification_using_skin_bacterial_communities Mkey
232.M3Nkey217 ATCACGTAGCGG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M3Nkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Nkey male 408169 Forensic_identification_using_skin_bacterial_communities Nkey
232.M3Pinl217 AGCTGACTAGTC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 33 years 0 M3Pinl217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M3 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M3Pinr217 AGTACGCTCGAG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 33 years 0 M3Pinr217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M3 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M3Pkey217 AGTGTCACGGTG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M3Pkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Pkey male 408169 Forensic_identification_using_skin_bacterial_communities Pkey
232.M3Qkey217 AGTACTGCAGGC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M3Qkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Qkey male 408169 Forensic_identification_using_skin_bacterial_communities Qkey
232.M3Rinl217 AGCTCTCAGAGG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 33 years 0 M3Rinl217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M3 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M3Rinr217 AGGTGTGATCGC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 33 years 0 M3Rinr217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M3 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M3Rkey217 AGTCCATAGCTG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M3Rkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Rkey male 408169 Forensic_identification_using_skin_bacterial_communities Rkey
232.M3Rsft217 ATCCGATCACAG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M3Rsft217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Right_shift male 408169 Forensic_identification_using_skin_bacterial_communities Right_shift
232.M3Space217 ATCGATCTGTGG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M3Space217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Space_bar male 408169 Forensic_identification_using_skin_bacterial_communities Space_bar
232.M3Thml217 AGCGTAGGTCGT CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 33 years 0 M3Thml217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M3 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M3Thmr217 AGCTTGACAGCT CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 33 years 0 M3Thmr217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M3 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M3Tkey217 AGTCTACTCTGA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M3Tkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Tkey male 408169 Forensic_identification_using_skin_bacterial_communities Tkey
232.M3Vkey217 ATATCGCTACTG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M3Vkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Vkey male 408169 Forensic_identification_using_skin_bacterial_communities Vkey
232.M3Wkey217 AGTAGTATCCTC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M3Wkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Wkey male 408169 Forensic_identification_using_skin_bacterial_communities Wkey
232.M3Xkey217 ATAGCTCCATAC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M3Xkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Xkey male 408169 Forensic_identification_using_skin_bacterial_communities Xkey
232.M3Ykey217 AGTCTCGCATAT CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M3Ykey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Ykey male 408169 Forensic_identification_using_skin_bacterial_communities Ykey
232.M3Zkey217 ATACTCACTCAG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M3Zkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Zkey male 408169 Forensic_identification_using_skin_bacterial_communities Zkey
232.M9Akey217 AGACCGTCAGAC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M9Akey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M9 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Akey male 408169 Forensic_identification_using_skin_bacterial_communities Akey
232.M9Bkey217 AGCAGCACTTGT CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M9Bkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M9 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Bkey male 408169 Forensic_identification_using_skin_bacterial_communities Bkey
232.M9Ckey217 AGCACACCTACA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M9Ckey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M9 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Ckey male 408169 Forensic_identification_using_skin_bacterial_communities Ckey
232.M9Dkey217 AGACTGCGTACT CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M9Dkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M9 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Dkey male 408169 Forensic_identification_using_skin_bacterial_communities Dkey
232.M9Ekey217 ACTCTTCTAGAG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M9Ekey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M9 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Ekey male 408169 Forensic_identification_using_skin_bacterial_communities Ekey
232.M9Enter217 AGCGAGCTATCT CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M9Enter217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M9 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Ente male 408169 Forensic_identification_using_skin_bacterial_communities Ente
232.M9Fkey217 AGAGAGCAAGTG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M9Fkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M9 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Fkey male 408169 Forensic_identification_using_skin_bacterial_communities Fkey
232.M9Gkey217 AGAGCAAGAGCA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M9Gkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M9 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Gkey male 408169 Forensic_identification_using_skin_bacterial_communities Gkey
232.M9Hkey217 AGAGTAGCTAAG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M9Hkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M9 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Hkey male 408169 Forensic_identification_using_skin_bacterial_communities Hkey
232.M9Indl217 ACGTGAGAGAAT CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 25 years 0 M9Indl217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M9 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M9Indr217 ACTAGCTCCATA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 25 years 0 M9Indr217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M9 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M9Kkey217 AGATACACGCGC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M9Kkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M9 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Kkey male 408169 Forensic_identification_using_skin_bacterial_communities Kkey
232.M9Midl217 ACGTGCCGTAGA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 25 years 0 M9Midl217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M9 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M9Midr217 ACTATTGTCACG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 25 years 0 M9Midr217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M9 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M9Mkey217 AGCATATGAGAG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M9Mkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M9 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Mkey male 408169 Forensic_identification_using_skin_bacterial_communities Mkey
232.M9Nkey217 AGCAGTCGCGAT CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M9Nkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M9 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Nkey male 408169 Forensic_identification_using_skin_bacterial_communities Nkey
232.M9Okey217 ACTTGTAGCAGC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M9Okey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M9 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Okey male 408169 Forensic_identification_using_skin_bacterial_communities Okey
232.M9Pinl217 ACTACAGCCTAT CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 25 years 0 M9Pinl217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M9 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M9Pinr217 ACTCAGATACTC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 25 years 0 M9Pinr217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M9 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M9Pkey217 AGAACACGTCTC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M9Pkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M9 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Pkey male 408169 Forensic_identification_using_skin_bacterial_communities Pkey
232.M9Qkey217 ACTCGATTCGAT CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M9Qkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M9 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Qkey male 408169 Forensic_identification_using_skin_bacterial_communities Qkey
232.M9Rinl217 ACGTTAGCACAC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 25 years 0 M9Rinl217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M9 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M9Rinr217 ACTCACGGTATG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 25 years 0 M9Rinr217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M9 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M9Skey217 AGACGTGCACTG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M9Skey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M9 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Skey male 408169 Forensic_identification_using_skin_bacterial_communities Skey
232.M9Space217 AGCGCTGATGTG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M9Space217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M9 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Space_bar male 408169 Forensic_identification_using_skin_bacterial_communities Space_bar
232.M9Thml217 ACGTCTGTAGCA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 25 years 0 M9Thml217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M9 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M9Thmr217 ACTACGTGTGGT CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 25 years 0 M9Thmr217 False UBERON:skin UBERON:sebum UBERON:skin 7/15/08 GAZ:United States of America 0 True 1624 ENVO:human-associated habitat ENVO:human-associated habitat ENVO:human-associated habitat True M9 9606 40.0083 -105.2705 False fierer_forensic_keyboard completed finger_tip male 539655 Forensic_identification_using_skin_bacterial_communities finger_tip
232.M9Vkey217 AGCACGAGCCTA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M9Vkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M9 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Vkey male 408169 Forensic_identification_using_skin_bacterial_communities Vkey
232.M9Wkey217 ACTCGCACAGGA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M9Wkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M9 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Wkey male 408169 Forensic_identification_using_skin_bacterial_communities Wkey
232.M9Xkey217 AGATGTTCTGCT CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M9Xkey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M9 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Xkey male 408169 Forensic_identification_using_skin_bacterial_communities Xkey
232.M9Ykey217 ACTGTACGCGTA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 M9Ykey217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True M9 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Ykey male 408169 Forensic_identification_using_skin_bacterial_communities Ykey
232.R1Space217 ATCTCTGGCATA CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 R1Space217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True R1 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Space_bar unknown 408169 Forensic_identification_using_skin_bacterial_communities Space_bar
232.U1Space217 ATGACCATCGTG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 U1Space217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True U1 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Space_bar unknown 408169 Forensic_identification_using_skin_bacterial_communities Space_bar
232.U2Space217 ATGACTCATTCG CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 U2Space217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True U2 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Space_bar unknown 408169 Forensic_identification_using_skin_bacterial_communities Space_bar
232.U3Space217 ATGAGACTCCAC CATGCTGCCTCCCGTAGGAGT CCME Forensic_identification_using_skin_bacterial_communities EMP Forensic_identification_using_skin_bacterial_communities TCAG 16S_rRNA_gene_sequences_were_processed_according_to_the_methods_described_in_our_previous_publications_(Fierer_et_al.,_2008;_Hamady_et_al.,_2008)._Briefly,_sequences_<200_or_>300gnt_or_with_average_quality_scores_of_<25_were_removed_from_the_dataset,_as_were_those_with_uncorrectable_barcodes,_ambiguous_bases,_or_if_the_bacterial_16S_rRNA_gene-specific_primer_was_absent._Sequences_were_then_assigned_to_the_specific_subsamples_based_on_their_unique_12nt_barcode_and_then_grouped_into_phylotypes_at_the_97%_level_of_sequence_identity_using_cd-hit_(Li_&_Godzik,_2006)_with_a_minimum_coverage_of_97%._We_chose_to_group_the_phylotypes_at_97%_identity_because_this_matches_the_limits_of_resolution_of_pyrosequencing_(Kunin_et_al.,_2010)_and_because_the_branch_length_so_omitted_contributes_little_to_the_tree_and_therefore_to_phylogenetic_estimates_of___diversity_(Hamady_et_al.,_2009)._A_representative_for_each_phylotype_was_chosen_by_selecting_the_most_abundant_sequence_in_the_phylotype,_with_ties_being_broken_by_choosing_the_longest_sequence._A_phylogenetic_tree_of_the_representative_sequences_was_constructed_using_the_Kimura_2-parameter_model_in_Fast_Tree_(Price_et_al.,_2009)_after_sequences_were_aligned_with_NAST_(minimum_150nt_at_75%_minimum_identity)_(DeSantis_et_al.,_2006a)_against_the_GreenGenes_database_(DeSantis_et_al.,_2006b)._Hypervariable_regions_were_screened_out_of_the_alignment_using_PH_Lane_mask_(http://greengenes.lbl.gov/)._Differences_in_the_community_composition_for_each_pair_of_samples_were_determined_from_the_phylogenetic_tree_using_the_weighted_and_unweighted_UniFrac_algorithms_(Lozupone_&_Knight,_2005;_Lozupone_et_al.,_2006)._UniFrac_is_a_tree-based_metric_that_measures_the_distance_between_two_communities_as_the_fraction_of_branch_length_in_a_phylogenetic_tree_that_is_unique_to_one_of_the_communities_(as_opposed_to_being_shared_by_both)._This_method_of_community_comparison_accounts_for_the_relative_similarities_and_differences_among_phylotypes_(or_higher_taxa)_rather_than_treating_all_taxa_at_a_given_level_of_divergence_as_equal_(Lozupone_&_Knight,_2008)._Although_UniFrac_depends_on_a_phylogenetic_tree,_it_is_relatively_robust_to_differences_in_the_tree_reconstruction_method_or_to_the_approximation_of_using_phylotypes_to_represent_groups_of_very_similar_sequences_(Hamady_et_al.,_2009). CA FLX 0 CCME 8/14/08 FFCKVMW 1, swab CCME pyrosequencing CCME 16S rRNA V2 unknown years 0 U3Space217 False unknown unknown unknown 7/15/08 GAZ:United States of America 0 True 1624 ENVO:surface ENVO:surface ENVO:surface True U3 36244 40.0083 -105.2705 False fierer_forensic_keyboard completed Space_bar unknown 408169 Forensic_identification_using_skin_bacterial_communities Space_bar