commit ce9394f
Showing 18 changed files with 411 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
Package: chem.databases | ||
Title: Collection of 3 Chemical Databases from Public Sources | ||
Version: 1.0.0 | ||
Authors@R: person("Irucka", "Embry", role = c("aut", "cre"), email = "iembry@ecoccs.com") | ||
Maintainer: Irucka Embry <iembry@ecoccs.com> | ||
Depends: R (>= 3.5.0) | ||
Suggests: data.table (>= 1.10.2), install.load, spelling | ||
Description: Contains the Multi-Species Acute Toxicity Database (CAS & SMILES | ||
columns only) [United States (US) Department of Health and Human Services | ||
(DHHS) National Institutes of Health (NIH) National Cancer Institute (NCI), | ||
"Multi-Species Acute Toxicity Database", | ||
<https://cactus.nci.nih.gov/download/acute-toxicity-db/>] combined with the | ||
Toxic Substances Control Act (TSCA) Inventory [United States Environmental | ||
Protection Agency (US EPA), "Toxic Substances Control Act (TSCA) Chemical | ||
Substance Inventory", | ||
<https://www.epa.gov/tsca-inventory/how-access-tsca-inventory> and | ||
<https://cdxapps.epa.gov/oms-substance-registry-services/substance-list-details/169>] | ||
and the Agency for Toxic Substances and Disease Registry (ATSDR) Database | ||
[United States (US) Department of Health and Human Services (DHHS) Centers | ||
for Disease Control and Prevention (CDC)/Agency for Toxic Substances and | ||
Disease Registry (ATSDR), "Agency for Toxic Substances and Disease Registry | ||
(ATSDR) Database", | ||
<https://cdxapps.epa.gov/oms-substance-registry-services/substance-list-details/105>] | ||
in 2 data sets. One data set has a focus on the latter 2 databases and one | ||
data set focuses on the former database. Also contains the collection of | ||
chemical data from Wikipedia compiled in the US EPA CompTox Chemicals | ||
Dashboard [United States Environmental Protection Agency (US EPA) / | ||
Wikimedia Foundation, Inc. "CompTox Chemicals Dashboard v2.2.1", | ||
<https://comptox.epa.gov/dashboard/chemical-lists/WIKIPEDIA>]. | ||
URL: https://gitlab.com/iembry/chem.databases | ||
BugReports: https://gitlab.com/iembry/chem.databases/-/issues | ||
License: CC0 | ||
Copyright: This software is in the public domain because it contains | ||
mostly materials that originally came from the United States | ||
(US) Environmental Protection Agency (US EPA), the United | ||
States (US) Department of Health and Human Services (DHHS) | ||
Centers for Disease Control and Prevention (CDC)/ Agency for | ||
Toxic Substances and Disease Registry (ATSDR), or the United | ||
States (US) Department of Health and Human Services (DHHS) | ||
National Institutes of Health (NIH) National Cancer Institute | ||
(NCI). | ||
Language: en-US | ||
Encoding: UTF-8 | ||
LazyData: true | ||
LazyDataCompression: xz | ||
RoxygenNote: 7.2.3 | ||
NeedsCompilation: no | ||
Packaged: 2023-09-20 19:02:34 UTC; xbyri | ||
Author: Irucka Embry [aut, cre] | ||
Repository: CRAN | ||
Date/Publication: 2023-09-21 13:40:05 UTC |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
59ac40bd520f7bbce8d0c975b14cf680 *DESCRIPTION | ||
dc21c19f0d6968ee25d441b2cf46017d *NAMESPACE | ||
6f3aaae29667ffe5556131c42a01e1a3 *NEWS.md | ||
be1a005d368da68a8da81d6220cc000c *R/chem.databases-package.R | ||
eb263809cc0514cb133dee0a39900a91 *R/chem_wiki.R | ||
caedcb5fa9226127720b6244c0acb5fb *R/data-atsdr_tsca_ld50_a.R | ||
e1d6714d51923b8eada2166de05133f4 *R/data-atsdr_tsca_ld50_b.R | ||
4c93496f2aa8bdc252f63cd8137171a0 *README.md | ||
3e49c06af357232e20cf0f47c022b85f *data/atsdr_tsca_ld50_a.rda | ||
96d57965af3432218889c17384baedff *data/atsdr_tsca_ld50_b.rda | ||
c3cb0c9da380731ba2b17e9d48f55983 *data/chem_wiki.rda | ||
95e8c3eeaa36ffebfd17060a42d46ea1 *inst/WORDLIST | ||
a31f305bd7a62d75d4db6e802bf767b7 *man/atsdr_tsca_ld50_a.Rd | ||
38ddd471dae2360a02b1e63bfc0352a9 *man/atsdr_tsca_ld50_b.Rd | ||
8a5b39531494de7e73491793a647a05c *man/chem.databases-package.Rd | ||
8d4b044edecd8c8f99483a3aaff23fac *man/chem_wiki.Rd | ||
0622a97a2aaa3c342f09636052c2d7f5 *tests/spelling.R |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
# Generated by roxygen2: do not edit by hand | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
# chem.databases 1.0.0 (20 September 2023) | ||
|
||
* Initial release |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
#' chem.databases: Collection of 3 Chemical Databases | ||
#' | ||
#' chem.databases provides the Multi-Species Acute Toxicity Database (CAS & | ||
#' SMILES columns only) combined with the Toxic Substances Control Act (TSCA) | ||
#' Inventory and the Agency for Toxic Substances and Disease Registry (ATSDR) | ||
#' Database in 2 data sets. One data set has a focus on the latter 2 databases | ||
#' and one data set focuses on the former database. Also contains the | ||
#' collection of chemical data from Wikipedia compiled in the US EPA CompTox | ||
#' Chemicals Dashboard. | ||
#' | ||
#' | ||
#' @keywords internal | ||
"_PACKAGE" | ||
#> [1] "_PACKAGE" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
#' CompTox Chemicals Dashboard From Wikipedia | ||
#' | ||
#' A table containing chemical data from the CompTox Chemicals Dashboard which | ||
#' is derived from various Wikipedia pages. | ||
#' | ||
#' | ||
#' | ||
#' @format A data.table data frame with 19,239 rows and 9 variables: | ||
#' \describe{ | ||
#' \item{CAS}{Chemical Abstracts Service (CAS) Registry Number} | ||
#' \item{Substance Name}{Preferred Chemical Substance Name} | ||
#' \item{IUPAC Name}{IUPAC Chemical Name} | ||
#' \item{Molecular Formula}{Chemical Molecular Formula} | ||
#' \item{SMILES}{Simplified Molecular-Input Line-Entry System (SMILES) Chemical Structural Notation} | ||
#' \item{InChI}{International Chemical Identifier (InChI) Chemical Structural Notation} | ||
#' \item{InChIKey}{Hashed Version of the InChI} | ||
#' \item{Average Mass}{Average Chemical Molecular Mass} | ||
#' \item{Monoisotopic Mass}{Single Chemical Isotope Mass} | ||
#' } | ||
#' | ||
#' | ||
#' @source | ||
#' United States Environmental Protection Agency (US EPA) / Wikimedia Foundation, Inc. "CompTox Chemicals Dashboard v2.2.1", \url{https://comptox.epa.gov/dashboard/chemical-lists/WIKIPEDIA}. | ||
#' | ||
#' | ||
#' | ||
#' | ||
"chem_wiki" | ||
#> [1] "chem_wiki" |
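The `CAS` column documented above holds CAS Registry Numbers, whose final digit is a check digit: each preceding digit is weighted by its position counted from the right, and the sum modulo 10 must equal the check digit. A minimal sketch of such a check, assuming a hypothetical helper `is_valid_cas()` that is not part of chem.databases:

```r
# Hypothetical helper (not part of chem.databases): TRUE when a single CAS
# Registry Number string is well-formed and its check digit is consistent.
is_valid_cas <- function(cas) {
  # Expect 2-7 digits, then 2 digits, then 1 check digit, hyphen-separated.
  if (is.na(cas) || !grepl("^[0-9]{2,7}-[0-9]{2}-[0-9]$", cas)) {
    return(FALSE)
  }
  digits <- as.integer(strsplit(gsub("-", "", cas), "")[[1]])
  check <- digits[length(digits)]
  # Digits before the check digit, reordered so position 1 is the rightmost.
  body <- rev(digits[-length(digits)])
  sum(body * seq_along(body)) %% 10 == check
}

# Apply the helper to a few strings; the last one has an altered check digit.
vapply(c("50-00-0", "110-63-4", "110-63-5"), is_valid_cas, logical(1))
```

A check like this could be run over the `CAS` column before matching against other sources, since a failed check digit usually points to a transcription error rather than a different substance.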
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
#' Combined ATSDR, NCI, and TSCA Chemical Databases Focused on ATSDR and TSCA Data | ||
#' | ||
#' A table containing chemical data from 3 US federal agencies. | ||
#' | ||
#' | ||
#' | ||
#' @format A data.table data frame with 69,557 rows and 4 variables: | ||
#' \describe{ | ||
#' \item{CAS}{Chemical Abstracts Service (CAS) Registry Number} | ||
#' \item{Substance Name}{Preferred Chemical Substance Name} | ||
#' \item{Registry Name}{Registry Chemical Name} | ||
#' \item{SMILES}{Simplified Molecular-Input Line-Entry System (SMILES) Chemical Structural Notation} | ||
#' } | ||
#' | ||
#' | ||
#' @source | ||
#' \enumerate{ | ||
#' \item United States (US) Department of Health and Human Services (DHHS) Centers for Disease Control and Prevention (CDC)/Agency for Toxic Substances and Disease Registry (ATSDR), "Agency for Toxic Substances and Disease Registry (ATSDR) Database", \url{https://cdxapps.epa.gov/oms-substance-registry-services/substance-list-details/105}. | ||
#' \item United States (US) Department of Health and Human Services (DHHS) National Institutes of Health (NIH) National Cancer Institute (NCI), "Multi-Species Acute Toxicity Database", \url{https://cactus.nci.nih.gov/download/acute-toxicity-db/}. | ||
#' \item United States Environmental Protection Agency (US EPA), "Toxic Substances Control Act (TSCA) Chemical Substance Inventory", \url{https://www.epa.gov/tsca-inventory/how-access-tsca-inventory} and \url{https://cdxapps.epa.gov/oms-substance-registry-services/substance-list-details/169}. | ||
#' } | ||
#' | ||
#' | ||
"atsdr_tsca_ld50_a" | ||
#> [1] "atsdr_tsca_ld50_a" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
#' Combined ATSDR, NCI, and TSCA Chemical Databases Focused on NCI Data | ||
#' | ||
#' A table containing chemical data from 3 US federal agencies. | ||
#' | ||
#' | ||
#' | ||
#' @format A data.table data frame with 80,081 rows and 4 variables: | ||
#' \describe{ | ||
#' \item{CAS}{Chemical Abstracts Service (CAS) Registry Number} | ||
#' \item{Substance Name}{Preferred Chemical Substance Name} | ||
#' \item{Registry Name}{Registry Chemical Name} | ||
#' \item{SMILES}{Simplified Molecular-Input Line-Entry System (SMILES) Chemical Structural Notation} | ||
#' } | ||
#' | ||
#' | ||
#' @source | ||
#' \enumerate{ | ||
#' \item United States (US) Department of Health and Human Services (DHHS) Centers for Disease Control and Prevention (CDC)/Agency for Toxic Substances and Disease Registry (ATSDR), "Agency for Toxic Substances and Disease Registry (ATSDR) Database", \url{https://cdxapps.epa.gov/oms-substance-registry-services/substance-list-details/105}. | ||
#' \item United States (US) Department of Health and Human Services (DHHS) National Institutes of Health (NIH) National Cancer Institute (NCI), "Multi-Species Acute Toxicity Database", \url{https://cactus.nci.nih.gov/download/acute-toxicity-db/}. | ||
#' \item United States Environmental Protection Agency (US EPA), "Toxic Substances Control Act (TSCA) Chemical Substance Inventory", \url{https://www.epa.gov/tsca-inventory/how-access-tsca-inventory} and \url{https://cdxapps.epa.gov/oms-substance-registry-services/substance-list-details/169}. | ||
#' } | ||
#' | ||
#' | ||
"atsdr_tsca_ld50_b" | ||
#> [1] "atsdr_tsca_ld50_b" |
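All three documented data sets carry a `CAS` column, so records can be joined on it. A sketch using base R `merge()` on toy stand-in tables (the table names `tox` and `wiki` and their rows are illustrative assumptions, not taken from the package data):

```r
# Toy stand-in for atsdr_tsca_ld50_b (only two of its columns).
tox <- data.frame(
  CAS = c("50-00-0", "110-63-4"),
  SMILES = c("C=O", "OCCCCO"),
  stringsAsFactors = FALSE
)

# Toy stand-in for chem_wiki; check.names = FALSE keeps the space in the name.
wiki <- data.frame(
  CAS = c("110-63-4", "64-17-5"),
  `Molecular Formula` = c("C4H10O2", "C2H6O"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Inner join: keeps only the CAS numbers present in both tables.
both <- merge(tox, wiki, by = "CAS")
both
```

Passing `all.x = TRUE` to `merge()` would instead keep every row of `tox` and fill the unmatched `wiki` columns with `NA`.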
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,102 @@ | ||
# chem.databases | ||
|
||
R data package with chemical data mostly from the US federal government [the United States (US) Environmental Protection Agency (EPA), the United States (US) Department of Health and Human Services (DHHS) Centers for Disease Control and Prevention (CDC)/Agency for Toxic Substances and Disease Registry (ATSDR), or the United States (US) Department of Health and Human Services (DHHS) National Institutes of Health (NIH) National Cancer Institute (NCI)] and a collection of chemical data from Wikipedia pages contained in the CompTox Chemicals Dashboard v2.2.1. | ||
|
||
The following are descriptions of the source databases, quoted from their providers: | ||
|
||
1) tsca | ||
|
||
“The Toxic Substances Control Act (TSCA) was enacted by Congress in 1976 and amended in 2016, and provides EPA authority to regulate certain new and existing chemicals commercialized in the United States for non-exempt purpose. Section 8(b) of TSCA requires EPA to compile, keep current, and publish a list of each chemical substance that is commercialized in the US for a TSCA use. The original TSCA Inventory was published in 1979 and included chemicals existing in US commerce at the time TSCA was first enacted. New chemicals are added to the Inventory when a Notice of Commencement is received for chemicals reported to the EPA under TSCA Section 5 through the Pre-Manufacture Notification (PMN) process.” [https://cdxapps.epa.gov/oms-substance-registry-services/substance-list-details/169] | ||
|
||
2) atsdr | ||
|
||
“The Agency for Toxic Substances and Disease Registry (ATSDR) database contains information about chemical substances that are considered to be toxicological profile candidates. The Comprehensive Environmental Response, Compensation, and Liability Act of 1980 (CERCLA or Superfund), as amended by the Superfund Amendments and Reauthorization Act of 1986 (SARA) requires ATSDR and EPA to maintain a Priority List of Hazardous Substances. Each substance on the priority list is a candidate to become the subject of a toxicological profile prepared by ATSDR with the subsequent identification of priority data for that substance. The ATSDR database includes chemicals from that priority list.” [https://cdxapps.epa.gov/oms-substance-registry-services/substance-list-details/105] | ||
|
||
3) ld50 | ||
|
||
“Multi-Species Acute Toxicity Database Download Page | ||
Toxicity measurements for 80,081 unique compounds | ||
The file contains curated data for acute toxicity, primarily focusing on the endpoints: lethal dose fifty (LD50); lethal dose low (LDLo); and toxic dose low (TDLo). It contains 80,081 unique compounds with measurements against 59 endpoints: different combinations of species (mouse, rat, rabbit etc.), exposure route (oral, skin, intramuscular, etc.) and dose metric (LD50, LDLo, TDLo). There is overlap between these data and the public data that were used in the creation of the Registry of Toxic Effects of Chemical Substances (RTECS®) database.” [https://cactus.nci.nih.gov/download/acute-toxicity-db/] | ||
|
||
4) chem_wiki | ||
|
||
“CompTox Chemicals Dashboard v2.2.1 | ||
Description: Wikipedia includes data for thousands of chemicals. ChemBoxes and DrugBoxes includes data such as CAS Registry Numbers, SMILES and InChIs. This list is an assembly from various Wikipedia pages and is a list under ongoing curation and expansion (last updated 07/26/2022.” [https://comptox.epa.gov/dashboard/chemical-lists/WIKIPEDIA] | ||
|
||
|
||
|
||
# Installation | ||
|
||
```R | ||
install.packages("chem.databases") | ||
``` | ||
|
||
|
||
# Help | ||
|
||
For a complete list of the data sets and the package DESCRIPTION file (with credit due to the `matlab` package for this tip), use: | ||
|
||
```R | ||
library(help = "chem.databases") | ||
``` | ||
|
||
|
||
|
||
# Package Contents | ||
|
||
This package contains 3 data sets: | ||
|
||
* `atsdr_tsca_ld50_a`: Combined Collection of ATSDR, NCI, and TSCA Chemical Databases Focused on ATSDR and TSCA Data | ||
* `atsdr_tsca_ld50_b`: Combined Collection of ATSDR, NCI, and TSCA Chemical Databases Focused on NCI Data | ||
* `chem_wiki`: CompTox Chemicals Dashboard From Wikipedia | ||
|
||
|
||
|
||
# Examples | ||
|
||
```R | ||
|
||
install.load::load_package("chem.databases", "data.table") | ||
|
||
# atsdr_tsca_ld50_a | ||
|
||
data(atsdr_tsca_ld50_a) | ||
|
||
atsdr_tsca_ld50_a[atsdr_tsca_ld50_a$"Registry Name" %in% "n-Propylbenzene", ] | ||
|
||
|
||
|
||
# atsdr_tsca_ld50_b | ||
data(atsdr_tsca_ld50_b) | ||
|
||
atsdr_tsca_ld50_b[atsdr_tsca_ld50_b$CAS %in% "50-00-0", ] | ||
|
||
|
||
|
||
# chem_wiki | ||
|
||
data(chem_wiki) | ||
|
||
chem_wiki[chem_wiki$CAS %in% "110-63-4", ] | ||
|
||
``` | ||
|
||
|
||
# Copyright | ||
|
||
This software is in the public domain because it contains mostly materials that originally came from the US federal government [the United States (US) Environmental Protection Agency (EPA), the United States (US) Department of Health and Human Services (DHHS) Centers for Disease Control and Prevention (CDC)/Agency for Toxic Substances and Disease Registry (ATSDR), or the United States (US) Department of Health and Human Services (DHHS) National Institutes of Health (NIH) National Cancer Institute (NCI)]. Other materials are derived from Wikipedia where the “text is available under the Creative Commons Attribution-ShareAlike License 4.0”. | ||
|
||
Copyright Status from the US EPA (https://www.epa.gov/web-policies-and-procedures/epa-disclaimers#copyright): | ||
|
||
“The U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce these documents, or allow others to do so, for U.S. Government purposes. These documents may be freely distributed and used for non-commercial, scientific and educational purposes. Commercial use of the documents available from the EPA websites may be protected under the U.S. and Foreign Copyright Laws. Individual documents on the EPA website may have different copyright conditions, and that will be noted in those documents.” | ||
|
||
This software is provided “AS IS.” | ||
|
||
|
||
|
||
# Donations Accepted | ||
|
||
If you want to support the continued development of this and my other R packages, feel free to: | ||
|
||
<p><script src="https://liberapay.com/iaembry/widgets/button.js"></script> | ||
<noscript><a href="https://liberapay.com/iaembry/donate"><img alt="Donate using Liberapay" src="https://liberapay.com/assets/widgets/donate.svg"></a></noscript></p> |
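The README examples filter rows with `%in%` rather than `==`. One likely reason is missing-value handling: in a plain `data.frame`, `==` propagates `NA` and produces an extra all-`NA` row, while `%in%` treats `NA` as a non-match. A minimal sketch on a toy stand-in table (the `toy` object and its rows are illustrative assumptions, not package data; `data.table` objects handle `NA` in a logical filter differently once data.table is loaded):

```r
# Toy stand-in with the same column names as atsdr_tsca_ld50_a.
toy <- data.frame(
  CAS = c("103-65-1", "50-00-0", NA),
  `Substance Name` = c("Propylbenzene", "Formaldehyde", "Unknown"),
  `Registry Name` = c("n-Propylbenzene", "Formaldehyde", NA),
  SMILES = c("CCCC1=CC=CC=C1", "C=O", NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# `%in%` never returns NA, so only the matching row is kept.
hit_in <- toy[toy$CAS %in% "50-00-0", ]

# `==` returns NA for the missing CAS, which adds an all-NA row.
hit_eq <- toy[toy$CAS == "50-00-0", ]

# Column names containing spaces need [[ ]] (or backticks with $).
name_in <- hit_in[["Substance Name"]]

nrow(hit_in)
nrow(hit_eq)
```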
Binary file not shown.
Binary file not shown.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
ATSDR | ||
CAS | ||
CERCLA | ||
ChemBoxes | ||
CompTox | ||
DHHS | ||
DrugBoxes | ||
InChIs | ||
LD | ||
LDLo | ||
NCI | ||
PMN | ||
Pre | ||
RTECS | ||
Reauthorization | ||
ShareAlike | ||
TDLo | ||
TSCA | ||
atsdr | ||
curation | ||
ld | ||
tsca |