version 1.0.0
iembry authored and cran-robot committed Sep 21, 2023
0 parents commit ce9394f
Showing 18 changed files with 411 additions and 0 deletions.
51 changes: 51 additions & 0 deletions DESCRIPTION
@@ -0,0 +1,51 @@
Package: chem.databases
Title: Collection of 3 Chemical Databases from Public Sources
Version: 1.0.0
Authors@R: person("Irucka", "Embry", role = c("aut", "cre"), email = "iembry@ecoccs.com")
Maintainer: Irucka Embry <iembry@ecoccs.com>
Depends: R (>= 3.5.0)
Suggests: data.table (>= 1.10.2), install.load, spelling
Description: Contains the Multi-Species Acute Toxicity Database (CAS & SMILES
columns only) [United States (US) Department of Health and Human Services
(DHHS) National Institutes of Health (NIH) National Cancer Institute (NCI),
"Multi-Species Acute Toxicity Database",
<https://cactus.nci.nih.gov/download/acute-toxicity-db/>] combined with the
Toxic Substances Control Act (TSCA) Inventory [United States Environmental
Protection Agency (US EPA), "Toxic Substances Control Act (TSCA) Chemical
Substance Inventory",
<https://www.epa.gov/tsca-inventory/how-access-tsca-inventory> and
<https://cdxapps.epa.gov/oms-substance-registry-services/substance-list-details/169>]
and the Agency for Toxic Substances and Disease Registry (ATSDR) Database
[United States (US) Department of Health and Human Services (DHHS) Centers
for Disease Control and Prevention (CDC)/Agency for Toxic Substances and
Disease Registry (ATSDR), "Agency for Toxic Substances and Disease Registry
(ATSDR) Database",
<https://cdxapps.epa.gov/oms-substance-registry-services/substance-list-details/105>]
in 2 data sets. One data set has a focus on the latter 2 databases and one
data set focuses on the former database. Also contains the collection of
chemical data from Wikipedia compiled in the US EPA CompTox Chemicals
Dashboard [United States Environmental Protection Agency (US EPA) /
Wikimedia Foundation, Inc. "CompTox Chemicals Dashboard v2.2.1",
<https://comptox.epa.gov/dashboard/chemical-lists/WIKIPEDIA>].
URL: https://gitlab.com/iembry/chem.databases
BugReports: https://gitlab.com/iembry/chem.databases/-/issues
License: CC0
Copyright: This software is in the public domain because it contains
mostly materials that originally came from the United States
(US) Environmental Protection Agency (US EPA), the United
States (US) Department of Health and Human Services (DHHS)
Centers for Disease Control and Prevention (CDC)/ Agency for
Toxic Substances and Disease Registry (ATSDR), or the United
States (US) Department of Health and Human Services (DHHS)
National Institutes of Health (NIH) National Cancer Institute
(NCI).
Language: en-US
Encoding: UTF-8
LazyData: true
LazyDataCompression: xz
RoxygenNote: 7.2.3
NeedsCompilation: no
Packaged: 2023-09-20 19:02:34 UTC; xbyri
Author: Irucka Embry [aut, cre]
Repository: CRAN
Date/Publication: 2023-09-21 13:40:05 UTC
17 changes: 17 additions & 0 deletions MD5
@@ -0,0 +1,17 @@
59ac40bd520f7bbce8d0c975b14cf680 *DESCRIPTION
dc21c19f0d6968ee25d441b2cf46017d *NAMESPACE
6f3aaae29667ffe5556131c42a01e1a3 *NEWS.md
be1a005d368da68a8da81d6220cc000c *R/chem.databases-package.R
eb263809cc0514cb133dee0a39900a91 *R/chem_wiki.R
caedcb5fa9226127720b6244c0acb5fb *R/data-atsdr_tsca_ld50_a.R
e1d6714d51923b8eada2166de05133f4 *R/data-atsdr_tsca_ld50_b.R
4c93496f2aa8bdc252f63cd8137171a0 *README.md
3e49c06af357232e20cf0f47c022b85f *data/atsdr_tsca_ld50_a.rda
96d57965af3432218889c17384baedff *data/atsdr_tsca_ld50_b.rda
c3cb0c9da380731ba2b17e9d48f55983 *data/chem_wiki.rda
95e8c3eeaa36ffebfd17060a42d46ea1 *inst/WORDLIST
a31f305bd7a62d75d4db6e802bf767b7 *man/atsdr_tsca_ld50_a.Rd
38ddd471dae2360a02b1e63bfc0352a9 *man/atsdr_tsca_ld50_b.Rd
8a5b39531494de7e73491793a647a05c *man/chem.databases-package.Rd
8d4b044edecd8c8f99483a3aaff23fac *man/chem_wiki.Rd
0622a97a2aaa3c342f09636052c2d7f5 *tests/spelling.R
2 changes: 2 additions & 0 deletions NAMESPACE
@@ -0,0 +1,2 @@
# Generated by roxygen2: do not edit by hand

3 changes: 3 additions & 0 deletions NEWS.md
@@ -0,0 +1,3 @@
# chem.databases 1.0.0 (20 September 2023)

* Initial release
14 changes: 14 additions & 0 deletions R/chem.databases-package.R
@@ -0,0 +1,14 @@
#' chem.databases: Collection of 3 Chemical Databases
#'
#' chem.databases provides the Multi-Species Acute Toxicity Database (CAS &
#' SMILES columns only) combined with the Toxic Substances Control Act (TSCA)
#' Inventory and the Agency for Toxic Substances and Disease Registry (ATSDR)
#' Database in 2 data sets. One data set has a focus on the latter 2 databases
#' and one data set focuses on the former database. Also contains the
#' collection of chemical data from Wikipedia compiled in the US EPA CompTox
#' Chemicals Dashboard.
#'
#'
#' @keywords internal
"_PACKAGE"
#> [1] "_PACKAGE"
29 changes: 29 additions & 0 deletions R/chem_wiki.R
@@ -0,0 +1,29 @@
#' CompTox Chemicals Dashboard From Wikipedia
#'
#' A table containing chemical data from the CompTox Chemicals Dashboard which
#' is derived from various Wikipedia pages.
#'
#'
#'
#' @format A data.table data frame with 19,239 rows and 9 variables:
#' \describe{
#' \item{CAS}{Chemical Abstracts Service (CAS) Registry Number}
#' \item{Substance Name}{Preferred Chemical Substance Name}
#' \item{IUPAC Name}{IUPAC Chemical Name}
#' \item{Molecular Formula}{Chemical Molecular Formula}
#' \item{SMILES}{Simplified Molecular-Input Line-Entry System (SMILES) Chemical Structural Notation}
#' \item{InChI}{International Chemical Identifier (InChI) Chemical Structural Notation}
#' \item{InChIKey}{Hashed Version of the InChI}
#' \item{Average Mass}{Average Chemical Molecular Mass}
#' \item{Monoisotopic Mass}{Single Chemical Isotope Mass}
#' }
#'
#'
#' @source
#' United States Environmental Protection Agency (US EPA) / Wikimedia Foundation, Inc. "CompTox Chemicals Dashboard v2.2.1", \url{https://comptox.epa.gov/dashboard/chemical-lists/WIKIPEDIA}.
#'
#'
#'
#'
"chem_wiki"
#> [1] "chem_wiki"
25 changes: 25 additions & 0 deletions R/data-atsdr_tsca_ld50_a.R
@@ -0,0 +1,25 @@
#' Collection of ATSDR, NCI, and TSCA Chemical Databases Combined Focused on ATSDR and TSCA Data
#'
#' A table containing chemical data from 3 US federal agencies.
#'
#'
#'
#' @format A data.table data frame with 69,557 rows and 4 variables:
#' \describe{
#' \item{CAS}{Chemical Abstracts Service (CAS) Registry Number}
#' \item{Substance Name}{Preferred Chemical Substance Name}
#' \item{Registry Name}{Registry Chemical Name}
#' \item{SMILES}{Simplified Molecular-Input Line-Entry System (SMILES) Chemical Structural Notation}
#' }
#'
#'
#' @source
#' \enumerate{
#' \item United States (US) Department of Health and Human Services (DHHS) Centers for Disease Control and Prevention (CDC)/Agency for Toxic Substances and Disease Registry (ATSDR), "Agency for Toxic Substances and Disease Registry (ATSDR) Database", \url{https://cdxapps.epa.gov/oms-substance-registry-services/substance-list-details/105}.
#' \item United States (US) Department of Health and Human Services (DHHS) National Institutes of Health (NIH) National Cancer Institute (NCI), "Multi-Species Acute Toxicity Database", \url{https://cactus.nci.nih.gov/download/acute-toxicity-db/}.
#' \item United States Environmental Protection Agency (US EPA), "Toxic Substances Control Act (TSCA) Chemical Substance Inventory", \url{https://www.epa.gov/tsca-inventory/how-access-tsca-inventory} and \url{https://cdxapps.epa.gov/oms-substance-registry-services/substance-list-details/169}.
#' }
#'
#'
"atsdr_tsca_ld50_a"
#> [1] "atsdr_tsca_ld50_a"
25 changes: 25 additions & 0 deletions R/data-atsdr_tsca_ld50_b.R
@@ -0,0 +1,25 @@
#' Collection of ATSDR, NCI, and TSCA Chemical Databases Combined Focused on NCI Data
#'
#' A table containing chemical data from 3 US federal agencies.
#'
#'
#'
#' @format A data.table data frame with 80,081 rows and 4 variables:
#' \describe{
#' \item{CAS}{Chemical Abstracts Service (CAS) Registry Number}
#' \item{Substance Name}{Preferred Chemical Substance Name}
#' \item{Registry Name}{Registry Chemical Name}
#' \item{SMILES}{Simplified Molecular-Input Line-Entry System (SMILES) Chemical Structural Notation}
#' }
#'
#'
#' @source
#' \enumerate{
#' \item United States (US) Department of Health and Human Services (DHHS) Centers for Disease Control and Prevention (CDC)/Agency for Toxic Substances and Disease Registry (ATSDR), "Agency for Toxic Substances and Disease Registry (ATSDR) Database", \url{https://cdxapps.epa.gov/oms-substance-registry-services/substance-list-details/105}.
#' \item United States (US) Department of Health and Human Services (DHHS) National Institutes of Health (NIH) National Cancer Institute (NCI), "Multi-Species Acute Toxicity Database", \url{https://cactus.nci.nih.gov/download/acute-toxicity-db/}.
#' \item United States Environmental Protection Agency (US EPA), "Toxic Substances Control Act (TSCA) Chemical Substance Inventory", \url{https://www.epa.gov/tsca-inventory/how-access-tsca-inventory} and \url{https://cdxapps.epa.gov/oms-substance-registry-services/substance-list-details/169}.
#' }
#'
#'
"atsdr_tsca_ld50_b"
#> [1] "atsdr_tsca_ld50_b"
102 changes: 102 additions & 0 deletions README.md
@@ -0,0 +1,102 @@
# chem.databases

R data package with chemical data mostly from the US federal government [the United States (US) Environmental Protection Agency (EPA), the United States (US) Department of Health and Human Services (DHHS) Centers for Disease Control and Prevention (CDC)/Agency for Toxic Substances and Disease Registry (ATSDR), or the United States (US) Department of Health and Human Services (DHHS) National Institutes of Health (NIH) National Cancer Institute (NCI)] and a collection of chemical data from Wikipedia pages contained in the CompTox Chemicals Dashboard v2.2.1.

The following are descriptions of the source databases behind the data sets:

1) tsca

&ldquo;The Toxic Substances Control Act (TSCA) was enacted by Congress in 1976 and amended in 2016, and provides EPA authority to regulate certain new and existing chemicals commercialized in the United States for non-exempt purpose. Section 8(b) of TSCA requires EPA to compile, keep current, and publish a list of each chemical substance that is commercialized in the US for a TSCA use. The original TSCA Inventory was published in 1979 and included chemicals existing in US commerce at the time TSCA was first enacted. New chemicals are added to the Inventory when a Notice of Commencement is received for chemicals reported to the EPA under TSCA Section 5 through the Pre-Manufacture Notification (PMN) process.&rdquo; [https://cdxapps.epa.gov/oms-substance-registry-services/substance-list-details/169]

2) atsdr

&ldquo;The Agency for Toxic Substances and Disease Registry (ATSDR) database contains information about chemical substances that are considered to be toxicological profile candidates. The Comprehensive Environmental Response, Compensation, and Liability Act of 1980 (CERCLA or Superfund), as amended by the Superfund Amendments and Reauthorization Act of 1986 (SARA) requires ATSDR and EPA to maintain a Priority List of Hazardous Substances. Each substance on the priority list is a candidate to become the subject of a toxicological profile prepared by ATSDR with the subsequent identification of priority data for that substance. The ATSDR database includes chemicals from that priority list.&rdquo; [https://cdxapps.epa.gov/oms-substance-registry-services/substance-list-details/105]

3) ld50

&ldquo;Multi-Species Acute Toxicity Database Download Page
Toxicity measurements for 80,081 unique compounds
The file contains curated data for acute toxicity, primarily focusing on the endpoints: lethal dose fifty (LD50); lethal dose low (LDLo); and toxic dose low (TDLo). It contains 80,081 unique compounds with measurements against 59 endpoints: different combinations of species (mouse, rat, rabbit etc.), exposure route (oral, skin, intramuscular, etc.) and dose metric (LD50, LDLo, TDLo). There is overlap between these data and the public data that were used in the creation of the Registry of Toxic Effects of Chemical Substances (RTECS®) database.&rdquo; [https://cactus.nci.nih.gov/download/acute-toxicity-db/]

4) chem_wiki

&ldquo;CompTox Chemicals Dashboard v2.2.1
Description: Wikipedia includes data for thousands of chemicals. ChemBoxes and DrugBoxes includes data such as CAS Registry Numbers, SMILES and InChIs. This list is an assembly from various Wikipedia pages and is a list under ongoing curation and expansion (last updated 07/26/2022).&rdquo; [https://comptox.epa.gov/dashboard/chemical-lists/WIKIPEDIA]



# Installation

```R
install.packages("chem.databases")
```


# Help

With credit due to the `matlab` package, for a complete list of the data sets and the package DESCRIPTION file, use:

```R
library(help = "chem.databases")
```



# Package Contents

This package contains 3 data sets:

* `atsdr_tsca_ld50_a`: Collection of ATSDR, NCI, and TSCA Chemical Databases Combined Focused on ATSDR and TSCA Data
* `atsdr_tsca_ld50_b`: Collection of ATSDR, NCI, and TSCA Chemical Databases Combined Focused on NCI Data
* `chem_wiki`: CompTox Chemicals Dashboard From Wikipedia
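
The documented sizes of these data sets can be compared against an installed copy. The following is only a minimal sketch: the `documented_dims` list is a hypothetical helper (not part of the package) that restates the row and column counts given in the package documentation, and the check is skipped when `chem.databases` is not installed.

```R
# Row and column counts as stated in the package documentation.
# `documented_dims` is a hypothetical helper, not something the package exports.
documented_dims <- list(
  atsdr_tsca_ld50_a = c(69557L, 4L),
  atsdr_tsca_ld50_b = c(80081L, 4L),
  chem_wiki         = c(19239L, 9L)
)

# Only touch the packaged data if chem.databases is installed.
if (requireNamespace("chem.databases", quietly = TRUE)) {
  for (nm in names(documented_dims)) {
    env <- new.env()
    # Load the data set into a scratch environment instead of the workspace.
    data(list = nm, package = "chem.databases", envir = env)
    actual <- dim(env[[nm]])
    cat(nm, ":", actual[1], "rows x", actual[2], "columns;",
        "matches documentation:", identical(actual, documented_dims[[nm]]), "\n")
  }
} else {
  message("chem.databases is not installed; skipping the check.")
}
```

Loading into a scratch environment keeps the large tables out of the global workspace while they are being inspected.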



# Examples

```R

# Load chem.databases and data.table
install.load::load_package("chem.databases", "data.table")

# atsdr_tsca_ld50_a: look up a substance by its registry name
data(atsdr_tsca_ld50_a)

atsdr_tsca_ld50_a[atsdr_tsca_ld50_a$"Registry Name" %in% "n-Propylbenzene", ]



# atsdr_tsca_ld50_b: look up a substance by its CAS Registry Number
data(atsdr_tsca_ld50_b)

atsdr_tsca_ld50_b[atsdr_tsca_ld50_b$CAS %in% "50-00-0", ]



# chem_wiki: look up a substance by its CAS Registry Number
data(chem_wiki)

chem_wiki[chem_wiki$CAS %in% "110-63-4", ]

```
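
All three data sets are documented as having a `CAS` column, so they can be joined on it as well as filtered. The following is a minimal sketch using base R only. `toy_a` and `toy_wiki` are hypothetical stand-in tables that reuse the documented column names; they are not rows copied from the packaged data, so real lookups may return different values.

```R
# Hypothetical stand-in tables that reuse the documented column names;
# these are NOT rows copied from the packaged data sets.
toy_a <- data.frame(
  CAS = c("50-00-0", "103-65-1"),
  `Registry Name` = c("Formaldehyde", "n-Propylbenzene"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

toy_wiki <- data.frame(
  CAS = c("50-00-0", "110-63-4"),
  `Molecular Formula` = c("CH2O", "C4H10O2"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Inner join on the shared CAS column: keeps only substances present in both tables.
joined <- merge(toy_a, toy_wiki, by = "CAS")
print(joined)
```

With the package installed, the same `merge()` call should apply to `atsdr_tsca_ld50_a` and `chem_wiki` directly. Because `SMILES` and `Substance Name` are documented in both, `merge()` would add `.x` and `.y` suffixes to those columns.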


# Copyright

This software is in the public domain because it contains mostly materials that originally came from the US federal government [the United States (US) Environmental Protection Agency (EPA), the United States (US) Department of Health and Human Services (DHHS) Centers for Disease Control and Prevention (CDC)/Agency for Toxic Substances and Disease Registry (ATSDR), or the United States (US) Department of Health and Human Services (DHHS) National Institutes of Health (NIH) National Cancer Institute (NCI)]. Other materials are derived from Wikipedia where the &ldquo;text is available under the Creative Commons Attribution-ShareAlike License 4.0&rdquo;.

Copyright Status from the US EPA (https://www.epa.gov/web-policies-and-procedures/epa-disclaimers#copyright):

&ldquo;The U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce these documents, or allow others to do so, for U.S. Government purposes. These documents may be freely distributed and used for non-commercial, scientific and educational purposes. Commercial use of the documents available from the EPA websites may be protected under the U.S. and Foreign Copyright Laws. Individual documents on the EPA website may have different copyright conditions, and that will be noted in those documents.&rdquo;

This software is provided &ldquo;AS IS.&rdquo;



# Donations Accepted

If you want to support the continued development of this and my other R packages, feel free to:

<p><script src="https://liberapay.com/iaembry/widgets/button.js"></script>
<noscript><a href="https://liberapay.com/iaembry/donate"><img alt="Donate using Liberapay" src="https://liberapay.com/assets/widgets/donate.svg"></a></noscript></p>
Binary file added data/atsdr_tsca_ld50_a.rda
Binary file not shown.
Binary file added data/atsdr_tsca_ld50_b.rda
Binary file not shown.
Binary file added data/chem_wiki.rda
Binary file not shown.
22 changes: 22 additions & 0 deletions inst/WORDLIST
@@ -0,0 +1,22 @@
ATSDR
CAS
CERCLA
ChemBoxes
CompTox
DHHS
DrugBoxes
InChIs
LD
LDLo
NCI
PMN
Pre
RTECS
Reauthorization
ShareAlike
TDLo
TSCA
atsdr
curation
ld
tsca
29 changes: 29 additions & 0 deletions man/atsdr_tsca_ld50_a.Rd


29 changes: 29 additions & 0 deletions man/atsdr_tsca_ld50_b.Rd


29 changes: 29 additions & 0 deletions man/chem.databases-package.Rd
