-
Notifications
You must be signed in to change notification settings - Fork 0
/
rosv.Rmd
178 lines (127 loc) · 6.38 KB
/
rosv.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
---
title: "Introduction to rosv"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Introduction to rosv}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
## Package purpose
The {rosv} package provides two core purposes:
- Access information from the [Open Source Vulnerability (OSV) database](https://osv.dev/).
- Operate on vulnerability information and create formatted lists for package administration.
Consequently, the functions in {rosv} relate to querying or downloading content from OSV, parsing JSON content, and generating tables and lists regarding key information on package vulnerabilities. The OSV database is not specific to R related repositories such as CRAN; users can access information on any ecosystem available in OSV. For R users who also dabble in Python, they can search for package vulnerabilities within the PyPI repository while remaining in an R interface.
## Basic Examples
The following examples will outline an assortment of package functionality but first we must load the package!
```{r setup}
library(rosv)
```
### Detect vulnerable packages
One of the simplest queries is to provide a package and ecosystem and return a `TRUE`/`FALSE` response
informing you if that package has ever been listed with a vulnerability.
```{r, eval = FALSE}
is_pkg_vulnerable(c('dask', 'dash'), ecosystem = c('PyPI', 'PyPI'))
```
```{r, echo = FALSE}
c(dask = TRUE, dash = FALSE)
```
The number of vulnerabilities detected for each package can also be queried.
```{r, eval = FALSE}
osv_count_vulns(c('dask', 'readxl', 'dplyr'), c('PyPI', 'CRAN', 'CRAN'))
```
```{r, echo = FALSE}
c(dask = 1, readxl = 3, dplyr = 0)
```
### List package vulnerabilities
The most basic usage of {rosv} is to pull all versions of an ecosystem's packages (e.g. PyPI or CRAN) listed
in the OSV database. This can be achieved using high-level functions such as `osv_query()` and `create_osv_list()`.
To start we can query one package in PyPI for vulnerabilities.
```{r, eval = FALSE}
pkg_vul <- osv_query('dask', ecosystem = 'PyPI', all_affected = FALSE)
```
Use the OSV query to generate a sorted and de-duplicated list of just package name and version.
```{r, eval = FALSE}
pkg_tbl <- create_osv_list(pkg_vul, as.data.frame = TRUE)
head(pkg_tbl, 3)
```
```{r, echo = FALSE}
data.frame(name = rep('dask', 3),
versions = c('0.10.0', '0.10.1', '0.10.2'))
```
Pull the entire set of PyPI vulnerability data and de-duplicate
```{r example, eval = FALSE}
pkg_vul <- osv_query(ecosystem = 'PyPI', all_affected = FALSE)
pypi_vul <- create_osv_list(pkg_vul, as.data.frame = FALSE, NA_value = ' ')
head(pypi_vul, 3)
```
```{r, echo = FALSE}
c("aaiohttp\t ", "accesscontrol\t2.13.0", "accesscontrol\t2.13.1")
```
## Scan an R project
Packages discovered within an R project (such as {renv} LOCK files and installed packages at `.libPaths()`) can be
parsed and scanned directly using `osv_scan()`. A `data.frame` is returned with the package name and a logical value
specifying if a vulnerability was discovered in the OSV database. If a particular scanning mode does not exist, similar
functionality can be created if a package list and associated version information is passed to `is_pkg_vulnerable()`.
```{r, eval = FALSE}
osv_scan('r_project')
```
```{r, echo = FALSE}
data.frame(name = c('commonmark', 'jsonlite', 'askpass', 'base'),
version = c( '1.9.0', '1.8.7', '1.2.0', '4.3.1'),
ecosystem = rep('CRAN', 4),
is_vul = c(TRUE, TRUE, FALSE , FALSE))
```
## Use API helpers directly
Lower-level functions return more detail about the API request and response
contained within the R6 object. These are more flexible than their higher-level alternatives.
By default, the results of these functions will be cached. This can be overriden
by specifying `cache = FALSE`.
The higher-level API query function `osv_query()` builds upon these helpers to align the format of returned content,
making it the preferred choice for typical use-cases.
```{r, eval = FALSE}
# Returns entire response object to parse as you please.
osv_query_1('dask', ecosystem = 'PyPI')
# Returns the vulnerability IDs for packages in list
osv_querybatch('dask', ecosystem = 'PyPI')
# Return vulnerabilities from different ecosystems as vectors
osv_querybatch(c('dask', 'readxl'), ecosystem = c('PyPI', 'CRAN'))
# Grab details by vulns ID
osv_vulns('PYSEC-2021-387')
# Download vulns for an ecosystem
osv_download('PYSEC-2021-387', 'PyPI')
osv_download(ecosystem = 'PyPI', download_only = TRUE)
```
## Result caching
By default, results from queries using API helpers (e.g. `osv_query()` or `osv_querybatch()`) will cache results using `memoise::memoise()`. The caching can be turned off directly using function parameters or globally reset using `clear_osv_cache()`. Caching is the default behavior to help enforce polite access of the OSV API. When clearing the cache, all vulnerability files saved to disk at the temporary R session location will also be removed (refer to the environment variable `ROSV_CACHE_GLOBAL`).
```{r, eval = FALSE}
# Query without caching
osv_query('dask', ecosystem = 'PyPI', cache = FALSE)
# File will be saved to disk
osv_download('PYSEC-2021-387', 'PyPI')
# Clear cache, as needed
clear_osv_cache()
```
## Creating a cross-referenced whitelist
When using a product such as {miniCRAN} or Posit Package Manager there may be
corporate requirements to limit what packages users can install. Although having a
whitelist is often recommended, it should either specify the exact versions that are approved
or exclude packages with known vulnerabilities. Given the sheer amount of packages and their versions, this
is often difficult. The following method will take a vector of packages (from PyPI) and cross-reference against
the OSV database. If packages are identified they are either entirely dropped, or the specific versions with flagged
vulnerabilities are excluded.
```{r, eval = FALSE}
# List of packages of interest
python_pkg <- c('dask', 'dash', 'keras')
# Create the xref whitelist
xref_pkg_list <- create_xref_whitelist(python_pkg,
ecosystem = 'PyPI',
output_format = 'requirements.txt')
# Output requirements.txt which can be used with PPM product
writeLines(xref_pkg_list, file.path(tempdir(), 'requirements.txt'))
```