This repository has been archived by the owner on Sep 30, 2022. It is now read-only.

rm toy 180:
maurolepore committed Feb 20, 2020
1 parent 076a6fb commit fcb1a14
Showing 2 changed files with 0 additions and 84 deletions.
1 change: 0 additions & 1 deletion .Rbuildignore
@@ -12,4 +12,3 @@
^r2dii\.match\.Rproj$
^vignettes/_validate-matches\.md$
^vignettes/articles$
^vignettes/intro\.Rmd$
83 changes: 0 additions & 83 deletions vignettes/toy.Rmd
@@ -93,86 +93,3 @@ match_name(your_loanbook, your_ald, by_sector = FALSE) %>%
match_name(your_loanbook, your_ald, by_sector = TRUE) %>%
  nrow()
```

`min_score` allows you to set a minimum threshold for `score`.

```{r}
matched <- match_name(your_loanbook, your_ald, min_score = 0.9)
range(matched$score)
```
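
As a rough illustration of what a score threshold does, the sketch below filters a made-up data frame in base R. It is not how `match_name()` is implemented; the toy names and scores are invented, and treating the threshold as inclusive (`>=`) is an assumption.

```r
# Toy scores; all names and values are made up for illustration.
toy <- data.frame(
  name = c("acme power", "acme pwr", "other co"),
  score = c(1, 0.93, 0.55),
  stringsAsFactors = FALSE
)

# Assumption: the threshold is inclusive.
min_score <- 0.9
kept <- toy[toy$score >= min_score, ]

# Every remaining score is at or above the threshold.
range(kept$score)
```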

### Maybe overwrite matches

If you are happy with the matching coverage achieved, proceed to the next step. Otherwise, you can manually add matches that `match_name()` did not find automatically. To do this, inspect the `ald` and find a company you would like to match to your loanbook. Once you find a match, use a spreadsheet editor (e.g. MS Excel) to write a .csv file similar to [`overwrite_demo`](https://2degreesinvesting.github.io/r2dii.dataraw/reference/overwrite_demo.html), where:

* `level` indicates the level that the manual match should be added to (e.g. `direct_loantaker`)
* `id_2dii` is the id of the loanbook company you would like to match (from the output of `match_name()`)
* `name` is the `ald` company you would like to manually link to
* `sector` optionally overwrites the sector
* `source` records where the match came from, so you can later identify all manual matches.

```{r}
matched <- match_name(
  your_loanbook, your_ald, min_score = 0.9, overwrite = overwrite_demo
)
```
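
If you prefer to build the overwrite file in R rather than in a spreadsheet, a minimal sketch might look like this. The column names come from the list above; every value (`"DL1"`, `"Acme Power Co."`, `"power"`, `"manual"`) and the file name are hypothetical.

```r
# Hypothetical manual match; every value below is made up for illustration.
overwrite <- data.frame(
  level = "direct_loantaker",
  id_2dii = "DL1",
  name = "Acme Power Co.",
  sector = "power",
  source = "manual",
  stringsAsFactors = FALSE
)

# Write to a temporary file and read it back, as you would a hand-edited .csv.
path <- file.path(tempdir(), "overwrite.csv")
write.csv(overwrite, path, row.names = FALSE)
overwrite_from_file <- read.csv(path, stringsAsFactors = FALSE)
```

A data frame with this shape could then be passed to the `overwrite` argument in place of `overwrite_demo`.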

## Validate matches

Write the output of `match_name()` into a .csv file with:

```r
# Writing to current working directory
matched %>%
  readr::write_csv("matched.csv")
```

Compare, edit, and save the data manually:

* Open _matched.csv_ with any spreadsheet editor (e.g. MS Excel, Google Sheets).
* Compare the columns `name` and `name_ald` manually to determine if the match is valid. You can use other information alongside the names to confirm that the two entities match (sector, internal information on the company structure, etc.).
* Edit the data:
    * If you are happy with the match, set the `score` value to `1`.
    * Otherwise, set the `score` value to anything other than `1`, or leave it unchanged if it is not already `1`.
* Save the edited file as, say, _valid_matches.csv_.

Re-read the edited file (validated) with:

```r
# Reading from current working directory
valid_matches <- readr::read_csv("valid_matches.csv")
```
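
To see which rows count as validated under the convention above, you could keep only the rows where `score` is exactly `1`. The sketch below uses a made-up data frame in place of the real file; the ids, levels, and scores are hypothetical.

```r
# Stand-in for the edited file; all values are made up for illustration.
valid_matches <- data.frame(
  id_2dii = c("DL1", "UP1", "UP2"),
  level = c("direct_loantaker", "ultimate_parent", "ultimate_parent"),
  score = c(1, 0.92, 1),
  stringsAsFactors = FALSE
)

# Rows manually set to 1 are the validated matches.
validated <- valid_matches[valid_matches$score == 1, ]
```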

## Prioritize validated matches by level

The validated dataset may have multiple matches per loan. Consider the case where a loan is given to "Acme Power USA", a subsidiary of "Acme Power Co.". There may be both "Acme Power USA" and "Acme Power Co." in the `ald`, and so there could be two valid matches for this loan. To get the best match only, use `prioritize()` -- it picks rows where `score` is 1 and `level` per loan is of highest `priority`:

```{r}
# Using an example of valid matches stored in r2dii.analysis
path <- system.file("extdata", "valid_matches.csv", package = "r2dii.analysis")
valid_matches <- suppressMessages(read_csv(path))
some_interesting_columns <- vars(id_2dii, level, score)
valid_matches %>%
  prioritize() %>%
  select(!!! some_interesting_columns)
```

By default, highest priority refers to the most granular match (`direct_loantaker`). The default priority is set internally via `prioritize_level()`.

```{r}
prioritize_level(matched)
```

You may use a different priority. One way to do that is to pass a function to `priority`. For example, use `rev` to reverse the default priority.

```{r}
matched %>%
  prioritize(priority = rev) %>%
  select(!!! some_interesting_columns)
```
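
To make the idea of prioritizing by level concrete, here is a base R sketch on made-up data. It is not the implementation of `prioritize()`: the helper `pick_by_priority()`, the `id_loan` column, and the level names other than `direct_loantaker` are assumptions for illustration only.

```r
# Toy validated matches; ids and levels are made up for illustration.
toy <- data.frame(
  id_loan = c("L1", "L1", "L2"),
  level = c("ultimate_parent", "direct_loantaker", "ultimate_parent"),
  score = c(1, 1, 1),
  stringsAsFactors = FALSE
)

# Assumed priority order, most granular first.
priority <- c("direct_loantaker", "intermediate_parent", "ultimate_parent")

# Hypothetical helper: keep validated rows, then one row per loan,
# choosing the level that comes first in `priority`.
pick_by_priority <- function(data, priority) {
  validated <- data[data$score == 1, ]
  level_rank <- match(validated$level, priority)
  ordered <- validated[order(validated$id_loan, level_rank), ]
  ordered[!duplicated(ordered$id_loan), ]
}

best <- pick_by_priority(toy, priority)
reversed <- pick_by_priority(toy, rev(priority))
```

With the assumed order, loan `L1` keeps its `direct_loantaker` row; with the reversed order, it keeps the `ultimate_parent` row instead.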

## Next: Analyze

Once you achieve enough matching coverage, you can analyze the output of `prioritize()` with the package [r2dii.analysis](https://github.com/2DegreesInvesting/r2dii.analysis).
